A content-based caching method for content-addressable storage
Technical field
The present invention relates to caching methods in computer storage systems, and specifically to a content-based caching method for content-addressable storage. It belongs to the field of computer storage systems.
Background art
In current desktop virtualization products, storing large numbers of virtual machine images causes the same data to be stored many times, which puts pressure on the space of the shared storage system. Content-addressable storage (CAS) is used to solve this problem of duplicate data storage. CAS detects data blocks with identical content and merges them, so that duplicate data is stored only once and the storage overhead of virtual machine images is reduced. While reducing storage overhead, it must not significantly affect virtual machine performance, and the optimization should be transparent to both the virtual machine monitor and the guest operating system.
A hash function creates a small numeric "fingerprint" from data of arbitrary size. The function scrambles and mixes the input data and produces a fingerprint called a hash value. In CAS, file content is divided into blocks, and a hash value is computed and stored for each block.
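As a minimal sketch of the blocking-and-fingerprinting step, the function below splits data into fixed-size blocks and hashes each one. The block size (4096 bytes), the choice of SHA-1, and the name `chunk_hashes` are illustrative assumptions; the text does not specify a block size or hash algorithm.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size; the text does not fix one


def chunk_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and return each block's SHA-1 hex digest."""
    hashes = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]  # last block may be shorter
        hashes.append(hashlib.sha1(block).hexdigest())
    return hashes


if __name__ == "__main__":
    data = b"A" * 4096 + b"B" * 4096 + b"A" * 4096
    hs = chunk_hashes(data)
    # Identical blocks share a fingerprint; different blocks do not.
    print(len(hs), hs[0] == hs[2], hs[0] == hs[1])  # prints: 3 True False
```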
CAS divides a file into blocks and computes the sequence of hash values of those blocks; duplicate data is identified by comparing hash values. Only one copy of each distinct block is kept, and the number of times it is shared by the blocks of files is recorded. For a file stored in CAS mode, what is actually stored for the original file is information such as the hash sequence of each of its blocks and the file's data size. The data corresponding to the hash sequences is kept in the shared storage system.
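The deduplicating store described above can be sketched as follows: each distinct block is kept once, a reference count records how often it is shared, and a file is represented only by its hash sequence and size. `CASStore`, `FileRecipe`, the in-memory dictionaries standing in for shared storage, and the use of SHA-1 are all hypothetical choices for illustration, not the patent's implementation.

```python
import hashlib
from dataclasses import dataclass

BLOCK_SIZE = 4096  # hypothetical block size


@dataclass
class FileRecipe:
    hashes: list[str]  # hash sequence of the file's blocks, in order
    size: int          # original file size in bytes


class CASStore:
    """Toy content-addressable store; dicts stand in for the shared storage system."""

    def __init__(self, block_size: int = BLOCK_SIZE):
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}  # hash -> block data, one copy each
        self.refcount: dict[str, int] = {}  # hash -> number of times shared

    def put_file(self, data: bytes) -> FileRecipe:
        hashes = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            h = hashlib.sha1(block).hexdigest()
            if h not in self.blocks:
                self.blocks[h] = block  # store each distinct block only once
            self.refcount[h] = self.refcount.get(h, 0) + 1
            hashes.append(h)
        return FileRecipe(hashes, len(data))

    def get_file(self, recipe: FileRecipe) -> bytes:
        # Rebuild the file by looking up each block via its hash value.
        return b"".join(self.blocks[h] for h in recipe.hashes)


if __name__ == "__main__":
    store = CASStore(block_size=4)
    r1 = store.put_file(b"AAAABBBB")
    r2 = store.put_file(b"AAAACCCC")
    # The "AAAA" block is stored once but shared by both files.
    print(len(store.blocks), store.refcount[r1.hashes[0]])
```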
Because a computer's processing speed and its storage speed do not match, computer systems contain a caching mechanism. A cache is faster than the storage that is actually accessed and acts as a buffer between the processor and that storage: when the processor repeatedly reads and writes the same storage content, the data can be held temporarily in the cache to improve read/write performance.
Existing caching methods all use the memory address as the cache index. This causes problems in a CAS file system: there, what is stored at a file's addresses is the hash value of the content rather than the content itself, so an address-indexed cache holds only these hash values and cannot effectively improve file system performance. The present invention provides a caching method in the CAS file system that improves the performance of the CAS file system.
Summary of the invention
The technical problem solved by the present invention is to provide a caching mechanism in a CAS file system that reduces the number of actual file operations and thereby improves CAS performance. General caching mechanisms that use the data address as the cache index are not suitable for CAS; the present invention instead uses the hash value of the cached content as the cache index, which effectively improves CAS performance.
To achieve the above object, the technical solution of the present invention is as follows:
A content-based caching method for content-addressable storage, specifically comprising the following:
A cache module that uses the hash value of the cached content as the cache index is embedded in the CAS file system, and read/write operations on the cache module replace the original disk operations. When the CAS file system initiates a disk read or write, the cache module first checks whether the corresponding data block is cached; if it is, the block can be taken directly from the cache module's internal buffer area. When the data is not found in the buffer area, the cache module then initiates the actual read or write, which reduces the number of actual reads and writes. The buffer area is a memory region defined in the cache module; it consists of multiple buffer units, each of which can cache one data block.
For a read of a CAS file, CAS needs to read the data blocks corresponding to a hash sequence from shared storage. The cache module first calls cache_read() to look up the cache. If it successfully reads the required data block from its own buffer area, it returns the block directly to CAS. Otherwise, it initiates a disk read of shared storage, writes the retrieved data block into the buffer area with cache_write(), and then returns to the upper layer.
For a write of a CAS file, CAS needs to write a mapping between a hash sequence and a data block to shared storage. The cache module again uses the hash sequence as the index: it calls cache_write() to save the data block in its own buffer area, and then initiates the actual disk write to shared storage.
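The read path (read-through on a miss) and write path (record in the cache, then write to storage) can be sketched as below. `SharedStorage`, `CacheModule`, `read_block`, and `write_block` are hypothetical names, and the buffer area is simplified to an unbounded dictionary keyed by hash sequence; the bounded, grouped buffer area with lifetime values is described later in the text.

```python
from typing import Optional


class SharedStorage:
    """Stand-in for the shared storage system; counts actual disk operations."""

    def __init__(self):
        self.data: dict[str, bytes] = {}
        self.reads = 0
        self.writes = 0

    def read(self, hash_seq: str) -> bytes:
        self.reads += 1
        return self.data[hash_seq]

    def write(self, hash_seq: str, block: bytes) -> None:
        self.writes += 1
        self.data[hash_seq] = block


class CacheModule:
    """Cache indexed by hash sequence, placed between CAS and shared storage."""

    def __init__(self, storage: SharedStorage):
        self.storage = storage
        self.buffer: dict[str, bytes] = {}  # simplified, unbounded buffer area

    def cache_read(self, hash_seq: str) -> Optional[bytes]:
        return self.buffer.get(hash_seq)  # None plays the role of the failure flag

    def cache_write(self, hash_seq: str, block: bytes) -> None:
        self.buffer[hash_seq] = block

    def read_block(self, hash_seq: str) -> bytes:
        block = self.cache_read(hash_seq)
        if block is not None:
            return block                      # hit: no disk read needed
        block = self.storage.read(hash_seq)   # miss: actual disk read
        self.cache_write(hash_seq, block)     # keep it for later reads
        return block

    def write_block(self, hash_seq: str, block: bytes) -> None:
        self.cache_write(hash_seq, block)     # record in the cache first
        self.storage.write(hash_seq, block)   # then the actual disk write


if __name__ == "__main__":
    storage = SharedStorage()
    cache = CacheModule(storage)
    cache.write_block("h1", b"x")
    cache.read_block("h1")   # served from the cache
    print(storage.reads, storage.writes)
```

Because writes go to both the cache and storage, a block just written can be read back without any disk read.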
cache_read() is the read operation on the buffer area: it uses the cache index passed in to find the corresponding buffer unit in the buffer area. On a cache hit it copies out the data block held in that unit; on a miss it returns a failure flag. cache_write() is the write operation on the buffer area: it computes the index value for the data block and finds a buffer unit in the buffer area in which to save it.
The advantage and effect of the content-based caching method for content-addressable storage of the present invention is that it provides a caching mechanism for use in CAS file systems, effectively improving CAS file system performance.
Brief description of the drawings
Fig. 1 is a structural diagram of a CAS file system with the caching mechanism
Fig. 2 shows the CAS read operation flow with the cache
Fig. 3 shows the CAS write operation flow with the cache
Fig. 4 shows the structure of a buffer unit
Fig. 5 shows the structure of the buffer area
Fig. 6 shows the operation flow of cache_read()
Fig. 7 shows the operation flow of cache_write()
Embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in Figure 1, the content-based caching method for content-addressable storage of the present invention embeds a cache module in the CAS file system, and read/write operations on the cache module replace the original disk operations. When the data is not found in the buffer area, the cache module then initiates the disk read or write, which reduces the number of disk reads and writes.
As shown in Figure 2, when CAS initiates a read operation, the cache module calls its own cache_read() to look up the cache. If it successfully reads the required data from the buffer area, it returns directly to CAS. Otherwise, it initiates a disk read, writes the retrieved data into the buffer area with cache_write(), and then returns to CAS.
As shown in Figure 3, when CAS initiates a write operation, the cache module calls cache_write() to record the data in the cache and then initiates the disk write.
Fig. 4 shows the data structure of a buffer unit. It contains a region that stores a hash sequence, an integer representing the unit's lifetime value, and a region that stores the data block.
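A buffer unit with the three fields of Fig. 4 can be modeled as a small record type. The name `BufferUnit`, the helper methods, and `MAX_LIFE = 8` are hypothetical; the text says only that the lifetime is reset to a maximum value and that a unit whose lifetime reaches 0 is no longer valid.

```python
from dataclasses import dataclass

MAX_LIFE = 8  # hypothetical maximum lifetime value; the text gives no number


@dataclass
class BufferUnit:
    """One buffer unit (Fig. 4): hash sequence, lifetime value, data block."""
    hash_seq: bytes = b""  # hash sequence identifying the cached block
    life: int = 0          # lifetime value; 0 means empty or no longer valid
    data: bytes = b""      # the cached data block itself

    def is_valid(self) -> bool:
        return self.life > 0

    def fill(self, hash_seq: bytes, data: bytes) -> None:
        """Record a new block: copy hash and data, reset lifetime to the maximum."""
        self.hash_seq = hash_seq
        self.data = data
        self.life = MAX_LIFE
```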
Fig. 5 shows how buffer units are connected within the buffer area. Each group holds the buffer units that share the same cache index, and the cache index is computed from the hash sequence. To address a buffer unit, the cache index is first computed from the hash sequence to find the corresponding group, and then a buffer unit within the group is located by traversal.
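The grouped addressing of Fig. 5 resembles a set-associative layout: the cache index selects a group and the group is then searched linearly. In this sketch the index function (first 4 bytes of the hash, modulo the number of groups), the group count, and the group size are all assumptions, since the text does not say how the index is derived from the hash sequence; `find_unit` and `make_buffer_area` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

NUM_GROUPS = 16      # hypothetical number of groups in the buffer area
UNITS_PER_GROUP = 4  # hypothetical number of buffer units per group


@dataclass
class BufferUnit:
    hash_seq: bytes = b""
    life: int = 0      # 0 means empty or no longer valid
    data: bytes = b""


def cache_index(hash_seq: bytes, num_groups: int = NUM_GROUPS) -> int:
    # Assumed index function: first 4 bytes of the hash as an integer, mod group count.
    return int.from_bytes(hash_seq[:4], "big") % num_groups


def make_buffer_area() -> list[list[BufferUnit]]:
    return [[BufferUnit() for _ in range(UNITS_PER_GROUP)] for _ in range(NUM_GROUPS)]


def find_unit(area: list[list[BufferUnit]], hash_seq: bytes) -> Optional[BufferUnit]:
    group = area[cache_index(hash_seq, len(area))]  # step 1: locate the group
    for unit in group:                              # step 2: traverse the group
        if unit.life > 0 and unit.hash_seq == hash_seq:
            return unit
    return None
```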
Inside cache_read(), the cache must be looked up. As shown in Fig. 6, the cache index is first computed from the hash sequence, and the group corresponding to that index is traversed, comparing each unit's hash sequence with the given one. If an identical hash sequence is found, it is a cache hit: the content of the unit's data region is returned and the unit's lifetime value is reset. If no unit matches, it is a cache miss, and the lifetime value of each buffer unit in the group is decreased by one unit. The return value indicates whether the cache was hit.
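The cache_read() flow of Fig. 6 might look like the following sketch. It follows the paragraph above literally: on a hit only the matching unit's lifetime is reset, and on a miss every unit in the group is decremented. Skipping units whose lifetime is already 0 during comparison, clamping at 0, the index function, and the constants are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

MAX_LIFE = 8         # hypothetical maximum lifetime value
NUM_GROUPS = 16      # hypothetical group count
UNITS_PER_GROUP = 4  # hypothetical group size


@dataclass
class BufferUnit:
    hash_seq: bytes = b""
    life: int = 0
    data: bytes = b""


def cache_index(hash_seq: bytes) -> int:
    # Assumed index function (not specified in the text).
    return int.from_bytes(hash_seq[:4], "big") % NUM_GROUPS


def make_buffer_area() -> list[list[BufferUnit]]:
    return [[BufferUnit() for _ in range(UNITS_PER_GROUP)] for _ in range(NUM_GROUPS)]


def cache_read(area: list[list[BufferUnit]], hash_seq: bytes) -> tuple[bool, Optional[bytes]]:
    """Return (hit, data); data is None on a miss."""
    group = area[cache_index(hash_seq)]
    for unit in group:
        # Assumption: units with lifetime 0 are treated as invalid and never hit.
        if unit.life > 0 and unit.hash_seq == hash_seq:
            unit.life = MAX_LIFE  # hit: reset this unit's lifetime
            return True, unit.data
    for unit in group:            # miss: age every unit in the group by one
        if unit.life > 0:
            unit.life -= 1
    return False, None
```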
Inside cache_write(), the data must be recorded in the cache. As shown in Fig. 7, the cache index is first computed from the hash sequence, and the group corresponding to that index is traversed, comparing hash sequences. If an identical hash sequence is found, it is a cache hit and nothing else needs to be done. If not, the buffer unit with the smallest lifetime value is found and replaced. Replacement consists of copying the data region, copying the hash sequence, and resetting the lifetime value.
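A sketch of the cache_write() flow of Fig. 7: if the hash sequence is already cached nothing happens; otherwise the unit with the smallest lifetime value in the group is overwritten and its lifetime reset. Empty units have lifetime 0, so they are naturally chosen first. Breaking ties by position in the group, not decrementing lifetimes on this path, the index function, and the constants are assumptions.

```python
from dataclasses import dataclass

MAX_LIFE = 8         # hypothetical maximum lifetime value
NUM_GROUPS = 16      # hypothetical group count
UNITS_PER_GROUP = 4  # hypothetical group size


@dataclass
class BufferUnit:
    hash_seq: bytes = b""
    life: int = 0
    data: bytes = b""


def cache_index(hash_seq: bytes) -> int:
    # Assumed index function (not specified in the text).
    return int.from_bytes(hash_seq[:4], "big") % NUM_GROUPS


def make_buffer_area() -> list[list[BufferUnit]]:
    return [[BufferUnit() for _ in range(UNITS_PER_GROUP)] for _ in range(NUM_GROUPS)]


def cache_write(area: list[list[BufferUnit]], hash_seq: bytes, data: bytes) -> None:
    group = area[cache_index(hash_seq)]
    for unit in group:
        if unit.life > 0 and unit.hash_seq == hash_seq:
            return  # already cached: nothing else to do
    # Replace the unit with the smallest lifetime (first one wins on ties).
    victim = min(group, key=lambda u: u.life)
    victim.data = bytes(data)   # copy the data region
    victim.hash_seq = hash_seq  # copy the hash sequence
    victim.life = MAX_LIFE      # reset the lifetime value
```

Choosing the minimum-lifetime unit approximates least-recently-used replacement: units that keep hitting in cache_read() stay at the maximum, while units that are only traversed on misses drift toward 0.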
The lifetime value determines which buffer unit is replaced. When a buffer unit has just recorded data, its lifetime value is reset (to the maximum); after that, it is decreased by one unit each time the unit is traversed. A buffer unit whose lifetime value has dropped to 0 is considered no longer valid. Replacement looks for the buffer unit with the smallest lifetime value and replaces its contents with the new data.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the present invention may still be modified or equivalently replaced, and any modification or partial replacement that does not depart from the spirit and scope of the present invention shall fall within the scope of the claims of the present invention.