CN102508790B - Content-based cache method applied to content analysis storage - Google Patents


Info

Publication number
CN102508790B
Authority
CN
China
Prior art keywords
cache
buffer
read
cryptographic hash
write
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110365027.7A
Other languages
Chinese (zh)
Other versions
CN102508790A (en)
Inventor
肖利民
龚韬
赵国玉
李秀桥
阮利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN201110365027.7A
Publication of CN102508790A
Application granted
Publication of CN102508790B
Legal status: Active
Anticipated expiration


Abstract

The invention relates to a content-based cache method applied to content analysis storage. The method embeds in a CAS (content analysis storage) file system a cache module that uses the hash value of the cached content as the cache index, and replaces the original disk operations with the read and write operations of the cache module. When the CAS file system initiates a disk read or write, the cache module first checks whether the corresponding data block is cached; if it is, the block is taken directly from the cache area inside the cache module. Only when the requested data is not in the cache area does the cache module initiate the actual read or write, which reduces the number of actual reads and writes. The cache area is a storage region defined in the cache module and consists of a plurality of cache units, each of which can cache one data block. The method provides a caching mechanism for use in a CAS file system and effectively improves the performance of the CAS file system.

Description

A content-based cache method applied to content analysis storage
Technical field
The present invention relates to a caching method in computer storage systems, specifically a content-based cache method applied to content analysis storage (CAS). It belongs to the field of computer storage systems.
Background art
In current desktop virtualization products, large-scale virtual machine image storage causes duplicate data to be stored, which increases the pressure on the storage space of the shared storage system. Content analysis storage (CAS) technology is used to solve this problem. It detects content similarity between data and merges the storage of duplicate data, which avoids storing the same data several times and reduces the storage overhead of virtual machine images. Reducing the storage overhead must not significantly affect virtual machine performance, and the optimization should be transparent to the virtual machine monitor and to the operating system running in the VM.
A hash function is a method of creating a small numeric "fingerprint" from arbitrary data. The function scrambles and mixes the data to produce a fingerprint called a hash value. In CAS, the file content is divided into blocks, and the hash value of each block is computed and stored.
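As an illustration of the fingerprint idea, the following minimal sketch computes a hash value for a data block. The patent does not name a hash function; SHA-1 from Python's standard `hashlib` module and the helper name `block_fingerprint` are assumptions made only for this example.

```python
import hashlib


def block_fingerprint(block: bytes) -> str:
    """Return a fixed-length hex fingerprint (hash value) of a data block.

    SHA-1 is an assumption; the patent does not specify the hash function.
    """
    return hashlib.sha1(block).hexdigest()


# Identical content always yields the same fingerprint,
# while a one-byte change yields a different one.
a = block_fingerprint(b"virtual machine image block")
b = block_fingerprint(b"virtual machine image block")
c = block_fingerprint(b"virtual machine image block!")
```

Because equal content gives equal fingerprints, comparing hash values is enough to detect candidate duplicate blocks without comparing the blocks byte by byte.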
CAS divides a file into blocks, computes the sequence of hash values of those blocks, and uses the hash value sequence to identify duplicate data. Only one copy is kept of data blocks with identical content, and the number of times each block is shared among the chunked files is recorded. For a file stored in CAS mode, what the original file actually stores is information such as the hash value sequence of its blocks and the file data size. The data blocks that correspond to the hash sequence are kept in the shared storage system.
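The storage scheme described above can be sketched roughly as follows. This is an illustrative model, not the patented implementation: the fixed 4096-byte block size, the SHA-1 hash, and the names `BlockStore`, `put_file` and `get_file` are all assumptions introduced for the example.

```python
import hashlib
from collections import Counter

BLOCK_SIZE = 4096  # assumed fixed block size; the patent does not give one


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Divide file content into consecutive blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class BlockStore:
    """Hypothetical stand-in for the shared storage system."""

    def __init__(self):
        self.blocks = {}           # hash value -> single stored copy of the block
        self.refcount = Counter()  # hash value -> number of times the block is shared

    def put_file(self, data: bytes) -> dict:
        """Store a file; return what the 'original file' keeps: size and hash sequence."""
        hashes = []
        for block in split_blocks(data):
            h = hashlib.sha1(block).hexdigest()
            if h not in self.blocks:   # keep only one copy of identical content
                self.blocks[h] = block
            self.refcount[h] += 1
            hashes.append(h)
        return {"size": len(data), "hashes": hashes}

    def get_file(self, meta: dict) -> bytes:
        """Rebuild file content from its hash value sequence."""
        return b"".join(self.blocks[h] for h in meta["hashes"])[:meta["size"]]


store = BlockStore()
image_a = b"A" * 4096 + b"B" * 4096
image_b = b"A" * 4096 + b"C" * 100   # shares its first block with image_a
meta_a = store.put_file(image_a)
meta_b = store.put_file(image_b)
```

In this example the two images contain four blocks in total, but only three distinct blocks are stored, and the shared block has a reference count of two.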
Because processor speed and storage speed do not match, computer systems contain a caching mechanism. A cache is faster than the storage that is actually being operated on and acts as a buffer between the processor and that storage: when the processor repeatedly reads and writes the same stored content, the data can be held temporarily in the cache to improve read/write performance.
Existing caching methods all use the storage address as the cache index. This approach runs into problems when applied to a CAS file system, because in a CAS file system what is kept at a file address is the hash value of the content rather than the content itself, so a cache indexed by address cannot improve file system performance well. The present invention provides a caching method for CAS file systems in order to improve their performance.
Summary of the invention
The technical problem to be solved by the invention is to provide a caching mechanism in a CAS file system that reduces actual file operations and thereby improves CAS performance. The usual caching mechanism, which uses the data address as the cache index, is not suitable for CAS. The invention uses the hash value of the cached content as the cache index, which can effectively improve CAS performance.
To achieve this, the technical solution of the invention is as follows:
A content-based cache method applied to content analysis storage, comprising the following:
A cache module that uses the hash value of the cached content as the cache index is embedded in the CAS file system, and the read and write operations of the cache module replace the original disk operations. When the CAS file system initiates a disk read or write, the cache module first checks whether the corresponding data block is cached; if it is, the block is taken directly from the cache area inside the cache module. Only when the requested data is not in the cache area does the cache module initiate the actual read or write, which reduces the number of actual reads and writes. The cache area is a storage region defined in the cache module and consists of multiple cache units, each of which can cache one data block.
For a read of a CAS file, CAS needs to read from shared storage the data blocks that correspond to a hash value sequence. The cache module first calls cache_read() to read the cache. If the required data block is successfully read from its own cache area, it is returned directly to CAS. Otherwise, the cache module initiates a disk read on shared storage, obtains the data block, writes it to the cache area with cache_write(), and then returns it to the upper layer.
For a write of a CAS file, CAS needs to write to shared storage a mapping between a hash value sequence and a data block. The cache module, again using the hash value sequence as the index, calls cache_write() to save the data block in its own cache area, and then initiates the actual disk write to shared storage.
cache_read() is the read operation of the cache area. It finds the corresponding cache unit in the cache area by the cache index passed in; on a cache hit it copies the data block out of the cache unit, and on a miss it returns a failure flag. cache_write() is the write operation of the cache area. It computes the index value from the data block, finds a cache unit in the cache area, and saves the block there.
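The read and write paths just described can be sketched as a thin layer in front of shared storage. This is a simplified model under stated assumptions: the cache area is an unbounded dictionary with no replacement policy, `cache_write()` receives the hash value instead of computing it from the block, and the names `SharedStorage`, `CacheModule`, `read_block` and `write_block` are hypothetical. Only `cache_read()` and `cache_write()` are names taken from the text.

```python
class SharedStorage:
    """Hypothetical stand-in for shared storage; counts actual disk operations."""

    def __init__(self):
        self.blocks = {}
        self.reads = 0
        self.writes = 0

    def read(self, hash_value):
        self.reads += 1
        return self.blocks[hash_value]

    def write(self, hash_value, block):
        self.writes += 1
        self.blocks[hash_value] = block


class CacheModule:
    """Sits between the CAS file system and shared storage."""

    def __init__(self, storage):
        self.storage = storage
        self.cache_area = {}  # simplification: hash value -> data block

    def cache_read(self, hash_value):
        # Returns None as the failure flag on a miss.
        return self.cache_area.get(hash_value)

    def cache_write(self, hash_value, block):
        self.cache_area[hash_value] = block

    def read_block(self, hash_value):
        block = self.cache_read(hash_value)
        if block is not None:                  # hit: no disk access
            return block
        block = self.storage.read(hash_value)  # miss: actual read
        self.cache_write(hash_value, block)    # remember it for next time
        return block

    def write_block(self, hash_value, block):
        self.cache_write(hash_value, block)    # save in the cache area first
        self.storage.write(hash_value, block)  # then perform the actual write


storage = SharedStorage()
storage.blocks["h1"] = b"cold block"
cache = CacheModule(storage)
first = cache.read_block("h1")    # miss, goes to shared storage
second = cache.read_block("h1")   # hit, served from the cache area
cache.write_block("h2", b"new block")
third = cache.read_block("h2")    # hit, because the write populated the cache
```

Three reads are issued here but shared storage is read only once, which is the reduction in actual read operations that the method aims for.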
The advantage and effect of the content-based cache method applied to content analysis storage is that it provides a caching mechanism for use in a CAS file system and effectively improves the performance of the CAS file system.
Brief description of the drawings
Fig. 1 is a structural diagram of a CAS file system with the caching mechanism
Fig. 2 is the flow of a CAS read operation with the cache
Fig. 3 is the flow of a CAS write operation with the cache
Fig. 4 is the structure of a cache unit
Fig. 5 is the structure of the cache area
Fig. 6 is the operation flow of cache_read()
Fig. 7 is the operation flow of cache_write()
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1, the content-based cache method applied to content analysis storage embeds a cache module in the CAS file system and replaces the original disk operations with the read and write operations of the cache module. Only when the requested data is not in the cache area does the cache module initiate a disk read or write, which reduces the number of disk reads and writes.
As shown in Fig. 2, when CAS initiates a read operation, the cache module calls its own cache_read() to read the cache. If the required data is successfully read from the cache area, it is returned directly to CAS. Otherwise, the cache module initiates a disk read, obtains the data, writes it to the cache area with cache_write(), and then returns it to CAS.
As shown in Fig. 3, when CAS initiates a write operation, the cache module calls cache_write() to record the data in the cache, and then initiates the disk write.
Fig. 4 describes the data structure of a cache unit. It comprises a region that stores a hash value sequence, an integer that represents the life value of the cache unit, and a region that stores the data block.
Fig. 5 illustrates how the cache units in the cache area are linked. A group represents the class of cache units that share the same cache index. The cache index is computed from the hash value sequence. To address a cache unit, the cache index is first computed from the hash value sequence to find the corresponding group, and a cache unit is then located within the group by traversal.
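One possible in-memory layout for the cache unit and the grouped cache area is sketched below. The text does not say how the cache index is derived from the hash value sequence or how many groups and units exist, so the modulo of the first four hash bytes, the constants `NUM_GROUPS`, `UNITS_PER_GROUP` and `MAX_LIFE`, and the helper names `cache_index` and `find_unit` are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

NUM_GROUPS = 8        # assumed number of groups
UNITS_PER_GROUP = 4   # assumed number of cache units per group
MAX_LIFE = 16         # assumed maximum life value


@dataclass
class CacheUnit:
    hash_value: bytes = b""  # region storing the hash value sequence
    life: int = 0            # life value; 0 means the unit holds no valid data
    data: bytes = b""        # region storing the data block


def cache_index(hash_value: bytes) -> int:
    """Map a hash value to a group number (assumed scheme: first 4 bytes mod groups)."""
    return int.from_bytes(hash_value[:4], "big") % NUM_GROUPS


# The cache area: one list of cache units per group.
cache_area: List[List[CacheUnit]] = [
    [CacheUnit() for _ in range(UNITS_PER_GROUP)] for _ in range(NUM_GROUPS)
]


def find_unit(hash_value: bytes) -> Optional[CacheUnit]:
    """Find the group by index, then traverse it comparing hash values."""
    for unit in cache_area[cache_index(hash_value)]:
        if unit.life > 0 and unit.hash_value == hash_value:
            return unit
    return None


# A 20-byte hash whose first four bytes encode 11, so it falls in group 11 % 8 = 3.
h = bytes.fromhex("0000000b") + b"\x00" * 16
cache_area[cache_index(h)][0] = CacheUnit(h, MAX_LIFE, b"block")
```

Grouping keeps each lookup short: only the units of one group are traversed, not the whole cache area.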
Inside cache_read(), the cache has to be read. As shown in Fig. 6, the cache index is first computed from the hash value sequence, and the group that corresponds to the index is traversed, comparing hash value sequences. If an identical hash value sequence is found, this is a cache hit: the content of the cache unit's data region is returned and the life value of the cache unit is reset. If no unit matches, this is a cache miss, and the life value of each cache unit in the group is reduced by one. The return value indicates whether the cache was hit.
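A minimal sketch of this read logic follows, assuming a cache area made of per-group Python lists. The index scheme, the constants, and the choice to stop decrementing at 0 are assumptions; the hit and miss behaviour follows the description above.

```python
from dataclasses import dataclass

NUM_GROUPS = 4   # assumed number of groups
MAX_LIFE = 3     # assumed maximum life value


@dataclass
class CacheUnit:
    hash_value: bytes
    life: int
    data: bytes


def cache_index(hash_value: bytes) -> int:
    """Assumed index scheme: first 4 bytes of the hash modulo the group count."""
    return int.from_bytes(hash_value[:4], "big") % NUM_GROUPS


def cache_read(cache_area, hash_value: bytes):
    """Return (hit, data). On a hit the unit's life value is reset;
    on a miss every valid unit in the group loses one unit of life."""
    group = cache_area[cache_index(hash_value)]
    for unit in group:
        if unit.life > 0 and unit.hash_value == hash_value:
            unit.life = MAX_LIFE          # hit: reset the life value
            return True, unit.data
    for unit in group:
        if unit.life > 0:                 # miss: age the group (not below 0)
            unit.life -= 1
    return False, None


cache_area = [[] for _ in range(NUM_GROUPS)]
h_hit = (1).to_bytes(4, "big") + b"\x00" * 16    # maps to group 1
h_miss = (5).to_bytes(4, "big") + b"\x00" * 16   # also maps to group 1
cache_area[1].append(CacheUnit(h_hit, 2, b"cached"))

miss = cache_read(cache_area, h_miss)     # miss: the unit's life drops from 2 to 1
life_after_miss = cache_area[1][0].life
hit = cache_read(cache_area, h_hit)       # hit: the unit's life is reset to MAX_LIFE
```

Returning a `(hit, data)` pair is one way to express the return value that indicates whether the cache was hit; a C implementation might instead return a status code and fill an output buffer.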
Inside cache_write(), the data has to be recorded in the cache. As shown in Fig. 7, the cache index is first computed from the hash value sequence, and the group that corresponds to the index is traversed, comparing hash value sequences. If an identical hash value sequence is found, the block is already cached and nothing more needs to be done. If there is no hit, the cache unit with the smallest life value is found and replaced. The replacement consists of copying the data region, copying the hash value sequence, and resetting the life value.
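The write logic can be sketched in the same style. The SHA-1 hash, the index scheme and the constants are assumptions, and a single group of two units is used only to keep the demonstration deterministic.

```python
import hashlib
from dataclasses import dataclass

NUM_GROUPS = 1        # assumed; one group keeps the demo deterministic
UNITS_PER_GROUP = 2   # assumed
MAX_LIFE = 3          # assumed maximum life value


@dataclass
class CacheUnit:
    hash_value: bytes = b""
    life: int = 0          # 0 means the unit holds no valid data
    data: bytes = b""


def cache_index(hash_value: bytes) -> int:
    """Assumed index scheme: first 4 bytes of the hash modulo the group count."""
    return int.from_bytes(hash_value[:4], "big") % NUM_GROUPS


def cache_write(cache_area, block: bytes) -> bytes:
    """Record a block in the cache and return its hash value."""
    hash_value = hashlib.sha1(block).digest()  # index is derived from the block itself
    group = cache_area[cache_index(hash_value)]
    for unit in group:
        if unit.life > 0 and unit.hash_value == hash_value:
            return hash_value                  # already cached: nothing else to do
    victim = min(group, key=lambda u: u.life)  # unit with the smallest life value
    victim.data = bytes(block)                 # copy the data region
    victim.hash_value = hash_value             # copy the hash value
    victim.life = MAX_LIFE                     # reset the life value
    return hash_value


cache_area = [[CacheUnit() for _ in range(UNITS_PER_GROUP)] for _ in range(NUM_GROUPS)]
h1 = cache_write(cache_area, b"block one")    # fills unit 0
h2 = cache_write(cache_area, b"block two")    # fills unit 1
cache_write(cache_area, b"block one")         # hit: no change
cache_area[0][0].life = 1                     # pretend unit 0 has aged
h3 = cache_write(cache_area, b"block three")  # replaces unit 0, the smallest life
```

Empty units have a life value of 0, so they are naturally chosen first; once the group is full, the unit that has gone longest without a hit is the one replaced.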
The life value is used to decide which cache unit is replaced. It is reset (to the maximum value) when the cache unit has just recorded data; after that, each time the unit is traversed its life value is decremented by one. Once the life value has dropped to 0, the cache unit is considered no longer valid. The replacement operation finds the cache unit with the smallest life value and replaces it with the new data.
Finally, it should be noted that the above embodiment only illustrates the technical solution of the invention and does not limit it. Although the invention has been described in detail with reference to the above embodiment, those of ordinary skill in the art should understand that the invention may still be modified or equivalently replaced, and that any modification or partial replacement that does not depart from the spirit and scope of the invention shall fall within the scope of the claims of the invention.

Claims (1)

1. A content-based cache method applied to content analysis storage, comprising the following: a cache module that uses the hash value of the cached content as the cache index is embedded in a CAS file system, and the read and write operations of the cache module replace the original disk operations; when the CAS file system initiates a disk read or write, the cache module first checks whether the corresponding data block is cached, and if it is, the block is taken directly from the cache area inside the cache module; when the requested data is not in the cache area, the cache module initiates the actual read or write, so as to reduce the number of actual reads and writes; the cache area is a storage region defined in the cache module and consists of multiple cache units, each of which can cache one data block;
wherein, for a read of a CAS file, CAS needs to read from shared storage the data blocks corresponding to a hash value sequence, and the cache module first calls cache_read() to read the cache; if the required data block is successfully read from its own cache area, it is returned directly to CAS; otherwise, a disk read is initiated on shared storage, and after the data block is obtained it is written to the cache area by cache_write() and then returned to the upper layer;
wherein, for a write of a CAS file, when CAS initiates the write operation, CAS needs to write to shared storage a mapping between a hash value sequence and a data block, and the cache module, likewise using the hash value sequence as the index, calls cache_write() to save the data block in its own cache area and then initiates the actual disk write to shared storage;
wherein said cache_read() is the read operation of the cache area, which finds the corresponding cache unit in the cache area by the cache index passed in, copies the data block out of the cache unit on a cache hit, and returns a failure flag on a miss; said cache_write() is the write operation of the cache area, which computes the index value from the data block, finds a cache unit in the cache area, and saves the block there;
the cache index is computed from the hash value sequence; to address a cache unit, the cache index is first computed from the hash value sequence to find the corresponding group, and a cache unit is then located within the group by traversal;
inside cache_read(), the cache has to be read; the cache index is first computed from the hash value sequence, and the group corresponding to the index is traversed, comparing hash value sequences; if an identical hash value sequence is found, this is a cache hit, the content of the cache unit's data region is returned, and the life value of the cache unit is reset; if no unit matches, this is a cache miss, and the life value of each cache unit in the group is reduced by one; the return value indicates whether the cache was hit;
inside cache_write(), the data has to be recorded in the cache; the cache index is first computed from the hash value sequence, and the group corresponding to the index is traversed, comparing hash value sequences; if an identical hash value sequence is found, the block is already cached and nothing more needs to be done; if there is no hit, the cache unit with the smallest life value is found and replaced; the replacement consists of copying the data region, copying the hash value sequence, and resetting the life value;
the life value is used to decide which cache unit is replaced; it is reset when the cache unit has just recorded data, and after that it is decremented by one each time the unit is traversed; once the life value has dropped to 0, the cache unit is considered no longer valid; the replacement operation finds the cache unit with the smallest life value and replaces it with the new data.
CN201110365027.7A 2011-11-17 2011-11-17 Content-based cache method applied to content analysis storage Active CN102508790B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110365027.7A CN102508790B (en) 2011-11-17 2011-11-17 Content-based cache method applied to content analysis storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110365027.7A CN102508790B (en) 2011-11-17 2011-11-17 Content-based cache method applied to content analysis storage

Publications (2)

Publication Number Publication Date
CN102508790A (en) 2012-06-20
CN102508790B (en) 2014-08-13

Family

ID=46220881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110365027.7A Active CN102508790B (en) 2011-11-17 2011-11-17 Content-based cache method applied to content analysis storage

Country Status (1)

Country Link
CN (1) CN102508790B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103034684A (en) * 2012-11-27 2013-04-10 北京航空航天大学 Optimizing method for storing virtual machine mirror images based on CAS (content addressable storage)
CN103095686B (en) * 2012-12-19 2016-06-08 华为技术有限公司 Focus metadata access control method and service device
CN108537719B (en) * 2018-03-26 2021-10-19 上海交通大学 System and method for improving performance of general graphic processor
US11249915B2 (en) * 2020-01-09 2022-02-15 Vmware, Inc. Content based cache failover

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6611898B1 (en) * 2000-12-22 2003-08-26 Convergys Customer Management Group, Inc. Object-oriented cache management system and method
CN101887398A (en) * 2010-06-25 2010-11-17 浪潮(北京)电子信息产业有限公司 Method and system for dynamically enhancing input/output (I/O) throughput of server

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
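Han Dezhi et al. Key technologies involved in fixed content storage. In: Research on Key Technologies of Highly Available Storage Networks. Science Press, 2009, 1st ed., pp. 153-154. *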

Also Published As

Publication number Publication date
CN102508790A (en) 2012-06-20

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent for invention or patent application
CB03 Change of inventor or designer information

Inventor after: Xiao Limin

Inventor after: Gong Tao

Inventor after: Zhao Guoyu

Inventor after: Li Xiuqiao

Inventor after: Ruan Li

Inventor before: Gong Tao

Inventor before: Xiao Limin

Inventor before: Zhao Guoyu

Inventor before: Li Xiuqiao

Inventor before: Ruan Li

COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: GONG TAO XIAO LIMIN ZHAO GUOYU LI XIUQIAO RUAN LI TO: XIAO LIMIN GONG TAO ZHAO GUOYU LI XIUQIAO RUAN LI

C14 Grant of patent or utility model
GR01 Patent grant