US20060259527A1 - Changed files list with time buckets for efficient storage management - Google Patents


Info

Publication number
US20060259527A1
Authority
US
United States
Prior art keywords
time
change
timestamp
objects
buckets
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/128,781
Inventor
Murthy Devarakonda
Frank Filz
Marc Kaplan
James Seeger
Jason Young
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US11/128,781
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: FILZ, FRANK STEWART; KAPLAN, MARC ADAM; SEEGER, JAMES JOHN, JR.; YOUNG, JASON C.; DEVARAKONDA, MURTHY V.
Publication of US20060259527A1
Priority to US12/061,323 (US8548965B2)
Priority to US13/937,901 (US20130297610A1)
Current legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1734 Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs

Definitions

  • The present invention relates generally to data file storage systems and, more particularly, to a changed files list with time buckets for efficient storage management.
  • The space required by the change journal based backup feature is (potentially) unbounded (or until it breaks). That is, every change is recorded in the journal, so the journal keeps growing at a rate proportional to the rate of file system change.
  • Thus, in practice, the journal is periodically processed and trimmed by the storage management subsystem(s).
  • However, the rate and amount of change can outpace the storage capacity of the journal and/or the processing cycles allocated to the storage management subsystem(s). When this “breakage” occurs, change information is lost, and the management system has to resort to a traditional full metadata scan.
  • The present invention may be implemented, e.g., as an apparatus, a method, or a computer program product.
  • An apparatus for managing object data includes a changed objects manager for creating and managing a changed objects list that at least identifies the objects that have changed, based on their time of change.
  • The changed objects list is associated with a plurality of time buckets.
  • Each of the plurality of time buckets is associated with a respective date and time period and with object change records for objects having a timestamp falling within that date and time period.
  • Each of the object change records is associated with a unique object identifier and the timestamp of the corresponding object.
  • The timestamp specifies the date and time of the later of the object's creation time and its most recent update time.
  • FIG. 1 is a block diagram illustrating an exemplary computer processing system to which the present invention may be applied, in accordance with the principles of the present invention.
  • FIG. 2 is a block diagram illustrating an exemplary data storage management (DSM) system in accordance with the principles of the present invention.
  • FIG. 3 is a flow diagram illustrating an exemplary process for updating a changed files list in accordance with the principles of the present invention.
  • FIG. 4 is a flow diagram illustrating an exemplary process for using a changed files list with time buckets in accordance with the principles of the present invention.
  • The present invention is directed to a changed files list with time buckets for efficient storage management. While the present invention is primarily described herein with respect to files, it may be implemented with respect to any set of objects within, and processed by, a computer processing system. Moreover, the present invention is particularly suited to a set of managed objects most of which do not change during a given period of time, but where it is desired to concisely track which of the objects have changed.
  • The present invention is useful within computerized data file storage systems for efficiently selecting files that have been accessed recently, since such files are typically the primary subjects of data management tasks or jobs.
  • A “changed file list” is a persistent data structure with just one short file-change record for each file.
  • The changed file list is (conceptually) partitioned into time buckets. For illustrative purposes, assume there is a bucket for every hour of every day. Different granularities of time could be chosen, as described below, while maintaining the scope of the present invention.
  • The file system is augmented such that each time the metadata of a file f is updated, the current date and time of day (t_now) is compared with the timestamp representing the last metadata change (t_prev) of file f. If t_now and t_prev fall in different time buckets, the file-change record for f is moved into the bucket for the current hour; otherwise the changed file list need not be touched.
  • A storage management process that runs occasionally, such as a backup job, normally needs to consider and process only the files that have changed since its previous run. Knowing the date and hour of the last run, the process can readily determine which files (and/or file metadata) have changed by reading just the file-change records in the time buckets representing the hours between then and now. Because a bucket covers a whole hour, the bucket(s) for the hour(s) in which the previous run occurred may include some files that changed before that run; by reading the complete metadata of just those files, the process can determine which ones need to be processed. The vast majority of unchanged files are represented in older buckets and can be ignored completely.
  • The phrase “at least one of A and B” is intended to encompass A only, B only, or both A and B.
  • Embodiments of the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment including both hardware and software elements.
  • In one embodiment, the present invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
  • A computer-usable or computer-readable medium can be any apparatus that may include, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium.
  • Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk read-only memory (CD-ROM), compact disk read/write (CD-R/W) and DVD.
  • A data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus.
  • The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories that provide temporary storage of at least some program code to reduce the number of times code is retrieved from bulk storage during execution.
  • I/O devices including but not limited to keyboards, displays, pointing devices, etc. may be coupled to the system either directly or through intervening I/O controllers.
  • Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks.
  • Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
  • Referring to FIG. 1, the computer processing system 100 includes at least one processor (CPU) 102 connected in signal communication with other components via a system bus 104.
  • A read-only memory (ROM) 106, a random access memory (RAM) 108, a display adapter 110, an I/O adapter 112, a user interface adapter 114, a sound adapter 170, and a network adapter 198 are each connected in signal communication with the system bus 104.
  • The CPU 102 may include one or more “on-board” caches (e.g., L1 caches) (hereinafter “L1 cache” 166). Moreover, the CPU may be in signal communication with one or more “external caches” (e.g., disk caches and RAM caches) (hereinafter “disk cache” 167 and “RAM cache” 168). Further, the CPU 102 may also be in signal communication with one or more other “external caches” (e.g., on a chip other than the CPU and RAM chips, such as L2 caches) (hereinafter “L2 cache” 168). Other cache configurations may also be employed while maintaining the scope of the present invention.
  • A display device 116 is connected in signal communication with system bus 104 by display adapter 110.
  • A disk storage device (e.g., a magnetic or optical disk storage device) 118 is connected in signal communication with system bus 104 by I/O adapter 112.
  • A mouse 120 and keyboard 122 are connected in signal communication with system bus 104 by user interface adapter 114.
  • The mouse 120 and keyboard 122 are used to input and output information to and from computer processing system 100.
  • At least one speaker (hereinafter “speaker”) 185 is connected in signal communication with system bus 104 by sound adapter 170.
  • A (digital and/or analog) modem 196 is connected in signal communication with system bus 104 by network adapter 198.
  • Turning to FIG. 2, a data processing system having file and data storage management subsystems augmented with a changed files list with time buckets is indicated generally by the reference numeral 200.
  • The data storage management system 200 includes an exemplary changed files list 210 with time buckets 210A in accordance with the principles of the present invention. The data storage management system 200 also includes a file system processing module 220, a data storage management processing module 230, and an archival and backup data storage device 240. File inodes 250 are used by both the file system processing module 220 and the data storage management processing module 230.
  • The changed files list 210 (with time buckets) relates to the present invention.
  • The changed files list 210 with time buckets is a data structure that organizes subsets of inode numbers into buckets.
  • Each bucket 210A (also represented herein by the reference character “B”) represents a time period.
  • The presence of an inode number i in a bucket B records the fact that the file represented by inode number i last changed during the time represented by bucket B.
  • For example, the bucket labeled “3:00” represents files whose last change occurred at or after 3 o'clock but before 4 o'clock.
  • In FIG. 2, this bucket logically includes the files represented by inode numbers 6, 11, and 18.
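The bucket organization of FIG. 2 can be sketched as follows. This is a minimal in-memory illustration, assuming hour-granularity buckets labeled by the change timestamp truncated to the hour; the names `ChangedFilesList`, `record_change`, and `files_in` are hypothetical, and a real changed files list would be a persistent structure rather than Python dictionaries.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime


def bucket_of(ts: datetime) -> datetime:
    """Label of the hour-granularity bucket covering ts (ts truncated to the hour)."""
    return ts.replace(minute=0, second=0, microsecond=0)


class ChangedFilesList:
    """Hypothetical in-memory sketch: every inode number lives in exactly one
    bucket, the one covering the hour in which that file last changed."""

    def __init__(self) -> None:
        self.buckets: dict[datetime, set[int]] = defaultdict(set)
        self.where: dict[int, datetime] = {}  # inode number -> its current bucket

    def record_change(self, inode: int, ts: datetime) -> None:
        new = bucket_of(ts)
        old = self.where.get(inode)
        if old == new:
            return  # already recorded in this hour's bucket
        if old is not None:
            self.buckets[old].discard(inode)  # a file sits only in its latest bucket
        self.buckets[new].add(inode)
        self.where[inode] = new

    def files_in(self, bucket: datetime) -> set[int]:
        return set(self.buckets.get(bucket, ()))


# Usage: reproduce the "3:00" bucket of FIG. 2, then move inode 11 to 4:00.
cfl = ChangedFilesList()
for inode, minute in [(6, 5), (11, 20), (18, 55)]:
    cfl.record_change(inode, datetime(2005, 5, 13, 3, minute))
print(sorted(cfl.files_in(datetime(2005, 5, 13, 3))))  # [6, 11, 18]
cfl.record_change(11, datetime(2005, 5, 13, 4, 10))
print(sorted(cfl.files_in(datetime(2005, 5, 13, 3))))  # [6, 18]
```

The `where` map is one possible design choice: it makes moving a file to a newer bucket a constant-time operation instead of a search through every bucket.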
  • A changed files manager 220A, disposed in the file system processing module 220, creates and manages the changed files list 210. While the changed files manager 220A is shown and described with respect to the file system processing module 220, it may instead be implemented as a standalone device, or in one or more other elements of a data storage management (DSM) system or a computer processing system, while maintaining the scope of the present invention.
  • DSM: data storage management.
  • A file is represented by an inode, and each inode within a file system has a unique number. (IBM SanFS has the same concepts, except that it uses the word “object” and the phrase “object identifier.”)
  • The inode includes metadata that describes some attributes of the file, as well as pointers to the data blocks that hold the file's data.
  • A file change is an event that causes any of the data or metadata to be modified (including any change in the file length, ownership, permissions (ACLs), and so forth).
  • Directories may be considered to be special-case files. Renaming, adding, or removing an entry e in a directory d is a modification (mtime) of the directory d, as well as a change (ctime) to the inode referenced by entry e.
  • The metadata field atime (last access time), which records the last time at which any application accessed the file, is a special case: apart from the atime field itself, neither the file nor its metadata changes.
  • An atime-only change to a data file is usually of no interest to a data backup system. However, it may well be of interest to other data management systems such as, e.g., a hierarchical storage management (HSM) system with a policy of keeping recently accessed files in primary storage and moving unused files to secondary storage.
  • HSM: hierarchical storage management.
  • An atime-only change to a directory is usually of no interest to a typical data management system.
  • Our changed files list is a list of file-change records.
  • The list is partitioned into time buckets and/or otherwise stored and organized so that file-change records can be rapidly accessed by the value of their timestamps.
  • Two records with timestamps that indicate the same date and hour are considered to be in the same time bucket. While we use a granularity of an hour for illustrative purposes, any other convenient amount of time may be chosen for use in accordance with the principles of the present invention, while maintaining the scope of the present invention.
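The rule that two records with the same date and hour share a bucket, generalized to any granularity, can be sketched as a bucket-index function. This assumes timezone-aware UTC datetimes and buckets aligned to the Unix epoch; `bucket_key` and `same_bucket` are hypothetical names, not part of the original disclosure.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
HOUR = timedelta(hours=1)


def bucket_key(ts: datetime, granularity: timedelta = HOUR) -> int:
    """Index of the time bucket containing ts: whole granularity units since the epoch."""
    return (ts - EPOCH) // granularity


def same_bucket(a: datetime, b: datetime, granularity: timedelta = HOUR) -> bool:
    """Two records share a bucket iff their timestamps map to the same index."""
    return bucket_key(a, granularity) == bucket_key(b, granularity)


def t(h: int, m: int) -> datetime:
    return datetime(2005, 5, 13, h, m, tzinfo=timezone.utc)


print(same_bucket(t(3, 5), t(3, 59)))                     # True: same date and hour
print(same_bucket(t(3, 59), t(4, 0)))                     # False: next hour
print(same_bucket(t(3, 59), t(4, 0), timedelta(days=1)))  # True with daily buckets
```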
  • Object deletion is a special case. Besides the time buckets, the changed file list also includes a deleted objects bucket (popularly known as the bit bucket).
  • The changed files list and its buckets are persistent data structures, organized so that records can be efficiently (a) created afresh, or (b) located within a time bucket and (c) removed from that time bucket, updated, and then appended to (or inserted into) the time bucket representing the current date and hour, or appended to the deleted objects bucket.
  • Each file-change record includes several fields.
  • One such field in a file-change record is a timestamp for the record. This is the date and time when the file-change record was created or most recently updated. We also call this the btime (bucket time) of the file object.
  • The timestamp value need not represent real wall-clock time; for example, any monotonically increasing values might be used. Other values and representations may also be used to provide a timestamp or time indication while maintaining the scope of the present invention.
  • Another field in a file-change record is the object inode number, that is, the inode number of the file (or directory).
  • Another field in a file-change record is the type of object.
  • This information could be recovered by fetching the object inode, but it is cheap and easy to encode it as one extra byte in the file-change record. Doing so allows any management process reading the changed files list to avoid the overhead of an inode fetch for file types that the process should ignore.
  • For example, a data backup process might be configured to ignore device inodes.
  • Yet another field is a change type code word that indicates the nature of the change or (accumulated) changes.
  • A change type coding scheme allows any sensible combination of changes to be indicated by a single code word.
  • One exemplary change type is a directory entry insertion.
  • That is, a directory entry referring to the object inode was inserted (linked) into the parent directory.
  • In other words, the file was just given a name within the parent directory.
  • Another change type is a directory entry deletion.
  • That is, a directory entry referring to the subject inode was deleted (unlinked) from the parent directory.
  • The file (subject inode) could have been deleted or renamed.
  • Another change type is that the ctime of the subject inode was updated. This usually results because some metadata/attribute of the file was updated.
  • Another change type is that the atime of the subject inode was updated. This usually results because an application read at least some of the file data. If none of the management systems that use a given changed files list care about atime changes, atime-only-change records may be omitted; this can be configured as needed for each implementation.
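The record fields and change type code word described above can be sketched as a small record type with bit flags, so that any combination of changes fits in one integer. The flag values, type codes, and the helper `is_relevant` are hypothetical; the text does not fix an encoding. The helper shows how a reader can filter on the type byte and on atime-only changes without fetching the inode.

```python
from dataclasses import dataclass
from enum import IntEnum, IntFlag


class ObjType(IntEnum):
    """Hypothetical one-byte object type codes (the text does not fix values)."""
    REGULAR = 1
    DIRECTORY = 2
    DEVICE = 3


class Change(IntFlag):
    """Change type code word: any combination of changes fits in one integer."""
    LINK = 1    # directory entry inserted: the file was given a name
    UNLINK = 2  # directory entry deleted: the file was deleted or renamed
    CTIME = 4   # metadata/attributes updated
    ATIME = 8   # an application read some of the file data


@dataclass
class FileChangeRecord:
    btime: int         # bucket time: when this record was created or last updated
    inode: int         # inode number of the file or directory
    obj_type: ObjType  # lets readers skip unwanted types without an inode fetch
    changes: Change    # accumulated change type code word


def is_relevant(rec: FileChangeRecord, ignore_types: frozenset = frozenset(),
                ignore_atime_only: bool = True) -> bool:
    """Decide from the record alone whether a management job should look at it."""
    if rec.obj_type in ignore_types:
        return False
    return not (ignore_atime_only and rec.changes == Change.ATIME)


# A backup job ignores device inodes and atime-only changes; an HSM job
# that tracks recently accessed files keeps atime-only changes.
dev = FileChangeRecord(100, 7, ObjType.DEVICE, Change.CTIME)
read_only = FileChangeRecord(100, 8, ObjType.REGULAR, Change.ATIME)
both = FileChangeRecord(100, 9, ObjType.REGULAR, Change.ATIME | Change.CTIME)
print(is_relevant(dev, frozenset({ObjType.DEVICE})))    # False
print(is_relevant(read_only))                           # False
print(is_relevant(read_only, ignore_atime_only=False))  # True
print(is_relevant(both))                                # True
```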
  • Another field in a file-change record is a list of parent inode numbers of the directories through which the object has been and may be accessed. Some implementations may use this field to help locate a (path)name for the object inode, as described below. Other implementations may maintain a separate objects-to-parent-directories map and thus may not require this field.
  • To find a pathname, the storage management system can walk up the directory tree from the subject inode through its parent inodes until it reaches the root. The root inode will have a distinguished, well-known inode number, and/or the root directory will include a special-case “..” entry.
  • A pathname for the subject file is then just the reverse of the list of names discovered whilst walking up the tree. Most objects have just one parent; however, POSIX allows a single non-directory object to be referred to by multiple directory entries. This sort of walk-up-the-tree approach is also performed by the POSIX command /bin/pwd.
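The walk up the tree can be sketched as a recursive function that follows the parent inode list and collects one path per directory entry, so a hard-linked file yields several paths. This is a hypothetical model: directories are plain dictionaries from entry name to inode number, `paths_for` is an illustrative name, and the root inode number 2 is an assumption (it is the root inode on many UNIX file systems).

```python
from __future__ import annotations

ROOT_INO = 2  # assumption: the well-known root inode number (2 on many UNIX file systems)


def paths_for(inode: int, parents: dict[int, list[int]],
              entries: dict[int, dict[str, int]]) -> list[str]:
    """Return every pathname of inode by walking up through its parent directories.

    parents: inode -> list of parent directory inodes (the record's parent field)
    entries: directory inode -> {entry name: child inode}
    """
    if inode == ROOT_INO:
        return ["/"]
    paths = []
    for parent in parents.get(inode, []):
        # Names under which this parent directory refers to the inode.
        names = [name for name, child in entries[parent].items() if child == inode]
        # Recursing toward the root and appending names on the way back down
        # yields the reversed list of names collected while walking up.
        for parent_path in paths_for(parent, parents, entries):
            for name in names:
                paths.append(parent_path.rstrip("/") + "/" + name)
    return paths


# Inode 20 is a regular file with two names (a hard link).
entries = {2: {"home": 10, "tmp": 11}, 10: {"a.txt": 20}, 11: {"link": 20}}
parents = {10: [2], 11: [2], 20: [10, 11]}
print(paths_for(20, parents, entries))  # ['/home/a.txt', '/tmp/link']
```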
  • Alternatively, the storage management system might find the path(s) leading to a subject inode by maintaining an inode-to-path look-aside table.
  • The changed file list need not be updated for every change to a given file. For most management purposes, for any given file, it is sufficient to record the first metadata or data change that occurs within an hour (or other suitable unit of time granularity). Recall that these changes always include an update to one or more of atime, ctime, and mtime.
  • In addition, each unlink of the inode from a directory and each link of the inode into a directory may also be recorded.
  • An active or “hot” file and its inode will undergo several or many metadata and/or data changes while being accessed and/or manipulated by an application, so recording only the first change in each hour avoids a changed file list update for every one of them.
  • In the exemplary process of FIG. 3 for updating a changed files list, a start block 302 passes control to a decision block 310.
  • The decision block 310 determines whether or not a directory entry is to be updated for a subject inode. If so, control is passed to a function block 320. Otherwise, control is passed to a function block 330.
  • The function block 320 locates or creates the change record for the file and passes control to a function block 322.
  • The function block 322 updates the list of parent inode numbers within the change record and passes control to a function block 324.
  • The function block 324 appends the change record to the time bucket for the current hour (or to the deleted objects bucket if the object is now unlinked from all directories) and passes control to the function block 330.
  • The function block 340 determines whether or not hour(time of this inode change) > hour(time of previous change). In particular, the function block 340 may perform the following comparisons: hour(new_mtime) > hour(old_mtime), hour(new_ctime) > hour(old_ctime), and hour(new_atime) > hour(old_atime).
  • If not, control is passed to an end block 370. Otherwise, if hour(time of this inode change) > hour(time of previous change), control is passed to a function block 350.
  • The function block 350 sets the flag to indicate the type of change, locates or creates the change record for this inode, and passes control to a function block 352.
  • The function block 352 removes the change record from the old bucket and passes control to a function block 354.
  • The removal of the change record from the old bucket can be a logical delete. That is, reclamation of storage can be postponed to a convenient later time when old buckets are compacted, similar to the known art of maintaining B-trees and similar data structures.
  • Other ways of removing the change record may also be employed while maintaining the scope of the present invention.
  • The function block 354 updates the timestamp of the change record with the current time and its type with the flag, and passes control to a function block 356.
  • The function block 356 appends the change record to the time bucket for the current hour and passes control to the end block 370. (When the change falls within the same hour as the previous change, there is no need to update the change record for the object.)
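The two branches of FIG. 3 can be sketched as below. This is a hypothetical in-memory model assuming integer-millisecond timestamps, hour buckets, and logical deletion from old buckets (stale entries are left in place and recognized later by their record's timestamp); it omits the persistence and journaling discussed next, and all names are illustrative. `inode_change` is meant to be called once per updated timestamp field with that field's previous value, mirroring the per-field comparisons of block 340.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

HOUR_MS = 3_600_000  # timestamps are assumed to be integer milliseconds


def hour(t_ms: int) -> int:
    return t_ms // HOUR_MS


@dataclass
class Record:
    inode: int
    btime: Optional[int] = None   # when the record was created or last updated
    bucket: Optional[int] = None  # hour of the time bucket holding it (None if deleted)
    flags: int = 0                # change type code word
    parents: list[int] = field(default_factory=list)


class ChangedFiles:
    def __init__(self) -> None:
        self.records: dict[int, Record] = {}
        self.buckets: dict[int, list[int]] = {}   # hour -> inode numbers, append-only
        self.deleted: list[tuple[int, int]] = []  # deleted objects bucket: (time, inode)

    def _locate_or_create(self, inode: int) -> Record:
        return self.records.setdefault(inode, Record(inode))

    def _append(self, rec: Record, now: int) -> bool:
        """Append rec to the current hour's bucket unless it is already there."""
        if rec.bucket == hour(now):
            return False
        # Logical delete (block 352): the old bucket keeps a stale entry that
        # readers can recognize because rec.btime no longer falls in it.
        rec.btime, rec.bucket = now, hour(now)
        self.buckets.setdefault(rec.bucket, []).append(rec.inode)
        return True

    def directory_entry_update(self, inode: int, parent: int, linked: bool, now: int) -> None:
        """Blocks 320-324: a name for inode was linked into or unlinked from parent."""
        rec = self._locate_or_create(inode)
        if linked:
            rec.parents.append(parent)
        else:
            rec.parents.remove(parent)
        if rec.parents:
            self._append(rec, now)
        else:  # unlinked from all directories: goes to the deleted objects bucket
            rec.btime, rec.bucket = now, None
            self.deleted.append((now, inode))

    def inode_change(self, inode: int, flag: int, old_time: int, now: int) -> None:
        """Blocks 340-356 for one timestamp field (mtime, ctime or atime)."""
        if hour(now) <= hour(old_time):
            return  # block 340 "no": not the first change of this field this hour
        rec = self._locate_or_create(inode)
        if self._append(rec, now):
            rec.flags = flag   # block 354: the record's type is the new change
        else:
            rec.flags |= flag  # already moved this hour via another field


H = HOUR_MS
cf = ChangedFiles()
cf.directory_entry_update(7, parent=2, linked=True, now=10 * H + 5)
cf.inode_change(7, flag=4, old_time=10 * H + 5, now=10 * H + 900)  # same hour: ignored
cf.inode_change(7, flag=4, old_time=10 * H + 900, now=12 * H)      # moves to bucket 12
cf.inode_change(7, flag=1, old_time=11 * H, now=12 * H + 1)        # same bucket: flag merged
cf.directory_entry_update(7, parent=2, linked=False, now=13 * H)   # last name removed
print(cf.buckets, cf.records[7].flags, cf.deleted)  # {10: [7], 12: [7]} 5 [(46800000, 7)]
```

The `bucket` field, rather than `btime` alone, decides whether a record is already in the current bucket, so a file that is unlinked and relinked within the same hour (a rename) is still re-appended to a time bucket.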
  • The file-change records and the buckets of the changed files list are metadata that should be maintained in a way that is robust and consistent across system crashes and restarts. This can be accomplished by journaling all updates and by including updates to the file-change records and buckets within the same transaction scope as the related inode and directory updates.
  • A journaling file system should already record all inode and directory updates; thus, very little or even no additional information may be required in the journal to replay changed file list updates during crash recovery.
  • As described above, each time bucket represents all the files that changed during a particular hour.
  • The hour unit is somewhat arbitrary; given the teachings of the present invention provided herein, any convenient amount of time could be chosen as the unit of time bucket granularity while maintaining the scope of the present invention.
  • Moreover, the amount of time represented by buckets can be variable; that is, different buckets can represent different amounts of time.
  • Time granularity: Another consideration in choosing and using different time granularities relates to the efficiency and simplicity of the hour(t) function. It may be a good idea to choose a unit of granularity such that two timestamps can be quickly and simply compared to see whether they represent times within the same bucket. For example, if timestamps are represented as an integral number of milliseconds, a conventional hour would be 3,600,000 timestamp units. However, we might choose a time bucket granularity of 4,194,304 units (2^22, a power of 2), so that timestamps can be converted to bucket time units by a single binary shift instruction.
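The power-of-two granularity can be sketched as follows, assuming non-negative integer-millisecond timestamps. `bucket` and `same_bucket` are hypothetical helper names; in C or similar languages each function compiles to a shift (plus an XOR for the comparison) rather than a division.

```python
BUCKET_SHIFT = 22              # 2**22 ms = 4,194,304 ms, roughly 69.9 minutes
BUCKET_MS = 1 << BUCKET_SHIFT


def bucket(t_ms: int) -> int:
    """Bucket number of a non-negative millisecond timestamp: one shift, no division."""
    return t_ms >> BUCKET_SHIFT


def same_bucket(a_ms: int, b_ms: int) -> bool:
    """True iff a and b agree in every bit above the low BUCKET_SHIFT bits."""
    return (a_ms ^ b_ms) >> BUCKET_SHIFT == 0


print(BUCKET_MS)                                                     # 4194304
print(bucket(4_194_303), bucket(4_194_304))                          # 0 1
print(same_bucket(0, 4_194_303), same_bucket(4_194_303, 4_194_304))  # True False
```

With the conventional 3,600,000 ms hour, the same test would need an integer division of each timestamp; the trade-off is that buckets no longer line up with clock hours.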
  • Buckets on demand: At any convenient time, e.g., just prior to starting a file management job, the bucket accumulating changes can be closed, and a new bucket designated by the current time can be created to begin accumulating change records.
  • In that case, the hour(t) function depends on the closing times of the buckets.
  • The closing times of buckets can be coordinated with file system snapshots.
  • Each snapshot operation will close the bucket accumulating change records and create a new bucket.
  • For example, the VERITAS FILE SYSTEM and IBM's Storage Tank (now known as IBM SAN File System) support snapshot versioning.
  • With buckets coordinated with snapshots, the hour(t) function yields a snapshot version number.
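Buckets closed on demand can be sketched by keeping the opening time of each bucket in a sorted list; hour(t) then becomes a binary search that returns the bucket number, which can double as a snapshot version number. The class and method names are hypothetical, and integer timestamps are assumed.

```python
from bisect import bisect_right


class OnDemandBuckets:
    """Hypothetical sketch: buckets closed on demand (e.g. at each snapshot).

    Bucket k covers [opened[k], opened[k + 1]); its number k can double as
    a snapshot version number.
    """

    def __init__(self, start: int) -> None:
        self.opened = [start]  # opening time of each bucket, strictly increasing

    def close_current(self, now: int) -> int:
        """Close the accumulating bucket and open a new one designated by now."""
        if now <= self.opened[-1]:
            raise ValueError("bucket boundaries must strictly increase")
        self.opened.append(now)
        return len(self.opened) - 1  # number (version) of the new bucket

    def hour(self, t: int) -> int:
        """The text's hour(t) function: number of the bucket covering time t."""
        if t < self.opened[0]:
            raise ValueError("t precedes the first bucket")
        return bisect_right(self.opened, t) - 1


b = OnDemandBuckets(start=0)
v1 = b.close_current(100)  # e.g. snapshot 1 taken at time 100
v2 = b.close_current(250)  # snapshot 2 taken at time 250
print(v1, v2, b.hour(99), b.hour(100), b.hour(300))  # 1 2 0 1 2
```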
  • Any two or more buckets adjacent in time can be merged into a single larger bucket whenever that is convenient or desirable. For example, very old change records can be gathered into fewer buckets by day, by week, by month, by year, and so forth.
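Merging adjacent old buckets into coarser ones can be sketched as follows, assuming buckets keyed by their starting hour. `coarsen` is a hypothetical helper; it requires the cutoff to be a multiple of the new width so that no merged bucket overlaps a bucket that is kept unchanged.

```python
from __future__ import annotations


def coarsen(buckets: dict[int, list[int]], older_than: int, width: int) -> dict[int, list[int]]:
    """Merge every bucket starting before older_than into width-hour buckets.

    buckets: bucket start hour -> inode numbers. older_than must be a multiple
    of width so that no merged bucket overlaps a bucket that is kept as is.
    """
    if older_than % width:
        raise ValueError("older_than must be a multiple of width")
    merged: dict[int, list[int]] = {}
    for start in sorted(buckets):
        key = start - start % width if start < older_than else start
        merged.setdefault(key, []).extend(buckets[start])
    return merged


hourly = {0: [1], 5: [2], 23: [3], 24: [4], 30: [5]}
print(coarsen(hourly, older_than=24, width=24))  # {0: [1, 2, 3], 24: [4], 30: [5]}
```

After merging, buckets no longer share one width, so a real list would also need to record each bucket's extent (for example its end hour) for hour(t) to stay correct.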
  • A typical data management job needs to find all the files within a file system that have changed since its last run. Using the changed files list with time buckets makes this simple and efficient. Moreover, multiple kinds of management jobs can all use the same changed files list with time buckets, even if they run on different schedules. Examples of different kinds of data management jobs are backup, archiving, migration (moving data from one set of devices to another to improve or balance performance and/or lower costs, etc.), accounting, and usage analysis and planning.
  • The process 400 of FIG. 4 relates to a data management operation, in particular, finding the set S of all files (or the corresponding inode numbers) whose data or metadata changed between two times t1 and t2, where t1 is earlier than t2.
  • A start block 402 passes control to a function block 405.
  • The function block 410 lets bucket B1 be the bucket that represents hour(t1) and passes control to a function block 415.
  • B1 is the earliest bucket that might include a file changed at time t1 or later.
  • The function block 420 performs a loop over each file f in bucket B, checking the metadata of file f, and begins the loop by passing control to a decision block 425.
  • The first and last buckets visited may include some files that changed before t1 or after t2.
  • Also, an implementation might choose to defer removing entries from buckets. Thus, the change time of each file f in bucket B may need to be checked before f is added to the set S.
  • The decision block 425 determines whether or not file f changed between times t1 and t2. If so, control is passed to a function block 430. Otherwise, control is returned to the function block 420.
  • The function block 430 adds file f to set S and returns control to the function block 420 when there is another file f in bucket B, or passes control to a loop limit block 435 when there are no more files f in bucket B.
  • The loop limit block 435 ends the loop and passes control to a function block 440.
  • The function block 440 lets B2 be the bucket that immediately follows bucket B and passes control to a decision block 445.
  • The decision block 445 determines whether or not there is such a bucket B2 (i.e., whether bucket B2 is defined in the changed files list). If so, control is passed to a decision block 450. Otherwise, control is passed to a function block 470.
  • The decision block 450 determines whether or not hour(B2) ≤ hour(t2). If so, control is passed to a function block 455. Otherwise, control is passed to the function block 470. In other words, if there is no bucket following bucket B, or the bucket following bucket B represents an hour > hour(t2), the scan of the time buckets is done.
  • At this point, set S includes all of the files that changed between times t1 and t2. We may also want S to include the files deleted between times t1 and t2. Accordingly, the function block 470 finds those files in the deleted objects bucket, adds them to set S, and passes control to an end block 480.
  • The deleted objects bucket should be organized in such a way (using well-known art) that all entries representing deletions between two times t1 and t2 can be efficiently retrieved.
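The query of FIG. 4 can be sketched as follows. This is a hypothetical in-memory model assuming integer-millisecond timestamps, hour buckets that may contain stale entries left by logical deletes, a map giving each inode's latest recorded change time (every inode appearing in a bucket must have an entry), and a deleted objects bucket kept sorted by deletion time so that block 470 is a binary search. Because only each file's most recent change is kept, a file that changed inside the window and again after t2 is reported by its later change, not by this query.

```python
from bisect import bisect_left

HOUR_MS = 3_600_000


def hour(t_ms: int) -> int:
    return t_ms // HOUR_MS


def changed_between(t1, t2, buckets, btime, deleted):
    """Set S of inodes changed or deleted between t1 and t2 (inclusive).

    buckets: hour -> inode numbers; may hold stale entries (logical deletes)
    btime:   inode -> time of its latest recorded change
    deleted: deleted objects bucket as (deletion time, inode), sorted by time
    """
    if not t1 < t2:
        raise ValueError("t1 must be earlier than t2")
    S = set()
    keys = sorted(buckets)
    i = bisect_left(keys, hour(t1))  # B1: first bucket that can hold a change >= t1
    while i < len(keys) and keys[i] <= hour(t2):  # blocks 445/450
        for f in buckets[keys[i]]:
            # Block 425: the first and last buckets (and stale entries) can
            # hold files outside [t1, t2], so check each file's change time.
            if t1 <= btime[f] <= t2:
                S.add(f)
        i += 1
    # Block 470: add deletions between t1 and t2 from the deleted objects bucket.
    for t, f in deleted[bisect_left(deleted, (t1,)):]:
        if t > t2:
            break
        S.add(f)
    return S


H = HOUR_MS
buckets = {10: [1, 2], 11: [3, 1], 13: [4]}  # inode 1 in bucket 10 is stale
btime = {1: 11 * H + 5, 2: 10 * H + 10, 3: 11 * H + 20, 4: 13 * H}
deleted = [(9 * H, 8), (12 * H, 9)]
print(sorted(changed_between(10 * H + 30, 12 * H + 30, buckets, btime, deleted)))  # [1, 3, 9]
```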

Abstract

There is provided, in a computer processing system, an apparatus for managing object data. The apparatus includes a changed objects manager for creating and managing a changed objects list that at least identifies the objects that have changed based on time of change. The changed objects list is associated with a plurality of time buckets. Each of the plurality of time buckets is associated with a respective date and time period and with object change records for objects having a timestamp falling within the respective date and time period. Each of the object change records is associated with a unique object identifier and the timestamp for a corresponding one of the objects. The timestamp specifies a date and a time corresponding to a latest one of a creation time or a most recent update time for the corresponding one of the objects.

Description

    BACKGROUND
  • 1. Technical Field
  • The present invention relates generally to data file storage systems and, more particularly, to a changed files list with time buckets for efficient storage management.
  • 2. Description of the Related Art
  • Use of electronic data storage for long-term recordkeeping is increasing at an exponential rate. Much of this data is stored in file systems. Moreover, much of this data is write-once and is to be retained for long periods of time.
  • The most commonly used disk storage devices are cheap, but not free, and they are neither perfectly reliable nor absolutely durable. Accordingly, there is a need to migrate data to cheaper and/or more reliable media, a need to back up data, and a need to make replicas.
  • The vast amounts of data and numbers of files maintained make manual management of data backup, replication, retention, and deletion burdensome, error prone, and impractical. Also, government regulations and business requirements demand that data management be conducted according to policy rules that conform to laws, practices, and so forth.
  • Even in a typical consumer home, there will be tens of thousands of files. For example, consider the operating system(s) and application program files, as well as financial documents and digital media photos (e.g., jpeg), music (e.g., mp3), and movies (e.g., mpeg). In an enterprise with thousands of employees, customer databases, and so forth, there can be hundreds of millions of files to be managed.
  • Taken together, the multitude of legal and business requirements and the vast number of file objects to be managed necessitate the automated application of data management policy rules.
  • Currently, almost every implementation of a data management system for files operates by reading the complete catalog of all directory entries for all of the files each time a management job is initiated.
  • The overhead of searching and reading the file catalogs and directories (scanning the metadata of a file system) whilst performing policy- or rule-driven maintenance operations such as backup and data migration consumes a significant number of processing cycles, so much so that it is becoming a significant problem or expense in the operation of these systems, as exemplified by Tivoli Storage Manager (TSM) (data backup) and Tivoli Storage Manager for Space Management (HSM) (data migration, also known as hierarchical storage management).
  • Regarding the prior art, recent versions of data backup products for WINDOWS NTFS partially address the above-described problem by implementing a change journal based backup feature. However, this approach has some limitations. For example, one limitation is that the change journal based backup feature is not crash proof. Journal integrity is lost upon reboot. A reboot event necessitates a complete new scan of all file system meta-data and a re-synchronizing of file lists and stats with the backup server. Moreover, another limitation is that the change journal based backup feature can degrade file system performance. Further, another limitation is that the change journal based backup feature is only supported on certain versions of the WINDOWS operating system. Also, another limitation is that the change journal based backup feature does not address the meta-data scanning problem for HSM. Additionally, another limitation is that the space required by the change journal based backup feature is (potentially) unbounded (or until it breaks). That is, every change is recorded in the journal and so the journal keeps growing at a rate that is proportional to the rate of file system change. Thus, in practice, the journal is periodically processed and trimmed by the storage management subsystem(s). However, the rate and amount of change can outpace the storage capacity of the journal and/or the processing cycles allocated to the storage management subsystem(s). When this “breakage” occurs, change information is lost. The management system then has to resort to a traditional full metadata scan.
  • SUMMARY
  • These and other drawbacks and disadvantages of the prior art are addressed by the present invention, which is directed to a changed files list with time buckets for efficient storage management.
  • The present invention may be implemented, e.g., as an apparatus, a method, and a computer program product.
  • According to an aspect of the present invention, in a computer processing system, there is provided an apparatus for managing object data. The apparatus includes a changed objects manager for creating and managing a changed objects list that at least identifies the objects that have changed based on time of change. The changed objects list is associated with a plurality of time buckets. Each of the plurality of time buckets is associated with a respective date and time period and with object change records for objects having a timestamp falling within the respective date and time period. Each of the object change records is associated with a unique object identifier and the timestamp for a corresponding one of the objects. The timestamp specifies a date and a time corresponding to a latest one of a creation time or a most recent update time for the corresponding one of the objects.
  • These and other objects, features and advantages will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
  • BRIEF DESCRIPTION OF DRAWINGS
  • The disclosure will provide details in the following description of preferred embodiments with reference to the following figures wherein:
  • FIG. 1 is a block diagram illustrating an exemplary computer processing system to which the present invention may be applied, in accordance with the principles of the present invention;
  • FIG. 2 is a block diagram illustrating an exemplary data storage management (DSM) system in accordance with the principles of the present invention;
  • FIG. 3 is a flow diagram illustrating an exemplary process for updating a changed files list in accordance with the principles of the present invention; and
  • FIG. 4 is a flow diagram illustrating an exemplary process for using a changed files list with time buckets in accordance with the principles of the present invention.
  • These and other aspects, features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments, which is to be read in connection with the accompanying drawings.
  • DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
  • The present invention is directed to a changed files list with time buckets for efficient storage management. It is to be appreciated that while the present invention is primarily described herein with respect to files, the present invention may be implemented with respect to any set of objects within and processed by a computer processing system. Moreover, the present invention is particularly suited to a set of managed objects most of which do not change during a given period of time, but where it is desired to concisely track which ones of the objects have changed.
  • Advantageously, the present invention is useful within computerized data file storage systems for efficiently selecting files that have been accessed recently, where such files are typically the primary subjects of data management tasks or jobs. Of course, given the teachings of the present invention provided herein, one of ordinary skill in this and related arts will contemplate these and other applications and systems to which the present invention may be applied, while maintaining the scope of the present invention.
  • In an exemplary embodiment of the present invention, we maintain a “changed file list”, which is a persistent data structure with just one short file-change record for each file. The changed file list is (conceptually) partitioned into time buckets. For illustrative purposes, consider that there is a bucket for every hour of every day. Of course, it is to be appreciated that different granularities of time could be chosen, as described herein below, while maintaining the scope of the present invention. The file system is augmented such that each time the metadata of a file f is updated, the current date and time of day (t_now) is compared with the timestamp of the last metadata change of file f (t_prev). If t_now falls in a different hour or day than t_prev, then the file-change record for f is (logically) moved to the time bucket representing the current date and hour (hour_of(t_now)). Otherwise, the file-change record is already in the correct bucket and need be neither accessed nor modified. In either case, after this test, the last-metadata-change timestamp of f is updated just as it would be in a traditional Posix-like file system.
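  • The hour comparison described above can be sketched as follows. This is a minimal in-memory model under stated assumptions: timestamps are Python datetime values, and the names ChangedFileList, hour_of and on_metadata_update are hypothetical. A real file system would keep the buckets in persistent metadata rather than in dictionaries.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Optional, Set


def hour_of(t: datetime) -> datetime:
    """Round a timestamp down to the start of its hourly bucket (date and hour)."""
    return t.replace(minute=0, second=0, microsecond=0)


class ChangedFileList:
    """Hypothetical in-memory model: bucket start -> inode numbers in that bucket."""

    def __init__(self) -> None:
        self.buckets: Dict[datetime, Set[int]] = defaultdict(set)
        self.bucket_of: Dict[int, datetime] = {}  # inode -> bucket holding its record

    def on_metadata_update(self, inode: int, t_prev: Optional[datetime],
                           t_now: datetime) -> bool:
        """Run before the file system overwrites the file's last-change timestamp.

        Returns True if the file-change record was created or moved.
        """
        if t_prev is not None and hour_of(t_now) == hour_of(t_prev):
            return False  # same date and hour: record is already in the right bucket
        old = self.bucket_of.get(inode)
        if old is not None:
            self.buckets[old].discard(inode)  # leave the old bucket
        new = hour_of(t_now)
        self.buckets[new].add(inode)
        self.bucket_of[inode] = new
        return True
```

Only the first metadata update of a file in each hour touches the list; any later update in the same hour costs a single comparison.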
  • It is to be noted that the changed file list does not grow except when new files are created within the file system. The t_now to t_prev comparison adds a trivial few instructions to the traditional metadata processing by the file system. The processing required to move a file-change record from one time bucket to another is roughly the same as moving/renaming (Posix mv) a file from one directory to another. However, in one embodiment, we limit the moves between buckets to at most once per hour for each accessed file. Of course, other time limits for moving between buckets may also be used, while maintaining the scope of the present invention.
  • A storage management process that runs occasionally (typically a few times each week), such as, e.g., a backup job, normally needs to consider and process only the files that have changed since its previous run. Knowing the date and hour of the last run, the storage management process can readily determine which files have changed (and/or whose metadata has changed) by reading only the file-change records in the time buckets representing the hours between then and now. Because each bucket holds file-change records covering a whole hour, the bucket(s) representing the hour(s) during which the previous run occurred may also contain files that changed before that run. By reading the complete metadata for just those files, the process can determine which of them actually need to be processed. The vast majority of unchanged files, however, are represented in older buckets and can be ignored entirely by the storage management process.
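  • The selection step above, including the metadata check for the boundary bucket, might look like the following sketch. It assumes hourly buckets keyed by their start time; files_to_process is a hypothetical name, and the last_change_time callback is a hypothetical stand-in for reading an inode's complete metadata.

```python
from datetime import datetime
from typing import Callable, Dict, Set


def hour_of(t: datetime) -> datetime:
    """Start of the hourly bucket that contains t."""
    return t.replace(minute=0, second=0, microsecond=0)


def files_to_process(buckets: Dict[datetime, Set[int]],
                     last_run: datetime,
                     last_change_time: Callable[[int], datetime]) -> Set[int]:
    """Return the inodes that changed after last_run.

    buckets maps each hourly bucket start to its inode numbers.
    last_change_time stands in for reading an inode's full metadata.
    """
    boundary = hour_of(last_run)
    selected: Set[int] = set()
    for start, inodes in buckets.items():
        if start < boundary:
            continue  # older bucket: none of its inodes is read at all
        for ino in inodes:
            # In the boundary bucket a file may have changed before last_run,
            # so only those files need their full metadata checked.
            if start == boundary and last_change_time(ino) <= last_run:
                continue
            selected.add(ino)
    return selected
```

This sketch still visits every bucket key, although never the records inside old buckets; keeping the keys in sorted order would let a job jump straight to the boundary bucket.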
  • It should be understood that the elements shown in the Figures may be implemented in various forms of hardware, software or combinations thereof. Preferably, these elements are implemented in software on one or more appropriately programmed general-purpose digital computers having a processor and memory and input/output interfaces.
  • It is to be appreciated that as used herein, the phrase “at least one”, when used to refer to more than one object (e.g., A and B), refers to one of A or one of B or one of A and one of B.
  • Embodiments of the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment including both hardware and software elements. In a preferred embodiment, the present invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
  • Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that may include, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk—read only memory (CD-ROM), compact disk—read/write (CD-R/W) and DVD.
  • A data-processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories that provide temporary storage of at least some program code to reduce the number of times code is retrieved from bulk storage during execution. Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) may be coupled to the system either directly or through intervening I/O controllers.
  • Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modem and Ethernet cards are just a few of the currently available types of network adapters.
  • Turning to FIG. 1, an exemplary computer processing system to which the present invention may be applied is indicated generally by the reference numeral 100. The computer processing system 100 includes at least one processor (CPU) 102 connected in signal communication with other components via a system bus 104. A read only memory (ROM) 106, a random access memory (RAM) 108, a display adapter 110, an I/O adapter 112, a user interface adapter 114, a sound adapter 170, and a network adapter 198, are each connected in signal communication with the system bus 104.
  • The CPU 102 may include one or more “on-board” caches (e.g., L1 caches) (hereinafter “L1 cache”) 166. Moreover, the CPU may be in signal communication with one or more “external caches” (e.g., disk caches and RAM caches) (hereinafter “disk cache” 167 and “RAM cache” 168). Further, the CPU 102 may also be in signal communication with one or more other “external caches” (e.g., on a chip other than the CPU and RAM chips such as, e.g., L2 caches) (hereinafter “L2 cache”) 168. Of course, other cache configurations may also be employed in accordance with the present invention while maintaining the scope of the present invention.
  • A display device 116 is connected in signal communication with system bus 104 by display adapter 110.
  • A disk storage device (e.g., a magnetic or optical disk storage device) 118 is connected in signal communication with system bus 104 by I/O adapter 112.
  • A mouse 120 and keyboard 122 are connected in signal communication with system bus 104 by user interface adapter 114. The mouse 120 and keyboard 122 are used to input information to computer processing system 100.
  • At least one speaker (hereinafter “speaker”) 185 is connected in signal communication with system bus 104 by sound adapter 170.
  • A (digital and/or analog) modem 196 is connected in signal communication with system bus 104 by network adapter 198.
  • Turning to FIG. 2, an exemplary data storage management (DSM) system, namely a data processing system having file and data storage management subsystems augmented with a changed files list with time buckets, is indicated generally by the reference numeral 200. The data storage management system 200 includes an exemplary changed files list 210 with time buckets 210A in accordance with the principles of the present invention. Moreover, the data storage management system 200 includes a file system processing module 220, a data storage management processing module 230, and an archival and backup data storage device 240. File inodes 250 are used by both the file system processing module 220 and the data storage management processing module 230.
  • The changed files list 210 with time buckets is a data structure that organizes subsets of inode numbers into buckets. Each bucket 210A (also represented herein by the reference character “B”) represents a time period. The presence of an inode number i in a bucket B records the fact that the file represented by inode number i last changed during the time period represented by bucket B. For example, in FIG. 2, the bucket labeled “3:00” represents files whose last change occurred at or after 3 o'clock but before 4 o'clock; this bucket logically includes the files represented by inode numbers 6, 11, and 18.
  • A changed files manager 220A, disposed in the file system processing module 220, creates and manages the changed files list 210. While the changed files manager 220A is shown and described with respect to file system processing module 220, it is to be appreciated that the changed files manager 220A may be implemented as a stand alone device, or may be implemented in one or more other elements of a data storage management (DSM) system or a computer processing system, while maintaining the scope of the present invention.
  • A description will now be given regarding what is considered to be a file and what is considered to be a change. Of course, the present invention is not limited to the following definitions and, thus, given the teachings of the present invention provided herein, other definitions and interpretations of what is considered a file and a change may also be employed in accordance with the principles of the present invention, while maintaining the scope of the present invention.
  • In Posix-like systems, a file is represented by an inode and each inode within a file system has a unique number (IBM SanFS has the same concepts, except they use the word “object” and the phrase “object identifier.”). The inode includes metadata that describes some attributes of the file and also includes pointers to the data blocks that hold the data of the file. A file change is an event that causes any of the data or the meta-data to be modified (this includes any change in the file length, ownership, permissions (ACLs), and so forth).
  • We also must consider any change that causes a file to be renamed, deleted, or to acquire a new alias name. Indeed, on a Posix/Linux system, changing any file attribute or renaming a file causes the ctime attribute of the inode of that file to be updated. Modifying, appending or truncating the file data causes the mtime attribute of the inode of that file to be updated.
  • Directories may be considered to be special case files. Renaming, adding or removing an entry e from a directory d is a modification (mtime) of the directory d, as well as a change (ctime) to the inode referenced by entry e.
  • The metadata field atime (last access time), which records the last time at which any application accessed the file, is a special case: apart from the atime field itself, an access makes no change to the file or its metadata. An atime-only change to a data file is usually of no interest to a data backup system. However, it may well be of interest to other data management systems such as, e.g., a hierarchical storage management (HSM) system with a policy of keeping recently accessed files in primary storage and moving unused files to secondary storage.
  • An atime only change to a directory is usually of no interest to a typical data management system.
  • We must also consider a complication introduced by the hard link concept, typical of Posix-like systems. A single inode number can appear one or more times in one or several directories. Thus, a single inode/file can be known by several different names under several different paths.
  • A description will now be given regarding file change records in the changed files list with time buckets.
  • Our changed files list is a list of file-change records. The list is partitioned into time buckets and/or otherwise stored and organized so that file-change records can be rapidly accessed by the value of their timestamps. Two records with timestamps that indicate the same date and hour are considered to be in the same time bucket. While we use a granularity of an hour for illustrative purposes, any other convenient amount of time may be chosen for use in accordance with the principles of the present invention, while maintaining the scope of the present invention.
  • Object deletion is a special case. Besides the time buckets, the changed file list also includes a deleted objects bucket (popularly known as the bit bucket).
  • The changed files list and its buckets are persistent data structures that are organized in a way that records can be efficiently (a) created afresh, OR (b) located within a time bucket and (c) removed from a time bucket, updated and then appended to (or inserted into) the time bucket representing the current date and hour or appended to the deleted objects bucket.
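  • The operations listed above, together with the deleted objects bucket and the logical delete with later compaction described for FIG. 3, can be modeled as in the following sketch. It is illustrative only: BucketStore, Record and the DELETED key are hypothetical names, bucket keys here are plain hour numbers, and nothing is actually persisted.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Optional

DELETED = "deleted"  # hypothetical key for the deleted objects bucket


@dataclass
class Record:
    inode: int
    btime: int          # bucket time; any increasing value will do
    live: bool = True   # False once logically removed from its bucket


class BucketStore:
    """Sketch of the operations: create, locate, remove/update/append, compact."""

    def __init__(self) -> None:
        self.buckets: Dict[Hashable, List[Record]] = {}
        self.index: Dict[int, Record] = {}  # inode -> its current live record

    def _append(self, rec: Record, bucket: Hashable) -> None:
        self.buckets.setdefault(bucket, []).append(rec)
        self.index[rec.inode] = rec

    def create(self, inode: int, btime: int, bucket: Hashable) -> Record:
        """(a) Create a record afresh in the given bucket."""
        rec = Record(inode, btime)
        self._append(rec, bucket)
        return rec

    def locate(self, inode: int) -> Optional[Record]:
        """(b) Find the current record for an inode."""
        return self.index.get(inode)

    def move(self, inode: int, btime: int, bucket: Hashable) -> None:
        """(c) Remove from the old bucket, update, and append to another bucket.

        The removal is logical: the old entry is only marked dead.
        """
        self.index[inode].live = False
        self._append(Record(inode, btime), bucket)

    def compact(self) -> None:
        """Reclaim logically deleted entries and drop empty buckets."""
        for key in list(self.buckets):
            self.buckets[key] = [r for r in self.buckets[key] if r.live]
            if not self.buckets[key]:
                del self.buckets[key]
```

Appending and marking entries dead keeps each update cheap; the cost of reclaiming space is deferred to compact(), which can run at any convenient time.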
  • Each file-change record includes several fields. One such field in a file-change record is a timestamp for the record. This is the date and time when the file-change record was created or most recently updated. We also call this the btime (bucket time) of the file object. The timestamp value need not represent a real wall clock time. For example, any increasing values might be used. Of course, other values and representations may also be used to provide a timestamp or time indication while maintaining the scope of the present invention.
  • Another field in a file-change record is the object inode number, that is, the inode number of the file (or directory).
  • Moreover, another field in a file-change record is the type of object. This indicates an ordinary data file, a directory, or an inode that has no associated data, such as a symbolic link, a device, a socket, and so forth. Of course, this information could be recovered by fetching the object inode, but it is cheap and easy to encode it as just an extra byte of information in the file-change record. This allows any management process reading the changed files list to avoid the overhead of an inode fetch for any file types that the management process should ignore. For example, a data backup process might be configured to ignore device inodes.
  • Further, another field in a file-change record is a change type code word that indicates the nature of the change or (accumulated) changes. A change type coding scheme allows any sensible combination of changes to be indicated by a single code word.
  • One exemplary change type is a directory entry insertion: a directory entry referring to the subject inode was inserted (linked) into the parent directory. That is, the file was just given a name within the parent directory.
  • Another change type is a directory entry deletion. A directory entry referring to the subject inode was deleted (unlinked) from the parent directory. The file (subject inode) could have been deleted or renamed.
  • Yet another change type is a directory entry rename. This is a special case that combines the two previous change types: the subject file was renamed but stayed within the same parent directory.
  • Moreover, another change type is that the mtime of the subject inode was updated. This usually results because the contents of the file were modified.
  • Further, another change type is that the ctime of the subject inode was updated. This usually results because some metadata/attribute of the file was updated.
  • Also, another change type is that the atime of the subject inode was updated. This usually results because an application read at least some of the file data. If none of the management systems that will use a changed files list care about atime changes, then atime-only change records may be omitted. Thus, this can be configured as needed, based on the implementation.
  • Also, another field in a file-change record is a list of parent inode numbers of the directories through which the object has been and may be accessed. Some implementations may use this field to help locate a (path)name for the object inode. This is described further herein below. Other implementations may maintain a separate objects-to-parent-directories map and, thus, may not require this field.
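  • The record layout described above can be sketched as a small data class with a single combinable change-type code word. The bit values, enum members and the wanted_by_backup filter are assumptions chosen for illustration; the description does not fix an encoding.

```python
from dataclasses import dataclass, field
from enum import Enum, IntFlag
from typing import List


class ObjType(Enum):
    FILE = "file"
    DIRECTORY = "dir"
    SYMLINK = "symlink"   # inodes with no associated data
    DEVICE = "device"
    SOCKET = "socket"


class Change(IntFlag):
    """One code word can hold any combination of change types (bits are hypothetical)."""
    ENTRY_INSERTED = 1   # linked into a parent directory
    ENTRY_DELETED = 2    # unlinked from a parent directory
    ENTRY_RENAMED = 4    # renamed within the same parent directory
    MTIME = 8            # file contents modified
    CTIME = 16           # metadata/attributes modified
    ATIME = 32           # file data read


@dataclass
class FileChangeRecord:
    btime: int                 # when the record was created or last updated
    inode: int                 # object inode number
    obj_type: ObjType          # lets readers skip types without fetching the inode
    changes: Change = Change(0)
    parents: List[int] = field(default_factory=list)  # optional parent inode numbers

    def atime_only(self) -> bool:
        return self.changes == Change.ATIME


def wanted_by_backup(rec: FileChangeRecord) -> bool:
    """Example filter decided from the record alone: skip devices and atime-only changes."""
    return rec.obj_type is not ObjType.DEVICE and not rec.atime_only()
```

Because the object type and change flags travel with the record, a filter such as wanted_by_backup can discard uninteresting records without an inode fetch.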
  • We include the parent inode numbers so that the complete pathnames for the subject file can be found, if need be, without conducting a full search of the directories of the file system, as follows. In a first step, we find the directory entry with the subject's inode number within the (immediate) parent directory. This directory entry includes the name of the subject file. In a second step, we walk up the tree towards the root by finding the inode number of the grandparent, which is stored in the “..” entry within the parent. Then, we find the directory entry with the parent's inode number within the grandparent directory. This directory entry includes the name of the parent directory. We repeat the second step for higher-level directories until the root inode of the file system is reached. The root inode will have a distinguished, well-known inode number and/or the root directory will include a special-case “..” entry. A pathname for the subject file is just the reverse of the list of names discovered whilst walking up the tree. Most objects have just one parent. However, POSIX allows a single non-directory object to be referred to by multiple directory entries. This sort of walk-up-the-tree approach is also performed by the Posix command /bin/pwd.
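  • The walk up the tree can be sketched over a toy directory model in which each directory is a mapping from entry names (including “.” and “..”) to inode numbers. The ROOT value of 2 is an assumption (it is the root inode number on ext2/ext3), and name_in and path_of are hypothetical helpers. For a hard-linked file, path_of would be called once per parent in the record's parent list.

```python
from typing import Dict, List

ROOT = 2  # assumed well-known root inode number


def name_in(directory: Dict[str, int], target: int) -> str:
    """Step 1: find the entry in `directory` that refers to inode `target`."""
    for name, ino in directory.items():
        if ino == target and name not in (".", ".."):
            return name
    raise LookupError(f"inode {target} not found in directory")


def path_of(inode: int, parent: int, dirs: Dict[int, Dict[str, int]]) -> str:
    """Rebuild a pathname for `inode` given one parent directory inode."""
    names: List[str] = []
    child, d = inode, parent
    while True:
        names.append(name_in(dirs[d], child))
        if d == ROOT:
            break
        # Step 2: the ".." entry of d gives the grandparent's inode number.
        child, d = d, dirs[d][".."]
    return "/" + "/".join(reversed(names))
```

For example, with a file of inode 20 named report.txt in directory 11 (docs), itself inside directory 5 (home) under the root, the walk collects report.txt, docs and home, and reversing them gives /home/docs/report.txt.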
  • Alternatively, the storage management system might find the path(s) that leads to a subject inode by maintaining an inode to path look aside table.
  • Maintenance of objects-to-parent-directories maps and/or path look-aside tables can be done step by step with each file system change, or it can be done periodically or on demand by using the changed files list itself to find changed directories. The changed directories can then be scanned to update and/or re-generate the relevant entries in the maps or tables.
  • A description will now be given regarding when to add (or update and move) entries to (or within) the changed files list.
  • The changed file list need not be updated for every change to a given file. For most management purposes, for any given file, it will be sufficient to record, e.g., the following: the first meta-data or data change that occurs within an hour (or other suitable unit of time granularity). Recall these changes always include an update to one or more of the following: atime; ctime; and mtime.
  • Moreover, presuming we are maintaining the list of parent inode numbers in the change-record, the following may also be recorded: each unlink of the inode from a directory; and each link of the inode into a directory.
  • Typically, an active or hot file and its inode will undergo several or many meta-data and/or data changes while being accessed and/or manipulated by an application. We want to capture the fact that a particular file has changed, but we do not need to record every change in the changed files list, since that would introduce untenable overhead into a file management system.
  • A description will now be given regarding an embodiment of the invention relating to updating the changed files list.
  • Presume that we augment a conventional state of the art file system (e.g., but not limited to, EXT3, JFS, and so forth) such that the system executes the following steps described with respect to FIG. 3 below, e.g., prior to performing an inode or directory update for a file.
  • Turning to FIG. 3, an exemplary process for updating a changed files list is indicated generally by the reference numeral 300. A start block 302 passes control to a decision block 310. The decision block 310 determines whether or not a directory entry is to be updated for a subject inode. If so, then control is passed to a function block 320. Otherwise, control is passed to a function block 330.
  • The function block 320 locates or creates the change record for the file, and passes control to a function block 322. The function block 322 updates the list of parent inode numbers within the change record, and passes control to a function block 324. The function block 324 appends the change record to the time bucket for the current hour (or the deleted objects bucket if the object is now unlinked from all directories), and passes control to function block 330.
  • The function block 330 lets old_mtime be the mtime value of the inode just prior to the update that is about to be executed, lets new_mtime be the updated mtime value of the inode, performs similar assignments for old_ctime, new_ctime, old_atime, and new_atime, lets btime be the time the file-change record for the inode was last updated, lets hour(t) be a function that rounds a timestamp t to the granularity of the time buckets, lets flag={ } (empty), and passes control to a decision block 340. With respect to function block 330, nominally hour(t) rounds down to the hour, but more generally we only need the following property: hour(t1)<hour(t2) when timestamp t1 belongs to an older bucket than timestamp t2. The decision block 340 determines whether or not hour(time of this inode change)>hour(time of previous change). In particular, decision block 340 may determine whether any of the following holds: hour(new_mtime)>hour(old_mtime), hour(new_ctime)>hour(old_ctime), or hour(new_atime)>hour(old_atime).
  • If hour(time of this inode change)≦hour(time of previous change), then control is passed to an end block 370. Otherwise, if hour(time of this inode change)>hour(time of previous change), then control is passed to a function block 350.
  • The function block 350 sets the flag to indicate the type of change, locates or creates the change record for this inode, and passes control to a function block 352. With respect to function block 350, in particular: when hour(old_mtime)<hour(new_mtime), then flag:=flag ∪ {mtime_updated}; when hour(old_ctime)<hour(new_ctime), then flag:=flag ∪ {ctime_updated}; and when hour(old_atime)<hour(new_atime), then flag:=flag ∪ {atime_updated}.
  • The function block 352 removes the change record from the old bucket, and passes control to a function block 354. With respect to function block 352, the action of removing the change record from the old bucket can be a logical delete. That is, a reclamation of storage can be postponed to a convenient later time when old buckets will be compacted, similar to the known art of maintaining B-trees and similar data structures. Of course, other courses of action with respect to removing the change record may also be employed while maintaining the scope of the present invention.
  • The function block 354 updates the timestamp of the change record with the current time and its type with flag, and passes control to a function block 356. The function block 356 appends the change record to the time bucket for the current hour, and passes control to end block 370. Blocks 350 through 356 are reached only when some timestamp has moved into a newer bucket; otherwise there is no need to update the change record for the object.
  • It is to be noted that, in a special case, namely for a newly created file, we combine the directory entry update and the other updates into a single change record, and append that single (i.e., combined) new change record to the time bucket for the current hour.
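  • The FIG. 3 flow can be traced in code roughly as follows. This is a sketch under several assumptions: timestamps are integer seconds, hour(t) is t // 3600, the class and method names are hypothetical, and the removal in block 352 is done eagerly rather than as a logical delete. Keeping an object that is unlinked from all directories in the deleted objects bucket, instead of moving it back on a later timestamp change, is also a choice made for this sketch rather than something the flow diagram specifies.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Optional, Set

DELETED = "deleted"  # hypothetical key for the deleted objects bucket


def hour(t: int) -> int:
    """Bucket number for a timestamp in seconds (hourly granularity assumed)."""
    return t // 3600


@dataclass
class Times:
    mtime: int
    ctime: int
    atime: int


@dataclass
class ChangeRecord:
    inode: int
    btime: int = 0
    flags: Set[str] = field(default_factory=set)
    parents: List[int] = field(default_factory=list)


class ChangedFilesList:
    def __init__(self) -> None:
        self.buckets: Dict[Hashable, Set[int]] = {}
        self.records: Dict[int, ChangeRecord] = {}
        self.where: Dict[int, Hashable] = {}  # inode -> bucket now holding its record

    def _record(self, inode: int) -> ChangeRecord:
        # Locate or create the change record (blocks 320 and 350).
        return self.records.setdefault(inode, ChangeRecord(inode))

    def _move(self, rec: ChangeRecord, key: Hashable) -> None:
        old = self.where.get(rec.inode)
        if old is not None:
            self.buckets[old].discard(rec.inode)  # block 352, done eagerly here
        self.buckets.setdefault(key, set()).add(rec.inode)  # blocks 324 / 356
        self.where[rec.inode] = key

    def before_update(self, inode: int, old: Times, new: Times, now: int,
                      link: Optional[int] = None, unlink: Optional[int] = None) -> None:
        # Blocks 310-324: a directory entry for this inode is being updated.
        if link is not None or unlink is not None:
            rec = self._record(inode)
            if link is not None:
                rec.parents.append(link)
            if unlink is not None and unlink in rec.parents:
                rec.parents.remove(unlink)
            rec.btime = now
            self._move(rec, hour(now) if rec.parents else DELETED)
        if self.where.get(inode) == DELETED:
            return  # assumption: an object unlinked everywhere stays in the deleted bucket
        # Blocks 330-350: which timestamps moved into a newer bucket?
        flag: Set[str] = set()
        for name in ("mtime", "ctime", "atime"):
            if hour(getattr(new, name)) > hour(getattr(old, name)):
                flag.add(name + "_updated")
        if not flag:
            return  # block 370: the record is already in the right bucket
        rec = self._record(inode)
        rec.btime, rec.flags = now, flag  # block 354
        self._move(rec, hour(now))        # blocks 352 and 356
```

For a newly created file, the link and the timestamp changes land in the same bucket, so the file ends up with a single combined record, as the special case above requires.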
  • A description will now be given regarding maintaining a changed files list correctly in spite of a crash/reboot.
  • Just like the inodes, directories and other metadata, the file-change records and the buckets of the changed files list are metadata that should be maintained in a way that is robust and consistent across system crashes and restarts. This can be accomplished by journaling all updates and including updates to the file change records and buckets within the same transaction scope as related to the inode and directory updates.
  • Notice that a typical journaling file system should record all inode and directory updates and, thus, very little or even no additional information may be required in the journal to facilitate the replay of changed file list updates during crash recovery.
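  • The idea of putting the change-record update in the same transaction scope as the inode update can be illustrated by analogy with a database transaction. In the sketch below, SQLite (through Python's sqlite3 module) stands in for the file system journal; the table names, columns and the crash flag are hypothetical, and a real journaling file system would use its own log rather than SQL.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE inode (ino INTEGER PRIMARY KEY, ctime INTEGER);
        CREATE TABLE change_rec (ino INTEGER PRIMARY KEY, bucket INTEGER, btime INTEGER);
    """)
    return db


def update_inode(db: sqlite3.Connection, ino: int, now: int, crash: bool = False) -> None:
    """Apply the inode update and its change-record update as one transaction."""
    with db:  # commits if the block succeeds, rolls back if an exception escapes
        db.execute("INSERT OR REPLACE INTO inode VALUES (?, ?)", (ino, now))
        if crash:
            raise RuntimeError("simulated crash between the two writes")
        db.execute("INSERT OR REPLACE INTO change_rec VALUES (?, ?, ?)",
                   (ino, now // 3600, now))
```

If the simulated crash fires, the inode write is rolled back together with the missing change-record write, so the two can never disagree after recovery.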
  • A description will now be given regarding time bucket granularity.
  • For illustrative purposes, we have supposed that each time bucket represents all the files that changed during a particular hour. However, it is to be appreciated that the hour unit is somewhat arbitrary and, given the teachings of the present invention provided herein, any convenient amount of time could be chosen as the unit of time bucket granularity while maintaining the scope of the present invention. Also, the amount of time represented by buckets can be variable; that is, different buckets can represent different amounts of time. Some considerations and variations in choosing and using different time granularities are provided herein for illustrative purposes. However, other considerations and variations regarding the unit of time bucket granularity, and regarding the use of different amounts of time for different buckets, may also be implemented in accordance with the principles of the present invention, while maintaining the scope of the present invention.
  • One consideration in choosing and using different time granularities relates to tradeoffs of overhead. Larger units of granularity will decrease the overhead of maintaining the changed files list, since there will be less updating and moving of file change records. On the other hand, each management process will have to scan through bigger buckets of file change records to be sure to find all files that have changed since a previous run.
  • Another consideration in choosing and using different time granularities relates to efficiency and simplicity of the hour(t) function. It may be a good idea to choose a unit of granularity so that two timestamps can be quickly and simply compared to see if they represent times within the same bucket. For example, if timestamps are represented by an integral number of milliseconds, then a conventional hour would be 3,600,000 timestamp units. However, we might choose time bucket granularity to be 4,194,304 (a power of 2), so that timestamps could be converted to bucket time units by a single binary shift instruction.
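  • The arithmetic in the preceding paragraph can be checked with a small sketch. Since 2^22 = 4,194,304, a granularity of 4,194,304 ms (about 69.9 minutes) turns the bucket computation into one right shift, at the cost of bucket boundaries that no longer fall on the wall-clock hour. The constant and function names are hypothetical, and integer millisecond timestamps are assumed.

```python
BUCKET_SHIFT = 22            # 2**22 ms = 4,194,304 ms, about 69.9 minutes
MS_PER_HOUR = 3_600_000      # a conventional hour in millisecond units


def bucket_by_division(t_ms: int) -> int:
    """Conventional hourly bucket: needs an integer division."""
    return t_ms // MS_PER_HOUR


def bucket_by_shift(t_ms: int) -> int:
    """Power-of-two bucket: a single binary shift."""
    return t_ms >> BUCKET_SHIFT


def same_bucket(t1_ms: int, t2_ms: int) -> bool:
    """Quick test of whether two timestamps fall within the same bucket."""
    return (t1_ms >> BUCKET_SHIFT) == (t2_ms >> BUCKET_SHIFT)
```

Both functions are monotone in t_ms, so either satisfies the property required of hour(t): an older bucket always has a smaller bucket number.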
  • Moreover, another consideration in choosing and using different time granularities relates to practical choice. We expect that most files are not continually modified. A typical file is created and/or opened by an application, operated on, and then closed within a short time (e.g., an hour or less). Most file management jobs are run once or a very few times a day. To avoid re-scanning files that really have not changed, it would be preferable, but not mandatory, to have at least several buckets' worth of time elapse between runs of the same management job. Hence, we expect any choice of granularity between a few minutes and a few hours will be appropriate for current systems. However, as mentioned herein, other units of granularity may also be employed.
  • Also, another consideration in choosing and using different time granularities relates to buckets on demand. At any convenient time, e.g., just prior to the starting of a file management job, the bucket accumulating changes can be closed and a new bucket designated by the current time can be created and begin accumulating change records. In this variation of the scheme, the hour(t) function depends on the closing times of the buckets.
  • Alternatively, with respect to buckets on demand, the closing times of buckets can be coordinated with file system snapshots. Each snapshot operation will close the bucket accumulating change records and create a new bucket. Several known-art file systems such as VERITAS FILE SYSTEM and IBM's Storage Tank (now known as IBM San File System) support snapshot versioning. In this variation of the scheme, the hour(t) function yields a snapshot version number.
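  • With buckets on demand, hour(t) can be computed as the number of buckets closed at or before time t, which also serves as a snapshot version number when closings coincide with snapshots. The following is a minimal sketch assuming integer timestamps; OnDemandBuckets and close_current are hypothetical names.

```python
from bisect import bisect_right
from typing import List


class OnDemandBuckets:
    def __init__(self) -> None:
        self.closings: List[int] = []  # times at which a bucket was closed, ascending

    def close_current(self, now: int) -> int:
        """Close the accumulating bucket (e.g., before a job or at a snapshot)."""
        if self.closings and now < self.closings[-1]:
            raise ValueError("closing times must not go backwards")
        self.closings.append(now)
        return len(self.closings)  # number of the new accumulating bucket

    def hour(self, t: int) -> int:
        """Bucket number (or snapshot version) for timestamp t."""
        return bisect_right(self.closings, t)
```

Because bisect_right is monotone in t, the property hour(t1)<hour(t2) for an older bucket still holds, so the FIG. 3 logic works unchanged with this hour(t).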
  • Another consideration in choosing and using different time granularities relates to merging buckets. Any two or more buckets adjacent in time can be merged into a single larger bucket whenever that is convenient or desirable. For example, very old change records can be gathered into fewer buckets by day, by week, by month, by year, and so forth.
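  • Merging can be sketched as follows. The sketch assumes that buckets are keyed by their start time in milliseconds and are hour-aligned, so that each bucket lies entirely within a single day. The function name and the dict-of-sets layout are hypothetical. Aligning the cutoff to the coarser period keeps the merged buckets from overlapping the finer buckets that are left alone.

```python
from collections import defaultdict

HOUR_MS = 3_600_000
DAY_MS = 24 * HOUR_MS


def merge_old_buckets(buckets, cutoff_ms, period_ms=DAY_MS):
    """Merge every bucket that starts before cutoff_ms into coarser buckets of period_ms.

    buckets maps a bucket's start time (ms) to the set of object ids whose change
    records it holds. Assumes each fine bucket lies inside a single coarse period.
    """
    cutoff_ms -= cutoff_ms % period_ms  # align so merged buckets end by the cutoff
    merged = defaultdict(set)
    for start, ids in buckets.items():
        key = start - start % period_ms if start < cutoff_ms else start
        merged[key] |= ids              # union keeps one record per object
    return dict(merged)
```

Passing period_ms=7 * DAY_MS would gather old records by week in the same way.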
  • Another consideration in choosing and using different time granularities relates to the two-bucket solution. This special case is logically a further refinement of the buckets-on-demand and bucket-merging variations described above, and it reduces some of the bookkeeping overhead. In it, we keep just two buckets: one with records for all files that have NOT changed since a particular time T, and one with records for files that have changed after time T.
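  • The two-bucket variant can be sketched as below. The sketch assumes that the list keeps each object's latest change timestamp, so that the boundary T can be advanced (for example, when a management job closes the "changed" bucket). The class and method names are hypothetical, chosen for illustration.

```python
class TwoBucketChangedList:
    """Hypothetical two-bucket changed files list split at a single time T."""

    def __init__(self, T: int, timestamps: dict) -> None:
        self.T = T
        self.ts = dict(timestamps)  # object id -> latest change timestamp
        self.unchanged = {o for o, t in self.ts.items() if t <= T}  # NOT changed since T
        self.changed = {o for o, t in self.ts.items() if t > T}     # changed after T

    def record_change(self, obj: int, t: int) -> None:
        """Note a creation or update of obj at time t."""
        self.ts[obj] = t
        self.unchanged.discard(obj)
        self.changed.discard(obj)
        (self.changed if t > self.T else self.unchanged).add(obj)

    def advance(self, new_T: int) -> None:
        """Move the boundary forward; records changed at or before new_T
        are merged into the 'unchanged' bucket."""
        if new_T < self.T:
            raise ValueError("T may only move forward")
        self.T = new_T
        moved = {o for o in self.changed if self.ts[o] <= new_T}
        self.changed -= moved
        self.unchanged |= moved
```

A management job that last ran at time T processes only the `changed` bucket. Calling `advance` with the job's start time keeps files modified during the run in `changed` for the next run.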
  • A description will now be given regarding using the changed files list with time buckets for data management.
  • A typical data management job needs to find all the files within a file system that have changed since its last run. The changed files list with time buckets makes this simple and efficient. Moreover, multiple kinds of management jobs can all use the same changed files list with time buckets, even if they run on different schedules. Examples of data management jobs include backup, archiving, migration (moving data from one set of devices to another to improve or balance performance, lower costs, etc.), accounting, and usage analysis and planning.
  • Turning to FIG. 4, an exemplary process for using a changed files list with time buckets is indicated generally by the reference numeral 400. The process 400 relates to a data management operation, in particular, finding the set S of all files (or the corresponding inode numbers) whose data or metadata has changed between two times t1 and t2, where t1 is earlier than t2.
  • A start box 402 passes control to a function block 405. The function block 405 initializes the set S:={ }, namely the empty set, and passes control to a function block 410. The function block 410 lets bucket B1 be the bucket that represents hour(t1), and passes control to a function block 415. Stated another way, B1 is the earliest bucket that might include a file changed at time t1 or later.
  • The function block 415 sets variable bucket B:=B1, and passes control to a function block 420.
  • The function block 420 performs a loop for each file f in bucket B, checking the metadata of file f, and begins the loop by passing control to a decision block 425. With respect to function block 420, it is to be noted that the first and last buckets we visit may include some files that changed before t1 or after t2. Also, an implementation might choose to defer removing entries from buckets. Thus, we may need to check the change time of each file f in bucket B before adding it to the set S.
  • The decision block 425 determines whether or not file f changed between times t1 and t2. If so, control is passed to a function block 430. Otherwise, control returns to function block 420 to consider the next file f in bucket B, or passes to loop limit block 435 when there are no more files in bucket B.
  • The function block 430 adds file f to set S, and passes control to a loop limit block 435 when there are no more files f in bucket B or returns control to function block 420 when there is another file f in bucket B.
  • The loop limit block 435 ends the loop, and passes control to a function block 440. The function block 440 lets B2 be the bucket that immediately follows bucket B, and passes control to a decision block 445. The decision block 445 determines whether or not there is such a bucket B2 (i.e., whether bucket B2 is defined in the changed files list). If so, control is passed to a decision block 450. Otherwise, control is passed to a function block 470.
  • The decision block 450 determines whether or not the hour of B2 ≤ hour(t2). If so, control is passed to a function block 455. Otherwise, control is passed to function block 470. In other words, if there is no bucket following bucket B, or the bucket following bucket B represents an hour > hour(t2), then we are done.
  • The function block 455 sets variable bucket B:=B2, and returns control to function block 420 (beginning a new execution of the loop to consider the files in bucket B).
  • Set S now includes all of the files that have changed between times t1 and t2, but we may also want S to include the files deleted between times t1 and t2. Accordingly, the function block 470 finds those files in the deleted objects bucket, adds them to set S, and passes control to an end block 480. The deleted objects bucket should be organized (using well-known techniques) so that all entries representing deletions between two times t1 and t2 can be retrieved efficiently.
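  • The process of FIG. 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation:
  - buckets are modeled as a dict from bucket number to a set of inode numbers;
  - each file's latest change timestamp is held in a separate dict;
  - the deleted objects bucket is a list of (deletion time, inode) pairs sorted by time;
  - hour(t) is the hypothetical 2^22 ms shift.
A bucket for exactly hour(t1) may not exist, so the sketch starts at the first existing bucket whose number is at least hour(t1). Comments map each step to the block numbers of FIG. 4.

```python
import bisect

BUCKET_SHIFT = 22  # assumed: integer millisecond timestamps, 2**22 ms buckets


def hour(t: int) -> int:
    """Bucket number for timestamp t (hypothetical shift-based mapping)."""
    return t >> BUCKET_SHIFT


def changed_between(buckets, change_time, deleted, t1, t2):
    """Return the set S of inode numbers changed or deleted between t1 and t2 (inclusive).

    buckets:     dict mapping bucket number -> set of inode numbers
    change_time: dict mapping inode -> latest data/metadata change time
    deleted:     list of (deletion_time, inode) tuples sorted by deletion time
    """
    S = set()                                   # block 405: S := { }
    order = sorted(buckets)                     # existing buckets in time order
    i = bisect.bisect_left(order, hour(t1))     # blocks 410/415: B := B1
    # blocks 445/450: stop when no bucket follows or its hour exceeds hour(t2)
    while i < len(order) and order[i] <= hour(t2):
        for f in buckets[order[i]]:             # loop 420-435 over files in B
            ct = change_time.get(f)
            # block 425: boundary buckets and entries whose removal was
            # deferred may hold files changed outside [t1, t2]
            if ct is not None and t1 <= ct <= t2:
                S.add(f)                        # block 430
        i += 1                                  # blocks 440/455: B := B2
    # block 470: binary search the time-sorted deleted objects bucket
    lo = bisect.bisect_left(deleted, (t1,))
    hi = bisect.bisect_left(deleted, (t2 + 1,))  # first entry with time > t2
    S.update(inode for _, inode in deleted[lo:hi])
    return S
```

Checking each file's change time (block 425) makes the boundary buckets and any lazily removed, stale entries harmless. Because S is a set, a file that appears in more than one visited bucket is reported only once.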
  • Having described preferred embodiments of a system and method (which are intended to be illustrative and not limiting), it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments disclosed which are within the scope and spirit of the invention as outlined by the appended claims. Having thus described aspects of the invention, with the details and particularity required by the patent laws, what is claimed and desired protected by Letters Patent is set forth in the appended claims.

Claims (30)

1. In a computer processing system, an apparatus for managing object data, comprising:
a changed objects manager for creating and managing a changed objects list that at least identifies the objects that have changed based on time of change, the changed objects list associated with a plurality of time buckets, each of the plurality of time buckets associated with a respective date and time period and with object change records for objects having a timestamp falling within the respective date and time period, each of the object change records associated with a unique object identifier and the timestamp for a corresponding one of the objects, the timestamp specifying a date and a time corresponding to a latest one of a creation time or a most recent update time for the corresponding one of the objects.
2. The apparatus of claim 1, wherein the changed objects list is configured for use by a storage management process.
3. The apparatus of claim 2, wherein the storage management process includes at least one of data backup and data replication.
4. The apparatus of claim 1, further comprising:
a file system processing module for comparing a current date and time against a timestamp for a given object, creating a new object change record for the given object and inserting the new object change record into a corresponding one of the plurality of time buckets having the respective date and time period in which the timestamp falls when an existing object change record does not exist for the given object, and moving the existing object change record for the given object into the corresponding one of the plurality of time buckets having the respective date and time period in which the timestamp falls when the current date and time is different than the timestamp and the existing object change record already exists for the given object.
5. The apparatus of claim 4, wherein the existing object change record is maintained in a currently inserted time bucket, when the timestamp matches the current date and time and falls within the respective date and time period of the currently inserted time bucket and the existing object change record already exists for the given object.
6. The apparatus of claim 1, wherein the apparatus is implemented in a file system, and the number of object change records in the changed objects list remains unchanged except when a new object is created or when an existing object is deleted from the file system.
7. The apparatus of claim 1, wherein the changed objects list is configured for use by a storage management process that is executed periodically or on demand, and wherein, for a current execution of the storage management process, only object change records in corresponding ones of the plurality of time buckets that are subsequent to an immediately previous execution of the storage management process are considered.
8. The apparatus of claim 1, wherein an object change involves any of a data change and a metadata change.
9. The apparatus of claim 1, wherein each of the objects in the object change records is further associated with at least one of an object type, a change type, and a list of parent identifiers of directories through which that object is accessible.
10. The apparatus of claim 1, wherein the changed objects list is configured such that time granularities of the plurality of time buckets are variable so that different ones of the plurality of time buckets are capable of representing different time periods.
11. In a computer processing system, a method for managing object data, comprising the step of:
at least one of creating and maintaining a changed objects list that is partitioned into a plurality of time buckets, each of the plurality of time buckets associated with a respective date and time period and with object change records for objects having a timestamp falling within the respective date and time period, each of the object change records associated with a unique object identifier and the timestamp for a corresponding one of the objects, the timestamp specifying a date and a time corresponding to a latest one of a creation time or a most recent update time for the corresponding one of the objects,
wherein the changed objects list identifies the objects that have changed based on the respective date and time periods of corresponding ones of the plurality of time buckets.
12. The method of claim 11, further comprising the step of using the changed objects list for a storage management process.
13. The method of claim 12, wherein the storage management process includes at least one of data backup and data replication.
14. The method of claim 11, further comprising the steps of:
comparing a current date and time against a timestamp for a given object;
creating a new object change record for the given object and inserting the new object change record into a corresponding one of the plurality of time buckets having the respective date and time period in which the timestamp falls, when an existing object change record does not exist for the given object; and
moving the existing object change record for the given object into the corresponding one of the plurality of time buckets having the respective date and time period in which the timestamp falls, when the current date and time is different than the timestamp and the existing object change record already exists for the given object.
15. The method of claim 14, further comprising the step of maintaining the existing object change record in a currently inserted time bucket, when the timestamp matches the current date and time and falls within the respective date and time period of the currently inserted time bucket and the existing object change record already exists for the given object.
16. The method of claim 11, wherein the method is implemented in a file system, and the number of object change records in the changed objects list remains unchanged except when a new object is created or when an existing object is deleted from the file system.
17. The method of claim 11, wherein the method is used for a storage management process that is executed periodically or on demand, and the method comprises the step of, for a current execution of the storage management process, considering only object change records in corresponding ones of the plurality of time buckets that are subsequent to an immediately previous execution of the storage management process.
18. The method of claim 11, wherein an object change involves any of a data change and a metadata change.
19. The method of claim 11, wherein each of the objects in the object change records is further associated with at least one of an object type, a change type, and a list of parent identifiers of directories through which that object is accessible.
20. The method of claim 11, wherein time granularities of the plurality of time buckets are variable such that different ones of the plurality of time buckets are capable of representing different time periods.
21. A computer program product comprising a computer usable medium including computer usable program code for managing object data, said computer program product including:
computer usable program code for at least one of creating and maintaining a changed objects list that is partitioned into a plurality of time buckets, each of the plurality of time buckets associated with a respective date and time period and with object change records for objects having a timestamp falling within the respective date and time period, each of the object change records associated with a unique object identifier and the timestamp for a corresponding one of the objects, the timestamp specifying a date and a time corresponding to a latest one of a creation time or a most recent update time for the corresponding one of the objects,
wherein the changed objects list identifies the objects that have changed based on the respective date and time periods of corresponding ones of the plurality of time buckets.
22. The computer program product of claim 21, further comprising computer usable program code for using the changed objects list for a storage management process.
23. The computer program product of claim 22, wherein the storage management process includes at least one of data backup and data replication.
24. The computer program product of claim 21, further comprising:
computer usable program code for comparing a current date and time against a timestamp for a given object;
computer usable program code for creating a new object change record for the given object and inserting the new object change record into a corresponding one of the plurality of time buckets having the respective date and time period in which the timestamp falls, when an existing object change record does not exist for the given object; and
computer usable program code for moving the existing object change record for the given object into the corresponding one of the plurality of time buckets having the respective date and time period in which the timestamp falls, when the current date and time is different than the timestamp and the existing object change record already exists for the given object.
25. The computer program product of claim 24, further comprising computer usable program code for maintaining the existing object change record in a currently inserted time bucket, when the timestamp matches the current date and time and falls within the respective date and time period of the currently inserted time bucket and the existing object change record already exists for the given object.
26. The computer program product of claim 21, wherein the computer usable program code is implemented in a file system, and the number of object change records in the changed objects list remains unchanged except when a new object is created or when an existing object is deleted from the file system.
27. The computer program product of claim 21, wherein the changed objects list is used for a storage management process that is executed periodically or on demand, further comprising computer usable program code for considering, for a current execution of the storage management process, only object change records in corresponding ones of the plurality of time buckets that are subsequent to an immediately previous execution of the storage management process.
28. The computer program product of claim 21, wherein an object change involves any of a data change and a metadata change.
29. The computer program product of claim 21, wherein each of the objects in the object change records is further associated with at least one of an object type, a change type, and a list of parent identifiers of directories through which that object is accessible.
30. The computer program product of claim 21, further comprising computer usable program code for configuring time granularities of the plurality of time buckets to be variable such that different ones of the plurality of time buckets are capable of representing different time periods.
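The record maintenance recited in claims 4 and 5 (and likewise claims 14, 15, 24 and 25) can be illustrated with the Python sketch below. It interprets "the timestamp matches the current date and time" at bucket granularity, which is an assumption, and it uses the same hypothetical 2^22 ms buckets. The class and method names are illustrative and are not taken from the patent.

```python
BUCKET_SHIFT = 22  # assumed: millisecond timestamps grouped into 2**22 ms buckets


def bucket_of(ts: int) -> int:
    return ts >> BUCKET_SHIFT


class ChangedObjectsList:
    """Hypothetical in-memory changed objects list (not the patented implementation)."""

    def __init__(self) -> None:
        self.buckets = {}  # bucket number -> set of object ids (e.g., inode numbers)
        self.records = {}  # object id -> (timestamp, bucket number) of its change record

    def on_change(self, obj_id: int, now: int) -> None:
        """Record a creation or update of obj_id at time `now`."""
        new_bucket = bucket_of(now)
        old = self.records.get(obj_id)
        if old is None:
            # No existing record: create one and insert it into the bucket for `now`.
            self.buckets.setdefault(new_bucket, set()).add(obj_id)
        elif old[1] != new_bucket:
            # Record exists in an earlier bucket: move it to the bucket for `now`.
            old_set = self.buckets[old[1]]
            old_set.discard(obj_id)
            if not old_set:
                del self.buckets[old[1]]
            self.buckets.setdefault(new_bucket, set()).add(obj_id)
        # Otherwise the record stays in its current bucket; only the timestamp changes.
        self.records[obj_id] = (now, new_bucket)
```

A record is created only in the first branch and merely relocated in the second. The number of records therefore grows only when a new object appears, consistent with claims 6 and 16. Deletion handling is omitted from this sketch.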

Priority Applications (3)

Application Number Priority Date Filing Date Title
US11/128,781 US20060259527A1 (en) 2005-05-13 2005-05-13 Changed files list with time buckets for efficient storage management
US12/061,323 US8548965B2 (en) 2005-05-13 2008-04-02 Changed files list with time buckets for efficient storage management
US13/937,901 US20130297610A1 (en) 2005-05-13 2013-07-09 Changed files list with time buckets for efficient storage management

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/128,781 US20060259527A1 (en) 2005-05-13 2005-05-13 Changed files list with time buckets for efficient storage management

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US12/061,323 Continuation US8548965B2 (en) 2005-05-13 2008-04-02 Changed files list with time buckets for efficient storage management

Publications (1)

Publication Number Publication Date
US20060259527A1 true US20060259527A1 (en) 2006-11-16

Family

ID=37420425

Family Applications (3)

Application Number Title Priority Date Filing Date
US11/128,781 Abandoned US20060259527A1 (en) 2005-05-13 2005-05-13 Changed files list with time buckets for efficient storage management
US12/061,323 Expired - Fee Related US8548965B2 (en) 2005-05-13 2008-04-02 Changed files list with time buckets for efficient storage management
US13/937,901 Abandoned US20130297610A1 (en) 2005-05-13 2013-07-09 Changed files list with time buckets for efficient storage management

Family Applications After (2)

Application Number Title Priority Date Filing Date
US12/061,323 Expired - Fee Related US8548965B2 (en) 2005-05-13 2008-04-02 Changed files list with time buckets for efficient storage management
US13/937,901 Abandoned US20130297610A1 (en) 2005-05-13 2013-07-09 Changed files list with time buckets for efficient storage management

Country Status (1)

Country Link
US (3) US20060259527A1 (en)

Cited By (71)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070100807A1 (en) * 2005-10-31 2007-05-03 Kabushiki Kaisha Toshiba Data searching system, method of synchronizing metadata and data searching apparatus
US20070229914A1 (en) * 2006-04-04 2007-10-04 Noriko Matsuzawa Image processing apparatus, control method thereof, and program
US20080005187A1 (en) * 2006-06-30 2008-01-03 International Business Machines Corporation Methods and apparatus for managing configuration management database via composite configuration item change history
US20090327340A1 (en) * 2008-06-30 2009-12-31 International Business Machines Corporation System and Method for Managing Data Using a Hierarchical Metadata Management System
US20100205150A1 (en) * 2005-11-28 2010-08-12 Commvault Systems, Inc. Systems and methods for classifying and transferring information in a storage network
WO2011082113A1 (en) * 2009-12-31 2011-07-07 Commvault Systems, Inc. Asynchronous methods of data classification using change journals and other data structures
US8234249B2 (en) 2006-12-22 2012-07-31 Commvault Systems, Inc. Method and system for searching stored data
US8392386B2 (en) 2009-08-05 2013-03-05 International Business Machines Corporation Tracking file contents
US20130185503A1 (en) * 2012-01-12 2013-07-18 Vigneshwara Bhatta Method for metadata persistence
US20140082033A1 (en) * 2012-09-14 2014-03-20 Salesforce.Com, Inc. Methods and systems for managing files in an on-demand system
US8719264B2 (en) 2011-03-31 2014-05-06 Commvault Systems, Inc. Creating secondary copies of data based on searches for content
US8892523B2 (en) 2012-06-08 2014-11-18 Commvault Systems, Inc. Auto summarization of content
US8930496B2 (en) 2005-12-19 2015-01-06 Commvault Systems, Inc. Systems and methods of unified reconstruction in storage systems
US20150012496A1 (en) * 2013-07-04 2015-01-08 Fujitsu Limited Storage device and method for controlling storage device
US20150046401A1 (en) * 2011-06-30 2015-02-12 Emc Corporation File deletion detection in key value databases for virtual backups
US20150134900A1 (en) * 2013-11-08 2015-05-14 Mei-Ling Lin Cache efficiency in a shared disk database cluster
US20150269214A1 (en) * 2014-03-19 2015-09-24 Red Hat, Inc. Identifying files in change logs using file content location identifiers
US9158835B2 (en) 2006-10-17 2015-10-13 Commvault Systems, Inc. Method and system for offline indexing of content and classifying stored data
US9244964B2 (en) 2012-05-21 2016-01-26 International Business Machines Corporation Determining a cause of an incident based on text analytics of documents
US20160078076A1 (en) * 2014-09-15 2016-03-17 International Business Machines Corporation Parallel container and record organization
US20160124815A1 (en) 2011-06-30 2016-05-05 Emc Corporation Efficient backup of virtual data
US9380431B1 (en) 2013-01-31 2016-06-28 Palantir Technologies, Inc. Use of teams in a mobile application
US9501507B1 (en) * 2012-12-27 2016-11-22 Palantir Technologies Inc. Geo-temporal indexing and searching
US9509652B2 (en) 2006-11-28 2016-11-29 Commvault Systems, Inc. Method and system for displaying similar email messages based on message contents
US9600146B2 (en) 2015-08-17 2017-03-21 Palantir Technologies Inc. Interactive geospatial map
US9639580B1 (en) 2015-09-04 2017-05-02 Palantir Technologies, Inc. Computer-implemented systems and methods for data management and visualization
US9684473B2 (en) 2011-06-30 2017-06-20 EMC IP Holding Company LLC Virtual machine disaster recovery
US9864656B1 (en) 2011-06-30 2018-01-09 EMC IP Holding Company LLC Key value databases for virtual backups
US9891808B2 (en) 2015-03-16 2018-02-13 Palantir Technologies Inc. Interactive user interfaces for location-based data analysis
US9916324B2 (en) 2011-06-30 2018-03-13 EMC IP Holding Company LLC Updating key value databases for virtual backups
US9953445B2 (en) 2013-05-07 2018-04-24 Palantir Technologies Inc. Interactive data object map
US20180137134A1 (en) * 2015-07-14 2018-05-17 Alibaba Group Holding Limited Data snapshot acquisition method and system
US9986029B2 (en) 2014-03-19 2018-05-29 Red Hat, Inc. File replication using file content location identifiers
US20180181584A1 (en) * 2016-12-23 2018-06-28 Nexenta Systems, Inc. Method and system for maintaining and searching index records
US10025808B2 (en) 2014-03-19 2018-07-17 Red Hat, Inc. Compacting change logs using file content location identifiers
US10089190B2 (en) 2011-06-30 2018-10-02 EMC IP Holding Company LLC Efficient file browsing using key value databases for virtual backups
US20180285004A1 (en) * 2017-03-31 2018-10-04 International Business Machines Corporation Dynamically reacting to events within a data storage system
US10109094B2 (en) 2015-12-21 2018-10-23 Palantir Technologies Inc. Interface to index and display geospatial data
US10270727B2 (en) 2016-12-20 2019-04-23 Palantir Technologies, Inc. Short message communication within a mobile graphical map
US10346799B2 (en) 2016-05-13 2019-07-09 Palantir Technologies Inc. System to catalogue tracking data
US10371537B1 (en) 2017-11-29 2019-08-06 Palantir Technologies Inc. Systems and methods for flexible route planning
US10429197B1 (en) 2018-05-29 2019-10-01 Palantir Technologies Inc. Terrain analysis for automatic route determination
US10437850B1 (en) 2015-06-03 2019-10-08 Palantir Technologies Inc. Server implemented geographic information system with graphical interface
US10467435B1 (en) 2018-10-24 2019-11-05 Palantir Technologies Inc. Approaches for managing restrictions for middleware applications
US10474631B2 (en) * 2009-06-26 2019-11-12 Hewlett Packard Enterprise Company Method and apparatus for content derived data placement in memory
US10515433B1 (en) 2016-12-13 2019-12-24 Palantir Technologies Inc. Zoom-adaptive data granularity to achieve a flexible high-performance interface for a geospatial mapping system
US10521309B1 (en) * 2013-12-23 2019-12-31 EMC IP Holding Company LLC Optimized filesystem walk for backup operations
US10540516B2 (en) 2016-10-13 2020-01-21 Commvault Systems, Inc. Data protection within an unsecured storage environment
US10579239B1 (en) 2017-03-23 2020-03-03 Palantir Technologies Inc. Systems and methods for production and display of dynamically linked slide presentations
US10642886B2 (en) 2018-02-14 2020-05-05 Commvault Systems, Inc. Targeted search of backup data using facial recognition
US10698756B1 (en) 2017-12-15 2020-06-30 Palantir Technologies Inc. Linking related events for various devices and services in computer log files on a centralized server
US10706434B1 (en) 2015-09-01 2020-07-07 Palantir Technologies Inc. Methods and systems for determining location information
US10769103B1 (en) * 2017-10-06 2020-09-08 EMC IP Holding Company LLC Efficient content indexing of incremental block-based backups
US10795723B2 (en) 2014-03-04 2020-10-06 Palantir Technologies Inc. Mobile tasks
US10830599B2 (en) 2018-04-03 2020-11-10 Palantir Technologies Inc. Systems and methods for alternative projections of geographical information
US10896234B2 (en) 2018-03-29 2021-01-19 Palantir Technologies Inc. Interactive geographical map
US10895946B2 (en) 2017-05-30 2021-01-19 Palantir Technologies Inc. Systems and methods for using tiled data
US10896208B1 (en) 2016-08-02 2021-01-19 Palantir Technologies Inc. Mapping content delivery
US10915498B2 (en) 2017-03-30 2021-02-09 International Business Machines Corporation Dynamically managing a high speed storage tier of a data storage system
US10984041B2 (en) 2017-05-11 2021-04-20 Commvault Systems, Inc. Natural language processing integrated with database and data storage management
US11016941B2 (en) 2014-02-28 2021-05-25 Red Hat, Inc. Delayed asynchronous file replication in a distributed file system
US11025672B2 (en) 2018-10-25 2021-06-01 Palantir Technologies Inc. Approaches for securing middleware data access
US11035690B2 (en) 2009-07-27 2021-06-15 Palantir Technologies Inc. Geotagging structured data
US11100174B2 (en) 2013-11-11 2021-08-24 Palantir Technologies Inc. Simple web search
US11159469B2 (en) 2018-09-12 2021-10-26 Commvault Systems, Inc. Using machine learning to modify presentation of mailbox objects
US20210349853A1 (en) * 2020-05-11 2021-11-11 Cohesity, Inc. Asynchronous deletion of large directories
US11334216B2 (en) 2017-05-30 2022-05-17 Palantir Technologies Inc. Systems and methods for visually presenting geospatial information
US11442820B2 (en) 2005-12-19 2022-09-13 Commvault Systems, Inc. Systems and methods of unified reconstruction in storage systems
US11494417B2 (en) 2020-08-07 2022-11-08 Commvault Systems, Inc. Automated email classification in an information management system
US11585672B1 (en) 2018-04-11 2023-02-21 Palantir Technologies Inc. Three-dimensional representations of routes
US11599706B1 (en) 2017-12-06 2023-03-07 Palantir Technologies Inc. Systems and methods for providing a view of geospatial information

Families Citing this family (15)

Publication number Priority date Publication date Assignee Title
JP4749266B2 (en) * 2006-07-27 2011-08-17 Hitachi, Ltd. (株式会社日立製作所) Backup control apparatus and method without duplication of information resources
US8606751B1 (en) * 2009-12-21 2013-12-10 Emc Corporation System and method for backup by inode number
EP2548122B1 (en) * 2010-03-16 2021-06-09 BlackBerry Limited Highly scalable and distributed data de-duplication
US9069593B2 (en) * 2011-06-23 2015-06-30 Salesforce.Com, Inc. Systems and methods for deletion of untracked datastore paths
US8438146B2 (en) 2011-06-30 2013-05-07 International Business Machines Corporation Generating containers for electronic records based on configurable parameters
US9031909B2 (en) * 2011-11-29 2015-05-12 Microsoft Technology Licensing, Llc Provisioning and/or synchronizing using common metadata
US9774676B2 (en) 2012-05-21 2017-09-26 Google Inc. Storing and moving data in a distributed storage system
US9449006B2 (en) 2012-06-04 2016-09-20 Google Inc. Method and system for deleting obsolete files from a file system
US9747310B2 (en) 2012-06-04 2017-08-29 Google Inc. Systems and methods of increasing database access concurrency using granular timestamps
US9659038B2 (en) * 2012-06-04 2017-05-23 Google Inc. Efficient snapshot read of a database in a distributed storage system
US9230000B1 (en) 2012-06-04 2016-01-05 Google Inc. Pipelining Paxos state machines
US9218494B2 (en) 2013-10-16 2015-12-22 Citrix Systems, Inc. Secure client drive mapping and file storage system for mobile device management type security
US9824233B2 (en) 2015-11-17 2017-11-21 International Business Machines Corporation Posixly secure open and access files by inode number
US10248352B2 (en) * 2016-09-15 2019-04-02 International Business Machines Corporation Management of object location in hierarchical storage
US11797600B2 (en) * 2020-11-18 2023-10-24 Ownbackup Ltd. Time-series analytics for database management systems

Citations (14)

Publication number Priority date Publication date Assignee Title
US6055534A (en) * 1995-07-20 2000-04-25 Fuji Xerox Co., Ltd. File management system and file management method
US6189016B1 (en) * 1998-06-12 2001-02-13 Microsoft Corporation Journaling ordered changes in a storage volume
US20020111956A1 (en) * 2000-09-18 2002-08-15 Boon-Lock Yeo Method and apparatus for self-management of content across multiple storage systems
US6460055B1 (en) * 1999-12-16 2002-10-01 Livevault Corporation Systems and methods for backing up data files
US6549916B1 (en) * 1999-08-05 2003-04-15 Oracle Corporation Event notification system tied to a file system
US6560615B1 (en) * 1999-12-17 2003-05-06 Novell, Inc. Method and apparatus for implementing a highly efficient, robust modified files list (MFL) for a storage system volume
US20030187881A1 (en) * 2002-03-29 2003-10-02 Fujitsu Limited Electronic document management method and program
US20050033777A1 (en) * 2003-08-04 2005-02-10 Moraes Mark A. Tracking, recording and organizing changes to data in computer systems
US20050149493A1 (en) * 2004-01-07 2005-07-07 Kazuhiko Yamashita Data recording apparatus and data recording method
US6922708B1 (en) * 1999-02-18 2005-07-26 Oracle International Corporation File system that supports transactions
US20050198076A1 (en) * 2003-10-17 2005-09-08 Stata Raymond P. Systems and methods for indexing content for fast and scalable retrieval
US7162601B2 (en) * 2003-06-26 2007-01-09 Hitachi, Ltd. Method and apparatus for backup and recovery system using storage based journaling
US7225208B2 (en) * 2003-09-30 2007-05-29 Iron Mountain Incorporated Systems and methods for backing up data files
US20070226279A1 (en) * 2004-01-23 2007-09-27 Barton Edward M Method and system for backing up files

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US6067541A (en) * 1997-09-17 2000-05-23 Microsoft Corporation Monitoring document changes in a file system of documents with the document change information stored in a persistent log
US7890469B1 (en) * 2002-12-30 2011-02-15 Symantec Operating Corporation File change log
US7152076B2 (en) * 2003-01-23 2006-12-19 Microsoft Corporation System and method for efficient multi-master replication
US7158991B2 (en) * 2003-09-30 2007-01-02 Veritas Operating Corporation System and method for maintaining temporal data in data storage
US7620624B2 (en) * 2003-10-17 2009-11-17 Yahoo! Inc. Systems and methods for indexing content for fast and scalable retrieval
US7630924B1 (en) * 2005-04-20 2009-12-08 Authorize.Net Llc Transaction velocity counting for fraud detection

Patent Citations (14)

Publication number Priority date Publication date Assignee Title
US6055534A (en) * 1995-07-20 2000-04-25 Fuji Xerox Co., Ltd. File management system and file management method
US6189016B1 (en) * 1998-06-12 2001-02-13 Microsoft Corporation Journaling ordered changes in a storage volume
US6922708B1 (en) * 1999-02-18 2005-07-26 Oracle International Corporation File system that supports transactions
US6549916B1 (en) * 1999-08-05 2003-04-15 Oracle Corporation Event notification system tied to a file system
US6460055B1 (en) * 1999-12-16 2002-10-01 Livevault Corporation Systems and methods for backing up data files
US6560615B1 (en) * 1999-12-17 2003-05-06 Novell, Inc. Method and apparatus for implementing a highly efficient, robust modified files list (MFL) for a storage system volume
US20020111956A1 (en) * 2000-09-18 2002-08-15 Boon-Lock Yeo Method and apparatus for self-management of content across multiple storage systems
US20030187881A1 (en) * 2002-03-29 2003-10-02 Fujitsu Limited Electronic document management method and program
US7162601B2 (en) * 2003-06-26 2007-01-09 Hitachi, Ltd. Method and apparatus for backup and recovery system using storage based journaling
US20050033777A1 (en) * 2003-08-04 2005-02-10 Moraes Mark A. Tracking, recording and organizing changes to data in computer systems
US7225208B2 (en) * 2003-09-30 2007-05-29 Iron Mountain Incorporated Systems and methods for backing up data files
US20050198076A1 (en) * 2003-10-17 2005-09-08 Stata Raymond P. Systems and methods for indexing content for fast and scalable retrieval
US20050149493A1 (en) * 2004-01-07 2005-07-07 Kazuhiko Yamashita Data recording apparatus and data recording method
US20070226279A1 (en) * 2004-01-23 2007-09-27 Barton Edward M Method and system for backing up files

Cited By (138)

Publication number Priority date Publication date Assignee Title
US7519593B2 (en) * 2005-10-31 2009-04-14 Kabushiki Kaisha Toshiba Data searching system, method of synchronizing metadata and data searching apparatus
US20070100807A1 (en) * 2005-10-31 2007-05-03 Kabushiki Kaisha Toshiba Data searching system, method of synchronizing metadata and data searching apparatus
US9098542B2 (en) 2005-11-28 2015-08-04 Commvault Systems, Inc. Systems and methods for using metadata to enhance data identification operations
US8725737B2 (en) 2005-11-28 2014-05-13 Commvault Systems, Inc. Systems and methods for using metadata to enhance data identification operations
US8832406B2 (en) 2005-11-28 2014-09-09 Commvault Systems, Inc. Systems and methods for classifying and transferring information in a storage network
US20100205150A1 (en) * 2005-11-28 2010-08-12 Commvault Systems, Inc. Systems and methods for classifying and transferring information in a storage network
US20110047180A1 (en) * 2005-11-28 2011-02-24 Commvault Systems, Inc. Systems and methods for classifying and transferring information in a storage network
US11256665B2 (en) 2005-11-28 2022-02-22 Commvault Systems, Inc. Systems and methods for using metadata to enhance data identification operations
US10198451B2 (en) 2005-11-28 2019-02-05 Commvault Systems, Inc. Systems and methods for using metadata to enhance data identification operations
US8010769B2 (en) 2005-11-28 2011-08-30 Commvault Systems, Inc. Systems and methods for classifying and transferring information in a storage network
US8612714B2 (en) 2005-11-28 2013-12-17 Commvault Systems, Inc. Systems and methods for classifying and transferring information in a storage network
US9606994B2 (en) 2005-11-28 2017-03-28 Commvault Systems, Inc. Systems and methods for using metadata to enhance data identification operations
US8285964B2 (en) 2005-11-28 2012-10-09 Commvault Systems, Inc. Systems and methods for classifying and transferring information in a storage network
US9996430B2 (en) 2005-12-19 2018-06-12 Commvault Systems, Inc. Systems and methods of unified reconstruction in storage systems
US9633064B2 (en) 2005-12-19 2017-04-25 Commvault Systems, Inc. Systems and methods of unified reconstruction in storage systems
US8930496B2 (en) 2005-12-19 2015-01-06 Commvault Systems, Inc. Systems and methods of unified reconstruction in storage systems
US11442820B2 (en) 2005-12-19 2022-09-13 Commvault Systems, Inc. Systems and methods of unified reconstruction in storage systems
US20070229914A1 (en) * 2006-04-04 2007-10-04 Noriko Matsuzawa Image processing apparatus, control method thereof, and program
US7983513B2 (en) * 2006-04-04 2011-07-19 Canon Kabushiki Kaisha Image processing apparatus, control method thereof, and program
US20080005187A1 (en) * 2006-06-30 2008-01-03 International Business Machines Corporation Methods and apparatus for managing configuration management database via composite configuration item change history
US9158835B2 (en) 2006-10-17 2015-10-13 Commvault Systems, Inc. Method and system for offline indexing of content and classifying stored data
US10783129B2 (en) 2006-10-17 2020-09-22 Commvault Systems, Inc. Method and system for offline indexing of content and classifying stored data
US9509652B2 (en) 2006-11-28 2016-11-29 Commvault Systems, Inc. Method and system for displaying similar email messages based on message contents
US9967338B2 (en) 2006-11-28 2018-05-08 Commvault Systems, Inc. Method and system for displaying similar email messages based on message contents
US9639529B2 (en) 2006-12-22 2017-05-02 Commvault Systems, Inc. Method and system for searching stored data
US8234249B2 (en) 2006-12-22 2012-07-31 Commvault Systems, Inc. Method and system for searching stored data
US8615523B2 (en) 2006-12-22 2013-12-24 Commvault Systems, Inc. Method and system for searching stored data
US8024354B2 (en) * 2008-06-30 2011-09-20 International Business Machines Corporation System and method for managing data using a hierarchical metadata management system
US20090327340A1 (en) * 2008-06-30 2009-12-31 International Business Machines Corporation System and Method for Managing Data Using a Hierarchical Metadata Management System
US10708353B2 (en) 2008-08-29 2020-07-07 Commvault Systems, Inc. Method and system for displaying similar email messages based on message contents
US11516289B2 (en) 2008-08-29 2022-11-29 Commvault Systems, Inc. Method and system for displaying similar email messages based on message contents
US11082489B2 (en) 2008-08-29 2021-08-03 Commvault Systems, Inc. Method and system for displaying similar email messages based on message contents
US10474631B2 (en) * 2009-06-26 2019-11-12 Hewlett Packard Enterprise Company Method and apparatus for content derived data placement in memory
US11035690B2 (en) 2009-07-27 2021-06-15 Palantir Technologies Inc. Geotagging structured data
US8392386B2 (en) 2009-08-05 2013-03-05 International Business Machines Corporation Tracking file contents
US9047296B2 (en) 2009-12-31 2015-06-02 Commvault Systems, Inc. Asynchronous methods of data classification using change journals and other data structures
WO2011082113A1 (en) * 2009-12-31 2011-07-07 Commvault Systems, Inc. Asynchronous methods of data classification using change journals and other data structures
US8442983B2 (en) 2009-12-31 2013-05-14 Commvault Systems, Inc. Asynchronous methods of data classification using change journals and other data structures
US11003626B2 (en) 2011-03-31 2021-05-11 Commvault Systems, Inc. Creating secondary copies of data based on searches for content
US8719264B2 (en) 2011-03-31 2014-05-06 Commvault Systems, Inc. Creating secondary copies of data based on searches for content
US10372675B2 (en) 2011-03-31 2019-08-06 Commvault Systems, Inc. Creating secondary copies of data based on searches for content
US20160124815A1 (en) 2011-06-30 2016-05-05 Emc Corporation Efficient backup of virtual data
US10394758B2 (en) * 2011-06-30 2019-08-27 EMC IP Holding Company LLC File deletion detection in key value databases for virtual backups
US10275315B2 (en) 2011-06-30 2019-04-30 EMC IP Holding Company LLC Efficient backup of virtual data
US9684473B2 (en) 2011-06-30 2017-06-20 EMC IP Holding Company LLC Virtual machine disaster recovery
US9864656B1 (en) 2011-06-30 2018-01-09 EMC IP Holding Company LLC Key value databases for virtual backups
US10089190B2 (en) 2011-06-30 2018-10-02 EMC IP Holding Company LLC Efficient file browsing using key value databases for virtual backups
US9916324B2 (en) 2011-06-30 2018-03-13 EMC IP Holding Company LLC Updating key value databases for virtual backups
US20150046401A1 (en) * 2011-06-30 2015-02-12 Emc Corporation File deletion detection in key value databases for virtual backups
US20130185503A1 (en) * 2012-01-12 2013-07-18 Vigneshwara Bhatta Method for metadata persistence
US9244964B2 (en) 2012-05-21 2016-01-26 International Business Machines Corporation Determining a cause of an incident based on text analytics of documents
US10372672B2 (en) 2012-06-08 2019-08-06 Commvault Systems, Inc. Auto summarization of content
US11580066B2 (en) 2012-06-08 2023-02-14 Commvault Systems, Inc. Auto summarization of content for use in new storage policies
US9418149B2 (en) 2012-06-08 2016-08-16 Commvault Systems, Inc. Auto summarization of content
US11036679B2 (en) 2012-06-08 2021-06-15 Commvault Systems, Inc. Auto summarization of content
US8892523B2 (en) 2012-06-08 2014-11-18 Commvault Systems, Inc. Auto summarization of content
US9977788B2 (en) * 2012-09-14 2018-05-22 Salesforce.Com, Inc. Methods and systems for managing files in an on-demand system
US20140082033A1 (en) * 2012-09-14 2014-03-20 Salesforce.Com, Inc. Methods and systems for managing files in an on-demand system
US9501507B1 (en) * 2012-12-27 2016-11-22 Palantir Technologies Inc. Geo-temporal indexing and searching
US10691662B1 (en) 2012-12-27 2020-06-23 Palantir Technologies Inc. Geo-temporal indexing and searching
US9380431B1 (en) 2013-01-31 2016-06-28 Palantir Technologies, Inc. Use of teams in a mobile application
US9953445B2 (en) 2013-05-07 2018-04-24 Palantir Technologies Inc. Interactive data object map
US10360705B2 (en) 2013-05-07 2019-07-23 Palantir Technologies Inc. Interactive data object map
US20150012496A1 (en) * 2013-07-04 2015-01-08 Fujitsu Limited Storage device and method for controlling storage device
US20150134900A1 (en) * 2013-11-08 2015-05-14 Mei-Ling Lin Cache efficiency in a shared disk database cluster
US9262415B2 (en) * 2013-11-08 2016-02-16 Sybase, Inc. Cache efficiency in a shared disk database cluster
US11100174B2 (en) 2013-11-11 2021-08-24 Palantir Technologies Inc. Simple web search
US10521309B1 (en) * 2013-12-23 2019-12-31 EMC IP Holding Company LLC Optimized filesystem walk for backup operations
US11726884B2 (en) 2013-12-23 2023-08-15 EMC IP Holding Company LLC Optimized filesystem walk for backup operations
US11016941B2 (en) 2014-02-28 2021-05-25 Red Hat, Inc. Delayed asynchronous file replication in a distributed file system
US10795723B2 (en) 2014-03-04 2020-10-06 Palantir Technologies Inc. Mobile tasks
US10025808B2 (en) 2014-03-19 2018-07-17 Red Hat, Inc. Compacting change logs using file content location identifiers
US11064025B2 (en) 2014-03-19 2021-07-13 Red Hat, Inc. File replication using file content location identifiers
US20150269214A1 (en) * 2014-03-19 2015-09-24 Red Hat, Inc. Identifying files in change logs using file content location identifiers
US9965505B2 (en) * 2014-03-19 2018-05-08 Red Hat, Inc. Identifying files in change logs using file content location identifiers
US9986029B2 (en) 2014-03-19 2018-05-29 Red Hat, Inc. File replication using file content location identifiers
US20160078076A1 (en) * 2014-09-15 2016-03-17 International Business Machines Corporation Parallel container and record organization
US20160078026A1 (en) * 2014-09-15 2016-03-17 International Business Machines Corporation Parallel container and record organization
US10324898B2 (en) * 2014-09-15 2019-06-18 International Business Machines Corporation Parallel container and record organization
US9619476B2 (en) * 2014-09-15 2017-04-11 International Business Machines Corporation Parallel container and record organization
US10459619B2 (en) 2015-03-16 2019-10-29 Palantir Technologies Inc. Interactive user interfaces for location-based data analysis
US9891808B2 (en) 2015-03-16 2018-02-13 Palantir Technologies Inc. Interactive user interfaces for location-based data analysis
US10437850B1 (en) 2015-06-03 2019-10-08 Palantir Technologies Inc. Server implemented geographic information system with graphical interface
US20180137134A1 (en) * 2015-07-14 2018-05-17 Alibaba Group Holding Limited Data snapshot acquisition method and system
US10444941B2 (en) 2015-08-17 2019-10-15 Palantir Technologies Inc. Interactive geospatial map
US9600146B2 (en) 2015-08-17 2017-03-21 Palantir Technologies Inc. Interactive geospatial map
US10444940B2 (en) 2015-08-17 2019-10-15 Palantir Technologies Inc. Interactive geospatial map
US10706434B1 (en) 2015-09-01 2020-07-07 Palantir Technologies Inc. Methods and systems for determining location information
US9639580B1 (en) 2015-09-04 2017-05-02 Palantir Technologies, Inc. Computer-implemented systems and methods for data management and visualization
US9996553B1 (en) 2015-09-04 2018-06-12 Palantir Technologies Inc. Computer-implemented systems and methods for data management and visualization
US11238632B2 (en) 2015-12-21 2022-02-01 Palantir Technologies Inc. Interface to index and display geospatial data
US10109094B2 (en) 2015-12-21 2018-10-23 Palantir Technologies Inc. Interface to index and display geospatial data
US10733778B2 (en) 2015-12-21 2020-08-04 Palantir Technologies Inc. Interface to index and display geospatial data
US10346799B2 (en) 2016-05-13 2019-07-09 Palantir Technologies Inc. System to catalogue tracking data
US11652880B2 (en) 2016-08-02 2023-05-16 Palantir Technologies Inc. Mapping content delivery
US10896208B1 (en) 2016-08-02 2021-01-19 Palantir Technologies Inc. Mapping content delivery
US11443061B2 (en) 2016-10-13 2022-09-13 Commvault Systems, Inc. Data protection within an unsecured storage environment
US10540516B2 (en) 2016-10-13 2020-01-21 Commvault Systems, Inc. Data protection within an unsecured storage environment
US11042959B2 (en) 2016-12-13 2021-06-22 Palantir Technologies Inc. Zoom-adaptive data granularity to achieve a flexible high-performance interface for a geospatial mapping system
US11663694B2 (en) 2016-12-13 2023-05-30 Palantir Technologies Inc. Zoom-adaptive data granularity to achieve a flexible high-performance interface for a geospatial mapping system
US10515433B1 (en) 2016-12-13 2019-12-24 Palantir Technologies Inc. Zoom-adaptive data granularity to achieve a flexible high-performance interface for a geospatial mapping system
US10541959B2 (en) 2016-12-20 2020-01-21 Palantir Technologies Inc. Short message communication within a mobile graphical map
US10270727B2 (en) 2016-12-20 2019-04-23 Palantir Technologies, Inc. Short message communication within a mobile graphical map
US20180181584A1 (en) * 2016-12-23 2018-06-28 Nexenta Systems, Inc. Method and system for maintaining and searching index records
US11054975B2 (en) 2017-03-23 2021-07-06 Palantir Technologies Inc. Systems and methods for production and display of dynamically linked slide presentations
US11487414B2 (en) 2017-03-23 2022-11-01 Palantir Technologies Inc. Systems and methods for production and display of dynamically linked slide presentations
US10579239B1 (en) 2017-03-23 2020-03-03 Palantir Technologies Inc. Systems and methods for production and display of dynamically linked slide presentations
US10915498B2 (en) 2017-03-30 2021-02-09 International Business Machines Corporation Dynamically managing a high speed storage tier of a data storage system
US10795575B2 (en) * 2017-03-31 2020-10-06 International Business Machines Corporation Dynamically reacting to events within a data storage system
US20180285004A1 (en) * 2017-03-31 2018-10-04 International Business Machines Corporation Dynamically reacting to events within a data storage system
US10984041B2 (en) 2017-05-11 2021-04-20 Commvault Systems, Inc. Natural language processing integrated with database and data storage management
US10895946B2 (en) 2017-05-30 2021-01-19 Palantir Technologies Inc. Systems and methods for using tiled data
US11809682B2 (en) 2017-05-30 2023-11-07 Palantir Technologies Inc. Systems and methods for visually presenting geospatial information
US11334216B2 (en) 2017-05-30 2022-05-17 Palantir Technologies Inc. Systems and methods for visually presenting geospatial information
US10769103B1 (en) * 2017-10-06 2020-09-08 EMC IP Holding Company LLC Efficient content indexing of incremental block-based backups
US10371537B1 (en) 2017-11-29 2019-08-06 Palantir Technologies Inc. Systems and methods for flexible route planning
US11199416B2 (en) 2017-11-29 2021-12-14 Palantir Technologies Inc. Systems and methods for flexible route planning
US11599706B1 (en) 2017-12-06 2023-03-07 Palantir Technologies Inc. Systems and methods for providing a view of geospatial information
US10698756B1 (en) 2017-12-15 2020-06-30 Palantir Technologies Inc. Linking related events for various devices and services in computer log files on a centralized server
US10642886B2 (en) 2018-02-14 2020-05-05 Commvault Systems, Inc. Targeted search of backup data using facial recognition
US10896234B2 (en) 2018-03-29 2021-01-19 Palantir Technologies Inc. Interactive geographical map
US10830599B2 (en) 2018-04-03 2020-11-10 Palantir Technologies Inc. Systems and methods for alternative projections of geographical information
US11280626B2 (en) 2018-04-03 2022-03-22 Palantir Technologies Inc. Systems and methods for alternative projections of geographical information
US11774254B2 (en) 2018-04-03 2023-10-03 Palantir Technologies Inc. Systems and methods for alternative projections of geographical information
US11585672B1 (en) 2018-04-11 2023-02-21 Palantir Technologies Inc. Three-dimensional representations of routes
US11703339B2 (en) 2018-05-29 2023-07-18 Palantir Technologies Inc. Terrain analysis for automatic route determination
US10697788B2 (en) 2018-05-29 2020-06-30 Palantir Technologies Inc. Terrain analysis for automatic route determination
US11274933B2 (en) 2018-05-29 2022-03-15 Palantir Technologies Inc. Terrain analysis for automatic route determination
US10429197B1 (en) 2018-05-29 2019-10-01 Palantir Technologies Inc. Terrain analysis for automatic route determination
US11159469B2 (en) 2018-09-12 2021-10-26 Commvault Systems, Inc. Using machine learning to modify presentation of mailbox objects
US11138342B2 (en) 2018-10-24 2021-10-05 Palantir Technologies Inc. Approaches for managing restrictions for middleware applications
US10467435B1 (en) 2018-10-24 2019-11-05 Palantir Technologies Inc. Approaches for managing restrictions for middleware applications
US11681829B2 (en) 2018-10-24 2023-06-20 Palantir Technologies Inc. Approaches for managing restrictions for middleware applications
US11025672B2 (en) 2018-10-25 2021-06-01 Palantir Technologies Inc. Approaches for securing middleware data access
US11818171B2 (en) 2018-10-25 2023-11-14 Palantir Technologies Inc. Approaches for securing middleware data access
US11500817B2 (en) * 2020-05-11 2022-11-15 Cohesity, Inc. Asynchronous deletion of large directories
US20210349853A1 (en) * 2020-05-11 2021-11-11 Cohesity, Inc. Asynchronous deletion of large directories
US11494417B2 (en) 2020-08-07 2022-11-08 Commvault Systems, Inc. Automated email classification in an information management system

Also Published As

Publication number Publication date
US20130297610A1 (en) 2013-11-07
US20080201366A1 (en) 2008-08-21
US8548965B2 (en) 2013-10-01

Similar Documents

Publication Publication Date Title
US8548965B2 (en) Changed files list with time buckets for efficient storage management
US6496944B1 (en) Method for database assisted file system restore
US8250033B1 (en) Replication of a data set using differential snapshots
US11914485B2 (en) Restoration of specified content from an archive
US8965850B2 (en) Method of and system for merging, storing and retrieving incremental backup data
US7860907B2 (en) Data processing
US8484172B2 (en) Efficient search for migration and purge candidates
US8874517B2 (en) Summarizing file system operations with a file system journal
US9189342B1 (en) Generic process for determining child to parent inheritance for fast provisioned or linked clone virtual machines
US8433863B1 (en) Hybrid method for incremental backup of structured and unstructured files
US8341123B2 (en) Event structured file system (ESFS)
US20160283501A1 (en) Posix-compatible file system, method of creating a file list and storage device
US9043280B1 (en) System and method to repair file system metadata
KR20090110823A (en) System for automatically shadowing data and file directory structures that are recorded on a computer memory
US8090925B2 (en) Storing data streams in memory based on upper and lower stream size thresholds
CA2534170A1 (en) Event structured file system (esfs)
WO2008001094A1 (en) Data processing
CN112800019A (en) Data backup method and system based on Hadoop distributed file system
EP1977348A1 (en) Event structured file system (esfs)
US8176087B2 (en) Data processing
US7752176B1 (en) Selective data restoration
RU2406118C2 (en) Method and system for synthetic backup and restoration of data
US8886656B2 (en) Data processing
US8290993B2 (en) Data processing
KR101335881B1 (en) Method of file recovery

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DEVARAKONDA, MURTHY V.;FILZ, FRANK STEWART;KAPLAN, MARC ADAM;AND OTHERS;REEL/FRAME:016273/0177;SIGNING DATES FROM 20050510 TO 20050511

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION