US20060085394A1 - Methods and apparatus for processing a database query - Google Patents

Methods and apparatus for processing a database query Download PDF

Info

Publication number
US20060085394A1
US20060085394A1 US10/965,184 US96518404A US2006085394A1 US 20060085394 A1 US20060085394 A1 US 20060085394A1 US 96518404 A US96518404 A US 96518404A US 2006085394 A1 US2006085394 A1 US 2006085394A1
Authority
US
United States
Prior art keywords
query
data
result
database
computed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/965,184
Inventor
Eric Barsness
John Santosuosso
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Airbnb Inc
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp filed Critical International Business Machines Corp
Priority to US10/965,184 priority Critical patent/US20060085394A1/en
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION reassignment INTERNATIONAL BUSINESS MACHINES CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SANTOSUOSSO, JOHN MATTHEW, BARSNESS, ERIC LAWRENCE`
Priority to CNB2005101140113A priority patent/CN100440212C/en
Publication of US20060085394A1 publication Critical patent/US20060085394A1/en
Priority to US13/560,992 priority patent/US8515993B2/en
Assigned to Airbnb, Inc. reassignment Airbnb, Inc. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: INTERNATIONAL BUSINESS MACHINES CORPORATION
Abandoned legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24539Query rewriting; Transformation using cached or materialised query results
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24553Query execution of query operations
    • G06F16/24554Unary operations; Data partitioning operations
    • G06F16/24557Efficient disk access during query execution

Definitions

  • the present invention relates generally to computer systems, and more particularly to methods and apparatus for processing a database query.
  • a database frequently receives a query for information, such as a sum, average, or the like, from a range of data in a database.
  • information such as a sum, average, or the like
  • the database accesses every record, each of which includes one or more data entries, in the range of data to process the query.
  • a method for processing a database query results in large processing overhead.
  • a first method includes the steps of (1) pre-computing a query result for each of a plurality of whole segments of data included in a database; (2) receiving a query specifying a defined range of data in the database; (3) determining if any of the whole segments are within the defined range; (4) performing the query on any partial segments of data within the defined range; and (5) determining the result of the query based on the pre-computed query results for any whole segments determined to be within the defined range and the result of the query on any partial segments within the defined range.
  • a method of processing a database query for information includes the steps of (1) receiving a request to perform a query on a first portion of data in a database; (2) determining the result of the query on the first portion of the database is based on at least one pre-computed result, wherein the pre-computed result is a result of the query on a second portion of data in the database that is included in the first portion; (3) performing the query on a third portion of data in the database, wherein the third portion is a portion of the first portion other than the second portion; and (4) computing the result of the query on the first portion based on the pre-computed result and the result of the query on the third portion.
  • Each computer program product described herein may be carried by a medium readable by a computer (e.g., a carrier wave signal, a floppy disc, a compact disc, a DVD, a hard drive, a random access memory, etc.).
  • a medium readable by a computer e.g., a carrier wave signal, a floppy disc, a compact disc, a DVD, a hard drive, a random access memory, etc.
  • FIG. 1 is block diagram of an apparatus for processing a database query in accordance with an embodiment of the present invention.
  • FIG. 2 is a block diagram of an example of a database that may be used in an apparatus for processing a database query in accordance with an embodiment of the present invention.
  • FIG. 3 is a first exemplary method for processing a database query in accordance with an embodiment of the present invention.
  • FIG. 4 is a second exemplary method for processing a database query in accordance with an embodiment of the present invention.
  • the present invention provides methods and apparatus for efficiently processing a database query. More specifically, the present methods and apparatus avoid having to access a large portion of a range (e.g., the entire range) specified by the database query. Consequently, the present methods and apparatus provide performance gains for processing database queries compared to conventional database query processing.
  • a range e.g., the entire range
  • FIG. 1 is block diagram of an apparatus for processing a database query in accordance with an embodiment of the present invention.
  • the apparatus 100 for processing a database query includes a memory 102 for storing data coupled to a processor 104 .
  • the processor 104 is coupled to a storage device 106 , such as a disk drive or the like, which includes a database 108 . Details of the database 108 are described below with reference to FIG. 2 .
  • the processor 104 is adapted to execute code, which is adapted to efficiently query the database 108 . More specifically, the processor 104 is adapted to execute code adapted to perform a first and/or second exemplary method in accordance with an embodiment of the present invention. Details of the first and second exemplary methods are described below with reference to FIGS. 3 and 4 , respectively.
  • the apparatus 100 may include a communication (e.g., com) port 110 for communicating with other apparatus.
  • a communication e.g., com
  • FIG. 2 is a block diagram of an example of a database that may be used in an apparatus for processing a database query in accordance with an embodiment of the present invention.
  • the database 108 may store data relevant to, for example, a specific customer or application. Such data may be organized in a manner specified by a user, such as a database administrator.
  • the database 108 may include a first column 202 of data representing dates. More specifically, the first column 202 includes a data entry (e.g., in a row corresponding to the column) for each quarter from the first quarter of year 2000 to the third quarter of year 2004.
  • the database 108 includes a second column 204 of data representing revenue.
  • the second column 204 of data includes a data entry, corresponding to each quarter in the first column 202 , representing revenue.
  • the database 108 includes data based on (e.g., computed from) other data in the database 108 . More specifically, the database 108 may include a third column 206 of data, each entry of which represents total revenue for a year (e.g., a sum of the revenue data for each quarter of the year). Similarly, the database 108 may include a fourth column 208 of data, each entry of which represents average quarterly revenue for a year (e.g., an average of the revenue data for the quarters of the year). For example, each data entry in the third 206 and/or fourth column 208 is based on data entries in the second column 204 that correspond to quarters of a particular year from the first column 202 .
  • data in the database 108 may be grouped together to form one or more (e.g., a plurality) of whole segments 210 - 216 .
  • data in the database 108 of FIG. 2 includes four whole segments 210 - 216
  • data in the database 108 may include a larger or smaller number of whole segments.
  • a whole segment may include a larger or smaller amount of data than the whole segments 210 - 216 illustrated in FIG. 2 .
  • the structure of the database 108 (e.g., arrangement of data in rows and columns) described above is exemplary.
  • the database 108 may be structured in a different manner.
  • the database 108 may be a relational database, flat-file database, object database, distributed database, or the like.
  • the description of data populating the database 108 is exemplary.
  • the database 108 may include different data.
  • step 302 the method 300 begins.
  • step 304 a query result for each of a plurality of whole segments of data included in a database is pre-computed.
  • the database 108 may be grouped into one or more (e.g., a plurality of) whole segments.
  • data entries of the second column 204 of the database 108 that correspond to quarters from a particular year (e.g., 2000, 2001, 2002 or 2003) of the first column 202 may be grouped into respective whole segments 210 - 216 (shown in phantom). Therefore, the example database 108 includes four whole segments.
  • a query for information, such as a sum, average, or the like, from the database e.g., from a range of data in the database
  • the database accesses every record, which includes one or more entries, in the range of data to process the query.
  • a query for information such as a sum, average, or the like, is performed on each whole segment 210 - 216 in the database 108 (e.g., in advance of computing a result for the query for information from the range of data).
  • the result of the query on each of the plurality of whole segments 210 - 216 is computed before receiving and/or computing the result of a query for information from a range of data in the database (e.g., pre-computed).
  • Such result may be stored as an entry in the database 108 (e.g., in the third 206 or fourth column 208 of the database 108 ).
  • hybrid index would store the pre-computed values as metadata, and queries using the index would retrieve the metadata for whole segments from the index rather than directly from the table.
  • the sum of data in each whole segment 210 - 216 which represents total revenue for years 2000, 2001, 2002 and 2003, respectively, is computed. Additionally or alternatively, the average of data in each whole segment 210 - 216 , which represents average quarterly revenue for years 2000, 2001, 2002 and 2003, respectively, is computed.
  • the third column 206 of the database 108 stores annual data and the fourth column 208 of average quarterly data
  • the hybrid index may be configured to store data representing other periods of time (e.g., daily, monthly, etc.).
  • a query specifying a defined range of data in the database is received. More specifically, as stated above, a query for information such as a sum, average, or the like, from a range of data in the database may be received by the database. For example, the database may be queried for total revenue from the third quarter of year 2000 to the second quarter of year 2003.
  • step 308 it is determined if any whole segments are within the defined range. For example, the apparatus 100 determines that two whole segments 212 - 214 are within the predefined range. Because the result of such query for each whole segment 210 - 216 in the database 108 is pre-computed, by determining whether any whole segments 210 - 216 are within the defined range, the apparatus 100 may avoid having to access every record in the defined range to process the query. More specifically, the apparatus 100 may not need to access every record corresponding to such whole segments 212 - 214 (e.g., in response to the query) to process the query.
  • the query is performed on any partial segments of data within the defined range. More specifically, the apparatus 100 may determine the defined range includes one or more partial segments (e.g., parts of a whole segment) of data. In the example above, the apparatus 100 determines the defined range includes a part (e.g., revenue for the last two quarters of year 2000) of the whole segment 210 that represents revenue for year 2000 and includes a part (e.g., revenue for the first two quarters of year 2003) of the whole segment 216 that represents the revenue for year 2003. The apparatus 100 performs the query on a range defined by each such partial segment. For example, the apparatus 100 may determine total revenue for the third and fourth quarter of year 2000 by accessing the entries in such range. Similarly, the apparatus 100 may determine total revenue for the first and second quarter of year 2003 by accessing the entries in such range. In this manner, the result of the query on any partial segments of data within the defined range is determined.
  • the apparatus 100 may determine the defined range includes one or more partial segments (e.g., parts of a whole segment
  • the result of the query specifying the defined range is determined based on the pre-computed query results for any whole segments determined to be within the defined range and the result of the queries on any partial segments within the defined range. More specifically, the pre-computed query result for each whole section 212 - 214 determined to be within the defined range is accessed. For example, entries in the third column 206 corresponding to the total revenue for year 2001 (e.g., “8000”) and 2002 (e.g., “12000”) are accessed. Such entries are added to the result of the query on the partial segments to produce the result of the query specifying the defined range. In embodiments which employ a hybrid index, a pre-computed query result in the hybrid index is accessed. Thereafter, step 314 is performed. In step 314 , the method 300 ends.
  • step 402 the method 400 begins.
  • step 404 a request to perform a query on a first portion of data in a database is received.
  • the apparatus 100 may query a defined range of the database 108 for information.
  • the defined range serves as the first portion.
  • the apparatus 100 may query the database for total revenue from the third quarter of year 2000 to the second quarter of year 2003, in which case revenue entries from the third quarter of year 2000 to the second quarter of year 2003 are included in the first portion.
  • step 406 it is determined whether a result of the query has been pre-computed for at least one second portion of data in the database, wherein the second portion is included in the first portion.
  • a second portion may be, for example, quarterly revenue data for year 2001 and/or quarterly revenue data for year 2002, which is entirely included in first portion of data.
  • the second portion may have a pre-computed result associated with it. Therefore, in step 406 , the apparatus 100 may determine whether a total revenue for year 2001 (e.g., a sum of the quarterly revenues for year 2001) and/or quarterly revenue data for year 2002 has been pre-computed.
  • step 408 is performed.
  • step 410 is performed.
  • the query is performed on the second portion and the result of such query is stored as a pre-computed result.
  • the apparatus may query a second portion of data, which is entirely included in first portion of data, for information before arriving at the result of the query on the first portion of data.
  • the apparatus 100 may pre-compute total revenue for year 2001 by querying the database for total revenue from the first quarter of year 2001 to the fourth quarter of year 2001, in which case revenue entries from the first quarter of year 2001 to the fourth quarter of year 2001 are included in the second portion, and store the result of the query as a pre-computed result in a third column 206 of the database 108 .
  • the apparatus 100 may pre-compute total revenue for year 2002 by querying the database for total revenue from the first quarter of year 2002 to the fourth quarter of year 2002, in which case revenue entries from the first quarter of year 2002 to the fourth quarter of year 2002 are included in the second portion, and store the result of the query as pre-computed result in the third column 206 of the database 108 .
  • a query is performed on the second portion of data and the result of such query is stored as needed (e.g., only in response to a request to perform a query on the first portion of data), thereby avoiding unnecessary processing.
  • step 410 is performed.
  • step 410 it is determined that the result of the query on the first portion of the database is based on at least one pre-computed result.
  • the apparatus 100 may determine that pre-computed results may be employed to determine the query result.
  • step 412 the pre-computed result is retrieved.
  • the apparatus may access the third column 206 of the database 108 and retrieve the value of the total revenue for year 2001 and/or year 2002.
  • the query is performed on a third portion of data in the database, wherein the third portion is a portion of the first data other than the second portion.
  • the apparatus 100 may query the database for total revenue for the third and fourth quarters of year 2000 and the first and second quarters of year 2003, in which case revenue entries from the third and fourth quarters of year 2000 and the first and second quarters of year 2003 are included in the third portion.
  • step 416 the result of the query on the first portion is computed based on the at least one pre-computed result and the result of the query on the third portion of data. For example, the at least one pre-computed result is added to the result of the query on the third portion of data to yield the result of the query on the first portion of data in the database. Thereafter, step 418 is performed. In step 418 , the method 400 ends.
  • a result of a query which specifies a defined range of data in a database, may be determined without accessing every record corresponding to the defined range in response to the query.
  • the method 300 and method 400 may efficiently query a database, thereby providing performance gains over conventional database queries.
  • the apparatus may determine whether to continue storing the results based upon such data. In this manner, unused data may not unnecessarily occupy database space.
  • a threshold age may be employed such that only pre-computed results performed on a recent date range of data (e.g., defined by the threshold age, such as data stored within the last year, month, week, etc.) are stored. For example, if the database includes a long history of data but only data within a recent date range is accessed, according to the present methods and apparatus, only pre-computed results performed on such date range are stored, thereby saving database space. Pre-computed results performed on older date ranges may be removed.

Abstract

In a first aspect, a method is provided that includes the steps of (1) pre-computing a query result for each of a plurality of whole segments of data included in a database; (2) receiving a query specifying a defined range of data in the database; (3) determining if any of the whole segments are within the defined range; (4) performing the query on any partial segments of data within the defined range; and (5) determining the result of the query based on the pre-computed query results for any whole segments determined to be within the defined range and the result of the query on any partial segments within the defined range. Numerous other aspects are provided.

Description

    FIELD OF THE INVENTION
  • The present invention relates generally to computer systems, and more particularly to methods and apparatus for processing a database query.
  • BACKGROUND
  • A database frequently receives a query for information, such as a sum, average, or the like, from a range of data in a database. Conventionally, in response to such query, the database accesses every record, each of which includes one or more data entries, in the range of data to process the query. However, such a method for processing a database query results in large processing overhead.
  • Accordingly, methods and apparatus for improved processing of a database query are desired.
  • SUMMARY OF THE INVENTION
  • In a first aspect of the invention, a first method is provided. The first method includes the steps of (1) pre-computing a query result for each of a plurality of whole segments of data included in a database; (2) receiving a query specifying a defined range of data in the database; (3) determining if any of the whole segments are within the defined range; (4) performing the query on any partial segments of data within the defined range; and (5) determining the result of the query based on the pre-computed query results for any whole segments determined to be within the defined range and the result of the query on any partial segments within the defined range.
  • In a second aspect of the invention, a method of processing a database query for information is provided. The method includes the steps of (1) receiving a request to perform a query on a first portion of data in a database; (2) determining the result of the query on the first portion of the database is based on at least one pre-computed result, wherein the pre-computed result is a result of the query on a second portion of data in the database that is included in the first portion; (3) performing the query on a third portion of data in the database, wherein the third portion is a portion of the first portion other than the second portion; and (4) computing the result of the query on the first portion based on the pre-computed result and the result of the query on the third portion. Numerous other aspects are provided, as are methods, systems, apparatus and computer program products in accordance with these other aspects of the invention. Each computer program product described herein may be carried by a medium readable by a computer (e.g., a carrier wave signal, a floppy disc, a compact disc, a DVD, a hard drive, a random access memory, etc.).
  • Other features and aspects of the present invention will become more fully apparent from the following detailed description, the appended claims and the accompanying drawings.
  • BRIEF DESCRIPTION OF THE FIGURES
  • FIG. 1 is block diagram of an apparatus for processing a database query in accordance with an embodiment of the present invention.
  • FIG. 2 is a block diagram of an example of a database that may be used in an apparatus for processing a database query in accordance with an embodiment of the present invention.
  • FIG. 3 is a first exemplary method for processing a database query in accordance with an embodiment of the present invention.
  • FIG. 4 is a second exemplary method for processing a database query in accordance with an embodiment of the present invention.
  • DETAILED DESCRIPTION
  • The present invention provides methods and apparatus for efficiently processing a database query. More specifically, the present methods and apparatus avoid having to access a large portion of a range (e.g., the entire range) specified by the database query. Consequently, the present methods and apparatus provide performance gains for processing database queries compared to conventional database query processing.
  • FIG. 1 is block diagram of an apparatus for processing a database query in accordance with an embodiment of the present invention. With reference to FIG. 1, the apparatus 100 for processing a database query includes a memory 102 for storing data coupled to a processor 104. The processor 104 is coupled to a storage device 106, such as a disk drive or the like, which includes a database 108. Details of the database 108 are described below with reference to FIG. 2. The processor 104 is adapted to execute code, which is adapted to efficiently query the database 108. More specifically, the processor 104 is adapted to execute code adapted to perform a first and/or second exemplary method in accordance with an embodiment of the present invention. Details of the first and second exemplary methods are described below with reference to FIGS. 3 and 4, respectively. Additionally, the apparatus 100 may include a communication (e.g., com) port 110 for communicating with other apparatus.
  • FIG. 2 is a block diagram of an example of a database that may be used in an apparatus for processing a database query in accordance with an embodiment of the present invention. With reference to FIG. 2, the database 108 may store data relevant to, for example, a specific customer or application. Such data may be organized in a manner specified by a user, such as a database administrator. For example, the database 108 may include a first column 202 of data representing dates. More specifically, the first column 202 includes a data entry (e.g., in a row corresponding to the column) for each quarter from the first quarter of year 2000 to the third quarter of year 2004. The database 108 includes a second column 204 of data representing revenue. For example, the second column 204 of data includes a data entry, corresponding to each quarter in the first column 202, representing revenue.
  • Further, the database 108 includes data based on (e.g., computed from) other data in the database 108. More specifically, the database 108 may include a third column 206 of data, each entry of which represents total revenue for a year (e.g., a sum of the revenue data for each quarter of the year). Similarly, the database 108 may include a fourth column 208 of data, each entry of which represents average quarterly revenue for a year (e.g., an average of the revenue data for the quarters of the year). For example, each data entry in the third 206 and/or fourth column 208 is based on data entries in the second column 204 that correspond to quarters of a particular year from the first column 202. As described further below, data in the database 108 may be grouped together to form one or more (e.g., a plurality) of whole segments 210-216. Although data in the database 108 of FIG. 2 includes four whole segments 210-216, data in the database 108 may include a larger or smaller number of whole segments. Further, a whole segment may include a larger or smaller amount of data than the whole segments 210-216 illustrated in FIG. 2.
  • The structure of the database 108 (e.g., arrangement of data in rows and columns) described above is exemplary. The database 108 may be structured in a different manner. For example, the database 108 may be a relational database, flat-file database, object database, distributed database, or the like. Further, the description of data populating the database 108 is exemplary. The database 108 may include different data.
  • The operation of the apparatus 100 for processing a database query is now described with reference to FIG. 2 and with reference to FIG. 3, which illustrates a first exemplary method 300 for processing a database query in accordance with an embodiment of the present invention. With reference to FIG. 3, in step 302, the method 300 begins. In step 304, a query result for each of a plurality of whole segments of data included in a database is pre-computed. As stated, the database 108 may be grouped into one or more (e.g., a plurality of) whole segments. For example, data entries of the second column 204 of the database 108 that correspond to quarters from a particular year (e.g., 2000, 2001, 2002 or 2003) of the first column 202 may be grouped into respective whole segments 210-216 (shown in phantom). Therefore, the example database 108 includes four whole segments. A query for information, such as a sum, average, or the like, from the database (e.g., from a range of data in the database) are frequently received. Conventionally, in response to such query, the database accesses every record, which includes one or more entries, in the range of data to process the query. However, in step 304, a query for information, such as a sum, average, or the like, is performed on each whole segment 210-216 in the database 108 (e.g., in advance of computing a result for the query for information from the range of data). In this manner, the result of the query on each of the plurality of whole segments 210-216 is computed before receiving and/or computing the result of a query for information from a range of data in the database (e.g., pre-computed). Such result may be stored as an entry in the database 108 (e.g., in the third 206 or fourth column 208 of the database 108). While this example shows additional columns explicitly added to the database table for use in storing pre-computed values, a hybrid index could be used instead. The hybrid index would store the pre-computed values as metadata, and queries using the index would retrieve the metadata for whole segments from the index rather than directly from the table.
  • For example, the sum of data in each whole segment 210-216, which represents total revenue for years 2000, 2001, 2002 and 2003, respectively, is computed. Additionally or alternatively, the average of data in each whole segment 210-216, which represents average quarterly revenue for years 2000, 2001, 2002 and 2003, respectively, is computed. Although, the third column 206 of the database 108 stores annual data and the fourth column 208 of average quarterly data, the hybrid index may be configured to store data representing other periods of time (e.g., daily, monthly, etc.).
  • In step 306, a query specifying a defined range of data in the database is received. More specifically, as stated above, a query for information such as a sum, average, or the like, from a range of data in the database may be received by the database. For example, the database may be queried for total revenue from the third quarter of year 2000 to the second quarter of year 2003.
  • In step 308, it is determined if any whole segments are within the defined range. For example, the apparatus 100 determines that two whole segments 212-214 are within the predefined range. Because the result of such query for each whole segment 210-216 in the database 108 is pre-computed, by determining whether any whole segments 210-216 are within the defined range, the apparatus 100 may avoid having to access every record in the defined range to process the query. More specifically, the apparatus 100 may not need to access every record corresponding to such whole segments 212-214 (e.g., in response to the query) to process the query.
  • In step 310, the query is performed on any partial segments of data within the defined range. More specifically, the apparatus 100 may determine the defined range includes one or more partial segments (e.g., parts of a whole segment) of data. In the example above, the apparatus 100 determines the defined range includes a part (e.g., revenue for the last two quarters of year 2000) of the whole segment 210 that represents revenue for year 2000 and includes a part (e.g., revenue for the first two quarters of year 2003) of the whole segment 216 that represents the revenue for year 2003. The apparatus 100 performs the query on a range defined by each such partial segment. For example, the apparatus 100 may determine total revenue for the third and fourth quarter of year 2000 by accessing the entries in such range. Similarly, the apparatus 100 may determine total revenue for the first and second quarter of year 2003 by accessing the entries in such range. In this manner, the result of the query on any partial segments of data within the defined range is determined.
  • In step 312, the result of the query specifying the defined range is determined based on the pre-computed query results for any whole segments determined to be within the defined range and the result of the queries on any partial segments within the defined range. More specifically, the pre-computed query result for each whole section 212-214 determined to be within the defined range is accessed. For example, entries in the third column 206 corresponding to the total revenue for year 2001 (e.g., “8000”) and 2002 (e.g., “12000”) are accessed. Such entries are added to the result of the query on the partial segments to produce the result of the query specifying the defined range. In embodiments which employ a hybrid index, a pre-computed query result in the hybrid index is accessed. Thereafter, step 314 is performed. In step 314, the method 300 ends.
  • The operation of the apparatus 100 for processing a database query is now described with reference to FIG. 2 and with reference to FIG. 4, which illustrates a second exemplary method 400 for processing a database query in accordance with an embodiment of the present invention. With reference to FIG. 4, in step 402, the method 400 begins. In step 404, a request to perform a query on a first portion of data in a database is received. More specifically, the apparatus 100 may query a defined range of the database 108 for information. The defined range serves as the first portion. For example, the apparatus 100 may query the database for total revenue from the third quarter of year 2000 to the second quarter of year 2003, in which case revenue entries from the third quarter of year 2000 to the second quarter of year 2003 are included in the first portion.
  • In step 406, it is determined whether a result of the query has been pre-computed for at least one second portion of data in the database, wherein the second portion is included in the first portion. A second portion may be, for example, quarterly revenue data for year 2001 and/or quarterly revenue data for year 2002, which is entirely included in first portion of data. The second portion may have a pre-computed result associated with it. Therefore, in step 406, the apparatus 100 may determine whether a total revenue for year 2001 (e.g., a sum of the quarterly revenues for year 2001) and/or quarterly revenue data for year 2002 has been pre-computed. If it is determined in step 406 that a result of the query has not been pre-computed for at least one second portion of data in the database, step 408 is performed. Alternatively, if it is determined a result of the query has been pre-computed for at least one second portion of data in the database, step 410 is performed.
  • In step 408, the query is performed on the second portion and the result of such query is stored as a pre-computed result. For example, similar to step 304 of method 300, the apparatus may query a second portion of data, which is entirely included in first portion of data, for information before arriving at the result of the query on the first portion of data. For example, the apparatus 100 may pre-compute total revenue for year 2001 by querying the database for total revenue from the first quarter of year 2001 to the fourth quarter of year 2001, in which case revenue entries from the first quarter of year 2001 to the fourth quarter of year 2001 are included in the second portion, and store the result of the query as a pre-computed result in a third column 206 of the database 108. Similarly, the apparatus 100 may pre-compute total revenue for year 2002 by querying the database for total revenue from the first quarter of year 2002 to the fourth quarter of year 2002, in which case revenue entries from the first quarter of year 2002 to the fourth quarter of year 2002 are included in the second portion, and store the result of the query as pre-computed result in the third column 206 of the database 108. In this manner, a query is performed on the second portion of data and the result of such query is stored as needed (e.g., only in response to a request to perform a query on the first portion of data), thereby avoiding unnecessary processing. Thereafter, step 410 is performed.
  • In step 410, it is determined that the result of the query on the first portion of the database is based on at least one pre-computed result. Upon receiving a request to perform the query on the first portion of data in the database, the apparatus 100 may determine that pre-computed results may be employed to determine the query result.
  • In step 412, the pre-computed result is retrieved. For example, the apparatus may access the third column 206 of the database 108 and retrieve the value of the total revenue for year 2001 and/or year 2002.
  • In step 414, the query is performed on a third portion of data in the database, wherein the third portion is a portion of the first data other than the second portion. For example, the apparatus 100 may query the database for total revenue for the third and fourth quarters of year 2000 and the first and second quarters of year 2003, in which case revenue entries from the third and fourth quarters of year 2000 and the first and second quarters of year 2003 are included in the third portion.
  • In step 416, the result of the query on the first portion is computed based on the at least one pre-computed result and the result of the query on the third portion of data. For example, the at least one pre-computed result is added to the result of the query on the third portion of data to yield the result of the query on the first portion of data in the database. Thereafter, step 418 is performed. In step 418, the method 400 ends.
  • Through use of the method 300 or method 400, a result of a query, which specifies a defined range of data in a database, may be determined without accessing every record corresponding to the defined range in response to the query. In this manner, the method 300 and method 400 may efficiently query a database, thereby providing performance gains over conventional database queries.
  • The foregoing description discloses only exemplary embodiments of the invention. Modifications of the above disclosed apparatus and methods which fall within the scope of the invention will be readily apparent to those of ordinary skill in the art. For instance, although in one or more embodiments above, data in the database was grouped into segments or portions by date, the data in the database may be grouped into segments or portions based on another parameter associated with the data. Further, in one or more embodiments, data in the whole segments or second portion, and therefore, results based on such data, which is stored in a hybrid index, may be monitored to determine how frequently such data is accessed. Data used infrequently may be discarded. In some embodiments, an algorithm may be used to determine how frequently such data is determined. Based upon the monitoring, the apparatus may determine whether to continue storing the results based upon such data. In this manner, unused data may not unnecessarily occupy database space. Further, a threshold age may be employed such that only pre-computed results performed on a recent date range of data (e.g., defined by the threshold age, such as data stored within the last year, month, week, etc.) are stored. For example, if the database includes a long history of data but only data within a recent date range is accessed, according to the present methods and apparatus, only pre-computed results performed on such date range are stored, thereby saving database space. Pre-computed results performed on older date ranges may be removed.
  • Accordingly, while the present invention has been disclosed in connection with exemplary embodiments thereof, it should be understood that other embodiments may fall within the spirit and scope of the invention, as defined by the following claims.

Claims (42)

1. A method, comprising:
pre-computing a query result for each of a plurality of whole segments of data included in a database;
receiving a query specifying a defined range of data in the database;
determining if any of the whole segments are within the defined range;
performing the query on any partial segments of data within the defined range; and
determining the result of the query based on the pre-computed query results for any whole segments determined to be within the defined range and the result of the query on any partial segments within the defined range.
2. The method of claim 1 wherein determining the result of the query based on the pre-computed query results for any whole segments determined to be within the defined range and the result of the query on any partial segments within the defined range includes determining the result of the query by retrieving the pre-computed query results and determining the result of the query on any partial segments of data within the defined range.
3. The method of claim 1 further comprising:
storing the pre-computed query results in the database; and
removing pre-computed query results performed on a date range of data older than a specified age from the database.
4. The method of claim 1 wherein data in the database is grouped into one or more whole segments and partial segments based on a date associated with the data.
5. The method of claim 1 wherein pre-computing the query result for each of a plurality of whole segments of data included in a database includes pre-computing the query result for each of a plurality of whole segments of data included in a database in response to receiving the query specifying the defined range of data in the database.
6. A method of processing a database query for information, comprising:
receiving a request to perform a query on a first portion of data in a database;
determining the result of the query on the first portion of the database is based on at least one pre-computed result, wherein the pre-computed result is a result of the query on a second portion of data in the database that is included in the first portion;
performing the query on a third portion of data in the database, wherein the third portion is a portion of the first portion other than the second portion; and
computing the result of the query on the first portion based on the pre-computed result and the result of the query on the third portion.
7. The method of claim 6 wherein computing the result of the query on the first portion based on the pre-computed result and the result of the query on the third portion includes merging the pre-computed result and the result of the query on the third portion of data.
8. The method of claim 6 wherein the second portion of data is larger than the third portion of data.
9. The method of claim 6 wherein computing the result of the query on the first portion based on the pre-computed result and the result of the query on the third portion includes computing the result of the query on the first portion by retrieving the pre-computed result and computing the result of the query on the third portion.
10. The method of claim 6 further comprising:
storing the pre computed query results in the database; and
removing pre-computed query results performed on a date range of data older than a specified age from the database.
11. The method of claim 6 wherein the data in the database is grouped into the first, second and third portion of data based on a date associated with the data.
12. The method of claim 11 wherein at least one of a day, month and year is associated with the second portion of data.
13. The method of claim 6 wherein the pre-computed result is computed in response to receiving the request to perform a query on the first portion of data in the database.
14. The method of claim 6 further comprising:
monitoring the pre-computed result; and
determining whether to continue to store the pre-computed result based on the monitoring.
15. An apparatus, comprising:
a database for storing data;
a processor coupled to the database and adapted to execute code adapted to:
pre-compute a query result for each of a plurality of whole segments of data included in the database;
receive a query specifying a defined range of data in the database;
determine if any of the whole segments are within the defined range;
perform the query on any partial segments of data within the defined range; and
determine the result of the query based on the pre-computed query results for any whole segments determined to be within the defined range and the result of the query on any partial segments within the defined range.
16. The apparatus of claim 15 wherein the code is further adapted to determine the result of the query by retrieving the pre-computed query results and determining the result of the query on any partial segments of data within the defined range.
17. The apparatus of claim 15 wherein the code is further adapted to:
store the pre-computed query results in the database; and
remove pre-computed query results performed on a date range of data older than a specified age from the database.
18. The apparatus of claim 15 wherein data in the database is grouped into one or more whole segments and partial segments based on a date associated with the data.
19. The apparatus of claim 15 wherein the code is further adapted to pre-compute a query result for each of a plurality of whole segments of data included in a database in response to receiving the query specifying the defined range of data in the database.
20. An apparatus for processing a database query, comprising:
a database for storing data;
a processor coupled to the database and adapted to execute code adapted to:
receive a request to perform a query on a first portion of data in the database;
determine the result of the query on the first portion of the database is based on a pre-computed result, wherein the pre-computed result is a result of the query on a second portion of data in the database that is included in the first portion;
perform the query on a third portion of data in the database, wherein the third portion is a portion of the first portion other than the second portion; and
compute the result of the query on the first portion based on the pre-computed result and the result of the query on the third portion.
21. The apparatus of claim 20 wherein the code is further adapted to merge the pre-computed result and the result of the query on the third portion of data.
22. The apparatus of claim 20 wherein the second portion of data is larger than the third portion of data.
23. The apparatus of claim 20 wherein the code is further adapted to compute the result of the query on the first portion by retrieving the pre-computed result and computing the result of the query on the third portion.
24. The apparatus of claim 20 wherein the code is further adapted to:
store the pre-computed query results in the database; and
remove pre-computed query results performed on a date range of data older than a specified age from the database.
25. The apparatus of claim 20 wherein the data in the database is grouped into the first, second and third portion of data based on a date associated with the data.
26. The apparatus of claim 25 wherein at least one of a day, month and year is associated with the second portion of data.
27. The apparatus of claim 20 wherein the code is further adapted to compute the pre-computed result in response to the request to perform a query on the first portion of data in the database.
28. The apparatus of claim 20 wherein the code is further adapted to:
monitor the pre-computed result; and
determine whether to continue to store the pre-computed result based on the monitoring.
29. A computer program product, comprising:
a medium readable by a computer, the computer readable medium having computer program code adapted to:
pre-compute a query result for each of a plurality of whole segments of data included in a database;
receive a query specifying a defined range of data in the database;
determine if any of the whole segments are within the defined range;
perform the query on any partial segments of data within the defined range; and
determine the result of the query based on the pre-computed query results for any whole segments determined to be within the defined range and the result of the query on any partial segments within the defined range.
30. The computer program product of claim 29 wherein the computer program code is further adapted to determine the result of the query by retrieving the pre-computed query results and determining the result of the query on any partial segments of data within the defined range.
31. The computer program product of claim 29 wherein the computer program code is further adapted to:
store the pre-computed query results in the database; and
remove pre-computed query results performed on a date range of data older than a specified age from the database.
32. The computer program product of claim 29 wherein data in the database is grouped into one or more whole segments and partial segments based on a date associated with the data.
33. The computer program product of claim 29 wherein the computer program code is further adapted to compute the pre-computed query result in response to receiving the query specifying the defined range of data in the database.
34. A computer program product, comprising:
a medium readable by a computer, the computer readable medium having computer program code adapted to:
receive a request to perform a query on a first portion of data in a database;
determine the result of the query on the first portion of the database is based on a pre-computed result, wherein the pre-computed result is a result of the query on a second portion of data in the database that is included in the first portion;
perform the query on a third portion of data in the database, wherein the third portion is a portion of the first portion other than the second portion; and
compute the result of the query on the first portion based on the pre-computed result and the result of the query on the third portion.
35. The computer program product of claim 34 wherein the computer program code is further adapted to merge the pre-computed result and the result of the query on the third portion of data.
36. The computer program product of claim 34 wherein the second portion of data is larger than the third portion of data.
37. The computer program product of claim 34 wherein the computer program code is further adapted to compute the result of the query on the first portion by retrieving the pre-computed result and computing the result of the query on the third portion.
38. The computer program product of claim 34 wherein the computer program code is further adapted to:
store the pre-computed query results in the database; and
remove pre-computed query results performed on a date range of data older than a specified age from the database.
39. The computer program product of claim 34 wherein the data in the database is grouped into the first, second and third portion of data based on a date associated with the data.
40. The computer program product of claim 39 wherein at least one of a day, month and year is associated with the second portion of data.
41. The computer program product of claim 39 wherein the computer program code is further adapted to compute the pre-computed result in response to receiving the request to perform a query on the first portion of data in the database.
42. The computer program product of claim 39 wherein the computer program code is further adapted to:
monitor the pre-computed result; and
determine whether to continue to store the pre-computed result based on the monitoring.
US10/965,184 2004-10-14 2004-10-14 Methods and apparatus for processing a database query Abandoned US20060085394A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US10/965,184 US20060085394A1 (en) 2004-10-14 2004-10-14 Methods and apparatus for processing a database query
CNB2005101140113A CN100440212C (en) 2004-10-14 2005-10-13 Methods and apparatus for processing a database query
US13/560,992 US8515993B2 (en) 2004-10-14 2012-07-27 Methods and apparatus for processing a database query

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/965,184 US20060085394A1 (en) 2004-10-14 2004-10-14 Methods and apparatus for processing a database query

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/560,992 Continuation US8515993B2 (en) 2004-10-14 2012-07-27 Methods and apparatus for processing a database query

Publications (1)

Publication Number Publication Date
US20060085394A1 true US20060085394A1 (en) 2006-04-20

Family

ID=36182006

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/965,184 Abandoned US20060085394A1 (en) 2004-10-14 2004-10-14 Methods and apparatus for processing a database query
US13/560,992 Active US8515993B2 (en) 2004-10-14 2012-07-27 Methods and apparatus for processing a database query

Family Applications After (1)

Application Number Title Priority Date Filing Date
US13/560,992 Active US8515993B2 (en) 2004-10-14 2012-07-27 Methods and apparatus for processing a database query

Country Status (2)

Country Link
US (2) US20060085394A1 (en)
CN (1) CN100440212C (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100325151A1 (en) * 2009-06-19 2010-12-23 Jorg Heuer Method and apparatus for searching in a memory-efficient manner for at least one query data element
US8306953B2 (en) 2010-08-31 2012-11-06 International Business Machines Corporation Online management of historical data for efficient reporting and analytics
US9395966B1 (en) * 2010-09-27 2016-07-19 Symantec Corporation Systems and methods for associating installed software components with software products
US11169990B2 (en) * 2018-09-20 2021-11-09 Amadeus S.A.S. Re-computing pre-computed search results

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577455A (en) * 2012-07-31 2014-02-12 国际商业机器公司 Data processing method and system for database aggregating operation
US9703825B2 (en) 2013-10-17 2017-07-11 Sybase, Inc. Maintenance of a pre-computed result set
CN105574060A (en) * 2015-01-13 2016-05-11 北京中体骏彩信息技术有限公司 Lottery statistic data extraction method

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5511190A (en) * 1995-01-20 1996-04-23 Tandem Computers, Inc. Hash-based database grouping system and method
US5781896A (en) * 1994-06-30 1998-07-14 Microsoft Corporation Method and system for efficiently performing database table aggregation using an aggregation index
US6009271A (en) * 1996-10-28 1999-12-28 Bmc Software, Inc. Method of retrieving data from a relational database
US20020107835A1 (en) * 2001-02-08 2002-08-08 Coram Michael T. System and method for adaptive result set caching
US20030093407A1 (en) * 1999-05-21 2003-05-15 Roberta Jo Cochrane Incremental maintenance of summary tables with complex grouping expressions
US20040059757A1 (en) * 2002-09-19 2004-03-25 Myfamily.Com, Inc. Systems and methods for partitioning data on multiple servers
US20040111410A1 (en) * 2002-10-14 2004-06-10 Burgoon David Alford Information reservoir
US6931390B1 (en) * 2001-02-27 2005-08-16 Oracle International Corporation Method and mechanism for database partitioning
US7133858B1 (en) * 2000-06-30 2006-11-07 Microsoft Corporation Partial pre-aggregation in relational database queries

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5926807A (en) * 1997-05-08 1999-07-20 Microsoft Corporation Method and system for effectively representing query results in a limited amount of memory
CN1341901A (en) * 2001-01-04 2002-03-27 中国科学院南京土壤研究所 Agricultural ecological multi-dimensional data management technique
CN1477560A (en) * 2002-08-19 2004-02-25 英业达股份有限公司 Praject management inquiry system and method
US10838957B2 (en) * 2010-06-17 2020-11-17 Microsoft Technology Licensing, Llc Slicing relational queries using spool operators

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5781896A (en) * 1994-06-30 1998-07-14 Microsoft Corporation Method and system for efficiently performing database table aggregation using an aggregation index
US5511190A (en) * 1995-01-20 1996-04-23 Tandem Computers, Inc. Hash-based database grouping system and method
US6009271A (en) * 1996-10-28 1999-12-28 Bmc Software, Inc. Method of retrieving data from a relational database
US20030093407A1 (en) * 1999-05-21 2003-05-15 Roberta Jo Cochrane Incremental maintenance of summary tables with complex grouping expressions
US7133858B1 (en) * 2000-06-30 2006-11-07 Microsoft Corporation Partial pre-aggregation in relational database queries
US20020107835A1 (en) * 2001-02-08 2002-08-08 Coram Michael T. System and method for adaptive result set caching
US6931390B1 (en) * 2001-02-27 2005-08-16 Oracle International Corporation Method and mechanism for database partitioning
US20040059757A1 (en) * 2002-09-19 2004-03-25 Myfamily.Com, Inc. Systems and methods for partitioning data on multiple servers
US20040111410A1 (en) * 2002-10-14 2004-06-10 Burgoon David Alford Information reservoir

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100325151A1 (en) * 2009-06-19 2010-12-23 Jorg Heuer Method and apparatus for searching in a memory-efficient manner for at least one query data element
US8788483B2 (en) * 2009-06-19 2014-07-22 Siemens Aktiengesellschaft Method and apparatus for searching in a memory-efficient manner for at least one query data element
US8306953B2 (en) 2010-08-31 2012-11-06 International Business Machines Corporation Online management of historical data for efficient reporting and analytics
US9395966B1 (en) * 2010-09-27 2016-07-19 Symantec Corporation Systems and methods for associating installed software components with software products
US11169990B2 (en) * 2018-09-20 2021-11-09 Amadeus S.A.S. Re-computing pre-computed search results

Also Published As

Publication number Publication date
CN100440212C (en) 2008-12-03
US20120290618A1 (en) 2012-11-15
CN1783077A (en) 2006-06-07
US8515993B2 (en) 2013-08-20

Similar Documents

Publication Publication Date Title
US8515993B2 (en) Methods and apparatus for processing a database query
US10387411B2 (en) Determining a density of a key value referenced in a database query over a range of rows
US5655080A (en) Distributed hash group-by cooperative processing
US9043307B2 (en) Generating statistics on text pattern matching predicates for access planning
JP5088668B2 (en) Computer load estimation system, computer load estimation method, computer load estimation program
US7496584B2 (en) Incremental cardinality estimation for a set of data values
US8745033B2 (en) Database query optimization using index carryover to subset an index
US9063982B2 (en) Dynamically associating different query execution strategies with selective portions of a database table
US6212528B1 (en) Case-based reasoning system and method for scoring cases in a case database
US9372889B1 (en) Incremental statistics update
US8280907B2 (en) System and method for managing access to data in a database
US7263520B2 (en) Fast aggregation of compressed data using full table scans
US8583657B2 (en) Method and apparatus for using a hash-partitioned index to access a table that is not partitioned or partitioned independently of the hash partitioned index
US11163735B2 (en) Database capacity estimation for database sizing
US7447679B2 (en) Optimizing execution of a database query by using the partitioning schema of a partitioned object to select a subset of partitions from another partitioned object
US7194449B2 (en) Method and system for optimizing snow flake queries
EP1787226A1 (en) System and method for reorganizing a database
CN107577787B (en) Method and system for storing associated data information
US20040002956A1 (en) Approximate query processing using multiple samples
WO2012095839A2 (en) Systems and methods for performing online analytical processing
US20170316003A1 (en) Bulk Sets for Executing Database Queries
US9582522B2 (en) Management of database allocation during reorganization
US20090187591A1 (en) Retrieving database records for aggregation without redundant database read operations
CN112347098A (en) Database table splitting method and system, electronic equipment and storage medium
US8140520B2 (en) Embedding densities in a data structure

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BARSNESS, ERIC LAWRENCE`;SANTOSUOSSO, JOHN MATTHEW;REEL/FRAME:015285/0148;SIGNING DATES FROM 20041013 TO 20041014

STCB Information on status: application discontinuation

Free format text: ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION

AS Assignment

Owner name: AIRBNB, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INTERNATIONAL BUSINESS MACHINES CORPORATION;REEL/FRAME:056427/0193

Effective date: 20210106