WO2003081464A2 - Database system comprising database access object with cache - Google Patents

Database system comprising database access object with cache Download PDF

Info

Publication number
WO2003081464A2
WO2003081464A2 PCT/GB2003/001233 GB0301233W WO03081464A2 WO 2003081464 A2 WO2003081464 A2 WO 2003081464A2 GB 0301233 W GB0301233 W GB 0301233W WO 03081464 A2 WO03081464 A2 WO 03081464A2
Authority
WO
WIPO (PCT)
Prior art keywords
database
database access
data
client
cache
Prior art date
Application number
PCT/GB2003/001233
Other languages
French (fr)
Other versions
WO2003081464A3 (en
Inventor
Timothy Stephen Hoverd
Dean Alan Sheehan
Denis Ronald Howlett
Original Assignee
Isocra Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Isocra Limited filed Critical Isocra Limited
Priority to AU2003216854A priority Critical patent/AU2003216854A1/en
Publication of WO2003081464A2 publication Critical patent/WO2003081464A2/en
Publication of WO2003081464A3 publication Critical patent/WO2003081464A3/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24539Query rewriting; Transformation using cached or materialised query results

Definitions

  • This invention relates to a database system.
  • it relates to a system for accessing databases, and specifically, but not exclusively relational databases, over a computer network.
  • caching schemes With all caching schemes, great care must be taken when several user programs access a database simultaneously to ensure that the data seen by the various programs is consistent. In particular, where one client updates the database, it is essential that all cached copies of the changed part of the database are likewise updated or marked as invalid. Hitherto, the use of caching schemes gives rise to a large amount of traffic to and from the server relating to synchronisation of various caches. Moreover, with known database access systems, it is not possible to implement caching without altering the application program interface (API) that the database system exposes to the clients, this necessitating alteration of clients upon implementation of the scheme.
  • API application program interface
  • An aim of this invention is to provide a system that can improve the performance of multi-user access to a database server without creating excessive server overload and that can be applied to existing database systems without the requirement to re-engineer client applications. From a first aspect, this invention provides a database access system comprising
  • a database access object operating on a client computer that functions as an interface between a database client and a database server
  • each database access object receives database queries from the client and returns a response to the queries to the client wherein each database access object maintains a cache of data and if the response data is available in the cache provides the response from the cached data and otherwise passes a query to the database server to retrieve data to make a response, and
  • the client upon the data cached in the client being changed, the client sends a message to other like database access objects (for example, connected to the same client) to indicate to the other database access objects the extent to which the data in their caches is invalid.
  • other like database access objects for example, connected to the same client
  • each database access object acts as a virtual database server connection for the client. It can therefore be made transparent to the client such that, from the client's point of view, there is a direct connection with the databases server. Moreover, since the access objects communicate directly with one another, the server is not required to handle invalidation of the cached data. Thus, the invention can be implemented on a conventional system without re-engineering of either the clients or the server.
  • the client access object may send a message to enable the other database access objects to update their caches to maintain them in synchronism with the database. This allows the other access objects to continue to use the content of their caches after modification of their contents.
  • the query and/or its result are used to update the information contained in the object's cache.
  • each client application communicates with a database driver that handles access to the database, the driver exposing a pre-defined API that abstracts the actual nature of the database.
  • the database access object may operate as a connection between the client application and the database driver.
  • the database access object most advantageously exposes an API that is functionally equivalent to the API that is exposed by the database driver.
  • a system embodying the invention typically includes one or more delegate objects.
  • the delegate objects operate to establish communication with a database driver.
  • instances of the delegates are fewer in number than instances of the database access objects. This arrangement can be provided because the existence of the cache reduces the number of required actual connections to the database.
  • the query to the database access object waits for a delegate to become available, thereby blocking the process that made the query.
  • the database access object may retrieve data in addition to that required to make the response, all data retrieved from the database server being stored in the cache.
  • the invention provides a database access object operable on a client computer to function as an interface between a database client and a database server, the database object being operative to receive database queries from the client and to return a response to the queries to the client wherein each database access object maintains a cache of data and if the response data is available in the cache provides the response from the cached data and otherwise passes a query to the database server to retrieve data to make a response, and in which, upon the data cached in the database access object being changed, the database access object sends a message to other like database access objects to indicate to the other database access objects the extent to which the data in their caches is invalid.
  • a database access embodying this aspect of the invention may defer table writes by retaining the modified data in the table cache and sending synchronisation messages to other database access objects connected to the same database server. By this means, repetitive writes of the same or similar information to the underlying database server are avoided.
  • the invention provides a database access system comprising
  • a database access object operating on a client computer that functions as an interface between a database client and a database server
  • each database access object maintains a cache of data and if the response data is available in the cache provides the response from the cached data and otherwise passes a request to the database server to retrieve data to make a response
  • the database access object in cases where the database access object retrieves data from the database server, it may retrieve data in addition to that required to make the response, all data retrieved from the database server being stored in the cache.
  • the other columns may also be retrieved and stored in the cache.
  • the database access object may expose entry points to its API that enable the client application to give an indication to the database access object of a strategy that should be adopted in obtaining additional data for populating the cache.
  • Additional data for the cache may be obtained through the request made to the database server being an augmented form of the query received from the client. Augmentation of the query is based upon a prediction of the expected nature of subsequent requests from the client, the aim being to provide the cache with optimal content.
  • a database access system upon the data cached in the client being changed, the client sends a message to other like database access objects to indicate to the other database access objects the extent to which the data in their caches is invalid.
  • This can be considered to be a combination of the features of the first and third aspects.
  • the invention provides a database access object operable on a client computer to function as an interface between a database client application and a database server, the database object being operative to receive database queries from the client application and to return a response to the queries to the client application wherein each database access object maintains a cache of data and if the response data is available in the cache provides the response from the cached data and otherwise passes a query to the database server to retrieve data to make a response, and in which, in cases where the database access object retrieves data from the database server, it may retrieve data in addition to that required to make the response, all data retrieved from the database server being stored in the cache.
  • a database access object optionally in accordance with this aspect of the invention may be operative to create virtual tables presented to the client application that are entirely held within the cache. These can be accessed and manipulated without reference to the underlying database.
  • a schema for the virtual tables may be defined within configuration data for the database access object independently of the database schema.
  • the configuration information presented to the database access object defines the schema of these "virtual" tables.
  • a database access system embodying the invention may implement optimistic locking control.
  • queries that are received by the database access object may augmented to implement optimistic locking control thus rendering the control transparent to the application program.
  • the method may augment the database schema, and the queries and updates that operate on that schema, to insert optimistic control columns in tables that initially contained no such control columns.
  • the values recorded in the optimistic control parts of the schema can be used by database access objects connected to the same server to detect a situation where another database access object has modified the data in a particular row without the first database access object being informed of the change; this delay in informing the latter database access object being most likely the result of network latency.
  • systems embodying the invention are of particular benefit for use with clients, such as J2EE application servers, that make many naive queries on the underlying database. Although the first of these queries will run substantially at the speed it would have done without implementation of the invention, subsequent queries are typically executed faster.
  • Each data access object is responsible for analysing the queries passed to it and determining if the required results may be obtained from the local cache, or only by reference to the underlying database. If a database query is made, the data access object may retrieve rather more data than was actually requested by the client software. For example, it may retrieve all of a row rather than just some specific columns. In this manner the data access object populates its internal cache.
  • Figure 1 is a diagram illustrating a client/server database system embodying the invention
  • Figure 2 is a class diagram of a database access object being a principal component of an embodiment of the invention
  • Figure 3 is a diagram of logical connections within a database system embodying the invention.
  • Figure 4 is a diagram of a caching scheme operating within a database system embodying the invention
  • Figure 5 is a diagram of a query parsing scheme operating within a database system embodying the invention
  • Figure 6 is a diagram of a clustering scheme operating within a database system embodying the invention.
  • Figure 7 is a diagram of events within a database system embodying the invention.
  • a plurality of database clients 10 communicate with a database server 12.
  • the database server 12 executes a database driver 14 that exposes a data access API to which the clients can make calls in order to access the data on the server 12.
  • a database access object 20 is interposed between each client 10 and the database driver 14. It will be seen from Figure 1 that each database access object can communicate with one or with more than one client 10. The position of the database access object within the system is such that it serves as a "shim" layer between a client application and the database driver that it uses.
  • the embodiment will be described with reference to a database access object 20 that can operate in accordance with the Java Database Connectivity (JDBC) standard defined by Sun Microsystems.
  • JDBC Java Database Connectivity
  • the database access object 20 will be described as code executing on a Java virtual machine, although it may equally be implemented as native code executing on a real machine.
  • Each instance of the database access object, indicated in Figure 2 at livestore services a number of JDBC Java . sql . Connection implementations, each indicated in Figure 2 at ServicedConnection.
  • the number of connections serviced by a single database access object instance is controlled by the client application that is using the system to access a database, and may be subject to connection pooling.
  • DelegateConnection objects represent connections to the underlying database provided via a third party JDBC driver and can handle database queries expressed in SQL.
  • a ServicedConnection object does not always require a DelegateConnection object to which to delegate SQL query handling, and there may therefore be fewer DelegateConnections instances than ServicedConnection instances.
  • a DelegateConnection instance is bound to a ServicedConnection instance only for the duration of the cu ⁇ ent transaction upon that ServicedConnection instance when a SQL write operation is performed or a SQL read is performed that cannot be answered by the database access object from data in its internal in-memory cache, as will be described below.
  • the database access object instances cache two types of information:
  • the first type of cached data provides the database access object with the ability to support a variety of queries based upon the coverage of the cached data. For example, if the database access object has all of the rows of a particular table in memory, then it can answer any SQL query based upon that table (provided that the query does not use any joins to other tables that are not also in the cache memory). If the database access object has all of a defined subset of the rows from a particular table, then it can support SQL queries from its in-memory cache based upon that table whether the select criterion of the query is stronger than the defined subset; that is, if the query is guaranteed to be a subset of cached subset.
  • the database access object is capable of maintaining this cached data current with respect to the SQL write operations that are performed via one of its ServicedConnection objects, or for which it received cluster synchronisation messages for from a remote application, as will be described.
  • the second type of cache is a cache of the results from SQL queries that have been performed by a DelegateConnection object in instances in which the database access object did not have the data in its cache.
  • the database access object only tracks from which tables the results are drawn, and if it intercepts any modification to the data in any one of the source tables, then this select cache is invalidated.
  • the database access object makes use of a SQL syntax parser to parse both prepared and non-prepared SQL statements created in the ServicedConnection objects.
  • the results of this parsing operation provide an abstract representation of the query allowing the database access object to determine whether the SQL statement is a read or a write operation, upon which tables the statement depends, and so forth.
  • the database access object provides JDBC connection implementations for use by its containing application.
  • the database access object uses JDBC connections from a delegate JDBC driver.
  • the number of delegate JDBC connections available for use by the database access object is typically less than the number of connections offered up by database access object.
  • This structure is beneficial in that it is the source of a major scalability improvement to the application as delegate JDBC connections are expensive resources both financially (JDBC drivers and the database they provide access are commonly licensed on a per-connection basis) and in computing terms. Indeed, because the invention reduces the load that a client application places on the database, all applications connected to that database, whether they work though embodiments of the invention or not, may benefit from a performance improvement. The requirement for fewer delegate JDBC connections arises because some of the database activity can be carried out with reference to data stored in the cache only, with no requirement to access a JDBC database connection.
  • the database access object is capable of offering XA JDBC connections that can be enlisted in distributed, or two-phase, commit transactions. In order to do this the delegate JDBC connections must be XA Connections. All of the connections offered by the database access object and all of the delegate connections used by the database access object will be of the same type, either XA or non-XA connections depending on the configuration.
  • the database access object is the resource manager (as defined within the XA protocol definition) it never makes any modification to the database using anything other than a delegate XA connection. Therefore, the database access object can just delegate the XA protocol request to the delegate XA connection if one has been bound to the database access object connection. If no delegate JDBC XA connection has been bound to the database access object connection then no SQL writes have taken place and therefore responding to the XA protocol query is trivial and the database access object can do this directly.
  • the caching classes are at the central to the operation of embodiments of the invention and Figure 4 outlines the important classes and interfaces.
  • the database access object maintains three types of cache.
  • a table can be configured for caching either via a configuration file, or it can be done programmatically.
  • Sequence caches which maintain pools of unallocated numbers from tables identified sequence sources. These may be normal tables or in some cases (Oracle, for example) may be primary concepts within the database.
  • Cached result sets which are the values returned from a previously completed complex query that was delegated to the underlying database.
  • Connections maintain a record of data that has been modified but not committed during a current transaction, and this information is used to modify the result of any query performed within the cache based upon the cache rows to make the query results consistent with the changes logged by the connections. For example, a connection may update the salary of a particular employee and then query to find all employee with the salary that the modified employee had previously. The modified employee should not be in the result set returned by the database access object, even though the cached data may indicate that it should be returned.
  • connection query engine object that can access the uncommitted modifications held by the connection in order to modify the results of a query processed by the more general query engine that sources its data from the public cache of table and sequence data.
  • the database access object parses the SQL statements that it handles.
  • Figure 5 identifies the main classes and interfaces involved in the parsing of SQL statements.
  • Each database access object instance locates an implementation of the ParserFactory interface based on the database connected to by the delegate JDBC connection.
  • Different databases have different dialects of SQL and therefore it is expected that ParserFactory and Parser will be implemented for generic several specific implementations, including, for example, SQL99, Oracle, DB2, SQLServer, Sybase and possibly others.
  • the SQL syntax of these database systems only varies by small amounts but enough to require different syntax grammars for their SQL.
  • Each of the SQL dialects supported are made available as a ParserFactory class in a Java package.
  • ParserFactory objects are used to create an instance of a class that implements the Parser interface. Each connection uses the ParserFactory to instantiate its own parser instance ensuring that parsing operations carried out by one connection do not interact with parsing operations performed by another connection.
  • Parser implementations parse SQL syntax that originates from the callers specification of a JDBC Statement or a PreparedStatement object implemented as part of the invention, and the database access object implementations of the JDBC Statement and PreparedStatement hold references to the abstract syntax tree built by the parser from the SQL statement.
  • the database access object PreparedStatement objects have access to equivalent, although not necessarily identical, PreparedStatement objects against delegate connections.
  • the delegate PreparedStatement object created by the database access object may be different from the one provided by database access object, in SQL terms, as the delegate PreparedStatement object may have been modified to make its results contribute to the content of the database access object cache more effectively.
  • Database statements addressed to the database access object may or may not result in a statement on a delegate connection as it may be answered from the cached data and JDBC Statement objects are transient unlike PreparedStatement objects.
  • All JDBC Connection oriented operations that are non-standard are implemented as static methods in a connection class called that will accept as its first parameter the JDBC connection being used by the application. If the connection provided is not a JDBC connection to a database access object then the operation will have no impact and no exceptions will be thrown.
  • clustering describes the ability of the database access object to synchronize a group of database access object instances running within different application servers. Clustering ensures that each database access object instance has the current version of the content of the database as soon as is possible given the latency of local and wide area networks.
  • each database access object needs to be able to interact with a customers' chosen message oriented middleware, this being software that ensures a message can be transmitted between one piece of software and another in a reliable and asynchronous manner.
  • a particular advantage of embodiments of the invention is the ability to built connectors that allow the database access objects to use any reasonable messaging system.
  • Figure 6 outlines the major classes and interfaces that are involved in the clustering mechanism of embodiments of the invention.
  • ClusterConnection is the implementation of the connector that allows the database access object to publish cluster synchronisation messages to and subscribe to cluster messages from a particular message system.
  • ClusterConnection is the implementation of the connector that allows the database access object to publish cluster synchronisation messages to and subscribe to cluster messages from a particular message system.
  • the embodiment provides an implementation of the ClusterConnectionContainer interface.
  • ClusterConnection implementations are called upon to publish cluster synchronisation messages after the JDBC connection has committed is changes and hence the publication is non-transactional. That is to say, it is possible for the containing application server or the messaging system to fail after the JDBC connection has committed is changed, resulting in a change to the content of the database without the publication of a synchronisation message. This could lead to other database access object instances that are members of the cluster having a stale but not invalidated cache. In some applications, this is acceptable, since this kind of failure is rare and guaranteeing transactional publication of synchronisation messages comes at a cost.
  • a cluster connector In order to support transactional publication of cluster synchronisation messages, it is possible for a cluster connector to support the TransactionalClusterConnector interface in which the case database access object will request the publication of a synchronisation message before the JDBC connection transaction has been committed. This gives the TransactionalClusterConnector an opportunity to enlist its publication in the global transaction or make use of the JDBC connection to ensure transactional publication.
  • the ClusterConnectionFactory makes it possible for the database access object to instantiate and configure a particular cluster connection implementation without any prior knowledge of the messaging transport being used.
  • a configuration document specifies the name of the ClusterConnectionFactory to instantiate in order to create instances of the required cluster connector.
  • the event package of implemented in embodiments of the invention is not central to providing the advantages of the invention, but in the embodiments in which it is present, it can provide an integration interface for our other software systems as well as providing customers with the integration point for constructing event-driven architectures.
  • Figure 7 presents the classes and interfaces that form the system's event API.
  • Each DataChangedEvent describes a transactional unit of changes in terms of the tables, and rows within those tables, that have been inserted, updated or deleted.
  • the following description provides a general description of the processing performed by the database access objects resulting from the principal inputs from a JDBC connection and clustering.
  • This section describes the steps involved in processing a SQL select statement that queries the data based on a given expression.
  • select statement must be parsed. If the select is being used to prepare a statement then it may have bind variables to be given appropriate substitutions prior to execution. Otherwise, the statement is complete as is.
  • An initial check is made to determine whether or not the database access object is capable of answering this select statement form cache no matter what the cache content is. This is a check based on the structure of the select statement. In simple embodiments, the database access object may only be able to support queries from a single table with a where clause that requires the comparison of one or more columns with a literal value. If the database access object cannot attempt to answer this query from its cache then the process proceeds from step 6 below and otherwise continues at step 2.
  • connectionQueryEngine instance related to the connection that is performing the select is asked to answer the select from the public cache of data via the single QueryEngine instance, taking into account any modifications to the data that may have been made with respect to this connection that have not been committed. If enough data is available in the cache then the result may be given to the caller otherwise continue to step 4.
  • the query must be direct down to the actual database. If no delegate connection has been bound to this the database access object connection then one will be chosen from the delegate connection pool, possibly causing this connection to block until one becomes available.
  • the statement Prior to the delegation of the select statement, the statement may be "augmented" to improve the contribution that the results will make to the data in the cache.
  • the select statement will be augmented in accordance with rules to be determined, preferably to reflect the pattern of data requests made by the client application. For example, the query may be augmented to ensure that primary key columns and optimistic control columns are included.
  • the query may already been performed and the results deemed reliable. If the statement is non-prepared then the statement itself will be used to lookup the cached result set. If the statement is of the prepared form then the statement and the values of each of the bind variables are the key into the result set cache. If the result can be found then it can be returned otherwise the query is delegated to the database, as described in step 4 above, with the additional step of recording the results of the query in the result set cache.
  • the database access object may provide calls that a client can use in order to indicate an advantageous strategy for augmenting the SQL query.
  • the following are examples of circumstances that this may be useful.
  • An operation allowing the application to ask a database access object connection for the delegate JDBC connection so that it can perform operations against the data source within the same transactional context but without incurring any parsing cost by the database access object or more importantly cache implications.
  • An example use of this would be where the application wished to make an obscure call against, say, Oracle that does not change any data.
  • Making the call on the JDBC connection of the database access object would cause the database access object to clear its local cache and send a cluster message to other objects requesting that they do the same. Performing this operation on the delegate connection would bypass the database access object and maintain the contents of the caches.
  • Non-prepared queries may be configured to avoid caching. This will mean that operations performed by the Statement interface can be configured to incur no parsing cost.
  • a query In order for a query to contribute to the cached data it must contain at least the primary key columns. If it does not have any primary key columns then it cannot be cached and a configuration that states it should be cached will result in a configuration warning. If optimistic locking is being used then a query must include the optimistic control columns. The database access object will augment queries to include both primary key columns and optimistic control columns where necessary.
  • the database access object may be configured to specify a column load dependency graph that identifies common columns to be included in a select list based on the presence of another column.
  • the database access object ensures that any select list extension will not impact the ResultSet passed back to the caller. This is done by appending the additional columns rather than prepending them, and intercepting the access to the ResultSet meta data to hide the additional columns.
  • Queries selecting columns from more than one table will be augmented to contain the primary key and optimistic control columns of all tables.
  • An Oracle sequence will be treated in the same way as generic table sequence except that Oracle provides a syntax for inserting a row and making use of the next sequence entry without the application actually reading the value of the sequence. This syntax places the sequence name where the literal value for the column would otherwise be.
  • the database access object modifies the insert statement to use a literal value supplied from the in-memory pool of sequence numbers. Queries that cannot be answered from cache, but that are known to be based on tables configured for caching, can have the resulting ResultSet cached. If the query originates from a prepared statement then the ResultSet will be cached against the prepared statement with the actual bound parameters as a key. Otherwise the ResultSet will be cached against the actually query text as a key. Cached result sets are cleared in accordance with the cache management policy and when any one of the source tables undergoes a change.
  • Each cached table maintains a collection of indexes that will be used to promote fast in- cache retrieval of data.
  • the most important index is the primary key index and all queries are augmented to ensure that the primary key is included in the ResultSet so that the index can be maintained.
  • As a primary key column, or columns, is unique then the data in a ResultSet can contribute to the primary key index no mater what the select criterion are. This is also the case for any column marked as unique as long as the column is present in the ResultSet.
  • Simple embodiments of the invention do not support deferring the writes requested by the calling application.
  • the, first action that all write operations must perform is to obtain a delegate connection if one has not already been bound to this connection.
  • the insert statement is delegated to the underlying database via the delegate JDBC connection bound to the applicable database access object connection. If the table to which the insert relates has an optimistic control column defined and a value for this has not been supplied and it does not default in the actual database, then the database access object will insert a starting value of 1. This will provide automatic optimistic lock control via the automatic augmentation of extra column queries to the SQL that the database access object receives.
  • the insert is recorded so that on commit the public cache can be modified and an appropriate cluster synchronisation message sent.
  • the insert must also be recorded so that the ConnectionQueryEngine can modify any results obtained from the public cache, via the QueryEngine, based upon the un-committed modification of this connection/transaction.
  • the insert statement refers to a sequence. This instructs Oracle to insert the specified row in the table and give the specified column the next value from the sequence, increasing the value in the sequence for the next user.
  • the database access object identifies the reference to the sequence and replaces the sequence reference with the literal value for the column as taken from the sequence number pool being managed by reference.
  • Inserts can take two forms. The first is a single row insert with all of the mandatory columns of the row being supplied with literal values or an insert into one table based on the select from another table(s). This form of the insert is by far the most common and it is easy for the database access object to make the appropriate modification in its cache.
  • a single row insert will result in a new row in the table cache, if the table is being cached, and any relevant indexes being managed. Any cached ResuItSets drawn from the table being inserted against will be cleared.
  • a multi-row insert will cause the database access object to flush its cache of all values against the appropriate table. Any cached ResuItSets drawn from the table being inserted against will be cleared.
  • Updates may be performed against tables or tables being used to source sequence numbers. These two are considered separately.
  • To update a table modify it to include version update and version restriction in where clause; delegate it; record id. Updating a sequence involves no operation.
  • the first, and most common type of update is the update of a single row in a table as specified by its foreign key values.
  • the cache in the database access object can support both of these updates but there are issues with respect to inter-object instance concurrency.
  • the first form of the update causes the row in the table cache, if present, to be modified to reflect the new values and any relevant indexes to be maintained; sal ary in this example.
  • multi-row deletes are more common than multi-row updates because of the common requirement for cascaded deletes.
  • JDBC and ODBC specify four different levels of transactional isolation that state the maximum interaction between two or more transactions.
  • TRANS ACTION_RE ADJUNCOMMITTED allows transactions to see uncommitted changes to the data. This means that dirty reads, non- repeatable reads, and phantom reads are possible.
  • TRANSACTION_READ_COMMITTED means that any changes made inside a transaction are not visible outside the transaction until the transaction is committed. This prevents dirty reads, but non-repeatable reads and phantom reads are still possible.
  • TRANSACTION_REPEATABLE_READ disallows dirty reads and non-repeatable reads. Phantom read are still possible.
  • TRANSACTION_SERIALIZABLE specifies that dirty reads, non- repeatable reads, and phantom reads are prevented.
  • the database access object must respect these isolation levels (which may be configured via the java.sql. Connection interface) within the limits defined by the delegate JDBC driver.
  • TRANSACTION_READ_COMMITTED behaviour can be given where TRANS ACTION_READ_UNCOMMITTED is requested.
  • Some embodiments of the invention will not support TRANSACTION_READ_UNCOMMITTED, the lowest isolation level supported being TRANSACTION_READ_COMMITTED.
  • the database access object does not enhance or improve the isolation levels offered by the delegate JDBC driver or target data source; it will only ensure that it respects these isolation levels with the operations it performs, relying on the delegate JDBC driver and target database to implement the require isolation level correctly.
  • some embodiments of the invention not support TRANS ACTION_SERIALIZABLE.
  • each isolation level proves more difficult to implement and implies slower performance.
  • applications tend to use READ_COMMITTED isolation level.
  • a request for this isolation level will result in a SQLException being thrown and the supportsTransactionlsolationLevel operation on the DatabaseMetaData interface of the database access object for this level will return false. If the default isolation level for the JDBC driver and database is TRANSACTION_SERIALIZABLE then if a connection is used prior to having its isolation level reduced then a SQLException will be thrown.
  • TRANSACTION_READ_COMMITTED all modifications made to rows, be they insert updates or deletes, must be recorded against the connection until it is successfully committed at which point the changes may be rolled back into the public table cache.
  • TRANSACTION_REPEATABLE_READ support may not be implemented in all embodiments of the invention since most performance or concurrency sensitive applications do not use it.
  • Queries that contribute to the table caches may be made public immediately if no write has been performed against the source tables. If any write has been performed then the contribution to the public cache may only be made once the transaction has been committed.
  • ResuItSets that are cached may be cached in the public portion of the cache if the source tables of the ResultSet have not been written to. If the source tables of a cacheable ResultSet have been modified then the ResultSet may only be cached publicly upon successful commit of the transaction.
  • Locking is not something that appears as part of the JDBC specification; it is the mechanism by which database support the transactional isolation rules along with the more general ACID rules.
  • the database access object provides the following locking strategies:
  • the database access object can be configured to perform automatically the addition of the optimistic locking control to the SQL statements. This allows application developers to concentrate on their application logic, rather than on the optimistic control logic.
  • the locking control will happen in memory for non-clustered embodiments of the invention, and therefore the SQL sent to the database will be unchanged.
  • the database schema must be changed to include these optimistic lock control columns.
  • In-memory write locking can be configured to detect write-write conflicts between multiple transactions within the same database access object instance without storing optimistic control values within the table.
  • the database access object is not capable of detecting or preventing read-write conflicts where one transaction modifies data being read by another transaction.
  • Individual tables may be configured to use table optimistic locking, memory optimistic locking or no locking at all. In the cases where a write is performed against a row that has not been read then the optimistic control field is incremented but not tested in the where clause of the update or delete operation.
  • Multi-row updates cannot support optimistic locking because it is not possible to state the required old value for each row targeted by the update statement.
  • the database access object will allow multi-row updates to be performed and it will do so by reading the oc value of all rows that will be targeted by the update (for update) and placing a write-lock on them so that they cannot be updated by anyone else. Then the rows will be located in cache and the oc values are compared to check that no row has been changed.
  • Multi-row deletes must also be handle by the database access object. As with multi-row updates, the old value of the optimistic control value cannot be specified. Multi-row deletes occur more frequently than multi-row deletes, generally as the result of a cascaded delete, that is the deletion of an ownership network of objects. Multi-row deletes are handled in the same way as multi-row updates; that is to read the optimistic control variable for each row targeted by the delete (for update), causing a write lock and then doing an in-memory check of all the oc values.
  • the database access object provides the ability, via its configuration document, to preload its cache ahead of any usage by its calling application and initialise any required indexes.
  • the preload element of the database access object configuration document supports two different sub-elements; code, and sql .
  • the code and sql elements can appear in any order and may occur multiple times.
  • the code element supports the execution of an arbitrary Java class that implements the preloader interface.
  • the format of the code content is ⁇ cl ass>[( ⁇ parameters>)].
  • the class referred to must be available to the livestore instance via Cl ass . forNa eO and must implement the com . i socra . I i vestore. api . Prel oader interface.
  • the class must only have a single constructor which may require arguments in which case the code element must specify the values.
  • An instance of the Prel oader class will be created and the execute method will be called with a suitable JDBC connection allowing the Preloader to invoke SQL select statements. If any other form of SQL statement is attempted an exception will be reported and this livestore instance will not complete its initialisation correctly.
  • the SQL element allows the specification of a SQL select statement that will be executed to load data in the cache. Only SQL select statements will be supported, any other SQL statement will cause an exception and the failure of the database access object initialisation.
  • Database access object instances may be configured into clusters and these cluster share synchronisation information in order to ensure that one database access object cache reflects changes that have been committed via another database access object cache.
  • Embodiments of the invention can support partitioning of cluster synchronisation messages in order to reduce or refine resource usage. It is also possible for applications the do not use database access objects to generate and consume cluster messages, thus aiding legacy integration.
  • the database access object configuration document specifies the cluster configuration for a database access object instance.
  • a particular embodiment of the invention provides two cluster communication implementations: database access object clustering and JMS clustering.
  • Cluster messages may be formatted in an opaque binary format, preferred for performance reasons, or an open XML format and the database access object configuration document will specify which format to be used.
  • Database access object clustering uses no third party products, just Java 1.2's TCP/IP sockets.
  • a daemon process called LivestoreCIusterHub, acts as the registration point and message broadcaster for all database access object instances within the cluster.
  • the database access object cluster implementation requires three parameters:
  • the class that implements the database access object clustering ClusterConnectionFactor is com.isocra.livestore.cluster.livestore.ClusterConnectionFactory.
  • LivestoreCIusterHub instances may be brought up as primary or secondary instances. If a hub is configured as a secondary instance then it will talk to the primary, which will publish the availability of the secondary to the cluster members.
  • Database access object clustering has two different configurations controlled by a configuration property called “transactional” with values "true” or "false”.
  • the non-transactional configuration of database access object clustering publishes cluster message after the transaction that causes them has been committed. If the database access object instance dies (or is terminated) between committing the transaction and publishing the cluster message to the hub then the message has been lost. In this case the hub will inform the other member of the cluster and they will be forced to flush their cache of data.
  • a cluster member If a cluster member has a problem communicating with the hub then it will attempt to reconnect once and then it will attempt to contact the secondary hub. If it cannot connect to the secondary, or there is no secondary, then it will inform the database access object instance that it must flush its cache and refrain from caching anything until the connection is re-established.
  • the transactional configuration of database access object clustering makes use of the delegate JDBC connection to persist the cluster message to be published as part of the transaction thereby achieving atomicity with the application transaction.
  • the ClusterConnector is still responsible for publishing the cluster message after the transaction has been committed but if the hub detects the loss of a cluster member it can look at the database and check which messages it never received and publish those automatically. If the hub itself dies then when each cluster member goes to the secondary hub it must inform it of the last message id received from each of the other members of the hub thus allowing the hub to check the database for any later message and re-publish them.
  • the JMS cluster communication implementation will make use of a third party, not- bundled, JMS implementation to publish cluster messages and subscribe to cluster messages. Both a Publish-Subscribe (Topic) and Point-to-Point (Queue) mechanisms will be supported.
  • JNDI lookup names for the QueueConnectionFactory and Queue are required.
  • topic configuration the JNDI lookup names for the TopicConnectionFactory and Topic are required. If the JMS implementation supports distributed transactions, XA interfaces, and the database access object configuration for the JMS clustering specifies it then transactional publication will be used.
  • the example causes the receiving database access object instance to clear all cached data. This is the kind of message that would be emitted as a result of an unknown write operation being performed.
  • the clearAH operation cannot be combined with any other operations:
  • the following example causes the receiving database access object instance to clear all data cached in a particular table. This is the kind of message that would be emitted as a result of an unknown operation where the actual target table was identified. No other operations can be specified for the same table although operations on other tables can be included:
  • a utility class is provided that converts between opaque cluster messages to XML cluster messages and vice versa. This conversion may be built into an application that consumes from one cluster connection and publishes on another cluster connection with a format change.
  • the database access object provides a cache of statements that have been prepared against each connection.
  • JDBC has the notion of prepared statements that allow a parameterised piece of SQL to be given to the connection (and hence the database) for parsing and possibly query optimisation for later execution with specific variable substitutions. J2EE makes the mechanism almost useless. J2EE contained components do not hold references to
  • JDBC connections they ask the J2EE container for connections allowing the container to leverage connection pools and other scalability mechanisms. This means that every time a component requests a connection object it can be given a different one and using a statement prepared against one connection in another connection is invalid.
  • JDBC 3.0 contains facilities for preparing statements against a connection pool which removes this problem. Until the availability and use of JDBC 3.0 drivers becomes pervasive the caching of prepared statements it the best way forward.
  • the database access object parses SQL statements and anything that can be done to reduce the amount of parsing that goes on the better.
  • the database access object checks its cache of statements, using the full text of the SQL string being provided as the basis for the prepared statement key. If the database access object finds a prepared statement then it will return it otherwise a new prepared statement is created against the delegate driver connection and registered with the prepared statement cache for the connection. As prepared statements are an expensive resource, the number of cached prepared statements can be managed and the least used statements will be cleared from the cache if the limit is reached.
  • the pscachemax attribute in the configuration document specifies the maximum number of prepared statements that can be cached. When pscachemax is set to zero then no statements will be cached.
  • ⁇ !-- depid is an fkey to the department table and as such livestore will maintain indexes for que ⁇ es performed in this fkey. This turns it off. -->
  • ⁇ >-- factory identifies the class that implements the factory interface Format specifies either opaque (default if not specified) or xml --> ⁇ propert ⁇ es>
  • ⁇ '-- maxConnections specifies the number of delegate connections that livestore will use at max This max will only be reached when the same number of livestore connections has been opened up by the application connectionTimeout specifies how long livestore will block waiting to acquire a delegate connection -->
  • Virtual Tables Virtual tables support the storage of state information between transactions and instances of clustered database access objects.
  • a virtual table has much of the functionality of an actual database table, but is maintained entirely within the cache and not in the database itself.
  • a virtual table is defined within the configuration document in much the same way as it would be defined within a database.
  • Virtual tables may have rows inserted, updated and deleted and selects performed within that table. Joins may or may not be supported, as required.
  • Virtual tables may form the basis of any support for the Object Caching Service for Java specification.
  • Embodiments may include a loader API which can load rows into the virtual table at the beginning.
  • Deferred Writes By default, all writes performed to database access objects connections are converted immediately into writes performed on the delegate JDBC connections. Deferred writes support the collection of all writes operations to just before the commit of the transaction and may make use of the delegate JDBC drivers batch write facility. As soon as a query is performed that cannot be answered in cache then any writes that have been deferred may need to be flushed depending on the tables being searched and the tables the writes impact. By this means, repetitive writes of the same or similar information to the underlying database server are avoided.
  • Partitioning is the term used to describe the segmentation of inter-database access objects synchronisation messages into discrete groups with the objective of managing the bandwidth requirements, CPU resources and providing for different qualities of service in the distribution of synchronisation messages.
  • the simplest form of partitioning is schema-based whereby changes to particular tables are separated from changes in other tables. For example, one cluster of servers may have synchronised reference data but not synchronized order data but each server in the cluster may need to synchronized its order data with anther cluster of order management servers. More complex forms of partitioning are possible with data based partitioning where row in a table that meet one criterion are synchronised via one cluster and row in the same table that meet another criterion are synchronized via another cluster.
  • Mapped connections Provision of mapping layer that can map from an abstract scheme to the real physical schema. This can support richer CMP storage for J2EE servers.
  • Java Interfaces The Java classes and interfaces that form the Application Programmer Interface available to the users of embodiments of the invention above and beyond the JDBC 2.1 for example, within a CVS repository.
  • the Java classes and interfaces are specified in the javadoc documentation reproduced below.

Abstract

The present invention relates to a database system. In particular, the invention relates to a system for accessing databases, and specifically, but not exclusively relational databases, over a computer network.The database access system comprises a database access object (20), which operates on a client computer that functions as an interface between a database client (10) and a database server (12). The database object receives database queries from the client and returns a response to the queries to the client. Each database access object maintains a cache of data and if the response data is available in the cache provides the response from the cached data. Otherwise, a query is passed to the database server to retrieve data to make a response. Upon a change in the data cached in the client, the client sends a message to other like database access objects to indicate to the other database access objects the extent to which the data in their caches is invalid.

Description

Database System
This invention relates to a database system. In particular, it relates to a system for accessing databases, and specifically, but not exclusively relational databases, over a computer network.
As is well-understood, when a client computer system accesses data in a database stored in a remote server computer, performance of the entire database system is limited by the speed with which data can be transmitted over the network. Various solutions to this problem have been proposed that involve keeping a copy of some or all of the data at the client computer, such that the cached data can be accessed without creating network traffic.
With all caching schemes, great care must be taken when several user programs access a database simultaneously to ensure that the data seen by the various programs is consistent. In particular, where one client updates the database, it is essential that all cached copies of the changed part of the database are likewise updated or marked as invalid. Hitherto, the use of caching schemes gives rise to a large amount of traffic to and from the server relating to synchronisation of various caches. Moreover, with known database access systems, it is not possible to implement caching without altering the application program interface (API) that the database system exposes to the clients, this necessitating alteration of clients upon implementation of the scheme.
An aim of this invention is to provide a system that can improve the performance of multi-user access to a database server without creating excessive server overload and that can be applied to existing database systems without the requirement to re-engineer client applications. From a first aspect, this invention provides a database access system comprising
a database access object operating on a client computer that functions as an interface between a database client and a database server,
in which the database object receives database queries from the client and returns a response to the queries to the client wherein each database access object maintains a cache of data and if the response data is available in the cache provides the response from the cached data and otherwise passes a query to the database server to retrieve data to make a response, and
in which, upon the data cached in the client being changed, the client sends a message to other like database access objects (for example, connected to the same client) to indicate to the other database access objects the extent to which the data in their caches is invalid.
Thus, each database access object acts as a virtual database server connection for the client. It can therefore be made transparent to the client such that, from the client's point of view, there is a direct connection with the databases server. Moreover, since the access objects communicate directly with one another, the server is not required to handle invalidation of the cached data. Thus, the invention can be implemented on a conventional system without re-engineering of either the clients or the server.
In addition to, or as an alternative to, invalidating the content of the cache of another database access object, the client access object may send a message to enable the other database access objects to update their caches to maintain them in synchronism with the database. This allows the other access objects to continue to use the content of their caches after modification of their contents.
Where the database access object passes the query to the database server, the query and/or its result are used to update the information contained in the object's cache.
In many systems, such as ODBC and JDBC, each client application communicates with a database driver that handles access to the database, the driver exposing a pre-defined API that abstracts the actual nature of the database. In such embodiments, the database access object may operate as a connection between the client application and the database driver. In such embodiments, the database access object most advantageously exposes an API that is functionally equivalent to the API that is exposed by the database driver.
A system embodying the invention typically includes one or more delegate objects. The delegate objects operate to establish communication with a database driver. In an advantageous configuration, instances of the delegates are fewer in number than instances of the database access objects. This arrangement can be provided because the existence of the cache reduces the number of required actual connections to the database.
In systems embodying the last-preceding paragraph, in the event that a delegate is not available, the query to the database access object waits for a delegate to become available, thereby blocking the process that made the query.
As an optional enhancement, which, in cases where the database access object retrieves data from the database server, it may retrieve data in addition to that required to make the response, all data retrieved from the database server being stored in the cache.
From a second aspect, the invention provides a database access object operable on a client computer to function as an interface between a database client and a database server, the database object being operative to receive database queries from the client and to return a response to the queries to the client wherein each database access object maintains a cache of data and if the response data is available in the cache provides the response from the cached data and otherwise passes a query to the database server to retrieve data to make a response, and in which, upon the data cached in the database access object being changed, the database access object sends a message to other like database access objects to indicate to the other database access objects the extent to which the data in their caches is invalid.
Such a database access object can be readily introduced into an existing database system to confer the advantages offered by the invention upon it. A database access embodying this aspect of the invention may defer table writes by retaining the modified data in the table cache and sending synchronisation messages to other database access objects connected to the same database server. By this means, repetitive writes of the same or similar information to the underlying database server are avoided.
From a third aspect, the invention provides a database access system comprising
a database access object operating on a client computer that functions as an interface between a database client and a database server,
in which the database object receives a request for data from the client and returns a response to the request to the client wherein each database access object maintains a cache of data and if the response data is available in the cache provides the response from the cached data and otherwise passes a request to the database server to retrieve data to make a response, and
in which, in cases where the database access object retrieves data from the database server, it may retrieve data in addition to that required to make the response, all data retrieved from the database server being stored in the cache.
Thus, by populating the cache with data in addition to that which is actually required to answer the query, it is possible to anticipate subsequent queries, and load the data into the cache before the query is made.
For example, if a request made to the database access object requires a subset of columns from a row of a database, the other columns may also be retrieved and stored in the cache.
In order that the cache may be populated more effectively for the purposes of a given client, the database access object may expose entry points to its API that enable the client application to give an indication to the database access object of a strategy that should be adopted in obtaining additional data for populating the cache.
Additional data for the cache may be obtained through the request made to the database server being an augmented form of the query received from the client. Augmentation of the query is based upon a prediction of the expected nature of subsequent requests from the client, the aim being to provide the cache with optimal content.
In a database access system according to this aspect of the invention, upon the data cached in the client being changed, the client sends a message to other like database access objects to indicate to the other database access objects the extent to which the data in their caches is invalid. This can be considered to be a combination of the features of the first and third aspects.
From a fourth aspect, the invention provides a database access object operable on a client computer to function as an interface between a database client application and a database server, the database object being operative to receive database queries from the client application and to return a response to the queries to the client application wherein each database access object maintains a cache of data and if the response data is available in the cache provides the response from the cached data and otherwise passes a query to the database server to retrieve data to make a response, and in which, in cases where the database access object retrieves data from the database server, it may retrieve data in addition to that required to make the response, all data retrieved from the database server being stored in the cache.
A database access object optionally in accordance with this aspect of the invention may be operative to create virtual tables presented to the client application that are entirely held within the cache. These can be accessed and manipulated without reference to the underlying database. A schema for the virtual tables may be defined within configuration data for the database access object independently of the database schema.
The configuration information presented to the database access object defines the schema of these "virtual" tables. A database access system embodying the invention may implement optimistic locking control. In particular, queries that are received by the database access object may augmented to implement optimistic locking control thus rendering the control transparent to the application program. The method may augment the database schema, and the queries and updates that operate on that schema, to insert optimistic control columns in tables that initially contained no such control columns. The values recorded in the optimistic control parts of the schema can be used by database access objects connected to the same server to detect a situation where another database access object has modified the data in a particular row without the first database access object being informed of the change; this delay in informing the latter database access object being most likely the result of network latency.
Further aspects of the invention provide a method of accessing a database by a method that employs the caching scheme described herein.
In summary, systems embodying the invention are of particular benefit for use with clients, such as J2EE application servers, that make many naive queries on the underlying database. Although the first of these queries will run substantially at the speed it would have done without implementation of the invention, subsequent queries are typically executed faster.
Each data access object is responsible for analysing the queries passed to it and determining if the required results may be obtained from the local cache, or only by reference to the underlying database. If a database query is made, the data access object may retrieve rather more data than was actually requested by the client software. For example, it may retrieve all of a row rather than just some specific columns. In this manner the data access object populates its internal cache.
Embodiments of the invention will now be described in detail, by way of example, and with reference to the accompanying drawings, in which:
Figure 1 is a diagram illustrating a client/server database system embodying the invention;
Figure 2 is a class diagram of a database access object being a principal component of an embodiment of the invention;
Figure 3 is a diagram of logical connections within a database system embodying the invention;
Figure 4 is a diagram of a caching scheme operating within a database system embodying the invention; Figure 5 is a diagram of a query parsing scheme operating within a database system embodying the invention;
Figure 6 is a diagram of a clustering scheme operating within a database system embodying the invention; and
Figure 7 is a diagram of events within a database system embodying the invention.
With reference to Figure 1 , in a database system embodying the invention a plurality of database clients 10 communicate with a database server 12. The database server 12 executes a database driver 14 that exposes a data access API to which the clients can make calls in order to access the data on the server 12.
In this embodiment, a database access object 20 is interposed between each client 10 and the database driver 14. It will be seen from Figure 1 that each database access object can communicate with one or with more than one client 10. The position of the database access object within the system is such that it serves as a "shim" layer between a client application and the database driver that it uses.
The embodiment will be described with reference to a database access object 20 that can operate in accordance with the Java Database Connectivity (JDBC) standard defined by Sun Microsystems. However, it will be understood that it could readily be applied to other database access standards such as OCI, ODBC and further standards yet to be developed. Accordingly, the database access object 20 will be described as code executing on a Java virtual machine, although it may equally be implemented as native code executing on a real machine.
Within a Java VM, process, there may exist many instances of the database access object. Instances are differentiated by their delegate connection specification, such as a connection to Oracle STAFF Database or Sybase FORECAST Database, and the database access object configuration document.
Each instance of the database access object, indicated in Figure 2 at livestore services a number of JDBC Java . sql . Connection implementations, each indicated in Figure 2 at ServicedConnection. The number of connections serviced by a single database access object instance is controlled by the client application that is using the system to access a database, and may be subject to connection pooling.
In order to service requests from a Java . sql . Connectl on object, a smaller number of delegate java . sql . Connecti on objects, refeπed to as DelegateConnection, are used. DelegateConnection objects represent connections to the underlying database provided via a third party JDBC driver and can handle database queries expressed in SQL.
A ServicedConnection object does not always require a DelegateConnection object to which to delegate SQL query handling, and there may therefore be fewer DelegateConnections instances than ServicedConnection instances. A DelegateConnection instance is bound to a ServicedConnection instance only for the duration of the cuπent transaction upon that ServicedConnection instance when a SQL write operation is performed or a SQL read is performed that cannot be answered by the database access object from data in its internal in-memory cache, as will be described below.
The database access object instances cache two types of information:
1. Data drawn from a particular row of a table; and
2. Result of a SQL Select statement that cannot be answered from cache.
The first type of cached data provides the database access object with the ability to support a variety of queries based upon the coverage of the cached data. For example, if the database access object has all of the rows of a particular table in memory, then it can answer any SQL query based upon that table (provided that the query does not use any joins to other tables that are not also in the cache memory). If the database access object has all of a defined subset of the rows from a particular table, then it can support SQL queries from its in-memory cache based upon that table whether the select criterion of the query is stronger than the defined subset; that is, if the query is guaranteed to be a subset of cached subset. The database access object is capable of maintaining this cached data current with respect to the SQL write operations that are performed via one of its ServicedConnection objects, or for which it received cluster synchronisation messages for from a remote application, as will be described. The second type of cache is a cache of the results from SQL queries that have been performed by a DelegateConnection object in instances in which the database access object did not have the data in its cache. In this embodiment, the database access object only tracks from which tables the results are drawn, and if it intercepts any modification to the data in any one of the source tables, then this select cache is invalidated.
Individual ServicedConnection objects maintain a record of what data they have modified during a transaction and then, upon successful committing of those changes via the DelegateConnection object, the changes are worked into the public in-memory cache of the database access object instance.
The database access object makes use of a SQL syntax parser to parse both prepared and non-prepared SQL statements created in the ServicedConnection objects. The results of this parsing operation provide an abstract representation of the query allowing the database access object to determine whether the SQL statement is a read or a write operation, upon which tables the statement depends, and so forth.
The following sections provide more detail on particular aspects of the design of the database access object.
As described in the above, the database access object provides JDBC connection implementations for use by its containing application. In order to make transactional changes to the database and answer queries that cannot be drawn from cache, the database access object uses JDBC connections from a delegate JDBC driver.
The number of delegate JDBC connections available for use by the database access object is typically less than the number of connections offered up by database access object. This structure is beneficial in that it is the source of a major scalability improvement to the application as delegate JDBC connections are expensive resources both financially (JDBC drivers and the database they provide access are commonly licensed on a per-connection basis) and in computing terms. Indeed, because the invention reduces the load that a client application places on the database, all applications connected to that database, whether they work though embodiments of the invention or not, may benefit from a performance improvement. The requirement for fewer delegate JDBC connections arises because some of the database activity can be carried out with reference to data stored in the cache only, with no requirement to access a JDBC database connection. For example, if the application only performs reads on a particular connection during a transaction and those reads can be satisfied from the cache of the database access object, then there is no need to delegate any SQL activities to the database and therefore no need to bind a delegate JDBC connection to the database access object connection. Such an operation, therefore, does not remove' a delegate connection from the pool of connections available for use by the other database access objects. Clearly, as read operations are made, the cache becomes increasingly populated, and the likelihood of any query being answerable from data in the cache alone increases.
The database access object is capable of offering XA JDBC connections that can be enlisted in distributed, or two-phase, commit transactions. In order to do this the delegate JDBC connections must be XA Connections. All of the connections offered by the database access object and all of the delegate connections used by the database access object will be of the same type, either XA or non-XA connections depending on the configuration.
Supporting the XA protocol is complex. However, since the database access object is the resource manager (as defined within the XA protocol definition) it never makes any modification to the database using anything other than a delegate XA connection. Therefore, the database access object can just delegate the XA protocol request to the delegate XA connection if one has been bound to the database access object connection. If no delegate JDBC XA connection has been bound to the database access object connection then no SQL writes have taken place and therefore responding to the XA protocol query is trivial and the database access object can do this directly.
The caching classes are at the central to the operation of embodiments of the invention and Figure 4 outlines the important classes and interfaces.
The database access object maintains three types of cache.
1. A cache of previously read rows from tables configured for caching along with indexes into these rows based on primary key columns, formally identified foreign key columns and empirically defined foreign key columns. (A table can be configured for caching either via a configuration file, or it can be done programmatically.)
2. Sequence caches which maintain pools of unallocated numbers from tables identified sequence sources. These may be normal tables or in some cases (Oracle, for example) may be primary concepts within the database.
3. Cached result sets which are the values returned from a previously completed complex query that was delegated to the underlying database.
Connections maintain a record of data that has been modified but not committed during a current transaction, and this information is used to modify the result of any query performed within the cache based upon the cache rows to make the query results consistent with the changes logged by the connections. For example, a connection may update the salary of a particular employee and then query to find all employee with the salary that the modified employee had previously. The modified employee should not be in the result set returned by the database access object, even though the cached data may indicate that it should be returned.
Each connection has an associated connection query engine object that can access the uncommitted modifications held by the connection in order to modify the results of a query processed by the more general query engine that sources its data from the public cache of table and sequence data.
In order to respond to JDBC presented SQL queries, and understand the implications of the insert, update or delete statements upon the cached data, the database access object parses the SQL statements that it handles. Figure 5 identifies the main classes and interfaces involved in the parsing of SQL statements.
Each database access object instance locates an implementation of the ParserFactory interface based on the database connected to by the delegate JDBC connection. Different databases have different dialects of SQL and therefore it is expected that ParserFactory and Parser will be implemented for generic several specific implementations, including, for example, SQL99, Oracle, DB2, SQLServer, Sybase and possibly others. The SQL syntax of these database systems only varies by small amounts but enough to require different syntax grammars for their SQL. Each of the SQL dialects supported are made available as a ParserFactory class in a Java package.
ParserFactory objects are used to create an instance of a class that implements the Parser interface. Each connection uses the ParserFactory to instantiate its own parser instance ensuring that parsing operations carried out by one connection do not interact with parsing operations performed by another connection.
Parser implementations parse SQL syntax that originates from the callers specification of a JDBC Statement or a PreparedStatement object implemented as part of the invention, and the database access object implementations of the JDBC Statement and PreparedStatement hold references to the abstract syntax tree built by the parser from the SQL statement.
The database access object PreparedStatement objects have access to equivalent, although not necessarily identical, PreparedStatement objects against delegate connections. The delegate PreparedStatement object created by the database access object may be different from the one provided by database access object, in SQL terms, as the delegate PreparedStatement object may have been modified to make its results contribute to the content of the database access object cache more effectively.
Database statements addressed to the database access object may or may not result in a statement on a delegate connection as it may be answered from the cached data and JDBC Statement objects are transient unlike PreparedStatement objects.
All JDBC Connection oriented operations that are non-standard are implemented as static methods in a connection class called that will accept as its first parameter the JDBC connection being used by the application. If the connection provided is not a JDBC connection to a database access object then the operation will have no impact and no exceptions will be thrown.
In this context, clustering describes the ability of the database access object to synchronize a group of database access object instances running within different application servers. Clustering ensures that each database access object instance has the current version of the content of the database as soon as is possible given the latency of local and wide area networks.
In order to move messages from one database access object instance to another, each database access object needs to be able to interact with a customers' chosen message oriented middleware, this being software that ensures a message can be transmitted between one piece of software and another in a reliable and asynchronous manner. A particular advantage of embodiments of the invention is the ability to built connectors that allow the database access objects to use any reasonable messaging system. Figure 6 outlines the major classes and interfaces that are involved in the clustering mechanism of embodiments of the invention.
There are two main interfaces in clustering, the first and most important is ClusterConnection, which is the implementation of the connector that allows the database access object to publish cluster synchronisation messages to and subscribe to cluster messages from a particular message system. In order for the ClusterConnection implementation to communicate with the database access object implementation in order to pass on cluster synchronisation messages that have been received, the embodiment provides an implementation of the ClusterConnectionContainer interface.
ClusterConnection implementations are called upon to publish cluster synchronisation messages after the JDBC connection has committed is changes and hence the publication is non-transactional. That is to say, it is possible for the containing application server or the messaging system to fail after the JDBC connection has committed is changed, resulting in a change to the content of the database without the publication of a synchronisation message. This could lead to other database access object instances that are members of the cluster having a stale but not invalidated cache. In some applications, this is acceptable, since this kind of failure is rare and guaranteeing transactional publication of synchronisation messages comes at a cost.
In order to support transactional publication of cluster synchronisation messages, it is possible for a cluster connector to support the TransactionalClusterConnector interface in which the case database access object will request the publication of a synchronisation message before the JDBC connection transaction has been committed. This gives the TransactionalClusterConnector an opportunity to enlist its publication in the global transaction or make use of the JDBC connection to ensure transactional publication.
The ClusterConnectionFactory makes it possible for the database access object to instantiate and configure a particular cluster connection implementation without any prior knowledge of the messaging transport being used. A configuration document specifies the name of the ClusterConnectionFactory to instantiate in order to create instances of the required cluster connector.
The event package of implemented in embodiments of the invention is not central to providing the advantages of the invention, but in the embodiments in which it is present, it can provide an integration interface for our other software systems as well as providing customers with the integration point for constructing event-driven architectures. Figure 7 presents the classes and interfaces that form the system's event API.
As instances of the database access object come into being as and when the controlling application creates a JDBC connection, it is possible to register an implementation of com.isocra.livestore.api.event.LivestoresListener using the static addLivestoresListener registration method on the com.isocra.Iivestore.api.even LivestoresListeners utility class. These listeners are informed about the opening of a new database access object instance and the closing of existing database access object instances as the first connection is opened and last connection is closed respectively.
Once the application has a reference to a database access object that is passed as an argument to the open method of a LivestoreListener implementation, it is possible to register an implementation of LivestoreListener in order to receive events of type DataChangedEvent, which describe changes in the enterprise data. Each DataChangedEvent describes a transactional unit of changes in terms of the tables, and rows within those tables, that have been inserted, updated or deleted. The following description provides a general description of the processing performed by the database access objects resulting from the principal inputs from a JDBC connection and clustering.
Reads (e.g. SQL select statements') and Caching Policies
This section describes the steps involved in processing a SQL select statement that queries the data based on a given expression.
1. First the select statement must be parsed. If the select is being used to prepare a statement then it may have bind variables to be given appropriate substitutions prior to execution. Otherwise, the statement is complete as is.
2. An initial check is made to determine whether or not the database access object is capable of answering this select statement form cache no matter what the cache content is. This is a check based on the structure of the select statement. In simple embodiments, the database access object may only be able to support queries from a single table with a where clause that requires the comparison of one or more columns with a literal value. If the database access object cannot attempt to answer this query from its cache then the process proceeds from step 6 below and otherwise continues at step 2.
3. The ConnectionQueryEngine instance related to the connection that is performing the select is asked to answer the select from the public cache of data via the single QueryEngine instance, taking into account any modifications to the data that may have been made with respect to this connection that have not been committed. If enough data is available in the cache then the result may be given to the caller otherwise continue to step 4.
4. The query must be direct down to the actual database. If no delegate connection has been bound to this the database access object connection then one will be chosen from the delegate connection pool, possibly causing this connection to block until one becomes available. Prior to the delegation of the select statement, the statement may be "augmented" to improve the contribution that the results will make to the data in the cache. The select statement will be augmented in accordance with rules to be determined, preferably to reflect the pattern of data requests made by the client application. For example, the query may be augmented to ensure that primary key columns and optimistic control columns are included.
5. The result returned by the delegate connection is passed on to the caller. As rows of data are iterated over (in a typical sequence of data accesses) by the caller the data is recorded in the public cache. When the result has been completely scanned, any index guaranteed by the statement will be stored in the public cache. For example, as each employee is iterated over from the result of a statement such as:
select surname from employee where department = 'engineering'
then the table row corresponding to each employee will be put into the public cache, along with an entry in its primary key index. However, only when all of the employees have been scanned can the department = 'engineering' index be built. The system must carefully manage the synchronisation of messages with the building of these indexes. The process then stops.
6. The query cannot inherently be answered from cache due to its complexity.
In this case, the query may already been performed and the results deemed reliable. If the statement is non-prepared then the statement itself will be used to lookup the cached result set. If the statement is of the prepared form then the statement and the values of each of the bind variables are the key into the result set cache. If the result can be found then it can be returned otherwise the query is delegated to the database, as described in step 4 above, with the additional step of recording the results of the query in the result set cache.
Note that the database access object may provide calls that a client can use in order to indicate an advantageous strategy for augmenting the SQL query. The following are examples of circumstances that this may be useful. • An operation allowing the application to ask a database access object connection for the delegate JDBC connection so that it can perform operations against the data source within the same transactional context but without incurring any parsing cost by the database access object or more importantly cache implications. An example use of this would be where the application wished to make an obscure call against, say, Oracle that does not change any data. Making the call on the JDBC connection of the database access object would cause the database access object to clear its local cache and send a cluster message to other objects requesting that they do the same. Performing this operation on the delegate connection would bypass the database access object and maintain the contents of the caches.
• Inform the database access object of the action it should take upon the next SQL write statement performed via the connection. This would cause database access object to ignore its default action (which may be to flush everything from its cache) and use instead the specified action to manage its own cache and replication in remote caches.
All SQL select statements (queries)can contribute to the cached data excluding:
• queries with a group by clause;
• queries with a having clause;
• queries with a distinct clause.
Non-prepared queries may be configured to avoid caching. This will mean that operations performed by the Statement interface can be configured to incur no parsing cost.
In order for a query to contribute to the cached data it must contain at least the primary key columns. If it does not have any primary key columns then it cannot be cached and a configuration that states it should be cached will result in a configuration warning. If optimistic locking is being used then a query must include the optimistic control columns. The database access object will augment queries to include both primary key columns and optimistic control columns where necessary.
The database access object may be configured to specify a column load dependency graph that identifies common columns to be included in a select list based on the presence of another column.
The database access object ensures that any select list extension will not impact the ResultSet passed back to the caller. This is done by appending the additional columns rather than prepending them, and intercepting the access to the ResultSet meta data to hide the additional columns.
Queries selecting columns from more than one table will be augmented to contain the primary key and optimistic control columns of all tables.
For example sel ect e. name , d . name from employee e, department d where e .depid = d . id would be augmented to sel ect e . name, d . name, e . id , e . version , d . name, d . version from empl oyee e , department d where e. depid = d . id .
Consider the case where the configuration of the database access object specifies that a table is being used to generate unique sequence numbers. The first read of sequence will cause the database access object to obtain a configured number of sequence allocations from the sequence. Subsequent allocations will come from in-memory pool until pool is exhausted at which point another batch of sequence numbers will be acquired form the underlying sequence table.
An Oracle sequence will be treated in the same way as generic table sequence except that Oracle provides a syntax for inserting a row and making use of the next sequence entry without the application actually reading the value of the sequence. This syntax places the sequence name where the literal value for the column would otherwise be. The database access object modifies the insert statement to use a literal value supplied from the in-memory pool of sequence numbers. Queries that cannot be answered from cache, but that are known to be based on tables configured for caching, can have the resulting ResultSet cached. If the query originates from a prepared statement then the ResultSet will be cached against the prepared statement with the actual bound parameters as a key. Otherwise the ResultSet will be cached against the actually query text as a key. Cached result sets are cleared in accordance with the cache management policy and when any one of the source tables undergoes a change.
Each cached table maintains a collection of indexes that will be used to promote fast in- cache retrieval of data. The most important index is the primary key index and all queries are augmented to ensure that the primary key is included in the ResultSet so that the index can be maintained. As a primary key column, or columns, is unique then the data in a ResultSet can contribute to the primary key index no mater what the select criterion are. This is also the case for any column marked as unique as long as the column is present in the ResultSet.
Indexes are also built from select criterion involving the equality of one or more columns from the same table. This includes both formal foreign key queries, such as sel ect e . name from empl oyee where department = ' engineering ' and non-foreign key queries such as sel ect e . name from empl oyee where sal ary = 5000. Indexes for foreign key columns are built automatically, unless turned off, whereas indexes on non-foreign key columns will only be built if configured.
In some cases the query may need to be augmented in order to maximise the use of indexes. For example: select surname from empl oyee , department where department . name = ' engi neering ' and emloyee . depid == department . depid and depa rtment . name= ' engi neeπ ng '
If a select returns data from a single table based on a foreign key restriction where the actual value for the foreign key is coming from a join. If the select was augmented to include the depid then the depid index for employee can be built. When an index is built, the order in which the elements are recorded is remembered so that a future select based on that index can re-use or invert the order. For example: sel ect name from employee where depid = ' engineeri ng ' order by sal ary ascending would build and index on depid and populate it for ' engi neeri ng ' and record that the employees are reference by the index in salary ascending order. If another use of the same select or even a select with order by sal ary descendi ng is used then the data required to respond to the query can be obtained from the cache.
Writes
Simple embodiments of the invention do not support deferring the writes requested by the calling application. As a result the, first action that all write operations must perform is to obtain a delegate connection if one has not already been bound to this connection.
Insert
First, the insert statement is delegated to the underlying database via the delegate JDBC connection bound to the applicable database access object connection. If the table to which the insert relates has an optimistic control column defined and a value for this has not been supplied and it does not default in the actual database, then the database access object will insert a starting value of 1. This will provide automatic optimistic lock control via the automatic augmentation of extra column queries to the SQL that the database access object receives.
The insert is recorded so that on commit the public cache can be modified and an appropriate cluster synchronisation message sent. The insert must also be recorded so that the ConnectionQueryEngine can modify any results obtained from the public cache, via the QueryEngine, based upon the un-committed modification of this connection/transaction.
If the database is Oracle, it is possible for the insert statement to refer to a sequence. This instructs Oracle to insert the specified row in the table and give the specified column the next value from the sequence, increasing the value in the sequence for the next user. For this type of operation, the database access object identifies the reference to the sequence and replaces the sequence reference with the literal value for the column as taken from the sequence number pool being managed by reference.
Inserts can take two forms. The first is a single row insert with all of the mandatory columns of the row being supplied with literal values or an insert into one table based on the select from another table(s). This form of the insert is by far the most common and it is easy for the database access object to make the appropriate modification in its cache.
A single row insert will result in a new row in the table cache, if the table is being cached, and any relevant indexes being managed. Any cached ResuItSets drawn from the table being inserted against will be cleared.
A multi-row insert will cause the database access object to flush its cache of all values against the appropriate table. Any cached ResuItSets drawn from the table being inserted against will be cleared.
Update
Updates may be performed against tables or tables being used to source sequence numbers. These two are considered separately. To update a table, modify it to include version update and version restriction in where clause; delegate it; record id. Updating a sequence involves no operation.
In more detail, the first, and most common type of update, is the update of a single row in a table as specified by its foreign key values.
For example: update empl oyee, set sal ary = 5001 where empιd=1001
It is also possible to perform updates against tables where the rows are restricted based on complex where clauses or joins. update employee, set sal ary = sal ary + 1 where depιd= ' engineeri ng '
The cache in the database access object can support both of these updates but there are issues with respect to inter-object instance concurrency. The first form of the update causes the row in the table cache, if present, to be modified to reflect the new values and any relevant indexes to be maintained; sal ary in this example.
Similar behaviour would result from the second update. If we have a depid index for ' engineeri ng ' then the database access object can quickly modify the salary of all appropriate employees. If there is not index, then a scan of the table cache will be performed. If the salary is not resident in cache then it need not be updated. It will be pulled loaded required. Any indexes will be maintained. In this example, maintenance of the indexes would be quite complicated, since an empl oyee row would move from a index with one salary value to an index with another salary value.
A problem arises when optimistic locking is being used and a multi-row update is performed because it is not possible to push the multi-row update down to the delegate connection and specify the optimistic version that each targeted row must have. As a result of this optimistic control issue and the fact that multi-row updates are relatively infrequent, in this embodiment, the database access object will invalidate the appropriate table cache and any ResultSet drawing results from the table.
Delete
Similar to updates in that there are two forms of delete, single row primary key delete or multi-row delete. Typically, multi-row deletes are more common than multi-row updates because of the common requirement for cascaded deletes. Imagine a rule that causes an application to delete all employees of a department if the department is deleted, or maybe a slightly more realistic example of deleting all orderitems if an order is deleted.
It is possible to handle multi-row deletes if the table and the delete being used meet certain criteria: if all orderitems with a fkey of ordereid are deleted and orderid in the order table is unique then it can be handled with grace rather than a complex flush of the orderitem table cache. The reason is that no orderitem can be inserted, or another orderitem modified, to have the orderid being used in the delete criterion in parallel with this delete operation. Transactional Isolation
JDBC and ODBC specify four different levels of transactional isolation that state the maximum interaction between two or more transactions.
1. TRANS ACTION_RE ADJUNCOMMITTED — allows transactions to see uncommitted changes to the data. This means that dirty reads, non- repeatable reads, and phantom reads are possible.
2. TRANSACTION_READ_COMMITTED — means that any changes made inside a transaction are not visible outside the transaction until the transaction is committed. This prevents dirty reads, but non-repeatable reads and phantom reads are still possible.
3. TRANSACTION_REPEATABLE_READ — disallows dirty reads and non-repeatable reads. Phantom read are still possible.
4. TRANSACTION_SERIALIZABLE — specifies that dirty reads, non- repeatable reads, and phantom reads are prevented.
The database access object must respect these isolation levels (which may be configured via the java.sql. Connection interface) within the limits defined by the delegate JDBC driver.
It is always possible to use a stronger isolation level where a weaker one is not available, so TRANSACTION_READ_COMMITTED behaviour can be given where TRANS ACTION_READ_UNCOMMITTED is requested. Some embodiments of the invention will not support TRANSACTION_READ_UNCOMMITTED, the lowest isolation level supported being TRANSACTION_READ_COMMITTED. Moreover, the database access object does not enhance or improve the isolation levels offered by the delegate JDBC driver or target data source; it will only ensure that it respects these isolation levels with the operations it performs, relying on the delegate JDBC driver and target database to implement the require isolation level correctly.
Moreover, some embodiments of the invention not support TRANS ACTION_SERIALIZABLE. The reason for this is that, from a specification perspective, each isolation level proves more difficult to implement and implies slower performance. In general, applications tend to use READ_COMMITTED isolation level. A request for this isolation level will result in a SQLException being thrown and the supportsTransactionlsolationLevel operation on the DatabaseMetaData interface of the database access object for this level will return false. If the default isolation level for the JDBC driver and database is TRANSACTION_SERIALIZABLE then if a connection is used prior to having its isolation level reduced then a SQLException will be thrown.
With TRANSACTION_READ_COMMITTED, all modifications made to rows, be they insert updates or deletes, must be recorded against the connection until it is successfully committed at which point the changes may be rolled back into the public table cache. TRANSACTION_REPEATABLE_READ support may not be implemented in all embodiments of the invention since most performance or concurrency sensitive applications do not use it.
Queries that contribute to the table caches may be made public immediately if no write has been performed against the source tables. If any write has been performed then the contribution to the public cache may only be made once the transaction has been committed.
ResuItSets that are cached may be cached in the public portion of the cache if the source tables of the ResultSet have not been written to. If the source tables of a cacheable ResultSet have been modified then the ResultSet may only be cached publicly upon successful commit of the transaction.
Locking
Locking is not something that appears as part of the JDBC specification; it is the mechanism by which database support the transactional isolation rules along with the more general ACID rules.
The database access object provides the following locking strategies:
1. Optimistic locking using version control columns in tables
2. In-memory write locking
The first of these, using version control columns in tables, is the only way to detect write-write conflicts between two transactions in different virtual machines. Modifications to rows in tables that do not have optimistic control columns could result in write-write conflicts between distributed applications that may not be detected.
In the general case optimistic locking is done by including a count value in all records, the count value being updated whenever a change is made. This resulting code might look like this. sel ect name , age , sex, opt ock from person where ID=227 which might return the following data name : George Bπ sco, age : 21. sex: mal e, optj ock : 6
Now suppose at some time later the client application wishes to update this row. It will use an update statement as follows: update person set age=22, opt_l ock=7 where ID=227 and opt_lock=6
If no-one else has updated the record then this should be successful, and the update will occur. However, if someone else has changed the data, then the update will fail as it will be impossible to find a row in the database where ID=227 and opt_lock=6. The application therefore knows that this has happened and can take appropriate action.
If this type of control is to work all the optimistic locking control must always be done if it is to be useful. It can also obscure the real logic of the system that is being developed. In embodiments of the invention, the database access object can be configured to perform automatically the addition of the optimistic locking control to the SQL statements. This allows application developers to concentrate on their application logic, rather than on the optimistic control logic.
The locking control will happen in memory for non-clustered embodiments of the invention, and therefore the SQL sent to the database will be unchanged. For clustered implementations the database schema must be changed to include these optimistic lock control columns.
In-memory write locking can be configured to detect write-write conflicts between multiple transactions within the same database access object instance without storing optimistic control values within the table. The database access object is not capable of detecting or preventing read-write conflicts where one transaction modifies data being read by another transaction. Individual tables may be configured to use table optimistic locking, memory optimistic locking or no locking at all. In the cases where a write is performed against a row that has not been read then the optimistic control field is incremented but not tested in the where clause of the update or delete operation.
Multi-row updates (those performed without a primary key restriction) cannot support optimistic locking because it is not possible to state the required old value for each row targeted by the update statement. The database access object will allow multi-row updates to be performed and it will do so by reading the oc value of all rows that will be targeted by the update (for update) and placing a write-lock on them so that they cannot be updated by anyone else. Then the rows will be located in cache and the oc values are compared to check that no row has been changed.
Multi-row deletes must also be handle by the database access object. As with multi-row updates, the old value of the optimistic control value cannot be specified. Multi-row deletes occur more frequently than multi-row deletes, generally as the result of a cascaded delete, that is the deletion of an ownership network of objects. Multi-row deletes are handled in the same way as multi-row updates; that is to read the optimistic control variable for each row targeted by the delete (for update), causing a write lock and then doing an in-memory check of all the oc values.
It is possible to read different parts of the same row at different times and get data based on different versions of the row as identified by the optimistic control column. If the connection reading the row is using READ_COMMITTED isolation then everything is fine. However, if it is in REPEABLE_READ mode then a SQLException will be thrown.
Cache Preloading
The database access object provides the ability, via its configuration document, to preload its cache ahead of any usage by its calling application and initialise any required indexes. The preload element of the database access object configuration document supports two different sub-elements; code, and sql . The code and sql elements can appear in any order and may occur multiple times.
For example:
<preload>
<code>com. I socra .11 vestore . exampl es . Prel oader ( "a" , 55)</code> <sql>select * from appserver where name='UK-l-l '</sql>
</preload> The code element supports the execution of an arbitrary Java class that implements the preloader interface. The format of the code content is <cl ass>[(<parameters>)]. The class referred to must be available to the livestore instance via Cl ass . forNa eO and must implement the com . i socra . I i vestore. api . Prel oader interface. The class must only have a single constructor which may require arguments in which case the code element must specify the values. An instance of the Prel oader class will be created and the execute method will be called with a suitable JDBC connection allowing the Preloader to invoke SQL select statements. If any other form of SQL statement is attempted an exception will be reported and this livestore instance will not complete its initialisation correctly.
The SQL element allows the specification of a SQL select statement that will be executed to load data in the cache. Only SQL select statements will be supported, any other SQL statement will cause an exception and the failure of the database access object initialisation.
Clustering
Database access object instances may be configured into clusters and these cluster share synchronisation information in order to ensure that one database access object cache reflects changes that have been committed via another database access object cache. Embodiments of the invention can support partitioning of cluster synchronisation messages in order to reduce or refine resource usage. It is also possible for applications the do not use database access objects to generate and consume cluster messages, thus aiding legacy integration. The database access object configuration document specifies the cluster configuration for a database access object instance.
A particular embodiment of the invention provides two cluster communication implementations: database access object clustering and JMS clustering. Cluster messages may be formatted in an opaque binary format, preferred for performance reasons, or an open XML format and the database access object configuration document will specify which format to be used.
Database access object clustering uses no third party products, just Java 1.2's TCP/IP sockets. A daemon process, called LivestoreCIusterHub, acts as the registration point and message broadcaster for all database access object instances within the cluster.
The database access object cluster implementation requires three parameters:
1. The name or IP address of the host running the LivestoreCIusterHub. Defaults to localhost.
2. The number of the port the LivestoreCIusterHub is listening on, no default.
3. The host name or IP address of the particular network interface card that this database access object cluster connector should use for its communication. This is useful in multi-homed server machines that have multiple network cards.
The class that implements the database access object clustering ClusterConnectionFactor, the class that should be mentioned in the database access object configuration document, is com.isocra.livestore.cluster.livestore.ClusterConnectionFactory.
LivestoreCIusterHub instances may be brought up as primary or secondary instances. If a hub is configured as a secondary instance then it will talk to the primary, which will publish the availability of the secondary to the cluster members.
Database access object clustering has two different configurations controlled by a configuration property called "transactional" with values "true" or "false".
The non-transactional configuration of database access object clustering publishes cluster message after the transaction that causes them has been committed. If the database access object instance dies (or is terminated) between committing the transaction and publishing the cluster message to the hub then the message has been lost. In this case the hub will inform the other member of the cluster and they will be forced to flush their cache of data.
If a cluster member has a problem communicating with the hub then it will attempt to reconnect once and then it will attempt to contact the secondary hub. If it cannot connect to the secondary, or there is no secondary, then it will inform the database access object instance that it must flush its cache and refrain from caching anything until the connection is re-established.
The transactional configuration of database access object clustering makes use of the delegate JDBC connection to persist the cluster message to be published as part of the transaction thereby achieving atomicity with the application transaction. The ClusterConnector is still responsible for publishing the cluster message after the transaction has been committed but if the hub detects the loss of a cluster member it can look at the database and check which messages it never received and publish those automatically. If the hub itself dies then when each cluster member goes to the secondary hub it must inform it of the last message id received from each of the other members of the hub thus allowing the hub to check the database for any later message and re-publish them.
The JMS cluster communication implementation will make use of a third party, not- bundled, JMS implementation to publish cluster messages and subscribe to cluster messages. Both a Publish-Subscribe (Topic) and Point-to-Point (Queue) mechanisms will be supported.
For the queue configuration, JNDI lookup names for the QueueConnectionFactory and Queue are required. For the topic configuration, the JNDI lookup names for the TopicConnectionFactory and Topic are required. If the JMS implementation supports distributed transactions, XA interfaces, and the database access object configuration for the JMS clustering specifies it then transactional publication will be used.
The XML format for the cluster message are demonstrated in the following example.
The example causes the receiving database access object instance to clear all cached data. This is the kind of message that would be emitted as a result of an unknown write operation being performed. The clearAH operation cannot be combined with any other operations:
<l ιvestorecm src="some identi fier for the source"> <clearAl l /> </l ιvestorecm>
The following example causes the receiving database access object instance to clear all data cached in a particular table. This is the kind of message that would be emitted as a result of an unknown operation where the actual target table was identified. No other operations can be specified for the same table although operations on other tables can be included:
<l ιvestorecm src="some identifier for the source"> <table name="EMPL0YEE">
<cl earAl l /> </table> </l ιvestorecm>
The following example demonstrates the specification of inserts, updates and deletes against a particular table and is the normal form of messages expected: <l ι vestorecm>
<tabl e name="EMPL0YEE">
<ι nsert surname="SHAMIR" sal ary="100" , versιon="l"/> <update surname="H0VERD" sal ary="101"/> <delete surname="SHEEHAN" versιon="101"/>
<tabl e> </l ι vestore>
A utility class is provided that converts between opaque cluster messages to XML cluster messages and vice versa. This conversion may be built into an application that consumes from one cluster connection and publishes on another cluster connection with a format change.
The database access object provides a cache of statements that have been prepared against each connection.
Although JDBC has the notion of prepared statements that allow a parameterised piece of SQL to be given to the connection (and hence the database) for parsing and possibly query optimisation for later execution with specific variable substitutions. J2EE makes the mechanism almost useless. J2EE contained components do not hold references to
JDBC connections, they ask the J2EE container for connections allowing the container to leverage connection pools and other scalability mechanisms. This means that every time a component requests a connection object it can be given a different one and using a statement prepared against one connection in another connection is invalid. JDBC 3.0 contains facilities for preparing statements against a connection pool which removes this problem. Until the availability and use of JDBC 3.0 drivers becomes pervasive the caching of prepared statements it the best way forward. The database access object parses SQL statements and anything that can be done to reduce the amount of parsing that goes on the better.
When a connection requests a new prepared statement, the database access object checks its cache of statements, using the full text of the SQL string being provided as the basis for the prepared statement key. If the database access object finds a prepared statement then it will return it otherwise a new prepared statement is created against the delegate driver connection and registered with the prepared statement cache for the connection. As prepared statements are an expensive resource, the number of cached prepared statements can be managed and the least used statements will be cleared from the cache if the limit is reached. The pscachemax attribute in the configuration document specifies the maximum number of prepared statements that can be cached. When pscachemax is set to zero then no statements will be cached.
An example configuration document is given below with comments to describe the properties and values:
<l ι vestore pscachemax="100" <!- states the max number of cached prepared statements. Infinte by default. --> rscachemax="10" <!-state sthe max number of cached result sets. Infinite by default --> >
<secuπty>
<user name="x" password="y"/>
<!-- restricts the user name and password that can be used to connect to this livestore instance to x/y. Without this entry a connection with no user name or password is accepted. --> </secuπty>
<table name="EMP" cache="true" cluster="true"> <i-- By defaults a table will not be cached and will not send cluster synchronisation messages. Caching and clustering are independently controllable as one member of a cluster may not cache a table but another one does and therefore cluster synchromsaton message must be sent regarding changes to this table. -->
<ιndex fιelds="depιd" on="false"/>
<!-- depid is an fkey to the department table and as such livestore will maintain indexes for queπes performed in this fkey. This turns it off. -->
<ιndex fιelds="salary" on="true"/>
<!-- unlike depid which is an fkey, salary is not and therefore livestore would not maintain indexes for it but this forces livestore. -->
<lockιng ocfιeld="versιon'7> </table> <sequence name="abc" cache="true" allocatιonSιze="10"/>
<cluster factory="some class implementing " format="opaque">
<>-- factory identifies the class that implements the factory interface Format specifies either opaque (default if not specified) or xml --> <propertιes>
<property name="hub">deans 1099</property> </propertιes>
</cluster>
<delegate maxConnectιons="5" connect!onTιmeout="500ms">
<'-- maxConnections specifies the number of delegate connections that livestore will use at max This max will only be reached when the same number of livestore connections has been opened up by the application connectionTimeout specifies how long livestore will block waiting to acquire a delegate connection -->
<'-- The delegate connection specification will take one of the following forms -->
<dπver class="a b c d e" url="" user="" password="'7> <propertιes>
<property name="">value</property> </propertιes>
</dπver>
<datasource name="a b c d e" user="" password=""/>
<ιmtialContextProperties> <name>value</name>
</ιmtιalContextProperties> </datasource>
7delegate>
<preload>
<sql>select * from employee where depιd='engιneeπng'</sql> <prel oader class="com i socra l ivestore examples Preloader" params="a , 55"/>
</preload> </l ιvestore> There are several enhancements to the embodiment described above that can optionally be incoφorated into embodiments of the invention.
Virtual Tables: Virtual tables support the storage of state information between transactions and instances of clustered database access objects. A virtual table has much of the functionality of an actual database table, but is maintained entirely within the cache and not in the database itself.
A virtual table is defined within the configuration document in much the same way as it would be defined within a database. Virtual tables may have rows inserted, updated and deleted and selects performed within that table. Joins may or may not be supported, as required. Virtual tables may form the basis of any support for the Object Caching Service for Java specification. Embodiments may include a loader API which can load rows into the virtual table at the beginning.
Deferred Writes: By default, all writes performed to database access objects connections are converted immediately into writes performed on the delegate JDBC connections. Deferred writes support the collection of all writes operations to just before the commit of the transaction and may make use of the delegate JDBC drivers batch write facility. As soon as a query is performed that cannot be answered in cache then any writes that have been deferred may need to be flushed depending on the tables being searched and the tables the writes impact. By this means, repetitive writes of the same or similar information to the underlying database server are avoided.
Partitioning: Partitioning is the term used to describe the segmentation of inter-database access objects synchronisation messages into discrete groups with the objective of managing the bandwidth requirements, CPU resources and providing for different qualities of service in the distribution of synchronisation messages. The simplest form of partitioning is schema-based whereby changes to particular tables are separated from changes in other tables. For example, one cluster of servers may have synchronised reference data but not synchronized order data but each server in the cluster may need to synchronized its order data with anther cluster of order management servers. More complex forms of partitioning are possible with data based partitioning where row in a table that meet one criterion are synchronised via one cluster and row in the same table that meet another criterion are synchronized via another cluster.
Mapped connections: Provision of mapping layer that can map from an abstract scheme to the real physical schema. This can support richer CMP storage for J2EE servers.
Java Interfaces: The Java classes and interfaces that form the Application Programmer Interface available to the users of embodiments of the invention above and beyond the JDBC 2.1 for example, within a CVS repository. The Java classes and interfaces are specified in the javadoc documentation reproduced below.

Claims

Claims
1. A database access system comprising
a. a database access object operating on a client computer that functions as an interface between a database client and a database server,
b. in which the database object receives database queries from the client and returns a response to the queries to the client wherein each database access object maintains a cache of data and if the response data is available in the cache provides the response from the cached data and otherwise passes a query to the database server to retrieve data to make a response, and
in which, upon the data cached in the client being changed, the client sends a message to other like database access objects to indicate to the other database access objects the extent to which the data in their caches is invalid.
2. A database access system according to claim 1 in which in addition to, or as an alternative to, invalidating the content of the cache of another database access object, the client access object sends a message to enable the other database access objects to update their caches to maintain them in synchronism with the database.
3. A database access system according to claim 1 or claim 2 in which in cases where the database access object passes the query to the database server, the query and/or its result are used to update the information contained in the object's cache.
4. A database access system according to any preceding claim in which each client application communicates with a database driver that handles access to the database, the driver exposing a pre-defined API that abstracts the actual nature of the database.
5. A database access system according to claim 4 in which the database access object operates as a connection between the client application and the database driver.
6. A database access system according to claim 5 in which the database access object exposes an API that is functionally equivalent to the API that is exposed by the database driver.
7. A database access system according to any preceding claim including one or more delegate objects, each object operating to establish communication with a database driver.
8. A database access system according to claim 7 in which instances of the delegates are fewer in number than instances of the database access objects.
9. A database access system according to claim 8 in which, in the event that a delegate is not available, the query to the database access object waits for a delegate to become available, thereby blocking the process that made the query.
10. A database access system according to any preceding claim in which, in cases where the database access object retrieves data from the database server, it may retrieve data in addition to that required to make the response, all data retrieved from the database server being stored in the cache.
11. A database access system according to any preceding claim operative to defer table writes by retaining the modified data in the table cache and sending synchronisation messages to other database access objects connected to the same database server.
12. A database access object operable on a client computer to function as an interface between a database client and a database server, the database object being operative to receive database queries from the client and to return a response to the queries to the client wherein each database access object maintains a cache of data and if the response data is available in the cache provides the response from the cached data and otherwise passes a query to the database server to retrieve data to make a response, and in which, upon the data cached in the client being changed, the client sends a message to other like database access objects to indicate to the other database access objects the extent to which the data in their caches is invalid or to update their caches.
13. A database access object according to claim 12 operative to defer table writes by retaining the modified data in the table cache and sending synchronisation messages to other database access objects connected to the same database server.
14. A database access system comprising a database access object operating on a client computer that functions as an interface between a database client and a database server, in which
a. the database object receives a request for data from the client and returns a response to the request to the client wherein each database access object maintains a cache of data and if the response data is available in the cache provides the response from the cached data and otherwise passes a request to the database server to retrieve data to make a response, and in which
b. in cases where the database access object retrieves data from the database server, it may retrieve data in addition to that required to make the response, all data retrieved from the database server being stored in the cache.
15. A database access system according to claim 14 in which the database access object exposes entry points to its API that enable the client application to give an indication to the database access object of a strategy that should be adopted in obtaining additional data for populating the cache.
16. A database access system according to claim 14 or claim 15 in which the request made to the database server is an augmented form of the query received from the client.
17. A database system according to claim 14 in which augmentation of the query is based upon a prediction of the expected nature of subsequent requests from the client.
18. A database access system according to any one of claims 14 to 17 in which, upon the data cached in the client being changed, the client sends a message to other like database access objects to indicate to the other database access objects the extent to which the data in their caches is invalid.
19. A database access system according to any one of claims 14 to 18 in which queries that are received by the database access object are augmented to implement optimistic locking control.
20. A database access method according to claim 19 which augments the database schema, and the queries and updates that operate on that schema, to insert optimistic control columns in tables.
21. A database access object operable on a client computer to function as an interface between a database client and a database server, the database object being operative to receive database queries from the client and to return a response to the queries to the client wherein each database access object maintains a cache of data and if the response data is available in the cache provides the response from the cached data and otherwise passes a query to the database server to retrieve data to make a response, and in which, in cases where the database access object retrieves data from the database server, it may retrieve data in addition to that required to make the response, data retrieved from the database server being stored in the cache.
22. A database access object according to claim 21 operative to create virtual tables presented to the client application that are entirely held within the cache.
3. A database access object according to claim 22 in which a schema for the virtual tables are defined within configuration data for the database access object.
PCT/GB2003/001233 2002-03-22 2003-03-21 Database system comprising database access object with cache WO2003081464A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2003216854A AU2003216854A1 (en) 2002-03-22 2003-03-21 Database system comprising database access object with cache

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB0206810.4A GB0206810D0 (en) 2002-03-22 2002-03-22 Database system
GB0206810.4 2002-03-22

Publications (2)

Publication Number Publication Date
WO2003081464A2 true WO2003081464A2 (en) 2003-10-02
WO2003081464A3 WO2003081464A3 (en) 2004-06-03

Family

ID=9933528

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2003/001233 WO2003081464A2 (en) 2002-03-22 2003-03-21 Database system comprising database access object with cache

Country Status (3)

Country Link
AU (1) AU2003216854A1 (en)
GB (1) GB0206810D0 (en)
WO (1) WO2003081464A2 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7328222B2 (en) * 2004-08-26 2008-02-05 Oracle International Corporation Method and apparatus for preserving data coherency in a database by generating a command object that includes instructions for writing a data record to a local cache
WO2008038271A1 (en) * 2006-09-28 2008-04-03 Xeround Systems Ltd. An apparatus and method for a distributed storage global database
EP2146292A1 (en) * 2008-07-18 2010-01-20 QlikTech International AB Method and apparatus for extracting information from a database
US7783900B2 (en) * 2003-05-30 2010-08-24 International Business Machines Corporation Querying encrypted data in a relational database system
US9971827B2 (en) * 2010-06-22 2018-05-15 Microsoft Technology Licensing, Llc Subscription for integrating external data from external system
CN110347707A (en) * 2018-04-03 2019-10-18 艾玛迪斯简易股份公司 Update cached data
CN111770182A (en) * 2020-06-30 2020-10-13 北京百度网讯科技有限公司 Data pushing method and device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102841889A (en) * 2011-06-20 2012-12-26 中兴通讯股份有限公司 Method and device for achieving efficient database access based on ORM (object relational mapping) architecture

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5897634A (en) * 1997-05-09 1999-04-27 International Business Machines Corporation Optimized caching of SQL data in an object server system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5897634A (en) * 1997-05-09 1999-04-27 International Business Machines Corporation Optimized caching of SQL data in an object server system

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
FRANKLIN M ET AL: "Transactional client-server cache consistency: alternatives and performance" ACM TRANSACTIONS ON DATABASE SYSTEMS, ASSOCIATION FOR COMPUTING MACHINERY. NEW YORK, US, vol. 22, no. 3, September 1997 (1997-09), pages 315-363, XP002247935 ISSN: 0362-5915 *
INTERNATIONAL BUSINESS MACHINES CORPORATION: "Using consistency check value as optimistic locking value for staging host production data at client/server for access by IMS csobject client/server feature" RESEARCH DISCLOSURE, KENNETH MASON PUBLICATIONS, HAMPSHIRE, GB, vol. 442, no. 150, February 2001 (2001-02), XP007127709 ISSN: 0374-4353 *
KELLER A M ET AL: "A predicate-based caching scheme for client-server database architectures" PARALLEL AND DISTRIBUTED INFORMATION SYSTEMS, 1994., PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON AUSTIN, TX, USA 28-30 SEPT. 1994, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, 28 September 1994 (1994-09-28), pages 229-238, XP010100051 ISBN: 0-8186-6400-2 *
U. HALICI, AND A. DOGAC: "An Optimistic Locking Technique For Concurrency Control in Distributed Databases" IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, vol. 17, no. 7, July 1991 (1991-07), pages 712-724, XP002271650 *
YONGDONG WANG ET AL: "CACHE CONSISTENCY AND CONCURRENCY CONTROL IN A CLIENT/SERVER DBMS ARCHITECTURE" SIGMOD RECORD, ASSOCIATION FOR COMPUTING MACHINERY, NEW YORK, US, vol. 20, no. 2, 1 June 1991 (1991-06-01), pages 367-376, XP000364651 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7783900B2 (en) * 2003-05-30 2010-08-24 International Business Machines Corporation Querying encrypted data in a relational database system
US7328222B2 (en) * 2004-08-26 2008-02-05 Oracle International Corporation Method and apparatus for preserving data coherency in a database by generating a command object that includes instructions for writing a data record to a local cache
WO2008038271A1 (en) * 2006-09-28 2008-04-03 Xeround Systems Ltd. An apparatus and method for a distributed storage global database
US7890463B2 (en) 2006-09-28 2011-02-15 Xeround Systems Ltd. Apparatus and method for a distributed storage global database
EP2146292A1 (en) * 2008-07-18 2010-01-20 QlikTech International AB Method and apparatus for extracting information from a database
US8244741B2 (en) 2008-07-18 2012-08-14 Qliktech International Ab Method and apparatus for extracting information from a database
US9971827B2 (en) * 2010-06-22 2018-05-15 Microsoft Technology Licensing, Llc Subscription for integrating external data from external system
CN110347707A (en) * 2018-04-03 2019-10-18 艾玛迪斯简易股份公司 Update cached data
CN111770182A (en) * 2020-06-30 2020-10-13 北京百度网讯科技有限公司 Data pushing method and device

Also Published As

Publication number Publication date
WO2003081464A3 (en) 2004-06-03
AU2003216854A1 (en) 2003-10-08
AU2003216854A8 (en) 2003-10-08
GB0206810D0 (en) 2002-05-01

Similar Documents

Publication Publication Date Title
US11176140B2 (en) Updating a table using incremental and batch updates
US6820085B2 (en) Web system having clustered application servers and clustered databases
CN1708757B (en) A transparent edge-of-network data cache
US7162467B2 (en) Systems and methods for managing distributed database resources
US7526479B2 (en) Configuration manager in enterprise computing system
US5448727A (en) Domain based partitioning and reclustering of relations in object-oriented relational database management systems
US7970823B2 (en) System for sharing data objects among applications
US6681225B1 (en) Method, system and program products for concurrent write access to a global data repository
US6502103B1 (en) Providing composed containers and data objects to support multiple resources
US8156082B2 (en) System and methods for temporary data management in shared disk cluster
US6567809B2 (en) Disabling and reloading enterprise java beans using database trigger programs
US6219675B1 (en) Distribution of a centralized database
US7555488B2 (en) Prefetching and caching persistent objects
US7272833B2 (en) Messaging service in a federated content management system
US6823514B1 (en) Method and system for caching across multiple contexts
CN100437590C (en) Method for prefetching object
KR101400214B1 (en) Appratus for providing a hybrid c interface
US6571252B1 (en) System and method for managing persistent objects using a database system
Bernstein et al. Indexing in an Actor-Oriented Database.
US8655853B2 (en) System and method for optimizing database transaction termination
US7827135B2 (en) Method and apparatus for relaxed transactional isolation in a client-server caching architecture
WO2003081464A2 (en) Database system comprising database access object with cache
Yang Data synchronization for integration systems based on trigger
KR20020038575A (en) Method and apparatus for maintaining data integrity across distributed computer systems
Porto et al. Persistent object synchronization with active relational databases

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP