US8904381B2 - User defined data partitioning (UDP)—grouping of data based on computation model - Google Patents
- Publication number
- US8904381B2 (application US12/358,995)
- Authority
- US
- United States
- Prior art keywords
- data
- computer processor
- partitions
- region
- partitioning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G06F17/30339—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/278—Data partitioning, e.g. horizontal or vertical partitioning
-
- G06F17/30584—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5066—Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/964—Database arrangement
- Y10S707/966—Distributed
- Y10S707/967—Peer-to-peer
- Y10S707/968—Partitioning
Definitions
- IT information technology
- BI operational business intelligence
- HPC high performance computing
- HPC and scalable data warehousing, both of which are based on the use of computer cluster technology and the partitioning of tasks and data for parallel processing.
- improper partitioning of data over computer cluster nodes often causes a mismatch between where computation converges and where data resides.
- FIG. 1A illustrates a river drainage network model, according to an embodiment
- FIG. 1B illustrates a cluster of servers to implement a river drainage network model described with reference to FIG. 1A , according to an embodiment
- FIG. 2A illustrates a partitioning of river segments into regions, the river segments being included in a river drainage network model described with reference to FIG. 1A , according to an embodiment
- FIG. 2B illustrates a data dependency graph for defining an order of processing data partitions, according to an embodiment
- FIG. 3A illustrates a block diagram of a UDP based parallel processing system, according to an embodiment
- FIG. 3B illustrates a region tree with region levels in data partitioning, according to an embodiment
- FIG. 3C illustrates parallel access of allocated partitioned data, according to an embodiment
- FIG. 4 illustrates a block diagram for a generalized process for parallel processing based on a UDP, according to an embodiment
- FIG. 5A illustrates a flow chart of a method for generating a UDP, according to an embodiment
- FIG. 5B illustrates a flow chart of a method for allocating data partitions, according to an embodiment
- FIG. 6 illustrates a system architecture based on a convergent cluster for implementing UDP based parallel processing, according to an embodiment
- FIG. 7 illustrates a block diagram of a computer system, according to an embodiment.
- System One or more interdependent elements, components, modules, or devices that co-operate to perform one or more functions.
- Configuration Describes a set up of elements, components, modules, devices, and/or a system, and refers to a process for setting, defining, or selecting hardware and/or software properties, parameters, or attributes associated with the elements, components, modules, devices, and/or the system.
- a cluster of servers may be configured to include 2**N servers, N being an integer.
- An architecture used in an information technology (IT) environment may include electronic hardware, software, and services building blocks (used as platform devices) that are designed to work with each other to deliver core functions and extensible functions.
- the core functions are typically a portion of the architecture that may be selectable but not modifiable by a user.
- the extensible functions are typically a portion of the architecture that has been explicitly designed to be customized and extended by the user as a part of the implementation process.
- Model can be a representation of the characteristics and behavior of a system, element, solution, application, or service.
- a model as described herein captures the design of a particular IT system, element, solution, application, or service.
- the model can include a declarative specification of the structural, functional, non-functional, and runtime characteristics of the IT system, element, solution, application, or service. The instantiation of a model creates a model instance.
- Embodiments of systems and methods for partitioning of data based on a computation model are disclosed herein that enable convergence of data intensive computation and data management for improved performance and reduced data flow.
- co-locating computation and data is desirable for efficiency and scalability. Therefore, it is desirable to partition data in a manner that is consistent with the computation model.
- the systems and methods disclosed herein provide a user defined data partitioning (UDP) key for making application-aware data partitioning of original data.
- UDP user defined data partitioning
- Moving data is often more expensive and inefficient than moving programs, thus it is desirable that computation be data-driven.
- the goal of co-locating computation and supporting data may be achieved if data partitioning of the original data and allocation of the data partitions to the computational resources are both driven by a computation model representing an application.
- a hydrologic application is described that uses the UDP key for data partitioning based on the computational model for the application. Based on hydrologic fundamentals, a watershed computation is made region by region from upstream to downstream in a river drainage network. Therefore, the original data for the hydrologic application is to be partitioned in accordance with the computational model for computation efficiency.
- the UDP enables grouping of data based on the semantics at the data intensive computing level. This allows data partitioning to be consistent with the data access scoping of the computation model, which underlies the co-location of data partitions and task executions.
- the partition keys are generated or learnt from the original data by a labeling process based on the application level semantics and computation model, representing certain high-level concepts.
- the UDP partitions data by taking into account the control flow in parallel computing based on a data dependency graph.
- the UDP methodology supports computation-model aware data partitioning, for tightly incorporating parallel data management with data intensive computation while accommodating the order dependency in multi-step parallel data processing.
- the disclosure includes a section outlining an application involving watershed computation performed over a river drainage network, a section describing additional details of user defined data partitioning (UDP), and a section describing implementation considerations.
- FIG. 1A illustrates a river drainage network model 100 , according to an embodiment.
- FIG. 1B illustrates a cluster of servers 110 to implement the river drainage network model 100 described with reference to FIG. 1A , according to an embodiment.
- the river drainage network model 100 is a hydro-informatics system (HIS) that includes one or more servers 112 (also referred to as computational devices or computational servers) coupled by a communication network 116 to carry out a class of space-time oriented data intensive hydrologic computations that are performed periodically or on demand with near-real-time response (e.g., responsive in a time frame that is soon enough to take a corrective action).
- the HIS, like many other earth information systems, may be implemented as a cluster technology based HPC system. Additional details of the implementation aspects of a cluster technology based HPC system architecture are described with reference to FIG. 6 .
- the river drainage network model 100 collects data (such as rainfall, water level, flow rate, discharge volume, and others) from various inputs.
- the data which may be stored in a database 114 , is referred to as the original data.
- Computation results, which may utilize the data, may be stored in the same underlying databases to be retrieved for analysis, mash-up and visualization. The locality match of parallel computing and parallel data management is desirable to improve the efficiency of such data intensive computation.
- the majority of data stored in the river drainage network model 100 is location-sensitive geographic information.
- the river drainage network model 100 may be illustrated as an unbalanced binary tree, where river segments are named by binary string codification. For example, starting downstream at a mouth of a river is binary segment 0 and ending upstream at an origin of the river is binary segment 0000000, thereby indicating there are 7 river segments between the mouth of the river and the origin of the river. A tributary nearest to the mouth of the river is shown as binary segment 01.
- Data describing the river segments binary tree may be stored in a table, where each row represents a river segment, or a tree node.
- a table storing the binary tree representing the river drainage network model 100 includes 21 rows for the 21 binary segments. It is understood that the number of river segments may vary depending on each application.
- the table may include attributes such as node_id, left_child_id, right_child_id, node_type (e.g., RR if it is the root of a region; or RN otherwise), and a region_id that is generated as the UDP key.
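The table layout described above can be sketched as a small data structure. This is an illustrative reconstruction, not code from the patent: the field names follow the attributes listed (node_id, left_child_id, right_child_id, node_type, region_id), while the sample segments are hypothetical.

```python
# Minimal sketch of the river-segment table described above.
# Field names follow the attributes listed in the text; the sample
# segments are illustrative, not taken from the patent's figures.

from dataclasses import dataclass
from typing import Optional

@dataclass
class RiverSegment:
    node_id: str                      # binary string codification, e.g. "01"
    left_child_id: Optional[str]
    right_child_id: Optional[str]
    node_type: str = "RN"             # "RR" if root of a region, else "RN"
    region_id: Optional[str] = None   # generated later as the UDP key

# A tiny three-segment tree: mouth "0" with children "00" and "01".
table = [
    RiverSegment("0", "00", "01"),
    RiverSegment("00", None, None),
    RiverSegment("01", None, None),
]
```

In a relational setting, each `RiverSegment` corresponds to one row; region_id stays unassigned until the labeling process runs.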
- FIG. 2A illustrates a partitioning of river segments included in the river drainage network model 100 into regions, according to an embodiment.
- the river segments may be grouped or partitioned into regions 210 and may be processed in an order from upstream regions to downstream regions.
- the twenty-one (21) river segments shown in the river drainage network model 100 may be partitioned into 5 regions 210 . It is understood that the number of river segments and the number of regions may vary depending on each application.
- Regions 210 also form a tree but not necessarily a binary tree. Each region is represented by a node in the region tree, and viewed as a partition of the river segments tree.
- a region has the following properties (amongst others):
- region_id that takes the value of the root node_id
- region_level as the length of its longest descendant path counted by region, bottom-up from the leaves of the region tree
- the concept of defining or configuring a region is driven by the computational needs defined by the application and the model (is application-aware and is consistent with the computational model) and the desire to co-locate data and computation to reduce data flow.
- the formation of a region is not an original property or attribute of river segments. That is, the original data associated with the river drainage network model 100 excludes the region as one of its properties or attributes.
- the formation or configuration of a region represents the results of a data labeling process, and the region_id instances generated by that labeling process serve as the user defined data partitioning (UDP) keys of the river segments table. Additional details of the UDP key are described with reference to FIGS. 3A, 3B, and 3C.
- the river-segment table is partitioned by region across multiple server nodes 112 to be accessed in parallel.
- the same function may be applied, in a desired order, to multiple data partitions corresponding to the geographic regions. For example, computations being performed on a region need to retrieve the updated information of the root nodes of its child regions.
- the results of local executions are communicated through database access, using either permanent tables or temporary tables.
- FIG. 2B illustrates a data dependency graph 220 for defining an order of processing data partitions, according to an embodiment.
- watershed computations are made in a desired sequence as indicated by the data dependency graph 220 , region-by-region, from upstream to downstream.
- the region tree is post-order traversed, the root being computed last.
- the desired order in performing computation is described as 'data dependency graph' 220 based parallel processing, since geographically dependent regions 210 must be processed in a certain order, while parallel processing opportunities exist for regions 210 that can be computed in any order. For instance, regions 210 at different tree branches may be processed in parallel.
- the data partitioning is performed in a manner that is consistent with the data dependency graph.
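The dependency-driven order above can be sketched as wave scheduling over the region tree: a region becomes ready once all of its child (upstream) regions have been processed, and all ready regions may run in parallel. This is an illustrative Python sketch; the region ids and tree shape are invented for the example.

```python
# Consume the region tree in waves from the leaves (upstream) to the
# root (downstream); regions within a wave are mutually independent.

def processing_waves(children):
    """children: dict region_id -> list of child region ids."""
    pending = {r: set(c) for r, c in children.items()}
    waves = []
    while pending:
        ready = sorted(r for r, deps in pending.items() if not deps)
        if not ready:
            raise ValueError("cycle in region dependencies")
        waves.append(ready)
        for r in ready:
            del pending[r]
        for deps in pending.values():
            deps.difference_update(ready)
    return waves

# Five regions: R1..R4 upstream of root R0; R3 and R4 feed R2.
tree = {"R0": ["R1", "R2"], "R1": [], "R2": ["R3", "R4"], "R3": [], "R4": []}
# processing_waves(tree) -> [["R1", "R3", "R4"], ["R2"], ["R0"]]
```

Everything in one wave can be dispatched to server nodes concurrently; the next wave starts only after the previous one completes, which enforces the upstream-to-downstream order.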
- FIG. 3A illustrates a block diagram of a UDP based parallel processing system 300 , according to an embodiment.
- the parallel processing system 300 includes a table T 310 that includes at least a portion of the original data that may be parallel processed, a UDP key 320 used to partition data included in the table T 310 , data partitions 330 , and allocated partitioned data 340 .
- the processing system 300 supports the processing of a query to retrieve one or more data records stored in the allocated partitioned data 340 .
- the UDP key 320 that includes at least one key property excluded from the original data (e.g., a region described with reference to the river drainage network model 100 ) is generated or learnt from the original data based on an application 360 , including application level semantics and a computation model representing the application.
- a UDP key for partitioning a table T 310 (that includes at least a portion of the original data) includes the following processes:
- a labeling process 322 to mark rows of T 310 for representing their group memberships, e.g., to generate partition keys for data partitioning;
- an allocating (or distributing or partitioning) process 332 to distribute data groups (or partitions) to corresponding nodes of the cluster of servers 110 ;
- the processes for labeling 322 , allocating 332 and retrieving 352 are often data model oriented and are described using the river drainage tree model and the corresponding watershed computation as a reference.
- since watershed computation is applied to river-segment regions 210 from upstream to downstream, the river segments are grouped into regions 210 and allocated over multiple databases.
- a region contains a binary tree of river segments.
- the regions 210 themselves also form a tree but not necessarily a binary tree.
- the partitioning is also made bottom-up from upstream (child) to downstream (parent) of the river, to be consistent with the geographic dependency of hydrologic computation.
- the river segments tree is partitioned based on the following criterion: counted bottom-up in the river segments tree, every sub-tree of a given height forms a region, where the height is counted from either the leaf nodes or the root nodes of its child regions. In order to capture the geographic dependency between regions, the notion of region level is introduced as the partition level of a region, counted bottom-up from its farthest leaf region; it thus represents the length of its longest descendant path on the region tree. As described with reference to FIGS. 2A and 2B, the levels between a pair of parent/child regions may not be consecutive. The computation independence (e.g., parallelizability) of the regions at the same level is statically assured.
- Labeling 322 aims at grouping the nodes of the river segments tree into regions 210 and then assigning a region_id to each tree node. Labeling 322 is made bottom-up from the leaves. Each region spans k levels in the river-segment tree, where k is referred to as partition_depth; for a region, the levels are counted from either the leaf nodes of the river-segment tree or the root nodes of its child regions. The top-level region may span the remaining levels, which may be fewer than k. Other variables are explained below.
- the depth of a node is its distance from the root; the depth of a binary tree is the depth of its deepest node; the height of a node is defined as the depth of the binary tree rooted by this node.
- the height of a leaf node is 0.
- the node_type of a node is assigned to either RR or RN after its group is determined during the labeling process. This variable also indicates whether a node is already labeled or not.
- CRR abbreviates the Closest RR nodes beneath a node t; each of these RR nodes can be identified by checking the parent_region_id value of the region it roots, which is either the region_id of t or not yet assigned.
- the Closest Descendant Regions beneath a node may be abbreviated as its CDR.
- adj-height(t) returns 0 if the node type of t is RR; otherwise it returns the height of the binary tree beneath t in which all the CRR nodes, and the sub-trees beneath them, are ignored.
- adj-desc(t) returns the list of descendant nodes of t in which all the CRR nodes, and the sub-trees beneath them, are excluded.
- max-cdr-level(t) returns the maximal region_level value of t's CRR (or CDR).
- a labeling algorithm 362 generates region_id for each tree node as its label, or partition key (the UDP key 320 may be generated automatically by executing the labeling algorithm 362 or the UDP key 320 may be generated manually); as well as the information about partitioned regions, including the id, level, parent region for each region.
- the labeling algorithm 362 (configured to be in accordance with a computational model) to generate the UDP key 320 is outlined below:
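The algorithm listing itself is not reproduced in this excerpt. The following Python sketch reconstructs the labeling process from the textual description, using the partition_depth k and the adjusted-height notion defined above; the traversal details and the tree encoding (node_id mapped to a (left_child_id, right_child_id) pair) are assumptions, not the patent's own listing.

```python
# Hedged reconstruction of labeling algorithm 362: post-order traversal
# of the river-segment tree; whenever the adjusted sub-tree beneath a
# node spans k levels, that node becomes an RR (region root) and labels
# its un-labeled descendants with its node_id as region_id.

def label(tree, root, k):
    """tree: node_id -> (left_id, right_id). Returns
    dict node_id -> (node_type, region_id)."""
    info = {}

    def adj_height(t):
        # RR children (closed regions) and absent children are ignored.
        if t is None or info.get(t, ("RN",))[0] == "RR":
            return -1
        l, r = tree[t]
        return 1 + max(adj_height(l), adj_height(r))

    def close_region(rr):
        info[rr] = ("RR", rr)  # region_id takes the root's node_id
        stack = [c for c in tree[rr] if c is not None]
        while stack:
            t = stack.pop()
            if info.get(t, ("RN",))[0] == "RR":
                continue  # child regions keep their own labels
            info[t] = ("RN", rr)
            stack.extend(c for c in tree[t] if c is not None)

    def visit(t):
        if t is None:
            return
        l, r = tree[t]
        visit(l)
        visit(r)
        if adj_height(t) == k - 1:  # adjusted sub-tree spans k levels
            close_region(t)

    visit(root)
    if info.get(root, ("RN",))[0] != "RR":
        close_region(root)  # top region may span fewer than k levels
    return info
```

For a three-segment chain 0 -> 00 -> 000 with k = 2, node 00 closes a two-level region over {00, 000}, and the root 0 forms the remaining one-level top region.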
- FIG. 3B illustrates a region tree with region levels in data partitioning, according to an embodiment.
- the river segments (included in T 310 ) are partitioned into data partitions 330 corresponding to the regions 210 .
- Regions 210 form a tree 370 .
- each region has a region_level, defined as the length of its longest descendant path in the region tree.
- a tree 380 is illustrated having 9 levels (level 0 through level 8).
- a processing load is balanced by evenly distributing the data partitions 330 to each server 112 as allocated partitioned data 340 .
- the allocation process 332 addresses how to map the data partitions 330 (labeled river regions) to multiple databases and corresponding server nodes 112 . As the river regions at the same region level have no geographic dependency they can be processed in parallel. The allocation may proceed in a conservative manner to distribute regions 210 , using the following process:
- Process 1 generate region-hash from region_id
- Process 2 map the region-hash values to the keys of a mapping table that is independent of the cluster configuration; then distribute regions to server-nodes based on that mapping table.
- the separation of logical partition and physical allocation makes the data partitioning independent of the underlying infrastructure.
- Process 3 balance load, e.g., maximally evening the number of regions over the server nodes level by level in the bottom-up order along the region hierarchy.
- Process 4 record the distribution of regions and make it visible to all server nodes.
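Processes 1 through 4 can be sketched as follows, assuming a fixed logical hash space (e.g., 1024 slots) decoupled from the physical node count, and a simple per-level round-robin as the balancing rule; the constants and helper names are illustrative, not taken from the patent.

```python
# Sketch of region allocation: region_id -> region-hash (logical slot)
# -> server node via a mapping table that is rebalanced level by level.
import hashlib

N_LOGICAL = 1024   # logical partition ids, fixed regardless of cluster size
N_NODES = 4        # physical server nodes, may change over time

def region_hash(region_id):
    # Process 1: stable hash of region_id into the logical space
    digest = hashlib.sha1(region_id.encode()).hexdigest()
    return int(digest, 16) % N_LOGICAL

def build_mapping(regions_by_level):
    # Processes 2-3: map logical slots to nodes, evening the number of
    # regions per node level by level, bottom-up (hash collisions among
    # regions are ignored in this toy version).
    mapping = {}
    for level in sorted(regions_by_level):            # bottom-up order
        for i, r in enumerate(sorted(regions_by_level[level])):
            mapping[region_hash(r)] = i % N_NODES     # round-robin balance
    return mapping  # Process 4: record and publish to all server nodes

def node_of(region_id, mapping):
    return mapping[region_hash(region_id)]
```

Because only the mapping table references physical nodes, a cluster resize changes the table but not the region-hash values, which is the separation of logical partition and physical allocation described above.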
- FIG. 3C illustrates parallel access of allocated partitioned data, according to an embodiment.
- allocated partitioned data 340 is generated.
- locating the region of a river segment given in a received query can be very different from searching the usual hash-partitioned or range-partitioned data, because the partition keys are generated through labeling and are not given in the "unlabeled" query inputs.
- the general mechanism is based on “ALL-NODES” parallel search 360 shown in FIG. 3C .
- Another technique creates ‘partition indices’ 380 , e.g., to have region_ids indexed by river segment_ids and to hash partition the indices.
- the full records of river segments are partitioned by region, and in addition, the river segment_ids for indexing regions are partitioned by hash.
- querying a river segment given its id but without its region is a two-step search 370 as shown in FIG. 3C: first, based on the hash value of the river segment id, a single node is identified for indexing its region; second, based on the hash value of the region, the node containing the full record of the river segment is identified for data retrieval.
- because the full record of a river segment may be very large, the storage overhead of preparing 'partition indices' is relatively small.
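The two-step search can be sketched with in-memory dictionaries standing in for the per-node databases; `hash_node` is a toy stand-in for the system's hash partitioner, and all identifiers here are invented for illustration.

```python
# Two-step lookup: segment_id --(hash)--> index shard --> region_id,
# then region_id --(hash)--> data shard holding the full record.
import zlib

N_NODES = 4

def hash_node(key, n_nodes):
    return zlib.crc32(key.encode()) % n_nodes  # stable toy hash partitioner

# Partition index: segment_id -> region_id, hash-partitioned by segment_id.
index_shards = [dict() for _ in range(N_NODES)]
# Full records: partitioned by region (here: by hash of region_id).
data_shards = [dict() for _ in range(N_NODES)]

def insert(segment_id, region_id, record):
    index_shards[hash_node(segment_id, N_NODES)][segment_id] = region_id
    data_shards[hash_node(region_id, N_NODES)][segment_id] = record

def lookup(segment_id):
    # Step 1: one node holds the region index entry for this segment id.
    region_id = index_shards[hash_node(segment_id, N_NODES)][segment_id]
    # Step 2: the region's hash identifies the node with the full record.
    return data_shards[hash_node(region_id, N_NODES)][segment_id]
```

Only one index node and one data node are touched per query, in contrast with the ALL-NODES parallel search that broadcasts the lookup to every node.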
- FIG. 4 illustrates a block diagram for a generalized process 400 for parallel processing based on a UDP, according to an embodiment.
- a learning process 420 may be used to analyze original data 410 and formulate a model for a particular application, e.g., river drainage network model 100 .
- a model instance 430 of the model is used to determine computation functions and data partitioning.
- the computation functions are implemented as database user defined functions 440 (UDFs).
- UDFs are scheduled to run at the server nodes where the applied data partitions reside.
- the allocation of UDFs is performed to evenly distribute the processing load on the multiple server nodes while exploiting the parallel processing opportunities for the UDF applications without static and dynamic dependencies. The end result of the data partitions 330 being allocated and evenly distributed over the server nodes is the allocated partitioned data 340.
- the purpose of partitioning data is to have computation functions applied to data partitions in parallel whenever possible; two factors are taken into account for this: the scope of data grouping should match the domain of the computation function, and the order dependency of function applications should be enforced.
- a flat data-parallel processing falls in one of the following typical cases:
- a computation job is parallelized based on a data dependency graph such as the graph 220 , where the above flat-data parallel execution plans are combined in processing data partitions in sequential, parallel or branching.
- the focus is on embarrassingly parallel computing without in-task communication but with retrieval of previous computation results through database accessing.
- the conventional data partitioning methods expect to group data objects based on existing partition key values, which may not be feasible if no key values suitable for the application preexist.
- the UDP is characterized by partitioning data based on high-level concepts relating to the computation model, which are extracted from the original data and serve as the generated partition keys.
- partition of data is based on the concept region whose values are not pre-associated with the original river segment data, but generated in the labeling process.
- UDP aims at partitioning data objects into regions and distribution of data belonging to different regions over a number K of server nodes.
- a region is a geographic area in the river drainage network.
- the notion of region is domain specific; but in general a region means a multidimensional space.
- Labeling is a mapping, possibly with probabilistic measures.
- a labeling mapping potentially yields a confidence ranging from 0 to 1.
- the labeling algorithm is used to find the appropriate or best-fit mappings X → Yi for each i.
- Allocating is a mapping from the above label space to an integer; e.g., map a label vector with probabilistic measures to a number that represents a server node. This mapping may be made in two steps.
- a label vector is mapped to a logical partition id called region-hash (e.g., 1-1024), independent of the actual number (e.g., 1-128) of server nodes.
- region-hash is mapped to a physical partition id such as a server node number by a hash-map.
- the method for generating label-hash can be domain specific.
- a mapping from a multidimensional vector to a unique single value can be done using space-filling curves that turn a multidimensional vector into an integer, and then such an integer can be hash-mapped to a label hash value.
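As one concrete instance of this idea, a two-dimensional label vector can be interleaved into a single integer along a Morton (Z-order) space-filling curve and then reduced into the logical hash space. The curve choice, bit width, and slot count below are assumptions for illustration, not values specified by the text.

```python
# Morton (Z-order) interleaving: bit i of x goes to bit 2i, bit i of y
# goes to bit 2i+1, yielding one integer that preserves spatial locality.

def morton2(x, y, bits=16):
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def label_hash(x, y, n_slots=1024):
    # Reduce the curve position into the logical region-hash space.
    return morton2(x, y) % n_slots
```

Nearby points on the curve tend to share hash slots' neighborhoods, which is why space-filling curves are a common choice for linearizing multidimensional labels.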
- methods taking into account the confidence of labels can also be domain specific, e.g., in computed tomography interpretation.
- FIG. 5A illustrates a flow chart of a method 500 for generating and subsequent use of a UDP, according to an embodiment.
- the conventional hash, range and list partitioning methods rely on existing partition key values to group data. For many applications, data is often grouped based on the criteria presented at an aggregate or summarized level, and there are no partition keys that preexist in the original data for such grouping.
- a UDP, which is characterized by partitioning data based on certain higher-level concepts reflecting the application semantics, addresses this issue. In parallel processing applications using a UDP, partition key values may not be present in the original data; instead they are generated or learnt by a labeling process.
- the method 500 is used for generating and using a UDP as described with reference to FIGS. 1A, 1B, 2A, 2B, 3A, 3B, 3C, and 4.
- a user defined data partitioning (UDP) key is labeled to configure data partitions of original data, the UDP being labeled to include at least one key property excluded from the original data.
- the labeling may be performed by learning from the original data to generate the UDP key.
- the UDP key is generated in accordance with a computation model that is aware of the data partitions.
- the data partitions are distributed or allocated to co-locate the data partitions and corresponding computational servers.
- a data record of the data partitions is retrieved by performing an all-node parallel search of the computational servers using the UDP key.
- FIG. 5B illustrates a flow chart of a method 540 for allocating data partitions, according to an embodiment.
- the method 540 is used for allocating data partitions generated by using a UDP as described with reference to FIGS. 1A, 1B, 2A, 2B, 3A, 3B, 3C, and 4.
- a region-hash is generated from a region_ID corresponding to one of multiple regions, the region_ID being generated as a user defined data partitioning (UDP) key to configure data partitions of original data, the UDP being generated to include at least one key property excluded from the original data.
- values of the region-hash are mapped to keys of a mapping table that is independent of cluster configuration.
- the regions are allocated to server-nodes of the cluster configuration in accordance with the mapping table.
- a load of each server-node is balanced by evenly distributing the regions over the server-nodes.
- a distribution of the regions is recorded to make the distribution visible to each one of the server nodes.
- the river segments data are divided into partitions based on the watershed computation model and allocated to multiple servers for parallel processing;
- the data processing on one region retrieves and updates its local data, where accessing a small amount of neighborhood information from upstream regions may be required;
- FIG. 6 illustrates a system architecture 600 based on a convergent cluster for implementing UDP based parallel processing, according to an embodiment.
- the cluster platforms of parallel data management and parallel computing may be converged, for shared resource utilization, for reduced data movement between database and applications, and for mutually optimized performance.
- implementation options may include a selection between using a parallel database or multiple individual databases, with the latter being selected for the watershed application.
- a single cluster of server machines for both parallel data management and parallel computing may be selected for implementation.
- the clustered server nodes 110 may execute individual shared-nothing relational DBMSs 610; data are partitioned to multiple databases based on their domain-specific properties, allowing the data access throughput to increase linearly with the number of server nodes.
- the server nodes 110 form one or more cliques in data accessing, allowing a data partition to be visible to multiple nodes, and a node to access multiple data partitions. This arrangement is desired for simplifying inter-node messaging and for tolerating faults (as described above, the computation on a region may need to retrieve the updated information of the root nodes of its child regions).
- the computation functions may be implemented as database user defined functions (UDFs) for co-locating data intensive computation and data management.
- VSL Virtual Software Layer
- VDM Virtual Data Management
- VTM Virtual Task Management
- the VSL 620 resides at each server node, and all server nodes are treated equally: every server node holds partitions of data as well as the meta-data describing data partitioning, and has VDM capability as well as VTM 630 capability. The locations of data partitions and function executions are consistent but transparent to applications.
- the parallel computation opportunities exist statically in processing the geographically independent regions, whether at the same level or not, and dynamically in processing the regions whose child regions have all been processed. These two kinds of opportunities are interpreted and realized by the system layer.
- the computation functions e.g., UDFs are made available on all the server nodes.
- the participating server nodes also know the partition of regions and their locations, the connectivity of regions, particular computation models, UDF settings and default values.
- each VTM is provided with a UDF invoker 640 and an ODBC connector.
- a computation job can be task-partitioned among multiple server nodes to be executed in parallel.
- Task scheduling is data-driven, based on the locality and geo-dependency of the statically partitioned data.
- UDFs are scheduled to run at the server nodes where the applied data partitions reside.
- Local execution results are stored in databases, and communicated through database access. The computation results from multiple server nodes may be assembled if necessary.
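The data-driven scheduling described above, in which UDFs run where their data partitions reside, can be sketched as follows. The routing function and all identifiers (UDF names, partition and node names) are hypothetical, not taken from the patent.

```python
# Sketch: route each UDF invocation to the node holding the partition it reads.
def schedule(tasks, partition_location):
    """tasks: list of (udf_name, partition); returns node -> list of tasks."""
    plan = {}
    for udf, partition in tasks:
        node = partition_location[partition]      # run where the data resides
        plan.setdefault(node, []).append((udf, partition))
    return plan

plan = schedule([("compute_flow", "regionA"), ("compute_flow", "regionB")],
                {"regionA": "node1", "regionB": "node2"})
```

Because tasks are grouped per node, no partition data moves across the network; only task requests and parameters do.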
- task scheduling is based on the master-slave architecture.
- Each server node can act as either master or slave, and can host both roles.
- the VTM-master is responsible for scheduling tasks based on the location of data partitions, their processing dependencies, and the execution status. It determines the parallel processing opportunities for the UDF applications that are free of static and dynamic dependencies, sends task requests together with parameters to the VTM-slaves where the data to be computed on reside, monitors execution status, re-executes tasks upon failure, etc. Currently, the assembling of local results is handled directly by the VTM-master module.
- Upon receipt of task execution requests and parameters from the VTM-master, the VTM-slaves execute their tasks through their UDF invokers.
- the VTM master and slaves play roles analogous to MPI masters and slaves.
- data passed from master to slave may include static inputs associated with a new region; processes on different regions pass information through database access.
- Embodiments disclosed herein provide a User Defined Data Partitioning (UDP) technique that correlates data partitioning and application semantics.
- UDP User Defined Data Partitioning
- the conventional data partitioning methods do not take the application-level semantics into account, and thus may not partition data properly to fit the computation model.
- These partitioning methods are primarily used to support flat parallel computing and are based on existing partition key values; however, the criterion for partitioning data may relate to a concept present at the application level rather than in the original data, in which case no appropriate partition keys are identifiable.
- In UDP, partition key values are not expected to pre-exist; they are generated or learned in a labeling process based on a higher-level concept extracted from the original data, a concept that relates to the computation model, and especially to the "complex" parallel computing scheme based on data dependency graphs.
- the UDP technique supports computation-model-aware data partitioning and supports correlating data analysis and machine learning with parallel data management.
- As applied to a hydro-informatics system for supporting periodical, near-real-time, data-intensive hydrologic computation on a database cluster, experimental results reveal its performance and efficiency in tightly coupling data partitioning with ‘complex’ parallel computing in the presence of data processing dependencies.
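The labeling process described above, in which partition keys are generated from a higher-level concept rather than assumed to pre-exist in the data, can be illustrated with a minimal sketch. The rows, the threshold-based concept, and all names are illustrative assumptions, not the patent's implementation.

```python
# Sketch: raw rows carry no partition key; a labeling step applies an
# application-level concept to generate one (here, a hypothetical region
# assignment derived from a level threshold).
def label_rows(rows, concept):
    """Attach a generated partition key (region_id) to each raw row."""
    return [dict(row, region_id=concept(row)) for row in rows]

rows = [{"node_id": 1, "level": 0}, {"node_id": 2, "level": 3}]
labeled = label_rows(rows, concept=lambda r: "upstream" if r["level"] >= 2
                                             else "downstream")
```

The generated `region_id` then serves as the partition key for placing rows on server nodes, even though no such key existed in the original data.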
- FIG. 7 illustrates a block diagram of a computer system 700 , according to an embodiment.
- the computer system 700 includes a processor 710 coupled to a memory 720 .
- the memory 720 is operable to store program instructions 730 that are executable by the processor 710 to perform one or more functions.
- the term “computer system” is intended to encompass any device having a processor that is capable of executing program instructions from a computer-readable medium such as memory devices and storage devices.
- the various functions, processes, methods 500 and 540 , and operations described herein may be implemented using the computer system 700 .
- the river drainage network model 100 and components thereof, e.g., the cluster of servers 110 may be implemented as program instructions 730 using one or more of the computer system 700 .
- the various functions, processes, methods, and operations performed or executed by the system 700 can be implemented as the program instructions 730 (also referred to as software or simply programs) on computer readable medium that are executable by the processor 710 and various types of computer processors, controllers, microcontrollers, central processing units, microprocessors, digital signal processors, state machines, programmable logic arrays, and the like.
- the computer system 700 may be networked (using wired or wireless networks) with other computer systems.
- the program instructions 730 may be implemented in various ways, including procedure-based techniques, component-based techniques, object-oriented techniques, rule-based techniques, among others.
- the program instructions 730 can be stored on the memory 720 or any computer-readable medium for use by or in connection with any computer-related system or method.
- a computer-readable medium is an electronic, magnetic, optical, or other physical device or means that can contain or store computer program logic instructions for use by or in connection with a computer-related system, method, process, or procedure.
- Programs can be embodied in a computer-readable medium for use by or in connection with an instruction execution system, device, component, element, or apparatus, such as a system based on a computer or processor, or other system that can fetch instructions from an instruction memory or storage of any appropriate type.
- a computer-readable medium can be any structure, device, component, product, or other means that can store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
Algorithm PostorderTreeNodeLabeling(bt, k)
Input:  (1) BinaryTree bt
        (2) int k as partition depth
Output: (1) region_id of each node (label)
        (2) id, level, parent of each region
Procedure
    if bt = ø then
        return
    if bt.node_type ≠ UNDEF then
        return
    if bt.left_child ≠ ø && bt.left_child.adj-height() ≥ k then
        PostorderTreeNodeLabeling(bt.left_child, k)
    if bt.right_child ≠ ø && bt.right_child.adj-height() ≥ k then
        PostorderTreeNodeLabeling(bt.right_child, k)
    if bt.is_root() || bt.adj-height() = k then
        Region p = new Region(bt.node_id)
        bt.region_id = p.get-id()        // optionally as bt.node_id
        bt.region_level = bt.max-cdr-level() + 1
        bt.node_type = RR
        List cdr = bt.cdr()
        for each ncdr in cdr do
            ncdr.parent_region_id = bt.region_id
        List members = bt.adj-desc()
        for each nm in members do
            nm.region_id = bt.region_id
            nm.node_type = RN
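A simplified, runnable Python version of the labeling algorithm above follows. It is not the patent's implementation: the patent's adj-height(), cdr(), adj-desc() and max-cdr-level() helpers are approximated by plain subtree height and by claiming unlabeled descendants, and the node types (RR/RN) and region levels are omitted.

```python
# Simplified postorder region labeling: subtrees of height >= k are labeled
# first; the tree root and nodes at height exactly k become region roots.
class Node:
    def __init__(self, node_id, left=None, right=None):
        self.node_id, self.left, self.right = node_id, left, right
        self.region_id = None

    def height(self):
        kids = [c for c in (self.left, self.right) if c]
        return 1 + max((c.height() for c in kids), default=0)

def label(bt, k, is_root=True):
    """Postorder labeling: large subtrees first, then the current node
    becomes a region root if it is the tree root or sits at height k."""
    if bt is None:
        return
    for child in (bt.left, bt.right):
        if child and child.height() >= k:
            label(child, k, is_root=False)
    if is_root or bt.height() == k:
        _claim(bt, bt.node_id)

def _claim(node, region_id):
    # Assign region_id to the node and all descendants not yet claimed by a
    # child region (recursion stops at already-labeled region roots).
    if node is None or node.region_id is not None:
        return
    node.region_id = region_id
    _claim(node.left, region_id)
    _claim(node.right, region_id)
```

For a perfect tree of height 3 with k = 2, the two height-2 subtrees become regions rooted at their own nodes, and the tree root forms a region of its own, mirroring the region decomposition the listing describes.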
Claims (12)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/358,995 US8904381B2 (en) | 2009-01-23 | 2009-01-23 | User defined data partitioning (UDP)—grouping of data based on computation model |
Publications (2)
Publication Number | Publication Date |
---|---|
US20100192148A1 US20100192148A1 (en) | 2010-07-29 |
US8904381B2 true US8904381B2 (en) | 2014-12-02 |
Family
ID=42355225
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/358,995 Active 2033-05-21 US8904381B2 (en) | 2009-01-23 | 2009-01-23 | User defined data partitioning (UDP)—grouping of data based on computation model |
Country Status (1)
Country | Link |
---|---|
US (1) | US8904381B2 (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8380643B2 (en) * | 2009-09-30 | 2013-02-19 | International Business Machines Corporation | Searching multi-dimensional data using a parallelization framework comprising data partitioning and short-cutting via early out |
US9047674B2 (en) * | 2009-11-03 | 2015-06-02 | Samsung Electronics Co., Ltd. | Structured grids and graph traversal for image processing |
CN102207891B (en) * | 2011-06-10 | 2013-03-06 | 浙江大学 | Method for achieving dynamic partitioning and load balancing of data-partitioning distributed environment |
US8892502B2 (en) * | 2011-12-07 | 2014-11-18 | Sap Se | Parallel processing of semantically grouped data in data warehouse environments |
US8694575B2 (en) * | 2012-06-11 | 2014-04-08 | The Johns Hopkins University | Data-intensive computer architecture |
US9158548B2 (en) | 2012-11-13 | 2015-10-13 | The Johns Hopkins University | System and method for program and resource allocation within a data-intensive computer |
CN105335411A (en) | 2014-07-31 | 2016-02-17 | 国际商业机器公司 | Method and system for data processing |
US9684689B2 (en) * | 2015-02-03 | 2017-06-20 | Ca, Inc. | Distributed parallel processing system having jobs processed by nodes based on authentication using unique identification of data |
WO2016139770A1 (en) * | 2015-03-04 | 2016-09-09 | オリンパス株式会社 | Image processing device |
US10261943B2 (en) | 2015-05-01 | 2019-04-16 | Microsoft Technology Licensing, Llc | Securely moving data across boundaries |
JP6530811B2 (en) | 2015-05-14 | 2019-06-12 | オリンパス株式会社 | Image processing device |
WO2017143405A1 (en) * | 2016-02-26 | 2017-08-31 | Cryspintel Pty Ltd | A data source system agnostic fact category partitioned information repository and methods for the insertion and retrieval of data using the information repository |
CN107451154B (en) * | 2016-05-31 | 2021-03-30 | 华为技术有限公司 | Data table processing method, device and system |
US20180205790A1 (en) * | 2017-01-13 | 2018-07-19 | Hewlett Packard Enterprise Development Lp | Distributed data structure in a software defined networking environment |
US11036471B2 (en) * | 2018-06-06 | 2021-06-15 | Sap Se | Data grouping for efficient parallel processing |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5878409A (en) * | 1995-06-01 | 1999-03-02 | International Business Machines Corporation | Method and apparatus for implementing partial declustering in a parallel database system |
US6341289B1 (en) * | 1999-05-06 | 2002-01-22 | International Business Machines Corporation | Object identity and partitioning for user defined extents |
US20040199526A1 (en) * | 2003-03-18 | 2004-10-07 | Norifumi Nishikawa | Information processing system and system setting method |
US20050050085A1 (en) * | 2003-08-25 | 2005-03-03 | Akinobu Shimada | Apparatus and method for partitioning and managing subsystem logics |
US20050268298A1 (en) * | 2004-05-11 | 2005-12-01 | International Business Machines Corporation | System, method and program to migrate a virtual machine |
US20070067261A1 (en) * | 2005-09-20 | 2007-03-22 | Louis Burger | System and a method for identifying a selection of index candidates for a database |
US20080092112A1 (en) * | 2006-10-11 | 2008-04-17 | International Business Machines Corporation | Method and Apparatus for Generating Code for an Extract, Transform, and Load (ETL) Data Flow |
US7406522B2 (en) * | 2001-09-26 | 2008-07-29 | Packeteer, Inc. | Dynamic partitioning of network resources |
US20080189239A1 (en) * | 2007-02-02 | 2008-08-07 | Aster Data Systems, Inc. | System and Method for Join-Partitioning For Local Computability of Query Over Shared-Nothing Clusters |
US20080263312A1 (en) * | 2004-11-04 | 2008-10-23 | International Business Machines Corporation | Parallel installation of logical partitions |
US7577637B2 (en) * | 2005-08-15 | 2009-08-18 | Oracle International Corporation | Communication optimization for parallel execution of user-defined table functions |
US7788646B2 (en) * | 2005-10-14 | 2010-08-31 | International Business Machines Corporation | Method for optimizing integrated circuit device design and service |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150347473A1 (en) * | 2014-05-29 | 2015-12-03 | International Business Machines Corporation | Database partition |
US10229377B2 (en) * | 2014-05-29 | 2019-03-12 | International Business Machines Corporation | Database partition |
US10282691B2 (en) * | 2014-05-29 | 2019-05-07 | International Business Machines Corporation | Database partition |
CN111241143A (en) * | 2020-01-09 | 2020-06-05 | 湖南华博信息技术有限公司 | Distributed calculation method and system for water supply amount and water fee |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CHEN, QIMING;HSU, MEICHUN;REEL/FRAME:022159/0695 Effective date: 20090122 |
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
AS | Assignment |
Owner name: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP, TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.;REEL/FRAME:037079/0001 Effective date: 20151027 |
AS | Assignment |
Owner name: ENTIT SOFTWARE LLC, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP;REEL/FRAME:042746/0130 Effective date: 20170405 |
AS | Assignment |
Owner name: JPMORGAN CHASE BANK, N.A., DELAWARE Free format text: SECURITY INTEREST;ASSIGNORS:ENTIT SOFTWARE LLC;ARCSIGHT, LLC;REEL/FRAME:044183/0577 Effective date: 20170901 Owner name: JPMORGAN CHASE BANK, N.A., DELAWARE Free format text: SECURITY INTEREST;ASSIGNORS:ATTACHMATE CORPORATION;BORLAND SOFTWARE CORPORATION;NETIQ CORPORATION;AND OTHERS;REEL/FRAME:044183/0718 Effective date: 20170901 |
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551) Year of fee payment: 4 |
AS | Assignment |
Owner name: MICRO FOCUS LLC, CALIFORNIA Free format text: CHANGE OF NAME;ASSIGNOR:ENTIT SOFTWARE LLC;REEL/FRAME:050004/0001 Effective date: 20190523 |
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |
AS | Assignment |
Owner name: MICRO FOCUS LLC (F/K/A ENTIT SOFTWARE LLC), CALIFORNIA Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0577;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:063560/0001 Effective date: 20230131 Owner name: NETIQ CORPORATION, WASHINGTON Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:062746/0399 Effective date: 20230131 Owner name: MICRO FOCUS SOFTWARE INC. (F/K/A NOVELL, INC.), WASHINGTON Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:062746/0399 Effective date: 20230131 Owner name: ATTACHMATE CORPORATION, WASHINGTON Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:062746/0399 Effective date: 20230131 Owner name: SERENA SOFTWARE, INC, CALIFORNIA Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:062746/0399 Effective date: 20230131 Owner name: MICRO FOCUS (US), INC., MARYLAND Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:062746/0399 Effective date: 20230131 Owner name: BORLAND SOFTWARE CORPORATION, MARYLAND Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:062746/0399 Effective date: 20230131 Owner name: MICRO FOCUS LLC (F/K/A ENTIT SOFTWARE LLC), CALIFORNIA Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:062746/0399 Effective date: 20230131 |