WO2016138566A1

WO2016138566A1 - A system and method for federated enterprise analysis

Info

Publication number: WO2016138566A1
Application number: PCT/AU2016/050147
Authority: WO
Inventors: Nino SVONJA
Original assignee: Lumanetix Pty Ltd
Priority date: 2015-03-04
Filing date: 2016-03-04
Publication date: 2016-09-09

Abstract

A system for the federated enterprise analysis of data within a data source comprising: a virtual graph overlying said data source and providing a unified view of relational schemas in the data source; and at least one graph engine, each graph engine storing a respective section of the virtual graph that is mapped to a respective relational schema within said data source; whereby enterprise analysis can be performed on a single graph engine within the at least one graph engine, or on any subset of graph engines within the at least one graph engine linked according to the virtual graph.

Description

A SYSTEM AND METHOD FOR FEDERATED ENTERPRISE ANALYSIS

FIELD OF INVENTION

The present invention relates to a system and method for federated enterprise analysis. In particular, the invention relates to systems and methods in which a virtual graph is created over the relational schema of at least one underlying physical data source, for example one or more databases, or structured or unstructured data, which may exist in the enterprise.

BACKGROUND ART

Existing systems for the analysis of data generally create one or more external data warehouse silos, which can be extremely large. Such systems pull, cleanse and transform the data out of enterprise data sources into the silos and store it in a predefined way or with predefined views. This facilitates the running of specific business intelligence operations. Generally, existing systems transform the data and store it in some predefined de-normalised format as either large tables or dimensional models. As such, these systems require de-normalisation, which may impose an extensive big data and data update cascade burden. Existing systems that involve the creation of new customised big data warehouses generally give users drop-downs and checkboxes to filter the data that is to be visualised.

The system of embodiments of the invention advantageously creates a plurality of small local "warehouses" mainly consisting of indexes that sit close to or over existing enterprise data sources. Search and business intelligence operations can then advantageously be performed at any level, either within or on a single graph search node, or on any subset of connected graph search nodes. This architecture is the inverse of existing systems and advantageously pushes the computational requirements down closer to the data source. Advantageously, embodiments of the invention therefore provide a consistent, standardised, virtualised platform for Business Intelligence (BI) and make it easier and less expensive to adopt the platform from the ground up. The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practised. Further anticipated advantages of embodiments of the invention will be elucidated and described in more detail below.

SUMMARY OF INVENTION

The present invention relates to a system and method for federated enterprise analysis. In particular, the invention relates to systems and methods in which a virtual graph is created over the relational schema of at least one underlying physical data source, for example one or more databases, or structured or unstructured data, which may exist in the enterprise.

One aspect of the present invention provides a system for the federated enterprise analysis of data within a data source comprising:

a virtual graph overlying the data source and providing a unified view of relational schemas in the data source; and at least one graph engine, each graph engine storing a respective section of the virtual graph that is mapped to a respective relational schema within the data source;

whereby enterprise analysis can be performed on a single graph engine within the at least one graph engine, or on any subset of graph engines within the at least one graph engine linked according to the virtual graph.

As used herein the term "relational schema" is intended to include within its scope the actual fixed relational schema of the underlying data source.

As used herein, the term "virtual graph" is intended to include within its scope the logical or virtual graph schema that is created on top of the underlying relational schema, Excel spreadsheet or other unstructured data. By default, it is expected that the relational schema will be imported in a one-to-one relationship into the virtual graph. However, it is also envisaged that users will be able to map and restructure the virtual graph as they wish such that multiple different relational schemas could map to a single section of the virtual graph. For example, different customer tables from different databases could all map to a single entity (i.e. section of the virtual graph) called "Customers". The virtual graph effectively drives all structured searches, and may often be very similar to the database schema. Put another way, it is anticipated that schemas will converge into virtual schemas which will converge into a standard set of schemas for different business units that will eventually become the same for all companies, making it easier to create cross-compatible applications. Neighbourhoods or trees within the virtual graph can be produced through a collection of paths that follow virtual graph relationships to other entities and that stem from a single entity. Each entity will generally have a neighbourhood or tree that stems from it. Using the neighbourhoods or trees, entities may be found through other relevant connected data.

As used herein, the term "data source" is intended to include within its scope one or more sources of data. For example, this may include a single source of data, or may include multiple sources of data from a local data centre, a remote data centre or a wide area network (WAN).
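By way of illustration only, the mapping of several relational tables to a single virtual graph entity, and the derivation of a neighbourhood as a collection of paths stemming from one entity, might be sketched as follows. The table names, the entity names and the functions `neighbourhood` and `tables_for` are hypothetical and are not taken from the specification; the depth limit is likewise an assumption.

```python
from collections import defaultdict

# Hypothetical mapping: (database, table) -> virtual graph entity.
# Several physical tables may map to one virtual entity.
TABLE_TO_ENTITY = {
    ("crm_db", "customers"): "Customers",
    ("billing_db", "clients"): "Customers",
    ("sales_db", "orders"): "Orders",
    ("sales_db", "products"): "Products",
}

# Hypothetical virtual graph relationships (undirected edges between entities).
RELATIONSHIPS = [("Customers", "Orders"), ("Orders", "Products")]


def build_adjacency(edges):
    """Build an entity -> set of neighbouring entities lookup."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def neighbourhood(root, edges, max_depth=2):
    """Return all simple paths stemming from root, up to max_depth hops."""
    adjacency = build_adjacency(edges)
    paths = []
    stack = [[root]]
    while stack:
        path = stack.pop()
        if len(path) > 1:
            paths.append(path)
        if len(path) - 1 < max_depth:
            for nxt in sorted(adjacency[path[-1]], reverse=True):
                if nxt not in path:  # keep paths simple (no revisits)
                    stack.append(path + [nxt])
    return paths


def tables_for(entity):
    """List the physical tables mapped to a virtual entity."""
    return sorted(t for t, e in TABLE_TO_ENTITY.items() if e == entity)


customer_paths = neighbourhood("Customers", RELATIONSHIPS)
customer_tables = tables_for("Customers")
```

In this sketch the "Customers" entity is backed by two tables from different databases, and its neighbourhood contains the paths to "Orders" and, through "Orders", to "Products".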

In certain embodiments, each of said graph engines comprises:

a processor;

a search and query engine;

a virtual graph/schema engine;

a data index; and

a relationship index.

In certain embodiments one or more of the graph engines is a bridging graph engine that is adapted to communicate with and link two or more of the graph engines, thereby linking sections of virtual graph stored within the linked graph engines. The locations of the linked graph engines are not particularly limited. For example, they may be located in a local data centre, a remote data centre or a wide area network (WAN). The bridging graph engines may comprise:

a processor;

a search and query engine;

a virtual graph/schema engine;

a data index bridging said two or more graph engines linked by said parent search node; and

a relationship index bridging said two or more graph engines linked by said parent search node.

The data source may comprise any number of sources of data. For example, the data source may comprise a plurality of data sources. The data source, or each source of data, may comprise structured and/or unstructured data. For example, the data source may comprise database repositories including structured data, or file repositories containing unstructured data. For example, unstructured data may include one or more files.

In certain embodiments, one or more of said graph engines is an external graph engine embedded or deployed externally as part of an external application. In this case, the system comprises a graph engine adapter that facilitates association of the external graph engine with the virtual graph. The external application may comprise, for example, an external vendor application or a cloud based application.

The system may display to a user in any suitable manner. In one embodiment, the system includes a graph search gateway component adapted to facilitate bubble graph search visualisation on a user interface. In the bubble graph search visualisation, sections of the virtual graph may be represented as interconnected bubbles and relationships between the sections may be represented by connections between the bubbles. Advantageously, the graph search gateway component is adapted to facilitate grouping of interconnected bubbles, representing sections of the virtual graph, within the bubble graph search visualisation. It should be appreciated that reference to a bubble graph search visualisation does not limit the visualisation to any particular "bubble" shape. While circular bubbles are illustrated in the following detailed description of the invention, the bubbles may be any other shape without limitation.

In certain embodiments, bubbles within the bubble graph search visualisation include indicia that indicate the amount of data within the bubbles. For example, the indicia may be selected from colour, size or badging of the bubbles. Preferably, the system includes zoom functionality that facilitates zooming into and out of user selected regions of the bubble graph visualisation. Zooming into the bubble graph visualisation will generally increase the level of information presented in the user selected regions of said bubble graph visualisation.

In certain embodiments of the system, data search and analysis may also be facilitated through a list view type search. Again, any such list view type search is built from the structure of the virtual graph. The search results within the list view type search may comprise data rows grouped into neighbourhoods that pivot on root entities sourced from data mapped through the virtual graph. Generally, the list view type search also comprises a search navigation panel that summarises possible relationships within the virtual graph that can yield results for a given input query.

Generally, the system is provided with a user interface, the user interface comprising a search box for entry of a search query by a user. As such, a preferred embodiment provides an autocomplete helper adapted to identify entry of a search query by a user and to provide structured suggestions based on the entry of the search query and the underlying data in the underlying data repositories. In certain embodiments, the autocomplete helper provides multiple dimensions of structured suggestions based on the virtual graph, data and metadata in the data source, facilitating disambiguation of where, what and how search results are fetched.
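As a minimal sketch only, structured suggestions along several dimensions (entity names, attribute names and data values) might be produced from a typed prefix as follows. The function `suggest`, the sample schema and the sample rows are hypothetical, and simple prefix matching is an assumption standing in for whatever matching an embodiment actually uses.

```python
# Hypothetical virtual graph metadata: entity -> attribute names.
SCHEMA = {"customers": ["name", "city"], "leads": ["name"]}

# Hypothetical underlying data: entity -> rows.
ROWS = {
    "customers": [{"name": "David", "city": "Darwin"}],
    "leads": [{"name": "David"}],
}


def suggest(prefix, schema, rows, limit=5):
    """Return (dimension, suggestion) pairs matching the typed prefix."""
    p = prefix.lower()
    suggestions = []
    # Dimensions 1 and 2: entity and attribute names from the virtual graph.
    for entity, attributes in schema.items():
        if entity.lower().startswith(p):
            suggestions.append(("entity", entity))
        for attribute in attributes:
            if attribute.lower().startswith(p):
                suggestions.append(("attribute", f"{entity}.{attribute}"))
    # Dimension 3: data values, annotated with where they were found.
    for entity, entity_rows in rows.items():
        for row in entity_rows:
            for attribute, value in row.items():
                if str(value).lower().startswith(p):
                    suggestions.append(("value", f"{value} (in {entity}.{attribute})"))
    return suggestions[:limit]


value_suggestions = suggest("da", SCHEMA, ROWS)
entity_suggestions = suggest("cust", SCHEMA, ROWS)
```

Annotating each value with its entity and attribute is one way of letting the user disambiguate, for example, David the customer from David the lead.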

In certain embodiments, a user interface is provided, the user interface comprising an app store tray. In this embodiment, the system may additionally be adapted to identify a user query and list the most popular and/or most relevant apps in the app store tray. The user may then be able to launch desired apps over the dataset and run the apps on the dataset, for example running analytics on the dataset. That is, apps installed by a user can be used to bind and traverse relational and tree data across any subset of the virtual graph. When apps have been selected and launched, analysis results will need to be displayed. The system therefore preferably comprises a user dashboard that displays results of analysis conducted by respective apps selected by a user of the system. In that regard, the user dashboard may be self-created by users of the system. The user dashboard generally consolidates analytic report output from said apps.

In certain embodiments, security of the system is decentralised whereby designated administrators of each graph engine can set permissions relating to a user's access to associated entities and attributes.
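Purely as an illustrative sketch, decentralised permissions might be held per graph engine, with only that engine's designated administrators able to grant a user access to entities and attributes. The class `EnginePermissions` and its methods are hypothetical names introduced here, not part of the specification.

```python
class EnginePermissions:
    """Hypothetical per-engine access control list."""

    def __init__(self, engine_name, administrators):
        self.engine_name = engine_name
        self.administrators = set(administrators)
        self._grants = {}  # (user, entity) -> set of permitted attribute names

    def grant(self, admin, user, entity, attributes):
        """Allow `user` to see `attributes` of `entity`; admins of this engine only."""
        if admin not in self.administrators:
            raise PermissionError(f"{admin} does not administer {self.engine_name}")
        self._grants.setdefault((user, entity), set()).update(attributes)

    def filter_row(self, user, entity, row):
        """Return only the attributes of `row` that `user` may access."""
        allowed = self._grants.get((user, entity), set())
        return {k: v for k, v in row.items() if k in allowed}


hr = EnginePermissions("hr_engine", administrators=["alice"])
hr.grant("alice", "bob", "employees", ["name", "title"])
visible = hr.filter_row("bob", "employees", {"name": "Leonie", "title": "Manager", "salary": 1})
```

Because each engine holds its own grants, no central authority is needed to decide what a user may see in a given section of the virtual graph.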

According to another aspect of the invention there is provided a method for the federated enterprise analysis of data within a data source comprising:

overlaying the data source with a virtual graph so as to provide a unified view of relational schemas in the data source;

mapping at least one graph engine storing a section of the virtual graph to a respective relational schema within the data source; and

performing enterprise analysis on a single graph engine within the at least one graph engine, or on any subset of graph engines within the at least one graph engine linked according to the virtual graph.

In certain embodiments, one or more of the graph engines is a bridging graph engine that is adapted to communicate with and link two or more of the graph engines, the method comprising linking graph engines through the bridging graph engine, thereby linking sections of virtual graph stored within the linked graph engines. In certain embodiments, one or more of the graph engines is an external graph engine embedded or deployed externally as part of an external application and the method comprises associating the external graph engine with the virtual graph through a graph engine adapter. Although not limiting on the invention, the method may comprise grouping data rows into neighbourhoods that pivot on root entities sourced from data mapped through the virtual graph and visualising structured and relational data in one or more of a list, summary or graph view.

The step of performing enterprise analysis is not particularly limited. However, it is envisaged that this may comprise running one or more apps on each of the graph engines or subset of graph engines. In that regard the method may further comprise collating the output from the one or more apps and displaying the collated output on a dashboard. It is considered that the system of the invention provides for architecture benefits and lends itself to distributed searching, distributed processing, distributed and decentralised security management, enhanced data collaboration for purposes of analysis and traversal and creating dynamic views over the data that may span the virtual graph.

The present invention consists of features and a combination of parts hereinafter fully described and illustrated in the accompanying drawings, it being understood that various changes in the details may be made without departing from the scope of the invention or sacrificing any of the advantages of the present invention.

BRIEF DESCRIPTION OF ACCOMPANYING DRAWINGS

To further clarify various aspects of some embodiments of the present invention, a more particular description of the invention will be rendered by references to specific embodiments thereof, which are illustrated in the appended drawings. It should be appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting on its scope. The invention will be described and explained with additional specificity and detail through the accompanying drawings in which:

FIG. 1 illustrates a single graph engine of a system according to an embodiment of the invention.

FIG. 2 illustrates another alternative for a single graph engine of a system according to an embodiment of the invention.

FIG. 3 illustrates an alternative arrangement for the graph engine showing how an external source may be embedded within the engine stack.

FIG. 4 illustrates a system of an embodiment of the invention.

FIGS. 5-13 illustrate a user interface displaying a bubble graph search visualisation.

FIG. 14 illustrates methodology for selecting data bindings to structured selections according to an embodiment of the invention.

FIGS. 15-16 illustrate a user interface displaying a navigation list.

FIG. 17 illustrates a flow chart of searching methodology according to an embodiment of the invention.

FIG. 18 illustrates a flow chart of data index ingestion flow according to an embodiment of the invention.

FIGS. 19-24 illustrate an example of structured suggestions flow according to an embodiment of the invention.

FIG. 25 illustrates a flow chart of structured autocomplete suggestions flow according to an embodiment of the invention.

FIG. 26 illustrates a flow chart of metadata index ingestion flow according to an embodiment of the invention.

FIG. 27 illustrates a flow chart of functional query operators index ingestion flow according to an embodiment of the invention.

FIGS. 28-29 illustrate screen shots displaying availability and implementation of apps according to an embodiment of the invention.

FIG. 30 illustrates a flow diagram of work flow for app launching according to an embodiment of the invention.

FIG. 31 illustrates an embodiment of the system of the invention.

FIG. 32 illustrates how the virtual graph can be extended with an organisational layer.

FIG. 33 illustrates how the virtual graph is used to generate summarisations, search listings and visual searches.

FIG. 34 illustrates formation of dashboards comprising a consolidated view of apps.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

As previously noted, the present invention relates to a system and method for federated enterprise analysis. In particular, the invention relates to systems and methods in which a virtual graph is created over the relational schema of at least one underlying physical data source, for example one or more databases, or structured or unstructured data, which may exist in the enterprise. Hereinafter, this specification will describe the present invention according to various preferred embodiments. It is to be understood that the description of the preferred embodiments of the invention that follows is merely to facilitate a better understanding of the present invention and it is envisioned that other embodiments may be within the ambit of the invention without departing from the scope of the appended claims.

Data Search and Analysis System

Referring to Figure 1, a single graph engine 100 suitable for use in a system in accordance with an embodiment of the invention is illustrated. The graph engine 100 includes a server 101 that is provided with a processor 102, memory 103 and disk 104. Memory 103 includes a data index 105 and relationship index 106, as well as a search and query engine 107 that makes use of a virtual graph or virtual schema engine 108. The disk 104 includes data in the form of a database repository 109. It is envisaged that the indexes 105, 106 and engines 107, 108 may also be stored on disk. Likewise, the database repository 109 may be stored in memory.

As illustrated, the graph engine 100 is connected to a network 120, which provides connectivity to a user 130 and client 140.

Figure 2 illustrates a similar graph engine 200, but in this case the disk 204 includes data in the form of a file repository 209. The file repository 209 can be composed of one or more files in, for example, Excel, PDF, Word or other formats. That is, the file repository may contain any unstructured data.

Referring to Figure 3, a similar graph engine 300 is illustrated. This example illustrates how an external source 310, such as an external vendor application, or cloud based service or application can be embedded with the same engine stack and serve as a data source repository itself, or even sit side by side and participate directly in the virtual graph. In this embodiment, the external source 310 includes a server repository 311 that is provided with a processor 312, memory 313 and disk 314. Memory 313 includes a data index 315 and relationship index 316, as well as a search and query engine 317 that makes use of a virtual graph or virtual schema engine 318. The disk 314 includes data in the form of a database repository 319a and/or file repository 319b. In this embodiment, the graph engine 300 includes a processor 302' that is connected to the external source 310 via network 321.

A system 400 according to an embodiment of the invention is illustrated in more detail in Figure 4. This figure illustrates how separate single graph engines 410a and 410b are connected to form a larger search fabric and larger virtual graph. Each graph engine 410a, 410b is responsible for its own section of the virtual graph and a bridging graph engine 420 communicates and links the two underlying graph engines 410a, 410b. Additional graph engines (not shown) may also be provided and linked by the illustrated bridging graph engine 420 or other bridging graph engines. That is, any of the graph engines, usually provided with additional "smarts", within the fabric may in effect be a bridging graph engine.

Each of the graph engines 410a, 410b has componentry as previously described. As such, this will not be reiterated here in detail. However, as illustrated in Figure 4, graph engine 410a is associated with a data repository 411a of data set A. Graph engine 410a then includes a data index 412a and relationship index 413a for data set A, as well as a search and query engine 414a that makes use of a virtual graph or virtual schema engine 415a for data set A. In a similar way, graph engine 410b is associated with a data repository 411b of data set B and includes a data index 412b and relationship index 413b for data set B, as well as a search and query engine 414b that makes use of a virtual graph or virtual schema engine 415b for data set B.

The bridging graph engine 420 includes a processor 421 and memory 422. The memory is provided with a data index 423 and relationship index 424 for bridging the data and relationships of graph engines 410a, 410b. The memory also includes a search and query engine 425 that makes use of a virtual graph or virtual schema engine 426 for data sets A, B. Clients 430 and users 440 can then connect and search seamlessly on any section or groups of sections in the fabric. On executing a search, the search is sent to every graph engine 410a, 410b (and others not shown) that may match results or that the user has constrained the search to. One graph engine will then take ownership, combine the results and return these to the client.
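The fan-out and combination of a search across the fabric might be sketched, under stated assumptions, as follows. The classes `GraphEngine` and `BridgingGraphEngine`, the substring matching and the `restrict_to` parameter are hypothetical simplifications introduced for illustration; a real embodiment would query its data and relationship indexes rather than scan rows.

```python
from dataclasses import dataclass, field


@dataclass
class GraphEngine:
    """Hypothetical graph engine holding one section of the virtual graph."""
    name: str
    entities: dict = field(default_factory=dict)  # entity name -> list of row dicts

    def search(self, term):
        """Return rows in this engine's section whose values contain the term."""
        term = term.lower()
        hits = []
        for entity, rows in self.entities.items():
            for row in rows:
                if any(term in str(v).lower() for v in row.values()):
                    hits.append({"engine": self.name, "entity": entity, "row": row})
        return hits


@dataclass
class BridgingGraphEngine:
    """Hypothetical bridging engine that takes ownership of a federated search."""
    engines: list

    def search(self, term, restrict_to=None):
        # Send the search to every linked engine, or only to those the user
        # has constrained the search to, then combine the results.
        targets = [e for e in self.engines
                   if restrict_to is None or e.name in restrict_to]
        combined = []
        for engine in targets:
            combined.extend(engine.search(term))
        return combined


engine_a = GraphEngine("A", {"customers": [{"id": 1, "name": "David"}]})
engine_b = GraphEngine("B", {"leads": [{"id": 7, "name": "David"},
                                       {"id": 8, "name": "Leonie"}]})
bridge = BridgingGraphEngine([engine_a, engine_b])

all_hits = bridge.search("david")
only_a = bridge.search("david", restrict_to={"A"})
```

Each hit records the engine and entity it came from, so the combined result can still be presented per section of the virtual graph.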

Graph Search Gateway Component

Users of the system will be provided with a user interface that facilitates data searching and analysis. The interface according to certain embodiments of the invention will be discussed in more detail below with reference to Figures 5 to 15. Search and analysis methodology according to certain embodiments of the invention will also be described in more detail.

In traditional search engines, an input box is provided facilitating entry of terms that drive the search. Graph searching allows users to enter keys at any time and search. The system detects the current whereabouts of the user in the application and determines whether the search or further query is relevant. The system then collects the searches that have been conducted and stacks them as a visual overlay on the screen. This gives the user some freedom as they do not need to click on a specific search box. It also facilitates visual interaction with the search results. In this way, a user can find and select datasets and then build on those datasets by connecting relevant information from other parts of the graph/business.

Referring to Figures 5 to 14, a user interface 500 is illustrated that includes an example of bubble graph search visualisation according to an embodiment of the invention. In this example, the relationships and entities of the virtual graph are represented as interconnected bubbles 510 that represent the entities, and the connections 520 that represent relationships. In this instance, the entities include Sales 511, Marketing 512, Human Resources (HR) 513, Client Relations Manager (CRM) 514 and Technology 515. The interface also includes an input box 530 and a zoom 540.

The user can zoom in and out (as in Google Maps) of any part of the graph through what may be considered a virtual graph grouping or organisational layer that defines groupings at different zoom levels. It is envisaged that such groupings may be defined by users of the system. When the user zooms, the view will be centred on where the user's mouse was placed and extra detail will be shown. This is illustrated in Figure 6. As such, when the view is zoomed out to the full extent possible, the interface may show a very high level business organisational/departmental structure as seen in Figure 5, where the structure is fully configurable and dynamically derived from the graph. When zooming in to specific departments, as seen in Figure 6, more detailed relationships, tables and data sources may be shown, while maintaining the bubble graph representation. In this example, Sales 511 is broken down into "products", "spec" and "reports", Marketing 512 is broken down into "promotional", "campaigns" and "stats", HR 513 is broken down into "employees", "salary" and "titles", and CRM 514 is broken down into "customers", "accounts", "research" and "sales".

As illustrated in Figure 7, a lasso tool 710 may be provided to select which bubbles 510 of the graph are to be selected. Click multi-selections can also be provided to custom select what parts of the graph are to be searched. Statistics 720 on the selected search may be identified. Also, the size of the bubbles 510 may be indicative of the amount of data within the field. Relationships may be annotated with descriptions and generally a user can switch at any time from list to grid/bubble view using a relevant list/graph icon 730.

Referring to Figure 8, depending on the zoom level, parts of the graph may be hidden out of view. Bubbles 510 may be annotated or badged 810 with the number of data sources within the bubble 510, and other contextual information may be available when hovering over the bubble (or always). Bubbles 510 with no entries may be greyed out 820. When text is entered in the input box 530, as illustrated in Figure 9, a search is initiated and bubbles 510 will begin to be badged 810 with the number of data sources that contain results or the number of results, depending on whether the user is looking at a group of fields or a single field. A more detailed breakdown 910 within each bubble 510 may also be provided. Other data sources may be greyed out 820.

As illustrated in Figures 10-11, a user can continue zooming in until they select the source of data they want the results from. In this example, the CRM is selected (Figure 10), followed by the selection of David the customer 1110, not David the lead 1120.

As illustrated in Figure 12, the user can continue entering free text 1210 to narrow down their search until they find the data source with the relevant information. Once a single entity/bubble/node is found the user can continue entering more words. The screen will then centre on the previously selected entity and relational neighbourhoods will be searched and presented as a bubble ring 1220 around a central bubble 1230.

The user can continue to navigate and refine their search on surrounding neighbourhood bubbles until a path is found that yields the right list of results. In this example, illustrated in Figure 13, David (from: customer bubble) with bad review (from CRM: social bubble 1310) is selected.

All the disambiguated entities and relationships are collected and appended to the search box to communicate what options the user has focused in on. Furthermore, a summary of the path used can optionally be presented, allowing the user to cancel and remove any selections made thus far. Further transform functions may also be facilitated, such as See All 1320, Stash 1330, Export 1340 and so on.

A method 1400 of selecting data bindings to the structured suggestions may be described according to the following methodology, with reference to Figure 14.

Initially, a user composes a search query, either directly in the input box or with minimal suggestions 1410. Query terms and the current zoom level are then received 1420. The query is then parsed and a determination made as to which entities in the virtual graph/schema contain results 1430. This step may include determining the K most relevant data row results from every entity from the virtual graph/schema data index, including a count of the total in each entity. The grouping structure at the required zoom level is then determined and the results and counts are grouped and aggregated into buckets as predefined in the system configuration and metadata annotations 1440. Buckets can be arbitrary groupings at different levels and may be described by enterprise owners of repositories. For example, these may be location based, system based, department based, etc. The method 1400 then involves determining the N most relevant entities, defined as having the most relevant result 1450, and determining the M most relevant paths through the virtual graph that connect the entities with results 1460. The query results are then determined and displayed in the form of a graph of edges and nodes representing how the entities are related in the underlying virtual graph 1470. Badges are drawn against bubbles to indicate the number of results in each group, bubble colour and/or size can be indicative of the number of results in that group, and intermediate join tables can be hidden and a single line drawn for each relationship path. A user can then pan, zoom, and select the groups or individual entities to indicate the paths through the enterprise virtual graph that data should be bound to 1480. This may include zooming down to a single entity level and displaying the list of results found in that entity. Once all binding is completed, the user clicks search 1490.
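The grouping and aggregation of per-entity result counts into buckets at a given zoom level (step 1440), and the resulting badge values, might be sketched as follows. The `ZOOM_GROUPS` configuration, the function names and the convention that zoom level 0 is fully zoomed out are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical grouping configuration: zoom level -> entity -> bucket.
# Level 0 is assumed to be the fully zoomed-out departmental view.
ZOOM_GROUPS = {
    0: {"customers": "CRM", "accounts": "CRM", "products": "Sales",
        "reports": "Sales", "employees": "HR"},
}


def badge_counts(entity_counts, zoom_level):
    """Aggregate per-entity result counts into the buckets shown at a zoom level."""
    grouping = ZOOM_GROUPS.get(zoom_level, {})
    badges = Counter()
    for entity, count in entity_counts.items():
        # Entities with no configured bucket are shown as their own bubble.
        badges[grouping.get(entity, entity)] += count
    return dict(badges)


def greyed_out(badges):
    """Buckets with no results, which may be drawn greyed out."""
    return sorted(name for name, count in badges.items() if count == 0)


counts = {"customers": 3, "accounts": 1, "products": 0, "employees": 2}
department_badges = badge_counts(counts, zoom_level=0)
entity_badges = badge_counts(counts, zoom_level=1)
```

At the zoomed-out level the "customers" and "accounts" counts are summed into a single "CRM" badge; at a level with no configured grouping each entity carries its own badge.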

In certain embodiments, the user interface may provide a list view type search. This is similar to the visualised graph search discussed above, but all results come back as result items in a list. Reference is made to Figures 15 and 16.

Referring to Figures 15 and 16, a navigation list 1510 of structured search summary is provided that shows the graph nodes/entities and other entities and/or groups they may be relationally related to. The search results are summarised and classified under headings 1520. As exemplified, when searching for "leonie orders" 1530, the search results identify orders by "customer" as the top result. Since there may be more than one such customer, the user can click on a customer pill 1540 and select only the entries of interest. The number of results in "orders" will automatically refilter/recalculate. Similarly, the user may wish to identify orders by employees named leonie, in which case the user can filter by selecting "employees". The user may additionally remove any part of the summary that is not of interest to them. When a user selects any part of a result, the remaining results may be hidden or greyed out 1610 as illustrated in Figure 16. Larger results may include pagination 1620 for facilitating selection.

A flowchart illustrating a method 1700 of searching is provided in Figure 17, which is described in detail below. Initially, a user composes a search query either directly in the search box, via the structured suggestions autocomplete helper or via the immersive structured search visualisation method 1705, and the query terms are received 1710. The query is then parsed and any virtual graph data bindings explicitly required by the user are determined 1715 (e.g. database/repository, entity name, attribute name, etc.). If bindings are found, this information is used to restrict the search or to promote the ranking of any row results found near these data bindings. If no bindings are provided, a search determines any entity/table that contains data rows that match any part of the given query. The query is then parsed to determine any query function operator bindings explicitly required by the user 1720 (e.g. Latest X, best Y, sum, min, max, greater than, less than, between, approx., etc.). The query is run against a function index to determine any matching query operators (e.g. Latest X may match Latest X = sort:X:descending). Matching functional query operators are added to the query according to their rules. The query is then executed against the metadata index 1725 and functional index 1728. If any metadata documents match (e.g. "seafood products" matches the products entity name), then the method involves boosting/infusing the score of any row results from the products table to take into account the metadata match. The query is then executed against each entity data index 1730. If desired, indexes may be combined to improve performance.
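As a minimal sketch, matching a query against a function index and boosting row scores on a metadata match might look like the following. Only the "Latest X = sort:X:descending" rule is taken from the description; the "greater than" rule, the `filter:gt:` notation, the boost factor and all function names are hypothetical.

```python
import re

# Hypothetical function index: pattern -> builder of a functional query operator.
FUNCTION_INDEX = [
    # From the description: "Latest X" maps to sort:X:descending.
    (re.compile(r"\blatest\s+(\w+)", re.IGNORECASE),
     lambda m: f"sort:{m.group(1)}:descending"),
    # Assumed rule, for illustration only.
    (re.compile(r"\bgreater\s+than\s+(\d+)", re.IGNORECASE),
     lambda m: f"filter:gt:{m.group(1)}"),
]


def extract_operators(query):
    """Split a query into functional operators and the remaining search terms."""
    operators = []
    remaining = query
    for pattern, build in FUNCTION_INDEX:
        match = pattern.search(remaining)
        if match:
            operators.append(build(match))
            remaining = remaining[:match.start()] + remaining[match.end():]
    return operators, remaining.split()


def boost_scores(scored_rows, terms, boost=2.0):
    """Boost rows whose entity name matches a query term (a metadata match)."""
    lowered = {t.lower() for t in terms}
    return [(entity, score * boost if entity.lower() in lowered else score)
            for entity, score in scored_rows]


operators, terms = extract_operators("latest orders for leonie")
boosted = boost_scores([("products", 1.0), ("orders", 1.0)], ["seafood", "products"])
```

In the second call the query term "products" matches the products entity name, so only rows from that entity have their score raised.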

The method then involves determining K most relevant data row results from every entity from the virtual graph/schema 1735 and determining N most relevant paths through the virtual graph that connect the entities with results 1740. Following this, the M most relevant path row results for N most relevant paths through the virtual graphs are determined 1745. All result paths in the same neighbourhood are grouped together 1750 (Path starts with the same entity) by virtue of these relationships already being formatted as neighbourhood graph results. Finally, the method includes combining and displaying query results 1755.
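By way of illustration only, steps 1735 and 1750 might be sketched as follows. The function names, the tuple layout of the scored rows and the representation of a path as a tuple of entity names are assumptions made for this sketch and are not taken from the described embodiments; scoring itself is assumed to have been done by the index searches.

```python
from collections import defaultdict
import heapq


def top_k_rows_per_entity(scored_rows, k):
    """Keep the k highest-scoring rows for every entity (cf. step 1735).

    scored_rows: iterable of (entity, row_id, score) tuples (hypothetical layout).
    Returns {entity: [(row_id, score), ...]} ordered best first.
    """
    by_entity = defaultdict(list)
    for entity, row_id, score in scored_rows:
        by_entity[entity].append((row_id, score))
    return {
        entity: heapq.nlargest(k, rows, key=lambda row: row[1])
        for entity, rows in by_entity.items()
    }


def group_paths_by_neighbourhood(paths):
    """Group result paths that start with the same entity (cf. step 1750).

    paths: iterable of tuples of entity names, e.g. ("customers", "orders").
    """
    groups = defaultdict(list)
    for path in paths:
        groups[path[0]].append(path)
    return dict(groups)
```

In this sketch a neighbourhood is identified purely by the first entity of each path, mirroring the statement that a path "starts with the same entity".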

Data Index Ingestion Flow 1800 according to an embodiment of the invention is illustrated in Figure 18. Referring to Figure 18, collections of row data are received 1810. According to one embodiment (initial or manual trigger), collections of row data are extracted from a database repository 1820. This may include: (i) looping for each row of data from each table in a database data source 1821; (ii) reading the columns and values for the row and creating a document 1822; (iii) transforming the values and columns with possible normalisation and user defined cleansing and transformation rules 1823; (iv) creating a key for the document that maps bi-directionally to the database primary key 1824; and (v) indexing the resulting document item 1825. In another embodiment, the database is continuously polled at a configured interval and all data rows whose updated date or integer version has increased are selected 1830. In a further embodiment, a complete re-indexing is executed at regular predefined scheduled times and intervals 1840. Another option may include installing custom database connectors on applications connected to the target database repository that listen for and detect data changes and send update notifications to the ingestion system API 1850. A further option may include clients sending programmatic notifications of row data changes over the network via the ingestion system API 1860. Each of these options may include steps (i) to (v) set out above (1821 to 1825).
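As a minimal sketch of the polling embodiment 1830, the selection of rows whose integer version has increased since the previous poll could look like the following. The column names "id" and "version" and the in-memory dictionary of last seen versions are assumptions for illustration; an actual embodiment would typically issue an equivalent query against the database.

```python
def select_changed_rows(rows, last_seen, pk_column="id", version_column="version"):
    """Return rows that are new or whose version increased since the last poll.

    rows: list of dicts read from one table (hypothetical representation).
    last_seen: dict mapping primary key -> last indexed version; updated in place.
    """
    changed = []
    for row in rows:
        key = row[pk_column]
        if key not in last_seen or row[version_column] > last_seen[key]:
            changed.append(row)
            last_seen[key] = row[version_column]
    return changed
```

Only the rows returned by such a selection would then need to pass through steps 1821 to 1825.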

The data index ingestion flow 1800 then includes processing and transforming the data values, with possible cleansing and normalisation rules defined by configuration, into a searchable indexable document 1870. A key is then created for the data document that maps bi-directionally to the database primary key 1880. The resulting data row document is then indexed, with all rows for a given table being indexed together in a single index 1890.
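The transformation, keying and indexing steps 1870 to 1890 may be sketched, under stated assumptions, as follows. The "table:primary_key" key format, the whitespace and lower-case normalisation rule, and the nested dictionary standing in for a full-text index are all hypothetical choices for this example; the sketch also assumes table names do not contain a colon.

```python
def make_document_key(table, primary_key):
    """Build a document key from the table name and database primary key."""
    return f"{table}:{primary_key}"


def parse_document_key(key):
    """Recover (table, primary_key) from a document key; the key comes back as text."""
    table, _, primary_key = key.partition(":")
    return table, primary_key


def normalise(value):
    """Example cleansing rule: lower-case and collapse runs of whitespace."""
    return " ".join(str(value).lower().split())


def ingest_rows(table, rows, pk_column, index=None):
    """Turn each row into a document and store all rows of a table in one index."""
    index = {} if index is None else index
    table_index = index.setdefault(table, {})
    for row in rows:
        document = {column: normalise(value) for column, value in row.items()}
        table_index[make_document_key(table, row[pk_column])] = document
    return index
```

Because the key can be parsed back into a table name and primary key, a search hit on a document can be traced to its source row, and a changed row can be re-indexed under the same key.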

Structured Intention Suggestions for Composing Queries for Structured Search

The system and method of the invention may also provide structured intention suggestions for composing queries for structured searches. This may include a non-intrusive multidimensional interactive auto-complete menu with intention suggestions to help a user define complicated contextual queries against structured data in simple plain language. The intention suggestions may be automatically generated based on the structure/schema of the underlying data, which may include reference to a combination of virtual graph and data, metadata and functional indexes.

An autocomplete dropdown with multiple columns for multiple levels of suggestions is provided that acts as a wizard to guide the user to structure the search appropriately to get the most relevant information. Each column, once selected, moves left and hides, being summarized in search breadcrumbs at the top of the dropdown. The dropdown can show one or many autocomplete columns as relevant according to the part of the query the user is typing at the time. This will be discussed in more detail below with reference to Figures 19 to 25. Referring to Figure 19, an example of structured suggestions flow 1900 is illustrated. In this figure, one frame 1910 illustrates the autocomplete suggesting categories or suppliers in the "What" column 1911 for a "seafood" query. Specific instances (data rows) of categories and suppliers are grouped in the "Which" column 1912. The user can navigate up and down and across the Which column 1912 and What column 1911 and then make a desired selection.

In the second frame 1920, after binding "categories: seafood" the user continues the query to define "products" 1921 and the wizard autocomplete frame repeats back to What/Which.

In the third frame 1930, after desired entities and search text is selected, "How" 1931 and "Actions" 1932 autocomplete may be presented which show functional query operators 1933 for sorting, etc. and any matching apps 1934 a user may want to run across the dataset output. This will usually be dependent on relevance and compatibility with the virtual graph sections (metadata) and data identified.

Figure 20 illustrates structured suggestions 2000 across all dimensions and states. In the How Dimension 2010, an example of sorting function is shown. In the How Dimension 2010, there is also an example of function alias bindings, that is natural language expressions that alias and invoke some other functional operators (e.g. best = sort:price:descending). In the How Dimension 2010, an example of aggregation and basic analytics functions is also shown, as is an example of temporal alias bindings, that is natural language expressions that get converted to dates and ranges (e.g. this year = from 2013-2014). In the What Dimension 2020, an example of database and entity names (e.g. products) is shown. The What Dimension 2020 also provides an example of database, entity and column names (e.g. age). The Which Dimension 2030 provides an example of row data results grouped by source entity and an example of column facet values. The Analyse Dimension 2040 provides an example of apps suggestions. The autocomplete breadcrumbs appear above the autocomplete and below the input box and gather all the structured queries as tags so the user is aware of the bindings in the context of the structured query 2050.

Examples of structured suggestions are also provided in Figures 21 to 24. In these figures, in the first frame 2100, "What" 2110 includes entities from the graph that contain relevant results for the search (e.g. Wolf the customer, the employee or the company). In the second frame 2200, after the user selects "customer" 2210, they may select all customers named wolf, or a specific one or more. That is, they can multi-select or choose all. In the third frame 2300, after the user selects "wolf/customer", the system may want to know what, if any, other information the user wants, for example the customer's invoices 2310 or orders 2320, or any other entity that is in the customer neighbourhood according to the relational graph. The user can select all invoices or select individual invoices. If the user continues typing "paid", the invoices are filtered so that only paid invoices are shown. The user can further disambiguate which aspect of the invoices they want to match "paid" (e.g. a status of "paid", but not a description such as "has not yet paid"). The user can optionally sort the data using sorting suggestions 2410, as shown in the fourth frame 2400.

Structured autocomplete suggestions flow 2500 according to an embodiment of the invention is illustrated in Figure 25. According to this figure, the system detects a user typing a search query and sends partial query searches asynchronously to the structured suggestions engine 2510. The structured suggestions engine parses the query and executes term and partial term prefix searches against separate dedicated indexes of structural dimensions 2520. This includes searching against the virtual graph metadata index (standard full-text index of graph metadata) 2530a, searching against a query function operators index (standard full-text index of documents describing query operators) 2530b, searching against data index (standard full-text index of row data) 2530c, and searching against a data facet index (if the user has previously selected a specific column from the "What" column) 2530d. Optionally, an "Analyse" column may be displayed that gives a user suggestions of apps that could be run on the selected data 2530e.

In the search against virtual graph metadata 2530a, an index of all repository/database, entity/table or attribute/column names and/or synonyms for the names of the entire virtual graph is considered (e.g. "seafood pro" matches -> products table) 2531a. This returns a list of repositories or entities or attributes that match the term in descending order of full-text relevance 2532a. The lists of results are presented in "Where" and "What" columns in the autocomplete dropdown 2533a. Listing might also be alphabetical, custom configurable, etc.

In the search against the query function operators index 2530b, an index of any query operator mappings from free text to functional query is considered (e.g. latest books = sort:books:desc) 2531b. This returns a list of suggested operators based on the query and matches them in decreasing relevance order 2532b. Where there have been previous virtual graph bindings, functions near the entities or columns will be ranked higher. The list of matches is presented in a "How/Action" column in the autocomplete dropdown 2533b.
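The mapping from free text to functional query operators can be illustrated with a small sketch. The alias table, the function name and the rule that "latest" binds to the term that follows it are assumptions made for this example, modelled on the expressions quoted in the description (best = sort:price:descending; Latest X = sort:X:descending); a real embodiment would look the terms up in the function index rather than in a hard-coded table.

```python
# Hypothetical fixed aliases: natural language term -> functional query operator.
FIXED_ALIASES = {
    "best": "sort:price:descending",
}


def resolve_operators(query):
    """Split a query into remaining search terms and functional operators.

    "latest X" yields sort:X:descending while X stays a search term.
    """
    terms = query.lower().split()
    remaining, operators = [], []
    for position, term in enumerate(terms):
        if term == "latest" and position + 1 < len(terms):
            operators.append(f"sort:{terms[position + 1]}:descending")
        elif term in FIXED_ALIASES:
            operators.append(FIXED_ALIASES[term])
        else:
            remaining.append(term)
    return remaining, operators
```

For instance, resolve_operators("latest books") leaves "books" as the search term and adds the operator sort:books:descending.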

In the search against the data index 2530c, a standard data full-text index is considered 2531c. This returns a list of row suggestions based on the query matching the data in those rows (e.g. "mission" -> "mission impossible I, II, etc."). The list of matches is presented in a "Which" column in the autocomplete dropdown 2532c. This also returns a list of entities/tables that have at least one matching result (e.g. "Seafood" returns "Categories, Suppliers"). The entity names are populated into the "What" column and/or used as grouping categories 2533c. If the user has not yet selected an entity, the results are grouped by each entity 2534c. In the search against a data facet index 2530d, an index of all possible values for a specific column is considered 2531d. This returns a list of column values for the last specified column, presented in decreasing order of occurrence distribution 2532d, and presents the list of matches in a "Which" column in the autocomplete dropdown 2533d.
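The data facet search 2530d may be sketched as a count of the values found in one column, optionally narrowed by the prefix the user has typed so far. The function name and the representation of rows as dictionaries are assumptions for this example; ties in occurrence count are broken alphabetically here, which the description does not specify.

```python
from collections import Counter


def facet_suggestions(rows, column, prefix=""):
    """List values of one column by decreasing occurrence, filtered by prefix.

    rows: list of dicts (hypothetical row representation).
    Returns [(value, count), ...].
    """
    counts = Counter(str(row[column]) for row in rows if column in row)
    prefix = prefix.lower()
    matches = [
        (value, count)
        for value, count in counts.items()
        if value.lower().startswith(prefix)
    ]
    # Most frequent first; alphabetical order between equally frequent values.
    return sorted(matches, key=lambda item: (-item[1], item[0]))
```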

Results from each of the above index searches are displayed in horizontal columns of the auto suggest dropdown under the user query 2540. The user can then use arrow keys or a mouse to navigate down any column of structured suggestions, or left and right to move to an adjacent column. When the user makes a selection 2550, additional structured bindings are added/autocompleted to their query box, or additional structured bindings are hidden from the user query box but stored/remembered and sent to the search server when the user completes the full query. The entire dropdown then slides to the left and the next column/dimension is selectable by the user while the previous column/dimension disappears (e.g. for a query "Seafood", if the user selects the "category" entity from the "What" column, then the "Which" column is displayed with a list of rows from the categories entity matching the "seafood" term). The user can then select up to any level of explicitness they desire and/or continue typing for more terms 2560.

The menu lifecycle is repeated for every new term in the context of the previous term, and any combination of What, Where, Which and How is displayed depending on the context and what the user is typing 2570. During and after the user executes the search, all of the structured query and data bindings are summarised under the search box so the user is aware of the exact query context 2580. This allows refining of query results by manipulating the bindings summary 2590. In particular, the user can cancel and remove a binding, which may be represented by a tag. The user can also click on a binding or tag, which will show the suggestions autocomplete dropdown and allow the user to make alternate selections by moving left and right, which will cycle through every What/Where/etc. that was used to compile the structured query.

Metadata index ingestion flow 2600, as illustrated in Figure 26, includes extracting virtual graph relational schema metadata from a database repository 2610. This involves looping for each column of each table in the database 2620, reading the column name and any additional user name synonyms 2630 and creating and indexing the resulting metadata document with that column text 2640.
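A minimal sketch of the metadata ingestion loop 2620 to 2640, together with a term-prefix lookup of the kind used by the suggestions engine, is given below. The schema dictionary, the synonym mapping keyed by (table, column) and the list of dictionaries standing in for a full-text index are assumptions for illustration only.

```python
def build_metadata_index(schema, synonyms=None):
    """Create one metadata document per column of each table.

    schema: {table: [column, ...]} (hypothetical schema description).
    synonyms: optional {(table, column): [user synonym, ...]}.
    """
    synonyms = synonyms or {}
    documents = []
    for table, columns in schema.items():
        for column in columns:
            names = [column] + synonyms.get((table, column), [])
            documents.append(
                {"table": table, "column": column, "text": " ".join(names).lower()}
            )
    return documents


def search_metadata(documents, term):
    """Return (table, column) pairs with a name or synonym starting with term."""
    term = term.lower()
    return [
        (document["table"], document["column"])
        for document in documents
        if any(word.startswith(term) for word in document["text"].split())
    ]
```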

Functional query operators index ingestion flow 2700 is illustrated in Figure 27 and includes extracting each functional query definition and all synonyms from the system configuration 2710. This involves looping for each synonym of each functional query user configured mapping 2720, creating a structured document with attributes for the synonym text, type of function and schema locations where the function applies, etc. 2730, and indexing the resulting function document 2740.
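The function ingestion loop 2720 to 2740 might be sketched as follows. The layout of a function definition (keys "operator", "type", "synonyms" and "applies_to") is a hypothetical configuration format chosen for this example, and a plain list stands in for the function index.

```python
def build_function_index(function_definitions):
    """Create one structured document per synonym of each functional query.

    function_definitions: list of dicts with keys "operator", "type",
    "synonyms" and optionally "applies_to" (hypothetical configuration format).
    """
    documents = []
    for definition in function_definitions:
        for synonym in definition["synonyms"]:
            documents.append(
                {
                    "synonym": synonym.lower(),
                    "operator": definition["operator"],
                    "type": definition["type"],
                    # Schema locations (entities/columns) where the function applies.
                    "applies_to": list(definition.get("applies_to", [])),
                }
            )
    return documents
```

Storing the schema locations with each document is what would allow functions near previously bound entities or columns to be ranked higher.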

Apps Launching

The search page of the system of embodiments of the invention may act as an app store by providing a tray and view to search for apps that can be used to run analytics across the searched datasets. The apps can be bought, installed and saved from the same search portal. This is illustrated in Figures 28 and 29.

Referring to Figure 28, screen shots 2800 are illustrated that show the locations of an app store tray 2810 on the right, and saved apps 2820 and a toolbar tray 2830 on the left. Both trays 2810 and 2830 can be maximised to provide an extended view for easier searching and browsing of the app store and saved content, apps, and queries. An expanded app store view 2840 is exemplified in this figure. Figure 29 illustrates how an app might guide the user through an installation process. It is envisaged that each set of results required to run the analytic, including locations of entities and attributes that contain the required data, would be specified and a user would employ a standard search to find and bind the dataset. In more detail, this figure shows how an app that is running 2910 will position itself near the search bar 2920 and display analytics results 2930 below the search bar 2920. The resulting output may, for example, consist of graphs, and additional arbitrary buttons and actions a user can perform may be provided. In certain embodiments, the app may request a binding to a data set of customers. For example, the app may request the selection of a specific column (e.g. price). It is envisaged that output windows of a single application (e.g. graphs, analytics etc.) may be provided, as may an arbitrary app toolbar for instigating further actions and interactions with the app. To be clear, though, it is considered that apps may generally be created by other partners and vendors and used by enterprises across their virtual graphs. That is, they are self-contained units of code that can be deployed on demand or user request and run across data of the virtual graph (i.e. read, analyse and eventually write back). The workflow for apps launching from an apps suggestion tray is illustrated in Figure 30. Referring to this figure, app launching 3000 includes a user performing a search using the structured suggestions autocomplete and search page 3010.
The user clicks or hovers on the app store strip, discussed above, to open the apps tray 3020. The user query may be sent to the server and a list of popular and most relevant suggestions returned 3030. The user can hover over any of the applications which may show, for example, the price and description of the app 3040. The user can then scroll the apps tray and may be provided with a search box at the top of the apps tray to refine the query and find the app they need 3050. In certain embodiments, the user can also expand the apps tray to full view which then covers the whole page for easier viewing and searching of apps 3060. Once an app has been identified, the user clicks on a buy or run button to launch the app 3070. The app is then automatically moved from the apps dock and stacked on the left side of the screen in the saved/installed apps section with a smaller icon 3080. If a single search is sufficient to define the dataset for the app, the app can then be run over the complete dataset.

The app installation flow may also simply involve a user clicking on the apps tab to enter the app store, followed by searching for an app. The user can click and buy an app or, if it was previously purchased under an enterprise license, just install the app. Either way, the app is then moved into the saved apps section. It is envisaged that the app will generally display a context box and guide the user through a standardised wizard to select and bind the datasets required, through the standard query search process, to drive the app's analytics. Once completed, the app will appear on the saved apps tray and may be subsequently launched with a single click.

The app launching flow for saved apps is relatively straightforward. A user clicks on a recently used or saved app from the saved apps tray on the left. The cached app output is redisplayed, and the query for the app dataset is re-executed and the results displayed.

Likewise, the launching flow for saved queries is relatively straightforward and includes a user clicking on the "saved queries" button of the saved apps tray. The list of most recently used queries and saved queries is then displayed and the user can search for a specific query by name and/or description. Once identified, the user can click on a query and the results are displayed.

Further clarification of the system and method of the invention can be gleaned from Figures 31 to 34. Referring to Figure 31, the virtual graph is a software metadata layer describing a unified view of all the relational schemas in the enterprise (Virtual Graph). The virtual graph maps to physical relational schemas in physical databases underneath. The virtual graph is made up of sub-sections that are stored in the physical memory of each of the graph engines (Graph Engine). Each graph engine stores a section of the virtual graph that is mapped to a physical database relational schema, or a file, or a collection of files. Some graph engines may be bridging graph engines that facilitate the joining of virtual graph sections found in the memory of two or more other graph engines. Graph engines ingest and store the relational schema (data and relationships) of underlying physical data sources, or files, etc. (Databases or Structured or Unstructured Data).
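The arrangement of Figure 31 can be pictured with a small sketch in which each graph engine holds the relationships of its own section, and a bridging graph engine holds only relationships that join sections held by other engines. The class and function names, the qualified entity names such as "sales.orders" and the breadth-first path search are assumptions for illustration and do not describe how any embodiment stores or traverses the graph.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class GraphEngine:
    """Holds one section of the virtual graph as (entity, entity) relationships."""
    name: str
    relationships: list = field(default_factory=list)


def build_virtual_graph(engines):
    """Union the sections of the given engines into one adjacency map."""
    adjacency = defaultdict(set)
    for engine in engines:
        for left, right in engine.relationships:
            adjacency[left].add(right)
            adjacency[right].add(left)
    return adjacency


def find_path(adjacency, start, goal):
    """Breadth-first search for a shortest path between two entities, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in sorted(adjacency.get(path[-1], ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None
```

In this sketch, two engines whose sections share no relationship yield no path between their entities until a bridging engine supplying a joining relationship is included in the subset being analysed.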

Figure 32 shows how the virtual graph can be extended with an organisational layer that groups entities and relationships together into groups all the way up to a single group called "Enterprise X".

Figure 33 shows how the virtual graph is used to derive/generate the summarisations, search listings, and visual search (each summarisation is a path through the graph). Referring to Figure 34, dashboards are made up of a consolidated view of apps (Dashboard). Apps are created or sourced from a library/suite of apps distributed across the fabric (Marketplace). Apps consist of visualisations that read and/or write data to the virtual graph (Apps). Apps can work on any subset/section of the virtual graph. The virtual graph may pass data through to the physical databases underneath.

The system of embodiments of the invention advantageously creates a plurality of small local "warehouses" mainly consisting of indexes that sit close to or over existing enterprise data sources. These may be considered as clustered standalone graph search engines that can be further connected and talk to each other, for example like a p2p network, to form a larger graph search engine. Search and business intelligence operations can then advantageously be performed at any level, either within or on a single graph search node, or on any subset of connected graph search nodes. At higher levels the system may look and behave like a single large graph search node. Hence the embodiments of the system act as a platform that provides a consistent set of functions and Application Programming Interfaces (APIs) to facilitate running Business Intelligence (BI) operations across the whole or any part of a business data infrastructure. This architecture is the inverse of existing systems and advantageously pushes the computational requirements down closer to the data source. Advantageously, embodiments of the invention therefore provide a consistent standardised virtualised platform for BI and make it easier and less expensive to adopt the platform from the ground up. According to certain embodiments, the system of the invention maintains the existing normalised structure of the underlying data sources and indexes that structure intelligently and close to the source to still allow the required business intelligence queries. It may achieve this by providing a view over a full-text index. The system advantageously does not require de-normalisation of the data and hence provides a greater range of flexibility for BI because it allows any path or tree along the graph to be analysed for BI without incurring big data/index bloat caused by de-normalisation.
Furthermore, the system of embodiments of the invention advantageously maintains optimised relational indexes and keeps the data normalised and efficiently stored but still allows full relational searches to connected pieces of data.

The system of certain embodiments of the invention reuses, leverages and sits on an existing database that includes structured or unstructured data sources. The system creates only a virtual API and graphical representation and the necessary indexes for facilitating highly parallel computation, analysis and performance. The standards advantageously allow interoperability of these computations across different databases using the same graph layer by bridging them into a single larger virtual database. Advantageously, certain embodiments of the system of the invention may allow holistic full-text search across a single graph node or any superset of graph nodes with a single search box. It may allow for natural language input of business intelligence questions where words and phrases can be mapped to, and can trigger and invoke, business intelligence functions and computations. Still further, it may allow visual navigation to any part, subset, or grouping of the enterprise graph where the user wants to find the data. It may also allow a user to manipulate and play with data result sets on the graph by pivoting on certain tables/entities and searching the neighbourhood graph. It is also envisaged that the system of the invention may provide advantages in terms of security and levels of authorisation of use to users of the system. For example, it may be possible for an owner of a data repository to define authorisation levels for users of the system across the enterprise, thereby facilitating access, or not, to a particular set of data.

It should be understood that the above advantages may apply only to certain embodiments of the invention. It should not be considered that each and every embodiment of the present invention should satisfy each and every one of the above advantages. For example, a particular embodiment of the invention may facilitate one or more of the above advantages.

Unless the context requires otherwise or it is specifically stated to the contrary, integers, steps or elements of the invention recited herein as singular integers, steps or elements clearly encompass both singular and plural forms of the recited integers, steps or elements. Throughout this specification, unless the context requires otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated step or element or integer or group of steps or elements or integers, but not the exclusion of any other step or element or integer or group of steps, elements or integers. Thus, in the context of this specification, the term "comprising" is used in an inclusive sense and thus should be understood as meaning "including principally, but not necessarily solely". It will be appreciated that the foregoing description has been given by way of illustrative example of the invention and that all such modifications and variations thereto as would be apparent to persons of skill in the art are deemed to fall within the broad scope and ambit of the invention as herein set forth.

Claims

1. A system for the federated enterprise analysis of data within a data source comprising:

a virtual graph overlying said data source and providing a unified view of relational schemas in said data source; and

at least one graph engine, each graph engine storing a respective section of said virtual graph that is mapped to a respective relational schema within said data source;

whereby enterprise analysis can be performed on a single graph engine within said at least one graph engine, or on any subset of graph engines within said at least one graph engine linked according to said virtual graph.

2. A system according to claim 1, wherein each of said graph engines comprises:

a processor;

a search and query engine;

a virtual graph/schema engine;

a data index; and

a relationship index.

3. A system according to claim 1 or 2, wherein one or more of said graph engines is a bridging graph engine that is adapted to communicate with and link two or more of said graph engines, thereby linking sections of virtual graph stored within the linked graph engines.

4. A system according to claim 3, wherein said bridging graph engine comprises:

a processor;

a search and query engine;

a virtual graph/schema engine;

a data index bridging said two or more graph engines linked by said bridging graph engine; and

a relationship index bridging said two or more graph engines linked by said bridging graph engine.

5. A system according to any one of the preceding claims, wherein said data source comprises a plurality of sources of data.

6. A system according to any one of the preceding claims, wherein said data source comprises structured and/or unstructured data.

7. A system according to claim 6, wherein said unstructured data includes one or more files.

8. A system according to any one of the preceding claims, wherein one or more of said graph engines is an external graph engine embedded or deployed externally as part of an external application.

9. A system according to claim 8, where said system comprises a graph engine adapter that facilitates association of said external graph engine with said virtual graph.

10. A system according to claim 8 or 9, wherein said external application comprises an external vendor application or a cloud based application.

11. A system according to any one of the preceding claims, comprising a graph search gateway component adapted to facilitate bubble graph search visualisation on a user interface.

12. A system according to claim 11, wherein in said bubble graph search visualisation, sections of said virtual graph are represented as interconnected bubbles and relationships between said sections are represented by connections between said bubbles.

13. A system according to claim 12, wherein said graph search gateway component is adapted to facilitate grouping of interconnected bubbles, representing sections of said virtual graph, within said bubble graph search visualisation.

14. A system according to any one of claims 11 to 13, wherein bubbles within said bubble graph search visualisation include indicia that indicate the amount of data within said bubbles.

15. A system according to claim 13, wherein said indicia are selected from colour, size or badging of said bubbles.

16. A system according to any one of claims 11 to 15, comprising zoom functionality that facilitates zooming into and out of user selected regions of said bubble graph visualisation.

17. A system according to claim 16, wherein zooming into said bubble graph visualisation increases the level of information presented in said user selected regions of said bubble graph visualisation.

18. A system according to any one of claims 1 to 10, wherein data search and analysis is facilitated through a list view type search.

19. A system according to claim 18, wherein search results within said list view type search comprise data rows grouped into neighbourhoods that pivot on root entities sourced from data mapped through said virtual graph.

20. A system according to claim 18 or 19, wherein said list view type search comprises a search navigation panel that summarises possible relationships within said virtual graph that can yield results for a given input query.

21. A system according to any one of the preceding claims, wherein a user interface is provided, said user interface comprising a search box for entry of a search query by a user.

22. A system according to claim 21, comprising an autocomplete helper adapted to identify entry of a search query by a user and provide structured suggestions based on said entry of said search query and underlying data in said underlying data repositories.

23. A system according to claim 22, wherein said autocomplete helper provides multiple dimensions of structured suggestions based on said virtual graph, data and metadata in said data source facilitating disambiguation of where, what and how search results are fetched.

24. A system according to any one of the preceding claims, wherein a user interface is provided, said user interface comprising an app store tray and wherein said system is adapted to identify a user query and list most popular and/or most relevant apps in said app store tray.

25. A system according to claim 24, wherein apps installed by a user can be used to bind and traverse relational and tree data across any subset of said virtual graph.

26. A system according to claim 24 or 25, where said system comprises a user dashboard that displays results of analysis conducted by respective apps selected by a user of the system.

27. A system according to claim 26, wherein said user dashboard is self-created by users of the system.

28. A system according to claim 26 or 27, where said user dashboard consolidates analytic report output from said apps.

29. A system according to any one of the preceding claims, wherein security of said system is decentralised whereby designated administrators of each graph engine can set permissions relating to a user's access to associated entities and attributes.

30. A method for the federated enterprise analysis of data within a data source comprising:

overlaying said data source with a virtual graph so as to provide a unified view of relational schemas in said data source;

mapping at least one graph engine storing a section of said virtual graph to a respective relational schema within said data source; and

performing enterprise analysis on a single graph engine within said at least one graph engine, or on any subset of graph engines within said at least one graph engine linked according to said virtual graph.

31. A method according to claim 30, wherein one or more of said graph engines is a bridging graph engine that is adapted to communicate with and link two or more of said graph engines, said method comprising linking graph engines through said bridging graph engine, thereby linking sections of virtual graph stored within the linked graph engines.

32. A method according to claim 30 or 31, wherein one or more of said graph engines is an external graph engine embedded or deployed externally as part of an external application and said method comprises associating said external graph engine with said virtual graph through a graph engine adapter.

33. A method according to any one of claims 30 to 32, comprising grouping data rows into neighbourhoods that pivot on root entities sourced from data mapped through said virtual graph and visualising structured and relational data in one or more of a list, summary or graph view.

34. A method according to any one of claims 30 to 33, wherein said step of performing enterprise analysis comprises running one or more apps on each of said graph engines or subset of graph engines.

35. A method according to claim 34, comprising collating the output from said one or more apps and displaying said collated output on a dashboard.