CN104536740A - Virtual function sharing in shared virtual memory between heterogeneous processors of a computing platform - Google Patents

Virtual function sharing in shared virtual memory between heterogeneous processors of a computing platform

Info

Publication number
CN104536740A
CN104536740A (application number CN201410790536.8A)
Authority
CN
China
Prior art keywords
gpu
cpu
virtual
shared
virtual table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410790536.8A
Other languages
Chinese (zh)
Other versions
CN104536740B (en)
Inventor
S. Yan
S. Luo
X. Zhou
Y. Gao
H. Chen
B. Saha
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp filed Critical Intel Corp
Priority to CN201410790536.8A priority Critical patent/CN104536740B/en
Priority claimed from CN201080069225.2A external-priority patent/CN103109286B/en
Publication of CN104536740A publication Critical patent/CN104536740A/en
Application granted granted Critical
Publication of CN104536740B publication Critical patent/CN104536740B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

A computing platform may comprise heterogeneous processors (such as a CPU and a GPU) and may support sharing of virtual functions between the processors. In one embodiment, the CPU-side virtual function table pointer used to access a shared object from the CPU 110 may be used to determine the GPU virtual function table, if a GPU-side table exists. In another embodiment, a shared non-coherent region, which does not maintain data consistency, may be created in the shared virtual memory, and CPU-side data and GPU-side data stored in the shared non-coherent region may have the same address as seen from the CPU side and from the GPU side. However, the contents of the CPU-side data may differ from the contents of the GPU-side data, because the shared virtual memory does not maintain coherence at run time. In one embodiment, the vptr may be modified to point to the CPU virtual function table and the GPU virtual function table stored in the shared virtual memory.

Description

Virtual function sharing in shared virtual memory between heterogeneous processors of a computing platform
Background
A computing platform can comprise heterogeneous processors, such as a central processing unit (CPU) and a graphics processing unit (GPU), and symmetric and asymmetric processors. A class instance (or object) may reside in a first memory associated with a first side (for example, the CPU side) of a CPU-GPU platform. The second side (the GPU side) may not be enabled to invoke the member functions associated with an object residing in the first memory associated with the first side (the CPU side) of the CPU-GPU platform. Likewise, the first side may not be enabled to invoke the member functions associated with an object residing in a second memory on the second side (the GPU side). When class instances or objects are stored in different address spaces, the existing communication mechanisms may allow only one-way communication between the heterogeneous processors (the CPU and the GPU) for invoking a class instance and the associated virtual functions.
Such a one-way communication approach prevents a natural functional partitioning of class instances between the heterogeneous processors. An object may comprise throughput-oriented member functions and some scalar member functions. For example, a scene class in a gaming application may have rendering functions that are suited to the GPU, and may also comprise physics and artificial intelligence (AI) functions that are suited to execution on the CPU. With the current one-way communication mechanisms, there are typically two different scene classes, one comprising the CPU member functions (the physics and AI functions in the above example) and one comprising the GPU member functions (the rendering functions suited to the GPU). Keeping the two different scene classes, one for the CPU and one for the GPU, consistent with each other may require copying data back and forth between the two scene classes.
Brief description of the drawings
The invention described herein is illustrated by way of example and not by way of limitation in the accompanying drawings. For simplicity and clarity of illustration, the elements shown in the figures are not necessarily drawn to scale. For example, the dimensions of some elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference numerals have been repeated among the figures to indicate corresponding or analogous elements.
Fig. 1 illustrates a platform 100 that supports sharing of virtual functions stored in a shared virtual memory between heterogeneous processors provided in a computing platform, in accordance with one embodiment;
Fig. 2 is a flow diagram of operations, performed by the platform 100, for supporting the sharing of virtual functions stored in the shared virtual memory between the heterogeneous processors provided in the computing platform, in accordance with one embodiment;
Fig. 3 illustrates CPU-side and GPU-side code for loading a virtual function pointer from a shared object, in accordance with one embodiment;
Fig. 4 is a flow diagram of operations, performed by the platform 100, for generating a table to support the sharing of virtual functions stored in the shared virtual memory between the heterogeneous processors provided in the computing platform, in accordance with a first embodiment;
Fig. 5 is a flow diagram of operations, performed by the platform 100, for supporting two-way communication between the CPU 110 and the GPU 180 through the member functions of an object sharable by the heterogeneous processors, in accordance with one embodiment;
Fig. 6 is a flow diagram depicting the process of GPU virtual function calls and GPU function calls made by the CPU side, in accordance with the first embodiment;
Fig. 7 is a flow diagram of operations, performed by the platform 100, for using a virtual shared non-coherent region to support the sharing of virtual functions between the heterogeneous processors, in accordance with an embodiment;
Fig. 8 is a relationship diagram illustrating the use of a virtual shared non-coherent region to support the sharing of virtual functions between the heterogeneous processors, in accordance with an embodiment;
Fig. 9 illustrates a computer system that may provide support for the sharing of virtual functions stored in a shared virtual memory between heterogeneous processors provided in a computing platform, in accordance with one embodiment.
Detailed description
The following description describes techniques for sharing virtual functions stored in a shared virtual memory between the heterogeneous processors of a computing platform. In the following description, numerous specific details, such as logic implementations, resource partitioning or sharing or duplication implementations, types and interrelationships of system components, and logic partitioning or integration choices, are set forth in order to provide a more thorough understanding of the present invention. It will be appreciated, however, by one skilled in the art that the invention may be practiced without such specific details. In other instances, control structures, gate level circuits, and full software instruction sequences have not been shown in detail in order not to obscure the invention. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.
References in the specification to "one embodiment", "an embodiment", or "an example embodiment" indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments, whether or not explicitly described.
Embodiments of the invention may be implemented in hardware, firmware, software, or any combination thereof. Embodiments of the invention may also be implemented as instructions stored on a machine-readable medium, which may be read and executed by one or more processors. A machine-readable storage medium may include any mechanism for storing or transmitting information in a form readable by a machine (for example, a computing device).
For example, a machine-readable storage medium may include read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; and electrical and optical forms of signals. Further, firmware, software, routines, and instructions may be described herein as performing certain actions. However, it should be appreciated that such descriptions are merely for convenience and that such actions in fact result from computing devices, processors, controllers, and other devices executing the firmware, software, routines, and instructions.
In one embodiment, a computing platform may support one or more techniques that allow two-way communication (function calls) between heterogeneous processors (such as a CPU and a GPU) through the member functions, such as the virtual functions, of a shared object, by fine-grain partitioning of the shared object. In one embodiment, the computing platform may allow a first technique, referred to as the "table-based" technique, to be used for two-way communication between the CPU and the GPU. In other embodiments, the computing platform may allow a second technique, referred to as the "non-coherent region" technique, to be used for two-way communication between the CPU and the GPU; in this technique, a virtual shared non-coherent region may be created in the shared virtual memory.
In one embodiment, when the table-based technique is used, the CPU-side virtual table (vtable) pointer of a shared object, which may be used to access the shared object from either the CPU side or the GPU side, may be used to determine the GPU virtual table, if a GPU-side table exists. In one embodiment, the GPU-side table may comprise entries of the form <"class name", CPU vtable address, GPU vtable address>. Techniques for obtaining the GPU-side vtable address and for generating the GPU-side table are described in more detail below.
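By way of illustration only, one entry of such a GPU-side table might be modeled in C++ as follows; the type and field names in this sketch are assumptions made for illustration and are not taken from the embodiments:
    // Sketch of one GPU-side table entry: it ties a shared class name to the
    // address of the class's CPU vtable and the address of its GPU vtable.
    struct GpuSideTableEntry {
        const char* class_name;   // "class name" key, for example "Foo"
        const void* cpu_vtable;   // address of the CPU-side virtual table
        const void* gpu_vtable;   // address of the GPU-side virtual table
    };
The CPU vtable address read out of a shared object can then serve as the lookup key into a collection of such entries, as described below.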
In other embodiments, when the "non-coherent region" technique is used, a shared non-coherent region may be created in the shared virtual memory. In one embodiment, the shared non-coherent region may not maintain data consistency. In one embodiment, CPU-side data and GPU-side data in the shared non-coherent region may have the same address as seen from the CPU side and from the GPU side. However, the contents of the CPU-side data may differ from the contents of the GPU-side data, because the shared virtual memory may not maintain coherence at run time. In one embodiment, the shared non-coherent region may be used to store a new copy of the virtual method table of each shared class. In one embodiment, this approach may keep the virtual tables at the same address.
Fig. 1 illustrates an embodiment of a computing platform 100 that may provide sharing of virtual functions in a virtual shared memory shared between heterogeneous processors such as a CPU and a GPU. In one embodiment, the platform 100 may comprise a central processing unit (CPU) 110, an operating system (OS) 112 associated with the CPU 110, a CPU private space 115, a CPU compiler 118, a shared virtual memory (or multi-version shared memory) 130, a graphics processing unit (GPU) 180, an operating system (OS) 182 associated with the GPU 180, a GPU private space 185, and a GPU compiler 188. In one embodiment, the OS 112 and the OS 182 may manage the resources of the CPU 110 and the CPU private space 115, and of the GPU 180 and the GPU private space 185, respectively. In one embodiment, to support the shared virtual memory 130, the CPU private space 115 and the GPU private space 185 may comprise copies of multi-version data. In one embodiment, to maintain memory consistency, metadata such as that of the object 131 may be used to synchronize the copies stored in the CPU private space 115 and the GPU private space 185. In other embodiments, the multi-version data may be stored in a physical shared memory, such as the shared memory 950 (of Fig. 9, described below). In one embodiment, the shared virtual memory may be supported by physical private memory spaces (such as the CPU private space 115 and the GPU private space 185 of the heterogeneous processors CPU 110 and GPU 180) or by physical shared memory (such as the shared memory 950 shared by the heterogeneous processors).
In one embodiment, the CPU compiler 118 and the GPU compiler 188 may be coupled to the CPU 110 and the GPU 180, respectively, or may be provided remotely on other platforms or computer systems. The compiler(s) 118 associated with the CPU 110 may generate compiled code for the CPU 110, and the compiler 188 associated with the GPU 180 may generate compiled code for the GPU 180. In one embodiment, the CPU compiler 118 and the GPU compiler 188 may generate compiled code by compiling one or more member functions of objects provided by a user in a high-level language such as an object-oriented language. In one embodiment, the compilers 118 and 188 may cause the object to be stored in the shared memory 130, and the shared object 131 may comprise member functions allocated to the CPU side 110 or the GPU side 180. In one embodiment, the shared object 131 stored in the shared memory 130 may comprise member functions such as virtual functions VF 133-A to 133-K and non-virtual functions NVF 136-A to 136-L. In one embodiment, the member functions of the shared object 131, such as the VF 133 and the NVF 136, may provide two-way communication between the CPU 110 and the GPU 180.
In one embodiment, to implement dynamic binding, either the CPU 110 or the GPU 180 may call a virtual function such as VF 133-A (for example, a C++ virtual function) by indexing a virtual table (vtable). In one embodiment, a hidden pointer in the shared object 131 may point to this virtual table. However, the CPU 110 and the GPU 180 may have different instruction set architectures (ISAs), and when a function is compiled for the CPU 110 and the GPU 180 with different ISAs, the code representing the same function compiled by the compilers 118 and 188 may have different sizes. Laying out code in the same fashion on the GPU side and the CPU side (that is, the CPU version of a virtual function in a shared class and the GPU version of the same virtual function in the shared class) may be challenging. If there are three virtual functions in a shared class Foo(), then in the CPU version of the code the functions may be located at addresses A1, A2, and A3. In the GPU version of the code, however, the functions may be located at addresses B1, B2, and B3, which may differ from A1, A2, and A3. These different address locations of the CPU-side and GPU-side code of the same functions of a shared class may imply that a shared object (that is, an instance of the shared class) may need two virtual tables (a first virtual table and a second virtual table). The first virtual table may comprise the addresses (A1, A2, and A3) of the CPU-side versions of the functions, and may be used when the object is used on the CPU side (or when CPU-side functions are called). The second virtual table may comprise the addresses (B1, B2, and B3) of the GPU-side versions of the functions, and may be used when the object is used on the GPU side (or when GPU-side functions are called).
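As a minimal sketch of the situation described above (the class name Foo appears in the description; the member names and address comments are illustrative), a shared class with three virtual functions ends up with two per-ISA virtual tables:
    // Shared class with three virtual functions (illustrative sketch).
    class Foo {
    public:
        virtual void vf1();   // CPU build: address A1; GPU build: address B1
        virtual void vf2();   // CPU build: address A2; GPU build: address B2
        virtual void vf3();   // CPU build: address A3; GPU build: address B3
    };
    // Conceptually, the CPU compiler emits a vtable {A1, A2, A3} and the GPU
    // compiler emits a separate vtable {B1, B2, B3} for the same class, so a
    // shared Foo instance needs a way to reach the table that matches the
    // processor on which the call is made.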
In one embodiment, virtual function sharing may be implemented by associating the first and second virtual tables, stored in the shared virtual memory between the CPU 110 and the GPU 180, with the shared object 131. In one embodiment, associating the first and second virtual tables of the shared object 131 may create a common virtual table, which may be used for virtual function calls on both the CPU side and the GPU side.
The flow diagram of Fig. 2 depicts an embodiment in which the heterogeneous processors CPU 110 and GPU 180 share the virtual functions stored in the shared virtual memory. In block 210, the first processor, such as the CPU 110, may identify the first-processor-side virtual table pointer (the CPU-side vtable pointer) of the shared object 131. In one embodiment, the CPU-side vtable pointer may exist for the shared object 131 regardless of whether the shared object 131 is accessed from the CPU side or the GPU side.
In one embodiment, for a normal virtual function call in an environment in which the computing system has only a CPU, the code sequence may be as shown in block 310 of Fig. 3. In one embodiment, even in a computing system such as 100, which may comprise heterogeneous processors, the CPU-side code sequence for a normal virtual function call may be the same as that depicted in block 310 of Fig. 3. As depicted in block 310, the code in line 301, Mov r1, [obj], may load the virtual table of the shared object 131 into the register r1. The code in line 305, Call* [r1+offsetFunction], may call a virtual function of the shared object 131, such as VF 133-A.
In block 250, the second processor, such as the GPU 180, may use the first-processor-side virtual table pointer (the CPU-side vtable pointer) of the shared object 131 to determine the second-processor-side virtual table (the GPU-side vtable), if a second-processor-side table (GPU table) exists. In one embodiment, the second-processor-side table (GPU table) may comprise entries of the form <"class name", first-processor-side vtable address, second-processor-side vtable address>.
In one embodiment, on the GPU side, the code sequence depicted in block 350 may be generated, which may differ from the code sequence depicted in block 310. In one embodiment, because the GPU compiler 188 may know each sharable class from its type, the code sequence depicted in block 350 may be generated for loading the virtual function pointer from a shared object such as the shared object 131. In one embodiment, the code in line 351, Mov r1, [obj], may load the CPU vtable address, and the code in line 353, R2 = getvtableAddress(r1), may obtain the GPU vtable from the GPU table. In one embodiment, the code in line 358, Call* [r2+offsetFunction], may call the virtual function based on the GPU vtable obtained using the CPU vtable address. In one embodiment, the getvtableAddress function may use the CPU-side vtable address as an index into the GPU table to determine the GPU-side vtable.
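A minimal C++ sketch of what a getvtableAddress routine could look like is given below; the registry container and its initialization are assumptions made for illustration (reusing the entry layout sketched above), not the embodiments' implementation:
    #include <unordered_map>

    // Hypothetical registry keyed by CPU vtable address, filled during
    // initialization as described for Fig. 4 (block 480).
    static std::unordered_map<const void*, const void*> g_gpu_table;

    // Given the CPU vtable address loaded from the shared object (line 351),
    // return the address of the matching GPU vtable, or nullptr if no
    // GPU-side table entry exists for that class.
    const void* getvtableAddress(const void* cpu_vtable) {
        auto it = g_gpu_table.find(cpu_vtable);
        return (it != g_gpu_table.end()) ? it->second : nullptr;
    }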
In block 280, two-way communication between the first processor (the CPU 110) and the second processor (the GPU 180) may be achieved using the shared object 131.
The flow diagram of Fig. 4 illustrates an embodiment of creating the GPU table. In block 410, the table may be formed during initialization time, in one embodiment, by placing a function pointer that points to a registration function of a sharable class (such as that of the shared object 131) into an initialization section (for example, the CRT$XCI section of MSC++). For example, the registration function of the sharable class may be included in the MS CRT$XCI initialization section.
In block 420, the registration function may be executed during initialization time. As a result of the function pointer pointing to the registration function having been placed in the initialization section, the registration function may be executed when the initialization section is executed.
In block 430, on the first processor side (the CPU side), the registration function may register the "class name" and the "CPU vtable address" in a first table. In block 440, on the second processor side (the GPU side), the registration function may register the "class name" and the "GPU vtable address" in a second table.
In block 480, the first table and the second table may be merged into a single common table. For example, if the first and second tables comprise the same "class name", the entry of the first table may be combined with the corresponding entry of the second table. As a result of the merge, the combined entries of the first and second tables may appear as a single entry with a single class name. In one embodiment, the common table may reside on the GPU side, and the common table, or GPU table, may comprise the "class name", the CPU vtable address, and the GPU vtable address.
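The following C++ sketch illustrates, with assumed names and containers, how per-side registration followed by a merge keyed on the class name could produce the common table described above; it is an illustration, not the embodiments' code:
    #include <map>
    #include <string>

    struct CommonEntry { const void* cpu_vtable; const void* gpu_vtable; };

    // First table (filled by CPU-side registration functions, block 430) and
    // second table (filled by GPU-side registration functions, block 440),
    // both keyed by class name.
    static std::map<std::string, const void*> g_cpu_entries;
    static std::map<std::string, const void*> g_gpu_entries;

    void registerCpuClass(const std::string& name, const void* cpu_vt) {
        g_cpu_entries[name] = cpu_vt;
    }
    void registerGpuClass(const std::string& name, const void* gpu_vt) {
        g_gpu_entries[name] = gpu_vt;
    }

    // Block 480: entries that share the same class name are combined into one
    // common table of <class name, CPU vtable address, GPU vtable address>.
    std::map<std::string, CommonEntry> mergeTables() {
        std::map<std::string, CommonEntry> common;
        for (const auto& cpu : g_cpu_entries) {
            auto gpu = g_gpu_entries.find(cpu.first);
            if (gpu != g_gpu_entries.end())
                common[cpu.first] = CommonEntry{cpu.second, gpu->second};
        }
        return common;
    }
An index keyed by the CPU vtable address, like the getvtableAddress sketch above, can then be derived from this merged table.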
In one embodiment, creating the common table, or GPU table, may avoid the requirement of matching virtual table addresses on the CPU side and the GPU side. Further, the GPU table may support dynamic link libraries (DLLs). In one embodiment, a class may be loaded on the CPU side before the GPU side uses or initializes the shared object 131. However, because an application is generally loaded on the CPU side, for classes defined in the application, and also for static link libraries, the GPU table may enable two-way communication between the CPU 110 and the GPU 180. For DLLs, the DLL may be loaded on the CPU side, and the GPU table may also be used for two-way communication for DLLs.
The sharable object 131 may comprise a CPU-side vtable pointer, and may not have an extra vtable pointer for the GPU-side vtable. In one embodiment, using the CPU vtable pointer in the object, the GPU vtable pointer may be generated as described above with reference to block 350. In one embodiment, the CPU vtable pointer may be used as-is on the CPU side, and the GPU vtable pointer may be used on the GPU side for virtual function calls. In one embodiment, this approach may not involve modification of, or participation by, the linker/loader, and may not require an extra vptr pointer field in the shared object 131. This approach may allow fine-grain partitioning, between the CPU 110 and the GPU 180, of applications written in object-oriented languages.
Fig. 5 illustrates an embodiment of a flow diagram in which the computing platform 100 supports two-way communication between the CPU 110 and the GPU 180 through the member functions of an object sharable by the heterogeneous processors. In one embodiment, the GPU compiler 188 may generate a CPU stub 510 for a GPU function and a CPU remote call API 520 on the CPU side 110. Further, the GPU compiler 188 may generate GPU-side gluing logic 530, on the GPU side 180, for the GPU function of a first member function. In one embodiment, the CPU 110 may use a first enabling path (comprising the stub logic 510, the API 520, and the gluing logic 530) to call the first member function. In one embodiment, the first enabling path may allow the CPU 110 and the GPU side 180 to establish a remote call and to send information from the CPU side 110 to the GPU side 180. In one embodiment, the GPU-side gluing logic 530 may allow the GPU 180 to receive the information sent from the CPU side 110.
In one embodiment, the CPU stub 510 may have the same name as the first member function (that is, the original GPU member function), but may invoke the API 520 to direct the call from the CPU 110 to the GPU 180. In one embodiment, the code generated by the CPU compiler 118 may call the first member function as usual, but the call may be redirected to the CPU stub 510 and the remote call API 520. Moreover, when making the remote call, the CPU stub 510 may send a unique name representing the first member function being called, a pointer to the shared object, and the other parameters of the called first member function. In one embodiment, the GPU-side gluing logic 530 may receive the parameters and dispatch the call to the first member function. In one embodiment, the GPU compiler 188 may generate gluing logic (or a dispatcher) that dispatches a non-virtual function by calling the GPU-side function address of the first member function with the pointer to the object passed as the first parameter. In one embodiment, the GPU compiler 188 may generate a jump-table registration call on the GPU side to register the GPU-side gluing logic 530, so that the CPU stub 510 can communicate with the GPU-side gluing logic 530.
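A compact C++ sketch of the stub-and-glue flow is shown below. The member function name SomeNonVirtuFunc and the class name Foo come from the description of Fig. 6; the transport routine, its signature, and the argument packing are assumptions made only for this illustration:
    #include <cstddef>
    #include <cstring>
    #include <string>

    // GPU-compiled body of the member function (assumed symbol); on the real
    // platform this code lives in the GPU executable 545.
    void gpuFooSomeNonVirtuFunc(void* /*self*/, int /*x*/) { /* GPU-side body */ }

    // GPU-side gluing logic (530): unpacks the parameters and dispatches to the
    // GPU-side function address, passing the object pointer as the first argument.
    void glue_Foo_SomeNonVirtuFunc(void* thisPtr, const void* args) {
        int x;
        std::memcpy(&x, args, sizeof(x));
        gpuFooSomeNonVirtuFunc(thisPtr, x);
    }

    // Stand-in for the remote call API (520): in this sketch it simply invokes
    // the registered glue routine; on the platform it would ship the unique
    // name, the object pointer, and the argument blob over to the GPU side.
    void remoteCall(const std::string& funcName, void* obj,
                    const void* args, std::size_t /*argsSize*/) {
        if (funcName == "Foo::SomeNonVirtuFunc")
            glue_Foo_SomeNonVirtuFunc(obj, args);
    }

    // CPU-side stub (510): it plays the role of the original GPU member
    // function on the CPU, but only packages the call for the remote call API.
    void stub_Foo_SomeNonVirtuFunc(void* thisPtr, int x) {
        remoteCall("Foo::SomeNonVirtuFunc", thisPtr, &x, sizeof(x));
    }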
In one embodiment, the GPU compiler 188 may create a second enabling path comprising a GPU stub 550 for a CPU function, a GPU remote call API 570 on the GPU side 180, and CPU-side gluing logic 580 for a second member function allocated to the CPU 110. In one embodiment, the GPU 180 may use the second enabling path to call into the CPU side 110. In one embodiment, the GPU stub 560 and the API 570 may allow the GPU 180 and the CPU side 110 to establish a remote call and to send information from the GPU side 180 to the CPU side 110. In one embodiment, the CPU-side gluing logic 580 may allow the CPU 110 to receive the information sent from the GPU side 180.
In one embodiment, to support calls to the second member function, the GPU compiler 188 may generate a jump-table registration for the CPU-side gluing logic 580. In one embodiment, the CPU gluing logic 580 may call the CPU-side function address of the second member function. In one embodiment, the code generated by the CPU gluing logic 580 may be linked with the other code generated by the CPU compiler 118. This approach may provide paths that support two-way communication between the heterogeneous processors 110 and 180. In one embodiment, the CPU stub logic 510 and the CPU-side gluing logic 580 may be coupled to the CPU 110 via a CPU linker 590. In one embodiment, the CPU linker 590 may use the CPU stub 510, the CPU-side gluing logic 580, and the other code generated by the CPU compiler 118 to generate a CPU executable 595. In one embodiment, the GPU stub logic 560 and the GPU-side gluing logic 570 may be coupled to the GPU 180 via a GPU linker 540. In one embodiment, the GPU linker 540 may use the GPU gluing logic 570, the GPU stub 560, and the other code generated by the GPU compiler 188 to generate a GPU executable 545.
Fig. 6 illustrates an embodiment of a flow diagram 600 in which the CPU side 110 uses the table-based technique described above to call a GPU virtual function and a GPU non-virtual function. Block 610 depicts an object, an instance of a shared class titled Foo(), which comprises a first annotation tag #Pragma GPU annotating a virtual function (such as VF 133-A) with the virtual function declaration "virtual void SomeVirtuFunc()", and a second annotation tag #Pragma GPU annotating a non-virtual function (such as NVF 136-A) with the non-virtual function declaration "void SomeNonVirtuFunc()".
In one embodiment, 'pFoo' may point to the shared object 131 of class Foo(), and a remote virtual function call may be made from the CPU side 110 to the GPU side 180. In one embodiment, 'pFoo = new (SharedMemoryAllocator()) Foo();' may be one possible way of overriding the new/delete operators with shared memory allocation/release runtime calls. In one embodiment, in response to compiling 'pFoo->SomeVirtuFunc()' in block 610, the CPU compiler 118 may initiate the task described in block 620.
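The sketch below shows how the annotated shared class and its allocation in shared memory described above might look in source form. The pragma tag, the class name Foo, the member names, and SharedMemoryAllocator come from the description; the allocator body here is only a placeholder assumption:
    #include <cstddef>
    #include <new>

    // Placeholder shared-memory allocator: a real one would hand out storage
    // from the shared virtual memory 130 rather than the default heap.
    struct SharedMemoryAllocator {};
    void* operator new(std::size_t size, const SharedMemoryAllocator&) {
        return ::operator new(size);   // sketch only, not actually shared memory
    }

    class Foo {
    public:
    #pragma GPU
        virtual void SomeVirtuFunc() { /* body compiled for the GPU side */ }
    #pragma GPU
        void SomeNonVirtuFunc() { /* body compiled for the GPU side */ }
    };

    int main() {
        Foo* pFoo = new (SharedMemoryAllocator()) Foo();  // shared object (block 610)
        pFoo->SomeVirtuFunc();     // remote virtual call, CPU to GPU (block 620)
        pFoo->SomeNonVirtuFunc();  // remote non-virtual call (block 670)
    }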
In block 620, the CPU side 110 may call the GPU virtual function. In block 630, the CPU-side stub (for the GPU member function) 510 and the API 520 may send the information (the parameters) to the GPU side 180. In block 640, the GPU-side gluing logic (for the GPU member function) 530 may obtain pGPUVptr (the CPU-side vtable pointer) from the THIS object and may look up the GPU vtable. In block 650, the GPU-side gluing logic 530 (or dispatcher) may use the code sequence described above for block 350 to obtain the GPU-side vtable using the CPU-side vtable pointer.
In one embodiment, in response to compiling #Pragma GPU 'void SomeNonVirtuFunc()' in block 610, the GPU compiler 188 may generate code that uses 'pFoo->SomeNonVirtuFunc()' to initiate the task described in block 670. In block 670, the CPU side 110 may call the GPU non-virtual function. In block 680, the CPU-side stub 510 and the API 520 may send the information (the parameters) to the GPU side 180. In block 690, the GPU-side gluing logic 530 may push the parameters and call the address directly, because the function address may be known there.
The flow chart of Fig. 7 illustrates an embodiment of operations, performed by the computing platform 100, for using a virtual shared non-coherent region to support virtual function sharing between heterogeneous processors. In a computing system such as the computing system 100, which comprises heterogeneous processors (such as the CPU 110 and the GPU 180), the CPU 110 and the GPU 180 may run different code generated by different compilers such as 118 and 188 (or by the same compiler with different targets), and there may be no guarantee that the same virtual function is located at the same address. While it may be possible to modify the compiler/linker/loader to support virtual function sharing, the "non-coherent region" approach described below (a runtime-only approach) may be a simpler technique for allowing virtual function sharing between the CPU 110 and the GPU 180. This approach may allow a shared virtual memory system, such as Mine Yours Ours (MYO), to be adopted and deployed easily. Although the C++ object-oriented language is used as an example, the approach below may be applied to other object-oriented programming languages that support virtual functions.
In block 710, the CPU 110 may create a shared non-coherent region in the shared virtual memory 130 to store the virtual tables of the shared classes of the CPU 110 and the GPU 180. In one embodiment, the shared non-coherent region may be created by specifying a non-coherent tag for a region within the shared virtual memory 130. In one embodiment, MYO may provide one or more application programming interface (API) functions to create virtual shared regions at run time (referred to as "arenas" in MYO terminology; many such arenas may be created in MYO). For example, a tag may be used, such as myoArenaCreate(xxx, ..., NonCoherentTag) or myoArenaCreateNonCoherentTag(xxx, ...). In one embodiment, coherent or non-coherent arenas may be created using such tags. However, in other embodiments, an API function may be used to change the attributes of a bulk (or portion) of memory. For example, myoChangeToNonCoherent(addr, size) may be used to make a first region a non-coherent region or arena, while a second region (or portion) remains a coherent arena. In one embodiment, the first region may be specified by an address and a size.
In one embodiment, a memory arena (that is, a managed bulk of memory) may be created that allows data sharing without maintaining data consistency; such a memory arena may be referred to as a shared non-coherent region. In one embodiment, the CPU data stored in the shared non-coherent region may have the same address, as seen by the CPU 110 and the GPU 180, as the GPU data. However, the contents (the CPU data and the GPU data) may differ, because the shared virtual memory 130, such as MYO, may not maintain coherence at run time. In one embodiment, the shared non-coherent region may be used to store a new copy of the virtual method table of each shared class. In one embodiment, the virtual table addresses seen from the CPU 110 and the GPU 180 may be identical; however, the virtual tables themselves may differ.
In block 750, during initialization time, the virtual table of each sharable class may be copied from the CPU private space 115 and the GPU private space 185 into the shared virtual memory 130. In one embodiment, the CPU-side vtable may be copied into the non-coherent region in the shared virtual memory 130, and the GPU-side vtable may also be copied into the non-coherent region in the shared virtual memory 130. In one embodiment, the CPU-side vtable and the GPU-side vtable may be located at the same address in the shared space.
In one embodiment, if toolchain support is available, the CPU compiler 118 or the GPU compiler 188 may place the CPU and GPU vtable data in a special data section, and a loader 540 or 570 may load the special data section into the shared non-coherent region. In other embodiments, the CPU compiler 118 or the GPU compiler 188 may allow the special data section to be created in the shared non-coherent region, for example using an API call such as myoChangeToNonCoherent. In one embodiment, the CPU compiler 118 and the GPU compiler 188 may ensure that the CPU vtable and the GPU vtable are located at the same offset address within the special data section (with appropriate padding otherwise). In one embodiment, in the case of multiple inheritance, multiple vtable pointers may exist in the object layout. In one embodiment, the CPU compiler 118 and the GPU compiler 188 may also ensure that the CPU vtable pointer and the GPU vtable pointer are located at the same offset in the object layout.
In the absence of toolchain support, in one embodiment, a user may be allowed to copy the CPU vtable and the GPU vtable into the shared non-coherent region. In one embodiment, one or more macros may be generated to facilitate this manual copying of the CPU and GPU vtables into the shared non-coherent memory region.
At run time, after a shared object such as the shared object 131 is created, an object layout 801 may be created, which may comprise multiple "vptr" fields in the case of multiple inheritance. In one embodiment, the virtual table pointer (vptr) of the shared object 131 in the object layout 801 may be updated (patched) to point to the new copy of the virtual table in the shared non-coherent region. In one embodiment, the constructor of the class may be used to update the virtual table pointer of a shared object whose class comprises virtual functions. In one embodiment, if a class does not comprise any virtual functions, the data and functions of that class may be shared without necessarily being updated (or patched) at run time.
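One way the run-time patching could be sketched is shown below. The explicit object-layout structure, the lookup helper, and its placeholder body are assumptions made so that the sketch stays well defined; the embodiments perform the equivalent update on the hidden vptr of a real C++ object:
    // Illustrative model of the object layout 801: an explicit vptr slot
    // followed by the data fields (field 1 and field 2 of Fig. 8). Real C++
    // objects keep the vptr hidden; modeling it explicitly keeps the sketch
    // within defined behavior.
    struct ObjectLayout801 {
        const void* vptr;   // points to a virtual table
        int field1;
        int field2;
    };

    // Assumed lookup into the shared non-coherent region 860: both processors
    // see the same address there, but the CPU copy holds vfunc1/vfunc2 while
    // the GPU copy holds vfunc1'/vfunc2'.
    const void* sharedRegionVtableFor(const char* /*className*/) {
        // Placeholder body: a real implementation would return the address of
        // the class's vtable copy inside the shared non-coherent region.
        return nullptr;
    }

    // Constructor-time patching (block 780): redirect the vptr from the
    // private-space vtable to the copy kept in the shared non-coherent region.
    void patchVptr(ObjectLayout801& obj, const char* className) {
        obj.vptr = sharedRegionVtableFor(className);
    }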
In block 780, the vptr (virtual table pointer) may be modified to point into the shared non-coherent region at the time the shared object 131 is created. In one embodiment, the vptr, which by default points to a private virtual table (the CPU vtable or the GPU vtable), may be modified (as indicated by the solid line 802-C in Fig. 8) to point into the shared non-coherent region 860. In one embodiment, a virtual function may then be called as follows:
Mov eax, [ecx]        # ecx contains the "this" pointer, and eax receives the vptr;
Call [eax + vfunc]    # vfunc is the index of the virtual function in the virtual table.
On the CPU side, the above code may call the CPU implementation of the virtual function; and on the GPU side, the above code may call the GPU implementation of the virtual function. This approach may allow both the data and the virtual functions of a class to be shared.
Fig. 8 illustrates an embodiment of a relationship diagram 800 in which a virtual shared non-coherent region is used to support virtual function sharing between heterogeneous processors. In one embodiment, the object layout 801 may comprise the virtual table pointer (vptr) in a first slot 801-A, and other fields, such as field 1 and field 2, in the slots 801-B and 801-C. In one embodiment, after the CPU compiler 118 and the GPU compiler 188 process the virtual table pointer (vptr) located in the slot 801-A, a CPU vtable (indicated by the dashed line 802-A) and a GPU vtable (indicated by the dashed line 802-B) are generated. The CPU vtable may be located at an address 810 in the CPU private address space 115, and the GPU vtable may be located at an address 840 in the GPU private address space 185. In one embodiment, the CPU vtable may comprise function pointers such as vfunc1 and vfunc2, and the GPU vtable may comprise function pointers such as vfunc1' and vfunc2'. In one embodiment, the function pointers (vfunc1 and vfunc2) and (vfunc1' and vfunc2') may also differ, because they point to different implementations of the same functions.
In one embodiment, as a result of modifying the vptr (as shown in block 780), the vptr may point to the shared non-coherent region 860 in the shared virtual memory 130. In one embodiment, the CPU vtable may be located at the address Address 870, and the GPU vtable may be located at the same address Address 870. In one embodiment, the CPU vtable may comprise function pointers such as vfunc1 and vfunc2, and the GPU vtable may comprise function pointers such as vfunc1' and vfunc2'. In one embodiment, the function pointers (vfunc1 and vfunc2) may differ from (vfunc1' and vfunc2'). In one embodiment, keeping the CPU vtable and the GPU vtable in the shared non-coherent region 860 may enable the CPU 110 and the GPU 180 to see the CPU vtable and the GPU vtable, respectively, at the same address location Address 870, while the contents (vfunc1 and vfunc2) of the CPU vtable may differ from the contents (vfunc1' and vfunc2') of the GPU vtable.
Fig. 9 illustrates an embodiment of a computer system 900 comprising heterogeneous processors that support two-way communication. Referring to Fig. 9, the computer system 900 may comprise a general purpose processor (or CPU) 902, which includes a single instruction multiple data (SIMD) processor, and a graphics processor unit (GPU) 905. In one embodiment, in addition to performing various other tasks, the CPU 902 may perform enhancement operations, or may store a sequence of instructions that provide the enhancement operations in a machine-readable storage medium 925. However, the sequence of instructions may also be stored in a CPU private memory 920 or in any other suitable storage medium. In one embodiment, the CPU 902 may be associated with a CPU legacy compiler 903 and a CPU linker/loader 904. In one embodiment, the GPU 905 may be associated with a GPU proprietary compiler 906 and a GPU linker/loader 907.
While a separate graphics processor unit GPU 905 is depicted in Fig. 9, in some embodiments the processor 902 may be used to perform the enhancement operations, as another example. The processor 902 that operates the computer system 900 may be one or more processor cores coupled to logic 930. The logic 930 may be coupled to one or more I/O devices 960, which may provide interfaces to the computer system 900. The logic 930 may, for example, be chipset logic in one embodiment. The logic 930 is coupled to the memory 920, which can be any kind of storage, including optical, magnetic, or semiconductor storage. The graphics processor unit 905 is coupled to a display 940 through a frame buffer.
In one embodiment, the computing platform 900 may support one or more techniques that allow two-way communication (function calls) between the heterogeneous processors CPU 902 and GPU 905 through the member functions, such as the virtual functions, of a shared object, by fine-grain partitioning of the shared object. In one embodiment, the computer system 900 may allow the first technique, referred to as the "table-based" technique, to be used for two-way communication between the CPU 902 and the GPU 905. In other embodiments, the computing platform may allow the second technique, referred to as the "non-coherent region" technique, to be used for two-way communication between the CPU 902 and the GPU 905; in this technique, a virtual shared non-coherent region may be created in a virtual shared memory located in the private CPU memory 920, the private GPU memory 930, or the shared memory 950. In one embodiment, a separate shared memory such as the shared memory 950 may not be provided in the computer system 900, in which case the shared memory may be provided within one of the private memories, such as the CPU memory 920 or the GPU memory 930.
In one embodiment, when the table-based technique is used, the CPU-side vtable pointer, which may be used to access the shared object from the CPU 110 or the GPU 180, may be used to determine the GPU vtable, if a GPU-side table exists. In one embodiment, the GPU-side vtable may comprise <"class name", CPU vtable address, GPU vtable address>. In one embodiment, the techniques for obtaining the GPU-side vtable address and for generating the GPU-side table are as described above.
In other embodiments, when the "non-coherent region" technique is used, a shared non-coherent region may be created in the shared virtual memory. In one embodiment, the shared non-coherent region may not maintain data consistency. In one embodiment, CPU-side data and GPU-side data in the shared non-coherent region may have the same address as seen from the CPU side and from the GPU side. However, the contents of the CPU-side data may differ from the contents of the GPU-side data, because the shared virtual memory may not maintain coherence at run time. In one embodiment, the shared non-coherent region may be used to store a new copy of the virtual method table of each shared class. In one embodiment, this approach may keep the virtual tables at the same address.
The graphics processing techniques described herein may be implemented in various hardware architectures. For example, graphics functionality may be integrated within a chipset. Alternatively, a discrete graphics processor may be used. As still another embodiment, the graphics functions may be implemented by a general purpose processor, including a multi-core processor, or as a set of software instructions stored in a machine-readable medium.

Claims (11)

1. A platform, comprising:
a combination of a central processing unit (CPU) and a graphics processing unit (GPU); and
a shared physical memory addressable by both the GPU and the CPU, wherein the platform is capable of mapping the shared physical memory to a shared virtual memory addressable by both the CPU and the GPU,
wherein the platform is adapted to:
store a shared object comprising a plurality of virtual functions in the shared virtual memory; and
share at least one of the plurality of virtual functions between the CPU and the GPU.
2. The platform of claim 1, wherein the sharing of the plurality of virtual functions between the CPU and the GPU by the platform comprises two-way communication between the CPU and the GPU.
3. The platform of claim 1, wherein the shared object further comprises non-virtual functions.
4. The platform of claim 1, wherein the shared object comprises a virtual table pointer for indexing a virtual table.
5. The platform of claim 1, wherein the shared object comprises a CPU-side virtual table pointer.
6. The platform of claim 5, wherein the GPU utilizes the CPU-side virtual table pointer to determine a GPU-side virtual table.
7. The platform of claim 6, wherein the GPU-side virtual table comprises a class name, a CPU-side virtual table address, and a GPU-side virtual table address.
8. The platform of claim 6, wherein the GPU-side virtual table supports at least one of a dynamic link library or a static link library.
9. The platform of claim 1, wherein the platform is further to create a shared non-coherent region in the shared virtual memory and to copy a CPU-side virtual table and a GPU-side virtual table into the shared virtual memory, wherein the CPU-side virtual table and the GPU-side virtual table have the same address in the shared virtual memory.
10. The platform of claim 9, wherein the platform is further to modify a virtual table pointer to point to the same address, wherein the CPU-side virtual table comprises CPU-side function pointers, and wherein the GPU-side virtual table comprises GPU-side function pointers different from the CPU-side function pointers.
11. The platform of claim 10, wherein the platform further comprises a CPU private memory space, and wherein the platform copies the CPU-side virtual table from the CPU private memory space.
CN201410790536.8A 2010-09-24 2010-09-24 Virtual function sharing in shared virtual memory between heterogeneous processors of a computing platform Active CN104536740B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410790536.8A CN104536740B (en) 2010-09-24 2010-09-24 Virtual function sharing in shared virtual memory between heterogeneous processors of a computing platform

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410790536.8A CN104536740B (en) 2010-09-24 2010-09-24 Virtual function sharing in shared virtual memory between heterogeneous processors of a computing platform
CN201080069225.2A CN103109286B (en) 2010-09-24 2010-09-24 Virtual function sharing in shared virtual memory between heterogeneous processors of a computing platform

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201080069225.2A Division CN103109286B (en) 2010-09-24 2010-09-24 Virtual function sharing in shared virtual memory between heterogeneous processors of a computing platform

Publications (2)

Publication Number Publication Date
CN104536740A true CN104536740A (en) 2015-04-22
CN104536740B CN104536740B (en) 2018-05-08

Family

ID=52852272

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410790536.8A Active CN104536740B (en) Virtual function sharing in shared virtual memory between heterogeneous processors of a computing platform

Country Status (1)

Country Link
CN (1) CN104536740B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109240602A (en) * 2018-08-06 2019-01-18 联想(北京)有限公司 Data access method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1524228A (en) * 2001-04-24 2004-08-25 先进微装置公司 Multiprocessor system implementing virtual memory using a shared memory, and a page replacement method for maintaining paged memory coherence
US20050080998A1 (en) * 2003-10-09 2005-04-14 International Business Machines Corporation Method and apparatus for coherent memory structure of heterogeneous processor systems
US20080178163A1 (en) * 2006-06-01 2008-07-24 Michael Karl Gschwind Just-In-Time Compilation in a Heterogeneous Processing Environment
US20090055596A1 (en) * 2007-08-20 2009-02-26 Convey Computer Multi-processor system having at least one processor that comprises a dynamically reconfigurable instruction set
US20100118041A1 (en) * 2008-11-13 2010-05-13 Hu Chen Shared virtual memory
US20100180266A1 (en) * 2009-01-14 2010-07-15 Microsoft Corporation Multi Level Virtual Function Tables

Also Published As

Publication number Publication date
CN104536740B (en) 2018-05-08

Similar Documents

Publication Publication Date Title
CN103109286B (en) Virtual function sharing in shared virtual memory between heterogeneous processors of a computing platform
US8719839B2 (en) Two way communication support for heterogenous processors of a computer platform
US11175896B2 (en) Handling value types
US10620988B2 (en) Distributed computing architecture
US7219329B2 (en) Systems and methods providing lightweight runtime code generation
EP2365436B1 (en) Overriding outputs in a producer graph oriented programming and execution system
JP5295379B2 (en) Dynamic linking method of program in embedded platform and embedded platform
JP3550151B2 (en) Load linking apparatus and method
US20110153957A1 (en) Sharing virtual memory-based multi-version data between the heterogenous processors of a computer platform
CN101484876B (en) Heap organization for a multitasking virtual machine
WO2019105565A1 (en) Systems for compiling and executing code within one or more virtual memory pages
CN104536740A (en) Virtual function sharing in sharing virtual storage between heterogeneous processors of computing platform
US9298460B2 (en) Register management in an extended processor architecture
McGachey et al. Pervasive load-time transformation for transparently distributed Java
WO2024006004A1 (en) Runtime support for role types that extend underlying types
CN115145727A (en) Optimized memory management method and device, electronic equipment and storage medium
JP2015038770A (en) Sharing of virtual function in virtual memory shared between heterogeneous processors of calculation platform

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant