US20110067038A1 - Co-processing techniques on heterogeneous gpus having different device driver interfaces - Google Patents

Co-processing techniques on heterogeneous gpus having different device driver interfaces

Info

Publication number
US20110067038A1
Authority
US
United States
Prior art keywords
class
driver interface
device driver
computing device
graphics processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/649,864
Inventor
Alejandro Troccoli
Franck Diard
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nvidia Corp
Original Assignee
Nvidia Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nvidia Corp
Priority to US12/649,864
Assigned to NVIDIA CORPORATION (assignment of assignors interest). Assignors: DIARD, FRANCK; TROCCOLI, ALEJANDRO
Publication of US20110067038A1
Status: Abandoned

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines

Definitions

  • each buffer of Y, U and V samples is double buffered in the frame buffer of the second GPU 215 and the system memory 220 .
  • the Y, U and V samples copied into the first GPU 210 are double buffered as textures.
  • the Y, U and V sample buffers in the second GPU 215 and corresponding texture buffers in the first GPU 210 are each double buffered.
  • the Y, U and V sample buffers in the system memory 220 may each be triple buffered.
  • the shim layer 125 tracks the bandwidth needed for blitting and the efficiency of transfers on the bus to determine whether to enable the compression.
  • the shim layer 125 enables or disables the YUV compression based on the type of application.
  • the shim layer 125 may enable compression for game applications but not for technical applications such as a Computer Aided Drawing (CAD) application.
  • the white list accessed by the shim layer 125 to determine if graphics requests should be executed on the first GPU 210 or the second GPU 215 is loaded and updated by a vendor and/or system administrator.
  • a graphical user interface can be provided to allow the user to specify the use of the second GPU (e.g., discrete GPU) 215 for rendering a given application. The user may right click on the icon for the given application.
  • a graphical user interface may be generated that allows the user to specify the second GPU for use when rendering images for the given application.
  • the operating system is programmed to populate the graphical interface with a choice to run the given application on the GPU on the unattached adapter.
  • a routine (e.g., a dynamic link library) registered to handle this context menu item will scan the shortcut link to the application, gather up the options and arguments, and then call an application launcher that will spawn a process to launch the application, as well as setting an environment variable that will be read by the shim layer 125.
  • the shim layer 125 will run the graphics context for the given application on the second GPU 215 . Therefore, the user can override, update, or the like, the white list loaded on the computing device.
  • referring to FIG. 9, an exemplary desktop 910 including an exemplary graphical user interface for selection of the GPU to run a given application on is shown.
  • the desktop includes icons 920 - 950 for one or more applications.
  • when the user right clicks on one of the icons, a pull-down menu 970 is generated.
  • the pull-down menu 970 is populated with an additional item of ‘run on dGPU’ or the like.
  • the menu item for the second GPU 215 may provide for product branding by identifying the manufacturer and/or model of the second GPU. If the user selects the ‘run’ item or double left clicks on the icon, the graphics requests from the given application will run on the GPU on the primary adapter (e.g., the default iGPU) 210 . If the user selects the ‘run on dGPU’ item, the graphics requests from the given application will run on the GPU on the unattached adapter (e.g., dGPU) 215 .
  • the second graphics processing unit may support a set of rendering application programming interfaces and the first graphics processing unit may support a limited subset of the same application programming interfaces.
  • each application programming interface is implemented by a different runtime API 120 and a matching device driver interface 130.
  • referring to FIG. 10, a graphics co-processing technique, in accordance with another embodiment of the present technology, is shown.
  • the runtime API 120 loads a shim layer 125 that will support all device driver interfaces.
  • the shim layer 125 loads and configures the DDI 130 for the first GPU 210 on the primary adapter, using a device driver interface class that the first GPU supports, and the DDI 135 for the second GPU 215, using a second device driver interface class that can talk with the runtime API 120.
  • the second GPU 215 may be a DirectX10 class device and the first GPU 210 may be a DirectX9 class device that does not support DirectX10.
  • the shim layer 125 appears to the DDI 130 for the first GPU 210 as a first application programming class runtime API (e.g., D3D9.dll), translates commands between the two device driver interface classes and may also convert between display formats.
  • the shim layer 125 includes a translation layer 126 that translates calls between the runtime API 120 and the device driver interface class of the DDI on the primary adapter 130. In one implementation, the shim layer 125 translates display commands between the DirectX10 runtime API 120 and the DirectX9 DDI on the primary adapter 130.
  • the shim layer therefore creates a Dx9 compatible context on the first GPU 210, which is the recipient of frames rendered by the Dx10 class second GPU 215.
  • the shim layer 125 advantageously splits graphics commands into rendering and display commands, redirects the rendering commands to the DDI on the unattached adapter 135 and the display commands to the DDI on the primary adapter 130 .
  • the shim layer also translates between the commands for the Dx9 DDI on the primary adapter 130, the Dx10 DDI on the unattached adapter 135, the Dx10 runtime API 120 and Dx10 thunk layer 140, and provides for format conversion if necessary.
  • the shim layer 125 intercepts commands from the Dx10 runtime 120 and translates these into the DX9 DDI on the primary adapter (e.g., iUMD.dll).
  • the commands may include: CreateResource, OpenResource, DestroyResource, DxgiPresent (which triggers the surface transfer mechanism that ends up with the surface displayed on the iGPU), DxgiRotateResourceIdentities, DxgiBlt (present blits are translated), and DxgiSetDisplayMode.
  • the Dx9 DDI 130 for the first GPU 210 cannot call back directly through the runtime 120 to a graphics adapter handled by an OS specific kernel mode driver, because the runtime 120 expects the call to come from a Dx10 device.
  • the shim layer 125 intercepts callbacks from the Dx9 DDI and exchanges device handles, before forwarding the callback to the Dx10 runtime API 120 , which expects the calls to come from a Dx10 device.
  • Dx10 and Dx11 runtime APIs 120 use a layer for presentation called DXGI, which has its own present callback that does not exist in the Dx9 callback interface. Therefore, when the display side DDI on the primary adapter calls the present callback, the shim layer translates it to a DXGI callback.
  • the shim layer 125 may also include a data structure 127 for converting display formats between the first graphics processing unit DDI and the second graphics processing unit DDI.
  • the shim layer 125 may include a lookup table to convert a 10 bit rendering format in Dx10 to an 8 bit format supported by the Dx9 class integrated GPU 210 .
  • the rendered frame may be copied to a staging surface, where a two-dimensional (2D) engine of the discrete GPU 215 utilizes the lookup table to convert the rendered frame to a Dx9 format (an illustrative sketch of such a conversion table follows this list).
  • the Dx9 format frame is then copied to the frame buffer of the integrated GPU 210 and then presented on the primary display 240 .
  • the following format conversions may be performed:
  • the copying and conversion can happen as an atomic operation.
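  • For illustration, a conversion table of the kind referred to in the items above might be organized as in the following C++ sketch. The specific format pairs are examples only (the actual list of conversions is not reproduced in this text), and the enums are simplified stand-ins for the DXGI and D3D9 format enumerations.

    // format_map_sketch.cpp - illustrative mapping from Dx10-style render
    // formats to Dx9-class display formats used by the shim layer's lookup.
    #include <cstdint>

    enum class Dx10Format : uint32_t {        // stand-in for DXGI_FORMAT values
        R10G10B10A2_UNORM,
        R16G16B16A16_FLOAT,
        R8G8B8A8_UNORM
    };

    enum class Dx9Format : uint32_t {         // stand-in for D3DFORMAT values
        A8R8G8B8,
        X8R8G8B8
    };

    struct FormatMapping {
        Dx10Format from;   // what the Dx10-class discrete GPU rendered
        Dx9Format  to;     // what the Dx9-class integrated GPU can display
    };

    // Example entries: 10-bit and FP16 render formats are down-converted to an
    // 8-bit-per-channel format before the copy to the integrated GPU.
    static const FormatMapping kFormatTable[] = {
        { Dx10Format::R10G10B10A2_UNORM,  Dx9Format::A8R8G8B8 },
        { Dx10Format::R16G16B16A16_FLOAT, Dx9Format::A8R8G8B8 },
        { Dx10Format::R8G8B8A8_UNORM,     Dx9Format::X8R8G8B8 },
    };

    bool ConvertFormat(Dx10Format from, Dx9Format* to)
    {
        for (const FormatMapping& m : kFormatTable) {
            if (m.from == from) { *to = m.to; return true; }
        }
        return false;   // unsupported source format
    }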

Abstract

The graphics co-processing technique includes loading a shim layer library. The shim layer library loads and initializes a device driver interface of a first class on the primary adapter and a device driver interface of a second class on an unattached adapter. The shim layer also translates calls between the first device driver interface of the first class on the primary adapter and the second device driver interface of the second class on the unattached adapter.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application claims the benefit of U.S. Provisional Patent Application No. 61/243,155 filed Sep. 16, 2009 and U.S. Provisional Patent Application No. 61/243,164 filed Sep. 17, 2009.
  • BACKGROUND OF THE INVENTION
  • Conventional computing systems may include a discrete graphics processing unit (dGPU) or an integral graphics processing unit (iGPU). The discrete GPU and integral GPU are heterogeneous because of their different designs. The integrated GPU generally has relatively poor processing performance compared to the discrete GPU. However, the integrated GPU generally consumes less power compared to the discrete GPU.
  • The conventional operating system does not readily support co-processing using such heterogeneous GPUs. Referring to FIG. 1, a graphics processing technique according to the conventional art is shown. When an application 110 starts, it calls the user mode level runtime application programming interface (e.g., DirectX API d3d9.dll) 120 to determine what display adapters are available. In response, the runtime API 120 enumerates the adapters that are attached to the desktop (e.g., the primary display 180). A display adapter 165, 175, even if recognized and initialized by the operating system, will not be enumerated in the adapter list by the runtime API 120 if it is not attached to the desktop. The runtime API 120 loads the device driver interface (DDI) (e.g., user mode driver (umd.dll)) 130 for the GPU 170 attached to the primary display 180. The runtime API 120 of the operating system will not load the DDI of the discrete GPU 175 because the discrete GPU 175 is not attached to the display adapter. The DDI 130 configures command buffers of the graphics processor 170 attached to the primary display 180. The DDI 130 will then call back to the runtime API 120 when the command buffers have been configured.
  • Thereafter, the application 110 makes graphics requests to the user mode level runtime API (e.g., DirectX API d3d9.dll) 120 of the operating system. The runtime 120 sends the graphics requests to the DDI 130, which configures command buffers. The DDI calls to the operating system kernel mode driver (e.g., DirectX driver dxgkrnl.sys) 150, through the runtime API 120, to schedule the graphics request. The operating system kernel mode driver then calls to the device specific kernel mode driver (e.g., kmd.sys) 160 to set the command register of the GPU 170 attached to the primary display 180 to execute the graphics requests from the command buffers. The device specific kernel mode driver 160 controls the GPU 170 (e.g., integral GPU) attached to the primary display 180.
  • Therefore, there is a need to enable co-processing on heterogeneous GPUs. For example, it may be desired to use a first GPU to perform graphics processing for a first class of applications and a second GPU for a second class of applications depending upon processing performance and power consumption parameters.
  • SUMMARY OF THE INVENTION
  • Embodiments of the present technology are directed toward graphics co-processing. The present technology may best be understood by referring to the following description and accompanying drawings that are used to illustrate embodiments of the present technology.
  • In one embodiment, a graphics co-processing method includes injecting an application initialization routine when an application starts. The injected application initialization routine includes an entry point that changes a search path for a device driver interface to a search path of a shim layer library. As a result, the loaded shim layer library initializes a device driver interface of a first class for a first graphics processing unit on a primary adapter and a device driver interface of a second class for a second graphics processing unit on an unattached adapter. The shim layer translates calls between the first device driver interface of the first class and the second device driver interface of the second class.
  • In another embodiment, a graphics co-processing method includes loading a shim layer library by a runtime application programming interface. The shim layer library loads and initializes a device driver interface on the primary adapter. The shim layer also loads and initializes a device driver interface on an unattached adapter. The shim layer also translates calls between the runtime application programming interface and commands of a first device driver interface class for the device driver interface on the primary adapter. The shim layer may further convert a display format of a second device driver interface class for the device driver interface on the unattached adapter to a display format of the first device driver interface class.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Embodiments of the present technology are illustrated by way of example and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
  • FIG. 1 shows a graphics processing technique according to the conventional art.
  • FIG. 2 shows a graphics co-processing computing platform, in accordance with one embodiment of the present technology.
  • FIG. 3 shows a graphics co-processing technique, in accordance with one embodiment of the present technology.
  • FIG. 4 shows a graphics co-processing technique, in accordance with another embodiment of the present technology.
  • FIG. 5 shows a method of synchronizing copy and present operations on a first and second GPU, in accordance with one embodiment of the present technology.
  • FIG. 6 shows an exemplary set of render and display operations, in accordance with one embodiment of the present technology.
  • FIG. 7 shows an exemplary set of render and display operations, in accordance with another embodiment of the present technology.
  • FIG. 8 shows a method of compressing rendered data, in accordance with one embodiment of the present technology.
  • FIG. 9 shows an exemplary desktop 910 including an exemplary graphical user interface for selection of the GPU to run a given application, in accordance with one embodiment of the present technology.
  • FIG. 10 shows a graphics co-processing technique, in accordance with another embodiment of the present technology.
  • DETAILED DESCRIPTION OF THE INVENTION
  • Reference will now be made in detail to the embodiments of the present technology, examples of which are illustrated in the accompanying drawings. While the present technology will be described in conjunction with these embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present technology, numerous specific details are set forth in order to provide a thorough understanding of the present technology. However, it is understood that the present technology may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail as not to unnecessarily obscure aspects of the present technology.
  • Embodiments of the present technology introduce a shim layer between the runtime API (e.g., DirectX) and the device driver interface (DDI) (e.g., user mode driver (UMD)) to separate the display commands from the rendering commands, allowing retargeting of rendering commands to an adapter other than the adapter the application is displaying on. In one implementation, the shim layer allows the DDI layer to redirect a runtime (e.g., Direct3D (D3D)) default adapter creation to an off-screen graphics processing unit (GPU), such as a discrete GPU, not attached to the desktop. The shim layer effectively layers the device driver interface, and therefore does not hook a system component.
  • Referring to FIG. 2, a graphics co-processing computing platform, in accordance with one embodiment of the present technology, is shown. The exemplary computing platform may include one or more central processing units (CPUs) 205, a plurality of graphics processing units (GPUs) 210, 215, volatile and/or non-volatile memory (e.g., computer readable media) 220, 225, one or more chip sets 230, 235, and one or more peripheral devices 215, 240-265 communicatively coupled by one or more busses. The GPUs include heterogeneous designs. In one implementation, a first GPU may be an integral graphics processing unit (iGPU) and a second GPU may be a discrete graphics processing unit (dGPU). The chipset 230, 235 acts as a simple input/output hub for communicating data and instructions between the CPU 205, the GPUs 210, 215, the computing device-readable media 220, 225, and peripheral devices 215, 240-265. In one implementation, the chipset includes a northbridge 230 and southbridge 235. The northbridge 230 provides for communication between the CPU 205, system memory 220 and the southbridge 235. In one implementation, the northbridge 230 includes an integral GPU. The southbridge 235 provides for input/output functions. The peripheral devices 215, 240-265 may include a display device 240, a network adapter (e.g., Ethernet card) 245, CD drive, DVD drive, a keyboard, a pointing device, a speaker, a printer, and/or the like. In one implementation, the second graphics processing unit is coupled as a discrete GPU peripheral device 215 by a bus such as a Peripheral Component Interconnect Express (PCIe) bus.
  • The computing device-readable media 220, 225 may be characterized as primary memory and secondary memory. Generally, the secondary memory, such as a magnetic and/or optical storage, provides for non-volatile storage of computer-readable instructions and data for use by the computing device. For instance, the disk drive 225 may store the operating system (OS), applications and data. The primary memory, such as the system memory 220 and/or graphics memory, provides for volatile storage of computer-readable instructions and data for use by the computing device. For instance, the system memory 220 may temporarily store a portion of the operating system, a portion of one or more applications and associated data that are currently used by the CPU 205, GPU 210 and the like. In addition, the GPUs 210, 215 may include integral or discrete frame buffers 211, 216.
  • Referring to FIG. 3, a graphics co-processing technique, in accordance with one embodiment of the present technology, is shown. When an application 110 starts, it calls the user mode level runtime application programming interface (e.g., DirectX API d3d9.dll) 120 to determine what display adapters are available. In addition, an application initialization routine is injected when the application starts. In one implementation, the application initialization routine is a short dynamic link library (e.g., appln.dll). The application initialization routine injected in the application includes some entry points, one of which includes a call (e.g., set_dll_searchpath( )) to change the search path for the display device driver interface. During initialization, the search path for the device driver interface (e.g., c:\windows\system32\...\umd.dll) is changed to the search path of a shim layer library (e.g., c:\...\coproc\...\umd.dll). Therefore, the runtime API 120 will search for the same DDI name but in a different path, which will result in the runtime API 120 loading the shim layer 125.
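  • For illustration only, a minimal C++ sketch of such an injected initialization routine is shown below. SetDllDirectoryW and DllMain are standard Win32 mechanisms, but the exported function name, the shim directory, and the overall structure are assumptions made for this sketch rather than details of the disclosed implementation.

    // appinit_sketch.cpp - hypothetical sketch of the injected initialization
    // routine (e.g., appln.dll). SetDllDirectoryW is a real Win32 API; the
    // directory and entry-point name below are illustrative placeholders.
    #include <windows.h>

    // Analogous to the set_dll_searchpath( ) entry point described above: it
    // points the DLL search path at the shim layer's directory, so that the
    // runtime API's later load of "umd.dll" resolves to the shim layer library
    // instead of the vendor DDI in the system directory.
    extern "C" __declspec(dllexport) BOOL SetDdiSearchPathToShim()
    {
        const wchar_t kShimDir[] = L"C:\\coproc";   // illustrative location only
        return SetDllDirectoryW(kShimDir);
    }

    // Hypothetical DllMain: apply the redirection as soon as the library is
    // injected into the starting application process.
    BOOL WINAPI DllMain(HINSTANCE, DWORD reason, LPVOID)
    {
        if (reason == DLL_PROCESS_ATTACH) {
            SetDdiSearchPathToShim();
        }
        return TRUE;
    }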
  • The shim layer library 125 has the same entry points as a conventional display driver interface (DDI). The runtime API 120 passes one or more function pointers to the shim layer 125 when calling into the applicable entry point (e.g., OpenAdapter( )) in the shim layer 125. The function pointers passed to the shim layer 125 are call backs into the runtime API 120. The shim layer 125 stores the function pointers. The shim layer 125 loads and initializes the DDI on the primary adapter 130. The DDI on the primary adapter 130 returns a data structure pointer to the shim layer 125 representing the attached adapter. The shim layer 125 also loads and initializes the device driver interface on the unattached adapter 135 by passing two function pointers which are call backs into local functions of the shim layer 125. The DDI on the unattached adapter 135 also returns a data structure pointer to the shim layer 125 representing the unattached adapter. The data structure pointers returned by the DDI on the primary adapter 130 and unattached adapter 135 are stored by the shim layer 125. The shim layer 125 returns to the runtime API 120 a pointer to a composite data structure that contains the two handles. Accordingly, the DDI on the unattached adapter 135 is able to initialize without talking back to the runtime API 120.
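  • The composite-handle arrangement can be sketched in C++ as follows. The structures, function names and DLL file names are simplified placeholders for this sketch (the actual user mode DDI types are not reproduced here), and error handling is kept to a minimum.

    // shim_openadapter_sketch.cpp - hypothetical shim entry point that loads
    // two real DDIs and hands the runtime one composite handle wrapping both.
    #include <windows.h>

    struct DdiAdapterHandle { void* opaque; };   // hypothetical per-DDI handle

    // Hypothetical signature of the entry point exported by each vendor DDI.
    using OpenAdapterFn = HRESULT (*)(void* callbacks, DdiAdapterHandle* out);

    // Composite data structure returned to the runtime: it carries the handle
    // of the DDI on the primary (attached) adapter and the handle of the DDI
    // on the unattached adapter, plus the stored runtime callbacks.
    struct ShimAdapter {
        DdiAdapterHandle attached;
        DdiAdapterHandle unattached;
        void*            runtimeCallbacks;
    };

    extern "C" __declspec(dllexport)
    HRESULT ShimOpenAdapter(void* runtimeCallbacks, ShimAdapter** out)
    {
        // Load the two real user mode drivers (illustrative file names).
        HMODULE iUmd = LoadLibraryW(L"iumd.dll");
        HMODULE dUmd = LoadLibraryW(L"dumd.dll");
        if (!iUmd || !dUmd) return E_FAIL;

        auto openAttached   = (OpenAdapterFn)GetProcAddress(iUmd, "OpenAdapter");
        auto openUnattached = (OpenAdapterFn)GetProcAddress(dUmd, "OpenAdapter");
        if (!openAttached || !openUnattached) return E_FAIL;

        auto* shim = new ShimAdapter{};
        shim->runtimeCallbacks = runtimeCallbacks;   // callbacks into the runtime

        // The attached-adapter DDI is initialized with the runtime's callbacks;
        // the unattached-adapter DDI gets callbacks into local shim functions
        // (omitted here), so it never has to talk back to the runtime.
        HRESULT hrAttached   = openAttached(runtimeCallbacks, &shim->attached);
        HRESULT hrUnattached = openUnattached(nullptr /* shim-local callbacks */,
                                              &shim->unattached);
        if (FAILED(hrAttached) || FAILED(hrUnattached)) { delete shim; return E_FAIL; }

        *out = shim;   // one pointer, two handles
        return S_OK;
    }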
  • In one implementation, the shim layer 125 is an independent library. The independent shim layer may be utilized when the primary GPU/display and the secondary GPU are provided by different vendors. In another implementation, the shim layer 125 may be integral to the display device interface on the unattached adapter. The shim layer integral to the display device driver may be utilized when the primary GPU/display and secondary GPU are from the same vendor.
  • The application initialization routine (e.g., appln.dll) injected in the application also includes other entry points, one of which includes an application identifier. In one implementation, the application identifier may be the name of the application. The shim layer 125 makes a call to the injected application initialization routine (e.g., appln.dll) to determine the application identifier when a graphics command is received. The application identifier is compared with the applications in a white list (e.g., a text file). The white list indicates an affinity between one or more applications and the second graphics processing unit. In one implementation, the white list includes one or more applications that would perform better if executed on the second graphics processing unit.
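  • A white list kept as a plain text file can be consulted with a few lines of code, as in the C++ sketch below. The file name, the one-identifier-per-line format and the case-insensitive match are assumptions made for the sketch.

    // whitelist_sketch.cpp - illustrative check of an application identifier
    // against a plain-text white list.
    #include <algorithm>
    #include <cctype>
    #include <fstream>
    #include <string>

    // Returns true if the application has an affinity with the second
    // (e.g., discrete) GPU and its rendering should be rerouted.
    bool AppPrefersSecondGpu(const std::string& appIdentifier,
                             const std::string& whitelistPath = "coproc_whitelist.txt")
    {
        auto lower = [](std::string s) {
            std::transform(s.begin(), s.end(), s.begin(),
                           [](unsigned char c) { return (char)std::tolower(c); });
            return s;
        };

        std::ifstream file(whitelistPath);
        std::string line;
        const std::string wanted = lower(appIdentifier);
        while (std::getline(file, line)) {
            if (!line.empty() && lower(line) == wanted)
                return true;    // on the white list: render on the second GPU
        }
        return false;           // default: stay on the GPU attached to the display
    }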
  • If the application identifier is not on the white list, the shim layer 125 calls the device driver interface on the primary adapter 130. The device driver interface on the primary adapter 130 sets the command buffers. The device driver interface on the primary adapter then calls, through the runtime 120 and a thunk layer 140, to the operating system kernel mode driver (e.g., DirectX driver dxgkrnl.sys) 150. The operating system kernel mode driver 150 in turn schedules the graphics command with the device specific kernel mode driver (e.g., kmd.sys) 160 for the GPU 210 attached to the primary display 240. The GPU 210 attached to the primary display 240 is also referred to hereinafter as the first GPU. The device specific kernel mode driver 160 sets the command registers of the GPU 210 to execute the graphics command on the GPU 210 (e.g., integral GPU) attached to the primary display 240.
  • If the application identifier is a match to one or more identifiers on the white list, the handle from the runtime API 120 is swapped by the shim layer 125 with functions local to the shim layer 125. For a rendering command, the local function stored in the shim layer 125 will call into the DDI on the unattached adapter 135 to set the command buffers. In response, the DDI on the unattached adapter 135 will call local functions in the shim layer 125 that route the call through the thunk layer 140 to the operating system kernel mode driver 150 to schedule the rendering command. The operating system kernel mode driver 150 calls the device specific kernel mode driver (e.g., dkmd.sys) 165 for the GPU on the unattached adapter 215 to set the command registers. The GPU on the unattached adapter 215 (e.g., discrete GPU) is also referred to hereinafter as the second GPU. Alternatively, the DDI on the unattached adapter 135 can call local functions in the thunk layer 140. The thunk layer 140 routes the graphics request to the operating system kernel mode driver (e.g., DirectX driver dxgkrnl.sys) 150. The operating system kernel mode driver 150 schedules the graphics command with the device specific kernel mode driver (e.g., dkmd.sys) 165 on the unattached adapter. The device specific kernel mode driver 165 controls the GPU on the unattached adapter 215.
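  • The routing decision itself reduces to a simple dispatch, sketched below with hypothetical interfaces standing in for the two device driver interfaces; the thunk-layer and kernel-driver plumbing behind SetCommandBuffer is omitted.

    // routing_sketch.cpp - illustrative rendering-command dispatch in the shim.
    struct RenderCommand { /* opaque payload destined for a command buffer */ };

    struct Ddi {                  // hypothetical per-adapter DDI wrapper
        virtual void SetCommandBuffer(const RenderCommand& cmd) = 0;
        virtual ~Ddi() = default;
    };

    class ShimLayer {
    public:
        ShimLayer(Ddi* primary, Ddi* unattached, bool appOnWhiteList)
            : primary_(primary), unattached_(unattached),
              useSecondGpu_(appOnWhiteList) {}

        // Called in place of the runtime's direct call into the primary DDI.
        // When the application is on the white list, the runtime's handle has
        // been swapped for shim-local functions, so the command is routed to
        // the DDI on the unattached adapter instead of the primary one.
        void SubmitRender(const RenderCommand& cmd) {
            if (useSecondGpu_)
                unattached_->SetCommandBuffer(cmd);  // second (e.g., discrete) GPU
            else
                primary_->SetCommandBuffer(cmd);     // GPU attached to the display
        }

    private:
        Ddi* primary_;
        Ddi* unattached_;
        bool useSecondGpu_;
    };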
  • For a display related command (e.g., Present( )), the shim layer 125 splits the display related command received from the application 110 into a set of commands for execution by the GPU on the unattached adapter 215 and another set of commands for execution by the GPU on the primary adapter 210. In one implementation, when the shim layer 125 receives a present call from the runtime 120, the shim layer 125 calls to the DDI on the unattached adapter 135 to cause a copy of the frame buffer 216 of the GPU on the unattached adapter 215 to a corresponding buffer in system memory 220. The shim layer 125 will also call the DDI on the primary adapter 130 to cause a copy from the corresponding buffer in system memory 220 to the frame buffer 211 of the GPU on the attached adapter 210 and then a present by the GPU on the attached adapter 210. The memory accesses between the frame buffers 211, 216 and system memory 220 may be direct memory accesses (DMA). To synchronize the copy and presents on the GPUs 210, 215, a display thread is created that is notified when the copy to system memory by the second GPU 215 is done. The display thread will then queue the copy from system memory 220 and the present call into the GPU on the attached adapter 210.
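  • The split of a single present into the three operations described above can be sketched as follows. The surface and DDI types are hypothetical simplifications, and the strictly sequential ordering shown here stands in for the event-driven scheduling that the display thread actually provides (see the synchronization sketch further below).

    // present_split_sketch.cpp - illustrative split of one application-level
    // present into copy / copy / present across the two GPUs.
    struct Surface { /* frame-buffer or shared system-memory surface */ };

    struct SecondGpuDdi {                        // DDI on the unattached adapter
        virtual void CopyToSystemMemory(Surface& src, Surface& shared) = 0;
        virtual ~SecondGpuDdi() = default;
    };
    struct FirstGpuDdi {                         // DDI on the primary adapter
        virtual void CopyFromSystemMemory(Surface& shared, Surface& dst) = 0;
        virtual void Present(Surface& dst) = 0;
        virtual ~FirstGpuDdi() = default;
    };

    // 1) copy: second GPU frame buffer -> shared system memory (DMA)
    // 2) copy: shared system memory -> first GPU frame buffer (DMA)
    // 3) present on the primary display by the first GPU
    void SplitPresent(SecondGpuDdi& dGpu, FirstGpuDdi& iGpu,
                      Surface& dFrameBuffer, Surface& sharedSysMem,
                      Surface& iFrameBuffer)
    {
        dGpu.CopyToSystemMemory(dFrameBuffer, sharedSysMem);    // (1)
        iGpu.CopyFromSystemMemory(sharedSysMem, iFrameBuffer);  // (2)
        iGpu.Present(iFrameBuffer);                             // (3)
    }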
  • In another implementation, the operating system (e.g., Windows 7 Starter) will not load a second graphics driver 165. Referring now to FIG. 4, a graphics co-processing technique, in accordance with another embodiment of the present technology, is shown. When the operating system will not load a second graphics driver, the second GPU 475 is tagged as a non-graphics device adapter that has its own driver 465. Therefore the second GPU 475 and its device specific kernel mode driver 465 are not seen by the operating system as a graphics adapter. In one implementation, the second GPU 475 and its driver 465 are tagged as a memory controller. The shim layer 125 loads and configures the DDI 130 for the first GPU 210 on the primary adapter and the DDI 135 for the second GPU 475. If there is a specified affinity for executing rendering commands from the application 110 on the second GPU 475, the shim layer 125 intercepts the rendering commands sent by the runtime API 120 to the DDI on the primary adapter 130, calls the DDI on the unattached adapter to set the command buffers for the second GPU 475, and routes them to the driver 465 for the second GPU 475. The shim layer 125 also intercepts the callbacks from the driver 465 for the second GPU 475 to the runtime 120. In another implementation, the shim layer 125 implements the DDI 135 for the second GPU 475. Accordingly, the shim layer 125 splits graphics commands and redirects them to the two DDIs 130, 135.
  • Accordingly, the embodiments described with reference to FIG. 3 enable an application to run on a second GPU instead of a first GPU when the particular version of the operating system will allow the driver for the second GPU to be loaded but the runtime API will not allow a second device driver interface to be initialized. The embodiments described with reference to FIG. 4 enable an application to run on a second GPU, such as a discrete GPU, instead of a first GPU, such as an integrated GPU, when the particular version of the operating system (e.g., Windows 7 Starter) will not allow the driver for the second GPU to be loaded. The DDI 135 for the second GPU 475 cannot talk back through the runtime 120 or the thunk layer 140 to a graphics adapter handled by an OS specific kernel mode driver.
  • Referring now to FIG. 5, a method of synchronizing the copy and present operations on the first and second GPUs is shown. The method is illustrated in FIG. 6 with reference to an exemplary set of render and display operations, in accordance with one embodiment of the present technology. At 510, the shim layer 125 receives a plurality of rendering 605-615 and display operations for execution by the GPU on the unattached adapter 215. At 520, the shim layer 125 splits each display operation into a set of commands including 1) a copy 620-630 from a frame buffer 216 of the GPU on the unattached adapter 215 to a corresponding buffer in system memory 220 having shared access with the GPU on the attached adapter 210, 2) a copy 635, 640 from the buffer in shared system memory 220 to a frame buffer of the GPU on the primary adapter 210, and 3) a present 645, 650 on the primary display 240 by the GPU on the primary adapter 210. At 530, the copy and present operations on the first and second GPUs 210, 215 are synchronized.
  • The frame buffers 211, 216 and shared system memory 220 may be double or ring buffered. In a double buffered implementation, the current rendering operation is stored in a given one of the double buffers 605 and the other one of the double buffers is blitted to a corresponding given one of the double buffers of the system memory. When the rendering operation is complete, the next rendering operation is stored in the other one of the double buffers and the content of the given one of the double buffers is blitted 620 to the corresponding other one of the double buffers of the system memory. The rendering and blitting alternate back and forth between the buffers of the frame buffer of the second GPU 215. The blit to system memory is executed asynchronously. In another implementation, the frame buffer of the second GPU 215 is double buffered and the corresponding buffer in system memory 220 is a three buffer ring buffer.
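  • The bookkeeping for this ping-pong scheme is small; the sketch below shows only the index arithmetic, with the actual render and blit submissions omitted.

    // pingpong_sketch.cpp - illustrative double-buffer bookkeeping for the
    // render/blit overlap on the second GPU.
    struct DoubleBufferState {
        int renderIndex = 0;   // buffer currently being rendered into
        int blitIndex   = 1;   // buffer whose finished frame is being blitted out

        // Called when the current rendering operation completes: the freshly
        // rendered buffer becomes the blit source and rendering moves to the
        // other buffer, so rendering and the asynchronous blit can overlap.
        void AdvanceFrame() {
            blitIndex   = renderIndex;
            renderIndex = 1 - renderIndex;
        }
    };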
  • After the corresponding one of the double buffers of the frame buffer 216 in the second GPU 215 is blitted 620 to the system memory 220, the second GPU 215 generates an interrupt to the OS. In one implementation, the OS is programmed to signal an event to the shim layer 125 in response to the interrupt and the shim layer 125 is programmed to wait on the event before sending a copy command 635 and a present command 645 to the first GPU 210. In a thread separate from the application thread, referred to hereinafter as the display thread, the shim layer waits for receipt of the event indicating that the copy from the frame buffer to system memory is done, referred to hereinafter as the copy event interrupt. A separate thread is used so that the rendering commands on the first and second GPUs 210, 215 are not stalled in the application thread while waiting for the copy event interrupt. The display thread may also have a higher priority than the application thread.
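  • A minimal Win32 sketch of the display thread and the copy-done event is shown below. CreateEventW, CreateThread, WaitForSingleObject and SetThreadPriority are standard Win32 calls; the two Queue helpers are hypothetical stand-ins for calls into the DDI on the primary adapter, and a fuller implementation would use several events to cover the race condition discussed in the next paragraph.

    // display_thread_sketch.cpp - illustrative synchronization of the
    // copy/present path on the first GPU.
    #include <windows.h>

    // Signaled (via the copy event interrupt) when the blit from the second
    // GPU's frame buffer to system memory has completed.
    static HANDLE g_copyDoneEvent = nullptr;

    // Hypothetical stand-ins for calls into the DDI on the primary adapter.
    static void QueueCopyFromSystemMemory()    { /* blit shared sysmem -> iGPU frame buffer */ }
    static void QueuePresentOnPrimaryDisplay() { /* present the iGPU frame buffer */ }

    // Display thread: kept separate from the application thread so rendering
    // is not stalled while waiting for the copy event interrupt.
    static DWORD WINAPI DisplayThreadProc(LPVOID)
    {
        for (;;) {
            WaitForSingleObject(g_copyDoneEvent, INFINITE);  // wait for "copy done"
            QueueCopyFromSystemMemory();      // copy into the first GPU frame buffer
            QueuePresentOnPrimaryDisplay();   // present on the primary display
        }
        return 0;
    }

    void StartDisplayThread()
    {
        g_copyDoneEvent = CreateEventW(nullptr, FALSE, FALSE, nullptr);  // auto-reset
        HANDLE thread = CreateThread(nullptr, 0, DisplayThreadProc, nullptr, 0, nullptr);
        // Give the display thread a higher priority than the application thread.
        SetThreadPriority(thread, THREAD_PRIORITY_ABOVE_NORMAL);
    }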
  • A race condition may occur where the next rendering to a given one of the double buffers for the second GPU 215 begins before the previous copy from the given buffer is complete. In such a case, a plurality of copy event interrupts may be utilized. In one implementation, a ring buffer and four events are utilized.
  • Upon receipt of the copy event interrupt, the display thread queues the blit from system memory 220 and the present call into the first GPU 210. The first GPU 210 blits the given one of the system memory 220 buffers to a corresponding given one of the frame buffers of the first GPU 210. When the blit operation is complete, the content of the given one of the frame buffers of the first GPU 210 is presented on the primary display 240. When the next copy and present commands are received by the first GPU 210, the corresponding other one of the system memory 220 buffers is blitted into the other one of the frame buffers of the first GPU 210 and then the content is presented on the primary display 240. The blit and present alternate back and forth between the double buffers of the frame buffer of the first GPU 210. The copy event interrupt is used to delay programming, thereby effectively delaying the scheduling of the copy from system memory 220 to the frame buffer of the first GPU 210 and the present on the primary display 240.
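  • The display thread behavior described above could be sketched as follows, assuming Win32 events and hypothetical helpers for queuing work on the first GPU 210 (this is an illustration, not the actual driver implementation):

    #include <windows.h>

    // Hypothetical stand-ins for queuing work on the GPU on the primary adapter.
    void QueueCopyFromSystemMemory(int bufferIndex) { /* system memory 220 -> frame buffer of first GPU 210 */ }
    void QueuePresentOnPrimaryDisplay(int bufferIndex) { /* present the frame buffer on the primary display 240 */ }

    // Display thread: wait for the copy event interrupt signalled by the OS when the blit to
    // system memory 220 is done, then queue the blit and the present on the first GPU 210.
    DWORD WINAPI DisplayThread(LPVOID param) {
        HANDLE copyDoneEvent = static_cast<HANDLE>(param);   // signalled on the copy event interrupt
        int bufferIndex = 0;
        for (;;) {
            WaitForSingleObject(copyDoneEvent, INFINITE);    // delay programming until the copy is complete
            QueueCopyFromSystemMemory(bufferIndex);
            QueuePresentOnPrimaryDisplay(bufferIndex);
            bufferIndex ^= 1;                                // alternate between the double buffers
        }
        return 0;
    }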
  • In one implementation, a notification on the display side indicates that the frame has been presented on the display 240 by the first GPU 210. The OS is programmed to signal an event when the command buffer causing the first GPU 210 to present its frame buffer on the display is done executing. The notification maintains synchronization when an application runs with vertical blank (vblank) synchronization.
  • Referring now to FIG. 7, an exemplary set of render and display operations, in accordance with another embodiment of the present technology, is shown. The rendering and copy operations executed on the second GPU 215 may be performed by different engines. Therefore, the rendering and copy operations may be performed substantially simultaneously in the second GPU 215.
  • Generally, the second GPU 215 is coupled to the system memory 220 by a bus having a relatively high bandwidth. However, in some systems the bus coupling the second GPU 215 may not provide sufficient bandwidth for blitting the frame buffer 216 of the second GPU 215 to system memory 220. For example, an application may be rendered at a resolution of 1280×1024 pixels, so approximately 5 MB/frame of RGB data is rendered. If the application renders at 100 frames/s, then the second GPU needs approximately 500 MB/s for blitting upstream to the system memory 220. However, a Peripheral Component Interconnect Express (PCIe) 1× bus typically used to couple the second GPU 215 to system memory 220 has a bandwidth of approximately 250 MB/s in each direction.
  • Referring now to FIG. 8, a method of compressing rendered data, in accordance with one embodiment of the present technology, is shown. The second GPU 215 renders frames of RGB data, at 810. At 820, the frames of RGB data are converted to YUV sub-sample data using a pixel shader in the second GPU 215. The RGB data is processed as texture data by the pixel shader in three passes to generate the YUV sub-sample data. In one implementation, the U and V components are sub-sampled spatially; however, the Y component is not sub-sampled. The RGB data may be converted to YUV data using a 4:2:0 color space conversion. At 830, the YUV sub-sample data is blitted to the corresponding buffers in the system memory with an asynchronous copy engine of the second GPU. At 840, the YUV sub-sample data is blitted from the system memory to corresponding texture buffers of the first GPU 210. The Y, U, and V sub-sample data are buffered in three corresponding buffers, and therefore the copy from the frame buffer of the second GPU 215 to the system memory 220 and the copy from system memory 220 to the texture buffers of the first GPU 210 are each implemented by sets of three copies. At 850, the YUV sub-sample data is converted using a pixel shader in the first GPU 210 to recreate the RGB frame data. The device driver interface on the attached adapter is programmed to render a full-screen aligned quad from the corresponding texture buffers holding the YUV data. At 860, the recreated RGB frame data is then presented on the primary display 240 by the first GPU 210. Accordingly, the shaders are utilized to provide YUV compression and decompression.
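  • For illustration only, the arithmetic above and a CPU-side reference of the RGB to YUV conversion (assuming the common BT.601 weights, which are not specified by the present description) might look like the following sketch:

    #include <cstdint>
    #include <cstdio>

    // CPU-side reference of the RGB -> YUV conversion performed by the pixel shader on the
    // second GPU 215; BT.601 weights are assumed here and no clamping is performed.
    static void RgbToYuv(uint8_t r, uint8_t g, uint8_t b, uint8_t& y, uint8_t& u, uint8_t& v) {
        y = static_cast<uint8_t>( 0.299 * r + 0.587 * g + 0.114 * b);
        u = static_cast<uint8_t>(-0.169 * r - 0.331 * g + 0.500 * b + 128);
        v = static_cast<uint8_t>( 0.500 * r - 0.419 * g - 0.081 * b + 128);
    }

    int main() {
        uint8_t y, u, v;
        RgbToYuv(255, 128, 64, y, u, v);  // example pixel
        // Bandwidth arithmetic from the example above: 1280x1024 RGBA at 100 frames/s.
        const double rgbMBps = 1280.0 * 1024.0 * 4.0 * 100.0 / (1024.0 * 1024.0);  // ~500 MB/s
        // 4:2:0 sub-sampling keeps full-resolution Y and quarter-resolution U and V: 1.5 bytes/pixel.
        const double yuvMBps = 1280.0 * 1024.0 * 1.5 * 100.0 / (1024.0 * 1024.0);  // ~187 MB/s
        std::printf("Y=%u U=%u V=%u, RGB: %.0f MB/s, YUV 4:2:0: %.0f MB/s\n",
                    static_cast<unsigned>(y), static_cast<unsigned>(u), static_cast<unsigned>(v),
                    rgbMBps, yuvMBps);
        return 0;
    }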
  • In one implementation, each buffer of Y, U and V samples is double buffered in the frame buffer of the second GPU 215 and the system memory 220. In addition, the Y, U and V samples copied into the first GPU 210 are double buffered as textures. In another implementation, the Y, U and V sample buffers in the second GPU 215 and corresponding texture buffers in the first GPU 210 are each double buffered. The Y, U and V sample buffers in the system memory 220 may each be triple buffered.
  • In one implementation, the shim layer 125 tracks the bandwidth needed for blitting and the efficiency of transfers on the bus to determine whether to enable compression. In another implementation, the shim layer 125 enables or disables YUV compression based on the type of application. For example, the shim layer 125 may enable compression for game applications but not for technical applications such as Computer Aided Design (CAD) applications.
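  • A minimal sketch of such a decision, assuming hypothetical inputs tracked by the shim layer 125 (the application categories and comparison are illustrative, not taken from the present description):

    #include <string>

    // Hypothetical per-application transfer statistics tracked by the shim layer 125.
    struct TransferStats {
        double requiredMBps;   // bandwidth needed for uncompressed blits of rendered frames
        double availableMBps;  // observed or estimated upstream bus bandwidth
    };

    // Enable compression when the bus cannot sustain uncompressed transfers,
    // or when the application type favors frame rate over exact color fidelity.
    bool ShouldEnableYuvCompression(const TransferStats& stats, const std::string& appType) {
        if (appType == "cad")  return false;  // e.g., technical applications keep full precision
        if (appType == "game") return true;   // e.g., games tolerate chroma sub-sampling
        return stats.requiredMBps > stats.availableMBps;
    }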
  • In one embodiment, the white list accessed by the shim layer 125 to determine whether graphics requests should be executed on the first GPU 210 or the second GPU 215 is loaded and updated by a vendor and/or system administrator. In another embodiment, a graphical user interface can be provided to allow the user to specify the use of the second GPU (e.g., discrete GPU) 215 for rendering a given application. The user may right click on the icon for the given application. In response to the user selection, a graphical user interface may be generated that allows the user to specify the second GPU for use when rendering images for the given application. In one implementation, the operating system is programmed to populate the graphical user interface with a choice to run the given application on the GPU on the unattached adapter. A routine (e.g., dynamic linked library) registered to handle this context menu item will scan the shortcut link to the application, gather the options and arguments, and then call an application launcher that spawns a process to launch the application and sets an environment variable that will be read by the shim layer 125. In response, the shim layer 125 will run the graphics context for the given application on the second GPU 215. Therefore, the user can override, update, or otherwise modify the white list loaded on the computing device.
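  • A sketch of the launcher behavior described above, using Win32 calls; the environment variable name used here is hypothetical and the error handling is abbreviated:

    #include <windows.h>
    #include <wchar.h>

    // Set an environment variable that the shim layer 125 reads, then spawn the application.
    // The child process inherits this process's environment, including the variable set below.
    bool LaunchOnSecondGpu(const wchar_t* commandLine) {
        SetEnvironmentVariableW(L"SHIM_USE_DGPU", L"1");  // hypothetical variable name

        wchar_t cmd[2 * MAX_PATH];
        wcsncpy_s(cmd, commandLine, _TRUNCATE);           // CreateProcessW needs a writable buffer

        STARTUPINFOW si = { sizeof(si) };
        PROCESS_INFORMATION pi = {};
        if (!CreateProcessW(nullptr, cmd, nullptr, nullptr, FALSE, 0,
                            nullptr /* inherit environment */, nullptr, &si, &pi)) {
            return false;
        }
        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
        return true;
    }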
  • Referring now to FIG. 9, an exemplary desktop 910 including an exemplary graphical user interface for selecting the GPU on which to run a given application is shown. The desktop includes icons 920-950 for one or more applications. When the user right clicks on a given application 930, a pull-down menu 970 is generated. The pull-down menu 970 is populated with an additional item such as 'run on dGPU' or the like. The menu item for the second GPU 215 may provide for product branding by identifying the manufacturer and/or model of the second GPU. If the user selects the 'run' item or double left clicks on the icon, the graphics requests from the given application will run on the GPU on the primary adapter (e.g., the default iGPU) 210. If the user selects the 'run on dGPU' item, the graphics requests from the given application will run on the GPU on the unattached adapter (e.g., dGPU) 215.
  • In another implementation, the second graphics processing unit may support a set of rendering application programming interfaces and the first graphics processing unit may support a limited subset of the same application programming interfaces. Each application programming interface is implemented by a corresponding runtime API 120 and a matching device driver interface 130. Referring now to FIG. 10, a graphics co-processing technique, in accordance with another embodiment of the present technology, is shown. The runtime API 120 loads a shim layer 125 that will support all device driver interfaces. The shim layer 125 loads and configures the DDI 130 for the first GPU 210 using a device driver interface class that the first GPU supports on the primary adapter, and loads the DDI 135 for the second GPU 215 using a second device driver interface class that can talk with the runtime API 120. For example, in one implementation, the second GPU 215 may be a DirectX10 class device and the first GPU 210 may be a DirectX9 class device that does not support DirectX10. The shim layer 125 appears to the DDI 130 for the first GPU 210 as a runtime API of the first application programming interface class (e.g., D3D9.dll), translates commands between the two device driver interface classes, and may also convert between display formats.
  • The shim layer 125 includes a translation layer 126 that translates calls between the device driver interface class of the runtime API 120 and the device driver interface class of the DDI on the primary adapter. In one implementation, the shim layer 125 translates display commands between the DirectX10 runtime API 120 and the DirectX9 DDI on the primary adapter 130. The shim layer, therefore, creates a Dx9 compatible context on the first GPU 210, which is the recipient of frames rendered by the Dx10 class second GPU 215. The shim layer 125 advantageously splits graphics commands into rendering and display commands, redirects the rendering commands to the DDI on the unattached adapter 135 and the display commands to the DDI on the primary adapter 130. The shim layer also translates between the commands for the Dx9 DDI on the primary adapter 130, the Dx10 DDI on the unattached adapter 135, the Dx10 runtime API 120 and the Dx10 thunk layer 140, and provides for format conversion if necessary. The shim layer 125, in one implementation, intercepts commands from the Dx10 runtime 120 and translates these into the Dx9 DDI on the primary adapter (e.g., iUMD.dll). The commands may include: CreateResource, OpenResource, DestroyResource, DxgiPresent (which triggers the surface transfer mechanism that ends with the surface displayed on the iGPU), DxgiRotateResourceIdentities, DxgiBlt (present blits are translated), and DxgiSetDisplayMode.
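  • A greatly simplified sketch of this routing, with hypothetical types in place of the actual WDDM user mode DDI function tables:

    // Hypothetical command identifiers; the real DDI passes structured argument blocks.
    enum class DdiCall { Draw, CreateResource, OpenResource, DestroyResource,
                         DxgiPresent, DxgiRotateResourceIdentities, DxgiBlt, DxgiSetDisplayMode };

    struct ShimDevice {
        void* dx10DdiOnUnattachedAdapter;  // rendering side (e.g., the dGPU 215)
        void* dx9DdiOnPrimaryAdapter;      // display side (e.g., the iGPU 210, iUMD.dll)
    };

    // Rendering commands are forwarded to the Dx10 DDI on the unattached adapter; the listed
    // resource and display commands are intercepted and translated into the Dx9 DDI on the
    // primary adapter, with DxgiPresent triggering the surface transfer to the iGPU.
    void RouteCall(ShimDevice& dev, DdiCall call) {
        (void)dev;  // the translation into either DDI is omitted in this sketch
        switch (call) {
            case DdiCall::CreateResource:
            case DdiCall::OpenResource:
            case DdiCall::DestroyResource:
            case DdiCall::DxgiPresent:
            case DdiCall::DxgiRotateResourceIdentities:
            case DdiCall::DxgiBlt:
            case DdiCall::DxgiSetDisplayMode:
                /* translate into the Dx9 DDI on the primary adapter */
                break;
            case DdiCall::Draw:
                /* forward to the Dx10 DDI on the unattached adapter */
                break;
        }
    }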
  • The Dx9 DDI 130 for the first GPU 210 cannot call back directly through the runtime 120 to a graphics adapter handled by an OS-specific kernel mode driver, because the runtime 120 expects the call to come from a Dx10 device. The shim layer 125 therefore intercepts callbacks from the Dx9 DDI and exchanges device handles before forwarding the callback to the Dx10 runtime API 120. The Dx10 and Dx11 runtime APIs 120 use a presentation layer called DXGI, which has its own present callback that does not exist in the Dx9 callback interface. Therefore, when the display-side DDI on the primary adapter calls the present callback, the shim layer translates it to a DXGI callback. For example:
  • PFND3DDDI_PRESENTCB->PFNDDXGIDDI_PRESENTCB
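  • A sketch of such a thunk is shown below with simplified stand-in types; the real PFND3DDDI_PRESENTCB and PFNDDXGIDDI_PRESENTCB prototypes and argument structures differ:

    // Simplified stand-ins for the Dx9 and DXGI present callback argument blocks.
    struct D3dPresentArgs  { void* dx9DeviceHandle; };
    struct DxgiPresentArgs { void* dx10DeviceHandle; };

    using DxgiPresentCb = long (*)(const DxgiPresentArgs*);

    struct ShimContext {
        void* dx10DeviceHandle;       // device handle the Dx10 runtime expects
        DxgiPresentCb dxgiPresentCb;  // DXGI present callback supplied by the runtime
    };

    // Intercept the Dx9-style present callback coming from the display-side DDI,
    // exchange the device handle, and forward the call as a DXGI present callback.
    long ShimPresentCallback(ShimContext& shim, const D3dPresentArgs& /*dx9Args*/) {
        DxgiPresentArgs dxgiArgs = {};
        dxgiArgs.dx10DeviceHandle = shim.dx10DeviceHandle;
        return shim.dxgiPresentCb(&dxgiArgs);
    }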
  • The shim layer 125 may also include a data structure 127 for converting display formats between the first graphics processing unit DDI and the second graphics processing unit DDI. For example, the shim layer 125 may include a lookup table to convert a 10-bit rendering format in Dx10 to an 8-bit format supported by the Dx9 class integrated GPU 210. The rendered frame may be copied to a staging surface, and a two-dimensional (2D) engine of the discrete GPU 215 utilizes the lookup table to convert the rendered frame to a Dx9 format. The Dx9 format frame is then copied to the frame buffer of the integrated GPU 210 and presented on the primary display 240. For example, the following format conversions may be performed:
  • DXGI_FORMAT_R16G16B16A16_FLOAT(render)->D3DDDIFMT_A8R8G8B8(display), DXGI_FORMAT_R10G10B10A2_UNORM(render)->D3DDDIFMT_A8R8G8B8(display).
    In one implementation, the copying and conversion can happen as an atomic operation.
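  • A minimal sketch of such a lookup table, using simplified enums in place of the DXGI_FORMAT and D3DDDIFORMAT values named above:

    #include <map>

    // Simplified stand-ins for the rendering and display format enumerations.
    enum class RenderFormat  { R16G16B16A16_FLOAT, R10G10B10A2_UNORM };
    enum class DisplayFormat { A8R8G8B8 };

    // Lookup table mapping a Dx10-class rendering format to a Dx9-class display format,
    // per the example conversions listed above.
    DisplayFormat ToDisplayFormat(RenderFormat render) {
        static const std::map<RenderFormat, DisplayFormat> kTable = {
            { RenderFormat::R16G16B16A16_FLOAT, DisplayFormat::A8R8G8B8 },
            { RenderFormat::R10G10B10A2_UNORM,  DisplayFormat::A8R8G8B8 },
        };
        return kTable.at(render);
    }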
  • The foregoing descriptions of specific embodiments of the present technology have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed, and obviously many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the present technology and its practical application, to thereby enable others skilled in the art to best utilize the present technology and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims appended hereto and their equivalents.

Claims (20)

What is claimed is:
1. One or more computing device readable media having computing device executable instructions which when executed perform a method comprising:
injecting an application initialization routine, when an application starts, that includes an entry point that changes a search path for a device driver interface to a search path of a shim layer library; and
loading the shim layer library, at the changed search path, that initializes a device driver interface of a first class for a first graphics processing unit on a primary adapter and a device driver interface of a second class for a second graphics processing unit on an unattached adapter, wherein the shim layer library translates calls between the first device driver interface of the first class and the second device driver interface of the second class.
2. The one or more computing device readable media having computing device executable instructions which when executed perform the method of claim 1, wherein the device driver interface on the primary adapter comprises a DirectX9 user mode driver dynamic linked library (UMD.dll) for the first graphics processing unit.
3. The one or more computing device readable media having computing device executable instructions which when executed perform the method of claim 1, wherein the device driver interface on the unattached adapter comprises a DirectX10 or DirectX11 user mode driver dynamic linked library (UMD.dll) for the second graphics processing unit.
4. The one or more computing device readable media having computing device executable instructions which when executed perform the method of claim 1, wherein the application initialization routine comprises a dynamic linked library.
5. The one or more computing device readable media having computing device executable instructions which when executed perform the method of claim 1, wherein the runtime application programming interface comprises a DirectX10 or DirectX11 application programming interface (D3D10.dll or D3D11.dll).
6. The one or more computing device readable media having computing device executable instructions which when executed perform the method of claim 1, wherein the first graphics processing unit comprises an integrated graphics processing unit.
7. The one or more computing device readable media having computing device executable instructions which when executed perform the method of claim 1, wherein the second graphics processing unit comprises a discrete graphics processing unit.
8. The one or more computing device readable media having computing device executable instructions which when executed perform the method of claim 1, wherein the first graphics processing unit and the second graphics processing unit are heterogeneous graphics processing units.
9. The one or more computing device readable media having computing device executable instructions which when executed perform the method of claim 1, wherein the shim layer library converts a display format between the first device driver interface of the first class and the second device driver interface of the second class.
10. The one or more computing device readable media having computing device executable instructions which when executed perform the method of claim 9, wherein the shim layer uses a lookup table to convert the display format between the first device driver interface of the first class and the second device driver interface of the second class.
11. One or more computing device readable media having computing device executable instructions which when executed perform a method comprising:
loading a shim layer library, by a runtime application programming interface;
loading and initializing a device driver interface on the primary adapter, by the shim layer library;
loading and initializing a device driver interface on an unattached adapter, by the shim layer library;
translating calls between the runtime application programming interface and commands of a first device driver interface class for the first device driver interface class, by the shim layer library; and
converting a display format of a second device driver interface class for the device driver interface on an unattached adapter to a display format of the first device driver interface class.
12. The one or more computing device readable media having computing device executable instructions which when executed perform the method of claim 11, further comprising routing render commands from the runtime application to the second graphics processing unit running the same device driver interface class.
13. The one or more computing device readable media having computing device executable instructions which when executed perform the method of claim 11, wherein converting the display format further comprises:
copying a frame by a second graphics processing unit of the second device driver interface class to a staging surface;
converting the rendered frame in the staging surface by a two-dimensional engine of the second graphics processing unit to a format of the first device driver interface class; and
copying the frame in the format of the first device driver interface class to a frame buffer of a first graphics processing unit.
14. The one or more computing device readable media having computing device executable instructions which when executed perform the method of claim 13, wherein converting the rendered frame in the format of the second device driver interface class to the format of the first device driver interface class comprises mapping the format of the second device driver interface class to the format of the first device driver interface class from a conversion lookup table.
15. The one or more computing device readable media having computing device executable instructions which when executed perform the method of claim 11, wherein the first graphics processing unit is an integrated graphics processing unit and the second graphics processing unit is a discrete graphics processing unit having a different design than the integrated graphics processing unit.
16. A method comprising:
loading a shim layer library;
loading and initializing a device driver interface of a first class on the primary adapter, by the shim layer library;
loading and initializing a device driver interface of a second class on an unattached adapter, by the shim layer library; and
translating calls between the first device driver interface of the first class on the primary adapter and the second device driver interface of the second class on the unattached adapter, by the shim layer library.
17. The method according to claim 16, wherein the translated calls comprise resource management and presentation functions.
18. The method according to claim 16, wherein translating calls between the first device driver interface of the first class on the primary adapter and the second device driver interface of the second class on the unattached adapter comprises:
intercepting callbacks from the first device driver interface of the first class on the primary adapter to a runtime; and
exchanging the handles of the first class with corresponding handles of callbacks of the second class.
19. The method according to claim 16, wherein the device driver interface on the primary adapter comprises a DirectX9 user mode driver dynamic linked library (UMD.dll) for a first graphics processing unit.
20. The method according to claim 19, wherein the device driver interface on the unattached adapter comprises a DirectX10 or DirectX11 user mode driver dynamic linked library (UMD.dll) for a second graphics processing unit.
US12/649,864 2009-09-16 2009-12-30 Co-processing techniques on heterogeneous gpus having different device driver interfaces Abandoned US20110067038A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/649,864 US20110067038A1 (en) 2009-09-16 2009-12-30 Co-processing techniques on heterogeneous gpus having different device driver interfaces

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US24315509P 2009-09-16 2009-09-16
US24316409P 2009-09-17 2009-09-17
US12/649,864 US20110067038A1 (en) 2009-09-16 2009-12-30 Co-processing techniques on heterogeneous gpus having different device driver interfaces

Publications (1)

Publication Number Publication Date
US20110067038A1 true US20110067038A1 (en) 2011-03-17

Family

ID=43730074

Family Applications (6)

Application Number Title Priority Date Filing Date
US12/649,253 Abandoned US20110063305A1 (en) 2009-09-16 2009-12-29 Co-processing techniques on heterogeneous graphics processing units
US12/649,326 Abandoned US20110063304A1 (en) 2009-09-16 2009-12-29 Co-processing synchronizing techniques on heterogeneous graphics processing units
US12/649,317 Abandoned US20110063306A1 (en) 2009-09-16 2009-12-29 CO-PROCESSING TECHNIQUES ON HETEROGENEOUS GPUs INCLUDING IDENTIFYING ONE GPU AS A NON-GRAPHICS DEVICE
US12/649,310 Abandoned US20110063309A1 (en) 2009-09-16 2009-12-29 User interface for co-processing techniques on heterogeneous graphics processing units
US12/649,329 Active 2031-10-23 US8773443B2 (en) 2009-09-16 2009-12-30 Compression for co-processing techniques on heterogeneous graphics processing units
US12/649,864 Abandoned US20110067038A1 (en) 2009-09-16 2009-12-30 Co-processing techniques on heterogeneous gpus having different device driver interfaces

Family Applications Before (5)

Application Number Title Priority Date Filing Date
US12/649,253 Abandoned US20110063305A1 (en) 2009-09-16 2009-12-29 Co-processing techniques on heterogeneous graphics processing units
US12/649,326 Abandoned US20110063304A1 (en) 2009-09-16 2009-12-29 Co-processing synchronizing techniques on heterogeneous graphics processing units
US12/649,317 Abandoned US20110063306A1 (en) 2009-09-16 2009-12-29 CO-PROCESSING TECHNIQUES ON HETEROGENEOUS GPUs INCLUDING IDENTIFYING ONE GPU AS A NON-GRAPHICS DEVICE
US12/649,310 Abandoned US20110063309A1 (en) 2009-09-16 2009-12-29 User interface for co-processing techniques on heterogeneous graphics processing units
US12/649,329 Active 2031-10-23 US8773443B2 (en) 2009-09-16 2009-12-30 Compression for co-processing techniques on heterogeneous graphics processing units

Country Status (1)

Country Link
US (6) US20110063305A1 (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8259119B1 (en) * 2007-11-08 2012-09-04 Nvidia Corporation System and method for switching between graphical processing units
US20120227057A1 (en) * 2011-03-04 2012-09-06 Microsoft Corporation Driver Shimming
US20120229481A1 (en) * 2010-12-13 2012-09-13 Ati Technologies Ulc Accessibility of graphics processing compute resources
US20130021353A1 (en) * 2011-07-18 2013-01-24 Apple Inc. Virtual GPU
WO2013106491A1 (en) 2012-01-13 2013-07-18 Microsoft Corporation Para-virtualized domain, hull, and geometry shaders
US20140055470A1 (en) * 2012-05-02 2014-02-27 Nvidia Corporation Host Context Techniques for Server Based Graphics Processing
CN103718156A (en) * 2011-07-29 2014-04-09 英特尔公司 CPU/GPU synchronization mechanism
US20150009221A1 (en) * 2013-07-05 2015-01-08 Nvidia Corporation Direct interfacing of an external graphics card to a data processing device at a motherboard-level
US9003363B2 (en) 2011-03-21 2015-04-07 Microsoft Technology Licensing, Llc Device flags
US9176795B2 (en) 2010-12-13 2015-11-03 Advanced Micro Devices, Inc. Graphics processing dispatch from user mode
US9176794B2 (en) 2010-12-13 2015-11-03 Advanced Micro Devices, Inc. Graphics compute process scheduling
US9311169B2 (en) 2012-05-02 2016-04-12 Nvidia Corporation Server based graphics processing techniques
US9542715B2 (en) 2012-05-02 2017-01-10 Nvidia Corporation Memory space mapping techniques for server based graphics processing
US9805439B2 (en) 2012-05-02 2017-10-31 Nvidia Corporation Memory space mapping techniques for server based graphics processing
US10657698B2 (en) * 2017-06-22 2020-05-19 Microsoft Technology Licensing, Llc Texture value patch used in GPU-executed program sequence cross-compilation
CN111736913A (en) * 2019-03-25 2020-10-02 华为技术有限公司 Class loading method and device

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9171350B2 (en) 2010-10-28 2015-10-27 Nvidia Corporation Adaptive resolution DGPU rendering to provide constant framerate with free IGPU scale up
KR20120069364A (en) * 2010-12-20 2012-06-28 삼성전자주식회사 Apparatus and method of processing the frame for considering processing capability and power consumption in multicore environment
US9465660B2 (en) 2011-04-11 2016-10-11 Hewlett Packard Enterprise Development Lp Performing a task in a system having different types of hardware resources
US9135189B2 (en) 2011-09-07 2015-09-15 Microsoft Technology Licensing, Llc Delivering GPU resources across machine boundaries
US9830288B2 (en) * 2011-12-19 2017-11-28 Nvidia Corporation System and method for transmitting graphics rendered on a primary computer to a secondary computer
US9338036B2 (en) 2012-01-30 2016-05-10 Nvidia Corporation Data-driven charge-pump transmitter for differential signaling
US9019289B2 (en) * 2012-03-07 2015-04-28 Qualcomm Incorporated Execution of graphics and non-graphics applications on a graphics processing unit
US9329877B2 (en) 2012-03-18 2016-05-03 Microsoft Technology Licensing, Llc Static verification of parallel program code
US9153539B2 (en) * 2013-03-15 2015-10-06 Nvidia Corporation Ground-referenced single-ended signaling connected graphics processing unit multi-chip module
EP3126971B1 (en) * 2014-03-30 2020-04-29 Universiteit Gent Program execution on heterogeneous platform
CN104244087B (en) * 2014-09-19 2018-05-01 青岛海信移动通信技术股份有限公司 A kind of method and apparatus of Video Rendering
US9766918B2 (en) * 2015-02-23 2017-09-19 Red Hat Israel, Ltd. Virtual system device identification using GPU to host bridge mapping
CN105163128B (en) * 2015-08-31 2018-04-13 华南理工大学 A kind of screen picture acquisition methods for accelerating image to change parallel using GPU
US10575007B2 (en) 2016-04-12 2020-02-25 Microsoft Technology Licensing, Llc Efficient decoding and rendering of blocks in a graphics pipeline
US10157480B2 (en) 2016-06-24 2018-12-18 Microsoft Technology Licensing, Llc Efficient decoding and rendering of inter-coded blocks in a graphics pipeline
US11197010B2 (en) 2016-10-07 2021-12-07 Microsoft Technology Licensing, Llc Browser-based video decoder using multiple CPU threads
CN110177214B (en) * 2019-06-28 2021-09-24 Oppo广东移动通信有限公司 Image processor, image processing method, photographing device and electronic equipment
CN110430444B (en) * 2019-08-12 2022-06-07 中科寒武纪科技股份有限公司 Video stream processing method and system
US20220317756A1 (en) * 2021-03-31 2022-10-06 Advanced Micro Devices, Inc. Graphics processing unit (gpu) selection based on a utilized power source
CN113867961B (en) * 2021-09-30 2022-07-22 中国矿业大学(北京) Heterogeneous GPU cluster deep learning hybrid load scheduling optimization method
CN115660940B (en) * 2022-11-11 2023-04-28 北京麟卓信息科技有限公司 Graphic application frame rate synchronization method based on vertical blanking simulation

Citations (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5742607A (en) * 1995-12-20 1998-04-21 Intel Corporation Method and apparatus for controlling two way communication via disparate physical media
US6289396B1 (en) * 1995-11-21 2001-09-11 Diamond Multimedia Systems, Inc. Dynamic programmable mode switching device driver architecture
US6323875B1 (en) * 1999-04-28 2001-11-27 International Business Machines Corporation Method for rendering display blocks on display device
US6337717B1 (en) * 1997-11-21 2002-01-08 Xsides Corporation Alternate display content controller
US6411302B1 (en) * 1999-01-06 2002-06-25 Concise Multimedia And Communications Inc. Method and apparatus for addressing multiple frame buffers
US20020154214A1 (en) * 2000-11-02 2002-10-24 Laurent Scallie Virtual reality game system using pseudo 3D display driver
US20030014561A1 (en) * 2001-07-13 2003-01-16 Cooper Neil A. System for loading device-specific code and method thereof
US20030131147A1 (en) * 2002-01-04 2003-07-10 Microsoft Corporation Systems and methods for managing drivers in a computing system
US20030140179A1 (en) * 2002-01-04 2003-07-24 Microsoft Corporation Methods and system for managing computational resources of a coprocessor in a computing system
US6677964B1 (en) * 2000-02-18 2004-01-13 Xsides Corporation Method and system for controlling a complementary user interface on a display surface
US20040032423A1 (en) * 1999-09-21 2004-02-19 Xsides Corporation Method and system for controlling a complementary user interface on a display surface
US6745385B1 (en) * 1999-09-01 2004-06-01 Microsoft Corporation Fixing incompatible applications by providing stubs for APIs
US20040177338A1 (en) * 2003-03-07 2004-09-09 Microsoft Corporation Method for testing a software shim
US20040231000A1 (en) * 2003-02-18 2004-11-18 Gossalia Anuj B. Video aperture management
US20050012749A1 (en) * 2003-07-15 2005-01-20 Nelson Gonzalez Multiple parallel processor computer graphics system
US20050149947A1 (en) * 2003-12-10 2005-07-07 Callender Robin L. Driver-specific context for kernel-mode shimming
US20060109266A1 (en) * 2004-06-29 2006-05-25 Sensable Technologies, Inc. Apparatus and methods for haptic rendering using data in a graphics pipeline
US20070094413A1 (en) * 2005-10-19 2007-04-26 Gabriel Salazar System and method for display sharing
US20070126749A1 (en) * 2005-12-01 2007-06-07 Exent Technologies, Ltd. System, method and computer program product for dynamically identifying, selecting and extracting graphical and media objects in frames or scenes rendered by a software application
US20070129990A1 (en) * 2005-12-01 2007-06-07 Exent Technologies, Ltd. System, method and computer program product for dynamically serving advertisements in an executing computer game based on the entity having jurisdiction over the advertising space in the game
US20070171222A1 (en) * 2006-01-23 2007-07-26 Autodesk, Inc. Application-independent method for capturing three-dimensional model data and structure for viewing and manipulation
US20070283175A1 (en) * 2006-05-30 2007-12-06 Ati Technologies Inc. Device Having Multiple Graphics Subsystems and Reduced Power Consumption Mode, Software and Methods
US20080012792A1 (en) * 2006-07-14 2008-01-17 Lenovo (Beijing) Limited Method for acquiring graphics device interface invocation by using filter driver
US20080042923A1 (en) * 2006-08-16 2008-02-21 Rick De Laet Systems, methods, and apparatus for recording of graphical display
US20080136825A1 (en) * 2003-11-19 2008-06-12 Reuven Bakalash PC-based computing system employing a multi-GPU graphics pipeline architecture supporting multiple modes of GPU parallelization dymamically controlled while running a graphics application
US20080163263A1 (en) * 2006-12-28 2008-07-03 Legend Holdings Ltd. Method for acquisition of gdi and direct x data
US20080168479A1 (en) * 2007-01-05 2008-07-10 Thomas Joseph Purtell Bypass Virtualization
US20080211816A1 (en) * 2003-07-15 2008-09-04 Alienware Labs. Corp. Multiple parallel processor computer graphics system
US20080316218A1 (en) * 2007-06-18 2008-12-25 Panologic, Inc. Remote graphics rendering across a network
US20090153540A1 (en) * 2007-12-13 2009-06-18 Advanced Micro Devices, Inc. Driver architecture for computer device having multiple graphics subsystems, reduced power consumption modes, software and methods
US20090172707A1 (en) * 2007-12-31 2009-07-02 S3 Graphics, Inc. Method and system for supporting multiple display devices
US7598953B2 (en) * 2004-11-05 2009-10-06 Microsoft Corporation Interpreter for simplified programming of graphics processor units in general purpose programming languages
US20090307699A1 (en) * 2008-06-06 2009-12-10 Munshi Aaftab A Application programming interfaces for data parallel computing on multiple processors
US7698579B2 (en) * 2006-08-03 2010-04-13 Apple Inc. Multiplexed graphics architecture for graphics power management
US20100302261A1 (en) * 2009-05-29 2010-12-02 Microsoft Corporation Fixed Function Pipeline Application Remoting Through A Shader Pipeline Conversion Layer

Family Cites Families (175)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4208810A (en) 1978-09-11 1980-06-24 The Singer Company Clipping polygon faces through a polyhedron of vision
US4918626A (en) 1987-12-09 1990-04-17 Evans & Sutherland Computer Corp. Computer graphics priority system with antialiasing
US5081594A (en) 1989-01-31 1992-01-14 Kroy Inc. Real-time rasterization system for a column-oriented printing apparatus or the like
US5212633A (en) 1989-08-18 1993-05-18 Sharedata System for transferring resident programs to virtual area and recalling for instant excution in memory limited DOS system using program control tables
EP0430501B1 (en) 1989-11-17 1999-02-03 Digital Equipment Corporation System and method for drawing antialiased polygons
JPH0683969A (en) 1990-11-15 1994-03-25 Internatl Business Mach Corp <Ibm> Graphics processor and method of graphics and data processing
US5237460A (en) 1990-12-14 1993-08-17 Ceram, Inc. Storage of compressed data on random access storage devices
EP0587783B1 (en) 1991-06-04 1997-10-15 Qualcomm, Inc. Adaptive block size image compression system
US5446836A (en) 1992-10-30 1995-08-29 Seiko Epson Corporation Polygon rasterization
US5570463A (en) 1993-01-06 1996-10-29 Compaq Computer Corporation Bresenham/DDA line draw circuitry
US5313287A (en) 1993-04-30 1994-05-17 Hewlett-Packard Company Imposed weight matrix error diffusion halftoning of image data
US6181822B1 (en) 1993-05-12 2001-01-30 The Duck Corporation Data compression apparatus and method
GB2278524B (en) 1993-05-28 1997-12-10 Nihon Unisys Ltd Method and apparatus for rendering visual images employing area calculation and blending of fractional pixel lists for anti-aliasing and transparency
US5684939A (en) 1993-07-09 1997-11-04 Silicon Graphics, Inc. Antialiased imaging with improved pixel supersampling
US5432898A (en) 1993-09-20 1995-07-11 International Business Machines Corporation System and method for producing anti-aliased lines
US5483258A (en) 1993-12-10 1996-01-09 International Business Machines Corporation Pick correlation
US5664162A (en) 1994-05-23 1997-09-02 Cirrus Logic, Inc. Graphics accelerator with dual memory controllers
US5633297A (en) 1994-11-04 1997-05-27 Ppg Industries, Inc. Cationic resin containing capped isocyanate groups suitable for use in electrodeposition
US6002411A (en) * 1994-11-16 1999-12-14 Interactive Silicon, Inc. Integrated video and memory controller with data processing and graphical processing capabilities
US5543935A (en) 1994-11-18 1996-08-06 Xerox Corporation Halftoning method using space filling curves
US5594854A (en) 1995-03-24 1997-01-14 3Dlabs Inc. Ltd. Graphics subsystem with coarse subpixel correction
US5623692A (en) 1995-05-15 1997-04-22 Nvidia Corporation Architecture for providing input/output operations in a computer system
US5734744A (en) 1995-06-07 1998-03-31 Pixar Method and apparatus for compression and decompression of color data
EP0840915A4 (en) 1995-07-26 1998-11-04 Raycer Inc Method and apparatus for span sorting rendering system
US5854637A (en) 1995-08-17 1998-12-29 Intel Corporation Method and apparatus for managing access to a computer system memory shared by a graphics controller and a memory controller
US5854631A (en) 1995-11-22 1998-12-29 Silicon Graphics, Inc. System and method for merging pixel fragments based on depth range values
US5815162A (en) 1996-04-19 1998-09-29 Silicon Graphics, Inc. System and method of drawing anti-aliased lines using a modified bresenham line-drawing algorithm
US5790125A (en) * 1996-04-22 1998-08-04 International Business Machines Corporation System and method for use in a computerized imaging system to efficiently transfer graphics information to a graphics subsystem employing masked span
US6104417A (en) 1996-09-13 2000-08-15 Silicon Graphics, Inc. Unified memory computer architecture with dynamic graphics memory allocation
US5790705A (en) 1996-09-13 1998-08-04 Apple Computer, Inc. Compression techniques for substantially lossless digital image data storage
US6115049A (en) 1996-09-30 2000-09-05 Apple Computer, Inc. Method and apparatus for high performance antialiasing which minimizes per pixel storage and object data bandwidth
US6160557A (en) 1996-10-17 2000-12-12 International Business Machines Corporation Method and apparatus providing efficient rasterization with data dependent adaptations
TW369746B (en) 1996-11-13 1999-09-11 Sanyo Electric Co Surround circuit
US6697063B1 (en) 1997-01-03 2004-02-24 Nvidia U.S. Investment Company Rendering pipeline
US6034699A (en) 1997-05-01 2000-03-07 Ati Technologies, Inc. Rendering polygons
US6028608A (en) 1997-05-09 2000-02-22 Jenkins; Barry System and method of perception-based image generation and encoding
US6249853B1 (en) 1997-06-25 2001-06-19 Micron Electronics, Inc. GART and PTES defined by configuration registers
GB2333678B (en) 1997-08-04 2002-03-27 Sony Corp Image data processing devices and methods
US6104407A (en) 1997-09-23 2000-08-15 Ati Technologies, Inc. Method and apparatus for processing fragment pixel information in a three-dimensional graphics processing system
US6201545B1 (en) 1997-09-23 2001-03-13 Ati Technologies, Inc. Method and apparatus for generating sub pixel masks in a three dimensional graphic processing system
US6160559A (en) 1997-09-30 2000-12-12 Intel Corporation Method and apparatus for providing frame-time feedback to graphics application programs
US6204859B1 (en) 1997-10-15 2001-03-20 Digital Equipment Corporation Method and apparatus for compositing colors of images with memory constraints for storing pixel data
US6128000A (en) 1997-10-15 2000-10-03 Compaq Computer Corporation Full-scene antialiasing using improved supersampling techniques
US6856320B1 (en) 1997-11-25 2005-02-15 Nvidia U.S. Investment Company Demand-based memory system for graphics applications
US6624823B2 (en) 1998-02-17 2003-09-23 Sun Microsystems, Inc. Graphics system configured to determine triangle orientation by octant identification and slope comparison
US6717578B1 (en) 1998-02-17 2004-04-06 Sun Microsystems, Inc. Graphics system with a variable-resolution sample buffer
US6003083A (en) 1998-02-19 1999-12-14 International Business Machines Corporation Workload management amongst server objects in a client/server network with distributed objects
US6137918A (en) 1998-03-23 2000-10-24 Xerox Corporation Memory efficient method and apparatus to enable tagging of thin antialiased lines
US6259460B1 (en) 1998-03-26 2001-07-10 Silicon Graphics, Inc. Method for efficient handling of texture cache misses by recirculation
US6115050A (en) 1998-04-08 2000-09-05 Webtv Networks, Inc. Object-based anti-aliasing
US6356588B1 (en) 1998-04-17 2002-03-12 Ayao Wada Method for digital compression of color images
US6606093B1 (en) 1998-05-19 2003-08-12 Microsoft Corporation Method and apparatus for antialiasing by gamma corrected area calculation
US6188394B1 (en) 1998-08-28 2001-02-13 Ati Technologies, Inc. Method and apparatus for video graphics antialiasing
US6611272B1 (en) 1998-07-02 2003-08-26 Microsoft Corporation Method and apparatus for rasterizing in a hierarchical tile order
US6366289B1 (en) 1998-07-17 2002-04-02 Microsoft Corporation Method and system for managing a display image in compressed and uncompressed blocks
US6646639B1 (en) 1998-07-22 2003-11-11 Nvidia Corporation Modified method and apparatus for improved occlusion culling in graphics systems
US6480205B1 (en) 1998-07-22 2002-11-12 Nvidia Corporation Method and apparatus for occlusion culling in graphics systems
US7068272B1 (en) 2000-05-31 2006-06-27 Nvidia Corporation System, method and article of manufacture for Z-value and stencil culling prior to rendering in a computer graphics processing pipeline
WO2000011607A1 (en) 1998-08-20 2000-03-02 Apple Computer, Inc. Deferred shading graphics pipeline processor
US6771264B1 (en) 1998-08-20 2004-08-03 Apple Computer, Inc. Method and apparatus for performing tangent space lighting and bump mapping in a deferred shading graphics processor
US6219070B1 (en) 1998-09-30 2001-04-17 Webtv Networks, Inc. System and method for adjusting pixel parameters by subpixel positioning
US6434649B1 (en) 1998-10-14 2002-08-13 Hitachi, Ltd. Data streamer
US6362819B1 (en) 1998-10-16 2002-03-26 Microsoft Corporation Texture tessellation for three-dimensional models
GB2343602B (en) 1998-11-06 2003-03-19 Videologic Ltd Shading 3-dimensional computer generated images
GB2343603B (en) 1998-11-06 2003-04-02 Videologic Ltd Shading 3-dimensional computer generated images
US6359623B1 (en) 1998-11-12 2002-03-19 Hewlett-Packard Company Method and apparatus for performing scan conversion in a computer graphics display system
US6614448B1 (en) 1998-12-28 2003-09-02 Nvidia Corporation Circuit and method for displaying images using multisamples of non-uniform color resolution
US7224364B1 (en) 1999-02-03 2007-05-29 Ati International Srl Optimal initial rasterization starting point
US6323874B1 (en) 1999-02-08 2001-11-27 Silicon Graphics, Inc. System and method for rendering an image
JP3581037B2 (en) 1999-02-18 2004-10-27 シャープ株式会社 Image processing device
US6437780B1 (en) 1999-03-17 2002-08-20 Nvidia Us Investment Company Method for determining tiles in a computer display that are covered by a graphics primitive
DE19917092A1 (en) 1999-04-15 2000-10-26 Sp3D Chip Design Gmbh Accelerated method for grid forming of graphic basic element in order beginning with graphic base element instruction data to produce pixel data for graphic base element
US7064771B1 (en) 1999-04-28 2006-06-20 Compaq Information Technologies Group, L.P. Method and apparatus for compositing colors of images using pixel fragments with Z and Z gradient parameters
US6501564B1 (en) 1999-05-07 2002-12-31 Hewlett-Packard Company Tone dependent plane dependent error diffusion halftoning
EP1056047A1 (en) 1999-05-20 2000-11-29 Mannesmann VDO Aktiengesellschaft Method and apparatus for antialiased imaging of graphical objects
US6518974B2 (en) 1999-07-16 2003-02-11 Intel Corporation Pixel engine
US6429877B1 (en) 1999-07-30 2002-08-06 Hewlett-Packard Company System and method for reducing the effects of aliasing in a computer graphics system
US7152207B1 (en) * 1999-11-05 2006-12-19 Decentrix Inc. Method and apparatus for providing conditional customization for generating a web site
US6452595B1 (en) 1999-12-06 2002-09-17 Nvidia Corporation Integrated graphics processing unit with antialiasing
US6198488B1 (en) 1999-12-06 2001-03-06 Nvidia Transform, lighting and rasterization system embodied on a single semiconductor platform
US6504542B1 (en) 1999-12-06 2003-01-07 Nvidia Corporation Method, apparatus and article of manufacture for area rasterization using sense points
US6765575B1 (en) 1999-12-06 2004-07-20 Nvidia Corporation Clip-less rasterization using line equation-based traversal
US6545684B1 (en) 1999-12-29 2003-04-08 Intel Corporation Accessing data stored in a memory
US6947057B2 (en) 2000-01-11 2005-09-20 Sun Microsystems, Inc. Rendering lines with sample weighting
US6469707B1 (en) 2000-01-19 2002-10-22 Nvidia Corporation Method for efficiently rendering color information for a pixel in a computer system
US6704022B1 (en) 2000-02-25 2004-03-09 Ati International Srl System for accessing graphics data from memory and method thereof
GB0007448D0 (en) 2000-03-29 2000-05-17 Discreet Logic Inc Gamma calibration
US6523102B1 (en) 2000-04-14 2003-02-18 Interactive Silicon, Inc. Parallel compression/decompression system and method for implementation of in-memory compressed cache improving storage density and access speed for industry standard memory subsystems and in-line memory modules
US6898646B1 (en) * 2000-05-03 2005-05-24 Hewlett-Packard Development Company, L.P. Highly concurrent DMA controller with programmable DMA channels
US6828983B1 (en) 2000-05-12 2004-12-07 S3 Graphics Co., Ltd. Selective super-sampling/adaptive anti-aliasing of complex 3D data
US7119809B1 (en) 2000-05-15 2006-10-10 S3 Graphics Co., Ltd. Parallel architecture for graphics primitive decomposition
US7167259B2 (en) 2000-05-16 2007-01-23 International Business Machines Corporation System and method for merging line work objects using tokenization and selective compression
US7126600B1 (en) 2000-08-01 2006-10-24 Ati International Srl Method and apparatus for high speed block mode triangle rendering
US7039241B1 (en) 2000-08-11 2006-05-02 Ati Technologies, Inc. Method and apparatus for compression and decompression of color data
US6633297B2 (en) 2000-08-18 2003-10-14 Hewlett-Packard Development Company, L.P. System and method for producing an antialiased image using a merge buffer
US7184059B1 (en) * 2000-08-23 2007-02-27 Nintendo Co., Ltd. Graphics system with copy out conversions between embedded frame buffer and main memory
US7002591B1 (en) 2000-08-23 2006-02-21 Nintendo Co., Ltd. Method and apparatus for interleaved processing of direct and indirect texture coordinates in a graphics system
US6747663B2 (en) 2000-08-24 2004-06-08 Sun Microsystems, Inc. Interpolating sample values from known triangle vertex values
US6597356B1 (en) 2000-08-31 2003-07-22 Nvidia Corporation Integrated tessellator in a graphics processing unit
US6961057B1 (en) 2000-10-12 2005-11-01 Nvidia Corporation Method and apparatus for managing and accessing depth data in a computer graphics system
JP3522250B2 (en) 2000-10-27 2004-04-26 株式会社ソニー・コンピュータエンタテインメント Partition creation method and deletion method, recording medium recording program, and information processing apparatus
US6633197B1 (en) 2000-10-27 2003-10-14 Marvell International, Ltd. Gate capacitor stress reduction in CMOS/BICMOS circuit
US6567099B1 (en) 2000-11-15 2003-05-20 Sony Corporation Method and system for dynamically allocating a frame buffer for efficient anti-aliasing
US6900800B2 (en) 2001-02-27 2005-05-31 David Robert Baldwin Tile relative origin for plane equations
US6819332B2 (en) 2001-02-27 2004-11-16 3Dlabs Inc. Ltd. Antialias mask generation
US6919900B2 (en) * 2001-03-23 2005-07-19 Microsoft Corporation Methods and systems for preparing graphics for display on a computing device
TW493782U (en) 2001-04-03 2002-07-01 Giantplus Technology Co Ltd Pixel driving module of liquid crystal display
US7986330B2 (en) 2001-04-12 2011-07-26 International Business Machines Corporation Method and apparatus for generating gammacorrected antialiased lines
US6803916B2 (en) 2001-05-18 2004-10-12 Sun Microsystems, Inc. Rasterization using two-dimensional tiles and alternating bins for improved rendering utilization
US7009615B1 (en) 2001-11-30 2006-03-07 Nvidia Corporation Floating point buffer system and method for use during programmable fragment processing in a graphics pipeline
JP5031954B2 (en) 2001-07-25 2012-09-26 パナソニック株式会社 Display device, display method, and recording medium recording display control program
US7145577B2 (en) 2001-08-31 2006-12-05 Micron Technology, Inc. System and method for multi-sampling primitives to reduce aliasing
US6938176B1 (en) 2001-10-05 2005-08-30 Nvidia Corporation Method and apparatus for power management of graphics processors and subsystems that allow the subsystems to respond to accesses when subsystems are idle
US6788301B2 (en) 2001-10-18 2004-09-07 Hewlett-Packard Development Company, L.P. Active pixel determination for line generation in regionalized rasterizer displays
AU2002352443A1 (en) 2001-12-21 2003-07-15 Consejo Superior De Investigaciones Cientificas Compounds and their therapeutic use related to the phosphorylating activity of the enzyme gsk-3
US6747658B2 (en) * 2001-12-31 2004-06-08 Intel Corporation Automatic memory management for zone rendering
US6693637B2 (en) 2001-12-31 2004-02-17 Intel Corporation Method and apparatus for determining bins to be updated for polygons, including lines
US6836808B2 (en) 2002-02-25 2004-12-28 International Business Machines Corporation Pipelined packet processing
US7177452B2 (en) 2002-04-10 2007-02-13 Battelle Memorial Institute Visualization of information with an established order
US7308146B2 (en) 2002-09-30 2007-12-11 Canon Kabushiki Kaisha Digital video compression
US7154066B2 (en) 2002-11-06 2006-12-26 Ultratech, Inc. Laser scanning apparatus and methods for thermal processing
US7075542B1 (en) 2002-11-12 2006-07-11 Ati Technologies Inc. Selectable multi-performance configuration
US7061495B1 (en) 2002-11-18 2006-06-13 Ati Technologies, Inc. Method and apparatus for rasterizer interpolation
US7633506B1 (en) 2002-11-27 2009-12-15 Ati Technologies Ulc Parallel pipeline graphics system
US8111928B2 (en) 2003-02-13 2012-02-07 Ati Technologies Ulc Method and apparatus for compression of multi-sampled anti-aliasing color data
US7199806B2 (en) 2003-03-19 2007-04-03 Sun Microsystems, Inc. Rasterization of primitives using parallel edge units
GB0307095D0 (en) 2003-03-27 2003-04-30 Imagination Tech Ltd Improvements to a tiling system for 3d rendered graphics
US7148890B2 (en) 2003-04-02 2006-12-12 Sun Microsystems, Inc. Displacement mapping by using two passes through the same rasterizer
US7006110B2 (en) 2003-04-15 2006-02-28 Nokia Corporation Determining a coverage mask for a pixel
US6989838B2 (en) 2003-06-26 2006-01-24 Intel Corporation Methods, systems, and data structures for generating a rasterizer
US6956579B1 (en) 2003-08-18 2005-10-18 Nvidia Corporation Private addressing in a multi-processor graphics processing system
US7124318B2 (en) 2003-09-18 2006-10-17 International Business Machines Corporation Multiple parallel pipeline processor having self-repairing capability
US7081902B1 (en) 2003-09-24 2006-07-25 Nvidia Corporation Apparatus, system, and method for gamma correction of smoothed primitives
CA2442603C (en) 2003-10-01 2016-11-22 Aryan Saed Digital composition of a mosaic image
US7184040B1 (en) 2003-11-21 2007-02-27 Nvidia Corporation Early stencil test rejection
JP4522199B2 (en) 2003-11-27 2010-08-11 キヤノン株式会社 Image encoding apparatus, image processing apparatus, control method therefor, computer program, and computer-readable storage medium
US20050122338A1 (en) 2003-12-05 2005-06-09 Michael Hong Apparatus and method for rendering graphics primitives using a multi-pass rendering approach
JP4064339B2 (en) 2003-12-19 2008-03-19 株式会社東芝 Drawing processing apparatus, drawing processing method, and drawing processing program
US20050134588A1 (en) 2003-12-22 2005-06-23 Hybrid Graphics, Ltd. Method and apparatus for image processing
US7551174B2 (en) 2003-12-23 2009-06-23 Via Technologies, Inc. Method and apparatus for triangle rasterization with clipping and wire-frame mode support
US6978317B2 (en) 2003-12-24 2005-12-20 Motorola, Inc. Method and apparatus for a mobile device to address a private home agent having a public address and a private address
JP4030519B2 (en) 2004-04-15 2008-01-09 株式会社東芝 Image processing apparatus and image processing system
US6940514B1 (en) 2004-04-26 2005-09-06 Sun Microsystems, Inc. Parallel initialization path for rasterization engine
US7382368B1 (en) 2004-06-28 2008-06-03 Nvidia Corporation Planar z representation for z compression
US7307628B1 (en) 2004-08-06 2007-12-11 Nvidia Corporation Diamond culling of small primitives
US7505043B2 (en) 2004-08-30 2009-03-17 Qualcomm Incorporated Cache efficient rasterization of graphics data
US7243191B2 (en) 2004-08-31 2007-07-10 Intel Corporation Compressing data in a cache memory
US7383412B1 (en) * 2005-02-28 2008-06-03 Nvidia Corporation On-demand memory synchronization for peripheral systems with multiple parallel processors
US7668386B2 (en) 2005-03-25 2010-02-23 Microsoft Corporation Lossless compression algorithms for spatial data
US7440140B2 (en) 2005-04-29 2008-10-21 Hewlett-Packard Development Company, L.P. Sequential color error diffusion with forward and backward exchange of information between color planes
US8427496B1 (en) 2005-05-13 2013-04-23 Nvidia Corporation Method and system for implementing compression across a graphics bus interconnect
JP4218840B2 (en) 2005-05-27 2009-02-04 株式会社ソニー・コンピュータエンタテインメント Drawing processing apparatus and drawing processing method
US20060282604A1 (en) * 2005-05-27 2006-12-14 Ati Technologies, Inc. Methods and apparatus for processing graphics data using multiple processing circuits
JP4408836B2 (en) 2005-05-30 2010-02-03 キヤノン株式会社 Image processing apparatus, control method therefor, and program
US7650603B2 (en) * 2005-07-08 2010-01-19 Microsoft Corporation Resource management for virtualization of graphics adapters
TWI322354B (en) * 2005-10-18 2010-03-21 Via Tech Inc Method and system for deferred command issuing in a computer system
US7483029B2 (en) 2005-12-15 2009-01-27 Nvidia Corporation GPU having raster components configured for using nested boustrophedonic patterns to traverse screen areas
US7634637B1 (en) 2005-12-16 2009-12-15 Nvidia Corporation Execution of parallel groups of threads with per-instruction serialization
US7791617B2 (en) 2005-12-19 2010-09-07 Nvidia Corporation Method and system for rendering polygons having abutting edges
US7487466B2 (en) * 2005-12-29 2009-02-03 Sap Ag Command line provided within context menu of icon-based computer interface
US7478187B2 (en) 2006-03-28 2009-01-13 Dell Products L.P. System and method for information handling system hot insertion of external graphics
US7612783B2 (en) * 2006-05-08 2009-11-03 Ati Technologies Inc. Advanced anti-aliasing with multiple graphics processing units
US7965902B1 (en) 2006-05-19 2011-06-21 Google Inc. Large-scale image processing using mass parallelization techniques
US20070268298A1 (en) 2006-05-22 2007-11-22 Alben Jonah M Delayed frame buffer merging with compression
TW200744019A (en) 2006-05-23 2007-12-01 Smedia Technology Corp Adaptive tile depth filter
US8928676B2 (en) 2006-06-23 2015-01-06 Nvidia Corporation Method for parallel fine rasterization in a raster stage of a graphics pipeline
US7843468B2 (en) 2006-07-26 2010-11-30 Nvidia Corporation Accellerated start tile search
US9070213B2 (en) 2006-07-26 2015-06-30 Nvidia Corporation Tile based precision rasterization in a graphics pipeline
KR101186295B1 (en) 2006-10-27 2012-09-27 삼성전자주식회사 Method and Apparatus for rendering 3D graphic object
US8427487B1 (en) 2006-11-02 2013-04-23 Nvidia Corporation Multiple tile output using interface compression in a raster stage
AU2006246497B2 (en) 2006-11-30 2010-02-11 Canon Kabushiki Kaisha Method and apparatus for hybrid image compression
US7907138B2 (en) * 2006-12-29 2011-03-15 Intel Corporation System co-processor
US8063903B2 (en) 2007-11-09 2011-11-22 Nvidia Corporation Edge evaluation techniques for graphics hardware
US20110194616A1 (en) 2008-10-01 2011-08-11 Nxp B.V. Embedded video compression for hybrid contents
US20100226441A1 (en) * 2009-03-06 2010-09-09 Microsoft Corporation Frame Capture, Encoding, and Transmission Management

Patent Citations (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6289396B1 (en) * 1995-11-21 2001-09-11 Diamond Multimedia Systems, Inc. Dynamic programmable mode switching device driver architecture
US5742607A (en) * 1995-12-20 1998-04-21 Intel Corporation Method and apparatus for controlling two way communication via disparate physical media
US6337717B1 (en) * 1997-11-21 2002-01-08 Xsides Corporation Alternate display content controller
US20020067429A1 (en) * 1997-11-21 2002-06-06 Nason D. David Alternate display content controller
US6411302B1 (en) * 1999-01-06 2002-06-25 Concise Multimedia And Communications Inc. Method and apparatus for addressing multiple frame buffers
US6323875B1 (en) * 1999-04-28 2001-11-27 International Business Machines Corporation Method for rendering display blocks on display device
US6745385B1 (en) * 1999-09-01 2004-06-01 Microsoft Corporation Fixing incompatible applications by providing stubs for APIs
US20040032423A1 (en) * 1999-09-21 2004-02-19 Xsides Corporation Method and system for controlling a complementary user interface on a display surface
US6677964B1 (en) * 2000-02-18 2004-01-13 Xsides Corporation Method and system for controlling a complementary user interface on a display surface
US20020154214A1 (en) * 2000-11-02 2002-10-24 Laurent Scallie Virtual reality game system using pseudo 3D display driver
US20030014561A1 (en) * 2001-07-13 2003-01-16 Cooper Neil A. System for loading device-specific code and method thereof
US20030140179A1 (en) * 2002-01-04 2003-07-24 Microsoft Corporation Methods and system for managing computational resources of a coprocessor in a computing system
US20030131147A1 (en) * 2002-01-04 2003-07-10 Microsoft Corporation Systems and methods for managing drivers in a computing system
US20040231000A1 (en) * 2003-02-18 2004-11-18 Gossalia Anuj B. Video aperture management
US20040177338A1 (en) * 2003-03-07 2004-09-09 Microsoft Corporation Method for testing a software shim
US20060290700A1 (en) * 2003-07-15 2006-12-28 Alienware Labs. Corp. Multiple parallel processor computer graphics system
US20050012749A1 (en) * 2003-07-15 2005-01-20 Nelson Gonzalez Multiple parallel processor computer graphics system
US20080211816A1 (en) * 2003-07-15 2008-09-04 Alienware Labs. Corp. Multiple parallel processor computer graphics system
US20080136825A1 (en) * 2003-11-19 2008-06-12 Reuven Bakalash PC-based computing system employing a multi-GPU graphics pipeline architecture supporting multiple modes of GPU parallelization dynamically controlled while running a graphics application
US20050149947A1 (en) * 2003-12-10 2005-07-07 Callender Robin L. Driver-specific context for kernel-mode shimming
US20060109266A1 (en) * 2004-06-29 2006-05-25 Sensable Technologies, Inc. Apparatus and methods for haptic rendering using data in a graphics pipeline
US7598953B2 (en) * 2004-11-05 2009-10-06 Microsoft Corporation Interpreter for simplified programming of graphics processor units in general purpose programming languages
US20070094413A1 (en) * 2005-10-19 2007-04-26 Gabriel Salazar System and method for display sharing
US20070129990A1 (en) * 2005-12-01 2007-06-07 Exent Technologies, Ltd. System, method and computer program product for dynamically serving advertisements in an executing computer game based on the entity having jurisdiction over the advertising space in the game
US20070126749A1 (en) * 2005-12-01 2007-06-07 Exent Technologies, Ltd. System, method and computer program product for dynamically identifying, selecting and extracting graphical and media objects in frames or scenes rendered by a software application
US20070171222A1 (en) * 2006-01-23 2007-07-26 Autodesk, Inc. Application-independent method for capturing three-dimensional model data and structure for viewing and manipulation
US20070283175A1 (en) * 2006-05-30 2007-12-06 Ati Technologies Inc. Device Having Multiple Graphics Subsystems and Reduced Power Consumption Mode, Software and Methods
US20080012792A1 (en) * 2006-07-14 2008-01-17 Lenovo (Beijing) Limited Method for acquiring graphics device interface invocation by using filter driver
US7698579B2 (en) * 2006-08-03 2010-04-13 Apple Inc. Multiplexed graphics architecture for graphics power management
US20080042923A1 (en) * 2006-08-16 2008-02-21 Rick De Laet Systems, methods, and apparatus for recording of graphical display
US20080163263A1 (en) * 2006-12-28 2008-07-03 Legend Holdings Ltd. Method for acquisition of gdi and direct x data
US20080168479A1 (en) * 2007-01-05 2008-07-10 Thomas Joseph Purtell Bypass Virtualization
US20080316218A1 (en) * 2007-06-18 2008-12-25 Panologic, Inc. Remote graphics rendering across a network
US20090153540A1 (en) * 2007-12-13 2009-06-18 Advanced Micro Devices, Inc. Driver architecture for computer device having multiple graphics subsystems, reduced power consumption modes, software and methods
US20090172707A1 (en) * 2007-12-31 2009-07-02 S3 Graphics, Inc. Method and system for supporting multiple display devices
US20090307699A1 (en) * 2008-06-06 2009-12-10 Munshi Aaftab A Application programming interfaces for data parallel computing on multiple processors
US20100302261A1 (en) * 2009-05-29 2010-12-02 Microsoft Corporation Fixed Function Pipeline Application Remoting Through A Shader Pipeline Conversion Layer

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8259119B1 (en) * 2007-11-08 2012-09-04 Nvidia Corporation System and method for switching between graphical processing units
US9176795B2 (en) 2010-12-13 2015-11-03 Advanced Micro Devices, Inc. Graphics processing dispatch from user mode
US20120229481A1 (en) * 2010-12-13 2012-09-13 Ati Technologies Ulc Accessibility of graphics processing compute resources
US9176794B2 (en) 2010-12-13 2015-11-03 Advanced Micro Devices, Inc. Graphics compute process scheduling
US20120227057A1 (en) * 2011-03-04 2012-09-06 Microsoft Corporation Driver Shimming
US9519600B2 (en) * 2011-03-04 2016-12-13 Microsoft Technology Licensing, Llc Driver shimming
US9003363B2 (en) 2011-03-21 2015-04-07 Microsoft Technology Licensing, Llc Device flags
US20130021353A1 (en) * 2011-07-18 2013-01-24 Apple Inc. Virtual GPU
US10120728B2 (en) 2011-07-18 2018-11-06 Apple Inc. Graphical processing unit (GPU) implementing a plurality of virtual GPUs
US9727385B2 (en) * 2011-07-18 2017-08-08 Apple Inc. Graphical processing unit (GPU) implementing a plurality of virtual GPUs
CN103718156A (en) * 2011-07-29 2014-04-09 英特尔公司 CPU/GPU synchronization mechanism
US9892481B2 (en) * 2011-07-29 2018-02-13 Intel Corporation CPU/GPU synchronization mechanism
US9633407B2 (en) 2011-07-29 2017-04-25 Intel Corporation CPU/GPU synchronization mechanism
EP2802982A4 (en) * 2012-01-13 2015-09-09 Microsoft Technology Licensing Llc Para-virtualized domain, hull, and geometry shaders
WO2013106491A1 (en) 2012-01-13 2013-07-18 Microsoft Corporation Para-virtualized domain, hull, and geometry shaders
CN104040494A (en) * 2012-01-13 2014-09-10 微软公司 Para-virtualized domain, hull, and geometry shaders
US9613390B2 (en) * 2012-05-02 2017-04-04 Nvidia Corporation Host context techniques for server based graphics processing
US9542715B2 (en) 2012-05-02 2017-01-10 Nvidia Corporation Memory space mapping techniques for server based graphics processing
US9311169B2 (en) 2012-05-02 2016-04-12 Nvidia Corporation Server based graphics processing techniques
US20140055470A1 (en) * 2012-05-02 2014-02-27 Nvidia Corporation Host Context Techniques for Server Based Graphics Processing
US9805439B2 (en) 2012-05-02 2017-10-31 Nvidia Corporation Memory space mapping techniques for server based graphics processing
US20150009221A1 (en) * 2013-07-05 2015-01-08 Nvidia Corporation Direct interfacing of an external graphics card to a data processing device at a motherboard-level
US9117392B2 (en) * 2013-07-05 2015-08-25 Nvidia Corporation Direct interfacing of an external graphics card to a data processing device at a motherboard-level
US10657698B2 (en) * 2017-06-22 2020-05-19 Microsoft Technology Licensing, Llc Texture value patch used in GPU-executed program sequence cross-compilation
CN111736913A (en) * 2019-03-25 2020-10-02 华为技术有限公司 Class loading method and device
US11755341B2 (en) 2019-03-25 2023-09-12 Huawei Technologies Co., Ltd. Class loading method and apparatus

Also Published As

Publication number Publication date
US20110063302A1 (en) 2011-03-17
US20110063304A1 (en) 2011-03-17
US8773443B2 (en) 2014-07-08
US20110063305A1 (en) 2011-03-17
US20110063306A1 (en) 2011-03-17
US20110063309A1 (en) 2011-03-17

Similar Documents

Publication Publication Date Title
US8773443B2 (en) Compression for co-processing techniques on heterogeneous graphics processing units
US8780122B2 (en) Techniques for transferring graphics data from system memory to a discrete GPU
US20110169844A1 (en) Content Protection Techniques on Heterogeneous Graphics Processing Units
CN112269603B (en) Graphic display method and device for compatibly running Android application on Linux
US11798123B2 (en) Mechanism to accelerate graphics workloads in a multi-core computing architecture
JP6073533B1 (en) Optimized multi-pass rendering on tile-based architecture
KR101564859B1 (en) Memory copy engine for graphics processing
US8941670B2 (en) Para-virtualized high-performance computing and GDI acceleration
US10559112B2 (en) Hybrid mechanism for efficient rendering of graphics images in computing environments
JP2003233508A (en) Method for controlling calculation resource in coprocessor in computing system and computing device
JP7253507B2 (en) Early virtualization context switch for virtualization-accelerated processing devices
JP2013546043A (en) Instant remote rendering
JP2006190281A (en) System and method for virtualizing graphic subsystem
US20140198112A1 (en) Method of controlling information processing apparatus and information processing apparatus
US11232535B2 (en) Systems and methods for using EGL with an OpenGL API and a Vulkan graphics driver
EP2802982B1 (en) Para-virtualized domain, hull, and geometry shaders
CN113515396B (en) Graphics rendering method, graphics rendering device, electronic equipment and storage medium
EP3357034B1 (en) Graphics processing unit preemption with pixel tile level granularity
WO2012147364A1 (en) Heterogeneous graphics processor and configuration method thereof
US10089264B2 (en) Callback interrupt handling for multi-threaded applications in computing environments
US20110157189A1 (en) Shared buffer techniques for heterogeneous hybrid graphics
CN108604185B (en) Method and apparatus for efficiently submitting workload to a high performance graphics subsystem
CN114528090A (en) Vulkan-based method for realizing graphic rendering and related device
US20220129308A1 (en) Gang scheduling for low-latency task synchronization
AU2016203532B2 (en) Parallel runtime execution on multiple processors

Legal Events

Date Code Title Description
AS Assignment
Owner name: NVIDIA CORPORATION, CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:TROCCOLI, ALEJANDRO;DIARD, FRANCK;REEL/FRAME:023718/0340
Effective date: 20091222
STCB Information on status: application discontinuation
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION