WO2009158679A2 - Shader interfaces - Google Patents
- Publication number: WO2009158679A2 (application PCT/US2009/048960)
- Authority: WO, WIPO (PCT)
- Prior art keywords: shader, instance, media, bytecode, registers
Classifications
- G06T15/005: General purpose rendering architectures
- G06F8/4441: Reducing the execution time required by the program code
- G06F9/449: Object-oriented method invocation or resolution
- G06F9/541: Interprogram communication via adapters, e.g. between incompatible applications
- G06T1/20: Processor architectures; Processor configuration, e.g. pipelining
- G06T15/80: Shading
- G06T2200/28: Indexing scheme for image data processing or generation, in general involving image processing hardware
Definitions
- Simple shaders 202, 204, 206, and 208 are allocated to available blocks 214, 216, 218, and 220.
- Complex shaders are allocated to available blocks 222 and 224.
- Available blocks allocated to simple shaders (i.e., blocks 214, 216, 218, and 220) contain fewer registers than the available blocks allocated to complex shaders (i.e., blocks 222 and 224).
- HLSL contains object-oriented constructs that allow for the grouping of functions and independent resources like variables and textures into classes.
- interfaces can be declared to define a template from which multiple classes can be instantiated.
- the classes that inherit from an interface define the implementations that are to be linked using dynamic linkage.
- the developer creates an interface variable and the defined methods of that interface can be used without reference to the possible class inferences.
- a single shader implementation is selected and built into the shader.
- a particular point in the HLSL code where an interface is used defines a place where all implementations are inlined, and all implementation bodies are inserted into the shader. Then, when the shader is actually running, long after compilation, a particular implementation is chosen to execute.
- The following code shows how an actual class instance can be selected and method calls replaced with shader code.
- lines 14-24 show IL for DirectionalLight's implementation of Calculate, optimized for the call site at line 27.
- an Fcall instruction indicates an array element and defines the call site for the Calculate routine for a variable, MyMaterial.
- the first parameter indicates the interface table that is to be used (fp0).
- the first bracketed index defines the method index. In this case, there is only one call site so the index is one.
- the second bracketed index defines the index of the call site being executed. In this case there is only one invocation of the Calculate method, so this index is zero. "in" and "out" indicate the registers that are utilized by this call.
- the first "in" parameter always refers to the place in constant memory where the class instance variables are stored, in this case cb14, element zero.
- the fcall instruction refers to the method to be called by providing an index, but does not define the exact implementation to call.
- code is emitted up to the fcall routine and the current state of the registers and other shader states are partially cached and restored around implementation generation.
- the code for the first implementation is generated starting with the current state of register allocation, scratch registers, etc.
- once this generation step is complete, the state is restored to the cached state, and the generation is repeated for the next possible implementation. This cycle repeats until all implementations have code generated.
- the current state is restored to the cached state and the impacts of the outputs of the fcall are applied to the current state, and code generation continues after the fcall.
- the resulting methods are emitted up to the fcall routine and the current state of the registers and other shader states are partially cached and restored around implementation generation.
- a list of objects is obtained by providing the names of the HLSL class instances for the shader to the API that references class instances.
- some embodiments may change states in between sets of primitives, making the class instances only changeable between batches of primitives. This can be seen in the following code.
- Line 1 illustrates a routine, CreatePixelShader, that is provided a string parameter that contains the compiled shader bytecode in pShaderCode, a pointer to an API that references class instances (pMyClassLibrary), and a pointer to a pointer to a pixel
- Line 8 shows code for selecting what call instance to use based on the global input DirectionalLighting. Based on the selection made in line 8, a call is made with the
- a shader object in pMyPS along with one of the two possible class instances that can be applied to the HLSL variable MyInstance.
- the final argument indicates the length of the array provided in the second argument as there might be more than one interface to resolve in any one shader.
- a call is shown in line 13 to a function for rendering the geometry of a scene.
Abstract
Allocation of memory registers for shaders by a processor is described herein. For each shader, registers are allocated based on the shader's level of complexity. Simpler shader instances are restricted to a smaller number of memory registers. More complex shader instances are allotted more registers. To do so, the developers' high level shading language (HLSL) includes template classes of shaders that can later be replaced by complex or simple versions of the shader. The HLSL is converted to bytecode that can be used to rasterize pixels on a computing device.
Description
SHADER INTERFACES
BACKGROUND
[0001] Today's graphic processing units (GPUs) host all of the computations necessary to generate high-quality graphics on computer screens, leaving a computing device's central processing unit (CPU) available for other tasks. Specifically, GPUs render graphics on computer screens by processing numerous programs called "shaders." In short, a shader is a specialized computer program that performs an operation for rendering a two-dimensional (2D) or three-dimensional (3D) graphic. In modern GPUs, realistic scenes are generated by rendering geometry with various virtual materials that are controlled by the shaders. These materials are represented in shader program code, which processes a variety of inputs (including texture maps, light locations, and other data) to generate the visual result. Using shaders, developers can control virtually any graphics or graphic effect by incorporating different vertex shading, primitive shading, and pixel shading.
[0002] The current methodology for rendering complex 3D graphic scenes in real time consists of supporting parallel-architecture processors in conjunction with customized logic units to hide latency by distributing the overhead across multiple parallel units. The pipelines utilized are designed around a primitive rasterization pipeline that, when provided a high level 3D description of a collection of linear primitives like points, line segments, or triangles, will convert, or rasterize, the collection to the projected pixel representations. In existing 3D hardware technologies, small programs called "shaders" are used to define the operation of certain stages of the rendering algorithm, like the transformations of the vertices of the primitives or computing the color of a single pixel on the screen. The shaders define a small amount of work to be performed in large parallel execution batches, often distributed across many specialized processors on a graphics processing unit (GPU).
[0003] Creation of shaders is done through a highly specialized programming language designed to target the hardware architectures available, and an equivalent compiler is available to take the code and reduce it down to instructions the hardware and associated device driver can use. Developers use this technology in order to customize the rendering pipeline to only the behavior desired for a specific application. For example, if the developer is creating an application that performs a non-photorealistic 3D rendering of very complex themes, the developer can optimize the shaders to be very simple in order to maximize the complexity of the scene. Conversely, if the developer wishes to have very high-fidelity material properties and lighting applied to less complex scenes, the developer may create highly-customized shaders to create very realistic effects that may be very complex. Furthermore, shaders are compiled into an abstract binary form, which a device driver maps for hardware to run.
[0004] To illustrate this point, consider a game scene in which a character is exposed to multiple light sources. One of the light sources may be simply ambient light from a moon at night. Another light source may be extending from a lamp post down a street. With the first light source, that being from the moon, a shader can be written to control the light emitting from the moon. In this case, the light is constant and only needs to be represented by a simple program to disperse the light throughout the scene. The lamp post, however, may be more complex. With the lamp post, the light may be configured to shine only in specific directions, and the light from the lamp post may not bend around corners. Therefore, a shader written to govern the light from the lamp post may require a more complex computation than the shader written to govern the light coming from the moon. In either scenario, a GPU must rasterize pixels according to the underlying computations from each shader.
[0005] The common architecture for GPUs provides the trade-off between scene complexity and shader complexity by making resources on the system flexible. To execute, the shader typically requires a processing unit, shader-instanced data, global resources (e.g., texture images), intermediary register banks to perform computations, and a set of output registers. For simple shaders, meaning shaders that require relatively few registers to compute, many more shaders can be run simultaneously, resulting in an underlying application or game getting higher frame rates because more work can be done in parallel. For more complex shaders, meaning shaders that require more registers to compute, fewer instances can be executed in parallel because more registers are being used. In other words, the allocation of registers directly determines the number of shaders that can be processed in parallel. Because the time required to render graphics depends on parallel processing of shaders, it is advantageous to process as many shaders as possible, and thus the allocation of registers is crucial to performance.
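The relationship in [0005] between per-shader register demand and parallelism can be illustrated with simple arithmetic. The following is a minimal sketch in Python rather than in a shading language; the register file size of 1024 and the per-shader register counts are hypothetical figures chosen only for illustration, and real hardware applies further constraints.

```python
def max_parallel_instances(register_file_size: int, registers_per_instance: int) -> int:
    """Return how many shader instances fit in a register file at the same time."""
    if registers_per_instance <= 0:
        raise ValueError("registers_per_instance must be positive")
    # Each running instance needs its own registers, so integer division
    # gives the number of instances that can be resident simultaneously.
    return register_file_size // registers_per_instance


REGISTER_FILE_SIZE = 1024  # hypothetical size, for illustration only

simple_count = max_parallel_instances(REGISTER_FILE_SIZE, 8)    # a "simple" shader
complex_count = max_parallel_instances(REGISTER_FILE_SIZE, 64)  # a "complex" shader
print(simple_count, complex_count)
```

Under these assumed numbers, the simple shader allows eight times as many instances in flight as the complex one, which is the trade-off the paragraph describes.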
SUMMARY
[0006] This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
[0007] One aspect of the invention allows a single shader to have alternate paths of varying complexity within it. In this aspect, a selection of a particular path through the shader is provided in a way that allows efficient register allocation. Memory registers are allocated based on the level of complexity of a shader path or instance. In one embodiment, shader instances are shader programs, or portions thereof, developed by shader developers. Simpler shader paths or instances may be restricted to a smaller number of memory registers. More complex shader paths or instances may be allotted more registers. Another aspect of the invention is directed to a GPU that is configured to allocate memory registers in such a manner.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
[0008] The present invention is described in detail below with reference to the attached drawing figures, wherein:
[0009] FIG. 1 is a block diagram of an exemplary operating environment for use in implementing an embodiment of the present invention;
[0010] FIG. 2 is a diagram illustrating the allocation of memory registers of a computing device in accordance with an embodiment of the invention; and
[0011] FIG. 3 is a flow chart for allocating registers based on shader complexity in accordance with an embodiment of the invention.
DETAILED DESCRIPTION
[0012] The subject matter described herein is presented with specificity to meet statutory requirements. The description herein, however, is not intended to limit the scope of this patent. Rather, it is contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the term "block" may be used herein to connote different elements of methods employed, the term should not be interpreted as implying any particular order among or between various steps herein disclosed.
[0013] Further, various technical terms are used throughout this description. A definition of such terms can be found in Newton's Telecom Dictionary by H. Newton, 21st Edition (2005). These definitions are intended to provide a clearer understanding of the ideas disclosed herein but are not intended to limit the scope of the present invention. The definitions and terms should be interpreted broadly and liberally to the extent allowed by the meaning of the words offered in the above-cited reference.
[0014] The invention can generally be described as one or more systems for, methods to, and computer-storage media for providing dynamic code linkage to optimally allocate registers for shaders based on their level of complexity. In one embodiment, developers can create their own shader classes and registers are allocated for each shader based on complexity. Simpler shaders are allocated fewer registers, and complex shaders are allocated additional registers.
[0015] As one skilled in the art will appreciate, embodiments of the present invention may be embodied as, among other things: a method, system, or computer-program product that is embodied on one or more tangible computer-readable media. Accordingly, the embodiments may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware.
[0016] Dynamic code binding provides a mechanism for abstracting the implementation of a function from the consumers of the function, offered by providing a level of indirection between a function call and implementation. Traditionally, this indirection is performed by first looking into a virtual function table to find the location of the function to execute. When the application is executed, the table, which was previously empty, is filled with the locations of the function implementations (the actual act of "linking"), therefore allowing the application to actually execute the functions as needed. In one embodiment, a subset of dynamic linkage is provided that reduces the number of permutations of specialized shaders while still offering global optimizations across the abstraction boundary.
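The traditional indirection described in [0016] can be sketched as follows. This is an illustrative Python model, not shader code and not any real linker's API: the FunctionTable class, the ambient_light and directional_light functions, and their scale factors are hypothetical names and values introduced only for this example.

```python
from typing import Callable, Dict, Iterable, Optional


def ambient_light(intensity: float) -> float:
    """Hypothetical simple implementation."""
    return intensity * 0.2


def directional_light(intensity: float) -> float:
    """Hypothetical more involved implementation."""
    return intensity * 0.9


class FunctionTable:
    """A table of named slots that starts empty and is filled at link time."""

    def __init__(self, slot_names: Iterable[str]) -> None:
        self._slots: Dict[str, Optional[Callable[[float], float]]] = {
            name: None for name in slot_names
        }

    def link(self, name: str, implementation: Callable[[float], float]) -> None:
        """Fill a slot with the location of an implementation ("linking")."""
        if name not in self._slots:
            raise KeyError(f"unknown slot: {name}")
        self._slots[name] = implementation

    def call(self, name: str, intensity: float) -> float:
        """Look up the slot first, then execute whatever is linked there."""
        implementation = self._slots[name]
        if implementation is None:
            raise RuntimeError(f"slot {name} has not been linked")
        return implementation(intensity)


table = FunctionTable(["Calculate"])
table.link("Calculate", directional_light)
print(table.call("Calculate", 1.0))
```

The caller only names the slot; which body runs depends on what was linked, which is the level of indirection the paragraph refers to.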
[0017] Instead of providing a single compiled implementation for each abstract function, embodiments generate compiled code in such a way that each use of a specific instance of a shader is compiled as if it were inlined in the code and then stored in a table sorted by function type and call location. It is important to understand that embodiments described herein differ from typical linkage in that at runtime no calling convention is used. Instead, each time a function should be called, a version of the function is emitted to match the call site's register state and other state. Because a new version of the function is emitted for each location in the shader code that the function is called from, all optimizations used when inlining apply, except that the function code must remain functionally separate from main shader code. Because embodiments described herein differ from "real" linkage, the amount of code generated by embodiments described herein can become quite large. No code sharing occurs between multiple call sites. Performance can suffer if the code is larger than the code cache and the penalty from the latency of the cache miss is not hidden.
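The table of per-call-site versions described in [0017] can be modeled roughly as below. This Python sketch is an assumption-laden illustration: emit_specialized stands in for real code generation by producing a descriptive string, and the AmbientLight class name, the call-site numbering, and the register names are hypothetical.

```python
from typing import Dict, List, Tuple


def emit_specialized(impl_name: str, method: str, call_site: int,
                     live_registers: List[str]) -> str:
    """Stand-in for code generation: record which call-site state the body matches."""
    return f"{impl_name}.{method}@site{call_site}[{','.join(live_registers)}]"


def build_call_site_table(
    implementations: List[str],
    method: str,
    call_sites: Dict[int, List[str]],
) -> Dict[Tuple[str, int], Dict[str, str]]:
    """Emit one body per (method, call site, implementation); nothing is shared."""
    table: Dict[Tuple[str, int], Dict[str, str]] = {}
    for site, live in sorted(call_sites.items()):
        table[(method, site)] = {
            impl: emit_specialized(impl, method, site, live)
            for impl in implementations
        }
    return table


# Two hypothetical call sites with different live registers.
sites = {0: ["r0", "r1"], 1: ["r2"]}
table = build_call_site_table(["AmbientLight", "DirectionalLight"], "Calculate", sites)
emitted = sum(len(bodies) for bodies in table.values())
print(emitted)
```

With two implementations and two call sites, four bodies are emitted where conventional linkage would compile two, which illustrates why the generated code can become large.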
[0018] Selectable inlining is used by some of the embodiments described herein. Selectable inlining allows a system to generate a shader instantiation that is not only close to the optimal instruction usage, but also utilizes the minimum needed registers per invocation for a given task. In this embodiment, the total number of registers needed by a specific shader invocation can be calculated quickly by the device driver and allocated accordingly. This keeps very complex calculations from affecting register usage unless those calculations are actually being performed.
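One way a driver might compute the register total mentioned in [0018] is sketched below. The text does not specify the calculation, so this Python example assumes that scratch registers are reused between call sites, making the total the base requirement plus the largest requirement among the selected implementations. All names and register counts are hypothetical.

```python
from typing import Dict


def registers_for_invocation(
    base_registers: int,
    per_site_costs: Dict[int, Dict[str, int]],
    selection: Dict[int, str],
) -> int:
    """Registers needed when only the selected implementation runs at each call site.

    Assumption: registers used inside one call site are free again at the next,
    so the peak, not the sum, of the selected costs is added to the base.
    """
    peak_extra = 0
    for site, costs in per_site_costs.items():
        peak_extra = max(peak_extra, costs[selection[site]])
    return base_registers + peak_extra


# Hypothetical costs for a single call site (index 0).
costs = {0: {"AmbientLight": 2, "DirectionalLight": 12}}

print(registers_for_invocation(4, costs, {0: "AmbientLight"}))
print(registers_for_invocation(4, costs, {0: "DirectionalLight"}))
```

Selecting the simple implementation keeps the expensive implementation's registers out of the total, which matches the stated goal that complex calculations affect register usage only when they are actually performed.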
[0019] In order to maintain optimization, embodiments are configured to emit a different version of each used method per call site, acting as if the method were inline to allow for optimizations across the method-call boundary. This has a trade-off in space: unlike standard linkage, which creates only one compiled version of each method, embodiments described herein create many, potentially causing larger binary files.
[0020] Embodiments described herein generally reference the Direct3D APIs included within various versions of the Windows® operating system (OS). Embodiments are not limited to Direct3D APIs. One skilled in the art will understand that various APIs in different OSs provide similar functionality to the calls and routines described herein. For clarity's sake, however, reference is made herein to Direct3D.
[0021] Before proceeding further, a number of key terms should be defined. While the below definitions should aid the reader in understanding the embodiments described herein, the definitions are provided merely for explanatory purposes.
[0022] A "primitive" is a basic unit for describing a shape. In computer graphics, the triangle is typically considered the fundamental primitive because all possible 2D and 3D shapes can be composed of triangles. As one skilled in the art will appreciate, other shapes may alternatively be used as primitives in rendering graphics.
[0023] A "shader" is a small, specialized computer program that performs some aspect of a rendering computation. Shaders are responsible for a number of major aspects in the typical rendering pipeline. These aspects include, inter alia, vertex shading, primitive (or geometry) shading, and pixel shading. One skilled in the art will understand that vertex shading refers to determining the position and orientation of the vertices of a primitive (e.g., where to place the vertices of a triangle in 2D so the triangle appears to be in 3D). Primitive shading describes surface operations for a single primitive. And pixel shading colors each pixel based on a rendered primitive, in order to draw the primitive to the screen.
[0024] Shaders may be developed, or programmed, to handle virtually any aspect of a gaming experience. For example, a shader may be written to govern the reflection of light off of a character's skin, based on the color of the character, time, lighting, or other relevant variable. As previously mentioned, some shaders are relatively simple; whereas, others may require more complex computations. Simple shaders may, in some embodiments, require fewer memory registers to process than complex shaders.
[0025] A "rasterizer" is a component that takes an image made up of high-order primitives, such as lines, points, and triangles, and converts the image into a raster image (i.e., pixels) for output on a video display. A raster image is a bitmap representation of the primitives with color. In one embodiment, a rasterizer is software, executed by a GPU, that is configured to color pixels according to primitives produced by shaders.
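To make the notion of converting a primitive into pixels concrete, the following Python sketch marks which pixels of a small grid are covered by one 2D triangle. It uses the common edge-function coverage test as an assumed technique; the text does not state which method a given rasterizer uses, and the function names and the sample triangle are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def edge(a: Point, b: Point, p: Point) -> float:
    """Signed area term: positive on one side of edge a->b, negative on the other."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize_triangle(v0: Point, v1: Point, v2: Point,
                       width: int, height: int) -> List[Tuple[int, int]]:
    """Return the (x, y) pixels whose centers lie inside or on the triangle."""
    covered: List[Tuple[int, int]] = []
    for y in range(height):
        for x in range(width):
            center = (x + 0.5, y + 0.5)  # sample each pixel at its center
            w0 = edge(v1, v2, center)
            w1 = edge(v2, v0, center)
            w2 = edge(v0, v1, center)
            # Accept either winding order: all three terms share a sign.
            all_non_negative = w0 >= 0 and w1 >= 0 and w2 >= 0
            all_non_positive = w0 <= 0 and w1 <= 0 and w2 <= 0
            if all_non_negative or all_non_positive:
                covered.append((x, y))
    return covered


pixels = rasterize_triangle((0.0, 0.0), (4.0, 0.0), (0.0, 4.0), 4, 4)
print(len(pixels))
```

A pixel shader would then be run for each covered pixel to decide its color; that step is omitted here.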
[0026] Direct3D is an application program interface (API) provided within Windows® for rendering 2D and 3D scenes. Direct3D includes a primitive rasterizer with programmable stages that allows developers within the Windows® platform to load customized programs onto a GPU for rendering. Numerous versions of Direct3D are currently in use, and therefore should be understood by one of skill in the art.
[0027] A High Level Shading Language (HLSL) is a variant on a programming language (e.g., C, C++, C#, Java, or the like) designed for developing shaders. Specifically, developers program shaders in HLSL. In operation, HLSL is compiled into an intermediary language (IL) for use with a graphics application.
[0028] An IL is a low-level, instruction-based, binary representation of the operations a shader stage should perform. It acts as an intermediate step between the optimizing compiler and the native instruction set of the graphics hardware. Thus, the IL translates the instructions designated by a developer using an HLSL into the bytecode needed by the graphics hardware (i.e., the GPU) in order to render graphics.
[0029] Just-in-time (JIT) refers to a fast-process compilation performed on the shader IL to convert the shader into the native instruction set of the graphics hardware. JIT simply refers to the point in time when actions occur in programming and processing. One skilled in the art will understand how JIT compilation works in other programmatic languages, such as Python and C#.
[0030] In one embodiment, the present invention takes the form of a computer-program product that includes computer-useable instructions embodied on one or more computer-readable media. Computer-readable media include both volatile and nonvolatile media as well as removable and nonremovable media. By way of example, and not limitation, computer-readable media comprise computer-storage media. Computer-storage media, or machine-readable media, include media implemented in any method or technology for storing information.
[0031] Examples of stored information include computer-useable instructions, data structures, program modules, and other data representations. Computer-storage media include, but are not limited to, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory used independently from or in conjunction with different storage media, such as, for example, compact-disc read-only memory (CD-ROM), digital versatile discs (DVD), holographic media or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. These memory components can store data momentarily, temporarily, or permanently.
[0032] Having briefly described a general overview of the embodiments described herein, an exemplary computing device is described below. Referring initially to FIG. 1 in particular, an exemplary operating environment for implementing the present invention is shown and designated generally as computing device 100. Computing device 100 is but one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should computing device 100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated. In one embodiment, computing device 100 is a conventional computer (e.g., a personal computer or laptop).
[0033] One embodiment of the invention may be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program modules, being executed by a computer or other machine. Generally, program modules including routines, programs, objects, components, data structures, and the like refer to code that performs particular tasks or implements particular abstract data types. Embodiments described herein may be practiced in a variety of system configurations, including hand-held devices, consumer electronics, general-purpose computers, more specialty computing devices, etc. Embodiments described herein may also be practiced in distributed computing environments where tasks are performed by remote-processing devices that are linked through a communications network.
[0034] With continued reference to FIG. 1, computing device 100 includes a bus 110 that directly or indirectly couples the following devices: memory 112, one or more processors 114, one or more presentation components 116, input/output ports 118, input/output components 120, and an illustrative power supply 122. Bus 110 represents what may be one or more busses (such as an address bus, data bus, or combination thereof). Although the various blocks of FIG. 1 are shown with lines for the sake of clarity, in reality, delineating various components is not so clear, and metaphorically, the lines would more accurately be gray and fuzzy. For example, one may consider a presentation component such as a display device to be an I/O component. Also, processors have memory. It will be understood by those skilled in the art that such is the nature of the art, and, as previously mentioned, the diagram of FIG. 1 is merely illustrative of an exemplary computing device that can be used in connection with one or more embodiments of the present invention. Distinction is not made between such categories as "workstation," "server," "laptop," "hand-held device," etc., as all are contemplated within the scope of FIG. 1 and reference to "computing device."
[0035] Computing device 100 typically includes a variety of computer-readable media. By way of example, and not limitation, computer-readable media may comprise RAM; ROM; EEPROM; flash memory or other memory technologies; CD-ROM, DVD or other optical or holographic media; magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices; or similar tangible media that are configurable to store data and/or instructions relevant to the embodiments described herein.
[0036] Memory 112 includes computer-storage media in the form of volatile and/or nonvolatile memory. The memory may be removable, nonremovable, or a combination thereof. Exemplary hardware devices include solid-state memory, hard drives, cache, optical-disc drives, etc. Computing device 100 includes one or more processors that read data from various entities such as memory 112 or I/O components 120. Presentation component(s) 116 present data indications to a user or other device. Exemplary presentation components include a display device, speaker, printing component, vibrating component, etc.
[0037] I/O ports 118 allow computing device 100 to be logically coupled to other devices including I/O components 120, some of which may be built in. Illustrative components include a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, etc.
[0038] Computing device 100 also includes a GPU 124 capable of simultaneously processing multiple shaders in parallel threads. To do so, the GPU 124 may be equipped with various device drivers and, in actuality, comprise multiple processors.
[0039] FIG. 2 is a diagram illustrating the allocation of memory registers of a computing device in accordance with an embodiment of the invention. As shown in FIG. 2, a plethora of memory registers 200 are available on the computing device. The same block of registers 200 is presented side-by-side to illustrate how the memory is allocated for both simple shader instances 202, 204, 206, and 208 and complex shader instances 210 and 212. Instances may include either programmatic representations of the shaders or bytecode renditions of the shaders. Moreover, the allocation of registers 200 may be performed by a GPU or CPU on the computing device.
[0040] Simple shader instances 202, 204, 206, and 208 are allocated to available blocks 214, 216, 218, and 220. Complex shader instances 210 and 212 are allocated to available blocks 222 and 224. Available blocks allocated to simple shaders (i.e., blocks 214, 216, 218, and 220) contain fewer registers than the available blocks allocated to complex shaders (i.e., blocks 222 and 224).
[0041] In one embodiment, the blocks of registers are allocated in code by designating two separate shaders for scenarios (simple and complex) and loading the appropriate scenario whenever necessary. To do so, pointers may be set for a given shader.
[0042] FIG. 3 is a flow chart for allocating registers based on shader complexity in accordance with an embodiment of the invention. Initially, interfaces are declared in HLSL defining a template from which multiple shader classes can be instantiated, as indicated at 302. A variable for inlining a shader implementation is also defined in a shader program, as indicated at 304. In one embodiment, a shader implementation is a routine or sub-routine. Actual instances are designated, as indicated at 306, by pointing to a table storing data about the shader implementation. Method calls in the shader program are then replaced with the actual class instances (routines or subroutines), as indicated at 308. To illustrate the aforesaid steps, code that can integrate with Direct3D is presented below and discussed in detail.
[0043] Direct3D contains a number of discrete shader stages, each meant for a separate purpose in the rendering pipeline. These six stages create a rendering pipeline where the developer writes code to control the operation of each shader stage. To target these stages, the developer uses HLSL and the associated HLSL compiler, which converts HLSL code into optimized shader bytecode. As previously mentioned, bytecode is a low-level representation of compiled HLSL code for use by a graphics device driver in the graphics hardware.
[0044] In one embodiment, HLSL contains object-oriented constructs that allow for the grouping of functions and independent resources like variables and textures into classes. In this paradigm, interfaces can be declared to define a template from which multiple classes can be instantiated. When defined in this way, the classes that inherit from an interface define the implementations that are to be linked using dynamic linkage. In order to define a point in a shader program into which one of the implementations can be inserted, the developer creates an interface variable, and the defined methods of that interface can be used without reference to the possible class instances. In one embodiment, a single shader implementation is selected and built into the shader. In an alternative embodiment, a particular point in the HLSL code where an interface is used defines a place where all implementations are inlined, and all implementation bodies are inserted into the shader. Then, when the shader is actually running, long after compilation, a particular implementation is chosen to execute.
[0045] The following code shows how an actual class instance can be selected and method calls replaced with shader code.
1 interface Light
2 {
3     float3 Calculate(float3 Position, float3 Normal);
4 }
5
6 class AmbientLight : Light
7 {
8     float3 Calculate(float3 Position, float3 Normal)
9     {
10        return AmbientValue;
11    }
12
13    float3 AmbientValue;
14 }
15
16 class DirectionalLight : Light
17 {
18    float3 Calculate(float3 Position, float3 Normal)
19    {
20        float3 LightDir = normalize(Position - LightPosition);
21        float LightContrib = saturate( dot( Normal, -LightDir) );
22        return LightColor * LightContrib;
23    }
24
25    float3 LightPosition;
26    float3 LightColor;
27 }
28
29 AmbientLight MyAmbient;
30 DirectionalLight MyDirectional;
31
32 float4 main (Light MyInstance, float3 CurPos: CurPosition,
33     float3 Normal : Normal) : SV_Target
34 {
35     float4 Ret;
36     Ret.xyz = MyInstance.Calculate(CurPos, Normal);
37     Ret.w = 1.0;
38
39     return Ret;
40 }
The above example is written in HLSL, and the actual representation of bytecode is in binary. An explanation of the above code is presented in the following paragraphs.
[0046] Lines 1-4 define an interface called Light, which is the parent interface of classes defined in the example. In line 3, a prototype for the Calculate method is defined. Calculate must be implemented by any subclass of Light. Lines 6-14 define AmbientLight, which is a simple implementation of the Light interface (i.e., a simple shader definition). Lines 18-23 show DirectionalLight's implementation of the Calculate method, with a signature that is identical to Light::Calculate but with code more complex than AmbientLight::Calculate (i.e., a complex shader definition).
[0047] Lines 25-26 depict local class variables needed for the operation of DirectionalLight. Lines 29-30 show class instance definitions for two variables: MyAmbient (a simple shader definition) and MyDirectional (a complex shader definition). These variables act as binding points to a rendering pipeline and identify the possible implementations that can be selected for use in place of the MyInstance variable described below.
[0048] Lines 32-40 show the main shader portion of the program. The first argument is a generic interface variable of type Light. At the point that Light is used, a special invocation instruction is inserted (in one embodiment). As a result, all implementation bodies will appear in the shader bytecode. Tables may then link the invocation to the bodies it might execute. The remaining parameters are standard rendering pipeline variables used in a standard Direct3D shader.
[0049] In operation, the above code may be written in HLSL code and sent to an HLSL compiler for conversion into bytecode. The bytecode will, in turn, be provided to a driver on the GPU to be set as the program for the shader stage described in the HLSL code. In previous versions of Direct3D, this bytecode consisted of a low-level description of the inputs, outputs, and dependent resources needed by the shader and assembly-style instructions that define the operation of the shader stage. With respect to embodiments described herein, the Direct3D bytecode further includes sub-routines that define inputs, outputs, and operational instructions for the sub-routine, and tables that define the usage points of abstract interface methods and which of the method definitions can be inlined into various points in the shader. It is important to note that the example below is written in semi-readable text called disassembly, and the actual representation of bytecode is in binary.
[0050] The bytecode is meant to represent a highly optimized, expressive definition of the state and expected execution of the shader stage. In versions of Direct3D prior to Direct3D 10, this bytecode matched the exact instructions that could be executed on the graphics hardware. Because of divergent architectures in Direct3D 10 (e.g., pipeline emulation on the CPU), the bytecode was revised to instead provide an intermediate representation, the IL. In one embodiment, device drivers for a GPU are provided the IL and convert the code to the proper native instructions in a JIT process. In another embodiment, the IL is designed in such a way that the JIT operation can be performed with minimal need for optimization or reformatting. Additionally, separate class instances may have the JIT operation performed on them at the creation of the shader definition, rather than when linkage occurs, in order to ensure that at link time the linkage can be performed as a trivial inline. The aforesaid can be seen in the exemplary code below. It should be noted that the code presented below is not meant to limit embodiments of the present invention. Other code may alternatively be used.
1 dcl_func_table ft0 { fb0 }
2
3 dcl_func_table ft1 { fb1 }
4
5 dcl_func_ptr fp0[1][1] = { ft0, ft1 };
6
7 fb0: in:(const iv0.xyzw,
8          const iv1.xyzw,
9          const iv2.xyzw),
10      out:(oo0.xyzw)
11   mov oo0, cb[iv0.x][iv0.y]
12   ret
13
14 fb1: in:(const iv0.xyzw,
15          const iv1.xyzw,
16          const iv2.xyzw),
17      out:(oo0.xyzw)
18   add r0.xyz, iv1.xyzx, -cb[iv0.x][iv0.y].xyzx
19   dp3 r0.w, r0.xyzx, r0.xyzx
20   rsq r0.w, r0.w
21   mul r0.xyz, r0.xyzx, r0.wwww
22   dp3_sat r0.x, iv2.xyzx, -r0.xyzx
23   mul oo0.xyz, r0.xxx, cb[iv0.x][iv0.y+1].xyz
24   ret
25
26 main:
27   fcall fp0[0][0], in:(cb14[0], v0, v1), out:(o0)
28   ret
[0051] Line 1 depicts a class instance table for AmbientLight, which lists all function implementations for a specific class instance. In the above code, there is only one function, Calculate, which is called once in the main shader code. Therefore, only one implementation exists, fb0. Additional methods, functions, routines, or calls to existing methods in the class AmbientLight could be referenced as well. Furthermore, line 3 shows a class instance table for the class DirectionalLight discussed in reference to the previous code.
[0052] The interface table used to dispatch via MyInstance is on line 5. The first array dimension indicates if the interface variable is an array. In this case there is only one element, so the dimension is given a value of 1. The second array dimension is the number of call sites for the interface. In this case there is only one method, Calculate, so the dimension is one. Finally, the list in braces is the list of class instance tables that can be used by this interface variable. Since both ft0 (AmbientLight) and ft1 (DirectionalLight) inherit from Light, these are the two tables that are listed.
[0053] Lines 7-12 show IL for AmbientLight's implementation of Calculate, optimized for the call site at line 27. If there were additional call sites that used the Calculate function for the AmbientLight class, there would be multiple blocks like this one, each optimized for a specific call site. If multiple call sites emit the same set of instructions, the redundant blocks can be removed and the various call sites will point to a single block. Note that registers labeled as "iv" and "ov" are used instead of standard HLSL registers like x, s, or cb. This means that the call site enumerates the registers, so a substitution of registers must occur as part of the inlining process. Additionally, lines 14-24 show IL for DirectionalLight's implementation of Calculate, optimized for the call site at line 27.
[0054] The main shader code block is presented in lines 26-28. In line 27, an fcall instruction indicates an array element and defines the call site for the Calculate routine for the variable MyInstance. The first parameter indicates the interface table that is to be used (fp0). The first bracketed index defines the method index. In this case, there is only one call site, so the index is zero. The second bracketed index defines the index of the call site being executed. In this case there is only one invocation of the Calculate method, so this index is zero. The "in" and "out" lists indicate the registers that are utilized by this call. The first "in" parameter always refers to the place in constant memory where the class instance variables are stored, in this case cb14, element zero.
[0055] In one embodiment, the fcall instruction refers to the method to be called by providing an index, but does not define the exact implementation to call. When generating the IL and then later the native hardware instructions for program execution, code is emitted up to the fcall routine and the current state of the registers and other shader states are partially cached and restored around implementation generation. The code for the first implementation is generated starting with the current state of register allocation, scratch registers, etc. Once this generation step is complete, the state is restored to the cached state, and the generation is repeated for the next possible implementation. This cycle repeats until all implementations have code generated. Finally, the current state is restored to the cached state, the impacts of the outputs of the fcall are applied to the current state, and code generation continues after the fcall. The resulting methods generated are defined in the IL and referenced in class instance tables, which match the structure of the interface definitions and have a reference to each compiled function version created.
[0056] To implement the above in a C API, minimal changes are made to Direct3D. In one embodiment, a new API is added for referencing the class instances provided by a shader. Additionally, another API is created to reference a class instance.
[0057] In operation, when this shader object is bound to the pipeline, the application has the opportunity to provide a listing of the specific class instances it wishes to utilize for the available bind points in the shader. To do so, a list of objects is obtained by providing the names of the HLSL class instances for the shader to the API that references class instances.
[0058] The class-referencing API may only allow interaction in a batched mechanism, meaning that the application can only change the state of the rendering pipeline between sets of draw calls rather than more granularly, like between the rendering of pixels. Yet, some embodiments may change states in between sets of primitives, making the class instances only changeable between batches of primitives. This can be seen in the following code.
1 pDevice->CreatePixelShader(pShaderCode, pMyClassLibrary, &pMyPS);
2
3 pMyDirectionalLight = pMyClassLibrary->GetClassInstance("MyDirectional");
4 pMyAmbientLight = pMyClassLibrary->GetClassInstance("MyAmbient");
5
6 while (true)
7 {
8     if (DirectionalLighting)
9         pDevice->PSSetShader(pMyPS, &pMyDirectionalLight, 1);
10    else
11        pDevice->PSSetShader(pMyPS, &pMyAmbientLight, 1);
12
13    RenderScene();
14 }
[0059] Line 1 illustrates a routine, CreatePixelShader, that is provided a string parameter that contains the compiled shader bytecode in pShaderCode, a pointer to an API that references class instances (pMyClassLibrary), and a pointer to a pointer to a pixel shader (pMyPS). The routine examines the bytecode and populates pMyClassLibrary with information on what interfaces and class instances are available in the shader. Additionally, the bytecode is also provided to a device driver of a GPU, which performs a JIT conversion of the code to a native representation and stores the converted code internally. Finally, a reference to the shader is returned in pMyPS for later API use.
[0060] In line 3, a pointer (pMyDirectionalLight) to the API that references class instances is set to reference the MyDirectional class instance in the shader. In line 4, a pointer (pMyAmbientLight) to the API that references class instances is set to reference the MyAmbient class instance in the shader. Lines 6-14 depict a loop that will render the scene repeatedly until the program exits.
[0061] Line 8 shows code for selecting which class instance to use based on the global input DirectionalLighting. Based on the selection made in line 8, a call is made with the shader object in pMyPS along with one of the two possible class instances that can be applied to the HLSL variable MyInstance. The final argument indicates the length of the array provided in the second argument, as there might be more than one interface to resolve in any one shader. Finally, a call is shown in line 13 to a function for rendering the geometry of a scene.
[0062] Although the subject matter has been described in language specific to structural features and methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. For example, sampling rates and sampling periods other than those described herein may also be captured by the breadth of the claims.
Claims
1. One or more computer-readable media having computer-executable instructions embodied thereon for allocating memory registers to shader instances, the method comprising: declaring one or more interfaces to define a shader template, wherein one or more classes of a shader depend from the one or more interfaces (302); defining a variable in a shader program for inlining an actual shader instance (304); and replacing a call in the shader program with the actual shader instance (308).
2. The media of claim 1, further comprising allocating memory for the shader instance.
3. The media of claim 2, wherein a number of registers associated with the memory and allocated for the shader instance depends on a level of complexity for the shader instance.
4. The media of claim 1, further comprising: translating the shader instance into bytecode; and rasterizing pixels on a computing device based on the bytecode.
5. The media of claim 4, wherein the shader instance is compiled into bytecode by a high level shading language (HLSL) compiler.
6. The media of claim 1, wherein the shader instance is programmed in a high level shading language (HLSL).
7. The media of claim 1, wherein the shader instance is used to determine an operation in a three-dimensional (3D) graphic.
8. The media of claim 1, further comprising: providing compiled shader bytecode; providing a pointer to an application program interface; providing a pointer to a pixel-shading shader; providing the shader bytecode to a device driver for a graphics processing unit (GPU); and returning a reference to the shader instance.
9. The media of claim 8, further comprising: referencing the one or more classes of the shader; storing the one or more classes in a memory location; and designating a pointer to the memory location.
10. The media of claim 1, further comprising binding the shader instance to a pipeline that is executed by a graphics processing unit (GPU).
11. The media of claim 10, wherein the pipeline simultaneously executes numerous shader instances in parallel.
12. The media of claim 1, wherein replacing a call in the shader program further comprises: receiving one of two or more shader instances of the shader; and replacing the call in the shader program with the one of two or more shader instances of the shader.
13. A method for processing one or more shaders on a computing device, comprising: declaring one or more interfaces to define a shader template, wherein one or more classes of a shader depend from the one or more interfaces (302); defining a variable in a shader program for inlining an actual shader instance (304); and replacing a call in the shader program with the actual shader instance (308).
14. The method of claim 13, further comprising: providing compiled shader bytecode; providing a pointer to an application program interface; providing a pointer to a pixel-shading shader; providing the shader bytecode to a device driver for a graphics processing unit (GPU); and returning a reference to the shader instance.
15. The method of claim 13, further comprising determining a number of registers to allocate to the shader instance based on a complexity associated with the shader instance.
16. The method of claim 13, wherein the shader instance is used to determine an operation in a three-dimensional (3D) graphic.
17. A computing device configured to render likenesses of three-dimensional graphics in two dimensions, comprising: a memory unit with one or more memory registers (112); and a graphics processing unit (GPU) configured to allocate the one or more memory registers associated with the shader based on a previously defined shader class (124).
18. The computing device of claim 17, wherein the GPU is further configured to: declare one or more interfaces to define a shader template, wherein one or more classes of a shader depend from the one or more interfaces; define a variable in a shader program for inlining an actual shader instance; and replace a call in the shader program with the actual shader instance.
19. The computing device of claim 17, wherein the GPU executes a rasterizer for coloring pixels based on the actual shader instance.
20. The computing device of claim 18, wherein the actual shader instance determines the position of one or more primitives.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP09771210.3A EP2289050B1 (en) | 2008-06-27 | 2009-06-26 | Shader interfaces |
CN200980124880.0A CN102077251B (en) | 2008-06-27 | 2009-06-26 | Shader interfaces |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/163,734 US8581912B2 (en) | 2008-06-27 | 2008-06-27 | Dynamic subroutine linkage optimizing shader performance |
US12/163,734 | 2008-06-27 |
Publications (3)
Publication Number | Publication Date |
---|---|
WO2009158679A2 (en) | 2009-12-30 |
WO2009158679A3 (en) | 2010-05-06 |
WO2009158679A8 (en) | 2010-11-18 |
Family
ID=41445370
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2009/048960 WO2009158679A2 (en) | 2008-06-27 | 2009-06-26 | Shader interfaces |
Country Status (4)
Country | Link |
---|---|
US (3) | US8581912B2 (en) |
EP (1) | EP2289050B1 (en) |
CN (1) | CN102077251B (en) |
WO (1) | WO2009158679A2 (en) |
Families Citing this family (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9849372B2 (en) * | 2012-09-28 | 2017-12-26 | Sony Interactive Entertainment Inc. | Method and apparatus for improving efficiency without increasing latency in emulation of a legacy application title |
US8416238B2 (en) * | 2009-02-18 | 2013-04-09 | Autodesk, Inc. | Modular shader architecture and method for computerized image rendering |
US8379024B2 (en) * | 2009-02-18 | 2013-02-19 | Autodesk, Inc. | Modular shader architecture and method for computerized image rendering |
US8368694B2 (en) * | 2009-06-04 | 2013-02-05 | Autodesk, Inc | Efficient rendering of multiple frame buffers with independent ray-tracing parameters |
US8970588B1 (en) * | 2009-07-31 | 2015-03-03 | Pixar | System and methods for implementing object oriented structures in a shading language |
US9245371B2 (en) * | 2009-09-11 | 2016-01-26 | Nvidia Corporation | Global stores and atomic operations |
US8756590B2 (en) | 2010-06-22 | 2014-06-17 | Microsoft Corporation | Binding data parallel device source code |
US8677186B2 (en) | 2010-12-15 | 2014-03-18 | Microsoft Corporation | Debugging in data parallel computations |
US8997066B2 (en) | 2010-12-27 | 2015-03-31 | Microsoft Technology Licensing, Llc | Emulating pointers |
US8539458B2 (en) | 2011-06-10 | 2013-09-17 | Microsoft Corporation | Transforming addressing alignment during code generation |
US9378560B2 (en) * | 2011-06-17 | 2016-06-28 | Advanced Micro Devices, Inc. | Real time on-chip texture decompression using shader processors |
US9495722B2 (en) * | 2013-05-24 | 2016-11-15 | Sony Interactive Entertainment Inc. | Developer controlled layout |
US10255650B2 (en) | 2013-05-24 | 2019-04-09 | Sony Interactive Entertainment Inc. | Graphics processing using dynamic resources |
US9779535B2 (en) | 2014-03-19 | 2017-10-03 | Microsoft Technology Licensing, Llc | Configuring resources used by a graphics processing unit |
US9766954B2 (en) | 2014-09-08 | 2017-09-19 | Microsoft Technology Licensing, Llc | Configuring resources used by a graphics processing unit |
KR102263326B1 (en) | 2014-09-18 | 2021-06-09 | 삼성전자주식회사 | Graphic processing unit and method of processing graphic data using the same |
US10210591B2 (en) * | 2015-02-02 | 2019-02-19 | Microsoft Technology Licensing, Llc | Optimizing compilation of shaders |
US9881351B2 (en) | 2015-06-15 | 2018-01-30 | Microsoft Technology Licensing, Llc | Remote translation, aggregation and distribution of computer program resources in graphics processing unit emulation |
US9786026B2 (en) | 2015-06-15 | 2017-10-10 | Microsoft Technology Licensing, Llc | Asynchronous translation of computer program resources in graphics processing unit emulation |
CN105374070B (en) * | 2015-12-11 | 2018-07-06 | 中国航空工业集团公司西安航空计算技术研究所 | A kind of 3D image processing algorithms modeling and simulating method |
US10152819B2 (en) | 2016-08-15 | 2018-12-11 | Microsoft Technology Licensing, Llc | Variable rate shading |
KR102644276B1 (en) * | 2016-10-10 | 2024-03-06 | 삼성전자주식회사 | Apparatus and method for processing graphic |
US10147227B2 (en) | 2017-02-17 | 2018-12-04 | Microsoft Technology Licensing, Llc | Variable rate shading |
US20180275957A1 (en) * | 2017-03-27 | 2018-09-27 | Ca, Inc. | Assistive technology for code generation using voice and virtual reality |
GB2570304B (en) * | 2018-01-18 | 2022-06-01 | Imagination Tech Ltd | Topology preservation in a graphics pipeline |
CN108874396A (en) * | 2018-05-31 | 2018-11-23 | 苏州蜗牛数字科技股份有限公司 | The cross-compiler and Compilation Method of multi-platform multiple target language based on HLSL |
CN108830920B (en) * | 2018-06-28 | 2022-06-21 | 武汉斗鱼网络科技有限公司 | Method and device for creating constant buffer area and readable storage medium |
US11107263B2 (en) * | 2018-11-13 | 2021-08-31 | Intel Corporation | Techniques to manage execution of divergent shaders |
CN109710264B (en) * | 2018-12-19 | 2022-06-14 | 森大(深圳)技术有限公司 | Gerber file conversion method, system, device and storage medium |
US11295507B2 (en) * | 2020-02-04 | 2022-04-05 | Advanced Micro Devices, Inc. | Spatial partitioning in a multi-tenancy graphics processing unit |
US11069119B1 (en) | 2020-02-28 | 2021-07-20 | Verizon Patent And Licensing Inc. | Methods and systems for constructing a shader |
US11475533B2 (en) | 2020-05-18 | 2022-10-18 | Qualcomm Incorporated | GPR optimization in a GPU based on a GPR release mechanism |
US11550554B2 (en) * | 2021-01-07 | 2023-01-10 | Microsoft Technology Licensing, Llc | Merged machine-level intermediate representation optimizations |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6654951B1 (en) | 1998-12-14 | 2003-11-25 | International Business Machines Corporation | Removal of unreachable methods in object-oriented applications based on program interface analysis |
US6704927B1 (en) | 1998-03-24 | 2004-03-09 | Sun Microsystems, Inc. | Static binding of dynamically-dispatched calls in the presence of dynamic linking and loading |
US20040237074A1 (en) | 2003-05-23 | 2004-11-25 | Microsoft Corporation | Optimizing compiler transforms for a high level shader language |
Family Cites Families (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5838686A (en) * | 1994-04-22 | 1998-11-17 | Thomson Consumer Electronics, Inc. | System for dynamically allocating a scarce resource |
US6041179A (en) | 1996-10-03 | 2000-03-21 | International Business Machines Corporation | Object oriented dispatch optimization |
US7548238B2 (en) * | 1997-07-02 | 2009-06-16 | Nvidia Corporation | Computer graphics shader systems and methods |
US6175956B1 (en) * | 1998-07-15 | 2001-01-16 | International Business Machines Corporation | Method and computer program product for implementing method calls in a computer system |
US6507946B2 (en) * | 1999-06-11 | 2003-01-14 | International Business Machines Corporation | Process and system for Java virtual method invocation |
JP4118456B2 (en) * | 1999-06-29 | 2008-07-16 | 株式会社東芝 | Program language processing system, code optimization method, and machine-readable storage medium |
US6658657B1 (en) * | 2000-03-31 | 2003-12-02 | Intel Corporation | Method and apparatus for reducing the overhead of virtual method invocations |
US6704297B1 (en) | 2000-08-23 | 2004-03-09 | Northrop Grumman Corporation | Downlink orderwire integrator and separator for use in a satellite based communications system |
US6941550B1 (en) * | 2001-07-09 | 2005-09-06 | Microsoft Corporation | Interface invoke mechanism |
US7564460B2 (en) | 2001-07-16 | 2009-07-21 | Microsoft Corporation | Systems and methods for providing intermediate targets in a graphics system |
US7103878B2 (en) * | 2001-12-13 | 2006-09-05 | Hewlett-Packard Development Company, L.P. | Method and system to instrument virtual function calls |
US7159212B2 (en) | 2002-03-08 | 2007-01-02 | Electronic Arts Inc. | Systems and methods for implementing shader-driven compilation of rendering assets |
US7015909B1 (en) | 2002-03-19 | 2006-03-21 | Aechelon Technology, Inc. | Efficient use of user-defined shaders to implement graphics operations |
JP3956112B2 (en) * | 2002-06-12 | 2007-08-08 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Compiler, register allocation device, program, recording medium, compilation method, and register allocation method |
US6809732B2 (en) | 2002-07-18 | 2004-10-26 | Nvidia Corporation | Method and apparatus for generation of programmable shader configuration information from state-based control information and program instructions |
US20040095348A1 (en) * | 2002-11-19 | 2004-05-20 | Bleiweiss Avi I. | Shading language interface and method |
US6839062B2 (en) | 2003-02-24 | 2005-01-04 | Microsoft Corporation | Usage semantics |
US7523406B2 (en) * | 2003-07-22 | 2009-04-21 | Autodesk Inc. | Dynamic parameter interface |
US8035646B2 (en) | 2003-11-14 | 2011-10-11 | Microsoft Corporation | Systems and methods for downloading algorithmic elements to a coprocessor and corresponding techniques |
US7463259B1 (en) * | 2003-12-18 | 2008-12-09 | Nvidia Corporation | Subshader mechanism for programming language |
US7218291B2 (en) | 2004-09-13 | 2007-05-15 | Nvidia Corporation | Increased scalability in the fragment shading pipeline |
US20060082577A1 (en) | 2004-10-20 | 2006-04-20 | Ugs Corp. | System, method, and computer program product for dynamic shader generation |
US7598953B2 (en) | 2004-11-05 | 2009-10-06 | Microsoft Corporation | Interpreter for simplified programming of graphics processor units in general purpose programming languages |
US7548244B2 (en) * | 2005-01-12 | 2009-06-16 | Sony Computer Entertainment Inc. | Interactive debugging and monitoring of shader programs executing on a graphics processor |
US7394464B2 (en) | 2005-01-28 | 2008-07-01 | Microsoft Corporation | Preshaders: optimization of GPU programs |
US8144149B2 (en) * | 2005-10-14 | 2012-03-27 | Via Technologies, Inc. | System and method for dynamically load balancing multiple shader stages in a shared pool of processing units |
US20070091088A1 (en) | 2005-10-14 | 2007-04-26 | Via Technologies, Inc. | System and method for managing the computation of graphics shading operations |
US20070153015A1 (en) | 2006-01-05 | 2007-07-05 | Smedia Technology Corporation | Graphics processing unit instruction sets using a reconfigurable cache |
US20070229520A1 (en) * | 2006-03-31 | 2007-10-04 | Microsoft Corporation | Buffered Paint Systems |
US8766996B2 (en) * | 2006-06-21 | 2014-07-01 | Qualcomm Incorporated | Unified virtual addressed register file |
US8601456B2 (en) * | 2006-08-04 | 2013-12-03 | Microsoft Corporation | Software transactional protection of managed pointers |
US7750913B1 (en) * | 2006-10-24 | 2010-07-06 | Adobe Systems Incorporated | System and method for implementing graphics processing unit shader programs using snippets |
US8379032B2 (en) * | 2007-09-28 | 2013-02-19 | Qualcomm Incorporated | System and method of mapping shader variables into physical registers |
2008
- 2008-06-27: US application US12/163,734, published as US8581912B2 (Active)

2009
- 2009-06-26: WO application PCT/US2009/048960, published as WO2009158679A2 (Application Filing)
- 2009-06-26: CN application CN200980124880.0A, published as CN102077251B (Active)
- 2009-06-26: EP application EP09771210.3A, published as EP2289050B1 (Active)

2013
- 2013-11-11: US application US14/076,886, published as US9390542B2 (Active)

2016
- 2016-07-12: US application US15/208,328, published as US9824484B2 (Active)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6704927B1 (en) | 1998-03-24 | 2004-03-09 | Sun Microsystems, Inc. | Static binding of dynamically-dispatched calls in the presence of dynamic linking and loading |
US6654951B1 (en) | 1998-12-14 | 2003-11-25 | International Business Machines Corporation | Removal of unreachable methods in object-oriented applications based on program interface analysis |
US20040237074A1 (en) | 2003-05-23 | 2004-11-25 | Microsoft Corporation | Optimizing compiler transforms for a high level shader language |
Also Published As
Publication number | Publication date |
---|---|
EP2289050A2 (en) | 2011-03-02 |
WO2009158679A8 (en) | 2010-11-18 |
EP2289050A4 (en) | 2012-01-11 |
US9390542B2 (en) | 2016-07-12 |
WO2009158679A3 (en) | 2010-05-06 |
EP2289050B1 (en) | 2019-12-04 |
US8581912B2 (en) | 2013-11-12 |
US20090322751A1 (en) | 2009-12-31 |
US20170039754A1 (en) | 2017-02-09 |
CN102077251B (en) | 2014-01-08 |
US20140063029A1 (en) | 2014-03-06 |
US9824484B2 (en) | 2017-11-21 |
CN102077251A (en) | 2011-05-25 |
Similar Documents
Publication | Title |
---|---|
US9824484B2 (en) | Dynamic subroutine linkage optimizing shader performance |
US7159212B2 (en) | Systems and methods for implementing shader-driven compilation of rendering assets |
Kessenich et al. | OpenGL Programming Guide: The official guide to learning OpenGL, version 4.5 with SPIR-V |
US8400444B2 (en) | Method to render a root-less scene graph with a user controlled order of rendering |
US8203558B2 (en) | Dynamic shader generation |
US7750913B1 (en) | System and method for implementing graphics processing unit shader programs using snippets |
US20140354658A1 (en) | Shader Function Linking Graph |
US7463259B1 (en) | Subshader mechanism for programming language |
Foley et al. | Spark: modular, composable shaders for graphics hardware |
US8907979B2 (en) | Fast rendering of knockout groups using a depth buffer of a graphics processing unit |
Marroquim et al. | Introduction to GPU Programming with GLSL |
Martz | OpenGL distilled |
Lalonde et al. | Shader-driven compilation of rendering assets |
Kuo et al. | The design of LLVM-based shader compiler for embedded architecture |
Haaser et al. | Cosmo: Intent-based composition of shader modules |
Brumme | The OpenGL Shading Language |
Borgo et al. | State of the Art Report on GPU Visualization |
Lejdfors et al. | Paper I PyFX–An active effect framework |
Tuler et al. | A high-level abstraction for graphics hardware programming |
Lejdfors et al. | [title illegible in source] |
Shaders | Fundamentals of Pixel Shaders |
Lobão et al. | Rendering Pipeline, Shaders, and Effects |
Mirza | Issues In Introducing Micro programmable Graphics Hardware Into the Animated Production Process |
Legal Events
Code | Title | Description |
---|---|---|
WWE | Wipo information: entry into national phase | Ref document number: 200980124880.0; Country of ref document: CN |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 09771210; Country of ref document: EP; Kind code of ref document: A2 |
WWE | Wipo information: entry into national phase | Ref document number: 2009771210; Country of ref document: EP |
WWE | Wipo information: entry into national phase | Ref document number: 8314/CHENP/2010; Country of ref document: IN |
NENP | Non-entry into the national phase | Ref country code: DE |