WO2002041104A2 - An instruction set architecture to aid code generation for hardware platforms multiple heterogeneous functional units - Google Patents
- Publication number
- WO2002041104A2 (application PCT/US2001/043255)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- hardware
- instruction
- functional units
- instructions
- mixed
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/447—Target code generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7867—Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3893—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
- G06F9/3895—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros
- G06F9/3897—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros with adaptable data path
Definitions
- the present invention relates generally to microprocessors, and more particularly, to microprocessor Instruction Set Architectures (ISA) and software design processes.
- a typical mobile phone may have as much computational power as a typical high-end desktop machine, yet may be embodied in a package small enough to fit in the palm of the hand.
- This miniaturization can be accomplished by mixing a variety of hardware devices, each dedicated to performing a given task, such as digital signal processors (DSPs) for performing mathematically intensive filtering operations, or field-programmable gate arrays (FPGAs) for performing certain dedicated I/O operations, while a microprocessor may oversee and coordinate these activities.
- Mixed hardware systems may include multiple functional units for performing different tasks. Mixed systems are similar to certain modern microprocessors, which include multiple functional units on a single chip, although not as tightly coupled.
- Reconfigurable computing refers to the use of reconfigurable hardware, such as an FPGA, to serve as the computational unit.
- the FPGA may be coupled with a conventional microprocessor.
- An FPGA comprises an array of configurable logic blocks, each one of them able to perform a basic logic function.
- the reconfigurable hardware can assume a range of hardware functional behavior given a particular configuration code referred to as a bitstream. The bitstream instructs the device on how to configure itself so that it can behave in a certain way. Accordingly, such devices can be set to assume different behaviors as required by a particular application.
- the device may be configured at start-up and used with that configuration throughout the life of the application.
- the device may require reconfiguration at run-time during different phases of an executing application.
- An advantage of such devices is that they can assume different behaviors, at a hardware level, after the devices have been fabricated.
- Other devices, such as ASICs, are bound, once fabricated, to the functionality they were designed for and cannot assume different behaviors.
- a hardware device which is reconfigurable, therefore, can offer the advantage of being very fast at the task it is configured to perform, but at the same time be flexible so as to be capable of resetting to a new state of behavior by reconfiguration.
- Instruction Set Architectures were developed to help solve the above problems.
- An Instruction Set Architecture provides a functional view of the instructions implemented in an underlying set of hardware. This functional view leaves out details of the exact hardware implementation of instructions.
- a simple microprocessor Instruction Set Architecture may functionally define each instruction, such as an addition operation, with a unique n-bit identifier.
- this abstraction may also include details of the input and output operands, operand order and operand properties, among other details. These Operand properties may include operand sizes, and identify how operands are manipulated.
- some operations can only operate on register level operands instead of memory operands.
- the software engineers may use a set of tools such as editors, compilers, debuggers, etc., which are distinct from the tools the hardware engineers may use, such as CAD tools, etc.
- Instructions not available in the hardware are dynamically paged by halting the processor and then partially configuring the hardware to include the new instruction.
- Programs for the processor are written using assembly language with each instruction having a corresponding assembly language opcode.
- new instructions mean that the assembly language needs to be appropriately augmented.
- the instruction modules used in the DISC processor are implemented a priori. This processor has a potentially unlimited instruction set bound by the number of bits used to represent the op-codes.
- the problem with DISC processors is that op-code meanings used in one program may be different to those used in another due to the potential lack of opcodes available for new instructions created. This can lead to code portability problems.
- "Garp: A MIPS Processor with a Reconfigurable Coprocessor", published by John R. Hauser and John Wawrzynek in IEEE Symposium on Field-Programmable Custom Computing Machines (1997), describes a system consisting of a single-issue MIPS processor core with reconfigurable hardware to be used as an accelerator.
- the MIPS core and the reconfigurable hardware are tightly coupled being resident on the same chip and sharing memory and cache.
- Explicit processor move instructions give the main processor complete control over the loading and execution of the reconfigurable hardware configurations.
- Standard ANSI-C code is used as input to the compiler, which generates code for the Garp platform.
- the compiler's target instruction set is the MIPS instruction set which includes direct access to the reconfigurable portion of the hardware.
- the Garp compiler does not compile high-level language statements into assembly code for execution by the reconfigurable portion of the processor.
- the FPGA configuration can only be invoked by using a set of new Garp-specific instructions that are unknown to a standard compiler and the programmer must provide assembly code to interface to the FPGA. There is no means for automatically generating assembly code to load a configuration, perform register allocation, execute the configuration, and read a return value from the FPGA.
- PRISC as described by R. Razdan and M. D. Smith in A High-Performance Microarchitecture with Hardware-Programmable Functional Units, Proceedings of the Twenty-Seventh Annual Microprogramming Workshop, IEEE Computer Society Press, 1994, is another approach that combines a microprocessor with reconfigurable logic.
- PRISC augments the conventional set of RISC instructions with application-specific instructions that are implemented in hardware-programmable functional units (PFUs).
- These PFUs, which can only be combinational circuits with a maximum delay of one CPU clock cycle, are attached directly to the CPU data path and are added in parallel with the existing functional units.
- a compilation mechanism is also disclosed that involves adding a hardware extraction step after the code generation stage of compilation.
- This hardware extraction stage involves identifying sets of sequential instructions which can be potentially implemented with a PFU. The identified set is then synthesized in hardware by a synthesis package. This collective set is replaced by a single op-code that identifies the given PFU.
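The op-code substitution step described above can be illustrated with a minimal sketch. This is not PRISC's actual algorithm: it is a simplified peephole-style replacement that assumes the candidate sequence has already been identified and synthesized. The function name `substitute`, the mnemonics, and the op-code name `PFU0` are all hypothetical.

```python
def substitute(code, pattern, pfu_opcode):
    """Replace every occurrence of `pattern` in `code` with one PFU op-code.

    `code` and `pattern` are lists of instruction mnemonics (hypothetical);
    matching is exact and left to right, with no overlap.
    """
    if not pattern:
        raise ValueError("pattern must contain at least one instruction")
    out, i, n = [], 0, len(pattern)
    while i < len(code):
        if code[i:i + n] == pattern:
            out.append(pfu_opcode)   # the whole sequence collapses to one op-code
            i += n
        else:
            out.append(code[i])
            i += 1
    return out


generated = ["LOAD", "SHL", "AND", "XOR", "STORE", "SHL", "AND", "XOR"]
rewritten = substitute(generated, ["SHL", "AND", "XOR"], "PFU0")
```

Because the replacement runs on already generated code, the earlier compilation stages never see `PFU0`, which is the limitation the following paragraphs point out.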
- PFU information is unavailable to the compiler which can better optimize code if instruction information is available a priori.
- the op-code interface is not fully utilized since the op-code substitution is done after code generation.
- the OneChip-98 processor described by Jeffrey A. Jacob and Paul Chow in Memory Interfacing and Instruction Specification for Reconfigurable Processors, Proceedings of the 1999 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA'99), at pages 145-154 (February 1999) is a scheme that couples on a single chip a 32-bit fixed logic RISC core processor with reconfigurable logic.
- the OneChip-98 processor incorporates the reconfigurable resources in the form of programmable functional units in parallel with the RISC processor's basic functional unit.
- the programmable functional units are application specific functions that may be combinational or sequential circuits. These programmable functional units are available as pre-compiled images that the programmer selects.
- the reference discloses an architecture where hardware is supplied in the form of circuits (a central services module, processor modules, and input/output modules, for example) and bitstreams for the Instruction Set Architecture.
- Application software may be executed by FPGAs configured on the processor modules.
- the reference also discloses a method of encapsulating binary machine instructions and data together with the hardware configurations required to execute the machine instructions.
- the compilation method proposed uses reconfiguration directives, #pragma meta-syntax declarations as provided by the C language, to cue the compiler to generate appropriate instructions for handling reconfigurable logic operations.
- the drawback with this method is that standard compilers cannot be used to generate code for this platform since the compiler has to be modified to handle the different syntax.
- the present invention affords a system and method for simplifying the development and deployment of high-performance embedded applications on mixed hardware systems.
- the invention provides a single, high-level development process, enabling system developers to utilize various programming languages to program both the reconfigurable devices and the microprocessor of a mixed hardware system.
- the invention provides a method for generating platform-independent software code from a generic instruction set that defines a set of primitive instructions for performing computational functions on a mixed hardware platform. The method includes compiling the generic instruction set; binding each primitive instruction to a particular implementation of the hardware functional units for executing the primitive instruction; coupling the primitive instructions to generate an instruction set packet; and issuing the instruction set packet to the functional units so that an appropriate primitive instruction is executed by respective ones of the functional units.
- the invention provides a method of instructing mixed hardware platforms having one or more hardware functional units for performing computational functions using a generic instruction set, each mixed hardware platform having a particular hardware device implementation for performing the computational functions. The method includes establishing one or more instructions that identify respective functional operations of primitive instructions capable of being performed by the one or more hardware devices in the mixed hardware platforms; binding each primitive instruction to a particular hardware functional unit implementation for performing the functional operation; coupling the instructions to generate an instruction set packet; and issuing the instruction set packet to the hardware functional units so that appropriate instructions are executed by respective ones of the hardware functional units.
- the binding may occur during compilation, or it may occur at a later time, such as prior to execution of the instruction.
- Each distinct hardware functional unit implementation for executing the primitive instructions may be determined in advance, and the functional operations of the instructions may be decoupled from the particular hardware functional unit implementations for executing the instructions.
- the instruction set packet may be decoded and appropriate instructions may be passed to appropriate hardware functional units for executing the respective instructions. Instruction set packets may be issued once every clocking cycle.
- the binding step may occur at different phases, such as during compilation and prior to executing the instruction.
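The bind, couple, and issue steps summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the unit names, the `UNITS` table, and the functions `bind`, `couple`, and `issue` are hypothetical, and each functional unit is modeled as a plain Python callable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    op: str          # functional operation only, e.g. "MUL"
    operands: tuple  # operand values


# Hypothetical functional units: unit name -> operations it implements.
UNITS = {
    "cpu": {"ADD": lambda a, b: a + b, "MUL": lambda a, b: a * b},
    "dsp": {"MUL": lambda a, b: a * b},
}


def bind(prim, preference):
    """Bind a primitive to the first preferred unit that implements its op."""
    for unit in preference:
        if prim.op in UNITS[unit]:
            return unit, prim
    raise LookupError(f"no functional unit implements {prim.op}")


def couple(bound):
    """Couple bound instructions into one packet, at most one per unit."""
    packet = {}
    for unit, prim in bound:
        if unit in packet:
            raise ValueError(f"unit {unit} already has an instruction")
        packet[unit] = prim
    return packet


def issue(packet):
    """Decode the packet and pass each instruction to its functional unit."""
    return {unit: UNITS[unit][p.op](*p.operands) for unit, p in packet.items()}


program = [Primitive("ADD", (2, 3)), Primitive("MUL", (4, 5))]
bound = [bind(p, preference=("dsp", "cpu")) for p in program]
packet = couple(bound)
results = issue(packet)
```

The program itself names only operations; the unit that executes each one is decided inside `bind`, so the same program could be bound differently on another platform.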
- the invention affords a system for executing application code, generated from a generic instruction set, on a mixed hardware platform that may include one or more functional units configured to perform particular computational functions in accordance with an instruction.
- the functional units may include any of microprocessors, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), digital signal processors (DSPs) and other computational devices.
- the functional units may either be connected in a loosely coupled fashion or in a tightly coupled fashion.
- the system may also include a generic instruction set defining the functional operations of primitive instructions capable of being performed by the one or more functional units on the mixed hardware platform, and a hardware description information file for maintaining hardware description information relating to the particular hardware implementation of the functional units on the mixed hardware platform. Additionally, an interface layer for associating the primitive instructions in the generic instruction set with the hardware description information so that the primitive instructions can be executed by the one or more functional units on the mixed hardware platform may be provided.
- the generic instruction set may be platform independent.
- the hardware description information may be maintained independent from the functional description of the instructions in the generic instruction set such that the functional operations of primitive instructions are decoupled from a distinct hardware implementation of a functional unit for executing respective instructions.
- the invention provides a data structure for a set of instructions capable of being executed by a mixed hardware system having one or more functional units for performing computational functions. The data structure may include a bit sequence for instructing the one or more functional units, the bit sequence being partitioned so as to form an instruction portion for each functional unit in the mixed hardware system, the instruction portions being rank-ordered so as to establish a predetermined order for instructing respective ones of the functional units.
- the rank-ordering of instruction portions may be determined at design time.
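A rank-ordered packet of this kind can be sketched as a single integer partitioned into fixed fields. The field widths, the unit names, and the use of 0 as a no-operation code are assumptions made for illustration; the text only states that the portions follow an order fixed at design time.

```python
# Assumed design-time layout: (unit name, field width in bits), highest rank first.
RANK_ORDER = [("cpu", 8), ("fpga", 8), ("dsp", 8)]
NOP = 0  # assumed encoding for "no instruction this cycle"


def encode(portions):
    """Pack one instruction per unit into a single bit sequence (an int)."""
    word = 0
    for unit, width in RANK_ORDER:
        value = portions.get(unit, NOP)
        if not 0 <= value < (1 << width):
            raise ValueError(f"instruction for {unit} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode(word):
    """Split the bit sequence back into per-unit instruction portions."""
    portions = {}
    for unit, width in reversed(RANK_ORDER):
        portions[unit] = word & ((1 << width) - 1)
        word >>= width
    return portions


word = encode({"cpu": 0x12, "dsp": 0x34})
decoded = decode(word)
```

Because position alone identifies the target unit, no per-portion identifier is needed, at the cost of carrying a field for every unit even when it is idle.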
- the invention provides a data structure for a set of instructions capable of being executed by a mixed hardware system having one or more functional units for performing computational functions. The data structure may include a header packet for maintaining a count of the number of functional units targeted for a particular instruction cycle, and a series of instruction portions for instructing respective ones of the functional units.
- the instruction portions may be partitioned to include an associated header field and an associated instruction field, the header field encoding information that uniquely identifies a given functional unit, and the instruction field including an instruction for the targeted functional unit.
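The header-based variant can be sketched with Python's `struct` module. The sizes chosen here (a one-byte count, a one-byte unit identifier, and a two-byte instruction field, all big-endian) are assumptions for illustration and are not specified in the text.

```python
import struct

PORTION = struct.Struct(">BH")  # assumed: 1-byte unit id, 2-byte instruction


def pack_packet(portions):
    """Serialize (unit_id, instruction) pairs behind a 1-byte count header."""
    data = struct.pack(">B", len(portions))
    for unit_id, instruction in portions:
        data += PORTION.pack(unit_id, instruction)
    return data


def unpack_packet(data):
    """Read the count header, then that many (unit_id, instruction) portions."""
    (count,) = struct.unpack_from(">B", data, 0)
    offset, portions = 1, []
    for _ in range(count):
        portions.append(PORTION.unpack_from(data, offset))
        offset += PORTION.size
    return portions


raw = pack_packet([(1, 0x00A0), (3, 0x0B0C)])
roundtrip = unpack_packet(raw)
```

Only the targeted units appear in the packet, so an idle unit costs no bits; the trade-off against the rank-ordered layout is the extra identifier carried by every portion.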
- the invention affords a system for generating application code from a generic instruction set to be executed by one or more functional units in a mixed hardware system that may include a compiler for receiving source code in a given language and for generating output instruction code in an intermediate format in accordance with hardware description information relating to the one or more functional units of the mixed hardware system, and a binder for binding the instruction code with particular hardware implementations of the one or more functional units for executing respective instructions to generate binary application code in a format native to the mixed hardware system that can be executed by the functional units.
- the binder may bind the instruction code with particular hardware implementations of the one or more functional units at different phases.
- Each instruction may also be bound to more than one particular hardware implementation so the instruction can be executed by different functional units in the mixed hardware system.
- the invention provides a method for generating application code for executing on a native microprocessor in a mixed hardware system that includes compiling a high-level software representation of application code in accordance with a generic instruction set to produce output application code describing respective functional operations of primitive instructions to be executed by one or more functional units on the mixed hardware platform, and binding the functional operations of the primitive instructions with API function calls that implement communication with a respective functional unit in the mixed hardware platform to execute the primitive instructions on the mixed hardware system.
- the invention provides a method for generating application code for executing on a native microprocessor in a mixed hardware system that includes compiling a high-level software representation of application code in accordance with a generic instruction set to produce output application code describing respective functional operations of primitive instructions to be executed by one or more functional units on the mixed hardware platform, and binding the functional operations of the primitive instructions with library calls that implement communication with a respective functional unit in the mixed hardware platform to execute the primitive instructions on the mixed hardware platform.
- the library calls may be customized for each mixed hardware platform. Accordingly, in the above aspects, the API and library calls effectively make the code particular to a given device, i.e., the microprocessor, so that the code is native to that device. Prior to API and/or library call substitution, the code may be executed by means of an interpretive engine.
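Binding primitive operations to library calls can be sketched as a table substitution. Everything named here is hypothetical: `dsp_mul` and `cpu_add` stand in for platform library calls that would communicate with a functional unit, and the intermediate code is modeled as a list of (operation, operands) pairs.

```python
# Hypothetical platform library. On real hardware each call would wrap the
# communication with a functional unit; here the result is computed locally.
def dsp_mul(a, b):
    return a * b


def cpu_add(a, b):
    return a + b


PLATFORM_LIBRARY = {"MUL": dsp_mul, "ADD": cpu_add}


def bind_to_library(intermediate_code, library):
    """Substitute each primitive operation with its platform library call."""
    return [(library[op], operands) for op, operands in intermediate_code]


def run(bound_code):
    """Execute the bound code by invoking each library call in order."""
    return [call(*operands) for call, operands in bound_code]


intermediate = [("ADD", (1, 2)), ("MUL", (3, 4))]
bound_code = bind_to_library(intermediate, PLATFORM_LIBRARY)
outputs = run(bound_code)
```

Supplying a different library table retargets the same intermediate code to another platform without recompiling it.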
- Fig. 1 is a block diagram illustrating a multi-functional based hardware system with which the invention can be utilized;
- Fig. 2 is a block diagram of an Instruction Set Architecture in accordance with the invention.
- Fig. 3 is a diagram of an instruction set using rank-ordering that may be used by the system in accordance with the invention
- Fig. 4 is a diagram of an instruction set using instruction identifiers that may be used by the system in accordance with the invention
- Fig. 5 is a diagram illustrating a preferred process flow for generating software code in accordance with the invention.
- Fig. 6 is a block diagram of a system capable of performing the process shown in Fig. 5;
- Fig. 7 is a diagram of a system that can be used to interpret the instructions generated in accordance with the invention.
- Fig. 8 is a diagram of another system that can be used to realize the ISA described in accordance with the invention.
- Fig. 9 is a diagram of a third system that can be used to realize the ISA described in accordance with the invention.
- the present invention is applicable to a wide array of implementations of mixed hardware systems comprised of multiple, distinct types of hardware functional units.
- the state of the art for general mixed systems offers development approaches with distinct design flows, tools and skill sets for each distinct hardware functional unit in the system.
- reconfigurable computing systems are used as one illustrative example of mixed systems.
- reconfigurable computing systems are merely exemplary of mixed systems of the type with which the invention may be employed, and are not intended to be limiting.
- an instruction set includes all of the instructions for a particular system.
- An instruction may be of a certain bit length (depending on the system architecture) and indicate functional behavior, such as add, subtract, and the like.
- a mixed system includes multiple hardware functional units, each of which can execute some of the instructions in the instruction set. Accordingly, instructions can be bound to a particular hardware functional unit.
- Fig. 1 is a block diagram illustrating a conventional mixed hardware system 100 with which the invention can be utilized.
- the system 100 shown in Fig. 1 includes different types of functional units connected to a data bus 101.
- the data bus 101 is responsible for transmitting relevant data and instructions between the various functional units.
- Examples of functional units may include, but are not limited to, electronic hardware devices capable of performing computational calculations, such as a microprocessor 102, a field-programmable gate array (FPGA) 103, an application-specific integrated circuit (ASIC) chip 104, and a digital signal processor (DSP) 105.
- the system may also have a master processor (not shown) that oversees the coordination of the devices.
- computational units may be resident in a tightly coupled setup, where synchronicity between respective units may be maintained by a clocking mechanism (not shown).
- the communications medium can be one of many alternatives including wireless or optical media resulting in the use of interfaces such as an air-interface, an optical-interface, a wire-interface, or other similar interface mechanisms.
- the characteristic of such a system 100 is that it is heterogeneous. That is, the functional units 102, 103, 104, 105 may display different computational capabilities both in terms of computational performance and in the range of tasks they can perform.
- communication may be asymmetric in that the time it takes for data to be transferred from one functional unit to another may not be the same system wide. Additionally, the communication of information between the devices may occur through various layers of software indirection.
- a set of instructions supports hardware/software interaction. Instruction sets are commonly used in the microprocessor field to provide an abstraction of the underlying hardware to the various software applications which use that hardware. This abstraction decouples the software from the implementation details of the hardware. If changes are made to the underlying hardware, the software need not change to reflect them, provided the abstract view is not changed.
- Compiler tools can use this abstract view to translate application software written in a high level language to a sequence of instructions drawn from the instruction set.
- This abstract view constructed around instructions is formally referred to as the Instruction Set Architecture (ISA).
- instructions may be similarly used to abstract the functional capabilities of each of the hardware functional units.
- each instruction may be executed by more than one hardware functional unit. For example, consider a mixed system comprised of a microprocessor, FPGA and DSP functional units. Each unit may implement an integer multiplication operation (or other operation) to varying degrees of efficiency and speed and may therefore be capable of executing a multiplication instruction.
- the functional behavior of an instruction may be decoupled from the hardware implementation for the instruction.
- a set of instructions may be developed that can be executed by multiple alternative hardware functional units. This allows each instruction in a program and the particular hardware functional unit for executing the instruction to be bound at a later phase in the software development and execution cycle than occurs conventionally.
- an ADD instruction may be bound to an adder functional unit at the time the microprocessor is designed and cannot be changed. Thus, all ADD instructions in any program are always executed by the same functional unit.
- late binding implies that the hardware functional unit that executes an instruction is not determined (bound) until a much later phase in the compilation process.
- binding of instructions to hardware functional units can occur at various points in time. At one end of the spectrum the binding between instructions and hardware functional units may occur when the microprocessor and the instruction set are designed. At the other end of the spectrum the binding of instructions to hardware functional units may occur at run-time just prior to the execution of the instruction. Intermediate solutions are also feasible, for example, an instruction may be bound during software compilation.
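The difference between binding during compilation and binding just before execution can be sketched as follows. The candidate table, the unit names, and the notion of a "busy" set are assumptions introduced for illustration.

```python
# Hypothetical table: operation -> candidate units, in order of preference.
IMPLEMENTATIONS = {"MUL": ["dsp", "fpga", "cpu"]}


def bind_at_compile_time(program):
    """Fix one unit per instruction once, before the program ever runs."""
    return [(op, IMPLEMENTATIONS[op][0]) for op in program]


def bind_at_run_time(op, busy):
    """Pick a unit just before execution, skipping units that are busy."""
    for unit in IMPLEMENTATIONS[op]:
        if unit not in busy:
            return unit
    raise RuntimeError(f"no free functional unit for {op}")


static_binding = bind_at_compile_time(["MUL", "MUL"])
dynamic_choice = bind_at_run_time("MUL", busy={"dsp"})
```

The compile-time version always sends `MUL` to the same unit, while the run-time version can fall back to another implementation when the preferred unit is unavailable.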
- Fig. 2 is a block diagram illustrating a preferred system architecture 200 in accordance with the invention.
- the system hardware 201 may include different hardware functional units 202a-c for performing particular system tasks. Each hardware functional unit 202a-c may in fact have alternative hardware implementations with distinct cost and performance characteristics.
- An interface layer 203 may implement the instruction set abstraction 204. Implementing an instruction set abstraction means that the compiler's view of the hardware is one of a set of instructions, independent of which functional unit implements these instructions. In a traditional microprocessor the instruction sequence produced by the compiler is translated into a sequence of binary patterns that encodes the individual instructions and are interpreted by the microprocessor hardware in the course of executing the instructions.
- the interface layer 203 thus implements mechanisms for coordinating the execution of instructions on multiple hardware functional units. In doing so the mechanisms themselves remain transparent to the compilation process. Thus, the compiler retains the instruction set abstraction and the compilation process need not be concerned with the mechanisms for coordinating instruction execution across multiple functional units.
- a hardware description file 206 may be provided independently from the instruction set 204 and may indicate hardware specific details, such as latencies, or number of registers associated with hardware functional units 202a-c, or other hardware specific details. Since the description of the instruction set 204 and description of the hardware functional units 206 are provided separately, the hardware description information 206 can be used by a compiler to optimize the implementation of the application software 205 that is compiled for execution on a particular system platform 201. For example, the hardware descriptions 206 could be used by the interface layer 203 to schedule the execution of individual instructions on different hardware functional units 202a-c depending on the overall metric to be optimized. If, for example, the goal is to minimize execution time, the interface layer 203 may schedule instructions on the hardware functional units that execute them fastest. Alternatively, the need to minimize power dissipation may cause the interface layer 203 to utilize a different set of hardware functional units 202a-c, most likely at the expense of increased execution time.
- the instruction set 204 facilitates this code optimization as will be described below.
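Metric-driven selection from a separate hardware description can be sketched as below. The latency and power figures are invented for illustration, and the dictionary format and the function `choose_unit` are hypothetical; the text does not specify the format of the hardware description file.

```python
# Hypothetical hardware description, kept apart from the instruction set.
# All numbers are illustrative, not measurements.
HARDWARE_DESCRIPTION = {
    "cpu":  {"MUL": {"latency": 4, "power": 2}},
    "dsp":  {"MUL": {"latency": 1, "power": 5}},
    "fpga": {"MUL": {"latency": 2, "power": 3}},
}


def choose_unit(op, metric):
    """Return the unit that minimizes `metric` among those implementing `op`."""
    candidates = {
        unit: ops[op][metric]
        for unit, ops in HARDWARE_DESCRIPTION.items()
        if op in ops
    }
    if not candidates:
        raise LookupError(f"no functional unit implements {op}")
    return min(candidates, key=candidates.get)


fastest = choose_unit("MUL", "latency")
lowest_power = choose_unit("MUL", "power")
```

With these figures the same `MUL` instruction goes to the DSP when execution time is the goal and to the microprocessor when power is the goal, at four times the latency.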
- a mixed hardware system including a microprocessor 202a, a reconfigurable logic device 202b and a DSP 202c.
- Other functional units may be provided and the above are merely exemplary.
- An ISA may be defined for the system that may include i) a set of instructions that can be executed on the microprocessor, ii) a set of instructions that can be executed on the DSP platform, and iii) a set of instructions for which implementations are available on the reconfigurable logic device.
- This set of instructions may be available as a single instruction set to any common compiler that can be used to target this platform.
- This hardware description file 206 may include details about the microprocessor 202a, DSP 202c and reconfigurable logic device 202b; however, in accordance with the invention, the ISA abstracts the implementation details of the individual hardware functional units 202a-c. The details of how the various functional devices exchange data in a mixed hardware platform are hidden. The compilation of a source program thereby produces a sequence of instructions that can be executed on the system platform.
- the interface layer 203 is responsible for coordinating the execution of these instructions across the hardware functional units 202a-c as will be described in more detail below.
- the interface layer 203 may also perform the binding of instructions to hardware functional units 202a-c rather than have this predetermined a priori. These instructions can be scheduled to be executed on those functional units 202a-c in such a manner as to maximize an overall performance goal such as execution time or the like.
- late binding affords the ability to use a conventional compiler to target a mixed platform, but the binding stage can then be invoked very late in the process as a separate stage to generate an executable.
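The metric-driven binding described above can be illustrated with a minimal sketch. The unit names, latency figures, and energy figures below are hypothetical values chosen for illustration; the source does not specify any cost model, and `bind` and `bind_program` are illustrative names rather than part of the described system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitCost:
    """Hypothetical cost of one instruction on one functional unit."""
    unit: str
    latency_cycles: int
    energy_nj: float


# Hypothetical hardware description: units that implement each instruction.
HARDWARE_DESCRIPTION = {
    "ADD": [UnitCost("microprocessor", 1, 0.9), UnitCost("dsp", 2, 0.4)],
    "MULT": [
        UnitCost("microprocessor", 4, 3.0),
        UnitCost("dsp", 1, 1.2),
        UnitCost("reconfigurable", 2, 0.8),
    ],
}


def bind(instruction, metric):
    """Return the unit that minimizes the chosen metric for one instruction."""
    candidates = HARDWARE_DESCRIPTION[instruction]
    if metric == "time":
        return min(candidates, key=lambda c: c.latency_cycles).unit
    if metric == "power":
        return min(candidates, key=lambda c: c.energy_nj).unit
    raise ValueError(f"unknown metric: {metric}")


def bind_program(program, metric):
    """Bind every instruction of a compiled sequence as a separate late stage."""
    return [(op, bind(op, metric)) for op in program]
```

With these assumed figures, the same instruction sequence binds to different units depending on whether `"time"` or `"power"` is requested, which is the trade-off the text describes.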
- Fig. 3 is a diagram illustrating an embodiment of an instruction packet 300 that may be executed in a single clock period in accordance with the invention.
- Each instruction packet includes a set of instructions.
- The instruction packet 300 of Fig. 3 may use a rank-ordered structure. Rank-ordering refers to a specific ordering of instructions in the packet. In this embodiment, the ordering is such that each instruction position in the packet is associated with a specific hardware functional unit.
- a system platform may include the following functional units: a microprocessor, an ASIC, a FPGA and a DSP.
- the machine may include additional functional units; for simplification purposes the above are merely exemplary.
- The instruction packet 300 may be subdivided into bit segments 301a-d associated with respective functional units.
- The instruction packet 300 may be 128 bits.
- The packet 300 may be subdivided into four 32-bit (or other length) segments 301a-d, each segment 301a-d being associated with one of the hardware functional units.
- the instruction packet 300 may consist of multiple instructions 301a-d for the given clock cycle, for invoking particular hardware functional units of the system. While four instructions are shown for the clock cycle, those skilled in the art will recognize that the number of instructions may vary depending on the system, and the above is merely exemplary.
- respective instructions 301a-d may be directed to respective functional units, e.g., the microprocessor, the ASIC, the FPGA and the DSP.
- The instruction packet may be rank-ordered, with the ordering of instructions being selected at the time the hardware is designed. The ordering of instructions in the packet 300 may change later, should a different hardware design be implemented; for a given design, however, the ordering of instructions for the packet 300 is preferably fixed at design time.
- a functional unit that is not invoked during the given clock cycle may receive a no-op or a null instruction from the packet 300.
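A minimal sketch of the rank-ordered packet follows, assuming the 128-bit packet with four 32-bit segments mentioned above. The slot order, the all-zero no-op encoding, and the function names are assumptions made for illustration only.

```python
NOP = 0x00000000  # assumed encoding of a no-op / null instruction
SLOT_ORDER = ("microprocessor", "asic", "fpga", "dsp")  # assumed design-time rank order
SLOT_BITS = 32
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_packet(instructions):
    """Pack a {unit: 32-bit word} mapping into one 128-bit integer.

    Units missing from the mapping receive a no-op in their slot.
    """
    unknown = set(instructions) - set(SLOT_ORDER)
    if unknown:
        raise ValueError(f"unknown functional units: {sorted(unknown)}")
    packet = 0
    for unit in SLOT_ORDER:
        word = instructions.get(unit, NOP)
        if not 0 <= word <= SLOT_MASK:
            raise ValueError(f"instruction for {unit} does not fit in {SLOT_BITS} bits")
        packet = (packet << SLOT_BITS) | word
    return packet


def unpack_packet(packet):
    """Split a 128-bit packet back into one 32-bit word per functional unit."""
    words = {}
    for unit in reversed(SLOT_ORDER):
        words[unit] = packet & SLOT_MASK
        packet >>= SLOT_BITS
    return words
```

Because the position of a segment identifies its functional unit, no per-instruction identifier is needed, at the cost of storing a no-op for every idle unit.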
- Fig. 4 is a diagram illustrating another embodiment of an instruction packet 400 using instruction identifiers that may be executed in a single clock period in accordance with the invention.
- the instruction packet 400 may include a header packet 401 that maintains a count of the number of functional units targeted for execution in the instruction cycle. Following the header packet 401, a series of instruction identifiers 402, 403 may be included with the packet 400.
- the instruction identifiers 402, 403 may be further subdivided to include an associated header field 404a, 404b, and an associated instruction field 405a, 405b.
- The header field 404a, 404b encodes information that uniquely identifies a given functional unit (such as the microprocessor, the ASIC, the FPGA or the DSP), while the instruction field 405a, 405b contains an instruction for the targeted functional unit.
- the design eliminates the need for using no-ops or null instructions and makes more efficient use of instruction storage.
- the number of bits, k, used to represent a given instruction packet can be variable in length. While two instructions are shown in Fig. 4, those skilled in the art will recognize that any number of instructions can be provided.
- the above data structures 300, 400 for representing an instruction packet for mixed platforms are merely exemplary as a means of combining the individual instructions that are to be executed by the hardware functional units that make up the mixed hardware platform.
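The identifier-based packet of Fig. 4 can be sketched in the same spirit. The field widths here (a one-byte count header, a one-byte unit identifier, and a 32-bit instruction field) and the numeric unit IDs are assumptions; the source only states that the packet length is variable.

```python
import struct

# Hypothetical numeric identifiers for the functional units.
UNIT_IDS = {"microprocessor": 0, "asic": 1, "fpga": 2, "dsp": 3}
UNIT_NAMES = {value: name for name, value in UNIT_IDS.items()}

ENTRY = struct.Struct(">BI")  # unit identifier (1 byte) + instruction (4 bytes)


def encode_packet(instructions):
    """Encode a list of (unit_name, 32-bit word) pairs as a variable-length packet."""
    out = bytearray(struct.pack(">B", len(instructions)))  # header: instruction count
    for unit, word in instructions:
        out += ENTRY.pack(UNIT_IDS[unit], word)
    return bytes(out)


def decode_packet(data):
    """Decode a packet back into a list of (unit_name, 32-bit word) pairs."""
    (count,) = struct.unpack_from(">B", data, 0)
    offset = 1
    decoded = []
    for _ in range(count):
        unit_id, word = ENTRY.unpack_from(data, offset)
        decoded.append((UNIT_NAMES[unit_id], word))
        offset += ENTRY.size
    return decoded
```

Only targeted units appear in the packet, so a cycle that uses two of the four units occupies 11 bytes under these assumed widths rather than a fixed 16.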
- the target to be a hardware platform made up of multiple functional units each capable of executing several instructions that are not unique to a given functional unit.
- an interpretive engine that is capable of deciphering the instruction packet and assigning a given instruction to a given hardware functional unit at run-time.
- a compiler can be established that can effectively compile application software to generate code for the mixed hardware platform.
- The compiler may use the hardware description information and target the platform as abstracted by the ISA. This is advantageous in that it allows a mixed platform to be treated uniformly, thus leveraging conventional compiler technologies for code generation on such systems. This also exemplifies an extreme case of late binding. That is, the meaning of a given instruction is bound to a hardware implementation at run-time. Thus, for example, if multiple ADD instructions are available on the mixed platform, the interpretive engine may decide that any one of those ADD implementations is a suitable candidate for run-time binding. This can be advantageous in systems where redundancy is important, such as safety-critical applications, where the malfunction of a functional unit would no longer be critical since instructions can be directed to alternative hardware functional units that can execute the same instruction.
- the hardware description file is a machine-readable file that records the characteristics of the individual functional units in a manner that can be queried by the compiler.
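As a rough illustration of a machine-readable, queryable hardware description, the sketch below uses JSON. The source does not name a file format, so the JSON layout, the field names, and the numbers are all assumptions.

```python
import json

# Hypothetical hardware description file contents.
HARDWARE_DESCRIPTION_JSON = """
{
  "units": {
    "microprocessor": {"registers": 32, "latency": {"ADD": 1, "MULT": 4}},
    "dsp":            {"registers": 16, "latency": {"ADD": 2, "MULT": 1}},
    "reconfigurable": {"registers": 8,  "latency": {"MULT": 2}}
  }
}
"""


def load_description(text):
    """Parse the description and return the per-unit records."""
    return json.loads(text)["units"]


def units_implementing(units, instruction):
    """Query: which functional units offer an implementation of the instruction?"""
    return sorted(name for name, info in units.items() if instruction in info["latency"])


def latency(units, unit, instruction):
    """Query: latency in cycles of the instruction on the given unit."""
    return units[unit]["latency"][instruction]
```

A compiler or binder could issue queries of this kind without knowing anything else about how each unit is built.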
- Fig. 5 is a diagram showing a preferred process flow for generating a sequence of instruction packets in accordance with the invention.
- source code in a given language may be passed to a compiler (Step 501) which may generate output code in an intermediate form.
- hardware description information may be used by the compiler to aid in generating optimized intermediate code for the given system platform (Step 502).
- the optimized intermediate code resulting from the compiler may not yet be in a form suitable for execution on a particular system.
- a binding phase may be implemented (Step 503). The binding phase effectively binds an instruction to a particular hardware functional unit for executing the instruction.
- a binder module may be invoked during compilation to associate a particular hardware functional unit with the instruction.
- the binder module may associate the instruction with one of several alternative hardware implementations available for integer addition.
- This binding phase may be specific to a given hardware platform, i.e., mix of hardware functional units with specific capabilities and performance. Alternatively platform specific information can be obtained by a generic binder from an external source, such as the hardware description file.
- the output of the binding phase may result in binary application code (Step 504) in a form that may be native to the generic mixed hardware platform. That is, the binary application code may be in a form that will not run "as is" on any of the functional units comprising the mixed platform.
- the process flow described above with reference to Fig. 5 includes binding as a stage that occurs after the front-end of the compiler has generated some intermediate representation.
- Implicit binding refers to a process where the choice of which hardware functional unit is used for hosting a given instruction is decided by the choices made by the compiler in generating the intermediate code.
- With implicit binding, the binder has no opportunity to decide which hardware functional unit may be used for hosting a given instruction. This may be advantageous in cases where the binding benefits from the compiler making this determination based on the global information available to the compiler, which may not necessarily be available after the code has been converted to a lower level of representation.
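The difference between a separate binding phase and implicit binding can be sketched as follows. The intermediate representation, the `IMPLEMENTATIONS` table, and the "first available unit" default are hypothetical; they only illustrate that a unit chosen by the compiler is passed through unchanged while an unbound instruction is bound by the binder.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IRInstruction:
    """Hypothetical intermediate-form instruction."""
    opcode: str
    operands: Tuple[str, ...]
    unit: Optional[str] = None  # set by the compiler under implicit binding


# Hypothetical table of units able to host each instruction.
IMPLEMENTATIONS = {"ADD": ["microprocessor", "dsp"], "MULT": ["dsp", "fpga"]}


def bind(intermediate_code):
    """Binding phase: attach a functional unit to every instruction."""
    bound = []
    for ins in intermediate_code:
        choices = IMPLEMENTATIONS[ins.opcode]
        if ins.unit is not None:
            # Implicit binding: the compiler already chose; the binder only checks it.
            if ins.unit not in choices:
                raise ValueError(f"{ins.unit} cannot execute {ins.opcode}")
            unit = ins.unit
        else:
            # Explicit binding: the binder chooses (here, simply the first candidate).
            unit = choices[0]
        bound.append(IRInstruction(ins.opcode, ins.operands, unit))
    return bound
```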
- Fig. 6 is a block diagram of a system capable of performing the above-described process.
- a compiler 601 may receive source code 602 in a given language, and the compiler 601 may generate output code 603 in an intermediate form. During the compilation stage, the compiler 601 may obtain hardware description information from a hardware description information file 604 to aid in generating optimized intermediate code for the given system platform. However, the optimized code resulting from the compiler 601 may not yet be in a state suitable for execution on the system.
- a binder module 605 may bind an instruction to a particular implementation of the instruction as described above.
- the binder module 605 may be either tailored to include information about a given platform, such as the hardware functional units and their capabilities, or it may obtain the information from an external source, such as from the hardware description file 604.
- the output of the binder module 605 may result in binary application code 606 in a form that may be native to the generic mixed hardware platform. That is, the binary application code 606 may be in a form that will not run "as is" on any of the functional units comprising the mixed platform.
- Fig. 7 is a diagram of a preferred system capable of executing instructions generated in accordance with the invention.
- an emulator or interpretive engine 701 may be resident on a master processor.
- the master processor may itself be a functional unit in the mixed hardware platform 702, or it may be a dedicated microprocessor or hardware engine solely responsible for representing the mixed platform used to execute the instructions.
- the interpretive engine 701 may itself be a program.
- the engine 701 may mimic the fetch, decode and execute cycle of a typical microprocessor to execute the instructions. For example, consider a compiled program that is a sequence of instructions.
- the interpretive engine 701 can read the first instruction and either execute the instruction or dispatch the instruction for execution by one of the hardware functional units. The next instruction is read (fetched) by the interpretive engine and the process is repeated until the program (sequence of instructions) being interpreted terminates.
- the interpretive engine 701 may alternatively be viewed as an instruction scheduler.
- instructions may be issued to functional units at run-time so if a functional unit has failed, it may be possible to assign the instruction to another functional unit. This redundancy is possible as long as the instruction implemented on another functional unit displays the same characteristics as the execution of the instruction on the original functional unit. This form of redundancy is especially beneficial for safety critical applications.
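The fetch, dispatch, and fail-over behaviour of the interpretive engine can be sketched as a small loop. The `FunctionalUnit` class, its `failed` flag, and the three-field instruction format are assumptions for illustration; real units would be hardware reached through some communication mechanism.

```python
import operator


class FunctionalUnit:
    """Hypothetical software stand-in for a hardware functional unit."""

    def __init__(self, name, operations):
        self.name = name
        self.operations = operations  # maps opcode -> two-argument callable
        self.failed = False

    def supports(self, opcode):
        return not self.failed and opcode in self.operations

    def execute(self, opcode, a, b):
        return self.operations[opcode](a, b)


def run(program, units):
    """Interpret a list of (opcode, a, b); return (unit_name, result) per instruction."""
    trace = []
    pc = 0
    while pc < len(program):
        opcode, a, b = program[pc]  # fetch
        # Run-time binding: first working unit that implements the opcode.
        unit = next((u for u in units if u.supports(opcode)), None)
        if unit is None:
            raise RuntimeError(f"no working functional unit implements {opcode}")
        trace.append((unit.name, unit.execute(opcode, a, b)))  # dispatch and execute
        pc += 1
    return trace
```

For example, with a microprocessor offering ADD and a DSP offering both ADD and MULT, marking the microprocessor as failed causes subsequent ADD instructions to be dispatched to the DSP instead.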
- Fig. 8 is a diagram of another system capable of executing instructions in accordance with the invention.
- instructions not native to the microprocessor 803 are replaced by equivalent API function calls. That is, the instructions that are to be executed on other hardware functional units in the mixed platform are replaced by API function calls that implement communication with the functional unit. The purpose of communication is to provide input data to, and retrieve output data from, the hardware functional unit.
- the API module 801 may be implemented in a language that is native to the master processor. In the case where the master processor is itself a functional unit on the mixed system, preferably only those instructions native to any other hardware functional units are replaced with API equivalent function calls.
- The API module 801 hides communication details between the microprocessor and the functional units. For example, consider a MULT instruction that is to compute the product of two integers stored in registers R1 and R2 and place the result in register R3. Assuming that the MULT instruction is to be executed on a DSP in the mixed hardware system, the compiler will compile application code assuming the availability of a MULT instruction, and the compiled code will execute on the microprocessor. Continuing with this example, every occurrence of the MULT instruction is replaced by API function calls that transfer the contents of registers R1 and R2 to the DSP, read the result from the DSP, and place the value in register R3.
- the API module implementation is customized for a particular microprocessor in a mixed system. However, the set of functions in the API and the parameters for these functions remain the same across all mixed hardware systems.
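The MULT substitution can be sketched as below. `FakeDsp`, `MixedPlatformApi`, and the function names `send_operand` and `receive_result` are hypothetical; the source says only that the API's functions and parameters stay the same across mixed systems while the implementation behind them is customized.

```python
class FakeDsp:
    """Stand-in for a DSP; real hardware would sit behind a bus or driver."""

    def __init__(self):
        self._operands = []

    def write(self, value):
        self._operands.append(value)

    def read_product(self):
        a, b = self._operands
        self._operands = []
        return a * b


class MixedPlatformApi:
    """Hypothetical API whose interface is fixed; only its body is platform-specific."""

    def __init__(self, dsp):
        self._dsp = dsp

    def send_operand(self, value):
        self._dsp.write(value)

    def receive_result(self):
        return self._dsp.read_product()


NATIVE = {"ADD": lambda a, b: a + b}  # instructions the microprocessor runs itself


def execute(program, registers, api):
    """Run (opcode, src1, src2, dst) tuples on the microprocessor, using API calls for MULT."""
    for opcode, src1, src2, dst in program:
        if opcode in NATIVE:
            registers[dst] = NATIVE[opcode](registers[src1], registers[src2])
        elif opcode == "MULT":
            # The non-native instruction is replaced by equivalent API calls.
            api.send_operand(registers[src1])
            api.send_operand(registers[src2])
            registers[dst] = api.receive_result()
        else:
            raise ValueError(f"unsupported instruction: {opcode}")
    return registers
```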
- Fig. 9 is a diagram of a third system capable of executing instructions in accordance with the invention.
- the system shown in Fig. 9 uses a precompiled library 901 of function calls that can be used to communicate with the hardware functional unit that implements an instruction not executed by the microprocessor. The purpose of such communication is to provide input data to, and retrieve output data from, the hardware functional unit.
- The library module 901 may be implemented in a language that is native to the microprocessor. Thus, instructions not executed by the microprocessor may be replaced by library calls in much the same way as the API calls described with reference to the system of Fig. 8.
- library calls may be customized for a particular mixed system and may not present the same functions or parameters for different mixed hardware platforms.
- a binder module 902 may receive the intermediate code generated by the compiler, and in accordance with the library of function calls 901, may produce code that can run on the native microprocessor.
- The data transfer functions may be optimized for a given mixed system, including the use of library functions not available for execution on other mixed platforms. This tailoring of the library implementation renders this approach distinct from the use of an API, which retains the same library interface across implementations on alternative mixed platforms.
- Consider, for example, a MULT instruction that is to compute the product of two integers stored in registers R1 and R2 and place the result in register R3.
- The compiler will compile application code assuming the availability of a MULT instruction, and the compiled code will execute on the microprocessor.
- Every occurrence of the MULT instruction is replaced by library calls that transfer the contents of registers R1 and R2 to the reconfigurable device.
- the library calls may include those to first configure the device for operation as well as those for polling for completion. Such library calls may not be available for communication with another functional unit that can execute the integer multiplication, for example a DSP.
- the library module implementation as well as the interface is customized for a particular microprocessor in a mixed system.
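A platform-specific library call for the reconfigurable device might look like the sketch below, which adds the configuration and completion-polling steps mentioned above. The device class, the configuration name `"int_multiplier"`, the two-poll delay, and the `max_polls` limit are all assumptions for illustration.

```python
class FakeReconfigurableDevice:
    """Stand-in for a reconfigurable logic device that needs configuring and polling."""

    def __init__(self):
        self.configuration = None
        self._operands = []
        self._polls_until_done = 0

    def load_configuration(self, name):
        self.configuration = name

    def write(self, value):
        self._operands.append(value)
        self._polls_until_done = 2  # pretend the result needs two polls to appear

    def done(self):
        if self._polls_until_done > 0:
            self._polls_until_done -= 1
        return self._polls_until_done == 0

    def read(self):
        a, b = self._operands
        self._operands = []
        return a * b


def lib_mult(device, a, b, max_polls=100):
    """Hypothetical library call replacing MULT on this particular platform."""
    if device.configuration != "int_multiplier":
        device.load_configuration("int_multiplier")  # configure before first use
    device.write(a)
    device.write(b)
    for _ in range(max_polls):  # poll for completion
        if device.done():
            return device.read()
    raise TimeoutError("reconfigurable device did not complete")
```

Unlike the fixed API of Fig. 8, a call such as `lib_mult` exposes steps (configuration, polling) that would make no sense for a DSP, which is why the library interface itself varies between platforms.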
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2002226901A AU2002226901A1 (en) | 2000-11-17 | 2001-11-19 | An instruction set architecture to aid code generation for hardware platforms multiple heterogeneous functional units |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US71557800A | 2000-11-17 | 2000-11-17 | |
US09/715,578 | 2000-11-17 |
Publications (3)
Publication Number | Publication Date |
---|---|
WO2002041104A2 (en) | 2002-05-23 |
WO2002041104A3 (en) | 2002-08-08 |
WO2002041104A9 (en) | 2003-02-13 |
Family
ID=24874637
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2001/043255 WO2002041104A2 (en) | 2000-11-17 | 2001-11-19 | An instruction set architecture to aid code generation for hardware platforms multiple heterogeneous functional units |
Country Status (2)
Country | Link |
---|---|
AU (1) | AU2002226901A1 (en) |
WO (1) | WO2002041104A2 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5675804A (en) * | 1995-08-31 | 1997-10-07 | International Business Machines Corporation | System and method for enabling a compiled computer program to invoke an interpretive computer program |
US5764989A (en) * | 1996-02-29 | 1998-06-09 | Supercede, Inc. | Interactive software development system |
US6295561B1 (en) * | 1998-06-30 | 2001-09-25 | At&T Corp | System for translating native data structures and specific message structures by using template represented data structures on communication media and host machines |
-
2001
- 2001-11-19 WO PCT/US2001/043255 patent/WO2002041104A2/en not_active Application Discontinuation
- 2001-11-19 AU AU2002226901A patent/AU2002226901A1/en not_active Abandoned
Also Published As
Publication number | Publication date |
---|---|
WO2002041104A9 (en) | 2003-02-13 |
AU2002226901A1 (en) | 2002-05-27 |
WO2002041104A3 (en) | 2002-08-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
AK | Designated states |
Kind code of ref document: A3 Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A3 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
COP | Corrected version of pamphlet |
Free format text: PAGES 1/9-9/9, DRAWINGS, REPLACED BY NEW PAGES 1/4-4/4; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE |
|
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: COMMUNICATION UNDER RULE 69 EPC ( EPO FORM 1205A DATED 04/09/03 ) |
|
122 | Ep: pct application non-entry in european phase | ||
NENP | Non-entry into the national phase in: |
Ref country code: JP |
|
WWW | Wipo information: withdrawn in national office |
Country of ref document: JP |