WO2002041104A2 - An instruction set architecture to aid code generation for hardware platforms having multiple heterogeneous functional units

Info

Publication number: WO2002041104A2
Application number: PCT/US2001/043255
Authority: WO
Inventors: Krishna Palem; Hitesh Patel; Sudhakar Yalamanchili
Original assignee: Proceler, Inc.
Priority date: 2000-11-17
Filing date: 2001-11-19
Publication date: 2002-05-23
Also published as: WO2002041104A9; AU2002226901A1; WO2002041104A3

Abstract

The present invention affords a system and method for simplifying the development and deployment of high-performance embedded applications on reconfigurable computing systems (100). The invention provides a single, high-level development process, enabling system developers to utilize various programming languages to program both the reconfigurable devices (103-105) and the microprocessor (102) of a reconfigurable computing system.

Description

AN INSTRUCTION SET ARCHITECTURE TO AID CODE GENERATION FOR HARDWARE PLATFORMS HAVING MULTIPLE HETEROGENEOUS FUNCTIONAL UNITS

The present invention relates generally to microprocessors, and more particularly, to microprocessor Instruction Set Architectures (ISA) and software design processes.

BACKGROUND OF THE INVENTION

The need for increased computational power has resulted in many special purpose or dedicated electronic computational engines that are designed to perform particular functions efficiently. These devices usually outperform equivalent general purpose microprocessors in raw computational power. However, general purpose microprocessors still play a key role in system architectures. In the embedded industry, for example, it is common to find a collection of dedicated devices combined together to accomplish a particular task. Embedded systems are, generally, electronic computational engines that reside in devices ranging from toasters to aircraft. The characteristics of these systems, such as form-factor, size, durability and reliability, are generally dictated by the applications. As an example, a typical mobile phone may have as much computational power as a typical high-end desktop machine, yet may be embodied in a package small enough to fit in the palm of the hand. This miniaturization can be accomplished by mixing a variety of hardware devices, each dedicated to performing a given task, such as digital signal processors (DSPs) for performing mathematically intensive filtering operations, or field-programmable gate arrays (FPGAs) for performing certain dedicated I/O operations, while a microprocessor may oversee and coordinate these activities. Mixed hardware systems may include multiple functional units for performing different tasks. Mixed systems are similar to certain modern microprocessors, which include multiple functional units on a single chip, although not as tightly coupled.

Reconfigurable computing systems are one aspect, i.e., a subset, of mixed hardware systems. Reconfigurable computing refers to the use of reconfigurable hardware, such as an FPGA, to serve as the computational unit. In many cases, the FPGA may be coupled with a conventional microprocessor. An FPGA comprises an array of configurable logic blocks, each one of them able to perform a basic logic function. Advantageously, the reconfigurable hardware can assume a range of hardware functional behavior given a particular configuration code referred to as a bitstream. The bitstream instructs the device on how to configure itself so that it can behave in a certain way. Accordingly, such devices can be set to assume different behaviors as required by a particular application. For example, in the simplest case, the device may be configured at start-up and used with that configuration throughout the life of the application. In other cases, the device may require reconfiguration at run-time during different phases of an executing application. An advantage of such devices is that they can assume different behaviors, at a hardware level, after the devices have been fabricated. Other devices, such as ASICs, are bound, once fabricated, to the functionality they were designed for and cannot assume different behaviors. A hardware device which is reconfigurable, therefore, can offer the advantage of being very fast at the task it is configured to perform, but at the same time be flexible so as to be capable of resetting to a new state of behavior by reconfiguration.

Software applications for systems comprising mixed hardware devices are typically created using a mixed mode of development by engineers having different skill-sets and using a mixed set of software development tools. This disparity in development paths (i.e., lack of a single cohesive development phase) makes the design and implementation of applications for mixed systems a challenge.

Additionally, while the issue of software portability is significant, it is not new to the microprocessor industry. In the early days of computing, the task of programming a microprocessor meant that a very intimate understanding of the particular underlying microprocessor architecture was necessary. Thus, software was written for a particular system, and changes within the same hardware family meant that software had to be written anew. As hardware became cheaper and software became more expensive, it was realized that a method of abstracting the hardware was necessary.

Accordingly, Instruction Set Architectures were developed to help solve the above problems. An Instruction Set Architecture provides a functional view of the instructions implemented in an underlying set of hardware. This functional view leaves out details of the exact hardware implementation of instructions. Thus, for example, a simple microprocessor Instruction Set Architecture may functionally define each instruction, such as an addition operation, with a unique n-bit identifier. Further, this abstraction may also include details of the input and output operands, operand order and operand properties, among other details. These operand properties may include operand sizes, and identify how operands are manipulated. Thus, in register-register models of instruction set architectures, some operations can only operate on register level operands instead of memory operands. By presenting a functional abstraction of the instructions in a given hardware functional unit, it is possible to hide changes to the hardware implementation of instructions. This means that software that is written using the ISA of a given microprocessor is immune to changes in the hardware implementation of instructions as long as the functional view of the instructions is maintained. This functional view is used both by programmers and compiler tools to generate code for the hardware.
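By way of illustration only, the functional view described above can be sketched in Python. The 4-bit opcode values, the table named ISA, the execute function, and the two adder implementations are assumptions made for this sketch and do not correspond to any particular microprocessor.

```python
# Illustrative sketch only: a toy ISA with hypothetical 4-bit opcodes.
# Software refers to instructions by opcode; the implementation behind
# each opcode can be swapped without changing that software.

def ripple_add(a, b):
    # One hypothetical implementation of addition (carry propagation).
    # Assumes non-negative integers.
    while b:
        a, b = a ^ b, (a & b) << 1
    return a

def fast_add(a, b):
    # A second hypothetical implementation with the same functional behavior.
    return a + b

def subtract(a, b):
    return a - b

# Functional view: opcode -> (mnemonic, operand count). No hardware details.
ISA = {0b0001: ("ADD", 2), 0b0010: ("SUB", 2)}

def execute(opcode, operands, implementations):
    """Look up the functional definition, then run whichever implementation is installed."""
    mnemonic, arity = ISA[opcode]
    if len(operands) != arity:
        raise ValueError(f"{mnemonic} expects {arity} operands")
    return implementations[mnemonic](*operands)

# The same program runs unchanged on two different "hardware" implementations.
program = [(0b0001, (2, 3)), (0b0010, (9, 4))]
old_hw = {"ADD": ripple_add, "SUB": subtract}
new_hw = {"ADD": fast_add, "SUB": subtract}

results_old = [execute(op, args, old_hw) for op, args in program]
results_new = [execute(op, args, new_hw) for op, args in program]
```

In this sketch the program list plays the role of software written against the ISA: it is unaffected when the adder implementation changes, provided the functional view is maintained.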

The lack of a coherent development phase is particularly acute for reconfigurable computing systems. Reconfigurable computing techniques have been disclosed in several prior art references. Some are described herein. Generally, authors may use their own proprietary instruction sets or hardware architectures to utilize reconfigurable hardware. At design time, the interface mechanism or method of communication between the microprocessor and the reconfigurable hardware of the system is usually determined. Accordingly, a team of software engineers may focus on programming the microprocessor and another team, usually hardware engineers, may focus on the reconfigurable hardware. This may involve shared debugging efforts, shared testing efforts and so forth. To accomplish this, various processes or tools may be used during the development process that attempt to unify the flow within the system. For example, the software engineers may use a set of tools such as editors, compilers, debuggers, etc., which are distinct from the tools the hardware engineers may use, such as CAD tools, etc. Unfortunately, this results in separate programming methods. The lack of a single development phase and the lack of portability of applications are, generally, also true of mixed hardware systems.

One conventional reconfigurable computing technique is described in A Dynamic Instruction Set Computer, published by M.J. Wirthlin and B.L. Hutchings in Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines, at pages 99-107 (April 1995). The reference describes a system that supports a demand-driven modification of its instruction set. Instructions in this system are treated as removable modules that are paged in and out through partial reconfiguration as demanded by the executing program. The DISC (Dynamic Instruction Set Computer) processor is implemented entirely on an FPGA with each instruction in the instruction set implemented as independent circuit elements. The DISC processor includes a global controller that has access to a range of instruction modules which may or may not be available in the hardware. Instructions not available in the hardware are dynamically paged by halting the processor and then partially configuring the hardware to include the new instruction. Programs for the processor are written using assembly language with each instruction having a corresponding assembly language opcode. Thus, new instructions mean that the assembly language needs to be appropriately augmented. The instruction modules used in the DISC processor are implemented a priori. This processor has a potentially unlimited instruction set bound by the number of bits used to represent the op-codes. The problem with DISC processors is that op-code meanings used in one program may be different from those used in another due to the potential lack of op-codes available for new instructions created. This can lead to code portability problems. In addition, for every application, there may be a need to maintain a different code generation tool which uses the op-codes that the given application is familiar with.

Another reference, Garp: A MIPS Processor with a Reconfigurable Coprocessor, published by John R. Hauser and John Wawrzynek in IEEE Symposium on Field-Programmable Custom Computing Machines (1997), describes a system consisting of a single-issue MIPS processor core with reconfigurable hardware to be used as an accelerator. The MIPS core and the reconfigurable hardware are tightly coupled, being resident on the same chip and sharing memory and cache. Explicit processor move instructions give the main processor complete control over the loading and execution of the reconfigurable hardware configurations. Standard ANSI-C code is used as input to the compiler, which generates code for the Garp platform. The compiler's target instruction set is the MIPS instruction set which includes direct access to the reconfigurable portion of the hardware. However, the Garp compiler does not compile high-level language statements into assembly code for execution by the reconfigurable portion of the processor. In addition, the FPGA configuration can only be invoked by using a set of new Garp-specific instructions that are unknown to a standard compiler and the programmer must provide assembly code to interface to the FPGA. There is no means for automatically generating assembly code to load a configuration, perform register allocation, execute the configuration, and read a return value from the FPGA.

PRISC, as described by R. Razdan and M. D. Smith in A High-Performance Microarchitecture with Hardware-Programmable Functional Units, Proceedings of the Twenty-Seventh Annual Microprogramming Workshop, IEEE Computer Society Press, 1994, is another approach that combines a microprocessor with reconfigurable logic. PRISC augments the conventional set of RISC instructions with application-specific instructions that are implemented in hardware-programmable functional units (PFUs). These PFUs, which can only be combinational circuits with a maximum delay of one CPU clock cycle, are attached directly to the CPU data path and are added in parallel with the existing functional units. A compilation mechanism is also disclosed that involves adding a hardware extraction step after the code generation stage of compilation. This hardware extraction stage involves identifying sets of sequential instructions which can be potentially implemented with a PFU. The identified set is then synthesized in hardware by a synthesis package. This collective set is replaced by a single op-code that identifies the given PFU. One of the biggest drawbacks of the method is that PFU information is unavailable to the compiler, which could better optimize code if instruction information were available a priori. The op-code interface is not fully utilized since the op-code substitution is done after code generation.

The OneChip-98 processor described by Jeffrey A. Jacob and Paul Chow in Memory Interfacing and Instruction Specification for Reconfigurable Processors, Proceedings of the 1999 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA'99), at pages 145-154 (February 1999) is a scheme that couples on a single chip a 32-bit fixed logic RISC core processor with reconfigurable logic. Much like the PRISC processor described above, the OneChip-98 processor incorporates the reconfigurable resources in the form of programmable functional units in parallel with the RISC processor's basic functional unit. The programmable functional units are application specific functions that may be combinational or sequential circuits. These programmable functional units are available as pre-compiled images that the programmer selects. Unfortunately, two instruction flows exist, one that requires the creation of the custom instructions off-line and another when the program is written. Furthermore, new instructions mean that op-code extensions are required. That is, the code may not be portable from application to application if the op-code extensions are reused. Further, any code generation tool must also follow the particular version of the application, which may use a different set of programmable functional units.

U.S. Patent No. 6,077,315, entitled Compiling System and Method for Partially Reconfigurable Computing, and issued to Greenbaum et al. (2000) makes reference to a method of compiling high-level source code into executable files for use in a dynamically reconfigurable processing unit having a selectively changeable internal hardware organization. The reference discloses an architecture where hardware is supplied in the form of circuits (a central services module, processor modules, and input/output modules, for example) and bitstreams for the Instruction Set Architecture. Application software may be executed by FPGAs configured on the processor modules. The reference also discloses a method of encapsulating binary machine instructions and data together with the hardware configurations required to execute the machine instructions. The compilation method proposed uses reconfiguration directives, #pragma meta-syntax declarations as provided by the C language, to cue the compiler to generate appropriate instructions for handling reconfigurable logic operations. The drawback with this method is that standard compilers cannot be used to generate code for this platform since the compiler has to be modified to handle the different syntax.

Accordingly, given the prevalence of mixed hardware systems, including interesting hardware combinations such as reconfigurable computing devices, all of which suffer from the lack of a unified development phase, there is a need for a solution that not only offers a development path that alleviates the problem of multiple development phases, but also adequately deals with portability issues. It is to these ends that the present invention is directed.

SUMMARY OF THE INVENTION

The present invention affords a system and method for simplifying the development and deployment of high-performance embedded applications on mixed hardware systems. The invention provides a single, high-level development process, enabling system developers to utilize various programming languages to program both the reconfigurable devices and the microprocessor of a mixed hardware system.

In one aspect, the invention provides a method for generating platform independent software code from a generic instruction set defining a set of primitive instructions for performing computational functions for execution on a mixed hardware platform. The method includes compiling the generic instruction set; binding each primitive instruction to a particular implementation of the hardware functional units for executing the primitive instruction; coupling the primitive instructions to generate an instruction set packet; and issuing the instruction set packet to the functional units so that an appropriate primitive instruction is executed by respective ones of the functional units.

In another aspect, the invention provides a method of instructing mixed hardware platforms having one or more hardware functional units for performing computational functions using a generic instruction set, each mixed hardware platform having a particular hardware device implementation for performing the computational functions that includes establishing one or more instructions that identify respective functional operations of primitive instructions capable of being performed by the one or more hardware devices in the mixed hardware platforms; binding each primitive instruction to a particular hardware functional unit implementation for performing the functional operation; coupling the instructions to generate an instruction set packet; and issuing the instruction set packet to the hardware functional units so that appropriate instructions are executed by respective ones of the hardware functional units.

The binding may occur during compilation, or it may occur at a later time, such as prior to execution of the instruction. Each distinct hardware functional unit implementation for executing the primitive instructions may be determined in advance, and the functional operations of the instructions may be decoupled from the particular hardware functional unit implementations for executing the instructions. The instruction set packet may be decoded and appropriate instructions may be passed to appropriate hardware functional units for executing the respective instructions. Instruction set packets may be issued once every clocking cycle.

In another aspect, the invention provides a method for binding the functional operation of a primitive instruction with a particular hardware functional unit implementation for executing the instruction. The method may include receiving a high-level software representation of application code; compiling the high-level software representation of application code in accordance with a generic instruction set to produce output application code of primitive instructions to be executed by one or more functional units in a mixed hardware platform; binding the primitive instructions with particular hardware functional unit implementation information so that the primitive instructions can be executed by particular functional units in the mixed hardware platform; and generating resultant software application code that can be executed on the mixed hardware platform. The binding step may occur at different phases, such as during compilation and prior to executing the instruction.

In still another aspect, the invention affords a system for executing application code, generated from a generic instruction set, on a mixed hardware platform that may include one or more functional units configured to perform particular computational functions in accordance with an instruction. The functional units may include any of microprocessors, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), digital signal processors (DSPs) and other computational devices. The functional units may either be connected in a loosely coupled fashion or in a tightly coupled fashion.

The system may also include a generic instruction set defining the functional operations of primitive instructions capable of being performed by the one or more functional units on the mixed hardware platform, and a hardware description information file for maintaining hardware description information relating to the particular hardware implementation of the functional units on the mixed hardware platform. Additionally, an interface layer for associating the primitive instructions in the generic instruction set with the hardware description information so that the primitive instructions can be executed by the one or more functional units on the mixed hardware platform may be provided.

In certain features of the invention, the generic instruction set may be platform independent. The hardware description information may be maintained independent from the functional description of the instructions in the generic instruction set such that the functional operations of primitive instructions are decoupled from a distinct hardware implementation of a functional unit for executing respective instructions.

In another aspect, the invention provides a data structure for a set of instructions capable of being executed by a mixed hardware system having one or more functional units for performing computational functions. The data structure may include a bit sequence for instructing the one or more functional units, the bit sequence being partitioned so as to form an instruction portion for each functional unit in the mixed hardware system, the instruction portions being rank-ordered so as to establish a predetermined order for instructing respective ones of the functional units. The rank-ordering of instruction portions may be determined at design time.
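By way of illustration only, a rank-ordered bit sequence of this kind can be sketched in Python. The 8-bit slot width, the unit names, their order, and the NOP encoding are assumptions made for this sketch; the source does not specify field widths.

```python
# Illustrative sketch only. Slot width, unit names, their rank order and
# the NOP encoding are assumptions for this example.
RANK_ORDER = ("CPU", "FPGA", "ASIC", "DSP")  # order fixed at design time
SLOT_BITS = 8
NOP = 0x00

def pack_rank_ordered(instructions):
    """Pack {unit: opcode} into one bit sequence; slot position identifies the unit."""
    unknown = set(instructions) - set(RANK_ORDER)
    if unknown:
        raise ValueError(f"unknown functional units: {sorted(unknown)}")
    packet = 0
    for unit in RANK_ORDER:
        opcode = instructions.get(unit, NOP)  # untargeted units receive a NOP
        if not 0 <= opcode < (1 << SLOT_BITS):
            raise ValueError(f"opcode for {unit} does not fit in {SLOT_BITS} bits")
        packet = (packet << SLOT_BITS) | opcode
    return packet

def unpack_rank_ordered(packet):
    """Recover the per-unit instruction portions from their positions."""
    mask = (1 << SLOT_BITS) - 1
    portions = {}
    # The last unit in rank order occupies the least significant slot.
    for index, unit in enumerate(reversed(RANK_ORDER)):
        portions[unit] = (packet >> (index * SLOT_BITS)) & mask
    return portions

packet = pack_rank_ordered({"CPU": 0x12, "DSP": 0x7F})
decoded = unpack_rank_ordered(packet)
```

Because position alone identifies the target unit, no per-portion identifier is needed in this sketch, but every unit occupies a slot in every packet even when it has nothing to execute.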

In still another aspect, the invention provides a data structure for a set of instructions capable of being executed by a mixed hardware system having one or more functional units for performing computational functions. The data structure may include a header packet for maintaining a count of the number of functional units targeted for a particular instruction cycle, and a series of instruction portions for instructing respective ones of the functional units. The instruction portions may be partitioned to include an associated header field and an associated instruction field, the header field encoding information that uniquely identifies a given functional unit, and the instruction field including an instruction for the targeted functional unit.
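By way of illustration only, a packet with a count header followed by identified instruction portions can be sketched in Python. The one-byte field widths and the numeric unit identifiers are assumptions made for this sketch.

```python
# Illustrative sketch only. One-byte fields and the unit identifiers
# below are assumptions for this example.
UNIT_IDS = {"CPU": 0, "FPGA": 1, "ASIC": 2, "DSP": 3}
UNIT_NAMES = {ident: name for name, ident in UNIT_IDS.items()}

def encode_packet(instructions):
    """Encode [(unit, opcode), ...] as: count byte, then (unit id, opcode) pairs."""
    if len(instructions) > 0xFF:
        raise ValueError("too many instruction portions for a one-byte count")
    out = bytearray([len(instructions)])      # header packet: number of targeted units
    for unit, opcode in instructions:
        if not 0 <= opcode <= 0xFF:
            raise ValueError(f"opcode for {unit} does not fit in one byte")
        out.append(UNIT_IDS[unit])            # header field: identifies the unit
        out.append(opcode)                    # instruction field
    return bytes(out)

def decode_packet(data):
    """Return [(unit, opcode), ...] after checking the count against the body length."""
    count, body = data[0], data[1:]
    if len(body) != 2 * count:
        raise ValueError("packet length does not match header count")
    return [(UNIT_NAMES[body[i]], body[i + 1]) for i in range(0, 2 * count, 2)]

encoded = encode_packet([("FPGA", 0x21), ("DSP", 0x05)])
decoded = decode_packet(encoded)
```

In this sketch only the targeted units occupy space in the packet, at the cost of one identifying header field per instruction portion.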

In still another aspect, the invention affords a system for generating application code from a generic instruction set to be executed by one or more functional units in a mixed hardware system that may include a compiler for receiving source code in a given language and for generating output instruction code in an intermediate format in accordance with hardware description information relating to the one or more functional units of the mixed hardware system, and a binder for binding the instruction code with particular hardware implementations of the one or more functional units for executing respective instructions to generate binary application code in a format native to the mixed hardware system that can be executed by the functional units. The binder may bind the instruction code with particular hardware implementations of the one or more functional units at different phases. Each instruction may also be bound to more than one particular hardware implementation so the instruction can be executed by different functional units in the mixed hardware system.

In another aspect, the invention provides a method for generating application code for executing on a native microprocessor in a mixed hardware system that includes compiling a high-level software representation of application code in accordance with a generic instruction set to produce output application code describing respective functional operations of primitive instructions to be executed by one or more functional units on the mixed hardware platform, and binding the functional operations of the primitive instructions with API function calls that implement communication with a respective functional unit in the mixed hardware platform to execute the primitive instructions on the mixed hardware system.

Similarly, in another aspect, the invention provides a method for generating application code for executing on a native microprocessor in a mixed hardware system that includes compiling a high-level software representation of application code in accordance with a generic instruction set to produce output application code describing respective functional operations of primitive instructions to be executed by one or more functional units on the mixed hardware platform, and binding the functional operations of the primitive instructions with library calls that implement communication with a respective functional unit in the mixed hardware platform to execute the primitive instructions on the mixed hardware platform. The library calls may be customized for each mixed hardware platform. Accordingly, in the above aspects, the API and library calls effectively make the code particular to a given device, i.e., the microprocessor, so that the code is native to that device. Prior to API and/or library call substitution, the code may be executed by means of an interpretive engine.
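By way of illustration only, the substitution of library calls for primitive instructions, and interpretation prior to that substitution, can be sketched in Python. The primitive names, the four-field instruction format, and the functions interpret, bind, run_bound and make_platform_library are hypothetical and are not taken from any actual platform library.

```python
# Illustrative sketch only. Primitive names, the instruction format and
# every function below are hypothetical.
import operator

# Functional meaning of each primitive, used by the interpretive engine.
PRIMITIVES = {"ADD": operator.add, "MUL": operator.mul}

def interpret(code, env):
    """Execute unbound primitive instructions from their functional meaning."""
    env = dict(env)
    for op, dst, a, b in code:
        env[dst] = PRIMITIVES[op](env[a], env[b])
    return env

def make_platform_library(log):
    """Stand-in for platform-specific library calls that reach a functional unit."""
    def cpu_add(a, b):
        log.append("cpu_add")
        return a + b
    def dsp_mul(a, b):
        log.append("dsp_mul")
        return a * b
    return {"ADD": cpu_add, "MUL": dsp_mul}

def bind(code, library):
    """Replace each primitive with the library call chosen for this platform."""
    return [(library[op], dst, a, b) for op, dst, a, b in code]

def run_bound(bound_code, env):
    """Execute code whose primitives have already been replaced by library calls."""
    env = dict(env)
    for call, dst, a, b in bound_code:
        env[dst] = call(env[a], env[b])
    return env

code = [("MUL", "t", "x", "y"), ("ADD", "r", "t", "z")]
inputs = {"x": 3, "y": 4, "z": 5}
calls = []
interpreted = interpret(code, inputs)
native = run_bound(bind(code, make_platform_library(calls)), inputs)
```

In this sketch the unbound code remains portable, and a different table returned by a function such as make_platform_library would tailor the same code to a different platform.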

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram illustrating a multi-functional based hardware system with which the invention can be utilized;

Fig. 2 is a block diagram of an Instruction Set Architecture in accordance with the invention;

Fig. 3 is a diagram of an instruction set using rank-ordering that may be used by the system in accordance with the invention;

Fig. 4 is a diagram of an instruction set using instruction identifiers that may be used by the system in accordance with the invention;

Fig. 5 is a diagram illustrating a preferred process flow for generating software code in accordance with the invention;

Fig. 6 is a block diagram of a system capable of performing the process shown in Fig. 5;

Fig. 7 is a diagram of a system that can be used to interpret the instructions generated in accordance with the invention;

Fig. 8 is a diagram of another system that can be used to realize the ISA described in accordance with the invention; and

Fig. 9 is a diagram of a third system that can be used to realize the ISA described in accordance with the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The present invention is applicable to a wide array of implementations of mixed hardware systems comprised of multiple, distinct types of hardware functional units. The state of the art for general mixed systems offers development approaches with distinct design flows, tools and skill sets for each distinct hardware functional unit in the system. To better appreciate the invention, reconfigurable computing systems are used as one illustrative example of mixed systems. However, reconfigurable computing systems are merely exemplary of mixed systems of the type with which the invention may be employed, and are not intended to be limiting.

As referred to herein, an instruction set includes all of the instructions for a particular system. An instruction may be of a certain bit length (depending on the system architecture) and indicate functional behavior, such as add, subtract, and the like. A mixed system includes multiple hardware functional units, each of which can execute some of the instructions in the instruction set. Accordingly, instructions can be bound to a particular hardware functional unit.

Fig. 1 is a block diagram illustrating a conventional mixed hardware system 100 with which the invention can be utilized. The system 100 shown in Fig. 1 includes different types of functional units connected to a data bus 101. The data bus 101 is responsible for transmitting relevant data and instructions between the various functional units. Examples of functional units may include, but are not limited to, electronic hardware devices capable of performing computations, such as a microprocessor 102, a field-programmable gate array (FPGA) 103, an application-specific integrated circuit (ASIC) chip 104, and a digital signal processor (DSP) 105. The system may also have a master processor (not shown) that oversees the coordination of the devices.

These computational units may be resident in a tightly coupled setup, where synchronicity between respective units may be maintained by a clocking mechanism (not shown). In a loosely coupled setup of the same system, the communications medium can be one of many alternatives including wireless or optical media resulting in the use of interfaces such as an air-interface, an optical-interface, a wire-interface, or other similar interface mechanisms. Regardless, the characteristic of such a system 100 is that it is heterogeneous. That is, the functional units 102, 103, 104, 105 may display different computational capabilities both in terms of computational performance and in the range of tasks they can perform. Furthermore, communication may be asymmetric in that the time it takes for data to be transferred from one functional unit to another may not be the same system wide. Additionally, the communication of information between the devices may occur through various layers of software indirection.

In accordance with the invention, a set of instructions supports hardware/software interaction. Instruction sets are commonly used in the microprocessor field to provide an abstraction of the underlying hardware to the various software applications which use that hardware. This abstraction decouples the software from the implementation details of the hardware. If any changes are made to the underlying hardware, and provided the abstract view is not changed, the software need not change to reflect the hardware changes.

Compiler tools can use this abstract view to translate application software written in a high level language to a sequence of instructions drawn from the instruction set. This abstract view constructed around instructions is formally referred to as the Instruction Set Architecture (ISA). For mixed hardware platforms, instructions may be similarly used to abstract the functional capabilities of each of the hardware functional units. Furthermore, each instruction may be executed by more than one hardware functional unit. For example, consider a mixed system comprised of a microprocessor, FPGA and DSP functional units. Each unit may implement an integer multiplication operation (or other operation) to varying degrees of efficiency and speed and may therefore be capable of executing a multiplication instruction.

Further, in accordance with the invention, the functional behavior of an instruction may be decoupled from the hardware implementation for the instruction. Thus, a set of instructions may be developed that can be executed by multiple alternative hardware functional units. This allows each instruction in a program and the particular hardware functional unit for executing the instruction to be bound at a later phase in the software development and execution cycle than occurs conventionally. For example, in an embedded microprocessor, an ADD instruction may be bound to an adder functional unit at the time the microprocessor is designed and cannot be changed. Thus, all ADD instructions in any program are always executed by the same functional unit. In contrast, late binding implies that the hardware functional unit that executes an instruction is not determined (bound) until a much later phase in the compilation process. In accordance with the invention, binding of instructions to hardware functional units can occur at various points in time. At one end of the spectrum the binding between instructions and hardware functional units may occur when the microprocessor and the instruction set are designed. At the other end of the spectrum the binding of instructions to hardware functional units may occur at run-time just prior to the execution of the instruction. Intermediate solutions are also feasible, for example, an instruction may be bound during software compilation.
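By way of illustration only, the difference between binding an instruction to a functional unit early and binding it just prior to execution can be sketched in Python. The unit names, the table CAPABLE_UNITS, and the rule of skipping units that are currently busy are assumptions made for this sketch.

```python
# Illustrative sketch only. Unit names, the capability table and the
# "skip busy units" rule are assumptions for this example.
CAPABLE_UNITS = {
    "ADD": ["CPU", "FPGA"],
    "MUL": ["DSP", "FPGA", "CPU"],
}

def bind_early(program):
    """Early binding: each instruction is fixed to its first capable unit."""
    return [(op, CAPABLE_UNITS[op][0]) for op in program]

def bind_late(program, busy_units):
    """Late binding: choose a capable unit just before each instruction executes."""
    schedule = []
    for op in program:
        candidates = [unit for unit in CAPABLE_UNITS[op] if unit not in busy_units]
        if not candidates:
            raise RuntimeError(f"no unit currently available for {op}")
        schedule.append((op, candidates[0]))
    return schedule

program = ["MUL", "ADD", "MUL"]
early = bind_early(program)
late = bind_late(program, busy_units={"DSP"})
```

In this sketch the same program is directed to the FPGA for multiplication when the DSP is busy at run-time, which an early binding could not accommodate.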

Fig. 2 is a block diagram illustrating a preferred system architecture 200 in accordance with the invention. In the system 200 of Fig. 2, the system hardware 201 may include different hardware functional units 202a-c for performing particular system tasks. Each hardware functional unit 202a-c may in fact have alternative hardware implementations with distinct cost and performance characteristics. An interface layer 203 may implement the instruction set abstraction 204. Implementing an instruction set abstraction means that the compiler's view of the hardware is one of a set of instructions, independent of which functional unit implements these instructions. In a traditional microprocessor the instruction sequence produced by the compiler is translated into a sequence of binary patterns that encode the individual instructions and are interpreted by the microprocessor hardware in the course of executing the instructions. However, on a mixed hardware platform there is the additional task of dispatching instructions to a hardware functional unit for execution. The interface layer 203 thus implements mechanisms for coordinating the execution of instructions on multiple hardware functional units. In doing so the mechanisms themselves remain transparent to the compilation process. Thus, the compiler retains the instruction set abstraction and the compilation process need not be concerned with the mechanisms for coordinating instruction execution across multiple functional units.

A hardware description file 206 may be provided independently from the instruction set 204 and may indicate hardware specific details, such as latencies, or number of registers associated with hardware functional units 202a-c, or other hardware specific details. Since the description of the instruction set 204 and description of the hardware functional units 206 are provided separately, the hardware description information 206 can be used by a compiler to optimize the implementation of the application software 205 that is compiled for execution on a particular system platform 201. For example, the hardware descriptions 206 could be used by the interface layer 203 to schedule the execution of individual instructions on different hardware functional units 202a-c depending on the overall metric to be optimized. If, for example, the goal is to minimize execution time, the interface layer 203 may choose to have certain instructions scheduled for execution by hardware functional units to minimize execution time. Alternatively, the need to minimize power dissipation may cause the interface layer 203 to utilize a different set of hardware functional units 202a-c, most likely at the expense of increased execution time. The instruction set 204 facilitates this code optimization as will be described below.
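
By way of illustration only, the metric-driven selection described above might be sketched in Python as follows. The table layout, the unit names, the latency and energy figures, and the function `choose_unit` are all assumptions made for this sketch; the description does not specify a format for the hardware description file 206.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitCost:
    """Hypothetical cost entry for one instruction on one functional unit."""
    unit: str
    latency_cycles: int
    energy_nj: float


# Hypothetical hardware description: each instruction maps to the
# functional units that can execute it, with invented cost figures.
HARDWARE_DESCRIPTION = {
    "MULT": [
        UnitCost("microprocessor", latency_cycles=4, energy_nj=2.0),
        UnitCost("fpga", latency_cycles=2, energy_nj=5.0),
        UnitCost("dsp", latency_cycles=1, energy_nj=3.5),
    ],
    "ADD": [
        UnitCost("microprocessor", latency_cycles=1, energy_nj=0.5),
        UnitCost("dsp", latency_cycles=1, energy_nj=0.8),
    ],
}


def choose_unit(opcode, metric):
    """Pick the unit that minimizes the requested metric for an instruction."""
    candidates = HARDWARE_DESCRIPTION[opcode]
    if metric == "time":
        # Minimize latency; break ties on energy.
        best = min(candidates, key=lambda c: (c.latency_cycles, c.energy_nj))
    elif metric == "power":
        # Minimize energy; break ties on latency.
        best = min(candidates, key=lambda c: (c.energy_nj, c.latency_cycles))
    else:
        raise ValueError(f"unknown metric: {metric}")
    return best.unit
```

With these invented figures, the same MULT instruction would be placed on the DSP when execution time is the goal and on the microprocessor when power is the goal, which mirrors the trade-off described above.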

To illustrate an advantage of performing late binding in accordance with the invention, consider a mixed hardware system including a microprocessor 202a, a reconfigurable logic device 202b and a DSP 202c. Other functional units may be provided and the above are merely exemplary. An ISA may be defined for the system that may include i) a set of instructions that can be executed on the microprocessor, ii) a set of instructions that can be executed on the DSP platform, and iii) a set of instructions for which implementations are available on the reconfigurable logic device. This set of instructions may be available as a single instruction set to any common compiler that can be used to target this platform. In addition, hardware specific details that the compiler may find useful in targeting this platform may also be provided in a format that the compiler can decipher. This hardware description file 206 may include details about the microprocessor 202a, DSP 202c and reconfigurable logic device 202b; however, in accordance with the invention, the ISA abstracts the implementation details of the individual hardware functional units 202a-c. The details of how the various functional devices exchange data in a mixed hardware platform are hidden. The compilation of a source program thereby produces a sequence of instructions that can be executed on the system platform. The interface layer 203 is responsible for coordinating the execution of these instructions across the hardware functional units 202a-c as will be described in more detail below. Advantageously, the interface layer 203 may also perform the binding of instructions to hardware functional units 202a-c rather than have this predetermined a priori. These instructions can be scheduled to be executed on those functional units 202a-c in such a manner as to maximize an overall performance goal such as execution time or the like. 
Thus, late binding affords the ability to use a conventional compiler to target a mixed platform, but the binding stage can then be invoked very late in the process as a separate stage to generate an executable.

Fig. 3 is a diagram illustrating an embodiment of an instruction packet 300 that may be executed in a single clock period in accordance with the invention. Each instruction packet includes a set of instructions. The instruction packet 300 of Fig. 3 may use a rank-ordered structure. Rank-ordering refers to a specific ordering of instructions in the packet. In this embodiment the ordering is such that each instruction position in the packet is associated with a specific hardware functional unit. For example, a system platform may include the following functional units: a microprocessor, an ASIC, an FPGA and a DSP. Those skilled in the art will recognize that the machine may include additional functional units; for simplification purposes the above are merely exemplary. According to a rank-ordered instruction packet, the instruction packet 300 shown in Fig. 3 may be subdivided into bit segments 301a-d associated with respective functional units. For example, the instruction set packet 300 may be 128 bits. In such an embodiment, the packet 300 may be subdivided into four 32-bit (or other sequence) segments 301a-d, each segment 301a-d being associated with one of the hardware functional units. Accordingly, the instruction packet 300 may consist of multiple instructions 301a-d for the given clock cycle, for invoking particular hardware functional units of the system. While four instructions are shown for the clock cycle, those skilled in the art will recognize that the number of instructions may vary depending on the system, and the above is merely exemplary.
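
As a minimal sketch of the 128-bit rank-ordered packet described above, the following Python code packs four 32-bit segments in a fixed order and recovers them again. The slot order, the placement of the first slot in the most significant bits, and the use of an all-zero word as the no-op encoding are assumptions of this sketch and are not taken from the description.

```python
# Assumed slot order, fixed at hardware design time.
SLOT_ORDER = ("microprocessor", "asic", "fpga", "dsp")
SLOT_BITS = 32
NOP = 0  # assumption: an all-zero word encodes a no-op


def pack_rank_ordered(instructions):
    """Pack per-unit 32-bit instruction words into one 128-bit integer.

    `instructions` maps a unit name to its instruction word; units that
    are absent from the mapping receive a no-op in their slot.
    """
    packet = 0
    for unit in SLOT_ORDER:
        word = instructions.get(unit, NOP)
        if not 0 <= word < (1 << SLOT_BITS):
            raise ValueError(f"instruction for {unit} does not fit in 32 bits")
        packet = (packet << SLOT_BITS) | word
    return packet


def unpack_rank_ordered(packet):
    """Split a 128-bit packet back into one instruction word per unit."""
    mask = (1 << SLOT_BITS) - 1
    words = {}
    for index, unit in enumerate(SLOT_ORDER):
        shift = SLOT_BITS * (len(SLOT_ORDER) - 1 - index)
        words[unit] = (packet >> shift) & mask
    return words
```

Because the position of a segment identifies its functional unit, no unit identifier needs to be stored in the packet; the cost is that every slot is present in every packet, whether or not its unit is used.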

In Fig. 3, respective instructions 301a-d may be directed to respective functional units, e.g., the microprocessor, the ASIC, the FPGA and the DSP. The instruction packet may be rank-ordered, with the ordering of instructions being pre-selected a priori at the time the hardware is designed. It should be noted that the ordering of instructions in the packet 300 may change at a later time, should a different hardware design be implemented; however, at design time, the ordering of instructions for the packet 300 is preferably determined. Using such an instruction packet structure, a functional unit that is not invoked during the given clock cycle may receive a no-op or a null instruction from the packet 300. Such an encoding of instructions enables multiple functional units to be initiated concurrently, executing multiple instructions in parallel and thereby speeding up the execution of the program.

Fig. 4 is a diagram illustrating another embodiment of an instruction packet 400 using instruction identifiers that may be executed in a single clock period in accordance with the invention. The instruction packet 400 may include a header packet 401 that maintains a count of the number of functional units targeted for execution in the instruction cycle. Following the header packet 401, a series of instruction identifiers 402, 403 may be included with the packet 400. The instruction identifiers 402, 403 may be further subdivided to include an associated header field 404a, 404b, and an associated instruction field 405a, 405b. The header field 404a, 404b encodes information that uniquely identifies a given functional unit (such as the microprocessor, the ASIC, the FPGA, the DSP, etc.), while the instruction field 405a, 405b contains an instruction for the targeted functional unit. Using this instruction packet structure, it is possible to exclude instructions for functional units not targeted for execution in a given clock cycle. Thus, the design eliminates the need for using no-ops or null instructions and makes more efficient use of instruction storage. Regardless of the instruction packet implementation (such as described with reference to Figs. 3 or 4), the number of bits, k, used to represent a given instruction packet can be variable in length. While two instructions are shown in Fig. 4, those skilled in the art will recognize that any number of instructions can be provided.
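
The identifier-based packet of Fig. 4 might be sketched in Python as follows. The field widths (a one-byte count, a one-byte unit identifier and a four-byte instruction word), the big-endian byte order and the numeric unit identifiers are assumptions chosen for this sketch; the description does not fix any of them.

```python
import struct

# Hypothetical numeric identifiers for the functional units.
UNIT_IDS = {"microprocessor": 0, "asic": 1, "fpga": 2, "dsp": 3}
UNIT_NAMES = {number: name for name, number in UNIT_IDS.items()}

ENTRY_FORMAT = ">BI"  # one-byte unit id, then a 32-bit instruction word
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 5 bytes, no padding


def encode_tagged(instructions):
    """Encode (unit, word) pairs as a count byte followed by tagged entries."""
    packet = bytearray([len(instructions)])  # header: number of targeted units
    for unit, word in instructions:
        packet += struct.pack(ENTRY_FORMAT, UNIT_IDS[unit], word)
    return bytes(packet)


def decode_tagged(packet):
    """Recover the (unit, word) pairs from an encoded packet."""
    count = packet[0]
    entries = []
    offset = 1
    for _ in range(count):
        unit_id, word = struct.unpack_from(ENTRY_FORMAT, packet, offset)
        entries.append((UNIT_NAMES[unit_id], word))
        offset += ENTRY_SIZE
    return entries
```

Under these assumed field widths, a packet targeting two of the four units occupies 11 bytes rather than the 16 bytes of a 128-bit rank-ordered packet, and the packet length varies with the number of units targeted.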

The above data structures 300, 400 for representing an instruction packet for mixed platforms are merely exemplary as a means of combining the individual instructions that are to be executed by the hardware functional units that make up the mixed hardware platform. As an example, consider the target to be a hardware platform made up of multiple functional units each capable of executing several instructions that are not unique to a given functional unit. Also, consider that there exists an interpretive engine that is capable of deciphering the instruction packet and assigning a given instruction to a given hardware functional unit at run-time. A compiler can be established that can effectively compile application software to generate code for the mixed hardware platform.

To generate optimized code for a mixed hardware platform, the compiler may use the hardware description information and target the platform as abstracted by the ISA. This is advantageous in that it allows a mixed platform to be treated uniformly thus leveraging conventional compiler technologies towards code generation for such systems. This also exemplifies an extreme case of late binding. That is, the meaning of a given instruction is bound to a hardware implementation at run-time. Thus, for example, if multiple ADD implementations are available on the mixed platform then it is possible that the interpretive engine may decide that any one of those ADD implementations is a suitable candidate for run-time binding. This can be advantageous in systems where redundancy is of importance, such as in safety critical applications, where the malfunction of a functional unit would no longer be critical since instructions can be directed to alternative hardware functional units that can execute the same instruction.

Details regarding the number of registers present, the access permissions, register sizes, hardware latencies, and other parameters are indicated by the hardware description information 206 (Fig. 2). The hardware description file is a machine-readable file that records the characteristics of the individual functional units in a manner that can be queried by the compiler.

Fig. 5 is a diagram showing a preferred process flow for generating a sequence of instruction packets in accordance with the invention. Broadly, source code in a given language may be passed to a compiler (Step 501) which may generate output code in an intermediate form. During the compilation stage, hardware description information may be used by the compiler to aid in generating optimized intermediate code for the given system platform (Step 502). However, the optimized intermediate code resulting from the compiler may not yet be in a form suitable for execution on a particular system. To execute the code on a given system, a binding phase may be implemented (Step 503). The binding phase effectively binds an instruction to a particular hardware functional unit for executing the instruction. That is, because the functional behavior of an instruction is decoupled from the hardware implementation used for executing that instruction during the design phase, a binder module may be invoked during compilation to associate a particular hardware functional unit with the instruction. For example, in the context of an integer ADD instruction, the binder module may associate the instruction with one of several alternative hardware implementations available for integer addition. This binding phase may be specific to a given hardware platform, i.e., mix of hardware functional units with specific capabilities and performance. Alternatively platform specific information can be obtained by a generic binder from an external source, such as the hardware description file. The output of the binding phase may result in binary application code (Step 504) in a form that may be native to the generic mixed hardware platform. That is, the binary application code may be in a form that will not run "as is" on any of the functional units comprising the mixed platform.
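
A minimal Python sketch of the binding phase (Step 503) is given below. The tuple representation of the intermediate code, the latency table and the lowest-latency selection rule are assumptions made for illustration; an actual binder could apply any policy and any representation.

```python
def bind(intermediate_code, hardware_description):
    """Attach a functional unit to each unit-agnostic instruction.

    `intermediate_code` is a list of (opcode, operands) tuples produced by
    a compiler front end. `hardware_description` maps each opcode to a
    dictionary of {unit name: latency in cycles}. Both layouts are
    hypothetical.
    """
    bound = []
    for opcode, operands in intermediate_code:
        candidates = hardware_description.get(opcode)
        if not candidates:
            raise LookupError(f"no functional unit implements {opcode}")
        # Assumed policy: take the lowest-latency implementation. On a tie,
        # the unit listed first in the description is kept.
        unit = min(candidates, key=candidates.get)
        bound.append((unit, opcode, operands))
    return bound


# Invented example platform and program.
EXAMPLE_DESCRIPTION = {
    "ADD": {"microprocessor": 1, "dsp": 1},
    "MULT": {"microprocessor": 4, "dsp": 1},
}
EXAMPLE_PROGRAM = [
    ("ADD", ("R1", "R2", "R3")),
    ("MULT", ("R3", "R1", "R4")),
]
```

Calling `bind(EXAMPLE_PROGRAM, EXAMPLE_DESCRIPTION)` leaves the ADD on the microprocessor and places the MULT on the DSP. Because the platform information is passed in as data, the same binder could serve a different mix of functional units, corresponding to the generic binder alternative mentioned above.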

The process flow described above with reference to Fig. 5 includes binding as a stage that occurs after the front-end of the compiler has generated some intermediate representation. However, in another aspect of the invention, it is possible that implicit binding may occur at a later stage of the compiler front-end operation as opposed to occurring after this stage. Implicit binding refers to a process where the choice of which hardware functional unit is used for hosting a given instruction is decided by the choices made by the compiler in generating the intermediate code. Thus, no opportunity is available to the binder unit in deciding which hardware functional unit may be used for hosting a given instruction. This may be advantageous in cases where the binding phase may benefit further by the compiler making this determination based on the global information available to the compiler, which may not necessarily be available after the code has been converted to a lower level of representation.

Fig. 6 is a block diagram of a system capable of performing the above-described process. A compiler 601 may receive source code 602 in a given language, and the compiler 601 may generate output code 603 in an intermediate form. During the compilation stage, the compiler 601 may obtain hardware description information from a hardware description information file 604 to aid in generating optimized intermediate code for the given system platform. However, the optimized code resulting from the compiler 601 may not yet be in a state suitable for execution on the system. To execute the code on a given system, a binder module 605 may bind an instruction to a particular implementation of the instruction as described above. The binder module 605 may be either tailored to include information about a given platform, such as the hardware functional units and their capabilities, or it may obtain the information from an external source, such as from the hardware description file 604. The output of the binder module 605 may result in binary application code 606 in a form that may be native to the generic mixed hardware platform. That is, the binary application code 606 may be in a form that will not run "as is" on any of the functional units comprising the mixed platform.

Fig. 7 is a diagram of a preferred system capable of executing instructions generated in accordance with the invention. As shown in Fig. 7, an emulator or interpretive engine 701 may be resident on a master processor. The master processor may itself be a functional unit in the mixed hardware platform 702, or it may be a dedicated microprocessor or hardware engine solely responsible for representing the mixed platform used to execute the instructions. For example, the interpretive engine 701 may itself be a program. The engine 701 may mimic the fetch, decode and execute cycle of a typical microprocessor to execute the instructions. For example, consider a compiled program that is a sequence of instructions. The interpretive engine 701 can read the first instruction and either execute the instruction or dispatch the instruction for execution by one of the hardware functional units. The next instruction is read (fetched) by the interpretive engine and the process is repeated until the program (sequence of instructions) being interpreted terminates.

The interpretive engine 701 may alternatively be viewed as an instruction scheduler. Advantageously, instructions may be issued to functional units at run-time so if a functional unit has failed, it may be possible to assign the instruction to another functional unit. This redundancy is possible as long as the instruction implemented on another functional unit displays the same characteristics as the execution of the instruction on the original functional unit. This form of redundancy is especially beneficial for safety critical applications.
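
The fetch, decode and dispatch loop of the interpretive engine 701, together with run-time reassignment away from a failed functional unit, might be sketched in Python as follows. The `FunctionalUnit` class, its `healthy` flag, the four-field instruction tuple and the first-available selection rule are assumptions of this sketch rather than features taken from the description.

```python
import operator


class FunctionalUnit:
    """Hypothetical model of a functional unit and the opcodes it supports."""

    def __init__(self, name, supported, healthy=True):
        self.name = name
        self.supported = supported  # maps opcode -> two-argument callable
        self.healthy = healthy
        self.executed = []          # opcodes this unit has actually run

    def execute(self, opcode, a, b):
        self.executed.append(opcode)
        return self.supported[opcode](a, b)


def run(program, units, registers):
    """Interpret (opcode, src1, src2, dst) tuples, binding each at run-time."""
    for opcode, src1, src2, dst in program:  # fetch the next instruction
        for unit in units:                   # decode and look for a unit
            if unit.healthy and opcode in unit.supported:
                registers[dst] = unit.execute(
                    opcode, registers[src1], registers[src2])
                break
        else:
            raise RuntimeError(f"no healthy unit can execute {opcode}")
    return registers
```

If a DSP that would normally be preferred for MULT is marked as failed, the loop simply falls through to the next unit that implements MULT, so the program still completes, provided that the alternative implementation produces the same result.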

Fig. 8 is a diagram of another system capable of executing instructions in accordance with the invention. In the system of Fig. 8, instructions not native to the microprocessor 803 are replaced by equivalent API function calls. That is, the instructions that are to be executed on other hardware functional units in the mixed platform are replaced by API function calls that implement communication with the functional unit. The purpose of communication is to provide input data to, and retrieve output data from, the hardware functional unit. The API module 801 may be implemented in a language that is native to the master processor. In the case where the master processor is itself a functional unit on the mixed system, preferably only those instructions native to any other hardware functional units are replaced with API equivalent function calls.

Preferably, the API module 801 hides communication details between the microprocessor and the functional units. For example, consider a MULT instruction that is to compute the product of two integers stored in registers R1 and R2 and place the result in register R3. Assuming that the MULT instruction is to be executed on a DSP in the mixed hardware system, the compiler will compile application code assuming the availability of a MULT instruction, and the compiled code will execute on the microprocessor. Continuing with this example, every occurrence of the MULT instruction is replaced by API function calls that transfer the contents of registers R1 and R2 to the DSP, read the result from the DSP, and place the value in register R3. The API module implementation is customized for a particular microprocessor in a mixed system. However, the set of functions in the API and the parameters for these functions remain the same across all mixed hardware systems.
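
The MULT example above might be sketched in Python as follows. The `SimulatedDsp` class stands in for a real device, and the names `send`, `receive` and `api_mult` are hypothetical; the description does not name the functions of the API module 801.

```python
class SimulatedDsp:
    """Stand-in for a DSP that is reachable only through data transfers."""

    def __init__(self):
        self._operands = []

    def send(self, value):
        """Transfer one operand to the device."""
        self._operands.append(value)

    def receive(self):
        """Return the product of the two operands sent so far."""
        a, b = self._operands
        self._operands = []
        return a * b


def api_mult(device, registers, src1, src2, dst):
    """Hypothetical API call that replaces one MULT instruction.

    Transfers the two source registers to the device, reads the result
    back, and stores it in the destination register.
    """
    device.send(registers[src1])
    device.send(registers[src2])
    registers[dst] = device.receive()
```

For example, with `registers = {"R1": 6, "R2": 7}`, the call `api_mult(SimulatedDsp(), registers, "R1", "R2", "R3")` leaves 42 in `registers["R3"]`. The caller never sees how the operands travel to the device, so the same call signature could be kept while the transfer mechanism is changed for a different platform.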

Fig. 9 is a diagram of a third system capable of executing instructions in accordance with the invention. The system shown in Fig. 9 uses a precompiled library 901 of function calls that can be used to communicate with the hardware functional unit that implements an instruction not executed by the microprocessor. The purpose of such communication is to provide input data to, and retrieve output data from, the hardware functional unit. The library module 901 may be implemented in a language that is native to the microprocessor. Thus, instructions not executed by the microprocessor may be replaced by library calls much in the same way that the API calls are as described with reference to the system of Fig. 8. The distinction with the system in Fig. 9 is that library calls may be customized for a particular mixed system and may not present the same functions or parameters for different mixed hardware platforms.

A binder module 902 may receive the intermediate code generated by the compiler, and in accordance with the library of function calls 901, may produce code that can run on the native microprocessor. The data transfer functions may be optimized for a given mixed system including the use of library functions not available for execution on other mixed platforms. This tailoring of the library implementation renders this approach distinct from the use of an API, which retains the same library interface across implementations on alternative mixed platforms.

For example, consider a MULT instruction that is to compute the product of two integers stored in registers R1 and R2 and place the result in register R3. Assuming that the MULT instruction is to be executed on a reconfigurable device in the mixed hardware system, the compiler will compile application code assuming the availability of a MULT instruction, and the compiled code will execute on the microprocessor. Continuing with this example, every occurrence of the MULT instruction is replaced by library calls that transfer the contents of registers R1 and R2 to the reconfigurable device. The library calls may include those to first configure the device for operation as well as those for polling for completion. Such library calls may not be available for communication with another functional unit that can execute the integer multiplication, for example a DSP. The library module implementation as well as the interface is customized for a particular microprocessor in a mixed system.
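
A platform-specific library call of the kind described above, including configuration of the device and polling for completion, might be sketched in Python as follows. The simulated device, its two-poll completion delay and the names `configure`, `load`, `done`, `result` and `lib_mult` are all assumptions made for this sketch.

```python
class SimulatedReconfigurableDevice:
    """Stand-in for a reconfigurable device that must be configured first."""

    def __init__(self):
        self.configured_for = None
        self._operands = None
        self._polls_remaining = 0

    def configure(self, function):
        """Load the (simulated) configuration for the named function."""
        self.configured_for = function

    def load(self, a, b):
        """Transfer both operands and start the computation."""
        if self.configured_for != "MULT":
            raise RuntimeError("device is not configured for MULT")
        self._operands = (a, b)
        self._polls_remaining = 2  # pretend the result takes two polls

    def done(self):
        """Report whether the computation has completed."""
        if self._polls_remaining > 0:
            self._polls_remaining -= 1
        return self._polls_remaining == 0

    def result(self):
        a, b = self._operands
        return a * b


def lib_mult(device, registers, src1, src2, dst):
    """Hypothetical library routine that replaces one MULT instruction."""
    if device.configured_for != "MULT":
        device.configure("MULT")      # configure the device before first use
    device.load(registers[src1], registers[src2])
    while not device.done():          # poll for completion
        pass
    registers[dst] = device.result()
```

Unlike the API sketch for Fig. 8, the calls used here (configuration and polling) are specific to the reconfigurable device and would have no counterpart for a DSP, which is the distinction drawn above between a tailored library and a uniform API.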

Although the present invention has been described with respect to preferred embodiments, it should be understood that various changes, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is described in the appended claims.

Claims

WHAT IS CLAIMED IS:

1. A method for generating platform independent software code from a generic instruction set defining primitive instructions for performing computational functions for execution on a mixed hardware platform, the mixed hardware platform including one or more hardware functional units, the method comprising the steps of: compiling the generic instruction set; binding each primitive instruction in the generic instruction set to a particular implementation of the hardware functional units for executing the primitive instruction; coupling the primitive instructions to generate an instruction set packet; and issuing the instruction set packet to the functional units so that an appropriate primitive instruction is executed by respective ones of the functional units.

2. The method of Claim 1, wherein the binding step is performed during compilation of the generic instruction set.

3. The method of Claim 1, wherein the binding step is performed prior to execution of the primitive instruction.

4. The method of Claim 1, wherein the step of issuing the instruction set packet is performed once every clocking cycle.

5. The method of Claim 1, further comprising the steps of determining an implementation of each particular hardware functional unit in the mixed hardware platform for executing the primitive instructions, and binding a primitive instruction to an associated implementation in a hardware functional unit.

6. The method of Claim 5, wherein an implementation of each particular hardware functional unit in the mixed hardware platform is stored in a database with hardware description information relating to each implementation.

7. The method of Claim 1, further comprising the steps of decoding the instruction set packet and passing appropriate primitive instructions to appropriate hardware functional units for executing the respective primitive instructions.

8. A method of instructing mixed hardware platforms having one or more hardware functional units for performing computational functions using a generic instruction set, each mixed hardware platform having a particular hardware functional unit implementation for performing the computational functions, comprising the steps of: establishing one or more instructions that identify respective functional operations of primitive instructions capable of being performed by the one or more hardware functional units in the mixed hardware platform; binding each instruction in the generic instruction set to a particular hardware functional unit implementation for performing the functional operation; coupling the instructions to generate an instruction set packet; and issuing the instruction set packet to the hardware functional units so that an appropriate instruction is executed by respective ones of the hardware functional units.

9. The method of Claim 8, wherein the binding step is performed during compilation of the generic instruction set.

10. The method of Claim 8, wherein the binding step is performed prior to execution of the instruction.

11. The method of Claim 8, wherein the step of issuing the instruction set packet is performed once every clocking cycle.

12. The method of Claim 8, further comprising the steps of determining an implementation of each hardware functional unit in the mixed hardware system for executing the instructions, and binding an instruction to an associated implementation of a hardware functional unit.

13. The method of Claim 8, further comprising the steps of decoding the instruction set packet and passing appropriate instructions to appropriate hardware functional units for executing the respective instructions.

14. A method of instructing mixed hardware platforms having one or more hardware functional units for performing computational functions using a generic instruction set, each mixed hardware platform having a particular hardware functional unit implementation for performing the computational functions, comprising the steps of: establishing one or more instructions that identify respective basic tasks capable of being performed by the one or more hardware functional units in the mixed hardware platform; binding the instruction to a particular hardware functional unit implementation for executing a basic task associated with the instruction; coupling the instructions to generate an instruction set packet; and issuing the instruction set packet to the hardware functional units so that an appropriate instruction is executed by respective ones of the hardware functional units.

15. The method of Claim 14, wherein the binding step is performed during compilation of the generic instruction set.

16. The method of Claim 14, wherein the binding step is performed prior to execution of the instruction.

17. The method of Claim 14, wherein the step of issuing the instruction set packet is performed once every clocking cycle.

18. The method of Claim 14, further comprising the steps of determining an implementation of each hardware functional unit in the mixed hardware system, and binding an instruction to an associated implementation of a hardware functional unit.

19. The method of Claim 14, further comprising the steps of decoding the instruction set packet and passing appropriate instructions to appropriate hardware functional units for executing the respective instructions.

20. A method of instructing mixed hardware platforms having one or more hardware functional units for performing computational functions using a generic instruction set, each mixed hardware platform having a particular hardware functional unit implementation for performing the computational functions, comprising the steps of: establishing one or more instructions that identify respective functional operations of primitive instructions capable of being performed by the one or more hardware functional units in the mixed hardware platform; binding each primitive instruction to a particular hardware functional unit implementation for executing the primitive instruction; coupling the primitive instructions to generate a collection of instruction sequences each of which is native to a given hardware functional unit in the mixed platform; and issuing the instruction sequences to the hardware functional units so that appropriate instructions are executed by respective ones of the hardware functional units in the mixed platform.

21. The method of Claim 20, wherein the binding step is performed during compilation of the generic instruction set.

22. The method of Claim 20, wherein the binding step is performed prior to execution of the instruction.

23. The method of Claim 20, wherein the step of issuing the collection of instruction sequences is performed once every clocking cycle.

24. The method of Claim 20, further comprising the steps of determining an implementation of each hardware functional unit in the mixed hardware system for executing the primitive instructions, and binding a primitive instruction with an associated implementation of a hardware functional unit.

25. The method of Claim 20, further comprising the steps of decoding the collection of instruction sequences and passing appropriate instructions to appropriate hardware functional units for executing the respective instructions.

26. A method for binding the functional operation of a primitive instruction with a particular hardware functional unit implementation for executing the primitive instruction, comprising the steps of: receiving a high-level software representation of application code; compiling the high-level software representation of application code in accordance with a generic instruction set to produce output application code of primitive instructions to be executed on a mixed hardware platform; binding the primitive instructions with particular hardware functional unit implementation information so that the primitive instructions can be executed by particular hardware functional units in the mixed hardware platform; and generating resultant software application code that can be executed on the mixed hardware platform.

27. The method of Claim 26, wherein the binding step occurs during compilation of the high-level software representation of the application code.

28. The method of Claim 26, wherein the binding step occurs prior to execution of the instruction.

29. A system for executing application code from a generic instruction set on a mixed hardware platform, comprising: one or more functional units configured to perform computational functions in accordance with an instruction; a generic instruction set defining the functional operations of primitive instructions capable of being performed by the one or more functional units on the mixed hardware platform; a hardware description information file for maintaining hardware description information relating to the particular implementation of the hardware functional units on the mixed hardware platform; and an interface layer for associating the primitive instructions with the hardware description information so that the primitive instructions can be executed by the one or more functional units on the mixed hardware platform.

30. The system of Claim 29, wherein the functional units include any of microprocessors, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), digital signal processors (DSPs) and other computational devices.

31. The system of Claim 30, wherein the functional units are connected in a loosely coupled fashion.

32. The system of Claim 30, wherein the functional units are connected in a tightly coupled fashion.

33. The system of Claim 29, wherein the generic instruction set is platform independent.

34. The system of Claim 29, wherein the hardware description information is maintained independent from the generic instruction set such that the functional operations of instructions in the generic instruction set are decoupled from a particular hardware implementation of a functional unit for executing respective instructions in the generic instruction set.

35. A data structure for an instruction set capable of being executed by a mixed hardware system having one or more functional units for performing computational functions, comprising: a bit sequence for instructing the one or more functional units, the bit sequence being partitioned to form an instruction portion for each functional unit in the reconfigurable computing system, the instruction portions being rank-ordered to establish a predetermined order for instructing respective ones of the functional units in the mixed hardware system.

36. The data structure of Claim 35, wherein the rank-ordering of instruction portions is determined at design time.
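
The rank-ordered format of Claims 35 and 36 can be sketched as a fixed-width instruction word in which the position of each portion identifies its functional unit. The unit names, field widths, and the convention that a zero portion means "no operation" are assumptions made for this example and do not come from the claims.

```python
# Hypothetical rank order fixed at design time: (unit name, field width in bits).
# The first entry occupies the most significant bits of the instruction word.
RANK_ORDER = [("cpu0", 16), ("dsp0", 8), ("fpga0", 8)]


def pack(portions):
    """Pack one instruction portion per unit into a single bit sequence.

    Units missing from `portions` receive 0, assumed here to mean no-op.
    """
    word = 0
    for unit, width in RANK_ORDER:
        value = portions.get(unit, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"portion for {unit} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Split a packed word back into per-unit portions, using position only."""
    fields = {}
    for unit, width in reversed(RANK_ORDER):
        fields[unit] = word & ((1 << width) - 1)
        word >>= width
    return {unit: fields[unit] for unit, _ in RANK_ORDER}


if __name__ == "__main__":
    packed = pack({"cpu0": 0x1234, "dsp0": 0xAB, "fpga0": 0x01})
    print(hex(packed))
    print(unpack(packed))
```

Because the order is predetermined, no per-portion identifier is needed; the cost is that every word carries a slot for every unit, including idle ones.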

37. A data structure for an instruction set capable of being executed by a mixed hardware system having one or more functional units for performing computational functions, comprising: a header packet for maintaining a count of the number of functional units in the mixed hardware system targeted for a particular instruction cycle; and a series of instruction portions for instructing respective ones of the functional units, the instruction portions being partitioned to include an associated header field and an associated instruction field, the header field encoding information to uniquely identify a given functional unit in the mixed hardware system, and the instruction field including an instruction to be executed by the targeted functional unit.
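
By way of example, the header-based format of Claim 37 might be serialized as follows. The field sizes (a one-byte count, a one-byte unit identifier, and a four-byte instruction field) and the big-endian byte order are assumptions chosen for the sketch; the claim does not fix them.

```python
import struct

HEADER_FORMAT = ">B"    # assumed: 1-byte count of targeted functional units
PORTION_FORMAT = ">BI"  # assumed: 1-byte unit identifier + 4-byte instruction


def encode(portions):
    """Encode (unit_id, instruction) pairs for one instruction cycle."""
    packet = struct.pack(HEADER_FORMAT, len(portions))
    for unit_id, instruction in portions:
        packet += struct.pack(PORTION_FORMAT, unit_id, instruction)
    return packet


def decode(packet):
    """Read the count from the header packet, then that many portions."""
    (count,) = struct.unpack_from(HEADER_FORMAT, packet, 0)
    offset = struct.calcsize(HEADER_FORMAT)
    portions = []
    for _ in range(count):
        unit_id, instruction = struct.unpack_from(PORTION_FORMAT, packet, offset)
        portions.append((unit_id, instruction))
        offset += struct.calcsize(PORTION_FORMAT)
    return portions


if __name__ == "__main__":
    cycle = [(2, 0xDEADBEEF), (5, 0x00000010)]
    raw = encode(cycle)
    print(raw.hex())
    print(decode(raw))
```

In contrast to a rank-ordered word, only the targeted units appear in the packet, so the length varies with the count held in the header.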

38. A system for generating application code from a generic instruction set to be executed by one or more functional units in a mixed hardware system, comprising: a compiler for receiving source code in a given language and for generating output instruction code in an intermediate format in accordance with hardware description information relating to the one or more functional units of the mixed hardware system; and a binder for binding the functional operations of the instruction code with particular hardware implementations of the one or more functional units for executing respective instructions, thereby generating binary application code in a format native to the mixed hardware system and capable of being executed by the one or more functional units in the mixed hardware system.

39. The system of Claim 38, wherein the binder binds the instruction code with particular hardware implementations of the one or more functional units in the mixed hardware system during compilation.

40. The system of Claim 38, wherein the binder binds the instruction code with particular hardware implementations of the one or more functional units in the mixed hardware system prior to execution of the instruction code.

41. The system of Claim 38, wherein each instruction in the instruction code can be bound to more than one hardware implementation of the functional units in the mixed hardware system so that more than one of the functional units in the mixed hardware system can execute the instruction.
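
The binder of Claims 38 to 41 can be sketched, under stated assumptions, as a pass over intermediate-format instructions that attaches a concrete functional unit to each one. The intermediate format (a list of operation and operand tuples), the implementation table, and the policy of selecting the first listed candidate are all hypothetical; the claims do not prescribe a selection policy.

```python
# Hypothetical table: operation -> functional units that implement it.
# An operation may list several units, as contemplated by Claim 41.
IMPLEMENTATIONS = {
    "ADD": ["cpu0"],
    "MUL": ["dsp0", "cpu0"],
    "FIR": ["dsp0", "fpga0"],
}


def candidates(op, implementations):
    """Return every functional unit to which `op` could be bound."""
    units = implementations.get(op, [])
    if not units:
        raise LookupError(f"no functional unit implements {op}")
    return list(units)


def bind(intermediate_code, implementations):
    """Bind each intermediate instruction to one unit (first candidate)."""
    bound = []
    for op, operands in intermediate_code:
        unit = candidates(op, implementations)[0]
        bound.append((unit, op, operands))
    return bound


if __name__ == "__main__":
    # Stand-in for compiler output in an intermediate format.
    program = [("MUL", ("r1", "r2")), ("ADD", ("r3", "r1")), ("FIR", ("buf", "taps"))]
    for entry in bind(program, IMPLEMENTATIONS):
        print(entry)
```

The same `bind` function could be called as a final compilation pass (Claim 39) or by a loader immediately before execution (Claim 40); the sketch does not distinguish the two cases.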

42. A method for generating application code for executing on a native microprocessor in a mixed hardware system, the method comprising the steps of: compiling a high-level software representation of application code in accordance with a generic instruction set to produce output application code describing respective functional operations of primitive instructions to be executed by one or more functional units in the mixed hardware system; and binding the primitive instructions with API function calls that implement communication with a respective functional unit in the mixed hardware system to execute the primitive instructions on the mixed hardware system.

43. The method of Claim 42, wherein the API function calls are implemented in a manner unique to a host microprocessor.

44. A method for generating application code for executing on a native microprocessor in a mixed hardware system, the method comprising the steps of: compiling a high-level software representation of application code in accordance with a generic instruction set to produce output application code describing respective functional operations of primitive instructions to be executed by one or more functional units on the mixed hardware platform; and binding the primitive instructions with library calls that implement communication with a respective functional unit in the mixed hardware platform to execute the primitive instructions on the mixed hardware platform.

45. The method of Claim 44, wherein the library calls are implemented in a manner unique to a host microprocessor.

46. The method of Claim 44, wherein the library calls are customized for each mixed hardware platform.