WO2003098434A2 - Method for ordering processor operations for modulo-scheduling - Google Patents

Method for ordering processor operations for modulo-scheduling

Info

Publication number
WO2003098434A2
Authority
WO
WIPO (PCT)
Prior art keywords
operations
predecessor
ordered list
node
current operation
Prior art date
Application number
PCT/US2003/015167
Other languages
French (fr)
Other versions
WO2003098434A3 (en)
Inventor
Ralph D. Hill
Original Assignee
Quicksilver Technology, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Quicksilver Technology, Inc.
Priority to AU2003239454A
Publication of WO2003098434A2
Publication of WO2003098434A3

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/445 Exploiting fine grain parallelism, i.e. parallelism at instruction level
    • G06F8/4452 Software pipelining

Definitions

  • the present invention generally relates to computer processing and more specifically to a system and method for ordering operations to be scheduled by clustering related operations in an ordering list.
  • Instruction scheduling involves assigning operations from an original sequence of operations to specific functional units at specific times in a way to make efficient use of hardware resources.
  • the scheduled operations produce the same result as executing the operations sequentially in an original order but the operations may not be scheduled in that original order.
  • the goal is to efficiently use hardware resources and retain the original result that would be obtained by executing the operations sequentially.
  • Instruction scheduling operates by scheduling an instruction that is executed for each clock cycle of a processor. Each instruction includes a slot for each functional unit of the processor where an operation may be scheduled. The instruction scheduler then schedules operations for a functional unit during a clock cycle. Typically, instruction schedulers attempt to schedule operations where a minimum number of instructions are used and operations are scheduled for as many functional units as possible for each instruction used.
  • The process of instruction scheduling orders operations in a scheduling order list, which is typically a list of operations in the order the operations should be executed if they were executed sequentially. Typically, a data dependence graph (DDG) is used to order operations to be scheduled. The DDG is arranged based on the dependencies among a group of operations for a program code.
  • DDG data dependence graph
  • the dependencies of the DDG are represented by edges, which represent delays, i.e., the time delay required between the start of a predecessor operation and the start of a successor operation connected by the edge.
  • Operations in the DDG are assigned heights to establish a priority value for the operation.
  • a height indicates an overall dependency value based on the values of all the edges dependent upon a specific operation.
  • the operation with the greatest height in the DDG becomes the highest priority operation for scheduling.
  • Typically, the operations are ordered starting from operations with the greatest height to operations with the lowest height. Operations are then scheduled sequentially from the first ordered operation to the last ordered operation.
  • The above approach may work when scheduling a small number of functional units for each clock cycle.
  • In one example, a resulting schedule using the above method results in a large amount of data movement because operations that use the same variables may not be grouped together. This results in a schedule that requires a large number of data movement resources.
  • Thus, the processor must include a large number of data movement resources, or a schedule is produced that is inefficient in its use of time because data movement resources are exhausted and the schedule has to be extended in time to compensate.
  • a method for ordering a plurality of operations that are dependent upon one another in an ordered list to be used for scheduling comprises identifying a current operation in the plurality of operations that is not in the ordered list. Also, it is determined if the current operation has any predecessor operations that are not in the ordered list. If the current operation has predecessor operations, predecessor operations are added to the ordered list. The current operation is then added to the ordered list and a successor operation to the current operation is identified. The successor operation is now considered the current operation and the process reiterates to determine if the current operation has any predecessor operations and continues as above. The process continues until a current operation does not have any successor operations.
  • a method for ordering a plurality of operations that are dependent upon one another in an ordered list to be used for scheduling comprises: (a) identifying a current operation in the plurality of operations that is not in the ordered list; (b) determining if the current operation has any predecessor operations that are not in the ordered list; (c) if the current operation has predecessor operations, adding the predecessor operations to the ordered list; (d) adding the operation to the ordered list; (e) identifying a successor operation to the current operation, wherein the successor operation is considered the current operation; and (f) performing steps (b)-(e) until a successor operation is not identified in step (e).
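Steps (a) through (f) can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `order_chain`, the edge-list input format, and the choice of "first listed successor" in step (e) are all assumptions made for the example.

```python
from collections import defaultdict


def order_chain(edges, start):
    """Walk down from `start`, listing unlisted predecessors before each operation.

    `edges` is a list of (predecessor, successor) pairs (hypothetical format).
    """
    preds, succs = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)
        succs[src].append(dst)

    ordered = []

    def add_with_predecessors(op):
        # Steps (b)-(d): unlisted predecessors first, then the operation itself.
        if op in ordered:
            return
        for p in preds[op]:
            add_with_predecessors(p)
        ordered.append(op)

    current = start                          # step (a)
    while current is not None:
        add_with_predecessors(current)       # steps (b)-(d)
        following = succs[current]           # step (e)
        # Step (f): stop once no successor can be identified.
        current = following[0] if following else None
    return ordered
```

For the small graph `[("a", "c"), ("b", "c"), ("c", "e"), ("d", "e")]` and start `"a"`, the walk lists `a`, then pulls in `b` before `c`, then `d` before `e`.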
  • computer program products stored on tangible media that direct a processor to order operations as described below are provided.
  • Fig. 1 discloses a system for ordering operations according to one embodiment;
  • Fig. 2A illustrates an example of a DDG according to one embodiment;
  • Fig. 2B illustrates the DDG for the operations of Fig. 2A with the height shown inside each node;
  • Fig. 2C illustrates the DDG of Figs. 2A and 2B with a maximum predecessor height shown inside each node;
  • Fig. 3 illustrates a method for computing a scheduling priority according to one embodiment;
  • Fig. 4A illustrates the method for computing scheduling priority of Fig. 3 in more detail according to one embodiment;
  • Fig. 4B illustrates the descend method according to one embodiment;
  • Fig. 4C illustrates the climb method according to one embodiment;
  • Fig. 5 illustrates an output of a resultant instruction schedule for the ordered operations and program code.
  • Fig. 1 discloses a system 100 for ordering operations according to one embodiment.
  • System 100 is a computing device that outputs an ordered list of operations that may be used to schedule operations in executable instructions.
  • Examples of computing devices include personal computers, work stations, servers, personal digital assistants (PDAs), pocket PCs, and the like.
  • PDAs personal digital assistants
  • the scheduled operations may be executed in a computer processor, such as a RaPiD processor developed by the University of Washington or an adaptable execution unit developed by Quicksilver Technology, Inc.
  • the processor may be included in a cellular phone, personal digital assistant (PDA), global positioning system (GPS) receiver, etc.
  • PDA personal digital assistant
  • GPS global positioning system
  • a computer program product including software code stored on a computer readable medium that directs system 100 as described is provided.
  • Examples of computer readable media include RAM, disk drives, floppy disks, CD-ROMs, flash memory, read only memories (ROMs), and the like.
  • System 100 receives operations that are to be scheduled for a program code.
  • system 100 organizes the operations into relationships that may be represented by a data dependence graph (DDG).
  • Each node of the representation on the data dependence graph represents an operation and edges in the DDG represent dependencies between connected nodes.
  • Fig. 2A illustrates an example of a DDG according to one embodiment. As shown, each node represents an operation and the edges represent dependencies between the operations. The numbers in the nodes represent an operation number for identification purposes.
  • Fig. 2B illustrates the DDG for the operations of Fig. 2A with the height shown inside each node.
  • all edges have a weight of one, but it will be understood that different edges may have different weights.
  • Fig. 2C illustrates the DDG of Figs. 2A and 2B with a maximum predecessor height (MPH) shown inside each node.
  • the maximum predecessor height for a node is the maximum height of any predecessor nodes for the node.
  • a predecessor node is any node the current operation depends on.
  • Predecessor nodes may also be immediate predecessor nodes, which are nodes the current node depends on directly (connected to it by an edge).
  • the maximum height of any predecessor nodes of node 1 is five (from nodes 13 and 14) and that MPH is assigned for node 1.
  • the MPH of predecessor nodes of node 4 is four (from nodes 11 and 12) and that MPH is assigned to node 4.
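The height and MPH values can be computed with two short recursive functions. The sketch below is an illustration under stated assumptions: the figures are not reproduced in this text, so the edge list `EDGES` is reconstructed from the walkthrough later in the document and is hypothetical; every edge is given weight one; and a node with no predecessors is assigned its own height as its MPH (this convention is inferred from the walkthrough, where node 6 has no predecessors and an MPH of 3). Values for some nodes may therefore differ from Fig. 2C.

```python
from collections import defaultdict
from functools import lru_cache

# Hypothetical DDG reconstructed from the walkthrough (predecessor, successor).
EDGES = [(13, 10), (14, 10), (10, 7), (7, 3), (6, 3), (3, 2), (4, 2),
         (9, 4), (8, 4), (11, 9), (12, 9), (2, 1), (9, 5)]


def heights_and_mph(edges, weight=lambda src, dst: 1):
    """Return (height, mph) dictionaries for every node in `edges`."""
    preds, succs = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)
        succs[src].append(dst)

    @lru_cache(maxsize=None)
    def height(n):
        # Longest weighted path from n down to a node with no successors.
        return max((weight(n, s) + height(s) for s in succs[n]), default=0)

    @lru_cache(maxsize=None)
    def mph(n):
        # Greatest height among all (transitive) predecessors of n.
        # ASSUMPTION: a node without predecessors falls back to its own height.
        return max((max(height(p), mph(p)) for p in preds[n]),
                   default=height(n))

    nodes = {n for edge in edges for n in edge}
    return {n: height(n) for n in nodes}, {n: mph(n) for n in nodes}


height, mph = heights_and_mph(EDGES)
```

On this reconstructed graph, nodes 13 and 14 have height five, node 1 receives an MPH of five, and node 4 receives an MPH of four, which agrees with the values quoted above.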
  • the values computed for the DDG are used by system 100 to compute the order of operations in the ordered list. Specifically, a descend module 102 and a climb module 104 use the values in ordering the operations.
  • Descend module 102 implements a descend method, described below. Descend module 102 finds successor operations for a current operation according to one embodiment. For example, a successor operation is an operation that is dependent upon the execution of a current operation. In one embodiment, descend module 102 may find all successor operations that are dependent upon a current operation. Additionally, descend module 102 may find immediate successor operations, which are successor operations that are connected by edges in the DDG to the current operation.
  • Climb module 104 implements a climb method, described below.
  • Climb module 104 finds predecessor operations for a current operation.
  • Predecessor operations are operations that the current operation depends upon.
  • Climb module 104 may find all predecessor operations that the current operation depends upon.
  • Additionally, climb module 104 may find immediate predecessor operations, which are predecessor operations that are connected by an edge in the DDG to the current operation.
  • Using descend module 102 and climb module 104, system 100 is able to order operations that are dependent upon one another. The ordering of operations effectively keeps branches of the DDG together and roughly orders the operations by decreasing height.
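The distinction between immediate and all (transitive) predecessors or successors can be made concrete with a small helper. This is only a sketch; the function names `immediate` and `reachable` and the edge-list format are assumptions, not names from the patent.

```python
from collections import defaultdict


def immediate(edges):
    """Map each node to its immediate predecessors and immediate successors."""
    preds, succs = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        preds[dst].add(src)
        succs[src].add(dst)
    return preds, succs


def reachable(start, neighbours):
    """All nodes reachable from `start` by repeatedly following `neighbours`.

    Pass the predecessor map to get every operation `start` depends on,
    or the successor map to get every operation that depends on `start`.
    """
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


preds, succs = immediate([(1, 2), (2, 3), (4, 3)])
```

Here node 3 has immediate predecessors 2 and 4, while its full predecessor set also includes node 1.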
  • Fig. 3 illustrates a method for computing a scheduling priority according to one embodiment. In one embodiment, a computer implemented process orders operations in an ordered list.
  • In step S300, a list of operations for a program code is received.
  • In step S302, the dependencies among the operations are determined.
  • The dependencies may be represented by the DDGs of Figs. 2A, 2B, and 2C.
  • In step S304, priority values for the operations are determined. For example, the height and MPH of each operation are determined.
  • In step S306, one or more operations dependent upon one another are determined from the list of operations.
  • The one or more operations include an operation that has a highest priority value assigned to it.
  • The one or more operations may be operations with the highest MPH not already in the ordered list.
  • In step S308, an operation that is not in the ordered list is identified from the one or more operations.
  • The operation is an operation with the highest priority value that is not in the ordered list.
  • In step S310, if the identified operation has predecessor operations that are not in the ordered list, the predecessor operations are added to the list.
  • The predecessor operations include all predecessor operations the operation is dependent on. Also, the predecessor operations may be ordered from greatest to lowest height.
  • In step S312, once the predecessor operations are added to the list, the identified operation is added to the list. The process then returns to step S308, where the process is repeated for another operation among the one or more operations not already in the ordered list. In one embodiment, that operation is a successor operation to the already identified operation.
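One possible reading of steps S300 to S312 is sketched below. It is an interpretation, not the patent's code: the function name `order_fig3`, unit edge weights, using height alone as the priority value, and breaking ties by the smaller node identifier are all assumptions. It differs from the recursive climb of Figs. 4A to 4C in that all unlisted predecessors are added at once, sorted from greatest to lowest height.

```python
from collections import defaultdict
from functools import lru_cache


def order_fig3(edges):
    preds, succs = defaultdict(list), defaultdict(list)
    for src, dst in edges:                      # S300/S302: operations and dependencies
        preds[dst].append(src)
        succs[src].append(dst)
    nodes = sorted({n for edge in edges for n in edge})

    @lru_cache(maxsize=None)
    def height(n):                              # S304: priority value (unit weights)
        return max((1 + height(s) for s in succs[n]), default=0)

    def by_priority(n):
        return (-height(n), n)                  # greatest height first, then smaller id

    def ancestors(n):
        seen, stack = set(), [n]
        while stack:
            for p in preds[stack.pop()]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    ordered = []

    def add(op):
        # S310: all unlisted predecessors, greatest height first. With positive
        # edge weights a predecessor is always higher than its successors,
        # so this order respects every dependency.
        pending = sorted((a for a in ancestors(op) if a not in ordered),
                         key=by_priority)
        ordered.extend(pending + [op])          # S312: then the operation itself

    while len(ordered) < len(nodes):
        # S308: highest-priority operation not yet in the ordered list.
        current = min((n for n in nodes if n not in ordered), key=by_priority)
        while current is not None:
            add(current)
            unlisted = [s for s in succs[current] if s not in ordered]
            current = min(unlisted, key=by_priority) if unlisted else None
    return ordered
```

On the toy graph `[(1, 3), (2, 3), (3, 4), (5, 4)]`, the walk starts at node 1, pulls in node 2 before node 3, and pulls in node 5 before node 4.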
  • Figs. 4A, 4B and 4C illustrate one embodiment of a method for computing a scheduling priority.
  • In step S400, an instance of a DDG is constructed from operations to be ordered.
  • In step S402, latencies for edges in the instance of the DDG are determined. Additionally, heights for each operation are computed from the latencies (step S404). In step S406, MPHs for each operation are determined.
  • In step S408, the process determines if there are any operations to order. If there are no operations to order, the process ends at step S410.
  • In step S412, an operation N corresponding to a node in the DDG with the greatest height that is not in the ordered list is identified. It will be understood that any operation may be identified that is not yet in the ordered list and determining an operation with the greatest height is not required. After determining the operation N with the greatest height, the process performs a descend method with operation N in step S414.
  • Fig. 4B illustrates a flow chart of a process for the descend method according to one embodiment.
  • Descend module 102 performs the descend method in one embodiment.
  • the descend method performs a climb method with operation N as the current operation.
  • Fig. 4C illustrates a flow chart of a process for the climb method according to one embodiment.
  • Climb module 104 performs the climb method in one embodiment.
  • In step S430, the process determines if the current operation is in the ordered list.
  • Initially, the current operation is the operation N from the descend method.
  • Alternatively, the current operation may be determined from the climb method in step S438, described below. If the current operation is in the ordered list, the method proceeds to step S431, where the process determines if the current operation was determined from the descend method or the climb method.
  • The current operation may be the operation determined in step S416 from the descend method or a predecessor operation of the operation from step S438 of the climb method.
  • If the current operation came from step S416 of the descend method, the method returns to step S416 of the descend method in Fig. 4B. In this case, the climb method has been performed for step S416 of the descend method and the method proceeds to step S418.
  • If the current operation came from step S438 of the climb method, the method returns to step S438 of the climb method. In this case, the climb method has been performed with the predecessor operation and the method returns to step S434.
  • In step S432, any immediate predecessors of the current operation are sorted by decreasing MPH.
  • In step S434, the process determines if the sorted list of predecessors is empty. The list may be empty if there are no predecessors, or if the list has been emptied by the (possibly repeated) application of step S436.
  • If the list is empty, in step S440, the current operation is appended to the ordered list. The process then proceeds to step S431, described above. If the sorted list of predecessors is not empty in step S434, the first predecessor P in the sorted list is removed in step S436. In step S438, the climb method is recursively invoked with the removed predecessor P as the current operation. This recursive process continues until the operation N from the descend method and all of its predecessor operations are added to the ordered list. After the operation N from the descend method is added to the ordered list, the climb method has been performed for the current operation in step S416 and returns to the descend method.
  • Referring back to Fig. 4B, in step S418 the process determines if the current operation has any immediate successor operations. If there are no immediate successor operations, the process returns to step S414 of Fig. 4A. In this case, the descend method has been performed in step S414 and the process returns to step S408. In one example, the process returns to step S414 when a node of the lowest priority on the DDG is reached.
  • In step S420, the process selects an immediate successor operation of the current operation. The immediate successor operation will now be the current operation. In one embodiment, the immediate successor operation of a greatest height is selected.
  • In step S422, the process recursively invokes the descend method with the immediate successor operation as the current operation. The process then continues as described above.
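The driver loop of Fig. 4A together with the descend and climb methods of Figs. 4B and 4C can be sketched as three small functions. This is a hedged illustration, not the patent's code. Assumptions: the edge list `EDGES` is reconstructed from the walkthrough in this document (the figures are not reproduced here), every edge has weight one, a node without predecessors uses its own height as its MPH, and ties are broken by the smaller node number rather than by the heuristics the document describes.

```python
from collections import defaultdict
from functools import lru_cache

# Hypothetical DDG reconstructed from the walkthrough (predecessor, successor).
EDGES = [(13, 10), (14, 10), (10, 7), (7, 3), (6, 3), (3, 2), (4, 2),
         (9, 4), (8, 4), (11, 9), (12, 9), (2, 1), (9, 5)]


def scheduling_order(edges):
    preds, succs = defaultdict(list), defaultdict(list)
    for src, dst in edges:                                   # S400
        preds[dst].append(src)
        succs[src].append(dst)
    nodes = sorted({n for edge in edges for n in edge})

    @lru_cache(maxsize=None)
    def height(n):                                           # S402/S404
        return max((1 + height(s) for s in succs[n]), default=0)

    @lru_cache(maxsize=None)
    def mph(n):                                              # S406
        # ASSUMPTION: no predecessors -> fall back to the node's own height.
        return max((max(height(p), mph(p)) for p in preds[n]),
                   default=height(n))

    ordered, listed = [], set()

    def climb(op):                                           # Fig. 4C
        if op in listed:                                     # S430
            return
        # S432-S438: visit immediate predecessors by decreasing MPH.
        for p in sorted(preds[op], key=lambda p: (-mph(p), p)):
            climb(p)
        ordered.append(op)                                   # S440
        listed.add(op)

    def descend(op):                                         # Fig. 4B
        climb(op)                                            # S416
        if succs[op]:                                        # S418
            # S420: immediate successor of greatest height.
            nxt = min(succs[op], key=lambda s: (-height(s), s))
            descend(nxt)                                     # S422

    while len(ordered) < len(nodes):                         # S408
        # S412: unlisted operation with the greatest height.
        start = min((n for n in nodes if n not in listed),
                    key=lambda n: (-height(n), n))
        descend(start)                                       # S414
    return ordered


order = scheduling_order(EDGES)
```

With these assumptions the result is `[13, 14, 10, 7, 6, 3, 11, 12, 9, 8, 4, 2, 1, 5]`, the same sequence as the final scheduling order list in the worked example. A different tie-breaking rule could legitimately produce a different, equally valid list.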
  • When several nodes tie, the node with the greatest number of outputs may be selected first.
  • A reason for this is that scheduling nodes with more outputs earlier generally allows more nodes to be scheduled earlier. Also, if one of the nodes in the tie is terminal in the life of a variable, that node is selected. This shortens the lifetime of the variable, possibly reducing the number of registers required. Further, the node with the lowest number of valid locations for scheduling may be selected. Nodes that have fewer valid locations are more difficult to schedule because fewer functional units 106 exist to execute the operations. For the purposes of this example, node 13 is chosen first. The descend method is then called with the operation corresponding to node 13 as the current operation. It will be understood that the above three techniques may be easily combined or used separately.
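The three tie-breaking techniques can be combined into a single sort key. The sketch below is purely illustrative: the `Candidate` fields and the order in which the three criteria are applied are assumptions, since the text only says the techniques may be combined or used separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A node taking part in a tie (all fields are hypothetical)."""
    node_id: int
    num_outputs: int               # how many nodes consume this node's result
    ends_variable_lifetime: bool   # node is the last use of some variable
    valid_locations: int           # number of places the node can be scheduled


def tie_break_key(c):
    # Smaller keys win: more outputs first, then nodes that end a variable's
    # lifetime, then nodes with fewer valid locations, then the smaller id.
    return (-c.num_outputs, not c.ends_variable_lifetime,
            c.valid_locations, c.node_id)


def pick(tied):
    return min(tied, key=tie_break_key)
```

For example, `pick([Candidate(13, 1, False, 2), Candidate(14, 2, False, 2)])` selects node 14 because it has more outputs.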
  • The descend method first calls the climb method for the current operation corresponding to node 13. Node 13 is not yet in the scheduling order list and the climb method determines if there are any immediate predecessors to node 13. In this case, there are no immediate predecessors and the climb method adds node 13 to the end of the scheduling order list. The climb method then returns to the descend method. The scheduling order list now includes node 13.
  • Next, the descend method determines the successor of node 13. In one embodiment, the successor with the greatest height is determined. In this case, node 10 is the only successor node for node 13. The descend method is then called for node 10.
  • The descend method calls the climb method for node 10.
  • the climb method determines that node 10 is not in the scheduling order list and determines if node 10 has any immediate predecessor operations.
  • Nodes 13 and 14 are immediate predecessor operations to node 10 and they are sorted by decreasing MPH.
  • Nodes 13 and 14 both have the same MPH and a heuristic may be used to determine which node is chosen first. For purposes of this example, node 13 is chosen and the method determines if node 13 has any immediate predecessors. Node 13 does not have any immediate predecessors and is already in the ordered list; thus, the method proceeds to the next sorted immediate predecessor.
  • the climb method is then called for the next immediate predecessor, node 14.
  • Node 14 does not have any immediate predecessors and is not in the ordered list.
  • Node 14 is then added to the scheduling order list. The method determines that all the nodes in the sorted list have been processed and the current node 10 is added to scheduling order list. The process returns to the descend method. Thus far, the scheduling order list now contains the nodes 13, 14, and 10.
  • The descend method identifies a successor node for node 7 with a greatest height. Node 7 has one successor node, node 3, and the descend method is performed for node 3.
  • The climb method is then called for node 3. Node 3 is not in the scheduling order list and the process determines if node 3 has any immediate predecessor operations. Nodes 6 and 7 are immediate predecessors and the method orders the immediate predecessors by decreasing MPH. In this case, the order is node 7 (MPH of 5) followed by node 6 (MPH of 3). Node 7 is already in the scheduling order list and the method proceeds to node 6. Node 6 is not in the scheduling order list and has no immediate predecessor nodes. Thus, node 6 is added to the scheduling order list.
  • the method then adds node 3 to the scheduling order list because there are no more sorted immediate predecessor nodes in the sorted list.
  • the scheduling order list now includes nodes 13, 14, 10, 7, 6, and 3.
  • the climb method then returns to the descend method where the successor nodes of node 3 are determined. Node 2 is the only successor node and thus the node with the greatest height.
  • the descend method is performed for node 2.
  • the climb method for node 2 is then called.
  • The method determines that node 2 is not in the scheduling order list and determines any immediate predecessor nodes to node 2.
  • Nodes 3 and 4 are immediate predecessor nodes for node 2.
  • The process then orders the predecessors of node 2 by decreasing MPH. In this case, the order is node 3 (MPH of 5) followed by node 4 (MPH of 4).
  • Node 3 is already in the ordered list and the process proceeds with node 4.
  • Node 4 is not in the scheduling order list and any immediate predecessor nodes of node 4 are determined.
  • The following steps effectively add all the predecessor nodes of node 4 that are not in the scheduling order list to the scheduling order list.
  • The immediate predecessors of node 4 are determined and sorted by decreasing MPH.
  • Nodes 8 and 9 have the same MPH.
  • the order used is node 9 followed by node 8.
  • Node 9 is not in the scheduling order list and the immediate predecessors of node 9 are determined and sorted by decreasing MPH.
  • Node 11 and node 12 are immediate predecessors of node 9 and have the same MPH of 4.
  • the order used is node 11 followed by node 12.
  • node 11 has no immediate predecessors and node 11 is added to the scheduling order list.
  • Node 12 is next in the sorted list and has no immediate predecessors.
  • Node 12 is not on the scheduling order list and is added.
  • the predecessors for node 9 have now been processed and node 9 is added to the scheduling order list. (At this point, the scheduling order list includes nodes 13, 14, 10, 7, 6, 3, 11, 12, and 9).
  • The method then returns to node 4 and determines the immediate predecessors of node 4 that have not been processed.
  • Node 8 has not been processed and is not in the scheduling order list. Also, node 8 does not have any immediate predecessors and the method adds node 8 to the scheduling order list.
  • The scheduling order list now includes nodes 13, 14, 10, 7, 6, 3, 11, 12, 9, 8, and 4.
  • All the predecessors of node 2 have now been added to the scheduling order list and node 2 is added to the scheduling order list.
  • The process then returns to the descend method and successors of node 2 are determined. In this case, there is only one successor, node 1.
  • The descend method for node 1 is performed. Subsequently, the descend method performs the climb method for node 1.
  • The process determines that node 2 is the only immediate predecessor node to node 1.
  • Node 2 is in the scheduling order list and node 1 has no other immediate predecessor operations.
  • Node 1 is added to the scheduling order list because there are no more predecessor operations.
  • The scheduling order list now includes nodes 13, 14, 10, 7, 6, 3, 11, 12, 9, 8, 4, 2, and 1.
  • The descend method has now reached the bottom of the DDG and the process returns to determine if there are any nodes to order.
  • Node 5 has not been included in the scheduling order list.
  • node 5 may have been included after node 9.
  • the descend method is called for node 5 because node 5 is not yet in the scheduling order list.
  • the climb method is called by the descend method for node 5.
  • Node 9 is determined to be an immediate predecessor of node 5 and is in the scheduling order list. There are no other immediate predecessors and thus node 5 is then added to the scheduling order list.
  • the climb method returns to the descend method and no successors to node 5 are determined. Thus, the method returns and determines that there are no more nodes to order.
  • the final scheduling order list includes the nodes 13, 14, 10, 7, 6, 3, 11, 12, 9, 8, 4, 2, 1, and 5.
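A basic sanity check on the final list is that every operation appears after all operations it depends on. The check below is a sketch: the edge list `EDGES` is reconstructed from the walkthrough and is an assumption, since the figures are not reproduced in this text.

```python
# Hypothetical DDG reconstructed from the walkthrough (predecessor, successor).
EDGES = [(13, 10), (14, 10), (10, 7), (7, 3), (6, 3), (3, 2), (4, 2),
         (9, 4), (8, 4), (11, 9), (12, 9), (2, 1), (9, 5)]

# Final scheduling order list as stated in the text.
FINAL_ORDER = [13, 14, 10, 7, 6, 3, 11, 12, 9, 8, 4, 2, 1, 5]


def respects_dependencies(order, edges):
    """True if every predecessor appears before its successor in `order`."""
    position = {op: i for i, op in enumerate(order)}
    return all(position[src] < position[dst] for src, dst in edges)
```

`respects_dependencies(FINAL_ORDER, EDGES)` holds for the reconstructed graph, while the reversed list fails the check.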
  • the program code may include cycles.
  • the cycles may be broken using methods known in the art. For example, techniques based on treating cycles (also known as strongly connected components) as super-vertices may be used. See, for example, pages 37 and 38 of HP labs tech report HPL-94-115 Iterative Modulo Scheduling - Rau, B. Ramakrishna and section 2.1 of Vicki H. Allan, Reese B. Jones, Randall M. Lee, and Stephen J. Allan. Software Pipelining, in ACM Computing Surveys, 27(3):367-432, September 1995. The super-vertices and the cycle-free graph containing the super-vertices may then be processed as described above.
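Collapsing each cycle into a super-vertex can be sketched with Tarjan's strongly connected components algorithm. This is a generic illustration of the super-vertex idea, not the specific procedure of the cited references; the function name `condense` and the representation of a super-vertex as a `frozenset` of its member nodes are assumptions.

```python
from collections import defaultdict


def condense(edges):
    """Return (super_vertices, dag_edges) with each cycle collapsed."""
    succs = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        succs[src].append(dst)
        nodes.update((src, dst))

    index, low, on_stack, stack = {}, {}, set(), []
    component = {}  # node -> frozenset of the nodes in its component

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succs[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop its members off the stack.
            members = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                members.append(w)
                if w == v:
                    break
            scc = frozenset(members)
            for m in members:
                component[m] = scc

    for v in sorted(nodes):
        if v not in index:
            visit(v)

    # Keep only edges that cross between different super-vertices.
    dag_edges = {(component[s], component[d])
                 for s, d in edges if component[s] != component[d]}
    return set(component.values()), dag_edges


supers, dag = condense([(1, 2), (2, 3), (3, 2), (3, 4)])
```

In this example nodes 2 and 3 form a cycle and become one super-vertex, leaving a cycle-free graph of three vertices that the ordering method can then process.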
  • Embodiments of the present invention generally keep branches of a DDG together in a scheduling list.
  • an instruction scheduler will tend to place operations that are near each other in the scheduling order list near each other in the resulting schedule.
  • This method of scheduling reduces a distance that data travels in a lifetime of variables for the operations.
  • the goal of keeping these schedules short is also met and the most critical branches are scheduled first because the scheduling order list is generally ordered by graph height.
  • the amount of data motion is reduced and the need for data movement resources is lessened.
  • Power is saved by reducing data movement, and avoiding exhaustion of data movement resources may result in shorter schedules. Shorter schedules execute in less time and use less power.
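The clustering effect can be illustrated with a rough proxy: the total distance, measured in list positions, between each operation and the operations that consume its result. This metric is an assumption introduced for illustration and is not a measure defined in the patent. The edge list is reconstructed from the walkthrough, `CLUSTERED` is the final list from the worked example, and `BY_HEIGHT` is a plain height-sorted list derived from that reconstructed graph with ties broken by node number.

```python
# Hypothetical DDG reconstructed from the walkthrough (predecessor, successor).
EDGES = [(13, 10), (14, 10), (10, 7), (7, 3), (6, 3), (3, 2), (4, 2),
         (9, 4), (8, 4), (11, 9), (12, 9), (2, 1), (9, 5)]

# Final list from the worked example.
CLUSTERED = [13, 14, 10, 7, 6, 3, 11, 12, 9, 8, 4, 2, 1, 5]

# ASSUMPTION: conventional ordering by decreasing height on the same graph.
BY_HEIGHT = [13, 14, 10, 11, 12, 6, 7, 8, 9, 3, 4, 2, 1, 5]


def total_list_distance(order, edges):
    """Sum over all edges of how far apart producer and consumer sit."""
    position = {op: i for i, op in enumerate(order)}
    return sum(position[dst] - position[src] for src, dst in edges)


clustered_cost = total_list_distance(CLUSTERED, EDGES)
by_height_cost = total_list_distance(BY_HEIGHT, EDGES)
```

On this graph the clustered list has a total distance of 26 against 37 for the height-sorted list, which is consistent with the claim that related operations end up closer together.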
  • FIG. 5 illustrates an output of a resultant instruction schedule for the ordered operations and program code.
  • the first column represents a phase of the program code that is being executed.
  • the first row illustrates a type of functional unit that each column represents.
  • Other rows represent an instruction that is executed each clock cycle.
  • CLRACC sum3 // start the mac loop. startLoop macLpCnt, macLoop macLoop

Abstract

A method for ordering a plurality of operations that are dependent upon one another in an ordered list to be used for scheduling is provided. The method comprises identifying a current operation in the plurality of operations that is not in the ordered list. Also, it is determined if the current operation has any predecessor operations that are not in the ordered list. If the current operation has predecessor operations, predecessor operations are added to the ordered list. The current operation is then added to the ordered list and a successor operation to the current operation is identified. The successor operation is now considered the current operation and the process reiterates to determine if the current operation has any predecessor operations and continues as above. The process continues until a current operation does not have any successor operations.

Description

METHOD FOR ORDERING OPERATIONS FOR SCHEDULING BY A MODULO SCHEDULER FOR PROCESSORS WITH A LARGE NUMBER OF FUNCTION UNITS AND RECONFIGURABLE DATA PATHS
BACKGROUND OF THE INVENTION
[01] The present invention generally relates to computer processing and more specifically to a system and method for ordering operations to be scheduled by clustering related operations in an ordering list.
[02] Instruction scheduling involves assigning operations from an original sequence of operations to specific functional units at specific times in a way to make efficient use of hardware resources. The scheduled operations produce the same result as executing the operations sequentially in an original order but the operations may not be scheduled in that original order. The goal is to efficiently use hardware resources and retain the original result that would be obtained by executing the operations sequentially.
[03] Instruction scheduling operates by scheduling an instruction that is executed for each clock cycle of a processor. Each instruction includes a slot for each functional unit of the processor where an operation may be scheduled. The instruction scheduler then schedules operations for a functional unit during a clock cycle. Typically, instruction schedulers attempt to schedule operations where a minimum number of instructions are used and operations are scheduled for as many functional units as possible for each instruction used.
[04] The process of instruction scheduling orders operations in a scheduling order list, which is typically a list of operations in the order the operations should be executed if they were executed sequentially. Typically, a data dependence graph (DDG) is used to order operations to be scheduled. The DDG is arranged based on the dependencies among a group of operations for a program code. The dependencies of the DDG are represented by edges, which represent delays, i.e., the time delay required between the start of a predecessor operation and the start of a successor operation connected by the edge. Operations in the DDG are assigned heights to establish a priority value for the operation. A height indicates an overall dependency value based on the values of all the edges dependent upon a specific operation. The operation with the greatest height in the DDG becomes the highest priority operation for scheduling. Typically, the operations are ordered starting from operations with the greatest height to operations with the lowest height. Operations are then scheduled sequentially from the first ordered operation to the last ordered operation.
[05] The above approach may work when scheduling a small number of functional units for each clock cycle. However, when scheduling a large number of functional units for each clock cycle, problems result when operations are ordered sequentially from the highest priority to lowest priority. Thus, the operations that have the most operations dependent on them are scheduled first. If a DDG is depicted as having branches of related operations, the operations of greatest height are typically ordered first. Then, operations of the next greatest height are ordered next, and so on. This method of ordering operations typically orders operations from different branches in a DDG together because operations of a highest priority are usually located at the top of different branches. When operations ordered in this way are scheduled, related operations in branches are scheduled in functional units in a way that inefficiently uses computing resources. For example, the resulting schedule suffers from fragmentation, increased costs from moving data from functional unit to functional unit, higher resource use cost, and increased communication resource use.
[06] In one example, a resulting schedule using the above method results in a large amount of data movement because operations that use the same variables may not be grouped together. This results in a schedule that requires a large number of data movement resources. Thus, the processor must include a large number of data movement resources, or a schedule is produced that is inefficient in its use of time because data movement resources are exhausted and the schedule has to be extended in time to compensate.
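The interleaving of branches described in paragraph [05] is easy to see on a toy graph. The sketch below is an illustration only: the two-branch graph, the node names, unit edge weights, and alphabetical tie-breaking are all assumptions.

```python
from collections import defaultdict
from functools import lru_cache

# Two hypothetical branches, a1->a2->a3 and b1->b2->b3, feeding one result.
EDGES = [("a1", "a2"), ("a2", "a3"), ("a3", "out"),
         ("b1", "b2"), ("b2", "b3"), ("b3", "out")]


def order_by_height(edges):
    """Conventional ordering: greatest height first (unit edge weights)."""
    succs = defaultdict(list)
    for src, dst in edges:
        succs[src].append(dst)
    nodes = {n for edge in edges for n in edge}

    @lru_cache(maxsize=None)
    def height(n):
        return max((1 + height(s) for s in succs[n]), default=0)

    # Ties are broken alphabetically, which is an arbitrary choice.
    return sorted(nodes, key=lambda n: (-height(n), n))


conventional = order_by_height(EDGES)
```

The result alternates between the two branches (`a1`, `b1`, `a2`, `b2`, `a3`, `b3`, `out`), so operations that share data are separated in the list rather than kept together.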
BRIEF SUMMARY OF THE INVENTION
[07] In one embodiment, a method for ordering a plurality of operations that are dependent upon one another in an ordered list to be used for scheduling is provided. The method comprises identifying a current operation in the plurality of operations that is not in the ordered list. Also, it is determined if the current operation has any predecessor operations that are not in the ordered list. If the current operation has predecessor operations, the predecessor operations are added to the ordered list. The current operation is then added to the ordered list and a successor operation to the current operation is identified. The successor operation is now considered the current operation and the process reiterates to determine if the current operation has any predecessor operations and continues as above. The process continues until a current operation does not have any successor operations.
[08] In one embodiment, a method for ordering a plurality of operations that are dependent upon one another in an ordered list to be used for scheduling is provided. The method comprises: (a) identifying a current operation in the plurality of operations that is not in the ordered list; (b) determining if the current operation has any predecessor operations that are not in the ordered list; (c) if the current operation has predecessor operations, adding the predecessor operations to the ordered list; (d) adding the current operation to the ordered list; (e) identifying a successor operation to the current operation, wherein the successor operation is considered the current operation; and (f) performing steps (b)-(e) until a successor operation is not identified in step (e).
[09] In another embodiment, computer program products stored on tangible media that direct a processor to order operations as described below are provided.
[10] A further understanding of the nature and advantages of the invention herein may be realized by reference to the remaining portions of the specification and the attached drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[11] Fig. 1 discloses a system for ordering operations according to one embodiment;
[12] Fig. 2A illustrates an example of a DDG according to one embodiment;
[13] Fig. 2B illustrates the DDG for the operations of Fig. 2A with the height shown inside each node;
[14] Fig. 2C illustrates the DDG of Figs. 2A and 2B with a maximum predecessor height shown inside each node;
[15] Fig. 3 illustrates a method for computing a scheduling priority according to one embodiment;
[16] Fig. 4A illustrates the method for computing scheduling priority of Fig. 3 in more detail according to one embodiment;
[17] Fig. 4B illustrates the descend method according to one embodiment;
[18] Fig. 4C illustrates the climb method according to one embodiment; and
[19] Fig. 5 illustrates an output of a resultant instruction schedule for the ordered operations and program code.
DETAILED DESCRIPTION OF THE INVENTION
[20] Fig. 1 discloses a system 100 for ordering operations according to one embodiment. System 100 is a computing device that outputs an ordered list of operations that may be used to schedule operations in executable instructions. Examples of computing devices include personal computers, workstations, servers, personal digital assistants (PDAs), pocket PCs, and the like. Once the operations are scheduled, the scheduled operations may be executed in a computer processor, such as a RaPiD processor developed by the University of Washington or an adaptable execution unit developed by Quicksilver Technology, Inc. The processor may be included in a cellular phone, personal digital assistant (PDA), global positioning system (GPS) receiver, etc.
[21] In one embodiment, a computer program product including software code stored on a computer readable medium that directs system 100 as described is provided. Examples of computer readable media include RAM, disk drives, floppy disks, CD-ROMs, flash memory, read only memories (ROMs), and the like.
[22] System 100 receives operations that are to be scheduled for a program code. In one embodiment, system 100 organizes the operations into relationships that may be represented by a data dependence graph (DDG). Each node of the representation on the data dependence graph represents an operation and edges in the DDG represent dependencies between connected nodes.
[23] Fig. 2A illustrates an example of a DDG according to one embodiment. As shown, each node represents an operation and the edges represent dependencies between the operations. The numbers in the nodes represent an operation number for identification purposes.
[24] Fig. 2B illustrates the DDG for the operations of Fig. 2A with the height shown inside each node. For purposes of this example, all edges have a weight of one, but it will be understood that different edges may have different weights.
[25] Fig. 2C illustrates the DDG of Figs. 2A and 2B with a maximum predecessor height (MPH) shown inside each node. The maximum predecessor height for a node is the maximum height of any predecessor nodes for the node. A predecessor node is any node the current operation depends on. Predecessor nodes may also be immediate predecessor nodes, which are nodes that the current node depends on directly (connected by an edge). For example, the maximum height of any predecessor nodes of node 1 is five (from nodes 13 and 14) and that MPH is assigned to node 1. Also, the MPH of predecessor nodes of node 4 is four (from nodes 11 and 12) and that MPH is assigned to node 4.
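The height and MPH values may be computed as in the following Python sketch. Several points are assumptions rather than part of the disclosure: the edge set is reconstructed from the worked example later in this description (the figures are not reproduced here and may contain edges the example does not mention), all edges have a weight of one, a node with no predecessors is given its own height as its MPH (inferred from the MPH values the example states for nodes 6, 11, and 12), and the function names are illustrative.

```python
from functools import lru_cache

# Immediate successors of each node, reconstructed from the worked
# example in the text (an assumption, not a copy of Figs. 2A-2C).
SUCCESSORS = {
    13: [10], 14: [10], 10: [7], 7: [3], 6: [3], 3: [2],
    11: [9], 12: [9], 9: [4, 5], 8: [4], 4: [2], 2: [1],
    1: [], 5: [],
}

# Invert the successor map to obtain immediate predecessors.
PREDECESSORS = {node: [] for node in SUCCESSORS}
for pred, succs in SUCCESSORS.items():
    for succ in succs:
        PREDECESSORS[succ].append(pred)


@lru_cache(maxsize=None)
def height(node):
    """Longest path, counted in unit-weight edges, from node to a sink."""
    return max((1 + height(s) for s in SUCCESSORS[node]), default=0)


@lru_cache(maxsize=None)
def mph(node):
    """Maximum height over all direct and indirect predecessors of node.

    Assumption: a node with no predecessors takes its own height.
    """
    return max((mph(p) for p in PREDECESSORS[node]), default=height(node))


if __name__ == "__main__":
    for n in sorted(SUCCESSORS):
        print(n, height(n), mph(n))
```

With this edge set, the sketch gives node 1 an MPH of five and node 4 an MPH of four, as stated above. It gives node 8 an MPH of three, whereas the worked example treats nodes 8 and 9 as tied, so the reconstructed edge set may be incomplete.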
[26] In one embodiment, the values computed for the DDG are used by system 100 to compute the order of operations in the ordered list. Specifically, a descend module 102 and a climb module 104 use the values in ordering the operations.
[27] Descend module 102 implements a descend method, described below. Descend module 102 finds successor operations for a current operation according to one embodiment. For example, a successor operation is an operation that is dependent upon the execution of a current operation. In one embodiment, descend module 102 may find all successor operations that are dependent upon a current operation. Additionally, descend module 102 may find immediate successor operations, which are successor operations that are connected by edges in the DDG to the current operation.
[28] Climb module 104 implements a climb method, described below. Climb module 104 finds predecessor operations for a current operation. Predecessor operations are operations that the current operation depends upon. Climb module 104 may find all predecessor operations that the current operation depends upon. Additionally, climb module 104 may find immediate predecessor operations, which are predecessor operations that are connected by an edge in the DDG to the current operation.
[29] Using descend module 102 and climb module 104, system 100 is able to order operations that are dependent upon one another. The ordering of operations effectively keeps branches of the DDG together and roughly orders the operations by decreasing height.
[30] Fig. 3 illustrates a method for computing a scheduling priority according to one embodiment. In one embodiment, a computer implemented process orders operations in an ordered list. In step S300, a list of operations for a program code is received. In step S302, the dependencies among the operations are determined. For example, the dependencies may be represented by the DDGs of Figs. 2A, 2B, and 2C. In step S304, priority values for the operations are determined. For example, the height and MPH of each operation are determined.
[31] In step S306, one or more operations dependent upon one another are determined from the list of operations. In one embodiment, the one or more operations include an operation that has a highest priority value assigned to it. Also, the one or more operations may be operations with the highest MPH not already in the ordered list.
[32] In step S308, an operation that is not in the ordered list is identified from the one or more operations. In one embodiment, the operation is an operation with the highest priority value that is not in the ordered list.
[33] In step S310, if the identified operation has predecessor operations that are not in the ordered list, the predecessor operations are added to the list. In one embodiment, the predecessor operations include all predecessor operations the operation is dependent on. Also, the predecessor operations may be ordered from a greatest to lowest height.
[34] In step S312, once the predecessor operations are added to the list, the identified operation is added to the list. The process then returns to step S308, where the process is repeated for another operation in the one or more operations that is not already in the ordered list. In one embodiment, the operation is a successor operation to the already identified operation.
[35] Figs. 4A, 4B and 4C illustrate one embodiment of a method for computing a scheduling priority. In step S400, an instance of a DDG is constructed from operations to be ordered. In step S402, latencies for edges in the instance of the DDG are determined. Additionally, heights for each operation are computed from the latencies (step S404). In step S406, MPHs for each operation are determined.
[36] In step S408, the process determines if there are any operations to order. If there are no operations to order, the process ends at step S410.
[37] If there are operations to order, an operation N corresponding to a node in the DDG with the greatest height that is not in the ordered list is identified (step S412). It will be understood that any operation that is not yet in the ordered list may be identified, and determining an operation with the greatest height is not required. After determining the operation N with the greatest height, the process performs a descend method with operation N in step S414.
[38] Fig. 4B illustrates a flow chart of a process for the descend method according to one embodiment. Descend module 102 performs the descend method in one embodiment. In step S416, the descend method performs a climb method with operation N as the current operation.
[39] Fig. 4C illustrates a flow chart of a process for the climb method according to one embodiment. Climb module 104 performs the climb method in one embodiment. In step S430, the process determines if the current operation is in the ordered list. When the climb method is first called by the descend method, the current operation is the operation N from the descend method. However, the current operation may be determined from the climb method in step S438, described below. If the current operation is in the ordered list, the method proceeds to step S431, where the process determines if the current operation was determined from the descend method or the climb method. In the recursive nature of the method, the current operation may be the operation determined in step S416 from the descend method or a predecessor operation of the operation from step S438 of the climb method.
[40] If the current operation was determined in the descend method, the method returns to step S416 of the descend method in Fig. 4B. In this case, the climb method has been performed for step S416 of the descend method and the method proceeds to step S418.
[41] If the current operation was determined in the climb method, the method returns to step S438 of the climb method. In this case, the climb method has been performed with the predecessor operation and the method returns to step S434.
[42] If the current operation is not in the ordered list, the method proceeds to step S432, where any immediate predecessors of the current operation are sorted by decreasing MPH. In step S434, the process determines if the sorted list of predecessors is empty. The list may be empty if there are no predecessors, or if the list has been emptied by the (possibly repeated) application of step S436.
[43] If the list is empty, in step S440, the current operation is appended to the ordered list. The process then proceeds to step S431, described above. If the sorted list of predecessors is not empty in step S434, the first predecessor P in the sorted list is removed in step S436. In step S438, the climb method is recursively invoked with the removed predecessor P as the current operation. This recursive process continues until the operation N from the descend method and all of its predecessor operations are added to the ordered list.
[44] After the operation N from the descend method is added to the ordered list, the climb method has been performed for the current operation in step S416 and the process returns to the descend method. Referring back to Fig. 4B, in step S418 the process determines if the current operation has any immediate successor operations. If there are no immediate successor operations, the process returns to step S414 of Fig. 4A. In this case, the descend method has been performed in step S414 and the process returns to step S408. In one example, the process returns to step S414 when a node of the lowest priority on the DDG is reached.
[45] In step S420, the process selects an immediate successor operation of the current operation. The immediate successor operation will now be the current operation. In one embodiment, the immediate successor operation of a greatest height is selected.
[46] In step S422, the process recursively invokes the descend method with the immediate successor operation as the current operation. The process then continues as described above.
[47] An example of the above methods will now be described with reference to the DDGs in Figs. 2A-2C. The operation corresponding to the node with the greatest height that is not in the scheduling order list is determined first. In this case, the operations corresponding to nodes 13 and 14 both have the greatest height of five.
In one embodiment, heuristics may be used to determine which node is chosen first.
[48] For example, the node with the greatest number of outputs is selected first. A reason for this is that scheduling nodes with more outputs earlier generally allows more nodes to be scheduled earlier. Also, if one of the nodes in the tie is terminal in the life of a variable, that node is selected. This shortens the lifetime of the variable, possibly reducing the number of registers required. Further, the node with the lowest number of valid locations for scheduling may be selected. Nodes that have fewer valid locations are more difficult to schedule because fewer functional units 106 exist to execute the operations. For the purposes of this example, node 13 is chosen first. The descend method is then called with the operation corresponding to node 13 as the current operation. It will be understood that the above three techniques may be easily combined or used separately.
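One way the three tie-breaking techniques might be combined is as a lexicographic sort key, as in the following Python sketch. The `Candidate` record, its field names, the precedence among the three techniques, and the final fallback to the lowest node number are all assumptions made for illustration; the description above does not fix any of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical per-node facts a tie-breaker might consult."""
    node: int
    outputs: int               # number of outputs of the operation
    ends_variable_life: bool   # True if the node is terminal in a variable's life
    valid_locations: int       # number of valid locations for scheduling


def tie_break_key(c):
    """Smaller keys win. The precedence among the rules is an assumption."""
    return (
        -c.outputs,                # more outputs first
        not c.ends_variable_life,  # nodes ending a variable's life first
        c.valid_locations,         # fewer valid locations first
        c.node,                    # assumed fallback: lowest node number
    )


def pick(tied):
    """Select one node from a group of nodes tied on height or MPH."""
    return min(tied, key=tie_break_key)


if __name__ == "__main__":
    # Illustrative values only; the text does not give these attributes.
    tied = [Candidate(14, 1, False, 4), Candidate(13, 1, False, 4)]
    print(pick(tied).node)
```

When every attribute is equal, as in the usage shown, the fallback selects node 13, which matches the choice made for the purposes of this example.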
[49] The descend method first calls the climb method for the current operation corresponding to node 13. Node 13 is not yet in the scheduling order list and the climb method determines if there are any immediate predecessors to node 13. In this case, there are no immediate predecessors and the climb method adds node 13 to the end of the scheduling order list. The climb method then returns to the descend method. The scheduling order list now includes node 13.
[50] Next, the descend method determines the successor of node 13. In one embodiment, the successor with the greatest height is determined. In this case, node 10 is the only successor node for node 13. The descend method is then called for node 10.
[51] The descend method calls the climb method for node 10. The climb method determines that node 10 is not in the scheduling order list and determines if node 10 has any immediate predecessor operations. Nodes 13 and 14 are immediate predecessor operations to node 10 and they are sorted by decreasing MPH. Nodes 13 and 14 both have the same MPH and a heuristic may be used to determine which node is chosen first. For purposes of this example, node 13 is chosen and the method determines if node 13 has any immediate predecessors. Node 13 does not have any immediate predecessors and is already in the ordered list; thus, the method proceeds to the next sorted immediate predecessor.
[52] The climb method is then called for the next immediate predecessor, node 14. Node 14 does not have any immediate predecessors and is not in the ordered list. Node 14 is then added to the scheduling order list. The method determines that all the nodes in the sorted list have been processed and the current node 10 is added to the scheduling order list. The process returns to the descend method. The scheduling order list now contains nodes 13, 14, and 10.
[53] The successor node of node 10 with the greatest height is now determined. In this case, node 7 is the only successor and is chosen. The descend method is now performed for node 7 as the current operation. The process then calls the climb method for node 7. Node 7 is not in the scheduling order list and has immediate predecessors, which are ordered by decreasing MPH. An immediate predecessor of a greatest MPH is identified. In this case, the only immediate predecessor to node 7 is node 10. Node 10 is already in the scheduling order list, and the method determines if there are any more immediate predecessor nodes in the sorted list. There are no more immediate predecessor nodes in the sorted list; thus, node 7 is added to the scheduling order list. The process then returns to the descend method, where successors of node 7 are determined. The scheduling order list now includes nodes 13, 14, 10, and 7.
[54] The descend method identifies a successor node for node 7 with a greatest height. Node 7 has one successor node, node 3, and the descend method is performed for node 3.
[55] The climb method is then called for node 3. Node 3 is not in the scheduling order list and the process determines if node 3 has any immediate predecessor operations. Nodes 6 and 7 are immediate predecessors and the method orders the immediate predecessors by decreasing MPH. In this case, the order is node 7 (MPH of five) followed by node 6 (MPH of three). Node 7 is already in the scheduling order list and the method proceeds to node 6. Node 6 is not in the scheduling order list and has no immediate predecessor nodes. Thus, node 6 is added to the scheduling order list. The method then adds node 3 to the scheduling order list because there are no more immediate predecessor nodes in the sorted list. The scheduling order list now includes nodes 13, 14, 10, 7, 6, and 3.
[56] The climb method then returns to the descend method where the successor nodes of node 3 are determined. Node 2 is the only successor node and thus the node with the greatest height.
[57] The descend method is performed for node 2. The climb method for node 2 is then called. The method determines that node 2 is not in the scheduling order list and determines any immediate predecessor nodes to node 2. Nodes 3 and 4 are immediate predecessor nodes for node 2. The process then orders the predecessors of node 2 by decreasing MPH. In this case, the order is node 3 (MPH of five) followed by node 4 (MPH of four). Node 3 is already in the ordered list and the process proceeds with node 4. Node 4 is not in the scheduling order list and any immediate predecessor nodes of node 4 are determined.
[58] The following steps effectively add all the predecessor nodes of node 4 that are not in the scheduling order list to the scheduling order list.
[59] First, the immediate predecessors of node 4 are determined and sorted by decreasing MPH. Nodes 8 and 9 have the same MPH. For the purposes of this example, the order used is node 9 followed by node 8. Node 9 is not on the scheduling order list and the immediate predecessors of node 9 are determined and sorted by decreasing MPH. Node 11 and node 12 are immediate predecessors of node 9 and have the same MPH of 4. For purposes of this example, the order used is node 11 followed by node 12.
[60] It is determined that node 11 has no immediate predecessors and node 11 is added to the scheduling order list. Node 12 is next in the sorted list and has no immediate predecessors. Node 12 is not on the scheduling order list and is added. The predecessors for node 9 have now been processed and node 9 is added to the scheduling order list. At this point, the scheduling order list includes nodes 13, 14, 10, 7, 6, 3, 11, 12, and 9.
[61] The method then returns to the sorted immediate predecessor list of node 4 and determines the immediate predecessors of node 4 that have not been processed. Node 8 has not been processed and is not in the scheduling order list. Also, node 8 does not have any immediate predecessors and the method adds node 8 to the scheduling order list. All sorted immediate predecessors have now been processed for node 4 and node 4 is now added to the end of the scheduling order list. The scheduling order list now includes nodes 13, 14, 10, 7, 6, 3, 11, 12, 9, 8, and 4.
[62] All the predecessors of node 2 have now been added to the scheduling order list and node 2 is added to the scheduling order list.
[63] The process then returns to the descend method and successors of node 2 are determined. In this case, there is only one successor, node 1. The descend method for node 1 is performed. Subsequently, the descend method performs the climb method for node 1. The process determines that node 2 is the only immediate predecessor node to node 1. Node 2 is in the scheduling order list and node 1 has no other immediate predecessor operations. Thus, node 1 is added to the scheduling order list because there are no more predecessor operations. The scheduling order list now includes nodes 13, 14, 10, 7, 6, 3, 11, 12, 9, 8, 4, 2, and 1. The descend method has now reached the bottom of the DDG and the process returns to determine if there are any nodes to order.
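The climb and descend methods of Figs. 4A-4C, as applied in this example, can be sketched in Python as follows. This is a sketch under stated assumptions and not a definitive implementation: the edge set is reconstructed from the walkthrough rather than taken from the figures, every edge has a weight of one, a node without predecessors uses its own height as its MPH, ties are broken by the lowest node number in place of the heuristics discussed above, and all function names are illustrative.

```python
from functools import lru_cache

# Immediate successors, reconstructed from the walkthrough (assumption).
SUCCESSORS = {
    13: [10], 14: [10], 10: [7], 7: [3], 6: [3], 3: [2],
    11: [9], 12: [9], 9: [4, 5], 8: [4], 4: [2], 2: [1],
    1: [], 5: [],
}
PREDECESSORS = {node: [] for node in SUCCESSORS}
for pred, succs in SUCCESSORS.items():
    for succ in succs:
        PREDECESSORS[succ].append(pred)


@lru_cache(maxsize=None)
def height(node):
    """Longest unit-weight path from node to a node with no successors."""
    return max((1 + height(s) for s in SUCCESSORS[node]), default=0)


@lru_cache(maxsize=None)
def mph(node):
    """Maximum predecessor height; own height if there are no predecessors."""
    return max((mph(p) for p in PREDECESSORS[node]), default=height(node))


def climb(node, ordered):
    """Append node's unlisted predecessors, then node itself (Fig. 4C)."""
    if node in ordered:                       # S430: already ordered
        return
    # S432: immediate predecessors by decreasing MPH (ties: lowest number).
    for pred in sorted(PREDECESSORS[node], key=lambda p: (-mph(p), p)):
        climb(pred, ordered)                  # S436 and S438
    ordered.append(node)                      # S440


def descend(node, ordered):
    """Climb at node, then follow the successor of greatest height (Fig. 4B)."""
    climb(node, ordered)                      # S416
    if SUCCESSORS[node]:                      # S418
        nxt = max(SUCCESSORS[node], key=lambda s: (height(s), -s))  # S420
        descend(nxt, ordered)                 # S422


def order_operations():
    """Outer loop of Fig. 4A: repeat until every operation is ordered."""
    ordered = []
    while len(ordered) < len(SUCCESSORS):     # S408
        remaining = [n for n in SUCCESSORS if n not in ordered]
        start = max(remaining, key=lambda n: (height(n), -n))       # S412
        descend(start, ordered)               # S414
    return ordered


if __name__ == "__main__":
    print(order_operations())
```

Under these assumptions, the first pass of the outer loop yields nodes 13, 14, 10, 7, 6, 3, 11, 12, 9, 8, 4, 2, and 1, in agreement with the list above. A second pass then picks up node 5, the only node not yet ordered, and appends it as the final entry.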
[64] Node 5 has not been included in the scheduling order list. In one embodiment, node 5 may have been included after node 9. However, for this example, the descend method is called for node 5 because node 5 is not yet in the scheduling order list. The climb method is called by the descend method for node 5. Node 9 is determined to be an immediate predecessor of node 5 and is in the scheduling order list. There are no other immediate predecessors and thus node 5 is then added to the scheduling order list. The climb method returns to the descend method and no successors to node 5 are determined. Thus, the method returns and determines that there are no more nodes to order. The final scheduling order list includes nodes 13, 14, 10, 7, 6, 3, 11, 12, 9, 8, 4, 2, 1, and 5.
[65] In one embodiment, the program code may include cycles. In one embodiment, the cycles may be broken using methods known in the art. For example, techniques based on treating cycles (also known as strongly connected components) as super-vertices may be used. See, for example, pages 37 and 38 of B. Ramakrishna Rau, Iterative Modulo Scheduling, HP Labs Technical Report HPL-94-115, and section 2.1 of Vicki H. Allan, Reese B. Jones, Randall M. Lee, and Stephen J. Allan, Software Pipelining, ACM Computing Surveys, 27(3):367-432, September 1995. The super-vertices and the cycle-free graph containing the super-vertices may then be processed as described above.
[66] Embodiments of the present invention generally keep branches of a DDG together in a scheduling list. Thus, an instruction scheduler will tend to place operations that are near each other in the scheduling order list near each other in the resulting schedule. This method of scheduling reduces the distance that data travels over the lifetime of the variables for the operations.
Also, the goal of keeping schedules short is met, and the most critical branches are scheduled first, because the scheduling order list is generally ordered by graph height. Also, the amount of data motion is reduced and the need for data movement resources is lessened. Thus, power is saved by reducing data movement, and shorter schedules may result from avoiding exhaustion of data movement resources. Shorter schedules execute in less time and use less power.
[67] An example of a program code that may be input into system 100 is provided below. The code describes an implementation of a finite impulse response (FIR) filter. Upon receiving the code, system 100 orders the operations, generally keeping branches of a representation of a DDG for the code together. Fig. 5 illustrates an output of a resultant instruction schedule for the ordered operations and program code. In the table, the first column represents a phase of the program code that is being executed. The first row illustrates the type of functional unit that each column represents. Each other row represents an instruction that is executed in one clock cycle.
[68] The following is an example of the program code:
Start
// Set the addresses to point to the start of the data (use immediate constant 0 for now).
Move outOrigin, outAddr
Move inOrigin, inBase
Move inBase, inAddr
// Prime the coef history by reading a short array of zeros.
Read zeroIndex, coef
Read zeroIndex, coef
Read zeroIndex, coef
// Set the coefAddr to point to start of coef array.
Move coefOrigin, coefAddr
// start the main loop with a zero overhead loop. Run mainLpCnt iterations.
startLoop mainLpCnt, mainLoop
mainLoop
// Clear out all four accumulators
CLRACC sum0
CLRACC sum1
CLRACC sum2
CLRACC sum3
// start the mac loop.
startLoop macLpCnt, macLoop
macLoop
Read coefAddr, coef
Inc coefAddr
Read inputAddr, input
Inc inputAddr
// use the history property of coef to allow four multiplies for each read
MAC input, coef+0, sum0
MAC input, coef+1, sum1
MAC input, coef+2, sum2
MAC input, coef+3, sum3
loopNext macLoop, macLoopEnd
// now some straightline code to implement the diagonal finish of the mac loop
macLoopEnd
RSS sum0, output // RSS -> round, saturate and shift. Details TBD
Put outAddr, output
Inc outAddr
Read zeroIndex, coef // read from 0 to kick the history along and fill it with 0 for next pass
Read inputAddr, input
Inc inputAddr
MAC input, coef+1, sum1
MAC input, coef+2, sum2
MAC input, coef+3, sum3
//
RSS sum1, output
Put outAddr, output
Inc outAddr
Read zeroIndex, coef // read from 0 to kick the history along and fill it with 0 for next pass
Read inputAddr, input
Inc inputAddr
MAC input, coef+2, sum2
MAC input, coef+3, sum3
//
RSS sum2, output
Put outAddr, output
Inc outAddr
Read zeroIndex, coef // read from 0 to kick the history along and fill it with 0 for next pass
Read inputAddr, input
MAC input, coef+3, sum3
//
RSS sum3, output
Put outAddr, output
Inc outAddr
// end diagonal finish of MAC loop
// rewind coef and input to process another four elements in the input
Move coefOrigin, coefAddr
Add inBase, 4, inBase
Move inBase, inAddr
loopNext mainLoop, mainLoopEnd
mainLoopEnd
halt
[69] The above description is illustrative but not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of the disclosure. The scope of the invention should, therefore, be determined not with reference to the above description, but instead should be determined with reference to the pending claims along with their full scope or equivalents.

Claims

WHAT IS CLAIMED IS:
1. A method for ordering a plurality of operations that are dependent upon one another in an ordered list to be used for scheduling, the method comprising: (a) identifying a current operation in the plurality of operations that is not in the ordered list; (b) determining if the current operation has any predecessor operations that are not in the ordered list; (c) if the current operation has predecessor operations, adding the predecessor operations to the ordered list; (d) adding the current operation to the ordered list; (e) identifying a successor operation to the current operation, wherein the successor operation is considered the current operation; and (f) performing steps (b)-(e) until a successor operation is not identified in step (e).
2. The method of claim 1, wherein identifying the current operation in the plurality of operations comprises identifying an operation with a highest priority value not in the ordered list.
3. The method of claim 1, wherein determining if the operation has any predecessor operations that are not in the ordered list comprises determining if the operation has any immediate predecessor operations.
4. The method of claim 3, wherein determining if the operation has any immediate predecessor operations that are not in the ordered list comprises: (i) determining if the immediate predecessor operations have any immediate predecessor operations; and (ii) performing step (i) until an immediate predecessor operation does not have any immediate predecessors.
5. The method of claim 4, wherein adding the predecessor operations to the ordered list comprises adding the determined immediate predecessor operations to the ordered list in an order from a greatest priority value to a lowest priority value.
6. The method of claim 1, wherein operations in the plurality of operations have associated height values, wherein adding the predecessor operations to the ordered list comprises adding the predecessor operations from a highest to lowest height.
7. The method of claim 1, wherein adding predecessor operations to the ordered list comprises adding predecessor operations in an order starting from a highest priority.
8. A method for ordering a plurality of operations in an ordered list to be used for scheduling, wherein each operation has an associated height value and a maximum predecessor height value, the method comprising: (a) determining a current operation with a highest maximum predecessor height value and a highest height value that is not in the ordered list; (b) determining if the current operation has any predecessor operations that are not in the ordered list; (c) if the current operation has any predecessor operations, adding the predecessor operations to the ordered list; (d) adding the current operation to the ordered list; (e) identifying a successor operation to the current operation, wherein the successor operation is considered the current operation; and (f) performing steps (b)-(e) until a successor operation is not identified.
9. The method of claim 8, wherein adding predecessor operations to the ordered list comprises adding predecessor operations in an order starting from a highest priority value.
10. The method of claim 8, wherein adding the predecessor operations to the list comprises: (i) determining a predecessor operation with a greatest maximum height value and greatest height value for the current operation; (ii) adding the predecessor operation to the ordered list; and (iii) determining a successor operation to the predecessor operation, wherein the successor operation to the predecessor operation is considered the current operation; (iv) performing steps (i)-(iii) until all the predecessor operations have been added to the ordered list.
11. A method for ordering a plurality of operations in an ordered list to be used for scheduling, wherein operations in the plurality of operations are organized in groups, wherein a group of operations comprises operations dependent upon one another, the method comprising: (a) identifying a group of operations not in the ordered list, wherein the group of operations includes an operation not in the ordered list with a greatest height; (b) adding operations from the group of operations to the ordered list, wherein the added operations are grouped together in the ordered list; and (c) performing steps (a)-(b) for each group of operations.
12. The method of claim 11, wherein adding operations comprises adding operations starting from an operation of greatest height.
PCT/US2003/015167 2002-05-15 2003-05-14 Method for ordering processor operations for modulo-scheduling WO2003098434A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2003239454A AU2003239454A1 (en) 2002-05-15 2003-05-14 Method for ordering processor operations for modulo-scheduling

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14685702A 2002-05-15 2002-05-15
US10/146,857 2002-05-15

Publications (2)

Publication Number Publication Date
WO2003098434A2 (en) 2003-11-27
WO2003098434A3 (en) 2004-09-30

Family

ID=29548296

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2003/015167 WO2003098434A2 (en) 2002-05-15 2003-05-14 Method for ordering processor operations for modulo-scheduling

Country Status (2)

Country Link
AU (1) AU2003239454A1 (en)
WO (1) WO2003098434A2 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5339428A (en) * 1991-09-04 1994-08-16 Digital Equipment Corporation Compiler allocating a register to a data item used between a use and store of another data item previously allocated to the register
US5555417A (en) * 1989-11-13 1996-09-10 Hewlett-Packard Company Method and apparatus for compiling computer programs with interprocedural register allocation
US5887174A (en) * 1996-06-18 1999-03-23 International Business Machines Corporation System, method, and program product for instruction scheduling in the presence of hardware lookahead accomplished by the rescheduling of idle slots
US20020013937A1 (en) * 1999-02-17 2002-01-31 Ostanevich Alexander Y. Register economy heuristic for a cycle driven multiple issue instruction scheduler

Also Published As

Publication number Publication date
WO2003098434A3 (en) 2004-09-30
AU2003239454A8 (en) 2003-12-02
AU2003239454A1 (en) 2003-12-02

Similar Documents

Publication Publication Date Title
CA2181099C (en) Method and means for scheduling parallel processors
CN1306399C (en) Virtual machine for network processor
Chatha et al. Hardware-software partitioning and pipelined scheduling of transformative applications
CN100481007C (en) Method and system for performing link-time code optimization without additional code analysis
KR0176263B1 (en) Method and apparatus for optimizing cost-based heuristic instruction schedule
EP1124182A2 (en) Communicating instruction results in processors and compiling methods for processors
US8082546B2 (en) Job scheduling to maximize use of reusable resources and minimize resource deallocation
US20080216062A1 (en) Method for Configuring a Dependency Graph for Dynamic By-Pass Instruction Scheduling
Bozdag et al. Compaction of schedules and a two-stage approach for duplication-based DAG scheduling
JPH09282179A (en) Method and device for instruction scheduling in optimized compiler for minimizing overhead instruction
Calland et al. Circuit retiming applied to decomposed software pipelining
CN113157318B (en) GPDSP assembly transplanting optimization method and system based on countdown buffering
US20040268335A1 (en) Modulo scheduling of multiple instruction chains
US7437719B2 (en) Combinational approach for developing building blocks of DSP compiler
Ito et al. Ilp-based cost-optimal dsp synthesis with module selection and data format conversion
US20020083423A1 (en) List scheduling algorithm for a cycle-driven instruction scheduler
CN114217966A (en) Deep learning model dynamic batch processing scheduling method and system based on resource adjustment
Timmer et al. Execution interval analysis under resource constraints
EP0889405A1 (en) Software debugging method
Bakshi et al. A scheduling and pipelining algorithm for hardware/software systems
WO2003098434A2 (en) Method for ordering processor operations for modulo-scheduling
US7979860B2 (en) Method for estimating cost when placing operations within a modulo scheduler when scheduling for processors with a large number of function units or reconfigurable data paths
CN109885383B (en) Non-unit time task scheduling method with constraint condition
Wang et al. Decomposed software pipelining
CN117591242B (en) Compiling optimization method, system, storage medium and terminal based on bottom virtual machine

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: The EPO has been informed by WIPO that EP was designated in this application
122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase in:

Ref country code: JP

WWW WIPO information: withdrawn in national office

Country of ref document: JP