US20020144101A1 - Caching DAG traces - Google Patents
- Publication number
- US20020144101A1 (application US09/823,235)
- Authority
- US
- United States
- Prior art keywords
- trace
- instruction
- instructions
- criterion
- dag
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3808—Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3842—Speculative instruction execution
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
Definitions
- This invention relates to caching DAG (directed acyclic graph) traces.
- a processing system 10 includes a processor 11 , a data cache 12 , and a main memory 13 .
- an instruction cache 14 stores a program, or a sequence of instructions, for the processor to execute.
- a pipeline 15 which is a simplified example for demonstrating the executions of instructions, is included in processor 11 .
- Pipeline 15 executes the instructions and generates results to be stored in data cache 12 or main memory 13 .
- Pipeline 15 includes stages, each executing a function in parallel with, and independent of, executions in other stages.
- the first stage of pipeline 15 is generally a fetch stage 151 that fetches an instruction from instruction cache 14 .
- a decode stage 152 following fetch stage 151 decodes the fetched instruction into an opcode (e.g., Add, Subtract, and Load) and one or more operands (e.g., register 5 ).
- a register read stage 153 fetches operand values of the operation specified in the decoded instruction from registers (not shown) in pipeline 15 , and sends the instruction to functional units (such as arithmetic and logic units) at an execution stage 154 to perform the arithmetic and logic operation specified by the opcode.
- Execution stage 154 is also responsible for accessing the data cache 12 hierarchy.
- a write-back stage 155 commits the results of the operation to the registers (not shown).
- stages of pipeline 15 can operate at the same time on different instructions. For example, while decode stage 152 is decoding an instruction, fetch stage 151 can fetch another instruction from instruction cache 14 . However, the decision as to which instruction to fetch is often based on the results of previous instructions. For example, depending on the results of the instructions preceding a branch instruction (e.g., an if-statement), a branch in the sequence of instructions may or may not be taken. Fetch stage 151 may use a branch prediction algorithm to determine whether the next instruction to fetch is sequentially the next instruction in the program sequence. The next instruction in the program sequence is also called a fall-through instruction, which is fetched if the branch is predicted not taken. If the branch is predicted taken, an instruction at the branch target is fetched.
- If a branch is mispredicted, the instructions fetched by mistake between the time the branch was fetched and the time it is resolved in execution stage 154 must be removed from the pipeline. Pipeline 15 therefore incurs a long latency while it removes the partially executed instructions and retrieves the actual next instruction. This latency is usually called the branch misprediction penalty.
- the load instructions can stall the operations on pipeline 15 . Specifically, the load instructions load a data block from data cache 12 to the registers of pipeline 15 . If that data block is not in data cache 12 (i.e., a cache miss occurs), pipeline 15 may stall as a result until the data block is brought into the cache from main memory 13 or other secondary data storages.
- FIG. 1 is a diagram of a processing system for executing instructions
- FIG. 2 is a diagram of another processing system for executing instructions
- FIG. 3A is a DAG (directed acyclic graph) representing interdependent instructions
- FIG. 3B is an array storing a representation of the directed acyclic graph
- FIG. 4 illustrates a data structure of a DAG trace cache
- FIG. 5 is another directed acyclic graph with subslice classification results.
- FIG. 6 is an example of a subslice classification algorithm.
- processor 11 includes a DAG trace cache 22 for storing DAG traces.
- Each DAG trace contains information about a group of interdependent instructions among which data dependency exists.
- the information stored in DAG trace cache 22 allows pipeline 15 to dynamically predict or pre-compute the outcome of a criterion instruction in an attempt to prevent the criterion instruction from incurring long latency.
- the interdependent instructions include a criterion instruction, which is either a branch instruction or a load instruction that can incur long latency when executed by pipeline 15 .
- the interdependent instructions also include the instructions, called associated instructions, on which the criterion instruction is data dependent.
- DAG trace cache 22 can be stored as a separate entity from instruction cache 14 , or can be logically embedded as part of an instruction cache 14 and reside in any cache lines of the instruction cache that are marked as part of the trace cache.
- Processor 11 further includes a trace builder 21 that constructs the DAG traces from the instructions stored in instruction cache 14 or from the committed instructions and their execution results generated by pipeline 15 . The traces built by trace builder 21 are placed in trace cache 22 .
- a criterion instruction and its associated instructions can be represented by a directed acyclic graph (DAG) 38 , as shown in FIG. 3A.
- a DAG includes nodes representing instructions, with each node connected to one or more other nodes in the DAG. Between any two connected nodes, there is a directed line that points to either one of the nodes. The directed line represents the dependency between the two instructions represented by the two connected nodes. The instruction being pointed to (i.e., the child) is dependent on the other instruction (i.e., the parent).
- a DAG represents a portion of an operative program; therefore, a DAG is acyclic, just as an operative program contains no circular dependency.
- a DAG can be traced from a number of starting nodes to reach the node representing the criterion instruction. Therefore, given a sequence of interdependent instructions and its corresponding DAG representation, it is possible to change the order of the sequence of the instructions without changing the corresponding DAG representation.
- the last instruction of the sequence must always be the criterion instruction, because the criterion instruction depends, directly or indirectly, on all the other instructions in the sequence.
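The ordering property described above can be illustrated with a minimal Python sketch. The five-instruction DAG, the names `I1` to `I5`, and the helper functions are hypothetical and are not taken from FIG. 3A; the sketch only checks that different permutations can respect the same dependencies and that the criterion instruction depends on every other instruction.

```python
# Hypothetical DAG: each key maps to the set of its parents (the instructions
# it depends on). "I5" stands in for the criterion instruction.
parents = {
    "I1": set(),
    "I2": {"I1"},
    "I3": set(),
    "I4": {"I2", "I3"},
    "I5": {"I4"},
}

def is_valid_order(order, parents):
    """Return True if every instruction appears after all of its parents."""
    seen = set()
    for instr in order:
        if not parents[instr] <= seen:
            return False
        seen.add(instr)
    return len(seen) == len(parents)

def ancestors(node, parents):
    """Collect every instruction that `node` depends on, directly or indirectly."""
    found, stack = set(), list(parents[node])
    while stack:
        p = stack.pop()
        if p not in found:
            found.add(p)
            stack.extend(parents[p])
    return found

order_a = ["I1", "I2", "I3", "I4", "I5"]
order_b = ["I3", "I1", "I2", "I4", "I5"]  # a different permutation of the same DAG
order_c = ["I1", "I4", "I2", "I3", "I5"]  # invalid: I4 precedes its parents
```

Both `order_a` and `order_b` satisfy the dependencies and end with `I5`, because `ancestors("I5", parents)` covers all the other instructions.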
- Processor 11 typically repeatedly runs certain programs, e.g., operating system scripts, or portions of a program sequence, such as a loop.
- the criterion instructions in the programs, as well as their corresponding DAGs are also executed repeatedly.
- Because these criterion instructions depend on their preceding instructions, one would not know which instruction or data entry to fetch until these preceding instructions are executed, or resolved. Nevertheless, the information in a DAG trace, derived from previous execution results or a priori knowledge of the programs, allows pipeline 15 to dynamically predict or pre-compute execution results of a criterion instruction. Pipeline 15 is therefore able to pre-fetch future instructions if the criterion instruction is a branch, or data cache entries if the criterion instruction is a load.
- DAG Trace builder 21 employs a DAG extractor 30 to extract a DAG that represents a criterion instruction and the associated instructions. Information about these instructions is stored as a DAG trace in DAG trace cache 22 with other related information as will be described in detail below.
- the instructions of the trace are fetched from DAG trace cache 22 , decoded (if the DAG trace originally captured is not saved in a decoded format) and executed speculatively in order to predict and/or pre-compute the result of the criterion instruction, e.g., a direction of a branch, a target of a branch, or the data reference address to be accessed soon by the program sequence.
- pipeline 15 executes the original instruction sequence as if there were no predictions or pre-computation by the DAG trace execution.
- When pipeline 15 resolves the criterion instruction and the associated instructions in the original instruction sequence, the results of the speculative executions are either confirmed and adopted, or discarded.
- processor 61 which can be programmed to perform the same function as processor 11 , includes a main pipeline 65 and a daughter pipeline 66 disjoint from the main pipeline. In this embodiment, the main thread and the speculative thread are executed on the two disjoint pipelines.
- the main pipeline 65 executes instructions on the main thread while daughter pipeline 66 executes instructions on the speculative thread.
- the result of the speculative execution causes pre-fetches from either instruction cache 14 or data cache 62 , which are shared between daughter pipeline 66 and main pipeline 65 .
- the pre-fetches allow main pipeline 65 to improve performance.
- the speculative execution does not interfere with the execution of the original instruction sequence, regardless of where the speculative thread is executed.
- the speculative executions do not write any value into the registers, nor is the speculative thread allowed to store into data cache 62 or main memory 13. This ensures that the speculative thread does not interfere with the architectural state of the main thread.
- trace builder 21 and DAG extractor 30 are software procedures of a compiler, or of a runtime system (e.g., dynamic monitoring and optimization tools like Intel Vtune®, a product by Intel Corporation, Santa Clara, Calif.) that runs on processor 11.
- DAG extractor 30 identifies criterion instructions that may incur long latency.
- DAG extractor 30 then captures these criterion instructions and their respective associated instructions by sliding an analysis window of a predetermined size down the original instruction sequence. When an identified criterion instruction moves into the bottom of the window, all the instructions in the window are captured as initial candidate instructions for a DAG trace, since potentially, the criterion instruction is data dependent upon all of them.
- Trace builder 21 examines the captured instructions and discards those having no interdependency relationship with the criterion instruction. The remaining instructions are built into a DAG trace and saved in a trace file, which has a binary format and can be directly loaded by a loader into trace cache 22 . It should be noted that many choices exist with respect to whether the DAG traces are built in the same binary as the original program binary, or the DAG traces are saved as a distinct trace file that is separate from the original program binary. If the DAG traces are built in the same binary as the original program binary, the loader only needs to load one single binary consisting of both original program and the DAG traces. Otherwise, the loader is responsible for separately loading in both the original program binary and the associated trace file.
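The capture-and-discard step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Instr` record, the register names, and `extract_slices` are hypothetical, and only register data dependence is tracked (memory dependence is ignored).

```python
from collections import namedtuple

# Hypothetical instruction record: ip, destination register, source registers,
# and a flag marking long-latency branch/load candidates.
Instr = namedtuple("Instr", "ip dest srcs is_criterion")

def extract_slices(program, window_size):
    """Slide a window down `program`. When a criterion instruction reaches the
    bottom of the window, keep only the windowed instructions it depends on."""
    slices = []
    for end, instr in enumerate(program):
        if not instr.is_criterion:
            continue
        window = program[max(0, end - window_size + 1): end + 1]
        needed = set(instr.srcs)  # registers whose producers are still wanted
        kept = [instr]
        for cand in reversed(window[:-1]):
            if cand.dest in needed:
                kept.append(cand)
                needed.discard(cand.dest)  # the most recent writer satisfies the use
                needed.update(cand.srcs)
        slices.append(list(reversed(kept)))
    return slices

program = [
    Instr(0x10, "r1", (), False),       # r1 = constant
    Instr(0x14, "r9", (), False),       # unrelated to the load below
    Instr(0x18, "r2", ("r1",), False),  # r2 = f(r1)
    Instr(0x1C, "r8", ("r9",), False),  # unrelated
    Instr(0x20, "r3", ("r2",), True),   # criterion: load r3 <- [r2]
]
slices = extract_slices(program, window_size=4)
```

With a window of four, the instruction at `0x10` falls outside the window, so the captured trace holds only `0x18` and `0x20`; a window of five also captures `0x10`. This mirrors how the window size bounds the size of the resulting DAG.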
- the format or representation of instructions in a DAG trace can vary widely, from storing only individual instruction addresses to storing instructions in a pre-decoded format. If the pre-decoded format is stored, the trace instructions, once fetched, do not need to go through the decode stage of the pipeline.
- trace builder 21 and DAG extractor 30 can be implemented using hardware exclusively or a combination of software and hardware.
- the compiler can identify the criterion instructions that may incur long latency by hint bits, which mark these criterion instructions as candidates for being included in DAG trace cache 22 at runtime.
- a candidate criterion instruction can also be determined in hardware at runtime without assistance from the compiler.
- DAG extractor 30 uses a dynamic detection mechanism that determines the candidates, which have induced latency penalties in previous dynamic executions. This can be accomplished by special hardware that tracks and maintains a table of candidate criterion instructions.
- a load instruction incurring a cache miss can be placed into the table as a candidate, if the latency of the miss, measured by the time required to retrieve the data for the load instruction and place it in data cache 12 , exceeds a predetermined time threshold.
- This can also be accomplished through simple heuristics. For example, cache line fills from memory tend to serve long-latency cache misses, so any load miss serviced by such a cache line fill can be treated as a candidate criterion instruction.
- the candidates can be further filtered to select the ones incurring latency with a frequency exceeding a predetermined recurrence threshold, or the ones incurring long and uncertain latency, as determined from the mean and variance of the latency.
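A software model of such a candidate table might look like the sketch below. The class name, the method names, and the threshold values are assumptions made for illustration; the text describes special hardware and does not specify an interface.

```python
from collections import defaultdict
from statistics import mean, pvariance

class CandidateTable:
    """Hypothetical model of the table of candidate criterion instructions."""

    def __init__(self, latency_threshold, recurrence_threshold):
        self.latency_threshold = latency_threshold
        self.recurrence_threshold = recurrence_threshold
        self.latencies = defaultdict(list)  # ip -> latencies of qualifying misses

    def record_miss(self, ip, latency):
        # Only misses longer than the time threshold enter the table.
        if latency > self.latency_threshold:
            self.latencies[ip].append(latency)

    def candidates(self):
        # Keep the instructions whose long misses recur often enough.
        return sorted(ip for ip, lat in self.latencies.items()
                      if len(lat) > self.recurrence_threshold)

    def stats(self, ip):
        # Mean and variance of the latency, usable to spot long, uncertain misses.
        lat = self.latencies[ip]
        return mean(lat), pvariance(lat)

table = CandidateTable(latency_threshold=100, recurrence_threshold=2)
for ip, latency in [(0x40, 150), (0x40, 200), (0x80, 120), (0x40, 250), (0x40, 50)]:
    table.record_miss(ip, latency)
```

Here only the load at `0x40` exceeds the recurrence threshold, and the 50-cycle miss is ignored because it is under the latency threshold.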
- DAG extractor 30 may need to determine whether or not the trace already exists in trace cache 22 , which can happen when the trace is built dynamically based on previous executions. In addition, depending upon different dynamic behavior of a program sequence at different times, the DAG trace for a criterion instruction may need to be updated to reflect the behavior change in the dynamic execution of the program.
- DAG trace cache 22 physically includes a tag array and a data array. Each element in the tag array is an index that uniquely identifies a corresponding element in the data array.
- the element of the tag array stores the IP (Instruction Pointer) of the first instruction of a DAG trace, while the corresponding element of the data array stores the instructions or pointers to the instructions that form the trace.
- DAG extractor 30 uses an instruction address or instruction pointer as key to perform an associative lookup on the tag array of the trace cache.
- DAG extractor 30 can also locate a DAG trace using the IP of the last instruction of the trace, i.e., the criterion instruction of the trace.
- a DAG can represent an interdependent instruction sequence, i.e., instructions of a DAG trace, in various permutation orders. If the trace exists in a form that has a different first instruction, DAG extractor 30 would not be able to locate the trace with its first instruction. However, the last instruction of the trace is always the criterion instruction. Therefore, an ancillary tag array, also called an inverted tag array, can be used to store the IP of the criterion instruction of a DAG trace.
- the tag array and the ancillary tag array can co-exist in DAG trace cache 22 .
- the two arrays can be implemented physically as two separate arrays, or only logically separate but physically implemented in the same tag array with a bit in each entry of DAG trace cache 22 to distinguish to which of the two arrays a given tag (or the corresponding IP) belongs.
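The two-tag lookup can be modeled with a short Python sketch. The class and its dictionaries are hypothetical stand-ins for the associative tag arrays; a real trace cache would have a fixed number of entries and a replacement policy.

```python
class DagTraceCache:
    """Hypothetical model of a trace cache with a tag array and an inverted tag array."""

    def __init__(self):
        self.data = []           # data array: one instruction list per trace
        self.tags = {}           # tag array: first-instruction IP -> data index
        self.inverted_tags = {}  # ancillary tag array: criterion IP -> data index

    def insert(self, trace_ips):
        index = len(self.data)
        self.data.append(list(trace_ips))
        self.tags[trace_ips[0]] = index
        self.inverted_tags[trace_ips[-1]] = index  # last instruction is the criterion
        return index

    def lookup(self, ip):
        # Try the first-instruction tag, then fall back to the criterion tag.
        index = self.tags.get(ip)
        if index is None:
            index = self.inverted_tags.get(ip)
        return None if index is None else self.data[index]

cache = DagTraceCache()
cache.insert([0x18, 0x10, 0x20])  # stored in a permuted order; 0x20 is the criterion
```

A lookup with `0x10`, the first instruction of a differently ordered form of the same trace, misses in the tag array, while a lookup with the criterion IP `0x20` still finds the trace.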
- When DAG extractor 30 identifies a criterion instruction as a candidate and determines that the criterion instruction and its associated instructions do not exist in trace cache 22, DAG extractor 30 captures the instructions.
- DAG extractor 30 can use a hardware buffer, e.g. a FIFO (First-In-First-Out) buffer, just like the analysis window used by the compiler.
- the FIFO captures criterion instructions once any of them enters the FIFO. Once a criterion instruction enters, the criterion instruction, together with the instructions preceding it, will be captured and taken out of the FIFO.
- the captured instructions may include the ones that are not related to the criterion instruction.
- Based on observations of previous executions of the criterion instruction and data dependency analysis, DAG extractor 30 removes the unrelated instructions and sends the rest of the instructions to trace builder 21 for trace construction.
- the size of the FIFO, just like that of the analysis window, determines an upper bound for the size of the DAG representing the DAG trace.
- trace builder 21 further employs a DAG optimizer 31 to optimize the instructions captured by DAG extractor 30 .
- DAG optimizer 31 may simply assign value A directly to register C to bypass the operation that involves register B.
- DAG optimizer 31 may further pack multiple independent DAG traces that are adjacent in the original instruction sequence into a single VLIW (Very Long Instruction Word) trace for parallel executions.
- a VLIW trace requires processor 11 to have wide VLIW execution resources for executing these independent DAG traces in parallel.
- the result of the optimization is a group of interdependent instructions, which can be stored in an array.
- the array captures the complete dependency relationship of the instructions and thus the corresponding DAG.
- an array 39 stores dependency information of the instructions in DAG 38 of FIG. 3A.
- Array 39 contains a number of lines, and each line further includes a number of elements.
- Each line includes a line element 310 containing a line number of the line; an IP/Line element 311 indicates whether an IP element 312 contains the IP of an instruction in DAG 38 , or contains a line number of another instruction in the DAG.
- the line also includes two one-bit fields 313 , 314 labeled by ‘P’ and ‘C’, respectively.
- a ‘1’ in ‘P’ field 313 indicates that the next line in the array is a pointer to a parent of the node, and a ‘1’ in ‘C’ field 314 indicates that the next line is a pointer to a child of the node.
- a line with a ‘0’ in both fields indicates that the dependency relationship of the node is completely defined by the line and the lines above it, and that either another node starts from the next line, or the line is the last one for the DAG.
- the line may further include a type field 315 for storing classification results of the instruction in that line. Classifying the instructions further accelerates the execution of instructions, but requires the DAG trace of the instructions to be divided into subslices, as will be described below.
- the first line 320 of array 39 typically contains the IP of the first instruction in DAG 38 , which is also the first instruction coming out of the FIFO or the analysis window.
- the first line 320 of array 39 can alternatively store the IP of the criterion instruction, because the instructions are typically classified into subslices starting from the criterion instruction.
- trace builder 21 builds a DAG trace 40 according to information stored in array 39 .
- trace 40 includes a head of trace (HOT) 41 for storing the IP of the first instruction of the corresponding DAG; a body of trace (BOT) 42 containing decoded and scheduled instructions of the DAG or the IPs of these instructions; and an end of trace (EOT) 43 marking the end of the trace.
- HOT 41 may specify a triggering condition 410 for trace 40 .
- the triggering condition 410 of trace 40 is checked to determine if the trace should be executed.
- Triggering condition 410 can be satisfied, for example, when a predetermined triggering instruction has just been fetched and/or decoded by main pipeline 65 or by the main thread executed on pipeline 15 .
- a triggering instruction can be any instruction indicating that the criterion instruction of a DAG trace may be executed.
- the triggering instruction does not necessarily have a dependency relationship with the criterion instruction of the DAG trace, and can be inserted into the original instruction sequence by the compiler or hardware.
- the triggering condition can also include additional architectural state comparisons and/or micro-architectural state (i.e., architecturally invisible machine state) comparisons.
- a DAG trace may be triggered only when a triggering instruction is fetched and when the criterion instruction also incurs enough misses to exceed a certain threshold.
- the threshold is a form of micro-architectural state.
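A triggering condition that combines a triggering instruction with a miss threshold could be sketched as below. The `TriggerCondition` class, its field names, and the example values are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class TriggerCondition:
    """Hypothetical model of triggering condition 410 stored in the head of trace."""
    trigger_ip: int       # IP of the triggering instruction
    miss_threshold: int   # misses the criterion instruction must exceed
    enabled: bool = True  # the pipeline may turn the condition off

    def should_fire(self, fetched_ip, criterion_misses):
        # Fire only when the triggering instruction was just fetched and the
        # criterion instruction has incurred more misses than the threshold.
        return (self.enabled
                and fetched_ip == self.trigger_ip
                and criterion_misses > self.miss_threshold)

cond = TriggerCondition(trigger_ip=0x100, miss_threshold=4)
```

For example, `cond.should_fire(0x100, 5)` is true, while fetching any other instruction, or the same instruction with four or fewer recorded misses, leaves the trace untriggered.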
- HOT 41 can store the triggering instruction or the IP of the triggering instruction as an index for trace 40 .
- When pipeline 15 or main pipeline 65 encounters a triggering instruction, it performs a lookup in trace cache 22 to determine whether or not to execute trace 40.
- the lookup operation does not need to be very fast because the pipeline has not actually encountered the instructions of trace 40 in the original instruction sequence. However, it is required that trace 40 be speculatively executed before main pipeline 65 or the main thread of pipeline 15 encounters the criterion instruction.
- Pipeline 15 or main pipeline 65 may turn off the triggering conditions of trace 40. In one scenario, the triggering conditions may be turned off if the result of executing trace 40 is wrong most of the time. In some embodiments, this condition could also be used as a heuristic to indicate that the current DAG trace is obsolete, forcing the current DAG trace to be discarded and a new DAG trace to be built for the criterion instruction. In another scenario, pipeline 15 or main pipeline 65 may not have enough resources to speculatively execute trace 40.
- HOT 41 does not need to specify a triggering condition 410 if, for example, a passive run-ahead technique is used to determine when a DAG trace 40 should be triggered.
- This technique requires that a DAG trace 40 be triggered only when a stall condition on main pipeline 65 (or the main thread on pipeline 15 ) occurs.
- the stall may be caused by flushing incorrectly fetched instructions on a mispredicted path from pipeline 15, a miss in data cache 12, or thread switching on a multithreaded pipeline such as a simultaneously multithreaded (SMT) or switch-on-event multithreaded (SOEMT) pipeline.
- multiple independent DAG traces are packed into one VLIW and executed in parallel. When one of the DAG traces is triggered, all the other traces in the same VLIW are triggered as well. In some other situations, multiple data dependent DAG traces can be chained together serially. When the first trace of the chain is triggered, the other traces in the chain will also be triggered in the sequential order specified by the chain of data dependency. If the criterion instruction of one DAG trace is depended upon by multiple DAG traces leading to multiple criterion instructions, then a multi-way DAG trace can be built so that once a criterion instruction is executed, more than one consecutive dependent DAG trace can be initiated in parallel. In a DAG trace cache organization, serially chained DAG traces are represented via the next trace field in HOT 41. For a multi-way DAG trace, the HOT consists of multiple next trace fields, each leading to another DAG trace.
- HOT 41 may further include a confidence metric 413 to indicate the likelihood of correctness if trace 40 is speculatively executed.
- Confidence metric 413 is based on past executions of DAG trace 40 and is adjusted each time the DAG trace is executed according to the result of the execution, in comparison with the result of execution of the same instruction by the original program sequence. Confidence metric 413 can also be used to determine when DAG trace 40 should be replaced or rebuilt. When DAG trace cache 22 is full, and a new DAG trace needs to enter the DAG trace cache, one of the existing traces must be replaced. Confidence metric 413 indicates whether or not DAG trace 40 is least likely to provide correct or useful information. If the confidence metric 413 of DAG trace 40 indicates that the trace is least likely to be correct or useful among all the traces in DAG trace cache 22 , DAG trace 40 will be replaced with the new trace.
- DAG trace cache 22 can also use other replacement policies, e.g., the LRU (Least Recently Used), to determine if the cache entries that store trace 40 should be replaced.
- the LRU ensures that an entry that has not been recently accessed will be replaced quickly. It should be noted that when an entry is replaced, all the other entries that make reference to the replaced entry must be invalidated.
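The confidence-based replacement, together with the invalidation of entries that reference the replaced one, can be sketched as follows. The dictionary layout, the `valid` flag, and the function name are hypothetical; the text does not specify how entries record their references.

```python
def replace_least_confident(cache, new_id, new_entry):
    """Evict the trace with the lowest confidence metric, invalidate every
    remaining entry that references it, and install the new trace."""
    victim = min(cache, key=lambda tid: cache[tid]["confidence"])
    del cache[victim]
    for entry in cache.values():
        if victim in entry["next"]:
            entry["valid"] = False  # it pointed at a trace that no longer exists
    cache[new_id] = new_entry
    return victim

# Hypothetical full cache: trace "A" chains to trace "B" through its next trace field.
cache = {
    "A": {"confidence": 0.9, "next": ["B"], "valid": True},
    "B": {"confidence": 0.2, "next": [], "valid": True},
    "C": {"confidence": 0.6, "next": [], "valid": True},
}
victim = replace_least_confident(
    cache, "D", {"confidence": 0.5, "next": [], "valid": True})
```

In this example trace `B` is evicted, and trace `A` is marked invalid because its next trace field referred to `B`. An LRU policy would differ only in how the victim is chosen.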
- confidence metric 413 can also be used to determine if trace builder 21 should rebuild DAG trace 40 .
- the trace can be rebuilt using information about most recent executions of the original instruction sequence to reflect any dynamic changes.
- the result of the criterion instruction will usually be hard to predict for the first several iterations of the loop, after which, however, a steady state will be reached and accuracy of the prediction will be improved. Therefore, a DAG trace 40 is best rebuilt after the steady state is reached.
- Each DAG trace 40 can use a counter 411 to count the number of times the trace is accessed in order to ensure that the trace has reached the steady state.
- DAG trace 40 is rebuilt when its counter 411 equals a frequency threshold 412 specified for the trace, and when the confidence metric 413 is also below the confidence threshold.
- Counter 411 is reset when DAG trace 40 is rebuilt.
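The rebuild rule can be expressed as a small Python sketch. The `TraceHead` class and the `access_trace` function are hypothetical names, and the numeric values are illustrative.

```python
from dataclasses import dataclass

@dataclass
class TraceHead:
    """Hypothetical subset of the head-of-trace fields used by the rebuild rule."""
    frequency_threshold: int  # frequency threshold 412
    confidence: float         # confidence metric 413
    counter: int = 0          # counter 411

def access_trace(head, confidence_threshold):
    """Count one access; return True if the trace should be rebuilt now."""
    head.counter += 1
    if (head.counter == head.frequency_threshold
            and head.confidence < confidence_threshold):
        head.counter = 0  # the counter is reset when the trace is rebuilt
        return True
    return False

weak = TraceHead(frequency_threshold=3, confidence=0.4)
weak_results = [access_trace(weak, 0.5) for _ in range(3)]

strong = TraceHead(frequency_threshold=3, confidence=0.9)
strong_results = [access_trace(strong, 0.5) for _ in range(3)]
```

The low-confidence trace is rebuilt on its third access and its counter returns to zero, while the high-confidence trace is never rebuilt.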
- the compiler can provide hint information to specify the number of iterations required for the trace to reach steady state.
- the special hardware used by DAG extractor 30 for dynamic detection of trace candidates can also provide similar information.
- linear, nonlinear, or exponential back-off techniques can be used. For example, an exponential back-off technique requires that each time trace builder 21 rebuilds a DAG trace 40, the trace builder waits twice as long as it waited the last time, up to a predetermined time limit.
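A minimal sketch of the exponential back-off schedule, assuming a hypothetical initial wait and time limit expressed in arbitrary units:

```python
def backoff_waits(initial_wait, limit, rebuilds):
    """Return the wait before each of `rebuilds` successive rebuilds,
    doubling every time and saturating at `limit`."""
    waits, wait = [], initial_wait
    for _ in range(rebuilds):
        waits.append(wait)
        wait = min(wait * 2, limit)
    return waits

waits = backoff_waits(initial_wait=100, limit=1000, rebuilds=6)
# waits == [100, 200, 400, 800, 1000, 1000]
```

A linear back-off would replace the doubling with a fixed increment.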
- HOT 41 or EOT 43 may also include information that specifies the pointers to the next trace, e.g., in a multi-way trace as described above.
- HOT 41 or EOT 43 may also include a next trace field to specify a pointer to the next fragment of the same trace. For example, when a DAG trace is longer than the line size of the physical trace cache 22 , the trace is broken into multiple fragments each having a size equal to the cache line size. The consecutive fragments are chained by pointers that are stored in the HOT 41 or EOT 43 of the respective cache lines. From these pointers, the trace can be dynamically reconstructed at runtime by concatenating these fragments.
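Fragment chaining and runtime reconstruction can be sketched as follows. The dictionary layout and function names are hypothetical, and in this sketch the last fragment may be shorter than a cache line.

```python
def split_into_fragments(trace, line_size):
    """Break a trace into cache-line-sized fragments. Each fragment stores the
    index of the next fragment, or None for the last one."""
    chunks = [trace[i:i + line_size] for i in range(0, len(trace), line_size)]
    return [{"body": chunk, "next": i + 1 if i + 1 < len(chunks) else None}
            for i, chunk in enumerate(chunks)]

def reconstruct(lines, start=0):
    """Follow the next-fragment pointers and concatenate the fragment bodies."""
    trace, index = [], start
    while index is not None:
        trace.extend(lines[index]["body"])
        index = lines[index]["next"]
    return trace

trace = [0x10, 0x14, 0x18, 0x1C, 0x20, 0x24, 0x28]  # hypothetical 7-instruction trace
lines = split_into_fragments(trace, line_size=3)
```

The seven-instruction trace occupies three cache lines, and `reconstruct(lines)` returns the original instruction list.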
- the next trace field can be controlled by an additional prediction algorithm to speculatively correlate two DAG traces.
- the additional DAG trace prediction algorithms are similar to branch prediction.
- additional information used to gauge inter-trace correlation can be encoded in the next trace field as well, such as partial information of HOT to allow certain states to be shared or communicated between two correlated DAG traces.
- Other additional information in HOT 41 further allows processor 11 to produce predictions with improved performance or accuracy.
- the information generally includes: the criterion instruction of trace 40 , live-in (the collection of source operands) and live-out (the collection of destination operands) information, or information related to EOT 43 .
- BOT 42 stores instructions of a DAG trace or pointers to these instructions.
- instructions of a DAG trace can be grouped into subslices. The subslices can be identified by classifying the instructions of the DAG trace. After the subslices are identified, BOT 42 will store the references to the subslices. Therefore, if multiple DAG traces contain the same subslice, DAG trace cache 22 will only duplicate the references to the subslice.
- a DAG 50 includes five nodes, each representing an instruction.
- the criterion instruction is represented by IP 5.
- the five instructions are classified into two subslice types: a True Data Dependency (TDD) subslice, which includes all arithmetic and logical operations that contribute to the result of the criterion instruction, and an Address (ADDR) subslice, which includes address calculations of the loads and the stores that contribute to the result of the criterion instruction, along with the loads and the stores.
- subslice types may further include: an Existence (EX) subslice, which includes branches that affect whether or not the criterion instruction is executed; and a Control Flow (CF) subslice, which includes all the branches that affect the result of the criterion instruction, but do not affect whether or not the criterion instruction is executed.
- a subslice can include instructions that do not have a direct dependency relationship with other instructions in the same subslice. For example, IP 1, IP 2, and IP 4 in FIG. 5 belong to the same TDD subslice, where IP 4 is neither dependent on, nor depended upon by, IP 1 or IP 2.
- FIG. 6 illustrates an example of a classification algorithm 60 run by trace builder 21 for classifying the instructions of a DAG trace.
- Classification algorithm 60 initially assumes that all nodes of the corresponding DAG belong to the TDD subslice type. Then, starting from the criterion instruction, algorithm 60 dynamically classifies each of the other nodes in the DAG. If trace builder 21 runs the classification algorithm on DAG 38 of FIG. 3A, the result of the classification will enter ‘type’ field 315 of array 39 in FIG. 3B.
- Classification algorithm 60 can be implemented as a state machine in either hardware or software. Optimization techniques applied on a DAG trace, as described above, can also be performed on a subslice.
- Once the subslices are identified, they can be placed into a portion of trace cache 22 assigned to subslices, called a subslice cache 29, that contains subslice entries. Instead of caching an entire subslice in a single subslice entry, each subslice entry only stores a dependent piece of the subslice, that is, instructions that have a direct dependency relationship and belong to the same subslice type. In the corresponding DAG, a dependent piece of the subslice contains the nodes that are connected and belong to the same subslice type.
- the TDD subslice includes three instructions represented by IP 1 , IP 2 , and IP 4 .
- IP 1 and IP 2 form one dependent piece of the TDD subslice
- IP 4 is the other dependent piece of the TDD subslice.
- IP 3, the only node in the ADDR subslice, is a dependent piece of the ADDR subslice. Each of the dependent pieces is stored in a subslice entry.
- Trace builder 21 employs a hardware mechanism to copy the instructions from the array, such as the one in FIG. 3B, into subslice cache 29 in the form of subslice entries. Each subslice entry contains a dependent piece of a subslice. Before a new subslice is saved, trace builder 21 will check if the new piece of the subslice has the same first instruction as a piece of a subslice that already exists in subslice cache 29. If such a piece exists, the new piece will be discarded, and the trace containing the new piece will point to the existing piece. The new piece is discarded based on the assumption that, if two pieces have the same first instruction, the other instructions in the two pieces will also be the same.
- Trace builder 21 checks if a subslice, or a piece of a subslice, exists in subslice cache 29 by locating the IP of the first instruction of the subslice or the piece.
- Subslice cache 29, just like trace cache 22 in which the subslice cache resides, physically includes a tag array and a data array. Locating a subslice in subslice cache 29 can be performed in the same way as locating a DAG trace in trace cache 22 as described above. Other embodiments are within the scope of the following claims.
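- The splitting of subslices into dependent pieces, and the reuse of an existing piece that has the same first instruction, can be illustrated with a minimal Python sketch. It is not the patented hardware mechanism. The names `dependent_pieces` and `SubsliceCache`, the integer IPs, the `CRITERION` label, and the edges of the example DAG are all assumptions made for illustration; only the node types follow the FIG. 5 description.

```python
from collections import defaultdict


def dependent_pieces(types, edges):
    """Group nodes into connected pieces that share one subslice type.

    types: dict of instruction pointer (IP) -> subslice type, in program order.
    edges: iterable of (parent_ip, child_ip) dependency pairs.
    """
    neighbors = defaultdict(set)
    for parent, child in edges:
        if types[parent] == types[child]:  # only same-type edges join a piece
            neighbors[parent].add(child)
            neighbors[child].add(parent)

    seen, pieces = set(), []
    for ip in types:  # dicts keep insertion order, so pieces follow program order
        if ip in seen:
            continue
        seen.add(ip)
        stack, piece = [ip], []
        while stack:
            node = stack.pop()
            piece.append(node)
            for other in neighbors[node] - seen:
                seen.add(other)
                stack.append(other)
        pieces.append((types[ip], sorted(piece)))
    return pieces


class SubsliceCache:
    """Subslice entries keyed by the IP of each piece's first instruction."""

    def __init__(self):
        self.entries = {}

    def store(self, piece):
        """Return (key, stored); a piece with a known first IP is reused."""
        key = piece[0]
        if key in self.entries:
            return key, False  # new piece discarded; the trace points at the old one
        self.entries[key] = tuple(piece)
        return key, True


# Hypothetical edges for a five-node DAG; node 5 is the criterion instruction.
TYPES = {1: "TDD", 2: "TDD", 3: "ADDR", 4: "TDD", 5: "CRITERION"}
EDGES = [(1, 2), (2, 3), (3, 5), (4, 5)]
```

- With these assumed edges, `dependent_pieces(TYPES, EDGES)` separates IP 1 and IP 2 from IP 4 even though all three are TDD instructions, because no same-type edge connects them. Keying the cache on the first IP alone mirrors the stated assumption that two pieces with the same first instruction are otherwise identical.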
Abstract
A DAG trace cache includes traces, each storing information about interdependent instructions and the interdependency among the instructions. The interdependent instructions include a criterion instruction and are part of a program sequence that is stored in an instruction cache. The information is in the form of a directed acyclic graph. The interdependent instructions include the criterion instruction and instructions preceding the criterion instruction in the program sequence. The information in the DAG trace is used to accelerate executions of the instructions on a processor.
Description
- This invention relates to caching DAG (directed acyclic graph) traces.
- In the following description of the embodiments, reference is made to the accompanying drawings that show, by way of illustration, specific embodiments in which the invention may be practiced. In the drawings, like numerals describe substantially similar components throughout the several views. These embodiments are described in sufficient detail to enable people skilled in the art to practice the invention. Other embodiments may be utilized and structural, logical and electrical changes may be made without departing from the scope of the present invention. Moreover, it is to be understood that the various embodiments of the invention, although different, are not necessarily mutually exclusive. For example, a particular feature, structure, or characteristic described in one embodiment may be included within other embodiments. The following description is, therefore, not to be taken in a limiting sense.
- Referring to FIG. 1, a processing system 10 includes a processor 11, a data cache 12, and a main memory 13. Within processor 11, an instruction cache 14 stores a program, or a sequence of instructions, for the processor to execute. In the scenario illustrated in FIG. 1, a pipeline 15, which is a simplified example for demonstrating the execution of instructions, is included in processor 11. Pipeline 15 executes the instructions and generates results to be stored in data cache 12 or main memory 13. Pipeline 15 includes stages, each executing a function in parallel with, and independent of, executions in other stages. For example, the first stage of pipeline 15 is generally a fetch stage 151 that fetches an instruction from instruction cache 14. A decode stage 152 following fetch stage 151 decodes the fetched instruction into an opcode (e.g., Add, Subtract, and Load) and one or more operands (e.g., register 5). Subsequently, a register read stage 153 fetches operand values of the operation specified in the decoded instruction from registers (not shown) in pipeline 15, and sends the instruction to functional units (such as arithmetic and logic units) at an execution stage 154 to perform the arithmetic and logic operation specified by the opcode. For load and store instructions, stage 154 is also responsible for accessing the data cache 12 hierarchy. A write-back stage 155 commits the results of the operation to the registers (not shown).
- To increase execution speed, stages of pipeline 15 can operate at the same time on different instructions. For example, while decode stage 152 is decoding an instruction, fetch stage 151 can fetch another instruction from instruction cache 14. However, the decision as to which instruction to fetch is often based on the results of previous instructions. For example, depending on the results of the instructions preceding a branch instruction (e.g., an if-statement), a branch in the sequence of instructions may or may not be taken. Fetch stage 151 may use a branch prediction algorithm to determine whether the next instruction to fetch is sequentially the next instruction in the program sequence. The next instruction in the program sequence is also called a fall-through instruction, which is fetched if the branch is predicted not taken. If the branch is predicted taken, an instruction at the branch target is fetched.
- If a branch is mispredicted, the instructions fetched by mistake, between the time when the branch was fetched and the time when the branch is computed in execution stage 154, need to be removed from the pipeline. Consequently, a long latency is incurred while pipeline 15 removes the partially executed instructions and retrieves the actual next instruction. This latency is usually called the branch misprediction penalty.
- Similar to branch instructions, load instructions can stall the operations of pipeline 15. Specifically, the load instructions load a data block from data cache 12 to the registers of pipeline 15. If that data block is not in data cache 12 (i.e., a cache miss occurs), pipeline 15 may stall as a result until the data block is brought into the cache from main memory 13 or other secondary data storage.
- FIG. 1 is a diagram of a processing system for executing instructions;
- FIG. 2 is a diagram of another processing system for executing instructions;
- FIG. 3A is a DAG (directed acyclic graph) representing interdependent instructions;
- FIG. 3B is an array storing a representation of the directed acyclic graph;
- FIG. 4 illustrates a data structure of a DAG trace cache;
- FIG. 5 is another directed acyclic graph with subslice classification results; and
- FIG. 6 is an example of a subslice classification algorithm.
- Referring to FIG. 1, processor 11 includes a DAG trace cache 22 for storing DAG traces. Each DAG trace contains information about a group of interdependent instructions among which data dependency exists. The information stored in DAG trace cache 22, as will be described in detail below with reference to FIG. 4, allows pipeline 15 to dynamically predict or pre-compute an outcome of a criterion instruction in an attempt to prevent the criterion instruction from incurring long latency. The interdependent instructions include a criterion instruction, which is either a branch instruction or a load instruction that can incur long latency when executed by pipeline 15. The interdependent instructions also include the instructions, called associated instructions, on which the criterion instruction has a data dependence. For example, assume that a branch bases its outcome on the sign of a value V (e.g., if V>0, then instruction_block_1, else instruction_block_2). Prior to the branch, V is assigned the value of a register R. The criterion instruction, in this example, is the branch. The associated instructions include the instruction that assigns the value of register R to V, and, if any, the instructions that modify the value of register R before it is assigned to V. DAG trace cache 22 can be stored as a separate entity from instruction cache 14, or can be logically embedded as part of instruction cache 14 and reside in any cache lines of the instruction cache that are marked as part of the trace cache. Processor 11 further includes a trace builder 21 that constructs the DAG traces from the instructions stored in instruction cache 14 or from the committed instructions and their execution results generated by pipeline 15. The traces built by trace builder 21 are placed in trace cache 22.
- Conceptually, a criterion instruction and its associated instructions can be represented by a directed acyclic graph (DAG) 38, as shown in FIG. 3A. A DAG includes nodes representing instructions, with each node connected to one or more other nodes in the DAG. Between any two connected nodes, there is a directed line that points to one of the two nodes. The directed line represents the dependency between the two instructions represented by the two connected nodes. The instruction being pointed to (i.e., the child) is dependent on the other instruction (i.e., the parent). A DAG represents a portion of an operative program; therefore, a DAG is acyclic, just as an operative program contains no circular dependency. To illustrate the acyclic property, one may start from any node in a given DAG, and trace the direction of a connecting line that leads to another node. Repeating the same for that node, and continuing to do so for the next node, and so forth, one will never return to the starting node.
- A DAG can be traced from a number of starting nodes to reach the node representing the criterion instruction. Therefore, given a sequence of interdependent instructions and its corresponding DAG representation, it is possible to change the order of the sequence of the instructions without changing the corresponding DAG representation. The last instruction of the sequence must always be the criterion instruction, because the criterion instruction directly or indirectly depends on all the other instructions in the sequence.
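- The acyclic property, and the fact that several instruction orders share one DAG while the criterion instruction always comes last, can be checked with a short Python sketch built on the standard-library `graphlib` module (Python 3.9 or later). The four-instruction graph and the names `DEPENDS_ON`, `is_acyclic`, and `is_valid_order` are hypothetical and are not taken from FIG. 3A.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical DAG: each instruction maps to the set of instructions it depends on.
# "I4" plays the role of the criterion instruction.
DEPENDS_ON = {
    "I1": set(),
    "I2": {"I1"},
    "I3": set(),
    "I4": {"I2", "I3"},
}


def is_acyclic(depends_on):
    """True if following dependency edges can never lead back to the start."""
    try:
        list(TopologicalSorter(depends_on).static_order())
        return True
    except CycleError:
        return False


def is_valid_order(order, depends_on):
    """True if every instruction appears after all of its parents."""
    position = {ip: index for index, ip in enumerate(order)}
    return all(
        position[parent] < position[child]
        for child, parents in depends_on.items()
        for parent in parents
    )
```

- Under these assumptions both `["I1", "I2", "I3", "I4"]` and `["I3", "I1", "I2", "I4"]` are valid orders for the same graph, while any order that does not end with `"I4"` is rejected, because `"I4"` depends directly or indirectly on every other node.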
- Processor 11 typically repeatedly runs certain programs, e.g., operating system scripts, or portions of a program sequence, such as a loop. As a result, the criterion instructions in the programs, as well as their corresponding DAGs, are also executed repeatedly. As described above, because these criterion instructions depend on their preceding instructions, one would not know which instruction or data entry to fetch until the execution, or resolution, of these preceding instructions. Nevertheless, the information in a DAG trace, derived from previous execution results or a priori knowledge of the programs, allows pipeline 15 to dynamically predict or pre-compute execution results of a criterion instruction. Pipeline 15 is therefore able to pre-fetch future instructions if the criterion instruction is a branch, or data cache entries if the criterion instruction is a load.
- DAG trace builder 21 employs a DAG extractor 30 to extract a DAG that represents a criterion instruction and the associated instructions. Information about these instructions is stored as a DAG trace in DAG trace cache 22 with other related information as will be described in detail below. When a predetermined triggering condition for the DAG trace is met, the instructions of the trace are fetched from DAG trace cache 22, decoded (if the DAG trace originally captured is not saved in a decoded format) and executed speculatively in order to predict and/or pre-compute the result of the criterion instruction, e.g., a direction of a branch, a target of a branch, or the data reference address to be accessed soon by the program sequence. In parallel with the speculative executions, pipeline 15 executes the original instruction sequence as if there were no predictions or pre-computation by the DAG trace execution. When pipeline 15 resolves the criterion instruction and the associated instructions in the original instruction sequence, the results of the speculative executions will be either confirmed and adopted, or discarded.
- In one embodiment in which pipeline 15 has simultaneous multithreading (SMT) support, instructions of a DAG trace, and the instructions pre-fetched as a result of executing the trace, can be executed as a distinct speculative thread on the pipeline. The speculative thread can be executed in parallel with a main thread executing the original instruction sequence on pipeline 15. In another embodiment, the instructions executed on the speculative thread can be executed on a distinct, secondary, daughter pipeline. Referring to FIG. 2, processor 61, which can be programmed to perform the same function as processor 11, includes a main pipeline 65 and a daughter pipeline 66 disjoint from the main pipeline. In this embodiment, the main thread and the speculative thread are executed on the two disjoint pipelines. The main pipeline 65 executes instructions on the main thread while daughter pipeline 66 executes instructions on the speculative thread. The result of the speculative execution causes pre-fetches from either instruction cache 14 or data cache 62, which are shared between daughter pipeline 66 and main pipeline 65. The pre-fetches allow main pipeline 65 to improve performance. In both of the above embodiments, the speculative executions do not interfere with the execution of the original instruction sequence regardless of the location in which the speculative thread is executed. In particular, the speculative executions do not write any value into the registers, nor is the speculative thread allowed to perform any store into data cache 62 or main memory 13. This ensures that the speculative thread does not interfere with the architectural state of the main thread. In one scenario, trace builder 21 and DAG extractor 30 are software procedures of a compiler, or a runtime system (e.g. dynamic monitoring and optimization tools like Intel Vtune®, a product by Intel Corporation, Santa Clara, Calif.) that runs on processor 11. Through profiling the original instruction sequence, DAG extractor 30 identifies criterion instructions that may incur long latency. DAG extractor 30 then captures these criterion instructions and their respective associated instructions by sliding an analysis window of a predetermined size down the original instruction sequence. When an identified criterion instruction moves into the bottom of the window, all the instructions in the window are captured as initial candidate instructions for a DAG trace, since potentially, the criterion instruction is data dependent upon all of them. Trace builder 21 examines the captured instructions and discards those having no interdependency relationship with the criterion instruction. The remaining instructions are built into a DAG trace and saved in a trace file, which has a binary format and can be directly loaded by a loader into trace cache 22. It should be noted that many choices exist with respect to whether the DAG traces are built in the same binary as the original program binary, or the DAG traces are saved as a distinct trace file that is separate from the original program binary. If the DAG traces are built in the same binary as the original program binary, the loader only needs to load one single binary consisting of both the original program and the DAG traces. Otherwise, the loader is responsible for separately loading both the original program binary and the associated trace file.
- In addition, depending on implementation choices, the format or representation of instructions in a DAG trace can differ across a wide spectrum, ranging from storing only individual instruction addresses to storing a pre-decoded format of each instruction. If the pre-decoded format is stored, the trace instructions, once fetched, will not need to go through the decode stage in the pipeline.
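- The analysis-window capture described above, followed by the removal of instructions unrelated to the criterion instruction, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented extractor: instructions are modeled as hypothetical `Instr` records with one destination register and a tuple of source registers, only register dependences are tracked (memory dependences are ignored), and the names `backward_slice` and `capture` are invented for the example.

```python
from collections import deque
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    ip: int                  # instruction pointer
    dest: Optional[str]      # destination register, None for a branch
    srcs: Tuple[str, ...]    # source registers


def backward_slice(window):
    """Keep the criterion (last entry) and the earlier instructions it depends on."""
    criterion = window[-1]
    needed = set(criterion.srcs)
    kept = [criterion]
    for instr in reversed(window[:-1]):
        if instr.dest in needed:
            needed.discard(instr.dest)  # this definition satisfies the pending use
            needed.update(instr.srcs)   # its own inputs are now needed instead
            kept.append(instr)
    kept.reverse()
    return kept


def capture(stream, criterion_ips, window_size):
    """Slide a fixed-size window down the stream and slice at each criterion."""
    window = deque(maxlen=window_size)
    for instr in stream:
        window.append(instr)
        if instr.ip in criterion_ips:  # criterion reached the bottom of the window
            yield backward_slice(list(window))


# Hypothetical sequence: r1 = r2 + r3; r9 = r8 (unrelated); v = r1; branch on v.
STREAM = [
    Instr(0x10, "r1", ("r2", "r3")),
    Instr(0x14, "r9", ("r8",)),
    Instr(0x18, "v", ("r1",)),
    Instr(0x1C, None, ("v",)),
]
```

- With a window of four entries, the slice for the branch at `0x1C` keeps the instructions at `0x10` and `0x18` and drops the unrelated one at `0x14`. With a window of two entries, the instruction at `0x10` is never seen, which shows how the window size bounds the size of the resulting DAG.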
- Alternatively, trace builder 21 and DAG extractor 30 can be implemented using hardware exclusively or a combination of software and hardware. In a more elaborate alternative scenario, the compiler can identify the criterion instructions that may incur long latency by hint bits, which mark these criterion instructions as candidates for being included in DAG trace cache 22 at runtime. In another scenario, a candidate criterion instruction can also be determined in hardware at runtime without assistance from the compiler. DAG extractor 30 uses a dynamic detection mechanism that determines the candidates, which have induced latency penalties in previous dynamic executions. This can be accomplished by special hardware that tracks and maintains a table of candidate criterion instructions. For example, a load instruction incurring a cache miss can be placed into the table as a candidate, if the latency of the miss, measured by the time required to retrieve the data for the load instruction and place it in data cache 12, exceeds a predetermined time threshold. This can also be accomplished through simple heuristics: for example, because a cache line fill from memory tends to serve long-latency cache misses, any load miss serviced by such a cache line fill can be treated as a candidate criterion instruction. The candidates can be further filtered to select the ones incurring latency with a frequency exceeding a predetermined recurrence threshold, or the ones incurring long and uncertain latency, as determined from the mean and variance of the latency. Once a criterion instruction is identified as a candidate for building a DAG trace, DAG extractor 30 may need to determine whether or not the trace already exists in trace cache 22, which can happen when the trace is built dynamically based on previous executions. In addition, depending upon different dynamic behavior of a program sequence at different times, the DAG trace for a criterion instruction may need to be updated to reflect the behavior change in the dynamic execution of the program.
- Like traditional cache structures, DAG trace cache 22 physically includes a tag array and a data array. Each element in the tag array is an index that uniquely identifies a corresponding element in the data array. The element of the tag array stores the IP (Instruction Pointer) of the first instruction of a DAG trace, while the corresponding element of the data array stores the instructions or pointers to the instructions that form the trace. To locate a DAG trace in trace cache 22, DAG extractor 30 uses an instruction address or instruction pointer as a key to perform an associative lookup on the tag array of the trace cache.
- DAG extractor 30 can also locate a DAG trace using the IP of the last instruction of the trace, i.e., the criterion instruction of the trace. As described above, a DAG can represent an interdependent instruction sequence, i.e., instructions of a DAG trace, in various permutation orders. If the trace exists in a form that has a different first instruction, DAG extractor 30 would not be able to locate the trace with its first instruction. However, the last instruction of the trace is always the criterion instruction. Therefore, an ancillary tag array, also called an inverted tag array, can be used to store the IP of the criterion instruction of a DAG trace. The tag array and the ancillary tag array can co-exist in DAG trace cache 22. The two arrays can be implemented physically as two separate arrays, or only logically separate but physically implemented in the same tag array with a bit in each entry of DAG trace cache 22 to distinguish to which of the two arrays a given tag (or the corresponding IP) belongs. Once DAG extractor 30 identifies a criterion instruction as a candidate, and determines that the criterion instruction and its associated instructions do not exist in trace cache 22, DAG extractor 30 captures the instructions. DAG extractor 30 can use a hardware buffer, e.g. a FIFO (First-In-First-Out) buffer, just like the analysis window used by the compiler. The FIFO captures criterion instructions once any of them enters the FIFO. Once a criterion instruction enters, the criterion instruction, together with the instructions preceding it, will be captured and taken out of the FIFO. The captured instructions may include ones that are not related to the criterion instruction. Based on observations of previous executions of the criterion instruction and data dependency analysis, DAG extractor 30 removes the unrelated instructions and sends the rest of the instructions to trace builder 21 for trace construction. The size of the FIFO, just like that of the analysis window, determines an upper bound for the size of the DAG representing the DAG trace.
- In one embodiment, trace builder 21 further employs a DAG optimizer 31 to optimize the instructions captured by DAG extractor 30. One approach is to bypass redundant instructions in the DAG. For example, assume that a value A is assigned to a register B, and the value of register B is then assigned to another register C. DAG optimizer 31 may simply assign value A directly to register C to bypass the operation that involves register B. DAG optimizer 31 may further pack multiple independent DAG traces that are adjacent in the original instruction sequence into a single VLIW (Very Long Instruction Word) trace for parallel executions. The VLIW trace, however, requires processor 11 to have wide VLIW execution resources for executing these independent DAG traces in parallel.
- The result of the optimization is a group of interdependent instructions, which can be stored in an array. The array captures the complete dependency relationship of the instructions and thus the corresponding DAG. Referring to FIG. 3B, an array 39 stores dependency information of the instructions in DAG 38 of FIG. 3A. Array 39 contains a number of lines, and each line further includes a number of elements. Each line includes a line element 310 containing the line number of the line, and an IP/Line element 311 indicating whether an IP element 312 contains the IP of an instruction in DAG 38, or contains a line number of another instruction in the DAG. The line also includes two one-bit fields: a ‘1’ in field 313 indicates that the next line in the array is a pointer to a parent of the node, and a ‘1’ in ‘C’ field 314 indicates that the next line is a pointer to a child of the node. A line with a ‘0’ in both fields, however, indicates that the dependency relationship of the node is completely defined by the line and the lines above it, and that either another node starts from the next line, or the line is the last one for the DAG. The line may further include a type field 315 for storing classification results of the instruction in that line. Classifying the instructions further accelerates the execution of instructions, but requires the DAG trace of the instructions to be divided into subslices, as will be described below.
- The first line 320 of array 39 typically contains the IP of the first instruction in DAG 38, which is also the first instruction coming out of the FIFO or the analysis window. However, the first line 320 of array 39 can alternatively store the IP of the criterion instruction, because the instructions are typically classified into subslices starting from the criterion instruction.
- Referring to FIG. 4, trace builder 21 builds a DAG trace 40 according to information stored in array 39. In general, trace 40 includes a head of trace (HOT) 41 for storing the IP of the first instruction of the corresponding DAG; a body of trace (BOT) 42 containing decoded and scheduled instructions of the DAG or the IPs of these instructions; and an end of trace (EOT) 43 marking the end of the trace. In some embodiments where the end of a DAG trace is specified in HOT 41, EOT 43 may not exist.
- HOT 41 may specify a triggering condition 410 for trace 40. Before trace 40 is fetched from trace cache 22, the triggering condition 410 of trace 40 is checked to determine if the trace should be executed. Triggering condition 410 can be satisfied, for example, when a predetermined triggering instruction has just been fetched and/or decoded by main pipeline 65 or by the main thread executed on pipeline 15. In general, a triggering instruction can be any instruction indicating that the criterion instruction of a DAG trace may be executed. The triggering instruction does not necessarily have a dependency relationship with the criterion instruction of the DAG trace, and can be inserted into the original instruction sequence by the compiler or hardware. In addition to the IP of a triggering instruction, the triggering condition can also include additional architectural state comparisons and/or micro-architectural state (i.e., architecturally invisible machine state) comparisons. For example, a DAG trace may be triggered only when a triggering instruction is fetched and when the criterion instruction also incurs enough misses to exceed a certain threshold. The threshold is a form of micro-architectural state. HOT 41 can store the triggering instruction or the IP of the triggering instruction as an index for trace 40. When pipeline 15 or main pipeline 65 encounters a triggering instruction, it does a lookup in trace cache 22 to determine whether or not to execute trace 40. The lookup operation does not need to be very fast because the pipeline has not actually encountered the instructions of trace 40 in the original instruction sequence. However, it is required that trace 40 be speculatively executed before main pipeline 65 or the main thread of pipeline 15 encounters the criterion instruction. Pipeline 15 or main pipeline 65 may turn off the triggering conditions of trace 40. In one scenario, the triggering conditions may be turned off if the result of executing trace 40 is wrong most of the time. In some embodiments, this condition could also be used as a heuristic to indicate obsolescence of the current DAG trace and to force discarding the current DAG trace and building a new DAG trace for the criterion instruction. In another scenario, pipeline 15 or main pipeline 65 may not have enough resources to speculatively execute trace 40.
- HOT 41 does not need to specify a triggering condition 410 if, for example, a passive run-ahead technique is used to determine when a DAG trace 40 should be triggered. This technique requires that a DAG trace 40 be triggered only when a stall condition on main pipeline 65 (or the main thread on pipeline 15) occurs. The stall may be caused by flushing incorrectly fetched instructions on a mispredicted path from pipeline 15, a miss in data cache 12, or thread switching on a multithreaded pipeline, such as a simultaneous multithreading (SMT) or switch-on-event multithreading (SOEMT) pipeline. The DAG trace 40 to be triggered is generally the DAG trace that closely follows the instruction incurring the stall.
- In another embodiment, multiple independent DAG traces are packed into one VLIW and executed in parallel. When one of the DAG traces is triggered, all the other traces in the same VLIW are triggered as well. In some other situations, multiple data dependent DAG traces can be chained together serially. When the first trace of the chain is triggered, the other traces in the chain will also be triggered in a sequential order as specified in the chain of data dependency. If the criterion instruction of one DAG trace is depended upon by multiple DAG traces leading to multiple criterion instructions, then a multi-way DAG trace can be built so that once a criterion instruction is executed, more than one consecutive dependent DAG trace can be initiated in parallel. In a DAG trace cache organization, serially chained DAG traces are represented via the next trace field in HOT 41. For a multi-way DAG trace, its HOT consists of multiple next trace fields, each leading to another DAG trace.
- HOT 41 may further include a confidence metric 413 to indicate the likelihood of correctness if trace 40 is speculatively executed. Confidence metric 413 is based on past executions of DAG trace 40 and is adjusted each time the DAG trace is executed according to the result of the execution, in comparison with the result of execution of the same instruction by the original program sequence. Confidence metric 413 can also be used to determine when DAG trace 40 should be replaced or rebuilt. When DAG trace cache 22 is full, and a new DAG trace needs to enter the DAG trace cache, one of the existing traces must be replaced. Confidence metric 413 indicates whether or not DAG trace 40 is least likely to provide correct or useful information. If the confidence metric 413 of DAG trace 40 indicates that the trace is least likely to be correct or useful among all the traces in DAG trace cache 22, DAG trace 40 will be replaced with the new trace.
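- One possible way to maintain such a confidence metric and to pick a replacement victim is sketched below in Python. The description does not specify how the metric is adjusted, so the 2-bit saturating counter, the constant `CONF_MAX`, and the names `Trace`, `update_confidence`, and `choose_victim` are assumptions made purely for illustration.

```python
from dataclasses import dataclass

CONF_MAX = 3  # assumed 2-bit saturating counter, range 0..3


@dataclass
class Trace:
    criterion_ip: int
    confidence: int = 2  # assumed initial value


def update_confidence(trace, speculative_result, actual_result):
    """Raise confidence on agreement with the original sequence, lower it otherwise."""
    if speculative_result == actual_result:
        trace.confidence = min(CONF_MAX, trace.confidence + 1)
    else:
        trace.confidence = max(0, trace.confidence - 1)


def choose_victim(traces):
    """Return the trace least likely to be correct, i.e., the lowest confidence."""
    return min(traces, key=lambda trace: trace.confidence)
```

- A saturating counter is chosen here only because it is a familiar device from branch prediction; any metric that can be compared across traces would serve the replacement decision equally well.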
DAG trace cache 22 can also use other replacement policies, e.g., LRU (Least Recently Used), to determine whether the cache entries that store trace 40 should be replaced. LRU ensures that an entry that has not been recently accessed will be replaced quickly. It should be noted that when an entry is replaced, all the other entries that make reference to the replaced entry must be invalidated.

As described above, confidence metric 413 can also be used to determine if trace builder 21 should rebuild DAG trace 40. When confidence metric 413 falls below a pre-selected confidence threshold, the trace can be rebuilt using information about the most recent executions of the original instruction sequence to reflect any dynamic changes. When the criterion instruction of DAG trace 40 resides in a loop of only a few instructions, the result of the criterion instruction will usually be hard to predict for the first several iterations of the loop, after which, however, a steady state will be reached and the accuracy of the prediction will improve. Therefore, a DAG trace 40 is best rebuilt after the steady state is reached. Each DAG trace 40 can use a counter 411 to count the number of times the trace is accessed in order to ensure that the trace has reached the steady state. DAG trace 40 is rebuilt when its counter 411 equals a frequency threshold 412 specified for the trace, and when the confidence metric 413 is also below the confidence threshold. Counter 411 is reset when DAG trace 40 is rebuilt. There are several approaches for determining the frequency threshold 412 for a DAG trace 40. The compiler can provide hint information to specify the number of iterations required for the trace to reach steady state. The special hardware used by DAG extractor 30 for dynamic detection of trace candidates can also provide similar information. Alternatively, linear, nonlinear, or exponential back-off techniques can be used. For example, an exponential back-off technique requires that each time trace builder 21 rebuilds a DAG trace 40, the trace builder waits twice as long as it waited the last time, up to a predetermined time limit.

HOT 41 or EOT 43 may also include information that specifies the pointers to the next trace, e.g., in a multi-way trace as described above.

HOT 41 or EOT 43 may also include a next trace field to specify a pointer to the next fragment of the same trace. For example, when a DAG trace is longer than the line size of the physical trace cache 22, the trace is broken into multiple fragments each having a size equal to the cache line size. The consecutive fragments are chained by pointers that are stored in the HOT 41 or EOT 43 of the respective cache lines. From these pointers, the trace can be dynamically reconstructed at runtime by concatenating these fragments.

In some embodiments, the next trace field can be controlled by an additional prediction algorithm to speculatively correlate two DAG traces. The additional DAG trace prediction algorithms are similar to branch prediction. In a more sophisticated embodiment, additional information used to gauge inter-trace correlation can be encoded in the next trace field as well, such as partial information of the HOT to allow certain states to be shared or communicated between two correlated DAG traces. Other additional information in HOT 41 further allows processor 11 to produce predictions with improved performance or accuracy. The information generally includes: the criterion instruction of trace 40, live-in (the collection of source operands) and live-out (the collection of destination operands) information, or information related to EOT 43.
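By way of illustration only, the rebuild rule described earlier for counter 411, frequency threshold 412, and confidence metric 413, combined with exponential back-off, may be sketched in software as follows. The class name `TraceState`, the function `on_access`, the numeric threshold values, and the choice to measure the back-off in accesses rather than in time are all assumptions made for this sketch and are not taken from the description. The sketch also rebuilds at the first access at or beyond the frequency threshold at which confidence is low, which is one possible reading of the rule.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.5  # assumed pre-selected confidence threshold
MAX_FREQ_THRESHOLD = 64     # assumed cap standing in for the predetermined limit


@dataclass
class TraceState:
    """Per-trace bookkeeping; field names are illustrative only."""
    counter: int = 0          # plays the role of counter 411
    freq_threshold: int = 4   # plays the role of frequency threshold 412
    confidence: float = 1.0   # plays the role of confidence metric 413
    rebuilds: int = 0         # number of rebuilds so far


def on_access(trace: TraceState, confidence: float) -> bool:
    """Record one access to the trace; return True if it is rebuilt."""
    trace.counter += 1
    trace.confidence = confidence
    # Rebuild only once enough accesses have accumulated (steady state)
    # and the confidence is below the confidence threshold.
    steady = trace.counter >= trace.freq_threshold
    if not (steady and trace.confidence < CONFIDENCE_THRESHOLD):
        return False
    trace.rebuilds += 1
    trace.counter = 0  # the counter is reset on rebuild
    # Exponential back-off: wait twice as long next time, up to the cap.
    trace.freq_threshold = min(2 * trace.freq_threshold, MAX_FREQ_THRESHOLD)
    return True
```

In this sketch a trace with an initial threshold of 2 would be rebuilt on its second low-confidence access, after which four further accesses are required before the next rebuild.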
BOT 42 stores instructions of a DAG trace or pointers to these instructions. To compact the size of BOT 42, instructions of a DAG trace can be grouped into subslices. The subslices can be identified by classifying the instructions of the DAG trace. After the subslices are identified, BOT 42 will store the references to the subslices. Therefore, if multiple DAG traces contain the same subslice, DAG trace cache 22 will only duplicate the references to the subslice.

Referring to FIG. 5, a DAG 50 includes five nodes, each representing an instruction. The criterion instruction is represented by IP5. The five instructions are classified into two subslice types: a True Data Dependency (TDD) subslice, which includes all arithmetic and logical operations that contribute to the result of the criterion instruction, and an Address (ADDR) subslice, which includes address calculations of the loads and the stores that contribute to the result of the criterion instruction, along with the loads and the stores. Additionally, subslice types may further include: an Existence (EX) subslice, which includes branches that affect whether or not the criterion instruction is executed; and a Control Flow (CF) subslice, which includes all the branches that affect the result of the criterion instruction, but do not affect whether or not the criterion instruction is executed. A subslice can include instructions that do not have a direct dependency relationship with other instructions in the same subslice. For example, IP1, IP2, and IP4 in FIG. 5 belong to the same TDD subslice, where IP4 is neither dependent on, nor depended on by, IP1 or IP2.

FIG. 6 illustrates an example of a classification algorithm 60 run by trace builder 21 for classifying the instructions of a DAG trace. Classification algorithm 60 initially assumes that all nodes of the corresponding DAG belong to the TDD subslice type. Then, starting from the criterion instruction, algorithm 60 dynamically classifies each of the other nodes in the DAG. If trace builder 21 runs the classification algorithm on DAG 38 of FIG. 3A, the result of the classification will be entered in the ‘type’ field 315 of array 39 in FIG. 3B.
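As a non-authoritative illustration of the classification just described, the following sketch starts with every node typed as TDD and then walks backward from the criterion instruction, retyping loads, stores, and the instructions that feed them as ADDR. It is not a reproduction of FIG. 6. The DAG edges, the `kind` labels, and the function name `classify` are hypothetical, only the TDD and ADDR types are handled, and every producer of a load or store is treated as an address calculation, which is a simplification.

```python
from collections import deque

# Hypothetical DAG loosely modeled on FIG. 5: node -> (kind, producers).
# The edges and the "alu"/"load" labels are assumptions for this sketch.
DAG = {
    "IP1": ("alu", []),
    "IP2": ("alu", ["IP1"]),
    "IP3": ("load", []),
    "IP4": ("alu", []),
    "IP5": ("alu", ["IP2", "IP3", "IP4"]),  # criterion instruction
}


def classify(dag, criterion):
    """Return {node: subslice type} for every node except the criterion."""
    # Step 1: assume every node belongs to the TDD subslice type.
    types = {node: "TDD" for node in dag if node != criterion}
    # Step 2: walk backward from the criterion along producer edges,
    # remembering whether the current path feeds a load or store.
    seen = set()
    queue = deque([(criterion, False)])
    while queue:
        node, feeds_address = queue.popleft()
        if (node, feeds_address) in seen:
            continue
        seen.add((node, feeds_address))
        kind, producers = dag[node]
        is_mem = kind in ("load", "store")
        if node != criterion and (is_mem or feeds_address):
            types[node] = "ADDR"  # ADDR overrides the TDD default
        for producer in producers:
            queue.append((producer, is_mem or feeds_address))
    return types
```

For the hypothetical DAG above, `classify(DAG, "IP5")` leaves IP1, IP2, and IP4 as TDD and types IP3 as ADDR, which matches the grouping given for FIG. 5.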
Classification algorithm 60 can be implemented as a state machine in either hardware or software. Optimization techniques applied on a DAG trace, as described above, can also be performed on a subslice.

Once the subslices are identified, they can be placed into a portion of trace cache 22 assigned to subslices, called a subslice cache 29, that contains subslice entries. Instead of caching an entire subslice in a single subslice entry, each subslice entry only stores a dependent piece of the subslice, that is, instructions that have a direct dependency relationship and belong to the same subslice type. In the corresponding DAG, a dependent piece of the subslice contains the nodes that are connected and belong to the same subslice type.

Referring again to the example of FIG. 5, the TDD subslice includes three instructions represented by IP1, IP2, and IP4. IP1 and IP2 form one dependent piece of the TDD subslice, and IP4 is the other dependent piece of the TDD subslice. IP3, the only node in the ADDR subslice, is a dependent piece of the ADDR subslice. Each of the dependent pieces is stored in a subslice entry.
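The notion of a dependent piece, i.e., nodes that are connected in the DAG and share a subslice type, can be illustrated as a connected-components computation. The sketch below is one possible software rendering under stated assumptions: the function name `dependent_pieces`, the edge list, and the use of a union-find structure are choices made for this example and are not described in the text.

```python
def dependent_pieces(types, edges):
    """Group classified nodes into dependent pieces.

    types: {node: subslice type}; the criterion instruction is not included.
    edges: (producer, consumer) pairs of the DAG.
    Returns a sorted list of (subslice type, sorted node list) tuples.
    """
    parent = {node: node for node in types}

    def find(node):
        # Follow parent links to the representative, compressing the path.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for src, dst in edges:
        # Only an edge joining two nodes of the same type links a piece.
        if src in types and dst in types and types[src] == types[dst]:
            parent[find(src)] = find(dst)

    pieces = {}
    for node in types:
        pieces.setdefault(find(node), []).append(node)
    return sorted((types[p[0]], sorted(p)) for p in pieces.values())


# Hypothetical edges for the five-node example; IP5 is the criterion.
TYPES = {"IP1": "TDD", "IP2": "TDD", "IP3": "ADDR", "IP4": "TDD"}
EDGES = [("IP1", "IP2"), ("IP2", "IP5"), ("IP3", "IP5"), ("IP4", "IP5")]
PIECES = dependent_pieces(TYPES, EDGES)
```

With these assumed edges, `PIECES` contains one ADDR piece holding IP3 and two TDD pieces, one holding IP1 and IP2 and one holding IP4, so three subslice entries would be needed.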
Trace builder 21 employs a hardware mechanism to copy the instructions from the array, such as the one in FIG. 3B, into subslice cache 29 in the form of subslice entries. Each subslice entry contains a dependent piece of a subslice. Before a new subslice is saved, trace builder 21 will check if the new piece of the subslice has the same first instruction as a piece of a subslice that already exists in subslice cache 29. If such a piece exists, the new piece will be discarded, and the trace containing the new piece will point to the existing piece. The new piece is discarded based on the assumption that, if two pieces have the same first instruction, the other instructions in the two pieces will also be the same. The assumption, however, may not always be true. The other instructions in the existing piece may be different from those of the new piece, thus causing the prediction of the corresponding trace to be incorrect. However, the incorrectness of the prediction will not affect the correctness of the overall program, because the speculative execution of DAG traces will not interfere with architectural states of the main thread. Equally significantly, the criterion instruction of the trace and all the associated instructions will eventually be executed in the main thread on pipeline 15, or on main pipeline 65. After these instructions are executed, the instructions pre-fetched as a result of the incorrect prediction will be discarded from the speculative thread on pipeline 15, or from daughter pipeline 66.
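The first-instruction check described above can be pictured with a small software model. This is only a sketch: the description refers to a hardware mechanism, and the class name `SubsliceCache`, its methods, and the use of a dictionary keyed by instruction pointer are assumptions introduced here.

```python
class SubsliceCache:
    """Toy model of a subslice cache keyed by first-instruction IP."""

    def __init__(self):
        # first-instruction IP -> tuple of instruction IPs in the piece
        self._entries = {}

    def store(self, piece):
        """Try to store a dependent piece.

        Returns (key, stored). When a piece with the same first
        instruction already exists, the new piece is discarded and the
        caller is handed the key of the existing piece instead.
        """
        key = piece[0]
        if key in self._entries:
            return key, False
        self._entries[key] = tuple(piece)
        return key, True

    def lookup(self, first_ip):
        """Return the cached piece for this first instruction, or None."""
        return self._entries.get(first_ip)
```

In this model, storing a second piece that begins at the same IP but continues differently returns the key of the first piece, which reproduces the aliasing case that the description tolerates because mispredictions in the speculative thread do not affect architectural state.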
Trace builder 21 checks if a subslice, or a piece of a subslice, exists in subslice cache 29 by locating the IP of the first instruction of the subslice or the piece. Subslice cache 29, just like trace cache 22 where the subslice cache resides, physically includes a tag array and a data array. Locating a subslice in subslice cache 29 can be performed in the same way as locating a DAG trace in trace cache 22 as described above.

Other embodiments are within the scope of the following claims.
Claims (31)
1. An apparatus comprising:
a cache of traces, each trace including information about interdependent instructions among which data dependency exists, the interdependent instructions including a criterion instruction that is part of a program sequence.
2. The apparatus of claim 1 wherein the information comprises a directed acyclic graph.
3. The apparatus of claim 1 wherein the trace includes pointers to the interdependent instructions.
4. The apparatus of claim 1 wherein the trace includes the interdependent instructions.
5. The apparatus of claim 1 wherein the interdependent instructions include the criterion instruction and instructions preceding the criterion instruction in the program sequence.
6. The apparatus of claim 1 wherein the interdependent instructions are classified into subslice types, the trace including a pointer to each subslice that is formed by each type of the interdependent instructions.
7. The apparatus of claim 6 wherein each subslice is stored as dependent pieces.
8. The apparatus of claim 1 wherein the information includes a triggering condition of the trace, the interdependent instructions of the trace being executed when the triggering condition is met.
9. The apparatus of claim 8 wherein the triggering condition includes a triggering instruction in the program sequence, the triggering condition being based on evaluation of an architectural state.
10. The apparatus of claim 8 wherein the triggering condition includes a triggering instruction in the program sequence, the triggering condition being based on evaluation of a micro-architectural state.
11. The apparatus of claim 1 wherein the information further includes a confidence metric of the trace that predicts the likelihood of producing a correct result from executing the trace.
12. The apparatus of claim 11 wherein the confidence metric of the trace indicates whether or not the trace should be replaced by a new trace storing information about different instructions.
13. The apparatus of claim 11 wherein the confidence metric of the trace indicates whether or not the trace should be rebuilt using new information about the criterion instruction that arrives at the trace cache.
14. The apparatus of claim 11 further comprising a counter having a counter value that indicates the number of times the trace has been executed, the counter value, when exceeding a frequency threshold of the trace, triggering the trace to be rebuilt.
15. The apparatus of claim 1 wherein traces that are independent of each other and adjacent in the program sequence are grouped into a very-long-instruction-word for parallel executions.
16. The apparatus of claim 1 wherein traces that are data dependent of each other are chained together for serial executions.
17. The apparatus of claim 1 further comprising an instruction pointer that indexes the trace, the instruction pointer pointing to a first instruction or a last instruction of the interdependent instructions.
18. The apparatus of claim 1 further comprising:
a main pipeline executing the program sequence; and at least one secondary pipeline disjoint from the main pipeline executing the interdependent instructions.
19. The apparatus of claim 1 wherein the interdependent instructions are executed by a secondary thread on a pipeline, and the program sequence is executed by a main thread on the same pipeline.
20. A method comprising:
identifying a criterion instruction incurring latency in a program sequence;
capturing the criterion instruction and instructions preceding the criterion instruction in the program sequence, the preceding instructions and the criterion instruction being interdependent; and
storing a trace in a trace cache, the trace including information about the criterion instruction and the preceding instructions.
21. The method of claim 20 wherein the information is in a form of a directed acyclic graph.
22. The method of claim 20 wherein the latency includes a long latency that exceeds a predetermined time threshold, a frequent latency that exceeds a predetermined recurrence threshold, or a long and uncertain latency that exceeds a mean threshold and a variance threshold.
23. The method of claim 20 further comprising dynamically identifying the criterion instruction based on information derived from previous executions.
24. The method of claim 20 further comprising capturing the criterion instruction and the preceding instructions by a buffer.
25. The method of claim 20 further comprising locating an existing trace in the trace cache before storing the trace, the existing trace and the trace to be stored having the same first instruction or the same last instruction.
26. The method of claim 20 further comprising rebuilding the trace after a duration of time interval that grows each time the trace is rebuilt until the duration reaches a predetermined time limit.
27. The method of claim 20 further comprising storing, in an array, the information about the criterion instruction and the preceding instructions.
28. The method of claim 27 wherein the array further includes a subslice type for each of the instructions, the subslice type being a result of classifying the instructions.
29. A computer program residing on a computer readable medium comprising instructions for causing a computer to:
identify a criterion instruction incurring latency in a program sequence;
capture the criterion instruction and instructions preceding the criterion instruction in the program sequence, the preceding instructions and the criterion instruction being interdependent; and
store a trace in a trace file, the trace including information about the criterion instruction, the preceding instructions, and interdependency among the criterion instruction and the preceding instructions.
30. The computer program of claim 29 wherein an analysis window defined in the computer program causes the computer to capture the criterion instruction and preceding instructions.
31. The computer program of claim 29 wherein the computer identifies the criterion instruction by profiling the program sequence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/823,235 US20020144101A1 (en) | 2001-03-30 | 2001-03-30 | Caching DAG traces |
Publications (1)
Publication Number | Publication Date |
---|---|
US20020144101A1 (en) | 2002-10-03 |
Family
ID=25238172
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/823,235 Abandoned US20020144101A1 (en) | 2001-03-30 | 2001-03-30 | Caching DAG traces |
Country Status (1)
Country | Link |
---|---|
US (1) | US20020144101A1 (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4916652A (en) * | 1987-09-30 | 1990-04-10 | International Business Machines Corporation | Dynamic multiple instruction stream multiple data multiple pipeline apparatus for floating-point single instruction stream single data architectures |
US5357617A (en) * | 1991-11-22 | 1994-10-18 | International Business Machines Corporation | Method and apparatus for substantially concurrent multiple instruction thread processing by a single pipeline processor |
US5937195A (en) * | 1996-11-27 | 1999-08-10 | Hewlett-Packard Co | Global control flow treatment of predicated code |
US6044222A (en) * | 1997-06-23 | 2000-03-28 | International Business Machines Corporation | System, method, and program product for loop instruction scheduling hardware lookahead |
US6073213A (en) * | 1997-12-01 | 2000-06-06 | Intel Corporation | Method and apparatus for caching trace segments with multiple entry points |
US6092187A (en) * | 1997-09-19 | 2000-07-18 | Mips Technologies, Inc. | Instruction prediction based on filtering |
US6594824B1 (en) * | 1999-02-17 | 2003-07-15 | Elbrus International Limited | Profile driven code motion and scheduling |
Cited By (120)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8370232B2 (en) | 1999-02-09 | 2013-02-05 | Jpmorgan Chase Bank, National Association | System and method for back office processing of banking transactions using electronic files |
US8600893B2 (en) | 1999-02-09 | 2013-12-03 | Jpmorgan Chase Bank, National Association | System and method for back office processing of banking transactions using electronic files |
US10467688B1 (en) | 1999-02-09 | 2019-11-05 | Jpmorgan Chase Bank, N.A. | System and method for back office processing of banking transactions using electronic files |
US20030126408A1 (en) * | 2002-01-03 | 2003-07-03 | Sriram Vajapeyam | Dependence-chain processor |
US7363467B2 (en) | 2002-01-03 | 2008-04-22 | Intel Corporation | Dependence-chain processing using trace descriptors having dependency descriptors |
US20030220989A1 (en) * | 2002-05-23 | 2003-11-27 | Michael Tsuji | Method and system for client browser update |
US7987246B2 (en) | 2002-05-23 | 2011-07-26 | Jpmorgan Chase Bank | Method and system for client browser update |
US8321467B2 (en) | 2002-12-03 | 2012-11-27 | Jp Morgan Chase Bank | System and method for communicating between an application and a database |
US10692135B2 (en) | 2003-01-07 | 2020-06-23 | Jpmorgan Chase Bank, N.A. | System and method for process scheduling |
US8032439B2 (en) | 2003-01-07 | 2011-10-04 | Jpmorgan Chase Bank, N.A. | System and method for process scheduling |
US8095659B2 (en) | 2003-05-16 | 2012-01-10 | Jp Morgan Chase Bank | Service interface |
US7069278B2 (en) | 2003-08-08 | 2006-06-27 | Jpmorgan Chase Bank, N.A. | System for archive integrity management and related methods |
US7133969B2 (en) * | 2003-10-01 | 2006-11-07 | Advanced Micro Devices, Inc. | System and method for handling exceptional instructions in a trace cache based processor |
US20050076180A1 (en) * | 2003-10-01 | 2005-04-07 | Advanced Micro Devices, Inc. | System and method for handling exceptional instructions in a trace cache based processor |
US7555633B1 (en) | 2003-11-03 | 2009-06-30 | Advanced Micro Devices, Inc. | Instruction cache prefetch based on trace cache eviction |
US8069336B2 (en) | 2003-12-03 | 2011-11-29 | Globalfoundries Inc. | Transitioning from instruction cache to trace cache on label boundaries |
US20050137902A1 (en) * | 2003-12-22 | 2005-06-23 | Simon Bowie-Britton | Methods and systems for managing successful completion of a network of processes |
US7213126B1 (en) | 2004-01-12 | 2007-05-01 | Advanced Micro Devices, Inc. | Method and processor including logic for storing traces within a trace cache |
US7437512B2 (en) * | 2004-02-26 | 2008-10-14 | Marvell International Ltd. | Low power semi-trace instruction/trace hybrid cache with logic for indexing the trace cache under certain conditions |
US20090013131A1 (en) * | 2004-02-26 | 2009-01-08 | Marvell International Ltd. | Low power semi-trace instruction cache |
US7822925B2 (en) | 2004-02-26 | 2010-10-26 | Marvell International Ltd. | Low power semi-trace instruction/trace hybrid cache with logic for indexing the trace cache under certain conditions |
US20050193175A1 (en) * | 2004-02-26 | 2005-09-01 | Morrow Michael W. | Low power semi-trace instruction cache |
US7197630B1 (en) | 2004-04-12 | 2007-03-27 | Advanced Micro Devices, Inc. | Method and system for changing the executable status of an operation following a branch misprediction without refetching the operation |
US20110074821A1 (en) * | 2004-04-16 | 2011-03-31 | Apple Inc. | System for Emulating Graphics Operations |
US20050231502A1 (en) * | 2004-04-16 | 2005-10-20 | John Harper | High-level program interface for graphics operations |
US8704837B2 (en) | 2004-04-16 | 2014-04-22 | Apple Inc. | High-level program interface for graphics operations |
US8044963B2 (en) * | 2004-04-16 | 2011-10-25 | Apple Inc. | System for emulating graphics operations |
US9691118B2 (en) | 2004-04-16 | 2017-06-27 | Apple Inc. | System for optimizing graphics operations |
US10402934B2 (en) | 2004-04-16 | 2019-09-03 | Apple Inc. | System for optimizing graphics operations |
US20060003579A1 (en) * | 2004-06-30 | 2006-01-05 | Sir Jiun H | Interconnects with direct metalization and conductive polymer |
US20070022274A1 (en) * | 2005-06-29 | 2007-01-25 | Roni Rosner | Apparatus, system, and method of predicting and correcting critical paths |
US8065606B1 (en) | 2005-09-16 | 2011-11-22 | Jpmorgan Chase Bank, N.A. | System and method for automating document generation |
US8732567B1 (en) | 2005-09-16 | 2014-05-20 | Jpmorgan Chase Bank, N.A. | System and method for automating document generation |
US8370576B1 (en) | 2005-09-28 | 2013-02-05 | Oracle America, Inc. | Cache rollback acceleration via a bank based versioning cache ciruit |
US8019944B1 (en) | 2005-09-28 | 2011-09-13 | Oracle America, Inc. | Checking for a memory ordering violation after a speculative cache write |
US7941607B1 (en) | 2005-09-28 | 2011-05-10 | Oracle America, Inc. | Method and system for promoting traces in an instruction processing circuit |
US7949854B1 (en) | 2005-09-28 | 2011-05-24 | Oracle America, Inc. | Trace unit with a trace builder |
US7953961B1 (en) | 2005-09-28 | 2011-05-31 | Oracle America, Inc. | Trace unit with an op path from a decoder (bypass mode) and from a basic-block builder |
US7953933B1 (en) | 2005-09-28 | 2011-05-31 | Oracle America, Inc. | Instruction cache, decoder circuit, basic block cache circuit and multi-block cache circuit |
US7966479B1 (en) | 2005-09-28 | 2011-06-21 | Oracle America, Inc. | Concurrent vs. low power branch prediction |
US7877630B1 (en) | 2005-09-28 | 2011-01-25 | Oracle America, Inc. | Trace based rollback of a speculatively updated cache |
US7987342B1 (en) | 2005-09-28 | 2011-07-26 | Oracle America, Inc. | Trace unit with a decoder, a basic-block cache, a multi-block cache, and sequencer |
US7937564B1 (en) | 2005-09-28 | 2011-05-03 | Oracle America, Inc. | Emit vector optimization of a trace |
US8015359B1 (en) | 2005-09-28 | 2011-09-06 | Oracle America, Inc. | Method and system for utilizing a common structure for trace verification and maintaining coherency in an instruction processing circuit |
US8499293B1 (en) | 2005-09-28 | 2013-07-30 | Oracle America, Inc. | Symbolic renaming optimization of a trace |
US8024522B1 (en) | 2005-09-28 | 2011-09-20 | Oracle America, Inc. | Memory ordering queue/versioning cache circuit |
US7870369B1 (en) * | 2005-09-28 | 2011-01-11 | Oracle America, Inc. | Abort prioritization in a trace-based processor |
US8032710B1 (en) | 2005-09-28 | 2011-10-04 | Oracle America, Inc. | System and method for ensuring coherency in trace execution |
US8037285B1 (en) | 2005-09-28 | 2011-10-11 | Oracle America, Inc. | Trace unit |
US7849292B1 (en) | 2005-09-28 | 2010-12-07 | Oracle America, Inc. | Flag optimization of a trace |
US8051247B1 (en) | 2005-09-28 | 2011-11-01 | Oracle America, Inc. | Trace based deallocation of entries in a versioning cache circuit |
US7814298B1 (en) | 2005-09-28 | 2010-10-12 | Oracle America, Inc. | Promoting and appending traces in an instruction processing circuit based upon a bias value |
US7606975B1 (en) * | 2005-09-28 | 2009-10-20 | Sun Microsystems, Inc. | Trace cache for efficient self-modifying code processing |
US7783863B1 (en) * | 2005-09-28 | 2010-08-24 | Oracle America, Inc. | Graceful degradation in a trace-based processor |
US7676634B1 (en) | 2005-09-28 | 2010-03-09 | Sun Microsystems, Inc. | Selective trace cache invalidation for self-modifying code via memory aging |
US20070101100A1 (en) * | 2005-10-28 | 2007-05-03 | Freescale Semiconductor, Inc. | System and method for decoupled precomputation prefetching |
US7797517B1 (en) | 2005-11-18 | 2010-09-14 | Oracle America, Inc. | Trace optimization via fusing operations of a target architecture operation set |
US7681019B1 (en) | 2005-11-18 | 2010-03-16 | Sun Microsystems, Inc. | Executing functions determined via a collection of operations from translated instructions |
US9081609B2 (en) * | 2005-12-21 | 2015-07-14 | Xerox Corporation | Image processing system and method employing a threaded scheduler |
US20070150877A1 (en) * | 2005-12-21 | 2007-06-28 | Xerox Corporation | Image processing system and method employing a threaded scheduler |
US8370609B1 (en) | 2006-09-27 | 2013-02-05 | Oracle America, Inc. | Data cache rollbacks for failed speculative traces with memory operations |
US8010745B1 (en) | 2006-09-27 | 2011-08-30 | Oracle America, Inc. | Rolling back a speculative update of a non-modifiable cache line |
US8104076B1 (en) | 2006-11-13 | 2012-01-24 | Jpmorgan Chase Bank, N.A. | Application access control system |
US7707396B2 (en) * | 2006-11-17 | 2010-04-27 | International Business Machines Corporation | Data processing system, processor and method of data processing having improved branch target address cache |
US20080120496A1 (en) * | 2006-11-17 | 2008-05-22 | Bradford Jeffrey P | Data Processing System, Processor and Method of Data Processing Having Improved Branch Target Address Cache |
US20090249004A1 (en) * | 2008-03-26 | 2009-10-01 | Microsoft Corporation | Data caching for distributed execution computing |
US8229968B2 (en) | 2008-03-26 | 2012-07-24 | Microsoft Corporation | Data caching for distributed execution computing |
US8898401B2 (en) | 2008-11-07 | 2014-11-25 | Oracle America, Inc. | Methods and apparatuses for improving speculation success in processors |
US8806145B2 (en) * | 2008-11-07 | 2014-08-12 | Oracle America, Inc. | Methods and apparatuses for improving speculation success in processors |
US20100122036A1 (en) * | 2008-11-07 | 2010-05-13 | Sun Microsystems, Inc. | Methods and apparatuses for improving speculation success in processors |
US20100122038A1 (en) * | 2008-11-07 | 2010-05-13 | Sun Microsystems, Inc. | Methods and apparatuses for improving speculation success in processors |
US9038177B1 (en) | 2010-11-30 | 2015-05-19 | Jpmorgan Chase Bank, N.A. | Method and system for implementing multi-level data fusion |
US8588108B2 (en) * | 2011-02-21 | 2013-11-19 | Cisco Technology, Inc. | Method and apparatus to trigger DAG reoptimization in a sensor network |
US20120213124A1 (en) * | 2011-02-21 | 2012-08-23 | Jean-Philippe Vasseur | Method and apparatus to trigger dag reoptimization in a sensor network |
US8892949B2 (en) | 2011-06-14 | 2014-11-18 | International Business Machines Corporation | Effective validation of execution units within a processor |
US8850266B2 (en) | 2011-06-14 | 2014-09-30 | International Business Machines Corporation | Effective validation of execution units within a processor |
US9971654B2 (en) | 2011-07-20 | 2018-05-15 | Jpmorgan Chase Bank, N.A. | Safe storing data for disaster recovery |
US9292588B1 (en) | 2011-07-20 | 2016-03-22 | Jpmorgan Chase Bank, N.A. | Safe storing data for disaster recovery |
US9639336B2 (en) | 2011-11-07 | 2017-05-02 | Nvidia Corporation | Algorithm for vectorization and memory coalescing during compiling |
WO2013070616A1 (en) * | 2011-11-07 | 2013-05-16 | Nvidia Corporation | An algorithm for vectorization and memory coalescing during compiling |
US8935574B2 (en) | 2011-12-16 | 2015-01-13 | Advanced Micro Devices, Inc. | Correlating traces in a computing system |
US10585740B2 (en) | 2012-05-31 | 2020-03-10 | International Business Machines Corporation | Data lifecycle management |
US9983921B2 (en) * | 2012-05-31 | 2018-05-29 | International Business Machines Corporation | Data lifecycle management |
US11200108B2 (en) | 2012-05-31 | 2021-12-14 | International Business Machines Corporation | Data lifecycle management |
US20160266956A1 (en) * | 2012-05-31 | 2016-09-15 | International Business Machines Corporation | Data lifecycle management |
US11188409B2 (en) | 2012-05-31 | 2021-11-30 | International Business Machines Corporation | Data lifecycle management |
US8832500B2 (en) | 2012-08-10 | 2014-09-09 | Advanced Micro Devices, Inc. | Multiple clock domain tracing |
US8959398B2 (en) | 2012-08-16 | 2015-02-17 | Advanced Micro Devices, Inc. | Multiple clock domain debug capability |
US20140101643A1 (en) * | 2012-10-04 | 2014-04-10 | International Business Machines Corporation | Trace generation method, trace generation device, trace generation program product, and multi-level compilation using trace generation method |
US9104433B2 (en) * | 2012-10-04 | 2015-08-11 | International Business Machines Corporation | Trace generation method, trace generation device, trace generation program product, and multi-level compilation using trace generation method |
US8930760B2 (en) | 2012-12-17 | 2015-01-06 | International Business Machines Corporation | Validating cache coherency protocol within a processor |
US10540373B1 (en) | 2013-03-04 | 2020-01-21 | Jpmorgan Chase Bank, N.A. | Clause library manager |
US20150317159A1 (en) * | 2014-05-01 | 2015-11-05 | Netronome Systems, Inc. | Pop stack absolute instruction |
US10474465B2 (en) * | 2014-05-01 | 2019-11-12 | Netronome Systems, Inc. | Pop stack absolute instruction |
US10078584B2 (en) * | 2016-05-06 | 2018-09-18 | International Business Machines Corporation | Reducing minor garbage collection overhead |
US10324837B2 (en) * | 2016-05-06 | 2019-06-18 | International Business Machines Corporation | Reducing minor garbage collection overhead |
US10467152B2 (en) | 2016-05-18 | 2019-11-05 | International Business Machines Corporation | Dynamic cache management for in-memory data analytic platforms |
US10204175B2 (en) * | 2016-05-18 | 2019-02-12 | International Business Machines Corporation | Dynamic memory tuning for in-memory data analytic platforms |
US10031833B2 (en) | 2016-08-31 | 2018-07-24 | Microsoft Technology Licensing, Llc | Cache-based tracing for time travel debugging and analysis |
US10042737B2 (en) | 2016-08-31 | 2018-08-07 | Microsoft Technology Licensing, Llc | Program tracing for time travel debugging and analysis |
US10031834B2 (en) | 2016-08-31 | 2018-07-24 | Microsoft Technology Licensing, Llc | Cache-based tracing for time travel debugging and analysis |
US10324851B2 (en) | 2016-10-20 | 2019-06-18 | Microsoft Technology Licensing, Llc | Facilitating recording a trace file of code execution using way-locking in a set-associative processor cache |
US10489273B2 (en) | 2016-10-20 | 2019-11-26 | Microsoft Technology Licensing, Llc | Reuse of a related thread's cache while recording a trace file of code execution |
US10310977B2 (en) | 2016-10-20 | 2019-06-04 | Microsoft Technology Licensing, Llc | Facilitating recording a trace file of code execution using a processor cache |
US10310963B2 (en) | 2016-10-20 | 2019-06-04 | Microsoft Technology Licensing, Llc | Facilitating recording a trace file of code execution using index bits in a processor cache |
US10540250B2 (en) | 2016-11-11 | 2020-01-21 | Microsoft Technology Licensing, Llc | Reducing storage requirements for storing memory addresses and values |
US20180165096A1 (en) * | 2016-12-09 | 2018-06-14 | Advanced Micro Devices, Inc. | Operation cache |
US10606599B2 (en) * | 2016-12-09 | 2020-03-31 | Advanced Micro Devices, Inc. | Operation cache |
US10318332B2 (en) | 2017-04-01 | 2019-06-11 | Microsoft Technology Licensing, Llc | Virtual machine execution tracing |
US10296442B2 (en) | 2017-06-29 | 2019-05-21 | Microsoft Technology Licensing, Llc | Distributed time-travel trace recording and replay |
US10445211B2 (en) * | 2017-08-28 | 2019-10-15 | Microsoft Technology Licensing, Llc | Logging trace data for program code execution at an instruction level |
US10459824B2 (en) | 2017-09-18 | 2019-10-29 | Microsoft Technology Licensing, Llc | Cache-based trace recording using cache coherence protocol data |
WO2019095873A1 (en) * | 2017-11-20 | 2019-05-23 | Shanghai Cambricon Information Technology Co., Ltd. | Task parallel processing method, apparatus and system, storage medium and computer device |
US10558572B2 (en) | 2018-01-16 | 2020-02-11 | Microsoft Technology Licensing, Llc | Decoupling trace data streams using cache coherence protocol data |
US11907091B2 (en) | 2018-02-16 | 2024-02-20 | Microsoft Technology Licensing, Llc | Trace recording by logging influxes to an upper-layer shared cache, plus cache coherence protocol transitions among lower-layer caches |
US10496537B2 (en) | 2018-02-23 | 2019-12-03 | Microsoft Technology Licensing, Llc | Trace recording by logging influxes to a lower-layer cache based on entries in an upper-layer cache |
US10642737B2 (en) | 2018-02-23 | 2020-05-05 | Microsoft Technology Licensing, Llc | Logging cache influxes by request to a higher-level cache |
US11093225B2 (en) * | 2018-06-28 | 2021-08-17 | Xilinx, Inc. | High parallelism computing system and instruction scheduling method thereof |
CN110659070A (en) * | 2018-06-29 | 2020-01-07 | Xilinx, Inc. | High-parallelism computing system and instruction scheduling method thereof |
US10846245B2 (en) * | 2019-03-14 | 2020-11-24 | Intel Corporation | Minimizing usage of hardware counters in triggered operations for collective communication |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20020144101A1 (en) | | Caching DAG traces |
US7441110B1 (en) | | Prefetching using future branch path information derived from branch prediction |
KR101031938B1 (en) | | Branch history register for loop branches |
JP3548255B2 (en) | | Branch instruction prediction mechanism and prediction method |
US7814469B2 (en) | | Speculative multi-threading for instruction prefetch and/or trace pre-build |
JP5198879B2 (en) | | Suppress branch history register updates by branching at the end of the loop |
US7293164B2 (en) | | Autonomic method and apparatus for counting branch instructions to generate branch statistics meant to improve branch predictions |
KR100395763B1 (en) | | A branch predictor for microprocessor having multiple processes |
JP2007515715A (en) | | How to transition from instruction cache to trace cache on label boundary |
CN101460922B (en) | | Sliding-window, block-based branch target address cache |
US6381691B1 (en) | | Method and apparatus for reordering memory operations along multiple execution paths in a processor |
US7290255B2 (en) | | Autonomic method and apparatus for local program code reorganization using branch count per instruction hardware |
US20040117606A1 (en) | | Method and apparatus for dynamically conditioning statically produced load speculation and prefetches using runtime information |
US7228528B2 (en) | | Building inter-block streams from a dynamic execution trace for a program |
KR102635965B1 (en) | | Front end of microprocessor and computer-implemented method using the same |
JP3762816B2 (en) | | System and method for tracking early exceptions in a microprocessor |
US20060015706A1 (en) | | TLB correlated branch predictor and method for use thereof |
EP0912927B1 (en) | | A load/store unit with multiple pointers for completing store and load-miss instructions |
TWI524272B (en) | | Microprocessor and dynamically reconfigurable method and detection method, and computer program product thereof |
US20080016292A1 (en) | | Access controller and access control method |
EP4248321A1 (en) | | An apparatus and method for performing enhanced pointer chasing prefetcher |
Jourdan et al. | | Increasing the Instruction-Level Parallelism through Data-Flow Manipulation |
JP2008501166A (en) | | TLB correlation type branch predictor and method of using the same |
WO1998002801A1 (en) | | A functional unit with a pointer for mispredicted branch resolution, and a superscalar microprocessor employing the same |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: INTEL CORPORATION, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: WANG, HONG; CHAZIN, NEIL A.; HUGHES, CHRISTOPHER J.; AND OTHERS; REEL/FRAME: 011875/0919; SIGNING DATES FROM 20010424 TO 20010501 |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |