US20020066088A1 - System and method for software code optimization - Google Patents

System and method for software code optimization

Info

Publication number
US20020066088A1
Authority
US
United States
Prior art keywords
implementation
code
target
application
software program
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US09/765,916
Inventor
Frederic Canut
Mustapha Derras
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cadence Design Systems Inc
Original Assignee
Cadence Design Systems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cadence Design Systems Inc filed Critical Cadence Design Systems Inc
Priority to US09/765,916
Assigned to CADENCE DESIGN SYSTEMS, INC. reassignment CADENCE DESIGN SYSTEMS, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: CANUT, FREDERIC, DERRAS, MUSTAPHA
Publication of US20020066088A1
Current legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/40 Transformation of program code
    • G06F 8/41 Compilation
    • G06F 8/44 Encoding
    • G06F 8/443 Optimisation
    • G01 MEASURING; TESTING
    • G01R MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R 31/00 Arrangements for testing electric properties; arrangements for locating electric faults; arrangements for electrical testing characterised by what is being tested not provided for elsewhere
    • G01R 31/28 Testing of electronic circuits, e.g. by signal tracer
    • G01R 31/317 Testing of digital circuits
    • G01R 31/3181 Functional testing
    • G01R 31/3183 Generation of test inputs, e.g. test vectors, patterns or sequences
    • G01R 31/318342 Generation of test inputs by preliminary fault modelling, e.g. analysis, simulation
    • G01R 31/318357 Simulation
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/30 Circuit design
    • G06F 30/32 Circuit design at the digital level
    • G06F 30/33 Design verification, e.g. functional simulation or model checking
    • G06F 30/3308 Design verification using simulation
    • G06F 30/39 Circuit design at the physical level

Definitions

  • This implementation is the reference, after generic code qualification, to be optimized at the C level. Based on this code, one applies several rules concerning the method of implementing the application that foster processing-time reduction.
  • a primary objective is to establish a test process that guarantees the integrity of the processing. The goal is to reduce the cycle consumption and not to transform the result of the processing. It is also possible to establish a specific test script to validate the optimization and/or use tools to compare the processing results.
  • the script makes it easier to run several tests and allows the programmer to gather information (traces) on the application's behavior.
  • the tools allow the creation of specific comparisons on the processed data.
  • the Cadence Cierto Signal Processing Worksystem (SPW) is capable of such a task and can speed the development cycle.
  • optimization preferably uses tricks such as loop reduction, loop merging, test reduction, pointer usage, and in-line functions or macros, to reduce context switching. These tricks are generic and can be used for most if not all high-level languages.
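  • By way of illustration only (this sketch is not taken from the patent, and the function and buffer names are hypothetical), two of these generic tricks, loop merging and pointer usage combined with an inline macro, might look as follows in C:

      #include <stddef.h>

      /* An inline macro instead of a per-element function call (avoids call overhead). */
      #define SCALE(x)  ((short)(((x) * 3) >> 1))

      /* Before: two separate passes over the buffer, indexed accesses. */
      void scale_and_offset_naive(short buf[], size_t n, short offset)
      {
          for (size_t i = 0; i < n; i++)
              buf[i] = SCALE(buf[i]);
          for (size_t i = 0; i < n; i++)
              buf[i] = (short)(buf[i] + offset);
      }

      /* After: the two loops are merged into one pass and pointers replace indexing. */
      void scale_and_offset_merged(short *buf, size_t n, short offset)
      {
          short *end = buf + n;
          while (buf < end) {
              *buf = (short)(SCALE(*buf) + offset);
              buf++;
          }
      }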
  • Another optimization step that can be integrated at this level is development-chain optimization, by addressing the specific options of the pre-processor, compiler, assembler, and/or linker. This may be useful if the implementation is initially done with the target development environment. Generally, however, applications are initially developed on PCs or workstations; in that case, tuning the generic host compiler is not useful and can lead to bad decisions in terms of performance and code size.
  • the developer assumes that the target DSP is fixed and that a simulator is available. Many C optimizations as are known in the art are possible at this language level.
  • at least three parameters are integrated: the global effort in terms of time to integrate a new optimization step, the processing time reduction that can be evaluated, and the code size evolution. These parameters preferably are correlated to the time dedicated to the project and whether or not the application is mandatory to system functionality.
  • a goal of these measures is to understand the impact of a modification.
  • Another goal is to fix a limit for the different optimization steps in terms of time.
  • One rule, for example, may be to require a measured cycle reduction of more than five percent between two steps.
  • some instructions that allow the use of DSP-specific characteristics are preferably integrated into the C or other high-level language implementation. Many of the instructions may be addressed by using pragma instructions that are placed in the code to take advantage of caches or internal RAM, loop counters, multiply-accumulate capabilities (MAC), and multiply-subtract capabilities (MSU). Other specific characteristics like splittable ALUs or multipliers, parallel instruction execution, and pipeline effects are addressed at the assembly level. For some DSPs, the only way to use these characteristics is to handle them at the assembly level. Furthermore, this step requires that the developer perform the least amount of tuning on the code to comply with the DSP's features.
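  • As a purely illustrative sketch (the compiler flag, pragma, and intrinsic names below are hypothetical placeholders rather than those of any particular DSP tool chain), such DSP-specific capabilities might be surfaced in the C source roughly as follows, with a portable fallback retained:

      #include <stdint.h>

      /* Hypothetical tool-chain extensions; real pragma and intrinsic names differ per DSP. */
      #ifdef HYPOTHETICAL_DSP_CC
      #define MAC(acc, a, b)  __mac((acc), (a), (b))                 /* hardware multiply-accumulate */
      #else
      #define MAC(acc, a, b)  ((acc) + (int32_t)(a) * (int32_t)(b))  /* portable C fallback          */
      #endif

      int32_t fir_sample(const int16_t *x, const int16_t *h, int taps)
      {
          int32_t acc = 0;
      #ifdef HYPOTHETICAL_DSP_CC
      #pragma loop_count(8, 256)   /* hint the trip-count range so a zero-overhead
                                      hardware loop counter can be used            */
      #endif
          for (int k = 0; k < taps; k++)
              acc = MAC(acc, x[k], h[k]);
          return acc;
      }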
  • Another task of this stage is to implement the high-level language (e.g., C) code and look at the effect obtained on the generated assembler.
  • the goal is not to modify assembly code but to write C code in a way that the assembler part of the compiler generates optimized assembly code.
  • the assumption is made that there are some specific C implementations that will impact the generated assembly code in the same way for many compilers.
  • the examples are the "do { } while" construct or the MAC integration. However, this is mainly true for the second and third DSP generations.
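  • To make the point concrete (an illustrative sketch, not code from the patent), the two functions below compute the same dot product; the "do { } while" form makes the caller's guarantee that the count is at least one explicit, which lets some DSP compilers drop the entry test and map the body onto a tight hardware loop around a MAC:

      #include <stdint.h>

      int32_t dot_for(const int16_t *a, const int16_t *b, int n)
      {
          int32_t acc = 0;
          for (int i = 0; i < n; i++)          /* entry test and index arithmetic kept */
              acc += (int32_t)a[i] * b[i];
          return acc;
      }

      int32_t dot_do_while(const int16_t *a, const int16_t *b, int n)
      {
          int32_t acc = 0;
          do {                                  /* caller guarantees n >= 1 */
              acc += (int32_t)*a++ * *b++;
          } while (--n > 0);
          return acc;
      }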
  • FIG. 5 is a flow diagram depicting a simplified representation of the main steps of the specific implementation process 500 represented as step 106 in FIG. 1. Detailed steps of the specific implementation process are depicted in the task flow diagram of FIG. 6.
  • a key objective is to integrate specific pragmas and intrinsics into the code.
  • the pragmas allow the use of cache or internal RAM memories and integration of loop counters to optimize loop branches.
  • the other aspect of this optimization concerns the implementation modifications that take advantage of the specific capabilities of the target DSP, including multiply-accumulate, multiply-subtract, splittable multiply-add, and post register modification.
  • the goal is to generate the assembly code and observe what can be modified in the C implementation that can be translated differently by the compiler.
  • Some methods of accomplishing this include, for example, removing code that is not used, avoiding overhead introduced by recursive calls, moving loop invariant expressions out of the loops, and reducing the scope of the variables (using macros integrates this concept naturally).
  • the developer preferably encapsulates the specific instructions of a DSP by integrating specific flags related to the target compiler in the code, and by using a source versioning system to handle the various target DSPs. Note that integrating specific flags can validate specific code parts depending on those flags.
  • FIG. 7 is a graph depicting examples of curves of the evolution of a software application with respect to its performance and size in a target-dependent optimization process. As shown in FIG. 7, the size decreases slowly because specific points of the application are addressed, but the impact on the performance can be impressive.
  • a fully dedicated implementation process is the lowest stage of the development process. Trade-offs on the application are preferably made by removing some processing passes, wherever possible. Assembly-specific optimization is also integrated to finally reach the target performance.
  • FIG. 8 is a task flow diagram depicting steps of a dedicated implementation process 800 represented as step 110 in FIG. 1.
  • the dedicated implementation process includes two main steps, manual assembly optimization and feature tuning/cutting.
  • One work-around is to drop out some specific part of the application that will have little impact on the quality of the processed data.
  • functions such as a ring subtraction, a high-pass filter on the input signal, or compression rate could be dropped without a significant loss of performance.
  • although the gain in terms of performance may not be high, cutting the compression rate, for example, can suppress enough cycles to reach the target performance.

Abstract

A method is provided of optimizing a software program for a target processor in order to meet specific performance objectives and yet maintain portability, where the software program is initially coded in a high-level language. The method includes a first step of optimizing the software program in the high-level language, using optimizations that are substantially independent of the target processor to host the application. Preferably, if the performance objectives are met after the completion of this step, then the process successfully terminates. However, if the performance objectives are not met, then the method preferably proceeds to a second step. In the second step, the initially optimized form of the software program is again optimized in the high-level language, although target processor-dependent optimizations are used. If the performance objectives are met after completing this second step, then the process preferably terminates. If the performance objectives are not met, then the process proceeds to a third step. In the third step, the twice-optimized software program is optimized using a low-level language of the target processor on key portions of the code, such that although the software implementation becomes target-dependent, it remains relatively portable.

Description

  • This application claims priority to a U.S. Provisional Application entitled “System-on-a-Chip-1,” having Ser. No. 60/216,746 and filed on Jul. 3, 2000, and which is hereby incorporated by reference into this application as though fully set forth herein.[0001]
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention [0002]
  • The present invention relates to the field of software development, and particularly, to system and methods for software code optimization. [0003]
  • 2. Background [0004]
  • In the design of software for digital signal processing (DSP) and other applications, programmers take advantage of the low-level but high-speed capabilities that the particular target processor (e.g., DSP processor, microcontroller) offers in order to achieve the performance requirements for the applications. However, the application of these tools early in the development process leads to the development of a program that may be unportable should a different target processor be subsequently used to host the program. The development of code that is not portable from one target processor to another may result in significant redesign and development costs for the same basic application. Often, if the program is portable, the most significant issue is the cost in time and resources to port the application to a new host, such that the performance requirements of the program on the new host are met. [0005]
  • A need exists therefore for a system and method that minimizes the likelihood that a development process for software will result in a program that is unportable from one target processor to another. [0006]
  • SUMMARY OF THE INVENTION
  • The present invention, in one aspect, provides systems and methods for optimizing software for execution on a specific host processor. [0007]
  • In one embodiment, a method is provided of optimizing a software program for a target processor in order to meet specific performance objectives, where the software program is coded in a high-level language. The method includes the steps of first optimizing the software program in the high-level language, using optimizations that are substantially independent of the target processor to host the application. Preferably, if the performance objectives are met after the completion of this step, then the process successfully exits. However, if the performance objectives are not met, then the method preferably proceeds to a second step. [0008]
  • In the second step, the initially optimized form of the software program is again optimized in the high-level language, although target processor-dependent optimizations are used. If the performance objectives are met after completing this second step, then the process preferably terminates. If the performance objectives are not met, then the process proceeds to a third step. [0009]
  • In the third step, the twice-optimized software program is optimized using a low-level language of the target processor on key portions of the code, such that although the software implementation becomes target-dependent, it remains relatively portable. Preferably, in evaluating whether the performance objectives have been achieved, performance profiles are determined for the intermediate forms of the optimized software program. These performance profiles are then preferably quantitatively compared to the previously defined performance objectives.[0010]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a flow diagram generally depicting steps in a process of optimizing a software program for a target processor representing a preferred embodiment of the invention. [0011]
  • FIG. 2 is a flow diagram depicting a preferred embodiment of a simplified representation of the main steps of the generic implementation process represented as one step in FIG. 1. [0012]
  • FIG. 3 is a flow diagram depicting a preferred embodiment of detailed steps of the generic implementation process depicted in FIG. 2. [0013]
  • FIG. 4 is a graph depicting examples of curves of the evolution of a software application with respect to its performance and size in a target-independent optimization process. [0014]
  • FIG. 5 is a flow diagram depicting a preferred embodiment of a simplified representation of the main steps of the specific implementation process represented as one step in FIG. 1. [0015]
  • FIG. 6 is a flow diagram depicting a preferred embodiment of detailed steps of the specific implementation process depicted in FIG. 5. [0016]
  • FIG. 7 is a graph depicting examples of curves of the evolution of a software application with respect to its performance and size in a target-dependent optimization process. [0017]
  • FIG. 8 is a flow diagram depicting a preferred embodiment of steps of the fully dedicated implementation process represented as one step in FIG. 1.[0018]
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • In a preferred embodiment of a software code optimization method (or process) comprising multiple basic steps, each successive basic step generally results in code that is closer to being dedicated to operate on a particular target. Thus, to promote portability, if the performance goals of the application are reached after the completion of any step in the process, then the optimization process is terminated. [0019]
  • The evaluation of performance, and thereby whether the performance meets stated, predefined objectives, preferably accounts for several factors. Preferably, a key performance measure is the real-time speed of the application when operating on the specified target. Another performance measure is the accuracy and/or quality of the output. Another factor that may be integrated into the process evaluation is the binary code size. While the application must be made to fit in the target processor's memory, this factor generally becomes less and less important as memory capacities increase and memories become smaller, cheaper and less power-consuming. [0020]
  • Thus, one major step in the optimization process is to fix the initial constraints that are applied to the development and optimization of the software application. These constraints are preferably used to quantitatively evaluate the application implementation's performance in an overall sense, and facilitate determining the feasibility of porting the application to a specific target at each of the development stages. These measures preferably inherently integrate processing performance characteristics of the target processor, including its clock frequency, which relates to the number of cycles available to execute the application. [0021]
  • Another set of parameters that preferably are calculated is the global I/O data flow, to determine if the memory accesses (read/write) are achievable for the specified target. This set of parameters integrates elements like the aggregated data flow over the internal and external buses (data exchanges). [0022]
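  • By way of illustration (all of the numbers below are hypothetical and are not taken from the patent), the kind of quantitative feasibility check described above reduces to a few lines of arithmetic:

      #include <stdio.h>

      int main(void)
      {
          /* Hypothetical target and application figures, for illustration only. */
          const double clock_hz        = 100e6;   /* target DSP clock frequency         */
          const double frame_rate_hz   = 50.0;    /* frames processed per second        */
          const double bytes_per_frame = 64e3;    /* raw plus processed data per frame  */
          const double bus_bw_bytes_s  = 20e6;    /* sustainable external bus bandwidth */

          /* Cycle budget: cycles available to execute the application on one frame. */
          double cycles_per_frame = clock_hz / frame_rate_hz;

          /* Global I/O data flow: required bandwidth versus what the target sustains. */
          double required_bw = bytes_per_frame * frame_rate_hz;

          printf("cycle budget per frame : %.0f cycles\n", cycles_per_frame);
          printf("required I/O bandwidth : %.1f MB/s of %.1f MB/s -> %s\n",
                 required_bw / 1e6, bus_bw_bytes_s / 1e6,
                 required_bw <= bus_bw_bytes_s ? "feasible" : "not feasible");
          return 0;
      }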
  • FIG. 1 is a flow diagram generally depicting steps in a process of optimizing a software program for a target processor. In the embodiment represented in FIG. 1, an optimization process 100 comprises three optimization steps. In a first optimization step 102, the software for the DSP processor target is written in a high-level language such as C, C++ or Ada. Preferably, the programming language is one that is completely portable between all probable DSP targets. In coding the software, optimization techniques particular to the language are preferably used. The code optimization in this step 102 preferably does not employ any optimization tools that depend on the processor that is meant to host the application. [0023]
  • Upon completion of this step 102, the new implementation, as a next step 104, is evaluated to determine whether the performance goals have been reached. If by completing the step 102 of target-independent optimization the performance objectives are achieved, then the overall optimization process 100 successfully terminates 112. If the performance requirements for the application have not been achieved, then in a next optimization step 106, certain portions of the software code are re-implemented in the high-level language to take advantage of the specific processing capabilities of the DSP target. [0024]
  • While the code after this step 106 is less portable than the code that results after the previous step 102, the software may remain partially portable for a number of reasons. One reason is that the modified code is preferably selected from a portion of code that is short in terms of lines of source code, but is repeatedly executed and is thus responsible for a relatively significant percentage of the processing overhead. By first modifying the code that fits these criteria, the amount of code that must be modified is minimized, and may additionally be flagged in the source file to indicate that it is target-specific code. If the target processor later changes, only these identified portions need be addressed for optimization. Further, the previously unoptimized code, corresponding in functionality to the portion of code that is optimized in this step 106, may remain coded in the source file. This original unoptimized code may be used as a starting point for optimizing the same portion of code of any subsequent target processor. Another benefit is that although the coding is specific to the DSP target for the application, the code preferably remains in the high-level language. By remaining in a high-level language (versus being re-coded in a low-level language such as an assembly language), the resulting code is inherently much easier to revisit and comprehend should modifications be necessary. [0025]
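  • A minimal sketch of how such a flagged, target-specific section can coexist with the retained generic code (the TARGET_DSP_X flag and the __sat_add32 intrinsic are hypothetical placeholders) is shown below:

      #include <stdint.h>

      #ifdef TARGET_DSP_X
      /* Target-specific variant: relies on a (hypothetical) saturating-add intrinsic. */
      #define SAT_ADD(a, b)  __sat_add32((a), (b))
      #else
      /* Original, target-independent version, kept in the source file as the
         starting point for porting the same kernel to any future target.       */
      static inline int32_t sat_add_generic(int32_t a, int32_t b)
      {
          int64_t s = (int64_t)a + (int64_t)b;
          if (s > INT32_MAX) s = INT32_MAX;
          if (s < INT32_MIN) s = INT32_MIN;
          return (int32_t)s;
      }
      #define SAT_ADD(a, b)  sat_add_generic((a), (b))
      #endif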
  • Preferably, software-profiling tools are applied to readily identify the portions of code that fit the criteria required to be preferred candidates for optimization, so that they then can be optimized for the particular DSP target as necessary. [0026]
  • Once a section of code has been optimized for a particular DSP target, other portions of code that meet the criteria to be candidates for optimization may also be optimized for the DSP target, if the performance criteria for the application have still not been achieved. [0027]
  • If the portions of code that are candidates for optimization have been optimized, then in a next step 108, the implementation is again evaluated to determine whether the performance objectives on the target processor have been met. As in step 104, if the performance objectives are achieved through the step 106 of target-dependent optimization using the high-level language, then the overall optimization process 100 successfully terminates 112. However, if the performance goals of the application have not been met, then the optimization process 100 proceeds to a third optimization step 110. In this step 110, the software is configured to be fully dedicated to the architecture and processing benefits of the target processor. Various coding techniques that are particular to the target processor for the application may be employed. Some of these techniques include executing instructions in parallel or using any pipeline processing or other specialized processing capabilities. Further, tradeoffs may be made between performance and throughput in order to meet pre-stated objectives of the application. The result of the process is an efficiently created program for a DSP, microcontroller, or other computing target processor that meets pre-stated performance objectives, and is optimal, or close to optimal, with respect to its portability, thereby minimizing future software development efforts for the same application. [0028]
  • In one embodiment of a system for performing the optimization method, the method is performed automatically after the software code has been initially developed in a high-level language. Preferably, the system is provided the performance parameters that are desired for the application, as well as the architectural specification of the target processor. Given these inputs, the system then processes the high-level language source code, compiles and simulates the code's execution, and tests the code against the specified performance requirements. If the performance requirements are not met, the system profiles the code and then optimizes the portions that are the best candidates for optimization. [0029]
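  • In outline, the automated flow described above could be organized as the loop sketched below; every function name is a hypothetical stub standing in for the compile, simulate, profile, and optimize tooling, not an interface defined by the patent:

      #include <stdbool.h>
      #include <stdio.h>

      typedef enum { GENERIC_C, TARGET_C, TARGET_ASM, NUM_LEVELS } OptLevel;

      /* Trivial stand-ins for the real tooling (compiler, simulator, profiler). */
      static void   apply_optimizations(const char *src, OptLevel level) { (void)src; (void)level; }
      static void   build_and_simulate(const char *src)                  { (void)src; }
      static double measured_cycles(void)                                { return 1.0e6; }
      static void   profile_hot_spots(const char *src)                   { (void)src; }

      /* Escalates one optimization level at a time (steps 102, 106, 110) and stops
         as soon as the cycle budget is met, so the code stays as portable as possible. */
      static bool optimize_until_met(const char *src, double cycle_budget)
      {
          for (int level = GENERIC_C; level < NUM_LEVELS; level++) {
              apply_optimizations(src, (OptLevel)level);
              build_and_simulate(src);
              if (measured_cycles() <= cycle_budget)     /* evaluation steps 104, 108 */
                  return true;
              profile_hot_spots(src);                    /* pick candidates for the next,
                                                            more target-dependent level */
          }
          return false;
      }

      int main(void)
      {
          puts(optimize_until_met("app.c", 2.0e6) ? "objectives met" : "objectives not met");
          return 0;
      }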
  • Preferably, the system comprises a software-optimizing processor in conjunction with memory that automatically performs the code profiling operations, code generation operating on portions of code that are determined to be candidates for optimization, and then subsequent performance analysis. The software-optimizing processor may comprise any type of computer, and has processing characteristics dependent upon, for example, the processing requirements for the code generation, profiling and performance assessment operations. It may comprise, e.g., a workstation such as those manufactured by Sun Microsystems, a mainframe computer, or a personal computer such as the type manufactured by IBM or Apple. A computer executing optimization software is preferably used for the software-optimizing processor, due to the utility and flexibility of a computer in programming, modifying software, and observing software performance. More generally, the software-optimizing processor may be implemented using any type of processor or processors that may perform the code optimization process as described herein. [0030]
  • Thus, the term “processor,” in its use herein, refers to a wide variety of computational devices or means including, for example, using multiple processors that perform different processing tasks or have the same tasks distributed between processors. The processor(s) may be general purpose CPUs or special purpose processors such as are often conventionally used in digital signal processing systems. Further, multiple processors may be implemented in a server-client or other network configuration, as a pipeline array of processors, etc. Some or all of the processing is alternatively implemented with hard-wired circuitry such as an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other logic device. In conjunction with the term “processor,” the term “memory” refers to any storage medium that is accessible to a processor that meets the memory storage needs for a system for optimizing software. Preferably, the memory buffer is random access memory (RAM) that is directly accessed by the software-optimizing processor for ease in manipulating and processing selected portions of data. Preferably, the memory store comprises a hard disk or other nonvolatile memory device or component. [0031]
  • Preferred embodiments of the method for performing each of the basic steps illustrated in FIG. 1 are now provided. [0032]
  • Generic Implementation Process (Target independent Optimization)
  • As used herein, the term generic means target-independent. In the DSP domain, in a target-independent implementation, the high-level language source code, normally C code, uses no specific function calls, pragmas, or macros dedicated to the target. With a target-independent implementation, the portability of the application is maintained and some optimization is integrated into the application at a high level, without using assembly language code. [0033]
  • FIG. 2 is a flow diagram depicting a simplified representation of the main steps of the generic implementation process 200 represented as step 102 in FIG. 1. Preferred detailed steps of this process 200 are depicted in the task flow diagram of FIG. 3. Preferably, as shown in FIG. 2, there are four main steps that take as input the mathematical theory related to a signal processing algorithm and lead to an implementation that is later used by the specific implementation process. [0034]
  • Stage G1: Floating Point Implementation
  • The floating-point implementation step takes as input the theoretical solution of a process and transforms the solution into a structured language implementation. A main purpose of the step is to be able to reflect as much of the math in the theory into the implementation. [0035]
  • Precision in the calculations is important in the floating-point implementation. In general cases, those applications are done using double-floats and tools like the Cadence® Cierto™ signal processing worksystem or MATLAB. Such tools provide representation of the processed data, allow graphical representation and comparison, and extract errors so that an implementation can be qualified. [0036]
  • For an integer DSP, the floating-point implementation transitions to a fixed-point implementation linked to the precision that the DSP can handle. For example, the DSP may need a 16-bit precision implementation. However, typically a group developing the floating-point application is not the group developing the fixed-point implementation. This means that there are at least the following two approaches. In the first approach, the theoretical implementation is made with no consideration of the precision. In that case, the implementation is oriented to processing quality and pushes the precision problem to the fixed-point porting. In the second approach, a target precision is involved at an early stage of the development and impacts the quality of the processing. This provides a full precision-oriented implementation. However, this implementation must be entirely redone if the target architecture is changed. Details regarding floating point formats and related issues in terms of implementation are provided in several references, including Morgan, Don, "Practical DSP Modeling, Techniques, and Programming in C," John Wiley & Sons, 1995, pp. 263-298, and Lapsley, P., Bier, J., Shoham, A., and Lee, Edward A., "DSP Processor Fundamentals: Architectures and Features," IEEE Press Series on Signal Processing, 1997, pp. 1-30, which are hereby incorporated by reference as though fully set forth herein. [0037]
  • Stage G2: Fixed Point Implementation [0038]
  • In the stage of deriving the fixed-point implementation, trade-offs relating to precision may be made. The extent of these trade-offs primarily depends on the target DSP capability. If the target processor is a 16-bit precision DSP, the accepted deviation of the output result will be greater than on a 32-bit DSP. [0039]
  • However, depending on the complexity of the algorithm, another factor, the implementation architecture, is preferably considered. If an implementation involves hundreds of function calls, the real-time execution at the end of the implementation flow is impacted. For this reason, two different steps in the implementation activity are utilized. [0040]
  • Another consideration at this level is the inheritance. A common method of implementing signal processing is to take the floating-point implementation and port it to a specific target. Another method includes porting an existing fixed-point implementation to a new target. The mechanisms are quite different because of the availability of a first implementation. In the latter case, it is more an adaptation of an existing application than a new implementation. The advantage is that it shortens the development process by reusing the existing code done for another target. [0041]
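  • As an illustration of the precision trade-off discussed above (a sketch, not the patent's own code), the same first-order low-pass filter is shown below in double-precision floating point and in a 16-bit Q15 fixed-point form suited to a 16-bit integer DSP:

      #include <stdint.h>

      /* Floating-point reference: y[n] = a*x[n] + (1 - a)*y[n-1]. */
      double lowpass_float(double x, double *state, double a)
      {
          *state = a * x + (1.0 - a) * (*state);
          return *state;
      }

      /* Q15 fixed-point version for a 16-bit DSP: coefficients and samples are
         scaled by 2^15, and the 32-bit product is rounded back down to 16 bits. */
      int16_t lowpass_q15(int16_t x, int16_t *state, int16_t a_q15)
      {
          int32_t acc = (int32_t)a_q15 * x
                      + (int32_t)(32768 - a_q15) * (*state);
          *state = (int16_t)((acc + (1 << 14)) >> 15);   /* round and rescale */
          return *state;
      }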
  • Sub-stage G2.1: Processing Qualification
  • The goal of processing qualification is to obtain an implementation that preferably provides the best trade-off on the output result for a given precision. One of the tools that can accelerate the completion of this step is the Cierto™ signal processing worksystem. This tool provides the capability to validate, compare, and qualify a process with a reference to the floating-point implementation. [0042]
  • Fixing the derivation criteria depends primarily on the application category. For image processing comparison, information like texture, edge, contrast and distortion is considered. For voice processing, the same elements may be taken into account, but spectral analysis, tone, volume and saturation, etc. may also be considered. Depending on the application domain, the criteria can be completely different. Furthermore, within a given domain, the criteria can change. Radar can be used in military or agricultural activities, but the measures made for those two applications of radar imaging may be quite different. [0043]
  • Sub-stage G2.2: Implementation Sizing
  • With a qualified algorithm in terms of quality and precision, the first sizing of the algorithm can be addressed. Preferably, the information gathered includes the real-time data flow, the implementation structure and architecture, the profiling of instructions and cycles, and the performances of the target DSP. These elements help determine if the code can fit inside the target. [0044]
  • Real-Time Data Flow
  • The goal of real-time data flow is to understand the different I/Os related to the algorithm that are to be integrated into the DSP. On one level is the global data flow that globally indicates the availability of the raw and the processed data. With the global data flow, the developer identifies the processing delays that are going to provide a basic characteristic of the application relating to data flow. [0045]
  • However, because the global data flow corresponds to a simplified representation of the data flow, the real behavior of the data coming in and out on the data bus of the system is not necessarily clear. The programmer may have to zoom in on the elementary time duration (selected for the global data flow representation) to characterize the behavior of the implementation when confronted with the interrupts coming from the devices involved in the process. This "elementary time duration" can be very different from one application to another. It can be the duration of, for example, an image frame, an image line, an audio frame, or a dedicated time dictated by control software or the processor. [0046]
  • Another data flow consideration is application cadence. Application cadence may impact all future decisions for the application. For example, in an interrupt-driven architecture, which is the case in most of the real-time DSP constraint developments, it is then possible to make clear design choices like use of a (first-in first-out) FIFO that will buffer data. This option provides a more flexible way to manage bus I/Os because it allows a better optimization of the bandwidth usage. It is generally a more expensive system design, but it is recommended for processing that involves large amounts of data, like image processing. [0047]
  • Alternatively, a designer may choose not to use a FIFO. This means that each piece of data produced is either immediately saved or eventually lost. This is the most constrained way of implementing a signal processing application, but is cheaper and well suited for processing that involves little data such as voice processing. This example shows the impact of the application cadence criteria on application development. [0048]
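  • A minimal sketch of the FIFO option discussed above, as it is commonly realized in C for an interrupt-driven design (the size and names below are hypothetical, and a single producer and single consumer are assumed):

      #include <stdint.h>
      #include <stdbool.h>

      #define FIFO_SIZE 256u                 /* power of two, so the index wraps cheaply */

      static volatile int16_t  fifo[FIFO_SIZE];
      static volatile uint16_t head, tail;   /* head: producer (ISR); tail: consumer     */

      /* Called from the data-ready interrupt: buffer the sample instead of forcing
         the processing loop to keep up with every single bus transfer.              */
      bool fifo_put(int16_t sample)
      {
          uint16_t next = (uint16_t)((head + 1u) & (FIFO_SIZE - 1u));
          if (next == tail)
              return false;                  /* FIFO full: this sample would be lost */
          fifo[head] = sample;
          head = next;
          return true;
      }

      /* Called from the main processing loop whenever cycles are available. */
      bool fifo_get(int16_t *sample)
      {
          if (tail == head)
              return false;                  /* FIFO empty */
          *sample = fifo[tail];
          tail = (uint16_t)((tail + 1u) & (FIFO_SIZE - 1u));
          return true;
      }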
  • Another factor is bandwidth. Such a study exhaustively integrates external data variables and code fetches. However, a strict and exact representation of the detailed I/Os is generally impossible; all such representations reflect only a static point of view. A real I/O study preferably shows how temporal drift impacts the overall bandwidth throughout the application processing. [0049]
  • Implementation Organization
  • Also affecting implementation sizing is implementation organization. Implementation organization preferably considers the implementation structure, the architecture of the implementation, and the behavior of the implementation. [0050]
  • Implementation Structure
  • Characterizing the implementation structure generally means that the developer knows the number of functions implemented and the number of times they are called, split if possible into low-level and high-level functions, and so on. This first measure can be made manually or by using tools. One difficulty is identifying a tool that indicates the number of times a function is called; context switching can be expensive if it occurs too many times. For this purpose, one can use free profiling tools like gprof, which provide part of the necessary information. Other tools like SPARCworks (Sun Microsystems) provide the call graph. [0051]
  • Implementation Architecture
  • The architecture of the implementation generally means knowing the overall behavior of the application, in order to determine whether it may be necessary to revisit the algorithm construction to emphasize real-time issues. Given a specific processing algorithm that produces a signal processing development, the requirements can be formalized as follows: [0052]
  • 1) Obtain an “elementary” signal sample. This can be an audio value, an entire image, etc. [0053]
  • 2) Process the sample using the development made. [0054]
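  • A minimal sketch of the two requirements above, under the assumption of a frame-based audio application, might look as follows; get_elementary_sample() and process_frame() are hypothetical placeholders for the acquisition step and for the development made.

/* Illustrative real-time skeleton: obtain an elementary sample, then process it. */
#include <stddef.h>
#include <stdint.h>

#define FRAME_LEN 160   /* e.g., one 20 ms audio frame at 8 kHz (assumed) */

extern size_t get_elementary_sample(int16_t *buf, size_t max_len);  /* step 1 */
extern void   process_frame(int16_t *buf, size_t len);              /* step 2 */

void real_time_loop(void)
{
    int16_t frame[FRAME_LEN];

    for (;;) {
        size_t n = get_elementary_sample(frame, FRAME_LEN);  /* 1) obtain the sample */
        process_frame(frame, n);                             /* 2) process it        */
    }
}
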
  • The first step in evaluating the feasibility of the application is determining the global data flow to fix the limits of the input/output size and the precision (8, 16, or 32 bits). The first output of the data flow analysis indicates whether it is possible to sustain the I/Os, but another indication concerns the algorithm structure. [0055]
  • Analyzing the processing flow in conjunction with the data flow can indicate that some steps of the processing cannot be done before others. In that case, it may sometimes be concluded that some delay constraints cannot be met, or simply that the algorithm cannot run in real time. [0056]
  • There are several definitions of delay. There is the intrinsic delay related to every processing step, called the real processing delay: in any data process, there is always a processing delay needed to perform the data transformation. But there is also the architectural delay (AD), related to the structure of the algorithm; because it is tied to the algorithm architecture itself, the architectural delay can never be reduced without restructuring the algorithm. [0057]
  • Application Behavior
  • A majority of applications integrate some computations that use static correspondence tables or lookup tables to transform the signal. However, depending on the calculation results, the tables that are used will not be the same. If, for the same computation, the signal processing uses two different conversion tables that have different sizes, then the application is non-deterministic. Thus, this part of an implementation is preferably clearly identified so that all of the steps that follow result in useful measures that can be accurately correlated with the performance increase of the application. [0058]
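  • The following sketch (an assumption, not the patent's code) illustrates the non-deterministic case described above: the same computation selects one of two conversion tables of different sizes, so the execution path, and therefore the cycle count, depends on the data. The table contents and the threshold are hypothetical.

/* Data-dependent table selection makes the cycle count non-deterministic. */
#include <stdint.h>

static const int16_t small_table[64]   = { 0 };  /* short conversion table */
static const int16_t large_table[1024] = { 0 };  /* long conversion table  */

int16_t convert(int32_t computed, uint16_t index)
{
    if (computed < 1000) {                       /* hypothetical threshold */
        return small_table[index & 63];          /* cheaper access pattern */
    } else {
        return large_table[index & 1023];        /* different access cost  */
    }
}
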
  • Profiling
  • The objective of high-level profiling is to provide a first indication of the number of cycles consumed by the implementation and of the binary code size. If necessary, a simulator can also produce an instruction-level profile. [0059]
  • One difficulty is fixing the comparison criteria so that it can be determined whether the application fits in the targeted DSP. However, it is possible to derive a ratio based on the different benchmarks run on the DSP; such benchmarks are generally provided by the DSP vendors. This means that the cycle counts measured at the higher level must be correlated with the performance of the target DSP to establish a go/no-go process. [0060]
  • As an example, several DSP providers supply appropriate benchmarks. Several DSPs are compared using kernel functions written in C, including MAC (multiply-accumulate), Vect (simple vector multiply), Fir (FIR filter with redundant load elimination), Lat (lattice synthesis), Iir (IIR filter), and Jpeg (JPEG discrete cosine transform). [0061]
  • From the application point of view, taking the average cycle reduction ratio yields a value of 2.8; from the DSP point of view, one can obtain a gain factor of 2.78. This is one indicator. [0062]
  • On one hand, one thing that such benchmarks do not capture is the fact that a complete application merges many kinds of functions, so the optimization is less efficient for part of the implemented algorithm. Furthermore, the application includes many function calls that add, in some cases, significant overhead. [0063]
  • On the other hand, such benchmarks do not assume that the maximum potential of the DSPs is exploited. The gain factor is preferably measured to obtain effective comparison criteria for the go/no-go decision, because the developer can still go further with assembly-level optimization. [0064]
  • Thus, if the generic C implementation cycle count indicates more than five to six times the number of targeted cycles, the developer may consider that the real-time application is not reachable in a reasonable amount of time. [0065]
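  • A simple sketch (an assumption, not part of the patent) of the go/no-go rule above compares the cycle count of the generic C implementation with the cycle budget of the target DSP, using the five-to-six-times ceiling beyond which real time is considered out of reach.

/* Go/no-go check based on the generic-to-target cycle ratio. */
#include <stdio.h>

#define MAX_GENERIC_TO_TARGET_RATIO 6.0     /* upper bound suggested above */

int go_no_go(double generic_cycles, double target_cycle_budget)
{
    double ratio = generic_cycles / target_cycle_budget;
    printf("generic/target cycle ratio: %.2f\n", ratio);
    return ratio <= MAX_GENERIC_TO_TARGET_RATIO;   /* 1 = go, 0 = no go */
}
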
  • Stage G3: Reference C Code/Optimization
  • After generic code qualification, this implementation is the reference to be optimized at the C level. Based on this code, several rules are applied concerning the method of implementing the application that foster processing time reduction. [0066]
  • A primary objective is to establish a test process that guarantees the integrity of the processing. The goal is to reduce the cycle consumption and not to transform the result of the processing. It is also possible to establish a specific test script to validate the optimization and/or use tools to compare the processing results. [0067]
  • The script makes it easier to run several tests and allows the programmer to gather information (traces) on the application behavior. The tools allow specific comparisons to be made on the processed data. The Cadence Cierto Signal Processing Worksystem (SPW) is capable of such a task and can speed the development cycle. [0068]
  • Stage G3.1: Optimization
  • For the high-level language implementation, optimization preferably uses tricks such as loop reduction, loop merging, test reduction, pointer usage, and in-line functions or macros, to reduce context switching. These tricks are generic and can be used for most if not all high-level languages. [0069]
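  • As a hedged illustration (not the patent's code) of two of the tricks listed above, the sketch below merges two loops over the same data and replaces a tiny function with a macro to avoid call overhead; the scaling factor is arbitrary.

/* Loop merging, pointer usage, and a macro in place of a small function. */
#include <stddef.h>
#include <stdint.h>

#define SCALE(x) ((int16_t)(((x) * 3) >> 1))   /* macro instead of a tiny function */

/* One merged pass does the work of two separate loops (scale, then accumulate). */
void scale_and_accumulate(const int16_t *in, int16_t *out, int32_t *acc, size_t n)
{
    const int16_t *p = in;
    int16_t *q = out;
    int32_t sum = 0;

    while (n--) {
        int16_t s = SCALE(*p++);   /* pointer usage with post-increment */
        *q++ = s;                  /* first loop's work                 */
        sum += s;                  /* second loop's work, merged in     */
    }
    *acc = sum;
}
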
  • Another optimization step that can be integrated at this level is development-chain optimization, by addressing the specific options of the pre-processor, compiler, assembler, and/or linker. This is useful mainly if the implementation is initially done with the target development environment. Generally, applications are initially developed on PCs or workstations; in that case, tuning the generic compiler is of little use and can lead to bad decisions in terms of performance and code size. [0070]
  • At the implementation level, the developer assumes that the target DSP is fixed and that a simulator is available. Many C optimizations as are known in the art are possible at this language level. [0071]
  • Stage G3.2: Profiling
  • Each time a specific optimization is globally applied and validated, the result is preferably benchmarked and, if possible, fixed (frozen) to facilitate further optimization. Preferably, at least three parameters are considered: the global effort, in terms of time, to integrate a new optimization step; the processing time reduction that can be evaluated; and the code size evolution. These parameters are preferably correlated with the time dedicated to the project and with whether or not the application is mandatory for system functionality. [0072]
  • Processing Time Reduction
  • In most cases, the gain in processing time follows an x^-1 (inverse) law. In-line functions and loop reduction and/or unrolling produce significant gains; integrating pointers is normally less significant. However, a curve like that presented in FIG. 4, which regroups the measures realized for the generic implementation process, may be obtained. [0073]
  • A goal of these measures is to understand the impact of a modification: there is no generic rule that reduces the number of cycles for all code and all applications, and modifications that appear to optimize the cycle count can actually increase it. Another goal is to fix a time limit for the different optimization steps. One rule, for example, may be to require more than a five percent cycle reduction between two steps. [0074]
  • Code Size Evolution
  • This measure is necessary for embedded applications, which do not have several megabytes of memory available on the final system. Experiments have shown that the generic C optimization process considerably increases the code size, whereas a fully dedicated C optimization process generally decreases it. In either case, the programmer preferably guarantees that the code size does not exceed the available memory of the target system. [0075]
  • Specific Implementation Process (Target-dependent Implementation in the High-level Language)
  • In this process, instructions that allow the use of DSP-specific characteristics are preferably integrated into the C or other high-level language implementation. Many of these characteristics may be addressed by using pragma instructions placed in the code to take advantage of caches or internal RAM, loop counters, multiply-accumulate (MAC) capabilities, and multiply-subtract (MSU) capabilities. Other specific characteristics, such as splittable ALUs or multipliers, parallel instruction execution, and pipeline effects, are addressed at the assembly level. For some DSPs, the only way to use these characteristics is to handle them at the assembly level. Furthermore, this step requires that the developer perform only the minimum amount of tuning on the code needed to comply with the DSP's features. [0076]
  • Although the pragmas and intrinsics tend to detract from portability, those parts of the code may be encapsulated and isolated. With the use of "#ifdef" or other such conditional-compilation directives, target-compiler-dependent flags can be integrated into the code so that the same application can be recompiled for all the targets to be addressed. However, this method of implementation requires a clear and structured versioning system as well as clear coding rules. One of the main issues arises when more than three or four different targets must be supported. [0077]
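  • The sketch below illustrates, under stated assumptions, the encapsulation described above: a target-specific intrinsic is isolated behind a conditional-compilation flag so that the same source can be rebuilt for several targets. TARGET_DSP_A and the intrinsic _mac() are hypothetical names, not real vendor APIs.

/* Target-specific MAC intrinsic encapsulated behind a conditional-compilation flag. */
#include <stdint.h>

int32_t dot_product(const int16_t *a, const int16_t *b, int n)
{
    int32_t acc = 0;
    int i;

#if defined(TARGET_DSP_A)
    /* hypothetical target branch: the intrinsic maps to the DSP's MAC unit */
    for (i = 0; i < n; i++) {
        acc = _mac(acc, a[i], b[i]);
    }
#else
    /* portable fallback: plain C multiply-accumulate */
    for (i = 0; i < n; i++) {
        acc += (int32_t)a[i] * b[i];
    }
#endif
    return acc;
}
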
  • Another task of this stage is to adjust the high-level language (e.g., C) code and observe the effect on the generated assembly. The goal is not to modify assembly code but to write C code in such a way that the back end of the compiler generates optimized assembly code. The assumption is that there are some specific C constructs that impact the generated assembly code in the same way for many compilers; examples are the "do {} while" loop and MAC integration. However, this is mainly true for the second and third DSP generations. Another example is the post-register modification: if the developer has converted the implementation to use pointers, the position of the pointer increment in the code determines whether or not the post-register modification is generated in the assembly. [0078]
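  • A short sketch (an assumption about typical compiler behavior, not a guarantee) of the two examples named above: a "do {} while" loop with a down-counted, known-non-zero trip count, and a pointer post-increment placed so that a post-register modification can be emitted.

/* C written so the compiler back end can emit a hardware loop and post-register modify. */
#include <stdint.h>

void copy_and_saturate(const int16_t *src, int16_t *dst, int n)
{
    /* assumes n > 0, which lets the compiler drop the initial loop test */
    do {
        int16_t v = *src++;              /* post-increment -> post-register modification */
        if (v > 16000)  v = 16000;       /* simple saturation (arbitrary limits) */
        if (v < -16000) v = -16000;
        *dst++ = v;
    } while (--n);                       /* down-count maps well to a hardware loop/branch */
}
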
  • FIG. 5 is a flow diagram depicting a simplified representation of the main steps of the specific implementation process 500 represented as step 106 in FIG. 1. Detailed steps of the specific implementation process are depicted in the task flow diagram of FIG. 6. [0079]
  • Stage S1: C-Optimized Code Impacting the Assembler
  • A key objective is to integrate specific pragmas and intrinsics into the code. The pragmas allow the use of cache or internal RAM memories and integration of loop counters to optimize loop branches. The other aspect of this optimization concerns the implementation modifications that take advantage of the specific capabilities of the target DSP, including multiply-accumulate, multiply-subtract, splittable multiply-add, and post register modification. [0080]
  • The goal is to generate the assembly code and observe what can be modified in the C implementation so that it is translated differently by the compiler. [0081]
  • Stage S2: Specialized Low-Level Functions
  • Depending on the implementation structure, it may be necessary to tune some specific functions that are used intensively. Some methods of accomplishing this include, for example, removing code that is not used, avoiding overhead introduced by recursive calls, moving loop invariant expressions out of the loops, and reducing the scope of the variables (using macros integrates this concept naturally). [0082]
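  • The before/after sketch below (illustrative, not from the patent) shows two of the tunings listed above: moving a loop-invariant expression out of the loop and reducing the scope of a temporary variable.

/* Loop-invariant hoisting and scope reduction (before/after). */
#include <stddef.h>
#include <stdint.h>

/* Before: gain*offset is recomputed every iteration and 'tmp' has wide scope. */
void adjust_naive(int16_t *buf, size_t n, int16_t gain, int16_t offset)
{
    int32_t tmp;
    for (size_t i = 0; i < n; i++) {
        tmp = (int32_t)gain * offset;                 /* loop-invariant work inside the loop */
        buf[i] = (int16_t)(buf[i] + (tmp >> 8));
    }
}

/* After: the invariant is hoisted out of the loop and the temporary's scope is minimal. */
void adjust_tuned(int16_t *buf, size_t n, int16_t gain, int16_t offset)
{
    const int32_t bias = ((int32_t)gain * offset) >> 8;   /* hoisted invariant */
    for (size_t i = 0; i < n; i++) {
        buf[i] = (int16_t)(buf[i] + bias);
    }
}
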
  • Code Portability
  • While the above-mentioned improvements may be viewed as non-portable, they are portable in the sense that the overall architecture of the implementation can be ported and reused. To achieve that, the developer preferably encapsulates the DSP-specific instructions by integrating flags related to the target compiler into the code, and by using a source versioning system to handle the various target DSPs. Note that such flags enable specific code parts depending on which flags are set. [0083]
  • FIG. 7 is a graph depicting examples of curves of the evolution of a software application with respect to its performance and size in a target-dependent optimization process. As shown in FIG. 7, the size decreases slowly because specific points of the application are addressed, but the impact on the performance can be impressive. [0084]
  • Fully-Dedicated Implementation Process
  • A fully dedicated implementation process is the lowest stage of the development process. Trade-offs on the application are preferably made by removing some processing passes, wherever possible. Assembly-specific optimization is also integrated to finally reach the target performance. [0085]
  • A key challenge is to determine, after profiling, whether the performance goal is reached. FIG. 8 is a task flow diagram depicting steps of a dedicated implementation process 800 represented as step 108 in FIG. 1. The dedicated implementation process includes two main steps: manual assembly optimization and feature tuning/cutting. [0086]
  • Manual Assembly Optimization
  • At this point, very low-level assembly-language optimization is integrated. Key opportunities for this implementation generally come from parallel instructions, pipeline effects, and not-fully-optimized generated assembly code. Regarding parallel instructions, some DSPs are able to execute several instructions in the same clock cycle; for example, a load, an operation, and a store can be executed in the same instruction. The main objective is to account for the pipeline effects that affect the availability of the processed data. [0087]
  • With respect to pipeline effects, it is mainly around branch calls that specific instructions can be coded to take advantage of the pipeline delay slots. This optimization can be useful for loop-intensive applications, and it is mandatory in order to handle the parallel-instruction optimization. [0088]
  • For the computationally intensive parts whose generated assembly code is not fully optimized, it may be necessary to reorganize the generated code and integrate a more efficient use of accumulators and registers. [0089]
  • Feature Tuning/Cutting
  • Depending on the capacities of the DSP used, it may be necessary to re-adapt the application because it does not fit. If such a decision is made, it usually means that the high-level steps of the process were not well adapted or have been neglected. It is then necessary to re-evaluate the application behavior in terms of processing, which is normally a high-level task of the process, for example, the floating-point to fixed-point conversion. [0090]
  • One work-around is to drop some specific part of the application that has little impact on the quality of the processed data. For example, in an audio processing application, functions such as ring subtraction, a high-pass filter on the input signal, or the compression rate could be dropped or reduced without a significant loss of quality. Although the gain may not appear high, cutting the compression rate can suppress enough cycles to reach the target performance. [0091]
  • Reaching this step of the dedicated implementation process may mean that the application has not been evaluated correctly. If this is the case, then optionally, some of the highest process levels may be re-addressed. [0092]
  • While preferred embodiments of the invention have been described herein, and are further explained in the accompanying materials, many variations are possible which remain within the concept and scope of the invention. Such variations would become clear to one of ordinary skill in the art after inspection of the specification and the drawings. The invention therefore is not to be restricted except within the spirit and scope of any appended claims. [0093]

Claims (3)

What is claimed is:
1. A method of optimizing a software program for a target processor to meet performance objectives, where the software program is coded in a high-level language, the method comprising the steps of:
(a) optimizing the software program such that a resulting first optimized form of the software program is substantially independent of the target processor and is substantially coded in the high-level language;
(b) optimizing the first optimized form of the software program such that a resulting second optimized form of the software program is substantially dependent on the target processor and is substantially coded in the high-level language; and
(c) optimizing the second optimized form of the software program such that a resulting third optimized form of the software program is substantially dependent on the target processor and includes portions coded in a low-level language of the target processor.
2. The method of claim 1, further comprising steps of:
(a1) determining a first performance profile for the first optimized form of the software program, and comparing the first performance profile with the performance objectives; and
(b1) determining a second performance profile for the second optimized form of the software program, and comparing the second performance profile with the performance objectives.
3. The method of claim 2, wherein steps (b), (b1), and (c) are not performed if the performance objectives are met after completing step (a), and step (c) is not performed if the performance objectives are met after completing step (b).
US09/765,916 2000-07-03 2001-01-18 System and method for software code optimization Abandoned US20020066088A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/765,916 US20020066088A1 (en) 2000-07-03 2001-01-18 System and method for software code optimization

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US21674600P 2000-07-03 2000-07-03
US09/765,916 US20020066088A1 (en) 2000-07-03 2001-01-18 System and method for software code optimization

Publications (1)

Publication Number Publication Date
US20020066088A1 true US20020066088A1 (en) 2002-05-30

Family

ID=22808341

Family Applications (3)

Application Number Title Priority Date Filing Date
US09/765,916 Abandoned US20020066088A1 (en) 2000-07-03 2001-01-18 System and method for software code optimization
US09/765,917 Expired - Fee Related US7100124B2 (en) 2000-07-03 2001-01-18 Interface configurable for use with target/initiator signals
US11/408,858 Expired - Lifetime US7594205B2 (en) 2000-07-03 2006-04-21 Interface configurable for use with target/initiator signals

Family Applications After (2)

Application Number Title Priority Date Filing Date
US09/765,917 Expired - Fee Related US7100124B2 (en) 2000-07-03 2001-01-18 Interface configurable for use with target/initiator signals
US11/408,858 Expired - Lifetime US7594205B2 (en) 2000-07-03 2006-04-21 Interface configurable for use with target/initiator signals

Country Status (3)

Country Link
US (3) US20020066088A1 (en)
EP (1) EP1299826A1 (en)
WO (1) WO2002005144A1 (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030204386A1 (en) * 2002-04-24 2003-10-30 Glenn Colon-Bonet Class-based system for circuit modeling
US20040006761A1 (en) * 2002-07-05 2004-01-08 Anand Minakshisundaran B. System and method for automating generation of an automated sensor network
US20050289519A1 (en) * 2004-06-24 2005-12-29 Apple Computer, Inc. Fast approximation functions for image processing filters
EP1648113A2 (en) * 2004-10-14 2006-04-19 Agilent Technologies, Inc. - a Delaware corporation - Probe apparatus and method therefor
US20060143601A1 (en) * 2004-12-28 2006-06-29 International Business Machines Corporation Runtime optimizing applications for a target system from within a deployment server
US20070061784A1 (en) * 2005-09-09 2007-03-15 Sun Microsystems, Inc. Automatic code tuning
US20070061785A1 (en) * 2005-09-09 2007-03-15 Sun Microsystems, Inc. Web-based code tuning service
US20070074008A1 (en) * 2005-09-28 2007-03-29 Donofrio David D Mixed mode floating-point pipeline with extended functions
US20070234147A1 (en) * 2006-01-11 2007-10-04 Tsuyoshi Nakamura Circuit analysis device
US20090037824A1 (en) * 2007-07-30 2009-02-05 Oracle International Corporation Simplifying determination of whether application specific parameters are setup for optimal performance of associated applications
US20090158263A1 (en) * 2007-12-13 2009-06-18 Alcatel-Lucent Device and method for automatically optimizing composite applications having orchestrated activities
US20090293051A1 (en) * 2008-05-22 2009-11-26 Fortinet, Inc., A Delaware Corporation Monitoring and dynamic tuning of target system performance
US20110231813A1 (en) * 2010-03-19 2011-09-22 Seo Sun Ae Apparatus and method for on-demand optimization of applications
US20130290936A1 (en) * 2012-04-30 2013-10-31 Nec Laboratories America, Inc. Method and System for Correlated Tracing with Automated Multi-Layer Function Instrumentation Localization
US20130346953A1 (en) * 2012-06-22 2013-12-26 Altera Corporation Opencl compilation
US8689194B1 (en) * 2007-08-20 2014-04-01 The Mathworks, Inc. Optimization identification
US9329846B1 (en) * 2009-11-25 2016-05-03 Parakinetics Inc. Cooperative program code transformation
US20220121677A1 (en) * 2019-06-25 2022-04-21 Sisense Sf, Inc. Method for automated query language expansion and indexing
US11954113B2 (en) * 2021-12-23 2024-04-09 Sisense Sf, Inc. Method for automated query language expansion and indexing

Families Citing this family (51)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020066088A1 (en) * 2000-07-03 2002-05-30 Cadence Design Systems, Inc. System and method for software code optimization
US6724220B1 (en) 2000-10-26 2004-04-20 Cyress Semiconductor Corporation Programmable microcontroller architecture (mixed analog/digital)
US8160864B1 (en) 2000-10-26 2012-04-17 Cypress Semiconductor Corporation In-circuit emulator and pod synchronized boot
US8176296B2 (en) 2000-10-26 2012-05-08 Cypress Semiconductor Corporation Programmable microcontroller architecture
US8149048B1 (en) 2000-10-26 2012-04-03 Cypress Semiconductor Corporation Apparatus and method for programmable power management in a programmable analog circuit block
US7765095B1 (en) 2000-10-26 2010-07-27 Cypress Semiconductor Corporation Conditional branching in an in-circuit emulation system
US8103496B1 (en) 2000-10-26 2012-01-24 Cypress Semicondutor Corporation Breakpoint control in an in-circuit emulation system
US6605962B2 (en) * 2001-05-06 2003-08-12 Altera Corporation PLD architecture for flexible placement of IP function blocks
US7406674B1 (en) 2001-10-24 2008-07-29 Cypress Semiconductor Corporation Method and apparatus for generating microcontroller configuration information
US8078970B1 (en) 2001-11-09 2011-12-13 Cypress Semiconductor Corporation Graphical user interface with user-selectable list-box
US8042093B1 (en) 2001-11-15 2011-10-18 Cypress Semiconductor Corporation System providing automatic source code generation for personalization and parameterization of user modules
US7770113B1 (en) 2001-11-19 2010-08-03 Cypress Semiconductor Corporation System and method for dynamically generating a configuration datasheet
US7774190B1 (en) 2001-11-19 2010-08-10 Cypress Semiconductor Corporation Sleep and stall in an in-circuit emulation system
US7844437B1 (en) * 2001-11-19 2010-11-30 Cypress Semiconductor Corporation System and method for performing next placements and pruning of disallowed placements for programming an integrated circuit
US8069405B1 (en) 2001-11-19 2011-11-29 Cypress Semiconductor Corporation User interface for efficiently browsing an electronic document using data-driven tabs
US6971004B1 (en) 2001-11-19 2005-11-29 Cypress Semiconductor Corp. System and method of dynamically reconfiguring a programmable integrated circuit
US7577726B1 (en) * 2002-02-07 2009-08-18 Cisco Technology, Inc. Method for updating a hardware configuration of a networked communications device
US8103497B1 (en) 2002-03-28 2012-01-24 Cypress Semiconductor Corporation External interface for event architecture
US7308608B1 (en) 2002-05-01 2007-12-11 Cypress Semiconductor Corporation Reconfigurable testing system and method
JP2006525585A (en) 2003-05-07 2006-11-09 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Processing system and method for transmitting data
US7584441B2 (en) * 2003-09-19 2009-09-01 Cadence Design Systems, Inc. Method for generating optimized constraint systems for retimable digital designs
US7406509B2 (en) * 2004-01-07 2008-07-29 Network Appliance, Inc. Dynamic switching of a communication port in a storage system between target and initiator modes
KR101034494B1 (en) * 2004-02-11 2011-05-17 삼성전자주식회사 Bus system based on open core protocol
US7295049B1 (en) 2004-03-25 2007-11-13 Cypress Semiconductor Corporation Method and circuit for rapid alignment of signals
US8069436B2 (en) 2004-08-13 2011-11-29 Cypress Semiconductor Corporation Providing hardware independence to automate code generation of processing device firmware
US8286125B2 (en) 2004-08-13 2012-10-09 Cypress Semiconductor Corporation Model for a hardware device-independent method of defining embedded firmware for programmable systems
JP2008512754A (en) * 2004-09-10 2008-04-24 フリースケール セミコンダクター インコーポレイテッド Apparatus and method for multiple endian mode bus matching
US7332976B1 (en) 2005-02-04 2008-02-19 Cypress Semiconductor Corporation Poly-phase frequency synthesis oscillator
US7400183B1 (en) 2005-05-05 2008-07-15 Cypress Semiconductor Corporation Voltage controlled oscillator delay cell and method
US8089461B2 (en) 2005-06-23 2012-01-03 Cypress Semiconductor Corporation Touch wake for electronic devices
US7689736B2 (en) * 2005-11-07 2010-03-30 Dot Hill Systems Corporation Method and apparatus for a storage controller to dynamically determine the usage of onboard I/O ports
US8085067B1 (en) 2005-12-21 2011-12-27 Cypress Semiconductor Corporation Differential-to-single ended signal converter circuit and method
US8067948B2 (en) 2006-03-27 2011-11-29 Cypress Semiconductor Corporation Input/output multiplexer bus
JP5159161B2 (en) * 2006-06-26 2013-03-06 キヤノン株式会社 Radiation imaging apparatus, radiation imaging system and control method thereof
CN100426275C (en) * 2006-11-21 2008-10-15 北京中星微电子有限公司 Bus interface devices and method
GB0706134D0 (en) * 2007-03-29 2007-05-09 Nokia Oyj A modular device component
US8092083B2 (en) 2007-04-17 2012-01-10 Cypress Semiconductor Corporation Temperature sensor with digital bandgap
US8130025B2 (en) 2007-04-17 2012-03-06 Cypress Semiconductor Corporation Numerical band gap
US7737724B2 (en) 2007-04-17 2010-06-15 Cypress Semiconductor Corporation Universal digital block interconnection and channel routing
US8516025B2 (en) * 2007-04-17 2013-08-20 Cypress Semiconductor Corporation Clock driven dynamic datapath chaining
US9564902B2 (en) 2007-04-17 2017-02-07 Cypress Semiconductor Corporation Dynamically configurable and re-configurable data path
US8026739B2 (en) 2007-04-17 2011-09-27 Cypress Semiconductor Corporation System level interconnect with programmable switching
US9720805B1 (en) 2007-04-25 2017-08-01 Cypress Semiconductor Corporation System and method for controlling a target device
US8266575B1 (en) 2007-04-25 2012-09-11 Cypress Semiconductor Corporation Systems and methods for dynamically reconfiguring a programmable system on a chip
US8065653B1 (en) 2007-04-25 2011-11-22 Cypress Semiconductor Corporation Configuration of programmable IC design elements
US8049569B1 (en) 2007-09-05 2011-11-01 Cypress Semiconductor Corporation Circuit and method for improving the accuracy of a crystal-less oscillator having dual-frequency modes
US9448964B2 (en) 2009-05-04 2016-09-20 Cypress Semiconductor Corporation Autonomous control in a programmable system
US8146027B1 (en) * 2009-05-07 2012-03-27 Xilinx, Inc. Creating interfaces for importation of modules into a circuit design
US8661390B2 (en) * 2012-02-13 2014-02-25 Chihliang (Eric) Cheng Method of extracting block binders and an application in block placement for an integrated circuit
US9727679B2 (en) 2014-12-20 2017-08-08 Intel Corporation System on chip configuration metadata
US10437946B1 (en) * 2016-09-01 2019-10-08 Xilinx, Inc. Using implemented core sources for simulation

Citations (57)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5450588A (en) * 1990-02-14 1995-09-12 International Business Machines Corporation Reducing pipeline delays in compilers by code hoisting
US5517611A (en) * 1993-06-04 1996-05-14 Sun Microsystems, Inc. Floating-point processor for a high performance three dimensional graphics accelerator
US5517432A (en) * 1994-01-31 1996-05-14 Sony Corporation Of Japan Finite state machine transition analyzer
US5524244A (en) * 1988-07-11 1996-06-04 Logic Devices, Inc. System for dividing processing tasks into signal processor and decision-making microprocessor interfacing therewith
US5539652A (en) * 1995-02-07 1996-07-23 Hewlett-Packard Company Method for manufacturing test simulation in electronic circuit design
US5548761A (en) * 1993-03-09 1996-08-20 International Business Machines Corporation Compiler for target machine independent optimization of data movement, ownership transfer and device control
US5557779A (en) * 1991-06-10 1996-09-17 Kabushiki Kaisha Toshiba Method for distributing a clock signal within a semiconductor integrated circuit by minimizing clock skew
US5577213A (en) * 1994-06-03 1996-11-19 At&T Global Information Solutions Company Multi-device adapter card for computer
US5581669A (en) * 1992-12-18 1996-12-03 Microsoft Corporation System and method for peripheral data transfer
US5596587A (en) * 1993-03-29 1997-01-21 Teradyne, Inc. Method and apparatus for preparing in-circuit test vectors
US5644754A (en) * 1993-11-22 1997-07-01 Siemens Aktiengesellschaft Bus controller and electronic device in a system in which several electronic devices are networked
US5651111A (en) * 1994-06-07 1997-07-22 Digital Equipment Corporation Method and apparatus for producing a software test system using complementary code to resolve external dependencies
US5732209A (en) * 1995-11-29 1998-03-24 Exponential Technology, Inc. Self-testing multi-processor die with internal compare points
US5737234A (en) * 1991-10-30 1998-04-07 Xilinx Inc Method of optimizing resource allocation starting from a high level block diagram
US5761078A (en) * 1996-03-21 1998-06-02 International Business Machines Corporation Field programmable gate arrays using semi-hard multicell macros
US5764498A (en) * 1997-06-25 1998-06-09 Honeywell Inc. Electronics assembly formed with a slotted coupling device that absorbs mechanical forces, such as vibration and mechanical shock
US5774371A (en) * 1994-08-03 1998-06-30 Matsushita Electric Industrial Co., Ltd. Semiconductor integrated circuit and layout designing method for the same
US5784291A (en) * 1994-12-22 1998-07-21 Texas Instruments, Incorporated CPU, memory controller, bus bridge integrated circuits, layout structures, system and methods
US5797013A (en) * 1995-11-29 1998-08-18 Hewlett-Packard Company Intelligent loop unrolling
US5812854A (en) * 1996-03-18 1998-09-22 International Business Machines Corporation Mechanism for integrating user-defined instructions with compiler-generated instructions and for optimizing the integrated instruction stream
US5946488A (en) * 1997-05-16 1999-08-31 Thnkage Ltd. Method for selectively and incrementally displaying the results of preprocessing
US5960186A (en) * 1995-06-08 1999-09-28 Arm Limited Digital circuit simulation with data interface scheduling
US5966537A (en) * 1997-05-28 1999-10-12 Sun Microsystems, Inc. Method and apparatus for dynamically optimizing an executable computer program using input data
US5983303A (en) * 1997-05-27 1999-11-09 Fusion Micromedia Corporation Bus arrangements for interconnection of discrete and/or integrated modules in a digital system and associated method
US6034542A (en) * 1997-10-14 2000-03-07 Xilinx, Inc. Bus structure for modularized chip with FPGA modules
US6067644A (en) * 1998-04-15 2000-05-23 International Business Machines Corporation System and method monitoring instruction progress within a processor
US6102961A (en) * 1998-05-29 2000-08-15 Cadence Design Systems, Inc. Method and apparatus for selecting IP Blocks
US6122690A (en) * 1997-06-05 2000-09-19 Mentor Graphics Corporation On-chip bus architecture that is both processor independent and scalable
US6134606A (en) * 1997-07-25 2000-10-17 Flashpoint Technology, Inc. System/method for controlling parameters in hand-held digital camera with selectable parameter scripts, and with command for retrieving camera capabilities and associated permissible parameter values
US6148432A (en) * 1997-11-17 2000-11-14 Micron Technology, Inc. Inserting buffers between modules to limit changes to inter-module signals during ASIC design and synthesis
US6154873A (en) * 1997-06-05 2000-11-28 Nec Corporation Layout designing method and layout designing apparatus
US6164841A (en) * 1998-05-04 2000-12-26 Hewlett-Packard Company Method, apparatus, and product for dynamic software code translation system
US6230317B1 (en) * 1997-07-11 2001-05-08 Intel Corporation Method and apparatus for software pipelining of nested loops
US6237128B1 (en) * 1997-10-01 2001-05-22 International Business Machines Corporation Method and apparatus for enabling parallel layout checking of designing VLSI-chips
US6247174B1 (en) * 1998-01-02 2001-06-12 Hewlett-Packard Company Optimization of source code with embedded machine instructions
US6260175B1 (en) * 1997-03-07 2001-07-10 Lsi Logic Corporation Method for designing an integrated circuit using predefined and preverified core modules having prebalanced clock trees
US6269467B1 (en) * 1998-09-30 2001-07-31 Cadence Design Systems, Inc. Block based design methodology
US6305001B1 (en) * 1998-06-18 2001-10-16 Lsi Logic Corporation Clock distribution network planning and method therefor
US6311313B1 (en) * 1998-12-29 2001-10-30 International Business Machines Corporation X-Y grid tree clock distribution network with tunable tree and grid networks
US6327696B1 (en) * 1998-05-05 2001-12-04 Lsi Logic Corporation Method and apparatus for zero skew routing from a fixed H trunk
US6345384B1 (en) * 1998-04-22 2002-02-05 Kabushiki Kaisha Toshiba Optimized program code generator, a method for compiling a source text and a computer-readable medium for a processor capable of operating with a plurality of instruction sets
US6347395B1 (en) * 1998-12-18 2002-02-12 Koninklijke Philips Electronics N.V. (Kpenv) Method and arrangement for rapid silicon prototyping
US6367051B1 (en) * 1998-06-12 2002-04-02 Monterey Design Systems, Inc. System and method for concurrent buffer insertion and placement of logic gates
US6367060B1 (en) * 1999-06-18 2002-04-02 C. K. Cheng Method and apparatus for clock tree solution synthesis based on design constraints
US20020100029A1 (en) * 2000-07-20 2002-07-25 Matt Bowen System, method and article of manufacture for compiling and invoking C functions in hardware
US6477691B1 (en) * 2000-04-03 2002-11-05 International Business Machines Corporation Methods and arrangements for automatic synthesis of systems-on-chip
US20030005419A1 (en) * 1999-10-12 2003-01-02 John Samuel Pieper Insertion of prefetch instructions into computer program code
US6622300B1 (en) * 1999-04-21 2003-09-16 Hewlett-Packard Development Company, L.P. Dynamic optimization of computer programs using code-rewriting kernal module
US6643630B1 (en) * 2000-04-13 2003-11-04 Koninklijke Philips Electronics N.V. Apparatus and method for annotating an intermediate representation of an application source code
US6654952B1 (en) * 2000-02-03 2003-11-25 Sun Microsystems, Inc. Region based optimizations using data dependence graphs
US6701474B2 (en) * 2000-06-28 2004-03-02 Cadence Design Systems, Inc. System and method for testing integrated circuits
US20040068707A1 (en) * 2002-10-03 2004-04-08 International Business Machines Corporation System on a chip bus with automatic pipeline stage insertion for timing closure
US6738967B1 (en) * 2000-03-14 2004-05-18 Microsoft Corporation Compiling for multiple virtual machines targeting different processor architectures
US6751723B1 (en) * 2000-09-02 2004-06-15 Actel Corporation Field programmable gate array and microcontroller system-on-a-chip
US6845489B1 (en) * 1999-04-30 2005-01-18 Matsushita Electric Industrial Co., Ltd. Database for design of integrated circuit device and method for designing integrated circuit device
US7072817B1 (en) * 1999-10-01 2006-07-04 Stmicroelectronics Ltd. Method of designing an initiator in an integrated circuit
US7100124B2 (en) * 2000-07-03 2006-08-29 Cadence Design Systems, Inc. Interface configurable for use with target/initiator signals

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4872169A (en) 1987-03-06 1989-10-03 Texas Instruments Incorporated Hierarchical scan selection
US5838583A (en) * 1996-04-12 1998-11-17 Cadence Design Systems, Inc. Optimized placement and routing of datapaths
US6067409A (en) 1996-06-28 2000-05-23 Lsi Logic Corporation Advanced modular cell placement system
US6286128B1 (en) 1998-02-11 2001-09-04 Monterey Design Systems, Inc. Method for design optimization using logical and physical information
US6311302B1 (en) 1999-04-01 2001-10-30 Philips Semiconductor, Inc. Method and arrangement for hierarchical control of multiple test access port control modules
US6470486B1 (en) * 1999-05-26 2002-10-22 Get2Chip Method for delay-optimizing technology mapping of digital logic

Patent Citations (57)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5524244A (en) * 1988-07-11 1996-06-04 Logic Devices, Inc. System for dividing processing tasks into signal processor and decision-making microprocessor interfacing therewith
US5450588A (en) * 1990-02-14 1995-09-12 International Business Machines Corporation Reducing pipeline delays in compilers by code hoisting
US5557779A (en) * 1991-06-10 1996-09-17 Kabushiki Kaisha Toshiba Method for distributing a clock signal within a semiconductor integrated circuit by minimizing clock skew
US5737234A (en) * 1991-10-30 1998-04-07 Xilinx Inc Method of optimizing resource allocation starting from a high level block diagram
US5581669A (en) * 1992-12-18 1996-12-03 Microsoft Corporation System and method for peripheral data transfer
US5548761A (en) * 1993-03-09 1996-08-20 International Business Machines Corporation Compiler for target machine independent optimization of data movement, ownership transfer and device control
US5596587A (en) * 1993-03-29 1997-01-21 Teradyne, Inc. Method and apparatus for preparing in-circuit test vectors
US5517611A (en) * 1993-06-04 1996-05-14 Sun Microsystems, Inc. Floating-point processor for a high performance three dimensional graphics accelerator
US5644754A (en) * 1993-11-22 1997-07-01 Siemens Aktiengesellschaft Bus controller and electronic device in a system in which several electronic devices are networked
US5517432A (en) * 1994-01-31 1996-05-14 Sony Corporation Of Japan Finite state machine transition analyzer
US5577213A (en) * 1994-06-03 1996-11-19 At&T Global Information Solutions Company Multi-device adapter card for computer
US5651111A (en) * 1994-06-07 1997-07-22 Digital Equipment Corporation Method and apparatus for producing a software test system using complementary code to resolve external dependencies
US5774371A (en) * 1994-08-03 1998-06-30 Matsushita Electric Industrial Co., Ltd. Semiconductor integrated circuit and layout designing method for the same
US5784291A (en) * 1994-12-22 1998-07-21 Texas Instruments, Incorporated CPU, memory controller, bus bridge integrated circuits, layout structures, system and methods
US5539652A (en) * 1995-02-07 1996-07-23 Hewlett-Packard Company Method for manufacturing test simulation in electronic circuit design
US5960186A (en) * 1995-06-08 1999-09-28 Arm Limited Digital circuit simulation with data interface scheduling
US5797013A (en) * 1995-11-29 1998-08-18 Hewlett-Packard Company Intelligent loop unrolling
US5732209A (en) * 1995-11-29 1998-03-24 Exponential Technology, Inc. Self-testing multi-processor die with internal compare points
US5812854A (en) * 1996-03-18 1998-09-22 International Business Machines Corporation Mechanism for integrating user-defined instructions with compiler-generated instructions and for optimizing the integrated instruction stream
US5761078A (en) * 1996-03-21 1998-06-02 International Business Machines Corporation Field programmable gate arrays using semi-hard multicell macros
US6260175B1 (en) * 1997-03-07 2001-07-10 Lsi Logic Corporation Method for designing an integrated circuit using predefined and preverified core modules having prebalanced clock trees
US5946488A (en) * 1997-05-16 1999-08-31 Thnkage Ltd. Method for selectively and incrementally displaying the results of preprocessing
US5983303A (en) * 1997-05-27 1999-11-09 Fusion Micromedia Corporation Bus arrangements for interconnection of discrete and/or integrated modules in a digital system and associated method
US5966537A (en) * 1997-05-28 1999-10-12 Sun Microsystems, Inc. Method and apparatus for dynamically optimizing an executable computer program using input data
US6154873A (en) * 1997-06-05 2000-11-28 Nec Corporation Layout designing method and layout designing apparatus
US6122690A (en) * 1997-06-05 2000-09-19 Mentor Graphics Corporation On-chip bus architecture that is both processor independent and scalable
US5764498A (en) * 1997-06-25 1998-06-09 Honeywell Inc. Electronics assembly formed with a slotted coupling device that absorbs mechanical forces, such as vibration and mechanical shock
US6230317B1 (en) * 1997-07-11 2001-05-08 Intel Corporation Method and apparatus for software pipelining of nested loops
US6134606A (en) * 1997-07-25 2000-10-17 Flashpoint Technology, Inc. System/method for controlling parameters in hand-held digital camera with selectable parameter scripts, and with command for retrieving camera capabilities and associated permissible parameter values
US6237128B1 (en) * 1997-10-01 2001-05-22 International Business Machines Corporation Method and apparatus for enabling parallel layout checking of designing VLSI-chips
US6034542A (en) * 1997-10-14 2000-03-07 Xilinx, Inc. Bus structure for modularized chip with FPGA modules
US6148432A (en) * 1997-11-17 2000-11-14 Micron Technology, Inc. Inserting buffers between modules to limit changes to inter-module signals during ASIC design and synthesis
US6247174B1 (en) * 1998-01-02 2001-06-12 Hewlett-Packard Company Optimization of source code with embedded machine instructions
US6067644A (en) * 1998-04-15 2000-05-23 International Business Machines Corporation System and method monitoring instruction progress within a processor
US6345384B1 (en) * 1998-04-22 2002-02-05 Kabushiki Kaisha Toshiba Optimized program code generator, a method for compiling a source text and a computer-readable medium for a processor capable of operating with a plurality of instruction sets
US6164841A (en) * 1998-05-04 2000-12-26 Hewlett-Packard Company Method, apparatus, and product for dynamic software code translation system
US6327696B1 (en) * 1998-05-05 2001-12-04 Lsi Logic Corporation Method and apparatus for zero skew routing from a fixed H trunk
US6102961A (en) * 1998-05-29 2000-08-15 Cadence Design Systems, Inc. Method and apparatus for selecting IP Blocks
US6367051B1 (en) * 1998-06-12 2002-04-02 Monterey Design Systems, Inc. System and method for concurrent buffer insertion and placement of logic gates
US6305001B1 (en) * 1998-06-18 2001-10-16 Lsi Logic Corporation Clock distribution network planning and method therefor
US6269467B1 (en) * 1998-09-30 2001-07-31 Cadence Design Systems, Inc. Block based design methodology
US6347395B1 (en) * 1998-12-18 2002-02-12 Koninklijke Philips Electronics N.V. (Kpenv) Method and arrangement for rapid silicon prototyping
US6311313B1 (en) * 1998-12-29 2001-10-30 International Business Machines Corporation X-Y grid tree clock distribution network with tunable tree and grid networks
US6622300B1 (en) * 1999-04-21 2003-09-16 Hewlett-Packard Development Company, L.P. Dynamic optimization of computer programs using code-rewriting kernal module
US6845489B1 (en) * 1999-04-30 2005-01-18 Matsushita Electric Industrial Co., Ltd. Database for design of integrated circuit device and method for designing integrated circuit device
US6367060B1 (en) * 1999-06-18 2002-04-02 C. K. Cheng Method and apparatus for clock tree solution synthesis based on design constraints
US7072817B1 (en) * 1999-10-01 2006-07-04 Stmicroelectronics Ltd. Method of designing an initiator in an integrated circuit
US20030005419A1 (en) * 1999-10-12 2003-01-02 John Samuel Pieper Insertion of prefetch instructions into computer program code
US6654952B1 (en) * 2000-02-03 2003-11-25 Sun Microsystems, Inc. Region based optimizations using data dependence graphs
US6738967B1 (en) * 2000-03-14 2004-05-18 Microsoft Corporation Compiling for multiple virtual machines targeting different processor architectures
US6477691B1 (en) * 2000-04-03 2002-11-05 International Business Machines Corporation Methods and arrangements for automatic synthesis of systems-on-chip
US6643630B1 (en) * 2000-04-13 2003-11-04 Koninklijke Philips Electronics N.V. Apparatus and method for annotating an intermediate representation of an application source code
US6701474B2 (en) * 2000-06-28 2004-03-02 Cadence Design Systems, Inc. System and method for testing integrated circuits
US7100124B2 (en) * 2000-07-03 2006-08-29 Cadence Design Systems, Inc. Interface configurable for use with target/initiator signals
US20020100029A1 (en) * 2000-07-20 2002-07-25 Matt Bowen System, method and article of manufacture for compiling and invoking C functions in hardware
US6751723B1 (en) * 2000-09-02 2004-06-15 Actel Corporation Field programmable gate array and microcontroller system-on-a-chip
US20040068707A1 (en) * 2002-10-03 2004-04-08 International Business Machines Corporation System on a chip bus with automatic pipeline stage insertion for timing closure

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030204386A1 (en) * 2002-04-24 2003-10-30 Glenn Colon-Bonet Class-based system for circuit modeling
US20040006761A1 (en) * 2002-07-05 2004-01-08 Anand Minakshisundaran B. System and method for automating generation of an automated sensor network
US8271937B2 (en) 2002-07-05 2012-09-18 Cooper Technologies Company System and method for automating generation of an automated sensor network
US7346891B2 (en) * 2002-07-05 2008-03-18 Eka Systems Inc. System and method for automating generation of an automated sensor network
US20050289519A1 (en) * 2004-06-24 2005-12-29 Apple Computer, Inc. Fast approximation functions for image processing filters
EP1648113A2 (en) * 2004-10-14 2006-04-19 Agilent Technologies, Inc. - a Delaware corporation - Probe apparatus and method therefor
US20060083179A1 (en) * 2004-10-14 2006-04-20 Kevin Mitchell Probe apparatus and metod therefor
EP1648113A3 (en) * 2004-10-14 2008-06-04 Agilent Technologies, Inc. Probe apparatus and method therefor
US9535679B2 (en) * 2004-12-28 2017-01-03 International Business Machines Corporation Dynamically optimizing applications within a deployment server
US20060143601A1 (en) * 2004-12-28 2006-06-29 International Business Machines Corporation Runtime optimizing applications for a target system from within a deployment server
US7895585B2 (en) * 2005-09-09 2011-02-22 Oracle America, Inc. Automatic code tuning
US20070061785A1 (en) * 2005-09-09 2007-03-15 Sun Microsystems, Inc. Web-based code tuning service
US20070061784A1 (en) * 2005-09-09 2007-03-15 Sun Microsystems, Inc. Automatic code tuning
US20070074008A1 (en) * 2005-09-28 2007-03-29 Donofrio David D Mixed mode floating-point pipeline with extended functions
US20070234147A1 (en) * 2006-01-11 2007-10-04 Tsuyoshi Nakamura Circuit analysis device
US7624362B2 (en) * 2006-01-11 2009-11-24 Panasonic Corporation Circuit analysis device using processor information
US20090037824A1 (en) * 2007-07-30 2009-02-05 Oracle International Corporation Simplifying determination of whether application specific parameters are setup for optimal performance of associated applications
US8572593B2 (en) * 2007-07-30 2013-10-29 Oracle International Corporation Simplifying determination of whether application specific parameters are setup for optimal performance of associated applications
US8689194B1 (en) * 2007-08-20 2014-04-01 The Mathworks, Inc. Optimization identification
US9934004B1 (en) 2007-08-20 2018-04-03 The Mathworks, Inc. Optimization identification
US20090158263A1 (en) * 2007-12-13 2009-06-18 Alcatel-Lucent Device and method for automatically optimizing composite applications having orchestrated activities
US8601454B2 (en) * 2007-12-13 2013-12-03 Alcatel Lucent Device and method for automatically optimizing composite applications having orchestrated activities
US20090293051A1 (en) * 2008-05-22 2009-11-26 Fortinet, Inc., A Delaware Corporation Monitoring and dynamic tuning of target system performance
US9329846B1 (en) * 2009-11-25 2016-05-03 Parakinetics Inc. Cooperative program code transformation
US9383978B2 (en) * 2010-03-19 2016-07-05 Samsung Electronics Co., Ltd. Apparatus and method for on-demand optimization of applications
US20110231813A1 (en) * 2010-03-19 2011-09-22 Seo Sun Ae Apparatus and method for on-demand optimization of applications
US9092568B2 (en) * 2012-04-30 2015-07-28 Nec Laboratories America, Inc. Method and system for correlated tracing with automated multi-layer function instrumentation localization
US20130290936A1 (en) * 2012-04-30 2013-10-31 Nec Laboratories America, Inc. Method and System for Correlated Tracing with Automated Multi-Layer Function Instrumentation Localization
US9134981B2 (en) * 2012-06-22 2015-09-15 Altera Corporation OpenCL compilation
US20130346953A1 (en) * 2012-06-22 2013-12-26 Altera Corporation Opencl compilation
US20220121677A1 (en) * 2019-06-25 2022-04-21 Sisense Sf, Inc. Method for automated query language expansion and indexing
US11954113B2 (en) * 2021-12-23 2024-04-09 Sisense Sf, Inc. Method for automated query language expansion and indexing

Also Published As

Publication number Publication date
WO2002005144A1 (en) 2002-01-17
US7100124B2 (en) 2006-08-29
US20020016706A1 (en) 2002-02-07
US7594205B2 (en) 2009-09-22
US20060230369A1 (en) 2006-10-12
EP1299826A1 (en) 2003-04-09

Similar Documents

Publication Publication Date Title
US20020066088A1 (en) System and method for software code optimization
Raihan et al. Modeling deep learning accelerator enabled gpus
Lee et al. Mediabench: A tool for evaluating and synthesizing multimedia and communications systems
US6113650A (en) Compiler for optimization in generating instruction sequence and compiling method
US5819064A (en) Hardware extraction technique for programmable reduced instruction set computers
JP4745341B2 (en) System, method and apparatus for dependency chain processing
Menard et al. Automatic floating-point to fixed-point conversion for DSP code generation
US6035123A (en) Determining hardware complexity of software operations
US8387026B1 (en) Compile-time feedback-directed optimizations using estimated edge profiles from hardware-event sampling
US9569188B1 (en) Compiling source code to reduce run-time execution of vector element reverse operations
Gustafsson et al. Towards a flow analysis for embedded system C programs
Brunie et al. Code generators for mathematical functions
Hong et al. An integrated environment for rapid prototyping of DSP Algorithms using matlab and Texas instruments’ TMS320C30
Sandberg et al. Faster WCET flow analysis by program slicing
US6256776B1 (en) Digital signal processing code development with fixed point and floating point libraries
Aamodt et al. Compile-time and instruction-set methods for improving floating-to fixed-point conversion accuracy
Liveris et al. A code transformation-based methodology for improving I-cache performance of DSP applications
US20220091831A1 (en) A streaming compiler for automatic adjoint differentiation
Spadini et al. Characterization of repeating dynamic code fragments
Bloch et al. Performance estimation of high-level dataflow program on heterogeneous platforms
Aamodt Floating-point to fixed-point compilation and embedded architectural support
Varnagirytė et al. A practical approach to DSP code optimization using compiler/architecture
Miomandre et al. Approximate buffers for reducing memory requirements in the ska
Miomandre et al. Approximate buffers for reducing memory requirements: Case study on SKA
JPH02176938A (en) Machine language instruction optimizing system

Legal Events

Date Code Title Description
AS Assignment

Owner name: CADENCE DESIGN SYSTEMS, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CANUT, FREDERIC;DERRAS, MUSTAPHA;REEL/FRAME:012233/0459

Effective date: 20010907

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION