US6029004A - Method and apparatus for modular reordering of portions of a computer program based on profile data

Publication number
US6029004A
Authority
US
United States
Prior art keywords
call graph
profile data
modules
computer program
reordering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
US08/819,526
Inventor
Vita Bortnikov
Bilha Mendelson
Mark Novick
Robert Ralph Roediger
William Jon Schmidt
Inbal Shavit-Lottem
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US08/819,526
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION (assignment of assignors' interest; see document for details). Assignors: NOVICK, MARK; SHAVIT-LOTTEM, INBAL; BORTNIKOV, VITA; MENDELSON, BILHA; ROEDIGER, ROBERT RALPH; SCHMIDT, WILLIAM JON
Application granted
Publication of US6029004A
Anticipated expiration
Expired - Fee Related

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/40: Transformation of program code
    • G06F8/41: Compilation
    • G06F8/44: Encoding
    • G06F8/445: Exploiting fine grain parallelism, i.e. parallelism at instruction level

Definitions

  • compilers and linkers take the human-readable form of a computer program, known as "source code", and convert it into "machine code" or "object code" instructions that may be executed by a computer system. Because a compiler and its associated linker generate the stream of machine code instructions that are eventually executed on a computer system, the manner in which the compiler and linker package procedures within the computer program affects the performance of the computer program.
  • auxiliary storage such as a hard disk drive
  • Main memory is typically comprised of Random Access Memory (RAM), which has a much smaller storage capacity and is more expensive than auxiliary storage, yet it is very fast.
  • Instructions and data are typically moved between auxiliary storage and main memory in "pages."
  • a "page" consists of a predefined number of bytes (typically a power of two), and is the fundamental unit of transfer between auxiliary storage and main memory.
  • a predetermined number of pages are typically set aside in main memory for storing pages as they are moved between auxiliary storage and main memory.
  • When a processor within a computer system begins executing a computer program, the memory paging system fills the portion of main memory allocated to paging with pages from auxiliary storage. When the processor needs data that is not in any of the pages in main memory, the memory paging system selects one or more pages that are replaced by new pages from auxiliary storage. Swapping pages in and out of memory requires time and system resources, and therefore degrades system performance. In other words, the fewer the number of page swaps, the better.
  • In order to optimize the performance of modern computer programs, profilers have been developed to predict and/or measure the run-time behavior of a computer program. Profilers typically generate profile data that estimates how often different portions of the computer program are executed. Using profile data, a compiler, a linker, or a separate optimizer program may make decisions regarding the preferred order of procedures within the computer program in order to improve the performance of the computer system.
  • a module is a subset of a computer program that may be independently compiled (i.e., compiled separately from the rest of the modules).
  • a module generally has one or more procedures. While reordering all of the procedures in a computer program may produce a near-optimal arrangement of procedures, it has a negative effect on the maintainability of the computer program. For example, when a problem or "bug" is discovered in the computer program, the bug needs to be fixed with a minimum of disruption to the rest of the computer program.
  • when an enhancement is made to the computer program, there needs to be a way to easily add the enhancement.
  • a small portion of the computer program e.g., one or more modules
  • replacing one or two modules may be problematic if the procedures within the replaced module have been packaged among procedures in many other modules.
  • An apparatus and method reorder portions of a computer program in a way that achieves both enhanced performance and maintainability of the computer program.
  • a global call graph is initially constructed from profile data. From the information in the global call graph, an intramodular call graph is generated for each module. Reordering techniques are used to reorder the procedures in each module according to the profile data in each intramodular call graph. An intermodular call graph is generated from the information in the global call graph. Reordering techniques are used to reorder the modules in the computer program. By reordering procedures within modules, then reordering the modules, enhanced performance is achieved without reordering procedures across module boundaries. Respecting module boundaries enhances the maintainability of the computer program by allowing a module to be replaced without adversely affecting the other modules while still providing many of the advantages of global procedure reordering.
  • FIG. 1 is a block diagram of a computer apparatus in accordance with the present invention.
  • FIG. 2 is a block diagram of the modular reordering mechanism of FIG. 1;
  • FIG. 3 is a flow diagram of a method for modular reordering of a computer program in accordance with the present invention;
  • FIG. 4 is a global call graph for a sample computer program;
  • FIG. 5 shows intramodular call graphs for a sample computer program represented by the global call graph of FIG. 4; and
  • FIG. 6 is an intermodular call graph for a sample computer program represented by the global call graph of FIG. 4.
  • the present invention relates to optimization of a computer program using profile data.
  • the Overview section below provides general background information that will be helpful in understanding the concepts of the invention.
  • profiling mechanism that uses information collected about a program's run-time behavior (known as profile data) to improve optimization of that program.
  • profile data means any estimates of execution frequencies in a computer program, regardless of how the estimates are generated.
  • Profile data may be generated in a number of different ways. One way of generating profile data is to perform a static analysis of the program code to estimate the execution frequencies of procedures in the computer program. Other methods are known that dynamically collect information about a computer program as it runs.
  • sampling profiler uses a hardware timer to periodically wake up a process that records the address of the currently executing instruction.
  • a second type of dynamic profiler is known as a trace-based profiler, which collects an execution trace of all the instructions executed by the computer program.
  • An execution trace is a map that shows the addresses that were encountered during program execution. The profiler then reduces this information to a manageable size to determine how often each procedure in the computer program was called.
  • a third type of dynamic profiler is known as an instrumenting profiler.
  • An instrumenting profiler recompiles the computer program and inserts special instrumentation code known as "hooks" at important points in the computer program (such as procedure calls). As the instrumented program executes, these hooks cause data counters to be incremented, recording the procedure call history information directly as the computer program runs. The counters contain profile data that is then used to determine how often each procedure in the computer program was called.
  • Modern compilers typically group instructions into groups known as "basic blocks".
  • basic block is well known in the art, and represents a maximal sequence of straight-line code.
  • a procedure in a computer program typically includes many basic blocks.
  • a module, or compilation unit, in a computer program includes one or more procedures that are compiled at the same time.
  • the computer program is defined by a hierarchy of modules, procedures, and basic blocks. Once the computer program is defined by a particular hierarchy, certain profile-based optimizations may be performed to enhance the performance of the computer program.
  • procedure reordering which analyzes the most frequently executed paths among procedures in a computer program, and uses this information to reorder the procedures within the module or computer program of interest.
  • the primary purpose of this reordering is to improve memory paging performance.
  • Procedure A which has 500 bytes
  • Procedure B which has 1200 bytes
  • the page size is 1,000 bytes and that the memory paging system brings in two pages at a time and can store a total of ten pages.
  • A is aligned at the first byte of the module, and that other procedures have been executed prior to A (to assure the memory paging system has filled all of the pages).
  • procedure B may occupy two or three pages depending on where the page boundaries fall. For example, if B has its first 100 bytes on one page, its next 1000 bytes on a second page, and its last 100 bytes on a third page, it would span three pages. If procedure B spans three pages, the memory paging system would have to bring in four pages into the main memory, since it brings in two pages at a time.
  • the initial packing order of A and B results in the memory paging system performing a total of six page swaps to execute A and B. Note, however, that if B were packaged immediately after A, both A and B (total of 1,700 bytes) would be brought in with the first two page swaps to bring in A, reducing the number of page swaps from six to two.
  • This example illustrates how the packaging order of procedures in a computer program can affect performance of the computer system.
  • a call graph is a graph consisting of one node for each procedure in the program portion of interest.
  • a call graph can be "weighted" with estimates of execution frequencies. These estimates are often obtained using profile data from sample executions of the computer program, but other methods of estimating execution frequencies are possible. For example, some compilers try to provide rough estimates by static analysis of the computer program. Such estimates are generally not as accurate as those obtained by dynamic profiling, but can be generated without the overhead of dynamic profiling. Weights may be assigned to procedures or to the arcs between them in a call graph. A weight given to a procedure indicates how frequently that procedure is called, whereas a weight given to an arc A-B indicates how frequently procedure A calls procedure B.
  • the initial procedure is preferred to be a procedure that touches the last trace.
  • the method for reordering procedures according to Pettis et al. and according to the methods disclosed in the related application cited above are all viable ways of reordering procedures within a computer program.
  • Profile data may be generated in any of the methods discussed above, or may be generated by new methods in the future.
  • the present invention uses profile data in a call graph to reorder procedures in a portion of a computer program, without regard to how the profiling data was generated.
  • a computer system 100 in accordance with the present invention is an enhanced IBM AS/400 mid-range computer system.
  • Computer system 100 suitably comprises a processor 110, main memory 120, a memory controller 130, an auxiliary storage interface 140, and a terminal interface 150, all of which are interconnected via a system bus 160.
  • FIG. 1 is presented to simply illustrate some of the salient features of computer system 100.
  • Processor 110 performs computation and control functions of computer system 100, and comprises a suitable central processing unit.
  • Processor 110 may comprise a single integrated circuit, such as a microprocessor, or may comprise any suitable number of integrated circuit devices and/or circuit boards working in cooperation to accomplish the functions of a processor.
  • Processor 110 suitably executes a computer program 122 within main memory 120.
  • Auxiliary storage interface 140 is used to allow computer system 100 to store and retrieve information from auxiliary storage, such as magnetic disk (e.g., hard disks or floppy diskettes) or optical storage devices (e.g., CD-ROM).
  • One suitable storage device is a direct access storage device (DASD) 170.
  • DASD 170 may be a floppy disk drive which may read programs and data from a floppy disk 180.
  • a modular reordering mechanism in accordance with the present invention may exist as a program product on one or more floppy disks 180.
  • signal bearing media include: recordable type media such as floppy disks (e.g., disk 180) and CD ROMS, and transmission type media such as digital and analog communication links.
  • Memory controller 130, through use of a processor separate from processor 110, is responsible for moving requested information from main memory 120 and/or through auxiliary storage interface 140 to processor 110. Memory controller 130 performs the functions of the memory paging system, swapping pages from DASD 170 to main memory 120 as required. While for the purposes of explanation, memory controller 130 is shown as a separate entity, those skilled in the art understand that, in practice, portions of the function provided by memory controller 130 may actually reside in the circuitry associated with processor 110, main memory 120, and/or auxiliary storage interface 140.
  • Terminal interface 150 allows system administrators and computer programmers to communicate with computer system 100, normally through programmable workstations.
  • Although system 100 depicted in FIG. 1 contains only a single main processor 110 and a single system bus 160, it should be understood that the present invention applies equally to computer systems having multiple processors and multiple system buses.
  • Although system bus 160 of the preferred embodiment is a typical hardwired, multidrop bus, any connection means that supports bi-directional communication could be used.
  • Main memory 120 contains one or more computer programs 122, an operating system 124, a global call graph 126, and a modular reordering mechanism 128.
  • modular reordering mechanism 128 may be part of an optimizing compiler, may be part of a linker, or may be part of a separate post-pass processing program.
  • Computer program 122 in memory 120 is used in its broadest sense, and includes any and all forms of computer programs, including source code, intermediate (i.e., relocatable) code, machine code, and any other representation of a computer program that has procedures that could be reordered.
  • Global call graph 126 includes profile data that estimates execution frequencies of procedures within computer program 122.
  • Global call graph 126 preferably includes all procedures within a computer program 122.
  • Profile data within global call graph 126 may be generated using static analysis, dynamic profiling, or any other method for estimating execution frequencies for the procedures within computer program 122. Once generated, the profile data is stored in main memory 120 as part of the global call graph 126.
  • main memory 120 will not necessarily contain all parts of all mechanisms shown. For example, portions of computer program 122 and operating system 124 may be loaded into an instruction cache (not shown) for processor 110 to execute, while other files may well be stored on magnetic or optical disk storage devices (not shown).
  • the compiler may generate a machine code instruction stream from computer program 122 that is intended to be executed on a different computer system if the optimizing compiler is a cross-compiler. The methods and apparatus discussed herein apply to all types of compilers, including cross-compilers.
  • modular procedure reordering mechanism 128 includes a call graph partitioner 210, a procedure reordering mechanism 220, and a module reordering mechanism 230.
  • Call graph partitioner 210 includes an intramodular call graph generator 212 and an intermodular call graph generator 214.
  • Intramodular call graph generator 212 generates one or more intramodular call graphs 222, preferably one intramodular call graph 222 per module in the computer program 122.
  • Intermodular call graph generator 214 generates an intermodular call graph 234. The functions of these generators 212 and 214 are discussed in more detail below with reference to FIGS. 5 and 6.
  • Procedure reordering mechanism 220 takes profile data in an intramodular call graph 222 (produced by intramodular call graph generator 212) that corresponds to a selected module, and reorders procedures within the module.
  • Procedure reordering mechanism 220 preferably reorders the procedures in each module within the computer program 122 (and therefore uses profile data in many intramodular call graphs), but it is also anticipated within the scope of the present invention that procedure reordering mechanism 220 may reorder the procedures only within selected modules. Any method may be used to reorder the procedures in a module within the scope of the invention. For example, the coalescing method that uses an undirected call graph as taught in the Pettis et al.
  • Each intramodular call graph 222 comprises a subgraph of the global call graph 126 that contains information for procedures within a single module.
  • Each intramodular call graph 222 provides profile data regarding which procedures are called by other procedures within a module, but no calls outside of the module are represented. Intramodular call graph 222 thus provides profile data which allows procedure reordering mechanism 220 to appropriately reorder procedures within module boundaries.
  • Procedure reordering mechanism 220 is preferably implemented in a compiler or in a separate software tool that performs post-pass processing and optimization of the computer program.
  • a compiler must order procedures in a module when it compiles the module, so optimizing the procedure ordering when the module is initially compiled saves the time and effort of reordering later.
  • a post-pass processing tool may be preferable for ordering procedures within modules. For example, if many different programmers are developing different modules of a computer program using different compilers, a post-pass processing tool could reorder the procedures within each module in one place, rather than building in a procedure reordering mechanism into each different compiler.
  • Module reordering mechanism 230 uses profile data in an intermodular call graph 234 (generated by intermodular call graph generator 214) that preferably includes all of the modules in computer program 122, but could include a subset of the modules as well. Module reordering mechanism 230 reorders modules within computer program 122 to determine the packaging order of the modules. Module reordering mechanism 230 preferably uses the same reordering method for reordering modules that procedure reordering mechanism 220 uses for reordering procedures. In the alternative, module reordering mechanism 230 may use a different reordering method.
  • Each intermodular call graph 234 comprises a subset of the global call graph 126 that contains profile data regarding which procedures in which modules are called by procedures in other modules. No information regarding procedure calls within a module is present, because this information was previously presented in the intramodular call graphs 222 corresponding to each module and taken into account when reordering the procedures within each module. Intermodular call graph 234 thus provides profile data which allows module reordering mechanism 230 to determine the best ordering of modules in the computer program based on the procedure calls between modules.
  • Module reordering mechanism 230 is preferably implemented in a linker or in a separate software tool that performs post-pass processing and optimization of the computer program. If performed by a linker, the reordering need not be performed again at a later stage. However, in certain circumstances (e.g., if it is not feasible to modify the linker because the linker code is inaccessible), module reordering by a post-pass processing tool may be the more attractive option.
  • the reordering mechanism 128 is modular because it divides the computer program up into modules, reorders procedures within those modules, then reorders the modules themselves. Thus, reordering is being performed at two distinct levels of granularity.
  • If procedure reordering mechanism 220 and module reordering mechanism 230 use the same reordering method, it is possible to use similar code or even the same code to perform both reorderings.
  • this reordering scheme may be extended to finer or coarser levels of granularity within the scope of the present invention.
  • the intramodular call graph generator could generate call graphs consisting of multiple modules, rather than a single module, or could generate multiple call graphs from a single module. Those skilled in the art will recognize that these and other variations are possible within the scope of the present invention.
  • a method 300 for modular reordering of program portions in a computer program begins by generating a global call graph 126 (step 310).
  • the global call graph 126 is generated from profile data that estimates the execution frequency for procedures within computer program 122. As discussed above, this profile data may be statically or dynamically generated, and may be appropriately weighted to arrive at the estimates of execution frequencies.
  • Method 300 then generates an intramodular call graph 222 for each module (step 320), and reorders the procedures within each module (step 330). Note that the reordering within each module respects the module boundaries. This assures that if a module needs to be replaced due to an enhancement or bug fix, it may be replaced without affecting procedures in other modules.
  • Method 300 then generates an intermodular call graph 234 (step 340), and reorders the modules within the computer program (step 350).
  • the reordering of modules preferably includes all modules in the computer program, but reordering fewer than all modules is also anticipated by the present invention.
  • steps 320 and 340 may generate their respective call graphs directly from profile data, without first constructing a global call graph.
  • a sample global call graph 126 is shown in FIG. 4, which is generated in step 310 (FIG. 3).
  • the computer program depicted by this global call graph 126 includes: a first module M1 that has four procedures M1.P1, M1.P2, M1.P3, and M1.P4; a second module M2 that includes two procedures M2.P1 and M2.P2; and a third module M3 that includes three procedures M3.P1, M3.P2, and M3.P3.
  • An arc going from a first procedure to a second procedure depicts that the first procedure calls the second procedure.
  • M1.P1 calls M2.P1.
  • the weights of the arcs represent the number of times the call is made, so M1.P1 calls M2.P1 20 times.
  • the presence and weight of arcs in the global call graph 126 are derived from profile data that estimates execution frequencies for procedures within the computer program.
  • the weights of procedures are estimates of the total number of times each procedure is called, which equals the sum of the weights of arcs leading to a procedure. For example, in global call graph 126 of FIG. 4, procedure M1.P4 has a weight of 30, which is the sum of the weights of arcs leading to that procedure.
  • each intramodular call graph 222 includes information regarding all calls within a module, but calls across module boundaries are not represented or considered. Thus, when comparing module M1 in the global call graph 126 with module M1 in the intramodular call graph 222, we see that the calls among procedures within module M1 are still present (namely, that M1.P1 calls M1.P2 16 times and that M1.P2 calls M1.P4 20 times).
  • Procedure reordering mechanism 220 thus operates on one module at a time to reorder the procedures within a selected module according to the profile data contained within the intramodular call graph 222 that corresponds to the selected module.
  • Profile data within global call graph 126 may also be used to generate intermodular call graph 234 of FIG. 6 (step 340).
  • Call graph partitioner 210 uses its intermodular call graph generator 214 for this purpose.
  • Intermodular call graph 234 represents all the calls between modules in global call graph 126, but calls entirely within each module are not represented.
  • the call from M1.P1 to M2.P1 with a weight of 20 in global call graph 126 is represented in intermodular call graph 234 of FIG. 6 as an arc of weight 20 from M1 to M2.
  • the call from M2.P2 to M1.P4 with a weight of 10 in global call graph 126 is represented in intermodular call graph 234 of FIG. 6 by an arc of weight 10 from M2 to M1.
  • the call from M3.P2 to M2.P2 with a weight of 20 in global call graph 126 is represented in intermodular call graph 234 by an arc of weight 20 from M3 to M2. Note, however, that multiple arcs in the same direction between modules in the global call graph 126 are combined into one arc in the intermodular call graph 234. For example, the call from M1.P3 to M3.P1 with a weight of 50 is combined with the call from M1.P4 to M3.P2 with a weight of 100 to result in a single arc from M1 to M3 with a weight of 150.
  • module reordering mechanism 230 uses the profile data contained in intermodular call graph 234 to reorder the modules (step 350).
  • the global call graph 126 of FIG. 4, the intramodular call graphs 222 of FIG. 5, and the intermodular call graph 234 of FIG. 6 are shown to illustrate the general concepts of the invention. This specific example shows how the global call graph 126 may be easily partitioned into several intramodular call graphs 222 and one intermodular call graph 234. Those skilled in the art will recognize that many variations may be made within the scope of the invention. For example, each module could be further divided down into groups of procedures. In addition, different intermodular call graphs could be generated for different groups of modules. While the call graphs of FIGS. 4-6 are shown as directed call graphs, reordering using undirected call graphs is also within the scope of the invention. The present invention expressly encompasses any and all reordering of program portions within a computer program, regardless of the level of granularity.
  • the apparatus and methods disclosed herein provide a way to reorder portions of a computer program on a modular basis, thereby achieving enhanced performance through procedure reordering within modules and through module reordering while providing enhanced maintainability of the computer program by restricting procedure reordering to reordering within module boundaries.
  • the computer program may be easily maintained by replacing entire modules without adversely affecting the operation of modules in the computer program that do not change.
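
The partitioning of the sample global call graph described above (FIGS. 4-6) can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the dictionary layout, the function names, and the "Module.Procedure" naming convention are assumptions, and only the arcs quoted in the text are included (FIG. 4 may show others).

```python
from collections import defaultdict

# Arcs of the sample global call graph that the text quotes explicitly.
# Keys are (caller, callee); values are call counts (arc weights).
GLOBAL_ARCS = {
    ("M1.P1", "M1.P2"): 16,
    ("M1.P2", "M1.P4"): 20,
    ("M1.P1", "M2.P1"): 20,
    ("M2.P2", "M1.P4"): 10,
    ("M3.P2", "M2.P2"): 20,
    ("M1.P3", "M3.P1"): 50,
    ("M1.P4", "M3.P2"): 100,
}

def module_of(procedure):
    """Return the module part of a 'Module.Procedure' name (assumed naming)."""
    return procedure.split(".")[0]

def procedure_weights(arcs):
    """Weight of a procedure = sum of the weights of the arcs leading to it."""
    weights = defaultdict(int)
    for (_caller, callee), weight in arcs.items():
        weights[callee] += weight
    return dict(weights)

def intramodular_graphs(arcs):
    """One subgraph per module, keeping only calls that stay inside it."""
    graphs = defaultdict(dict)
    for (caller, callee), weight in arcs.items():
        if module_of(caller) == module_of(callee):
            graphs[module_of(caller)][(caller, callee)] = weight
    return dict(graphs)

def intermodular_graph(arcs):
    """Module-level graph; arcs in the same direction are summed into one."""
    graph = defaultdict(int)
    for (caller, callee), weight in arcs.items():
        src, dst = module_of(caller), module_of(callee)
        if src != dst:
            graph[(src, dst)] += weight
    return dict(graph)

print(procedure_weights(GLOBAL_ARCS)["M1.P4"])    # 20 + 10
print(intramodular_graphs(GLOBAL_ARCS)["M1"])
print(intermodular_graph(GLOBAL_ARCS)[("M1", "M3")])  # 50 + 100
```

With these arcs, the weight of M1.P4 comes out as 30 and the M1-to-M3 arc as 150, matching the figures quoted in the text. Modules with no internal calls among the quoted arcs (M2 and M3) simply have no entry in the intramodular result here.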

Abstract

An apparatus and method reorder portions of a computer program in a way that achieves both enhanced performance and maintainability of the computer program. A global call graph is initially constructed that includes profile data. From the information in the global call graph, an intramodular call graph is generated for each module. Reordering techniques are used to reorder the procedures in each module according to the profile data in each intramodular call graph. An intermodular call graph is generated from the information in the global call graph. Reordering techniques are used to reorder the modules in the computer program. By reordering procedures within modules, then reordering the modules, enhanced performance is achieved without reordering procedures across module boundaries. Respecting module boundaries enhances the maintainability of the computer program by allowing a module to be replaced without adversely affecting the other modules while still providing many of the advantages of global procedure reordering.
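
The two-level flow summarized in the abstract (reorder procedures inside each module, then reorder the modules) might be sketched as follows. The reordering heuristic used here is a deliberately simple stand-in chosen for brevity: it treats arcs as undirected, visits them heaviest first, and concatenates the chains containing their endpoints. It is not the method of Pettis et al. nor of the related applications; the patent leaves the reordering technique open. All function names and the data layout are hypothetical, and the arcs are only those quoted in the text for the sample program.

```python
def greedy_order(nodes, arcs):
    """Place heavily connected nodes near each other (simplified heuristic)."""
    chain_of = {n: [n] for n in nodes}
    undirected = {}
    for (a, b), weight in arcs.items():
        if a != b:
            key = tuple(sorted((a, b)))
            undirected[key] = undirected.get(key, 0) + weight
    # Heaviest arcs first; ties broken by name so the result is deterministic.
    for (a, b), _w in sorted(undirected.items(), key=lambda kv: (-kv[1], kv[0])):
        chain_a, chain_b = chain_of[a], chain_of[b]
        if chain_a is not chain_b:
            chain_a.extend(chain_b)
            for n in chain_b:
                chain_of[n] = chain_a
    ordered, placed = [], set()
    for n in nodes:
        if n not in placed:
            ordered.extend(chain_of[n])
            placed.update(chain_of[n])
    return ordered

def modular_reorder(modules, arcs):
    """Reorder procedures within each module, then reorder the modules."""
    owner = {p: m for m, procs in modules.items() for p in procs}
    layout = {}
    for m, procs in modules.items():
        # Intramodular call graph: only calls that stay inside module m.
        intra = {(a, b): w for (a, b), w in arcs.items()
                 if owner[a] == m and owner[b] == m}
        layout[m] = greedy_order(procs, intra)
    # Intermodular call graph: cross-module calls summed per module pair.
    inter = {}
    for (a, b), w in arcs.items():
        if owner[a] != owner[b]:
            key = (owner[a], owner[b])
            inter[key] = inter.get(key, 0) + w
    # The same routine is reused at module granularity.
    return [(m, layout[m]) for m in greedy_order(list(modules), inter)]

MODULES = {
    "M1": ["M1.P1", "M1.P2", "M1.P3", "M1.P4"],
    "M2": ["M2.P1", "M2.P2"],
    "M3": ["M3.P1", "M3.P2", "M3.P3"],
}
ARCS = {
    ("M1.P1", "M1.P2"): 16, ("M1.P2", "M1.P4"): 20, ("M1.P1", "M2.P1"): 20,
    ("M2.P2", "M1.P4"): 10, ("M3.P2", "M2.P2"): 20, ("M1.P3", "M3.P1"): 50,
    ("M1.P4", "M3.P2"): 100,
}
for module, procedures in modular_reorder(MODULES, ARCS):
    print(module, procedures)
```

Because procedures never leave their module in this scheme, replacing one module only requires rerunning the inner step for that module; the other modules' internal layouts are untouched.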

Description

RELATED APPLICATIONS
This application is related to the following co-pending patent applications: "Method and Apparatus for Reordering Procedures in a Computer Program Based on Profile Data", Ser. No. 08/820,735, filed Mar. 19, 1997; and "Method and Apparatus for Profile-Based Reordering of Program Portions in a Computer Program", Ser. No. 08/814,527, filed Mar. 10, 1997.
BACKGROUND OF THE INVENTION
1. Technical Field
This invention generally relates to computer systems. More specifically, this invention relates to a method and apparatus for reordering the packaging of procedures within a computer program.
2. Background Art
The development of the EDVAC computer system in 1948 is generally considered the beginning of the computer era. Since that time, dramatic advances in both hardware and software (e.g., computer programs) have drastically improved the performance of computer systems. Modern software has become very complex when compared to early computer programs. Many modern computer programs have tens or hundreds of thousands of instructions. The execution time (and hence, performance) of a computer program is very closely related to the number of instructions that are executed as the computer program runs. Thus, as the size and complexity of computer programs increase, the execution time of the computer program increases as well.
Unlike early computer programs, modern computer programs are typically written in a high-level language that is easy to understand by a human programmer. Special software tools known as compilers and linkers take the human-readable form of a computer program, known as "source code", and convert it into "machine code" or "object code" instructions that may be executed by a computer system. Because a compiler and its associated linker generate the stream of machine code instructions that are eventually executed on a computer system, the manner in which the compiler and linker package procedures within the computer program affects the performance of the computer program.
In particular, the ordering of procedures within a computer program affects the performance of the memory paging system in a computer system. Nearly all computer systems have auxiliary storage, such as a hard disk drive, that has large storage capacity, and is relatively inexpensive yet slow compared to main memory. Main memory is typically comprised of Random Access Memory (RAM), which has a much smaller storage capacity and is more expensive than auxiliary storage, yet it is very fast. Instructions and data are typically moved between auxiliary storage and main memory in "pages." A "page" consists of a predefined number of bytes (typically a power of two), and is the fundamental unit of transfer between auxiliary storage and main memory. A predetermined number of pages are typically set aside in main memory for storing pages as they are moved between auxiliary storage and main memory. When a processor within a computer system begins executing a computer program, the memory paging system fills the portion of main memory allocated to paging with pages from auxiliary storage. When the processor needs data that is not in any of the pages in main memory, the memory paging system selects one or more pages that are replaced by new pages from auxiliary storage. Swapping pages in and out of memory requires time and system resources, and therefore degrades system performance. In other words, the fewer the number of page swaps, the better.
In order to optimize the performance of modern computer programs, profilers have been developed to predict and/or measure the run-time behavior of a computer program. Profilers typically generate profile data that estimates how often different portions of the computer program are executed. Using profile data, a compiler, a linker, or a separate optimizer program may make decisions regarding the preferred order of procedures within the computer program in order to improve the performance of the computer system.
Known prior art systems generate profile data that is used by a linker to determine the order of all procedures within a computer program. For many sophisticated computer programs, such as operating systems, the size of the computer program is quite large, consisting of numerous modules. A module is a subset of a computer program that may be independently compiled (i.e., compiled separately from the rest of the modules). A module generally has one or more procedures. While reordering all of the procedures in a computer program may produce a near-optimal arrangement of procedures, it has a negative effect on the maintainability of the computer program. For example, when a problem or "bug" is discovered in the computer program, the bug needs to be fixed with a minimum of disruption to the rest of the computer program. Likewise, when an enhancement is made to the computer program, there needs to be a way to easily add the enhancement. Ideally, a small portion of the computer program (e.g., one or more modules) may be changed to fix the bug or to add the enhancement. However, replacing one or two modules may be problematic if the procedures within the replaced module have been packaged among procedures in many other modules.
If the procedures in a computer program have been interspersed across module boundaries, the most common way to assure the correct operation of the computer program is to supply an entire new program when a bug is fixed or an enhancement is made. However, in the case of large programs like operating systems, the prospect of sending out the entire program each time a bug fix is needed (or an enhancement is made) is prohibitively expensive. In addition, for some computer programs (again, like operating systems), the program needs to continue running while the updated modules are added. Shipping the entire program would require the computer system to be shut down while the new computer program is loaded, an untenable solution in many circumstances. Faced with these issues and with prior art solutions, there exist two primary options: 1) allow global procedure reordering, which makes maintenance difficult but enhances the performance of the code; or 2) forego the advantages of global procedure reordering so that procedures stay in the modules where they are created, making maintenance easier at the cost of slower execution time for the code. Without improved apparatus and methods for the reordering of procedures in a computer program based on profile data in a way that allows maintainability of the computer program, the computer industry will continue to suffer from this undesirable tradeoff.
DISCLOSURE OF INVENTION
An apparatus and method reorder portions of a computer program in a way that achieves both enhanced performance and maintainability of the computer program. A global call graph is initially constructed from profile data. From the information in the global call graph, an intramodular call graph is generated for each module. Reordering techniques are used to reorder the procedures in each module according to the profile data in each intramodular call graph. An intermodular call graph is generated from the information in the global call graph. Reordering techniques are used to reorder the modules in the computer program. By reordering procedures within modules, then reordering the modules, enhanced performance is achieved without reordering procedures across module boundaries. Respecting module boundaries enhances the maintainability of the computer program by allowing a module to be replaced without adversely affecting the other modules while still providing many of the advantages of global procedure reordering.
The foregoing and other features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention, as illustrated in the accompanying drawings.
BRIEF DESCRIPTION OF DRAWINGS
The preferred embodiments of the present invention will hereinafter be described in conjunction with the appended drawings, where like designations denote like elements, and:
FIG. 1 is a block diagram of a computer apparatus in accordance with the present invention;
FIG. 2 is a block diagram of the modular reordering mechanism of FIG. 1;
FIG. 3 is a flow diagram of a method for modular reordering of a computer program in accordance with the present invention;
FIG. 4 is a global call graph for a sample computer program;
FIG. 5 is an intramodular call graph for a sample computer program represented by the global call graph of FIG. 4; and
FIG. 6 is an intermodular call graph for a sample computer program represented by the global call graph of FIG. 4.
BEST MODE FOR CARRYING OUT THE INVENTION
The present invention relates to optimization of a computer program using profile data. For those that are not experts in the field, the Overview section below provides general background information that will be helpful in understanding the concepts of the invention.
OVERVIEW
Optimizations Using Profile Data
Many modern software development environments include a profiling mechanism that uses information collected about a program's run-time behavior (known as profile data) to improve optimization of that program. "Profile data" as used herein means any estimates of execution frequencies in a computer program, regardless of how the estimates are generated. Profile data may be generated in a number of different ways. One way of generating profile data is to perform a static analysis of the program code to estimate the execution frequencies of procedures in the computer program. Other methods are known that dynamically collect information about a computer program as it runs.
One type of dynamic profiler is known as a sampling profiler. A sampling profiler uses a hardware timer to periodically wake up a process that records the address of the currently executing instruction.
A second type of dynamic profiler is known as a trace-based profiler, which collects an execution trace of all the instructions executed by the computer program. An execution trace is a map that shows the addresses that were encountered during program execution. The profiler then reduces this information to a manageable size to determine how often each procedure in the computer program was called.
A third type of dynamic profiler is known as an instrumenting profiler. An instrumenting profiler recompiles the computer program and inserts special instrumentation code known as "hooks" at important points in the computer program (such as procedure calls). As the instrumented program executes, these hooks cause data counters to be incremented, recording the procedure call history information directly as the computer program runs. The counters contain profile data that is then used to determine how often each procedure in the computer program was called.
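The hook-and-counter idea behind an instrumenting profiler can be illustrated with a minimal sketch. It assumes Python as the illustration language, and the names `instrumented`, `call_counts`, `leaf`, and `driver` are hypothetical; a real instrumenting profiler would insert its hooks during recompilation rather than through a decorator.

```python
from collections import Counter
from functools import wraps

call_counts = Counter()    # (caller, callee) -> number of calls observed
_call_stack = ["<root>"]   # names of the instrumented procedures currently active

def instrumented(func):
    """Wrap func with a hook that counts each call, keyed by its caller."""
    @wraps(func)
    def hook(*args, **kwargs):
        # The hook increments a data counter before the procedure body runs.
        call_counts[(_call_stack[-1], func.__name__)] += 1
        _call_stack.append(func.__name__)
        try:
            return func(*args, **kwargs)
        finally:
            _call_stack.pop()
    return hook

@instrumented
def leaf():
    return 1

@instrumented
def driver(n):
    return sum(leaf() for _ in range(n))

driver(3)
print(dict(call_counts))
```

The resulting counters record how often each caller invoked each callee, which is the kind of call history that can later be turned into arc weights.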
Procedure Reordering
Modern compilers typically collect instructions into groups known as "basic blocks". The term "basic block" is well known in the art, and represents a maximal sequence of straight-line code. A procedure in a computer program typically includes many basic blocks. A module, or compilation unit, in a computer program includes one or more procedures that are compiled at the same time. The computer program is defined by a hierarchy of modules, procedures, and basic blocks. Once the computer program is defined by a particular hierarchy, certain profile-based optimizations may be performed to enhance the performance of the computer program.
One important profile-based optimization is procedure reordering, which analyzes the most frequently executed paths among procedures in a computer program, and uses this information to reorder the procedures within the module or computer program of interest. The primary purpose of this reordering is to improve memory paging performance.
The performance of the memory paging system depends on the ordering of procedures, as illustrated by the extremely simplified example below. Procedure A, which has 500 bytes, is initially packaged at the beginning of the module. Procedure B, which has 1200 bytes, is initially packaged at the end of the module. There are numerous procedures in between A and B that have a total of 15,000 bytes. Assume the page size is 1,000 bytes and that the memory paging system brings in two pages at a time and can store a total of ten pages. Assume that A is aligned at the first byte of the module, and that other procedures have been executed prior to A (to assure the memory paging system has filled all of the pages). When the memory paging system sees that the processor needs to access A, it will swap two of the pages already in memory for the two pages that include procedure A. These two pages include procedure A and 1,500 bytes of other procedures. When the processor has finished executing A, assume that B needs to be executed, but B is not in memory. The memory paging system must bring in the pages that contain procedure B. Note that procedure B may occupy two or three pages depending on where the page boundaries fall. For example, if B has its first 100 bytes on one page, its next 1000 bytes on a second page, and its last 100 bytes on a third page, it would span three pages. If procedure B spans three pages, the memory paging system would have to bring in four pages into the main memory, since it brings in two pages at a time. In this example, the initial packing order of A and B results in the memory paging system performing a total of six page swaps to execute A and B. Note, however, that if B were packaged immediately after A, both A and B (total of 1,700 bytes) would be brought in with the first two page swaps to bring in A, reducing the number of page swaps from six to two. 
This example illustrates how the packaging order of procedures in a computer program can affect performance of the computer system.
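The page arithmetic of the example can be checked with a small sketch. It assumes a deliberately simple paging model that is not part of the disclosure: a miss on page p brings in pages p and p+1, and nothing is evicted. The function names and the chosen start offsets for B are illustrative assumptions.

```python
def pages_spanned(offset, size, page_size=1000):
    """Return the page numbers touched by `size` bytes starting at `offset`."""
    first = offset // page_size
    last = (offset + size - 1) // page_size
    return list(range(first, last + 1))

def count_page_swaps(procedures, page_size=1000, fetch=2):
    """Count pages brought in to run `procedures`, given as (offset, size) pairs.

    Simplification: a miss on page p brings in pages p .. p+fetch-1, and
    nothing is evicted (the ten-page capacity is never reached here).
    """
    resident = set()
    swaps = 0
    for offset, size in procedures:
        for page in pages_spanned(offset, size, page_size):
            if page not in resident:
                resident.update(range(page, page + fetch))
                swaps += fetch
    return swaps

A = (0, 500)                                       # 500 bytes at the start of the module
three_page_case = count_page_swaps([A, (15_900, 1200)])  # B straddles three pages
adjacent_case = count_page_swaps([A, (500, 1200)])       # B packaged right after A
print(three_page_case, adjacent_case)
```

With B starting 100 bytes before a page boundary, the sketch counts six page swaps, and with B packaged immediately after A it counts two, matching the figures in the example. Under the same model, a B that starts at byte 15,500 touches only two pages, for a total of four.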
Effectively reordering procedures requires an understanding of call graphs. A call graph is a graph consisting of one node for each procedure in the program portion of interest.
In an undirected call graph (that ignores the direction of the call), there is an arc between procedure A and procedure B if either one calls the other. In a directed call graph, there is an arc from A to B if procedure A calls procedure B. In such a case, A is said to be a predecessor of B, B is said to be a successor of A, and the arc A-B is said to be incident to procedures A and B.
A call graph can be "weighted" with estimates of execution frequencies. These estimates are often obtained using profile data from sample executions of the computer program, but other methods of estimating execution frequencies are possible. For example, some compilers try to provide rough estimates by static analysis of the computer program. Such estimates are generally not as accurate as those obtained by dynamic profiling, but can be generated without the overhead of dynamic profiling. Weights may be assigned to procedures or to the arcs between them in a call graph. A weight given to a procedure indicates how frequently that procedure is called, whereas a weight given to an arc A-B indicates how frequently procedure A calls procedure B.
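The relationship between a directed and an undirected weighted call graph can be sketched as follows. The sketch assumes that the weight of an undirected arc is the sum of the calls made in either direction, which is one common convention and not something the text above specifies; the function name and the sample data are hypothetical.

```python
from collections import defaultdict

def to_undirected(directed_arcs):
    """Collapse directed arcs {(caller, callee): weight} into undirected arcs.

    Assumption: the weight of an undirected arc is the sum of the calls made
    in either direction between the two procedures.
    """
    undirected = defaultdict(int)
    for (caller, callee), weight in directed_arcs.items():
        if caller != callee:                 # ignore recursive self-calls
            undirected[frozenset((caller, callee))] += weight
    return dict(undirected)

# Hypothetical profile data: A calls B 7 times, B calls A 3 times, A calls C twice.
arcs = {("A", "B"): 7, ("B", "A"): 3, ("A", "C"): 2}
undirected = to_undirected(arcs)
print(undirected[frozenset(("A", "B"))])
```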
One known method for reordering procedures in a computer program based on profile data is disclosed in U.S. Pat. No. 5,212,794 "Method for Optimizing Computer Code to Provide More Efficient Execution on Computers Having Cache Memories", issued May 18, 1993 to Pettis et al. and assigned to Hewlett-Packard Co. The applicable method is described in column 8, line 56, through column 10, line 48 of the Pettis et al. patent. The Pettis et al. method uses a coalescing approach to reorder procedures in a computer program.
Other methods for reordering procedures have also been developed. Several such methods are disclosed in the related co-pending patent application entitled "Method and Apparatus for Reordering Procedures in a Computer Program Based on Profile Data," Ser. No. 08/820,735, filed Mar. 19, 1997. In the first embodiment of this related patent application, procedures are reordered by constructing traces of maximal length in both directions from an initial block that is the block with the greatest weight in the call graph. In the second embodiment of this related patent application, procedures are reordered by constructing traces, adding predecessors and successors to the trace if they are perfect partners with the current block in the trace. In the third embodiment of the related patent application, when a new trace is started, the initial procedure is preferred to be a procedure that touches the last trace. The method for reordering procedures according to Pettis et al. and the methods disclosed in the related application cited above are all viable ways of reordering procedures within a computer program.
The remainder of this specification discloses an apparatus and various methods for reordering procedures in a module within a computer program and between modules in a computer program in a way that allows one module to be replaced without affecting the ordering of procedures in the remaining modules.
DETAILED DESCRIPTION
Profile data may be generated by any of the methods discussed above, or may be generated by new methods in the future. The present invention uses profile data in a call graph to reorder procedures in a portion of a computer program, without regard to how the profile data was generated.
Referring to FIG. 1, a computer system 100 in accordance with the present invention is an enhanced IBM AS/400 mid-range computer system. However, those skilled in the art will appreciate that the mechanisms and apparatus of the present invention apply equally to any computer system, regardless of whether the computer system is a complicated multi-user computing apparatus or a single user device such as a personal computer or workstation. Computer system 100 suitably comprises a processor 110, main memory 120, a memory controller 130, an auxiliary storage interface 140, and a terminal interface 150, all of which are interconnected via a system bus 160. Note that various modifications, additions, or deletions may be made to the computer system 100 illustrated in FIG. 1 within the scope of the present invention such as the addition of cache memory or other peripheral devices; FIG. 1 is presented to simply illustrate some of the salient features of computer system 100.
Processor 110 performs computation and control functions of computer system 100, and comprises a suitable central processing unit. Processor 110 may comprise a single integrated circuit, such as a microprocessor, or may comprise any suitable number of integrated circuit devices and/or circuit boards working in cooperation to accomplish the functions of a processor. Processor 110 suitably executes a computer program 122 within main memory 120.
Auxiliary storage interface 140 is used to allow computer system 100 to store and retrieve information from auxiliary storage, such as magnetic disk (e.g., hard disks or floppy diskettes) or optical storage devices (e.g., CD-ROM). One suitable storage device is a direct access storage device (DASD) 170. As shown in FIG. 1, DASD 170 may be a floppy disk drive which may read programs and data from a floppy disk 180. Note that a modular reordering mechanism in accordance with the present invention may exist as a program product on one or more floppy disks 180. It is important to note that while the present invention has been (and will continue to be) described in the context of a fully functional computer system, those skilled in the art will appreciate that the mechanisms of the present invention are capable of being distributed as a program product in a variety of forms, and that the present invention applies equally regardless of the particular type of signal bearing media employed to actually carry out the distribution. Examples of signal bearing media include: recordable type media such as floppy disks (e.g., disk 180) and CD ROMS, and transmission type media such as digital and analog communication links.
Memory controller 130, through use of a processor separate from processor 110, is responsible for moving requested information from main memory 120 and/or through auxiliary storage interface 140 to processor 110. Memory controller 130 performs the functions of the memory paging system, swapping pages from DASD 170 to main memory 120 as required. While for the purposes of explanation, memory controller 130 is shown as a separate entity, those skilled in the art understand that, in practice, portions of the function provided by memory controller 130 may actually reside in the circuitry associated with processor 110, main memory 120, and/or auxiliary storage interface 140.
Terminal interface 150 allows system administrators and computer programmers to communicate with computer system 100, normally through programmable workstations. Although the system 100 depicted in FIG. 1 contains only a single main processor 110 and a single system bus 160, it should be understood that the present invention applies equally to computer systems having multiple processors and multiple system buses. Similarly, although the system bus 160 of the preferred embodiment is a typical hardwired, multidrop bus, any connection means that supports bi-directional communication could be used.
Main memory 120 contains one or more computer programs 122, an operating system 124, a global call graph 126, and a modular reordering mechanism 128. In the preferred embodiment, modular reordering mechanism 128 may be part of an optimizing compiler, may be part of a linker, or may be part of a separate post-pass processing program. Computer program 122 in memory 120 is used in its broadest sense, and includes any and all forms of computer programs, including source code, intermediate (i.e., relocatable) code, machine code, and any other representation of a computer program that has procedures that could be reordered. Global call graph 126 includes profile data that estimates execution frequencies of procedures within computer program 122. Global call graph 126 preferably includes all procedures within a computer program 122. Profile data within global call graph 126 may be generated using static analysis, dynamic profiling, or any other method for estimating execution frequencies for the procedures within computer program 122. Once generated, the profile data is stored in main memory 120 as part of the global call graph 126.
It should be understood that main memory 120 will not necessarily contain all parts of all mechanisms shown. For example, portions of computer program 122 and operating system 124 may be loaded into an instruction cache (not shown) for processor 110 to execute, while other files may well be stored on magnetic or optical disk storage devices (not shown). In addition, if modular reordering mechanism 128 is part of an optimizing compiler, the compiler may generate a machine code instruction stream from computer program 122 that is intended to be executed on a different computer system if the optimizing compiler is a cross-compiler. The methods and apparatus discussed herein apply to all types of compilers, including cross-compilers.
Referring to FIG. 2, modular reordering mechanism 128 includes a call graph partitioner 210, a procedure reordering mechanism 220, and a module reordering mechanism 230. Call graph partitioner 210 includes an intramodular call graph generator 212 and an intermodular call graph generator 214. Intramodular call graph generator 212 generates one or more intramodular call graphs 222, preferably one intramodular call graph 222 per module in the computer program 122. Intermodular call graph generator 214 generates an intermodular call graph 234. The functions of these generators 212 and 214 are discussed in more detail below with reference to FIGS. 5 and 6.
Procedure reordering mechanism 220 takes profile data in an intramodular call graph 222 (produced by intramodular call graph generator 212) that corresponds to a selected module, and reorders procedures within the module. Procedure reordering mechanism 220 preferably reorders the procedures in each module within the computer program 122 (and therefore uses profile data in many intramodular call graphs), but it is also anticipated within the scope of the present invention that procedure reordering mechanism 220 may reorder the procedures only within selected modules. Any method may be used to reorder the procedures in a module within the scope of the invention. For example, the coalescing method that uses an undirected call graph as taught in the Pettis et al. patent (discussed in the Overview section) could be adapted to provide one suitable method for reordering procedures within a module, even though the Pettis et al. disclosure is specifically addressed to procedure reordering in the entire computer program by a linker. Other suitable examples of reordering methods are disclosed in the related co-pending patent application entitled "Method and Apparatus for Reordering Procedures in a Computer Program Based on Profile Data," Ser. No. 08/820,735, filed Mar. 19, 1997. Regardless of the particular reordering method used, whether now known or developed in the future, the present invention extends to any and all methods for reordering procedures within a selected module in a computer program using profile data in an intramodular call graph.
Each intramodular call graph 222 comprises a subgraph of the global call graph 126 that contains information for procedures within a single module. Each intramodular call graph 222 provides profile data regarding which procedures are called by other procedures within a module, but no calls outside of the module are represented. Intramodular call graph 222 thus provides profile data which allows procedure reordering mechanism 220 to appropriately reorder procedures within module boundaries.
Procedure reordering mechanism 220 is preferably implemented in a compiler or in a separate software tool that performs post-pass processing and optimization of the computer program. A compiler must order procedures in a module when it compiles the module, so optimizing the procedure ordering when the module is initially compiled saves the time and effort of reordering later. However, in some circumstances, a post-pass processing tool may be preferable for ordering procedures within modules. For example, if many different programmers are developing different modules of a computer program using different compilers, a post-pass processing tool could reorder the procedures within each module in one place, rather than requiring a procedure reordering mechanism to be built into each different compiler.
Module reordering mechanism 230 uses profile data in an intermodular call graph 234 (generated by intermodular call graph generator 214) that preferably includes all of the modules in computer program 122, but could include a subset of the modules as well. Module reordering mechanism 230 reorders modules within computer program 122 to determine the packaging order of the modules. Module reordering mechanism 230 preferably uses the same reordering method for reordering modules that procedure reordering mechanism 220 uses for reordering procedures. In the alternative, module reordering mechanism 230 may use a different reordering method.
Each intermodular call graph 234 comprises a subset of the global call graph 126 that contains profile data regarding which procedures in which modules are called by procedures in other modules. No information regarding procedure calls within a module is present, because this information was previously presented in the intramodular call graphs 222 corresponding to each module and taken into account when reordering the procedures within each module. Intermodular call graph 234 thus provides profile data which allows module reordering mechanism 230 to determine the best ordering of modules in the computer program based on the procedure calls between modules.
Module reordering mechanism 230 is preferably implemented in a linker or in a separate software tool that performs post-pass processing and optimization of the computer program. If performed by a linker, the reordering need not be performed again at a later stage. However, in certain circumstances (e.g., if it is not feasible to modify the linker because the linker code is inaccessible), module reordering by a post-pass processing tool may be the more attractive option.
The reordering mechanism 128 is modular because it divides the computer program up into modules, reorders procedures within those modules, then reorders the modules themselves. Thus, reordering is being performed at two distinct levels of granularity. When procedure reordering mechanism 220 and module reordering mechanism 230 use the same reordering method, it is possible to use similar code or even the same code to perform both reorderings. Note that this reordering scheme may be extended to finer or coarser levels of granularity within the scope of the present invention. For example, the intramodular call graph generator could generate call graphs consisting of multiple modules, rather than a single module, or could generate multiple call graphs from a single module. Those skilled in the art will recognize that these and other variations are possible within the scope of the present invention.
Referring to FIG. 3, a method 300 for modular reordering of program portions in a computer program begins by generating a global call graph 126 (step 310). The global call graph 126 is generated from profile data that estimates the execution frequency for procedures within computer program 122. As discussed above, this profile data may be statically or dynamically generated, and may be appropriately weighted to arrive at the estimates of execution frequencies. Method 300 then generates an intramodular call graph 222 for each module (step 320), and reorders the procedures within each module (step 330). Note that the reordering within each module respects the module boundaries. This assures that if a module needs to be replaced due to an enhancement or bug fix, it may be replaced without affecting procedures in other modules.
Method 300 then generates an intermodular call graph 234 (step 340), and reorders the modules within the computer program (step 350). In the preferred embodiment, the reordering of modules includes all modules in the computer program, but reordering fewer than all modules is also anticipated by the present invention.
In FIG. 3, the generation of the intramodular call graph (step 320) and the intermodular call graph (step 340) is shown following the generation of the global call graph (step 310). Note, however, that steps 320 and 340 may generate their respective call graphs directly from profile data, without first constructing a global call graph.
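The overall flow of method 300 can be sketched end to end under stated assumptions. The reordering routine `greedy_order` below is a stand-in chosen for brevity: a simplified greedy chain merge that ignores arc direction, and not the method of Pettis et al. or of the related application. The sample arcs are those the description names for the sample program; all function names are hypothetical. The same routine is applied at both levels, first to the procedures of each module and then to the modules.

```python
from collections import defaultdict

def greedy_order(nodes, arcs):
    """Stand-in reordering: merge chains joined by the heaviest arcs first.

    `arcs` maps (a, b) -> weight; direction is ignored. Simplified heuristic,
    not the method of any cited patent.
    """
    weight = defaultdict(int)
    for (a, b), w in arcs.items():
        if a != b:
            weight[frozenset((a, b))] += w
    chain_of = {n: [n] for n in nodes}       # node -> the chain (list) holding it
    for pair, _w in sorted(weight.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        a, b = sorted(pair)
        ca, cb = chain_of[a], chain_of[b]
        if ca is cb:
            continue                         # already placed together
        ca.extend(cb)                        # append b's chain after a's chain
        for n in cb:
            chain_of[n] = ca
    ordered, placed = [], set()
    for n in nodes:                          # emit each chain once, in input order
        if n not in placed:
            ordered.extend(chain_of[n])
            placed.update(chain_of[n])
    return ordered

def module_of(procedure):
    """'M1.P4' -> 'M1' (naming convention of the sample program)."""
    return procedure.split(".")[0]

def modular_reorder(procedures, arcs):
    """Steps 320-350: reorder within each module, then reorder the modules."""
    modules = list(dict.fromkeys(module_of(p) for p in procedures))
    layout = {}
    for m in modules:                        # steps 320 and 330
        members = [p for p in procedures if module_of(p) == m]
        intra = {k: w for k, w in arcs.items()
                 if module_of(k[0]) == m and module_of(k[1]) == m}
        layout[m] = greedy_order(members, intra)
    inter = defaultdict(int)                 # step 340
    for (a, b), w in arcs.items():
        if module_of(a) != module_of(b):
            inter[(module_of(a), module_of(b))] += w
    module_order = greedy_order(modules, inter)   # step 350
    return [p for m in module_order for p in layout[m]]

PROCEDURES = ["M1.P1", "M1.P2", "M1.P3", "M1.P4",
              "M2.P1", "M2.P2", "M3.P1", "M3.P2", "M3.P3"]
ARCS = {("M1.P1", "M1.P2"): 16, ("M1.P1", "M2.P1"): 20, ("M1.P2", "M1.P4"): 20,
        ("M1.P3", "M3.P1"): 50, ("M1.P4", "M3.P2"): 100, ("M2.P2", "M1.P4"): 10,
        ("M3.P2", "M2.P2"): 20}
final_order = modular_reorder(PROCEDURES, ARCS)
print(final_order)
```

Whatever reordering routine is substituted for the stand-in, the procedures of each module remain contiguous in the final order, which is the property that allows a single module to be replaced.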
The steps of method 300 may best be understood with reference to the graphs of FIGS. 4-6. A sample global call graph 126 is shown in FIG. 4, which is generated in step 310 (FIG. 3). The computer program depicted by this global call graph 126 includes: a first module M1 that has four procedures M1.P1, M1.P2, M1.P3, and M1.P4; a second module M2 that includes two procedures M2.P1 and M2.P2; and a third module M3 that includes three procedures M3.P1, M3.P2, and M3.P3. An arc going from a first procedure to a second procedure depicts that the first procedure calls the second procedure. For example, in global call graph 126 of FIG. 4, M1.P1 calls M2.P1. The weights of the arcs represent the number of times the call is made, so M1.P1 calls M2.P1 20 times. The presence and weight of arcs in the global call graph 126 are derived from profile data that estimates execution frequencies for procedures within the computer program. The weights of procedures are estimates of the total number of times each procedure is called, which equals the sum of the weights of arcs leading to a procedure. For example, in global call graph 126 of FIG. 4, procedure M1.P4 has a weight of 30, which is the sum of the weights of arcs leading to that procedure.
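The rule that a procedure's weight equals the sum of the weights of its incoming arcs can be checked with a short sketch. It uses only the arcs that the description names explicitly for the sample program; FIG. 4 itself may contain additional arcs, so only the weight of M1.P4 is compared against the text. The names `GLOBAL_ARCS` and `procedure_weights` are hypothetical.

```python
from collections import defaultdict

# Arcs of the sample global call graph that the text names explicitly;
# FIG. 4 itself may contain additional arcs that are not listed here.
GLOBAL_ARCS = {
    ("M1.P1", "M1.P2"): 16,
    ("M1.P1", "M2.P1"): 20,
    ("M1.P2", "M1.P4"): 20,
    ("M1.P3", "M3.P1"): 50,
    ("M1.P4", "M3.P2"): 100,
    ("M2.P2", "M1.P4"): 10,
    ("M3.P2", "M2.P2"): 20,
}

def procedure_weights(arcs):
    """Weight of a procedure = sum of the weights of the arcs leading to it."""
    weights = defaultdict(int)
    for (_caller, callee), weight in arcs.items():
        weights[callee] += weight
    return dict(weights)

weights = procedure_weights(GLOBAL_ARCS)
print(weights["M1.P4"])   # 20 (from M1.P2) + 10 (from M2.P2)
```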
Once global call graph 126 has been generated, the profile data therein can be used to generate an intramodular call graph for each module in the computer program (step 320). Call graph partitioner 210 uses its intramodular call graph generator 212 for this purpose. Referring now to FIG. 5, each intramodular call graph 222 includes information regarding all calls within a module, but calls across module boundaries are not represented or considered. Thus, when comparing module M1 in the global call graph 126 with module M1 in the intramodular call graph 222, we see that the calls among procedures within module M1 are still present (namely, that M1.P1 calls M1.P2 16 times and that M1.P2 calls M1.P4 20 times). However, the calls across module boundaries (namely, from M1.P1 to M2.P1; from M1.P4 to M3.P2; from M1.P3 to M3.P1; and from M2.P2 to M1.P4) are not shown. Procedure reordering mechanism 220 thus operates on one module at a time to reorder the procedures within a selected module according to the profile data contained within the intramodular call graph 222 that corresponds to the selected module.
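The partitioning performed for step 320 can be sketched as a filter that keeps only arcs whose caller and callee belong to the same module. The sketch assumes the "module.procedure" naming of the sample program and lists only the procedures and arcs that the description names; FIG. 5 may show further arcs inside M2 and M3. The function names are hypothetical.

```python
PROCEDURES = ["M1.P1", "M1.P2", "M1.P3", "M1.P4",
              "M2.P1", "M2.P2", "M3.P1", "M3.P2", "M3.P3"]
# Only the arcs the text names; the figures may contain others.
GLOBAL_ARCS = {
    ("M1.P1", "M1.P2"): 16, ("M1.P1", "M2.P1"): 20, ("M1.P2", "M1.P4"): 20,
    ("M1.P3", "M3.P1"): 50, ("M1.P4", "M3.P2"): 100, ("M2.P2", "M1.P4"): 10,
    ("M3.P2", "M2.P2"): 20,
}

def module_of(procedure):
    """'M1.P4' -> 'M1' (naming convention of the sample program)."""
    return procedure.split(".")[0]

def intramodular_graphs(procedures, arcs):
    """Return {module: {"nodes": [...], "arcs": {...}}}, dropping cross-module arcs."""
    graphs = {}
    for p in procedures:
        graphs.setdefault(module_of(p), {"nodes": [], "arcs": {}})["nodes"].append(p)
    for (caller, callee), weight in arcs.items():
        if module_of(caller) == module_of(callee):
            graphs[module_of(caller)]["arcs"][(caller, callee)] = weight
    return graphs

graphs = intramodular_graphs(PROCEDURES, GLOBAL_ARCS)
print(graphs["M1"]["arcs"])
```

For module M1 the sketch keeps the arcs M1.P1 to M1.P2 (16) and M1.P2 to M1.P4 (20), and drops the four arcs that cross the boundary of M1.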
Profile data within global call graph 126 may also be used to generate intermodular call graph 234 of FIG. 6 (step 340). Call graph partitioner 210 uses its intermodular call graph generator 214 for this purpose. Intermodular call graph 234 represents all the calls between modules in global call graph 126, but calls entirely within each module are not represented. Thus, the call from M1.P1 to M2.P1 with a weight of 20 in global call graph 126 is represented in intermodular call graph 234 of FIG. 6 as an arc of weight 20 from M1 to M2. The call from M2.P2 to M1.P4 with a weight of 10 in global call graph 126 of FIG. 4 is represented by an arc of weight 10 from M2 to M1. Similarly, the call from M3.P2 to M2.P2 with a weight of 20 in global call graph 126 is represented in intermodular call graph 234 by an arc of weight 20 from M3 to M2. Note, however, that multiple arcs in the same direction between modules in the global call graph 126 are combined into one arc in the intermodular call graph 234. For example, the call from M1.P3 to M3.P1 with a weight of 50 is combined with the call from M1.P4 to M3.P2 with a weight of 100 to result in a single arc from M1 to M3 with a weight of 150. Once intermodular call graph 234 has been generated, module reordering mechanism 230 uses the profile data contained in intermodular call graph 234 to reorder the modules (step 350).
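The aggregation performed for step 340 can be sketched as a sum over cross-module arcs, grouped by source and destination module. The sketch assumes the "module.procedure" naming of the sample program and uses only the arcs that the description names; the function names are hypothetical.

```python
from collections import defaultdict

# Only the arcs the text names; FIG. 4 may contain others.
GLOBAL_ARCS = {
    ("M1.P1", "M1.P2"): 16, ("M1.P1", "M2.P1"): 20, ("M1.P2", "M1.P4"): 20,
    ("M1.P3", "M3.P1"): 50, ("M1.P4", "M3.P2"): 100, ("M2.P2", "M1.P4"): 10,
    ("M3.P2", "M2.P2"): 20,
}

def module_of(procedure):
    """'M1.P4' -> 'M1' (naming convention of the sample program)."""
    return procedure.split(".")[0]

def intermodular_graph(arcs):
    """Combine cross-module arcs into one weighted arc per (source, destination) module."""
    combined = defaultdict(int)
    for (caller, callee), weight in arcs.items():
        src, dst = module_of(caller), module_of(callee)
        if src != dst:                      # calls inside a module are not represented
            combined[(src, dst)] += weight
    return dict(combined)

inter = intermodular_graph(GLOBAL_ARCS)
print(inter)
```

The two arcs from M1 into M3 (50 and 100) collapse into a single arc of weight 150, while the arcs M1 to M2 (20), M2 to M1 (10), and M3 to M2 (20) carry over unchanged.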
The global call graph 126 of FIG. 4, the intramodular call graphs 222 of FIG. 5, and the intermodular call graph 234 of FIG. 6 are shown to illustrate the general concepts of the invention. This specific example shows how the global call graph 126 may be easily partitioned into several intramodular call graphs 222 and one intermodular call graph 234. Those skilled in the art will recognize that many variations may be made within the scope of the invention. For example, each module could be further divided down into groups of procedures. In addition, different intermodular call graphs could be generated for different groups of modules. While the call graphs of FIGS. 4-6 are shown as directed call graphs, reordering using undirected call graphs is also within the scope of the invention. The present invention expressly encompasses any and all reordering of program portions within a computer program, regardless of the level of granularity.
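The undirected variation mentioned above can be sketched briefly. The text does not say how an undirected call graph would be derived, so this sketch assumes one plausible approach: the weights of opposite-direction arcs between the same pair of modules are summed into a single edge. The names INTERMODULAR_ARCS and to_undirected are hypothetical.

```python
# Directed intermodular arcs from the FIG. 6 example, as {(from, to): weight}.
INTERMODULAR_ARCS = {
    ("M1", "M2"): 20,
    ("M2", "M1"): 10,
    ("M3", "M2"): 20,
    ("M1", "M3"): 150,
}


def to_undirected(directed_arcs):
    """Fold a directed call graph into undirected edges.

    Assumption: arcs in opposite directions between the same two nodes are
    combined by adding their weights; the text does not prescribe this rule.
    """
    undirected = {}
    for (src, dst), weight in directed_arcs.items():
        edge = tuple(sorted((src, dst)))  # direction-independent key
        undirected[edge] = undirected.get(edge, 0) + weight
    return undirected


undirected = to_undirected(INTERMODULAR_ARCS)
for edge, weight in sorted(undirected.items()):
    print(edge, weight)
```

Under this assumption the M1 to M2 arc (20) and the M2 to M1 arc (10) become one edge of weight 30, which treats the affinity between two modules as symmetric when choosing their placement.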
The apparatus and methods disclosed herein provide a way to reorder portions of a computer program on a modular basis, thereby achieving enhanced performance through procedure reordering within modules and through module reordering while providing enhanced maintainability of the computer program by restricting procedure reordering to reordering within module boundaries. In this manner, the computer program may be easily maintained by replacing entire modules without adversely affecting the operation of modules in the computer program that do not change.
While the invention has been particularly shown and described with reference to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.

Claims (42)

We claim:
1. An apparatus comprising:
at least one processor;
a memory coupled to the at least one processor;
a computer program residing in the memory, the computer program including at least one procedure within each of a plurality of modules;
profile data residing in the memory that represents the execution frequency of at least one procedure within at least one of the plurality of modules in the computer program;
a reordering mechanism residing in the memory and executed by the at least one processor, the reordering mechanism reordering a plurality of procedures within at least one of the plurality of modules based on a portion of the profile data corresponding to the plurality of procedures and reordering the plurality of modules based on a portion of the profile data corresponding to procedures within the plurality of modules.
2. The apparatus of claim 1 wherein the profile data comprises a global call graph for the computer program.
3. The apparatus of claim 1 wherein the reordering mechanism comprises a call graph partitioner for generating at least one intramodular call graph from the profile data and for generating at least one intermodular call graph from the profile data.
4. The apparatus of claim 3 wherein the reordering mechanism comprises a procedure reordering mechanism that reorders the plurality of procedures within a selected one of the plurality of modules based on the profile data in the intramodular call graph that corresponds to the selected module.
5. The apparatus of claim 3 wherein the reordering mechanism comprises a module reordering mechanism that reorders the plurality of modules based on the profile data in the intermodular call graph.
6. An apparatus comprising:
at least one processor;
a memory coupled to the at least one processor;
a computer program residing in the memory, the computer program including at least one procedure within each of a plurality of modules;
profile data residing in the memory that represents the execution frequency of at least one procedure within at least one of the plurality of modules in the computer program;
a reordering mechanism residing in the memory and executed by the at least one processor, the reordering mechanism generating an intramodular call graph from the profile data for a selected one of the plurality of modules in the computer program and reordering the procedures within the selected module based on the profile data within the intramodular call graph.
7. The apparatus of claim 6 wherein the profile data comprises a global call graph for the computer program.
8. The apparatus of claim 6 wherein the reordering mechanism further generates an intermodular call graph from the profile data and reorders the plurality of modules based on the profile data within the intermodular call graph.
9. An apparatus comprising:
at least one processor;
a memory coupled to the at least one processor;
a computer program residing in the memory, the computer program including at least one procedure within each of a plurality of modules;
profile data residing in the memory that represents the execution frequency of at least one procedure within at least one of the plurality of modules in the computer program;
a reordering mechanism residing in the memory and executed by the at least one processor, the reordering mechanism generating an intermodular call graph from the profile data and reordering the plurality of modules based on the profile data within the intermodular call graph.
10. The apparatus of claim 9 wherein the profile data comprises a global call graph for the computer program.
11. An apparatus comprising:
at least one processor;
a memory coupled to the at least one processor;
a computer program residing in the memory, the computer program including at least one procedure within each of a plurality of modules;
a global call graph residing in the memory that represents the execution frequency of a plurality of procedures within at least one of the plurality of modules in the computer program;
a reordering mechanism residing in the memory and executed by the at least one processor, the reordering mechanism comprising:
a call graph partitioner, the call graph partitioner generating an intramodular call graph and an intermodular call graph from the global call graph;
a procedure reordering mechanism, the procedure reordering mechanism reordering the plurality of procedures within a selected one of the plurality of modules based on profile data in the intramodular call graph that corresponds to the selected module; and
a module reordering mechanism, the module reordering mechanism reordering the plurality of modules based on profile data within the intermodular call graph.
12. A method for improving execution speed of a computer program on a computer apparatus by reordering procedures and modules in the computer program according to profile data that represents the execution frequency of at least one procedure within at least one of a plurality of modules in the computer program, the computer program including at least one procedure within each of the plurality of modules, the method comprising the steps of:
reordering a plurality of procedures within at least one of the plurality of modules based on a portion of the profile data corresponding to the plurality of procedures; and
reordering the plurality of modules based on a portion of the profile data corresponding to procedures within the plurality of modules.
13. The method of claim 12 wherein the step of reordering the plurality of procedures within at least one of the plurality of modules comprises the steps of:
generating an intramodular call graph from the profile data for a selected one of the plurality of modules in the computer program; and
reordering the procedures within the selected module based on the profile data within the intramodular call graph.
14. The method of claim 13 wherein the step of generating the intramodular call graph comprises the steps of:
generating a global call graph for the computer program from the profile data, the global call graph containing the plurality of modules that each contain at least one procedure; and
removing arcs in the global call graph that cross the boundaries of the selected module.
15. The method of claim 12 wherein the step of reordering the plurality of modules comprises the steps of:
generating an intermodular call graph from the profile data; and
reordering the plurality of modules based on the profile data within the intermodular call graph.
16. The method of claim 15 wherein the step of generating the intermodular call graph comprises the steps of:
generating a global call graph for the computer program from the profile data, the global call graph containing the plurality of modules that each contain at least one procedure;
removing the procedures within the modules of the global call graph;
retaining the arcs between modules; and
combining multiple arcs that span from a first module to a second module into a single arc with a weight equal to the sum of the weights of the multiple arcs.
17. The method of claim 12 further comprising the step of generating a global call graph from the profile data.
18. A method for improving execution speed of a computer program on a computer apparatus by reordering procedures in the computer program according to profile data that represents the execution frequency of at least one procedure within at least one of a plurality of modules in the computer program, the computer program including at least one procedure within each of the plurality of modules, the method comprising the steps of:
generating an intramodular call graph from the profile data for a selected one of the plurality of modules in the computer program; and
reordering the procedures within the selected module based on the profile data within the intramodular call graph.
19. The method of claim 18 wherein the step of generating the intramodular call graph comprises the steps of:
generating a global call graph for the computer program from the profile data, the global call graph containing the plurality of modules that each contain at least one procedure; and
removing arcs in the global call graph that cross the boundaries of the selected module.
20. The method of claim 18 further comprising the steps of:
generating an intermodular call graph from the profile data; and
reordering the plurality of modules based on the profile data within the intermodular call graph.
21. The method of claim 20 wherein the step of generating the intermodular call graph comprises the steps of:
generating a global call graph for the computer program from the profile data, the global call graph containing the plurality of modules that each contain at least one procedure;
removing the procedures within the modules of the global call graph;
retaining the arcs between modules; and
combining multiple arcs that span from a first module to a second module into a single arc with a weight equal to the sum of the weights of the multiple arcs.
22. The method of claim 18 further comprising the step of generating a global call graph from the profile data.
23. A method for improving execution speed of a computer program on a computer apparatus by reordering procedures in the computer program according to profile data that represents the execution frequency of at least one procedure within at least one of a plurality of modules in the computer program, the computer program including at least one procedure within each of the plurality of modules, the method comprising the steps of:
generating an intermodular call graph from the profile data; and
reordering the plurality of modules based on the profile data within the intermodular call graph.
24. The method of claim 23 wherein the step of generating the intermodular call graph comprises the steps of:
generating a global call graph for the computer program from the profile data, the global call graph containing the plurality of modules that each contain at least one procedure;
removing the procedures within the modules of the global call graph;
retaining the arcs between modules; and
combining multiple arcs that span from a first module to a second module into a single arc with a weight equal to the sum of the weights of the multiple arcs.
25. The method of claim 23 further comprising the step of generating a global call graph from the profile data.
26. A method for improving execution speed of a computer program on a computer apparatus by reordering procedures and modules in the computer program according to profile data, the computer program including at least one procedure within each of the plurality of modules, the method comprising the steps of:
generating a global call graph for the computer program from the profile data;
generating an intramodular call graph from the global call graph for each of the plurality of modules in the computer program;
reordering the procedures within each of the plurality of modules based on profile data within the corresponding intramodular call graph;
generating an intermodular call graph from the global call graph; and
reordering the plurality of modules based on profile data within the intermodular call graph.
27. A program product comprising:
(A) a reordering mechanism that reorders a plurality of procedures within at least one of a plurality of modules in a computer program based on profile data corresponding to the plurality of procedures and that reorders the plurality of modules based on profile data corresponding to procedures within the plurality of modules; and
(B) signal bearing media bearing the reordering mechanism.
28. The program product of claim 27 wherein the signal bearing media comprises recordable media.
29. The program product of claim 27 wherein the signal bearing media comprises transmission media.
30. The program product of claim 27 wherein the profile data comprises a global call graph for the computer program.
31. The program product of claim 27 wherein the reordering mechanism comprises a call graph partitioner for generating at least one intramodular call graph from the profile data and for generating at least one intermodular call graph from the profile data.
32. The program product of claim 31 wherein the reordering mechanism comprises a procedure reordering mechanism that reorders the plurality of procedures within a selected one of the plurality of modules based on the profile data in the intramodular call graph that corresponds to the selected module.
33. The program product of claim 31 wherein the reordering mechanism comprises a module reordering mechanism that reorders the plurality of modules based on the profile data in the intermodular call graph.
34. A program product comprising:
(A) a reordering mechanism that generates an intramodular call graph from profile data for a selected one of the plurality of modules in a computer program and that reorders the procedures within the selected module based on the profile data within the intramodular call graph; and
(B) signal bearing media bearing the reordering mechanism.
35. The program product of claim 34 wherein the signal bearing media comprises recordable media.
36. The program product of claim 34 wherein the signal bearing media comprises transmission media.
37. The program product of claim 34 wherein the profile data comprises a global call graph for the computer program.
38. The program product of claim 34 wherein the reordering mechanism generates an intermodular call graph from the profile data and reorders the plurality of modules based on the profile data within the intermodular call graph.
39. A program product comprising:
(A) a reordering mechanism for generating an intermodular call graph from profile data relating to a computer program, the reordering mechanism reordering a plurality of modules within a computer program based on the profile data within the intermodular call graph; and
(B) signal bearing media bearing the reordering mechanism.
40. The program product of claim 39 wherein the signal bearing media comprises recordable media.
41. The program product of claim 39 wherein the signal bearing media comprises transmission media.
42. The apparatus of claim 39 wherein the profile data comprises a global call graph for the computer program.
US08/819,526 1997-03-17 1997-03-17 Method and apparatus for modular reordering of portions of a computer program based on profile data Expired - Fee Related US6029004A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US08/819,526 US6029004A (en) 1997-03-17 1997-03-17 Method and apparatus for modular reordering of portions of a computer program based on profile data

Publications (1)

Publication Number Publication Date
US6029004A (en) 2000-02-22

Family

ID=25228391

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
"Program Restructing Technique for Improving Memory Management Performance", IBM Technical Disclosure Bulletin, vol. 39, No. 03, Mar. 1996, pp. 203-205. *
"Statics Gathering and Analyzing Tool for Open Software Foundation's Distributed Computing Environment", IBM Technical Disclosure Bulletin, vol. 37, No. 02B, Feb. 1994, pp. 215-217. *
Balasa, F., et al., "Transformation of Nested Loops with Modulo Indexing to Affine Recurrences", Parallel Processing Letters, vol. 4, No. 3 (Sep. 1994), pp. 271-280. *
Conte, T.M., et al., "Hardware-Based Profiling: An Effective Technique for Profile-Driven Optimization", International Journal of Parallel Programming, vol. 24, No. 2, Apr. 1996, pp. 187-206. *
Conte, T.M., et al., "Using Branch Handling Hardware to Support Profile-Driven Optimization", International Symposium on Microarchitecture, 27th, Nov. 30-Dec. 2, 1994, pp. 12-21. *
Kishon, A., et al., "Semantics Directed Execution Monitoring", J. Functional Programming, vol. 5, No. 4, Oct. 1995, pp. 501-547. *
Pettis and Hansen, "Profile Guided Code Positioning", Proceedings of the ACM SIGPLAN '90 Conference on Programming Language Design and Implementation, Jun. 20-22, 1990, pp. 16-27. *
Speer, S.E., et al., "Improving UNIX Kernel Performance using Profile Based Optimization", 1994 Winter USENIX, Jan. 17-21, pp. 181-188. *
Youfeng, W, et al., "Static Branch Frequency and Program Profile Analysis", International Symposium on Microarchitecture, 27th, Nov. 30-Dec. 2, 1994, pp. 1-11. *

Cited By (75)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6175956B1 (en) * 1998-07-15 2001-01-16 International Business Machines Corporation Method and computer program product for implementing method calls in a computer system
US6532532B1 (en) * 1998-12-19 2003-03-11 International Computers Limited Instruction execution mechanism
US6938075B1 (en) 1998-12-24 2005-08-30 Computer Associates Think, Inc. Method and apparatus for hierarchical software distribution packages including composite packages
US8788792B2 (en) 1999-01-28 2014-07-22 Ati Technologies Ulc Apparatus for executing programs for a first computer architecture on a computer of a second architecture
US6789181B1 (en) 1999-01-28 2004-09-07 Ati International, Srl Safety net paradigm for managing two computer execution modes
US7941647B2 (en) 1999-01-28 2011-05-10 Ati Technologies Ulc Computer for executing two instruction sets and adds a macroinstruction end marker for performing iterations after loop termination
US8065504B2 (en) 1999-01-28 2011-11-22 Ati International Srl Using on-chip and off-chip look-up tables indexed by instruction address to control instruction execution in a processor
US8074055B1 (en) 1999-01-28 2011-12-06 Ati Technologies Ulc Altering data storage conventions of a processor when execution flows from first architecture code to second architecture code
US8121828B2 (en) 1999-01-28 2012-02-21 Ati Technologies Ulc Detecting conditions for transfer of execution from one computer instruction stream to another and executing transfer on satisfaction of the conditions
US8127121B2 (en) 1999-01-28 2012-02-28 Ati Technologies Ulc Apparatus for executing programs for a first computer architechture on a computer of a second architechture
US20080216073A1 (en) * 1999-01-28 2008-09-04 Ati International Srl Apparatus for executing programs for a first computer architechture on a computer of a second architechture
US6826748B1 (en) * 1999-01-28 2004-11-30 Ati International Srl Profiling program execution into registers of a computer
US6763452B1 (en) 1999-01-28 2004-07-13 Ati International Srl Modifying program execution based on profiling
US20050086451A1 (en) * 1999-01-28 2005-04-21 Ati International Srl Table look-up for control of instruction execution
US20050086650A1 (en) * 1999-01-28 2005-04-21 Ati International Srl Transferring execution from one instruction stream to another
US6954923B1 (en) 1999-01-28 2005-10-11 Ati International Srl Recording classification of instructions executed by a computer
US6418530B2 (en) * 1999-02-18 2002-07-09 Hewlett-Packard Company Hardware/software system for instruction profiling and trace selection using branch history information for branch predictions
US6647491B2 (en) 1999-02-18 2003-11-11 Hewlett-Packard Development Company, L.P. Hardware/software system for profiling instructions and selecting a trace using branch history information for branch predictions
US6453411B1 (en) * 1999-02-18 2002-09-17 Hewlett-Packard Company System and method using a hardware embedded run-time optimizer
US20030005271A1 (en) * 1999-02-18 2003-01-02 Hsu Wei C. System and method using a hardware embedded run-time optimizer
US6779107B1 (en) 1999-05-28 2004-08-17 Ati International Srl Computer execution by opportunistic adaptation
US6848100B1 (en) * 2000-03-31 2005-01-25 Intel Corporation Hierarchical software path profiling
US7120906B1 (en) * 2000-04-28 2006-10-10 Silicon Graphics, Inc. Method and computer program product for precise feedback data generation and updating for compile-time optimizations
US8468515B2 (en) 2000-11-17 2013-06-18 Hewlett-Packard Development Company, L.P. Initialization and update of software and/or firmware in electronic devices
US20070028226A1 (en) * 2000-11-17 2007-02-01 Shao-Chun Chen Pattern detection preprocessor in an electronic device update generation system
US8479189B2 (en) 2000-11-17 2013-07-02 Hewlett-Packard Development Company, L.P. Pattern detection preprocessor in an electronic device update generation system
US20040015927A1 (en) * 2001-03-23 2004-01-22 International Business Machines Corporation Percolating hot function store/restores to colder calling functions
US7036116B2 (en) * 2001-03-23 2006-04-25 International Business Machines Corporation Percolating hot function store/restores to colder calling functions
US6938249B2 (en) * 2001-11-19 2005-08-30 International Business Machines Corporation Compiler apparatus and method for optimizing loops in a computer program
US20030097652A1 (en) * 2001-11-19 2003-05-22 International Business Machines Corporation Compiler apparatus and method for optimizing loops in a computer program
US7036118B1 (en) * 2001-12-20 2006-04-25 Mindspeed Technologies, Inc. System for executing computer programs on a limited-memory computing machine
US8555273B1 (en) 2003-09-17 2013-10-08 Palm, Inc. Network for updating electronic devices
US7260744B2 (en) * 2003-10-21 2007-08-21 Sap Aktiengesellschaft Computer system diagnosis with user-developed procedure
US7263634B2 (en) 2003-10-21 2007-08-28 Sap Aktiengesellschaft Failures of computer system diagnostic procedures addressed in specified order
US20050097397A1 (en) * 2003-10-21 2005-05-05 Yuh-Cherng Wu Computer system diagnosis with user-developed procedure
US20050097400A1 (en) * 2003-10-21 2005-05-05 Yuh-Cherng Wu Failures of computer system diagnostic procedures addressed in specified order
US20050091003A1 (en) * 2003-10-22 2005-04-28 Yuh-Cherng Wu Computer system diagnostic procedures performed in specified order
US7260750B2 (en) 2003-10-22 2007-08-21 Sap Aktiengesellschaft Computer system diagnostic procedures performed in specified order
US7694291B2 (en) * 2004-04-06 2010-04-06 Hewlett-Packard Development Company, L.P. Build optimizer tool for efficient management of software builds for mobile devices
US20070050762A1 (en) * 2004-04-06 2007-03-01 Shao-Chun Chen Build optimizer tool for efficient management of software builds for mobile devices
US8578361B2 (en) 2004-04-21 2013-11-05 Palm, Inc. Updating an electronic device with update agent code
US8214819B2 (en) * 2004-07-09 2012-07-03 Hewlett-Packard Development Company, L.P. Determining call counts in a program
US20060020918A1 (en) * 2004-07-09 2006-01-26 David Mosberger Determining call counts in a program
US8526940B1 (en) 2004-08-17 2013-09-03 Palm, Inc. Centralized rules repository for smart phone customer care
US8141073B2 (en) 2004-09-09 2012-03-20 International Business Machines Corporation Generating sequence diagrams using call trees
US20080196011A1 (en) * 2004-09-09 2008-08-14 Kapil Bhandari Generating sequence diagrams using call trees
US8171449B2 (en) 2004-09-09 2012-05-01 International Business Machines Corporation Generating sequence diagrams using call trees
US8146055B2 (en) 2004-09-09 2012-03-27 International Business Machines Corporation Generating sequence diagrams using call trees
US20080235666A1 (en) * 2004-09-09 2008-09-25 International Business Machines Corporation Generating sequence diagrams using call trees
US20060053414A1 (en) * 2004-09-09 2006-03-09 International Business Machines Corporation Generating sequence diagrams using call trees
US7506320B2 (en) * 2004-09-09 2009-03-17 International Business Machines Corporation Generating sequence diagrams using call trees
US20090119650A1 (en) * 2004-09-09 2009-05-07 International Business Machines Corporation Generating sequence diagrams using call trees
US20060129997A1 (en) * 2004-12-13 2006-06-15 Stichnoth James M Optimized layout for managed runtime environment
US7661097B2 (en) * 2005-04-05 2010-02-09 Cisco Technology, Inc. Method and system for analyzing source code
US20060225056A1 (en) * 2005-04-05 2006-10-05 Cisco Technology, Inc. Method and system for analyzing source code
US20070169028A1 (en) * 2005-12-15 2007-07-19 Glenn Kasten Partitioning of non-volatile memories for vectorization
US20100306754A1 (en) * 2006-05-31 2010-12-02 International Business Machines Corporation Code partitioning for enhanced performance
US20070283328A1 (en) * 2006-05-31 2007-12-06 Taimur Javed Computer code partitioning for enhanced performance
US9600305B2 (en) * 2006-05-31 2017-03-21 International Business Machines Corporation Code partitioning for enhanced performance
US8893110B2 (en) 2006-06-08 2014-11-18 Qualcomm Incorporated Device management in a network
US8752044B2 (en) 2006-07-27 2014-06-10 Qualcomm Incorporated User experience and dependency management in a mobile device
US9081638B2 (en) 2006-07-27 2015-07-14 Qualcomm Incorporated User experience and dependency management in a mobile device
US20080034349A1 (en) * 2006-08-04 2008-02-07 Microsoft Corporation Incremental program modification based on usage data
US20080184209A1 (en) * 2007-01-31 2008-07-31 Lafrance-Linden David Profiling metrics for computer programs
US8739143B2 (en) * 2007-01-31 2014-05-27 Hewlett-Packard Development Company, L.P. Profiling metrics for computer programs
US8312417B2 (en) 2007-05-18 2012-11-13 International Business Machines Corporation Using dynamic call graphs for creating state machines
US8407676B2 (en) 2007-06-21 2013-03-26 Nxp B.V. Device and a method of managing a plurality of software items
US20100175053A1 (en) * 2007-06-21 2010-07-08 Nxp B.V. Device and a method of managing a plurality of software items
US20100082688A1 (en) * 2008-09-30 2010-04-01 Yahoo! Inc. System and method for reporting and analysis of media consumption data
US9600484B2 (en) * 2008-09-30 2017-03-21 Excalibur Ip, Llc System and method for reporting and analysis of media consumption data
US7685565B1 (en) 2009-03-19 2010-03-23 International Business Machines Corporation Run time reconfiguration of computer instructions
US8769515B2 (en) * 2009-03-23 2014-07-01 International Business Machines Corporation Semantic intensity based decomposition of software systems
US20100242019A1 (en) * 2009-03-23 2010-09-23 Dany Moshkovich Semantic Intensity Based Decomposition of Software Systems
US9292419B1 (en) * 2013-06-04 2016-03-22 The Mathworks, Inc. Code coverage and confidence determination
US9250895B2 (en) * 2014-06-24 2016-02-02 International Business Machines Corporation Establishing subsystem boundaries based on call flow graph topology

Similar Documents

Publication Publication Date Title
US6029004A (en) Method and apparatus for modular reordering of portions of a computer program based on profile data
US5960198A (en) Software profiler with runtime control to enable and disable instrumented executable
US5950009A (en) Method and apparatus for profile-based reordering of program portions in a computer program
US6026234A (en) Method and apparatus for profiling indirect procedure calls in a computer program
US8037465B2 (en) Thread-data affinity optimization using compiler
US7992141B2 (en) Method and apparatus for building executable computer programs using compiled program libraries
JP2777496B2 (en) Uses when profiling multi-processes in computer systems
Torrellas et al. Optimizing instruction cache performance for operating system intensive workloads
US20060048114A1 (en) Method and apparatus for dynamic compilation of selective code blocks of computer programming code to different memory locations
US9495136B2 (en) Using aliasing information for dynamic binary optimization
Ferrari The improvement of program behavior
US20120198427A1 (en) Ensuring Register Availability for Dynamic Binary Optimization
US7412369B1 (en) System and method for designing and optimizing the memory of an embedded processing system
Steenkiste et al. Lisp on a reduced-instruction-set processor: Characterization and optimization
US6360360B1 (en) Object-oriented compiler mechanism for automatically selecting among multiple implementations of objects
US6249912B1 (en) Method and apparatus for determining most recently used methods
Mueller et al. Fast instruction cache analysis via static cache simulation
US7143404B2 (en) Profile-guided data layout
JPH09212369A (en) Storage area allocation device
Gao et al. Reducing overheads for acquiring dynamic memory traces
Boothe Fast accurate simulation of large shared memory multiprocessors
Clémençon et al. Application-driven development of an integrated tool environment for distributed memory parallel processors
Badouel et al. Svmview: a performance tuning tool for dsm-based parallel computers
Rauchwerger et al. SmartApps: An application centric approach to high performance computing
Nakazawa et al. The MHETA execution model for heterogeneous clusters

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BORTNIKOV, VITA;MENDELSON, BILHA;NOVICK, MARK;AND OTHERS;REEL/FRAME:008655/0285;SIGNING DATES FROM 19970313 TO 19970805

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
FP Expired due to failure to pay maintenance fee

Effective date: 20040222

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362