US7310799B2 - Reducing load instructions via global data reordering - Google Patents


Info

Publication number
US7310799B2
Authority
US
United States
Legal status
Expired - Fee Related, expires
Application number
US10/335,356
Other versions
US20040128662A1 (en)
Inventor
Vadim Eisenberg
Maxim Gurevich
Gad Haber
Moshe Klausner
Current Assignee
International Business Machines Corp
Original Assignee
International Business Machines Corp
Application filed by International Business Machines Corp
Priority to US10/335,356
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION (assignment of assignors interest; see document for details). Assignors: EISENBERG, VADIM; GUREVICH, MAXIM; HABER, GAD; KLAUSNER, MOSHE
Publication of US20040128662A1
Application granted
Publication of US7310799B2
Expired - Fee Related


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/445 Exploiting fine grain parallelism, i.e. parallelism at instruction level
    • G06F8/443 Optimisation
    • G06F8/4441 Reducing the execution time required by the program code
    • G06F8/4442 Reducing the number of cache misses; Data prefetching

Definitions

  • FIG. 2 is an illustration of global data area 50, constructed and operated according to a preferred embodiment of the present invention. It is noted that in preferred embodiments of the present invention TOC 14 may be relocated to the top of global data area 50.
  • Global variables 12 may be located after TOC 14 in an area known herein as global variable area 52.
  • Rtoc 22 holds TOC anchor 20, which has a new value yyy′.
  • Global variables 12 may be ordered generally by reference frequency, from the most frequently referenced global variable 12 (the hottest) to the least frequently referenced global variable 12 (the coldest). Thus, the hottest global variable 12 may be closer to TOC 14 than colder global variables 12.
  • The address of the desired global variable 12x may be calculated by adding Rtoc 22, which holds TOC anchor 20 (yyy′), and the difference (xxx′ − yyy′) between the address of the desired global variable 12x (xxx′) and TOC anchor 20. Therefore, the load instruction from the above example may be replaced with an add immediate instruction as follows:
    • addi R5, Rtoc, xxx′ − yyy′
  • In some cases, the offset of global variable 12 from TOC anchor 20 may not fit into the immediate part of the add immediate instruction.
  • In such cases, the load instruction may be replaced by two or more add immediate instructions, for example an add immediate and an add immediate shifted, in the following way:
  • The immediate value imm1 may be added to Rtoc 22 and the result put into R5.
  • The immediate value imm2 may then be added to R5.
  • Redundant TOC entries 16, that is, entries whose variable addresses are referenced only by add immediate instructions, may be removed from TOC 14. This reduces the size of TOC 14 and thus the total size of the global data area.
  • $\mathrm{RANGE}_{\mathrm{load}}$ is the range of the displacement in the load instruction.
  • $\mathrm{RANGE}_{\mathrm{addi}}$ is the range of the immediate part of the addi instruction. It is noted that $\mathrm{RANGE}_{\mathrm{load}}$ may differ from $\mathrm{RANGE}_{\mathrm{addi}}$.
  • A number of advantages may be realized by reducing the size of TOC 14.
  • One advantage is improved performance of the running program.
  • Since TOC 14 is smaller, more global variables 12 may be within $\mathrm{RANGE}_{\mathrm{addi}}$.
  • Each load instruction referencing a global variable 12 within $\mathrm{RANGE}_{\mathrm{addi}}$ may be replaced with an addi instruction.
  • The more global variables 12 lie within $\mathrm{RANGE}_{\mathrm{addi}}$, the more add immediate instructions and the fewer load instructions there are, and hence the fewer memory accesses are needed to retrieve addresses of global variables 12.
  • The result is improved computation time.
  • Another advantage is that a smaller TOC 14 may improve the cache hit ratio.
  • The cache holds the most frequently requested data.
  • Typically, the cache holds the hottest TOC entries 16 and the hottest variables. If the hottest TOC entries 16 become redundant (due to the replacement of load instructions with addi instructions, explained above) and are removed from TOC 14, the cache has more room to hold variables and the remaining TOC entries 16. This improves cache utilization and performance.
  • TOC entries 16 may not be removed if they are exported to other modules, because other executable modules may reference those TOC entries 16 via TOC 14.
  • The present embodiment may be implemented on a 32-bit addressing machine.
  • The present invention is also applicable to 64-bit and other wider addressing machines. On such machines several addi instructions may be used, and still be included within the true spirit and scope of the present invention.
  • However, two or more addi instructions may run slower than a single load instruction. Therefore, in some embodiments it is not advisable to replace frequently executed load instructions with two or more add immediate instructions.
  • Preferred embodiments may reorder global variables 12 such that frequently referenced global variables 12 are located closer to TOC 14 than less frequently referenced global variables 12.
  • TOC anchor 20 may be relocated such that all TOC entries 16 remain accessible with a load instruction using Rtoc 22 and a displacement, and a maximum number of addresses of global variables 12 can be calculated using a single add immediate instruction.
  • TOC entries 16 are reordered in the order of the corresponding global variables 12.
  • Global variables 12 may also be reordered in groups of global variables 12 that are frequently referenced one after the other at run-time.
  • For example, suppose variable 12a is the most frequently referenced.
  • Variable 12m is also frequently referenced.
  • Variable 12p is frequently referenced immediately after variable 12a or 12m. Therefore, the order may be global variable 12a, followed by global variable 12m, followed by global variable 12p.
  • Global variable 12n may be placed separately from 12a, 12m and 12p.
  • this step may comprise creating a data connectivity graph illustrating the data usage connectivity of global data variables 12 .
  • FIGS. 3A-C are examples of a data connectivity graph constructed according to a preferred embodiment of the present invention.
  • a data connectivity graph (DCG) is a weighted directed graph representing the data usage connectivity of the global data.
  • FIG. 3A illustrates a portion of a program flow at the basic block level, including the execution rate of each instruction.
  • FIG. 3B illustrates the data connectivity within a TOC, as drawn from the program flow and execution rates of FIG. 3A .
  • FIG. 3C is the data connectivity graph resulting from FIG. 3B .
  • the illustrated DCG of FIG. 3C uses feedback information on the execution rate of each instruction, as shown in FIG. 3A .
  • the nodes of the DCG represent the TOC entries 16 which correspond to global variables 12 .
  • For example, node x1 represents TOC entry 16x1 for global variable X1.
  • Similarly, node z represents TOC entry 16z for global variable Z.
  • A directed edge in the DCG represents successive references of a program to a first TOC entry 16 and a second TOC entry 16.
  • A directed edge from node x2 to node y exists if, after a reference to TOC entry 16x2, the next reference of the program to TOC 14 is to TOC entry 16y.
  • The weight of the (x2, y) edge is the number of references to x2 that are followed by a reference to y at run time.
  • the DCG can be constructed from the integration of the feedback information on the execution frequency of the code, together with the control flow of the program code.
  • each node in the DCG may have a hotness measure attached to it.
  • the hotness of a global variable is set to be the sum of the execution counts of all the instructions that reference this variable address in the TOC.
  • this step may comprise alternative methods of data profiling.
  • Another example of such an alternative method is described in U.S. Pat. No. 5,850,549, described above in the Background. It is noted that the above referenced patent is just one of many methods to create a data connectivity graph. The above patent is meant by way of example only, and other methods are covered within the principles of the present invention.
  • In some cases, the constants may not be part of the global variable area. In such cases, the constants may be relocated and appended to global variable area 52.
  • Relocate TOC 14.
  • Preferably, TOC 14 is relocated to the beginning of global data area 50.
  • Alternatively, TOC 14 may be relocated to a location close to, but not directly at, the beginning of global data area 50.
  • global variables 12 may be reordered such that the frequently referenced global variables 12 are located closer to the TOC 14 than less frequently referenced global variables 12 .
  • groups of global variables 12 are placed in the order in which they are most frequently referenced by the program at run-time.
  • Reorder TOC entries: TOC entries 16 are reordered according to the order of their corresponding global variables 12.
  • Mark TOC entries 16 that can be removed: mark as removable the TOC entries 16 of non-exported variables that are never or rarely referenced by the program at run-time. TOC entries 16 of exported variables should not be marked as removable.
  • Update references to global variables: based on the restructured TOC 14, update the references to global variables 12. A method for updating is described hereinbelow.
  • references to the global data area 50 may be updated to reflect the relocations.
  • Variable addresses 18 may be updated to reflect the new locations of the associated global variables 12 .
  • references to the global data area 50 may be updated as follows:
  • References are updated to reflect the movement of global variables 12 relative to TOC anchor 20 caused by the reduction of TOC 14 and the resetting of TOC anchor 20.
  • load instructions may be replaced with any applicable immediate instruction that performs calculation rather than memory access, and still fall within the true spirit and scope of the invention.
  • any instruction that accesses the memory may be replaced with an add immediate instruction, or any other applicable immediate instruction that performs calculation, and still falls within the true spirit and scope of the invention.
  • The command examples herein are in the form "load register, base register, disp"; it is appreciated that other forms, such as "load register, base register, index register, disp", still fall within the true spirit and scope of the invention.
  • the present invention may be used to modify an existing global data area or may be especially useful for creating a global data area 50 .
  • the present invention may be implemented in a compiler, linker or in a post-linker, as applicable.
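The data connectivity graph and hotness measure described above can be sketched in Python. The text builds the DCG by integrating execution-frequency feedback with the program's control flow but does not fix a procedure; this sketch assumes per-basic-block execution counts plus profiled counts for control-flow edges, and the `Block` and `build_dcg` names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Block:
    count: int           # profiled execution count of the basic block
    toc_refs: List[str]  # TOC entries referenced, in instruction order


def build_dcg(blocks: Dict[str, Block],
              cfg_edges: Dict[Tuple[str, str], int]) -> Tuple[Counter, Counter]:
    """Return (hotness, edge weights) of a data connectivity graph.

    hotness[v]    : sum of execution counts of instructions that reference v
    edges[(x, y)] : estimated number of references to x followed by one to y
    cfg_edges     : assumed profiled counts of control-flow edges (src, dst)
    """
    hotness: Counter = Counter()
    edges: Counter = Counter()
    for blk in blocks.values():
        # Every referencing instruction runs as often as its block.
        for v in blk.toc_refs:
            hotness[v] += blk.count
        # Successive references inside one block follow each other every time it runs.
        for x, y in zip(blk.toc_refs, blk.toc_refs[1:]):
            edges[(x, y)] += blk.count
    # Across blocks: the last reference of src is followed by the first of dst.
    # Simplification: blocks without TOC references are not bridged.
    for (src, dst), n in cfg_edges.items():
        a, b = blocks[src].toc_refs, blocks[dst].toc_refs
        if a and b:
            edges[(a[-1], b[0])] += n
    return hotness, edges
```

Sorting the nodes by hotness gives a hot-first placement order, and heavy edges suggest which variables are worth placing next to each other.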

Abstract

A method for improving program performance including reordering a global data area of a program and for each load instruction referencing global variables within range of the immediate part of an add immediate instruction from a TOC anchor, replacing the load instruction with an add immediate instruction. The method may further include placing a TOC at the top, or within a predetermined distance from the top, of the global data area. The method may also include placing the global variables after the TOC, wherein more frequently referenced global variables are closer to the TOC than less frequently referenced global variables. Also, the method may further include placing in run-time order, groups of the global variables that frequently follow each other in run-time.

Description

FIELD OF THE INVENTION
The present invention relates to global data areas in general, and more specifically to reordering and optimization of global data areas.
BACKGROUND
Over the past several years many methods and tools have been developed to improve application performance. Many of these methods and tools are based on using data reordering/placement algorithms to improve the application's data locality.
“Cache-Conscious Data Placement”, by Calder et al., the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, Calif., 1998, and incorporated herein by reference, presents a method to improve a program's locality using data placement. Calder et al. discuss the use of all data variable types: global, local (stack), and dynamic (heap) variables, as well as constants.
The data reordering of Calder et al. is based on two kinds of feedback profiles. The first profile lists each object encountered during execution, the object's name, reference count, size and lifetime information. The second profile is a temporal relationship graph (TRG) between different variables accessed by the application. The nodes of the TRG graph are variables, while an edge between two variables provides an estimation of the number of cache conflicts that would arise if these two variables were overlapped in the same cache line.
U.S. Pat. No. 5,850,549, “Global Variable Coalescing”, to Blainey et al., assigned to the assignee of the present patent application and incorporated herein by reference, describes a weighted interference graph where each node represents a variable and each edge represents an access relationship between two variables. The weights on the edges represent the access frequency and the weights on the nodes represent the variable size.
However, these prior art data reordering optimization techniques do not go far enough; they do not realize the additional optimization opportunities revealed as a result of the data reordering. Therefore, there still exists a need to provide a method and apparatus to exploit the opportunities revealed as a result of data reordering, and thus to provide even greater application performance improvements.
SUMMARY
While prior art works have described data reordering, none of them have realized the additional optimization opportunities revealed as a result of the reordering. The present inventors have discovered that it is possible to exploit global data reordering by replacing load instructions that reference global data and constants with fast add immediate instructions. As a result, the present invention may obtain additional performance improvements.
It is therefore an objective of the present invention to realize optimization opportunities resulting from reordering program global data. Preferred embodiments of the present invention may therefore globally reorder the global data area such that a substantially maximum number of load instructions that reference global variables via a table of variable addresses, known as a Table of Contents (TOC) may be replaced with add immediate instructions.
It is an additional objective of the present invention to improve cache utilization by grouping data that is frequently referenced together in run-time. It is a further objective of the present invention to improve the data locality by reducing the size of the global data area in a given program.
According to one aspect of the present invention, there is therefore provided a method for improving program performance. The method includes reordering a global data area of a program and for each load instruction referencing global variables within range of the immediate part of an add immediate instruction from a TOC anchor, replacing the load instruction with an add immediate instruction. The method may further include placing a TOC at the top, or within a predetermined distance from the top, of the global data area.
The method may also include placing the global variables after the TOC, wherein more frequently referenced global variables are closer to the TOC than less frequently referenced global variables. Also, the method may further include placing in run-time order, groups of the global variables that frequently follow each other in run-time.
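A minimal Python sketch of the hot-first placement, for illustration only. It assumes each global variable is described by a size and a hotness value (for example, summed execution counts of the instructions that reference it), ignores alignment, and does not show the grouping of variables by run-time order. The names `GlobalVar`, `layout_after_toc` and `addi_reachable`, and the signed 16-bit immediate range, are assumptions rather than details taken from this document.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Assumed signed 16-bit immediate field (PowerPC-style addi).
IMM_MIN, IMM_MAX = -(1 << 15), (1 << 15) - 1


@dataclass
class GlobalVar:
    name: str
    size: int     # size in bytes
    hotness: int  # e.g. summed execution counts of referencing instructions


def layout_after_toc(toc_end: int, variables: List[GlobalVar]) -> Dict[str, int]:
    """Place variables right after the TOC, hottest first (alignment ignored)."""
    addresses: Dict[str, int] = {}
    addr = toc_end
    for var in sorted(variables, key=lambda v: v.hotness, reverse=True):
        addresses[var.name] = addr
        addr += var.size
    return addresses


def addi_reachable(anchor: int, addresses: Dict[str, int]) -> Set[str]:
    """Variables whose address equals anchor + imm for one in-range immediate."""
    return {name for name, a in addresses.items() if IMM_MIN <= a - anchor <= IMM_MAX}
```

Sorting by hotness alone is the simplest reading of "more frequently referenced variables closer to the TOC"; a fuller tool would also keep run-time groups together.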
In some alternatives, the method may include setting the TOC anchor to an address that will
    • 1) enable access to all TOC entries with a load instruction using Rtoc and a displacement, and
    • 2) enable a maximum number of addresses of global variables to be calculated using a single add immediate instruction.
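Conditions 1) and 2) can be combined into a small search, sketched here in Python under stated assumptions: both the load displacement and the addi immediate are taken to be signed 16-bit fields, the objective counts variables rather than weighting them by hotness, and `choose_anchor` is a hypothetical name. Condition 1) bounds the anchor to a window; within that window the sketch picks the anchor that places the most variable addresses within one addi of it.

```python
from typing import Iterable, Optional

# Assumed signed 16-bit fields for both the load displacement and the addi
# immediate; the two ranges may differ on a real machine.
DISP_MIN, DISP_MAX = -(1 << 15), (1 << 15) - 1
IMM_MIN, IMM_MAX = -(1 << 15), (1 << 15) - 1


def choose_anchor(toc_first: int, toc_last: int,
                  var_addrs: Iterable[int]) -> Optional[int]:
    """Return an anchor satisfying condition 1) that maximizes condition 2).

    toc_first / toc_last are the addresses of the first and last TOC entries.
    Returns None if no single anchor can reach the whole TOC.
    """
    lo = toc_last - DISP_MAX   # smallest anchor that still reaches the last entry
    hi = toc_first - DISP_MIN  # largest anchor that still reaches the first entry
    if lo > hi:
        return None
    addrs = list(var_addrs)

    def covered(anchor: int) -> int:
        # Number of variables reachable from the anchor with a single addi.
        return sum(IMM_MIN <= a - anchor <= IMM_MAX for a in addrs)

    # Each variable is reachable for anchors in [a - IMM_MAX, a - IMM_MIN], so the
    # best count is attained at lo or at the left end of one of these intervals.
    candidates = {lo} | {a - IMM_MAX for a in addrs if lo <= a - IMM_MAX <= hi}
    return max(sorted(candidates), key=covered)
```

Weighting each variable by its hotness, instead of counting it once, would be a natural variant when the goal is to replace the most frequently executed loads.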
The method may further include eliminating one or more TOC entries that contain variable addresses that are referenced by only add immediate instructions. Alternatively, it may include reordering the global data area to substantially maximize the number of load instructions replaceable with add immediate instructions.
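A minimal Python sketch of the elimination step. It assumes that after rewriting, each TOC entry is annotated with the set of opcodes still referencing it, that TOC entries are 4 bytes, and that entries exported to other modules are never removed, since other modules may still load them. The function names are hypothetical.

```python
from typing import Dict, List, Set


def removable_entries(refs: Dict[str, Set[str]], exported: Set[str]) -> Set[str]:
    """TOC entries that no load instruction still reads and that are not exported.

    refs maps each TOC entry to the opcodes referencing it after rewriting; an
    entry referenced only by add immediate instructions need not stay in memory.
    """
    return {entry for entry, ops in refs.items()
            if "load" not in ops and entry not in exported}


def compact_toc(toc: List[str], removable: Set[str], entry_size: int = 4) -> Dict[str, int]:
    """New byte offsets (from the TOC start) of the surviving entries."""
    kept = [entry for entry in toc if entry not in removable]
    return {entry: i * entry_size for i, entry in enumerate(kept)}
```

After compaction, the displacements of the remaining load instructions have to be recomputed from the new offsets and the, possibly reset, TOC anchor.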
The method may be implemented by a compiler, a linker, and/or a post-link tool.
According to one aspect of the present invention, there is therefore provided a method for improving cache utilization. The method includes reordering a global data area of a program and replacing one or more load instructions that reference global variables within range of the immediate part of the add immediate instruction from a TOC anchor, with the add immediate instruction. The method also includes eliminating one or more TOC entries that contain variable addresses that are referenced by only add immediate instructions, thereby reducing the size of the TOC. The method may be applied to improving cache ratio.
According to one aspect of the present invention, there is therefore provided a method for an improved global data area. The global data area includes a TOC within a predetermined distance from the top of the global data area, and a multiplicity of global variables after the TOC, wherein more frequently referenced global variables are closer to the TOC than less frequently referenced global variables.
The global data area may further include one or more groups of the global variables that frequently follow each other in run-time, placed in run-time order.
According to one aspect of the present invention, there is therefore provided a method for a computer program embodied on a computer-readable medium. The computer program includes a first code segment operative to place a TOC at the top, or within a predetermined distance from the top, of a global data area, and a second code segment operative to place a multiplicity of global variables after the TOC, wherein more frequently referenced global variables are closer to the TOC than less frequently referenced global variables.
The computer program may further include a third code segment operative to replace a load instruction with an add immediate instruction, for each load instruction referencing the global variables within range of the immediate part of the add immediate instruction from a TOC anchor, and a fourth code segment operative to place in run-time order one or more groups of the global variables that frequently follow each other in run-time.
According to one aspect of the present invention, there is therefore provided a method for a computer program embodied on a computer-readable medium. The computer program includes a first code segment operative to replace one or more load instructions referencing the global variables with an add immediate instruction and a second code segment operative to eliminate one or more TOC entries that contain variable addresses that are referenced by only add immediate instructions, thereby reducing the size of the TOC.
According to one aspect of the present invention, there is therefore provided a method for a system for improving program performance. The system includes means for reordering a global data area of a program and means for replacing said load instruction with an add immediate instruction for each load instruction referencing global variables within range of the immediate part of an add immediate instruction from a TOC anchor.
According to one aspect of the present invention, there is therefore provided a method for a system for improving cache utilization. The system includes means for reordering a global data area of a program, means for replacing one or more load instructions that reference global variables within range of the immediate part of the add immediate instruction from a TOC anchor, with said add immediate instruction, and means for eliminating one or more TOC entries that contain variable addresses that are referenced by only add immediate instructions, thereby reducing the size of said TOC.
BRIEF DESCRIPTION
Embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:
FIG. 1 is a block diagram of a global data area;
FIG. 2 is a block diagram of a global data area constructed and operated according to an embodiment of the present invention; and
FIGS. 3A-C are examples of a data connectivity graph constructed according to a preferred embodiment of the present invention.
DETAILED DESCRIPTION
The present method may reorder the global data of a given program. The data reordering enables the replacement of frequently executed load instructions that reference global data with fast add immediate instructions, reduces the total size of global data area, and improves the global data locality.
It is noted that the present invention is especially useful in global data reordering in reduced instruction set computer (RISC) architectures. The global reordering may be according to representative feedback information of each instruction, or basic block, execution rate in the code.
For ease of understanding the present invention, the global data mechanism in a RISC architecture is now discussed.
In many RISC architectures the machine instructions are too short to contain full absolute memory addresses as immediate operands. Thus, absolute memory addresses of referenced global variables and functions must be obtained by other means. To solve this problem, the affected RISC architectures typically reference memory using offsets from a base address, which is typically held in a register. Global variables in executables are typically referenced via a global table, commonly known as a Table of Contents (TOC).
It is noted that, unless otherwise noted, references herein to global variables also include references to function descriptors, constants, and any other program object that may be in the global data area.
Reference is now made to FIG. 1, an illustration of a global data area 10. Global data 10 comprises global variables 12 and a TOC 14. Global variables 12 may include global variables, constants and function addresses. It is noted that in FIG. 1 global variables 12 with dense hatching are more frequently referenced by a program than those global variables 12 with lighter hatching.
TOC 14 may contain addresses of global variables 12 of the program. Thus, TOC 14 may comprise a plurality of TOC entries 16; each TOC entry 16 may contain a variable address 18. Variable addresses 18 may be the absolute address of the associated global variable 12. Consequently, when a program references a specific global variable 12, the associated variable address 18 of the referenced global variable 12 is extracted from TOC 14.
When a program accesses a TOC entry 16 in TOC 14, the program uses a special register Rtoc 22. Rtoc 22 holds an address known as TOC anchor 20. TOC anchor 20 may be the address of the middle of the TOC 14, designated in FIG. 1 as yyy.
As an example, a program may reference a global variable 12 x having an address xxx. Typically, loading the address of global variable 12 x into a register R5 (not shown) is done with a load instruction, using Rtoc 22 and a displacement.
The example command may then read:
    • load R5, Rtoc, disp
    • or: load into R5 the content of the memory at the address that is calculated by adding Rtoc and disp.
where:
    • load is a load instruction,
    • disp=zzz−yyy,
    • Rtoc 22 holds TOC anchor 20, having a value yyy, and
    • zzz is the address of TOC entry 16 containing the variable address 18 x (xxx) of the desired global variable 12.
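To make the indirection concrete, the following Python sketch models memory as a dictionary and executes `load R5, Rtoc, disp`. The addresses are hypothetical values chosen for illustration, and the `load` helper is a stand-in for the machine instruction, not a real API.

```python
# Toy model of TOC-based addressing. All addresses below are hypothetical.


def load(memory: dict[int, int], regs: dict[str, int], rt: str, ra: str, disp: int) -> None:
    """Model of `load rt, ra, disp`: rt <- memory[ra + disp] (one memory access)."""
    regs[rt] = memory[regs[ra] + disp]


yyy = 0x2000_8000  # TOC anchor 20, held in Rtoc
zzz = 0x2000_8010  # address of the TOC entry that holds x's address
xxx = 0x2001_0040  # address of global variable x

memory = {zzz: xxx}   # the TOC entry contains the variable address xxx
regs = {"Rtoc": yyy}

disp = zzz - yyy      # disp = zzz - yyy
load(memory, regs, "R5", "Rtoc", disp)
print(hex(regs["R5"]))  # prints 0x20010040
```

The point of the model is that obtaining the address of x costs one memory read of the TOC entry, which is the access the reordering aims to remove.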
The inventors of the present invention discovered that by reordering the global data it is possible to replace load instructions with faster add immediate instructions. This eliminates many memory accesses, and may save considerable time. The following is a detailed explanation of global data reordering, and the subsequent replacement of load instructions with add immediate instructions.
Reference is now made to FIG. 2, an illustration of global data area 50, operated and constructed according to a preferred embodiment of the present invention. It is noted that in preferred embodiments of the present invention TOC 14 may be relocated to the top of global data area 50. Global variables 12 may be located after TOC 14 in an area known herein as global variable area 52. The Rtoc 22 holds TOC anchor 20 having a new value yyy′.
In preferred embodiments, within global variable area 52, global variables 12 may be ordered generally by reference frequency, from the most frequently referenced (hottest) global variable 12 to the least frequently referenced (coldest) global variable 12. Thus, the hottest global variables 12 may be closer to TOC 14 than colder global variables 12.
When the global variable 12 is close enough to TOC 14, it is possible to calculate the address of the variable 12 with an add immediate instruction using Rtoc 22 and an immediate value. Consequently, it may be possible to eliminate the memory access to the TOC 14, via the load instruction, by replacing the load instruction with an add immediate instruction.
Thus, returning to the above example, the address of the desired global variable 12 x may be calculated by adding Rtoc 22, holding the TOC anchor 20 (yyy′) and the difference (xxx′−yyy′) between the TOC anchor 20 (yyy′) and the address of desired global variable 12 x (xxx′). Therefore, the load instruction from the above example may be replaced with an add immediate instruction as follows:
    • addi R5, Rtoc, imm
    • or: add the immediate value to the address in Rtoc and put the result in R5.
where:
    • addi is an add immediate instruction,
    • Rtoc 22 holds TOC anchor 20 having a value yyy′, and
    • imm=xxx′−yyy′.
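The replacement can be sketched in the same toy model. This sketch assumes a 16-bit signed immediate field (as in the PowerPC `addi` instruction), so RANGEaddi covers offsets from -32768 to 32767; the addresses and helper names are hypothetical, and 32-bit wraparound is ignored.

```python
# Sketch assuming a 16-bit signed addi immediate. Addresses are hypothetical.
IMM_BITS = 16
IMM_MIN = -(1 << (IMM_BITS - 1))     # -32768
IMM_MAX = (1 << (IMM_BITS - 1)) - 1  #  32767


def fits_addi(offset: int) -> bool:
    """True if offset can be encoded as a single addi immediate (within RANGEaddi)."""
    return IMM_MIN <= offset <= IMM_MAX


def addi(regs: dict[str, int], rt: str, ra: str, imm: int) -> None:
    """Model of `addi rt, ra, imm`: rt <- ra + imm, with no memory access."""
    if not fits_addi(imm):
        raise ValueError(f"immediate {imm} does not fit in {IMM_BITS} bits")
    regs[rt] = regs[ra] + imm


yyy_p = 0x2000_8000  # new TOC anchor yyy'
xxx_p = 0x2000_9020  # new address xxx' of x, placed close to the TOC
regs = {"Rtoc": yyy_p}
addi(regs, "R5", "Rtoc", xxx_p - yyy_p)  # imm = xxx' - yyy' = 0x1020
```

Unlike the load, no memory is read: the address is produced by arithmetic on Rtoc alone, which is only possible because x now lies within RANGEaddi of the anchor.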
In alternative embodiments of the present invention, the offset of the global variable 12 from the TOC anchor 20 may not fit into the immediate part of the add immediate instruction. In such cases the load instruction may be replaced by two or more add immediate instructions; for example add immediate and add immediate shifted in the following way:
    • addi R5, Rtoc, imm1
    • addis R5, R5, imm2
    • where imm1=LSB(xxx′−yyy′), representing the least significant bits (LSB) of the offset between the TOC anchor 20 (yyy′) and the address xxx′ of the global variable 12 x, and
    • imm2=MSB(xxx′−yyy′), representing the most significant bits (MSB) of the offset between the TOC anchor 20 (yyy′) and the address xxx′ of the global variable 12 x.
Thus, in preferred embodiments of the present invention, the immediate value imm1 may first be added to Rtoc 22 and the result put into R5. Then the immediate value imm2, shifted into the most significant bits by the add immediate shifted instruction, may be added to R5.
Note that the sign of the offset must be preserved when it is split across the two immediate values.
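One way to preserve the sign is sketched below. It assumes, as on PowerPC, that `addi` sign-extends its 16-bit immediate and that `addis` adds its immediate shifted left by 16 bits. Under that assumption, taking the raw upper half as imm2 is off by one whenever the lower 16 bits, read as a signed value, are negative; the sketch compensates by rounding the high half (often called the "high adjusted" half). The function names are hypothetical.

```python
# Sketch assuming 16-bit signed immediates: addi sign-extends imm1 and
# addis adds imm2 << 16 (as on PowerPC). Names and values are hypothetical.
BITS = 16
HALF = 1 << (BITS - 1)  # 32768


def split_offset(offset: int) -> tuple[int, int]:
    """Split offset into (imm1, imm2) with imm1 + (imm2 << BITS) == offset.

    imm2 is rounded so that imm1 always lies in [-HALF, HALF - 1];
    this keeps the result correct after imm1 is sign-extended.
    """
    imm2 = (offset + HALF) >> BITS  # Python's >> is an arithmetic shift
    imm1 = offset - (imm2 << BITS)
    if not -HALF <= imm2 < HALF:
        raise ValueError("offset needs more than two add immediate instructions")
    return imm1, imm2


def addi_addis(rtoc: int, offset: int) -> int:
    """Model of `addi R5, Rtoc, imm1` followed by `addis R5, R5, imm2`."""
    imm1, imm2 = split_offset(offset)
    r5 = rtoc + imm1          # addi  R5, Rtoc, imm1
    r5 = r5 + (imm2 << BITS)  # addis R5, R5, imm2
    return r5


print(split_offset(0x18000))  # prints (-32768, 2)
```

For an offset of 0x18000 the lower half 0x8000 would be sign-extended to -32768, so imm2 must be 2 rather than 1 for the sum to come out right.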
It is additionally noted that if all the references to a specific global variable 12 are replaced with immediate references, then the associated TOC entry 16 of that specific global variable 12 may become redundant. Thus, in preferred embodiments of the present invention, redundant TOC entries 16 may be removed from TOC 14. Removal of redundant TOC entries 16 reduces the size of TOC 14, thus reducing the total size of the global data area.
For ease of understanding the following discussion, please note the following terms. RANGEload is the range of the displacement in the load instruction. RANGEaddi is the range of the immediate part of the addi instruction. It is noted that RANGEload may be different from RANGEaddi.
A number of advantages may be realized by reducing the size of the TOC 14. One advantage is improved run-time performance of the program. When TOC 14 is smaller, more global variables 12 may be within RANGEaddi. Recall that each load instruction referencing a global variable 12 within RANGEaddi may be replaced with an addi instruction. Accordingly, the more global variables 12 within RANGEaddi, the more add immediate instructions and the fewer load instructions, and hence the fewer memory accesses needed to retrieve the addresses of global variables 12. The result is improved computation time.
Additionally, a smaller TOC 14 may improve the cache ratio. Typically, the cache holds the most frequently requested data. In the specific case of the TOC, the cache holds the hottest TOC entries 16 and the hottest variables. If the hottest TOC entries 16 become redundant (due to the replacement of load with addi, explained above) and are removed from TOC 14, then the cache may have more room to hold variables and the remaining TOC entries 16. This improves cache utilization and performance.
It is noted that TOC entries 16 may not be removed if they are exported to other modules, because such TOC entries 16 may be referenced from other executable modules via TOC 14.
It is further noted that the present embodiment may be implemented on a 32-bit addressing machine. However, the present invention is also applicable to 64-bit or other larger-bit addressing machines. In such machines several addi instructions may be used, and still be included within the true spirit and scope of the present invention.
For some architectures, two or more addi instructions may run slower than a single load instruction. Therefore, in some embodiments it is not advisable to replace frequently executed load instructions with two or more add immediate instructions.
The inventors have additionally discovered that it is desirable to maximize the performance potential of replacing load instructions with add immediate ones. In order to do so, preferred embodiments may reorder the global variables 12 such that the frequently referenced global variables 12 are located closer to the TOC 14 than less frequently referenced global variables 12.
It is appreciated that alternative embodiments may be apparent to those skilled in the art that, while not being the embodiment described herein, do however place the more frequently referenced variables closer to the TOC 14 than the less frequently referenced variables. As an example, an alternative embodiment may place the most frequently referenced variables within a predefined distance from TOC 14. These alternative embodiments, while not being described herein, are readily implemented within the principles of the present invention, and are included within the true spirit and scope of the present invention.
A preferred embodiment for reordering is now explained. In preferred embodiments, in order to improve optimization of the program code, TOC anchor 20 may be relocated such that
    • a) all TOC entries 16 are accessible with the regular load instruction using a displacement from Rtoc 22, i.e. within the range of RANGEload, and
    • b) a maximum number of addresses of global variables 12 can be calculated with a single add immediate instruction using the Rtoc 22 and an immediate value, i.e. they are within the range of RANGEaddi. It is noted that while the present embodiment may describe the maximum number of global variables 12 within the range of RANGEaddi, it is appreciated that alternative embodiments may be apparent to those skilled in the art which, while not providing the maximum number of variables, do provide a "close to maximum" number of global variables 12 within the range of RANGEaddi. These alternative embodiments, while not being described herein, are readily implemented within the principles of the present invention, and are included within the true spirit and scope of the present invention.
In further preferred embodiments of the present invention, the TOC entries 16 are reordered in the order of the corresponding global variables.
In yet further preferred embodiments, global variables 12 may be reordered in groups of global variables 12 that are frequently referenced one-after-the-other in run-time.
Consequently, the global variables 12 most frequently referenced are reordered closer to TOC 14. Furthermore, groups of global variables 12 that frequently follow each other at run-time are placed in run-time order.
FIG. 2 shows an example group of global variables 12 that frequently follow each other at run-time. Variable 12 a is the most frequently referenced. During run-time, the reference following a reference to variable 12 a is typically to variable 12 m, and variable 12 p is frequently referenced after variable 12 m. However, variable 12 n is rarely referenced immediately after variable 12 a, 12 m or 12 p. Therefore, the order may be global variable 12 a, followed by global variable 12 m, followed by global variable 12 p, while global variable 12 n may be placed separately from 12 a, 12 m and 12 p.
It is noted that after repositioning TOC 14 and global variables 12 in the global data area 50, all the remaining entries in the TOC 14 and all the instructions that reference the global data area 50 need to be modified accordingly.
The following is an example of a method for global reordering according to a preferred embodiment of the present invention.
1) Determine the hotness of global variables 12 and the groups of global variables frequently referenced together in run-time. In preferred embodiments this step may comprise creating a data connectivity graph illustrating the data usage connectivity of global data variables 12.
Reference is now made to FIGS. 3A-C, examples of a data connectivity graph constructed according to a preferred embodiment of the present invention. A data connectivity graph (DCG) is a weighted directed graph representing the data usage connectivity of the global data. FIG. 3A illustrates a portion of a program flow at the basic block level, including the execution rate of each instruction. FIG. 3B illustrates the data connectivity within a TOC, as drawn from the program flow and execution rates of FIG. 3A. FIG. 3C is the data connectivity graph resulting from FIG. 3B.
The illustrated DCG of FIG. 3C uses feedback information on the execution rate of each instruction, as shown in FIG. 3A. The nodes of the DCG represent the TOC entries 16 which correspond to global variables 12. Thus node x1 represents TOC entry 16 x 1 for global variable X1; node z represents TOC entry 16 z for global variable Z.
A directed edge in the DCG represents successive references of a program to a first TOC entry 16 and a second TOC entry 16. As an example, a directed edge from a node x2 to a node y exists if, after a reference to TOC entry 16 x 2, the next reference of the program to the TOC 14 is to the TOC entry 16 y. The weight of the (x2,y) edge is the number of references to x2 that are followed by a reference to y at run time. The DCG can be constructed by integrating the feedback information on the execution frequency of the code with the control flow of the program code.
In some embodiments, each node in the DCG may have a hotness measure attached to it. In general, the hotness of a global variable is set to be the sum of the execution counts of all the instructions that reference this variable address in the TOC.
In alternative embodiments, this step may comprise alternative methods of data profiling. Another example of such an alternative method is described in U.S. Pat. No. 5,850,549, described above in the Background. It is noted that the above referenced patent is just one of many methods to create a data connectivity graph. The above patent is meant by way of example only, and other methods are covered within the principles of the present invention.
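The counting behind the DCG can be illustrated with a small Python sketch. As a simplifying assumption, it derives the counts from a recorded, ordered trace of TOC-entry references (the trace is hypothetical) rather than from profile counts combined with control flow as described above; it counts the same quantities the text defines, namely node hotness and edge weights.

```python
from collections import Counter


def build_dcg(trace: list[str]) -> tuple[Counter, Counter]:
    """Build a data connectivity graph from an ordered trace of TOC-entry references.

    Returns (hotness, edges): hotness[v] is the number of references to v's
    TOC entry; edges[(a, b)] is the number of references to a that are
    immediately followed by a reference to b.
    """
    hotness = Counter(trace)
    edges = Counter(zip(trace, trace[1:]))
    return hotness, edges


# Hypothetical trace, loosely modeled on the FIG. 2 grouping a -> m -> p.
trace = ["a", "m", "p", "a", "m", "p", "n", "a", "m"]
hotness, edges = build_dcg(trace)
print(edges[("a", "m")], edges[("m", "p")], edges[("p", "n")])  # prints 3 2 1
```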
2) Relocate constants. In some programs the constants may not be part of the global variable area. In such cases, the constants may be relocated and appended to the global variable area 52.
3) Relocate TOC 14. Preferably, relocate TOC 14 to the beginning of the global data area 50. Alternatively, TOC 14 may be relocated to a location close to, but not directly at, the beginning of the global data area 50.
4) Place all the global variables 12 in global variable area 52. In preferred embodiments of the present invention, global variables 12 may be reordered such that the frequently referenced global variables 12 are located closer to the TOC 14 than less frequently referenced global variables 12. In alternative embodiments, groups of global variables 12 are placed in the order in which they are most frequently referenced by the program at run-time.
In order to determine hotness and run-time order, refer to the data connectivity graph of step 1. One method for placing the global variable is described in “Cache-Conscious Data Placement”, by Calder et al., noted herein above in the Background. It is noted that the “Cache-Conscious Data Placement” is just one of many placement methods. The article is meant by way of example only, and other methods are covered within the principles of the present invention.
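As a rough illustration of step 4, the sketch below orders variables from DCG counts with a simple greedy chaining heuristic. This heuristic is chosen only for illustration and is not the Calder et al. algorithm cited above: it starts a group at the hottest unplaced variable and keeps appending the unplaced successor with the heaviest edge. The `min_edge` threshold is a hypothetical parameter for deciding when a successor is too weakly connected to join the group.

```python
from collections import Counter


def order_variables(hotness: Counter, edges: Counter, min_edge: int = 1) -> list[str]:
    """Greedy placement: start a group at the hottest unplaced variable, then
    repeatedly append the unplaced successor with the heaviest DCG edge."""
    order: list[str] = []
    unplaced = set(hotness)
    while unplaced:
        # The hottest remaining variable starts a new group (ties broken by name).
        current = min(unplaced, key=lambda v: (-hotness[v], v))
        while current is not None:
            order.append(current)
            unplaced.remove(current)
            # Candidate successors as (-weight, name), so min() picks the heaviest edge.
            nxt = [(-w, b) for (a, b), w in edges.items()
                   if a == current and b in unplaced and w >= min_edge]
            current = min(nxt)[1] if nxt else None
    return order


hotness = Counter({"a": 3, "m": 3, "p": 2, "n": 1})
edges = Counter({("a", "m"): 3, ("m", "p"): 2, ("p", "a"): 1, ("p", "n"): 1, ("n", "a"): 1})
print(order_variables(hotness, edges, min_edge=2))  # prints ['a', 'm', 'p', 'n']
```

With `min_edge=2` the weak edge from p to n is ignored, so n starts its own group after the a, m, p group; it ends up last only because it is also the coldest variable.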
5) Reorder TOC entries 16. The TOC entries 16 are reordered according to the order of their corresponding global variables 12.
6) Mark TOC entries 16 that can be removed. Mark as removable TOC entries 16 of non-exported variables that are never or rarely referenced by the program at run-time. It is noted that TOC entries 16 of exported variables should not be marked as removable.
7) Remove TOC entries 16 that are marked removable. Decrease the TOC size and relocate the global variables 12 accordingly.
8) Set TOC anchor 20. Set TOC anchor 20 to point to an address raddr in the global data area 50 where:
    • raddr=TOCstart+(RANGEload/2)
    • where TOCstart is the address of the beginning of TOC 14, and
    • RANGEload is the range of the displacement in the load instruction.
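Step 8 can be checked numerically. The sketch assumes a 16-bit signed load displacement and addi immediate, so RANGEload = RANGEaddi = 2^16 bytes; TOCstart and the TOC size are hypothetical values, and the helper names are illustrative.

```python
# Sketch assuming 16-bit signed load/addi fields (RANGEload = RANGEaddi = 2**16).
# TOCstart and the TOC size are hypothetical.
RANGE_LOAD = 1 << 16
RANGE_ADDI = 1 << 16


def toc_anchor(toc_start: int, range_load: int = RANGE_LOAD) -> int:
    """Step 8: raddr = TOCstart + RANGEload / 2."""
    return toc_start + range_load // 2


def within(addr: int, anchor: int, range_bytes: int) -> bool:
    """True if addr - anchor fits a signed field spanning range_bytes values."""
    return -(range_bytes // 2) <= addr - anchor < range_bytes // 2


toc_start = 0x1000_0000
toc_bytes = 1000 * 4              # 1000 four-byte TOC entries
anchor = toc_anchor(toc_start)    # 0x10008000

# Every TOC entry is reachable with a load displacement from Rtoc.
assert all(within(toc_start + 4 * i, anchor, RANGE_LOAD) for i in range(1000))
# Globals placed right after the TOC are addi-reachable up to TOCstart + 0xFFFF.
print(within(toc_start + toc_bytes, anchor, RANGE_ADDI))  # prints True
print(within(toc_start + 0x10000, anchor, RANGE_ADDI))    # prints False
```

Under these assumptions the TOC starts at the most negative displacement, so every byte that the TOC does not occupy in the window [TOCstart, TOCstart + 0xFFFF] is available to addi-reachable global variables; shrinking the TOC enlarges that share.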
9) Update references to the global variables. Based on the restructured TOC 14, update the references to the global variables 12. A method for updating is described hereinbelow.
10) Mark removable TOC entries. Mark as removable TOC entries 16 of non-exported global variables 12 which are within the range of the RANGEaddi from the TOC anchor 20,
    • where RANGEaddi is the range of the immediate part of the addi instruction.
11) If any of the remaining TOC entries 16 are marked as removable, return to step 7. Otherwise, if there are no more TOC entries 16 marked as removable, end.
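The iteration of steps 6 to 11 can be sketched as a fixed-point loop. The sketch assumes 4-byte TOC entries (a 32-bit machine), 16-bit signed load and addi fields, one TOC entry per variable, variables laid out immediately after the TOC in their reordered sequence, and a fixed TOCstart, so the anchor stays at TOCstart + RANGEload/2. For step 6 it treats only never-referenced, non-exported variables as cold. The function and variable names are hypothetical.

```python
# Sketch of the step 6-11 loop, assuming 4-byte TOC entries and 16-bit signed
# load/addi fields. Function, variable names, and sizes are hypothetical.
ENTRY_SIZE = 4
RANGE_LOAD = 1 << 16
RANGE_ADDI = 1 << 16


def shrink_toc(variables: list[tuple[str, int]], toc_start: int,
               exported: set[str], refs: dict[str, int]):
    """variables: (name, size) pairs in their reordered placement order.

    Returns (toc, anchor, layout): the surviving TOC entries, the TOC anchor,
    and the final address of each variable.
    """
    # Steps 6-7: drop entries of non-exported variables that are never referenced.
    toc = [n for n, _ in variables if n in exported or refs.get(n, 0) > 0]
    anchor = toc_start + RANGE_LOAD // 2  # step 8
    half = RANGE_ADDI // 2
    while True:
        # Lay the variables out immediately after the current (shrunken) TOC.
        addr = toc_start + ENTRY_SIZE * len(toc)
        layout = {}
        for name, size in variables:
            layout[name] = addr
            addr += size
        # Step 10: non-exported entries whose variable is addi-reachable.
        removable = {n for n in toc
                     if n not in exported and -half <= layout[n] - anchor < half}
        if not removable:  # step 11: nothing more to remove
            return toc, anchor, layout
        toc = [n for n in toc if n not in removable]  # back to step 7


names = [f"v{i}" for i in range(10_000)]
variables = [(n, 4) for n in names]
toc, anchor, layout = shrink_toc(variables, 0, exported={"v9999"},
                                 refs={n: 1 for n in names})
print(toc, layout["v0"])  # prints ['v9999'] 4
```

In this example the first pass removes only the entries of variables that already fall within RANGEaddi; the smaller TOC then pulls the remaining variables into range, so a second pass removes the rest except the exported entry, and a third pass finds nothing left to remove.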
It is appreciated that one or more of the steps of the above method may be omitted, or slightly modified, or carried out in a different order than shown, without departing from the true spirit and scope of the present invention.
Updating the References to the Global Data
After reducing the size of TOC 14, references to the global data area 50 may be updated to reflect the relocations. Variable addresses 18 may be updated to reflect the new locations of the associated global variables 12. In the code sections, references to the global data area 50 may be updated as follows:
1. Replace Load instructions with a single add immediate instruction. Replace Load instructions that reference TOC entries 16 of global variables 12 within RANGEaddi from TOC anchor 20 with a single add immediate instruction, where the immediate value is the offset between the address of global variable 12 and the address of TOC anchor 20.
2. Modify the load instructions according to the new location of TOC entries 16. For the load instructions whose TOC entry 16 was not removed, modify the load instructions that reference TOC entries 16 of global variables 12 outside the RANGEaddi from the TOC anchor 20 according to the new location of TOC entries 16.
3. Replace Load instructions with two or more add immediate instructions. For the load instructions whose TOC entry was removed, replace the load instructions that reference TOC entries 16 of global variables 12 outside the RANGEaddi from TOC anchor 20 with two or more add immediate instructions as follows:
    • 3.1. First an add immediate instruction, where the immediate value is the LSB of the offset between global variable 12 and TOC anchor 20.
    • 3.2. Then an add immediate shifted instruction, where the immediate value is the MSB of the offset between global variable 12 and TOC anchor 20.
4. Update references to reflect the movement of the global variables 12. References to global variables 12 that were replaced with add immediate instructions are updated in order to reflect the movement of the global variables 12 relative to the TOC anchor 20 due to the reduction of TOC 14, and the resetting of the TOC anchor 20.
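The per-reference decision in rules 1 to 3 can be sketched as a small rewriting function. It assumes 16-bit signed immediates and displacements, and that the caller passes the final addresses after relocation, which is how rule 4 is accounted for. The textual instruction syntax mirrors the examples above and is illustrative only; `rewrite_reference` is a hypothetical name.

```python
from typing import Optional

HALF = 1 << 15  # half of an assumed 16-bit signed immediate range


def rewrite_reference(var_addr: int, anchor: int, toc_slot: Optional[int],
                      rt: str = "R5") -> list[str]:
    """Return the code replacing `load rt, Rtoc, disp` for one global reference.

    var_addr and toc_slot are final addresses after relocation (rule 4);
    toc_slot is None when the variable's TOC entry was removed.
    """
    off = var_addr - anchor
    if -HALF <= off < HALF:    # rule 1: a single addi
        return [f"addi {rt}, Rtoc, {off}"]
    if toc_slot is not None:   # rule 2: keep the load, with the new displacement
        return [f"load {rt}, Rtoc, {toc_slot - anchor}"]
    imm2 = (off + HALF) >> 16  # rule 3: addi + addis, sign-preserving split
    imm1 = off - (imm2 << 16)
    return [f"addi {rt}, Rtoc, {imm1}", f"addis {rt}, {rt}, {imm2}"]


anchor = 0x8000
print(rewrite_reference(0x9000, anchor, None))    # prints ['addi R5, Rtoc, 4096']
print(rewrite_reference(0x20000, anchor, 0x100))  # prints ['load R5, Rtoc, -32512']
print(rewrite_reference(0x20000, anchor, None))   # prints ['addi R5, Rtoc, -32768', 'addis R5, R5, 2']
```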
It is appreciated that those skilled in the art may be aware of various other modifications which, while not specifically shown herein, are nevertheless within the true spirit and scope of the invention. As an example, load instructions may be replaced with any applicable immediate instruction that performs a calculation rather than a memory access, and still fall within the true spirit and scope of the invention. Likewise, any instruction that accesses memory may be replaced with an add immediate instruction, or any other applicable immediate instruction that performs a calculation, and still fall within the true spirit and scope of the invention. Similarly, although the command examples herein are in the form load register, base register, disp, it is appreciated that other forms, such as load register, base register, index register, disp, still fall within the true spirit and scope of the invention.
It is noted that the present invention may be used to modify an existing global data area or may be especially useful for creating a global data area 50. Thus the present invention may be implemented in a compiler, linker or in a post-linker, as applicable.
It will thus be appreciated that the preferred embodiments described above are cited by way of example, and that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present invention includes both combinations and subcombinations of the various features described hereinabove, as well as variations and modifications thereof which would occur to a person skilled in the art upon reading the foregoing description and which are not disclosed in the prior art.

Claims (20)

1. A method for improving program performance, the method comprising the steps of:
reordering global variables within a global data area of a program by moving global variables having addresses associated therewith in a Table of Contents (TOC) which addresses are more frequently referenced to be closer to a TOC anchor than global variables having addresses associated therewith in said TOC which addresses are less frequently referenced; and
for each load instruction referencing global variables within range of the immediate part of an add immediate instruction from said TOC anchor, replacing said load instruction with an add immediate instruction.
2. The method of claim 1, further comprising the step of:
placing said TOC at the top of said global data area.
3. The method of claim 1, further comprising the step of:
placing said TOC within a predetermined distance from the top of said global data area.
4. The method of claim 1, further comprising the step of:
placing in run-time order, groups of said global variables that frequently follow each other in run-time.
5. The method of claim 1, further comprising the step of
setting said TOC anchor to an address that will enable access to all TOC entries with a load instruction using Rtoc and a displacement, and a maximum number of addresses of global variables can be calculated using a single add immediate instruction.
6. The method of claim 1, and further comprising the step of:
eliminating one or more TOC entries that contain variable addresses that are referenced by only add immediate instructions.
7. The method of claim 1, and further comprising the step of:
reordering said global data area to substantially maximize the number of load instructions replaceable with add immediate instructions.
8. The method of claim 1, wherein said claim 1 is implemented by at least one of the following: a compiler, a linker, and a post-link tool.
9. A method according to claim 1 and also comprising the step of:
reordering said addresses of said global variables in said TOC by moving said addresses to correspond to said reordering of said global variables.
10. A method for improving cache utilization, the method comprising the steps of:
reordering global variables within a global data area of a program by moving global variables having addresses associated therewith in a Table of Contents (TOC) which addresses are more frequently referenced to be closer to a TOC anchor than global variables having addresses associated therewith in said TOC which addresses are less frequently referenced; and
replacing one or more load instructions that reference global variables within range of the immediate part of the add immediate instruction from said TOC anchor, with said add immediate instruction; and
eliminating one or more TOC entries that contain variable addresses that are referenced by only add immediate instructions, thereby reducing the size of said TOC.
11. The method of claim 10, and further comprising the steps of:
placing said TOC at the top of said global data area.
12. The method of claim 10, and further comprising the steps of:
placing said TOC within a predetermined distance from the top of said global data area.
13. The method of claim 12, and further comprising the step of
placing in run-time order, groups of said global variables that frequently follow each other in run-time.
14. The method of claim 10, and further comprising the steps of:
maximizing the number of said global variables within range of the immediate part of the add immediate instruction from said TOC anchor.
15. The method of claim 10, wherein any of said steps of reordering, replacing and eliminating are applied to improving cache ratio.
16. A computer program embodied on a computer-readable medium, the computer program comprising:
a first code segment operative to place a Table of Contents (TOC) at the top of a global data area,
a second code segment operative to place after said TOC a multiplicity of global variables, wherein global variables having addresses associated therewith in said TOC which are more frequently referenced are reordered to be closer to a TOC anchor than global variables having addresses associated therewith in said TOC which are less frequently referenced, and
a third code segment operative to replace a load instruction with an add immediate instruction, for each load instruction referencing said global variables within a range of the immediate part of the add immediate instruction from a TOC.
17. The computer program of claim 16, and further comprising:
a fourth code segment operative to place in run-time order, one or more groups of said global variables that frequently follow each other in run-time.
18. A computer program embodied on a computer-readable medium, the computer program comprising:
a first code segment operative to place a TOC within a predetermined distance from the top of a global data area, and
a second code segment operative to place after said TOC a multiplicity of global variables, wherein global variables having addresses associated therewith in said TOC which are more frequently referenced are reordered to be closer to a TOC anchor than global variables having addresses associated therewith in said TOC which are less frequently referenced, and
a third code segment operative to replace a load instruction with an add immediate instruction, for each load instruction referencing said global variables within a range of the immediate part of the add immediate instruction from a TOC.
19. A system for improving program performance, the system comprising:
means for reordering global variables within a global data area of a program, wherein global variables having addresses associated therewith in a Table of Contents (TOC) which addresses are more frequently referenced are reordered to be closer to said TOC than global variables having addresses associated therewith in said TOC which addresses are less frequently referenced; and
means for replacing a load instruction with an add immediate instruction for each load instruction referencing global variables within range of the immediate part of an add immediate instruction from a TOC anchor.
20. A system for improving cache utilization, the system comprising:
means for reordering global variables within a global data area of a program, wherein global variables having addresses associated therewith in a Table of Contents (TOC) which addresses are more frequently referenced are reordered to be closer to said TOC than global variables having addresses associated therewith in said TOC which addresses are less frequently referenced;
means for replacing one or more load instructions that reference global variables within range of the immediate part of the add immediate instruction from a TOC anchor, with said add immediate instruction, and
means for eliminating one or more TOC entries that contain variable addresses that are referenced by only add immediate instructions, thereby reducing the size of said TOC.
US10/335,356 2002-12-31 2002-12-31 Reducing load instructions via global data reordering Expired - Fee Related US7310799B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US10/335,356 US7310799B2 (en) 2002-12-31 2002-12-31 Reducing load instructions via global data reordering

Publications (2)

Publication Number Publication Date
US20040128662A1 US20040128662A1 (en) 2004-07-01
US7310799B2 true US7310799B2 (en) 2007-12-18

Family

ID=32655329

Country Status (1)

Country Link
US (1) US7310799B2 (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080141233A1 (en) * 2006-12-07 2008-06-12 International Business Machines Corporation Presenting machine instructions in a machine-independent tree form suitable for post-link optimizations
US7676799B1 (en) * 2005-06-10 2010-03-09 Sun Microsystems, Inc. Address simplification by binary transformation
US8607211B2 (en) 2011-10-03 2013-12-10 International Business Machines Corporation Linking code for an enhanced application binary interface (ABI) with decode time instruction optimization
US8612952B2 (en) 2010-04-07 2013-12-17 International Business Machines Corporation Performance optimization based on data accesses during critical sections
US8615745B2 (en) 2011-10-03 2013-12-24 International Business Machines Corporation Compiling code for an enhanced application binary interface (ABI) with decode time instruction optimization
US8756591B2 (en) 2011-10-03 2014-06-17 International Business Machines Corporation Generating compiled code that indicates register liveness
US8959502B2 (en) 2011-07-27 2015-02-17 International Business Machines Corporation Processing table of content access overflow in an application
US9513828B2 (en) * 2015-03-25 2016-12-06 International Business Machines Corporation Accessing global data from accelerator devices
US20190087335A1 (en) * 2017-09-19 2019-03-21 International Business Machines Corporation Table of contents cache entry having a pointer for a range of addresses
US10534609B2 (en) 2017-08-18 2020-01-14 International Business Machines Corporation Code-specific affiliated register prediction
US10558461B2 (en) 2017-08-18 2020-02-11 International Business Machines Corporation Determining and predicting derived values used in register-indirect branching
US10564974B2 (en) 2017-08-18 2020-02-18 International Business Machines Corporation Determining and predicting affiliated registers based on dynamic runtime control flow analysis
US10579385B2 (en) 2017-08-18 2020-03-03 International Business Machines Corporation Prediction of an affiliated register
US10620955B2 (en) 2017-09-19 2020-04-14 International Business Machines Corporation Predicting a table of contents pointer value responsive to branching to a subroutine
US10705973B2 (en) 2017-09-19 2020-07-07 International Business Machines Corporation Initializing a data structure for use in predicting table of contents pointer values
US10713050B2 (en) 2017-09-19 2020-07-14 International Business Machines Corporation Replacing Table of Contents (TOC)-setting instructions in code with TOC predicting instructions
US10831457B2 (en) 2017-09-19 2020-11-10 International Business Machines Corporation Code generation relating to providing table of contents pointer values
US10884745B2 (en) 2017-08-18 2021-01-05 International Business Machines Corporation Providing a predicted target address to multiple locations based on detecting an affiliated relationship
US10884929B2 (en) 2017-09-19 2021-01-05 International Business Machines Corporation Set table of contents (TOC) register instruction
US10901741B2 (en) 2017-08-18 2021-01-26 International Business Machines Corporation Dynamic fusion of derived value creation and prediction of derived values in a subroutine branch sequence
US10908911B2 (en) 2017-08-18 2021-02-02 International Business Machines Corporation Predicting and storing a predicted target address in a plurality of selected locations
US11061575B2 (en) * 2017-09-19 2021-07-13 International Business Machines Corporation Read-only table of contents register
US11150904B2 (en) 2017-08-18 2021-10-19 International Business Machines Corporation Concurrent prediction of branch addresses and update of register contents

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100361094C (en) * 2005-07-01 2008-01-09 Huawei Technologies Co., Ltd. (华为技术有限公司) Method for saving global variable internal memory space
US7784042B1 (en) * 2005-11-10 2010-08-24 Oracle America, Inc. Data reordering for improved cache operation
US11321236B2 (en) * 2018-01-08 2022-05-03 Microsoft Technology Licensing, Llc. Reduced instructions to generate global variable addresses

Citations (7)

Publication number Priority date Publication date Assignee Title
US5774722A (en) * 1995-12-14 1998-06-30 International Business Machines Corporation Method for efficient external reference resolution in dynamically linked shared code libraries in single address space operating systems
US5850549A (en) * 1995-12-28 1998-12-15 International Business Machines Corporation Global variable coalescing
US5923882A (en) * 1995-08-29 1999-07-13 Silicon Graphics, Inc. Cross-module optimization for dynamically-shared programs and libraries
US6360361B1 (en) * 1999-03-15 2002-03-19 Microsoft Corporation Field reordering to optimize cache utilization
US6594678B1 (en) * 2000-01-05 2003-07-15 Sun Microsystems, Inc. Methods and apparatus for improving locality of reference through memory management
US6665671B2 (en) * 2001-04-04 2003-12-16 Hewlett-Packard Development Company, L.P. System and method for optimization of shared data
US6862729B1 (en) * 2000-04-04 2005-03-01 Microsoft Corporation Profile-driven data layout optimization

Non-Patent Citations (2)

Title
Calder et al., "Cache-Conscious Data Placement," Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems, ACM Press (Oct. 1998), pp. 139-149. *
Chow et al., "How Many Addressing Modes are Enough?," Proceedings of the Second International Conference on Architectural Support for Programming Languages and Operating Systems, IEEE Computer Society Press (Oct. 1987), pp. 117-121. *

Cited By (52)

Publication number Priority date Publication date Assignee Title
US7676799B1 (en) * 2005-06-10 2010-03-09 Sun Microsystems, Inc. Address simplification by binary transformation
US20080141233A1 (en) * 2006-12-07 2008-06-12 International Business Machines Corporation Presenting machine instructions in a machine-independent tree form suitable for post-link optimizations
US8656381B2 (en) * 2006-12-07 2014-02-18 International Business Machines Corporation Presenting machine instructions in a machine-independent tree form suitable for post-link optimizations
US8612952B2 (en) 2010-04-07 2013-12-17 International Business Machines Corporation Performance optimization based on data accesses during critical sections
US8959502B2 (en) 2011-07-27 2015-02-17 International Business Machines Corporation Processing table of content access overflow in an application
US8607211B2 (en) 2011-10-03 2013-12-10 International Business Machines Corporation Linking code for an enhanced application binary interface (ABI) with decode time instruction optimization
US8612959B2 (en) 2011-10-03 2013-12-17 International Business Machines Corporation Linking code for an enhanced application binary interface (ABI) with decode time instruction optimization
US8615745B2 (en) 2011-10-03 2013-12-24 International Business Machines Corporation Compiling code for an enhanced application binary interface (ABI) with decode time instruction optimization
US8615746B2 (en) 2011-10-03 2013-12-24 International Business Machines Corporation Compiling code for an enhanced application binary interface (ABI) with decode time instruction optimization
US8756591B2 (en) 2011-10-03 2014-06-17 International Business Machines Corporation Generating compiled code that indicates register liveness
US9513828B2 (en) * 2015-03-25 2016-12-06 International Business Machines Corporation Accessing global data from accelerator devices
US9513832B2 (en) * 2015-03-25 2016-12-06 International Business Machines Corporation Accessing global data from accelerator devices
US11150908B2 (en) 2017-08-18 2021-10-19 International Business Machines Corporation Dynamic fusion of derived value creation and prediction of derived values in a subroutine branch sequence
US10908911B2 (en) 2017-08-18 2021-02-02 International Business Machines Corporation Predicting and storing a predicted target address in a plurality of selected locations
US10884745B2 (en) 2017-08-18 2021-01-05 International Business Machines Corporation Providing a predicted target address to multiple locations based on detecting an affiliated relationship
US11150904B2 (en) 2017-08-18 2021-10-19 International Business Machines Corporation Concurrent prediction of branch addresses and update of register contents
US10534609B2 (en) 2017-08-18 2020-01-14 International Business Machines Corporation Code-specific affiliated register prediction
US10558461B2 (en) 2017-08-18 2020-02-11 International Business Machines Corporation Determining and predicting derived values used in register-indirect branching
US10564974B2 (en) 2017-08-18 2020-02-18 International Business Machines Corporation Determining and predicting affiliated registers based on dynamic runtime control flow analysis
US10579385B2 (en) 2017-08-18 2020-03-03 International Business Machines Corporation Prediction of an affiliated register
US10929135B2 (en) 2017-08-18 2021-02-23 International Business Machines Corporation Predicting and storing a predicted target address in a plurality of selected locations
US11314511B2 (en) 2017-08-18 2022-04-26 International Business Machines Corporation Concurrent prediction of branch addresses and update of register contents
US10901741B2 (en) 2017-08-18 2021-01-26 International Business Machines Corporation Dynamic fusion of derived value creation and prediction of derived values in a subroutine branch sequence
US10891133B2 (en) 2017-08-18 2021-01-12 International Business Machines Corporation Code-specific affiliated register prediction
US10884747B2 (en) 2017-08-18 2021-01-05 International Business Machines Corporation Prediction of an affiliated register
US10884746B2 (en) 2017-08-18 2021-01-05 International Business Machines Corporation Determining and predicting affiliated registers based on dynamic runtime control flow analysis
US10719328B2 (en) 2017-08-18 2020-07-21 International Business Machines Corporation Determining and predicting derived values used in register-indirect branching
US10884748B2 (en) 2017-08-18 2021-01-05 International Business Machines Corporation Providing a predicted target address to multiple locations based on detecting an affiliated relationship
US10754656B2 (en) 2017-08-18 2020-08-25 International Business Machines Corporation Determining and predicting derived values
US20190087337A1 (en) * 2017-09-19 2019-03-21 International Business Machines Corporation Table of contents cache entry having a pointer for a range of addresses
US10831457B2 (en) 2017-09-19 2020-11-10 International Business Machines Corporation Code generation relating to providing table of contents pointer values
US10884929B2 (en) 2017-09-19 2021-01-05 International Business Machines Corporation Set table of contents (TOC) register instruction
US10884930B2 (en) 2017-09-19 2021-01-05 International Business Machines Corporation Set table of contents (TOC) register instruction
US10725918B2 (en) * 2017-09-19 2020-07-28 International Business Machines Corporation Table of contents cache entry having a pointer for a range of addresses
US10713051B2 (en) 2017-09-19 2020-07-14 International Business Machines Corporation Replacing table of contents (TOC)-setting instructions in code with TOC predicting instructions
US10713050B2 (en) 2017-09-19 2020-07-14 International Business Machines Corporation Replacing Table of Contents (TOC)-setting instructions in code with TOC predicting instructions
US10705973B2 (en) 2017-09-19 2020-07-07 International Business Machines Corporation Initializing a data structure for use in predicting table of contents pointer values
US10896030B2 (en) 2017-09-19 2021-01-19 International Business Machines Corporation Code generation relating to providing table of contents pointer values
US10691600B2 (en) * 2017-09-19 2020-06-23 International Business Machines Corporation Table of contents cache entry having a pointer for a range of addresses
US10656946B2 (en) 2017-09-19 2020-05-19 International Business Machines Corporation Predicting a table of contents pointer value responsive to branching to a subroutine
US10620955B2 (en) 2017-09-19 2020-04-14 International Business Machines Corporation Predicting a table of contents pointer value responsive to branching to a subroutine
US10949350B2 (en) * 2017-09-19 2021-03-16 International Business Machines Corporation Table of contents cache entry having a pointer for a range of addresses
US10963382B2 (en) * 2017-09-19 2021-03-30 International Business Machines Corporation Table of contents cache entry having a pointer for a range of addresses
US10977185B2 (en) 2017-09-19 2021-04-13 International Business Machines Corporation Initializing a data structure for use in predicting table of contents pointer values
US11010164B2 (en) 2017-09-19 2021-05-18 International Business Machines Corporation Predicting a table of contents pointer value responsive to branching to a subroutine
US11061575B2 (en) * 2017-09-19 2021-07-13 International Business Machines Corporation Read-only table of contents register
US11061576B2 (en) * 2017-09-19 2021-07-13 International Business Machines Corporation Read-only table of contents register
US11138113B2 (en) 2017-09-19 2021-10-05 International Business Machines Corporation Set table of contents (TOC) register instruction
US11138127B2 (en) 2017-09-19 2021-10-05 International Business Machines Corporation Initializing a data structure for use in predicting table of contents pointer values
US20190340126A1 (en) * 2017-09-19 2019-11-07 International Business Machines Corporation Table of contents cache entry having a pointer for a range of addresses
US20190340127A1 (en) * 2017-09-19 2019-11-07 International Business Machines Corporation Table of contents cache entry having a pointer for a range of addresses
US20190087335A1 (en) * 2017-09-19 2019-03-21 International Business Machines Corporation Table of contents cache entry having a pointer for a range of addresses

Also Published As

Publication number Publication date
US20040128662A1 (en) 2004-07-01

Similar Documents

Publication Publication Date Title
US7310799B2 (en) Reducing load instructions via global data reordering
US6584549B2 (en) System and method for prefetching data into a cache based on miss distance
US5848423A (en) Garbage collection system and method for locating root set pointers in method activation records
KR100512665B1 (en) Space-limited marking structure for tracing garbage collectors
US4928239A (en) Cache memory with variable fetch and replacement schemes
US7096321B2 (en) Method and system for a cache replacement technique with adaptive skipping
EP0317080A2 (en) Method and apparatus using variable ranges to support symbolic debugging of optimized code
EP2517097A1 (en) Methods and apparatuses to allocate file storage via tree representations of a bitmap
US6484228B2 (en) Method and apparatus for data compression and decompression for a data processor system
US5355478A (en) Method for avoiding cache misses during external tournament tree replacement sorting procedures
US6049667A (en) Computer system, method of compiling and method of accessing address space with pointer of different width therefrom
US6343354B1 (en) Method and apparatus for compression, decompression, and execution of program code
US6158047A (en) Client/server system for fast, user transparent and memory efficient computer language translation
TW201621669A (en) Method for managing a memory apparatus, and associated memory apparatus thereof
US7533228B1 (en) Two-pass sliding compaction
US9280350B2 (en) Methods and apparatus to perform adaptive pre-fetch operations in managed runtime environments
US7822790B2 (en) Relative positioning and access of memory objects
US20060059477A1 (en) Method and system for dynamically loading data structures into memory with global constant pool
US6401182B1 (en) Method and apparatus for memory management
WO1999001817A1 (en) Defragmentation of stored data without pointer indirection
CN1324483C (en) System and method of data replacement in cache ways
US7124287B2 (en) Dynamically adaptive associativity of a branch target buffer (BTB)
CA1279731C (en) Cache memory with variable fetch and replacement schemes
GB2299879A (en) Instruction/data prefetching using non-referenced prefetch cache
US6944637B2 (en) Reduced size objects headers

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:EISENBERG, VADIM;GUREVICH, MAXIM;HABER, GAD;AND OTHERS;REEL/FRAME:013472/0894

Effective date: 20021231

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 4

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20151218