US5802574A - Method and apparatus for quickly modifying cache state - Google Patents

Method and apparatus for quickly modifying cache state Download PDF

Info

Publication number
US5802574A
US5802574A (application US08/670,753)
Authority
US
United States
Prior art keywords
attribute
cache line
bit
state
cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
US08/670,753
Inventor
Deif Atallah
Mitchell Kahn
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp filed Critical Intel Corp
Priority to US08/670,753 priority Critical patent/US5802574A/en
Application granted granted Critical
Publication of US5802574A publication Critical patent/US5802574A/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30003 - Arrangements for executing specific machine instructions
    • G06F 9/3004 - Arrangements for executing specific machine instructions to perform operations on memory
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 - Addressing or allocation; Relocation
    • G06F 12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0806 - Multiuser, multiprocessor or multiprocessing cache systems
    • G06F 12/0808 - Multiuser, multiprocessor or multiprocessing cache systems with cache invalidating means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 - Addressing or allocation; Relocation
    • G06F 12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0806 - Multiuser, multiprocessor or multiprocessing cache systems
    • G06F 12/0815 - Cache consistency protocols
    • G06F 12/0831 - Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38 - Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F 9/3836 - Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F 9/3842 - Speculative instruction execution
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 - Addressing or allocation; Relocation
    • G06F 12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0806 - Multiuser, multiprocessor or multiprocessing cache systems
    • G06F 12/0842 - Multiuser, multiprocessor or multiprocessing cache systems for multiprocessing or multitasking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 - Addressing or allocation; Relocation
    • G06F 12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0891 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means

Definitions

  • The present invention provides a method and apparatus for selectively modifying the state of cached data without performing a tag comparison.
  • Each cache line includes at least one attribute bit and at least one state bit.
  • A processor issues an instruction requesting modification of the state of all cache lines associated with an attribute specified by the instruction. Qualifying logic modifies the state of a cache line as a function of the attributes stored in the cache line and the attribute specified by the instruction.
  • FIG. 1 is a block diagram of a prior art cache.
  • FIG. 2 illustrates the use of a conventional cache in a computer system.
  • FIG. 3 illustrates one embodiment of the cache of the present invention.
  • FIG. 4 illustrates a computer system of the present invention incorporating a cache of the present invention.
  • FIG. 5 is a flowchart diagramming the process of the present invention.
  • FIG. 6 illustrates a logical attribute unit of the present invention.
  • FIG. 7 illustrates another embodiment of the cache of the present invention.
  • FIG. 8 illustrates a computer system incorporating an external logical attribute unit of the present invention.
  • The present invention provides a method and apparatus for quickly modifying the cache state.
  • In the following description, specific embodiments are set forth to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the invention may be practiced without these details. In other instances, well known elements, devices, process steps and the like are not set forth in detail in order to avoid unnecessarily obscuring the present invention.
  • FIG. 3 illustrates one embodiment of a cache 300 of the present invention.
  • Each cache line 302 of the present invention has been extended to include a set of attribute bits 304.
  • The attribute bits may, for example, indicate memory regions that the programmer wishes to make quickly invalidatable, or the process number of a process in a multi-tasking system.
  • The state bits 306 represent the state of the cache line, such as whether it is valid or invalid, or the MESI state.
  • The cache 300 of the present invention includes a cache access logic circuit 308, which in turn includes a qualifying logic circuit 310 of the present invention.
  • The qualifying logic 310 permits the state bits 306 to be modified as a function of the attribute bits 304 and a modify instruction issued by the CPU.
  • FIG. 4 illustrates an embodiment of a computer system of the present invention incorporating the cache 300.
  • The computer system includes a CPU 400 along with an MMU 401 and a BCU 402, as well as the cache 300.
  • The CPU 400 interacts with the cache 300 over an internal CPU bus 408.
  • The CPU 400, MMU 401, BCU 402, and cache 300 may all reside on the same processor chip 410.
  • The processor chip 410 is coupled to the I/O device or second processor 212 and to the main memory 210 via the main memory bus 802.
  • The MMU 401 includes a circuit of the present invention denoted a "logical attribute unit" (LAU) 404, and the BCU 402 includes a circuit of the present invention denoted "attribute setting logic" (ASL) 406.
  • The LAU 404 need not be located within the MMU 401. In fact, the LAU 404 may be located outside the processor chip.
  • The present invention is not limited to a computer system that requires an MMU for virtual-to-physical address translation.
  • The association of attributes with the type of access to main memory is established first (Step 510).
  • Three types of accesses will be described herein, although the present invention should be understood as not being limited to those types.
  • First, the address of the access may determine the attribute value.
  • Second, the attribute may be mapped to the state of the processor.
  • Third, the attribute may represent the type of data that is being accessed.
  • For the first type of access, the attribute bits 304 may initially be stored in the page table 214 along with other characteristics of different address ranges.
  • When an address is translated, the attribute bits of that address and of a predefined block of adjacent addresses may be stored in the LAU 404, which may itself be implemented as a cache.
  • Because the attributes themselves would most likely not require as much storage space as the total amount of information normally contained in a page table, all of the attributes along with their corresponding address ranges may initially be stored directly in the logical attribute unit 404 on the processor chip.
  • The present invention is not limited to a computer system using a page table, as long as the relationship between addresses and their corresponding attributes is maintained somewhere in the system.
  • When the CPU 400 attempts to access memory, the address is presented to the MMU 401. Assuming that this is the first access to that area of memory, the cache access logic 308 will indicate a cache miss.
  • The BCU then retrieves the appropriate cache line from main memory and stores it in the cache 300.
  • The attribute setting logic 406 in the BCU 402 obtains the attributes corresponding to the retrieved cache line from the logical attribute unit 404.
  • The ASL 406 then writes this information into the attribute bits 304 of the cache line 302 stored in the cache 300 (Step 512 of FIG. 5).
  • To modify cache state, the CPU 400 issues an instruction requesting modification of the state of all cache lines associated with an attribute specified by the instruction (Step 514).
  • The qualifying logic 310 is configured to allow the state 306 of each cache line to be modified as a logical function of the attribute bits 304 and the "modify" instruction (Step 516).
  • The qualifying logic 310 can be implemented using simple combinational logic. The present invention thus allows the CPU to modify simultaneously the state of all cache lines associated with a given set of attribute bits in one clock cycle. This structure also avoids the need to compare address tags sequentially to determine which cache lines contain the states to be modified.
  • A more specific example of the operation of the present invention for the first type of access is explained with reference to FIG. 7.
  • The figure illustrates the use of one instruction to invalidate all cache lines tagged with a bit indicating that they are "quickly invalidatable."
  • This implementation could be used to invalidate the cache lines associated with the buffer area accessed by an Ethernet controller.
  • The state 306 to be modified is the valid bit.
  • The corresponding attribute 304 is a bit representing that the line is deemed to be quickly invalidatable (the QINV bit).
  • The attribute setting logic 406 of the BCU 402 sets the QINV bit 304 according to the appropriate attribute stored in the LAU 404.
  • The valid bits 306 stored at all cache lines containing a QINV bit that is set to one are simultaneously reset in parallel by an invalidate instruction from the CPU 400 using the qualifying logic 310.
  • The qualifying logic can be implemented using an AND gate 312, which resets a flip-flop 314 when the QINV bit 304 is set and a one bit is received as data from the invalidate instruction sent by the CPU 400. When reset, flip-flop 314 resets the valid bit 306.
  • For the second type of access, the CPU 400 controls the setting of the attribute bits.
  • This feature proves useful in a number of situations. For example, a user may want to invalidate all cache lines used by a predefined process or application program in a multi-tasking system after the user has finished using the application.
  • The attribute may also represent cache lines containing information used by the processor while handling an interrupt, or the protection level (e.g., user/supervisor state) of the processor while accessing the information in the cache lines.
  • While running the process or application, or while in the predetermined state, the CPU 400 instructs the attribute setting logic 406 in the BCU 402 to mark the cache lines 302 that contain memory addresses used by the CPU with attribute bits representing the desired attribute.
  • At a later time, the CPU 400 may issue an instruction to the qualifying logic 310 specifying the attributes of the cache lines that are to have their state changed. For example, after the CPU 400 has finished running an application program or interrupt handler, the CPU may quickly invalidate all cache lines used by the application or interrupt handler.
  • The combinational logic of the qualifying logic 310 is configured to clear the valid bits of all cache lines having attribute bits 304 that specify the application or interrupt handler identified by the CPU instruction.
  • The attribute bits 304 may represent a process number.
  • The state bits 306 may represent the MESI state.
  • The system designer may want all cache lines used by an application program to be shared by all processors.
  • In that case, the CPU 400 instructs the attribute setting logic 406 to write the process number into the attribute bits 304 as the CPU 400 accesses the information at the cache lines used by the process.
  • The CPU 400 then issues a modify instruction to the qualifying logic 310 specifying the process number of the application.
  • The qualifying logic 310 compares the process number of each cache line 302 to the process number specified by the modify instruction. For all cache lines for which those two quantities are equal, the logic 310 changes the state 306 of those lines to "Shared."
  • The mapping of data type to attribute is a special case of the CPU setting the attribute bits.
  • The CPU may set the attribute bits to indicate whether it is fetching and caching either instructions or data.
  • The processor may need to use a cached application program repeatedly, but may only need the data for a short period of time. Accordingly, the CPU 400 could request that the cache lines holding the data be invalidated after it has finished using the data.
  • The data may also be marked as being speculative or nonspeculative.
  • In a loop of load instructions, for example, a processor may speculatively execute four load operations at a time and cache the results, while storing the speculative results of the previous four speculatively executed loads into the actual architectural registers. It can be seen that after performing ten iterations, the processor will have prefetched three sets of four load operations. After the processor has passed through the ten iterations and retired the speculative results into the architectural registers, the twelve prefetched data items in the cache are no longer of use. Accordingly, as the CPU performs the speculative loads and caches the results, it marks the corresponding cache line as containing speculative data.
  • The CPU itself is informed that an operation is speculative by a field in the opcode of the instruction. For example, a compiler would optimize the loop of load instructions by substituting prefetched loads for the source code, resulting in a corresponding change in the opcode. After passing through ten iterations of the loop, the processor can then quickly invalidate the cache lines holding (now unnecessary) speculative data.
  • An external LAU 800 may alternatively be coupled to the processor through the main memory bus 802.
  • This external LAU 800 can be preprogrammed like the internal LAU 404 to hold a mapping of addresses to attributes.
  • For example, the external LAU can be preprogrammed with the buffer addresses used by an Ethernet controller, mapped to the attribute of being quickly invalidatable.
  • The LAU 800 returns the corresponding attribute bits to the cache 300 through dedicated attribute pins of the attribute setting logic 406, along with the data that is retrieved over the usual data lines from main memory 210.
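The mechanism described above can be sketched in software. The following model is purely illustrative (all names and structures are hypothetical, and a loop stands in for what the patent implements as combinational logic acting on every line in parallel): each line carries a QINV attribute bit, and one invalidate request clears the valid bit of every QINV line with no tag comparison.

```python
# Hypothetical software sketch of attribute-qualified invalidation.
# In the described apparatus this is hardware (AND gate 312, flip-flop 314)
# operating on all cache lines simultaneously.

from dataclasses import dataclass

@dataclass
class CacheLine:
    tag: int = 0
    data: bytes = b""
    valid: bool = False      # state bit (306)
    qinv: bool = False       # attribute bit (304): quickly invalidatable

class Cache:
    def __init__(self, num_lines=8):
        self.lines = [CacheLine() for _ in range(num_lines)]

    def fill(self, index, tag, data, qinv):
        # BCU/ASL path: on a miss the line is filled, and its attribute
        # bits are written from the LAU mapping (passed in directly here).
        self.lines[index] = CacheLine(tag=tag, data=data, valid=True, qinv=qinv)

    def invalidate_qinv(self):
        # Qualifying logic: valid_next = valid AND NOT qinv, for every line,
        # with no tag comparison at all.
        for line in self.lines:
            line.valid = line.valid and not line.qinv

cache = Cache()
cache.fill(0, tag=0x10, data=b"app", qinv=False)
cache.fill(1, tag=0x20, data=b"buf", qinv=True)   # e.g. Ethernet buffer region
cache.invalidate_qinv()
print([line.valid for line in cache.lines[:2]])   # [True, False]
```

Note that the cost of `invalidate_qinv` is independent of the buffer size: only the attribute bit is consulted, which is what lets the hardware version complete in a single cycle.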

Abstract

The state of cached data may be modified without performing a tag comparison. Each cache line includes at least one attribute bit and at least one state bit. A processor issues an instruction requesting modification of the state of all cache lines associated with an attribute specified by the instruction. Qualifying logic modifies the state of a cache line as a function of the attributes stored in the cache line and the attribute specified by the instruction.

Description

This is a continuation of application Ser. No. 08/173,985, filed Dec. 28, 1993, now abandoned.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to data processing systems utilizing cache memories, and more particularly to modifying the state of cache memory.
2. Art Background
Caches are used in various forms to reduce the effective time required by a processor to access instructions or data that are stored in main memory. The theory of a cache is that a system attains a higher speed by using a small portion of very fast memory as a cache along with a larger amount of slower main memory. The cache memory is usually placed operationally between the data processing unit or units and the main memory. When the processor needs to access main memory, it looks first to the cache memory to see if the information required is available in the cache. When data and/or instructions are first called from main memory, the information is stored in cache as part of a block of information (known as a cache line) that is taken from consecutive locations of main memory. During subsequent memory accesses to the same addresses, the processor interacts with the faster cache memory rather than main memory. Statistically, when information is accessed from a particular block in main memory, subsequent accesses most likely will call for information from within the same block. This locality of reference property results in a substantial decrease in average memory access time.
FIG. 1 is a simplified block diagram of a cache 100. The cache includes a set of cache lines such as cache line 102. Each cache line is capable of storing a block of data 104 from consecutive addresses in main memory. Each cache line is associated with a tag 106, which represents the block address of the line. A valid bit 108 indicates cache coherency, i.e., that the data in the cache accurately reflects the data maintained at the same address in main memory. The reading and writing of data in the cache is controlled by a cache access logic circuit 110.
The use of a cache in the context of a computer system is illustrated in FIG. 2. Over an internal central processing unit (CPU) bus 200, a CPU 202 interacts with the cache 100. A memory management unit (MMU) 204 controls the addressing of the cache, and a bus control unit (BCU) 206 controls the access of the cache 100 and the CPU 202 to a system bus 208. The system bus 208 enables the cache 100 and the CPU 202 to exchange information with a main memory 210. A bus mastering I/O device (such as an Ethernet controller) or a second processor 212 may also access data from the main memory 210 over the system bus 208.
In a typical computer system utilizing virtual memory, the MMU 204 translates virtual addresses issued by the CPU 202 into physical addresses for accessing main memory 210. The memory mapping from virtual to physical addresses may be stored in a page table 214 in the main memory 210. The MMU 204 includes a translation lookaside buffer, which is a cache storing a subset of the page table 214 to permit rapid access to the mapping represented by the subset. Other information stored in each page table entry includes a presence bit, which indicates whether or not the referenced address is presently assigned to the main memory 210 (if not, secondary memory must be accessed), and a protection mask, which indicates the current program's access rights (read and write, read only, or no access, among others) to the addressed physical page. An access request to a page that is not present in main memory, or an access attempt without the proper access rights, results in a trap that aborts the current micro-instruction and transfers control to an appropriate operating system micro-program.
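The presence-bit and protection-mask checks described above can be summarized in a short sketch. The entry layout and constant names below are assumptions for illustration, not the patent's encoding:

```python
# Hypothetical sketch of the page-table-entry checks: a presence bit and a
# protection mask are consulted before the access proceeds; a failed check
# models the trap to the operating system.

NO_ACCESS, READ_ONLY, READ_WRITE = 0, 1, 2

def check_access(entry, want_write):
    """Return True if the access may proceed; raise to model the trap."""
    if not entry["present"]:
        raise RuntimeError("page fault: fetch page from secondary memory")
    if entry["protection"] == NO_ACCESS:
        raise PermissionError("protection trap: no access")
    if want_write and entry["protection"] != READ_WRITE:
        raise PermissionError("protection trap: page is read-only")
    return True

pte = {"present": True, "protection": READ_ONLY, "frame": 0x42}
print(check_access(pte, want_write=False))  # True
```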
When the CPU 202 attempts to access the main memory 210, the address issued by the CPU 202 is presented to the MMU 204. If MMU 204 determines that CPU 202 has access rights, then MMU 204 presents the address to the cache access logic 110 in the cache 100. The cache access logic 110 compares the relevant part (the tag field) of the physical address containing the block address to addresses it currently stores in the tag array 106. If there is a match, i.e., a cache hit, then the data found at the referenced address is returned to the CPU 202. If, however, the address fails to match any of the tag addresses, i.e., a cache miss occurs, then the BCU 206 copies into the cache 100 the main memory data block containing the information at the addressed location. The BCU 206 also sets the corresponding valid bit, which indicates cache coherency.
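The hit/miss path just described can be sketched as follows. The line size and the way the address is split into tag and offset are illustrative assumptions, and a simple list stands in for the tag array:

```python
# Sketch of the conventional lookup: the tag field of the physical address
# is compared against the stored tags; a miss triggers a block fill from
# main memory and sets the valid bit (the BCU's role).

LINE_SIZE = 32  # bytes per cache line (assumed)

def lookup(cache, address, fetch_block):
    tag = address // LINE_SIZE          # block address = tag field
    offset = address % LINE_SIZE
    for line in cache:                  # tag comparison against the tag array
        if line["valid"] and line["tag"] == tag:
            return line["data"][offset]             # cache hit
    # Cache miss: copy the containing block from main memory and set the
    # valid bit, indicating coherency with main memory.
    line = {"valid": True, "tag": tag, "data": fetch_block(tag * LINE_SIZE)}
    cache.append(line)
    return line["data"][offset]

cache = []
block = bytes(range(LINE_SIZE))
print(lookup(cache, 100, lambda base: block))  # 4: miss fills the line, then returns the byte at offset 100 % 32
```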
Incoherency between the cache and main memory arises in many situations. For example, another CPU or an I/O processor, such as an Ethernet controller, typically reads and writes data to specified regions of the main memory 210 that are denoted as "buffers". Incoherency occurs when the device 212 writes data into the main memory 210 at buffer locations which have been cached by the CPU 202 in cache 100. Such inconsistencies may be overcome through various means. One method of avoiding incoherency is to prohibit the cache from caching the buffer regions of main memory. However, using an uncacheable region eliminates the opportunity to use faster cache memory and thus reduces the speed of CPU operations on buffer data. Alternatively, the entire cache may be invalidated when any part of it is known to be incoherent. This technique is fast, but degrades performance because coherent data is also invalidated.
As another alternative, the cache 100 may monitor or "snoop" the address pins of main memory 210 to determine whether the device 212, or another device, is writing to main memory. If so, the tag field of the address is compared to the tags in the cache 100. If there is a hit, the cache may change the valid bit associated with the cache line containing the address issued by the device 212 to indicate that the associated cache line is invalid. Because device 212 may typically operate on large blocks of data located at cached addresses, many cache lines will need to be invalidated. Every buffer address must be compared to every tag in the cache to determine whether a cache line includes a buffer address that would require invalidation. Because the buffer space is relatively large compared to the cache capacity, this comparison process can occupy the CPU 202 for thousands of clock cycles.
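The cost argument above is easy to make concrete. In this sketch (all sizes illustrative), invalidating a buffer requires one tag comparison per buffer line per cache line, so the work grows with the product of the two:

```python
# Why snoop-driven invalidation is slow: every buffer address must be
# compared against every tag. The counter makes the cost visible.

LINE_SIZE = 32  # bytes per cache line (assumed)

def invalidate_buffer(cache, buffer_start, buffer_len):
    comparisons = 0
    for addr in range(buffer_start, buffer_start + buffer_len, LINE_SIZE):
        tag = addr // LINE_SIZE
        for line in cache:            # one tag comparison per cache line
            comparisons += 1
            if line["valid"] and line["tag"] == tag:
                line["valid"] = False
    return comparisons

cache = [{"valid": True, "tag": t} for t in range(256)]
# A 64 KB buffer against a 256-line cache costs 2048 x 256 comparisons
# in this naive model, i.e. roughly half a million.
print(invalidate_buffer(cache, buffer_start=0, buffer_len=64 * 1024))  # 524288
```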
The need to perform tag comparisons not only slows the invalidation of cache areas, but also hampers cache performance in other contexts. The MESI (Modified-Exclusive-Shared-Invalid) protocol is used to maintain coherency in a multiprocessor system among on-chip caches coupled to the same main memory. Under certain circumstances, one processor in the multiprocessor system may request a change in the MESI state of a large number of cache lines. For example, after a first processor has modified data in an area of cache marked "Exclusive" to that processor, the state of the corresponding cache lines in the first processor's cache is switched to "Modified" (by the MMU). The first processor may then learn through snooping that a second processor is attempting to access data from the same page of main memory. If the first processor determines that the second processor should also have access rights to that page, then the MESI state of the corresponding cache lines in the second processor must be changed to "Shared". First, however, to maintain cache coherency, the first processor must inform the second processor to wait while the first processor writes back the modified data to main memory. After completing the write back operation, the first processor informs the second processor that the status of that page of memory should be changed in the second processor's cache to the "Shared" state. The second processor must then perform a tag comparison for each address found in the page to set the state of the corresponding cache lines. This sequential process requires many clock cycles to complete.
The previous discussion illustrates that modifying the state of a set of cache lines requires that the lines to be modified first be identified through the tag comparison process. This process, however, often reduces the speed of memory access operations to an unacceptable level. Thus, it is an object of the present invention to selectively update the state of cached data without the need to perform tag comparisons.
SUMMARY OF THE INVENTION
The present invention provides a method and apparatus for selectively modifying the state of cached data without performing a tag comparison. Each cache line includes at least one attribute bit and at least one state bit. A processor issues an instruction requesting modification of the state of all cache lines associated with an attribute specified by the instruction. Qualifying logic modifies the state of a cache line as a function of the attributes stored in the cache line and the attribute specified by the instruction.
BRIEF DESCRIPTION OF THE DRAWINGS
The objects, features and advantages of the present invention will be apparent from the following detailed description in which:
FIG. 1 is a block diagram of a prior art cache.
FIG. 2 illustrates the use of conventional cache in a computer system.
FIG. 3 illustrates one embodiment of the cache of the present invention.
FIG. 4 illustrates a computer system of the present invention incorporating a cache of the present invention.
FIG. 5 is a flowchart diagramming the process of the present invention.
FIG. 6 is a logical attribute unit of the present invention.
FIG. 7 is another embodiment of the cache of the present invention.
FIG. 8 illustrates a computer system incorporating an external logical attribute unit of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
The present invention provides a method and apparatus for quickly modifying the cache state. For purposes of explanation, specific embodiments are set forth to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the invention may be practiced without these details. In other instances, well known elements, devices, process steps and the like are not set forth in detail in order to avoid unnecessarily obscuring the present invention.
FIG. 3 illustrates one embodiment of a cache 300 of the present invention. Each cache line 302 of the present invention has been extended to include a set of attribute bits 304. The attribute bits may, for example, indicate memory regions that the programmer wishes to make quickly invalidatable, or the process number of a process in a multi-tasking system. The state bits 306 represent the state of the cache line, such as whether it is valid or invalid, or the MESI state. The cache 300 of the present invention includes a cache access logic circuit 308, which in turn includes a qualifying logic circuit 310 of the present invention. The qualifying logic 310 permits the state bits 306 to be modified as a function of the attribute bits 304 and a modify instruction issued by the CPU.
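As a purely illustrative software model (the field names and widths below are hypothetical; the invention describes hardware, not any particular data layout), the extended cache line of FIG. 3 can be sketched as a record carrying attribute bits 304 and state bits 306 alongside the conventional tag and data:

```python
from dataclasses import dataclass

@dataclass
class CacheLine:
    tag: int     # address tag, as in a conventional cache
    data: bytes  # cached data block
    attr: int    # attribute bits 304 (e.g., a QINV flag or a process number)
    state: int   # state bits 306 (e.g., valid/invalid, or a MESI state)

# A line marked with a "quickly invalidatable" attribute and currently valid.
line = CacheLine(tag=0x1A2B, data=b"\x00" * 32, attr=0b1, state=1)
```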
The general operation of the cache 300 will be explained with reference to the block diagram of FIG. 4 and the flowchart of FIG. 5. FIG. 4 is an embodiment of a computer system of the present invention incorporating the cache 300. The computer system includes a CPU 400 along with an MMU 401 and a BCU 402, as well as the cache 300. The CPU 400 interacts with the cache 300 over an internal CPU bus 408. The CPU 400, MMU 401, BCU 402 and cache 300 may all reside on the same processor chip 410. The processor chip 410 is coupled to the I/O device or processor 212 and the main memory 210 via the main memory bus 802. The MMU 401 includes a circuit of the present invention denoted the "logical attribute unit" (LAU) 404, and the BCU 402 includes a circuit of the present invention denoted the "attribute setting logic" (ASL) 406. Note that in an alternative embodiment discussed below, the LAU 404 need not be located within the MMU 401. In fact, the LAU 404 may be located outside the processor chip. Moreover, the present invention is not limited to a computer system that requires an MMU for virtual-to-physical address translation.
Referring to FIG. 5, at some point in time the association of attributes with the type of access to main memory is established (Step 510). Three types of accesses will be described herein, although the present invention should be understood as not being limited to those types. First, the address of the access may determine the attribute value. Second, the attribute may be mapped to the state of the processor. Third, the attribute may represent the type of data that is being accessed.
To implement the mapping of address range to attribute, in one embodiment the attribute bits 304 (shown in FIG. 3) may initially be stored in the page table 214 along with other characteristics of different address ranges. In that case, when an address is cached in the cache 300, the attribute bits of that address and of a predefined block of adjacent addresses may be stored in the LAU 404, which may itself be implemented as a cache. Alternatively, as shown in FIG. 6, because the attributes themselves would most likely not require as much storage space as the total amount of information normally contained in a page table, all of the attributes along with their corresponding address ranges may initially be stored directly in the logical attribute unit 404 of the processor chip. Thus, the present invention is not limited to a computer system using a page table, as long as the relationship between addresses and their corresponding attributes is maintained somewhere in the system.
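A minimal sketch of such a mapping, assuming a table of (start, end, attribute) ranges (the class and method names are invented for illustration; the patent does not prescribe any particular implementation):

```python
class LogicalAttributeUnit:
    """Illustrative model of the LAU: maps address ranges to attribute bits."""

    def __init__(self):
        self.ranges = []  # list of (start, end, attr) tuples

    def program(self, start, end, attr):
        # Preprogram an address range with its attribute bits.
        self.ranges.append((start, end, attr))

    def lookup(self, addr):
        # Return the attribute bits for an address, or 0 if unmapped.
        for start, end, attr in self.ranges:
            if start <= addr < end:
                return attr
        return 0

lau = LogicalAttributeUnit()
lau.program(0x8000, 0x9000, 0b1)  # e.g., an Ethernet buffer region marked QINV
```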
When the CPU 400 attempts to access memory, the address is presented to the MMU 401. Assuming that this is the first access to that area of memory, the cache access logic 308 will indicate a cache miss. The BCU 402 then retrieves the appropriate cache line from main memory and stores it in the cache 300. For the first type of access, the attribute setting logic 406 in the BCU 402 obtains the attributes corresponding to the retrieved cache line from the logical attribute unit 404. The ASL 406 then writes this information into the attribute bits 304 of the cache line 302 stored in the cache 300 (Step 512 of FIG. 5).
The CPU 400 issues an instruction requesting modification of the state of all cache lines associated with an attribute specified by the instruction (Step 514). The qualifying logic 310 is configured to allow the state 306 of each cache line to be modified as a logical function of the attribute bits 304 and the "modify" instruction (Step 516). In general, the qualifying logic 310 can be implemented using simple combinational logic. The present invention thus allows the CPU to modify simultaneously the state of all cache lines associated with a given set of attribute bits in one clock cycle. This structure also avoids the need to compare address tags sequentially to determine which cache lines contain the states to be modified.
A more specific example of the operation of the present invention for the first type of access is explained with reference to FIG. 7. The figure illustrates the use of a single instruction to invalidate all cache lines tagged with a bit indicating that they are "quickly invalidatable." This implementation could be used to invalidate the cache lines associated with the buffer area accessed by an Ethernet controller. In this case, the state 306 to be modified is the valid bit, and the corresponding attribute 304 is a bit indicating that the line is deemed quickly invalidatable (the QINV bit). When a line containing quickly invalidatable data is cached, the attribute setting logic 406 of the BCU 402 sets the QINV bit 304 according to the appropriate attribute stored in the LAU 404. The valid bits 306 of all cache lines whose QINV bit is set to one are then reset simultaneously by an invalidate instruction from the CPU 400 through the qualifying logic 310. In this embodiment, the qualifying logic can be implemented using an AND gate 312, which resets a flip-flop 314 when the QINV bit 304 is set and a one bit is received as data from the invalidate instruction sent by the CPU 400. When reset, flip-flop 314 resets the valid bit 306.
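The logical function performed by the qualifying logic can be modeled in software as follows (a sketch only: the Python loop stands in for hardware that updates every line in parallel in one clock cycle, and the dictionary keys are invented names):

```python
def quick_invalidate(cache_lines, invalidate=True):
    # Model of FIG. 7: an AND of each line's QINV attribute bit with the
    # invalidate signal resets that line's valid bit. In hardware, the AND
    # gate 312 and flip-flop 314 act on all lines simultaneously.
    for line in cache_lines:
        if line["qinv"] and invalidate:
            line["valid"] = False

cache_lines = [
    {"qinv": True, "valid": True},   # buffer data: quickly invalidatable
    {"qinv": False, "valid": True},  # ordinary data: left untouched
]
quick_invalidate(cache_lines)
```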
For the second type of access, to implement the mapping of processor state to attribute, the CPU 400 controls the setting of the attribute bits. This feature proves useful in a number of situations. For example, a user may want to invalidate all cache lines used by a predefined process or application program in a multi-tasking system after the user has finished using the application. Alternatively, the attribute may represent cache lines containing information used by the processor while handling an interrupt, or the protection level (e.g., user/supervisor state) of the processor while accessing the information in the cache lines. In these cases, the CPU 400, while running the process or application, or while in the predetermined state, instructs the attribute setting logic 406 in the BCU 402 to mark the cache lines 302 containing the memory addresses it uses with attribute bits representing the desired attribute.
After the processor has changed state, the CPU 400 may issue an instruction to the qualifying logic 310 specifying the attributes of the cache lines that are to have their state changed. For example, after the CPU 400 has finished running an application program or interrupt handler, the CPU may quickly invalidate all cache lines used by the application or interrupt handler. Using circuit design techniques well known in the art, the combinational logic of logic 310 is configured to clear the valid bits of all cache lines having attribute bits 304 that specify the application or interrupt handler identified by the CPU instruction.
As another example, the attribute bits 304 may represent a process number, and the state bits 306 the MESI state. For example, in a multiprocessor system the system designer may want all cache lines used by an application program to be shared by all processors. As before, the CPU 400 instructs the attribute setting logic 406 to write the process number into the attribute bits 304 as the CPU 400 accesses the information at the cache lines used by the process. The CPU 400 issues a modify instruction to the qualifying logic 310 specifying the process number of the application. The qualifying logic 310 compares the process number of each cache line 302 to the process number specified by the modify instruction. For all cache lines for which those two quantities are equal, the logic 310 changes the state 306 of those lines to "Shared."
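Under the same modeling assumptions as before (invented names; a sequential loop standing in for parallel hardware), the process-number case can be sketched as:

```python
def share_process_lines(cache_lines, process_number):
    # Every line whose attribute bits hold the given process number has its
    # MESI state changed to Shared ("S"); no tag comparison is involved.
    for line in cache_lines:
        if line["attr"] == process_number:
            line["mesi"] = "S"

cache_lines = [
    {"attr": 7, "mesi": "M"},  # belongs to process 7
    {"attr": 3, "mesi": "E"},  # belongs to another process
]
share_process_lines(cache_lines, 7)
```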
As for the third type of access, the mapping of data type to attribute is a special case of the CPU setting the attribute bits. For example, when using a unified instruction/data cache, the CPU may set the attribute bits to indicate whether it is fetching and caching either instructions or data. The processor may need to use a cached application program repeatedly, but may only need the data for a short period of time. Accordingly, the CPU 400 could request that the cache lines holding the data be invalidated after it has finished using the data.
Alternatively, the data may be marked as being speculative or nonspeculative. For example, if a processor is executing a loop of ten load operations, it may speculatively execute four load operations at a time and cache the results, while retiring the results of the previous four speculatively executed loads into the architectural registers. It can be seen that after performing ten iterations, the processor will have prefetched three sets of four load operations. After the processor has passed through the ten iterations and retired the speculative results into the architectural registers, the twelve prefetched data items in the cache are no longer of use. Accordingly, as the CPU performs the speculative loads and caches the results, it marks the corresponding cache line as containing speculative data. The CPU itself is informed that an operation is speculative by a field in the opcode of the instruction. For example, a compiler may optimize the loop of load instructions by substituting speculative prefetching loads for those specified in the source code, with a corresponding change in the opcode. After passing through ten iterations of the loop, the processor can then quickly invalidate the cache lines holding the (now unnecessary) speculative data.
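The speculative-data case combines attribute setting at fill time with a single invalidate afterward; a hypothetical sketch (invented names, same modeling caveats as the earlier sketches):

```python
def cache_fill(cache, tag, data, speculative):
    # Attribute setting at fill time: mark the line speculative or not.
    cache.append({"tag": tag, "data": data, "spec": speculative, "valid": True})

def invalidate_speculative(cache):
    # One "modify" instruction: every line marked speculative is invalidated,
    # with no per-line tag comparison.
    for line in cache:
        if line["spec"]:
            line["valid"] = False

cache = []
cache_fill(cache, 0x10, 42, speculative=True)   # prefetched loop data
cache_fill(cache, 0x20, 7, speculative=False)   # ordinary data
invalidate_speculative(cache)
```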
It was mentioned above that the LAU may be located outside the processor chip. For example, as shown in FIG. 8, an external LAU 800 may be coupled to the processor through the main memory bus 802. This external LAU 800 can be preprogrammed, like the internal LAU 404, to hold a mapping of addresses to attributes. For example, the external LAU can be preprogrammed with the buffer addresses used by an Ethernet controller, mapped to the attribute of being quickly invalidatable. When the CPU 400 attempts to access an area of main memory that is also mapped by the external LAU 800, the LAU 800 returns the corresponding attribute bits to the cache 300 through dedicated attribute pins of the attribute setting logic 406, along with the data retrieved over the usual data lines from main memory 210.
Based on the foregoing examples, one can see that the present invention is generally applicable to the modification of any conceivable cache state as a function of the attributes of a cache line and a predetermined instruction issued by the CPU. Thus, although the invention has been described in conjunction with preferred embodiments, it will be appreciated that various modifications and alterations may be made by those skilled in the art without departing from the spirit and scope of the invention.

Claims (18)

We claim:
1. An apparatus for performing operations on cached information, the apparatus comprising:
a cache having at least one cache line, the at least one cache line representing corresponding cached information, the at least one cache line including at least one attribute bit and at least one state bit, the at least one attribute bit representing at least one attribute of the at least one cache line, the at least one state bit representing a state of the at least one cache line, wherein the at least one attribute is a user/supervisor state of the processor;
attribute setting circuitry for setting the at least one attribute bit of a corresponding cache line in response to a control signal from a processor; and
qualifying logic that receives the at least one attribute bit and an instruction specifying a predetermined attribute, the qualifying logic setting the at least one state bit in response to the at least one attribute bit and the instruction without performing a tag comparison.
2. An apparatus for performing operations on cached information, the apparatus comprising:
a cache having at least one cache line, the at least one cache line representing corresponding cached information, the at least one cache line including at least one attribute bit and at least one state bit, the at least one attribute bit representing an attribute of the at least one cache line, the at least one state bit representing a state of the at least one cache line wherein the at least one attribute indicates whether the corresponding cached information represented by the at least one cache line is information used by an interrupt handler;
attribute setting circuitry for setting the at least one attribute bit of a corresponding cache line in response to a control signal from a processor; and
qualifying logic that receives the at least one attribute bit and an instruction specifying a predetermined attribute, the qualifying logic setting the at least one state bit in response to the at least one attribute bit and the instruction without performing a tag comparison.
3. An apparatus for performing operations on cached information, the apparatus comprising:
a cache having at least one cache line, the at least one cache line representing corresponding cached information, the at least one cache line including at least one attribute bit and at least one state bit, the at least one attribute bit representing an attribute of the at least one cache line, the at least one state bit representing a state of the at least one cache line wherein the at least one attribute indicates whether the corresponding cached information represented by the at least one cache line is an instruction or data;
attribute setting circuitry for setting the at least one attribute bit of a corresponding cache line in response to a control signal from a processor; and
qualifying logic that receives the at least one attribute bit and an instruction specifying a predetermined attribute, the qualifying logic setting the at least one state bit in response to the at least one attribute bit and the instruction without performing a tag comparison.
4. An apparatus for performing operations on cached information, the apparatus comprising:
a cache having at least one cache line, the at least one cache line representing corresponding cached information, the at least one cache line including at least one attribute bit and at least one state bit, the at least one attribute bit representing an attribute of the at least one cache line, the at least one state bit representing a state of the at least one cache line wherein the at least one attribute indicates whether the corresponding cached information represented by the at least one cache line is speculative or nonspeculative;
attribute setting circuitry for setting the at least one attribute bit of a corresponding cache line in response to a control signal from a processor; and
qualifying logic that receives the at least one attribute bit and an instruction specifying a predetermined attribute, the qualifying logic setting the at least one state bit in response to the at least one attribute bit and the instruction without performing a tag comparison.
5. An apparatus for performing operations on cached information, the apparatus comprising:
a cache having at least one cache line, the at least one cache line representing corresponding cached information, the at least one cache line including at least one attribute bit and at least one state bit, the at least one attribute bit representing an attribute of the at least one cache line, the at least one state bit representing a state of the at least one cache line wherein the at least one attribute indicates that the at least one cache line is to be invalidated in response to the instruction, the at least one state bit being a valid/invalid bit;
attribute setting circuitry for setting the at least one attribute bit of a corresponding cache line in response to a control signal from a processor; and
qualifying logic that receives the at least one attribute bit and an instruction specifying a predetermined attribute, the qualifying logic setting the at least one state bit in response to the at least one attribute bit and the instruction without performing a tag comparison.
6. A computer system comprising:
a processor;
an apparatus that performs operations on cached information, the apparatus being coupled to the processor and comprising,
a cache having at least one cache line, the at least one cache line representing corresponding cached information, the at least one cache line including at least one attribute bit and at least one state bit, the at least one attribute bit representing at least one attribute of the at least one cache line, the at least one state bit representing a state of the at least one cache line, wherein the at least one attribute is a user/supervisor state of the processor;
attribute setting circuitry that sets the at least one attribute bit of a corresponding cache line in response to a control signal from the processor; and
qualifying logic that receives the at least one attribute bit and an instruction specifying a predetermined attribute, the qualifying logic setting the at least one state bit as a function of the at least one attribute bit and the instruction.
7. A computer system comprising:
a processor;
an apparatus that performs operations on cached information, the apparatus being coupled to the processor and comprising,
a cache having at least one cache line, the at least one cache line representing corresponding cached information, the at least one cache line including at least one attribute bit and at least one state bit, the at least one attribute bit representing at least one attribute of the at least one cache line, the at least one state bit representing a state of the at least one cache line, wherein the at least one attribute indicates whether the corresponding cached information represented by the at least one cache line is information used by an interrupt handler;
attribute setting circuitry that sets the at least one attribute bit of a corresponding cache line in response to a control signal from the processor; and
qualifying logic that receives the at least one attribute bit and an instruction specifying a predetermined attribute, the qualifying logic setting the at least one state bit as a function of the at least one attribute bit and the instruction.
8. A computer system comprising:
a processor;
an apparatus that performs operations on cached information, the apparatus being coupled to the processor and comprising,
a cache having at least one cache line, the at least one cache line representing corresponding cached information, the at least one cache line including at least one attribute bit and at least one state bit, the at least one attribute bit representing at least one attribute of the at least one cache line, the at least one state bit representing a state of the at least one cache line, wherein the at least one attribute indicates whether the corresponding cached information represented by the at least one cache line is instruction or data;
attribute setting circuitry that sets the at least one attribute bit of a corresponding cache line in response to a control signal from the processor; and
qualifying logic that receives the at least one attribute bit and an instruction specifying a predetermined attribute, the qualifying logic setting the at least one state bit as a function of the at least one attribute bit and the instruction.
9. A computer system comprising:
a processor;
an apparatus that performs operations on cached information, the apparatus being coupled to the processor and comprising,
a cache having at least one cache line, the at least one cache line representing corresponding cached information, the at least one cache line including at least one attribute bit and at least one state bit, the at least one attribute bit representing at least one attribute of the at least one cache line, the at least one state bit representing a state of the at least one cache line, wherein the at least one attribute indicates whether the corresponding cached information represented by the at least one cache line is speculative or nonspeculative data;
attribute setting circuitry that sets the at least one attribute bit of a corresponding cache line in response to a control signal from the processor; and
qualifying logic that receives the at least one attribute bit and an instruction specifying a predetermined attribute, the qualifying logic setting the at least one state bit as a function of the at least one attribute bit and the instruction.
10. A computer system comprising:
a processor;
an apparatus that performs operations on cached information, the apparatus being coupled to the processor and comprising,
a cache having at least one cache line, the at least one cache line representing corresponding cached information, the at least one cache line including at least one attribute bit and at least one state bit, the at least one attribute bit representing at least one attribute of the at least one cache line, the at least one state bit representing a state of the at least one cache line, wherein the at least one attribute indicates that the at least one cache line is to be invalidated in response to the instruction, the at least one state bit being a valid/invalid bit; and
qualifying logic that receives the at least one attribute bit and an instruction specifying a predetermined attribute, the qualifying logic setting the at least one state bit as a function of the at least one attribute bit and the instruction.
11. A method for performing operations on cached information, at least one cache line of a cache representing corresponding cached information, the at least one cache line including at least one attribute bit and at least one state bit, the at least one attribute bit representing at least one attribute of the at least one cache line, the at least one state bit representing a state of the at least one cache line, the method comprising the steps of:
issuing an instruction specifying a predetermined attribute;
setting at least one state bit of the at least one cache line as a function of the at least one attribute bit and the instruction; and
setting the at least one attribute bit of a corresponding cache line in response to a control signal from a processor, wherein the at least one attribute is a user/supervisor state of the processor.
12. A method for performing operations on cached information, at least one cache line of a cache representing corresponding cached information, the at least one cache line including at least one attribute bit and at least one state bit, the at least one attribute bit representing at least one attribute of the at least one cache line, the at least one state bit representing a state of the at least one cache line, the method comprising the steps of:
issuing an instruction specifying a predetermined attribute;
setting at least one state bit of the at least one cache line as a function of the at least one attribute bit and the instruction; and
setting the at least one attribute bit of a corresponding cache line in response to a control signal from a processor, wherein the at least one attribute indicates whether the corresponding cached information represented by the at least one cache line is information used by an interrupt handler.
13. A method for performing operations on cached information, at least one cache line of a cache representing corresponding cached information, the at least one cache line including at least one attribute bit and at least one state bit, the at least one attribute bit representing at least one attribute of the at least one cache line, the at least one state bit representing a state of the at least one cache line, the method comprising the steps of:
issuing an instruction specifying a predetermined attribute;
setting at least one state bit of the at least one cache line as a function of the at least one attribute bit and the instruction; and
setting the at least one attribute bit of a corresponding cache line in response to a control signal from a processor, wherein the at least one attribute indicates whether the corresponding cached information represented by the at least one cache line is instruction or data.
14. A method for performing operations on cached information, at least one cache line of a cache representing corresponding cached information, the at least one cache line including at least one attribute bit and at least one state bit, the at least one attribute bit representing at least one attribute of the at least one cache line, the at least one state bit representing a state of the at least one cache line, the method comprising the steps of:
issuing an instruction specifying a predetermined attribute;
setting at least one state bit of the at least one cache line as a function of the at least one attribute bit and the instruction; and
setting the at least one attribute bit of a corresponding cache line in response to a control signal from a processor, wherein the at least one attribute indicates whether the corresponding cached information represented by the at least one cache line is speculative or nonspeculative.
15. A method for performing operations on cached information, at least one cache line of a cache representing corresponding cached information, the at least one cache line including at least one attribute bit and at least one state bit, the at least one attribute bit representing at least one attribute of the at least one cache line, the at least one state bit representing a state of the at least one cache line, the method comprising the steps of:
issuing an instruction specifying a predetermined attribute; and
setting at least one state bit of the at least one cache line as a function of the at least one attribute bit and the instruction, wherein the at least one attribute indicates that the at least one cache line is to be invalidated in response to the instruction, the at least one state bit being a valid/invalid bit.
16. A method of caching information, including the steps of:
storing cache lines in a cache;
for each cache line stored, storing tag bits, state bits and attribute bits; wherein the attribute bits comprise bits that:
indicate a user/supervisor state of the processor;
indicate that the at least one cache line is to be invalidated in response to the instruction, the at least one state bit being a valid/invalid bit;
indicate whether corresponding cached information represented by the at least one cache line is information used by an interrupt handler;
indicate whether the corresponding cached information represented by the at least one cache line is an instruction or data; and
indicate whether the corresponding cached information represented by the at least one cache line is speculative or nonspeculative;
receiving an instruction from a processor; and
when the instruction includes certain attribute bits, changing state bits of each of the cache lines that include the certain attribute bits in one clock cycle.
17. The method of claim 16, wherein the state bits are changed to indicate that the cache line is invalid.
18. The method of claim 16, further comprising the step of associating an access type with an attribute.

Priority Applications (1)

Application Number Priority Date Filing Date Title
US08/670,753 US5802574A (en) 1993-12-28 1996-06-24 Method and apparatus for quickly modifying cache state

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US17398593A 1993-12-28 1993-12-28
US08/670,753 US5802574A (en) 1993-12-28 1996-06-24 Method and apparatus for quickly modifying cache state

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US17398593A Continuation 1993-12-28 1993-12-28

Publications (1)

Publication Number Publication Date
US5802574A true US5802574A (en) 1998-09-01

Family

ID=22634339

Family Applications (1)

Application Number Title Priority Date Filing Date
US08/670,753 Expired - Lifetime US5802574A (en) 1993-12-28 1996-06-24 Method and apparatus for quickly modifying cache state

Country Status (1)

Country Link
US (1) US5802574A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4355355A (en) * 1980-03-19 1982-10-19 International Business Machines Corp. Address generating mechanism for multiple virtual spaces
US5321827A (en) * 1989-08-02 1994-06-14 Advanced Logic Research, Inc. Computer system with modular upgrade capability
US5355467A (en) * 1991-06-04 1994-10-11 Intel Corporation Second level cache controller unit and system
US5341508A (en) * 1991-10-04 1994-08-23 Bull Hn Information Systems Inc. Processing unit having multiple synchronous bus for sharing access and regulating system bus access to synchronous bus
US5375216A (en) * 1992-02-28 1994-12-20 Motorola, Inc. Apparatus and method for optimizing performance of a cache memory in a data processing system
US5629950A (en) * 1992-04-24 1997-05-13 Digital Equipment Corporation Fault management scheme for a cache memory
US5448719A (en) * 1992-06-05 1995-09-05 Compaq Computer Corp. Method and apparatus for maintaining and retrieving live data in a posted write cache in case of power failure
US5524234A (en) * 1992-11-13 1996-06-04 Cyrix Corporation Coherency for write-back cache in a system designed for write-through cache including write-back latency control

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Applicant's admitted art (Figure 2), Dec. 28, 1993. *

Cited By (50)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6006311A (en) * 1997-04-14 1999-12-21 International Business Machines Corporation Dynamic updating of repair mask used for cache defect avoidance
US6058456A (en) * 1997-04-14 2000-05-02 International Business Machines Corporation Software-managed programmable unified/split caching mechanism for instructions and data
US6636944B1 (en) * 1997-04-24 2003-10-21 International Business Machines Corporation Associative cache and method for replacing data entries having an IO state
US6044430A (en) * 1997-12-17 2000-03-28 Advanced Micro Devices Inc. Real time interrupt handling for superscalar processors
US6295574B1 (en) 1997-12-17 2001-09-25 Advanced Micro Devices, Inc. Real time interrupt handling for superscalar processors
US6549984B1 (en) * 1997-12-17 2003-04-15 Intel Corporation Multi-bus access cache
US6397304B1 (en) * 1999-06-16 2002-05-28 Intel Corporation Method and apparatus for improving system performance in multiprocessor systems
US6591332B1 (en) * 2000-04-28 2003-07-08 Hewlett-Packard Development Company, L.P. Apparatus and method for tracking flushes of cache entries in a data processing system
US20040225774A1 (en) * 2001-02-23 2004-11-11 Shah Paras A. Enhancing a pci-x split completion transaction by aligning cachelines with an allowable disconnect boundary's ending address
US6901467B2 (en) * 2001-02-23 2005-05-31 Hewlett-Packard Development Company, L.P. Enhancing a PCI-X split completion transaction by aligning cachelines with an allowable disconnect boundary's ending address
US6516387B1 (en) * 2001-07-30 2003-02-04 Lsi Logic Corporation Set-associative cache having a configurable split and unified mode
US7873793B1 (en) 2003-07-16 2011-01-18 Guillermo Rozas Supporting speculative modification in a data cache
US7225299B1 (en) * 2003-07-16 2007-05-29 Transmeta Corporation Supporting speculative modification in a data cache
US7149851B1 (en) 2003-08-21 2006-12-12 Transmeta Corporation Method and system for conservatively managing store capacity available to a processor issuing stores
US7606979B1 (en) * 2003-08-21 2009-10-20 Guillermo Rozas Method and system for conservatively managing store capacity available to a processor issuing stores
US8427490B1 (en) 2004-05-14 2013-04-23 Nvidia Corporation Validating a graphics pipeline using pre-determined schedules
US8624906B2 (en) 2004-09-29 2014-01-07 Nvidia Corporation Method and system for non stalling pipeline instruction fetching from memory
US8698817B2 (en) 2004-11-15 2014-04-15 Nvidia Corporation Video processor having scalar and vector components
US8493397B1 (en) * 2004-11-15 2013-07-23 Nvidia Corporation State machine control for a pipelined L2 cache to implement memory transfers for a video processor
US8736623B1 (en) 2004-11-15 2014-05-27 Nvidia Corporation Programmable DMA engine for implementing memory transfers and video processing for a video processor
US9111368B1 (en) 2004-11-15 2015-08-18 Nvidia Corporation Pipelined L2 cache for memory transfers for a video processor
US8725990B1 (en) 2004-11-15 2014-05-13 Nvidia Corporation Configurable SIMD engine with high, low and mixed precision modes
US20060152520A1 (en) * 2004-11-15 2006-07-13 Shirish Gadre Stream processing in a video processor
US8687008B2 (en) 2004-11-15 2014-04-01 Nvidia Corporation Latency tolerant system for executing video processing operations
US8738891B1 (en) 2004-11-15 2014-05-27 Nvidia Corporation Methods and systems for command acceleration in a video processor via translation of scalar instructions into vector instructions
US20060103659A1 (en) * 2004-11-15 2006-05-18 Ashish Karandikar Latency tolerant system for executing video processing operations
US8683184B1 (en) 2004-11-15 2014-03-25 Nvidia Corporation Multi context execution on a video processor
US8416251B2 (en) 2004-11-15 2013-04-09 Nvidia Corporation Stream processing in a video processor
US8424012B1 (en) 2004-11-15 2013-04-16 Nvidia Corporation Context switching on a video processor having a scalar execution unit and a vector execution unit
US20060176308A1 (en) * 2004-11-15 2006-08-10 Ashish Karandikar Multidimensional datapath processing in a video processor
US20060176309A1 (en) * 2004-11-15 2006-08-10 Shirish Gadre Video processor having scalar and vector components
US8493396B2 (en) 2004-11-15 2013-07-23 Nvidia Corporation Multidimensional datapath processing in a video processor
US7984241B2 (en) * 2005-09-16 2011-07-19 Hewlett-Packard Development Company, L.P. Controlling processor access to cache memory
US20070067578A1 (en) * 2005-09-16 2007-03-22 Hewlett-Packard Development Company, L.P. Controlling processor access to cache memory
US9092170B1 (en) 2005-10-18 2015-07-28 Nvidia Corporation Method and system for implementing fragment operation processing across a graphics bus interconnect
US8683126B2 (en) 2007-07-30 2014-03-25 Nvidia Corporation Optimal use of buffer space by a storage controller which writes retrieved data directly to a memory
US20090037689A1 (en) * 2007-07-30 2009-02-05 Nvidia Corporation Optimal Use of Buffer Space by a Storage Controller Which Writes Retrieved Data Directly to a Memory
US8659601B1 (en) 2007-08-15 2014-02-25 Nvidia Corporation Program sequencer for generating indeterminant length shader programs for a graphics processor
US8411096B1 (en) 2007-08-15 2013-04-02 Nvidia Corporation Shader program instruction fetch
US8698819B1 (en) 2007-08-15 2014-04-15 Nvidia Corporation Software assisted shader merging
US9024957B1 (en) 2007-08-15 2015-05-05 Nvidia Corporation Address independent shader program loading
US8780123B2 (en) 2007-12-17 2014-07-15 Nvidia Corporation Interrupt handling techniques in the rasterizer of a GPU
US9064333B2 (en) 2007-12-17 2015-06-23 Nvidia Corporation Interrupt handling techniques in the rasterizer of a GPU
US20090153571A1 (en) * 2007-12-17 2009-06-18 Crow Franklin C Interrupt handling techniques in the rasterizer of a GPU
US20090273606A1 (en) * 2008-05-01 2009-11-05 Nvidia Corporation Rewind-enabled hardware encoder
US8923385B2 (en) 2008-05-01 2014-12-30 Nvidia Corporation Rewind-enabled hardware encoder
US20090274209A1 (en) * 2008-05-01 2009-11-05 Nvidia Corporation Multistandard hardware video encoder
US8681861B2 (en) 2008-05-01 2014-03-25 Nvidia Corporation Multistandard hardware video encoder
US8489851B2 (en) 2008-12-11 2013-07-16 Nvidia Corporation Processing of read requests in a memory controller using pre-fetch mechanism
US20100153661A1 (en) * 2008-12-11 2010-06-17 Nvidia Corporation Processing of read requests in a memory controller using pre-fetch mechanism

Similar Documents

Publication Publication Date Title
US5802574A (en) Method and apparatus for quickly modifying cache state
US6591340B2 (en) Microprocessor having improved memory management unit and cache memory
US6598128B1 (en) Microprocessor having improved memory management unit and cache memory
US7032074B2 (en) Method and mechanism to use a cache to translate from a virtual bus to a physical bus
US5715428A (en) Apparatus for maintaining multilevel cache hierarchy coherency in a multiprocessor computer system
US5524233A (en) Method and apparatus for controlling an external cache memory wherein the cache controller is responsive to an interagent communication for performing cache control operations
US9251095B2 (en) Providing metadata in a translation lookaside buffer (TLB)
US5359723A (en) Cache memory hierarchy having a large write through first level that allocates for CPU read misses only and a small write back second level that allocates for CPU write misses only
US6629207B1 (en) Method for loading instructions or data into a locked way of a cache memory
EP1182559B1 (en) Improved microprocessor
US8909871B2 (en) Data processing system and method for reducing cache pollution by write stream memory access patterns
EP0945805B1 (en) A cache coherency mechanism
US6789172B2 (en) Cache and DMA with a global valid bit
USRE45078E1 (en) Highly efficient design of storage array utilizing multiple pointers to indicate valid and invalid lines for use in first and second cache spaces and memory subsystems
US7539823B2 (en) Multiprocessing apparatus having reduced cache miss occurrences
CN101446923B (en) System and method for flushing a cache line in response to instruction
EP3265917B1 (en) Cache maintenance instruction
US7434007B2 (en) Management of cache memories in a data processing apparatus
US5715427A (en) Semi-associative cache with MRU/LRU replacement
US5671231A (en) Method and apparatus for performing cache snoop testing on a cache system
JP2010507160A (en) Processing of write access request to shared memory of data processor
US20020069330A1 (en) Cache with DMA and dirty bits
JPH0619786A (en) Method and apparatus for maintenance of cache coference
US11720495B2 (en) Multi-level cache security
US5860114A (en) Method and apparatus for managing snoop requests using snoop advisory cells

Legal Events

Date Code Title Description
STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 8

FPAY Fee payment

Year of fee payment: 12