WO2001055917A1 - Improved apparatus and method for multi-threaded signal processing - Google Patents

Improved apparatus and method for multi-threaded signal processing

Info

Publication number
WO2001055917A1
WO2001055917A1 PCT/US2001/002982 US0102982W WO0155917A1 WO 2001055917 A1 WO2001055917 A1 WO 2001055917A1 US 0102982 W US0102982 W US 0102982W WO 0155917 A1 WO0155917 A1 WO 0155917A1
Authority
WO
WIPO (PCT)
Prior art keywords
thread
processing
kernel
operations
design
Prior art date
Application number
PCT/US2001/002982
Other languages
French (fr)
Inventor
Ravi Subramanian
Keith Rieken
Original Assignee
Morphics Technology Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Morphics Technology Inc. filed Critical Morphics Technology Inc.
Priority to DE10195202T priority Critical patent/DE10195202T1/en
Priority to AU2001233119A priority patent/AU2001233119A1/en
Priority to GB0217126A priority patent/GB2374701B/en
Priority to KR1020027009711A priority patent/KR100784412B1/en
Priority to JP2001555391A priority patent/JP2003521072A/en
Publication of WO2001055917A1 publication Critical patent/WO2001055917A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/30Circuit design

Abstract

System and circuit design methodology and apparatus implements general functional definition (10) using multi-threaded representation thereof, which may be profiled for parallel processing using one or more corresponding kernel logic elements (18). Preferably, communication (26), networking, or media processing functionality or algorithm (12) is functionally analyzed and symbolically represented to identify one or more thread segments, which are each profiled (14) using temporal and/or non-temporal functions, according to one or more particular fixed, parameterizable, programmable, or reconfigurable logic kernel.

Description

IMPROVED APPARATUS AND METHOD FOR
MULTI-THREADED SIGNAL PROCESSING
Field of Invention
Invention relates to electronic data and signal processing, particularly to high-performance multi-threaded information processing techniques.
Background of Invention
Traditional methods for achieving high performance in computational systems for digital information processing have centered on the design of architectures that deliver greater levels of parallelism. This is typically achieved via the design of processors and instruction-set architectures that allow for the exploitation of hardware parallelism and software concurrency.
High performance is typically defined as the ability to execute a very large number of operations per second. This figure of merit is strongly dependent on the type of operations, which typically depends on the type of application targeted.
Traditional design of high-performance information processing systems usually relies on principles of computer architecture to define several key attributes of the processing system:
• Instruction-set architecture refers to the actual programmer-visible sets of instructions, and serves as the boundary between hardware and software.
• Organization refers to high-level aspects of computer design, such as memory system, bus structure, and internal CPU design.
• Hardware refers to specific detailed logic design, circuit implementation, and packaging.
In order to achieve high performance, which is an attribute typically required in special-purpose processors (i.e., built for special applications), three approaches are taken:
(1) Instruction-level parallelism: this approach, which exploits parallelism in hardware, provides for parallel threads of processing via the use of a very long or vectorized instruction word, whose fields can be decomposed into concurrent processing threads. The mechanism to exploit this parallelism may be realized via a scheduler, which schedules operations onto one of several datapath processing units. This scheme has many drawbacks, including the difficulty of building the scheduler and identifying enough parallelism to achieve desired throughput.
(2) Superscalar techniques: this approach exploits fine-grain, highly pipelined, single-threaded processor architectures to achieve high performance. This scheme may achieve very high performance, but only for a small class of operations. For operations not well-matched to a particular datapath architecture, performance of superscalar design is reduced significantly. Thus, the superscalar approach is unsuitable for wide-ranging applications with high signal-processing content.
(3) Memory hierarchy techniques: to hide latency of memory accesses to slower memories, memory hierarchy techniques have been used extensively, especially in microprocessor designs, to increase overall system performance by intelligently using fast memories, i.e., caches, between the processor units and slower memory.
Conventionally, multi-processor systems may employ multi-threaded processing to improve compute performance. Multi-threading generally is a known approach for enhancing compute resource utility, and thus, overall processing performance. However, ordinary multi-threaded processing solutions are implemented using complex distributed or networked computer nodes, which are often not easily reconfigurable at lower logic or circuit level, nor contemplated for addressing advanced functional problem sets, such as multi-mode telecommunications algorithms or networking protocols. Accordingly, there is a need for an improved multi-thread processing solution.
Summary of Invention
Invention resides in design and implementation methodology, processor architecture, and system for processing multi-threaded digital information (signal or data representation) to improve functional performance. Preferably, general system design or functional definition, algorithm, electronic signal, or data file is provided initially to include one or more multi-threaded representations. Such initial prototype design or function may then be profiled or otherwise characterized for parallel or effectively similar processing, in particular, in order functionally to use or otherwise be implemented in one or more corresponding fixed, parameterizable, programmable, or configurable logic units or other equivalent functional signal-processing kernel or element, using temporal and/or non-temporal functional considerations.
Preferably, relatively complex system functionality, such as for application to digital communications and/or networking and/or media processing system design, is analyzed according to pre-specified system design rules, mathematical operations, sequences of operations, or parameters, and then symbolically or schematically represented to identify one or more algorithms, specific sequences of operations, patterns of memory accesses, or segments (i.e., single or multi-"threads"), which may each be profiled, structured, or otherwise characterized for optimized operation or implementation using one or more particular fixed, parameterizable, programmable, or configurable logic unit or kernel elements. Such element is built by providing a datapath, whose structure and configurability are determined via profiling; a sequencer/finite-state-machine, whose structure and configurability are determined via profiling; and local memory, whose structure is determined via profiling memory accesses and using locality to derive local memory properties. Optionally, one or more kernel elements are implemented entirely in software or programmable logic, or combination thereof. Further, as described herein, term "profiling" refers generally to automated and/or manual processing of one or more system or function modules to define one or more configurable structures associated with each module.
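As an illustration of deriving local memory properties from profiled memory accesses, the following is a minimal sketch only, not the method of the invention. It assumes a recorded address trace and a hypothetical coverage target, and sizes a local memory to hold the most frequently accessed addresses. The function name, the `coverage_percent` parameter, and the sample trace are all assumptions introduced for illustration.

```python
from collections import Counter

def profile_memory_accesses(trace, coverage_percent=90):
    """Pick the smallest set of hot addresses covering a share of all accesses.

    Hypothetical helper: `trace` is a list of accessed addresses and
    `coverage_percent` is an assumed design target, not a value from the text.
    """
    if not trace:
        raise ValueError("trace must contain at least one access")
    counts = Counter(trace)
    total = len(trace)
    covered = 0
    hot = []
    # Visit addresses from most to least frequently accessed.
    for addr, n in counts.most_common():
        if covered * 100 >= coverage_percent * total:
            break
        hot.append(addr)
        covered += n
    return {
        "local_words": len(hot),          # suggested local memory depth
        "hot_addresses": sorted(hot),     # addresses worth keeping local
        "hit_fraction": covered / total,  # share of accesses served locally
    }

# Assumed example trace: a tight loop over three words plus one stray access.
trace = [0, 1, 2, 0, 1, 2, 0, 1, 2, 100]
props = profile_memory_accesses(trace)
print(props)
```

In this sketch the stray access to address 100 stays in slower memory, since the three loop addresses already meet the assumed coverage target.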
Brief Description of Drawings
FIG. 1 is a general methodology and tool architecture diagram for implementing in software and/or hardware a preferred embodiment of the present invention.
FIGs. 2A-B are functional block diagrams for implementing one aspect of the present invention.
FIG. 3 is a representative functional diagram illustrating heterogeneous aspect of the present invention.
FIG. 4 is a representative functional diagram illustrating reconfigurable aspect of the present invention.
FIG. 5 is a representative functional diagram illustrating kernel aspect of the present invention.
FIG. 6 is a representative functional diagram illustrating interface aspect of the present invention.
FIG. 7 is a system methodology flow chart showing functional operations for implementing one or more aspects of the present invention.
FIG. 8 is representative of software code stubs for implementing one or more aspects of the present invention.
FIGs. 9A-B are representative functional diagrams of one or more applications of present invention.
Detailed Description of Preferred Embodiment
Present innovation enables automated design and implementation to process single or multi-threaded or equivalently partitioned processing of digital data, signals, or functional representation for improved processing performance. Initially, system design or functional definition, algorithm, electronic signal, or data file provides certain single or multi-threaded representation, whereupon one or more system design or function modules are profiled, structured, or otherwise characterized for parallel or concurrent processing. For example, multi-threaded prototype may be used or otherwise be implemented in fixed, parameterizable, programmable, or configurable logic unit or other signal-processing kernel or element. Hence, complex system functionality, such as digital communication, networking, or multi-media application, may be analyzed per system design rules, mathematical operations, sequences of operations, or parameters, then symbolically or schematically represented to identify certain single or multi-thread algorithms, specific sequences of operations, patterns of memory accesses, or segments, each thread being profiled or characterized to optimize operation or implementation using fixed, parameterizable, programmable, or configurable logic unit or kernel element.
Optionally, each single or multi-thread element is configured from a datapath structure, as determined by profiling; a sequencer and/or equivalent finite-state machine, whose structure and configurability are determined by profiling; and local memory, whose structure is determined by profiling memory accesses and locality to derive memory properties.
As used herein, profiling terminology is understood to refer generally to any computer-automated and/or manual processing, interpretation, or classification of one or more system or function modules to define or categorize one or more configurable structures associated with each module, e.g., by selecting or assigning one or more functional elements or design objects, such as interconnection, signals, logic, circuits, etc. Preferably, profiling is accomplished according to one or more previously and/or dynamically defined criteria or functional rule set.
Generally, in a computer-automated and/or manual development approach, a single or multi-threaded design is processed by providing initially a first-level functional definition representing a prototype system, such that an other-level functional definition symbolically representing equivalent functionality may be generated or effectively profiled therefrom. In this hierarchical design scheme, the generated symbolic representation may identify certain threads associated with the system design, preferably at one or more functional levels.
Each thread may be profiled for processing by corresponding kernel element(s), and one or more common set of operations is identified for given threads (e.g., on a 1-to-1, multiple-to-1, or 1-to-multiple thread-to-kernel relationship). Each thread may further be mapped to identify the sequence, or scheduling information, for each set of operators utilized to implement system or functional modules, such as a sequence of arithmetic operations, control operations, and/or memory access operations or related memory locations.
Hence, using the present system development methodology, a multi-threaded processing architecture may substantially include a set of kernel elements, such that one kernel element processes certain function represented by corresponding thread, and another kernel element in the same prototype design processes other function represented by other corresponding thread. In this partitioned or distributed processing approach, each thread may be profiled separately or hierarchically for appropriate multi-level or functional group processing. For example, a first-level or group kernel element and a second-level or group kernel element, respectively are associated with a corresponding first thread and second thread in a given function or system design.
In a representative system design for wireless code division multiple access (CDMA) communications application, it is contemplated that various kernels may be provided to serve different functional groups, such as: front-end processing (e.g., data switch selector, sample interpolation, etc.); chip-rate processing (e.g., sample epoch selection, matched filter, generic despreader, generic dechannelizer, code generation unit, integrate and dump, generic searcher control, etc.); symbol sequence processing (e.g., transport format decoder, dynamic spreading factor computer, fast Hadamard transform, etc.); channel element processing (e.g., alignment/deskewing, combiner, soft decision computer, interpath interference equalizer, receive antenna diversity combiner, etc.); interleaving (e.g., deinterleaver controller); and channel coding (e.g., turbo decoder, convolutional decoder, etc.).
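To make one of the symbol sequence processing kernels named above concrete, the following is a minimal software sketch of a fast Hadamard transform using the standard in-place butterfly structure. It is offered as an assumption-laden illustration rather than the kernel implementation contemplated here; the function name and sample inputs are hypothetical.

```python
def fast_hadamard_transform(samples):
    """Unnormalized fast Walsh-Hadamard transform (natural/Hadamard order).

    Illustrative sketch only: the input length must be a power of two.
    """
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(samples)
    h = 1
    while h < n:
        # Each pass combines pairs of elements that are h apart (butterflies).
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out

# A length-4 Walsh code (row 1 of the Hadamard matrix) correlates to one peak.
received = [1, -1, 1, -1]
spectrum = fast_hadamard_transform(received)
peak_index = max(range(len(spectrum)), key=lambda i: abs(spectrum[i]))
print(spectrum, peak_index)
```

The add/subtract butterfly is the only arithmetic operation needed, which is one reason such a transform maps naturally onto a small, parameterizable datapath.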
Generally, present approach enables one or more functional or system designs to be implemented efficiently, preferably via current multi-threading scheme, in a single processor architecture by re-parameterizing, reprogramming, or reconfiguring kernel elements (i.e., as determined by profiling technique as described further herein), from which corresponding threads are assembled, and/or by changing sequence of operations (i.e., as determined by mapping and/or scheduling) with which threads are implemented. Preferred embodiment implements functional or system design in one or more heterogeneous and reconfigurable logic or kernel elements (i.e., according to so-called "DRL" process, as described further herein).
FIG. 1 is a general architecture or system block diagram showing top-level
overview of present design methodology, functional modules, and software and/or hardware tool architecture, preferably implemented in one or more electronic design automation platforms, including one or more stand-alone or networked computers, processors, engineering workstations, or other compute facility having appropriate
operating system, user interface, storage management, communications interfaces, and
other computer-aided design and engineering tools. Preferably, it is contemplated that
present design methodology serves to provide a tool architecture and processor implementation and architecture, or data file representative thereof, for enabling
system architecture, such as network implementation.
As shown, initially one or more functional definition files 10, such as design netlist, or high-level description language (such as C or HDL) defining one or more functional modules or algorithms 12, are provided manually or computed automatically. In accordance with one aspect of present implementation, functionally-selective profiling and mapping scheme 14 is processed or applied to primitives 16 and functional definitions 10 to generate or provide, particularly on a multi-threaded basis, one or more control and communication signals 26 and kernels 18. Further, profiling and mapping 14 provides scheduling data for schedule operation tables 20. Control and communication signals are processed according to one or more predefined or selected functional rule set or signaling flags, e.g., communication semaphores 24. Various kernels 18 are processed and interconnected for implementation 22, for example, in reconfigurable form as described herein for multi-threaded signal processing.
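The flow just described, in which a functional definition and a set of primitives yield kernels and schedule operation tables, can be sketched in software as follows. This is a simplified illustration under stated assumptions: the functional definition is modeled as an ordered list of (thread, operation) pairs, and the primitive names, thread names, and function name are hypothetical rather than taken from FIG. 1.

```python
from collections import defaultdict

# Hypothetical primitive library standing in for primitives 16.
PRIMITIVES = {"mul", "add", "load", "store"}

def profile_and_map(functional_definition):
    """Derive per-thread kernel operation sets and schedule tables.

    Assumption: each entry is a (thread_name, operation) pair, listed in
    execution order. Real profiling would consider far more than op names.
    """
    kernels = defaultdict(set)    # thread -> operations its kernel must support
    schedule = defaultdict(list)  # thread -> ordered (step, operation) entries
    for step, (thread, op) in enumerate(functional_definition):
        if op not in PRIMITIVES:
            raise ValueError(f"unknown primitive: {op}")
        kernels[thread].add(op)
        schedule[thread].append((step, op))
    return dict(kernels), dict(schedule)

# Assumed two-thread definition: a despreading thread and a combining thread.
definition = [
    ("despread", "load"), ("despread", "mul"), ("despread", "add"),
    ("combine", "load"), ("combine", "add"), ("combine", "store"),
]
kernels, schedule = profile_and_map(definition)
print(kernels)
print(schedule)
```

Keeping the operation set (what a kernel must support) separate from the schedule (when each operation runs) mirrors the separation between kernels 18 and schedule operation tables 20.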
FIGs. 2A-B functional block diagrams show representative set of kernels 18,
28 and their physical implementation, including schedule and allocate function 30.
Preferably, one or more kernel 18 is associated with or corresponds to profiled and mapped thread, and is implemented reconfigurably using sequencer 32, datapath 34,
and memory 36.
Hence, according to present system and circuit design methodology and/or computing apparatus, general functional definition is implementable using single or multi-threaded representation thereof, which may be profiled effectively for parallel processing using one or more corresponding kernel logic elements (e.g., according to 1-to-multi, 1-to-1, multi-to-1, or multi-to-multi kernel-to-thread relationship). For example, communication, networking, or media processing functionality or algorithm is functionally analyzed and symbolically represented to identify one or more thread segments, which are each profiled or otherwise characterized for optimized operation or implementation using one or more particularly designated fixed, parameterizable, programmable, or reconfigurable logic kernel.
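The relationships mentioned above can be made concrete with a short sketch that labels each thread-to-kernel assignment. The labels here are written from the thread side (for example, "multi-to-1" means several threads share one kernel), which is an assumed convention; the thread and kernel names are hypothetical.

```python
from collections import defaultdict

def classify_assignments(pairs):
    """Label each (thread, kernel) pair by its sharing relationship.

    Illustrative only: the left label counts threads per kernel and the
    right label counts kernels per thread.
    """
    kernels_of = defaultdict(set)
    threads_of = defaultdict(set)
    for thread, kernel in pairs:
        kernels_of[thread].add(kernel)
        threads_of[kernel].add(thread)
    labels = {}
    for thread, kernel in pairs:
        left = "multi" if len(threads_of[kernel]) > 1 else "1"
        right = "multi" if len(kernels_of[thread]) > 1 else "1"
        labels[(thread, kernel)] = f"{left}-to-{right}"
    return labels

# Assumed assignments for illustration.
pairs = [
    ("searcher", "k0"),        # one thread on its own kernel
    ("despreader", "k1"),      # two threads sharing kernel k1
    ("dechannelizer", "k1"),
    ("turbo_decoder", "k2"),   # one thread spread over two kernels
    ("turbo_decoder", "k3"),
]
labels = classify_assignments(pairs)
print(labels)
```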
FIG. 3 functional diagram shows representative heterogeneous, reconfigurable,
multi-processing arrangement, for example, whereupon kernel 8 may implement
"small" granularity threaded function, and kernel 6 may implement "large" granularity
threaded function. In this reconfigurable arrangement, various levels of functional
granularity, which is preferably an attribute of design function and corresponding
kernel, may be implemented or dynamically reconfigured according to design requirement or profile mapping preference.
For further illustration, FIG. 4 functional diagram shows one or more
representative or available configurable logic or functions which may be employed
according to present approach for implementing single or multi-threads into
designated kernels, such as reconfigurable logic or programmable function units
(PFU) 40 having programmable logic elements and switch matrix (e.g., for encoding bit-level operations), reconfigurable datapaths 42 having multiplexers, registers, adders, buffers, etc. and configurable signal flow through these elements (e.g., for
dedicated datapath filters), reconfigurable arithmetic 44 having address generators,
memory, memory address control, etc. (e.g., for arithmetic convolution kernels), and
reconfigurable control 46 having data memory, datapath, program memory, instruction decoder and controller, etc. (e.g., for real-time operating system process
management).
Moreover, as further illustration of sample kernel implementation, FIG. 5 functional diagram shows preferred functional elements for implementing kernel 18, including data sequencer 32, data memory 36, and parameterizable configurable
arithmetic logic unit (ALU) 34.
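A behavioral sketch of such a kernel, with a sequencer stepping through a small program, a local data memory, and an ALU whose word width is a parameter, might look as follows. This is an assumption-based software model, not the hardware of FIG. 5; the class name, the four-operation instruction format, and the accumulator are hypothetical simplifications.

```python
class Kernel:
    """Toy model: sequencer (program), data memory, parameterizable ALU."""

    def __init__(self, program, memory, width=16):
        self.program = list(program)   # sequencer: ordered (op, address) steps
        self.memory = list(memory)     # local data memory
        self.mask = (1 << width) - 1   # ALU word width is a configuration parameter
        self.acc = 0                   # single accumulator register (assumption)

    def alu(self, op, a, b):
        """Apply one arithmetic operation and wrap to the configured width."""
        if op == "add":
            result = a + b
        elif op == "sub":
            result = a - b
        elif op == "mul":
            result = a * b
        else:
            raise ValueError(f"unsupported ALU operation: {op}")
        return result & self.mask

    def run(self):
        """Execute the program in order and return the final accumulator."""
        for op, addr in self.program:
            if op == "load":
                self.acc = self.memory[addr] & self.mask
            elif op == "store":
                self.memory[addr] = self.acc
            else:
                self.acc = self.alu(op, self.acc, self.memory[addr])
        return self.acc

# Assumed program: acc = mem[0] * mem[1] + mem[2], then store into mem[3].
program = [("load", 0), ("mul", 1), ("add", 2), ("store", 3)]
wide = Kernel(program, [3, 4, 5, 0], width=16)
wide_result = wide.run()
narrow_result = Kernel(program, [3, 4, 5, 0], width=4).run()
print(wide_result, narrow_result)
```

Running the same program at two widths shows how a single parameter changes kernel behavior without changing the sequencer program: the 4-bit configuration wraps the result modulo 16.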
FIG. 6 is a representative functional diagram illustrating optional interface
between dynamically reconfigurable logic (DRL) process 64 and associated
configuration database for processing functions externally to main processor hardware
model 50. Preferably, DRL process is heterogeneous and reconfigurable, and implemented using current innovation. As shown, hardware interfaces 54 couple processor element 52 associated with library 62 and specified functional modules 60, including processor software model 57 having C-program model 56 and input/output device drivers 58, to external DRL process 64.
In this optional embodiment, one or more single or multi-threaded digital
information (e.g., signal or data representation), such as general system design or
functional definition, algorithm, electronic signal or data file is provided initially to
include one or more multi-threaded representation, and such initial prototype design
or function is profiled or otherwise characterized for parallel or effectively similar processing, in particular, in order functionally to use or otherwise be implemented in one or more corresponding fixed, parameterizable, programmable, or configurable
logic unit or other equivalent functional signal-processing kernel or element in
processor model 50, 57 for functional cooperation or emulated real-time signal
interaction with external DRL process 64.
FIG. 7 flow chart shows another aspect of present operational steps. Initially, user-generated or computer-generated functions are defined 70 for prototype or other system design. Then, one or more mathematical analysis or design performance optimization schemes may be applied 72 to initial design definition. Next, one or more constituent algorithms for design definition are provided 74, and representation of such algorithms is thereby coded 76, preferably in high-level, register transfer, or behavioral functional format.
Algorithms may be profiled and mapped 78, or otherwise functionally defined
or categorized manually and/or automatically for optimized or directed operation or
implementation of system design modules, functions, signals, components, or other
element thereof using correspondingly defined kernels 80, preferably using one or more specified design building-blocks, i.e., primitives 86. Profiling and mapping data
also are provided for communications semaphores 84 and scheduling and finite state
machine control and parameters 88. Then, kernel definition 80 and FSM control
parameterization and scheduling 88, as well as communications semaphores 84, are
applied to implement single or multi-threaded elements of present design into
processor architecture with reconfigurable kernel elements 82. FIG. 8 shows
representative software code of sample design indicating usage of multi-thread kernels 90.
In accordance with one aspect of present invention, profiling of processing or
reconfigurable algorithms representative thereof is temporal, thereby including
determination of certain time value or degree of change over time. Example of
temporal application includes changes in receiver algorithms required in a cellular
wireless system and any associated signal processing scheme for these algorithms which can take advantage of present profiling methodology. In this example,
where processing throughput requirements in one path (e.g., reception direction) may increase or decrease as processing progresses (e.g., from antenna to final retrieved data representation), present profiling scheme serves to determine
hardware-software or other functional partitioning of overall design implementation.
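By way of non-limiting illustration, the partitioning decision described above may be sketched in Python. The stage names, the throughput figures, and the `software_limit` threshold are hypothetical and are not taken from this disclosure; the sketch assumes only that a stage whose profiled throughput requirement exceeds what a software processor can sustain is assigned to hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str              # hypothetical stage label
    ops_per_second: float  # profiled throughput requirement of the stage


def partition(stages, software_limit):
    """Assign each stage to 'hardware' or 'software' by its profiled rate."""
    return {
        s.name: "hardware" if s.ops_per_second > software_limit else "software"
        for s in stages
    }


# Hypothetical receive path, ordered from antenna to retrieved data.
# Throughput requirements decrease as processing progresses.
RECEIVE_PATH = [
    Stage("chip_rate_filter", 4.0e9),
    Stage("despreader", 5.0e8),
    Stage("symbol_decoder", 2.0e6),
    Stage("frame_parser", 1.0e4),
]

if __name__ == "__main__":
    for name, target in partition(RECEIVE_PATH, software_limit=1.0e7).items():
        print(f"{name}: {target}")
```

In this sketch a single threshold separates the two partitions; an actual design may weigh additional factors such as power, silicon area, or required flexibility.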
Further, in such cellular wireless example, it is contemplated that multiple
methods may perform similar or equivalent signal processing, but result in different
air-interface requirements or effective functionality. Particularly in the hardware
partition of a given system, various processing forms or functional elements may
occur or operate at various rates. Because variable processing rates may be required, and various modes of operational control may be dictated by support for multiple
processing streams, several additional non-temporal and temporal profiling techniques
may be applied to provide optimal functional flexibility in view of available
operational performance point or capacity of such hardware architecture (e.g., real-time
and non-real-time profiling). It is contemplated generally herein that other
examples of application of present innovation may arise additionally with cellular
wireless, including fixed-wireless, unlicensed wireless LANs, cordless telephony, telemetry, and the like.
One profiling technique applies to hardware-based algorithms across multiple
modes of operation to determine type and number of operations and storage elements
required, thereby enabling designer to classify each temporally-distinct function in a
form which facilitates identification of commonly-used resources.
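By way of non-limiting illustration, such multi-mode profiling may be sketched in Python as follows. The mode names and operation traces are hypothetical; the sketch assumes that each mode can be summarized as a list of operation types, counts each type per mode, and reports the operation types used in every mode together with the peak count required across modes.

```python
from collections import Counter

# Hypothetical operation traces, one per mode of operation.
# "load" and "store" stand in for storage-element accesses.
MODE_TRACES = {
    "mode_a": ["mul", "add", "mul", "add", "load", "store"],
    "mode_b": ["mul", "add", "cmp", "load"],
    "mode_c": ["add", "shift", "load", "load"],
}


def profile_modes(traces):
    """Count the type and number of operations required by each mode."""
    return {mode: Counter(ops) for mode, ops in traces.items()}


def common_resources(profiles):
    """Return operation types used in every mode, with the peak count needed."""
    shared = set.intersection(*(set(p) for p in profiles.values()))
    return {op: max(p[op] for p in profiles.values()) for op in sorted(shared)}


if __name__ == "__main__":
    profiles = profile_modes(MODE_TRACES)
    print(common_resources(profiles))
```

Operation types appearing in every mode are candidates for shared resources, while types appearing in a single mode may suggest mode-specific elements.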
Another profiling technique applies for controlling multiple levels of hardware definition according to required frequency of change. Here, mode-dependent
changes in receive path of wireless receiver, for example, may need to occur at startup for global reconfiguration, for configuration between transactions (e.g.,
where transactions are multi-second transactions), and within sub-second transaction
across blocks of data (e.g., "on the fly").
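By way of non-limiting illustration, classification by frequency of change may be sketched in Python. The names `ReconfigLevel` and `classify_change`, and the use of a change interval measured in seconds, are assumptions made for this sketch and do not appear in this disclosure.

```python
from enum import Enum


class ReconfigLevel(Enum):
    STARTUP = "global reconfiguration at startup"
    BETWEEN_TRANSACTIONS = "configuration between transactions"
    ON_THE_FLY = "change across blocks of data within a transaction"


def classify_change(change_interval_s, transaction_s):
    """Pick a reconfiguration level from how often a function must change.

    change_interval_s: seconds between changes, or None if the function
        never changes after startup (hypothetical convention).
    transaction_s: duration of one transaction in seconds.
    """
    if change_interval_s is None:
        return ReconfigLevel.STARTUP
    if change_interval_s >= transaction_s:
        return ReconfigLevel.BETWEEN_TRANSACTIONS
    return ReconfigLevel.ON_THE_FLY


if __name__ == "__main__":
    # A function that changes every 10 ms during a 2-second transaction.
    print(classify_change(0.01, 2.0).value)
```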
Depending on profiling results, appropriate level of configurable
implementation may be selected, such as for processing data at highest data rate
needing control on per-cycle basis. However, flexibility may be required for control,
and programmable state machine may provide optimal flexibility meeting necessary
performance requirements. For a datapath which may need to be selected at configuration time, but is not changed often, programmable interconnect may be
appropriately applied.
Moreover, if datapath selection occurs in real time, then datapath-cell-based
multiplexing structure may apply. Also, for control functions where operation
ordering is necessary, parameterized kernels for processing operations may apply.
Additionally, in cases of high-performance requirements and low flexibility
requirements, dedicated datapaths are applicable to optimize silicon implementation.
In case of multi-standard wireless receiver design, which delivers optimal flexibility relative to performance point, one or more of foregoing profiling techniques are
applicable.
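By way of non-limiting illustration, the selection guidance of the preceding paragraphs may be summarized as a rule table in Python. The field names of `ProfileResult` and the function `select_structures` are hypothetical; the sketch assumes that profiling results can be reduced to boolean flags, each of which maps to one of the structures named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileResult:
    # Hypothetical flags summarizing profiling results for one function.
    needs_per_cycle_control: bool = False
    datapath_fixed_after_configuration: bool = False
    datapath_selected_in_real_time: bool = False
    operation_ordering_required: bool = False
    high_performance_low_flexibility: bool = False


# Each rule pairs a profiling flag with the structure suggested for it.
RULES = (
    ("needs_per_cycle_control", "programmable state machine"),
    ("datapath_fixed_after_configuration", "programmable interconnect"),
    ("datapath_selected_in_real_time", "datapath-cell-based multiplexing structure"),
    ("operation_ordering_required", "parameterized kernel"),
    ("high_performance_low_flexibility", "dedicated datapath"),
)


def select_structures(profile):
    """Return the structures suggested for a profiled function, in rule order."""
    return [structure for flag, structure in RULES if getattr(profile, flag)]


if __name__ == "__main__":
    profile = ProfileResult(
        needs_per_cycle_control=True,
        datapath_selected_in_real_time=True,
    )
    print(select_structures(profile))
```

More than one structure may be returned for a single function, consistent with a multi-standard receiver design in which several of the foregoing techniques apply together.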
FIG. 9A shows general aspects of applying present invention, including flow
for transferring configuration table 92 of capability, parameters and values according
to one or more industry or proprietary standards through applications programming
interface (API) 94 to provide one or more configuration parameters for single or multi-threaded reconfigurable system implementation according to present scheme,
e.g., using wired and/or over-the-air wireless network download or other
transmission/reception.
Preferred implementation receives configuration parameters through API 94 to define or implement one or more interconnected block modules 96, representing
microprocessor, digital signal processor (DSP), application specific integrated circuit
(ASIC), field programmable gate array (FPGA), DRL, or other functional block
module, which further may be defined or implemented in one or more interconnected
kernel elements 98. In accordance with one aspect of present invention, one or more
configurable parameters 100 may be defined or implemented to correspond in
threaded fashion to one or more specified kernel elements. Hence, in this configurable-parameter case, design and implementation method or system serves to
process multi-threaded digital signal or data for improved functional performance.
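By way of non-limiting illustration, applying a configuration table to kernel elements on a per-thread basis may be sketched in Python. The table layout, the kernel names, the parameter names, and the function `apply_configuration` are hypothetical and do not represent API 94 itself; the sketch assumes only that each table entry associates one thread with one kernel element and a set of parameter values.

```python
# Hypothetical configuration table: one entry per (thread, kernel) pairing.
CONFIG_TABLE = [
    {"thread": "t0", "kernel": "correlator", "params": {"length": 64}},
    {"thread": "t1", "kernel": "correlator", "params": {"length": 128}},
    {"thread": "t0", "kernel": "filter", "params": {"taps": 16}},
]


def apply_configuration(table, known_kernels):
    """Build {kernel: {thread: params}} from a configuration table.

    Entries naming a kernel element that is not in known_kernels are
    rejected rather than silently ignored.
    """
    configured = {name: {} for name in known_kernels}
    for entry in table:
        kernel = entry["kernel"]
        if kernel not in configured:
            raise KeyError(f"unknown kernel element: {kernel}")
        # Copy the parameters so later edits to the table do not leak through.
        configured[kernel][entry["thread"]] = dict(entry["params"])
    return configured


if __name__ == "__main__":
    result = apply_configuration(CONFIG_TABLE, ["correlator", "filter"])
    for kernel, threads in result.items():
        print(kernel, threads)
```

In this sketch the same kernel element ("correlator") receives different parameter values for each thread, corresponding to configurable parameters defined in threaded fashion for a specified kernel element.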
Generally, system design or functional definition, algorithm, electronic signal
or data file is provided to include such multi-threaded representation, and initial
prototype function is thus profiled for parallel processing by one or more threads, for
example, to implement certain parameterizable kernel elements, which may be
constrained temporally.
More particularly, in digital wireless communication application, as shown in
FIG. 9B, portable mobile radio handsets 102 transmit and receive signals wirelessly
with base station 104, possibly coupled to other handsets 102 and base stations 104
through digital network 106. In this networked application, specified design rules,
operations, or parameters, as well as any symbolic or schematic representation thereof
identify or correspond to multi-threads, for profiling and implementation in programmable kernels or software modules.
Optionally, kernel elements may be configured for operation in base station 104 and/or handset units 102. In particular, kernels may be configured for profiled
datapath, sequencer/finite-state-machine, memory, or other logical structure, possibly
according to temporal or non-temporal design constraint.
Foregoing described embodiments of the invention are provided as
illustrations and descriptions. They are not intended to limit the invention to precise
form described.
In particular, Applicant contemplates that functional implementation of
invention described herein may be implemented equivalently in hardware, software,
firmware, and/or other available functional components or building blocks. Other
variations and embodiments are possible in light of above teachings, and it is thus
intended that the scope of invention not be limited by this Detailed Description, but
rather by Claims following.

Claims

What is claimed is:
1. In a computer-assisted design system, an automated method for processing
multi-threaded system functionality, the method comprising the steps of:
providing a first function definition representing a system design;
generating from the first function definition a second function definition
representing symbolically the first function definition, such symbolic representation
identifying one or more thread associated with the system design; and
profiling each thread for processing by a specified kernel element or set
thereof.
2. The method of Claim 1 further comprising the steps of:
identifying a common sequence of operations in a given thread; and
associating the common sequence of operations with a set of operators.
3. The method of Claim 2 further comprising the step of:
associating the set of operators with a sequence of arithmetic operations.
4. The method of Claim 2 further comprising the step of:
associating the set of operators with a sequence of control operations.
5. The method of Claim 2 further comprising the step of:
associating the set of operators with a sequence of memory access operations
or locations.
6. The method of Claim 1 wherein:
one or more threads is profiled according to a temporal function.
7. Apparatus for multi-threaded processing comprising:
a first kernel element; and
a second kernel element;
wherein the first kernel element processes a first function represented by a first
thread, the second kernel element processes a second function represented by a second thread, the first thread and the second thread each being profiled for processing respectively by the first kernel element and the second kernel element, and the first
thread and the second thread being associated with a common function.
8. The apparatus of Claim 7 wherein:
a common sequence of operations is identifiable with a given thread,
the common sequence of operations being associated with a set of operators.
9. The apparatus of Claim 8 wherein:
the set of operators is associated with a sequence of arithmetic, control, or
memory access operations.
10. The apparatus of Claim 7 wherein:
the first or second thread is profiled according to a temporal constraint.
11. The apparatus of Claim 7 wherein:
the first and second kernel elements are implemented as one or more executable
software modules.
12. The apparatus of Claim 7 wherein:
the first and second kernel elements are implemented as one or more
functional modules in a fixed base station or a mobile handset of a radio communication system.
13. In a communication system comprising a base station and one or more portable units, wherein each portable unit may communicate wirelessly through radio
signals with the base station, a method for signal processing comprising the step of:
generating by a base station a first signal representing a system configuration,
the first signal representing symbolically one or more function definition associated
with one or more thread in the system configuration, wherein each thread is profiled for processing by a specified kernel element in a portable unit.
14. The method of Claim 13 further comprising the step of:
receiving the first signal by the portable unit, one or more kernel element in
the portable unit being configured to process one or more thread in the system
design according to the first signal.
15. The method of Claim 13 wherein:
one or more thread is profiled according to a temporal functional constraint.
PCT/US2001/002982 2000-01-27 2001-01-29 Improved apparatus and method for multi-threaded signal processing WO2001055917A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
DE10195202T DE10195202T1 (en) 2000-01-27 2001-01-29 Method and device for multi-branched signal processing
AU2001233119A AU2001233119A1 (en) 2000-01-27 2001-01-29 Improved apparatus and method for multi-threaded signal processing
GB0217126A GB2374701B (en) 2000-01-27 2001-01-29 Improved apparatus and method for multi-threaded signal processing
KR1020027009711A KR100784412B1 (en) 2000-01-27 2001-01-29 Improved apparatus and method for multi-threaded signal processing
JP2001555391A JP2003521072A (en) 2000-01-27 2001-01-29 Improved apparatus and method for multithreaded signal processing

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US49263400A 2000-01-27 2000-01-27
US09/492,634 2000-01-27

Publications (1)

Publication Number Publication Date
WO2001055917A1 true WO2001055917A1 (en) 2001-08-02

Family

ID=23956997

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/002982 WO2001055917A1 (en) 2000-01-27 2001-01-29 Improved apparatus and method for multi-threaded signal processing

Country Status (6)

Country Link
JP (1) JP2003521072A (en)
KR (1) KR100784412B1 (en)
AU (1) AU2001233119A1 (en)
DE (1) DE10195202T1 (en)
GB (1) GB2374701B (en)
WO (1) WO2001055917A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6839889B2 (en) 2000-03-01 2005-01-04 Realtek Semiconductor Corp. Mixed hardware/software architecture and method for processing xDSL communications
US7693257B2 (en) 2006-06-29 2010-04-06 Accuray Incorporated Treatment delivery optimization
USRE44365E1 (en) 1997-02-08 2013-07-09 Martin Vorbach Method of self-synchronization of configurable elements of a programmable module
US8869121B2 (en) 2001-08-16 2014-10-21 Pact Xpp Technologies Ag Method for the translation of programs for reconfigurable architectures
US8914590B2 (en) 2002-08-07 2014-12-16 Pact Xpp Technologies Ag Data processing method and device
US9037807B2 (en) 2001-03-05 2015-05-19 Pact Xpp Technologies Ag Processor arrangement on a chip including data processing, memory, and interface elements
US9047440B2 (en) 2000-10-06 2015-06-02 Pact Xpp Technologies Ag Logical cell array and bus system
US9075605B2 (en) 2001-03-05 2015-07-07 Pact Xpp Technologies Ag Methods and devices for treating and processing data

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7657893B2 (en) * 2003-04-23 2010-02-02 International Business Machines Corporation Accounting method and logic for determining per-thread processor resource utilization in a simultaneous multi-threaded (SMT) processor
CN107193539B (en) * 2016-03-14 2020-11-24 北京京东尚科信息技术有限公司 Multithreading concurrent processing method and multithreading concurrent processing system
US11288072B2 (en) * 2019-09-11 2022-03-29 Ceremorphic, Inc. Multi-threaded processor with thread granularity

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4821220A (en) * 1986-07-25 1989-04-11 Tektronix, Inc. System for animating program operation and displaying time-based relationships
US5519867A (en) * 1993-07-19 1996-05-21 Taligent, Inc. Object-oriented multitasking system
US5537226A (en) * 1994-11-22 1996-07-16 Xerox Corporation Method for restoring images scanned in the presence of vibration
US5870588A (en) * 1995-10-23 1999-02-09 Interuniversitair Micro-Elektronica Centrum(Imec Vzw) Design environment and a design method for hardware/software co-design
US6112020A (en) * 1996-10-31 2000-08-29 Altera Corporation Apparatus and method for generating configuration and test files for programmable logic devices

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5946487A (en) * 1996-06-10 1999-08-31 Lsi Logic Corporation Object-oriented multi-media architecture

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BERNARD K. GUNTHER: "Multithreading with Distributed Functional Units", IEEE TRANSACTIONS ON COMPUTERS, vol. 46, no. 4, April 1997 (1997-04-01), pages 399 - 411, XP002939210 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
USRE44365E1 (en) 1997-02-08 2013-07-09 Martin Vorbach Method of self-synchronization of configurable elements of a programmable module
USRE45109E1 (en) 1997-02-08 2014-09-02 Pact Xpp Technologies Ag Method of self-synchronization of configurable elements of a programmable module
USRE45223E1 (en) 1997-02-08 2014-10-28 Pact Xpp Technologies Ag Method of self-synchronization of configurable elements of a programmable module
US6839889B2 (en) 2000-03-01 2005-01-04 Realtek Semiconductor Corp. Mixed hardware/software architecture and method for processing xDSL communications
US6965960B2 (en) 2000-03-01 2005-11-15 Realtek Semiconductor Corporation xDSL symbol processor and method of operating same
US9047440B2 (en) 2000-10-06 2015-06-02 Pact Xpp Technologies Ag Logical cell array and bus system
US9037807B2 (en) 2001-03-05 2015-05-19 Pact Xpp Technologies Ag Processor arrangement on a chip including data processing, memory, and interface elements
US9075605B2 (en) 2001-03-05 2015-07-07 Pact Xpp Technologies Ag Methods and devices for treating and processing data
US8869121B2 (en) 2001-08-16 2014-10-21 Pact Xpp Technologies Ag Method for the translation of programs for reconfigurable architectures
US8914590B2 (en) 2002-08-07 2014-12-16 Pact Xpp Technologies Ag Data processing method and device
US7693257B2 (en) 2006-06-29 2010-04-06 Accuray Incorporated Treatment delivery optimization

Also Published As

Publication number Publication date
JP2003521072A (en) 2003-07-08
GB2374701A (en) 2002-10-23
DE10195202T1 (en) 2003-04-30
GB2374701B (en) 2004-12-15
KR20030004327A (en) 2003-01-14
GB0217126D0 (en) 2002-09-04
KR100784412B1 (en) 2007-12-11
AU2001233119A1 (en) 2001-08-07

Similar Documents

Publication Publication Date Title
KR100358631B1 (en) Application specific processor and design method for same
Master The next big leap in reconfigurable systems
Wolf et al. Multiprocessor system-on-chip (MPSoC) technology
US5867400A (en) Application specific processor and design method for same
US20060026578A1 (en) Programmable processor architecture hirarchical compilation
EP1953649B1 (en) Reconfigurable integrated circuit
Smit et al. Dynamic reconfiguration in mobile systems
CN1653446A (en) High-performance hybrid processor with configurable execution units
US20100293356A1 (en) Method and system for managing hardware resources to implement system functions using an adaptive computing architecture
WO2002097562A2 (en) Method and system for scheduling in an adaptable computing engine
WO2001055917A1 (en) Improved apparatus and method for multi-threaded signal processing
Bondalapati et al. Reconfigurable computing: Architectures, models and algorithms
Niu et al. Automating elimination of idle functions by runtime reconfiguration
Pillement et al. DART: a functional-level reconfigurable architecture for high energy efficiency
Auras et al. CMA: Chip multi-accelerator
David et al. Energy-Efficient Reconfigurable Processsors
Ueda et al. Architecture-level performance estimation method based on system-level profiling
David et al. A compilation framework for a dynamically reconfigurable architecture
Chen et al. Flexible heterogeneous multicore architectures for versatile media processing via customized long instruction words
Tiensyrjä et al. Systemc and ocapi-xl based system-level design for reconfigurable systems-on-chip
Bossuet et al. Targeting tiled architectures in design exploration
Ou et al. Performance modeling of reconfigurable SoC architectures and energy-efficient mapping of a class of application
David et al. Mapping future generation mobile telecommunication applications on a dynamically reconfigurable architecture
Vasconcellos Parallel signal-processing for everyone
Masselos et al. System level architecture exploration for reconfigurable systems on chip

Legal Events

Date Code Title Description
AK Designated states (Kind code of ref document: A1; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW)
AL Designated countries for regional patents (Kind code of ref document: A1; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG)
121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
ENP Entry into the national phase (Ref document number: 200217126; Country of ref document: GB; Kind code of ref document: A)
WWE Wipo information: entry into national phase (Ref document number: 1020027009711; Country of ref document: KR)
ENP Entry into the national phase (Ref document number: 2001 555391; Country of ref document: JP; Kind code of ref document: A)
WWP Wipo information: published in national office (Ref document number: 1020027009711; Country of ref document: KR)
122 Ep: pct application non-entry in european phase
RET De translation (de og part 6b) (Ref document number: 10195202; Country of ref document: DE; Date of ref document: 20030430; Kind code of ref document: P)
WWE Wipo information: entry into national phase (Ref document number: 10195202; Country of ref document: DE)
REG Reference to national code (Ref country code: DE; Ref legal event code: 8607)