DE19926538A1

DE19926538A1 - Hardware with decoupled configuration register partitions data flow or control flow graphs into time-separated sub-graphs and forms and implements them sequentially on a component

Info

Publication number: DE19926538A1
Application number: DE19926538A
Authority: DE
Inventors: Martin Vorbach
Original assignee: Pact Informationstechnologie GmbH
Current assignee: Pact Informationstechnologie GmbH
Priority date: 1999-06-10
Filing date: 1999-06-10
Publication date: 2000-12-14

Abstract

The hardware has a decoupled configuration register and executes programs on a component with a one- or multi-dimensional cell structure. Data flow or control flow graphs are partitioned into temporally separated subgraphs, which are formed and implemented on the component sequentially. An independent claim is also included for a method for executing programs on a component with a one- or multi-dimensional cell structure.

Description

Object of the invention and areas of application

The present invention relates to the field of programmable arithmetic and/or logic blocks, in particular blocks that can be reprogrammed during operation, having a large number of arithmetic and/or logic units whose interconnection is likewise programmable and reprogrammable during operation. Such logic blocks are available from various companies under the generic term FPGA. In addition, several patents have been published that disclose special arithmetic blocks with automatic data synchronization and improved …

All of the blocks described have a two- or multi-dimensional arrangement of logic and/or arithmetic units that can be interconnected via bus systems.

The object of the invention is to provide a programming method that allows the described blocks to be programmed efficiently in common high-level languages while exploiting, largely automatically, completely and efficiently, the parallelism that results from the large number of units in these blocks.

State of the art

Blocks of this type are mostly programmed using ordinary data flow languages. Two fundamental problems arise:

1. Programming in data flow languages takes programmers some getting used to; deeply sequential tasks can be described only with great difficulty.
2. With the existing translation programs (synthesis tools), large applications and sequential descriptions can be mapped (synthesized) onto the desired target technology only to a limited extent.

Applications are usually partitioned into several sub-applications, which are then synthesized individually for the target technology (Fig. 1). The individual binary codes are then each loaded onto a separate block. An essential prerequisite of the invention is the method described in DE 44 16 881, which makes it possible to use several partitioned sub-applications within one block: the temporal dependencies are analyzed, and control signals are used to request the required sub-applications sequentially from a higher-level loading unit, which then loads them onto the block.

Existing synthesis tools are able to map program loops onto blocks only to a limited extent (Fig. 2 (0201)).

So-called FOR loops (0202), as primitive loops, are often still supported by unrolling the loop completely onto the resources of the target block.

In contrast to FOR loops, WHILE loops (0203) do not have a constant termination value. Instead, a condition is evaluated to decide when the loop terminates. Therefore, unless the condition is constant, it is usually not known at synthesis time when the loop will terminate. Because of this dynamic behavior, synthesis tools cannot map these loops statically onto hardware, i.e. onto a target block.
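The difference can be illustrated with a small Python software model (not a synthesis tool). The helper names `unroll_for` and `run_while` are hypothetical: because a FOR loop has a constant trip count, one copy of the loop body per iteration can be allocated in advance, much as a synthesis tool unrolls it onto cells. A WHILE loop cannot be expanded this way, since its trip count depends on runtime data.

```python
from typing import Callable, List

Op = Callable[[int], int]

def unroll_for(trip_count: int, body: Op) -> List[Op]:
    """Hypothetical helper: expand a FOR loop with a constant trip count into a
    fixed chain of operations, one per iteration (one 'copy of hardware' each)."""
    return [body for _ in range(trip_count)]

def run_chain(chain: List[Op], x: int) -> int:
    """Feed a value through the unrolled chain, stage by stage."""
    for stage in chain:
        x = stage(x)
    return x

def run_while(cond: Callable[[int], bool], body: Op, x: int) -> tuple[int, int]:
    """Hypothetical WHILE model: the iteration count is only known at runtime,
    so no fixed chain can be built in advance. Returns (result, iterations)."""
    iterations = 0
    while cond(x):
        x = body(x)
        iterations += 1
    return x, iterations

# FOR i in 0..3: x = x * 2 -> four stages allocated before execution
chain = unroll_for(4, lambda v: v * 2)
print(len(chain), run_chain(chain, 1))                     # 4 16

# WHILE x < 100: x = x * 2 -> iteration count depends on the input value
print(run_while(lambda v: v < 100, lambda v: v * 2, 1))    # (128, 7)
print(run_while(lambda v: v < 100, lambda v: v * 2, 50))   # (100, 1)
```

The same WHILE body needs seven iterations for one input and one for another, which is exactly the information a static mapping would need but does not have.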

In general, recursions cannot be mapped onto hardware if the recursion depth is not known, and therefore constant, at synthesis time. In a recursion, new resources are allocated at each new recursion level. This would mean that new hardware would have to be provided at each recursion level, which is not possible dynamically.

Even simple basic structures can be mapped by synthesis tools only if the target block is sufficiently large, i.e. offers sufficient resources.

Simple temporal dependencies (0301) are not partitioned into several sub-applications by today's synthesis tools and can therefore be transferred to a target block only as a whole.

Conditional executions (0302) and loops over conditions (0303) can likewise be mapped only if sufficient resources exist on the target block.

Method according to the invention

The method described in DE 44 16 881 makes it possible to detect conditions at runtime within the hardware structures of the blocks mentioned and to react to them dynamically: the function of the hardware is modified according to the condition that has occurred, essentially by configuring a new structure.

An essential step in the method according to the invention is the partitioning of graphs into temporally independent subgraphs.

The term "temporal independence" is defined as follows: the data transferred between two sub-applications are decoupled by a memory of any design (including simple registers). This is possible in particular at those points of a graph where there is a clear interface with a limited, ideally minimal, number of signals between the two sub-applications.

In large graphs, temporal independence can be brought about by deliberately inserting clearly defined and as simple as possible interfaces for storing data in a buffer (cf. S_n in Fig. 4). Loops are fundamentally highly time-independent, because they operate for a long time on a certain set of (mostly loop-local) variables and require the operands or the result to be transferred only when the loop is entered and left.
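A minimal sketch of this decoupling, assuming a hypothetical `Register` class standing in for S_n: partition A writes its result into the buffer and is then replaced by partition B, which reads the value after the reconfiguration. Because the buffer is the only contact between the two, B does not need to exist while A runs. The arithmetic inside the partitions is an arbitrary placeholder.

```python
from typing import Optional

class Register:
    """Hypothetical model of a decoupling memory S_n (here a single register)."""
    def __init__(self) -> None:
        self.value: Optional[int] = None

    def write(self, v: int) -> None:   # S_nI side: accepts data
        self.value = v

    def read(self) -> int:             # S_nO side: outputs data
        if self.value is None:
            raise RuntimeError("S_n read before it was written")
        return self.value

def partition_a(x: int, s: Register) -> None:
    # First configuration (placeholder computation): store the interface value in S_n.
    s.write(x * x + 1)

def partition_b(s: Register) -> int:
    # Second configuration, loaded later: depends only on S_n, not on partition A.
    return s.read() * 3

s1 = Register()
partition_a(4, s1)       # configuration 1 runs to completion
# ... reconfiguration point (Δt): partition A is replaced by partition B ...
print(partition_b(s1))   # 51
```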

Temporal independence ensures that, once a sub-application has been executed completely, the subsequent sub-application can be loaded without any further dependencies or influences. When the data are stored in the memory mentioned, a signal (trigger) can be generated that requests the higher-level loading unit to load the next sub-application. When simple registers are used as the memory, the trigger can be generated whenever the register is written. When memories are used, in particular those that operate according to the FIFO principle, the generation of the trigger depends on several conditions.

For example, the following conditions can generate a trigger, individually or in combination:

- result memory full
- operand memory empty
- no new operands
- any condition generated within the sub-application, e.g. by a
  - comparator
  - counter.
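The trigger logic and the loading unit's reaction can be modelled in a few lines. This is a sketch under assumptions: the flag names (`result_full`, `operand_empty`, `no_new_operands`, `user_condition`) and the `LoadingUnit` class are hypothetical and only show how the individual conditions can be OR-combined into one trigger that requests the next sub-application.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MemoryStatus:
    """Hypothetical status flags of the decoupling memory of a sub-application."""
    result_full: bool = False      # result memory full
    operand_empty: bool = False    # operand memory empty
    no_new_operands: bool = False  # no further operands will arrive
    user_condition: bool = False   # e.g. set by a comparator or counter

def trigger(st: MemoryStatus) -> bool:
    # Any of the conditions, alone or in combination, raises the trigger.
    return st.result_full or st.operand_empty or st.no_new_operands or st.user_condition

@dataclass
class LoadingUnit:
    """Hypothetical higher-level loading unit holding the partitioned program."""
    sub_applications: List[str]
    current: int = 0
    log: List[str] = field(default_factory=list)

    def on_trigger(self) -> None:
        # Load the next sub-application, if any remain.
        if self.current + 1 < len(self.sub_applications):
            self.current += 1
            self.log.append(f"load {self.sub_applications[self.current]}")

lu = LoadingUnit(["A", "B", "C"])
for st in [MemoryStatus(), MemoryStatus(result_full=True), MemoryStatus(operand_empty=True)]:
    if trigger(st):
        lu.on_trigger()
print(lu.log)   # ['load B', 'load C']
```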

In the following, a sub-application is also called a module, to make the text easier to follow from the perspective of classic programming. For the same reason, signals are also called variables in the following. These variables differ from conventional variables in one essential respect: each variable is assigned a status signal (Ready) that indicates whether the variable holds a valid value. If a signal has a valid (calculated) value, the status signal is Ready; if the signal has no valid value (calculation not yet completed), the status signal is Not_Ready. The principle is described in detail in patent application PACT02.
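A variable with an attached Ready/Not_Ready status can be sketched as follows. The `Var` class and its methods are hypothetical names chosen for illustration; the point is that a consumer checks the status signal, not the value itself, so a valid value of 0 still counts as Ready.

```python
from typing import Optional

class Var:
    """Hypothetical variable model: a value paired with a Ready/Not_Ready status."""
    def __init__(self, name: str) -> None:
        self.name = name
        self._value: Optional[int] = None

    @property
    def ready(self) -> bool:
        return self._value is not None   # Ready vs. Not_Ready

    def set(self, v: int) -> None:
        self._value = v                  # calculation completed -> Ready

    def get(self) -> int:
        if self._value is None:
            raise RuntimeError(f"{self.name} is Not_Ready")
        return self._value

a, b = Var("a"), Var("b")
a.set(7)
print(a.ready, b.ready)   # True False
```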

The processor model

The graphs shown in the following figures always have a module as a graph node, it being assumed that several modules can be mapped onto one target block. That is, although all modules are temporally independent of one another, a reconfiguration is performed, or a data memory inserted, only after the modules marked with a vertical line and Δt. This point is called the reconfiguration point.

In summary, this means:

1. Large modules can be partitioned at suitable locations and broken down into small, temporally independent modules.
2. Small modules that can be mapped together onto one target block can dispense with temporal independence. This saves configuration steps and speeds up data processing.
3. The reconfiguration points are positioned according to the resources of the target blocks. This allows the graph length to be scaled arbitrarily.
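Point 3 can be illustrated with a greedy sketch that walks a linear chain of modules and inserts a reconfiguration point whenever the next module would exceed the cell capacity of the target block. The greedy strategy, the cell counts and the name `place_reconfiguration_points` are assumptions for illustration; the text does not prescribe a particular placement algorithm.

```python
from typing import List

def place_reconfiguration_points(cells_per_module: List[int], capacity: int) -> List[List[int]]:
    """Hypothetical greedy placement: group consecutive modules into configurations
    that fit on the target block; each group boundary is a reconfiguration point (Δt)."""
    groups: List[List[int]] = []
    current: List[int] = []
    used = 0
    for idx, cells in enumerate(cells_per_module):
        if cells > capacity:
            raise ValueError(f"module {idx} must first be partitioned further")
        if used + cells > capacity:      # does not fit: reconfiguration point here
            groups.append(current)
            current, used = [], 0
        current.append(idx)
        used += cells
    if current:
        groups.append(current)
    return groups

sizes = [30, 20, 40, 10, 50]
print(place_reconfiguration_points(sizes, 64))    # [[0, 1], [2, 3], [4]]
print(place_reconfiguration_points(sizes, 1000))  # [[0, 1, 2, 3, 4]]
```

A larger target block yields fewer configurations (point 2), while a smaller one yields more, shorter configurations (point 1).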

Fig. 4a shows some basic characteristics of the method according to the invention:
The modules of type A are grouped together and end with a conditional jump, either to B1 or to B2. A reconfiguration point is inserted at this position (0401), since it makes sense to treat each branch of the conditional jump as a separate group (case 1). If, on the other hand, both branches of B (B1 and B2) would fit on the target block in addition to A (case 2), it would make sense to insert only one reconfiguration point, at 0402, since this reduces the number of configurations and increases processing speed. Both branches (B1 and B2) jump to C at 0402.
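The choice between case 1 and case 2 reduces to a resource check. The sketch below assumes hypothetical cell counts for A, B1 and B2 and a single capacity figure for the target block; it considers cell counts only.

```python
def reconfiguration_points(cells_a: int, cells_b1: int, cells_b2: int, capacity: int) -> list[str]:
    """Hypothetical decision for Fig. 4a: where are reconfiguration points inserted?"""
    if cells_a + cells_b1 + cells_b2 <= capacity:
        # Case 2: A, B1 and B2 fit together; a single configuration up to 0402.
        return ["0402"]
    # Case 1: each branch of the conditional jump becomes its own group.
    return ["0401", "0402"]

print(reconfiguration_points(20, 15, 15, 64))   # ['0402']
print(reconfiguration_points(40, 20, 30, 64))   # ['0401', '0402']
```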

The configuration of the cells on the target block is shown schematically in Fig. 4b. The functions of the individual graph nodes are mapped onto the cells of the target block. Each row represents one configuration. The dashed arrows at a change of row indicate a reconfiguration. S_n is a data-storing cell of any design (register, memory, etc.). S_nI is a memory that accepts data, and S_nO is a memory that outputs data. For the same n, S_n denotes the same memory; I and O denote the direction of data transfer.

Both cases of the conditional jump (case 1, case 2) are shown.

The model in Fig. 4 corresponds to a data flow model, but with the essential extension of the reconfiguration point and the partitioning of the graph that it makes possible, the data transferred between the partitions being buffered.

In the model of Fig. 5a, a graph from a set of graphs B is selectively called from an arbitrary set and constellation of graphs (0501). After B has been executed, the data return to 0501.

If a sufficiently large sequencer (A) is implemented in 0501, the model can implement a principle very similar to that of typical processors. In this case:

1. data reach the sequencer A, which decodes them as instructions and reacts to them according to the von Neumann principle;
2. data reach the sequencer A that are treated as data and are forwarded to a permanently configured arithmetic unit C for calculation.

Graph B selectably provides special arithmetic units and/or special opcodes for particular functions and is alternatively used to accelerate C. For example, B1 can be an optimized algorithm for calculating matrix multiplications, while B2 represents an FIR filter and B3 a pattern recognition. The appropriate graph B is called according to an opcode decoded by 0501.
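The dispatch performed by the sequencer in 0501 can be sketched as a lookup table from decoded opcodes to graphs B, with all other data forwarded to the arithmetic unit C. The opcode names and the stand-in functions for B1 (matrix multiplication) and B2 (FIR filter) are hypothetical; on the actual hardware, "calling" a graph means configuring it onto the cells rather than invoking a function.

```python
from typing import Callable, Dict, List, Sequence

Matrix = List[List[float]]

def b1_matmul(a: Matrix, b: Matrix) -> Matrix:
    # Hypothetical stand-in for graph B1: matrix multiplication.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def b2_fir(samples: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    # Hypothetical stand-in for graph B2: direct-form FIR filter y[n] = sum_k h[k] x[n-k].
    return [sum(coeffs[k] * samples[n - k] for k in range(len(coeffs)) if n - k >= 0)
            for n in range(len(samples))]

def c_alu(x: float, y: float) -> float:
    # Hypothetical stand-in for the permanently configured arithmetic unit C.
    return x + y

GRAPHS: Dict[str, Callable] = {"MATMUL": b1_matmul, "FIR": b2_fir}

def sequencer(opcode: str, *operands):
    """Call the graph B selected by the decoded opcode; otherwise use C."""
    graph = GRAPHS.get(opcode)
    if graph is not None:
        return graph(*operands)      # configure B_i, run it, return to 0501
    return c_alu(*operands)          # ordinary data path through C

print(sequencer("MATMUL", [[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
print(sequencer("FIR", [1, 2, 3], [0.5, 0.5]))                   # [0.5, 1.5, 2.5]
print(sequencer("ADD", 2, 3))                                    # 5
```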

Fig. 5b shows schematically the mapping onto the individual cells; 0502 symbolizes the pipelined character of the arithmetic unit.

Whereas larger memories for buffering the data are preferably inserted at the reconfiguration points of Fig. 4, simple synchronization of the data at the reconfiguration points of Fig. 5 is sufficient: the data stream preferably runs through graph B as a whole, and graph B is not partitioned further, so buffering the data is unnecessary.

Fig. 6a shows various loops. Basically, loops can be handled in three ways:

1. Hardware approach: loops are completely unrolled onto the target hardware (0601a/b). As already explained, this is possible only for a few types of loops.
2. Data flow approach: loops are built across several cells within the data flow (0602a/b). The end of the loop is fed back to the beginning of the loop.
3. Sequencer approach: a sequencer with a minimal instruction set executes the loop (0603a/b). The cells of the target blocks are designed such that they contain the corresponding sequencer (cf. Figs. 11a/b).

Suitable decomposition of loops can optimize their execution where appropriate:

1. Using optimization methods according to the prior art, the loop body, i.e. the part to be repeated, can often be optimized by moving certain operations out of the loop and placing them before or after it (0604a/b). This significantly reduces the number of instructions to be sequenced. The moved operations are executed only once, before or after the loop.
2. Another optimization option is dividing loops into several smaller or shorter loops. The division is done such that several parallel or several sequential (0605a/b) loops are created.
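Both optimizations correspond to familiar compiler transformations, loop-invariant code motion and loop fission. The sketch below shows them on ordinary Python loops; the function names and the placeholder computations are hypothetical, and the example only checks that the transformed version computes the same result while doing less work inside the repeated body.

```python
import math
from typing import List, Tuple

def original(xs: List[float], gain: float) -> Tuple[List[float], List[float]]:
    ys, zs = [], []
    for x in xs:
        scale = math.sqrt(gain) * 2.0   # loop-invariant: recomputed every iteration
        ys.append(x * scale)
        zs.append(x + 1.0)
    return ys, zs

def hoisted_and_split(xs: List[float], gain: float) -> Tuple[List[float], List[float]]:
    scale = math.sqrt(gain) * 2.0       # 1. moved in front of the loop, executed once
    ys = [x * scale for x in xs]        # 2. first of two shorter loops ...
    zs = [x + 1.0 for x in xs]          # ... independent of the first, so the two can
    return ys, zs                       #    run one after another or in parallel

data = [1.0, 2.0, 3.0]
print(original(data, 4.0) == hoisted_and_split(data, 4.0))   # True
```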

Fig. 7 illustrates the implementation of a recursion. The same resources (0701), in the form of cells, are used for each recursion level (1-3). While the recursion is being built up (0711), the results of each recursion level (1-3) are written into a memory (0702) organized according to the stack principle. The stack is unwound as the levels are unwound (0712).
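The reuse of one set of cells for all recursion levels can be sketched with an explicit stack in place of a call stack. The factorial function is only a hypothetical example of a recursion; the loop bodies stand for the reused cells 0701 and the list `stack` for the memory 0702. The recursion depth is then bounded by the size of the stack memory rather than by the number of cells.

```python
from typing import List

def factorial_on_fixed_resources(n: int) -> int:
    """Hypothetical example: n! = n * (n-1)! executed on one reused 'cell' plus a stack."""
    stack: List[int] = []              # memory 0702, organized as a stack

    # Build-up phase (0711): each level pushes its intermediate value; same cells reused.
    level = n
    while level > 1:
        stack.append(level)
        level -= 1

    # Unwinding phase (0712): the stack is popped as the levels are unwound.
    result = 1
    while stack:
        result = stack.pop() * result  # the same multiplier cell serves every level
    return result

print(factorial_on_fixed_resources(5))   # 120
```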

High-level language examples

For example, a module can be declared as follows:

module marks the beginning of a module.
input/output defines the input/output variables with the types ty_n.
begin . . . end mark the body of the module.
register <regname1/2> passes the result to the output, the result being buffered in the register specified by <regname1/2>. <regname1/2> is a global reference to a specific register.

The following memory types, for example, are available as further modes of transfer to the output:
fifo <fifoname>: the data are passed to a memory operating according to the FIFO principle. fifoname is a global reference to a specific memory operating in FIFO mode. In this case, terminate@ is extended by the parameter or signal "fifofull", which indicates that the memory is full.
stack <stackname>: the data are passed to a memory operating according to the stack principle. stackname is a global reference to a specific memory operating in stack mode.
terminate@ distinguishes programming according to the method of the invention from conventional sequential programming. The instruction defines the termination criterion of the module. terminate@ does not evaluate the actual values of the result variables res1 and res2; instead, it checks only the validity of the variables (i.e. their status signals). If both variables are valid, the module terminates with the value 1. This means that a signal with the value 1 is forwarded to the higher-level loading unit, which then loads the subsequent module.
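The semantics of terminate@ can be modelled as a check on status signals only. In this sketch, `Result`, `terminate_at` and the callback passed in for the loading unit are hypothetical names; the sketch does not imitate the syntax of the high-level language, only the rule that validity, not value, decides termination.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Result:
    """Hypothetical result variable; None means Not_Ready."""
    value: Optional[int] = None

    @property
    def valid(self) -> bool:           # status signal: Ready / Not_Ready
        return self.value is not None

def terminate_at(results: List[Result], notify_loading_unit: Callable[[int], None]) -> bool:
    """Terminate when all result variables are valid; their values are not inspected."""
    if all(r.valid for r in results):
        notify_loading_unit(1)         # signal value 1 -> load the subsequent module
        return True
    return False

signals: List[int] = []
res1, res2 = Result(0), Result()       # res1 holds 0: still valid, the value is irrelevant
print(terminate_at([res1, res2], signals.append))   # False, res2 is Not_Ready
res2.value = 42
print(terminate_at([res1, res2], signals.append))   # True
print(signals)                                      # [1]
```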

In this example, register is defined via input data. <regname1> is the same as in example1. As a result, the register that receives the output data in example1 provides the input data for example2.
fifo defines a FIFO memory with a depth of 256 for the output data res1. The full flag (fifofull) of the FIFO memory is used in terminate@ as the termination criterion.
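The FIFO variant can be sketched the same way: a bounded queue of depth 256 whose full flag serves as the termination criterion, while a globally named register connects the output of example1 to the input of example2. The `Fifo` class, the dictionary standing in for the global register namespace and the data values are hypothetical; the listings of example1 and example2 themselves are not reproduced here.

```python
from collections import deque
from typing import Deque, Dict

class Fifo:
    """Hypothetical FIFO memory of fixed depth exposing a 'fifofull' flag."""
    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.data: Deque[int] = deque()

    @property
    def fifofull(self) -> bool:
        return len(self.data) >= self.depth

    def push(self, v: int) -> None:
        if self.fifofull:
            raise OverflowError("FIFO full")
        self.data.append(v)

# Hypothetical global register namespace: 'regname1' is the same entry in both modules.
registers: Dict[str, int] = {}

def example1() -> None:
    registers["regname1"] = 3            # output of example1 buffered in regname1

def example2(res1_fifo: Fifo) -> int:
    x = registers["regname1"]            # input of example2 read from regname1
    steps = 0
    while not res1_fifo.fifofull:        # terminate@ on fifofull
        res1_fifo.push(x * steps)        # placeholder computation of res1
        steps += 1
    return steps

example1()
out = Fifo(256)
print(example2(out), out.fifofull)       # 256 True
```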

define defines an interface for data (register, memory, etc.). The definition specifies the required resources and the name of the interface. Since the resources are specified unambiguously and can be used only once, the definition is global, i.e. the name is valid for the entire program.
call calls a module as a subroutine.
signal defines a signal as an output signal without using buffering.

terminate@(example2) terminates the module main as soon as the subroutine example2 terminates.

With the global declaration "define . . .", it is in principle no longer necessary to include the input/output signals defined in this way in the interface declaration of the modules. The correspondingly modified example modules would then look like this:

. . .

The status information of the processor model

To determine the states within a graph, the status registers of the individual cells (PAEs) are made available to all other arithmetic units via a status bus system (0802) that exists in addition to the data bus (0801) (Fig. 8b). This means that one cell (PAE X) can evaluate the status information of another cell (PAE Y) and process the data accordingly. To clarify the difference from existing parallel computer systems, Fig. 8a shows the prior art: a multiprocessor system whose processors are connected to one another via a common data bus (0803). There is no explicit bus system for the synchronous exchange of data and status.
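The effect of the separate status bus can be sketched as follows: each PAE publishes a status flag on a shared status bus, and another PAE reads that flag to decide how to process its own data. The `StatusBus` and `Pae` classes and the use of a "zero" flag are hypothetical; real status signals are hardware lines, not dictionary entries.

```python
from typing import Dict

class StatusBus:
    """Hypothetical model of status bus 0802, separate from the data bus 0801."""
    def __init__(self) -> None:
        self.flags: Dict[str, bool] = {}

    def publish(self, pae: str, flag: bool) -> None:
        self.flags[pae] = flag

    def read(self, pae: str) -> bool:
        return self.flags.get(pae, False)

class Pae:
    """Hypothetical PAE that can publish its own status and read others' status."""
    def __init__(self, name: str, bus: StatusBus) -> None:
        self.name, self.bus = name, bus

    def compare(self, a: int, b: int) -> int:
        diff = a - b
        self.bus.publish(self.name, diff == 0)   # e.g. a 'zero' status flag
        return diff

    def select(self, other: str, x: int, y: int) -> int:
        # Uses the status of another PAE, not its data, to choose the operand.
        return x if self.bus.read(other) else y

bus = StatusBus()
pae_y, pae_x = Pae("Y", bus), Pae("X", bus)
pae_y.compare(5, 5)
print(pae_x.select("Y", 100, 200))   # 100
pae_y.compare(5, 3)
print(pae_x.select("Y", 100, 200))   # 200
```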

Finally, it should be noted that, depending on the task, both the data flow graph and the control flow graph can be treated according to the method described.

Hardware extensions compared to PACT02 and PACT04

The prior art concerning the configuration properties of cells (PAEs) is defined by PACT02 and PACT04, published as DE 196 51 075 (PACT02) and DE 196 54 846 (PACT04).

Two properties are considered here:

1. According to PACT02, a PAE is assigned a set of configuration registers containing one configuration (Fig. 8a).
2. According to PACT04, a group of PAEs can access a memory for storing or reading data (Fig. 8b).

The task is:

a) to create a method that accelerates the reconfiguration of PAEs and decouples it in time from the higher-level loading unit, and
b) to design the method such that it becomes possible to sequence over several configurations.

Decoupling of the configuration register

The configuration register is decoupled from the higher-level loading unit (CT) (FIG. 9) by using a set of several configuration registers (0901). At any time, exactly one of these registers selectively determines the function of the PAE. The active register is selected by a multiplexer (0902). The CT can write to any configuration register that does not currently determine the configuration of the PAE. The register selected by 0902 can be determined by different sources:

1. An arbitrary status signal or a group of arbitrary status signals routed to 0902 via a bus system (0802) (FIG. 9a). The status signals are generated by any PAEs or supplied via external connections of the block (see FIG. 8).
2. The status signal of the PAE configured by 0901/0902 is used for selection (FIG. 9b).
3. A signal generated by the higher-level CT is used for selection (FIG. 9c).
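The register set and multiplexer can be modelled in a few lines of Python. This is a sketch under stated assumptions: configurations are plain strings, the selection source is fixed per PAE when it is built, and a selecting signal (or group of signals) is reduced to a register index. The enum, class and method names are hypothetical.

```python
from enum import Enum
from typing import List, Optional


class SelectSource(Enum):
    STATUS_BUS = "0802"  # status signals of arbitrary PAEs or external pins (FIG. 9a)
    OWN_STATUS = "own"   # status signal of this PAE itself (FIG. 9b)
    CT = "ct"            # signal generated by the higher-level CT (FIG. 9c)


class ConfigRegisterSet:
    """Sketch of 0901 + 0902: several configuration registers, exactly one active."""

    def __init__(self, size: int, source: SelectSource) -> None:
        self.registers: List[Optional[str]] = [None] * size
        self.active = 0
        self.source = source

    def ct_write(self, index: int, config: str) -> bool:
        """CT write access; refused for the register currently configuring the PAE."""
        if index == self.active:
            return False
        self.registers[index] = config
        return True

    def select(self, source: SelectSource, index: int) -> None:
        """Multiplexer 0902: only the configured selection source switches registers."""
        if source is self.source:
            self.active = index

    @property
    def function(self) -> Optional[str]:
        return self.registers[self.active]
```

Refusing writes to the active register is what lets the CT load the remaining registers at any time without disturbing the running configuration.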

Optionally, the incoming signals (0903, 0904, 0905) can be stored in a register for a certain period of time.
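One way to picture such a storing register is a latch that captures the incoming selection signal and holds it for a fixed number of clock cycles. The cycle-based hold period (`hold_cycles`) and the class name are assumptions made for this Python sketch; the patent only states that the signals can be stored for a certain period.

```python
from typing import Optional


class SelectLatch:
    """Hypothetical register holding an incoming select signal (0903/0904/0905)
    for `hold_cycles` clock cycles before accepting a new value."""

    def __init__(self, hold_cycles: int) -> None:
        if hold_cycles < 1:
            raise ValueError("hold_cycles must be at least 1")
        self.hold_cycles = hold_cycles
        self.value: Optional[int] = None
        self.remaining = 0

    def clock(self, incoming: int) -> Optional[int]:
        """Advance one clock cycle; return the value presented to the multiplexer."""
        if self.remaining == 0:
            # Hold period expired: capture the current input
            self.value = incoming
            self.remaining = self.hold_cycles
        self.remaining -= 1
        return self.value
```

With `hold_cycles=3`, the input sequence 1, 0, 0, 0, 1 yields 1, 1, 1, 0, 0: short glitches on the status line do not immediately switch the active configuration.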

Using several registers decouples the CT in time. This means that the CT can "preload" several configurations without any direct timing dependency.

Only if the selected register in 0901 has not yet been loaded does the configuration of the PAE wait until the CT has loaded that register. To determine whether a register holds valid information, a "valid bit" (0906) can be provided per register, which is set by the CT. If 0906 is not set for a selected register, a signal requests the CT to load the register as quickly as possible.
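The valid-bit handshake can be sketched in Python as follows. The request line back to the CT is modelled as a callback, and a return value of `None` stands for "the PAE keeps waiting"; both the callback and the class and method names are assumptions for illustration.

```python
from typing import Callable, List, Optional


class ValidatedRegisterSet:
    """Sketch of 0901 with one valid bit (0906) per configuration register."""

    def __init__(self, size: int, request_load: Callable[[int], None]) -> None:
        self.registers: List[Optional[str]] = [None] * size
        self.valid = [False] * size
        self.request_load = request_load  # hypothetical request signal to the CT

    def ct_load(self, index: int, config: str) -> None:
        """The CT (pre)loads a register and sets its valid bit."""
        self.registers[index] = config
        self.valid[index] = True

    def configure(self, index: int) -> Optional[str]:
        """Select a register. If its valid bit is clear, ask the CT to load it
        and return None: the PAE waits until the register becomes valid."""
        if not self.valid[index]:
            self.request_load(index)
            return None
        return self.registers[index]
```

Preloaded registers are used immediately; only a selection that hits an empty register costs a round trip to the CT.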

The method described in FIG. 9 can easily be extended to a sequencer (FIG. 10). For this purpose, a microcontroller (1001) drives the selection signals of the multiplexer (0902). Depending on the currently selected configuration (1002) and additional status information (1003/1004), the sequencer determines the next configuration to be selected. The status information can be:

a) the status signal of the PAE configured by 0901/0902 (FIG. 10a);
b) any status signal supplied via 0802 (FIG. 10b);
c) a combination of (a) and (b).
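A sequencer step of this kind can be sketched in Python as a transition table keyed by the current configuration and one status bit. The table format, the fall-through to the next register when no entry exists, and combining (a) and (b) with a logical AND are all assumptions; the patent leaves these details open.

```python
from typing import Dict, List, Tuple

# Hypothetical transition table of the microcontroller 1001:
# (currently selected configuration 1002, status bit) -> next configuration.
Transitions = Dict[Tuple[int, bool], int]


def combine_status(own_status: bool, bus_status: bool, mode: str) -> bool:
    """Status input 1003/1004: own PAE (a), status bus 0802 (b), or both (c)."""
    if mode == "own":
        return own_status
    if mode == "bus":
        return bus_status
    if mode == "both":
        return own_status and bus_status  # AND is one assumed way to combine them
    raise ValueError(f"unknown mode: {mode}")


def next_config(current: int, status: bool, table: Transitions) -> int:
    """One sequencer step; without a table entry, fall through to the next register."""
    return table.get((current, status), current + 1)


def run_sequencer(statuses: List[bool], table: Transitions, start: int = 0) -> List[int]:
    """Return the sequence of selected configurations for a series of status bits."""
    trace = [start]
    current = start
    for status in statuses:
        current = next_config(current, status, table)
        trace.append(current)
    return trace
```

For example, the table `{(2, True): 0, (2, False): 3}` loops over configurations 0 to 2 while the status bit is set at configuration 2 and continues to configuration 3 otherwise.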

For ease of understanding, 0901 can be viewed as a memory in which 1001 addresses an instruction via 0902. The addressing depends on the instruction itself and on a status register. In this respect, the structure corresponds to a "von Neumann" machine, with the differences that

a) the arrangement remains universally usable, i.e. the sequencer does not have to be used (cf. FIG. 9), and
b) the status signal does not have to be generated by the arithmetic unit (PAE) assigned to the sequencer, but can come from any other arithmetic unit (cf. FIG. 10b).

It is important that the sequencer can perform jumps, in particular conditional jumps, within 0901.
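Viewing 0901 as an instruction memory, a conditional jump can be sketched in Python as a field of each register entry that is evaluated against the PAE's status after the operation. The entry layout (`jump_if_zero`, `jump`, `halt`), the use of a single accumulator, and a "result is zero" status flag are hypothetical choices for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Entry:
    """One configuration register of 0901, viewed as an instruction word."""
    op: Callable[[int], int]            # function the PAE performs in this configuration
    jump_if_zero: Optional[int] = None  # hypothetical conditional-jump target
    jump: Optional[int] = None          # hypothetical unconditional-jump target
    halt: bool = False


def execute(program: List[Entry], acc: int, max_steps: int = 1000) -> int:
    """Run the register program; pc models the index the sequencer drives into 0902."""
    pc = 0
    for _ in range(max_steps):
        entry = program[pc]
        acc = entry.op(acc)
        zero = acc == 0  # status signal produced by the PAE
        if entry.halt:
            return acc
        if entry.jump_if_zero is not None and zero:
            pc = entry.jump_if_zero  # conditional jump taken
        elif entry.jump is not None:
            pc = entry.jump          # unconditional jump
        else:
            pc += 1                  # fall through to the next register
    raise RuntimeError("step limit reached")
```

A loop that decrements until zero and then switches to a final configuration would be `[Entry(lambda a: a - 1, jump_if_zero=2), Entry(lambda a: a, jump=0), Entry(lambda a: a + 100, halt=True)]`.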

Another additional or alternative method (FIG. 11) for building sequencers within the above-mentioned blocks uses the internal data memories (1101) to store the configuration information for a PAE. The data output of a memory is connected to a configuration input of a PAE (1102). The address (1103) for 1101 can be generated by the same PAE or by any other PAE.

In this method, the sequencer is not implemented in fixed hardware but is emulated by a PAE or a group of PAEs.
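A Python sketch of this memory-based sequencer, under several assumptions: configuration words are opcode strings stored in a list standing in for 1101, one PAE acts as a program counter that generates the address 1103 (incrementing, or branching back to 0 when the worker's result is negative), and a second "worker" PAE applies whatever configuration word arrives at its configuration input 1102. The opcode set, the branch rule and all names are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical configuration words stored in the internal data memory 1101.
OPERATIONS: Dict[str, Callable[[int, int], int]] = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "MUL": lambda a, b: a * b,
}


def address_generator_pae(address: int, status: bool, branch_target: int) -> int:
    """A PAE configured as a program counter (produces 1103): increments,
    or loads a branch target when its status input is set (assumed behaviour)."""
    return branch_target if status else address + 1


def run(memory: List[str], operands: List[int], start: int = 0) -> List[int]:
    """Each cycle, 1101 outputs the word at the current address onto the
    configuration input (1102) of the worker PAE, which processes one data word."""
    address = start
    acc = operands[0]
    trace: List[int] = []
    for value in operands[1:]:
        config = memory[address % len(memory)]  # data output of 1101 -> 1102
        acc = OPERATIONS[config](acc, value)
        trace.append(acc)
        # The sequencer is itself just a PAE: branch back to 0 on a negative result
        address = address_generator_pae(address, acc < 0, 0)
    return trace
```

With `memory = ["ADD", "MUL", "SUB"]` and operands 1, 2, 3, 20, 4 the worker produces 3, 9, -11, -7: the negative result after SUB sends the address generator back to the ADD configuration.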

Claims

1. Method for executing programs on a block with a one- or multi-dimensional cell structure, characterized in that data flow or control flow graphs are partitioned into temporally separate subgraphs and sequentially mapped and executed on the block.

2. Hardware with decoupled configuration register.