DE4416881C2

DE4416881C2 - Method for operating a data processing device

Info

Publication number: DE4416881C2
Application number: DE4416881A
Authority: DE
Inventors: Martin Vorbach; Robert Muench
Original assignee: PACT INF TECH GmbH
Current assignee: Krass Maren Zuerich Ch; Richter Thomas 04703 Bockelwitz De
Priority date: 1993-05-13
Filing date: 1994-05-13
Publication date: 1998-03-19
Anticipated expiration: 2014-05-14
Also published as: DE4416881A1

Description

The present invention relates to methods for operating a data processing device, i.e. a hardware unit for the logical and arithmetic manipulation (combination) of data (information) present in binary form.

Conventional microprocessors (e.g. Intel 80×86) are already known in the prior art. They are built from fixed, predefined units and process programs by decoding instructions (microcode) and carrying out the register manipulations thereby determined.

Also known are so-called FPGAs (e.g. from US Patent 4,870,302), which are used to build complex logic structures. On the basis of these devices, arithmetic units such as adders, multipliers, etc. can be configured within the device to carry out a particular function or task. Because a function is implemented directly in the corresponding logic elements, FPGAs can often execute functions faster than microprocessors. Although the known devices can already be reconfigured owing to their SRAM architecture, there is no mechanism for performing the reconfiguration quickly and dynamically at run time, in particular when only parts of the device are to be configured with a new function while other parts of the device continue their task. Because of the lack of interaction between a configuring unit and the FPGA itself, these devices are ruled out as a functionally full replacement for microprocessors.

The object underlying the present invention is to provide a method for operating a data processing device with a programmable and configurable cell structure that ensures a higher degree of parallelism and more flexible processing of data. A cell is defined here as a logic switching element similar to US 4,870,302 (L.E.) or as a specially designed arithmetic unit (ALU).

It is described how devices built from a multiplicity of two- or multi-dimensional cell structures are dynamically reconfigured, quickly and efficiently, after completion of a work step or partial work step, through interaction between the device and a configuring unit, without affecting work steps that are still running. According to this method, the need for a reconfiguration, or the end of a work step, can be detected automatically. On the basis of this method, a full replacement for microprocessors can therefore be created.

One advantage of the present invention is that the described method permits parallelism that is scalable over a wide range. It provides a basis for the fast and flexible construction of, for example, neural structures, which to date can only be simulated in software at considerable expense.

This object is achieved by the features or method steps specified in claim 1. To illustrate the method steps, an integrated circuit (chip) is shown by way of example, having a multiplicity of cells arranged in particular orthogonally to one another, each with a plurality of logically equal and structurally identically arranged cells, together with its internal bus structure, which is extremely homogeneous in order to ease programming. In principle it is conceivable to accommodate cells with different cell logic and cell structures within a data flow processor in order to increase performance, for example by providing different cells for memory control (bus interface) than for arithmetic operations (arithmetic/logic units, ALUs). For neural networks in particular, a certain degree of specialization can be advantageous. A loading logic is assigned to the cells. Via this loading logic the cells, individually and where appropriate grouped into so-called MACROs (a set of cells that together solve a defined task), can be programmed such that, on the one hand, arbitrary logic functions and, on the other hand, the interconnection of the cells with one another can be verified over a wide range. This is achieved by giving each individual cell a certain amount of memory in which the configuration data are stored. On the basis of these data, multiplexers or transistors in the cell are switched so as to provide the respective cell function (see Fig. 12).

In words other than those used in claim 1, the core of the present invention is to propose a method for a data flow processor (DFP) that has a cellular structure and whose cells can be reconfigured almost arbitrarily, in the arithmetic-logic sense, via a loading logic. It is absolutely essential here that the cells concerned can be reconfigured individually, without influencing the remaining cells and without shutting down the entire device. A data flow processor according to the present method can thus be "programmed"/used as an adder during a first working cycle and as a multiplier during a later working cycle, and the number of cells required for the addition and for the multiplication may well differ. The placement of MACROs that are already loaded and not affected by the reloading process is preserved; it is the task of the loading logic or of the compiler to partition the MACRO to be newly loaded among the free cells (i.e. to decompose the MACRO to be loaded so that it fits in optimally). Sequence control of the program is taken over by the loading logic, which loads the appropriate MACROs into the device according to the program section currently being executed; the loading process is co-controlled by the synchronization logic described later, which determines the time of reloading. A DFP according to the described method therefore does not correspond to the known von Neumann architecture, since data memory and program memory are separate. At the same time this means greater security, because faulty programs can destroy only DATA, not CODE.
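The reloading behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not the patented implementation: the class `CellArray`, its methods and the placement policy (take the first free cells in row-major order) are assumptions made for the example, since the text only requires that a new MACRO be partitioned among the free cells without disturbing MACROs that are already loaded.

```python
class CellArray:
    """Toy model of a cell array whose cells are individually addressable."""

    def __init__(self, rows, cols):
        # Map each cell address (row, col) to the MACRO occupying it, or None.
        self.owner = {(r, c): None for r in range(rows) for c in range(cols)}

    def free_cells(self):
        """Return the addresses of all currently unconfigured cells."""
        return [cell for cell, name in self.owner.items() if name is None]

    def load(self, name, cells_needed):
        """Place a MACRO into free cells only; loaded MACROs are never moved."""
        free = self.free_cells()
        if len(free) < cells_needed:
            raise RuntimeError("not enough free cells for " + name)
        chosen = free[:cells_needed]  # hypothetical policy: first free cells
        for cell in chosen:
            self.owner[cell] = name
        return chosen

    def unload(self, name):
        """Release only the cells of one MACRO; all other cells are untouched."""
        for cell, owner in self.owner.items():
            if owner == name:
                self.owner[cell] = None


array = CellArray(2, 4)
array.load("adder", 3)             # first working cycle: adder
io_cells = array.load("io", 2)     # an unrelated MACRO that keeps running
array.unload("adder")              # adder has finished its task
array.load("multiplier", 5)        # later working cycle: multiplier, more cells
```

In this model the multiplier ends up spread over non-adjacent free cells, while the cells of the `io` MACRO keep their placement throughout.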

To give the data flow processor a workable structure, some cells, including the input/output functions (I/O) and memory management functions (I/O), are loaded before the programs are loaded and usually remain constant for the entire run time.

This is necessary in order to adapt the data flow processor to its hardware environment. The remaining cells are combined into so-called MACROs and can be reconfigured at run time almost arbitrarily and without influencing neighboring cells or other MACROs. For this purpose the cells are individually and directly addressable.

To synchronize the restructuring (reloading/reconfiguration) of the cells or MACROs with the loading logic, a synchronization circuit can, where necessary, be accommodated as a MACRO on the data flow processor; it sends the corresponding signals to the loading logic. This is needed because reloading is permitted only once the MACROs have finished their previous activity. It may require a modification of the ordinary MACROs, since they must then provide state information to the synchronization circuit.

This state information usually signals to the synchronization logic that individual MACROs have completed their task, which from a programming point of view can mean, for example, that a procedure has terminated or that the termination condition of a loop has been reached. That is, the program continues at another point, and the MACROs sending the state information can be reloaded. It may also be of interest that the MACROs are reloaded in a particular order. For this purpose the individual items of state information can be weighted by a logic (for example a priority arbiter). Such a simple logic is shown in Fig. 19. The logic has seven input signals through which the seven MACROs deliver their state information. In this case 0 stands for "in progress" and 1 for "finished". The logic has three output signals, which are routed to the loading logic, the state 000 being the idle state. If a signal is present at one of the seven inputs, a decimal-to-binary conversion takes place; for example, Sync6 is represented as 110, which indicates to the loading logic that the MACRO serving Sync6 has finished its task. If several items of state information are present at the input simultaneously, the synchronization circuit passes the signal with the highest priority to the loading logic; if, for example, Sync0, Sync4 and Sync6 are present, the synchronization circuit first passes Sync6 to the loading logic. After the corresponding MACROs have been reloaded and Sync6 is therefore no longer present, Sync4 is passed on, and so on. To illustrate this principle, the standard TTL device 74148 can be considered, for example.
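The priority scheme just described can be illustrated with a small Python sketch. It is a behavioural model only, not the circuit of Fig. 19. The function names are hypothetical, and the sketch numbers the seven sync lines 1 to 7 so that the code 000 stays reserved for the idle state; how a line named Sync0 would be encoded is not specified in the text, so that choice is an assumption.

```python
def priority_encode(finished):
    """Return the 3-bit code of the highest-numbered finished MACRO.

    `finished` maps a sync line number (1..7, an assumed numbering) to
    0 ("in progress") or 1 ("finished"). "000" means idle.
    """
    active = [line for line, bit in finished.items() if bit == 1]
    if not active:
        return "000"
    # Highest line number wins; format as a 3-digit binary string.
    return format(max(active), "03b")


def reload_order(finished):
    """Simulate the loading logic serving sync signals one at a time."""
    state = dict(finished)
    order = []
    while True:
        code = priority_encode(state)
        if code == "000":        # idle: nothing left to reload
            return order
        line = int(code, 2)
        order.append(line)
        state[line] = 0          # MACRO reloaded, its sync signal disappears


print(priority_encode({6: 1}))               # 110
print(reload_order({1: 1, 4: 1, 6: 1}))      # [6, 4, 1]
```

The second call mirrors the example in the text: with several signals present, line 6 is served first, then line 4, and so on.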

Via the loading logic the data flow processor can be set up optimally, and if necessary dynamically, for the task to be solved. This brings, for example, the great advantage that new standards or the like can be implemented solely by reloading the data flow processor and do not, as before, require a replacement with the corresponding accumulation of electronic scrap.

The data flow processors can be cascaded with one another, which allows an almost arbitrary increase in the degree of parallelization, in computing power and, for neural networks, in network size. Particularly important here is a clear, homogeneous connection of the cells to the input/output pins (IO pins) of the data flow processors, so that as few restrictions as possible are imposed on the programs.

Fig. 4 shows, for example, the cascading of four DFPs. To their environment they appear as one large homogeneous device (Fig. 5). In principle, two cascading methods are conceivable:

a) Only the local connections between the cells are brought out, which in the present example means two IO pins per edge cell and four IO pins per corner cell. The compiler/programmer must, however, take into account that the global connections are not brought out, so the cascading is not completely homogeneous. (Global connections run between several cells, usually along a complete cell row or column, see Fig. 1; local connections exist only between two cells.) Fig. 6a shows the possible structure within a DFP; Fig. 7a shows the resulting cascading of several DFPs (three shown).
b) Both the local and the global connections are brought out, which drastically increases the number of drivers/IO pins and lines required, in our example (Fig. 6b) to six IO pins per edge cell and twelve IO pins per corner cell. This gives complete homogeneity in cascading (Fig. 7b).
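To make the cost difference between the two cascading methods concrete, the following Python sketch totals the IO pins for a rectangular cell array. The per-cell pin counts are the ones given in the example above; the function name `io_pin_count` and the assumption of a plain rectangular array with four corner cells and at least two rows and columns are illustrative and not taken from the patent.

```python
# IO pins per boundary cell, as stated for methods a) and b) in the text.
PINS_PER_CELL = {
    "a": {"edge": 2, "corner": 4},    # only local connections brought out
    "b": {"edge": 6, "corner": 12},   # local and global connections brought out
}


def io_pin_count(rows, cols, method):
    """Total IO pins of a rows x cols cell array (assumed rectangular)."""
    if rows < 2 or cols < 2:
        raise ValueError("sketch assumes at least 2 rows and 2 columns")
    corners = 4
    # Boundary cells that are not corners, counted along all four sides.
    edges = 2 * (rows - 2) + 2 * (cols - 2)
    pins = PINS_PER_CELL[method]
    return edges * pins["edge"] + corners * pins["corner"]


print(io_pin_count(4, 4, "a"))   # 32
print(io_pin_count(4, 4, "b"))   # 96
```

With these per-cell figures, method b) needs three times as many pins as method a) for any array size, which is the price of complete homogeneity.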

Since the global connections can become very long, particularly when cascading technique b) is used, the unpleasant effect can arise that the number of global connections is insufficient, since each connection can be used by only one signal. To minimize this effect, a driver can be inserted after a certain length of the global connections. First, the driver has the task of amplifying the signal, which is essential over long distances and with correspondingly high loads; second, the driver can go into tri-state and thus interrupt the signal. The sections to the left and right of the driver, or above and below it, can thereby be used by different signals as long as the driver is in tri-state; otherwise one signal is passed through. It is important here that the drivers of the individual global lines can also be controlled individually, i.e. one global signal can be interrupted while the neighboring signal is passed through. Thus different signals may be present section by section on one global connection, while the neighboring global connection is actually used globally by one and the same signal (compare Fig. 18).
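The effect of tri-state drivers on a single global line can be modelled as follows. This is a minimal sketch under stated assumptions: the function `line_sections`, the cell indexing and the convention that a driver at position `p` sits between cell `p` and cell `p + 1` are all hypothetical and serve only to illustrate the segmentation idea.

```python
def line_sections(num_cells, driver_positions, tristate):
    """Split one global line into electrically separate sections.

    num_cells        -- number of cells the global line spans (indices 0..n-1)
    driver_positions -- positions p of drivers, each between cell p and p + 1
    tristate         -- set of driver positions currently in tri-state

    Returns a list of (first_cell, last_cell) sections. Each section can
    carry its own signal; a driver that is not in tri-state simply passes
    the signal through and does not split the line.
    """
    sections = []
    start = 0
    for pos in sorted(driver_positions):
        if pos in tristate:
            sections.append((start, pos))
            start = pos + 1
    sections.append((start, num_cells - 1))
    return sections


# One driver in the middle of an 8-cell line.
print(line_sections(8, [3], tristate={3}))     # [(0, 3), (4, 7)]
print(line_sections(8, [3], tristate=set()))   # [(0, 7)]
```

Because each global line has its own drivers, calling the function with a different `tristate` set per line models the case where one line is split while its neighbor remains a single end-to-end connection.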

For better communication between the data flow processors and the loading logic, so-called shared memories can be used. For example, programs can be passed from a hard disk located in the IO area of a data flow processor to the loading logic: the data flow processors write the data from the disk into the shared memory, and the loading logic fetches them from there. This is particularly important because, as already mentioned, the architecture here is not a von Neumann but a Harvard architecture. Shared memories are likewise advantageous when constants defined in the program, which resides in the memory area of the loading logic, are to be combined with data residing in the memory area of the data flow processors.

Further developments of the invention defined and described above are the subject of the dependent claims.

A particular use of a data flow processor according to the described method is that, in combination with suitable input/output units on the one hand and a memory on the other, it can form the basis of a complete (complex) computer. A large part of the IO functions can be implemented as MACROs on the data flow processor, and at present only special devices (Ethernet drivers, VRAMs, ...) need to be added externally. In the event of a change in a standard or an improvement, only the MACRO then has to be adapted in software, as already indicated; no intervention in the hardware is necessary. It is advisable here to define an IO (input/output) connector via which the additional devices can then be connected.

Fig. 16 shows the greatly simplified structure of a computer customary today. By using a DFP device, considerable parts can be saved (Fig. 17), the corresponding conventional assemblies (CPU, memory management, SCSI, keyboard and video interfaces, and the parallel and serial interfaces) being stored as MACROs in the cascaded DFPs. Only those parts that cannot be reproduced by a DFP, such as memory and line drivers with non-TTL levels or for high loads, have to be added externally. Using the DFP allows inexpensive production, since one and the same device is used very frequently, and the layout of the circuit board is correspondingly simple owing to the homogeneous interconnection. In addition, the structure of the computer is determined by the loading logic, which here usually loads the DFP array only at the start of processing (after a reset); this provides a convenient way to correct errors and make extensions. Such a computer can in particular simulate several different computer structures simply by loading the structure of the computer to be simulated into the DFP array. It should be noted that here the DFP does not operate in its function as a DFP but merely provides a highly complex and freely programmable cell array, while differing from conventional devices in its particularly good cascadability.

A further field of application for such a device is the construction of large neural networks. Its particular advantage here lies in its high gate density, its excellent cascadability and its homogeneity. A learning process that involves changing individual axiomatic connections or individual cell functions is just as difficult to carry out on conventional devices as the construction of large, homogeneous and at the same time flexible cell structures. Dynamic reconfigurability makes the optimal simulation of learning processes possible for the first time.

The present invention is explained in more detail below with reference to the further figures. The figures show:

Fig. 1 an unprogrammed SUBMACRO X consisting of four cells (analogous to a 1-bit adder according to Fig. 12 or Fig. 13) with the required line connections;

Fig. 2 a partial section of an integrated circuit (chip) with a multiplicity of cells and a separated SUBMACRO X according to Fig. 1;

Fig. 3 an integrated circuit (chip) with an orthogonal structure of a virtually arbitrary number of cells and an externally assigned loading logic;

Fig. 4 the cascading of four DFPs, the connections between the I/O pins being shown only schematically (in fact, each drawn connection stands for a plurality of lines);

Fig. 5 the homogeneity achieved by the cascading;

Fig. 6a the structure of the I/O cells, where the global connections are not led out;

Fig. 6b the structure of the I/O cells, but with the global connections led out;

Fig. 7a the cascading resulting from Fig. 6a, showing a corner cell and the two driver cells of the cascaded components that communicate with it (compare Fig. 4);

Fig. 7b the cascading resulting from Fig. 6b, showing a corner cell and the two driver cells of the cascaded components that communicate with it (compare Fig. 4);

Fig. 8 an exemplary structure of a cell with multiplexers for selecting the respective logic elements;

Fig. 9 a circuit symbol for an 8-bit adder;

Fig. 10 a circuit symbol for an 8-bit adder according to Fig. 9 consisting of eight 1-bit adders;

Fig. 11 a logical structure of a 1-bit adder according to Fig. 10;

Fig. 12 a cell structure of the 1-bit adder according to Fig. 11;

Fig. 13 an 8-bit adder constructed in accordance with the cell structure of Fig. 9;

Fig. 14 a first exemplary embodiment of a plurality of integrated circuits (data flow processors) according to Fig. 3 coupled together to form an arithmetic unit;

Fig. 15 a second exemplary embodiment of a plurality of integrated circuits (data flow processors) according to Fig. 3 coupled together to form an arithmetic unit;

Fig. 16 the highly schematic structure of a conventional computer;

Fig. 17 the possible structure of the same computer using an array of cascaded DFPs;

Fig. 18 a section of a DFP with the (line) drivers drawn in;

Fig. 19 a synchronization logic implemented, for example, with a standard TTL component 74148;

Fig. 20a, b, c an exemplary embodiment of a MACRO for adding two series of numbers;

Fig. 21a a multiplication circuit (compare Fig. 20);

Fig. 21b the internal structure of the DFP after loading (compare Fig. 20b);

Fig. 21c the operation of the DFP on the memory, and the states of the counters 47, 49;

Fig. 22a, b, c a cascade circuit in which the adder from Fig. 20 and the multiplier from Fig. 21 are connected in series to increase computing performance.

Fig. 9 shows a circuit symbol of an 8-bit adder. The circuit symbol consists of a square block 1 with eight inputs A_0 ... A_7 for a first data word A and eight inputs B_0 ... B_7 for a second data word B (to be added). The eight inputs A_i, B_i (i = 0 ... 7) are supplemented by a further input Üein (carry in), via which a carry can be fed to block 1 if required. In accordance with its function and purpose, block 1 has eight outputs S_0 ... S_7 for the binary sum bits and a further output Üaus (carry out) for any carry that arises.

The circuit symbol of Fig. 9 is shown in Fig. 10 as an arrangement of so-called SUBMACROS. These SUBMACROS 2 each consist of a 1-bit adder 3 with one input for each of the corresponding bits of the two data words and a further input for a carry bit. The 1-bit adders 3 also have an output for the sum bit and an output for the carry Üaus.
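The chaining of eight 1-bit adders into the 8-bit adder of Fig. 9 can be sketched in software as follows. This is only an illustrative model: the function names `full_adder` and `add8` are hypothetical, and the 1-bit adder is written with XOR/AND/OR operators for readability rather than with the gate network of the patent.

```python
def full_adder(a, b, carry_in):
    """1-bit full adder: returns (sum_bit, carry_out) for bits a, b, carry_in."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def add8(a_word, b_word, carry_in=0):
    """Chain eight 1-bit adders; returns (8-bit sum, carry_out)."""
    result = 0
    carry = carry_in
    for i in range(8):
        a_i = (a_word >> i) & 1          # input A_i
        b_i = (b_word >> i) & 1          # input B_i
        s_i, carry = full_adder(a_i, b_i, carry)
        result |= s_i << i               # output S_i
    return result, carry


print(add8(0x3C, 0x0F))  # (75, 0), i.e. 0x4B with no carry out
```

The carry output of each stage feeds the carry input of the next, which is the role of Üein and Üaus in the description above.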

Fig. 11 shows the binary logic of a 1-bit adder, i.e. of a SUBMACRO 3 according to Fig. 10. Analogously to Fig. 10, this switching logic has one input each, A_i and B_i, for the corresponding bits of the data to be combined; an input Üein is also provided for the carry. These bits are combined, in accordance with the connections shown, in two OR gates 5 and three NAND gates 6, so that the results corresponding to a full adder (S_i, Üaus) are present at the output terminal S_i and at the carry output Üaus.

On the basis of logically and structurally identical cells 10, whose individual logic elements are interconnected according to the combinational function to be performed, the 8-bit adder 2 is implemented in the cell structure in a suitable manner. This is done by means of the loading logic, which is described later. According to the combinational logic for a 1-bit adder shown in Fig. 12, which is derived from the switching logic of Fig. 11, two cells 10.1, 10.2 are identical with respect to their logic elements in that one OR gate 5 and one NAND gate 6 are activated in each. A third cell 10.3 is used only as a line cell (conductor-track cell), and in the fourth cell 10.4 the third NAND gate 6 is activated. The SUBMACRO 2 consisting of the four cells 10.1 ... 10.4 thus represents a 1-bit adder; that is, a 1-bit adder of a data processing device according to the present invention can be realized by four appropriately programmed (configured) cells 10.1 ... 10.4. (For completeness, it should be noted that each cell contains a considerably more extensive network of logic elements, i.e. combinational gates and inverters, each of which can be activated according to the current command of the loading logic. In addition to the logic elements, a dense network of connecting lines is provided, both between adjacent elements and for building row-wise and column-wise bus structures for data transfer, so that virtually any logic structure can be implemented by appropriate programming by the loading logic.)

For completeness, Fig. 13 shows the cell structure of an 8-bit adder in its entirety. The structure shown in Fig. 13 corresponds to that of Fig. 10, with each of the 1-bit adders shown symbolically as SUBMACROS 3 in Fig. 10 replaced by a four-cell unit 10.1 ... 10.4. For a data flow processor according to the described method, this means that thirty-two cells, out of all the cells available on a circuit manufactured cellularly with a logically identical layout, are addressed and configured or programmed by the loading logic in such a way that these thirty-two cells form an 8-bit adder.
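The count of thirty-two cells follows from eight SUBMACROS of four cells each. A minimal sketch of how a loading logic might hold such a configuration is given below; the `CellConfig` record, the role names and the `(bit, cell)` keying are assumptions made for illustration and are not taken from the patent, which does not specify a data format for the configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellConfig:
    """Hypothetical record of which logic elements are active in one cell."""
    active_gates: tuple
    role: str


# One SUBMACRO (1-bit adder) as described for cells 10.1 ... 10.4.
ONE_BIT_ADDER_SUBMACRO = (
    CellConfig(("OR", "NAND"), "logic"),    # cell 10.1
    CellConfig(("OR", "NAND"), "logic"),    # cell 10.2
    CellConfig((), "routing"),              # cell 10.3, used only as a line cell
    CellConfig(("NAND",), "logic"),         # cell 10.4
)


def configure_adder(bits):
    """Return {(bit_index, cell_index): CellConfig} for a `bits`-wide adder."""
    return {
        (bit, cell): cfg
        for bit in range(bits)
        for cell, cfg in enumerate(ONE_BIT_ADDER_SUBMACRO)
    }


config = configure_adder(8)
print(len(config))  # 32 cells for the 8-bit adder
```

Counting the active gates in this table gives two OR gates and three NAND gates per bit, which matches the gate count stated for the 1-bit adder.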

In Fig. 13, a SUBMACRO "X" is set apart by a dash-dotted frame; it is ultimately to be regarded as a subunit consisting of four cells (10 according to Fig. 12) programmed to act as a 1-bit adder.

The SUBMACRO "X" separated in Fig. 13 is shown in Fig. 1 as part of an integrated circuit (chip) 20 together with its line and data connections. The SUBMACRO "X" consists of the four cells 10, each of which, in accordance with the orthogonal structure, has four data connections per side (i.e. a total of sixteen data connections per cell). The data connections link adjacent cells, which makes it apparent how, for example, a data unit is passed from cell to cell. The cells 10 are controlled on the one hand via so-called local controls, i.e. local lines that are connected to all cells, and on the other hand via so-called global lines, i.e. lines routed across the entire integrated circuit (chip) 20.

Fig. 2 shows an enlarged section of an integrated circuit 20 that is populated with an orthogonal grid of cells 10. As indicated in Fig. 2, a group of four cells 10 can, for example, be selected as SUBMACRO "X" and programmed or configured as the 1-bit adder of Fig. 12.

A complete integrated circuit according to the described method (DFP) 20 is shown by way of example in Fig. 3. This integrated circuit 20 consists of a multiplicity of cells 10 arranged in an orthogonal grid and has at its outer edges a corresponding number of line connections (pins) via which signals, in particular control signals and data, can be supplied and passed on. In Fig. 3, the SUBMACRO "X" according to Fig. 13/Fig. 1 is again delimited; in addition, further SUBMACROS are separated, which are grouped into subunits according to specific functions and interconnections. A loading logic 30 is assigned to, or placed above, the integrated circuit (chip) 20, and the integrated circuit 20 is programmed and configured through it. The loading logic 30 ultimately tells the integrated circuit 20 how it is to operate arithmetically and logically.

With reference to Fig. 14 and Fig. 15, a computer structure based on the integrated circuit 20 defined and explained above is described below.

According to the first exemplary embodiment shown in Fig. 14, a plurality of integrated circuits 20 are arranged in an orthogonal grid, analogously to the arrangement of the cells, with adjacent circuits coupled or networked with one another via local BUS lines 21. The computer structure, consisting for example of sixteen integrated circuits 20, has input/output lines IO through which the computer is connected to, i.e. communicates with, the outside world. The computer of Fig. 14 also has a memory 22 which, in the embodiment shown, consists of two separate memories, each composed of RAM, ROM and a dual-ported RAM connected as shared memory to the loading logic; these can be implemented either as read-write memories or as read-only memories. The loading logic 30 is assigned to, or placed above, the computer structure described so far; by means of it the integrated circuits (data flow processors) 20 are programmed, configured and networked.

The loading logic 30 is built, for example, on a transputer 31, i.e. a processor with a microcoded instruction set, to which a memory 32 is in turn assigned. The connection between the transputer 31 and the data flow processor is based on an interface 33 for the so-called loading data, i.e. the data that program and configure the data flow processor for a specific task, and an interface 34 for the computer memory 22 already mentioned, i.e. the shared memory.

The structure shown in Fig. 14 thus constitutes a complete computer that can be programmed and configured for each case or task via the loading logic 30. For completeness, it should be noted that, as indicated by arrows at the loading logic 30, several of these computers can be networked, i.e. coupled with one another.

A further exemplary embodiment of a computer structure is shown in Fig. 15. In contrast to Fig. 14, higher-level central BUS lines 23 are provided in addition to the local BUS lines between adjacent integrated circuits 20, for example in order to solve specific input and output problems. The memory 22 (shared memory) is also connected to the integrated circuits 20 via central BUS lines 23, specifically, as shown, to groups of these integrated circuits. The computer structure shown in Fig. 15 has the same loading logic 30 as explained with reference to Fig. 14.

An addition circuit built from data flow processors according to the invention is explained in connection with Fig. 20a. The starting point is two series of numbers A_n and B_n for all n between 0 and 9; the task is to form the sums C_i = A_i + B_i, where the index i takes the values 0 <= i <= 9.

Referring to Fig. 20a, the series A_n is stored in a first memory RAM1, for example starting at memory address 1000h; the series B_n is stored in a memory RAM2 at memory address 0dfa0h; the sums C_n are written to RAM1 at address 100ah.

A further counter 49 is connected, which merely counts up the individual clock cycles enabled by the control circuit. It serves later to illustrate that individual MACROs can be reconfigured without affecting the MACROs not involved in the reconfiguration.

Fig. 20a first shows the actual addition circuit 40, which consists of a first register 41 for receiving the series A_n and a second register 42 for receiving the series B_n. The two registers 41/42 are followed by an 8-bit adder corresponding to the MACRO 1 shown in Fig. 9. The output of MACRO 1 leads via a driver circuit 43 back to the memory RAM1. The clocking and timing of the addition circuit 40 are controlled by a state machine (STATEMACHINE) 45, which is driven by a clock generator T and connected to the registers 41, 42 and the driver circuit 43.

The addition circuit 40 is functionally supplemented by an address circuit 46 for generating the address data for the addition results to be stored. The address circuit 46 in turn consists of three MACROs 1 (according to Fig. 9) for forming the address data, these MACROs 1 being connected as follows: the addresses to be combined for A_n, B_n, C_n are each supplied via one input. These addresses are added to the output signals of a counter 47 and combined with the state machine 45 so that the new target address is present at the output. The counter 47 and the comparator 48 have the task of ensuring that the correct summands are combined in each case and that the operation terminates at the end of the series, i.e. at n = 9. When the addition is complete, a STOP signal is generated in the state machine 45 and the circuit is switched to passive. The STOP signal can likewise be used as an input signal for a synchronization circuit: from this signal the synchronization logic can recognize that the overall function "add" according to the ML1 program described below has finished and that the MACROs can therefore be replaced by new ones (for example, STOP could be the signal Sync5).
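The behaviour of the address circuit 46 with counter 47 and comparator 48 can be sketched as a generator that adds the counter value to each base address and stops after the last index. The base addresses are the ones given for Fig. 20a; the names `address_sequence`, `BASE_A`, `BASE_B`, `BASE_C` and `LAST_INDEX`, and the assumption that each element occupies exactly one address, are illustrative only.

```python
BASE_A = 0x1000    # series A_n in RAM1
BASE_B = 0x0DFA0   # series B_n in RAM2
BASE_C = 0x100A    # results C_n in RAM1
LAST_INDEX = 9     # comparator limit: stop after n = 9


def address_sequence():
    """Yield (addr_A, addr_B, addr_C) per element; leaving the loop models STOP."""
    counter = 0                      # plays the role of counter 47
    while counter <= LAST_INDEX:     # plays the role of comparator 48
        yield BASE_A + counter, BASE_B + counter, BASE_C + counter
        counter += 1


for addr_a, addr_b, addr_c in address_sequence():
    print(hex(addr_a), hex(addr_b), hex(addr_c))
```

Each of the three additions corresponds to one of the three MACROs 1 in the address circuit.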

The timing in the state machine 45 (STATEMACHINE) can be described as follows; note that a delay time T (in clock cycles) between address generation and receipt of the data is implemented in the state machine 45:

- In cycle 1, the counter 47 is incremented by 1 and the comparator 48 checks whether n > 9 has been reached; the addresses for A, B, C are calculated synchronously with these operations;
- In cycle (T + 1), the summands A, B are read out and added;
- In cycle (T + 2), the sum C is stored.

In other words, the operation loop together with the actual addition requires just (T + 2) clock cycles. In general, 2 to 3 clock cycles are required for T, so that a very substantial reduction in computing time is possible compared with conventional processors (CPUs), which generally need 50 to several hundred clock cycles.
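As a small worked example of this cycle count, the following sketch evaluates (T + 2) for the stated range of T and for a series of ten elements. The function names are hypothetical, and the total assumes that successive loop passes do not overlap, which the description does not state either way.

```python
def cycles_per_element(delay_t):
    """Clock cycles for one loop pass: address step, then T cycles, fetch/add, store."""
    return delay_t + 2


def total_cycles(elements, delay_t):
    """Total cycles for a series, assuming loop passes run strictly one after another."""
    return elements * cycles_per_element(delay_t)


for delay_t in (2, 3):
    print(delay_t, cycles_per_element(delay_t), total_cycles(10, delay_t))
```

For T = 2 this gives 4 cycles per element and 40 cycles for ten elements; for T = 3 it gives 5 and 50.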

The configuration shown with reference to Fig. 20 is explained again below using a hypothetical MACRO language ML1:

The number series A_n and B_n exist, with
n: 0 <= n <= 9
The sums C_i = A_i + B_i with i ∈ N are to be formed.

const n = 9;
array A[n] in RAM[1] at 1000h;
array B[n] in RAM[2] at 0dfa0h;
array C[n] in RAM[1] at 100ah;
for i = 0 to n with (A[i], B[i], C[i])
Δ1;
C = Δ1 = A + B;
next;

RAM[1] is the 1st memory block
RAM[2] is the 2nd memory block
at is followed by the base address of the array
for is the start of the loop
next is the end of the loop
with ( ) is followed by the variables whose addresses are determined by the counter variable i
Δ is followed by the delay time (T) for a state machine, in clock cycles

The timing of the state machine therefore looks as follows:

Cycle    Activity
1        Increment the counter, compare against > 9 (yes ⇒ abort), and calculate the addresses for A, B, C
T + 1    Fetch A and B and add them
T + 2    Store the result to C

That is, as already mentioned, the loop and the addition require just T + 2 clock cycles.
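To make the effect of the ML1 program on the two memory blocks concrete, the following sketch replays it in software. It is not an implementation of ML1: the function `run_ml1_add`, the dictionary model of the RAM blocks and the sample input values are assumptions, as is the choice to keep only the low eight bits of each sum because the adder MACRO is eight bits wide.

```python
def run_ml1_add(ram, n=9, base_a=0x1000, base_b=0x0DFA0, base_c=0x100A):
    """Software replay of the ML1 add loop: C[i] = A[i] + B[i] for i = 0 .. n."""
    for i in range(n + 1):
        a = ram[1][base_a + i]                 # A[i] from RAM[1]
        b = ram[2][base_b + i]                 # B[i] from RAM[2]
        ram[1][base_c + i] = (a + b) & 0xFF    # C[i] to RAM[1]; carry dropped (assumption)
    return ram


# Hypothetical sample data: A[i] = i, B[i] = 10 * i.
ram = {1: {}, 2: {}}
for i in range(10):
    ram[1][0x1000 + i] = i
    ram[2][0x0DFA0 + i] = 10 * i

run_ml1_add(ram)
print([ram[1][0x100A + i] for i in range(10)])
```

With ten elements per array, A occupies addresses 1000h to 1009h, so the results starting at 100ah follow it directly in RAM[1].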

Fig. 20b shows the rough layout of the individual functions (MACROs) in a DFP according to the invention. The MACROs are drawn in their approximate position and size and are labeled with the corresponding numbers explained with reference to Fig. 20a.

Fig. 20c shows in outline how the individual functions act on RAM blocks 1 and 2: the summands are read one after another in ascending order from RAM blocks 1 and 2, starting at addresses 1000h and 0dfa0h respectively, and the results are stored in RAM block 1 starting at address 100ah. The counters 47 and 49 are also shown; both count from 0 to 9 while the circuit runs.

After the program described above has finished, a new program is to be loaded that processes the results further. The reloading is to take place at runtime. The program is given below:

There are the series of numbers A_n and B_n, where A_n is given by the result C_n of the previously executed program:
n: 0 ≤ n ≤ 9
The products C_i = A_i × B_i with i ∈ N are to be formed.
const n = 9
array A[n] in RAM[1] at 100ah
array B[n] in RAM[2] at 0dfa0h
array C[n] in RAM[1] at 1015h
for i = 0 to n with (A[i], B[i], C[i])
Δ1;
C = Δ1 = A × B;
next;

The individual commands have already been described; × denotes multiplication.

The MACRO structure is shown in Fig. 21a. Fig. 21b shows, in the manner already described, the position and size of the individual MACROs on the chip; note in particular the size of multiplier 2 compared with adder 1 of Fig. 20b. Fig. 21c again shows the effect of the function on the memory. Counter 47 again counts from 0 to 9, i.e. it is reset when the MACROs are reloaded.

Counter 49 deserves particular attention. Assume that reloading the MACROs takes 10 clock cycles. Counter 49 then runs from 9 to 19, because the device is reloaded dynamically, i.e. only the parts being reloaded are stopped while the rest continues to operate. As a result, the counter runs from 19 to 29 during execution of the program. (This is intended to demonstrate dynamic, independent reloading; in any previously known device the counter would again run from 0 to 9, because it would be reset.)
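
The behaviour of the two counters can be traced with a small simulation. The Python sketch below is a hypothetical model, not the circuit of the figures: the class `Counter`, the function `simulate` and the convention that a counter is held at -1 after a reset (so that it shows 0 in its first active cycle) are assumptions made only for illustration. The reload time of 10 clock cycles and the 10 loop cycles per program are taken from the example in the text.

```python
class Counter:
    """Up-counter that shows 0 in its first active clock cycle after a reset."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Modelling convenience (assumption): the first tick yields 0.
        self.value = -1

    def tick(self):
        self.value += 1


def simulate(dynamic, program_cycles=10, reload_cycles=10):
    """Return (counter 47, counter 49) after program 1, the reload, and program 2."""
    c47, c49 = Counter(), Counter()
    history = []

    for _ in range(program_cycles):      # first program (addition)
        c47.tick()
        c49.tick()
    history.append((c47.value, c49.value))

    c47.reset()                          # counter 47 belongs to the reloaded MACROs
    if dynamic:
        for _ in range(reload_cycles):   # counter 49 keeps running during the reload
            c49.tick()
    else:
        c49.reset()                      # conventional device: everything is reset
    history.append((c47.value, c49.value))

    for _ in range(program_cycles):      # second program (multiplication)
        c47.tick()
        c49.tick()
    history.append((c47.value, c49.value))
    return history


dynamic_trace = simulate(dynamic=True)
static_trace = simulate(dynamic=False)
```

In the dynamic case counter 49 ends the three phases at 9, 19 and 29, whereas in the conventional case it ends both programs at 9, which mirrors the comparison made in the text.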

A closer look at the problem raises the question of why the two operations, addition and multiplication, are not carried out in one cycle, i.e. the operation:
There are the series of numbers A_n and B_n, where A_n is given by the result C_n of the previously executed program:
n: 0 ≤ n ≤ 9
The products C_i = (A_i + B_i) × B_i with i ∈ N are to be formed.
path D;
const n = 9;
array A[n] in RAM[1] at 1000h
array B[n] in RAM[2] at 0dfa0h
array C[n] in RAM[1] at 100ah
for i = 0 to n with (A[i], B[i], C[i])
Δ1;
D = Δ1 = A + B;
C = Δ1 = D × B;
next;

path D defines an internal double path that is not routed out of the DFP. Because of the additional Δ1, the operation needs one clock cycle more than before, but overall it is faster than executing the two programs above in succession, because the loop is traversed only once and no reloading takes place.
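
The speed argument can be checked with a rough cycle count. The Python sketch below is a back-of-the-envelope model under stated assumptions: the function names are hypothetical, a single-operation loop pass is assumed to cost 3 clock cycles, and the reload is assumed to cost 10 clock cycles as in the counter example. Only the relation "one cycle more per pass, but one loop and no reload" comes from the text.

```python
def two_pass(a, b):
    """Addition program followed by multiplication program on its result."""
    sums = [x + y for x, y in zip(a, b)]          # first program: C = A + B
    return [s * y for s, y in zip(sums, b)]       # second program: C = A x B


def fused(a, b):
    """Single program using the internal path D: C = (A + B) x B."""
    return [(x + y) * y for x, y in zip(a, b)]


def cycles_two_pass(iterations, cycles_per_pass, reload_cycles):
    # Two full loops plus the reload between them.
    return 2 * iterations * cycles_per_pass + reload_cycles


def cycles_fused(iterations, cycles_per_pass):
    # One loop; each pass takes one extra cycle for the additional delay.
    return iterations * (cycles_per_pass + 1)


ITERATIONS = 10        # i = 0 .. 9
CYCLES_PER_PASS = 3    # assumed cost of one single-operation loop pass
RELOAD_CYCLES = 10     # assumed reload time

a = list(range(10))    # example data (assumed)
b = [2] * 10
sequential_cost = cycles_two_pass(ITERATIONS, CYCLES_PER_PASS, RELOAD_CYCLES)
fused_cost = cycles_fused(ITERATIONS, CYCLES_PER_PASS)
```

Under these assumptions the two programs in succession cost 70 clock cycles and the combined program 40, while both produce the same values.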

In principle, the program could also be formulated as follows:
const n = 9;
array A[n] in RAM[1] at 1000h
array B[n] in RAM[2] at 0dfa0h
array C[n] in RAM[1] at 100ah
for i = 0 to n with (A[i], B[i], C[i])
Δ1;
C = Δ2 = (A + B) × B;
next;

If the gate delays of the adder and the multiplier together are less than one clock cycle, the operation (A + B) × B can also be carried out in a single clock cycle, which leads to a further considerable increase in speed:
const n = 9;
array A[n] in RAM[1] at 1000h
array B[n] in RAM[2] at 0dfa0h
array C[n] in RAM[1] at 100ah
for i = 0 to n with (A[i], B[i], C[i])
Δ1;
C = Δ1 = (A + B) × B;
next;
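
The choice between Δ1 and Δ2 for the combined operation can be expressed as a small calculation. The Python sketch below is an assumption-laden illustration: the function `delay_cycles`, the gate delays of 4 ns and 12 ns and the clock periods are all hypothetical values, and the generalisation to more than two clock cycles is not taken from the text. The text only states that a combined delay below one clock cycle allows Δ1.

```python
def delay_cycles(gate_delays_ns, clock_period_ns):
    """Number of clock cycles to reserve for a chain of combinational units.

    A chain whose total delay is strictly below k clock periods gets k cycles,
    so a total below one period yields 1 (the Δ1 case in the text).
    """
    total = sum(gate_delays_ns)
    return total // clock_period_ns + 1


ADDER_NS = 4         # hypothetical adder delay
MULTIPLIER_NS = 12   # hypothetical multiplier delay

slow_clock = delay_cycles([ADDER_NS, MULTIPLIER_NS], clock_period_ns=20)
fast_clock = delay_cycles([ADDER_NS, MULTIPLIER_NS], clock_period_ns=10)
```

With the 20 ns clock the 16 ns chain fits into one cycle, so Δ1 suffices; with the 10 ns clock two cycles are needed, which corresponds to the Δ2 formulation.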

A simple example of a cell structure will be explained with reference to Fig. 8. The cell 10 comprises, for example, an AND gate 51, an OR gate 52, an XOR gate 53, an inverter 54 and a register cell 55. On the input side, the cell 10 also has two multiplexers 56, 57, each with, for example, sixteen input connections IN1, IN2 (corresponding to the sixteen inputs of the cell in Fig. 1). These (16:1) multiplexers 56, 57 each select the data to be fed to the aforementioned logic gates AND, OR, XOR 51...53. On the output side, these logic gates are coupled to a (3:1) multiplexer 58, which in turn is coupled to the input of the inverter 54, an input of the register cell 55 and a further (3:16) multiplexer 59. The latter multiplexer 59 is additionally connected to the output of the inverter 54 and an output of the register cell 55, and delivers the output signal OUT.

For the sake of completeness, it should be noted that the register cell 55 is coupled to a reset input R and a clock input.

A loading logic 30 is superordinate to the cell structure explained above, i.e. to the cell 10; it is connected to the multiplexers 56, 57, 58 and 59 and drives them according to the desired functions.

If, for example, the signals A2 and B5 are to be ANDed, the multiplexers 56, 57 are switched active for the lines "TWO" and "FIVE" respectively; the operands then reach the AND gate 51 and, with the multiplexers 58, 59 activated accordingly, are delivered at the output OUT. If, for example, a NAND operation is to be performed, the multiplexer 58 switches to the inverter 54, and the negated AND result is then present at the output OUT.
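
The signal path through the cell can be followed in a small software model. The Python sketch below is a hypothetical illustration, not the circuit of Fig. 8: the class `Cell`, the selector names "DIRECT", "INVERTED" and "REGISTER", the zero-based numbering of the sixteen input lines, and the decision to pick the inverted signal at multiplexer 59 are assumptions. The four selector values stand for the settings that the loading logic 30 applies to multiplexers 56 to 59.

```python
class Cell:
    """Simplified model of cell 10 with single-bit signals."""

    def __init__(self, sel56, sel57, sel58, sel59):
        self.sel56 = sel56      # input line chosen by multiplexer 56
        self.sel57 = sel57      # input line chosen by multiplexer 57
        self.sel58 = sel58      # "AND", "OR" or "XOR" (multiplexer 58)
        self.sel59 = sel59      # "DIRECT", "INVERTED" or "REGISTER" (multiplexer 59)
        self.register = 0       # register cell 55, cleared as after a reset

    def step(self, in1, in2):
        """Evaluate one clock cycle and return the value at OUT."""
        x = in1[self.sel56]
        y = in2[self.sel57]
        gates = {"AND": x & y, "OR": x | y, "XOR": x ^ y}   # gates 51, 52, 53
        selected = gates[self.sel58]                        # multiplexer 58
        inverted = selected ^ 1                             # inverter 54
        stored = self.register                              # value latched last cycle
        self.register = selected                            # register cell 55 latches
        outputs = {"DIRECT": selected, "INVERTED": inverted, "REGISTER": stored}
        return outputs[self.sel59]                          # multiplexer 59


# Example: AND and NAND of line 2 of IN1 with line 5 of IN2.
in1 = [0] * 16
in2 = [0] * 16
in1[2] = 1
in2[5] = 1
and_out = Cell(2, 5, "AND", "DIRECT").step(in1, in2)
nand_out = Cell(2, 5, "AND", "INVERTED").step(in1, in2)
```

Changing only the selector values turns the same cell into an OR, XOR or registered variant, which is the sense in which the loading logic determines the function of the cell.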

Claims

1. A method for operating a data processing device with a programmable and configurable cell structure, the data processing device containing a cell matrix made up of a plurality of orthogonally arranged, homogeneously structured cells, which are freely programmable in their function and networking by a loading logic, characterized in that

a) a configuration program, consisting of a sequence of loading logic commands that each specify the function and networking of the individual cells, so that a partial sequence yields a specific configuration with specific application(s), is accessible to the loading logic (30) (Fig. 3, Fig. 8, Fig. 14, Fig. 15) and is executed by it,
b) initially, on the basis of a certain number of loading logic commands yielding a configuration (Fig. 20b) with specific application(s), this configuration is set in the cell matrix (Fig. 2, Fig. 3) by the loading logic (30) as the starting configuration,
c) a state machine (45) or a comparator (48) detects the end of the application of one or more cells,
d) the detection causes feedback to the loading logic (30), which continues execution of the configuration program with a further partial sequence (Fig. 21b, Fig. 22b) that reconfigures, i.e. sets anew, only those cells that have already finished their function for the current application or are not required, so that the processing of the data streams of the cells still involved in the current configuration, working in parallel (e.g. registers 41, 42) or pipelined (e.g. Fig. 22a, summer 141 and multiplier 149) and still executing their application, is not disturbed.

2. A method for operating a data processing device according to claim 1, characterized in that a prioritization logic (e.g. Fig. 19) ensures, via feedback to the loading logic, that the independent cells working in parallel or pipelined are reconfigured for a function or configuration in the optimally correct order.

3. A method for operating a data processing device according to one of claims 1 or 2, characterized in that the operands are obtained from any number of independently addressed and controlled memories (Fig. 20a, Fig. 20c; RAM1, RAM2), and the result is output to one of these memories (Fig. 20a, Fig. 20c; RAM1) or to an independent or separate data channel (Fig. 14, Fig. 15; IO).

4. A method for operating a data processing device according to one of claims 1 to 3, characterized in that, for better utilization of the networking of the cells, buses are used with the aid of interposed switches (Fig. 18; drivers), so that buses can be divided into independent sections.