WO1990004829A2 - Object-oriented, logic, and database programming tool - Google Patents

Object-oriented, logic, and database programming tool

Info

Publication number
WO1990004829A2
WO1990004829A2 (PCT/US1989/004687)
Authority
WO
WIPO (PCT)
Prior art keywords
objects
interpreter
database
context
block
Prior art date
Application number
PCT/US1989/004687
Other languages
French (fr)
Other versions
WO1990004829A3 (en)
Inventor
Fredric H. Mellender
Andrew G. Straw
Stephen E. Riegel
Original Assignee
Eastman Kodak Company
Priority date
Filing date
Publication date
Application filed by Eastman Kodak Company
Publication of WO1990004829A2
Publication of WO1990004829A3

Classifications

    • G06F 8/41: Compilation (arrangements for software engineering; transformation of program code)
    • G06F 11/3664: Environments for testing or debugging software
    • G06F 12/0253: Garbage collection, i.e. reclamation of unreferenced memory (free address space management)
    • G06F 16/289: Object oriented databases
    • G06F 8/24: Object-oriented software design
    • G06F 8/313: Logic programming, e.g. PROLOG programming language
    • Y10S 707/99952: Coherency, e.g. same view to multiple users (file or database maintenance)
    • Y10S 707/99953: Recoverability
    • Y10S 707/99956: File allocation
    • Y10S 707/99957: Garbage collection

Definitions

  • the invention relates to a programming tool that allows application programming in both logic and object-oriented style, and which provides integrated database support.
Background of the Invention
  • Object-oriented programming, logic programming, and database facilities have all been shown to have significant power in the writing of applications to run on a computer. No single programming tool has successfully integrated all three facilities in such a way as to eliminate an explicit interface between them. Normally, one must convert object data to logic data to use the logic programming system, and then convert the logic data back again in order to use the object-oriented system. Furthermore, one must normally make explicit calls to a database manager in order to retrieve and store application data.
  • Gemstone, a product of Servio-Logic, Inc., while supporting a database server that can be programmed in Smalltalk, does not allow the application to be written in Smalltalk in such a way that the database server is transparent: i.e. the application must make specific calls to the database server ('Integrating an Object Server with Other Worlds', by Alan Purdy et al., ACM Transactions on Office Information Systems, Vol. 5, Number 1, Jan. 1987). Gemstone does not contain any logic programming tools.
  • the present invention solves the problem by providing a single programming tool (referred to herein as Alltalk) which allows the programmer to write applications in an object-oriented language (a dialect of Smalltalk, also referred to herein as Alltalk), a logic programming language (an extension of Prolog, herein called ALF), or a combination of the object and logic programming languages; which allows the logic programming system to consider any object from the object-oriented programming language system as a term in the logic programming language; and which supplies database management on behalf of the programmer, without the need for any specific database management control statements to be supplied by the programmer.
  • the main components of the Alltalk tool include a work station having an operator interface, a mass memory, and a CPU.
  • An object-oriented programming language system running on the work station includes an object-oriented programming language and an object-oriented language compiler for translating source code written in the object-oriented programming language into objects and interpreter code.
  • Also running on the work station is a logic programming system including a logic programming language having components of terms, clauses, predicates, atoms, and logic variables, and a logic language compiler for translating source code written in the logic programming language into objects.
  • a database residing in the mass memory stores objects and components of logic programs as objects in a common data structure format, applications data, and applications stored as compiled interpreter code.
  • the database is managed by a database manager that represents objects and components of the logic programming language in the common data structure format as objects and is responsive to calls for retrieving and storing objects in the database and for automatically deleting objects from the database when they have become obsolete.
  • An interpreter executes the interpreter code and generates calls to the database manager.
  • a logic subsystem solves logic queries and treats objects as components of a logic program.
  • an improved database format for an object-oriented programming language system.
  • the database has a key file and a prime file.
  • the prime file contains records of variable length for storing objects
  • the key file contains records of fixed length for storing the address, record length, and type of object in the prime file.
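  • For illustration, a minimal C sketch of the record layout and lookup these two files imply; the field names and the read_object() helper are assumptions, not the patent's actual code:
        #include <stdio.h>

        /* One fixed-length record in the key file, indexed by object id (oop). */
        struct key_record {
            long  prime_offset;    /* address of the object's record in the prime file */
            long  record_length;   /* length of the variable-length prime record       */
            short object_type;     /* type of the object stored there                  */
        };

        /* Hypothetical lookup: read the key record for an oop, then read the
         * variable-length object record it points to from the prime file. */
        int read_object(FILE *keyf, FILE *primef, long oop, char *buf)
        {
            struct key_record k;
            fseek(keyf, oop * (long)sizeof k, SEEK_SET);
            if (fread(&k, sizeof k, 1, keyf) != 1)
                return -1;
            fseek(primef, k.prime_offset, SEEK_SET);
            return fread(buf, 1, (size_t)k.record_length, primef) == (size_t)k.record_length ? 0 : -1;
        }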
  • An improved database manager for managing this database includes an object manager employed by the compiler, interpreter, primitives and utilities for providing access to objects in the database, and for maintaining organization of objects in the database.
  • An access manager is called by a buffer manager, for retrieving objects from the database; by a transaction manager, for updating the database with new or changed objects at commit points and for undoing changes to objects upon aborts; and by the object manager, which provides the high-level interface to the database.
  • a buffer manager is called by the object manager for generating calls to the access manager, and by a pool manager for keeping an in-memory copy of objects. The pool manager maintains memory for buffers.
  • an improved garbage collector for a heap based programming language system.
  • the garbage collector employs the concept of regions for garbage collection.
  • a context representing the state of a method which is executing in the system is assigned a region number.
  • when an object is created or accessed by a method, it is assigned the region number of the context of the method that created or accessed it, unless the object was previously assigned a lower number.
  • when an object is returned by a method, the object is moved to the region of the calling method.
  • when a reference is made from a first object to a second object assigned to another region, the second object is moved to the region of the first object.
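  • The region rules above might be expressed in C roughly as follows (a simplified sketch; the object representation and function names are assumptions for illustration):
        /* Minimal object header carrying only a region number for this sketch. */
        struct gc_object {
            int region;             /* region number assigned from contexts */
            /* ... remaining object state ... */
        };

        /* When a method's context creates or accesses an object, the object takes
         * the context's region number unless it already has a lower one. */
        void note_access(struct gc_object *obj, int context_region)
        {
            if (context_region < obj->region)
                obj->region = context_region;
        }

        /* When a first object comes to reference a second object assigned to
         * another region, the second object is moved to the first's region. */
        void note_reference(struct gc_object *from, struct gc_object *to)
        {
            if (to->region != from->region)
                to->region = from->region;
        }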
  • the runtime performance of a Smalltalk programming language system is improved by implementing a technique called message flattening.
  • the compiler flags any method which consists of a single return statement returning either an instance variable, or the result of a primitive whose first argument is self and whose other arguments correspond to arguments of the method.
  • the interpreter detects these flags at runtime and flattens any message that would normally invoke these methods, by replacing this message send in the first instance with an assign, and in the second instance with a primitive invocation.
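  • A rough C sketch of the runtime side of this message flattening; the flag names, the method_desc structure, and the helper routines are assumptions used only for illustration:
        typedef long oop;                         /* object id; a 32-bit integer in the text */

        #define FLAT_INSTVAR_RETURN 1             /* method returns an instance variable */
        #define FLAT_PRIM_RETURN    2             /* method returns a primitive result   */

        /* Hypothetical compiled-method descriptor carrying the compiler's flags. */
        struct method_desc {
            int flat_flags;                       /* FLAT_* flags set by the compiler    */
            int flat_index;                       /* instance-variable index, first case */
            oop (*flat_prim)(oop receiver, oop *args);   /* primitive, second case       */
        };

        extern oop fetch_instvar(oop receiver, int index);       /* assumed helpers */
        extern oop do_send(oop receiver, oop selector, oop *args);

        /* Either flatten the send or fall back to a normal message send. */
        oop maybe_flatten(struct method_desc *m, oop receiver, oop selector, oop *args)
        {
            if (m->flat_flags & FLAT_INSTVAR_RETURN)
                return fetch_instvar(receiver, m->flat_index);   /* becomes an assign          */
            if (m->flat_flags & FLAT_PRIM_RETURN)
                return m->flat_prim(receiver, args);             /* becomes a primitive call   */
            return do_send(receiver, selector, args);            /* normal message send        */
        }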
  • Figure 1 is a schematic diagram showing an overview of the invention
  • Figure 2 is a schematic diagram of the compiler
  • Figure 3 is a schematic diagram of the runtime environment
  • Figure 4 is a schematic diagram showing initialized stacks for contexts and block stubs
  • Figure 5 is a schematic diagram showing the creation of a new context
  • Figure 6 is a schematic diagram showing creation of a block
  • Figure 7 is a schematic diagram showing creation of a second block
  • Figure 8 is a schematic diagram showing the creation of a new context
  • Figure 9 is a schematic diagram showing a block evaluation
  • Figures 10-13 illustrate the modes of block execution
  • Figure 14 shows the creation of a process
  • Figures 15-16 illustrate process management
  • Figures 17-18 show the relationships between the context stack, processes, regions, and objects
  • Figure 19 is a schematic block diagram illustrating the functions of the garbage collector
  • Figure 20 shows the in-use table's structure and internal relationships
  • Figure 21 shows the in-use table's relationships with the object table, the buffers, and the database
  • Figure 22 shows how garbage is collected upon a method return
  • Figure 23 is a schematic block diagram illustrating the functions of the ALF compiler.
  • the Alltalk tool runs on workstation type hardware, such as a Sun 4/360 by Sun Microsystems, Inc., executing the UNIX operating system (UNIX is a trademark of AT&T).
  • the hardware includes an operator interface including a visual display (CRT) 10, a keyboard 12, and a pointing device 14, such as a 3 button mouse.
  • the hardware also includes mass memory, such as a disk 16 on which the Alltalk database resides, as well as a CPU and main memory 18.
  • the Alltalk software which is executed by the CPU and main memory 18 consists of an Alltalk compiler 20 for a dialect of the Smalltalk language (also called Alltalk) and an Alltalk runtime environment 22.
  • the hardware components of the workstation are connected by a bus 24.
2 Overview
  • the Alltalk compiler 20 is a program for translating Alltalk language source statements into interpreter code.
  • the compiler is generated by the YACC and LEX utilities in the UNIX operating system, and contains subroutines written in the C programming language.
  • the compiler operates in 2 phases: the first phase 26 parses the source code written in the Alltalk language 28 and constructs an intermediate code 30.
  • the second phase 32 takes the intermediate code and generates class objects, constant objects, and method objects and places these in a database 40. These objects are subsequently retrieved by the runtime environment 22 (see Fig. 1).
  • the runtime environment 22 is written in the C programming language, and in Alltalk.
  • the logic language compiler 36 and the logic subsystem 38 are both written in Alltalk. These are compiled through the previously mentioned Alltalk compiler 20, and the output placed in the database 40 and hence available to the runtime environment 22.
  • Other applications 42 written in Alltalk are similarly available to the runtime environment after compilation.
  • Application programs 42 (called methods) are processed by an interpreter 44, which calls other components of the runtime environment, which includes: a transaction manager 46 which can commit and abort transactions, an object manager 48 which is called to create and retrieve objects, a method fetcher 50 which determines the correct method to execute next, and a garbage collector 52 which detects and removes unneeded objects from main memory.
  • the object manager 48 calls upon a buffer manager 54 to determine if a requested object is in memory or needs to be fetched from the database. If the object is to be retrieved from the database, a pool manager 56 is called to find space in an appropriate buffer, after which an access manager 58 is called. It is the access manager 58 that accesses the disk 16 containing the database 40.
3 Compiler
  • the Alltalk compiler 20 translates class descriptions written in a dialect of the Smalltalk language herein referred to as Alltalk into database objects for use by the Alltalk interpreter 44 during execution.
  • the Alltalk compiler 20 takes a file containing one or more complete Alltalk class descriptions, and for each class generates:
  • a class object containing a dictionary of the methods in the class and specification of the instance and class variables
  • the Alltalk compiler 20 consists of two phases.
  • the first phase 26 (see Fig. 2) does the compilation work, (parse, optimization, and code generation), while the second phase 32 resolves global symbols and loads the results into the database.
  • the two phases communicate via intermediate code 30 (written in an assembler-like intermediate language) which can be examined and altered by the user, if desired.
  • the first phase 26 of the Alltalk compiler 20 consists of two distinct processing stages:
  • the parsing phase is implemented in a fairly straightforward manner using the UNIX yacc/lex parser generator/lexical analyzer tools.
  • the primary goal of the parsing stage is to create an internal parse tree representation of the class description and its methods which can be analyzed using a relatively simple set of mutually recursive tree-walking routines.
  • the grammar of the input file is checked and errors are reported to the user.
  • the grammar specification of the object-oriented language is virtually identical to that specified in the syntax diagrams of the standard Smalltalk language reference, Smalltalk-80: The Language and its Implementation, by Goldberg and Robson.
  • the most notable variation in the Alltalk grammar is that of allowing a primitive invocation to be used as a primary expression and to have primary expressions as arguments, (this is adopted from Little Smalltalk, by Timothy Budd). This allows Alltalk primitives to be intermixed freely with the Alltalk language as if they were function calls which return a value, (which is essentially what the primitives really are), instead of as wholesale replacements for methods, as in standard Smalltalk.
  • Additional productions have been included to allow for reading an entire class description from a file, (in a form roughly similar to the Smalltalk "fileIn/fileOut" format). These additional productions include "header" information such as superclass specification, instance/class variable declarations, and instance/class method classification statements.
  • the basic parse node is a simple binary node, (left and right child pointers), with placeholders for the node type constant, a source code line number, and a string pointer.
  • Parse nodes are created via a function called makenode(), which allocates storage for the node, inserts the current source line number, and sets the other elements as specified by the user.
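  • A minimal C sketch of the parse node and makenode() as just described (exact field and variable names are assumptions):
        #include <stdlib.h>

        extern int cur_line;            /* current source line number, assumed global */

        struct pnode {
            int           ntype;        /* node type constant                          */
            int           line;         /* source code line number                     */
            char         *str;          /* string pointer (name or constant text)      */
            struct pnode *left, *right; /* left and right child pointers               */
        };

        /* Allocate a node, record the current source line, fill in the rest.
         * No error handling or freeing: the compiler runs once per file and the
         * node storage is never reclaimed (see below). */
        struct pnode *makenode(int ntype, char *str,
                               struct pnode *left, struct pnode *right)
        {
            struct pnode *n = malloc(sizeof *n);
            n->ntype = ntype;
            n->line  = cur_line;
            n->str   = str;
            n->left  = left;
            n->right = right;
            return n;
        }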
  • the storage allocated for these nodes, (as well as for the class and method structures and copies of strings), is not tracked in the Alltalk compiler 20 since the compiler is expected to be run only for the duration of the compilation of a file.
  • a sample parse tree for an Alltalk method is given in Table 3.1. Syntactical shorthand and default meanings, such as the return value of a method having no statements being "self", or a block having no statements being "nil", are fleshed out during the parse phase in order to limit the amount of special case logic in the analysis and generation phase.
  • a successful parse generates a parse tree for the statements of each method in the class. These parse trees are anchored in a method structure for each method, which are all, in turn, linked to a single class structure. When the parsing of a class is complete, the class structure is handed to the analysis and code generation routines.
3.2.2 Code Generation
  • the symbol table is initialized with the instance and class variables available via the superclass chain for the class. These symbols are retrieved from a symbol file in the local directory. It is considered a fatal error if the superclass cannot be found in the symbol file, (i.e., the superclass must be compiled first).
  • Each method is compiled. During method compilation, bytecodes are collected into segments corresponding to groups of statements in the method: one for the method itself, and one for each block within the method. When method compilation is complete, the method code segment is emitted first, followed by the segments for each block.
  • the method compilation step is the heart of the compilation task. Before describing this step in detail, a description of the compiler's view of symbolic references and the symbol table is given.
  • Block Stub/Closure reference to storage holding
  • the symbol table supports a subset of the named references, separating them into the three categories of: 1) class, 2) instance, and 3) temporary symbols.
  • Temporary symbols encompass the method parameter, formal method temporary, and block parameter references.
  • Global symbol references are never actually placed in the symbol table, but are materialized whenever the search for a name fails. These symbols are resolved by the second phase 32 of the Alltalk compiler 20, since the cross reference values for these names are actually present in the runtime system dictionary contained in the database 40.
  • the symbol table interface routines contain the usual routines for the addition of symbols, (addSymbol()), and name-based search for symbols, (findSymbol()).
  • An initialization routine, (initSymbols()), purges the table and then uses the globally specified superclass name to populate the table with "ref" structures for the instance and class variable symbols available via the superclass chain, as recorded in the symbol file in the local directory.
  • a routine for writing the instance and class variable symbols, (writeSymbols()), to the local symbol file for the globally specified class, (i.e., the one being compiled), is also provided.
  • a pair of general routines, (markSymbols() and releaseSymbols()), are available for get/set of placeholders in the symbol table. These are primarily used to record the starting position of method and/or block temporary symbols, so that they can be removed at the end of the compilation of the method and/or block statements.
  • Block stub/closure id storage for blocks in the method 5.
  • a general mechanism for tracking the use of the temporary slots is implemented in the compiler using a set of macro routines.
  • This set includes routines for: allocating a number of temporaries, (allocTemp()), which returns the starting slot for the requested count; freeing a number of temporaries, (freeTemp()); get/set of temporary usage information, (getTempUse() and setTempUse()); clearing usage information, (clearTempUse()); and requesting the high water mark for temporary usage, (maxTempUse()). Temporary usage is tracked with these routines for the first four kinds of temporaries listed above. Storage for block ids is tallied separately during method code generation since it is not known what the required number of compiler scratch temporaries will be until the method compilation has finished.
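  • The bookkeeping behind these routines might look like the following C sketch (the patent implements them as macro routines; this function form and the counter names are assumptions):
        static int temp_use;                   /* number of temporaries currently in use */
        static int temp_high;                  /* high-water mark for the current method */

        int allocTemp(int count)               /* returns the starting slot */
        {
            int start = temp_use;
            temp_use += count;
            if (temp_use > temp_high) temp_high = temp_use;
            return start;
        }
        void freeTemp(int count)  { temp_use -= count; }
        int  getTempUse(void)     { return temp_use; }
        void setTempUse(int n)    { temp_use = n; if (n > temp_high) temp_high = n; }
        void clearTempUse(void)   { temp_use = 0; temp_high = 0; }
        int  maxTempUse(void)     { return temp_high; }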
  • the symbol table is populated with the entries for "self”, “super”, the parameters, and the formal temporaries.
  • the slot index for each entry is determined by allocating a temporary as each symbol is added to the table.
  • a code segment is allocated for the method statements and made to be the "active" segment.
  • generated code is placed in the "active" code segment, which is switched when compilation of a new list of statements, (e.g., a block), is started or completed.
  • Label generation is reset, (used for branch targets and block entry points).
  • Method flattening is a technique for determining whether a runtime message send can be avoided because the method is "trivial".
  • a "trivial" method is one which contains a single statement returning either:
  • the method's statements are compiled by calling compileStatementList() with a pointer to the first statement parse node of the method. This routine invokes compileExpr() to compile the expression associated with each statement in the list.
  • compileStatementList() is used to compile lists of statements for blocks as well as methods, generating appropriate return bytecodes when explicit return statements are encountered and after the last statement in the method or block.
  • Expression compilation is the center of most activity during the compilation of a method.
  • compileExpr() defines the compilation actions for all parse nodes other than statements in a concise manner. This routine is invoked with the node to be compiled and a destination specification for the result of compiling the expression indicated by the node, in the form of a "ref" structure.
  • the destination specification allows the calling routine to control placement of the expression's value, which is particularly useful for aligning values for message sends, minimizing unnecessary data movement at runtime.
  • Simple expressions, such as identifiers and constants are trivial compilations requiring only assignment of the value associated with the identifier or constant description to the specified destination.
  • An explicit Alltalk assignment expression (e.g., "a <- b + c") only requires compilation of the expression on the right of the "<-", with the reference on the left as the destination, in addition to assigning this result into the specified destination for the assignment expression itself, (e.g., "d <- (a <- b + c)"), if indicated.
  • the remaining expression types (messages, cascades, primitive invocations and blocks) require somewhat more involved compilation steps; hence, these cases have been split into separate routines, (genSend(), genCascade(), genExecPrim(), and genBlock()). We now describe the compilation steps performed in each of these cases.
  • the runtime implementation of the send message bytecode requires that the receiver and arguments be present in a contiguous set of the sending context's temporaries.
  • the location for the return value of the message send is also required to be a temporary in the sending context, though it need not be adjacent to the receiver and arguments.
  • genSend() complies with the first condition by allocating a contiguous set of temporaries, (via allocTemp()), and compiling the receiver and argument expressions with each of these temporaries, (in order), as the specified destination. Hence, results of receiver and argument expressions are cleanly aligned with their use in containing message sends, eliminating unnecessary data re-positioning assignments. A simple optimization is also done at this point. If the receiver and argument temporaries already happen to line up, (detected by lineup()), new temporaries are not allocated and the receiver and argument values need not be moved.
  • the second condition (destination must be a temporary), is honored by examining the specified destination reference and allocating a temporary to hold the result of the message send if the destination is not already a temporary. This situation is remembered and code is generated for moving the result from the allocated temporary to the actual destination after the message send.
  • This implementation style allows for the addition of variations of the send message bytecode for non-temporary destinations, if the need arises.
  • Alltalk's primitive invocations require that arguments to the primitive be in a contiguous set of the invoking context's temporaries, and that the destination for the result be a temporary in the invoking context.
  • genExecPrim() handles the non-temporary destination and argument alignment cases, (using lineup()), in the same manner as is done for message sends.
  • Each primitive argument is compiled, allowing arbitrary expressions to be used as arguments.
  • Blocks are the most involved expression compilations in that they cause changes in the global state of the compiler.
  • a block is a list of statements which are to be executed with their own context when a "value" message is sent to it.
  • the lexical aspects of a block allow it to refer to names available to the method in which the block is defined, as well as to names in any containing block. These names include the method's parameters and formal temporaries and any containing block's parameters.
  • These semantics imply that a block is a static "object" of sorts which can potentially have multiple runtime activations, with each activation dynamically establishing variable name <-> storage bindings.
  • a block is a separate list of statements to be compiled and "set-up" as an object, which may also include cross-context runtime references to be represented.
  • genBlock() alters the global state of the compiler to create the proper compilation conditions to meet the needs described above.
  • the block being compiled is given a unique id within the method, and a new code segment is allocated and marked with this id and connected to the list of code segments generated for the method so far.
  • the currently active code segment is saved, along with its temporary usage, (since the block will have its own context), and the new segment is made the active segment (generated code is always placed in the active segment).
  • the previous code segment and its temporary usage are restored when compilation of the block is completed.
  • the symbol table is marked so that the block's symbols, (block parameter names), can be released at the end of the block's compilation, and the block's symbols are added to the symbol table for proper scoping. Finally, a label is generated to mark the start of the block's code in the method.
  • the information associated with the block stub id is used to establish a context for executing the statements of the associated block whenever the "value" message is sent to this id, (i.e., the evaluate block bytecode is executed for the id). Note that this requires that a block must be “set up” before it can be “evaluated” at runtime.
  • the solution to the placement problem is to group the set up block bytecodes for any "top level" block, (i.e., any block encountered while generating code for the method statements), and its contained blocks, and place them in the method code segment ahead of the first use of the "top level" block.
  • This technique avoids executing set ups for any block(s) which are not in the specific control flow path at runtime.
  • genBlock() implements this strategy by setting a pointer to a position in the method segment code at which the set up block bytecodes are to be "spliced" when a "top level" block is entered.
  • optWhile() handles optimization of the various "while" messages which can be sent to blocks, (whileTrue:, whileFalse:, whileTrue, and whileFalse). This routine demonstrates the need to deal with:
  • optBlock() determines the code generation strategy based on the type of the parse node representing the block in the parse tree. If the node represents a literal block in the source code, the statement list for the block is compiled into the active code segment using compileStatementList(). Otherwise, a "value" unary message expression, with the node representing the block as the receiver, is constructed and compiled under the explicit assumption that the receiver will be a block, (genEvalBlock()). Note that this assumption is not made, (and a different bytecode is generated), when the "value" message is encountered in the original source code, since the actual receiver may not be a block at runtime, in this case.
  • Literal blocks which are part of the "while" message may be "top level" blocks, (i.e., outermost block of a nesting within a method). Because of this, optWhile() must set the "splice point" in the method segment code for set up block bytecodes for any blocks contained in the "while" blocks, such that these bytecodes are placed outside the looping portion of the "while" code. This avoids the multiple "set up" problem for a block discussed in the previous section on block compilation.
  • the intermediate language expected as input for this phase consists of tokens representing bytecodes, along with directives for establishing the class, delimiting methods, and tracking the Alltalk source file name and line numbers.
  • the bytecode tokens and the directives are listed in Tables 3.4 and 3.5, respectively.
  • phase 2 translates both the source and destination effective address forms into one of six specific runtime reference types.
  • Phase 2 maintains a global state around the current class and method being "assembled", resulting in method-at-a-time assembly and placement into the database.
  • the class object is not given to the object manager until all methods described in the input file have been successfully translated and passed to the object manager. This insures that the old version of the class, (hence, its methods), is not replaced unless assembly of the new version is successful.
  • In contrast to phase 1, this phase is very "flat", that is, it contains no recursive functions to walk parse trees, since each input statement is essentially a self-contained description. All the implementing functions are dispatched directly from the parser on a per statement, (or group of statements), basis, resulting in a very simple control flow.
  • Assembly of a method essentially consists of collecting the bytecodes described by the bytecode statements into a scratch area, (MethodBytes), and recording labels, references to labels, and block references in these statements for resolution when the end of the method is reached.
  • Each bytecode statement has a corresponding translation routine, (assembles), which builds the runtime representation of the bytecode in the scratch area.
  • When the end of the method is reached, (endMethod()), all label and block references are resolved and the object manager is called upon to allocate space for the compiled method object. In this area, the instance variable slots for the method object are initialized, (noTemps, noParms, classOop, selectorSymbol, ... etc.), and the bytecodes are copied in from the scratch area.
  • a dictionary entry relating the method selector symbol id of the method to the id of the compiled method object is also created and added to entries already established for other methods, (in the Methods global array). These dictionary entries are stored in the class object when the end of the class is reached, (i.e., when a new '.class' directive or end-of-file is encountered).
  • When the end of the class is reached, (endClass()), space is obtained from the object manager under the same object id as the previous version of the class, to cause replacement of that class.
  • the class is then built in this area by filling in control information, including the object id of the first instance of the class obtained from the previous version, the object id of the class name symbol, the id of the superclass, and the size of the method dictionary for the class.
  • the method dictionary entries are then closed-hashed, (by method selector symbol id), into a dictionary area in the class object.
  • the class object is then flushed to the database, signaling completion of the assembly of the class, ending phase 2.
  • the interpreter 44 (see Fig. 3) is that portion of the Alltalk runtime environment 22 which the user invokes to run Alltalk applications.
  • the interpreter 44 decodes the object code generated by the compiler 20 (see Fig. 2), and executes it calling upon many of the other services of the runtime environment 22.
  • the interpreter 44 also includes a debugger, described below, which allows the programmer to inspect the running program in a variety of ways.
  • the previously described Alltalk compiler 20 for the Alltalk dialect of the Smalltalk language translates Alltalk source code into an intermediate representation, called bytecodes, and stores this representation in the database 40.
  • Each bytecode represents an instruction for the interpreter 44, and consists of an operation code (a 32 bit integer) and a variable number of parameters.
  • Applications are executed using the Alltalk interpreter 44.
  • the Alltalk interpreter 44 uses the object manager 48 as the interface to the database 40. It also calls on the transaction manager 46 and the garbage collector 52. In addition, it invokes primitives which interface to the UNIX operating system to do things like operating on primitive data types (integer addition, floating point multiplication, string concatenation, etc.), performing file I/O, managing the display, and controlling keyboard and mouse input.
  • the object manager 48, transaction manager 46, garbage collector 52, and primitives are described in later sections in more detail.
  • the state of the Alltalk interpreter 44 is captured, essentially, in a global array called Processes. Each element of this array represents one Smalltalk process.
  • initially, one process is created.
  • the user's application can create new processes, switch processes, and destroy processes as needed.
  • Associated with each process is a stack of contexts, and a pointer to one which is the currently-executing context of that process.
  • a context is created when a message is sent or a block is evaluated, and is destroyed when the corresponding message/block returns.
  • Associated with each context is a set of bytecodes for the corresponding method/block, and a pointer to one which is the currently-executing bytecode of that context.
  • the bytecodes are the object code to which the user's application was compiled.
  • Each context also has an array of temporaries which are used to hold intermediate results of the execution of the associated method/block.
  • when a process is the currently-active process, the current context of that process is, then, the current context, and the current bytecode of that context is the current bytecode.
  • the basic operation of the Alltalk interpreter 44 is a bytecode decode/dispatch loop. Code exists in the interpreter for handling each type of bytecode generated by the compiler. The interpreter decodes a bytecode to determine its type, then invokes the appropriate code for that bytecode type. We call the piece of code for a particular bytecode type a bytecode handler. Each bytecode handler increments the bytecode pointer so that after the handler completes, the interpreter main loop can decode and dispatch the next bytecode. Bytecode handlers can manipulate the bytecode pointer and other interpreter data structures in ways to affect program flow.
  • the routine exec_bcodes() contains the bytecode loop. It decodes the bytecodes and invokes the appropriate bytecode handler. Before doing so, however, it checks to see if it should switch processes, i.e., it checks whether a different Smalltalk process should become the currently-active process. See the section below on Process Management for details on how process switches are handled and new processes are created.
  • Each handler is one (or more) case(s) in a C-language switch statement.
  • the switch statement is part of exec_bcodes() in the file exec_bcodes.c.
  • Each case of the switch is in a separate file to make source code maintenance easier.
  • these files are included in exec_bcodes.c via #include's. This strategy was chosen over making the bytecode handlers separate procedures because it cuts down on call overhead in the bytecode loop. It also allows the use of machine registers for certain control variables, since the handlers are all within a single C language function. Note that thousands of bytecodes are executed each second; overhead for that many calls would be very large.
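  • The overall shape of this decode/dispatch loop might be sketched as follows (a simplified illustration rather than the actual exec_bcodes.c; the variable names and helper routines are assumptions, and only opcodes 0x00-0xFF for primitives and 0x105 for the unconditional branch are taken from the text):
        typedef long bytecode;                    /* 32-bit operation code plus parameters */
        extern bytecode *bcode_ptr;               /* current bytecode of the current context */
        extern void check_process_switch(void);   /* possibly switch Smalltalk processes     */
        extern void execute_primitive(int op);    /* advances bcode_ptr past its parameters  */

        void exec_bcodes(void)
        {
            for (;;) {
                check_process_switch();
                int op = (int)*bcode_ptr;         /* decode the operation code */
                if (op <= 0xFF) {                 /* primitives: no context is set up */
                    execute_primitive(op);
                    continue;
                }
                switch (op) {
                /* In the real interpreter, each case below lives in its own source
                 * file and is pulled into this single C function with #include, so
                 * the handlers share its register variables and avoid per-bytecode
                 * call overhead.  Each handler adjusts bcode_ptr itself. */
                case 0x105:                       /* unconditional branch, for example */
                    bcode_ptr += bcode_ptr[1];
                    break;
                /* ... cases for message sends, block set up/evaluation, returns,
                 *     and the assignment bytecodes ... */
                default:
                    return;                       /* unknown bytecode: error handling omitted */
                }
            }
        }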
  • the bytecodes can be grouped into the following categories:
  • Primitives are called by Alltalk code to do the low-level tasks. These tasks generally depend on the underlying hardware and operating system, and include things like file I/O, integer and floating point arithmetic, and using the display. The bytecodes numbered from 0 to 255 (decimal), i.e., 00 to FF hex, are reserved for primitives. Primitives are similar to methods in that they have a receiver, they have optional arguments, and they return an object. They are unlike methods in that they are written in C rather than Alltalk, and no context is set up for them.
4.3.2 Send a message
  • the compiler generates several different types of bytecodes for messages.
  • the normal message send is handled by send_msg_bcode.
  • Messages of the type 'perform:' and 'perform:with:' are handled by send_param_msg_bcode.
  • For send_msg_bcode, the message selector is known at compile time, and is included in the bytecode itself; for send_param_msg_bcode, the oop of the message selector is found, at run time, in a temporary of the current context.
  • send_msg and send_param_msg
  • blocks are not objects managed by the object manager, but rather are maintained by the interpreter as C data structures.
  • When blocks are assigned to instance variables or returned as the result of a message, they are made into objects, as is the home context (this is discussed in more detail below). They can, however, be assigned to method temporaries and passed as parameters in messages without being made into objects first.
  • When the interpreter encounters a setup_blk bytecode, it creates a data structure called a block stub, and gives it an object id (a 32 bit integer which we call an oop).
  • the oop is in a special range, i.e., greater than or equal to INIT_CNTX_ID, so it can be recognized later as a block by the interpreter.
  • the block stub contains enough information to evaluate the block when an eval_blk bytecode is later encountered. Its oop is stored back in the temporaries of the home method in which it is defined. It can then be handled like any other oop stored in temporaries (except for the cases mentioned above).
  • Evaluating a block means executing the code that the block contains. Note that a block must be 'set up' before it can be evaluated. However, a block which is set up may or may not be evaluated. For example, the ifTrue: block and ifFalse: block of an ifTrue:ifFalse: message won't both be evaluated. A block may be evaluated immediately after it gets set up, or later. It may be evaluated by the context in which it was set up, or the context which sets it up may pass it as a parameter in a message send, so that it gets evaluated by another context.
  • the eval_blk bytecode handler causes a block to be evaluated by converting the block stub for that block into an active context on the context stack of the active process. It makes that new context be the current context, and makes the global bytecode pointer point to the block's first bytecode.
4.3.5 Return from a block or method
  • When the Alltalk interpreter 44 encounters a return bytecode, it means that the currently executing context is finished, and it switches control to a previous context. In addition, it passes back an object (actually the object's oop) to the context to which it is returning.
  • a short return causes the interpreter to return to the context just previous to the current context regardless of what it is.
  • a short return and a long return from a method context are the same. (The Alltalk compiler 20 always generates a long return for returns from a method context.)
  • a short return from a block means to simply return to the previous context in the stack. This previous context is the context which caused the block to be evaluated; it may or may not be the block's home context.
  • Branch bytecodes are used to implement control structures. In addition, branching bytecodes are used by the compiler as part of several optimizations.
  • the unconditional branch bytecode (0x105) simply increments/decrements the bytecode pointer by a certain amount.
  • the conditional branch bytecodes compare an oop found in a temporary of the current context with an oop contained in the bytecode itself. Whether or not the bytecode pointer is incremented depends on the results of comparing these two oops.
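  • A hedged C sketch of what these branch handlers might look like (the layout of the parameters within each bytecode is an assumption; only the 0x105 opcode for the unconditional branch is taken from the text):
        typedef long oop;
        extern long *bcode_ptr;                    /* current bytecode pointer, as in the loop sketch */
        extern oop  *cur_temps;                    /* temporaries of the current context              */

        /* Assumed layouts for this sketch:
         *   unconditional branch: [0x105][signed displacement]
         *   conditional branch:   [opcode][temp index][comparison oop][signed displacement] */
        void handle_uncond_branch(void)
        {
            bcode_ptr += bcode_ptr[1];             /* move forward or backward */
        }

        void handle_cond_branch(void)
        {
            oop actual   = cur_temps[bcode_ptr[1]];   /* oop found in a temporary           */
            oop expected = (oop)bcode_ptr[2];         /* oop carried in the bytecode itself */
            if (actual == expected)
                bcode_ptr += bcode_ptr[3];            /* branch taken                       */
            else
                bcode_ptr += 4;                       /* fall through to the next bytecode  */
        }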
  • the Alltalk compiler 20 and the Alltalk interpreter 44 understand six different types of variables. These six types are as follows:
  • This type of variable is simply an oop that the Alltalk compiler 20 generates, and includes as part of the assignment bytecode. Obviously, it cannot be the destination of an assignment statement, only the source.
  • Type 1 variables are string, character, integer, and floating point constants, and class names.
  • This type of variable is a temporary in the home context of the current context.
  • the Alltalk compiler 20 specifies it as an index into the array of temporaries.
  • This type of variable is an instance variable of the object which is 'self in the current context.
  • the Alltalk compiler 20 specifies it as an index into the instance variables of the receiver.
  • This type of variable is an indirect reference to a particular instance variable of a particular object.
  • the Alltalk compiler 20 specifies the instance variable by specifying an index into the temporaries of the current context (which specifies the object), plus an index into the instance variables of that object (which specifies the particular instance variable).
  • This type of variable is a temporary in the current context.
  • the Alltalk compiler 20 specifies it as an index into the array of temporaries. Note the difference between this and the type 2 variable.
  • For a method, type 2 and type 5 are the same because a method's home context is itself; for a block, type 2 refers to its home context's temporaries, and type 5 refers to its own temporaries.
  • This type of variable is needed for nested blocks in which an inner block refers to an argument of an outer block.
  • the Alltalk compiler 20 specifies the argument by giving two parameters in the bytecode. First is an index into the temporaries of the home context. In that particular temporary is found the id of the block stub of the outer block. The second parameter is an index into the temporaries of the outer block. In that particular temporary is found the oop of interest. Since Smalltalk does not allow assignment to the arguments of a block, a type 6 variable cannot be the destination of an assignment statement, only the source.
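  • As an illustration, the six variable types above might be resolved to a source oop roughly as follows (the data structures and helper names in this C sketch are assumptions, not the interpreter's actual code):
        typedef long oop;

        struct context {
            oop             temps[64];    /* temporaries of this context              */
            struct context *home_cntx;    /* home context (itself for method contexts) */
            oop             receiver;     /* 'self' for the executing method          */
        };

        extern oop instvar_of(oop object, int index);          /* assumed helpers */
        extern struct context *active_cntx_of_stub(oop stub_id);

        /* p1/p2 are the parameters carried in the bytecode for the operand. */
        oop resolve_source(int type, long p1, long p2, struct context *cur)
        {
            switch (type) {
            case 1: return (oop)p1;                              /* constant oop in the bytecode */
            case 2: return cur->home_cntx->temps[p1];            /* home-context temporary       */
            case 3: return instvar_of(cur->receiver, (int)p1);   /* instance variable of self    */
            case 4: return instvar_of(cur->temps[p1], (int)p2);  /* instvar of object in a temp  */
            case 5: return cur->temps[p1];                       /* current-context temporary    */
            case 6: {                                            /* outer block's temporary      */
                oop stub_id = cur->home_cntx->temps[p1];         /* id of the outer block stub   */
                return active_cntx_of_stub(stub_id)->temps[p2];
            }
            }
            return 0;
        }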
  • Each assignment bytecode has a source variable type and a destination variable type. The destination is specified first, then the source. Because type 1 and type 6 variables cannot be destinations, there are 24 assignment bytecodes (4 destination types * 6 source types). The assignment bytecode handlers simply put the oop specified by the source into the location specified by the destination.
4.4 Context Management
  • the state of the Alltalk interpreter 44 is contained in the global array, Processes. Each element in that array represents a process. In addition to the interpreter's C-language data structure for a process, there is also an instance of Smalltalk Class Process for each Smalltalk process in an application. In the following, we concentrate on the interpreter's data structure for processes, and ignore the Smalltalk object. Each process has associated with it a set of contexts. In the following, we explain how contexts are implemented for one process, but one should remember that there is one set of contexts for each process.
  • In order to improve performance, the Alltalk interpreter 44 does not treat contexts as objects. Instead, they are maintained by the interpreter as C data structures. (As mentioned above, however, the home context may be turned into an object if an owned block is turned into an object).
  • the Alltalk interpreter 44 manages contexts in two pieces.
  • One piece contains what are called active contexts. These are contexts associated with methods which have not yet returned and blocks which are executing and have not yet returned. This piece operates like a stack: when a message is sent or a block starts execution, the Alltalk interpreter 44 pushes another context on the stack; when a method or block returns, the Alltalk interpreter 44 pops one context (or more, in the case of a long return from a block) off the stack.
  • the second piece contains what are called block stubs.
  • a block stub is established as the result of a setup_blk bytecode (see setup_blk_bcode).
  • object id's (oops) are given to such blocks.
  • the block stubs represent these pseudo-objects. They hold just enough information so that when a block is evaluated (a value message is sent to it), the Alltalk interpreter 44 can create an active context for it. Note that a block stub exists as long as its home context exists; it does not go away just because its associated active context returns. In fact, in the case of loops in Smalltalk code, the same block stub might be evaluated many times, having an active context created from it and destroyed each time.
  • Because block stubs are stored as a separate piece, the active contexts can be allowed to obey a stack discipline. This simplifies context management and improves performance.
  • Contexts are of fixed size, and have 64 temporaries each. (Smalltalk defines 64 as the maximum number of temporaries a context may have.) This allows the Alltalk interpreter 44 to allocate space for them and doubly link them at interpreter initialization time, rather than on the fly. They are allocated as an array, and have one array/stack of contexts per Smalltalk process.
  • the routine init_cntx() initializes one context and it is called by init_cntx_stack() which initializes and links all contexts for a given process when the process gets created.
  • Block stubs are of fixed size. This allows space to be allocated for them and allows them to be linked at interpreter initialization time, rather than on the fly. They are allocated as an array, and there is one array/stack of block stubs per Smalltalk process.
  • the routine init_blk_stub_stack() initializes and links all block stubs for a given process when the process gets created.
  • the Alltalk interpreter 44 maintains a pointer to the current active context, cur_cntx, and a pointer to the next available (unused) block stub, next_blk_stub, for each process.
  • Each context has a prev pointer which links it to the previous context in the array/stack, and a next pointer which links it to the next context in the array/stack. These pointers are used rather than the array index to move between contexts.
  • the Alltalk interpreter 44 follows the next pointer of the current context when it needs to add a new context. This happens when a message is sent (see send_msg_bcode), or a block is evaluated (see eval_blk_bcode).
  • the Alltalk interpreter 44 follows the prev pointer of the home context of the current context to find the context to which it should return when it does a long return; it follows the prev pointer of the current context itself when it does a short return (see “short return", Table 4.1).
  • For a method context, home_cntx points to itself.
  • For a block context, home_cntx points to the context of the method in which the block is defined. This pointer is needed when the Alltalk interpreter 44 does long returns from blocks, and when blocks refer to the temporaries of their home method. By having a method context's home be itself, the Alltalk interpreter 44 can handle all long returns (both from method contexts and from block contexts) in the same way.
  • the first_block field of a context points to the first block stub that the context could allocate.
  • For a method context, my_blk_stub is not used, and is NULL.
  • For a block context, the field points to the context's corresponding stub. This pointer is used by the debugger (described below), and is also used in conjunction with the prev_active_cntx field to handle the case where one block stub has multiple active contexts at the same time.
  • For a method context, prev_active_cntx is not used, and is NULL.
  • For a block context, it is used in conjunction with the my_blk_stub field to handle the case where one block stub has multiple active contexts at the same time. It saves a pointer to the previous active block context associated with this block context's block stub. If this context is the only active context associated with the block stub, then this field holds a NULL pointer.
  • Each block has an id which is an oop (long integer) in a special range, that is, greater than or equal to the constant INIT_CNTX_ID.
  • the id's are assigned to a stub when the process to which it belongs is initialized.
  • the id can be stored in the temporaries of other contexts, and can be passed as a parameter in a message send. In this way, blocks can be treated (almost) like real objects for flexibility, and yet be managed by the interpreter for good performance.
  • Each block stub has a next pointer which links it to the next block stub on the array/stack.
  • When a new block stub is needed, the Alltalk interpreter 44 uses the one pointed to by the global pointer, next_blk_stub. At that time, it follows the next pointer of the stub pointed to by next_blk_stub to update next_blk_stub.
  • Each block stub has a pointer to its home context. If the stub gets evaluated, the Alltalk interpreter 44 needs this pointer in the active context created for the block. Via this pointer, it can get at the temporaries of the home context.
  • When a block stub is evaluated, the Alltalk interpreter 44 updates the stub with a pointer to the active context that gets created to do the evaluation. This pointer is needed in order to resolve references to type 6 variables.
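  • Pulling together the fields described above, the block stub and context data structures might be declared in C roughly as follows (field order and the bcode_ptr field are assumptions; the rest follows the description):
        typedef long oop;                          /* object id; a 32-bit integer in the text */

        struct context;                            /* forward declaration */

        struct blk_stub {
            oop              id;                   /* block id, >= INIT_CNTX_ID              */
            struct blk_stub *next;                 /* next stub in the per-process array     */
            struct context  *home_cntx;            /* context of the defining method         */
            struct context  *active_cntx;          /* activation created for evaluation      */
        };

        struct context {
            struct context  *prev, *next;          /* doubly linked array/stack of contexts  */
            struct context  *home_cntx;            /* itself for a method context; the defining
                                                      method's context for a block context   */
            struct blk_stub *first_block;          /* first stub this context could allocate */
            struct blk_stub *my_blk_stub;          /* NULL for method contexts               */
            struct context  *prev_active_cntx;     /* previous activation of the same stub   */
            long            *bcode_ptr;            /* currently-executing bytecode           */
            oop              temps[64];            /* the context's temporaries              */
        };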
  • When a block is stored in an instance variable, or passed back from a method, the Alltalk interpreter 44 must make the block a persistent object. In so doing, it must also make the home context a persistent object as well, since the block can reference temporaries of the home context. Alltalk contains routines to make the block and its home context persistent objects (and thus they may then be stored in the database and manipulated as any other object), and to put the block and home context back on the stack so that the block can be executed.
  • Figures 4 through 13 show how context management is done in Alltalk.
  • Each Figure shows the same portion of the active context stack and the block stub stack for one process.
  • Each box in the Figures represents one context or one stub; only the fields involved in context management are shown. (The my_blk_stub and prev_active_cntx fields are shown only in Figure 13.)
  • Pointers are indicated by arrows; pointers "connected to ground” represent NULL pointers. Pointers shown in double lines indicate pointers which were changed from the previous figure. The stacks grow downward.
  • Figure 4 shows the state of the two stacks after the interpreter has been initialized, but no messages have been sent. Note that the next and prev pointers of the contexts, and the next pointers of the stubs, have been established. Also, the id's of the stubs have been set.
  • Figure 5 shows what happens when a message is sent. (We assume that the sending context is just off the top of the figure; the context we are about to create is the top box we see in the figure.) We follow the next pointer of the sending context to "create" a new context (from here on, called method context #1). The new context becomes the cur_cntx, and its class is Method Context. Since it's a method context, its home_cntx is made to point to itself.
  • Its first_block pointer is made to point to the stub pointed to by next_blk_stub. Note that next_blk_stub is not moved; only when a block stub is used (i.e., set up) is next_blk_stub moved forward.
  • Figure 6 shows the stacks after method context #1 sets up its first block.
  • Setting up a block means that the Alltalk interpreter 44 creates a block stub; it does not mean that the Alltalk interpreter 44 creates another active context.
  • the block stub pointed to by next_blk_stub becomes the new block stub.
  • the Alltalk interpreter 44 pushes next_blk_stub forward to the stub pointed to by the next field of the new block stub.
  • the home_cntx field of the new block stub is made to point to the home_cntx of cur_cntx, i.e., method context #1. Note that if cur_cntx were a block context, the home_cntx of the new block stub would not be that block context but rather the block's home context. Note also that method context #1 does not change.
  • Figure 7 shows what the stacks look like after method context #1 sets up another block. We now have two block stubs whose home_cntx is method context #1.
  • Figure 8 shows the stacks after method context #1 sends a message.
  • the Alltalk interpreter 44 must "create” a new context (from here on, called method context #2).
  • the Alltalk interpreter 44 follows the next pointer of the current context to find the next available active context, and makes it the cur_cntx. Its first_block pointer is made to point to the block stub pointed to by next_blk_stub. Since the new context is a method context, its home_cntx field is made to point to itself.
  • Figure 9 is somewhat more complicated. In that figure, we see the stacks after method context #2 starts to evaluate one of the blocks that was set up by method context #1. (We assume that the block was passed as a parameter in the message which resulted in the creation of method context #2.)
  • the stub to be evaluated is #214740009.
  • the Alltalk interpreter 44 must "create" a new active context - but this time, it is a block context. Just as with method context creation, the Alltalk interpreter 44 follows the next pointer of the cur_cntx to find the next available active context and make it the cur_cntx. Also, the Alltalk interpreter 44 makes its first_block pointer point to the block stub pointed to by next_blk_stub.
  • the home_cntx pointer of the new context does not point to the new context itself; because the new context is a block context, its home_cntx pointer is gotten from its block stub. In this case, home_cntx points to method context #1.
  • the block stub's active_cntx pointer is made to point to the new block context. The transformation of a block stub to an active context is handled by the routine stub_to_cntx().
  • Figure 10 shows how the stacks would appear if the block were to do a short return.
  • the Alltalk interpreter 44 simply follows the prev pointer of the current context to find the context to return to; it is made the cur_cntx.
  • the block stub associated with the evaluated block does not go away, even though its active context did go away. Block stubs go away when their home context goes away (returns).
  • the Alltalk interpreter 44 also moves next_blk_stub to point to the block context's first_block. This effectively "destroys" and frees up any block stubs set up by the block context. (In this case, the block context created no block stubs, so next_blk_stub does not change.)
  • Figure 11 shows how the stacks would appear if the block were to do a long return.
  • the block's home context is method context #1, so the Alltalk interpreter 44 (in essence) does a return from method context #1. It follows the prev pointer of method context #1 to find the context to return to; it becomes the cur_cntx. It also moves the next_blk_stub pointer back to point to the stub pointed to by first_block of method context #1. This effectively "destroys" and frees up all blocks created by method context #1 and any of its descendent contexts.
  • Figures 12 and 13 show how the my_blk_stub and prev_active_cntx fields are used to handle the case where a block stub may have multiple active contexts associated with it. Note that these fields are shown in these figures only, and only for block contexts. Note also that we have shifted our view of the stacks down (or up) by one context in order to fit the contexts of interest on the page.
  • Figure 12 shows how the stacks would appear if a second block context was activated for the same block stub as the current context. Note that the two block contexts created from the same block stub are very similar; only their prev_active_cntx fields differ. Note that the second one uses this field to point back to the previous (first) one. Note also that the active_cntx field in the stub is updated so it points to the new context.
  • Figure 13 shows how the stacks would appear if the second block context did a short return.
  • the Alltalk interpreter 44 follows the my_blk_stub pointer of the returning block context to find its associated block stub. It copies the prev_active_cntx pointer of the returning block context into the active_cntx field of the stub. Then it does the normal processing for a short return, that is, it follows the returning context's prev pointer to find the sending context and makes it the new current context.
  • prev and prev_active_cntx point to the same context, that is, the first block context; however, this will not necessarily be the case. There could be other intervening contexts between these two activations of the same stub. This is why it must save this information in the newly-created context.
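  • The stack manipulations walked through in Figures 5 through 13 can be summarized in the following sketch, which assumes the context and stub structures sketched after Figure 4 plus a small per-process record holding cur_cntx and next_blk_stub. Except for stub_to_cntx(), the routine names here are assumptions.

    /* A per-process view of the two stacks (a sketch; see also Processes[] below). */
    struct proc_state {
        struct active_cntx *cur_cntx;       /* current (top) active context */
        struct block_stub  *next_blk_stub;  /* next available block stub */
    };

    /* Figures 5 and 8: "create" a method context by moving down the pre-linked stack. */
    void send_message(struct proc_state *p)
    {
        struct active_cntx *c = p->cur_cntx->next;
        c->is_block_cntx = 0;
        c->prev          = p->cur_cntx;
        c->home_cntx     = c;                 /* a method context is its own home */
        c->first_block   = p->next_blk_stub;  /* stubs it sets up will start here */
        c->my_blk_stub   = NULL;
        p->cur_cntx      = c;
    }

    /* Figures 6 and 7: set up a block -- claim a stub; no new active context. */
    void set_up_block(struct proc_state *p)
    {
        struct block_stub *stub = p->next_blk_stub;
        p->next_blk_stub  = stub->next;               /* push next_blk_stub forward */
        stub->home_cntx   = p->cur_cntx->home_cntx;   /* always a method context */
        stub->active_cntx = NULL;
    }

    /* Figures 9 and 12: stub_to_cntx() -- activate a block context for a stub. */
    void stub_to_cntx(struct proc_state *p, struct block_stub *stub)
    {
        struct active_cntx *c = p->cur_cntx->next;
        c->is_block_cntx    = 1;
        c->prev             = p->cur_cntx;
        c->home_cntx        = stub->home_cntx;        /* taken from the stub, not itself */
        c->first_block      = p->next_blk_stub;
        c->my_blk_stub      = stub;
        c->prev_active_cntx = stub->active_cntx;      /* remember any earlier activation */
        stub->active_cntx   = c;
        p->cur_cntx         = c;
    }

    /* Figures 10 and 13: short return -- pop one context and free the stubs it set up. */
    void short_return(struct proc_state *p)
    {
        struct active_cntx *c = p->cur_cntx;
        if (c->my_blk_stub != NULL)                   /* a returning block context */
            c->my_blk_stub->active_cntx = c->prev_active_cntx;
        p->next_blk_stub = c->first_block;            /* "destroy" stubs set up by c */
        p->cur_cntx      = c->prev;
    }

    /* Figure 11: long return -- return from the block's home context instead. */
    void long_return(struct proc_state *p)
    {
        struct active_cntx *home = p->cur_cntx->home_cntx;
        p->next_blk_stub = home->first_block;         /* free everything home and its descendants created */
        p->cur_cntx      = home->prev;
    }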
  • the Alltalk interpreter 44 maintains run time data structures for Smalltalk processes in an array called Processes[]. Each element in that array represents one Smalltalk process. Each element contains (basically) a stack of active contexts, a pointer to the current context in that stack, an array of block stubs, and a pointer to the next available stub. The management of these two stacks and two pointers was described in the previous section. However, we have not yet discussed how processes are created, switched, or destroyed. These topics will be discussed in this section.
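  • For concreteness, one element of the Processes[] array might be declared as in the following sketch; it reuses the context and stub structures sketched earlier, and the array sizes and the in_use field name are assumptions.

    #define MAX_CONTEXTS  256
    #define MAX_BLK_STUBS 256
    #define MAX_PROCESSES  32

    struct smalltalk_process {
        int                 in_use;                    /* slot marked free when the process is destroyed */
        struct active_cntx  cntx_stack[MAX_CONTEXTS];  /* stack of active contexts */
        struct active_cntx *cur_cntx;                  /* current context in that stack */
        struct block_stub   blk_stubs[MAX_BLK_STUBS];  /* array of block stubs */
        struct block_stub  *next_blk_stub;             /* next available stub */
    };

    struct smalltalk_process Processes[MAX_PROCESSES]; /* one element per Smalltalk process */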
  • a Smalltalk process is created by sending a message to a block.
  • the block contains the code that is to be executed in the new process.
  • the message sent to the block might be forkAt:, fork, etc. However, all of these messages eventually result in the message newProcess being sent to the block.
  • the Smalltalk code for method newProcess in Class Block is shown in Table 4.2.
  • the routine createProcess() is the main routine for creating a new process. It first finds an available element in the Processes[] array by calling get_proc_id(). Then, in order to create a new process in Alltalk, the Alltalk interpreter 44 establishes the first context in that new process. It does that by copying appropriate active contexts and block stubs from the creator process to the created (new) process, and then making slight adjustments to the copies. This is best explained using an example.
  • the Alltalk interpreter 44 copies the user's block and its home context (see proc_copy_cntx1()), plus the outer block and its home context (see proc_copy_cntx2()). After that, it evaluates the outer block, that is, it creates an active context from the block stub. When the new process becomes active, this, in turn, causes the user's block to be evaluated (as a result of the message self value). When that block finishes, the new process is destroyed (as a result of the message Processor terminateActive).
  • Figures 14 through 16 illustrate the relationships between these contexts and blocks.
  • Figure 14 shows a portion of the active context stack and block stub stack of the creator process.
  • the contexts and stubs shown are the ones that are of interest when the Alltalk interpreter 44 creates the new process.
  • Figure 15 shows the active context stack and block stub stack of the created process just after it is created by the interpreter.
  • Figure 16 shows the same stack just after the new process has become active, and the user's block begins to execute.
  • the machinery for managing process switches is contained in the module process.c. It follows the implementation described in the standard reference for Smalltalk by Goldberg and Robson, mentioned above. 4.5.3 Destroying Processes
  • Destroying, i.e., terminating, a process involves two basic steps. First, the appropriate element of the Processes[] array is marked as not in use so it can be reused if needed. Second, the garbage collector (described below) is told to clean up after the process. The routine destroyProcess() handles these two tasks.
  • Processes are destroyed in two situations.
  • the first case is when the interpreter quits. At that time, all active processes are destroyed so garbage collection can be performed correctly.
  • the second case is when a terminate message is sent to a Process object. This second case is implemented via primitives. Note that process 0 is created automatically when the interpreter is initialized; it cannot be destroyed, except by shutting down the interpreter. 4.6 Optimizations
  • the Alltalk interpreter 44 detects at runtime if a message send's only purpose is either of the following 2 cases:
  • the Alltalk compiler 20 flags methods that are of these types, for easy detection at runtime.
  • the Alltalk interpreter 44 will execute the appropriate logic in-line, and modify flags in the bytecode that is being executed, as well as caching in the bytecode itself the class of the receiver. Subsequent executions of the bytecode involved will cause the class of the now current receiver to be checked against the class cached in the bytecode. If it matches, the Alltalk interpreter 44 performs the optimized logic, in-line, without fetching (or executing) the method. Thus this optimization saves the fetching of the method, allocation (and subsequent deallocation) of a new context and interpretation of the method.
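  • The fast path for subsequent executions might look like the following sketch; the bytecode layout, the flag name, and the routine name are assumptions, not the actual Alltalk encoding.

    #define OPTIMIZED 0x1     /* assumed: set once the send has been detected as optimizable */

    struct send_bcode {
        int  flags;           /* flags set by the compiler and modified at runtime */
        long cached_class;    /* class of the receiver, cached in the bytecode itself */
        /* ... selector, argument count, etc. ... */
    };

    /* Returns 1 if the send can be handled in-line, 0 if the normal send must be done.
       The first execution (not shown here) fetches the method, flags the bytecode, and
       fills in cached_class. */
    int try_inline_send(const struct send_bcode *bc, long receiver_class)
    {
        if ((bc->flags & OPTIMIZED) && bc->cached_class == receiver_class)
            return 1;         /* optimized logic in-line: no fetch, no new context */
        return 0;             /* class does not match: do the normal message send */
    }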
  • Rather than having one bytecode just for dispatching primitives (e.g., an execute_primitive bytecode), in Alltalk each primitive is its own bytecode. This eliminates the extra level of indirection to get to the code for primitives.
  • primitive bytecodes are in the range 0x000 to 0x0FF; other bytecodes begin at 0x100.
  • the Alltalk interpreter 44 avoids the call to the object manager to fetch the receiver again. Instead, since in Alltalk a pointer to the receiver is held in the associated context, the Alltalk interpreter 44 gets the receiver pointer from the associated context instead.
  • the Alltalk interpreter 44 attempts to replace send_msg_bcodes with eval_blk_bcodes when possible.
  • the Alltalk compiler 20 recognizes messages with the selector value (and value:, etc.), and replaces them with eval_blk_bcode2 bytecodes. This bytecode is the same as the eval_blk_bcode, except that it must check to see that the "receiver" of the value message is a block.
  • If the receiver is not a block, eval_blk_bcode2 simply returns and lets processing fall through to the next bytecode, which is a send_msg_bcode for the value message; if the receiver is a block, eval_blk_bcode2 operates like eval_blk_bcode, except that it must push the bytecode pointer past the following send_msg_bcode which it replaces. 4.6.5 Caching methods in send_msg bytecodes
  • Alltalk uses a performance-improving technique, common to most Smalltalk implementations, known as method caching.
  • the technique takes advantage of the fact that while Smalltalk allows polymorphism, a given message often ends up being resolved to the same method every time. How Alltalk takes advantage of this is as follows.
  • the send_msg bytecode has two extra fields which implement a method cache.
  • One field is likely_class. This saves the class of the receiver of the message when it was last sent.
  • the second field is likely_method. This saves the oop of the compiled method to which the message was resolved the last time it was sent.
  • the Alltalk interpreter 44 checks to see if the new receiver's class matches likely_class; if it does, it uses the compiled method in likely_method. If the classes do not match, it must do the normal, more expensive processing to fetch the appropriate method.
  • the Alltalk interpreter 44 calls the object manager to reserve the method object, to insure the object is not garbage collected until the object is no longer needed. However, this is less expensive than normal method fetching. Note also, that if the cache is not usable (i.e., the receiver's class does not match likely_class), the Alltalk interpreter 44 updates the cache to match the receiver's class and the method's oop in the current message.
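  • The cache check on each send might look like the following sketch; the structure layout and the helper routines are assumptions, while the field names likely_class and likely_method come from the text.

    struct msg_cache {
        long likely_class;    /* class of the receiver when the message was last sent */
        long likely_method;   /* oop of the compiled method it resolved to last time */
    };

    /* Hypothetical helpers: normal (expensive) method lookup and object reservation. */
    extern long fetch_method(long receiver_class, long selector);
    extern void reserve_method(long method_oop);

    long cached_method_lookup(struct msg_cache *c, long receiver_class, long selector)
    {
        if (c->likely_class == receiver_class) {
            reserve_method(c->likely_method);  /* still reserved so it is not collected */
            return c->likely_method;           /* cache hit: skip normal method fetching */
        }
        long m = fetch_method(receiver_class, selector);  /* normal, more expensive path */
        c->likely_class  = receiver_class;                /* update the cache for next time */
        c->likely_method = m;
        return m;
    }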
  • The main procedure of the Alltalk interpreter 44 is contained in the module interp.c. It performs various types of initializations, then invokes the bytecode loop by calling exec_bcodes(). When exec_bcodes() returns, main() does some minor clean up, and exits.
  • Command line arguments are processed. These are parameters passed on the statement used to invoke the runtime environment 22. They include switches for relinquishing control of the keyboard and mouse to the Smalltalk application, and for avoiding the normal system booting procedures. Another parameter is an optional filename; it indicates that the interpreter should get the information for the initial message of the application from that file rather than by prompting the user.
  • the object manager 48 is initialized via a call to init_om().
  • the oops of certain Alltalk objects are referenced in the Alltalk interpreter 44 via global variables. Some of these are fixed to certain oops. For example, true is always oop 257. However, some of the oops referenced via interpreter globals must be determined at start up of the interpreter - they are not fixed forever, just for the duration of the interpreter's run. The appropriate assignments are made by calling initializeOops(). Likewise, certain instance variable indices are referenced by the interpreter via globals. These, too, must be determined at start up. A call to initializeIndices() takes care of this.
  • the bytecodes and context for the first message are built and made the first one to be executed.
  • the interpreter 44 builds: a) send_msg and return bytecodes for the message startup sent to Class SystemBoot;
  • the debugger is named RAID, and it combines many of the features of the standard Smalltalk debugger and the UNIX debugger, dbx.
  • RAID stands for Revised Alltalk Interactive Debugger.
  • We designed it to be used for debugging both Alltalk application code and the Alltalk system (implementation) itself.
  • RAID provides typical debugger capabilities such as:
  • RAID is written in C, and is integrated quite closely with the Alltalk interpreter.
  • the user interface is a simple command interpreter that looks somewhat like the UNIX debugger, dbx, to the user.
  • the command interpreter uses UNIX utilities lex and yacc to parse input and dispatch the appropriate C routines that perform the tasks of the RAID commands.
  • There are several versions of the Alltalk interpreter 44, each geared to a particular need. Not all of these interpreters contain RAID. For example, one version is optimized for running debugged applications as fast as possible; leaving out the debugger improves performance considerably. Another version is geared toward the collection of performance statistics; it also does not include the debugger. The version of the interpreter built by default, however, does include RAID.
  • One piece is a set of C routines in a library, separate from the interpreter, that performs the tasks associated with the RAID commands.
  • Each command has a C procedure associated with it, and that procedure may use other utility procedures to do its work.
  • This first piece is conditionally linked to the interpreter depending on which version of the interpreter is made.
  • a second piece is the code within the interpreter that can get conditionally compiled into the interpreter itself; by default, it is included, but it can be excluded if debugging is not needed. This code is included when the C compiler switch DEBUGGER is on.
  • the third piece is a set of global variables and constants that are used to communicate between the first two pieces.
  • We will refer to piece one simply as the debugger; piece two will be referred to as RAID code in the interpreter; piece three will be called debugger globals.
  • RAID is invoked when the interpreter calls a routine in the debugger called, appropriately enough, debugger().
  • Flow of control is as follows: 1) RAID code in the interpreter calls debugger().
  • debugger() prompts the user, and invokes the lex/yacc command interpreter.
  • the command interpreter parses and interprets the user input, and calls the appropriate C-procedure with the appropriate parameters.
  • the C-procedure performs the tasks associated with the desired command. This usually results in either display of some information (like the contents of the current context), or the updating of the debugger globals (like turning on or off the switch that tells the interpreter to stop at the next message-send).
  • RAID code within the interpreter may call debugger() (step 1 above); it may update debugger globals; or it may display data to the user based on the values of the debugger globals (switches).
  • the interactive interface to RAID is a simple command interpreter built using the UNIX utilities lex and yacc.
  • the utility lex defines what are valid tokens in the RAID "command language"; the grammar defines how these tokens can legally be put together to form commands.
  • the grammar calls the C-procedure associated with the command, passing the command parameters as arguments.
  • Tokens representing command names are all uppercase, e.g., MSG_STEP.
  • Other terminals have first letter uppercase, all others lowercase, e.g., Hex_numeric.
  • Non-terminals are all lowercase, e.g., help_param.
  • RAID global constants are all uppercase, e.g., D_PROMPT_SYMBOL.
  • RAID typedefs and structure definitions are all lowercase, e.g., d_ostat_struct.
  • RAID global variables have first letter uppercase, all others lowercase, e.g., D_init_vals. 4) RAID macros are all uppercase, e.g., D_CRESET().
  • RAID procedures are all lowercase, e.g., d_where().
  • Some operations of RAID are controlled by two sets of binary switches.
  • One set of switches controls the trace information that is displayed as the interpreter runs, e.g., message sends and returns.
  • the other set holds state information, e.g., which RAID command is currently executing.
  • Each set of switches is implemented using a global variable bit vector, plus three macros: one for setting a particular switch (bit), one for resetting a particular switch (bit), and one for testing whether or not a switch (bit) is set.
  • the first set of switches uses the global variable D_display_switches, and the corresponding macros are D_DSET(), D_DRESET(), and D_ISDSET().
  • the second set of switches uses the global variable D_control_switches, and the corresponding macros are D_CSET(), D_CRESET(), and D_ISCSET().
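  • One plausible implementation of the two switch sets, using a bit-vector global and three macros each, is sketched below; the variable and macro names are those given above, while the bit-vector representation itself is an assumption.

    static unsigned long D_display_switches;   /* trace-display switches */
    static unsigned long D_control_switches;   /* debugger state switches */

    #define D_DSET(bit)    (D_display_switches |=  (1UL << (bit)))
    #define D_DRESET(bit)  (D_display_switches &= ~(1UL << (bit)))
    #define D_ISDSET(bit)  ((D_display_switches & (1UL << (bit))) != 0)

    #define D_CSET(bit)    (D_control_switches |=  (1UL << (bit)))
    #define D_CRESET(bit)  (D_control_switches &= ~(1UL << (bit)))
    #define D_ISCSET(bit)  ((D_control_switches & (1UL << (bit))) != 0)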
  • This command simply continues execution of the interpreter by causing debugger() to do a return.
  • It causes debugger() to return by setting the global variable D_in_debugger to "0" (zero). quit, restart, rerun
  • the quit command causes the interpreter to exit; restart aborts the current Alltalk application, restarts the interpreter on the same application, and gives a RAID prompt; rerun is equivalent to restart followed immediately by a continue - it does not re-prompt the user before restarting the application. It is important to do garbage collection before aborting an application, so these commands make sure each active Smalltalk process is explicitly destroyed before aborting.
  • the code does different things depending on the state of the Alltalk interpreter 44 when the command is invoked.
  • the user is forced to get into the bytecode loop (by executing one bytecode, for example) before allowing any of these commands to be used.
  • the debugger does longjmp() to an appropriate spot in exec_bcodes() where all active processes are destroyed in order to be sure garbage collection is done appropriately. Then it returns to interp().
  • the debugger sets the appropriate global switch (D_QUIT, D_RERUN, or D_RESTART) so that when it returns to interp(), it knows whether to exit, restart, or rerun. run
  • the run command is similar to restart, but it is used when the user wants to run a different Alltalk application without leaving the interpreter.
  • the code, then, must clear all breakpoints (since these are probably not meaningful in the new application), and get new values for the interpreter's initial message. 4.8.4.4 Commands for finding out where you are print_message, where
  • the where command is analogous to the dbx command of the same name. It prints out the currently active messages, i.e., the message sends that have not yet returned. Only those messages in the currently-active process are printed.
  • the print_message command prints only the most-recently activated (last sent) message. Both commands use the routine d_print_msg() to print the message associated with a given context; where calls this routine on all the contexts in the context stack of the current process; print_message calls this routine only on the current context.
  • This command handles a stop set for a particular bytecode type, e.g., send_msg_bcode. If the user enters the command without a parameter, the debugger simply prints out the currently-set stop, if any. If a parameter is given, the debugger stores it into the RAID global variable D_stop_at_bcode. Bytecodes range from (hex) 0x100 to 0x156; primitives range from (decimal) 0 to 255. The user may specify a bytecode in either range. As the interpreter executes, within exec_bcodes(), before executing a bytecode, it checks the bytecode against D_stop_at_bcode; if it matches, the interpreter calls debugger(). stop_in, delete
  • Stops are stored in the global array D_stop_in_data. They are identified by number, from 1 to D_MAX_STOPS.
  • the parameters of the stop_in command define a new stop; new stops are added using d_add_stop() called from d_stop_in().
  • Like stop_at, if invoked with no parameters, stop_in simply prints the currently set stops, using d_print_stop(). Stops are deleted using the delete command. Note that deleted stops cannot be re-used.
  • send_msg bytecode handler After each send_msg bytecode is executed, a check is made to see if the just-executed bytecode matches any of the stops. If so, the stop is printed, and debugger() is called.
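  • The following sketch shows how the bytecode loop might perform the stop_at and stop_in checks described above; D_stop_at_bcode and debugger() are named in the text, but the loop shape and the remaining helper routines are assumptions.

    extern int  D_stop_at_bcode;          /* set by the stop_at command */
    extern void debugger(void);

    /* Hypothetical helpers standing in for the real bytecode-loop internals. */
    extern int  next_bytecode(void);
    extern void execute_bytecode(int bcode);
    extern int  is_send_msg(int bcode);
    extern int  matches_a_stop_in(int bcode);
    extern void print_matched_stop(void);

    void exec_bcodes_sketch(void)
    {
        for (;;) {
            int bcode = next_bytecode();
            if (bcode == D_stop_at_bcode)             /* stop_at: stop before executing it */
                debugger();
            execute_bytecode(bcode);
            if (is_send_msg(bcode) && matches_a_stop_in(bcode)) {
                print_matched_stop();                 /* stop_in: stop after the send executes */
                debugger();
            }
        }
    }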
  • This command simply causes the interpreter to continue execution until the next bytecode is about to be executed. It sets the D_BCODE_STEP switch. During interpreter execution, before a bytecode is executed in exec_bcodes(), this switch is tested; if set, debugger() is called. The switch is reset every time debugger() is called. goto, skip_msg
  • This command is to messages as bcode_step is to bytecodes. It causes the interpreter to continue execution until the next message is sent. It sets the D_MSG_STEP switch. During interpreter execution, after a send_msg bytecode is executed, this switch is tested; if set, debugger() is called. The switch is reset every time debugger() is called. next_msg
  • This command is rather more complicated than msg_step.
  • This command is to msg_step as the dbx command next is to the dbx command step. That is, it causes the interpreter to continue executing until the next message at the current level is sent. In order to do this, it must keep track of what the current level was when the command was invoked; this is stored in D_base_cntx.
  • When the interpreter executes a send_msg bytecode, it checks to see if the message just sent was sent from D_base_cntx. If so, then debugger() is called. Also, on every return bytecode, the interpreter checks to see if it is returning from (or past) D_base_cntx. If so, D_base_cntx is set to be the context to which it is returning, the user is given a warning message, and debugger() is called. This is analogous to doing a next in dbx past a return. return
  • This command causes the interpreter to continue execution until it returns from (or past) the current context.
  • d_return() sets the D_RETURN flag, and fills in the global D_ret_from with the current context and process id.
  • ret_bcode checks these; if D_RETURN is set and it is returning from or past the context specified in D_ret_from, the interpreter displays a message and calls debugger().
  • This simple logic gets complicated because of the optimization that converts message sends into assign54 bytecodes and primitive bytecodes. Note that the user is unaware of these optimizations, so the interpreter makes these optimizations transparent to her.
  • the interpreter uses the switches D_MSG_REPLACED and D_RET_FROM_REPLACED_MSG to keep track of these situations.
  • the set and unset commands turn on and off, respectively, the various display switches. See the section above on how these switches are implemented. How each of the switches is used is described next. set/unset bcode
  • the D_BCODES switch is tested in exec_bcodes() before the interpreter executes each bytecode. If the switch is set, it calls print_bcode() on the bytecode about to be executed. set/unset context
  • the D_CONTEXTS switch is tested in exec_bcodes() after the interpreter executes each bytecode. If the switch is set, it calls d_print_cntx() on the current context. set/unset block
  • the D_BLOCKS switch is tested by the interpreter when a block is evaluated. If the switch is set, the debugger prints information about the block that the interpreter is about to evaluate. The switch is also tested when the interpreter does a return. If the switch is set, and it is returning from a block, the value returned is displayed. Note that this information is not printed if the debugger is currently executing a next_message command, and the interpreter is at a level below the level at which the next_message command was invoked. set/unset process
  • the D_MESSAGES switch is tested when a message is sent. If the switch is set, the debugger prints information about the message that the interpreter is about to send. The switch is also tested when the interpreter does a return. If the switch is set, and the interpreter is returning from a message (rather than from a block), the value returned is displayed by the debugger. Note that this information is not printed if the debugger is currently executing a next_message command, and the interpreter is at a level below the level at which the next_message command was invoked. Also note that the interpreter takes care of the cases in which a message send is replaced by a primitive or an assign54 bytecode. The assign54 case is handled in send_msg_bcode; the primitive case is handled in send_msg_bcode (for the send) and exec_prim_bcode (for the return). set/unset receiver
  • the D_RECEIVERS switch is tested in exec_bcodes() after the interpreter executes each bytecode. If the switch is set, the debugger calls d_print_receiver() on the current receiver.
  • the print_global command takes a string as a parameter; it's used for objects such as symbols, Class names, and other global objects.
  • the print_oop command takes an oop (integer) as a parameter.
  • the print_receiver command takes no parameter; it simply causes the debugger to print the contents of the current receiver. print_temp
  • This command takes a small positive integer as parameter.
  • the parameter corresponds to a method temporary of the currently executing method; 1 represents the first temporary, 2 the second, etc.
  • the routine d_print_temp_num() calculates where to find this in the temporaries of the appropriate context on the stack, and prints it as an oop.
  • print_bcode is a general routine, which is also used by the database lister. print_active_cntx, print_block_stub, print_cntx_of_stub
  • the interpreter maintains contexts, one for each currently-active message and block, in an array, one array per Smalltalk process.
  • the interpreter also maintains an array (one for each Smalltalk process) for each block that has been set up and is active or has the potential to become active (we call these block stubs). These commands allow the user to print the contents of any of these contexts or block stubs.
  • the command print_active_cntx takes as a parameter a positive integer which is the index into the array of contexts of the current process. That particular context is printed using d_print_cntx().
  • the command print_block_stub takes as a parameter a block stub id. This is a positive integer greater than INIT_CNTX_ID. This range of integers is used to track blocks independently of normal objects.
  • the routine d_print_block_stub() translates this id into an index into the array of block stubs for the current process; the appropriate block stub is then printed.
  • the command print_cntx_of_stub also takes a block stub id as parameter. As with print_block_stub, it finds the appropriate stub; but it uses d_print_cntx() to print the active context associated with that stub, if there is one. print_process
  • This command causes the debugger to print the contents of the interpreter data structure associated with a particular Smalltalk process, not including the context stack or the block stub stack. status
  • This command is equivalent to executing the following commands, all without parameters:
  • stop_in prints method stops, if any
  • stop_at prints bytecode stop, if any
  • stat_status prints statistics collections that are turned on, if any
  • A tool for collecting statistics on Alltalk messages is implemented in Alltalk. This tool is invoked from within RAID. Basically, it keeps track of which methods are executed, how many times each is executed, and how much time is spent on behalf of each method and its descendants.
  • The tool uses two main data structures. One is a table of records, D_stat_tab. The other is a stack of records, one record for each message which is active, i.e., has been sent but has not yet returned. There is one stack per Smalltalk process, and these are stored in the global array, D_stat_stack. When a method returns, its record is popped from the stack, and 'added' to the table.
  • a stack record is defined by struct msg_rec. It contains the class and selector of the method; this is used to identify the method. It also contains the class and selector of the method which invoked it.
  • the stack record also contains two pairs of the following form: a time stamp, and a cumulative time. One stamp/cum pair is used to keep track of time spent on behalf of this method and its descendants; the other stamp/cum pair keeps track of time spent in the method only.
  • the statistics tool stores in the start_time sub-field the time at which the method begins executing. When the method returns, it subtracts start_time from the current time, and stores the result in the elap_time sub-field.
  • the statistics tool stores in the time_stamp sub-field the time at which the method begins executing.
  • time_stamp is subtracted from the current time and added to the cum_time sub-field, which is initially zero.
  • time_stamp is reset.
  • time_stamp is again subtracted from the current time, and the result added to cum_time. In this way, cum_time keeps track of only the time spent on behalf of this method, exclusive of its descendants.
  • the stats table is an array of records. Each record is of the type struct method_rec. A record contains a class and selector to identify its method, plus the number of times it has been sent (and returned), plus the total time spent on its behalf, plus the total time spent on behalf of it and its descendants. When a method returns, the routine shown in Table 4.5 is performed.
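  • The two record types and the bookkeeping done on a send and on a return might look like the following sketch. The structure names msg_rec and method_rec and the sub-fields start_time, elap_time, time_stamp, and cum_time come from the text; the remaining fields, the routine names, and the use of clock() as the time source are assumptions.

    #include <time.h>

    struct msg_rec {                /* one per active message, kept on D_stat_stack */
        long    rcvr_class, selector;
        long    caller_class, caller_selector;
        clock_t start_time, elap_time;   /* method plus descendants */
        clock_t time_stamp, cum_time;    /* method only */
    };

    struct method_rec {             /* one per method, kept in the D_stat_tab table */
        long    rcvr_class, selector;
        long    send_count;
        clock_t total_self_time;
        clock_t total_incl_time;
    };

    /* On a send: start both clocks for the callee, pause the caller's own-time clock. */
    void stat_on_send(struct msg_rec *caller, struct msg_rec *callee)
    {
        clock_t now = clock();
        if (caller != NULL)
            caller->cum_time += now - caller->time_stamp;
        callee->start_time = now;
        callee->time_stamp = now;
        callee->cum_time   = 0;
    }

    /* On a return: close out the callee, restart the caller's own-time clock,
       and 'add' the popped record into the table (cf. Table 4.5). */
    void stat_on_return(struct msg_rec *callee, struct msg_rec *caller,
                        struct method_rec *tab_entry)
    {
        clock_t now = clock();
        callee->elap_time  = now - callee->start_time;
        callee->cum_time  += now - callee->time_stamp;
        if (caller != NULL)
            caller->time_stamp = now;            /* resume the caller's clock */
        tab_entry->send_count++;
        tab_entry->total_self_time += callee->cum_time;
        tab_entry->total_incl_time += callee->elap_time;
    }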
  • the command stat_on turns on collection of statistics; stat_off turns off collection. This is done by setting and resetting the switch, D_STAT. This switch is tested by send_msg_bcode (and send_param_msg_bcode) and ret_bcode; if the switch is set, these routines cause statistics collection to be done. Neither command affects the table, but both initialize (empty) the stack.
  • the command stat_reset initializes the stack and empties the table. Any statistics collected up to this point are lost.
  • the table can be printed to the screen or to a file using the stat_print command, and one can determine whether or not statistics collection is on by using the stat_status command.
  • the statistics tool also includes a means for collecting statistics related to the object manager 48.
  • the statistics collected are mainly counts of various events, and maximum and minimum values of certain object manager variables/sizes.
  • the statistics tool uses two instances (D_ostats and D_obuffer_cnts) of one large structure (d_ostat_struct) to keep various statistics.
  • the collection of object manager statistics can be turned on and off at any time via RAID commands.
  • commands are available for printing object manager statistics to the screen or to a file; for resetting the collection 'table'; and for determining whether or not collection is turned on or off.
  • RAID has an on-line help facility.
  • When the user enters the help command with a command name as a parameter, he is presented with a manual page (a la UNIX) for that command.
  • the help files are written in UNIX nroff form.
  • the debugger uses the system() UNIX library routine to invoke the UNIX more command on the appropriate help file.
  • the grammar translates the parameter (the command name) from a string to a token (i.e., constant) before passing it on to the d_help() routine.
  • the d_help() routine then does a switch based on that constant and displays the correct file.
  • Invoking the help command without a parameter results in the display of a summary of all commands. Invoking the short_help command (which takes no parameters) causes an even shorter list of all the commands to be displayed.
  • the object manager 48 provides access to objects in the database 40 and in main memory 18. It is used by the compiler 20, interpreter 44, primitives, and utilities. It maintains the database 40 as well as the organization of objects in memory. Object manager 48 is also called by the method-fetcher 50 to fetch methods for the interpreter 44, using the class of the receiver of a message, and the Smalltalk superclass hierarchy. Although the object manager 48 is described herein with reference to the Alltalk tool, the object manager 48 is also useful as a general purpose object-oriented database manager.
  • the database 40 consists of two UNIX files: db.key and db.prime.
  • the key file provides associative access to the prime file: the access manager 58 hashes into the key file (all of whose records are of fixed length), and finds the address (file offset) of the object in the prime file.
  • the key file record also contains the length of the prime record, so the access manager 58 knows how many bytes to retrieve.
  • Objects in the prime file are one of six types: OBJ_REC, a normal Alltalk object as seen by the Alltalk programmer; SYMBOL_XREF, a symbol cross-reference record that contains the string for the symbol and the associated oop of the Alltalk symbol object; and DICT_XREF, which is the Smalltalk dictionary cross-reference record.
  • This dictionary record contains the string that is the name of the global symbol (e.g., a Class name), the oop of the associated Alltalk symbol object, as well as the object id of the Alltalk object that has that symbol as the object's global name.
  • the other types are CTL_REC, the control record; CKPT_REC, the checkpoint integrity record; and DLT_REC, a logically deleted object record.
  • the key file is divided into two parts, an objectKeySpace and a symbolSpace.
  • the objectKeySpace part of the file (which is first in the file) is used to find the address of an object, given the oop (object id).
  • the second part of the file, the symbolSpace, is used to find a cross-reference record, given the string associated with a symbol or global.
  • To use the symbolSpace, the access manager 58 hashes the string to get an address in the symbolSpace, retrieves the key record at that address, and then proceeds to the prime file to retrieve the cross-reference record, which contains the oop of the object being sought.
  • the records in the key file are of fixed length, and contain three fields:
  • Collisions in the key file are handled by chaining the objects in the prime file together. If the object at the address indicated by the key file record does not have an id (oop, or string) that matches the target sought, the access manager 58 follows the 'overflow' chain in the records in the prime file, checking the target against the id until it is found. Fastest access to newest objects is provided by placing them first in the overflow chain. 5.2 Database access manager
  • the routines in the access manager 58 are called mainly by the buffer manager 54 (when objects are to be retrieved), and by the garbage collector 52 and the transaction manager 46 (when objects are to be added/updated in the database at commit points). They are also called by dictionary and symbol access routines discussed later.
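  • The associative access described above (hash into the fixed-length key file, then follow the overflow chain in the prime file) might look like the following sketch. The record layouts, the hashing helper, and the use of 0 as an end-of-chain marker are assumptions; only the prime address and record length fields of the key record are named in the text.

    #include <stdio.h>

    struct key_rec {               /* fixed-length record in db.key */
        long prime_addr;           /* file offset of the object in db.prime */
        long prime_len;            /* length of the prime record */
        /* third field assumed, e.g. a use/validity flag */
    };

    struct prime_rec_hdr {         /* assumed header of a record in db.prime */
        long id;                   /* oop, or hash key for a cross-reference record */
        long overflow_addr;        /* next record hashing to the same key slot, or 0 */
        /* ... record type, object body ... */
    };

    extern long hash_oop(long oop);    /* hypothetical hashing helper */

    /* Follow the overflow chain until the record with the target id is found. */
    long find_prime_addr(FILE *key_file, FILE *prime_file, long target_id)
    {
        struct key_rec key;
        long addr;

        fseek(key_file, hash_oop(target_id) * (long)sizeof key, SEEK_SET);
        fread(&key, sizeof key, 1, key_file);

        for (addr = key.prime_addr; addr != 0; ) {
            struct prime_rec_hdr rec;
            fseek(prime_file, addr, SEEK_SET);
            fread(&rec, sizeof rec, 1, prime_file);
            if (rec.id == target_id)
                return addr;           /* newest records sit first in the chain */
            addr = rec.overflow_addr;
        }
        return -1;                     /* not found */
    }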
  • A control record is stored as the first record in the prime file. This contains the next available oop (to use for new objects), and the next available address in the prime file (used and then updated when new records are added to the prime file).
  • the control record also maintains certain database statistics, including the last checkpoint id. It is written to the database after every commit is complete, to insure proper restart.
  • the first call to the access routines will open the Unix files, and put a (UNIX) lock on the files, to assure single user access.
  • the lock check can be overridden for read-only access (as in the database-lister utility).
  • the checkpoint integrity record is also checked by the access manager 58 to make sure that the system was not aborted while a 'commit' was in progress. This record is updated in the database with the checkpoint id when the commit starts. If the first call to the access manager finds the checkpoint id in the control record out of sync with the checkpoint id in the checkpoint integrity record, the access manager 58 aborts (the control record is written to the database only after the commit is successful). The only way to recover is to restore from a back-up.
  • the fetchit function retrieves an object from the database 40, given a record type and a key.
  • the fetched record is placed in the buffers (see buffer manager, below), along with the disk address of the retrieved record. This will be used if/when the record needs to be replaced in the database.
  • the storeit function is capable of adding a new object (or replacing same) in the database.
  • First, the access manager 58 looks at the record's disk address (which was stored with the record in the buffer, when (if) the record was previously retrieved). If this is not NULL, it knows that the record already exists in the database, and it replaces the record using this disk address (records never change their disk address during a run, except when they are lengthened -- see below). The responsible program must NULL out this address if the record has changed its key, or if the record has lengthened. If the access manager 58 cannot use the disk address, it assumes it has a new record.
  • the access manager 58 fetches the record pointed to by the key, updates the new record's overflow pointer to point to the record currently pointed to by the key record, and then updates the key record to point to the new record being added. This insures that fastest access is to newest records (they are first in the overflow chain).
  • the forceit function will put a record in the database, but (unlike storeit) checks to see if it is already there. If so, it logically deletes the old copy and adds the new one.
  • This function is called when an object is newly created with an oop that already exists (e.g., a Class), and when an object is lengthened. It uses the storeit function if the new object is not already in the database, or is smaller than the one it is replacing. Else, the access manager 58 gets the old object and logically deletes it (by placing a special mark in the rec_type), and then executes the storeit logic.
  • Callers of the access methods are expected to have determined the id of the object, even if it is a new one. They can call oop_gen to get the next available id. This routine will look at a table (filled in by the garbage collector 52, when an object is deleted) in an attempt to reuse oops. If none are available for reuse (e.g., at start of run), the access manager 58 creates a new one by using and then incrementing a field in the control record that keeps track of the next oop to create.
  • the function start_commit is called when a commit is started (normally in the transaction manager). This routine updates the special checkpoint integrity record mentioned above. If the run is aborted before the commit is finished, the control record will be out of sync with the checkpoint integrity record, causing subsequent runs to be aborted.
  • Chckpt_oop is called when a commit is finished. Presumably, the calling program called start_commit and has now finished writing all of the changed objects to the database (via storeit and forceit), and the database is now in sync with memory. Chckpt_oop will update the control record indicating the commit is finished, and write it to the database. The control record also keeps track of the next oop to use, and the next prime file address to use.
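  • The start-up consistency check implied by start_commit and Chckpt_oop might be as simple as the following sketch; the record layouts and field names are assumptions.

    struct ctl_rec  { long next_oop; long next_prime_addr; long last_ckpt_id; };
    struct ckpt_rec { long ckpt_id; };

    /* The checkpoint integrity record is updated when a commit starts; the control
       record only after the commit completes. A mismatch on the first access means
       a commit was interrupted, and the access manager 58 aborts. */
    int database_is_consistent(const struct ctl_rec *ctl, const struct ckpt_rec *ckpt)
    {
        return ctl->last_ckpt_id == ckpt->ckpt_id;
    }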
  • the buffer manager 54 maintains the in-memory copy of objects. It is called by the object manager 48 when an existing object is to be fetched or when a new object is to be stored in the buffers. It can be called with the following operations:
  • buffer manager returns an error if it finds it there), and the object is to be fetched from the database and put in the buffers.
  • the buffers are updated with the new (version of the) object.
  • New space in the buffer is allocated, the object's disk address is set to zeros (the disk address is control data kept with the object in the buffer), and the object table is updated to point to the new spot in the buffer where the new version of the object will be stored.
  • the buffer manager 54 uses an object table to keep track of which objects are already in the buffers.
  • the table contains the id and a pointer to each object in the buffers.
  • To retrieve an object, the buffer manager hashes into the object table to see if it is already in memory. If not, the object is fetched from the database, placed in the buffer, and the object table is updated.
  • When the buffer manager 54 needs space in the buffers in which to place a new object, a "forced" object, or an object from the database, it calls upon the pool manager 56 to find the space in the correct buffer. In any case, the buffer manager 54 returns a pointer to the object to the calling program.
  • the pool manager 56 maintains memory for the various buffers. It keeps a total of 7 buffers: small slot, medium slot, and large slot buffers for methods, another set of 3 for non-method objects, and one buffer, "huge", for oversize objects (methods and non-methods can both go in "huge"). Except for the "huge" buffer, all buffers have fixed size slots. Memory for the buffers is pre-allocated, except for "huge", which is maintained using the UNIX routines malloc/free.
  • The pool manager is called to find a spot in a buffer for an object. It uses the size and type (method/non-method) of the object to determine which buffer to search for the empty slot. If a slot is found, a pointer to the slot is returned to the calling program (probably the buffer manager 54). It searches for an available slot with the following algorithm. 1. For each buffer, a "slot-indicator" is kept, which is the next slot to look at. This is maintained across calls to the pool manager, and wraps around when the end of the particular buffer is encountered. It is updated to be one higher than the slot returned the last time the pool manager found space in that buffer.
  • Two searches of the buffer are made, starting at the slot-indicator.
  • a search is made for a slot that is empty, or else holds an object that is not being used (i.e., not in the "in-use" table -- see garbage collector section), and whose usageCount is 0.
  • This usageCount is incremented every time the object manager 48 fetches the object and decremented every time the pool manager 56 looks at the object's slot; it indicates the frequency of access to the object. If a slot cannot be found on the first pass, the usageCount is ignored on the second pass. If a slot cannot be found on the second pass, it means the buffer is filled with objects that are being held by the interpreter 44, and the run must be stopped (memory is exhausted).
  • the object table is updated by removing the entry for the object in the buffer slot that is about to be reused, and an entry in the table is added for the new object just placed in the buffer.
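  • The two-pass slot search might look like the following sketch; the buffer layout and names are assumptions, and the per-slot in-use flag stands in for the check against the in-use table described in the garbage collector section.

    struct slot {
        int empty;
        int in_use_entry;    /* nonzero if the resident object has an in-use table entry */
        int usage_count;     /* incremented on fetch, decremented when the pool manager looks */
    };

    struct buffer {
        struct slot *slots;
        int          nslots;
        int          slot_indicator;    /* next slot to look at; persists across calls */
    };

    /* Returns the index of a reusable slot, or -1 if the buffer is exhausted. */
    int find_slot(struct buffer *buf)
    {
        int pass, i;
        for (pass = 0; pass < 2; pass++) {
            for (i = 0; i < buf->nslots; i++) {
                int s = (buf->slot_indicator + i) % buf->nslots;   /* wrap around */
                struct slot *sl = &buf->slots[s];
                if (!sl->empty && sl->in_use_entry)
                    continue;                      /* pinned by the in-use table */
                if (!sl->empty && sl->usage_count > 0) {
                    sl->usage_count--;             /* looked at: decrement */
                    if (pass == 0)
                        continue;                  /* usageCount ignored on the second pass */
                }
                buf->slot_indicator = s + 1;       /* resume one past this slot next time */
                return s;
            }
        }
        return -1;                                 /* memory exhausted: the run must stop */
    }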
  • the number of slots and the slot size can be tailored to fit the distribution of object sizes in the database. Alltalk runs faster with fixed size slots in each buffer, since this means that no compaction is required by the garbage collector 52. Different buffers are provided for methods vs. other objects because non-methods are expected to be more volatile in their usage than methods, and to have a different size distribution. 5.5 High Level Object Manager Protocol
  • the object manager 48 provides a set of high-level functions for object access. It is these functions that are used by the interpreter 44, compiler 20, primitives, and others.
  • the access manager 58, pool manager 56, and buffer manager 54 are used to implement these higher level functions.
  • a program can call the object manager 48 with a call of NEW in order to establish a new object in memory.
  • the class id of the new object must be supplied.
  • the object manager 48 will fetch the class of the new object, and initialize the new object appropriately.
  • the caller must also supply the number of index variables required. The latter parameter cannot be changed for the object later: to "grow" an existing object, the FORCE call must be used.
  • the FORCE call will accept the id of the object to be grown, and set up a new object with the specified quantity of index variables. It is up to the calling program (usually a primitive) to set all other data appropriately.
  • Reserve_obj will fetch a requested object, and lock its position in memory until the current method (and all other users) ends. This is done by putting an entry in the "in-use" table for the process and region that is passed in the call to reserve_obj.
  • This table is described in the garbage collector section; it serves to keep track of which objects have their memory address pinned down (until the garbage collector processes the region specified). The entry does not leave the table until the object is either garbage collected, or, if updated, written to the database. It is the presence of this entry that keeps the pool manager from re-using the object's slot in the buffer.
  • the reserve_obj routine must be used if the caller expects to either update the object, or re-access the object using the pointer returned from the call.
  • the other retrieval routine is get_obj which also returns a memory pointer to the object requested.
  • This routine will not guarantee that the pointer is valid across calls to the object manager routines. It is used mainly by primitives where only temporary, read-only, access is required.
  • the object manager requires that no calling program cache object pointers except in the interpreter contexts. It also assumes that no program is maintaining local storage of object id's except the interpreter context temporaries, and instance variables of other objects.
  • the reason for these restrictions is that the garbage collector only knows which objects are referenced through the instance variables of other objects (and context temporaries), and only knows which objects have their addresses cached by having entries in the in-use table; all other objects are fair game to be garbage collected.
  • the entries in the in_use table are tagged with the region id (see the garbage collector), and it is assumed that when the region is collected, the memory pointers are no longer required, and the object's buffer space can then be used for other objects.
  • the object manager logic also depends on the calling program setting the UPDATE flag in the object if the object has been updated. This is the only indication that the object is to be (eventually) re-written to the database. If an object is to be made permanent in the database (even though it has no references from other objects' instance variables), the calling program should call "referenced" to establish this (see the garbage collector section). An updated object's storage in the buffers will not be re-used until after the next commit call. This is assured by an entry being placed in the in-use table for the object when reserve_obj was called.
  • New objects and existing objects that have been updated are written to the database when the transaction manager is called to do a commit or when the garbage collector collects region 0 of a process (see the garbage collector section). This latter event happens whenever a process terminates, and at end of run.
  • When an object is written to the database, its UPDATE flag is turned off and (if it is not otherwise pinned down) the pool manager can consider its slot in the buffer for reuse.
  • Routines getdictionary and putdictionary update the Smalltalk dictionary, and retrieve an object given a global name (a string). Similarly, getsymbol and putsymbol store and retrieve symbol objects given their associated strings.
  • the method fetcher 50 retrieves the appropriate compiled method object given a selector, the receiver's class, whether it is a "send super", and whether the message is to a class or an instance object. It fetches the class and looks up the selector in the dictionary. If not found, it fetches the class's class, and so forth. Normally, it stops in class Object but continues on if the original message was to a class. In this case, it follows the metaclass chain, as described in the standard Smalltalk reference. The method fetcher 50 employs a table to retain the method id, given a selector, class, and method type. It examines this table first before chasing the superclass chain. 6. Garbage Collector
  • Garbage is defined as objects that are no longer reachable, and therefore can be safely discarded. Since there is no explicit delete command available to the programmer in a Smalltalk language, removal of objects is entirely up to the system. Furthermore, since many objects are transient in an Alltalk session, it is important that the objects be collected efficiently with a minimum of disruption to response time.
  • Although the garbage collector 52 is described in connection with the Alltalk tool, it is useful for garbage collection in any heap-based language system (such as Lisp, Prolog, and a variety of object-oriented languages, such as Loops and Flavors).
  • the garbage collector 52 is integrated with the object manager 48 and interpreter 44.
  • the garbage collector 52, shown in more detail in Figure 19, includes a collector means 200 for implementing the actual garbage collection function; a region cleaner 202 for detecting regions that have accumulated an excess number of objects, and calling the collector 200 to clean such regions; a cross-process checker 204 for insuring that no object in use by another process is discarded; and an off-line mark/sweep collector 206 called by the interpreter for periodically removing objects from the database 40 that have become unreachable (directly or indirectly) by any object in the database dictionary.
  • the collector 200 employs an in-use table 101 described below, in executing the garbage collection function.
  • a process is a Smalltalk object representing a light-weight thread of control. Multiple processes may exist, but only one is active at any time. Processes in Alltalk adhere to the definition in the standard Smalltalk reference.
  • a context is a Smalltalk object representing the state of a method which is executing.
  • Contexts are analogous to stack frames in procedural languages, with the notable exception that allocation/deallocation does not always obey a strict stack discipline.
  • Regions are not Smalltalk objects. They are used in Alltalk for garbage collection. In Alltalk, each context belongs to a region. Several contexts from the same process may belong to the same region, but a context is associated with only one region, and regions do not span processes. When a context is created, it is assigned a region number. Once assigned, a context's region number never changes. Each object created or accessed is assigned the region number of the context that created or accessed it, unless it was already associated with a region with a lower number. After the number of objects in the 'current' region exceeds a fixed maximum, a new region (with an id one greater than the previous one) is started when the next context is created.
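  • The two region rules described above can be summarized in the following sketch; the names and the convention for an unassigned region are assumptions.

    #define REGION_UNASSIGNED (-1)    /* assumed marker for "no region yet" */

    /* An object takes the accessing context's region unless it already has a lower one. */
    int object_region_after_access(int object_region, int cntx_region)
    {
        if (object_region == REGION_UNASSIGNED)
            return cntx_region;
        return (object_region < cntx_region) ? object_region : cntx_region;
    }

    /* A new context stays in the current region until that region has accumulated
       more than some fixed maximum number of objects; then a new region starts. */
    int region_for_new_cntx(int current_region, int objs_in_current, int max_objs)
    {
        return (objs_in_current > max_objs) ? current_region + 1 : current_region;
    }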
  • Figure 17 shows a context stack for processes 0 and 1.
  • the first two contexts 60 and 62 within the context stack 64 for process 0 belong to the same region (0).
  • the next two contexts 66 and 68 in the stack belong to different regions (1 and 2), and the last two contexts 69 and 70 in the stack 64 are assigned to the same region number (3).
  • the stacks for each process grow in the direction of arrow A, by adding contexts to the tops of the stacks.
  • Figure 18 shows how objects belong to both regions and processes. For example, object 72 belongs to both process 0, region 0, and to process 1, region 1.
  • Object 74 belongs only to process 0 region 0.
  • Object 76 belongs only to process 1 region 1.
  • If object A refers to object B via one of its instance variables, we call A the parent of B and B the child of A.
  • the in-use table 101 in Alltalk keeps track of those objects in memory which must not be overwritten and whose location in memory must not be changed. Typically, such objects fall into one of the following categories.
  • the interpreter 44 retrieves the receiver of a message send, and caches a pointer to it in the corresponding context. Until the context returns, this pointer must remain valid.
  • any number of method executions may be suspended waiting for the return of a message send.
  • Objects created or updated as the result of the execution of such a method may have to be kept in the in-use table until the method returns. They cannot be written to the database, because they may turn out to be garbage (i.e., created only to hold temporary results). This determination can only be made after the method finishes executing.
  • the object manager 48 makes one entry in the in-use table for each object that needs to be kept in memory. If an object is referenced from multiple processes, it will have multiple entries, one for each process. However, if an object is referenced multiple times from the same process, it has only one entry for those references.
  • Figure 20 shows the format of entries in the in-use table 101. An entry has the following fields:
  • a pointer to the object in memory (buffer pointer);
  • Entries 100-110 in the in-use table 101 are chained together in two ways. First, all entries for a given object (e.g., object A, 112, entries 100-104) are chained across processes. In this way, the garbage collector 52 keeps track of the fact that an object may be referenced from more than one process. Additionally, all entries for a given process are chained across objects. For example, entries 106-110 are all for process 0. This chain connects objects from tail to head, in order from highest region to lowest, for a given process. This allows the garbage collector 52 to scan all objects within a process from high regions to low regions, in order to collect (discard) unused objects efficiently. 6.2 Assigning objects to regions.
  • Objects are put into the in-use table by the object manager, and assigned to regions by the garbage collector as follows: (Note that when an object is 'moved' to another region, it is not physically moved; its region field in the in-use table is changed.)
  • Figure 21 shows how the in-use table 101, the object table 120, an object in the buffers 122, and the key file 124 and the prime file 126 of the database 40 are all related.
  • Given an object's id (oop) 128, the object manager hashes the oop to find the entry in the object table 120, and follows the pointer 134 to determine its location in memory (the buffers 122).
  • the object in the buffer 122 has a header portion which is used only by the object manager; it is not visible to the interpreter, and it does not get written to the database 40.
  • this header contains a pointer 130 back to the object's entry in the object table 120, and a pointer 132 to the object's first entry in the in-use table 101.
  • Figure 21 also shows how the object address in the key file 124 points to the location of the object in the prime file 126 of database 40.
  • If the object manager cannot find the object in the object table 120, it retrieves the object from the database. It hashes the object's id 128 to access the key file, which contains the actual disk address 140 in the prime file 126 in the database 40.
  • Case 2 shows a context in region n+1 returning to one in region n. Since n+1 is not two larger than n, no action is taken.
  • Case 3 shows a context in region n+2 returning to one in region n. Since n+2 is two larger than n, the collector 200 collects regions n+2 and n+1, and all other regions having a number greater than n.
  • a region is collected by following the chain of objects in the in-use table for the current process. Starting at the tail of the chain, entries are removed until an entry is reached belonging to the region of the context to which the process is returning. When an entry is removed, a check is made to see if it is the only entry in the table for that object (by checking the cross-process/by-object chain for the object).
  • If it is, the collector 52 goes to the object's header in memory and nulls out its pointer to the in-use table.
  • the pool manager 56 then knows that slot can be re-used. If the pool manager 56 decides to reuse the slot, it follows the back pointer to the object's entry in the object table, and deletes that entry.
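  • An in-use table entry and the collection of regions above the one being returned to might look like the following sketch; only the buffer pointer, process, and region are named in the text, and the chain fields and helper routines are assumptions.

    #include <stddef.h>

    struct in_use_entry {
        void                *buffer_ptr;       /* pointer to the object in memory */
        int                  process_id;
        int                  region_id;
        struct in_use_entry *next_for_object;  /* same object, other processes */
        struct in_use_entry *next_in_process;  /* same process, toward lower regions */
    };

    /* Hypothetical helpers used by the sketch. */
    extern int  only_entry_for_object(struct in_use_entry *e);  /* cross-process chain empty? */
    extern void clear_in_use_ptr_in_header(void *buffer_ptr);   /* lets the pool manager reuse the slot */
    extern void free_entry(struct in_use_entry *e);

    /* Walk the per-process chain from its tail (highest region) and remove entries
       until one belonging to the region being returned to is reached. */
    void collect_regions(struct in_use_entry **tail, int return_to_region)
    {
        struct in_use_entry *e = *tail;
        while (e != NULL && e->region_id > return_to_region) {
            struct in_use_entry *next = e->next_in_process;
            if (only_entry_for_object(e))                /* not shared by another process */
                clear_in_use_ptr_in_header(e->buffer_ptr);
            free_entry(e);
            e = next;
        }
        *tail = e;    /* first entry belonging to a surviving region */
    }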
  • When a region accumulates an excess number of objects, the region cleaner 202 detects this and the collector 200 "cleans" the region(s) involved. To detect that a region needs to be cleaned, the region cleaner 202 keeps track (by region) of the number of objects accumulated since the last "region cleaning". When this exceeds a certain maximum point (e.g., 150 objects), the region cleaner 202 invokes the collector 200 for the region involved. The number of objects in one of the regions is checked every time any new object is created. The region that is checked is the "next" one, which is that region in the same process that has a region number that is 1 higher than that of the region that was checked upon the previous object creation.
  • regions are ordered by process number, and th en region number widiin process.
  • the next to be checked will be region 0 of the process with a process number that is 1 higher than the previous one that was checked.
  • the "next" region to be checked is set to be region 0, widiin process 0.
  • the region cleaner is a procedure that looks at the region to be cleaned, and all regions with region numbers less than this, within the same process. All updated database objects, and all objects pointed to by the interpreter contexts (within the same process/regions) are marked via direct memory pointers (i.e. receivers and method objects). Then the transitive closure of all objects pointed to by marked objects is marked. However, any object tiiat is not in memory, or not in a region being cleaned, or that is neither a newly created nor an updated object is not marked. These restrictions limit the number of objects examined during the mark/sweep, and keep the mark/sweep entirely within main memory so that no disk accesses occur.
  • the interpreter 44 When the interpreter 44 establishes a new process, it knows which (non-global) objects from the spawning process are being shared with the new process. Upon creation of the new process, the interpreter asks the object manager 48 to place entries for each shared object in the in-use table, at the new process id. The object manager will create entries for the object and its transitive closure at the new process.
  • any object shared between 2 or more processes has entries for each process in the in-use table, and so do objects reachable from the shared object (children, etc). All entries for a single object, (used in multiple processes) are linked together, so it is easy to determine which processes share a given object.
  • the collector will not discard any object if it is in use by another process: when the region for a process is collected, all entries in the in-use table are removed for that process, but the object is not removed from the object table, nor is its space reclaimed, until there are no more processes sharing the object
  • the cross process checker 204 checks to see if the new parent (P) is in use at multiple processes. If it is, the child C (and its children, etc), have entries placed in the in-use table for whatever other processes also share the parent, that do not already share the child (etc). The child is placed in the same region that owns the parent (for the process). This logic is in addition to the region checking between parent and child mentioned above.
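  • A sketch of this cross-process check follows, again building on the earlier in-use table sketch; the helpers (including the one that adds entries for the child's transitive closure) are assumptions.

    /* Sketch of the cross process checker, using the in-use table sketch above;
       the helpers (including the transitive-closure step) are assumptions.      */
    extern InUseEntry *first_entry_for(long oop);        /* per-object chain head */
    extern int  has_entry_for(long oop, int process);
    extern void add_entry_closure(long oop, int process, int region);

    void cross_process_check(long parentOop, long childOop)
    {
        InUseEntry *pe;
        /* For every process that already shares the parent, make sure the child
           (and, via add_entry_closure, its children) has entries there too, in
           the same region that owns the parent for that process.                */
        for (pe = first_entry_for(parentOop); pe != NULL; pe = pe->nextForObject)
            if (!has_entry_for(childOop, pe->process))
                add_entry_closure(childOop, pe->process, pe->region);
    }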
  • Objects are not written to the database 40 unless they are reachable (at commit time) by some object in the database.
  • An off-line mark/sweep collector 206 is run periodically to remove objects from the database that have subsequently become unreachable. The same utility removes logically deleted objects and re-organizes the database for efficiency.
  • Certain objects set up by the compiler are outside the mark/sweep logic: these are mainly constants compiled into methods. These constants are not 'reachable' in the normal way; instead, they have a flag ("PERMANENT OBJECT") set that causes mark/sweep to treat them as already "marked".
  • Other examples of permanent objects are symbol objects for selectors established by the compiler or other symbol objects pointed to from the global dictionary. The only way to get rid of these is to completely rebuild the database. This is not a problem, if applications avoid putting data in the global dictionary, but instead use regular Smalltalk dictionaries (pointed to by the global dictionary).
  • the mark phase reads the database sequentially. It skips over any (already) marked objects, non-permanent objects, and logically deleted objects (the latter objects are explained in the object manager description above).
  • the remaining unmarked permanent objects are processed by:
  • Classes have their method dictionary entries placed in the kids table as well to insure that the method objects will be marked.
  • the mark phase processes all of the children in the "kids" table first (fetching these from the database, and if they are not already marked: marking them, putting their keys in the reorg file, and adding their children to the kids table). It can be seen that the records placed on the reorg file are in "children depth first" order, which will cluster parents and their immediate children together.
  • Each record is processed as follows: 1. Fetch the object indicated by the reorg record from the (old) database. If the fetched object is not a class object, place the id of the last object processed for the class of the object in the fetched object's class chain (this keeps a pointer chain between all objects of the same class). Store away the fetched object's id for use in updating the class chain of the next object, of the same class, that is processed during this phase.
  • 2. Write the object fetched in step 1 to the new database at the next available byte (i.e., the objects are packed together in the order encountered on the reorg file).
  • the mark/sweep collector 206 updates all of the classes with the first instance of that class (head of class chain), to anchor the class instance chain.
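  • The per-record sweep step described above might be sketched as follows; the DbObject layout and all helper routines are assumptions made only to illustrate the class-chain threading and the packing of objects in reorg-file order.

    /* Sketch of steps 1 and 2 for one reorg record; the DbObject layout and all
       helpers are assumptions.                                                  */
    typedef struct DbObject {
        long classOop;        /* class of the object                             */
        long classChain;      /* id of the previously written instance of the
                                 same class (0 if none has been written yet)     */
        int  isClass;         /* non-zero for class objects                      */
        /* ... remaining object data ...                                         */
    } DbObject;

    extern DbObject *old_db_fetch(long oop);
    extern long  last_instance_written(long classOop);     /* 0 if none yet      */
    extern void  note_instance_written(long classOop, long oop);
    extern void  new_db_append(const DbObject *obj);       /* pack at next byte  */

    void sweep_reorg_record(long oop)
    {
        DbObject *obj = old_db_fetch(oop);

        if (!obj->isClass) {
            /* Thread the chain that links all instances of the same class.      */
            obj->classChain = last_instance_written(obj->classOop);
            note_instance_written(obj->classOop, oop);
        }
        new_db_append(obj);   /* objects end up packed in reorg-file order        */
    }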
  • the sweep phase (like the mark phase) keeps various statistics and does integrity checking as it goes along, and reports them out at the end.
  • When region 0 of a process is collected, that process has ended, and all objects created by that process that are reachable from the database are written to the database by the garbage collector 52. To accomplish this, the collector will signal that a commit is in process, and then write out to the database all objects that remain in the process (all have been moved to region 0 by this time) and which cannot be garbage collected. Any object that is also shared by another process is not written out, since this will be taken care of when that other process terminates. Note that the shared object could be garbage collected between the time when one sharing process terminates and the time when the other sharing process terminates. Not writing the object out when the first process terminates results in fewer "garbage" objects being written to the database.
  • a commit routine flushes objects to the database that are reachable from database objects.
  • An abort routine invalidates all objects in the buffers which have been updated or created since the last commit. This forces subsequent accesses to these objects to be fetched from the database, and thus effectively "backs out" any changes since the last commit.
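  • In outline, the commit and abort routines behave as sketched below; the iterator over buffered objects and the predicates used here are assumptions, not the transaction manager's actual interfaces.

    #include <stddef.h>

    /* Sketch only: these declarations stand in for the buffer manager's and
       transaction manager's actual interfaces.                                  */
    typedef struct BufferedObject BufferedObject;
    extern BufferedObject *first_buffered(void);
    extern BufferedObject *next_buffered(BufferedObject *o);
    extern int  updated_and_reachable_from_database(BufferedObject *o);
    extern int  created_or_updated_since_commit(BufferedObject *o);
    extern void write_to_database(BufferedObject *o);
    extern void invalidate_buffer(BufferedObject *o);      /* forces a re-fetch   */

    void commit_transaction(void)
    {
        BufferedObject *o;
        for (o = first_buffered(); o != NULL; o = next_buffered(o))
            if (updated_and_reachable_from_database(o))
                write_to_database(o);
    }

    void abort_transaction(void)
    {
        BufferedObject *o, *next;
        for (o = first_buffered(); o != NULL; o = next) {
            next = next_buffered(o);
            if (created_or_updated_since_commit(o))
                invalidate_buffer(o);        /* backs out changes since commit    */
        }
    }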
  • ALF is the Alltalk Logic Facility, which gives the Smalltalk programmer logic programming capabilities, integrated in a natural way with the object-oriented programming paradigm.
  • ALF stands for both the programming language (which is an extension to Prolog), and the runtime logic used to maintain, compile, and execute ALF programs.
  • ALF is written entirely in Alltalk, and runs under the Alltalk system like any other application. Facilities are provided to compile logic programming statements, to group them into programs, and to submit logic queries against ALF programs. All of these features can be invoked from any Alltalk program, and answers to queries can be subsequently used in Alltalk programs. Since ALF is implemented in the Alltalk system, ALF also provides permanence for its objects, i.e., rules, facts and queries.
  • the ALF language is similar to the LOGIN language developed by Hassan Ait-Kaci and Roger Nasr, which in turn is an extension to Prolog.
  • ALF differs from LOGIN in some details of syntax, and in its integration with the Smalltalk language. Both ALF and LOGIN generalize unification by taking into account a lattice relationship among types, which in the case of ALF is the Smalltalk class hierarchy. Both ALF and LOGIN also generalize the syntax for terms to allow "attribute labels", which for ALF are taken to be identical to the names of Smalltalk instance variables.
  • ALF statements are made up of clauses, which have a head, followed by an arrow, followed by a tail.
  • the head is a single atom
  • the tail is a list of atoms separated by commas.
  • Clauses with both a head and a tail are called rules, those with only a head are called facts, and those with only a tail are called queries, as in standard Prolog terminology.
  • Atoms consist of predicate symbols with arguments (called terms). The terms are named (unlike Prolog) rather than being positional, and (again, unlike Prolog) can be typed. The type is indicated by the name of a Smalltalk class, and the type itself can be further qualified by giving additional term values for the type class (and these may again be typed, and so on, indefinitely).
  • Hearty, Healthy, and Less Than are all the names of (Smalltalk) subclasses of class Predicate.
  • Hearty and Healthy have at least one instance variable called thing. It may be that there are other instance variables in Hearty and/or Healthy but there is no way to tell from the rule's specification.
  • Less Than has at least two instance variables, called smaller and larger.
  • Person, which works like a Prolog functor, is merely some subclass of class Object. It has at least two instance variables called name and age. Anything followed by a colon is (the name of) a logic variable, so X: and Y: are both logic variable names.
  • the rule states that anything that is a person, is healthy, and whose age is less than 100 is also hearty. If we have an object in the Alltalk system of class Healthy whose instance variable thing has an assigned value that is of class Person, and if this Person object has an age that is smaller than 100, the ALF resolution mechanism, when applied against the above rule, will allow us to assert that the name of our child is also the name of a hearty person.
  • typing the logic variable Z allows the ALF unification rule to consider objects of subclasses of class Person (as well as objects of class Person itself) to unify with the thing object.
  • For example, we may have an object in the Alltalk system of class Healthy whose instance variable thing has an assigned value that is of class Child.
  • class Child also has instance variables of name and age, inherited from class Person.
  • Type qualification can be nested indefinitely. Thus we may have:
  • As a further integration of ALF and Smalltalk, we have established the following built-in predicates as subclasses of class Predicate: Send 0, Send 1, Send 2, ..., in order to send Smalltalk messages from ALF programs. These predicates take arguments receiver, answer, selector, and n additional arguments.
  • the receiver is the receiver of the message to be sent, the answer is the object returned from the message send, the selector is that of the message send (i.e., a Symbol representing the selector to accomplish the message send), and the remaining arguments, if any, are arguments to the message send itself.
  • the unification algorithm in these Send N predicates causes the indicated message to be sent.
  • the receiver must be bound, as must the selector.
  • the message is sent and the result is either bound to the answer or checked against it depending on whether the answer is free or bound in the goal being proved.
  • An instance of class Alf Program has an instance variable rule Dictionary, which contains lists of the clauses (rules and facts) belonging to the program, keyed by the head predicate. As in standard Prolog, the order within the lists is the order of assertion, and the ALF resolution mechanism respects this.
  • Other instance variables of Alf Program are author, date, comment, and name. Removal of a clause from a program's rule Dictionary provides a Prolog-like retract facility. Addition of a clause to a program gives a Prolog-like assert facility.
  • There is a class variable in Alf Program, called Pgm Dictionary, which registers all of the ALF programs in the system, keyed by the program's name. Queries in ALF are submitted against a specific Alf Program. Throughout execution of the query, the resolution mechanism looks first at the rule Dictionary for the program requested. If a rule with the appropriate head is not found there, it looks at the rules in the rule Dictionary for the program alf Built In. This is the way that programs can all share common rules (like the built-in predicates, and others, like the ubiquitous append).
  • ALF rules, facts, and queries are all represented in Alltalk as instances of class Clause. Included in the instance variables of class Clause are head and tail. If head is nil, we have a query. If tail is nil, we have a fact. Head must be of class Predicate, or a subclass thereof; tail is a Linked List, whose links must be of class Predicate, or a subclass thereof.
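  • Although Clause and Alf Program are Alltalk classes, their shape can be suggested in C; the field names follow the instance variables named above, and everything else in the sketch is an assumption.

    #include <stddef.h>

    /* Sketch only: head and tail follow the instance variables named above; the
       C representation (arrays, strings) is an assumption.                      */
    typedef struct Predicate Predicate;   /* an atom: a predicate with named,
                                             possibly typed, terms               */

    typedef struct Clause {
        Predicate  *head;                 /* NULL (nil) for a query              */
        Predicate **tail;                 /* NULL-terminated list of atoms;
                                             NULL for a fact                     */
    } Clause;

    typedef struct RuleList {             /* one rule Dictionary entry           */
        const char      *headClass;       /* class of the clauses' head atoms    */
        Clause         **clauses;         /* kept in order of assertion          */
        struct RuleList *next;
    } RuleList;

    typedef struct AlfProgram {
        const char *name;                 /* key into the class-wide PgmDictionary */
        const char *author, *date, *comment;
        RuleList   *ruleDictionary;
    } AlfProgram;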
  • An example of compiling a rule is given below. The compilation process merely consists of setting up the appropriate instance of class Clause, and assigning to the head and tail the appropriate objects. If the fields (instance variables) in the predicates are further specified, we set up instance objects of the appropriate class and initialize the predicates' instance variables to these objects. For any instance variable not specified (either in the predicate or elsewhere in the terms), we set up separate instances of class Logic Variable and initialize these unstated instance variables appropriately.
  • the programmer can write Smalltalk methods that dynamically construct clause objects and insert them as rules in an Alf Program (or, for that matter, dynamically construct new Alf Programs). More commonly, the rules can be submitted as strings (like those above) from the program development environment, interactively, by the programmer. The strings will then be compiled to the appropriate clauses and stored in the database, awaiting query submission.
  • the ALF compiler is described in a subsequent section.
  • Queries too can be submitted interactively as strings, compiled by the system, and the answers returned (as in standard Prolog systems). More commonly, the programmer can build up ALF queries from Smalltalk programs and submit them to existing Alf Programs without ever building a string representation of the query.
  • the idea is that some objects created by an application will have instance variables that are best calculated "procedurally", via normal Smalltalk, and others that are best calculated via the logic system. The application will first calculate the values of the "procedural" instance variables, and fill in the remaining ones with appropriate instances of class Logic Variable. The constructed object can now represent a term to the logic system. Next an instance of the appropriate Predicate will be created, and the term will be assigned into an instance variable of this predicate. Now we have a query.
  • the application will then submit the query to the appropriate Alf Program, and the values of the logic variables that are returned can be used to fill in the "non-procedural" instance variables of the original object replacing the previously assigned Logic Variable instances.
  • the fully instantiated object can then be used in subsequent application logic.
  • the ALF compiler is a combination of an ALF program and some Smalltalk programs.
  • Figure 23 shows an overview of the ALF compiler, which operates as follows: a new instance of the class Alf Compiler 210 is established to compile a rule. A message is sent from an Alltalk application program 212 to the new instance 210. The parameters in the message are the rule 214 and the name 216 of the ALF program that the rule is for (the rule is in the form of a string). The compiler instance 210 will set up a new instance of an Alf Lexer 218, and pass the rule 214 to be compiled to it. The instance of Alf Lexer 218 will turn the rule string into a list of tokens 220, passing this back to the Alf Compiler instance 210.
  • the compiler 210 will then set up a logic query 222 using the token list and establish an Alf Query instance 224 to process it.
  • the Alf Query instance 224 will process the query against a specific ALF program called #alf Parser 226. If the query is solved by the alf Query 224, the alf Compiler 210 will return this indication to the original program 212, after updating the ALF program 230 with the compiled rule 228.
  • the ALF program 230 is the one whose name 216 was specified by the application 212.
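  • The flow of Figure 23 can be condensed into the following sketch, building on the Clause and Alf Program sketch above; in the real system each step is a message send between Alltalk objects, so the C functions here are only stand-ins.

    /* Builds on the Clause and AlfProgram sketch above; each extern below stands
       in for a message send in the actual Alltalk implementation.               */
    typedef struct TokenList  TokenList;
    typedef struct ParseQuery ParseQuery;

    extern TokenList  *alf_lexer_scan(const char *ruleString);      /* 218 -> 220 */
    extern ParseQuery *make_parse_query(TokenList *tokens);         /* query 222  */
    extern Clause     *alf_query_solve(ParseQuery *q, const char *parserProgram);
    extern AlfProgram *lookup_program(const char *name);        /* PgmDictionary  */
    extern void        alf_program_add_clause(AlfProgram *p, Clause *c);

    /* Returns non-zero when the rule parsed and was added to the named program. */
    int alf_compile(const char *ruleString, const char *programName)
    {
        TokenList *tokens = alf_lexer_scan(ruleString);
        Clause    *clause = alf_query_solve(make_parse_query(tokens), "alfParser");

        if (clause == NULL)
            return 0;                        /* the parse query was not solved    */

        alf_program_add_clause(lookup_program(programName), clause);    /* 230    */
        return 1;
    }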
  • the string submitted for compilation by the user is passed to an instance of class Alf Compiler via the message
  • alf Compile: a String for Pgm: a Pgm Name comment: a Comment, which includes the string to be compiled, the name of the ALF program that is to include the string as a new clause, and a user comment to document the new clause.
  • the compiler passes this string to an instance of Alf Lexer (via the alf Scan message described below), which returns a list of tokens, an instance of class Alf List.
  • This Alf List instance that is returned behaves just like a Prolog list, and contains instances of class Alf Token.
  • An Alf Token has, as instance variables, a type (for the parser to identify the kind of token) and a value (to be used in the code generation process).
  • the compiler passes the token list as a query to an ALF program (called #alf Parser) 226, which will parse the token list and construct the clause object (as in the above example).
  • the clause object is returned bound to one of the variables in the logic query.
  • the logic query constructed looks like:
  • the lexical analyzer 218 routines are all in class Alf Lexer.
  • the primary message is
  • alf Scan: a String.
  • a String is the string to be scanned.
  • the Alf Lexer is organized as a finite state machine, and looks one character ahead to determine the next state to assume.
  • the lexer removes all white space from the input string (blanks, tabs, new lines), as well as any ALF comments (which are designated by including text in single quote marks).
  • the lexer assumes a new state based upon the state it is in and the look ahead character (i.e. the next character in the input string). Before switching to state 0, we will have consumed a lexeme and be ready to output a token. This logic is handled via the "accept" methods, which are:
  • accept LP output token type and associated value is "(".
  • accept CUT output token type is #CUT and associated value is a new instance object of class Alf Cut.
  • the Alf Cut objects denote a Prolog type cut and are represented by a '!' in the input string.
  • accept LB output token type is "[". This represents the start of an Alf List (which is like a Prolog list). The associated value is an Alf Empty List if the look ahead character is an "]". Otherwise, the value is a new instance of Alf List.
  • accept BAR output token type and associated value is "|".
  • the "|" indicates the beginning of the tail of an Alf List, as in standard Prolog.
  • output token type is #CONSTANT and associated value is an instance of either class Integer or Float, depending on whether the input string had no decimal specified, or had an explicit one specified.
  • accept Identifier the lexer looks the identifier up in a symbol dictionary, which is maintained by the Alf Lexer. If the identifier is in the dictionary, the associated token is used as the output. If it is not in the symbol dictionary, it is added and a token is associated as follows:
  • a token is set up with type #predicate Name or #class Name depending on whether the string is the name of a class that does not or does have class Predicate in the superclass chain.
  • output token type is #CONSTANT, and associated value is the instance of class Symbol that is the same as the input.
  • output token type is #CONSTANT, and associated value is the instance nil, true, or else the Class object, that is represented by the input string.
  • this is an ALF program and consists of clauses that parse the Alf List passed by the Alf Lexer, and build up the objects that represent the clause.
  • the objects necessary have already been constructed as the values of the various Alf Tokens in the Alf List passed by the Alf Lexer. Modification of these objects is accomplished in the parser by using the builtin predicates: Alf Send 0, Alf Send1, Alf Send2, and Alf Send3. These predicates cause message sends to occur that will modify the objects in the Alf Token values.
  • the clause object that is passed back from the parser to the compiler is then sent to the Alf Program specified in the original compilation message.
  • the message to update the ALF program is:
  • Class Alf Program contains the necessary methods to update an Alf Program with a new clause, and to delete old clauses.
  • Each Alf Program includes the following instance variables: clause Lists, which is the list of all clauses belonging to the program, and a rule Dictionary, which contains lists of clauses in the Alf Program, keyed by the class of the head atom of the rule.
  • each element in this rule Dictionary is a sub-list of clauses contained within the program, all of whose heads belong to the same class (this class being the key to the dictionary).
  • If no rule list is found in rule Dictionary, look in the ALF program #Alf BuiltIn for built-in rules that will unify. If found, update the link's rule Array accordingly.
  • The main logic for solving logic queries is contained in class Alf Query.
  • This class includes the following instance variables (their class is indicated inside < >): 1. query Clause <Clause>, the clause to prove.
  • 2. choice Point Stack <Stack> of choice Points. This acts like a stack in that the last choice point discovered is first on the list. When this is empty, there are no more choice points that can be taken, and thus there are no more answers to the query.
  • 3. goal Stack <Goal Stack> of Goal Stack Links. This represents the current set of goals to prove. All must be solved in order to answer the query. If the Alf Query fails to prove a goal, or if the goal Stack becomes empty, the next choice point is executed in order to obtain another answer to the query.
  • 4. trail <Trail>, the trail of bindings to undo at the various choice points.
  • the Alf Query keeps track of the old values of logic variables in this trail stack. Undoing these unifications restores the state of query processing to a point where the next choice Point can be executed.
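  • The four instance variables listed above can be pictured with a C sketch, building on the earlier Clause sketch; the stack representations and auxiliary types are assumptions.

    /* Builds on the Clause sketch above; the stack representations and the Goal
       and RuleLink types are assumptions.                                       */
    typedef struct LogicVariable LogicVariable;
    typedef struct Goal          Goal;      /* one atom still to be proved        */
    typedef struct RuleLink      RuleLink;  /* link in a list of candidate rules  */

    typedef struct Binding {                /* one record on the trail            */
        LogicVariable *variable;
        void          *oldBinding;
    } Binding;

    typedef struct ChoicePoint {
        RuleLink *nextRuleToTry;            /* alternative rule to resume with    */
        int       goalStackMark;            /* goal stack height to restore       */
        int       trailMark;                /* bindings above this are undone     */
    } ChoicePoint;

    typedef struct AlfQuery {
        Clause      *queryClause;           /* 1. the clause to prove             */
        ChoicePoint *choicePointStack;      /* 2. last discovered is on top       */
        int          choicePointTop;
        Goal       **goalStack;             /* 3. goals that must all be solved   */
        int          goalTop;
        Binding     *trail;                 /* 4. bindings to undo on backtracking*/
        int          trailTop;
    } AlfQuery;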
  • the application programmer will normally set up a new Alf Query by sending the message
  • the method next Answer checks the choice Point Stack, and if this is empty returns #fail, since there are no more answers. Otherwise, it sets up the system to process the first choice Point on the stack. To do this, it backs out all of the bindings of logic variables that were made subsequent to the establishment of the choice Point. These bindings are all kept on the stack called trail, and each choice Point points into this trail stack. Alf Query undoes the bindings required by processing those on the trail that follow the choice Point's trail pointer. As in standard Prolog, these choice points represent alternative paths to take in the resolution logic for solving the query. They are placed on the choice Point Stack as they are encountered.
  • a single choice Point object represents all alternatives for proving a given goal.
  • Each choice Point contains a next Rule To Try, which is a link in the Linked List of rules that match the first atom in the goal stack for the choice Point. If this next Rule To Try is nil, Alf Query removes the choice Point and recalculates the freeze point of the current goal stack. If the next Rule To Try is not nil, Alf Query restores the goal Stack to be that which was saved in the choice Point and sends the following message to the query:
  • next Choice Point is the current one on the choice Point Stack. The method solve Choice continues until another answer is found, or there are no more answers for the current choice Point. The method next Answer continues the processing for the next choice Point on the stack.
  • the initialization logic for the query will have established the first choice Point (which is the query itself), and found the first rule To Try by looking in the rule Dictionary of the program submitted with the query.
  • the method solve Choice loops for so long as it can prove atoms on the goal stack, until it cannot prove one, or the goal stack is empty.
  • the latter condition constitutes successful binding of the query variables; the former results in returning #fail, as no more answers exist (for the current choice Point). If a choice Point results in failure, this method will not remove the choice Point from the stack, but returns to method next Answer to try the next one.
  • the method solve Choice initializes some temporary variables, rule To Try and atom To Prove, with the first being set to a link in a list of rules that is in the next Rule To Try variable of the current choice Point, and the latter (atom To Prove) being the first goal on the current goal Stack.
  • trailing trail from Query: self.
  • This message is sent to the current atom To Prove.
  • the unification algorithm is discussed in section 7.9 below. If the message returns #fail, indicating unsuccessful unification, the rule To Try is obtained by following the current one (remember this is a linked list of rules whose heads are of the same class as the current atom on the goal stack). Unification is then attempted again, continuing until unification is achieved, or there are no more rules to try. The latter case causes #fail to be returned to the calling method (next Answer), which obtains the next choice Point and tries again.
  • the method sets up another choice Point for this atom, and places it on the choice Point Stack. Backtracking will then allow the method to resume execution, trying the alternative rule.
  • There are potentially other ways to prove the current atom To Prove if the rule To Try points to a non-nil next link. This means that there are additional rules whose heads could potentially unify with the atom To Prove.
  • the goal that has been unified with the rule head can now be removed from the goal Stack. If the rule that the method is using has a tail, it pushes all of the atoms in the tail onto the goal Stack: they represent new goals that must be proved. Next, the method examines the goal Stack, and if it is empty, it returns, since the query has been proven. At this point, the environment of the query will have all of its logic variables bound to the answer.
  • Branching back to the top of the main loop will then attempt unification of the new atom To Prove with the new rule pointed to by the rule To Try, and the method continues proving goals until it fails on an atom, or runs out of them.
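  • The control flow of next Answer and solve Choice described above is compressed into the following sketch; it omits the freeze-point bookkeeping and the exact message protocol, and every helper routine is an assumption.

    /* Condensed sketch of the resolution loop; all helpers are assumptions.     */
    extern int          choice_points_empty(AlfQuery *q);
    extern ChoicePoint  pop_choice_point(AlfQuery *q);
    extern void         save_choice_point(AlfQuery *q, RuleLink *alternatives);
    extern void         undo_trail_to(AlfQuery *q, int trailMark);
    extern void         restore_goal_stack(AlfQuery *q, int goalStackMark);
    extern int          goal_stack_empty(AlfQuery *q);
    extern Goal        *top_goal(AlfQuery *q);
    extern void         pop_goal(AlfQuery *q);
    extern void         push_goals(AlfQuery *q, Clause *ruleBody);   /* push tail */
    extern RuleLink    *lookup_rules(AlfQuery *q, Goal *g);  /* rule Dictionary,
                                                                then alf Built In */
    extern RuleLink    *next_rule(RuleLink *r);
    extern Clause      *rule_clause(RuleLink *r);
    extern int          unify_head(Goal *g, RuleLink *r, AlfQuery *q);

    static int solve_goals(AlfQuery *q, RuleLink *rule)
    {
        while (!goal_stack_empty(q)) {
            Goal *goal = top_goal(q);

            while (rule != NULL && !unify_head(goal, rule, q))
                rule = next_rule(rule);         /* try the next candidate head    */
            if (rule == NULL)
                return 0;                       /* no head unifies: fail          */

            if (next_rule(rule) != NULL)        /* alternatives remain: remember  */
                save_choice_point(q, next_rule(rule));

            pop_goal(q);                        /* the rule head proved this goal */
            push_goals(q, rule_clause(rule));   /* its tail becomes new goals     */
            rule = goal_stack_empty(q) ? NULL : lookup_rules(q, top_goal(q));
        }
        return 1;                               /* goal stack empty: an answer    */
    }

    int next_answer(AlfQuery *q)
    {
        while (!choice_points_empty(q)) {
            ChoicePoint cp = pop_choice_point(q);
            undo_trail_to(q, cp.trailMark);     /* back out the newer bindings    */
            restore_goal_stack(q, cp.goalStackMark);
            if (cp.nextRuleToTry != NULL && solve_goals(q, cp.nextRuleToTry))
                return 1;                       /* query variables are now bound  */
        }
        return 0;                               /* #fail: no more answers         */
    }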
  • Class Alf Query has debugging features that can be turned on or off (by sending messages to the query). Included are:
  • Tracing There are multiple levels of tracing. It is possible to display the following:
  • the basic unit of compilation in ALF is the clause.
  • This class includes the instance variables head, tail, copy Env, save LV.
  • the purpose of this method is to construct the copy Env for use during the resolution process.
  • This environment is copied to obtain a new set of Logic Variables for every execution of the clause.
  • the idea of an environment is to obtain a new set of logic variables that can be pointed to by those in the clause itself, and which are bound and unbound during unification.
  • the logic variables in the clause itself hold the index into the environment array. Thus during unification of an atom in the clause the logic variables that actually occur in the clause are not considered, but rather those that are pointed to (in the environment) by the index in the logic variables.
  • Class Object contains the default unification algorithm. The algorithm checks to see if the second object (a parameter in the unification message) is of class Logic Variable, and if so, will resend the unification message to the second object, rather than using the default algorithm. Two objects will unify using this algorithm if they are of the same class, and each instance variable in the two objects unify. If an object has no instance variables, unification is achieved if the objects are equal.
  • the default algorithm is overridden in subclasses of class Predicate, where required, in order to implement the built-in predicates.
  • the Alf Fail predicate always returns #fail as the answer to the unification message.
  • binding this can be of any class, and is the object that the logic variable has been bound to.
  • environIndex nil means this Logic Variable is not resolved through the environment array, but points directly to its binding. not Nil means that the Logic Variable must be resolved through the logic variable in the environIndex, in the environment. This is the case when the logic variable is in a rule, and copies of the logic variables are used to do unification (one environment set up for each invocation of the rule).
  • binders Env <Array> If a logic variable points to a term that itself has logic variables in it, this is used to resolve those logic variables. This is needed to trace back the variables in the original query, and when a logic variable in one rule is bound to a term that contains a logic variable from another rule. For environment logic Variables (i.e. those with environIndex not nil), it is assured that the binders Env is always nil (that is the way they were set up when the clause was created).
  • Z binds Y to Z which is not an environment variable (Z is a member of the environment array, and has environIndex set to nil).
  • Z could itself be a term that contains environment logic variables, resolved by a different environment: the binders Env which will be found in Y.
  • logic variables are dereferenced either through their environIndex (first priority), or through their binding. If they are bound, the environment to use for what they are bound to is found in the logic variable's binders Env; the latter has meaning only for bound logic variables, and is otherwise nil.
  • Term is a Logic Variable
  • self is a local logic variable
  • a Term is unbound
  • the appropriate self's environment slot is set to the a Term, and success is returned.
  • Term's environment slot is updated with self, and success is returned.
  • the general purpose algorithm that checks to see if a Term is bound to an object that is a subclass of the class of the object that self is bound to, or vice-versa, must be invoked. If either is the case, the appropriate logic variable is placed on the trail, and it is re-bound to the binding of the other, provided all instance variables of the two bindings unify. If any instance variable fails to unify, failure is returned. Otherwise, success is returned. If both a Term and self are bound, and the objects they are bound to are not type compatible in the sense above, failure is returned from the unification.
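  • The unification cases just described can be summarized in a sketch that builds on the earlier query sketch; dereferencing through environIndex and binders Env is hidden behind an assumed dereference() helper, and all other names are assumptions as well.

    /* Completes the LogicVariable declared in the query sketch above; all helper
       routines are assumptions.                                                 */
    struct LogicVariable {
        void  *binding;        /* bound object, or NULL when free                */
        int    environIndex;   /* -1: bound directly; otherwise resolved through
                                  the environment copied for this invocation     */
        void **bindersEnv;     /* environment for the term this variable is bound
                                  to (meaningful only for bound variables)       */
    };

    extern LogicVariable *dereference(LogicVariable *v);  /* follow environIndex,
                                                              then binding        */
    extern void record_on_trail(AlfQuery *q, LogicVariable *v);
    extern int  type_compatible(void *a, void *b);   /* same class, or one is a
                                                        subclass of the other    */
    extern int  instance_variables_unify(void *a, void *b, AlfQuery *q);

    int unify_variables(LogicVariable *self, LogicVariable *aTerm, AlfQuery *q)
    {
        self  = dereference(self);
        aTerm = dereference(aTerm);

        if (aTerm->binding == NULL) {            /* aTerm is free: bind it        */
            record_on_trail(q, aTerm);
            aTerm->binding = self->binding ? self->binding : self;
            return 1;
        }
        if (self->binding == NULL) {             /* self is free: bind it         */
            record_on_trail(q, self);
            self->binding = aTerm->binding;
            return 1;
        }
        /* Both bound: succeed only if the two bindings are type compatible and
           all of their instance variables unify.                                */
        return type_compatible(self->binding, aTerm->binding)
            && instance_variables_unify(self->binding, aTerm->binding, q);
    }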

Abstract

A programming tool is provided which integrates an object-oriented programming language system, a logic programming language system, and a database in such a manner that logic terms can be treated as objects in the object-oriented programming language system, objects can be treated as logic terms in the logic programming language system, and logic terms and objects are stored in the database in a common data structure format. Automatic management of the database is provided which is transparent to the user.

Description

Object-Oriented, Logic, and Database Programming Tool
Technical Field of the Invention
The invention relates to a programming tool that allows application programming in both logic and object-oriented style, and which provides integrated database support.
Background of the Invention
Object-oriented programming, logic programming, and database facilities have all been shown to have significant power in the writing of applications to run on a computer. No single programming tool has successfully integrated all three facilities in such a way as to eliminate an explicit interface between them. Normally, one must convert between object data to logic data to use the logic programming system, and then convert the logic data back again in order to use the object-oriented system. Furthermore, one must normally make explicit calls to a database manager in order to retrieve and store application data.
There have been some attempts to provide combined logic and object-oriented programming tools. For example, the Smalltalk/V system (Smalltalk/V Tutorial and Programming Handbook, Digitalk, Inc., 1987) allows the user to invoke a logic programming tool (Prolog) from an object-oriented one (Smalltalk). However, the only kind of data (terms) that Prolog understands are strings, symbols, numbers, structures, and lists of any of the above. Furthermore, the Prolog structures are constrained to be a type of list from the object-oriented programming tool. Additionally, Smalltalk/V does not have database storage for the objects.
There have also been attempts to provide database support for object-oriented tools. For example, the Gemstone system, a product of Servio-Logic, Inc., while supporting a database server that can be programmed in Smalltalk, does not allow the application to be written in Smalltalk in such a way that the database server is transparent: i.e. the application must make specific calls to the database server ('Integrating an Object Server with Other Worlds', by Alan Purdy et al, ACM Transactions on Office Information, Vol. 5, Number 1, Jan. 1987). Gemstone does not contain any logic programming tools.
Some so-called "expert system shells" (e.g., Nexpert Object from Neuron Data, Inc.) allow for objects, rules and database features to be combined, but these tools are for the construction of a certain class of application ("expert systems"), and do not provide a general-purpose programming tool.
It is the object of the present invention to solve the problem of providing a general purpose programming tool that smoothly integrates object-oriented and logic programming, and provides the user with database facilities that are transparent to the user.
Summary of the Invention
The present invention solves the problem by providing a single programming tool (referred to herein as Alltalk) which allows the programmer to write applications in an object-oriented language (a dialect of Smalltalk, also referred to herein as Alltalk), a logic programming language, (which is an extension of Prolog, herein called ALF) or a combination of the object and logic programming languages which allows the logic programming language system to consider any object from the object-oriented programming language system as a term in the logic programming language, and which supplies database management on behalf of the programmer, without the need for any specific database management control statements to be supplied by the programmer.
The main components of the Alltalk tool include a work station having an operator interface, a mass memory, and a CPU. An object-oriented programming language system running on the work station includes an object-oriented programming language and an object-oriented language compiler for translating source code written in the object-oriented programming language into objects and interpreter code. Also running on the work station is a logic programming system including a logic programming language having components of terms, clauses, predicates, atoms, and logic variables, and a logic language compiler for translating source code written in the logic programming language into objects. A database residing in the mass memory stores objects and components of logic programs as objects in a common data structure format, applications data, and applications stored as compiled interpreter code. The database is managed by a database manager that represents objects and components of the logic programming language in the common data structure format as objects and is responsive to calls for retrieving and storing objects in the database and for automatically deleting objects from the database when they have become obsolete. An interpreter executes the interpreter code and generates calls to the database manager. A logic subsystem solves logic queries and treats objects as components of a logic program.
According to a further aspect of the present invention, an improved database format is provided for an object-oriented programming language system. The database has a key file and a prime file. The prime file contains records of variable length for storing objects, and the key file contains records of fixed length for storing the address, record length, and type of object in the prime file. An improved database manager for managing this database includes an object manager employed by the compiler, interpreter, primitives and utilities for providing access to objects in the database, and for maintaining organization of objects in the database. An access manager is called by a buffer manager for retrieving objects from the database, a transaction manager for updating the database with new or changed objects at commit points, and for undoing changes to objects upon aborts, and the object manager for providing high level interface to the database. A buffer manager is called by the object manager for generating calls to the access manager, and by a pool manager for keeping an in-memory copy of objects. The pool manager maintains memory for buffers.
According to another aspect of the present invention, an improved garbage collector is provided for a heap based programming language system. The garbage collector employs the concept of regions for garbage collection. When a context (representing the state of a method which is executing in the system) is created, it is assigned a region number. When an object is created or accessed by a method, it is assigned the region number of the context of the method that created or accessed it, unless the object was previously assigned a lower number. When an object is returned from a called method to the calling method, the object is moved to the region of the calling method. When reference is made from a first object to a second object assigned to another region, the second object is moved to the region of the first object. When returning from a method, if the context to which it is returning belongs to a region with a number at least two lower than the current region number before returning, the regions with numbers higher than that of the context to which it is returning are collected (i.e., the objects in these regions are discarded).
According to a still further aspect of the present invention, the runtime performance of a Smalltalk programming language system is improved by implementing a technique called message flattening. The compiler flags any method which consists of a single return statement which returns either an instance variable, or the result of a primitive for which the first argument is self and the other arguments correspond to arguments to the method. The interpreter detects these flags at runtime and flattens any message that would normally invoke these methods, by replacing the message send in the first instance with an assign, and in the second instance with a primitive invocation.
Brief Description of the Drawings
Figure 1 is a schematic diagram showing an overview of the invention;
Figure 2 is a schematic diagram of the compiler;
Figure 3 is a schematic diagram of the runtime environment;
Figure 4 is a schematic diagram showing initialized stacks for contexts and block stubs;
Figure 5 is a schematic diagram showing the creation of a new context;
Figure 6 is a schematic diagram showing creation of a block;
Figure 7 is a schematic diagram showing creation of a second block;
Figure 8 is a schematic diagram showing the creation of a new context;
Figure 9 is a schematic diagram showing a block evaluation;
Figures 10-13 illustrate the modes of block execution;
Figure 14 shows the creation of a process;
Figures 15-16 illustrate process management;
Figures 17-18 show the relationships between the context stack, processes, regions, and objects;
Figure 19 is a schematic block diagram illustrating the functions of the garbage collector;
Figure 20 shows the in-use table's structure and internal relationships;
Figure 21 shows the in-use table's relationships with the object table, the buffers, and the database;
Figure 22 shows how garbage is collected upon a method return; and
Figure 23 is a schematic block diagram illustrating the functions of the ALF compiler.
Description of the Invention
A portion of the disclosure of this patent document contains material to which a claim of copyright protection is made. The copyright owner has no objection to the copying of the patent document or the patent disclosure, but reserves all other rights.
1 Introduction
The Alltalk tool runs on workstation type hardware, such as a Sun 4/360 by Sun Microsystems, Inc., executing the UNIX operating system (UNIX is a trademark of AT&T). Referring to the Drawings, Figure 1, the hardware includes an operator interface including a visual display (CRT) 10, a keyboard 12, and a pointing device 14, such as a 3 button mouse. The hardware also includes mass memory, such as a disk 16 on which the Alltalk database resides, as well as a CPU and main memory 18. The Alltalk software which is executed by the CPU and main memory 18 consists of an Alltalk compiler 20 for a dialect of the Smalltalk language (also called Alltalk) and an Alltalk runtime environment 22. The hardware components of the workstation are connected by a bus 24.
2 Overview
The Alltalk compiler 20 is a program for translating Alltalk language source statements into interpreter code. The compiler is generated by the YACC and LEX utilities in the UNIX operating system, and contains subroutines written in the C programming language.
Referring to the Drawings, Figure 2, the compiler operates in 2 phases: the first phase 26 parses the source code written in the Alltalk language 28 and constructs an intermediate code 30. The second phase 32 takes the intermediate code and generates class objects, constant objects, and method objects and places these in a database 40. These objects are subsequently retrieved by the runtime environment 22 (see Fig. 1).
The runtime environment 22 is written in the C programming language, and in Alltalk. Referring to the Drawings, Figure 3, the logic language compiler 36 and the logic subsystem 38 are both written in Alltalk. These are compiled through the previously mentioned Alltalk compiler 20, and the output placed in the database 40 and hence available to the runtime environment 22. Other applications 42 written in Alltalk are similarly available to the runtime environment after compilation. Application programs 42 (called methods) are processed by an interpreter 44, which calls other components of the runtime environment, which includes: a transaction manager 46 which can commit and abort transactions, an object manager 48 which is called to create and retrieve objects, a method fetcher 50 which determines the correct method to execute next, and a garbage collector 52 which detects and removes unneeded objects from main memory. The object manager 48 calls upon a buffer manager 54 to determine if a requested object is in memory or needs to be fetched from the database. If the object is to be retrieved from the database, a pool manager 56 is called to find space in an appropriate buffer, after which an access manager 58 is called. It is the access manager 58 that accesses the disk 16 containing the database 40.
3 Compiler
The Alltalk compiler 20 translates class descriptions written in a dialect of the Smalltalk language herein referred to as Alltalk into database objects for use by the Alltalk interpreter 44 during execution.
3.1 Synopsis
The Alltalk compiler 20 takes a file containing one or more complete Alltalk class descriptions, and for each class generates:
1. A class object, containing a dictionary of the methods in the class and specification of the instance and class variables,
2. Compiled methods, each consisting of "bytecodes", which drive the runtime interpreter, and
3. Objects representing constants encountered during compilation, (numeric values, strings, etc.) which are placed in the database 40 for use by the interpreter 44 during execution.
The Alltalk compiler 20 consists of two phases. The first phase 26 (see Fig. 2) does the compilation work, (parse, optimization, and code generation), while the second phase 32 resolves global symbols and loads the results into the database. The two phases communicate via intermediate code 30 (written in an assembler-like intermediate language) which can be examined and altered by the user, if desired.
The following is a description of the organization of the internals of the Alltalk compiler 20, including code generation strategies and optimization techniques.
3.2 Phase 1 (kcom)
3.2.1 Parsing
The first phase 26 of the Alltalk compiler 20 consists of two distinct processing stages:
1. Parse tree construction, and
2. Code generation, (including optimization).
The parsing phase is implemented in a fairly straightforward manner using the UNIX yacc/lex parser generator/lexical analyzer tools. The primary goal of the parsing stage is to create an internal parse tree representation of the class description and its methods which can be analyzed using a relatively simple set of mutually recursive tree-walking routines. In addition, the grammar of the input file is checked and errors are reported to the user.
The grammar specification of the object-oriented language is virtually identical to that specified in the syntax diagrams of the standard Smalltalk language reference, Smalltalk-80: The Language and Its Implementation, by Goldberg and Robson. The most notable variation in the Alltalk grammar is that of allowing a primitive invocation to be used as a primary expression and to have primary expressions as arguments, (this is adopted from Little Smalltalk, by Timothy Budd). This allows Alltalk primitives to be intermixed freely with the Alltalk language as if they were function calls which return a value, (which is essentially what the primitives really are), instead of as wholesale replacements for methods, as in standard Smalltalk.
Additional productions have been included to allow for reading an entire class description from a file, (in a form roughly similar to Smalltalk "fileln/file Out" format). These additional productions include "header" information such as superclass specification, instance/class variable declarations, and instance/class method classification statements.
While it is possible to build the entire analysis and generation mechanisms directly into the action portions of the yacc productions, the conciseness of the analysis and generation stage would be lost in that it becomes difficult to piece together how the parser actions interact to accomplish that stage when the controlling function is the yacc parser. Clarity is enhanced by having the analysis and generation functions make explicit their own walking of the parsed information, since it may vary from that of the parser at various points in the compilation. For example, more complex/global optimization techniques, such as inter-statement optimizations, may need to determine their own scope of applicability across several statements worth of parsed information. Such techniques are harder to embody as a single understandable function when mixed with the simple actions of parsing.
The basic parse node is a simple binary node, (left and right child pointers), with placeholders for the node type constant, a source code line number, and a string pointer.
Parse nodes are created via a function called makenode(), which allocates storage for the node, inserts the current source line number, and sets the other elements as specified by the user. The storage allocated for these nodes, (as well as for the class and method structures and copies of strings), is not tracked in the Alltalk compiler 20 since the compiler is expected to be run only for the duration of the compilation of a file.
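The parse node and makenode() just described might look roughly like the following sketch; the exact field names used by the compiler are not given, so those chosen here are assumptions.

    #include <stdlib.h>
    #include <string.h>

    /* Sketch of the binary parse node and makenode(); field names are assumed. */
    typedef struct ParseNode {
        int   type;                     /* node type constant                    */
        int   line;                     /* source code line number               */
        char *text;                     /* string pointer (identifier, literal)  */
        struct ParseNode *left, *right; /* child pointers                        */
    } ParseNode;

    extern int current_source_line;     /* assumed to be maintained by the lexer */

    ParseNode *makenode(int type, const char *text,
                        ParseNode *left, ParseNode *right)
    {
        /* Storage is simply allocated and never freed: the compiler runs only
           for the duration of one file's compilation.                           */
        ParseNode *n = malloc(sizeof *n);
        n->type  = type;
        n->line  = current_source_line;
        n->text  = text ? strdup(text) : NULL;
        n->left  = left;
        n->right = right;
        return n;
    }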
A sample parse tree for an Alltalk method is given in Table 3.1. Syntactical shorthand and default meanings, such as the return value of a method having no statements being "self", or a block having no statements being "nil", are fleshed out during the parse phase in order to limit the amount of special case logic in the analysis and generation phase.
A successful parse generates a parse tree for the statements of each method in the class. These parse trees are anchored in a method structure for each method, which are all, in turn, linked to a single class structure. When the parsing of a class is complete, the class structure is handed to analysis and code generation routines.
3.2.2 Code Generation
The major components of this stage of the compiler are:
1. Symbol table (symbol. c)
2. Code generation (compiles)
3. Code management (codex)
4. Optimization (optimize. c)
Generally, the processing steps involved in this stage, (implemented by a function called compile Class()), proceed as follows:
1. The symbol table is initialized with the instance and class variables available via the superclass chain for the class. These symbols are retrieved from a symbol file in the local directory. It is considered a fatal error if the superclass cannot be found in the symbol file, (i.e., the superclass must be compiled first).
2. The instance and class variables for the class are added to the symbol table. Name clashes involving superclass variables are also considered fatal.
3. Each method is compiled. During method compilation, bytecodes are collected into segments corresponding to groups of statements in the method: one for the method itself, and one for each block within the method. When method compilation is complete, the method code segment is emitted first, followed by the segments for each block.
4. After the methods have been successfully compiled, a record of the class' instance and class variables is written to the symbol file in the local directory. This makes the class available for use as a superclass in subsequent compilations.
The method compilation step is the heart of the compilation task. Before describing this step in detail, a description of the compiler's view of symbolic references and the symbol table is given.
3.2.3 Symbols
Throughout the Alltalk compiler 20, references to named symbols in the program being compiled, as well as references to unnamed runtime storage are represented in a uniform manner. This uniform representation allows the code generation stage to freely create and pass references between the recursive routines which implement this stage without regard for their type until a leaf routine which needs detailed type information is executed. The conciseness of the code generation routines is greatly enhanced with this representation scheme.
There are nine reference types, as follows:
Named References
Instance Variable
Class Variable
Method Parameter
Formal Method Temporary
Block Parameter
Global Symbol ("true", "false", "nil", class name, etc.)
Unnamed References
Constant ("10", "3.14", 'a string', #symbol, etc.)
Compiler Temporary (used in evaluating intermediate expressions)
Block Stub/Closure (reference to storage holding the runtime id of the closure)
The symbol table supports a subset of the named references, separating them into the three categories of: 1) class, 2) instance, and 3) temporary symbols. Temporary symbols encompass the method parameter, formal method temporary, and block parameter references. Global symbol references are never actually placed in the symbol table, but are materialized whenever the search for a name fails. These symbols are resolved by the second phase 32 of the Alltalk compiler 20, since the cross reference values for these names are actually present in the runtime system dictionary contained in the database 40.
The symbol table interface routines contain the usual routines for the addition of symbols, (add Symbol()), and name-based search for symbols, (find Symbol()). An initialization routine, (init Symbols()), purges the table and then uses the globally specified superclass name to populate the table with "ref" structures for the instance and class variable symbols available via the superclass chain, as recorded in the symbol file in the local directory. A routine for writing the instance and class variable symbols, (write Symbols()), to the local symbol file for the globally specified class, (i.e., the one being compiled), is also provided. Finally, a pair of general routines, (mark Symbols() and release Symbols()), are available for get/set of placeholders in the symbol table. These are primarily used to record the starting position of method and/or block temporary symbols, so that they can be removed at the end of the compilation of the method and/or block statements.
3.2.4 Method Compilation
In the runtime environment 22 (see Fig. 3), a method is executed with an associated "context" containing local storage organized as an array of temporary slots, analogous to a "stack frame" in a conventional language. This local storage is divided into the following five sections from the compiler's point of view:
1. The object id of the receiver, known as "self.
2. Method parameters.
3. Formal method temporaries, (named temporaries).
4. Compiler scratch area for intermediate expression evaluation.
5. Block stub/closure id storage for blocks in the method.
A general mechanism for tracking the use of the temporary slots is implemented in the compiler using a set of macro routines. This set includes routines for: allocating a number of temporaries, (alloc Temp()), which returns the starting slot for the requested count; freeing a number of temporaries, (free Temp()); get/set of temporary usage information, (get Temp Use() and set Temp Use()); clearing usage information, (clear Temp Use()); and requesting the high water mark for temporary usage, (max Temp Use()). Temporary usage is tracked with these routines for the first four kinds of temporaries listed above. Storage for block ids is tallied separately during method code generation since it is not known what the required number of compiler scratch temporaries will be until the method compilation has finished.
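The bookkeeping just described can be suggested with a simple counter and high water mark; the real compiler's macro routines may differ in detail, so the following is only a sketch.

    /* Sketch of temporary-slot bookkeeping with a high water mark.              */
    static int tempNext = 0;            /* next free temporary slot              */
    static int tempHigh = 0;            /* high water mark                       */

    static int allocTemp(int count)     /* returns the starting slot             */
    {
        int start = tempNext;
        tempNext += count;
        if (tempNext > tempHigh)
            tempHigh = tempNext;
        return start;
    }

    static void freeTemp(int count)     /* release the most recent allocation    */
    {
        tempNext -= count;
    }

    static int maxTempUse(void)         /* slots the context must provide        */
    {
        return tempHigh;
    }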
Before the method statements are examined, several initialization steps are performed:
1. The symbol table is populated with the entries for "self", "super", the parameters, and the formal temporaries. The slot index for each entry is determined by allocating a temporary as each symbol is added to the table.
2. A code segment is allocated for the method statements and made to be the "active" segment.
During compilation, generated code is placed in the "active" code segment, which is switched when compilation of a new list of statements, (e.g., a block), is started or completed.
3. Label generation is reset, (used for branch targets and block entry points).
4. The block count is reset.
Also prior to commencing code generation, the parse tree of the method is examined to see if it can be tagged for "flattening" at runtime. "Method flattening" is a technique for determining whether a runtime message send can be avoided because the method is "trivial". A "trivial" method is one which contains a single statement returning either:
1. An instance variable, (can replace send with an assign), or
2. The result of a primitive for which first argument is self and the remaining arguments to the primitive invocation line up exactly with arguments to the method, (can replace send with primitive invocation).
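The test for a "trivial" method can be sketched as follows, layered on the parse node sketch given earlier and assuming that statements are chained through the right child; the node-type constants and helper predicates are assumptions as well.

    /* Builds on the ParseNode sketch above and assumes statements are chained
       through the right child; node-type constants and helpers are assumptions. */
    enum { NODE_RETURN = 1, NODE_INSTVAR, NODE_PRIMITIVE };

    extern int primitive_receiver_is_self(const ParseNode *prim);
    extern int primitive_args_match_method_args(const ParseNode *prim);

    /* Returns 0 (not flattenable), 1 (returns an instance variable), or
       2 (returns a primitive on self whose arguments line up with the method's).*/
    int method_is_trivial(const ParseNode *firstStatement)
    {
        if (firstStatement == NULL || firstStatement->right != NULL)
            return 0;                            /* must be a single statement    */

        const ParseNode *ret = firstStatement->left;
        if (ret == NULL || ret->type != NODE_RETURN)
            return 0;                            /* and that statement a return   */

        const ParseNode *value = ret->left;
        if (value == NULL)
            return 0;
        if (value->type == NODE_INSTVAR)
            return 1;
        if (value->type == NODE_PRIMITIVE
            && primitive_receiver_is_self(value)
            && primitive_args_match_method_args(value))
            return 2;
        return 0;
    }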
At this point, compilation of the method statements is initiated by calling compile Statement List() with a pointer to the first statement parse node of the method. This routine invokes compile Expr() to compile the expression associated with each statement in the list. Compile Statement List() is used to compile lists of statements for blocks as well as methods, generating appropriate return bytecodes when explicit return statements are encountered and after the last statement in the method or block. Compile Statement List() distinguishes between method and block statement lists by the value of active code segment id, which is -1 for a method or >= 0 for a block. Provision is also made for the case of "inline" block code generation, which is used in optimization of certain messages involving blocks, (such as messages to booleans), described later.
3.2.5 Expressions
Expression compilation is the center of most activity during the compilation of a method. Compile Expr() defines the compilation actions for all parse nodes other than statements in a concise manner. This routine is invoked with the node to be compiled and a destination specification for the result of compiling the expression indicated by the node in the form of a "ref" structure. The destination specification allows the calling routine to control placement of the expression's value, which is particularly useful for aligning values for message sends, minimizing unnecessary data movement at runtime. Simple expressions, such as identifiers and constants, are trivial compilations requiring only assignment of the value associated with the identifier or constant description to the specified destination. An explicit Alltalk assignment expression, (e.g., "a <- b + c"), only requires compilation of the expression on the right of the "<-", with the reference on the left as the destination, in addition to assigning this result into the specified destination for the assignment expression itself, (e.g., "d <- (a <- b + c)"), if indicated. The remaining expression types, (messages, cascades, primitive invocations and blocks), require somewhat more involved compilation steps, hence, these cases have been split into separate routines, (gen Send(), gen Cascade(), gen Exec Prim(), and gen Block()). We now describe the compilation steps performed in each of these cases.
3.2.6 Messages and Cascades
The runtime implementation of the send message bytecode requires that the receiver and arguments be present in a contiguous set of the sending context's temporaries. The location for the return value of the message send is also required to be a temporary in the sending context, though it need not be adjacent to the receiver and arguments.
Gen Send() complies with the first condition by allocating a contiguous set of temporaries, (via alloc Temp()), and compiling the receiver and argument expressions with each of these temporaries, (in order), as the specified destination. Hence, results of receiver and argument expressions are cleanly aligned with their use in containing message sends, eliminating unnecessary data re-positioning assignments. A simple optimization is also done at this point. If the receiver and argument temporaries already happen to line up, (detected by lineup()), new temporaries are not allocated and the receiver and argument values need not be moved.
The second condition, (destination must be a temporary), is honored by examining the specified destination reference and allocating a temporary to hold the result of the message send if the destination is not already a temporary. This situation is remembered and code is generated for moving the result from the allocated temporary to the actual destination after the message send. This implementation style allows for the addition of variations of the send message bytecode for non-temporary destinations, if the need arises.
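The temporary-alignment and destination handling just described can be sketched in C as follows; lineup(), the emit routines, and the Ref/Node shapes are hypothetical stand-ins for the compiler's real data structures, not the actual gen Send() source.

/* Sketch of gen Send()'s temporary handling (assumed helper names). */
typedef struct Node Node;                    /* opaque parse node    */
typedef struct { int isTemp; int slot; } Ref;

extern int  allocTemp(int n);
extern void freeTemp(int n);
extern int  lineup(Node *parts[], int count);   /* start slot if already contiguous, else -1 */
extern void compileExpr(Node *expr, Ref dest);
extern void emitSendMsg(int argStartSlot, int numArgs, long selector, int answerSlot);
extern void emitMove(int fromSlot, Ref dest);

static void genSend(Node *parts[], int numArgs, long selector, Ref dest)
{
    /* parts[0] is the receiver; parts[1..numArgs] are the arguments. */
    int argStart = lineup(parts, numArgs + 1);
    if (argStart < 0) {
        argStart = allocTemp(numArgs + 1);
        for (int i = 0; i <= numArgs; i++) {
            Ref slot = { 1, argStart + i };
            compileExpr(parts[i], slot);     /* align each value with its use */
        }
    }

    /* the result of the send must land in a temporary of this context */
    int answer = dest.isTemp ? dest.slot : allocTemp(1);
    emitSendMsg(argStart, numArgs + 1, selector, answer);

    if (!dest.isTemp) {
        emitMove(answer, dest);              /* move result to its real home */
        freeTemp(1);
    }
}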
The previous comments also apply to cascaded message sends, (gen Cascade()), except that the receiver expression is only evaluated once and the result placed in a temporary, to which the remaining messages in the cascade are sent, (gen Cascade Send()).
3.2.7 Primitive Invocations
As with message sends, Alltalk's primitive invocations require that arguments to the primitive be in a contiguous set of the invoking context's temporaries, and the destination for the result be a temporary in the invoking context. Gen Exec Prim() handles the non-temporary destination and argument alignment cases, (using lineup()), in the same manner as is done for message sends. Each primitive argument is compiled, allowing arbitrary expressions to be used as arguments.
3.2.8 Blocks
Blocks are the most involved expression compilations in that they cause changes in the global state of the compiler. In Alltalk, a block is a list of statements which are to be executed with their own context when a "value" message is sent to it. The lexical aspects of a block allow it to refer to names available to the method in which the block is defined, as well as to names in any containing block. These names include the method's parameters and formal temporaries and any containing block's parameters. These semantics imply that a block is a static "object" of sorts which can potentially have multiple runtime activations, with each activation dynamically establishing variable name <-> storage bindings. Hence, from the compiler's point of view, a block is a separate list of statements to be compiled and "set-up" as an object, which may also include cross-context runtime references to be represented.
Gen Block() alters the global state of the compiler to create the proper compilation conditions to meet the needs described above. The block being compiled is given a unique id within the method, and a new code segment is allocated and marked with this id and connected to the list of code segments generated for the method so far. The currently active code segment is saved, along with its temporary usage, (since the block will have its own context), and the new segment is made the active segment (generated code is always placed in the active segment). The previous code segment and its temporary usage are restored when compilation of the block is completed. The symbol table is marked so that the block's symbols, (block parameter names), can be released at the end of the block's compilation, and the block's symbols are added to symbol table for proper scoping. Finally, a label is generated to mark the start of the block's code in the method.
At this point the state of the compiler has been properly altered, and compilation of the statements in the block is initiated via compile Statement List().
Once the block has been compiled, all the information needed to describe the block's activation characteristics at runtime, (temporary usage and entry point), has been established. This information is supplied to the runtime interpreter 44 via the set up block bytecode. This bytecode causes the interpreter to copy this information and associate it with a unique runtime id, known as a block stub id. A block stub id can be manipulated much the same way as any other object id. In the case of returning a block stub, or assigning a block stub to an instance variable, Alltalk establishes an object for the block stub. The information associated with the block stub id is used to establish a context for executing the statements of the associated block whenever the "value" message is sent to this id, (i.e., the evaluate block bytecode is executed for the id). Note that this requires that a block must be "set up" before it can be "evaluated" at runtime.
Alltalk chooses placement of the set up bytecode for a specific block so that the bytecode is not executed an uncontrolled number of times. This is because the set up block implementation in the interpreter does not check for multiple "set ups" performed for the same block.
The solution to the placement problem is to group the set up block bytecodes for any "top level" block, (i.e., any block encountered while generating code for the method statements), and its contained blocks, and place them in the method code segment ahead of the first use of the "top level" block. This technique avoids executing set ups for any block(s) which are not in the specific control flow path at runtime. Gen Block() implements this strategy by setting a pointer to a position in the method segment code at which the set up block bytecodes are to be "spliced" when a "top level" block is entered.
3.2.9 Message Optimizations
Except for aiding the runtime environment for "method flattening", the rest of the compiler optimizations involve recognition of specific message selectors in the source code, (optimize.c). The optimization strategy for these selectors is to generate inline code to implement the specific semantics of the selector, (assuming a specific receiver class), in order to avoid sending the actual message at runtime. These optimizations are detected by the gen Opti Send() routine which is invoked from compile Expr() when a message expression is compiled. If gen Opti Send() can handle the message, the normal compilation via gen Send() is avoided by compile Expr(). The message selectors/receiver class combinations which are currently optimized are listed in Table 3.2.
[Table 3.2, listing the optimized message selector / receiver class combinations, is not reproduced here.]
The complexity of these optimizations varies from simply generating special bytecodes, (e.g., Integer messages), to inline block code generation with conditional branch bytecodes for implementing looping constructs, (e.g., Block "while" messages).
Due to the straightforward expression of these optimizations, they are not treated in detail here. However, one of the more complex optimizations, (opt While()), will be described to highlight and convey an understanding of some of the issues and supporting procedure structure involved in these optimizations.
Opt While() handles optimization of the various "while" messages which can be sent to blocks, (while True:, while False:, while True, and while False). This routine demonstrates the need to deal with:
1. Evaluation of literal or non-literal block objects in receiver and/or argument positions,
2. Proper placement of set up block bytecodes to avoid repeated set up of the same block(s), (described previously in the section on block expression compilation), and
3. Generation of additional code to implement the semantics of the message, (looping, in this case).
Since the semantics of the "while" messages clearly involve sequenced evaluation of receiver and argument blocks, it is possible, if either block is literal, to treat the statements of that block as if they were in the statement list of the method or block containing the "while" message. This causes code to be generated directly into the currently active code segment, ("inline"), resulting in evaluation of that block in the current context at runtime, instead of setting up a separate context for evaluation of the block code. If either block is not a literal, (e.g., passed in as a parameter), that block must be evaluated in a separate context, (performed by the "evalb" bytecode).
This choice of block code generation strategy arises in the optimization of many other of the messages listed in Table 3.2. Opt Block() determines the code generation strategy based on the type of the parse node representing the block in the parse tree. If the node represents a literal block in the source code, the statement list for the block is compiled into the active code segment using compile Statement List(). Otherwise, a "value" unary message expression, with the node representing the block as the receiver, is constructed and compiled under the explicit assumption that the receiver will be a block, (gen Eval Block()). Note that this assumption is not made, (and a different bytecode is generated), when the "value" message is encountered in the original source code, since the actual receiver may not be a block at runtime, in this case.
Literal blocks which are part of the "while" message may be "top level" blocks, (i.e., outermost block of a nesting within a method). Because of this, opt While() must set the "splice point" in the method segment code for set up block bytecodes for any blocks contained in the "while" blocks, such that these bytecodes are placed outside the looping portion of the "while" code. This avoids the multiple "set up" problem for a block discussed in the previous section on block compilation.
With the background of the preceding discussion, the implementation of the optimization of "while" messages is summarized in the following steps:
1. If the "while" message is encountered in the method statement list, set a marker to the current position in the code as the "splice point" for set up block bytecodes for blocks which are encountered during compilation of the "while" message.
2. Generate a label to mark the start of the condition block, (i.e., the receiver of the "while" message).
3. Compile the condition block, (using opt Block()), with an allocated temporary as the destination for its evaluation result.
4. Generate a conditional branch to the end of the "while" message code, (step 7), based on the result of evaluating the condition block and the specific message being compiled.
5. Compile the body block, (using opt Block()), with no destination for its evaluation result.
6. Place an unconditional branch back to the label generated in step 2 to close the loop.
7. Generate code to assign "nil" to the destination specified for the value of the "while" message expression, (the destination may be "none"). This is the defined value for a "while" message expression.
8. Free the temporary allocated for the result of the condition block in step 3.
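The eight steps above can be condensed into a C-level sketch of the emission order; opt Block(), the emit and label helpers, and the node and destination types shown are assumed names for illustration, not the actual compiler routines.

/* Sketch of optWhile() bytecode emission order (assumed helper names). */
typedef struct Node Node;
typedef struct { int isNone; int slot; } Ref;

extern int  allocTemp(int n);
extern void freeTemp(int n);
extern int  genLabel(void);
extern void placeLabel(int label);
extern void setSplicePoint(void);                  /* step 1             */
extern void optBlock(Node *blk, Ref dest);         /* inline or evalb    */
extern void emitCondBranch(int slot, int compareWith, int target);
extern void emitJump(int target);
extern void emitAssignNil(Ref dest);

static void optWhile(Node *cond, Node *body, int whileTrue, Ref dest, int inMethod)
{
    if (inMethod)                    /* 1: splice point for setb bytecodes  */
        setSplicePoint();

    int top = genLabel();            /* 2: label at the top of the loop     */
    placeLabel(top);

    int condSlot = allocTemp(1);     /* 3: compile/evaluate condition block */
    Ref condDest = { 0, condSlot };
    optBlock(cond, condDest);

    int done = genLabel();           /* 4: exit when the condition fails    */
    emitCondBranch(condSlot, !whileTrue, done);

    Ref none = { 1, 0 };
    optBlock(body, none);            /* 5: body block, result discarded     */

    emitJump(top);                   /* 6: close the loop                   */
    placeLabel(done);

    if (!dest.isNone)                /* 7: the value of a "while" is nil    */
        emitAssignNil(dest);

    freeTemp(1);                     /* 8: release the condition temporary  */
}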
An example of the code generated for "while" message expressions with different combinations of literal and non-literal condition and body blocks is shown in Table 3.3.
[Table 3.3, showing generated code for "while" expressions with literal and non-literal condition and body blocks, is not reproduced here.]
The intermediate language is discussed further in the next section.
3.3 Phase 2 (kasm)
As noted in the synopsis, the second phase 32 of the Alltalk compiler 20 concerns itself with resolving symbols, and creating and loading the classes and methods into the database.
3.3.1 Intermediate Language
The intermediate language expected as input for this phase consists of tokens representing bytecodes, along with directives for establishing the class, delimiting methods, and tracking the Alltalk source file name and line numbers. A summary of these tokens and directives is listed in Tables 3.4 and 3.5, respectively.
[Tables 3.4 and 3.5, summarizing the intermediate language tokens and directives, are not reproduced here.]
In addition to the basic elements of the language, symbolic labels of the form "L<number>" are also available for use in the code, (for jeq, jne, jmp, and setb bytecodes), with target labels being required to start at the beginning of a line which contains no other tokens. Comments are allowed on any line, and are defined to be anything contained between a semicolon, (";"), and the end of the line. An example of this intermediate language for a method of a class called Foo is shown in Table 3.6, which was constructed in order to demonstrate the variety of code and reference type representations generated by phase 1.
[Table 3.6, showing example intermediate language for a method of class Foo, is not reproduced here.]
3.3.2 Effective Addresses
References to various types of runtime variables and constants are represented in specific symbolic forms in the intermediate language which we call "effective addresses". These forms appear in the argument fields of many of the bytecode tokens, although not all forms are valid in specific argument positions of specific bytecodes. These effective address forms are summarized in Table 3.7, and the reader is again referred to the code in Table 3.6 for examples.
[Table 3.7, summarizing the effective address forms, is not reproduced here.]
In the particular case of the "mov" bytecode, phase 2 translates both the source and destination effective address forms into one of six specific runtime reference types.
3.3.3 Operational Description
Phase 2 maintains a global state around the current class and method being "assembled", resulting in method-at-a-time assembly and placement into the database. The class object is not given to the object manager until all methods described in the input file have been successfully translated and passed to the object manager. This ensures that the old version of the class, (hence, its methods), is not replaced unless assembly of the new version is successful.
In contrast to phase 1, this phase is very "flat", that is, it contains no recursive functions to walk parse trees, since each input statement is essentially a self-contained description. All the implementing functions, (assembled), are dispatched directly from the parser on a per statement, (or group of statements), basis, resulting in a very simple control flow.
Assembly of a method essentially consists of collecting the bytecodes described by the bytecode statements into a scratch area, (Method Bytes), and recording labels, references to labels, and block references in these statements for resolution when the end of the method is reached. Each bytecode statement has a corresponding translation routine, (assembles), which builds the runtime representation of the bytecode in the scratch area.
When the end of the method is reached, (end Method()), all label and block references are resolved and the object manager is called upon to allocate space for the compiled method object. In this area, the instance variable slots for the method object are initialized, (no Temps, no Parms, class Oop, selector Symbol, etc.), and the bytecodes are copied in from the scratch area. A dictionary entry relating the method selector symbol id of the method to the id of the compiled method object is also created and added to entries already established for other methods, (in the Methods global array). These dictionary entries are stored in the class object when the end of the class is reached, (i.e., when a new '.class' directive or end-of-file is encountered).
When the end of the class is reached, (end Class()), space is obtained from the object manager under the same object id as the previous version of the class, to cause replacement of that class. The class is then built in this area by filling in control information, including the object id of the first instance of the class obtained from the previous version, the object id of the class name symbol, the id of the superclass and the size of the method dictionary for the class. The method dictionary entries are then closed-hashed, (by method selector symbol id), into a dictionary area in the class object. The class object is then flushed to the database, signaling completion of the assembly of the class, ending phase 2.
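As an illustration of closed hashing by selector symbol id, a small self-contained C sketch follows; the table size, empty-slot convention, and linear probe rule are assumptions for the example rather than the actual class-object layout.

#include <stdio.h>

/* Sketch: closed (open-addressing) hash of selector-id -> method-id entries. */
#define DICT_SIZE 8                     /* assumed; the real size comes from the class */

typedef struct { long selector; long method; } DictEntry;

static void dictInsert(DictEntry dict[], long selector, long method)
{
    unsigned slot = (unsigned long)selector % DICT_SIZE;
    while (dict[slot].selector != 0)    /* 0 marks an empty slot (assumed convention)  */
        slot = (slot + 1) % DICT_SIZE;  /* linear probe to the next slot               */
    dict[slot].selector = selector;
    dict[slot].method   = method;
}

static long dictLookup(const DictEntry dict[], long selector)
{
    unsigned slot = (unsigned long)selector % DICT_SIZE;
    while (dict[slot].selector != 0) {
        if (dict[slot].selector == selector)
            return dict[slot].method;
        slot = (slot + 1) % DICT_SIZE;
    }
    return -1;                          /* selector not found */
}

int main(void)
{
    DictEntry dict[DICT_SIZE] = { { 0, 0 } };
    dictInsert(dict, 1234, 5678);       /* hypothetical selector/method oops          */
    dictInsert(dict, 1242, 9999);       /* collides with 1234 when the size is 8      */
    printf("%ld %ld\n", dictLookup(dict, 1234), dictLookup(dict, 1242));
    return 0;
}

Run as written, the example prints "5678 9999", showing that the colliding entry is still found by probing forward from its hashed slot.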
4. Interpreter
The interpreter 44 (see Fig. 3) is that portion of the Alltalk runtime environment 22 which the user invokes to run Alltalk applications. The interpreter 44 decodes the object code generated by the compiler 20 (see Fig. 2), and executes it, calling upon many of the other services of the runtime environment 22. The interpreter 44 also includes a debugger, described below, which allows the programmer to inspect the running program in a variety of ways.
4.1 Synopsis
The previously described Alltalk compiler 20 for the Alltalk dialect of the Smalltalk language translates Alltalk source code into an intermediate representation, called bytecodes, and stores this representation in the database 40. Each bytecode represents an instruction for the interpreter 44, and consists of an operation code (a 32 bit integer) and a variable number of parameters. Applications are executed using the Alltalk interpreter 44. The Alltalk interpreter 44 uses the object manager 48 as the interface to the database 40. It also calls on the transaction manager 46 and the garbage collector 52. In addition, it invokes primitives which interface to the UNIX operating system to do things like operating on primitive data types (integer addition, floating point multiplication, string concatenation, etc.), performing file I/O, managing the display, and controlling keyboard and mouse input.
The object manager 48, transaction manager 46, garbage collector 52, and primitives are described in later sections in more detail.
The main functions in the interpreter 44 that are discussed in this section can be grouped into the following main categories:
1) bytecode loop;
2) bytecode handlers;
3) context management;
4) process management;
5) initialization and shutdown; and
6) the debugger.
4.2 Bytecode Loop
The state of the Alltalk interpreter 44 is captured, essentially, in a global array called Processes. Each element of this array represents one Smalltalk process. At interpreter initialization, one process is created. The user's application can create new processes, switch processes, and destroy processes as needed. Associated with each process is a stack of contexts, and a pointer to one which is the currently-executing context of that process. A context is created when a message is sent or a block is evaluated, and is destroyed when the corresponding message/block returns. Associated with each context is a set of bytecodes for the corresponding method/block, and a pointer to one which is the currently-executing bytecode of that context. The bytecodes are the object code to which the user's application was compiled. Each context also has an array of temporaries which are used to hold intermediate results of the execution of the associated method/block.
At any given time, only one of the Processes is running; it is the current process. The current context of that process is, then, the current context. The current bytecode of that context is the current bytecode. The basic operation of the Alltalk interpreter 44 is a bytecode decode/dispatch loop. Code exists in the interpreter for handling each type of bytecode generated by the compiler. The interpreter decodes a bytecode to determine its type, then invokes the appropriate code for that bytecode type. We call the piece of code for a particular bytecode type a bytecode handler. Each bytecode handler increments the bytecode pointer so that after the handler completes, the interpreter main loop can decode and dispatch the next bytecode. Bytecode handlers can manipulate the bytecode pointer and other interpreter data structures in ways to affect program flow.
The routine exec_bcodes() contains the bytecode loop. It decodes the bytecodes and invokes the appropriate bytecode handler. Before doing so, however, it checks to see if it should switch processes, i.e., it checks whether a different Smalltalk process should become the currently-active process. See the section below on Process Management for details on how process switches are handled and new processes are created.
4.3 Bytecode Handlers
There is one bytecode handler in the Alltalk interpreter 44 for each type of bytecode generated by the Alltalk compiler 20. Each handler is one (or more) case(s) in a C-language switch statement. The switch statement is part of exec_bcodes() in the file exec_bcodes.c. Each case of the switch is in a separate file to make source code maintenance easier. At compile time, these files are included in exec_bcodes.c via #include's. This strategy was chosen over making the bytecode handlers each separate procedures because it cuts down on call overhead in the bytecode loop. It also allows the use of machine registers for certain control variables, since the handlers are all within a single C language function. Note that thousands of bytecodes are executed each second; overhead for that many calls would be very large.
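The structure of the loop might be sketched as follows; the context shape, the handler file names, and the switch skeleton are placeholders assumed for illustration, not the actual exec_bcodes.c source.

/* Sketch of the exec_bcodes() decode/dispatch loop (assumed data shapes). */
typedef struct Context {
    long           *bcode_ptr;        /* current bytecode of this context   */
    struct Context *prev, *next;
    long            temps[64];
} Context;

extern Context *cur_cntx;             /* current context, current process   */
extern int      Divert;               /* set when a process switch is due   */
extern void     switch_process(void);

void exec_bcodes(void)
{
    for (;;) {
        if (Divert)                   /* possibly switch Smalltalk process  */
            switch_process();

        long op = *cur_cntx->bcode_ptr;     /* decode the current bytecode  */

        switch (op) {                 /* one handler per bytecode type;     */
                                      /* each handler is #include'd here    */
        /* case 0x000 ... 0x0FF:  primitive handlers                        */
        /* case 0x100:            send_msg_bcode handler                    */
        /* case 0x101:            long return handler                       */
        /* case 0x105:            branch handler                            */
        /* case 0x106:            setup_blk_bcode handler                   */
        /* case 0x109:            eval_blk_bcode handler                    */
        default:
            return;                   /* sketch only: unknown bytecode      */
        }
        /* each real handler advances cur_cntx->bcode_ptr itself, then the  */
        /* loop decodes and dispatches the next bytecode                    */
    }
}

Keeping every handler inside this one C function is what allows machine registers to be used for shared control variables and avoids per-bytecode call overhead, as noted above.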
A complete description of the bytecodes and their parameters is included in Table 4.1.
[Table 4.1, giving a complete description of the bytecodes and their parameters, is not reproduced here.]
The bytecodes can be grouped into the following categories:
- execute a primitive;
- send a message;
- define a block;
- evaluate a block;
- return from a block or method;
- branch; and
- assign from one variable to another.
We describe each of these next.
4.3.1 Execute a primitive
bytecodes discussed:
0x000→ 0x0FF (primitives 0-255)
Primitives are called by Alltalk code to do the low-level tasks. These tasks generally depend on the underlying hardware and operating system, and include things like file I/O, integer and floating point arithmetic, and using the display. The bytecodes numbered from 0 to 255 (decimal), i.e., 00 to FF hex, are reserved for primitives. Primitives are similar to methods in that they have a receiver, they have optional arguments, and they return an object. They are unlike methods in that they are written in C rather than Alltalk, and no context is set up for them.
4.3.2 Send a message
bytecodes discussed:
0x100 (send_msg_bcode) 0x10B (send_param_msg_bcode) 0x10C (send_message_add) 0x10D (send_message_sub) 0x10E (send_message_eq) 0x10F (send_message_addl) 0x110 (send_message_subl)
The compiler generates several different types of bytecodes for messages. The normal message send is handled by send_msg_bcode. Messages of the type 'perform:' and 'perform:with:' are handled by send_param_msg_bcode. These two handlers operate in very similar manner. The main difference is that for send_msg_bcode, the message selector is known at compile time, and is included in the bytecode itself; for send_param_msg_bcode, the oop of the message selector is found, at run time, in a temporary of the current context.
The normal processing of a send_msg (and send_param_msg) is as follows. Note that we do not discuss various optimizations that we have put into send_msg bytecodes. These are discussed in a separate section below.
1) Get the oop of the receiver of the message from the temporaries of the sending context. The send_msg parameter arg_start_slot is the index into the temporaries at which this oop is found.
2) If the receiver is not a context or positive integer, call the object manager to fetch the receiver object. Note that contexts and positive integers are not managed by the object manager: contexts are not objects in Alltalk, and positive integers are encoded as negative oops.
3) Determine the receiver's class. If the receiver is not a context or positive integer, its class is found in its object header.
4) Call the object manager to fetch the method associated with the message we are processing. We pass to the object manager the hashed_selector and super_flag parameters from the bytecode, plus the class of the receiver. It returns the method object which contains the bytecodes for the message we are processing.
5) In the sending context, store the value of the bytecode parameter put_answ_slot. This is needed when we return to this context from the method we are about to execute. It represents the index of the sending context's temporaries into which the returned result is to be put.
6) Increment the bytecode pointer in the sending context. When we return to this context, we will continue executing bytecodes in this context at that point.
7) Create a new context for the message we are processing. Copy num_args arguments from the sending context, starting at arg_start_slot in the temporaries of the sending context. They are copied into the temporaries of the new context starting at slot 0. Note that this assures that the receiver of a message can always be found in context temporary 0. The new context will have a bytecode pointer which points to its first bytecode. We make this context the current context and return to the bytecode loop.
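Steps 1 through 7 can be compressed into a C sketch; the structure members and helpers shown (fetch_object(), lookup_method(), push_context(), and the special-case tests) are assumptions standing in for the object manager and interpreter internals.

/* Sketch of the send_msg bytecode handler, following steps 1-7 above. */
typedef struct Context {
    long            temps[64];
    long            put_answ_slot;
    long           *bcode_ptr;
    struct Context *prev, *next;
} Context;

typedef struct { long oop; long class_oop; void *body; } Object;

extern Context *cur_cntx;
extern Object  *fetch_object(long oop);                     /* object manager */
extern Object  *lookup_method(long class_oop, long hashed_selector, int super);
extern Context *push_context(Object *method);               /* new context    */
extern int      is_context_or_posint(long oop);
extern long     class_of_special(long oop);                 /* contexts, ints */

static void send_msg(long hashed_selector, int super_flag,
                     int arg_start_slot, int num_args, int put_answ_slot)
{
    long receiver = cur_cntx->temps[arg_start_slot];             /* step 1    */

    long class_oop;
    if (is_context_or_posint(receiver)) {                        /* steps 2,3 */
        class_oop = class_of_special(receiver);
    } else {
        Object *recv = fetch_object(receiver);
        class_oop = recv->class_oop;
    }

    Object *method = lookup_method(class_oop, hashed_selector,   /* step 4    */
                                   super_flag);

    cur_cntx->put_answ_slot = put_answ_slot;                     /* step 5    */
    cur_cntx->bcode_ptr++;     /* step 6: advance past this bytecode (by its
                                  full length in the real interpreter)        */

    Context *callee = push_context(method);                      /* step 7    */
    for (int i = 0; i < num_args; i++)                 /* receiver + arguments;
                                                          receiver lands in slot 0 */
        callee->temps[i] = cur_cntx->temps[arg_start_slot + i];
    cur_cntx = callee;          /* control returns to the bytecode loop       */
}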
4.3.3 Define a block
bytecodes discussed:
0x106 (setup_blk_bcode)
In Alltalk, blocks are not objects managed by the object manager, but rather are maintained by the interpreter as C data structures. When they are assigned to instance variables, or returned as the result of a message, they are made into objects, as is the home context (this is discussed in more detail below). They can, however, be assigned to method temporaries and passed as parameters in messages without being made into objects first.
When the interpreter encounters a setup_blk bytecode, it creates a data structure called a block stub, and gives it an object id, (a 32 bit integer), which we call an oop. The oop is in a special range, i.e., greater than or equal to INIT_CNTX_ID, so it can be recognized later as a block by the interpreter. The block stub contains enough information to evaluate the block when an eval_blk bytecode is later encountered. Its oop is stored back in the temporaries of the home method in which it is defined. It can then be handled like any other oop stored in temporaries (except for the cases mentioned above).
4.3.4 Evaluate a block
bytecodes discussed:
0x109 (eval_blk_bcode) 0x10A (eval_blk_bcode2)
Evaluating a block means executing the code that the block contains. Note that a block must be 'set up' before it can be evaluated. However, a block which is set up may or may not be evaluated. For example, the if True: block and if False: block of an if True:if False: message won't both be evaluated. A block may be evaluated immediately after it gets set up, or later. It may be evaluated by the context in which it was set up, or the context which sets it up may pass it as a parameter in a message send, so that it gets evaluated by another context.
The eval_blk bytecode handler causes a block to be evaluated by converting the block stub for that block into an active context on the context stack of the active process. It makes that new context be the current context, and makes the global bytecode pointer point to the block's first bytecode.
4.3.5 Return from a block or method
bytecodes discussed:
0x101 (long_return) 0x104 (short_return)
When the Alltalk interpreter 44 encounters a return bytecode, it means that the currently executing context is finished, and it switches control to a previous context. In addition, it passes back an object (actually the object's oop) to the context to which it is returning.
There are two different return bytecodes. What we call the long return (also known as return from method) causes the interpreter to return to the context just previous to the home context of the current context. The home context of a method context is itself; the home context of a block context is the context of the method in which the block is defined/setup. Therefore, a long return from a block is the same as doing a return from the block's home method. Long returns are indicated in the Alltalk code by the caret symbol, "^".
A short return causes the interpreter to return to the context just previous to the current context regardless of what it is. A short return and a long return from a method context are the same. (The Alltalk compiler 20 always generates a long return for returns from a method context.) A short return from a block means to simply return to the previous context in the stack. This previous context is the context which caused the block to be evaluated; it may or may not be the block's home context.
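A condensed C sketch of the two return forms follows, using the home_cntx and prev fields described in the Context Management section below; the field layout and helper shape are assumed for illustration, not the actual handler source.

/* Sketch of short/long return handling (field names follow section 4.4).   */
typedef struct Context {
    struct Context *prev;        /* sender / evaluator of this context       */
    struct Context *home_cntx;   /* itself for methods, home method for blocks */
    long            put_answ_slot;
    long            temps[64];
} Context;

extern Context *cur_cntx;

static void do_return(long result_oop, int long_return)
{
    /* A long return unwinds to the caller of the home context; a method's
       home is itself, so both return kinds coincide for method contexts.    */
    Context *returning = long_return ? cur_cntx->home_cntx : cur_cntx;
    Context *target    = returning->prev;

    /* deliver the result where the sender asked for it (see step 5 of the
       send_msg processing above)                                            */
    target->temps[target->put_answ_slot] = result_oop;

    cur_cntx = target;           /* resume the sender's bytecodes            */
}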
4.3.6 Branch
bytecodes discussed:
0x105 (branch) 0x107 (branch_on_equal) 0x108 (branch_on_not_equal)
Branch bytecodes are used to implement control structures. In addition, branching bytecodes are used by the compiler as part of several optimizations.
The unconditional branch bytecode (0x105) simply increments/decrements the bytecode pointer by a certain amount. The conditional branch bytecodes compare an oop found in a temporary of the current context with an oop contained in the bytecode itself. Whether or not the bytecode pointer is incremented depends on the results of comparing these two oops.
4.3.7 Assign from one variable to another
bytecodes discussed:
0x1nm (assign type n variable from type m variable)
The Alltalk compiler 20 and the Alltalk interpreter 44 understand six different types of variables. These six types are as follows:
Type 1
This type of variable is simply an oop that the Alltalk compiler 20 generates, and includes as part of the assignment bytecode. Obviously, it cannot be the destination of an assignment statement, only the source. Examples of Type 1 variables are string, character, integer, and floating point constants, and class names.
Type 2
This type of variable is a temporary in the home context of the current context. The Alltalk compiler 20 specifies it as an index into the array of temporaries.
Type 3
This type of variable is an instance variable of the object which is 'self' in the current context. The Alltalk compiler 20 specifies it as an index into the instance variables of the receiver.
Type 4
This type of variable is an indirect reference to a particular instance variable of a particular object. The Alltalk compiler 20 specifies the instance variable by specifying an index into the temporaries of the current context (which specifies the object), plus an index into the instance variables of that object (which specifies the particular instance variable).
Type 5
This type of variable is a temporary in the current context. The Alltalk compiler 20 specifies it as an index into the array of temporaries. Note the difference between this and the type 2 variable. For a method context, type 2 and type 5 are the same because a method's home context is itself; for a block, type 2 refers to its home context's temporaries, and type 5 refers to its own temporaries.
Type 6
This type of variable is needed for nested blocks in which an inner block refers to an argument of an outer block. The Alltalk compiler 20 specifies the argument by giving two parameters in the bytecode. First is an index into the temporaries of the home context. In that particular temporary is found the id of the block stub of the outer block. The second parameter is an index into the temporaries of the outer block. In that particular temporary is found the oop of interest. Since Smalltalk does not allow assignment to the arguments of a block, a type 6 variable cannot be the destination of an assignment statement, only the source.
Each assignment bytecode has a source variable type and a destination variable type. The destination is specified first, then the source. Because type 1 and type 6 variables cannot be destinations, there are 24 assignment bytecodes (4 destination types * 6 source types). The assignment bytecode handlers simply put the oop specified by the source into the location specified by the destination.
4.4 Context Management
As mentioned above, the state of the Alltalk interpreter 44 is contained in the global array, Processes. Each element in that array represents a process. In addition to the interpreter's C-language data structure for a process, there is also an instance of Smalltalk Class Process for each Smalltalk process in an application. In the following, we concentrate on the interpreter's data structure for processes, and ignore the Smalltalk object. Each process has associated with it a set of contexts. In the following, we explain how contexts are implemented for one process, but one should remember that there is one set of contexts for each process.
In order to improve performance of the Alltalk interpreter 44, it does not treat contexts as objects. Instead, they are maintained by the interpreter as C data structures. (As mentioned above, however, the home context may be turned into an object if an owned block is turned into an object).
The Alltalk interpreter 44 manages contexts in two pieces. One piece contains what are called active contexts. These are contexts associated with methods which have not yet returned and blocks which are executing and have not yet returned. This piece operates like a stack: when a message is sent or a block starts execution, the Alltalk interpreter 44 pushes another context on the stack; when a method or block returns, the Alltalk interpreter 44 pops one context (or more, in the case of a long return from a block) off the stack.
The second piece contains what are called block stubs. A block stub is established as the result of a setup_blk bytecode (see setup_blk_bcode). In order to treat blocks as objects, object id's (oops) are given to such blocks. The block stubs represent these pseudo-objects. They hold just enough information so that when a block is evaluated (a value message is sent to it), the Alltalk interpreter 44 can create an active context for it. Note that a block stub exists as long as its home context exists; it does not go away just because its associated active context returns. In fact, in the case of loops in Smalltalk code, the same block stub might be evaluated many times, having an active context created from it and destroyed each time.
Because block stubs are stored as a separate piece, the active contexts can be allowed to obey a stack discipline. This simplifies context management and improves performance.
The data structure for contexts is defined in "interp_types.h". Contexts are of fixed size, and have 64 temporaries each. (Smalltalk defines 64 as the maximum number of temporaries a context may have.) This allows the Alltalk interpreter 44 to allocate space for them and doubly link them at interpreter initialization time, rather than on the fly. They are allocated as an array, and have one array/stack of contexts per Smalltalk process. The routine init_cntx() initializes one context and it is called by init_cntx_stack() which initializes and links all contexts for a given process when the process gets created.
The data structure for block stubs is also defined in "interp_types.h". Block stubs are of fixed size. This allows space to be allocated for them and allows them to be linked at interpreter initialization time, rather than on the fly. They are allocated as an array, with one array/stack of block stubs per Smalltalk process. The routine init_blk_stub_stack() initializes and links all block stubs for a given process when the process gets created.
In addition to the two arrays, the Alltalk interpreter 44 maintains a pointer to the current active context, cur_cntx, and a pointer to the next available (unused) block stub, next_blk_stub, for each process.
The fields of a context that are important for context management are described next.
prev, next
Each context has a prev pointer which links it to the previous context in the array/stack, and a next pointer which links it to the next context in the array/stack. These pointers are used rather than the array index to move between contexts. The Alltalk interpreter 44 follows the next pointer of the current context when it needs to add a new context. This happens when a message is sent (see send_msg_bcode), or a block is evaluated (see eval_blk_bcode). The Alltalk interpreter 44 follows the prev pointer of the home context of the current context to find the context to which it should return when it does a long return; it follows the prev pointer of the current context itself when it does a short return (see "short return", Table 4.1).
home_cntx
For a method context, home_cntx points to itself. For a block context, home_cntx points to the context of the method in which the block is defined. This pointer is needed when the Alltalk interpreter 44 does long returns from blocks, and when blocks refer to the temporaries of their home method. By having a method context's home be itself, the Alltalk interpreter 44 can handle all long returns (both from method contexts and from block contexts) in the same way.
first_block
The first_block field of a context points to the first block stub that the context could allocate.
This is used to free up block stubs when an active context returns.
my_blk_stub
For a method context, my_blk_stub is not used, and is NULL. For a block context, the field points to the context's corresponding stub. This pointer is used by the debugger (described below), and is also used in conjunction with the prev_active_cntx field to handle the case where one block stub has multiple active contexts at the same time.
prev_active_cntx
For a method context, prev_active_cntx is not used, and is NULL. For a block context, it is used in conjunction with the my_blk_stub field to handle the case where one block stub has multiple active contexts at the same time. It saves a pointer to the previous active block context associated with this block context's block stub. If this context is the only active context associated with the block stub, then this field holds a NULL pointer.
The fields of a block stub that are important for context management are described next.
id
Each block has an id which is an oop (long integer) in a special range, that is, greater than or equal to the constant INIT_CNTX_ID. The id's are assigned to a stub when the process to which it belongs is initialized. The id can be stored in the temporaries of other contexts, and can be passed as a parameter in a message send. In this way, blocks can be treated (almost) like real objects for flexibility, and yet be managed by the interpreter for good performance.
next
Each block stub has a next pointer which links it to the next block stub on the array/stack. When a new block stub is needed, the Alltalk interpreter 44 uses the one pointed to by the global pointer, next_blk_stub. At that time, it follows the next pointer of the stub pointed to by next_blk_stub to update next_blk_stub.
home_cntx
Each block stub has a pointer to its home context. If the stub gets evaluated, the Alltalk interpreter 44 needs this pointer in the active context created for the block. Via this pointer, it can get at the temporaries of the home context.
active_cntx
When a block gets evaluated, the Alltalk interpreter 44 updates the stub with a pointer to the active context that gets created to do the evaluation. This pointer is needed in order to resolve references to type 6 variables.
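Gathering the fields described above, the two records might be declared roughly as follows in C; the actual definitions are in "interp_types.h", so the types, names, and ordering shown here are an approximation for illustration only.

/* Sketch of the context and block-stub records (see interp_types.h).        */
#define MAX_TEMPS 64                  /* Smalltalk's per-context maximum      */

typedef struct Context {
    struct Context  *prev;             /* previous context on the stack       */
    struct Context  *next;             /* next (preallocated) context         */
    struct Context  *home_cntx;        /* self for methods, home for blocks   */
    struct BlkStub  *first_block;      /* first stub this context may use     */
    struct BlkStub  *my_blk_stub;      /* block contexts only, else NULL      */
    struct Context  *prev_active_cntx; /* block contexts only, else NULL      */
    long            *bcode_ptr;        /* current bytecode                    */
    long             temps[MAX_TEMPS];
} Context;

typedef struct BlkStub {
    long             id;               /* oop >= INIT_CNTX_ID                 */
    struct BlkStub  *next;             /* next stub in the per-process array  */
    struct Context  *home_cntx;        /* method context that set it up       */
    struct Context  *active_cntx;      /* most recent active evaluation       */
} BlkStub;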
When a block is stored in an instance variable, or passed back from a method, the Alltalk interpreter 44 must make the block a persistent object. In so doing, it must also make the home context a persistent object as well, since the block can reference temporaries of the home context. Alltalk contains routines to make the block and its home context persistent objects (and thus they may then be stored in the database and manipulated as any other object), and to put the block and home context back on the stack so that the block can be executed.
Referring to the Drawings, Figures 4 through 13 show how context management is done in Alltalk. Each Figure shows the same portion of the active context stack and the block stub stack for one process. Each box in the Figures represents one context or one stub; only the fields involved in context management are shown. (The my_blk_stub and prev_active_cntx fields are shown only in Figure 13.) Pointers are indicated by arrows; pointers "connected to ground" represent NULL pointers. Pointers shown in double lines indicate pointers which were changed from the previous figure. The stacks grow downward.
Figure 4 shows the state of the two stacks after the interpreter has been initialized, but no messages have been sent. Note that the next and prev pointers of the contexts, and the next pointers of the stubs have been established. Also, the id's of the stubs have been set. Figure 5 shows what happens when a message is sent. (We assume that the sending context is just off the top of the figure; the context we are about to create is the top box we see in the figure.) We follow the next pointer of the sending context to "create" a new context (from here on, called method context #1). The new context becomes the cur_cntx, and its class is Method Context. Since it's a method context, its home_cntx is made to point to itself. Its first_block pointer is made to point to the stub pointed to by next_blk_stub. Note that next_blk_stub is not moved; only when a block stub is used (i.e., set up) is the next_blk_stub moved forward.
Figure 6 shows the stacks after method context #1 sets up its first block. Setting up a block means that the Alltalk interpreter 44 creates a block stub; it does not mean that the Alltalk interpreter 44 creates another active context. The block stub pointed to by next_blk_stub becomes the new block stub. The Alltalk interpreter 44 pushes next_blk_stub forward to the stub pointed to by the next field of the new block stub. The home_cntx field of the new block stub is made to point to the home_cntx of cur_cntx, i.e., method context #1. Note that if cur_cntx were a block context, the home_cntx of the new block stub would not be that block context but rather the block's home context. Note also that method context #1 does not change.
Figure 7 shows what the stacks look like after method context #1 sets up another block. We now have two block stubs whose home_cntx is method context #1.
Figure 8 shows the stacks after method context #1 sends a message. To handle this, the Alltalk interpreter 44 must "create" a new context (from here on, called method context #2). The Alltalk interpreter 44 follows the next pointer of the current context to find the next available active context, and makes it the cur_cntx. Its first_block pointer is made to point to the block stub pointed to by next_blk_stub. Since the new context is a method context, its home_cntx field is made to point to itself.
Figure 9 is somewhat more complicated. In that figure, we see the stacks after method context #2 starts to evaluate one of the blocks that was set up by method context #1. (We assume that the block was passed as a parameter in the message which resulted in the creation of method context #2.) The stub to be evaluated is #214740009. To handle this, the Alltalk interpreter 44 must "create" a new active context - but this time, it is a block context. Just as with method context creation, the Alltalk interpreter 44 follows the next pointer of the cur_cntx to find the next available active context and makes it the cur_cntx. Also, the Alltalk interpreter 44 makes its first_block pointer point to the block stub pointed to by next_blk_stub. However, the home_cntx pointer of the new context does not point to the new context itself; because the new context is a block context, its home_cntx pointer is gotten from its block stub. In this case, home_cntx points to method context #1. Note also that the block stub's active_cntx pointer is made to point to the new block context. The transformation of a block stub to an active context is handled by the routine stub_to_cntx().
Figure 10 shows how the stacks would appear if the block were to do a short return. Note that the Alltalk interpreter 44 simply follows the prev pointer of the current context to find the context to return to; it is made the cur_cntx. Note also that the block stub associated with the evaluated block does not go away, even though its active context did go away. Block stubs go away when their home context goes away (returns). The Alltalk interpreter 44 also moves next_blk_stub to point to the block context's first_block. This effectively "destroys" and frees up any block stubs set up by the block context. (In this case, the block context created no block stubs, so next_blk_stub does not change.)
Figure 11 shows how the stacks would appear if the block were to do a long return. Remember that a long return from a block is the same as doing a return from the block's home context. In this case, the block's home context is method context #1, so the Alltalk interpreter 44 (in essence) does a return from method context #1. It follows the prev pointer of method context #1 to find the context to return to; it becomes the cur_cntx. It also moves the next_blk_stub pointer back to point to the stub pointed to by first_block of method context #1. This effectively "destroys" and frees up all blocks created by method context #1 and any of its descendent contexts.
Figures 12 and 13 show how the my_blk_stub and prev_active_cntx fields are used to handle the case where a block stub may have multiple active contexts associated with it. Note that these fields are shown in these figures only, and only for block contexts. Note also that we have shifted our view of the stacks down (or up) by one context in order to fit the contexts of interest on the page.
Figure 12 shows how the stacks would appear if a second block context was activated for the same block stub as the current context. Note that the two block contexts created from the same block stub are very similar; only their prev_active_cntx fields differ. Note that the second one uses this field to point back to the previous (first) one. Note also that the active_cntx field in the stub is updated so it points to the new context.
Figure 13 shows how the stacks would appear if the second block context did a short return. The Alltalk interpreter 44 follows the my_blk_stub pointer of the returning block context to find its associated block stub. It copies the prev_active_cntx pointer of the returning block context into the active_cntx field of the stub. Then it does the normal processing for a short return, that is, it follows the returning context's prev pointer to find the sending context and makes it the new current context. Note that in this example, prev and prev_active_cntx point to the same context, that is, the first block context; however, this will not necessarily be the case. There could be other intervening contexts between these two activations of the same stub. This is why it must save this information in the newly-created context.
4.5 Process Management
As mentioned above, the Alltalk interpreter 44 maintains run time data structures for Smalltalk processes in an array called Processes[]. Each element in that array represents one Smalltalk process. Each element contains (basically) a stack of active contexts, a pointer to the current context in that stack, an array of block stubs, and a pointer to the next available stub. The management of these two stacks and two pointers was described in the previous section. However, we have not yet discussed how processes are created, switched, or destroyed. These topics will be discussed in this section.
4.5.1 Creating Processes
A Smalltalk process is created by sending a message to a block. The block contains the code that is to be executed in the new process. The message sent to the block might be fork At:, fork, etc. However, all of these messages eventually result in the message new Process being sent to the block. The Smalltalk code for method new Process in Class Block is shown in Table 4.2.
[Table 4.2, giving the Smalltalk code for method new Process in Class Block, is not reproduced here.]
The for Context:priority: method in Class Process is a class method for creating new processes, and it is implemented as a primitive in Alltalk.
The routine create Process() is the main routine for creating a new process. It first finds an available element in the Processes[] array by calling get_proc_id(). Then, in order to create a new process in Alltalk, the Alltalk interpreter 44 establishes the first context in that new process. It does that by copying appropriate active contexts and block stubs from the creator process to the created (new) process, and then making slight adjustments to the copies. This is best explained using an example.
Suppose an application wishes to create a process that simply prints a message. An example of code to do this is shown in Table 4.3.
[Table 4.3, giving example code that creates a process to print a message, is not reproduced here.]
What contexts and stubs should be copied? Obviously, the Alltalk interpreter 44 must copy the user's block, that is, the one in method my Test. Because a block may refer to its home method's temporaries (though in this case it does not), and because a block's bytecodes are actually contained in its home method, it copies both the block stub and its home. In this case, the home context is the method context associated with the execution of my Test. But this is not enough. Note that the method, Block new Process, which actually sends the message which directly creates the new process (Process for Context:priority:) also creates a block. This block, [self value. Processor terminate Active.], also must be copied; and its home context must be copied as well. In what follows, we call this block the outer block. Note that self in the outer block refers to the user's block.
To summarize: the Alltalk interpreter 44 copies the user's block and its home context (see proc_copy_cntx1() ), plus the outer block and its home context (see proc_copy_cntx2() ). After that, it evaluates the outer block, that is, it creates an active context from the block stub. When the new process becomes active, this, in turn, causes the user's block to be evaluated (as a result of the message self value). When that block finishes, the new process is destroyed (as a result of the message Processor terminate Active).
Referring to the Drawings, Figures 14 through 16 illustrate the relationships between these contexts and blocks. Figure 14 shows a portion of the active context stack and block stub stack of the creator process. The contexts and stubs shown are the ones that are of interest when the Alltalk interpreter 44 creates the new process. Figure 15 shows the active context stack and block stub stack of the created process just after it is created by the interpreter. Figure 16 shows the same stack just after the new process has become active, and the user's block begins to execute.
4.5.2 Switching Processes
Switching processes is fairly straightforward. Before each bytecode is executed, the Alltalk interpreter 44 tests the Divert flag; if set, it switches to the process returned by the routine process Switch(). The routine process Switch() returns an oop; the routine find_process() takes the oop as an argument, and returns a pointer to the corresponding element of the Processes[] array.
The machinery for managing process switches is contained in the module process.c. It follows the implementation described in the standard reference for Smalltalk by Goldberg and Robson, mentioned above.
4.5.3 Destroying Processes
Destroying (i.e., terminating) a process involves two basic steps. First, the appropriate element of the Processes[] array is marked as not in use so it can be reused if needed. Second, the garbage collector (described below) is told to clean up after the process. The routine destroy Process() handles these two tasks.
Processes are destroyed in two situations. The first case is when the interpreter quits. At that time, all active processes are destroyed so garbage collection can be performed correctly. The second case is when a terminate message is sent to a Process object. This second case is implemented via primitives. Note that process 0 is created automatically when the interpreter is initialized; it cannot be destroyed, except by shutting down the interpreter.
4.6 Optimizations
Various techniques are used to improve the run-time performance of the Alltalk tool. These techniques are useful independently of the Alltalk tool. They can be advantageously employed in any Smalltalk-like object-oriented programming tool to improve the runtime performance. We describe these techniques below.
4.6.1 Replacing certain message sends with less expensive processing
This is referred to as message flattening. The Alltalk interpreter 44 detects at runtime if a message send's only purpose is either of the following 2 cases:
1. Return of an instance variable.
2. Execution of a primitive.
The Alltalk compiler 20 flags methods that are of these types, for easy detection at runtime. The Alltalk interpreter 44 will execute the appropriate logic in-line, and modify flags in the bytecode that is being executed, as well as caching in the bytecode itself the class of the receiver. Subsequent executions of the bytecode involved will cause the class of the now current receiver to be checked against the class cached in the bytecode. If it matches, the Alltalk interpreter 44 performs the optimized logic, in-line, without fetching (or executing) the method. Thus this optimization saves the fetching of the method, allocation (and subsequent deallocation) of a new context and interpretation of the method.
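A hedged C sketch of the runtime side of this check follows; the flag values, the fields cached in the bytecode, and the helper names are assumptions for illustration, not the actual interpreter source.

/* Sketch of the in-line "flattened send" fast path (assumed field names). */
enum { FLAT_NONE = 0, FLAT_RETURN_IVAR, FLAT_PRIMITIVE };

typedef struct SendBcode {
    int  flat_kind;        /* set the first time the send is resolved       */
    long cached_class;     /* receiver class seen on that first execution   */
    int  ivar_index;       /* which instance variable to return (case 1)    */
    int  prim_number;      /* which primitive to run directly (case 2)      */
} SendBcode;

extern long class_of(long receiver_oop);
extern long instance_variable(long receiver_oop, int index);
extern long run_primitive(int prim_number, long *args, int nargs);
extern void normal_send(SendBcode *bc, long *args, int nargs);

static long dispatch_send(SendBcode *bc, long *args, int nargs)
{
    long receiver = args[0];

    if (bc->flat_kind != FLAT_NONE && class_of(receiver) == bc->cached_class) {
        /* fast path: skip method fetch, context allocation, interpretation */
        if (bc->flat_kind == FLAT_RETURN_IVAR)
            return instance_variable(receiver, bc->ivar_index);
        return run_primitive(bc->prim_number, args, nargs);
    }

    /* slow path: full send; this also (re)fills the cached fields above    */
    normal_send(bc, args, nargs);
    return 0;   /* the result is delivered through the new context in the
                   real interpreter                                          */
}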
4.6.2 Treating primitives as bytecodes
Rather than have one bytecode just for dispatching primitives, (e.g., an execute_primitive bytecode), in Alltalk, each primitive is its own bytecode. This eliminates the extra level of indirection to get to the code for primitives. As mentioned previously, primitive bytecodes are in the range 0x000 to 0x0FF; other bytecodes begin at 0x100.
4.6.3 Saving a call to the object manager to fetch receiver
If the receiver of a message is the same as the receiver of the sending method, the Alltalk interpreter 44 avoids the call to the object manager to fetch the receiver again. Instead, since in Alltalk a pointer to the receiver is held in the associated context, the Alltalk interpreter 44 gets the receiver pointer from that context.
4.6.4 Replacing 'value' messages with block evaluation
Since evaluating a block is less expensive than sending a message, the Alltalk interpreter 44 attempts to replace send_msg_bcodes with eval_blk_bcodes when possible. The Alltalk compiler 20 recognizes messages with the selector value (and value:, etc.), and replaces them with eval_blk_bcode2 bytecodes. This bytecode is the same as the eval_blk_bcode, except that it must check to see that the "receiver" of the value message is a block. If it is not a block, eval_blk_bcode2 simply returns, and lets processing fall through to the next bytecode which is a send_msg_bcode for the value message; if the receiver is a block, eval_blk_bcode2 operates like eval_blk_bcode, except that it must push the bytecode pointer past the following send_msg_bcode which it replaces.
4.6.5 Caching methods in send_msg bytecodes
Alltalk uses a performance-improving technique, common to most Smalltalk implementations, known as method caching. The technique takes advantage of the fact that while Smalltalk allows polymorphism, a given message often ends up being resolved to the same method every time. How Alltalk takes advantage of this is as follows.
The send_msg bytecode has two extra fields which implement a method cache. One field is likely_class. This saves the class of the receiver of the message when it was last sent. The second field is likely_method. This saves the oop of the compiled method to which the message was resolved last time it was sent. When the bytecode is encountered again, the Alltalk interpreter 44 checks to see if the new receiver's class matches likely_class; if it does, it uses the compiled method in likely_method. If the classes do not match, it must do the normal, more expensive processing to fetch the appropriate method.
Note that in Alltalk, when the cache is used, the Alltalk interpreter 44 calls the object manager to reserve the method object, to insure the object is not garbage collected until it is no longer needed. However, this is less expensive than normal method fetching. Note also that if the cache is not usable (i.e., the receiver's class does not match likely_class), the Alltalk interpreter 44 updates the cache with the receiver's class and the method's oop for the current message.
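A minimal C sketch of this cache check follows; only the field names likely_class and likely_method come from the text above, while the surrounding types and the helper functions passed in are assumptions made for illustration.

    #include <stdint.h>

    /* One-entry method cache carried in the send_msg bytecode (sketch). */
    typedef struct {
        int32_t likely_class;    /* class of the receiver on the previous send */
        int32_t likely_method;   /* oop of the compiled method it resolved to  */
    } method_cache;

    /* Returns the oop of the compiled method to use for this send. */
    int32_t cached_lookup(method_cache *mc, int32_t receiver_class,
                          int32_t (*full_lookup)(int32_t cls),   /* superclass-chain search */
                          void (*reserve_obj)(int32_t oop))      /* pin against collection  */
    {
        if (mc->likely_class == receiver_class) {
            reserve_obj(mc->likely_method);      /* cheaper than a full method fetch */
            return mc->likely_method;
        }
        mc->likely_class  = receiver_class;      /* miss: do the expensive lookup    */
        mc->likely_method = full_lookup(receiver_class);
        reserve_obj(mc->likely_method);          /* and refill the cache             */
        return mc->likely_method;
    }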
4.7 Initialization and Shutdown
The main procedure of the Alltalk interpreter 44 is contained in the module interp.c. It performs various types of initializations, then invokes the bytecode loop by calling exec_bcodes(). When exec_bcodes() returns, main() does some minor clean up, and exits.
Initialization procedures are the following (a sketch of main()'s overall flow appears after the list).
1) Command line arguments are processed. These are parameters passed on the statement used to invoke the runtime environment 22. They include switches for relinquishing control of the keyboard and mouse to the Smalltalk application, and for avoiding the normal system booting procedures. Another parameter is an optional filename; it indicates that the interpreter should get the information for the initial message of the application from that file rather than by prompting the user.
2) Signal handling is set up for the I/O primitives.
3) The object manager 48 is initialized via a call to init_om().
4) The values for the initial message are processed via a call to get_init_vals().
5) Keyboard and Mouse are 'opened' via calls to openMouse() and openKeyboard(), if appropriate.
6) The oops of certain Alltalk objects are referenced in the Alltalk interpreter 44 via global variables. Some of these are fixed to certain oops. For example, true is always oop 257. However, some of the oops referenced via interpreter globals must be determined at start up of the interpreter - they are not fixed forever, just for the duration of the interpreter's run. The appropriate assignments are made by calling initializeOops(). Likewise, certain instance variable indices are referenced by the interpreter via globals. These, too, must be determined at start up. A call to initializeIndices() takes care of this.
7) The first Smalltalk process is established. See the section above on Process Management for more details. The routines createProcess() and init_processor() do most of this work.
8) The display is 'opened' via a call to openDisplay().
9) The bytecodes and context for the first message are built and made the first one to be executed. Basically, the interpreter 44 builds: a) send_msg and return bytecodes for the message startup sent to Class SystemBoot;
b) send_msg and return bytecodes for the user-supplied initial message.
The routines bld_dummy_bcodes() and bld_dummy_cntx() perform these tasks.
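The list above can be summarized as the following outline of main(); the routine names are those given in the text (with their signatures assumed), steps 1 and 2 are elided, and the extern declarations stand in for the real definitions, so this is a structural sketch rather than a linkable program.

    /* Prototypes assumed for the routines named in the text. */
    extern void init_om(void), get_init_vals(void);
    extern void openMouse(void), openKeyboard(void), openDisplay(void);
    extern void initializeOops(void), initializeIndices(void);
    extern void createProcess(void), init_processor(void);
    extern void bld_dummy_bcodes(void), bld_dummy_cntx(void);
    extern void exec_bcodes(void);

    int main(int argc, char *argv[])
    {
        (void)argc; (void)argv;                /* 1) argument processing elided      */
                                               /* 2) signal setup elided             */
        init_om();                             /* 3) object manager                  */
        get_init_vals();                       /* 4) values for the initial message  */
        openMouse(); openKeyboard();           /* 5) devices, if requested           */
        initializeOops(); initializeIndices(); /* 6) well-known oops and indices     */
        createProcess(); init_processor();     /* 7) first Smalltalk process         */
        openDisplay();                         /* 8) display                         */
        bld_dummy_bcodes(); bld_dummy_cntx();  /* 9) bytecodes/context for message 1 */
        exec_bcodes();                         /* the bytecode loop                  */
        return 0;                              /* minor cleanup, then exit           */
    }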
4.8 The debugger
The debugger is named RAID, and it combines many of the features of the standard Smalltalk debugger and the UNIX debugger, dbx.
4.8.1 Overview of the Debugger
RAID (Revised Alltalk Interactive Debugger) is the debugger for the Alltalk system. We designed it to be used for debugging both Alltalk applications code, and the Alltalk system (implementation) itself. RAID provides typical debugger capabilities such as:
- setting break points;
- stepping through program execution;
- tracing various types of information (messages, blocks, bytecodes, processes); and
- displaying values of data structures/variables.
RAID is written in C, and is integrated quite closely with the Alltalk interpreter.
The user interface is a simple command interpreter that looks somewhat like the Unix debugger, dbx, to the user. The command interpreter uses the UNIX utilities lex and yacc to parse input and dispatch the appropriate C routines that perform the tasks of the RAID commands.
4.8.2 Basic Architecture of RAID
There are several versions of the Alltalk interpreter 44, each geared to a particular need. Not all of these interpreters contain RAID. For example, one version is optimized for running debugged applications as fast as possible; leaving out the debugger improves performance considerably. Another version is geared toward the collection of performance statistics; it also does not include the debugger. The version of the interpreter built by default, however, does include RAID.
Conceptually, there are three pieces to the implementation of RAID. One piece is a set of C routines in a library separate from the interpreter, that performs the tasks associated with the RAID commands. Each command has a C procedure associated with it, and that procedure may use other utility procedures to do its work. This first piece is conditionally linked to the interpreter depending on which version of the interpreter is made.
A second piece is the code within the interpreter that can get conditionally compiled into the interpreter itself; by default, it is included, but it can be excluded if debugging is not needed. This code is included when the C compiler switch DEBUGGER is on.
The third piece is a set of global variables and constants that are used to communicate between the first two pieces.
In what follows, we will refer to piece one simply as the debugger; piece two will be referred to as RAID code in the interpreter; piece three will be called debugger globals.
RAID is invoked when the interpreter calls a routine in the debugger called, appropriately enough, debugger(). Flow of control is as follows:
1) RAID code in the interpreter calls debugger().
2) debugger() prompts the user, and invokes the lex/yacc command interpreter.
3) The command interpreter parses and interprets the user input, and calls the appropriate C-procedure with the appropriate parameters.
4) The C-procedure performs the tasks associated with the desired command. This usually results in either display of some information (like the contents of the current context), or the updating of the debugger globals (like turning on or off the switch that tells the interpreter to stop at the next message-send).
5) When the C-procedure returns, either control will be passed back to the interpreter at the point at which it called debugger(), or the debugger goes to step 2. Which path is taken depends on the command just processed. For example, after the continue command executes, control is returned to the interpreter; after the print_active_cntx command executes, the user is given another RAID prompt.
6) When control returns to the interpreter, it continues, executing both normal code and RAID code.
RAID code within the interpreter may call debugger() (step 1 above); it may update debugger globals; or it may display data to the user based on the values of the debugger globals (switches).
4.8.3 Command Interpreter
As previously mentioned, the interactive interface to RAID is a simple command interpreter built using the UNIX utilities lex and yacc.
The utility lex defines what are valid tokens in the RAID "command language"; the grammar defines how these tokens can legally be put together to form commands. In addition, the grammar calls the C-procedure associated with the command, passing the command parameters as arguments.
The following naming/capitalization conventions are employed for tokens:
1) Tokens representing command names are all uppercase, e.g., MSG_STEP.
2) Other terminals have first letter uppercase, all others lowercase, e.g., Hex_numeric.
3) Non-terminals are all lowercase, e.g., help_param.
4.8.4 Implementation of the Commands
This section will give a brief description of how each RAID command is implemented. For each command, we discuss how each of the three pieces of the RAID implementation (debugger, RAID code within the interpreter, and debugger globals) is used. First, we describe the naming/capitalization conventions used in the RAID implementation.
4.8.4.1 Naming conventions
Almost all variables, constants, and procedures that RAID uses begin with the letters "d_" or "D_" (the letter "d" or "D" followed by the underscore character). In addition, we use the following capitalization conventions:
1) RAID global constants are all uppercase, e.g., D_PROMPT_SYMBOL.
2) RAID typedefs and structure definitions are all lowercase, e.g., d_ostat_struct.
3) RAID global variables have first letter uppercase, all others lowercase, e.g., D_init_vals.
4) RAID macros are all uppercase, e.g., D_CRESET().
5) RAID procedures are all lowercase, e.g., d_where().
6) Associated with each command with name command_name is a routine with the name d_command_name().
4.8.4.2 RAID Switches
Some operations of RAID are controlled by two sets of binary switches. One set of switches controls the trace information that is displayed as the interpreter runs, e.g., message sends and returns. The other set holds state information, e.g., which RAID command is currently executing.
Each set of switches is implemented using a global variable bit vector, plus three macros: one for setting a particular switch (bit), one for resetting a particular switch (bit), and one for testing whether or not a switch (bit) is set. The first set of switches uses the global variable D_display_switches, and the corresponding macros are D_DSET(), D_DRESET(), and D_ISDSET(). The second set of switches uses the global variable D_control_switches, and the corresponding macros are D_CSET(), D_CRESET(), and D_ISCSET().
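A minimal sketch of one plausible implementation of these switch sets follows; the variable and macro names come from the text, while the word width and the example bit assignments are assumptions.

    /* Bit-vector switch sets (bit values are illustrative only). */
    static unsigned long D_display_switches = 0;
    static unsigned long D_control_switches = 0;

    #define D_MESSAGES    0x0001UL   /* example display switch */
    #define D_BCODES      0x0002UL   /* example display switch */
    #define D_BCODE_STEP  0x0001UL   /* example control switch */

    #define D_DSET(s)    (D_display_switches |=  (s))   /* set a display switch   */
    #define D_DRESET(s)  (D_display_switches &= ~(s))   /* reset a display switch */
    #define D_ISDSET(s)  ((D_display_switches & (s)) != 0)

    #define D_CSET(s)    (D_control_switches |=  (s))   /* set a control switch   */
    #define D_CRESET(s)  (D_control_switches &= ~(s))   /* reset a control switch */
    #define D_ISCSET(s)  ((D_control_switches & (s)) != 0)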
4.8.4.3 Commands for starting and stopping execution
continue
This command simply continues execution of the interpreter by causing debugger() to do a return. We cause debugger() to return by setting the global variable D_in_debugger to "0" (zero).
quit, restart, rerun
The quit command causes the interpreter to exit; restart aborts the current Alltalk application, restarts the interpreter on the same application, and gives a RAID prompt; rerun is equivalent to restart followed immediately by a continue - it does not re-prompt the user before restarting the application. It is important to do garbage collection before aborting an application, so these commands make sure each active Smalltalk process is explicitly destroyed before aborting. The code does different things depending on the state of the Alltalk interpreter 44 when the command is invoked.
If the bytecode loop has not yet started, the user is forced to get into the bytecode loop (by executing one bytecode, for example) before any of these commands can be used.
If the interpreter is in the middle of an application, i.e., it is inside the bytecode loop, the debugger does longjmp() to an appropriate spot in exec_bcodes() where all active processes are destroyed in order to be sure garbage collection is done appropriately. Then it returns to interp().
If an application has just completed, it is already outside the bytecode loop, so the debugger simply returns to the routine interp(); no garbage collection is needed since all active processes ran to completion.
In either of these last two cases, the debugger sets the appropriate global switch (D_QUIT, D_RERUN, or D_RESTART) so that when it returns to interp, it knows whether to exit, restart, or rerun.
run
The run command is similar to restart, but it is used when the user wants to run a different Alltalk application without leaving the interpreter. The code, then, must clear all breakpoints (since these are probably not meaningful in the new application), and get new values for the interpreter's initial message.
4.8.4.4 Commands for finding out where you are
print_message, where
The where command is analogous to the dbx command of the same name. It prints out the currently active messages, i.e., the message sends that have not yet returned. Only those messages in the currently-active process are printed. The print_message command prints only the most-recently activated (last sent) message. Both commands use the routine d_print_msg() to print the message associated with a given context; where calls this routine on all the contexts in the context stack of the current process; print_message calls this routine only on the current context.
4.8.4.5 Commands for setting breakpoints
stop_at
This command handles a stop set for a particular bytecode type, e.g., send_msg_bcode. If the user enters the command without a parameter, the debugger simply prints out the currently-set stop, if any. If a parameter is given, the debugger stores it into the RAID global variable, D_stop_at_bcode. Bytecodes range from (hex) 0x100 to 0x156; primitives range from (decimal) 0 to 255. The user may specify a bytecode in either range. As the interpreter executes, within exec_bcodes(), before executing a bytecode, it checks the bytecode against D_stop_at_bcode; if it matches, the interpreter calls debugger().
stop_in, delete
These commands handle stops set for particular methods and/or classes and/or selectors. More than one stop can be set at a time; the constant D_MAX_STOPS determines how many stops can be used. Stops are stored in the global array D_stop_in_data. They are identified by number, from 1 to D_MAX_STOPS. The parameters of the stop_in command define a new stop; new stops are added using d_add_stop() called from d_stop_in(). As with stop_at, if invoked with no parameters, stop_in simply prints the currently set stops using d_print_stop(). Stops are deleted using the delete command. Note that deleted stops cannot be re-used.
During interpreter execution, in the send_msg bytecode handler, after each send_msg bytecode is executed, a check is made to see if the just-executed bytecode matches any of the stops. If so, the stop is printed, and debugger() is called.
4.8.4.6. Commands for executing a limited portion of the application
bcode_step
This command simply causes the interpreter to continue execution until the next bytecode is about to be executed. It sets the D_BCODE_STEP switch. During interpreter execution, before a bytecode is executed in exec_bcodes(), this switch is tested; if set, debugger() is called. The switch is reset every time debugger() is called. goto, skip_msg
These commands cause the interpreter to continue execution until a particular message is sent. The message is identified by the process in which it executes, and by its sequence number within that process. With the goto command, the user specifies an absolute message sequence number; with the skip_msg command, he specifies a relative message sequence number. Note that goto also allows the user to specify a particular process; skip_msg uses the current process. The process and message sequence number are stored in D_goto_skip. These are cleared every time debugger() is called.
msg_step
This command is to messages as bcode_step is to bytecodes. It causes the interpreter to continue execution until the next message is sent. It sets the D_MSG_STEP switch. During interpreter execution, after a send_msg bytecode is executed, this switch is tested; if set, debugger() is called. The switch is reset every time debugger() is called.
next_msg
This command is rather more complicated than msg_step. This command is to msg_step as the dbx command next is to the dbx command step. That is, it causes the interpreter to continue executing until the next message at the current level is sent. In order to do this, it must keep track of what the current level was when the command was invoked; this is stored in D_base_cntx. As the interpreter executes a send_msg bytecode, it checks to see if the message just sent was sent from D_base_cntx. If so, then debugger() is called. Also, on every return bytecode, the interpreter checks to see if it is returning from (or past) D_base_cntx. If so, D_base_cntx is set to be the context to which it is returning, the user is given a warning message, and debugger() is called. This is analogous to doing a next in dbx past a return.
return
This command causes the interpreter to continue execution until it returns from (or past) the current context. Basically, d_return() sets the D_RETURN flag, and fills in the global D_ret_from with the current context and process id. ret_bcode checks these; if D_RETURN is set and it is returning from or past the context specified in D_ret_from, the interpreter displays a message and calls debugger(). This simple logic gets complicated because of the optimization that converts message sends into assign54 bytecodes and primitive bytecodes. Note that the user is unaware of these optimizations, so the interpreter makes these optimizations transparent to her. The interpreter uses the switches D_MSG_REPLACED and D_RET_FROM_REPLACED_MSG to keep track of these situations.
4.8.4.7. Commands for using the trace features
The set and unset commands turn on and off, respectively, the various display switches. See the section above on how these switches are implemented. How each of the switches is used is described next.
set/unset bcode
The D_BCODES switch is tested in exec_bcodes() before the interpreter executes each bytecode. If the switch is set, it calls print_bcode() on the bytecode about to be executed.
set/unset context
The D_CONTEXTS switch is tested in exec_bcodes() after the interpreter executes each bytecode. If the switch is set, it calls d_print_cntx() on the current context.
set/unset block
The D_BLOCKS switch is tested by the interpreter when a block is evaluated. If the switch is set, the debugger prints information about the block that the interpreter is about to evaluate. The switch is also tested when the interpreter does a return. If the switch is set, and it is returning from a block, the value returned is displayed. Note that this information is not printed if the debugger is currently executing a next_message command, and the interpreter is at a level below the level at which the next_message command was invoked.
set/unset process
When the D_PROCESSES switch is set, a message is printed whenever a process is created, destroyed, switched, or finished (returns from its first context). The switch is tested in, respectively, createProcess(), destroyProcess(), exec_bcodes(), and ret_bcode.
set/unset message
The D_MESSAGES switch is tested when a message is sent. If the switch is set, the debugger prints information about the message that the interpreter is about to send. The switch is also tested when the interpreter does a return. If the switch is set, and the interpreter is returning from a message (rather than from a block), the value returned is displayed by the debugger. Note that this information is not printed if the debugger is currently executing a next_message command, and the interpreter is at a level below the level at which the next_message command was invoked. Also note that the interpreter takes care of the cases in which a message send is replaced by a primitive or an assign54 bytecode. The assign54 case is handled in send_msg_bcode; the primitive case is handled in send_msg_bcode (for the send) and exec_prim_bcode (for the return).
set/unset receiver
The D_RECEIVERS switch is tested in exec_bcodes() after the interpreter executes each bytecode. If the switch is set, the debugger calls d_print_receiver() on the current receiver.
4.8.4.8. Commands for displaying Alltalk runtime objects
print_global, print_oop, print_receiver
These commands use a database lister to print the contents of an object. The print_global command takes a string as a parameter; it is used for objects such as symbols, Class names, and other global objects. The print_oop command takes an oop (integer) as a parameter. The print_receiver command takes no parameter; it simply causes the debugger to print the contents of the current receiver.
print_temp
This command takes a small positive integer as a parameter. The parameter corresponds to a method temporary of the currently executing method; 1 represents the first temporary, 2 the second, etc. The routine d_print_temp_num() calculates where to find this in the temporaries of the appropriate context on the stack, and prints it as an oop.
4.8.4.9. Commands for displaying Alltalk runtime data
The commands in this section simply print the contents of Alltalk interpreter data structures. They are meant to be used mainly by Alltalk systems (implementation) programmers.
print_bcode
This command simply causes the debugger to print the currently executing bytecode. Note that print_bcode is a general routine, which is also used by the database lister.
print_active_cntx, print_block_stub, print_cntx_of_stub
The interpreter maintains contexts, one for each currently-active message and block, in an array, one array per Smalltalk process. The interpreter also maintains an array (one for each Smalltalk process) for each block that has been set up and is active or has the potential to become active (we call these block stubs). These commands allow the user to print the contents of any of these contexts or block stubs.
The command print_active_cntx takes as a parameter a positive integer which is the index into the array of contexts of the current process. That particular context is printed using d_print_cntx().
The command print_block_stub takes as a parameter a block stub id. This is a positive integer greater than INIT_CNTX_ID. This range of integers is used to track blocks independently of normal objects. The routine d_print_block_stub() translates this id into an index into the array of block stubs for the current process; the appropriate block stub is then printed.
The command print_cntx_of_stub also takes a block stub id as parameter. As with print_block_stub, it finds the appropriate stub; but it uses d_print_cntx() to print the active context associated with that stub, if there is one.
print_process
This command causes the debugger to print the contents of the interpreter data structure associated with a particular Smalltalk process, not including the context stack or the block stub stack.
status
This command is equivalent to executing the following commands, all without parameters:
stop_in (prints method stops, if any);
stop_at (prints bytecode stop, if any);
stat_status (prints statistics collections that are turned on, if any); and
set (prints the display/trace switches that are turned on, if any).
4.8.4.10. Commands for collecting message statistics
A tool for collecting statistics on Alltalk messages is implemented in Alltalk. This tool is invoked from within RAID. Basically, it keeps track of which methods are executed, how many times each is executed, and how much time is spent on behalf of each method and its descendants.
There are two main data structures for keeping these statistics. One is a table which keeps a running total of the message stats for messages which have already returned; the table is stored in the global variable, D_stat_tab. The other is a stack of records, one record for each message which is active, i.e., has been sent but has not yet returned. There is one stack per Smalltalk process, and these are stored in the global array, D_stat_stack. When a method returns, its record is popped from the stack, and 'added' to the table.
We now describe the records used on the stack. A stack record is defined by struct msg_rec. It contains the class and selector of the method; this is used to identify the method. It also contains the class and selector of the method which invoked it. The stack record also contains two pairs of the following form: a time stamp, and a cumulative time. One stamp/cum pair is used to keep track of time spent on behalf of this method and its descendants; the other stamp/cum pair keeps track of time spent in the method only.
In a field called self plus descendants, the statistics tool stores in the start_time sub-field the time at which the method begins executing. When the method returns, it subtracts start_time from the current time, and stores the result in the elap_time sub-field.
In a field called self, the statistics tool stores in the time_stamp sub-field the time at which the method begins executing. When the method itself sends a message, time_stamp is subtracted from the current time and the result is added to the cum_time sub-field, which is initially zero. When control returns to this method, time_stamp is reset. When this method returns, time_stamp is again subtracted from the current time, and the result is added to cum_time. In this way, cum_time keeps track of only the time spent on behalf of this method, exclusive of its descendants.
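A C sketch of such a stack record is given below; the sub-field names start/cum paraphrase start_time/elap_time and time_stamp/cum_time, and the exact layout of struct msg_rec is an assumption, not the Alltalk declaration.

    #include <time.h>

    /* Sketch of one statistics stack record (layout assumed). */
    struct time_pair {
        clock_t start;   /* time stamp: start_time or time_stamp in the text   */
        clock_t cum;     /* cumulative time: elap_time or cum_time in the text */
    };

    struct msg_rec {
        int class_oop, selector_oop;          /* identifies this method         */
        int caller_class, caller_selector;    /* identifies the invoking method */
        struct time_pair self_plus_descendants;
        struct time_pair self;
    };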
We show the distinction between the two pairs of data in Table 4.4.
[Table 4.4, contrasting the self and self-plus-descendants time pairs, is reproduced in the original drawings.]
The stats table is an array of records. Each record is of the type struct method_rec. A record contains a class and selector to identify its method, plus the number of times it has been sent (and returned), plus the total time spent on its behalf, plus the total time spent on behalf of it and its descendants. When a method returns, the routine shown in Table 4.5 is performed.
[Table 4.5, the routine performed when a method returns, is reproduced in the original drawings.]
The routine d_stat_tab_insert() works as shown in Table 4.6.
[Table 4.6, the operation of d_stat_tab_insert(), is reproduced in the original drawings.]
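Since Tables 4.5 and 4.6 are reproduced only in the original drawings, the following is a plausible C sketch (building on the msg_rec sketch above) of the bookkeeping they describe: close out the record's two time pairs, then find or add the method's entry in the table. It illustrates the described behavior; the function name and details are not the actual routines.

    struct method_rec {
        int     class_oop, selector_oop;     /* identifies the method                  */
        long    times_sent;                  /* number of sends that have returned     */
        clock_t self_time;                   /* total time in the method itself        */
        clock_t self_plus_desc_time;         /* total time in the method + descendants */
    };

    void d_stat_on_return(struct msg_rec *rec, struct method_rec *tab, int tab_len)
    {
        clock_t now = clock();
        rec->self.cum += now - rec->self.start;                   /* finish 'self' time */
        rec->self_plus_descendants.cum = now - rec->self_plus_descendants.start;

        for (int i = 0; i < tab_len; i++) {                       /* find or add entry  */
            if (tab[i].times_sent == 0 ||
                (tab[i].class_oop == rec->class_oop &&
                 tab[i].selector_oop == rec->selector_oop)) {
                tab[i].class_oop    = rec->class_oop;
                tab[i].selector_oop = rec->selector_oop;
                tab[i].times_sent  += 1;
                tab[i].self_time   += rec->self.cum;
                tab[i].self_plus_desc_time += rec->self_plus_descendants.cum;
                return;
            }
        }
        /* Table full: a real implementation would handle this case explicitly. */
    }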
Five commands are available from RAID that affect message statistics collection. The command stat_on turns on collection of statistics; stat_off turns off collection. This is done by setting and resetting the switch, D_STAT. This switch is tested by send_msg_bcode (and send_param_msg_bcode) and ret_bcode; if the switch is set, these routines cause statistics collection to be done. Neither command affects the table, but both initialize (empty) the stack. The command stat_reset initializes the stack and empties the table. Any statistics collected up to this point are lost. The table can be printed to the screen or to a file using the stat_print command, and one can determine whether or not statistics collection is on by using the stat_status command.
4.8.4.11. Commands for collecting object manager statistics
The statistics tool also includes a means for collecting statistics related to the object manager 48. The statistics collected are mainly counts of various events, and maximum and minimum values of certain object manager variables/sizes.
Basically, the statistics tool uses two instances (D_ostats and D_obuffer_cnts) of one large structure (d_ostat_struct) to keep various statistics. Just as with the message statistics, the collection of object manager statistics can be turned on and off at any time via RAID commands. Also, as with the message statistics, commands are available for printing object manager statistics to the screen or to a file; for resetting the collection 'table'; and for determining whether or not collection is turned on or off.
4.8.4.12. Commands for getting help with RAID help, short_help
RAID has an on-line help facility. When the user enters the help command with a command name as a parameter, he is presented with a manual page (a la UNIX) for that command. The help files are written in UNIX nroff form. When the user requests help on a particular command, the debugger uses the system() UNIX library routine to invoke the UNIX more command on the appropriate help file.
Note that the grammar translates the parameter (the command name) from a string to a token (i.e., constant) before passing it on to the d_help() routine. The d_help() routine then does a switch based on that constant and displays the correct file.
Invoking the help command without a parameter results in the display of a summary of all commands. Invoking the short_help command (which takes no parameters) causes an even shorter list of all the commands to be displayed.
5. Object Manager
The object manager 48 provides access to objects in the database 40 and in main memory 18. It is used by the compiler 20, interpreter 44, primitives, and utilities. It maintains the database 40 as well as the organization of objects in memory. Object manager 48 is also called by the method-fetcher 50 to fetch methods for the interpreter 44, using the class of the receiver of a message, and the Smalltalk superclass hierarchy. Although the object manager 48 is described herein with reference to the Alltalk tool, the object manager 48 is also useful as a general purpose object-oriented database manager.
5.1 Database Storage Layout
The database 40 consists of 2 UNIX files: db.key and db.prime. The key file provides associative access to the prime file: the access manager 58 hashes into the key file (all of whose records are of fixed length), and finds the address (file offset) of the object in the prime file. The key file record also contains the length of the prime record, so the access manager 58 knows how many bytes to retrieve.
Objects in the prime file are 1 of 6 types: OBJ_REC, a normal Alltalk object as seen by the Alltalk programmer; SYMBOL_XREF, a symbol cross-reference record that contains the string for the symbol and the associated oop of the Alltalk symbol object; and DICT_XREF, which is the Smalltalk dictionary cross-reference record. This dictionary record contains the string that is the name of the global symbol (e.g. Class name), the oop of the associated Alltalk symbol object as well as the object id of the Alltalk object that has that symbol as the object's global name. The other types are CTL_REC, the control record; CKPT_REC, the checkpoint integrity record; and DLT_REC, a logically deleted object record. The key file is divided into 2 parts, an objectKeySpace and a symbolSpace. The objectKeySpace part of the file (which is first in the file) is used to find the address of an object, given the oop (object id). The second part of the file, symbolSpace, is used to find a cross-reference record, given the string associated with a symbol or global. To use the symbolSpace, the access manager 58 hashes the string to get an address in symbolSpace, retrieves the key record at that address, and then proceeds to the prime file to retrieve the cross-reference record, which contains the oop of the object being sought.
The records in the key file are of fixed length, and contain three fields (a C sketch of the record and the hash probe follows the list):
1. the address (byte offset) of the object in the prime file
2. the size of the object (in bytes)
3. the type of the object record
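As noted above, a sketch of the record layout and the hash probe follows; the field names, the hash function, and read_key_record() are illustrative only and do not come from the Alltalk source.

    #include <stdio.h>

    /* Fixed-length key record (field names illustrative). */
    struct key_rec {
        long prime_addr;   /* 1. byte offset of the object in db.prime */
        long obj_size;     /* 2. size of the prime record, in bytes    */
        int  rec_type;     /* 3. OBJ_REC, SYMBOL_XREF, DICT_XREF, ...  */
    };

    /* Hash an oop into the objectKeySpace and read the key record at that slot. */
    int read_key_record(FILE *key_file, long oop, long n_key_slots, struct key_rec *out)
    {
        long slot = oop % n_key_slots;                  /* illustrative hash function */
        if (fseek(key_file, slot * (long)sizeof *out, SEEK_SET) != 0)
            return -1;
        return fread(out, sizeof *out, 1, key_file) == 1 ? 0 : -1;
    }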
Collisions in the key file are handled by chaining the objects in the prime file together. If the object at the address indicated by the key file record does not have an id (oop, or string) that matches the target sought, the access manager 58 follows the 'overflow' chain in the records in the prime file, checking the target against the id until it is found. Fastest access to newest objects is provided by placing them first in the overflow chain.
5.2 Database access manager
The routines in the access manager 58 are called mainly by the buffer manager 54 (when objects are to be retrieved), and by the garbage collector 52 and the transaction manager 46 (when objects are to be added/updated in the database at commit points). They are also called by dictionary and symbol access routines discussed later.
Important to the access manager 58 is a "control record", which is stored as the first record in the prime file. This contains the next available oop (to use for new objects), and the next available address in the prime file (used and then updated when new records are added to the prime file). The control record also maintains certain database statistics, including the last checkpoint id. It is written to the database after every commit is complete, to insure proper restart.
The first call to the access routines will open the Unix files, and put a (UNIX) lock on the files, to assure single user access. The lock check can be overridden for read-only access (as in the database-lister utility). The checkpoint integrity record is also checked by the access manager 58 to make sure that the system was not aborted while a 'commit' was in progress. This record is updated in the database with the checkpoint id when the commit starts. If the first call to the access manager finds the checkpoint id in the control record out of sync with the checkpoint id in the checkpoint integrity record, the access manager 58 aborts (the control record is written to the database only after the commit is successful). The only way to recover is to restore from a back-up.
The fetchit function retrieves an object from the database 40, given a record type and a key. The fetched record is placed in the buffers (see buffer manager, below), along with the disk address of the retrieved record. This will be used if/when the record needs to be replaced in the database.
The storeit function is capable of adding a new object (or replacing same) in the database. First the access manager 58 looks at the record's disk address (which was stored with the record in the buffer, when (if) the record was previously retrieved). If this is not NULL, it knows that the record already exists in the database, and it replaces the record using this disk address (records never change their disk address during a run, except when they are lengthened -- see below). The responsible program must NULL out this address if the record has changed its key, or if the record has lengthened. If the access manager 58 cannot use the disk address, it assumes it has a new record. It looks at the record to be added/replaced to determine its type (OBJ_REC, DICT_XREF, SYMBOL_XREF), and gets the appropriate key record. If no key entry exists for the new record, it sets one up and adds the record to the end of the prime file. If the key entry does exist, a collision results. The access manager 58 fetches the record pointed to by the key, updates the new record's overflow pointer to point to the record currently pointed to by the key record, and then updates the key record to point to the new record being added. This insures that fastest access is to newest records (they are first in the overflow chain).
The forceit function will put a record in the database, but (unlike storeit) checks to see if it is already there. If so, it logically deletes the old copy and adds the new one. This function is called when an object is newly created with an oop that already exists (e.g. a Class), and when an object is lengthened. It uses the storeit function if the new object is not already in the database, or is smaller than the one it is replacing. Else, the access manager 58 gets the old object and logically deletes it (by placing a special mark in the rec_type), and then executes the storeit logic.
Callers of the access methods are expected to have determined the id of the object even if it is a new one. They can call oop_gen to get the next available id. This routine will look at a table (filled in by the garbage collector 52, when an object is deleted) in an attempt to reuse oops. If none are available for reuse (e.g. at start of run), the access manager 58 creates a new one by using and then incrementing a field in the control record that keeps track of the next oop to create.
The function start_commit is called when a commit is started (normally in the transaction manager). This routine updates the special checkpoint integrity record mentioned above. If the run is aborted before the commit is finished, the control record will be out of sync with the checkpoint integrity record, causing subsequent runs to be aborted.
The function chckpt_oop is called when a commit is finished. Presumably, the calling program called start_commit and has now finished writing all of the changed objects to the database (via storeit and forceit), and the database is now in sync with memory. Chckpt_oop will update the control record indicating the commit is finished, and write it to the database. The control record also keeps track of the next oop to use, and the next prime file address to use.
5.3 Buffer manager
The buffer manager 54 maintains the in-memory copy of objects. It is called by the object manager 48 when an existing object is to be fetched or when a new object is to be stored in the buffers. It can be called with the following operations:
1. FETCH_FROM_DB
which means that the caller knows that the object is not in the buffers (buffer manager returns an error if it finds it there), and the object is to be fetched from the database and put in the buffers.
2. FETCH
which means look in the buffers for the object; if it is not there, retrieve it from the database, then update the buffers.
3. STORE
which means that a new object is being added, or an existing one being replaced. The buffers are updated with the new (version of the) object.
4. FORCE
which means that a new version of an existing object has been constructed, and the old one is to be invalidated (this happens when the length of an existing object is changed, or when the 'become' primitive is executed). New space in the buffer is allocated, the object's disk address is set to zeros (the disk address is control data kept with the object in the buffer), and the object table is updated to point to the new spot in the buffer where the new version of the object will be stored.
The buffer manager 54 uses an object table to keep track of which objects are already in the buffers. The table contains the id and a pointer to each object in the buffers. To retrieve an object, the buffer manager hashes into the object table to see if it is already in memory. If not, the object is fetched from the database, placed in the buffer, and the object table is updated. When the buffer manager 54 needs space in the buffers in which to place a new object, a "forced" object, or an object from the database, it calls upon the pool manager 56 to find the space in the correct buffer. In any case, the buffer manager 54 returns a pointer to the object to the calling program.
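A rough sketch of the FETCH path just described follows; the object-table structure, the table size, and the helper functions are assumptions made for illustration, not the Alltalk interfaces.

    #include <stddef.h>

    typedef struct obj_entry {
        long  oop;
        void *buf_ptr;              /* where the object sits in the buffers */
        struct obj_entry *next;     /* hash-chain link                      */
    } obj_entry;

    extern obj_entry *object_table[4096];        /* hashed by oop (size assumed)  */
    extern void *fetch_from_db(long oop);        /* access manager (assumed name) */
    extern void *allocate_slot(long size);       /* pool manager finds a slot     */
    extern long  object_size(const void *db_copy);
    extern void  install_in_table(long oop, void *slot);

    void *buffer_fetch(long oop)
    {
        for (obj_entry *e = object_table[oop % 4096]; e != NULL; e = e->next)
            if (e->oop == oop)
                return e->buf_ptr;               /* already in the buffers            */

        void *db_copy = fetch_from_db(oop);      /* not in memory: go to the database */
        void *slot = allocate_slot(object_size(db_copy));
        /* ... copy db_copy into slot ... */
        install_in_table(oop, slot);             /* update the object table           */
        return slot;
    }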
5.4 Pool Manager
The pool manager 56 maintains memory for the various buffers. It keeps a total of 7 buffers: small slot, medium slot and large slot buffers for methods, another set of 3 for non-method objects, and one buffer, "huge", for oversize objects (methods and non-methods can both go in "huge"). Except for the "huge" buffer, all buffers have fixed size slots. Memory for the buffers is pre-allocated, except for "huge", which is maintained using the UNIX routines: malloc/free.
Pool manager is called to find a spot in a buffer for an object. It uses the size and type (method/non-method) of the object to determine which buffer to search for the empty slot. If a slot is found, a pointer to the slot is returned to the calling program (probably buffer manager 54). It searches for an available slot with the following algorithm (sketched in C after the list).
1. For each buffer a "slot-indicator" is kept, which is the next slot to look at. This is maintained across calls to pool manager, and wraps around when the end of the particular buffer is encountered. It is updated to be one higher than the slot returned the last time the pool manager found space in that buffer.
2. Two searches of the buffer are made, starting at the slot-indicator. On the first pass, a search is made for a slot that is empty, or else holds an object that is not being used (i.e., not in the "in-use" table -- see garbage collector section), and whose usageCount is 0. This usageCount is incremented every time the object manager 48 fetches the object and decremented every time the pool manager 56 looks at the object's slot; it indicates the frequency of access to the object. If a slot cannot be found on the first pass, the usageCount is ignored on the second pass. If a slot cannot be found on the second pass, it means the buffer is filled with objects that are being held by the interpreter 44, and the run must be stopped (memory is exhausted).
3. The object table is updated by removing the entry for the object in the buffer slot that is about to be reused, and an entry in the table is added for the new object just placed in the buffer.
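The sketch below illustrates the two-pass search just described; the structure names, the usage-count handling, and find_slot() itself are illustrative assumptions rather than the Alltalk source.

    #include <stddef.h>

    struct slot   { void *obj; int pinned_in_use_table; int usage_count; };
    struct buffer { struct slot *slots; int n_slots; int slot_indicator; };

    /* Returns the index of a reusable slot, or -1 if the buffer is exhausted. */
    int find_slot(struct buffer *b)
    {
        for (int pass = 0; pass < 2; pass++) {
            for (int i = 0; i < b->n_slots; i++) {
                int s = (b->slot_indicator + i) % b->n_slots;    /* wrap around           */
                struct slot *sl = &b->slots[s];
                if (sl->obj != NULL && sl->pinned_in_use_table)
                    continue;                                    /* in-use: never reuse   */
                if (sl->obj != NULL && pass == 0 && sl->usage_count > 0) {
                    sl->usage_count--;                           /* looked at: decrement  */
                    continue;                                    /* skip on the 1st pass  */
                }
                b->slot_indicator = (s + 1) % b->n_slots;        /* resume here next time */
                return s;
            }
        }
        return -1;   /* memory exhausted: the run must be stopped */
    }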
By having different buffers for small, medium, and large objects, the number of slots and the slot size can be tailored to fit the distribution of object sizes in the database. Alltalk runs faster with fixed size slots in each buffer, since this means that no compaction is required by the garbage collector 52. Different buffers are provided for methods vs. other objects because non-methods are expected to be more volatile in their usage than methods, and to have a different size distribution.
5.5 High Level Object Manager Protocol
The object manager 48 provides a set of high-level functions for object access. It is these functions that are used by the interpreter 44, compiler 20, primitives, and others. The access manager 58, pool manager 56, and buffer manager 54 are used to implement these higher level functions.
A program can call the object manager 48 with a call of NEW in order to establish a new object in memory. The class id of the new object must be supplied. The object manager 48 will fetch the class of the new object, and initialize the new object appropriately. The caller must also supply the number of index variables required. The latter parameter cannot be changed for the object later: to "grow" an existing object the FORCE call must be used. The FORCE call will accept the id of the object to be grown, and set up a new object with the specified quantity of index variables. It is up to the calling program (usually a primitive) to set all other data appropriately.
There are two retrieval routines available. Reserve_obj will fetch a requested object, and lock its position in memory until the current method (and all other users) ends. This is done by putting an entry in the "in-use" table for the process and region that is passed in the call to reserve_obj. This table is described in the garbage collector section; it serves to keep track of which objects have their memory address pinned down (until the garbage collector processes the region specified). The entry does not leave the table until the object is either garbage collected, or, if updated, written to the database. It is the presence of this entry that keeps the pool manager from re-using the object's slot in the buffer. The reserve_obj routine must be used if the caller expects to either update the object, or re-access the object using the pointer returned from the call.
The other retrieval routine is get_obj which also returns a memory pointer to the object requested. This routine will not guarantee that the pointer is valid across calls to the object manager routines. It is used mainly by primitives where only temporary, read-only, access is required. The object manager requires that no calling program cache object pointers except in the interpreter contexts. It also assumes that no program is maintaining local storage of object id's except the interpreter context temporaries, and instance variables of other objects. The reason for these restrictions is that the garbage collector only knows which objects are referenced through the instance variables of other objects (and context temporaries), and only knows which objects have their addresses cached by having entries in the in-use table; all other objects are fair game to be garbage collected. The entries in the in_use table are tagged with the region id (see the garbage collector), and it is assumed that when the region is collected, the memory pointers are no longer required, and the object's buffer space can then be used for other objects.
The object manager logic also depends on the calling program setting the UPDATE flag in the object if the object has been updated. This is the only indication that the object is to be (eventually) re-written to the database. If an object is to be made permanent in the database (even though it has no references from other objects' instance variables), the calling program should call "referenced" to establish this (see the garbage collector section). An updated object's storage in the buffers will not be re-used until after the next commit call. This is assured by an entry being placed in the in-use table for the object, when reserve_obj was called.
New objects and existing objects that have been updated are written to the database when the transaction manager is called to do a commit or when the garbage collector collects region 0 of a process (see the garbage collector section). This latter event happens whenever a process terminates, and at end of run. When an object is written to the database, its UPDATE flag is turned off and (if it is not otherwise pinned down), the pool manager can consider its slot in the buffer for reuse.
5.6 Dictionary and Symbol Access Routines
Routines getdictionary and putdictionary update the Smalltalk dictionary, and retrieve an object given a global name (a string). Similarly, getsymbol and putsymbol get an Alltalk symbol object given the string it represents, and update the cross-reference with a new symbol.
5.7 Method Fetcher
The method fetcher 50 retrieves the appropriate compiled method object given a selector, the receiver's class, whether it is a "send super", and whether the message is to a class or an instance object. It fetches the class and looks up the selector in the dictionary. If not found, it fetches the class's class, and so forth. Normally, it stops in class Object but continues on if the original message was to a class. In this case, it follows the metaclass chain, as described in the standard Smalltalk reference. The method fetcher 50 employs a table to retain the method id, given a selector, class, and method type. It examines this table first before chasing the superclass chain.
6. Garbage Collector
Garbage is defined as objects that are no longer reachable, and therefore can be safely discarded. Since there is no explicit delete command available to the programmer in a Smalltalk language, removal of objects is entirely up to the system. Furthermore, since many objects are transient in an Alltalk session, it is important that the objects be collected efficiently with a minimum of disruption to response time. Although the garbage collector 52 is described in connection with the Alltalk tool, it is useful for garbage collection in any heap based language system (such as Lisp, Prolog, and a variety of object-oriented languages, such as Loops, and Flavors). The garbage collector 52 is integrated with the object manager 48 and interpreter 44.
The garbage collector 52, shown in more detail in Figure 19, includes a collector means 200 for implementing the actual garbage collection function; a region cleaner 202 for detecting regions that have accumulated an excess number of objects, and calling the collector 200 to clean such regions; a cross-process checker 204 for insuring that no object in-use by another process is discarded; and an off-line mark/sweep collector 206 called by the interpreter for periodically removing objects from the database 40 that have become unreachable (directly or indirectly) by any object in the database dictionary. The collector 200 employs an in-use table 101, described below, in executing the garbage collection function.
The following definitions will be helpful in describing the garbage collector.
Processes
A process is a Smalltalk object representing a light-weight thread of control. Multiple processes may exist, but only one is active at any time. Processes in Alltalk adhere to the definition in the standard Smalltalk reference.
Contexts
A context is a Smalltalk object representing the state of a method which is executing. Contexts are analogous to stack frames in procedural languages, with the notable exception that allocation/deallocation does not always obey a strict stack discipline. There is one set of contexts per Smalltalk process. In Alltalk, these are managed by the interpreter, rather than being full- fledged objects. As explained before, however, contexts will be transformed into objects when required (i.e. when an owned block is transformed into an object).
Regions
Regions are not Smalltalk objects. They are used in Alltalk for garbage collection. In Alltalk, each context belongs to a region. Several contexts from the same process may belong to the same region, but a context is associated with only one region, and regions do not span processes. When a context is created, it is assigned a region number. Once assigned, a context's region number never changes. Each object created or accessed is assigned the region number of the context that created or accessed it, unless it was already associated with a region with a lower number. After the number of objects in the 'current' region exceeds a fixed maximum, a new region (with an id one greater than the previous one) is started when the next context is created. Thus the region number is the same or increases as one travels down the context stack from sender to receiver. Referring to the Drawings, Figure 17 shows a context stack for processes 0 and 1. The first two contexts 60 and 62 within the context stack 64 for process 0, belong to the same region (0). The next two contexts 66 and 68 in the stack belong to different regions (1 and 2), and the last two contexts 69 and 70 in the stack 64 are assigned to the same region number (3). The stacks for each process grow in the direction of arrow A, by adding contexts to the tops of the stacks. Figure 18 shows how objects belong to both regions and processes. For example, object 72 belongs to both process 0, region 0, and to process 1, region 1. Object 74, on the other hand, belongs only to process 0 region 0. Object 76 belongs only to process 1 region 1.
Parent/Children objects
If object A refers to object B via one of its instance variables, we call A the parent of B, and B the child of A. When we refer to the transitive closure of A, we mean A's instance variables, and their instance variables, and so on.
6.1 In-use table.
The in-use table 101 in Alltalk keeps track of those objects in memory which must not be overwritten and whose location in memory must not be changed. Typically, such objects fall into one of the following categories.
1. Receivers
In Alltalk, the interpreter 44 retrieves the receiver of a message send, and caches a pointer to it in the corresponding context. Until the context returns, this pointer must remain valid.
2. Methods
In order to process a message, the corresponding compiled method must be retrieved. A pointer to this object (as well as a pointer to the currently executing bytecode within the method) is also cached in the corresponding context. Until the context returns, these pointers must remain valid.
3. Temporary objects
At any given time during the execution of an Alltalk application, any number of method executions may be suspended waiting for the return of a message send. Objects created or updated as the result of the execution of such a method may have to be kept in the in-use table until the method returns. They cannot be written to the database, because they may turn out to be garbage (i.e., created only to hold temporary results). This determination can only be made after the method finishes executing.
The object manager 48 makes one entry in the in-use table for each object that needs to be kept in memory. If an object is referenced from multiple processes, it will have multiple entries, one for each process. However, if an object is referenced multiple times from the same process, it has only one entry for those references. Referring to the Drawings, Figure 20 shows the format of entries in the in-use table 101. An entry has the following fields:
1. A pointer to the object in memory (buffer pointer);
2. The id of the process from which th is object is referenced;
3. The region within that process with which the object is associated; and
4. Pointers for chaining this entry to others in the table.
Entries 100-110 in the in-use table 101 are chained together in two ways. First, all entries for a given object (e.g. object A, 112, entries 100-104) are chained across processes. In this way, the garbage collector 52 keeps track of the fact that an object may be referenced from more than one process. Additionally, all entries for a given process are chained across objects. For example, entries 106-110 are all for process 0. This chain connects objects from tail to head, in order from highest region to lowest, for a given process. This allows the garbage collector 52 to scan all objects within a process from high regions to low regions, in order to collect (discard) unused objects efficiently.
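A C sketch of one in-use table entry (per Figure 20) is given below; the four fields come from the list above, while the C field names themselves are illustrative.

    /* One in-use table entry (field names illustrative). */
    struct in_use_entry {
        void *buf_ptr;                      /* 1. pointer to the object in the buffers        */
        int   process_id;                   /* 2. process from which the object is referenced */
        int   region;                       /* 3. region within that process                  */
        struct in_use_entry *same_object;   /* 4. chain: same object, other processes         */
        struct in_use_entry *same_process;  /* 4. chain: same process, high-to-low region     */
    };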
6.2 Assigning objects to regions.
Objects are put into the in-use table by the object manager, and assigned to regions by the garbage collector as follows. (Note that when an object is 'moved' to another region, it is not physically moved; only its region field in the in-use table is changed.)
-- New objects are put into the table when created, and are assigned to the region of the context in which they were created;
-- Objects retrieved from the database are put into the table, and assigned to the region of the context in which they were retrieved. When the object manager is called to fetch an object, it is (barely) possible that the request contains a region less than that already associated with the object (in the in-use table). In this case, the existing reference is discarded, and the object is re-associated with the lower region. Whenever this is done, objects in the transitive closure are moved to the region of this parent, for any that are currently at a higher region than this parent;
-- When an object is assigned to an instance variable, it (and its transitive closure) are moved to the region of the parent object, if the parent is in a lower region. Note that only those children that are already in the in-use table have to be adjusted; those children that are not in the table do not have to be retrieved from the database; and
-- When a method does a return, the returned object (and its transitive closure) are moved to the region of the context to which it is returned, if the latter is a lower region. Again, only those children that are already in the table and are in a higher region have to be adjusted.
6.4 How the buffers, object table, and in-use table are related.
Referring to the Drawings, Figure 21 shows how the in-use table 101, the object table 120, an object in the buffers 122, and the key file 124 and the prime file 126 of the database 40 are all related. Given an object's id (oop) 128, the object manager hashes the oop to find the entry in the object table 120, and follows the pointer 134 to determine its location in memory (the buffers 122). The object in the buffer 122 has a header portion which is used only by the object manager; it is not visible to the interpreter, and it does not get written to the database 40. In addition to caching the disk address of the object, this header contains a pointer 130 back to the object's entry in the object table 120, and a pointer 132 to the object's first entry in the in-use table 101. Figure 21 also shows how the object address in the key file 124 points to the location of the object in the prime file 126 of database 40. When the object manager cannot find the object in the object table 120, it retrieves the object from the database. It hashes the object's id 128 to access the key file, which contains the actual disk address 140 in the prime file 126 in the database 40.
6.5 Collecting regions.
Most garbage objects are collected by the collector 200, using the following logic. When returning from a method, if the context to which the process is returning belongs to a region with an id at least two lower than the current region number before returning, the regions with id higher than that of the context to which it is returning are collected. Referring to the Drawings, Figure 22 shows, in case 1, a context in region n returning to another in region n. Since the region number is the same, no action is taken.
Case 2 shows a context in region n+1 returning to one in region n. Since n+1 is not two larger than n, no action is taken. Case 3 shows a context in region n+2 returning to one in region n. Since n+2 is two larger than n, the collector 200 collects regions n+2 and n+1, and all other regions having a number greater than n. A region is collected by following the chain of objects in the in-use table for the current process. Starting at the tail of the chain, entries are removed until an entry is reached belonging to the region of the context to which the process is returning. When an entry is removed, a check is made to see if it is the only entry in the table for that object (by checking the cross-process/by-object chain for the object). If it was the only entry for that object, the collector 52 goes to the object's header in memory, and nulls out its pointer to the in-use table. The pool manager 56 then knows that slot can be re-used. If the pool manager 56 decides to reuse the slot, it follows the back pointer to the object's entry in the object table, and deletes that entry.
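A minimal sketch of the return-time check follows; the function and its parameters are hypothetical, and collect_regions_above() stands in for the in-use-table walk just described.

    /* Check made when a context returns (sketch). */
    extern void collect_regions_above(int process_id, int target_region);

    void on_context_return(int process_id, int current_region, int target_region)
    {
        /* Collect only when the return drops by two or more regions (Figure 22, case 3);
           every region with a number greater than the target is then collected. */
        if (current_region >= target_region + 2)
            collect_regions_above(process_id, target_region);
    }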
The above architecture offers performance improvements over others for the following reasons:
a. Storage compaction. Some garbage collectors must compact any storage recovered. Because we have fixed-size slots in our buffer pools, we do not have to compact the object space. This means that our collector need not move objects around in memory; it deals only with the in-use table entries.
b. Evenness of processing. Many (non-reference-counting) garbage collectors do little processing at reference creation time, but wait until the collector is called in order to clean out a region by moving objects to other regions. Our collector does much of its work when cross-region instance variable assignments are made and when processing Smalltalk 'return' statements, which distributes the garbage collection processing evenly throughout the run. This means that the periods when the system is doing garbage collection (and is thus unavailable to the user) are spread evenly throughout the session, and there are no long periods of time when the system is unavailable.
c. Connection with the interpreter. We have integrated the garbage collector with the interpreter in a way that reduces the time spent in garbage collection, which improves overall performance. Because we invoke the collector upon a message 'return', and then move the returned object to another region, we have a natural point where intermediate results and other transient objects associated with the method that is terminating can be safely collected. All objects left in the regions being collected can now be discarded. Thus garbage collection at these points is extremely efficient, involving very little processing.
6.6 Region Cleaning.
It is possible (but rare) for a region to accumulate an excessive number of objects before the above collector is invoked. The region cleaner 202 detects this and the collector 200 "cleans" the region(s) involved. To detect that a region needs to be cleaned, the region cleaner 202 keeps track (by region) of the number of objects accumulated since the last "region cleaning". When this exceeds a certain maximum (e.g., 150 objects), the region cleaner 202 invokes the collector 200 for the region involved. The number of objects in one of the regions is checked every time any new object is created. The region that is checked is the "next" one, which is the region in the same process with a region number 1 higher than that of the region that was checked upon the previous object creation. Thus, for checking, regions are ordered by process number, and then by region number within process. After the last region has been checked for a process, the next to be checked will be region 0 of the process with a process number 1 higher than the previous one that was checked. When all regions within all processes have been checked, the "next" region to be checked is set to be region 0, within process 0.
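Purely to illustrate the rotation and threshold check, a minimal Python sketch (hypothetical names) follows:

# Illustrative sketch only; hypothetical names, not the patented code.
def advance(check_point, region_counts):
    """region_counts[p][r] = objects created in region r of process p since its last
    cleaning; check_point is the (process, region) pair checked on the previous creation."""
    process, region = check_point
    region += 1
    if region >= len(region_counts[process]):   # past the last region of this process
        process, region = process + 1, 0
        if process >= len(region_counts):       # past the last process: wrap around
            process = 0
    return process, region

def check_on_object_creation(check_point, region_counts, limit=150):
    """Check one region per object creation; report it if it needs cleaning."""
    process, region = advance(check_point, region_counts)
    needs_cleaning = region_counts[process][region] > limit
    return (process, region), needs_cleaning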
The region cleaner is a procedure that looks at the region to be cleaned, and all regions with region numbers less than this, within the same process. All updated database objects, and all objects pointed to by the interpreter contexts (within the same process/regions), are marked via direct memory pointers (i.e., receivers and method objects). Then the transitive closure of all objects pointed to by marked objects is marked. However, any object that is not in memory, or not in a region being cleaned, or that is neither a newly created nor an updated object, is not marked. These restrictions limit the number of objects examined during the mark/sweep, and keep the mark/sweep entirely within main memory so that no disk accesses occur.
6.7 Cross-Process References
The above discussion described how contexts within a single process interact with the garbage collector 52. For the most part, processes can be handled independently vis-a-vis garbage collection. As mentioned above, however, objects can be shared across processes, and we must ensure that no object is discarded that is in use by another process. This is handled with the following logic:
1. When the interpreter 44 establishes a new process, it knows which (non-global) objects from the spawning process are being shared with the new process. Upon creation of the new process, the interpreter asks the object manager 48 to place entries for each shared object in the in-use table, at the new process id. The object manager will create entries for the object and its transitive closure at the new process.
It may also happen that one process requests access to an object that is in use by another. When this happens, entries are placed in the in-use table for the requested object, and its transitive closure, for the requesting process.
Thus we see that any object shared between 2 or more processes has entries for each process in the in-use table, and so do objects reachable from the shared object (children, etc.). All entries for a single object (used in multiple processes) are linked together, so it is easy to determine which processes share a given object.
2. The collector will not discard any object if it is in use by another process: when the region for a process is collected, all entries in the in-use table are removed for that process, but the object is not removed from the object table, nor is its space reclaimed, until there are no more processes sharing the object.
3. Whenever an instance variable in an object P is updated with the id of an object C, the cross-process checker 204 checks to see if the new parent (P) is in use at multiple processes. If it is, the child C (and its children, etc.) have entries placed in the in-use table for whatever other processes also share the parent but do not already share the child (etc.). The child is placed in the same region that owns the parent (for the process). This logic is in addition to the region checking between parent and child mentioned above.
It can be seen, then, that any object reachable through an object P that is shared across processes has entries for all children of P (etc.) in each process that shares P. Thus collecting any single process will not remove any object that is still reachable by another process. Only when all processes that are sharing an object have removed their entries from the in-use table will the object manager 48 discard the object and re-use the space.
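The cross-process propagation on an instance-variable update can be sketched as follows (Python, with a simplified in-use table and hypothetical names; illustrative only):

# Illustrative sketch only; hypothetical names, not the patented code.
class InUseTable:
    def __init__(self):
        self.entries = {}                            # (object id, process id) -> region number

    def processes_using(self, obj):
        return {p for (o, p) in self.entries if o == id(obj)}

    def region_of(self, obj, process):
        return self.entries[(id(obj), process)]

    def add_entry(self, obj, process, region):
        self.entries[(id(obj), process)] = region

def transitive_closure(obj):
    """obj plus everything reachable from it through its instance variables."""
    seen, stack = [], [obj]
    while stack:
        o = stack.pop()
        if all(o is not s for s in seen):
            seen.append(o)
            stack.extend(getattr(o, "instance_variables", []))
    return seen

def on_instance_variable_update(parent, child, table):
    """When P gains a reference to C, every process sharing P must also get in-use
    entries for C and its transitive closure, in P's region for that process."""
    sharing = table.processes_using(parent)
    if len(sharing) < 2:
        return                                       # parent not shared: nothing extra to do
    for process in sharing:
        region = table.region_of(parent, process)
        for obj in transitive_closure(child):
            if process not in table.processes_using(obj):
                table.add_entry(obj, process, region)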
6.8 Offline Mark/Sweep Collector
Objects are not written to the database 40 unless they are reachable (at commit time) by some object in the database. An off-line mark/sweep collector 206 is run periodically to remove objects from the database that have subsequently become unreachable. The same utility removes logically deleted objects and re-organizes the database for efficiency.
The basic idea is to "mark" all objects that can be reached in the database, and then, during a second phase (the "sweep"), delete all objects that have not been marked. During the second phase, we also "unmark" all marked records, preparing for the next mark/sweep run.
It is not possible to run the Alltalk system and mark/sweep at the same time, since Alltalk could place new (unmarked) objects in the database which could be incorrectly deleted during the sweep phase. A UNIX lock in the object manager keeps mark/sweep from being started if Alltalk is running (and Alltalk from starting if mark/sweep is running). If mark/sweep is interrupted and re-started, the re-start will first unmark all marked records, and then re-do the mark phase.
Certain objects set up by the compiler are outside the mark/sweep logic: these are mainly constants compiled into methods. These constants are not 'reachable' in the normal way; instead, they have a flag ("PERMANENT OBJECT") set, which causes mark/sweep to treat them as already "marked". Other examples of permanent objects are symbol objects for selectors established by the compiler, or other symbol objects pointed to from the global dictionary. The only way to get rid of these is to completely rebuild the database. This is not a problem if applications avoid putting data in the global dictionary and instead use regular Smalltalk dictionaries (pointed to by the global dictionary).
It is the existence of these "non-reachable" (but permanent) objects that requires us to read all objects in the database in the mark phase (otherwise only the global dictionary entries would have to be processed).
The "root" of reachable trees in the database start at the dictionary records (see the object manager description above). These records have their "PERMANENT OBJECT" flag on and will cause the mark phase to retain them, and their children (see below).
6.8.1 Mark phase
The mark phase reads the database sequentially. It skips over any (already) marked objects, non-permanent objects, and logically deleted objects (the latter objects are explained in the object manager description above). The remaining unmarked permanent objects are processed by:
1. Marking the object and then writing the id of the object to a sequential file (the "reorg file"; the sweep phase will process this), which represents all reachable objects.
2. Placing all of the marked/permanent object's instance variables (its children) in a "kids" table.
Classes have their method dictionary entries placed in the kids table as well, to ensure that the method objects will be marked.
Before processing the next sequential record from the database, the mark phase processes all of the children in the "kids" table first (fetching these from the database, and if they are not already marked: marking them, putting their keys in the reorg file, and adding their children to the kids table). It can be seen that the records placed on the reorg file are in "children depth first" order, which will cluster parents and their immediate children together.
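As a rough illustration of the mark phase loop (Python sketch, hypothetical names and attributes; not the patented utility):

# Illustrative sketch only; hypothetical names, not the patented code.
def mark_phase(database, reorg_file):
    """Sequentially scan the database; for each unmarked permanent object, mark it,
    record it in the reorg file, then drain its 'kids' table before moving on."""
    for obj in database.sequential_scan():
        if obj.marked or obj.logically_deleted or not obj.permanent:
            continue
        kids = mark_one(obj, reorg_file)
        while kids:                               # approximately "children depth first"
            child = database.fetch(kids.pop())
            if not child.marked:
                kids.extend(mark_one(child, reorg_file))

def mark_one(obj, reorg_file):
    """Mark obj, append its id to the reorg file, and return its children's ids
    (instance variables, plus method dictionary entries for class objects)."""
    obj.marked = True
    reorg_file.append(obj.oop)
    kids = list(obj.instance_variable_ids)
    if obj.is_class:
        kids.extend(obj.method_dictionary_ids)
    return kids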
During the mark phase, integrity checking and statistics gathering are also performed.
6.8.2 Sweep phase
First the old database (prime and index) is copied to back-up copies, which will ensure that we can recover it if the sweep phase is interrupted. Then a new database is initialized. Then the reorg file is read sequentially. Each record is processed as follows:
1. Fetch the object indicated by the reorg record from the (old) database. If the fetched object is not a class object, place the id of the last object processed for the class of the fetched object in the fetched object's class chain (this keeps a pointer chain between all objects of the same class). Store away the fetched object's id for use in updating the class chain of the next object of the same class that is processed during this phase.
2. Write the object fetched in step 1. to the new database at the next available byte (i.e., the objects are packed together in the order encountered on the reorg file).
At the end of the sweep phase, the mark/sweep collector 206 updates all of the classes with the first instance of that class (head of class chain), to anchor the class instance chain.
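The sweep phase can likewise be sketched (Python, hypothetical names and database interfaces; illustrative only):

# Illustrative sketch only; hypothetical names, not the patented code.
def sweep_phase(old_db, new_db, reorg_file):
    """Repack reachable objects into a new database in reorg-file order, threading
    a per-class chain through the instances as they are written."""
    last_instance_of = {}                    # class id -> previously written instance (for the chain)
    first_instance_of = {}                   # class id -> first written instance (the chain anchor)
    for oop in reorg_file:
        obj = old_db.fetch(oop)
        if not obj.is_class:
            obj.class_chain = last_instance_of.get(obj.class_id)
            last_instance_of[obj.class_id] = oop
            first_instance_of.setdefault(obj.class_id, oop)
        new_db.append(obj)                   # packed at the next available byte
    # Anchor each class-instance chain at the first instance of the class.
    for class_id, head in first_instance_of.items():
        new_db.update_class_head(class_id, head)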
The sweep phase (like the mark phase) keeps various statistics and does integrity checking as it goes along, and reports them out at the end.
6.9 Transaction Management
When region 0 of a process is collected, that process has ended, and all objects created by that process that are reachable from the database are written to the database by the garbage collector 52. To accomplish this, the collector will signal that a commit is in process, and then write out all objects that remain in the process (all have been moved to region 0 by this time) and which cannot be garbage collected. Any object that is also shared by another process is not written out, since this will be taken care of when that other process terminates. Note that the shared object could be garbage collected between the time when one sharing process terminates and the other sharing process terminates. Not writing the object out when the first process terminates results in fewer "garbage" objects being written to the database.
A commit routine flushes objects to the database that are reachable from database objects. An abort routine invalidates all objects in the buffers which have been updated or created since the last commit. This forces subsequent accesses to these objects to be fetched from the database, and thus effectively "backs out" any changes since the last commit.
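A minimal sketch of the two routines (Python, hypothetical buffer and database interfaces; illustrative only):

# Illustrative sketch only; hypothetical names, not the patented code.
def commit(buffers, database):
    """Flush updated or new objects that are reachable from database objects."""
    for obj in buffers.resident_objects():
        if obj.dirty and database.is_reachable(obj):
            database.write(obj)
            obj.dirty = False

def abort(buffers):
    """Invalidate everything touched since the last commit, so the next access
    re-reads the committed copy from the database."""
    for obj in buffers.resident_objects():
        if obj.dirty:
            buffers.invalidate(obj)          # backs out the change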
7. Logic Facility
Next we describe ALF, the Alltalk Logic Facility, which gives the Smalltalk programmer logic programming capabilities, integrated in a natural way with the object-oriented programming paradigm. The word ALF stands for both the programming language (which is an extension to Prolog), and the runtime logic used to maintain, compile, and execute ALF programs.
7.1 Introduction
ALF is written entirely in Alltalk, and runs under the Alltalk system like any other application. Facilities are provided to compile logic programming statements, to group them into programs, and to submit logic queries against ALF programs. All of these features can be invoked from any Alltalk program, and answers to queries can be subsequently used in Alltalk programs. Since ALF is implemented in the Alltalk system, ALF also provides permanence for its objects, i.e., rules, facts and queries.
In the following text Smalltalk classes are capitalized, and in general Smalltalk nomenclature is in italics or boldface. Multiple-word keywords are run together, with capital letters indicating word breaks, as in solveQuery.
7.2 ALF Language
7.2.1 Relationship to Prolog and LOGIN
The ALF language is similar to the LOGIN language developed by Hassan Ait-Kaci and Roger Nasr, which in turn is an extension to Prolog. ALF differs from LOGIN in some details of syntax, and in its integration with the Smalltalk language. Both ALF and LOGIN generalize unification by taking into account a lattice relationship among types, which in the case of ALF is the Smalltalk class hierarchy. Both ALF and LOGIN also generalize the syntax for terms to allow "attribute labels", which for ALF are taken as identical to the Smalltalk (names of) instance variables.
7.2.2 Definition of ALF
As in Prolog, ALF statements are made up of clauses, which have a head, followed by an arrow, followed by a tail. The head is a single atom, while the tail is a list of atoms separated by commas. Clauses with both a head and a tail are called rules, those with only a head are called facts, and those with only a tail are called queries, as in standard Prolog terminology. Again, as in Prolog, atoms are comprised of predicate symbols with arguments (called terms). The terms are named (unlike Prolog) rather than being positional, and (again, unlike Prolog) can be typed. The type is indicated by the name of a Smalltalk class, and the type itself can be further qualified by giving additional term values for the type class (and these may again be typed, and so on, indefinitely).
Unification of atoms in ALF is the same as in Prolog, except that the unification of logic variable terms takes into account the typing of the logic variable. The following examples will make clear how this works.
7.2.3 Example of ALF rules
Here is an example of an ALF rule:
Hearty(thing=Person(name=X:))←
Healthy(thing=Person(name=X:, age=Y:)), Less Than(smaller=Y:, larger=100).
In this example, Hearty, Healthy, and Less Than are all the names of (Smalltalk) subclasses of class Predicate. Hearty and Healthy have at least one instance variable called thing. It may be that there are other instance variables in Hearty and/or Healthy, but there is no way to tell from the rule's specification. Similarly, Less Than has at least two instance variables, called smaller and larger. Person, which works like a Prolog functor, is merely some subclass of class Object. It has at least two instance variables called name and age. Anything followed by a colon is (the name of) a logic variable, so X: and Y: are both logic variable names.
The rule states that anything that is a person, is healthy, and whose age is less than 100 is also hearty. If we have an object in the Alltalk system of class Healthy whose instance variable thing has an assigned value that is of class Person, and if this Person object has an age that is smaller than 100, the ALF resolution mechanism, when applied against the above rule, will allow us to assert that the name of this person is also the name of a hearty person.
Now consider the similar rule: Hearty(thing=Person(name=X:)) ←
Healthy(thing=Z:Person(name=X:, age=Y:)),
Less Than(smaller=Y:, larger=100).
Typing the logic variable Z: allows the ALF unification rule to consider objects of subclasses of class Person (as well as objects of class Person itself) to unify with the thing object. Thus, suppose we have an object in the Alltalk system of class Healthy whose instance variable thing has an assigned value that is of class Child. Further suppose that class Child is a subclass of class Person. Thus class Child also has the instance variables name and age, inherited from class Person. The ALF unification algorithm will allow the first atom of the tail of the above rule to unify with our fact, and our instance of Child (which we assigned into the thing attribute) will unify with the "Person(name=X:, age=Y:)" term, binding X: to the name that occurs in our specific instance of the class Child. If this instance's age (now bound to Y:) is less than 100, the ALF resolution mechanism will allow us to assert that the name of our child is also the name of a hearty person.
It is not required to type the instance variables at any level. For example the rule
Hearty(thing=X:)← Healthy(thing=X:).
asserts that any thing that is healthy is also hearty. On the other hand, typing one of the logic variables in the above:
Hearty(thing=X:)← Healthy(thing=X:Person).
asserts that healthy persons are also hearty (and so are any healthy things that happen to be instances of subclasses of class Person). Type qualification can be nested indefinitely. Thus we may have:
Hearty(thing=X:)←
Healthy(thing=X:Person(profile=Profile(age=W:, country=Y:,
hobby=Sport(name="jogging",level=Z:)))),
Less Than(smaller=W:, larger=65),
Sports Loving(Y:),
Level Less Than(lower="novice", higher=Z:).
which means that any person that is healthy, is less than 65 years of age, is from a sports-loving country, and has a hobby of jogging with an expertise level greater than "novice" is hearty.
The syntax of the ALF language is discussed further in the section on the lexical analyzer.
7.2.4 Built-in Predicates in ALF
Unification is accomplished through a method in class Object. This method is overridden for built-in predicates (like Less Than and Level Less Than in the above examples). Thus Smalltalk polymorphism allows one to specify different unification algorithms for each of the built-in predicates. It should also be noted that the unification algorithm tests for "=", implying that the "=" selector will be resolved in the class of the first unificand: another example of how Smalltalk's polymorphism is used during unification.
As a further integration of ALF and Smalltalk, we have established the following built-in predicates as subclasses of class Predicate: Send 0, Send 1, Send 2, ..., in order to send Smalltalk messages from ALF programs. These predicates take arguments receiver, answer, selector, and n additional arguments. The receiver is the receiver of the message to be sent, the answer is the object returned from the message send, the selector is that of the message send (i.e., a Symbol representing the selector to accomplish the message send), and the remaining arguments, if any, are arguments to the message send itself. The unification algorithm in these Send N predicates causes the indicated message to be sent. The receiver must be bound, as must the selector. The message is sent, and the result is either bound to the answer or checked against it, depending on whether the answer is free or bound in the goal being proved.
In order to provide access to Alltalk objects that are not Predicates (or subclasses of class Predicate) we have established the built-in predicate Exists(is=X:). This will answer true if its single argument exists in the database. Thus
Same Names (ss No=X:) ← Exists (is=Person (first Name=Z:, last Name=Z:, ss No=X:)).
when invoked by the query
← Same Names (ss No=X:).
will cause the database to be scanned for all objects of class Person (and subclasses thereof) with the same first and last name. This Exists built-in predicate will allow any object in the database to be considered an atom, without the need to explicitly set up predicates and assign these objects to their arguments. That is, all ALF programs implicitly assume a set of facts: Exists(is=X:), where X: is any object in the Alltalk database. Exists may appear only in the tail of a clause, not in the head.
7.3 ALF Programs
In ALF, clauses are grouped into Alf Programs. An instance of class Alf Program has an instance variable rule Dictionary, which contains lists of the clauses (rules and facts) belonging to the program, keyed by the head predicate. As in standard Prolog, the order within the lists is the order of assertion, and the ALF resolution mechanism respects this. Other instance variables of Alf Program are author, date, comment, and name. Removal of a clause from a program's rule Dictionary provides a Prolog-like retract facility. Addition of a clause to a program gives a Prolog-like assert facility.
There is a class variable in Alf Program, called Pgm Dictionary, which registers all of the ALF programs in the system, keyed by the program's name. Queries in ALF are submitted against a specific Alf Program. Throughout execution of the query, the resolution mechanism looks first at the rule Dictionary for the program requested. If a rule with the appropriate head is not found there, it looks at the rules in the rule Dictionary for the program alf Built In. This is the way that programs can all share common rules (like the built-in predicates, and others, like the ubiquitous append).
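The lookup order during resolution can be pictured as follows (Python sketch, hypothetical names; illustrative only):

# Illustrative sketch only; hypothetical names, not the patented code.
class AlfProgram:
    def __init__(self, name):
        self.name = name
        self.rule_dictionary = {}      # head predicate class -> list of clauses, in assertion order

    def add_rule(self, head_class, clause):
        self.rule_dictionary.setdefault(head_class, []).append(clause)

def rules_for(goal_head_class, program, builtins):
    """Look in the program's own rule dictionary first, then fall back to the
    shared built-in program (built-in predicates, append, and the like)."""
    rules = program.rule_dictionary.get(goal_head_class)
    if rules:
        return rules
    return builtins.rule_dictionary.get(goal_head_class, [])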
7.4 Object representation of ALF clauses
All clauses are represented in Alltalk as instances of class Clause, whether they are ALF rules, facts, or queries. Included in the instance variables of class Clause are head and tail. If head is nil, we have a query. If tail is nil, we have a fact. Head must be of class Predicate, or a subclass thereof; tail is a Linked List, whose links must be of class Predicate, or a subclass thereof. An example of compiling a rule is given below. The compilation process merely consists of setting up the appropriate instance of class Clause, and assigning to the head and tail the appropriate objects. If the fields (instance variables) in the predicates are further specified, we set up instance objects of the appropriate class and initialize the predicates' instance variables to these objects. For any instance variable not specified (either in the predicate or elsewhere in the terms), we set up separate instances of class Logic Variable and initialize these unstated instance variables appropriately.
As an example of the compilation process, consider the first "Hearty" rule specified above. To compile this we do the following:
1. Set up a new instance of class Clause to hold the rule. Call it new Clause.
2. Compile the head of the rule.
A. Set up an instance of class Hearty. Call it new Hearty.
B. Set up an instance of class Person. Call it new Person. Set its instance variable name to a new instance of class Logic Variable which will be known by the user as X:. Set other instance variables, if any, in the new Person to new (anonymous) instances of class Logic Variable.
C. Assign new Person to the thing instance variable in the new Hearty. Set any nonspecified instance variables in new Hearty to new (anonymous) instances of class Logic Variable.
D. Assign the new Hearty into the head instance variable of new Clause.
3. Build up the tail.
A. Make a new instance of class Healthy. Call it new Healthy.
B. Make a new instance of class Person, call it new Person2, and assign its name instance variable from the same logic Variable assigned in the head (X:). Assign into the instance variable age in new Person2 a new instance of class Logic Variable, which will be known to the user by the name Y:.
C. Assign the new Person2 into the thing instance variable of the new Healthy. As above, initialize any unspecified instance variables to new, anonymous instances of Logic Variable.
D. Assign the instance new Healthy into the tail linked list of the new Clause.
E. Build a new instance of class Less Than, called new Less Than, and assign to its instance variable smaller the appropriate instance of Logic Variable, which has already been set up (Y:). Assign to the instance variable larger the integer object 100. Set any uninitialized instance variables to new (anonymous) Logic Variables.
F. Attach the new Less Than to the tail linked list in the new Clause.
4. Attach the new Clause to the rule Dictionary of the appropriate Alf Program.
The representation of clauses, predicates, atoms, and logic variables as Smalltalk objects, and particularly the fact that an ALF term can be any Smalltalk object (and vice versa), is the key idea in the integration of ALF with the rest of the Alltalk system.
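The construction steps above can be pictured with the following sketch (Python, with dictionaries standing in for the Smalltalk predicate and term instances; names are hypothetical and the sketch is illustrative only):

# Illustrative sketch only; dictionaries stand in for Smalltalk instances.
class LogicVariable:
    def __init__(self, user_name=None):
        self.user_name = user_name          # None for an anonymous logic variable

class Clause:
    def __init__(self, head=None, tail=None):
        self.head = head                    # nil head -> query
        self.tail = tail or []              # nil tail -> fact

# Compile:  Hearty(thing=Person(name=X:)) <-
#               Healthy(thing=Person(name=X:, age=Y:)), Less Than(smaller=Y:, larger=100).
x, y = LogicVariable("X:"), LogicVariable("Y:")
head = {"class": "Hearty", "thing": {"class": "Person", "name": x}}
tail = [
    {"class": "Healthy", "thing": {"class": "Person", "name": x, "age": y}},
    {"class": "LessThan", "smaller": y, "larger": 100},
]
new_clause = Clause(head, tail)             # then attached to the program's rule dictionary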
7.5 Use of ALF within Alltalk by Application Programmers
From the above, it can be seen that all clauses in ALF are simply objects in Alltalk, which means that the application programmer can move between the logic system (ALF) and the object system (Smalltalk) without converting data between the two systems.
The programmer can write Smalltalk methods that dynamically construct clause objects and insert them as rules in an Alf Program (or, for that matter, dynamically construct new Alf Programs). More commonly, the rules can be submitted as strings (like those above) from the program development environment, interactively, by the programmer. The strings will then be compiled to the appropriate clauses and stored in the database, awaiting query submission. The ALF compiler is described in a subsequent section.
Queries too can be submitted interactively as strings, compiled by the system, and the answers returned (as in standard Prolog systems). More commonly, the programmer can build up ALF queries from Smalltalk programs and submit them to existing Alf Programs without ever building a string representation of the query. The idea is that some objects created by an application will have instance variables that are best calculated "procedurally", via normal Smalltalk, and others that are best calculated via the logic system. The application will first calculate the values of the "procedural" instance variables, and fill in the remaining ones with appropriate instances of class Logic Variable. The constructed object can now represent a term to the logic system. Next an instance of the appropriate Predicate will be created, and the term will be assigned into an instance variable of this predicate. Now we have a query. The application will then submit the query to the appropriate Alf Program, and the values of the logic variables that are returned can be used to fill in the "non-procedural" instance variables of the original object replacing the previously assigned Logic Variable instances. The fully instantiated object can then be used in subsequent application logic.
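This workflow can be sketched as follows (Python; the names Recommends and solve are hypothetical, and the dictionary stands in for the partially filled application object; illustrative only):

# Illustrative sketch only; hypothetical names, not the patented code.
class LogicVariable:
    pass

def build_and_run_query(person, alf_program):
    """person is a dict whose 'procedural' slots are already filled; the remaining
    slots hold LogicVariable markers to be filled in by the logic system."""
    person["hobby"] = LogicVariable()                 # a slot left for the logic system
    query = {"class": "Recommends", "thing": person}  # wrap the term in a predicate instance
    bindings = alf_program.solve(query)               # submit the query; returns a binding map
    if bindings is not None:
        for slot, value in person.items():
            if isinstance(value, LogicVariable):
                person[slot] = bindings[value]        # replace markers with their bindings
    return person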
We now examine the logic in the chief components of the ALF system.
7.6 Logic of the ALF Compiler
7.6.1 Overview
The ALF compiler is a combination of an ALF program and some Smalltalk programs. Figure 23 shows an overview of the ALF compiler, which operates as follows: a new instance of the class Alf Compiler 210 is established to compile a rule. A message is sent from an Alltalk application program 212 to the new instance 210. The parameters in the message are the rule 214 and the name 216 of the ALF program that the rule is for (the rule is in the form of a string). The compiler instance 210 will set up a new instance of an Alf Lexer 218, and pass the rule 214 to be compiled to it. The instance of Alf Lexer 218 will turn the rule string into a list of tokens 220, passing this back to the Alf Compiler instance 210. The compiler 210 will then set up a logic query 222 using the token list and establish an Alf Query instance 224 to process it. The Alf Query instance 224 will process the query against a specific ALF program called #alf Parser 226. If the query is solved by the Alf Query 224, the Alf Compiler 210 will return this indication to the original program 212, after updating the ALF program 230 with the compiled rule 228. The ALF program 230 is the one whose name 216 was specified by the application 212. The string submitted for compilation by the user is passed to an instance of class Alf Compiler via the message
alf Compile: a String for Pgm: a Pgm Name comment: a Comment
which includes the string to be compiled, the name of the ALF program that is to include the string as a new clause, and a user comment to document the new clause. The compiler passes this string to an instance of Alf Lexer, via the message
alf Scan: a String
which returns a list of tokens, which is an instance of class Alf List. This Alf List instance that is returned behaves just like a Prolog list, and contains instances of class Alf Token. An Alf Token has, as instance variables, a type (for the parser to identify the kind of token) and a value (to be used in the code generation process). The compiler passes the token list as a query to an ALF program (called #alf Parser) 226, which will parse the token list and construct the clause object (as in the above example). The clause object is returned bound to one of the variables in the logic query. The logic query constructed looks like:
<- Is Clause (token List=from Lexer, obj=X:, compiler=self).
where from Lexer is the object returned from the Alf Lexer, and X: is a logic variable that will be bound by the query processor to the clause object that represents the input string.
Once returned, the clause object will be added to the other clauses in the Alf Program which was specified when the clause string was submitted. The message that accomplishes this is
add Rule: a Clause
which is sent to the Alf Program specified by the programmer when the input string was submitted. Adding a new clause to a program causes certain optimization logic to be executed, as will be explained in a subsequent section.
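The compilation pipeline can be sketched as follows (Python, with hypothetical names standing in for the Smalltalk messages above; the lexer and parser stand-ins are placeholders, illustrative only):

# Illustrative sketch only; hypothetical names, not the patented code.
def alf_compile(rule_string, program_name, comment, programs, parser_program):
    """String -> token list -> parse query -> clause object -> added to the named program."""
    tokens = alf_scan(rule_string)                       # lexer: string to token list
    clause = solve_parse_query(parser_program, tokens)   # query against the #alf Parser program
    if clause is None:
        return "syntax error"
    program = programs.setdefault(program_name, {"rules": []})
    program["rules"].append({"clause": clause, "comment": comment})   # addRule:
    return clause

def alf_scan(rule_string):
    """Placeholder for the Alf Lexer; a fuller lexer sketch appears in the next subsection."""
    return rule_string.split()

def solve_parse_query(parser_program, tokens):
    """Placeholder for submitting 'Is Clause(token List=..., obj=X:)' to the parser program."""
    return {"tokens": tokens}      # pretend the bound X: (the clause object) is returned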
7.6.2 ALF Lexical Analyzer
The lexical analyzer 218 routines are all in class Alf Lexer. The primary message is
alf Scan: a String.
where a String is the string to be scanned. The Alf Lexer is organized as a finite state machine, and looks one character ahead to determine the next state to assume. The lexer removes all white space from the input string (blanks, tabs, new lines), as well as any ALF comments (which are designated by including text in single quote marks).
The states assumed by the lexer are:
0. Processing the first character of a new token.
1. Processing the interior of an identifier name (i.e. class name or instance variable name).
2. Processing the last character of an arrow symbol (i.e. the '-' in '<-'), which separates the head from the tail of a clause.
3. Processing the interior of a number, to the left of an optional decimal point.
4. Processing the first character after the minus sign ('-') in a negative number.
5. Processing the first character ('_') of an anonymous logic variable.
6. Processing the interior of a number, to the right of an explicit decimal point.
7. Processing the interior of a string constant. Strings are enclosed in double quotes (") in the source text.
8. Processing the interior of a symbol. Symbols begin with a '#'.
9. Processing the interior of a symbolic constant. These begin with a '%', and stand for the constants: nil, true, false. Class objects can also be designated via symbolic constants by following the '%' with a class name.
The lexer assumes a new state based upon the state it is in and the look ahead character (i.e. the next character in the input string). Before switching to state 0, we will have consumed a lexeme and be ready to output a token. This logic is handled via the "accept" methods, which are:
1. accept LP: output token type and associated value is "(".
2. accept RP: output token type and associated value is ")".
3. accept EQ: output token type and associated value is "=".
4. accept COMMA: output token type and associated value is ",".
5. accept CUT: output token type is #CUT and associated value is a new instance object of class Alf Cut. The Alf Cut objects denote a Prolog type cut and are represented by a '!' in the input string.
6. accept Arrow: output token type and associated value is "<-".
7. accept LB: output token type is "[". This represents the start of an Alf List (which is like a Prolog list). The associated value is an Alf Empty List if the look ahead character is an "]". Otherwise, the value is a new instance of Alf List.
8. accept RB: output token type and associated value is "]".
9. accept BAR: output token type and associated value is "|". The "|" indicates the beginning of the tail of an Alf List, as in standard Prolog.
10. accept Number: output token type is #CONSTANT and associated value is an instance of either class Integer or Float, depending on whether the input string had no decimal specified, or had an explicit one specified.
11. accept String: output token type is #CONSTANT, and the associated value is an instance of class String, as taken from the input.
12. accept Identifier: the lexer looks the identifier up in a symbol dictionary, which is maintained by the Alf Lexer. If the identifier is in the dictionary, the associated token is used as the output. If it is not in the symbol dictionary, it is added and a token is associated as follows:
a. If the first character is uppercase, and the last is a colon (":"), a token is set up with type #logic Var and value a new instance of class Logic Variable.
b. If the first character is uppercase, and the last is not a colon (":"), and the string is the name of some Smalltalk class, then a token is set up with type #predicate Name or #class Name depending on whether the string is the name of a class that does not or does have class Predicate in the superclass chain.
c. If the string is "_:", a token is set up whose type is #logic Var and whose value is a new instance of Logic Variable. This represents an anonymous logic variable.
d. If none of the above cases hold, a token is set up whose type is #inst Var Name, and whose value is the symbol which is the same as the input. This represents the name of some instance variable. The parser will check that the instance variable does belong to the specified class.
13. accept Char: output token type is #CONSTANT, and associated value is the instance of class Character that is the same as the input.
14. accept Symbol: output token type is #CONSTANT, and associated value is the instance of class Symbol that is the same as the input.
15. accept Symbolic Constant: output token type is #CONSTANT, and associated value is the instance nil, true, or else the Class object, that is represented by the input string.
After accepting a token, the lexer puts it in the evolving Alf List, and reverts to state 0. When all tokens have been constructed, the lexer returns the Alf List, unless an error was detected, in which case it returns the appropriate error.
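The one-character-lookahead state machine can be pictured with the following sketch, which covers only a few of the states and token kinds (Python, hypothetical names; the identifier classification is simplified and this is not the patented lexer):

# Illustrative sketch only; handles a few token kinds, with hypothetical names.
def alf_scan(text):
    tokens, state, lexeme, i = [], 0, "", 0
    while i < len(text):
        ch = text[i]
        if state == 0:                                    # first character of a new token
            if ch.isspace():
                i += 1
            elif ch in "(),=":
                tokens.append(("PUNCT", ch))              # acceptLP / acceptRP / acceptEQ / acceptCOMMA
                i += 1
            elif ch.isalpha():
                state, lexeme = 1, ch                     # start of an identifier
                i += 1
            elif ch.isdigit():
                state, lexeme = 3, ch                     # start of a number
                i += 1
            else:
                raise ValueError("unexpected character: " + ch)
        elif state == 1:                                  # interior of an identifier
            if ch.isalnum() or ch == ":":
                lexeme += ch
                i += 1
            else:
                tokens.append(accept_identifier(lexeme))  # lookahead ends the lexeme
                state, lexeme = 0, ""
        else:                                             # state 3: interior of a number
            if ch.isdigit():
                lexeme += ch
                i += 1
            else:
                tokens.append(("CONSTANT", int(lexeme)))
                state, lexeme = 0, ""
    if state == 1:                                        # flush a trailing token
        tokens.append(accept_identifier(lexeme))
    elif state == 3:
        tokens.append(("CONSTANT", int(lexeme)))
    return tokens

def accept_identifier(lexeme):
    """Crude stand-in for acceptIdentifier; the real lexer consults a symbol dictionary."""
    if lexeme[0].isupper() and lexeme.endswith(":"):
        return ("logicVar", lexeme)                       # e.g. X:
    if lexeme[0].isupper():
        return ("className", lexeme)                      # class or predicate name
    return ("instVarName", lexeme)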
7.6.3 Parser and code generator
As explained above, this is an ALF program and consists of clauses that parse the Alf List passed by the Alf Lexer, and build up the objects that represent the clause. In the main, the objects necessary have already been constructed as the values of the various Alf Tokens in the Alf List passed by the Alf Lexer. Modification of these objects is accomplished in the parser by using the built-in predicates: Alf Send 0, Alf Send 1, Alf Send 2, and Alf Send 3. These predicates cause message sends to occur that will modify the objects in the Alf Token values.
Eventually, the parser rules will cause the final clause object to be created, and this is passed back to the compiler. If an error is discovered, an error message is passed back instead.
A complete listing of the ALF rules for the parser/code generator can be found in Table 7.1.
7.6.4 Optimizations in class Alf Program
The clause object that is passed back from the parser to the compiler is then sent to the Alf Program specified in the original compilation message. The message to update the ALF program is:
add Rule: a Clause,
where a Clause is that returned from the parser. The receiver of this message is the Alf Program specified by the programmer. If this program does not already exist, the ALF compiler 210 will set it up.
Class Alf Program contains the necessary methods to update an Alf Program with a new clause, and to delete old clauses. Each Alf Program includes the following instance variables: clause Lists, which is the list of all clauses belonging to the program, and rule Dictionary, which contains lists of clauses in the Alf Program, keyed by the class of the head atom of the rule. Thus each element in this rule Dictionary is a sub-list of clauses contained within the program, all of whose heads belong to the same class (this class being the key to the dictionary).
To add a new clause, the message
add Rule: a Rule
is sent to the appropriate Alf Program, and will execute the following logic:
1. Determine if rules already exist for the program with the same head as the new rule. If not, set up a new (empty) rule list and add it to the dictionary with a key that is the class of the head of the new rule.
2. Add the new clause to the linked list of rules that belong to this program (rule List).
3. If the new rule's head already existed in the rule Dictionary, this single rule is optimized as follows:
a. In the link object that links the new rule to the other rules for this program (in the Linked List clause Lists), set up an array with size equal to the number of atoms in the tail of the new rule. We call this array rule Array.
b. At each element of the rule Array, place the list of rules that could unify with the corresponding atom of the tail of the new rule. This list comes from the program's rule Dictionary, keyed by the class of the atom of the tail.
c. If no rule list is found in rule Dictionary, look in the ALF program #Alf BuiltIn for built-in rules that will unify. If found, update the link's rule Array accordingly.
4. If the rule being added contains a head that was not previously in the program's rule Dictionary, optimize all rules in the program (including the new one) according to the above logic.
5. If the rule being added is for the program #Alf BuiltIn, re-optimize all rules in all programs according to the above logic. There is a class variable in Alf Program called Program Dictionary that contains all of the Alf Programs in the system, keyed by the name of the program.
Thus it can be seen that the optimization logic constructs, for each atom of the tail of a rule, a list of rules whose heads the atom can potentially unify with. This will speed up the query-solving logic discussed in a subsequent section.
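The optimization can be sketched as follows (Python, with hypothetical attribute names such as rule_array, tail, and predicate_class; illustrative only):

# Illustrative sketch only; hypothetical names, not the patented code.
def optimize_rule(rule, program, builtins):
    """Build the rule's ruleArray: one slot per tail atom, holding the list of rules
    (from this program, else from the built-in program) whose heads could unify with it."""
    rule.rule_array = []
    for atom in rule.tail:
        candidates = program.rule_dictionary.get(atom.predicate_class)
        if not candidates:
            candidates = builtins.rule_dictionary.get(atom.predicate_class, [])
        rule.rule_array.append(candidates)

def add_rule(program, rule, builtins):
    new_head = rule.head_class not in program.rule_dictionary
    program.rule_dictionary.setdefault(rule.head_class, []).append(rule)
    if new_head:
        # A brand-new head class can change the candidate lists of existing rules too.
        for rules in program.rule_dictionary.values():
            for existing in rules:
                optimize_rule(existing, program, builtins)
    else:
        optimize_rule(rule, program, builtins)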
7.7 Query Solving in ALF.
The main logic for solving logic queries is contained in class Alf Query. This class includes the following instance variables (their class is indicated inside <>):
1. query Clause <Clause> the clause to prove.
2. env <Array> the environment to use for this invocation of the query.
3. choice Point Stack <Stack> of choice Points. This acts like a stack in that the last choice point discovered is first on the list. When this is empty, there are no more choice points that can be taken, and thus there are no more answers to the query.
4. goal Stack <Goal Stack> of Goal Stack Links. This represents the current set of goals to prove. All must be solved in order to answer the query. If the Alf Query fails to prove a goal, or if the goal Stack becomes empty, the next choice point is executed in order to obtain another answer to the query.
5. alf Pgm <Alf Program> against which to execute the query.
6. trail <Trail> the trail of bindings to undo at the various choice points. As unification proceeds, the Alf Query keeps track of the old values of logic variables in this trail stack. Undoing these unifications restores the state of query processing to a point where the next choice Point can be executed.
7. current Freeze Pt <Integer> All goal Links below, and including, the one marked by this point are currently frozen, and must not be altered. This means that some choice Point is pointing into the stack at this point, and hence the stack must be preserved starting with the goal Link marked by this current Freeze Pt. If the stack is not frozen above a given goal, the Alf Query removes the goal from the stack. Otherwise, it copies the stack before removing the goal, so that existing choice Points will be able to pick up using the old state of the goal Stack.
The application programmer will normally set up a new Alf Query by sending the message
new Query: query Clause for Pgm: an Alf Pgm
to the class Alf Query. This will set up a new query and initialize it. Answers to the query can be obtained by sending the message
next Answer
to the query. Repetitive next Answer messages will find new solutions, until the answer #fail is returned, indicating no more answers to the query exist. When an answer is found, Alf Query returns the query itself. The env of the query will contain the logic variables (and thus their bindings) that were used in the original query. The programmer can send the message
dereference Copy Using Env: query Env
to the query Clause of the original query in order to obtain that clause with the logic variables replaced with their bindings.
7.7.1 Finding the Next Answer for a Query.
The method next Answer checks the choice Point Stack, and if this is empty returns #fail, since there are no more answers. Otherwise, it sets up the system to process the first choice Point on the stack. To do this, it backs out all of the bindings of logic variables that were made subsequent to the establishment of the choice Point. These bindings are all kept on the stack called trail, and each choice Point points into this trail stack. Alf Query undoes the bindings required by processing those on the trail that follow the choice Point's trail pointer. As in standard Prolog, these choice points represent alternative paths to take in the resolution logic for solving the query. They are placed on the choice Point Stack as they are encountered.
In ALF, a single choice Point object represents all alternatives for proving a given goal. Each choice Point contains a next Rule To Try, which is a link in the Linked List of rules that match the first atom in the goal stack for the choice Point. If this next Rule To Try is nil, Alf Query removes the choice Point and recalculates the freeze point of the current goal stack. If the next Rule To Try is not nil, Alf Query restores the goal Stack to be that which was saved in the choice Point and sends the following message to the query:
solve Choice: next Choice Point
where next Choice Point is the current one on the choice Point Stack. This method solve Choice continues until another answer is found, or there are no more answers for the current choice Point. The method next Answer continues the processing for the next choice Point on the stack.
The initialization logic for the query will have established the first choice Point (which is the query itself), and found the first rule To Try by looking in the rule Dictionary of the program submitted with the query.
7.7.2 Solving a Choice Point.
The method solve Choice: loops for as long as it can prove atoms on the goal stack, until it cannot prove one, or the goal stack is empty. The latter condition constitutes successful binding of the query variables; the former results in returning #fail, as no more answers exist (for the current choice Point). If a choice Point results in failure, this method will not remove the choice Point from the stack, but return to method next Answer to try the next one. Before entering the main loop, the method solve Choice initializes some temporary variables, rule To Try and atom To Prove, with the first being set to a link in a list of rules that is in the next Rule To Try variable of the current choice Point, and the latter (atom To Prove) being the first goal on the current goal Stack.
The main loop in solve Choice sees if the current rule To Try is nil, and if so, returns #fail, since no further progress can be made on this choice Point. Otherwise, it attempts to unify the current atom To Prove with the head of the rule pointed to by the link rule To Try by:
1. Obtaining a new environment for this execution of the rule by sending the message
new Env
to the rule. An environment is an array of new logic variables to use for the execution of the rule, and is explained further below in the discussion of Logic Variables.
2. Attempting unification by sending the message
unify Using Env: goal Env with Predicate: rule Head using Env: rule Env
trailing: trail from Query: self.
This message is sent to the current atom To Prove. The unification algorithm is discussed in section 7.9 below. If the message returns #fail, indicating unsuccessful unification, a new rule To Try is obtained by following the current one (remember this is a linked list of rules whose heads are of the same class as the current atom on the goal stack). Unification is then attempted again, continuing until unification is achieved, or there are no more rules to try. The latter case causes #fail to be returned to the calling method (next Answer), which obtains the next choice Point and tries again.
Assuming successful unification, and assuming the method is working on the choice Point that was passed into this method, the current choice Point's next Rule To Try is updated so that the next time this choice Point is taken the next available rule is used. In any case, all of the logic variables in the rule's environment are marked as "not local" (this means that subsequent bindings of these logic variables will have to be undone on backtracking, i.e., they will be put on the trail prior to binding them during unification). Local and non-local logic variables are defined in section 7.9 below.
If there are other ways to prove the current atom To Prove in the goal list, and if there is not already a choice Point for this atom, the method sets up another choice Point for this atom, and places it on the choice Point Stack. Backtracking will then allow the method to resume execution, trying the alternative rule. There are potentially other ways to prove the current atom To Prove if the rule To Try points to a non-nil next link. This means that there are additional rules whose heads could potentially unify with the atom To Prove.
The goal that has been unified with the rule head can now be removed from the goal Stack. If the rule that the method is using has a tail, it pushes all of the atoms in the tail onto the goal Stack: they represent new goals that must be proved. Next, the method examines the goal Stack, and if it is empty, it returns, since the query has been proven. At this point, the environment of the query will have all of its logic variables bound to the answer.
If goals remain on the goal Stack, the method returns to the top of the main loop, after the following logic:
1. Set the new atom To Prove to be the current one on the goal Stack.
2. Look for a rule that can potentially unify with the new atom To Prove, by looking at the array of rule lists kept in the link that links all of the rules together within a program (discussed above).
3. Set the new rule To Try to be the first link in the list mentioned.
Branching back to the top of the main loop will then attempt unification of the new atom To Prove with the new rule pointed to by the rule To Try, and the method continues proving goals until it fails on an atom, or runs out of them.
7.7.3 Debugging.
Class Alf Query has debugging features that can be turned on or off (by sending messages to the query). Included are:
1. Counting. This will keep track of the total number of choice points at any time, and put out a message when this changes.
2. Tracing. There are multiple levels of tracing. It is possible to display the following:
a. When backtracking occurs.
b. When goals are removed from the stack to be proved. The goal is printed out
c. When a rule head tries to unify with a goal. The rule is printed.
d. When unification succeeds or fails. The goal is printed.
e. When a goal is proved. The goal is printed.
f. When the tail of a rule is pushed on the stack of goals to prove. The entire goal stack is printed.
There exists a long form and a short form for printing out the goals, which can be selected by the user.
7.8 Class Clause
As mentioned above, the basic unit of compilation in ALF is the clause. This class includes the instance variables head, tail, copy Env, and save LV. When a clause has been constructed by the compiler (or by a programmer), it must be initialized with the message set Copy Env. The purpose of this method is to construct the copy Env for use during the resolution process. This environment is copied to obtain a new set of Logic Variables for every execution of the clause. The idea of an environment is to obtain a new set of logic variables that can be pointed to by those in the clause itself, and which are bound and unbound during unification. The logic variables in the clause itself hold the index into the environment array. Thus during unification of an atom in the clause, the logic variables that actually occur in the clause are not considered, but rather those that are pointed to (in the environment) by the index in the logic variables.
In order to construct the copy Env, all of the logic variables that occur in the clause are examined, and an index, which increments by one, is assigned to each one. Logic variables that are the same in the clause are mapped to the same logic variable in the environment. This achieves the necessary common referencing during the unification process. Once constructed, the copy Env is copied when the query processing sends the message
new Env
to the clause to obtain a new environment for execution of the rule.
7.9 Unification and Logic Variables in ALF
Class Object contains the default unification algorithm. The algorithm checks to see if the second object (a parameter in the unification message) is of class Logic Variable, and if so, will resend the unification message to the second object, rather than using the default algorithm. Two objects will unify using this algorithm if they are of the same class and each instance variable in the two objects unifies. If an object has no instance variables, unification is achieved if the objects are equal.
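A minimal sketch of this default algorithm (Python, hypothetical names; it omits the trail, the environments, and the class-lattice handling described below):

# Illustrative sketch only; hypothetical names, not the patented code.
class LogicVariable:
    def __init__(self):
        self.bound, self.binding = False, None

def unify(a, b):
    """Default algorithm from class Object: defer to logic variables; otherwise
    require the same class and pairwise-unifiable instance variables."""
    if isinstance(b, LogicVariable):
        return unify_variable(b, a)              # resend to the Logic Variable
    if type(a) is not type(b):
        return False
    ivars = vars(a) if hasattr(a, "__dict__") else None
    if not ivars:
        return a == b                            # no instance variables: plain equality
    return all(unify(getattr(a, name), getattr(b, name)) for name in ivars)

def unify_variable(var, term):
    if var.bound:
        return unify(var.binding, term)
    var.bound, var.binding = True, term          # bind (trail handling omitted)
    return True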
The default algorithm is overridden in subclasses of class Predicate, where required, in order to implement the built-in predicates. For example, the Alf Fail predicate always returns #fail as the answer to the unification message.
The unification algorithm is also overridden in class Logic Variable. This class includes the following instance variables:
1. bound <Boolean>, true means the logic variable has been bound
2. binding this can be of any class, and is the object that the logic variable has been bound to.
3. user Name <String>, this is the name that the user has established for this logic variable.
4. is Local <Boolean>, true means that the logic variable does not need to be unbound on backtracking.
5. environ Index nil means this Logic Variable is not resolved through the environment array, but points directly to its binding; not nil means that the Logic Variable must be resolved through the logic variable at the environ Index in the environment. This is the case when the logic variable is in a rule, and copies of the logic variables are used to do unification (one environment is set up for each invocation of the rule).
6. binders Env <Array> If a logic variable points to a term that itself has logic variables in it, this is used to resolve those logic variables. This is needed to trace back the variables in the original query, and when a logic variable in one rule is bound to a term that contains a logic variable from another rule. For environment logic Variables (i.e., those with environ Index not nil), it is assured that the binders Env is always nil (that is the way they were set up when the clause was created).
We now summarize how logic variables are used and bound:
1. If the variable is in a clause, the method goes through the environ Index. A new environment (an array of logic Variables) is established every time the clause is executed, and the variable in the clause itself is used only to get the 'real' variable in the current environment via the environ Index. Thus these 'environment variables' are never bound to anything, in the sense that their binding is always nil.
2. It should be noted that whenever a logic variable X is bound to another logic variable Y, Y cannot be an environment logic variable. The reason is that logic variables are always 'dereferenced' before binding them to another one. Thus if Y is unified with another logic variable (X), and X is an environment variable, X will be dereferenced to its
'non-environment variable', Z, and Y will be bound to Z, which is not an environment variable (Z is a member of the environment array, and has environ Index set to nil). Note that, in our example, Z could itself be a term that contains environment logic variables, resolved by a different environment: the binders Env which will be found in Y.
3. When a logic variable is unified with another term, the two environments are passed: one for the logic variable and one for the term. If our term is not itself a logic variable, but contains logic Variables, the term's environment is necessary for any further unification. In order to be able to access the term's environment upon subsequent unifications, the term's environment is placed in the logic variable's binders Env. Then, when unifying a new term (T) against the logic variable, the logic variable is dereferenced, but the term is then unified against T using, as an environment for the term, the binders Env stored in the logic variable.
4. Thus, in general, when attempting the recursive unification algorithm, logic variables are dereferenced either through their environ Index (first priority), or through their binding. If they are bound, the environment to use for what they are bound to is found in the logic variable's binders Env, the latter having meaning only for bound logic variables; otherwise it is nil.
5. Local versus global logic variables. Logic variables in environments can sometimes be replaced in their home environment instead of bound to the term they are unifying with. The reason is that they never need to be undone on backtracking, and they are not being bound to terms that contain logic variables from another environment (these latter would pose a problem, since there is no place to store the binders Env if the environment slot is merely replaced with the term it is being unified with). Class Clause sets is Local to true in the logic variables that are in the head of the clause. When unifying with an environment logic variable that has a local, unbound logic variable in its environment slot, the slot value is replaced with the unbound logic variable that the method is attempting to bind it to. Even if the term the method is attempting to bind it to is a bound logic variable (or not a logic variable at all), if the receiver logic variable is an unbound, local, environment variable, the logic variable is not put on the trail, since it never needs to be undone.
Given the above, we now present the detailed algorithm. In the discussion below, we refer to the receiver of the unification message as "self". This is always a Logic Variable. We refer to the term that self is to be unified with as "a Term".
1. If a Term is a Logic Variable, self is a local logic variable, and a Term is unbound, the appropriate environment slot of self is set to a Term, and success is returned.
2. If a Term is a local, unbound Logic Variable, and self is unbound, a Term's environment slot is updated with self, and success is returned.
3. If self and a Term represent the same object, success is returned.
4. If self is a local, unbound Logic Variable, self is bound to the value of a Term, and success is returned.
5. If self is unbound, but not a local Logic Variable, self is bound to the value of a Term, self is put on the trail, and success is returned.
6. If self and a Term are bound to objects that are of the same class, the general-purpose algorithm in class Object is used, so a unification message using the binding of self as the receiver is sent and the answer to this message is returned.
7. If the method has not yet returned by this point the general purpose algoridim tiiat checks to see if a Term is bound to an object that is a subclass of the class of the object tiiat self is bound to or vice-versa must be invoked. If either is the case, the appropriate logic variable is placed on the trail, and it is re-bound to the binding of the other, provided all instance variables of the two bindings unify. If any instance variable fails to unify, failure is returned. Otherwise, success is returned. If both a Term and self are bound, and the objects they are bound are not type compatible in the sense above, failure is returned from the unification.
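The following sketch restates these seven steps in C. It is a paraphrase, not the system's actual routine: the type declarations and the helper functions (isLogicVariable, unifyObjects, classesCompatible, pushOnTrail) are hypothetical stand-ins, and environment handling is elided for brevity.

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct Object { struct Class *cls; } Object;

    typedef struct LogicVariable {
        Object   base;
        Object  *binding;        /* NULL when unbound                      */
        bool     isLocal;        /* head-of-clause variable: never trailed */
        Object **envSlot;        /* the variable's home environment slot   */
    } LogicVariable;

    /* Assumed helpers, declared only (their bodies are not shown): */
    bool isLogicVariable(Object *o);
    bool unifyObjects(Object *a, Object *b);       /* class Object's general algorithm   */
    bool classesCompatible(Object *a, Object *b);  /* same class, or sub/superclass pair */
    void pushOnTrail(LogicVariable *lv);           /* so backtracking can undo a binding */

    bool unify(LogicVariable *self, Object *aTerm)
    {
        LogicVariable *other =
            isLogicVariable(aTerm) ? (LogicVariable *)aTerm : NULL;

        /* 1. aTerm is an unbound logic variable and self is local:
         *    drop aTerm into self's environment slot (no trailing needed). */
        if (other != NULL && self->isLocal && other->binding == NULL) {
            *self->envSlot = aTerm;
            return true;
        }
        /* 2. aTerm is a local, unbound logic variable and self is unbound:
         *    update aTerm's environment slot with self. */
        if (other != NULL && other->isLocal && other->binding == NULL &&
            self->binding == NULL) {
            *other->envSlot = (Object *)self;
            return true;
        }
        /* 3. Identical objects always unify. */
        if ((Object *)self == aTerm)
            return true;
        /* 4. Local, unbound: bind without putting self on the trail. */
        if (self->isLocal && self->binding == NULL) {
            self->binding = aTerm;
            return true;
        }
        /* 5. Unbound but not local: bind and record on the trail. */
        if (self->binding == NULL) {
            self->binding = aTerm;
            pushOnTrail(self);
            return true;
        }
        /* 6./7. Both sides bound: defer to class Object's general algorithm,
         *       which succeeds only when the bindings are type compatible and
         *       all their instance variables unify. */
        if (classesCompatible(self->binding, aTerm))
            return unifyObjects(self->binding, aTerm);
        return false;
    }

Steps 6 and 7 are folded into the final test of the sketch, since both ultimately defer to the general purpose algorithm in class Object.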
Advantages
The Alltalk system provides the following advantages over the prior art:
1. The ability of a programmer to divide an application into a logic part and an object-oriented part, and to move between the programming styles easily, without conversion of data.
2. The ability of a programmer to write applications that store data on disk without explicit database management or file management statements.
3. The garbage collection system offers the following advantages:
a. Little execution time overhead.
b. Evenness of processing, with no long gaps during which the system is unavailable due to garbage collection.

Claims
1. A program tool, comprising
a. a workstation having an operator interface, a mass memory, a CPU, and main memory;
b. an object oriented programming language system including,
1). an object oriented programming language, and
2). object oriented language compiler means for translating source code written in the object oriented programming language into objects and interpreter code;
c. a logic programming language system, having components representing terms, clauses, predicates, atoms, and variables, including,
1). a logic programming language, and
2). logic language compiler means for translating source code written in the logic programming language into objects;
d. a database residing in said mass memory, for storing objects and components of a logic programming language as objects in a common data structure format, applications data, and applications stored as compiled interpreter code;
e. object database management means for representing objects and components of a logic program in said common data structure format as objects, and responsive to calls for retrieving and storing such objects in said database, and for automatically deleting objects from said database when they become obsolete;
f. interpreter means for executing said interpreter code and generating calls to said database management means; and
g. logic subsystem means for solving logic queries, said logic subsystem means treating any object as a term in the logic programming language.
2. The programming tool claimed in claim 1, wherein said object oriented programming language provides for calling subroutines written in another language and treating the call as an object.
3. The programming tool claimed in claim 1 or 2, wherein the syntax of said logic programming language is generalized to allow attribute labels which are taken as identical to object attribute names in the object programming language.
4. The programming tool claimed in claim 3, wherein said object-oriented programming language is a dialect of Smalltalk (Alltalk), wherein primitive invocations are treated as objects, and said logic programming language is an extension of Prolog (ALF), wherein attribute labels are taken as identical to instance variable names, and logic variables can be typed using objects.
5. The programming tool claimed in claim 3, wherein said object oriented language compiler means comprises:
a. first phase means for performing compilation including parsing, optimization, and interpreter code generation;
b. second phase means in communication with said first phase means for resolving global symbols and loading the database with objects and interpreter code; and
c. an assembler-like intermediate language for communication between said first and second phase means.
6. The programming tool claimed in claim 4, wherein said interpreter code comprises a plurality of types of bytecodes, and said interpreter includes a plurality of bytecode handler means, one such means for processing each type of bytecode.
7. The programming tool claimed in claim 6, wherein said bytecode types comprise:
execute a primitive,
send a message,
define a block,
evaluate a block,
return from a block or method,
branch, and
assign from one variable to another.
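By way of illustration only, the bytecode set of claim 7 can be pictured as the opcode enumeration of a simple dispatch loop. The opcode values, handler names, and the omission of instruction operands (such as branch targets) are assumptions and are not part of the claim.

    #include <stddef.h>
    #include <stdint.h>

    enum Bytecode {                   /* the seven bytecode types of claim 7        */
        BC_EXECUTE_PRIMITIVE,
        BC_SEND_MESSAGE,
        BC_DEFINE_BLOCK,
        BC_EVALUATE_BLOCK,
        BC_RETURN,                    /* return from a block or method              */
        BC_BRANCH,
        BC_ASSIGN                     /* assign from one variable to another        */
    };

    typedef struct Interpreter Interpreter;

    /* One handler per bytecode type, as in claim 6 (bodies not shown). */
    void handlePrimitive(Interpreter *ip);   void handleSend(Interpreter *ip);
    void handleDefineBlock(Interpreter *ip); void handleEvaluateBlock(Interpreter *ip);
    void handleReturn(Interpreter *ip);      void handleBranch(Interpreter *ip);
    void handleAssign(Interpreter *ip);

    /* Operand decoding (e.g. branch offsets, literal indices) is omitted here. */
    void interpret(Interpreter *ip, const uint8_t *code, size_t length)
    {
        for (size_t pc = 0; pc < length; pc++) {
            switch (code[pc]) {
            case BC_EXECUTE_PRIMITIVE: handlePrimitive(ip);     break;
            case BC_SEND_MESSAGE:      handleSend(ip);          break;
            case BC_DEFINE_BLOCK:      handleDefineBlock(ip);   break;
            case BC_EVALUATE_BLOCK:    handleEvaluateBlock(ip); break;
            case BC_RETURN:            handleReturn(ip);        break;
            case BC_BRANCH:            handleBranch(ip);        break;
            case BC_ASSIGN:            handleAssign(ip);        break;
            }
        }
    }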
8. The programming tool claimed in claim 7, wherein blocks are maintained by the interpreter means as C data structures, and are made into objects when they are assigned to instance variables, or returned as the result of a message.
9. The programming tool claimed in claim 8, wherein contexts are maintained in said interpreter means as C data structures, and are made into objects if and when an associated block is made into an object.
10. The programming tool claimed in claim 9, wherein the bytecode handler means for the "define a block" type bytecode generates block stubs, and wherein said interpreter means creates active context(s) for a block stub, stored separately from said block stub, and wherein said active contexts associated with block stubs obey a stack discipline.
11. The programming tool claimed in claim 10, wherein said interpreter means maintains runtime data structures for object-oriented processes in an array, each element in the array representing one process, each element containing a stack of active contexts, a pointer to the current context in the stack, an array of block stubs, and a pointer to the next available block stub.
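Purely as an illustration of claim 11, one possible in-memory layout for the per-process runtime data is sketched below; the array capacities and field names are assumptions.

    #include <stdint.h>

    #define MAX_PROCESSES   16            /* assumed capacities, for illustration      */
    #define MAX_CONTEXTS   256
    #define MAX_BLOCK_STUBS 64

    typedef struct Context Context;       /* state of an executing method (opaque here) */

    typedef struct BlockStub {            /* produced by a "define a block" bytecode    */
        const uint8_t *blockCode;         /* assumed: where the block's bytecodes begin */
    } BlockStub;

    typedef struct Process {
        Context   *contextStack[MAX_CONTEXTS];   /* stack of active contexts             */
        Context  **currentContext;               /* pointer to the current context slot  */
        BlockStub  blockStubs[MAX_BLOCK_STUBS];  /* block stubs, used with stack discipline */
        BlockStub *nextBlockStub;                /* pointer to the next available block stub */
    } Process;

    static Process processTable[MAX_PROCESSES];  /* one element per object-oriented process */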
12. The programming tool claimed in claim 11, wherein said interpreter means manages processes by creating processes, switching processes, destroying processes, and performing optimizations on processes.
13. The programming tool claimed in claim 12, wherein one of the optimizations performed by the interpreter means is message flattening.
14. The programming tool claimed in claim 12, wherein one of the optimizations performed by the interpreter means is treating each primitive as its own bytecode.
15. The programming tool claimed in claim 3, further comprising debugger means for providing debugging capabilities such as setting break points, stepping through program execution, tracing information (e.g. messages, blocks, bytecodes, processes), and displaying values of data structures, said debugger means being integrated with said interpreter means and including a set of C routines for performing tasks associated with the debugger commands, code within the interpreter, and a set of global variables and constants used to communicate between the C routines and the code in the interpreter.
16. The programming tool claimed in claim 3, wherein said object database comprises a key file and a prime file, the prime file having records of variable length containing objects, and the key file having records of fixed length containing the address and record length of objects in the prime file.
17. The programming tool claimed in claim 16, wherein objects in the prime file can be one of 6 types, including:
normal objects,
a symbol cross reference record that contains a string for a symbol and associated object identification of a symbol object,
a dictionary cross reference,
a control record,
a checkpoint integrity record, and
logically deleted objects.
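For illustration, the two-file organization of claims 16 and 17 might be pictured with record layouts such as the following; the field widths, names, and on-disk encoding are assumptions, not part of the claims.

    #include <stdint.h>

    enum PrimeRecordType {               /* the six prime-file record types of claim 17 */
        REC_NORMAL_OBJECT,
        REC_SYMBOL_XREF,                 /* symbol string plus the symbol object's id   */
        REC_DICTIONARY_XREF,
        REC_CONTROL,
        REC_CHECKPOINT_INTEGRITY,
        REC_LOGICALLY_DELETED
    };

    typedef struct KeyRecord {           /* key file: fixed-length records              */
        uint32_t primeAddress;           /* address of the record in the prime file     */
        uint32_t recordLength;           /* length of that variable-length record       */
    } KeyRecord;

    typedef struct PrimeRecordHeader {   /* prime file: precedes each variable-length body */
        uint8_t  type;                   /* one of enum PrimeRecordType                 */
        uint32_t objectId;
    } PrimeRecordHeader;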
18. The program tool claimed in claim 3, wherein said database management means includes:
a. object manager means employed by the object oriented language compiler, the interpreter means, primitives, and utilities for providing access to objects in the object database and for maintaining the organization of objects in the database;
b. method fetcher means for calling the object manager means to fetch methods for the interpreter;
c. access manager means called by,
1). a buffer manager when objects are to be retrieved from the database,
2). a transaction manager when objects are to be added/updated in the object database at commit points, and
3). the object manager for higher level interface to the database;
d. buffer manager means called by,
1). the object manager means for generating calls to the access manager means, and
2). pool manager means for keeping an in-memory copy of objects;
e. pool manager means for maintaining memory for buffers; and
f. garbage collector means integrated with said object manager means and said interpreter means for identifying objects in main memory that are no longer reachable.
19. The programming tool claimed in claim 18, wherein said garbage collector means employs a concept of regions for garbage collection, such that when a context is created, it is assigned a region number, each object created or accessed is assigned the region number of the context that created or accessed it, unless it was previously associated with a lower number; and when an object is returned from a called method to the calling method, the object is moved to the region of the calling method, and when a reference is made from a first object to a second object in another region, the second object is moved to the region of the first object, and when returning from a method, if the context to which it is returning belongs to a region whose number is at least two lower than that of the current region (before returning), then the regions with higher numbers than that of the context to which return is being made are collected.
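The region rules of claim 19 can be restated as a few bookkeeping routines. The sketch below is offered only as an illustration, with assumed names and data layout; it is not the collector's actual implementation.

    typedef struct GcObject {
        int regionNumber;                 /* region this object currently belongs to */
        /* ... object payload ... */
    } GcObject;

    typedef struct GcContext {
        int regionNumber;                 /* assigned when the context is created */
    } GcContext;

    void collectRegionsAbove(int regionNumber);   /* assumed collector entry point (not shown) */

    /* An object created or accessed by a method takes its context's region number,
     * unless the object already carries a lower (older) one. */
    static void noteCreateOrAccess(GcObject *obj, const GcContext *ctx)
    {
        if (obj->regionNumber > ctx->regionNumber)
            obj->regionNumber = ctx->regionNumber;
    }

    /* A returned object moves to the calling method's region. */
    static void noteReturnValue(GcObject *result, const GcContext *caller)
    {
        result->regionNumber = caller->regionNumber;
    }

    /* A reference from one object to another pulls the referent into the
     * referring object's region. */
    static void noteReference(const GcObject *from, GcObject *to)
    {
        to->regionNumber = from->regionNumber;
    }

    /* On method return: if the destination context's region is at least two lower
     * than the current region, every region above the destination is collected. */
    static void onMethodReturn(int currentRegion, const GcContext *returningTo)
    {
        if (returningTo->regionNumber <= currentRegion - 2)
            collectRegionsAbove(returningTo->regionNumber);
    }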
20. The programming tool claimed in claim 19, wherein said garbage collector means includes region cleaning means for detecting when a region has accumulated an excessive number of objects and cleaning the region thus detected.
21. The programming tool claimed in claim 19, wherein said garbage collector means includes means for detecting when objects are shared across processes and for ensuring that no object is discarded that is in use by another process.
22. The programming tool claimed in claim 19, wherein said garbage collector means includes an off-line mark/sweep collector means for periodically removing objects from the object database that have become unreachable by any other object in the database, by first marking all objects in the database that can be reached, and then sweeping the database to remove unmarked objects.
23. The programming tool claimed in claim 22, wherein said object database contains constants that are permanently marked such that they cannot be removed by said off-line mark/sweep collector means.
24. The programming tool claimed in claim 3, wherein said logic subsystem means performs unification of logic variables to answer logic queries, and in doing so, takes into account the typing of the logic variables to enable constraint of permissible values of logic variables.
25. The programming tool claimed in claim 4, wherein said logic programming language includes a set of built-in predicates SEND N for sending messages between the logic programming language system and the object-oriented programming system.
26. The programming tool claimed in claim 25, wherein said set of built-in predicates take arguments "receiver", "answer", "selector", and n additional arguments; wherein "receiver" is the receiver of the message to be sent, "answer" is the object returned from the message, and "selector" is that of the message send, and the n remaining arguments are arguments to the message send itself.
27. The programming tool claimed in claim 4, wherein all clauses in the logic programming language are represented as instances of class "Clause", and are rules, facts, and queries, and wherein included in the instance variables of class "Clause" are "head" and "tail"; if "head" is nil, the clause is a query; if "tail" is nil, the clause is a fact; "head" is of class "Predicate", or a sub-class thereof, and "tail" is of class "LinkedList" whose links are of class "Predicate", or a sub-class thereof; and wherein the values of the instance variables of the "head" and "tail" links can be arbitrary objects.
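For illustration, the clause representation of claim 27 corresponds to structures like those sketched below; the claim describes Alltalk objects, so the C rendering and the helper predicates are assumptions.

    #include <stddef.h>

    typedef struct Predicate Predicate;   /* class Predicate or a subclass of it          */

    typedef struct PredicateLink {        /* one link of the tail's LinkedList            */
        Predicate            *predicate;
        struct PredicateLink *next;
    } PredicateLink;

    typedef struct Clause {               /* every rule, fact, and query                  */
        Predicate     *head;              /* NULL (nil) means the clause is a query       */
        PredicateLink *tail;              /* NULL (nil) means the clause is a fact        */
    } Clause;

    static int isQuery(const Clause *c) { return c->head == NULL; }
    static int isFact (const Clause *c) { return c->tail == NULL; }
    /* A clause with both head and tail present is a rule. */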
28. In a heap based programming language system, having garbage collector means for removing objects from memory that are no longer reachable by the system, an improved garbage collector means wherein the improvement comprises: the garbage collector means employing regions for garbage collection, such that when a context (representing the state of a method which is executing in the system) is created, it is assigned a region number; when an object is created or accessed by a method it is assigned the region number of the context of the method that created or accessed it, unless the object was previously assigned a lower number; when an object is returned from a called method to the calling method, the object is moved to the region of the calling method; when a reference is made from a first object to a second object assigned to another region, the second object is moved to the region of the first object; and when returning from a method, if the context to which it is returning belongs to a region whose number is at least two lower than the current region number before returning, the regions with numbers higher than that of the context to which it is returning are collected (i.e. the objects in the regions are discarded).
29. The improvement claimed in claim 28 wherein; said garbage collector means includes region cleaning means for detecting when a region has accumulated an excessive number of objects, and cleaning the regions thus detected.
30. The improvement claimed in claim 28, wherein said garbage collector means includes means for detecting when objects are shared across processes and for ensuring that no object is collected that is in use by another process.
31. The improvement claimed in claim 28, wherein said system further comprises an object database and wherein said garbage collector means includes off-line mark/sweep collector means for periodically removing objects from the database that have become unreachable by any other object in the database, by first marking all objects in the database that can be reached, and then sweeping the database to remove unmarked objects.
32. The improvement claimed in claim 31, wherein said object database contains constants that are permanently marked such that they cannot be removed by said off-line mark/sweep collector means.
33. The improvement claimed in claims 28, 29, 30, 31, or 32, wherein said system employs an in-use table to keep track of objects that must be kept in memory, said table including a field designating each object's region.
34. In a Smalltalk programming language system, including a Smalltalk programming language, a compiler for translating Smalltalk source code into objects and interpreter code, an object database, a database manager for putting and retrieving objects in the database, and an interpreter for executing the interpreter code and generating calls to the database manager, the improvement comprising: said compiler flagging methods which consist of a single return statement which returns either an instance variable or the result of a primitive for which the first argument is self and the remaining arguments to the primitive invocation correspond to arguments to the method; and said interpreter responding to said flags at runtime by replacing any message send that would otherwise invoke said methods with, in the first instance, an assign, and in the second instance, a primitive invocation.
35. The improvement claimed in claim 34, wherein a further improvement comprises each primitive performed by the system being associated with a unique byte code.
36. The improvement claimed in claim 34, wherein a further improvement comprises means for providing a pointer to a receiver being held in an associated context, and if a receiver of a message is the same as a receiver of the previous message, the interpreter avoiding a call to the database manager to fetch the receiver again by getting the pointer to the receiver from the current context.
37. The improvement claimed in claim 35, wherein a further improvement comprises: the interpreter replacing message sending bytecodes with block evaluation bytecodes.
38. In a Smalltalk programming language system, including a Smalltalk programming language, a compiler for translating Smalltalk source code into objects and interpreter code, an object database, a database manager for putting and retrieving objects in the database, and an interpreter for executing the interpreter code and generating calls to the database manager, the improvement comprising: debugger means for providing debugging capabilities such as setting break points, stepping through program execution, tracing information (e.g. messages, blocks, bytecodes, processes), and displaying values of data structures; said debugger means being integrated with the interpreter and including a set of C routines for performing tasks associated with debugger commands, code within the interpreter, and a set of global variables and constants for communicating between the C routines and the code in the interpreter.
39. In an object-oriented programming language system, having an object-oriented programming language, a compiler for translating source code written in the object-oriented language into objects and interpreter code, an object-oriented database, a database manager for retrieving and storing objects in the database and for automatically deleting objects when they become obsolete, and an interpreter for executing the interpreter code and generating calls to the object manager, an improved database manager and object-oriented database, wherein the improvement comprises: the object-oriented database having a key file and a prime file, the prime file having records of variable length containing objects, and the key file having records of fixed length containing the disk address, record length, and type of objects in the prime file.
40. The improvement claimed in claim 39, wherein the objects in the prime file can be one of 6 types including:
normal objects;
a symbol cross reference record that contains a string for a symbol and associated object identification of a symbol object;
a dictionary cross reference;
a control record;
a check point integrity record; and
a logically deleted object.
41. The improvement claimed in claim 39, wherein said improved database manager comprises:
a. object manager means employed by the compiler, the interpreter, primitives, and utilities for providing access to objects in the database and for maintaining the organization of objects in the database;
b. access manager means called by,
1). a buffer manager for returning objects from the database,
2). a transaction manager for adding/updating objects in the database at commit points, and for undoing changes to objects upon abort commands, and
3). the object manager means for providing high-level interface to the database;
c. buffer manager means called by,
1). the object manager means for generating calls to the access manager means, and
2). pool manager means for keeping an in-memory copy of objects; and
d. pool manager means for maintaining memory for buffers.
PCT/US1989/004687 1988-10-24 1989-10-23 Object-oriented, logic, and database programming tool WO1990004829A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US07/261,791 US4989132A (en) 1988-10-24 1988-10-24 Object-oriented, logic, and database programming tool with garbage collection
US261,791 1988-10-24

Publications (2)

Publication Number Publication Date
WO1990004829A2 true WO1990004829A2 (en) 1990-05-03
WO1990004829A3 WO1990004829A3 (en) 1990-12-13

Family

ID=22994886

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1989/004687 WO1990004829A2 (en) 1988-10-24 1989-10-23 Object-oriented, logic, and database programming tool

Country Status (4)

Country Link
US (1) US4989132A (en)
EP (1) EP0439533A1 (en)
JP (1) JPH04501477A (en)
WO (1) WO1990004829A2 (en)

Families Citing this family (172)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5133075A (en) * 1988-12-19 1992-07-21 Hewlett-Packard Company Method of monitoring changes in attribute values of object in an object-oriented database
US5265245A (en) * 1989-04-17 1993-11-23 International Business Machines Corporation High concurrency in use manager
US5136712A (en) * 1989-06-29 1992-08-04 Digital Equipment Corporation Temporary object handling system and method in an object based computer operating system
EP0414651A1 (en) * 1989-08-14 1991-02-27 International Business Machines Corporation Prolog interrupt processing
CA2284245C (en) * 1989-09-01 2001-02-06 Amdahl Corporation Operating system and data base having an access structure formed by a plurality of tables
CA2066724C (en) * 1989-09-01 2000-12-05 Helge Knudsen Operating system and data base
JPH0833862B2 (en) * 1989-10-23 1996-03-29 インターナシヨナル・ビジネス・マシーンズ・コーポレーシヨン Object-oriented computer system
US5247658A (en) * 1989-10-31 1993-09-21 Microsoft Corporation Method and system for traversing linked list record based upon write-once predetermined bit value of secondary pointers
US5321834A (en) * 1989-11-28 1994-06-14 Xerox Corporation Method and system for reclaiming unreferenced computer memory space
US5181162A (en) * 1989-12-06 1993-01-19 Eastman Kodak Company Document management and production system
EP0458495A3 (en) * 1990-05-21 1993-04-14 Texas Instruments Incorporated Apparatus and method for managing versions and configurations of persistent and transient objects
US5297279A (en) * 1990-05-30 1994-03-22 Texas Instruments Incorporated System and method for database management supporting object-oriented programming
AU628264B2 (en) * 1990-08-14 1992-09-10 Oracle International Corporation Methods and apparatus for providing a client interface to an object-oriented invocation of an application
US5291593A (en) * 1990-10-24 1994-03-01 International Business Machines Corp. System for persistent and delayed allocation object reference in an object oriented environment
US5758152A (en) * 1990-12-06 1998-05-26 Prime Arithmetics, Inc. Method and apparatus for the generation and manipulation of data structures
US5426747A (en) * 1991-03-22 1995-06-20 Object Design, Inc. Method and apparatus for virtual memory mapping and transaction management in an object-oriented database system
US5274803A (en) * 1991-04-26 1993-12-28 Sun Microsystems, Inc. Method and apparatus for aligning a restored parent environment to its child environments with minimal data loss
US5418954A (en) * 1991-06-19 1995-05-23 Cadence Design Systems, Inc. Method for preparing and dynamically loading context files
GB9115142D0 (en) * 1991-07-13 1991-08-28 Ibm Data processing system
US5392432A (en) * 1991-08-27 1995-02-21 At&T Corp. Method for automatic system resource reclamation for object-oriented systems with real-time constraints
US5485613A (en) * 1991-08-27 1996-01-16 At&T Corp. Method for automatic memory reclamation for object-oriented systems with real-time constraints
CA2077273C (en) * 1991-12-12 1996-12-03 Mike H. Conner Language neutral objects
US5361350A (en) * 1991-12-12 1994-11-01 International Business Machines Corporation Object oriented method management system and software for managing class method names in a computer system
EP0546682A3 (en) * 1991-12-12 1993-12-08 Ibm Parent class shadowing
US5421016A (en) * 1991-12-12 1995-05-30 International Business Machines Corporation System and method for dynamically invoking object methods from an application designed for static method invocation
JPH05257664A (en) * 1991-12-12 1993-10-08 Internatl Business Mach Corp <Ibm> System and method for generating version-independent object-oriented application program
US6256642B1 (en) * 1992-01-29 2001-07-03 Microsoft Corporation Method and system for file system management using a flash-erasable, programmable, read-only memory
US5860010A (en) * 1992-03-12 1999-01-12 Bull S.A. Use of language with similar representation for programs and data in distributed data processing
JPH0689307A (en) * 1992-05-04 1994-03-29 Internatl Business Mach Corp <Ibm> Device and method for displaying information in database
CA2092632C (en) * 1992-05-26 2001-10-16 Richard E. Berry Display system with imbedded icons in a menu bar
CA2092633C (en) * 1992-05-26 2001-06-05 Richard E. Berry Display system with nested objects
JPH06103075A (en) * 1992-07-06 1994-04-15 Internatl Business Mach Corp <Ibm> Operation for object-oriented application
US5551038A (en) * 1992-07-16 1996-08-27 International Business Machines Corporation Directory based computer environment
US5721900A (en) * 1992-07-20 1998-02-24 International Business Machines Corp Method and apparatus for graphically displaying query relationships
JP2549247B2 (en) * 1992-07-20 1996-10-30 インターナショナル・ビジネス・マシーンズ・コーポレイション Display device and method for database
FR2694105B1 (en) * 1992-07-22 1994-11-25 Bull Sa Use of an on-board interpreter language for the creation of an interactive user interface definition tool.
US5365433A (en) * 1992-07-24 1994-11-15 Steinberg Geoffrey D System for automatically programming a functional database
US5396630A (en) * 1992-10-06 1995-03-07 International Business Machines Corporation Method and system for object management across process boundries in a data processing system
US6157961A (en) * 1992-12-21 2000-12-05 Sun Microsystems, Inc. Client-side stub interpreter
JP2856663B2 (en) * 1993-01-15 1999-02-10 インターナショナル・ビジネス・マシーンズ・コーポレイション Optimization method and apparatus for defining visible boundaries in compiled code
US5497491A (en) * 1993-01-26 1996-03-05 International Business Machines Corporation System and method for importing and exporting data between an object oriented computing environment and an external computing environment
US6128542A (en) * 1993-03-29 2000-10-03 Cmsi Acquisition Corporation Method and apparatus for generating a sequence of steps for use by a factory
US5761739A (en) * 1993-06-08 1998-06-02 International Business Machines Corporation Methods and systems for creating a storage dump within a coupling facility of a multisystem enviroment
US5446901A (en) * 1993-06-30 1995-08-29 Digital Equipment Corporation Fault tolerant distributed garbage collection system and method for collecting network objects
US5379432A (en) * 1993-07-19 1995-01-03 Taligent, Inc. Object-oriented interface for a procedural operating system
US5396626A (en) * 1993-08-04 1995-03-07 Taligent, Inc. Object-oriented locator system
US5432925A (en) * 1993-08-04 1995-07-11 International Business Machines Corporation System for providing a uniform external interface for an object oriented computing system
JPH0756745A (en) * 1993-08-13 1995-03-03 Nec Corp Compiler processing system for language processing program
US5504887A (en) * 1993-09-10 1996-04-02 International Business Machines Corporation Storage clustering and packing of objects on the basis of query workload ranking
US5568639A (en) * 1993-11-24 1996-10-22 Menai Corporation Method and apparatus for providing an object-oriented file structuring system on a computer
US5463769A (en) * 1993-12-15 1995-10-31 International Business Machines Corporation Method and apparatus using dictionary of methods and states for high performance context switching between build and run modes in a computer application builder program
US5623657A (en) * 1993-12-30 1997-04-22 International Business Machines Corporation System for processing application programs including a language independent context management technique
US5522077A (en) * 1994-05-19 1996-05-28 Ontos, Inc. Object oriented network system for allocating ranges of globally unique object identifiers from a server process to client processes which release unused identifiers
US5680573A (en) * 1994-07-12 1997-10-21 Sybase, Inc. Method of buffering data objects in a database
US5812996A (en) * 1994-07-12 1998-09-22 Sybase, Inc. Database system with methods for optimizing query performance with a buffer manager
US5822749A (en) * 1994-07-12 1998-10-13 Sybase, Inc. Database system with methods for improving query performance with cache optimization strategies
US5542078A (en) * 1994-09-29 1996-07-30 Ontos, Inc. Object oriented data store integration environment for integration of object oriented databases and non-object oriented data facilities
US5842204A (en) * 1994-10-07 1998-11-24 Tandem Computers, Inc. Method and apparatus for translating source code from one high-level computer language to another
US5768564A (en) * 1994-10-07 1998-06-16 Tandem Computers Incorporated Method and apparatus for translating source code from one high-level computer language to another
US5737609A (en) * 1994-10-18 1998-04-07 Marcam Corporation Method and apparatus for testing object-oriented programming constructs
US5727203A (en) * 1995-03-31 1998-03-10 Sun Microsystems, Inc. Methods and apparatus for managing a database in a distributed object operating environment using persistent and transient cache
JP2924705B2 (en) * 1995-04-10 1999-07-26 富士ゼロックス株式会社 Memory management method and object management method
US5867708A (en) * 1995-11-20 1999-02-02 International Business Machines Corporation System, method, and article of manufacture for adding concurrency to a binary class in an object oriented system
US5878428A (en) * 1995-11-20 1999-03-02 International Business Machines Corporation System, method, and article of manufacture for adding transactional recovery to a binary class in an object oriented system
US6405263B1 (en) * 1995-12-04 2002-06-11 International Business Machines Corporation Method and apparatus for subclassing system object model classes in dynamic languages
US5873092A (en) * 1995-12-14 1999-02-16 International Business Machines Corporation Information handling system, method, and article of manufacture including persistent, distributed object name services including shared properties
US6782538B1 (en) 1995-12-14 2004-08-24 International Business Machines Corporation Object oriented information handling system including an extensible instance manager
US5765153A (en) * 1996-01-03 1998-06-09 International Business Machines Corporation Information handling system, method, and article of manufacture including object system authorization and registration
US5809506A (en) * 1996-01-22 1998-09-15 International Business Machines Corporation Method for creating an object base of persisent application objects in an object oriented programming environment and apparatus related thereto
US5940616A (en) * 1996-05-31 1999-08-17 International Business Machines Corporation Tracker class for object-oriented programming environments
GB2318479B (en) * 1996-10-21 2001-04-04 Northern Telecom Ltd Problem model for alarm correlation
US6029171A (en) * 1997-02-10 2000-02-22 Actioneer, Inc. Method and apparatus for group action processing between users of a collaboration system
US5991765A (en) * 1997-05-06 1999-11-23 Birdstep Technology As System and method for storing and manipulating data in an information handling system
US5937402A (en) * 1997-06-19 1999-08-10 Ontos, Inc. System for enabling access to a relational database from an object oriented program
US5937410A (en) * 1997-10-16 1999-08-10 Johnson Controls Technology Company Method of transforming graphical object diagrams to product data manager schema
US7076784B1 (en) * 1997-10-28 2006-07-11 Microsoft Corporation Software component execution management using context objects for tracking externally-defined intrinsic properties of executing software components within an execution environment
US6061518A (en) * 1997-11-25 2000-05-09 International Business Machines Corporation Data processing system and method for debugging a JavaScript program
US6145120A (en) * 1998-03-24 2000-11-07 Lockheed Martin Corporation Declaration programming language extension for procedural programming languages
US6272504B1 (en) * 1998-05-07 2001-08-07 International Business Machines Corporation Flexibly deleting objects in a resource constrained environment
US7013305B2 (en) 2001-10-01 2006-03-14 International Business Machines Corporation Managing the state of coupling facility structures, detecting by one or more systems coupled to the coupling facility, the suspended state of the duplexed command, detecting being independent of message exchange
US6327587B1 (en) 1998-10-05 2001-12-04 Digital Archaeology, Inc. Caching optimization with disk and/or memory cache management
US6601058B2 (en) * 1998-10-05 2003-07-29 Michael Forster Data exploration system and method
US6327702B1 (en) * 1998-12-30 2001-12-04 Microsoft Corporation Generating a compiled language program for an interpretive runtime environment
US6301703B1 (en) * 1998-12-31 2001-10-09 Nortel Networks Limited Method for transforming state-based IVR applications into executable sequences of code
US6829770B1 (en) * 1999-02-23 2004-12-07 Microsoft Corporation Object connectivity through loosely coupled publish and subscribe events
US6507946B2 (en) * 1999-06-11 2003-01-14 International Business Machines Corporation Process and system for Java virtual method invocation
US6748555B1 (en) * 1999-09-09 2004-06-08 Microsoft Corporation Object-based software management
US6920636B1 (en) * 1999-12-15 2005-07-19 Microsoft Corporation Queued component interface passing for results outflow from queued method invocations
US6470360B1 (en) 1999-12-20 2002-10-22 Sybase, Inc. Database system with improved methodology for page allocation
US6934694B2 (en) * 2001-06-21 2005-08-23 Kevin Wade Jamieson Collection content classifier
US7689435B2 (en) * 2001-09-11 2010-03-30 International Business Machines Corporation Method and apparatus for creating and managing complex business processes
US20030050886A1 (en) * 2001-09-11 2003-03-13 International Business Machines Corporation Method and apparatus for managing the versioning of business objects using a state machine
US20030050813A1 (en) * 2001-09-11 2003-03-13 International Business Machines Corporation Method and apparatus for automatic transitioning between states in a state machine that manages a business process
US7627484B2 (en) * 2001-09-11 2009-12-01 International Business Machines Corporation Method and apparatus for managing and displaying user authorizations for a business process managed using a state machine
US20030050789A1 (en) * 2001-09-12 2003-03-13 International Business Machines Corporation Method and apparatus for monitoring execution of a business process managed using a state machine
US20040010780A1 (en) * 2002-07-11 2004-01-15 Nortel Networks Limited Method and apparatus for approximate generation of source code cross-reference information
US8108843B2 (en) 2002-09-17 2012-01-31 International Business Machines Corporation Hybrid mechanism for more efficient emulation and method therefor
US7415698B2 (en) * 2002-12-09 2008-08-19 International Business Machines Corporation Testing and debugging framework for application builders
US7519948B1 (en) 2002-12-26 2009-04-14 Judson Ames Cornish Platform for processing semi-structured self-describing data
US7536675B2 (en) * 2003-02-28 2009-05-19 Bea Systems, Inc. Dynamic code generation system
US7472400B2 (en) * 2003-02-28 2008-12-30 Bea Systems, Inc. Method for dynamically generating a wrapper class
US7472401B2 (en) * 2003-02-28 2008-12-30 Bea Systems, Inc. Computer product for a dynamically generated wrapper class
US20040172614A1 (en) * 2003-02-28 2004-09-02 Bea Systems, Inc. Dynamic code generation method
US7263532B2 (en) * 2003-09-23 2007-08-28 Microsoft Corporation Region-based memory management for object-oriented programs
US20050144615A1 (en) * 2003-12-29 2005-06-30 Shu-Chuan Chen Modularized custom-developed software package producing method and system
US7120776B2 (en) * 2004-02-03 2006-10-10 Oracle International Corporation Method and apparatus for efficient runtime memory access in a database
US8037102B2 (en) 2004-02-09 2011-10-11 Robert T. and Virginia T. Jenkins Manipulating sets of hierarchical data
US7454429B2 (en) * 2004-02-14 2008-11-18 Alan S Rojer Declarative Dispatch
US9646107B2 (en) 2004-05-28 2017-05-09 Robert T. and Virginia T. Jenkins as Trustee of the Jenkins Family Trust Method and/or system for simplifying tree expressions such as for query reduction
US7882147B2 (en) * 2004-06-30 2011-02-01 Robert T. and Virginia T. Jenkins File location naming hierarchy
US7620632B2 (en) * 2004-06-30 2009-11-17 Skyler Technology, Inc. Method and/or system for performing tree matching
US7712088B2 (en) * 2004-07-08 2010-05-04 Microsoft Corporation Method and system for a batch parser
US8881099B2 (en) * 2004-09-10 2014-11-04 Oracle International Corporation Dynamic generation of wrapper classes to implement call-by-value semantics
US7801923B2 (en) 2004-10-29 2010-09-21 Robert T. and Virginia T. Jenkins as Trustees of the Jenkins Family Trust Method and/or system for tagging trees
US7627591B2 (en) 2004-10-29 2009-12-01 Skyler Technology, Inc. Method and/or system for manipulating tree expressions
US7636727B2 (en) 2004-12-06 2009-12-22 Skyler Technology, Inc. Enumeration of trees from finite number of nodes
US7630995B2 (en) 2004-11-30 2009-12-08 Skyler Technology, Inc. Method and/or system for transmitting and/or receiving data
US8316059B1 (en) 2004-12-30 2012-11-20 Robert T. and Virginia T. Jenkins Enumeration of rooted partial subtrees
US8615530B1 (en) 2005-01-31 2013-12-24 Robert T. and Virginia T. Jenkins as Trustees for the Jenkins Family Trust Method and/or system for tree transformation
US20060195356A1 (en) * 2005-02-25 2006-08-31 Mark Nerenhausen Entertainment venue data analysis system and method
US7681177B2 (en) 2005-02-28 2010-03-16 Skyler Technology, Inc. Method and/or system for transforming between trees and strings
US8356040B2 (en) 2005-03-31 2013-01-15 Robert T. and Virginia T. Jenkins Method and/or system for transforming between trees and arrays
US7899821B1 (en) 2005-04-29 2011-03-01 Karl Schiffmann Manipulation and/or analysis of hierarchical data
US7580923B2 (en) * 2005-05-10 2009-08-25 Microsoft Corporation Binding for multi-part identifiers
US7971194B1 (en) 2005-06-16 2011-06-28 Sap Portals Israel Ltd. Programming language techniques for client-side development and execution
US7681178B1 (en) * 2005-07-22 2010-03-16 Adobe Systems Incorporated Cascading style sheets (CSS) prototype pointer chaining in object-oriented environment
US7627852B1 (en) * 2006-01-17 2009-12-01 Xilinx, Inc. Embedding an interpreter within an application written in a different programming language
US7958493B2 (en) 2006-01-20 2011-06-07 Kevin Edward Lindsey Type inference system and method
US7827537B2 (en) * 2006-05-26 2010-11-02 Oracle America, Inc Searching computer programs that use different semantics
US20080005727A1 (en) * 2006-06-30 2008-01-03 Robert Paul Morris Methods, systems, and computer program products for enabling cross language access to an addressable entity
US20080005528A1 (en) * 2006-06-30 2008-01-03 Morris Robert P Methods, Systems, and Computer Program Products for Using a Structured Data Storage System to Provide Access to Addressable Entities in Virtual Address Space
US20080005752A1 (en) * 2006-06-30 2008-01-03 Robert Paul Morris Methods, systems, and computer program products for generating application processes by linking applications
US20080005529A1 (en) * 2006-06-30 2008-01-03 Morris Robert P Methods, Systems, and Computer Program Products for Providing Access to Addressable Entities Using a Non-Sequential Virtual Address Space
US20080005719A1 (en) * 2006-06-30 2008-01-03 Morris Robert P Methods, systems, and computer program products for providing a program execution environment
US20080127220A1 (en) * 2006-06-30 2008-05-29 Robert Paul Morris Methods, systems, and computer program products for creating an input-value-specific loadable instance of an application
US20080005728A1 (en) * 2006-06-30 2008-01-03 Robert Paul Morris Methods, systems, and computer program products for enabling cross language access to an addressable entity in an execution environment
US7734890B2 (en) * 2006-10-06 2010-06-08 Okralabs Llc Method and system for using a distributable virtual address space
US20080120604A1 (en) * 2006-11-20 2008-05-22 Morris Robert P Methods, Systems, And Computer Program Products For Providing Program Runtime Data Validation
US8079023B2 (en) * 2007-03-22 2011-12-13 Microsoft Corporation Typed intermediate language support for existing compilers
US20080276221A1 (en) * 2007-05-02 2008-11-06 Sap Ag. Method and apparatus for relations planning and validation
US8332819B2 (en) * 2007-05-03 2012-12-11 Siemens Industry, Inc. Diagnostic and trouble-shooting methods in a wireless control and sensor network
US20080320282A1 (en) * 2007-06-22 2008-12-25 Morris Robert P Method And Systems For Providing Transaction Support For Executable Program Components
US8914774B1 (en) 2007-11-15 2014-12-16 Appcelerator, Inc. System and method for tagging code to determine where the code runs
US8954989B1 (en) 2007-11-19 2015-02-10 Appcelerator, Inc. Flexible, event-driven JavaScript server architecture
US8260845B1 (en) 2007-11-21 2012-09-04 Appcelerator, Inc. System and method for auto-generating JavaScript proxies and meta-proxies
US8719451B1 (en) 2007-11-23 2014-05-06 Appcelerator, Inc. System and method for on-the-fly, post-processing document object model manipulation
US8566807B1 (en) 2007-11-23 2013-10-22 Appcelerator, Inc. System and method for accessibility of document object model and JavaScript by other platforms
US8756579B1 (en) 2007-12-03 2014-06-17 Appcelerator, Inc. Client-side and server-side unified validation
US8819539B1 (en) 2007-12-03 2014-08-26 Appcelerator, Inc. On-the-fly rewriting of uniform resource locators in a web-page
US8806431B1 (en) 2007-12-03 2014-08-12 Appecelerator, Inc. Aspect oriented programming
US8938491B1 (en) 2007-12-04 2015-01-20 Appcelerator, Inc. System and method for secure binding of client calls and server functions
US8527860B1 (en) 2007-12-04 2013-09-03 Appcelerator, Inc. System and method for exposing the dynamic web server-side
US8639743B1 (en) 2007-12-05 2014-01-28 Appcelerator, Inc. System and method for on-the-fly rewriting of JavaScript
US8285813B1 (en) 2007-12-05 2012-10-09 Appcelerator, Inc. System and method for emulating different user agents on a server
US8335982B1 (en) 2007-12-05 2012-12-18 Appcelerator, Inc. System and method for binding a document object model through JavaScript callbacks
US8434076B2 (en) * 2007-12-12 2013-04-30 Oracle International Corporation Efficient compilation and execution of imperative-query languages
US8291388B2 (en) * 2008-01-09 2012-10-16 International Business Machines Corporation System, method and program for executing a debugger
US20090249021A1 (en) * 2008-03-26 2009-10-01 Morris Robert P Method And Systems For Invoking An Advice Operation Associated With A Joinpoint
US8291079B1 (en) 2008-06-04 2012-10-16 Appcelerator, Inc. System and method for developing, deploying, managing and monitoring a web application in a single environment
US8880678B1 (en) 2008-06-05 2014-11-04 Appcelerator, Inc. System and method for managing and monitoring a web application using multiple cloud providers
US7596620B1 (en) 2008-11-04 2009-09-29 Aptana, Inc. System and method for developing, deploying, managing and monitoring a web application in a single environment
US7653797B1 (en) * 2008-12-31 2010-01-26 International Business Machines Corporation Optimizing a marking phase in mark-sweep garbage collectors by reducing paging activity
JP5281452B2 (en) * 2009-03-25 2013-09-04 株式会社日立製作所 Memory management method, computer, and memory management program
US20100306285A1 (en) * 2009-05-28 2010-12-02 Arcsight, Inc. Specifying a Parser Using a Properties File
US20110138373A1 (en) * 2009-12-08 2011-06-09 American National Laboratories, Inc. Method and apparatus for globally optimizing instruction code
JP5439236B2 (en) * 2010-03-12 2014-03-12 株式会社日立製作所 Computer system and method of executing application program
US8997040B2 (en) * 2010-03-24 2015-03-31 Microsoft Technology Licensing, Llc Variable closure
US9032369B2 (en) * 2012-03-26 2015-05-12 Software Ag Systems and/or methods for executing appropriate tests based on code modifications using live, distributed, real-time cache and feedback loop
US9081672B1 (en) 2013-05-30 2015-07-14 Richard Michael Nemes Methods and apparatus for information storage and retrieval using a caching technique with external-chain hashing and dynamic resource-dependent data shedding
US10333696B2 (en) 2015-01-12 2019-06-25 X-Prime, Inc. Systems and methods for implementing an efficient, scalable homomorphic transformation of encrypted data with minimal data expansion and improved processing efficiency
CN112231212B (en) * 2020-10-16 2023-05-09 湖南皖湘科技有限公司 Method for detecting grammar error of program code
US11907206B2 (en) 2021-07-19 2024-02-20 Charles Schwab & Co., Inc. Memory pooling in high-performance network messaging architecture

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3622762A (en) * 1969-06-11 1971-11-23 Texas Instruments Inc Circuit design by an automated data processing machine
US4546435A (en) * 1980-06-24 1985-10-08 Herbert Frank P Graphic computer system and keyboard
US4570217A (en) * 1982-03-29 1986-02-11 Allen Bruce S Man machine interface
US4622545A (en) * 1982-09-30 1986-11-11 Apple Computer, Inc. Method and apparatus for image compression and manipulation
US4635208A (en) * 1985-01-18 1987-01-06 Hewlett-Packard Company Computer-aided design of systems
US4736320A (en) * 1985-10-08 1988-04-05 Foxboro Company Computer language structure for process control applications, and translator therefor

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4611272A (en) * 1983-02-03 1986-09-09 International Business Machines Corporation Key-accessed file organization
EP0261853A2 (en) * 1986-09-25 1988-03-30 Tektronix, Inc. System for monitoring operation of object-oriented programs

Non-Patent Citations (14)

* Cited by examiner, † Cited by third party
Title
13th Annual International Symposium on Computer Architecture, Washington D.C., US, 1986, IEEE Computer Society Press, G.S. TAYLOR et al.: "Evaluation of the SPUR Lisp Architecture", pages 444-452 see section 3.1; figure 2 *
1986 IEEE International Solid-State Circuits Conference, Digest of Technical Papers, 33rd ISSCC, February 1986, Lewis Winner, (Coral Gables, US), J.M. PENDELTON et al.: "A 32b Microprocessor for Smalltalk", pages 32-34 see page 32, left-hand column, lines 19-23 *
Advances in Object-Oriented Database Systems, 2nd International Workshop, Ebernburg, DE, 27-30 September 1988 Springer-Verlag, (Berlin, DE), S. RIEGEL et al.: "Integration of Database Management with an Object-oriented Programming Language", pages 317-322 see the whole article *
Communications of the ACM, Volume 31, No. 9, September 1988, (New York, US), R. COURTS: "Improving Locality of Reference in a Garbage-Collecting Memory Management System", pages 1128-1138 see section 4.4 *
Communications of the Association of Computing Machinery, Volume 26, No. 6, June 1983, (New York, US), H. LIEBERMAN et al.: "A real-time Garbage Collector based on the Lifetimes of Objects", pages 419-429 see section 3 *
Elektronik, Volume 34, No. 20, October 1985, (Munich, DE), K. BRAND et al.: "Optimierte Cross-Compiler Automatisch Erzeugt", pages 59-63 see page 59, right-hand column, line 20 - page 62, left-hand column, line 47 *
IBM Technical Disclosure Bulletin, Volume 32, No. 5B, October 1989, (Armonk, NY, US), "Mechanism for high-speed Execution of Interpreters and high-speed Cooperative Execution with Machine Codes", pages 387-388 see page 387, lines 31-39 *
Information Processing 86, Proceedings of the IFIP 10th World Computer Congress, Dublin, Ireland, 1-5 September 1986, Elsevier Science Publishers B.V., (North-Holland), (Amsterdam, NL), T. CHUSHO et al.: "A Multilingual Modular Programming System for Describing Knowledge Information Processing Systems", pages 903-908 see section 2.2 *
Review of the Electrical Communication Laboratories, Volume 29, Nos. 1-2, January/February 1981, (Tokyo, JP), M. HASHIMOTO et al.: "Time Sharing System Oriented Database Management System", pages 16-31 see section 4.1, figure 2 *
Sigplan Notices, Volume 19, No. 5, May 1984, ACM, (Pittsburgh, US), D. UNGAR: "Generation Scavenging: Storage Reclamation Algorithm", pages 157-167 see section 8.1 *
Sigplan Notices, Volume 22, No. 12, 1987, P.D. O'BRIEN et al.: "The Trellis Programming Environment", pages 91-102 see section 6 *
Sigplan Notices, Volume 23, No. 10, October 1988, ACM, (US), F. MELLENDER: "An Integration of Logic and Object-Oriented Programming" pages 181-185 see the whole article *
Software Practice & Experience, Volume 19, No. 8, August 1989, John Wiley & Sons, Ltd, (Chichester, Sussex, GB), A. STRAW et al.: "Object Management in a Persistent Smalltalk System", pages 719-737 see the whole article *
The 11th Annual International Symposium on Computer Architecture, Ann Arbor, Michigan, 5-7 June 1984, IEEE, (US), D. UNGAR et al.: "Architecture of SOAR: Smalltalk on a RISC", pages 188-197 see section 3 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2253500A (en) * 1990-08-23 1992-09-09 Data General Corp Object oriented-data bases
US5526519A (en) * 1992-03-27 1996-06-11 Hitachi, Ltd. Memory management technique for efficient execution of procedures in objects
EP0562617A1 (en) * 1992-03-27 1993-09-29 Hitachi, Ltd. Object management system
WO1994014117A1 (en) * 1992-12-17 1994-06-23 Siemens Aktiengesellschaft Process for testing at least one class of an object-oriented program on a computer
US5794043A (en) * 1992-12-17 1998-08-11 Siemens Aktiengesellschaft Method for testing at least one class of an object-oriented program on a computer
EP0706125A1 (en) * 1994-09-30 1996-04-10 International Business Machines Corporation Object oriented system and method for generating target language code
EP0709773A1 (en) * 1994-09-30 1996-05-01 International Business Machines Corporation System and method for generating target language code utilizing an object oriented code generator
US5875331A (en) * 1994-09-30 1999-02-23 International Business Machines Corp. System and method for generating target language code utilizing an object oriented code generator
EP0945812A1 (en) * 1998-03-23 1999-09-29 Application Building Blocks Limited Data processing systems
AU746062B2 (en) * 1998-03-23 2002-04-11 Application Building Blocks Limited Data processing systems
EP1308845A1 (en) * 2001-10-23 2003-05-07 Sun Microsystems, Inc. Method and apparatus for scoped memory
CN111078548A (en) * 2019-12-06 2020-04-28 上海励驰半导体有限公司 Test case analysis method and device, storage medium and verification platform
CN111078548B (en) * 2019-12-06 2023-05-23 上海励驰半导体有限公司 Test case analysis method and device, storage medium and verification platform
CN112463626A (en) * 2020-12-10 2021-03-09 网易(杭州)网络有限公司 Memory leak positioning method and device, computer equipment and storage medium
CN112463626B (en) * 2020-12-10 2023-07-11 网易(杭州)网络有限公司 Memory leakage positioning method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
EP0439533A1 (en) 1991-08-07
WO1990004829A3 (en) 1990-12-13
JPH04501477A (en) 1992-03-12
US4989132A (en) 1991-01-29

Similar Documents

Publication Publication Date Title
US4989132A (en) Object-oriented, logic, and database programming tool with garbage collection
Appel et al. A standard ML compiler
Somogyi et al. The execution algorithm of Mercury, an efficient purely declarative logic programming language
Kelsey et al. A tractable Scheme implementation
Chambers et al. An efficient implementation of SELF a dynamically-typed object-oriented language based on prototypes
US6199095B1 (en) System and method for achieving object method transparency in a multi-code execution environment
US6226789B1 (en) Method and apparatus for data flow analysis
US7007005B2 (en) Method and structure for reducing search times
US6535903B2 (en) Method and apparatus for maintaining translated routine stack in a binary translation environment
US7036118B1 (en) System for executing computer programs on a limited-memory computing machine
Houri et al. A sequential abstract machine for Flat Concurrent Prolog
Wirfs-Brock et al. A overview of modular smalltalk
Carlsson et al. Message analysis for concurrent programs using message passing
Marlow et al. The new GHC/Hugs runtime system
Jones et al. The STG runtime system (revised)
Carlsson The SICStus Emulator
Bogle A Safe, Efficient Object Database Interface Using Batched Futures
Jacobs et al. UCL+ P—defining and implementing persistent Common Lisp
Printezis et al. Defining and handling transient fields in PJama
Bowen et al. Northeast Artificial Intelligence Consortium Annual Report 1986. Volume 6. Part B. Knowledge Base Maintenance Using Logic Programming Methodologies
Haeberlen Using platform-specific optimizations in stub-code generation
Johannes_Kepler_Universität Metaprogramming in Oberon
Phan et al. Region-based memory management for Mercury programs
Dietrich Adapting a portable SIMULA compiler to Perkin-Elmer computers in a UNIX environment
Bottema Asynchronous Message Passing and Concurrent Scheme Implementation

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): JP

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): AT BE CH DE FR GB IT LU NL SE

AK Designated states

Kind code of ref document: A3

Designated state(s): JP

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): AT BE CH DE FR GB IT LU NL SE

WWE Wipo information: entry into national phase

Ref document number: 1989912328

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1989912328

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 1989912328

Country of ref document: EP