GNU SQL Server : Preliminary Description The GNU SQL Server is a portable multiuser DBMS, which supports the SQL89 standard with some extension from SQL92. It implements highly isolated transactions, and static & dynamic query compilation. Currently both the client & server sides of the system work on Unix-like systems. Client/server interaction is based on an RPC mechanism. The server subproceses facility requires message passing and memory sharing facilities. General Architecture GNU SQL is based on a client-server architecture for DBMS organizations. In fact, we support two kinds of "clients and servers" according to a way of mutual communications. The set of processes operating on a data base may be divided into several layers depending on their distance from the data base. The closest or innermost layer is the SDTM layer. These processes provide the facilities of Sinchronization (two-phase predicate locks), Logical Log (the log of updates of logical level), Micro Log (the log for microoperations), Buffer (main memory bufferization, exchanges with disks, locks of pages), and Sorter (to make external sorts). SDTM also includes utilities to restore data base after failures and crashes and a run-time library for transactions. trhis library implements the external interface of SDTM. All processes of this layer communicate with message queues and shared memory. The next layer consists the compiler and interpreter for SQL statements. It implements an interface based on remote procedure calls. It uses the transaction library from the innermost layer to access the data base. The server part of the system also contains administration process that monitors a state of all other processes of the server and starts compiler and interpreter processes on demand. It can also unload all other parts of the server if they aren't used for a long time. This process communicates with other processes of the internal layer with signals and messages and is accessable for external invocations via remote procedure calls. The Administrator may be registered as an internel server. Client parts of the system may reside on any computer of a local network. The client part consists of the client part of the SQL compiler and the client part of the interpreter which are linked together with a user's program. The only requirement for an environment where a client part functions is that it should support the mechanism of remote procedure calls. The client part of the compiler (the precompiler) is responsible for selection of pure SQL-text and its transmission to the main (server) part of the compiler to convert into a procedural representation. After such processing, user gets a pure C program, in which SQL code is replaced by calls of the runtime-library of the client part of the query interpreter. The corresponded procedural translation of the SQL statements is stored within a data base. When executing an SQL program, the server part of the interpreter is provided with reference to previously stored compiled code together with parameters for this invocation. The server part of the interpreter retrieves the code from data base, and executes it with appropriate invocation of the SDTM layer. We decided to use a machine-independent internal representation of procedural SQL code for the sake of portability and reliability, as well as because it is avoids the need for complicated dynamic linking mechanisms for C. Moreover, most of the time is spent not in interpretation butin data retrieval and analysis of retrieval conditions. Therefore we believe this decision has very little influence on the system's overall performance. SQL-Language of GNU SQL Activities for standardization of SQL had started practically simultaneously with its first commercial implementations. The SQL of System R was not developed carefully enough, and commercial dialects were so different that no one could be accepted as a standard. The first International Standard SQL-89 in many parts is not defined precisely and allows different interpretations. This standard also omits such important things as schema manipulation and dynamic SQL facilities. The more complete and precise standard SQL-92 was adopted after the project of GNU SQL had been started. Implementation Requirements to the SQL Dialect for GNU SQL It was obvious that the basis for our project should be SQL. But the problem was to create some complete dialect of this language. It was not possible to use the SQL of System R - in this case the project would be a pure research. An absence of any full descriptions prevented usage of a dialect of any commercial DBMS. The project was started at the end of 1991 when the standard of SQL-92 was not adopted. Available drafts of this standard were too preliminary to use them practically. Thus the only possibility was to base on SQL-89 with necessary additions and extensions. Generally, in forming implementation dialect of SQL we kept in mind following requirements: - the dialect should not contradict to the standard; - the language should be powerful enough to provide practical use of the system; - the extensions should be convenient from practical point of view, and they should not complicate the implementation; - the reductions of the language should not lead to full absence of a correspondent mechanism in the implementation. We describe below the dialect of SQL which was can be used to develop applications with embedded C, to develop interactive applications, and to access data base interactively. The Dialect of SQL of GNU SQL As we noted above, the dialect of the SQL of GNU SQL is oriented towards embedded C and uses an ASCII-compatible character set. The dialect supports data types that are adequate to the base set of types of C, and also strings of various length. Statements for data definitions allow to create and to drop relations and views, and also to grant priviliges to other users. Supported constraints allow to prevent entrance of NULL-values, and to define constraints of uniquety on tables. The language also allows to declare check constraints on a table (conditions that should be true on every tuple of the relation) and referencial constraints. The set of statements for data manipulations includes statements to insert, update and delete tuples from relations. The select statement may include unions (possibly, with elimination of duplicates), joins, and nested subqueries. A result table may be sorted in ascendant or descendant order. A result table may be processed tuple by tuple through a sequential cursor. The SQL dialect is oriented towards use in a Unix-like environment. This environment provides the implementation with an authorization identifier (user name) and a module name for any given C program with embedded SQL. The last name is formed as a name of corresponding source file qualified with user name. ** Is this accurate? The dialect does not support any explicit statement to connect a data base. A connection is provided automatically to SQL-server specified when you compile an SQL program. An SQL server name is simply the host name of the computer where the server is installed. Now we consider some detailed issues of the SQL dialect: 1. Data description SQL statements manipulate the following data elements: constants (numbers and strings), parameters (variables of a host C program), columns, tables, and user defined schemes. The identifiers used to name these elements may be represented in traditional form ([A-Za-z][A-Za-z0-9]*, as in A0z9) or as double quoted strings ("This is an identifier too"). Names of host variables should also satisfy to naming rules of C. GNU SQL does not restrict the length of host variable names but any name of table, column, user-defined schema should not be longer than 18 characters. Literal strings of characters are restricted by single quotas ('this is an example of such literal') and may contain any characters in the character set. Strings are sorted according to character codes; thus, if KOI-8 - Russian extension of ASCII - is in use than sorting will not always provide a desirable ordering. Parameters and columns of any type and expressions constructed from them may have a NULL value in addition to all usual values of that data type. In indexes, the NULL value is considered to be less than every non-NULL value of the same type. To fetch into a host variable a column or expression that may have NULL-value, use additional indicator variable that gets a value of -1 if a value of the main variable is NULL. 2. Data types In the current version we support only those data types that may be directly mapped into data types of standard C programming language. The correspondence between SQL data types and C data types is shown below: SQL data type C data type CHAR(ACTER)([lenght]) char [lenght] INT(EGER) long SMALLINT short REAL float DOUBLE PRECISION The system also partially supports SQL FLOAT data type: depending on required precision it is mapped onto either float or double data type of C. But if too high precision is required (which may not be emulated by double) then an error message is generated. We also handle the SQL DECIMAL data type in the same way, when the scale is zero. 3. Schema manipulations and consistency constraints By definition, a user schema is the totality of all stored objects of given user: tables, views, modules (the set of compiled SQL-statements that correspond to one user's programs) and triggers. The user schema is actually created when the first object of this user is created. The SQL CREATE SCHEMA statement does not create anything itself, it just defines a context (default authority identifier) for all nested statements for table and view creation. The SQL CREATE TABLE statement specifies a table name, the column description, and the integrity constraints that should be applied to a given table. Each column is specified by its name, data type and optionally a default value. Our dialect of SQL allows three kinds of integrity constraints: primary and alternative key constraints (unique constraints), referential constraints (for foreign keys), and check constraints. In the case of referential constraints there should exist a unique constraint for an appropriate set of refered tables. All unique and check constraints are checked on each row update of a given table. Foreign key constraints are checked on each row update both of refering and referred tables. For each check or referential constraint, an appropriate trigger is created within a data base. The only function of this trigger is to check a constraint and to undo any SQL statement that violates the constraint. Explicit triggers are not yet implemented. The SQL ALTER TABLE statement in our system allows only addition of columns and/or integrity constraints. The statement DROP TABLE is successively executed only if there are references (foreign keys) to it. We support views; that is, stored named queries that are treated as tables. Our version of SQL does not support modules explicitly: there is no special language construct to create explicit modules. But implicit modules are created and maintained by the SQL compiler. 4. Data access control Each (registered) user may create objects with any authorization identifier that may differ from his/her own identifier. But only this creator (owner) may modify or drop these objects. Immediately after creation only owner has rights for any operation with this object, and all other users cannot do anything to it. But later the owner can grant all or part of his/her rights to another user(s) with SQL GRANT PRIVILEGES statement, including WITH GRANT OPTION. Any grantor may also revoke previously granted privileges with SQL REVOKE PRIVILEGES statement. Of course, REVOKE statement will revoke privileges of all users that are grantees of a grantee of the recent grantor. In the terms of the SQL-89 standard, GNU SQL supports the SELECT, INSERT, DELETE, UPDATE, and REFERENCE privileges.