THE WORLD BANK GROUP
A World Free of Poverty
Home

Banner

Technology Broadcast and Computer-based
On-line databases and services

Database Architectures

Hugh Robinson

Context:
This selection, the second of two from an introduction to database technology, discusses models—or architectures—of database structures.

Source:
Robinson, Hugh. "Database Architectures." In Relational database systems, Introduction to database technology. Milton Keynes: Open University, pp. 20-31.

Copyright:
Reproduced with permission. (Permission does not cover identifiable third party material).

2.3 Architectures

In this section we describe three models—or architectures, as they are often called—of the way in which a DBMS structures and manipulates persistent data. In examining each architecture it is important to realize that our intention is not to show what any particular DBMS does or how how it may be internally constructed. Rather it is to give you, via each architecture, a particular way of thinking about what a DBMS does. This will give you a greater insight into the data definition and data manipulation functions of a DBMS, and will also illuminate aspects of the other functions. We begin with an architecture for the way in which a DBMS structures data.

2.3.1 A schema architecture

So far, we have discussed the data definition function in terms of a DBMS having a single schema for the description of the persistent data of a database system. This is a useful simplification, but matters are really a little more complex. The schema architecture12 we now describe explores this complexity, and relates it back to the benefits that accrue from the database approach. The schema architecture we shall discuss is illustrated in Figure 2.10.

To begin with, you should note that the various solid lines in Figure 2.10 indicate that certain components in the architecture related to each other, in a manner which will be discussed shortly. In particular, note that the lines are not intended to indicate processing flow. As you can see, our simplification of a single schema has been replaced by three kinds of schema: logical schema, storage schema and userschema. The purpose of these three kinds of schema will be explained shortly. We have also included, at the top of the diagram, various user processes. The term user process is introduced to cover access to a database system by any kind of software on behalf of a user. We shall now explore these various concepts in detail.

  • A logical schema is the single, central definition of the logical properties of the data in a particular database. It defines what the data is, rather than how it may be stored and accessed. For any given database system there will be one logical schema. It is a definition of the properties of types of data, which then determines the properties of occurrences (instances) of data in a database. It is defined in a formal language that can be processed by a machine. That language is sometimes referred to as the logical schema data definition language (or logical schema DDL).

    At this stage, it is reasonable for you to ask what a relational logical schema would look like. The detail of the answer to this question is given in Books 2 and 4, where you will learn about logical schema DDLs for relational databases. However, roughly speaking, the effect of creating a relational schema is to create the various headings of the tables that comprise a particular relational database system. So, the University relational schema will consist of definitions of the headings of the tables that make up the University relational database, such as the headings of the two tables in Figure 2.8. It is important to note that the definition of these headings will include constraints on the data occurrences that can appear under those headings. By way of illustration, the definition of the StudentId column in the Student table will ensure that only valid, unique student identification numbers can appear as occurrences in that column (as opposed to, say, student names). The col umn headings of the Student table are the 'properties of the types of data', whilst the individual rows are the occurrences of data' which are in accordance with the properties.

  • A userschema is a definition of the logical properties of the data required by some user, or group of users, of the database, to be referenced by a user process for that user. As with the logical schema, userschemas are definitions of the properties of types of data and are defined in a form that can be processed by a machine. Since there will typically be many users of a database, there will be many userschemas, each based on the same single logical schema. The only way a user may access the database is via a userschema. A user process may use only one userschema. Typically, several user processes may require the same userschema—view of data—and may therefore share the same userschema.

    In any given database system, the structure of a userschema may differ from that of the underlying logical schema. For example, in terms of the relational approach, the userschema may omit certain tables that are defined in the schema or may omit columns from those tables that it does include. More radically, it may define a (userschema) table that is made up from data from more than one table defined in the schema. Therefore, a mapping defines how the data described by a userschema may be derived (produced) from the data described by a logical schema. Each userschema has an associated mapping.

    An example of a relational userschema and some associated instances of data is depicted in Figure 2.11.

    StudentsWithCounsellors
    StudentId Name Registered CounsellorName Region
    s01 Akeroyd 1988 Jennings 3
    s02 Thompson 1993 Heathcote 4
    s05 Ellis 1992 Heathcote 4
    s07 Gillies 1991 Jennings 3
    s09 Reeves 1993 Heathcote 4
    s10 Urbach 1992 Heathcote 4
    Figure 2.11 A relational userschema

    The userschema, which consists of a single table named StudentsWithCounsellors, meets the needs of a (hypothetical) group of users who require data about students but need the name of a student's counsellor rather than the staff number. Consequently, the (userschema) table StudentsWithCounsellors is derived from the two underlying (schema) tables of Student and Staff. Since StudentsWithCounsellors is a table, its derivation from Student and Staff—the mapping—can be described in terms of statements in a relational DML.13 The DBMS will not store the data defined in such a userschema table as StudentsWithCounsellors. Rather, it will derive the data, as needed, using the mapping.14

  • Each user process includes a description, in a DML, of some manipulation of the data defined by the userschema to which the user process refers. This manipulation is achieved by means of statements that describe the insertion, deletion, amendment and retrieval of data required by the user process. These statements act on the data defined by the userschema, but the mapping for the userschema can be thought of as translating these statements into actions on the data defined by the underlying logical schema. Although a user process does not include the relevant userschema (rather, it references it), it does include data structures, corresponding to those of the userschema, that allow the transfer of data to and from the userschema. This is necessary in order for the user process to assemble new data for insertion and examine previously stored data that has been retrieved.

  • A storage schema is a definition of how the data defined in a logical schema is stored and accessed. It is defined in a language sometimes referred to as the storage schema data definition language (or storage schema DDL). For our purposes, there is one storage schema for any given database system, just as there is one logical schema. The detail of the structures defined in a storage schema are outside the scope of this course,15 but it is helpful to think in terms of stored files, file organizations and access methods. The data in each table described in a logical schema needs to be physically stored in some fashion. This is achieved, roughly speaking, by allocating the data in each table to a stored file, which will be managed by the computer's operating system. Each stored file will be stored according to some file organization (such as ordered, indexed or hashed) that will support an access method (such as sequential or direct) by which the stored records comprising the file may be stored and retrieved. As you can appreciate, a storage schema employs techniques from the mass storage technology used in computer systems.

    Another mapping defines how the data defined in a logical schema is implemented by the storage and access structures in a storage schema.

Figure 2.10 shows two broken lines: one running through the user processes, the other running between the storage schema and the stored database. These broken lines indicate the boundaries between the various schemas and the other components in a database system.

The upper broken line—through the user processes—indicates that they are not properly part of the schema architecture in their entirety. The fact that they employ statements that act on data described by schemas, and include data structures that correspond to a userschema, implies that they are part of the schema architecture. However, the fact that a user process includes much that is not directly relevant to schema architecture—computation on data before and after any access of the database and mechanisms for input by, and output to, the user—implies that they are not completely part of the schema architecture.

The lower broken line—between the storage schema and the stored database—indicates that the data occurrences in the stored database are not part of the schema architecture. As we have emphasized, the various schemas are definitions in terms of types, whereas the stored database is merely a collection of occurrences and is not a definition in the sense that we have been using the term.

It is important to appreciate the primary role played by the logical schema in the schema architecture. The logical schema is the central definition of the structure and semantics of the data in a stored database; it is the basis for providing the various userschemas. It defines the types of data on which, ultimately, the statements in user processes will act, and it defines the types of data which, ultimately, will be stored in the database in accordance with the storage and access structures defined in a storage schema.

SAQ 2.5 For any given database, how many userschemas, how many logical schemas and how many storage schemas can there be?

Solution There can be many userschemas, but only one logical schema and only one storage schema.

SAQ 2.6

  • How many userschemas may a user process refer to?
  • How many user processes may refer to a single userschema?

Solution

  • A user process refers to only one userschema.
  • Many user processes may refer to the same userschema.

SAQ 2.7 Which component plays the primary role in the schema architecture?

Solution The logical schema. Both the userschemas and the storage schema are mapped from the logical schema.

We can now usefully review the benefits that accrue from the database approach in the light of this schema architecture.

Sharing of data by applications. The userschema component of the architecture provides each application with a view of the appropriate data, whilst the associated mapping between a userschema and a logical schema defines how this application-oriented view of the persistent data is related to the data described by a logical schema. Furthermore, controlled redundancy can be introduced. The relational userschema depicted in Figure 2.11 repeats the name of a counsellor (as a CounsellorName value) for each student of that counsellor. However, this redundancy is controlled, in the sense that each repetition of a CounsellorName value is a replica of the Name value which is held just once in the underlying (logical schema) Staff table, as depicted in Figure 2.8. Provided that any change to a counsellor's name is effected by changing the Name value in this underlying Staff table, then the mapping ensures that subsequent use of the userschema shows that the change is propagated to all replicas.

Better applications development. In our discussion of the database approach in Section 2.1.2 we stated that better applications development was possible if constraints were firmly under the control of the DBMS. The architecture allows this. Constraints are expressed centrally in a logical schema, with the mapping between a userschema and the schema ensuring that the constraint is effective for data manipulated by a user process.

Ability to respond to change. We are now in a position to appreciate how the impact on user processes of changes to the structure of data can be minimized. In Section 2.1.2 we described the state of affairs for a file-based system, in which a change to the structure of data required a change to application program code. This state of affairs is known as data dependence: knowledge of the complete logical structure of the data and its physical storage and access is built into the logic and code of application programs. In contrast, the schema architecture shows that a DBMS can provide data independence in two ways: physical data independence and logical data independence.

Physical data independence: changes in the physical storage of data or in the access methods to data do not result in a need to change application programs.

Physical data independence is an outcome of the separation of the logical schema from the storage schema. Any change in physical storage or access method is a change to the storage schema. The logical schema is insulated from changes to the storage schema by virtue of the mapping between the two: it is the mapping that may require change and not the logical schema. If the logical schema does not change, then there is no need for application programs to change.

Logical data independence: changes to the logical structure of the data should not result in a need to change existing application programs.

Logical data independence is an outcome of the separation of the userschemas from the logical schema. A change in the logical structure is a change to the logical schema, either as a result of a new application requiring new data structures or an existing application changing its data requirements so that existing data structures need to be modified. However, the userschemas of other existing application programs are insulated from changes to the logical schema by virtue of the mapping between the two: it is the mapping that may require change and not the userschemas. If existing userschemas do not change, then there is no need for existing application programs to change.

Central control. The central position of the DBMS is reflected by the primary role played by the logical schema in the schema architecture. Constraints and access control mechanisms expressed in the schema apply both to the stored data (as described by the storage schema) and to the use made of the data by applications (as described in the userschemas).

We shall now examine this central role of the DBMS from another aspect: an architecture for the way in which a DBMS processes data.

2.3.2 A processing architecture

Before describing the processing architecture, we need to introduce the idea of a database control system (or DBCS). A database control system is that part of a DBMS which enables and controls access to a database. Many of the functions we have given in Section 2.2.1 are provided by additional components of a DBMS, which in turn use the DBCS. The DBCS receives all requests to access or manipulate data in the database. A more detailed appreciation of the nature and role of the DBCS can be gained from considering the processing architecture.

A simplifying aspect of the processing architecture that we are about to describe is that the use of a DBMS is considered in two distinct phases. The first phase concerns the establishment of the schemas; the second phase concerns the use of the database for retrieval and update.

The first phase covers database definition on the basis of given data requirements and is shown in Figure 2.12. This phase begins with the specification, in the appropriate DDL, of the various schemas and associated mappings. The schema definitions—the source schema—then have to be processed by a module of the DBMS, which creates a stored form of the schemas. This module provides the DBMS data definition function and may be referred to as a schema processor. The schema processor checks the syntax and consistency of the definitions and, if they are acceptable, creates an encoded form—referred to as the object schema. The object schema needs to be kept for regular use, so the schema processor uses the database control system to store this form of the schema. The object schema that is stored consists of the userschemas, the logical schema, the storage schema and their associated mappings.

The second phase of the processing architecture, shown in Figure 2.13 (overleaf), covers database access for retrieval and update by various types of users. Only two methods of database access are considered in this phase of the model: access by means of application programs and access by means of a query language.

In general, there may be any number of application programs that can access a particular database, each for a specific purpose. We are not concerned with the details of how these programs may be written; it is enough to know that embedded within each program are requests to access some part of a database defined by a userschema. These requests are expressed using a DML and, for the purposes of the processing model, it is referred to as an embedded language. DML statements are embedded in the statements written in the programming language used for the application program. The programming language is said to host the DML. When an embedded language statement within an application program is executed, the DBCS is invoked to perform the required access to the database. For our purposes, we can note three factors about this method of processing. One, the user of an application program does not issue requests to the DBCS. Rather, the user provides data, in a fashion suited to the need s of the user, and it is the application program that issues requests to the DBCS, on the basis of the data input by the user. Two, the user does not receive data directly from the DBCS. Rather, the user receives data, in a fashion suited to the needs of the user, from the application program (which has, of course, received it from the DBCS). Three, each application program is designed for a specific purpose.

The alternative way of accessing a database, as shown in Figure 2.13, is to use a query language. Despite the connotations of the name, query languages may be used for retrieval from, or the updating of, a database. Users can make use of a query language via a query processor16—a program which interacts with users to process their query statements requesting database access. Query statements are expressed in terms of a DML and are issued directly by the user. A query language user may give any query statement to the query processor, and the statement is executed by invoking the database control system to perform the required database access, with the response coming back directly to the user. A query processor is written for use with a particular DBMS but it is a general-purpose program that may be used to access any database defined using the DDL for that DBMS (in contrast to an application program which is written to access a specific datab ase in a predetermined way).

Access to a database by means of application programs or a query processor is controlled by the DBCS. The DBCS is another general-purpose program, able to be used for any database defined by the appropriate DDL. The way it processes any request to access a database is determined bv the definitions of the database given in the object schema. Thus any access to a database requires the corresponding stored object schema (i.e. the stored userschemas, logical schema, storage schema and mappings) to be available to the DBCS.

Exercise 2.2
Why should you expect the object schema to be different from the corresponding source schema defined using a DDL?
Exercise 2.3
Describe the differences in the ranges of data which may be accessed using an application program and using a query language.

It is important to emphasize that our two-phase processing architecture is a simplified model of database processing. For completeness, therefore, we conclude this subsection with a brief discussion of two of the simplifications we have made.

First, the architecture overlooks the fact that, once source schemas have been processed, appropriate physical storage structures need to be created before the database can be used—these structures are essentially the containers in which data can be stored. This process may follow automatically after schema processing, or can be part of a process which includes loading existing files of data into the database.

Second, although it is not included in the architecture, you need to be aware that the operating system that forms part of the database environment also has a role to play in database processing. The operating system's role is to provide the input and output functions that a DMBS needs in order to put data on the disks used for storage. There are significant variations in these functions for different operating systems, and in the way different DBMSs make use of them. This results in many differences in the role and specification of storage schemas, and so the details are not considered in this course. General principles, however, do need to be established, and so part of the processing architecture is elaborated in the next subsection to show the use of an operating system and a simple architecture for storage.

2.3.3 An architecture for storage

A computer's operating system stores data for a DBMS using files which it controls. In general, these files are kept on large hard disks containing many millions of bytes, but the same principles apply to files on floppy disks. Data is transferred to and from a disk in a block (also called a page), which is of a specified size, e.g. 1K or 4K bytes. Blocks must be organized in such a way that individual blocks can be stored and retrieved as required. The role of the operating system in processing blocks of data is represented in Figure 2.14.

Figure 2.14 only shows the way the database control system makes use of the operating system, but it should be remembered that the database control system also needs the schemas. It is the storage schema, together with implementation details of the particular DBMS, which determines the actual form of the data which the DBMS gives to the operating system as a stored record. A stored record contains data from one or more logical records together with various types of additional data concerned with storage and retrieval, such as a DBMS label for the related logical record(s), its placement in a block, and pointers to any other stored records to which it may be related.

When the operating system receives a stored record it is stored according to some file organization, and later may be retrieved by an access method. An operating system may provide a number of file organizations for the DBMS to use; for each record type, a file organization is either specified within the storage schema or automatically chosen by the DBMS implementation. A file organization provides a means of placing the stored records on disk, and one or more associated access methods provide a way of retrieving such records.



The World Bank Site
The World Bank Site

Last Updated: April 1999