2.3 Architectures
In this section we describe three modelsor architectures, as they are
often calledof the way in which a DBMS structures and manipulates persistent data.
In examining each architecture it is important to realize that our intention is not to
show what any particular DBMS does or how how it may be internally constructed. Rather it
is to give you, via each architecture, a particular way of thinking about what a DBMS
does. This will give you a greater insight into the data definition and data manipulation
functions of a DBMS, and will also illuminate aspects of the other functions. We begin
with an architecture for the way in which a DBMS structures data.
2.3.1 A schema architecture
So far, we have discussed the data definition function in terms of a DBMS having a
single schema for the description of the persistent data of a database system. This is a
useful simplification, but matters are really a little more complex. The schema
architecture12 we now describe explores this complexity,
and relates it back to the benefits that accrue from the database approach. The schema
architecture we shall discuss is illustrated in Figure 2.10.
To begin with, you should note that the various solid lines in Figure 2.10 indicate
that certain components in the architecture related to each other, in a manner which will
be discussed shortly. In particular, note that the lines are not intended to indicate
processing flow. As you can see, our simplification of a single schema has been replaced
by three kinds of schema: logical schema, storage schema and userschema. The purpose of
these three kinds of schema will be explained shortly. We have also included, at the top
of the diagram, various user processes. The term user process is introduced to
cover access to a database system by any kind of software on behalf of a user. We shall
now explore these various concepts in detail.
- A logical schema is the single, central definition of the logical properties of
the data in a particular database. It defines what the data is, rather than how it may be
stored and accessed. For any given database system there will be one logical schema. It is
a definition of the properties of types of data, which then determines the properties of
occurrences (instances) of data in a database. It is defined in a formal language that can
be processed by a machine. That language is sometimes referred to as the logical schema
data definition language (or logical schema DDL).
At this stage, it is
reasonable for you to ask what a relational logical schema would look like. The detail of
the answer to this question is given in Books 2 and 4, where you will learn about logical
schema DDLs for relational databases. However, roughly speaking, the effect of creating a
relational schema is to create the various headings of the tables that comprise a
particular relational database system. So, the University relational schema will consist
of definitions of the headings of the tables that make up the University relational
database, such as the headings of the two tables in Figure 2.8. It is important to note
that the definition of these headings will include constraints on the data occurrences
that can appear under those headings. By way of illustration, the definition of the StudentId
column in the Student table will ensure that only valid, unique student
identification numbers can appear as occurrences in that column (as opposed to, say,
student names). The col umn headings of the Student table are the 'properties of
the types of data', whilst the individual rows are the occurrences of data' which are in
accordance with the properties.
- A userschema is a definition of the logical properties of the data required by
some user, or group of users, of the database, to be referenced by a user process for that
user. As with the logical schema, userschemas are definitions of the properties of types
of data and are defined in a form that can be processed by a machine. Since there will
typically be many users of a database, there will be many userschemas, each based on the
same single logical schema. The only way a user may access the database is via a
userschema. A user process may use only one userschema. Typically, several user processes
may require the same userschemaview of dataand may therefore share the same
userschema.
In any given database system, the structure of a userschema may differ from
that of the underlying logical schema. For example, in terms of the relational approach,
the userschema may omit certain tables that are defined in the schema or may omit columns
from those tables that it does include. More radically, it may define a (userschema) table
that is made up from data from more than one table defined in the schema. Therefore, a mapping
defines how the data described by a userschema may be derived (produced) from the data
described by a logical schema. Each userschema has an associated mapping.
An example of a relational userschema and some associated instances of data is depicted
in Figure 2.11.
| StudentsWithCounsellors |
| StudentId |
Name |
Registered |
CounsellorName |
Region |
| s01 |
Akeroyd |
1988 |
Jennings |
3 |
| s02 |
Thompson |
1993 |
Heathcote |
4 |
| s05 |
Ellis |
1992 |
Heathcote |
4 |
| s07 |
Gillies |
1991 |
Jennings |
3 |
| s09 |
Reeves |
1993 |
Heathcote |
4 |
| s10 |
Urbach |
1992 |
Heathcote |
4 |
| Figure 2.11 A relational userschema |
The userschema, which consists of a single table named StudentsWithCounsellors,
meets the needs of a (hypothetical) group of users who require data about students but
need the name of a student's counsellor rather than the staff number. Consequently, the
(userschema) table StudentsWithCounsellors is derived from the two underlying
(schema) tables of Student and Staff. Since StudentsWithCounsellors
is a table, its derivation from Student and Staffthe mappingcan
be described in terms of statements in a relational DML.13
The DBMS will not store the data defined in such a userschema table as StudentsWithCounsellors.
Rather, it will derive the data, as needed, using the mapping.14
- Each user process includes a description, in a DML, of some manipulation of the
data defined by the userschema to which the user process refers. This manipulation is
achieved by means of statements that describe the insertion, deletion, amendment and
retrieval of data required by the user process. These statements act on the data defined
by the userschema, but the mapping for the userschema can be thought of as translating
these statements into actions on the data defined by the underlying logical schema.
Although a user process does not include the relevant userschema (rather, it
references it), it does include data structures, corresponding to those of the userschema,
that allow the transfer of data to and from the userschema. This is necessary in order for
the user process to assemble new data for insertion and examine previously stored data
that has been retrieved.
- A storage schema is a definition of how the data defined in a logical schema is
stored and accessed. It is defined in a language sometimes referred to as the storage
schema data definition language (or storage schema DDL). For our purposes,
there is one storage schema for any given database system, just as there is one
logical schema. The detail of the structures defined in a storage schema are outside the
scope of this course,15 but it is helpful to think in
terms of stored files, file organizations and access methods. The data in each table
described in a logical schema needs to be physically stored in some fashion. This is
achieved, roughly speaking, by allocating the data in each table to a stored file,
which will be managed by the computer's operating system. Each stored file will be stored
according to some file organization (such as ordered, indexed or hashed) that will
support an access method (such as sequential or direct) by which the stored
records comprising the file may be stored and retrieved. As you can appreciate, a
storage schema employs techniques from the mass storage technology used in computer
systems.
Another mapping defines how the data defined in a logical schema is
implemented by the storage and access structures in a storage schema.
Figure 2.10 shows two broken lines: one running through the user processes, the other
running between the storage schema and the stored database. These broken lines indicate
the boundaries between the various schemas and the other components in a database system.
The upper broken linethrough the user processesindicates that they are not
properly part of the schema architecture in their entirety. The fact that they employ
statements that act on data described by schemas, and include data structures that
correspond to a userschema, implies that they are part of the schema architecture.
However, the fact that a user process includes much that is not directly relevant to
schema architecturecomputation on data before and after any access of the database
and mechanisms for input by, and output to, the userimplies that they are not
completely part of the schema architecture.
The lower broken linebetween the storage schema and the stored
databaseindicates that the data occurrences in the stored database are not part of
the schema architecture. As we have emphasized, the various schemas are definitions in
terms of types, whereas the stored database is merely a collection of occurrences and is
not a definition in the sense that we have been using the term.
It is important to appreciate the primary role played by the logical schema in the
schema architecture. The logical schema is the central definition of the structure and
semantics of the data in a stored database; it is the basis for providing the various
userschemas. It defines the types of data on which, ultimately, the statements in user
processes will act, and it defines the types of data which, ultimately, will be stored in
the database in accordance with the storage and access structures defined in a storage
schema.
SAQ 2.5 For any given database, how many userschemas, how many logical schemas
and how many storage schemas can there be?
Solution There can be many userschemas, but only one logical schema and only one
storage schema.
SAQ 2.6
- How many userschemas may a user process refer to?
- How many user processes may refer to a single userschema?
Solution
- A user process refers to only one userschema.
- Many user processes may refer to the same userschema.
SAQ 2.7 Which component plays the primary role in the schema architecture?
Solution The logical schema. Both the userschemas and the storage schema are
mapped from the logical schema.
We can now usefully review the benefits that accrue from the database approach in the
light of this schema architecture.
Sharing of data by applications. The userschema component of the architecture
provides each application with a view of the appropriate data, whilst the associated
mapping between a userschema and a logical schema defines how this application-oriented
view of the persistent data is related to the data described by a logical schema.
Furthermore, controlled redundancy can be introduced. The relational userschema depicted
in Figure 2.11 repeats the name of a counsellor (as a CounsellorName value) for
each student of that counsellor. However, this redundancy is controlled, in the sense that
each repetition of a CounsellorName value is a replica of the Name value
which is held just once in the underlying (logical schema) Staff table, as depicted
in Figure 2.8. Provided that any change to a counsellor's name is effected by changing the
Name value in this underlying Staff table, then the mapping ensures that
subsequent use of the userschema shows that the change is propagated to all replicas.
Better applications development. In our discussion of the database approach in
Section 2.1.2 we stated that better applications development was possible if constraints
were firmly under the control of the DBMS. The architecture allows this. Constraints are
expressed centrally in a logical schema, with the mapping between a userschema and the
schema ensuring that the constraint is effective for data manipulated by a user process.
Ability to respond to change. We are now in a position to appreciate how the
impact on user processes of changes to the structure of data can be minimized. In Section
2.1.2 we described the state of affairs for a file-based system, in which a change to the
structure of data required a change to application program code. This state of affairs is
known as data dependence: knowledge of the complete logical structure of the data
and its physical storage and access is built into the logic and code of application
programs. In contrast, the schema architecture shows that a DBMS can provide data
independence in two ways: physical data independence and logical data
independence.
Physical data independence: changes in the physical storage of data or in the
access methods to data do not result in a need to change application programs.
Physical data independence is an outcome of the separation of the logical schema from
the storage schema. Any change in physical storage or access method is a change to the
storage schema. The logical schema is insulated from changes to the storage schema by
virtue of the mapping between the two: it is the mapping that may require change and not
the logical schema. If the logical schema does not change, then there is no need for
application programs to change.
Logical data independence: changes to the logical structure of the data should
not result in a need to change existing application programs.
Logical data independence is an outcome of the separation of the userschemas from the
logical schema. A change in the logical structure is a change to the logical schema,
either as a result of a new application requiring new data structures or an existing
application changing its data requirements so that existing data structures need to be
modified. However, the userschemas of other existing application programs are insulated
from changes to the logical schema by virtue of the mapping between the two: it is the
mapping that may require change and not the userschemas. If existing userschemas do not
change, then there is no need for existing application programs to change.
Central control. The central position of the DBMS is reflected by the primary
role played by the logical schema in the schema architecture. Constraints and access
control mechanisms expressed in the schema apply both to the stored data (as described by
the storage schema) and to the use made of the data by applications (as described in the
userschemas).
We shall now examine this central role of the DBMS from another aspect: an architecture
for the way in which a DBMS processes data.
2.3.2 A processing architecture
Before describing the processing architecture, we need to introduce the idea of a database
control system (or DBCS). A database control system is that part of a DBMS
which enables and controls access to a database. Many of the functions we have given in
Section 2.2.1 are provided by additional components of a DBMS, which in turn use the DBCS.
The DBCS receives all requests to access or manipulate data in the database. A more
detailed appreciation of the nature and role of the DBCS can be gained from considering
the processing architecture.
A simplifying aspect of the processing architecture that we are about to describe is
that the use of a DBMS is considered in two distinct phases. The first phase concerns the
establishment of the schemas; the second phase concerns the use of the database for
retrieval and update.
The first phase covers database definition on the basis of given data
requirements and is shown in Figure 2.12. This phase begins with the specification, in the
appropriate DDL, of the various schemas and associated mappings. The schema
definitionsthe source schemathen have to be processed by a module of
the DBMS, which creates a stored form of the schemas. This module provides the DBMS data
definition function and may be referred to as a schema processor. The schema
processor checks the syntax and consistency of the definitions and, if they are
acceptable, creates an encoded formreferred to as the object schema. The
object schema needs to be kept for regular use, so the schema processor uses the database
control system to store this form of the schema. The object schema that is stored consists
of the userschemas, the logical schema, the storage schema and their associated mappings.
The second phase of the processing architecture, shown in Figure 2.13
(overleaf), covers database access for retrieval and update by various types of users.
Only two methods of database access are considered in this phase of the model: access by
means of application programs and access by means of a query language.
In general, there may be any number of application programs that can access a
particular database, each for a specific purpose. We are not concerned with the details of
how these programs may be written; it is enough to know that embedded within each program
are requests to access some part of a database defined by a userschema. These requests are
expressed using a DML and, for the purposes of the processing model, it is referred to as
an embedded language. DML statements are embedded in the statements written in the
programming language used for the application program. The programming language is said to
host the DML. When an embedded language statement within an application program is
executed, the DBCS is invoked to perform the required access to the database. For our
purposes, we can note three factors about this method of processing. One, the user
of an application program does not issue requests to the DBCS. Rather, the user provides
data, in a fashion suited to the need s of the user, and it is the application program
that issues requests to the DBCS, on the basis of the data input by the user. Two,
the user does not receive data directly from the DBCS. Rather, the user receives data, in
a fashion suited to the needs of the user, from the application program (which has, of
course, received it from the DBCS). Three, each application program is designed for a
specific purpose.
The alternative way of accessing a database, as shown in Figure 2.13, is to use a query
language. Despite the connotations of the name, query languages may be used for
retrieval from, or the updating of, a database. Users can make use of a query
language via a query processor16a program
which interacts with users to process their query statements requesting database access.
Query statements are expressed in terms of a DML and are issued directly by the user. A
query language user may give any query statement to the query processor, and the statement
is executed by invoking the database control system to perform the required database
access, with the response coming back directly to the user. A query processor is written
for use with a particular DBMS but it is a general-purpose program that may be used to
access any database defined using the DDL for that DBMS (in contrast to an application
program which is written to access a specific datab ase in a predetermined way).
Access to a database by means of application programs or a query processor is
controlled by the DBCS. The DBCS is another general-purpose program, able to be used for
any database defined by the appropriate DDL. The way it processes any request to access a
database is determined bv the definitions of the database given in the object schema. Thus
any access to a database requires the corresponding stored object schema (i.e. the stored
userschemas, logical schema, storage schema and mappings) to be available to the DBCS.
Exercise 2.2
Why should you expect the object schema to be different from the corresponding source
schema defined using a DDL? |
Exercise 2.3
Describe the differences in the ranges of data which may be accessed using an application
program and using a query language. |
It is important to emphasize that our two-phase processing architecture is a simplified
model of database processing. For completeness, therefore, we conclude this subsection
with a brief discussion of two of the simplifications we have made.
First, the architecture overlooks the fact that, once source schemas have been
processed, appropriate physical storage structures need to be created before the database
can be usedthese structures are essentially the containers in which data can be
stored. This process may follow automatically after schema processing, or can be part of a
process which includes loading existing files of data into the database.
Second, although it is not included in the architecture, you need to be aware
that the operating system that forms part of the database environment also has a
role to play in database processing. The operating system's role is to provide the input
and output functions that a DMBS needs in order to put data on the disks used for storage.
There are significant variations in these functions for different operating systems, and
in the way different DBMSs make use of them. This results in many differences in the role
and specification of storage schemas, and so the details are not considered in this
course. General principles, however, do need to be established, and so part of the
processing architecture is elaborated in the next subsection to show the use of an
operating system and a simple architecture for storage.
2.3.3 An architecture for storage
A computer's operating system stores data for a DBMS using files which it controls. In
general, these files are kept on large hard disks containing many millions of bytes, but
the same principles apply to files on floppy disks. Data is transferred to and from a disk
in a block (also called a page), which is of a specified size, e.g. 1K or 4K
bytes. Blocks must be organized in such a way that individual blocks can be stored and
retrieved as required. The role of the operating system in processing blocks of data is
represented in Figure 2.14.
Figure 2.14 only shows the way the database control system makes use of the operating
system, but it should be remembered that the database control system also needs the
schemas. It is the storage schema, together with implementation details of the particular
DBMS, which determines the actual form of the data which the DBMS gives to the operating
system as a stored record. A stored record contains data from one or more logical
records together with various types of additional data concerned with storage and
retrieval, such as a DBMS label for the related logical record(s), its placement in a
block, and pointers to any other stored records to which it may be related.
When the operating system receives a stored record it is stored according to some file
organization, and later may be retrieved by an access method. An operating
system may provide a number of file organizations for the DBMS to use; for each record
type, a file organization is either specified within the storage schema or automatically
chosen by the DBMS implementation. A file organization provides a means of placing the
stored records on disk, and one or more associated access methods provide a way of
retrieving such records. |