1 of 14

Database Design

Articles about the database design process

Database Design Example Phase 1: Analysis

This article follows on from Database Design Phase 6: Maintenance.

Real-world example: creating a publishing tracking system

Now let's walk through the database design process with a step-by-step example. The Poet's Circle is a publisher that publishes poetry and poetry anthologies. It is keen to develop a new system that tracks poets, poems, anthologies and sales. The following sections show the steps taken from the initial analysis to the final, working database.

Poet's circle database phase 1: analysis

The following information is gleaned from speaking to the various stakeholders at Poet's Circle. They want to develop a database system to track the poets they have recorded, the poems they write, the publications they appear in, as well as the sales to customers that these publications make.

The designer asks various questions to get more detailed information, such as "What is a poet, as far as the system goes? Does Poet's Circle keep track of poets even if they haven't written or published poems? Are publications recorded even before there are any associated poems? Does a publication consist of one poem, or many? Are potential customer's details recorded?" The following summarizes the responses in our example:

Poet's Circle is a publisher that bases its choices of publications on an active poetry community on its website. If enough of the community wants a poem published, Poet's Circle will do so.
A poet can be anybody who wants to be a poet, not necessarily someone who has a poem captured in the system or someone who has even written a poem.
Poems can be submitted through a web interface, by email or on paper.
All captured poems are written by an associated poet, whose details are already in the system. There can be no poems submitted and stored without a full set of details of the poet.

_{This page is licensed: CC BY-SA / Gnu FDL}

Database Design Example Phase 2: Design

This article follows on from Database Design Example Phase 1: Analysis.

Based on the provided information, you can begin your logical design and should be able to identify the initial entities:

Poet
Poem
Publication
Sale
Customer

The Poet's Circle is not an entity, or even of instance an a publisher entity. Only if the system were developed for many publishers would publisher be a valid entity.

Neither website nor poetry community are entities. There is only one website, and anyway, a website is merely a means of producing data to populate the database. There is also only one poetry community as far as this system is concerned, and there is not much you'd want to store about it.

Next, you need to determine the relationship between these entities. You can identify the following:

A poet can write many poems. The analysis identified the fact that a poet can be stored in the system even if there are no associated poems. Poems may be captured at a later point in time, or the poet may still be a potential poet. Conversely, many poets could conceivably write a poem, though the poem must have been written by at least one poet.
A publication may contain many poems (an anthology) or just one. It can also contain no poems (poetry criticism for example). A poem may or may not appear in a publication.
A sale must be for at least one publication, but it may be for many. A publication may or may not have made any sales.

You can identify the following attributes:

Poet: first name, surname, address, email address
Poem: poem title, poem contents
Publication: title, price
Sales: date, amount

Based on these entities and relationships, you can construct the entity-relationship diagram shown below:

There are two many-to-many relationships in the figure above. These need to be converted into one-to-many relationships before you can implement them in a DBMS. After doing so, the intersection entities poem-publication and sale-publication are created.

Now, to begin the logical and physical design, you need to add attributes that can create the relationship between the entities and specify primary keys. You do what's usually best, and create new, unique, primary keys. The following tables show the structures for the tables created from each of the entities:

Poet table

Field

Definition

Poem table

Field

Definition

Poem-publication table

Field

Definition

Publication table

Field

Definition

Sale-publication table

Field

Definition

Sale table

Field

Definition

Customer table

Field

Definition

MariaDB will have no problem with this, and is selected as the DBMS. Existing hardware and operating system platforms are also selected. The following section looks at the implementation and the SQL statements used to create the MariaDB tables.

_{This page is licensed: CC BY-SA / Gnu FDL}

Database Design Example Phase 3: Implementation

This article follows on from Database Design Example Phase 2: Design.

With the design complete, it's time to and run the statements, as follows:

CREATE DATABASE poets_circle;

CREATE TABLE poet (
  poet_code INT NOT NULL, 
  first_name VARCHAR(30),
  surname VARCHAR(40), 
  address VARCHAR(100), 
  postcode VARCHAR(20),
  email VARCHAR(254), 
  PRIMARY KEY(poet_code)
);

CREATE TABLE poem(
  poem_code INT NOT NULL, 
  title VARCHAR(50),
  contents TEXT, 
  poet_code INT NOT NULL, 
  PRIMARY KEY(poem_code),
  INDEX(poet_code), 
  FOREIGN KEY(poet_code) REFERENCES poet(poet_code) 
);

CREATE TABLE publication(
  publication_code INT NOT NULL,
  title VARCHAR(100),
  price MEDIUMINT UNSIGNED,
  PRIMARY KEY(publication_code)
);

CREATE TABLE poem_publication(
  poem_code INT NOT NULL,
  publication_code INT NOT NULL, 
  PRIMARY KEY(poem_code, publication_code), 
  INDEX(publication_code),
  FOREIGN KEY(poem_code) REFERENCES poem(poem_code),
  FOREIGN KEY(publication_code) REFERENCES publication(publication_code)
);

CREATE TABLE sales_publication(
  sales_code INT NOT NULL,
  publication_code INT NOT NULL,
  PRIMARY KEY(sales_code, publication_code)
); 

CREATE TABLE customer(
  customer_code INT NOT NULL, 
  first_name VARCHAR(30), 
  surname VARCHAR(40), 
  address VARCHAR(100), 
  postcode VARCHAR(20), 
  email VARCHAR(254), 
  PRIMARY KEY(customer_code)
);

CREATE TABLE sale(
  sale_code INT NOT NULL, 
  sale_date DATE,
  amount INT UNSIGNED, 
  customer_code INT NOT NULL, 
  PRIMARY KEY(sale_code), 
  INDEX(customer_code), 
  FOREIGN KEY(customer_code) REFERENCES customer(customer_code)
);

_{This page is licensed: CC BY-SA / Gnu FDL}

Database Design Example Phases 4-6: Testing, Operation and Maintenance

This article follows on from .

Once the database is ready the application programs have been rolled out, it's time for the testing to begin. While the other phases of the database lifecycle can occur reasonably independently of the systems development process, part of the testing phase is how all the components run together.

Load testing may indicate that MariaDB has not been set up to handle the expected 600 concurrent connections, and the configuration file needs to be changed. Other tests may indicate that in certain circumstances, duplicate key errors are received, as the locking mechanism is not uniformly implemented, and the application does not handle locking correctly. The application needs to be fixed. Backups also need to be tested, as well as the ability to smoothly restore from backup with a minimum of downtime.

Testing is one of the most neglected and critical phases. A designer or manager who does not properly account for testing is simply incompetent. No matter how tiny your system, make sure you allocate time for thorough testing, and time for fixing the inevitable bugs.

Once testing is complete, the system can be rolled out. You decide on a low-key rollout and give a few selected poets access to the website to upload their poems. You discover other problems. Some poets upload poems using you haven't catered for, and you need to make a few tweaks to ensure these are handled correctly.

Soon enough, the system is rolled out completely. Maintenance, though, is a never-ending task, and with the immense popularity of the system, and with large numbers of updates and deletes, the system tends to become fragmented. The administrator regularly needs to take care of this, and, of course, the inevitable disk failure leads to an all-night restore session, and much thankfulness for the ease of use of

Database Design: Overview

Databases exist because of the need to change data into information. Data are the raw and unprocessed facts. Information is obtained by processing the data into something useful. For example, millions of names and telephone numbers in a phone book are data. Information is the telephone number of the fire department when your house is burning down.

A database is a large repository of facts, designed in such a way that processing the facts into information is easy. If the phone book was structured in a less convenient way, such as with names and numbers placed in chronological order according to when the numbers were issued, converting the data into information would be much more difficult. Not knowing when the fire department was issued their latest number, you could search for days, and by the time you find the number your house would be a charred pile of ash. So, it's a good thing the phone book was designed as it was.

A database is much more flexible; a similar set of data to what's in a phone book could be ordered by MariaDB according to name, telephone number, address as well as chronologically. But databases are of course more complex, containing many different kinds of information. People, job titles and a company's products can all mingle to provide complex information. But this complexity makes the design of databases more complex as well. Poor design could make for slow queries, or it could even make certain kinds of information impossible to reach. This section of the documentation features articles about good database design, specifically:

The database lifecycle
Entity-relationship modeling
Common mistakes in database design
Real-world example: creating a publishing tracking system
Concurrency control with transactions

_{This page is licensed: CC BY-SA / Gnu FDL}

Database Design Phase 1: Analysis

This article follows on from .

Your existing system can no longer cope. It's time to move on. Perhaps the existing paper system is generating too many errors, or the old Perl script based on flat files can no longer handle the load. Or perhaps an existing news database is struggling under its own popularity and needs an upgrade. This is the stage where the existing system is reviewed.

Depending on the size of the project, the designer may be an individual, responsible for the database implementation and coding, or may be a whole team of analysts. For now, the term designer will represent all these possibilities.

The following are the steps in the Analysis Phase.

Analyze the organization

Database Design Phase 2: Conceptual Design

This article follows on from Database Design Phase 1: Analysis.

The design phase is where the requirements identified in the previous phase are used as the basis to develop the new system. Another way of putting it is that the business understanding of the data structures is converted to a technical understanding. The what questions ("What data are required? What are the problems to be solved?") are replaced by the how questions ("How will the data be structured? How is the data to be accessed?")

This phase consists of three parts: the conceptual design, the logical design and the physical design. Some methodologies merge the logical design phase into the other two phases. This section is not aimed at being a definitive discussion of database design methodologies (there are whole books written on that!); rather it aims to introduce you to the topic.

Conceptual design

The purpose of the conceptual design phase is to build a conceptual model based upon the previously identified requirements, but closer to the final physical model. A commonly-used conceptual model is called an entity-relationship model.

Entities and attributes

Entities are basically people, places, or things you want to keep information about. For example, a library system may have the book, library and borrower entities. Learning to identify what should be an entity, what should be a number of entities, and what should be an attribute of an entity takes practice, but there are some good rules of thumb. The following questions can help to identify whether something is an entity:

Can it vary in number independently of other entities? For example, person height is probably not an entity, as it cannot vary in number independently of person. It is not fundamental, so it cannot be an entity in this case.
Is it important enough to warrant the effort of maintaining. For example customer may not be important for a small grocery store and will not be an entity in that case, but it will be important for a video store, and will be an entity in that case.
Is it its own thing that cannot be separated into subcategories? For example, a car-rental agency may have different criteria and storage requirements for different kinds of vehicles. Vehicle may not be an entity, as it can be broken up into car and boat, which are the entities.

The following are examples of entities involving a university with possible attributes in parentheses.

Course (name, code, course prerequisites)
Student (first_name, surname, address, age)
Book (title, ISBN, price, quantity in stock)

An instance of an entity is one particular occurrence of that entity. For example, the student Rudolf Sono is one instance of the student entity. There will probably be many instances. If there is only one instance, consider whether the entity is warranted. The top level usually does not warrant an entity. For example, if the system is being developed for a particular university, university will not be an entity because the whole system is for that one university. However, if the system was developed to track legislation at all universities in the country, then university would be a valid entity.

Relationships

Entities are related in certain ways. For example, a borrower may belong to a library and can take out books. A book can be found in a particular library. Understanding what you are storing data about, and how the data relate, leads you a large part of the way to a physical implementation in the database.

There are a number of possible relationships:

Mandatory

For each instance of entity A, there must exist one or more instances of entity B. This does not necessarily mean that for each instance of entity B, there must exist one or more instances of entity A. Relationships are optional or mandatory in one direction only, so the A-to-B relationship can be optional, while the B-to-A relationship is mandatory.

Optional

For each instance of entity A, there may or may not exist instances of entity B.

One-to-one (1:1)

This is where for each instance of entity A, there exists one instance of entity B, and vice-versa. If the relationship is optional, there can exist zero or one instances, and if the relationship is mandatory, there exists one and only one instance of the associated entity.

One-to-many (1:M)

For each instance of entity A, many instances of entity B can exist, which for each instance of entity B, only one instance of entity A exists. Again, these can be optional or mandatory relationships.

Many-to-many (M:N)

For each instance of entity A, many instances of entity B can exist, and vice versa. These can be optional or mandatory relationships.

There are numerous ways of showing these relationships. The image below shows student and course entities. In this case, each student must have registered for at least one course, but a course does not necessarily have to have students registered. The student-to-course relationship is mandatory, and the course-to-student relationship is optional.

The image below shows invoice_line and product entities. Each invoice line must have at least one product (but no more than one); however each product can appear on many invoice lines, or none at all. The invoice_line-to-product relationship is mandatory, while the product-to-invoice_line relationship is optional.

The figure below shows husband and wife entities. In this system (others are of course possible), each husband must have one and only one wife, and each wife must have one, and only one, husband. Both relationships are mandatory.

An entity can also have a relationship with itself. Such an entity is called a recursive entity. Take a person entity. If you're interested in storing data about which people are brothers, you wlll have an "is brother to" relationship. In this case, the relationship is an M:N relationship.

Conversely, a weak entity is an entity that cannot exist without another entity. For example, in a school, the scholar entity is related to the weak entity parent/guardian. Without the scholar, the parent or guardian cannot exist in the system. Weak entities usually derive their primary key, in part or in totality, from the associated entity. parent/guardian could take the primary key from the scholar table as part of its primary key (or the entire key if the system only stored one parent/guardian per scholar).

The term connectivity refers to the relationship classification.

The term cardinality refers to the specific number of instances possible for a relationship. Cardinality limits list the minimum and maximum possible occurrences of the associated entity. In the husband and wife example, the cardinality limit is (1,1), and in the case of a student who can take between one and eight courses, the cardinality limits would be represented as (1,8).

Developing an entity-relationship diagram

An entity-relationship diagram models how the entities relate to each other. It's made up of multiple relationships, the kind shown in the examples above. In general, these entities go on to become the database tables.

The first step in developing the diagram is to identify all the entities in the system. In the initial stage, it is not necessary to identify the attributes, but this may help to clarify matters if the designer is unsure about some of the entities. Once the entities are listed, relationships between these entities are identified and modeled according to their type: one-to-many, optional and so on. There are many software packages that can assist in drawing an entity-relationship diagram, but any graphical package should suffice.

Once the initial entity-relationship diagram has been drawn, it is often shown to the stakeholders. Entity-relationship diagrams are easy for non-technical people to understand, especially when guided through the process. This can help identify any errors that have crept in. Part of the reason for modeling is that models are much easier to understand than pages of text, and they are much more likely to be viewed by stakeholders, which reduces the chances of errors slipping through to the next stage, when they may be more difficult to fix.

It is important to remember that there is no one right or wrong answer. The more complex the situation, the more possible designs that will work. Database design is an acquired skill, though, and more experienced designers will have a good idea of what works and of possible problems at a later stage, having gone through the process before.

Once the diagram has been approved, the next stage is to replace many-to-many relationships with two one-to-many relationships. A DBMS cannot directly implement many-to-many relationships, so they are decomposed into two smaller relationships. To achieve this, you have to create an intersection, or composite entity type. Because intersection entities are less "real-world" than ordinary entities, they are sometimes difficult to name. In this case, you can name them according to the two entities being intersected. For example, you can intersect the many-to-many relationship between student and course by a student-course entity.

The same applies even if the entity is recursive. The person entity that has an M:N relationship "is brother to" also needs an intersection entity. You can come up with a good name for the intersection entity in this case: brother. This entity would contain two fields, one for each person of the brother relationship — in other words, the primary key of the first brother and the primary key of the other brother.

_{This page is licensed: CC BY-SA / Gnu FDL}

Database Design Phase 2: Logical and Physical Design

This article follows on from Database Design Phase 2: Conceptual Design.

Overview

Once the conceptual design is finalized, it's time to convert this to the logical and physical design. Usually, the DBMS is chosen at this stage, depending on the requirements and complexity of the data structures. Strictly speaking, the logical design and the physical design are two separate stages, but are often merged into one. They overlap because most current DBMSs (including MariaDB) match logical records to physical records on disk on a 1:1 basis.

Each entity will become a database table, and each attribute will become a field of this table. Foreign keys can be created if the DBMS supports them and the designer decides to implement them. If the relationship is mandatory, the foreign key must be defined as NOT NULL, and if it's optional, the foreign key can allow nulls. For example, because of the invoice line-to-product relationship in the previous example, the product code field is a foreign key in the invoice to line table. Because the invoice line must contain a product, the field must be defined as NOT NULL. The default MariaDB storage engine, , does support foreign key constraints, but some storage engines, such as do not. The ON DELETE CASCADE and ON DELETE RESTRICT clauses are used to support foreign keys. ON DELETE RESTRICT means that records cannot be deleted unless all records associated with the foreign key are also deleted. In the invoice line-to-product case, ON DELETE RESTRICT in the invoice line table means that if a product is deleted, the deletion will not take place unless all associated invoice lines with that product are deleted as well. This avoids the possibility of an invoice line existing that points to a non-existent product. ON DELETE CASCADE achieves a similar effect, but more automatically (and more dangerously!). If the foreign key was declared with ON CASCADE DELETE, associated invoice lines would automatically be deleted if a product was deleted. ON UPDATE CASCADE is similar to ON DELETE CASCADE in that all foreign key references to a primary key are updated when the primary key is updated.

your tables is an important step when designing the database. This process helps avoid data redundancy and improves your data integrity.

Novice database designers usually make a number of common errors. If you've carefully identified entities and attributes and you've normalized your data, you'll probably avoid these errors.

Common errors

Keep unrelated data in different tables. People who are used to using spreadsheets often make this mistake because they are used to seeing all their data in one two-dimensional table. A relational database is much more powerful; don't 'hamstring' it in this way.
Don't store values you can calculate. Let's say you're interested three numbers: /A, B and the product of A and B (A*B). Don't store the product. It wastes space and can easily be calculated if you need it. And it makes your database more difficult to maintain: If you change A, you also have to change all of the products as well. Why waste your database's efforts on something you can calculate when you need it?
Does your design cater to all the conditions you've analyzed? In the heady rush of creating an entity-relationship diagram, you can easily overlook a condition. Entity-relationship diagrams are usually better at getting stakeholders to spot an incorrect rule than spot a missing one. The business logic is as important as the database logic and is more likely to be overlooked. For example, it's easy to spot that you cannot have a sale without an associated customer, but have you built in that the customer cannot be approved for a sale of less than $500 if another approved customer has not recommended them?

_{This page is licensed: CC BY-SA / Gnu FDL}

Database Design Phase 3: Implementation

This article follows on from Database Design Phase 2: Logical and Physical Design.

The implementation phase is where you install the DBMS on the required hardware, optimize the database to run best on that hardware and software platform, and create the database and load the data. The initial data could be either new data captured directly or existing data imported from a MariaDB database or another DBMS. You also establish database security in this phase and give the various users that you've identified access applicable to their requirements. Finally, you also initiate backup plans in this phase.

The following are steps in the implementation phase:

Install the DBMS.
Tune the setup variables according to the hardware, software and usage conditions.
Create the database and tables.
Load the data.
Set up the users and security.
Implement the backup regime.

_{This page is licensed: CC BY-SA / Gnu FDL}

Database Design Phase 4: Testing

This article follows on from Database Design Phase 3: Implementation.

The testing phase is where the performance, security, and integrity of the data are tested. Usually this will occur in conjunctions with the applications that have been developed. You test the performance under various loads conditions to see how the database handles multiple concurrent connections or high volumes of updating and reading. Are the reports generated quickly enough? For example, an application designed with the old storage engine may prove to be too slow because the impact of the updates was underestimated. The storage engine may have to be changed to in response.

Data integrity also needs to be tested, as the application may have logical flaws that result in transactions being lost or other inaccuracies. Further, security needs to be tested to ensure that users can access and change only the data they should.

The logical or physical designs may have to be modified. Perhaps new indexes are required (which the tester may discover after careful use of MariaDB's statement, for example).

The testing and fine-tuning process is an iterative one, with multiple tests performed and changes implemented.

The following are the steps in the testing phase:

Test the performance
Test the security
Test the data integrity
Fine-tune the parameters or modify the logical or physical designs in response to the tests.

_{This page is licensed: CC BY-SA / Gnu FDL}

Database Design Phase 5: Operation

This article follows on from Database Design Phase 4: Testing.

The operation phase takes place when the testing is complete and the database is ready to be rolled out for everyday use. The users of the system begin to operate the system, load data, read reports and so on. Inevitably, problems come to light. The designers need to carefully manage the database's scope at this stage, as users may expect all their desires to be pandered to. Poor database designers may find themselves extending the project well beyond their initial time estimate, and the situation may also become unpleasant if the scope has not been clearly defined and agreed upon. Project owners will feel wronged if their needs are not met, and the database designers will feel overworked and underpaid. Even when scope has been well managed, there will always be new requirements, These then lead to the next stage.

There are numerous strategies for implementing a rollout. The low-key approach often works well, where the relatively low number of users in the early stage make bug fixing easy. Hugely publicized rollouts often end with egg on the stakeholder's faces, as the best testers of all, the users, invariably find unforeseen bugs, which is best done away from the spotlight. Alternatively, rollouts can occur in a distributed manner, where a pilot branch or office is selected, and when the system has proven its stability, it's rolled out to the remaining branches.

The following are the steps in the operation phase:

Hand over operation of the database to the users.
Make any final changes based on the problems discovered by users.

_{This page is licensed: CC BY-SA / Gnu FDL}

Database Design Phase 6: Maintenance

This article follows on from .

The database maintenance phase incorporates general maintenance, such as maintaining the indexes, optimizing the tables, adding and removing users, and changing passwords, as well as backups and restoration of backups in case of a failure. New requirements also start to be requested, and this may result in new fields, or new tables, being created.

As the new system and organization changes, the existing database becomes less and less sufficient to meet the organization's needs. For example, the media organization may be amalgamated with media bodies from other countries, requiring integration of many data sources, or the volumes and staff may expand (or reduce) dramatically. Eventually, there comes a time, whether it's 10 months after completion or 10 years, when the database system needs to be replaced. The maintenance of the existing database begins to drain more and more resources, and the effort to create a new design is matched by the current maintenance effort. As this point, the database is coming to the end of its life, and a new project begins life in the Analysis phase.

The following are the steps in the maintenance phase:

Maintain the indexes

Database Lifecycle

This article follows on from .

Like everything else, databases have a finite lifespan. They are born in a flush of optimism and make their way through life achieving fame, fortune, and peaceful anonymity, or notoriety as the case may be, before fading out once more. Even the most successful database at some time is replaced by another, more flexible and up-to-date structure, and so begins life anew. Although exact definitions differ, there are generally six stages of the database lifecycle.

Analysis

The analysis phase is where the stakeholders are interviewed and any existing system is examined to identify problems, possibilities and constraints. The objectives and scope of the new system are determined.

Database Design Phase 2: Conceptual Design

This article follows on from Database Design Phase 1: Analysis.

Conceptual design

Entities and attributes

Can it vary in number independently of other entities? For example, person height is probably not an entity, as it cannot vary in number independently of person. It is not fundamental, so it cannot be an entity in this case.
Is it important enough to warrant the effort of maintaining. For example customer may not be important for a small grocery store and will not be an entity in that case, but it will be important for a video store, and will be an entity in that case.
Is it its own thing that cannot be separated into subcategories? For example, a car-rental agency may have different criteria and storage requirements for different kinds of vehicles. Vehicle may not be an entity, as it can be broken up into car and boat, which are the entities.

The following are examples of entities involving a university with possible attributes in parentheses.