All pages
Powered by GitBook
1 of 14

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Database Design

Articles about the database design process

Database Design Example Phase 1: Analysis

This article follows on from Database Design Phase 6: Maintenance.

Real-world example: creating a publishing tracking system

Now let's walk through the database design process with a step-by-step example. The Poet's Circle is a publisher that publishes poetry and poetry anthologies. It is keen to develop a new system that tracks poets, poems, anthologies and sales. The following sections show the steps taken from the initial analysis to the final, working database.

Poet's circle database phase 1: analysis

The following information is gleaned from speaking to the various stakeholders at Poet's Circle. They want to develop a database system to track the poets they have recorded, the poems they write, the publications they appear in, as well as the sales to customers that these publications make.

The designer asks various questions to get more detailed information, such as "What is a poet, as far as the system goes? Does Poet's Circle keep track of poets even if they haven't written or published poems? Are publications recorded even before there are any associated poems? Does a publication consist of one poem, or many? Are potential customer's details recorded?" The following summarizes the responses in our example:

  • Poet's Circle is a publisher that bases its choices of publications on an active poetry community on its website. If enough of the community wants a poem published, Poet's Circle will do so.

  • A poet can be anybody who wants to be a poet, not necessarily someone who has a poem captured in the system or someone who has even written a poem.

  • Poems can be submitted through a web interface, by email or on paper.

  • All captured poems are written by an associated poet, whose details are already in the system. There can be no poems submitted and stored without a full set of details of the poet.

This page is licensed: CC BY-SA / Gnu FDL

A publication can be a single poem, a poetry anthology, or a work of literary criticism.

  • Customers can sign up through a web interface and may order publications at that point in time, or express interest in receiving updates for possible later purchases.

  • Sales of publications are made to customers whose details are stored in the system. There are no anonymous sales.

  • A single sale can be for one publication, but many publications can also be purchased at the same time. If more than one customer is involved in this sale, Poet's Circle treats it as more than one sale. Each customer has their own sale.

  • Not all publications make sales — some may be special editions, and others simply never sell any copies.

  • Database Design Example Phase 3: Implementation

    This article follows on from Database Design Example Phase 2: Design.

    With the design complete, it's time to and run the statements, as follows:

    CREATE DATABASE poets_circle;
    
    CREATE TABLE poet (
      poet_code INT NOT NULL, 
      first_name VARCHAR(30),
      surname VARCHAR(40), 
      address VARCHAR(100), 
      postcode VARCHAR(20),
      email VARCHAR(254), 
      PRIMARY KEY(poet_code)
    );
    
    CREATE TABLE poem(
      poem_code INT NOT NULL, 
      title VARCHAR(50),
      contents TEXT, 
      poet_code INT NOT NULL, 
      PRIMARY KEY(poem_code),
      INDEX(poet_code), 
      FOREIGN KEY(poet_code) REFERENCES poet(poet_code) 
    );
    
    CREATE TABLE publication(
      publication_code INT NOT NULL,
      title VARCHAR(100),
      price MEDIUMINT UNSIGNED,
      PRIMARY KEY(publication_code)
    );
    
    CREATE TABLE poem_publication(
      poem_code INT NOT NULL,
      publication_code INT NOT NULL, 
      PRIMARY KEY(poem_code, publication_code), 
      INDEX(publication_code),
      FOREIGN KEY(poem_code) REFERENCES poem(poem_code),
      FOREIGN KEY(publication_code) REFERENCES publication(publication_code)
    );
    
    CREATE TABLE sales_publication(
      sales_code INT NOT NULL,
      publication_code INT NOT NULL,
      PRIMARY KEY(sales_code, publication_code)
    ); 
    
    CREATE TABLE customer(
      customer_code INT NOT NULL, 
      first_name VARCHAR(30), 
      surname VARCHAR(40), 
      address VARCHAR(100), 
      postcode VARCHAR(20), 
      email VARCHAR(254), 
      PRIMARY KEY(customer_code)
    );
    
    CREATE TABLE sale(
      sale_code INT NOT NULL, 
      sale_date DATE,
      amount INT UNSIGNED, 
      customer_code INT NOT NULL, 
      PRIMARY KEY(sale_code), 
      INDEX(customer_code), 
      FOREIGN KEY(customer_code) REFERENCES customer(customer_code)
    );

    This page is licensed: CC BY-SA / Gnu FDL

    Database Design Phase 4: Testing

    This article follows on from Database Design Phase 3: Implementation.

    The testing phase is where the performance, security, and integrity of the data are tested. Usually this will occur in conjunctions with the applications that have been developed. You test the performance under various loads conditions to see how the database handles multiple concurrent connections or high volumes of updating and reading. Are the reports generated quickly enough? For example, an application designed with the old storage engine may prove to be too slow because the impact of the updates was underestimated. The storage engine may have to be changed to in response.

    Data integrity also needs to be tested, as the application may have logical flaws that result in transactions being lost or other inaccuracies. Further, security needs to be tested to ensure that users can access and change only the data they should.

    The logical or physical designs may have to be modified. Perhaps new indexes are required (which the tester may discover after careful use of MariaDB's statement, for example).

    The testing and fine-tuning process is an iterative one, with multiple tests performed and changes implemented.

    The following are the steps in the testing phase:

    1. Test the performance

    2. Test the security

    3. Test the data integrity

    4. Fine-tune the parameters or modify the logical or physical designs in response to the tests.

    This page is licensed: CC BY-SA / Gnu FDL

    Database Design Phase 1: Analysis

    This article follows on from .

    Your existing system can no longer cope. It's time to move on. Perhaps the existing paper system is generating too many errors, or the old Perl script based on flat files can no longer handle the load. Or perhaps an existing news database is struggling under its own popularity and needs an upgrade. This is the stage where the existing system is reviewed.

    Depending on the size of the project, the designer may be an individual, responsible for the database implementation and coding, or may be a whole team of analysts. For now, the term designer will represent all these possibilities.

    The following are the steps in the Analysis Phase.

    1. Analyze the organization

    Database Design Example Phases 4-6: Testing, Operation and Maintenance

    This article follows on from .

    Once the database is ready the application programs have been rolled out, it's time for the testing to begin. While the other phases of the database lifecycle can occur reasonably independently of the systems development process, part of the testing phase is how all the components run together.

    Load testing may indicate that MariaDB has not been set up to handle the expected 600 concurrent connections, and the configuration file needs to be changed. Other tests may indicate that in certain circumstances, duplicate key errors are received, as the locking mechanism is not uniformly implemented, and the application does not handle locking correctly. The application needs to be fixed. Backups also need to be tested, as well as the ability to smoothly restore from backup with a minimum of downtime.

    Testing is one of the most neglected and critical phases. A designer or manager who does not properly account for testing is simply incompetent. No matter how tiny your system, make sure you allocate time for thorough testing, and time for fixing the inevitable bugs.

    Once testing is complete, the system can be rolled out. You decide on a low-key rollout and give a few selected poets access to the website to upload their poems. You discover other problems. Some poets upload poems using you haven't catered for, and you need to make a few tweaks to ensure these are handled correctly.

    Soon enough, the system is rolled out completely. Maintenance, though, is a never-ending task, and with the immense popularity of the system, and with large numbers of updates and deletes, the system tends to become fragmented. The administrator regularly needs to take care of this, and, of course, the inevitable disk failure leads to an all-night restore session, and much thankfulness for the ease of use of

    Define any problems, possibilities or constraints
  • Define the objectives

  • Agree on the scope

  • When reviewing a system, the designer needs to look at the bigger picture - not just the hardware or existing table structures, but the whole situation of the organization calling for the redesign. For example, a large bank with centralized management would have a different structure and a different way of operating from a decentralized media organization, where anyone can post news onto a website. This may seem trivial, but understanding the organization you're building the database for is vital to designing a good database for it. The same demands in the bank and media organizations should lead to different designs because the organizations are different. In other words, a solution that was constructed for the bank cannot be unthinkingly implemented for the media organization, even when the situation seems similar. A culture of central control at the bank may mean that news posted on the bank website has to be moderated and authorized by central management, or may require the designer to keep detailed audit trails of who modified what and when. On the flip-side, the media organization may be more laissez-faire and will be happy with news being modified by any authorized editor.

    Understanding an organization's culture helps the designers ask the right questions. The bank may not ask for an audit trail, it may simply expect it; and when the time comes to roll out the implementation, the audit trail would need to be patched on, requiring more time and resources.

    Once you understand the organization structure, you can question the users of any existing system as to what their problems and needs are, as well as what constraints will exist then. You need to question different role players, as each can add new understanding as to what the database may need. For example, the media organization's marketing department may need detailed statistics about the times of day certain articles are read. You may also be alerted to possible future requirements. Perhaps the editorial department is planning to expand the website, which will give them the staff to cross-link web articles. Keeping this future requirement in mind could make it easier to add the cross-linking feature when the time comes.

    Constraints can include hardware ("We have to use our existing database server") or people ("We only have one data capturer on shift at any one time"). Constraints also refer to the limitations on values. For example, a student's grade in a university database may not be able to go beyond 100 percent, or the three categories of seats in a theatre database are small, medium and large.

    It is rarely sufficient to rely on one level of management, or an individual, to supply objectives and current problems, except in the smallest of organizations. Top management may be paying for the database design, but lower levels will need to use it, and their input is probably even more important for a successful design.

    Of course, although anything is possible given infinite time and money, this is (usually) never forthcoming. Determining scope, and formalizing it, is an important part of the project. If the budget is for one month's work but the ideal solution requires three, the designer must make clear these constraints and agree with the project owners on which facets are not going to be implemented.

    This page is licensed: CC BY-SA / Gnu FDL

    Database Lifecycle
    .

    This page is licensed: CC BY-SA / Gnu FDL

    Database Design Example Phase 3: Implementation

    Database Design Phase 3: Implementation

    This article follows on from Database Design Phase 2: Logical and Physical Design.

    The implementation phase is where you install the DBMS on the required hardware, optimize the database to run best on that hardware and software platform, and create the database and load the data. The initial data could be either new data captured directly or existing data imported from a MariaDB database or another DBMS. You also establish database security in this phase and give the various users that you've identified access applicable to their requirements. Finally, you also initiate backup plans in this phase.

    The following are steps in the implementation phase:

    1. Install the DBMS.

    2. Tune the setup variables according to the hardware, software and usage conditions.

    3. Create the database and tables.

    4. Load the data.

    5. Set up the users and security.

    6. Implement the backup regime.

    This page is licensed: CC BY-SA / Gnu FDL

    Database Design Phase 6: Maintenance

    This article follows on from .

    The database maintenance phase incorporates general maintenance, such as maintaining the indexes, optimizing the tables, adding and removing users, and changing passwords, as well as backups and restoration of backups in case of a failure. New requirements also start to be requested, and this may result in new fields, or new tables, being created.

    As the new system and organization changes, the existing database becomes less and less sufficient to meet the organization's needs. For example, the media organization may be amalgamated with media bodies from other countries, requiring integration of many data sources, or the volumes and staff may expand (or reduce) dramatically. Eventually, there comes a time, whether it's 10 months after completion or 10 years, when the database system needs to be replaced. The maintenance of the existing database begins to drain more and more resources, and the effort to create a new design is matched by the current maintenance effort. As this point, the database is coming to the end of its life, and a new project begins life in the Analysis phase.

    The following are the steps in the maintenance phase:

    1. Maintain the indexes

    Database Lifecycle

    This article follows on from .

    Like everything else, databases have a finite lifespan. They are born in a flush of optimism and make their way through life achieving fame, fortune, and peaceful anonymity, or notoriety as the case may be, before fading out once more. Even the most successful database at some time is replaced by another, more flexible and up-to-date structure, and so begins life anew. Although exact definitions differ, there are generally six stages of the database lifecycle.

    Analysis

    The analysis phase is where the stakeholders are interviewed and any existing system is examined to identify problems, possibilities and constraints. The objectives and scope of the new system are determined.

    Maintain the tables

  • Maintain the users

  • Change passwords

  • Backup

  • Restore backups

  • Change the design to meet new requirements

  • This page is licensed: CC BY-SA / Gnu FDL

    Database Design Phase 5: Operation
    Design

    The design phase is where a conceptual design is created from the previously determined requirements, and a logical and physical design are created that will ready the database for implementation.

    Implementation

    The implementation phase is where the database management system (DBMS) is installed, the databases are created, and the data are loaded or imported.

    Testing

    The testing phase is where the database is tested and fine-tuned, usually in conjunction with the associated applications.

    Operation

    The operation phase is where the database is working normally, producing information for its users.

    Maintenance

    The maintenance phase is where changes are made to the database in response to new requirements or changed operating conditions (such as heavier load).

    Database development is not independent of systems development, often being one component of the greater systems development process. The stages of systems development basically mirror the stages of a database lifecycle but are a superset. Whereas database design deals with designing the system to store the data, systems design is also concerned with the processes that will impact on the data.

    This page is licensed: CC BY-SA / Gnu FDL

    Database Design: Overview

    Database Design Phase 2: Conceptual Design

    This article follows on from Database Design Phase 1: Analysis.

    The design phase is where the requirements identified in the previous phase are used as the basis to develop the new system. Another way of putting it is that the business understanding of the data structures is converted to a technical understanding. The what questions ("What data are required? What are the problems to be solved?") are replaced by the how questions ("How will the data be structured? How is the data to be accessed?")

    This phase consists of three parts: the conceptual design, the logical design and the physical design. Some methodologies merge the logical design phase into the other two phases. This section is not aimed at being a definitive discussion of database design methodologies (there are whole books written on that!); rather it aims to introduce you to the topic.

    Conceptual design

    The purpose of the conceptual design phase is to build a conceptual model based upon the previously identified requirements, but closer to the final physical model. A commonly-used conceptual model is called an entity-relationship model.

    Entities and attributes

    Entities are basically people, places, or things you want to keep information about. For example, a library system may have the book, library and borrower entities. Learning to identify what should be an entity, what should be a number of entities, and what should be an attribute of an entity takes practice, but there are some good rules of thumb. The following questions can help to identify whether something is an entity:

    • Can it vary in number independently of other entities? For example, person height is probably not an entity, as it cannot vary in number independently of person. It is not fundamental, so it cannot be an entity in this case.

    • Is it important enough to warrant the effort of maintaining. For example customer may not be important for a small grocery store and will not be an entity in that case, but it will be important for a video store, and will be an entity in that case.

    • Is it its own thing that cannot be separated into subcategories? For example, a car-rental agency may have different criteria and storage requirements for different kinds of vehicles. Vehicle may not be an entity, as it can be broken up into car and boat, which are the entities.

    The following are examples of entities involving a university with possible attributes in parentheses.

    • Course (name, code, course prerequisites)

    • Student (first_name, surname, address, age)

    • Book (title, ISBN, price, quantity in stock)

    An instance of an entity is one particular occurrence of that entity. For example, the student Rudolf Sono is one instance of the student entity. There will probably be many instances. If there is only one instance, consider whether the entity is warranted. The top level usually does not warrant an entity. For example, if the system is being developed for a particular university, university will not be an entity because the whole system is for that one university. However, if the system was developed to track legislation at all universities in the country, then university would be a valid entity.

    Relationships

    Entities are related in certain ways. For example, a borrower may belong to a library and can take out books. A book can be found in a particular library. Understanding what you are storing data about, and how the data relate, leads you a large part of the way to a physical implementation in the database.

    There are a number of possible relationships:

    Mandatory

    For each instance of entity A, there must exist one or more instances of entity B. This does not necessarily mean that for each instance of entity B, there must exist one or more instances of entity A. Relationships are optional or mandatory in one direction only, so the A-to-B relationship can be optional, while the B-to-A relationship is mandatory.

    Optional

    For each instance of entity A, there may or may not exist instances of entity B.

    One-to-one (1:1)

    This is where for each instance of entity A, there exists one instance of entity B, and vice-versa. If the relationship is optional, there can exist zero or one instances, and if the relationship is mandatory, there exists one and only one instance of the associated entity.

    One-to-many (1:M)

    For each instance of entity A, many instances of entity B can exist, which for each instance of entity B, only one instance of entity A exists. Again, these can be optional or mandatory relationships.

    Many-to-many (M:N)

    For each instance of entity A, many instances of entity B can exist, and vice versa. These can be optional or mandatory relationships.

    There are numerous ways of showing these relationships. The image below shows student and course entities. In this case, each student must have registered for at least one course, but a course does not necessarily have to have students registered. The student-to-course relationship is mandatory, and the course-to-student relationship is optional.

    The image below shows invoice_line and product entities. Each invoice line must have at least one product (but no more than one); however each product can appear on many invoice lines, or none at all. The invoice_line-to-product relationship is mandatory, while the product-to-invoice_line relationship is optional.

    The figure below shows husband and wife entities. In this system (others are of course possible), each husband must have one and only one wife, and each wife must have one, and only one, husband. Both relationships are mandatory.

    An entity can also have a relationship with itself. Such an entity is called a recursive entity. Take a person entity. If you're interested in storing data about which people are brothers, you wlll have an "is brother to" relationship. In this case, the relationship is an M:N relationship.

    Conversely, a weak entity is an entity that cannot exist without another entity. For example, in a school, the scholar entity is related to the weak entity parent/guardian. Without the scholar, the parent or guardian cannot exist in the system. Weak entities usually derive their primary key, in part or in totality, from the associated entity. parent/guardian could take the primary key from the scholar table as part of its primary key (or the entire key if the system only stored one parent/guardian per scholar).

    The term connectivity refers to the relationship classification.

    The term cardinality refers to the specific number of instances possible for a relationship. Cardinality limits list the minimum and maximum possible occurrences of the associated entity. In the husband and wife example, the cardinality limit is (1,1), and in the case of a student who can take between one and eight courses, the cardinality limits would be represented as (1,8).

    Developing an entity-relationship diagram

    An entity-relationship diagram models how the entities relate to each other. It's made up of multiple relationships, the kind shown in the examples above. In general, these entities go on to become the database tables.

    The first step in developing the diagram is to identify all the entities in the system. In the initial stage, it is not necessary to identify the attributes, but this may help to clarify matters if the designer is unsure about some of the entities. Once the entities are listed, relationships between these entities are identified and modeled according to their type: one-to-many, optional and so on. There are many software packages that can assist in drawing an entity-relationship diagram, but any graphical package should suffice.

    Once the initial entity-relationship diagram has been drawn, it is often shown to the stakeholders. Entity-relationship diagrams are easy for non-technical people to understand, especially when guided through the process. This can help identify any errors that have crept in. Part of the reason for modeling is that models are much easier to understand than pages of text, and they are much more likely to be viewed by stakeholders, which reduces the chances of errors slipping through to the next stage, when they may be more difficult to fix.

    It is important to remember that there is no one right or wrong answer. The more complex the situation, the more possible designs that will work. Database design is an acquired skill, though, and more experienced designers will have a good idea of what works and of possible problems at a later stage, having gone through the process before.

    Once the diagram has been approved, the next stage is to replace many-to-many relationships with two one-to-many relationships. A DBMS cannot directly implement many-to-many relationships, so they are decomposed into two smaller relationships. To achieve this, you have to create an intersection, or composite entity type. Because intersection entities are less "real-world" than ordinary entities, they are sometimes difficult to name. In this case, you can name them according to the two entities being intersected. For example, you can intersect the many-to-many relationship between student and course by a student-course entity.

    The same applies even if the entity is recursive. The person entity that has an M:N relationship "is brother to" also needs an intersection entity. You can come up with a good name for the intersection entity in this case: brother. This entity would contain two fields, one for each person of the brother relationship — in other words, the primary key of the first brother and the primary key of the other brother.

    This page is licensed: CC BY-SA / Gnu FDL

    Database Design: Overview

    Databases exist because of the need to change data into information. Data are the raw and unprocessed facts. Information is obtained by processing the data into something useful. For example, millions of names and telephone numbers in a phone book are data. Information is the telephone number of the fire department when your house is burning down.

    A database is a large repository of facts, designed in such a way that processing the facts into information is easy. If the phone book was structured in a less convenient way, such as with names and numbers placed in chronological order according to when the numbers were issued, converting the data into information would be much more difficult. Not knowing when the fire department was issued their latest number, you could search for days, and by the time you find the number your house would be a charred pile of ash. So, it's a good thing the phone book was designed as it was.

    A database is much more flexible; a similar set of data to what's in a phone book could be ordered by MariaDB according to name, telephone number, address as well as chronologically. But databases are of course more complex, containing many different kinds of information. People, job titles and a company's products can all mingle to provide complex information. But this complexity makes the design of databases more complex as well. Poor design could make for slow queries, or it could even make certain kinds of information impossible to reach. This section of the documentation features articles about good database design, specifically:

    • The database lifecycle

    • Entity-relationship modeling

    • Common mistakes in database design

    • Real-world example: creating a publishing tracking system

    • Concurrency control with transactions

    This page is licensed: CC BY-SA / Gnu FDL

    Does it list a type of thing, not an instance? The video game blow-em-up 6 is not an entity, rather an instance of the game entity.
  • Does it have many associated facts? If it only contains one attribute, it is unlikely to be an entity. For example, city may be an entity in some cases, but if it contains only one attribute, city name, it is more likely to be an attribute of another entity, such as customer.

  • many-to-many
    one_to_many
    one-to-one
    student-course
    brother-intersection

    Database Design Phase 2: Logical and Physical Design

    This article follows on from Database Design Phase 2: Conceptual Design.

    Overview

    Once the conceptual design is finalized, it's time to convert this to the logical and physical design. Usually, the DBMS is chosen at this stage, depending on the requirements and complexity of the data structures. Strictly speaking, the logical design and the physical design are two separate stages, but are often merged into one. They overlap because most current DBMSs (including MariaDB) match logical records to physical records on disk on a 1:1 basis.

    Each entity will become a database table, and each attribute will become a field of this table. Foreign keys can be created if the DBMS supports them and the designer decides to implement them. If the relationship is mandatory, the foreign key must be defined as NOT NULL, and if it's optional, the foreign key can allow nulls. For example, because of the invoice line-to-product relationship in the previous example, the product code field is a foreign key in the invoice to line table. Because the invoice line must contain a product, the field must be defined as NOT NULL. The default MariaDB storage engine, , does support foreign key constraints, but some storage engines, such as do not. The ON DELETE CASCADE and ON DELETE RESTRICT clauses are used to support foreign keys. ON DELETE RESTRICT means that records cannot be deleted unless all records associated with the foreign key are also deleted. In the invoice line-to-product case, ON DELETE RESTRICT in the invoice line table means that if a product is deleted, the deletion will not take place unless all associated invoice lines with that product are deleted as well. This avoids the possibility of an invoice line existing that points to a non-existent product. ON DELETE CASCADE achieves a similar effect, but more automatically (and more dangerously!). If the foreign key was declared with ON CASCADE DELETE, associated invoice lines would automatically be deleted if a product was deleted. ON UPDATE CASCADE is similar to ON DELETE CASCADE in that all foreign key references to a primary key are updated when the primary key is updated.

    your tables is an important step when designing the database. This process helps avoid data redundancy and improves your data integrity.

    Novice database designers usually make a number of common errors. If you've carefully identified entities and attributes and you've normalized your data, you'll probably avoid these errors.

    Common errors

    • Keep unrelated data in different tables. People who are used to using spreadsheets often make this mistake because they are used to seeing all their data in one two-dimensional table. A relational database is much more powerful; don't 'hamstring' it in this way.

    • Don't store values you can calculate. Let's say you're interested three numbers: /A, B and the product of A and B (A*B). Don't store the product. It wastes space and can easily be calculated if you need it. And it makes your database more difficult to maintain: If you change A, you also have to change all of the products as well. Why waste your database's efforts on something you can calculate when you need it?

    • Does your design cater to all the conditions you've analyzed? In the heady rush of creating an entity-relationship diagram, you can easily overlook a condition. Entity-relationship diagrams are usually better at getting stakeholders to spot an incorrect rule than spot a missing one. The business logic is as important as the database logic and is more likely to be overlooked. For example, it's easy to spot that you cannot have a sale without an associated customer, but have you built in that the customer cannot be approved for a sale of less than $500 if another approved customer has not recommended them?

    This page is licensed: CC BY-SA / Gnu FDL

  • Are your attributes, which are about to become field names, well chosen? Fields should be clearly named. For example, if you use f1 and f2 instead of surname and first_name, the time saved in less typing will be lost in looking up the correct spelling of the field, or in mistakes where a developer thought f1 was the first name, and f2 the surname. Similarly, try to avoid the same names for different fields. If six tables have a primary key of code, you're making life unnecessarily difficult. Rather, use more descriptive terms, such as sales_code or customer_code.

  • Don't create too many relationships. Almost every table in a system can be related by some stretch of the imagination, but there's no need to do this. For example, a tennis player belongs to a sports club. A sports club belongs to a region. The tennis players then also belong to a region, but this relationship can be derived through the sports club, so there's no need to add another foreign key (except to achieve performance benefits for certain kinds of queries). Normalizing can help you avoid this sort of problem (and even when you're trying to optimize for speed, it's usually better to normalize and then consciously denormalize rather than not normalize at all).

  • Conversely, have you catered to all relations? Do all relations from your entity-relationship diagram appear as common fields in your table structures? Have you covered all relations? Are all many-to-many relationships broken up into two one-to-many relationships, with an intersection entity?

  • Have you listed all constraints? Constraints include a gender that can only be m or f, ages of schoolchildren that cannot exceed twenty, or email addresses that need to have an @ sign and at least one period (.; don't take these limits for granted. At some stage the system you will need to implement them, and you're either going to forget to do so, or have to go back and gather more data if you don't list these up front.

  • Are you planning to store too much data? Should a customer be asked to supply their eye color, favorite kind of fish, and names of their grandparents if they are simply trying to register for an online newsletter? Sometimes stakeholders want too much information from their customers. If the user is outside the organization, they may not have a voice in the design process, but they should always be thought of foremost. Consider also the difficulty and time taken to capture all the data. If a telephone operator needs to take all this information down before making a sale, imagine how much slower they will be. Also consider the impact data has on database speed. Larger tables are generally slower to access, and unnecessary , and fields lead to record and table fragmentation.

  • Have you combined fields that should be separate? Combining first name and surname into one field is a common beginner mistake. Later you'll realise that sorting names alphabetically is tricky if you've stored them as John Ellis and Alfred Ntombela. Keep distinct data discrete.

  • Has every table got a primary key? There had better be a good reason for leaving out a primary key. How else are you going to identify a unique record quickly? Consider that an index speeds up access time tremendously, and when kept small it adds very little overhead. Also, it's usually better to create a new field for the primary key rather than take existing fields. First name and surname may be unique in your current database, but they may not always be. Creating a system-defined primary key ensures it will always be unique.

  • Give some thought to your other indexes. What fields are likely to be used in this condition to access the table? You can always create more fields later when you test the system, but add any you think you need at this stage.

  • Are your foreign keys correctly placed? In a one-to-many relationship, the foreign key appears in the many table, and the associated primary key in the one table. Mixing these up can cause errors.

  • Do you ensure referential integrity? Foreign keys should not relate to a primary key in another table that no longer exists.

  • Have you covered all character sets you may need? German letters, for example, have an expanded character set, and if the database is to cater for German users it will have to take this into account. Similarly, dates and currency formats should be carefully considered if the system is to be international

  • Is your security sufficient? Remember to assign the minimum permissions you can. Do not allow anyone to view a table if they do not need to do so. Allowing malicious users view data, even if they cannot change it, is often the first step in for an attacker.

  • Normalizing

    Database Design Example Phase 2: Design

    This article follows on from Database Design Example Phase 1: Analysis.

    Based on the provided information, you can begin your logical design and should be able to identify the initial entities:

    • Poet

    • Poem

    • Publication

    • Sale

    • Customer

    The Poet's Circle is not an entity, or even of instance an a publisher entity. Only if the system were developed for many publishers would publisher be a valid entity.

    Neither website nor poetry community are entities. There is only one website, and anyway, a website is merely a means of producing data to populate the database. There is also only one poetry community as far as this system is concerned, and there is not much you'd want to store about it.

    Next, you need to determine the relationship between these entities. You can identify the following:

    • A poet can write many poems. The analysis identified the fact that a poet can be stored in the system even if there are no associated poems. Poems may be captured at a later point in time, or the poet may still be a potential poet. Conversely, many poets could conceivably write a poem, though the poem must have been written by at least one poet.

    • A publication may contain many poems (an anthology) or just one. It can also contain no poems (poetry criticism for example). A poem may or may not appear in a publication.

    • A sale must be for at least one publication, but it may be for many. A publication may or may not have made any sales.

    You can identify the following attributes:

    • Poet: first name, surname, address, email address

    • Poem: poem title, poem contents

    • Publication: title, price

    • Sales: date, amount

    Based on these entities and relationships, you can construct the entity-relationship diagram shown below:

    There are two many-to-many relationships in the figure above. These need to be converted into one-to-many relationships before you can implement them in a DBMS. After doing so, the intersection entities poem-publication and sale-publication are created.

    Now, to begin the logical and physical design, you need to add attributes that can create the relationship between the entities and specify primary keys. You do what's usually best, and create new, unique, primary keys. The following tables show the structures for the tables created from each of the entities:

    Poet table

    Field
    Definition

    Poem table

    Field
    Definition

    Poem-publication table

    Field
    Definition

    Publication table

    Field
    Definition

    Sale-publication table

    Field
    Definition

    Sale table

    Field
    Definition

    Customer table

    Field
    Definition

    MariaDB will have no problem with this, and is selected as the DBMS. Existing hardware and operating system platforms are also selected. The following section looks at the implementation and the SQL statements used to create the MariaDB tables.

    This page is licensed: CC BY-SA / Gnu FDL

    A customer may be made for many sales, or none at all. A sale is only made for one and only one customer.
    Customer: first name, surname, address, email address

    poet code

    primary key, integer

    first name

    character (30)

    surname

    character (40)

    address

    character (100)

    postcode

    character (20)

    email address

    character (254)

    poem code

    primary key, integer

    poem title

    character(50)

    poem contents

    text

    poet code

    foreign key, integer

    poem code

    joint primary key, foreign key, integer

    publication code

    joint primary key, foreign key, integer

    publication code

    primary key, integer

    title

    character(100)

    price

    numeric(5.2)

    sale code

    joint primary key, foreign key, integer

    publication code

    joint primary key, foreign key, integer

    sale code

    primary key, integer

    date

    date

    amount

    numeric(10.2)

    customer code

    foreign key, integer

    customer code

    primary key, integer

    first name

    character (30)

    surname

    character (40)

    address

    character (100)

    postcode

    character (20)

    email address

    character (254)

    poets-circle-erd1
    poets-circle-erd2
    install MariaDB
    CREATE
    MyISAM
    XtraDB
    EXPLAIN
    character sets
    mariadb-dump
    XtraDB
    MyISAM
    BLOB
    TEXT
    VARCHAR

    Database Design Phase 5: Operation

    This article follows on from Database Design Phase 4: Testing.

    The operation phase takes place when the testing is complete and the database is ready to be rolled out for everyday use. The users of the system begin to operate the system, load data, read reports and so on. Inevitably, problems come to light. The designers need to carefully manage the database's scope at this stage, as users may expect all their desires to be pandered to. Poor database designers may find themselves extending the project well beyond their initial time estimate, and the situation may also become unpleasant if the scope has not been clearly defined and agreed upon. Project owners will feel wronged if their needs are not met, and the database designers will feel overworked and underpaid. Even when scope has been well managed, there will always be new requirements, These then lead to the next stage.

    There are numerous strategies for implementing a rollout. The low-key approach often works well, where the relatively low number of users in the early stage make bug fixing easy. Hugely publicized rollouts often end with egg on the stakeholder's faces, as the best testers of all, the users, invariably find unforeseen bugs, which is best done away from the spotlight. Alternatively, rollouts can occur in a distributed manner, where a pilot branch or office is selected, and when the system has proven its stability, it's rolled out to the remaining branches.

    The following are the steps in the operation phase:

    1. Hand over operation of the database to the users.

    2. Make any final changes based on the problems discovered by users.

    This page is licensed: CC BY-SA / Gnu FDL