The Importance of Planning

Whenever I start a new project with MySQL or MariaDB I am tempted to jump in and start creating my schema and tables with nothing more than a basic outline of the requirements. Often the justification I give myself is that the sooner I have something up and running, the better off I’ll be, and besides, there will be plenty of time to sort things out later. Even when I am absolutely sure my ad-hoc schema is only for “development purposes” and will “never ever be used in production” it often is.

A lot of grief can be avoided if time is taken right at the start, before any tables are defined, to plan. This is big picture time and every stakeholder, from IT to the developers to management and even to marketing, needs to be involved. If for no other reason than to set expectations.

Right at the start, everyone involved in the project needs to know that there is no perfect data definition, and no one schema, server, or cluster is capable of being all things to all people. As hard as it may be to believe, this actually may be news to people unfamiliar with databases or technology in general. The more thought and planning that goes into the design of your database, and the more realistically you can set the expectations right and the beginning, and long before real data is stored in it and real users are using it, the better off you’ll be.

During the planning stages you need to look at “nuts and bolts” things like data types, indexes, and normalization. You should also look at things like performance requirements, budget, business use cases, and so on. Be thorough, but be pragmatic. For example, if you’re creating a schema for the backend database of a simple workgroup app that will never have more than a dozen users and a few dozen megabytes of data, you can get away with things which would make an enterprise-wide system with thousands of concurrent users fall flat on its face.

Following are some things I try to keep in mind whenever I begin a new project:

Know your data. INTs, VARCHARs, CHARs, BLOBs, DATETIMEs, and other datatypes are not exciting to think about (for most people), but defining your tables properly, with the correct (and correctly sized) data types in the correct places can have a measurable impact on the performance of your database. For example, avoid the temptation to stuff everything into large VARCHARs and call it good. It might make development easier, but it could sink you performance-wise. Different data-types exist for a reason. It is to your benefit to learn them, their differences, and how to properly use them.

Normalize only as long as it benefits you. In theory, you should almost always normalize your database. Normalization minimizes duplicated data and can greatly improve performance. However, there are different levels of normalization and for certain data and/or workloads, 2NF (2nd Normal Form) may work better than 3NF or vice-versa. There are also cases where de-normalized data may work better. On a similar note, if terms like “3NF” are unfamiliar to you, pick up a book (or search the web) and learn about the different normal forms. You’ll be glad you did.

No database lives in a vacuum. Your database has to live in the real world, so don’t forget to consider budgets, business use cases, and the budget for your project. It’s a fact of life that compromises will have to be made in almost any project, so try and get out in front and set realistic goals and deadlines. Also, when changes or additions to the project are proposed, communicate clearly how the changes will affect things like execution times, hardware requirements, and so on. Sometimes the reasons behind a change are worth it, but sometimes they’re not.

Lastly: Don’t be afraid to ask for help. No single person, or small group, can be expected to know everything and there’s no shame in asking for or hiring outside help.

Good luck with your next project!