December 21, 2016

Of Temporal Datatypes, Electricity and Cows

Introduction

In an earlier blog post I discussed some aspects of temporal datatypes and how they apply to databases, in particular MariaDB. In this follow-up blog post I will get into some not-so-pleasant aspects of temporal datatypes, for example why this was caused by a shortage of electricity and then get into a subject where cows and developers share an opinion (which is not to say that there might not be more such subjects).

How an attempt to save electricity gets time zones all wrong

Let’s again go back a few years, back to the early years of World War 1 when the government in Germany determined that they needed a lot of energy to fight the war and they wanted to conserve power. They first figured that they would do this by telling people to switch off their iPhones when they weren’t using them, but then they realized that wouldn’t work as the iPhone hadn’t been invented yet. Instead they forced people (people here not including the government officials themselves) to get out of bed earlier (yes, these folks were bad, real bad) and they did this by changing the time in the spring/summer so that people could work in the daylight longer and not have to use artificial lightning. People were outraged by this, obviously, no one likes to get up early, so in the fall, some 6 months after implementing this, they changed the time back again, as they didn’t want a revolution. Some 6’ish months later they forgot all about that and changed the time again to force people to get up earlier, and then back again in the fall. And so it continued.

Then the world followed that scheme, or rather, some did and some didn’t. They decided to give constant flipping of the time a name also, and they decided on Daylight Saving Time (DST) to make it sound like something sane, pleasant and modern, and “Get Out Of Bed One Hour Earlier For Half The Year” was considered too long a name for this atrocity.

Not everybody liked DST though. Cows for example did not. And I know what you are asking now, as you are modern technical IT guys reading this, you ask “what is a cow”. When I was a kid, my mum told me that cows were mild mannered animals that you got milk from, and that was a lie of course, as everyone knows that you get milk from the grocery store, not from some animal, mild mannered or not. But cows sound funny at least. In addition to cows, software developers also hate DST, for a number of reasons, but for a different reason than cows (apparently cows like to be “milked”, whatever that is, the same time every day which is why they dislike DST so much. And how they know that DST means that the time is different I do not know).

The MariaDB TIMESTAMP datatype in practice

In the previous post on this subject we determined that the MariaDB TIMESTAMP datatype supports time zones, but what does this mean? Well, this is what it means, in short:
•    MariaDB has a default TIME ZONE setting that is, unless you change it, set to the time zone of the operating system that the MariaDB Server runs on.
•    All TIMESTAMP values are converted to the UTC (Universal Time Coordinated) time zone before being stored. More on this later.
•    Each session connecting to MariaDB has a time zone (unless specified it uses that defined in the server) and conversion to and from UTC is automatic.


So what is this UTC time zone then? Well, it is actually not a TIME ZONE per se, rather it is the standard time that all other time settings reference, but as such it also assumes a time zone for technical reasons, although there is no physical place on earth with the UTC time zone. 

One thing with UTC is that it doesn’t have something like DST, and that it doesn’t makes a lot of sense. But this also means that we are going to convert to and from UTC anytime we run with a non-UTC time zone which most people do (I can argue that maybe you should, but that depends on your application). The issue is that as we are not using UTC on our clients we have to convert to and from UTC all the time and the non-UTC time zones we typically use do have DST and then conversion will be difficult. Let me show you what this means.
For the PST time zone, on Nov 6 2016 we changed from DST and back to normal time, meaning that we would our clocks back. When the time was 02:00 the clocks were reset to 1:00. Lets start with inserting some data using UTC, or “Cow-time”:

MariaDB> SET time_zone = 'UTC';
Query OK, 0 rows affected (0.00 sec)
MariaDB> CREATE TABLE timetable(id NOT NULL PRIMARY KEY AUTO_INCREMENT, ts1 TIMESTAMP);
Query OK, 0 rows affected (0.00 sec)
MariaDB> INSERT INTO timetable(ts1) VALUES('2016-11-06 07:30:00');
Query OK, 1 row affected (0.00 sec)
MariaDB> INSERT INTO timetable(ts1) VALUES('2016-11-06 08:00:00');
Query OK, 1 row affected (0.00 sec)
MariaDB> INSERT INTO timetable(ts1) VALUES('2016-11-06 08:30:00');
Query OK, 1 row affected (0.00 sec)
MariaDB> INSERT INTO timetable(ts1) VALUES('2016-11-06 09:00:00');
Query OK, 1 row affected (0.00 sec)
MariaDB> SELECT * FROM  timetable ORDER BY ts1;
+----+---------------------+
| id | ts1                 |
+----+---------------------+
|  1 | 2016-11-06 07:30:00 |
|  2 | 2016-11-06 08:00:00 |
|  3 | 2016-11-06 08:30:00 |
|  4 | 2016-11-06 09:00:00 |
+----+---------------------+
4 rows in set (0.00 sec)

That seems fair, right? Now, the time we are inserting this data in UTC is actually when PST stops DST, 09:00 is UST is 02:00 PST. Let’s look at what the result of that last SELECT looks like in the PST timezone. Note that this is exactly the same data, table and SELECT statement, the latter including an explicit ORDER BY:

MariaDB> SET time_zone = 'America/Los_Angeles';
Query OK, 0 rows affected (0.00 sec)
MariaDB> SELECT * FROM  timetable ORDER BY ts1;
+----+---------------------+
| id | ts1                 |
+----+---------------------+
|  1 | 2016-11-06 00:30:00 |
|  2 | 2016-11-06 01:00:00 |
|  3 | 2016-11-06 01:30:00 |
|  4 | 2016-11-06 01:00:00 |
+----+---------------------+
4 rows in set (0.00 sec)

What! That is not ordered by the ts1 column? Well, it is, but fact is that 01:00 happens twice that night! Let’s try something else in the PST time zone:

MariaDB> INSERT INTO timetable(ts1) VALUES('2017-03-12 02:30:00');
Query OK, 1 row affected, 1 warning (0.00 sec)
MariaDB> SHOW WARNINGS;
+---------+------+--------------------------------------------------+
| Level   | Code | Message                                          |
+---------+------+--------------------------------------------------+
| Warning | 1299 | Invalid TIMESTAMP value in column 'ts1' at row 1 |
+---------+------+--------------------------------------------------+
1 row in set (0.00 sec)

The time we want to set isn’t valid in PST that day, as when the clock turns 02:00, the time is reset to 03:00. The data is “truncated” to a valid PST timestamp:

MariaDB> SELECT * FROM timetable;
+----+---------------------+
| id | ts1                 |
+----+---------------------+
|  7 | 2016-11-06 00:30:00 |
|  8 | 2016-11-06 01:00:00 |
|  9 | 2016-11-06 01:30:00 |
| 10 | 2016-11-06 01:00:00 |
| 22 | 2017-03-12 03:00:00 |
+----+---------------------+
5 rows in set (0.00 sec)

Conclusions

So, how do we solve this? Is there a best practice? Well, if we want to follow the DST changes and also support timestamped data in different time-zones, we at least has to learn to live with it. One way is to stick to UTC across the range and let the application handle this? Or have each MariaDB connection set the timezone to something appropriate? Or have all client run the with time zone of the server? Really, this is difficult, but it’s not as much an IT problem, more a problem with Cows, Electricity and Trains.

Happy SQL’ing
/Karlsson

About Anders Karlsson

Anders Karlsson has spent many years in the database industry, working for companies such as Oracle, Informix and MySQL. He works at MariaDB Corporation as Principal Sales Engineer since 2012.

Read all posts by Anders Karlsson