July 13, 2017

Analysis of Financial Time Series Data Using MariaDB ColumnStore

MariaDB ColumnStore is an open source columnar database built on MariaDB Server. It can be deployed in the cloud or on a local cluster of Linux servers using either local or networked storage. In this blog post, I will show some examples of analysis of financial time series data using MariaDB ColumnStore.

Free Forex Historical Data at HistData.com

First of all, we will download forex historical data (GBPUSD M1 Full 2016 Year Data) which is freely available on HistData.com. HistaData.com forex historical data look like:

2016.01.03,17:00,1.473350,1.473350,1.473290,1.473290,0
2016.01.03,17:01,1.473280,1.473360,1.473260,1.473350,0
2016.01.03,17:02,1.473350,1.473350,1.473290,1.473290,0
2016.01.03,17:03,1.473300,1.473330,1.473290,1.473320,0

1st column is timestamp of currency rate of every minute, but needed to convert the format, in order to fit with ColumnStore DATETIME data type. I modified the timestamp format using the following simple Ruby script :

convert.rb:
#!/usr/bin/env ruby
id = 0
while line = gets
  date, time, open, high, low, close = line.split(",")
  year, month, day = date.unpack("a4xa2xa2")
  hour, minute = time.unpack("a2xa2")
  id+= 1
  print "#{id},#{year}-#{month}-#{day} #{hour}:#{minute},”
  puts  [open, high, low, close].join(“,”)
end

For example, you can convert the historical data as follows:

ruby convert.rb DAT_ASCII_GBPUSD_M1_2016.csv > gbpusd2016.csv

The converted forex historical data will be:

1,2016-01-03 17:00,1.473350,1.473350,1.473290,1.473290
2,2016-01-03 17:01,1.473280,1.473360,1.473260,1.473350
3,2016-01-03 17:02,1.473350,1.473350,1.473290,1.473290
4,2016-01-03 17:03,1.473300,1.473330,1.473290,1.473320

This demo was performed on following setup:

  • CPU: Intel Core i7-7560U 2.4 GHz 2 physical cores / 4 logical cores
  • OS: CentOS 7.3 ( virtualized on VMware Workstation 12 Pro on Windows 10 Pro)
  • Memory: 4GB
  • ColumnStore: 1 UM & 1 PM on single node

Next, we create a database and a table with MariaDB monitor:

# mcsmysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 10.1.23-MariaDB Columnstore 1.0.9-1

Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> CREATE DATABASE forex;
MariaDB [(none)]> USE forex;
MariaDB [forex]> CREATE TABLE gbpusd (
    id INT, 
    time DATETIME,
    open DOUBLE,
    high DOUBLE, 
    low DOUBLE, 
    close DOUBLE
    ) engine=ColumnStore default character set=utf8;

Notes:
With standard ColumnStore installation, MariaDB monitor command is mcsmysql ( aliased to /usr/local/mariadb/columnstore/mysql/bin/mysql )
Storage engine must be ColumnStore

Now it’s time to import CSV data whose timestamps are corrected. For bulk import, we use cpimport command with ColumnStore. With -s (separator) option, we specify the delimiter character (in this example, comma). 

# cpimport -s ',' forex gbpusd gbpusd2016.csv
Locale is : C
Column delimiter : ,

Using table OID 3163 as the default JOB ID
Input file(s) will be read from : /home/vagrant/histdata
Job description file : /usr/local/mariadb/columnstore/data/bulk/tmpjob/3163_D20170624_T103843_S950145_Job_3163.xml

…

2017-06-24 10:38:45 (29756) INFO : For table forex.gbpusd: 372480 rows processed and 372480 rows inserted.
2017-06-24 10:38:46 (29756) INFO : Bulk load completed, total run time : 2.11976 seconds

We could import 372,280 rows of forex data in 2 seconds. LOAD DATA LOCAL INFILE can be used as well with ColumnStore.

Sample queries using functions

The reason why GBPUSD M1 data 2016 was chosen, is to track very quick moves of GBPUSD before and after the Brexit vote, which was held on 23th June 2016.

gbpusd-h4.png

 

Now we look at maximum, minimum and drop off rate during 23th – 24th June 2016, using Windows Functions of ColumnStore.

MariaDB [forex]> SELECT MAX(close) FROM gbpusd WHERE time BETWEEN TIMESTAMP ('2016-06-23') AND TIMESTAMP ('2016-06-25');
+------------+
| MAX(close) |
+------------+
|    1.50153 |
+------------+
1 row in set (0.03 sec)

MariaDB [forex]> SELECT MIN(close) FROM gbpusd WHERE time BETWEEN TIMESTAMP ('2016-06-23') AND TIMESTAMP ('2016-06-25');
+------------+
| MIN(close) |
+------------+
|    1.32322 |
+------------+
1 row in set (0.03 sec)

MariaDB [forex]> SELECT MAX(close)/MIN(close) FROM gbpusd WHERE time BETWEEN TIMESTAMP ('2016-06-23') AND TIMESTAMP ('2016-06-25');
+-----------------------+
| MAX(close)/MIN(close) |
+-----------------------+
|    1.1347546399658233 |
+-----------------------+
1 row in set (0.03 sec)

GBPUSD fell from 1.50 to 1.32 (-13%) in 2 days (actually in half a day). 

Correlation GBPUSD – USDJPY on 23th June 2016

Next graph shows scatter plot of GBPUSD vs. USDJPY during 23th – 24th June 2016. (USDJPY M1 2016 were imported same as GBPUSD)

scatter-plot-gbpusd-usdjpy.png

Note: GBPUSD is multiplied by 100, then subtracted by 130. USDJPY is subtracted by 110.

Now we calculate Pearson correlation coefficient using  statistical aggregation functions. Following SELECT statement was used: 

SELECT
  (AVG(gbpusd.close*usdjpy.close) - AVG(gbpusd.close)*AVG(usdjpy.close)) / 
  (STDDEV(gbpusd.close) * STDDEV(usdjpy.close))
AS correlation_coefficient_population
FROM gbpusd
JOIN usdjpy ON gbpusd.time = usdjpy.time
WHERE
  gbpusd.time BETWEEN TIMESTAMP('2016-06-23') AND TIMESTAMP('2016-06-25');

Query result:

+ ------------------------------------+
| correlation_coefficient_population |
+------------------------------------+
|                 0.9647697302087848 |
+------------------------------------+
1 row in set (0.57 sec)

GBPUSD and USDJPY were highly correlated during the UK EU membership referendum.

Moving Average GBPUSD 23th - 24th June 2016

In Forex trading, moving average is often used to smooth out price fluctuations. The following query allows moving average of sliding 13 rows window.

SELECT 
  time,
  close,
  AVG(close) OVER ( 
               ORDER BY time ASC
                 ROWS BETWEEN 
                   6 PRECEDING AND
                   6 FOLLOWING ) AS MA13,
  COUNT(close) OVER ( 
               ORDER BY time ASC
                 ROWS BETWEEN 
                   6 PRECEDING AND
                   6 FOLLOWING ) AS row_count
FROM gbpusd
WHERE time BETWEEN TIMESTAMP('2016-06-23') AND TIMESTAMP('2016-06-25');

GBPUSD-MA13.png

Summary

In this blog post, the following features of MariaDB ColumnStore were explained:

  • With cpimport command, users can easily perform fast bulk import to MariaDB ColumnStore tables.
  • MariaDB ColumnStore allows fast and easy data analytics using SQL 2003 compliant Window Functions, such as MAX, MIN, AVG and STDDEV.
  • No index management for query performance tuning needed.

MariaDB ColumnStore can be downloaded here, detailed instructions to install and test MariaDB ColumnStore on Windows using Hyper-V can be found here.

About Satoru Goto

Satoru Goto has spent more than 20 years on UNIX/Linux administration using various open source softwares. He works at MariaDB Corporation as Customer Solutions Engineer.

Read all posts by Satoru Goto