Analysis of Financial Time Series Data Using MariaDB ColumnStore

MariaDB ColumnStore is an open source columnar database built on MariaDB Server. It can be deployed in the cloud or on a local cluster of Linux servers using either local or networked storage. In this blog post, I will show some examples of analysis of financial time series data using MariaDB ColumnStore.
Free Forex Historical Data at HistData.com
First of all, we will download forex historical data (GBPUSD M1 Full 2016 Year Data) which is freely available on HistData.com. HistData.com forex historical data look like:
20160103 170000;1.473350;1.473350;1.473290;1.473290;0 20160103 170100;1.473280;1.473360;1.473260;1.473350;0 20160103 170200;1.473350;1.473350;1.473290;1.473290;0 20160103 170300;1.473300;1.473330;1.473290;1.473320;0
1st column is timestamp of currency rate of every minute, but needed to convert the format, in order to fit with ColumnStore DATETIME data type. I modified the timestamp format using the following simple Ruby script :
convert.rb: #!/usr/bin/env ruby id = 0 while line = gets  datetime, open, high, low, close = line.split(";")  year, month, day, hour, minute = datetime.unpack("a4a2a2xa2a2")  id+= 1  print "#{id},#{year}-#{month}-#{day} #{hour}:#{minute},"  puts  [open, high, low, close].join(',') end
For example, you can convert the historical data as follows:
ruby convert.rb DAT_ASCII_GBPUSD_M1_2016.csv > gbpusd2016.csv
The converted forex historical data will be:
1,2016-01-03 17:00,1.473350,1.473350,1.473290,1.473290 2,2016-01-03 17:01,1.473280,1.473360,1.473260,1.473350 3,2016-01-03 17:02,1.473350,1.473350,1.473290,1.473290 4,2016-01-03 17:03,1.473300,1.473330,1.473290,1.473320
This demo was performed on following setup:
- CPU: Intel Core i7-7560U 2.4 GHz 2 physical cores / 4 logical cores
- OS: CentOS 7.3 ( virtualized on VMware Workstation 12 Pro on Windows 10 Pro)
- Memory:Â 4GB
- ColumnStore:Â 1 UM & 1 PM on single node
Next, we create a database and a table with MariaDB monitor:
# mcsmysql Welcome to the MariaDB monitor. Commands end with ; or g. Your MariaDB connection id is 4 Server version: 10.1.23-MariaDB Columnstore 1.0.9-1 Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others. Type 'help;' or 'h' for help. Type 'c' to clear the current input statement. MariaDB [(none)]> CREATE DATABASE forex; MariaDB [(none)]> USE forex; MariaDB [forex]> CREATE TABLE gbpusd ( id INT, time DATETIME, open DOUBLE, high DOUBLE, low DOUBLE, close DOUBLE ) engine=ColumnStore default character set=utf8;
Notes:
With standard ColumnStore installation, MariaDB monitor command is mcsmysql ( aliased to /usr/local/mariadb/columnstore/mysql/bin/mysql )
Storage engine must be ColumnStore
Now it’s time to import CSV data whose timestamps are corrected. For bulk import, we use cpimport command with ColumnStore. With -s (separator) option, we specify the delimiter character (in this example, comma).
# cpimport -s ',' forex gbpusd gbpusd2016.csv Locale is : C Column delimiter : , Using table OID 3163 as the default JOB ID Input file(s) will be read from : /home/vagrant/histdata Job description file : /usr/local/mariadb/columnstore/data/bulk/tmpjob/3163_D20170624_T103843_S950145_Job_3163.xml … 2017-06-24 10:38:45 (29756) INFO : For table forex.gbpusd: 372480 rows processed and 372480 rows inserted. 2017-06-24 10:38:46 (29756) INFO : Bulk load completed, total run time : 2.11976 seconds
We could import 372,280 rows of forex data in 2 seconds. LOAD DATA LOCAL INFILE can be used as well with ColumnStore.
Sample queries using functions
The reason why GBPUSD M1 data 2016 was chosen, is to track very quick moves of GBPUSD before and after the Brexit vote, which was held on 23th June 2016.
Now we look at maximum, minimum and drop off rate during 23th – 24th June 2016, using Windows Functions of ColumnStore.
MariaDB [forex]> SELECT MAX(close) FROM gbpusd WHERE time BETWEEN TIMESTAMP ('2016-06-23') AND TIMESTAMP ('2016-06-25'); +------------+ | MAX(close) | +------------+ | 1.50153 | +------------+ 1 row in set (0.03 sec) MariaDB [forex]> SELECT MIN(close) FROM gbpusd WHERE time BETWEEN TIMESTAMP ('2016-06-23') AND TIMESTAMP ('2016-06-25'); +------------+ | MIN(close) | +------------+ | 1.32322 | +------------+ 1 row in set (0.03 sec) MariaDB [forex]> SELECT MAX(close)/MIN(close) FROM gbpusd WHERE time BETWEEN TIMESTAMP ('2016-06-23') AND TIMESTAMP ('2016-06-25'); +-----------------------+ | MAX(close)/MIN(close) | +-----------------------+ | 1.1347546399658233 | +-----------------------+ 1 row in set (0.03 sec)
GBPUSD fell from 1.50 to 1.32 (-13%) in 2 days (actually in half a day).
Correlation GBPUSD – USDJPY on 23th June 2016
Next graph shows scatter plot of GBPUSD vs. USDJPY during 23th – 24th June 2016. (USDJPY M1 2016 were imported same as GBPUSD)
Note: GBPUSD is multiplied by 100, then subtracted by 130. USDJPY is subtracted by 110.
Now we calculate Pearson correlation coefficient using  statistical aggregation functions. Following SELECT statement was used:
SELECT (AVG(gbpusd.close*usdjpy.close) - AVG(gbpusd.close)*AVG(usdjpy.close)) / (STDDEV(gbpusd.close) * STDDEV(usdjpy.close)) AS correlation_coefficient_population FROM gbpusd JOIN usdjpy ON gbpusd.time = usdjpy.time WHERE gbpusd.time BETWEEN TIMESTAMP('2016-06-23') AND TIMESTAMP('2016-06-25');
Query result:
+ ------------------------------------+ | correlation_coefficient_population | +------------------------------------+ | 0.9647697302087848 | +------------------------------------+ 1 row in set (0.57 sec)
GBPUSD and USDJPY were highly correlated during the UK EU membership referendum.
Moving Average GBPUSD 23th – 24th June 2016
In Forex trading, moving average is often used to smooth out price fluctuations. The following query allows moving average of sliding 13 rows window.
SELECT time, close, AVG(close) OVER ( ORDER BY time ASC ROWS BETWEEN 6 PRECEDING AND 6 FOLLOWING ) AS MA13, COUNT(close) OVER ( ORDER BY time ASC ROWS BETWEEN 6 PRECEDING AND 6 FOLLOWING ) AS row_count FROM gbpusd WHERE time BETWEEN TIMESTAMP('2016-06-23') AND TIMESTAMP('2016-06-25');
Summary
In this blog post, the following features of MariaDB ColumnStore were explained:
- With cpimport command, users can easily perform fast bulk import to MariaDB ColumnStore tables.
- MariaDB ColumnStore allows fast and easy data analytics using SQL 2003 compliant Window Functions, such as MAX, MIN, AVG and STDDEV.
- No index management for query performance tuning needed.
MariaDB ColumnStore can be downloaded here, detailed instructions to install and test MariaDB ColumnStore on Windows using Hyper-V can be found here.
Post a Comment
Log into your MariaDB ID account to post a comment.