R Statistical Programming using MariaDB as the background database
1. Introduction to R
R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible.
One of R’s strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.
2. The R environment
R is an integrated suite of software facilities for data manipulation, calculation, machine learning and graphical display. It includes:
• an effective data handling and storage facility,
• a suite of operators for calculations on arrays, in particular matrices,
• a large, coherent, integrated collection of intermediate tools for data analysis,
• graphical facilities for data analysis and display either on-screen or on hardcopy, and
• a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
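As a brief illustration of the facilities listed above, the following sketch uses only base R. The variable and function names are chosen for this example and do not come from any particular package.

```r
# A 2 x 2 matrix, filled column by column
m <- matrix(c(2, 0, 1, 3), nrow = 2)

# Matrix operators: product, transpose and inverse
m_squared <- m %*% m
m_transposed <- t(m)
m_inverse <- solve(m)

# A user-defined recursive function with a conditional
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_rec(n - 1)
}

# A loop accumulating the elements of a vector
total <- 0
for (x in c(1, 2, 3, 4)) {
  total <- total + x
}

print(m_squared)
print(factorial_rec(5))
print(total)
```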
3. Using R with MariaDB
3.1. R Installation
Some basic tips on using R together with MariaDB are the following:
A. The recommended R distribution is “Microsoft R Open”:
B. The recommended R GUIs are RStudio Desktop, or RStudio Server:
An alternative would be Microsoft Visual Studio 2015 or 2017:
“Microsoft R Open” and “MariaDB Server” can be installed either on the same server or on different servers, because the two environments exchange data over ODBC.
3.2. Data Transfer between R and MariaDB
To transfer data between MariaDB Server and the R environment, the RODBC R package is recommended:
To report a bug to the RODBC package maintainer, use the following R statement: bug.report(package = "RODBC")
A vignette on how to use the RODBC package can be found here:
The RODBC package requires the MariaDB ODBC connector to be installed beforehand:
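A minimal sketch of a round trip with RODBC might look as follows. It assumes that an ODBC data source (DSN) for MariaDB has already been configured on the machine running R. The helper names read_query and append_rows, and the DSN, user, password and table names in the usage comment, are hypothetical and only serve as illustration.

```r
# Hypothetical helper: run a query over ODBC and return the result as a data frame.
read_query <- function(dsn, uid, pwd, query) {
  channel <- RODBC::odbcConnect(dsn, uid = uid, pwd = pwd)
  # odbcConnect returns -1 (with a warning) when the connection fails
  if (!inherits(channel, "RODBC")) {
    stop("could not open an ODBC channel for DSN: ", dsn)
  }
  # Close the channel even if the query raises an error
  on.exit(RODBC::odbcClose(channel), add = TRUE)
  RODBC::sqlQuery(channel, query)
}

# Hypothetical helper: append the rows of a data frame to a table.
append_rows <- function(dsn, uid, pwd, df, table) {
  channel <- RODBC::odbcConnect(dsn, uid = uid, pwd = pwd)
  if (!inherits(channel, "RODBC")) {
    stop("could not open an ODBC channel for DSN: ", dsn)
  }
  on.exit(RODBC::odbcClose(channel), add = TRUE)
  RODBC::sqlSave(channel, df, tablename = table, append = TRUE, rownames = FALSE)
}

# Usage (assumed DSN, credentials and table names):
# sales <- read_query("mariadb_dsn", "r_user", "secret", "SELECT id, amount FROM sales")
# append_rows("mariadb_dsn", "r_user", "secret", sales, "sales_copy")
```

Wrapping the calls in functions keeps the opening and closing of the channel in one place, so a connection is not left open when a query fails.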
There are other alternatives for data transfer between R and MariaDB:
• “readr” R package, for writing and reading CSV files, to be used together with “LOAD DATA INFILE” on the MariaDB side.
• “RMySQL” R package, which is not recommended because it is not robust enough when dealing with “big data”.
• “RMariaDB” R package, a modern MariaDB client based on Rcpp, available on GitHub: https://github.com/rstats-db/RMariaDB
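For the CSV route mentioned in the first bullet above, a sketch could look like the following. The data frame, the file location and the target table name measurements are assumptions made for this example. The generated statement still has to be run on the MariaDB side, where the file must be readable by the server (or LOAD DATA LOCAL INFILE used instead for a client-side file).

```r
library(readr)

# Hypothetical data frame to export
measurements <- data.frame(id = 1:3, value = c(10.5, 20.1, 30.7))

# Write a comma-separated file with a header row
csv_path <- file.path(tempdir(), "measurements.csv")
write_csv(measurements, csv_path)

# Use forward slashes so the path can be embedded in an SQL string on any platform
csv_path_sql <- normalizePath(csv_path, winslash = "/")

# Statement to execute in MariaDB; the header row is skipped with IGNORE 1 LINES
load_sql <- paste0(
  "LOAD DATA INFILE '", csv_path_sql, "' ",
  "INTO TABLE measurements ",
  "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' ",
  "IGNORE 1 LINES"
)
cat(load_sql, "\n")
```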
To install the RMariaDB package on R 3.4.1 or later, execute the following R statement: install.packages("RMariaDB")
The RMariaDB package requires MariaDB Connector/C, which can be downloaded from:
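Once installed, RMariaDB is used through the generic DBI interface. The sketch below assumes that the DBI and RMariaDB packages are available. The helper name with_mariadb, together with the host, credentials, database and table names in the usage comment, are hypothetical.

```r
# Hypothetical helper: open a connection, run an action on it, and always disconnect.
with_mariadb <- function(host, username, password, dbname, action) {
  con <- DBI::dbConnect(
    RMariaDB::MariaDB(),
    host = host,
    username = username,
    password = password,
    dbname = dbname
  )
  # Disconnect even if the action raises an error
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  action(con)
}

# Usage (assumed server, credentials and table name):
# result <- with_mariadb("localhost", "r_user", "secret", "analytics", function(con) {
#   scores <- data.frame(id = 1:3, score = c(0.2, 0.5, 0.9))
#   DBI::dbWriteTable(con, "scores", scores, overwrite = TRUE)
#   DBI::dbGetQuery(con, "SELECT COUNT(*) AS n FROM scores")
# })
```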
3.3. R Programming Resources
Two excellent and recommended books for learning how to program in R are the following:
• R Cookbook (O’Reilly Media; Paul Teetor): http://shop.oreilly.com/product/9780596809164.do
• R Graphics Cookbook (O’Reilly Media; Winston Chang): http://shop.oreilly.com/product/0636920023135.do
The books can be purchased directly from the O’Reilly store or from Amazon, among other vendors.
An extract of both books can be found here:
RStudio cheatsheets are also a recommended and valuable resource:
Along with the following Base R reference card:
Last but not least, for finding any R-related information, the following search engine (based on Google) is recommended: