R Statistical Programming Using MariaDB as the Background Database

  1. Introduction to R

  2. The R Environment

  3. Using R with MariaDB

  4. R Installation

  5. Data Transfer between R and MariaDB

     • [Package: "odbc"](#package-odbc)
     • [Package: "RMariaDB"](#package-rmariadb)
     • [Other Packages: "readr", "RODBC"](#other-packages-readr-rodbc)

  6. R Programming Resources

     • [A) Programming](#a-programming)
     • [B) Statistics](#b-statistics)
     • [C) Cheatsheets: Concept Summary](#c-cheatsheets-concept-summary)
     • [D) Search Engine & R Package Spotlight](#d-search-engine-r-package-spotlight)
     • [E) Statistical / Unsupervised Machine Learning, Deep Learning and Artificial Intelligence](#e-statistical-unsupervised-machine-learning-deep-learning-and-artificial-intelligence)
     • [F) Text Mining](#f-text-mining)
     • [G) Shiny Web Apps & RMarkdown Documents](#g-shiny-web-apps-rmarkdown-documents)
     • [H) Advanced R Resources](#h-advanced-r-resources)

Introduction to R

R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical techniques (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …), graphical techniques, and machine learning packages, and it is highly extensible.

One of R’s strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.

The R Environment

R is an integrated suite of software facilities for data manipulation, calculation, and graphical display. It includes:

• an effective data handling and storage facility,

• a suite of operators for calculations on arrays, in particular matrices,

• a large, coherent, integrated collection of intermediate tools for data analysis,

• graphical facilities for data analysis and display either on-screen or on hardcopy, and

• a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
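As a small illustration of the facilities listed above, the following base R sketch touches on matrix operators, a conditional, a loop, and a user-defined recursive function. The variable and function names are made up for this example.

```r
# Matrix arithmetic: %*% is matrix multiplication, t() is the transpose.
m <- matrix(1:4, nrow = 2)   # filled column-wise: columns (1, 2) and (3, 4)
gram <- t(m) %*% m           # 2 x 2 matrix

# A user-defined recursive function with a conditional.
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_rec(n - 1)
}

# A loop that accumulates a result.
total <- 0
for (i in 1:5) {
  total <- total + factorial_rec(i)
}

print(gram)
print(total)
```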

Using R with MariaDB

R Installation

Some basic tips for using R together with MariaDB:

A. The recommended R distribution is “Base R”: CRAN

B. The recommended R GUIs are RStudio Desktop, or RStudio Server: RStudio

Alternative GUIs would be:

  • RCode (PGM Solutions): RCode.

R and MariaDB Server can be installed either on the same server or on different servers, because data is exchanged between the two environments over an ODBC connection.

Data Transfer between R and MariaDB

Package: "odbc"

For transferring data between MariaDB Server and the R environment, R's "odbc" package is recommended: CRAN odbc

  • "odbc" is an R package available on CRAN (since 2017-02-05) and maintained by RStudio. It is designed to comply with the DBI specification.

  • Tutorials on how to use R's "odbc" package can be found here:

    • Setting up ODBC Drivers: DB RStudio Drivers

    • "odbc" R Package: DB RStudio odbc Usage

The "odbc" package requires that the MariaDB or MySQL ODBC connector is already installed:

  • MariaDB ODBC Connector

  • MySQL ODBC Connector

To install the "odbc" package from CRAN, execute in R:

install.packages("odbc")
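Once the ODBC connector and the package are installed, a connection can be opened through DBI. The sketch below is a minimal example under stated assumptions: the helper name, driver name, host, database name, user and the MARIADB_PWD environment variable are all placeholders that do not come from this page, and the driver name must match the one registered in your ODBC configuration.

```r
# Hypothetical helper: opens a DBI connection to MariaDB through ODBC.
# Every default below is a placeholder and must be adapted to your setup.
connect_mariadb_odbc <- function(driver = "MariaDB ODBC 3.1 Driver",
                                 server = "localhost",
                                 database = "test",
                                 uid = "r_user",
                                 pwd = Sys.getenv("MARIADB_PWD"),
                                 port = 3306) {
  DBI::dbConnect(
    odbc::odbc(),
    driver = driver,
    server = server,
    database = database,
    uid = uid,
    pwd = pwd,
    port = port  # passed through as an extra ODBC connection keyword
  )
}

# Usage (needs a reachable MariaDB server and a configured ODBC driver):
# con <- connect_mariadb_odbc()
# DBI::dbGetQuery(con, "SELECT VERSION()")
# DBI::dbDisconnect(con)
```

Reading the password from an environment variable rather than hard-coding it keeps credentials out of scripts; any other secret store would work equally well.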

Package: "RMariaDB"

The "RMariaDB" R package is a modern MariaDB client based on Rcpp.

To install the RMariaDB package from CRAN, execute the following R statement:

install.packages("RMariaDB")

And for connecting to MariaDB:

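library(RMariaDB)

# Replace the NULL placeholders with your own connection details.
# A NULL host connects to the local machine; a NULL username defaults to the current user.
con <- dbConnect(
  drv = RMariaDB::MariaDB(),
  username = NULL,
  password = NULL,
  host = NULL,
  port = 3306
)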
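With an open connection, data frames can be written to and read from MariaDB using the generic DBI functions. The following is a minimal sketch, not a reference implementation: the helper name and the table name mtcars_demo are hypothetical, and mtcars is the sample data frame shipped with R.

```r
# Hypothetical helper: writes a data frame to a table and reads back a summary.
# The connection is left open for the caller to close.
round_trip_mtcars <- function(con, table_name = "mtcars_demo") {
  # Copy the built-in mtcars data frame into MariaDB, replacing any old copy.
  DBI::dbWriteTable(con, table_name, mtcars, overwrite = TRUE)

  # Run an aggregate query on the server and return the result as a data frame.
  # dbQuoteIdentifier() quotes the table name safely for this connection.
  sql <- paste(
    "SELECT cyl, AVG(mpg) AS avg_mpg FROM",
    DBI::dbQuoteIdentifier(con, table_name),
    "GROUP BY cyl"
  )
  DBI::dbGetQuery(con, sql)
}

# Usage with a connection such as `con` above:
# result <- round_trip_mtcars(con)
# print(result)
# DBI::dbDisconnect(con)
```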

Other Packages: "readr", "RODBC"

There are other alternatives for data transfer between R and MariaDB:

  • "readr" R package, for writing and reading CSV files. On the MariaDB side, it is used together with "LOAD DATA INFILE".

  • "RODBC" R package: a robust and well-tested package (available since 2000-05-24) that enables data transfer between R and MariaDB by means of an ODBC connector: CRAN RODBC

    • It is slightly slower than RStudio's new "odbc" package (See benchmarks): RStudio odbc

    • To report a bug to the RODBC package maintainer, use the following R statement: bug.report(package = "RODBC")

    • A vignette on how to use the RODBC package can be found here: RODBC CRAN Vignette
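The CSV route mentioned in the first item can be sketched as follows. This is a minimal example under stated assumptions: the helper name, file name and table name are hypothetical, the target table must already exist with matching columns, and the file must be readable by the MariaDB server (LOAD DATA LOCAL INFILE is the variant for files that live on the client).

```r
# Hypothetical helper: builds a LOAD DATA statement for a CSV written by readr.
build_load_data_sql <- function(csv_path, table_name) {
  sprintf(
    paste0(
      "LOAD DATA INFILE '%s' INTO TABLE %s ",
      "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' ",
      "LINES TERMINATED BY '\\n' IGNORE 1 LINES"
    ),
    csv_path, table_name
  )
}

# Write the sample data frame to a CSV file (header row plus data rows).
csv_path <- normalizePath(
  file.path(tempdir(), "mtcars_demo.csv"),
  winslash = "/", mustWork = FALSE
)
readr::write_csv(mtcars, csv_path)

# Build the statement; IGNORE 1 LINES skips the header row written above.
load_sql <- build_load_data_sql(csv_path, "mtcars_demo")
cat(load_sql, "\n")

# To run it from R over an open DBI connection `con`:
# DBI::dbExecute(con, load_sql)
```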

R Programming Resources

A) Programming

Recommended resources for learning how to program in R are the following:

  • R Cookbook Second Edition (O’Reilly Media; Paul Teetor; James (JD) Long)

  • R Graphics Cookbook Second Edition (O’Reilly Media; Winston Chang)

  • R for Data Science (O’Reilly Media; Garrett Grolemund, Hadley Wickham)

  • Advanced R Second Edition (CRC R Series; Hadley Wickham)

  • Mastering Spark with R (O'Reilly; Javier Luraschi, Kevin Kuo, Edgar Ruiz)

  • R Packages (Hadley Wickham; O’Reilly)

B) Statistics

A recommended book for understanding the underlying statistics in the R packages is:

  • Practical Statistics for Data Scientists (O’Reilly Media; Peter Bruce, Andrew Bruce)

C) Cheatsheets: Concept Summary

  • RStudio Cheatsheets are a recommended and valuable resource: RStudio Cheatsheets: Webpage

  • Along with the following Base R reference card: R Reference Card v2

D) Search Engine & R Package Spotlight

  • Search Engines:

    • RSeek: For searching any R related information (Based on Google).

    • RPackages: Search and stats for CRAN packages.

  • Information on new R packages is regularly published on the following websites:

    • R-bloggers

    • Towards Data Science

    • MRAN: Package Spotlight

E) Statistical / Unsupervised Machine Learning, Deep Learning and Artificial Intelligence

H2O.AI

The R programming language supports the H2O.ai library (the h2o package), which makes it possible to build in-memory, multi-cluster, GPU-powered machine learning models.

To install the h2o package from CRAN, execute:

install.packages("h2o")

  • H2O.ai: Webpage

  • H2O.ai Algorithms: Cheatsheet

  • h2o R Package Functions: Cheatsheet

  • Practical Machine Learning with H2O (O'Reilly Media; Darren Cook)

  • Machine Learning with R and H2O (Mark Landry): Booklet Online Version

  • Deep Learning with H2O: Vignette

The following R functions can be used to import a MariaDB table into H2O.ai from the R front end:

  • h2o.import_sql_table: "This function imports a SQL table to H2OFrame in memory".

  • h2o.import_sql_select: "This function imports the SQL table that is the result of the specified SQL query to H2OFrame in memory".

library(h2o)

# Connect to the running H2O instance (started with the JDBC driver in the classpath).
h2o.init()

# JDBC connection details for the MariaDB server.
connection_url <- "jdbc:mariadb://172.16.2.178:3306/ingestSQL?&useSSL=false"
username <- "root"
password <- "abc123"

# Whole table:
table <- "citibike20k"
my_citibike_data <- h2o.import_sql_table(connection_url, table, username, password)

# SELECT query:
select_query <- "SELECT bikeid FROM citibike20k"
my_citibike_data <- h2o.import_sql_select(connection_url, select_query, username, password)

NOTE: Be sure to start the h2o.jar in the terminal with your downloaded JDBC driver in the classpath:

java -cp <path_to_h2o_jar>:<path_to_jdbc_driver_jar> water.H2OApp

KERAS

The keras R package offers an interface to Python's 'Keras', a high-level neural networks 'API'.

'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.

  • R interface to Keras: Webpage

  • Deep Learning With R (François Chollet with J. J. Allaire, Manning)

  • Keras RStudio Cheatsheet

R LIBRARIES: CARET

A book which introduces core Machine Learning concepts:

  • Introduction to Machine Learning with R (O'Reilly; Scott Burger)

F) Text Mining

Documentation on how to perform Text Mining in R can be found in the book "Text Mining With R":

  • Text Mining With R: A Tidy Approach (O’Reilly Media; Julia Silge and David Robinson): Book Online Version

G) Shiny Web Apps & RMarkdown Documents

SHINY WEB APPS

The Shiny R package makes it incredibly easy to build interactive web applications with R.

Automatic "reactive" binding between inputs and outputs and extensive prebuilt widgets make it possible to build beautiful, responsive, and powerful applications with minimal effort.

  • Shiny Written Tutorials

  • Shiny R Package Cheatsheet
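To make the "reactive" binding between inputs and outputs concrete, here is a minimal sketch of a Shiny app with one slider and one plot. The input and output IDs ("n" and "hist") and the labels are arbitrary choices for this example.

```r
library(shiny)

# UI: one input widget (a slider) and one output (a plot).
ui <- fluidPage(
  sliderInput("n", "Number of observations", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

# Server: output$hist reads input$n, so Shiny re-renders the plot
# whenever the slider value changes.
server <- function(input, output, session) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = "Random normal sample", xlab = "Value")
  })
}

# Bundle the UI and server into an app object.
app <- shinyApp(ui = ui, server = server)

# In an interactive session, start the app with:
# runApp(app)
```

No explicit event handler is needed: the dependency of the plot on the slider is inferred from the fact that the render expression reads input$n.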

To deploy Shiny web applications using open source alternatives, you can use any of the following:

  • RInno: CRAN Webpage (Windows)

  • ShinyProxy: Webpage

  • Shiny Server (Open Source Edition): Webpage

RMARKDOWN DOCUMENTS

  • R Markdown: The Definitive Guide (Book).

  • R Markdown Cheatsheet.

H) Advanced R Resources

Some of the most advanced R resources for fully understanding the internals and nuances of the R Programming Language are the following:

  • Chapman & Hall/CRC The R Series: Subject-specific Books

This page is licensed: CC BY-SA / Gnu FDL