arrow-left

All pages
gitbookPowered by GitBook
1 of 4

Loading...

Loading...

Loading...

Loading...

Use Cases

MariaDB ColumnStore is ideal for real-time analytics and complex queries on large datasets across industries.

Using MariaDB With R

hashtag
Introduction to R

R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, …), graphical techniques, machine learning packages and is highly extensible.

One of R’s strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.

hashtag
The R Environment

R is an integrated suite of software facilities for data manipulation, calculation, and graphical display.

It includes:

• an effective data handling and storage facility,

• a suite of operators for calculations on arrays, in particular matrices,

• a large, coherent, integrated collection of intermediate tools for data analysis,

• graphical facilities for data analysis and display either on-screen or on hardcopy, and

• a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.

hashtag
Using R with MariaDB

hashtag
R Installation

Some basic notions / tips on how to use R along with MariaDB are the following:

A. The recommended R distribution is “Base R”:

B. The recommended R GUIs are RStudio Desktop, or RStudio Server:

Alternative GUIs would be:

  • RCode (PGM Solutions): .

“R” and “MariaDB Server” can be installed either in the same server, or in different servers, as an ODBC communication protocol will be used for the exchange of data between the two environments.

hashtag
Data Transfer between R and MariaDB

hashtag
Package: "odbc"

For the transfer of data between MariaDB Server and R Environment, it is recommended R's "odbc" Package:

  • “odbc" is a new R package available on CRAN (Since 2017-02-05), and maintained by RStudio, which is designed to comply with the DBI specification.

  • Tutorials on how to use R's "odbc" package can be found here:

    • Setting up ODBC Drivers:

The "odbc" package requires to have previously installed the MariaDB or MySQL ODBC connector:

For installing the "odbc" package from CRAN, execute in R:

hashtag
Package: "RMariaDB"

“RMariaDB” R library, is a modern 'MariaDB' client based on 'Rcpp'.

For installing RMariaDB package through CRAN, execute the following R statement:

And for connecting to MariaDB:

hashtag
Other Packages: "readr", "RODBC"

There are other alternatives for data transfer between R and MariaDB:

  • “readr” R package, for writing / reading CSV files. To be used in MariaDB along with “LOAD DATA INFILE”.

  • "RODBC" R package: Robust and well-tested (Since 2000-05-24) package which enables data transfer between R and MariaDB by means of an ODBC connector:

    • It is slightly slower than RStudio's new "odbc" package (See benchmarks):

hashtag
R Programming Resources

hashtag
A) Programming

Recommended resources for learning how to program in R are the following:

hashtag
B) Statistics

A recommended book for understanding the underlying statistics in the R packages is:

hashtag
C) Cheatsheets: Concept Summary

  • Rstudio Cheatsheets are a recommended and valuable resource:

  • Along with the following Base R reference card:

hashtag
D) Search Engine & R Package Spotlight

  • Search Engines:

hashtag
E) Statistical / Unsupervised Machine Learning, Deep Learning and Artificial Intelligence

H2O.AI

The R Programming language has support for the H2O.ai library (), which enables to create in-memory multi-cluster GPU powered machine learning models.

For installing H2O.ai through CRAN, execute:

The following R Statements can be used for importing a MariaDB table to H2O.ai using the R Front End:

  • import_sql_table: "This function imports a SQL table to H2OFrame in memory".

  • import_sql_select: "This function imports the SQL table that is the result of the specified SQL query to H2OFrame in memory".

NOTE: Be sure to start the h2o.jar in the terminal with your downloaded JDBC driver in the classpath:

KERAS

offers an interface to , a high-level neural networks 'API'.

'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.

R LIBRARIES: CARET

A book which introduces core Machine Learning concepts:

hashtag
F) Text Mining

Documentation on how to perform Text Mining in R can be found in the book "Text Mining With R":

hashtag
G) Shiny Web Apps & RMarkdown Documents

SHINY WEB APPS

R Package makes it incredibly easy to build interactive web applications with R.

Automatic "reactive" binding between inputs and outputs and extensive prebuilt widgets make it possible to build beautiful, responsive, and powerful applications with minimal effort.

For deploy Shiny Web Applications using Open Source Alternatives, you can either use:

RMARKDOWN DOCUMENTS

hashtag
H) Advanced R Resources

Some of the most advanced R resources for fully understanding the internals and nuances of the R Programming Language are the following:

This page is: Copyright © 2025 MariaDB. All rights reserved.

"odbc" R Package:

For bug report to the RODBC package maintainer, use the following R statement: bug.report(package = "RODBC")

  • A vignette on how to use the RODBC package can be found here:

  • Information on new R packages is regularly published in the following websites:

  • install.packages("odbc")
    install.packages("RMariaDB")
    library(RMariaDB)
    
    con <- dbConnect(
      drv = RMariaDB::MariaDB(), 
      username = NULL,
      password = NULL, 
      host = NULL, 
      port = 3306
    )
    install.packages("h2o")
    connection_url <- "jdbc:mariadb://172.16.2.178:3306/ingestSQL?&useSSL=false"
    username <- "root"
    password <- "abc123"
    
    # Whole Table:
    table <- "citibike20k"
    my_citibike_data <- h2o.import_sql_table(connection_url, table, username, password)
    
    # SELECT Query:
    select_query <-  "SELECT  bikeid  FROM citibike20k"
    my_citibike_data <- h2o.import_sql_select(connection_url, select_query, username, password)
    java -cp <path_to_h2o_jar>:<path_to_jdbc_driver_jar> water.H2OApp
    CRANarrow-up-right
    RStudioarrow-up-right
    RCodearrow-up-right
    CRAN odbcarrow-up-right
    DB RStudio Driversarrow-up-right
    MariaDB ODBC Connectorarrow-up-right
    MySQL ODBC Connectorarrow-up-right
    CRAN RODBCarrow-up-right
    RStudio odbcarrow-up-right
    R Cookbook Second Edition (O’Reilly Media; Paul Teetor; James (JD) Long)arrow-up-right
    R Graphics Cookbook Second Edition (O’Reilly Media; Winston Chang)arrow-up-right
    R for Data Science (O’Reilly Media; Garrett Grolemund, Hadley Wickham)arrow-up-right
    Advanced R Second Edition (CRC R Series; Hadley Wickham)arrow-up-right
    Practical Statistics for Data Scientists (O’Reilly Media; Peter Bruce, Andrew Bruce)arrow-up-right
    RStudio Cheatsheets: Webpagearrow-up-right
    R Reference Card v2arrow-up-right
    RSeek: For searching any R related information (Based on Google).arrow-up-right
    RPackages: Search and stats for CRAN packages.arrow-up-right
    h2oarrow-up-right
    H2O.ai: Webpagearrow-up-right
    H2O.ai Algorithms: Cheatsheetarrow-up-right
    h2o R Package Functions: Cheatsheetarrow-up-right
    Practical Machine Learning with H2O (O'Reilly Media; Darren Cook)arrow-up-right
    R package kerasarrow-up-right
    Python's 'Keras'arrow-up-right
    R interface to Keras: Webpagearrow-up-right
    Deep Learning With R (François Chollet with J. J. Allaire, Manning)arrow-up-right
    Keras Rstudio Cheatsheetarrow-up-right
    Introduction to Machine Learning with R (O'Reilly; Scott Burger)arrow-up-right
    Text Mining With R: A Tidy Approach (O’Reilly Media; Julia Silge and David Robinson): Book Online Versionarrow-up-right
    Shinyarrow-up-right
    Shiny Written Tutorialsarrow-up-right
    Shiny R Package Cheatsheetarrow-up-right
    RInno: CRAN Webpage (Windows)arrow-up-right
    ShinyProxy: Webpagearrow-up-right
    Shiny Server (Open Source Edition): Webpagearrow-up-right
    R Markdown: The Definitive Guide (Book).arrow-up-right
    R Markdown Cheatsheet.arrow-up-right
    Chapman & Hall/CRC The R Series: Subject-specific Booksarrow-up-right
    DB RStudio odbc Usagearrow-up-right
    RODBC CRAN Vignettearrow-up-right
    Mastering Spark with R (O'Reilly; Javier Luraschi, Kevin Kuo, Edgar Ruiz)arrow-up-right
    R Packages (Hadley Wickham; O’Reilly)arrow-up-right
    R-bloggersarrow-up-right
    Towards Data Sciencearrow-up-right
    MRAN: Package Spotlightarrow-up-right
    Machine Learning with R and H2O (Mark Landry): Booklet Online Versionarrow-up-right
    Deep Learning with H2O: Vignettearrow-up-right

    About MariaDB ColumnStore

    MariaDB ColumnStore is a columnar storage engine that utilizes a massively parallel distributed data architecture. It's a columnar storage system built by porting InfiniDB 4.6.7 to MariaDB and released under the GPL license.

    is available as a storage engine for MariaDB Server. Before then, it is available as a separate download.

    circle-info

    Release notes and other documentation for ColumnStore is also available in the Enterprise docs section of the MariaDB website. For example:

    It is designed for big data scaling to process petabytes of data, linear scalability, and exceptional performance with real-time response to analytical queries. It leverages the I/O benefits of columnar storage, compression, just-in-time projection, and horizontal and vertical partitioning to deliver tremendous performance when analyzing large data sets.

    Links:

    • .

    • A Google Group exists for MariaDB ColumnStore that can be used to discuss ideas and issues and communicate with the community: Send email to mariadb-columnstore@googlegroups.com or use the

    • Bugs can be reported in MariaDB Jira: (see ). Please file bugs under the MCOL project and include the output from the if possible.

    MariaDB ColumnStore is released under the GPL license.

    This page is: Copyright © 2025 MariaDB. All rights reserved.

    Deployment Instructions
    MariaDB Columnstore Blogsarrow-up-right
    forum interfacearrow-up-right
    jira.mariadb.orgarrow-up-right
    support utility

    Query Accelerator

    circle-exclamation

    MariaDB Query Accelerator is an Alpha release. Do not use it in production environments. Query Accelerator works only in ColumnStore 25.10.0 and with MariaDB Enterprise Server 11.8.3+.

    hashtag
    What is Query Accelerator

    Query Accelerator allows MariaDB to use ColumnStore to execute queries that are otherwise executed by InnoDB. Under the hood, Columnstore:

    • receives a query;

    • searches for applicable Engine Independent statistics for InnoDB table index column;

    • applies RBO rule to transform its InnoDB tables into a number of UNION queries over non-overlapping ranges of a suitable InnoDB table index;

    hashtag
    Queries Benefitting From Query Accelerator

    Query Accelerator improves the performance of queries that use aggregation functions, for instance SUM, AVG, MIN, MAX, and GROUP BY, where the performance overhead of pulling the data out of InnoDB can be overcome by the performance optimization of running in the ColumnStore engine.

    This avoids the bottleneck/pipeline of having to move data out of InnoDB and into ColumnStore. Query Accelerator strives to parallelize data out of InnoDB, by utilizing table statistics to optimize multiple threads to data ranges on disk. If the InnoDB table in question uses an index, Query Accelerator is able to get the data much faster.

    Example of a query benefitting from Query Accelerator (assuming column_a is indexed):

    The effectiveness of Query Accelerator can vary depending on the type of queries you run and the specific characteristics of your database schema. Certain types of queries or configurations may not benefit from Query Accelerator, or could potentially experience decreased performance. It's essential to understand when Query Accelerator is most advantageous and when traditional InnoDB operations might be more efficient. Consider the following points to optimize query performance with Query Accelerator:

    • Make sure your query uses tables that are indexed, and the index key has the first integer column.

    • Also, run ANALYZE TABLE before running Query Accelerator.

    hashtag
    Queries not to run in Query Accelerator

    hashtag
    Performance Issues

    Performance issues occur for queries like this:

    InnoDB handles such comparison much better than ColumnStore in general, and in Query Accelerator, that would be even worse.

    • Generally, if your query takes longer than a minute in InnoDB, try Query Accelerator.

    hashtag
    Queries not Working in Query Accelerator

    Query Accelerator has the same limitations as ColumnStore in general, in that it has a limited set of and it can handle. Therefore, be aware of

    • syntax or functions that Columnstore does not support;

    • data types ColumnStore does not support.

    hashtag
    Enabling Query Accelerator

    1

    Edit the MariaDB configuration file (my.cnf or my.ini)

    Locate (or create) the mariadb section, and add a line enabling Query Accelerator, like this:

    circle-exclamation

    The loose- prefix is required for ColumnStore system variables in the configuration file. Without it, MariaDB Server will fail to start if the ColumnStore plugin is not installed or has been removed.

    hashtag
    Enabling Processing for InnoDB Tables

    There must be engine-independent statistics for an InnoDB table index column so that it can be used for Query Accelerator.

    hashtag
    Control Client Session Variables and Parameters

    • columnstore_unstable_optimizer enables unstable optimizer that is required for Query Accelerator RBO rule.

    • columnstore_select_handler enables/disables ColumnStore processing for InnoDB tables.

    circle-exclamation

    Watch out for max_connections. If you set columnstore_query_accel_parallel_factor to a high value, you may need to increase max_connections to avoid connection pool exhaustion.

    circle-info

    enable_queryacc() sets columnstore_query_accel_parallel_factor to 5 by default. To use a different value, set it manually after calling enable_queryacc().

    hashtag
    Verifying That Query Accelerator is Being Used

    There are two ways to verify Query Accelerator is being used:

    1. Use select mcs_get_plan('rules') to get a list of the rules that were applied to the query.

    2. Look for patterns like derived table - $added_sub_#db_name_#table_name_X in the optimized plan using select mcs_get_plan('optimized').

    hashtag
    Query Accelerator Quick Start

    This example shows a SUM(x) GROUP BY y query which runs ~2.6s in InnoDB with indexes, and 3x faster via ColumnStore query acceleration ( ~0.7s ), provided there's enough CPU and a high enough parallel_factor.

    1

    In mariadb (MariaDB command-line client), run these statements:

    2

    Turn on Query Accelerator - On CLI:

    3

    In mariadb (MariaDB command-line client), run these statements:

    4

    hashtag
    Quick Verifications

    1

    Tail the ColumnStore log debug.log, and confirm parallel access to InnoDB:

    Increase or decrease parallelism with columnstore_ces_optimization_parallel_factor. Keep in mind you need enough max_connections in MariaDB server:

    2

    Check the execution plan via EXPLAIN FORMAT=JSON. It should say Pushed select

    This page is: Copyright © 2025 MariaDB. All rights reserved.

    retrieves the data in parallel from MariaDB, and runs it using Columnstore runtime.

    Restart MariaDB Server for the change to take effect.
    2

    Enable Query Accelerator in a client session

    Use the routines in the queryacc schema to enable and disable Query Accelerator. These are automatically created during ColumnStore installation.

    -- Enable Query Accelerator and save previous settings
    SET @old_settings = queryacc.enable_queryacc();
    
    -- Run your queries
    SELECT c_zip, SUM(c_payment_cnt) FROM test.customer_indexed GROUP BY c_zip;
    
    -- Disable and restore previous settings
    CALL queryacc.disable_queryacc(@old_settings);

    To run a single query with Query Accelerator without manually managing enable/disable:

    CALL queryacc.with_queryacc('SELECT c_zip, SUM(c_payment_cnt) FROM test.customer_indexed GROUP BY c_zip');
    circle-exclamation

    Do not leave Query Accelerator enabled for an entire session. It changes optimizer_switch settings that effectively disable the MariaDB Optimizer for queries that ColumnStore cannot execute. Always call disable_queryacc() after your queries, or use with_queryacc() which handles this automatically.

    columnstore_query_accel_parallel_factor controls the number of parallel ranges to be used for Query Accelerator.

    Log out of mariadb (MariaDB command-line client), and log in again.

    5

    In mariadb (MariaDB command-line client), run these statements:

    SET @old_settings = queryacc.enable_queryacc();
    SELECT c_zip, sum(c_payment_cnt)  FROM test.customer_indexed GROUP BY c_zip ORDER BY c_zip ; -- 0.7s
    CALL queryacc.disable_queryacc(@old_settings);
    6

    Turn off Query Accelerator - On CLI:

    sed -i 's/^\s*loose-columnstore_innodb_queries_use_mcs\s*=.*/loose-columnstore_innodb_queries_use_mcs = off/' /etc/my.cnf.d/columnstore.cnf
    systemctl restart mariadb
    :
    3

    Verify that mcs_get_plan shows parallel_ces, and that the detailed ColumnStore execution plan shows derived table:

    SELECT mcs_get_plan('rules');
    +-----------------------+
    | mcs_get_plan('rules') |
    +-----------------------+
    | parallel_ces          |
    +-----------------------+
    
    SELECT mcs_get_plan('optimized');
    +-----------------------+
    | mcs_get_plan('rules') |
    +-----------------------+
    ...
    >>From Tables
      derived table - $added_sub_test_customer_indexed_0
    SELECT column_a, SUM(column_b) FROM innodb_table GROUP BY column_a
     SELECT column_a FROM tbl WHERE column_a = column_b
    [mariadb]
    loose-columnstore_innodb_queries_use_mcs = on
    ANALYZE TABLE table_name PERSISTENT FOR COLUMNS (column_name) indexes();
    CREATE DATABASE IF NOT EXISTS test; USE test;
    CREATE TABLE IF NOT EXISTS test.customer_indexed (  `c_d_id` int(2) NOT NULL, `c_w_id` int(6) NOT NULL, `c_first` varchar(16) , `c_middle` char(2) , `c_last` varchar(16) , `c_street_1` varchar(20) , `c_street_2` varchar(20) , `c_city` varchar(20) , `c_state` char(2) , `c_zip` int(5) , `c_phone` char(16) , `c_since` datetime DEFAULT NULL, `c_credit` char(2) , `c_credit_lim` decimal(12,2) DEFAULT NULL, `c_discount` decimal(4,4) DEFAULT NULL, `c_balance` decimal(12,2) DEFAULT NULL, `c_ytd_payment` decimal(12,2) DEFAULT NULL, `c_payment_cnt` int(8) DEFAULT NULL, `c_delivery_cnt` int(8) DEFAULT NULL, `c_data` varchar(500)) ENGINE=InnoDB DEFAULT CHARSET=latin1;
    INSERT INTO test.customer_indexed  SELECT  ROUND(RAND() * 42000, 0), ROUND(RAND() * 42000, 0), substring(MD5(RAND()*1000000000),1,16), substring(MD5(RAND()),1,2), substring(MD5(RAND()*1000000000),1,16), substring(MD5(RAND()*1000000000),1,20), substring(MD5(RAND()*1000000000),1,20), substring(MD5(RAND()*1000000000),1,20), substring(MD5(RAND()),1,2), ROUND(RAND() * 42000, 0), substring(MD5(RAND()),1,16), CURRENT_TIMESTAMP - INTERVAL FLOOR(RAND() * 365 * 24 * 60 *60) SECOND, substring(MD5(RAND()),1,2), ROUND(RAND() * 9999999999, 2), ROUND(RAND() * 0, 4), ROUND(RAND() * 9999999999, 2), ROUND(RAND() * 9999999999, 2), ROUND(RAND() * 42000, 0), ROUND(RAND() * 42000, 0), substring(MD5(RAND()*1000000000),1,500) FROM seq_1_to_8000000; -- 3.5 min
    ALTER TABLE test.customer_indexed ADD INDEX idx_fast (`c_zip`, `c_payment_cnt`); -- ~1.5 min
    -- baseline
    SELECT c_zip, sum(c_payment_cnt)  FROM test.customer_indexed GROUP BY c_zip ORDER BY c_zip ;  --2.6s
    sed -i 's/^\s*loose-columnstore_innodb_queries_use_mcs\s*=.*/loose-columnstore_innodb_queries_use_mcs = on/' /etc/my.cnf.d/columnstore.cnf
    systemctl restart mariadb
    # In mariadb (MariaDB command-line client)
    USE test;
    ANALYZE table test.customer_indexed PERSISTENT FOR COLUMNS (c_zip,c_payment_cnt) indexes(); --8s
    SELECT table_name, column_name, hist_type FROM mysql.column_stats WHERE table_name="customer_indexed";
    SHOW VARIABLES LIKE "%columnstore_innodb_queries_use_mcs%";
    tail -f /var/log/mariadb/columnstore/debug.log
    SET columnstore_ces_optimization_parallel_factor=100;
    functions
    data types
    EXPLAIN FORMAT=JSON SELECT c_zip, SUM(c_payment_cnt) FROM test.customer_indexed GROUP BY c_zip ORDER BY c_zip ;
    ...
    | {
      "query_block": {
        "select_id": 1,
        "table": {
          "message": "Pushed select"
        }
      }
    } |
    ...
    spinner
    spinner
    spinner
    MariaDB 10.5.4
    ColumnStore 23.10 Release Notes
    ColumnStore 23.02 Release Notes
    ColumnStore 22.08 Release Notes
    ColumnStore 6 Release Notes
    ColumnStore 5.6 Release Notes
    Reporting Bugs