Mixing SQL and NoSQL with MariaDB and MongoDB
Let’s say you have an application developed in Node.js (or any other platform). This application connects to a MongoDB database (NoSQL) to store ratings on books (number of stars given and a comment). Let’s also say that you have another application developed in Java (or Python, C#, TypeScript… anything). This application connects to a MariaDB database (SQL, relational) to manage a catalog of books (title, year of publishing, number of pages).
You are asked to create a report that shows the title and rating information for each book. Notice that the MongoDB database doesn’t contain the title of the books and the relational database doesn’t contain the ratings. We need to mix data created by a NoSQL application with data created by a SQL application.
A common approach to this is to query both databases independently (using different data sources) and process the data to match by, for example, ISBN (the id of a book) and put the combined information in a new object. This needs to be done in a programming language like Java, TypeScript, C#, Python, or any other imperative programming language that is able to connect to both databases.
A polyglot application
This approach works. However, joining data is a job for a database. They are built for this kind of data operations. Also, with this approach, the SQL application is no longer a SQL-only application, it becomes a database polyglot and this increases complexity making it harder to maintain.
With a database proxy like MaxScale, you can join this data at the database level using the best language for data—SQL. Your SQL application doesn’t need to become a polyglot.
Although this requires an additional element in the infrastructure, you also gain all the functionality that a database proxy has to offer. Things such as automatic failover, transparent data masking, topology isolation, caches, security filters, and more.
MaxScale is a powerful intelligent database proxy that understands both SQL and NoSQL. It also understands Kafka (for CDC or data ingestion) but that’s a topic for another occasion. In short, with MaxScale you can connect your NoSQL application to a fully ACID-compliant relational database and store the data right there next to tables that other SQL applications use.
MaxScale allows a SQL application to consume NoSQL data
Let’s try this last approach in a quick and easy-to-follow experiment with MaxScale. You will need the following installed on your computer:
Setting up the MariaDB database
Using a plain text editor create a new file and save it with the name docker-compose.yml. The file should contain the following:
version: "3.9" services: mariadb: image: alejandrodu/mariadb environment: - MARIADB_CREATE_DATABASE=demo - MARIADB_CREATE_USER=user:Password123! - MARIADB_CREATE_MAXSCALE_USER=maxscale_user:MaxScalePassword123! maxscale: image: alejandrodu/mariadb-maxscale command: --admin_host 0.0.0.0 --admin_secure_gui false ports: - "3306:4000" - "27017:27017" - "8989:8989" environment: - MAXSCALE_USER=maxscale_user:MaxScalePassword123! - MARIADB_HOST_1=mariadb 3306 - MAXSCALE_CREATE_NOSQL_LISTENER=user:Password123!
This is a Docker Compose file. It describes a set of services to be created by Docker. We are creating 2 services (or containers)—a MariaDB database server and a MaxScale database proxy. They will be running locally on your machine, but in production environments, it’s common to deploy them in separate physical machines. Keep in mind that these Docker images are not suitable for production! They are intended to be suitable for quick demos and tests. You can find the source code for these images on GitHub. For the official Docker images from MariaDB, head to the MariaDB page on Docker Hub.
The previous Docker Compose file configures a MariaDB database server with a database (or schema, they are synonyms in MariaDB) called demo
. It also creates a user named user
with the password Password123!
. This user has suitable privileges on the demo
database. There’s an additional user with name maxscale_user
and password MaxScalePassword123!
. This is the user that the MaxScale database proxy will use to connect to the MariaDB database.
The Docker Compose file also configures the database proxy by disabling HTTPS (don’t do this in production!), exposing a set of ports (more on this in a moment), and configuring the database user and location of the MariaDB database proxy (normally an IP address, but here we can use the name of the container previously defined in the Docker file). The last line creates a NoSQL listener that we’ll use to connect as a MongoDB client on the default port (27017).
To start the services (containers) using the command line, move to the directory in which you saved the Docker Compose file and run the following:
docker compose up -d
After downloading all the software and starting the containers, you’ll have a MariaDB database and MaxScale proxy both preconfigured for this experiment.
Creating a SQL table in MariaDB
Let’s connect to the relational database. In the command line, execute the following:
mariadb-shell --dsn mariadb://user:'Password123!'@127.0.0.1
Check that you can see the demo
database:
show databases;
Switch to the demo database:
use demo;
Connecting to a database with MariaDB Shell
Create the books
table:
CREATE TABLE books( isbn VARCHAR(20) PRIMARY KEY, title VARCHAR(256), year INT );
Insert some data. I’m going to use the cliché of inserting my own books:
INSERT INTO books(title, isbn, year) VALUES ("Vaadin 7 UI Design By Example", "978-1-78216-226-1", 2013), ("Data-Centric Applications with Vaadin 8", "978-1-78328-884-7", 2018), ("Practical Vaadin", "978-1-4842-7178-0", 2021);
Check that the books are stored in the database by running:
SELECT * FROM books;
Inserting data with MariaDB Shell
Creating a JSON collection in MariaDB
We haven’t installed MongoDB, yet we can use a MongoDB client to connect to create collections and documents as if we were using MongoDB, except that the data is stored in a powerful fully ACID-compliant and scalable relational database. Let’s try that out!
In the command line, use the MongoDB shell tool to connect to the MongoDB… wait… it’s actually the MariaDB database! Simply run the following:
mongosh
By default, this tool tries to connect to a MongoDB server (which again, happens to be MariaDB this time) running on your local machine (127.0.0.1) using the default port (20017). If everything goes well, you should be able to see the demo
database listed when you run the following command:
show databases
Switch to the demo
database:
use demo
Connecting to MariaDB using Mongo Shell
We are connected to a relational database from a non-relational client! Let’s create the ratings
collection and insert some data into it:
db.ratings.insertMany([ { "isbn": "978-1-78216-226-1", "stars": 5, "comment": "A good resource for beginners who want to learn Vaadin" }, { "isbn": "978-1-78328-884-7", "stars": 4, "comment": "Explains Vaadin in the context of other Java technologies" }, { "isbn": "978-1-4842-7178-0", "stars": 5, "comment": "The best resource to learn web development with Java and Vaadin" } ])
Check that the ratings are stored in the database:
db.ratings.find()
Querying a MariaDB database using Mongo Shell
Using JSON functions in MariaDB
At this point, we have a single database that, from the outside, looks like a NoSQL (MongoDB) database and a relational (MariaDB) database. We are able to connect to the same database and write and read data from MongoDB clients and SQL clients. All the data is stored in MariaDB, so we can use SQL to join data from MongoDB clients or applications with data from MariaDB clients or applications. Let’s explore how MaxScale is using MariaDB to store MongoDB data (collections and documents).
Connect to the database using a SQL client like mariadb-shell
, and show the tables in the demo schema:
show tables in demo;
You should see both the books
and ratings
tables listed. ratings
was created as a MongoDB collection. MaxScale translated the commands sent from the MongoDB client and created a table to store the data in a table. Let’s see the structure of this table:
describe demo.ratings;
A NoSQL collection stored as a MariaDB relational table
The ratings
table contains two columns:
id
: the object iddoc
: the document in JSON format
If we inspect the contents of the table, we’ll see that all the data about ratings is stored in the doc
column in JSON format:
SELECT doc FROM demo.ratings \G
NoSQL documents stored in a MariaDB database
Let’s get back to our original goal—show the book titles with their rating information. The following is not the case, but let’s suppose for a moment that the ratings
table is a regular table with columns stars
and comment
. Joining this table with the books
table would be easy and our job would be done:
/* this doesn’t work */ SELECT b.title, r.stars, r.comment FROM ratings r JOIN books b USING(isbn)
We need to convert the doc
column of the ratings
table to a relational expression that can be used as a new table in the query. Something like this:
/* this still doesn’t work */ SELECT b.title, r.stars, r.comment FROM ratings rt JOIN ...something to convert rt.doc to a table... AS r JOIN books b USING(isbn)
That something is the JSON_TABLE function. MariaDB includes a comprehensive set of JSON functions for manipulating JSON strings. We’ll use the JSON_TABLE
function to convert the doc
column into a relational form that we can use to perform SQL joins. The general syntax of the JSON_TABLE
function is as follows:
JSON_TABLE(json_document, context_path COLUMNS ( column_definition_1, column_definition_2, ... ) ) [AS] the_new_relational_table
Where:
json_document
: a string or expression that returns the JSON documents to be usedcontext_path
: a JSON Path expression that defines the nodes to be used as the source of the rows
And the column definitions have the following syntax:
new_column_name sql_type PATH path_in_the_json_doc [on_empty] [on_error]
Combining this knowledge, our SQL query would look like the following:
SELECT b.title, r.stars, r.comment FROM ratings rt JOIN JSON_TABLE(rt.doc, '$' COLUMNS( isbn VARCHAR(20) PATH '$.isbn', stars INT PATH '$.stars', comment TEXT PATH '$.comment' ) ) AS r JOIN books b USING(isbn);
Joining NoSQL and SQL data in a single SQL query
We could have used the ISBN value as the MongoDB ObjectID and consequently as the id
column in the ratings
table, but I’ll leave that to you as an exercise (hint: use _id
instead of isbn
when inserting data using the MongoDB client or app).
A word on scalability
There’s a misconception that relational databases don’t scale horizontally (adding more nodes) while NoSQL databases do. But relational databases scale without sacrificing ACID properties. MariaDB has multiple storage engines tailored to different workloads. For example, you can scale a MariaDB database by implementing data sharding with the help of Spider. You can also use a variety of storage engines to handle different workloads in a per-table basis. Cross-engine joins are possible in a single SQL query.
Combining multiple storage engines in a single logical MariaDB database
Conclusion
Our job here is done! Now your systems can have an ACID-compliant scalable 360-degree view of your data independent of whether it was created by SQL or NoSQL applications. There’s less need to migrate your apps from NoSQL to SQL or to make SQL apps database polyglots. If you want to learn more about other features in MaxScale, watch this video or visit the docs.