NoSQL Protocol and Caching in MariaDB MaxScale

spacer

The MaxScale NoSQL protocol module allows a MariaDB server (Community and Enterprise) or cluster to be used as the backend of an application using a MongoDB® client library. Internally, all documents are stored in a table containing two columns; an id column for the object id and a doc column for the document itself.

When the MongoDB client application issues MongoDB protocol commands they are transparently converted into the equivalent SQL and executed against the MariaDB backend. The MariaDB responses are then in turn converted into the format expected by the MongoDB client library and application.

For instance, if the following command

db.runCommand({create: "cars"});

is issued in the MongoDB shell, it will be converted into the following SQL:

CREATE TABLE `test`.`cars` (
  id VARCHAR(35) AS (JSON_COMPACT(JSON_EXTRACT(doc, "$._id"))) UNIQUE KEY,
  doc JSON,
  CONSTRAINT id_not_null
    CHECK(id IS NOT NULL)
)

That is, a table with two columns; a doc column that will contain JSON and an id column whose content is extracted from the JSON document. The fact that the id column is defined as UNIQUE KEY ensures, together with the constraint, that every document must have a unique id.

Similarly, the following command

db.runCommand({find: "cars"});

will be converted into the following SQL:

SELECT doc FROM `test`.`cars`

Watch this video to see it in action:

Converting a MongoDB command into SQL involves a number of steps. The commands sent by the client arrive as MongoDB protocol packets and the content is BSON. The BSON must be interpreted and converted into the equivalent SQL, which then must be placed in a MariaDB protocol packet so that it can be transferred through the rest of MaxScale and further to the backend. Incidentally, this means that the NoSQL protocol module can be used together with every other router and filter of MaxScale. The server returns the results as a MariaDB result set, which then must be converted to BSON that the client will understand.

The interpretation of the BSON sent by the client and the conversion of MariaDB server results into BSON involves additional processing. With caching, the conversion cost can be reduced.

 

Caching

The Cache filter has been a part of MaxScale since 2.1. The first version only evicted data from the cache based upon specified time-to-live, but from version 2.5 onwards the cache has supported table-level invalidation as well. That is, when an updating statement affects a table, then all cached entries related to that table are evicted from the cache.

Since everything behind the NoSQL protocol module sees regular MariaDB protocol packets, the cache can be used with it as well. Even if the result would be found in the cache, the BSON conversion in both directions still must be made, which decreases the benefit of the cache.

In MaxScale 23.08, the cache filter was integrated into the NoSQL protocol module, so that entries can be looked up from the cache using a key generated from the BSON received from the client and where the cache entries are BSON that directly can be sent to the client.

Some processing of the BSON is needed, because when creating the key, the session id of the client must be excluded, otherwise, the caching would be session-specific. A found response must also be processed before it is sent to the client, as the response-to field in the response must match the request-id field of the request. That change will then require the crc of the packet to be recalculated.

Normally, the cache filter resides in the request/response processing chain where it either is able to directly generate the response or expects to eventually receive a response from its downstream component, whenever it has forwarded a request to it.

So, to be able to use exactly the same cache filter component, that arrangement is maintained also when the cache filter is used internally in the NoSQL component. When caching is enabled, the procedure for handling a MongoDB command is simplified as follows:

  1. A key is generated from the BSON and a cache lookup is made.
  2. If found, the response is directly sent to the client.
  3. Otherwise, the command is converted to a MariaDB request containing SQL, which is passed to the cache filter.
  4. The cache filter makes a cache lookup using a key created from the SQL.
  5. If not found, the request is sent forward and eventually reaches a server that will generate the response. On its way back, when the response reaches the cache filter, it is cached using the key created in step 4.
  6. When the response reaches the NoSQL protocol module, either directly if found in the cache or as the result of a roundtrip to the backend, the NoSQL protocol module will cache the response, converted into BSON, using the key generated in step 1.

The outcome is that the result may initially be cached twice; once as a MariaDB result set using a key generated from the SQL and once as a BSON response using a key generated from the BSON command.

The double caching does not matter much since, as the cache uses the Least Recently Used strategy (LRU), the SQL/Result Set entries will not be used and thus be evicted among the first ones, if the converted BSON response can be cached. If the converted BSON response cannot be cached, we will still benefit from the MariaDB result set being in the cache, as it removes the need for a server roundtrip.

 

Benchmark

During benchmarking, all software was running on a Lenovo ThinkPad P53s with 40GB of RAM with frequency boost off. MaxScale was configured to use one routing thread and a single MariaDB 10.5 server was used as the backend. For the comparison, MongoDB 4.4.6 was used.

The client program, written using version 3.6.10 of the MongoDB Node.js driver, communicated with MongoDB and MaxScale over a TCP socket, while MaxScale communicated with the MariaDB server over a Unix domain socket.

The data used when benchmarking consisted of some ten thousand documents of the following kind:

    {
      "Year": 2020,
      "Make": "Audi",
      "Model": "Q3",
      "Category": "SUV",
      "createdAt": "2020-01-27T20:44:17.665Z",
      "updatedAt": "2020-01-27T20:44:17.665Z",
      "_id": 1
    }

Before each test

  • the collection was deleted,
  • a new collection was created and the car data was loaded into the collection, and
  • the test query was executed once to warm things up.

The reported result is then the average of 100 executions of the query.

In the results, the values in the product column have the following meaning:

ProductMeaning
MongoDBMongoDB was used as the backend.
MariaDB NoSQL, no caching.MaxScale NoSQL protocol module without caching of any kind.
MariaDB NoSQL, filter caching.MaxScale NoSQL protocol module with a regular cache filter in the routing chain; SQL/Result-Sets will be cached.
MariaDB NoSQL, internal caching.MaxScale NoSQL protocol module with internal caching enabled; SQL/Result-Sets and BSON-requests/BSON-responses will be cached.

 

Find First 101 Documents.

Query

{
    find: "cars"
}

This query will return the first 101 (the default amount unless a specific amount is specified) cars from the collection and a cursor using which additional cars could be fetched. In the test, the cursor was simply discarded.

ProductMilliseconds
MongoDB1.75
MariaDB NoSQL, no caching.13.25
MariaDB NoSQL, filter caching.3.12
MariaDB NoSQL, internal caching.3.03

 

The almost same result of filter and internal caching is explained by the fact that currently the BSON response is cached only if the entire result can be returned in the first response to the client. Here it cannot, since there are more cars than 101 and consequently the internal caching will behave the same way the filter caching does.

 

Find All in One Go

Query

{
    find: "cars",
    batchSize: 10000
}

This query will return at most 10000 cars (i.e. all, since the number of cars was slightly less than that) from the collection.

ProductMilliseconds
MongoDB70.12
MariaDB NoSQL, no caching.129.98
MariaDB NoSQL, filter caching.119.59
MariaDB NoSQL, internal caching.62.17

 

In this case, since all documents can be returned, the BSON response will be cached; hence internal caching is significantly faster than filter caching. The small difference between no caching and filter caching suggests that, in this case, most time is spent transforming the result set into the equivalent BSON, something that will be done just once with internal caching.

 

Find Some Cars

Query

{
    find: "cars",
    filter : { "Make": "Toyota" },
    batchSize: 10000
}

This query will return all cars where the value of Make is Toyota, which in practice means 579 documents.

ProductMilliseconds
MongoDB11.47
MongoDB (indexed)5.18
MariaDB NoSQL, no caching.7.61
MariaDB NoSQL, filter caching.7.22
MariaDB NoSQL, internal caching.4.12

 

In this benchmark, there were two cases for MongoDB. In the first case, there was no index. In the second case, there was an index on the Make field used in the query. The index was created as follows

db.runCommand({createIndexes: "cars", 
              indexes: [ { key: { Make: 1}, name: "Make" }]});

The most surprising result here is that unless an index is used, even the default non-cached MaxScale NoSQL protocol alternative is faster than non-indexed MongoDB and fairly close to the indexed one. Currently, the NoSQL protocol does not support indexes, except for the default  created on _id, but indexes on expressions are on the roadmap of the MariaDB server, and once they are available, it will be straightforward to take them into use.

 

Find One by Query

Query

{
    find: "cars",
    filter: { "_id": { "$eq": 4711 }}
}

The query will return the document whose _id is 4711. In the NoSQL case, when the query is expressed like this, the fact that the id is available in a separate indexed column will not be utilized. MongoDB will use its implicit index on _id.

ProductMilliseconds
MongoDB0.62
MariaDB NoSQL, no caching.0.63
MariaDB NoSQL, filter caching.0.44
MariaDB NoSQL, internal caching.0.40

 

The performance of MongoDB and the non-caching alternative of NoSQL is almost the same, while both caching alternatives are faster.

 

Find One by Id

Query

{
find: "cars",
filter: { "_id": 4711 }
}

The query will return the document whose _id is 4711. When the filter is expressed like this, the NoSQL implementation will generate SQL that refers to the indexed id column.

ProductMilliseconds
MongoDB0.55
MariaDB NoSQL, no caching.0.56
MariaDB NoSQL, filter caching.0.39
MariaDB NoSQL, internal caching.0.35

 

Again, the performance of MongoDB and the non-caching alternative of the MaxScale NoSQL listener is almost the same, while both caching alternatives are faster.

 

Configuration

The MaxScale configuration that was used can be found here and the program used for making the measurements can be found here.

 

Conclusion

You get what you measure, so care should be taken before drawing too far-reaching conclusions from the results. The amount of data was quite small, so all data is likely to have resided in the InnoDB buffer pool of the MariaDB server, although presumably there is something similar in MongoDB as well. All programs ran on the same computer, so there was no networking overhead, but in a typical setup MaxScale and the MariaDB server might reside on different hosts.

However, with this setup, this data, and these use cases, the MaxScale NoSQL listener performs well and with internal caching enabled, it performs even better. Caching improves performance only if the same find command ends up being executed multiple times, without updates in between, a scenario that is found frequently in software applications.

 

Getting Started

In this related article, Alejandro Duarte uses MaxScale, MariaDB, mariadb-shell, mongosh and Docker to demonstrate bootstrapping a development environment and provides some example queries. Source is available on Github for you to use in learning / non-production scenarios.

 

Getting Help

MariaDB is here to help if you need additional assistance, see our MariaDB Migration Service, or try community resources like the MariaDB Slack Community, DBA Stack Exchange, and the Community Knowledge Base.

 

Learn More

Go to mariadb.com/downloads for MariaDB Community Server downloads, along with connectors and everything else you need to get going with MariaDB Server! Customers also have access to MariaDB Enterprise Server which includes the advanced database proxy, MariaDB MaxScale, for production deployments.