Sponsorships are crucial for ongoing and future development of the MariaDB project! There are a number of easy ways for you to help the project:
Contribute with developer time. If your organization has talented developers familiar with the MariaDB or MySQL codebase, they can become part of the MariaDB team and contribute to the development of the MariaDB project.
Hire a developer that you dedicate to work on the MariaDB project.
A pure donation, with no strings attached.
This page is licensed: CC BY-SA / Gnu FDL
MariaDB has participated in Google Summer of Code since 2013. This section contains pages providing information for each year.
We believe we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently , , , ) and on , which allows you to scale your reads & writes. Lately, we also have , which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.
MariaDB applied to participate in the first .
Contributing Code
For contributors interested in MariaDB development, explore open projects via JIRA and check for beginner-friendly tasks. Engage with the community on the maria-developers mailing list, Slack, Zulip, or IRC channel for guidance.
Contributing to the MariaDB Project
The success of MariaDB relies heavily on community involvement. You can contribute in various ways, even if you are not a developer:
Bug Reporting: Create an account to report bugs.
Knowledge Sharing: Answer questions in the documentation or on IRC and Zulip.
Testing: Upload bug tests to the FTP server for private testing.
Documentation: Write or translate articles and documentation for MariaDB.
Advocacy: Promote MariaDB and participate in open source events.
Support: Donate time or resources, or engage your company in supporting MariaDB.
Explore more ways to get involved on the .
The success of MariaDB depends on the participation of the community; it would not be as good as it is today without the contributions of the entire MariaDB community.
There are several ways to contribute besides , and not all of them require strong C/C++ skills. Areas for non-developers include:
To report a bug, you'll need to sign up for an account by clicking on the Create an account link below the login fields.
Helping other people by answering problems or even fixing their bugs on , in the #maria channel on , or on MariaDB's Zulip instance at .
A great way to get started in MariaDB is to participate in e-mail discussions via our mailing lists (whichever list best matches your interests):
Sensitive security issues can be sent directly to the persons responsible for MariaDB security: security [AT] mariadb (dot) org.
You can find additional email addresses, email archives and ways to connect with MariaDB people .
All MariaDB contributors are expected to follow the .
See also the pages for new developers on the MariaDB Foundation website:
This page is licensed: CC BY-SA / Gnu FDL
MariaDB provides a secure FTP, SFTP and WebDAV server where you can upload files to be used by MariaDB developers, for example table structures and data for bug reports.
The folder tree consists of:
The public folder for files that the MariaDB developers want to give the public access to (patches, samples etc).
The private folder for uploads. Files uploaded there can only be accessed by MariaDB developers. You will not be able to see your upload and this folder does not allow downloads. This is done to protect any sensitive information which may be in test results, mysqld & core files. Upload those into this folder.
The secret folder is for private downloads. Files in this folder are not visible so you will need the complete filename to successfully download a file from this folder.
To share files with MariaDB developers, upload them into the private directory with either:
SFTP client (scp), enter 'anonymous' as the password:
You can ignore the 'fsetstat: Permission denied' error.
WebDAV client (curl):
FTP client (lftp); enter 'anonymous' as the password:
You can ignore the 'network error'.
Note for MariaDB developers: if you do not already have access to the SFTP service, please request it at ftp@mariadb.org (provide your public SSH key and username). You will then be able to access the service with:
or with HTTPS at .
For contributors interested in MariaDB development, explore open projects via and check for . Engage with the community on the mailing list, , , or channel for guidance.
General information about contributing to MariaDB (for developers and non-developers) can be found on the page.
There are many open development projects for MariaDB which you can contribute to (in addition to any ideas you may have yourself).
We participated in Google Summer of Code 2013. MariaDB and the MariaDB Foundation believe we are making a better database that remains a drop-in replacement for MySQL. We also work on making LGPL connectors (currently in C, Java, C++ in development) and we also work on MariaDB Galera Cluster, which allows you to scale your reads & writes.
Please join us at irc.freenode.net at #maria to mingle with the community. Or subscribe to . Or both.
Please keep in mind that in April we travel a lot (conferences, busy time), so if you have a question and nobody on IRC answers — do not feel disappointed, ask in an email to maria-developers@lists.launchpad.net.
Testing and Benchmarking
Bug tests can be uploaded to the 'private' directory of our FTP server.
Creating documentation for MariaDB.
Advocating MariaDB in your area.
Participate in open source events and talk about MariaDB.
Donate time or money to the MariaDB project.
Ask your company to sponsor a feature.
We are using JIRA to manage the MariaDB project. Go to jira.mariadb.org and click on "Projects" to get to the MariaDB project. Browse around the unresolved and unassigned issues to see if there is something that interests you. Some issues have sponsors and you can be paid for doing them!
A list of beginner friendly tasks is also available.
Check the development plans for the next MariaDB version.
Join maria-developers and ask for suggestions of tasks you could do. Please include in the email your programming experience, your knowledge of the MariaDB source, and how much you know about using MySQL/MariaDB, so that we know which tasks to suggest to you.
If this is your first project, check out the page. It lists projects that will make a good start.
Join MariaDB's Zulip instance at and ask for suggestions.
Join #maria on and ask for suggestions.
If you have your own ideas, please submit them to JIRA so other MariaDB developers can comment on them and suggest how to implement them. You can of course also use the maria-developers list for this.
This section is mainly directed to developers with commit rights to the MariaDB git repository. However, we hope it’s also useful for anyone wanting to contribute code to MariaDB to know what a reviewer will expect from them.
This is not about coding style or whether one should prefer C over C++; that is a separate topic that should be covered sooner or later.
When coding, try to create code that 'never has to be changed again'. Try to make the code as performant as possible. In general it is acceptable to spend 50% more time to make the code 15% faster than what you originally intended. Take that into account when you plan your time estimates! That said, don't try to add classes or functionality that is not yet used.
The code should be easy to read and follow the coding standards of the project. Patches that are smaller and simpler are often better than complex solutions. Don't make the server depend on new external libraries without first checking with Sergei or Monty!
Add code comments for anything that is not obvious. When possible, use assertions within the code to document expectations of arguments etc. In general, if the code requires complex comments, think if there is a better way to structure the logic. Simpler is often better and with fewer bugs.
Jira issue number and summary, e.g.: MDEV-23839 innodb_fast_shutdown=0 hang on change buffer merge
An empty line
A short description of the problem
A description of the solution
Any extra information needed to understand the patch
The commit message should be self-contained; the reviewer should preferably not have to look at the Jira issue at all to understand the commit. This doesn't mean that the commit message should include all the background and the different design options considered, as the Jira issue should.
Names of all reviewers and authors should be clear from the commit message. The preferred format is one line per person:
Reviewed-by: email
Co-authored-by: email
See for details
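Putting the pieces above together, a commit message skeleton would look roughly like this (the placeholder lines and example addresses are illustrative only, using the sample summary from above):

```
MDEV-23839 innodb_fast_shutdown=0 hang on change buffer merge

<short description of the problem>

<description of the solution>

<any extra information needed to understand the patch>

Reviewed-by: reviewer@example.org
Co-authored-by: co-author@example.org
```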
The default is that all code should be reviewed. Only in really extraordinary cases, like a merge (where the original code was already reviewed), can it be self-reviewed, and this should be clear from the commit message. In this case the code should of course be tested extra carefully, both locally and in buildbot, before pushing.
All code should have a test case that shows that the new code works or, in the case of a bug fix, that the problem is fixed! It should fail with an unpatched server and work with the new version. In the extreme case that a test case is practically impossible to write, there needs to be documentation (in the commit message, optionally also in Jira) of how the code was tested.
The test case should have a reference to the Jira issue, if one exists.
Patches related to performance should be tested either by the developer (for simple commits) or by performance testers. The result should be put in Jira with a summary in the commit.
Complex patches should be tested by QA in a bb- branch before pushing. The Jira entry should include information that this has been done and what kind of test has been run.
Example: git push --force origin HEAD:bb-11.8-MDEV-1234
For anything non-trivial, one should run either Valgrind or ASAN/MSAN on the new code. (Buildbot will do this for you if you can't get Valgrind or ASAN to work.) At least the added test case should be run under one of them. If the developer cannot do that for some reason, they should check the buildbot builders that do this and ensure that at least their test case doesn't give any warnings about use of uninitialized memory or other failures.
For complex code the developer should preferably use gcov or some similar tool to ensure that at least all non-error branches are executed. "mtr --gcov" or "dgcov.pl" can help you with this.
All code in MariaDB comes from one of the following sources:
MySQL
Code developed by people employed by the MariaDB Foundation.
Code developed by people employed by MariaDB Corporation.
Code shared with the MariaDB Foundation under the MCA.
Code with a known origin that is under a permissive license (BSD or public domain).
If you want the code to be part of the main MariaDB tree, you also have to give the MariaDB Foundation a shared copyright to your code. This is needed so that the foundation can offer the code to other projects (like MySQL).
You do this by either:
Signing the MariaDB Contributor Agreement (MCA) and then scanning and sending it to the foundation.
Sending an email to maria-developers where you say that your patch and all fixes to it are provided to the MariaDB Foundation under the MCA.
Licensing your code using the BSD license.
We need shared copyright for the following reasons:
to defend the copyright or GPL if someone breaks it (this is the same reason why the Free Software Foundation also requires copyright assignment for its code)
to be able to donate code to MySQL (for example to fix security bugs or new features)
to allow people who have a non-free license to the MySQL code to also use MariaDB (the MCA/BSD allows us to give those companies the rights to all changes between MySQL and MariaDB so they can use MariaDB instead of MySQL)
More information about the MCA can be found on the MCA FAQ page.
Ensure that everything compiles with your new code in a debug server (configured with cmake -DCMAKE_BUILD_TYPE=Debug), including embedded and all plugins that may be affected by your code change.
Run the mysql-test-run (mtr) test suite locally with your debug server.
For anything complex the full test suite should be run.
For something absolutely trivial, at least the main suite must be run.
Always push first to a bb- branch to test the code. When the bb- branch is green in , you can push to the main branch. Check that the Windows builds compile (take extra care here, as this often fails) and that the valgrind and MSAN builds don't show any problems with your new test cases.
You can find your push at a link similar to .
If you have to do a rebase before pushing, you have to start from the beginning again.
When porting code from third parties (such as MySQL), make sure to attribute copyright to the right owner, in the header of each modified file.
For example: Copyright © 2000, 2018, Oracle and/or its affiliates. Copyright © 2009, 2020, MariaDB
The only exception is if the changes are trivial, the rebase was trivial, and the local mysql-test-run worked; then you can push directly to the main branch. Only do this if you are 99% sure there are no issues! Please don't make us regret that we have made this one exception! When we have protected git branches, the above rule will be enforced automatically, as the protection will take care of this.
First create a Jira entry that explains the problems and the different solutions that can be used to solve the problem. If there is a new syntax include examples of queries and results.
After getting an agreement of the to-be-used solution, update the Jira entry with the detailed architecture of the suggested solution.
When the architecture is reviewed, the assigned developer can start coding.
When the code is ready, the Jira entry should be updated with the reviewer.
The reviewer checks the code and either approves it to be pushed or gives comments to the developer that should be fixed. In the latter case the developer updates the code and gives it back to the reviewer. This continues until the code is approved.
If the design changes during the project, the design in Jira needs to be updated.
Ensure that the Jira issue is up to date.
For complex bugs that require redesign, follow the process in "Working on a new project"
For simpler bugs, one can skip the listing of different solutions and architecture. However, one should still document the reason for the bug and how it's fixed or to-be-fixed, in a JIRA comment.
Ensure that the code compiles and all MTR tests pass before asking for a review.
Try to split a bigger project into smaller, self-contained change sets.
Mechanical changes, like renames of classes, variables, functions, etc., are better done in a separate commit.
Remember that the stability and security of any project hangs a lot on the reviewers. If there is something wrong with an accepted patch, it's usually the reviewer who is to be blamed for it, as the reviewer was the one who allowed it to go in!
Ensure that the code is licensed under New BSD or another approved license for MariaDB (basically any open source license not conflicting with GPL) or that the contributor has signed the MCA.
GPL is only allowed for code from MySQL (as MariaDB is already depending on MySQL code).
Ensure that commits are not too large. If the code is very large, give suggestions how to split it into smaller pieces. Merge commits, when rebasing is possible, are not allowed, to keep history linear.
Check that the commit message describes the commit properly. For code that improves performance, ensure that Jira and the commit message contain information about the improvements.
Check that there are no unexplained changes in old tests.
Check the quality of the code (no obvious bugs, right algorithms used).
Check if any code can be simplified or optimized: are existing functions reused, are loops optimal, are mutexes used correctly, etc.
Check that there is an appropriate test case for the code. See ‘testing’ for what is required!
Ensuring the code follows the coding standard for MariaDB. This document should be created shortly, but in the meantime ask an experienced MySQL/MariaDB developer if you are unsure.
Ensuring that the code follows the architecture agreed on in Jira (if it's in Jira).
Code should be easy to understand (good code comments, good function and variable names etc).
Ensure you understand every single line of code that is reviewed. If not, ask the developer to add more comments to get things clear or ask help from another reviewer.
No performance degradations in common cases.
Any code that touches any sensitive area (files, communication, login, encryption or security) needs to have another reviewer that is an expert in this area.
Getting Started For Developers (mariadb.org)
Get the Code, Build It, Test It (mariadb.org)
Writing Good Test Cases for MariaDB Server (mariadb.org)
Submitting a Pull Request (mariadb.org)
(for non-developers)
This page is licensed: CC BY-SA / Gnu FDL
We would like the authentication system to be able to authenticate against an LDAP Directory Server.
See .
Skills: C, working knowledge of LDAP
Mentor: Sergei Golubchik
this project is taken
Kerberos is a security mechanism used in a lot of financial institutions. A MySQL plugin that allows authentication against Kerberos is the goal here.
See .
Skills: C/C++, working knowledge of Kerberos
Mentor: Sergei Golubchik
The Microsoft Windows world is all about Active Directory, and upstream MySQL Enterprise already has this feature (though it's a paid offering). It would be great to have an open source equivalent.
See .
Skills: C/C++, working knowledge of Active Directory/SAMBA, Windows-based development environment
Mentor: Sergei Golubchik, Vladislav Vaintroub
Keystone is the OpenStack Identity Service. The idea would be to ensure that MariaDB can authenticate to Keystone directly.
Skills: Python, C/C++
Mentor: Mark Riddoch
this project is taken
MySQL and MariaDB use an old regex library; it works byte-wise and thus only supports single-byte character sets. It needs to be replaced by a modern, multi-byte character set aware regex library.
Additionally, a much-requested REGEX_REPLACE function should be implemented. (See also mysql-udf-regexp for some UDF code that could be used as a starting point.)
Detailed task description: MDEV-4425
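As a rough sketch of the requested replace functionality (hypothetical usage; the exact function name and semantics are defined in MDEV-4425):

```sql
-- Hypothetical usage; exact name and semantics per MDEV-4425
SELECT REGEXP_REPLACE('MariaDB 10.0', '[0-9]+', 'N');
-- expected result: 'MariaDB N.N'
```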
Skills: C/C++
Mentor: Alexander Barkov
One of the reasons for bad query plans is inadequate cost estimation of individual operations. The cost of reading a row in one engine might be a lot higher than in another, but the optimizer cannot know that. Also, it uses hard-coded constants, assuming, for example, that evaluating a WHERE clause is 5 times cheaper than reading a row from a table.
Obviously, some kind of calibration procedure is needed to get these cost estimates to be relatively correct. It is not easy, because the estimates depend on the actual hardware where MariaDB is run (a cost of a row read is different on HD and SSD), and also — somewhat — on the application.
A simple and low-maintenance solution would be to use self-tuning cost coefficients. They measure the timing and adjust automatically to the configuration where MariaDB is run.
See MDEV-350.
Skills: C/C++
Mentor: Sergei Golubchik
this project is taken
Roles, close to SQL:2003 standard. See MDEV-4397.
Skills: C/C++
Mentor: Sergei Golubchik
This page is licensed: CC BY-SA / Gnu FDL
curl -T MDEV-XXXXX.tgz -u anonymous:anonymous https://webdav.mariadb.org/private/MDEV-XXXXX.tgz
MariaDB applied to participate in the 2020 Google Season of Docs, but was unsuccessful.
Please join us on Zulip to mingle with the community. You can also subscribe to maria-docs@lists.launchpad.net, the documentation mailing list.
You will choose a major relational DBMS, and, focusing on the most recent stable releases, document the process to migrate to MariaDB, including MariaDB equivalents to features in that system, and a detailed list of features that exist in one but not the other, as well as possible workarounds. For an example, see the work-in-progress as well as the documentation on (bearing in mind that MariaDB is a MySQL fork, and is substantially more similar to MySQL than to other systems).
The capabilities of MariaDB Server are critical to producing large-scale applications. The current documentation lacks sufficient examples, and the examples warrant testing. Getting Started content would make this easier to adopt. MariaDB's implementation is based on ISO SQL/PSM.
and provide the ability to access a MariaDB Server from applications built in C/C++ and Java, respectively. The current documentation for these connectors lacks Getting Started guides.
The Storage Engine uses partitioning to provide data sharding through multiple servers. This task involves greatly expanding the existing documentation, including more detail about when to use Spider, basic usage tutorials, updating the , detailed examples of the effects of the and , as well as the . You will also ensure changes in the most recent Spider releases are properly documented.
 is a full text search storage engine based on Groonga, which is an open-source CJK-ready fulltext search engine using a column store. This project involves greatly expanding the existing MariaDB documentation on the use of this storage engine. A detailed tutorial and user guide are needed, including examples of the various Mroonga , the effects of changing their settings, as well as the parser and .
You will choose a major language and ensure that a substantial subsection of the documentation is translated into that language. See .
Do you have an idea of your own, not listed above? Do let us know!
This page is licensed: CC BY-SA / Gnu FDL
We participated in Google Summer of Code 2014. MariaDB and the MariaDB Foundation believe we are making a better database that remains a drop-in replacement for MySQL. We also work on making LGPL connectors (currently in C, Java, C++ in development) and on , which allows you to scale your reads & writes.
Please join us at irc.freenode.net at #maria to mingle with the community. Or subscribe to . Or both.
Please keep in mind that in April we travel a lot (conferences, busy time), so if you have a question and nobody on IRC answers — do not feel disappointed, ask in an email to maria-developers@lists.launchpad.net. Asking on the mailing list means others benefit from your Q&A too!
scp MDEV-XXXXX.tgz anonymous@ftp.mariadb.org:private/
(anonymous@ftp.mariadb.org) Password:
MDEV-XXXXX.tgz 100% 152KB 218.8KB/s 00:00
scp: remote fsetstat: Permission denied

lftp -u anonymous -e 'put MDEV-XXXXX.tgz' ftp://ftp.mariadb.org/private/
Password:
cd ok, cwd=/private
put: Access failed: 550 Issue during transfer: network error: error transferring data: read tcp
[...] read: connection reset by peer (MDEV-XXXXX.tgz)

sftp user@ftp.mariadb.org

We would like the authentication system to be able to authenticate against an LDAP Directory Server.
See .
Skills: C, working knowledge of LDAP
Mentor: Sergei Golubchik
this project is taken
One of the reasons for bad query plans is inadequate cost estimation of individual operations. A cost of reading a row in one engine might be a lot higher than in some other, but the optimizer cannot know it. Also, it uses hard-coded constants, assuming, for example, that evaluating a WHERE clause is 5 times cheaper than reading a row from a table.
Obviously, some kind of calibration procedure is needed to get these cost estimates to be relatively correct. It is not easy, because the estimates depend on the actual hardware where MariaDB is run (a cost of a row read is different on HD and SSD), and also — somewhat — on the application.
A simple and low-maintenance solution would be to use self-tuning cost coefficients. They measure the timing and adjust automatically to the configuration where MariaDB is run.
See MDEV-350.
Skills: C/C++
Mentor: Sergei Golubchik
MySQL 5.6 has a memcached plugin to InnoDB. MySQL 5.7 has improved performance of this. The task would be to port this to run against MariaDB, and make it work against XtraDB/InnoDB for the 10.1 series of MariaDB.
See MDEV-4674 for more.
Skills: C/C++
Mentor: Colin Charles
enhancements for 10.1 that we want to work on include adding support for altitude (the third coordinate), as well as making sure we are fully OpenGIS compliant. MDEV-5813
Skills: C
Mentor: Holyfoot
User defined events are supported on several other databases, in different forms and with different semantics. Events are used to signal a named event in the database. Applications can use named events instead of polling, which uses more resources.
See MDEV-5532 for more.
Skills: C/C++
Mentor: Jan Lindstrom, Sergei Golubchik
MyISAM and Aria support special kinds of indexes that only store the hash of the data in the index tree. When two hashes match in the index, the engine compares the actual row data to find whether the rows are identical. This is used in internal temporary tables that the optimizer creates to resolve SELECT DISTINCT queries. Normal unique indexes cannot always be used here, because the select list can be very long or include very long strings.
This task is to provide a direct SQL interface to this feature and to allow users to create these indexes explicitly. This way we can have unique constraints for blobs and very long strings.
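A minimal sketch of what such an explicit SQL interface might look like (hypothetical syntax; defining the actual grammar is part of the task):

```sql
-- Hypothetical syntax: a unique constraint on a blob, backed by the
-- hash-only index type described above
CREATE TABLE t1 (
  doc LONGBLOB,
  UNIQUE KEY (doc) USING HASH
);
```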
Skills: C++
Mentor: Sergei Golubchik
this project is taken
This task is to add support for OR REPLACE and IF EXISTS / IF NOT EXISTS to all CREATE and DROP variants for all objects (where it makes sense). MDEV-5359
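For illustration, the statement forms to be supported uniformly across object types would be along these lines (examples only; the exact coverage is defined in MDEV-5359):

```sql
CREATE OR REPLACE VIEW v1 AS SELECT 1;
CREATE TABLE IF NOT EXISTS t1 (id INT);
DROP EVENT IF EXISTS ev1;
```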
Skills: C++
Mentor: Sergei Golubchik
this project is taken
This is a research task, not a coding task. See MDEV-5776
Skills: SQL, Perl/Python or other language of your choice, mathematical statistics
Mentor: Elena Stepanova, Sergei Golubchik
It is a well-known and very old MySQL/MariaDB limitation that temporary tables can only be used once in any query; for example, one cannot join a temporary table to itself. This task is about removing this limitation. MDEV-5535
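A hedged sketch of the limitation (table and column names are made up):

```sql
CREATE TEMPORARY TABLE tmp (id INT);
-- Joining the temporary table to itself currently fails with an error
-- like "Can't reopen table"; this task is about removing that restriction:
SELECT * FROM tmp AS a JOIN tmp AS b ON a.id = b.id;
```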
Skills: C++
Mentor: Sergei Golubchik
Implement a new plugin type that adds support for table UDFs — loadable User-Defined Functions that return tables, not scalar values.
Skills: C++
Mentor: Sergei Golubchik
The mysqlbinlog tool needs to understand global transaction ids. In particular, it should be possible to start and end the dump at the specified GTID. Both when reading binlog files and when connecting to a running server. See MDEV-4989.
If time permits, other client programs could be extended similarly, like mysqldump --master-data or the --sync-with-master command in mysqltest.
Skills: C++
Mentor:
See also «GSoC 2014 tasks» list in Jira.
This page is licensed: CC BY-SA / Gnu FDL
The capabilities of MariaDB Server are critical to producing large-scale applications. The current documentation lacks sufficient examples, and the examples warrant testing. Getting Started content would make this easier to adopt. MariaDB's implementation is based on ISO SQL/PSM.
and provide the ability to access a MariaDB Server from applications built in C/C++ and Java, respectively. The current documentation for these connectors lacks Getting Started guides.
The Storage Engine uses partitioning to provide data sharding through multiple servers. This task involves greatly expanding the existing documentation, including more detail about when to use Spider, basic usage, updating the , detailed examples of the effects of the and , as well as the . You will also ensure changes in the most recent Spider releases are properly documented.
 is a full text search storage engine based on Groonga, which is an open-source CJK-ready fulltext search engine using a column store. This project involves greatly expanding the existing MariaDB documentation on the use of this storage engine. A detailed tutorial and user guide are needed, including examples of the various Mroonga , the effects of changing their settings, as well as the parser and .
You will choose a major language and ensure that a substantial subsection of the documentation is translated into that language. See translations.
You will choose a major relational DBMS, and, focusing on the most recent stable releases, document the process to migrate to MariaDB, including MariaDB equivalents to features in that system, and a detailed list of features that exist in one but not the other, as well as possible workarounds. For an example, see the documentation on (bearing in mind that MariaDB is a MySQL fork, and is substantially more similar to MySQL than to other systems).
Do you have an idea of your own, not listed above? Do let us know!
This page is licensed: CC BY-SA / Gnu FDL
In 2024, MariaDB again participated in the . We believe we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently , , , , ) and on , which allows you to scale your reads & writes. And we have , which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.
Please join us on to mingle with the community. You should also subscribe to the (this is the main list where we discuss development - there are also ).
To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.
In 2022, we again participated in the . The MariaDB Foundation believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently , , , , ) and on , which allows you to scale your reads & writes. And we have , which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.
Please join us on to mingle with the community. You should also subscribe to (this is the main list where we discuss development).
To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.
Part-time (175h) or full-time project (350h) - depending on scope
MariaDB Vector is coming to MariaDB Server to serve AI Workloads. The current indexing strategy will use HNSW, but IVFFlat is a possible alternative that costs fewer resources to create. Having it as an option is desirable.
Part-time (175h) or full-time project (350h) - depending on scope
Our GIS functionality is limited compared to other DBMSes. Given that MariaDB looks to facilitate migration from MySQL, we should be on par. We have a list of functions that are missing in MariaDB compared to MySQL, as described in . Our goal is to have as many of these functions as possible available within MariaDB. Some of the functionality can be ported from MySQL, while other parts might require implementation from scratch.
Skills needed: Understanding of C++ development. Ability to navigate a large codebase (with help from mentor).
Mentors: Anel Husakovic (primary) / Vicențiu Ciorbaru (secondary)
Full-time project 350h
Synonyms are an important feature, particularly as they help smooth migration from other databases. While the initial project scope seems straightforward, there are a number of aspects that must be considered (see the sketch after this list):
Grammar extension
Where will the synonyms definitions be stored?
How do synonyms map to the underlying privilege system? Who can create a synonym? Who can access a synonym?
Do we require the underlying object to exist before creating a synonym? What if the underlying object gets dropped?
What kind of error messages do we present to the user in various corner cases?
How do synonyms interact with replication (row based vs statement based)?
How do synonyms interact with views (and view execution)?
How to present synonyms to users (as part of INFORMATION_SCHEMA for instance?)
Performance considerations for multiple connections to the database.
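As a purely hypothetical sketch of the grammar extension (the actual syntax, storage and privilege model are exactly what this project has to decide):

```sql
-- Hypothetical syntax only
CREATE SYNONYM sales.emp FOR hr.employees;
SELECT * FROM sales.emp;   -- would resolve to hr.employees
DROP SYNONYM sales.emp;
```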
Skills needed: Understanding of C++ development. Able to write and discuss various tradeoffs such that we achieve a feature set that makes sense given the database's priorities.
Mentors: Vicențiu Ciorbaru (primary) / Michael Widenius (secondary)
Full-time project 350h
Support generalized triggers like:
CREATE TRIGGER ... AFTER STARTUP ...
CREATE TRIGGER ... BEFORE SHUTDOWN ...
CREATE TRIGGER ... ON SCHEDULE ...
the latter being a synonym for CREATE EVENT.
Should STARTUP/SHUTDOWN triggers run exclusively? That is, is the STARTUP trigger run before any connection is allowed, or in parallel with them? The same applies to SHUTDOWN.
Skills needed: Understanding of C++ development. Able to write and discuss various tradeoffs such that we achieve a feature set that makes sense given the database's priorities.
Mentors: Sergei Golubchik
Part-time project 175h
my_vsnprintf() is used internally in the server as a portable printf replacement. And it's also exported to plugins as a service.
It supports a subset of printf formats and three extensions:
%`s means that the string should be quoted like an `identifier`
%b means that it's a binary string, not zero-terminated; printing won't stop at \0, so one should always specify the field width (like %.100b)
%M is used in error messages and prints the integer (errno) and the corresponding strerror() for it
%T takes a string and prints it like %s, but if the string has to be truncated, puts "..." at the end
gcc knows printf formats and checks whether the actual arguments match the format string, issuing a warning if they don't. Unfortunately there seems to be no easy way to teach gcc our extensions, so for now we have to disable printf format checks.
A better approach would be to use gcc-compatible format extensions, as the Linux kernel does. We should migrate to a different syntax for our extensions:
%sI to mean "print as an identifier"
%sB to mean "print a binary string"
%uE to mean "print an errno"
%sT to put a "..." as truncation indicator
Old formats can either still be supported or be removed; in the latter case the major version of the service should be increased to signal an incompatible change.
All error messages and all usages of my_vsnprintf should be changed to use the new syntax. One way to do it is to disable the old syntax conditionally, only in debug builds. All gcc printf format checks should be enabled.
Skills needed: Understanding of C development.
Mentors: Sergei Golubchik
Full-time project 350h
cpimport is a binary that ingests data into MCS in an efficient manner, reducing ingest timings significantly whilst preserving transaction isolation levels.
cpimport is a relatively complex facility that reads data from a local file or S3, parses it, converts it, and puts it into MCS-specific files. cpimport is unable to read a single big CSV file from disk in parallel. Apache Arrow has a CSV reading facility that can do parallel CSV reads. The goal of the project is to replace the existing homebrew CSV parser in cpimport with the one from Apache Arrow.
Skills needed: modern C++.
Mentors: Leonid Fedorov
Full-time project 350h
Here, an extent is a group of columnar values, and a partition is a group of extents that stores all column values for a specific portion of a table. MCS has a notion of an empty value for columnar segment/token files and dictionaries. Empty values are marked with a bit in the special 1-byte auxiliary column that is created for every table. When DELETE removes records from a table, the records are marked with the empty bit in the auxiliary column. The deleted records become wasted disk space. The goal of the project is to reclaim the wasted disk space by either re-creating the whole partition or moving partition values.
Skills needed: modern C++
Mentors: Roman Nozdrin
Do you have an idea of your own, not listed above? Do let us know in the comments below (Click 'Login' on the top of the page first)!
This page is licensed: CC BY-SA / Gnu FDL
MDEV-21978 make my_vsnprintf to use gcc-compatible format extensions (Part-time project - 175h) MariaDB has its own implementation of most of the standard C library. This is to ensure compatibility across different platforms. Over time this library has evolved and currently does not behave exactly like the POSIX standard library. Thus we want to attain the principle of "least surprise" with this library. Everything that is supported by the standard printf functions should work the same with MariaDB's compatibility library extension.
Skills needed: C/C++. Project difficulty: easy Mentor Sergei Golubchik
MDEV-19160 JSON_DETAILED output unnecessarily verbose. (Part Time project - 175h) As is explained in the MDEV in detail, we want to improve the JSON_DETAILED function to better suit our development and debugging purposes. This project will aim to clean up the function's implementation, introduce test cases, and add potential "nice-to-have" features to make developers' lives easier.
Skills needed: C/C++, understand JSON, OOP. Project difficulty: easy Mentor: Vicențiu Ciorbaru / Sergei Petrunia
Create utility to parse frm files and print their DDL (Full-time project - potential part-time (175 - 350h, depending on scope)) FRM files are what MariaDB uses to store metadata about tables. These files can be used to generate DDL statements (CREATE TABLE ...). We are lacking a utility to parse these files, which could in turn make DBAs' lives easier. The task of this project is to implement this utility, making use of MariaDB's FRM parsing logic. You may have to carry out some refactoring to extract the parsing code into a reusable library, used once by MariaDB Server and once by the FRM parsing tool.
Skills needed: C/C++, understanding libraries and APIs. Project difficulty: medium to hard, depending on time allocated Mentor Vicențiu Ciorbaru / Sergei Golubchik / Monty Widenius
Add linear regression functions (Full-time project - 350h) This project consists of implementing dedicated regression functions within MariaDB. The specification of each function will be decided during the project, based on what other competing databases are offering. We will choose an implementation that best matches user expectations. It is the student's job to perform research into at least one other database and come up with exact semantics for each one of the functions in the MDEV.
Skills needed: C/C++, understanding of regression functions and mathematics, APIs, OOP. Project difficulty: medium Mentor Vicențiu Ciorbaru
Client compatible delimiter for mysqltest (Full-time project - potential part-time (175 - 350h, depending on scope)) We have a DELIMITER command that has different syntax in the client and in mysqltest: mysqltest needs an additional (previous) delimiter at the end of the DELIMITER expression, which is confusing and makes it hard to copy and paste simple scripts with stored procedures from one to the other. We would like to have a new command (--delimiter=client|mysqltest) with the current behavior as the default.
Expected outcomes: You will learn the finer points of C command-line tool development. You'll get familiar with a part of the MariaDB server infrastructure -- the testing framework, which is written in C and Perl.
Skills required: good C knowledge; ability to use console terminal, ability to build from console. Project difficulty: easy Mentor Nikita Malyavin
Improve build speed of the server code base (Full-time project - 350h) We have already learned that precompiling the headers improves the build speed five times; however, the standard CMake solution doesn't fit our comfort of development: CMake PCH generates one "header of headers" and pre-includes it into each of the compilation units. This makes everything that wasn't included by the unit itself available across the precompiled set. There are alternative ways of precompiling units: clang modules and gcc .gch files. We want to prefer these per-compiler mechanisms over the single combined header.
Expected outcomes: You will make a strong practical impact in an area of high need and show off your mix of analysis and programming skills.
Skills required: good C knowledge, some CMake knowledge; ability to use console terminal, ability to build from console. Project difficulty: medium Mentor Nikita Malyavin
Improve mysqltest language (Full-time project - 350h) mysqltest has a lot of historical problems: an ad hoc parser, weird limitations, commands added as needed with no view of the overall language structure, etc. The purpose of this work would be to improve the language.
Expected outcomes: Rewrite the mysqltest interpreter using either a real parser generator, e.g. bison, or a cleanly hand-written parser, e.g. recursive descent, that can be easily extended with new functionality. Add missing control structures, for example "else" for the existing "if". Add simple expression evaluation without contacting the server, i.e. math and string comparisons. Add functionality for minimal string manipulation, e.g. a substr function.
Skills required: good C/C++ knowledge, interest in parsers/interpreter. Project difficulty: medium Mentor Vladislav Vaintroub
Create a function to check for JSON intersection (Part-time project - 175h) This project aims at implementing a JSON_INTERSECT() function between two JSON objects or two JSON arrays. If the two documents are JSON arrays, we want to return all the common elements between the two arrays; in the case of objects, we want to return all common key-value pairs.
Skills required: C/C++, OOP, basic understanding of JSON. Project difficulty: easy Mentor Rucha Deodhar, Oleksandr Byelkin
MCOL-4995 Research/implement basic vectorized filtering for ARM platforms (Full-time project - 350h) As of 6.2.2, Columnstore (MCS) supports vectorization on x86_64 platforms only. The goal of the project is to implement vectorized low-level filtering for ARM platforms using the 128-bit ARM NEON extension (SVE is optional). Low-level filtering in this context means simple predicate WHERE filters, e.g. WHERE c1 = 5 AND c2 IN (10, 25). Please see the corresponding Jira issue for details.
Skills needed: C/C++, understand low-level platform specifics. Project difficulty: medium Mentor Roman Nozdrin
MCOL-4994 Build/run Columnstore on MacOS (Part-time project - 175h) As of Columnstore (MCS) 6.2.2 there is no way to compile/use the MCS engine on macOS. The goal of this project is to be able to bootstrap MariaDB + basic (maybe rudimentary) MCS on macOS. There are a number of known issues that prevent MCS compilation on macOS: a number of offending preprocessor macros/definitions specific to the Linux x86_64 combination; macOS doesn't provide the syslog that MCS uses as its only log message sink. Please see the corresponding Jira issue for details.
Skills needed: C/C++, MacOS specifics. Project difficulty: easy Mentor Roman Nozdrin
Implement DISTRIBUTED JSON functions (Full-time project - 350h) As of 6.2.2 Columnstore there are two query execution modes: a relatively slow but compatible Table mode and a fast Select Handler (SH) mode. Table mode execution supports all JSON_* functions, while SH mode doesn't support any. We want to add support for the JSON_* function family in SH query execution mode. Please see the corresponding Jira issue for details.
Skills needed: C/C++, JSON format. Project difficulty: easy Mentor Roman Nozdrin
MDBF-320 Better Grid view for buildbot.mariadb.org (Python / Javascript / Web Dev) (Full-time project 350h) Our CI/CD infrastructure uses a recent version of Buildbot. The GRID view plugin that comes with Buildbot is not adequate for our needs. In this project, you will discuss with your mentor as well as other MariaDB developers how to best improve the User Experience of Buildbot's grid view for what MariaDB developers need to accomplish.
Skills needed: Understanding of web-dev technologies like Angular, React, and Javascript related libraries. Python may also be required. Mentor Vlad Bogolin / Andreia Hendea
Do you have an idea of your own, not listed above? Do let us know!
This page is licensed: CC BY-SA / Gnu FDL
We participated in the Google Summer of Code 2017 (we have participated previously in 2016, 2015, 2014, and 2013). The MariaDB Foundation believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently C, ODBC, Java) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. Lately, we also have MariaDB ColumnStore, which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.
Please join us at irc.freenode.net at #maria to mingle with the community. Don't forget to subscribe to maria-developers@lists.launchpad.net (this is the main list where we discuss development).
A few handy tips for any interested students who are unsure which projects to choose:
The complete list of tasks suggested for GSoC 2017 is located in the . A subset is listed below.
The tool needs to be updated to understand the replication feature called (GTIDs) in MariaDB 10. The current version does not support GTIDs and the MySQL variant does not speak MariaDB 10's GTIDs.
The purpose of this task is to create an easy-to-use facility for setting up a new MariaDB slave.
 enhancements for 10.1 that we want to work on include adding support for altitude (the third coordinate), converters (e.g. ST_GeomFromGeoJSON - ST_AsGeoJSON, ST_GeomFromKML - ST_AsKML, etc.), getting data from the SHP format (an shp2sql converter), as well as making sure we are fully OpenGIS compliant.
mysqltest is a client utility that runs tests in the framework. It sends SQL statements to the server, compares the results with the expected results, and uses a special small language for loops, assignments, and so on. It's pretty old and very ad hoc, with many strange limitations. It badly needs a proper parser and a consistent logical grammar.
Currently one can specify only one authentication method per user. It would make a lot of sense to support multiple authentication methods per user. PAM-style. For example, one may want to authenticate using unix_socket when connecting locally, but ask for a password if connecting remotely or if unix_socket authentication failed.
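A hedged sketch of how such PAM-style chaining might look in SQL (hypothetical syntax; defining the real grammar would be part of the task):

```sql
-- Hypothetical syntax: try unix_socket first, fall back to a password
CREATE USER 'alice'@'localhost'
  IDENTIFIED VIA unix_socket
  OR mysql_native_password USING PASSWORD('secret');
```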
Encrypting the client-server communications is closely related to authentication. Normally SSL is used for the on-the-wire encryption, and SSL can be used to authenticate the client too. GSSAPI can be used for authentication, and it has support for on-the-wire encryption. This task is about making on-the-wire encryption pluggable.
This would involve randomizing a bunch of queries (RQG based?), configurations and replication setups to search for segfaults, race conditions and perhaps invalid results.
This would involve organising a bunch of memory and threads to run on the same NUMA node, with attention to detail to ensure no additional race conditions get added in the process. A good understanding of systems programming would be useful. The ability to implement Windows NUMA support at the same time as Linux NUMA support would be advantageous.
The current Cassandra Storage Engine was developed against Cassandra 1.1 and uses the Thrift API to communicate with Cassandra. However, starting from Cassandra 1.2, the preferred way to access a Cassandra database is to use CQL (Cassandra Query Language) and the DataStax C++ Driver (). Thrift-based access is deprecated and places heavy constraints on the schema.
This task is about re-implementing Cassandra Storage Engine using DataStax C++ Driver and CQL.
At the moment NULL is just the maximum integer for a column (or empty string for VARCHAR/CHAR). We need a mechanism to store NULLs separately to give us full type ranges.
Right now it is cast to double which is not great for obvious reasons. It will mean modifying a lot of ColumnStore's version of MariaDB's function implementations and allowing column files to store more than 8 bytes per field.
This includes collations and anything that works on the length of the string.
Do you have an idea of your own, not listed above or in Jira? Do let us know!
This page is licensed: CC BY-SA / Gnu FDL
This year we are again participating in the . We, joined with the , believe we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently , , , , ) and on , which allows you to scale your reads & writes. And we have , which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.
Please join us on to mingle with the community. You should also subscribe to the (this is the main list where we discuss development - there are also ).
To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.
Also see the List of beginner friendly issues from the MariaDB Issue Tracker.
Full-time project 350h
LOAD DATA INFILE can flexibly load data into a table from CSV-like files accessible by the mariadbd process. LOAD XML INFILE can do it for XML files. LOAD DATA LOCAL INFILE and LOAD XML LOCAL INFILE can do it with files accessible by the client, but not by the server. But there are requests to support loading more file formats and from other locations, for example from S3.
This project is to implement support for LOAD plugins and refactor the current LOAD code accordingly. There are two kinds of plugins — data parser plugins (CSV-like and XML) and transfer plugins (file and LOCAL). Implementing new plugins is not in the scope of this task; this task is mainly about moving existing code around, creating a possibility for new plugins (like JSON or S3).
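For context, this is the existing, non-pluggable statement family that the plugin API would generalize (illustrative example; the file and table names are made up):

```sql
LOAD DATA INFILE '/tmp/orders.csv'
  INTO TABLE orders
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  IGNORE 1 LINES;
```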
Skills needed: C++, bison
Mentors: Sergei Golubchik
Full-time project 350h
Implement a syntax and a plugin API that the server will use to generate embeddings for documents that the user stores in the database. This should make it possible to simplify the vector pipeline significantly. mariadbd will not generate embeddings internally; it will invoke a plugin to do that.
Skills needed: C++
Mentors: Sergei Golubchik
Part-time project 175h
extend mysqltest language to support
standard arithmetic +, -, *, /, %
comparisons ==, !=, <, <=, >, >=
boolean &&, ||, may be ? :
if possible: string repetition, perl-style x (to replace SELECT REPEAT() in test files)
This should work in commands if, while
Can be done together with MDEV-36108 as a full-time project.
Skills needed: C++
Mentors: Sergei Golubchik
Part-time project 175h
extend mysqltest language to support bash-like substitutions:
${var}
${parameter:offset:length}
${#parameter}
${parameter/pattern/string/flags}
may be ${parameterˆ}, ${parameterˆˆ}, ${parameter,}, ${parameter}
may be ${parameter@function} with functions like u, U, Q, etc
recursive expansion:
${${var}}
Can be done together with MDEV-36107 as a full-time project.
Skills needed: C++
Mentors: Sergei Golubchik
Full-time project - potential part-time (175 - 350h, depending on scope)
FRM files are what MariaDB uses to store metadata about tables. These files can be used to generate DDL statements (CREATE TABLE ...). We are lacking a utility to parse these files, which could in turn make DBAs' lives easier. The task of this project is to implement this utility, making use of MariaDB's FRM parsing logic. You may have to carry out some refactoring to extract the parsing code into a reusable library, used once by MariaDB Server and once by the FRM parsing tool.
Skills needed: C/C++, understanding libraries and APIs.
Mentors: Vicențiu Ciorbaru / Sergei Golubchik
Part-time project 175h
The current methods of filtering replication events are limited to either 1) at binlog-write time, which can break point-in-time recovery because some committed transactions will be missing from the binary log, or 2) on the replica, which forces all events on the primary server to always be sent to the replica, which can be a security concern and is also not efficient. This task aims to eliminate these limitations by adding in another point at which replication filtering occurs: on the binlog dump threads. This would allow users to both maintain a consistent binary log, and minimize network traffic by guarding events which are never intended for replication.
Skills needed: C++
Mentors: Brandon Nesterenko
Part-time project 175h
TODO: A more ample description will be created.
Skills needed:
Mentors: Vlad Radu
Full-time project 350h
Here, an extent is a group of columnar values, and a partition is a group of extents that stores all column values for a specific portion of a table. MCS has a notion of an empty value for columnar segment/token files and dictionaries. Empty values are marked with a bit in the special 1-byte auxiliary column that is created for every table. When DELETE removes records from a table, the records are marked with the empty bit in the auxiliary column. The deleted records become wasted disk space. The goal of the project is to reclaim the wasted disk space by either re-creating the whole partition or moving partition values.
Skills needed: modern C++
Mentors: Roman Nozdrin
Full-time project 350h
MariaDB Columnstore lacks recursive CTE handling, so as of now Columnstore hands the processing back to MariaDB Server if a query contains a recursive CTE.
Here is the info about the feature:
Skills needed: modern C++
Mentors: Leonid Fedorov
Full-time project 350h
MariaDB Columnstore lacks UNION EXCEPT INTERSECT handling, so as of now Columnstore hands the processing back to MariaDB Server if a query contains UNION EXCEPT or UNION INTERSECT.
Here is the info about the feature:
Skills needed: modern C++
Mentors: Alexey Antipovsky
Full-time project 350h
MariaDB Columnstore lacks indexes, so it reads a lot of extra data from disk. This project introduces Bloom filters to reduce the data read from disk during the most IO-heavy operation, scanning.
Skills needed: modern C++
Mentors: Roman Nozdrin
Full-time project 350h
Joins are very heavy algorithms, both in computation and/or in memory use. They need to hold a substantial amount of data in memory and perform hashing and other operations on that data. Joins can overflow memory limits, and keeping the balance between memory use and performance is tricky. Thus we have to filter the information that goes into joins as much as possible. Columnstore already does great work in that regard, pushing WHERE filters before joins. This particular task is also concerned with that, adding Bloom filter operations that approximate JOIN results and perform a secondary read to feed into the join only data that is highly likely to be used in it.
Skills needed: modern C++
Mentors: Sergey Zefirov
Do you have an idea of your own, not listed above? Do let us know in the comments below (Click 'Login' on the top of the page first)!
This page is licensed: CC BY-SA / Gnu FDL
In 2023, MariaDB participated in the Google Summer of Code. The MariaDB Foundation believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently , C++, , , Node.js) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. And we have MariaDB ColumnStore, which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.
Please join us on Zulip to mingle with the community. You should also subscribe to maria-developers@lists.launchpad.net (this is the main list where we discuss development).
To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.
Also see the from the MariaDB Issue Tracker.
Full-time project 350h
Our version of RocksDB is lagging behind the current upstream and needs to be updated. This isn't a case of simply updating the submodule, there have been significant API changes. It will likely require porting the latest MyRocks code over to the MariaDB storage API.
Skills needed: Understanding of C/C++ development. Preferably some experience with the MariaDB or MySQL codebase (but not essential).
Mentor: Andrew Hutchings
Part-time (175h) or full-time project (350h) - depending on scope
Our GIS functionality is limited compared to other DBMSes. Given that MariaDB aims to facilitate migration from MySQL, we should be on par. We have a list of functions that are missing in MariaDB compared to MySQL, as described in . Our goal is to have as many of these functions as possible available within MariaDB. Some of the functionality can be ported from MySQL, while other parts might require implementation from scratch.
Skills needed: Understanding of C++ development. Ability to navigate a large codebase (with help from mentor).
Mentors: Anel Husakovic (primary) / Vicențiu Ciorbaru (secondary)
Full-time project 350h
Synonyms are an important feature, particularly as they help smooth migration from other databases. While the initial project scope seems straightforward, there are a number of aspects that must be considered:
Grammar extension
Where will the synonyms definitions be stored?
How do synonyms map to the underlying privilege system? Who can create a synonym? Who can access a synonym?
Do we require the underlying object to exist before creating a synonym? What if the underlying object gets dropped?
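As a rough illustration only (the actual grammar, storage and semantics are exactly what this project has to decide), a synonym feature might be exercised like this:
CREATE SYNONYM sales.ord FOR sales.orders;
SELECT COUNT(*) FROM sales.ord;   -- resolves to sales.orders
DROP SYNONYM sales.ord;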
Skills needed: Understanding of C++ development. Able to write and discuss various tradeoffs such that we achieve a feature set that makes sense given the database's priorities.
Mentors: Vicențiu Ciorbaru (primary) / Michael Widenius (secondary)
Part-time project 175h / Full-time project 350h - depending on scope
MariaDB ships with ColumnStore as a storage engine. However, the architecture of ColumnStore is not like that of a traditional storage engine: it relies on multiple database nodes working in unison. This means that starting up a ColumnStore-enabled MariaDB service is not a trivial endeavour. This project seeks to create the necessary tooling around starting MariaDB with ColumnStore inside OCI containers. You will be writing Dockerfiles, configuration files and bash scripts to achieve this.
The challenge of this project lies in:
Limited documentation around ColumnStore. There will be some time spent on the discovery process.
Formulating a clear plan to facilitate:
Starting MariaDB with ColumnStore
Upgrading MariaDB with ColumnStore on a version upgrade
Skills needed: Ability to develop durable bash scripts, understanding of container runtimes, and ability to conform to container best practices. Able to incrementally develop and test functionality.
Mentors: Daniel Black (primary - containers) / Andrew Hutchings (secondary - ColumnStore)
Part-time project 175h
The main focus of this project is around developer / sysadmin experience. We want to improve the quality of life of those using MariaDB. Migrating large datasets is one of these challenges. As is described in the MDEV, a simple limitation related to LOAD DATA INFILE can severely hamper developer productivity. A related problem is discussed in .
The goal of this project is to come up with a solution for storing warnings during LOAD DATA INFILE. This will require modifying the existing server codebase to create an SQL interface for processing the generated warnings.
Challenges:
LOAD DATA INFILE can process large datasets. That means the server must not simply store all warnings in memory. You will need to make use of already existing mechanisms (creating temporary tables) so warnings can spill to disk.
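A minimal sketch of the problem (file and table names invented for the example); today the generated warnings are only visible through SHOW WARNINGS, which keeps them in memory and is limited by max_error_count:
LOAD DATA INFILE '/tmp/big_import.csv'
INTO TABLE imports
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';
SHOW WARNINGS;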
Skills needed: Understanding of C++ development.
Mentors: Anel Husakovic (primary) / Daniel Black (secondary)
Full-time project 350h
Our CI/CD infrastructure uses a recent version of Buildbot. The GRID view plugin that comes with Buildbot is not adequate for our needs. In this project, you will discuss with your mentor, as well as other MariaDB developers, how best to improve the user experience of Buildbot's grid view for what MariaDB developers need to accomplish.
Skills needed: Understanding of web-dev technologies like Angular, React, and Javascript related libraries. Python may also be required.
Mentor: Vlad Bogolin
MariaDB Columnstore is a columnar engine for MariaDB Server for OLAP workloads. MCS is also a distributed, multithreaded application written in C++; C++20 is currently used for development. There are a number of interesting MCS projects to take part in, covering both research and production programming.
Part-time project 175h / Full-time project 350h - depending on scope
MCS uses interpreted execution to calculate SQL expression results. Here is an example of a SQL expression: 'table1.col1 + FLOOR(table2.col1)'. Given that table1.col1 is DECIMAL and table2.col1 is DOUBLE, there are a number of conditions that drive the calculation of this relatively simple example at runtime. Since the SQL types and the expression tree are known before the query begins, it is possible to replace interpretation with JIT compilation to produce specialized compiled code that is:
small
has no or almost no branches
optimized for the specific platform it is run at
This is mostly a research project whose goal is to produce a set of microbenchmarks that:
leverage any available JIT compiler, e.g. LLVM, MIR
demonstrate the negative and positive effects of using JIT
Skills needed:
C++
at least basic knowledge of compiler internals
Mentor: Roman Nozdrin
Full-time project 350h
cpimport in MCS is a standalone tool that does bulk ingestion outside the SQL interface. It takes source data as input and puts it into an MCS cluster. This put is an atomic operation that supports rollback. The sources can be either local files or files on S3. The only format cpimport currently reads is CSV, with customizable:
delimiters
quotation signs
NULL symbol
The goal of this project is either to teach cpimport to support the Parquet format as an input format or to introduce a modular framework for adding input formats. This project is mostly about reading/writing production code, where the challenges are:
to learn a codebase
produce a feature
support the feature with unit and integration tests using existing frameworks
cpimport consists of:
a buffer into which parsed data lines go, in the form of a low-level representation of SQL datatypes
a set of parser threads that populate the buffer
a set of writer threads that take the values making up a single SQL record and put them into the corresponding files
Parser threads currently have a fixed delimiter-separated-values parser that can be parametrized only with:
escape character
'enclosed by' characters
The suggested approach is to replace this DSV parser with a modular one that understands how to read popular formats, e.g. parquet, Arrow, Avro, JSON
Skills needed:
C++
production development tooling like git, Cmake
Mentor: Gagan Goel
Part-time project 175h / Full-time project 350h - depending on scope
MCS uses scalar processing to calculate SQL expression results. The expressions can be in the projection or the filtering part of a SQL query. Here is an example of a SQL expression: 'table1.col1 + FLOOR(table2.col1)'. In most cases scalar processing can be replaced with vectorized execution, which reduces the number of cycles needed to render the result of an expression. The challenge of this project is that the in-memory representation can be both vertical and horizontal.
This is mostly a research project whose goal is to produce a set of microbenchmarks that:
reveal limitations or problems in applying vectorization to expressions
compare performance for these cases:
vectorized execution with vertical data
vectorized execution with horizontal data
Skills needed:
C++
ASM knowledge to handle the low-level part of this project
Mentor: Andrey Piskunov
Full-time project 350h - depending on scope
Fuzzing is a well-known technique for finding various types of bugs. This task is to integrate libFuzzer, sanitizers (ASan, TSan, UBSan) and MCS Columnstore into one fuzzing pipeline and create a fuzzing infrastructure. This task requires:
Adding support to Columnstore for building with sanitizers (ASan, TSan, UBSan).
Writing code which integrates the MariaDB C++ connector and libFuzzer.
Preparing a valid corpus of SQL scripts suitable for Columnstore.
Creating a fuzzing infrastructure.
Skills needed:
Basic knowledge of how to work with C++ build tools (CMake, clang, ld, rtld).
Basic C++.
Mentor: Denis Khalikov
Do you have an idea of your own, not listed above? Do let us know!
This page is licensed: CC BY-SA / Gnu FDL
We participated in the Google Summer of Code 2016 (we have participated previously in 2015, 2014, and 2013). The MariaDB Foundation believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently in C, Java, C++ in development) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. Lately, we also have MariaDB MaxScale which is a pluggable database proxy.
Please join us at irc.freenode.net at #maria to mingle with the community. Don't forget to subscribe to maria-developers@lists.launchpad.net (this is the main list where we discuss development).
Please keep in mind that in April we travel a lot (conferences, busy time focusing on a release), so if you have a question and nobody on IRC answers, don't feel disappointed, ask in an email to maria-developers@lists.launchpad.net. Asking on the mailing list means others benefit from your Q&A too!
The complete list of tasks suggested for GSoC 2016 is located in the . A subset is listed below.
The tool needs to be updated to understand the replication feature called (GTIDs) in MariaDB 10. The current version does not support GTIDs and the MySQL variant does not speak MariaDB 10's GTIDs.
With one can create functions in SQL, but this syntax doesn't allow one to create an aggregate function (like , , etc). This task is to add support for aggregate .
enhancements for 10.1 that we want to work on include adding support for altitude (the third coordinate), converters (eg. ST_GeomFromGeoJSON - ST_AsGeoJSON, ST_GeomFromKML - ST_AsKML, etc.), Getting data from SHP format (shp2sql convertor), as well as making sure we are fully OpenGIS compliant.
MyISAM and Aria support special that only store the hash of the data in the index tree. When two hashes match in the index, the engine compares actual row data to find whether the rows are identical. This is used in internal temporary tables that the optimizer creates to resolve SELECT DISTINCT queries. Normal unique indexes cannot always be used here, because the select list can be very long or include very long strings.
This task is to provide a direct SQL interface to this feature and to allow users to create these indexes explicitly. This way we can have unique constraints for blobs and very long strings.
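One possible shape of the SQL interface, sketched here only as an illustration (the exact syntax is part of the task):
CREATE TABLE documents (
  body LONGTEXT,
  UNIQUE KEY (body) USING HASH
);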
MySQL Master HA (MHA) is a tool to assist with automating master failover and slave promotion within short downtime, without suffering from replication consistency problems, and without performance penalty. We would like to have this tool support MariaDB 10 .
Provide import and export functions for popular data formats like JSON, XML (limited), PHP, ... for Connector/C and MariaDB Server (which use same code base for dynamic columns)
Design a filter that will capture incoming inserts, updates and deletes for specified tables (given as a regex) in a separate log file that is consumable in JSON or CSV form, so that external ETL processes can process it for uploading data into a DWH or big data platform. Optionally, a plugin that feeds this log into a Kafka broker that can put this data on a Hadoop node can be developed as a next step.
Develop a MaxScale filter that will translate SQL Server syntax to MariaDB syntax. Develop a SQL Server client protocol plugin.
Create additional entry points into MaxScale that the Lua side scripts can use. Various types of functions can be added ranging from SQL processing functions to utility functions which communicate with MaxScale.
Create a filter which can inject queries before the client executes any queries. This filter could be used for various purposes for example auditing.
The current Cassandra Storage Engine was developed against Cassandra 1.1 and uses the Thrift API to communicate with Cassandra. However, starting from Cassandra 1.2, the preferred way to access a Cassandra database is to use CQL (Cassandra Query Language) and the DataStax C++ Driver (). Thrift-based access is deprecated and places heavy constraints on the schema.
This task is about re-implementing Cassandra Storage Engine using DataStax C++ Driver and CQL.
Currently MariaDB ignores trailing spaces when comparing values of the CHAR, VARCHAR, TEXT data types. In some cases it would be nice to take trailing spaces into account. This task will introduce a set of new collations that will make this possible.
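For illustration: with the current PAD SPACE behavior the comparison below returns 1; with the proposed collations (their names are not specified here) the same comparison would return 0.
SELECT 'abc' = 'abc   ';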
Are you a student interested in working on something? Let us know here.
This page is licensed: CC BY-SA / Gnu FDL
What kind of error messages do we present to the user in various corner cases?
How do synonyms interact with replication (row based vs statement based)
How do synonyms interact with views (and views execution)
How to present synonyms to users (as part of INFORMATION_SCHEMA for instance?)
Performance considerations for multiple connections to the database.
Setting configuration variables via environment switches where appropriate
Declaratively (likely docker-compose yml file) state the system's architecture.
Documenting the necessary steps to deployment
Producing a blog of its operation
Optionally enable deployment via Kubernetes
Implementing the plan and creating a CI/CD pipeline for testing.
Students Interested:
2
Students Interested:
6
Students Interested:
2
Students Interested:
3
Students Interested:
2
Students Interested:
2
Students Interested:
1
Students Interested:
2
Students Interested:
1
Students Interested:
3
Students Interested:
2
Details:
Skills:
C/C++
Mentor:
Kristian Nielsen
Details:
Skills:
C/C++
Mentor:
Sergei Golubchik
Details:
Skills:
C/C++
Mentor:
Holyfoot
Details:
Skills:
C/C++
Mentor:
Sergei Golubchik
Skills:
Perl
Mentor:
Colin Charles
Students Interested:
2
Details:
Skills:
C
Mentor:
Oleksandr Byelkin, Georg Richter
Details:
Skills:
C/C++
Mentor:
Markus Makela
Details:
Skills:
C/C++
Mentor:
Markus Makela & Massimiliano Pinto
Details:
Skills:
C/C++
Mentor:
Markus Makela
Details:
Skills:
C/C++
Mentor:
Markus Makela
Details:
Skills:
C/C++
Mentor:
Sergei Petrunia
Details:
Skills:
C/C++
Mentor:
Alexander Barkov
We participated in the Google Summer of Code 2015. MariaDB and the MariaDB Foundation believe we are making a better database that remains a drop-in replacement to MySQL. We also work on making LGPL connectors (currently in C, Java, C++ in development) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. Lately, we also have MariaDB MaxScale which is a pluggable database proxy.
Please join us at irc.freenode.net at #maria to mingle with the community. Don't forget to subscribe to maria-developers@lists.launchpad.net (this is the main list where we discuss development).
Please keep in mind that in April we travel a lot (conferences, busy time focusing on a release), so if you have a question and nobody on IRC answers — do not feel disappointed, ask in an email to maria-developers@lists.launchpad.net. Asking on the mailing list means others benefit from your Q&A too!
At the moment, tasks that may be suitable for GSoC 2015 are listed in the MariaDB Issue Tracker under
This project consists of two parts -- it can either be performed by 2 students or 1 student with the relevant skills:
The tool needs to be updated to understand the replication feature called (GTIDs) in MariaDB 10. The current version does not support GTIDs and the MySQL variant does not speak MariaDB 10's GTIDs.
in MySQL 5.6 also supports streaming servers for . This is important as the MHA tool can also use this feature.
We have the concept of (non-materialized) columns, and currently to have an on a virtual column one has to materialize it. To support indexes on fully virtual columns, a storage engine must call back into the server to calculate the value of the virtual column.
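Illustration (hypothetical table): today the generated column has to be materialized (PERSISTENT) before it can be indexed; the goal is to allow the same index when the column is declared VIRTUAL instead.
CREATE TABLE t1 (
  a INT,
  b INT,
  ab INT AS (a + b) PERSISTENT,
  KEY (ab)
);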
User functions (UDF-like) that return a table, rows and columns. It should be possible to use it in other statements. A possible implementation could be: the function exports a generator, we create a handler of the hidden "storage engine" class, no indexes, and convert this generator to rnd_init/rnd_next. Need to disable rnd_pos somehow. Alternatively, it can materialize the result set in a temporary table (like information_schema does), then this table can be used normally.
With one can create functions in SQL, but this syntax doesn't allow one to create an aggregate function (like , , etc). This task is to add support for aggregate .
enhancements for 10.1 that we want to work on include adding support for altitude (the third coordinate), converters (eg. ST_GeomFromGeoJSON - ST_AsGeoJSON, ST_GeomFromKML - ST_AsKML, etc.), Getting data from SHP format (shp2sql convertor), as well as making sure we are fully OpenGIS compliant.
MySQL 5.6 has a memcached plugin to InnoDB. MySQL 5.7 has improved performance of this. The task would be to port this to run against MariaDB, and make it work against XtraDB/InnoDB for the 10.2 series of MariaDB.
The purpose of this task is to create an easy-to-use facility for setting up a new MariaDB slave.
MyISAM and Aria support special that only store the hash of the data in the index tree. When two hashes match in the index, the engine compares actual row data to find whether the rows are identical. This is used in internal temporary tables that the optimizer creates to resolve SELECT DISTINCT queries. Normal unique indexes cannot always be used here, because the select list can be very long or include very long strings.
This task is to provide a direct SQL interface to this feature and to allow users to create these indexes explicitly. This way we can have unique constraints for blobs and very long strings.
It is a well-known and very old MySQL/MariaDB limitation that temporary tables can only be used once in any query; for example, one cannot join a temporary table to itself. This task is about removing this limitation
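Illustration of the limitation (hypothetical temporary table); the second statement fails today with a "Can't reopen table" error:
CREATE TEMPORARY TABLE tmp (id INT);
SELECT * FROM tmp AS a JOIN tmp AS b ON a.id = b.id;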
MySQL Master HA (MHA) is a tool to assist with automating master failover and slave promotion within short downtime, without suffering from replication consistency problems, and without performance penalty. We would like to have this tool support MariaDB 10 .
Provide import and export functions for popular data formats like JSON, XML (limited), PHP, ... for Connector/C and MariaDB Server (which use same code base for dynamic columns)
Design a filter that will capture incoming inserts, updates and deletes for specified tables (given as a regex) in a separate log file that is consumable in JSON or CSV form, so that external ETL processes can process it for uploading data into a DWH or big data platform. Optionally, a plugin that feeds this log into a Kafka broker that can put this data on a Hadoop node can be developed as a next step.
Develop a MaxScale filter that will translate SQL Server syntax to MariaDB syntax. Develop a SQL Server client protocol plugin.
This page is licensed: CC BY-SA / Gnu FDL
Details:
Skills:
C/C++
Mentor:
Kristian Nielsen
Details:
Skills:
C/C++
Mentor:
Sergey Vojtovich
Details:
Skills:
C/C++
Mentor:
Sergei Golubchik
Details:
Skills:
C/C++
Mentor:
Sergei Golubchik
Details:
Skills:
C/C++
Mentor:
Sergei Golubchik
Details:
Skills:
C/C++
Mentor:
Holyfoot
Details:
Skills:
C/C++
Mentor:
Colin Charles
Details:
Skills:
C/C++
Mentor:
Kristian Nielsen
Details:
Skills:
C/C++
Mentor:
Sergei Golubchik
Details:
Skills:
C/C++
Mentor:
Sergei Golubchik
Skills:
Perl
Mentor:
Colin Charles
Details:
Skills:
C
Mentor:
Oleksandr Byelkin, Georg Richter
Details:
Skills:
C/C++
Mentor:
Markus Makela
Details:
Skills:
C/C++
Mentor:
Markus Makela & Massimiliano Pinto
We participated in the Google Summer of Code 2018. The MariaDB Foundation believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently C, ODBC, Java) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. And we have MariaDB ColumnStore, which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.
Please join us at irc.freenode.net at #maria to mingle with the community. Don't forget to subscribe to maria-developers@lists.launchpad.net (this is the main list where we discuss development).
A few handy tips for any interested students who are unsure which projects to choose: Blog post from former GSoC student & mentor
To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.
Loaded from the
MariaDB ColumnStore supports DECIMAL with some limitations:
We do not support the full DECIMAL range that is in MariaDB
In several places in the code we convert the DECIMAL to DOUBLE during execution therefore losing precision
Implementing this will likely require the following:
Implementation of methods to handle MariaDB's DECIMAL format
Support for a longer than 8-byte numeric column type (there is an InfiniDB tree with work for this already)
Modification of the primitives processor for the math
Modification of the function expression processor to handle the new type
We need an ORM-style NoSQL read API to go along with the bulk write API of mcsapi.
This will likely take the form of:
A reader in ExeMgr which will convert messages from mcsapi into jobs
Code in mcsapi to send/receive the messages
Although ExeMgr can already receive messages with an execution plan, the format is very complex and the ABI breaks easily (we often send whole C++ objects).
We should look at other ORM frameworks for inspiration for the API design.
This task is to do the design for this API.
The mysqlbinlog client program needs to be updated to support GTID.
Here is a suggested list of things to be done:
The --start-position and --stop-position options should be able to take GTID positions; or maybe there should be new --start-gtid and --stop-gtid options, like --start-gtid=0-1-100,1-2-200,2-1-1000.
A GTID position means the point just after that GTID. So starting from GTID 0-1-100 and stopping at GTID 0-1-200, the first GTID output will probably be 0-1-101 and the last one 0-1-200. Note that if some domain is not specified in the position, it means to start from the beginning, respectively stop immediately, in that domain.
Probably some more things will come up during the work, but this looks like a reasonable start.
Add an UPDATE operation that returns a result set of the changed rows to the client.
I'm not exactly sure what the corresponding multiple-table syntax should look like, or if it is possible at all. But having it just for single-table updates would already be a nice feature.
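A sketch of the single-table form (the exact syntax is part of the task; table and columns invented for the example):
UPDATE accounts
  SET balance = balance - 10
  WHERE id = 1
RETURNING id, balance;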
In MySQL, Optimizer trace is a JSON object recording the execution path through the optimizer, decisions that were made and the reasons for them. See
Users were asking for MariaDB to have a similar feature.
The purpose of this task is to create an easy-to-use facility for setting up a new MariaDB replication slave.
Setting up a new slave currently involves: 1) installing MariaDB with the initial database; 2) pointing the slave to the master with CHANGE MASTER TO; 3) copying the initial data from the master to the slave; and 4) starting the slave with START SLAVE. The idea is to automate step (3), which currently needs to be done manually.
The syntax could be something as simple as
LOAD DATA FROM MASTER
This would then connect to the master that is currently configured. It will load a snapshot of all the data on the master, and leave the slave position at the point of the snapshot, ready for START SLAVE to continue replication from that point.
The idea is to do this non-blocking on the master, in a way that works for any storage engine. It will rely on row-based replication to be used between the master and the slave.
At the start of LOAD DATA FROM MASTER, the slave will enter a special provisioning mode. It will start replicating events from the master at the master's current position.
The master dump thread will send binlog events to the slave as normal. But in addition, it will interleave a dump of all the data on the master contained in tables, views, or stored functions. Whenever the dump thread would normally go to sleep waiting for more data to arrive in the binlog, the dump thread will instead send another chunk of data in the binlog stream for the slave to apply.
A "chunk of data" can be:
A CREATE OR REPLACE TABLE / VIEW / PROCEDURE / FUNCTION
A range of N rows (N=100, for example). Each successive chunk will do a range scan on the primary key from the end position of the last chunk.
Sending data in small chunks avoids the need for long-lived table locks or transactions that could adversely affect master performance.
The slave will connect in GTID mode. The master will send dumped chunks in a separate domain id, allowing the slave to process chunks in parallel with normal data.
During the provisioning, all normal replication events from the master will arrive on the slave, and the slave will attempt to apply them locally. Some of these events will fail to apply, since the affected table or row may not yet have been loaded. In the provisioning mode, all such errors will be silently ignored. Proper locking (isolation mode, eg.) must be used on the master when fetching chunks, to ensure that updates for any row will always be applied correctly on the slave, either in a chunk, or in a later row event.
In order to make the first version of this feature feasible to implement in a reasonable amount of time, it should set a number of reasonable restrictions (which could be relaxed in a later version of the feature):
Give up with an error if the slave is not configured for GTID mode (MASTER_USE_GTID != NO).
Give up with error if the slave receives any event in statement-based binlogging (so the master must be running in row-based replication mode, and no DDL must be done while the provisioning is running).
Give up with an error if the master has a table without primary key.
Secondary indexes will be enabled during the provisioning; this means that tables with large secondary indexes could be expensive to provision.
As a follow-on to MDEV-4691 we would like GSSAPI encryption (in addition to authentication) support in MariaDB. I am told that the current plan is to create a plugin interface and then we can build GSSAPI encryption on top of that, so here is a ticket for that.
From having written , there were a couple things I would like to see in the plugin encryption interface.
First, GSSAPI is weird in that it does authentication before encryption (TLS/SSL are the other way around, establishing an encrypted channel and then doing authentication over it). Of course support for this is needed, but more importantly, packets must be processed in a fully serialized fashion. This is because encrypted packets may be queued while one end of the connection is still finishing up processing the authentication handshake. One way to do this is registering "handle" callbacks with connection-specific state, but there are definitely others.
Additionally, for whatever conception there ends up being of authentication and encryption, it needs to be possible to share more data than just a socket between them. The same context will be used for authentication and encryption, much as an SSL context is (except of course we go from authentication to encryption and not the other way around).
This ties into an issue of dependency. If authentication plugins are separate entities from encryption plugins in the final architecture, it might make sense to do mix-and-match authentication with encryption. However, there are cases - and GSSAPI is one - where doing encryption requires a certain kind of authentication (or vice versa). You can't do GSSAPI encryption without first doing GSSAPI authentication. (Whether or not it's permitted to do GSSAPI auth->encryption all over a TLS channel, for instance, is not something I'm concerned about.)
Finally, encrypted messages are larger than their non-encrypted counterparts. The transport layer should cope with this so that plugins don't have to think about reassembly, keeping in mind that there may not be a way to get the size of a message when encrypted without first encrypting it.
It's unfortunately been a little while since I wrote that code, but I think those were the main things that we'll need for GSSAPI. Thanks!
Currently only a few aggregate functions are supported as window functions; the list can be found here
In MDEV-7773, support for creating custom aggregate functions was added. This task would extend that feature and make custom aggregate functions usable as window functions.
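Sketched goal of the task (assuming an agg_sum aggregate created as in the example referred to below): the same user-defined aggregate should be usable with an OVER clause.
SELECT grp, agg_sum(x) OVER (PARTITION BY grp ORDER BY id) AS running_sum
FROM t1;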
An example of creating a custom aggregate function is given below:
Currently no true LOCK=NONE exists on the slave. ALTER TABLE is first committed on the master, then it is replicated on the slaves. The purpose of this task is to create a true LOCK=NONE.
Implementation Idea
The master will write BEGIN_DDL_EVENT to the binlog after it hits ha_prepare_inplace_alter_table. Then the master will write a QUERY_EVENT to the binlog with the actual ALTER query. On commit/rollback, the master will write COMMIT_DDL_EVENT/ROLLBACK_DDL_EVENT.
On the slave there will be a pool of threads (its size a configurable global variable) which will apply these DDLs. On receiving BEGIN_DDL_EVENT, the slave thread will pass the QUERY_EVENT to one of the worker threads. The worker thread will execute until ha_inplace_alter_table. The actual commit_inplace_alter will be called by the SQL thread. If the SQL thread receives some kind of rollback event, it will signal the worker thread to stop executing the ALTER. If none of the worker threads is available, the event will be enqueued; then, if we receive a rollback event we simply discard the event from the queue, and if we receive a commit event the SQL thread will synchronously process the DDL event.
mysqltest has a lot of historical problems:
ad hoc parser, weird limitations
commands added as needed with no view over the total language structure
historical code issues (e.g. casts that became unnecessary 10 years ago), etc.
A lot can be done to improve it.
control structures, else in if, break and continue in while, for (or foreach) loop
proper expression support in let, if
had Cassandra Storage Engine which was developed for Cassandra 1.1.x. Back then, Cassandra provided a Thrift API, and that was what Cassandra-SE used.
Then, Cassandra 2.0 switched to using a different network protocol (and also changed the data model).
This task is to develop a Cassandra Storage Engine V2 using DataStax's C++ client library ().
See also: MDEV-8947 was a previous attempt to implement this engine. Unfortunately it didn't even produce a skeleton engine.
Histograms with equal-width bins are easy to construct using samples. For this it's enough to look through the given sample set and, for each value from it, to figure out what bin this value can be placed in. Each bin requires only one counter. Let f be a column of a table with N rows and n be the number of samples by which the equal-width histogram of k bins for this column is constructed. Suppose that, after looking through all sample rows, the counters created for the histogram bins contain the numbers c[1],..,c[k]. Then m[i] = c[i]/n * 100 is the percentage of the rows whose values of f are expected to be in the interval
It means that, if the sample rows have been chosen randomly, the expected number of rows with values of f in this interval can be approximated by m[i]/100 * N.
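As a worked illustration (numbers invented for the example): with n = 1000 sample rows, suppose c[3] = 150 samples fall into the third bin; then m[3] = 150/1000 * 100 = 15, so for a table of N = 2,000,000 rows roughly 15/100 * 2,000,000 = 300,000 rows are expected to have values of f in that bin's interval.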
To collect such statistics it is suggested to use the following variant of the ANALYZE TABLE command:
Here:
'WITH n ROWS' provides an estimate for the number of rows in the table in the case when this estimate cannot be obtained from statistical data.
'SAMPLING p PERCENTS' provides the percentage of sample rows used to collect statistics. If this is omitted, the number is taken from the system variable samples_ratio.
'IN RANGE r' sets the range of the equal-width bins of the histogram built for the column col1. If this is omitted and the min and max values for the column can be read from statistical data, then the histogram is built for the range [min(col1), max(col1)]. Otherwise the range [MIN_type(col1), MAX_type(col1)] is considered. Values beyond the given range, if any, are taken into account in two additional bins.
A multiple-table UPDATE first performs join operations, then it updates the matching rows. A multiple-table UPDATE returning a result set does the following:
first performs join operations
for each row of the result of the join it calculates some expressions over the columns of the join and forms from them a row of the returned result set
after this it updates the matching rows
A multiple-table DELETE first performs join operations, then it deletes the matching rows. A multiple-table DELETE returning a result set does the following:
first performs join operations
for each row of the result of the join it calculates some expressions over the columns of the join and forms from them a row of the returned result set
after this it deletes the matching rows
Currently, the MariaDB privilege system only performs whitelist checks for access control to a certain database, table or column. This makes it difficult if we need to block access to a certain database/table/column while allowing access to all others.
A good solution would be to allow REVOKE of anything that a user is able to do: not only exactly those grants that were granted to a user, but also a subset. For example:
Currently, the InnoDB system tablespace can only be automatically encrypted/decrypted by the background encryption threads if innodb_encrypt_tables=ON|FORCE, innodb_encryption_threads>0, and innodb_encryption_rotate_key_age>0. There is no way to manually encrypt/decrypt the tablespace.
File-per-table tablespaces can be manually encrypted with:
File-per-table tablespaces can be manually decrypted with:
Some users want a similar method that would allow them to manually encrypt/decrypt the InnoDB system tablespace.
This is loosely related to MDEV-14571, since both issues are related to the fact that the system tablespace can only be encrypted/decrypted by the background threads.
SP/PS (Stored Procedures / Prepared Statements) allocate memory until the PS cache of the SP is destroyed. There is no way to see how much memory is allocated and whether it grows with each execution (the first 2 executions can lead to new memory allocation, but later ones should not).
Task minimum:
Status variables which count the memory used/allocated for SP/PS by thread and/or for the server.
Other ideas:
Automatically stop allocation in the debugging version after the second execution and raise an exception on any further attempt.
Information schema by threads and SP/PS with information about allocated and used memory
Information can be collected in the MEM_ROOTs of the SP/PS. By storing info about the status of the mem_root before execution and then checking it afterwards, newly allocated memory can be found.
MEM_ROOT can be changed to have a debug mode which makes it read-only and which can be switched on after the second execution.
This page is licensed: CC BY-SA / Gnu FDL
Version upgrade support for DECIMAL from the current form to the new form
Starting and stopping GTID should work both with local files and with --read-from-remote-server. For the latter, there are a couple of extra things that need doing in the master-slave protocol; see get_master_version_and_clock() in sql/slave.cc.
At the end of the dump, put these statements, to reduce the risk of those session variables incorrectly spilling into subsequent statements run in the same session:
rich enough expressions to make resorting to sql unnecessary in most cases
remove unused and redundant commands (e.g. system vs exec, query_vertical vs vertical_results ONCE)
remove complex commands that do many sql statements under the hood, if they can be scripted, e.g. sync_slave_with_master
remove over-verbose treatment of rpl test failures
scoped variables
parameters for the source command
remove dead code
Details:
Mentor:
Andrew Hutchings
Details:
Mentor:
Andrew Hutchings
Details:
Mentor:
Andrei Elkin
Details:
Mentor:
Igor Babaev
Details:
Mentor:
Varun Gupta
Details:
Mentor:
Andrei Elkin
Details:
Mentor:
Vladislav Vaintroub
Details:
Mentor:
Varun Gupta
Details:
Mentor:
Sachin Setiya
Details:
Mentor:
Sergei Golubchik
Details:
Mentor:
Sergei Golubchik
Details:
Mentor:
Vicentiu Ciorbaru
Details:
Mentor:
Igor Babaev
Details:
Mentor:
Vicentiu Ciorbaru
Details:
Mentor:
Jan Lindström
Details:
Mentor:
Oleksandr Byelkin
In 2021, we again participated in the . The believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently , , , ) and on , which allows you to scale your reads & writes. And we have , which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.
Please join us on to mingle with the community. You should also subscribe to (this is the main list where we discuss development).
To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.
SET session.server_id = @@global.server_id,
    session.gtid_domain_id=@@global.gtid_domain_id;

UPDATE [LOW_PRIORITY] [IGNORE] tbl_name
  SET col_name1={expr1|DEFAULT} [, col_name2={expr2|DEFAULT}] ...
  [WHERE where_condition]
  [ORDER BY ...]
  [LIMIT row_count]
  RETURNING select_expr [, select_expr ...]

CREATE AGGREGATE FUNCTION agg_sum(x INT) RETURNS DOUBLE
BEGIN
  DECLARE z DOUBLE DEFAULT 0;
  DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN z;
  LOOP
    FETCH GROUP NEXT ROW;
    SET z = z + x;
  END LOOP;
END|

[min(f) + (max(f)-min(f))/k * (i-1), min(f) + (max(f)-min(f))/k * i)

ANALYZE FAST TABLE tbl [ WITH n ROWS ] [SAMPLING p PERCENTS ]
  PERSISTENT FOR COLUMNS (col1 [IN RANGE r] [WITH k INTERVALS],...)

GRANT SELECT ON some_database.* TO a_user@%;
REVOKE SELECT ON some_database.secret_table FROM a_user@%;

ALTER TABLE tab ENCRYPTION=YES;
ALTER TABLE tab ENCRYPTION=NO;

The mysqlbinlog client program needs to be updated to support GTID. Here is a suggested list of things to be done:
The --start-position and --stop-position options should be able to take GTID positions; or maybe there should be new --start-gtid and --stop-gtid options, like --start-gtid=0-1-100,1-2-200,2-1-1000.
A GTID position means the point just after that GTID. So starting from GTID 0-1-100 and stopping at GTID 0-1-200, the first GTID output will probably be 0-1-101 and the last one 0-1-200. Note that if some domain is not specified in the position, it means to start from the beginning, respectively stop immediately, in that domain.
Starting and stopping GTID should work both with local files and with --read-from-remote-server. For the latter, there are a couple of extra things that need doing in the master-slave protocol; see get_master_version_and_clock() in sql/slave.cc.
At the end of the dump, put these statements, to reduce the risk of those session variables incorrectly spilling into subsequent statements run in the same session:
Probably some more things will come up during the work, but this looks like a reasonable start.
Details:
Mentor:
Brandon Nesterenko
Implement the standard behavior for
Also, this statement is supposed to work:
And these should not
Note that
should not list roles and privileges granted to PUBLIC (unless granted to xxx too), but
should, arguably, list them.
Details:
Mentor:
Oleksandr Byelkin
SP/PS (Stored Procedures / Prepared Statements) allocate memory until the PS cache of the SP is destroyed. There is no way to see how much memory is allocated and whether it grows with each execution (the first 2 executions can lead to new memory allocation, but later ones should not).
Task minimum: Status variables which count the memory used/allocated for SP/PS by thread and/or for the server.
Other ideas:
Automatically stop allocation in the debugging version after the second execution and raise an exception on any further attempt.
Information schema by threads and SP/PS with information about allocated and used memory
Information can be collected in the MEM_ROOTs of the SP/PS. By storing info about the status of the mem_root before execution and then checking it afterwards, newly allocated memory can be found. MEM_ROOT can be changed to have a debug mode which makes it read-only and which can be switched on after the second execution.
Details:
Mentor:
Oleksandr Byelkin
Background is this question on stackexchange: mariadb-compare-json. The task is to provide a function that can be used to compare 2 JSON documents for equality; the name could be e.g. JSON_NORMALIZE. JSON_COMPACT already takes care of removing spaces, but this is not sufficient. Keys need to be (recursively) sorted, and once spaces are removed, the documents can be compared as binary strings.
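A sketch of the intended behavior (function name tentative): after normalization the two documents below should compare equal as strings, so the SELECT would return 1.
SELECT JSON_NORMALIZE('{"b": 2, "a": 1}') = JSON_NORMALIZE('{ "a" : 1, "b" : 2 }');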
Details:
Mentor:
Vicențiu Ciorbaru
The following linear regression functions exist in a number of other DBMSs, such as Oracle, PostgreSQL:
Some have also been added to Columnstore.
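Example usage once implemented (table and column names invented for the example; the function names are the REGR_* names referenced by this task):
SELECT REGR_SLOPE(y, x)     AS slope,
       REGR_INTERCEPT(y, x) AS intercept,
       REGR_COUNT(y, x)     AS n
FROM observations;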
Details:
Mentor:
Nikita Malyavin
It would be useful if MariaDB had a utility that was able to parse frm files and print the DDL associated with the table. For example, it would be useful for users who performed a partial backup with mariadb-backup (partial-backup-and-restore-with-mariadb-backup) but forgot to also back up the table DDL, so they can't restore the tables using the documented restore process. mysqlfrm (mysqlfrm.py) is a tool that already exists and does similar things, but it doesn't seem very user-friendly: it needs to be able to contact the local MariaDB server, it also needs to be able to spawn a server instance, and it seems to need to create a bunch of files during this process, e.g.:
Details:
Mentor:
Vicențiu Ciorbaru
JSON_DETAILED function ( ) is described as
We now have a use case for it: optimizer trace output. The optimizer trace is too large to be copied in full; instead we use expressions like
Our experience is that JSON_DETAILED has some room for improvement when it comes to the quality of automatic JSON formatting.
Example:
Things to note:
empty lines at the start (right before/after the "range_scan_alternatives")
"analyzing_index_merge_union":[] occupies 3 lines where one would be sufficient.
the same goes for "ranges"
One can look at the JSON pretty-printer that is used by EXPLAIN FORMAT=JSON and the optimizer trace. It produces a better result (but it has room for improvement, too). Extra: in MySQL, the function is called JSON_PRETTY. We should add the ability to use this name as an alias.
Details:
Mentor:
Vicențiu Ciorbaru
Currently, histograms are stored as an array of 1-byte bucket bounds (SINGLE_PREC_HB) or 2-byte bucket bounds (DOUBLE_PREC_HB). The table storing the histograms supports different histogram formats but limits them to 256 bytes (hist_size is a tinyint).
This prevents us from supporting other kinds of histograms. The first low-hanging fruit would be to store the histogram bucket bounds precisely (like MySQL and PostgreSQL do, for example). The idea of this MDEV is to switch to JSON as the storage format for histograms. If we do that, it will:
Improve the histogram precision
Allow the DBAs to examine the histograms
Enable other histogram types to be collected/used.
Milestone-1: Let histogram_type have another possible value, tentative name "JSON". When that is set, let the ANALYZE TABLE syntax collect a JSON "histogram";
that is, the following should work:
this should produce {"hello":"world"}.
Milestone-2: produce JSON with histogram(). The exact format is not specified; for now, print the bucket endpoints and produce output like this:
Milestone-2, part#2: make mysql.column_stats.histogram a blob.
Milestone-3: Parse the JSON back into an array. Figure out how to use the JSON parser. Parse the JSON data produced in Milestone-2 back. For now, just print the parsed values to stderr. (Additional input provided on Zulip re parsing valid/invalid JSON histograms.)
Milestone-4: Make the code support different kinds of histograms. Currently, there's only one type of histogram. Smaller issue: histogram lookup functions assume the histogram stores fractions, not values. Bigger issue: memory allocation for histograms is de-coupled from reading the histograms. See alloc_statistics_for_table, read_histograms_for_table. The histogram object lives in a data structure that is bzero'ed first and then filled later (IIRC there was a bug, since fixed, where the optimizer attempted to use a bzero'ed histogram). Can histograms be collected or loaded in parallel by several threads? This was an (unintentional?) possibility but then it was disabled (see the TABLE_STATISTICS_CB object and its use).
Step #0: Make Histogram a real class. Here's the commit: 3ac32917ab6c42a5a0f9ed817dd8d3c7e20ce34d
Step 1: Separate classes for binary and JSON histograms. Need to introduce
and a factory function
for now, let Histogram_json::point_selectivity() and Histogram_json::range_selectivity() return 0.1 and 0.5, respectively.
Step 2: Demonstrate saving/loading of histograms. Now, the code already can:
collect a JSON histogram and save it.
when loading a histogram, figure from histogram_type column that this is JSON histogram being loaded, create Histogram_json and invoke the parse function.
Parse function at the moment only prints to stderr.
However, we should catch parse errors and make sure they are reported to the client.
The test may look like this:
Milestone-5: Parse the JSON data into a structure that allows lookups. The structure is
and it holds the data in KeyTupleFormat (See the comments for reasoning. There was a suggestion to use in_vector (This is what IN subqueries use) but it didn't work out)
Milestone 5.1 (aka Milestone 44)
Make a function to estimate selectivity using the data structure specified in previous milestone.
Make range_selectivity() accept key_range parameters.
(currently, they accept fractions, which is only suitable for binary histograms)
This means Histogram_binary will need to have access to min_value and max_value to compute the fractions.
Details:
Mentor:
Sergei Petrunia
my_vsnprintf() is used internally in the server as a portable printf replacement. And it's also exported to plugins as a service.
It supports a subset of printf formats and three extensions:
%s` means that a string should be quoted like an identifier
%b means that it's a binary string, not zero-terminated; printing won't stop at \0, so one should always specify the field width (like %.100b)
%M is used in error messages and prints the integer (errno) and the corresponding strerror() for it
gcc knows printf formats and checks whether the actual arguments match the format string, issuing a warning if they don't. Unfortunately there seems to be no easy way to teach gcc our extensions, so for now we have to disable printf format checks.
A better approach would be to use gcc-compatible format extensions, like the Linux kernel does. We should migrate to a different syntax for our extensions:
%sI to mean "print as an identifier"
%sB to mean "print a binary string"
%uE to mean "print an errno"
%sT to put a "..." as truncation indicator
Old formats can still be supported, or they can be removed; in the latter case the major version of the service should be increased to signal an incompatible change.
All error messages and all usages of my_vsnprintf should be changed to use the new syntax and gcc printf format checks should be enabled.
Details:
Mentor:
Sergei Golubchik
JSON_CONTAINS can be used to test for JSON object equality in some cases, but we seem to lack a clear JSON_EQUALS function.
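A sketch of the proposed function (name as suggested by this task), where key order and insignificant whitespace should not affect the result; the call below would be expected to return 1:
SELECT JSON_EQUALS('{"a": 1, "b": [1, 2]}', '{ "b" : [1, 2], "a" : 1 }');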
Details:
Mentor:
Vicențiu Ciorbaru
IO_CACHE has basically three read/write modes: only read, only write, and a sequential read/write FIFO mode SEQ_READ_APPEND.
While some performance-sensitive places, like the replication slave thread, use SEQ_READ_APPEND, it may be a bottleneck, since reads and writes are sequential (and co-sequential, i.e. reads and writes block each other).
The task is to implement a non-blocking mode or multi-reader, multi-writer use-case through a concurrent ring buffer implementation.
Possible approaches
Lock-free n-consumer, m-producer ring buffer
This implementation requires limiting the number of simultaneous accessors and reserving slots for them.
Lock-free implementations can contain busy waits, but no locks, except when the number of consumers or producers is exceeded. This can be controlled by a semaphore with a capacity equal to the number of cores.
This is the ideal way, but it can be overkill because of complicated busy loops and slot management.
This is also hard because writes can be bigger than the buffer. See buffer excess below.
Simple rwlock-based non-blocking approach
The bottleneck basically occurs because SEQ_READ_APPEND blocks for the whole duration of the buffer copy.
We can avoid it by moving the pointers first, thus allocating a place for copying, and then making the copy from/to the buffer without holding a lock.
An rwlock will be used to access the pointers, i.e. readers access IO_CACHE::end_of_file with a read lock to check the borders, while writers access it with a write lock.
Buffer excess
Excesses make things work sequentially.
When the buffer is full, a separate write buffer is created. When the write buffer is full, a flush happens.
Flushes wait for all writers to finish first, then lock the write buffer for flushing.
The read buffer can be flushed in a more relaxed way: there is no need to lock for flushing, but we have to lock for buffer allocation and wait for all writers.
Waiting for writers can be done with another rwlock.
Single-readerness
The real-world cases are mostly single-consumer, and that is essential for IO_CACHE: it is variable-length and has no underlying data format, so the reader always has to make at least two sequential reads (one to read the size and another to read the body).
Single-reader considerations can relax some conditions and ease the implementation.
io_cache_reserve api
We can add a function to reserve space for writing, for the case of writing big objects (both bigger than the write cache and smaller than it but big enough not to fit in the external buffer), for cases like copying one cache to another.
The function should return a future-like object, since we have to notify IO_CACHE back that the writing is finished (to trigger a flush, for example).
Details:
Mentor:
Nikita Malyavin
Formatting more complex strings in a SELECT statement can get awkward when there are many concat(), format(), etc calls involved.
It would be very cool and helpful to have a function that takes an input string and a formatting specification and returns string formatted using the rules the user passed in the specification.
A great example of such a function is the classic C printf function, which, in this context, would look something like:
SELECT printf('%s %s, %s', first_name, last_name, job_title) from employees;
But it doesn't necessarily need to look this way; an alternative syntax could be Python-ish, which would leverage the fact that the server already knows the datatype of each field used in the formatting scheme:
SELECT sformat('arg1: {}, arg2: {}', col1, col2) from table;
In that syntax one passes formatting options within the curly braces:
Ideally, this new function should use, behind the scenes, the existing builtin formatting functions in MariaDB (e.g. date_format(), format()) and even future formatting functions (e.g. MySQL's format_bytes(), format_pico_time()), so the syntax has to be designed in a smart way to accommodate easily future additions.
Details:
Mentor:
Sergei Golubchik
As part of the Jupyter Messaging protocol, the Jupyter frontend sends a complete_request message to the MariaDB kernel when the user invokes the code completer in a Jupyter notebook.
This message is handled in the do_complete function from the MariaDBKernel class.
In simpler words, whenever the user hits the key shortcut for code autocompletion in a notebook, the MariaDB kernel's do_complete function is called with a number of arguments that help the kernel understand what the user wants to autocomplete.
So the autocompletion infrastructure in the MariaDB kernel is already kindly provided by Jupyter, we only need to send back to Jupyter a list of suggestions based on the arguments that do_complete receives :-).
Ideally we should aim to enable at least database, table and column name completion and also SQL keyword completion.
But no worries, there are plenty of possibilities to extend the functionality even more if the accepted student turns out to be very productive :D
Details:
Mentor:
Robert Bindar
At this moment the MariaDB kernel is only capable of getting the results sets from the MariaDB client in HTML format and packing them in a Jupyter compatible format. Jupyter then displays them in notebooks like it would display Python Pandas dataframes.
Sure, the users can easily write SQL code to modify the content of a table like they would write in a classical command line database client.
But we want to go a bit further, we would love to have the capability to edit a result set returned by a SELECT statement (i.e. double click on table cells and edit) and have a button that users can press to generate a SQL statement that will update the content of the table via the MariaDB server.
Apart from interacting with the Jupyter frontend for providing this UI capability, we also have to implement a field integrity functionality so that we make sure users can't enter data that is not compatible with the datatype of the column as it is seen by the MariaDB server.
The project should start with a fair bit of research to understand how we can play with the Jupyter Messaging protocol to create the UI functionality and also to check other Jupyter kernels and understand what's the right and best approach for tackling this.
Details:
Mentor:
Andreia Hendea
Currently the MariaDB kernel doesn't impose any internal limits for the number of rows a user can SELECT in a notebook cell. Internally the kernel gets the result set from MariaDB and stores it in a pandas DataFrame, so users can use it with magic commands to chart data.
But this DataFrame is stored in memory, so if you SELECT a huge number of rows, say 500k or 1M, it's probably not a very good idea to create such a huge DataFrame.
We tested with 500k rows, and the DataFrame itself is not the biggest problem, it consumed around 500MB of memory. The problem is the amount of rows the browser needs to render, for 500k rows the browser tab with the notebook consumes around 2GB of memory, so the Jupyter frontend (JupyterLab, Jupyter Notebook) slows down considerably.
A potential solution is to introduce two new config options which would specify:
a limit for the number of rows the Jupyter notebook should render; a reasonable default value for this could be 50 rows, for instance (display_max_rows)
a limit for each SELECT statement, limit_max_rows, that the kernel would use to determine whether it should store the result set in memory in a DataFrame or store the result set on disk. A reasonable default value might be 100k rows.
The trickiest part of the project though is that, once the kernel writes a result set on disk, the charting magic commands need to detect that the data is not in memory, it is on disk, and they should find a smart mechanism for generating the chart from the disk data without loading the entire data in memory (which would defeat the whole purpose of the project). This might involve finding a new Python plotting library (instead of current matplotlib) that can accomplish the job.
Details:
Mentor:
Vlad Bogolin
Do you have an idea of your own, not listed above? Do let us know!
This page is licensed: CC BY-SA / Gnu FDL
We participated in the 2020. The believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently C, ODBC, Java) and on , which allows you to scale your reads & writes. And we have , which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.
Please join us on to mingle with the community. Don't forget to subscribe to (this is the main list where we discuss development).
A few handy tips for any interested students who are unsure which projects to choose:
To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.
SET session.server_id = @@global.server_id,
    session.gtid_domain_id=@@global.gtid_domain_id;

GRANT xxx TO PUBLIC;
REVOKE xxx FROM PUBLIC;
SHOW GRANTS FOR PUBLIC;
CREATE ROLE PUBLIC;
DROP ROLE PUBLIC;
SET ROLE PUBLIC;
GRANT PUBLIC TO xxx;
REVOKE PUBLIC FROM xxx;
SHOW GRANTS FOR xxx;
SHOW GRANTS;

REGR_SLOPE
REGR_INTERCEPT
REGR_COUNT
REGR_R2
REGR_AVGX
REGR_AVGY
REGR_SXX
REGR_SYY
REGR_SXY

[ec2-user@ip-172-30-0-249 ~]$ cd /tmp
[ec2-user@ip-172-30-0-249 tmp]$ sudo mysqlfrm --server=root:@localhost:3306 /var/lib/mysql/db1/tab.frm --port=12345 --user=mysql
# Source on localhost: ... connected.
# Spawning server with --user=mysql.
# Starting the spawned server on port 12345 ... done.
# Reading .frm files
#
# Reading the tab.frm file.
#
# CREATE statement for /var/lib/mysql/db1/tab.frm:
#
CREATE TABLE `db1`.`tab` (
`id` int(11) NOT NULL,
`str` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
#...done.

Represents JSON in the most understandable way, emphasizing nested structures.

SELECT
  JSON_DETAILED(JSON_EXTRACT(trace, '$**.analyzing_range_alternatives'))
FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

source mdev19160-data.sql
mysql> select JSON_DETAILED(JSON_EXTRACT(a, '$**.analyzing_range_alternatives')) from t200\G
*************************** 1. row ***************************
JSON_DETAILED(JSON_EXTRACT(a, '$**.analyzing_range_alternatives')): [
{
"range_scan_alternatives":
[
{
"index": "a_b",
"ranges":
[
"2 <= a <= 2 AND 4 <= b <= 4"
],
"rowid_ordered": true,
"using_mrr": false,
"index_only": true,
"rows": 1,
"cost": 1.1752,
"chosen": true
}
],
"analyzing_roworder_intersect":
{
"cause": "too few roworder scans"
},
"analyzing_index_merge_union":
[
]
}
]

CREATE TABLE mysql.column_stats (
min_value varbinary(255) DEFAULT NULL,
max_value varbinary(255) DEFAULT NULL,
...
hist_size TINYINT UNSIGNED,
hist_type ENUM('SINGLE_PREC_HB','DOUBLE_PREC_HB'),
histogram varbinary(255),
...{ "hello":"world"}SET histogram_type='json';
ANALYZE TABLE t1 persisent FOR ALL;
SELECT histogram FROM mysql.column_stats WHERE TABLE_NAME='t1' ;[
"value1",
"value2",
...
]

class Histogram -- interface, no data members.
class Histogram_binary : public Histogram
class Histogram_json : public Histogram

Histogram *create_histogram(Histogram_type)

INSERT INTO mysql.column_stats VALUES('test','t1','column1', .... '[invalid, json, data']);
FLUSH TABLES;
# this should print some descriptive test
--error NNNN
SELECT * FROM test.t1;

std::vector<std::string>

-- Print 'arg1: col1, arg2: col2' where col1 from table is of datetime type and should be printed as: 'Sunday November 2021'
SELECT sformat('arg1: {%W %M %Y}, arg2: {}', col1, col2) FROM table;

Also see the List of beginner friendly issues and issues labelled gsoc20 from the MariaDB Issue Tracker.
(Based on conversation with Igor) There are a lot of subquery conditions out there that are inexpensive to evaluate and have good selectivity. If we just implement MDEV-83, we may get regressions. We need to take the subquery condition's selectivity into account. It is difficult to get a meaningful estimate for an arbitrary, correlated subquery predicate. One possible solution is to measure selectivity during execution and reattach predicates on the fly. We don't want to change the query plan all the time; one way to dynamically move items between item trees is to wrap them inside Item_func_trig_cond so we can switch them on and off.
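As an illustration (hypothetical tables, not part of the task description), a correlated subquery predicate of the kind this task is concerned with might look like:

-- Hypothetical schema: the EXISTS condition is cheap (one indexed lookup per
-- outer row) and highly selective, so evaluating it early would pay off.
SELECT o.id, o.total
FROM orders o
WHERE o.total > 100
  AND EXISTS (SELECT 1 FROM vip_customers v WHERE v.customer_id = o.customer_id);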
Details:
Mentor:
Igor Babaev
Histograms with equal-width bins are easy to construct using samples. For this it's enough to look through the given sample set and, for each value from it, to figure out what bin this value can be placed in. Each bin requires only one counter. Let f be a column of a table with N rows and n be the number of samples by which the equal-width histogram of k bins for this column is constructed. Suppose that after looking through all sample rows the counters created for the histogram bins contain the numbers c[1],..,c[k]. Then m[i] = c[i]/n * 100 is the percentage of the rows whose values of f are expected to be in the interval [min(f) + (max(f)-min(f))/k * (i-1), min(f) + (max(f)-min(f))/k * i).
It means that if the sample rows have been chosen randomly, the expected number of rows with values of f from this interval can be approximated by the number m[i]/100 * N. To collect such statistics it is suggested to use the following variant of the ANALYZE TABLE command:
Here:
'WITH n ROWS' provides an estimate for the number of rows in the table in the case when this estimate cannot be obtained from statistical data.
'SAMPLING p PERCENTS' provides the percentage of sample rows to collect statistics. If this is omitted, the number is taken from the system variable samples_ratio.
'IN RANGE r' sets the range of equal-width bins of the histogram built for the column col1. If this is omitted and the min and max values for the column can be read from statistical data, then the histogram is built for the range [min(col1), max(col1)]. Otherwise the range [MIN_type(col1), MAX_type(col1)] is considered. The values beyond the given range, if any, are also taken into account in two additional bins.
WITH k INTERVALS says how many bins are included in the histogram. If it is omitted this value is taken from the system variable histogram_size.
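A hypothetical invocation of the proposed syntax (the table and column names are made up) could look like:

ANALYZE FAST TABLE orders WITH 1000000 ROWS SAMPLING 5 PERCENTS
PERSISTENT FOR COLUMNS (amount WITH 20 INTERVALS);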
Details:
Mentor:
Vicentiu Ciorbaru
Add support for FULL OUTER JOIN. One way to implement it is to rewrite the query
into the following union all:
Here t1.a is some non-nullable column of t1 (e.g. the column of a single-column primary key).
Details:
Mentor:
Igor Babaev
supported in MySQL-8.0 and MSSQL
Details:
Mentor:
Igor Babaev
The mysqlbinlog client program needs to be updated to support GTID. Here is a suggested list of things to be done:
The --start-position and --stop-position options should be able to take
GTID positions; or maybe there should be new --start-gtid and --stop-gtid
options. Like --start-gtid=0-1-100,1-2-200,2-1-1000.
A GTID position means the point just after that GTID. So starting from GTID 0-1-100 and stopping at GTID 0-1-200, the first GTID output will probably be 0-1-101 and the last one 0-1-200. Note that if some domain is not specified in the position, it means to start from the beginning, or respectively to stop immediately, in that domain.
Starting and stopping GTID should work both with local files, and with --read-from-remote-server. For the latter, there are a couple of extra things that need doing in the master-slave protocol, see get_master_version_and_clock() in sql/slave.cc.
At the end of the dump, put these statements, to reduce the risk of those session variables incorrectly spilling into subsequent statements run in the same session:
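The statements referred to are the ones shown in the code listings at the end of this page, i.e. something like:

SET session.server_id = @@global.server_id,
    session.gtid_domain_id = @@global.gtid_domain_id;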
Details:
Mentor:
Andrei Elkin
As a follow-on to MDEV-4691 we would like GSSAPI encryption (in addition to authentication) support in MariaDB. I am told that the current plan is to create a plugin interface and then we can build GSSAPI encryption on top of that, so here is a ticket for that. From having written GSSAPI for the internal interface, there were a couple things I would like to see in the plugin encryption interface.
First, GSSAPI is weird in that it does authentication before encryption (TLS/SSL are the other way around, establishing an encrypted channel and then doing authentication over it). Of course support for this is needed, but more importantly, packets must be processed in a fully serialized fashion. This is because encrypted packets may be queued while one end of the connection is still finishing up processing the authentication handshake. One way to do this is registering "handle" callbacks with connection-specific state, but there are definitely others.
Additionally, for whatever conception there ends up being of authentication and encryption, it needs to be possible to share more data than just a socket between them. The same context will be used for authentication and encryption, much as an SSL context is (except of course we go from authentication to encryption and not the other way around).
This ties into an issue of dependency. If authentication plugins are separate entities from encryption plugins in the final architecture, it might make sense to do mix-and-match authentication with encryption. However, there are cases - and GSSAPI is one - where doing encryption requires a certain kind of authentication (or vice versa). You can't do GSSAPI encryption without first doing GSSAPI authentication. (Whether or not it's permitted to do GSSAPI auth->encryption all over a TLS channel, for instance, is not something I'm concerned about.)
Finally, encrypted messages are larger than their non-encrypted counterparts. The transport layer should cope with this so that plugins don't have to think about reassembly, keeping in mind that there may not be a way to get the size of a message when encrypted without first encrypting it. It's unfortunately been a little while since I wrote that code, but I think those were the main things that we'll need for GSSAPI. Thanks!
Details:
Mentor:
Vicențiu Ciorbaru
With a few exceptions, most native aggregate functions are supported as window functions. In MDEV-7773, support for creating custom aggregate functions was added. This task proposes to extend that feature and allow custom aggregate functions to be used as window functions. An example of creating a custom aggregate function is given below:
This function can be used in the following query:
After this task is complete the following must also work:
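For reference, based on the agg_sum example that appears in the code listings at the end of this page, the end goal is that a query like the following works:

SELECT id, agg_sum(amount) OVER (ORDER BY id) AS running_total
FROM balances;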
Details:
Mentor:
Varun Gupta
mysqltest has a lot of historical problems:
ad hoc parser, weird limitations
commands added as needed with no view over the total language structure
historical code issues (e.g. casts that became unnecessary 10 years ago), etc.
A lot can be done to improve it. Ideas:
control structures, else in if, break and continue in while, for (or foreach) loop
proper expression support in let, if, etc
rich enough expressions to make resorting to sql unnecessary in most cases
remove unused and redundant commands (e.g. system vs exec, query_vertical vs vertical_results ONCE)
remove complex commands that do many sql statements under the hood, if they can be scripted, e.g. sync_slave_with_master
remove over-verbose treatment of rpl test failures
scoped variables
parameters for the source command
remove dead code
Details:
Mentor:
Sergei Golubchik
A multiple-table UPDATE first performs join operations, then it updates the matching rows. A multiple-table UPDATE returning a result set does the following:
first performs join operations
for each row of the result of the join it calculates some expressions over the columns of the join and forms from them a row of the returned result set
after this it updates the matching rows
A multiple-table DELETE first performs join operations, then it deletes the matching rows. A multiple-table DELETE returning a result set does the following:
first performs join operations
for each row of the result of the join it calculates some expressions over the columns of the join and forms from them a row of the returned result set
after this it deletes the matching rows
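A rough, hypothetical sketch of what such statements could look like (the exact grammar is part of the task; the tables and columns are made up):

-- hypothetical syntax: multi-table UPDATE returning the changed rows
UPDATE t1 JOIN t2 ON t2.t1_id = t1.id
SET t1.price = t1.price * 1.1
WHERE t2.active = 1
RETURNING t1.id, t1.price, t2.note;

-- hypothetical syntax: multi-table DELETE returning the deleted rows
DELETE t1 FROM t1 JOIN t2 ON t2.t1_id = t1.id
WHERE t2.expired = 1
RETURNING t1.id, t2.note;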
Details:
Mentor:
Igor Babaev
As MariaDB is getting more storage engines and as they're getting more features, MariaDB can optionally use more and more compression libraries for various purposes. InnoDB, TokuDB, RocksDB — they all can use different sets of compression libraries. Compiling them all in would result in a lot of run-time/rpm/deb dependencies, most of which will never be used by most users. Not compiling them in would result in requests to compile them in. While most users don't use all these libraries, many users use some of these libraries. A solution could be to load these libraries on request, without creating a packaging dependency. There are different ways to do it:
hide all compression libraries behind a single unified compression API. Either develop our own or use something like Squash. This would require changing all engines to use this API
use the same approach as in server services — create a service per compression library; a service implementation will just return an error code for any function invocation if the corresponding library is not installed. This way we could perhaps avoid modifying all affected storage engines.
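As an illustration of the user-visible side (the plugin name is hypothetical; innodb_compression_algorithm is an existing variable), loading a compression library on demand could end up looking like:

INSTALL SONAME 'provider_lz4';  -- hypothetical loadable compression provider
SET GLOBAL innodb_compression_algorithm = 'lz4';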
Details:
Mentor:
Sergei Golubchik
SP/PS (Stored Procedures / Prepared Statements) allocate memory until the SP/PS cache is destroyed. There is no way to see how much memory is allocated and whether it grows with each execution (the first two executions can lead to new memory allocation, but later ones should not).
Task minimum:
Status variables which count the memory used/allocated for SP/PS by thread and/or for the server.
Other ideas:
Automatically stop allocation in the debugging version after the second execution and raise an exception on any further allocation attempt.
An Information Schema table listing threads and SP/PS with information about allocated and used memory.
Information can be collected in the MEM_ROOTs of the SP/PS. By storing the status of the mem_root before execution and checking it afterwards, newly allocated memory can be found. MEM_ROOT can be extended with a debug mode that makes it read-only, which can be switched on after the second execution.
Details:
Mentor:
Oleksandr Byelkin
MariaDB ColumnStore supports DECIMAL with some limitations:
We do not support the full DECIMAL range that is in MariaDB
In several places in the code we convert the DECIMAL to DOUBLE during execution, therefore losing precision.
Implementing this will likely require the following:
Implementation of methods to handle MariaDB's DECIMAL format
Support for a longer than 8-byte numeric column type (there is an InfiniDB tree with work for this already)
Modification of the primitives processor for the math
Modification of the function expression processor to handle the new type
Version upgrade support for DECIMAL from the current form to the new form
Details
Mentor:
Andrew Hutchings
CS as of 1.4.2 relies on glibc for regex processing. We need to replace glibc with re2 for LIKE, REGEX and other facilities where it affects performance.
Identify places with glibc regex function invocations
Pick the invocations that significantly affect the timings of the query
Replace glibc regex calls with the appropriate re2 calls
Details
Mentor:
Roman
RowGroup is the unit of data sent around the CS cluster. It is basically a set of records for fixed-size data types plus an additional storage area for binary data, e.g. strings. Right now there are lots of RowGroup methods that access RowGroup fixed-size data using an extra level of indirection. Here is an example: return ((int64_t) &data[offsets[colIndex]]); This expression uses an extra assembly instruction to calculate the effective address. We want to remove the 'offsets[colIndex]' part of this and similar expressions in the RowGroup code.
Details
Mentor:
Roman
As of 1.4.1 CS uses two-phase sorting. Here are the phases:
Presort partial runs of data.
Merge the presorted partial runs produced during the 1st phase. Here is a more detailed explanation of how sorting works as of 1.4.1:
CS gets a portion of record data from previous steps of the query execution (an RGData instance from the ring buffer of RGData-s) and produces a sorting run out of it using the existing sorting class LimitedOrderBy. If the query contains LIMIT then we apply it at this phase; this allows us to significantly reduce the data set cardinality. If the query contains LIMIT + OFFSET then CS builds a sorted run of records that is up to LIMIT+OFFSET in size. CS does this step in parallel, dividing the whole data set into k runs, where k is governed by a session variable - infinidb/columnstore_orderby_threads. At this phase CS tries to preallocate memory in QUEUE_RESERVE_SIZE batches. CS then merges and sorts the k presorted partial runs produced by the previous phase in a single thread. If the query contains the DISTINCT keyword, CS rebuilds a hash map to preserve uniqueness. We want to make the 2nd phase also parallel, using range partitioning of the presorted runs produced by the 1st phase. After the 1st phase finishes we know the distribution of the sorting key values and can thus divide the key value range into regions - buckets. Every 2nd phase thread takes values from the corresponding region buckets (containing the same value region) from every 1st phase sorted run. Then all 2nd phase threads sort their runs in parallel. In the end we put the sorted regions, in ascending order of the key values, into the output stream.
Details
Mentor:
Roman
We need a bitmap to store NULL/empty values instead of in-column values for this.
Details
Mentor:
Andrew Hutchings
CS currently has only very rudimentary query optimization capabilities and we want to improve the situation. We are considering using the Server's optimizer for this purpose, but the Server needs statistics, namely value distribution histograms and Number of Distinct Values (NDV) histograms. There are different levels of complexity for the task:
implement a standalone segment file reader that in the end populates both mysql.column_stats and mysql.table_stats using an out-of-band MariaDB client connection
implement ANALYZE TABLE functionality for the ColumnStore engine
implement ANALYZE TABLE and histograms with equal-width bins for value distribution histograms (similar to MDEV-12313) together with NDV histograms to decrease I/O
We expect to have both unit and regression tests but this is optional.
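A sketch of what the ANALYZE TABLE variant could look like from the SQL side, assuming the engine populates the server's statistics tables (the table name is made up):

ANALYZE TABLE cs_orders PERSISTENT FOR ALL;  -- hypothetical ColumnStore table
SELECT column_name, min_value, max_value, histogram
FROM mysql.column_stats
WHERE db_name = 'test' AND table_name = 'cs_orders';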
Details
Mentor:
Roman
We need an ORM-style NoSQL read API to go along with the bulk write API of mcsapi. This will likely take the form of:
A reader in ExeMgr which will convert messages from mcsapi into jobs
Code in mcsapi to send/receive the messages. Although ExeMgr can already receive messages with an execution plan, the format is very complex and the ABI breaks easily (we often send whole C++ objects). We should look at other ORM frameworks for inspiration for the API design. This task is to do the design for this API.
Details:
Mentor:
Andrew Hutchings
CS uses a Volcano processing approach, working on one value at a time. This is a very inefficient way to handle analytics workloads, which usually use lots of aggregation functions in projections, filtering or sorting. We are interested in using JIT for basic aggregation functions: sum, avg, count, min, max. The patch must compile and run a program that processes and returns the aggregation function result. We wrote this description with LLVM in mind as it is widespread and has lots of examples in the wild. I suggest starting by looking at RowAggregation::addRowGroup() from ./utils/rowgroup/rowaggregation.cpp to see what it takes to get the avg() function result. Here is the link on how to quickly build a CS developer environment.
Details:
Mentor:
Roman
CS currently has only very rudimentary query optimization capabilities and we want to improve the situation. We are considering using the Server's optimizer for this purpose, but the Server needs statistics, namely value distribution histograms and Number of Distinct Values (NDV) histograms. There are different levels of complexity for the task:
implement a standalone segment file reader that in the end populates both mysql.column_stats and mysql.table_stats using an out-of-band MariaDB client connection
implement ANALYZE TABLE functionality for the ColumnStore engine
implement ANALYZE TABLE and histograms with equal-width bins for value distribution histograms (similar to MDEV-12313) together with NDV histograms to decrease I/O
We expect to have both unit and regression tests but this is optional.
Details:
Mentor:
Roman
Currently, histograms are stored as an array of 1-byte bucket bounds (SINGLE_PREC_HB) or 2-byte bucket bounds (DOUBLE_PREC_HB). The table storing the histograms supports different histogram formats but limits them to 256 bytes (hist_size is a TINYINT).
This prevents us from supporting other kinds of histograms. The first low-hanging fruit would be to store the histogram bucket bounds precisely (like MySQL and PostgreSQL do, for example). The idea of this MDEV is to switch to JSON as the storage format for histograms.
Details:
Mentor:
Sergei Petrunia
Do you have an idea of your own, not listed above? Do let us know!
This page is licensed: CC BY-SA / Gnu FDL
We participated in the Google Summer of Code 2019. The MariaDB Foundation believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently C, ODBC, Java) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. And we have MariaDB ColumnStore, which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.
Please join us on Zulip and on IRC to mingle with the community. Don't forget to subscribe to maria-developers@lists.launchpad.net (this is the main list where we discuss development).
A few handy tips for any interested students who are unsure which projects to choose:
Blog post from a former GSoC student & mentor
To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.
(Based on conversation with Igor) There are a lot of subquery conditions out there that are inexpensive to evaluate and have good selectivity. If we just implement MDEV-83, we may get regressions. We need to take the subquery condition's selectivity into account. It is difficult to get a meaningful estimate for an arbitrary, correlated subquery predicate. One possible solution is to measure selectivity during execution and reattach predicates on the fly. We don't want to change the query plan all the time; one way to dynamically move items between item trees is to wrap them inside Item_func_trig_cond so we can switch them on and off.
An index on expression means something like
in this case the optimizer should be able to use the index. This task naturally splits into two steps:
add expression matching into the optimizer, use it for generated columns. Like in CREATE TABLE t1 (a int, b int, v INT GENERATED ALWAYS AS (a/2+b), INDEX (v));
support the syntax to create an index on an expression directly; this will automatically create a hidden generated column under the hood
original task description is visible in the history
Histograms with equal-width bins are easy to construct using samples. For this it's enough to look through the given sample set and, for each value from it, to figure out what bin this value can be placed in. Each bin requires only one counter. Let f be a column of a table with N rows and n be the number of samples by which the equal-width histogram of k bins for this column is constructed. Suppose that after looking through all sample rows the counters created for the histogram bins contain the numbers c[1],..,c[k]. Then m[i] = c[i]/n * 100 is the percentage of the rows whose values of f are expected to be in the interval [min(f) + (max(f)-min(f))/k * (i-1), min(f) + (max(f)-min(f))/k * i).
It means that if the sample rows have been chosen randomly, the expected number of rows with values of f from this interval can be approximated by the number m[i]/100 * N. To collect such statistics it is suggested to use the following variant of the ANALYZE TABLE command:
Here:
'WITH n ROWS' provides an estimate for the number of rows in the table in the case when this estimate cannot be obtained from statistical data.
'SAMPLING p PERCENTS' provides the percentage of sample rows to collect statistics. If this is omitted, the number is taken from the system variable samples_ratio.
'IN RANGE r' sets the range of equal-width bins of the histogram built for the column col1. If this is omitted and the min and max values for the column can be read from statistical data, then the histogram is built for the range [min(col1), max(col1)]. Otherwise the range [MIN_type(col1), MAX_type(col1)] is considered. The values beyond the given range, if any, are also taken into account in two additional bins.
Add support for FULL OUTER JOIN. One way to implement it is to rewrite the query
into the following union all:
Here t1.a is some non-nullable column of t1 (e.g. the column of a single-column primary key).
supported in MySQL-8.0 and MSSQL
The SQL Standard allows the use of EXCEPT ALL and INTERSECT ALL as set operations. Currently MariaDB Server does not support them. The goal of this task is to support EXCEPT ALL and INTERSECT ALL:
at the syntax level - allow the operators EXCEPT ALL and INTERSECT ALL in a query expression body
at the execution level - implement these operations employing temporary tables (the implementation could use an idea similar to that used for the existing implementation of the INTERSECT operation); a usage sketch follows this list.
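A sketch of the queries the finished feature should accept (multiset semantics, so duplicate rows are preserved):

SELECT a FROM t1 EXCEPT ALL SELECT a FROM t2;
SELECT a FROM t1 INTERSECT ALL SELECT a FROM t2;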
Add an UPDATE operation that returns a result set of the changed rows to the client.
I'm not exactly sure what the corresponding multiple-table syntax should look like, or if it is possible at all. But already having it for single-table updates would be a nice feature.
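For the single-table case the requested behaviour would be roughly (hypothetical table):

UPDATE accounts
SET balance = balance - 100
WHERE id = 42
RETURNING id, balance;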
Idea
The purpose of this task is to create an easy-to-use facility for setting up a new MariaDB replication slave. Setting up a new slave currently involves: 1) installing MariaDB with the initial database; 2) pointing the slave to the master with CHANGE MASTER TO; 3) copying initial data from the master to the slave; and 4) starting the slave with START SLAVE. The idea is to automate step (3), which currently needs to be done manually. The syntax could be something as simple as
This would then connect to the master that is currently configured. It will load a snapshot of all the data on the master, and leave the slave position at the point of the snapshot, ready for START SLAVE to continue replication from that point.
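A possible end-to-end flow on the new slave, assuming the proposed command keeps the working name used in this description (host and credentials are placeholders):

CHANGE MASTER TO MASTER_HOST='master.example.com', MASTER_USER='repl',
  MASTER_PASSWORD='secret', MASTER_USE_GTID=slave_pos;
LOAD DATA FROM MASTER;  -- proposed command: snapshot all data from the master
START SLAVE;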
Implementation:
The idea is to do this non-blocking on the master, in a way that works for any
storage engine. It will rely on row-based replication to be used between the
master and the slave.
At the start of LOAD DATA FROM MASTER, the slave will enter a special
provisioning mode. It will start replicating events from the master at the
master's current position.
The master dump thread will send binlog events to the slave as normal. But in
addition, it will interleave a dump of all the data on the master contained in
tables, views, or stored functions. Whenever the dump thread would normally go
to sleep waiting for more data to arrive in the binlog, the dump thread will
instead send another chunk of data in the binlog stream for the slave to apply.
A "chunk of data" can be:
A CREATE OR REPLACE TABLE / VIEW / PROCEDURE / FUNCTION
A range of N rows (N=100, for example). Each successive chunk will do a range scan on the primary key from the end position of the last chunk.
Sending data in small chunks avoids the need for long-lived table locks or transactions that could adversely affect master performance. The slave will connect in GTID mode. The master will send dumped chunks in a separate domain id, allowing the slave to process chunks in parallel with normal data. During the provisioning, all normal replication events from the master will arrive on the slave, and the slave will attempt to apply them locally. Some of these events will fail to apply, since the affected table or row may not yet have been loaded. In the provisioning mode, all such errors will be silently ignored. Proper locking (e.g. the isolation level) must be used on the master when fetching chunks, to ensure that updates for any row will always be applied correctly on the slave, either in a chunk, or in a later row event.
In order to make the first version of this feature feasible to implement in a reasonable amount of time, it should set a number of reasonable restrictions (which could be relaxed in a later version of the feature):
Give up with an error if the slave is not configured for GTID mode (MASTER_USE_GTID != NO).
Give up with an error if the slave receives any event in statement-based binlogging (so the master must be running in row-based replication mode, and no DDL must be done while the provisioning is running).
Give up with an error if the master has a table without primary key.
Secondary indexes will be enabled during the provisioning; this means that tables with large secondary indexes could be expensive to provision.
As a follow-on to we would like GSSAPI encryption (in addition to authentication) support in MariaDB. I am told that the current plan is to create a plugin interface and then we can build GSSAPI encryption on top of that, so here is a ticket for that. From having written GSSAPI for the internal interface, there were a couple things I would like to see in the plugin encryption interface.
First, GSSAPI is weird in that it does authentication before encryption (TLS/SSL are the other way around, establishing an encrypted channel and then doing authentication over it). Of course support for this is needed, but more importantly, packets must be processed in a fully serialized fashion. This is because encrypted packets may be queued while one end of the connection is still finishing up processing the authentication handshake. One way to do this is registering "handle" callbacks with connection-specific state, but there are definitely others.
Additionally, for whatever conception there ends up being of authentication and encryption, it needs to be possible to share more data than just a socket between them. The same context will be used for authentication and encryption, much as an SSL context is (except of course we go from authentication to encryption and not the other way around).
This ties into an issue of dependency. If authentication plugins are separate entities from encryption plugins in the final architecture, it might make sense to do mix-and-match authentication with encryption. However, there are cases - and GSSAPI is one - where doing encryption requires a certain kind of authentication (or vice versa). You can't do GSSAPI encryption without first doing GSSAPI authentication. (Whether or not it's permitted to do GSSAPI auth->encryption all over a TLS channel, for instance, is not something I'm concerned about.)
Finally, encrypted messages are larger than their non-encrypted counterparts. The transport layer should cope with this so that plugins don't have to think about reassembly, keeping in mind that there may not be a way to get the size of a message when encrypted without first encrypting it. It's unfortunately been a little while since I wrote that code, but I think those were the main things that we'll need for GSSAPI. Thanks!
Please add a RETURNING option to INSERT. Example from PostgreSQL
Inspired by:
This could make it easier to write statements which work with both MariaDB and PostgreSQL. It might also improve compatibility with Oracle RDBMS.
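Translated to MariaDB, the requested behaviour would look roughly like:

INSERT INTO t1 (name) VALUES ('test') RETURNING id;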
With a few exceptions, most native aggregate functions are supported as window functions. In MDEV-7773, support for creating custom aggregate functions was added. This task proposes to extend that feature and allow custom aggregate functions to be used as window functions. An example of creating a custom aggregate function is given below:
This function can be used in the following query:
After this task is complete the following must also work:
Currently no true LOCK=NONE exists on the slave. ALTER TABLE is first committed on the master, then it is replicated on the slaves. The purpose of this task is to create a true LOCK=NONE.
Implementation Idea
The master will write BEGIN_DDL_EVENT to the binlog after it hits ha_prepare_inplace_alter_table. Then the master will write a QUERY_EVENT to the binlog with the actual ALTER query. On commit/rollback the master will write COMMIT_DDL_EVENT/ROLLBACK_DDL_EVENT. On the slave there will be a pool of threads (configurable via a global variable) which will apply these DDLs. On receiving BEGIN_DDL_EVENT, the slave thread will pass the QUERY_EVENT to one of the worker threads. The worker thread will execute until ha_inplace_alter_table. The actual commit_inplace_alter will be called by the SQL thread. If the SQL thread receives some kind of rollback event, it will signal the worker thread to stop executing the ALTER. If none of the worker threads are available then the event will be enqueued; if a rollback event is then received we will simply discard the event from the queue, and if a commit event is received then the SQL thread will synchronously process the DDL event.
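The user-facing statement whose replication this task is about is an ordinary non-blocking ALTER, for example:

ALTER TABLE t1 ADD COLUMN c INT, ALGORITHM=INPLACE, LOCK=NONE;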
mysqltest has a lot of historical problems:
ad hoc parser, weird limitations
commands added as needed with no view over the total language structure
historical code issues (e.g. casts that became unnecessary 10 years ago), etc.
A lot can be done to improve it. Ideas:
control structures, else in if, break and continue in while, for (or foreach) loop
proper expression support in let, if, etc
rich enough expressions to make resorting to sql unnecessary in most cases
remove unused and redundant commands (e.g. system vs exec, query_vertical vs vertical_results ONCE)
A multiple-table UPDATE first performs join operations, then it updates the matching rows. A multiple-table UPDATE returning a result set does the following:
first performs join operations
for each row of the result of the join it calculates some expressions over the columns of the join and forms from them a row of the returned result set
after this it updates the matching rows
A multiple-table DELETE first performs join operations, then it deletes the matching rows. A multiple-table DELETE returning a result set does the following:
first performs join operations
for each row of the result of the join it calculates some expressions over the columns of the join and forms from them a row of the returned result set
after this it deletes the matching rows
As MariaDB is getting more storage engines and as they're getting more features, MariaDB can optionally use more and more compression libraries for various purposes. InnoDB, TokuDB, RocksDB — they all can use different sets of compression libraries. Compiling them all in would result in a lot of run-time/rpm/deb dependencies, most of which will never be used by most users. Not compiling them in would result in requests to compile them in. While most users don't use all these libraries, many users use some of these libraries. A solution could be to load these libraries on request, without creating a packaging dependency. There are different ways to do it:
hide all compression libraries behind a single unified compression API. Either develop our own or use something like Squash. This would require changing all engines to use this API
use the same approach as in server services — create a service per compression library; a service implementation will just return an error code for any function invocation if the corresponding library is not installed. This way we could perhaps avoid modifying all affected storage engines.
SP/PS (Stored Procedures / Prepared Statements) allocate memory until the SP/PS cache is destroyed. There is no way to see how much memory is allocated and whether it grows with each execution (the first two executions can lead to new memory allocation, but later ones should not).
Task minimum:
Status variables which count the memory used/allocated for SP/PS by thread and/or for the server.
Other ideas:
Automatically stop allocation in the debugging version after the second execution and raise an exception on any further allocation attempt.
An Information Schema table listing threads and SP/PS with information about allocated and used memory.
Information can be collected in the MEM_ROOTs of the SP/PS. By storing the status of the mem_root before execution and checking it afterwards, newly allocated memory can be found. MEM_ROOT can be extended with a debug mode that makes it read-only, which can be switched on after the second execution.
MariaDB ColumnStore supports DECIMAL with some limitations:
We do not support the full DECIMAL range that is in MariaDB
In several places in the code we convert the DECIMAL to DOUBLE during execution, therefore losing precision.
Implementing this will likely require the following:
Implementation of methods to handle MariaDB's DECIMAL format
Support for a longer than 8-byte numeric column type (there is an InfiniDB tree with work for this already)
Modification of the primitives processor for the math
Modification of the function expression processor to handle the new type
We need an ORM-style NoSQL read API to go along with the bulk write API of mcsapi. This will likely take the form of:
A reader in ExeMgr which will convert messages from mcsapi into jobs
Code in mcsapi to send/receive the messages. Although ExeMgr can already receive messages with an execution plan, the format is very complex and the ABI breaks easily (we often send whole C++ objects). We should look at other ORM frameworks for inspiration for the API design. This task is to do the design for this API.
CS uses a Volcano processing approach, working on one value at a time. This is a very inefficient way to handle analytics workloads, which usually use lots of aggregation functions in projections, filtering or sorting. We are interested in using JIT for basic aggregation functions: sum, avg, count, min, max. The patch must compile and run a program that processes and returns the aggregation function result. We wrote this description with LLVM in mind as it is widespread and has lots of examples in the wild. I suggest starting by looking at RowAggregation::addRowGroup() from ./utils/rowgroup/rowaggregation.cpp to see what it takes to get the avg() function result. Here is the link on how to quickly build a CS developer environment.
CS currently has only very rudimentary query optimization capabilities and we want to improve the situation. We are considering using the Server's optimizer for this purpose, but the Server needs statistics, namely value distribution histograms and Number of Distinct Values (NDV) histograms. There are different levels of complexity for the task:
implement a standalone segment file reader that in the end populates both mysql.column_stats and mysql.table_stats using an out-of-band MariaDB client connection
implement ANALYZE TABLE functionality for the ColumnStore engine
implement ANALYZE TABLE and histograms with equal-width bins for value distribution histograms (similar to MDEV-12313) together with NDV histograms to decrease I/O
We expect to have both unit and regression tests but this is optional.
Do you have an idea of your own, not listed above? Do let us know!
This page is licensed: CC BY-SA / Gnu FDL
[min(f) + (max(f)-min(f))/k * (i-1), min(f) + (max(f)-min(f))/k * i)

ANALYZE FAST TABLE tbl [ WITH n ROWS ] [SAMPLING p PERCENTS ]
PERSISTENT FOR COLUMNS (col1 [IN RANGE r] [WITH k INTERVALS],...)

SELECT t1.*, t2.* FROM t1 FULL OUTER JOIN t2 ON P(t1,t2)

SELECT t1.*, t2.* FROM t1 LEFT OUTER JOIN t2 ON P(t1,t2)
UNION ALL
SELECT t1.*,t2.* FROM t2 LEFT OUTER JOIN t1 ON P(t1,t2) WHERE t1.a IS NULL

CREATE TABLE tree (
`Node` VARCHAR(3),
`ParentNode` VARCHAR(3),
`EmployeeID` INTEGER,
`Depth` INTEGER,
`Lineage` VARCHAR(16)
);
INSERT INTO tree
(`Node`, `ParentNode`, `EmployeeID`, `Depth`, `Lineage`)
VALUES
('100', NULL, '1001', 0, '/'),
('101', '100', '1002', NULL, NULL),
('102', '101', '1003', NULL, NULL),
('103', '102', '1004', NULL, NULL),
('104', '102', '1005', NULL, NULL),
('105', '102', '1006', NULL, NULL);
WITH RECURSIVE prev AS (
SELECT * FROM tree WHERE ParentNode IS NULL
UNION
SELECT t.Node,t.ParentNode,t.EmployeeID,p.Depth + 1 AS Depth, CONCAT(p.Lineage, t.ParentNode, '/')
FROM tree t JOIN prev p ON t.ParentNode = p.Node
)
SELECT * FROM prev;
WITH RECURSIVE prev AS (
SELECT * FROM tree WHERE ParentNode IS NULL
UNION
SELECT t.Node,t.ParentNode,t.EmployeeID,p.Depth + 1 AS Depth, CONCAT(p.Lineage, t.ParentNode, '/')
FROM prev p JOIN tree t ON t.ParentNode = p.Node
)
UPDATE tree t, prev p SET t.Depth=p.Depth, t.Lineage=p.Lineage WHERE t.Node=p.Node;
You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'UPDATE tree t, prev p SET t.Depth=p.Depth, t.Lineage=p.Lineage WHERE t.Node=p.No' at line 7

SET session.server_id = @@global.server_id,
    session.gtid_domain_id=@@global.gtid_domain_id;

CREATE AGGREGATE FUNCTION agg_sum(x INT) RETURNS DOUBLE
BEGIN
DECLARE z DOUBLE DEFAULT 0;
DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN z;
LOOP
FETCH GROUP NEXT ROW;
SET z = z + x;
END LOOP;
END|

CREATE TABLE balances (id INT, amount INT);
INSERT INTO balances VALUES (1, 10), (2, 20), (3, 30);
SELECT agg_sum(amount) FROM balances;

SELECT agg_sum(amount) OVER (ORDER BY id);

CREATE TABLE mysql.column_stats (
min_value varbinary(255) DEFAULT NULL,
max_value varbinary(255) DEFAULT NULL,
...
hist_size TINYINT UNSIGNED,
hist_type ENUM('SINGLE_PREC_HB','DOUBLE_PREC_HB'),
histogram varbinary(255),
...

remove complex commands that do many sql statements under the hood, if they can be scripted, e.g. sync_slave_with_master
remove over-verbose treatment of rpl test failures
scoped variables
parameters for the source command
remove dead code
Version upgrade support for DECIMAL from the current form to the new form
Details:
Mentor:
Igor Babaev
Details:
Mentor:
Sergei Golubchik
Details:
Mentor:
Vicentiu Ciorbaru
Details:
Mentor:
Igor Babaev
Details:
Mentor:
Igor Babaev
Details:
Mentor:
Igor Babaev
Details:
Mentor:
Igor Babaev
Details:
Mentor:
Andrei Elkin
Details:
Mentor:
Details:
Mentor:
Oleksandr Byelkin
Details:
Mentor:
Varun Gupta
Details:
Mentor:
Sachin Setiya
Details:
Mentor:
Sergei Golubchik
Details:
Mentor:
Igor Babaev
Details:
Mentor:
Sergei Golubchik
Details:
Mentor:
Oleksandr Byelkin
Details
Mentor:
Andrew Hutchings
Details:
Mentor:
Andrew Hutchings
Details:
Mentor:
Roman
Details:
Mentor:
Roman
CREATE TABLE t1 (a INT, b INT, INDEX (a/2+b));
...
SELECT * FROM t1 WHERE a/2+b=100;

[min(f) + (max(f)-min(f))/k * (i-1), min(f) + (max(f)-min(f))/k * i)

ANALYZE FAST TABLE tbl [ WITH n ROWS ] [SAMPLING p PERCENTS ]
PERSISTENT FOR COLUMNS (col1 [IN RANGE r] [WITH k INTERVALS],...)

SELECT t1.*, t2.* FROM t1 FULL OUTER JOIN t2 ON P(t1,t2)

SELECT t1.*, t2.* FROM t1 LEFT OUTER JOIN t2 ON P(t1,t2)
UNION ALL
SELECT t1.*,t2.* FROM t2 LEFT OUTER JOIN t1 ON P(t1,t2) WHERE t1.a IS NULL

CREATE TABLE tree (
`Node` VARCHAR(3),
`ParentNode` VARCHAR(3),
`EmployeeID` INTEGER,
`Depth` INTEGER,
`Lineage` VARCHAR(16)
);
INSERT INTO tree
(`Node`, `ParentNode`, `EmployeeID`, `Depth`, `Lineage`)
VALUES
('100', NULL, '1001', 0, '/'),
('101', '100', '1002', NULL, NULL),
('102', '101', '1003', NULL, NULL),
('103', '102', '1004', NULL, NULL),
('104', '102', '1005', NULL, NULL),
('105', '102', '1006', NULL, NULL);
WITH RECURSIVE prev AS (
SELECT * FROM tree WHERE ParentNode IS NULL
UNION
SELECT t.Node,t.ParentNode,t.EmployeeID,p.Depth + 1 AS Depth, CONCAT(p.Lineage, t.ParentNode, '/')
FROM tree t JOIN prev p ON t.ParentNode = p.Node
)
SELECT * FROM prev;
WITH RECURSIVE prev AS (
SELECT * FROM tree WHERE ParentNode IS NULL
UNION
SELECT t.Node,t.ParentNode,t.EmployeeID,p.Depth + 1 AS Depth, CONCAT(p.Lineage, t.ParentNode, '/')
FROM prev p JOIN tree t ON t.ParentNode = p.Node
)
UPDATE tree t, prev p SET t.Depth=p.Depth, t.Lineage=p.Lineage WHERE t.Node=p.Node;
You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'UPDATE tree t, prev p SET t.Depth=p.Depth, t.Lineage=p.Lineage WHERE t.Node=p.No' at line 7

UPDATE [LOW_PRIORITY] [IGNORE] tbl_name
SET col_name1={expr1|DEFAULT} [, col_name2={expr2|DEFAULT}] ...
[WHERE where_condition]
[ORDER BY ...]
[LIMIT row_count]
RETURNING select_expr [, select_expr ...]

LOAD DATA FROM MASTER

postgres=# CREATE TABLE t1 (id SERIAL, name VARCHAR(100));
CREATE TABLE
postgres=# INSERT INTO t1(name) VALUES('test') RETURNING id;
id
----
1
(1 row)
INSERT 0 1

CREATE AGGREGATE FUNCTION agg_sum(x INT) RETURNS DOUBLE
BEGIN
DECLARE z DOUBLE DEFAULT 0;
DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN z;
LOOP
FETCH GROUP NEXT ROW;
SET z = z + x;
END LOOP;
END|

CREATE TABLE balances (id INT, amount INT);
INSERT INTO balances VALUES (1, 10), (2, 20), (3, 30);
SELECT agg_sum(amount) FROM balances;

SELECT agg_sum(amount) OVER (ORDER BY id);

Add the following in the global comment for each contribution:
For those cases where this is not done, please add to this page a short line for each push into MariaDB that includes code from contributors not employed by the MariaDB Foundation or the MariaDB Corporation. The purpose of this is to properly track that all such patches are submitted either under the MCA or BSD-new, and to ensure that the developer gets credit for their work.
Example:
(Please enhance the example with anything that makes sense.)
()
Tencent Game DBA Team, developed by vinchen.
()
Jerome Brauge.
Per-engine mysql.gtid_slave_pos tables ()
Kristian Nielsen funded by Booking.com.
The MariaDB Foundation website provides a more detailed list of contributors by release, starting from
New variable permits restricting the speed at which the slave reads the binlog from the master ()
Tencent Game DBA Team, developed by chouryzhou.
()
Tencent Game DBA Team, developed by vinchen.
()
Daniil Medvedev
Lixun Peng, Alibaba
Implement non-recursive common table expressions () Implement recursive common table expressions () Pushdown conditions into non-mergeable views/derived tables ()
Galina Shalygina
Backporting Delayed replication () from MySQL 5.6
Kristian Nielsen funded by Booking.com
The MariaDB Foundation website provides a more detailed list of contributors by release, starting from
, optimizer, security, speed enhancements, bug fixing, etc
Power8 optimization
Stewart Smith
In cooperation with IBM
enhancements and speedups
Reviews for , , compression, , storage engine, storage engine, , etc.
, scrubbing, enhanced semisync, dump thread enhancements, thd_specifics plugin service
Table level ,
, online alter progress monitoring
Antony Curtis
Sriram Patil
New
Daniel Black
Daniël van Eeden
Atomic writes, page compression, trim, multi-threaded flush for XtraDB/InnoDB
In cooperation with FusionIO
The MariaDB Foundation website provides a more detailed list of contributors by release, starting from
Defragmentation, prefix index queries optimization, lazy flushing, buffer pool list scan optimization, configurable long semaphore wait timeout
Percona
,
Oracle
optimization,
Per thread memory counting and usage
Base code and idea by Lixun Peng, Taobao
License: BSD
Base code by Lixun Peng, Taobao
License: BSD
Code by Konstantin "Kostja" Osipov, mail.ru
License: BSD
Code by Olivier Bertrand
License: GPL
Code by Kentoku Shiba, Spiral Arms
License: GPL
Code by Vicentiu Ciorbaru, Google Summer of Code 2013
License: BSD
Code by Sudheera Palihakkara, Google Summer of Code 2013
License: BSD
Some patches by Pavel Ivanov, Google
The MariaDB Foundation website provides a more detailed list of contributors by release, starting from
Function last_value() which returns the last value but evaluates all arguments as a side effect.
Original patch by Eric Herman, Booking.com
License: BSD
nowatch option for mysqld_safe (allow systemd)
Based on code from Maarten Vanraes
License: BSD
Security fixes, patches
Work by Honza Horak, Red Hat
Coverity scans
Work by Christian Convey
The MariaDB Foundation website provides a more detailed list of contributors by release, starting from
Virtual Columns
Andrey Zhakov (modified by Sanja and Igor)
Author has
Declaring many CHARSET objects as const.
Antony T Curtis (LinkedIn)
License: BSD
Authors: People at Google, Facebook and Percona. This code owes a special thanks to Mark Callaghan!
License: BSD
Fredrik Nylander from Stardoll.com
License: MCA
The storage engine
Created by Arjen Lenz, Open Query
License GPL
The storage engine
Created by Andrew Aksyonoff.
License: GPL
Pluggable Authentication
RJ Silk
License: MCA
Various bug fixes
Stewart Smith, Percona
Microsecond precision in process list
Percona Inc
Patch was .
Slow Query Log Extened Statistics
Percona Inc
Patch was .
The
Created by Paul McCullagh
License: GPL
The FederatedX storage engine
All changes are made by Patrick Galbraith and Antony Curtis and are given to us under BSD-new.
In addition we are allowed to promote FederatedX.
Windows enhancements and various bug fixes
Alex Budovski, under MCA
Creating of MariaDB packages
Arjen Lenz, Open Query
Various bug fixes
Stewart Smith, Percona
Google has sponsored:
Google tests GTID, parallel replication and lots more on the mailing list
Facebook has sponsored many features, including:
The
Facebook employees do frequent the mailing list
lists and other sponsors.
lists the authors of MariaDB (including documentation, QA etc).
This page is licensed: CC BY-SA / Gnu FDL
Patch: Name, url or where we got patch
Author: ....
License: MCA or BSD
Reviewer: ....

Feature/Patch name
* Author(s)
* Author has signed MCA on "date" | Patch was licensed under BSD

writing speed was improved by moving checksum calculations out of the global binlog mutex. This is a contribution by Kristian Nielsen (MDEV-31273)
New system variable enables binary log purging when the total size of all binary logs exceeds the specified threshold. The implementation is based on the patch from Percona (MDEV-31404)
FULL_NODUP is a new value for the . It essentially works like FULL, that is all columns are included in the event, but it takes less space, because the after image omits columns that were not changed by the UPDATE statement, and have same values as in the before image. This is a contribution from Alibaba (MDEV-32589)