
General Resources

About

About MariaDB Documentation

The documentation for MariaDB products is

  • written in standard American English,

  • using Markdown format,

  • stored in Git.

It is maintained by a team of technical writers from MariaDB plc and the MariaDB Foundation.

Until June 2025, the documentation was hosted in the MariaDB KnowledgeBase (KB). With a few exceptions, mostly concerning outdated modules and functionality, all pages were migrated to a new platform, GitBook.

Screenshot of the MariaDB documentation start page, listing the categories like MariaDB Server, Enterprise Operator, MaxScale, ColumnStore, and Connectors.
The MariaDB KnowledgeBase, which contained the MariaDB documentation until June 2025

Contributing Documentation

Instructions on how to contribute to MariaDB documentation

The documentation for MariaDB products is

  • written in standard American English,

  • using Markdown format,

  • stored in Git.

While the documentation is mainly maintained by the documentation team at MariaDB plc, we are happy to receive contributions. Because the documentation is stored in Git, anyone can contribute. You need a GitHub account, basic knowledge of Markdown, and some expertise in what you're writing about. You also have to agree to our contributor agreement.

Contributing is as simple as this:

  • Access this repository and log in with your GitHub account.

  • Find the page in the documentation that you want to edit, correct, or amend.

  • Make a pull request, edit, and submit.

The MariaDB documentation team will review your edits, smooth out any rough edges (language and style), and incorporate your contribution into the documentation. Don’t be afraid to submit imperfect contributions, as long as they’re factually correct.

Before you start making larger contributions, make yourself familiar with the basics of technical writing (a 1-minute read). This is about using proper tone and content, language and grammar, as well as formatting, punctuation, and organization.

The source format of pages is Markdown. Text formatting is described on this GitBook page. A Markdown cheat sheet with a 10-minute tutorial and a Markdown "playground" can be found here.

Read our style guide, too. (It's short!)

Also see the About Links page. It has useful information for when you are adding links to other pages.

Documentation Style Guide

Style Guide

We adhere to the Google developer documentation style guide. Here are some links to particularly important resources from that style guide:

  • Word list – an alphabetically ordered list that lets you quickly look up which words and terms to use, and which to avoid.

  • Accessibility – write inclusive documentation for a global audience.

  • Timeless documentation – avoid words and terms like currently, existing, in the future, now, or soon.

  • Capitalization – when to use the Shift key, and how to format headings and titles. We use upper case for headings, which is about the only deviation from the Google style guide.

  • Abbreviations – how to use acronyms and initialisms.

  • Punctuation – how to properly use colons, commas, dashes, etc.

  • Formatting and organization – how to write dates and times, headings and titles, and when to use lists rather than tables.

  • Cross-references – how to properly write links.

  • Code samples – how to write and format code blocks.

  • Example domain names, IP numbers, and person names – and how to use filenames and trademarks.

Additional Information

Version-specific information: We refer to software versions (like "MariaDB 10.6") only for products (like MariaDB Server or MaxScale) or features (like replication or authentication) that follow the N-1 rule, where N is the oldest version still under maintenance. (At the time of writing, that's MariaDB 10.6 for the Server.) In other words, we'd mention MariaDB 10.5, but not versions older than that. This is the general principle; we deviate from it only when there's a valid reason to do so.

Reporting Documentation Issues

Bug Reports

If you find issues in the documentation, please report them:

  1. Report only one issue per request. If you find multiple issues, report them one by one. Only report documentation issues, not software issues or software help requests.

  2. Provide the URL of the page that has an issue, for example https://mariadb.com/docs/general-resources/about/readme/reporting-documentation-bugs. ℹ️ When reporting via the rating system, the URL of the page you're on will automatically be included in your response, so there's no need to include it.

  3. Indicate the nature of the issue:

    1. Typo, for example "known bucks should be known bugs".

    2. Wrong information. Provide details of what's wrong. Ideally, point out what the right information should be.

    3. Missing information. Provide details of what's missing.

    4. Unclear information. Provide details of what's unclear. Ideally, provide a clarification.

Reporting Channels

Use one of the following channels to report documentation issues. Please don't report software issues via those channels — instructions for doing that are on this page.

Screenshot of the GitBook rating system, with sample text in the text field, and a Submit button.

Rating system in the right sidebar

This is a super quick way to provide feedback or report issues in the documentation. However, it's one-way communication — we can't provide feedback to you, since we don't know who you are. 😇 ℹ️ Don't paste the URL of the page you're reporting from, since it will automatically be included.

MariaDB Community Slack

Join the #documentation channel in MariaDB Community Slack. This allows for more detailed feedback or reports, and naturally provides two-way communication.

Screenshot of the MariaDB Community Slack, with the #documentation channel highlighted.

This page is licensed: CC BY-SA / Gnu FDL

About Links

This page is for contributors to the MariaDB Documentation and goes into detail on the internals of links. This page is not about MariaDB. If you're interested in contributing to the MariaDB Docs, please also see the Contributing Documentation and Documentation Style Guide pages.

There are three types of links in the MariaDB docs: external, relative, and space. The general rules for when to use each are:

  • If the link is outside of https://mariadb.com/docs/ → Use an External Link

  • If the link is to a page in the same space → Use a Relative Link

  • If the link is to a page in another space → Use a Space Link

See About Spaces for information on what Spaces are.
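The three rules above can be sketched as a small decision function. This is a hypothetical illustration only; the helper name and space-name arguments are not part of any real tooling:

```python
DOCS_ROOT = "https://mariadb.com/docs/"

def link_type(target_url: str, current_space: str, target_space: str) -> str:
    """Pick which kind of link the docs conventions call for (illustrative)."""
    if not target_url.startswith(DOCS_ROOT):
        return "external"   # target is outside https://mariadb.com/docs/
    if target_space == current_space:
        return "relative"   # same space: use a relative link
    return "space"          # different space: use a space link

# From a page in the General Resources space:
print(link_type("https://example.com", "general-resources", ""))  # external
print(link_type("https://mariadb.com/docs/server/security/securing-mariadb",
                "general-resources", "server"))                   # space
```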

About Spaces

In GitBook (our documentation system), Spaces are the main sections of the site you see along the top of every docs page:

Which space you are in determines whether you need to use a Relative or a Space link. GitBook identifies Spaces via a unique space identifier. See the Space Links section for more details. We also have a handy list of Space prefixes for use when creating space links in Markdown.

External Links

External links are the easiest: they point to pages outside the https://mariadb.com/docs site. Some examples of external links in Markdown syntax:

* [Example Website](https://example.com)
* [MariaDB Corp Blog](https://mariadb.com/blog)
* [MariaDB JIRA](https://jira.mariadb.org)

Technically, you can use external links for docs content by putting in the full URI of the page you want to link to. However, if you do that, we lose the ability to automatically update the link if the target page is moved or renamed. So for links to docs content, we prefer Relative Links or Space Links.

Relative Links

Relative links are links to a page in the same space, relative to the page you are editing. For example, a relative link to the Joining the Community page, from this page, looks like this in Markdown:

[Joining the Community](../../community/joining-the-community.md)

One big limitation of relative links is that they cannot cross Space boundaries.

The page you are currently reading is in the General Resources space, so we can use relative links to link to other pages under https://mariadb.com/docs/general-resources/. If we want to link to a page in another space, we need to use Space Links.
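As a sketch, the relative path in the example above can be derived mechanically from the two page paths. The repository file paths used here are assumptions for illustration; the real file layout may differ:

```python
import posixpath

# Hypothetical in-repository paths (assumed for illustration).
current_page = "about/about-links/README.md"
target_page = "community/joining-the-community.md"

# A relative link is the path from the current page's directory to the target.
rel_link = posixpath.relpath(target_page, start=posixpath.dirname(current_page))
print(rel_link)  # ../../community/joining-the-community.md
```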

Space Links

To link to pages in other Spaces we need to use special Space Links which use an internal identifier so that GitBook knows exactly what page you are pointing to.

A space link begins with https://app.gitbook.com/s/, followed by a unique alphanumeric space identifier (in this doc we'll call these two parts together the space prefix), and finally the path to the page.

The path is everything after the space name in a full page URI. For example, take the following full URI for the Securing MariaDB page:

https://mariadb.com/docs/server/security/securing-mariadb

In this URI, the space name is server and the path, if you were creating a space link, is:

/security/securing-mariadb

To convert that into a space link we need to get the Server space identifier and combine it with the path. Rather than list out just the identifiers for our spaces, we have a List of Space Prefixes that you can copy from when creating space links.

Continuing with our example, a full space link in Markdown for the Securing MariaDB page is: space prefix (for the Server space) + path:

[Securing MariaDB](https://app.gitbook.com/s/SsmexDFPv2xG2OTyO5yV/security/securing-mariadb)

See the List of Space Prefixes section for a list of all of our space prefixes.
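The prefix + path composition can be sketched in a few lines. The dictionary below copies the Server space prefix from the List of Space Prefixes; the helper function name is our own invention:

```python
# Space prefixes (see the List of Space Prefixes); only one shown here.
SPACE_PREFIXES = {
    "server": "https://app.gitbook.com/s/SsmexDFPv2xG2OTyO5yV",
}

def space_link(space: str, path: str) -> str:
    """Join a space prefix and a page path into a space link."""
    return SPACE_PREFIXES[space] + path

print(space_link("server", "/security/securing-mariadb"))
# https://app.gitbook.com/s/SsmexDFPv2xG2OTyO5yV/security/securing-mariadb
```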

List of Space Prefixes

A handy list of all space prefixes for the MariaDB Docs:

MariaDB Platform space prefix

https://app.gitbook.com/s/JqgUabdZsoY5EiaJmqgn

Server space prefix

https://app.gitbook.com/s/SsmexDFPv2xG2OTyO5yV

MaxScale space prefix

https://app.gitbook.com/s/0pSbu5DcMSW4KwAkUcmX

ColumnStore space prefix

https://app.gitbook.com/s/rBEU9juWLfTDcdwF3Q14

Galera Cluster space prefix

https://app.gitbook.com/s/3VYeeVGUV4AMqrA3zwy7

Connectors space prefix

https://app.gitbook.com/s/CjGYMsT2MVP4nd3IyW2L

Tools space prefix

https://app.gitbook.com/s/kuTXWg0NDbRx6XUeYpGD

Release Notes space prefix

https://app.gitbook.com/s/aEnK0ZXmUbJzqQrTjFyb

General Resources space prefix

https://app.gitbook.com/s/WCInJQ9cmGjq1lsTG91E

Space Link Examples

Here are some examples of Markdown links to various pages using space links:

Options, System & Status Variables in the Server space

[Options, System & Status Variables](https://app.gitbook.com/s/SsmexDFPv2xG2OTyO5yV/reference/full-list-of-mariadb-options-system-and-status-variables)

MariaDB 12.1 Changes & Improvements in the Release Notes space

[MariaDB 12.1 Changes & Improvements](https://app.gitbook.com/s/aEnK0ZXmUbJzqQrTjFyb/community-server/release-notes-mariadb-12.1-rolling-releases/changes-and-improvements-in-mariadb-12.1)

List of MariaDB Connector/C Releases in the Connectors space

[List of MariaDB Connector/C Releases](https://app.gitbook.com/s/CjGYMsT2MVP4nd3IyW2L/mariadb-connector-c/list-of-mariadb-connector-c-releases)

When Space Links are rendered to the public site, GitBook handles translating Space Links into a link to the correct page. And if a page is moved or renamed then the link will be automatically updated on every page it appears on.

About MariaDB

About MariaDB Server

MariaDB Server is a leading open source relational database, developed by the original creators of MySQL. Renowned for its speed, scalability, and robustness, it serves a wide range of applications, from banking to websites, and powers notable platforms like Wikipedia and WordPress.com. Originally designed as a drop-in replacement for MySQL, MariaDB Server offers an enhanced feature set, including a rich ecosystem of storage engines and plugins. Its versatility is further extended by features like GIS, Oracle and MySQL compatibility, JSON functionality, and vector search in recent versions. MariaDB Server utilizes a standard SQL interface for data access and benefits from continuous open-source development, ensuring its reliability and innovation.

Learn how to get started with MariaDB Server.

About MariaDB plc

MariaDB plc is the lead developer behind the popular open source database MariaDB Server, which has been downloaded over 1 billion times. The company also develops an enterprise version, MariaDB Enterprise Platform, that delivers the reliability, security, and scalability needed to run production workloads with peace of mind. MariaDB plc delivers long-term releases for customers and provides world-class support and enterprise-level SLAs.

For more information, visit mariadb.com.

About MariaDB Foundation

The MariaDB Foundation is a non-profit organization that acts as the global contact point for collaboration on MariaDB Server. The MariaDB Foundation's mission centers on Openness, Adoption, and Continuity:

  • Ensuring the MariaDB Server code base remains open for usage and contributions on technical merits.

  • Striving to increase adoption by users and across use cases, platforms, and means of deployment.

  • Providing continuity to the MariaDB Server ecosystem, independent of any commercial entities.

For more information, visit mariadb.org.

Contact Us

Explore these resources to optimize your use of MariaDB, including documentation, enterprise solutions and community engagement.

  • Documentation: Feedback & issue reporting

  • Enterprise products & support: Contact us

  • Chat & mailing lists: Joining the Community

Community

Joining the Community

For discussions, release announcements, and development updates, engage with the MariaDB community through various chat platforms and mailing lists. Participate in Zulip, Slack, IRC channels, or subscribe to mailing lists like announce, discuss, docs, developers, commits, and packagers for extensive community interactions.

Chat

MariaDB developers and community members can be found on these chat channels:

  • MariaDB Zulip Chat: https://mariadb.zulipchat.com/

  • MariaDB Slack Channel: https://r.mariadb.com/join-community-slack

  • MariaDB IRC Chat: https://kiwiirc.com/nextclient/irc.libera.chat/#maria

  • More ways to connect: Connect with the Community | MariaDB

Mailing lists

  • The announce list is a low-volume list for release announcements. The archive can be found here.

  • The discuss mailing list is for MariaDB users and general discussion. The archive can be found here.

  • The docs mailing list is for those interested in documentation for MariaDB. The archive can be found here.

  • The developers mailing list is for those who want to contribute code to the project or otherwise closely monitor MariaDB development. The archive can be found here.

  • The commits mailing list is for commit messages and patches. The archive can be found here.

  • The packagers mailing list is for those packaging MariaDB in distros.

GitHub

The MariaDB project is hosted on GitHub.

Feature requests

If you have a feature request, you can file it in JIRA. If you are prepared to pay for the feature, you can contact MariaDB Corporation to get a quote. You can of course also implement it yourself and become a MariaDB contributor/developer.

Social media

A list of MariaDB-related social media links and pages. Note that many of them are community groups not officially affiliated with MariaDB.

YouTube

  • MariaDB plc

  • MariaDB Foundation

LinkedIn

  • MariaDB plc

  • MariaDB Foundation

X

  • MariaDB plc: @mariadb

  • MariaDB Foundation: @mariadb_org

Reddit

  • r/mariadb

Facebook

  • MariaDB plc

  • MariaDB Foundation

Instagram

  • MariaDB Foundation

Fosstodon

  • MariaDB Foundation: @mariadb_org


MariaDB Community

MariaDB Server is an open-source project developed by MariaDB plc and stewarded by the MariaDB Foundation. Anyone can participate in its development.

This section provides information to help you participate in making MariaDB Server and other MariaDB products a success.

If you're interested in contributing to the documentation, see this page.

Bug Tracking

Bug Tracking

How and where the community can report bugs and file feature requests, and how bug reports are processed for community users. MariaDB plc provides SLAs for customer issues (see Technical Support Services).

To report documentation issues, please review this page.

Reporting Software Bugs

For reporting documentation bugs specifically, see Reporting Documentation Bugs.

MariaDB's bug and feature tracker is found at jira.mariadb.org.

This page contains general guidelines for the community for reporting bugs and feature requests in MariaDB products. If you want to discuss a problem or a new feature with other MariaDB developers, you can find the email lists and forums here.

Known Issues

First, check that the bug or feature request isn't already filed in the MariaDB bugs database.

For the MariaDB issue database, use JIRA search to check if a report you are going to submit already exists. You are not expected to be a JIRA search guru, but please at least make some effort.

  • Choose Issues => Search for issues;

  • If the form opens for you with a long blank line at top, press Basic on the right to switch to a simpler mode;

  • In the Project field, choose the related project (MDEV for the generic MariaDB server and clients);

  • In the Contains text field, enter the most significant keywords from your future report;

  • Press Enter or the magnifying glass icon to search.

If you see issue reports which are already closed, pay attention to the 'Fix Version/s' field -- it is possible that they were addressed in an upcoming release. If they are said to be addressed in the release you are currently using or an earlier one, you can ignore them and file a new report (though please mention in your report that you found them; it might be useful).

If you find an open issue report, please vote/add a comment that the issue also interests you along with any additional information you have that may help us to find and address the issue.

If the issue is not in the MariaDB issue database yet, then it's time to file a report. If you're filing a report about an issue that's already in the MySQL issue database, please indicate so at the start of the report. Filing issue reports from MySQL in the MariaDB issue database makes sense, because:

  • It shows the MariaDB team that there is interest in having this issue addressed in MariaDB.

  • It allows work to start on addressing the issue in MariaDB - assigning versions, assigning MariaDB developers to the issue, etc.

Reporting an Issue

Bugs and feature requests are reported to the MariaDB issue tracker.

JIRA Privacy

Please note that our JIRA entries are public, and JIRA is very good at keeping a record of everything that has been done. What this means is that if you ever include confidential information in the description there will be a log containing it, even after you've deleted it. The only way to get rid of it will be removing the JIRA entry completely.

Attachments in JIRA are also public.

Access to a comment can be restricted to a certain group (e.g. Developers only), but the existing groups are rather wide, so you should not rely on it either.

If you have private information -- SQL fragments, logs, database dumps, etc. -- that you are willing to share with MariaDB team, but not with the entire world, put it into a file, compress if necessary, upload to the mariadb-ftp-server, and just mention it in the JIRA description. This way only the MariaDB team will have access to it.

Reporting Security Vulnerabilities

As explained above, all JIRA issues are public. If you believe you have found a security vulnerability, send an email to security@mariadb.org instead; please do not use JIRA for that. We will enter it in JIRA ourselves, following responsible disclosure practices.

Contents of a Good Bug Report

Below is the information we need to be able to fix bugs. The more information we get and the easier we can repeat the bug, the faster it will be fixed.

A good bug report consists of:

  1. The environment (Operating system, hardware and MariaDB version) where the bug happened.

  2. Any related errors or warnings from the server error log file. Normally this is the hostname.err file in your database directory, but it can differ depending on the distribution and version; if you cannot find it, run SELECT @@log_error on the running server. If either the variable or the file it points at is empty, the error log most likely goes to your system log. On systemd you can get the last 50 lines of the MariaDB log with journalctl -n 50 -u mariadb.service. If possible, attach the full unabridged error log, at least from the last server restart to the end of the log.

  3. If the problem is related to MariaDB updates, or otherwise changing the version of the server, recovery from a previous crash, and such, then include the previous versions used, and the error log from previous server sessions.

  4. The content of your my.cnf file or alternatively the output from mariadbd --print-defaults or SHOW VARIABLES.

  5. Any background information you can provide (stack trace, tables, table definitions (SHOW CREATE TABLE {tablename}), data dumps, query logs).

  6. If the bug is about server producing wrong query results: the actual result (what you are getting), the expected result (what you think should be produced instead), and, unless it is obvious, the reason why you think the current result is wrong.

  7. If the bug is about a performance problem, e.g. a certain query is slower on one version than on another, include the output of EXPLAIN EXTENDED <query> on both servers. If it's a SELECT query, use ANALYZE FORMAT=JSON.

  8. A test case or some other way to repeat the bug. This should preferably be in plain SQL or in mysqltest format. See mysqltest/README for information about this.

  9. If it's impossible to create a test case, then providing us with backtrace information would be of great help.

JIRA Fields

The sections below describe which JIRA fields need to be populated when filing reports, and what should be put there. Apart from what's mentioned below, you don't need to fill in or change any other fields when creating a new bug report.

Project

If you are filing a report for the MariaDB server, client programs, or MariaDB Galera Cluster, the target project is MDEV. Connectors and MaxScale have separate projects with corresponding names. If you choose the wrong project, bug processing can be delayed, but there is no reason to panic -- we'll correct it. If you inform us about the mistake, we'll change it faster.

Some project names include:

  • CONC - MariaDB Connector/C

  • CONCPP - MariaDB Connector/C++

  • CONJ - MariaDB Connector/J

  • CONJS - MariaDB Connector/node.js

  • CONPY - MariaDB Connector/Python

  • MCOL - ColumnStore

  • MDBF - MariaDB Foundation Development (anything related to the mariadb.org domain)

  • MDEV - MariaDB server, client programs, or MariaDB Galera Cluster

  • MXS - MaxScale

  • ODBC - MariaDB Connector/ODBC

  • R2DBC - MariaDB Connector/R2DBC

Type

Feature requests are not the same as bug reports. Choose the New Feature type for feature requests in JIRA, and the Bug type for bug reports. As with the Project field, choosing the wrong type will put the request into the wrong queue and can delay its processing, but eventually it will be noticed and amended.

See also the plans for the next release for things we are considering including in the next MariaDB release.

Summary

Please make sure the summary line is informative and distinctive. It should always be easy to recognize your report among other similar ones, otherwise a reasonable question arises -- why are they not duplicates?

Examples:

  • good summary: Server crash with insert statement containing DEFAULT into view

  • not a good summary: mariadbd crash

Generally, we try not to change the original summary without a good reason to do it, so that you can always recognize your own reports easily.

Priority

We do not have separate Severity and Priority fields in JIRA, so the Priority field serves a double purpose. For original reports, it indicates the importance of the problem from the reporter's point of view. The default is 'Major'; there are two lower and two higher values. Please set the value accurately. While we do take it into account during initial processing, inflating the value beyond reason won't do any good; the only effect will be time wasted while somebody tries to understand why a trivial problem got such a high priority. After that, the value will be changed, and the report will be processed in due time anyway.

Affected Versions

Put everything you know about which versions are affected. Both major versions (10.6, 10.5, etc.) and minor versions (10.5.9, 10.4.12, etc.) are available to choose from. Please always specify the exact version(s) (X.Y.Z) that you are working with and where you experience the problem.

Additionally, if you know the exact version where the problem appeared, please put that as well. If, as far as you know, the problem has been present in all previous releases, you can also put the major version there, e.g. 10.0. Alternatively, you can mention all of this in the description or comments.

Please also note in the description or comments which versions you know as not affected. This information will help to shorten further processing.

Environment

Put here environment-related information that might be important for reproducing or analyzing the problem: operating system, hardware, related 3rd-party applications, compilers, etc.

Description

The most important part of the description is the steps to reproduce the problem. See more details about bug report contents above, in the section Contents of a Good Bug Report.

If in the process of reproducing, you executed some SQL, don't describe it in words such as "I created a table with text columns and date columns and populated it with some rows" -- instead, whenever possible, put the exact SQL queries that you ran. The same goes for problems that you encountered: instead of saying "it did not work, the query failed, I got an error", always paste the exact output that you received.

Use {noformat}...{noformat} and {code}...{code} blocks for code and console output in the description.

Attachments

If you have SQL code, a database dump, a log, etc. of reasonable size, attach it to the report (archive it first if necessary). If the files are too big, you can upload them to ftp.askmonty.org/private. It is always a good idea to attach your cnf file(s), unless it is absolutely clear from the nature of the report that the configuration is irrelevant.

Links

If you found or filed a related bug report in the MariaDB, MySQL, or Percona bug databases, you can put it in the Links section; the same goes for any external links to 3rd-party resources you find important to mention. Alternatively, you can just mention them in the description or comments.

Tags

You don't have to set any tags, but if you want to use some for your convenience, feel free to do so. However, please avoid overly generic values -- for example, the tag mariadb is meaningless, because everything there is about MariaDB. Don't be surprised if some tags are removed later during report processing.

Bugs that also Affect MySQL or Percona

Our normal practice is to report a bug upstream if it's applicable to their version. While we can do it on your behalf, it is always better if you do it yourself -- it will be easier for you to track it further.

If the bug affects MySQL, it should also be reported in the MySQL bugs database. If the bug affects Percona Server and not MySQL, it should go to Percona Launchpad.

Collecting Additional Information for a Bug Report

Getting a Stack Trace with Details

See the article How to produce a stack trace from a core file.

Extracting a Portion of a Binary Log

See the article here.

Getting Help with your Servers

If you require personalized assistance, want to ensure that the bug is fixed with high priority, or want someone to login to your server to find out what's wrong, you can always purchase a Support contract from MariaDB plc or use their consulting services.

This page is licensed: CC BY-SA / Gnu FDL

Bug Processing

This page describes how community bug reports are processed among our products and explains what you need to notice while tracking bugs.

Commitments

MariaDB does not have any SLA or guaranteed reaction times for bugs in Jira. While we take bugs reported by the community very seriously and aim to respond to and handle issues as fast as possible, MariaDB does not have a dedicated bug verification team; this activity is performed on a best-effort basis.

To make sure your bug report will be confirmed and moved forward faster, please follow the guidelines about creating bug reports.

Bug Verification Routine

The initial bug processing routine in MariaDB is not strictly formalized. This section describes the de facto process rather than a formal policy.

The process is different for bug reports (Bug type in JIRA) vs feature requests (Task type). The process described below is related to bug reports.

Incoming Queue

All new bug reports go to the waiting list, to be reproduced and confirmed by a member of the team. The bug stays in the queue until one or more of the conditions below are met:

  • Bug report is assigned to a developer;

  • Bug report gets status 'Confirmed';

  • Bug report gets the label 'upstream';

  • Bug report is closed (for whatever reason).

Other things being equal, bug reports in the queue are initially handled in FIFO order; however, various factors make things not equal.

Bug Processing Order

The first thing taken into account is Priority. That does not mean everything should be filed as Critical; on the contrary, it means that Priority should be chosen wisely. Although a report with a higher Priority will be looked at sooner, as soon as it becomes clear that the Priority is set higher than the problem deserves, it will be amended and the report put back into the queue. However, if the high priority is justified, we will try to process the report as fast as possible.

Another important factor is the quality of the report itself.

  • If the report is written well and has all information, including a reproducible test case, it can be verified and moved forward quickly.

  • If the report is written clearly enough but does not have all the necessary information, it will get a reasonably fast first response in which we request the missing details; but the subsequent exchange can take a long time, until we get everything we need to work on the issue.

  • Finally, reports written in a tangled and incomprehensible manner get the slowest response, because even if it eventually turns out that they contain all the required information, it is difficult and time-consuming to extract and process, so they can be put aside for some time.

First Response

Complete processing of a reported bug can be complicated and time-consuming, especially the reproducing part. We do not want our users to wait long without knowing whether their bug report has even been noticed, so we try to provide a first response sooner than that.

The first response to a bug, which we try to provide as quickly as possible, is one of the following:

  • If we can reproduce the problem based on the information that was provided in the initial description, the report gets the status Confirmed.

  • If it is obvious from the initial description that the bug report is a duplicate of an existing one, or the problem has already been fixed in later releases or in the upcoming release, or the described behavior is not a bug, or, in very rare cases, it is admitted to be a bug, but it is not going to be fixed, the report gets closed with the corresponding Resolution value and a comment with the explanation.

  • If the bug report at least appears to describe a real bug, but we do not have enough information to proceed, we will request the information from the reporter, and the report will go to the Need feedback list.

  • If for some reason it is clear from the bug report that it will be very difficult to reproduce based on the information from the user, but there is reason to believe that the problem can be analyzed by code inspection, the bug report can be assigned to a developer who is an expert in the corresponding area for analysis.

We realize that "as quickly as possible" is a relative term. The dream scenario is that all reports are responded to in a matter of hours; however, more realistically, it can take a few days, and in some cases, when the team is overly busy with a big upcoming release or some other extraordinary events, it can even be weeks.

Need Feedback

When a report does not have all the information needed to reproduce the problem right away (which is quite often the case), we will ask the reporter to provide the missing information. Usually it takes more than one iteration to get everything right, so it is important that you respond to the questions as precisely as you can. Please make sure that you answer all the questions (or, if you cannot answer some of them, say so; otherwise we will have to ask again, and more time will be wasted).

There is no status "Need Feedback" in our JIRA; instead, we are using the label need_feedback. As long as the report has this label, it remains on the "Waiting for feedback" list. The label is set and removed manually by whoever asks for the feedback and receives it; so it can happen that the reporter has provided the response, but it remained unnoticed and the bug keeps waiting. It will be our fault, but human errors happen; it would help a lot if the reporter removed the label along with providing the feedback.

E-mail Notifications

This question arises fairly often, so it deserves mentioning.

As already said before, the need_feedback label is set and removed manually. JIRA e-mail updates about it can be confusing when you look at them quickly. For example, when someone removes the label, the email looks like this:

Elena Stepanova updated MDEV-9791:
----------------------------------
    Labels:   (was: need_feedback)

What it says is that the Labels field has become empty, while before it contained need_feedback. People often misread it and ask "What else do you need from me? I've answered your questions". This update means that at the moment we don't need anything: your report is back in the incoming queue, and your feedback will be analyzed as soon as possible. Then we will possibly ask more questions and set the label again, and the notification will look like this:

Elena Stepanova updated MDEV-9801:
----------------------------------
    Labels: need_feedback  (was: )

Successful Outcome

If the feedback exchange was fruitful and we received enough information to proceed, the bug report will go through the normal verification steps.

Incomplete Reports

Reports do not stay open on the "Need Feedback" list forever. After a month of waiting, if we do not get a response from the reporter, and still cannot proceed without it, we close the report as Incomplete with the corresponding comment. This state is not irreversible: you can still add comments and provide the information even when the report is closed as Incomplete, and it will be re-opened.

Worst Case Scenario

Sometimes it happens that after several iterations of feedback requests we run out of ideas about what else to ask from the reporter and still cannot verify the bug; or the reporter is willing to collaborate with us, but cannot provide the necessary information for objective reasons (usually when the problem happens on a production instance). In some cases we might close the report as "Cannot reproduce", which we consider our loss; but more often we want to keep it open, in the hope that more information arrives, maybe from a different source, and together with this report it will help us get to the bottom of the problem. In that case, the report gets assigned to somebody without being confirmed, just so that it remains at least on somebody's radar, and it will stay open for a long time. This does not mean it is forgotten; it means that for the time being we have hit a wall. You are very welcome to comment on such reports whenever you think you might have something to add, because this is exactly what we are waiting for.

Bug Verification

Normally the bug report has to go through the following steps before it is moved forward to fixing:

  • the described problem needs to be reproduced;

  • it needs to be checked against all active post-Beta versions of MariaDB where it is theoretically applicable (as of the moment of writing this article, it is 5.5, 10.0, 10.1);

  • in case it is a relatively recent regression, the guilty change needs to be found;

  • the component or functional area should be determined, so that the bug gets assigned to the right person.

After that the bug is ready for fixing.

Bug Fixing Routine

Sometimes it seems hard to understand from the outside how the MariaDB development team chooses which bugs to fix in a particular release, why some bugs are fixed faster than others, or why critical bugs stay untouched for a long time.

Sprint Model for Bug Fixing

MariaDB currently uses a 1- or 2-week sprint model for server development and bugfixing. It would need a separate article to describe in more detail, but for bugfixing, in a nutshell, it means the following.

  • one or two weeks before a scheduled release the team creates a new sprint and evaluates existing bugs which affect this release;

  • the selected bugs are added to the new sprint;

  • during the active sprint, the developer is supposed to work on the tasks which are part of the sprint, unless a true emergency arises.

There are two important consequences of this model which sometimes cause confusion:

  1. If the current sprint is for one version, e.g. 10.0, and you file a bug for another version, e.g. 10.1, then, even if the bug is really critical, it won't be jumped on right away: it makes no sense, because 10.1 is not going to be released next week anyway, while 10.0 will be. When the 10.0 sprint finishes and the 10.1 sprint starts, your bug will be picked up for that sprint and fixed then.

  2. If the current sprint for 10.1 is already in progress, newly created 10.1 reports normally won't be included into it, unless they are considered so extremely important that the developer is allowed to ignore the sprint plan.

Bugs Chosen for a Sprint

When a new sprint is created, bugs which affect the scheduled release are evaluated.

  • each developer chooses, from all such bugs assigned to them, the ones they are able to work on during the given time interval;

  • bug priority plays the most significant role in this process, but this is not the only factor.

    • Blocker bugs must be either fixed or downgraded before the release goes out;

    • Critical bugs should be chosen above other bugs, except for Blockers;

    • among Major bugs,

      • bugs with patches, either external, or upstream, or internal, are usually prioritized above ordinary bug reports;

      • external reports (community reports) are ranked higher than bugs reported by the development team;

      • bugs which can realistically be fixed in the given time interval are chosen more frequently than those that are likely to take several cycles;

      • bugs which affect the reporter more severely get more attention than those that have viable workarounds;

    • Minor bugs are usually fixed when there are no more urgent tasks.

Tracking Progress

If a bug report has passed through the verification stage, either being confirmed or pushed forward to development-level analysis as is, there can be various updates on it. It is important to understand what they mean.

JIRA Fields to Watch

All JIRA fields are public, but some of them are mainly used for internal development process, while others are more user-facing. This article describes which fields should be populated during the initial report submission. There is a different set of fields important for tracking purposes.

Resolution vs. Status

It might seem counter-intuitive, but in the existing JIRA structure, the Status field does not mean much for the user; it is mainly used for development and management purposes. On the contrary, the Resolution field is entirely user-facing: it does not participate in planning or development. It remains 'Unresolved' for the whole life of the report, and is only changed when the bug gets closed, indicating the reason why it was closed.

Resolution

  • Unresolved - the bug report remains open, the work has not been finished.

  • Fixed - the bug has been fixed, see Fix version/s and possibly comments to the report for more information. This is almost always a terminal state, we do not re-open fixed bugs even if they later re-appear; please create a new one instead. The only case when it can be re-opened is when the 'Fix version/s' have not been released yet.

  • Duplicate - the bug report is identical to an already existing open (or recently fixed) report, which will be quoted in the comments and/or links. It is usually a terminal state, unless it is proven later that the report was not a duplicate after all.

  • Not a bug - the described behavior is not a bug, there will be a comment explaining why. It is usually a terminal state, unless you object and show why it is a bug. If the report is in fact a feature request, then rather than closing it as 'Not a bug', we will switch the type to 'Task'.

  • Incomplete - we had requested feedback from the user and waited for 1 month, but did not receive it. It is a pseudo-terminal state, the report can be re-opened any time when the requested information is provided.

  • Cannot reproduce - a rather rarely used "resolution", which means we could not find a way to confirm the problem described by the reporter, and ran out of ideas about what other information to request from the reporter in order to reproduce it.

  • Won't fix - another rarely used "resolution", which means that the bug is admitted, but we have no intention to fix it. Usually it happens when the bug only affects old versions, and is not important enough to fix in the old versions; or, when it is related to systems or architectures we don't officially support.

Status

  • Open, Confirmed - this distinction is used in our internal queues, but from the user's perspective the difference is slim: setting the bug report to 'Confirmed' does mean that we are satisfied with the information provided in the report, but the user will also know about it from our comments and other updates. Otherwise, bugs in both statuses can be considered for fixing.

  • In Progress, Stalled - different intermediate states of bugs which help developers to filter their lists and management to gather a picture of the current activity. For the user, there is no important difference -- despite the negative semantics, 'Stalled' does not mean that something is wrong with the bug report, only that the developer is not working on it actively at the moment.

  • In review - means, literally, that a peer review has been requested.

  • Closed - means that the bug report is closed, for whatever reason. The real reason is in the 'Resolution' field.

Fix Versions

This is an important field for progress tracking. After the bug is confirmed or otherwise acknowledged, this field is populated with a set of major versions where we intend to fix it. E.g. if the field is set to 10.0 10.1, it means that at the moment we consider it for fixing in some future 10.0 release (not necessarily the next one), and the bugfix will be merged into the next 10.1 release after that; but we do not consider it for fixing in 5.5, even if 5.5 is also affected.

To some extent, you can influence the initial plans: if you see that the fix is not targeted for versions where you think it should be, you can comment on the report, and if you provide convincing arguments and make your case, it can be reconsidered.

The value of the field is not a promise to fix the bug in the mentioned releases. It can be changed both ways: during further analysis, the developer can find out that it can be safely fixed in an earlier release, or, on the contrary, that it cannot be safely fixed in the GA release, and the fix can only go to the next versions which are currently under development.

After the bug is fixed, the value of the field is changed to the exact versions, e.g. 10.0.25 10.1.14. It means that the patch has been pushed into the 10.0 branch, and will be released with 10.0.25 release; it also means that the patch will be merged to 10.1 tree and released with 10.1.14 release, but it does not mean that it is already in the 10.1 branch.

Priority

As the other article says, the Priority field serves two purposes. During the initial bug creation, it indicates the importance of the bug report from the user's perspective (in other bug tracking systems it is called 'Severity' or similar). After the bug has been confirmed, the same field is used for development purposes, to prioritize bug fixing (real 'Priority'). While we take into account the reporter's view on the matter, we can change the initial priority both ways, depending on the information revealed during the problem analysis, the versions affected, etc.

The value of the field normally means the following:

  • Blocker - we currently think that the bug must be fixed before the next release(s) set in the 'Fix version/s' field;

  • Critical - the bug should be picked up for fixing earlier than any other bugs apart from blockers;

  • Major - the bug will be present in the main queue for fixing in the upcoming 'Fix version/s', although only a part of such bugs will be fixed in every release;

  • Minor, Trivial - the bugs will be picked up when the assignee does not have more pressing issues for the upcoming release.

Please note that the Priority field only demonstrates our intentions at the moment, it does not guarantee that things will happen according to these intentions.

Labels

Labels are mostly used for more convenient filtering and don't carry much importance otherwise. However, there are a few that affect the processing of a bug report:

  • need_feedback - its role during the initial bug processing was already described above. However, after a bug is confirmed and queued for fixing, it should not appear anymore; and even if it's left by mistake, it won't affect the progress.

  • upstream - the label means that the bug also exists in the upstream version of the corresponding component - normally, in MySQL server or a client program, but it can also be in Percona's XtraDB or TokuDB. Normally there should also be a link to the upstream bug report. Setting this label means that we might want to wait for a while and see whether the bug is fixed in the upstream version before we fix it in MariaDB directly. This was usual for 5.5, less usual for 10.x where bugfixes, apart from InnoDB, are not merged automatically. The label is still set, but it is more for informational purposes than to affect the priority.

  • upstream-fixed - the label means that the bug used to exist in the upstream version, but no longer does. It means that there is nothing more to wait for; moreover, it might be worth picking up the bug soon and at least evaluating the upstream bugfix.

Bug Reports with Patches

MariaDB encourages contributors to provide bug fixes, so bug reports which come with fixes in general have a quicker turnaround. The bug fix can come in the form of a Git pull request or, in simple cases, as a diff pasted into or attached to the bug report itself.

Principles for External Bug Reports

There are some basic rules for bugs, particularly for setting the Resolution value, which we want to stick to and which might differ from procedures you have come across in other projects. This mainly concerns external bugs (those that come from the community); for internal ones we can cut corners more freely.

All of this is easier to understand if one remembers that the Resolution field (or its analogues in other bug-tracking systems) is user-facing, as already mentioned above, and that it relates more to the report than to the bug itself.

Duplicate

An older bug report cannot be a duplicate of a newer one; it is nonsensical. The only possible exception is when an older bug has no useful information whatsoever and the reporter does not provide any helpful feedback, while a newer report was not closed as a duplicate right away and got some useful updates. The common example of such an exception is when the first report is just an optimized stack trace, no query, no data, nothing to work with, while the second report has a test case. But if the first reporter at least makes an effort to collaborate, the report deserves to be treated with respect.

Bug reports which have essentially different descriptions and/or test cases should not be duplicates. The common example is this: a developer creates a bug saying something like "this and that pieces of code are wrong, it should be so and so"; and then a user files a bug saying "this SQL produces a wrong result on this data set". Even if they are about the same error in the code at the end, they are not duplicate bug reports.

Obviously, a report can never be a duplicate of anything private (luckily it does not concern MariaDB server so far, as the bug reports are public).

In general, a bug report is a duplicate of another one if, and only if, the new reporter could find the existing report just by a reasonable JIRA search.

Cannot Reproduce

A bug report should not be closed as "cannot reproduce" if it was once verified/confirmed, but disappeared in later versions. It's unfair to the reporter, and also dangerous to the product. We should know why a bug stopped being reproducible -- either we find when and how it was fixed (and close the report as "Fixed in version X by a patch for Y"), or we discover that it wasn't in fact fixed, but just masked. The simplest example is a change of execution plan in optimizer: server would crash on a particular query, then due to a change in optimizer it started using a different plan for the same query, so it wouldn't go through the crashing path anymore. The crash is still there, though.

In general, the "cannot reproduce" resolution is a last resort. Usually if we can't reproduce something, it means that either the reporter did not provide required information (and then the resolution should be "Incomplete"), or we don't know what to request from the reporter, and then we should keep thinking, rather than close it. Of course, it happens that the bug is genuinely not reproducible, but it shouldn't be decided lightly.

This page is licensed: CC BY-SA / Gnu FDL

Security Bug Fixing

See MariaDB Security Policy.

See Also

  • MariaDB Maintenance Policy

  • MariaDB Deprecation Policy

  • MariaDB plc Legal

This page is licensed: CC BY-SA / Gnu FDL

Building MariaDB Server for Debugging

Instructions on how to build a mysqld that contains all the information we need to fix problems you encounter. (A more detailed explanation can be found here.)

  • Add the --core-file option to your ~/.my.cnf or /etc/my.cnf file under the [mysqld] group.

  • Get the latest MariaDB code from GitHub.

  • Compile MariaDB with the -g compiler flag (Unix).

  • Optionally, for more checking: Compile MariaDB for debugging - this will cause a slowdown.

  • Shut down your old mysqld server.

  • Install the newly compiled mysqld binary. Note that if you are compiling the same version of MariaDB that you already have installed, it's enough to just copy this one binary!

  • Restart mysqld.

Compiling with -g should not cause any notable slowdown of the server.
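The compile steps above can be sketched as follows. This is a minimal sketch, assuming a fresh in-source checkout and a CMake Debug build (which implies the -g compiler flag); adjust paths, branches, and options to your environment:

```shell
# Fetch the latest MariaDB source (assumes git, cmake and a C/C++ toolchain are installed).
git clone https://github.com/MariaDB/server.git mariadb-server
cd mariadb-server

# Configure a debug build; CMAKE_BUILD_TYPE=Debug adds the -g compiler flag.
cmake . -DCMAKE_BUILD_TYPE=Debug

# Build using all available cores; this can take a while.
make -j"$(nproc)"
```

The resulting mysqld binary is typically found under the sql/ directory of the build tree and can be copied over the installed one as described above.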

You can of course also do make install, but the above way allows you to go back to your old binary if needed.

If you get any errors about a wrong number of error messages, you can fix that by copying the corresponding language files from sql/share over your old ones (this should be reasonably safe to do).

cp sql/share/english/* mariadb-install-dir/share/mysql/english

What to Do When You Get a Crash After Installing a Debug Binary

Now when you get a crash do the following:

  • Create a README file that describes the problem. You can use the mysqlbug script to generate a template for this.

  • Create a tar file containing the core, the mysqld binary and README. If possible, also add any database files that could help us repeat the problem!

sh> tar cvfz /tmp/mariadb-bug-'short-description'.tgz mariadb-data-dir/core* mariadb-install-dir/libexec/mysqld README
  • Send it to our secure ftp server:

sh> ftp -a ftp.askmonty.org
ftp> cd private
ftp> binary 
ftp> put /tmp/mariadb-bug-'short-description'.tgz
ftp> quit
  • To be able to follow the progress, create a bug report in JIRA about this. This should be easy to do based on the information you have in your README file.

See Also

  • Compiling MariaDB from Source.

  • Compiling MariaDB for debugging

  • How to produce a stack trace from a core file

This page is licensed: CC BY-SA / Gnu FDL

Extracting Entries from the Binary Log

Instructions to narrow down problems on a replication replica.

Sometimes a binary log event causes an error of some sort. Sharing a whole binary log file is sometimes impractical for size or sensitivity reasons.

Step 1: Copy the binary log locally

This is just in case you don't quite extract the right information the first time. If the binlog has expired and you haven't kept the right information, your bug report may not be easily reproducible.

sudo cp /var/lib/mysql/mysql-bin.000687 ~/
sudo chown $USER: ~/mysql-bin.000687

Step 2: Create an extract header

Binary logs have a header portion. Without the header, mariadb-binlog won't be able to read the log. The header also contains valuable session information.

We look at the binary log to see how big the header and session information is:

mariadb-binlog --base64-output=decode-rows --verbose mysql-bin.000687 | more
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#150323 22:45:58 server id 76  end_log_pos 245  Start: binlog v 4, server v 5.5.39-MariaDB-log created 150323 22:45:58
# at 245
#150323 22:45:58 server id 76  end_log_pos 328  Query   thread_id=9709067       exec_time=0     error_code=0
SET TIMESTAMP=1427116558.923924/*!*/;
SET @@session.pseudo_thread_id=9709067/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
SET @@session.time_zone='SYSTEM'/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 328

We see that the session information ends at 328 because of the last line, so we extract to that point.

dd if=mysql-bin.000687 of=mysql-bin.000687-extract-offset-129619 bs=1 count=328

We need to find out at what offset the entry at 129619 ends, and it might be useful to extract some previous entries as well.

mariadb-binlog --base64-output=decode-rows --verbose mysql-bin.000687 | grep  '^# at ' |  grep -C 10 '^# at 129619$'
# at 127602
# at 127690
# at 128201
# at 128290
# at 128378
# at 128829
# at 128918
# at 129006
# at 129459
# at 129548
# at 129619
# at 129647
# at 130070
# at 130097
# at 130168
# at 130196
# at 130738
# at 130942
# at 130969
# at 131040
# at 131244

Take a look at those entries with:

mariadb-binlog --base64-output=decode-rows --verbose --start-position 129006  --stop-position 130168  mysql-bin.000687 | more

Now let's assume we want to start at our original 129619 and finish before 130168:

dd if=mysql-bin.000687 bs=1 skip=129619 count=$(( 130168 - 129619 ))  >> mysql-bin.000687-extract-offset-129619

Check the extract:

mariadb-binlog mysql-bin.000687-extract-offset-129619

Upload this to the private uploads area, or attach it to the public bug report if there is nothing sensitive in it.
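The extraction steps above can be consolidated into one short script. This is a hedged sketch using the example offsets from this walkthrough (328 for the header, 129619 to 130168 for the wanted events); substitute your own values:

```shell
# Example offsets taken from the walkthrough above.
BINLOG=mysql-bin.000687
HEADER_END=328        # end of the header/session portion ("# at 328")
START=129619          # offset of the first event to extract
END=130168            # offset right after the last wanted event
OUT="${BINLOG}-extract-offset-${START}"

# Copy the header, then append the event range.
dd if="$BINLOG" of="$OUT" bs=1 count="$HEADER_END"
dd if="$BINLOG" bs=1 skip="$START" count=$(( END - START )) >> "$OUT"
```

The resulting file should again be verified with mariadb-binlog before uploading.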

This page is licensed: CC BY-SA / Gnu FDL

Profiling Memory Usage

Profiling the memory usage can be useful for finding out why a program appears to use more memory than it should. It is especially helpful for analyzing OOM situations or other cases where the memory grows linearly and causes problems.

To profile the memory usage of a program, there are multiple options. The following sections describe the methods that are available.

If a problem in memory usage is identified and it appears to be due to a bug, please open a new bug report on the MariaDB Jira under the correct project and include the relevant memory profiling output in it. Refer to How to Write a Good Bug Report for more details.

Known issues that can cause extended memory usage

Transparent huge pages (THP)

Transparent huge pages (THP), which is enabled by default in many newer Linux distributions, can cause out-of-memory issues for MariaDB, as THP is not suitable for databases. This is described in MDEV-33279.

MariaDB Community Server 10.6.17, MariaDB Enterprise Server 10.6.16-11 and all other MariaDB server releases after these have THP disabled.

System malloc is not good if there are a lot of allocations of different sizes

If Memory_used and information_schema.processlist do not show an increase in memory, but the process still increases in size, then a likely problem is the system memory allocation library (malloc). Replacing malloc with tcmalloc or jemalloc should fix the issue in this case.

Profiling with the MariaDB server

Recent MariaDB versions have a global status variable Memory_used that shows how much memory the MariaDB server has allocated. By monitoring this variable, one can find out if the memory allocated by MariaDB grows.
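Since Memory_used is a status variable, it can be checked like this (a minimal sketch):

```sql
-- Total memory currently allocated by the server, in bytes.
SHOW GLOBAL STATUS LIKE 'Memory_used';
```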

One can also check memory usage per user with the information_schema.processlist:

SELECT id, MEMORY_USED, MAX_MEMORY_USED FROM information_schema.processlist;

This shows the current memory used per connection and the maximum memory they have used since the user connected.

The Performance Schema can also be used to find out what is allocating memory and for what purpose.

Note that one can also set the max_session_mem_used variable to restrict a user's memory usage.
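As a hedged illustration (the value here is an arbitrary example of 1 GiB, expressed in bytes):

```sql
-- Limit the memory a single session may allocate to about 1 GiB.
SET GLOBAL max_session_mem_used = 1073741824;
```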

BPF Compiler Collection (bcc)

The BPF Compiler Collection (bcc) toolkit comes with the memleak program that traces outstanding memory allocations. This is a very convenient way of debugging high memory usage, as it immediately shows where the memory is allocated.

By default the tool will print output once every five seconds with the stacktraces that have the most open allocations. Ctrl+C can be used to interrupt the collection of the traces.

The profiling interval and the profiling duration can be passed as arguments to memleak. The first argument is how often a sample is taken and the second argument is how long to sample for. To help analyze excessive memory usage, collect the output of the memleak program for at least 60 seconds. The longer the profiling can be left on, the more accurate the information will be.

The overhead of the profiling can be large enough that it affects production workloads negatively. To reduce the overhead, the sampling frequency of memory allocations can be lowered using the --sample-rate option:

-s SAMPLE_RATE, --sample-rate SAMPLE_RATE
                        sample every N-th allocation to decrease the overhead

For example, -s 10 will sample only 10% of memory allocations, which may miss memory leaks from individual allocations; but the longer the system is left running, the more likely it is that a leaking memory allocation is sampled. This means that even with a lower sampling rate, the source of the memory leak will eventually be found.

RHEL, CentOS, Rocky Linux and Fedora

On RHEL based systems, the package is named bcc-tools. After installing it, use the following command to profile the memory usage 5 times per second over a window of 60 seconds:

sudo /usr/share/bcc/tools/memleak -p $(pidof mariadbd) 5 60 | tee memleak.log

Ubuntu and Debian

On Ubuntu/Debian the package is named bpfcc-tools. After installing it, use the following command to profile the memory usage 5 times per second over a window of 60 seconds:

sudo memleak-bpfcc -p $(pidof mariadbd) 5 60 | tee memleak.log

Jemalloc Heap Profiling

Jemalloc is an alternative to the default glibc memory allocator. It is capable of analyzing the heap memory usage of a process which allows it to be used to detect all sorts of memory usage problems with a lower overhead compared to tools like Valgrind. Unlike the ASAN and LSAN sanitizers, it is capable of detecting cases where memory doesn't actually leak but keeps growing with no upper limit (e.g. items get appended to a list but are never removed).

Ubuntu and Debian

To enable jemalloc, the packages for it must be first installed from the system repositories. Ubuntu 20.04 requires the following packages to be installed for jemalloc profiling:

apt-get -y install libjemalloc2 libjemalloc-dev binutils

RHEL, CentOS, Rocky Linux and Fedora

The version of jemalloc that is available in most Red Hat repositories is not compiled with memory profiling support enabled. For RHEL based distributions, the only option is to build jemalloc from source.

Configuring Jemalloc for Heap Profiling

Once installed, edit the systemd service file with systemctl edit mariadb.service and add the following lines into it. The path to the libjemalloc.so file is OS-specific so make sure it points to the correct file. The example here is for Ubuntu and Debian environments.

[Service]
Environment=MALLOC_CONF=prof:true,prof_leak:true,prof_gdump:true,lg_prof_sample:18,prof_prefix:/var/lib/mysql/jeprof/jeprof
Environment=LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2

Then create the directory for the profile files:

mkdir /var/lib/mysql/jeprof/
chown mysql:mysql /var/lib/mysql/jeprof/

And finally restart MariaDB with systemctl restart mariadb.service.

The /var/lib/mysql/jeprof/ directory will start to fill with versioned files with a .heap suffix. Every time the virtual memory usage reaches a new high, a file will be created. Initially, the files will be created very often, but eventually the pace will slow down. Once the problematic memory usage has been identified, the latest .heap file can be analyzed with the jeprof program.

The simplest method is to generate a text report with the following command.

jeprof --txt /usr/sbin/mariadbd $(ls -1 /var/lib/mysql/jeprof/*.heap|sort -V|tail -n 1) > heap-report.txt

A better way to look at the generated heap profile is with the PDF output. However, this requires the installation of extra packages (apt -y install graphviz ghostscript gv). To generate the PDF report of the latest heap dump, run the following command:

jeprof --pdf /usr/sbin/mariadbd $(ls -1 /var/lib/mysql/jeprof/*.heap|sort -V|tail -n 1) > heap-report.pdf

The generated heap-report.pdf will contain a breakdown of the memory usage.

Note that the report generation with the jeprof program must be done on the same system where the profiling was done. If done elsewhere, the binaries do not necessarily match and can cause the report generation to fail.

Tcmalloc Heap Profiling

Similarly to the jemalloc memory allocator, the tcmalloc memory allocator comes with a leak checker and heap profiler.

Installation

RHEL, CentOS and Rocky Linux

On RHEL based systems, the gperftools package is in the EPEL repositories. These must first be enabled by installing the epel-release package.

sudo dnf -y install epel-release

After this, the gperftools package can be installed.

sudo dnf -y install gperftools

Ubuntu 20.04

sudo apt -y install google-perftools

Service file configuration

Once tcmalloc is installed, edit the systemd service file with systemctl edit mariadb.service and add the following lines into it.

Note: Make sure to use the correct path and library name for the tcmalloc library in LD_PRELOAD. The following example uses the Debian location of the library; on RHEL systems, the file is usually located at /usr/lib64/libtcmalloc_and_profiler.so.4. The version number of the library can also change, which might require further adjustments to the library path.

[Service]
Environment=LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_and_profiler.so.4
Environment=HEAPPROFILE=/var/lib/mysql/pprof/mariadbd.prof
Environment=HEAPCHECK=normal
Environment=HEAP_CHECK_AFTER_DESTRUCTORS=true

Then create the directory for the profile files:

mkdir /var/lib/mysql/pprof/
chown mysql:mysql /var/lib/mysql/pprof/

And finally restart MariaDB with systemctl restart mariadb.service.

Configuring Heap Dump Frequency

The heap profiling is configured using environment variables. The details can be found in the Modifying Runtime Behavior section of the gperftools documentation: heapprofile.html

By default, tcmalloc dumps the heap profile every time 1GiB of memory has been allocated (HEAP_PROFILE_ALLOCATION_INTERVAL) or whenever the high-water memory usage mark increases by 100MiB (HEAP_PROFILE_INUSE_INTERVAL). If there's no activity, no memory dumps will be generated.
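Both intervals take a byte count and can be adjusted in the same systemd override as the other variables. As a sketch, the following illustrative values would dump after every 512MiB of allocations or every 50MiB increase of the in-use high-water mark:

```
[Service]
Environment=HEAP_PROFILE_ALLOCATION_INTERVAL=536870912
Environment=HEAP_PROFILE_INUSE_INTERVAL=52428800
```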

To trigger a memory dump based on a time interval, set the HEAP_PROFILE_TIME_INTERVAL environment variable to the number of seconds between each dump. For example, with Environment=HEAP_PROFILE_TIME_INTERVAL=3600 there will be one heap dump per hour.

Report generation

Depending on which OS you are using, the report generation program is named either pprof (RHEL) or google-pprof (Debian/Ubuntu).

It is important to pick the latest .heap file to analyze. The following command generates the heap-report.pdf from the latest heap dump. The file will show the breakdown of the memory usage.

pprof --pdf /usr/sbin/mariadbd $(ls /var/lib/mysql/pprof/*.heap|sort -V|tail -n 1) > heap-report.pdf

See also

  • Using MariaDB with tcmalloc or jemalloc

This page is licensed: CC BY-SA / Gnu FDL

Legal Documents

There are several legal aspects of MariaDB related to licensing and contributing code.

MariaDB Contributor Agreement

Questions? See the MCA FAQ page.

MariaDB Contributor Agreement (MCA)

These terms apply to your contribution of materials to a product or project owned or managed by us ('project'), and set out the intellectual property rights you grant to us (The MariaDB Foundation) in the contributed material. If this contribution is on behalf of a company, the term 'you' will also mean the company you identify below. If you agree to be bound by these terms, fill in the information requested below and provide your signature. Read this agreement carefully before signing.

  • The term 'contribution' means any source code, object code, patch, tool, sample, graphic, specification, manual, documentation, or any other material posted, committed or submitted by you to a project. Each submission must explicitly be marked that it's donated under the MCA.

  • With respect to any worldwide copyrights, or copyright applications and registrations, in your contribution:

    • you hereby assign to us joint ownership, and to the extent that such assignment is or becomes invalid, ineffective or unenforceable, you hereby grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge, royalty-free, unrestricted license to exercise all rights under those copyrights. This includes, at our option, the right to sub-license these same rights to third parties through multiple levels of sub-licensees or other licensing arrangements; you agree that each of us can do all things in relation to your contribution as if each of us were the sole owners, and if one of us makes a derivative work of your contribution, the one who makes the derivative work (or has it made) will be the sole owner of that derivative work;

    • you agree that you will not assert any moral rights in your contribution against us, our licensees or transferees;

    • you agree that we may register a copyright in your contribution and exercise all ownership rights associated with it; and

    • you agree that neither of us has any duty to consult with, obtain the consent of, pay or render an accounting to the other for any use or distribution of your contribution.

  • With respect to any patents you grant licenses to without payment to any third party, which would restrict us from using the contributed code as if we would own a shared copyright to it, you hereby grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge, royalty-free license to: make, have made, use, sell, offer to sell, import, and otherwise transfer your contribution in whole or in part, alone or in combination with or included in any product, work or materials arising out of the project to which your contribution was submitted, and

    • at our option, to sub-license these same rights to third parties through multiple levels of sub-licensees or other licensing arrangements.

    • Note that you don't give us rights to the patent in any other way than to give us the right to freely use the contributed code in the above ways.

  • Except as set out above, you keep all right, title, and interest in your contribution. The rights that you grant to us under these terms are effective on the date you first submitted a contribution to us, even if your submission took place before the date you sign these terms. Any contribution we make available under any license will also be made available under a suitable FSF (Free Software Foundation) or OSI (Open Source Initiative) approved license.

  • With respect to your contribution, you represent that:

    • it is an original work and that you can legally grant the rights set out in these terms;

    • it does not to the best of your knowledge violate any third party's copyrights, trademarks, patents, or other intellectual property rights; and

    • you are authorized to sign this contract on behalf of your company (if identified below).

  • These terms will be governed by the laws of Finland. Any choice of law rules will not apply.

Signatory Information

Your user name on GitHub:



Your contact information (Please print clearly): 

Your name:



Your company's name (if applicable):



Mailing address:



Telephone:



Email:

 

Your signature:



Date:

To deliver these terms to us, scan and email to

f o u n d a t i o n [at] m a r i a d b (dot) o r g

MariaDB Contributor Agreement – version 1.0

Questions? See the MCA FAQ page.

This page is licensed: CC BY-SA / Gnu FDL

MariaDB Contributor Agreement FAQ

What is the Intention of the MCA?

  1. The original copyright holder and the MariaDB Foundation both get (shared) copyright for the code.

  2. Both parties can use their copy independently of each other for any purpose (as if each copy was completely separate and unique).

What if I Don't Want to Submit Code to You Under the MCA?

If you don't like the MCA for code that you want to contribute to MariaDB, the other option is to submit your code under the BSD-new license.

What is the difference between MCA and BSD-new?

The main difference is that MCA is a copyright assignment agreement; you assign the copyright to the MariaDB Foundation (but you retain it too, on your copy of the contribution), while BSD-new is a license; you allow the MariaDB Foundation to use your contribution under specific (albeit rather broad) terms.

When should I use MCA and when should I use BSD-new?

With MCA, the MariaDB Foundation becomes a copyright holder, and it can defend the GPL and pursue violators, protecting its and your rights. This is the same reason why the Free Software Foundation requires copyright assignment too.

On the other hand, BSD-new is a well known FSF-approved Open Source license.

If you contribute on behalf of your employer, your employer's legal department is most probably already familiar with the BSD-new license. It might be much easier and faster to get them to approve your contribution if you contribute under this license.

If you are an individual contributor (or if your employer is fine with MCA), it makes sense to contribute under MCA, as it will allow the MariaDB Foundation to defend its Open Source way when needed.

For most other practical purposes there is no difference between these two ways of contributing. You keep the copyright to your contribution, and the MariaDB Foundation will be able to use your contribution in MariaDB Server, relicense it (for example, as GPLv2 or GPLv3), or use it separately as a patch.

What Will the MariaDB Foundation Do With the Code I Contribute?

The MariaDB Foundation agrees that when it dual licenses code, it will not restrict the way the third party licensee uses the licensed copy of the code nor restrict how they use their own code.

Where Did the MCA Come From?

The MCA is based on Sun's Contributor Agreement. We think it is one of the best contributor agreements around!

What's the Difference Between Sun's SCA and the MCA?

Sun's SCA was unclear in a couple of points which we have fixed in the MCA:

  • The patent grant has been made clear that it only affects the donated code, not any other code.

  • We added a clarification of how code is donated: "Each submission must explicitly be marked that it's donated under the MCA"

  • We have made a promise in this FAQ how the MariaDB Foundation will Dual-license code. See Monty's blog post "Thoughts about Dual-licensing Open Source software" for a discussion about this.

What Happened to the Monty Program Contributor Agreement?

With the establishment of the MariaDB Foundation, Monty Program Ab transferred all existing signed contributor agreements to the Foundation.

Other Questions?

Question not answered? Subscribe to the maria-developers mailing list and ask there, or visit us on our Zulip channel or the #maria IRC channel.

This page is licensed: CC BY-SA / Gnu FDL

Frequently Asked Questions

There are many questions that are frequently asked about MariaDB, the organizations behind it, and the software. They have been grouped based on common topics. Click on a topic to find the question you may have. If you don't see it, please click on the button in the left margin labeled "Ask a Question Here". We'll try to answer your question quickly.

MariaDB Community Server

  • Q: What is MariaDB Community Server and its main purpose? A: MariaDB Community Server is a popular free and open-source relational database management system (RDBMS), originally forked from MySQL. It is actively developed and maintained by MariaDB plc and a global community of contributors. It's known for its strong performance, stability, and a rich feature set suitable for a wide range of applications.

  • Q: Can MariaDB Community Server be used as a direct MySQL replacement? A: Yes, in many scenarios and for numerous versions, MariaDB Community Server is designed to be a compatible, drop-in replacement for MySQL, facilitating easy migration for users. However, as both MariaDB and MySQL evolve independently, some differences may exist in newer or more advanced features.

  • Q: What are the key benefits and features of MariaDB Community Server? A: Key benefits of MariaDB Community Server include its true open-source nature (GPLv2 license), a broader selection of advanced storage engines (like Aria, MyRocks, and Spider, in addition to the standard InnoDB), continuous feature enhancements, performance improvements, and the support of a vibrant, active global community.

  • Q: What license is MariaDB Community Server distributed under? A: MariaDB Community Server is licensed under the GNU General Public License version 2 (GPLv2), ensuring it remains free and open source.

  • Q: Where can I download MariaDB Community Server? A: You can download official MariaDB Community Server packages from the MariaDB Foundation website (mariadb.org). Additionally, it is readily available in the software repositories of most major Linux distributions for easy installation.

The mariadb-backup Tool

  • Q: What is mariadb-backup and its primary use? A: mariadb-backup is a command-line utility included with MariaDB Community Server. Its primary use is to perform physical "hot" backups of MariaDB database instances, especially those utilizing transactional storage engines like InnoDB and XtraDB. This tool is based on the well-regarded Percona XtraBackup technology.

  • Q: What are the main advantages of using mariadb-backup for database backups? A: The main advantages of mariadb-backup are its ability to perform online (hot) backups with minimal locking. This means your MariaDB database remains largely available for both read and write operations during the backup process. For large databases, mariadb-backup is generally much faster than logical backup methods (like mariadb-dump) and, when used in conjunction with MariaDB's binary logs, it enables precise point-in-time recovery (PITR).

  • Q: How does the mariadb-backup utility work to create backups? A: mariadb-backup works by copying the physical data files from the MariaDB data directory while the server is running. It continuously monitors for changes made to these files during the backup operation and records these modifications in a separate log file. In a subsequent "prepare" phase, these logged changes are applied to the copied data files to ensure they are brought to a transactionally consistent state, ready for restoration.

  • Q: Is mariadb-backup a suitable backup solution for all MariaDB storage engines? A: mariadb-backup is primarily designed for, and works most effectively with, transactional storage engines such as InnoDB and XtraDB, for which it can guarantee consistency and perform online backups. While it might be able to copy data files for other storage engines like Aria or MyISAM, it cannot ensure the same level of transactional consistency or perform true online backups for them. For non-transactional engines, mariadb-dump or filesystem-level snapshots might be more appropriate choices.

  • Q: Is the mariadb-backup tool free to use? A: Yes, mariadb-backup is an open-source tool that is included as a standard utility with MariaDB Community Server and is completely free to use.
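As a sketch of the workflow described above, a full backup is taken and then prepared with two separate invocations (the target directory and credentials below are placeholders; adjust them to your environment and required privileges):

```shell
# Copy phase: physical data files are copied while the server
# keeps running; concurrent changes are captured in a redo log.
mariadb-backup --backup \
  --target-dir=/backup/full \
  --user=backup_user --password=backup_password

# Prepare phase: apply the captured changes so the copied files
# reach a transactionally consistent, restorable state.
mariadb-backup --prepare --target-dir=/backup/full
```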

Community Questions

How Can I Contribute to MariaDB?

There are many ways to contribute to or participate in the development of MariaDB. You don't have to be a developer (though we always welcome more of those); you just have to be willing to help make MariaDB better. For example, we are looking for writers or translators of KB articles and for people to help set up MariaDB discussion groups.

Contributing to the MariaDB Project is the page that gathers together everything you need to get started.

The community category contains a lot of interesting links about how to participate.

You can also consider sponsoring a feature.

Welcome to the MariaDB community!

This page is licensed: CC BY-SA / Gnu FDL

DBA Questions

Developer Questions

How can a VBA Application Connect to MariaDB?

How can I Learn about Developing MariaDB?

See MariaDB Development for this.

This page is licensed: CC BY-SA / Gnu FDL

How many JOIN Clauses are Allowed in a Query?

According to MySQL docs: joins-limits.html

The maximum number of tables that can be referenced in a single join is 61. This also applies to the number of tables that can be referenced in the definition of a view.

Does the same maximum (61) apply to MariaDB, too?

If yes, are there plans for raising this limit?

This page is licensed: CC BY-SA / Gnu FDL

Remote connection work but often get error HY000/2002 connection timed out

Hi all,

I have created a remote connection between two Linux servers running MariaDB. One runs CentOS Stream 9 with 10.5.16-MariaDB and the second runs CentOS 7 with 5.5.68-MariaDB. Through a PHP script automated with cron I run queries, and it works.

The problem is that I often get the error HY000/2002 connection timed out. The error occurs at random times, whether run via cron or launched manually. For example, the script works 10 times and on the 11th run I receive the error (HY000/2002 connection timed out), which can occur one or more times consecutively before the script works again on a later execution. The number of times the script works or fails is always different.

I can't find the cause of the error. Any suggestions?

Thanks

This page is licensed: CC BY-SA / Gnu FDL

Tools Available for Developing on the MariaDB Code

The code is hosted on GitHub. You can branch the latest code from there, and you can also push your own changes as a new branch that can be shared with others.

Building the code from source is done with standard Unix tools: CMake (or autotools for MariaDB versions below 5.5), Gnu Make, GCC (or other C/C++ compiler on some systems). On Windows, CMake and Visual Studio are used.

The current state of the source with respect to build/test failures can be seen in buildbot.

For project management and bug tracking, we use JIRA.

The source page has links to instructions on setting up a full development environment, if you are interested.

This page is licensed: CC BY-SA / Gnu FDL

Where is the table_cache in MariaDB?

All versions of MariaDB are based on MySQL 5.1 or greater; thus, the table_cache option is deprecated in favor of table_open_cache. This is also documented at: mariadbd Options.

For further reading, please refer to the MySQL manual: How MySQL Opens and Closes Tables.

Examples of use cases:

MariaDB [(none)]> SHOW GLOBAL STATUS LIKE 'opened_tables';
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| Opened_tables | 354858 |
+---------------+--------+
1 row in set (0.00 sec)

MariaDB [(none)]> SELECT @@table_open_cache;
+--------------------+
| @@table_open_cache |
+--------------------+
|                400 |
+--------------------+
1 row in set (0.00 sec)
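In the output above, Opened_tables is large relative to a 400-entry cache; if that counter keeps climbing during normal operation, the cache may be too small. A possible adjustment (the value 2000 is illustrative, not a recommendation):

```sql
-- Raise the open-table cache at runtime; add the setting to the
-- server's option file as well so it survives a restart.
SET GLOBAL table_open_cache = 2000;

-- Verify the new value.
SELECT @@table_open_cache;
```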

This page is licensed: CC BY-SA / Gnu FDL

Why is ORDER BY in a FROM Subquery Ignored?

A query with an ORDER BY in a FROM subquery produces an unordered result. Is this a bug? Below is an example of this:

SELECT field1, field2 FROM ( SELECT field1, field2 FROM table1 ORDER BY field2 ) ALIAS

returns a result set that is not necessarily ordered by field2. This is not a bug.

A "table" (and subquery in the FROM clause too) is - according to the SQL standard - an unordered set of rows. Rows in a table (or in a subquery in the FROM clause) do not come in any specific order. That's why the optimizer can ignore the ORDER BY clause that you have specified. In fact, the SQL standard does not even allow the ORDER BY clause to appear in this subquery (we allow it, because ORDER BY ... LIMIT ... changes the result, the set of rows, not only their order).

You need to treat the subquery in the FROM clause as a set of rows in some unspecified and undefined order, and put the ORDER BY on the top-level SELECT.

Source: MDEV-3926, Comment by Sergei Golubchik
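Applied to the example at the top of this page, the rewrite looks like this: the subquery loses its ORDER BY, and the ordering moves to the outermost SELECT, which is the only place the optimizer must honor it:

```sql
SELECT field1, field2 FROM ( SELECT field1, field2 FROM table1 ) ALIAS ORDER BY field2
```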

See also

  • MDEV-3926, MDEV-5007, MDEV-3795.

  • SELECT

  • ORDER BY

This page is licensed: CC BY-SA / Gnu FDL

General Questions

Frequently Asked Questions for MariaDB that also apply to MySQL

Does the Password Prompt Appear Before or After Establishing a Connection?

Question: When I use the mysql client as in mysql -u root -p -h <host>, does the password-enter prompt popup before or after the client has established a connection with the server?

Answer: Before the client has established a connection to the server.

How do we derive the answer?

lovegood:~ byte$ mysql -uroot -p -hlocalhost
Enter password: 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 1
Server version: 5.2.8-MariaDB Source distribution

This software comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to modify and redistribute it under the GPL v2 license

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> \q
Bye
lovegood:~ byte$ mysql.server stop
Shutting down MySQL
. SUCCESS! 
lovegood:~ byte$ mysql -uroot -p -hlocalhost
Enter password: 
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)

This page is licensed: CC BY-SA / Gnu FDL

What is the Goal of MariaDB?

To provide a community developed, stable, and always Free DBMS that is, on the user level, broadly compatible with MySQL.

We strive for interoperability with both our own, and our upstream, communities.

Until MariaDB 5.5, MariaDB was kept up to date with the latest MySQL release from the same branch. For example MariaDB 5.1.47 was up to date with MySQL 5.1.47, and so on.

We did a merge from the main MySQL branch for every new MySQL release or when there was some critical bugfix applied to the main branch.

Since MariaDB 10.0, MariaDB includes backported features from MySQL as well as entirely new features not found anywhere else, but does not necessarily include all MySQL features.

We strive to keep our main trees as free from bugs as possible. It should be reasonably safe to pull from our trees at any time.

This page is licensed: CC BY-SA / Gnu FDL

Why is the Software Called MariaDB?

The 'MySQL' name is trademarked by Oracle, and they have chosen to keep that trademark to themselves. The name MySQL (just like the MyISAM storage engine) comes from Monty's first daughter My. The first part of 'MySQL' is pronounced like the English adjective, even if this doesn't match the correct pronunciation of the Finnish name.

MariaDB continues this tradition by being named after his younger daughter, Maria.

The name Maria was initially given to a storage engine. After MariaDB was started, to avoid confusion, it was renamed to Aria. The new name was decided as a result of a contest.

This page is licensed: CC BY-SA / Gnu FDL

High Availability Questions

Licensing Questions

Here you can find answers to questions about licensing MariaDB, MySQL, plugins, etc.

Licensing FAQ

In this article we will explain how the MariaDB and MySQL server and client library licenses affect other applications and what steps you can take to comply with the GPL or avoid having to change your applications to GPL.

In the following text we talk about MariaDB. The guidelines also apply to MySQL in general, at least up to 5.1.55 which was the latest stable release at the time of writing.

Licenses used by MariaDB

MariaDB is distributed under the GPL license, version 2.

The MariaDB client libraries for C, Java and ODBC are distributed under the LGPL license, version 2.1 or later. The LGPL license allows you to distribute these MariaDB client libraries freely with any application.

The MariaDB client library included with the MariaDB server is also GPL version 2, but has a FLOSS exception that allows you to combine it with most other open source software without conflicting with their license, even if that license is incompatible with the GPL. We do, however, recommend that you use the new client libraries for any non-GPL application.

Internal usage is free

The GPL license only affects code that you distribute to other parties.

Internal usage within an organization is totally free and not subject to any conditions. There is no such thing as 'internal distribution' that would restrict the usage of your code by requiring it to be GPLed.

Connecting to a remote service that runs MariaDB (or any other GPL software) in the background is also free.

For internal programs for which you own all the copyright(s), there is essentially no risk in using GPL software. The argument you can use in your defense is that if the software became GPL as part of the distribution, you as the copyright holder could immediately revert your part back to its original copyright. No one has the right to require you to reveal or redistribute your code outside your organization, even if you have distributed it internally linked with GPL software!

If your lawyers are concerned about distributions of software linked with GPL libraries between different legal entities within your organization, you can solve this by distributing your components and the GPL software separately, and have your other entity combining them. You can also switch to use the new LGPL client libraries.

Distributing an application with a MariaDB connector/client

This section is for those that want to distribute the MariaDB client library code, but not the server, with their applications.

Free software/open source applications

If your application is Free software/open source and uses one of the licenses listed in the FLOSS exception, the GPL in the client library does not affect your application.

In other cases we recommend you to use the new LGPL client libraries.

Using a connector that is not GPL

If you are using a connector that is not GPL, you are only bound by the license of that connector. Some examples are:

  • MySQL native driver for PHP - mysqlnd

  • ruby-mysql

  • LGPL client libraries for C, Java and ODBC.

The above have licenses that allow you to use them freely, without you being bound by the GPL.

Using a database source independent framework

If you are using a framework that allows you to connect dynamically to different RDBMS systems, any GPL licensed module loaded by the framework will not affect the application. Such frameworks are

  • ODBC (Open Database Connectivity)

  • JDBC (Java Database connectivity)

  • Perl

  • PHP PDO MySQL driver

The reason the GPL in the MySQL client library would not affect your application in this case is that the GPL client is supporting a standard interface and is thus merely an optional component among many. Your application could just as easily use the framework to connect to a RDBMS other than MariaDB or MySQL.

Any software can be connected to the GPL v2 licensed MySQL Connector/ODBC, without the need for that software to be GPLed. This is because there is a piece of general management software, the ODBC manager, between the GPLed MySQL Connector/ODBC and your software. If any logic required the software that interfaces with MySQL Connector/ODBC to be GPL, then that would also apply to the ODBC manager itself. Yet the ODBC manager is not GPL, neither on Windows nor on Linux. As a consequence, no one would be allowed to use the MySQL ODBC driver for anything.

Using the MariaDB client library for C

If your application is using a license that is not covered by the FLOSS exception, then you should use the new LGPL client libraries for C.

The LGPL license allows you to distribute these MariaDB client libraries freely with any application. If you modify a client library, you need to publish the new source code.

Distributing a proprietary application with the MariaDB / MySQL server

When you are distributing your application together with MariaDB or MySQL you are bound (or can be seen to be bound by some lawyers) by the GPL if some of the following statements apply:

  • You are using GPL code from MySQL linked directly to your application. (Like the MySQL GPL client library).

  • Your application requires the MariaDB server to work and without the MariaDB server it doesn't start or it has very limited functionality.

The problem with the client library can be avoided by using one of the solutions mentioned earlier.

If your application works with many databases, either natively or by using one of the database source independent frameworks, then you can freely distribute the MariaDB server with your application without being affected by the GPL. The reason for this is that MariaDB would only be an optional, independent component in your software distribution, and section 2 of the GPL explicitly allows this:

"In addition, mere aggregation of another work not based on
the Program with the Program (or with a work based on the Program) on
a volume of a storage or distribution medium does not bring the other
work under the scope of this License."

You also have the option to buy licenses for MySQL from Oracle to get MySQL under other copyright terms. If you would like to later be able to use MariaDB instead of MySQL, please ensure that your license agreement allows you to make changes to the MySQL code! (This is something that you should ensure in all cases as otherwise you may run into bugs that Oracle will not fix, you are not allowed to fix and could make MySQL software unusable for you!)

The rights to use the MariaDB code changes in your application can be requested from SkySQL.

Legal notice

The text above is written by Michael "Monty" Widenius, who is not a lawyer and you should not regard any statements of the above as 'ultimate truth' in all scenarios. On the other hand, it was David and Monty who together decided to make MySQL GPL and also decided and openly declared the intentions behind this license change, so there is some merit to information in this article.

If you want a second opinion on how the GPL works in this case, you can contact the Software Freedom Law Center or the Free Software Foundation. Neither party has had anything to do with this article, but they are the default authorities to turn to when you want to know more about the GPL or LGPL.

This page is licensed: CC BY-SA / Gnu FDL

MariaDB Licenses

MariaDB Software Questions

What is MariaDB?

MariaDB is a backwards-compatible, drop-in replacement for the MySQL® Database Server. It includes all major open source storage engines.

You can see a list of all current and old releases as well as a feature comparison to MySQL.

The source code for MariaDB is publicly available on GitHub. Binaries and packages are also available.

This page is licensed: CC BY-SA / Gnu FDL

What is MariaDB's Release Policy and Schedule?

The MariaDB Development Plans page links to plans for future versions of MariaDB.

The release schedule for upcoming MariaDB releases can be found on the MariaDB Jira release page.

See also the release criteria page.

This page is licensed: CC BY-SA / Gnu FDL

What will be in MariaDB version X?

You can read about the features and improvements included in the current stable release of MariaDB on the MariaDB versus MySQL page.

The MariaDB Development Plans page links to our plans for future versions of MariaDB.

This page is licensed: CC BY-SA / Gnu FDL

When will MariaDB version X be released?

The downloads page has the current stable and development releases.

See the MariaDB Development Plans page for plans for future versions of MariaDB.

This page is licensed: CC BY-SA / Gnu FDL

Migrating to MariaDB Questions

How can I Import Only a Table's Structure?

How do I migrate a MySQL database to Amazon RDS?

Amazon Relational Database Service (Amazon RDS) is part of Amazon Web Services and allows one to set up and run a relational database in the cloud. It supports running MariaDB in addition to MySQL, PostgreSQL, and other common relational databases.

Gee-Hwan Chuang of Listia has written a guide titled Moving a Production MySQL Database to Amazon RDS with Minimal Downtime. He recommends using mysqldump.

This page is licensed: CC BY-SA / Gnu FDL

Red Hat 6 Process Limit Errors

When upgrading from MySQL 5.1 to MariaDB 5.5, you may encounter a process limit problem with your MariaDB installation. This is not a fault in MariaDB. The issue is likely triggered by high connection counts to the database and possibly exacerbated by not using thread pooling.

You might encounter the following error message if you are not out of available memory:

SQLSTATE[HY000] [1135] Can't create a new thread (errno 11);

You can consult the manual for a possible OS-dependent bug when trying to connect. There is a very good explanation, troubleshooting steps, and a workaround on the Percona MySQL Performance Blog.

Review your connection counts and thread consumption prior to upgrading to avoid this problem.
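On RHEL 6 the per-user process limit that commonly triggers this error can be checked and raised as sketched below. The file name and limit values are typical examples only, not taken from this page; adjust them for your system:

```shell
# Sketch only: check the current per-user process/thread limit.
ulimit -u

# RHEL 6 caps non-root users via /etc/security/limits.d/90-nproc.conf
# (the packaged default is often 1024). Raise the limit for the mysql
# user with a drop-in file:
cat > 90-nproc-mysql.conf <<'EOF'
mysql soft nproc 10240
mysql hard nproc 10240
EOF
# Install the file as /etc/security/limits.d/90-nproc-mysql.conf, then
# log the mysql user out and in again (or restart mysqld) to apply it.
```

The value 10240 is an illustrative starting point; size it to your expected connection count.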

This page is licensed: CC BY-SA / Gnu FDL

Public Questions

Questions asked by the public about MariaDB and related software. Note that bugs, problems and feature requests should be submitted to the MariaDB bug database so that they can be processed properly by the MariaDB developer and QA teams.

Support Questions

Getting Help With MariaDB

If you can't find help in the MariaDB documentation, you can also subscribe to the mailing lists or Zulip to communicate with MariaDB users and developers.

There's an active help community on StackOverflow for programming with SQL and indexing questions, and DBA StackExchange for non-programming questions, for example tuning, backups, architecture, migration, and replication. Use the mariadb tag so questions are searchable by those who can help you.

If you have a question about a feature that is not properly documented or something that is not working as expected, go to the corresponding documentation page and ask your question there.

You can report and check on bugs which apply to MariaDB in JIRA.

MariaDB plc offers commercial support.

This page is licensed: CC BY-SA / Gnu FDL

Supported Software & Features

Here you can ask whether MariaDB supports a certain feature that doesn't fit into any specific category.

Before asking a question about a non-existing feature, please first check JIRA to see if there are already plans for that feature.

Contributing & Participating

There are many ways to contribute to MariaDB.

Cover

Contributing Code

For contributors interested in MariaDB development, explore open projects via JIRA and check for beginner-friendly tasks. Engage with the community on the maria-developers mailing list, Slack, Zulip, or IRC channel for guidance.

Cover

Contributing to the MariaDB Project

The success of MariaDB relies heavily on community involvement. You can contribute in various ways, even if you are not a developer.

Contributing Code

For contributors interested in MariaDB development, explore open projects via JIRA and check for beginner-friendly tasks. Engage with the community on the maria-developers mailing list, Slack, Zulip, or IRC channel for guidance.

General information about contributing to MariaDB (for developers and non-developers) can be found on the Contributing to the MariaDB Project page.

Finding Development Projects to Work on

There are many open development projects for MariaDB which you can contribute to (in addition to any ideas you may have yourself).

  • We are using JIRA to manage the MariaDB project. Go to jira.mariadb.org and click on "Projects" to get to the MariaDB project. Browse around the unresolved and unassigned issues to see if there is something that interests you. Some issues have sponsors and you can be paid for doing them!

  • A list of beginner friendly tasks is also available.

  • Check the development plans for the next MariaDB version.

  • Join maria-developers and ask for suggestions of tasks you could do. In your email, please include your programming experience, your knowledge of the MariaDB source, and how much you know about using MySQL/MariaDB, so that we know which tasks to suggest to you.

  • If this is your first project, check out the Suggested Development page. It lists projects that will make a good start.

  • Join MariaDB's Zulip instance at mariadb.zulipchat.com and ask for suggestions.

  • Join #maria on IRC and ask for suggestions.

If you have your own ideas, please submit them to JIRA so other MariaDB developers can comment on them and suggest how to implement them. You can of course also use the maria-developers list for this.

What to Expect From a MariaDB Server Developer

This section is mainly directed to developers with commit rights to the MariaDB git repository. However, we hope it’s also useful for anyone wanting to contribute code to MariaDB to know what a reviewer will expect from them.

This is not about coding style or if one should prefer C instead of C++. That would be a separate topic that should be created sooner or later.

The Basics

When coding, try to create code that 'never has to be changed again'. Try to make the code as performant as possible. In general it is acceptable to spend 50% more time to make the code 15% faster than what you originally intended. Take that into account when you plan your time estimates! That said, don't try to add classes or functionality that is not yet used.

The code should be easy to read and follow the coding standards of the project. Patches that are smaller and simpler are often better than complex solutions. Don't make the server depend on new external libraries without first checking with Sergei or Monty!

Add code comments for anything that is not obvious. When possible, use assertions within the code to document expectations of arguments etc. In general, if the code requires complex comments, think if there is a better way to structure the logic. Simpler is often better and with fewer bugs.

What to Have in a Commit Comment

  • Jira issue number and summary, e.g.: MDEV-23839 innodb_fast_shutdown=0 hang on change buffer merge

  • An empty line

  • A short description of the problem

  • A description of the solution

  • Any extra information needed to understand the patch

  • The commit message should be self-contained; preferably, the reviewer shouldn't have to look at Jira at all to understand the commit. This doesn't mean that the commit message should include all the background and the different design options considered; that is what Jira should contain.

  • The names of all reviewers and authors should be clear from the commit message. The preferred format is (one line per person):

    • Reviewed-by: email

    • Co-authored-by: email

    • See creating-a-commit-with-multiple-authors for details

  • The default is that all code should be reviewed. Only in really extraordinary cases, like a merge (where the original code was already reviewed), can it be self-reviewed, and this should be clear from the commit message. In such a case, the code should of course be tested extra carefully, both locally and in buildbot, before pushing.
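Putting these points together, a commit message might look like the following sketch. The MDEV number, summary, problem description, and email addresses are made-up placeholders:

```
MDEV-12345 Server hangs on shutdown during change buffer merge

The shutdown code waited for the change buffer merge to complete
while holding a mutex that the merge thread also needed, causing
a deadlock.

Release the mutex before waiting for the merge thread, and
re-acquire it afterwards.

Reviewed-by: reviewer@example.org
Co-authored-by: coauthor@example.org
```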

Testing

  • All code should have a test case that shows that the new code works or, in the case of a bug fix, that the problem is fixed! It should fail with an unpatched server and work with the new version. In the extreme case that a test case is practically impossible to create, there needs to be documentation (in the commit message, optionally also in Jira) of how the code was tested.

  • The test case should have a reference to the Jira issue, if one exists.

  • Patches related to performance should be tested either by the developer (for simple commits) or by performance testers. The result should be put in Jira with a summary in the commit.

  • Complex patches should be tested by QA in a bb- branch before pushing. The Jira entry should include information that this has been done and what kind of test has been run.

    • Example: git push --force origin HEAD:bb-11.8-MDEV-1234

  • For anything non-trivial, one should run either Valgrind or ASAN/MSAN on the new code. (Buildbot will do this for you if you can't get valgrind or ASAN to work.) At least the added test case should be tested by one of them. If the developer cannot do that for some reason, they should check the buildbot builders that do this and ensure that at least their test case doesn't give any warnings about use of uninitialized memory or other failures.

  • For complex code the developer should preferably use gcov or some similar tool to ensure that at least all non-error branches are executed. “mtr --gcov” or “dgcov.pl” can help you with this.

Getting Your Code into the Main MariaDB Tree

All code in MariaDB comes from one of the following sources:

  • MySQL

  • Code developed by people employed by the MariaDB Foundation.

  • Code developed by people employed by MariaDB Corporation.

  • Code shared with the MariaDB Foundation under the MCA.

  • Code with a known origin that is under a permissive license (BSD or public domain).

If you want the code to be part of the main MariaDB tree, you also have to give the MariaDB Foundation a shared copyright to your code. This is needed so that the foundation can offer the code to other projects (like MySQL).

You do this by either:

  1. Signing the MariaDB Contributor Agreement (MCA) and then scanning and sending it to the foundation.

  2. Sending an email to maria-developers where you say that your patch and all fixes to it are provided to the MariaDB Foundation under the MCA.

  3. Licensing your code using the BSD license.

We need shared copyright for the following reasons:

  1. to defend the copyright or GPL if someone breaks it (this is the same reason why the Free Software Foundation also requires copyright assignment for its code)

  2. to be able to donate code to MySQL (for example to fix security bugs or new features)

  3. to allow people who have a non-free license to the MySQL code to also use MariaDB (the MCA/BSD allows us to give those companies the rights to all changes between MySQL and MariaDB so they can use MariaDB instead of MySQL)

More information about the MCA can be found on the MCA FAQ page.

Before Pushing Code to a Stable Branch

  • Ensure that you have compiled everything for your new code in a debug server (configured with cmake -DCMAKE_BUILD_TYPE=Debug), including embedded and all plugins that may be affected by your code change.

  • Run the mysql-test-run (mtr) test suite locally with your debug server.

  • For anything complex the full test suite should be run.

  • For something absolutely trivial, at least the main suite must be run.

  • Always push first to a bb- branch to test the code. When the bb- branch is green in buildbot, you can push to the main branch. Take care to check that the Windows build compiles (take extra care with this, as it often fails) and that the valgrind and msan builds don't show any problems with your new test cases.

    • You can find your push at the link similar to grid?branch=your-branch-name.

  • If you have to do a rebase before pushing, you have to start from the beginning again.

  • When porting code from third parties (such as MySQL), make sure to attribute copyright to the right owner, in the header of each modified file.

  • For example: Copyright © 2000, 2018, Oracle and/or its affiliates. Copyright © 2009, 2020, MariaDB

  • The only exception is that if the changes are trivial, the rebase was trivial, and the local mysql-test-run worked, then you can push directly to the main branch. Only do this if you are 99% sure there are no issues! Please don't make us regret that we have made this one exception! Once we have protected git branches, the above rule will be enforced automatically, as the protection will take care of this.
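The steps above can be sketched roughly as follows. This assumes you are inside a MariaDB source checkout; the branch name, version, and parallelism values are examples only:

```shell
# Sketch only; assumes a MariaDB source tree. Build a debug server:
cmake -DCMAKE_BUILD_TYPE=Debug . && make -j4

# Run the test suite locally with the debug server: at minimum the
# main suite, the full suite for anything complex.
cd mysql-test
./mysql-test-run.pl --parallel=4 --suite=main

# Push to a bb- buildbot branch first; push to the main branch only
# after the bb- branch is green in buildbot.
git push --force origin HEAD:bb-11.8-MDEV-1234
```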

Working on a New Project

  • First create a Jira entry that explains the problem and the different solutions that can be used to solve it. If there is new syntax, include examples of queries and results.

  • After getting an agreement of the to-be-used solution, update the Jira entry with the detailed architecture of the suggested solution.

  • When the architecture is reviewed, the assigned developer can start coding.

  • When the code is ready, the Jira entry should be updated with the reviewer.

  • The reviewer checks the code and either approves it to be pushed or gives the developer comments on what should be fixed. In the latter case, the developer updates the code and gives it back to the reviewer. This continues until the code is approved.

  • If the design changes during the project, the design in Jira needs to be updated.

Working on a Bug Fix

  • Ensure that the Jira issue is up to date.

  • For complex bugs that require redesign, follow the process in "Working on a new project"

  • For simpler bugs, one can skip the listing of different solutions and architecture. However, one should still document the reason for the bug and how it is fixed or to be fixed, in a JIRA comment.

Making Things Easier for Reviewers

  • Ensure that the code compiles and all MTR tests pass before asking for a review.

  • Try to split a bigger project into smaller, self-contained change sets.

  • Mechanical changes, like renames of classes, variables, functions, etc., are better done in a separate commit.

When Reviewing Code

Remember that the stability and security of any project depend heavily on the reviewers. If there is something wrong with an accepted patch, it's usually the reviewer who is to blame, as the reviewer was the one who allowed it to go in!

  • Ensure that the code is licensed under New BSD or another approved license for MariaDB (basically any open source license not conflicting with GPL) or that the contributor has signed the MCA.

  • GPL is only allowed for code from MySQL (as MariaDB is already depending on MySQL code).

  • Ensure that commits are not too large. If the code is very large, give suggestions how to split it into smaller pieces. Merge commits, when rebasing is possible, are not allowed, to keep history linear.

  • Check that the commit message describes the commit properly. For code that improves performance, ensure that Jira and the commit message contain information about the improvements.

  • Check that there are no unexplained changes in old tests.

  • Check the quality of the code (no obvious bugs, right algorithms used)

  • Check if any code can be simplified or optimized. Using already existing functions, are loops optimal, are mutexes used correctly etc.

  • Check that there is an appropriate test case for the code. See ‘testing’ for what is required!

  • Ensure the code follows the coding standard for MariaDB. This document should be created shortly; in the meantime, ask an experienced MySQL/MariaDB developer if you are unsure.

  • Ensure that the code follows the architecture agreed on in Jira (if it's in Jira).

  • Code should be easy to understand (good code comments, good function and variable names etc).

  • Ensure you understand every single line of code that is reviewed. If not, ask the developer to add more comments to get things clear, or ask for help from another reviewer.

  • No performance degradations for all common cases.

  • Any code that touches any sensitive area (files, communication, login, encryption or security) needs to have another reviewer that is an expert in this area.

See Also

  • Getting Started For Developers (mariadb.org)

  • Get the Code, Build It, Test It (mariadb.org)

  • Writing Good Test Cases for MariaDB Server (mariadb.org)

  • Submitting a Pull Request (mariadb.org)

  • Contributing to the MariaDB Project (for non-developers)

<<nowiki>>
<<style class="redbox centered">>
**The MariaDB source is now hosted on Github:  https://github.com/MariaDB/server. \\See [[using-git|Using Git]]. The information below is outdated and will be rewritten at some point.**
<</style>>

== Prerequisites

You need [[http://bazaar-vcs.org|Bazaar]] for revision control.

=== bzr Login Setup

* Get a launchpad account at https://launchpad.net/+login
* When logged in, setup your SSH keys
**  Click your name (top right corner)
**  Click Change Details link (upper right)
**  Click SSH Keys button (middle of page)
**  Upload your public SSH key
([[https://help.launchpad.net/YourAccount/CreatingAnSSHKeyPair|How do I get a public key?]])
* Register yourself with launchpad from your local commandline: <<fixed>>bzr launchpad-login [yourloginid]<</fixed>>

=== Getting the MariaDB Code

# First, [[getting-the-mariadb-source-code|get a clean copy of the MariaDB code]]
(for some, the fastest way to get up and running is to follow the
instructions in the "Source Tree Tarball" section).

#decimal.2 Once you have a clean copy of the source, create a working copy for
  your changes:
<<code lang=sh inline=false indent=1>>
cd $maria-repo # ex: ~/repos/maria
bzr branch trunk maria-fix-bugNNNNNN
cd maria-fix-bugNNNNNN
<</code>>

<<style class="bluebox">>
**Tip:** //Use descriptive names such as maria-fix-bugNNNNNN (where NNNNNN is the bug # of course).//
<</style>>

#decimal.3 You should now be ready to [[generic-build-instructions|Compile MariaDB]]. It's
a good idea to compile at this stage to confirm the source tree you are going
to be hacking on works . . . before you make any changes.

The following pages will help you get up and running with the MariaDB source
code:

* [[getting-the-mariadb-source-code|Getting the MariaDB Source Code]]
* Setup your build environment:
** [[build-environment-setup-for-linux|Linux]]
** [[build-environment-setup-for-mac|Mac]]
** [[building-mariadb-on-windows|Windows]]
* [[compiling-mariadb-from-source|Building MariaDB]]

== Setting up tests

The Maria test suite is contained in the <<code>>./mysql-test/<</code>>
subdirectory of the source tree. The mysql-test directory has two
subdirectories of utmost concern to you, the bug fixer: the <<code>>t/<</code>>
directory and the <<code>>r/<</code>> directory (for "tests" and "results"
respectively). Be sure to check and see if your bug already has a test too.

All the tests are found in the <<code>>t/<</code>> directory. Open up the file
that corresponds to the functionality you are changing (or add a new file) and
add the commands that will reproduce the bug or validate the new functionality.

For example, the test below creates a new test table "t1"; shows us the result
of that CREATE TABLE statement; and lastly, we cleanup the test by dropping the
test table:

<<sql>>
#
# Bug #XXXXXX: INET_ATON() returns signed, not unsigned
#
create table t1 select INET_ATON('255.255.0.1') as `a`;
show create table t1;
drop table t1;
<</sql>>

By adding your test first, it will remind you to re-record the test output file
later (and inform future efforts about your expected output, of course). Now
it's time to make your changes.

Examine existing tests to get a better idea of how you should write your test.

We are always on the lookout for better tests, so if you create new test or
improve an existing test, please upload it to the "private" folder on our
[[FTP]] server and then ping us on [[IRC]] or send a note to the
Maria-Developers mailing list to let us know about it.

== Editing and Adding to Your Contribution

With a working version, you can commence making changes in your new branch,
committing code regularly to your local working copy; feel free to commit early
and often as you will formalize your contribution later with a push and
proposal.
 
<<code lang=sh>>
cd $maria-repo/maria-fix-bugNNNNNN
# Make Changes
bzr commit -m "Merge comment"
<</code>>

Prior to publishing your completed work, you need to confirm that your branch
works as expected and update the test runs.

To allow others to see your commits, you should configure
[[setting-up-and-using-bazaar-bzr-with-mariadb|bzr]] to send its commit emails
to
[[http://lists.askmonty.org/cgi-bin/mailman/listinfo/commits|commits 'at' mariadb 'dot' org]]
email list.

== Testing Your Branch

Make sure you have at least libtool 1.5.22
([[http://www.gnu.org/software/libiconv/|found here]]).

First, check to see that all the tests pass (remember the test you set up
earlier? It should fail. That's ok, you will re-record it momentarily):

<<code lang=sh>>
cd $maria-repo/mysql-test
./mysql-test-run
<</code>>

Any that fail will need to be re-recorded (assuming the new result is correct).

<<code lang=sh>>
cd $maria-repo/mysql-test
./mysql-test-run --record $test # where $test is the name of the test that failed 
<</code>>

You are now ready to merge into trunk.

== Merging Recent Changes

It is important to merge any changes from trunk into your branch before pushing
or publishing.

Update your local trunk.

<<code lang=sh>>
cd $maria-repo 
cd trunk
bzr pull   
<</code>>

Updating your local branch.

<<code lang=sh>>
cd $maria-repo
cd maria-fix-bugNNNNNN
bzr merge ../trunk
bzr commit -m "Merged from trunk"
<</code>>

Conflicts can be resolved in bazaar via:

<<code lang=sh>>
bzr resolve $filename
<</code>>

To revert to your last commit on your branch use:

<<code lang=sh>>
bzr revert $filename
<</code>>

(Note you will need to remerge with trunk before pushing)

Verify differences carefully

<<code lang=sh>>
bzr diff
<</code>>

== Publish Your Branch

When all changes are merged and your changes are all consistent you can push
your branch to LaunchPad

<<code lang=sh inline=false>>
cd $maria-repos/$your-branch # where $your-branch is the branch you want to push (ex: maria-bugNNNN)
bzr push lp:~[yourloginid]/maria/$your-branch
<</code>>

If you find that this takes a very long time (eg. >30 minutes), you may want to
try using '<<code>>bzr init-repo --format=1.9<</code>>' to initialize a new
repo and merge your work into it, then push again.

== How to Propose Branch for Merging
 
On your Launchpad Code page
{{{https://code.launchpad.net/~{yourloginid}/maria/{branch-name} }}} click the
**Propose for merging into another branch** link to propose branch to the
maintainers to be merged into the main trunk.

== Fix Branch (if needed)

If fixes are needed on your branch you will need to: make the changes, re-merge
any new changes to trunk, commit and re-push; you do not need to re-propose.
After the push, LaunchPad will pick up the changes automagically.

Please be aware that changes can take a few minutes for LaunchPad to merge your
new changes into your proposal.

== How to Re-Submit a Proposal

At the moment, this is a tricky process, and none of the "Request another
review" links on Launchpad work.

To resubmit a merge proposal, follow these steps:
# On the main page of the merge proposal, the top line will be something like
  "Status: Needs Review". Just to the right of this is a small button; click on
  this to change the status.
# Select "Resubmit" from the drop down menu and click "Change Status".
# The next page should prompt you to resubmit the merge proposal and inform you
  that the new proposal will supersede the old one
# Click "Resubmit" to finish.

A couple of easy ways to get attention to your proposed merge are:

* Join the #maria [[IRC]] channel on [[https://libera.chat/]] ask people to review/discuss your
  merge.
* Subscribe to and send an email to the
  [[http://launchpad.net/~maria-developers|maria-developers]] group on
  Launchpad.
<</nowiki>>

This page is licensed: CC BY-SA / Gnu FDL

Contributing to the MariaDB Project

How to Contribute to MariaDB

The success of MariaDB relies heavily on community involvement. You can contribute in various ways, even if you are not a developer:

  • Bug Reporting: Create an account to report bugs.

  • Knowledge Sharing: Answer questions in the documentation or on IRC and Zulip.

  • Testing: Upload bug tests to the FTP server for private testing.

  • Documentation: Write or translate articles and documentation for MariaDB.

  • Advocacy: Promote MariaDB and participate in open source events.

  • Support: Donate time or resources, or engage your company in supporting MariaDB.

Explore more ways to get involved on the MariaDB Foundation page.

The success of MariaDB depends on the participation of the community. MariaDB would not be as good as it is today without the participation of the entire MariaDB community.

Getting Started

There are several ways to contribute besides Contributing Code, and not all of them require strong C/C++ skills. Areas for non-developers include:

  • Reporting Bugs

    • To report a bug you'll need to sign up for an account by clicking on the Create an account link below the login fields

  • Suggesting Features

  • Helping other people by answering problems or even fixing their bugs on IRC in the #maria channel on Libera.Chat or on MariaDB's Zulip instance at mariadb.zulipchat.com.

  • Testing and Benchmarking

    • Bug tests can be uploaded to the 'private' directory of our FTP server.

  • Creating documentation for MariaDB, or translating existing documentation.

  • Advocating MariaDB in your area.

  • Participate in open source events and talk about MariaDB.

  • Running a BuildBot on a platform which does not currently have builders.

  • Donate time or money to the MariaDB project.

  • Ask your company to sponsor a feature.

  • MariaDB Foundation page on getting involved

MariaDB Email Lists

A great way to get started in MariaDB is to participate in e-mail discussions via our mailing lists (whichever list best matches your interests):

  • developers

  • discuss

  • docs

Sensitive security issues can be sent directly to the persons responsible for MariaDB security: security [AT] mariadb (dot) org.

You can find additional email addresses, email archives, and ways to connect with MariaDB people here.

All MariaDB contributors are expected to follow the Ubuntu Code Of Conduct.

Getting Started for Developers

  • Contributing Code

See also the pages for new developers on the MariaDB Foundation website:

  • Getting Started For Developers

  • Get the Code, Build It, Test It

  • Writing Good Test Cases for MariaDB Server

  • Submitting a Pull Request

This page is licensed: CC BY-SA / Gnu FDL

MariaDB Public FTP Server

MariaDB provides a secure FTP, SFTP and WebDAV server where you can upload files to be used by MariaDB developers, for example table structures and data for bug reports.

The folder tree consists of:

  • The public folder for files that the MariaDB developers want to give the public access to (patches, samples etc).

  • The private folder for uploads. Files uploaded there can only be accessed by MariaDB developers. You will not be able to see your upload, and this folder does not allow downloads. This is done to protect any sensitive information which may be in test results, mysqld, and core files. Upload those into this folder.

  • The secret folder is for private downloads. Files in this folder are not visible so you will need the complete filename to successfully download a file from this folder.

To share files with MariaDB developers, upload them into the private directory using one of the following:

  • SFTP client (scp), enter 'anonymous' as the password:

scp MDEV-XXXXX.tgz anonymous@ftp.mariadb.org:private/
(anonymous@ftp.mariadb.org) Password: 
MDEV-XXXXX.tgz                                                          100%  152KB 218.8KB/s   00:00    
scp: remote fsetstat: Permission denied

You can ignore the 'fsetstat: Permission denied' error.

  • WebDAV client (curl):

curl -T MDEV-XXXXX.tgz -u anonymous:anonymous https://webdav.mariadb.org/private/MDEV-XXXXX.tgz
Created
  • FTP client (lftp); enter 'anonymous' as the password:

lftp -u anonymous -e 'put  MDEV-XXXXX.tgz' ftp://ftp.mariadb.org/private/
Password: 
cd ok, cwd=/private                                            
put: Access failed: 550 Issue during transfer: network error: error transferring data: read tcp
[...] read: connection reset by peer (MDEV-XXXXX.tgz)

You can ignore the 'network error'.

Note for MariaDB developers: if you do not already have access to the SFTP service, please request it at ftp@mariadb.org (provide your public SSH key and username). You will then be able to access the service with:

sftp user@ftp.mariadb.org

or with HTTPS at https://ftp.mariadb.org.

Sponsoring the MariaDB Project

Sponsorships are crucial for ongoing and future development of the MariaDB project! There are a number of easy ways for you to help the project:

  • Fund the development of a specific feature. You can find a list of suggested features to sponsor here or in JIRA. Feel free to sign in and add more projects to either place!

  • Contribute with developer time. If your organization has talented developers familiar with MariaDB or MySQL codebase they can become part of the MariaDB team and contribute to the development of the MariaDB project.

  • Hire a developer that you dedicate to work on the MariaDB project.

  • A pure donation with no strings attached

This page is licensed: CC BY-SA / Gnu FDL

Log of MariaDB Contributions

How to Document Contributions

Add the following in the global comment for each contribution:

Patch: Name, url or where we got patch
Author:   ....
License:  MCA or BSD
Reviewer: ....

For those cases where this is not done, please add to this page a short line for each push into MariaDB that includes code from contributors not employed by the MariaDB Foundation or the MariaDB Corporation. The purpose of this is to properly track that all such patches are submitted either under the MCA or BSD-new, and to ensure that developers get credit for their work.

Example:

Feature/Patch name
* Author(s)
* Author has signed MCA on "date" | Patch was licensed under BSD

(Please enhance the example with anything that makes sense.)

MariaDB 11.7 Log of Contributions

  • MariaDB 11.7.1 contributors

MariaDB 11.6 Log of Contributions

  • MariaDB 11.6.2 contributors

  • MariaDB 11.6.1 contributors

MariaDB 11.5 Log of Contributions

  • MariaDB 11.5.2 contributors

  • MariaDB 11.5.1 contributors

MariaDB 11.4 Log of Contributions

  • MariaDB 11.4.5 contributors

  • MariaDB 11.4.4 contributors

  • MariaDB 11.4.3 contributors

  • MariaDB 11.4.2 contributors

  • MariaDB 11.4.1 contributors

  • Binary log writing speed was improved by moving checksum calculations out of the global binlog mutex. This is a contribution by Kristian Nielsen (MDEV-31273)

  • New system variable max_binlog_total_size enables binary log purging when the total size of all binary logs exceeds the specified threshold. The implementation is based on the patch from Percona (MDEV-31404)

  • FULL_NODUP is a new value for the binlog_row_image system variable. It essentially works like FULL, that is all columns are included in the event, but it takes less space, because the after image omits columns that were not changed by the UPDATE statement, and have same values as in the before image. This is a contribution from Alibaba (MDEV-32589)

MariaDB 11.3 Log of Contributions

  • MariaDB 11.3.2 contributors

  • MariaDB 11.3.1 contributors

MariaDB 11.2 Log of Contributions

  • MariaDB 11.2.6 contributors

  • MariaDB 11.2.5 contributors

  • MariaDB 11.2.4 contributors

  • MariaDB 11.2.3 contributors

  • MariaDB 11.2.2 contributors

  • MariaDB 11.2.1 contributors

MariaDB 11.1 Log of Contributions

  • MariaDB 11.1.6 contributors

  • MariaDB 11.1.5 contributors

  • MariaDB 11.1.4 contributors

  • MariaDB 11.1.3 contributors

  • MariaDB 11.1.2 contributors

  • MariaDB 11.1.1 contributors

MariaDB 11.0 Log of Contributions

  • MariaDB 11.0.6 contributors

  • MariaDB 11.0.5 contributors

  • MariaDB 11.0.4 contributors

  • MariaDB 11.0.3 contributors

  • MariaDB 11.0.2 contributors

  • MariaDB 11.0.1 contributors

MariaDB 10.11 Log of Contributions

  • MariaDB 10.11.11 contributors

  • MariaDB 10.11.10 contributors

  • MariaDB 10.11.9 contributors

  • MariaDB 10.11.8 contributors

  • MariaDB 10.11.7 contributors

  • MariaDB 10.11.6 contributors

  • MariaDB 10.11.5 contributors

  • MariaDB 10.11.4 contributors

  • MariaDB 10.11.3 contributors

  • MariaDB 10.11.2 contributors

  • MariaDB 10.11.1 contributors

MariaDB 10.10 Log of Contributions

  • MariaDB 10.10.7 contributors

  • MariaDB 10.10.6 contributors

  • MariaDB 10.10.5 contributors

  • MariaDB 10.10.4 contributors

  • MariaDB 10.10.3 contributors

  • MariaDB 10.10.2 contributors

  • MariaDB 10.10.1 contributors

MariaDB 10.9 Log of Contributions

  • MariaDB 10.9.8 contributors

  • MariaDB 10.9.7 contributors

  • MariaDB 10.9.6 contributors

  • MariaDB 10.9.5 contributors

  • MariaDB 10.9.4 contributors

  • MariaDB 10.9.3 contributors

  • MariaDB 10.9.2 contributors

  • MariaDB 10.9.1 contributors

MariaDB 10.8 Log of Contributions

  • MariaDB 10.8.8 contributors

  • MariaDB 10.8.7 contributors

  • MariaDB 10.8.6 contributors

  • MariaDB 10.8.5 contributors

  • MariaDB 10.8.4 contributors

  • MariaDB 10.8.3 contributors

  • MariaDB 10.8.2 contributors

  • MariaDB 10.8.1 contributors

MariaDB 10.7 Log of Contributions

  • MariaDB 10.7.8 contributors

  • MariaDB 10.7.7 contributors

  • MariaDB 10.7.6 contributors

  • MariaDB 10.7.5 contributors

  • MariaDB 10.7.4 contributors

  • MariaDB 10.7.3 contributors

  • MariaDB 10.7.2 contributors

  • MariaDB 10.7.1 contributors

MariaDB 10.6 Log of Contributions

  • MariaDB 10.6.21 contributors

  • MariaDB 10.6.20 contributors

  • MariaDB 10.6.19 contributors

  • MariaDB 10.6.18 contributors

  • MariaDB 10.6.17 contributors

  • MariaDB 10.6.16 contributors

  • MariaDB 10.6.15 contributors

  • MariaDB 10.6.14 contributors

  • MariaDB 10.6.13 contributors

  • MariaDB 10.6.12 contributors

  • MariaDB 10.6.11 contributors

  • MariaDB 10.6.10 contributors

  • MariaDB 10.6.9 contributors

  • MariaDB 10.6.8 contributors

  • MariaDB 10.6.7 contributors

  • MariaDB 10.6.6 contributors

  • MariaDB 10.6.5 contributors

  • MariaDB 10.6.4 contributors

  • MariaDB 10.6.3 contributors

  • MariaDB 10.6.2 contributors

  • MariaDB 10.6.1 contributors

  • MariaDB 10.6.0 contributors

MariaDB 10.5 Log of Contributions

  • MariaDB 10.5.28 contributors

  • MariaDB 10.5.27 contributors

  • MariaDB 10.5.26 contributors

  • MariaDB 10.5.25 contributors

  • MariaDB 10.5.24 contributors

  • MariaDB 10.5.23 contributors

  • MariaDB 10.5.22 contributors

  • MariaDB 10.5.21 contributors

  • MariaDB 10.5.20 contributors

  • MariaDB 10.5.19 contributors

  • MariaDB 10.5.18 contributors

  • MariaDB 10.5.17 contributors

  • MariaDB 10.5.16 contributors

  • MariaDB 10.5.15 contributors

  • MariaDB 10.5.14 contributors

  • MariaDB 10.5.13 contributors

  • MariaDB 10.5.12 contributors

  • MariaDB 10.5.11 contributors

  • MariaDB 10.5.10 contributors

  • MariaDB 10.5.9 contributors

  • MariaDB 10.5.8 contributors

  • MariaDB 10.5.7 contributors

  • MariaDB 10.5.6 contributors

  • MariaDB 10.5.5 contributors

  • MariaDB 10.5.4 contributors

  • MariaDB 10.5.3 contributors

  • MariaDB 10.5.2 contributors

  • MariaDB 10.5.1 contributors

  • MariaDB 10.5.0 contributors

MariaDB 10.4 Log of Contributions

  • MariaDB 10.4.34 contributors

  • MariaDB 10.4.33 contributors

  • MariaDB 10.4.32 contributors

  • MariaDB 10.4.31 contributors

  • MariaDB 10.4.30 contributors

  • MariaDB 10.4.29 contributors

  • MariaDB 10.4.28 contributors

  • MariaDB 10.4.27 contributors

  • MariaDB 10.4.26 contributors

  • MariaDB 10.4.25 contributors

  • MariaDB 10.4.24 contributors

  • MariaDB 10.4.23 contributors

  • MariaDB 10.4.22 contributors

  • MariaDB 10.4.21 contributors

  • MariaDB 10.4.20 contributors

  • MariaDB 10.4.19 contributors

  • MariaDB 10.4.18 contributors

  • MariaDB 10.4.17 contributors

  • MariaDB 10.4.16 contributors

  • MariaDB 10.4.15 contributors

  • MariaDB 10.4.14 contributors

  • MariaDB 10.4.13 contributors

  • MariaDB 10.4.12 contributors

  • MariaDB 10.4.11 contributors

  • MariaDB 10.4.10 contributors

  • MariaDB 10.4.9 contributors

  • MariaDB 10.4.8 contributors

  • MariaDB 10.4.7 contributors

  • MariaDB 10.4.6 contributors

  • MariaDB 10.4.5 contributors

  • MariaDB 10.4.4 contributors

  • MariaDB 10.4.3 contributors

  • MariaDB 10.4.2 contributors

  • MariaDB 10.4.1 contributors

  • MariaDB 10.4.0 contributors

MariaDB 10.3 Log of Contributions

Instant ADD COLUMN (MDEV-11369)

  • Tencent Game DBA Team, developed by vinchen.

UPDATE statements with the same source and target (MDEV-12874)

  • Jerome Brauge.

Per-engine mysql.gtid_slave_pos tables (MDEV-12179)

  • Kristian Nielsen funded by Booking.com.

The MariaDB Foundation website provides a more detailed list of contributors by release, starting from MariaDB 10.3.5

  • MariaDB 10.3.39 contributors

  • MariaDB 10.3.38 contributors

  • MariaDB 10.3.37 contributors

  • MariaDB 10.3.36 contributors

  • MariaDB 10.3.35 contributors

  • MariaDB 10.3.34 contributors

  • MariaDB 10.3.33 contributors

  • MariaDB 10.3.32 contributors

  • MariaDB 10.3.31 contributors

  • MariaDB 10.3.30 contributors

  • MariaDB 10.3.29 contributors

  • MariaDB 10.3.28 contributors

  • MariaDB 10.3.27 contributors

  • MariaDB 10.3.26 contributors

  • MariaDB 10.3.25 contributors

  • MariaDB 10.3.24 contributors

  • MariaDB 10.3.23 contributors

  • MariaDB 10.3.22 contributors

  • MariaDB 10.3.21 contributors

  • MariaDB 10.3.20 contributors

  • MariaDB 10.3.19 contributors

  • MariaDB 10.3.18 contributors

  • MariaDB 10.3.17 contributors

  • MariaDB 10.3.16 contributors

  • MariaDB 10.3.15 contributors

  • MariaDB 10.3.14 contributors

  • MariaDB 10.3.13 contributors

  • MariaDB 10.3.12 contributors

  • MariaDB 10.3.11 contributors

  • MariaDB 10.3.10 contributors

  • MariaDB 10.3.9 contributors

  • MariaDB 10.3.8 contributors

  • MariaDB 10.3.7 contributors

  • MariaDB 10.3.6 contributors

  • MariaDB 10.3.5 contributors

MariaDB 10.2 Log of Contributions

New variable read_binlog_speed_limit permits restricting the speed at which the slave reads the binlog from the master (MDEV-11064)

  • Tencent Game DBA Team, developed by chouryzhou.

Compression of events in the binary log (MDEV-11065)

  • Tencent Game DBA Team, developed by vinchen.

No Pad collations (MDEV-9711)

  • Daniil Medvedev

Flashback

  • Lixun Peng, Alibaba

Implement non-recursive common table expressions (MDEV-8789) Implement recursive common table expressions (MDEV-9864) Pushdown conditions into non-mergeable views/derived tables (MDEV-9197)

  • Galina Shalygina

Backporting Delayed replication (MDEV-7145) from MySQL 5.6

  • Kristian Nielsen funded by Booking.com

The MariaDB Foundation website provides a more detailed list of contributors by release, starting from MariaDB 10.2.14

  • MariaDB 10.2.43 contributors

  • MariaDB 10.2.42 contributors

  • MariaDB 10.2.41 contributors

  • MariaDB 10.2.40 contributors

  • MariaDB 10.2.39 contributors

  • MariaDB 10.2.38 contributors

  • MariaDB 10.2.37 contributors

  • MariaDB 10.2.36 contributors

  • MariaDB 10.2.35 contributors

  • MariaDB 10.2.34 contributors

  • MariaDB 10.2.33 contributors

  • MariaDB 10.2.32 contributors

  • MariaDB 10.2.31 contributors

  • MariaDB 10.2.30 contributors

  • MariaDB 10.2.29 contributors

  • MariaDB 10.2.28 contributors

  • MariaDB 10.2.27 contributors

  • MariaDB 10.2.26 contributors

  • MariaDB 10.2.25 contributors

  • MariaDB 10.2.24 contributors

  • MariaDB 10.2.23 contributors

  • MariaDB 10.2.22 contributors

  • MariaDB 10.2.21 contributors

  • MariaDB 10.2.20 contributors

  • MariaDB 10.2.19 contributors

  • MariaDB 10.2.18 contributors

  • MariaDB 10.2.17 contributors

  • MariaDB 10.2.16 contributors

  • MariaDB 10.2.15 contributors

  • MariaDB 10.2.14 contributors

MariaDB 10.1 Log of Contributions

Replication, optimizer, security, speed enhancements, bug fixing, etc

  • MariaDB Corporation

Power8 optimization

  • MariaDB Foundation

  • Stewart Smith

  • In cooperation with IBM

Documentation

  • MariaDB Foundation

Query timeouts

  • MariaDB Foundation

Character set enhancements and speedups

  • MariaDB Foundation

Upgraded regexp library

  • MariaDB Foundation

Reviews for replication, encryption, compression, Galera, Connect storage engine, Mroonga storage engine, Spider, OR REPLACE etc.

  • MariaDB Foundation

Encryption, scrubbing, enhanced semisync, dump thread enhancements, thd_specifics plugin service

  • Google

Table level encryption, plugin for secure encryption

  • Eperi GmbH

Defragmentation, online alter progress monitoring

  • Kakao Inc

Galera (wsrep patches)

  • Codership

Compound statements

  • Antony Curtis

CREATE OR REPLACE/IF NOT EXISTS

  • Sriram Patil

New status variables for replication

  • Daniel Black

RESET MASTER TO #

  • Daniël van Eeden

Atomic writes, page compression, trim, multi-threaded flush for XtraDB/InnoDB

  • In cooperation with FusionIO

The MariaDB Foundation website provides a more detailed list of contributors by release, starting from MariaDB 10.1.32

  • MariaDB 10.1.48 contributors

  • MariaDB 10.1.47 contributors

  • MariaDB 10.1.46 contributors

  • MariaDB 10.1.45 contributors

  • MariaDB 10.1.44 contributors

  • MariaDB 10.1.43 contributors

  • MariaDB 10.1.42 contributors

  • MariaDB 10.1.41 contributors

  • MariaDB 10.1.40 contributors

  • MariaDB 10.1.39 contributors

  • MariaDB 10.1.38 contributors

  • MariaDB 10.1.37 contributors

  • MariaDB 10.1.36 contributors

  • MariaDB 10.1.35 contributors

  • MariaDB 10.1.34 contributors

  • MariaDB 10.1.33 contributors

  • MariaDB 10.1.32 contributors

Also Used Code Snippets by

Facebook

  • Defragmentation, prefix index queries optimization, lazy flushing, buffer pool list scan optimization, configurable long semaphore wait timeout

Percona

  • SET STATEMENT, enforce_storage_engine

Oracle

  • UNION ALL optimization, default_tmp_storage_engine

MariaDB 10.0 Log of Contributions

Per thread memory counting and usage

  • Base code and idea by Lixun Peng, Taobao

  • License: BSD

Multi-source replication

  • Base code by Lixun Peng, Taobao

  • License: BSD

GET_LOCK

  • Code by Konstantin "Kostja" Osipov, mail.ru

  • License: BSD

CONNECT storage engine

  • Code by Olivier Bertrand

  • License: GPL

Spider storage enginemetadata_lock_info Information schema

  • Code by Kentoku Shiba, Spiral Arms

  • License: GPL

Roles

  • Code by Vicentiu Ciorbaru, Google Summer of Code 2013

  • License: BSD

PCRE Regular Expressions

  • Code by Sudheera Palihakkara, Google Summer of Code 2013

  • License: BSD

Global Transaction IDs

  • Some patches by Pavel Ivanov, Google

The MariaDB Foundation website provides a more detailed list of contributors by release, starting from MariaDB 10.0.35

  • MariaDB Galera Cluster 10.0.38 contributors

  • MariaDB 10.0.38 contributors

  • MariaDB Galera Cluster 10.0.37 contributors

  • MariaDB 10.0.37 contributors

  • MariaDB Galera Cluster 10.0.36 contributors

  • MariaDB 10.0.36 contributors

  • MariaDB Galera Cluster 10.0.35 contributors

  • MariaDB 10.0.35 contributors

MariaDB 5.5 Log of Contributions

Function last_value(), which returns the last value but evaluates all arguments as a side effect.

  • Original patch by Eric Herman, Booking.com

  • License: BSD

nowatch option for mysqld_safe (allow systemd)

  • Based on code from Maarten Vanraes

  • License: BSD

Security fixes, patches

  • Work by Honza Horak, Red Hat

Coverity scans

  • Work by Christian Convey

The MariaDB Foundation website provides a more detailed list of contributors by release, starting from MariaDB 5.5.60

  • MariaDB 5.5.68 contributors

  • MariaDB 5.5.67 contributors

  • MariaDB 5.5.66 contributors

  • MariaDB 5.5.65 contributors

  • MariaDB 5.5.64 contributors

  • MariaDB Galera Cluster 5.5.63 contributors

  • MariaDB 5.5.63 contributors

  • MariaDB Galera Cluster 5.5.62 contributors

  • MariaDB 5.5.62 contributors

  • MariaDB Galera Cluster 5.5.61 contributors

  • MariaDB 5.5.61 contributors

  • MariaDB Galera Cluster 5.5.60 contributors

  • MariaDB 5.5.60 contributors

MariaDB 5.2 Log of Contributions

Virtual Columns

  • MySQL_virtual_columns_ref_manual

  • mysql-6.0-wl1075-wl411

  • Andrey Zhakov (modified by Sanja and Igor)

  • Author has signed MCA

Declaring many CHARSET objects as const.

  • Antony T Curtis (LinkedIn)

  • License: BSD

Extended user statistics

  • Authors: People at Google, Facebook and Percona. This code owes a special thanks to Mark Callaghan!

  • License: BSD

Segmented MyISAM Key Cache

  • Fredrik Nylander from Stardoll.com

  • License: MCA

The OQGRAPH storage engine

  • Created by Arjen Lentz, Open Query

  • License GPL

The Sphinx storage engine

  • Created by Andrew Aksyonoff.

  • License: GPL

Pluggable Authentication

  • RJ Silk

  • License: MCA

Various bug fixes

  • Stewart Smith, Percona

MariaDB 5.1 Log of Contributions (Outside of Monty Program Ab)

Microsecond precision in process list

  • microsec_process.patch

  • Percona Inc

  • Patch was licensed to Monty Program under BSD (new).

Slow Query Log Extended Statistics

  • microslow.patch

  • Percona Inc

  • Patch was licensed to Monty Program under BSD (new).

The PBXT storage engine

  • Created by Paul McCullagh

  • License: GPL

The FederatedX storage engine

  • All changes are made by Patrick Galbraith and Antony Curtis and are given to us under BSD-new.

  • In addition we are allowed to promote FederatedX.

Windows enhancements and various bug fixes

  • Alex Budovski, under MCA

Creation of MariaDB packages

  • Arjen Lentz, Open Query

Various bug fixes

  • Stewart Smith, Percona

Sponsored Features

Google has sponsored:

  • Parallel Replication

  • Google tests GTID, parallel replication and lots more on the mailing list

Facebook has sponsored many features, including:

  • LIMIT ROWS EXAMINED

  • The non-blocking client library

  • Facebook employees do frequent the mailing list

See Also

  • SHOW CONTRIBUTORS lists all members and sponsors of the MariaDB Foundation and other sponsors.

  • SHOW AUTHORS lists the authors of MariaDB (including documentation, QA etc).

This page is licensed: CC BY-SA / Gnu FDL

Google Summer of Code

MariaDB has participated in Google Summer of Code since 2013. This section contains pages providing information for each year.

We believe we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently C, ODBC, Java, Node.js) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. Lately, we also have MariaDB ColumnStore, a columnar storage engine designed to process petabytes of data with real-time response to analytical queries.

Google Summer of Code 2025

This year we are again participating in the Google Summer of Code. We, joined with the MariaDB Foundation, believe we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently C, C++, ODBC, Java, Node.js) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. And we have MariaDB ColumnStore, which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.

Where to Start

Please join us on Zulip to mingle with the community. You should also subscribe to the developers mailing list (this is the main list where we discuss development - there are also other mailing lists).

To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.

Also see the List of beginner friendly issues from the MariaDB Issue Tracker.

List of Tasks

MariaDB Server

MDEV-28395 LOAD DATA plugins

Full-time project 350h

LOAD DATA INFILE can flexibly load data into a table from CSV-like files accessible by the mariadbd process. LOAD XML INFILE can do it for XML files. LOAD DATA LOCAL INFILE and LOAD XML LOCAL INFILE can do it with files accessible by the client, but not by the server. But there are requests to support loading more file formats and from other locations, for example, from S3.

This project is to implement support for LOAD plugins and refactor the current LOAD code accordingly. There are two kinds of plugins: data parser plugins (CSV-like and XML) and transfer plugins (file and LOCAL). Implementing new plugins is not in the scope of this task; it is mainly about moving existing code around, creating the possibility for new plugins (like JSON or S3).

Skills needed: C++, bison. Mentors: Sergei Golubchik

MDEV-36100 Generate vector embeddings automatically on INSERT

Full-time project 350h

Implement a syntax and a plugin API that the server will use to generate embeddings for documents that the user stores in the database. This should significantly simplify the vector pipeline. mariadbd will not generate embeddings internally; it will invoke a plugin to do that.

Skills needed: C++. Mentors: Sergei Golubchik

MDEV-36107 expressions in mysqltest

Part-time project 175h

Extend the mysqltest language to support:

  • standard arithmetic +, -, *, /, %

  • comparisons ==, !=, <, <=, >, >=

  • boolean &&, ||, maybe ? :

  • if possible: string repetition, Perl-style x (to replace SELECT REPEAT() in test files)

This should work in the if and while commands.

Can be done together with MDEV-36108 as a full-time project.

Skills needed: C++. Mentors: Sergei Golubchik

MDEV-36108 variable substitutions in mysqltest

Part-time project 175h

Extend the mysqltest language to support bash-like substitutions:

  • ${var}

  • ${parameter:offset:length}

  • ${#parameter}

  • ${parameter/pattern/string/flags}

  • maybe ${parameter^}, ${parameter^^}, ${parameter,}, ${parameter,,}

  • maybe ${parameter@function} with functions like u, U, Q, etc.

Recursive expansion:

  • ${${var}}

Can be done together with MDEV-36107 as a full-time project.

Skills needed: C++. Mentors: Sergei Golubchik

MDEV-18827 Create utility to parse frm files and print their DDL

Full-time project - potential part-time (175 - 350h, depending on scope)

FRM files are what MariaDB uses to store metadata about tables. These files can be used to generate DDL statements (CREATE TABLE ...). We are lacking a utility to parse these files, which could in turn make DBAs' lives easier. The task of this project is to implement this utility, making use of MariaDB's FRM parsing logic. You may have to carry out some refactoring to extract the parsing code into a reusable library, used once by MariaDB Server and once by the FRM parsing tool.

Skills needed: C/C++, understanding libraries and APIs. Mentors: Vicențiu Ciorbaru / Sergei Golubchik

MDEV-9345 Replication to enable filtering on master

Part-time project 175h

The current methods of filtering replication events are limited to either 1) at binlog-write time, which can break point-in-time recovery because some committed transactions will be missing from the binary log, or 2) on the replica, which forces all events on the primary server to always be sent to the replica, which can be a security concern and is also not efficient. This task aims to eliminate these limitations by adding in another point at which replication filtering occurs: on the binlog dump threads. This would allow users to both maintain a consistent binary log, and minimize network traffic by guarding events which are never intended for replication.

Skills needed: C++. Mentors: Brandon Nesterenko

Buildbot build statistics dashboard

Part-time project 175h. TODO: a more detailed description will be created.

Skills needed: TBD. Mentors: Vlad Radu

MCOL-4889 Manual extent vacuuming

Full-time project 350h

Here an extent is a group of columnar values, and a partition is a group of extents that stores all column values for a specific portion of a table. MCS has a notion of an empty value for columnar segment/token files and dictionaries. Empty values are marked with a bit in a special 1-byte auxiliary column that is created for every table. When DELETE removes records from a table, the records are marked with the empty bit in the auxiliary column. The deleted records become wasted disk space. The goal of the project is to reclaim the wasted disk space, either by re-creating the whole partition or by moving partition values.
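As a toy model of the idea, reclaiming the space means rewriting a partition while skipping rows whose empty bit is set in the auxiliary column. The list-of-values layout and the vacuum function below are illustrative only, not MCS's on-disk format:

```python
EMPTY_BIT = 0x01  # illustrative: bit marking a deleted/empty record

def vacuum_partition(values, aux):
    """Rewrite a partition keeping only live rows; returns the new
    column values and a fresh auxiliary column with no empty bits set."""
    live = [v for v, flags in zip(values, aux) if not (flags & EMPTY_BIT)]
    return live, bytearray(len(live))

values = [10, 20, 30, 40]
aux = bytearray([0, EMPTY_BIT, 0, EMPTY_BIT])  # rows 20 and 40 were DELETEd
new_values, new_aux = vacuum_partition(values, aux)
print(new_values)  # [10, 30]
```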

Skills needed: modern C++. Mentors: Roman Nozdrin

MCOL-5142 Support for recursive CTE

Full-time project 350h

MariaDB ColumnStore lacks recursive CTE handling, so as of now ColumnStore hands the processing back to MariaDB Server if a query contains a recursive CTE.

Here is the info about the feature: recursive-common-table-expressions-overview

Skills needed: modern C++. Mentors: Leonid Fedorov

MCOL-5598 Support for EXCEPT and INTERSECT SQL expressions

Full-time project 350h

MariaDB ColumnStore lacks EXCEPT and INTERSECT handling, so as of now ColumnStore hands the processing back to MariaDB Server if a query contains EXCEPT or INTERSECT.

Here is the info about the feature: exceptintersect

Skills needed: modern C++. Mentors: Alexey Antipovsky

MCOL-XXX Bloom-filters for data scanning

Full-time project 350h

MariaDB ColumnStore lacks indexes, so it reads a lot of extra data from disk. This project introduces Bloom filters to reduce the data read from disk during the most I/O-heavy operation, scanning.

Skills needed: modern C++. Mentors: Roman Nozdrin

MCOL-5758 Reduce the computations in JOINS by simpler Bloom-filter-based pre-joins

Full-time project 350h

Joins are very heavy algorithms, both in computation and in memory use. They need to hold a substantial amount of data in memory and perform hashing and other operations on that data. Joins can overflow memory limits, and keeping the balance between memory use and performance is tricky. Thus we have to filter the information that goes into joins as much as possible. ColumnStore already does great work in that regard, pushing WHERE filters before joins. This particular task is also concerned with that: adding Bloom filter operations that approximate JOIN results and perform a secondary read so that the data fed into a join is highly likely to be used in it.
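The pre-join idea can be sketched generically in Python: build a small Bloom filter over the build side's join keys, then use it to discard probe-side rows that cannot possibly match before running the real join. This illustrates the technique only, not ColumnStore's implementation:

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size, self.num_hashes = size_bits, num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        # derive num_hashes bit positions from a keyed hash
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(repr((i, key)).encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # no false negatives, occasional false positives
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

# Pre-join: filter the probe side against the build side's keys.
build_keys = [2, 4, 6, 8]
bf = BloomFilter()
for k in build_keys:
    bf.add(k)
probe_rows = [(k, "row-%d" % k) for k in range(100)]
prefiltered = [row for row in probe_rows if bf.might_contain(row[0])]
```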

Skills needed: modern C++. Mentors: Sergey Zefirov

Suggest a Task

Do you have an idea of your own, not listed above? Do let us know in the comments below (Click 'Login' on the top of the page first)!

This page is licensed: CC BY-SA / Gnu FDL

Google Summer of Code 2024

In 2024, MariaDB again participated in the Google Summer of Code. We believe we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently C, C++, ODBC, Java, Node.js) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. And we have MariaDB ColumnStore, which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.

Where to Start

Please join us on Zulip to mingle with the community. You should also subscribe to the developers mailing list (this is the main list where we discuss development - there are also other mailing lists).

To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.

Also see the List of beginner friendly issues from the MariaDB Issue Tracker.

List of Tasks

MariaDB Server

Implement IVFFlat indexing strategy for MariaDB Vector and evaluate performance

Part-time (175h) or full-time project (350h) - depending on scope

MariaDB Vector is coming to MariaDB Server to serve AI workloads. The current indexing strategy will use HNSW, but IVFFlat is a possible alternative that costs fewer resources to create. Having it as an option is desirable.
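For orientation, here is the IVFFlat idea in a toy Python sketch: k-means partitions the vectors into coarse cells, and a query probes only the closest cells, scanning them exactly ("flat"). The Euclidean distance and helper names are assumptions of this illustration, not MariaDB code:

```python
import math, random

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(v, centroids):
    return min(range(len(centroids)), key=lambda c: dist(v, centroids[c]))

def ivfflat_index(vectors, nlists=4, iters=10, seed=0):
    """Cluster vectors into nlists cells (plain k-means) and store each
    vector in its nearest cell's inverted list."""
    centroids = random.Random(seed).sample(vectors, nlists)
    for step in range(iters + 1):
        lists = [[] for _ in range(nlists)]
        for v in vectors:
            lists[nearest(v, centroids)].append(v)
        if step < iters:  # final pass only assigns against the final centroids
            centroids = [tuple(sum(x) / len(l) for x in zip(*l)) if l else centroids[i]
                         for i, l in enumerate(lists)]
    return centroids, lists

def ivfflat_search(query, centroids, lists, nprobe=1):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: dist(query, centroids[c]))
    candidates = [v for c in order[:nprobe] for v in lists[c]]
    return min(candidates, key=lambda v: dist(query, v))
```

Raising nprobe trades speed for recall; with nprobe equal to nlists the search degenerates to an exact scan.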

MDEV-17398 Spatial (GIS) functions in MariaDB

Part-time (175h) or full-time project (350h) - depending on scope

Our GIS functionality is limited compared to other DBMSes. Given that MariaDB looks to facilitate migration from MySQL, we should be on par. We have a list of functions that are missing in MariaDB compared to MySQL, as described in function-differences-between-mariadb-10-10-and-mysql-8-0. Our goal is to have as many of these functions as possible available within MariaDB. Some of the functionality can be ported from MySQL, while other parts might require implementation from scratch.

Skills needed: Understanding of C++ development. Ability to navigate a large codebase (with help from mentor). Mentors: Anel Husakovic (primary) / Vicențiu Ciorbaru (secondary)

MDEV-16482 MariaDB Oracle mode misses Synonyms

Full-time project 350h

Synonyms are an important feature, particularly as it helps smooth migration from other databases. While the initial project scope seems straightforward, there are a number of aspects that must be considered:

  1. Grammar extension

  2. Where will the synonyms definitions be stored?

  3. How do synonyms map to the underlying privilege system? Who can create a synonym? Who can access a synonym?

  4. Do we require the underlying object to exist before creating a synonym? What if the underlying object gets dropped?

  5. What kind of error messages do we present to the user in various corner cases?

  6. How do synonyms interact with replication (row based vs statement based)

  7. How do synonyms interact with views (and views execution)

  8. How to present synonyms to users (as part of INFORMATION_SCHEMA for instance?)

  9. Performance considerations for multiple connections to the database.

Skills needed: Understanding of C++ development. Able to write and discuss various tradeoffs such that we achieve a feature set that makes sense given the database's priorities. Mentors: Vicențiu Ciorbaru (primary) / Michael Widenius (secondary)

MDEV-30645 CREATE TRIGGER FOR { STARTUP | SHUTDOWN }

Full-time project 350h

Support generalized triggers like

CREATE TRIGGER ... AFTER STARTUP ...
CREATE TRIGGER ... BEFORE SHUTDOWN ...
CREATE TRIGGER ... ON SCHEDULE ...

the latter being a synonym for CREATE EVENT.

  • Should STARTUP/SHUTDOWN triggers run exclusively? That is, is a STARTUP trigger run before any connection is allowed, or in parallel with connections? Same for SHUTDOWN.

Skills needed: Understanding of C++ development. Able to write and discuss various tradeoffs such that we achieve a feature set that makes sense given the database's priorities. Mentors: Sergei Golubchik

MDEV-21978 make my_vsnprintf to use gcc-compatible format extensions

Part-time project 175h

my_vsnprintf() is used internally in the server as a portable printf replacement. And it's also exported to plugins as a service.

It supports a subset of printf formats and a few extensions:

  • %`s means that a string should be quoted like an `identifier`

  • %b means that it's a binary string, not zero-terminated; printing won't stop at \0, so one should always specify the field width (like %.100b)

  • %M is used in error messages and prints the integer (errno) and the corresponding strerror() for it

  • %T takes a string and prints it like %s, but if the string has to be truncated it puts "..." at the end

gcc knows printf formats, checks whether actual arguments match the format string, and issues a warning if they don't. Unfortunately there seems to be no easy way to teach gcc our extensions, so for now we have to disable printf format checks.

A better approach would be to use gcc-compatible format extensions, like the Linux kernel does. We should migrate to a different syntax for our extensions:

  • %sI to mean "print as an identifier"

  • %sB to mean "print a binary string"

  • %uE to mean "print an errno"

  • %sT to put a "..." as truncation indicator

Old formats can still be supported, or they can be removed; in the latter case the major version of the service should be increased to signal an incompatible change.

All error messages and all usages of my_vsnprintf should be changed to use the new syntax. One way to do it is to conditionally disable the old syntax, only in debug builds. All gcc printf format checks should be enabled.
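To make the extension semantics concrete, here is a toy Python emulation of what the proposed gcc-compatible specifiers would print. The helper below is hypothetical and for illustration only; it is not the C implementation of my_vsnprintf:

```python
import os

def fmt_ext(spec, value, width=None):
    """Toy emulation of the proposed extensions: sI, sB, uE, sT."""
    if spec == "sI":  # print as a quoted identifier
        return "`%s`" % str(value).replace("`", "``")
    if spec == "sB":  # binary string: honor an explicit width, don't stop at \0
        return value[:width].decode("latin-1")
    if spec == "uE":  # errno plus the corresponding strerror() text
        return '%d "%s"' % (value, os.strerror(value))
    if spec == "sT":  # truncate with a "..." indicator
        s = str(value)
        return s if width is None or len(s) <= width else s[: width - 3] + "..."
    raise ValueError("unknown extension: %r" % spec)

print(fmt_ext("sI", "my`table"))         # `my``table`
print(fmt_ext("sT", "long message", 7))  # long...
```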

Skills needed: Understanding of C development. Mentors: Sergei Golubchik

MCOL-5688 Parallel CSV read leveraging Apache Arrow

Full-time project 350h

cpimport is a binary that ingests data into MCS efficiently, reducing ingest times significantly while preserving transaction isolation levels.

cpimport is a relatively complex facility that reads data from a local file or S3, parses it, converts it, and puts it into MCS-specific files. cpimport is unable to read a single big CSV file from disk in parallel. Apache Arrow has a CSV read facility that can do parallel CSV reads. The goal of the project is to replace the existing homebrew CSV parser implemented in cpimport with the one from Apache Arrow.

Skills needed: modern C++. Mentors: Leonid Fedorov

MCOL-4889 Manual vacuum cleaning for on-disk data empty records

Full-time project 350h

Here an extent is a group of columnar values, and a partition is a group of extents that stores all column values for a specific portion of a table. MCS has a notion of an empty value for columnar segment/token files and dictionaries. Empty values are marked with a bit in a special 1-byte auxiliary column that is created for every table. When DELETE removes records from a table, the records are marked with the empty bit in the auxiliary column. The deleted records become wasted disk space. The goal of the project is to reclaim the wasted disk space, either by re-creating the whole partition or by moving partition values.

Skills needed: modern C++. Mentors: Roman Nozdrin

Suggest a Task

Do you have an idea of your own, not listed above? Do let us know in the comments below (Click 'Login' on the top of the page first)!

This page is licensed: CC BY-SA / Gnu FDL

Google Summer of Code 2023

In 2023, MariaDB participated in the Google Summer of Code. The MariaDB Foundation believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently C, C++, ODBC, Java, Node.js) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. And we have MariaDB ColumnStore, which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.

Where to Start

Please join us on Zulip to mingle with the community. You should also subscribe to maria-developers@lists.launchpad.net (this is the main list where we discuss development).

To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.

Also see the List of beginner friendly issues from the MariaDB Issue Tracker.

List of Tasks

MariaDB Server

MDEV-30610 Update RocksDB to the latest upstream version

Full-time project 350h

Our version of RocksDB is lagging behind the current upstream and needs to be updated. This isn't a case of simply updating the submodule; there have been significant API changes. It will likely require porting the latest MyRocks code over to the MariaDB storage API.

Skills needed: Understanding of C/C++ development. Preferably some experience with the MariaDB or MySQL codebase (but not essential). Mentor: Andrew Hutchings

MDEV-17398 Spatial (GIS) functions in MariaDB

Part-time (175h) or full-time project (350h) - depending on scope

Our GIS functionality is limited compared to other DBMSes. Given that MariaDB looks to facilitate migration from MySQL, we should be on par. We have a list of functions that are missing in MariaDB compared to MySQL, as described in function-differences-between-mariadb-10-10-and-mysql-8-0. Our goal is to have as many of these functions as possible available within MariaDB. Some of the functionality can be ported from MySQL, while other parts might require implementation from scratch.

Skills needed: Understanding of C++ development. Ability to navigate a large codebase (with help from mentor). Mentors: Anel Husakovic (primary) / Vicențiu Ciorbaru (secondary)

MDEV-16482 MariaDB Oracle mode misses Synonyms

Full-time project 350h

Synonyms are an important feature, particularly as it helps smooth migration from other databases. While the initial project scope seems straightforward, there are a number of aspects that must be considered:

  1. Grammar extension

  2. Where will the synonyms definitions be stored?

  3. How do synonyms map to the underlying privilege system? Who can create a synonym? Who can access a synonym?

  4. Do we require the underlying object to exist before creating a synonym? What if the underlying object gets dropped?

  5. What kind of error messages do we present to the user in various corner cases?

  6. How do synonyms interact with replication (row-based vs statement-based)?

  7. How do synonyms interact with views (and view execution)?

  8. How do we present synonyms to users (as part of INFORMATION_SCHEMA, for instance)?

  9. Performance considerations for multiple connections to the database.

Skills needed: Understanding of C++ development. Able to write about and discuss various tradeoffs such that we achieve a feature set that makes sense given the database's priorities. Mentors: Vicențiu Ciorbaru (primary) / Michael Widenius (secondary)

GH-457 MariaDB ColumnStore in Docker Official OCI Image

Part-time project 175h / Full-time project 350h - depending on scope

MariaDB ships with ColumnStore as a storage engine. However, the architecture of ColumnStore is not like that of a traditional storage engine; instead, it relies on multiple database nodes working in unison. This means that starting up a ColumnStore-enabled MariaDB service is not a trivial endeavor. This project seeks to create the necessary tooling for starting MariaDB with ColumnStore inside OCI containers. You will be writing Dockerfiles and configuration files as well as bash scripts to achieve this.

The challenge of this project lies in:

  1. Limited documentation around ColumnStore. There will be some time spent on the discovery process.

  2. Formulating a clear plan to facilitate:

     • Starting MariaDB with ColumnStore

     • Upgrading MariaDB with ColumnStore on a version upgrade

     • Creating health checks to validate the health of the ColumnStore service nodes

     • Setting configuration variables via environment switches where appropriate

     • Declaratively (likely via a docker-compose YAML file) stating the system's architecture

     • Documenting the necessary steps to deployment

     • Producing a blog post on its operation

     • Optionally enabling deployment via Kubernetes

  3. Implementing the plan and creating a CI/CD pipeline for testing.

Skills needed: Ability to develop durable bash scripts, understanding of container runtimes, and ability to conform to container best practices. Able to incrementally develop and test functionality. Mentors: Daniel Black (primary - containers) / Andrew Hutchings (secondary - ColumnStore)

MDEV-6166 LOAD DATA INFILE - store all warnings

Part-time project 175h

The main focus of this project is the developer / sysadmin experience. We want to improve the quality of life of those using MariaDB. Migrating large datasets is one of these challenges. As described in the MDEV, a simple limitation related to LOAD DATA INFILE can severely hamper developer productivity. A related problem is discussed in MDEV-13046.

The goal of this project is to come up with a solution for storing warnings during LOAD DATA INFILE. This will require modifying the existing server codebase to create an SQL interface for processing the generated warnings.

Challenges:

  • LOAD DATA INFILE can process large datasets. That means the server must not simply store all warnings in memory. You will need to make use of already existing mechanisms (creating temporary tables) so warnings can spill to disk.
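As a toy illustration of the spill-to-disk idea (hypothetical Python names; the real implementation would reuse the server's internal temporary-table mechanism rather than a flat file):

```python
import tempfile

class WarningStore:
    """Keep the first `mem_limit` warnings in memory; spill the rest to disk."""

    def __init__(self, mem_limit=1000):
        self.mem_limit = mem_limit
        self.in_memory = []
        self.spill_file = None
        self.total = 0

    def add(self, warning):
        self.total += 1
        if len(self.in_memory) < self.mem_limit:
            self.in_memory.append(warning)
        else:
            if self.spill_file is None:
                # Anything beyond the in-memory cap goes to a temp file.
                self.spill_file = tempfile.TemporaryFile(mode="w+")
            self.spill_file.write(warning + "\n")

    def all_warnings(self):
        """Iterate all warnings: in-memory first, then the spilled ones."""
        yield from self.in_memory
        if self.spill_file is not None:
            self.spill_file.seek(0)
            for line in self.spill_file:
                yield line.rstrip("\n")

store = WarningStore(mem_limit=2)
for i in range(5):
    store.add(f"Row {i}: data truncated")
print(store.total, len(list(store.all_warnings())))  # prints "5 5"
```

The SQL interface mentioned in the task would then be a layer that exposes this iteration to the client.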

Skills needed: Understanding of C++ development. Mentors: Anel Husakovic (primary) / Daniel Black (secondary)

Buildbot (CI/CD)

MDBF-320 Better Grid view for buildbot.mariadb.org (Python / Javascript / Web Dev)

Full-time project 350h

Our CI/CD infrastructure uses a recent version of Buildbot. The grid view plugin that comes with Buildbot is not adequate for our needs. In this project, you will discuss with your mentor as well as other MariaDB developers how to best improve the user experience of Buildbot's grid view for what MariaDB developers need to accomplish.

Skills needed: Understanding of web-dev technologies like Angular, React, and Javascript-related libraries. Python may also be required. Mentor: Vlad Bogolin

MariaDB Columnstore

MariaDB ColumnStore is a columnar engine for MariaDB Server for OLAP workloads. MCS is also a distributed, multithreaded application written in C++; C++20 is currently used for development. There are a number of interesting MCS projects to be part of, in both research and production programming areas.

JIT compiled SQL expressions

Part-time project 175h / Full-time project 350h - depending on scope

MCS uses interpreted execution to calculate SQL expression results. Here is an example of a SQL expression: 'table1.col1 + FLOOR(table2.col1)'. Given that table1.col1 is DECIMAL and table2.col1 is DOUBLE, there are a number of conditions that drive the calculation of this relatively simple example at runtime. Given that the SQL types and the expression tree are known before the query begins, it is possible to replace interpretation with JIT compilation to produce specialized compiled bytecode that is:

  1. small

  2. has no or almost no branches

  3. optimized for the specific platform it runs on

This is a mostly-research project whose goal is to produce a set of microbenchmarks that:

  1. leverage any JIT compiler available, e.g. LLVM, MIR

  2. demonstrate the negative and positive effects of using JIT
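A language-neutral toy sketch of the interpretation-vs-compilation trade-off (plain Python closures stand in for real JIT-compiled code; none of this is MCS code):

```python
import math

# Expression tree for: col_a + FLOOR(col_b)
tree = ("add", ("col", "a"), ("floor", ("col", "b")))

def interpret(node, row):
    """Tree-walking interpreter: re-dispatches on node type for every row."""
    op = node[0]
    if op == "col":
        return row[node[1]]
    if op == "floor":
        return math.floor(interpret(node[1], row))
    if op == "add":
        return interpret(node[1], row) + interpret(node[2], row)
    raise ValueError(op)

def compile_expr(node):
    """'JIT' stand-in: build one closure per node before the query starts,
    so the per-row cost is plain calls with no type/branch dispatch."""
    op = node[0]
    if op == "col":
        name = node[1]
        return lambda row: row[name]
    if op == "floor":
        arg = compile_expr(node[1])
        return lambda row: math.floor(arg(row))
    if op == "add":
        left, right = compile_expr(node[1]), compile_expr(node[2])
        return lambda row: left(row) + right(row)
    raise ValueError(op)

row = {"a": 2, "b": 3.7}
fn = compile_expr(tree)  # compiled once, before execution
assert interpret(tree, row) == fn(row) == 5
```

A real microbenchmark would time both paths over many rows; with LLVM or MIR the compiled path becomes native code instead of closures.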

Skills needed:

  1. C++

  2. at least basic knowledge of compiler internals

Mentor: Roman Nozdrin

Parquet support in cpimport

Full-time project 350h

cpimport in MCS is a standalone tool that does bulk ingestion outside the SQL interface. It takes source data as input and puts it into an MCS cluster. This put is an atomic operation that supports rollback. The sources can be either local files or files on S3. The only format cpimport currently reads is CSV, with custom:

  1. delimiters

  2. quotation signs

  3. NULL symbol

The goal of this project is either to teach cpimport to support the Parquet format as an input format, or to introduce a modular framework for adding input formats. This project is mostly about reading/writing production code, where the challenges are:

  1. to learn a codebase

  2. produce a feature

  3. support the feature with unit and integration tests using existing frameworks

cpimport consists of:

  1. a buffer where parsed data lines go, in the form of a low-level representation of SQL datatypes

  2. a set of parser threads that populate the buffer

  3. a set of writer threads that take the values that make up a single SQL record and put them into the corresponding files

The parser threads currently have a fixed Delimiter-Separated Values parser that can be parametrized only with:

  1. escape character

  2. 'enclosed by' characters

The suggested approach is to replace this DSV parser with a modular one that understands how to read popular formats, e.g. Parquet, Arrow, Avro, JSON.
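A toy sketch of what such a modular framework could look like (hypothetical Python names for illustration; the real framework would be C++ inside cpimport):

```python
import csv
import io
import json

# Hypothetical registry: each input format implements parse(stream) -> rows.
PARSERS = {}

def register(name):
    def wrap(cls):
        PARSERS[name] = cls
        return cls
    return wrap

@register("dsv")
class DsvParser:
    """Delimiter-separated values, parametrized like today's cpimport parser."""
    def __init__(self, delimiter=",", quotechar='"'):
        self.delimiter, self.quotechar = delimiter, quotechar
    def parse(self, stream):
        yield from csv.reader(stream, delimiter=self.delimiter,
                              quotechar=self.quotechar)

@register("json")
class JsonLinesParser:
    """One JSON object per line; columns emitted in sorted-key order."""
    def parse(self, stream):
        for line in stream:
            obj = json.loads(line)
            yield [obj[k] for k in sorted(obj)]

def load(fmt, stream, **opts):
    """Look up the format by name and hand rows to the (omitted) writer stage."""
    return list(PARSERS[fmt](**opts).parse(stream))

rows = load("dsv", io.StringIO("1|x\n2|y\n"), delimiter="|")
```

A Parquet parser would be one more `@register("parquet")` class; the rest of the pipeline (buffer, writer threads) stays unchanged.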

Skills needed:

  1. C++

  2. production development tooling like git, CMake

Mentor: Gagan Goel

SIMD for SQL expressions

Part-time project 175h / Full-time project 350h - depending on scope

MCS uses scalar processing to calculate SQL expression results. The expressions can be in the projection or filtering part of a SQL query. Here is an example of a SQL expression: 'table1.col1 + FLOOR(table2.col1)'. In most cases, scalar processing can be replaced with vectorized execution, which reduces the number of cycles needed to render the result of an expression. The challenge of this project is that the in-memory representation can be both vertical and horizontal.

This is a mostly-research project whose goal is to produce a set of microbenchmarks that:

  1. unveil limitations or problems in applying vectorization to expressions

  2. compare performance for these cases:

     • vectorized execution with vertical data

     • vectorized execution with horizontal data

     • scalar execution with horizontal data
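A toy Python sketch of the two layouts (plain list operations stand in for SIMD; the point is only the data arrangement, and the names are illustrative):

```python
import math

# Horizontal (row-major) layout: one tuple per record.
rows = [(1.5, 2.9), (2.25, 3.1), (4.0, 0.7)]

# Vertical (columnar) layout: one contiguous array per column.
col1 = [r[0] for r in rows]
col2 = [r[1] for r in rows]

# Scalar execution over horizontal data: per-row dispatch.
horizontal = [a + math.floor(b) for a, b in rows]

# "Vectorized" execution over vertical data: whole-column operations.
# Contiguous columns are what make real SIMD loads/stores effective.
floored = list(map(math.floor, col2))
vertical = [a + b for a, b in zip(col1, floored)]

assert horizontal == vertical == [3.5, 5.25, 4.0]
```

The microbenchmarks in this task would measure the same expression over each layout with actual SIMD intrinsics rather than Python lists.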

Skills needed:

  1. C++

  2. ASM knowledge to manage the low-level part of this project

Mentor: Andrey Piskunov

Fuzzing infrastructure for Columnstore engine.

Full-time project 350h - depending on scope

Fuzzing is a well-known technique for finding various types of bugs. This task is to integrate libFuzzer, sanitizers (ASan, TSan, UBSan), and ColumnStore into one fuzzing pipeline and create a fuzzing infrastructure. This task requires:

  1. Adding support to ColumnStore for building with sanitizers (ASan, TSan, UBSan).

  2. Writing code which integrates the C++ MariaDB Connectors and libFuzzer.

  3. Preparing a valid corpus of SQL scripts suitable for ColumnStore.

  4. Creating the fuzzing infrastructure.

Skills needed:

  1. Basic knowledge of how to work with C++ build tools (CMake, clang, ld, rtld).

  2. Basic C++.

Mentor: Denis Khalikov

Suggest a Task

Do you have an idea of your own, not listed above? Do let us know!

This page is licensed: CC BY-SA / Gnu FDL

Google Summer of Code 2022

In 2022, we again participated in the Google Summer of Code. The MariaDB Foundation believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently C, C++, ODBC, Java, Node.js) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. And we have MariaDB ColumnStore, which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.

Where to Start

Please join us on Zulip to mingle with the community. You should also subscribe to maria-developers@lists.launchpad.net (this is the main list where we discuss development).

To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.

Also see the List of beginner friendly issues from the MariaDB Issue Tracker.

List of Tasks

MariaDB Server

  • MDEV-21978 Make my_vsnprintf use gcc-compatible format extensions (Part-time project - 175h) MariaDB has its own implementation of most standard C library functions. This is to ensure compatibility across different platforms. Over time this library has evolved and currently does not behave exactly like the POSIX standard library. Thus we want to attain the principle of "least surprise" with this library: everything that is supported by the standard printf functions should work the same with MariaDB's compatibility library extension.

    • Skills needed: C/C++. Project difficulty: easy. Mentor: Sergei Golubchik

  • MDEV-19160 JSON_DETAILED output unnecessarily verbose (Part-time project - 175h) As explained in detail in the MDEV, we want to improve the JSON_DETAILED function to better suit our development and debugging purposes. This project will aim to clean up the function's implementation, introduce test cases, and add potential "nice-to-have" features to make developers' lives easier.

    • Skills needed: C/C++, understanding of JSON, OOP. Project difficulty: easy. Mentors: Vicențiu Ciorbaru / Sergei Petrunia

  • MDEV-18827 Create utility to parse frm files and print their DDL (Full-time project - potential part-time (175 - 350h, depending on scope)) FRM files are what MariaDB uses to store metadata about tables. These files can be used to generate DDL statements (CREATE TABLE ...). We are lacking a utility to parse these files, which could in turn make DBAs' lives easier. The task of this project is to implement this utility, making use of MariaDB's FRM parsing logic. You may have to carry out some refactoring to extract the parsing code into a reusable library, used once by MariaDB Server and once by the FRM parsing tool.

    • Skills needed: C/C++, understanding of libraries and APIs. Project difficulty: medium to hard, depending on time allocated. Mentors: Vicențiu Ciorbaru / Sergei Golubchik / Monty Widenius

  • MDEV-17467 Add linear regression functions (Full-time project - 350h) This project consists of implementing dedicated regression functions within MariaDB. The specification of each function will be decided during the project, based on what other competing databases are offering. We will choose an implementation that best matches user expectations. It is the student's job to perform research into at least one other database and come up with exact semantics for each one of the functions in the MDEV.

    • Skills needed: C/C++, understanding of regression functions and mathematics, APIs, OOP. Project difficulty: medium. Mentor: Vicențiu Ciorbaru

  • MDEV-23251 Client-compatible delimiter for mysqltest (Full-time project - potential part-time (175 - 350h, depending on scope)) We have a DELIMITER command that has a different syntax in the client and in mysqltest: mysqltest needs an additional (previous) delimiter at the end of the DELIMITER expression, which is confusing and makes it hard to copy and paste simple scripts with stored procedures from one to the other. We would like to have a new command (--delimiter=client|mysqltest) with the current behavior as the default.

    • Expected outcomes: You will learn the ins and outs of C command-line tool development. You'll get familiar with a part of the MariaDB server infrastructure -- the testing framework, which is written in C and Perl.

    • Skills required: good C knowledge; ability to use a console terminal, ability to build from the console. Project difficulty: easy. Mentor: Nikita Malyavin

  • MDEV-25774 Improve build speed of the server code base (Full-time project - 350h) We have already learned that precompiling the headers improves the build speed five times; however, the standard CMake solution doesn't fit our development comfort: CMake PCH generates one "header of headers" and pre-includes it into each of the compilation units. This makes everything that wasn't included by the unit itself available across the precompiled set. There are alternative ways of precompiling units: clang modules and gcc .gch files. We want to prefer these on a per-compiler basis.

    • Expected outcomes: You will make a strong practical impact of a high need and show off your mix of analysis and programming skills.

    • Skills required: good C knowledge, some CMake knowledge; ability to use a console terminal, ability to build from the console. Project difficulty: medium. Mentor: Nikita Malyavin

  • MDEV-12130 Improve the mysqltest language (Full-time project - 350h) mysqltest has a lot of historical problems: an ad hoc parser, weird limitations, commands added as needed with no view of the total language structure, etc. The purpose of this work would be the improvement of the language.

    • Expected outcomes: Rewrite the mysqltest interpreter using either a real parser generator, e.g. bison, or a cleanly hand-written parser, e.g. recursive descent, that can be easily extended with new functionality. Add missing control structures, for example an "else" for the existing "if". Add simple expression evaluation without contacting the server, i.e. math and string comparisons. Add functionality for minimal string manipulation, e.g. a substr function.

    • Skills required: good C/C++ knowledge, interest in parsers/interpreters. Project difficulty: medium. Mentor: Vladislav Vaintroub

  • MDEV-26182 Create a function to check for JSON intersection (Part-time project - 175h) This project aims at implementing a JSON_INTERSECT() function between two JSON objects or two JSON arrays. If the two documents are JSON arrays, we want to return all the common elements between the two arrays. In the case of objects, we want to return all common key/value pairs.

    • Skills required: C/C++, OOP, basic understanding of JSON. Project difficulty: easy. Mentors: Rucha Deodhar / Oleksandr Byelkin
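A Python sketch of the intended semantics (a model only; the real function would be implemented in C++ inside the server):

```python
import json

def json_intersect(doc1, doc2):
    """Model of JSON_INTERSECT for two arrays or two objects."""
    a, b = json.loads(doc1), json.loads(doc2)
    if isinstance(a, list) and isinstance(b, list):
        # Common elements between the two arrays (order from the first array).
        return json.dumps([x for x in a if x in b])
    if isinstance(a, dict) and isinstance(b, dict):
        # Common key/value pairs.
        return json.dumps({k: v for k, v in a.items() if b.get(k) == v})
    return None  # mixed or scalar inputs: no intersection modeled here

print(json_intersect('[1, 2, 3]', '[2, 3, 4]'))        # prints [2, 3]
print(json_intersect('{"a": 1, "b": 2}', '{"b": 2}'))  # prints {"b": 2}
```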

ColumnStore

  • MCOL-4995 Research/implement basic vectorized filtering for ARM platforms (Full-time project - 350h) As of 6.2.2, ColumnStore (MCS) supports vectorization on x86_64 platforms only. The goal of the project is to implement vectorized low-level filtering for ARM platforms using the 128-bit ARM NEON extension (SVE is optional). Low-level filtering in this context means the simple predicate WHERE filters, e.g. WHERE c1 = 5 AND c2 IN (10, 25). Please see the corresponding Jira issue for details.

    • Skills needed: C/C++, understanding of low-level platform specifics. Project difficulty: medium. Mentor: Roman Nozdrin

  • MCOL-4994 Build/run ColumnStore on macOS (Part-time project - 175h) As of ColumnStore (MCS) 6.2.2 there is no way to compile/use the MCS engine on macOS. The goal of this project is to be able to bootstrap MariaDB + basic (maybe rudimentary) MCS on macOS. There are a number of known issues that prevent MCS compilation on macOS: a number of offending preprocessor macros/definitions specific to the Linux x86_64 combination, and macOS doesn't provide syslog, which MCS uses as its only log message sink. Please see the corresponding Jira issue for details.

    • Skills needed: C/C++, macOS specifics. Project difficulty: easy. Mentor: Roman Nozdrin

  • MCOL-785 Implement DISTRIBUTED JSON functions (Full-time project - 350h) As of ColumnStore 6.2.2 there are two query execution modes: the relatively slow but compatible Table mode and the fast Select Handler (SH) mode. Table mode execution supports all JSON_* functions, while SH mode doesn't support any. We want to add support for the JSON_* function family in the SH query execution mode. Please see the corresponding Jira issue for details.

    • Skills needed: C/C++, JSON format. Project difficulty: easy. Mentor: Roman Nozdrin

Buildbot (CI/CD)

  • MDBF-320 Better Grid view for buildbot.mariadb.org (Python / Javascript / Web Dev) (Full-time project - 350h) Our CI/CD infrastructure uses a recent version of Buildbot. The grid view plugin that comes with Buildbot is not adequate for our needs. In this project, you will discuss with your mentor as well as other MariaDB developers how to best improve the user experience of Buildbot's grid view for what MariaDB developers need to accomplish.

    • Skills needed: Understanding of web-dev technologies like Angular, React, and Javascript-related libraries. Python may also be required. Mentors: Vlad Bogolin / Andreia Hendea

Suggest a Task

Do you have an idea of your own, not listed above? Do let us know!

This page is licensed: CC BY-SA / Gnu FDL

Google Summer of Code 2021

In 2021, we again participated in the Google Summer of Code. The MariaDB Foundation believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently C, ODBC, Java, Node.js) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. And we have MariaDB ColumnStore, which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.

Where to Start

Please join us on Zulip to mingle with the community. You should also subscribe to maria-developers@lists.launchpad.net (this is the main list where we discuss development).

To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.

Also see the List of beginner friendly issues and issues labelled gsoc21 from the MariaDB Issue Tracker.

List of Tasks

Support for GTID in mysqlbinlog

The mysqlbinlog client program needs to be updated to support GTID. Here is a suggested list of things to be done:

  • The --start-position and --stop-position options should be able to take GTID positions; or maybe there should be new --start-gtid and --stop-gtid options. Like --start-gtid=0-1-100,1-2-200,2-1-1000.

  • A GTID position means the point just after that GTID. So starting from GTID 0-1-100 and stopping at GTID 0-1-200, the first GTID output will probably be 0-1-101 and the last one 0-1-200. Note that if some domain is not specified in the position, it means to start from the beginning, or respectively stop immediately, in that domain.

  • Starting and stopping by GTID should work both with local files and with --read-from-remote-server. For the latter, there are a couple of extra things that need doing in the master-slave protocol; see get_master_version_and_clock() in sql/slave.cc.

  • At the end of the dump, put these statements, to reduce the risk of those session variables incorrectly spilling into subsequent statements run in the same session:

SET session.server_id = @@global.server_id,
       session.gtid_domain_id=@@global.gtid_domain_id;

Probably some more things will come up during the work, but this looks like a reasonable start.
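The start/stop semantics above can be modeled in a few lines of Python (a sketch of the intended behavior, not mysqlbinlog code; all names are illustrative):

```python
def parse_gtid_pos(s):
    """Parse '0-1-100,1-2-200' into {domain_id: (server_id, seq_no)}."""
    pos = {}
    for part in s.split(","):
        domain, server, seq = (int(x) for x in part.split("-"))
        pos[domain] = (server, seq)
    return pos

def include_event(gtid, start, stop):
    """Decide whether an event with this GTID falls inside the dump range.

    A position means the point just AFTER that GTID, so the start GTID
    itself is excluded and the stop GTID is included. A domain missing
    from the start position starts from the beginning; a domain missing
    from the stop position stops immediately.
    """
    domain, _server, seq = gtid
    if domain in start and seq <= start[domain][1]:
        return False
    if domain not in stop or seq > stop[domain][1]:
        return False
    return True

start = parse_gtid_pos("0-1-100")
stop = parse_gtid_pos("0-1-200")
assert not include_event((0, 1, 100), start, stop)  # at the start position
assert include_event((0, 1, 101), start, stop)      # first GTID output
assert include_event((0, 1, 200), start, stop)      # last GTID output
assert not include_event((1, 2, 5), start, stop)    # domain absent from stop
```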

Details: MDEV-4989
Mentor: Brandon Nesterenko

Granted to PUBLIC

Implement the standard behavior for

GRANT xxx TO PUBLIC;
REVOKE xxx FROM PUBLIC;

Also, this statement is supposed to work:

SHOW GRANTS FOR PUBLIC;

And these should not

CREATE ROLE PUBLIC;
DROP ROLE PUBLIC;
SET ROLE PUBLIC;
GRANT PUBLIC TO xxx;
REVOKE PUBLIC FROM xxx;

Note that

SHOW GRANTS FOR xxx;

should not list roles and privileges granted to PUBLIC (unless granted to xxx too), but

SHOW GRANTS;

should, arguably, list them.

Details: MDEV-5215
Mentor: Oleksandr Byelkin

Control over memory allocated for SP/PS

SP/PS (Stored Procedures / Prepared Statements) allocate memory until the PS cache or the SP is destroyed. There is no way to see how much memory is allocated and whether it grows with each execution (the first two executions can lead to new memory allocation, but not more). Task minimum: status variables which count the memory used/allocated for SP/PS by thread and/or for the server. Other ideas:

  • Automatically stop allocation in the debug version after the second execution, and raise an exception on any further allocation attempt.

  • An Information Schema view, by thread and SP/PS, with information about allocated and used memory

Information can be collected in the MEM_ROOTs of the SP/PS. By storing info about the status of the mem_root before execution and then checking it afterwards, newly allocated memory can be found. MEM_ROOT can be changed to have a debug mode which makes it read-only, which can be switched on after the second execution.

Details: MDEV-14959
Mentor: Oleksandr Byelkin

Add JSON_NORMALIZE function to normalize JSON values

Background is this question on Stack Exchange: mariadb-compare-json. The task is to provide a function that can be used to compare two JSON documents for equality; the name could be, e.g., JSON_NORMALIZE. JSON_COMPACT already takes care of removing spaces, but this is not sufficient. Keys need to be (recursively) sorted, and if spaces are removed, then documents can be compared as binary strings.
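The proposed normalization can be sketched in Python (a model of the semantics, not server code):

```python
import json

def json_normalize(doc):
    """Recursively sort object keys, then dump without extra spaces,
    so two equal documents compare equal as binary strings."""
    def sort_keys(v):
        if isinstance(v, dict):
            return {k: sort_keys(v[k]) for k in sorted(v)}
        if isinstance(v, list):
            return [sort_keys(x) for x in v]
        return v
    return json.dumps(sort_keys(json.loads(doc)), separators=(",", ":"))

a = json_normalize('{"b": 1, "a": {"d": 2, "c": 3}}')
b = json_normalize('{ "a" : {"c": 3, "d": 2}, "b" : 1 }')
assert a == b == '{"a":{"c":3,"d":2},"b":1}'
```

Note that array order is preserved: `[1,2]` and `[2,1]` normalize to different strings, which matches JSON semantics.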

Details: MDEV-16375
Mentor: Vicențiu Ciorbaru

Add linear regression functions

The following linear regression functions exist in a number of other DBMSs, such as Oracle and PostgreSQL:

REGR_SLOPE
    REGR_INTERCEPT
    REGR_COUNT
    REGR_R2
    REGR_AVGX
    REGR_AVGY
    REGR_SXX
    REGR_SYY
    REGR_SXY

Some have also been added to Columnstore.

Details: MDEV-17467
Mentor: Nikita Malyavin

Create utility to parse frm files and print their DDL

It would be useful if MariaDB had a utility that was able to parse frm files and print the DDL associated with the table. For example, it would be useful for users who performed a partial backup with mariadb-backup (partial-backup-and-restore-with-mariadb-backup) but forgot to also back up the table DDL, so they can't restore the tables using the following process: innodb-file-per-table-tablespaces. mysqlfrm (mysqlfrm.py) is a tool that already exists and does similar things, but it doesn't seem very user-friendly. It needs to be able to contact the local MariaDB server, it also needs to be able to spawn a server instance, and it seems to need to create a bunch of files during this process, e.g.:

[ec2-user@ip-172-30-0-249 ~]$ cd /tmp
[ec2-user@ip-172-30-0-249 tmp]$ sudo mysqlfrm --server=root:@localhost:3306 /var/lib/mysql/db1/tab.frm --port=12345 --user=mysql
# Source on localhost: ... connected.
# Spawning server with --user=mysql.
# Starting the spawned server on port 12345 ... done.
# Reading .frm files
#
# Reading the tab.frm file.
#
# CREATE statement for /var/lib/mysql/db1/tab.frm:
#

CREATE TABLE `db1`.`tab` (
  `id` int(11) NOT NULL,
  `str` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

#...done.
Details: MDEV-18827
Mentor: Vicențiu Ciorbaru

JSON_DETAILED output unnecessarily verbose

The JSON_DETAILED function (json_detailed) is described as:

Represents JSON in the most understandable way emphasizing nested structures.

We now have a use case for it: Optimizer Trace output. The optimizer trace is too large to be copied in full; instead, we use expressions like

SELECT 
JSON_DETAILED(JSON_EXTRACT(trace, '$**.analyzing_range_alternatives')) 
FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

Our experience is that JSON_DETAILED has some room for improvement when it comes to the quality of automatic JSON formatting. Example:

source mdev19160-data.sql
mysql> select JSON_DETAILED(JSON_EXTRACT(a, '$**.analyzing_range_alternatives')) from t200\G
*************************** 1. row ***************************
JSON_DETAILED(JSON_EXTRACT(a, '$**.analyzing_range_alternatives')): [

    {
        "range_scan_alternatives": 
        [

            {
                "index": "a_b",
                "ranges": 
                [
                    "2 <= a <= 2 AND 4 <= b <= 4"
                ],
                "rowid_ordered": true,
                "using_mrr": false,
                "index_only": true,
                "rows": 1,
                "cost": 1.1752,
                "chosen": true
            }
        ],
        "analyzing_roworder_intersect": 
        {
            "cause": "too few roworder scans"
        },
        "analyzing_index_merge_union": 
        [
        ]
    }
]

Things to note:

  • empty lines at the start (right before/after the "range_scan_alternatives")

  • "analyzing_index_merge_union":[] occupies 3 lines where one would be sufficient.

  • the same goes for "ranges"

One can look at the JSON pretty-printer that is used by EXPLAIN FORMAT=JSON and the optimizer trace. It produces a better result (but it has room for improvement, too). Extra: in MySQL, the function is called JSON_PRETTY. We should add the ability to use this name as an alias.
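The desired formatting can be sketched in Python (a toy pretty-printer that keeps empty containers on one line; the real fix would be in the server's C++ formatting code):

```python
import json

def pretty(v, indent=0):
    """Pretty-print JSON, but keep empty arrays/objects on a single line."""
    pad = "  " * indent
    if isinstance(v, dict):
        if not v:
            return "{}"
        items = (f'{pad}  "{k}": {pretty(x, indent + 1)}' for k, x in v.items())
        return "{\n" + ",\n".join(items) + "\n" + pad + "}"
    if isinstance(v, list):
        if not v:
            return "[]"  # one line instead of three
        items = (f"{pad}  {pretty(x, indent + 1)}" for x in v)
        return "[\n" + ",\n".join(items) + "\n" + pad + "]"
    return json.dumps(v)

doc = {"ranges": ["2 <= a <= 2"], "analyzing_index_merge_union": []}
print(pretty(doc))
```

With this rule, `"analyzing_index_merge_union": []` from the example above collapses to a single line, and no blank lines are emitted.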

Details: MDEV-19160
Mentor: Vicențiu Ciorbaru

Histograms: use JSON as on-disk format

Currently, histograms are stored as an array of 1-byte bucket bounds (SINGLE_PREC_HB) or 2-byte bucket bounds (DOUBLE_PREC_HB). The table storing the histograms supports different histogram formats but limits them to 255 bytes (hist_size is a tinyint).

CREATE TABLE mysql.column_stats (
  min_value varbinary(255) DEFAULT NULL, 
  max_value varbinary(255) DEFAULT NULL, 
  ...
  hist_size TINYINT UNSIGNED, 
  hist_type ENUM('SINGLE_PREC_HB','DOUBLE_PREC_HB'), 
  histogram varbinary(255), 
  ...

This prevents us from supporting other kinds of histograms. The first low-hanging fruit would be to store the histogram bucket bounds precisely (like MySQL and PostgreSQL do, for example). The idea of this MDEV is to switch to JSON as storage format for histograms. If we do that, it will:

  • Improve the histogram precision

  • Allow the DBAs to examine the histograms

  • Enable other histogram types to be collected/used.

Milestone-1: Let histogram_type have another possible value, tentatively named "JSON". When that is set, let the ANALYZE TABLE syntax collect a JSON "histogram":

{ "hello":"world"}

that is, the following should work:

SET histogram_type='json';
ANALYZE TABLE t1 PERSISTENT FOR ALL;
SELECT histogram FROM mysql.column_stats WHERE TABLE_NAME='t1' ;

This should produce {"hello":"world"}.

Milestone-2: Produce JSON with histogram(). The exact format is not specified; for now, print the bucket endpoints and produce output like this:

[
  "value1",
  "value2",
  ...
]
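As a toy model of collecting such a histogram (Python; the equi-height endpoint scheme and the output format shown are assumptions, since the MDEV leaves the format unspecified):

```python
import json

def histogram_endpoints(values, buckets):
    """Equi-height histogram: bucket endpoints taken from the sorted values."""
    data = sorted(values)
    n = len(data)
    # Endpoint of bucket i is the value at roughly the (i+1)/buckets quantile.
    ends = [data[min(n - 1, ((i + 1) * n) // buckets - 1)] for i in range(buckets)]
    return json.dumps([str(e) for e in ends], indent=2)

print(histogram_endpoints(range(1, 101), 4))  # endpoints "25","50","75","100"
```

Storing endpoints as strings keeps the precise values (the point of the MDEV), unlike the current fraction-based 1- or 2-byte encodings.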

Milestone-2, part #2: Make mysql.column_stats.histogram a blob.

Milestone-3: Parse the JSON back into an array. Figure out how to use the JSON parser, and parse the JSON data produced in Milestone-2 back. For now, just print the parsed values to stderr. (Additional input was provided on Zulip regarding parsing valid/invalid JSON histograms.)

Milestone-4: Make the code support different kinds of histograms. Currently, there's only one type of histogram. A smaller issue: histogram lookup functions assume the histogram stores fractions, not values. A bigger issue: memory allocation for histograms is de-coupled from reading the histograms; see alloc_statistics_for_table, read_histograms_for_table. The histogram object lives in a data structure that is bzero'ed first and then filled later (IIRC there was a bug, since fixed, where the optimizer attempted to use a bzero'ed histogram). Can histograms be collected or loaded in parallel by several threads? This was an (unintentional?) possibility, but then it was disabled (see the TABLE_STATISTICS_CB object and its use).

Step #0: Make Histogram a real class. Here's the commit: 3ac32917ab6c42a5a0f9ed817dd8d3c7e20ce34d

Step 1: Separate classes for binary and JSON histograms. Need to introduce

class Histogram -- interface, no data members.
class Histogram_binary : public Histogram
class Histogram_json : public Histogram

and a factory function

Histogram *create_histogram(Histogram_type)

For now, let Histogram_json::point_selectivity() and Histogram_json::range_selectivity() return 0.1 and 0.5, respectively.

Step 2: Demonstrate saving/loading of histograms. The code already can:

  • collect a JSON histogram and save it.

  • when loading a histogram, figure out from the histogram_type column that a JSON histogram is being loaded, create a Histogram_json, and invoke the parse function. The parse function at the moment only prints to stderr; however, we should catch parse errors and make sure they are reported to the client. The test may look like this:

INSERT INTO mysql.column_stats VALUES('test','t1','column1', .... '[invalid, json, data']);
FLUSH TABLES;
# this should print some descriptive text
--error NNNN
SELECT * FROM test.t1;

Milestone-5: Parse the JSON data into a structure that allows lookups. The structure is

std::vector<std::string>

and it holds the data in KeyTupleFormat (see the comments for the reasoning; there was a suggestion to use in_vector, which is what IN subqueries use, but it didn't work out).

Milestone 5.1 (aka Milestone 44): Make a function to estimate selectivity using the data structure specified in the previous milestone.

Make range_selectivity() accept key_range parameters (currently, it accepts fractions, which are only suitable for binary histograms). This means Histogram_binary will need to have access to min_value and max_value to compute the fractions.

Details: MDEV-21130
Mentor: Sergei Petrunia

Make my_vsnprintf use gcc-compatible format extensions

my_vsnprintf() is used internally in the server as a portable printf replacement, and it's also exported to plugins as a service. It supports a subset of printf formats and three extensions:

  • %`s means that a string should be quoted like an identifier

  • %b means that it's a binary string, not zero-terminated; printing won't stop at \0, so one should always specify the field width (like %.100b)

  • %M is used in error messages and prints the integer (errno) and the corresponding strerror() for it

gcc knows the printf formats, checks whether the actual arguments match the format string, and issues a warning if they don't. Unfortunately there seems to be no easy way to teach gcc our extensions, so for now we have to disable printf format checks. A better approach would be to use gcc-compatible format extensions, as the Linux kernel does. We should migrate to a different syntax for our extensions:

  • %sI to mean "print as an identifier"

  • %sB to mean "print a binary string"

  • %uE to mean "print an errno"

  • %sT to put a "..." as truncation indicator

Old formats can still be supported, or they can be removed; in the latter case, the major version of the service should be increased to signal an incompatible change. All error messages and all usages of my_vsnprintf should be changed to use the new syntax, and gcc printf format checks should be enabled.
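The migration of format strings is mechanical enough to sketch (a toy Python rewriter; it assumes the identifier-quoting extension is spelled %`s, and the real change also involves the error-message files and the gcc format attributes):

```python
import re

def migrate_format(fmt):
    """Rewrite an old-style my_vsnprintf format string to the proposed syntax."""
    # %`s (quote as identifier) -> %sI, %M (errno + strerror) -> %uE
    fmt = fmt.replace("%`s", "%sI").replace("%M", "%uE")
    # %b / %.Nb (binary string) -> %sB / %.NsB, preserving the field width
    fmt = re.sub(r"%(\.\d+)?b", lambda m: "%" + (m.group(1) or "") + "sB", fmt)
    return fmt

print(migrate_format("Unknown table %`s, data: %.100b, error %M"))
# prints: Unknown table %sI, data: %.100sB, error %uE
```

Standard specifiers pass through untouched, which is the point of keeping the new extensions gcc-parsable.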

Details: MDEV-21978
Mentor: Sergei Golubchik

Add JSON_EQUALS function to check JSON equality

JSON_CONTAINS can be used to test for JSON object equality in some cases, but we seem to lack a clear JSON_EQUALS function.

Details:
Mentor:

Details:

MDEV-23143

Mentor:

Vicențiu Ciorbaru

Concurrent multi-reader, multi-writer buffer for IO_CACHE

IO_CACHE has basically three read/write modes: read-only, write-only, and a sequential read/write FIFO mode, SEQ_READ_APPEND. Some performance-sensitive places, like the replication slave thread, use SEQ_READ_APPEND, and that may be a bottleneck, since reads and writes are sequential (and co-sequential, i.e. reads and writes block each other). The task is to implement a non-blocking mode for the multi-reader, multi-writer use case through a concurrent ring buffer implementation.

Possible approaches

Lock-free n-consumer, m-producer ring buffer

This implementation requires limiting the number of simultaneous accessors and reserving slots for them. Lock-free implementations can contain busy waits, but no locks, except when the number of consumers or producers is exceeded. This can be controlled by a semaphore with a capacity equal to the number of cores. This is the ideal way, but it can be an overhaul because of complicated busy loops and slot management. It is also hard because writes can be bigger than the buffer; see "Buffer excess" below.

Simple rwlock-based non-blocking approach

The bottleneck basically occurs because SEQ_READ_APPEND blocks for the whole duration of the buffer copy. We can avoid that by moving the pointers first, thus allocating a place for copying, and then making the copy from/to the buffer without a lock. An rwlock will be used to access the pointers, i.e. readers access IO_CACHE::end_of_file with a read lock to ensure the borders, while writers access it with a write lock.

Buffer excess

Excesses make things work sequentially. When the buffer is full, a separate write buffer is created. When the write buffer is full, a flush happens. Flushes wait for all writers to finish first, then lock the write buffer for flushing. The read buffer can be flushed in a more relaxed way: there is no need to lock for flushing, but we have to lock for buffer allocation and wait for all writers. Waiting for writers can be done with another rwlock.

Single-readerness

The real-world cases are mostly single-consumer, and that is essential for IO_CACHE: it is variable-length and has no underlying data format, so the reader always has to make at least two sequential reads (one to read the size and another to read the body). Single-reader considerations can relax some conditions and ease the implementation.

io_cache_reserve API

We can add a function to reserve space for writing, for the case of writing big objects (both those bigger than the write cache, and those smaller than it but big enough not to fit into the external buffer), for cases like copying one cache to another. The function should return a future-like object, since we have to notify IO_CACHE back that the writing is finished (to trigger a flush, for example).
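The rwlock-based "move pointers first, copy without a lock" idea can be sketched as follows. This is a simplified model (class name hypothetical): it ignores wrap-around and commit gaps between concurrent writers, and uses a condition variable where the real implementation would use an rwlock around the pointers:

```python
import threading

class ReserveBuffer:
    """Sketch: a writer takes the lock only long enough to reserve a
    slice (move the pointer), then fills it without holding the lock;
    readers wait until the slice is published (the 'committed' border,
    analogous to reading IO_CACHE::end_of_file under a read lock)."""
    def __init__(self, size):
        self.buf = bytearray(size)
        self.write_pos = 0                    # next free byte, moved under lock
        self.committed = 0                    # bytes safely readable
        self.lock = threading.Lock()
        self.ready = threading.Condition(self.lock)

    def append(self, data):
        with self.lock:                       # short critical section: pointers only
            start = self.write_pos
            self.write_pos += len(data)
        self.buf[start:start + len(data)] = data   # bulk copy outside the lock
        with self.ready:                      # publish the new readable border
            self.committed = max(self.committed, start + len(data))
            self.ready.notify_all()

    def read(self, offset, n):
        with self.ready:
            while self.committed < offset + n:
                self.ready.wait()             # block only until data is published
            return bytes(self.buf[offset:offset + n])
```

The point of the sketch is that the buffer copy itself never happens under the lock, which is exactly what SEQ_READ_APPEND currently cannot do.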

Details:
Mentor:

Details:

MDEV-24676

Mentor:

Nikita Malyavin

Custom formatting of strings in MariaDB queries

Formatting more complex strings in a SELECT statement can get awkward when there are many concat(), format(), etc. calls involved. It would be very cool and helpful to have a function that takes an input string and a formatting specification and returns a string formatted using the rules the user passed in the specification. A great example of such a function is the classic C printf function, which, in this context, would look something like:

SELECT printf('%s %s, %s', first_name, last_name, job_title) FROM employees;

But it doesn't necessarily need to look this way. An alternative syntax could be Python-ish, which would leverage the fact that the server already knows the datatype of each field used in the formatting scheme:

SELECT sformat('arg1: {}, arg2: {}', col1, col2) FROM table;

In that syntax one passes formatting options within the curly braces:

-- Print 'arg1: col1, arg2: col2'  where col1 from table is of datetime type and should be printed as: 'Sunday November 2021'
SELECT sformat('arg1: {%W %M %Y}, arg2: {}', col1, col2) FROM table;

Ideally, this new function should use, behind the scenes, the existing builtin formatting functions in MariaDB (e.g. date_format(), format()) and even future formatting functions (e.g. MySQL's format_bytes(), format_pico_time()), so the syntax has to be designed in a smart way to accommodate easily future additions.
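As a rough model of the Python-ish syntax above, here is a toy sformat in Python. The mapping from MariaDB date_format() specifiers to strftime() ones is illustrative, not a definitive design:

```python
import datetime
import re

# Hypothetical mapping from MariaDB date_format() specifiers to strftime().
MARIADB_TO_STRFTIME = {'%W': '%A', '%M': '%B', '%Y': '%Y'}

def sformat(template, *args):
    """Toy sformat(): '{}' prints the value as-is; '{%W %M %Y}' applies
    a date_format-like specification to a datetime argument."""
    it = iter(args)
    def repl(m):
        spec, val = m.group(1), next(it)
        if not spec:                      # empty braces: default rendering
            return str(val)
        for md, py in MARIADB_TO_STRFTIME.items():
            spec = spec.replace(md, py)
        return val.strftime(spec)         # datetime-typed argument assumed
    return re.sub(r'\{([^}]*)\}', repl, template)
```

For the example in the task, sformat('arg1: {%W %M %Y}, arg2: {}', ...) with a datetime of 2021-11-07 renders the first argument as 'Sunday November 2021', because the server-side type information tells us to treat it as a date.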

Details:
Mentor:

Details:

MDEV-25015

Mentor:

Sergei Golubchik

Add autocompletion capabilities to the MariaDB Jupyter kernel

As part of the Jupyter Messaging protocol, the Jupyter frontend sends a complete_request message to the MariaDB kernel when the user invokes the code completer in a Jupyter notebook. This message is handled in the do_complete function of the MariaDBKernel class. In simpler words, whenever the user hits the key shortcut for code autocompletion in a notebook, the MariaDB kernel's do_complete function is called with a number of arguments that help the kernel understand what the user wants to autocomplete. So the autocompletion infrastructure in the MariaDB kernel is already kindly provided by Jupyter; we only need to send back to Jupyter a list of suggestions based on the arguments that do_complete receives :-). Ideally we should aim to enable at least database, table and column name completion, and also SQL keyword completion. But no worries, there are plenty of possibilities to extend the functionality even more if the accepted student turns out to be very productive :D
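The shape of the reply is fixed by the Jupyter messaging protocol. A minimal sketch of a do_complete that suggests SQL keywords for the token under the cursor (the keyword list is illustrative) could look like:

```python
# Illustrative subset; a real kernel would also pull database, table and
# column names from the connected server.
SQL_KEYWORDS = ['SELECT', 'SHOW', 'SET', 'INSERT', 'UPDATE', 'DELETE']

def do_complete(code, cursor_pos):
    """Return a complete_reply-shaped dict: candidate completions for the
    token ending at cursor_pos, plus the span they should replace."""
    start = cursor_pos
    while start > 0 and (code[start - 1].isalnum() or code[start - 1] == '_'):
        start -= 1                       # walk back to the start of the token
    prefix = code[start:cursor_pos].upper()
    matches = [k for k in SQL_KEYWORDS if prefix and k.startswith(prefix)]
    return {'matches': matches,
            'cursor_start': start,       # Jupyter replaces this span
            'cursor_end': cursor_pos,
            'metadata': {},
            'status': 'ok'}
```

Typing "se" and invoking the completer would then offer SELECT and SET, with cursor_start/cursor_end telling the frontend which characters to replace.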

Details:
Mentor:

Details:

Project Issue

Mentor:

Robert Bindar

Implement interactive editing of result sets in the MariaDB Jupyter kernel

At this moment the MariaDB kernel is only capable of getting the result sets from the MariaDB client in HTML format and packing them in a Jupyter-compatible format. Jupyter then displays them in notebooks like it would display Python Pandas dataframes. Sure, the users can easily write SQL code to modify the content of a table, like they would in a classical command-line database client. But we want to go a bit further: we would love to have the capability to edit a result set returned by a SELECT statement (i.e. double-click on table cells and edit) and have a button that users can press to generate a SQL statement that will update the content of the table via the MariaDB server. Apart from interacting with the Jupyter frontend to provide this UI capability, we also have to implement field integrity functionality, so that we make sure users can't enter data that is not compatible with the datatype of the column as it is seen by the MariaDB server. The project should start with a fair bit of research to understand how we can use the Jupyter Messaging protocol to create the UI functionality, and also to check other Jupyter kernels and understand the right and best approach for tackling this.

Details:
Mentor:

Details:

Project Issue

Mentor:

Andreia Hendea

Make the MariaDB Jupyter kernel capable of dealing with huge SELECTs

Currently the MariaDB kernel doesn't impose any internal limit on the number of rows a user can SELECT in a notebook cell. Internally the kernel gets the result set from MariaDB and stores it in a pandas DataFrame, so users can use it with magic commands to chart data. But this DataFrame is stored in memory, so if you SELECT a huge number of rows, say 500k or 1M, it's probably not a very good idea to create such a huge DataFrame. We tested with 500k rows, and the DataFrame itself is not the biggest problem; it consumed around 500MB of memory. The problem is the number of rows the browser needs to render: for 500k rows the browser tab with the notebook consumes around 2GB of memory, so the Jupyter frontend (JupyterLab, Jupyter Notebook) slows down considerably. A potential solution is to introduce two new config options which would specify:

  • a limit for the number of rows the Jupyter notebook should render; a reasonable default value for this could be 50 rows (display_max_rows)

  • a limit for each SELECT statement, limit_max_rows, that the kernel would use to determine whether it should store the result set in memory in a DataFrame or store the result set on disk. A reasonable default value might be 100k rows.

The trickiest part of the project, though, is that once the kernel writes a result set to disk, the charting magic commands need to detect that the data is not in memory but on disk, and they should find a smart mechanism for generating the chart from the on-disk data without loading the entire data set into memory (which would defeat the whole purpose of the project). This might involve finding a new Python plotting library (instead of the current matplotlib) that can accomplish the job.
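Charting from on-disk data without loading it can work by streaming the file and keeping only running aggregates. A sketch (the column name and function are hypothetical) that buckets a numeric column in one pass over a CSV stream, which is all a bar chart needs:

```python
import csv
import io

def column_histogram(fileobj, column, bins, lo, hi):
    """One pass over a CSV stream: per-bucket counts for a numeric
    column, never materializing the full result set in memory."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for row in csv.DictReader(fileobj):
        v = float(row[column])
        i = min(int((v - lo) / width), bins - 1)   # clamp the max value into the last bin
        counts[i] += 1
    return counts

# Stand-in for a result set the kernel spilled to disk.
data = "amount\n10\n20\n30\n95\n"
print(column_histogram(io.StringIO(data), 'amount', 2, 0, 100))
```

Memory use stays proportional to the number of buckets, not the number of rows, regardless of how large the spilled SELECT was.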

Details:
Mentor:

Details:

Project Issue

Mentor:

Vlad Bogolin

Suggest a Task

Do you have an idea of your own, not listed above? Do let us know!

This page is licensed: CC BY-SA / Gnu FDL

Google Summer of Code 2020

We participated in the Google Summer of Code 2020. The MariaDB Foundation believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently C, ODBC, Java) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. And we have MariaDB ColumnStore, which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.

Where to Start

Please join us on Zulip and on IRC to mingle with the community. Don't forget to subscribe to maria-developers@lists.launchpad.net (this is the main list where we discuss development).

A few handy tips for any interested students who are unsure which projects to choose: Blog post from a former GSoC student & mentor

To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.

Also see the List of beginner friendly issues and issues labelled gsoc20 from the MariaDB Issue Tracker.

List of Tasks

MariaDB Server: Optimizer

Evaluate subquery predicates earlier or later depending on their SELECTIVITY

(Based on a conversation with Igor) There are a lot of subquery conditions out there that are inexpensive to evaluate and have good selectivity. If we just implement MDEV-83, we may get regressions. We need to take the subquery condition's selectivity into account. It is difficult to get a meaningful estimate for an arbitrary, correlated subquery predicate. One possible solution is to measure selectivity during execution and reattach predicates on the fly. We don't want to change the query plan all the time; one way to dynamically move items between item trees is to wrap them inside Item_func_trig_cond, so we can switch them on and off.

Details:
Mentor:

Details:

MDEV-383

Mentor:

Igor Babaev

Histograms with equal-width bins in MariaDB

Histograms with equal-width bins are easy to construct using samples. For this it's enough to look through the given sample set and, for each value, figure out which bin the value can be placed in. Each bin requires only one counter. Let f be a column of a table with N rows and n be the number of samples by which the equal-width histogram of k bins for this column is constructed. Suppose that, after looking through all sample rows, the counters created for the histogram bins contain the numbers c[1],..,c[k]. Then m[i] = c[i]/n * 100 is the percentage of the rows whose values of f are expected to be in the interval

[min(f) + (max(f)-min(f))/k * (i-1), min(f) + (max(f)-min(f))/k * i).

It means that if the sample rows have been chosen randomly, the expected number of rows with values of f from this interval can be approximated by m[i]/100 * N. To collect such statistics it is suggested to use the following variant of the ANALYZE TABLE command:

ANALYZE FAST TABLE tbl [ WITH n ROWS ] [SAMPLING p PERCENTS ]
   PERSISTENT FOR COLUMNS (col1 [IN RANGE r] [WITH k INTERVALS],...)

Here:

  • 'WITH n ROWS' provides an estimate for the number of rows in the table in the case when this estimate cannot be obtained from statistical data.

  • 'SAMPLING p PERCENTS' provides the percentage of sample rows to collect statistics. If this is omitted the number is taken from the system variable samples_ratio.

  • 'IN RANGE r' sets the range of equal-width bins of the histogram built for the column col1. If this is omitted and the min and max values for the column can be read from the statistical data, then the histogram is built for the range [min(col1), max(col1)]. Otherwise the range [MIN_type(col1), MAX_type(col1)] is considered. Values beyond the given range, if any, are also taken into account in two additional bins.

  • WITH k INTERVALS says how many bins are included in the histogram. If it is omitted this value is taken from the system variable histogram_size.
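The estimate described above can be sketched in Python (function name hypothetical): count samples per equal-width bin, then scale each counter to the table size via m[i] = c[i]/n * 100 and m[i]/100 * N:

```python
def estimate_rows(samples, N, k, lo, hi):
    """Equal-width histogram from a sample: c[i] counts samples per bin;
    the expected rows in bin i is approximated by c[i]/n * N."""
    counts = [0] * k
    width = (hi - lo) / k
    for v in samples:
        i = min(int((v - lo) / width), k - 1)   # clamp hi into the last bin
        counts[i] += 1
    n = len(samples)
    return [c / n * N for c in counts]
```

For example, 4 samples [1, 2, 3, 9] over the range [0, 10) with k=2 put three samples in the first bin and one in the second, so for a table of N=100 rows the estimate is 75 and 25 rows respectively.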

Details:
Mentor:

Details:

MDEV-12313

Mentor:

Vicentiu Ciorbaru

Add FULL OUTER JOIN to MariaDB

Add support for FULL OUTER JOIN. One way to implement it is to rewrite the query

SELECT t1.*, t2.* FROM t1 FULL OUTER JOIN t2 ON P(t1,t2)

into the following union all:

SELECT t1.*, t2.* FROM t1 LEFT OUTER JOIN t2 ON P(t1,t2) 
UNION ALL 
SELECT t1.*,t2.* FROM t2 LEFT OUTER JOIN t1 ON P(t1,t2) WHERE t1.a IS NULL

Here t1.a is some non-nullable column of t1 (e.g. the column of single column primary key).

Details:
Mentor:

Details:

MDEV-13648

Mentor:

Igor Babaev

Recursive CTE support for UPDATE (and DELETE) statements

CREATE TABLE tree (
       `Node` VARCHAR(3),
       `ParentNode` VARCHAR(3),
       `EmployeeID` INTEGER,
       `Depth` INTEGER,
       `Lineage` VARCHAR(16)
     );

     INSERT INTO tree
       (`Node`, `ParentNode`, `EmployeeID`, `Depth`, `Lineage`)
     VALUES
       ('100', NULL, '1001', 0, '/'),
       ('101', '100', '1002', NULL, NULL),
       ('102', '101', '1003', NULL, NULL),
       ('103', '102', '1004', NULL, NULL),
       ('104', '102', '1005', NULL, NULL),
       ('105', '102', '1006', NULL, NULL);
 
     WITH RECURSIVE prev AS (
     SELECT * FROM tree WHERE ParentNode IS NULL
     UNION
     SELECT t.Node,t.ParentNode,t.EmployeeID,p.Depth + 1 AS Depth, CONCAT(p.Lineage, t.ParentNode, '/')
     FROM tree t JOIN prev p ON t.ParentNode = p.Node
     )
     SELECT * FROM prev;

     WITH RECURSIVE prev AS (
     SELECT * FROM tree WHERE ParentNode IS NULL
     UNION
     SELECT t.Node,t.ParentNode,t.EmployeeID,p.Depth + 1 AS Depth, CONCAT(p.Lineage, t.ParentNode, '/')
     FROM prev p JOIN tree t ON t.ParentNode = p.Node
     )
     UPDATE tree t, prev p SET t.Depth=p.Depth, t.Lineage=p.Lineage WHERE t.Node=p.Node; 

You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'UPDATE tree t, prev p SET t.Depth=p.Depth, t.Lineage=p.Lineage WHERE t.Node=p.No' at line 7

This is supported in MySQL 8.0 and MSSQL.

Details:
Mentor:

Details:

MDEV-18511

Mentor:

Igor Babaev

MariaDB Server: others

Support for GTID in mysqlbinlog

The mysqlbinlog client program needs to be updated to support GTID. Here is a suggested list of things to be done:

  • The --start-position and --stop-position options should be able to take GTID positions; or maybe there should be new --start-gtid and --stop-gtid options. Like --start-gtid=0-1-100,1-2-200,2-1-1000.

  • A GTID position means the point just after that GTID. So starting from GTID 0-1-100 and stopping at GTID 0-1-200, the first GTID output will probably be 0-1-101 and the last one 0-1-200. Note that if some domain is not specified in the position, it means to start from the beginning, or respectively to stop immediately, in that domain.

  • Starting and stopping GTID should work both with local files, and with --read-from-remote-server. For the latter, there are a couple of extra things that need doing in the master-slave protocol, see get_master_version_and_clock() in sql/slave.cc.

  • At the end of the dump, put these statements, to reduce the risk of those session variables incorrectly spilling into subsequent statements run in the same session:

SET session.server_id = @@global.server_id,
       session.gtid_domain_id=@@global.gtid_domain_id;
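The start/stop semantics described above can be sketched as follows (helper names hypothetical): parse a position list like 0-1-100,1-2-200, then decide per event whether it lies after the start position, remembering that a position means the point just after that GTID and that an unlisted domain starts from the beginning:

```python
def parse_gtid_list(s):
    """'0-1-100,1-2-200' -> {domain: (server_id, seq_no)}."""
    pos = {}
    for part in s.split(','):
        domain, server, seq = (int(x) for x in part.split('-'))
        pos[domain] = (server, seq)
    return pos

def after_start(gtid, start):
    """True if the event with this (domain, server_id, seq_no) should be
    output when starting from position 'start': a position is the point
    just AFTER that GTID, so seq_no must exceed it; a domain missing
    from the position starts from the beginning."""
    domain, _server, seq = gtid
    if domain not in start:
        return True
    return seq > start[domain][1]
```

With --start-gtid=0-1-100, event 0-1-100 itself is excluded and 0-1-101 is the first one output, matching the description above.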
Details:
Mentor:

Details:

MDEV-4989

Mentor:

Andrei Elkin

connection encryption plugin support

As a follow-on to MDEV-4691 we would like GSSAPI encryption (in addition to authentication) support in MariaDB. I am told that the current plan is to create a plugin interface and then we can build GSSAPI encryption on top of that, so here is a ticket for that. From having written GSSAPI for the internal interface, there were a couple things I would like to see in the plugin encryption interface.

First, GSSAPI is weird in that it does authentication before encryption (TLS/SSL are the other way around, establishing an encrypted channel and then doing authentication over it). Of course support for this is needed, but more importantly, packets must be processed in a fully serialized fashion. This is because encrypted packets may be queued while one end of the connection is still finishing up processing the authentication handshake. One way to do this is registering "handle" callbacks with connection-specific state, but there are definitely others.

Additionally, for whatever conception there ends up being of authentication and encryption, it needs to be possible to share more data than just a socket between them. The same context will be used for authentication and encryption, much as an SSL context is (except of course we go from authentication to encryption and not the other way around). This ties into an issue of dependency. If authentication plugins are separate entities from encryption plugins in the final architecture, it might make sense to do mix-and-match authentication with encryption. However, there are cases - and GSSAPI is one - where doing encryption requires a certain kind of authentication (or vice versa). You can't do GSSAPI encryption without first doing GSSAPI authentication. (Whether or not it's permitted to do GSSAPI auth->encryption all over a TLS channel, for instance, is not something I'm concerned about.)

Finally, encrypted messages are larger than their non-encrypted counterparts. The transport layer should cope with this so that plugins don't have to think about reassembly, keeping in mind that there may not be a way to get the size of a message when encrypted without first encrypting it. It's unfortunately been a little while since I wrote that code, but I think those were the main things that we'll need for GSSAPI. Thanks!

Details:
Mentor:

Details:

MDEV-9090

Mentor:

Vicențiu Ciorbaru

Aggregate Window Functions

With a few exceptions, most native aggregate functions are supported as window functions. In MDEV-7773, support for creating custom aggregate functions was added. This task proposes to extend that feature and allow custom aggregate functions to be used as window functions. An example of creating a custom aggregate function is given below:

CREATE AGGREGATE FUNCTION agg_sum(x INT) RETURNS DOUBLE
BEGIN
  DECLARE z DOUBLE DEFAULT 0;
  DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN z;
  LOOP
    FETCH GROUP NEXT ROW;
    SET z = z + x;
  END LOOP;
END|

This function can be used in the following query:

CREATE TABLE balances (id INT, amount INT);
INSERT INTO balances VALUES (1, 10), (2, 20), (3, 30);
 
SELECT agg_sum(amount) FROM balances;

After this task is complete the following must also work:

SELECT agg_sum(amount) OVER (ORDER BY id);
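The semantics the window form has to produce — applying the aggregate to the default frame, from the first row up to the current row in ORDER BY order — can be sketched in Python, with the custom agg_sum written as a plain function:

```python
def agg_sum(values):
    # Same accumulation loop the CREATE AGGREGATE FUNCTION example performs.
    z = 0
    for x in values:
        z += x
    return z

def over_order_by(rows, key, agg, col):
    """Apply agg over the default window frame (unbounded preceding to
    current row) after ordering by key. Illustration of the expected
    semantics, not the server's execution strategy."""
    ordered = sorted(rows, key=lambda r: r[key])
    return [agg([r[col] for r in ordered[:i + 1]])
            for i in range(len(ordered))]

balances = [{'id': 1, 'amount': 10},
            {'id': 2, 'amount': 20},
            {'id': 3, 'amount': 30}]
print(over_order_by(balances, 'id', agg_sum, 'amount'))
```

For the balances table above this prints the running sums 10, 30, 60 — one aggregate result per row, which is exactly what distinguishes the window form from the plain GROUP BY-style call.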
Details:
Mentor:

Details:

MDEV-11263

Mentor:

Varun Gupta

Improve mysqltest language

mysqltest has a lot of historical problems:

  • ad hoc parser, weird limitations

  • commands added as needed with no view over the total language structure

  • historical code issues (e.g. casts that became unnecessary 10 years ago), etc.

A lot can be done to improve it. Ideas:

  • control structures, else in if, break and continue in while, for (or foreach) loop

  • proper expression support in let, if, etc

  • rich enough expressions to make resorting to sql unnecessary in most cases

  • remove unused and redundant commands (e.g. system vs exec, query_vertical vs vertical_results ONCE)

  • remove complex commands that do many sql statements under the hood, if they can be scripted, e.g. sync_slave_with_master

  • remove over-verbose treatment of rpl test failures

  • scoped variables

  • parameters for the source command

  • remove dead code

Details:
Mentor:

Details:

MDEV-12130

Mentor:

Sergei Golubchik

Implement multiple-table UPDATE/DELETE returning a result set

A multiple-table UPDATE first performs join operations, then it updates the matching rows. A multiple-table UPDATE returning a result set does the following:

  • first performs join operations

  • for each row of the result of the join it calculates some expressions over the columns of the join and forms from them a row of the returned result set

  • after this it updates the matching rows

A multiple-table DELETE first performs join operations, then it deletes the matching rows. A multiple-table DELETE returning a result set does the following:

  • first performs join operations

  • for each row of the result of the join it calculates some expressions over the columns of the join and forms from them a row of the returned result set

  • after this it deletes the matching rows

Details:
Mentor:

Details:

MDEV-12326

Mentor:

Igor Babaev

sort out the compression library chaos

As MariaDB is getting more storage engines, and as they're getting more features, MariaDB can optionally use more and more compression libraries for various purposes. InnoDB, TokuDB, RocksDB — they all can use different sets of compression libraries. Compiling them all in would result in a lot of run-time/rpm/deb dependencies, most of which will never be used by most of the users. Not compiling them in would result in requests to compile them in. While most users don't use all of these libraries, many users use some of them. A solution could be to load these libraries on request, without creating a packaging dependency. There are different ways to do it:

  • hide all compression libraries behind a single unified compression API. Either develop our own or use something like Squash. This would require changing all engines to use this API

  • use the same approach as in server services — create a service per compression library; a service implementation will just return an error code for any function invocation if the corresponding library is not installed. This way — maybe — we could avoid modifying all affected storage engines

Details:
Mentor:

Details:

MDEV-12933

Mentor:

Sergei Golubchik

Control over memory allocated for SP/PS

SP/PS (Stored Procedures / Prepared Statements) allocate memory until the PS cache or the SP is destroyed. There is no way to see how much memory is allocated and whether it grows with each execution (the first two executions can lead to new memory allocations, but later ones should not).

Task minimum:

Status variables which count the memory used/allocated for SP/PS by thread and/or for the server.

Other ideas:

  • Automatically stop allocation in the debugging version after the second execution, and raise an exception on any further attempt.

  • An Information Schema table listing threads and SP/PS, with information about allocated and used memory

Information can be collected in the MEM_ROOTs of the SP/PS. By storing info about the status of the mem_root before execution and checking it afterwards, newly allocated memory can be found. MEM_ROOT can be changed to have a debug mode which makes it read-only, and which can be switched on after the second execution.
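The per-execution accounting and the read-only-after-second-execution idea can be sketched as follows (class name hypothetical; the real mechanism would live inside MEM_ROOT):

```python
class TrackedRoot:
    """Counts bytes allocated across executions; in debug mode any
    allocation after the second execution raises, mimicking the
    proposed read-only MEM_ROOT switch."""
    def __init__(self, debug=False):
        self.debug = debug
        self.execution = 0       # how many executions have started
        self.allocated = 0       # total bytes handed out (the status variable)

    def start_execution(self):
        self.execution += 1

    def alloc(self, nbytes):
        if self.debug and self.execution > 2:
            # The first two executions may legitimately allocate; later
            # ones should reuse what is already there.
            raise RuntimeError('allocation after second execution')
        self.allocated += nbytes
        return bytearray(nbytes)
```

The allocated counter is what the proposed status variables would expose per thread or server-wide; the raise is the "call exception on attempt" behavior for debug builds.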

Details:
Mentor:

Details:

MDEV-14959

Mentor:

Oleksandr Byelkin

MariaDB ColumnStore

Full DECIMAL support in ColumnStore

MariaDB ColumnStore supports DECIMAL with some limitations:

  1. We do not support the full DECIMAL range that is in MariaDB

  2. In several places in the code we convert the DECIMAL to DOUBLE during execution, therefore losing precision

Implementing this will likely require the following:

  • Implementation of methods to handle MariaDB's DECIMAL format

  • Support for a longer than 8-byte numeric column type (there is an InfiniDB tree with work for this already)

  • Modification of the primitives processor for the math

  • Modification of the function expression processor to handle the new type

  • Version upgrade support for DECIMAL from the current form to the new form

Details
Mentor:

Details

MCOL-641

Mentor:

Andrew Hutchings

Replace glibc with Google's re2 for regex processing

CS as of 1.4.2 relies on glibc for regex processing. We need to replace glibc with re2 for LIKE, REGEX and other facilities, as this affects performance.

  1. Identify places with glibc regex functions invocations

  2. Pick the invocations that significantly affect the timing of queries

  3. Replace glibc regex calls with the appropriate re2 ones

Details
Mentor:

Details

MCOL-3778

Mentor:

Roman

Speed up RowGroup data access methods by reducing the level of indirection

RowGroup is the unit of data sent around the CS cluster. It is basically a set of records for fixed-size data types, plus an additional storage area for binary data, e.g. strings. Right now there are lots of RowGroup methods that access RowGroup fixed-size data using an extra level of indirection. Here is an example:

return ((int64_t) &data[offsets[colIndex]]);

This expression uses an extra assembly instruction to calculate the effective address. We want to remove the 'offsets[colIndex]' part of this and similar expressions in the RowGroup code.

Details
Mentor:

Details

MCOL-3759

Mentor:

Roman

Parallel sorting 2nd phase

As of 1.4.1 CS uses two-phase sorting. Here are the phases:

  • Presort partial runs of data.

  • Merge the presorted partial runs produced during the 1st phase.

Here is a more detailed explanation of how sorting works as of 1.4.1:

CS gets a portion of record data from the previous steps of the query execution (an RGData instance from the ring buffer of RGData-s) and produces a sorting run out of it using the existing sorting class LimitedOrderBy. If the query contains LIMIT, then we apply it at this phase. This allows significantly reducing the data set cardinality. If the query contains LIMIT + OFFSET, then CS builds a sorted run of records that is up to LIMIT+OFFSET in size. CS does this step in parallel, dividing the whole data set into k runs, where k is governed by a session variable, infinidb/columnstore_orderby_threads. At this phase CS tries to preallocate memory in QUEUE_RESERVE_SIZE batches. CS then merges and sorts the k presorted partial runs produced by the previous phase in a single thread. If the query contains the DISTINCT keyword, CS rebuilds a hash map to preserve uniqueness. We want to make the 2nd phase also parallel, using range partitioning of the presorted runs produced by the 1st phase. After the 1st phase finishes, we know the distribution of the sorting key values and thus can divide the key value range into regions - buckets. Every 2nd-phase thread takes values from the corresponding region buckets (covering the same value region) from every 1st-phase sorted run. Then all 2nd-phase threads sort their runs in parallel. In the end we put the sorted regions, in ascending order of the key values, into the output stream.
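The proposed 2nd-phase range partitioning can be sketched as follows: each presorted run is cut at common boundary values taken from the observed key distribution, and each bucket is then merged independently (in a real implementation, by a separate thread). Names are hypothetical:

```python
import bisect
import heapq

def merge_by_ranges(runs, boundaries):
    """runs: presorted lists from phase 1; boundaries: split points
    chosen from the observed key distribution. Each bucket collects the
    matching slice of every run and can be merged on its own thread;
    concatenating the merged buckets yields the full sorted output."""
    edges = list(boundaries)
    buckets = [[] for _ in range(len(edges) + 1)]
    for run in runs:
        prev = 0
        for b, edge in enumerate(edges):
            cut = bisect.bisect_left(run, edge, prev)   # runs are sorted
            buckets[b].append(run[prev:cut])
            prev = cut
        buckets[-1].append(run[prev:])                  # tail region
    # Merge each bucket of presorted slices independently; buckets are
    # disjoint key regions, so their outputs concatenate in order.
    return [x for parts in buckets for x in heapq.merge(*parts)]
```

Because the buckets cover disjoint key regions, no coordination between them is needed after partitioning, which is what makes the 2nd phase parallelizable.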

Details
Mentor:

Details

MCOL-3758

Mentor:

Roman

Add a bitmap for NULL/empty data

We need a bitmap to store NULL/empty values instead of in-column values for this.

Details
Mentor:

Details

MCOL-3754

Mentor:

Anderw Hutchings

Engine independent statistics for Columnstore

CS now has very rudimentary query optimization capabilities, and we want to improve the situation. We are considering using the Server's optimizer for this purpose, but the Server needs statistics, namely value distribution histograms and Number of Distinct Values (NDV) histograms. There are different levels of complexity for the task:

  • implement standalone segment files reader that in the end populates both mysql.column_stats and mysql.table_stats using out of band mariadb client connection

  • implement ANALYZE TABLE functionality for Columnstore engine

  • implement ANALYZE TABLE and histograms with equal-width bins for value distribution histograms (similar to MDEV-12313), together with NDV histograms, to decrease I/O

We expect to have both unit and regression tests, but this is optional.

Details
Mentor:

Details

MCOL-2223

Mentor:

Roman

mcsapi needs a new read API design

We need an ORM-style NoSQL read API to go along with the bulk write API of mcsapi. This will likely take the form of:

  1. A reader in ExeMgr which will convert messages from mcsapi into jobs

  2. Code in mcsapi to send/receive the messages

Although ExeMgr can already receive messages with an execution plan, the format is very complex and the ABI breaks easily (we often send whole C++ objects). We should look at other ORM frameworks for inspiration for the API design. This task is to do the design for this API.

Details:
Mentor:

Details:

MCOL-1151

Mentor:

Andrew Hutchings

Use JIT for aggregation functions

CS uses the Volcano processing approach, working on one value at a time. This is a very inefficient way to handle analytics workloads, which usually use lots of aggregation functions in projections, filtering or sorting. We are interested in using JIT for basic aggregation functions: sum, avg, count, min, max. The patch must compile and run a program that processes and returns the aggregation function result. We wrote this description with LLVM in mind, as it is widespread and has lots of examples in the wild. I suggest starting with RowAggregation::addRowGroup() from ./utils/rowgroup/rowaggregation.cpp to see what it takes to compute the avg() function result. Here is the link on how to quickly build a CS developer environment.

Details:
Mentor:

Details:

MCOL-2222

Mentor:

Roman


Histograms: use JSON as on-disk format

Currently, histograms are stored as an array of 1-byte bucket bounds (SINGLE_PREC_HB) or 2-byte bucket bounds (DOUBLE_PREC_HB). The table storing the histograms supports different histogram formats, but limits them to 255 bytes (hist_size is a TINYINT):

CREATE TABLE mysql.column_stats (
  min_value varbinary(255) DEFAULT NULL, 
  max_value varbinary(255) DEFAULT NULL, 
  ...
  hist_size TINYINT UNSIGNED, 
  hist_type ENUM('SINGLE_PREC_HB','DOUBLE_PREC_HB'), 
  histogram varbinary(255), 
  ...

This prevents us from supporting other kinds of histograms. The first low-hanging fruit would be to store the histogram bucket bounds precisely (like MySQL and PostgreSQL do, for example). The idea of this MDEV is to switch to JSON as the storage format for histograms.
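A sketch of what JSON storage could look like: precise bucket bounds serialized to text, plus a simple range-selectivity estimate over them. The field names are hypothetical, and the estimate assumes equal-height buckets (each bucket holds the same fraction of rows):

```python
import json

def histogram_to_json(bounds):
    """Store precise bucket bounds instead of 1- or 2-byte fractions;
    JSON text has no fixed 255-byte ceiling."""
    return json.dumps({'histogram_type': 'JSON_HB',   # hypothetical tag
                       'bucket_bounds': bounds})

def selectivity(hist_json, lo, hi):
    """Estimated fraction of rows with value in [lo, hi): count the
    buckets the interval overlaps, each worth 1/nbuckets of the rows."""
    bounds = json.loads(hist_json)['bucket_bounds']
    nbuckets = len(bounds) - 1
    hit = sum(1 for i in range(nbuckets)
              if bounds[i] < hi and bounds[i + 1] > lo)
    return hit / nbuckets
```

With bounds [0, 10, 20, 30, 40], a predicate covering [0, 10) overlaps one of the four buckets, giving an estimated selectivity of 0.25; a JSON format leaves room to later store per-bucket frequencies for a finer estimate.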

Details:
Mentor:

Details:

MDEV-21130

Mentor:

Sergei Petrunia

Suggest a Task

Do you have an idea of your own, not listed above? Do let us know!

This page is licensed: CC BY-SA / Gnu FDL

Google Summer of Code 2019

We participated in the Google Summer of Code 2019. The MariaDB Foundation believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently C, ODBC, Java) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. And we have MariaDB ColumnStore, which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.

Where to Start

Please join us on Zulip and on IRC to mingle with the community. Don't forget to subscribe to maria-developers@lists.launchpad.net (this is the main list where we discuss development).

A few handy tips for any interested students who are unsure which projects to choose: Blog post from a former GSoC student & mentor

To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.

List of beginner friendly bugs

List of Tasks

Loaded from the MariaDB issue tracker

MariaDB Server: Optimizer

Evaluate subquery predicates earlier or later depending on their SELECTIVITY

(Based on a conversation with Igor) There are a lot of subquery conditions out there that are inexpensive to evaluate and have good selectivity. If we just implement MDEV-83, we may get regressions, so we need to take the subquery condition's selectivity into account. It is difficult to get a meaningful estimate for an arbitrary, correlated subquery predicate. One possible solution is to measure selectivity during execution and reattach predicates on the fly. We don't want to change the query plan all the time; one way to dynamically move items between item trees is to wrap them inside Item_func_trig_cond so we can switch them on and off.
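The wrapping idea can be sketched as follows. This is a rough Python sketch; TrigCond and its fields are illustrative stand-ins, not the server's actual classes.

```python
# Sketch of the Item_func_trig_cond idea: a wrapper that lets the
# executor switch a predicate on and off without restructuring the
# item tree.
class TrigCond:
    def __init__(self, pred):
        self.pred = pred      # the wrapped subquery predicate
        self.enabled = True

    def __call__(self, row):
        # When switched off, the condition behaves as always-true, so
        # the predicate is effectively detached at this point.
        return self.pred(row) if self.enabled else True

rows = list(range(10))
cond = TrigCond(lambda r: r % 2 == 0)

# Measure selectivity during execution:
passed = sum(1 for r in rows if cond.pred(r))
selectivity = passed / len(rows)      # 0.5: the predicate filters half

# If the predicate turned out expensive or unselective, the executor
# could switch it off here and re-evaluate it at a later plan node:
cond.enabled = False
all_pass = all(cond(r) for r in rows)  # disabled wrapper passes everything
```
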

Details: MDEV-383
Mentor: Igor Babaev

Add support for Indexes on Expressions

An index on an expression means something like

CREATE TABLE t1 (a INT, b INT, INDEX (a/2+b));
...
SELECT * FROM t1 WHERE a/2+b=100;

in this case the optimizer should be able to use the index. This task naturally splits into two steps:

  1. add expression matching into the optimizer and use it for generated columns, as in CREATE TABLE t1 (a INT, b INT, v INT GENERATED ALWAYS AS (a/2+b), INDEX (v));

  2. support the syntax to create an index on an expression directly; this will automatically create a hidden generated column under the hood

The original task description is visible in the history.

Details: MDEV-6017
Mentor: Sergei Golubchik

Histograms with equal-width bins in MariaDB

Histograms with equal-width bins are easy to construct using samples. For this it's enough to look through the given sample set and, for each value, figure out what bin the value falls into. Each bin requires only one counter. Let f be a column of a table with N rows and n be the number of samples by which the equal-width histogram of k bins for this column is constructed. Suppose that after looking through all sample rows the counters created for the histogram bins contain the numbers c[1],..,c[k]. Then m[i] = c[i]/n * 100 is the percentage of the rows whose values of f are expected to be in the interval

((max(f)-min(f))/k * (i-1), (max(f)-min(f))/k * i).

It means that if the sample rows have been chosen randomly, the expected number of rows with values of f from this interval can be approximated by m[i]/100 * N. To collect such statistics it is suggested to use the following variant of the ANALYZE TABLE command:

ANALYZE FAST TABLE tbl [ WITH n ROWS ] [SAMPLING p PERCENTS ]
   PERSISTENT FOR COLUMNS (col1 [IN RANGE r] [WITH k INTERVALS],...)

Here:

  • 'WITH n ROWS' provides an estimate for the number of rows in the table in the case when this estimate cannot be obtained from statistical data.

  • 'SAMPLING p PERCENTS' provides the percentage of sample rows to collect statistics. If this is omitted the number is taken from the system variable samples_ratio.

  • 'IN RANGE r' sets the range of equal-width bins of the histogram built for the column col1. If this is omitted and the min and max values for the column can be read from the statistical data, then the histogram is built for the range [min(col1), max(col1)]. Otherwise the range [MIN_type(col1), MAX_type(col1)] is considered. The values beyond the given range, if any, are also taken into account in two additional bins.

  • 'WITH k INTERVALS' says how many bins are included in the histogram. If it is omitted, this value is taken from the system variable histogram_size.
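The counting scheme described above can be sketched as follows. This is a Python sketch with illustrative names; it follows the formulas above: c[i] sample counts, m[i] percentages, and m[i]/100 * N estimated rows.

```python
# Equal-width histogram construction from a sample of column f.
def equal_width_histogram(samples, lo, hi, k):
    counters = [0] * k
    width = (hi - lo) / k
    for v in samples:
        # clamp v == hi into the last bin
        i = min(int((v - lo) / width), k - 1)
        counters[i] += 1
    return counters

samples = [1, 2, 2, 3, 5, 6, 7, 8, 8, 9]   # n = 10 sampled values
c = equal_width_histogram(samples, lo=0, hi=10, k=5)

n, N = len(samples), 1000                  # N = estimated table row count
m = [100 * ci / n for ci in c]             # percentage per bin
estimated_rows = [mi / 100 * N for mi in m]
```
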

Details: MDEV-12313
Mentor: Vicentiu Ciorbaru

Add FULL OUTER JOIN to MariaDB

Add support for FULL OUTER JOIN. One way to implement it is to rewrite the query

SELECT t1.*, t2.* FROM t1 FULL OUTER JOIN t2 ON P(t1,t2)

into the following UNION ALL:

SELECT t1.*, t2.* FROM t1 LEFT OUTER JOIN t2 ON P(t1,t2) 
UNION ALL 
SELECT t1.*,t2.* FROM t2 LEFT OUTER JOIN t1 ON P(t1,t2) WHERE t1.a IS NULL

Here t1.a is some non-nullable column of t1 (e.g. the column of a single-column primary key).
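The rewrite's semantics can be checked with a small simulation. This is a Python sketch over hypothetical tables, not the server implementation: the left-join part plus an anti-join part for the t2 rows with no match in t1.

```python
# Simulate FULL OUTER JOIN as LEFT JOIN UNION ALL anti-joined LEFT JOIN.
t1 = [(1, 'a'), (2, 'b')]
t2 = [(2, 'x'), (3, 'y')]
match = lambda r1, r2: r1[0] == r2[0]        # the join predicate P(t1,t2)

# LEFT OUTER JOIN t1 -> t2:
left = [(r1, r2) for r1 in t1 for r2 in t2 if match(r1, r2)]
left += [(r1, None) for r1 in t1 if not any(match(r1, r2) for r2 in t2)]

# LEFT OUTER JOIN t2 -> t1 ... WHERE t1.a IS NULL (the anti-join part):
right_only = [(None, r2) for r2 in t2
              if not any(match(r1, r2) for r1 in t1)]

full_outer = left + right_only
```
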

Details: MDEV-13648
Mentor: Igor Babaev

Recursive CTE support for UPDATE (and DELETE) statements

CREATE TABLE tree (
  `Node` VARCHAR(3),
  `ParentNode` VARCHAR(3),
  `EmployeeID` INTEGER,
  `Depth` INTEGER,
  `Lineage` VARCHAR(16)
);

INSERT INTO tree
  (`Node`, `ParentNode`, `EmployeeID`, `Depth`, `Lineage`)
VALUES
  ('100', NULL, '1001', 0, '/'),
  ('101', '100', '1002', NULL, NULL),
  ('102', '101', '1003', NULL, NULL),
  ('103', '102', '1004', NULL, NULL),
  ('104', '102', '1005', NULL, NULL),
  ('105', '102', '1006', NULL, NULL);

WITH RECURSIVE prev AS (
SELECT * FROM tree WHERE ParentNode IS NULL
UNION
SELECT t.Node,t.ParentNode,t.EmployeeID,p.Depth + 1 AS Depth, CONCAT(p.Lineage, t.ParentNode, '/')
FROM tree t JOIN prev p ON t.ParentNode = p.Node
)
SELECT * FROM prev;

WITH RECURSIVE prev AS (
SELECT * FROM tree WHERE ParentNode IS NULL
UNION
SELECT t.Node,t.ParentNode,t.EmployeeID,p.Depth + 1 AS Depth, CONCAT(p.Lineage, t.ParentNode, '/')
FROM prev p JOIN tree t ON t.ParentNode = p.Node
)
UPDATE tree t, prev p SET t.Depth=p.Depth, t.Lineage=p.Lineage WHERE t.Node=p.Node;

The UPDATE statement above currently fails with:

You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'UPDATE tree t, prev p SET t.Depth=p.Depth, t.Lineage=p.Lineage WHERE t.Node=p.No' at line 7

This is supported in MySQL 8.0 and MSSQL.

Details: MDEV-18511
Mentor: Igor Babaev

Implement EXCEPT ALL and INTERSECT ALL operations

The SQL Standard allows the use of EXCEPT ALL and INTERSECT ALL as set operations, but MariaDB Server currently does not support them. The goal of this task is to support EXCEPT ALL and INTERSECT ALL

  1. at the syntax level: allow the operators EXCEPT ALL and INTERSECT ALL in a query expression body

  2. at the execution level: implement these operations employing temporary tables (the implementation could use an idea similar to that used for the existing implementation of the INTERSECT operation).
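The counter-based execution idea can be sketched like this. It is a Python sketch, not the server implementation: per-row counters stand in for a temporary table with a counter column.

```python
# Bag (multiset) semantics of INTERSECT ALL / EXCEPT ALL via counters.
from collections import Counter

def intersect_all(a, b):
    ca, cb = Counter(a), Counter(b)
    # a row appears min(count in a, count in b) times
    return [row for row in ca for _ in range(min(ca[row], cb[row]))]

def except_all(a, b):
    ca, cb = Counter(a), Counter(b)
    # a row appears max(count in a - count in b, 0) times
    return [row for row in ca for _ in range(max(ca[row] - cb[row], 0))]

a = [1, 1, 2, 3]
b = [1, 2, 2]
inter = intersect_all(a, b)   # one 1, one 2
diff = except_all(a, b)       # one 1, one 3
```
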

Details: MDEV-18844
Mentor: Igor Babaev

MariaDB Server: others

Implement UPDATE with result set

Add an UPDATE operation that returns a result set of the changed rows to the client.

UPDATE [LOW_PRIORITY] [IGNORE] tbl_name
    SET col_name1={expr1|DEFAULT} [, col_name2={expr2|DEFAULT}] ...
    [WHERE where_condition]
    [ORDER BY ...]
    [LIMIT row_count]
RETURNING select_expr [, select_expr ...]

I'm not exactly sure what the corresponding multiple-table syntax should look like, or if it is possible at all. But having it for single-table updates alone would already be a nice feature.

Details: MDEV-5092
Mentor: Igor Babaev

Automatic provisioning of slave

Idea

The purpose of this task is to create an easy-to-use facility for setting up a new MariaDB replication slave.

Setting up a new slave currently involves: 1) installing MariaDB with the initial database; 2) pointing the slave to the master with CHANGE MASTER TO; 3) copying the initial data from the master to the slave; and 4) starting the slave with START SLAVE. The idea is to automate step (3), which currently needs to be done manually.

The syntax could be something as simple as

LOAD DATA FROM MASTER

This would then connect to the master that is currently configured. It will load a snapshot of all the data on the master, and leave the slave position at the point of the snapshot, ready for START SLAVE to continue replication from that point.

Implementation:

The idea is to do this non-blocking on the master, in a way that works for any storage engine. It will rely on row-based replication being used between the master and the slave.

At the start of LOAD DATA FROM MASTER, the slave will enter a special provisioning mode. It will start replicating events from the master at the master's current position.

The master dump thread will send binlog events to the slave as normal. But in addition, it will interleave a dump of all the data on the master contained in tables, views, or stored functions. Whenever the dump thread would normally go to sleep waiting for more data to arrive in the binlog, it will instead send another chunk of data in the binlog stream for the slave to apply.

A "chunk of data" can be:

  • A CREATE OR REPLACE TABLE / VIEW / PROCEDURE / FUNCTION

  • A range of N rows (N=100, for example). Each successive chunk will do a range scan on the primary key from the end position of the last chunk.

Sending data in small chunks avoids the need for long-lived table locks or transactions that could adversely affect master performance.

The slave will connect in GTID mode. The master will send dumped chunks in a separate domain id, allowing the slave to process chunks in parallel with normal data.

During the provisioning, all normal replication events from the master will arrive on the slave, and the slave will attempt to apply them locally. Some of these events will fail to apply, since the affected table or row may not yet have been loaded. In the provisioning mode, all such errors will be silently ignored. Proper locking (isolation mode, e.g.) must be used on the master when fetching chunks, to ensure that updates for any row will always be applied correctly on the slave, either in a chunk, or in a later row event.

In order to make the first version of this feature feasible to implement in a reasonable amount of time, it should set a number of reasonable restrictions (which could be relaxed in a later version of the feature):

  • Give up with an error if the slave is not configured for GTID mode (MASTER_USE_GTID != NO).

  • Give up with error if the slave receives any event in statement-based binlogging (so the master must be running in row-based replication mode, and no DDL must be done while the provisioning is running).

  • Give up with an error if the master has a table without primary key.

  • Secondary indexes will be enabled during the provisioning; this means that tables with large secondary indexes could be expensive to provision.
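The chunked range-scan dump described above can be sketched as follows. This is a Python sketch; next_chunk is a hypothetical helper, and the in-memory table stands in for a primary-key-ordered scan.

```python
# Each chunk range-scans the primary key starting just after the last
# key sent, so no long-lived locks or transactions are needed.
def next_chunk(table, last_pk, n=100):
    """table: list of (pk, row) sorted by pk; returns (chunk, new_last_pk)."""
    chunk = [(pk, row) for pk, row in table
             if last_pk is None or pk > last_pk][:n]
    return chunk, (chunk[-1][0] if chunk else last_pk)

table = [(i, f"row{i}") for i in range(250)]
last_pk, chunks = None, []
while True:
    chunk, last_pk = next_chunk(table, last_pk, n=100)
    if not chunk:
        break
    chunks.append(chunk)   # interleaved into the binlog stream
# 250 rows are sent as chunks of 100, 100, 50
```
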

Details: MDEV-7502
Mentor: Andrei Elkin

connection encryption plugin support

As a follow-on to MDEV-4691 we would like GSSAPI encryption (in addition to authentication) support in MariaDB. I am told that the current plan is to create a plugin interface and then we can build GSSAPI encryption on top of that, so here is a ticket for that.

From having written GSSAPI for the internal interface, there were a couple things I would like to see in the plugin encryption interface.

First, GSSAPI is weird in that it does authentication before encryption (TLS/SSL are the other way around, establishing an encrypted channel and then doing authentication over it). Of course support for this is needed, but more importantly, packets must be processed in a fully serialized fashion. This is because encrypted packets may be queued while one end of the connection is still finishing up processing the authentication handshake. One way to do this is registering "handle" callbacks with connection-specific state, but there are definitely others.

Additionally, for whatever conception there ends up being of authentication and encryption, it needs to be possible to share more data than just a socket between them. The same context will be used for authentication and encryption, much as an SSL context is (except of course we go from authentication to encryption and not the other way around).

This ties into an issue of dependency. If authentication plugins are separate entities from encryption plugins in the final architecture, it might make sense to do mix-and-match authentication with encryption. However, there are cases - and GSSAPI is one - where doing encryption requires a certain kind of authentication (or vice versa). You can't do GSSAPI encryption without first doing GSSAPI authentication. (Whether or not it's permitted to do GSSAPI auth->encryption all over a TLS channel, for instance, is not something I'm concerned about.)

Finally, encrypted messages are larger than their non-encrypted counterparts. The transport layer should cope with this so that plugins don't have to think about reassembly, keeping in mind that there may not be a way to get the size of a message when encrypted without first encrypting it.

It's unfortunately been a little while since I wrote that code, but I think those were the main things that we'll need for GSSAPI. Thanks!

Details: MDEV-9090
Mentor:

Add RETURNING to INSERT

Please add a RETURNING option to INSERT. Example from PostgreSQL

postgres=# CREATE TABLE t1 (id SERIAL, name VARCHAR(100));
CREATE TABLE
postgres=# INSERT INTO t1(name) VALUES('test') RETURNING id;
 id 
----
  1
(1 row)

INSERT 0 1

This could make it easier to write statements which work with both MariaDB and PostgreSQL, and it might improve compatibility with Oracle RDBMS.

Details: MDEV-10014
Mentor: Oleksandr Byelkin

Aggregate Window Functions

With a few exceptions, most native aggregate functions are supported as window functions. In MDEV-7773, support for creating custom aggregate functions was added. This task proposes to extend that feature and allow custom aggregate functions to be used as window functions. An example of creating a custom aggregate function is given below:

CREATE AGGREGATE FUNCTION agg_sum(x INT) RETURNS DOUBLE
BEGIN
  DECLARE z DOUBLE DEFAULT 0;
  DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN z;
  LOOP
    FETCH GROUP NEXT ROW;
    SET z = z + x;
  END LOOP;
END|

This function can be used in the following query:

CREATE TABLE balances (id INT, amount INT);
INSERT INTO balances VALUES (1, 10), (2, 20), (3, 30);
 
SELECT agg_sum(amount) FROM balances;

After this task is complete the following must also work:

SELECT agg_sum(amount) OVER (ORDER BY id);
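
What that window invocation should compute can be illustrated with a small simulation. This is Python; agg_sum here is a stand-in for the stored aggregate, re-evaluated over the growing default frame of ORDER BY id.

```python
# Rows of the balances table from the example: (id, amount).
rows = [(1, 10), (2, 20), (3, 30)]

def agg_sum(values):
    # Mirrors the stored aggregate's FETCH GROUP NEXT ROW loop.
    z = 0
    for x in values:
        z += x
    return z

# agg_sum(amount) OVER (ORDER BY id): aggregate over rows up to and
# including the current one, i.e. a running sum.
running = [agg_sum([amt for _, amt in rows[:i + 1]])
           for i in range(len(rows))]
# running is the per-row window result: 10, 30, 60
```
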

Details: MDEV-11263
Mentor: Varun Gupta

True ALTER LOCK=NONE on slave

Currently no true LOCK=NONE exists on the slave: ALTER TABLE is first committed on the master, then replicated on the slaves. The purpose of this task is to create a true LOCK=NONE.

Implementation Idea

The master will write a BEGIN_DDL_EVENT to the binlog when it reaches ha_prepare_inplace_alter_table. Then the master will write a QUERY_EVENT to the binlog with the actual ALTER query. On commit/rollback the master will write a COMMIT_DDL_EVENT/ROLLBACK_DDL_EVENT. On the slave there will be a pool of threads (a configurable global variable) which will apply these DDLs. On receiving a BEGIN_DDL_EVENT, the slave SQL thread will pass the QUERY_EVENT to one of the worker threads. The worker thread will execute until ha_inplace_alter_table; the actual commit_inplace_alter will be called by the SQL thread. If the SQL thread receives some kind of rollback event, it will signal the worker thread to stop executing the ALTER. If none of the worker threads are available, the event will be enqueued; if a rollback event then arrives, the event is simply discarded from the queue, and if a commit event arrives, the SQL thread will process the DDL event synchronously.

Details: MDEV-11675
Mentor: Sachin Setiya

Improve mysqltest language

mysqltest has a lot of historical problems:

  • ad hoc parser, weird limitations

  • commands added as needed with no view over the total language structure

  • historical code issues (e.g. casts that became unnecessary 10 years ago), etc.

A lot can be done to improve it. Ideas

  • control structures: else in if, break and continue in while, a for (or foreach) loop

  • proper expression support in let, if, etc

  • rich enough expressions to make resorting to sql unnecessary in most cases

  • remove unused and redundant commands (e.g. system vs exec, query_vertical vs vertical_results ONCE)

  • remove complex commands that do many sql statements under the hood, if they can be scripted, e.g. sync_slave_with_master

  • remove over-verbose treatment of rpl test failures

  • scoped variables

  • parameters for the source command

  • remove dead code

Details: MDEV-12130
Mentor: Sergei Golubchik

Implement multiple-table UPDATE/DELETE returning a result set

A multiple-table UPDATE first performs join operations, then it updates the matching rows. A multiple-table UPDATE returning a result set does the following:

  • first performs join operations

  • for each row of the result of the join it calculates some expressions over the columns of the join and forms from them a row of the returned result set

  • after this it updates the matching rows

A multiple-table DELETE first performs join operations, then it deletes the matching rows. A multiple-table DELETE returning a result set does the following:

  • first performs join operations

  • for each row of the result of the join it calculates some expressions over the columns of the join and forms from them a row of the returned result set

  • after this it deletes the matching rows
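The order of operations described above can be sketched with a small simulation. This is a Python sketch over hypothetical tables: join first, compute the returned expressions from the join result, and only then apply the updates.

```python
# Multi-table UPDATE ... RETURNING simulated over two toy tables.
t1 = {1: 10, 2: 20}            # pk -> value
t2 = {1: 5, 3: 7}

# Step 1: perform the join.
joined = [(k, t1[k], t2[k]) for k in sorted(t1) if k in t2]

# Step 2: compute the returned result set from the join rows.
returned = [(k, v1 + v2) for k, v1, v2 in joined]

# Step 3: update the matching rows afterwards.
for k, v1, v2 in joined:
    t1[k] = v1 + v2
```
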

Details: MDEV-12326
Mentor: Igor Babaev

sort out the compression library chaos

As MariaDB is getting more storage engines, and as they're getting more features, MariaDB can optionally use more and more compression libraries for various purposes. InnoDB, TokuDB, RocksDB — they can all use different sets of compression libraries. Compiling them all in would result in a lot of run-time/rpm/deb dependencies, most of which would never be used by most users. Not compiling them in would result in requests to compile them in. While most users don't use all of these libraries, many users use some of them. A solution could be to load these libraries on request, without creating a packaging dependency. There are different ways to do it:

  • hide all compression libraries behind a single unified compression API. Either develop our own or use something like Squash. This would require changing all engines to use this API

  • use the same approach as in server services: create a service per compression library, where the service implementation just returns an error code for any function invocation if the corresponding library is not installed. This way, perhaps, we could avoid modifying all affected storage engines
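The "service per library" fallback could be sketched as below. This is a Python sketch; the class, registry, and error code are made up for illustration, with zlib standing in for one really-loaded library.

```python
import zlib

ER_COMPRESSION_NOT_LOADED = 1    # hypothetical error code

class CompressionService:
    """Stub service: returns an error code when the library is absent."""
    def __init__(self, impl=None):
        self.impl = impl             # None until the library is installed

    def compress(self, data):
        if self.impl is None:
            return ER_COMPRESSION_NOT_LOADED, None
        return 0, self.impl(data)

services = {
    "zlib": CompressionService(zlib.compress),  # library present
    "lz4": CompressionService(),                # not installed: stub only
}

# An engine calling a missing library gets an error code, not a crash:
err, out = services["lz4"].compress(b"data")
# A present library works normally:
err2, out2 = services["zlib"].compress(b"data")
```
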

Details: MDEV-12933
Mentor: Sergei Golubchik

Control over memory allocated for SP/PS

SP/PS (Stored Procedures / Prepared Statements) allocate memory until the SP cache / the PS is destroyed. There is no way to see how much memory is allocated, or whether it grows with each execution (the first two executions can lead to new memory allocation, but later ones should not).

Task minimum:

Status variables which count the memory used/allocated for SP/PS by thread and/or for the server.

Other ideas:

  • Automatically stop allocation in the debug version after the second execution, and raise an exception on any further allocation attempt.

  • An Information Schema table listing threads and SP/PS with information about allocated and used memory

Information can be collected in the MEM_ROOTs of the SP/PS: store info about the status of the mem_root before execution, then check it afterwards so that newly allocated memory can be found. MEM_ROOT can be changed to have a debug mode which makes it read-only, to be switched on after the second execution.

Details: MDEV-14959
Mentor: Oleksandr Byelkin

MariaDB ColumnStore

Full DECIMAL support in ColumnStore

MariaDB ColumnStore supports DECIMAL with some limitations:

  1. We do not support the full DECIMAL range that is in MariaDB

  2. In several places in the code we convert the DECIMAL to DOUBLE during execution, therefore losing precision

Implementing this will likely require the following:

  • Implementation of methods to handle MariaDB's DECIMAL format

  • Support for a longer than 8-byte numeric column type (there is an InfiniDB tree with work for this already)

  • Modification of the primitives processor for the math

  • Modification of the function expression processor to handle the new type

  • Version upgrade support for DECIMAL from the current form to the new form
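The precision loss from the DECIMAL-to-DOUBLE conversion is easy to demonstrate. The Python sketch below uses Python's Decimal type as a stand-in for a wide DECIMAL value.

```python
from decimal import Decimal

# A double carries only ~15-16 significant decimal digits, so converting
# a wide DECIMAL to DOUBLE, as the current code path does, is lossy.
exact = Decimal("12345678901234567890.12")   # a wide DECIMAL value
as_double = float(exact)                     # what the conversion does
round_trip = Decimal(repr(as_double))        # no longer the same value

# Decimal arithmetic is exact where binary doubles are not:
decimal_sum = Decimal("0.1") + Decimal("0.2")
double_sum = 0.1 + 0.2
```
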

Details: MCOL-641
Mentor: Andrew Hutchings

mcsapi needs a new read API Design

We need an ORM-style NoSQL read API to go along with the bulk write API of mcsapi. This will likely take the form of:

  1. A reader in ExeMgr which will convert messages from mcsapi into jobs

  2. Code in mcsapi to send/receive the messages

Although ExeMgr can already receive messages with an execution plan, the format is very complex and the ABI breaks easily (we often send whole C++ objects). We should look at other ORM frameworks for inspiration for the API design. This task is to do the design for this API.

Details: MCOL-1151
Mentor: Andrew Hutchings

Use JIT for aggregation functions

ColumnStore uses a Volcano processing approach, working on one value at a time. This is a very inefficient way to serve analytics workloads, which usually use lots of aggregation functions in projections, filtering, or sorting. We are interested in using JIT compilation for the basic aggregation functions: sum, avg, count, min, max. The patch must compile and run a program that processes and returns the aggregation function result. We wrote this description with LLVM in mind, as it is widespread and has lots of examples in the wild. I suggest starting by looking at RowAggregation::addRowGroup() in ./utils/rowgroup/rowaggregation.cpp to see what it takes to get the avg() function result. Here is the link on how to quickly build a ColumnStore developer environment.

Details: MCOL-2222
Mentor: Roman

Engine independent statistics for Columnstore

ColumnStore currently has very rudimentary query optimization capabilities, and we want to improve the situation. We are considering using the Server's optimizer for this purpose, but the Server needs statistics, namely values distribution histograms and number-of-distinct-values (NDV) histograms. There are different levels of complexity for the task:

  • implement a standalone segment files reader that in the end populates both mysql.column_stats and mysql.table_stats using an out-of-band MariaDB client connection

  • implement ANALYZE TABLE functionality for the ColumnStore engine

  • implement ANALYZE TABLE and histograms with equal-width bins for values distribution histograms (similar to MDEV-12313), together with NDV histograms, to decrease I/O

We expect both unit and regression tests, but this is optional.

Details: MCOL-2223
Mentor: Roman

Suggest a Task

Do you have an idea of your own, not listed above? Do let us know!

This page is licensed: CC BY-SA / Gnu FDL

Google Summer of Code 2018

We participated in the Google Summer of Code 2018. The MariaDB Foundation believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently C, ODBC, Java) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. And we have MariaDB ColumnStore, which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.

Where to start

Please join us at irc.freenode.net at #maria to mingle with the community. Don't forget to subscribe to maria-developers@lists.launchpad.net (this is the main list where we discuss development).

A few handy tips for any interested students who are unsure which projects to choose: Blog post from a former GSoC student & mentor

To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.

List of beginner friendly bugs

List of tasks

Loaded from the MariaDB issue tracker

Full DECIMAL support in ColumnStore

MariaDB ColumnStore supports DECIMAL with some limitations:

  1. We do not support the full DECIMAL range that is in MariaDB

  2. In several places in the code we convert the DECIMAL to DOUBLE during execution therefore losing precision

Implementing this will likely require the following:

  • Implementation of methods to handle MariaDB's DECIMAL format

  • Support for a longer than 8-byte numeric column type (there is an InfiniDB tree with work for this already)

  • Modification of the primitives processor for the math

  • Modification of the function expression processor to handle the new type

  • Version upgrade support for DECIMAL from the current form to the new form

Details: MCOL-641
Mentor: Andrew Hutchings

mcsapi needs a new read API Design

We need an ORM-style NoSQL read API to go along with the bulk write API of mcsapi.

This will likely take the form of:

  1. A reader in ExeMgr which will convert messages from mcsapi into jobs

  2. Code in mcsapi to send/receive the messages

Although ExeMgr can already receive messages with an execution plan the format is very complex and ABI breaks easily (we often send whole C++ objects).

We should look at other ORM frameworks for inspiration as the API design.

This task is to do the design for this API.

Details: MCOL-1151
Mentor: Andrew Hutchings

Support for GTID in mysqlbinlog

The mysqlbinlog client program needs to be updated to support GTID.

Here is a suggested list of things to be done:

  • The --start-position and --stop-position options should be able to take GTID positions; or maybe there should be new --start-gtid and --stop-gtid options, like --start-gtid=0-1-100,1-2-200,2-1-1000.

  • A GTID position means the point just after that GTID. So starting from GTID 0-1-100 and stopping at GTID 0-1-200, the first GTID output will probably be 0-1-101 and the last one 0-1-200. Note that if some domain is not specified in the position, it means to start from the beginning, respectively to stop immediately, in that domain.

  • Starting and stopping GTID should work both with local files and with --read-from-remote-server. For the latter, there are a couple of extra things that need doing in the master-slave protocol; see get_master_version_and_clock() in sql/slave.cc.

  • At the end of the dump, put these statements, to reduce the risk of those session variables incorrectly spilling into subsequent statements run in the same session:

SET session.server_id = @@global.server_id,
      session.gtid_domain_id=@@global.gtid_domain_id;

Probably some more things will come up during the work, but this looks like a reasonable start.
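Parsing such a GTID position list is straightforward. The Python sketch below uses a hypothetical helper, parse_gtid_pos, to split "domain-server-sequence" triples into a per-domain map; domains not listed start from the beginning (respectively stop immediately).

```python
# Parse a GTID position list like "0-1-100,1-2-200,2-1-1000" into
# {domain_id: (server_id, seq_no)}.
def parse_gtid_pos(s):
    pos = {}
    for gtid in s.split(","):
        domain, server, seq = (int(p) for p in gtid.split("-"))
        pos[domain] = (server, seq)
    return pos

start = parse_gtid_pos("0-1-100,1-2-200,2-1-1000")
# domain 0 starts just after 0-1-100, domain 1 after 1-2-200, etc.
```
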

Details: MDEV-4989
Mentor: Andrei Elkin

Implement UPDATE with result set

Add an UPDATE operation that returns a result set of the changed rows to the client.

UPDATE [LOW_PRIORITY] [IGNORE] tbl_name
    SET col_name1={expr1|DEFAULT} [, col_name2={expr2|DEFAULT}] ...
    [WHERE where_condition]
    [ORDER BY ...]
    [LIMIT row_count]
RETURNING select_expr [, select_expr ...]

I'm not exactly sure what the corresponding multiple-table syntax should look like, or if it is possible at all. But having it for single-table updates alone would already be a nice feature.

Details: MDEV-5092
Mentor: Igor Babaev

optimizer trace

In MySQL, the optimizer trace is a JSON object recording the execution path through the optimizer: the decisions that were made and the reasons for them. See the optimizer trace documentation in the MySQL manual.

Users have been asking for MariaDB to have a similar feature.

Details: MDEV-6111
Mentor: Varun Gupta

Automatic provisioning of slave

Idea

The purpose of this task is to create an easy-to-use facility for setting up a new MariaDB replication slave.

Setting up a new slave currently involves: 1) installing MariaDB with the initial database; 2) pointing the slave to the master with CHANGE MASTER TO; 3) copying the initial data from the master to the slave; and 4) starting the slave with START SLAVE. The idea is to automate step (3), which currently needs to be done manually.

The syntax could be something as simple as

LOAD DATA FROM MASTER

This would then connect to the master that is currently configured. It will load a snapshot of all the data on the master, and leave the slave position at the point of the snapshot, ready for START SLAVE to continue replication from that point.

Implementation:

The idea is to do this non-blocking on the master, in a way that works for any storage engine. It will rely on row-based replication to be used between the master and the slave.

At the start of LOAD DATA FROM MASTER, the slave will enter a special provisioning mode. It will start replicating events from the master at the master's current position.

The master dump thread will send binlog events to the slave as normal. But in addition, it will interleave a dump of all the data on the master contained in tables, views, or stored functions. Whenever the dump thread would normally go to sleep waiting for more data to arrive in the binlog, the dump thread will instead send another chunk of data in the binlog stream for the slave to apply.

A "chunk of data" can be:

  • A CREATE OR REPLACE TABLE / VIEW / PROCEDURE / FUNCTION

  • A range of N rows (N=100, for example). Each successive chunk will do a range scan on the primary key from the end position of the last chunk.

Sending data in small chunks avoids the need for long-lived table locks or transactions that could adversely affect master performance.

The slave will connect in GTID mode. The master will send dumped chunks in a separate domain id, allowing the slave to process chunks in parallel with normal data.

During the provisioning, all normal replication events from the master will arrive on the slave, and the slave will attempt to apply them locally. Some of these events will fail to apply, since the affected table or row may not yet have been loaded. In the provisioning mode, all such errors will be silently ignored. Proper locking (isolation mode, eg.) must be used on the master when fetching chunks, to ensure that updates for any row will always be applied correctly on the slave, either in a chunk, or in a later row event.

In order to make the first version of this feature feasible to implement in a reasonable amount of time, it should set a number of reasonable restrictions (which could be relaxed in a later version of the feature):

  • Give up with an error if the slave is not configured for GTID mode (MASTER_USE_GTID != NO).

  • Give up with error if the slave receives any event in statement-based binlogging (so the master must be running in row-based replication mode, and no DDL must be done while the provisioning is running).

  • Give up with an error if the master has a table without primary key.

  • Secondary indexes will be enabled during the provisioning; this means that tables with large secondary indexes could be expensive to provision.

Details: MDEV-7502
Mentor: Andrei Elkin

connection encryption plugin support

As a follow-on to MDEV-4691 we would like GSSAPI encryption (in addition to authentication) support in MariaDB. I am told that the current plan is to create a plugin interface and then we can build GSSAPI encryption on top of that, so here is a ticket for that.

From having written GSSAPI for the internal interface, there were a couple things I would like to see in the plugin encryption interface.

First, GSSAPI is weird in that it does authentication before encryption (TLS/SSL are the other way around, establishing an encrypted channel and then doing authentication over it). Of course support for this is needed, but more importantly, packets must be processed in a fully serialized fashion. This is because encrypted packets may be queued while one end of the connection is still finishing up processing the authentication handshake. One way to do this is registering "handle" callbacks with connection-specific state, but there are definitely others.

Additionally, for whatever conception there ends up being of authentication and encryption, it needs to be possible to share more data than just a socket between them. The same context will be used for authentication and encryption, much as an SSL context is (except of course we go from authentication to encryption and not the other way around).

This ties into an issue of dependency. If authentication plugins are separate entities from encryption plugins in the final architecture, it might make sense to do mix-and-match authentication with encryption. However, there are cases - and GSSAPI is one - where doing encryption requires a certain kind of authentication (or vice versa). You can't do GSSAPI encryption without first doing GSSAPI authentication. (Whether or not it's permitted to do GSSAPI auth->encryption all over a TLS channel, for instance, is not something I'm concerned about.)

Finally, encrypted messages are larger than their non-encrypted counterparts. The transport layer should cope with this so that plugins don't have to think about reassembly, keeping in mind that there may not be a way to get the size of a message when encrypted without first encrypting it.

It's unfortunately been a little while since I wrote that code, but I think those were the main things that we'll need for GSSAPI. Thanks!

Details:
Mentor:

Details:

MDEV-9090

Mentor:

Vladislav Vaintroub

Aggregate Window Functions

Currently only a few aggregate functions are supported as window functions; the list can be found at aggregate-functions-as-window-functions.

In MDEV-7773, support for creating custom aggregate functions was added. This task would extend that feature so that custom aggregate functions can also be used as window functions.

An example of creating a custom aggregate function is given below:

CREATE AGGREGATE FUNCTION agg_sum(x INT) RETURNS DOUBLE
BEGIN
  DECLARE z DOUBLE DEFAULT 0;
  DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN z;
  LOOP
    FETCH GROUP NEXT ROW;
    SET z = z + x;
  END LOOP;
END|
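
Once this task is complete, such a custom aggregate should be usable with an OVER clause like any built-in window function. A minimal sketch of the intended usage (the table t1 and its columns are made up for illustration; this syntax is the goal of the task, not an existing feature):

```sql
-- Hypothetical usage once custom aggregates work as window functions:
-- agg_sum computes a running total within each partition.
SELECT k, x,
       agg_sum(x) OVER (PARTITION BY k ORDER BY id) AS running_total
FROM t1;
```
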
Details: MDEV-11263
Mentor: Varun Gupta

True ALTER LOCK=NONE on slave

Currently no true LOCK=NONE exists on the slave: ALTER TABLE is first committed on the master and only then replicated to the slaves. The purpose of this task is to create a true LOCK=NONE.

Implementation Idea

The master will write a BEGIN_DDL_EVENT in the binlog when it hits ha_prepare_inplace_alter_table. Then the master will write a QUERY_EVENT in the binlog with the actual ALTER query. On commit/rollback the master will write a COMMIT_DDL_EVENT/ROLLBACK_DDL_EVENT.

On the slave there will be a pool of threads (a configurable global variable) which will apply these DDLs. On receiving a BEGIN_DDL_EVENT, the slave thread will pass the QUERY_EVENT to one of the worker threads. The worker thread will execute up to ha_inplace_alter_table. The actual commit_inplace_alter will be called by the SQL thread. If the SQL thread receives some kind of rollback event, it will signal the worker thread to stop executing the ALTER. If none of the worker threads is available, the event will be enqueued; if we then receive a rollback event, we simply discard the event from the queue, and if we receive a commit event, the SQL thread will process the DDL event synchronously.

Details: MDEV-11675
Mentor: Sachin Setiya

Improve mysqltest language

mysqltest has a lot of historical problems:

  • ad hoc parser, weird limitations

  • commands added as needed with no view over the total language structure

  • historical code issues (e.g. casts that became unnecessary 10 years ago), etc.

A lot can be done to improve it.

Ideas

  • control structures, else in if, break and continue in while, for (or foreach) loop

  • proper expression support in let, if, etc

  • rich enough expressions to make resorting to sql unnecessary in most cases

  • remove unused and redundant commands (e.g. system vs exec, query_vertical vs vertical_results ONCE)

  • remove complex commands that do many sql statements under the hood, if they can be scripted, e.g. sync_slave_with_master

  • remove over-verbose treatment of rpl test failures

  • scoped variables

  • parameters for the source command

  • remove dead code

Details: MDEV-12130
Mentor: Sergei Golubchik

Cassandra Storage Engine v2, based on DataStax C++ driver

MariaDB 10.1 had the Cassandra Storage Engine, which was developed for Cassandra 1.1.x. Back then, Cassandra provided a Thrift API, and that was what Cassandra-SE used.

Then, Cassandra 2.0 switched to using a different network protocol (and also changed the data model).

This task is to develop a Cassandra Storage Engine V2 using DataStax's C++ client library (cpp-driver).

See also: MDEV-8947 was a previous attempt to implement this engine. Unfortunately it didn't even produce a skeleton engine.

Details: MDEV-12296
Mentor: Sergei Golubchik

Histograms with equal-width bins in MariaDB

Histograms with equal-width bins are easy to construct using samples. For this it is enough to look through the given sample set and, for each value from it, figure out what bin this value can be placed in. Each bin requires only one counter. Let f be a column of a table with N rows and n be the number of samples by which the equal-width histogram of k bins for this column is constructed. Let, after looking through all sample rows, the counters created for the histogram bins contain the numbers c[1],..,c[k]. Then m[i] = c[i]/n * 100 is the percentage of the rows whose values of f are expected to be in the interval

(min(f) + (max(f)-min(f))/k * (i-1), min(f) + (max(f)-min(f))/k * i).

It means that if the sample rows have been chosen randomly, the expected number of rows with values of f from this interval can be approximated by m[i]/100 * N.

To collect such statistics it is suggested to use the following variant of the ANALYZE TABLE command:

ANALYZE FAST TABLE tbl [ WITH n ROWS ] [SAMPLING p PERCENTS ]
   PERSISTENT FOR COLUMNS (col1 [IN RANGE r] [WITH k INTERVALS],...)

Here:

  • 'WITH n ROWS' provides an estimate for the number of rows in the table in the case when this estimate cannot be obtained from statistical data.

  • 'SAMPLING p PERCENTS' provides the percentage of sample rows to collect statistics. If this is omitted the number is taken from the system variable samples_ratio.

  • 'IN RANGE r' sets the range of the equal-width bins of the histogram built for the column col1. If this is omitted and the min and max values for the column can be read from the statistical data, then the histogram is built for the range [min(col1), max(col1)]. Otherwise the range [MIN_type(col1), MAX_type(col1)] is considered. The values beyond the given range, if any, are also taken into account in two additional bins.

  • WITH k INTERVALS says how many bins are included in the histogram. If it is omitted this value is taken from the system variable histogram_size.
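
For illustration, a command following the suggested grammar above might look like this (the table, column, and all numbers are made up; the syntax is the proposal of this task, not an implemented feature):

```sql
-- Build a 10-bin equal-width histogram for t1.col1 from a 5% sample,
-- assuming an estimated 1,000,000 rows and an explicit value range.
ANALYZE FAST TABLE t1 WITH 1000000 ROWS SAMPLING 5 PERCENTS
   PERSISTENT FOR COLUMNS (col1 IN RANGE (0, 1000) WITH 10 INTERVALS);
```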

Details: MDEV-12313
Mentor: Vicentiu Ciorbaru

Implement multiple-table UPDATE/DELETE returning a result set

A multiple-table UPDATE first performs join operations, then it updates the matching rows. A multiple-table UPDATE returning a result set does the following:

  • first performs join operations

  • for each row of the result of the join it calculates some expressions over the columns of the join and forms from them a row of the returned result set

  • after this it updates the matching rows

A multiple-table DELETE first performs join operations, then it deletes the matching rows. A multiple-table DELETE returning a result set does the following:

  • first performs join operations

  • for each row of the result of the join it calculates some expressions over the columns of the join and forms from them a row of the returned result set

  • after this it deletes the matching rows
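
As a sketch of what such a statement might look like (hypothetical syntax for the multi-table case; the tables and columns are made up for illustration):

```sql
-- Hypothetical multi-table DELETE returning a result set:
-- one row is returned per deleted match, computed over the join.
DELETE t1 FROM t1 JOIN t2 ON t1.id = t2.t1_id
WHERE t2.expired = 1
RETURNING t1.id, CONCAT(t1.name, '/', t2.name);
```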

Details: MDEV-12326
Mentor: Igor Babaev

Blacklist for access control a.k.a. "negative grants"

Currently, the MariaDB privilege system only performs whitelist checks for access control to a given database, table, or column. This makes it difficult to block access to a certain database/table/column while allowing access to all others. A good solution would be to allow REVOKE of anything that a user is able to do: not only exactly those grants that were granted to the user, but also a subset of them. Like this:

GRANT SELECT ON some_database.* TO 'a_user'@'%';
REVOKE SELECT ON some_database.secret_table FROM 'a_user'@'%';
Details: MDEV-14443
Mentor: Vicentiu Ciorbaru

Add syntax to manually encrypt/decrypt InnoDB's system tablespace

Currently, the InnoDB system tablespace can only be automatically encrypted/decrypted by the background encryption threads if innodb_encrypt_tables=ON|FORCE, innodb_encryption_threads>0, and innodb_encryption_rotate_key_age>0. There is no way to manually encrypt/decrypt the tablespace.

File-per-table tablespaces can be manually encrypted with:

ALTER TABLE tab ENCRYPTION=YES;

File-per-table tablespaces can be manually decrypted with:

ALTER TABLE tab ENCRYPTION=NO;

Some users want a similar method that would allow them to manually encrypt/decrypt the InnoDB system tablespace.
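
One conceivable shape for such a command, sketched here purely for illustration (this is not an implemented or agreed-upon design; the statement and tablespace name are hypothetical), would mirror the file-per-table commands:

```sql
-- Hypothetical: manually encrypt or decrypt the InnoDB system tablespace.
ALTER TABLESPACE innodb_system ENCRYPTION=YES;
ALTER TABLESPACE innodb_system ENCRYPTION=NO;
```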

This is loosely related to MDEV-14571, since both issues are related to the fact that the system tablespace can only be encrypted/decrypted by the background threads.

Details: MDEV-14610
Mentor: Jan Lindström

Control over memory allocated for SP/PS

SPs/PSs (Stored Procedures / Prepared Statements) allocate memory until the PS cache or the SP is destroyed. There is no way to see how much memory is allocated and whether it grows with each execution (the first two executions can lead to new memory allocation, but later ones should not).

Task minimum:

Status variables which count the memory used/allocated for SP/PS, per thread and/or for the whole server.

Other ideas:

  • Automatically stop allocation in the debugging version after the second execution and raise an exception on any further allocation attempt.

  • Information Schema tables, by thread and SP/PS, with information about allocated and used memory

The information can be collected in the MEM_ROOTs of the SP/PS: store info about the status of the mem_root before execution, then check after execution whether newly allocated memory can be found.

MEM_ROOT can be changed to have a debug mode which makes it read-only; this mode can be switched on after the second execution.

Details: MDEV-14959
Mentor: Oleksandr Byelkin

This page is licensed: CC BY-SA / Gnu FDL

Google Summer of Code 2017

We participated in the Google Summer of Code 2017 (we have participated previously in 2016, 2015, 2014, and 2013). The MariaDB Foundation believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently C, ODBC, Java) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. Lately, we also have MariaDB ColumnStore, which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.

Where to start

Please join us at irc.freenode.net at #maria to mingle with the community. Don't forget to subscribe to maria-developers@lists.launchpad.net (this is the main list where we discuss development).

A few handy tips for any interested students who are unsure which projects to choose: Blog post from former GSoC student & mentor

List of tasks

The complete list of tasks suggested for GSoC 2017 is located in the MariaDB Jira. A subset is listed below.

Support for GTID in mysqlbinlog

The mysqlbinlog tool needs to be updated to understand the replication feature called Global Transaction IDs (GTIDs) in MariaDB 10. The current version does not support GTIDs and the MySQL variant does not speak MariaDB 10's GTIDs.

Details: MDEV-4989
Skills: C/C++
Mentor:
Students Interested: 2

Automatic provisioning of slave

The purpose of this task is to create an easy-to-use facility for setting up a new MariaDB replication slave.

Details: MDEV-7502
Skills: C/C++
Mentor:
Students Interested: 3

GIS enhancements

GIS enhancements for 10.1 that we want to work on include adding support for altitude (the third coordinate), converters (e.g. ST_GeomFromGeoJSON - ST_AsGeoJSON, ST_GeomFromKML - ST_AsKML, etc.), getting data from the SHP format (a shp2sql converter), as well as making sure we are fully OpenGIS compliant.

Details: MDEV-5813
Skills: C/C++
Mentor: Holyfoot
Students Interested: 1

mysqltest improvements

mysqltest is a client utility that runs tests in the mysql-test framework. It sends sql statements to the server, compares the results with the expected results, and uses a special small DSL for loops, assignments, and so on. It's pretty old and very ad hoc with many strange limitations. It badly needs a proper parser and a consistent logical grammar.

Details: MDEV-12130
Skills: C/C++
Mentor: Sergei Golubchik
Students Interested: 3

Allow multiple alternative authentication methods for the same user

Currently one can specify only one authentication method per user. It would make a lot of sense to support multiple authentication methods per user. PAM-style. For example, one may want to authenticate using unix_socket when connecting locally, but ask for a password if connecting remotely or if unix_socket authentication failed.
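
For illustration, the desired behavior might be expressed with syntax along these lines (hypothetical at the time of this task; the account name and the OR combinator are illustrative):

```sql
-- Hypothetical: try unix_socket authentication first,
-- and fall back to a password if it is unavailable or fails.
CREATE USER 'alice'@'localhost'
  IDENTIFIED VIA unix_socket
       OR mysql_native_password USING PASSWORD('secret');
```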

Details: MDEV-11340
Skills: C/C++
Mentor: Sergei Golubchik
Students Interested: 4

connection encryption plugins

Encrypting the client-server communications is closely related to authentication. Normally SSL is used for the on-the-wire encryption, and SSL can be used to authenticate the client too. GSSAPI can be used for authentication, and it has support for on-the-wire encryption. This task is about making on-the-wire encryption pluggable.

Details: MDEV-9090
Skills: C/C++
Mentor: Sergei Golubchik
Students Interested: 2

OSS-Fuzz configuration for MariaDB

This would involve randomizing a bunch of queries (RQG based?), configurations and replication setups to search for segfaults, race conditions and perhaps invalid results.

Details: MDEV-12178
Skills: C, SQL
Mentor: Daniel Black
Students Interested: 1

NUMA for MariaDB

This would involve organising a bunch of memory and threads to run on the same NUMA node. Attention to detail is needed to ensure no additional race conditions get added in the process. A good understanding of systems programming would be useful. The ability to implement Windows NUMA support at the same time as Linux NUMA support would be advantageous.

Details: MDEV-12213
Skills: C, locking and threads; Windows system programming and InnoDB internals would be a plus.
Mentor: Daniel Black
Students Interested: 1

Cassandra Storage Engine V2

The current Cassandra Storage Engine was developed against Cassandra 1.1 and uses the Thrift API to communicate with Cassandra. However, starting from Cassandra 1.2, the preferred way to access a Cassandra database is to use CQL (Cassandra Query Language) and the DataStax C++ Driver (cpp-driver). Thrift-based access is deprecated and places heavy constraints on the schema.

This task is about re-implementing Cassandra Storage Engine using DataStax C++ Driver and CQL.

Details: MDEV-12296
Skills: C/C++
Mentor: Sergei Petrunia
Students Interested: 1

ColumnStore: Add proper NULL support

At the moment, NULL is just the maximum integer value for a column (or the empty string for VARCHAR/CHAR). We need a mechanism to store NULLs separately, to give us full type ranges.

ColumnStore: Add full support for DECIMAL

Right now DECIMAL is cast to double, which is not great for obvious reasons. Supporting it properly will mean modifying a lot of ColumnStore's versions of MariaDB's function implementations and allowing column files to store more than 8 bytes per field.

Details: MCOL-641
Skills: C/C++
Mentor: Andrew Hutchings
Students Interested: 1

ColumnStore: Full UTF8 support.

This includes collations and anything that works on the length of the string.

Suggest a task

Do you have an idea of your own, not listed above or in Jira? Do let us know!

This page is licensed: CC BY-SA / Gnu FDL

Google Summer of Code 2016

We participated in the Google Summer of Code 2016 (we have participated previously in 2015, 2014, and 2013). The MariaDB Foundation believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently in C, Java, C++ in development) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. Lately, we also have MariaDB MaxScale which is a pluggable database proxy.

Where to start

Please join us at irc.freenode.net at #maria to mingle with the community. Don't forget to subscribe to maria-developers@lists.launchpad.net (this is the main list where we discuss development).

Please keep in mind that in April we travel a lot (conferences, busy time focusing on a release), so if you have a question and nobody on IRC answers, don't feel disappointed, ask in an email to maria-developers@lists.launchpad.net. Asking on the mailing list means others benefit from your Q&A too!

The complete list of tasks suggested for GSoC 2016 is located in the MariaDB Jira. A subset is listed below.

Support for GTID in mysqlbinlog

The mysqlbinlog tool needs to be updated to understand the replication feature called Global Transaction IDs (GTIDs) in MariaDB 10. The current version does not support GTIDs and the MySQL variant does not speak MariaDB 10's GTIDs.

Details: MDEV-4989
Skills: C/C++
Mentor: Kristian Nielsen
Students Interested: 2

Aggregate stored functions

With CREATE FUNCTION one can create functions in SQL, but this syntax doesn't allow one to create an aggregate function (like SUM, AVG, etc). This task is to add support for aggregate stored functions.

Details: MDEV-7773
Skills: C/C++
Mentor: Sergei Golubchik
Students Interested: 6

GIS enhancements

GIS enhancements for 10.1 that we want to work on include adding support for altitude (the third coordinate), converters (e.g. ST_GeomFromGeoJSON - ST_AsGeoJSON, ST_GeomFromKML - ST_AsKML, etc.), getting data from the SHP format (a shp2sql converter), as well as making sure we are fully OpenGIS compliant.

Details: MDEV-5813
Skills: C/C++
Mentor: Holyfoot
Students Interested: 2

Indexes for BLOBs (in MyISAM and Aria)

MyISAM and Aria support special kinds of indexes that only store the hash of the data in the index tree. When two hashes match in the index, the engine compares actual row data to find whether the rows are identical. This is used in internal temporary tables that the optimizer creates to resolve SELECT DISTINCT queries. Normal unique indexes cannot always be used here, because the select list can be very long or include very long strings.

This task is to provide a direct SQL interface to this feature and to allow users to create these indexes explicitly. This way we can have unique constraints for blobs and very long strings.
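
A sketch of the intended SQL interface (hypothetical at the time of this task; the engine would store only a hash of b in the index and compare full row values on hash collisions):

```sql
-- Hypothetical: a unique constraint on a BLOB backed by a hash index.
CREATE TABLE t1 (
  id INT PRIMARY KEY,
  b  BLOB,
  UNIQUE KEY (b) USING HASH
) ENGINE=MyISAM;
```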

Details: MDEV-371
Skills: C/C++
Mentor: Sergei Golubchik
Students Interested: 3

Provide GTID support for MHA

MySQL Master HA (MHA) is a tool to assist with automating master failover and slave promotion within short downtime, without suffering from replication consistency problems, and without performance penalty. We would like to have this tool support MariaDB 10 GTIDs.

Skills: Perl
Mentor: Colin Charles
Students Interested: 2

Import and export popular data formats from and to dynamic columns

Provide import and export functions for popular data formats like JSON, XML (limited), PHP, ... for Connector/C and MariaDB Server (which use the same code base for dynamic columns).

Details: CONC-125
Skills: C
Mentor: Oleksandr Byelkin, Georg Richter
Students Interested: 2

MaxScale filter to capture incoming operations for consumption in external sources

Design a filter that will capture incoming inserts, updates, and deletes for specified tables (given as a regex) in a separate log file that is consumable in JSON or CSV form, so that external ETL processes can process it for data uploading into a DWH or big data platform. Optionally, a plugin that feeds this log into a Kafka broker, which can put this data on a Hadoop node, can be developed as a next step.

Details: MXS-2
Skills: C/C++
Mentor: Markus Makela
Students Interested: 2

MaxScale filter to translate Microsoft SQL Server syntax

Develop a MaxScale filter that will translate SQL Server syntax to MariaDB syntax. Develop a SQL Server client protocol plugin.

Details: MXS-1
Skills: C/C++
Mentor: Markus Makela & Massimiliano Pinto
Students Interested: 1

Additional libraries for MaxScale's experimental Luafilter

Create additional entry points into MaxScale that the Lua side scripts can use. Various types of functions can be added ranging from SQL processing functions to utility functions which communicate with MaxScale.

Details: MXS-593
Skills: C/C++
Mentor: Markus Makela
Students Interested: 2

Query Injection Filter

Create a filter which can inject queries before the client executes any queries. This filter could be used for various purposes, for example auditing.

Details: MXS-591
Skills: C/C++
Mentor: Markus Makela
Students Interested: 1

Cassandra Storage Engine V2

The current Cassandra Storage Engine was developed against Cassandra 1.1 and uses the Thrift API to communicate with Cassandra. However, starting from Cassandra 1.2, the preferred way to access a Cassandra database is to use CQL (Cassandra Query Language) and the DataStax C++ Driver (cpp-driver). Thrift-based access is deprecated and places heavy constraints on the schema.

This task is about re-implementing Cassandra Storage Engine using DataStax C++ Driver and CQL.

Details: MDEV-8947
Skills: C/C++
Mentor: Sergei Petrunia
Students Interested: 3

NO PAD collations

Currently MariaDB ignores trailing spaces when comparing values of the CHAR, VARCHAR, TEXT data types. In some cases it would be nice to take trailing spaces into account. This task will introduce a set of new collations that will make this possible.
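
With today's PAD SPACE collations, trailing spaces are ignored in comparisons; a NO PAD collation would change that. A sketch of the difference (the collation name utf8_nopad_bin is illustrative of what this task would add):

```sql
-- With a PAD SPACE collation, trailing spaces are ignored,
-- so this comparison is true:
SELECT 'abc' = 'abc   ';
-- With a NO PAD collation (name illustrative), trailing spaces
-- would count, and the same comparison would be false:
SELECT 'abc' = 'abc   ' COLLATE utf8_nopad_bin;
```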

Details: MDEV-9711
Skills: C/C++
Mentor: Alexander Barkov
Students Interested: 2

Suggest a task

Are you a student interested in working on something? Let us know here.

This page is licensed: CC BY-SA / Gnu FDL

Google Summer of Code 2015

We participated in the Google Summer of Code 2015. MariaDB and the MariaDB Foundation believe we are making a better database that remains a drop-in replacement to MySQL. We also work on making LGPL connectors (currently in C, Java, C++ in development) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. Lately, we also have MariaDB MaxScale which is a pluggable database proxy.

Where to start

Please join us at irc.freenode.net at #maria to mingle with the community. Don't forget to subscribe to maria-developers@lists.launchpad.net (this is the main list where we discuss development).

Please keep in mind that in April we travel a lot (conferences, busy time focusing on a release), so if you have a question and nobody on IRC answers — do not feel disappointed, ask in an email to maria-developers@lists.launchpad.net. Asking on the mailing list means others benefit from your Q&A too!

At the moment, tasks that may be suitable for GSoC 2015 are listed in the MariaDB Issue Tracker under «GSoC 2015 tasks»

Some suggested projects

Enhancing mysqlbinlog

This project consists of two parts -- it can either be performed by 2 students or 1 student with the relevant skills:

Support for GTID in mysqlbinlog

The mysqlbinlog tool needs to be updated to understand the replication feature called Global Transaction IDs (GTIDs) in MariaDB 10. The current version does not support GTIDs and the MySQL variant does not speak MariaDB 10's GTIDs.

Details: MDEV-4989
Skills: C/C++
Mentor: Kristian Nielsen

Remote backup of binary logs

mysqlbinlog in MySQL 5.6 also supports streaming binlog servers for backups. This is important as the MHA tool can also use this feature.

Details: MDEV-5993
Skills: C/C++
Mentor: Sergey Vojtovich

Indexes on virtual columns

We have the concept of virtual (non-materialized) columns, and currently to have an index on a virtual column one has to materialize it. To support indexes on fully virtual columns, a storage engine must call back into the server to calculate the value of the virtual column.

Details: MDEV-5800
Skills: C/C++
Mentor: Sergei Golubchik

Table functions

User functions (UDF-like) that return a table, rows and columns. It should be possible to use it in other statements. A possible implementation could be: the function exports a generator, we create a handler of the hidden "storage engine" class, no indexes, and convert this generator to rnd_init/rnd_next. Need to disable rnd_pos somehow. Alternatively, it can materialize the result set in a temporary table (like information_schema does), then this table can be used normally.

Details: MDEV-5199
Skills: C/C++
Mentor: Sergei Golubchik

Aggregate stored functions

With CREATE FUNCTION one can create functions in SQL, but this syntax doesn't allow one to create an aggregate function (like SUM, AVG, etc). This task is to add support for aggregate stored functions.

Details: MDEV-7773
Skills: C/C++
Mentor: Sergei Golubchik

GIS enhancements

GIS enhancements for 10.1 that we want to work on include adding support for altitude (the third coordinate), converters (e.g. ST_GeomFromGeoJSON - ST_AsGeoJSON, ST_GeomFromKML - ST_AsKML, etc.), getting data from the SHP format (a shp2sql converter), as well as making sure we are fully OpenGIS compliant.

Details: MDEV-5813
Skills: C/C++
Mentor: Holyfoot

Port InnoDB memcached interface to MariaDB

MySQL 5.6 has a memcached plugin to InnoDB. MySQL 5.7 has improved performance of this. The task would be to port this to run against MariaDB, and make it work against XtraDB/InnoDB for the 10.2 series of MariaDB.

Details: MDEV-4674
Skills: C/C++
Mentor: Colin Charles

Automatic provisioning of slave

The purpose of this task is to create an easy-to-use facility for setting up a new MariaDB replication slave.

Details: MDEV-7502
Skills: C/C++
Mentor: Kristian Nielsen

Indexes for BLOBs (in MyISAM and Aria)

MyISAM and Aria support special kinds of indexes that only store the hash of the data in the index tree. When two hashes match in the index, the engine compares actual row data to find whether the rows are identical. This is used in internal temporary tables that the optimizer creates to resolve SELECT DISTINCT queries. Normal unique indexes cannot always be used here, because the select list can be very long or include very long strings.

This task is to provide a direct SQL interface to this feature and to allow users to create these indexes explicitly. This way we can have unique constraints for blobs and very long strings.

Details: MDEV-371
Skills: C/C++
Mentor: Sergei Golubchik

Improved temporary tables

It is a well-known and very old MySQL/MariaDB limitation that a temporary table can only be used once in any query; for example, one cannot join a temporary table to itself. This task is about removing this limitation.
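
The limitation is easy to demonstrate (the table is made up for illustration; the error is ER_CANT_REOPEN_TABLE):

```sql
CREATE TEMPORARY TABLE tmp (id INT);
INSERT INTO tmp VALUES (1), (2);
-- Currently fails with ERROR 1137 (HY000): Can't reopen table: 'a'
SELECT * FROM tmp AS a JOIN tmp AS b ON a.id = b.id;
```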

Details: MDEV-5535
Skills: C/C++
Mentor: Sergei Golubchik

Provide GTID support for MHA

MySQL Master HA (MHA) is a tool to assist with automating master failover and slave promotion within short downtime, without suffering from replication consistency problems, and without performance penalty. We would like to have this tool support MariaDB 10 GTIDs.

Skills: Perl
Mentor: Colin Charles

Import and export popular data formats from and to dynamic columns

Provide import and export functions for popular data formats like JSON, XML (limited), PHP, ... for Connector/C and MariaDB Server (which use the same code base for dynamic columns).

Details: CONC-125
Skills: C
Mentor: Oleksandr Byelkin, Georg Richter

MaxScale filter to capture incoming operations for consumption in external sources

Design a filter that will capture incoming inserts, updates, and deletes for specified tables (given as a regex) in a separate log file that is consumable in JSON or CSV form, so that external ETL processes can process it for data uploading into a DWH or big data platform. Optionally, a plugin that feeds this log into a Kafka broker, which can put this data on a Hadoop node, can be developed as a next step.

Details: MXS-2
Skills: C/C++
Mentor: Markus Makela

MaxScale filter to translate Microsoft SQL Server syntax

Develop a MaxScale filter that will translate SQL Server syntax to MariaDB syntax. Develop a SQL Server client protocol plugin.

Details: MXS-1
Skills: C/C++
Mentor: Markus Makela & Massimiliano Pinto

This page is licensed: CC BY-SA / Gnu FDL

Google Summer of Code 2014

We participated in Google Summer of Code 2014. MariaDB and the MariaDB Foundation believe we are making a better database that remains a drop-in replacement to MySQL. We also work on making LGPL connectors (currently in C, Java, C++ in development) and on MariaDB Galera Cluster, which allows you to scale your reads & writes.

Where to start

Please join us at irc.freenode.net at #maria to mingle with the community. Or subscribe to maria-developers@lists.launchpad.net. Or both.

Please keep in mind that in April we travel a lot (conferences, busy time), so if you have a question and nobody on IRC answers — do not feel disappointed, ask in an email to maria-developers@lists.launchpad.net. Asking on the mailing list means others benefit from your Q&A too!

LDAP authentication plugin

We would like the authentication system to be able to authenticate against an LDAP Directory Server.

See pluggable authentication.

Skills: C, working knowledge of LDAP

Mentor: Sergei Golubchik

Self-Tuning Optimizer

this project is taken

One of the reasons for bad query plans is inadequate cost estimation of individual operations. A cost of reading a row in one engine might be a lot higher than in some other, but the optimizer cannot know it. Also, it uses hard-coded constants, assuming, for example, that evaluating a WHERE clause is 5 times cheaper than reading a row from a table.

Obviously, some kind of calibration procedure is needed to get these cost estimates to be relatively correct. It is not easy, because the estimates depend on the actual hardware where MariaDB is run (the cost of a row read is different on HDD and SSD), and also — somewhat — on the application.

A simple and low-maintenance solution would be to use self-tuning cost coefficients. They measure the timing and adjust automatically to the configuration where MariaDB is run.

See MDEV-350.

Skills: C/C++

Mentor: Sergei Golubchik

Port InnoDB memcached interface to MariaDB

MySQL 5.6 has a memcached plugin to InnoDB. MySQL 5.7 has improved performance of this. The task would be to port this to run against MariaDB, and make it work against XtraDB/InnoDB for the 10.1 series of MariaDB.

See MDEV-4674 for more.

Skills: C/C++

Mentor: Colin Charles

GIS enhancements to MariaDB

GIS enhancements for 10.1 that we want to work on include adding support for altitude (the third coordinate), as well as making sure we are fully OpenGIS compliant. MDEV-5813

Skills: C

Mentor: Holyfoot

User defined events

User-defined events are supported by several other databases, with differing forms and semantics. Events are used to signal a named event in the database. Applications can use named events instead of polling, which consumes more resources.

See MDEV-5532 for more.

Skills: C/C++

Mentor: Jan Lindstrom, Sergei Golubchik

Indexes for BLOBs (in MyISAM and Aria)

MyISAM and Aria support special kinds of indexes that only store the hash of the data in the index tree. When two hashes match in the index, the engine compares actual row data to find whether the rows are identical. This is used in internal temporary tables that the optimizer creates to resolve SELECT DISTINCT queries. Normal unique indexes cannot always be used here, because the select list can be very long or include very long strings.

This task is to provide a direct SQL interface to this feature and to allow users to create these indexes explicitly. This way we can have unique constraints for blobs and very long strings.
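The matching logic described above can be sketched in a few lines; the hash function (cksum) and the values here are illustrative stand-ins, not the engine's actual hash:

```shell
# Sketch: the index stores only a hash; when two hashes match, the
# actual values are compared to rule out a hash collision.
a="some very long string"
b="some very long string"
ha=$(printf '%s' "$a" | cksum | cut -d' ' -f1)
hb=$(printf '%s' "$b" | cksum | cut -d' ' -f1)
if [ "$ha" = "$hb" ] && [ "$a" = "$b" ]; then
  result="duplicate"
else
  result="distinct"
fi
echo "$result"
```

Only the cheap hash comparison happens inside the index tree; the expensive full comparison runs only on a hash match.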

Skills: C++

Mentor: Sergei Golubchik

CREATE OR REPLACE, CREATE IF NOT EXISTS, and DROP IF EXISTS

this project is taken

This task is to add support for OR REPLACE and IF EXISTS / IF NOT EXISTS to all CREATE and DROP variants for all objects (where it makes sense). MDEV-5359

Skills: C++

Mentor: Sergei Golubchik

Statistically optimize mysql-test runs by running less tests

this project is taken

This is a research task, not a coding task. See MDEV-5776.

Skills: SQL, Perl/Python or other language of your choice, mathematical statistics

Mentor: Elena Stepanova, Sergei Golubchik

Improved temporary tables

It is a well-known and very old MySQL/MariaDB limitation that temporary tables can only be used once in any query; for example, one cannot join a temporary table to itself. This task is about removing this limitation. MDEV-5535

Skills: C++

Mentor: Sergei Golubchik

Table UDFs

Implement a new plugin type that adds support for table UDFs — loadable User-Defined Functions that return tables, not scalar values.

Skills: C++

Mentor: Sergei Golubchik

GTID support in mysqlbinlog

The mysqlbinlog tool needs to understand global transaction ids. In particular, it should be possible to start and end the dump at the specified GTID. Both when reading binlog files and when connecting to a running server. See MDEV-4989.

If time permits, other client programs could be extended similarly, like mysqldump --master-data or the --sync-with-master command in mysqltest.

Skills: C++

Mentor:

See Also

  • See also the «GSoC 2014 tasks» list in Jira.

This page is licensed: CC BY-SA / Gnu FDL

Google Summer of Code 2013

We participated in Google Summer of Code 2013. MariaDB and the MariaDB Foundation believe we are making a better database that remains a drop-in replacement for MySQL. We also work on making LGPL connectors (currently in C and Java, with C++ in development) and on MariaDB Galera Cluster, which allows you to scale your reads & writes.

Where to start

Please join us at irc.freenode.net at #maria to mingle with the community. Or subscribe to maria-developers@lists.launchpad.net. Or both.

Please keep in mind that in April we travel a lot (conferences, busy time), so if you have a question and nobody on IRC answers — do not feel disappointed, ask in an email to maria-developers@lists.launchpad.net.

LDAP authentication plugin

We would like the authentication system to be able to authenticate against an LDAP Directory Server.

See pluggable authentication.

Skills: C, working knowledge of LDAP

Mentor: Sergei Golubchik

Kerberos authentication plugin

this project is taken

Kerberos is a security mechanism used in a lot of financial institutions. A MySQL plugin that allows authentication against Kerberos is the goal here.

See pluggable authentication.

Skills: C/C++, working knowledge of Kerberos

Mentor: Sergei Golubchik

Active Directory authentication plugin

The Microsoft Windows world is all about Active Directory, and upstream MySQL Enterprise already has this feature (though it's a paid offering). It would be great to have an open source equivalent.

See pluggable authentication.

Skills: C/C++, working knowledge of Active Directory/SAMBA, Windows-based development environment

Mentor: Sergei Golubchik, Vladislav Vaintroub

Keystone authentication plugin

Keystone is the OpenStack Identity Service. The idea would be to ensure that MariaDB can authenticate to Keystone directly.

Skills: Python, C/C++

Mentor: Mark Riddoch

Regex enhancements

this project is taken

MySQL and MariaDB use an old regex library that works bytewise and thus only supports single-byte character sets. It needs to be replaced by a modern, multi-byte character-set-aware regex library.

Additionally, a much-requested REGEX_REPLACE function should be implemented. (See also mysql-udf-regexp for some UDF code that could be used as a starting point for this.)

Detailed task description: MDEV-4425

Skills: C/C++

Mentor: Alexander Barkov

Self-Tuning Optimizer

One of the reasons for bad query plans is inadequate cost estimation of individual operations. A cost of reading a row in one engine might be a lot higher than in some other, but the optimizer cannot know it. Also, it uses hard-coded constants, assuming, for example, that evaluating a WHERE clause is 5 times cheaper than reading a row from a table.

Obviously, some kind of calibration procedure is needed to get these cost estimates to be relatively correct. It is not easy, because the estimates depend on the actual hardware where MariaDB is run (a cost of a row read is different on HD and SSD), and also — somewhat — on the application.

A simple and low-maintenance solution would be to use self-tuning cost coefficients. They measure the timing and adjust automatically to the configuration where MariaDB is run.

See MDEV-350.

Skills: C/C++

Mentor: Sergei Golubchik

Roles

this project is taken

Roles, close to SQL:2003 standard. See MDEV-4397.

Skills: C/C++

Mentor: Sergei Golubchik

Potential list

suggested development

This page is licensed: CC BY-SA / Gnu FDL

Google Season of Docs


MariaDB applied to participate in the first Google Season of Docs.

Google Season of Docs 2019


MariaDB applied to participate in the first Google Season of Docs. Unfortunately, as this was a pilot program, Google was only able to accept a limited number of applications, and we were unsuccessful.

Where to Start

Please join us on Zulip to mingle with the community. You can also subscribe to maria-docs@lists.launchpad.net, the documentation mailing list.

List of beginner friendly bugs

List of Tasks

Stored Procedures Documentation

The Stored Procedures capabilities of MariaDB Server are critical to producing large-scale applications. The current documentation lacks sufficient examples, and the examples warrant testing. Getting Started content would make this easier to adopt. MariaDB's implementation is based on ISO SQL/PSM.

Getting Started with Connector/C and Connector/J

MariaDB Connector/C and MariaDB Connector/J provide the ability to access a MariaDB Server from applications built in C/C++ and Java, respectively. The current documentation for these connectors lacks Getting Started guides.

Spider Documentation

The Spider Storage Engine uses partitioning to provide data sharding through multiple servers. This task involves greatly expanding the existing documentation, including more detail about when to use Spider, basic usage, updating the Spider Feature Matrix, detailed examples of the effects of the server system variables and table system variables, as well as the Spider functions. You will also ensure changes in the most recent Spider releases are properly documented.

Mroonga Documentation

Mroonga is a full-text search storage engine based on Groonga, an open-source, CJK-ready full-text search engine built on a column store. This project involves greatly expanding the existing MariaDB documentation on the use of this storage engine, including a detailed tutorial and user guide with examples of the various Mroonga user-defined functions, the system variables and the effects of changing their settings, as well as the parser and parser settings.

Translation

You will choose a major language and ensure that a substantial subsection of the documentation is translated into that language. See translations.

Migration Documentation

You will choose a major relational DBMS, and, focusing on the most recent stable releases, document the process to migrate to MariaDB, including MariaDB equivalents to features in that system, and a detailed list of features that exist in one but not the other, as well as possible workarounds. For an example, see the documentation on migrating from MySQL to MariaDB (bearing in mind that MariaDB is a MySQL fork, and is substantially more similar to MySQL than to other systems).

Additional tasks are loaded from the MariaDB issue tracker: [Jira](https://jira.mariadb.org/issues/?jql=labels=gsdocs19 and project=mdev order by key).

Suggest a Task

Do you have an idea of your own, not listed above? Do let us know!

This page is licensed: CC BY-SA / Gnu FDL

Google Season of Docs 2020

MariaDB applied to participate in the 2020 Google Season of Docs, but was unsuccessful.

Where to Start

Please join us on Zulip to mingle with the community. You can also subscribe to maria-docs@lists.launchpad.net, the documentation mailing list.

List of Tasks

Migration Documentation

You will choose a major relational DBMS, and, focusing on the most recent stable releases, document the process to migrate to MariaDB, including MariaDB equivalents to features in that system, and a detailed list of features that exist in one but not the other, as well as possible workarounds. For an example, see the work-in-progress Migrating from SQL Server to MariaDB as well as the documentation on migrating from MySQL to MariaDB (bearing in mind that MariaDB is a MySQL fork, and is substantially more similar to MySQL than to other systems).

Stored Procedures Documentation

The Stored Procedures capabilities of MariaDB Server are critical to producing large-scale applications. The current documentation lacks sufficient examples, and the examples warrant testing. Getting Started content would make this easier to adopt. MariaDB's implementation is based on ISO SQL/PSM.

Getting Started with Connector/C and Connector/J

MariaDB Connector/C and MariaDB Connector/J provide the ability to access a MariaDB Server from applications built in C/C++ and Java, respectively. The current documentation for these connectors lacks Getting Started guides.

Spider Documentation

The Spider Storage Engine uses partitioning to provide data sharding through multiple servers. This task involves greatly expanding the existing documentation, including more detail about when to use Spider, basic usage tutorials, updating the Spider Feature Matrix, detailed examples of the effects of the server system variables and table system variables, as well as the Spider functions. You will also ensure changes in the most recent Spider releases are properly documented.

Mroonga Documentation

Mroonga is a full-text search storage engine based on Groonga, an open-source, CJK-ready full-text search engine built on a column store. This project involves greatly expanding the existing MariaDB documentation on the use of this storage engine, including a detailed tutorial and user guide with examples of the various Mroonga user-defined functions, the system variables and the effects of changing their settings, as well as the parser and parser settings.

Translation

You will choose a major language and ensure that a substantial subsection of the documentation is translated into that language. See translations.

Additional tasks are loaded from the MariaDB issue tracker: [Jira](https://jira.mariadb.org/issues/?jql=labels=gsdocs19 and project=mdev order by key).

Suggest a Task

Do you have an idea of your own, not listed above? Do let us know!

This page is licensed: CC BY-SA / Gnu FDL

Product Development

Articles of interest to MariaDB developers, including contributors from the MariaDB Community

MariaDB Quality Development Rules

These are quality-improving rules that everyone with write access to the MariaDB repository is expected to follow:

  • Respect previews

    • A feature can be pushed into an RC release X.Y.1 only after it was in an earlier preview. Normally, in X.Y.0, but generally any earlier preview will do.

  • Tester sign-off for all new features

    • A feature being in the preview is a necessary but not sufficient condition. It needs to be tested (by a dedicated tester, not a developer), and the tester has to say it's good enough.

    • Testing might discover bugs; that's normal. They have to be fixed before the feature is pushed (or, at the tester's discretion, after the push, if they're minor).

    • For the tester's sake, develop in a dedicated branch with the version and the issue number in the branch name, for example, bb-11.1-MDEV-11111, and let the release master know when it's ready, so that they can cherry-pick it into a preview.

  • Features must not be pushed directly into the GA release bypassing the above

    • Keep an eye on the release schedule (jira.mariadb.org) to know when the next release is due

    • Or simply remember that previews happen in mid-March, mid-June, mid-September, and mid-December, and innovation releases in early February, early May, early August, and early November; see mariadb-release-model.

  • Get your commit reviewed.

    • If you don't know who would be the best reviewer for your PR, assign whoever GitHub suggests; this person can reassign if GitHub was wrong.

  • Don't push into a branch that is red in Buildbot

    • Fix failures first (or make sure they're fixed)

    • Eventually, Buildbot will evolve to simply not let you

  • Blocker issues block a release

    • We don't release if there's a Blocker bug open; that's why they're called blockers.

    • So fix them ASAP, as your first priority; you don't want all the users to wait specifically for you.

  • There's no penalty for breaking these rules; we hope that everyone wants quality, bug-free releases anyway.

    • But if there are serial violators, some kind of penalty may be introduced later.

This page is licensed: CC BY-SA / Gnu FDL

Uploading Package to PPA

After creating a Launchpad account:

Docker build, cloning the MariaDB repository and mapping it to the docker container

  1. mkdir mariadb-source

  2. cd mariadb-source

  3. vi Dockerfile

  4. Copy the following contents to Dockerfile:

# MariaDB 10.3 Ubuntu 17.10 build environment
# Published as mariadb-10-3-ubuntu-17.10-build-env
FROM ubuntu:17.10
RUN DEBIAN_FRONTEND=noninteractive apt-get update && \
apt-get install -y --no-install-recommends \
systemd \
devscripts \
build-essential \
lsb-release \
equivs \
git \
curl \
git-buildpackage \
nano \
vim \
pristine-tar

RUN curl -skO https://raw.githubusercontent.com/ottok/mariadb-10.1/ubuntu-17.10/debian/control

ENV GIT_SSL_NO_VERIFY true

RUN mk-build-deps -t 'apt-get -y -o Debug::pkgProblemResolver=yes --no-install-recommends' -i control

ENV container docker
ENV DEBIAN_FRONTEND noninteractive
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8

  1. Run docker build . -t ubuntu-17.10-packaging

  2. Do git clone of the latest repository:

cd && mkdir debian-packaging && cd debian-packaging && git clone https://salsa.debian.org/mariadb-team/mariadb-10.1.git

Generate, publish and upload PGP key

  1. Generate OpenPGP key with the following command:

$ gpg --gen-key

  • select (1) RSA and RSA

  • Keysize: accept 2048

  • Valid for: accept 0

  • Type name, email and comment (comment is optional)

  • Type 0

  • Type passphrase (twice)

  • Follow instructions to help generate a random key

  • Keep the Key ID and fingerprint text, they are needed in the next step

Set generated key as default in ~/.bashrc

$ nano ~/.bashrc

[.. add this ..]

export GPGKEY=<key_id>

Restart the GPG agent and source ~/.bashrc, or restart your session.

  1. Publish the key to the key server:

Run gpg --keyserver keyserver.ubuntu.com --send-keys 12345678, substituting 12345678 with your key's ID.

  • If this gives timeout error, keep re-trying after a while

  1. Upload the key's fingerprint on your Launchpad profile page (+editpgpkeys).

  2. gpg --export [your-key-id] > ~/debian-packaging/pub.key

  3. gpg --export-secret-key [your-key-id] > ~/debian-packaging/secret.key

gpg -k

The command above should list the key.

  1. How to upload: see How-to-upload-to-Launchpad-PPA-repository-(.deb-packages).

  2. Open /etc/devscripts.conf

And look for this line:

DEBSIGN_MAINT

Uncomment it and add your name there, then set your email address:

export DEBEMAIL=[your-email-id]

From inside the container

  1. docker run -v ~/debian-packaging/:/repo -it ubuntu-17.10-packaging bash

  2. apt-get install devscripts

  3. gpg --import pub.key

  4. gpg --import secret.key

  5. gpg -k

  6. cd /repo/mariadb-10.1 && git fetch && git checkout pristine-tar && git checkout ubuntu-17.10

  7. git clean -dffx && git reset --hard HEAD

  8. export DEB_BUILD_OPTIONS="parallel=10 nocheck" or export DEB_BUILD_OPTIONS="parallel=5 nocheck"

  9. Go to /repo folder (Inside docker) and delete all the files except mariadb-10.1 folder:

rm *

gbp buildpackage
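As an optional variation on step 8, the parallel level can be derived from the machine's CPU count instead of being hard-coded (nproc is part of GNU coreutils):

```shell
# Derive the parallel build level from the number of available CPUs.
export DEB_BUILD_OPTIONS="parallel=$(nproc) nocheck"
echo "$DEB_BUILD_OPTIONS"
```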

For re-running the set up container

  1. Run docker commit <container-id>. This will generate an ID.

For restarting the same container again, use this ID: docker run -v ~/debian-packaging/:/repo -it <ID> bash

  1. Last command for uploading package to PPA:

backportpackage -u <your-ppa-address> -d <ubuntu-version-to-backport-to> -S ~<a-version-suffix-name-for-this-package> <the-most-recent-dsc-file>

Example:

backportpackage -u ppa:cvicentiu/mariadb-10.0-dev2 -d bionic -S ~testtry mariadb-10.1_10.1.30-0ubuntu0.17.10.1.dsc

Run this command in the /repo folder, where the .dsc file is located. It should ask for the GPG key passphrase again.


This page is licensed: CC BY-SA / Gnu FDL

Debugging MariaDB

This section is for articles on debugging MariaDB

Compiling MariaDB for Debugging

Compiling MariaDB for Debugging Using the CMAKE_BUILD_TYPE Option

This option enables multiple debug instrumentation aspects within the MariaDB server that provide more detailed information about complex parts of the server, and it can be used to implement and run tests where the concurrent execution of multiple threads must be controlled to achieve a specific state. If you are not doing this, the following option is sufficient.

Compiling MariaDB with full debug information includes all code symbols and also new code to do internal testing of structures and to allow tracing of MariaDB execution. A full debug binary will be notably slower than a normal binary (about 30%). Most of this overhead can be removed by setting -DWITH_DBUG_TRACE=OFF.

You can configure a debug build by executing cmake and setting the CMAKE_BUILD_TYPE option to Debug. For example:

cmake -DCMAKE_BUILD_TYPE=Debug source_directory

To compile:

cmake --build .

You can find a list of the needed packages/libraries for building on Linux here.

Building Optimized Build With Debug Symbols

To build MariaDB with debug symbols, so that you get better stack traces and can debug the binary with gdb, you need to supply the -Og and -g3 options to the gcc compiler. This produces a debuggable binary with only limited optimization tradeoffs.

Pass these options to cmake like this:

cmake -DCMAKE_C_FLAGS='-Og -g3' -DCMAKE_CXX_FLAGS='-Og -g3' source_directory

To compile:

cmake --build .

Doing a Debug Build on Debian/Ubuntu

To build a "mariadbd" binary with debugging enabled that uses the same parameters as the ones used in Debian/Ubuntu binary packages, do as follows (you must have a deb-src line for one of the MariaDB repositories in your /etc/apt/sources.list in order to do this):

apt install build-essential devscripts fakeroot dpkg-dev
apt-get build-dep mariadb-server
apt-get source mariadb-server
cd mariadb-*
 DEB_CFLAGS_APPEND='-Og -g3' DEB_CXXFLAGS_APPEND='-Og -g3' debuild -us -uc

The packages created will have these flags set.

Temporarily Installing your Debug Build

The commands shown below replace the release mariadbd binary with the debug mariadbd binary that you compiled. Most importantly, they replace the binary in a way which makes it trivial to revert back to the original release mariadbd binary.

First, stop MariaDB.

Then, use the mv utility to rename the release mariadbd binary:

sudo mv /usr/sbin/mariadbd /usr/sbin/mariadbd-orig

Note: Do not use the cp utility because that will change the file modification timestamp.

Then, install the debug mariadbd binary from your source tree:

sudo install ~/mariadb-*/sql/mariadbd /usr/sbin/mariadbd

Then, start MariaDB.

Be sure to replace /usr/sbin/mariadbd with the path to your mariadbd binary and to also replace ~/mariadb-*/sql/mariadbd with the path to your debug mariadbd binary.

Reinstalling your Release Build

If you want to restore your original mariadbd binary, you can do so with the following process:

First, stop MariaDB.

Then, execute the following command to move the debug binary out of the way:

sudo mv /usr/sbin/mariadbd /usr/sbin/mariadbd-debug

Then, execute the following command to move the original mariadbd release binary back into place:

sudo mv /usr/sbin/mariadbd-orig /usr/sbin/mariadbd

Then, start MariaDB.

Be sure to replace /usr/sbin/mariadbd with the path to your mariadbd binary

The debug mariadbd-debug binary is still present if it is needed again in the future.

Different Compilation Options

Changing DBUG_ASSERT to Print to Error Log

A debug binary has lots of code checks and asserts that are not present in production builds. This is done to get more performance when running in production. In some cases, when one is trying to find a hard-to-repeat bug, it can be beneficial to have these checks in production builds too.

Compiling with -DDBUG_ASSERT_AS_PRINTF will change DBUG_ASSERT() to print any failed check to the error log.

cmake . -DDBUG_ASSERT_AS_PRINTF=ON

Enabling the above option should not have any notable impact on performance (probably < 1% slowdown). This is achieved by grouping asserts in MariaDB server code into two groups:

  • Fast checks, using DBUG_ASSERT(): These are converted to printing to error log.

  • Slow checks, using DBUG_SLOW_ASSERT(): These are always removed in production builds.

See Also

  • Build environment setup for Linux

  • Debugging MariaDB with a debugger

  • Creating a trace file

  • Using ASAN with MariaDB

This page is licensed: CC BY-SA / Gnu FDL

Creating a Trace File

If mysqld is crashing, creating a trace file is a good way to find the issue.

A mysqld binary that has been compiled with debugging support can create trace files using the DBUG package created by Fred Fish. To find out if your mysqld binary has debugging support, run mysqld -V on the command line. If the version number ends in -debug then your mysqld binary was compiled with debugging support.
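The check can be scripted; a minimal sketch, where the version_string value is a hard-coded stand-in for the live mysqld -V output:

```shell
# A -debug suffix in the version string marks a debug build.
# version_string is a stand-in for the output of: mysqld -V
version_string="mysqld  Ver 10.5.4-MariaDB-debug for Linux on x86_64"
case "$version_string" in
  *-debug*) build_type="debug" ;;
  *)        build_type="release" ;;
esac
echo "$build_type"
```

On a release binary the version string carries no -debug suffix, so the check falls through to "release".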

See Compiling MariaDB for debugging for instructions on how to create your own mysqld binary with debugging enabled.

To create the trace log, start mysqld like so:

mysqld --debug

Without options, --debug writes the trace file to /tmp/mysqld.trace in MySQL and in MariaDB versions before 10.5, and to /tmp/mariadbd.trace starting from MariaDB 10.5.
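As a sketch, the default trace path as a function of the server version (the version value here is an example):

```shell
# Pick the default trace path for a given server version string.
version="10.5.4"
major=${version%%.*}
rest=${version#*.}; minor=${rest%%.*}
if [ "$major" -gt 10 ] || { [ "$major" -eq 10 ] && [ "$minor" -ge 5 ]; }; then
  trace_file=/tmp/mariadbd.trace
else
  trace_file=/tmp/mysqld.trace
fi
echo "$trace_file"
```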

On Windows, the debug mysqld is called mysqld-debug and you should also use the --standalone option. So the command on Windows will look like:

mysqld-debug --debug --standalone

Once the server is started, use the regular mysql command-line client (or another client) to connect and work with the server.

After you are finished debugging, stop the server with:

mysqladmin shutdown

DBUG Options

Trace files can grow to a significant size. You can reduce their size by telling the server to only log certain items.

The --debug flag can take extra options in the form of a colon (:) delimited string of options. Individual options can have comma-separated sub-options.

For example:

mysqld --debug=d,info,error,query:o,/tmp/mariadbd.trace

The 'd' option limits the output to the named DBUG_ macros. In the above example, the /tmp/mariadbd.trace tracefile will contain output from the info, error, and query DBUG macros. A 'd' by itself (with no sub-options) will select all DBUG_ macros.

The 'o' option redirects the output to a file (/tmp/mariadbd.trace in the example above) and overwrites the file if it exists.
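To make the layout explicit, here is how the example option string above splits into its parts:

```shell
# Colon-separated options, comma-separated sub-options.
debug_spec="d,info,error,query:o,/tmp/mariadbd.trace"
macros=${debug_spec%%:*}     # the 'd' option and its macro list
output=${debug_spec#*:}      # the 'o' option and its file argument
echo "macros: ${macros#d,}"
echo "output: ${output#o,}"
```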

See Also

  • Options for --debug

This page is licensed: CC BY-SA / Gnu FDL

Debugging a Running Server (on Linux)

Even if you don't have a server that is compiled for debugging, there are still ways to get more information out from it if things go wrong.

When things go wrong, it's always better to have a version of the mysqld daemon that is not stripped.

shell> file /usr/sbin/mysqld

If the output says 'not stripped', you are fine. If it says 'stripped', you should either download a binary with debugging information or compile it yourself without stripping the binary.
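That interpretation can be sketched as a small check; the file_output value is a hard-coded stand-in for real `file` output:

```shell
# file_output is a stand-in for the output of: file /usr/sbin/mysqld
file_output="/usr/sbin/mysqld: ELF 64-bit LSB executable, x86-64, not stripped"
case "$file_output" in
  *"not stripped"*) symbols="present" ;;
  *stripped*)       symbols="missing" ;;
  *)                symbols="unknown" ;;
esac
echo "debug symbols: $symbols"
```

On a stripped binary the first pattern does not match, and the check reports missing symbols.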

Debugging Memory Consumption With tcmalloc

Read the Profiling Memory Usage page for more information on how to debug high memory consumption.

If you have a problem with a mysqld process that keeps on growing, you can use tcmalloc to find out what is allocating memory:

Depending on the system you have to install the tcmalloc (OpenSuse) or the google-perftools-lib (RedHat, Centos) package.

The following set of commands starts mysqld with memory profiling and if you kill it with SIGABRT, you will get a core dump that you can examine:

HEAPPROFILE=/tmp/mysqld.prof /usr/sbin/mysqld_safe --malloc-lib=tcmalloc --core-file-size=unlimited --core-file

or if you prefer to invoke mysqld directly:

ulimit -c unlimited
LD_PRELOAD=/usr/lib64/libtcmalloc.so.4 HEAPPROFILE=/tmp/mysqld.prof /usr/sbin/mysqld --core-file

You can of course add other mysqld options to the end of the above line.

Now start your client/application that uses MariaDB. You can find where memory is allocated in the /tmp/mysqld.prof file. If you find any memory issues, please report this in the MariaDB bug tracker!

ptrace Protection and Attaching GDB to a mysqld Instance

New Ubuntu releases do not allow one process to examine the memory of an arbitrary user's process. As a result, when trying to attach GDB to a running MariaDB (or any other process) instance, one gets the following error in GDB:

ptrace: Operation not permitted

More details are available in the Ubuntu Wiki.

To allow GDB to attach, one needs to edit the /proc/sys/kernel/yama/ptrace_scope sysctl value.

  • To change it temporarily, open a root shell and issue:

echo 0 > /proc/sys/kernel/yama/ptrace_scope
  • To change it permanently, edit as root:

/etc/sysctl.d/10-ptrace.conf

and set the value to 0.
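With that edit, the file would contain a line like this:

```
kernel.yama.ptrace_scope = 0
```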

Debugging a Server That Hangs

If your mysqld server hangs, you may want to debug it to know what happened.

Preferably the server should be compiled for debugging, but it's not strictly necessary:

cmake -DCMAKE_BUILD_TYPE=Debug -DWITH_VALGRIND=ON .
make -j4

To know what the server is doing:

  • Find out the process number of mysqld

ps -edalf | grep mysqld
  • Attach to the process and get a back trace:

gdb -p 'pid of mysqld' path-to-mysqld
set height 0
set logging file /tmp/mysqld.log
set logging on
thread apply all backtrace full

After the above, you have a full backtrace, including all local variables, in the mysqld.log file. Note that you will only get all variables if the server is not stripped.

This page is licensed: CC BY-SA / Gnu FDL

Debugging MariaDB With a Debugger

If you have MariaDB compiled for debugging you can both use it in a debugger, like ddd or gdb, and get comprehensive trace files of the execution of MariaDB. The trace files allow you to both see the flow of the code and to see the differences in execution by comparing two trace files.

Core dumps are also much easier to investigate if they come from a debug binary.

Note that a binary compiled for debugging and tracing is about 10-20% slower than a normal binary. If you just compile a binary for debugging (option -g with gcc) the speed difference compared to a normal binary is negligible.

Checking That MariaDB is Compiled For Debugging

Execute:

mariadbd --debug --help

If you are using MariaDB before 10.5, then you should use mysqld instead of mariadbd!

If you get the error unknown option '--debug', then MariaDB is not compiled for debugging and tracing.

Building MariaDB for Debugging Starting from 5.5

On Unix you need to pass -DCMAKE_BUILD_TYPE=Debug to cmake to compile with debug information.

Building MariaDB 5.3 and Older

Here is how you compile with debug on older versions:

Use the scripts in the BUILD directory that will compile MariaDB with most common debug options and plugins, for example:

./BUILD/compile-pentium64-debug-max

For the most common configurations there exists a fine-tuned script in the BUILD directory.

If you want to use valgrind, a very good memory instrumentation tool and memory overrun checker, you should use

./BUILD/compile-pentium64-valgrind-max

Some recommended debugging scripts for Intel/AMD are:

BUILD/compile-pentium64-debug-max
BUILD/compile-pentium64-valgrind-max

This is an example of how to compile MariaDB for debugging in your home directory with MariaDB 5.2.9 as an example:

cd ~
mkdir mariadb
cd mariadb
tar xvf mariadb-5.2.9.tar.gz
ln -s mariadb-5.2.9 current
cd current
./BUILD/compile-pentium64-debug-max

The last command will produce a debug version of sql/mysqld.

Debugging MariaDB From the Source Directory

Creating the MariaDB Database Directory

The following example creates the MariaDB databases in /data.

./scripts/mariadb-install-db --srcdir=. --datadir=/data

Running MariaDB in a Debugger

The following example is using ddd, an excellent graphical debugger in Linux. If you don't have ddd installed, you can use gdb instead.

cd sql
ddd ./mariadbd &

In ddd or gdb

run --datadir=/data --language=./share/english --gdb

You can set the options in your ~/.my.cnf file so as not to have to repeat them on the run line.

If you run mysqld with --debug, you will get a trace file in /tmp/mysqld.trace that shows what is happening.

Note that you can have different options in the configuration file for each MariaDB version (like having a specific language directory).

Debugging MariaDB Server with mariadb-test-run

If you get a crash while running mariadb-test-run you can debug this in a debugger by using one of the following options:

mariadb-test-run --gdb failing-test-name

or if you prefer the ddd debugger:

mariadb-test-run --ddd failing-test-name

Sample .my.cnf file to Make Debugging Easier

[client-server]
socket=/tmp/mysql-dbug.sock
port=3307

[mariadb]
datadir=/my/data
loose-innodb_file_per_table
server_id= 1
log-basename=master
loose-debug-mutex-deadlock-detector
max-connections=20
lc-messages=en_us

[mariadb-10.0]
lc-messages-dir=/my/maria-10.0/sql/share

[mariadb-10.1]
lc-messages-dir=/my/maria-10.1/sql/share

[mariadb-10.2]
lc-messages-dir=/my/maria-10.2/sql/share

[mariadb-10.3]
lc-messages-dir=/my/maria-10.3/sql/share

The above .my.cnf file:

  • Uses an explicit socket for both client and server.

  • Assumes the server source is in /my/maria-xxx. You should change this to point to where your sources are located.

  • Has a unique path for each MariaDB version so that one doesn't have to specify --lc-messages-dir or --language even when switching between debugging different MariaDB versions.

See Also

  • Creating a trace file

  • Configuring MariaDB with my.cnf

  • Running mariadbd from the build directory

This page is licensed: CC BY-SA / Gnu FDL

Debugging Memory Usage

Debugging memory usage on CentOS 7.

This page describes how to debug MariaDB's memory usage. It uses CentOS 7 but can be applied to other systems as well.

The idea is to employ the Google PerfTools heap profiler.

On CentOS:

sudo yum install gperftools 
service mariadb stop
systemctl edit mariadb

This will open an editor.

Add this content and close the file:

[Service]
Environment="HEAPPROFILE=/tmp/heap-prof-1"
Environment="HEAP_PROFILE_ALLOCATION_INTERVAL=10737418240"
Environment="HEAP_PROFILE_INUSE_INTERVAL=1073741824"
Environment="LD_PRELOAD=/usr/lib64/libtcmalloc.so.4"

Then run

service mariadb start

Then, run the workload. When memory consumption becomes large enough, run

ls -la /tmp/heap-prof-*

This should show several files.

Copy away the last one of them:

cp /tmp/heap-prof-1.0007.heap .

Then, run

pprof --dot /usr/sbin/mysqld heap-prof-1.0007.heap  > 7.dot

(Note: this produces a lot of statements like /bin/addr2line: Dwarf Error: ..., likely because it cannot resolve locations in the plugin .so files loaded by mariadbd. This is not a showstopper at the moment.)
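If you want to inspect the profile yourself before sending it, the .dot file can be rendered to an image with Graphviz (this assumes the graphviz package is installed):

```shell
# Render the allocation call graph to a PNG for visual inspection
dot -Tpng 7.dot -o 7.png
```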

Then, please send us the 7.dot file.

This page is licensed: CC BY-SA / Gnu FDL

Enabling Core Dumps

Enabling in an Option File

Core dumps are enabled by default on Windows, so this step can be skipped there. See MDEV-18439 for more information.

In order to enable core dumps, you need to set the core_file system variable either on the command-line or in a relevant server option group in an option file. For example:

[mariadb]
...
core_file

You can check your current value by executing:

my_print_defaults --mariadbd

core_file is a system variable. Its value can be checked at runtime by executing the following:

SHOW GLOBAL VARIABLES LIKE 'core_file';

Core Files on Linux

There are some additional details related to using core files on Linux.

Disabling Core File Size Restrictions on Linux

On some systems there is a limit on the sizes of core files that can be dumped. You can check the system's current system-wide limit by executing the following:

ulimit -c

You can check the current limit of the mariadbd process specifically by executing the following:

sudo cat /proc/$(pidof mariadbd)/limits | grep "core file"

If you need to change the core size limit, the method you use depends on how you start mariadbd. See the sections below for more details.

The resource limits for the mariadbd process are printed to the error log when the mariadbd process crashes. That way, users can confirm whether the process may have been allowed to dump a core file. See MDEV-15051 for more information.

Running mariadbd Using mysqld_safe

If you are starting MariaDB by running mysqld_safe, then configuring the following in the [mysqld_safe] option group in an option file should allow for unlimited sized core files:

[mysqld_safe]
...
core_file_size=unlimited

You can check your current values by executing:

my_print_defaults mysqld_safe

See mysqld_safe: Configuring the Core File Size for more details.

Note: If you are using mysqld_safe and running mariadbd as the root user, then no core file is created on some systems. The solution is to run mariadbd as another user.

Running mariadbd Manually

If you are starting mariadbd manually or in a custom script, then you can allow for unlimited sized core files by executing the following in the same shell or script in which mariadbd is executed:

ulimit -c unlimited

Running mariadbd Using systemd

If you are starting mariadbd using systemd, then you may need to customize the MariaDB service to allow for unlimited size core files. For example, you could execute the following:

Using sudo systemctl edit mariadb.service add the contents:

[Service]

LimitCORE=infinity

See systemd: Configuring the Core File Size for more details.

Running MariaDB Containers

To get a core dump in a mariadb container, the host's sysctl kernel.core_pattern must not begin with a pipe to an executable that doesn't exist in the container. Setting it to a plain core is recommended.
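For example, on the container host (not inside the container), you can check the pattern and switch to a plain file-based core; note that this changes a host-wide setting:

```shell
# Check the host's current core pattern
sysctl kernel.core_pattern

# If it pipes to a handler that doesn't exist in the container,
# switch to a plain "core" file (affects the whole host)
sudo sysctl kernel.core_pattern=core
```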

Also see Container with Debug Symbols.

Changing the System-Wide Limit

If you want to change the system-wide limit to allow for unlimited size core files for the mysql user account, then you can do so by adding the following lines to a file in /etc/security/limits.d/. For example:

sudo tee /etc/security/limits.d/mariadb_core.conf <<EOF
mysql soft core unlimited
mysql hard core unlimited
EOF

The system would have to be restarted for this change to take effect.

See Configuring Linux for MariaDB: Configuring the Core File Size for more details.

Setting the Path on Linux

If you are using Linux, then it can be helpful to change a few settings to alter where the core files is written and what file name is used. This is done by setting the kernel.core_pattern and kernel.core_uses_pid attributes. You can check the current values by executing the following:

sysctl kernel.core_pattern
sysctl kernel.core_uses_pid

If you are using mysql-test-run and want to have the core as part of the test result, the optimal setting is probably the following (store cores in the current directory as core.number-of-process-id):

sudo sysctl kernel.core_pattern=core.%p kernel.core_uses_pid=0

If you are using a production system, you probably want to have the core files in a specific directory, not in the data directory. The place to store cores can be temporarily altered using the sysctl utility, but it is often more common to alter it via the /proc file system. See the following example:

sudo mkdir /tmp/corefiles
sudo chmod 777 /tmp/corefiles
sudo sysctl kernel.core_pattern=/tmp/corefiles/core
sudo sysctl kernel.core_uses_pid=1

The above commands will tell the system to put core files in /tmp/corefiles, and it also tells the system to put the process ID in the file name.

If you want to make these changes permanent, then you can add the following to a file in /etc/sysctl.d/. For example:

sudo tee /etc/sysctl.d/mariadb_core.conf <<EOF
kernel.core_pattern=/tmp/corefiles/core
kernel.core_uses_pid=1
EOF

Note: if you are using containers, the pid is always going to be 1, so this may not be a useful setting. Appending an identifier like %t to the kernel.core_pattern will generate more unique files.

The value of kernel.core_pattern is printed to the error log when the mariadbd process crashes. That way, users can determine where the process may have dumped a core file. See MDEV-15051 for more information.

Note: Ensure that you have enough free disk space in the path pointed to by kernel.core_pattern.

Extracting Linux core dumps with systemd-coredump

Core dump management can be automated using systemd, which then centrally manages all core dump files and provides information about detected core dumps and access to collected core files using the coredumpctl utility.

This is enabled by default on Red Hat Enterprise Linux 8 and CentOS 8, and possibly on other contemporary Linux distribution releases by now, too. It can easily be checked by looking at the kernel.core_pattern setting. If it looks like the following, systemd-coredump is enabled:

# sysctl kernel.core_pattern
kernel.core_pattern = |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e

On other distributions like Ubuntu (at least up to 21.10) it is not enabled by default, but can be set up manually.

To see all recent core dumps on the system you can then simply run

# coredumpctl list

Or you can check for MariaDB Server core dumps specifically with:

# coredumpctl list /usr/sbin/mariadbd

If an actual core file got stored, you'll see "present" in the COREFILE column of the output. You can then extract the core file with:

# coredumpctl dump -o mariadbd.core ...PID...

using the process id number from the PID column, or when you just want to retrieve the latest MariaDB Server related entry:

# coredumpctl dump -o mariadb.core /usr/sbin/mariadbd

Starting with systemd 248 it is also possible to invoke the gdb debugger directly using the new --debugger-arguments=... option, e.g. making the extraction of all thread backtraces from the most recent MariaDB server crash a one liner without even having to extract the core dump file first (requires gdb to be installed):

# coredumpctl debug --debugger-arguments="-batch -ex 'thread apply all bt full'" /usr/sbin/mariadbd

So far none of the long-term-support Linux distribution releases have a new enough systemd version for this; the (as of this writing) still upcoming Ubuntu 22.04 "Jammy Jellyfish" will probably be the first to support it.

Core Dumps and setuid on Linux

Since mariadbd executes setuid, you may have to set fs.suid_dumpable=2 to allow core dumps on Linux. You can check the current fs.suid_dumpable value by using the sysctl utility. For example:

sysctl fs.suid_dumpable

You can temporarily set it to 2 by using the sysctl utility. For example:

sudo sysctl -w fs.suid_dumpable=2

Or you can temporarily set it to 2 by writing to the /proc file system. For example:

echo 2 | sudo tee /proc/sys/fs/suid_dumpable

If you want to permanently set it to 2, then you can add the following to a file in /etc/sysctl.d/:

sudo tee /etc/sysctl.d/mariadb_fs_suid_dumpable.conf <<EOF
fs.suid_dumpable=2
EOF

Note: If you don't want to change fs.suid_dumpable, then another solution is to start mariadbd directly as the mysql user, so that the setuid call is not needed.

Forcing a Core File on Linux

You can get a core dump from a running server with:

sudo gcore -o filename $(pidof mariadbd)

This will store a core dump in filename.pid, where pid is the process ID of mariadbd. mariadbd will continue running and will not be affected by gcore.

Another method is to force a core file for mariadbd by sending the process the SIGABRT signal, which has the signal number 6. This is very useful to get the state of an unresponsive mariadbd process. However, this will cause mariadbd to crash, and crash recovery will be run on restart.

You can send the signal with the kill command. For example:

sudo kill -6 $(pidof mariadbd)

As an alternative to $(pidof mariadbd), you can find the process ID either by using the ps utility or by checking the file defined by the pid_file system variable.
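For example, either of the following finds the process ID (the PID file path below is an assumption; it defaults to hostname.pid in the data directory and is controlled by the pid_file system variable):

```shell
# Using ps to list the mariadbd process ID
ps -C mariadbd -o pid=

# Or read the PID file named by the pid_file system variable
cat /var/lib/mysql/$(hostname).pid
```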

Core Files on Windows

Core dumps are enabled by default on Windows. See MDEV-18439 for more information.

There are some additional details related to using core files on Windows.

Minidump Files on Windows

On Windows, the core file is created as a minidump file.

For details on how to configure and read the minidump file, see How to read the small memory dump file that is created by Windows if a crash occurs.

Core Files on Kubernetes

See the IBM Core Dump Handler project.

Core Files and Address Sanitizer (ASAN)

If your mariadbd binary is built with Address Sanitizer (ASAN) then it will not be able to generate a core file.

What's Included in Core Files

Core files usually contain a dump of all memory in the process's full address space. This means that if a server has some large buffers configured (such as a large InnoDB buffer pool), then the server's core files can get very large.

Some large buffers have been excluded from core files on some systems as a way to reduce the size.

The following buffers are excluded:

  • InnoDB buffer pool

  • InnoDB log buffer

  • InnoDB Redo log buffer (fixed 2M)

  • Query cache

The buffers are only excluded on Linux when using kernel version 3.4 and above and when using a non-debug build of mariadbd. Some Linux kernel versions have a bug which would cause the following warning to be printed to the log:

Sep 25 10:41:19 srv1 mysqld: 2018-09-25 10:41:19 0 [Warning] InnoDB: Failed to set memory to DODUMP: Invalid argument ptr 0x2aaac3400000 size 33554432

In those cases, the core dump may exclude some additional data. If that is not a concern, then the warning can be ignored. The problem can be fixed by upgrading to a Linux kernel version in which the bug is fixed.

See Also

  • How to Produce a Full Stack Trace for mariadbd

  • HowTo: Debug Crashed Linux Application Core Files Like A Pro

  • A Nice Feature in MariaDB 10.3: no InnoDB Buffer Pool in Core Dumps

  • Getting MySQL Core file on Linux

This page is licensed: CC BY-SA / Gnu FDL

How to Produce a Full Stack Trace for mariadbd

Partial Stack Traces in the Error Log

When mariadbd crashes, it will write a stack trace in the error log by default. This is because the stack_trace option defaults to ON. With a normal release build, this stack trace in the error log may look something like this:

2019-03-28 23:31:08 0x7ff4dc62d700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.2.23/storage/innobase/rem/rem0rec.cc line 574
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
190328 23:31:08 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.2.23-MariaDB-10.2.23+maria~stretch
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=234
max_threads=752
thread_count=273
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1783435 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7ff4d8001f28
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7ff4dc62ccc8 thread_stack 0x49000
*** buffer overflow detected ***: /usr/sbin/mysqld terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7ffa09af5bfb]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7ffa09b7e437]
/lib/x86_64-linux-gnu/libc.so.6(+0xf7570)[0x7ffa09b7c570]
/lib/x86_64-linux-gnu/libc.so.6(+0xf93aa)[0x7ffa09b7e3aa]
/usr/sbin/mysqld(my_addr_resolve+0xe2)[0x55ca42284922]
/usr/sbin/mysqld(my_print_stacktrace+0x1bb)[0x55ca4226b1eb]
/usr/sbin/mysqld(handle_fatal_signal+0x41d)[0x55ca41d0a01d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x110e0)[0x7ffa0b4180e0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcf)[0x7ffa09ab7fff]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7ffa09ab942a]
/usr/sbin/mysqld(+0x40f971)[0x55ca41ab8971]
/usr/sbin/mysqld(+0x887df6)[0x55ca41f30df6]
/usr/sbin/mysqld(+0x863673)[0x55ca41f0c673]
/usr/sbin/mysqld(+0x96648e)[0x55ca4200f48e]
/usr/sbin/mysqld(+0x89b559)[0x55ca41f44559]
/usr/sbin/mysqld(+0x8a15e4)[0x55ca41f4a5e4]
/usr/sbin/mysqld(+0x8a2187)[0x55ca41f4b187]
/usr/sbin/mysqld(+0x8b1a20)[0x55ca41f5aa20]
/usr/sbin/mysqld(+0x7f5c04)[0x55ca41e9ec04]
/usr/sbin/mysqld(_ZN7handler12ha_write_rowEPh+0x107)[0x55ca41d140d7]
/usr/sbin/mysqld(_Z12write_recordP3THDP5TABLEP12st_copy_info+0x72)[0x55ca41b4b992]
/usr/sbin/mysqld(_Z12mysql_insertP3THDP10TABLE_LISTR4ListI4ItemERS3_IS5_ES6_S6_15enum_duplicatesb+0x1206)[0x55ca41b560f6]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x3f68)[0x55ca41b6bee8]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x28a)[0x55ca41b70e4a]
/usr/sbin/mysqld(+0x4c864f)[0x55ca41b7164f]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x1a7c)[0x55ca41b737fc]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x176)[0x55ca41b748a6]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x25a)[0x55ca41c3ec0a]
/usr/sbin/mysqld(handle_one_connection+0x3d)[0x55ca41c3ed7d]
/usr/sbin/mysqld(+0xb75791)[0x55ca4221e791]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7ffa0b40e4a4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7ffa09b6dd0f]

If you plan to report a bug about the problem, then this information can be very useful for MariaDB's developers to track down the root cause. However, notice that some of the function names in the call stack are missing. In some cases, this partial stack trace may not be enough to find out exactly where the problem is.

A full stack trace can only be produced if you have debugging symbols for your mariadbd binary.

Obtaining Debugging Symbols for Your mariadbd executable

Debug information is used by debugging tools to produce a meaningful stack trace. Importantly, these packages do not replace any existing production executables or in any way interfere with how the production server ran before they were installed.

If you are obtaining a backtrace for a core dump, you can move the core dump to a different server that has the identical mariadb-server and debuginfo packages, and perform the backtrace there with no loss of information.
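A sketch of that workflow (the hostname and core file path are placeholders):

```shell
# Copy the core file to an analysis host that has the identical
# mariadb-server and debuginfo packages installed
scp /var/lib/mysql/core.932 analysis-host:/tmp/

# Then analyze it there, e.g.:
#   gdb /usr/sbin/mariadbd /tmp/core.932
```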

Installing Debug Info Packages on Linux

On some Linux distributions, you may be able to install debuginfo packages that contain debugging symbols.

Note that debuginfo packages alone may not allow the server to print a nice stack trace in the error log. However, they do allow users to extract full stack traces from core dumps. See MDEV-20738 for more information.

Installing Debug Info Packages with yum/dnf

The MariaDB yum repository contains debuginfo packages.

On RHEL, CentOS, Fedora, and other similar Linux distributions, it is highly recommended to install the relevant RPM package from MariaDB's repository using yum or dnf. Starting with RHEL 8 and Fedora 22, yum has been replaced by dnf, which is the next major version of yum. However, yum commands still work on many systems that use dnf. For example:

sudo yum install MariaDB-server-debuginfo

See Installing MariaDB with yum/dnf: Installing Debug Info Packages with YUM for more information.

Installing Debug Info Packages with zypper

The MariaDB zypper repository contains debuginfo packages.

On SLES, OpenSUSE, and other similar Linux distributions, it is highly recommended to install the relevant RPM package from MariaDB's repository using zypper. For example:

sudo zypper install MariaDB-server-debuginfo

See Installing MariaDB with zypper: Installing Debug Info Packages with ZYpp for more information.

Installing Debug Info Packages from MariaDB's Debian or Ubuntu repository

These are for when you already installed MariaDB from a MariaDB mirror.

For Ubuntu an additional repository step is needed:

sudo add-apt-repository 'deb [arch=amd64,arm64,ppc64el,s390x]  https://ftp.osuosl.org/pub/mariadb/repo/10.5/ubuntu focal main/debug'

Adjust 10.5 to the major version you are debugging and focal to the required distribution.

apt-get update && apt-get install -y mariadb-server-core-10.5-dbgsym

From MariaDB 10.9 the version isn't included in the package name and mariadb-server-core-dbgsym can be used as the package name.

Installing Debug Info Packages packaged by Ubuntu or Debian

If you used the MariaDB versions provided by Debian or Ubuntu see the following links.

For Debian, see the AutomaticDebugPackages wiki page.

For Ubuntu, see the Debug Symbol Packages wiki page.

Installing Debugging Symbols on Windows

Debugging symbols are available to install on Windows.

Installing Debugging Symbols with the MSI Installer on Windows

Debugging symbols can be installed with the MSI installer. Debugging symbols are not installed by default. You must perform a custom installation and explicitly choose to install debugging symbols.

The MSI installer can be downloaded from the MariaDB downloads page.

Installing Debugging Symbols with the ZIP Package on Windows

MariaDB also provides a ZIP package that contains debugging symbols on Windows.

The ZIP package that contains debugging symbols can be downloaded from the MariaDB downloads page.

Containers with Debug Symbols

Prebuilt Debug Containers

These are currently only provided per major release version and are generated from CI. They are always built from the latest version of the main branch on GitHub.

They are available at quay.io/mariadb-foundation/mariadb-debug (see the repository's tags list).

Use the container name quay.io/mariadb-foundation/mariadb-debug:VERSION.

Where VERSION corresponds to the major version you wish to test with.

Creating a Container with Debugging Symbols

Build using a Containerfile/Dockerfile:

ARG VERSION=10.11
FROM mariadb:$VERSION

RUN apt-get update \
        && apt-get install -y gdb mariadb-server-core-dbgsym=$(dpkg -s mariadb-server-core |  awk '/Version/{print $NF}') ; \
        rm -rf /var/lib/apt/lists/*

Build with:

buildah bud --tag mariadb_debug --build-arg VERSION=10.11.6 -f Containerfile .

Then you will have a mariadb_debug container.

Example use:

podman run --entrypoint gdb --user mysql --volume mariadb_data:/var/lib/mysql mariadb_debug -ex r --args /usr/sbin/mariadbd

Enabling Core Dumps

To enable core dumps, see Enabling Core Dumps for details.

Where is the Core File on Linux?

At the bottom of the error log there will be some text about the core location including:

Writing a core file...
Working directory at /var/lib/mariadb
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
...
Max core file size        unlimited            unlimited            bytes     
...
Core pattern: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h

Kernel version: Linux version 6.0.0-0.rc2.19.fc38.x86_64 (mockbuild@bkernel01.iad2.fedoraproject.org) (gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-1), GNU ld version 2.39-2.fc38) #1 SMP PREEMPT_DYNAMIC Mon Aug 22 12:52:40 UTC 2022

If there was a core limit in the resource limits, there may be limited or no core file information.

If the core pattern begins with a |, then what follows is the executable that handled the core file during the crash. The following sections show a few techniques to access the core, depending on the pattern. If another program is used, look at its manual page to see how to get access to the core file.

If a plain core filename is in the "Core pattern" there's a good chance it will be in the Working directory location. It might have a .{process number} suffix on the filename.
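For example, to look for such a core file in the working directory (the directory shown here is the one from the log excerpt above):

```shell
# List candidate core files, newest first; names may be
# "core" or "core.<pid>"
ls -lt /var/lib/mariadb/core* 2>/dev/null
```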

Extracting a core file from a container

If you are running MariaDB in a container, the locations where the core dump can be generated are limited. The error log information will likely be in the container log. The "Core pattern" of a Linux system is a global, system-wide value. The consequence is that if the core pattern refers to a program, that program likely isn't in the container and won't be executed on the crash.

The system-wide crash handler can be changed with sysctl kernel.core_pattern=core to set this back to a file-based core dump. With this, the core file should appear in the working directory, normally the /var/lib/mysql data directory of the container volume.
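For example (a host-wide, temporary change; it reverts on reboot):

```shell
# Dump plain "core" files into the crashing process's working directory
sudo sysctl kernel.core_pattern=core

# Verify the setting
sysctl kernel.core_pattern
```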

Extracting a core file from systemd-coredump

For systemd-coredump, there is a program coredumpctl to manage access.

coredumpctl list
TIME                            PID   UID   GID SIG     COREFILE EXE                                                                                        >
Fri 2022-09-09 14:16:37 AEST 213571  1000  1000 SIGSEGV present  /usr/sbin/mariadbd

To load the most recent core dump in gdb, run coredumpctl debug (it defaults to the last crash). Then use the instructions in the next section for extracting information.

See also: extracting core dumps with systemd-coredump.

Extract a core file from abrt

A core pattern of |/usr/libexec/abrt-hook-ccpp indicates the abrt system is used.

abrt-cli is a command-line user interface for accessing the core file.

Extract a core file from apport

A core pattern of |/usr/share/apport/apport indicates apport.

For more information see Apport Project Wiki.

apport-retrace allows you to "Examine Locally" and run a gdb session. Once you have gdb started, the instructions in the next section can be used for extracting information.

Analyzing a Core File with gdb on Linux

To analyze the core file on Linux, you can use gdb.

For example, to open a core file with gdb, you could execute the following:

sudo gdb /usr/sbin/mariadbd  /var/lib/mysql/core.932

Be sure to replace /usr/sbin/mariadbd with the path to your mariadbd binary (might be mysqld on MariaDB 10.4 and earlier) and to also replace /var/lib/mysql/core.932 with the path to your core file.

Once gdb has opened the core file, if you want to log all output to a file, then you could execute the following commands:

set logging file /tmp/gdb_output.log
set logging on

If you do not execute set logging file, then the set logging on command creates a gdb.txt in your current working directory. Redirecting the output to a file is useful, because it can make it easier to analyze. It also makes it easier to send the information to a MariaDB developer, if that becomes necessary.

Do any commands that you would like to do. For example, you could get the backtraces.

Once you are done, you can exit gdb by executing the quit command.

Getting Backtraces with gdb on Linux

On Linux, once you have debugging symbols for your mariadbd binary, you can use the gdb utility to get backtraces, which are what gdb calls stack traces. Backtraces can be obtained from a core file or from a running mariadbd process.

Full backtraces are preferred and will contain function arguments, which can contain useful information such as query strings, so it can make the information easier to analyze.

To get a full backtrace of the main thread, then you could execute the following:

bt -frame-arguments all full

If you want to get a full backtrace of all threads, then you could execute the following:

thread apply all bt -frame-arguments all full

If you want to get a full backtrace to a file to report a bug, the recommended way is to use gdb:

set logging on
set pagination off
set print frame-arguments all
thread apply all bt full
set logging off

This will write the full backtrace into the file gdb.txt.

Getting Full Backtraces For All Threads From a Core File

Sometimes it can be helpful to get full backtraces for all threads. The full backtraces will contain function arguments, which can contain useful information such as query strings, so it can make the information easier to analyze.

To get full backtraces for all threads from a mariadbd core file, execute a command like the following:

sudo gdb --batch --eval-command="set print frame-arguments all" --eval-command="thread apply all bt full" /usr/sbin/mariadbd /var/lib/mysql/core.932  > mariadbd_full_bt_all_threads.txt

Be sure to replace /usr/sbin/mariadbd with the path to your mariadbd binary and to also replace /var/lib/mysql/core.932 with the path to your core dump.

The backtraces will be output to the file mariadbd_full_bt_all_threads.txt.

Getting Full Backtraces For All Threads From a Running mariadbd Process

Sometimes it can be helpful to get full backtraces for all threads. The full backtraces will contain function arguments, which can contain useful information such as query strings, so it can make the information easier to analyze.

To get full backtraces for all threads from a running mariadbd process, execute a command like the following:

sudo gdb --batch --eval-command="set print frame-arguments all"  --eval-command="thread apply all bt full" /usr/sbin/mariadbd $(pgrep -xn mariadbd)  > mariadbd_full_bt_all_threads.txt

Be sure to replace /usr/sbin/mariadbd with the path to your mariadbd binary.

The backtraces will be output to the file mariadbd_full_bt_all_threads.txt.

Sometimes very busy systems are too busy to batch obtain the backtrace. If this is the case, gcore $(pidof mariadbd) can save the core and then obtain the backtrace out of the dumped core.
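A sketch of that two-step approach (the output paths are assumptions):

```shell
# 1. Snapshot a core from the running server; gcore appends the PID
#    to the -o prefix and does not stop the server
sudo gcore -o /tmp/mariadbd.core $(pidof mariadbd)

# 2. Extract the backtraces from the saved core offline
sudo gdb --batch --eval-command="set print frame-arguments all" \
    --eval-command="thread apply all bt full" \
    /usr/sbin/mariadbd /tmp/mariadbd.core.$(pidof mariadbd) \
    > mariadbd_full_bt_all_threads.txt
```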

Getting a Full Backtrace out of a Container

If the crash or assertion is repeatable, it may be easiest to run mariadbd under gdb.

The container image name here can be a prebuilt one from quay.io/mariadb-foundation/mariadb-debug or an explicit version built yourself as above.

docker run -v datavolume:/var/lib/mysql/ --rm --user mysql  quay.io/mariadb-foundation/mariadb-debug:10.11 gdb -ex r  -ex 'thread apply all bt -frame-arguments all full'  --args mariadbd

In docker-compose.yml form this looks like:

services:
  mariadb:
    image: quay.io/mariadb-foundation/mariadb-debug:10.11
    volumes:
      - mariadbdata:/var/lib/mysql
    environment:
      - MARIADB_ROOT_PASSWORD=bob
    command: gdb -ex r  -ex 'thread apply all bt -frame-arguments all full'  --args mariadbd
    user: mysql
volumes:
  mariadbdata: {}

Note: the data volume is assumed to be already initialized. Omit command and user if it isn't.

If you wish to attach to an existing process in a container, the container needs to be started with the SYS_PTRACE capability. The sysctl kernel.yama.ptrace_scope that allows this should also be set to 0.
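For example, on the host:

```shell
# Check the current Yama ptrace restriction
sysctl kernel.yama.ptrace_scope

# Temporarily allow attaching to non-child processes (host-wide)
sudo sysctl kernel.yama.ptrace_scope=0
```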

$ podman run -v data:/var/lib/mysql/ --cap-add SYS_PTRACE --name mtest -d quay.io/mariadb-foundation/mariadb-debug:11.2
$ podman exec --user mysql mtest gdb -p 1 -ex  'thread apply all bt -frame-arguments all full'

Note: podman has the same arguments and behaviour as docker if you'd rather use that.

or in compose:

cap_add:
    - SYS_PTRACE

The container process is always PID 1, and here we use c, then thread apply all bt -frame-arguments all full, as the pre-loaded gdb commands. When a particular signal like an assertion failure or SEGV is triggered, the backtrace will be displayed.

Letting a Container coredump

First, the sysctl kernel.core_pattern needs to be core. If it starts with a pipe character, the kernel will try to execute the named handler, which likely doesn't exist within the container. This is a kernel-wide setting and cannot be applied to a specific container. The bottom of a crash report will show you what it is set to.

With this set, just running a container with debug info is sufficient. The core should be dumped on the data directory volume, where analysis can occur like this:

podman run --rm --user mysql --volume data:/var/lib/mysql -i mariadb_debug gdb --batch --eval-command="set print frame-arguments all" --eval-command="thread apply all bt full" /usr/sbin/mariadbd /var/lib/mysql/core | tee mariadbd_full_bt_all_threads.txt

Running a Copy of the Database Directory

If you are concerned with debuggers running on your production database you can also copy the database to another location.

This is useful when you know which statement crashed the server.
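A minimal sketch of making such a copy (the paths are placeholders; stop the server first, or copy from a consistent backup, so the files are not changing underneath you):

```shell
sudo systemctl stop mariadb
sudo cp -a /var/lib/mysql /copy-of-original-data-directory
sudo systemctl start mariadb
```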

Just start mariadbd with the following options:

mariadbd --datadir=/copy-of-original-data-directory --core-file --stack-trace --socket=/tmp/mariadbd-alone.sock --skip-networking

Disabling Stack Traces in the Error Log

In order to disable stack traces in the error log, you can configure the skip_stack_trace option either on the command-line or in a relevant server option group in an option file. For example:

[mariadb]
...
skip_stack_trace

Reporting the Problem

If you encounter a problem in MariaDB, then MariaDB's developers would appreciate it if you would report a bug at the MariaDB JIRA bug tracker. Please include the following information:

  • Your full stack trace.

  • Your error log.

  • Your option files.

  • How to reproduce the problem.

  • SHOW ENGINE INNODB STATUS

  • SHOW CREATE TABLE {table} output (for each table in the query) and EXPLAIN {query} output, if the crash is related to a query.

A MariaDB FTP server is available for large and/or sensitive information. Please upload it as a .tar.gz or .zip archive.
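As a sketch of packaging the files for upload (all paths below are placeholders; substitute your actual error log, option files, and core dump):

```shell
# Collect the relevant files into one directory (paths are examples).
mkdir -p mdb-report
cp /var/lib/mysql/*.err mdb-report/ 2>/dev/null || true
cp /etc/my.cnf mdb-report/ 2>/dev/null || true

# Create a .tar.gz archive suitable for the FTP server.
tar -czf mdb-report.tar.gz mdb-report
```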

For very difficult or critical errors, you should consider uploading the following information to the MariaDB FTP server:

  • Your build of mariadbd (if you compiled it), otherwise version information on the mariadb-server package.

  • Your core file.

  • Your contact information.

  • The associated JIRA issue identifier for the bug, if you reported a bug.

This information will allow MariaDB developers to analyze it and try to create a fix.

This page is licensed: CC BY-SA / Gnu FDL

How to Use procmon to Trace mysqld.exe Filesystem Access

This article provides a walkthrough on using the Process Monitor on Windows, tracing file system access by mysqld.exe during the "install plugin" call.

Download

Process Monitor is an advanced monitoring tool for Windows that shows real-time file system, registry, and process/thread activity. It is part of the Sysinternals suite developed by Mark Russinovich and Bryce Cogswell. Process Monitor can be downloaded directly from ProcessMonitor.zip. More information can be found at https://learn.microsoft.com/en-us/sysinternals/downloads/procmon

Installation

There is no installation necessary; the single executable can be used after unpacking. I suggest putting procmon into some directory in the PATH environment variable.

Example of taking a mysqld.exe trace

The purpose of the following exercise is to learn how to use procmon to trace mysqld.exe calls to the filesystem.

We assume that mysqld.exe is already started.

  1. Start procmon.exe. A dialog will pop up that offers to set a filter. Use this dialog to set the filter to "Process name" "is" "mysqld.exe", as shown in the screenshot below.

Filter Setup

Click the "Add" button to include mysqld.exe in the filter, then "Apply" and "OK".

  2. Capture events (menu File => Capture Events, Ctrl+E).

  3. Start the mysql command-line client and connect to the server. Execute

mysql> install plugin blackhole soname 'ha_blackhole.dll';
Query OK, 0 rows affected (0.03 sec)

  4. Save the trace

Back in the Process Monitor window, you should see the filesystem events initiated by the "INSTALL PLUGIN" operation

Process Monitor Events

To save it, choose File/Save.

(Advanced) Seeing stack traces corresponding to events

It is also possible to see stack traces corresponding to the events. For this to work, symbol support needs to be configured. This only needs to be done once.

  1. Install Debugging Tools for Windows (google on how to do that).

  2. Switch to Process Monitor's menu Options => Configure symbols.

  3. Add dbghelp.dll from your installation of Debugging Tools into the "dbghelp.dll path" input field. On my system it is C:\Program Files\Debugging Tools for Windows (x64)\dbghelp.dll

  4. In the "symbol path" input field, add srv*C:\symbols*https://msdl.microsoft.com/download/symbols;<path\to\your\installation\bin> (substitute the last path element with the real path to your installation)

This is how it looks on my machine:

Symbol Config

Once symbols are configured, you'll get a stack trace corresponding to a filesystem event by simply double-clicking the line corresponding to the event. This is what I see after clicking on the first event of my tracing session (it corresponds to opening the my.ini file)

Callstack

It is also possible to save the whole trace with callstacks as text (File/Save, choose XML, include callstack + resolve callstack).

This page is licensed: CC BY-SA / Gnu FDL

Limitations/Differences with a MariaDB Server Compiled for Debugging

A MariaDB server configured with --with-debug=full has the following differences from a normal MariaDB server:

  • You can have a maximum of 1000 tables locked at the same time in one statement (defined by MAX_LOCKS in mysys/thr_lock.c). This is to detect loops in the used lists.

  • You can have a maximum of 1000 threads locking the same table (defined by MAX_THREADS in mysys/thr_lock.c). This is to detect loops in the used lists.

  • Deadlock detection for mutexes is done at runtime. If wrong mutex handling is found, an error is printed to the error log. (Defined by SAFE_MUTEX)

  • Memory overruns/underruns and memory that is not freed will be reported to the error log. (Defined by SAFEMALLOC)

  • You can get a trace of what mysqld (and most other binaries) is doing by starting it with the --debug option. The trace is usually written to /tmp or C:\

This page is licensed: CC BY-SA / Gnu FDL

MariaDB Development

MariaDB Server Roadmap

This page talks in general about the MariaDB Server roadmap, and how it is formed. The roadmap it describes is located in our JIRA issue tracker: jira.mariadb.org

MariaDB was designed as a drop-in replacement of MySQL with more features, new storage engines, fewer bugs, and better performance. We aim to make upgrading from MySQL to MariaDB extremely easy.

The roadmap for MariaDB Server is formed by its engineers and product managers, with input from the community.

MariaDB Server developers work with several storage engine vendors and developers to get the best storage engines into MariaDB Server.

The collected information is then turned into specific tasks which are then prioritized and added to the roadmap in the JIRA issue tracker at jira.mariadb.org

Goals and Quality Standards

The primary goal of MariaDB Server is for it to be a practical database developed in the direction users and developers want it to be developed. Each feature should ideally be developed by or for users who want to test and put the feature into production ASAP — this helps ensure we don't implement features no one wants or needs.

We are also putting a lot of effort into speeding up MariaDB Server, and to keep it stable and easy to use!

The MariaDB Server source tree is maintained by MariaDB the company and its developers. They are the primary contributors to the MariaDB Server project and the ones who are ultimately responsible for the quality of the code.

MariaDB Server 5.1, MariaDB Server 5.2, and MariaDB Server 5.3 were built off of MySQL 5.1.

MariaDB Server 5.5 was a combination of MariaDB Server 5.3 and MySQL 5.5.

MariaDB Server 10.0, and later build off of the previous MariaDB Server releases with backported features from MySQL and entirely new features not found anywhere else.

Short descriptions of the various MariaDB Server releases and their main new features can be found on the MariaDB Server Releases page.

Plans

MariaDB 11.8 is an upcoming major development release.

JIRA

We manage our development plans in JIRA, so the definitive list will be there. This search shows what we currently plan for 11.8. It shows all tasks with the Fix-Version being 11.8. Not all these tasks will really end up in 11.8 but tasks with the "red" priorities have a much higher chance of being done in time for 11.8. Practically, you can think of these tasks as "features that will be in 11.8". Tasks with the "green" priorities probably won't be in 11.8. Think of them as "bonus features that would be nice to have in 11.8".

Contributing

If you want to be part of developing any of these features, see Contributing to the MariaDB Project. You can also add new features to this list or to JIRA.

See Also

  • Current tasks for 11.8

  • 11.8 Features/fixes by vote

  • What is MariaDB 11.8?

  • Plans for MariaDB 11.7

  • Plans for MariaDB 11.6

This page is licensed: CC BY-SA / Gnu FDL

Quality Assurance

This section collects articles related to MariaDB quality assurance efforts

Code Coverage

We are working on getting more of the MariaDB source covered by our mysql-test-run (MTR) test suite. This is an ongoing (and slow) task as there is still a lot of old code with not very good coverage.

Goals for new code

For new code in MariaDB, we aim much higher. The goals are:

  1. All new lines of code should ideally be tested by MTR.

  2. Code which cannot reasonably be tested by MTR needs to be tested by another tool and those code lines marked with /* purecov: tested */.

  • In this case the tool used for testing should be documented in the worklog entry for the code or in the commit message.

  3. Code that can't reasonably be tested (such as error conditions) should be marked with /* purecov: inspected */ so that a reviewer of the code can easily spot this code.

  4. Code that is suspected to be dead code should have a 'DBUG_ASSERT(0)' or be marked with /* purecov: deadcode */ so that we have a chance to notice if the code is ever executed.

The reason we are using 'purecov' to mark lines is as a tribute to the purecov tool we originally used for code coverage in the early years of MySQL.

Markers

The recommended markers are:

/* purecov: tested */

  • For code lines that are tested by something other than mysql-test-run:

/* purecov: inspected */

  • For code lines that are hard to test but for which one has read the line multiple times to ensure it is correct. A code reviewer should also inspect these lines with care as they have not been properly tested.

/* purecov: deadcode */

  • For code lines that one suspects will never be called. Having this marker allows us to generate a warning during mysql-test-run code coverage if this line is executed.

The comment must be placed on the line/lines that are affected.

For code blocks larger than 1 line one can use the block syntax:

/* purecov: begin tested */
....
/* purecov: end */

Running mysql-test-run with gcov

Prerequisites

  1. First make sure that gcov 4.9 or later is installed. Older versions of the gcov library (libgcov) can't handle running several instances of a program in parallel. This causes the generated .gcov files to not contain all executed lines when running mysql-test-run with the --parallel option or when running tests that start several mysqld servers, like replication or spider tests.

  2. Compile MariaDB with BUILD/compile-pentium64-gcov (if your machine does not have a pentium CPU, hack this script, or just live with the pentium-specific stuff)

Running mysql-test-run

To be able to see the level of coverage within the current test suite, do the following:

  1. In the mysql-test directory, run this command: ./mysql-test-run -gcov

  2. To see which lines are not yet covered, look at source_file_name.gcov in the source tree. In MariaDB 10.1 or below it's in the CMakeFiles directory where the object files are stored. In MariaDB 10.2 it's stored together with the source files.

  3. Think hard about a test case which will cover those lines that are not tested, and write one.

Tools

  • You can use the dgcov tool to check the coverage for new code. It is especially written and maintained for MariaDB.

  • For code coverage you can also use the lcov tool.

Code coverage in buildbot

Buildbot, the MariaDB build system, does automatic coverage testing for each push.

This page is licensed: CC BY-SA / Gnu FDL

Code Coverage with dgcov

The dgcov tool helps you check the coverage for new code. The dgcov.pl script is part of the mariadb-test framework (and any packages that include mariadb-test).

Overview

The dgcov program runs gcov for code coverage analysis, aggregates the coverage data, and (optionally) reports coverage only for those lines that are changed by the commit(s). Commits are specified in the git diff format.

If no commits are specified, the default is to work on all uncommitted changes, if any, otherwise on the last commit (in other words, on git diff HEAD or git diff HEAD^).

It's recommended that a developer runs dgcov on their new code before pushing it into a MariaDB repository.

Usage

./dgcov.pl --help
./dgcov.pl [options] [<commit> [<commit>]]

Options and Variables

| Short Option | Long Option | Description |
| --- | --- | --- |
| -h | --help | Print help and exit |
| -v | --verbose | Show commands run |
| -p | --purge | Delete all test coverage information, to prepare for a new coverage test |
| -o | --only-gcov | Stop after running gcov, don't run git |
| -s | --skip-gcov | Do not run gcov, assume .gcov files are already in place |
| -g | --generate | Create .dgcov files for all source files |

How to Prepare the Code for dgcov

Prior to running this tool, MariaDB should be built with

cmake -DENABLE_GCOV=ON

and the testsuite should be run. dgcov will report the coverage for all lines modified in the specified commits.

Output

Output .dgcov files have a conventional gcov format: lines not covered are prefixed with #####, lines without generated code are prefixed with -, and other lines are prefixed with the number of times they were executed. See info gcov for more information.

The patch-like coverage for commits uses the gcov format (as above) for lines changed in these commits, and no prefix at all for lines that were not changed.
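To illustrate the format described above, here is a hypothetical .dgcov excerpt (the line numbers, counts, and code are invented), filtered for uncovered lines the way a reviewer might scan it:

```shell
# Write a hypothetical .dgcov excerpt: executed lines carry a count,
# '#####' marks uncovered lines, '-' marks lines with no generated code.
cat > sample.dgcov <<'EOF'
        5:  101:  if (a > b)
    #####:  102:    return error_path();
        -:  103:  /* comment only, no code generated */
        5:  104:  return ok;
EOF

# Count the uncovered lines.
grep -c '#####' sample.dgcov   # prints 1
```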

Examples

Checking the coverage for all unpushed commits:

dgcov.pl @{u} HEAD

Checking the coverage for all uncommitted changes:

dgcov.pl HEAD

Checking the coverage for a specific commit 1234567:

dgcov.pl 1234567^ 1234567

mariadb-test-run can invoke dgcov automatically:

./mtr --gcov

In the latter case, the coverage for the uncommitted changes (or the last commit) will not be printed to stdout, but will be put into the var/last_changes.dgcov file.

Caveats

Note that to be able to run gcov with the mariadb-test framework you need to have gcc version 4.8 or newer.

References

  • dgcov was created by Kristian Nielsen and was first announced here.

  • dgcov was re-implemented to aggregate the data and to work for git and cmake by Sergei Golubchik.

This page is licensed: CC BY-SA / Gnu FDL

Optimizer Quality

Generating the Queries

As of Nov 2010, there are 5 primary SQL grammars available for testing the Optimizer:

  • optimizer_nosubquery.yy generates random queries with no subselects, with up to 3-way joins and with various SQL clauses such as aggregates, GROUP BY, HAVING, and LIMIT;

  • optimizer_subquery.yy generates queries with subselects with up to 2 levels of nesting. Subqueries are placed in various locations, such as in the SELECT list, in the ON clause, etc. Aggregates, LIMIT, HAVING, etc. are used if allowed by the server;

  • outer_join.yy generates outer joins of various depths;

  • range_optimizer.yy tests the range optimizer by joining several tables and generating various conditions on which range optimization is likely to be applied;

  • range_optimizer2.yy tests the range optimizer by generating single-table queries that contain a lot of range-optimizable clauses. Avoiding joins allows the single table to be arbitrarily large, thus allowing for more interesting range overlaps;

Validating the Results

As of Nov 2010, the RQG has two primary modes for validating the results:

  • by using a reference implementation. This can be PostgreSQL, JavaDB, or another version or flavor of MySQL/Maria/Drizzle. Testing this way requires trusting external software that is not under our control. Also, it is sometimes difficult to determine which implementation has returned the correct result. Technically, 3 implementations can "vote" as to which is the correct result, but this is not reliable if the implementations all derive from one another.

  • by executing the generated query using a different execution plan. This is usually achieved by disabling particular optimizations and thus "downgrading" the plan to a more basic, presumed stable one. It is assumed that a nested-loop-join that takes no advantage of indexes would always provide the correct result. The advantage of this approach is that there is no need for a reference implementation and the source of the wrong result can be obtained by diffing the original and the downgraded execution plan.

In addition to result set validation, there is a module which executes each generated SELECT in various contexts, such as part of a union, stored procedure, trigger, etc. and makes sure that the query returns a correct result. This is most often used for testing subselects.

Running a Complete Test Cycle

A test cycle is described in a configuration file called the CC file. The CC file contains a list of mysqld options to use, the list of grammars to use and other settings (e.g. Engines, table sizes, etc.). The testing framework will then take a random permutation from the settings described in the file and run them as a RQG test for a predefined time, such as 10 minutes. This is repeated up to 100 times, each with a different random permutation. The PRNG seed for each run will also be different, so different queries will be generated for each run, in addition to using different mysqld options, engine, etc.

By default, all cycles include MyISAM, Aria and InnoDB, and some percentage are run under Valgrind. Cycles run with both NULL and NOT NULL fields and with and without simple views.

Configuration for Join Cache Testing

outer_join_with_cache is always ON. --join_cache_level varies from 0 to 8. --join_buffer_size varies between 1, 100, 1K, 10K and 100K. The optimizer_no_subquery.yy, outer_join.yy and range_access.yy grammars are used. Once semijoin is stable, join_cache + semijoin will be tested with optimizer_subquery.yy.

Configuration for MRR/ICP/DS-MRR-CPK Testing

--optimizer_use_mrr is ON, mrr_sort_keys is both ON and OFF, index_condition_pushdown is both ON and OFF, join_cache_level is between 0 and 8, join_buffer_size and mrr_buffer_size are 1, 100, 1K, 10K and 100K. The optimizer_no_subquery.yy, outer_join.yy, range_access.yy and range_access2.yy grammars are used.

Configuration for Subquery Testing

The optimizer_subquery.yy grammar is used. Each individual optimizer_switch flag related to subquery optimization may be disabled so that the "second best" plan is generated.

Testing MWL#89

When testing MWL#89, the following optimizer_switch combinations are used: in_to_exists=ON,materialization=OFF; in_to_exists=OFF,materialization=ON; and in_to_exists=ON,materialization=ON. In addition, semijoin is always OFF to force more queries to use materialization/in_to_exists. subquery_cache is OFF to prevent subquery cache bugs from showing up during the test.

See Also

  • RQG Documentation

  • RQG Performance Comparisons

  • RQG Extensions for MariaDB Features

  • QA Tools

  • Worklog Quality Checklist Template

This page is licensed: CC BY-SA / Gnu FDL

QA - Aria Recovery

General Principles

Recovery is tested via the RQG, which provides a random workload against the server, and then uses kill -9 to kill the process. After that, recovery is attempted both by using maria_read_log and by restarting the mysqld process. Once the server has started up, the tables are verified in various ways, including ALTER|OPTIMIZE|ANALYZE|REPAIR TABLE as well as SELECT queries that read the table back and forth using various access methods.

A combinations .CC file named lp:randgen/conf/engines/maria/maria_recovery.cc is used to define various mysqld options and RQG parameters that are relevant to recovery. Then, RQG's combinations.pl script is used to run hundreds of individual test runs. Each run uses a random permutation of the settings in the .CC file in order to generate a unique workload that is then validated via the Recovery RQG Reporter.

Individual tests

The following are the individual tests or test runs that must be completed or created in order to ensure that Aria recovery is solid.

Standard kill -9 testing

Done 2011-02-28. The standard conf/engines/maria/maria_recovery.cc passes with no failures when run with hundreds of trials.

Testing with small block sizes

On hold pending 2 bug fixes related to --maria-block-size=1K and --maria-block-size=2K

Testing with small page cache size

Done 2011-03-04. Completed 400 rounds with

'--mysqld=--maria-block-size=4K --mysqld=--maria-pagecache-buffer-size=128K',
'--mysqld=--maria-block-size=16K --mysqld=--maria-pagecache-buffer-size=256K',
'--mysqld=--maria-block-size=32K --mysqld=--maria-pagecache-buffer-size=512K'

Two pre-recovery crashes were filed; no recovery issues were found.

Killing and restarting the recovery process itself

In Progress

The AriaDoubleRecovery reporter currently attempts double recovery via maria_read_log. The first invocation of maria_read_log is killed halfway through the process and the second invocation is left to complete the recovery.

Future testing will involve doing the same with the mysqld server in place of maria_read_log.

Another realistic workload

The usefulness of the SMF workload, derived from the SimpleMachines forum application, means that another such workload is required in order to make sure no residual recovery bugs remain. Hopefully something can be cooked up using Wikipedia so that longer fields and blobs are exercised.

Transactional consistency

A transactional grammar that simulates transactions using LOCK TABLES is required. The RecoveryConsistency Reporter can then be used to validate that no partial transactions appear in the database after recovery.

See also

  • RQG Performance Comparisons

  • RQG Extensions for MariaDB Features

  • Optimizer Quality

  • QA Tools

  • Worklog Quality Checklist Template

This page is licensed: CC BY-SA / Gnu FDL

QA Metrics

Code coverage

The gcov/lcov reports are run by BuildBot and can be viewed here. The tests that are run to obtain the coverage percentage are listed under the f_gcov Factory in the BuildBot configuration file.

Bugs

The list of MariaDB bugs is available on JIRA.

This page is licensed: CC BY-SA / Gnu FDL