Sponsorships are crucial for ongoing and future development of the MariaDB project! There are a number of easy ways for you to help the project:
Contribute with developer time. If your organization has talented developers familiar with the MariaDB or MySQL codebase, they can become part of the MariaDB team and contribute to the development of the MariaDB project.
Hire a developer that you dedicate to work on the MariaDB project.
A pure donation, with no strings attached.
This page is licensed: CC BY-SA / Gnu FDL
MariaDB has participated in Google Summer of Code since 2013. This section contains pages providing information for each year.
We believe we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently , , , ) and on , which allows you to scale your reads & writes. Lately, we also have , which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.
MariaDB applied to participate in the first .
Contributing Code
For contributors interested in MariaDB development, explore open projects via JIRA and check for beginner-friendly tasks. Engage with the community on the maria-developers mailing list, Slack, Zulip, or IRC channel for guidance.
Contributing to the MariaDB Project
The success of MariaDB relies heavily on community involvement. You can contribute in various ways, even if you are not a developer:
Bug Reporting: Create an account to report bugs.
Knowledge Sharing: Answer questions in the documentation or on IRC and Zulip.
Testing: Upload bug tests to the FTP server for private testing.
Documentation: Write or translate articles and documentation for MariaDB.
Advocacy: Promote MariaDB and participate in open source events.
Support: Donate time or resources, or engage your company in supporting MariaDB.
Explore more ways to get involved on the .
The success of MariaDB depends on the participation of the community; it would not be as good as it is today without the contributions of the entire MariaDB community.
There are several ways to contribute besides , and not all of them require strong C/C++ skills. Areas for non-developers include:
To report a bug, you'll need to sign up for an account by clicking on the Create an account link below the login fields.
Helping other people by answering problems or even fixing their bugs on , in the #maria channel on , or on MariaDB's Zulip instance at .
A great way to get started in MariaDB is to participate in e-mail discussions via our mailing lists (whichever list best matches your interests):
Sensitive security issues can be sent directly to the persons responsible for MariaDB security: security [AT] mariadb (dot) org.
You can find additional email addresses, email archives and ways to connect with MariaDB people .
All MariaDB contributors are expected to follow the .
See also the pages for new developers on the MariaDB Foundation website:
This page is licensed: CC BY-SA / Gnu FDL
MariaDB provides a secure FTP, SFTP and WebDAV server where you can upload files to be used by MariaDB developers, for example table structures and data for bug reports.
The folder tree consists of:
The public folder for files that the MariaDB developers want to give the public access to (patches, samples etc).
The private folder for uploads. Files uploaded there can only be accessed by MariaDB developers. You will not be able to see your upload and this folder does not allow downloads. This is done to protect any sensitive information which may be in test results, mysqld & core files. Upload those into this folder.
The secret folder is for private downloads. Files in this folder are not visible so you will need the complete filename to successfully download a file from this folder.
To share files with MariaDB developers, upload them into the private directory with either:
SFTP client (scp), enter 'anonymous' as the password:
You can ignore the 'fsetstat: Permission denied' error.
WebDAV client (curl):
FTP client (lftp); enter 'anonymous' as the password:
You can ignore the 'network error'.
Note for MariaDB developers: if you do not already have access to the SFTP service, please request it at ftp@mariadb.org (provide your public SSH key and username). You will then be able to access the service with:
or with HTTPS at .
For contributors interested in MariaDB development, explore open projects via and check for . Engage with the community on the mailing list, , , or channel for guidance.
General information about contributing to MariaDB (for developers and non-developers) can be found on the page.
There are many open development projects for MariaDB which you can contribute to (in addition to any ideas you may have yourself).
We participated in Google Summer of Code 2013. MariaDB and the MariaDB Foundation believe we are making a better database that remains a drop-in replacement for MySQL. We also work on making LGPL connectors (currently in C, Java, C++ in development) and we also work on MariaDB Galera Cluster, which allows you to scale your reads & writes.
Please join us at irc.freenode.net at #maria to mingle with the community. Or subscribe to . Or both.
Please keep in mind that in April we travel a lot (conferences, busy time), so if you have a question and nobody on IRC answers — do not feel disappointed, ask in an email to maria-developers@lists.launchpad.net.
Testing and Benchmarking
Bug tests can be uploaded to the 'private' directory of our FTP server.
Creating documentation for MariaDB.
Advocating MariaDB in your area.
Participate in open source events and talk about MariaDB.
Donate time or money to the MariaDB project.
Ask your company to sponsor a feature.
We are using JIRA to manage the MariaDB project. Go to jira.mariadb.org and click on "Projects" to get to the MariaDB project. Browse around the unresolved and unassigned issues to see if there is something that interests you. Some issues have sponsors and you can be paid for doing them!
A list of beginner friendly tasks is also available.
Check the development plans for the next MariaDB version.
Join maria-developers and ask for suggestions of tasks you could do. Please include in the email your programming experience, your knowledge of the MariaDB source, and how much you know about using MySQL/MariaDB, so that we know which tasks to suggest to you.
If this is your first project, check out the page. It lists projects that will make a good start.
Join MariaDB's Zulip instance at and ask for suggestions.
Join #maria on and ask for suggestions.
If you have your own ideas, please submit them to JIRA so other MariaDB developers can comment on them and suggest how to implement them. You can of course also use the maria-developers list for this.
This section is mainly directed to developers with commit rights to the MariaDB git repository. However, we hope it’s also useful for anyone wanting to contribute code to MariaDB to know what a reviewer will expect from them.
This is not about coding style or whether one should prefer C over C++; that is a separate topic that should be covered sooner or later.
When coding, try to create code that 'never has to be changed again'. Try to make the code as performant as possible. In general it is acceptable to spend 50% more time to make the code 15% faster than what you originally intended. Take that into account when you plan your time estimates! That said, don't try to add classes or functionality that is not yet used.
The code should be easy to read and follow the coding standards of the project. Patches that are smaller and simpler are often better than complex solutions. Don't make the server depend on new external libraries without first checking with Sergei or Monty!
Add code comments for anything that is not obvious. When possible, use assertions within the code to document expectations of arguments etc. In general, if the code requires complex comments, think if there is a better way to structure the logic. Simpler is often better and with fewer bugs.
Jira issue number and summary, e.g.: MDEV-23839 innodb_fast_shutdown=0 hang on change buffer merge
An empty line
A short description of the problem
A description of the solution
Any extra information needed to understand the patch
The commit message should be self-contained; the reviewer should preferably not have to look at the Jira issue at all to understand the commit. This doesn't mean that the commit message should include all the background and the different design options considered, as the Jira issue should.
Names of all reviewers and authors should be clear from the commit message. The preferred format is one line per person:
Reviewed-by: email
Co-authored-by: email
See for details
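Putting the pieces above together, a commit message skeleton would look roughly like this (the placeholder lines and example addresses are illustrative only, using the sample summary from above):

```
MDEV-23839 innodb_fast_shutdown=0 hang on change buffer merge

<short description of the problem>

<description of the solution>

<any extra information needed to understand the patch>

Reviewed-by: reviewer@example.org
Co-authored-by: co-author@example.org
```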
The default is that all code should be reviewed. Only in really extraordinary cases, like a merge (where the original code was already reviewed), can it be self-reviewed, and this should be clear from the commit message. In this case the code should of course be tested extra carefully, both locally and in buildbot, before pushing.
All code should have a test case that shows that the new code works or, in the case of a bug fix, that the problem is fixed! It should fail with an unpatched server and work with the new version. In the extreme case that a test case is practically impossible to write, there needs to be documentation (in the commit message, optionally also in Jira) of how the code was tested.
The test case should have a reference to the Jira issue, if one exists.
Patches related to performance should be tested either by the developer (for simple commits) or by performance testers. The result should be put in Jira with a summary in the commit.
Complex patches should be tested by QA in a bb- branch before pushing. The Jira entry should include information that this has been done and what kind of test has been run.
Example: git push --force origin HEAD:bb-11.8-MDEV-1234
For anything non-trivial, one should run either Valgrind or ASAN/MSAN on the new code. (Buildbot will do this for you if you can't get Valgrind or ASAN to work.) At least the added test case should be run under one of them. If the developer cannot do that for some reason, they should check the buildbot builders that do this and ensure that at least their test case doesn't give any warnings about use of uninitialized memory or other failures.
For complex code the developer should preferably use gcov or some similar tool to ensure that at least all non-error branches are executed. "mtr --gcov" or "dgcov.pl" can help you with this.
All code in MariaDB comes from one of the following sources:
MySQL
Code developed by people employed by the MariaDB Foundation.
Code developed by people employed by MariaDB Corporation.
Code shared with the MariaDB Foundation under the MCA.
Code with a known origin that is under a permissive license (BSD or public domain).
If you want the code to be part of the main MariaDB tree, you also have to give the MariaDB Foundation a shared copyright to your code. This is needed so that the foundation can offer the code to other projects (like MySQL).
You do this by either:
Signing the MariaDB Contributor Agreement (MCA) and then scanning and sending it to the foundation.
Sending an email to maria-developers where you say that your patch and all fixes to it are provided to the MariaDB Foundation under the MCA.
Licensing your code using the BSD license.
We need shared copyright for the following reasons:
to defend the copyright or GPL if someone breaks it (this is the same reason why the Free Software Foundation also requires copyright assignment for its code)
to be able to donate code to MySQL (for example to fix security bugs or new features)
to allow people who have a non-free license to the MySQL code to also use MariaDB (the MCA/BSD allows us to give those companies the rights to all changes between MySQL and MariaDB so they can use MariaDB instead of MySQL)
More information about the MCA can be found on the MCA FAQ page.
Ensure that everything compiles with your new code in a debug server (configured with cmake -DCMAKE_BUILD_TYPE=Debug), including embedded and all plugins that may be affected by your code change.
Run the mysql-test-run (mtr) test suite locally with your debug server.
For anything complex the full test suite should be run.
For something absolutely trivial, at least the main suite must be run.
Always push first to a bb- branch to test the code. When the bb- branch is green in , you can push to the main branch. Check that the Windows builds compile (take extra care here, as this often fails) and that the valgrind and MSAN builds don't show any problems with your new test cases.
You can find your push at a link similar to .
If you have to do a rebase before pushing, you have to start from the beginning again.
When porting code from third parties (such as MySQL), make sure to attribute copyright to the right owner, in the header of each modified file.
For example: Copyright © 2000, 2018, Oracle and/or its affiliates. Copyright © 2009, 2020, MariaDB
The only exception is if the changes are trivial, the rebase was trivial, and the local mysql-test-run worked; then you can push directly to the main branch. Only do this if you are 99% sure there are no issues! Please don't make us regret that we have made this one exception! When we have protected git branches, the above rule will be enforced automatically, as the protection will take care of this.
First create a Jira entry that explains the problems and the different solutions that can be used to solve the problem. If there is a new syntax include examples of queries and results.
After getting an agreement of the to-be-used solution, update the Jira entry with the detailed architecture of the suggested solution.
When the architecture is reviewed, the assigned developer can start coding.
When the code is ready, the Jira entry should be updated with the reviewer.
The reviewer checks the code and either approves it to be pushed or gives comments to the developer that should be fixed. In the latter case the developer updates the code and gives it back to the reviewer. This continues until the code is approved.
If the design changes during the project, the design in Jira needs to be updated.
Ensure that the Jira issue is up to date.
For complex bugs that require redesign, follow the process in "Working on a new project"
For simpler bugs, one can skip the listing of different solutions and architecture. However, one should still document the reason for the bug and how it's fixed or to-be-fixed, in a JIRA comment.
Ensure that the code compiles and all MTR tests pass before asking for a review.
Try to split a bigger project into smaller, self-contained change sets.
Mechanical changes, like renames of classes, variables, functions, etc., are better done in a separate commit.
Remember that the stability and security of any project hangs a lot on the reviewers. If there is something wrong with an accepted patch, it's usually the reviewer who is to be blamed for it, as the reviewer was the one who allowed it to go in!
Ensure that the code is licensed under New BSD or another approved license for MariaDB (basically any open source license not conflicting with GPL) or that the contributor has signed the MCA.
GPL is only allowed for code from MySQL (as MariaDB is already depending on MySQL code).
Ensure that commits are not too large. If the code is very large, give suggestions how to split it into smaller pieces. Merge commits, when rebasing is possible, are not allowed, to keep history linear.
Check that the commit message describes the commit properly. For code that improves performance, ensure that Jira and the commit message contain information about the improvements.
Check that there are no unexplained changes in old tests.
Check the quality of the code (no obvious bugs, right algorithms used).
Check if any code can be simplified or optimized: are existing functions reused, are loops optimal, are mutexes used correctly, etc.
Check that there is an appropriate test case for the code. See ‘testing’ for what is required!
Ensuring the code follows the coding standard for MariaDB. This document should be created shortly, but in the meantime ask an experienced MySQL/MariaDB developer if you are unsure.
Ensuring that the code follows the architecture agreed on in Jira (if it's in Jira).
Code should be easy to understand (good code comments, good function and variable names etc).
Ensure you understand every single line of code that is reviewed. If not, ask the developer to add more comments to get things clear or ask help from another reviewer.
No performance degradations in common cases.
Any code that touches any sensitive area (files, communication, login, encryption or security) needs to have another reviewer that is an expert in this area.
Getting Started For Developers (mariadb.org)
Get the Code, Build It, Test It (mariadb.org)
Writing Good Test Cases for MariaDB Server (mariadb.org)
Submitting a Pull Request (mariadb.org)
(for non-developers)
This page is licensed: CC BY-SA / Gnu FDL
We would like the authentication system to be able to authenticate against an LDAP Directory Server.
See .
Skills: C, working knowledge of LDAP
Mentor: Sergei Golubchik
this project is taken
Kerberos is a security mechanism used in a lot of financial institutions. A MySQL plugin that allows authentication against Kerberos is the goal here.
See .
Skills: C/C++, working knowledge of Kerberos
Mentor: Sergei Golubchik
The Microsoft Windows world is all about Active Directory, and upstream MySQL Enterprise already has this feature (though it's a paid offering). It would be great to have an open source equivalent.
See .
Skills: C/C++, working knowledge of Active Directory/SAMBA, Windows-based development environment
Mentor: Sergei Golubchik, Vladislav Vaintroub
Keystone is the OpenStack Identity Service. The idea would be to ensure that MariaDB can authenticate to Keystone directly.
Skills: Python, C/C++
Mentor: Mark Riddoch
this project is taken
MySQL and MariaDB use an old regex library; it works byte-wise and thus only supports single-byte character sets. It needs to be replaced by a modern, multi-byte character set aware regex library.
Additionally, a much-requested REGEX_REPLACE function should be implemented. (See also mysql-udf-regexp for some UDF code that could be used as a starting point.)
Detailed task description: MDEV-4425
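As a rough sketch of the requested replace functionality (hypothetical usage; the exact function name and semantics are defined in MDEV-4425):

```sql
-- Hypothetical usage; exact name and semantics per MDEV-4425
SELECT REGEXP_REPLACE('MariaDB 10.0', '[0-9]+', 'N');
-- expected result: 'MariaDB N.N'
```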
Skills: C/C++
Mentor: Alexander Barkov
One of the reasons for bad query plans is inadequate cost estimation of individual operations. The cost of reading a row in one engine might be a lot higher than in another, but the optimizer cannot know that. Also, it uses hard-coded constants, assuming, for example, that evaluating a WHERE clause is 5 times cheaper than reading a row from a table.
Obviously, some kind of calibration procedure is needed to get these cost estimates to be relatively correct. It is not easy, because the estimates depend on the actual hardware where MariaDB is run (a cost of a row read is different on HD and SSD), and also — somewhat — on the application.
A simple and low-maintenance solution would be to use self-tuning cost coefficients. They measure the timing and adjust automatically to the configuration where MariaDB is run.
See MDEV-350.
Skills: C/C++
Mentor: Sergei Golubchik
this project is taken
Roles, close to SQL:2003 standard. See MDEV-4397.
Skills: C/C++
Mentor: Sergei Golubchik
This page is licensed: CC BY-SA / Gnu FDL
curl -T MDEV-XXXXX.tgz -u anonymous:anonymous https://webdav.mariadb.org/private/MDEV-XXXXX.tgz
MariaDB applied to participate in the 2020 Google Season of Docs, but was unsuccessful.
Please join us on Zulip to mingle with the community. You can also subscribe to maria-docs@lists.launchpad.net, the documentation mailing list.
You will choose a major relational DBMS, and, focusing on the most recent stable releases, document the process to migrate to MariaDB, including MariaDB equivalents to features in that system, and a detailed list of features that exist in one but not the other, as well as possible workarounds. For an example, see the work-in-progress as well as the documentation on (bearing in mind that MariaDB is a MySQL fork, and is substantially more similar to MySQL than to other systems).
The capabilities of MariaDB Server are critical to producing large-scale applications. The current documentation lacks sufficient examples, and the examples warrant testing. Getting Started content would make this easier to adopt. MariaDB's implementation is based on ISO SQL/PSM.
and provide the ability to access a MariaDB Server from applications built in C/C++ and Java, respectively. The current documentation for these connectors lacks Getting Started guides.
The Storage Engine uses partitioning to provide data sharding through multiple servers. This task involves greatly expanding the existing documentation, including more detail about when to use Spider, basic usage tutorials, updating the , detailed examples of the effects of the and , as well as the . You will also ensure changes in the most recent Spider releases are properly documented.
 is a full text search storage engine based on Groonga, which is an open-source CJK-ready fulltext search engine using a column store. This project involves greatly expanding the existing MariaDB documentation on the use of this storage engine. A detailed tutorial and user guide are needed, including examples of the various Mroonga , the effects of changing their settings, as well as the parser and .
You will choose a major language and ensure that a substantial subsection of the documentation is translated into that language. See .
Do you have an idea of your own, not listed above? Do let us know!
This page is licensed: CC BY-SA / Gnu FDL
We participated in Google Summer of Code 2014. MariaDB and the MariaDB Foundation believe we are making a better database that remains a drop-in replacement for MySQL. We also work on making LGPL connectors (currently in C, Java, C++ in development) and on , which allows you to scale your reads & writes.
Please join us at irc.freenode.net at #maria to mingle with the community. Or subscribe to . Or both.
Please keep in mind that in April we travel a lot (conferences, busy time), so if you have a question and nobody on IRC answers — do not feel disappointed, ask in an email to maria-developers@lists.launchpad.net. Asking on the mailing list means others benefit from your Q&A too!
scp MDEV-XXXXX.tgz anonymous@ftp.mariadb.org:private/
(anonymous@ftp.mariadb.org) Password:
MDEV-XXXXX.tgz 100% 152KB 218.8KB/s 00:00
scp: remote fsetstat: Permission denied

lftp -u anonymous -e 'put MDEV-XXXXX.tgz' ftp://ftp.mariadb.org/private/
Password:
cd ok, cwd=/private
put: Access failed: 550 Issue during transfer: network error: error transferring data: read tcp
[...] read: connection reset by peer (MDEV-XXXXX.tgz)

sftp user@ftp.mariadb.org

We would like the authentication system to be able to authenticate against an LDAP Directory Server.
See .
Skills: C, working knowledge of LDAP
Mentor: Sergei Golubchik
this project is taken
One of the reasons for bad query plans is inadequate cost estimation of individual operations. A cost of reading a row in one engine might be a lot higher than in some other, but the optimizer cannot know it. Also, it uses hard-coded constants, assuming, for example, that evaluating a WHERE clause is 5 times cheaper than reading a row from a table.
Obviously, some kind of calibration procedure is needed to get these cost estimates to be relatively correct. It is not easy, because the estimates depend on the actual hardware where MariaDB is run (a cost of a row read is different on HD and SSD), and also — somewhat — on the application.
A simple and low-maintenance solution would be to use self-tuning cost coefficients. They measure the timing and adjust automatically to the configuration where MariaDB is run.
See MDEV-350.
Skills: C/C++
Mentor: Sergei Golubchik
MySQL 5.6 has a memcached plugin to InnoDB. MySQL 5.7 has improved performance of this. The task would be to port this to run against MariaDB, and make it work against XtraDB/InnoDB for the 10.1 series of MariaDB.
See MDEV-4674 for more.
Skills: C/C++
Mentor: Colin Charles
enhancements for 10.1 that we want to work on include adding support for altitude (the third coordinate), as well as making sure we are fully OpenGIS compliant. MDEV-5813
Skills: C
Mentor: Holyfoot
User defined events are supported on several other databases, in different forms and with different semantics. Events are used to signal a named event in the database. Applications can use named events instead of polling, which uses more resources.
See MDEV-5532 for more.
Skills: C/C++
Mentor: Jan Lindstrom, Sergei Golubchik
MyISAM and Aria support special kinds of indexes that only store the hash of the data in the index tree. When two hashes match in the index, the engine compares the actual row data to find whether the rows are identical. This is used in internal temporary tables that the optimizer creates to resolve SELECT DISTINCT queries. Normal unique indexes cannot always be used here, because the select list can be very long or include very long strings.
This task is to provide a direct SQL interface to this feature and to allow users to create these indexes explicitly. This way we can have unique constraints for blobs and very long strings.
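A minimal sketch of what such an explicit SQL interface might look like (hypothetical syntax; defining the actual grammar is part of the task):

```sql
-- Hypothetical syntax: a unique constraint on a blob, backed by the
-- hash-only index type described above
CREATE TABLE t1 (
  doc LONGBLOB,
  UNIQUE KEY (doc) USING HASH
);
```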
Skills: C++
Mentor: Sergei Golubchik
this project is taken
This task is to add support for OR REPLACE and IF EXISTS / IF NOT EXISTS to all CREATE and DROP variants for all objects (where it makes sense). MDEV-5359
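For illustration, the statement forms to be supported uniformly across object types would be along these lines (examples only; the exact coverage is defined in MDEV-5359):

```sql
CREATE OR REPLACE VIEW v1 AS SELECT 1;
CREATE TABLE IF NOT EXISTS t1 (id INT);
DROP EVENT IF EXISTS ev1;
```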
Skills: C++
Mentor: Sergei Golubchik
this project is taken
This is a research task, not a coding task. See MDEV-5776
Skills: SQL, Perl/Python or other language of your choice, mathematical statistics
Mentor: Elena Stepanova, Sergei Golubchik
It is a well-known and very old MySQL/MariaDB limitation that temporary tables can only be used once in any query; for example, one cannot join a temporary table to itself. This task is about removing this limitation. MDEV-5535
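A hedged sketch of the limitation (table and column names are made up):

```sql
CREATE TEMPORARY TABLE tmp (id INT);
-- Joining the temporary table to itself currently fails with an error
-- like "Can't reopen table"; this task is about removing that restriction:
SELECT * FROM tmp AS a JOIN tmp AS b ON a.id = b.id;
```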
Skills: C++
Mentor: Sergei Golubchik
Implement a new plugin type that adds support for table UDFs — loadable User-Defined Functions that return tables, not scalar values.
Skills: C++
Mentor: Sergei Golubchik
The mysqlbinlog tool needs to understand global transaction ids. In particular, it should be possible to start and end the dump at the specified GTID. Both when reading binlog files and when connecting to a running server. See MDEV-4989.
If time permits, other client programs could be extended similarly, like mysqldump --master-data or the --sync-with-master command in mysqltest.
Skills: C++
Mentor:
See also «GSoC 2014 tasks» list in Jira.
This page is licensed: CC BY-SA / Gnu FDL
The capabilities of MariaDB Server are critical to producing large-scale applications. The current documentation lacks sufficient examples, and the examples warrant testing. Getting Started content would make this easier to adopt. MariaDB's implementation is based on ISO SQL/PSM.
and provide the ability to access a MariaDB Server from applications built in C/C++ and Java, respectively. The current documentation for these connectors lacks Getting Started guides.
The Storage Engine uses partitioning to provide data sharding through multiple servers. This task involves greatly expanding the existing documentation, including more detail about when to use Spider, basic usage, updating the , detailed examples of the effects of the and , as well as the . You will also ensure changes in the most recent Spider releases are properly documented.
 is a full text search storage engine based on Groonga, which is an open-source CJK-ready fulltext search engine using a column store. This project involves greatly expanding the existing MariaDB documentation on the use of this storage engine. A detailed tutorial and user guide are needed, including examples of the various Mroonga , the effects of changing their settings, as well as the parser and .
You will choose a major language and ensure that a substantial subsection of the documentation is translated into that language. See translations.
You will choose a major relational DBMS, and, focusing on the most recent stable releases, document the process to migrate to MariaDB, including MariaDB equivalents to features in that system, and a detailed list of features that exist in one but not the other, as well as possible workarounds. For an example, see the documentation on (bearing in mind that MariaDB is a MySQL fork, and is substantially more similar to MySQL than to other systems).
Do you have an idea of your own, not listed above? Do let us know!
This page is licensed: CC BY-SA / Gnu FDL
In 2024, MariaDB again participated in the . We believe we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently , , , , ) and on , which allows you to scale your reads & writes. And we have , which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.
Please join us on to mingle with the community. You should also subscribe to the (this is the main list where we discuss development - there are also ).
To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.
In 2022, we again participated in the . The MariaDB Foundation believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently , , , , ) and on , which allows you to scale your reads & writes. And we have , which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.
Please join us on to mingle with the community. You should also subscribe to (this is the main list where we discuss development).
To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.
Part-time (175h) or full-time project (350h) - depending on scope
MariaDB Vector is coming to MariaDB Server to serve AI Workloads. The current indexing strategy will use HNSW, but IVFFlat is a possible alternative that costs fewer resources to create. Having it as an option is desirable.
Part-time (175h) or full-time project (350h) - depending on scope
Our GIS functionality is limited compared to other DBMSes. Given that MariaDB looks to facilitate migration from MySQL, we should be on par. We have a list of functions that are missing in MariaDB compared to MySQL, as described in . Our goal is to have as many of these functions as possible available within MariaDB. Some of the functionality can be ported from MySQL, while other parts might require implementation from scratch.
Skills needed: Understanding of C++ development. Ability to navigate a large codebase (with help from mentor).
Mentors: Anel Husakovic (primary) / Vicențiu Ciorbaru (secondary)
Full-time project 350h
Synonyms are an important feature, particularly as they help smooth migration from other databases. While the initial project scope seems straightforward, there are a number of aspects that must be considered (see the sketch after this list):
Grammar extension
Where will the synonyms definitions be stored?
How do synonyms map to the underlying privilege system? Who can create a synonym? Who can access a synonym?
Do we require the underlying object to exist before creating a synonym? What if the underlying object gets dropped?
What kind of error messages do we present to the user in various corner cases?
How do synonyms interact with replication (row based vs statement based)?
How do synonyms interact with views (and view execution)?
How to present synonyms to users (as part of INFORMATION_SCHEMA for instance?)
Performance considerations for multiple connections to the database.
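As a purely hypothetical sketch of the grammar extension (the actual syntax, storage and privilege model are exactly what this project has to decide):

```sql
-- Hypothetical syntax only
CREATE SYNONYM sales.emp FOR hr.employees;
SELECT * FROM sales.emp;   -- would resolve to hr.employees
DROP SYNONYM sales.emp;
```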
Skills needed: Understanding of C++ development. Able to write and discuss various tradeoffs such that we achieve a feature set that makes sense given the database's priorities.
Mentors: Vicențiu Ciorbaru (primary) / Michael Widenius (secondary)
Full-time project 350h
Support generalized triggers like:
CREATE TRIGGER ... AFTER STARTUP ...
CREATE TRIGGER ... BEFORE SHUTDOWN ...
CREATE TRIGGER ... ON SCHEDULE ...
the latter being a synonym for CREATE EVENT.
Should STARTUP/SHUTDOWN triggers run exclusively? That is, is the STARTUP trigger run before any connection is allowed, or in parallel with them? The same applies to SHUTDOWN.
Skills needed: Understanding of C++ development. Able to write and discuss various tradeoffs such that we achieve a feature set that makes sense given the database's priorities.
Mentors: Sergei Golubchik
Part-time project 175h
my_vsnprintf() is used internally in the server as a portable printf replacement. And it's also exported to plugins as a service.
It supports a subset of printf formats and three extensions:
%`s means that the string should be quoted like an `identifier`
%b means that it's a binary string, not zero-terminated; printing won't stop at \0, so one should always specify the field width (like %.100b)
%M is used in error messages and prints the integer (errno) and the corresponding strerror() for it
%T takes a string and prints it like %s, but if the string has to be truncated, puts "..." at the end
gcc knows printf formats and checks whether the actual arguments match the format string, issuing a warning if they don't. Unfortunately there seems to be no easy way to teach gcc our extensions, so for now we have to disable printf format checks.
A better approach would be to use gcc-compatible format extensions, as the Linux kernel does. We should migrate to a different syntax for our extensions:
%sI to mean "print as an identifier"
%sB to mean "print a binary string"
%uE to mean "print an errno"
%sT to put a "..." as truncation indicator
Old formats can either still be supported or be removed; in the latter case the major version of the service should be increased to signal an incompatible change.
All error messages and all usages of my_vsnprintf should be changed to use the new syntax. One way to do it is to disable the old syntax conditionally, only in debug builds. All gcc printf format checks should be enabled.
Skills needed: Understanding of C development.
Mentors: Sergei Golubchik
Full-time project 350h
cpimport is a binary that ingests data into MCS in an efficient manner, reducing ingest timings significantly whilst preserving transaction isolation levels.
cpimport is a relatively complex facility that reads data from a local file or S3, parses it, converts it, and puts it into MCS-specific files. cpimport is unable to read a single big CSV file from disk in parallel. Apache Arrow has a CSV reading facility that can do parallel CSV reads. The goal of the project is to replace the existing homebrew CSV parser in cpimport with the one from Apache Arrow.
Skills needed: modern C++.
Mentors: Leonid Fedorov
Full-time project 350h
Here, an extent is a group of columnar values, and a partition is a group of extents that stores all column values for a specific portion of a table. MCS has a notion of an empty value for columnar segment/token files and dictionaries. Empty values are marked with a bit in the special 1-byte auxiliary column that is created for every table. When DELETE removes records from a table, the records are marked with the empty bit in the auxiliary column. The deleted records become wasted disk space. The goal of the project is to reclaim the wasted disk space by either re-creating the whole partition or moving partition values.
Skills needed: modern C++
Mentors: Roman Nozdrin
Do you have an idea of your own, not listed above? Do let us know in the comments below (Click 'Login' on the top of the page first)!
This page is licensed: CC BY-SA / Gnu FDL
MDEV-21978 make my_vsnprintf to use gcc-compatible format extensions (Part-time project - 175h) MariaDB has its own implementation of most of the standard C library. This is to ensure compatibility across different platforms. Over time this library has evolved and currently does not behave exactly like the POSIX standard library. Thus we want to attain the principle of "least surprise" with this library. Everything that is supported by the standard printf functions should work the same with MariaDB's compatibility library extension.
Skills needed: C/C++. Project difficulty: easy Mentor Sergei Golubchik
MDEV-19160 JSON_DETAILED output unnecessarily verbose. (Part Time project - 175h) As is explained in the MDEV in detail, we want to improve the JSON_DETAILED function to better suit our development and debugging purposes. This project will aim to clean up the function's implementation, introduce test cases, and add potential "nice-to-have" features to make developers' lives easier.
Skills needed: C/C++, understand JSON, OOP. Project difficulty: easy Mentor: Vicențiu Ciorbaru / Sergei Petrunia
Create utility to parse frm files and print their DDL (Full-time project - potential part-time (175 - 350h, depending on scope)) FRM files are what MariaDB uses to store metadata about tables. These files can be used to generate DDL statements (CREATE TABLE ...). We are lacking a utility to parse these files, which could in turn make DBAs' lives easier. The task of this project is to implement this utility, making use of MariaDB's FRM parsing logic. You may have to carry out some refactoring to extract the parsing code into a reusable library, used once by MariaDB Server and once by the FRM parsing tool.
Skills needed: C/C++, understanding libraries and APIs. Project difficulty: medium to hard, depending on time allocated Mentor Vicențiu Ciorbaru / Sergei Golubchik / Monty Widenius
Add linear regression functions (Full-time project - 350h) This project consists of implementing dedicated regression functions within MariaDB. The specification of each function will be decided during the project, based on what other competing databases are offering. We will choose an implementation that best matches user expectations. It is the student's job to perform research into at least one other database and come up with exact semantics for each one of the functions in the MDEV.
Skills needed: C/C++, understanding of regression functions and mathematics, APIs, OOP. Project difficulty: medium Mentor Vicențiu Ciorbaru
Client compatible delimiter for mysqltest (Full-time project - potential part-time (175 - 350h, depending on scope)) We have a DELIMITER command that has different syntax in the client and in mysqltest: mysqltest needs an additional (previous) delimiter at the end of the DELIMITER expression, which is confusing and makes it hard to copy and paste simple scripts with stored procedures from one to the other. We would like to have a new command (--delimiter=client|mysqltest) with the current behavior as the default.
Expected outcomes: You will learn the finer points of C command-line tool development. You'll get familiar with a part of the MariaDB server infrastructure -- the testing framework, which is written in C and Perl.
Skills required: good C knowledge; ability to use console terminal, ability to build from console. Project difficulty: easy Mentor Nikita Malyavin
Improve build speed of the server code base (Full-time project - 350h) We have already learned that precompiling the headers improves the build speed five times; however, the standard CMake solution doesn't fit our comfort of development: CMake PCH generates one "header of headers" and pre-includes it into each of the compilation units. This makes everything that wasn't included by the unit itself available across the precompiled set. There are alternative ways of precompiling units: clang modules and gcc .gch files. We want to prefer these per-compiler mechanisms over the single combined header.
Expected outcomes: You will make a strong practical impact in an area of high need and show off your mix of analysis and programming skills.
Skills required: good C knowledge, some CMake knowledge; ability to use console terminal, ability to build from console. Project difficulty: medium Mentor Nikita Malyavin
Improve mysqltest language (Full-time project - 350h) mysqltest has a lot of historical problems: an ad hoc parser, weird limitations, commands added as needed with no view of the overall language structure, etc. The purpose of this work would be to improve the language.
Expected outcomes: Rewrite the mysqltest interpreter using either a real parser generator, e.g. bison, or a cleanly hand-written parser, e.g. recursive descent, that can be easily extended with new functionality. Add missing control structures, for example "else" for the existing "if". Add simple expression evaluation without contacting the server, i.e. math and string comparisons. Add functionality for minimal string manipulation, e.g. a substr function.
Skills required: good C/C++ knowledge, interest in parsers/interpreter. Project difficulty: medium Mentor Vladislav Vaintroub
Create a function to check for JSON intersection (Part-time project - 175h) This project aims at implementing a JSON_INTERSECT() function between two JSON objects or two JSON arrays. If the two documents are JSON arrays, we want to return all the common elements between the two arrays; in the case of objects, we want to return all common key-value pairs.
Skills required: C/C++, OOP, basic understanding of JSON. Project difficulty: easy Mentor Rucha Deodhar, Oleksandr Byelkin
MCOL-4995 Research/implement basic vectorized filtering for ARM platforms (Full-time project - 350h) As of 6.2.2, Columnstore (MCS) supports vectorization on x86_64 platforms only. The goal of the project is to implement vectorized low-level filtering for ARM platforms using the 128-bit ARM NEON extension (SVE is optional). Low-level filtering in this context means simple predicate WHERE filters, e.g. WHERE c1 = 5 AND c2 IN (10, 25). Please see the corresponding Jira issue for details.
Skills needed: C/C++, understand low-level platform specifics. Project difficulty: medium Mentor Roman Nozdrin
MCOL-4994 Build/run Columnstore on MacOS (Part-time project - 175h) As of Columnstore (MCS) 6.2.2 there is no way to compile/use the MCS engine on macOS. The goal of this project is to be able to bootstrap MariaDB + basic (maybe rudimentary) MCS on macOS. There are a number of known issues that prevent MCS compilation on macOS: a number of offending preprocessor macros/definitions specific to the Linux x86_64 combination; macOS doesn't provide the syslog that MCS uses as its only log message sink. Please see the corresponding Jira issue for details.
Skills needed: C/C++, MacOS specifics. Project difficulty: easy Mentor Roman Nozdrin
Implement DISTRIBUTED JSON functions (Full-time project - 350h) As of 6.2.2 Columnstore there are two query execution modes: a relatively slow but compatible Table mode and a fast Select Handler (SH) mode. Table mode execution supports all JSON_* functions, while SH mode doesn't support any. We want to add support for the JSON_* function family in SH query execution mode. Please see the corresponding Jira issue for details.
Skills needed: C/C++, JSON format. Project difficulty: easy Mentor Roman Nozdrin
MDBF-320 Better Grid view for buildbot.mariadb.org (Python / Javascript / Web Dev) (Full-time project 350h) Our CI/CD infrastructure uses a recent version of Buildbot. The GRID view plugin that comes with Buildbot is not adequate for our needs. In this project, you will discuss with your mentor as well as other MariaDB developers how to best improve the User Experience of Buildbot's grid view for what MariaDB developers need to accomplish.
Skills needed: Understanding of web-dev technologies like Angular, React, and Javascript related libraries. Python may also be required. Mentor Vlad Bogolin / Andreia Hendea
Do you have an idea of your own, not listed above? Do let us know!
This page is licensed: CC BY-SA / Gnu FDL
We participated in the Google Summer of Code 2017 (we have participated previously in 2016, 2015, 2014, and 2013). The MariaDB Foundation believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently C, ODBC, Java) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. Lately, we also have MariaDB ColumnStore, which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.
Please join us at irc.freenode.net at #maria to mingle with the community. Don't forget to subscribe to maria-developers@lists.launchpad.net (this is the main list where we discuss development).
A few handy tips for any interested students who are unsure which projects to choose:
The complete list of tasks suggested for GSoC 2017 is located in the . A subset is listed below.
The tool needs to be updated to understand the replication feature called (GTIDs) in MariaDB 10. The current version does not support GTIDs and the MySQL variant does not speak MariaDB 10's GTIDs.
The purpose of this task is to create an easy-to-use facility for setting up a new MariaDB slave.
 enhancements for 10.1 that we want to work on include adding support for altitude (the third coordinate), converters (e.g. ST_GeomFromGeoJSON - ST_AsGeoJSON, ST_GeomFromKML - ST_AsKML, etc.), getting data from the SHP format (an shp2sql converter), as well as making sure we are fully OpenGIS compliant.
mysqltest is a client utility that runs tests in the framework. It sends SQL statements to the server, compares the results with the expected results, and uses a special small language for loops, assignments, and so on. It's pretty old and very ad hoc, with many strange limitations. It badly needs a proper parser and a consistent logical grammar.
Currently one can specify only one authentication method per user. It would make a lot of sense to support multiple authentication methods per user. PAM-style. For example, one may want to authenticate using unix_socket when connecting locally, but ask for a password if connecting remotely or if unix_socket authentication failed.
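A hedged sketch of how such PAM-style chaining might look in SQL (hypothetical syntax; defining the real grammar would be part of the task):

```sql
-- Hypothetical syntax: try unix_socket first, fall back to a password
CREATE USER 'alice'@'localhost'
  IDENTIFIED VIA unix_socket
  OR mysql_native_password USING PASSWORD('secret');
```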
Encrypting the client-server communications is closely related to authentication. Normally SSL is used for the on-the-wire encryption, and SSL can be used to authenticate the client too. GSSAPI can be used for authentication, and it has support for on-the-wire encryption. This task is about making on-the-wire encryption pluggable.
This would involve randomizing a bunch of queries (RQG based?), configurations and replication setups to search for segfaults, race conditions and perhaps invalid results.
This would involve organising a bunch of memory and threads to run on the same NUMA node, with attention to detail to ensure no additional race conditions get added in the process. A good understanding of systems programming would be useful. The ability to implement Windows NUMA support at the same time as Linux NUMA support would be advantageous.
The current Cassandra Storage Engine was developed against Cassandra 1.1 and uses the Thrift API to communicate with Cassandra. However, starting from Cassandra 1.2, the preferred way to access a Cassandra database is to use CQL (Cassandra Query Language) and the DataStax C++ Driver (). Thrift-based access is deprecated and places heavy constraints on the schema.
This task is about re-implementing Cassandra Storage Engine using DataStax C++ Driver and CQL.
At the moment NULL is just the maximum integer for a column (or empty string for VARCHAR/CHAR). We need a mechanism to store NULLs separately to give us full type ranges.
Right now it is cast to double which is not great for obvious reasons. It will mean modifying a lot of ColumnStore's version of MariaDB's function implementations and allowing column files to store more than 8 bytes per field.
This includes collations and anything that works on the length of the string.
Do you have an idea of your own, not listed above or in Jira? Do let us know!
This page is licensed: CC BY-SA / Gnu FDL
This year we are again participating in the . We, joined with the , believe we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently , , , , ) and on , which allows you to scale your reads & writes. And we have , which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.
Please join us on to mingle with the community. You should also subscribe to the (this is the main list where we discuss development - there are also ).
To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.
Also see the List of beginner friendly issues from the MariaDB Issue Tracker.
Full-time project 350h
LOAD DATA INFILE can flexibly load data into a table from CSV-like files accessible by the mariadbd process. LOAD XML INFILE can do it for XML files. LOAD DATA LOCAL INFILE and LOAD XML LOCAL INFILE can do it with files accessible by the client, but not by the server. But there are requests to support loading more file formats and from other locations, for example from S3.
This project is to implement support for LOAD plugins and refactor the current LOAD code accordingly. There are two kinds of plugins — data parser plugins (CSV-like and XML) and transfer plugins (file and LOCAL). Implementing new plugins is not in the scope of this task; this task is mainly about moving existing code around, creating a possibility for new plugins (like JSON or S3).
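For context, this is the existing, non-pluggable statement family that the plugin API would generalize (illustrative example; the file and table names are made up):

```sql
LOAD DATA INFILE '/tmp/orders.csv'
  INTO TABLE orders
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  IGNORE 1 LINES;
```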
Skills needed: C++, bison
Mentors: Sergei Golubchik
Full-time project 350h
Implement a syntax and a plugin API that the server will use to generate embeddings for documents that the user stores in the database. This should make it possible to simplify the vector pipeline significantly. mariadbd will not generate embeddings internally; it will invoke a plugin to do that.
Skills needed: C++
Mentors: Sergei Golubchik
Part-time project 175h
extend mysqltest language to support
standard arithmetic +, -, *, /, %
comparisons ==, !=, <, <=, >, >=
boolean &&, ||, may be ? :
if possible: string repetition, perl-style x (to replace SELECT REPEAT() in test files)
This should work in commands if, while
Can be done together with MDEV-36108 as a full-time project.
Skills needed: C++
Mentors: Sergei Golubchik
Part-time project 175h
extend mysqltest language to support bash-like substitutions:
${var}
${parameter:offset:length}
${#parameter}
${parameter/pattern/string/flags}
may be ${parameterˆ}, ${parameterˆˆ}, ${parameter,}, ${parameter}
may be ${parameter@function} with functions like u, U, Q, etc
recursive expansion:
${${var}}
Can be done together with MDEV-36107 as a full-time project.
Skills needed: C++
Mentors: Sergei Golubchik
Full-time project - potential part-time (175 - 350h, depending on scope)
FRM files are what MariaDB uses to store metadata about tables. These files can be used to generate DDL statements (CREATE TABLE ...). We are lacking a utility to parse these files, which could in turn make DBAs' lives easier. The task of this project is to implement this utility, making use of MariaDB's FRM parsing logic. You may have to carry out some refactoring to extract the parsing code into a reusable library, used once by MariaDB Server and once by the FRM parsing tool.
Skills needed: C/C++, understanding libraries and APIs.
Mentors: Vicențiu Ciorbaru / Sergei Golubchik
Part-time project 175h
The current methods of filtering replication events are limited to either 1) at binlog-write time, which can break point-in-time recovery because some committed transactions will be missing from the binary log, or 2) on the replica, which forces all events on the primary server to always be sent to the replica, which can be a security concern and is also not efficient. This task aims to eliminate these limitations by adding in another point at which replication filtering occurs: on the binlog dump threads. This would allow users to both maintain a consistent binary log, and minimize network traffic by guarding events which are never intended for replication.
Skills needed: C++
Mentors: Brandon Nesterenko
Part-time project 175h
TODO: A more ample description will be created.
Skills needed:
Mentors: Vlad Radu
Full-time project 350h
Here, an extent is a group of columnar values, and a partition is a group of extents that stores all column values for a specific portion of a table. MCS has a notion of an empty value for columnar segment/token files and dictionaries. Empty values are marked with a bit in the special 1-byte auxiliary column that is created for every table. When DELETE removes records from a table, the records are marked with the empty bit in the auxiliary column. The deleted records become wasted disk space. The goal of the project is to reclaim the wasted disk space by either re-creating the whole partition or moving partition values.
Skills needed: modern C++
Mentors: Roman Nozdrin
Full-time project 350h
MariaDB Columnstore lacks recursive CTE handling, so as of now Columnstore hands the processing back to MariaDB Server if a query contains a recursive CTE.
Here is the info about the feature:
Skills needed: modern C++
Mentors: Leonid Fedorov
Full-time project 350h
MariaDB Columnstore lacks UNION EXCEPT INTERSECT handling, so as of now Columnstore hands the processing back to MariaDB Server if a query contains UNION EXCEPT or UNION INTERSECT.
Here is the info about the feature:
Skills needed: modern C++
Mentors: Alexey Antipovsky
Full-time project 350h
MariaDB Columnstore lacks indexes, so it reads a lot of extra data from disk. This project introduces Bloom filters to reduce the data read from disk during the most IO-heavy operation, scanning.
Skills needed: modern C++
Mentors: Roman Nozdrin
Full-time project 350h
Joins are very heavy algorithms, both in computation and/or in memory use. They need to hold a substantial amount of data in memory and perform hashing and other operations on that data. Joins can overflow memory limits, and keeping the balance between memory use and performance is tricky. Thus we have to filter the information that goes into joins as much as possible. Columnstore already does great work in that regard, pushing WHERE filters before joins. This particular task is also concerned with that, adding Bloom filter operations that approximate JOIN results and perform a secondary read to feed into the join only data that is highly likely to be used in it.
Skills needed: modern C++
Mentors: Sergey Zefirov
Do you have an idea of your own, not listed above? Do let us know in the comments below (Click 'Login' on the top of the page first)!
This page is licensed: CC BY-SA / Gnu FDL
In 2023, MariaDB participated in the Google Summer of Code. The MariaDB Foundation believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently , C++, , , Node.js) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. And we have MariaDB ColumnStore, which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.
Please join us on Zulip to mingle with the community. You should also subscribe to maria-developers@lists.launchpad.net (this is the main list where we discuss development).
To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.
Also see the from the MariaDB Issue Tracker.
Full-time project 350h
Our version of RocksDB is lagging behind the current upstream and needs to be updated. This isn't a case of simply updating the submodule, there have been significant API changes. It will likely require porting the latest MyRocks code over to the MariaDB storage API.
Skills needed: Understanding of C/C++ development. Preferably some experience with the MariaDB or MySQL codebase (but not essential).
Mentor: Andrew Hutchings
Part-time (175h) or full-time project (350h) - depending on scope
Our GIS functionality is limited compared to other DBMSes. Given that MariaDB aims to facilitate migration from MySQL, we should be on par. We have a list of functions that are missing in MariaDB compared to MySQL, as described in . Our goal is to have as many of these functions as possible available within MariaDB. Some of the functionality can be ported from MySQL, while other parts might require implementation from scratch.
Skills needed: Understanding of C++ development. Ability to navigate a large codebase (with help from mentor).
Mentors: Anel Husakovic (primary) / Vicențiu Ciorbaru (secondary)
Full-time project 350h
Synonyms are an important feature, particularly as they help smooth migration from other databases. While the initial project scope seems straightforward, there are a number of aspects that must be considered:
Grammar extension
Where will the synonyms definitions be stored?
How do synonyms map to the underlying privilege system? Who can create a synonym? Who can access a synonym?
Do we require the underlying object to exist before creating a synonym? What if the underlying object gets dropped?
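As a rough illustration only (the actual grammar, storage and semantics are exactly what this project has to decide), a synonym feature might be exercised like this:
CREATE SYNONYM sales.ord FOR sales.orders;
SELECT COUNT(*) FROM sales.ord;   -- resolves to sales.orders
DROP SYNONYM sales.ord;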
Skills needed: Understanding of C++ development. Able to write and discuss various tradeoffs such that we achieve a feature set that makes sense given the database's priorities.
Mentors: Vicențiu Ciorbaru (primary) / Michael Widenius (secondary)
Part-time project 175h / Full-time project 350h - depending on scope
MariaDB ships with ColumnStore as a storage engine. However, the architecture of ColumnStore is not like that of a traditional storage engine: it relies on multiple database nodes working in unison. This means that starting up a ColumnStore-enabled MariaDB service is not a trivial endeavour. This project seeks to create the necessary tooling around starting MariaDB with ColumnStore inside OCI containers. You will be writing Dockerfiles, configuration files and bash scripts to achieve this.
The challenge of this project lies in:
Limited documentation around ColumnStore. There will be some time spent on the discovery process.
Formulating a clear plan to facilitate:
Starting MariaDB with ColumnStore
Upgrading MariaDB with ColumnStore on a version upgrade
Skills needed: Ability to develop durable bash scripts, understanding of container runtimes, and ability to conform to container best practices. Able to incrementally develop and test functionality.
Mentors: Daniel Black (primary - containers) / Andrew Hutchings (secondary - ColumnStore)
Part-time project 175h
The main focus of this project is around developer / sysadmin experience. We want to improve the quality of life of those using MariaDB. Migrating large datasets is one of these challenges. As is described in the MDEV, a simple limitation related to LOAD DATA INFILE can severely hamper developer productivity. A related problem is discussed in .
The goal of this project is to come up with a solution for storing warnings during LOAD DATA INFILE. This will require modifying the existing server codebase to create an SQL interface for processing the generated warnings.
Challenges:
LOAD DATA INFILE can process large datasets. That means the server must not simply store all warnings in memory. You will need to make use of already existing mechanisms (creating temporary tables) so warnings can spill to disk.
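A minimal sketch of the problem (file and table names invented for the example); today the generated warnings are only visible through SHOW WARNINGS, which keeps them in memory and is limited by max_error_count:
LOAD DATA INFILE '/tmp/big_import.csv'
INTO TABLE imports
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';
SHOW WARNINGS;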
Skills needed: Understanding of C++ development.
Mentors: Anel Husakovic (primary) / Daniel Black (secondary)
Full-time project 350h
Our CI/CD infrastructure uses a recent version of Buildbot. The GRID view plugin that comes with Buildbot is not adequate for our needs. In this project, you will discuss with your mentor, as well as other MariaDB developers, how best to improve the user experience of Buildbot's grid view for what MariaDB developers need to accomplish.
Skills needed: Understanding of web-dev technologies like Angular, React, and Javascript related libraries. Python may also be required.
Mentor: Vlad Bogolin
MariaDB Columnstore is a columnar engine for MariaDB Server for OLAP workloads. MCS is also a distributed, multithreaded application written in C++; C++20 is currently used for development. There are a number of interesting MCS projects to take part in, covering both research and production programming.
Part-time project 175h / Full-time project 350h - depending on scope
MCS uses interpreted execution to calculate SQL expression results. Here is an example of a SQL expression: 'table1.col1 + FLOOR(table2.col1)'. Given that table1.col1 is DECIMAL and table2.col1 is DOUBLE, there are a number of conditions that drive the calculation of this relatively simple example at runtime. Since the SQL types and the expression tree are known before the query begins, it is possible to replace interpretation with JIT compilation to produce specialized compiled code that is:
small
has no or almost no branches
optimized for the specific platform it is run at
This is mostly a research project whose goal is to produce a set of microbenchmarks that:
leverage any available JIT compiler, e.g. LLVM, MIR
demonstrate the negative and positive effects of using JIT
Skills needed:
C++
at least basic knowledge of compiler internals
Mentor: Roman Nozdrin
Full-time project 350h
cpimport in MCS is a standalone tool that does bulk ingestion outside the SQL interface. It takes source data as input and puts it into an MCS cluster. This put is an atomic operation that supports rollback. The sources can be either local files or files on S3. The only format cpimport currently reads is CSV, with customizable:
delimiters
quotation signs
NULL symbol
The goal of this project is either to teach cpimport to support the Parquet format as an input format or to introduce a modular framework for adding input formats. This project is mostly about reading/writing production code, where the challenges are:
to learn a codebase
produce a feature
support the feature with unit and integration tests using existing frameworks
cpimport consists of:
a buffer into which parsed data lines go, in the form of a low-level representation of SQL datatypes
a set of parser threads that populate the buffer
a set of writer threads that take the values making up a single SQL record and put them into the corresponding files
Parser threads currently have a fixed delimiter-separated-values parser that can be parametrized only with:
escape character
'enclosed by' characters
The suggested approach is to replace this DSV parser with a modular one that understands how to read popular formats, e.g. parquet, Arrow, Avro, JSON
Skills needed:
C++
production development tooling like git, Cmake
Mentor: Gagan Goel
Part-time project 175h / Full-time project 350h - depending on scope
MCS uses scalar processing to calculate SQL expression results. The expressions can be in the projection or the filtering part of a SQL query. Here is an example of a SQL expression: 'table1.col1 + FLOOR(table2.col1)'. In most cases scalar processing can be replaced with vectorized execution, which reduces the number of cycles needed to render the result of an expression. The challenge of this project is that the in-memory representation can be both vertical and horizontal.
This is mostly a research project whose goal is to produce a set of microbenchmarks that:
reveal limitations or problems in applying vectorization to expressions
compare performance for these cases:
vectorized execution with vertical data
vectorized execution with horizontal data
Skills needed:
C++
ASM knowledge to handle the low-level part of this project
Mentor: Andrey Piskunov
Full-time project 350h - depending on scope
Fuzzing is a well-known technique for finding various types of bugs. This task is to integrate libFuzzer, sanitizers (ASan, TSan, UBSan) and MCS Columnstore into one fuzzing pipeline and create a fuzzing infrastructure. This task requires:
Adding support to Columnstore for building with sanitizers (ASan, TSan, UBSan).
Writing code which integrates the MariaDB C++ connector and libFuzzer.
Preparing a valid corpus of SQL scripts suitable for Columnstore.
Creating a fuzzing infrastructure.
Skills needed:
Basic knowledge of how to work with C++ build tools (CMake, clang, ld, rtld).
Basic C++.
Mentor: Denis Khalikov
Do you have an idea of your own, not listed above? Do let us know!
This page is licensed: CC BY-SA / Gnu FDL
We participated in the Google Summer of Code 2016 (we have participated previously in 2015, 2014, and 2013). The MariaDB Foundation believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently in C, Java, C++ in development) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. Lately, we also have MariaDB MaxScale which is a pluggable database proxy.
Please join us at irc.freenode.net at #maria to mingle with the community. Don't forget to subscribe to maria-developers@lists.launchpad.net (this is the main list where we discuss development).
Please keep in mind that in April we travel a lot (conferences, busy time focusing on a release), so if you have a question and nobody on IRC answers, don't feel disappointed, ask in an email to maria-developers@lists.launchpad.net. Asking on the mailing list means others benefit from your Q&A too!
The complete list of tasks suggested for GSoC 2016 is located in the . A subset is listed below.
The tool needs to be updated to understand the replication feature called (GTIDs) in MariaDB 10. The current version does not support GTIDs and the MySQL variant does not speak MariaDB 10's GTIDs.
With one can create functions in SQL, but this syntax doesn't allow one to create an aggregate function (like , , etc). This task is to add support for aggregate .
enhancements for 10.1 that we want to work on include adding support for altitude (the third coordinate), converters (eg. ST_GeomFromGeoJSON - ST_AsGeoJSON, ST_GeomFromKML - ST_AsKML, etc.), Getting data from SHP format (shp2sql convertor), as well as making sure we are fully OpenGIS compliant.
MyISAM and Aria support special that only store the hash of the data in the index tree. When two hashes match in the index, the engine compares actual row data to find whether the rows are identical. This is used in internal temporary tables that the optimizer creates to resolve SELECT DISTINCT queries. Normal unique indexes cannot always be used here, because the select list can be very long or include very long strings.
This task is to provide a direct SQL interface to this feature and to allow users to create these indexes explicitly. This way we can have unique constraints for blobs and very long strings.
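One possible shape of the SQL interface, sketched here only as an illustration (the exact syntax is part of the task):
CREATE TABLE documents (
  body LONGTEXT,
  UNIQUE KEY (body) USING HASH
);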
MySQL Master HA (MHA) is a tool to assist with automating master failover and slave promotion within short downtime, without suffering from replication consistency problems, and without performance penalty. We would like to have this tool support MariaDB 10 .
Provide import and export functions for popular data formats like JSON, XML (limited), PHP, ... for Connector/C and MariaDB Server (which use same code base for dynamic columns)
Design a filter that will capture incoming inserts, updates and deletes for specified tables (given as a regex) in a separate log file that is consumable in JSON or CSV form, so that external ETL processes can process it for uploading data into a DWH or big data platform. Optionally, a plugin that feeds this log into a Kafka broker that can put this data on a Hadoop node can be developed as a next step.
Develop a MaxScale filter that will translate SQL Server syntax to MariaDB syntax. Develop a SQL Server client protocol plugin.
Create additional entry points into MaxScale that the Lua side scripts can use. Various types of functions can be added ranging from SQL processing functions to utility functions which communicate with MaxScale.
Create a filter which can inject queries before the client executes any queries. This filter could be used for various purposes for example auditing.
The current Cassandra Storage Engine was developed against Cassandra 1.1 and uses the Thrift API to communicate with Cassandra. However, starting from Cassandra 1.2, the preferred way to access a Cassandra database is to use CQL (Cassandra Query Language) and the DataStax C++ Driver (). Thrift-based access is deprecated and places heavy constraints on the schema.
This task is about re-implementing Cassandra Storage Engine using DataStax C++ Driver and CQL.
Currently MariaDB ignores trailing spaces when comparing values of the CHAR, VARCHAR, TEXT data types. In some cases it would be nice to take trailing spaces into account. This task will introduce a set of new collations that will make this possible.
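For illustration: with the current PAD SPACE behavior the comparison below returns 1; with the proposed collations (their names are not specified here) the same comparison would return 0.
SELECT 'abc' = 'abc   ';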
Are you a student interested in working on something? Let us know here.
This page is licensed: CC BY-SA / Gnu FDL
What kind of error messages do we present to the user in various corner cases?
How do synonyms interact with replication (row based vs statement based)
How do synonyms interact with views (and views execution)
How to present synonyms to users (as part of INFORMATION_SCHEMA for instance?)
Performance considerations for multiple connections to the database.
Setting configuration variables via environment switches where appropriate
Declaratively (likely docker-compose yml file) state the system's architecture.
Documenting the necessary steps to deployment
Producing a blog of its operation
Optionally enable deployment via Kubernetes
Implementing the plan and creating a CI/CD pipeline for testing.
Students Interested:
2
Students Interested:
6
Students Interested:
2
Students Interested:
3
Students Interested:
2
Students Interested:
2
Students Interested:
1
Students Interested:
2
Students Interested:
1
Students Interested:
3
Students Interested:
2
Details:
Skills:
C/C++
Mentor:
Kristian Nielsen
Details:
Skills:
C/C++
Mentor:
Sergei Golubchik
Details:
Skills:
C/C++
Mentor:
Holyfoot
Details:
Skills:
C/C++
Mentor:
Sergei Golubchik
Skills:
Perl
Mentor:
Colin Charles
Students Interested:
2
Details:
Skills:
C
Mentor:
Oleksandr Byelkin, Georg Richter
Details:
Skills:
C/C++
Mentor:
Markus Makela
Details:
Skills:
C/C++
Mentor:
Markus Makela & Massimiliano Pinto
Details:
Skills:
C/C++
Mentor:
Markus Makela
Details:
Skills:
C/C++
Mentor:
Markus Makela
Details:
Skills:
C/C++
Mentor:
Sergei Petrunia
Details:
Skills:
C/C++
Mentor:
Alexander Barkov
We participated in the Google Summer of Code 2015. MariaDB and the MariaDB Foundation believe we are making a better database that remains a drop-in replacement to MySQL. We also work on making LGPL connectors (currently in C, Java, C++ in development) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. Lately, we also have MariaDB MaxScale which is a pluggable database proxy.
Please join us at irc.freenode.net at #maria to mingle with the community. Don't forget to subscribe to maria-developers@lists.launchpad.net (this is the main list where we discuss development).
Please keep in mind that in April we travel a lot (conferences, busy time focusing on a release), so if you have a question and nobody on IRC answers — do not feel disappointed, ask in an email to maria-developers@lists.launchpad.net. Asking on the mailing list means others benefit from your Q&A too!
At the moment, tasks that may be suitable for GSoC 2015 are listed in the MariaDB Issue Tracker under
This project consists of two parts -- it can either be performed by 2 students or 1 student with the relevant skills:
The tool needs to be updated to understand the replication feature called (GTIDs) in MariaDB 10. The current version does not support GTIDs and the MySQL variant does not speak MariaDB 10's GTIDs.
in MySQL 5.6 also supports streaming servers for . This is important as the MHA tool can also use this feature.
We have the concept of (non-materialized) columns, and currently to have an on a virtual column one has to materialize it. To support indexes on fully virtual columns, a storage engine must call back into the server to calculate the value of the virtual column.
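Illustration (hypothetical table): today the generated column has to be materialized (PERSISTENT) before it can be indexed; the goal is to allow the same index when the column is declared VIRTUAL instead.
CREATE TABLE t1 (
  a INT,
  b INT,
  ab INT AS (a + b) PERSISTENT,
  KEY (ab)
);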
User functions (UDF-like) that return a table, rows and columns. It should be possible to use it in other statements. A possible implementation could be: the function exports a generator, we create a handler of the hidden "storage engine" class, no indexes, and convert this generator to rnd_init/rnd_next. Need to disable rnd_pos somehow. Alternatively, it can materialize the result set in a temporary table (like information_schema does), then this table can be used normally.
With one can create functions in SQL, but this syntax doesn't allow one to create an aggregate function (like , , etc). This task is to add support for aggregate .
enhancements for 10.1 that we want to work on include adding support for altitude (the third coordinate), converters (eg. ST_GeomFromGeoJSON - ST_AsGeoJSON, ST_GeomFromKML - ST_AsKML, etc.), Getting data from SHP format (shp2sql convertor), as well as making sure we are fully OpenGIS compliant.
MySQL 5.6 has a memcached plugin to InnoDB. MySQL 5.7 has improved performance of this. The task would be to port this to run against MariaDB, and make it work against XtraDB/InnoDB for the 10.2 series of MariaDB.
The purpose of this task is to create an easy-to-use facility for setting up a new MariaDB slave.
MyISAM and Aria support special that only store the hash of the data in the index tree. When two hashes match in the index, the engine compares actual row data to find whether the rows are identical. This is used in internal temporary tables that the optimizer creates to resolve SELECT DISTINCT queries. Normal unique indexes cannot always be used here, because the select list can be very long or include very long strings.
This task is to provide a direct SQL interface to this feature and to allow users to create these indexes explicitly. This way we can have unique constraints for blobs and very long strings.
It is a well-known and very old MySQL/MariaDB limitation that temporary tables can only be used once in any query; for example, one cannot join a temporary table to itself. This task is about removing this limitation
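Illustration of the limitation (hypothetical temporary table); the second statement fails today with a "Can't reopen table" error:
CREATE TEMPORARY TABLE tmp (id INT);
SELECT * FROM tmp AS a JOIN tmp AS b ON a.id = b.id;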
MySQL Master HA (MHA) is a tool to assist with automating master failover and slave promotion within short downtime, without suffering from replication consistency problems, and without performance penalty. We would like to have this tool support MariaDB 10 .
Provide import and export functions for popular data formats like JSON, XML (limited), PHP, ... for Connector/C and MariaDB Server (which use same code base for dynamic columns)
Design a filter that will capture incoming inserts, updates and deletes for specified tables (given as a regex) in a separate log file that is consumable in JSON or CSV form, so that external ETL processes can process it for uploading data into a DWH or big data platform. Optionally, a plugin that feeds this log into a Kafka broker that can put this data on a Hadoop node can be developed as a next step.
Develop a MaxScale filter that will translate SQL Server syntax to MariaDB syntax. Develop a SQL Server client protocol plugin.
This page is licensed: CC BY-SA / Gnu FDL
Details:
Skills:
C/C++
Mentor:
Kristian Nielsen
Details:
Skills:
C/C++
Mentor:
Sergey Vojtovich
Details:
Skills:
C/C++
Mentor:
Sergei Golubchik
Details:
Skills:
C/C++
Mentor:
Sergei Golubchik
Details:
Skills:
C/C++
Mentor:
Sergei Golubchik
Details:
Skills:
C/C++
Mentor:
Holyfoot
Details:
Skills:
C/C++
Mentor:
Colin Charles
Details:
Skills:
C/C++
Mentor:
Kristian Nielsen
Details:
Skills:
C/C++
Mentor:
Sergei Golubchik
Details:
Skills:
C/C++
Mentor:
Sergei Golubchik
Skills:
Perl
Mentor:
Colin Charles
Details:
Skills:
C
Mentor:
Oleksandr Byelkin, Georg Richter
Details:
Skills:
C/C++
Mentor:
Markus Makela
Details:
Skills:
C/C++
Mentor:
Markus Makela & Massimiliano Pinto
We participated in the Google Summer of Code 2018. The MariaDB Foundation believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently C, ODBC, Java) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. And we have MariaDB ColumnStore, which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.
Please join us at irc.freenode.net at #maria to mingle with the community. Don't forget to subscribe to maria-developers@lists.launchpad.net (this is the main list where we discuss development).
A few handy tips for any interested students who are unsure which projects to choose: Blog post from former GSoC student & mentor
To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.
Loaded from the
MariaDB ColumnStore supports DECIMAL with some limitations:
We do not support the full DECIMAL range that is in MariaDB
In several places in the code we convert the DECIMAL to DOUBLE during execution therefore losing precision
Implementing this will likely require the following:
Implementation of methods to handle MariaDB's DECIMAL format
Support for a longer than 8-byte numeric column type (there is an InfiniDB tree with work for this already)
Modification of the primitives processor for the math
Modification of the function expression processor to handle the new type
We need an ORM-style NoSQL read API to go along with the bulk write API of mcsapi.
This will likely take the form of:
A reader in ExeMgr which will convert messages from mcsapi into jobs
Code in mcsapi to send/receive the messages
Although ExeMgr can already receive messages with an execution plan, the format is very complex and the ABI breaks easily (we often send whole C++ objects).
We should look at other ORM frameworks for inspiration for the API design.
This task is to do the design for this API.
The mysqlbinlog client program needs to be updated to support GTID.
Here is a suggested list of things to be done:
The --start-position and --stop-position options should be able to take GTID positions; or maybe there should be new --start-gtid and --stop-gtid options, like --start-gtid=0-1-100,1-2-200,2-1-1000.
A GTID position means the point just after that GTID. So starting from GTID 0-1-100 and stopping at GTID 0-1-200, the first GTID output will probably be 0-1-101 and the last one 0-1-200. Note that if some domain is not specified in the position, it means to start from the beginning, respectively stop immediately, in that domain.
Probably some more things will come up during the work, but this looks like a reasonable start.
Add an UPDATE operation that returns a result set of the changed rows to the client.
I'm not exactly sure what the corresponding multiple-table syntax should look like, or if it is possible at all. But having it just for single-table updates would already be a nice feature.
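A sketch of the single-table form (the exact syntax is part of the task; table and columns invented for the example):
UPDATE accounts
  SET balance = balance - 10
  WHERE id = 1
RETURNING id, balance;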
In MySQL, Optimizer trace is a JSON object recording the execution path through the optimizer, decisions that were made and the reasons for them. See
Users were asking for MariaDB to have a similar feature.
The purpose of this task is to create an easy-to-use facility for setting up a new MariaDB replication slave.
Setting up a new slave currently involves: 1) installing MariaDB with the initial database; 2) pointing the slave to the master with CHANGE MASTER TO; 3) copying the initial data from the master to the slave; and 4) starting the slave with START SLAVE. The idea is to automate step (3), which currently needs to be done manually.
The syntax could be something as simple as
LOAD DATA FROM MASTER
This would then connect to the master that is currently configured. It will load a snapshot of all the data on the master, and leave the slave position at the point of the snapshot, ready for START SLAVE to continue replication from that point.
The idea is to do this non-blocking on the master, in a way that works for any storage engine. It will rely on row-based replication to be used between the master and the slave.
At the start of LOAD DATA FROM MASTER, the slave will enter a special provisioning mode. It will start replicating events from the master at the master's current position.
The master dump thread will send binlog events to the slave as normal. But in addition, it will interleave a dump of all the data on the master contained in tables, views, or stored functions. Whenever the dump thread would normally go to sleep waiting for more data to arrive in the binlog, the dump thread will instead send another chunk of data in the binlog stream for the slave to apply.
A "chunk of data" can be:
A CREATE OR REPLACE TABLE / VIEW / PROCEDURE / FUNCTION
A range of N rows (N=100, for example). Each successive chunk will do a range scan on the primary key from the end position of the last chunk.
Sending data in small chunks avoids the need for long-lived table locks or transactions that could adversely affect master performance.
The slave will connect in GTID mode. The master will send dumped chunks in a separate domain id, allowing the slave to process chunks in parallel with normal data.
During the provisioning, all normal replication events from the master will arrive on the slave, and the slave will attempt to apply them locally. Some of these events will fail to apply, since the affected table or row may not yet have been loaded. In the provisioning mode, all such errors will be silently ignored. Proper locking (isolation mode, eg.) must be used on the master when fetching chunks, to ensure that updates for any row will always be applied correctly on the slave, either in a chunk, or in a later row event.
In order to make the first version of this feature feasible to implement in a reasonable amount of time, it should set a number of reasonable restrictions (which could be relaxed in a later version of the feature):
Give up with an error if the slave is not configured for GTID mode (MASTER_USE_GTID != NO).
Give up with error if the slave receives any event in statement-based binlogging (so the master must be running in row-based replication mode, and no DDL must be done while the provisioning is running).
Give up with an error if the master has a table without primary key.
Secondary indexes will be enabled during the provisioning; this means that tables with large secondary indexes could be expensive to provision.
As a follow-on to MDEV-4691 we would like GSSAPI encryption (in addition to authentication) support in MariaDB. I am told that the current plan is to create a plugin interface and then we can build GSSAPI encryption on top of that, so here is a ticket for that.
From having written , there were a couple things I would like to see in the plugin encryption interface.
First, GSSAPI is weird in that it does authentication before encryption (TLS/SSL are the other way around, establishing an encrypted channel and then doing authentication over it). Of course support for this is needed, but more importantly, packets must be processed in a fully serialized fashion. This is because encrypted packets may be queued while one end of the connection is still finishing up processing the authentication handshake. One way to do this is registering "handle" callbacks with connection-specific state, but there are definitely others.
Additionally, for whatever conception there ends up being of authentication and encryption, it needs to be possible to share more data than just a socket between them. The same context will be used for authentication and encryption, much as an SSL context is (except of course we go from authentication to encryption and not the other way around).
This ties into an issue of dependency. If authentication plugins are separate entities from encryption plugins in the final architecture, it might make sense to do mix-and-match authentication with encryption. However, there are cases - and GSSAPI is one - where doing encryption requires a certain kind of authentication (or vice versa). You can't do GSSAPI encryption without first doing GSSAPI authentication. (Whether or not it's permitted to do GSSAPI auth->encryption all over a TLS channel, for instance, is not something I'm concerned about.)
Finally, encrypted messages are larger than their non-encrypted counterparts. The transport layer should cope with this so that plugins don't have to think about reassembly, keeping in mind that there may not be a way to get the size of a message when encrypted without first encrypting it.
It's unfortunately been a little while since I wrote that code, but I think those were the main things that we'll need for GSSAPI. Thanks!
Currently only a few aggregate functions are supported as window functions; the list can be found here
In MDEV-7773, support for creating custom aggregate functions was added. This task would extend that feature and make custom aggregate functions usable as window functions.
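Sketched goal of the task (assuming an agg_sum aggregate created as in the example referred to below): the same user-defined aggregate should be usable with an OVER clause.
SELECT grp, agg_sum(x) OVER (PARTITION BY grp ORDER BY id) AS running_sum
FROM t1;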
An example of creating a custom aggregate function is given below:
Currently no true LOCK=NONE exists on the slave. ALTER TABLE is first committed on the master, then it is replicated on the slaves. The purpose of this task is to create a true LOCK=NONE.
Implementation Idea
The master will write BEGIN_DDL_EVENT to the binlog after it hits ha_prepare_inplace_alter_table. Then the master will write a QUERY_EVENT to the binlog with the actual ALTER query. On commit/rollback, the master will write COMMIT_DDL_EVENT/ROLLBACK_DDL_EVENT.
On the slave there will be a pool of threads (its size a configurable global variable) which will apply these DDLs. On receiving BEGIN_DDL_EVENT, the slave thread will pass the QUERY_EVENT to one of the worker threads. The worker thread will execute until ha_inplace_alter_table. The actual commit_inplace_alter will be called by the SQL thread. If the SQL thread receives some kind of rollback event, it will signal the worker thread to stop executing the ALTER. If none of the worker threads is available, the event will be enqueued; then, if we receive a rollback event we simply discard the event from the queue, and if we receive a commit event the SQL thread will synchronously process the DDL event.
mysqltest has a lot of historical problems:
ad hoc parser, weird limitations
commands added as needed with no view over the total language structure
historical code issues (e.g. casts that became unnecessary 10 years ago), etc.
A lot can be done to improve it.
control structures, else in if, break and continue in while, for (or foreach) loop
proper expression support in let, if
had Cassandra Storage Engine which was developed for Cassandra 1.1.x. Back then, Cassandra provided a Thrift API, and that was what Cassandra-SE used.
Then, Cassandra 2.0 switched to using a different network protocol (and also changed the data model).
This task is to develop a Cassandra Storage Engine V2 using DataStax's C++ client library ().
See also: MDEV-8947 was a previous attempt to implement this engine. Unfortunately it didn't even produce a skeleton engine.
Histograms with equal-width bins are easy to construct using samples. For this it's enough to look through the given sample set and, for each value from it, to figure out what bin this value can be placed in. Each bin requires only one counter. Let f be a column of a table with N rows and n be the number of samples by which the equal-width histogram of k bins for this column is constructed. Suppose that, after looking through all sample rows, the counters created for the histogram bins contain the numbers c[1],..,c[k]. Then m[i] = c[i]/n * 100 is the percentage of the rows whose values of f are expected to be in the interval
It means that, if the sample rows have been chosen randomly, the expected number of rows with values of f in this interval can be approximated by m[i]/100 * N.
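As a worked illustration (numbers invented for the example): with n = 1000 sample rows, suppose c[3] = 150 samples fall into the third bin; then m[3] = 150/1000 * 100 = 15, so for a table of N = 2,000,000 rows roughly 15/100 * 2,000,000 = 300,000 rows are expected to have values of f in that bin's interval.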
To collect such statistics it is suggested to use the following variant of the ANALYZE TABLE command:
Here:
'WITH n ROWS' provides an estimate for the number of rows in the table in the case when this estimate cannot be obtained from statistical data.
'SAMPLING p PERCENTS' provides the percentage of sample rows used to collect statistics. If this is omitted, the number is taken from the system variable samples_ratio.
'IN RANGE r' sets the range of the equal-width bins of the histogram built for the column col1. If this is omitted and the min and max values for the column can be read from statistical data, then the histogram is built for the range [min(col1), max(col1)]. Otherwise the range [MIN_type(col1), MAX_type(col1)] is considered. Values beyond the given range, if any, are taken into account in two additional bins.
A multiple-table UPDATE first performs join operations, then it updates the matching rows. A multiple-table UPDATE returning a result set does the following:
first performs join operations
for each row of the result of the join it calculates some expressions over the columns of the join and forms from them a row of the returned result set
after this it updates the matching rows
A multiple-table DELETE first performs join operations, then it deletes the matching rows. A multiple-table DELETE returning a result set does the following:
first performs join operations
for each row of the result of the join it calculates some expressions over the columns of the join and forms from them a row of the returned result set
after this it deletes the matching rows
Currently, the MariaDB privilege system only performs whitelist checks for access control to a certain database, table or column. This makes it difficult if we need to block access to a certain database/table/column while allowing access to all others.
A good solution would be to allow REVOKE of anything that a user is able to do: not only exactly those grants that were granted to a user, but also a subset. For example:
Currently, the InnoDB system tablespace can only be automatically encrypted/decrypted by the background encryption threads if innodb_encrypt_tables=ON|FORCE, innodb_encryption_threads>0, and innodb_encryption_rotate_key_age>0. There is no way to manually encrypt/decrypt the tablespace.
File-per-table tablespaces can be manually encrypted with:
File-per-table tablespaces can be manually decrypted with:
Some users want a similar method that would allow them to manually encrypt/decrypt the InnoDB system tablespace.
This is loosely related to MDEV-14571, since both issues are related to the fact that the system tablespace can only be encrypted/decrypted by the background threads.
SP/PS (Stored Procedures / Prepared Statements) allocate memory until the PS cache of the SP is destroyed. There is no way to see how much memory is allocated and whether it grows with each execution (the first 2 executions can lead to new memory allocation, but later ones should not).
Task minimum:
Status variables which count the memory used/allocated for SP/PS by thread and/or for the server.
Other ideas:
Automatically stop allocation in the debugging version after the second execution and raise an exception on any further attempt.
Information schema by threads and SP/PS with information about allocated and used memory
Information can be collected in the MEM_ROOTs of the SP/PS. By storing info about the status of the mem_root before execution and then checking it afterwards, newly allocated memory can be found.
MEM_ROOT can be changed to have a debug mode which makes it read-only and which can be switched on after the second execution.
This page is licensed: CC BY-SA / Gnu FDL
Version upgrade support for DECIMAL from the current form to the new form
Starting and stopping GTID should work both with local files and with --read-from-remote-server. For the latter, there are a couple of extra things that need doing in the master-slave protocol; see get_master_version_and_clock() in sql/slave.cc.
At the end of the dump, put these statements, to reduce the risk of those session variables incorrectly spilling into subsequent statements run in the same session:
rich enough expressions to make resorting to sql unnecessary in most cases
remove unused and redundant commands (e.g. system vs exec, query_vertical vs vertical_results ONCE)
remove complex commands that do many sql statements under the hood, if they can be scripted, e.g. sync_slave_with_master
remove over-verbose treatment of rpl test failures
scoped variables
parameters for the source command
remove dead code
Details:
Mentor:
Andrew Hutchings
Details:
Mentor:
Andrew Hutchings
Details:
Mentor:
Andrei Elkin
Details:
Mentor:
Igor Babaev
Details:
Mentor:
Varun Gupta
Details:
Mentor:
Andrei Elkin
Details:
Mentor:
Vladislav Vaintroub
Details:
Mentor:
Varun Gupta
Details:
Mentor:
Sachin Setiya
Details:
Mentor:
Sergei Golubchik
Details:
Mentor:
Sergei Golubchik
Details:
Mentor:
Vicentiu Ciorbaru
Details:
Mentor:
Igor Babaev
Details:
Mentor:
Vicentiu Ciorbaru
Details:
Mentor:
Jan Lindström
Details:
Mentor:
Oleksandr Byelkin
In 2021, we again participated in the . The believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently , , , ) and on , which allows you to scale your reads & writes. And we have , which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.
Please join us on to mingle with the community. You should also subscribe to (this is the main list where we discuss development).
To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.
SET session.server_id = @@global.server_id,
    session.gtid_domain_id=@@global.gtid_domain_id;

UPDATE [LOW_PRIORITY] [IGNORE] tbl_name
  SET col_name1={expr1|DEFAULT} [, col_name2={expr2|DEFAULT}] ...
  [WHERE where_condition]
  [ORDER BY ...]
  [LIMIT row_count]
  RETURNING select_expr [, select_expr ...]

CREATE AGGREGATE FUNCTION agg_sum(x INT) RETURNS DOUBLE
BEGIN
  DECLARE z DOUBLE DEFAULT 0;
  DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN z;
  LOOP
    FETCH GROUP NEXT ROW;
    SET z = z + x;
  END LOOP;
END|

[min(f) + (max(f)-min(f))/k * (i-1), min(f) + (max(f)-min(f))/k * i)

ANALYZE FAST TABLE tbl [ WITH n ROWS ] [SAMPLING p PERCENTS ]
  PERSISTENT FOR COLUMNS (col1 [IN RANGE r] [WITH k INTERVALS],...)

GRANT SELECT ON some_database.* TO a_user@%;
REVOKE SELECT ON some_database.secret_table FROM a_user@%;

ALTER TABLE tab ENCRYPTION=YES;
ALTER TABLE tab ENCRYPTION=NO;

The mysqlbinlog client program needs to be updated to support GTID. Here is a suggested list of things to be done:
The --start-position and --stop-position options should be able to take GTID positions; or maybe there should be new --start-gtid and --stop-gtid options, like --start-gtid=0-1-100,1-2-200,2-1-1000.
A GTID position means the point just after that GTID. So starting from GTID 0-1-100 and stopping at GTID 0-1-200, the first GTID output will probably be 0-1-101 and the last one 0-1-200. Note that if some domain is not specified in the position, it means to start from the beginning, respectively stop immediately, in that domain.
Starting and stopping GTID should work both with local files and with --read-from-remote-server. For the latter, there are a couple of extra things that need doing in the master-slave protocol; see get_master_version_and_clock() in sql/slave.cc.
At the end of the dump, put these statements, to reduce the risk of those session variables incorrectly spilling into subsequent statements run in the same session:
Probably some more things will come up during the work, but this looks like a reasonable start.
Details:
Mentor:
Brandon Nesterenko
Implement the standard behavior for
Also, this statement is supposed to work:
And these should not
Note that
should not list roles and privileges granted to PUBLIC (unless granted to xxx too), but
should, arguably, list them.
Details:
Mentor:
Oleksandr Byelkin
SP/PS (Stored Procedures / Prepared Statements) allocate memory until the PS cache of the SP is destroyed. There is no way to see how much memory is allocated and whether it grows with each execution (the first 2 executions can lead to new memory allocation, but later ones should not).
Task minimum: Status variables which count the memory used/allocated for SP/PS by thread and/or for the server.
Other ideas:
Automatically stop allocation in the debugging version after the second execution and raise an exception on any further attempt.
Information schema by threads and SP/PS with information about allocated and used memory
Information can be collected in the MEM_ROOTs of the SP/PS. By storing info about the status of the mem_root before execution and then checking it afterwards, newly allocated memory can be found. MEM_ROOT can be changed to have a debug mode which makes it read-only and which can be switched on after the second execution.
Details:
Mentor:
Oleksandr Byelkin
Background is this question on stackexchange: mariadb-compare-json. The task is to provide a function that can be used to compare 2 JSON documents for equality; the name could be e.g. JSON_NORMALIZE. JSON_COMPACT already takes care of removing spaces, but this is not sufficient. Keys need to be (recursively) sorted, and once spaces are removed, the documents can be compared as binary strings.
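A sketch of the intended behavior (function name tentative): after normalization the two documents below should compare equal as strings, so the SELECT would return 1.
SELECT JSON_NORMALIZE('{"b": 2, "a": 1}') = JSON_NORMALIZE('{ "a" : 1, "b" : 2 }');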
Details:
Mentor:
Vicențiu Ciorbaru
The following linear regression functions exist in a number of other DBMSs, such as Oracle, PostgreSQL:
Some have also been added to Columnstore.
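Example usage once implemented (table and column names invented for the example; the function names are the REGR_* names referenced by this task):
SELECT REGR_SLOPE(y, x)     AS slope,
       REGR_INTERCEPT(y, x) AS intercept,
       REGR_COUNT(y, x)     AS n
FROM observations;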
Details:
Mentor:
Nikita Malyavin
It would be useful if MariaDB had a utility that was able to parse frm files and print the DDL associated with the table. For example, it would be useful for users who performed a partial backup with mariadb-backup (partial-backup-and-restore-with-mariadb-backup) but forgot to also back up the table DDL, so they can't restore the tables using the documented restore process. mysqlfrm (mysqlfrm.py) is a tool that already exists and does similar things, but it doesn't seem very user-friendly: it needs to be able to contact the local MariaDB server, it also needs to be able to spawn a server instance, and it seems to need to create a bunch of files during this process, e.g.:
Details:
Mentor:
Vicențiu Ciorbaru
JSON_DETAILED function ( ) is described as
We now have a use case for it: optimizer trace output. The optimizer trace is too large to be copied in full; instead we use expressions like
Our experience is that JSON_DETAILED has some room for improvement when it comes to the quality of automatic JSON formatting.
Example:
Things to note:
empty lines at the start (right before/after the "range_scan_alternatives")
"analyzing_index_merge_union":[] occupies 3 lines where one would be sufficient.
the same goes for "ranges"
One can look at the JSON pretty-printer that is used by EXPLAIN FORMAT=JSON and the optimizer trace. It produces a better result (but it has room for improvement, too). Extra: in MySQL, the function is called JSON_PRETTY. We should add the ability to use this name as an alias.
Details:
Mentor:
Vicențiu Ciorbaru
Currently, histograms are stored as an array of 1-byte bucket bounds (SINGLE_PREC_HB) or 2-byte bucket bounds (DOUBLE_PREC_HB). The table storing the histograms supports different histogram formats but limits them to 256 bytes (hist_size is a tinyint).
This prevents us from supporting other kinds of histograms. The first low-hanging fruit would be to store the histogram bucket bounds precisely (like MySQL and PostgreSQL do, for example). The idea of this MDEV is to switch to JSON as the storage format for histograms. If we do that, it will:
Improve the histogram precision
Allow the DBAs to examine the histograms
Enable other histogram types to be collected/used.
Milestone-1: Let histogram_type have another possible value, tentative name "JSON". When that is set, let the ANALYZE TABLE syntax collect a JSON "histogram";
that is, the following should work:
this should produce {"hello":"world"}.
Milestone-2: produce JSON with histogram(). The exact format is not specified; for now, print the bucket endpoints and produce output like this:
Milestone-2, part#2: make mysql.column_stats.histogram a blob.
Milestone-3: Parse the JSON back into an array. Figure out how to use the JSON parser. Parse the JSON data produced in Milestone-2 back. For now, just print the parsed values to stderr. (Additional input provided on Zulip re parsing valid/invalid JSON histograms.)
Milestone-4: Make the code support different kinds of histograms. Currently, there's only one type of histogram. Smaller issue: histogram lookup functions assume the histogram stores fractions, not values. Bigger issue: memory allocation for histograms is de-coupled from reading the histograms. See alloc_statistics_for_table, read_histograms_for_table. The histogram object lives in a data structure that is bzero'ed first and then filled later (IIRC there was a bug, since fixed, where the optimizer attempted to use a bzero'ed histogram). Can histograms be collected or loaded in parallel by several threads? This was an (unintentional?) possibility but then it was disabled (see the TABLE_STATISTICS_CB object and its use).
Step #0: Make Histogram a real class. Here's the commit: 3ac32917ab6c42a5a0f9ed817dd8d3c7e20ce34d
Step 1: Separate classes for binary and JSON histograms. Need to introduce
and a factory function
for now, let Histogram_json::point_selectivity() and Histogram_json::range_selectivity() return 0.1 and 0.5, respectively.
Step 2: Demonstrate saving/loading of histograms. Now, the code already can:
collect a JSON histogram and save it.
when loading a histogram, figure from histogram_type column that this is JSON histogram being loaded, create Histogram_json and invoke the parse function.
Parse function at the moment only prints to stderr.
However, we should catch parse errors and make sure they are reported to the client.
The test may look like this:
Milestone-5: Parse the JSON data into a structure that allows lookups. The structure is
and it holds the data in KeyTupleFormat (See the comments for reasoning. There was a suggestion to use in_vector (This is what IN subqueries use) but it didn't work out)
Milestone 5.1 (aka Milestone 44)
Make a function to estimate selectivity using the data structure specified in previous milestone.
Make range_selectivity() accept key_range parameters.
(currently, they accept fractions, which is only suitable for binary histograms)
This means Histogram_binary will need to have access to min_value and max_value to compute the fractions.
Details:
Mentor:
Sergei Petrunia
my_vsnprintf() is used internally in the server as a portable printf replacement. And it's also exported to plugins as a service.
It supports a subset of printf formats and three extensions:
%s` means that a string should be quoted like an identifier
%b means that it's a binary string, not zero-terminated; printing won't stop at \0, so one should always specify the field width (like %.100b)
%M is used in error messages and prints the integer (errno) and the corresponding strerror() for it
gcc knows printf formats and checks whether the actual arguments match the format string, issuing a warning if they don't. Unfortunately there seems to be no easy way to teach gcc our extensions, so for now we have to disable printf format checks.
A better approach would be to use gcc-compatible format extensions, like the Linux kernel does. We should migrate to a different syntax for our extensions:
%sI to mean "print as an identifier"
%sB to mean "print a binary string"
%uE to mean "print an errno"
%sT to put a "..." as truncation indicator
Old formats can still be supported, or they can be removed; in the latter case the major version of the service should be increased to signal an incompatible change.
All error messages and all usages of my_vsnprintf should be changed to use the new syntax and gcc printf format checks should be enabled.
Details:
Mentor:
Sergei Golubchik
JSON_CONTAINS can be used to test for JSON object equality in some cases, but we seem to lack a clear JSON_EQUALS function.
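A sketch of the proposed function (name as suggested by this task), where key order and insignificant whitespace should not affect the result; the call below would be expected to return 1:
SELECT JSON_EQUALS('{"a": 1, "b": [1, 2]}', '{ "b" : [1, 2], "a" : 1 }');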
Details:
Mentor:
Vicențiu Ciorbaru
IO_CACHE has basically three read/write modes: only read, only write, and a sequential read/write FIFO mode SEQ_READ_APPEND.
While some performance-sensitive places, like the replication slave thread, use SEQ_READ_APPEND, it may be a bottleneck, since reads and writes are sequential (and co-sequential, i.e. reads and writes block each other).
The task is to implement a non-blocking mode or multi-reader, multi-writer use-case through a concurrent ring buffer implementation.
Possible approaches
Lock-free n-consumer, m-producer ring buffer
This implementation requires limiting the number of simultaneous accessors and reserving slots for them.
Lock-free implementations can contain busy waits, but no locks, except when the number of consumers or producers is exceeded. This can be controlled by a semaphore with a capacity equal to the number of cores.
This is the ideal way, but it can be overkill because of complicated busy loops and slot management.
This is also hard because writes can be bigger than the buffer. See buffer excess below.
Simple rwlock-based non-blocking approach
The bottleneck basically occurs because SEQ_READ_APPEND blocks for the whole duration of the buffer copy.
We can avoid it by moving the pointers first, thus allocating a place for copying, and then making the copy from/to the buffer without holding a lock.
An rwlock will be used to access the pointers, i.e. readers access IO_CACHE::end_of_file with a read lock to check the borders, while writers access it with a write lock.
Buffer excess
Excesses make things work sequentially.
When the buffer is full, a separate write buffer is created. When the write buffer is full, a flush happens.
Flushes wait for all writers to finish first, then lock the write buffer for flushing.
The read buffer can be flushed in a more relaxed way: there is no need to lock for flushing, but we have to lock for buffer allocation and wait for all writers.
Waiting for writers can be done with another rwlock.
Single-readerness
The real-world cases are mostly single-consumer, and that is essential for IO_CACHE: it is variable-length and has no underlying data format, so the reader always has to make at least two sequential reads (one to read the size and another to read the body).
Single-reader considerations can relax some conditions and ease the implementation.
io_cache_reserve api
We can add a function to reserve space for writing, for the case of writing big objects (both bigger than the write cache and smaller than it but big enough not to fit in the external buffer), for cases like copying one cache to another.
The function should return a future-like object, since we have to notify IO_CACHE back that the writing is finished (to trigger a flush, for example).
Details:
Mentor:
Nikita Malyavin
Formatting more complex strings in a SELECT statement can get awkward when there are many concat(), format(), etc calls involved.
It would be very cool and helpful to have a function that takes an input string and a formatting specification and returns string formatted using the rules the user passed in the specification.
A great example of such a function is the classic C printf function, which, in this context, would look something like:
SELECT printf('%s %s, %s', first_name, last_name, job_title) from employees;
But it doesn't necessarily need to look this way; an alternative syntax could be Python-ish, which would leverage the fact that the server already knows the datatype of each field used in the formatting scheme:
SELECT sformat('arg1: {}, arg2: {}', col1, col2) from table;
In that syntax one passes formatting options within the curly braces:
Ideally, this new function should use, behind the scenes, the existing builtin formatting functions in MariaDB (e.g. date_format(), format()) and even future formatting functions (e.g. MySQL's format_bytes(), format_pico_time()), so the syntax has to be designed in a smart way to accommodate easily future additions.
Details:
Mentor:
Sergei Golubchik
As part of the Jupyter Messaging protocol, the Jupyter frontend sends a complete_request message to the MariaDB kernel when the user invokes the code completer in a Jupyter notebook.
This message is handled in the do_complete function from the MariaDBKernel class.
In simpler words, whenever the user hits the key shortcut for code autocompletion in a notebook, the MariaDB kernel's do_complete function is called with a number of arguments that help the kernel understand what the user wants to autocomplete.
So the autocompletion infrastructure in the MariaDB kernel is already kindly provided by Jupyter, we only need to send back to Jupyter a list of suggestions based on the arguments that do_complete receives :-).
Ideally we should aim to enable at least database, table and column name completion and also SQL keyword completion.
But no worries, there are plenty of possibilities to extend the functionality even more if the accepted student turns out to be very productive :D
Details:
Mentor:
Robert Bindar
At this moment the MariaDB kernel is only capable of getting the results sets from the MariaDB client in HTML format and packing them in a Jupyter compatible format. Jupyter then displays them in notebooks like it would display Python Pandas dataframes.
Sure, the users can easily write SQL code to modify the content of a table like they would write in a classical command line database client.
But we want to go a bit further, we would love to have the capability to edit a result set returned by a SELECT statement (i.e. double click on table cells and edit) and have a button that users can press to generate a SQL statement that will update the content of the table via the MariaDB server.
Apart from interacting with the Jupyter frontend for providing this UI capability, we also have to implement a field integrity functionality so that we make sure users can't enter data that is not compatible with the datatype of the column as it is seen by the MariaDB server.
The project should start with a fair bit of research to understand how we can play with the Jupyter Messaging protocol to create the UI functionality and also to check other Jupyter kernels and understand what's the right and best approach for tackling this.
Details:
Mentor:
Andreia Hendea
Currently the MariaDB kernel doesn't impose any internal limits for the number of rows a user can SELECT in a notebook cell. Internally the kernel gets the result set from MariaDB and stores it in a pandas DataFrame, so users can use it with magic commands to chart data.
But this DataFrame is stored in memory, so if you SELECT a huge number of rows, say 500k or 1M, it's probably not a very good idea to create such a huge DataFrame.
We tested with 500k rows, and the DataFrame itself is not the biggest problem, it consumed around 500MB of memory. The problem is the amount of rows the browser needs to render, for 500k rows the browser tab with the notebook consumes around 2GB of memory, so the Jupyter frontend (JupyterLab, Jupyter Notebook) slows down considerably.
A potential solution is to introduce two new config options which would specify:
a limit for the number of rows the Jupyter notebook should render; a reasonable default value for this could be 50 rows, for instance (display_max_rows)
a limit for each SELECT statement, limit_max_rows, that the kernel would use to determine whether it should store the result set in memory in a DataFrame or store the result set on disk. A reasonable default value might be 100k rows.
The trickiest part of the project though is that, once the kernel writes a result set on disk, the charting magic commands need to detect that the data is not in memory, it is on disk, and they should find a smart mechanism for generating the chart from the disk data without loading the entire data in memory (which would defeat the whole purpose of the project). This might involve finding a new Python plotting library (instead of current matplotlib) that can accomplish the job.
Details:
Mentor:
Vlad Bogolin
Do you have an idea of your own, not listed above? Do let us know!
This page is licensed: CC BY-SA / Gnu FDL
We participated in the 2020. The believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently C, ODBC, Java) and on , which allows you to scale your reads & writes. And we have , which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.
Please join us on to mingle with the community. Don't forget to subscribe to (this is the main list where we discuss development).
A few handy tips for any interested students who are unsure which projects to choose:
To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.
SET session.server_id = @@global.server_id,
    session.gtid_domain_id=@@global.gtid_domain_id;

GRANT xxx TO PUBLIC;
REVOKE xxx FROM PUBLIC;
SHOW GRANTS FOR PUBLIC;
CREATE ROLE PUBLIC;
DROP ROLE PUBLIC;
SET ROLE PUBLIC;
GRANT PUBLIC TO xxx;
REVOKE PUBLIC FROM xxx;
SHOW GRANTS FOR xxx;
SHOW GRANTS;

REGR_SLOPE
REGR_INTERCEPT
REGR_COUNT
REGR_R2
REGR_AVGX
REGR_AVGY
REGR_SXX
REGR_SYY
REGR_SXY

[ec2-user@ip-172-30-0-249 ~]$ cd /tmp
[ec2-user@ip-172-30-0-249 tmp]$ sudo mysqlfrm --server=root:@localhost:3306 /var/lib/mysql/db1/tab.frm --port=12345 --user=mysql
# Source on localhost: ... connected.
# Spawning server with --user=mysql.
# Starting the spawned server on port 12345 ... done.
# Reading .frm files
#
# Reading the tab.frm file.
#
# CREATE statement for /var/lib/mysql/db1/tab.frm:
#
CREATE TABLE `db1`.`tab` (
`id` int(11) NOT NULL,
`str` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
#...done.

Represents JSON in the most understandable way, emphasizing nested structures.

SELECT
  JSON_DETAILED(JSON_EXTRACT(trace, '$**.analyzing_range_alternatives'))
FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

source mdev19160-data.sql
mysql> select JSON_DETAILED(JSON_EXTRACT(a, '$**.analyzing_range_alternatives')) from t200\G
*************************** 1. row ***************************
JSON_DETAILED(JSON_EXTRACT(a, '$**.analyzing_range_alternatives')): [
{
"range_scan_alternatives":
[
{
"index": "a_b",
"ranges":
[
"2 <= a <= 2 AND 4 <= b <= 4"
],
"rowid_ordered": true,
"using_mrr": false,
"index_only": true,
"rows": 1,
"cost": 1.1752,
"chosen": true
}
],
"analyzing_roworder_intersect":
{
"cause": "too few roworder scans"
},
"analyzing_index_merge_union":
[
]
}
]

CREATE TABLE mysql.column_stats (
min_value varbinary(255) DEFAULT NULL,
max_value varbinary(255) DEFAULT NULL,
...
hist_size TINYINT UNSIGNED,
hist_type ENUM('SINGLE_PREC_HB','DOUBLE_PREC_HB'),
histogram varbinary(255),
...{ "hello":"world"}SET histogram_type='json';
ANALYZE TABLE t1 persisent FOR ALL;
SELECT histogram FROM mysql.column_stats WHERE TABLE_NAME='t1' ;[
"value1",
"value2",
...
]

class Histogram -- interface, no data members.
class Histogram_binary : public Histogram
class Histogram_json : public Histogram

Histogram *create_histogram(Histogram_type)

INSERT INTO mysql.column_stats VALUES('test','t1','column1', .... '[invalid, json, data']);
FLUSH TABLES;
# this should print some descriptive test
--error NNNN
SELECT * FROM test.t1;

std::vector<std::string>

-- Print 'arg1: col1, arg2: col2' where col1 from table is of datetime type and should be printed as: 'Sunday November 2021'
SELECT sformat('arg1: {%W %M %Y}, arg2: {}', col1, col2) FROM table;

Also see the List of beginner friendly issues and issues labelled gsoc20 from the MariaDB Issue Tracker.
(Based on conversation with Igor) There are a lot of subquery conditions out there that are inexpensive to evaluate and have good selectivity. If we just implement MDEV-83, we may get regressions. We need to take the subquery condition's selectivity into account. It is difficult to get a meaningful estimate for an arbitrary, correlated subquery predicate. One possible solution is to measure selectivity during execution and reattach predicates on the fly. We don't want to change the query plan all the time; one way to dynamically move items between item trees is to wrap them inside Item_func_trig_cond so we can switch them on and off.
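As an illustration (hypothetical tables, not part of the task description), a correlated subquery predicate of the kind this task is concerned with might look like:

-- Hypothetical schema: the EXISTS condition is cheap (one indexed lookup per
-- outer row) and highly selective, so evaluating it early would pay off.
SELECT o.id, o.total
FROM orders o
WHERE o.total > 100
  AND EXISTS (SELECT 1 FROM vip_customers v WHERE v.customer_id = o.customer_id);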
Details:
Mentor:
Igor Babaev
Histograms with equal-width bins are easy to construct using samples. For this it's enough to look through the given sample set and, for each value from it, to figure out what bin this value can be placed in. Each bin requires only one counter. Let f be a column of a table with N rows and n be the number of samples by which the equal-width histogram of k bins for this column is constructed. Suppose that after looking through all sample rows the counters created for the histogram bins contain the numbers c[1],..,c[k]. Then m[i] = c[i]/n * 100 is the percentage of the rows whose values of f are expected to be in the interval [min(f) + (max(f)-min(f))/k * (i-1), min(f) + (max(f)-min(f))/k * i).
It means that if the sample rows have been chosen randomly, the expected number of rows with values of f from this interval can be approximated by the number m[i]/100 * N. To collect such statistics it is suggested to use the following variant of the ANALYZE TABLE command:
Here:
'WITH n ROWS' provides an estimate for the number of rows in the table in the case when this estimate cannot be obtained from statistical data.
'SAMPLING p PERCENTS' provides the percentage of sample rows to collect statistics. If this is omitted, the number is taken from the system variable samples_ratio.
'IN RANGE r' sets the range of equal-width bins of the histogram built for the column col1. If this is omitted and the min and max values for the column can be read from statistical data, then the histogram is built for the range [min(col1), max(col1)]. Otherwise the range [MIN_type(col1), MAX_type(col1)] is considered. The values beyond the given range, if any, are also taken into account in two additional bins.
WITH k INTERVALS says how many bins are included in the histogram. If it is omitted this value is taken from the system variable histogram_size.
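A hypothetical invocation of the proposed syntax (the table and column names are made up) could look like:

ANALYZE FAST TABLE orders WITH 1000000 ROWS SAMPLING 5 PERCENTS
PERSISTENT FOR COLUMNS (amount WITH 20 INTERVALS);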
Details:
Mentor:
Vicentiu Ciorbaru
Add support for FULL OUTER JOIN. One way to implement it is to rewrite the query
into the following union all:
Here t1.a is some non-nullable column of t1 (e.g. the column of a single-column primary key).
Details:
Mentor:
Igor Babaev
supported in MySQL-8.0 and MSSQL
Details:
Mentor:
Igor Babaev
The mysqlbinlog client program needs to be updated to support GTID. Here is a suggested list of things to be done:
The --start-position and --stop-position options should be able to take
GTID positions; or maybe there should be new --start-gtid and --stop-gtid
options. Like --start-gtid=0-1-100,1-2-200,2-1-1000.
A GTID position means the point just after that GTID. So starting from GTID 0-1-100 and stopping at GTID 0-1-200, the first GTID output will probably be 0-1-101 and the last one 0-1-200. Note that if some domain is not specified in the position, it means to start from the beginning, or respectively to stop immediately, in that domain.
Starting and stopping GTID should work both with local files, and with --read-from-remote-server. For the latter, there are a couple of extra things that need doing in the master-slave protocol, see get_master_version_and_clock() in sql/slave.cc.
At the end of the dump, put these statements, to reduce the risk of those session variables incorrectly spilling into subsequent statements run in the same session:
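The statements referred to are the ones shown in the code listings at the end of this page, i.e. something like:

SET session.server_id = @@global.server_id,
    session.gtid_domain_id = @@global.gtid_domain_id;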
Details:
Mentor:
Andrei Elkin
As a follow-on to MDEV-4691 we would like GSSAPI encryption (in addition to authentication) support in MariaDB. I am told that the current plan is to create a plugin interface and then we can build GSSAPI encryption on top of that, so here is a ticket for that. From having written GSSAPI for the internal interface, there were a couple things I would like to see in the plugin encryption interface.
First, GSSAPI is weird in that it does authentication before encryption (TLS/SSL are the other way around, establishing an encrypted channel and then doing authentication over it). Of course support for this is needed, but more importantly, packets must be processed in a fully serialized fashion. This is because encrypted packets may be queued while one end of the connection is still finishing up processing the authentication handshake. One way to do this is registering "handle" callbacks with connection-specific state, but there are definitely others.
Additionally, for whatever conception there ends up being of authentication and encryption, it needs to be possible to share more data than just a socket between them. The same context will be used for authentication and encryption, much as an SSL context is (except of course we go from authentication to encryption and not the other way around).
This ties into an issue of dependency. If authentication plugins are separate entities from encryption plugins in the final architecture, it might make sense to do mix-and-match authentication with encryption. However, there are cases - and GSSAPI is one - where doing encryption requires a certain kind of authentication (or vice versa). You can't do GSSAPI encryption without first doing GSSAPI authentication. (Whether or not it's permitted to do GSSAPI auth->encryption all over a TLS channel, for instance, is not something I'm concerned about.)
Finally, encrypted messages are larger than their non-encrypted counterparts. The transport layer should cope with this so that plugins don't have to think about reassembly, keeping in mind that there may not be a way to get the size of a message when encrypted without first encrypting it. It's unfortunately been a little while since I wrote that code, but I think those were the main things that we'll need for GSSAPI. Thanks!
Details:
Mentor:
Vicențiu Ciorbaru
With a few exceptions, most native aggregate functions are supported as window functions. In MDEV-7773, support for creating custom aggregate functions was added. This task proposes to extend that feature and allow custom aggregate functions to be used as window functions. An example of creating a custom aggregate function is given below:
This function can be used in the following query:
After this task is complete the following must also work:
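For reference, based on the agg_sum example that appears in the code listings at the end of this page, the end goal is that a query like the following works:

SELECT id, agg_sum(amount) OVER (ORDER BY id) AS running_total
FROM balances;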
Details:
Mentor:
Varun Gupta
mysqltest has a lot of historical problems:
ad hoc parser, weird limitations
commands added as needed with no view over the total language structure
historical code issues (e.g. casts that became unnecessary 10 years ago), etc.
A lot can be done to improve it. Ideas:
control structures, else in if, break and continue in while, for (or foreach) loop
proper expression support in let, if, etc
rich enough expressions to make resorting to sql unnecessary in most cases
remove unused and redundant commands (e.g. system vs exec, query_vertical vs vertical_results ONCE)
remove complex commands that do many sql statements under the hood, if they can be scripted, e.g. sync_slave_with_master
remove over-verbose treatment of rpl test failures
scoped variables
parameters for the source command
remove dead code
Details:
Mentor:
Sergei Golubchik
A multiple-table UPDATE first performs join operations, then it updates the matching rows. A multiple-table UPDATE returning a result set does the following:
first performs join operations
for each row of the result of the join it calculates some expressions over the columns of the join and forms from them a row of the returned result set
after this it updates the matching rows
A multiple-table DELETE first performs join operations, then it deletes the matching rows. A multiple-table DELETE returning a result set does the following:
first performs join operations
for each row of the result of the join it calculates some expressions over the columns of the join and forms from them a row of the returned result set
after this it deletes the matching rows
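A rough, hypothetical sketch of what such statements could look like (the exact grammar is part of the task; the tables and columns are made up):

-- hypothetical syntax: multi-table UPDATE returning the changed rows
UPDATE t1 JOIN t2 ON t2.t1_id = t1.id
SET t1.price = t1.price * 1.1
WHERE t2.active = 1
RETURNING t1.id, t1.price, t2.note;

-- hypothetical syntax: multi-table DELETE returning the deleted rows
DELETE t1 FROM t1 JOIN t2 ON t2.t1_id = t1.id
WHERE t2.expired = 1
RETURNING t1.id, t2.note;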
Details:
Mentor:
Igor Babaev
As MariaDB is getting more storage engines and as they're getting more features, MariaDB can optionally use more and more compression libraries for various purposes. InnoDB, TokuDB, RocksDB — they all can use different sets of compression libraries. Compiling them all in would result in a lot of run-time/rpm/deb dependencies, most of which will never be used by most users. Not compiling them in would result in requests to compile them in. While most users don't use all these libraries, many users use some of these libraries. A solution could be to load these libraries on request, without creating a packaging dependency. There are different ways to do it:
hide all compression libraries behind a single unified compression API. Either develop our own or use something like Squash. This would require changing all engines to use this API
use the same approach as in server services — create a service per compression library; a service implementation will just return an error code for any function invocation if the corresponding library is not installed. This way we could perhaps avoid modifying all affected storage engines.
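As an illustration of the user-visible side (the plugin name is hypothetical; innodb_compression_algorithm is an existing variable), loading a compression library on demand could end up looking like:

INSTALL SONAME 'provider_lz4';  -- hypothetical loadable compression provider
SET GLOBAL innodb_compression_algorithm = 'lz4';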
Details:
Mentor:
Sergei Golubchik
SP/PS (Stored Procedures / Prepared Statements) allocate memory until the SP/PS cache is destroyed. There is no way to see how much memory is allocated and whether it grows with each execution (the first two executions can lead to new memory allocation, but later ones should not).
Task minimum:
Status variables which count the memory used/allocated for SP/PS by thread and/or for the server.
Other ideas:
Automatically stop allocation in the debugging version after the second execution and raise an exception on any further allocation attempt.
An Information Schema table listing threads and SP/PS with information about allocated and used memory.
Information can be collected in the MEM_ROOTs of the SP/PS. By storing the status of the mem_root before execution and checking it afterwards, newly allocated memory can be found. MEM_ROOT can be extended with a debug mode that makes it read-only, which can be switched on after the second execution.
Details:
Mentor:
Oleksandr Byelkin
MariaDB ColumnStore supports DECIMAL with some limitations:
We do not support the full DECIMAL range that is in MariaDB
In several places in the code we convert the DECIMAL to DOUBLE during execution, therefore losing precision.
Implementing this will likely require the following:
Implementation of methods to handle MariaDB's DECIMAL format
Support for a longer than 8-byte numeric column type (there is an InfiniDB tree with work for this already)
Modification of the primitives processor for the math
Modification of the function expression processor to handle the new type
Version upgrade support for DECIMAL from the current form to the new form
Details
Mentor:
Andrew Hutchings
CS as of 1.4.2 relies on glibc for regex processing. We need to replace glibc with re2 for LIKE, REGEX and other facilities where it affects performance.
Identify places with glibc regex function invocations
Pick the invocations that significantly affect the timings of the query
Replace glibc regex calls with the appropriate re2 calls
Details
Mentor:
Roman
RowGroup is the unit of data sent around the CS cluster. It is basically a set of records for fixed-size data types plus an additional storage area for binary data, e.g. strings. Right now there are lots of RowGroup methods that access RowGroup fixed-size data using an extra level of indirection. Here is an example: return ((int64_t) &data[offsets[colIndex]]); This expression uses an extra assembly instruction to calculate the effective address. We want to remove the 'offsets[colIndex]' part of this and similar expressions in the RowGroup code.
Details
Mentor:
Roman
As of 1.4.1 CS uses two-phase sorting. Here are the phases:
Presort partial runs of data.
Merge the presorted partial runs produced during the 1st phase. Here is a more detailed explanation of how sorting works as of 1.4.1:
CS gets a portion of record data from previous steps of the query execution (an RGData instance from the ring buffer of RGData-s) and produces a sorting run out of it using the existing sorting class LimitedOrderBy. If the query contains LIMIT then we apply it at this phase; this allows us to significantly reduce the data set cardinality. If the query contains LIMIT + OFFSET then CS builds a sorted run of records that is up to LIMIT+OFFSET in size. CS does this step in parallel, dividing the whole data set into k runs, where k is governed by a session variable - infinidb/columnstore_orderby_threads. At this phase CS tries to preallocate memory in QUEUE_RESERVE_SIZE batches. CS then merges and sorts the k presorted partial runs produced by the previous phase in a single thread. If the query contains the DISTINCT keyword, CS rebuilds a hash map to preserve uniqueness. We want to make the 2nd phase also parallel, using range partitioning of the presorted runs produced by the 1st phase. After the 1st phase finishes we know the distribution of the sorting key values and can thus divide the key value range into regions - buckets. Every 2nd phase thread takes values from the corresponding region buckets (containing the same value region) from every 1st phase sorted run. Then all 2nd phase threads sort their runs in parallel. In the end we put the sorted regions, in ascending order of the key values, into the output stream.
Details
Mentor:
Roman
We need a bitmap to store NULL/empty values instead of in-column values for this.
Details
Mentor:
Andrew Hutchings
CS currently has only very rudimentary query optimization capabilities and we want to improve the situation. We are considering using the Server's optimizer for this purpose, but the Server needs statistics, namely value distribution histograms and Number of Distinct Values (NDV) histograms. There are different levels of complexity for the task:
implement a standalone segment file reader that in the end populates both mysql.column_stats and mysql.table_stats using an out-of-band MariaDB client connection
implement ANALYZE TABLE functionality for the ColumnStore engine
implement ANALYZE TABLE and histograms with equal-width bins for value distribution histograms (similar to MDEV-12313) together with NDV histograms to decrease I/O
We expect to have both unit and regression tests but this is optional.
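A sketch of what the ANALYZE TABLE variant could look like from the SQL side, assuming the engine populates the server's statistics tables (the table name is made up):

ANALYZE TABLE cs_orders PERSISTENT FOR ALL;  -- hypothetical ColumnStore table
SELECT column_name, min_value, max_value, histogram
FROM mysql.column_stats
WHERE db_name = 'test' AND table_name = 'cs_orders';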
Details
Mentor:
Roman
We need an ORM-style NoSQL read API to go along with the bulk write API of mcsapi. This will likely take the form of:
A reader in ExeMgr which will convert messages from mcsapi into jobs
Code in mcsapi to send/receive the messages. Although ExeMgr can already receive messages with an execution plan, the format is very complex and the ABI breaks easily (we often send whole C++ objects). We should look at other ORM frameworks for inspiration for the API design. This task is to do the design for this API.
Details:
Mentor:
Andrew Hutchings
CS uses a Volcano processing approach, working on one value at a time. This is a very inefficient way to handle analytics workloads, which usually use lots of aggregation functions in projections, filtering or sorting. We are interested in using JIT for basic aggregation functions: sum, avg, count, min, max. The patch must compile and run a program that processes and returns the aggregation function result. We wrote this description with LLVM in mind as it is widespread and has lots of examples in the wild. I suggest starting by looking at RowAggregation::addRowGroup() from ./utils/rowgroup/rowaggregation.cpp to see what it takes to get the avg() function result. Here is the link on how to quickly build a CS developer environment.
Details:
Mentor:
Roman
CS currently has only very rudimentary query optimization capabilities and we want to improve the situation. We are considering using the Server's optimizer for this purpose, but the Server needs statistics, namely value distribution histograms and Number of Distinct Values (NDV) histograms. There are different levels of complexity for the task:
implement a standalone segment file reader that in the end populates both mysql.column_stats and mysql.table_stats using an out-of-band MariaDB client connection
implement ANALYZE TABLE functionality for the ColumnStore engine
implement ANALYZE TABLE and histograms with equal-width bins for value distribution histograms (similar to MDEV-12313) together with NDV histograms to decrease I/O
We expect to have both unit and regression tests but this is optional.
Details:
Mentor:
Roman
Currently, histograms are stored as an array of 1-byte bucket bounds (SINGLE_PREC_HB) or 2-byte bucket bounds (DOUBLE_PREC_HB). The table storing the histograms supports different histogram formats but limits them to 256 bytes (hist_size is a TINYINT).
This prevents us from supporting other kinds of histograms. The first low-hanging fruit would be to store the histogram bucket bounds precisely (like MySQL and PostgreSQL do, for example). The idea of this MDEV is to switch to JSON as the storage format for histograms.
Details:
Mentor:
Sergei Petrunia
Do you have an idea of your own, not listed above? Do let us know!
This page is licensed: CC BY-SA / Gnu FDL
We participated in the Google Summer of Code 2019. The MariaDB Foundation believes we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently C, ODBC, Java) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. And we have MariaDB ColumnStore, which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.
Please join us on Zulip and on IRC to mingle with the community. Don't forget to subscribe to maria-developers@lists.launchpad.net (this is the main list where we discuss development).
A few handy tips for any interested students who are unsure which projects to choose:
Blog post from a former GSoC student & mentor
To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.
(Based on conversation with Igor) There are a lot of subquery conditions out there that are inexpensive to evaluate and have good selectivity. If we just implement MDEV-83, we may get regressions. We need to take the subquery condition's selectivity into account. It is difficult to get a meaningful estimate for an arbitrary, correlated subquery predicate. One possible solution is to measure selectivity during execution and reattach predicates on the fly. We don't want to change the query plan all the time; one way to dynamically move items between item trees is to wrap them inside Item_func_trig_cond so we can switch them on and off.
An index on expression means something like
in this case the optimizer should be able to use the index. This task naturally splits into two steps:
add expression matching into the optimizer, use it for generated columns. Like in CREATE TABLE t1 (a int, b int, v INT GENERATED ALWAYS AS (a/2+b), INDEX (v));
support the syntax to create an index on an expression directly; this will automatically create a hidden generated column under the hood
original task description is visible in the history
Histograms with equal-width bins are easy to construct using samples. For this it's enough to look through the given sample set and, for each value from it, to figure out what bin this value can be placed in. Each bin requires only one counter. Let f be a column of a table with N rows and n be the number of samples by which the equal-width histogram of k bins for this column is constructed. Suppose that after looking through all sample rows the counters created for the histogram bins contain the numbers c[1],..,c[k]. Then m[i] = c[i]/n * 100 is the percentage of the rows whose values of f are expected to be in the interval [min(f) + (max(f)-min(f))/k * (i-1), min(f) + (max(f)-min(f))/k * i).
It means that if the sample rows have been chosen randomly, the expected number of rows with values of f from this interval can be approximated by the number m[i]/100 * N. To collect such statistics it is suggested to use the following variant of the ANALYZE TABLE command:
Here:
'WITH n ROWS' provides an estimate for the number of rows in the table in the case when this estimate cannot be obtained from statistical data.
'SAMPLING p PERCENTS' provides the percentage of sample rows to collect statistics. If this is omitted, the number is taken from the system variable samples_ratio.
'IN RANGE r' sets the range of equal-width bins of the histogram built for the column col1. If this is omitted and the min and max values for the column can be read from statistical data, then the histogram is built for the range [min(col1), max(col1)]. Otherwise the range [MIN_type(col1), MAX_type(col1)] is considered. The values beyond the given range, if any, are also taken into account in two additional bins.
Add support for FULL OUTER JOIN. One way to implement it is to rewrite the query
into the following union all:
Here t1.a is some non-nullable column of t1 (e.g. the column of a single-column primary key).
supported in MySQL-8.0 and MSSQL
The SQL Standard allows the use of EXCEPT ALL and INTERSECT ALL as set operations. Currently MariaDB Server does not support them. The goal of this task is to support EXCEPT ALL and INTERSECT ALL:
at the syntax level - allow the operators EXCEPT ALL and INTERSECT ALL in a query expression body
at the execution level - implement these operations employing temporary tables (the implementation could use an idea similar to that used for the existing implementation of the INTERSECT operation); a usage sketch follows this list.
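A sketch of the queries the finished feature should accept (multiset semantics, so duplicate rows are preserved):

SELECT a FROM t1 EXCEPT ALL SELECT a FROM t2;
SELECT a FROM t1 INTERSECT ALL SELECT a FROM t2;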
Add an UPDATE operation that returns a result set of the changed rows to the client.
I'm not exactly sure what the corresponding multiple-table syntax should look like, or if it is possible at all. But already having it for single-table updates would be a nice feature.
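For the single-table case the requested behaviour would be roughly (hypothetical table):

UPDATE accounts
SET balance = balance - 100
WHERE id = 42
RETURNING id, balance;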
Idea
The purpose of this task is to create an easy-to-use facility for setting up a new MariaDB replication slave. Setting up a new slave currently involves: 1) installing MariaDB with the initial database; 2) pointing the slave to the master with CHANGE MASTER TO; 3) copying initial data from the master to the slave; and 4) starting the slave with START SLAVE. The idea is to automate step (3), which currently needs to be done manually. The syntax could be something as simple as
This would then connect to the master that is currently configured. It will load a snapshot of all the data on the master, and leave the slave position at the point of the snapshot, ready for START SLAVE to continue replication from that point.
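A possible end-to-end flow on the new slave, assuming the proposed command keeps the working name used in this description (host and credentials are placeholders):

CHANGE MASTER TO MASTER_HOST='master.example.com', MASTER_USER='repl',
  MASTER_PASSWORD='secret', MASTER_USE_GTID=slave_pos;
LOAD DATA FROM MASTER;  -- proposed command: snapshot all data from the master
START SLAVE;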
Implementation:
The idea is to do this non-blocking on the master, in a way that works for any
storage engine. It will rely on row-based replication to be used between the
master and the slave.
At the start of LOAD DATA FROM MASTER, the slave will enter a special
provisioning mode. It will start replicating events from the master at the
master's current position.
The master dump thread will send binlog events to the slave as normal. But in
addition, it will interleave a dump of all the data on the master contained in
tables, views, or stored functions. Whenever the dump thread would normally go
to sleep waiting for more data to arrive in the binlog, the dump thread will
instead send another chunk of data in the binlog stream for the slave to apply.
A "chunk of data" can be:
A CREATE OR REPLACE TABLE / VIEW / PROCEDURE / FUNCTION
A range of N rows (N=100, for example). Each successive chunk will do a range scan on the primary key from the end position of the last chunk.
Sending data in small chunks avoids the need for long-lived table locks or transactions that could adversely affect master performance. The slave will connect in GTID mode. The master will send dumped chunks in a separate domain id, allowing the slave to process chunks in parallel with normal data. During the provisioning, all normal replication events from the master will arrive on the slave, and the slave will attempt to apply them locally. Some of these events will fail to apply, since the affected table or row may not yet have been loaded. In the provisioning mode, all such errors will be silently ignored. Proper locking (e.g. the isolation level) must be used on the master when fetching chunks, to ensure that updates for any row will always be applied correctly on the slave, either in a chunk, or in a later row event.
In order to make the first version of this feature feasible to implement in a reasonable amount of time, it should set a number of reasonable restrictions (which could be relaxed in a later version of the feature):
Give up with an error if the slave is not configured for GTID mode (MASTER_USE_GTID != NO).
Give up with an error if the slave receives any event in statement-based binlogging (so the master must be running in row-based replication mode, and no DDL must be done while the provisioning is running).
Give up with an error if the master has a table without primary key.
Secondary indexes will be enabled during the provisioning; this means that tables with large secondary indexes could be expensive to provision.
As a follow-on to we would like GSSAPI encryption (in addition to authentication) support in MariaDB. I am told that the current plan is to create a plugin interface and then we can build GSSAPI encryption on top of that, so here is a ticket for that. From having written GSSAPI for the internal interface, there were a couple things I would like to see in the plugin encryption interface.
First, GSSAPI is weird in that it does authentication before encryption (TLS/SSL are the other way around, establishing an encrypted channel and then doing authentication over it). Of course support for this is needed, but more importantly, packets must be processed in a fully serialized fashion. This is because encrypted packets may be queued while one end of the connection is still finishing up processing the authentication handshake. One way to do this is registering "handle" callbacks with connection-specific state, but there are definitely others.
Additionally, for whatever conception there ends up being of authentication and encryption, it needs to be possible to share more data than just a socket between them. The same context will be used for authentication and encryption, much as an SSL context is (except of course we go from authentication to encryption and not the other way around).
This ties into an issue of dependency. If authentication plugins are separate entities from encryption plugins in the final architecture, it might make sense to do mix-and-match authentication with encryption. However, there are cases - and GSSAPI is one - where doing encryption requires a certain kind of authentication (or vice versa). You can't do GSSAPI encryption without first doing GSSAPI authentication. (Whether or not it's permitted to do GSSAPI auth->encryption all over a TLS channel, for instance, is not something I'm concerned about.)
Finally, encrypted messages are larger than their non-encrypted counterparts. The transport layer should cope with this so that plugins don't have to think about reassembly, keeping in mind that there may not be a way to get the size of a message when encrypted without first encrypting it. It's unfortunately been a little while since I wrote that code, but I think those were the main things that we'll need for GSSAPI. Thanks!
Please add a RETURNING option to INSERT. Example from PostgreSQL
Inspired by:
This could make it easier to write statements which work with both MariaDB and PostgreSQL. It might also improve compatibility with Oracle RDBMS.
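Translated to MariaDB, the requested behaviour would look roughly like:

INSERT INTO t1 (name) VALUES ('test') RETURNING id;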
With a few exceptions, most native aggregate functions are supported as window functions. In MDEV-7773, support for creating custom aggregate functions was added. This task proposes to extend that feature and allow custom aggregate functions to be used as window functions. An example of creating a custom aggregate function is given below:
This function can be used in the following query:
After this task is complete the following must also work:
Currently no true LOCK=NONE exists on the slave. ALTER TABLE is first committed on the master, then it is replicated on the slaves. The purpose of this task is to create a true LOCK=NONE.
Implementation Idea
The master will write BEGIN_DDL_EVENT to the binlog after it hits ha_prepare_inplace_alter_table. Then the master will write a QUERY_EVENT to the binlog with the actual ALTER query. On commit/rollback the master will write COMMIT_DDL_EVENT/ROLLBACK_DDL_EVENT. On the slave there will be a pool of threads (configurable via a global variable) which will apply these DDLs. On receiving BEGIN_DDL_EVENT, the slave thread will pass the QUERY_EVENT to one of the worker threads. The worker thread will execute until ha_inplace_alter_table. The actual commit_inplace_alter will be called by the SQL thread. If the SQL thread receives some kind of rollback event, it will signal the worker thread to stop executing the ALTER. If none of the worker threads are available then the event will be enqueued; if a rollback event is then received we will simply discard the event from the queue, and if a commit event is received then the SQL thread will synchronously process the DDL event.
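The user-facing statement whose replication this task is about is an ordinary non-blocking ALTER, for example:

ALTER TABLE t1 ADD COLUMN c INT, ALGORITHM=INPLACE, LOCK=NONE;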
mysqltest has a lot of historical problems:
ad hoc parser, weird limitations
commands added as needed with no view over the total language structure
historical code issues (e.g. casts that became unnecessary 10 years ago), etc.
A lot can be done to improve it. Ideas:
control structures, else in if, break and continue in while, for (or foreach) loop
proper expression support in let, if, etc
rich enough expressions to make resorting to sql unnecessary in most cases
remove unused and redundant commands (e.g. system vs exec, query_vertical vs vertical_results ONCE)
A multiple-table UPDATE first performs join operations, then it updates the matching rows. A multiple-table UPDATE returning a result set does the following:
first performs join operations
for each row of the result of the join it calculates some expressions over the columns of the join and forms from them a row of the returned result set
after this it updates the matching rows
A multiple-table DELETE first performs join operations, then it deletes the matching rows. A multiple-table DELETE returning a result set does the following:
first performs join operations
for each row of the result of the join it calculates some expressions over the columns of the join and forms from them a row of the returned result set
after this it deletes the matching rows
As MariaDB is getting more storage engines and as they're getting more features, MariaDB can optionally use more and more compression libraries for various purposes. InnoDB, TokuDB, RocksDB — they all can use different sets of compression libraries. Compiling them all in would result in a lot of run-time/rpm/deb dependencies, most of which will never be used by most users. Not compiling them in would result in requests to compile them in. While most users don't use all these libraries, many users use some of these libraries. A solution could be to load these libraries on request, without creating a packaging dependency. There are different ways to do it:
hide all compression libraries behind a single unified compression API. Either develop our own or use something like Squash. This would require changing all engines to use this API
use the same approach as in server services — create a service per compression library; a service implementation will just return an error code for any function invocation if the corresponding library is not installed. This way we could perhaps avoid modifying all affected storage engines.
SP/PS (Stored Procedures / Prepared Statements) allocate memory until the SP/PS cache is destroyed. There is no way to see how much memory is allocated and whether it grows with each execution (the first two executions can lead to new memory allocation, but later ones should not).
Task minimum:
Status variables which count the memory used/allocated for SP/PS by thread and/or for the server.
Other ideas:
Automatically stop allocation in the debugging version after the second execution and raise an exception on any further allocation attempt.
An Information Schema table listing threads and SP/PS with information about allocated and used memory.
Information can be collected in the MEM_ROOTs of the SP/PS. By storing the status of the mem_root before execution and checking it afterwards, newly allocated memory can be found. MEM_ROOT can be extended with a debug mode that makes it read-only, which can be switched on after the second execution.
MariaDB ColumnStore supports DECIMAL with some limitations:
We do not support the full DECIMAL range that is in MariaDB
In several places in the code we convert the DECIMAL to DOUBLE during execution, therefore losing precision.
Implementing this will likely require the following:
Implementation of methods to handle MariaDB's DECIMAL format
Support for a longer than 8-byte numeric column type (there is an InfiniDB tree with work for this already)
Modification of the primitives processor for the math
Modification of the function expression processor to handle the new type
We need an ORM-style NoSQL read API to go along with the bulk write API of mcsapi. This will likely take the form of:
A reader in ExeMgr which will convert messages from mcsapi into jobs
Code in mcsapi to send/receive the messages. Although ExeMgr can already receive messages with an execution plan, the format is very complex and the ABI breaks easily (we often send whole C++ objects). We should look at other ORM frameworks for inspiration for the API design. This task is to do the design for this API.
CS uses a Volcano processing approach, working on one value at a time. This is a very inefficient way to handle analytics workloads, which usually use lots of aggregation functions in projections, filtering or sorting. We are interested in using JIT for basic aggregation functions: sum, avg, count, min, max. The patch must compile and run a program that processes and returns the aggregation function result. We wrote this description with LLVM in mind as it is widespread and has lots of examples in the wild. I suggest starting by looking at RowAggregation::addRowGroup() from ./utils/rowgroup/rowaggregation.cpp to see what it takes to get the avg() function result. Here is the link on how to quickly build a CS developer environment.
CS currently has only very rudimentary query optimization capabilities and we want to improve the situation. We are considering using the Server's optimizer for this purpose, but the Server needs statistics, namely value distribution histograms and Number of Distinct Values (NDV) histograms. There are different levels of complexity for the task:
implement a standalone segment file reader that in the end populates both mysql.column_stats and mysql.table_stats using an out-of-band MariaDB client connection
implement ANALYZE TABLE functionality for the ColumnStore engine
implement ANALYZE TABLE and histograms with equal-width bins for value distribution histograms (similar to MDEV-12313) together with NDV histograms to decrease I/O
We expect to have both unit and regression tests but this is optional.
Do you have an idea of your own, not listed above? Do let us know!
This page is licensed: CC BY-SA / Gnu FDL
[min(f) + (max(f)-min(f))/k * (i-1), min(f) + (max(f)-min(f))/k * i)

ANALYZE FAST TABLE tbl [ WITH n ROWS ] [SAMPLING p PERCENTS ]
PERSISTENT FOR COLUMNS (col1 [IN RANGE r] [WITH k INTERVALS],...)

SELECT t1.*, t2.* FROM t1 FULL OUTER JOIN t2 ON P(t1,t2)

SELECT t1.*, t2.* FROM t1 LEFT OUTER JOIN t2 ON P(t1,t2)
UNION ALL
SELECT t1.*,t2.* FROM t2 LEFT OUTER JOIN t1 ON P(t1,t2) WHERE t1.a IS NULL

CREATE TABLE tree (
`Node` VARCHAR(3),
`ParentNode` VARCHAR(3),
`EmployeeID` INTEGER,
`Depth` INTEGER,
`Lineage` VARCHAR(16)
);
INSERT INTO tree
(`Node`, `ParentNode`, `EmployeeID`, `Depth`, `Lineage`)
VALUES
('100', NULL, '1001', 0, '/'),
('101', '100', '1002', NULL, NULL),
('102', '101', '1003', NULL, NULL),
('103', '102', '1004', NULL, NULL),
('104', '102', '1005', NULL, NULL),
('105', '102', '1006', NULL, NULL);
WITH RECURSIVE prev AS (
SELECT * FROM tree WHERE ParentNode IS NULL
UNION
SELECT t.Node,t.ParentNode,t.EmployeeID,p.Depth + 1 AS Depth, CONCAT(p.Lineage, t.ParentNode, '/')
FROM tree t JOIN prev p ON t.ParentNode = p.Node
)
SELECT * FROM prev;
WITH RECURSIVE prev AS (
SELECT * FROM tree WHERE ParentNode IS NULL
UNION
SELECT t.Node,t.ParentNode,t.EmployeeID,p.Depth + 1 AS Depth, CONCAT(p.Lineage, t.ParentNode, '/')
FROM prev p JOIN tree t ON t.ParentNode = p.Node
)
UPDATE tree t, prev p SET t.Depth=p.Depth, t.Lineage=p.Lineage WHERE t.Node=p.Node;
You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'UPDATE tree t, prev p SET t.Depth=p.Depth, t.Lineage=p.Lineage WHERE t.Node=p.No' at line 7

SET session.server_id = @@global.server_id,
    session.gtid_domain_id=@@global.gtid_domain_id;

CREATE AGGREGATE FUNCTION agg_sum(x INT) RETURNS DOUBLE
BEGIN
DECLARE z DOUBLE DEFAULT 0;
DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN z;
LOOP
FETCH GROUP NEXT ROW;
SET z = z + x;
END LOOP;
END|

CREATE TABLE balances (id INT, amount INT);
INSERT INTO balances VALUES (1, 10), (2, 20), (3, 30);
SELECT agg_sum(amount) FROM balances;

SELECT agg_sum(amount) OVER (ORDER BY id);

CREATE TABLE mysql.column_stats (
min_value varbinary(255) DEFAULT NULL,
max_value varbinary(255) DEFAULT NULL,
...
hist_size TINYINT UNSIGNED,
hist_type ENUM('SINGLE_PREC_HB','DOUBLE_PREC_HB'),
histogram varbinary(255),
...

remove complex commands that do many sql statements under the hood, if they can be scripted, e.g. sync_slave_with_master
remove over-verbose treatment of rpl test failures
scoped variables
parameters for the source command
remove dead code
Version upgrade support for DECIMAL from the current form to the new form
Details:
Mentor:
Igor Babaev
Details:
Mentor:
Sergei Golubchik
Details:
Mentor:
Vicentiu Ciorbaru
Details:
Mentor:
Igor Babaev
Details:
Mentor:
Igor Babaev
Details:
Mentor:
Igor Babaev
Details:
Mentor:
Igor Babaev
Details:
Mentor:
Andrei Elkin
Details:
Mentor:
Details:
Mentor:
Oleksandr Byelkin
Details:
Mentor:
Varun Gupta
Details:
Mentor:
Sachin Setiya
Details:
Mentor:
Sergei Golubchik
Details:
Mentor:
Igor Babaev
Details:
Mentor:
Sergei Golubchik
Details:
Mentor:
Oleksandr Byelkin
Details
Mentor:
Andrew Hutchings
Details:
Mentor:
Andrew Hutchings
Details:
Mentor:
Roman
Details:
Mentor:
Roman
CREATE TABLE t1 (a INT, b INT, INDEX (a/2+b));
...
SELECT * FROM t1 WHERE a/2+b=100;

[min(f) + (max(f)-min(f))/k * (i-1), min(f) + (max(f)-min(f))/k * i)

ANALYZE FAST TABLE tbl [ WITH n ROWS ] [SAMPLING p PERCENTS ]
PERSISTENT FOR COLUMNS (col1 [IN RANGE r] [WITH k INTERVALS],...)

SELECT t1.*, t2.* FROM t1 FULL OUTER JOIN t2 ON P(t1,t2)

SELECT t1.*, t2.* FROM t1 LEFT OUTER JOIN t2 ON P(t1,t2)
UNION ALL
SELECT t1.*,t2.* FROM t2 LEFT OUTER JOIN t1 ON P(t1,t2) WHERE t1.a IS NULL

CREATE TABLE tree (
`Node` VARCHAR(3),
`ParentNode` VARCHAR(3),
`EmployeeID` INTEGER,
`Depth` INTEGER,
`Lineage` VARCHAR(16)
);
INSERT INTO tree
(`Node`, `ParentNode`, `EmployeeID`, `Depth`, `Lineage`)
VALUES
('100', NULL, '1001', 0, '/'),
('101', '100', '1002', NULL, NULL),
('102', '101', '1003', NULL, NULL),
('103', '102', '1004', NULL, NULL),
('104', '102', '1005', NULL, NULL),
('105', '102', '1006', NULL, NULL);
WITH RECURSIVE prev AS (
SELECT * FROM tree WHERE ParentNode IS NULL
UNION
SELECT t.Node,t.ParentNode,t.EmployeeID,p.Depth + 1 AS Depth, CONCAT(p.Lineage, t.ParentNode, '/')
FROM tree t JOIN prev p ON t.ParentNode = p.Node
)
SELECT * FROM prev;
WITH RECURSIVE prev AS (
SELECT * FROM tree WHERE ParentNode IS NULL
UNION
SELECT t.Node,t.ParentNode,t.EmployeeID,p.Depth + 1 AS Depth, CONCAT(p.Lineage, t.ParentNode, '/')
FROM prev p JOIN tree t ON t.ParentNode = p.Node
)
UPDATE tree t, prev p SET t.Depth=p.Depth, t.Lineage=p.Lineage WHERE t.Node=p.Node;
You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'UPDATE tree t, prev p SET t.Depth=p.Depth, t.Lineage=p.Lineage WHERE t.Node=p.No' at line 7

UPDATE [LOW_PRIORITY] [IGNORE] tbl_name
SET col_name1={expr1|DEFAULT} [, col_name2={expr2|DEFAULT}] ...
[WHERE where_condition]
[ORDER BY ...]
[LIMIT row_count]
RETURNING select_expr [, select_expr ...]

LOAD DATA FROM MASTER

postgres=# CREATE TABLE t1 (id SERIAL, name VARCHAR(100));
CREATE TABLE
postgres=# INSERT INTO t1(name) VALUES('test') RETURNING id;
id
----
1
(1 row)
INSERT 0 1

CREATE AGGREGATE FUNCTION agg_sum(x INT) RETURNS DOUBLE
BEGIN
DECLARE z DOUBLE DEFAULT 0;
DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN z;
LOOP
FETCH GROUP NEXT ROW;
SET z = z + x;
END LOOP;
END|

CREATE TABLE balances (id INT, amount INT);
INSERT INTO balances VALUES (1, 10), (2, 20), (3, 30);
SELECT agg_sum(amount) FROM balances;

SELECT agg_sum(amount) OVER (ORDER BY id);

Add the following in the global comment for each contribution:
For those cases where this is not done, please add to this page a short line for each push into MariaDB that includes code from contributors not employed by the MariaDB Foundation or the MariaDB Corporation. The purpose of this is to properly track that all such patches are submitted either under the MCA or BSD-new, and to ensure that the developer gets credit for their work.
Example:
(Please enhance the example with anything that makes sense.)
()
Tencent Game DBA Team, developed by vinchen.
()
Jerome Brauge.
Per-engine mysql.gtid_slave_pos tables ()
Kristian Nielsen funded by Booking.com.
The MariaDB Foundation website provides a more detailed list of contributors by release, starting from
New variable permits restricting the speed at which the slave reads the binlog from the master ()
Tencent Game DBA Team, developed by chouryzhou.
()
Tencent Game DBA Team, developed by vinchen.
()
Daniil Medvedev
Lixun Peng, Alibaba
Implement non-recursive common table expressions () Implement recursive common table expressions () Pushdown conditions into non-mergeable views/derived tables ()
Galina Shalygina
Backporting Delayed replication () from MySQL 5.6
Kristian Nielsen funded by Booking.com
The MariaDB Foundation website provides a more detailed list of contributors by release, starting from
, optimizer, security, speed enhancements, bug fixing, etc
Power8 optimization
Stewart Smith
In cooperation with IBM
enhancements and speedups
Reviews for , , compression, , storage engine, storage engine, , etc.
, scrubbing, enhanced semisync, dump thread enhancements, thd_specifics plugin service
Table level ,
, online alter progress monitoring
Antony Curtis
Sriram Patil
New
Daniel Black
Daniël van Eeden
Atomic writes, page compression, trim, multi-threaded flush for XtraDB/InnoDB
In cooperation with FusionIO
The MariaDB Foundation website provides a more detailed list of contributors by release, starting from
Defragmentation, prefix index queries optimization, lazy flushing, buffer pool list scan optimization, configurable long semaphore wait timeout
Percona
,
Oracle
optimization,
Per thread memory counting and usage
Base code and idea by Lixun Peng, Taobao
License: BSD
Base code by Lixun Peng, Taobao
License: BSD
Code by Konstantin "Kostja" Osipov, mail.ru
License: BSD
Code by Olivier Bertrand
License: GPL
Code by Kentoku Shiba, Spiral Arms
License: GPL
Code by Vicentiu Ciorbaru, Google Summer of Code 2013
License: BSD
Code by Sudheera Palihakkara, Google Summer of Code 2013
License: BSD
Some patches by Pavel Ivanov, Google
The MariaDB Foundation website provides a more detailed list of contributors by release, starting from
Function last_value() which returns the last value but evaluates all arguments as a side effect.
Original patch by Eric Herman, Booking.com
License: BSD
nowatch option for mysqld_safe (allow systemd)
Based on code from Maarten Vanraes
License: BSD
Security fixes, patches
Work by Honza Horak, Red Hat
Coverity scans
Work by Christian Convey
The MariaDB Foundation website provides a more detailed list of contributors by release, starting from
Virtual Columns
Andrey Zhakov (modified by Sanja and Igor)
Author has
Declaring many CHARSET objects as const.
Antony T Curtis (LinkedIn)
License: BSD
Authors: People at Google, Facebook and Percona. This code owes a special thanks to Mark Callaghan!
License: BSD
Fredrik Nylander from Stardoll.com
License: MCA
The storage engine
Created by Arjen Lenz, Open Query
License GPL
The storage engine
Created by Andrew Aksyonoff.
License: GPL
Pluggable Authentication
RJ Silk
License: MCA
Various bug fixes
Stewart Smith, Percona
Microsecond precision in process list
Percona Inc
Patch was .
Slow Query Log Extened Statistics
Percona Inc
Patch was .
The
Created by Paul McCullagh
License: GPL
The FederatedX storage engine
All changes are made by Patrick Galbraith and Antony Curtis and are given to us under BSD-new.
In addition we are allowed to promote FederatedX.
Windows enhancements and various bug fixes
Alex Budovski, under MCA
Creating of MariaDB packages
Arjen Lenz, Open Query
Various bug fixes
Stewart Smith, Percona
Google has sponsored:
Google tests GTID, parallel replication and lots more on the mailing list
Facebook has sponsored many features, including:
The
Facebook employees do frequent the mailing list
lists and other sponsors.
lists the authors of MariaDB (including documentation, QA etc).
This page is licensed: CC BY-SA / Gnu FDL
Patch: Name, url or where we got patch
Author: ....
License: MCA or BSD
Reviewer: ....

Feature/Patch name
* Author(s)
* Author has signed MCA on "date" | Patch was licensed under BSD

writing speed was improved by moving checksum calculations out of the global binlog mutex. This is a contribution by Kristian Nielsen (MDEV-31273)
New system variable enables binary log purging when the total size of all binary logs exceeds the specified threshold. The implementation is based on the patch from Percona (MDEV-31404)
FULL_NODUP is a new value for the . It essentially works like FULL, that is all columns are included in the event, but it takes less space, because the after image omits columns that were not changed by the UPDATE statement, and have same values as in the before image. This is a contribution from Alibaba (MDEV-32589)