Google Summer of Code 2025

This year we are again participating in the Google Summer of Code. We, joined with the MariaDB Foundation, believe we are making a better database that remains application compatible with MySQL. We also work on making LGPL connectors (currently C, C++, ODBC, Java, Node.js) and on MariaDB Galera Cluster, which allows you to scale your reads & writes. And we have MariaDB ColumnStore, which is a columnar storage engine, designed to process petabytes of data with real-time response to analytical queries.

Where to Start

Please join us on Zulip to mingle with the community. You should also subscribe to the developers mailing list (this is the main list where we discuss development - there are also other mailing lists).

To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.

Also see the List of beginner friendly issues from the MariaDB Issue Tracker.

List of Tasks

MariaDB Server

MDEV-28395 LOAD DATA plugins

Full-time project 350h

LOAD DATA INFILE can flexibly load data into a table from CSV-like files accessible by the mariadbd process. LOAD XML INFILE can do it for XML files. LOAD DATA LOCAL INFILE and LOAD XML LOCAL INFILE can do it with files accessible by the client, but not by the server. But there are requests to support loading more file formats and from other locations, for example, from S3.

This project is to implement support for LOAD plugins and refactor the current LOAD code accordingly. There are two kinds of plugins: data parser plugins (CSV-like and XML) and transfer plugins (file and LOCAL). Implementing new plugins is not in the scope of this task. The task is mainly about moving existing code around to make new plugins (like JSON or S3) possible.
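The split between the two plugin kinds can be pictured with a small sketch. It is written in Python for brevity (the actual work is in the server's C++ code), and every class and function name in it is hypothetical, not an existing MariaDB API.

```python
import csv
import io
from typing import Iterator, List, Protocol


class TransferPlugin(Protocol):
    """Hypothetical: knows where the data comes from (file, LOCAL, S3, ...)."""
    def open(self, location: str) -> io.TextIOBase: ...


class ParserPlugin(Protocol):
    """Hypothetical: knows how to turn a stream into rows (CSV-like, XML, ...)."""
    def rows(self, stream: io.TextIOBase) -> Iterator[List[str]]: ...


class InMemoryTransfer:
    """Stand-in transfer plugin that serves data from a dict instead of disk."""
    def __init__(self, files: dict):
        self.files = files

    def open(self, location: str) -> io.TextIOBase:
        return io.StringIO(self.files[location])


class CsvParser:
    """Stand-in parser plugin for CSV-like input."""
    def rows(self, stream: io.TextIOBase) -> Iterator[List[str]]:
        yield from csv.reader(stream)


def load_data(location: str, transfer: TransferPlugin, parser: ParserPlugin):
    """The LOAD driver only wires a transfer plugin to a parser plugin."""
    with transfer.open(location) as stream:
        return list(parser.rows(stream))


transfer = InMemoryTransfer({"t1.csv": "1,apple\n2,pear\n"})
loaded = load_data("t1.csv", transfer, CsvParser())
```

With this shape, a new format or a new location would only need one new class on the matching side, and the driver would stay unchanged.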

Skills needed: C++, bison

Mentors: Sergei Golubchik

MDEV-36100 Generate vector embeddings automatically on INSERT

Full-time project 350h

Implement a syntax and a plugin API that the server will use to generate embeddings for documents that the user stores in the database. This should significantly simplify the vector pipeline. mariadbd will not generate embeddings internally; it will invoke a plugin to do that.
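A toy model of the intended flow, in Python for brevity: the caller inserts only the document, and the storage layer asks a plugin for the embedding. The `Embedder` signature, the `Table` class and the character-bucketing "model" are all assumptions made for illustration; the real syntax and plugin API are what this project would design.

```python
from typing import Callable, Dict, List

# Hypothetical plugin signature: document text in, fixed-size vector out.
Embedder = Callable[[str], List[float]]


def toy_embedder(text: str, dims: int = 4) -> List[float]:
    """Stand-in for a real model call: bucket character codes into `dims` slots."""
    vec = [0.0] * dims
    for ch in text:
        vec[ord(ch) % dims] += 1.0
    return vec


class Table:
    """Toy table with a document column and an auto-generated vector column."""
    def __init__(self, embedder: Embedder):
        self.embedder = embedder  # the "plugin" the server would invoke
        self.rows: List[Dict] = []

    def insert(self, doc: str) -> None:
        # The caller supplies only the document; the embedding is filled in here.
        self.rows.append({"doc": doc, "embedding": self.embedder(doc)})


table = Table(toy_embedder)
table.insert("abc")
```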

Skills needed: C++

Mentors: Sergei Golubchik

MDEV-36107 expressions in mysqltest

Part-time project 175h

Extend the mysqltest language to support:

  • standard arithmetic +, -, *, /, %

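  • comparisons ==, !=, <, <=, >, >=

  • boolean &&, ||, maybe ? :

  • if possible: string repetition, perl-style x (to replace SELECT REPEAT() in test files)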

This should work in the if and while commands.
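One possible way to evaluate such expressions is precedence climbing over a table of operator levels. The sketch below is in Python for brevity and makes several assumptions that the task does not fix: C-like precedence, integer-only values, floor division for /, comparison results of 0 or 1, and no unary minus, ? : or string repetition.

```python
import re

TOKEN = re.compile(r"\s*(\d+|\$\w+|==|!=|<=|>=|&&|\|\||[-+*/%<>()])")

# Binary operators grouped by precedence, lowest first (assumed C-like ordering).
LEVELS = [
    {"||": lambda a, b: int(bool(a) or bool(b))},
    {"&&": lambda a, b: int(bool(a) and bool(b))},
    {"==": lambda a, b: int(a == b), "!=": lambda a, b: int(a != b)},
    {"<": lambda a, b: int(a < b), "<=": lambda a, b: int(a <= b),
     ">": lambda a, b: int(a > b), ">=": lambda a, b: int(a >= b)},
    {"+": lambda a, b: a + b, "-": lambda a, b: a - b},
    {"*": lambda a, b: a * b, "/": lambda a, b: a // b, "%": lambda a, b: a % b},
]


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected input at {pos}: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text, variables=None):
    """Evaluate an integer expression; $name is looked up in `variables`."""
    tokens = tokenize(text)
    variables = variables or {}
    pos = 0

    def primary():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = binary(0)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("missing )")
            pos += 1
            return value
        if tok.startswith("$"):
            return int(variables[tok[1:]])
        return int(tok)

    def binary(level):
        nonlocal pos
        if level == len(LEVELS):
            return primary()
        left = binary(level + 1)
        # Left-associative: keep folding operators of this level.
        while pos < len(tokens) and tokens[pos] in LEVELS[level]:
            op = LEVELS[level][tokens[pos]]
            pos += 1
            left = op(left, binary(level + 1))
        return left

    result = binary(0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result
```

For example, `evaluate("$i < 10 && $i % 2 == 0", {"i": 4})` yields 1, which is the kind of condition an if or while command would test.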

Can be done together with MDEV-36108 as a full-time project.

Skills needed: C++

Mentors: Sergei Golubchik

MDEV-36108 variable substitutions in mysqltest

Part-time project 175h

Extend the mysqltest language to support bash-like substitutions:

  • ${var}

  • ${parameter:offset:length}

  • ${#parameter}

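  • ${parameter/pattern/string/flags}

  • maybe ${parameter^}, ${parameter^^}, ${parameter,}, ${parameter,,}

  • maybe ${parameter@function} with functions like u, U, Q, etc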

recursive expansion:

  • ${${var}}
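A minimal sketch of how a few of these forms could be expanded, in Python for brevity. It covers only ${var}, ${parameter:offset:length}, ${#parameter} and recursive expansion; it assumes non-negative offsets and variable values that do not themselves contain ${, and the function names are made up for the example.

```python
import re

# Matches a ${...} group that has no nested ${...} inside it.
INNERMOST = re.compile(r"\$\{([^${}]*)\}")


def expand_one(body, variables):
    if body.startswith("#"):                 # ${#parameter}: length of the value
        return str(len(variables[body[1:]]))
    name, *slice_parts = body.split(":")
    value = variables[name]
    if slice_parts:                          # ${parameter:offset[:length]}
        offset = int(slice_parts[0])
        if len(slice_parts) > 1:
            return value[offset:offset + int(slice_parts[1])]
        return value[offset:]
    return value                             # plain ${var}


def expand(text, variables):
    # Expanding innermost groups first makes ${${var}} work without extra code.
    while True:
        text, count = INNERMOST.subn(
            lambda m: expand_one(m.group(1), variables), text)
        if count == 0:
            return text


variables = {"name": "mariadb", "ptr": "name"}
```

Here `expand("${${ptr}}", variables)` first turns the inner group into `name` and then resolves `${name}`.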

Can be done together with MDEV-36107 as a full-time project.

Skills needed: C++

Mentors: Sergei Golubchik

MDEV-18827 Create utility to parse frm files and print their DDL

Full-time project - potential part-time (175 - 350h, depending on scope)

FRM files are what MariaDB uses to store metadata about tables. These files can be used to generate DDL statements (CREATE TABLE ...). We are lacking a utility to parse these files, which could in turn make DBAs' lives easier. The task of this project is to implement this utility, making use of MariaDB's FRM parsing logic. You may have to carry out some refactoring to extract the parsing code into a reusable library that is used both by MariaDB Server and by the FRM parsing tool.

Skills needed: C/C++, understanding libraries and APIs.

Mentors: Vicențiu Ciorbaru / Sergei Golubchik

MDEV-9345 Replication to enable filtering on master

Part-time project 175h

The current methods of filtering replication events are limited to either 1) filtering at binlog-write time, which can break point-in-time recovery because some committed transactions will be missing from the binary log, or 2) filtering on the replica, which forces all events on the primary server to always be sent to the replica, which can be a security concern and is also not efficient. This task aims to eliminate these limitations by adding another point at which replication filtering occurs: the binlog dump threads. This would allow users both to maintain a consistent binary log and to minimize network traffic by holding back events that are never intended for replication.
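The idea can be modelled in a few lines: the binary log stays complete, and the filter sits in the code path that streams events to a replica. This Python sketch is only an illustration; the `Event` fields, the `dump_thread` function and the choice of filtering by database name are assumptions, not the server's actual structures or filter rules.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Set


@dataclass(frozen=True)
class Event:
    position: int
    database: str
    statement: str


def dump_thread(binlog: Iterable[Event], ignored_dbs: Set[str]) -> Iterator[Event]:
    """Yield only the events this replica should receive.

    The binlog is only read here, so it stays complete for point-in-time
    recovery; filtered events simply never reach the network.
    """
    for event in binlog:
        if event.database not in ignored_dbs:
            yield event


binlog = [
    Event(4, "shop", "INSERT INTO orders VALUES (1)"),
    Event(120, "hr", "UPDATE salaries SET amount = 10"),
    Event(250, "shop", "DELETE FROM carts"),
]
sent = list(dump_thread(binlog, ignored_dbs={"hr"}))
```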

Skills needed: C++

Mentors: Brandon Nesterenko

Buildbot build statistics dashboard

Part-time project 175h

TODO - A more ample description will be created.

Skills needed:

Mentors: Vlad Radu

MCOL-4889 Manual extent vacuuming

Full-time project 350h

Here, an extent is a unit that groups columnar values, and a partition is a group of extents that stores all column values for a specific portion of a table. MCS has a notion of an empty value for columnar segment/token files and dictionaries. Empty values are marked with a bit in the special 1-byte auxiliary column that is created for every table. When DELETE removes records from a table, the records are marked with the empty bit in the auxiliary column. The deleted records become wasted disk space. The goal of the project is to reclaim the wasted disk space by either re-creating the whole partition or moving partition values.
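The "re-create the whole partition" option amounts to a compaction pass driven by the auxiliary column. Below is a toy Python model of it; the in-memory layout (one list per column), the `EMPTY` flag value and the `vacuum` function are assumptions for illustration and say nothing about the real on-disk format.

```python
EMPTY = 1  # hypothetical flag value in the auxiliary column marking a deleted row


def vacuum(columns, aux):
    """Rewrite every column, keeping only rows whose auxiliary flag is not EMPTY.

    `columns` maps column name -> list of values; `aux` is the parallel
    auxiliary column. Returns the compacted columns, the new auxiliary
    column and the number of reclaimed row slots.
    """
    keep = [i for i, flag in enumerate(aux) if flag != EMPTY]
    compacted = {name: [values[i] for i in keep]
                 for name, values in columns.items()}
    return compacted, [0] * len(keep), len(aux) - len(keep)


columns = {"id": [1, 2, 3, 4], "city": ["Kyiv", "Oslo", "Lima", "Rome"]}
aux = [0, EMPTY, 0, EMPTY]  # the second and fourth rows were removed by DELETE
compacted, new_aux, reclaimed = vacuum(columns, aux)
```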

Skills needed: modern C++

Mentors: Roman Nozdrin

MCOL-5142 Support for recursive CTE

Full-time project 350h

MariaDB Columnstore lacks recursive CTE handling, so for now Columnstore hands the processing back to MariaDB Server if a query contains a recursive CTE.

Here is the info about the feature: recursive-common-table-expressions-overview
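As a reminder of what has to be computed, a recursive CTE with UNION semantics is evaluated as a fixed point: run the anchor part, then keep applying the recursive part to the newly produced rows until nothing new appears. The Python sketch below models only that loop; the function names and the toy data are made up for the example.

```python
def recursive_cte(anchor_rows, recursive_step):
    """Evaluate anchor UNION recursive_step(new rows) until a fixed point."""
    result = set(anchor_rows)
    frontier = set(anchor_rows)
    while frontier:
        produced = set(recursive_step(frontier))
        frontier = produced - result  # only rows not seen before feed the next round
        result |= frontier
    return result


# Toy data: employee -> manager. Find everyone under manager 1, at any depth.
reports_to = {2: 1, 3: 1, 4: 2, 5: 4, 6: 9}


def step(managers):
    return {emp for emp, boss in reports_to.items() if boss in managers}


team = recursive_cte({1}, step)
```

Feeding only the new rows into each round is what guarantees termination on cyclic data under UNION semantics.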

Skills needed: modern C++

Mentors: Leonid Fedorov

MCOL-5598 Support for EXCEPT and INTERSECT SQL expressions

Full-time project 350h

MariaDB Columnstore lacks EXCEPT and INTERSECT handling, so for now Columnstore hands the processing back to MariaDB Server if a query contains EXCEPT or INTERSECT.

Here is the info about the feature: except, intersect
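For reference, the default (DISTINCT) semantics of the two operators can be stated in a few lines of Python. This is only a model of the expected results on small in-memory inputs; it ignores NULL handling and the ALL variants, and the function names are made up.

```python
def intersect(left, right):
    """INTERSECT (DISTINCT): rows present in both inputs, duplicates removed."""
    right_set = set(right)
    return list(dict.fromkeys(row for row in left if row in right_set))


def except_(left, right):
    """EXCEPT (DISTINCT): rows of the left input that are absent from the right."""
    right_set = set(right)
    return list(dict.fromkeys(row for row in left if row not in right_set))


a = [(1, "x"), (2, "y"), (2, "y"), (3, "z")]
b = [(2, "y"), (4, "w")]
```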

Skills needed: modern C++

Mentors: Alexey Antipovsky

MCOL-XXX Bloom-filters for data scanning

Full-time project 350h

MariaDB Columnstore lacks indexes, so it reads a lot of extra data from disk. This project introduces Bloom filters to reduce the data read from disk during scanning, the most I/O-heavy operation.
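A small Python model of the idea: keep one Bloom filter per extent and consult it before reading the extent. The filter size, the number of hash functions, the use of BLAKE2 and the one-filter-per-extent granularity are all assumptions picked for the example, not design decisions of the project.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        data = repr(value).encode()
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(data, salt=bytes([seed])).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits >> pos & 1 for pos in self._positions(value))


def scan(extents, filters, wanted):
    """Return matching values and how many extents had to be read."""
    hits, extents_read = [], 0
    for extent, bloom in zip(extents, filters):
        if not bloom.might_contain(wanted):
            continue  # definitely absent: skip the disk read
        extents_read += 1
        hits.extend(v for v in extent if v == wanted)
    return hits, extents_read


extents = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
filters = []
for extent in extents:
    bloom = BloomFilter()
    for value in extent:
        bloom.add(value)
    filters.append(bloom)

hits, extents_read = scan(extents, filters, 20)
```

A Bloom filter never gives a false negative, so the scan result is unchanged; false positives only cost an occasional unnecessary read.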

Skills needed: modern C++

Mentors: Roman Nozdrin

MCOL-5758 Reduce the computations in JOINS by simpler Bloom-filter-based pre-joins

Full-time project 350h

Joins are very heavy algorithms, in computation, in memory use, or both. They need to hold a substantial amount of data in memory and perform hashing and other operations on that data. Joins can overflow memory limits, and keeping a balance between memory use and performance is tricky. Thus we have to filter the information that goes into joins as much as possible. Columnstore already does great work in that regard, pushing WHERE filters before joins. This task follows the same idea: it adds Bloom filter operations that approximate JOIN results and perform a secondary read, so that joins are fed data that is highly likely to be used in the join.
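The pre-join can be sketched as follows: build a Bloom filter from the join keys of the small side, use it to discard rows of the large side that cannot match, and only then run the real hash join. This Python model is an assumption-laden illustration (filter size, SHA-256 hashing, in-memory dict rows, and all function names are made up), not Columnstore's join code.

```python
import hashlib

NUM_BITS, NUM_HASHES = 4096, 3  # arbitrary sizes chosen for the example


def bit_positions(key):
    data = repr(key).encode()
    for seed in range(NUM_HASHES):
        digest = hashlib.sha256(bytes([seed]) + data).digest()
        yield int.from_bytes(digest[:8], "big") % NUM_BITS


def build_filter(keys):
    bits = 0
    for key in keys:
        for pos in bit_positions(key):
            bits |= 1 << pos
    return bits


def may_match(bits, key):
    return all(bits >> pos & 1 for pos in bit_positions(key))


def bloom_prejoin(small, large, key_small, key_large):
    """Join `small` and `large`, feeding the hash join only the rows of
    `large` whose key passes the Bloom filter built from `small`."""
    bits = build_filter(row[key_small] for row in small)
    candidates = [row for row in large if may_match(bits, row[key_large])]

    table = {}
    for row in small:
        table.setdefault(row[key_small], []).append(row)
    joined = [(s, l) for l in candidates for s in table.get(l[key_large], [])]
    return joined, len(candidates)


customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}]
orders = [{"customer": i % 50, "total": i} for i in range(1000)]
joined, fed_into_join = bloom_prejoin(customers, orders, "id", "customer")
```

In this toy data only 40 of the 1000 order rows can match, so the join should see close to 40 candidate rows instead of all 1000; the exact count depends on false positives.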

Skills needed: modern C++

Mentors: Sergey Zefirov

Suggest a Task

Do you have an idea of your own, not listed above? Do let us know in the comments below (Click 'Login' on the top of the page first)!

This page is licensed: CC BY-SA / Gnu FDL
