Google Summer of Code 2025


This year we are again participating in Google Summer of Code. Together with the MariaDB Foundation, we believe we are building a better database that remains application-compatible with MySQL. We also work on LGPL connectors (currently C, C++, ODBC, Java, Node.js) and on MariaDB Galera Cluster, which lets you scale your reads and writes. We also develop MariaDB ColumnStore, a columnar storage engine designed to process petabytes of data with real-time response to analytical queries.

Where to Start

Please join us on Zulip to mingle with the community. You should also subscribe to the developers mailing list (this is the main list where we discuss development - there are also other mailing lists).

To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.

Also see the list of beginner-friendly issues in the MariaDB Issue Tracker.

List of Tasks

MariaDB Server

MDEV-28395 LOAD DATA plugins

Full-time project 350h

LOAD DATA INFILE can flexibly load data into a table from CSV-like files accessible by the mariadbd process. LOAD XML INFILE can do the same for XML files. LOAD DATA LOCAL INFILE and LOAD XML LOCAL INFILE can do it with files accessible by the client, but not by the server. However, there are requests to support loading more file formats and loading from other locations, for example, from S3.

This project is to implement support for LOAD plugins and refactor the current LOAD code accordingly. There are two kinds of plugins: data parser plugins (CSV-like and XML) and transfer plugins (file and LOCAL). Implementing new plugins is out of scope; the task is mainly about moving existing code around to make new plugins (like JSON or S3) possible.
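The split between transfer plugins and parser plugins can be illustrated with a small sketch. It is written in Python only for brevity (the server itself is C++), and every name in it (TRANSFER_PLUGINS, PARSER_PLUGINS, open_memory, parse_csv, load_data) is hypothetical; the real plugin API is what this project would design.

```python
import csv
import io

# Hypothetical registries, for illustration only.
TRANSFER_PLUGINS = {}   # name -> function(location) returning a text stream
PARSER_PLUGINS = {}     # name -> function(stream) yielding rows


def open_memory(location):
    """Toy transfer plugin: the 'location' string is the data itself.

    It stands in for real transfers such as a server-side file or LOCAL.
    """
    return io.StringIO(location)


def parse_csv(stream):
    """Toy parser plugin: yields one list of fields per CSV line."""
    return csv.reader(stream)


TRANSFER_PLUGINS["memory"] = open_memory
PARSER_PLUGINS["csv"] = parse_csv


def load_data(transfer, parser, location):
    """Combine any transfer plugin with any parser plugin."""
    stream = TRANSFER_PLUGINS[transfer](location)
    return [row for row in PARSER_PLUGINS[parser](stream)]


rows = load_data("memory", "csv", "1,a\n2,b\n")
```

Because the two kinds of plugins only meet at the stream boundary, a new format or a new data source could be added without touching the other side.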

Skills needed: C++, bison
Mentors: Sergei Golubchik

MDEV-36100 Generate vector embeddings automatically on INSERT

Full-time project 350h

Implement a syntax and a plugin API that the server will use to generate embeddings for documents that the user stores in the database. This should significantly simplify the vector pipeline. mariadbd will not generate embeddings internally; it will invoke a plugin to do that.

Skills needed: C++
Mentors: Sergei Golubchik

MDEV-36107 expressions in mysqltest

Part-time project 175h

Extend the mysqltest language to support:

standard arithmetic +, -, *, /, %
comparisons ==, !=, <, <=, >, >=
boolean &&, ||, and maybe ? :
if possible: string repetition with a Perl-style x operator (to replace SELECT REPEAT() in test files)

This should work in the let, if, and while commands.
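One way to evaluate the operators listed above is a small precedence-climbing parser. The sketch below is in Python for brevity (mysqltest itself is C++) and makes several assumptions that the task does not specify: operands are integer literals only, precedence follows C, true and false are represented as 1 and 0, && and || do not short-circuit, and / is Python floor division. The names tokenize, LEVELS and evaluate are illustrative.

```python
import re

TOKEN = re.compile(r"\s*(\d+|==|!=|<=|>=|&&|\|\||[-+*/%<>()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


# Binary operators grouped by precedence, lowest first (assumed C-like).
LEVELS = [
    {"||": lambda a, b: int(bool(a) or bool(b))},
    {"&&": lambda a, b: int(bool(a) and bool(b))},
    {"==": lambda a, b: int(a == b), "!=": lambda a, b: int(a != b)},
    {"<": lambda a, b: int(a < b), "<=": lambda a, b: int(a <= b),
     ">": lambda a, b: int(a > b), ">=": lambda a, b: int(a >= b)},
    {"+": lambda a, b: a + b, "-": lambda a, b: a - b},
    {"*": lambda a, b: a * b, "/": lambda a, b: a // b,
     "%": lambda a, b: a % b},
]


def evaluate(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def primary():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        pos += 1
        if tok == "(":
            value = binary(0)
            if peek() != ")":
                raise ValueError("expected ')'")
            pos += 1
            return value
        if tok == "-":              # unary minus
            return -primary()
        if tok.isdigit():
            return int(tok)
        raise ValueError(f"unexpected token {tok!r}")

    def binary(level):
        nonlocal pos
        if level == len(LEVELS):
            return primary()
        value = binary(level + 1)
        while peek() in LEVELS[level]:   # left-associative operators
            op = LEVELS[level][tokens[pos]]
            pos += 1
            value = op(value, binary(level + 1))
        return value

    result = binary(0)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result
```

A real implementation would also need to handle mysqltest variables, strings, and the optional ? : and x operators, which this sketch leaves out.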

Skills needed: C++
Mentors: Sergei Golubchik

MDEV-36108 variable substitutions in mysqltest

Part-time project 175h

Extend the mysqltest language to support bash-like substitutions:

${var}
${parameter:offset:length}
${#parameter}
${parameter/pattern/string/flags}
maybe ${parameter^}, ${parameter^^}, ${parameter,}, ${parameter,,}
maybe ${parameter@function} with functions like u, U, Q, etc.

Recursive expansion:

${${var}}
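A few of these forms can be sketched by repeatedly expanding the innermost ${...} first, which also gives ${${var}} its recursive meaning. The Python below is only an illustration under stated assumptions: variables live in a plain dictionary, only ${var}, ${parameter:offset:length} and ${#parameter} are handled, and variable values are assumed not to contain ${ themselves (otherwise they would be expanded again). The names INNERMOST and expand are hypothetical.

```python
import re

# Matches an innermost ${...}: nothing inside the braces starts a new ${.
INNERMOST = re.compile(r"\$\{([^${}]*)\}")


def expand(text, variables):
    def replace(match):
        body = match.group(1)
        if body.startswith("#"):            # ${#parameter}: length of value
            return str(len(variables[body[1:]]))
        name, *rest = body.split(":")
        value = variables[name]
        if rest:                            # ${parameter:offset[:length]}
            offset = int(rest[0])
            end = offset + int(rest[1]) if len(rest) > 1 else None
            return value[offset:end]
        return value                        # plain ${var}

    # Keep expanding until no ${...} is left; inner references go first.
    while True:
        expanded, count = INNERMOST.subn(replace, text)
        if count == 0:
            return expanded
        text = expanded


variables = {"var": "name", "name": "hello world"}
greeting = expand("${${var}}", variables)
```

Pattern replacement, case modification and the @function forms would each add another branch in replace.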

Skills needed: C++
Mentors: Sergei Golubchik

MDEV-18827 Create utility to parse frm files and print their DDL

Full-time project, potentially part-time (175-350h, depending on scope)

FRM files are what MariaDB uses to store metadata about tables. These files can be used to generate DDL statements (CREATE TABLE ...). We lack a utility to parse these files, which would make DBAs' lives easier. The task of this project is to implement this utility, making use of MariaDB's FRM parsing logic. You may have to carry out some refactoring to extract the parsing code into a reusable library that is used by both MariaDB Server and the FRM parsing tool.

Skills needed: C/C++, understanding libraries and APIs.
Mentors: Vicențiu Ciorbaru / Sergei Golubchik

MDEV-9345 Replication to enable filtering on master

Part-time project 175h

Replication events can currently be filtered at only two points: 1) at binlog-write time, which can break point-in-time recovery because some committed transactions will be missing from the binary log, or 2) on the replica, which forces the primary server to send every event to the replica, which can be a security concern and is also inefficient. This task aims to remove these limitations by adding another point at which replication filtering occurs: the binlog dump threads. This would let users both maintain a consistent binary log and minimize network traffic by withholding events that are never intended for replication.
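The idea of filtering in the dump thread can be shown with a toy model: the binary log stays complete, and the filter is applied only while events are streamed to a replica. This Python sketch is purely illustrative; Event, dump_events and the per-database filter are assumptions, not the server's actual data structures or filter options.

```python
from dataclasses import dataclass
from typing import Iterator, List, Set


@dataclass(frozen=True)
class Event:
    database: str
    statement: str


def dump_events(binlog: List[Event], ignored_dbs: Set[str]) -> Iterator[Event]:
    """Yield only the events a given replica should receive.

    The binlog itself is never modified, so point-in-time recovery
    on the primary still sees every committed event.
    """
    for event in binlog:
        if event.database not in ignored_dbs:
            yield event


binlog = [
    Event("shop", "INSERT INTO orders VALUES (1)"),
    Event("secrets", "UPDATE keys SET v = 'x'"),
    Event("shop", "DELETE FROM orders WHERE id = 1"),
]
sent = list(dump_events(binlog, {"secrets"}))
```

Here the replica never receives the event from the secrets database, while the primary's binary log still contains all three events.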

Skills needed: C++
Mentors: Brandon Nesterenko

Buildbot build statistics dashboard

Part-time project 175h

TODO: A fuller description will be added.

Skills needed:
Mentors: Vlad Radu

MCOL-4889 Manual extent vacuuming

Full-time project 350h

Here, an extent is a group of columnar values, and a partition is a group of extents that stores all column values for a specific portion of a table. MCS (MariaDB ColumnStore) has a notion of an empty value for columnar segment/token files and dictionaries. Empty values are marked with a bit in the special 1-byte auxiliary column that is created for every table. When DELETE removes records from a table, the records are marked with the empty bit in the auxiliary column. The deleted records become wasted disk space. The goal of the project is to reclaim the wasted disk space by either re-creating the whole partition or moving partition values.
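The "re-create the whole partition" option can be pictured with a toy in-memory model: each column is a list of values, and the auxiliary column is a list of empty flags. The Python below is only a sketch; EXTENT_ROWS, vacuum_partition and extents_used are hypothetical names, and the extent size is a toy value chosen for the example, not ColumnStore's real one.

```python
import math

EXTENT_ROWS = 4  # toy extent size, chosen only for this example


def vacuum_partition(columns, aux_empty):
    """Rebuild every column without the rows whose empty bit is set."""
    keep = [i for i, is_empty in enumerate(aux_empty) if not is_empty]
    compacted = {
        name: [values[i] for i in keep] for name, values in columns.items()
    }
    # After compaction no remaining row is marked empty.
    return compacted, [False] * len(keep)


def extents_used(row_count):
    """Number of extents needed per column for the given number of rows."""
    return math.ceil(row_count / EXTENT_ROWS)


columns = {"id": [1, 2, 3, 4, 5, 6, 7, 8], "name": list("abcdefgh")}
aux_empty = [False, True, True, False, True, True, True, False]
compacted, new_aux = vacuum_partition(columns, aux_empty)
```

In this example eight rows shrink to three, so each column needs one extent instead of two. A real implementation would operate on files on disk and would have to stay consistent with concurrent queries.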

Skills needed: modern C++
Mentors: Roman Nozdrin

MCOL-5142 Support for recursive CTE

Full-time project 350h

MariaDB ColumnStore lacks recursive CTE handling, so for now ColumnStore hands processing back to MariaDB Server if a query contains a recursive CTE.

Here is the info about the feature: https://mariadb.com/kb/en/recursive-common-table-expressions-overview/
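As background, a recursive CTE is commonly evaluated by running the anchor part once and then applying the recursive part to the rows produced in the previous round until no new rows appear. The Python sketch below shows that loop with UNION (distinct) semantics; recursive_cte and its arguments are illustrative names and say nothing about how ColumnStore would implement it.

```python
def recursive_cte(anchor_rows, recursive_step):
    """Iterate the recursive step until it yields no new rows."""
    result = list(anchor_rows)
    seen = set(result)
    frontier = list(anchor_rows)
    while frontier:
        new_rows = []
        for row in recursive_step(frontier):
            if row not in seen:      # UNION drops duplicate rows
                seen.add(row)
                new_rows.append(row)
        result.extend(new_rows)
        frontier = new_rows          # next round sees only the new rows
    return result


# Equivalent in spirit to:
#   WITH RECURSIVE t(n) AS (SELECT 1 UNION SELECT n + 1 FROM t WHERE n < 5)
numbers = recursive_cte(
    [(1,)],
    lambda rows: [(n + 1,) for (n,) in rows if n < 5],
)
```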

Skills needed: modern C++
Mentors: Leonid Fedorov

MCOL-5598 Support for EXCEPT and INTERSECT SQL expressions

Full-time project 350h

MariaDB ColumnStore lacks EXCEPT and INTERSECT handling, so for now ColumnStore hands processing back to MariaDB Server if a query contains EXCEPT or INTERSECT.

Here is the info about the feature:
https://mariadb.com/kb/en/except/
https://mariadb.com/kb/en/intersect/

Skills needed: modern C++
Mentors: Alexey Antipovsky

MCOL-XXX Bloom-filters for data scanning

Full-time project 350h

MariaDB ColumnStore lacks indexes, so it reads a lot of extra data from disk. This project introduces Bloom filters to reduce the data read from disk during scanning, the most I/O-heavy operation.
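A Bloom filter answers "definitely not present" or "possibly present" for a value, so a scan can skip any block of data whose filter rules the value out. The Python sketch below assumes one filter per extent, which is an assumption made for illustration and not a stated design; BloomFilter and scan are hypothetical names, and the sizes are toy values.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        # Derive several bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means definitely absent; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def scan(extents, filters, wanted):
    """Read only the extents whose filter says the value may be present."""
    hits, extents_read = [], 0
    for extent, bloom in zip(extents, filters):
        if not bloom.might_contain(wanted):
            continue                 # definitely absent: skip the disk read
        extents_read += 1
        hits.extend(v for v in extent if v == wanted)
    return hits, extents_read


extents = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
filters = []
for extent in extents:
    bloom = BloomFilter()
    for value in extent:
        bloom.add(value)
    filters.append(bloom)

hits, extents_read = scan(extents, filters, 20)
```

The filter never produces false negatives, so the result is always correct; false positives only cost an unnecessary read.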

Skills needed: modern C++
Mentors: Roman Nozdrin

MCOL-5758 Reduce the computations in JOINS by simpler Bloom-filter-based pre-joins

Full-time project 350h

Joins are very heavy algorithms, in computation, in memory use, or both. They need to hold a substantial amount of data in memory and perform hashing and other operations on that data. Joins can overflow memory limits, and keeping the balance between memory use and performance is tricky. Thus we have to filter the information going into joins as much as possible. ColumnStore already does great work in that regard, pushing WHERE filters before joins. This task continues in that direction by adding Bloom filter operations that approximate JOIN results and perform a secondary read, so that joins are fed only data that is highly likely to be used in the join.
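The pre-join idea can be sketched as follows: build a Bloom filter over the join keys of the smaller side, use it to discard rows of the larger side that cannot match, and only then run the actual hash join on the survivors. This Python version is a minimal illustration under assumed names (build_filter, may_match, prejoin_hash_join) and toy sizes; it is not ColumnStore's join code.

```python
import hashlib

NUM_BITS = 4096    # toy filter size
NUM_HASHES = 3


def bit_positions(key):
    for seed in range(NUM_HASHES):
        digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % NUM_BITS


def build_filter(keys):
    bits = 0
    for key in keys:
        for pos in bit_positions(key):
            bits |= 1 << pos
    return bits


def may_match(bits, key):
    return all((bits >> pos) & 1 for pos in bit_positions(key))


def prejoin_hash_join(small, large, key):
    """Filter the large side with a Bloom filter, then hash-join."""
    bloom = build_filter(row[key] for row in small)
    # Pre-join: keep only rows that are likely to find a partner.
    candidates = [row for row in large if may_match(bloom, row[key])]
    table = {}
    for row in small:
        table.setdefault(row[key], []).append(row)
    joined = [
        {**big, **match}
        for big in candidates
        for match in table.get(big[key], [])
    ]
    return joined, len(candidates)


small = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
large = [{"id": i, "v": i * 10} for i in range(1000)]
joined, candidate_count = prejoin_hash_join(small, large, "id")
```

The hash join still checks exact keys, so false positives from the filter only add a little extra work and never produce wrong rows.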

Skills needed: modern C++
Mentors: Sergey Zefirov

Suggest a Task

Do you have an idea of your own that is not listed above? Let us know in the comments below (click 'Login' at the top of the page first).
