To improve your chances of being accepted, it is a good idea to submit a pull request with a bug fix to the server.
Add autocompletion capabilities to the MariaDB Jupyter kernel
As part of the Jupyter Messaging protocol, the Jupyter frontend sends a complete_request message to the MariaDB kernel when the user invokes the code completer in a Jupyter notebook.
This message is handled in the do_complete function from the
In simpler words, whenever the user hits the key shortcut for code autocompletion in a notebook, the MariaDB kernel's do_complete function is called with a number of arguments that help the kernel understand what the user wants to autocomplete.
So the autocompletion infrastructure in the MariaDB kernel is already kindly provided by Jupyter; we only need to send back to Jupyter a list of suggestions based on the arguments that do_complete receives :-).
Ideally we should aim to enable at least database, table and column name completion and also SQL keyword completion.
But no worries, there are plenty of possibilities to extend the functionality even more if the accepted student turns out to be very productive :D
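As a rough illustration of what "sending back a list of suggestions" could look like, here is a minimal sketch that builds the content of a complete_reply (the fields status, matches, cursor_start, cursor_end and metadata come from the Jupyter Messaging protocol). The helper name build_complete_reply and the hard-coded candidate list are assumptions made for this example; a real implementation would live in the kernel's do_complete and would fetch database, table and column names from the server.

```python
# Hypothetical candidate names; a real kernel would query the server for these.
SQL_KEYWORDS = ["SELECT", "FROM", "WHERE", "INSERT", "UPDATE", "DELETE"]


def build_complete_reply(code, cursor_pos, candidates):
    """Return complete_reply content for the token that ends at the cursor."""
    # Walk back from the cursor to the start of the identifier being typed.
    start = cursor_pos
    while start > 0 and (code[start - 1].isalnum() or code[start - 1] == "_"):
        start -= 1
    prefix = code[start:cursor_pos]

    # Case-insensitive prefix match; an empty prefix yields no suggestions.
    matches = sorted(
        c for c in candidates if prefix and c.lower().startswith(prefix.lower())
    )
    return {
        "status": "ok",
        "matches": matches,
        "cursor_start": start,
        "cursor_end": cursor_pos,
        "metadata": {},
    }


# The cursor sits right after "FR" in the cell text.
reply = build_complete_reply("SELECT * FR", 11, SQL_KEYWORDS + ["friends"])
print(reply["matches"])
```

The cursor_start and cursor_end values tell the frontend which part of the cell text a chosen suggestion should replace.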
Implement interactive editing of result sets in the MariaDB Jupyter kernel
At this moment the MariaDB kernel is only capable of getting the result sets from the MariaDB client in HTML format and packing them in a Jupyter-compatible format. Jupyter then displays them in notebooks like it would display Python pandas DataFrames.
Sure, users can easily write SQL code to modify the content of a table, like they would in a classical command-line database client. But we want to go a bit further: we would love to have the capability to edit a result set returned by a SELECT statement (i.e. double-click on table cells and edit) and have a button that users can press to generate a SQL statement that will update the content of the table via the MariaDB server.
Apart from interacting with the Jupyter frontend for providing this UI capability, we also have to implement a field integrity functionality so that we make sure users can't enter data that is not compatible with the datatype of the column as it is seen by the MariaDB server.
The project should start with a fair bit of research to understand how we can play with the Jupyter Messaging protocol to create the UI functionality and also to check other Jupyter kernels and understand what's the right and best approach for tackling this.
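To make the two backend pieces more concrete (the datatype check and the generated statement), here is a minimal sketch. Everything in it is an assumption for illustration: the function names is_valid_for_type and build_update are hypothetical, only a handful of column types are covered, and the qmark-style placeholders assume the statement would be sent through a driver that supports them. The UI side is not covered here.

```python
import re
from datetime import date
from decimal import Decimal, InvalidOperation


def is_valid_for_type(value, column_type):
    """Check a user-entered string against a small subset of column types."""
    column_type = column_type.strip().upper()
    if column_type in ("INT", "INTEGER", "BIGINT"):
        return re.fullmatch(r"[+-]?\d+", value) is not None
    if column_type.startswith("DECIMAL"):
        try:
            return Decimal(value).is_finite()
        except InvalidOperation:
            return False
    if column_type == "DATE":
        try:
            date.fromisoformat(value)
            return True
        except ValueError:
            return False
    varchar = re.fullmatch(r"VARCHAR\((\d+)\)", column_type)
    if varchar:
        return len(value) <= int(varchar.group(1))
    return False  # unknown types are rejected rather than guessed


def quote_identifier(name):
    """Quote a table or column name with backticks, doubling embedded ones."""
    return "`" + name.replace("`", "``") + "`"


def build_update(table, column, new_value, key_column, key_value):
    """Return a parameterized UPDATE for one edited cell and its parameters."""
    sql = "UPDATE {} SET {} = ? WHERE {} = ?".format(
        quote_identifier(table),
        quote_identifier(column),
        quote_identifier(key_column),
    )
    return sql, (new_value, key_value)


# Hypothetical edit: the user changed the price of the row with id 7.
if is_valid_for_type("19.99", "DECIMAL(10,2)"):
    statement, params = build_update("products", "price", "19.99", "id", 7)
    print(statement, params)
```

Note that this sketch identifies the edited row by a single key column, so a result set without a usable key would need a different strategy.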
Make the MariaDB Jupyter kernel capable of dealing with huge SELECTs
Currently the MariaDB kernel doesn't impose any internal limits on the number of rows a user can SELECT in a notebook cell. Internally the kernel gets the result set from MariaDB and stores it in a pandas DataFrame, so users can use it with magic commands to chart data. But this DataFrame is stored in memory, so if you SELECT a huge number of rows, say 500k or 1M, it's probably not a very good idea to create such a huge DataFrame.
We tested with 500k rows, and the DataFrame itself is not the biggest problem: it consumed around 500MB of memory. The problem is the number of rows the browser needs to render. For 500k rows, the browser tab with the notebook consumes around 2GB of memory, so the Jupyter frontend (JupyterLab, Jupyter Notebook) slows down considerably.
A potential solution is to introduce two new config options which would specify:
- a limit for the number of rows the Jupyter notebook should render; a reasonable default value for this could be 50 rows, for instance (
- a row limit, limit_max_rows, that the kernel would use to determine whether it should store the result set in memory in a DataFrame or on disk. A reasonable default value might be 100k rows.
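The two options above could drive a decision like the one sketched below. This is only an illustration under assumptions: limit_max_rows is the option named above, while display_max_rows is a hypothetical name for the render limit, and plan_result_handling is a made-up helper rather than existing kernel code.

```python
def plan_result_handling(row_count, display_max_rows=50, limit_max_rows=100_000):
    """Decide how many rows to render and where to keep the result set."""
    return {
        # Never hand the frontend more rows than the render limit allows.
        "rows_to_render": min(row_count, display_max_rows),
        # Small result sets stay in a DataFrame; large ones go to disk.
        "storage": "memory" if row_count <= limit_max_rows else "disk",
    }


print(plan_result_handling(20))
print(plan_result_handling(500_000))
```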
The trickiest part of the project, though, is that once the kernel writes a result set to disk, the charting magic commands need to detect that the data is on disk rather than in memory, and they should find a smart mechanism for generating the chart from the disk data without loading the entire data set into memory (which would defeat the whole purpose of the project). This might involve finding a new Python plotting library (instead of the current matplotlib) that can accomplish the job.
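One possible direction, shown here only as a sketch, is to aggregate the on-disk data in a streaming fashion and hand the plotting code just the aggregate. The example assumes the result set was written as a CSV file with a header row, which is an assumption and not a decision made by the project. The function name histogram_from_csv is hypothetical. It computes histogram bin counts one row at a time, so memory use does not grow with the number of rows.

```python
import csv
import os
import tempfile


def histogram_from_csv(path, column, bin_edges):
    """Count values per bin while reading the file one row at a time."""
    counts = [0] * (len(bin_edges) - 1)
    last = len(counts) - 1
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            value = float(row[column])
            for i in range(len(counts)):
                # Bins are half-open [low, high); the last bin is closed.
                in_bin = bin_edges[i] <= value < bin_edges[i + 1]
                if in_bin or (i == last and value == bin_edges[-1]):
                    counts[i] += 1
                    break
    return counts


# Tiny stand-in for a result set that the kernel wrote to disk.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline=""
) as tmp:
    writer = csv.writer(tmp)
    writer.writerow(["price"])
    writer.writerows([[1], [2], [5], [9], [10]])
    demo_path = tmp.name

demo_counts = histogram_from_csv(demo_path, "price", [0, 5, 10])
os.remove(demo_path)
print(demo_counts)
```

With this approach the plotting library only ever sees one number per bin. Chart types that need every point, such as scatter plots, would require a different strategy, for example sampling.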