MaxScale Customisation - Filter, Routing of Clients, "Sticky" Statements
I am with a company that provides services and develops software for shipping companies. We usually deploy our web applications on our own infrastructure. I've been dealing with MariaDB and MaxScale for 2 weeks now and I'd say I do have a pretty good overview of whats going on. I already set up a demo environment with 1 master, 3 slaves, 1 client and 1 MaxScale proxy.
Since we want to use MariaDB and MaxScale in production environments when it gets out of the Beta state, I am trying to test this technology for some requirements we have. This one might be rather tricky than trivial, though I wanted to discuss this here because I am having struggle setting it up the way I want.
If you need more information on the requirements itself let me know so I can figure out if I can provide you with a more detailed description.
Requirement: Our web applications excessively use temporary tables. This means, a lot of tables are generated on-the-fly and used within a session. They are deleted by cronjobs after a specified period of time (or after the current http session was closed/invalidated/timed out). The web applications then use these temporary tables to join them with persistent tables on the databases to generate reports. Usually, the table names of temporary tables are named somewhat similar to tmp_<number> . The last part of those table names are equal to a part of the URL of the web application within the “session_id=” variable. For example: + URL: https://url.to.web.app./session_id=123 + temporary table name: tmp_123
So, depending on the variable “session_id=” in the URL of the application the names of the temporary tables gets generated dynamically from within the application. To reduce network overhead, throughput decrease and other performance parameters, we do not want those tables to be replicated to the slaves. But we do not as well want to have all the read/write load for those temporary tables on one single master machine. So, this is when a multi-master setup comes on the plan. And with it the question on filters and how to manage and distribute the load on the available master servers.
Suggestion: Supposing we have 4 master servers (master_00, master_01, master_02 and master_03). Supposing they are set up in a multi-master replication setup. Supposing we use MaxScale as the respective proxy for routing the requests. When a client sends one of those MySQL statements that create or read from a temporary table, we could use the RegEx filter to route the client to specific master machines (not slaves, since write statements are redirected to master servers in any case). Excluding the temporary tables from replication in the server configuration might be mandatory here.
Questionnaire: Since the names of those temporary tables are unique throughout their lifecycle (meaning the name could be used again in another session after the table currently holding the name is being deleted), it might come in handy to do an operation such as: Table name: tmp_<number> Operation: <number> MOD x (x being the number of master servers in the cluster)
For example: <number> = 6091073567508922368; Number of master servers: 8 Resolves in 6091073567508922368 MOD 4 = 0, so the request gets routed to master_00 (assuming we start counting on 0 for the number of master servers)
Regarding the (RegEx) filter, these questions come to mind: + Is it possible to do mathematic operations such as Modulo within the filter definition? + Is it possible to retrieve the number of current master servers being a member of the cluster from within the filter definition? + Is it possible to dynamically build the filter definitions with a script and inject them into a running MaxScale implementation or does MaxScale need to be restarted after changing the configuration? + What happens when one of the masters dies? + Is it possible to have join statements spread over multiple servers?
Answer Answered by Markus Mäkelä in this comment.
The regexfilter does not support mathematical operations. It uses the POSIX regular expression syntax: http://pubs.opengroup.org/onlinepubs/009696899/basedefs/xbd_chap09.html
Retrieving the master server is doable from inside a filter, the servers used by the service are accessible to the filter through the shared session data.
Right now there is only experimental support for scripting interfaces to MaxScale. The Luafilter on the GitHub branch 'luafilter' has a filter which supports scripts written in the Lua language: https://github.com/mariadb-corporation/MaxScale/blob/luafilter/server/modules/filter/luafilter.c
If a master server dies, MaxScale will detect this when the monitor modules try to query the server for status. Connections to this server are severed if they are still open and an error is sent to the client. Currently the session is closed if the master server dies while it is still open even though reads could still be accepted.
JOIN queries over multiple servers is not something MaxScale currently supports. I recommend looking at the Spider storage engine for MariaDB: https://mariadb.com/kb/en/mariadb/spider/