MaxScale - from proxy to MySQL replication relay
Mark Riddoch, one of the MaxScale team, describes how a MaxScale plugin was developed for Booking.com that allows the proxy to be used to reduce the load placed on the master in large MySQL replication environments.
During the first part of the year I spent a lot of time working on a proof of concept to use MaxScale as a way to distribute MySQL binlogs for large replication installations. I have to admit that when I first heard the idea from Booking.com my reaction was: "MaxScale is a proxy for client applications, it can't do this". However, I was wrong, which proves that making versatile, configurable software can throw up surprises even for the people who design it.
There have been posts elsewhere about the problem this is trying to solve, so I will not go into too much detail. Suffice to say that with large numbers of slaves connected to a single master, the load on the master becomes too high. Using intermediate relay servers causes other issues, because MySQL replication re-executes the statements on the relay server and then sends on the binlog records for that re-executed SQL, rather than the original binlog records.
As followers of MaxScale are probably bored of hearing by now, MaxScale is built as a general-purpose core that provides support facilities for a proxy, together with a number of plugins of different types. It was this architecture that gave rise to the idea of using it as the basis for a replication relay service. My problem was how to fit replication into something that was designed to act as a proxy for database applications, forwarding queries from those applications to a backend server. The most obvious feature to utilise is the query router within MaxScale. Normally these routers take requests in and forward them to one of a number of database servers: very much a push model of interaction. In replication, however, the slave servers each register with the master, and the master then streams changes, in the form of binlog records, to the slaves. This is not the classical request/response model.
The conventional structure of a query router plugin within MaxScale is to have one instance of the router per service, with each client that connects being allocated a new session within the router. Each request that arrives at the router for a given session is routed to an outbound server using rules in the router. Replies from that backend server are sent back to the client connection that was opened when the session was created. Replication, however, calls for a somewhat different structure.
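For contrast with what follows, the conventional push model can be reduced to a few lines of C. This is a simplified sketch and not MaxScale's actual router API: the types and the functions new_session(), route_query() and client_reply() are hypothetical stand-ins for the router entry points, and round-robin stands in for whatever routing rules a real router applies.

```c
#include <assert.h>

#define MAX_SESSIONS 16

/* Hypothetical, simplified model; not MaxScale's real types. */
typedef struct {
    int client_fd;   /* the client connection this session belongs to */
    int backend;     /* backend chosen for the most recent request */
} RouterSession;

typedef struct {             /* one instance per service */
    int n_backends;          /* servers this service may route to */
    int next_backend;        /* round-robin cursor shared by all sessions */
    RouterSession sessions[MAX_SESSIONS];
    int n_sessions;
} RouterInstance;

/* A client connects: allocate a new session within the router instance. */
RouterSession *new_session(RouterInstance *inst, int client_fd)
{
    assert(inst->n_sessions < MAX_SESSIONS);
    RouterSession *s = &inst->sessions[inst->n_sessions++];
    s->client_fd = client_fd;
    s->backend = -1;
    return s;
}

/* A request arrives for a session: choose a backend (round-robin here)
 * and return it as the destination for the request. */
int route_query(RouterInstance *inst, RouterSession *s, const char *query)
{
    (void)query;  /* a real router could inspect the query to decide */
    assert(inst->n_backends > 0);
    s->backend = inst->next_backend;
    inst->next_backend = (inst->next_backend + 1) % inst->n_backends;
    return s->backend;
}

/* A reply comes back from the backend: it always goes to the single
 * client connection that owns the session. */
int client_reply(const RouterSession *s)
{
    return s->client_fd;
}
```

Every reply has exactly one destination, fixed when the session was created; that one-to-one mapping is what replication breaks.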
In replication we need the router module to register with the master server and request binlog records from it. This should be done once, and should probably not be in response to any event coming from the clients of MaxScale; in this case the clients are the slave servers. The other difference is that a binlog record, the equivalent of a response in a more traditional router, is not simply returned to a single client. Rather, we may send it nowhere or to multiple connections, depending on how many slave servers are attached and on the current binlog positions of those slave servers. We may also need to send a record to a slave at some undetermined time in the future, if that slave is lagging behind or not connected at the time. Therefore the router has to act as a store-and-forward relay rather than a mere proxy forwarding request and response packets.
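The store-and-forward decision can be sketched as follows. This is a minimal illustration, assuming that positions are byte offsets into a single stored binlog and that a slave "able to accept" a record is one that is connected and already at the end of the stored data; the Relay and Slave types and both functions are hypothetical, not the binlog router's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the store-and-forward decision. */
typedef struct {
    bool connected;
    unsigned long pos;   /* offset of the next binlog byte this slave needs */
} Slave;

typedef struct {
    unsigned long end;   /* offset just past the last stored record */
    Slave *slaves;
    size_t n_slaves;
} Relay;

/* Called once per record received from the master. The record is always
 * stored first (only the offset is tracked here); it is then forwarded
 * immediately only to slaves that are connected and fully caught up.
 * Returns how many slaves received it now: possibly none, possibly all. */
size_t relay_record(Relay *r, unsigned long len)
{
    assert(len > 0);
    unsigned long start = r->end;
    r->end += len;                       /* store: append to the binlog */

    size_t sent = 0;
    for (size_t i = 0; i < r->n_slaves; i++) {
        Slave *s = &r->slaves[i];
        if (s->connected && s->pos == start) {
            s->pos = r->end;             /* forward: slave is now current */
            sent++;
        }
        /* Lagging or disconnected slaves get this record later,
         * read back from the stored binlog. */
    }
    return sent;
}

/* Bring a lagging slave up to date from the local store, without any
 * traffic to the master. Returns the number of bytes it had to be sent. */
unsigned long catch_up(Relay *r, Slave *s)
{
    assert(s->pos <= r->end);
    unsigned long backlog = r->end - s->pos;
    s->pos = r->end;
    return backlog;
}
```

The key design point is that storing and forwarding are decoupled: a record is always written, and whether it is also pushed immediately depends only on each slave's own position.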
So after spending a little time thinking about the problem, and listening to Jean-Francois at Booking.com explain just why something like this would be useful, I decided that what had seemed like a crazy idea at first was indeed a very good idea. The germ of an implementation plan began to form, and I started to construct a proof of concept of a router plugin module that would route binlog records. The rest of this post is the story of how I put that proof of concept together.
The Prototype Router
The requirements for this new router plugin were now becoming clearer:
- It must request and receive binlog records from the master independently of any slave activity.
- Binlog records must be stored on permanent or semi-permanent storage to allow replaying of those records to the slave servers.
- The slave servers must be able to request historical binlog records without sending any additional traffic to the master server.
- Binlog records received from the master must be relayed in a timely fashion to the slaves that are able to accept them, i.e. those not lagging behind the master.
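The second and third requirements together amount to an append-only local file that can be read back at any offset. The sketch below assumes plain C stdio and a single file; only the 4-byte magic number at the start of the file is taken from the MySQL binlog format, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Every MySQL binlog file starts with this 4-byte magic number. */
static const unsigned char BINLOG_MAGIC[4] = { 0xfe, 'b', 'i', 'n' };

/* Start a new local binlog file. Returns 0 on success, -1 on error. */
int binlog_init(FILE *f)
{
    size_t n = fwrite(BINLOG_MAGIC, 1, sizeof BINLOG_MAGIC, f);
    return n == sizeof BINLOG_MAGIC ? 0 : -1;
}

/* Append one record received from the master and flush it, so that it is
 * kept regardless of whether any slave is connected.
 * Returns the record's offset in the file, or -1 on error. */
long binlog_append(FILE *f, const unsigned char *rec, size_t len)
{
    if (fseek(f, 0, SEEK_END) != 0)
        return -1;
    long off = ftell(f);
    if (fwrite(rec, 1, len, f) != len || fflush(f) != 0)
        return -1;
    return off;
}

/* Serve a historical record to a slave straight from the local file;
 * nothing is sent to the master. Returns the number of bytes read. */
size_t binlog_read(FILE *f, long off, unsigned char *buf, size_t len)
{
    assert(off >= (long)sizeof BINLOG_MAGIC);
    if (fseek(f, off, SEEK_SET) != 0)
        return 0;
    return fread(buf, 1, len, f);
}

/* Check that a file starts with the binlog magic number. */
int binlog_is_valid(FILE *f)
{
    unsigned char hdr[sizeof BINLOG_MAGIC];
    if (fseek(f, 0, SEEK_SET) != 0 || fread(hdr, 1, sizeof hdr, f) != sizeof hdr)
        return 0;
    return memcmp(hdr, BINLOG_MAGIC, sizeof hdr) == 0;
}
```

Because the first record lands at offset 4, a slave's binlog position maps directly onto a file offset, which is what makes replaying from local storage straightforward.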
I was also very keen not to do anything that would specialise MaxScale while doing this. I wanted the binlog router to work with the same MaxScale core, and the same protocol plugins, monitors and filters, as any other plugin. It is fine to enhance the core to provide new generic services that are not already available, but wholesale adaptation of the core for this one requirement would not be desirable.
With these requirements and constraints in place I decided the best thing to do, as always, was to divide and conquer. Hence the first part of the implementation would concentrate on the interaction between MaxScale and the master: registering for the binlog records, retrieving them and storing them in a file. Once this was working, the next step would be to move on to the slave side.
The first step in the process was to examine a typical interaction between a slave and the master during registration. Although the protocol defines the registration process itself, slaves also run a number of queries in order to determine the server version and any settings in place on the master that might have an impact on the slave. Since MaxScale would need to act as a master, it must also be able to support these queries from its own slaves. The choice was either that MaxScale should know how to respond to these queries itself, or that it should merely proxy these slave requests to the real master. The problem with the latter approach is that it would require a different connection from the one MaxScale uses to receive binlog records. It is not possible to use the same connection, since once a connection is receiving binlog records you cannot send another query on it without halting the flow of binlog records. The decision was therefore taken not to use this method, but rather to have MaxScale respond directly to these requests without forwarding them to the master.
The method of providing these query responses is fairly simple. A list of the requests that slaves may make was built up by observing the message traffic for a number of registration sessions. These queries are saved in the binlog router; the router executes each of them itself during MaxScale's own registration process, and the responses are stored within the router. When a slave makes one of these requests some time later, the saved response that the master gave to MaxScale is simply replayed to the slave.
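The capture-and-replay scheme is essentially a lookup table keyed on the query text. A minimal sketch, assuming exact string matching and treating each reply as an opaque byte buffer (in practice these would be raw MySQL protocol packets); the types and functions are hypothetical, and the query used in the usage test is only an example of the kind of query a slave sends.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SAVED 16

/* One captured exchange: the query text and the master's raw reply bytes.
 * This sketch borrows the pointers; a real implementation would copy. */
typedef struct {
    const char *query;
    const unsigned char *reply;
    size_t reply_len;
} SavedResponse;

typedef struct {
    SavedResponse items[MAX_SAVED];
    size_t count;
} ResponseCache;

/* Capture phase, during MaxScale's own registration with the master:
 * remember the reply the master gave to one of the slave-style queries. */
void cache_store(ResponseCache *c, const char *query,
                 const unsigned char *reply, size_t reply_len)
{
    assert(c->count < MAX_SAVED);
    c->items[c->count++] = (SavedResponse){ query, reply, reply_len };
}

/* Replay phase, when a slave of MaxScale sends a query: return the saved
 * reply, or NULL if this query was never captured (a real router would
 * then need to answer with an error). */
const SavedResponse *cache_lookup(const ResponseCache *c, const char *query)
{
    for (size_t i = 0; i < c->count; i++)
        if (strcmp(c->items[i].query, query) == 0)
            return &c->items[i];
    return NULL;
}
```

The master connection is never needed for these lookups, which is why the binlog stream on that connection is never interrupted.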
Having settled on the method of dealing with the registration queries from the slave servers, the next issue was how to get MaxScale to register with the master as a slave. The normal flow of router interactions is for a client session to be created, with traffic being forwarded to the backend databases only when that client session makes a request. In this case the client session is a slave server connection, and the backend server is the master from which MaxScale receives the binlog records. Ideally MaxScale should register with the master and start receiving and storing binlog records before the first slave connects to MaxScale; indeed, it should collect binlog records even if no slaves are connected. The existing workflow of a router would therefore not be acceptable.
Fortunately there was a solution. As part of the process of creating a listener for a service, MaxScale loads the router that the service uses and creates an instance of the router. This involves calling the createInstance() entry point of the router. For most routers this simply sets up any router-wide data structures, checks any router options and so on. In the case of the binlog router, however, we use this as the trigger for MaxScale to connect to the master. Connecting to the master is not a simple process, since it requires several interactions with the master server. The fundamental rule of the MaxScale implementation, that no thread should block, means that we cannot complete the process within the createInstance() call, as it would have to block waiting for responses from the master.
The solution to this problem was for the createInstance() call to create a dummy client and establish a MaxScale session to connect to the master. This allows the usual proxy calls to be used, but with an internal client rather than an external client application. Requests are sent via this dummy client, and responses received back. A finite state machine is built to execute the various queries required in the registration process.
The trigger to move from each state to the next is the reception of the response to the previous request; these requests are queries, except in the last two states, which send the COM_REGISTER_SLAVE and COM_BINLOG_DUMP messages. This allows registration to be implemented as a non-blocking process, with the clientReply() entry point of the router triggering the transition to the next state and sending the next request. After each message is sent, control of the thread is returned to the MaxScale core, thus satisfying the "no blocking" rule.
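The registration sequence can be sketched as a small state machine driven entirely by replies. This is a simplified illustration under several assumptions: the state names, the two setup queries and the string stand-ins for the binary COM_REGISTER_SLAVE and COM_BINLOG_DUMP commands are hypothetical, authentication is omitted, and "sending" merely records the request instead of writing to a socket. The functions create_instance() and client_reply() only mirror the roles of the router entry points named above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical registration states; a real router has more query states. */
typedef enum {
    ST_TIMESTAMP,    /* waiting for the reply to a first setup query  */
    ST_SERVER_ID,    /* waiting for the reply to a second setup query */
    ST_REGISTER,     /* waiting for the reply to COM_REGISTER_SLAVE   */
    ST_BINLOG_DUMP   /* COM_BINLOG_DUMP sent; every reply is an event */
} RegState;

/* The request sent on entering each state, indexed by RegState. */
static const char *const REQUEST[] = {
    "SELECT UNIX_TIMESTAMP()",
    "SHOW VARIABLES LIKE 'SERVER_ID'",
    "COM_REGISTER_SLAVE",
    "COM_BINLOG_DUMP",
};

typedef struct {
    RegState state;
    const char *last_sent;   /* stands in for a non-blocking write */
    unsigned long events;    /* binlog events received so far */
} BinlogRouter;

/* Non-blocking "send": hand the request over and return at once. */
static void send_request(BinlogRouter *r)
{
    r->last_sent = REQUEST[r->state];
}

/* Router creation: issue only the first request, then give the thread
 * back to the core instead of waiting for the master to answer. */
void create_instance(BinlogRouter *r)
{
    r->state = ST_TIMESTAMP;
    r->events = 0;
    send_request(r);
}

/* Invoked for every reply from the master. Before the dump starts, each
 * reply advances the machine one state and sends the next request; after
 * COM_BINLOG_DUMP, every reply is a binlog event to be stored. */
void client_reply(BinlogRouter *r)
{
    if (r->state == ST_BINLOG_DUMP) {
        r->events++;                 /* would be stored/forwarded here */
        return;
    }
    r->state = (RegState)(r->state + 1);
    send_request(r);
}
```

No function ever waits: each call does one step of work and returns, and the next step happens only when the next reply arrives.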
Upon completion of the state machine, the master server starts to stream binlog records to MaxScale. These messages arrive asynchronously at MaxScale as replies to the COM_BINLOG_DUMP message that MaxScale sent during the final state transition; as far as MaxScale is concerned, it is receiving an endless stream of responses to that one message. MaxScale saves these records into a local binlog file maintained by the router plugin. The router must examine each binlog record in order to determine whether it affects the management of the binlog file itself, such as a binlog rotate event, or whether it is a record that should not be stored in the file at all.
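Examining each record starts with the 19-byte common header of a version 4 binlog event (timestamp, type, server id, event size, next position, flags, all little-endian). The sketch below decodes that header and classifies the event. The event type codes and the artificial-event flag are the MySQL values; the policy itself, skipping heartbeats and events flagged as artificial (such as the fake rotate event a master sends at the start of a dump) while storing a real rotate event before switching files, is an assumption for illustration rather than the router's documented behaviour. It also assumes the leading status byte of each network packet has been stripped and that binlog checksums are disabled; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EVENT_HEADER_LEN        19   /* binlog v4 common header */
#define ROTATE_EVENT             4
#define HEARTBEAT_LOG_EVENT     27
#define LOG_EVENT_ARTIFICIAL_F  0x20

typedef enum { ACT_STORE, ACT_STORE_AND_ROTATE, ACT_SKIP } EventAction;

typedef struct {
    uint32_t timestamp;
    uint8_t  type;
    uint32_t server_id;
    uint32_t event_size;
    uint32_t next_pos;    /* log_pos: position of the following event */
    uint16_t flags;
} EventHeader;

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode the little-endian common header at the start of an event. */
EventHeader parse_header(const uint8_t *ev)
{
    EventHeader h;
    h.timestamp  = le32(ev);
    h.type       = ev[4];
    h.server_id  = le32(ev + 5);
    h.event_size = le32(ev + 9);
    h.next_pos   = le32(ev + 13);
    h.flags      = (uint16_t)(ev[17] | ev[18] << 8);
    return h;
}

/* Assumed policy: what the router does with one received event. */
EventAction classify(const EventHeader *h)
{
    if (h->type == HEARTBEAT_LOG_EVENT || (h->flags & LOG_EVENT_ARTIFICIAL_F))
        return ACT_SKIP;              /* not part of the master's binlog */
    if (h->type == ROTATE_EVENT)
        return ACT_STORE_AND_ROTATE;  /* last event of the current file */
    return ACT_STORE;
}

/* For a rotate event, copy out the name of the next binlog file, which
 * follows an 8-byte position after the header (no checksum assumed). */
size_t rotate_filename(const uint8_t *ev, const EventHeader *h,
                       char *out, size_t cap)
{
    assert(cap > 0 && h->type == ROTATE_EVENT &&
           h->event_size >= EVENT_HEADER_LEN + 8);
    size_t n = h->event_size - EVENT_HEADER_LEN - 8;
    if (n >= cap)
        n = cap - 1;
    memcpy(out, ev + EVENT_HEADER_LEN + 8, n);
    out[n] = '\0';
    return n;
}
```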
With the completion of this phase of the router we had a MaxScale setup that could connect to a master server, register for replication updates and save binlog files locally on the MaxScale server. This enabled us to test the process and confirm that the contents of the binlog files maintained by MaxScale matched those on the master server. The next phase in the implementation of the router was to create the slave side of the interaction; this will be covered in another post.
- The Booking.com post - MySQL Slave Scaling (and more)
- Anders Karlsson - MariaDB Replication, MaxScale and the need for a binlog server
- MaxScale GitHub Project
- MaxScale Google Discussion Group