Our live webinar on how to build and scale an application with MariaDB on Docker was so well attended that we weren’t able to get to all the audience questions at the end. Here are our answers to the ones we missed during the live session.
And if you couldn’t make it to the webinar or would like to watch it again, you can access the Docker recording and slide deck.
Q: What is the fundamental content of an image? Is it the commands you include in the Dockerfile that make up the image? Do you build the image from the Dockerfile, or are there out-of-the-box images out there?
A: There are many pre-built images available; take a look at https://hub.docker.com/explore/. The result of building a Dockerfile is an image, and a container is instantiated from an image at runtime, so the image contains everything that is needed for execution. Under the covers, each line of the Dockerfile may cause an action on the file-system, for example installing an O/S package. Each change creates a “layer,” i.e., a set of changes from the previous state of the file-system. Each line in the Dockerfile builds another layer until you get the final image. Each of these layers can be reused by other images.
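As a rough illustration of the layering idea, the following Python sketch models each Dockerfile instruction as a set of file-system changes stacked on the previous state. It is a toy model, not how Docker stores layers on disk; the names `Layer` and `flatten`, the instructions, and the sample paths are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """One Dockerfile instruction, recorded as a diff against the previous state."""
    instruction: str
    added: tuple = ()      # (path, content) pairs written by this step
    removed: tuple = ()    # paths deleted by this step


def flatten(layers):
    """Apply the layers in order to get the file system a container would see."""
    fs = {}
    for layer in layers:
        for path in layer.removed:
            fs.pop(path, None)
        for path, content in layer.added:
            fs[path] = content
    return fs


# Hypothetical layers, loosely mirroring FROM / RUN / COPY instructions.
base = Layer("FROM some-base-os", added=(("/etc/os-release", "base"),))
pkg = Layer("RUN install mariadb-server", added=(("/usr/sbin/mysqld", "binary"),))
conf_a = Layer("COPY a.cnf /etc/mysql/my.cnf", added=(("/etc/mysql/my.cnf", "A"),))
conf_b = Layer("COPY b.cnf /etc/mysql/my.cnf", added=(("/etc/mysql/my.cnf", "B"),))

# Two images that differ only in their last layer share the first two.
image_a = [base, pkg, conf_a]
image_b = [base, pkg, conf_b]
shared = [layer for layer in image_a if layer in image_b]

print(len(shared))
print(flatten(image_a)["/etc/mysql/my.cnf"])
```

In this model the two images differ only in their final layer, which is the situation where layer reuse saves build time and disk space.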
Q: Is the code for this demo available on GitHub or other source code repository?
Q: Does Docker support MariaDB replication? I see you emphasized clustering rather than replication.
A: The Docker image we used in the demo uses Galera Cluster for replication. The entrypoint script that is also in the image takes care of configuring MariaDB and Galera, and makes sure that the databases are clustered and replicated. The demo showed MariaDB Cluster, which is a multi-master solution for high availability. The base Docker image does not configure master/slave replication; that is on the roadmap to deliver to the Docker Store later this year.
Q: How do containers influence database performance if you store data inside a container?
A: At DockerCon last week, Netflix stated that containers have a < 0.1% performance penalty at runtime. Obviously, your mileage may vary. Storing the data inside the container adds more overhead because it affects I/O to the file-system. Inside a container, writes go to the ‘union’ file-system (e.g., AUFS) that the image is created from. Using a mounted volume means that you are writing directly to the host’s file-system (e.g., EXT4, XFS), so you should have better performance.
Q: Are there best practices for deploying schema changes in dev and production? It appears you have a single file for dev which is fine when starting from scratch, but production will need deltas from its current state. It seems that ideally we’d use the same provisioning approach to both dev and production.
A: There are many tools for managing schema migration. Some are independent and others are built into the language frameworks. The choice is largely personal. For the sake of simplicity of the demo, the same image was used for dev and production.
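To make the “deltas from current state” idea concrete, here is a minimal sketch of versioned migrations. It is not what the demo did and not any particular tool's API: the `schema_version` table, the `MIGRATIONS` list, and the `migrate` function are hypothetical, and SQLite from the Python standard library stands in for MariaDB so the example is self-contained.

```python
import sqlite3

# Hypothetical ordered migrations; each entry is (version, SQL delta).
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE users ADD COLUMN email TEXT"),
]


def migrate(conn, migrations=MIGRATIONS):
    """Apply every migration newer than the version recorded in the database."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0  # MAX() is NULL on an empty table
    applied = []
    for version, sql in sorted(migrations):
        if version > current:
            conn.execute(sql)
            conn.execute(
                "INSERT INTO schema_version (version) VALUES (?)", (version,)
            )
            applied.append(version)
    conn.commit()
    return applied


conn = sqlite3.connect(":memory:")  # stand-in for a fresh dev database
print(migrate(conn))  # a fresh database receives every migration
print(migrate(conn))  # running again finds nothing newer to apply
```

The same function serves both cases in the question: a new dev database starts at version 0 and gets every delta, while a production database only gets the deltas newer than its recorded version.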
Q: Can you expose a MariaDB instance over the local network so MySQL Workbench can connect to it?
A: Yes, the MariaDB container can be accessed from outside of the container world. To show that during the webinar, our presenter would have needed to install a MySQL client on his Mac. It was simpler for him to run the client in another container.
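A quick way to confirm the instance is reachable from the host before pointing MySQL Workbench at it is a plain TCP check. The sketch below assumes the container was started with its port published to the host (for example with `docker run -p 3306:3306 ...`); the function name is arbitrary, and the host and port you pass in depend on your own setup.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `port_is_open("127.0.0.1", 3306)` on the Docker host only tells you that something is listening on the published port; it does not check credentials or that the listener is MariaDB.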
Q: You mentioned MariaDB Cluster … does that use Galera for clustering?
A: Yes, MariaDB Cluster packages Galera.
Q: How are you handling data persistence? I did not see persistent data volumes in your Dockerfile/docker-compose.yml. If the container goes down, that data will remove itself?
A: Correct. For the demo we stored the data inside the container. The last section in the presentation discussed the merits (and complications) of storing data outside of the container. Presently, this requires using one of several vendor technologies in order to map the storage back into the correct container on restart. This is still a weak area of the Docker ecosystem in my view.
Q: Can you show some way to bootstrap and restart Galera Cluster which is running with Docker?
A: Using service discovery (in our example we used DNS), the cluster can form and maintain itself. Docker is introducing an event endpoint so that direct manipulation of configuration could be orchestrated (e.g., adding and removing nodes). This is waiting for a pull request to be merged by Docker; see PR #26331.
Q: What level of coding skills are required to stand up a dev and prod box?
A: It’s a simple set of CLI commands. However, testing and validating your operational use cases (for example, your tolerance for data loss during a failover event) may require setup and config changes.
Q: Could you reiterate the methods/benefits/downsides for persisting data (volumes, etc.) for transient Docker instances? Do MariaDB DBAs seem to have a preference based on their backup/recovery practices?
A: Storing data outside of the container on a mounted volume means that the data will survive a container crash, stop/start cycle, etc. Using an external volume means that you can then use tools to manage that data volume. For example, if this is an EBS volume, then you could use snapshots for backup, etc. There are several options for backup/restore. The choice will depend on data size and operational needs, so it’s hard to recommend a specific solution. More details here: https://mariadb.com/kb/en/mariadb/backup-and-restore-overview/
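For reference, mounting a host directory over the data directory might look like the sketch below, which only assembles the `docker run` command line. It assumes the official `mariadb` image from Docker Hub rather than the demo image; that image keeps its data in `/var/lib/mysql` and reads the root password from `MYSQL_ROOT_PASSWORD`. The container name, host path, and password are placeholders.

```python
import shlex


def mariadb_run_command(name, host_data_dir, root_password, image="mariadb"):
    """Build a `docker run` argv that keeps the data directory on the host."""
    return [
        "docker", "run", "--detach",
        "--name", name,
        # Bind-mount the host directory over MariaDB's data directory.
        "--volume", f"{host_data_dir}:/var/lib/mysql",
        "--env", f"MYSQL_ROOT_PASSWORD={root_password}",
        image,
    ]


cmd = mariadb_run_command("db1", "/srv/mariadb/db1", "change-me")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` to start the container. Because the data then lives under the host path, removing and recreating the container with the same mount picks up the existing databases.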
Q: In case the data is stored in a different container, how do we ensure the data is persistent? What do you suggest: a cluster on top of data containers, replicating to backup files on another system outside of containerization, or something else?
A: If you use a data container for storage, then you will still need a solution for data replication. For example, you could use binlog replication in order to ensure other copies exist, or issue “FLUSH TABLES WITH READ LOCK” and then take a snapshot. If you are mounting the volume from a storage array, then that array may have the capability to replicate data as well. So there are many choices here.
Q: When you start MariaDB, the first node needs to start with the bootstrap flag on and the others in the regular way. So what’s the right way to restart the cluster, given that I can’t restart the first node with the bootstrap flag again?
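The lock-then-snapshot option can be sketched as follows. This is an outline under assumptions, not a backup tool: `conn` is any Python DB-API connection to the MariaDB server, and `take_snapshot` is a hypothetical callable that triggers whatever snapshot mechanism your storage provides.

```python
from contextlib import contextmanager


@contextmanager
def global_read_lock(conn):
    """Hold FLUSH TABLES WITH READ LOCK on `conn` for the duration of the block."""
    cur = conn.cursor()
    cur.execute("FLUSH TABLES WITH READ LOCK")
    try:
        yield cur
    finally:
        # The lock belongs to this session, so release it on the same connection.
        cur.execute("UNLOCK TABLES")
        cur.close()


def consistent_snapshot(conn, take_snapshot):
    """Run `take_snapshot()` while writes are blocked and return its result."""
    with global_read_lock(conn):
        return take_snapshot()
```

The important detail is that the snapshot is taken while the locking session is still open; closing that connection, or running `UNLOCK TABLES` too early, releases the lock before the snapshot is consistent.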
A: The Docker entrypoint script we used relies on doing a DNS dig in order to find the other service instances. Docker creates a virtual IP for the service, then IPs for each instance (or in Docker terms, “task”) for the service. Therefore, the Galera Cluster is always started with a “seed” which gets modified if other tasks are discovered in DNS.
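The discovery step can be approximated in a few lines. This is not the entrypoint script from the demo, only a sketch of the same idea. It assumes Docker swarm mode, where the DNS name `tasks.<service>` resolves to the IPs of the individual tasks, and it relies on Galera treating an empty `gcomm://` address as a request to bootstrap a new cluster. The function names are made up.

```python
import socket


def discover_peers(service_name):
    """Resolve the per-task IPs of a swarm service through Docker's DNS."""
    try:
        infos = socket.getaddrinfo(f"tasks.{service_name}", None, socket.AF_INET)
    except socket.gaierror:
        return []  # name not resolvable yet, e.g. the first task is still starting
    return sorted({info[4][0] for info in infos})


def cluster_address(peers, own_ip):
    """Build a wsrep_cluster_address value from the discovered peers."""
    others = [ip for ip in peers if ip != own_ip]
    # With no other peers this yields "gcomm://", which bootstraps a new cluster.
    return "gcomm://" + ",".join(others)
```

For example, `cluster_address(discover_peers("db"), own_ip)` gives a joining node the addresses of the existing tasks, while a node that finds no peers starts as the seed. A real entrypoint also has to handle several tasks starting at the same moment and none of them seeing the others yet.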