Ready for the holiday shopping season? 20 tips to prepare your MariaDB database environment for Black Friday and Cyber Monday!
Setting up a database environment for the loads that Black Friday or Cyber Monday brings can be tricky. The suggestions in this blog center on the principles of Scalability, Capacity, Performance and High Availability (HA). For this post, I define each principle as:
Scalability is the ability to add capacity by adding resources
Capacity is the ability to handle load
Performance is tuning your database environment for increased capacity and scale
High Availability is the ability to keep serving requests, with good performance, even when parts of the environment fail
This blog post covers how to tune system variables in the MariaDB database environment for capacity, scalability and high availability.
Before deploying, do not accept these suggestions blindly. Each MariaDB environment is unique and requires additional thought before making any changes. You will most likely need to adjust these settings for your specific use case and environment.
Things you need to know:
The MariaDB configuration file is typically located at /etc/my.cnf. Every time you modify this file, you will need to restart the MariaDB service for the new changes to take effect.
Black Friday and Cyber Monday Tuning Recommendations
1. InnoDB Buffer Pool Size
The InnoDB buffer pool should generally be set to 60-80% of the available RAM when using InnoDB exclusively. Ideally, all InnoDB data and indexes should be able to fit within memory or at least the working data set.
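As a sketch, on a dedicated database server with 32G of RAM (an assumed figure for illustration), the buffer pool setting in /etc/my.cnf might look like:

```ini
[mysqld]
# Roughly 70% of 32G RAM on a dedicated server (illustrative value;
# size this to your own RAM and working data set)
innodb_buffer_pool_size = 22G

# To judge whether the pool is large enough, compare these counters:
#   SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
# A high Innodb_buffer_pool_reads count relative to
# Innodb_buffer_pool_read_requests suggests the pool is too small.
```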
2. InnoDB Logs
There are two general suggestions for InnoDB log file sizing: set the combined total size of the InnoDB log files to 25-50% of the InnoDB buffer pool size, or set the combined InnoDB log file size equal to one hour’s worth of log entries during peak load. Larger log files can lead to slower recovery in the event of a server crash. However, they also reduce the number of checkpoints needed and reduce disk I/O.
Evaluate the size of one hour’s worth of InnoDB log writes under operational load, then decide whether to increase the size of the InnoDB log files or not.
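One way to sketch this, with illustrative sizes, is to measure redo log volume over an hour and size the log files accordingly:

```ini
[mysqld]
# Illustrative sizing: two 2G log files = 4G total redo log capacity.
# Changing innodb_log_file_size requires a clean shutdown and restart.
innodb_log_file_size      = 2G
innodb_log_files_in_group = 2

# To measure one hour of redo log writes, sample this counter an hour
# apart under peak load and take the difference:
#   SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';
```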
3. InnoDB Log Buffer Size
A larger InnoDB log buffer size means less disk I/O for larger transactions. It is suggested to set this to 64M on all servers.
4. InnoDB Log Flush Interval
The innodb_flush_log_at_trx_commit variable controls when flushing of the log buffer to disk occurs. innodb_flush_log_at_trx_commit = 1 (default) flushes the log buffer to disk at each transaction commit. This is the safest, but also the least performant.
innodb_flush_log_at_trx_commit = 0 flushes the log buffer to disk every second, but not on transaction commit. Up to one second of transactions (possibly more due to process scheduling) could be lost if either MariaDB or the server crashes. This is the fastest, but least safe option.
innodb_flush_log_at_trx_commit = 2 writes the log buffer out to the file on each commit, but flushes it to disk only once per second. If the disk cache has a battery backup (for instance, a battery-backed RAID controller cache), this is generally the best balance of performance and safety. A crash of MariaDB alone should not lose data; a server crash or power outage could lose up to a second of transactions (possibly more due to process scheduling), though a battery-backed cache reduces this possibility.
It is suggested to use the first option for safety.
5. InnoDB IO Capacity
innodb_io_capacity should be set to approximately the maximum number of IOPS the underlying storage can handle.
This was set to 400, and has been increased to 1000 in the new configuration. It is suggested to benchmark the storage to determine whether this value can be increased further.
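The log and I/O settings from tips 3-5 might look like the following in /etc/my.cnf (values taken from the suggestions above; benchmark before raising innodb_io_capacity further):

```ini
[mysqld]
# Tip 3: a larger log buffer means fewer flushes for large transactions
innodb_log_buffer_size = 64M

# Tip 4: 1 = flush the log buffer at every commit (safest option)
innodb_flush_log_at_trx_commit = 1

# Tip 5: roughly the IOPS the underlying storage can sustain
innodb_io_capacity = 1000
```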
6. Thread Cache Size
It is suggested to monitor the value of Threads_created. If it continues increasing at more than a few threads per minute, increase the value of thread_cache_size.
The thread cache size is set to 200 in the new configuration.
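A sketch of the setting and how to monitor it:

```ini
[mysqld]
thread_cache_size = 200

# Watch the rate of thread creation; if this counter climbs by more
# than a few per minute, raise thread_cache_size:
#   SHOW GLOBAL STATUS LIKE 'Threads_created';
```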
7. Table Cache, Table Definition Cache
The table_open_cache and table_definition_cache variables control the number of tables and definitions to keep open for all threads.
Monitor Open_tables, Open_table_definitions, Opened_tables, and Opened_table_definitions to determine the best value. The general suggestion is to set table_open_cache (and subsequently table_definition_cache) only high enough to reduce the rate of increase of the Opened_tables (and Opened_table_definitions, respectively) status value.
Both table open cache and definition cache have been set to 2048 in the new configuration.
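The settings and the related status counters, as a sketch:

```ini
[mysqld]
table_open_cache       = 2048
table_definition_cache = 2048

# Currently open vs cumulative opens since startup:
#   SHOW GLOBAL STATUS LIKE 'Open_tables';
#   SHOW GLOBAL STATUS LIKE 'Opened_tables';
#   SHOW GLOBAL STATUS LIKE 'Open_table_definitions';
#   SHOW GLOBAL STATUS LIKE 'Opened_table_definitions';
# If the Opened_* counters keep climbing steadily, raise the caches.
```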
8. Query Cache
Generally, if the query cache hit rate is below 50%, investigate whether having the query cache enabled provides any performance benefit at all. The query cache adds overhead to every query.
The query cache is currently disabled. Due to the nature of the application and the ratio of writes to reads, it is unlikely the query cache will offer any performance improvements and could negatively impact performance.
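Disabling it fully means setting both the type and the size, for example:

```ini
[mysqld]
# Disable the query cache entirely (write-heavy workload)
query_cache_type = 0
query_cache_size = 0
```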
9. Temporary Tables, tmp_table_size, & max_heap_table_size
MySQL uses the lower of max_heap_table_size and tmp_table_size to limit the size of temporary tables in memory. These are per client variables. While having this value large can help reduce the number of temporary tables created on disk, it also raises the risk of reaching the server's memory capacity since this is per client. Generally 32M to 64M is the suggested value to begin with for both variables and tune as needed.
Temporary tables are often used for GROUP BY, ORDER BY, DISTINCT, UNION, sub queries, etc. Ideally, MySQL should create these in memory, and as few on disk as possible.
It is important to note that queries not using joins appropriately and creating large temporary tables can be a cause for higher number of temporary tables on disk. Another reason is the memory storage engine uses fixed length columns and assumes worst case scenario. If columns are not sized correctly (for example, a VARCHAR(255) for a short string), this influences the size of the table in memory and can cause it to go to disk earlier than it should. Also, temporary tables with blob and text columns will immediately go to disk as the memory storage engine does not support them.
Both have been set to 64M in the new configuration.
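A sketch of the configuration, with the counters that show how many temporary tables spill to disk:

```ini
[mysqld]
# MariaDB uses the lower of these two per-client limits
tmp_table_size      = 64M
max_heap_table_size = 64M

# Ratio of on-disk to in-memory temporary tables:
#   SHOW GLOBAL STATUS LIKE 'Created_tmp_tables';
#   SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables';
```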
10. Warning Log Level
It is suggested to set this to log_warnings = 2. Doing so logs information about aborted connections and access-denied errors.
11. Max Connections
Determine an appropriate value for max_connections and change it. A recommended starting value is 500; adjust up or down as needed by monitoring the Max_used_connections status variable.
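Tips 10 and 11 together might look like this in /etc/my.cnf:

```ini
[mysqld]
# Tip 10: log aborted connections and access-denied errors
log_warnings = 2

# Tip 11: starting point; tune against Max_used_connections
#   SHOW GLOBAL STATUS LIKE 'Max_used_connections';
max_connections = 500
```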
12. Transaction Isolation
It is suggested to investigate the available transaction isolation levels, and determine the best transaction isolation for this server’s use case.
13. Binary Log Format
It is recommended to use ROW binary log format for master-master replication.
14. Auto Increment Offsets
To help reduce the chances of collision between two masters being written to simultaneously, the auto_increment_increment and auto_increment_offset values need to be adjusted accordingly.
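For an assumed two-master setup (values are illustrative), tips 13 and 14 could be sketched as:

```ini
# Master 1:
[mysqld]
binlog_format            = ROW
auto_increment_increment = 2   # step by the number of masters
auto_increment_offset    = 1   # unique per master

# Master 2 would use the same increment with auto_increment_offset = 2,
# so the two servers never generate the same auto-increment value.
```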
15. Sync Binlog
By default, flushing of the binlog to disk is handled by the OS. In the event of a server crash, it is possible to lose transactions from the binary log, leading to replication being out of sync. Setting sync_binlog = 1 causes the binlog file to be flushed on every commit.
This is slower, but the safest option.
16. Crash Safe(r) Slaves
To help avoid replication errors after a slave crash, enable relay log recovery (relay_log_recovery) and the syncing of the relay log and relay log info files to disk.
17. Log Slave Updates
To have chained replication (master -> slave -> slave), log_slave_updates needs to be enabled. This tells a slave to write replicated transactions to its own binary log, so that they can then be replicated to slaves downstream of it.
18. Read Only Slaves
Slaves should be read only to avoid data accidentally being written to them.
Note: Users with the SUPER privilege can still write when the server is read only.
19. Slave Net Timeout
The slave_net_timeout variable is the number of seconds the slave will wait for a packet from the master before trying to reconnect. The default is 3600 (1 hour). This means if the link goes down and isn’t detected, it could be up to an hour before the slave reconnects. This could lead to the slave suddenly being up to an hour behind the master.
It is suggested to set slave_net_timeout to a more reasonable value such as 30 or 60.
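Pulled together, the replication safety settings from tips 15-19 might look like this on a slave (illustrative values):

```ini
[mysqld]
# Tip 15: flush the binary log on every commit
sync_binlog = 1

# Tip 16: crash-safe(r) slave settings
relay_log_recovery  = 1
sync_relay_log      = 1
sync_relay_log_info = 1

# Tip 17: write replicated transactions to this slave's own binlog
log_slave_updates = 1

# Tip 18: prevent accidental writes (SUPER users can still write)
read_only = 1

# Tip 19: reconnect sooner after a silently dropped master link
slave_net_timeout = 60
```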
20. Learn More
Join our webinar on Thursday, November 10 at 10am PST and 10am CET on last minute preparations for peak traffic periods like Black Friday and Cyber Monday.
Register now for the 10am PST webinar
Register now for the 10am CET webinar