Replication stops after one hour or so
We operate a big MariaDB server (24 CPUs, 48 GB RAM, version 5.5.32 with Percona XtraDB). Because it is mission-critical, it is replicated to another server of similar size (row-based replication). We get the ball rolling with "innobackupex" (GREAT tool to clone a live server). After setting the slave to its coordinates it starts up, catches up to the master (time behind = 0) and all is well. For an hour. Or four. Or two days. Then the slave suddenly stops executing the master's binlog.
There are no errors logged. I checked the binlog and relay log on master and slave around the last log entry processed and can't find anything strange. I'm at a total loss. I can't say that I'm a MySQL or MariaDB specialist, so I'd like to reach out to this community to get some idea of what's happening. Anybody got an idea? Thanks in advance!
Martin Trenz, Soquero GmbH, Frankfurt, Germany
Answer
Answered by Martin Trenz in this comment.
The problem stemmed from a table with 390000 records and no primary key that is replicated row-based. InnoDB creates an invisible primary key (six bytes, basically the row number), but this key is not replicated. Therefore every DELETE leads to a full table scan on the slave, because the slave has no key with which to locate the row. The deletes ran inside a transaction, so nothing became visible until it finished, which gave the impression of an absolute standstill.
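
To look for other tables that could trigger the same behaviour, one option is to ask information_schema for base tables that have no PRIMARY KEY constraint. The following is only a minimal sketch: the function name, the QUERY constant and the list of excluded system schemas are my own assumptions, and the cursor is expected to be any Python DB-API 2.0 cursor connected to the master.

```python
# Sketch: list base tables without a PRIMARY KEY constraint.
# QUERY and tables_without_primary_key are hypothetical names chosen for
# illustration; the excluded schema list is an assumption, adjust as needed.
QUERY = """
SELECT t.table_schema, t.table_name
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema NOT IN
      ('mysql', 'information_schema', 'performance_schema')
  AND c.constraint_name IS NULL
ORDER BY t.table_schema, t.table_name
"""


def tables_without_primary_key(cursor):
    """Return (schema, table) pairs for tables that lack a primary key.

    `cursor` is assumed to follow the DB-API 2.0 interface
    (execute / fetchall).
    """
    cursor.execute(QUERY)
    return [(schema, table) for schema, table in cursor.fetchall()]
```

The SQL inside QUERY can also be pasted straight into the mysql client. Any table it reports is a candidate for an explicit primary key before it is replicated row-based.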