Comments - Galera cluster fail during dump or optimize

8 years, 3 months ago Tom Worster

Sounds like flow control. Jay Janssen has a couple of good articles on the Percona blog.

Short form: if activity on one node causes it to fall behind in applying the transactions replicated from the other nodes, its receive queue builds up. When the queue reaches a threshold, the node signals the other nodes to stop sending, and they do so by pausing new transactions until the queue drains again.
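To make the stop/resume cycle concrete, here is a toy Python model of one node's receive queue. It is a simplified sketch, not Galera's actual implementation: the class `FlowControlNode` and the function `simulate` are hypothetical names, the thresholds only borrow the names of the real `gcs.fc_limit` and `gcs.fc_factor` options, and details such as the exact comparison and any scaling of the limit by cluster size are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FlowControlNode:
    """Toy model of one node's receive queue (hypothetical, simplified)."""
    fc_limit: int = 16        # pause the cluster once the queue exceeds this
    fc_factor: float = 1.0    # resume once the queue drops below fc_limit * fc_factor
    queue: int = 0            # write-sets received but not yet applied
    paused: bool = False      # True while this node has asked others to stop
    events: List[str] = field(default_factory=list)

    def receive(self, n: int) -> None:
        # Write-sets replicated from the other nodes land in the receive queue.
        self.queue += n
        if not self.paused and self.queue > self.fc_limit:
            self.paused = True
            self.events.append(f"FC_STOP at queue={self.queue}")

    def apply(self, n: int) -> None:
        # The local applier drains the queue; a busy node applies fewer per tick.
        self.queue = max(0, self.queue - n)
        if self.paused and self.queue < self.fc_limit * self.fc_factor:
            self.paused = False
            self.events.append(f"FC_CONT at queue={self.queue}")


def simulate(node: FlowControlNode, ticks: int, incoming: int, apply_rate: int) -> int:
    """Run the model and return how many ticks the cluster spent paused."""
    paused_ticks = 0
    for _ in range(ticks):
        if node.paused:
            # While paused, the other nodes hold back new transactions.
            paused_ticks += 1
        else:
            node.receive(incoming)
        node.apply(apply_rate)
    return paused_ticks


if __name__ == "__main__":
    slow = FlowControlNode(fc_limit=16)
    print(simulate(slow, ticks=10, incoming=10, apply_rate=4), slow.events)
```

With a node that applies 4 write-sets per tick while receiving 10, the model alternates between stop and resume messages; raising `fc_limit` only delays the first pause, it does not make a slow applier keep up.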

You may be able to alleviate this by raising the threshold and/or disabling flow control (donor desync) for the duration of the slow task. Jay Janssen's articles have the details.
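One way to take a single node out of flow control for a while is the `wsrep_desync` system variable: while it is ON, the node keeps receiving replicated write-sets but does not pause the rest of the cluster, so it is allowed to fall behind. The sketch below wraps a slow task in that setting. It assumes `cursor` is any DB-API 2.0 cursor connected to the node doing the slow work and that the account may run `SET GLOBAL`; the helper name `desynced` and the table name `some_table` are hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def desynced(cursor):
    """Temporarily desync this Galera node around a slow task (sketch).

    `cursor` is assumed to be a DB-API 2.0 cursor on the node that runs
    the dump or OPTIMIZE; this helper is hypothetical, not a library API.
    """
    # Stop this node from sending flow-control pauses to the cluster.
    cursor.execute("SET GLOBAL wsrep_desync = ON")
    try:
        yield cursor
    finally:
        # Always re-sync, even if the slow task raised an exception.
        cursor.execute("SET GLOBAL wsrep_desync = OFF")
```

Usage would look like `with desynced(cur): cur.execute("OPTIMIZE TABLE some_table")`. Keeping the desync window as short as possible limits how far the node can drift behind the rest of the cluster.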

I've noticed that activity in mysqld itself, or in other processes on the same host, can have enough impact to trigger flow control.

8 years, 3 months ago gijs van der velden

Yes, that was the issue. Thank you very much :)

I've added the wsrep_provider_options suggested on Jay Janssen's blog:

wsrep_provider_options="gcs.fc_limit=500; gcs.fc_master_slave=YES; gcs.fc_factor=1.0"
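To see what those three settings mean for the pause and resume points, the sketch below splits the option string into key/value pairs and computes both thresholds: replication pauses once the receive queue exceeds `gcs.fc_limit` and resumes once it drops below `gcs.fc_limit * gcs.fc_factor`. The function names are hypothetical, and the sketch ignores `gcs.fc_master_slave`, which, as far as I understand it, controls whether Galera scales the limit with the number of nodes.

```python
from typing import Dict, Tuple


def parse_provider_options(opts: str) -> Dict[str, str]:
    """Split a "key=value; key=value" option string into a dict (hypothetical helper)."""
    result: Dict[str, str] = {}
    for item in opts.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        key, _, value = item.partition("=")
        result[key.strip()] = value.strip()
    return result


def fc_thresholds(opts: Dict[str, str]) -> Tuple[int, float]:
    """Return (pause_above, resume_below) for the receive queue.

    Both keys must be present; no defaults are assumed here because they
    differ between Galera versions.
    """
    limit = int(opts["gcs.fc_limit"])
    factor = float(opts["gcs.fc_factor"])
    return limit, limit * factor


if __name__ == "__main__":
    options = parse_provider_options(
        "gcs.fc_limit=500; gcs.fc_master_slave=YES; gcs.fc_factor=1.0"
    )
    print(fc_thresholds(options))
```

With `gcs.fc_factor=1.0` the resume point equals the pause point, so replication continues as soon as the queue falls back under 500 write-sets instead of waiting for it to drain further.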

That resolved the problem. However, it also created a new one: the job now finishes correctly, but afterwards the cluster still goes down.

Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.