Deadlock with foreign key constraints

We are using MariaDB 10.1 and facing some weird deadlock issues. While investigating we ran a sample program to test the foreign keys and found a deadlock error. Can someone please explain this in bit detail so we understand how locking works in galera.

We used a 2 node cluster for this testing.

Session 1 (Node1):
create table dl1(pk int primary key, data varchar(100));
create table dl2(pk int primary key, pk1 int not null, constraint dl2_fk foreign key(pk1) 
  references dl1(pk), data varchar(100));
set autocommit=off;
insert into dl1 values(1, 'a');

Session 2 (Node2):
set autocommit=off;

insert into dl2(pk, pk1, data) values(10, 1, 'a0');

ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails 
  (`demoApp`.`dl2`, CONSTRAINT `dl2_fk` FOREIGN KEY (`pk1`) REFERENCES `dl1` (`pk`)) -- as expected

select * from dl1;

Empty set (0.00 sec) -- as expected
Session 1 (Node1):
Session 2 (Node2):
select * from dl1;

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction --- strange
Once I got this error I ran the 'SHOW ENGINE INNODB STATUS\G' on node2 and here is the truncated output.
2016-05-04 17:46:34 7ff98ead8b00 Transaction:

TRANSACTION 3852, ACTIVE 0 sec inserting, thread declared inside InnoDB 5000

mysql tables in use 1, locked 1

3 lock struct(s), heap size 1184, 1 row lock(s), undo log entries 1 --------------------> (LOOK HERE)

MySQL thread id 2507, OS thread handle 0x7ff98ead8b00, query id 73514 localhost root update

insert into dl2(pk, pk1, data) values(10, 1, 'a0')

Foreign key constraint fails for table `test`.`dl2`:


CONSTRAINT `dl2_fk` FOREIGN KEY (`pk1`) REFERENCES `dl1` (`pk`)

Trying to add in child table, in index `dl2_fk` tuple:

DATA TUPLE: 2 fields;

0: len 4; hex 80000001; asc ;;

1: len 4; hex 8000000a; asc ;;

But in parent table `test`.`dl1`, in index `PRIMARY`,

the closest match we can find is record:

PHYSICAL RECORD: n_fields 1; compact format; info bits 0

0: len 8; hex 696e66696d756d00; asc infimum ;;




0 queries inside InnoDB, 0 queries in queue

0 read views open inside InnoDB

1 RW transactions active inside InnoDB ------------------> (LOOK HERE)

0 RO transactions active inside InnoDB

1 out of 1000 descriptors used

Main thread process no. 1242, id 140706194630400, state: sleeping

Number of rows inserted 0, updated 0, deleted 0, read 0

0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s

Number of system rows inserted 0, updated 0, deleted 0, read 0

0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s

Answer Answered by Jan Lindström in this comment.


Firstly, lets run above transactions on single node InnoDB without any Galera. First insert-statement will take row locks to avoid phantom inserts to dl1. This means that second insert-statement to dl2 has to wait for that lock. This is because that insert is dependent on parent row in dl1. Thus, on single node InnoDB above transactions fail on second insert with lock wait timeout. Remember ACID rules, transactions are run on isolation, so they do not know each other.

Now lets consider Galera case. When node1 issues commit, the insert will be executed also on node2. This is problematic as transaction on node2 is still open (yes we did roll back the statement containing the insert but not whole transaction yet). This also means that all row locks are maintained. New select statement requires also shared locks to maintain repeatable read isolation level so it will compete with insert statement from node1. But, insert from node1 should commit and may not wait for locks so select in node2 is aborted. Now naturally the error is kind of misleading, but better than nothing.


Comments loading...
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.