Comments - Identifying and removing duplicates

6 years, 4 months ago Ian Gilfillan

There is another alternative that should work on versions before MariaDB 10.3. This works by setting a unique index on all the identical fields, and by making use of the old ALTER TABLE copying behaviour, ignoring errors when it copies the data back, basically silently dropping duplicates. This requires the old_alter_table system variable to be temporarily set. Here's a sample:

CREATE TABLE t (id INT, f1 VARCHAR(2));

INSERT INTO t VALUES (1,'a'), (2,'a'), (3,'b'), (4,'a');

SELECT * FROM t;
+------+------+
| id   | f1   |
+------+------+
|    1 | a    |
|    2 | a    |
|    3 | b    |
|    4 | a    |
+------+------+

SET SESSION old_alter_table=1;

ALTER IGNORE TABLE t ADD UNIQUE INDEX u(f1);
Query OK, 4 rows affected (0.82 sec)               
Records: 4  Duplicates: 2  Warnings: 0

SELECT * FROM t;
+------+------+
| id   | f1   |
+------+------+
|    1 | a    |
|    3 | b    |
+------+------+

SET SESSION old_alter_table=0;

With your data, you would create the unique index on all the duplicated fields, so something like:

ALTER TABLE t ADD UNIQUE INDEX u(f1, f2, f3...);
 
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.