Comments - Useful MariaDB Queries

2 years, 11 months ago Tibor Lapohos

Specifically to "Removing Duplicates":

If there are many columns in a table, spelling out the necessary conditions as required in the above example can prove to be tedious. Therfore, what is proposed is to 1) find all unique rows by using grouping on all colums except the indexed column, 2) use it as an exclusion list from the original table as rows to be kept. I am not sure about the computational cost of this proposed method, so further comments are most welcome.

As an example:

-- Create a sample table:
CREATE TABLE t (id INT, c1 VARCHAR(2), c2 VARCHAR(2), c3 VARCHAR(2));
INSERT INTO t VALUES (1,'a', 'b', 'c'), (2,'a','b','c'), (3,'a','d','c'), (4,'a','d','e'), (5,'a','d','e'), (6,'b','a','c'), (7,'e', 'b', 'c'), (8,'e','b','c'), (9,'e','d','c');

-- Display the table, if you will:
SELECT * FROM t;
+------+------+------+------+
| id   | c1   | c2   | c3   |
+------+------+------+------+
|    1 | a    | b    | c    |
|    2 | a    | b    | c    |
|    3 | a    | d    | c    |
|    4 | a    | d    | e    |
|    5 | a    | d    | e    |
|    6 | b    | a    | c    |
|    7 | e    | b    | c    |
|    8 | e    | b    | c    |
|    9 | e    | d    | c    |
+------+------+------+------+

-- To see what can be considered as unique and be kept use MIN() or MAX() by grouping on all data columns. In terms of essential content, it makes no difference max or min:
SELECT MIN(id) FROM t GROUP BY c1, c2, c3;
+---------+
| MIN(id) |
+---------+
|       1 |
|       3 |
|       4 |
|       6 |
|       7 |
|       9 |
+---------+

-- What will be removed is:
SELECT id FROM t EXCEPT SELECT MIN(id) FROM t GROUP BY c1, c2, c3;
+------+
| id   |
+------+
|    2 |
|    5 |
|    8 |
+------+

-- And finally, the overall compiled statement is:
DELETE FROM t WHERE id IN (SELECT id FROM t EXCEPT SELECT MIN(id) FROM t GROUP BY c1, c2, c3);

-- To see the outcome:
SELECT * FROM t;
+------+------+------+------+
| id   | c1   | c2   | c3   |
+------+------+------+------+
|    1 | a    | b    | c    |
|    3 | a    | d    | c    |
|    4 | a    | d    | e    |
|    6 | b    | a    | c    |
|    7 | e    | b    | c    |
|    9 | e    | d    | c    |
+------+------+------+------+
 
1 year, 11 months ago Cesar L. C.

Query for remove duplicates with "group by" is great, it's easy to understand. Thank you so much.

 
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.