Table of Contents
ToggleSometimes, tables or result sets contain duplicate records. While in most cases, duplicates are allowed, there are situations where it is necessary to prevent duplicate records and remove them from a database table.
Handling duplicates in an SQL database becomes necessary to prevent the following consequences −
This chapter will describe how to prevent the occurrence of duplicate records in a table and how to remove the already existing duplicate records.
To prevent the entry of duplicate records into a table, we can define a PRIMARY KEY or a UNIQUE Index on the relevant fields. These database constraints ensure that each entry in the specified column or set of columns is unique.
As we have not defined any constraints on the table, duplicate records can be inserted into it. To prevent such cases, add a PRIMARY KEY constraint on relevant fields (say LAST_NAME and FIRST_NAME together) −
Alternatively, we can use the INSERT IGNORE statement to insert records without generating an error for duplicates as shown below −
As you can see below, the table will only consist of a single record (ignoring the duplicate value).
| FIRST_NAME | LAST_NAME | SEX |
|---|---|---|
| Thomas | Jay | NULL |
Or, use the REPLACE statement to replace duplicates as shown in the following query −
The table will contain the following records −
| FIRST_NAME | LAST_NAME | SEX |
|---|---|---|
| Kumar | Ajay | NULL |
| Thomas | Jay | NULL |
The choice between the INSERT IGNORE and REPLACE statements should be made based on the desired duplicate-handling behaviour. The INSERT IGNORE statement retains the first set of duplicate records and discards any subsequent duplicates. Conversely, the REPLACE statement preserves the last set of duplicates and erases any earlier ones.
Another way to enforce uniqueness in a table is by adding a UNIQUE constraint rather than a PRIMARY KEY constraint −
To count and identify duplicate records based on specific columns, we can use the COUNT function and GROUP BY clause.
Following is the query to count duplicate records with FIRST_NAME and LAST_NAME in the BUYERS −
This query will return a list of all the duplicate records in the PERSON_TABLE table. To identify sets of values that are duplicated, follow the steps given below −
We can use the DISTINCT keyword along with the SELECT statement to retrieve unique records from a table.
Alternatively, you can include a GROUP BY clause specifying the columns you are selecting to eliminate duplicates −
