Skip to content Skip to sidebar Skip to footer

Remove Duplicates From A Table And Re-link Referencing Rows To The New Master

I have a table transcription which contains passages of transcribed text and their citations with columns: text, transcription_id(PK), t_notes, citation and the second table town_t

Solution 1:

This single command should do it all:

WITH blacklist AS (  -- identify duplicate IDs and their master
   SELECT *
   FROM  (
      SELECT transcription_id
           , min(transcription_id) OVER (PARTITION BY text, citation) AS master_id
      FROM   transcription
      ) sub
   WHERE  transcription_id <> master_id
   )
, upd AS (  -- redirect referencing rows
   UPDATE town_transcription tt
   SET    transcription_id = b.master_id
   FROM   blacklist b
   WHERE  b.transcription_id = tt.transcription_id
   )
DELETE FROM transcription t  -- kill dupes (now without reference)
USING  blacklist b
WHERE  b.transcription_id = t.transcription_id;

For lack of definition I chose the row with the smallest ID per group as surviving master row.

FK constraints don't get in the way unless you have non-default settings. Detailed explanation:

After removing the dupes you might now want to add a UNIQUE constraint to prevent the same error from reoccurring:

ALTER TABLE transcription
ADD CONSTRAINT transcription_uni UNIQUE (text, citation);

Solution 2:

Use row_number() over(...) to identify rows that repeat information. A partition by text, citation in the over clause will force the row number series to re-start at 1 for each unique set of those values:

select
     *
from (
       select
              text, transcription_id, t_notes, citation
            , row_number() over(partition by text, citation 
                                order by transcription_id) as rn
       from transcription 
     ) d
where rn > 1

Once you have verified those as the unwanted rows,then use the same logic for a delete statement.

However, you may loose information held in the t_notes column - are you willing to do that?


Post a Comment for "Remove Duplicates From A Table And Re-link Referencing Rows To The New Master"