e.discovery

de-duplication technology

The Exact Duplicate Problem

Research suggests that 30% to 50% or more of the documents in an electronic collection are exact duplicates. If they are not identified and removed, duplicate documents significantly increase e.discovery processing costs and legal review time.

The Exact Duplicate Solution

Just as DNA and fingerprints uniquely identify a person, every electronic document has a fingerprint of its own. We can use that fingerprint to identify electronic documents that are exact duplicates of each other.

This is achieved by generating an MD5 hash (the file's electronic fingerprint) for each individual file and comparing the hashes to find those that match exactly. Where two or more files share the same hash, one copy is kept and the others can be safely removed from the collection, significantly reducing the volume of documents that must be processed and reviewed.
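
The hash-and-compare step described above can be sketched in a few lines of Python. This is a minimal illustration only, not the e.law implementation: the function names `md5_fingerprint` and `split_exact_duplicates` are hypothetical, and the rule of keeping the first copy encountered is an assumption made for the example.

```python
import hashlib


def md5_fingerprint(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file's content, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def split_exact_duplicates(paths):
    """Partition paths into (kept, duplicates) by MD5 fingerprint.

    Assumption for this sketch: the first file seen with a given
    fingerprint is kept, and later files with the same fingerprint
    are reported as duplicates of it.
    """
    first_seen = {}   # fingerprint -> first path with that fingerprint
    kept = []
    duplicates = []   # (duplicate path, path of the kept copy)
    for path in paths:
        fingerprint = md5_fingerprint(path)
        if fingerprint in first_seen:
            duplicates.append((path, first_seen[fingerprint]))
        else:
            first_seen[fingerprint] = path
            kept.append(path)
    return kept, duplicates
```

Because the fingerprint is calculated from the file's content alone, two files with different names but identical bytes are treated as exact duplicates, while a difference of even a single byte produces a different fingerprint.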

Exact de-duplication is a standard part of the e.law e.discovery workflow. It does not, however, address the problem of near duplicates.
