Duplicate Content Detection helps prevent Vault users from accidentally uploading the same file multiple times. This feature runs a duplicate content check each time a user creates a new document by uploading a file.

About the Checksum Field

When an Admin enables Duplicate Content Detection, Vault calculates a checksum for the source file (not the viewable rendition) of every existing document and stores the value in each document’s Checksum field.

A checksum is a value calculated from the data in the source file. Files with identical content produce the same checksum, so Vault uses this value to identify the content of each source file.
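To illustrate how a checksum identifies file content, here is a minimal Python sketch. The hash algorithm (MD5 from the standard library) and the function name content_checksum are assumptions chosen for illustration; this article does not specify how Vault calculates the value internally.

```python
import hashlib
import io


def content_checksum(stream, chunk_size=65536):
    """Return a hex checksum for a binary stream.

    MD5 is an assumption made for this sketch; the article does not
    state which algorithm Vault uses.
    """
    digest = hashlib.md5()
    # Read in chunks so large source files do not need to fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Two uploads with identical bytes yield the same checksum,
# while different content yields a different one.
first = content_checksum(io.BytesIO(b"quarterly report"))
second = content_checksum(io.BytesIO(b"quarterly report"))
other = content_checksum(io.BytesIO(b"quarterly report v2"))
print(first == second, first == other)
```

For a file on disk, the same function could be called as content_checksum(open(path, "rb")).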

The Checksum field is available on all documents and can be used in document reports.

About Automatic Duplicate Detection

When you upload a new file to create a document, Vault compares the checksum of the uploaded file with the checksum of the latest version of each document in the Vault. The Save button may be momentarily disabled while Vault completes this process. If no duplicates are detected, document upload continues.

If there are duplicates, Vault indicates the total number found and lists up to five duplicates to which you have access. Vault also notifies you when there are duplicates that you do not have access to view.

You can view the duplicates and either continue uploading or cancel the upload and use an existing document.
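The comparison described above can be sketched as follows. This is an illustrative model only, not Vault's implementation: the Doc class, its fields, and the find_duplicates function are hypothetical names, and the five-item limit is taken from the description above.

```python
from dataclasses import dataclass

MAX_LISTED = 5  # the article states that up to five duplicates are listed


@dataclass
class Doc:
    """Hypothetical stand-in for a document's latest version."""
    name: str
    checksum: str   # checksum of the latest version's source file
    viewable: bool  # whether the uploading user can view this document


def find_duplicates(upload_checksum, docs):
    """Compare an uploaded file's checksum with each document's checksum."""
    matches = [d for d in docs if d.checksum == upload_checksum]
    accessible = [d for d in matches if d.viewable]
    return {
        "total": len(matches),                      # all duplicates found
        "listed": accessible[:MAX_LISTED],          # at most five, viewable only
        "hidden": len(matches) - len(accessible),   # duplicates the user cannot view
    }


docs = [
    Doc("SOP-001", "abc", True),
    Doc("SOP-002", "abc", False),
    Doc("SOP-003", "xyz", True),
]
result = find_duplicates("abc", docs)
print(result["total"], [d.name for d in result["listed"]], result["hidden"])
```

An empty result (total of 0) corresponds to the case where the upload simply continues.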

Triggers for Automatic Duplicate Detection

Vault automatically checks for duplicate files in the following scenarios:

  • Creating a new document by uploading a file
  • Uploading a file for a content placeholder

Duplicate detection does not run automatically when you upload new files for a document using the Create Draft, Upload New Version, or Check In actions. However, Vault does update the document’s Checksum value every time you upload a file.

About the One-Click Duplicate Content Report

Vault includes the one-click Duplicate Content Report, which you can access from each document’s All Actions menu on the Doc Info page and in the Library. To view this report, you must have the View Document permission for the document.

About Duplicate Detection in Document Inbox

From Document Inbox, you can select multiple documents and check them all for duplicate source files in a single action. After selecting documents, choose Detect Duplicates from the actions menu. Vault refreshes the page and displays a red thumbnail for the documents with duplicates, as well as a “Duplicates detected” message. By holding your cursor over the message, you can see the documents that share the same source file.

How to Find Duplicates Across Documents

You can use Vault’s reporting capabilities to find documents with duplicate content in your Vault. To do this, create a report with the following parameters:

  • Report Type: Document
  • Report Format: Tabular
  • Grouping: Use Checksum field to group all documents that share the same Checksum value

Reports only return documents that the report viewer can access. If your organization needs to find all duplicates, an Admin with access to all documents should run this report. However, non-Admin users can still use this report to locate duplicates among the documents they can access.
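If you work with the report results outside of Vault, the same grouping can be reproduced in a few lines. The sketch below assumes the report rows are already available as dictionaries with hypothetical "name" and "checksum" keys; these keys are illustrative and do not reflect an actual Vault export format.

```python
from collections import defaultdict


def group_by_checksum(rows):
    """Group document names by checksum and keep only groups with duplicates."""
    groups = defaultdict(list)
    for row in rows:
        checksum = row.get("checksum")
        if checksum:  # skip rows with no checksum value
            groups[checksum].append(row["name"])
    # A checksum shared by two or more documents indicates duplicate content.
    return {c: names for c, names in groups.items() if len(names) > 1}


rows = [
    {"name": "Protocol A", "checksum": "abc"},
    {"name": "Protocol A (copy)", "checksum": "abc"},
    {"name": "Label B", "checksum": "xyz"},
    {"name": "Placeholder C", "checksum": ""},
]
print(group_by_checksum(rows))
```

As with the report itself, the output only covers documents present in the input rows, so it reflects what the report viewer could access.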