APPENDIX C - MD5 Hash Generation
  • 26 Nov 2024

During import, every original file is assigned an MD5 hash value, which Discovery Manager uses to identify duplicates. The table below lists the data used to generate the MD5 hash for each document type. In addition to the email metadata properties listed in the table, the following normalization is applied when creating an email MD5 hash:

  • Milliseconds are removed from all time values.

  • Recipients are sorted by email address alphanumerically.

  • Display Names are not used.

  • Attachments are sorted by filename alphanumerically.

  • All whitespace, hard line returns, and non-alphanumeric characters are removed from the email body, leaving only letters and numbers.

  • Whitespace, hard line returns, and non-alphanumeric characters are not removed from the email subject.
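
The normalization rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the field order, the separators, and the UTF-8 encoding are hypothetical, so the resulting hash is not expected to match the value Discovery Manager computes. It also assumes that "letters and numbers" covers any character Python treats as alphanumeric and that sorting is case-sensitive.

```python
import hashlib
from datetime import datetime
from email.utils import parseaddr


def address_only(recipient: str) -> str:
    """Drop any display name, keeping just the email address."""
    return parseaddr(recipient)[1]


def normalize_body(body: str) -> str:
    """Remove whitespace, line returns, and punctuation; keep letters and digits."""
    return "".join(ch for ch in body if ch.isalnum())


def email_md5(date_sent, sender, recipients, subject, body, attachments):
    """Hash the normalized fields. `attachments` is a list of (filename, size) pairs.

    The serialization below (order, separators, encoding) is an assumption.
    """
    fields = [
        date_sent.replace(microsecond=0).isoformat(),           # milliseconds removed
        address_only(sender),                                    # display name not used
        ";".join(sorted(address_only(r) for r in recipients)),   # sorted by address
        subject,                                                 # subject left untouched
        normalize_body(body),                                    # letters and digits only
        ";".join(
            f"{name}:{size}"
            for name, size in sorted(attachments, key=lambda a: a[0])  # sorted by filename
        ),
    ]
    return hashlib.md5("\n".join(fields).encode("utf-8")).hexdigest()


first = email_md5(
    datetime(2024, 11, 26, 9, 30, 15, 123000),
    "Alice Smith <alice@example.com>",
    ["bob@example.com", "Carol <carol@example.com>"],
    "Q4 report",
    "Hi all,\r\n\r\nPlease see the attached report.",
    [("report.pdf", 1024), ("notes.txt", 12)],
)
second = email_md5(
    datetime(2024, 11, 26, 9, 30, 15, 987000),
    "alice@example.com",
    ["carol@example.com", "Bob <bob@example.com>"],
    "Q4 report",
    "Hi all, Please see the attached report.",
    [("notes.txt", 12), ("report.pdf", 1024)],
)
print(first == second)  # True
```

The two sample messages differ only in milliseconds, display names, recipient and attachment order, and body whitespace, so they hash to the same value. Changing the spacing of the subject would produce a different hash, because the subject is not normalized.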

MD5 Generation by Document Type

| Document Type | Values Used To Generate MD5 Hash |
| --- | --- |
| Efiles (Including Efile Attachments) | Generated on the bit stream of the file |
| Outlook Items¹ | Date Sent, Sender Email Address, Recipient Email Addresses, Subject, Body, Attachment Names, Attachment Size |
| Lotus Notes Items: Memo, Reply, Notice | From, DateSent, SendTo, CopyTo, BlindCopyTo, Attachment Name ($FILE), Subject, Body |
| Lotus Notes Items: Appointment | Subject, Chair, STARTDATETIME, Location, EndDateTime, RequiredAttendees, RepeatDates, OptionalAttendees, FYIAttendees, Attachment Name ($FILE), Body |
| Lotus Notes Items: Task | Subject, DateSent, STARTDATETIME, DueDateTime, Principal, AssignedTo, OptionalAssignedTo, FYIAssignedTo, Body, AttachmentName ($FILE) |
| Lotus Notes Items: Non Delivery Report | Subject, IntendedRecipient, FailureReason, From, DateSent, SendTo, CopyTo, BlindCopyTo, Attachment Name ($FILE), OriginalSubject, Body |
| Lotus Notes Items: Delivery Report, Return Receipt | DateSent, Subject, IntendedRecipient, From, AttachmentName ($FILE), OriginalSubject, Body, SendTo, CopyTo, BlindCopyTo |
| Lotus Notes Items: Unrecognized Forms | All properties except UNID |
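
For efiles, hashing the bit stream means the hash depends only on the bytes of the file. A minimal Python sketch of that idea follows; the function name `file_md5` and the chunk size are illustrative choices, not part of Discovery Manager.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the MD5 hex digest of a file's raw bytes, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because only the content is hashed, two copies of a file with different names or locations produce the same value, while a single changed byte produces a different one.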

Note

The fields above can be adjusted in Project Settings, as shown below. Removing fields identifies more duplicates, but it also produces more false positives.

Email Dedupe Fields
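
The trade-off described in the note can be illustrated with a small Python sketch. The field names, the `dedupe_key` helper, and the way values are joined are all hypothetical; the point is only that dropping a field from the hash input makes messages that differ in that field collapse to the same key.

```python
import hashlib


def dedupe_key(email: dict, fields: list[str]) -> str:
    """Build an MD5 key from the selected fields only (illustrative serialization)."""
    joined = "\n".join(str(email[name]) for name in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


original = {
    "date_sent": "2024-11-26T09:30:15",
    "sender": "alice@example.com",
    "subject": "Status",
    "body": "Allgood",
}
# Same message sent again a day later: only the sent date differs.
resend = dict(original, date_sent="2024-11-27T14:00:00")

all_fields = ["date_sent", "sender", "subject", "body"]
without_date = ["sender", "subject", "body"]

print(dedupe_key(original, all_fields) == dedupe_key(resend, all_fields))      # False
print(dedupe_key(original, without_date) == dedupe_key(resend, without_date))  # True
```

With the sent date excluded, the second message is treated as a duplicate of the first. Whether that counts as a correct match or a false positive depends on the review requirements of the project.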

