Introduction
This article describes a workflow for handling Data Subject Access Requests (DSARs) in Reveal. It covers how to identify documents responsive to a Data Subject while minimizing irrelevant content, such as duplicate documents and non-inclusive emails. It also covers techniques for streamlining redaction with communication analysis, entity analysis, and regular expressions, using the Wordlist functionality together with the Document Explorer.
Workflow Process Flow
The following steps outline the process for efficiently identifying and redacting documents in the review set:
Identify the Responsive Review Set.
Automate the Redaction Process.
Redact Wordlist Hits in the Document Viewer – Native View.
Identify the Responsive Review Set
This part ensures that all relevant documents related to the Data Subject are correctly identified and linked for review.
Identify the Data Subject: In the Communications module, search for the Data Subject by their name or identifier. Click on their node and select Edit. Ensure that all email aliases associated with this individual are linked to the correct communicator.
Note
On the right-hand side of the window, you'll see all aliases currently associated with the communicator. If you search for that person on the left, any unassigned aliases will appear. You can then decide whether to assign these unassigned aliases to the communicator.
Search for the Data Subject as a Communicator: Use the Advanced Search feature, which includes search fields for From, To, Cc, and Bcc. This allows you to locate all communication records associated with the Data Subject. Here, you can also view and modify the associated aliases. Once done, click Add to Search to include these results.
Change the Operator: Change the operator from IS to NOT to exclude documents in which this communicator appears as a participant in the correspondence. Be sure to include FAMILIES.
Run a Keyword/Term List Search: Perform a Keyword/Term List Search for the Data Subject. This returns documents in which the Data Subject is mentioned in the body, even if they did not send or receive the email.
Exclude Exact Duplicates: You have the option to exclude Exact Duplicates from your results and identify the most Inclusive Email in the set. This ensures you are not reviewing redundant content.
Final Step - Create the Review Set: Once the responsive set of documents has been identified, you can either bulk tag them to a Work Folder or create a Saved Search. This will form your Review Set.
Automate the Redaction Process
In the second part, you automate the identification and redaction of sensitive information within the Review Set, streamlining the process and ensuring that all necessary data is appropriately redacted before review.
Export Communicators from the Review Set: From the Review Set Work Folder or Saved Search, navigate to the Communications module and select Export Chart.
This will generate a list of all communicators within the Review Set that need to be redacted.
Export Entities for Redaction: Entities are predefined items such as email addresses, names, phone numbers, etc. Focusing on the Review Set Work Folder or Saved Search, go to the Dashboard, select the entities you wish to redact, and then export them to XLSX format.
Prepare the Redaction Lists: Using the downloaded lists, copy the list of communicators and entities, excluding the Data Subject. Then, go to the Wordlists module in Project Admin.
Create and Assign Wordlists:
Select Add to create a new Wordlist. Provide a name and include the list of communicators and entities. If necessary, split these into multiple Wordlists. Once done, click Edit and assign appropriate teams to the Wordlist. Only the assigned teams will have visibility of these Wordlists in the Document Viewer.
Note
This use case focuses on the new Document Viewer – Native View, so adding quotes to terms (as described in the Knowledgebase Articles) to create Wordlists is not necessary. This step is only required when using an alternative text view (OCR/Loaded or Extracted) and you wish to highlight the terms there.
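The list preparation described above can be sketched outside Reveal before pasting terms into a Wordlist. This is a minimal illustration in Python, assuming the communicator and entity exports have already been read into plain lists of strings. The function name `build_wordlist_terms` and the sample values are hypothetical and not part of Reveal.

```python
def build_wordlist_terms(communicators, entities, data_subject_aliases):
    """Merge exported values, drop the Data Subject, and de-duplicate.

    All three arguments are plain lists of strings (hypothetical inputs,
    e.g. columns copied from the exported chart and XLSX files).
    """
    # Compare case-insensitively so "Jane Doe" and "jane doe" count as one term.
    excluded = {alias.strip().casefold() for alias in data_subject_aliases}
    seen = set()
    terms = []
    for raw in list(communicators) + list(entities):
        term = raw.strip()
        key = term.casefold()
        # Skip blanks, the Data Subject's own aliases, and repeats.
        if not term or key in excluded or key in seen:
            continue
        seen.add(key)
        terms.append(term)
    return terms


# Illustrative sample data only.
communicators = ["Jane Doe", "jane.doe@example.com", "John Smith", "john.smith@example.com"]
entities = ["john.smith@example.com", "Acme Ltd", " 555-0100 ", ""]
data_subject = ["Jane Doe", "jane.doe@example.com"]

terms = build_wordlist_terms(communicators, entities, data_subject)
print("\n".join(terms))
```

The printed lines can then be pasted into a new Wordlist. Excluding the Data Subject's aliases here keeps their own name and addresses from being redacted.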
Redact Wordlist Hits in the Document Viewer – Native View
Lastly, you will apply redactions to the documents in your Review Set based on the Wordlist hits, ensuring that sensitive information is properly masked before final review.
Open Document and Select Wordlist: Go to the Review Set and open a document. In the Document Explorer, select the Wordlist for redaction.
Apply Redaction Label: Choose the appropriate Redaction Label to apply to the identified text and click Redact. If you prefer a black box with no redaction text, you can create and apply a blank label.
Redact All Documents in the Review Set: Repeat the redaction process for all documents in the Review Set to ensure consistency and completeness.
Note
The more Wordlists that are created, the more times the redaction process needs to be run. To optimize efficiency, it’s recommended to combine terms for redaction into as few Wordlists as possible, provided the character count allows. This will help streamline the redaction process and improve overall performance.
Tips
Regular Expressions in Wordlists (2024.11 Version): Regular expressions are now fully supported in Wordlists. This feature allows you to create Wordlists containing regular expressions, such as those for Personally Identifiable Information (PII).
Example: US SSN Redaction:
To redact US Social Security Numbers (SSN), create a Wordlist with the following expression:
[0-9]{3}-[0-9]{2}-[0-9]{4}
Other expressions for PII include:
UK National Insurance Number (NIN):
[A-Z]{2}[0-9]{6}[A-Z]
US Phone Numbers:
(\+?(\b1)?[\ .\/-]?((?(2)|(\b))|(\())\d{3}(?(?<=\(\d{3})\)|)[\ .\/-]?)?(?(1)|\b)\d{3}[\ .\/-]?\d{4}[\ ]?([xX][\ ]?\d{1,5})?\b
Leverage Your Employee Directory: Use your employee directory to improve the accuracy of person name identification and increase recall.
Using Search in Document: Utilize the Search in Document feature to locate text within a document for redaction. This search also supports more extensive regular expressions (full support in the 2024.11 release).
For example, use the expression for US phone numbers:
(\+?(\b1)?[\ .\/-]?((?(2)|(\b))|(\())\d{3}(?(?<=\(\d{3})\)|)[\ .\/-]?)?(?(1)|\b)\d{3}[\ .\/-]?\d{4}[\ ]?([xX][\ ]?\d{1,5})?\b
Formatting Phone Numbers in Wordlists: When adding phone numbers or similar lists to a Wordlist, ensure that + signs are removed from the numbers to avoid search issues.
Wordlist Entry Limit: Avoid including more than 10,000 entries in a single Wordlist. Instead, divide the entries into multiple lists for better organization and efficiency.
Combine Expressions in One Wordlist: You can add multiple expressions to a single Wordlist. For example, combine country-specific PII, like UK PII (UK NIN, UK Phone, UK Postal Codes, etc.) in one list. This helps organize your rules and minimizes the number of clicks needed to convert to redactions.
Document Viewer Supports .NET Regular Expression Language: The new Document Viewer supports .NET regular expression syntax.
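Two of the tips above, removing + signs from phone numbers and keeping each Wordlist under 10,000 entries, can be applied with a short script before the terms are loaded. The following is a sketch in Python under the assumption that the terms are available as a list of strings. The helper names `normalize_phone` and `split_into_wordlists` are hypothetical.

```python
MAX_ENTRIES = 10_000  # per-Wordlist ceiling suggested in the tips above


def normalize_phone(entry):
    """Remove '+' signs, as the phone number formatting tip advises."""
    return entry.replace("+", "").strip()


def split_into_wordlists(entries, max_entries=MAX_ENTRIES):
    """Yield consecutive slices of at most max_entries items."""
    for start in range(0, len(entries), max_entries):
        yield entries[start:start + max_entries]


# Illustrative sample data only.
phones = ["+44 20 7946 0000", "+1 555 010 0199", "020 7946 0001"]
cleaned = [normalize_phone(p) for p in phones]

big_list = [f"term{i}" for i in range(25_000)]
chunks = list(split_into_wordlists(big_list))

print(cleaned)
print([len(c) for c in chunks])
```

Each resulting chunk would become its own Wordlist. Fewer, fuller lists keep the number of redaction passes down.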
Known Limitations
Missing Text Layer in Native View: Not all documents in Native View contain a text layer. For instance, PNG images and image-based PDF documents lack a text layer and therefore cannot be searched using the Search in Document feature or with Wordlists.
Redacting Initials: When attempting to redact initials, be aware that dots (.) are treated as wildcards by the Document Explorer, which may impact the accuracy of your redaction.
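The effect of the dot wildcard on initials can be illustrated with a small sketch. It uses Python's `re` module as a stand-in and assumes the Document Explorer treats a dot like the regular-expression "any character" token. The sample text is invented, and whether a backslash escape is honored in a given Wordlist should be verified in your environment.

```python
import re

initials = "J.D."
text = "Signed by J.D. on behalf of JADE Holdings and J-D-7."

# Unescaped: each dot matches any single character, so unrelated text is hit.
unescaped_hits = re.findall(initials, text)

# Escaped: each dot must be a literal full stop.
escaped_pattern = re.escape(initials)
escaped_hits = re.findall(escaped_pattern, text)

print(unescaped_hits)
print(escaped_hits)
```

In this sketch the unescaped term also matches "JADE" and "J-D-", while the escaped form matches only the literal initials.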
Regular Expression Library
Generic PII Regex List
(?<EMAIL>[\w\-\.]+@([\w\-]+\.)+[\w\-]{2,4})
(?<EMAIL_FIRSTPART>([^\.][\w\.\-]+)@)
(?<URL>(?<Protocol>\w+):\/\/(?<Domain>[\w@][\w.:@]+)\/?[\w\.?=%&=\-@/$,]*)
(?<IP_ADDRESS>(?<First>2[0-4]\d|25[0-5]|[01]?\d\d?)\.(?<Second>2[0-4]\d|25[0-5]|[01]?\d\d?)\.(?<Third>2[0-4]\d|25[0-5]|[01]?\d\d?)\.(?<Fourth>2[0-4]\d|25[0-5]|[01]?\d\d?))
(?<CC_AMERICANEXPRESS>3[47][0-9]{13}\b)|(\b3[47][0-9]{2}[ ]*[0-9]{6}[ ]*[0-9]{5})
(?<CC_BANKCARD>3[47][0-9]{13})
(?<CC_CHINAUNIONPAY>(62|88)[0-9]{13}[0-9]{1,4}\b)|(\b(62|88)[0-9]{2}([ ]*[0-9]{4}){3})
(?<CC_DINERSCLUB>3(?:0[0-5]|[68][0-9])[0-9]{11})
(?<CC_DINERSCLUBCARDBLANCHE>30[0-5][0-9]{11})
(?<CC_DINERSCLUBENROUTE>(2014|2149)[0-9]{11})
(?<CC_DINERSCLUBINTERNATIONAL>36[0-9]{12})
(?<CC_DINERSCLUBUNITEDSTATES_CANADA>(54|55)[0-9]{14})
(?<CC_DISCOVERCARD>(\b6011[0-9]{12})|(\b6221(26|27|28|29)[0-9]{10})|(\b622[3-8][0-9]{12})|(\b6229(20|21|22|23|24|25)[0-9]{10})|(\b64[4-9][0-9]{13})|(\b65[0-9]{14}))
(?<CC_INSTAPAYMENT>63[789][0-9]{13})
(?<CC_JCB>35((28|29)|([3-8][0-9]))[0-9]{12})
(?<CC_LASER>(6304|6706|6771|6709)[0-9]{12})
(?<CC_MAESTRO>(5018|5020|5038|5893|6304|6759|6761|6762|6763|0604)[0-9]{8,15})
(?<CC_MASTERCARD>5[1-5][0-9]{2}([ ]*[0-9]{4}){3})
(?<CC_SOLO>(((6334|6767)[0-9]{12})|((6334|6767)[0-9]{14})|((6334|6767)[0-9]{15})))
(?<CC_VISA>4[0-9]{15}\b)|(\b4[0-9]{12}\b)|(\b4[0-9]{3}([ ]*[0-9]{4}){3})
(?<IBAN>(?:(?:IT|SM)\d{2}[A-Z]\d{22}|CY\d{2}[A-Z]\d{23}|NL\d{2}[A-Z]{4}\d{10}|LV\d{2}[A-Z]{4}\d{13}|(?:BG|BH|GB|IE)\d{2}[A-Z]{4}\d{14}|GI\d{2}[A-Z]{4}\d{15}|RO\d{2}[A-Z]{4}\d{16}|KW\d{2}[A-Z]{4}\d{22}|MT\d{2}[A-Z]{4}\d{23}|NO\d{13}|(?:DK|FI|GL|FO)\d{16}|MK\d{17}|(?:AT|EE|KZ|LU|XK)\d{18}|(?:BA|HR|LI|CH|CR)\d{19}|(?:GE|DE|LT|ME|RS)\d{20}|IL\d{21}|(?:AD|CZ|ES|MD|SA)\d{22}|PT\d{23}|(?:BE|IS)\d{24}|(?:FR|MR|MC)\d{25}|(?:AL|DO|LB|PL)\d{26}|(?:AZ|HU)\d{27}|(?:GR|MU)\d{28}))
US PII Regex List
(?<SSN_US>((?!666|000)[0-8][0-9\_]{2}\-(?!00)[0-9\_]{2}\-(?!0000)[0-9\_]{4})*)
(?<PHONE_US>(\+?(\b1)?[\ .\/-]?((?(2)|(\b))|(\())\d{3}(?(?<=\(\d{3})\)|)[\ .\/-]?)?(?(1)|\b)\d{3}[\ .\/-]?\d{4}[\ ]?([xX][\ ]?\d{1,5})?\b)
UK PII Regex List
(?<NIN_UK>[A-Z]{2}[0-9]{6}[A-Z])
(?<PHONE_UK>(?:(?:\(?(?:0(?:0|11)\)?[\s-]?\(?|\+)44\)?[\s-]?(?:\(?0\)?[\s-]?)?)|(?:\(?0))(?:(?:\d{5}\)?[\s-]?\d{4,5})|(?:\d{4}\)?[\s-]?(?:\d{5}|\d{3}[\s-]?\d{3}))|(?:\d{3}\)?[\s-]?\d{3}[\s-]?\d{3,4})|(?:\d{2}\)?[\s-]?\d{4}[\s-]?\d{4}))(?:[\s-]?(?:x|ext\.?|\#)\d{3,4})?)
(?<POSTALCODE_UK>([A-Za-z][A-Ha-hJ-Yj-y]?[0-9][A-Za-z0-9]? ?[0-9][A-Za-z]{2}|[Gg][Ii][Rr] ?0[Aa]{2}))
Dutch PII Regex List
(?<NIN_NL>(?<=(sofi|bsn|burgerservicenummer|nummer).*[^\d])\d{8,9})
(?<PHONE_NL_MOBILE>((\+?|00?)(([^\S\n\t]|[^\S\n\t]?\-[^\S\n\t]?)?)31([^\S\n\t]|[^\S\n\t]?\-[^\S\n\t]?)?(\(0\)|0?)?|\(?0)([^\S\n\t]|[^\S\n\t]?\-[^\S\n\t]?)?6\)?([^\S\n\t]|[^\S\n\t]?\-[^\S\n\t]?)?([0-9][^\S\n\t]?){8}\b)
(?<PHONE_NL_LANDLINE>((\+?|00?)(([^\S\n\t]|[^\S\n\t]?\-[^\S\n\t]?)?)31([^\S\n\t]|[^\S\n\t]?\-[^\S\n\t]?)?(\(0\)|0?)?|0|[^\S\n\t]?\(0)(([^\S\n\t]|[^\S\n\t]?\-[^\S\n\t]?)?)((10|111|113|114|115|117|118|13|14|15|161|162|164|165|166|167|168|172|174|180|181|182|183|184|186|187|20|222|223|224|226|227|228|229|23|24|251|252|255|26|294|297|299|30|313|314|315|316|317|318|320|321|33|341|342|343|344|345|346|347|348|35|36|38|40|411|412|413|416|418|43|44|45|46|475|478|481|485|486|487|488|492|493|495|497|499|50|511|512|513|514|515|516|517|518|519|521|522|523|524|525|527|528|529|53|541|543|544|545|546|547|548|55|561|562|566|570|571|572|573|575|577|578|58|591|592|593|594|595|596|597|598|599|70|71|72|73|74|75|76|77|78|79)|\((10|111|113|114|115|117|118|13|14|15|161|162|164|165|166|167|168|172|174|180|181|182|183|184|186|187|20|222|223|224|226|227|228|229|23|24|251|252|255|26|294|297|299|30|313|314|315|316|317|318|320|321|33|341|342|343|344|345|346|347|348|35|36|38|40|411|412|413|416|418|43|44|45|46|475|478|481|485|486|487|488|492|493|495|497|499|50|511|512|513|514|515|516|517|518|519|521|522|523|524|525|527|528|529|53|541|543|544|545|546|547|548|55|561|562|566|570|571|572|573|575|577|578|58|591|592|593|594|595|596|597|598|599|70|71|72|73|74|75|76|77|78|79)\))\)?([^\S\n\t]|[^\S\n\t]?\-[^\S\n\t]?)?([0-9][^\S\n\t]?){6,7}\b)
Australian PII Regex List
(?<PHONE_AUS>(\({0,1}((0|\+61)(2|4|3|7|8)){0,1}\){0,1}(\ |-){0,1}[0-9]{2}(\ |-){0,1}[0-9]{2}(\ |-){0,1}[0-9]{1}(\ |-){0,1}[0-9]{3}))
(?<CRN_AUS>[0-9]{3}\s[0-9]{3}\s[0-9]{3}[A-Z])
(?<WWC_AUS_NSW>WWC[0-9]{7}[EV])
(?<TFN_AUS>(?:Tfn|tfn|TFN|Tax[ \t]*File[ \t]*Number)[ \t]*[:;]?[ \t]*(\d{3}[ \t]?\d{3}[ \t]?\d{2,3}))
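Before loading expressions from the library lists above into a Wordlist, it can help to check them against sample text outside Reveal. The sketch below uses Python's `re` module as a rough stand-in for the .NET engine, which is an assumption with limits. Python writes named groups as `(?P<NAME>...)`, so the hypothetical helper `dotnet_to_python` rewrites them. Expressions that rely on .NET-only constructs, such as the lookbehind conditional in PHONE_US, may not compile in Python and should be tested in a .NET-compatible tool instead. The sample sentence is invented.

```python
import re


def dotnet_to_python(pattern):
    """Rewrite .NET-style (?<NAME>...) groups as Python's (?P<NAME>...).

    Naive on purpose: lookbehinds, (?<= and (?<!, are left untouched.
    """
    return re.sub(r"\(\?<(?![=!])", "(?P<", pattern)


# A few expressions copied from the library lists above.
LIBRARY = {
    "EMAIL": r"(?<EMAIL>[\w\-\.]+@([\w\-]+\.)+[\w\-]{2,4})",
    "NIN_UK": r"(?<NIN_UK>[A-Z]{2}[0-9]{6}[A-Z])",
    "POSTALCODE_UK": r"(?<POSTALCODE_UK>([A-Za-z][A-Ha-hJ-Yj-y]?[0-9][A-Za-z0-9]? ?[0-9][A-Za-z]{2}|[Gg][Ii][Rr] ?0[Aa]{2}))",
    "WWC_AUS_NSW": r"(?<WWC_AUS_NSW>WWC[0-9]{7}[EV])",
}


def find_hits(text):
    """Return, per expression name, the text matched by its named group."""
    hits = {}
    for name, pattern in LIBRARY.items():
        compiled = re.compile(dotnet_to_python(pattern))
        hits[name] = [m.group(name) for m in compiled.finditer(text)]
    return hits


# Invented sample text for illustration.
sample = ("Contact a.person@example.com, NI number QQ123456C, "
          "postcode SW1A 1AA, check WWC1234567E.")

hits = find_hits(sample)
for name, found in hits.items():
    print(name, found)
```

A check like this only shows that an expression behaves as expected on known samples. Results in the Document Viewer may differ because the engines are not identical.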