Overview
In Reveal, text sets represent a single file in multiple formats, each serving its own purpose during document review. Consider a .jpeg file, such as a screenshot of a text message exchange from a custodian’s phone. The native file is the source file itself: the actual .jpeg taken directly from the custodian’s phone.
In Reveal, you can view the .jpeg’s Native View (PDF) text set in Document Viewer to see the image in a near-native state, as close as possible to how the original file would look when viewed from its data source. You may also want to run a keyword search on the text messages, which becomes possible once you OCR the .jpeg to create an OCR text set. Or, you may want to filter files by the date the screenshot was captured or by custodian, which is where the Metadata text set becomes useful.
Indexing, or performing an Index job, is the process of generating a text set for a file so it can work with Reveal’s review and data visualization features. Much of Reveal’s indexing is automated as you process your data, but there may be times when you want to generate additional text sets so your data is easier to search, review, and analyze.
Text set types
Reveal’s platform has a variety of text sets. At a high level, there are three major categories of text sets with varying functions:
Rendered views allow documents to be viewed, annotated, and redacted in Document Viewer in a near-native state (that is, looking nearly the same as the source file would in its original software).
Text views are used for searching and analytics, providing Reveal with a file’s textual content. For example, Reveal can’t read a Word document directly, so it references a text set (such as OCR / Loaded) instead. You can read text views in Document Viewer, but they may not always retain the native “look” of the original document or file.
Metadata about your document is kept in its own, single text set. You can view this metadata in the Review Grid.
Text set table
The following table describes each pre-existing text set in Reveal.
Text Set Type | Description | Category |
|---|---|---|
Document_Metadata (Metadata) | The text of all field data loaded with or pulled from the dataset. | Metadata |
Native View (PDF) | A system-generated PDF representation of a document from its original file format (e.g., Word, PDF, Excel) without conversion or modification. | Rendered view |
Spreadsheet View | Content from native spreadsheets (e.g., Excel), displayed in a grid format that lets users view and interact with rows and columns. Spreadsheet View doesn’t contain searchable text; rather, it’s a link to the spreadsheet itself. | Rendered view |
OCR / Loaded | Searchable text generated in processing, usually a mixture of OCRed text and embedded text. This is the primary text set used for searching. | Text view |
Extracted | Embedded text extracted directly from the native file itself (Word documents, email messages, PowerPoint slides, Excel spreadsheets, etc.). This can supplement the OCR / Loaded text set. | Text view |
Transcription | Transcribed text from audio or video (A/V) files, representing spoken words as written content. | Text view |
Australia Native PDF | A system-generated PDF representation of a document from its original file format (e.g., Word, PDF, Excel) without conversion or modification. | Rendered view |
Australia Extracted PDF | Embedded text extracted directly from the native file itself (Word documents, email messages, PowerPoint slides, Excel spreadsheets, etc.). Australia Extracted PDFs are created when processing data using Australian numbering. | Text view |
Important
Australia Extracted PDF and Australia Native PDF function the same as Extracted text and Native View (PDF), respectively.
When Extracted text and Native View (PDF) text sets are mentioned in this knowledge base, assume the same applies to Australia Extracted PDF and Australia Native PDF.
Custom text sets
In addition to Reveal’s pre-existing text sets, you can create your own text sets to suit the needs of your data. These usually have unique names, set by the user, that reflect the purpose of the additional text set.
Common use cases are outlined below:
Text Set Type | Description | Category | Generation |
|---|---|---|---|
Translation text sets | Translated text from a document, converting content from one language to another while retaining the structure of the original. | Text view | Translation jobs |
OCR text sets | Text generated during OCR, which is helpful if you want a single text set containing only OCRed text. This can supplement the OCR / Loaded text set, which is usually a mixture of OCRed and embedded text. | Text view | OCR jobs |
PDF Sets | Additional PDF text sets of your data, beyond Native View (PDF), used during imaging for a Production job. PDF sets can be created in Review Manager, if needed. | Rendered view | Database Update jobs (Review Manager) |
Spreadsheet Sets | Additional spreadsheet text sets of your data, beyond Spreadsheet View, used during imaging for a Production job. Spreadsheet sets can be created in Review Manager, if needed. | Rendered view | Database Update jobs (Review Manager) |
Image Sets | Image text sets (pictures) of your data, used during imaging for a Production job. | Rendered view | Database Update jobs |
Text set indexing order
If multiple index jobs are performed in succession for a file (e.g., performing an index job in the Review Grid for multiple text sets), data is indexed in the following text set order:
1. The Document_Metadata text set, indexed first to prioritize key information.
2. Text view text sets, which may include OCR / Loaded, Extracted, and Transcription. OCR / Loaded is indexed first, if present, followed by the Extracted text set. This gets data into the project as quickly as possible so the documents become searchable.
3. Rendered view text sets, which may include Native View (PDF) and Spreadsheet View.
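To make the ordering concrete, here is a minimal, hypothetical Python model of the sequence described above. It is not a Reveal API and does not reflect how Reveal implements indexing. The names `CATEGORY_OF`, `CATEGORY_RANK`, `TEXT_VIEW_RANK`, and `indexing_order` are illustrative only. Two points are assumptions because this article does not specify them: where Transcription falls relative to Extracted (the sketch places it after), and the relative order of Native View (PDF) and Spreadsheet View (the sketch keeps them in the order they were requested).

```python
# Hypothetical model of the text set indexing order; not part of any Reveal API.

# Category of each pre-existing text set, taken from the text set table.
CATEGORY_OF = {
    "Document_Metadata": "Metadata",
    "OCR / Loaded": "Text view",
    "Extracted": "Text view",
    "Transcription": "Text view",
    "Native View (PDF)": "Rendered view",
    "Spreadsheet View": "Rendered view",
}

# Metadata first, then text views, then rendered views.
CATEGORY_RANK = {"Metadata": 0, "Text view": 1, "Rendered view": 2}

# Within text views, OCR / Loaded comes before Extracted. Any other text view
# (e.g., Transcription) is placed after both: an assumption, not documented.
TEXT_VIEW_RANK = {"OCR / Loaded": 0, "Extracted": 1}


def indexing_order(text_sets):
    """Return text set names in the order this model would index them.

    sorted() is stable, so text sets with equal keys keep their input order.
    """
    def key(name):
        category = CATEGORY_OF[name]
        if category == "Text view":
            within = TEXT_VIEW_RANK.get(name, len(TEXT_VIEW_RANK))
        else:
            within = 0
        return (CATEGORY_RANK[category], within)

    return sorted(text_sets, key=key)


if __name__ == "__main__":
    requested = ["Native View (PDF)", "Extracted", "Transcription",
                 "Document_Metadata", "OCR / Loaded"]
    print(indexing_order(requested))
```

Running the example prints the metadata first, then OCR / Loaded, Extracted, and Transcription, and finally Native View (PDF).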