Load Data - Third-Party Production
  • 19 Nov 2024


Third-party productions are generally delivered with the following elements:

  • A delimited load file containing document control numbers and metadata.

  • Extracted text or optical character recognition (OCR) searchable versions of the produced documents.

  • Native files, if the production specification calls for them.

  • Image files of document pages with a load file to link with the control numbers in the document metadata load file.

This material is loaded using the Reveal Review Manager.

Loading Third-Party Production Data

  1. Log in to Review Manager with an Administrator account.

  2. If the project into which the production is to be loaded is not yet created, create the project.
     64 - 04 - Create new project in Review Manager

  3. Open S3 Browser and create or open the project folder within your bucket.
     08 - 00 - S3 Bucket new folder for upload

  4. Upload the production data to a sub-folder of the S3 project folder clearly labeled with the production name.
     08 - 00a - S3 folder upload

  5. Open the delimited load file (usually in Concordance DAT format, though this may vary) to determine which fields it includes.
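As a rough aid for this last step, the field names can be read from the first line of the load file. The sketch below assumes the default Concordance delimiters (ASCII 254 around each value, ASCII 020 between fields). The `header_fields` helper and the sample header are hypothetical and are not part of Review Manager.

```python
# Default Concordance delimiters (an assumption; other profiles use other characters).
FIELD_SEP = chr(20)    # ASCII 020: separates fields
QUALIFIER = chr(254)   # ASCII 254 (thorn): wraps each field value


def header_fields(header_line: str) -> list[str]:
    """Return the field names found in the first line of a Concordance-style DAT file."""
    # Drop a leading byte order mark and the trailing line ending, if present.
    line = header_line.lstrip("\ufeff").rstrip("\r\n")
    return [name.strip(QUALIFIER) for name in line.split(FIELD_SEP)]


# Hypothetical header line; a real one would come from the first line of the DAT file.
sample = "þBEGDOCþ\x14þENDDOCþ\x14þBEGATTþ\x14þCustodianþ"
for name in header_fields(sample):
    print(name)
```

Comparing this list against the project's existing fields shows which fields still need to be created or mapped.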

Mapping Fields

  1. Expand the Project Setup pane in Review Manager.

  2. Open Fields.

  3. Use New Field to create any load file field that has no match in the project database, assigning the appropriate data type. Note that new fields may also be imported from a load file.  08 - 00b - Add field

    1. Field Table Name - The SQL column name. Cannot include spaces or dashes; underscores are permitted.

    2. Field Display Name - The name displayed to the user in Reveal. May include spaces.

    3. Field Data Type – The field types are:

      1. Boolean - True/False or Yes/No values

      2. Date - Multiple date formats supported (Example: DD/MM/YYYY)

      3. Date\Time - Date and Time combined (Example: DD/MM/YYYY HH:MM:SS)

      4. Float - Number with decimal point and unlimited precision after decimal

      5. Integer - Whole numbers, no decimal

      6. Numeric - Number with decimal point to 17 digits

      7. Text - Alphanumeric values

      8. Time - Time (Example: HH:MM:SS)

    4. Field Maximum length - The size of the field. Any data beyond a field’s maximum length is truncated during data loading.

    5. Is Document Number Field – Indicates whether the field is used as a key field when loading images or updating data during Production Update.

    6. Is Searchable – Indicates whether the field should be indexed for searching.

    7. Is Updatable - Indicates whether the field should permit values to be modified. This attribute should be reserved largely for custom fields and never set for metadata.

    8. Is MultiValue Field – Indicates that the field may contain multiple values per document record. This setting allows the field to be parsed.

    9. Is Transcript Lookup - Designates the field to be a transcript reference field and be one of the selections in the Document Link Lookup By dropdown menu in a Transcript Pane. The user will be able to query this field and retrieve documents using a unique document ID.

    10. Click Add Field when done.

    11. Fields that do not currently exist in a project may also be imported from a load file using the Import Fields button in the Add new custom import field dialog box.  08 - 01 - Import new field from datafile

      1. Select Import Profile specifies the delimiter type used in the load file.

      2. Select Import Data File retrieves the load file whose field specification will be parsed. Any new field names discovered will be added to the Reveal project.

  4. In the Project Setup pane of Review Manager, open Import Mappings.  08 - 02 - Import mappings

  5. Specify the project (if not already open) in the Select Case field. The list of available fields will display in the table below.

  6. Select Import Profile to specify the delimiter set used in the load file; Concordance is the default.

  7. Mapping selects a named mapping profile if one is already defined; if not, continue with the specification and Save the Mapping name when complete.

  8. Use Select Import Data File to browse to the load file to be imported.

  9. There are two checkbox options as you prepare to map the table:

    1. Display unmatched fields only in selection dropdown is checked by default; when manually mapping or editing, it limits the choices to fields not yet selected in the table.

    2. Hiding fields that are not imported optionally unclutters the table.

  10. Required fields are in yellow. Required fields are: BatchID, ItemID and BEGDOC.

  11. Unmapped fields are indicated with the word --Select-- in the Load File Field name.

  12. To have Reveal Review Manager try to match load file fields automatically, select the Match All Fields button at the top right of the table. A prompt appears asking you to confirm that you wish to map the fields automatically.

    1. Load File Field values will appear next to Table Name fields

    2. The matches may or may not be correct - verify everything.

    3. Any unmatched fields from the load file will be available in a dropdown list when you click --Select-- in the Load File Field name for any unmatched Table Name field. 08 - 03 - Select import field

  13. These load file fields can be manually mapped by selecting from the dropdown list for any Table Name field, or may require custom fields to be created for them.

  14. To create a new custom field, you must go back to the Fields Tab.

  15. Uncheck the green checkmarks to deselect any fields that you do not want to import.

  16. Click the Save button to save the field mapping profile, giving it a name that corresponds to your database. Click the Save As button if you wish to save a different mapping under a new profile name.

  17. Additional options are to Save As Database Template, Export XLSX and Export CSV.

  18. Certain fields may be flagged as <Manually entered> or <Automatically generated> rather than matched or left as --Select--.

    1. Fields such as ItemID and Parent_ItemID will normally be assigned by Reveal using internally sequential record numbers for consistency, referencing BEGDOC and BEGATT (Begin Attach) to signify new item records and item records sharing a parent document. These will be set as <Automatically generated> under Load File Field.

      Note

      Parent_ItemID is generated by referencing BEGATT and is used to populate document families. BEGATT must be mapped and the load file must have BEGATT populated for every record in the load file. If BEGATT is not populated for every record in the load file, the import/overlay will fail.

    2. BatchID is populated in Review as a label that identifies the import. The value is typically the volume information received from the producing party (e.g., VOL001). It is <Manually entered> at import rather than mapped from a field in the load file. See also Importing Documents->COMMON ERRORS below.

    3. Upon clicking Import Data, a prompt box will pop up:  08 - 05 - Manually entered import fields

  19. Once this step is complete, barring the need to configure any custom Text Sets, you are ready to import documents.
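When many custom fields have to be created, it can help to check proposed Field Table Name values against the naming rule given above (no spaces or dashes; underscores permitted). The following is a minimal sketch of that rule only. Both helper names are hypothetical, and the project database may impose further naming constraints that are not modeled here.

```python
import re


def is_valid_table_name(name: str) -> bool:
    """True if the name is non-empty and contains no whitespace or dashes."""
    return bool(name) and not re.search(r"[\s-]", name)


def suggest_table_name(load_file_field: str) -> str:
    """Hypothetical helper: replace each run of spaces/dashes with one underscore."""
    return re.sub(r"[\s-]+", "_", load_file_field.strip())


# Hypothetical load file field names.
for field in ["Date Sent", "File-Name", "BEGDOC"]:
    print(f"{field!r}: valid={is_valid_table_name(field)}, "
          f"suggested={suggest_table_name(field)!r}")
```

The original load file field name can still be used as the Field Display Name, since display names may include spaces.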

Note

For performance reasons there is a hard limit of 16MB expanded text size for indexing documents in the Native / HTML text set. While settings in Review Manager may be set to indicate a larger limit, any document exceeding the 16MB limit will not index and an error will appear in the indexing log. We strongly recommend contacting Reveal Support if encountering this limitation.

Note

The native and text file sizes differ from the expanded file sizes. The expanded file size is the size of the text set created.

Import Documents

  1. Expand the Import pane in Review Manager.

  2. Open Documents to specify, test and import.  08 - 04 - Import Documents-1

    1. Project - Select the project name from the drop-down list if not already set.

    2. Specify the Action: use Import Data for the initial dataset load (see Updating - Overlays for adding information with Update Data).

    3. Import Profile specifies the delimiter set used. The default is Concordance, which uses ASCII 254 (the 'thorn' character) as the qualifier enclosing field contents, ASCII 020 as the field separator, ASCII 174 (the 'registered' symbol) for an in-field new line, and a hard return to start a new record. A new Import Profile using different characters may be defined and specified.

    4. Mapping as configured and saved in Import Mappings above.

    5. Import Data File as referenced in Import Mappings.

    6. Use Data File Path - Checkbox specifying:

      1. Absolute Path Imports - When importing documents using a load file with fully qualified paths, ensure that Use Data File Path is checked. The Source Directory field is greyed out because source directory will not need to be specified, since Review Manager will instead rely on absolute pathing to locate the corresponding file on the system or network.

        Pathing Scenarios | Subfolder Prefix               | NativePath                             | TextPath
        Absolute path     | Drive letter (local or mapped) | C:\Client\Matter\Vol01\Native\0001.pdf | C:\Client\Matter\Vol01\TEXT\0001.txt
        Absolute path     | UNC (\\)                       | \\Client\Matter\Vol01\Native\0001.pdf  | \\Client\Matter\Vol01\TEXT\0001.

      2. Relative Path Imports - When importing documents using a load file with partial paths, ensure that Use Data File Path is unchecked. Review Manager will make the necessary adjustments so that all relative path scenarios are treated the same. The Source Directory will need to be specified. Review Manager supports the pathing structures shown in the table below.

        Pathing Scenarios | Subfolder Prefix   | NativePath        | TextPath
        Relative path     | Nothing            | Native\0001.pdf   | TEXT\0001.txt
        Relative path     | Dot backslash (.\) | .\Native\0001.pdf | .\TEXT\0001.txt
        Relative path     | Backslash (\)      | \Native\0001.pdf  | \TEXT\0001.txt

         08 - 06 - Relative Path Import

    7. Set Language to match the data, for example English for productions from the United States. The selection, which the dialog notes as controlling date format, initially opens as Afrikaans because that is the first choice alphabetically. The following languages are supported:

      Afrikaans

      Albanian

      Amharic

      Arabic

      Armenian

      Azerbaijani

      Basque

      Belarusian

      Bengali

      Bosnian

      Bulgarian

      Cebuano

      Chinese (Simplified)

      Chinese (Traditional)

      Corsican

      Croatian

      Czech

      Danish

      Dutch

      English

      English (United Kingdom)

      Esperanto

      Estonian

      Finnish

      French

      Frisian

      Galician

      Georgian

      German

      Greek

      Gujarati

      Haitian Creole

      Hawaiian

      Hindi

      Hmong

      Hungarian

      Icelandic

      Igbo

      Indonesian

      Irish

      Italian

      Japanese

      Kannada

      Kazakh

      Khmer

      Korean

      Kurdish

      Kyrgyz

      Lao

      Latin

      Latvian

      Lithuanian

      Luxembourgish

      Macedonian

      Malay

      Malayalam

      Maltese

      Maori

      Marathi

      Mongolian

      Myanmar (Burmese)

      Nepali

      Norwegian

      Nyanja (Chichewa)

      Pashto

      Persian

      Polish

      Portuguese (Brazilian)

      Punjabi

      Romanian

      Russian

      Samoan

      Scots Gaelic

      Serbian

      Shona

      Sindhi

      Sinhala (Sinhalese)

      Slovak

      Slovenian

      Somali

      Spanish

      Sundanese

      Swahili

      Swedish

      Tagalog (Filipino)

      Tajik

      Tamil

      Telugu

      Thai

      Turkish

      Ukrainian

      Urdu

      Vietnamese

      Welsh

      Xhosa

      Yiddish

      Zulu

    8. Administrators are strongly advised to run Test mode only first to turn up any errors in the mapping or import setup before they are written to the project database.

    9. When importing, checking Null bad dates and log will handle incorrectly-formatted dates by loading null values and logging the document ID and issue for analysis and remediation. Here are examples of acceptable common formats:

      •  Date format - MM/dd/yyyy, MM-dd-yyyy, yyyy-MM-dd, dd-MM-yyyy or yyyyMMdd
      •  Time format - hh:mm:ss tt [AM|PM], h:mm tt [AM|PM], hh:mm:ss, or hh:mm
      •  DateTime format - yyyy'-'MM'-'dd'T'HH':'mm':'ss (for example, 2024-05-28T11:35:00)

      Note

      Any DateTime field value shorter than 10 characters must be in YYYYMMDD format; otherwise it is set to null. See Troubleshooting Data Import for details on date parsing during import.

    10. Click Import Data.
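Before importing, date values can be pre-screened against the date formats listed in step 9. The sketch below is an illustration only: it uses Python's `strptime` rather than Review Manager's own parser, so results may differ at the edges (for example, `strptime` accepts months and days without leading zeros). The function names, the `DateSent` field, and the sample rows are hypothetical.

```python
from datetime import datetime

# strptime equivalents of MM/dd/yyyy, MM-dd-yyyy, yyyy-MM-dd, dd-MM-yyyy, yyyyMMdd.
DATE_FORMATS = ["%m/%d/%Y", "%m-%d-%Y", "%Y-%m-%d", "%d-%m-%Y", "%Y%m%d"]


def matches_known_date_format(value: str) -> bool:
    """True if the value parses under at least one of the listed date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def find_bad_dates(rows, date_field, key_field="BEGDOC"):
    """Return (document ID, value) pairs for non-empty dates matching no format."""
    bad = []
    for row in rows:
        value = row.get(date_field, "").strip()
        if value and not matches_known_date_format(value):
            bad.append((row.get(key_field, ""), value))
    return bad


# Hypothetical records, already parsed from a load file into dictionaries.
rows = [
    {"BEGDOC": "ABC001", "DateSent": "05/28/2024"},
    {"BEGDOC": "ABC002", "DateSent": "28 May 2024"},
    {"BEGDOC": "ABC003", "DateSent": ""},
]
print(find_bad_dates(rows, "DateSent"))
```

A value such as 01-02-2024 fits both the month-first and day-first patterns, so passing this check says nothing about which interpretation the import will apply.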

Note

File Encoding – Review Manager attempts to detect the encoding of a DAT file when it is imported. The detection library reads the first 1 KB of the file; if it finds no detection characters or byte order mark (BOM), it falls back to UTF-8, the default file format.
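To see in advance how a load file is likely to be treated, its first kilobyte can be inspected for a byte order mark. This sketch covers only the BOM part of the behavior described in the note. Whatever other detection characters the library looks for are not modeled, and `guess_encoding` is a hypothetical name.

```python
import codecs

# UTF-32 marks are tested first because the UTF-32 LE BOM begins with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(first_bytes: bytes) -> str:
    """Return a Python codec name based on a BOM in the first 1 KB, else 'utf-8'."""
    head = first_bytes[:1024]
    for bom, codec_name in BOMS:
        if head.startswith(bom):
            return codec_name
    return "utf-8"


# Usage with a real file (the path is hypothetical):
# with open("VOL001.dat", "rb") as handle:
#     print(guess_encoding(handle.read(1024)))
print(guess_encoding(codecs.BOM_UTF8 + b"BEGDOC"))
print(guess_encoding(b"BEGDOC"))
```

The returned codec names consume the BOM when decoding, so the first field name is not polluted by a stray marker character.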

Testing Document Import

  1. With the Document Import settings configured and Test mode only checked, click Import Data.

    1. Review Manager will sample paths referenced in your load file to ensure that they exist on disk and warn immediately if referenced files cannot be found.  08 - 07 - Test Import - broken link

  2. Once the data path is confirmed by Review Manager you will be shown all <Automatically generated> field references (such as for ItemID and Parent_ItemID) and prompted for <Manually entered> field information (such as BatchID).  08 - 08 - Test Import - manual entry field

  3. After checking a Preview load table, select OK to run the test import. If all is well, the result will look roughly like this:

    08 - 09 - Test Import - clean result

  4. You may now deselect Test mode only.

  5. At your option, you may elect to Build Email Threads and/or Build Document Folders during import; these processes may also be run separately under functions in the Create pane.

Running Document Import

  1. To run the live import, once more click Import Data with Test mode only unchecked.

  2. Once again, review the <Automatically generated> field references (such as for ItemID and Parent_ItemID) and enter the <Manually entered> field information (such as BatchID) when prompted.

  3. Check a Preview load table.

  4. Select OK to run the import.

  5. The load process will be more verbose, and should look like this:  08 - 10 - Import complete

Running Image Import

When you import images to Reveal, you need to connect to either (a) an image load file in IPRO LFP or Opticon (Concordance Image) OPT format that references single-page images, or (b) PDF files loaded as images. The image load files will likely be found in the same folder as the load file(s) used to load documents from a third-party production, with the referenced images found in subfolders.

PDFs are automatically converted to single-page JPEGs and loaded into your destination image set. If the PDFs contain searchable text, Reveal can extract this text and store it in a text set.

  1. Once indexing of your text data has started, begin loading your images. This can be done directly from the Indexes window by clicking the GoTo Load Images button at the bottom, or by expanding the Import pane and choosing the Images link. The Load Images window appears.  08 - 13 - Import images

  2. Select the Project Name from the dropdown menu.

  3. Select the Original Image Set, or create a New Set if needed. Reveal supports multi-version image loading. When you load images you will be able to use an existing set or create a new image set into which new document images will be placed.

  4. Select the Image Key Field or confirm Image Key (e.g., BEGDOC) for a new set.

  5. Set the Extracted Text Set (for PDFs only), which specifies where text extracted from a PDF will be grouped.

    08 - 14 - Import images - Extracted Text Set for PDF

  6. Select the Image Template (for PDFs only) to manage PDF conversion settings.

  7. Use the ellipsis to search for and select the Load File. This will be a file in the document imports folder having an LFP or OPT extension.

  8. Specify the Load File Type.

  9. Indicate whether Use Load File Path should be checked. If the image load file contains complete paths to the images, they will load from that location. However, if the paths in the image load file are not complete and start with the IMAGES folder, uncheck the Use Load File Path box and browse to the IMAGES folder in the Source Directory field. Make sure there is a backslash (\) at the end of the Source Directory path.

  10. Specify the Source Directory to locate the IMAGES folder if Use Load File Path is unchecked.

  11. Overwrite Existing option - Check only if you are replacing images that already exist in the database.

  12. Overwrite Annotations option - Check only if you want to overwrite existing images that include Annotations and Redactions.

  13. Select Load Images to complete the process.

  14. If needed, you have the option to View or Export the log file.
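Before loading, it can be useful to count the documents and pages an OPT file describes and to see whether its image paths are complete, which bears on the Use Load File Path choice in step 9. The sketch below assumes the commonly used Opticon column order (image key, volume, image path, document break, folder break, box break, page count). That layout is not specified in this article, and `summarize_opt` and the sample lines are hypothetical.

```python
import csv
import io
from pathlib import PureWindowsPath


def summarize_opt(opt_text: str) -> dict:
    """Count documents, page images, and absolute image paths in OPT-style text."""
    documents = pages = absolute = 0
    for row in csv.reader(io.StringIO(opt_text)):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        pages += 1  # one line per single-page image
        if row[3].strip().upper() == "Y":
            documents += 1  # "Y" marks the first page of a document
        if PureWindowsPath(row[2]).is_absolute():
            absolute += 1  # drive-letter or UNC path
    return {"documents": documents, "pages": pages, "absolute_paths": absolute}


# Hypothetical OPT content: two documents, three page images, relative paths.
sample = "\n".join([
    r"ABC0001,VOL001,IMAGES\001\ABC0001.tif,Y,,,2",
    r"ABC0002,VOL001,IMAGES\001\ABC0002.tif,,,,",
    r"ABC0003,VOL001,IMAGES\001\ABC0003.tif,Y,,,1",
])
print(summarize_opt(sample))
```

If `absolute_paths` is zero, the paths are relative and a Source Directory would need to be specified with Use Load File Path unchecked.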

Common Errors

Index Errors:

The following error message occurs when the load file is missing information: An exception occurred creating the DataTable to hold import data. The message is: Index was outside the bounds of the array.

08 - 11 - Import load file error

Situations where this might occur are:

  1. If a line in a load file is incomplete, missing fields, or missing/incorrect delimiters.

  2. If a record in a load file contains embedded carriage returns, which split it across lines.

  3. If the BegAttach field is not mapped and ParentID is set to "- Auto generated -" (which is the default) then an error will occur. This happens because ParentID is built by referencing the BegAttach field, so the system requires a properly mapped BegAttach to populate ParentID. NOTE: BegAttach values cannot be blank if using it to Auto-generate ParentID.
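The situations listed above can be screened for before import. The following sketch assumes the default Concordance delimiters and a begin-attachment column named BEGATT in the header. The `check_load_file` and `record` helpers and the sample lines are hypothetical, and the check is not a substitute for running Test mode only.

```python
FIELD_SEP = chr(20)    # ASCII 020: field separator (default Concordance profile)
QUALIFIER = chr(254)   # ASCII 254 (thorn): wraps each field value


def split_line(line: str) -> list[str]:
    """Split one load file line into unqualified field values."""
    return [v.strip(QUALIFIER) for v in line.rstrip("\r\n").split(FIELD_SEP)]


def check_load_file(lines, begatt_field="BEGATT"):
    """Return (line number, problem) pairs for short/long lines and blank BEGATT."""
    problems = []
    header = split_line(lines[0])
    begatt_idx = header.index(begatt_field) if begatt_field in header else None
    if begatt_idx is None:
        problems.append((1, f"{begatt_field} column not found"))
    for number, line in enumerate(lines[1:], start=2):
        values = split_line(line)
        if len(values) != len(header):
            # Also catches records split in two by an embedded carriage return.
            problems.append(
                (number, f"expected {len(header)} fields, found {len(values)}"))
        elif begatt_idx is not None and not values[begatt_idx].strip():
            problems.append((number, f"{begatt_field} is blank"))
    return problems


def record(*values):
    """Build a sample line using the default delimiters."""
    return FIELD_SEP.join(f"{QUALIFIER}{v}{QUALIFIER}" for v in values)


lines = [
    record("BEGDOC", "BEGATT", "Custodian"),
    record("ABC001", "ABC001", "Smith"),
    record("ABC002", "", "Smith"),   # blank BEGATT
    record("ABC003", "ABC003"),      # missing a field
]
for problem in check_load_file(lines):
    print(problem)
```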

In addition, for performance reasons there is a hard limit of 16MB expanded text size for indexing documents in the Native / HTML text set. While settings in Review Manager may be set to indicate a larger limit, any document exceeding the 16MB limit will not index and an error will appear in the indexing log. We strongly recommend contacting Reveal Support if encountering this limitation.

Note

The native and text file sizes differ from the expanded file sizes. The expanded file size is the size of the text set created.

Storage Errors:

If the load file that is being utilized contains broken paths to natives or text, store errors will occur. Store errors signify that either the referenced text or native file did not exist, or the native or text path field within the load file was not formatted properly.

08 - 12 - Import store error
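Broken paths can be located ahead of time by resolving each native or text path from the load file against the intended Source Directory. This is a minimal sketch for relative paths only. It treats the three relative prefix styles (no prefix, .\ and \) alike, as the Relative Path Imports description says Review Manager does, but it is not Review Manager's own logic. The function names and example paths are hypothetical.

```python
from pathlib import Path, PureWindowsPath


def resolve_load_path(raw_path: str, source_dir: str) -> Path:
    """Join a relative, Windows-style load file path onto a source directory."""
    parts = PureWindowsPath(raw_path.strip()).parts
    # Drop a leading "\" root so "\Native\0001.pdf" resolves like "Native\0001.pdf";
    # a leading ".\" is already collapsed by PureWindowsPath.
    parts = [p for p in parts if p not in ("\\", ".")]
    return Path(source_dir).joinpath(*parts)


def missing_files(raw_paths, source_dir):
    """Return the non-empty load file paths that do not resolve to an existing file."""
    return [
        p for p in raw_paths
        if p.strip() and not resolve_load_path(p, source_dir).is_file()
    ]


# Usage (hypothetical values taken from a NativePath or TextPath column):
# missing_files([r"Native\0001.pdf", r".\TEXT\0001.txt"], r"D:\Productions\VOL001")
print(resolve_load_path(r".\Native\0001.pdf", "production_root").parts)
```

Any path this reports as missing is a likely candidate for a store error during import.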

After Import

Once all Document imports are complete, go to Create Indexes to render them searchable.
If overlays are required to add further metadata or custom values, see Updating - Overlays.

