Dataset Archive and Restore Options
  • 07 Nov 2024


Archive and Restore a Dataset (v7.1 and Later)

Important

This section pertains specifically to Brainspace version 7.1 and later.

Overview of Archiving and Restoring Datasets

  • Brainspace 7.1 now supports archiving and restoring datasets directly through the application’s user interface, eliminating the need for the Linux command line.

  • Unlike previous versions, Brainspace 7.1 handles build directories whose names contain spaces and special characters by converting those characters to underscores.

  • As with previous versions of Brainspace, restoring a dataset will delete any existing work product and previous files within that dataset.

  • Brainspace 7.1 supports restoring datasets originally archived in Brainspace version 6.6 and later.
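To make the renaming rule above concrete, here is a minimal shell sketch. It assumes that any character other than ASCII letters, digits, periods, underscores, and hyphens counts as "special"; the exact set that Brainspace 7.1 converts is not documented here, and sanitize_build_dir_name is a hypothetical helper, not part of Brainspace.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the renaming rule described above.
# ASSUMPTION: "special" means any character other than ASCII letters,
# digits, '.', '_' and '-'. Brainspace's real character set may differ.
sanitize_build_dir_name() {
  local name="$1"
  # ${var//pattern/repl} replaces every match; [^...] matches any one
  # character that is NOT in the allowed set.
  printf '%s\n' "${name//[^A-Za-z0-9._-]/_}"
}
```

Under this assumption, a build directory named Matter #42: Smith & Co would become Matter__42__Smith___Co, because each space, '#', ':' and '&' is replaced individually.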

Archiving and restoring datasets must be performed by Brainspace Administrators.

If you are archiving and restoring a dataset for the first time, test the process on a sample dataset before deleting any archived datasets from Brainspace.

Important

After a dataset is deleted from Brainspace, it cannot be recovered if it has not been archived correctly.

Below are the high-level steps for archiving and restoring a dataset through the Brainspace user interface within the same instance.

  1. Disable the dataset.

  2. Archive the dataset.

  3. Restore the dataset.

Disable a Dataset

The first step in the archive process is to disable a dataset by changing its status from Active to Inactive.

Important

Ensure that no background work-product processes (e.g., CMML or Focus) are running before disabling a dataset.

  1. In the Brainspace user interface, click Administration. The Datasets screen will open.

  2. Locate the dataset in the list, and then click the Change Dataset… status icon. A confirmation dialog will open.

  3. Click the Disable button.

The Datasets screen will refresh, and the dataset status will change from Active to Inactive, which indicates that the dataset has been disabled.

Archive a Dataset

After a dataset has been disabled, it is ready to be archived.

  1. Locate the dataset in the list, and then click the Archive Dataset icon. A confirmation dialog will open.

  2. Click the Confirm button.

  3. The Datasets list will be displayed, and the status of the dataset being archived will change to Archiving.

  4. When the archiving process is completed, the status of the dataset will return to Inactive.

Note

The archiving process may take several minutes, depending on the size of the dataset. To see the updated status of the dataset, refresh the page.

After a dataset has been archived successfully, it can be deleted from the Brainspace user interface to free up disk space. Remember that a dataset deleted from Brainspace without being correctly archived cannot be recovered.

Restore a Dataset from an Instance

The Same Instance

After a dataset has been archived, it can be restored into Brainspace at any time.

Note

If the archive is stored on a separate or remote storage volume, the directory must be mounted or linked within /data/brainspace/archive, or the archive .tar.gz file must be copied to /data/brainspace/archive.
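If the archive lives on another volume, the link-or-copy choice in the note above can be scripted. The following is a sketch only: stage_archive is a hypothetical helper, ARCHIVE_DIR defaults to the /data/brainspace/archive location named in the note, and the script does not change file ownership or permissions, which you may still need to adjust so that Brainspace can read the file.

```shell
#!/usr/bin/env bash
# Sketch: make an archive stored on another volume visible under the
# archive directory. stage_archive is a hypothetical helper; ARCHIVE_DIR
# defaults to the location named in the note above.
ARCHIVE_DIR="${ARCHIVE_DIR:-/data/brainspace/archive}"

stage_archive() {
  local src="$1" mode="${2:-link}" abs
  # Only accept an existing .tar.gz file.
  if [[ ! -f "$src" || "$src" != *.tar.gz ]]; then
    echo "not a .tar.gz archive: $src" >&2
    return 1
  fi
  mkdir -p "$ARCHIVE_DIR" || return 1
  case "$mode" in
    link)
      # Use an absolute target so the link resolves from inside ARCHIVE_DIR.
      abs="$(cd "$(dirname "$src")" && pwd -P)/$(basename "$src")"
      ln -s "$abs" "$ARCHIVE_DIR/"
      ;;
    copy)
      cp -- "$src" "$ARCHIVE_DIR/"
      ;;
    *)
      echo "mode must be 'link' or 'copy'" >&2
      return 1
      ;;
  esac
}

# Example (hypothetical mount point and file name):
#   stage_archive /mnt/remote/MyDataset.tar.gz link
```

A symbolic link avoids duplicating a large archive on the local volume, while a copy keeps the restore working even if the remote volume is later unmounted.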

To restore a dataset:

  1. In the Brainspace user interface, click Administration. The Datasets screen will open.

  2. Click the Add Dataset button. The New Dataset dialog will open.

  3. In the New Dataset dialog, type in a name for the dataset.

  4. Click the Create button.

  5. Click the Choose Connector button, scroll down the list, and then click Restore Archived Dataset. The Restore Archived Dataset dialog will be displayed.

  6. Click the Browse button.

  7. In the Select file dialog, click on the desired archive to restore.

  8. Click the Select File button. The Restore Archived Dataset dialog will be displayed.

  9. Click the Restore Dataset button. The License Checks dialog will be displayed.

  10. Click the Proceed button.

The Datasets list will be displayed, and the status of the restored dataset will be Copying.

Note

After a dataset is restored, it will not be available to select in the Restore Archived Dataset Select file dialog because it has already been restored.

The restore process may take several minutes, depending on the size of the dataset, and the archived dataset can take an extended period of time to uncompress. To see an updated status of the dataset, refresh the page.

A Different Instance

Brainspace 7.1 supports restoring datasets from Brainspace 6.6.x and later. Restoring datasets from a different instance or from Brainspace 6.6.x–6.8.x requires additional steps to reconfigure the Connector and remap dataset fields.  

Note

If your archive is stored on a separate or remote storage volume, the directory must be mounted or linked within /data/brainspace, or the archive .tar.gz file must be copied to /data/brainspace.

  1. Restore the dataset by following the steps in The Same Instance (above).

  2. In the Datasets list, find the dataset you restored.

  3. Click on the Download Reports icon for the dataset you restored.

  4. In the list of reports displayed, scroll down to the report named Schema XML.

  5. Click the Download button for the Schema XML report.

After the dataset is restored, set up the connector by deleting the existing connector and then reconfiguring it.

  1. In the Datasets list, click on the Settings icon for the restored dataset.

  2. In the Settings dialog, click the trash icon next to the data source to delete it.

  3. In the confirmation dialog displayed, click Yes, Remove.

  4. In the Settings dialog, click Choose Connector.

  5. In the Choose Connector dialog, click on the Connector used for the dataset.

  6. In the Connector Dialog, click on the Source to select it.

  7. Click Save and Proceed to close the dialog.

  8. In the next dialog, click Proceed.

  9. The License Checks dialog will be displayed.

    Note

    The License Checks dialog indicates that documents will be added to the total, but the total will not change.

  10. Click Proceed.

  11. The Field Mapping dialog will be displayed.

  12. Using the Schema XML report as a reference, remap the fields to recreate the original dataset mappings that existed before the dataset was archived, and then click the Continue button.

  13. In the confirmation dialog, click Confirm.

  14. The Dataset Settings dialog will be displayed.

  15. Choose whether to Build the dataset now or Save it.

Datasets with Connected Tags

After reconfiguring the connector, reconnect any Connected Tags in the restored dataset.

  1. In the Datasets list, find the restored dataset and click the Tag Management icon.

  2. In the Manage Tags modal, click the Connect Tags button.

  3. The Connect Tags modal will be displayed.

    Important

    Users should avoid deleting tags they plan to reconnect.

  4. In the Connect Tags modal, click the check boxes to select the desired tags to reconnect.

  5. Click Connect.

  6. The Manage Tags modal will be displayed again.

  7. Click the Close button.

Known Issues Restoring to a Different Instance

  1. Saved searches will not preserve the username and will display By deleted user.

  2. Notebook Created by and Last modified by usernames will not be preserved.

  3. CMML sessions created from a portable model will show the portable model's name from its original source, but the model may not appear in the new environment's list of portable models.

  4. CMML sessions will display the message “More documents have been added to the dataset you are analyzing”, as well as the Update Classifier button.

    Note

    The message is displayed because data was moved, not because additional documents have been added.

    1. Click the Update Classifier button.

    2. The Update Classifier dialog will be displayed.

    3. Click the Score Documents button.


Archive and Import a Dataset (v6.2-v7.0)

Important

This section pertains specifically to Brainspace versions 6.2 to 7.0.

Overview

Beginning with Brainspace v6.2, Brainspace Administrators can archive datasets to free up disk space and then import archived datasets and their associated work product back into Brainspace at a later date.

Note

If you are using Brainspace v6.0 through Brainspace v6.1.6, you must upgrade to Brainspace v6.2 to use the Archive and Import feature. This feature is not currently supported in environments that use a separate Postgres server.

Archiving and importing datasets must be performed by Brainspace Administrators who have a basic understanding of Linux file management and command-line access to the Brainspace servers’ operating systems and database.

If this is your first attempt to archive and import a dataset, we recommend that you archive and then import a test dataset before deleting datasets from Brainspace after they have been archived. After a dataset is deleted from Brainspace, it cannot be recovered if it has not been archived correctly.

The archive and import process involves the following high-level steps:

  1. Disable a dataset in the Brainspace user interface.

  2. Archive a dataset in the command-line interface.

    Note

    The archive/import script does not handle build directories with spaces in their names, like those generated in Discovery 5.5 and older, and will require updates to the database. Please contact Brainspace Support before attempting to archive these datasets.

  3. Import a dataset in the Brainspace user interface and command-line interface.

  4. Activate a dataset in the Brainspace user interface.

  5. Remap a dataset's fields.

Disable a Dataset

The first step in the archive process is to disable a dataset by changing its status from Active to Inactive.

To disable a dataset:

  1. In the Brainspace user interface, click Administration. The Datasets screen will open.

  2. Locate the dataset in the list, and then click the Change Dataset… status icon. A confirmation dialog will open.

  3. Click the Disable button.

The Datasets screen will refresh, and the dataset's status will change from Active to Inactive, which indicates that the dataset has been disabled.

Archive a Dataset

After disabling a dataset, you are ready to archive it.

Note

The archive/import script does not handle build directories with spaces in their names, like those generated in Discovery 5.5 and older, and will require updates to the database. Please contact Brainspace Support before attempting to archive these datasets.

To archive a dataset:

  1. As the root user in the command-line interface, run:
    /var/lib/brains/scripts/archive-brainspace-dataset.sh --archive

  2. Type your brsarchive user password. A list of datasets available in your environment will appear.

    Note

    If a user password was not previously created, the password entered will be used to create the user. Please remember this password.

  3. Type the dataset’s ID number, and then press Enter on your keyboard.

  4. Type the archive directory path (e.g., /data/brainspace/archive) where you would like to keep the archive, and then press Enter on your keyboard. After the script runs through the archive process and compresses the archive directory, the following confirmation message will display:

    [2019-04-24 10:53:09] - INFO - Archive Completed. Please find all files located in /opsshared_data/apollo-data/brainspace/archive/TestingScript-04-24-2019_1040/TestingScript-bf73bd4a-ce7b-4daf-bae0-500880b4434c.tar.gz

    [2019-04-24 10:53:09] - INFO -

    [2019-04-24 10:53:09] - INFO - the checksum of the file is /opsshared_data/apollo-data/brainspace/archive/TestingScript-04-24-2019_1040/TestingScript-bf73bd4a-ce7b-4daf-bae0-500880b4434c.shazam

    [2019-04-24 10:53:09] - INFO - To complete the process manually remove the files located in /opsshared_data/apollo-data/brainspace/archive/TestingScript-04-24-2019_1040/TestingScript

    [2019-04-24 10:53:09] - INFO - And return to the Brainspace user interface

  5. Navigate to the archive location identified in the confirmation message, and then verify that the dataset has been archived successfully.

    Note

    Copy and store the full path of the archive .tar.gz file for future reference (e.g., /brainspace/archive/TestingScript-04-24-2019_1040/TestingScript-bf73bd4a-ce7b-4daf-bae0-500880b4434c.tar.gz). You will need this path when running the import script.
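Step 5 can be partly automated if you saved the confirmation message from step 4 to a text file (for example, by copying it from the terminal). The sketch below, with hypothetical function names, extracts the .tar.gz path from the Archive Completed line and checks that the gzip stream and tar index are readable. It does not compare against the checksum file that the script reports, because that file's format is not described here.

```shell
#!/usr/bin/env bash
# Sketch with hypothetical function names. Assumes the confirmation
# message from step 4 was saved to a text file.

# Print the .tar.gz path from the last "Archive Completed" line in the file.
archive_path_from_log() {
  sed -n 's/.*Archive Completed\. Please find all files located in \(.*\.tar\.gz\).*/\1/p' "$1" | tail -n 1
}

# Check that the archive exists, is non-empty, and can be read end to end.
verify_archive() {
  local archive="$1"
  [[ -s "$archive" ]] || { echo "missing or empty: $archive" >&2; return 1; }
  gzip -t "$archive" || return 1                # gzip stream is intact
  tar -tzf "$archive" > /dev/null || return 1   # tar index is readable
  echo "OK: $archive"
}

# Example (hypothetical file name):
#   verify_archive "$(archive_path_from_log archive-confirmation.txt)"
```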

After successfully archiving a dataset, you can delete it from the Brainspace user interface to free up disk space; however, a dataset that is deleted from Brainspace without being correctly archived cannot be recovered.

Import a Dataset

After archiving a dataset, you can import it into Brainspace at any time.

Note

If your archive is stored on a separate or remote storage volume, the directory must be mounted or linked within /data/brainspace, or the archive .tar.gz file must be copied to /data/brainspace.

To import a dataset:

  1. As the root user in the command-line interface, run:

    /var/lib/brains/scripts/import-brainspace-dataset.sh --expand-compressed-archive

  2. Type your brsarchive user password. A list of datasets available in your environment will appear.

    Note

    If a user password was not previously created, the password entered will be used to create the user. Please remember this password.

  3. Type the path of the archived .tar.gz file as noted during the archive process, and then press Enter on your keyboard. You will be prompted to create a new dataset using the provided path ending in /data, as shown in the following example:

    [2019-04-24 11:08:32] - INFO - Please use the following path in Brainspace UI to Load From Disk in newly created Dataset.

    [2019-04-24 11:08:32] - INFO - /brainspace/archive/TestingScript-04-24-2019_1040/04-24-2019_1104/TestingScript/data

    Note

    The archived dataset can take an extended period of time to uncompress.

  4. Create a dataset:

    Important

    When creating a new dataset, do not click the Build button during the process described below.

    1. In the Brainspace user interface, click Administration. The Datasets screen will open.

    2. Click the Add Dataset button. The New Dataset dialog will open.

    3. In the New Dataset dialog, type a dataset name, and then toggle switches in the Dataset Groups pane to add the new dataset to one or more groups.

    4. Click the Create button.

    5. Click the Choose Connector button, scroll to the bottom of the list, and then click Load Existing Dataset.

    6. Type the path for the archived build folder (the path ending in /data provided by the import script in step 3).

    7. Click the Create Dataset button.

    8. Close the window and wait for the dataset to enable.

    9. Disable the dataset (see Disable a Dataset).

  5. After creating and disabling the new dataset, return to the command-line interface, and then choose either Option 1 or Option 2.

  6. When prompted, type the brsarchive user password, and then press Enter on your keyboard.

  7. Type the new dataset ID, and then press Enter on your keyboard. The import process will begin.

After the import process has completed, you can remove the uncompressed directory and enable the dataset in the Brainspace user interface.
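Both the archive script (which asks you to remove its working files manually) and the import process leave an uncompressed directory behind. A cautious way to remove it is sketched below. remove_uncompressed_dir and ARCHIVE_ROOT are hypothetical names, and ARCHIVE_ROOT should be set to wherever you keep archives (the example log above uses /brainspace/archive). The guard refuses paths outside that root and any directory that still contains a .tar.gz archive, so the compressed archive you need for future imports is not deleted by mistake.

```shell
#!/usr/bin/env bash
# Sketch: delete an uncompressed working directory only if it is inside
# ARCHIVE_ROOT and holds no .tar.gz archive. Names are hypothetical.
ARCHIVE_ROOT="${ARCHIVE_ROOT:-/data/brainspace/archive}"

remove_uncompressed_dir() {
  local target root
  [[ -n "$1" && -n "$ARCHIVE_ROOT" ]] || { echo "path and ARCHIVE_ROOT are required" >&2; return 1; }
  # Resolve both paths (following symlinks) so the prefix check is reliable.
  target="$(cd -- "$1" 2>/dev/null && pwd -P)" || { echo "not a directory: $1" >&2; return 1; }
  root="$(cd -- "$ARCHIVE_ROOT" 2>/dev/null && pwd -P)" || { echo "archive root not found: $ARCHIVE_ROOT" >&2; return 1; }
  # The target must be strictly below the archive root, never the root itself.
  if [[ "$target" != "$root"/* ]]; then
    echo "refusing to delete outside $root: $target" >&2
    return 1
  fi
  # Keep any directory that still holds a compressed archive.
  if [[ -n "$(find "$target" -name '*.tar.gz' -print -quit)" ]]; then
    echo "refusing: $target still contains a .tar.gz archive" >&2
    return 1
  fi
  rm -rf -- "$target"
}
```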

Activate a Dataset

After importing an archived dataset, you must enable the dataset to use it in Brainspace.

To enable a dataset:

  1. In the Brainspace user interface, click Administration. The Datasets screen will open.

  2. Locate the dataset in the list, and then click the Change Dataset… status icon. A confirmation dialog will open.

  3. Click the Enable button.

The Datasets screen will refresh, and the dataset’s status will change from Inactive to Active. After activating the dataset, you must remap the dataset's fields.

Remap Dataset Fields

After activating the dataset, you must remap the dataset's fields.

To remap the dataset's fields:

  1. Download the dataset's Schema XML report as described in Dataset Reports.

  2. Navigate to the Field Mapping dialog:

    1. In the user drop-down menu, click Administration. The Datasets screen will open.

    2. In the Datasets screen, locate the dataset, and then click the Settings icon.

      The Dataset Settings dialog will open.

    3. Click the Reconfigure Data Source icon.

      The dataset configuration dialog will open.

    4. Click the Proceed button.

      The License Checks dialog will open.

    5. Click the Proceed button.

      The Field Mapping dialog will open.

  3. Using the Schema XML report, remap the fields to recreate the original dataset mappings that existed before the dataset was archived, and then click the Continue button.

    The Dataset Settings dialog will refresh.

  4. Click the Run This Build Type button next to Full Analytics with Ingest or Full Analytics without Ingest.

    Choosing Full Analytics with Ingest will re-ingest the documents from the data source using the field mapping configured in step 3.

    Choosing Full Analytics without Ingest will rebuild the dataset with the fields that were mapped, without re-ingesting the documents.

    The Schedule Build dialog will open.

  5. Click the Build as soon as possible button.

    Note

    If you choose to build the dataset in the future, click the Schedule Build Time field, select a date and time, and then click the Save button.

The Datasets page will refresh and show the build in progress in the Dataset Queue.

While the build is in progress, you can click the View Status button to view the build steps in progress. For information on each step in the build process, see Build Steps.

After the build completes successfully, the dataset will move from the Dataset Queue to the list of active datasets in the Datasets page.


Archive and Restore Options (v6.1 and Earlier)

Important

This section pertains specifically to Brainspace version 6.1 and earlier.

To archive and import individual datasets, you must upgrade to Brainspace v6.2. For more information, see Archive and Import a Dataset.

Active and Inactive Datasets

Brainspace provides several options for managing your data and storage. Datasets can be set to either Active or Inactive status in the Administration screen in Brainspace. Only documents in active datasets count against your active document license allocation; inactive datasets do not count against that allocation and do not consume system memory.

Backup vs. Archive

The purpose of a backup process is disaster recovery, such as restoring a dataset that was deleted by accident or rolling back to a desired restore point. Back up your Brainspace environment using your preferred enterprise backup solution (e.g., VMware snapshots or backup software from Veritas or CommVault), making sure to include system, data, and database volumes.

Archiving your Brainspace data involves removing data from your live Brainspace instance so it no longer consumes memory or disk resources and, in some scenarios, license capacity. The primary reason for archiving is to reduce production disk storage usage.

System Backup Recommendations

Take full system snapshots/backups using your enterprise backup solution for disaster recovery and business continuity purposes. This will allow you to recover from an infrastructure outage or data corruption to the last good restore point.

Brainspace recommends that full system backups (including the database) or VM snapshots be made of the full Brainspace environment. In addition to the operating system partitions, back up the Builds (/data) and Datasets (/localdata) directories from each server, as well as the PostgreSQL database.
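The file-level part of this recommendation could look like the sketch below, which tars the Builds (/data) and Datasets (/localdata) directories and takes a custom-format pg_dump of the database. It is an illustration under assumptions, not a replacement for VM snapshots or your enterprise backup tool: backup_dir, backup_database, and BRAINSPACE_DB are hypothetical names, pg_dump takes its connection settings from the standard PostgreSQL environment variables (such as PGHOST and PGUSER), and the files and the dump only form a consistent restore point if they are captured while Brainspace is not changing them.

```shell
#!/usr/bin/env bash
# Sketch of a file-level backup. Not a substitute for VM snapshots or an
# enterprise backup tool. backup_dir, backup_database and BRAINSPACE_DB
# are hypothetical names.

# Tar one directory into DEST_DIR as <name>_<timestamp>.tar.gz and print the path.
backup_dir() {
  local src="$1" dest_dir="$2" name
  name="$(basename "$src")_$(date +%Y%m%d_%H%M%S).tar.gz"
  mkdir -p "$dest_dir" || return 1
  # -C stores paths relative to the parent, e.g. localdata/... not /localdata/...
  tar -czf "$dest_dir/$name" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  echo "$dest_dir/$name"
}

# Dump the database named in BRAINSPACE_DB in pg_dump's custom format (-Fc),
# which can later be restored with pg_restore.
backup_database() {
  local dest_dir="$1"
  if [[ -z "${BRAINSPACE_DB:-}" ]]; then
    echo "set BRAINSPACE_DB to the Brainspace database name" >&2
    return 1
  fi
  mkdir -p "$dest_dir" || return 1
  pg_dump -Fc -f "$dest_dir/${BRAINSPACE_DB}_$(date +%Y%m%d_%H%M%S).dump" "$BRAINSPACE_DB"
}

# Example on each server (hypothetical backup mount):
#   backup_dir /data /mnt/backup/brainspace
#   backup_dir /localdata /mnt/backup/brainspace
# and once, on the database host:
#   BRAINSPACE_DB=<database name> backup_database /mnt/backup/brainspace
```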

Brainspace does not support backing up and restoring individual datasets prior to Brainspace v6.2. If you need to restore an individual dataset on Brainspace v6.0 or v6.1, you will need to perform a full system restore on similar infrastructure and then remove any unneeded datasets. This method will require separate Brainspace instance licensing for the recovery infrastructure.

To perform a full system recovery, restore each server in your Brainspace instance to the same restore point to match the database and each server's data volumes. Please contact Brainspace Support if you need assistance in setting up a disaster recovery/business continuity solution.

Archive Options Summary

| Dataset State | Consumes License? | Consumes System Memory? | Consumes Disk Resources? | Work Product Retained? | Applicable Per Dataset? |
|---|---|---|---|---|---|
| Active (Enabled) | Yes (Active) | Yes | Yes | Yes | Yes |
| Inactive (Disabled) | Yes (Inactive) | No | Yes | Yes | Yes |

*System Archive option is for business continuity or disaster recovery purposes only. If restoring to duplicate infrastructure, separate Brainspace instance licensing is required.

