- 29 Oct 2024
Setting up a Brainspace Disaster Recovery Site
- Updated on 29 Oct 2024
The Brainspace application has no native disaster recovery functionality built in; however, its open source architecture supports many infrastructure solutions that can satisfy most businesses' disaster recovery (DR) and business continuity (BC) plans.
Brainspace has outlined several recommended configurations for deploying a Brainspace Disaster Recovery site. Choose the configuration options based on your business's Disaster Recovery requirements.
Overview
Set up Brainspace in an Active/Passive configuration (Active in the primary datacenter; Passive at the DR site).
Replicate data from Primary to DR site.
Each site (Active & DR) must have a valid Brainspace license.
Brainspace services must be turned off at the DR site during replication.
For Brainspace 6.2 – 6.3:
brainspace-platform on Application Server
brainspace-analysis on Analytics/On-Demand Server
If using PostgreSQL replication, the Passive site DB must be running.
If using block or file level replication, the Passive site DB should remain stopped until brought online and restored with the latest available DB backup during failover activities.
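As a rough sketch of keeping the DR site passive during replication, the helper below maps a server role to the Brainspace 6.2 – 6.3 service name listed above and stops it. It assumes the services are managed by systemd (systemctl), which this article does not state, and the role labels (app, analytics, ondemand) are hypothetical names chosen for illustration.

```shell
# Map a server role to its Brainspace 6.2-6.3 service name.
# The role labels (app, analytics, ondemand) are hypothetical.
service_for_role() {
    case "$1" in
        app)                echo "brainspace-platform" ;;
        analytics|ondemand) echo "brainspace-analysis" ;;
        *)                  echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Stop the service for this server's role before replication starts.
# Assumption: the services are systemd units; run as root.
stop_for_replication() {
    svc=$(service_for_role "$1") || return 1
    systemctl stop "$svc"
}
```

For example, stop_for_replication app would be run on the DR Application server, and stop_for_replication analytics on the DR Analytics server.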
Replication
Brainspace recommends block level replication from Primary to DR site for all OS and Data volumes (or VM snapshot).
System volumes (/, /var, etc.): include these in full system replication. This content does not change other than during a Brainspace install or upgrade.
Build data (/data volume).
Datasets data (/localdata volume).
There are two options for Database (PostgreSQL) replication:
PostgreSQL replication. See the PostgreSQL version 9.6 replication documentation; refer to the version specific to your deployment.
Block/file level replication. The PostgreSQL DB can be backed up in a hot/running state; however, some files may not copy or restore properly because they are in use at the time of the copy. If you do enable hot replication, we recommend that you also use the PostgreSQL “pg_dump” utility to create a periodic failsafe backup of the database, which should be copied using your preferred file replication solution. Run the “pg_dump” utility from cron on a schedule that meets your business needs. It is acceptable to omit the hot replication and only run a recurring “pg_dump” export (e.g., every 15 minutes) to be copied and restored when needed at the DR site.
Use differential replication where possible to minimize bandwidth consumption, improve replication speed and efficiency and avoid copying content that has not changed.
Set up a cron job (scheduled to meet your business needs) that uses the “pg_dump” utility to back up the PostgreSQL database to disk. Use the resulting file as your source for DB replication to the DR site.
Set the replication frequency as required to meet your organization's business needs.
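The scheduled “pg_dump” export could look something like the following. This is a minimal sketch, not a Brainspace-supplied script: the backup directory, the script path in the crontab comment, and the file naming scheme are all assumptions, and only the database name brainspace comes from this article. It uses the pg_dump custom format (-Fc) and writes to a temporary name first so that a file-level copy does not mistake a partial dump for a complete one.

```shell
# Hypothetical defaults; adjust to your environment.
BACKUP_DIR="${BACKUP_DIR:-/localdata/pg_backups}"
DB_NAME="${DB_NAME:-brainspace}"

# Build the dump path for a given timestamp.
dump_path() {
    printf '%s/%s-%s.dump\n' "$BACKUP_DIR" "$DB_NAME" "$1"
}

# Dump to a temporary name, then rename once pg_dump has succeeded.
# Run as the postgres user.
run_dump() {
    target=$(dump_path "$(date +%Y%m%dT%H%M%S)")
    pg_dump -Fc -f "$target.tmp" "$DB_NAME" && mv "$target.tmp" "$target"
}

# Example crontab entry for the postgres user (every 15 minutes),
# assuming this file is saved as /usr/local/bin/brainspace_pg_dump.sh:
#   */15 * * * * /usr/local/bin/brainspace_pg_dump.sh run
if [ "${1:-}" = "run" ]; then
    run_dump
fi
```

Old dumps are not pruned here; a retention policy would need to be added to suit your storage.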
Failover Procedure
Preparation
Brainspace recommends testing DR failover procedures regularly.
Perform health checks on DR site servers before making the DR site active.
Check disk space and verify you have a valid Brainspace license before attempting to start services at the DR site.
Executing Failover
Stop all Brainspace services and replication at Primary site (Note: these may be in a failed state in a true DR scenario).
Services should be started or made primary at DR site in the following sequence:
Storage
Database
Note
If server IP addresses are different at Primary and DR sites, a database script will need to be run to update file paths.
Brainspace services (Application, Analytics and On-Demand Analytics services can be started in parallel).
Verify all storage mount points, including any required NFS volumes, are active on all Brainspace servers at DR site. Make sure you can access a file on each of the following volumes.
Application server volumes: /data and /localdata (/data requires NFS services and sharing from Analytics server).
Analytics: /data (also shared to Application server).
On-Demand Analytics: /localdata and /localdata-share.
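One way to script the "can you access a file on each volume" check is sketched below. It is only an illustration: it looks for any regular file under the volume and tries to read one byte from it, which assumes each volume already holds at least one file. The function names are hypothetical.

```shell
# Return 0 if the directory exists and a regular file under it can be read.
can_read_volume() {
    [ -d "$1" ] || return 1
    first=$(find "$1" -type f 2>/dev/null | head -n 1)
    [ -n "$first" ] && head -c 1 "$first" >/dev/null 2>&1
}

# Check each volume given as an argument; print one line per volume
# and return non-zero if any of them failed.
check_volumes() {
    rc=0
    for vol in "$@"; do
        if can_read_volume "$vol"; then
            echo "OK   $vol"
        else
            echo "FAIL $vol"
            rc=1
        fi
    done
    return $rc
}
```

On the Application server this would be called as check_volumes /data /localdata, on the Analytics server as check_volumes /data, and on the On-Demand Analytics server as check_volumes /localdata /localdata-share.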
If using native PostgreSQL replication, follow the PostgreSQL failover procedure to make the DR site DB active. See the PostgreSQL version 9.6 failover documentation; refer to the version specific to your deployment.
Start PostgreSQL database service if it has not already been made active using the PostgreSQL failover procedure (Postgres will already be running if using the native PostgreSQL replication).
The following simple PostgreSQL health check verifies that the DB has data. It should be run from the DB server as the root user:
echo "\d" | runuser -l postgres -c 'psql brainspace'
If the database fails to start, or if you are not using native PostgreSQL replication, restore the latest backup archive from the Primary DB (created using the “pg_dump” utility). See the PostgreSQL Backup and Restore documentation; refer to the specific version for your deployment.
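The restore step might be scripted roughly as follows. This sketch assumes the replicated dumps are custom-format archives (created with pg_dump -Fc) whose file names end in .dump and contain a sortable timestamp; neither detail is specified in this article, and the function names are hypothetical. A plain-text SQL dump would instead be restored with psql.

```shell
# Print the newest *.dump file in a directory. Relies on timestamped
# names sorting chronologically; prints nothing if no dump exists.
latest_dump() {
    ls -1 "$1"/*.dump 2>/dev/null | sort | tail -n 1
}

# Restore a custom-format dump into an existing database, dropping
# existing objects before recreating them. Run as the postgres user.
restore_dump() {
    [ -f "$1" ] || { echo "no dump file: $1" >&2; return 1; }
    pg_restore --clean --if-exists -d "${2:-brainspace}" "$1"
}
```

A typical call would be restore_dump "$(latest_dump /path/to/replicated/dumps)", where the directory is whatever location your file replication writes to at the DR site.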
If the DR site servers use different IP addresses from the Primary site, the IPs need to be updated in the DR site database before the application is started up. Use the following steps while logged in as the root user on the DB server.
su - postgres to switch to the postgres user
psql brainspace to access the brainspace database
Run the following statement, substituting the application server IP addresses of the two sites:
update servers set root_url = replace(root_url, 'https://<ip of primary site application server>:8081/rest', 'https://<ip of secondary site application server>:8081/rest') where type = 'RUNTIME';
Example:
update servers set root_url = replace(root_url, 'https://10.x.x.1:8081/rest', 'https://11.x.x.1:8081/rest') where type = 'RUNTIME';
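To avoid retyping the statement during a failover, it could be generated from the two IP addresses, as in the sketch below. The function names are hypothetical; the statement text, port and path follow the example above, and the pipe into runuser mirrors the health check command used elsewhere in this procedure.

```shell
# Print the UPDATE statement that swaps the Primary application server
# address for the DR one in servers.root_url.
# $1 = Primary site IP, $2 = DR site IP.
root_url_update_sql() {
    # Refuse values containing a single quote so the SQL stays well formed.
    case "$1$2" in
        *"'"*) echo "invalid address" >&2; return 1 ;;
    esac
    printf "update servers set root_url = replace(root_url, 'https://%s:8081/rest', 'https://%s:8081/rest') where type = 'RUNTIME';\n" "$1" "$2"
}

# Apply the statement to the brainspace database.
# Run as root on the DB server.
apply_root_url_update() {
    sql=$(root_url_update_sql "$1" "$2") || return 1
    echo "$sql" | runuser -l postgres -c 'psql brainspace'
}
```

Printing the statement first with root_url_update_sql lets you review it before running apply_root_url_update with the same arguments.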
Start Brainspace services on Application, Analytics and On-Demand Analytics servers.
The Analytics and On-Demand Analytics servers will need to be updated on the Services tab in Brainspace\Administration. Re-add the Analytics and On-Demand Analytics servers to the Services tab using the IP address or hostname and appropriate network port (default is 1604) of these servers.
Monitor the Brainspace Log (brainspace.log) on the Application server during and following startup for any failures or errors using the following command.
tail -F /var/lib/brains/app/apache-tomcat8/logs/brainspace.log
Resolve any log entries indicating resources are not available (directories or mount points) before proceeding.
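In addition to watching the log live, a quick scan for error lines can be sketched as below. The log path is the one given above; the search pattern is an assumption, since this article does not describe the log format, so treat a clean result as a hint and not as proof of a healthy startup.

```shell
LOG="${LOG:-/var/lib/brains/app/apache-tomcat8/logs/brainspace.log}"

# Print lines that look like errors, prefixed with their line numbers.
# Assumption: problems are logged with the words ERROR, FATAL or Exception.
scan_log() {
    grep -nE 'ERROR|FATAL|Exception' "${1:-$LOG}"
}

# Return 0 when no matching lines are found, 1 when some were printed,
# and 2 when the log cannot be read at all.
log_is_clean() {
    [ -r "${1:-$LOG}" ] || { echo "cannot read log: ${1:-$LOG}" >&2; return 2; }
    ! scan_log "$1"
}
```

Calling log_is_clean with no argument checks the default brainspace.log location; pass a path to check a different file.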
Check the Services tab (Administration\Services) for server health.
Verify that all servers are Active with Status = Ready, and that disk, license and memory utilization are within acceptable limits (select the 'information' icon for each server).
Download the analysisServer.out log from both the Analytics and On-Demand Analytics servers and verify there are no recent (since startup for DR execution) failures or errors.
Execute the Brainspace post-install checklist to verify that all necessary functionality, connectors and dependencies are working as required.
Fail Back to Primary
Ensure Primary site services are stopped and all servers and underlying infrastructure are healthy.
Stop Brainspace services at the DR site (tomcat & analysisserver) to prevent further changes to the DB.
Stop PostgreSQL replication if it is running.
Follow PostgreSQL recovery procedures to reset the Primary site DB as Active. See the PostgreSQL version 9.6 recovery documentation; refer to the version specific to your deployment.
Run the “pg_dump” utility from the DR site to make a full DB backup and copy the output file to the Primary site for restore.
At the Primary site, restore the backup archive copied from the DR site DB (created using the “pg_dump” utility). Here is a link to the PostgreSQL backup and restore documentation: https://www.postgresql.org/docs/9.6/static/backup.html. Please refer to the specific version for your deployment.
Run PostgreSQL scripts to update the Primary server IPs in the DB.
Re-add Analytics and On-Demand Analytics servers to the Services tab using server IP or hostname.
Execute procedures to change the primary PostgreSQL server in the replication set. See the PostgreSQL version 9.6 recovery documentation; refer to the version specific to your deployment.
Start Brainspace services (tomcat on the Application server; analysisserver on both the Analytics and On-Demand Analytics servers) at the Primary site. Check logs and system health, and perform the post-install checklist to verify functionality.
Re-establish replication from the Primary to the DR site. See the PostgreSQL version 9.6 replication documentation; refer to the version specific to your deployment.