How to Collect from Confluence On-Premise
  • 25 Jun 2024

Getting Started

Confluence is a team content collaboration tool. Onna supports Confluence Cloud and Confluence Server version 5.7 and later. Onna connects directly to the Confluence API to collect all information in its native format. The integration collects all data and metadata from an entire Confluence site or from individual spaces.

To collect from Confluence on-premise you will need Onna's Discovery app and an Onna Enterprise account. You can request the Onna Discovery application from our Support team, who will also need to whitelist the domain on our end before a collection can be performed. Your Confluence admin should be able to provide the domain for the sites that will be collected.

Please note the application needs to be installed by a user with admin-level access to that machine. If you have 2FA or SSO enabled for your Confluence site, you may need to create a new account without it enabled.

Generally, a server, virtual or physical, is preferred over a desktop or laptop unless the machine will remain unlocked and have no interruptions to its connectivity. In addition, the app must be installed on a machine that is behind the desired firewall and that has constant connectivity to both the Confluence server and the Internet.
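If you want to confirm connectivity from a candidate machine before requesting the app, a quick pre-flight check can help. The following is a minimal Python sketch, not an Onna tool: it only tests whether a TCP connection can be opened to each URL's host and port, and the URLs you pass in (your Confluence site and any Internet endpoint your network allows) are your own assumptions.

```python
import socket
import sys
from typing import Tuple
from urllib.parse import urlsplit


def host_and_port(url: str) -> Tuple[str, int]:
    """Split a site URL into (hostname, port), defaulting to 443 for https and 80 otherwise."""
    parts = urlsplit(url)
    default = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


if __name__ == "__main__":
    # Hypothetical usage:
    #   python preflight.py https://confluence.example.internal https://www.example.com
    for url in sys.argv[1:]:
        host, port = host_and_port(url)
        status = "reachable" if can_reach(host, port) else "NOT reachable"
        print(f"{host}:{port} is {status}")
```

A successful TCP connection does not prove that firewalls allow the full HTTPS traffic the app needs, so treat a passing result as a first check only.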

Integration Features

All files are synced, including, but not limited to:

  • HTML content of the page

  • Comments on pages

  • Attachments for the page

  • Labels for attachments and pages

  • Ancestors of the page and its attachments

  • Historical information and related metadata, including:

      • Author of the page

      • Created by/on

      • Last updated by/on

      • Previous version created by/on
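To illustrate where this historical metadata lives in Confluence itself, here is a minimal Python sketch that flattens one content object returned by the Confluence REST API. It assumes the object was requested with `expand=history,history.lastUpdated,history.previousVersion,ancestors,metadata.labels`; it is not how Onna's connector is implemented, and the output key names are hypothetical.

```python
from typing import Any, Dict, Optional


def _get(obj: Any, *path: str) -> Optional[Any]:
    """Walk nested dictionary keys, returning None if any key is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def history_fields(content: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten history, ancestors, and labels from a Confluence REST content object."""
    labels = _get(content, "metadata", "labels", "results") or []
    return {
        "title": content.get("title"),
        # The page author is the account that created the first version.
        "created_by": _get(content, "history", "createdBy", "displayName"),
        "created_on": _get(content, "history", "createdDate"),
        "last_updated_by": _get(content, "history", "lastUpdated", "by", "displayName"),
        "last_updated_on": _get(content, "history", "lastUpdated", "when"),
        "previous_version_by": _get(content, "history", "previousVersion", "by", "displayName"),
        "previous_version_on": _get(content, "history", "previousVersion", "when"),
        # Ancestors are listed from the top of the page tree down to the direct parent.
        "ancestors": [a.get("title") for a in content.get("ancestors", [])],
        "labels": [lbl.get("name") for lbl in labels],
    }
```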

Types of Sync Available

For on-premise collections, only one-time sync is supported:

  • One-time sync collects information in an account until a specified date. It does not update once collected.

The synchronization scope currently encompasses entire Confluence sites, specific Confluence spaces, and specific Confluence pages.
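Conceptually, a one-time sync with an end date is a snapshot: only items last modified on or before that date are included, and nothing is updated afterwards. The following Python sketch shows that filtering rule only; the item shape (a dict with an ISO 8601 `when` timestamp) is an assumption, and treating timestamps without an offset as UTC is our own choice.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def parse_confluence_date(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2024-06-25T10:15:30.000Z'."""
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11,
    # so normalize it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    # Assumption: timestamps without an offset are treated as UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def snapshot_until(items: Iterable[Dict[str, str]], cutoff: datetime) -> List[Dict[str, str]]:
    """Keep only items whose 'when' timestamp is on or before the cutoff."""
    if cutoff.tzinfo is None:
        raise ValueError("cutoff must be timezone-aware")
    return [item for item in items if parse_confluence_date(item["when"]) <= cutoff]
```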

Data Exports

All files and metadata can be exported in an eDiscovery-ready format. Load files are available in DAT, CSV, or custom text format.

The following metadata fields are exported:

  • Space Name

  • Space ID (numeric field to identify space in Confluence)

  • Confluence Space Type

  • Ancestors for a file

  • List of Labels

  • All date-related metadata
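For reviewers unfamiliar with DAT load files, the following Python sketch shows the common Concordance-style layout: each value wrapped in þ (character 254), values separated by ASCII 20, and line breaks inside a value replaced with ® (character 174). This illustrates the general format only; the field names, the `;` separator for multi-valued fields, and the sample values are hypothetical and not Onna's actual export schema.

```python
from typing import Dict, Iterable, List, Sequence

FIELD_SEP = "\x14"   # ASCII 20: Concordance's default column delimiter
QUOTE = "\u00fe"     # þ (character 254): Concordance's default text qualifier
NEWLINE = "\u00ae"   # ® (character 174): stands in for line breaks inside a value


def dat_line(values: Iterable[str]) -> str:
    """Wrap each value in þ, replace embedded line breaks, and join with ASCII 20."""
    cleaned = (str(v).replace("\r\n", "\n").replace("\n", NEWLINE) for v in values)
    return FIELD_SEP.join(f"{QUOTE}{v}{QUOTE}" for v in cleaned)


def to_dat(fields: Sequence[str], rows: List[Dict[str, str]]) -> str:
    """Build a DAT load file: one header line of field names, then one line per record."""
    lines = [dat_line(fields)]
    for row in rows:
        lines.append(dat_line(row.get(f, "") for f in fields))
    return "\r\n".join(lines) + "\r\n"
```

For example, hypothetical fields such as `["SpaceName", "SpaceID", "Labels"]` with a row `{"SpaceName": "Engineering", "SpaceID": "98305", "Labels": "oncall;ops"}` produce one header line and one record line. When writing the result to disk, open the file with `newline=""` so the `\r\n` line endings are preserved.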

How to Guide

First, install the app on a machine that is behind the desired firewall and that has constant connectivity to both the Confluence server and the Internet.

Note: Generally, a server, virtual or physical, is preferred over a desktop or laptop unless the machine will remain unlocked and have no interruptions to its connectivity. The app needs to be installed by a user with admin-level access to that machine. If you have 2FA enabled for your Confluence site, you may need to create a new account without it enabled.

The app opens to a login screen similar to the web platform's login page.

After you log in with the same credentials you use on the web platform, the app opens to your Workspaces page.

From there, you can either:

  • Create a new Workspace for your collection

  • Use an existing Workspace to add a data source

Inside the workspace, click "Add new Source".

Currently, you can use the app to add a Confluence or Jira source.

Select Confluence.

First, name your source. This is the source's title on the platform.

Enter the Confluence site's URL as the host. If the site is password-protected, enter your credentials here, using your full email address as the username. If the site is public, leave the username and password blank (see the example below for a collection from a public site). When you've finished entering the details, click 'Connect'.
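Before entering the host and credentials in the app, an admin may want to normalize the site URL and prepare a test request. The following Python sketch is an assumption-based helper, not part of the Onna app: it keeps any context path such as `/confluence`, defaults to HTTPS when no scheme is given (our choice), and builds a Basic authentication header only when a username is supplied, mirroring the "leave blank for public sites" rule.

```python
import base64
from typing import Dict, Optional
from urllib.parse import urlsplit, urlunsplit


def normalize_host(url: str) -> str:
    """Return scheme://host[:port][/context-path] with no trailing slash."""
    if "://" not in url:
        url = "https://" + url  # assumption: default to HTTPS when no scheme is given
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path.rstrip("/"), "", ""))


def auth_headers(username: Optional[str], password: Optional[str]) -> Dict[str, str]:
    """Basic-auth header for a password-protected site; no header for a public site."""
    if not username:
        return {}
    raw = f"{username}:{password or ''}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
```

With these headers, a GET request to `normalize_host(host) + "/rest/api/user/current"` on Confluence Server should return the account's details if the credentials work, which is a quick way to test them outside the app. Confirm the endpoint against the REST documentation for your Confluence version.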

Note: Confluence sources in Onna do not store usernames or passwords; instead, they use JSESSIONID cookies. These credentials need to be refreshed when the cookie expires. To avoid being prompted frequently to renew credentials, we suggest extending how long the cookie remains valid.
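On Confluence Server, Atlassian documents the session lifetime as the `<session-timeout>` value (in minutes) in `confluence/WEB-INF/web.xml` under the installation directory. Changing it requires a Confluence restart, and the exact location may vary by version, so please check Atlassian's documentation for your release. The Python sketch below only reads the current value so an admin can see what is configured; it does not modify the file, and the file path in the usage note is a hypothetical example.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def session_timeout_minutes(web_xml: str) -> Optional[int]:
    """Return the <session-timeout> value from a web.xml document, or None if absent."""
    root = ET.fromstring(web_xml)
    for elem in root.iter():
        # web.xml usually declares a default namespace, so compare local tag names only.
        if elem.tag.rsplit("}", 1)[-1] == "session-timeout" and elem.text:
            return int(elem.text.strip())
    return None
```

For example, an admin could pass in the contents of a hypothetical `/opt/atlassian/confluence/confluence/WEB-INF/web.xml` and then raise the value in that file if sessions expire too quickly for the collection.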

The "Collect external links" option attempts to collect and download linked content from the Confluence page. Links that are not accessible without authentication will not be collected.
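To see which links on a page count as "external" (pointing outside the Confluence site), the following Python sketch extracts `href` values from a page's HTML and keeps absolute http(s) links to other hosts. It is a hypothetical illustration for reviewing pages in advance, not Onna's link-collection logic; relative links, in-page anchors, and `mailto:` links are ignored.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlsplit


class _LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def external_links(page_html: str, site_url: str) -> List[str]:
    """Return absolute http(s) links in page_html whose host differs from site_url's host."""
    parser = _LinkCollector()
    parser.feed(page_html)
    site_host = urlsplit(site_url).hostname
    links = []
    for href in parser.hrefs:
        absolute = urljoin(site_url, href)  # resolve relative links against the site
        parts = urlsplit(absolute)
        if parts.scheme in ("http", "https") and parts.hostname != site_host:
            links.append(absolute)
    return links
```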

Select the space(s) you would like to sync. To sync all, select "All Spaces".
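If you want to confirm ahead of time which spaces the collection account can see, you can page through the Confluence REST endpoint `/rest/api/space`, whose results include each space's `id`, `key`, `name`, and `type`. In the following Python sketch, `fetch` is a hypothetical callable you supply (for example, a wrapper around an authenticated HTTP GET that returns parsed JSON); the sketch assumes the response carries a `_links.next` entry while more pages remain.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical signature: fetch(path, query_params) -> parsed JSON response.
Fetch = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_spaces(fetch: Fetch, page_size: int = 25) -> Iterator[Dict[str, Any]]:
    """Yield every space visible to the account, following start/limit paging."""
    start = 0
    while True:
        page = fetch("/rest/api/space", {"start": start, "limit": page_size})
        results = page.get("results", [])
        yield from results
        # Stop when a page is empty or the server reports no further page.
        if not results or "next" not in page.get("_links", {}):
            return
        start += len(results)
```

Advancing `start` by the number of results actually returned, rather than by `page_size`, keeps the loop correct if the server caps the page size below what was requested.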

Once you have clicked "Sync", you will see this integration within your Groups page. You will also see it within your Sources page on the web platform.

Onna then begins syncing files through the Confluence API. Files are processed and indexed so that all content is searchable. During this process, the source indicates that it is syncing.

You can then view the synced data in your account on the web platform.

Confluence pages in Onna

For on-premise Confluence collections, collected pages are rendered in HTML.

Accessing audit logs

Follow this article for information on viewing source audit logs.

Confluence FAQ

For Confluence on-premise collections, is it necessary to install anything on a server?
Yes. You need to install the application on a Windows machine with at least 8 GB of RAM that is always on and has constant connectivity to both the Confluence server and the Internet.
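To check a candidate machine against these requirements, the following Python sketch reads total physical memory (through the Windows `GlobalMemoryStatusEx` API, or `sysconf` on other systems) and compares it with 8 GiB. The 5% tolerance is our own assumption, added because machines sold as "8 GB" often report slightly less usable memory; it is not an Onna requirement.

```python
import ctypes
import os
import platform


def total_ram_bytes() -> int:
    """Total physical memory in bytes (GlobalMemoryStatusEx on Windows, sysconf elsewhere)."""
    if platform.system() == "Windows":
        class MEMORYSTATUSEX(ctypes.Structure):
            _fields_ = [
                ("dwLength", ctypes.c_ulong),
                ("dwMemoryLoad", ctypes.c_ulong),
                ("ullTotalPhys", ctypes.c_ulonglong),
                ("ullAvailPhys", ctypes.c_ulonglong),
                ("ullTotalPageFile", ctypes.c_ulonglong),
                ("ullAvailPageFile", ctypes.c_ulonglong),
                ("ullTotalVirtual", ctypes.c_ulonglong),
                ("ullAvailVirtual", ctypes.c_ulonglong),
                ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
            ]

        status = MEMORYSTATUSEX()
        status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)  # required before the call
        ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status))
        return status.ullTotalPhys
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def meets_requirements(system: str, ram_bytes: int, min_gib: float = 8.0, tolerance: float = 0.05) -> bool:
    """True for Windows with at least min_gib of RAM, minus the assumed tolerance."""
    return system == "Windows" and ram_bytes >= min_gib * (1 - tolerance) * 1024 ** 3


if __name__ == "__main__":
    ram = total_ram_bytes()
    print(f"{platform.system()}, {ram / 1024 ** 3:.1f} GiB RAM, "
          f"meets requirements: {meets_requirements(platform.system(), ram)}")
```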

Where will the information be stored for an on-premise collection?
The information that you collect using the app will be uploaded to Onna's cloud environment.

What type of login is needed - database or user?
A user account to Confluence with full access to the space(s) that need to be collected.

If my collection runs into an error, what should I do?
Create a support ticket and our team will be happy to assist.

Is it possible to collect archived spaces?

At this time it is not possible to sync archived spaces due to an API limitation. We suggest changing archived spaces to current in order to perform the required collection. Once the collection has successfully completed the spaces can be archived again.
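To find out which spaces would need to be restored before a collection, an admin could list archived spaces through the REST API. The following Python sketch assumes your Confluence version supports the `status=archived` filter on `/rest/api/space` and returns a `_links.next` entry while more pages remain; `fetch` is a hypothetical callable you supply that performs an authenticated GET and returns parsed JSON.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: fetch(path, query_params) -> parsed JSON response.
Fetch = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def archived_space_keys(fetch: Fetch, page_size: int = 50) -> List[str]:
    """Collect the keys of all archived spaces, following start/limit paging."""
    keys: List[str] = []
    start = 0
    while True:
        page = fetch("/rest/api/space", {"status": "archived", "start": start, "limit": page_size})
        results = page.get("results", [])
        keys.extend(space["key"] for space in results)
        # Stop when a page is empty or the server reports no further page.
        if not results or "next" not in page.get("_links", {}):
            return keys
        start += len(results)
```

The resulting list doubles as a checklist for archiving the same spaces again once the collection has completed.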


Is it possible to collect restricted spaces or pages?

Only if the account used to create the collection has access to the restricted space or page. Our connector can only see what that user sees. This applies even when that user is an admin, because admins can also be restricted from a space or page.
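To spot-check what the collection account can see, you can request specific pages through the REST API while authenticated as that account. Confluence typically returns an error status such as 404, rather than the content, when the requesting user cannot view it; please confirm this behavior on your version. In the following Python sketch, the content IDs and the `status_of` callable are assumptions you provide, and a 404 can also simply mean the ID does not exist.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def http_status(url: str, headers: Dict[str, str]) -> int:
    """Return the HTTP status code of a GET request, including error statuses."""
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def visible_content(status_of: Callable[[str], int], content_ids: Iterable[str]) -> Dict[str, bool]:
    """Map each content ID to True if the collection account gets HTTP 200 for it."""
    return {cid: status_of(f"/rest/api/content/{cid}") == 200 for cid in content_ids}
```

For example, `status_of` could be `lambda path: http_status("https://wiki.example.com" + path, headers)`, where the host and `headers` (the collection account's credentials) are hypothetical placeholders.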

