Web Crawler: How to Connect and Collect
- Updated on 26 Jun 2024
In this article:
Web Crawler Overview
Web Crawler Requirements
How to Connect and Collect Using Web Crawler
Web Crawler Overview
Onna's web crawler was created to index web pages.
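Conceptually, indexing a web page means fetching its HTML and extracting the links to visit next. The sketch below is a minimal illustration of that idea using only Python's standard library; it is not Onna's implementation, and the example URLs are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkIndexer(HTMLParser):
    """Collect the href targets of anchor tags, resolved against a base URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Relative links are resolved against the page's own URL,
        # so the crawler always works with absolute addresses.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

indexer = LinkIndexer("https://example.com/docs/")
indexer.feed('<a href="page.html">Page</a> <a href="https://example.com/other">Other</a>')
print(indexer.links)
# ['https://example.com/docs/page.html', 'https://example.com/other']
```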
| Connector Features | |
| --- | --- |
| Authorized Connection Required? No | Is identity mapping supported? No |
| Audit logs available? Yes | Admin Access? No |
| Supports a full archive? No | Custodian based collections? No |
| Preserve in place with ILH? No | Resumable sync supported? No |
| Supports Onna preservation? No | Syncs future users automatically? No |
| Sync modes supported: | Is file versioning supported? No |

| Types of Data Collected | Metadata Collected |
| --- | --- |
| | |
Web Crawler Considerations
The web crawler does not currently support password-protected or CAPTCHA-protected websites.
You can't collect files in their native format from the links on a web page during collection. Because links on a crawled page are embedded, the web crawler cannot pull the linked files in their native formats.
Web Crawler Requirements
When adding a new Web Crawler sync, you must enter each URL with its protocol included (http:// or https://).
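If you prepare URL lists programmatically, you can check that each one carries an explicit http/https scheme before adding it to a sync. A minimal sketch using Python's standard library (the function name is our own, not part of the product):

```python
from urllib.parse import urlparse

def has_supported_protocol(url: str) -> bool:
    """Return True if the URL starts with an explicit http or https scheme."""
    scheme = urlparse(url).scheme
    return scheme in ("http", "https")

print(has_supported_protocol("https://example.com"))  # True
print(has_supported_protocol("example.com"))          # False: no protocol given
```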
How to Connect and Collect Using Web Crawler
To create a new Web Crawler collection, follow the steps below:
Step 1
Click on ‘Workspaces’ in the main menu (a), then click on the workspace where you’d like to add a new sync (b).
Step 2
Click on the ‘+’ icon in the upper right corner to add a new source.
Step 3
Select the Web Crawler connector from your list of available connectors.
Step 4
To configure your sync, start by entering a name for your source in the ‘Name’ field (a). Then, enter the URL you want to collect from (b). Finally, click the blue ‘Done’ button (c).
Step 5
You’ll now see your new source appear alphabetically in the list of ‘Connected sources’ in your workspace.