In this article:
- Web Crawler Overview 
- Web Crawler Requirements 
- How to Connect and Collect Using Web Crawler 
Web Crawler Overview
Onna's web crawler was created to index web pages.
| Connector Features | |
| Authorized Connection Required? No | Is identity mapping supported? No | 
| Audit logs available? Yes | Admin Access? No | 
| Supports a full archive? No | Custodian based collections? No | 
| Preserve in place with ILH? No | Resumable sync supported? No | 
| Supports Onna preservation? No | Syncs future users automatically? No | 
| Sync modes supported: | Is file versioning supported? No | 
| Types of Data Collected | Metadata Collected | 
| | | 
Web Crawler Considerations
- The web crawler does not currently support password-protected or CAPTCHA-protected websites. 
- The web crawler cannot collect files in their native format from the links on a web page. Links are embedded in the collected page, so the files they point to are not pulled in their native formats. 
- You cannot collect data from a site that has blocked the crawler's user agent. 
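One common way a site blocks a crawler's user agent is through its robots.txt file. The sketch below illustrates that case only and is not Onna's documented behavior: it uses Python's standard `urllib.robotparser` to test whether a given user agent may fetch a URL. The user agent name `ExampleCrawler`, the helper `is_blocked`, and the sample rules are all hypothetical; this article does not state which user agent the web crawler sends, or whether robots.txt is the blocking mechanism it refers to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used for illustration only.
SAMPLE_ROBOTS_TXT = """User-agent: ExampleCrawler
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules disallow user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleCrawler" is an assumed name, not the real crawler's user agent.
    for agent in ("ExampleCrawler", "OtherBot"):
        blocked = is_blocked(SAMPLE_ROBOTS_TXT, agent, "https://example.com/page")
        print(f"{agent}: blocked={blocked}")
```

A check like this, run against a site's real robots.txt before adding it as a source, may help explain why a collection returns no data.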
Web Crawler Requirements
- When adding a new Web Crawler sync, you must enter each URL with its protocol (http:// or https://). 
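If you are preparing a list of URLs ahead of time, a quick check like the one below can catch entries that are missing the protocol. This is a minimal sketch using Python's standard `urllib.parse`; the function name `has_required_protocol` is hypothetical, and the check reflects only the requirement stated above, not any validation that Onna itself performs.

```python
from urllib.parse import urlsplit

# Protocols named in the requirement above.
ALLOWED_SCHEMES = {"http", "https"}


def has_required_protocol(url: str) -> bool:
    """Return True if url starts with http:// or https:// and names a host."""
    parts = urlsplit(url.strip())
    # urlsplit lowercases the scheme; netloc is empty when the protocol is missing.
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


if __name__ == "__main__":
    candidates = [
        "https://www.example.com/docs",
        "http://example.com",
        "www.example.com",  # missing protocol
    ]
    for candidate in candidates:
        status = "ok" if has_required_protocol(candidate) else "add http:// or https://"
        print(f"{candidate}: {status}")
```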
How to Connect and Collect Using Web Crawler
To create a new Web Crawler collection, follow the steps below:
Step 1
Click on ‘Workspaces’ in the main menu (a), then click on the workspace where you’d like to add a new sync (b).

Step 2
Click on the ‘+’ icon in the upper right corner to add a new source.

Step 3
Select the Web Crawler connector from your list of available connectors.

Step 4
To configure your sync, start by entering a name for your source in the ‘Name’ field (a). Then, enter the URL you want to collect from, including its protocol (b). Finally, click the blue ‘Done’ button (c).

Step 5
You’ll now see your new source appear alphabetically in the list of ‘Connected sources’ in your workspace.
