The tables below provide estimated data expansion ratios for common file types encountered during eDiscovery processing, helping teams anticipate how much raw data volumes may grow once processed.
Estimating how much your data will grow during processing allows for more accurate scoping, budgeting, and infrastructure planning. Use these benchmarks to reduce risk, avoid unexpected overages, and plan processing workflows with greater confidence.
Typical corporate data expansion rates
Across a typical corporate data set, expansion rates are estimated to be as follows:
| Expansion Rate | Data Set Description | Estimated Expansion Ratio |
|---|---|---|
| Low Expansion | MS-Office docs and PDFs with few archives | 1.3x to 1.8x |
| Moderate Expansion* | Mix of emails with attachments and ZIP files | 1.8x to 2.5x |
| High Expansion | Large PST files, a high volume of ZIP and RAR archives, scanned docs requiring OCR, and chat data | 2.5x to 5x or greater |
* Most common data set scenario for organizations
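To turn these tiers into a rough volume range, multiply the raw collection size by each end of the tier's ratio. The sketch below is a minimal illustration, assuming hypothetical tier keys (`low`, `moderate`, `high`) and a hypothetical function name `estimate_processed_gb`. It treats the open-ended "5x or greater" upper bound of the high tier as 5x, so the high-tier maximum it returns understates the worst case.

```python
# Hypothetical tier keys; (low, high) ratios come from the table above.
# The "5x or greater" upper bound is capped at 5.0 here, so the high-tier
# maximum understates the worst case.
EXPANSION_TIERS: dict[str, tuple[float, float]] = {
    "low": (1.3, 1.8),
    "moderate": (1.8, 2.5),
    "high": (2.5, 5.0),
}


def estimate_processed_gb(raw_gb: float, tier: str) -> tuple[float, float]:
    """Return the (minimum, maximum) estimated processed volume in GB."""
    if raw_gb < 0:
        raise ValueError("raw_gb must be non-negative")
    low, high = EXPANSION_TIERS[tier]  # raises KeyError for an unknown tier
    return raw_gb * low, raw_gb * high


if __name__ == "__main__":
    minimum, maximum = estimate_processed_gb(100, "moderate")
    print(f"100 GB raw, moderate expansion: {minimum:.0f} GB to {maximum:.0f} GB")
```

For a 100 GB moderate-expansion collection this prints a range of 180 GB to 250 GB. Returning a range rather than a single midpoint keeps the uncertainty in the benchmarks visible when budgeting.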
Expansion rates by file category / type
| File Category | File Types | Estimated Expansion Ratio |
|---|---|---|
| Email | PST, OST, MSG, EML, MBOX | 1.5x to 3x |
| Office documents | DOC, DOCX, XLS, XLSX, PPT, PPTX, CSV, RTF, PDF, TXT | 1x to 1.5x |
| Images | JPG, PNG, TIFF | 1.5x to 4x |
| Archives | ZIP, RAR, 7Z | 2x to 10x |
| System files | HTML, LOG | 1x to 1.5x |
| Multimedia | MP4, MOV, MP3 | 1.2x† |
| Slack data | N/A | 5x to 10x (no attachments); 15x to 20x (with attachments) |
| Microsoft Teams data | N/A | 5x to 10x (no attachments); 15x to 20x (with attachments) |
† Expansion results from the transcript generated during processing.
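For a mixed collection, one way to apply the per-category ratios is to estimate each category separately and sum the results. In this minimal sketch the category keys and the function `estimate_collection_gb` are hypothetical. `email` stands for the PST, OST, MSG, EML, MBOX row, Multimedia's single 1.2x value is used as both ends of its range, and Slack and Teams each get separate keys for exports with and without attachments.

```python
# Hypothetical category keys mapped to (low, high) ratios from the table above.
CATEGORY_RATIOS: dict[str, tuple[float, float]] = {
    "email": (1.5, 3.0),  # PST, OST, MSG, EML, MBOX
    "office": (1.0, 1.5),
    "images": (1.5, 4.0),
    "archives": (2.0, 10.0),
    "system": (1.0, 1.5),
    "multimedia": (1.2, 1.2),  # single value in the table
    "slack": (5.0, 10.0),  # no attachments
    "slack_with_attachments": (15.0, 20.0),
    "teams": (5.0, 10.0),  # no attachments
    "teams_with_attachments": (15.0, 20.0),
}


def estimate_collection_gb(raw_gb_by_category: dict[str, float]) -> tuple[float, float]:
    """Sum per-category estimates into a (minimum, maximum) processed volume in GB."""
    total_low = 0.0
    total_high = 0.0
    for category, raw_gb in raw_gb_by_category.items():
        low, high = CATEGORY_RATIOS[category]  # raises KeyError for an unknown category
        total_low += raw_gb * low
        total_high += raw_gb * high
    return total_low, total_high


if __name__ == "__main__":
    collection = {"email": 40, "office": 20, "archives": 10}
    minimum, maximum = estimate_collection_gb(collection)
    print(f"70 GB raw: {minimum:.0f} GB to {maximum:.0f} GB processed")
```

In this example the estimate is 100 GB to 250 GB. Because archives can expand up to 10x, a small share of ZIP, RAR, or 7Z data can dominate the upper bound: the 10 GB of archives alone accounts for 100 GB of the 250 GB maximum.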