Data Expansion Sizes by File Type


The tables below provide estimated data expansion ratios for common file types encountered during eDiscovery processing, helping teams anticipate how raw data volumes may grow once processed.

Estimating file data size allows for more accurate scoping, budgeting, and infrastructure planning for your projects. Use these benchmarks to reduce risk, avoid unexpected overages, and plan processing workflows with greater confidence.

Typical corporate data expansion rates

Across a typical corporate data set, expansion rates are estimated to be as follows:

| Expansion Rate | Data Set Description | Estimated Expansion Ratio |
| --- | --- | --- |
| Low Expansion | MS-Office docs and PDFs with minimal archives. | 1.3x to 1.8x |
| Moderate Expansion* | Mix of emails with attachments and ZIP files. | 1.8x to 2.5x |
| High Expansion | Heavy PSTs with a large volume of ZIP and RAR files, scanned docs requiring OCR, and chat data. | 2.5x to 5x or greater |

* Most common data set scenario for organizations
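As a rough illustration of how these ranges might be applied when scoping a project, the following sketch multiplies a raw volume by the low and high ends of a profile's ratio range. The function name, the profile keys, and the use of gigabytes are assumptions made for this example; only the ratios come from the table above.

```python
# Ratio ranges copied from the table above.
# The dictionary keys are hypothetical labels chosen for this sketch.
EXPANSION_PROFILES = {
    "low": (1.3, 1.8),
    "moderate": (1.8, 2.5),
    # The table lists "5x or greater", so 5.0 is NOT a true ceiling here.
    "high": (2.5, 5.0),
}


def estimate_processed_gb(raw_gb, profile):
    """Return (low, high) estimated processed size in GB for a raw volume."""
    if raw_gb < 0:
        raise ValueError("raw_gb must be non-negative")
    low_ratio, high_ratio = EXPANSION_PROFILES[profile]
    return raw_gb * low_ratio, raw_gb * high_ratio


low, high = estimate_processed_gb(100, "moderate")
print(f"{low:.0f} GB to {high:.0f} GB")  # prints "180 GB to 250 GB"
```

For a high-expansion data set, treat the upper figure as a planning floor rather than a cap, since the table leaves that range open-ended.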

Expansion rates by file category / type

| File Category | File Types | Estimated Expansion Ratio |
| --- | --- | --- |
| Email | PST, OST, MSG, EML, MBOX | 1.5x to 3x |
| Office documents | DOC, DOCX, XLS, XLSX, PPT, PPTX, CSV, RTF, PDF, TXT | 1x to 1.5x |
| Images | JPG, PNG, TIFF | 1.5x to 4x |
| Archives | ZIP, RAR, 7Z | 2x to 10x |
| System files | HTML, LOG | 1x to 1.5x |
| Multimedia | MP4, MOV, MP3 | 1.2x† |
| Slack data | N/A | 5x to 10x (no attachments); 15x to 20x (with attachments) |
| Microsoft Teams data | N/A | 5x to 10x (no attachments); 15x to 20x (with attachments) |

† Expansion rate is due to generated transcript
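For a data set that mixes several categories, one possible approach is to apply each category's range to its own share of the raw volume and sum the results. The sketch below assumes raw volumes are known per category in gigabytes; the category keys and the function name are hypothetical, and the ratios are taken from the table above.

```python
# Per-category (low, high) ratios from the table above.
# Keys are hypothetical labels; "email" covers PST, OST, MSG, EML, and MBOX.
CATEGORY_RATIOS = {
    "email": (1.5, 3.0),
    "office": (1.0, 1.5),
    "images": (1.5, 4.0),
    "archives": (2.0, 10.0),
    "system": (1.0, 1.5),
    "multimedia": (1.2, 1.2),  # single figure in the table (generated transcript)
    "chat_no_attachments": (5.0, 10.0),     # Slack or Microsoft Teams
    "chat_with_attachments": (15.0, 20.0),  # Slack or Microsoft Teams
}


def estimate_mixed_gb(raw_gb_by_category):
    """Sum low and high processed-size estimates across categories."""
    total_low = 0.0
    total_high = 0.0
    for category, raw_gb in raw_gb_by_category.items():
        low_ratio, high_ratio = CATEGORY_RATIOS[category]
        total_low += raw_gb * low_ratio
        total_high += raw_gb * high_ratio
    return total_low, total_high


low, high = estimate_mixed_gb({"email": 40, "office": 20, "archives": 10})
print(f"{low:.0f} GB to {high:.0f} GB")  # prints "100 GB to 250 GB"
```

In this example, 70 GB of raw data yields an estimate of 100 GB to 250 GB, with the archives accounting for most of the spread between the two figures.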
