---
title: "Data Expansion Sizes by File Type"
slug: "data-expansion-sizes-by-file-type"
updated: 2026-05-08T13:05:32Z
published: 2026-05-08T13:05:32Z
---

# Data Expansion Sizes by File Type

The tables below provide estimated data expansion ratios for common file types encountered during eDiscovery processing, helping teams anticipate how raw data volumes may grow once processed.

Estimating processed data size up front allows for more accurate scoping, budgeting, and infrastructure planning for your projects. Use these benchmarks to reduce risk, avoid unexpected overages, and plan processing workflows with greater confidence.

## Typical corporate data expansion rates

Across a typical corporate data set, expansion rates are estimated to be as follows:

| Expansion Rate | Data Set Description | Estimated Expansion Ratio |
| --- | --- | --- |
| **Low Expansion** | MS-Office docs and PDFs with minimal archives. | 1.3x to 1.8x |
| **Moderate Expansion**\* | Mix of emails with attachments and ZIP files. | 1.8x to 2.5x |
| **High Expansion** | Heavy PSTs with a large volume of ZIP and RAR files, scanned docs requiring OCR, and chat data. | 2.5x to 5x or greater |

\* Most common data set scenario for organizations
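
As a rough illustration of how these ratios translate into a planning range, the sketch below multiplies a raw volume by the low and high ends of a profile's ratio. The function name, the profile keys, and the use of gigabytes are assumptions made for this example; only the ratios come from the table above. Because the high-expansion row is open-ended ("5x or greater"), the upper figure for that profile is not a true ceiling.

```python
# Ratio ranges copied from the table above.
# Assumption: "high" uses 5.0 as its upper value, although the table
# says "5x or greater", so real results may exceed this estimate.
PROFILE_RATIOS = {
    "low": (1.3, 1.8),
    "moderate": (1.8, 2.5),
    "high": (2.5, 5.0),
}


def estimate_processed_gb(raw_gb: float, profile: str) -> tuple[float, float]:
    """Return the (low, high) estimated processed size in GB (hypothetical helper)."""
    if raw_gb < 0:
        raise ValueError("raw_gb must be non-negative")
    low_ratio, high_ratio = PROFILE_RATIOS[profile]
    return raw_gb * low_ratio, raw_gb * high_ratio


low, high = estimate_processed_gb(100, "moderate")
print(f"Estimated processed size: {low:.0f} GB to {high:.0f} GB")
```

For a 100 GB data set with the moderate profile, this gives a range of roughly 180 GB to 250 GB.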

## Expansion rates by file category / type

| File Category | File Types | Estimated Expansion Ratio |
| --- | --- | --- |
| Email | PST, OST, MSG, EML, MBOX | 1.5x to 3x |
| Office documents | DOC, DOCX, XLS, XLSX, PPT, PPTX, CSV, RTF, PDF, TXT | 1x to 1.5x |
| Images | JPG, PNG, TIFF | 1.5x to 4x |
| Archives | ZIP, RAR, 7Z | 2x to 10x |
| System files | HTML, LOG | 1x to 1.5x |
| Multimedia | MP4, MOV, MP3 | 1.2x† |
| Slack data | HTML | 5x to 10x (no attachments); 15x to 20x (with attachments) |
| Microsoft Teams data | HTML | 5x to 10x (no attachments); 15x to 20x (with attachments) |

† Expansion is due to the generated transcript.

> [!NOTE]
> Conversation files are rendered in HTML. Attachments within conversations are stored in their native file format — for example, a Slack or Teams message may include files such as DOCX, PDF, PNG, ZIP, or any other supported file type listed above.
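
For a data set whose composition by category is known, the per-category ratios can be combined into a single range. The following is a minimal sketch under stated assumptions: the dictionary keys are hypothetical short names for the table rows, the sample volumes are invented for illustration, and summing per-category estimates is one simple way to combine them rather than a documented method.

```python
# Ratio ranges copied from the per-category table; keys are hypothetical labels.
CATEGORY_RATIOS = {
    "email": (1.5, 3.0),                    # PST, OST, MSG, EML, MBOX
    "office": (1.0, 1.5),                   # DOC, DOCX, XLS, XLSX, PPT, PPTX, CSV, RTF, PDF, TXT
    "images": (1.5, 4.0),                   # JPG, PNG, TIFF
    "archives": (2.0, 10.0),                # ZIP, RAR, 7Z
    "system": (1.0, 1.5),                   # HTML, LOG
    "multimedia": (1.2, 1.2),               # MP4, MOV, MP3 (single value in the table)
    "chat": (5.0, 10.0),                    # Slack or Teams, no attachments
    "chat_with_attachments": (15.0, 20.0),  # Slack or Teams, with attachments
}


def estimate_mixed_gb(raw_gb_by_category: dict[str, float]) -> tuple[float, float]:
    """Sum the per-category low and high estimates for a mixed data set."""
    low = sum(gb * CATEGORY_RATIOS[cat][0] for cat, gb in raw_gb_by_category.items())
    high = sum(gb * CATEGORY_RATIOS[cat][1] for cat, gb in raw_gb_by_category.items())
    return low, high


# Invented sample volumes in GB, for illustration only.
sample = {"email": 40, "office": 30, "archives": 10, "chat": 2}
low, high = estimate_mixed_gb(sample)
raw_total = sum(sample.values())
print(f"Raw: {raw_total} GB, estimated processed: {low:.0f} GB to {high:.0f} GB")
print(f"Effective ratio: {low / raw_total:.2f}x to {high / raw_total:.2f}x")
```

Even small amounts of archive or chat data widen the range noticeably, which is why the composition of a data set matters more than its raw size alone.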
