Due Diligence · 9 min read

How to Read a Data Room: A Practical Guide for M&A Analysts

Most analysts are thrown into their first data room with no real framework for navigating it. Here's a systematic approach to reading a VDR that surfaces risk faster and leaves a defensible record.

Sritej Bommaraju

Founder, STET

Walking into a data room for the first time is disorienting. There are hundreds, sometimes thousands, of files organized according to the seller's internal logic, not yours. PDFs, Excel workbooks, scanned contracts, bank statements, and board minutes all sit in nested folders. The natural instinct is to start at the top and work down. That's a mistake.

Reading a data room efficiently isn't about reading everything. It's about building a map of the data room quickly, identifying the documents that carry the most risk signal, and structuring your review so that findings can be traced back to specific source documents. This guide walks through that process systematically.

Step 1: Establish the ledger anchor before opening any documents

Before you open a single PDF, find the financial ledger — usually the general ledger, a trial balance, or an accounts payable/receivable export. This is your anchor. Every document in the room exists to support a transaction, an obligation, or a balance that should be traceable back to this ledger. Without anchoring to it first, you'll review documents in a vacuum.

Download the ledger and open it separately. Scan the column structure: reference numbers, dates, descriptions, amounts, counterparty names. These are the five fields you'll be hunting for in every supporting document. Make note of the naming conventions used — do reference numbers look like 'INV-2024-00142' or '00142' or something else entirely? This will determine how you search later.
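One way to capture those naming conventions is to collapse every reference number into a "shape" and count how often each shape appears. The sketch below assumes the ledger has been exported to CSV with a column named 'reference'; the column name, the sample rows, and the helper names are all hypothetical.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical ledger export; a real one would be read from the downloaded file.
SAMPLE_LEDGER = """reference,date,description,amount,counterparty
INV-2024-00142,2024-03-01,Hosting,1250.00,Acme Cloud
INV-2024-00143,2024-03-04,Consulting,800.00,Birch Advisory
00144,2024-03-09,Office supplies,300.00,Corner Stationers
"""


def reference_shape(ref):
    """Collapse runs of letters to 'A' and runs of digits to '9'."""
    shape = re.sub(r"[A-Za-z]+", "A", ref.strip())
    return re.sub(r"\d+", "9", shape)


def summarize_reference_shapes(rows, column="reference"):
    """Count how many ledger rows use each reference-number shape."""
    return Counter(
        reference_shape(row[column]) for row in rows if row[column].strip()
    )


rows = list(csv.DictReader(io.StringIO(SAMPLE_LEDGER)))
shapes = summarize_reference_shapes(rows)

# Most common shapes first; rare shapes often mark a second numbering scheme.
for shape, count in shapes.most_common():
    print(shape, count)
```

A ledger that mixes shapes such as 'A-9-9' and '9' tells you that a single search pattern won't find every supporting document.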

Step 2: Categorize the document inventory before reviewing anything

Most analysts start reviewing documents immediately. Instead, spend 20 minutes building a document inventory. This means scanning the folder structure and cataloguing what types of documents exist, roughly how many of each type there are, and which folders appear to be complete versus sparse.

  • Financial documents: bank statements, AP/AR aging, trial balances, audited financials
  • Legal documents: material contracts, leases, IP assignments, corporate resolutions
  • Operational documents: payroll records, headcount data, vendor agreements
  • Tax documents: federal and state filings, transfer pricing studies, audit correspondence
  • HR/Employment: offer letters, equity schedules, PTO/benefits summaries

Note which categories appear under-populated relative to the deal size. A company with $50M in revenue that has only three vendor agreements in the data room has consolidated its contracts, hasn't finished populating the room, or is withholding something. Whichever it is, the absence of documents is itself a finding.
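The inventory pass can be roughed out in a few lines. This is a minimal sketch that assumes you already have a list of file paths from the VDR's index export; the keyword lists, the file names, and the 'minimum' threshold are illustrative assumptions, and keyword matching on file names is crude (it will misfile anything with an unhelpful name).

```python
from collections import Counter

# Hypothetical keyword rules, one tuple per category from the list above.
CATEGORY_KEYWORDS = {
    "financial": ("bank", "aging", "trial_balance", "audited"),
    "legal": ("contract", "lease", "assignment", "resolution"),
    "operational": ("payroll", "headcount", "vendor"),
    "tax": ("tax", "transfer_pricing"),
    "hr": ("offer", "equity", "benefits", "pto"),
}


def categorize(path):
    """Return the first category whose keyword appears in the path."""
    name = path.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in name for keyword in keywords):
            return category
    return "uncategorized"


def build_inventory(paths):
    """Count documents per category."""
    return Counter(categorize(path) for path in paths)


def sparse_categories(inventory, minimum=2):
    """List categories holding fewer than `minimum` documents."""
    return sorted(c for c in CATEGORY_KEYWORDS if inventory.get(c, 0) < minimum)


paths = [
    "Finance/bank_statement_2024_01.pdf",
    "Finance/ap_aging_q4.xlsx",
    "Legal/office_lease.pdf",
    "Legal/msa_contract_acme.pdf",
    "Ops/vendor_agreement_acme.pdf",
    "Tax/federal_tax_return_2023.pdf",
    "Misc/scan_0001.pdf",
]

inventory = build_inventory(paths)
print(inventory)
print("Sparse:", sparse_categories(inventory))
```

The 'uncategorized' bucket matters as much as the sparse list: a large one usually means scanned files with generic names that need a manual look.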

Step 3: Start with the documents that have the highest reconcilable density

Not all documents are equally information-dense for reconciliation purposes. A 200-page vendor agreement may contain one or two reference numbers and dollar amounts. A 3-page invoice contains a dozen. Bank statements are dense with transaction data but require normalization to match against a ledger. Prioritize by reconcilable density — how many matchable data points per page.

As a rule of thumb, roughly 20% of the documents in a data room contain about 80% of the reconcilable transaction data. Invoices, bank statements, expense reports, and payment confirmations are almost always in that top 20%. Board minutes and legal agreements almost always sit near the bottom of the ranking.

High-density documents to prioritize

  • Vendor invoices — reference numbers, line items, amounts, dates, payment terms
  • Bank statements — transaction IDs, counterparties, amounts, value dates
  • Expense reports — merchant names, dates, amounts, employee IDs
  • Payment confirmations — wire references, amounts, sending/receiving accounts
  • Purchase orders — PO numbers, line items, approval chains
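Reconcilable density can be estimated mechanically once text has been extracted from each file. The sketch below counts amounts, dates, and reference numbers per page and ranks documents by that ratio. The regular expressions assume US-style dollar amounts, ISO dates, and references shaped like 'INV-2024-00142'; all three patterns and the sample documents are assumptions you would adapt to the ledger in front of you.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only; adjust them to the conventions found in Step 1.
AMOUNT = re.compile(r"\$\d[\d,]*\.\d{2}")
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
REFERENCE = re.compile(r"\b[A-Z]{2,4}-\d{4}-\d{5}\b")


@dataclass
class Document:
    name: str
    pages: int
    text: str  # text already extracted from the file


def matchable_points(text):
    """Count amounts, dates, and reference numbers found in the text."""
    return sum(len(p.findall(text)) for p in (AMOUNT, DATE, REFERENCE))


def density(doc):
    """Matchable data points per page."""
    return matchable_points(doc.text) / max(doc.pages, 1)


def rank_by_density(docs):
    """Highest-density documents first."""
    return sorted(docs, key=density, reverse=True)


invoice = Document(
    "invoice_00142.pdf", 3,
    "INV-2024-00142 dated 2024-03-01 total $1,250.00 due 2024-03-31",
)
agreement = Document(
    "vendor_agreement_acme.pdf", 200,
    "Agreement effective 2023-01-15 with annual fee of $120,000.00",
)

for doc in rank_by_density([agreement, invoice]):
    print(f"{doc.name}: {density(doc):.2f} points per page")
```

The absolute numbers mean little; the ordering is what tells you where to spend the first day.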

Step 4: Build a discrepancy log from day one

The worst mistake analysts make in data room review is deferring the discrepancy log. They'll note something odd on page 42 of a vendor agreement, intend to come back to it, and then lose it in the volume. Start logging discrepancies as you find them, with enough specificity to return to the source later.

A good discrepancy log entry has five fields: the ledger row (or range of rows) it relates to, the source document file name and page number, the nature of the discrepancy (amount mismatch, missing document, duplicate entry, date inconsistency), the severity (high/medium/low), and a flag for whether it requires follow-up from the seller.
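A log entry like that is easy to pin down as a small record type, which keeps every entry complete and makes the log exportable. This is one possible layout, not a standard: the class and field names are hypothetical, and the source field is split into separate file and page columns so the log can be sorted by document.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from enum import Enum


class Nature(Enum):
    AMOUNT_MISMATCH = "amount_mismatch"
    MISSING_DOCUMENT = "missing_document"
    DUPLICATE_ENTRY = "duplicate_entry"
    DATE_INCONSISTENCY = "date_inconsistency"


class Severity(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass(frozen=True)
class DiscrepancyEntry:
    ledger_rows: str             # a single row ("118") or a range ("118-124")
    source_file: str             # file name as it appears in the data room
    source_page: int             # page within the source file
    nature: Nature
    severity: Severity
    needs_seller_followup: bool


def to_row(entry):
    """Flatten an entry into plain values suitable for CSV."""
    row = asdict(entry)
    row["nature"] = entry.nature.value
    row["severity"] = entry.severity.value
    return row


def write_log(entries, handle):
    """Write entries as CSV with a header row."""
    names = [f.name for f in fields(DiscrepancyEntry)]
    writer = csv.DictWriter(handle, fieldnames=names)
    writer.writeheader()
    for entry in entries:
        writer.writerow(to_row(entry))


entry = DiscrepancyEntry(
    ledger_rows="118-124",
    source_file="vendor_agreement_acme.pdf",
    source_page=42,
    nature=Nature.AMOUNT_MISMATCH,
    severity=Severity.HIGH,
    needs_seller_followup=True,
)

buffer = io.StringIO()
write_log([entry], buffer)
print(buffer.getvalue())
```

Using fixed enumerations for nature and severity keeps the log filterable later, when you need every high-severity amount mismatch in one view.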

Step 5: Cross-reference the document inventory against the ledger

Once you've completed an initial pass, run a cross-reference between your document inventory and the ledger's transaction list. The question you're answering: which ledger line items have no corresponding supporting document in the data room? These gaps are your first escalation list.

Be specific about what counts as 'corresponding.' An invoice that covers only part of a ledger line item doesn't fully support that entry. A bank statement that shows a payment went out doesn't verify what the payment was for. The quality of a match matters as much as its existence.
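To make match quality concrete, here is a minimal sketch that grades each ledger row instead of simply marking it matched or unmatched. It assumes records have already been extracted from the documents and share a reference number with the ledger; the record types, the four labels, and the sample data are hypothetical, and real matching would also need to handle normalized references, dates, and counterparties.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LedgerRow:
    reference: str
    amount: Decimal


@dataclass(frozen=True)
class ExtractedRecord:
    reference: str
    amount: Decimal
    doc_type: str      # e.g. "invoice" or "bank_statement"
    source_file: str


def classify_support(row, records):
    """Grade how well the extracted records support one ledger row."""
    matches = [r for r in records if r.reference == row.reference]
    if not matches:
        return "unsupported"       # no document at all
    invoices = [r for r in matches if r.doc_type == "invoice"]
    if not invoices:
        return "payment_only"      # money moved, purpose unverified
    supported = sum((r.amount for r in invoices), Decimal("0"))
    if supported == row.amount:
        return "full"
    return "partial" if supported < row.amount else "mismatch"


def escalation_list(ledger, records):
    """Return (reference, grade) for every row that is not fully supported."""
    graded = ((row.reference, classify_support(row, records)) for row in ledger)
    return [(ref, grade) for ref, grade in graded if grade != "full"]


ledger = [
    LedgerRow("INV-2024-00142", Decimal("1250.00")),
    LedgerRow("INV-2024-00143", Decimal("800.00")),
    LedgerRow("INV-2024-00144", Decimal("300.00")),
    LedgerRow("INV-2024-00145", Decimal("500.00")),
]
records = [
    ExtractedRecord("INV-2024-00142", Decimal("1250.00"), "invoice", "inv_142.pdf"),
    ExtractedRecord("INV-2024-00143", Decimal("500.00"), "invoice", "inv_143.pdf"),
    ExtractedRecord("INV-2024-00144", Decimal("300.00"), "bank_statement", "bank_2024_03.pdf"),
]

results = escalation_list(ledger, records)
for reference, grade in results:
    print(reference, grade)
```

Amounts are held as Decimal rather than float so that an exact-equality check on currency values behaves as expected.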

The automation layer that changes this process

Manual data room reading is still the industry standard, but the cross-referencing step — matching documents to ledger entries — is now automatable with high accuracy. Tools like STET run the document inventory, extract structured data from PDFs and spreadsheets, and match that data to ledger rows deterministically, flagging gaps and discrepancies automatically.

The result isn't that analysts stop reading data rooms — it's that the mechanical cross-referencing is done in minutes rather than days, and analysts spend their time on the discrepancies that actually require judgment, not on the 85% of entries that match exactly.

If you're spending more than a few hours on a reconciliation pass, the bottleneck is process, not document volume. The framework above cuts through that — and automation removes the bottleneck entirely.
