Documents

Overview

The Documents section is where you manage all your contract documents. Upload documents, organize them into matters, and access detailed review, comparison, and analytics tools.

Uploading documents

Drag and drop PDF or Word documents onto the document list, or click the upload area to browse. ContractRabbit automatically processes each document through a multi-stage pipeline:

Format conversion — Documents are converted into a rich structured format that preserves paragraph hierarchy, list numbering, table layouts, bold/italic formatting, and indentation — formatting details that carry legal significance
Preprocessing — The document is normalized: empty paragraphs are collapsed, split tables are merged, inconsistent list levels are unified, inline page numbers are stripped, and space-based indentation is converted to structural indent. This ensures consistent analysis regardless of how the document was authored
Clause identification — The document is broken into logical clauses and sections using a legal-aware parser that correctly handles abbreviations such as Inc., LLC, and L.P., section references such as Section 12.7, and semicolons in enumerated lists. These are cases where general-purpose NLP tools often produce incorrect splits
Multi-document detection — If the file contains multiple distinct legal documents (e.g., an NDA with attached exhibits and schedules), ContractRabbit automatically detects document boundaries using signature block patterns, exhibit markers, and structural analysis, then classifies each sub-document independently
Classification — Each clause is embedded as a 1536-dimensional vector and classified against your corpus using nearest-neighbor voting. The system learns from your existing labeled documents without explicit training
Attribute extraction — Two layers run in sequence: document-level extraction identifies parties, effective dates, governing law, duration, and other headline attributes; then clause-level extraction processes every clause to find specific dollar amounts, dates, durations, entities, citations, and locations
Scoring — Each clause is scored for party favorability, and aggregate scores are computed per document and across your corpus
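
To make the clause identification stage more concrete, here is a minimal sketch of sentence splitting that skips periods inside abbreviations and section numbers. It is not ContractRabbit's parser: the function name `split_sentences`, the abbreviation list, and the three rules are assumptions chosen for illustration, and semicolon handling in enumerated lists is left out.

```python
import re

# Hypothetical abbreviation list; a production parser would need a far larger one.
ABBREVIATIONS = {"inc.", "llc.", "l.p.", "corp.", "ltd.", "co.", "no."}


def split_sentences(text: str) -> list[str]:
    """Split on periods, except inside abbreviations and numbers such as 12.7."""
    sentences = []
    start = 0
    for match in re.finditer(r"\.", text):
        end = match.end()
        before = text[end - 2] if end >= 2 else ""
        after = text[end] if end < len(text) else ""
        # Rule 1: a period between two digits belongs to a section number.
        if before.isdigit() and after.isdigit():
            continue
        # Rule 2: the word ending at this period is a known abbreviation.
        words = text[start:end].split()
        token = words[-1].lower() if words else ""
        if token in ABBREVIATIONS:
            continue
        # Rule 3: a real boundary is followed by whitespace and a capital
        # letter (or an opening parenthesis or quote), or by the end of text.
        rest = text[end:]
        if rest and not re.match(r'\s+[A-Z("]', rest):
            continue
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


sample = (
    "Acme Inc. and Gamma L.P. agree as set out in Section 12.7 of the "
    "Agreement. Each party shall comply."
)
for sentence in split_sentences(sample):
    print(sentence)
```

One known limitation of this sketch: a sentence that genuinely ends with an abbreviation ("... signed by Acme Inc. The parties ...") is not split, which is the kind of case a dedicated legal parser has to resolve with more context.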

You can upload multiple documents at once. Processing status is shown on each document row with real-time progress notifications.

Document list

The main document list shows all your documents with sortable columns:

Name — Document filename
Document Type — Automatically classified (e.g., NDA, Service Agreement, Employment Agreement)
Party / Counterparty — Extracted parties with corporate enrichment data
Source — Where the document came from
Lifecycle Stage — Current stage (Drafting, Internal Review, etc.)
Created / Updated — Timestamps

Use the search bar and filters to narrow results by document type, party, counterparty, jurisdiction, effective date, or custom attributes.

Bulk operations

Select multiple documents to:

Delete documents
Reprocess documents (re-run the full extraction pipeline)
Update metadata
Extract parties

Document detail tabs

Click any document to open its detail view with the following tabs:

Review

Read through the clause-structured document with AI-powered analysis. Key features:

Version control — Select and compare different versions with a dropdown showing commit history
Change highlighting — Word-level diffs between versions
Document tabs — Navigate between multiple documents detected within the file (e.g., the main agreement vs. Exhibit A)
Alignment panel — Select a standard and generate clause-by-clause alignment recommendations that appear as inline tracked changes
Accept/reject workflow — Review each AI recommendation individually, with feedback that trains future alignment sessions
Export — Download the document as a clean final version or as a redline with tracked changes
Sections panel — Jump to specific sections using the hierarchical clause tree
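
As an illustration of the word-level change highlighting listed above, the following sketch computes a word-level diff with Python's standard `difflib` module. How ContractRabbit computes its diffs is not documented here; the function names `word_diff` and `render_redline` and the bracket markup are assumptions for this example only.

```python
import difflib


def word_diff(old: str, new: str) -> list[tuple[str, str]]:
    """Return (op, text) pairs where op is 'equal', 'delete', or 'insert'."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            changes.append(("equal", " ".join(old_words[i1:i2])))
            continue
        # A 'replace' opcode is reported as a deletion followed by an insertion.
        if i2 > i1:
            changes.append(("delete", " ".join(old_words[i1:i2])))
        if j2 > j1:
            changes.append(("insert", " ".join(new_words[j1:j2])))
    return changes


def render_redline(changes: list[tuple[str, str]]) -> str:
    """Render changes as plain text: [-deleted-] and {+inserted+}."""
    parts = []
    for op, text in changes:
        if op == "equal":
            parts.append(text)
        elif op == "delete":
            parts.append(f"[-{text}-]")
        else:
            parts.append(f"{{+{text}+}}")
    return " ".join(parts)


old_version = "Either party may terminate within 30 days"
new_version = "Either party may terminate within 60 days"
print(render_redline(word_diff(old_version, new_version)))
```

Splitting on whitespace keeps the example short; punctuation attached to a word is treated as part of that word, so a real redline would likely tokenize more finely.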

Metadata

View extracted metadata and corpus scoring:

Corpus scoring — Quadrant charts showing party vs. counterparty favorability
Radar analysis — Drill down into detailed score breakdowns by clause category
Key metrics — Summary cards with important document data points (effective date, duration, governing law, etc.)

Compare

Compare the current document against others in your corpus:

Side-by-side cohort comparison
Filter by attributes to find similar documents
See how terms stack up across your portfolio

Analytics

Visual analytics for the document and related documents:

Timeline charts — Effective date distribution and temporal trends
Jurisdiction analysis — Geographic distribution of governing law and forums
Dynamic filtering to explore different dimensions

Edit

Full document editing capabilities:

Rich text editor with change tracking (Track Changes or Direct Edit mode, configurable per team or per matter)
Section-based navigation with drag-and-drop reordering
Version history with restore capability
Changes panel for reviewing modifications before saving

Timeline

Complete audit trail of the document's lifecycle:

Stage transitions with timestamps
Who made each change and their role
Visual progression through the workflow

How clause classification works

When ContractRabbit processes a document, every clause and section is embedded as a high-dimensional vector. These embeddings are compared against cluster centroids — average vectors computed from all previously classified clauses of each type. The nearest centroid determines the clause label.

This means classification improves over time: as you process more documents, centroids become more representative and new documents are classified more accurately. There is no manual training step — the system learns continuously from your corpus.
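
The centroid approach described above can be sketched in a few lines. This is an illustrative model, not ContractRabbit's implementation: the class name `CentroidClassifier`, the use of cosine similarity as the distance measure, and the three-dimensional toy vectors (standing in for real clause embeddings) are all assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CentroidClassifier:
    """Keeps one running-mean vector per label; no separate training step."""

    def __init__(self) -> None:
        self.centroids: dict[str, list[float]] = {}
        self.counts: dict[str, int] = {}

    def add(self, label: str, vector: list[float]) -> None:
        """Fold one labeled embedding into the centroid for its label."""
        n = self.counts.get(label, 0)
        if n == 0:
            self.centroids[label] = list(vector)
        else:
            current = self.centroids[label]
            # Incremental mean: new = old + (x - old) / (n + 1)
            self.centroids[label] = [
                c + (v - c) / (n + 1) for c, v in zip(current, vector)
            ]
        self.counts[label] = n + 1

    def classify(self, vector: list[float]) -> str:
        """Return the label whose centroid is most similar to the vector."""
        return max(
            self.centroids,
            key=lambda label: cosine(vector, self.centroids[label]),
        )


clf = CentroidClassifier()
clf.add("termination", [1.0, 0.0, 0.0])
clf.add("termination", [0.8, 0.2, 0.0])
clf.add("indemnity", [0.0, 1.0, 0.0])
print(clf.classify([0.7, 0.3, 0.0]))
```

Because each centroid is a running mean, adding a newly classified clause shifts only its own label's centroid, which is one way the "improves as the corpus grows" behavior could be realized without a retraining step.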

For complex clauses with enumerated sub-items, ContractRabbit decomposes them into parent clauses and child subclauses. Each subclause inherits context from its prefix (e.g., "Each party shall:") and is classified independently. If all subclauses receive the same label as the parent, they are collapsed back to avoid redundancy.
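
The decompose-then-collapse logic can be sketched as follows. The function `classify_with_subclauses`, the dictionary result shape, and the keyword-based `toy_classify` stand-in are hypothetical; in the real pipeline the classifier would operate on clause embeddings rather than keywords.

```python
from typing import Callable


def classify_with_subclauses(
    prefix: str,
    items: list[str],
    parent_label: str,
    classify: Callable[[str], str],
) -> dict:
    """Classify each enumerated item with its prefix; collapse if nothing differs."""
    # Each subclause is classified together with the shared prefix for context.
    child_labels = [classify(f"{prefix} {item}") for item in items]
    if all(label == parent_label for label in child_labels):
        # Every child agrees with the parent: keep only the parent clause.
        return {"label": parent_label, "children": []}
    return {"label": parent_label, "children": list(zip(items, child_labels))}


def toy_classify(text: str) -> str:
    """Keyword stand-in for an embedding-based classifier (illustration only)."""
    lowered = text.lower()
    if "indemnify" in lowered:
        return "Indemnification"
    if "confidential" in lowered:
        return "Confidentiality"
    return "General Obligations"


result = classify_with_subclauses(
    prefix="Each party shall:",
    items=[
        "(a) keep all Confidential Information secret;",
        "(b) indemnify the other party against third-party claims.",
    ],
    parent_label="Confidentiality",
    classify=toy_classify,
)
print(result["children"])
```

In this example the second item receives a different label from the parent, so both subclauses are kept; if both items had been labeled Confidentiality, `children` would be empty.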

Hybrid search

When you search across documents, ContractRabbit combines five signals in a single query:

Lexical (BM25) — Traditional keyword matching with legal-aware normalization (e.g., § maps to "section")
Vector (cosine similarity) — Semantic matching using clause embeddings, so "terminate the agreement" matches "end the contract"
Taxonomy label — Matches clauses classified under a specific node in the clause taxonomy
Structured attributes — Filters on extracted dates, amounts, durations, etc.
Party favorability — Weights results by how favorable they are to a given party

Different search profiles weight these signals differently — discovery mode emphasizes breadth, while favorability mode emphasizes party scoring.
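
One simple way to picture profile-based weighting is a weighted sum over the five signals. The sketch below is an assumption-heavy illustration: the weight values, the profile dictionary, and the function names `hybrid_score` and `rank` are invented for this example, each signal is assumed to be already normalized to the range 0 to 1, and structured attributes are simplified to a 0/1 match score rather than a hard filter.

```python
SIGNALS = ("lexical", "vector", "taxonomy", "attributes", "favorability")

# Hypothetical weights; the actual profile values are not documented here.
PROFILES = {
    "discovery": {
        "lexical": 0.30, "vector": 0.40, "taxonomy": 0.15,
        "attributes": 0.10, "favorability": 0.05,
    },
    "favorability": {
        "lexical": 0.15, "vector": 0.20, "taxonomy": 0.10,
        "attributes": 0.05, "favorability": 0.50,
    },
}


def hybrid_score(signals: dict[str, float], profile: str) -> float:
    """Weighted sum of normalized signal scores under the given profile."""
    weights = PROFILES[profile]
    return sum(weights[name] * signals.get(name, 0.0) for name in SIGNALS)


def rank(candidates: dict[str, dict[str, float]], profile: str) -> list[str]:
    """Return candidate ids ordered from highest to lowest hybrid score."""
    return sorted(
        candidates,
        key=lambda cid: hybrid_score(candidates[cid], profile),
        reverse=True,
    )


candidates = {
    "clause_a": {"lexical": 0.9, "vector": 0.8, "taxonomy": 1.0,
                 "attributes": 1.0, "favorability": 0.1},
    "clause_b": {"lexical": 0.4, "vector": 0.5, "taxonomy": 1.0,
                 "attributes": 0.0, "favorability": 0.9},
}
print(rank(candidates, "discovery"))
print(rank(candidates, "favorability"))
```

With these made-up numbers, the strong keyword and semantic match ranks first under the discovery profile, while the clause with the higher favorability score ranks first under the favorability profile.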
