You can't control the input. You can control the output.
Input (chaos) → AI → output (your standard)
Input formats: PDF, email, photo, XLSX, scan, form

The system has three parts

Part 1: Define Your Output
Part 2: Set Extraction Rules
Part 3: Build & Test

Turn your template into a field list

Field | Type | Status
Vendor Name | text | required
Invoice Date | date | required
Invoice Number | text | required
Line Items | text | required
Subtotal | dollar | required
Tax | dollar | optional
Total | dollar | required
Payment Terms | text | optional
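
If the field list is later used outside the chat window, it can also be kept as data. The sketch below is one possible representation, not part of the original workflow: the `Field` class and the `missing_required` helper are hypothetical names, and the entries simply mirror the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    type: str        # "text", "date", or "dollar", as in the field list
    required: bool


# Mirrors the field list above; adjust to your own template.
FIELDS = [
    Field("Vendor Name", "text", True),
    Field("Invoice Date", "date", True),
    Field("Invoice Number", "text", True),
    Field("Line Items", "text", True),
    Field("Subtotal", "dollar", True),
    Field("Tax", "dollar", False),
    Field("Total", "dollar", True),
    Field("Payment Terms", "text", False),
]


def missing_required(extracted: dict[str, str]) -> list[str]:
    """Return the required field names that are absent or blank."""
    return [
        f.name
        for f in FIELDS
        if f.required and not extracted.get(f.name, "").strip()
    ]
```

Calling `missing_required({"Vendor Name": "Acme Corp"})` would list every other required field, which makes gaps visible before a document is accepted.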

No clean template yet? Upload 3–5 completed documents and use this:

I'm uploading completed documents that represent my standard output. Analyze them and extract every data field that appears across all of them. For each field, note the data type and whether it appears in all documents or just some. Output this as a simple template I can use as my extraction standard.
Without rules:
Vendor Name | Acme Corp | ✓ correct
Invoice Date | March 15, 2025 | ✓ correct
Tax Rate | 8.25% | ✗ guessed
Payment Terms | Net 30 | ✗ guessed
AI fills in blanks with confident guesses.
1. Grounding: The document is your only source of truth. Look nowhere else.
2. Change the Incentive: A blank answer is better than a wrong answer. Leave it empty when unsure.
3. Safety Net: Show your work. Include the exact quote that supports every value.
You are a document data extraction specialist.

[Paste your field list here: every field, data type, required/optional]

1. Base your extraction only on the uploaded document. The document is your sole source of truth.
2. A wrong answer is 3x worse than a blank answer. When in doubt, leave the field blank and explain why.
3. For every extracted value, include the exact quote from the document that supports it. If you cannot point to the exact words, leave the field blank.

Return a table with four columns: Field | Value | Source Quote | Status

Status options:
- EXTRACTED: found word for word in the document
- INFERRED: derived or calculated from document content (needs review)
- MISSING: not found in the document
- AMBIGUOUS: multiple possible values found (list all candidates)

After the table, provide a summary:
- Total fields
- How many extracted, inferred, missing, ambiguous
- List every missing and ambiguous field by name

Documents come from multiple clients in different formats. Layouts and labels will vary. The underlying data points are consistent.
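
The source-quote rule can also be spot-checked mechanically once you have the document's text. The following is a minimal sketch, assuming the text has already been extracted into a plain string; `normalize`, `quote_is_grounded`, and `unsupported_fields` are hypothetical helper names, not part of any tool mentioned here.

```python
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace and lowercase the text, so that line
    breaks inside the document do not cause false misses."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_is_grounded(quote: str, document_text: str) -> bool:
    """True only if the (non-empty) quote appears in the document text."""
    q = normalize(quote)
    return bool(q) and q in normalize(document_text)


def unsupported_fields(quotes: dict[str, str], document_text: str) -> list[str]:
    """Names of fields whose source quote cannot be found in the document."""
    return [
        field
        for field, quote in quotes.items()
        if not quote_is_grounded(quote, document_text)
    ]
```

Any field this returns is a candidate for manual review: either the quote was paraphrased or the value was guessed. The check is deliberately strict and only tolerates differences in whitespace and letter case.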

The audit table tells you everything

Field | Value | Source Quote | Status
Vendor Name | Acme Corp | "Bill To: Acme Corp LLC" | EXTRACTED
Invoice Date | 03/15/2025 | "Date: March 15, 2025" | EXTRACTED
Tax | (blank) | No tax line found | MISSING
Payment Terms | Net 30 / Net 45 | "Terms: Net 30" and "Pay within 45 days" | AMBIGUOUS
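
When many documents go through the same prompt, the audit table can be tallied with a few lines of code. This is a sketch under one assumption: the model returns the table as pipe-separated rows in the four-column order the prompt asks for. The function names are illustrative.

```python
from collections import Counter

STATUSES = ("EXTRACTED", "INFERRED", "MISSING", "AMBIGUOUS")


def parse_audit_table(table_text: str) -> list[dict[str, str]]:
    """Parse 'Field | Value | Source Quote | Status' rows into dicts.
    The header row and any malformed line are skipped."""
    rows = []
    for line in table_text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 4 or cells[3] not in STATUSES:
            continue
        rows.append(dict(zip(("field", "value", "quote", "status"), cells)))
    return rows


def summarize(rows: list[dict[str, str]]) -> dict:
    """Count rows per status and list the fields that need a human look."""
    counts = Counter(row["status"] for row in rows)
    review = [row["field"] for row in rows if row["status"] != "EXTRACTED"]
    return {"total": len(rows), "counts": dict(counts), "needs_review": review}
```

Run on the example table above, `summarize` would flag Tax and Payment Terms for review. A value that itself contains a pipe character would break this simple split, so treat skipped rows as a prompt to look at the raw output.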

Set up your project

1. Create a new project
2. Paste the system prompt
3. Upload 2–3 reference examples
4. Test with 5–8 real documents
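
For the test step, it helps to compare what the AI extracted against values you have checked by hand. The sketch below assumes you keep both as nested dictionaries keyed by document name and field name; `score_run` is a hypothetical helper and the structure is only one way to organize a test run.

```python
def score_run(
    expected: dict[str, dict[str, str]],
    extracted: dict[str, dict[str, str]],
) -> tuple[int, int, list[tuple[str, str, str, str]]]:
    """Compare hand-checked values with extracted ones.

    Returns (correct, total, mismatches), where each mismatch is
    (document, field, expected_value, extracted_value).
    Leading and trailing whitespace is ignored; a field the AI left out
    counts as blank.
    """
    mismatches = []
    for doc, fields in expected.items():
        got = extracted.get(doc, {})
        for field, want in fields.items():
            have = got.get(field, "")
            if have.strip() != want.strip():
                mismatches.append((doc, field, want, have))
    total = sum(len(fields) for fields in expected.values())
    return total - len(mismatches), total, mismatches
```

Mismatches that cluster on one field or one client's layout usually point to a rule that needs tightening rather than a one-off error.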

From audit table to filled template

Your blank template → AI reverse-engineers the formatting → becomes a skill
Now the flow is: client document → funnel → audit table → skill fills your template

Start in the browser. Move to desktop when you need to.

Browser:
- 8–10 files per session
- You track errors manually
- You review and update rules

Desktop:
- Process entire folders
- AI logs errors to a file
- AI spots patterns and suggests fixes
- Connects to CRM, database, shared drive
The self-improving loop: AI extracts data → errors logged to EDGE-CASES.MD → you trigger a review → AI spots patterns → instructions updated
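
The error log in this loop can be as simple as one line per problem. The following is an illustrative sketch only: the line format, the function names, and the idea of counting repeats per field are assumptions, and the log path is whatever file you choose (for example the EDGE-CASES.MD file named above).

```python
from collections import Counter
from datetime import date
from pathlib import Path


def log_edge_case(log_path, document, field, problem, today=None):
    """Append one line to the log: date | document | field | problem."""
    stamp = (today or date.today()).isoformat()
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(f"- {stamp} | {document} | {field} | {problem}\n")


def recurring_fields(log_path, min_count=2):
    """Return (field, count) pairs for fields logged at least min_count
    times, most frequent first. Lines that do not match the format are
    ignored."""
    counts = Counter()
    for line in Path(log_path).read_text(encoding="utf-8").splitlines():
        parts = [p.strip() for p in line.lstrip("- ").split("|")]
        if len(parts) == 4:
            counts[parts[2]] += 1
    return [(f, n) for f, n in counts.most_common() if n >= min_count]
```

A field that keeps reappearing in `recurring_fields` is a good candidate for a new or sharper rule at the next review. Problem descriptions should avoid the pipe character, since the sketch splits on it.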
Output: Define every field AI needs to find
Rules: Ground it, change the incentive, force evidence
Project: Build, test with variety, add a template skill if needed
Desktop: Start in browser, then go desktop for volume, integrations, and self-improvement