Error-handling built in · Part of the PDF API

Repair PDF API

When a PDF won't open, fails validation, or crashes your pipeline with "unexpected EOF" or "xref not found," you don't need another retry loop. You need a repair-first workflow. This API rebuilds broken structures (like the XREF table) and fixes missing EOF markers so you can recover the document and keep your app moving.

DEV OPS QA

Processing 50,000+ PDFs/day

No credit card required · Rebuild XREF, fix EOF markers

Repair + Validate

Fast path
# cURL
curl -X POST "https://api.xspdf.com/v1/pdf/repair" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input_url": "https://files.example.com/broken.pdf",
    "options": {
      "rebuild_xref": true,
      "fix_eof": true,
      "strict_validation": true
    }
  }'
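
The same call can be sketched in Python using only the standard library. The endpoint, headers, and option names are copied from the cURL example above; the function names and the 30-second timeout are assumptions made for illustration, not part of a published client.

```python
import json
import urllib.request

# Endpoint taken from the cURL example above.
REPAIR_ENDPOINT = "https://api.xspdf.com/v1/pdf/repair"


def build_repair_request(api_key: str, input_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the cURL example."""
    payload = {
        "input_url": input_url,
        "options": {
            "rebuild_xref": True,
            "fix_eof": True,
            "strict_validation": True,
        },
    }
    return urllib.request.Request(
        REPAIR_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def repair_pdf(api_key: str, input_url: str, timeout: float = 30.0) -> dict:
    """Send the request and return the parsed JSON repair report.

    The timeout value is an assumption; tune it for your pipeline.
    """
    request = build_repair_request(api_key, input_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to unit-test without network access.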

Repair time

0.8–2.4s

Recovery

97.3%

Incidents

-41%

Get actionable repair reports. See Corrupted Files for common patterns.

128M+

PDF pages processed

47 min

saved per engineer/day

99.95%

API uptime (90-day)

SOC 2

controls aligned

You know the feeling when a "simple PDF upload" turns into an incident.

The file arrives. Your parser throws "xref table not found". A viewer says "There was an error processing a page". Your job queue retries until it times out. Support asks for an ETA. The customer asks if you "lost their documents."

Errors that don't reproduce

The PDF "opens on my machine," but fails in your headless pipeline.

Silent data loss risk

Some "fixes" render a PDF viewable but break links, forms, or object references.

Retries aren't recovery

Error handling can't be "try again later" when the file is structurally broken.

The hidden cost

Every corrupted upload that bounces increases churn risk. Users don't blame "PDF internals." They blame your product. Don't let a broken EOF marker cost you renewals.

There's a better way: repair, then process—with proof.

A reliable API for fixing corrupted PDFs shouldn't "magically work"; it should be explicit about what it changed. Our Repair PDF API focuses on structural repairs that directly reduce parser failures: rebuilding the XREF table, repairing broken EOF markers, and normalizing object offsets so readers can locate what they need.

1) Rebuild the XREF table

The XREF maps object numbers to byte offsets. If it's corrupted, many readers can't reliably locate objects. We rescan the file, recompute offsets, and write a consistent cross-reference structure.
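
To make the idea concrete, here is a minimal sketch of the rescanning step, not the API's actual implementation. It assumes a classic (non-compressed) PDF in which each object begins with an `N G obj` header at the start of a line, and it ignores object streams and any header-like bytes that happen to appear inside stream data. The function names are hypothetical.

```python
import re

# "<object number> <generation> obj" at the start of the file or of a line.
# The lookbehind accepts both CR and LF line endings.
OBJ_HEADER = re.compile(rb"(?<![^\r\n])(\d+)[ \t]+(\d+)[ \t]+obj\b")


def scan_object_offsets(data: bytes) -> dict:
    """Map (object number, generation) to the byte offset of its header.

    If an object is defined more than once (for example after an
    incremental update), the last definition in the file wins.
    """
    offsets = {}
    for match in OBJ_HEADER.finditer(data):
        key = (int(match.group(1)), int(match.group(2)))
        offsets[key] = match.start(1)
    return offsets


def format_xref_entry(offset: int, generation: int) -> bytes:
    """Format one in-use entry of a classic cross-reference table.

    Each entry is exactly 20 bytes: a 10-digit offset, a 5-digit
    generation number, the keyword "n", and a 2-byte end-of-line.
    """
    return b"%010d %05d n \n" % (offset, generation)
```

A real repair also has to rewrite the trailer and `startxref` value so they point at the new table; this sketch covers only the offset recomputation.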

2) Fix broken EOF markers

Many pipelines fail when %%EOF is missing, duplicated, or displaced. We validate end-of-file structure and repair markers so consumers can properly detect the document boundary.
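
A cheap local pre-check along these lines can be sketched as follows. This is an illustrative heuristic rather than the API's validation logic: the 1024-byte tail window and the returned field names are assumptions chosen for the example.

```python
import re


def inspect_eof(data: bytes, window: int = 1024) -> dict:
    """Inspect the end-of-file structure of a PDF held in memory.

    Looks only at the last `window` bytes (an assumed size) for the
    %%EOF marker and the `startxref` offset that should precede it.
    """
    tail = data[-window:]
    # Ignore trailing whitespace and NUL padding when testing the ending.
    trimmed = data.rstrip(b"\r\n\t \x00")
    found = list(re.finditer(rb"startxref\s+(\d+)", tail))
    startxref = int(found[-1].group(1)) if found else None
    return {
        "eof_in_tail": b"%%EOF" in tail,
        "ends_with_eof": trimmed.endswith(b"%%EOF"),
        "startxref": startxref,
        # The offset must at least point inside the file to be usable.
        "startxref_in_range": startxref is not None and startxref < len(data),
    }
```

A file that fails `ends_with_eof` or `startxref_in_range` is a reasonable candidate to send for repair before any downstream parsing.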

3) Produce a repair report

Your app can branch on outcomes: "repaired," "repaired with warnings," or "unrecoverable." That means fewer manual escalations and cleaner support resolution notes.

Want to understand root causes? Read Corrupted Files for common patterns.

Error-handling response

JSON
{
  "status": "repaired_with_warnings",
  "input": {
    "bytes": 1843021,
    "sha256": "9f3d…c1b2"
  },
  "repairs": [
    {
      "type": "xref_rebuilt",
      "objects_indexed": 842
    },
    {
      "type": "eof_marker_fixed",
      "details": "missing %%EOF appended"
    }
  ],
  "validation": {
    "is_openable": true,
    "is_structurally_sound": true,
    "warnings": [
      "object_stream_recovered",
      "trailer_prev_chain_fixed"
    ]
  },
  "output": {
    "pdf_url": "https://cdn.xspdf.com/repaired/abc123.pdf",
    "bytes": 1849110
  }
}

Tip: route repaired_with_warnings to a "soft-fail" queue, and alert only on unrecoverable.
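
One way to apply this tip is a small routing function over the report. The status values and field names come from the sample response above; the queue names (`process`, `soft_fail`, `failed`) and the decision to treat unknown statuses as failures are assumptions for illustration.

```python
# Hypothetical queue names; only the status strings come from the report.
ROUTES = {
    "repaired": ("process", False),
    "repaired_with_warnings": ("soft_fail", False),
    "unrecoverable": ("failed", True),
}


def route_report(report: dict) -> dict:
    """Decide where a repaired file goes and whether to raise an alert."""
    status = report.get("status")
    # Unknown or missing statuses are handled like failures (assumption).
    queue, alert = ROUTES.get(status, ("failed", True))
    return {
        "queue": queue,
        "alert": alert,
        # Flatten repair types and warnings into fields suited to logs.
        "repairs": [item["type"] for item in report.get("repairs", [])],
        "warnings": report.get("validation", {}).get("warnings", []),
    }
```

Applied to the sample response above, this yields the `soft_fail` queue with no alert, plus the two repair types and two warnings as flat lists.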

Predictable latency

Fast-path detection avoids expensive work when the file is already healthy.

Actionable logs

Structured repair reasons make dashboards and alerts actually useful.

What you get (outcomes you'll actually notice)

This isn't just "a PDF fixer." It's a calmer on-call rotation, fewer support escalations, and a pipeline that keeps moving even when inputs don't behave.

Fewer "can't open file" dead-ends

Repair common structural breaks so PDFs reliably open across viewers and libraries.

Cleaner downstream conversions

Stop feeding corrupted inputs into OCR, extraction, or rendering steps that choke unpredictably.

Deterministic error handling

Choose policies like "repair → validate → continue" and reduce retry storms.

Auditable repair trail

Know exactly what changed—ideal for regulated workflows and internal QA.

Performance without fragile hacks

Skip manual "open and re-save" steps and fix documents programmatically at scale.

Safer user experience

Don't bounce users with cryptic PDF errors—return a fixed file or a clear reason why not.

FAQ: Repair PDF API (the questions that actually matter)

If you're integrating this into a production pipeline, you're right to be skeptical. Here's how it behaves under real failure modes.

What kinds of corruption can you fix (and what can't you)?

We can repair structural issues like a missing/incorrect XREF table, broken trailer chains, and missing or malformed EOF markers—common causes of "xref not found" and "unexpected EOF." We can also recover many object references by rescanning and rebuilding indexes.

What we can't do is reconstruct content that was never uploaded (e.g., the file is truly truncated mid-stream). In those cases, the API returns an unrecoverable status with a clear reason so your error handling can fall back to re-upload. If you want a deeper breakdown of root causes, see our article on Corrupted Files.

Will repairing change the visual appearance of the PDF?

The goal is structural correctness, not re-rendering. Most repairs are metadata/index-level (XREF/trailer/EOF), which should not change how pages look. When a file is partially recoverable, we'll explicitly mark warnings in the repair report so you can decide whether to accept the output or request a re-upload.

How should I wire this into my pipeline for error handling?

The recommended pattern: when a PDF upload or processing step fails, send it to the Repair API before retrying. Branch on the response status:

  • repaired → proceed with downstream processing
  • repaired_with_warnings → log for review, but continue
  • unrecoverable → notify user to re-upload or escalate to support

This approach prevents retry loops and gives you deterministic error handling.
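
The pattern above can be sketched as a wrapper that attempts repair exactly once. The `process` and `repair` callables and the `PdfProcessingError` exception are hypothetical stand-ins for your own processing step and your Repair API client; only the status values and the `output.pdf_url` field come from the sample response.

```python
from typing import Any, Callable


class PdfProcessingError(Exception):
    """Raised by the (hypothetical) processing step on a broken PDF."""


def process_with_repair(
    pdf_url: str,
    process: Callable[[str], Any],
    repair: Callable[[str], dict],
) -> dict:
    """Process a PDF, attempting one repair if the first try fails."""
    try:
        return {"outcome": "processed", "result": process(pdf_url)}
    except PdfProcessingError:
        pass  # fall through to a single repair attempt instead of retrying

    report = repair(pdf_url)
    status = report.get("status")
    if status in ("repaired", "repaired_with_warnings"):
        # A failure here propagates to the caller; there is no retry loop.
        result = process(report["output"]["pdf_url"])
        return {
            "outcome": "processed_after_repair",
            "needs_review": status == "repaired_with_warnings",
            "result": result,
        }
    # "unrecoverable" or any unexpected status: ask for a re-upload.
    return {"outcome": "reupload_required", "report": report}
```

Because the repair step runs at most once per file, a structurally broken input costs one extra request rather than a queue full of retries.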

Can this fix password-protected or encrypted PDFs?

Yes, but you must provide the password. If you need to remove password protection entirely, use our Unlock PDF API first, then repair if needed.

Stop losing documents to "xref not found" errors. Repair PDFs programmatically.

Rebuild XREF tables, fix EOF markers, and get actionable repair reports. Turn pipeline failures into deterministic error handling. Start free—no credit card required.

Read: Corrupted Files

Also known as:

  • Repair PDF API
  • Fix corrupted PDF API
  • Recover PDF API
  • PDF validation & repair
  • XREF table rebuilder