What Is a PDF? The Format Explained in Plain Terms

PDF stands for Portable Document Format. It was created by Adobe in 1993 and became an open standard (ISO 32000) in 2008. The "portable" in the name is the whole point: a PDF is designed to look exactly the same on every device, every operating system, and every printer, regardless of what fonts or software the viewer has installed.

Understanding how PDFs work explains why certain editing operations are easy, others are hard, and why "just change that one word" often requires going back to the source file.

What's Actually Inside a PDF

A PDF file is a container. Open one in a hex editor and you'll find a mix of:

Page descriptions written in PDF's own drawing language, which is derived from PostScript's imaging model but leaves out its programming constructs. These instructions say things like "draw a rectangle here," "place this image there," "render this text at this position in this font at this size."
Embedded resources: fonts (or subsets of fonts), images (stored as JPEG data or as compressed raw pixels; a PNG is re-encoded on import rather than embedded as-is), color profiles, and metadata.
A cross-reference table that maps object numbers to byte offsets, so a viewer can jump directly to any page without reading the whole file.
Optional features: interactive form fields (AcroForms), JavaScript, digital signatures, embedded file attachments, bookmarks, and annotations.
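
To make the container structure concrete, here is a minimal sketch that assembles a one-page PDF by hand using only Python's standard library. The function name build_minimal_pdf, the sample text, and the choice of a referenced (non-embedded) Helvetica font are assumptions made for illustration; real producers usually compress streams and embed fonts, and this sketch does not escape parentheses in the text.

```python
def build_minimal_pdf(text: str = "Hello, PDF") -> bytes:
    """Assemble a one-page PDF by hand: header, objects, xref table, trailer."""
    # Page description: begin text, select font F1 at 24pt,
    # move to (72, 720), show the string, end text.
    content = f"BT /F1 24 Tf 72 720 Td ({text}) Tj ET".encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        # A font referenced by name only, to keep the sketch short.
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where this object starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: one fixed-width, 20-byte entry per object.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 heads the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

Writing the returned bytes to a file such as minimal.pdf should produce something most viewers will open. Looking at the same bytes in a text editor shows the page description, the font resource, and the cross-reference table side by side.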

The format was designed for rendering, not editing. A PDF viewer reads the page description instructions and draws the page exactly as specified. This is why a PDF looks identical in Acrobat, macOS Preview, a Chrome browser tab, a Linux PDF viewer, and a printer — they're all following the same instructions.

Why PDFs Are Hard to Edit

Imagine you have a recipe printed on paper. You can read it, you can write notes on the margins, you can cut it up and rearrange sections on a table. But you can't reach inside the printed ink and retype a word. You'd have to go back to the original Word document, change the word, and reprint.

PDFs are similar. The text on a PDF page is encoded as a sequence of drawing instructions: "select this font at 12 points, move to coordinates (72, 680), show the characters 'Hello'," and so on. A single instruction often covers only part of a line, and some producers position every glyph individually. The characters aren't in a word processor paragraph you can click into and edit; they're a set of positioning commands.
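
As a rough illustration of what those drawing instructions look like, the sketch below pulls the string operands out of a small hand-written content stream. Tj and TJ are the PDF operators that show text; the sample stream and the helper name shown_text are assumptions for this example, and the regular expression ignores escaped parentheses and hex strings that real files can contain.

```python
import re

# A hand-written content stream (an assumed example, not from a real file).
CONTENT_STREAM = b"""
BT
/F1 12 Tf
72 680 Td
(Hello) Tj
0 -14 Td
[(W) 80 (orld)] TJ
ET
"""


def shown_text(stream: bytes) -> list[str]:
    """Return the string operands of Tj and TJ operators, in stream order."""
    results = []
    # Match "(...) Tj" or "[ ... ] TJ". The numbers inside a TJ array are
    # spacing adjustments between glyph runs, not characters.
    for match in re.finditer(rb"\(([^()]*)\)\s*Tj|\[([^\]]*)\]\s*TJ", stream):
        if match.group(1) is not None:
            results.append(match.group(1).decode("latin-1"))
        else:
            pieces = re.findall(rb"\(([^()]*)\)", match.group(2))
            results.append(b"".join(pieces).decode("latin-1"))
    return results


print(shown_text(CONTENT_STREAM))  # ['Hello', 'World']
```

Nothing in the stream says that "Hello" and "World" belong to the same paragraph, or even that "W" and "orld" form one word. That missing structure is what an editing tool has to guess.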

When an "edit PDF" tool changes text in an existing PDF, it's doing something clever but imperfect: it finds the relevant drawing instructions, replaces the character sequence, and tries to re-render the surrounding text. This works reasonably well for short edits in simple documents. It breaks badly in complex layouts because the tool doesn't have access to the original spacing rules, paragraph styles, or layout logic.

This is why the best way to edit the text of a PDF is to edit the source document (the Word file, the InDesign file, the web page) and regenerate the PDF. When that source file is unavailable or the edit is minor, tools like Acrobat can do a passable job on simple text corrections.

The Difference Between Text PDFs and Scanned PDFs

A text PDF (also called a "native PDF") was created by a PDF printer or exported from software like Word, InDesign, or Google Docs. It contains actual text content: characters that can be copied, searched, and selected. If you can click-drag to highlight text in a PDF and copy it, you're looking at a text PDF.

A scanned PDF is a collection of images. Someone put paper documents through a scanner, and the scanner produced one JPEG per page, assembled into a PDF container. There are no text characters — only pixels. You can't search or copy text from a scanned PDF because, to the software, there is no text. There are just colored pixels that happen to look like letters.

This distinction matters enormously for editing:

Text PDFs can have text overlaid, form fields filled, and content searched
Scanned PDFs need OCR (optical character recognition) before any of those things are possible

Many people encounter this when they try to copy text from a scanned document and get nothing at all, or get garbage characters from a low-quality OCR layer.
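
One way to picture the difference is to look at which operators a page's content stream uses. The heuristic below is only a sketch: it assumes the stream has already been decompressed, the name classify_page and the sample streams are hypothetical, and the Do operator paints any external object, not just scanned images. A scan that has been through OCR carries a text layer, so it would be reported as "text" here.

```python
import re

# A string or array operand followed by a text-showing operator (Tj, TJ, ', ").
TEXT_SHOW = re.compile(rb"[)>\]]\s*(?:Tj|TJ|['\x22])")
# A named resource followed by Do, which paints an image or form XObject.
PAINT_XOBJECT = re.compile(rb"/\S+\s+Do\b")


def classify_page(content: bytes) -> str:
    """Guess whether a decoded page content stream holds text or only images."""
    if TEXT_SHOW.search(content):
        return "text"
    if PAINT_XOBJECT.search(content):
        return "image-only"
    return "no text or images found"


# Assumed sample streams: a native page and a full-page scan.
NATIVE_PAGE = b"BT /F1 12 Tf 72 680 Td (Hello) Tj ET"
SCANNED_PAGE = b"q 612 0 0 792 0 0 cm /Im0 Do Q"

print(classify_page(NATIVE_PAGE))   # text
print(classify_page(SCANNED_PAGE))  # image-only
```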

How PDF Fonts Work

Fonts in PDFs are handled in two ways:

Embedded fonts: The PDF contains the actual font file (or a subset of it). This is why PDFs look identical on any device, since the font travels with the document. Subset embedding is common: rather than embedding the entire 500-character font, the PDF embeds only the 47 characters actually used in the document. This keeps file size down, but it also means only those characters are available, so an editing tool cannot add a letter that is missing from the subset without access to the full font.

Referenced fonts: The PDF references a font by name but doesn't embed it. The viewer uses a font it has locally to substitute. If you've ever opened a PDF and seen the text look slightly different from what you expected, a non-embedded font is often the cause. Professional publishing workflows embed fonts to avoid this, and print-oriented standards such as PDF/X require it.
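
Subset fonts are usually recognizable by name: the font name carries a tag of six uppercase letters and a plus sign in front of the real name. The sketch below classifies a font from its name and from whether its descriptor includes an embedded font file. The function name describe_font, the boolean parameter, and the sample font names are assumptions for illustration; extracting these values from a real file would need a PDF parsing library.

```python
import re

# Subset tag: six uppercase letters, a plus sign, then the real font name.
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+(.+)$")


def describe_font(base_font: str, has_font_file: bool) -> str:
    """Classify a font as subset-embedded, fully embedded, or referenced."""
    match = SUBSET_TAG.match(base_font)
    if match:
        return f"{match.group(1)}: embedded subset"
    if has_font_file:
        return f"{base_font}: fully embedded"
    return f"{base_font}: referenced only, the viewer must supply or substitute it"


print(describe_font("ABCDEF+Garamond", True))  # Garamond: embedded subset
print(describe_font("Garamond", True))         # Garamond: fully embedded
print(describe_font("Helvetica", False))
```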

Interactive PDFs: Forms and Fields

AcroForms are interactive form fields that can be added to a PDF. When you open a government form PDF and click into a field to type your name, that's an AcroForm. These fields have their own layer on top of the page content — the viewer knows where each field is, what type it is (text, checkbox, dropdown, radio button), and how to interact with it.

When you fill in an AcroForm field and print the PDF, the form data is included. When you "flatten" the PDF, the fields are burned into the page as static content, making the document non-editable but ensuring the filled values are preserved everywhere.
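
Conceptually, flattening moves each field's value out of the interactive layer and into the page's drawing instructions. The model below is a deliberately simplified sketch: the TextField and Page classes, the fixed 12-point font F1, and the flatten function are all hypothetical, and real tools typically merge each field's stored appearance into the page rather than generating text operators like this.

```python
from dataclasses import dataclass, field


@dataclass
class TextField:
    name: str
    x: float  # left edge of the field, in points
    y: float  # text baseline of the field, in points
    value: str = ""


@dataclass
class Page:
    content: str  # static drawing instructions
    fields: list[TextField] = field(default_factory=list)  # interactive layer


def flatten(page: Page) -> Page:
    """Return a page with field values drawn as static text and no fields."""
    drawn = [
        f"BT /F1 12 Tf {f.x:g} {f.y:g} Td ({f.value}) Tj ET"
        for f in page.fields
        if f.value
    ]
    return Page(content="\n".join([page.content, *drawn]), fields=[])


form = Page("q Q", [TextField("full_name", 120, 700, "Ada Lovelace")])
flat = flatten(form)
print(flat.content)
print(len(flat.fields))  # 0
```

After flattening, the value is just another text instruction on the page, which is why it can no longer be clicked into and changed.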

Why File Size Varies So Much

A simple 2-page text document exported from Word might be 80KB. A 2-page brochure from a design agency might be 40MB. The difference is embedded content:

Images: A high-resolution photograph embedded at full quality (no JPEG compression) adds megabytes. Compress the images, and the PDF shrinks dramatically.
Fonts: Embedding entire font files adds size. Most PDFs embed only the characters used (subsetting).
Transparency: Complex transparency effects and blending require additional rendering information.
Annotations and metadata: Extensive comments, form field data, and document metadata all add a small amount.
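
The effect of images on file size is easy to estimate with a little arithmetic: an uncompressed image needs width times height times bytes per pixel. The sketch below works that out for a full US Letter page (8.5 by 11 inches) in RGB color at two resolutions. The function names and the chosen page size are assumptions for the example, and lossy compression such as JPEG reduces these figures further by an amount that depends on the image and quality setting.

```python
def raw_image_bytes(width_px: int, height_px: int,
                    channels: int = 3, bits_per_channel: int = 8) -> int:
    """Size of an uncompressed image in bytes."""
    return width_px * height_px * channels * bits_per_channel // 8


def pixels(inches: float, dpi: int) -> int:
    """Convert a physical length to a pixel count at the given resolution."""
    return round(inches * dpi)


for dpi in (300, 150):
    size = raw_image_bytes(pixels(8.5, dpi), pixels(11, dpi))
    print(f"{dpi} dpi: {size:,} bytes ({size / 2**20:.1f} MiB)")
# 300 dpi: 25,245,000 bytes (24.1 MiB)
# 150 dpi: 6,311,250 bytes (6.0 MiB)
```

Halving the resolution cuts the pixel count to a quarter, which is why downsampling images is usually the most effective way to shrink an image-heavy PDF.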

For documents you're distributing widely, compressing a PDF before sending is good practice — though you should always keep a high-quality original.

PDF Versions and Features

The PDF format has evolved significantly since 1993:

PDF 1.0 (1993): Basic page description
PDF 1.4 (2001): Transparency support
PDF 1.5 (2003): JPEG 2000, object and cross-reference streams, optional content (layers)
PDF 1.7 (2006): Expanded 3D artwork support; later standardized as ISO 32000-1
PDF 2.0 (2017): Major update; improved encryption, better accessibility, deprecated legacy features

Most modern tools produce files at or near PDF 1.7, and a growing number support PDF 2.0. Very old viewers may not support all features of newer PDFs, but this is rarely a problem in practice since PDF viewers are regularly updated.
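
A file declares its version in the first line, in a header of the form %PDF-1.7. The sketch below reads that header from raw bytes. The function name header_version is an assumption, and the result is only the declared version: a file's document catalog can override it with a later one, which this sketch does not check.

```python
import re
from typing import Optional


def header_version(pdf_bytes: bytes) -> Optional[str]:
    """Return the version declared in the %PDF-x.y header, or None if absent."""
    # The header is expected at or near the very start of the file.
    match = re.search(rb"%PDF-(\d\.\d)", pdf_bytes[:1024])
    return match.group(1).decode("ascii") if match else None


print(header_version(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"))  # 1.7
print(header_version(b"not a pdf"))                       # None
```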

What PDFs Can't Do

Being clear about limitations saves frustration:

Reflowable text: PDFs have fixed layouts. Unlike an ebook (EPUB), the text doesn't reflow to fit a smaller screen. On a phone, you either zoom and scroll or read tiny text.
Live data: A PDF is a snapshot. A PDF of a spreadsheet doesn't recalculate formulas. A PDF of a chart doesn't update when the underlying data changes.
Free text editing: As described above, you can't edit existing PDF text the way you'd edit a Word document. You can overlay new content, or you can modify text via Acrobat with imperfect results.
Reliable redaction by drawing over: Drawing a black rectangle over sensitive text in most tools hides it visually but leaves the underlying content in the file, where copy-paste or text extraction can still reach it. Proper redaction requires software that actually removes the content from the page's content stream.
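
The redaction problem can be shown on a toy content stream. In the sketch below, covering text with a filled black rectangle leaves the string fully recoverable, while removing the string operand does not. The sample stream, the rectangle coordinates, and the function names are assumptions for illustration; real redaction also has to deal with images, metadata, and earlier saved revisions of the file, which this sketch ignores.

```python
import re

TEXT_SHOW = re.compile(rb"\(([^()]*)\)\s*Tj")

# Hypothetical page content containing one sensitive string.
PAGE = b"BT /F1 12 Tf 72 680 Td (Account 0000-1111) Tj ET"
# Set the fill color to black, then fill a rectangle covering the text.
BLACK_BOX = b"\n0 0 0 rg 70 676 140 16 re f"


def strings_in(stream: bytes) -> list[str]:
    """Return every string passed to a Tj operator, visible or not."""
    return [s.decode("latin-1") for s in TEXT_SHOW.findall(stream)]


def cover_only(stream: bytes) -> bytes:
    """Draw a black box on top of the page; the text stays in the stream."""
    return stream + BLACK_BOX


def redact(stream: bytes) -> bytes:
    """Empty every shown string, then draw the black box."""
    return TEXT_SHOW.sub(b"() Tj", stream) + BLACK_BOX


print(strings_in(cover_only(PAGE)))  # ['Account 0000-1111']
print(strings_in(redact(PAGE)))      # ['']
```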

Understanding what PDFs are and aren't makes it much easier to choose the right tool for whatever you need to do with them.