Rust & WebAssembly
Every performance-critical operation in PDFox.js — compression, encryption, image decoding, color conversion, content parsing — is written in Rust and compiled to WebAssembly. This isn't an optimization layer. It's the foundation.
WASM modules must be compiled and available in all environments — browser and Node.js. There are no JavaScript fallbacks. If WASM is unavailable, you fix the WASM build. The same compiled code runs in production and in tests.
The Five Crates
Each crate is a focused Rust library compiled to a standalone .wasm module via wasm-pack.
pdfox-graphics
Image decoding and color space conversion. Handles JPEG2000, JBIG2, and CCITT Fax, formats common in scanned and archival PDFs that pure JS can't decode efficiently.
Covers: JPEG2000, JBIG2, CCITT G3/G4, CMYK→RGB, Lab→RGB, Resampling
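To make the CMYK→RGB step concrete, here is a minimal sketch of the naive, profile-free conversion in plain Rust. It is not PDFox's implementation: the function names are hypothetical, and a production converter would normally go through ICC profiles rather than this simple formula.

```rust
/// Naive CMYK to RGB conversion for one pixel (8-bit channels).
/// Assumption: no ICC profile is applied; this is the textbook formula
/// rgb = 255 * (1 - ink) * (1 - black).
pub fn cmyk_to_rgb(c: u8, m: u8, y: u8, k: u8) -> [u8; 3] {
    let channel = |ink: u8| -> u8 {
        let v = (255 - ink as u32) * (255 - k as u32);
        ((v + 127) / 255) as u8 // divide by 255, rounding to nearest
    };
    [channel(c), channel(m), channel(y)]
}

/// Convert an interleaved CMYK buffer (4 bytes per pixel) to interleaved RGB.
/// Trailing bytes that do not form a whole pixel are ignored.
pub fn cmyk_buffer_to_rgb(cmyk: &[u8]) -> Vec<u8> {
    let mut rgb = Vec::with_capacity(cmyk.len() / 4 * 3);
    for px in cmyk.chunks_exact(4) {
        rgb.extend_from_slice(&cmyk_to_rgb(px[0], px[1], px[2], px[3]));
    }
    rgb
}
```

Working on whole buffers rather than single pixels is the shape that suits a WASM boundary: one call per image instead of one call per pixel.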
pdfox-content
Content stream tokenization: the core parsing loop that breaks PDF page content into operators and operands at near-native speed.
Covers: Tokenizer, Operator parsing, Operand extraction
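The tokenization loop can be sketched in a few dozen lines of plain Rust. This is an illustration under stated assumptions, not the pdfox-content source: the `Token` type and function names are hypothetical, string escapes are simplified (a backslash just keeps the next byte), and hex strings, dictionaries, and inline images are not handled.

```rust
/// One lexical token of a PDF content stream (simplified).
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Number(f64),
    Name(String),
    Str(Vec<u8>),
    Delimiter(char),
    Operator(String),
}

fn is_whitespace(b: u8) -> bool {
    matches!(b, 0 | b'\t' | b'\n' | 0x0C | b'\r' | b' ')
}

fn is_delimiter(b: u8) -> bool {
    matches!(b, b'(' | b')' | b'<' | b'>' | b'[' | b']' | b'{' | b'}' | b'/' | b'%')
}

/// Read a literal string starting just after its opening '('.
/// Returns the string bytes and the index just past the closing ')'.
fn read_literal_string(input: &[u8], mut i: usize) -> (Vec<u8>, usize) {
    let mut depth = 1;
    let mut bytes = Vec::new();
    while i < input.len() {
        match input[i] {
            // Simplification: a backslash keeps the next byte verbatim.
            b'\\' if i + 1 < input.len() => {
                i += 1;
                bytes.push(input[i]);
            }
            b'(' => {
                depth += 1;
                bytes.push(b'(');
            }
            b')' => {
                depth -= 1;
                if depth == 0 {
                    return (bytes, i + 1);
                }
                bytes.push(b')');
            }
            other => bytes.push(other),
        }
        i += 1;
    }
    (bytes, i) // unterminated string: return what was read
}

/// Split a content stream into operands and operators.
pub fn tokenize(input: &[u8]) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let b = input[i];
        if is_whitespace(b) {
            i += 1;
        } else if b == b'%' {
            // Comment: skip to end of line.
            while i < input.len() && input[i] != b'\n' && input[i] != b'\r' {
                i += 1;
            }
        } else if b == b'(' {
            let (bytes, next) = read_literal_string(input, i + 1);
            tokens.push(Token::Str(bytes));
            i = next;
        } else if is_delimiter(b) && b != b'/' {
            tokens.push(Token::Delimiter(b as char));
            i += 1;
        } else {
            // Regular token: runs until the next whitespace or delimiter.
            let start = i;
            i += 1;
            while i < input.len() && !is_whitespace(input[i]) && !is_delimiter(input[i]) {
                i += 1;
            }
            let text = String::from_utf8_lossy(&input[start..i]).into_owned();
            let token = if b == b'/' {
                Token::Name(text[1..].to_string())
            } else {
                match text.parse::<f64>() {
                    Ok(n) if n.is_finite() => Token::Number(n),
                    _ => Token::Operator(text),
                }
            };
            tokens.push(token);
        }
    }
    tokens
}
```

For the input `BT /F1 12 Tf (Hi) Tj ET`, this yields the operators `BT`, `Tf`, `Tj`, and `ET`, with the name `F1`, the number 12, and the string `Hi` as operands preceding the operators that consume them.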
pdfox-cmap
CMap parsing for CID-keyed fonts. Converts character codes to Unicode for complex scripts including CJK, Arabic, and Devanagari.
Covers: CMap parsing, CID fonts, Unicode mapping
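As an illustration of the code-to-Unicode step, the sketch below parses the `beginbfchar` / `endbfchar` sections of a ToUnicode CMap into a lookup table. It is a simplified stand-in, not the pdfox-cmap API: the function names are hypothetical, `bfrange` and codespace ranges are ignored, and codes of different byte lengths with the same numeric value share one key.

```rust
use std::collections::HashMap;

/// Decode a hex token such as "<0041>" into raw bytes.
fn parse_hex(token: &str) -> Option<Vec<u8>> {
    let hex = token.strip_prefix('<')?.strip_suffix('>')?;
    if hex.is_empty() || hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).ok())
        .collect()
}

/// Parse one "<src> <dst>" line into (character code, Unicode text).
fn parse_entry(line: &str) -> Option<(u32, String)> {
    let mut parts = line.split_whitespace();
    let src = parse_hex(parts.next()?)?;
    let dst = parse_hex(parts.next()?)?;
    if src.len() > 4 || dst.len() % 2 != 0 {
        return None;
    }
    // Source code: big-endian integer built from 1 to 4 bytes.
    let code = src.iter().fold(0u32, |acc, &b| (acc << 8) | u32::from(b));
    // Destination: UTF-16BE code units, possibly a surrogate pair.
    let units: Vec<u16> = dst
        .chunks_exact(2)
        .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
        .collect();
    String::from_utf16(&units).ok().map(|text| (code, text))
}

/// Collect every bfchar mapping found in the CMap text.
pub fn parse_bfchar(cmap: &str) -> HashMap<u32, String> {
    let mut map = HashMap::new();
    let mut inside = false;
    for line in cmap.lines().map(str::trim) {
        if line.ends_with("beginbfchar") {
            inside = true;
        } else if line == "endbfchar" {
            inside = false;
        } else if inside {
            if let Some((code, text)) = parse_entry(line) {
                map.insert(code, text);
            }
        }
    }
    map
}
```

Malformed entries are skipped rather than treated as fatal, which is a design choice of this sketch: real-world CMaps are often slightly broken, and losing one glyph mapping is usually better than losing the page.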
pdfox-compression
All PDF stream decompression algorithms. This is the most frequently called WASM module, processing every content stream and embedded object.
Covers: FlateDecode, LZW, ASCII85, RunLength, PNG predictor, TIFF predictor
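RunLength is the simplest of these filters and shows the general shape of a decoder. The sketch below follows the RunLengthDecode rules from the PDF specification (a length byte of 0 to 127 copies the next length + 1 bytes, 129 to 255 repeats the next byte 257 - length times, 128 ends the data). The function name and error strings are hypothetical, not taken from pdfox-compression.

```rust
/// Decode a RunLengthDecode-filtered stream.
/// Returns an error if a run is cut off before its data ends.
pub fn run_length_decode(input: &[u8]) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let len = input[i];
        i += 1;
        match len {
            // End-of-data marker.
            128 => break,
            // Literal run: copy the next len + 1 bytes unchanged.
            0..=127 => {
                let n = len as usize + 1;
                let chunk = input.get(i..i + n).ok_or("truncated literal run")?;
                out.extend_from_slice(chunk);
                i += n;
            }
            // Repeat run: emit the next byte 257 - len times.
            129..=255 => {
                let n = 257 - len as usize;
                let byte = *input.get(i).ok_or("truncated repeat run")?;
                out.extend(std::iter::repeat(byte).take(n));
                i += 1;
            }
        }
    }
    Ok(out)
}
```

Every read goes through `get`, so a truncated stream produces an `Err` instead of a panic, which matters when `panic = "abort"` turns any panic into a WASM trap.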
pdfox-crypto
Cryptographic primitives for PDF encryption and digital signatures. Handles legacy RC4 through modern AES-256 as defined in PDF 2.0.
Covers: AES-256, AES-128, RC4, SHA-256/384/512, MD5
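The cipher itself should come from a vetted library, but the framing around it is simple enough to sketch. The PDF specification describes AES-encrypted strings and streams as CBC mode with the 16-byte initialization vector stored first and PKCS#5-style padding at the end. The helpers below only handle that framing; their names are hypothetical, and the actual decryption step between them is deliberately left out.

```rust
/// Split an AES-encrypted PDF string or stream into (IV, ciphertext).
/// Assumption: at least one ciphertext block follows the 16-byte IV.
pub fn split_iv(data: &[u8]) -> Option<(&[u8], &[u8])> {
    if data.len() < 32 || data.len() % 16 != 0 {
        return None;
    }
    Some(data.split_at(16))
}

/// Remove PKCS#5/PKCS#7 padding from decrypted data (block size 16).
/// Returns None if the padding bytes are inconsistent.
pub fn strip_pkcs7(plain: &[u8]) -> Option<&[u8]> {
    let pad = *plain.last()? as usize;
    if pad == 0 || pad > 16 || pad > plain.len() {
        return None;
    }
    let (body, tail) = plain.split_at(plain.len() - pad);
    if tail.iter().all(|&b| b as usize == pad) {
        Some(body)
    } else {
        None
    }
}
```

A caller would pass the IV and ciphertext from `split_iv` to an AES-CBC decryptor, then run `strip_pkcs7` on the result.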
Why Rust, not C/C++
Rust compiles to compact WebAssembly without a garbage collector or a heavyweight runtime. Safe Rust rules out use-after-free and data races at compile time, and out-of-bounds accesses are stopped by bounds checks instead of silently corrupting memory. The WASM binary stays small because there's no runtime to bundle. And wasm-bindgen generates TypeScript bindings automatically.
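A small example of what that safety looks like when parsing untrusted bytes: reading a 4-byte big-endian value from a buffer that may be too short. The helper name is hypothetical. The equivalent C pointer read would happily run past the end of the buffer; here the short case is an ordinary `None`.

```rust
/// Read a big-endian u32 at `offset`, or None if the buffer is too short.
pub fn read_u32_be(data: &[u8], offset: usize) -> Option<u32> {
    // checked_add guards against offset + 4 overflowing usize.
    let end = offset.checked_add(4)?;
    // get() returns None for out-of-range slices instead of reading past the end.
    let b = data.get(offset..end)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}
```

Plain indexing (`data[offset]`) would also be memory-safe, but it panics on a bad offset. Returning `Option` keeps malformed input on the normal error path.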
Build Pipeline
Rust source is compiled through wasm-pack with aggressive release optimizations, producing minimal .wasm binaries that load alongside TypeScript packages.
Cargo + rustc (compile to WASM) → wasm-pack (optimize + bind) → .wasm binary (optimized module) → @pdfox/wasm (TypeScript bindings)
Release Optimizations
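- lto = true: link-time optimization across crates
- opt-level = "s": optimize for binary size
- panic = "abort": no unwinding overhead
- strip = true: remove debug symbols
- codegen-units = 1: compile each crate as a single unit, trading build time for better optimization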
# Shared release profile for all WASM crates
[profile.release]
lto = true
opt-level = "s"
panic = "abort"
strip = true
codegen-units = 1
[workspace]
members = [
    "crates/pdfox-compression",
    "crates/pdfox-crypto",
    "crates/pdfox-graphics",
    "crates/pdfox-cmap",
    "crates/pdfox-content",
]
Performance
WASM isn't a nice-to-have optimization — it's a hard requirement for PDF operations that process megabytes of binary data. Compression runs on every stream. Crypto runs on every encrypted document. Image decoding runs on every scanned page.
- Stream Decompression (FlateDecode): ~3x faster in Rust WASM than in JavaScript
- JPEG2000 Decoding (JPXDecode): ~5x faster in Rust WASM than in JavaScript
- AES-256 Encryption: ~2x faster in Rust WASM than in JavaScript
- Content Stream Tokenization: ~4x faster in Rust WASM than in JavaScript
Rust Dependencies
Each crate uses battle-tested Rust libraries — no reinventing cryptography or compression. The dependency tree stays minimal to keep WASM binaries small.
Compression & Encoding
- flate2: zlib/deflate
- weezl: LZW codec
- openjp2: JPEG2000 decoding
- wasm-bindgen: JS ↔ Rust bindings
Cryptography
- aes: AES-128/256
- sha2: SHA-256/384/512
- md-5: MD5 (legacy PDF)
- rc4: RC4 (legacy PDF)
TypeScript Packages
The application layer is a pnpm monorepo of 11 TypeScript packages — strict mode, ES2020 target, tree-shakeable ESM/CJS/UMD outputs. Each package is independently importable: use only what you need.
- @pdfox/core (~40KB): Document model, PDF objects, events, LRU caching
- @pdfox/parser (~25KB): XRef table/stream, lexer, object parser, stream decoders
- @pdfox/renderer (~35KB): Canvas, DOM (contentEditable), SVG graphics; triple output
- @pdfox/text (~15KB): Text extraction with styles, Type1/TrueType/CID fonts, CMap
- @pdfox/editor (~30KB): Content modification, page rotate/reorder/merge/extract
- @pdfox/forms (~20KB): AcroForms with all field types, filling, flattening, FDF/XFDF
- @pdfox/annotations (~25KB): 11+ annotation types, including highlight, underline, sticky, shapes, ink, redaction
- @pdfox/security (~40KB): RC4/AES encryption, PKCS#7/CMS signatures, PAdES
- @pdfox/accessibility (~20KB): Tagged PDF, structure tree, reading order, PDF/UA validation
- @pdfox/standards (~30KB): PDF/A-1 through PDF/A-4, PDF/UA-1 & PDF/UA-2 validation
- @pdfox/wasm (5 .wasm binaries): WebAssembly modules with TypeScript bindings for all 5 Rust crates
import { PDFoxWASM } from '@pdfox/wasm';
import { PDFParser } from '@pdfox/parser';
import { DOMRenderer } from '@pdfox/renderer';
// Initialize WASM modules (required before any PDF operations)
await PDFoxWASM.init();
// Load and parse PDF
const buffer = await fetch('document.pdf').then(r => r.arrayBuffer());
const doc = await PDFParser.parse(buffer);
// Render to editable DOM
// Assumption: the page contains an element with id "viewer" to render into
const container = document.getElementById('viewer');
if (!container) throw new Error('Missing #viewer element');
const renderer = new DOMRenderer({ container, scale: 1.5, editable: true });
await renderer.render(doc.getPage(0));
Monorepo Architecture
The entire stack — Rust crates, TypeScript packages, build tooling — lives in a single pnpm workspace. 121 TypeScript source files. 5 Rust crates. One build command.
- WASM: pdfox-compression, pdfox-crypto, pdfox-graphics, pdfox-cmap, pdfox-content
- Core: @pdfox/core, @pdfox/parser, @pdfox/wasm
- Engine: @pdfox/renderer, @pdfox/text, @pdfox/editor
- Features: @pdfox/forms, @pdfox/annotations, @pdfox/security
- Standards: @pdfox/accessibility, @pdfox/standards
- Tooling: Rollup, Vitest, TypeDoc, wasm-pack