LiteParse: Fast Open-Source Document Parser for AI Agents

By OpenClawRadar | Published: March 21, 2026

LiteParse is an open-source document parser focused on fast, local parsing, with spatial text extraction that includes bounding boxes. It runs entirely on your own machine, with no cloud dependencies or GPU requirements, and can process hundreds of pages in seconds.

Key Features

  • Apache 2.0 licensed open-source tool
  • Spatial text parsing with bounding boxes for precise text positioning
  • No dependency on local or frontier VLMs (Vision Language Models)
  • Runs on any machine without GPU requirements
  • Supports multiple file formats: PDFs, Office documents, images
  • Higher accuracy than similar tools such as PyPDF, PyMuPDF, and MarkItDown, according to its creators
  • One-line installation as a skill for 40+ AI agents including Claude Code, Cursor, OpenClaw, Windsurf
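Spatial parsing means every piece of extracted text carries its position on the page. As a rough illustration of why that matters, the sketch below rebuilds reading order from positioned fragments by grouping them into lines by vertical position and sorting each line left to right. The `TextItem` shape, the top-left coordinate origin, and the `toReadingOrder` helper are all hypothetical; they are not LiteParse's actual JSON schema or layout algorithm.

```typescript
// Hypothetical shape of one positioned text fragment (not LiteParse's schema).
// Coordinates assume a top-left origin, with y growing downward.
interface TextItem {
  text: string;
  x: number; // left edge
  y: number; // top edge
  width: number;
  height: number;
}

// Group fragments whose vertical centers lie within `tolerance` of the first
// fragment on the current line, then join each line's fragments left to right.
function toReadingOrder(items: TextItem[], tolerance = 3): string[] {
  const sorted = [...items].sort((a, b) => a.y - b.y || a.x - b.x);
  const lines: TextItem[][] = [];
  for (const item of sorted) {
    const center = item.y + item.height / 2;
    const current = lines[lines.length - 1];
    if (current) {
      const ref = current[0];
      const refCenter = ref.y + ref.height / 2;
      if (Math.abs(center - refCenter) <= tolerance) {
        current.push(item);
        continue;
      }
    }
    lines.push([item]);
  }
  return lines.map((line) =>
    line
      .sort((a, b) => a.x - b.x)
      .map((i) => i.text)
      .join(" "),
  );
}

const items: TextItem[] = [
  { text: "World", x: 60, y: 10, width: 40, height: 12 },
  { text: "Hello", x: 10, y: 11, width: 45, height: 12 },
  { text: "Second line", x: 10, y: 30, width: 90, height: 12 },
];
console.log(toReadingOrder(items)); // [ 'Hello World', 'Second line' ]
```

A fixed vertical tolerance is the simplest possible rule; multi-column pages or mixed font sizes need more than this, which is exactly the kind of layout information bounding boxes make available.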

Installation Options

CLI Tool Installation:

npm i -g @llamaindex/liteparse

Then use:

lit parse document.pdf
lit screenshot document.pdf

For macOS and Linux via Homebrew:

brew tap run-llama/liteparse
brew install llamaindex-liteparse

Agent Skill Installation:

npx skills add run-llama/llamaparse-agent-skills --skill liteparse

Usage Examples

Basic parsing:

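# Parse with default settings
lit parse document.pdf
# Write JSON output to a file
lit parse document.pdf --format json -o output.json
# Parse only pages 1-5, 10, and 15-20
lit parse document.pdf --target-pages "1-5,10,15-20"
# Disable OCR
lit parse document.pdf --no-ocr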
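The `--target-pages` value mixes single pages and ranges. The sketch below shows one plausible reading of that syntax, assuming 1-based page numbers and inclusive ranges; `expandPageSpec` is a hypothetical helper for illustration, not part of LiteParse.

```typescript
// Hypothetical helper: expand a spec such as "1-5,10,15-20" into a sorted,
// de-duplicated list of 1-based page numbers (ranges are inclusive).
function expandPageSpec(spec: string): number[] {
  const pages = new Set<number>();
  for (const rawPart of spec.split(",")) {
    const part = rawPart.trim();
    if (part === "") continue; // tolerate stray commas
    const match = /^(\d+)(?:-(\d+))?$/.exec(part);
    if (!match) throw new Error(`Invalid page token: "${part}"`);
    const start = Number(match[1]);
    const end = match[2] ? Number(match[2]) : start;
    if (start < 1 || end < start) throw new Error(`Invalid range: "${part}"`);
    for (let p = start; p <= end; p++) pages.add(p);
  }
  return Array.from(pages).sort((a, b) => a - b);
}

console.log(expandPageSpec("1-3,7")); // [ 1, 2, 3, 7 ]
console.log(expandPageSpec("1-5,10,15-20").length); // 12
```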

Batch parsing:

lit batch-parse ./input-directory ./output-directory

Screenshot generation (useful for LLM agents):

# Screenshot all pages
lit screenshot document.pdf -o ./screenshots
# Screenshot specific pages
lit screenshot document.pdf --target-pages "1,3,5" -o ./screenshots
# Render at a higher resolution
lit screenshot document.pdf --dpi 300 -o ./screenshots
# Screenshot a page range
lit screenshot document.pdf --target-pages "1-10" -o ./screenshots
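The `--dpi` option trades image size for sharpness. PDF page sizes are measured in points (1 pt = 1/72 inch), so, assuming a page is rendered at exactly the requested DPI, its pixel dimensions are points / 72 × DPI. The hypothetical `renderedSize` helper below does that arithmetic:

```typescript
// Sketch: convert a PDF page size in points to pixel dimensions at a given DPI.
// Assumes rendering at exactly `dpi` with no cropping or extra scaling.
function renderedSize(widthPt: number, heightPt: number, dpi: number) {
  const scale = dpi / 72; // points are 1/72 inch
  return {
    width: Math.round(widthPt * scale),
    height: Math.round(heightPt * scale),
  };
}

// US Letter is 612 x 792 pt (8.5 x 11 in).
console.log(renderedSize(612, 792, 150)); // { width: 1275, height: 1650 }
console.log(renderedSize(612, 792, 300)); // { width: 2550, height: 3300 }
```

Because pixel count grows with the square of the DPI, going from 150 to 300 DPI quadruples the image area, which can matter when screenshots are passed to a vision-capable LLM.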

Library Usage

Install as a dependency:

npm install @llamaindex/liteparse
# or
pnpm add @llamaindex/liteparse

Basic usage:

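import { LiteParse } from '@llamaindex/liteparse';
// Create a parser with OCR enabled
const parser = new LiteParse({ ocrEnabled: true });
// Parse a file from disk (top-level await requires an ES module)
const result = await parser.parse('document.pdf');
console.log(result.text);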

Buffer/Uint8Array input (no disk I/O):

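import { LiteParse } from '@llamaindex/liteparse';
import { readFile } from 'fs/promises';
const parser = new LiteParse();
// The bytes can come from anywhere (disk, network, object storage);
// parse() accepts them directly instead of a file path.
const pdfBytes = await readFile('document.pdf');
const result = await parser.parse(pdfBytes);
console.log(result.text);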
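Node's `Buffer` is a subclass of `Uint8Array`, so bytes from `readFile`, an HTTP response, or an object store can be handed over the same way. If you only expect PDFs and the bytes come from an untrusted source, a cheap check before parsing can catch obvious mistakes. The `looksLikePdf` and `ascii` helpers below are hypothetical and assume a well-formed file that begins with the `%PDF-` header; some real-world PDFs have stray bytes before it, so treat this as a heuristic.

```typescript
// Hypothetical heuristic: well-formed PDFs start with the ASCII bytes "%PDF-".
// Works on any Uint8Array, including a Node Buffer.
function looksLikePdf(bytes: Uint8Array): boolean {
  const signature = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  if (bytes.length < signature.length) return false;
  return signature.every((b, i) => bytes[i] === b);
}

// Hypothetical helper: encode a plain ASCII string as bytes.
const ascii = (s: string) => Uint8Array.from(s, (c) => c.charCodeAt(0));

console.log(looksLikePdf(ascii("%PDF-1.7\n"))); // true
console.log(looksLikePdf(ascii("PK\x03\x04"))); // false (ZIP-based, e.g. .docx)
```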

Technical Details

  • Flexible OCR system with built-in Tesseract.js (zero setup)
  • Support for external OCR servers over HTTP (EasyOCR, PaddleOCR, or custom)
  • A standard OCR API specification for those servers to implement
  • Two output formats: JSON and plain text
  • Standalone binary with no cloud dependencies
  • Multi-platform support: Linux, macOS (Intel/ARM), Windows

For complex documents with dense tables, multi-column layouts, charts, handwritten text, or scanned PDFs, the creators recommend LlamaParse, their cloud-based document parser built for production document pipelines.

📖 Read the full source: HN AI Agents
