LiteParse: A Fast, GPU-Free Open-Source Document Parser with AI Agent Integration

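LiteParse is an open-source document parser focused on fast local parsing and spatial text extraction with bounding boxes. It runs entirely locally, with no cloud dependency and no GPU, and processes hundreds of pages in seconds.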

Key Features

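Open-source tool under the Apache 2.0 license
Spatial text parsing with bounding boxes for precise text placement
No dependency on VLMs (vision language models), whether local or frontier
Runs on any machine, with no GPU required
Supports multiple file formats: PDF, Office documents, and images
Higher accuracy than comparable tools such as PyPDF, PyMuPDF, and MarkItDown
One-line install as a skill for more than 40 AI agents, including Claude Code, Cursor, OpenClaw, and Windsurf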
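
To make the spatial-parsing feature concrete, the sketch below shows one way text fragments with bounding boxes could be grouped into reading-order lines. The TextItem shape (text, x, y, width, height) and the groupIntoLines helper are hypothetical illustrations, not LiteParse's actual output schema or algorithm.

```typescript
// Hypothetical shape for a positioned text fragment; not LiteParse's real schema.
interface TextItem {
  text: string;
  x: number; // left edge, in page units
  y: number; // top edge, in page units (origin assumed at top-left)
  width: number;
  height: number;
}

// Group items whose vertical centers lie within `tolerance` of the first item
// of the current line, then order each line left to right.
function groupIntoLines(items: TextItem[], tolerance = 2): string[] {
  const sorted = items.slice().sort((a, b) => a.y - b.y || a.x - b.x);
  const lines: TextItem[][] = [];
  for (const item of sorted) {
    const center = item.y + item.height / 2;
    const last = lines.length > 0 ? lines[lines.length - 1] : undefined;
    if (last && Math.abs(last[0].y + last[0].height / 2 - center) <= tolerance) {
      last.push(item);
    } else {
      lines.push([item]);
    }
  }
  return lines.map((line) =>
    line
      .sort((a, b) => a.x - b.x)
      .map((i) => i.text)
      .join(" ")
  );
}

const sample: TextItem[] = [
  { text: "World", x: 60, y: 10, width: 40, height: 12 },
  { text: "Hello", x: 10, y: 10, width: 40, height: 12 },
  { text: "Second line", x: 10, y: 30, width: 90, height: 12 },
];
console.log(groupIntoLines(sample)); // two lines: "Hello World" and "Second line"
```

Keeping coordinates alongside the text is what lets a downstream consumer rebuild layout (columns, tables, captions) instead of working from a flat string.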

Installation

Install the CLI tool:

npm i -g @llamaindex/liteparse

Example usage:

lit parse document.pdf
lit screenshot document.pdf

macOS and Linux (via Homebrew):

brew tap run-llama/liteparse
brew install llamaindex-liteparse

Install the agent skill:

npx skills add run-llama/llamaparse-agent-skills --skill liteparse

Usage Examples

Basic parsing:

lit parse document.pdf
lit parse document.pdf --format json -o output.json
lit parse document.pdf --target-pages "1-5,10,15-20"
lit parse document.pdf --no-ocr
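
The --target-pages value shown above mixes single pages and ranges separated by commas. As a minimal sketch of how such a string could be interpreted, the hypothetical expandPageSpec helper below expands it into a sorted list of page numbers, assuming ranges are inclusive and pages are 1-based. It is an illustration only and makes no claim about how LiteParse handles the flag internally.

```typescript
// Hypothetical helper: expands "1-5,10,15-20" into an ascending list of pages.
// Assumes 1-based page numbers and inclusive ranges.
function expandPageSpec(spec: string): number[] {
  const pages: number[] = [];
  for (const part of spec.split(",")) {
    const token = part.trim();
    if (token === "") continue; // tolerate stray commas
    const match = /^(\d+)(?:-(\d+))?$/.exec(token);
    if (!match) throw new Error(`Invalid page token: "${token}"`);
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range: "${token}"`);
    }
    for (let p = start; p <= end; p++) {
      if (pages.indexOf(p) === -1) pages.push(p); // skip duplicates
    }
  }
  return pages.sort((a, b) => a - b);
}

console.log(expandPageSpec("1-5,10,15-20"));
```

Validating the spec before invoking the CLI gives an agent an early, readable error instead of a failed parse run.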

Batch parsing:

lit batch-parse ./input-directory ./output-directory

Screenshot generation (useful for LLM agents):

lit screenshot document.pdf -o ./screenshots
lit screenshot document.pdf --target-pages "1,3,5" -o ./screenshots
lit screenshot document.pdf --dpi 300 -o ./screenshots
lit screenshot document.pdf --target-pages "1-10" -o ./screenshots
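
An agent that shells out to lit screenshot has to assemble these flags programmatically. The sketch below builds an argument array using only the flags shown above (--target-pages, --dpi, -o). The ScreenshotOptions type and buildScreenshotArgs function are hypothetical names introduced for illustration, and combining --target-pages with --dpi in one call is an assumption, since the examples above show them separately.

```typescript
// Hypothetical options type; field names are illustrative only.
interface ScreenshotOptions {
  file: string;
  targetPages?: string; // e.g. "1,3,5" or "1-10"
  dpi?: number;
  outputDir?: string;
}

// Build the argument list for `lit screenshot` from the documented flags.
function buildScreenshotArgs(opts: ScreenshotOptions): string[] {
  const args = ["screenshot", opts.file];
  if (opts.targetPages !== undefined) args.push("--target-pages", opts.targetPages);
  if (opts.dpi !== undefined) args.push("--dpi", String(opts.dpi));
  if (opts.outputDir !== undefined) args.push("-o", opts.outputDir);
  return args;
}

const screenshotArgs = buildScreenshotArgs({
  file: "document.pdf",
  targetPages: "1,3,5",
  dpi: 300,
  outputDir: "./screenshots",
});
console.log(["lit"].concat(screenshotArgs).join(" "));
```

Passing the array to a process-spawning API such as Node's child_process.execFile, rather than concatenating a shell string, avoids quoting problems with file names that contain spaces.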

Library Usage

Install as a dependency:

npm install @llamaindex/liteparse
# or
pnpm add @llamaindex/liteparse

Basic usage:

import { LiteParse } from '@llamaindex/liteparse';

// Create a parser with OCR enabled.
const parser = new LiteParse({ ocrEnabled: true });

// Parse a file by path and print the extracted text.
const result = await parser.parse('document.pdf');
console.log(result.text);

Buffer/Uint8Array input (no disk I/O during parsing):

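import { LiteParse } from '@llamaindex/liteparse';
import { readFile } from 'fs/promises';

const parser = new LiteParse();

// The bytes are read from disk here only for demonstration;
// they could equally come from a network response or an upload.
const pdfBytes = await readFile('document.pdf');
const result = await parser.parse(pdfBytes);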
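
When documents arrive as bytes, the same parse call can be reused across many inputs. The sketch below assumes only what the examples above show: that parse accepts a path or a Uint8Array and resolves to an object with a text field. The TextParser interface and parseAll helper are hypothetical, and a stub parser stands in for LiteParse so the snippet runs without the package installed.

```typescript
// Assumed minimal surface of the parser, inferred from the examples above.
interface TextParser {
  parse(input: string | Uint8Array): Promise<{ text: string }>;
}

interface ParseOutcome {
  name: string;
  text?: string;
  error?: string;
}

// Parse inputs one at a time, recording failures instead of aborting the run.
async function parseAll(
  parser: TextParser,
  inputs: { name: string; data: string | Uint8Array }[]
): Promise<ParseOutcome[]> {
  const outcomes: ParseOutcome[] = [];
  for (const input of inputs) {
    try {
      const result = await parser.parse(input.data);
      outcomes.push({ name: input.name, text: result.text });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      outcomes.push({ name: input.name, error: message });
    }
  }
  return outcomes;
}

// Stub used only for this illustration; it does not parse real documents.
const stubParser: TextParser = {
  async parse(input) {
    if (typeof input !== "string" && input.length === 0) {
      throw new Error("empty input");
    }
    return {
      text: typeof input === "string" ? `parsed ${input}` : `parsed ${input.length} bytes`,
    };
  },
};

parseAll(stubParser, [
  { name: "a.pdf", data: "a.pdf" },
  { name: "b.pdf", data: new Uint8Array([1, 2, 3]) },
  { name: "empty.pdf", data: new Uint8Array(0) },
]).then((outcomes) => console.log(outcomes));
```

In real use, a LiteParse instance would be passed in place of the stub, provided its parse signature matches this assumed interface.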