Using AI to Untangle 10,000 Brazilian Property Titles: A Technical Case Study

Project Context and Problem
A Brazilian real estate company inherited approximately 10,000 property titles spread across more than 10 municipalities after decades of poor management. The portfolio includes hundreds of unregistered "drawer contracts" (informal sales never filed), duplicate sales of the same properties, fraudulent contracts, forged powers of attorney, and irregular occupations. There are also approximately 500 active lawsuits, including adverse possession claims, compulsory adjudication, evictions, duplicate sale disputes, and 2 class action suits. Part of the physical document archive is held by police as part of an old investigation.

Technical Approach
The team (6 lawyers + 3 operators) decided against building infrastructure upfront, opting instead for a discovery-first approach with AI assistance. The plan involves five steps:
- Step 1 - Physical scanning: Documents are organized by municipality and scanned in batches on a document scanner with an ADF (automatic document feeder), using the naming convention [municipality]_[document-type]_[sequence].
- Step 2 - OCR: Considering Google Document AI, Mistral OCR 3, AWS Textract, or other tools. The team is asking for feedback on tools specifically tested on degraded Latin American registry documents.
- Step 3 - Discovery: Feeding OCR output directly into AI tools with large context windows for open-ended analysis before database setup. Using Gemini 3.1 Pro (in NotebookLM or other interface) for broad batch analysis with prompts like "which lots appear linked to more than one buyer?", "flag contracts with incoherent dates", "identify clusters of suspicious names or activity", and "help us see problems and solutions for what we aren't seeing". Running Claude Projects in parallel for similar analysis.
- Step 4 - Data cleaning and standardization: Normalizing raw extracted data before database insertion. This covers mapping municipality names written multiple ways ("B. Vista", "Bela Vista de GO", "Bela V. Goiás") to a single canonical form, standardizing CPFs (Brazilian personal ID numbers) that appear with and without punctuation, mapping inconsistent lot status descriptions to enum categories, and fuzzy matching buyer names with spelling variations. Tools: Python + rapidfuzz for fuzzy matching, Claude API for normalizing free-text fields into categories. The team is asking whether fuzzy matching + LLM normalization is sufficient for 10,000 records with decades of inconsistency, or whether they need more rigorous entity resolution (e.g., Dedupe.io).
- Step 5 - Database: Stack chosen: Supabase (PostgreSQL + pgvector) with NocoDB on top. Three options were evaluated: Airtable (easiest to start but limited at scale), direct PostgreSQL (most control but slower iteration), and Supabase + NocoDB (chosen as the middle ground).
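The Step 4 normalization can be illustrated with a minimal sketch. It is not the team's code: the canonical municipality list, the function names, and the 0.6 cutoff are assumptions made for illustration, and Python's standard-library difflib stands in for rapidfuzz so the example has no third-party dependency. The CPF check-digit validation is an extra step not mentioned in the source; it is included because it can catch digits misread by OCR.

```python
import re
import unicodedata
from difflib import get_close_matches

# Hypothetical canonical list; the real one would cover every municipality in the portfolio.
CANONICAL_MUNICIPALITIES = ["Bela Vista de Goiás", "Goiânia", "Anápolis"]


def _fold(text):
    """Lowercase, strip accents, and turn punctuation into single spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).strip()


_FOLDED = {_fold(name): name for name in CANONICAL_MUNICIPALITIES}


def canonical_municipality(raw, cutoff=0.6):
    """Return the closest canonical name, or None if nothing clears the cutoff."""
    hits = get_close_matches(_fold(raw), list(_FOLDED), n=1, cutoff=cutoff)
    return _FOLDED[hits[0]] if hits else None


def normalize_cpf(raw):
    """Return the CPF as 11 bare digits, or None if it fails basic validation."""
    digits = re.sub(r"\D", "", raw)
    # Reject wrong lengths and all-same-digit sequences such as 111.111.111-11.
    if len(digits) != 11 or digits == digits[0] * 11:
        return None
    nums = [int(d) for d in digits]
    # Verify both check digits (positions 10 and 11) with the mod-11 rule.
    for n in (9, 10):
        total = sum(v * (n + 1 - i) for i, v in enumerate(nums[:n]))
        if (total * 10) % 11 % 10 != nums[n]:
            return None
    return digits
```

With this sketch, "Bela Vista de GO" and "Bela V. Goiás" both resolve to "Bela Vista de Goiás", while the much shorter "B. Vista" scores below the cutoff and returns None. Heavy abbreviations like that would likely need an alias table or manual review rather than a lower threshold, which is one reason plain fuzzy matching may not be enough on its own.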
The goal is to get a real consolidated picture in 30-60 days and avoid repeating the previous failed attempts at organization.
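Once records are cleaned, the discovery question "which lots appear linked to more than one buyer?" can also be checked deterministically, as a cross-check on what the AI tools report. The following is a rough sketch under assumed field names (`municipality`, `lot`, `buyer_id`), which are hypothetical and not taken from the team's schema; the sample records are synthetic.

```python
from collections import defaultdict


def lots_with_multiple_buyers(records):
    """Group records by (municipality, lot) and keep lots with 2+ distinct buyers."""
    buyers_by_lot = defaultdict(set)
    for rec in records:
        key = (rec["municipality"], rec["lot"])
        buyers_by_lot[key].add(rec["buyer_id"])
    return {
        key: sorted(buyers)
        for key, buyers in buyers_by_lot.items()
        if len(buyers) > 1
    }


# Synthetic sample data for illustration only.
sample_records = [
    {"municipality": "Bela Vista de Goiás", "lot": "Q12-L03", "buyer_id": "buyer-001"},
    {"municipality": "Bela Vista de Goiás", "lot": "Q12-L03", "buyer_id": "buyer-002"},
    {"municipality": "Bela Vista de Goiás", "lot": "Q12-L04", "buyer_id": "buyer-001"},
]
flagged = lots_with_multiple_buyers(sample_records)
```

A flagged lot is only a candidate for review: more than one buyer can reflect a legitimate resale chain as well as a duplicate sale, so contract dates and registry status would still need to be compared by a lawyer.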
📖 Read the full source: r/ClaudeAI