Gaffa vs Patrivox
Side-by-side comparison to help you choose the right tool.
Gaffa
Gaffa is a scalable REST API for web automation and data extraction using real browsers.
Last updated: March 1, 2026
Patrivox
Patrivox uses Mistral AI to digitize, classify, and make your document archives fully searchable in minutes.
Last updated: March 4, 2026
Feature Comparison
Gaffa
Simple REST API for Browser Control
Gaffa abstracts the complexities of frameworks like Playwright, Selenium, and Puppeteer into a straightforward REST API. Developers can execute sophisticated browser automation tasks (such as navigation, interaction, and data extraction) with a single HTTP request. This removes the learning curve and maintenance burden of working directly with browser automation libraries and streamlines integration into existing data workflows and applications.
Stealth Mode with Resilient Infrastructure
The platform is engineered for the hardest-to-scrape websites. Its stealth mode combines a global network of residential proxies, automated CAPTCHA solving, and real browser instances that execute JavaScript and render pages as a human user's browser would. Gaffa automatically handles proxy rotation, request throttling, and fingerprint management to get past advanced anti-bot defenses and keep extraction success rates high.
Automated Data Processing and Output Formats
Gaffa goes beyond returning raw HTML. It includes built-in data processing to transform web content into immediately usable formats. Users can specify output as clean, simplified HTML; structured JSON via CSS selectors; LLM-ready markdown optimized for AI ingestion; full-page screenshots; or even self-contained offline page archives. This feature saves significant post-processing time and computational resources.
Full Observability and Request Recording
Every automation executed through the Gaffa API is recorded for complete observability. Users can review detailed logs, performance metrics, and visual screen recordings of their browser sessions. This transparency is critical for debugging complex automation scripts, verifying data extraction accuracy, and auditing the behavior of the automation to ensure it operates as intended against target websites.
Patrivox
Next-Generation OCR & Automated Entity Extraction
Patrivox uses Mistral AI's optical character recognition (OCR) technology to convert scanned PDF documents into machine-readable text. Beyond text extraction, the system performs automatic Named Entity Recognition (NER), identifying and categorizing key entities such as individuals, locations, organizations, and dates within the documents. Processing runs in batch mode, so hundreds of files can be uploaded at once, and metadata is enriched automatically with no manual configuration or data entry.
Intelligent Search & AI-Powered Chat Interface
The platform provides a multi-faceted search experience across the entire document collection. It features instant full-text search with typo tolerance and filtering capabilities by date, author, or document type. Furthermore, users can interact with their archives using a conversational AI chat interface, asking questions in plain language. The AI synthesizes answers based on the document content and provides direct citations to the source material, enabling efficient, evidence-based research and information retrieval.
Interactive Knowledge Graph (Constellation)
Patrivox automatically generates a visual knowledge graph, termed "Constellation," which maps the relationships between all identified entities. Users can navigate interactively from node to node, discovering non-obvious connections between people, places, and organizations across different documents. This feature allows for exploratory research, revealing hidden patterns and contextual relationships that would be exceedingly difficult to discern through manual review of the archival material.
Sovereign European Hosting & Collaboration Tools
The platform is built with data sovereignty as a core principle. All data is processed and stored on servers located within the European Union, in line with GDPR requirements. It includes comprehensive audit logs for accountability. Patrivox also supports collaboration, offering unlimited reader accounts for public or team access alongside configurable administrator roles for managing content and user permissions securely.
Use Cases
Gaffa
Competitive Intelligence and Market Research
Businesses can automate the collection of pricing data, product catalogs, feature updates, and promotional content from competitor websites at scale. Gaffa's ability to handle JavaScript-heavy sites and bypass blocks ensures a consistent flow of structured data, enabling companies to perform dynamic pricing analysis, track market trends, and inform strategic decisions with real-time web data.
AI and LLM Training Data Aggregation
For teams building or fine-tuning large language models, Gaffa provides a reliable pipeline for sourcing high-quality, diverse training data from the web. Because the platform outputs clean, LLM-ready markdown and structured JSON, data preparation is simpler, and data scientists can focus on model development rather than on data collection and cleaning.
Regulatory Compliance and Financial Monitoring
Financial institutions and compliance teams can use Gaffa to automate the monitoring of regulatory publications, news sites, and official registers. The platform's reliability and audit trail via session recording are essential for ensuring data provenance and completeness in regulated environments, supporting activities like KYC (Know Your Customer) checks and adverse media screening.
E-commerce Price and Inventory Monitoring
Retailers and aggregators can deploy Gaffa to track real-time pricing, stock availability, and product descriptions across thousands of e-commerce SKUs. The system's concurrent request handling and proxy rotation allow for high-frequency, large-scale scraping without triggering IP bans, enabling dynamic repricing strategies and supply chain optimization.
Patrivox
Municipal and State Archives Digitization Projects
Municipal archives can utilize Patrivox to systematically digitize, process, and open access to vast collections of official records, including council deliberations, civil registers, and historical correspondence. The AI automates the indexing process, making decades or centuries of administrative documents fully searchable for historians, genealogists, and citizens, thereby enhancing transparency and preserving civic heritage.
Historical Society Bulletin and Collection Management
Historical societies and associations managing periodic bulletins, journals, and documentary collections can transform these resources into a searchable digital library. Patrivox enables members and the public to instantly find articles by topic, mentioned individuals, or locations, and to ask complex questions about the society's holdings, significantly increasing both the usefulness of the published materials and engagement with them.
Ecclesiastical and Parish Register Preservation
Dioceses and parish archives can preserve fragile parish registers (baptisms, marriages, burials) by digitizing them with Patrivox. The AI automatically extracts names, dates, and places, linking them across records. This creates an invaluable resource for genealogical research and historical demography, allowing for complex queries about family lineages and local history that were previously labor-intensive to perform.
Special Collections Management for Heritage Libraries
Heritage libraries and institutions with special collections can use Patrivox to provide controlled, enhanced access to unique or rare documents. Researchers can perform deep textual analysis, discover connections between authors and subjects via the knowledge graph, and access sourced answers to specific inquiries, all while the institution maintains secure, role-based access and a complete audit trail of document interactions.
Pricing Comparison
Gaffa
Gaffa offers tiered monthly subscriptions and a pay-as-you-go credit system. All plans include access to the Browser Request API and residential proxies.
Starter: $29 per month, including 9,000 credits, 1 concurrent request, and 7-day data retention.
Startup: $99 per month, including 35,000 credits, 3 concurrent requests, and 30-day data retention.
Growth: $249 per month, including 100,000 credits, 10 concurrent requests, and 3-month data retention.
Pay-as-you-go: $20 per 5,000 credits, purchased on a flexible basis and subject to Starter-plan allowances.
Enterprise: For high-volume or custom requirements, a bespoke plan is available by contacting Gaffa.
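To compare the Gaffa options on a like-for-like basis, the listed prices can be converted into a cost per 1,000 credits. The following is a minimal sketch using only the figures quoted above; the dictionary layout and the function name `price_per_thousand` are illustrative and not part of any Gaffa tooling.

```python
# Prices (USD) and included credits as quoted in the pricing text above.
# The structure and names here are illustrative assumptions.
PLANS = {
    "Starter": (29, 9_000),
    "Startup": (99, 35_000),
    "Growth": (249, 100_000),
    "Pay-as-you-go": (20, 5_000),
}


def price_per_thousand(price_usd: float, credits: int) -> float:
    """Return the cost in USD of 1,000 credits, rounded to cents."""
    return round(price_usd / credits * 1000, 2)


if __name__ == "__main__":
    for name, (price, credits) in PLANS.items():
        cost = price_per_thousand(price, credits)
        print(f"{name}: ${cost:.2f} per 1,000 credits")
```

On these figures the per-credit cost falls as the plan size grows, and pay-as-you-go credits cost the most per unit. The comparison ignores concurrency and retention differences between plans.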
Patrivox
Patrivox offers a transparent, tiered pricing model with monthly or annual billing options. All plans include unlimited reader accounts and are hosted in Europe.
Découverte (Discovery): A free trial plan with 100 trial pages, 200 MB storage, a 250 MB per-file limit, 50 AI queries per month, and support for 1 administrator.
Starter: Priced at €29 per month (billed monthly), this plan includes 1,000 pages, 2 GB storage, a 250 MB per-file limit, 150 AI queries per month, 2 administrators, and standard email support.
Essentiel (Essential): The popular plan, priced at €129 per month (billed monthly). It offers 5,000 pages, 10 GB storage, a 250 MB per-file limit, 1,000 AI queries per month, 3 administrators, and priority email support.
Association: Designed for large associations and archives at €299 per month (billed monthly). It provides 15,000 pages, 25 GB storage, a 1 GB per-file limit, 3,000 AI queries per month, and support for 3 administrators.
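The Patrivox tiers above can be read as a set of limits, so choosing a plan amounts to finding the cheapest tier whose limits cover a collection. The sketch below encodes the quoted figures and makes that check explicit. The `Plan` class and the `smallest_fitting_plan` function are hypothetical helpers written for this comparison, and the sketch assumes the page, storage, and AI query figures are hard caps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    name: str
    eur_per_month: int
    pages: int
    storage_gb: float
    ai_queries_per_month: int


# Figures as quoted in the plan list above, ordered from cheapest to most expensive.
PLANS = [
    Plan("Découverte", 0, 100, 0.2, 50),
    Plan("Starter", 29, 1_000, 2, 150),
    Plan("Essentiel", 129, 5_000, 10, 1_000),
    Plan("Association", 299, 15_000, 25, 3_000),
]


def smallest_fitting_plan(pages: int, storage_gb: float,
                          ai_queries: int) -> Optional[Plan]:
    """Return the cheapest plan whose limits cover the given needs, or None."""
    for plan in PLANS:
        if (plan.pages >= pages
                and plan.storage_gb >= storage_gb
                and plan.ai_queries_per_month >= ai_queries):
            return plan
    return None


if __name__ == "__main__":
    choice = smallest_fitting_plan(pages=3_000, storage_gb=5, ai_queries=200)
    print(choice.name if choice else "No listed plan fits")
```

A collection that exceeds the Association limits returns `None` here, since the text above lists no larger tier.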
Overview
About Gaffa
Gaffa is an API-first platform engineered to solve the complex, infrastructural challenges of large-scale web data extraction and browser automation. It provides developers, data scientists, and businesses with a robust, simplified interface to control real, fully-featured web browsers via a REST API, eliminating the need to build and maintain intricate in-house systems. The platform's core value proposition is its abstraction of the entire technical stack required for reliable scraping, including proxy management, CAPTCHA solving, browser orchestration, and failure handling. This allows technical teams to focus entirely on data utilization rather than pipeline management. Gaffa is specifically architected for resilience against sophisticated anti-bot measures, employing stealth techniques, residential proxy networks, and real browser instances to mimic genuine human interaction. It supports sophisticated automation actions like scrolling, clicking, and form submission, and delivers processed data in multiple formats including raw HTML, structured JSON, LLM-ready markdown, and images. Ideal for startups, growth-stage companies, and enterprises, Gaffa delivers consistent, high-volume access to web data with minimal operational overhead.
About Patrivox
Patrivox is a sophisticated, AI-powered Software-as-a-Service (SaaS) platform engineered to transform static, scanned document archives into dynamic, intelligent knowledge bases. It is specifically designed for European heritage institutions, municipal archives, historical societies, dioceses, and enterprises managing large collections of documents such as parish registers, municipal deliberations, bulletins, and special collections. The platform leverages Mistral AI's advanced optical character recognition (OCR) and natural language processing (NLP) to automatically extract text, identify named entities (people, places, organizations, dates), and construct a semantic network of connections. Its core value proposition is the radical simplification of accessing archival knowledge, enabling full-text search with typo tolerance, natural language question-answering with source citations, and interactive exploration via a knowledge graph. As a sovereign solution, Patrivox is 100% hosted in Europe, GDPR-native by design, and provides a secure, scalable infrastructure to democratize access to historical and institutional knowledge for researchers, administrators, and the public.
Frequently Asked Questions
Gaffa FAQ
What is a credit and how is it calculated?
A credit is Gaffa's unit of API consumption. Usage is calculated from two factors: request duration and proxy bandwidth. Browser runtime is billed at 1 credit per 30 seconds, or 2 credits per 30 seconds if screen recording is enabled. In addition, any request that uses a residential proxy (the proxy_location parameter) incurs a bandwidth charge of 1,500 credits per 1 GB of data transferred. Each successful API call deducts the corresponding credits from your monthly allowance.
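The credit rules above translate into a short calculation. The following is a rough estimator, not Gaffa's billing logic: the function name `estimate_credits` is made up for illustration, and the sketch assumes that every started 30-second block is billed in full, which the text above does not state.

```python
import math

# Rates as quoted in the FAQ answer above.
CREDITS_PER_30S = 1
CREDITS_PER_30S_WITH_RECORDING = 2
PROXY_CREDITS_PER_GB = 1500


def estimate_credits(runtime_seconds: float, recording: bool = False,
                     proxy_gb: float = 0.0) -> float:
    """Estimate the credits consumed by one browser request.

    Assumption (not confirmed by the source): each started 30-second
    block of browser runtime is charged as a whole block.
    """
    blocks = math.ceil(runtime_seconds / 30)
    rate = CREDITS_PER_30S_WITH_RECORDING if recording else CREDITS_PER_30S
    runtime_credits = blocks * rate
    bandwidth_credits = proxy_gb * PROXY_CREDITS_PER_GB
    return runtime_credits + bandwidth_credits


if __name__ == "__main__":
    # A 90-second request without recording that transfers 0.5 GB over a proxy.
    print(estimate_credits(90, recording=False, proxy_gb=0.5))
```

Under these assumptions, proxy bandwidth dominates the cost of data-heavy requests: half a gigabyte costs 750 credits, while 90 seconds of browser runtime costs 3.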
Does Gaffa offer a free trial?
Yes. Gaffa provides a free tier that lets users sign up and experiment with the full API feature set without a credit card. The free tier is restricted to a dedicated demo site (demo.gaffa.dev), where users can build and test automations, learn the workflow, and assess output formats before upgrading to a paid plan for use on the open internet.
What is Gaffa's refund policy?
Gaffa is happy to offer a refund upon request, provided the request is made before any credits have been consumed within the current billing cycle. Once credits have been used, refunds are not typically issued. Users are encouraged to review the detailed refund policy on the Gaffa website for specific terms and conditions.
Do unused credits roll over to the next month?
No, credits do not roll over. The credit allowance included with your monthly subscription plan resets at the start of each billing cycle, and any unused credits from the previous period are forfeited. For pay-as-you-go credit packs, the expiration terms are those specified at purchase.
Patrivox FAQ
How does Patrivox ensure data privacy and compliance?
Patrivox is a fully sovereign European platform. All data processing and storage infrastructure is hosted exclusively within the European Union. The platform is designed from the ground up to be GDPR-native, implementing principles of data minimization, security, and accountability by default. Comprehensive audit logs are maintained for all system activities, providing full transparency over data access and processing.
What file formats and volumes can I upload?
The primary supported format for document ingestion is PDF. The platform is optimized for batch processing, allowing users to drag and drop hundreds of PDF files simultaneously. Individual file size limits vary by subscription plan, typically ranging from 250 MB to 1 GB per file. There are also tiered limits on the total number of pages and storage capacity, detailed in the pricing plans.
What is an "AI query" in the context of pricing plans?
An "AI query" refers to each individual interaction with the platform's conversational AI chat interface. Every question asked in natural language that requires the AI to analyze document content and generate a synthesized, sourced answer consumes one query from the monthly allowance. Standard full-text searches and knowledge graph navigation do not count as AI queries.
Can I provide public access to the archives I upload?
Yes. A core feature of Patrivox is the ability to share your processed knowledge base with unlimited readers. You can grant secure, role-based access to researchers, your team, or the general public without incurring additional per-user costs. Administrators retain full control over management permissions and can curate what is accessible to different user groups.
Alternatives
Gaffa Alternatives
Gaffa is a REST API platform in the web automation and data extraction category. It provides a managed service for controlling real browsers at scale, abstracting the infrastructure complexities of proxy management, CAPTCHA solving, and anti-bot evasion to deliver reliable data. Users may seek alternatives for various reasons, including budget constraints, specific feature requirements not covered by the platform, or a need for greater control over the underlying infrastructure. Some organizations might prefer an open-source framework to build a custom solution, while others might require different pricing models or integration capabilities. When evaluating alternatives, key considerations include the core technology stack, such as whether it uses headless browsers or HTTP clients, its ability to handle JavaScript-rendered content and bypass anti-bot measures, and the scalability of its proxy network. The format and reliability of data output, along with the total cost of ownership for development and maintenance, are also critical decision factors.
Patrivox Alternatives
Patrivox is a specialized SaaS platform within the document intelligence and content automation category. It leverages advanced AI, specifically Mistral AI, to perform high-accuracy OCR, entity extraction, and knowledge graph generation, transforming static scanned PDFs into a dynamic, searchable digital archive. This process is designed for speed and depth, enabling complex semantic search and natural language queries. Users may explore alternatives to Patrivox for several technical and operational reasons. Common drivers include specific budget constraints or pricing model preferences, the need for integration with an existing enterprise software stack, or requirements for particular feature sets such as custom entity recognition models, different API capabilities, or support for niche document types not covered by a general platform. When evaluating an alternative solution, key technical criteria should be assessed. These include the core OCR engine's accuracy, especially with historical or poor-quality scans, the sophistication of its natural language processing for entity and relationship extraction, and the flexibility of its search functionality. Additionally, considerations around data sovereignty, compliance certifications, and the scalability of the platform's architecture are critical for enterprise deployment.