©AllTopicsToday 2026. All Rights Reserved.

FireRedTeam Releases FireRed-OCR-2B Utilizing GRPO to Solve Structural Hallucinations in Tables and LaTeX for Software Developers

AllTopicsToday
Last updated: March 2, 2026 2:17 pm
Published: March 2, 2026

Digitizing documents has long been a multi-step problem: first detect the layout, then extract the text, and finally try to reconstruct the structure. For Large Vision-Language Models (LVLMs), this often leads to “structural hallucinations” such as jumbled rows, invented formulas, or unclosed syntax.

FireRedTeam has released FireRed-OCR-2B, a flagship model designed to treat document parsing as a structural engineering task rather than “impressionist” text generation. Built on the Qwen3-VL-2B-Instruct architecture, the model sets a new state of the art (SOTA) among end-to-end solutions, with an overall score of 92.94% on the OmniDocBench v1.5 benchmark.

Paradigm Shift: Structural Engineering vs. Text Generation

Developers often find that even the most powerful general-purpose VLMs struggle with the dense spatial logic of technical PDFs. When a model “sees” complex tables or multi-line LaTeX equations, it often fails to preserve the hierarchical relationships between elements.

FireRed-OCR-2B addresses this through a specialized, progressive training pipeline consisting of three distinct stages:

  • Multi-task preconditioning: This stage establishes spatial grounding by training the model on tasks ranging from detection and region recognition to layout-to-Markdown conversion.
  • Specialized SFT (supervised fine-tuning): The model is fine-tuned on high-quality, standardized Markdown datasets to ensure logical consistency and hierarchical representation.
  • Format-constrained GRPO: The final stage uses reinforcement learning to enforce syntactic validity.

Core innovation: format-constrained GRPO

FireRed-OCR’s most important technical differentiator is its use of Group Relative Policy Optimization (GRPO) with format constraints. Whereas traditional fine-tuning focuses on character-level accuracy, GRPO adds a reinforcement learning loop that rewards the model for specific structural properties:

  • Mathematical syntax: LaTeX equations are mathematically valid.
  • Table integrity: row/column counts stay consistent and HTML/Markdown tags are correct.
  • Hierarchical closure: every opened structural tag (such as lists and headers) is properly closed.
  • Text accuracy: character-level errors in dense blocks of text are reduced.
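The article does not publish FireRed-OCR’s actual reward functions. As a rough illustration of how the four properties above could be scored, here is a minimal sketch built from simple rule-based checks: balanced braces and matching `\begin{...}`/`\end{...}` pairs for LaTeX, a constant cell count per pipe-table block for Markdown tables, a tag stack for HTML closure, and `difflib.SequenceMatcher` similarity as a stand-in for text accuracy. All function names and the 50/50 weighting are hypothetical assumptions, not FireRedTeam’s method.

```python
import re
from difflib import SequenceMatcher

# Tokens that matter for LaTeX nesting: \begin{env}, \end{env}, escaped braces, bare braces.
LATEX_TOKEN = re.compile(r"\\begin\{([^}]*)\}|\\end\{([^}]*)\}|\\[{}]|[{}]")
# Rough HTML tag matcher (a heuristic, not a real parser):
# optional "/", tag name, attributes, optional self-closing "/".
HTML_TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>")
VOID_TAGS = {"br", "hr", "img"}  # tags that never take a closing tag


def latex_balanced(s: str) -> bool:
    """Mathematical syntax: braces balance and environments close in order."""
    stack: list[str] = []
    for m in LATEX_TOKEN.finditer(s):
        tok = m.group(0)
        if m.group(1) is not None:  # \begin{env}
            stack.append("env:" + m.group(1))
        elif m.group(2) is not None:  # \end{env} must close the innermost env
            if not stack or stack.pop() != "env:" + m.group(2):
                return False
        elif tok == "{":
            stack.append("{")
        elif tok == "}":
            if not stack or stack.pop() != "{":
                return False
        # escaped \{ and \} are literal characters and are ignored
    return not stack


def tables_consistent(md: str) -> bool:
    """Table integrity: every contiguous pipe-table block has one column count."""
    counts: set[int] = set()
    for line in md.splitlines() + [""]:  # trailing sentinel closes the last block
        s = line.strip()
        if s.startswith("|"):
            counts.add(len(s.strip("|").split("|")))  # escaped pipes not handled
        else:
            if len(counts) > 1:
                return False
            counts = set()
    return True


def html_tags_closed(s: str) -> bool:
    """Hierarchical closure: every opened HTML tag is closed in LIFO order."""
    stack: list[str] = []
    for m in HTML_TAG.finditer(s):
        closing, name, self_closing = m.group(1), m.group(2).lower(), m.group(3)
        if self_closing or name in VOID_TAGS:
            continue
        if closing:
            if not stack or stack.pop() != name:
                return False
        else:
            stack.append(name)
    return not stack


def format_reward(pred: str, ref: str, w_struct: float = 0.5) -> float:
    """Blend structural checks (0..1) with character-level similarity (0..1)."""
    checks = [latex_balanced(pred), tables_consistent(pred), html_tags_closed(pred)]
    structure = sum(checks) / len(checks)
    text = SequenceMatcher(None, pred, ref).ratio()  # text-accuracy stand-in
    return w_struct * structure + (1 - w_struct) * text
```

A prediction that leaves `\frac{a}{b` unclosed or adds a stray table cell scores below the reference even when most characters match, which is the kind of pressure a format-constrained reward is meant to apply on top of plain character accuracy.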

Because GRPO eliminates the need for a separate “critic” (value) model, one of the algorithm’s key benefits, FireRedTeam was able to streamline training and focus it specifically on the high-friction areas of document parsing.
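In the standard GRPO formulation (introduced with DeepSeekMath), the policy samples a group of outputs for the same input, scores each one, and uses the group’s own mean and standard deviation as the baseline; that group baseline is what replaces a learned critic. The minimal sketch below shows that advantage computation and the PPO-style clipped term it feeds into. It is simplified to one log-probability per output rather than per token and omits the KL penalty to a reference model; FireRedTeam’s actual group size, clipping range, and normalization are not given in the article, so the values here are assumptions.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Score each sampled output relative to its own group (no critic needed).

    The group mean acts as the baseline a critic/value model would otherwise
    estimate; dividing by the group std keeps advantages on a comparable scale.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(logp_new: float, logp_old: float, adv: float,
                      clip: float = 0.2) -> float:
    """PPO-style clipped objective term for one sampled output (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)  # pi_new(o) / pi_old(o)
    clipped = min(max(ratio, 1 - clip), 1 + clip)
    return min(ratio * adv, clipped * adv)


# Hypothetical group of G = 4 outputs for one page, scored by some format reward.
rewards = [1.0, 0.5, 0.0, 0.5]
advantages = group_relative_advantages(rewards)
```

Outputs that beat their siblings get a positive advantage and are reinforced, while a group in which every output scores the same yields zero advantage and therefore no update.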

Solving long-tail layout issues

The “long tail” of document layouts (such as non-standard legal forms, academic papers with overlapping figures, or handwritten annotations) is where most OCR pipelines break. FireRed-OCR addresses it with a “Geometry + Semantics” data factory.

This approach uses geometric feature clustering and multi-dimensional tagging to synthesize a balanced dataset. By combining geometric awareness with semantic understanding, the model maintains “in-the-wild robustness” and outperforms traditional pipeline systems such as PaddleOCR on complex, non-standard layouts (as benchmarked on the FireRedBench dataset).
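The article does not say which clustering algorithm or features the data factory uses. As a hypothetical sketch of the geometric-clustering half of the idea only, the code below groups pages by a few made-up layout features (column count, fraction of the page covered by tables, number of figures) with plain k-means, then draws the same number of samples from each cluster so rare layouts are not drowned out by common ones. The semantic tagging side is not modeled, and every name and feature here is an assumption.

```python
import math
import random

Point = tuple[float, ...]


def kmeans(points: list[Point], k: int, iters: int = 20) -> list[int]:
    """Plain k-means; returns a cluster label for each point."""
    # Farthest-point seeding: deterministic and spreads the initial centers apart.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def balanced_sample(docs: list[str], features: list[Point], k: int,
                    per_cluster: int, seed: int = 0) -> list[str]:
    """Draw up to `per_cluster` documents from every layout cluster."""
    labels = kmeans(features, k)
    rng = random.Random(seed)
    picked: list[str] = []
    for c in range(k):
        members = [d for d, lab in zip(docs, labels) if lab == c]
        picked.extend(rng.sample(members, min(per_cluster, len(members))))
    return picked


# Hypothetical corpus: 8 single-column text pages, 2 multi-column pages with tables.
docs = [f"page{i}" for i in range(10)]
features = [(1, 0.0, 0)] * 8 + [(3, 0.6, 2)] * 2
sample = balanced_sample(docs, features, k=2, per_cluster=2)
```

In this toy corpus the complex multi-column layout makes up 2 of 10 pages, but 2 of the 4 sampled pages. A real data factory would use far richer geometric features and combine them with semantic tags before balancing.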

Performance benchmarks

In a direct comparison on OmniDocBench v1.5, FireRed-OCR-2B (92.94%) outperforms other end-to-end models, including:

  • DeepSeek-OCR 2: 91.09%
  • Gemini-3.0 Pro: 90.33%
  • Qwen3-VL-235B: 89.15%
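To put these margins in concrete terms, the gaps in overall score are simple differences of the reported numbers, in percentage points rather than relative improvement:

```latex
\begin{aligned}
92.94 - 91.09 &= 1.85 \quad \text{(vs. DeepSeek-OCR 2)}\\
92.94 - 90.33 &= 2.61 \quad \text{(vs. Gemini-3.0 Pro)}\\
92.94 - 89.15 &= 3.79 \quad \text{(vs. Qwen3-VL-235B)}
\end{aligned}
```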

Although some “pipeline” solutions (which use separate models for detection and recognition) achieve slightly higher scores, FireRed-OCR-2B delivers the best performance among single-model end-to-end approaches. This matters especially for developers aiming to reduce system complexity and inference latency in production Retrieval-Augmented Generation (RAG) environments.

Key takeaways

We have summarized the technical significance and performance metrics of the FireRed-OCR-2B release into the following key takeaways for AI engineers and data scientists.

  • New end-to-end SOTA performance: FireRed-OCR-2B achieved a state-of-the-art (SOTA) score of 92.94% on the OmniDocBench v1.5 benchmark. This makes it the leading single-model solution for document parsing, ahead of much larger models such as Qwen3-VL-235B and Gemini-3.0 Pro on the same benchmark.
  • Architectural foundation: Built on Qwen3-VL-2B-Instruct, the model follows the vision-language model (VLM) approach. It replaces traditional multi-stage pipelines (separate detection, cropping, and OCR steps) with a unified end-to-end transformer architecture that directly outputs structured Markdown.
  • Structural integrity via GRPO: The key technical differentiator is the use of GRPO (Group Relative Policy Optimization) with format constraints. This reinforcement learning technique rewards the model for maintaining syntactic validity, specifically for keeping LaTeX formulas, table tags, and Markdown hierarchies properly closed and mathematically consistent.
  • “Geometry + Semantics” data factory: To tackle complex real-world layout problems, FireRedTeam built a specialized data engine. This “factory” synthesizes datasets by balancing geometric layout features against semantic content, allowing the model to handle overlapping figures, multi-column academic papers, and non-standard forms more reliably than earlier iterations.

