
FireRedTeam Releases FireRed-OCR-2B Utilizing GRPO to Solve Structural Hallucinations in Tables and LaTeX for Software Developers

AllTopicsToday
Last updated: March 2, 2026 2:17 pm
Published: March 2, 2026

Digitizing documents has long been a multi-step problem: a pipeline first detects the layout, then extracts the text, and finally attempts to reconstruct the structure. For Large Vision-Language Models (LVLMs), this often leads to “structural hallucinations” such as jumbled lines, invented formulas, or unclosed syntax.

FireRedTeam has released FireRed-OCR-2B, a flagship model designed to treat document parsing as a structural engineering task rather than “impressionist” text generation. Built on the Qwen3-VL-2B-Instruct architecture, the model sets a new state of the art (SOTA) among end-to-end solutions, with an overall score of 92.94% on the OmniDocBench v1.5 benchmark.

Paradigm Shift: Structural Engineering vs. Text Generation

Developers often find that even the most powerful general-purpose VLMs struggle with the dense spatial logic of technical PDFs. When a model “sees” complex tables or multi-line LaTeX equations, it often fails to preserve the hierarchical relationships between elements.

FireRed-OCR-2B addresses this through a specialized progressive training pipeline consisting of three distinct stages:

  • Multi-task pre-alignment: This stage establishes spatial grounding by training the model on detection, region recognition, and layout-to-Markdown tasks.
  • Specialized SFT (Supervised Fine-Tuning): The model is fine-tuned on high-quality, standardized Markdown datasets to ensure logical consistency and hierarchical representation.
  • Format-constrained GRPO: The final stage uses reinforcement learning to enforce syntactic validity.

Core innovation: format-constrained GRPO

FireRed-OCR’s most important technical differentiator is its use of Group Relative Policy Optimization (GRPO) with format constraints. While conventional fine-tuning focuses on character accuracy, GRPO introduces a reinforcement learning loop that rewards the model for specific structural properties:

  • Formula syntax: Ensures that LaTeX equations are mathematically valid.
  • Table integrity: Maintains consistent row/column counts and correct HTML/Markdown tagging.
  • Hierarchical closure: Ensures that all opened structural tags (such as lists and headers) are properly closed.
  • Text accuracy: Reduces character-level errors in dense blocks of text.
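To make the structural checks concrete, here is a minimal sketch of what a format-based reward could look like. It is not FireRedTeam's reward function: the function names, the equal weighting of the three checks, and the small set of void HTML tags are all assumptions made for illustration, and the checks are deliberately shallow (brace balance rather than full LaTeX validation, for example).

```python
import re

# Hypothetical set of HTML tags that never take a closing tag.
VOID_TAGS = {"br", "hr", "img"}


def braces_balanced(text: str) -> bool:
    """Return True if every unescaped '{' has a matching '}'."""
    depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \{ or \}
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
        i += 1
    return depth == 0


def table_columns_consistent(text: str) -> bool:
    """Return True if all Markdown table rows have the same cell count.

    Simplification: every line starting with '|' is treated as part of
    one table, and escaped pipes inside cells are not handled.
    """
    rows = [ln.strip() for ln in text.splitlines() if ln.strip().startswith("|")]
    counts = {len(row.strip("|").split("|")) for row in rows}
    return len(counts) <= 1


def tags_closed(text: str) -> bool:
    """Return True if HTML-style tags are properly nested and closed."""
    stack = []
    pattern = r"<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*?(/?)>"
    for closing, name, self_closing in re.findall(pattern, text):
        name = name.lower()
        if self_closing or name in VOID_TAGS:
            continue
        if closing:
            if not stack or stack.pop() != name:
                return False
        else:
            stack.append(name)
    return not stack


def format_reward(output: str) -> float:
    """Fraction of structural checks passed, in [0.0, 1.0]."""
    checks = [
        braces_balanced(output),
        table_columns_consistent(output),
        tags_closed(output),
    ]
    return sum(checks) / len(checks)
```

A well-formed output such as `$\frac{a}{b}$` passes all three checks, while an output that leaves a `<table>` open loses one third of the reward. A real reward would also need a text-accuracy term, which requires ground-truth text and is omitted here.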

By eliminating the need for a separate “critic” model, a key benefit of the GRPO algorithm, FireRedTeam has streamlined the training process to focus specifically on the high-friction areas of document parsing.
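The reason GRPO needs no critic is that each sampled output is scored against the other outputs sampled for the same input, rather than against a learned value estimate. The sketch below shows that group-relative normalization in isolation. The function name and the `eps` constant are illustrative, and the choice of population standard deviation is an assumption, since implementations differ on that detail.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of sampled outputs.

    Each advantage is (reward - group mean) / (group std + eps), so the
    group itself serves as the baseline and no critic model is required.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Rewards for four candidate transcriptions of the same page (made-up values).
advantages = group_relative_advantages([1.0, 0.5, 0.0, 0.5])
```

Outputs that score above the group average receive a positive advantage and are reinforced, while those below it are penalized. If every output in a group earns the same reward, all advantages are zero and that group contributes no learning signal.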

Solving long-tail layout problems

The “long tail” of document layouts (such as non-standard legal forms, academic papers with overlapping figures, or handwritten annotations) is where most OCR pipelines break. FireRed-OCR addresses this with a “Geometry + Semantics” data factory.

This approach uses geometric feature clustering and multi-dimensional tagging to synthesize a balanced dataset. By combining geometric awareness with semantic understanding, the model maintains “in-the-wild robustness” and outperforms traditional pipeline systems such as PaddleOCR on complex, non-standard layouts (as benchmarked on the FireRedBench dataset).
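The article does not describe how the data factory balances its dataset, so the following is only a rough sketch of the general idea: once each page carries a geometric cluster ID and a semantic tag, sampling the same number of pages from every (cluster, tag) bucket keeps rare layouts from being drowned out by common ones. The field names `cluster` and `tag`, the function name, and the oversampling rule are all hypothetical.

```python
import random
from collections import defaultdict


def balanced_sample(pages, per_bucket, seed=0):
    """Draw `per_bucket` pages from every (cluster, tag) bucket.

    `pages` is a list of dicts with hypothetical keys "cluster" (a
    geometric cluster ID assigned upstream) and "tag" (a semantic label).
    Buckets smaller than `per_bucket` are oversampled with replacement.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for page in pages:
        buckets[(page["cluster"], page["tag"])].append(page)

    sample = []
    for key in sorted(buckets):
        items = buckets[key]
        if len(items) >= per_bucket:
            sample.extend(rng.sample(items, per_bucket))
        else:
            sample.extend(rng.choices(items, k=per_bucket))
    return sample


# 90 common two-column papers and 10 rare legal forms (synthetic example).
pages = [{"cluster": 0, "tag": "paper", "id": i} for i in range(90)]
pages += [{"cluster": 1, "tag": "form", "id": i} for i in range(90, 100)]
balanced = balanced_sample(pages, per_bucket=20)
```

In this example the rare forms make up 10% of the raw pool but half of the balanced sample, which is the kind of rebalancing a long-tail dataset needs.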

Performance benchmark

In a direct comparison on OmniDocBench v1.5, FireRed-OCR-2B (92.94%) outperforms other end-to-end models, including:

  • DeepSeek-OCR 2: 91.09%
  • Gemini-3.0 Pro: 90.33%
  • Qwen3-VL-235B: 89.15%
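For a quick sense of scale, the margins implied by these scores can be computed directly. The snippet below uses only the figures quoted in this article; the variable names are arbitrary.

```python
# Overall OmniDocBench v1.5 scores as quoted in this article (percent).
scores = {
    "FireRed-OCR-2B": 92.94,
    "DeepSeek-OCR 2": 91.09,
    "Gemini-3.0 Pro": 90.33,
    "Qwen3-VL-235B": 89.15,
}

baseline = scores["FireRed-OCR-2B"]

# Lead of FireRed-OCR-2B over each other model, in percentage points.
margins = {
    name: round(baseline - score, 2)
    for name, score in scores.items()
    if name != "FireRed-OCR-2B"
}

for name, margin in margins.items():
    print(f"{name}: +{margin:.2f} points")
```

The lead ranges from 1.85 points over DeepSeek-OCR 2 to 3.79 points over Qwen3-VL-235B.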

Although some “pipeline” solutions (which use separate models for detection and recognition) achieve slightly higher scores, FireRed-OCR-2B delivers the best performance as a single-model, end-to-end approach. This is especially relevant for developers looking to reduce system complexity and inference latency in production Retrieval-Augmented Generation (RAG) environments.

Key takeaways

We've summarized the technical significance and performance metrics of the FireRed-OCR-2B release into key takeaways for AI engineers and data scientists.

Key points: FireRed-OCR-2B

  • New end-to-end SOTA performance: FireRed-OCR-2B achieved a state-of-the-art (SOTA) score of 92.94% on the OmniDocBench v1.5 benchmark. This makes it the leading single-model solution for document parsing, surpassing significantly larger models such as Qwen3-VL-235B and Gemini-3.0 Pro in structural accuracy.
  • Architectural foundation: Built on Qwen3-VL-2B-Instruct, the model uses a Vision-Language Model (VLM) approach. It replaces traditional multi-stage pipelines (separate detection, cropping, and OCR steps) with a unified end-to-end transformer architecture that directly outputs structured Markdown.
  • Structural integrity with GRPO: The key technical differentiator is the use of GRPO (Group Relative Policy Optimization) with format constraints. This reinforcement learning technique rewards the model for maintaining syntactic validity. Specifically, it pushes LaTeX formulas, table tags, and Markdown hierarchies to be logically closed and mathematically consistent.
  • “Geometry + Semantics” data factory: To solve complex “in-the-wild” layout problems, FireRedTeam developed a specialized data engine. This “factory” synthesizes datasets by balancing geometric layout features with semantic content, allowing the model to handle overlapping figures, multi-column academic papers, and non-standard forms more reliably than earlier iterations.

Check out the model weights and repository.
