DeepReinforce has launched Ornith-1.0, an open supply mannequin household constructed for agent coding. Out there in 4 sizes, from the 9B high-density mannequin to the 397B knowledgeable blended flagship. All Checkpoints are shipped below Hugging Face’s MIT License. The mannequin is post-trained primarily based on pre-trained Gemma 4 and Qwen 3.5.
Most coding brokers mix fashions with human-designed fixation harnesses. Ornith-1.0 as a substitute learns its personal description. The DeepReinforce analysis group studies state-of-the-art outcomes on open fashions of comparable measurement.
TL;DR
Ornith-1.0 ships below MIT in 9B, 31B, 35B-MoE, and 397B-MoE sizes and is constructed on Gemma 4 and Qwen 3.5. The mannequin learns its personal scaffolding throughout RL and collectively optimizes the harness and resolution. Ornith-1.0-397B outperforms Claude Opus 4.7 in each headline benchmarks, however falls wanting Opus 4.8 and the bigger GLM-5.2-744B. Three layers (fastened belief boundaries, deterministic displays, and frozen LLM judges) forestall rewards from being hacked.
What’s Ornis-1.0?
Ornith-1.0 is a set of inference fashions tailor-made for coding brokers. The variants are 9B Dense, 31B Dense, 35B MoE, and 397B MoE. The 35B mannequin is a mixture of consultants and prompts roughly 3B parameters per token. FP8 and GGUF builds are additionally printed to hurry up native supply.
Every mannequin is an inference mannequin. Replies begin with a block earlier than the ultimate reply. The supplied recipe permits the reasoning parser, so the hint is returned in a separate reasoning_content area. This mannequin additionally points well-formed instrument requires agent loops.
Set up is simple. The 9B mannequin is about 19GB in bf16 and runs on a single 80GB GPU. The supplied recipes goal vLLM, SGLang, and Transformers. Every mannequin exposes an OpenAI-compatible endpoint. Subsequently, the usual agent framework works with none code adjustments.


