
Meet ‘AutoAgent’: The Open-Source Library That Lets an AI Engineer and Optimize Its Own Agent Harness Overnight

AllTopicsToday
Last updated: April 5, 2026 4:59 pm
Published: April 5, 2026

There is a particular kind of tedium that every AI engineer knows well: the prompt-tuning loop. You write a system prompt, run the agent against a benchmark, read the failure traces, tweak the prompt, add a tool, and rerun. Repeat this a few dozen times and the needle might move. It is grunt work dressed up in Python files. Now a new open-source library called AutoAgent, built by Kevin Gu of threelayer.inc, proposes an unsettling alternative: don’t do that work yourself. Let an AI do it.

AutoAgent is an open-source library for autonomously improving agents in any domain. In a 24-hour run, it reached #1 on SpreadsheetBench with a score of 96.5% and posted the top GPT-5 score on Terminal Bench at 55.1%.

https://x.com/kevingu/status/2039843234760073341

What actually is AutoAgent?

AutoAgent is described as being “like autoresearch, but for agent engineering.” The idea is to give an AI agent a task and let it build and iterate on an agent harness autonomously overnight. It modifies the system prompt, tools, agent configuration, and orchestration, runs the benchmark, checks the score, keeps or discards the change, and repeats.

To understand the analogy: Andrej Karpathy’s autoresearch does the same thing for ML training. It loops through propose-train-evaluate cycles and keeps only the changes that improve validation loss. AutoAgent ports that same ratchet loop from ML training to agent engineering. Instead of optimizing model weights or training hyperparameters, it optimizes the harness: the system prompt, tool definitions, routing logic, and orchestration strategy that determine how the agent behaves on a task.

The harness in this context is the scaffolding around the LLM: which system prompt it receives, which tools it can invoke, how it routes between subagents, and how tasks are formatted as input. Most agent engineers handcraft this scaffolding. AutoAgent automates the iteration on the scaffolding itself.
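The keep-or-discard loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not AutoAgent’s implementation: `ratchet`, `toy_propose`, and `toy_evaluate` are hypothetical names, the harness is reduced to a plain dict, and the toy evaluator stands in for a real benchmark run.

```python
import random
from typing import Callable, Tuple

Harness = dict  # hypothetical stand-in for the editable parts of a harness


def ratchet(
    harness: Harness,
    propose: Callable[[Harness], Harness],
    evaluate: Callable[[Harness], float],
    iterations: int,
) -> Tuple[Harness, float]:
    """Keep a proposed change only if it strictly improves the score."""
    best, best_score = harness, evaluate(harness)
    for _ in range(iterations):
        candidate = propose(best)
        score = evaluate(candidate)
        if score > best_score:  # ratchet: a regression is never accepted
            best, best_score = candidate, score
    return best, best_score


# Toy stand-ins (assumptions): a real meta-agent would rewrite prompts and
# tools, and a real evaluator would run the benchmark and return its score.
def toy_propose(h: Harness, rng: random.Random = random.Random(0)) -> Harness:
    return {**h, "temperature": round(rng.uniform(0.0, 1.0), 2)}


def toy_evaluate(h: Harness) -> float:
    return 1.0 - abs(h["temperature"] - 0.3)


if __name__ == "__main__":
    best, score = ratchet({"temperature": 0.9}, toy_propose, toy_evaluate, 50)
    print(best, round(score, 3))
```

Because a candidate replaces the incumbent only on a strict improvement, the best score can never go down from one iteration to the next, which is the property that makes a long unattended run safe to leave alone.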

Architecture: two agents, one file, one directive

The GitHub repository has a deliberately simple structure. agent.py holds the entire harness under test in a single file: configuration, tool definitions, the agent registry, routing and orchestration, and the Harbor adapter boundary. The adapter section is explicitly marked as fixed; everything else is the meta-agent’s primary edit surface. program.md contains the meta-agent’s instructions along with the directive (what kind of agent to build), and it is the only file a human edits.

Think of this as a separation of concerns between human and machine. The human sets the direction in program.md. The meta-agent (a separate, higher-level AI) then reads that directive, inspects agent.py, runs the benchmark, diagnoses what failed, rewrites the relevant parts of agent.py, and repeats. The human never touches agent.py directly.

A key piece of infrastructure that keeps the loop coherent across iterations is results.tsv, an experiment log that the meta-agent creates and maintains automatically. It records every experiment run, giving the meta-agent a history to learn from and to calibrate what to try next. The full project structure also includes Dockerfile.base, an optional .agent/ directory for reusable agent workspace artifacts such as prompts and skills, a tasks/ folder for benchmark payloads (added per benchmark branch), and a jobs/ directory for Harbor job output.

The metric is the total score produced by the benchmark’s task test suites, and the meta-agent hill-climbs on it. Every experiment yields a numeric score: if the score improves, the change is kept; if not, it is discarded. This is the same loop autoresearch uses.
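One way to picture the split between an editable region and a fixed adapter section inside agent.py is a rewrite helper that refuses to touch anything after a marker line. This is a sketch under assumptions: the marker text and the `apply_edit` function are hypothetical and are not taken from the AutoAgent repository.

```python
# Hypothetical marker separating the editable harness from the fixed adapter.
FIXED_MARKER = "# === FIXED ADAPTER (do not edit) ==="


def apply_edit(source: str, new_editable: str) -> str:
    """Replace everything before the marker; keep the marker and adapter as-is."""
    _old_editable, marker, fixed = source.partition(FIXED_MARKER)
    if not marker:
        # Refuse to write a file whose fixed section cannot be located.
        raise ValueError("fixed adapter marker not found")
    return new_editable.rstrip("\n") + "\n\n" + marker + fixed
```

A meta-agent wired this way could propose any new editable section it likes, while the adapter boundary survives every rewrite byte for byte.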
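A tab-separated experiment log of the kind described above is easy to maintain with Python’s standard csv module. The sketch below is illustrative only: the column names in `FIELDS` and both function names are assumptions, since the article does not specify the log’s actual schema.

```python
import csv
from pathlib import Path
from typing import Optional

FIELDS = ["experiment", "change", "score", "kept"]  # hypothetical columns


def log_experiment(path: Path, experiment: int, change: str,
                   score: float, kept: bool) -> None:
    """Append one row, writing the header first if the file is new."""
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, delimiter="\t")
        if is_new:
            writer.writeheader()
        writer.writerow({
            "experiment": experiment,
            "change": change,
            "score": f"{score:.4f}",
            "kept": int(kept),
        })


def best_kept_score(path: Path) -> Optional[float]:
    """Return the highest score among kept experiments, or None if there are none."""
    if not path.exists():
        return None
    with path.open(newline="") as f:
        rows = list(csv.DictReader(f, delimiter="\t"))
    kept = [float(r["score"]) for r in rows if r["kept"] == "1"]
    return max(kept) if kept else None
```

Logging discarded experiments alongside kept ones matters here: the history of what did not work is what lets the next proposal avoid repeating a failed idea.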

Task format and Harbor integration

Benchmarks are expressed as Harbor-format tasks. Each task lives under tasks/my-task/ and contains task.toml for settings such as timeouts and metadata, instruction.md, which is the prompt sent to the agent, and a tests/ directory with a test.sh entry point that writes the score to /logs/reward.txt plus a test.py for deterministic checks or LLM-as-judge validation. environment/Dockerfile defines the task container, and a files/ directory holds reference files mounted into the container. The tests write a score between 0.0 and 1.0 to the verifier’s log, and that score is what the meta-agent hill-climbs on.

The LLM-as-judge pattern here is worth flagging. In addition to checking answers deterministically (as a unit test would), the test suite can use another LLM to evaluate whether the agent’s output is “good enough.” This is common in agent benchmarks where the correct answer cannot be reduced to a string match.

Key takeaways

  • Autonomous harness engineering works: AutoAgent demonstrates that a meta-agent can fully replace the human prompt-tuning loop, iterating on agent.py overnight without a human touching the harness file directly.
  • Benchmark results back the approach: in a 24-hour run, AutoAgent took first place on SpreadsheetBench (96.5%) and posted the top GPT-5 score on Terminal Bench (55.1%), beating every hand-engineered entry.
  • “Model empathy” may be a real phenomenon: a Claude meta-agent optimizing a Claude task agent appeared to diagnose failures more accurately than when optimizing a GPT-based agent, which suggests that pairing models from the same family may matter when designing an AutoAgent loop.
  • The human’s job shifts from engineer to director: you don’t write or edit agent.py. You write program.md, a plain Markdown directive that steers the meta-agent. That shift mirrors a broader change in agent engineering, from writing code to setting goals.
  • Plug-and-play with any benchmark: because tasks follow Harbor’s open format and agents run inside Docker containers, AutoAgent is domain-agnostic. Any task that can be scored, whether spreadsheets, terminal commands, or your own custom domain, can be a target for autonomous self-optimization.
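A deterministic checker of the kind described above boils down to two steps: compute a score in the range 0.0 to 1.0, then write it where the verifier expects it. The sketch below assumes a spreadsheet-style task with a hypothetical cell-to-value mapping; `score_output` and `write_reward` are illustrative names, and only the /logs/reward.txt path comes from the article.

```python
from pathlib import Path

REWARD_PATH = Path("/logs/reward.txt")  # location named in the article


def score_output(expected: dict, actual: dict) -> float:
    """Fraction of expected cells whose values match exactly (0.0 to 1.0)."""
    if not expected:
        return 0.0
    hits = sum(1 for key, value in expected.items() if actual.get(key) == value)
    return hits / len(expected)


def write_reward(score: float, reward_path: Path = REWARD_PATH) -> None:
    """Clamp the score to [0.0, 1.0] and write it as plain text."""
    clamped = max(0.0, min(1.0, score))
    reward_path.parent.mkdir(parents=True, exist_ok=True)
    reward_path.write_text(f"{clamped}\n")
```

Inside a task container, the checker would end with something like `write_reward(score_output(expected, actual))`. Partial credit per cell gives the optimizer a smoother signal than a pass/fail flag, which is a design choice of this sketch rather than something the article specifies.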
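Combining a deterministic check with a judge can be sketched as a grader that accepts any callable as the judge. Everything here is an assumption for illustration: `grade` and the `Judge` signature are hypothetical, and `keyword_judge` is an offline stand-in so the example runs without a model. A real setup would call an LLM at that point.

```python
from typing import Callable, Optional

# (task, reference, answer) -> score, nominally in [0.0, 1.0]
Judge = Callable[[str, str, str], float]


def grade(task: str, reference: str, answer: str,
          judge: Optional[Judge] = None) -> float:
    """Exact match scores 1.0 outright; otherwise defer to the judge, if any."""
    if answer.strip() == reference.strip():
        return 1.0
    if judge is None:
        return 0.0
    # Clamp in case the judge returns something outside the expected range.
    return max(0.0, min(1.0, judge(task, reference, answer)))


def keyword_judge(task: str, reference: str, answer: str) -> float:
    """Stand-in for an LLM judge: share of reference words found in the answer."""
    ref_words = set(reference.lower().split())
    if not ref_words:
        return 0.0
    return len(ref_words & set(answer.lower().split())) / len(ref_words)
```

Running the cheap exact-match check first means the judge is consulted only for answers that fail it, which keeps judge calls, and their cost and variance, to the cases that need them.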
