How to Build a Fully Functional Computer-Use Agent that Thinks, Plans, and Executes Virtual Actions Using Local AI Models

AllTopicsToday
Last updated: October 25, 2025 11:38 am
Published: October 25, 2025

In this tutorial, we build from scratch a sophisticated computer-use agent that can reason, plan, and execute virtual actions using local open-weight models. We create a miniature mock desktop, equip it with a tool interface, and design an intelligent agent that can analyze its environment, decide on actions such as clicks and typing, and perform them step by step. By the end, you will see how the agent interprets goals such as opening an email or writing a note, and how local language models can mimic interactive reasoning and task execution. Check out the full code here.
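Before diving into the implementation, the control flow the tutorial builds can be sketched in a few lines. This is an illustrative stand-in, not the article's code: the `run_agent` and `demo_policy` names are hypothetical, and a hard-coded policy replaces the language model, but it shows the observe-decide-act loop the agent will run:

```python
# Minimal sketch of the observe-think-act loop behind a computer-use agent:
# read the screen, pick an action, apply it, repeat until done.
def run_agent(policy, screen, max_steps=3):
    """policy: a function mapping the current screen text to an (action, arg) pair."""
    log = []
    for _ in range(max_steps):
        action, arg = policy(screen)          # "think": decide the next action
        log.append((action, arg))             # record the decision
        if action == "stop":
            break
        screen = f"{screen}\n{action}:{arg}"  # "act": the environment updates

    return log

# A hypothetical hard-coded policy standing in for the language model.
def demo_policy(screen):
    return ("stop", "") if "click:mail" in screen else ("click", "mail")

steps = run_agent(demo_policy, "desktop")
```

In the real agent below, `demo_policy` is replaced by a local model that reads a screenshot string and emits a structured action.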

!pip install -q transformers accelerate sentencepiece nest_asyncio

import torch, asyncio, uuid
from transformers import pipeline
import nest_asyncio

nest_asyncio.apply()

We set up the environment by installing the required libraries: Transformers, Accelerate, SentencePiece, and nest_asyncio. This lets you run local models and asynchronous tasks seamlessly in Colab, and prepares the runtime so that later parts of the agent can operate without external dependencies. Check out the full code here.
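As a quick sanity check before running the heavier cells, a small helper can confirm which of the required modules are importable (the `check_env` name is ours, not part of the tutorial):

```python
# Illustrative environment check: report which required modules are installed.
import importlib.util

def check_env(modules=("torch", "transformers", "nest_asyncio")):
    """Return the subset of the given module names that can be imported."""
    return [m for m in modules if importlib.util.find_spec(m) is not None]

available = check_env()
```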

class LocalLLM:
    def __init__(self, model_name="google/flan-t5-small", max_new_tokens=128):
        self.pipe = pipeline(
            "text2text-generation",
            model=model_name,
            device=0 if torch.cuda.is_available() else -1,
        )
        self.max_new_tokens = max_new_tokens

    def generate(self, prompt: str) -> str:
        out = self.pipe(prompt, max_new_tokens=self.max_new_tokens, temperature=0.0)[0]["generated_text"]
        return out.strip()


class VirtualComputer:
    def __init__(self):
        self.apps = {
            "browser": "https://example.com",
            "notes": "",
            "mail": ["Welcome to CUA", "Invoice #221", "Weekly Report"],
        }
        self.focus = "browser"
        self.screen = "A browser window is open at https://example.com\nThe search bar is in focus."
        self.action_log = []

    def screenshot(self):
        return f"FOCUS:{self.focus}\nSCREEN:\n{self.screen}\nAPPS:{list(self.apps.keys())}"

    def click(self, target: str):
        if target in self.apps:
            self.focus = target
            if target == "browser":
                self.screen = f"Browser tab: {self.apps['browser']}\nAddress bar is focused."
            elif target == "notes":
                self.screen = f"Notes app\nCurrent notes:\n{self.apps['notes']}"
            elif target == "mail":
                inbox = "\n".join(f"- {s}" for s in self.apps["mail"])
                self.screen = f"Mail app inbox:\n{inbox}\n(read-only preview)"
        else:
            self.screen += f"\nClicked '{target}'."
        self.action_log.append({"type": "click", "target": target})

    def type(self, text: str):
        if self.focus == "browser":
            self.apps["browser"] = text
            self.screen = f"Browser navigated to {text}\nPage heading: Example Domain"
        elif self.focus == "notes":
            self.apps["notes"] += ("\n" + text)
            self.screen = f"Notes app\nCurrent notes:\n{self.apps['notes']}"
        else:
            self.screen += f"\nTyped '{text}' but there are no editable fields."
        self.action_log.append({"type": "type", "text": text})

We define the core components: a lightweight local model and a virtual computer. Flan-T5 serves as the inference engine, and the simulated desktop can open apps, render screens, and respond to typing and click interactions. Check out the full code here.
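To see the simulated-desktop idea in isolation, here is a stripped-down sketch showing how clicking changes focus and typing mutates app state. The `MiniDesktop` name is hypothetical; this is not the class above, just the same state-machine pattern at minimum size:

```python
class MiniDesktop:
    """Tiny stand-in for the tutorial's VirtualComputer: apps are plain state."""
    def __init__(self):
        self.apps = {"notes": "", "mail": ["Welcome", "Invoice #221"]}
        self.focus = "notes"
        self.screen = "desktop"

    def click(self, target):
        # Clicking a known app shifts focus and redraws the screen string.
        if target in self.apps:
            self.focus = target
            self.screen = f"{target} app open"

    def type(self, text):
        # Typing only mutates state when an editable app has focus.
        if self.focus == "notes":
            self.apps["notes"] += text
            self.screen = f"notes:\n{self.apps['notes']}"

    def screenshot(self):
        return f"FOCUS:{self.focus}\nSCREEN:{self.screen}"

d = MiniDesktop()
d.click("mail")
d.click("notes")
d.type("hello")
```

Because the "screen" is just a string, the agent's perception reduces to reading this text, which is what makes the sandbox safe and model-friendly.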

class ComputerTool:
    def __init__(self, computer: VirtualComputer):
        self.computer = computer

    def run(self, command: str, argument: str = ""):
        if command == "click":
            self.computer.click(argument)
            return {"status": "completed", "result": f"clicked {argument}"}
        if command == "type":
            self.computer.type(argument)
            return {"status": "completed", "result": f"typed {argument}"}
        if command == "screenshot":
            snap = self.computer.screenshot()
            return {"status": "completed", "result": snap}
        return {"status": "error", "result": f"unknown command {command}"}

This introduces the ComputerTool interface, which acts as the communication bridge between the agent's reasoning and the virtual desktop. It defines high-level actions such as click, type, and screenshot, letting the agent interact with the environment in a structured way. Check out the full code here.
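The same bridge pattern can also be written as a table-driven dispatcher, which scales better as commands are added. This is a sketch with illustrative names (`run_tool`, `StubComputer`), not the article's implementation:

```python
class StubComputer:
    """Hypothetical minimal computer: just records the calls it receives."""
    def __init__(self):
        self.log = []
    def click(self, target):
        self.log.append(("click", target))
    def type(self, text):
        self.log.append(("type", text))
    def screenshot(self):
        return "SCREEN"

def run_tool(computer, command, argument=""):
    """Map a command name to a method and wrap the outcome in a uniform status dict."""
    handlers = {
        "click": lambda: computer.click(argument),
        "type": lambda: computer.type(argument),
        "screenshot": lambda: computer.screenshot(),
    }
    if command not in handlers:
        return {"status": "error", "result": f"unknown command {command}"}
    # Fall back to a "<command> <argument>" summary when the handler returns None.
    return {"status": "completed", "result": handlers[command]() or f"{command} {argument}"}

sc = StubComputer()
res = run_tool(sc, "click", "mail")
bad = run_tool(sc, "dance")
```

Returning a uniform `{"status": ..., "result": ...}` dict for every command, including unknown ones, keeps the agent loop simple: it never has to handle exceptions from the tool layer.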

class ComputerAgent:
    def __init__(self, llm: LocalLLM, tool: ComputerTool, max_trajectory_budget: float = 5.0):
        self.llm = llm
        self.tool = tool
        self.max_trajectory_budget = max_trajectory_budget

    async def run(self, messages):
        user_goal = messages[-1]["content"]
        steps_remaining = int(self.max_trajectory_budget)
        output_events = []
        total_prompt_tokens = 0
        total_completion_tokens = 0
        while steps_remaining > 0:
            screen = self.tool.computer.screenshot()
            prompt = (
                "You are a computer-use agent.\n"
                f"User goal: {user_goal}\n"
                f"Current screen:\n{screen}\n\n"
                "Think step by step.\n"
                "Reply: ACTION <action> ARG <argument> THEN <message>.\n"
            )
            thought = self.llm.generate(prompt)
            total_prompt_tokens += len(prompt.split())
            total_completion_tokens += len(thought.split())
            action = "screenshot"; arg = ""; assistant_msg = "Working..."
            for line in thought.splitlines():
                if line.strip().startswith("ACTION "):
                    after = line.split("ACTION ", 1)[1]
                    action = after.split()[0].strip()
                    if "ARG " in line:
                        part = line.split("ARG ", 1)[1]
                        if " THEN " in part:
                            arg = part.split(" THEN ")[0].strip()
                        else:
                            arg = part.strip()
                    if "THEN " in line:
                        assistant_msg = line.split("THEN ", 1)[1].strip()
            output_events.append({"summary": [{"text": assistant_msg, "type": "summary_text"}], "type": "reasoning"})
            call_id = "call_" + uuid.uuid4().hex[:16]
            tool_res = self.tool.run(action, arg)
            output_events.append({"action": {"type": action, "text": arg}, "call_id": call_id,
                                  "status": tool_res["status"], "type": "computer_call"})
            snap = self.tool.computer.screenshot()
            output_events.append({"type": "computer_call_output", "call_id": call_id,
                                  "output": {"type": "input_image", "image_url": snap}})
            output_events.append({"type": "message", "role": "assistant",
                                  "content": [{"type": "output_text", "text": assistant_msg}]})
            if "done" in assistant_msg.lower() or "here is" in assistant_msg.lower():
                break
            steps_remaining -= 1
        usage = {"prompt_tokens": total_prompt_tokens,
                 "completion_tokens": total_completion_tokens,
                 "total_tokens": total_prompt_tokens + total_completion_tokens,
                 "response_cost": 0.0}
        yield {"output": output_events, "usage": usage}

We build the ComputerAgent to act as the intelligent controller of the system. It reasons about the goal, decides which action to take, executes it through the tool interface, and records each interaction as a step in its decision-making process. Check out the full code here.
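The brittle part of such a loop is parsing the model's free-text reply. Factoring the ACTION/ARG/THEN parsing into a standalone function makes it easy to unit-test; the sketch below assumes the same reply format as the agent above, but `parse_reply` is our own illustrative code:

```python
def parse_reply(thought):
    """Parse an 'ACTION <cmd> ARG <value> THEN <message>' line from model output.

    Falls back to a screenshot action if no well-formed line is found, so a
    confused model never crashes the loop.
    """
    action, arg, msg = "screenshot", "", "Working..."
    for line in thought.splitlines():
        line = line.strip()
        if not line.startswith("ACTION "):
            continue
        after = line.split("ACTION ", 1)[1]
        action = after.split()[0]
        if "ARG " in line:
            part = line.split("ARG ", 1)[1]
            # THEN may follow the argument on the same line; strip it off.
            arg = part.split(" THEN ")[0].strip() if " THEN " in part else part.strip()
        if "THEN " in line:
            msg = line.split("THEN ", 1)[1].strip()
    return action, arg, msg

action, arg, msg = parse_reply("ACTION click ARG mail THEN Opening the mail app")
```

Defaulting to a harmless `screenshot` action on parse failure is a deliberate design choice: the agent loses a step but regains an observation it can reason from.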

async def main_demo():
    computer = VirtualComputer()
    tool = ComputerTool(computer)
    llm = LocalLLM()
    agent = ComputerAgent(llm, tool, max_trajectory_budget=4)
    messages = [{"role": "user", "content": "Open mail, read inbox subjects, and summarize."}]
    async for result in agent.run(messages):
        print("==== STREAM RESULT ====")
        for event in result["output"]:
            if event["type"] == "computer_call":
                a = event.get("action", {})
                print(f"[TOOL CALL] {a.get('type')} -> {a.get('text')} [{event.get('status')}]")
            if event["type"] == "computer_call_output":
                snap = event["output"]["image_url"]
                print("Screen after action:\n", snap[:400], "...\n")
            if event["type"] == "message":
                print("Assistant:", event["content"][0]["text"], "\n")
        print("Usage:", result["usage"])

loop = asyncio.get_event_loop()
loop.run_until_complete(main_demo())

Finally, we put everything together by running the demo. The agent interprets the user's request and carries out the task on the virtual computer. We watch it generate reasoning, execute commands, update the virtual screen, and accomplish its goal clearly, step by step.
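The streaming pattern the demo relies on, consuming an async generator of event batches, can be exercised with a tiny stub agent. The names here are hypothetical, and we use `asyncio.run` instead of the notebook's nest_asyncio event-loop workaround:

```python
import asyncio

async def stub_agent(messages):
    """Hypothetical stand-in for ComputerAgent.run: an async generator that
    streams event batches shaped like the ones the demo consumes."""
    yield {"output": [{"type": "message",
                       "content": [{"type": "output_text", "text": "done"}]}],
           "usage": {"total_tokens": 7}}

async def consume():
    texts = []
    # `async for` drains the generator batch by batch, exactly as main_demo does.
    async for result in stub_agent([{"role": "user", "content": "demo"}]):
        for event in result["output"]:
            if event["type"] == "message":
                texts.append(event["content"][0]["text"])
    return texts

texts = asyncio.run(consume())
```

In a script, `asyncio.run` is the idiomatic entry point; the tutorial uses `loop.run_until_complete` with nest_asyncio only because Colab already runs an event loop.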

In conclusion, we have implemented the essence of a computer-use agent capable of autonomous reasoning and interaction, and we have seen how a local language model like Flan-T5 can convincingly simulate desktop-level automation inside a safe, text-based sandbox. This project builds an understanding of the architecture behind intelligent computer-use agents that bridge natural-language reasoning and virtual tool control, and it lays a strong foundation for extending these capabilities into real-world, multimodal, and secure automation systems.

Check out the full code here. Feel free to visit our GitHub page for tutorials, code, and notebooks. Also, follow us on Twitter, and don't forget to join our 100,000+ ML SubReddit and subscribe to our newsletter. Wait! Are you on Telegram? Now you can join us on Telegram as well.

Asif Razzaq is the CEO of Marktechpost Media Inc. A visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform that stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.

