Introduction
LM Studio makes it extremely straightforward to run and experiment with open-source large language models (LLMs) entirely on your local machine, without requiring an internet connection or cloud dependencies. Download models, start chats, and explore responses with full control over your data.
But what if you want to go beyond the local interface?
Suppose you have an LM Studio model running locally and want to call it from another app, integrate it into your production environment, share it securely with your team, or connect it to tools built around the OpenAI API.
This is the tricky part. LM Studio runs models locally but doesn't natively expose them through a secure, authenticated API. If you set this up manually, you'll be handling tunneling, routing, and API management yourself.
This is where the Clarifai Local Runner comes in. Local Runners let you securely and seamlessly serve AI models, MCP servers, or agents through public APIs directly from your laptop, workstation, or internal server. There is no need to upload models or manage infrastructure: your model runs locally, while Clarifai handles the API, routing, and integration.
When started, the local runner establishes a secure connection to Clarifai's control plane. All API requests sent to your model are routed to your machine, processed locally, and returned to your client. From the outside, it behaves like a Clarifai-hosted model, but all computation happens on your local hardware.
Local runners let you:
Run models on your own hardware
Use a laptop, workstation, or on-premises server with full access to local GPUs and system tools.
Keep your data and compute private
Nothing is uploaded, which is useful in regulated environments and for sensitive projects.
Skip infrastructure setup
There is no need to build and host your own API. Clarifai provides endpoints, routing, and authentication.
Prototype and iterate quickly
Test your model in a live pipeline without waiting on deployment. Inspect requests and outputs in real time.
Connect to local files and private APIs
Let models access file systems, internal databases, or OS resources without exposing your environment.
Now that the benefits are clear, let's look at how to run LM Studio models locally and expose them securely through an API.
Run the LM Studio model locally
The Clarifai CLI's LM Studio toolkit lets you initialize, configure, and run LM Studio models locally while exposing them through a secure public API. Test, integrate, and iterate directly from your machine without standing up any infrastructure.
Note: Download LM Studio and keep it open while the Local Runner is in use. The runner starts and communicates with LM Studio over local ports to load, serve, and run model inference.
Step 1: Prerequisites
Install the Clarifai package, which includes the CLI:
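```
pip install --upgrade clarifai
```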
Log in to Clarifai:
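```
clarifai login
```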
Follow the prompts to enter your user ID and personal access token (PAT). If you need help obtaining these, see the documentation.
Step 2: Initialize the model
Initialize and configure your LM Studio model locally using the Clarifai CLI. Only models available in the LM Studio model catalog in GGUF format are supported.
Initialize the default sample model:
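```
# --toolkit selects the LM Studio integration; run `clarifai model init --help`
# to confirm the flags available in your CLI version.
clarifai model init --toolkit lmstudio
```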
By default, this creates a project for the LiquidAI/LFM2-1.2B LM Studio model in the current directory.
If you want to use a specific model instead of the default LiquidAI/LFM2-1.2B, pass the --model-name flag with the full model name, as shown below. See the complete list of supported models here.
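```
# <full-model-name> is a placeholder for any GGUF model in the LM Studio catalog.
clarifai model init --toolkit lmstudio --model-name <full-model-name>
```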
Note: Some models are large and require substantial memory. Make sure your machine meets the model's requirements before initializing.
After running the command, the CLI scaffolds your project. The generated directory structure looks like this (exact layout may vary slightly between CLI versions):
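```
your-model-directory/
├── model.py
├── config.yaml
└── requirements.txt
```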
model.py contains the logic that calls LM Studio's local runtime for predictions. config.yaml defines metadata, compute characteristics, and toolkit settings. requirements.txt lists the Python dependencies.
Step 3: Customize model.py
The scaffold includes an LMstudioModelClass that extends OpenAIModelClass. This class defines how the Local Runner interacts with LM Studio's local runtime.
Key methods:
load_model() – Starts LM Studio's local runtime, loads the selected model, and connects to the server port through an OpenAI-compatible API interface.
predict() – Handles single-prompt inference with optional parameters such as max_tokens, temperature, and top_p. Returns the complete model response.
generate() – Streams generated tokens in real time for interactive or incremental output.
You can use these implementations as-is or modify them to suit your preferred request and response structure.
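For orientation, here is a minimal sketch of the shape this class takes. It is illustrative only, under stated assumptions (LM Studio's default port 1234, the default model, and simplified method signatures), not the exact generated code:

```python
from openai import OpenAI
from clarifai.runners.models.openai_class import OpenAIModelClass


class LMstudioModelClass(OpenAIModelClass):
    """Bridges the Clarifai Local Runner to LM Studio's local server."""

    def load_model(self):
        # LM Studio serves an OpenAI-compatible API on a local port
        # (1234 is LM Studio's default; match the port in config.yaml).
        self.client = OpenAI(
            base_url="http://localhost:1234/v1",
            api_key="lm-studio",  # LM Studio does not validate the key
        )
        self.model = "LiquidAI/LFM2-1.2B"

    def predict(self, prompt: str, max_tokens: int = 512,
                temperature: float = 0.7, top_p: float = 0.95) -> str:
        # Single-prompt inference; returns the complete response text.
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
        )
        return response.choices[0].message.content
```

The scaffolded generate() follows the same pattern with stream=True, yielding chunks as they arrive.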
Step 4: Configure config.yaml
The config.yaml file defines the model ID, runtime, and compute metadata for the LM Studio Local Runner.
model – Contains id, user_id, app_id, and model_type_id (for example, text-to-text).
toolkit – Specifies lmstudio as the provider. The main fields are:
model – The LM Studio model to use (e.g., LiquidAI/LFM2-1.2B).
port – The local port on which the LM Studio server listens.
context_length – The maximum context length of the model.
inference_compute_info – Largely optional for local runners, since the model runs entirely on your local machine and uses local CPU/GPU resources; you can leave the defaults as-is. If you later deploy the model to Clarifai's dedicated compute, you can specify CPU/memory limits, the number of accelerators, and the GPU type to suit your model's requirements.
build_info – Specifies the Python version used at runtime (e.g., 3.12).
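Taken together, a filled-in config.yaml might look roughly like the sketch below. Field names follow the descriptions above; treat your generated file as the source of truth rather than copying this verbatim:

```yaml
model:
  id: lfm2-1-2b
  user_id: your-user-id
  app_id: your-app-id
  model_type_id: text-to-text

toolkit:
  provider: lmstudio
  model: LiquidAI/LFM2-1.2B
  port: 1234
  context_length: 4096

inference_compute_info:   # optional for local runners
  cpu_limit: "1"
  cpu_memory: 1Gi
  num_accelerators: 0

build_info:
  python_version: "3.12"
```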
Finally, the requirements.txt file lists the Python dependencies your model requires. Add any additional packages your logic needs.
Step 5: Start your local runner
From inside your model directory, start a local runner that connects to the LM Studio runtime:
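```
# Run from the scaffolded project directory; check `clarifai model --help`
# if the subcommand differs in your CLI version.
clarifai model local-runner
```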
If contexts or defaults are missing, the CLI prompts you to create them. This ensures that the compute context, node pool, and deployment are set up in your configuration.
Once the runner is up, you'll receive a public Clarifai URL for your local model. Requests sent to this endpoint are securely routed to your machine, executed by LM Studio, and returned to your client.
Run inference with the local runner
Once your LM Studio model is running locally and published via the Clarifai Local Runner, you can send inference requests from anywhere using OpenAI-compatible APIs or the Clarifai SDK.
OpenAI-compatible API
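A minimal example using the official openai Python package pointed at Clarifai's OpenAI-compatible endpoint; the user, app, and model IDs below are placeholders:

```python
from openai import OpenAI

# Authenticate with your Clarifai personal access token (PAT).
client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key="YOUR_PAT",
)

response = client.chat.completions.create(
    # The model is addressed by its Clarifai URL (placeholder IDs).
    model="https://clarifai.com/your-user-id/your-app-id/models/your-model-id",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=256,
    temperature=0.7,
)
print(response.choices[0].message.content)
```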
Clarifai Python SDK
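A sketch with the Clarifai Python SDK, again with placeholder IDs; predict() here invokes the method defined in model.py:

```python
from clarifai.client import Model

# Use the model URL printed when your local runner started.
model = Model(
    url="https://clarifai.com/your-user-id/your-app-id/models/your-model-id",
    pat="YOUR_PAT",
)

result = model.predict(prompt="What is the capital of France?")
print(result)
```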
You can also try the real-time streaming generate() method, for example:
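```python
# Reusing the `model` object from the SDK example above
# (assumes generate() yields text chunks as they are produced).
for chunk in model.generate(prompt="Write a haiku about local inference"):
    print(chunk, end="", flush=True)
```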
Conclusion
Local Runners give you full control over where your models run without sacrificing integration, security, or flexibility. You can prototype, test, and serve real-world workloads on your own hardware while Clarifai handles routing, authentication, and public endpoints.
Try Local Runners for free on the free tier, or upgrade to the Developer plan, $1 per month for the first year, to connect up to five Local Runners with unlimited runtime.


