MiniMax, the AI research company behind the MiniMax omni-modal model stack, has released MMX-CLI, a Node.js-based command-line interface. It exposes the full suite of generative capabilities of the MiniMax AI platform to both human developers working in terminals and AI agents running in tools such as Cursor, Claude Code, and OpenCode.
What problem does MMX-CLI solve?
Most current large language model (LLM)-based agents are good at reading and writing text. They can reason through documents, generate code, and respond to multi-step instructions. However, they have no direct path to generating media: there is no built-in way to synthesize audio, compose music, render video, or understand images without a separate integration layer such as Model Context Protocol (MCP).
Building these integrations typically requires writing custom API wrappers, configuring server-side tools, and managing authentication separately from the agent framework you are using. MMX-CLI is positioned as an alternative approach: it exposes all of these capabilities as shell commands that agents can call directly, the same way developers do from the terminal. No MCP glue required.
Seven modalities
MMX-CLI wraps MiniMax's full modal stack into seven groups of generation commands (mmx text, mmx image, mmx video, mmx speech, mmx music, mmx vision, mmx search), plus supporting utilities (mmx auth, mmx config, mmx quota, mmx update).
The mmx text command supports multi-turn chat, streaming output, system prompts, and a JSON output mode. Use the --model flag to target a specific MiniMax model variant, such as MiniMax-M2.7-highspeed (MiniMax-M2.7 is the default).
The mmx image command generates images from text prompts, with control over the aspect ratio (--aspect-ratio) and the number of images per batch (--n). It also supports the --subject-ref parameter for subject references, which keeps a character or object consistent across multiple generated images. This is useful for workflows that require visual continuity.
The mmx video command uses MiniMax-Hailuo-2.3 as the default model, with MiniMax-Hailuo-2.3-Fast available as an alternative. By default, mmx video generate submits a job and polls synchronously until the video is ready. Passing --async or --no-wait changes this behavior: the command returns the task ID immediately, and the caller can check progress independently via mmx video task get --task-id. The command also supports the --first-frame flag for image-conditioned video generation, where the specified image is used as the starting frame of the output video.
The mmx speech command exposes text-to-speech (TTS) synthesis with more than 30 available voices, speed control, volume and pitch adjustment, subtitle timing data output via --subtitles, and streaming playback by piping to a media player. The default model is speech-2.8-hd, with speech-2.6 and speech-02 available as alternatives. Input is limited to 10,000 characters.
The mmx music command, backed by the music-2.5 model, generates music from text prompts with fine-grained compositional controls including --vocals (e.g. "warm male baritone"), --genre, --mood, --instruments, --tempo, --bpm, --key, and --struct. The --instrumental flag produces music without vocals, and the --aigc-watermark flag embeds an AI-generated-content watermark into the output audio.
The mmx vision command handles image understanding through a vision-language model (VLM). It accepts a local file path or remote URL (local files are automatically Base64-encoded), or a previously uploaded MiniMax file ID. The --prompt flag lets you ask specific questions about the image; by default, the prompt simply asks for a description of the image.
The mmx search command runs web search queries through MiniMax's proprietary search infrastructure and returns results in text or JSON format.
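
As a rough illustration of the asynchronous video workflow described above, the TypeScript sketch below assembles the argument lists an agent might hand to a process spawner. Only mmx video generate, --async, --first-frame, and mmx video task get --task-id come from the description above. The --prompt flag on the video command, the VideoRequest shape, and both helper names are assumptions made for illustration, so confirm them against the CLI's own help output before relying on them.

```typescript
// Hypothetical helpers: they only build argv arrays and never run the CLI.
interface VideoRequest {
  prompt: string;          // text prompt (the --prompt flag name is an assumption)
  firstFramePath?: string; // optional image used as the starting frame
  wait?: boolean;          // false -> ask for a task ID instead of blocking
}

function videoGenerateArgs(req: VideoRequest): string[] {
  const args = ["video", "generate", "--prompt", req.prompt];
  if (req.firstFramePath !== undefined) {
    args.push("--first-frame", req.firstFramePath);
  }
  // By default the CLI polls until the video is ready; --async skips the wait.
  if (req.wait === false) {
    args.push("--async");
  }
  return args;
}

// Arguments for checking on a job that was submitted with --async.
function videoTaskGetArgs(taskId: string): string[] {
  return ["video", "task", "get", "--task-id", taskId];
}

// Example: arguments for a non-blocking job seeded with a first frame.
const submitArgs = videoGenerateArgs({
  prompt: "a paper boat drifting at dawn",
  firstFramePath: "boat.png",
  wait: false,
});
console.log(["mmx", ...submitArgs]);
```

Keeping the arguments as an array rather than a single shell string avoids quoting problems when a prompt contains spaces or quotes.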
Technical architecture
MMX-CLI is written almost entirely in TypeScript (99.8% TS) with strict mode enabled, uses Bun as the native runtime for development and testing, and is distributed on npm for compatibility with Node.js 18+ environments. Zod is used to validate the configuration schema, and settings are resolved in a defined order of precedence (CLI flags → environment variables → ~/.mmx/config.json → defaults). This makes the CLI easy to deploy in containerized or CI environments. Dual-region support is built into the API client layer, which routes global users to api.minimax.io and CN users to api.minimaxi.com. The region can be switched via mmx config set --key region --value cn.
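
The precedence order and region routing described above can be sketched in a few lines of TypeScript. This is a minimal illustration, not MMX-CLI's actual code: the real project validates its configuration with Zod, whereas this sketch uses plain objects, and the Settings shape, the "global" region name, the choice of "global" as the default region, and all function names are assumptions. Only the precedence order, the "cn" value, the default text model, and the two hostnames come from the description above.

```typescript
// Sketch of layered settings: CLI flags > environment > config file > defaults.
type Region = "global" | "cn"; // "global" is an assumed name; "cn" is from the text

interface Settings {
  region: Region;
  model: string;
}

type Layer = Partial<Settings>;

// Assumed defaults for illustration.
const DEFAULTS: Settings = { region: "global", model: "MiniMax-M2.7" };

// Later layers win, so callers pass layers from lowest to highest priority.
function mergeLayers(...layers: Layer[]): Settings {
  const out: Settings = { ...DEFAULTS };
  for (const layer of layers) {
    if (layer.region !== undefined) out.region = layer.region;
    if (layer.model !== undefined) out.model = layer.model;
  }
  return out;
}

function resolveSettings(flags: Layer, env: Layer, file: Layer): Settings {
  return mergeLayers(file, env, flags);
}

// Maps the resolved region to the API host named in the article.
function apiHost(region: Region): string {
  return region === "cn" ? "api.minimaxi.com" : "api.minimax.io";
}

// The environment layer overrides the file's region; the flag sets the model.
const resolved = resolveSettings(
  { model: "MiniMax-M2.7-highspeed" },
  { region: "cn" },
  { region: "global" },
);
console.log(resolved.region, resolved.model, apiHost(resolved.region));
```

Because every layer is optional, a CI job can supply everything through environment variables and skip the config file entirely, which is the deployment benefit the precedence order is meant to provide.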
Key takeaways
MMX-CLI is MiniMax's official open command-line interface, giving AI agents native access to seven generative modalities (text, image, video, speech, music, vision, and search) without requiring MCP integration.
AI agents running in tools like Cursor, Claude Code, and OpenCode can be set up with two commands and one natural-language instruction, after which the agent learns the complete command interface on its own from the bundled SKILL.md document.
The CLI is designed for use by programs and agents, with dedicated flags for non-interactive execution, clean stdout/stderr separation for safe piping, structured exit codes for error handling, and a schema export feature that lets agent frameworks register mmx commands as JSON tool definitions.
For AI developers already building agent-based systems, consolidating image, video, speech, music, vision, and search capabilities into a single, well-documented CLI that agents can learn and operate on their own significantly lowers the integration barrier.
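
To make the point about stdout/stderr separation and exit codes concrete, here is a minimal TypeScript sketch of how a calling program might interpret a finished mmx invocation. It assumes the command was run in a JSON output mode, that exit code 0 means success (the usual convention), and that diagnostics arrive only on stderr. The meaning of specific non-zero codes is not covered here, and the ProcessResult and Outcome types and the interpretResult function are hypothetical names, not part of MMX-CLI.

```typescript
// Shape of a finished child process, as any process spawner would report it.
interface ProcessResult {
  exitCode: number;
  stdout: string; // payload only
  stderr: string; // diagnostics only
}

type Outcome =
  | { ok: true; data: unknown }
  | { ok: false; exitCode: number; message: string };

function interpretResult(res: ProcessResult): Outcome {
  // A non-zero exit code signals failure; stderr carries the explanation.
  if (res.exitCode !== 0) {
    return { ok: false, exitCode: res.exitCode, message: res.stderr.trim() };
  }
  // On success, stdout should contain nothing but the JSON payload.
  try {
    return { ok: true, data: JSON.parse(res.stdout) };
  } catch {
    return { ok: false, exitCode: 0, message: "stdout was not valid JSON" };
  }
}

// Example with a made-up payload; real output fields may differ.
const outcome = interpretResult({ exitCode: 0, stdout: '{"results": []}', stderr: "" });
console.log(outcome.ok);
```

Because progress messages and errors stay off stdout, the payload can be parsed or piped to another program without first filtering out log lines.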
Shobha is a data analyst with a proven track record of developing innovative machine learning solutions that drive business value.


