{"id":2590,"date":"2025-10-14T00:16:25","date_gmt":"2025-10-14T00:16:25","guid":{"rendered":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2025\/10\/14\/docker-model-runner-on-the-new-nvidia-dgx-spark-a-new-paradigm-for-developing-ai-locally\/"},"modified":"2025-10-14T00:16:25","modified_gmt":"2025-10-14T00:16:25","slug":"docker-model-runner-on-the-new-nvidia-dgx-spark-a-new-paradigm-for-developing-ai-locally","status":"publish","type":"post","link":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2025\/10\/14\/docker-model-runner-on-the-new-nvidia-dgx-spark-a-new-paradigm-for-developing-ai-locally\/","title":{"rendered":"Docker Model Runner on the new NVIDIA DGX Spark: a new paradigm for developing AI locally"},"content":{"rendered":"<p>We\u2019re thrilled to bring NVIDIA DGX\u2122 Spark support to <a href=\"https:\/\/www.docker.com\/products\/model-runner\/\">Docker Model Runner<\/a>. The new NVIDIA DGX Spark delivers incredible performance, and Docker Model Runner makes it accessible. With Model Runner, you can easily run and iterate on larger models right on your local machine, using the same intuitive Docker experience you already trust.<\/p>\n<p>In this post, we\u2019ll show how DGX Spark and Docker Model Runner work together to make local model development faster and simpler, covering the unboxing experience, how to set up Model Runner, and how to use it in real-world developer workflows.<\/p>\n<h2 class=\"wp-block-heading\">What is NVIDIA DGX Spark<\/h2>\n<p>NVIDIA DGX Spark is the newest member of the DGX family: a compact, workstation-class AI system, powered by the Grace Blackwell GB10 Superchip\u00a0 that delivers incredible\u00a0 performance for local model development. Designed for researchers and developers, it makes prototyping, fine-tuning, and serving large models fast and effortless, all without relying on the cloud.<\/p>\n<p>Here at Docker, we were fortunate to get a preproduction version of\u00a0 DGX Spark. And yes, it\u2019s every bit as impressive in person as it looks in NVIDIA\u2019s launch materials.<\/p>\n\n<h2 class=\"wp-block-heading\">Why Run Local AI Models and How Docker Model Runner and NVIDIA DGX Spark Make It Easy\u00a0<\/h2>\n<p>Many of us at Docker and across the broader developer community are experimenting with local AI models<strong>. <\/strong>Running locally has clear advantages:<\/p>\n<p><strong>Data privacy and control<\/strong>: no external API calls; everything stays on your machine<\/p>\n<p><strong>Offline availability<\/strong>: work from anywhere, even when you\u2019re disconnected<\/p>\n<p>\u00a0<strong>Ease of customization<\/strong>: experiment with prompts, adapters, or fine-tuned variants without relying on remote infrastructure<\/p>\n<p>But there are also familiar tradeoffs:<\/p>\n<p>Local GPUs and memory can be limiting for large models<\/p>\n<p>Setting up CUDA, runtimes, and dependencies often eats time<\/p>\n<p>Managing security and isolation for AI workloads can be complex<\/p>\n<p>This is where DGX Spark and Docker Model Runner (DMR) shine. DMR provides an easy and secure way to run AI models in a sandboxed environment, fully integrated with Docker Desktop or Docker Engine. 
When combined with the DGX Spark's NVIDIA AI software stack and its large 128 GB of unified memory, you get the best of both worlds: plug-and-play GPU acceleration and Docker-level simplicity.

## Unboxing NVIDIA DGX Spark

The device arrived well-packaged, sleek, and surprisingly small, resembling a mini-workstation more than a server.

Setup was refreshingly straightforward: plug in power, network, and peripherals, then boot into NVIDIA DGX OS, which comes with NVIDIA drivers, CUDA, and the NVIDIA AI software stack pre-installed.

Once on the network, enabling SSH access makes it easy to integrate the Spark into your existing workflow. This way, the DGX Spark becomes an AI co-processor for your everyday development environment, augmenting, not replacing, your primary machine.

## Getting Started with Docker Model Runner on NVIDIA DGX Spark

Installing Docker Model Runner on the DGX Spark is simple and can be done in a matter of minutes.

### 1. Verify Docker CE is Installed

DGX OS comes with Docker Engine (CE) preinstalled. Confirm you have it:

```bash
docker version
```

If it's missing or outdated, install it following the regular [Ubuntu installation instructions](https://docs.docker.com/engine/install/ubuntu/).

### 2. Install the Docker Model CLI Plugin

The Model Runner CLI is distributed as a Debian package via Docker's apt repository. Once the repository is configured (see the installation instructions linked above), install it with:

```bash
sudo apt-get update
sudo apt-get install docker-model-plugin
```

Or use Docker's handy installation script:

```bash
curl -fsSL https://get.docker.com | sudo bash
```

You can confirm it's installed with:

```bash
docker model version
```

### 3. Pull and Run a Model

Now that the plugin is installed, let's pull a model from the [Docker Hub AI Catalog](https://hub.docker.com/u/ai). For example, the **Qwen3 Coder** model:

```bash
docker model pull ai/qwen3-coder
```

The Model Runner container will automatically expose an OpenAI-compatible endpoint at:

```
http://localhost:12434/engines/v1
```

You can verify it's live with a quick test:

```bash
# Test via API
curl http://localhost:12434/engines/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"ai/qwen3-coder","messages":[{"role":"user","content":"Hello!"}]}'

# Or via CLI
docker model run ai/qwen3-coder
```
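For a broader sanity check, you can also list the models the runner knows about, both from the CLI and over the API (a quick sketch; the exact output format may vary across Model Runner versions):

```bash
# Models pulled locally, as reported by the Docker Model CLI
docker model list

# The same catalog, served from the OpenAI-compatible endpoint
curl http://localhost:12434/engines/v1/models
```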
GPUs are allocated to the Model Runner container via nvidia-container-runtime, and Model Runner will take advantage of any available GPUs automatically. To see GPU usage:

```bash
nvidia-smi
```

### 4. Architecture Overview

Here's what's happening under the hood:

```
[ DGX Spark Hardware (GPU + Grace CPU) ]
                  │
      (NVIDIA Container Runtime)
                  │
        [ Docker Engine (CE) ]
                  │
  [ Docker Model Runner Container ]
                  │
     OpenAI-compatible API :12434
```

The NVIDIA Container Runtime bridges the NVIDIA GB10 Grace Blackwell Superchip drivers and Docker Engine, so containers can access CUDA directly. Docker Model Runner then runs inside its own container, managing the model lifecycle and providing the standard OpenAI API endpoint. (For more on the Model Runner architecture, see this [blog](https://www.docker.com/blog/how-we-designed-model-runner-and-whats-next/).)

From a developer's perspective, you interact with models the same way as with any other Dockerized service: `docker model pull`, `list`, `inspect`, and `run` all work out of the box.

## Using Local Models in Your Daily Workflows

If you're using a laptop or desktop as your primary machine, the DGX Spark can act as your **remote model host**. With a few SSH tunnels, you can both access the Model Runner API and monitor GPU utilization via the DGX dashboard, all from your local workstation.

### 1. Forward the DMR Port (for Model Access)

To access the DGX Spark via SSH, first set up an SSH server on it:

```bash
sudo apt install openssh-server
sudo systemctl enable --now ssh
```

Then run the following command to access Model Runner from your local machine. **Replace `user` with the username you configured when you first booted the DGX Spark, and replace `dgx-spark.local` with the IP address of the DGX Spark on your local network or a hostname configured in /etc/hosts.**

```bash
ssh -N -L localhost:12435:localhost:12434 user@dgx-spark.local
```

This forwards the Model Runner API from the DGX Spark to your local machine. Now, in your IDE, CLI tool, or app that expects an OpenAI-compatible API, just point it to:

```
http://localhost:12435/engines/v1
```

Set the model name (e.g. ai/qwen3-coder) and you're ready to use local inference seamlessly.
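Before pointing tools at the tunnel, it's worth a quick end-to-end check. Assuming the `ssh -N` command above is still running, the forwarded port should answer from your workstation just like the local endpoint did on the Spark:

```bash
# Should return the model list from the DGX Spark, through the SSH tunnel
curl http://localhost:12435/engines/v1/models
```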
### 2. Forward the DGX Dashboard Port (for Monitoring)

The DGX Spark exposes a lightweight browser dashboard showing real-time GPU, memory, and thermal stats, usually served locally at:

```
http://localhost:11000
```

You can forward it through the same SSH session or a separate one:

```bash
ssh -N -L localhost:11000:localhost:11000 user@dgx-spark.local
```

Then open [http://localhost:11000](http://localhost:11000) in your browser on your main workstation to monitor the DGX Spark's performance while running your models.

This combination makes the DGX Spark feel like a remote, GPU-powered extension of your development environment. Your IDE or tools still live on your laptop, while model execution and resource-heavy workloads happen securely on the Spark.

## Example Application: Configuring OpenCode with Qwen3 Coder

Let's make this concrete. Suppose you use [OpenCode](https://github.com/sst/opencode), an open-source, terminal-based AI coding agent.

Once your DGX Spark is running Docker Model Runner with ai/qwen3-coder pulled and the port forwarded, you can configure OpenCode by adding the following to ~/.config/opencode/opencode.json. The `baseURL` is DMR's OpenAI-compatible base, reached through the tunnel set up above:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "dmr": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Docker Model Runner",
      "options": {
        "baseURL": "http://localhost:12435/engines/v1"
      },
      "models": {
        "ai/qwen3-coder": { "name": "Qwen3 Coder" }
      }
    }
  },
  "model": "ai/qwen3-coder"
}
```

Now run opencode and select Qwen3 Coder with the `/models` command, and that's it!
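Since this workflow now depends on two SSH tunnels staying open, one small convenience is worth setting up: declare both forwards in your SSH client config so a single command brings up the model API and the dashboard together. A minimal sketch, assuming the same `user` and `dgx-spark.local` as above (the `dgx-spark` host alias is ours to illustrate; pick any name):

```
# ~/.ssh/config on your workstation (hypothetical host alias)
Host dgx-spark
    HostName dgx-spark.local
    User user
    # Model Runner API: local 12435 -> Spark 12434
    LocalForward 12435 localhost:12434
    # DGX dashboard: local 11000 -> Spark 11000
    LocalForward 11000 localhost:11000
```

With this in place, `ssh -N dgx-spark` replaces the two separate tunnel commands.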
From here on, completions and chat requests are routed through Docker Model Runner on your DGX Spark, meaning Qwen3 Coder now powers your agentic development experience locally.

You can verify that the model is running by opening [http://localhost:11000](http://localhost:11000) (the DGX dashboard) to watch GPU utilization in real time while coding. This setup lets you:

- Keep your laptop light while leveraging the DGX Spark GPUs
- Experiment with custom or fine-tuned models through DMR
- Stay fully within your local environment for privacy and cost control

## Summary

Running Docker Model Runner on the NVIDIA DGX Spark makes it remarkably easy to turn powerful local hardware into a seamless extension of your everyday Docker workflow.

- You install one plugin and use familiar Docker commands (`docker model pull`, `docker model run`).
- You get full GPU acceleration through NVIDIA's container runtime.
- You can forward both the model API and the monitoring dashboard to your main workstation for effortless development and visibility.

This setup bridges the gap between developer productivity and AI infrastructure, giving you the speed, privacy, and flexibility of local execution with the reliability and simplicity Docker provides.

As local model workloads continue to grow, the DGX Spark + Docker Model Runner combo represents a practical, developer-friendly way to bring serious AI compute to your desk, with no data center or cloud dependency required.

**Learn more**:

- Read the official DGX Spark launch announcement in the [NVIDIA newsroom](https://nvidianews.nvidia.com/)
- Check out the Docker Model Runner General Availability [announcement](https://www.docker.com/blog/announcing-docker-model-runner-ga/)
- Visit our [Model Runner GitHub repo](https://github.com/docker/model-runner). Docker Model Runner is open source, and we welcome collaboration and contributions from the community! Star, fork, and contribute.