{"id":2143,"date":"2025-06-18T14:19:51","date_gmt":"2025-06-18T14:19:51","guid":{"rendered":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2025\/06\/18\/behind-the-scenes-how-we-designed-docker-model-runner-and-whats-next\/"},"modified":"2025-06-18T14:19:51","modified_gmt":"2025-06-18T14:19:51","slug":"behind-the-scenes-how-we-designed-docker-model-runner-and-whats-next","status":"publish","type":"post","link":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2025\/06\/18\/behind-the-scenes-how-we-designed-docker-model-runner-and-whats-next\/","title":{"rendered":"Behind the scenes: How we designed Docker Model Runner and what\u2019s next"},"content":{"rendered":"<p>The last few years have made it clear that AI models will continue to be a fundamental component of many applications. The catch is that they\u2019re also a fundamentally <em>different<\/em> type of component, with complex software and hardware requirements that don\u2019t (yet) fit neatly into the constraints of container-oriented development lifecycles and architectures. To help address this problem, Docker launched the <a href=\"https:\/\/docs.docker.com\/model-runner\/\" target=\"_blank\">Docker Model Runner<\/a> with Docker Desktop 4.40. Since then, we\u2019ve been working aggressively to expand Docker Model Runner with additional OS and hardware support, deeper integration with popular Docker tools, and improvements to both performance and usability.<br \/>For those interested in Docker Model Runner and its future, we offer a behind-the-scenes look at its design, development, and roadmap. <\/p>\n<div class=\"style-standard wp-block-ponyo-houston\">\n<div class=\"wp-block-ponyo-icon\">\n<\/div>\n<p><strong>Note:<\/strong> Docker Model Runner is really <em>two<\/em> components: the model runner and the model distribution specification. 
In this article, we\u2019ll be covering the former, but be sure to check out the <a href=\"https:\/\/www.docker.com\/blog\/why-docker-chose-oci-artifacts-for-ai-model-packaging\/\">companion blog post<\/a> by Emily Casey for the equally important distribution side of the story.<\/p>\n<\/div>\n<h2 class=\"wp-block-heading\"><strong>Design goals<\/strong><\/h2>\n<p>Docker Model Runner\u2019s primary design goal was to allow users to <a href=\"https:\/\/www.docker.com\/blog\/run-llms-locally\/\">run AI models locally<\/a> and to access them from both containers and host processes. While that\u2019s simple enough to articulate, it still leaves an enormous design space in which to find a solution. Fortunately, we had some additional constraints: we were a small engineering team, and we had some ambitious timelines. Most importantly, we didn\u2019t want to compromise on UX, even if we couldn\u2019t deliver it all at once. In the end, this motivated design decisions that have so far allowed us to deliver a viable solution while leaving plenty of room for future improvement.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Multiple backends<\/strong><\/h3>\n<p>One thing we knew early on was that we weren\u2019t going to write our own inference engine (Docker\u2019s wheelhouse is containerized development, not low-level inference engines). We\u2019re also big proponents of open-source, and there were just so many great existing solutions! There\u2019s <a href=\"https:\/\/github.com\/ggml-org\/llama.cpp\" target=\"_blank\">llama.cpp<\/a>, <a href=\"https:\/\/github.com\/vllm-project\/vllm\" target=\"_blank\">vLLM<\/a>, <a href=\"https:\/\/github.com\/ml-explore\/mlx\" target=\"_blank\">MLX<\/a>, <a href=\"https:\/\/onnx.ai\/\" target=\"_blank\">ONNX<\/a>, and <a href=\"https:\/\/pytorch.org\/\" target=\"_blank\">PyTorch<\/a>, just to name a few.<\/p>\n<p>Of course, being spoiled for choice can also be a curse \u2014 which to choose? 
The obvious answer was: as many as possible, but not all at once.<\/p>\n<p>We decided to go with llama.cpp for our initial implementation, but we intentionally designed our APIs with an additional, optional path component (the <strong>{name}<\/strong> in <strong>\/engines\/{name}<\/strong>) to allow users to take advantage of multiple future backends. We also designed interfaces and stubbed out implementations for other backends to enforce good development hygiene and to avoid becoming tethered to one \u201cinitial\u201d implementation.<\/p>\n\n<h3 class=\"wp-block-heading\"><strong>OpenAI API compatibility<\/strong><\/h3>\n<p>The second design choice we had to make was how to expose inference to consumers in containers. While there was also a fair amount of choice in the inference API space, we found that the OpenAI API standard seemed to offer the best initial tooling compatibility. We were also motivated by the fact that several teams inside Docker were already using this API for various real-world products. While we may support additional APIs in the future, we\u2019ve so far found that this API surface is sufficient for most applications. One gap that we know exists is <em>full<\/em> compatibility with this API surface, which is something we\u2019re working on iteratively.<\/p>\n<p>This decision also drove our choice of llama.cpp as our initial backend. The llama.cpp project already offered a turnkey option for OpenAI API compatibility through its <a href=\"https:\/\/github.com\/ggml-org\/llama.cpp\/tree\/master\/tools\/server\" target=\"_blank\">server<\/a> implementation. While we had to make some small modifications (e.g. Unix domain socket support), this offered us the fastest path to a solution. 
We\u2019ve also started contributing these small patches upstream, and we hope to expand our contributions to these projects in the future.<\/p>\n\n<h3 class=\"wp-block-heading\"><strong>First-class citizenship for models in the Docker API<\/strong><\/h3>\n<p>While the OpenAI API standard was the most ubiquitous option amongst existing tooling, we also knew that we wanted models to be first-class citizens in the <a href=\"https:\/\/docs.docker.com\/reference\/api\/engine\/version\/v1.50\/\" target=\"_blank\">Docker Engine API<\/a>. Models have a fundamentally different execution lifecycle than the processes that typically make up the <strong>ENTRYPOINT<\/strong>s of containers, and thus, they don\u2019t fit well under the standard <strong>\/containers<\/strong> endpoints of the Docker Engine API. However, much like containers, images, networks, and volumes, models are such a fundamental component that they really deserve their own API resource type. This motivated the addition of a set of <strong>\/models<\/strong> endpoints, closely modeled after the <strong>\/images<\/strong> endpoints, but separate for reasons that are best discussed in the distribution blog post.<\/p>\n\n<h3 class=\"wp-block-heading\"><strong>GPU acceleration<\/strong><\/h3>\n<p>Another critical design goal was support for GPU acceleration of inference operations. Even the smallest useful models are extremely computationally demanding, while more sophisticated models (such as those with tool-calling capabilities) would be a stretch to fit onto local hardware at all. 
GPU support was going to be non-negotiable for a useful experience.<\/p>\n\n<p>Unfortunately, passing GPUs across the VM boundary in Docker Desktop, especially in a way that would be reliable across platforms and offer a usable computation API inside containers, was going to be either impossible or very flaky.<\/p>\n\n<p>As a compromise, we decided to run inference operations outside of the Docker Desktop VM and simply proxy API calls from the VM to the host. While there are some risks with this approach, we are working on initiatives to mitigate these with containerd-hosted sandboxing on macOS and Windows. Moreover, with Docker-provided models and application-provided prompts, the risk is somewhat lower, especially given that inference consists primarily of numerical operations. We assess the risk in Docker Desktop to be about on par with accessing host-side services via <strong>host.docker.internal<\/strong> (something already enabled by default).<\/p>\n\n<p>However, agents that drive tool usage with model output can cause more significant side effects, and that\u2019s something we needed to address. Fortunately, using the <a href=\"https:\/\/www.docker.com\/blog\/announcing-docker-mcp-catalog-and-toolkit-beta\/\">Docker MCP Toolkit<\/a>, we\u2019re able to perform tool invocation inside ephemeral containers, offering reliable encapsulation of the side effects that models might drive. This hybrid approach allows us to offer the best possible local performance with relative peace of mind when using tools.<\/p>\n\n<p>Outside the context of Docker Desktop, for example, in Docker CE, we\u2019re in a significantly better position due to the lack of a VM boundary (or at least a very transparent VM boundary in the case of a hypervisor) between the host hardware and containers. When running in standalone mode in Docker CE, the Docker Model Runner will have direct access to host hardware (e.g. 
via the <a href=\"https:\/\/docs.nvidia.com\/datacenter\/cloud-native\/container-toolkit\/latest\/index.html\" target=\"_blank\">NVIDIA Container Toolkit<\/a>) and will run inference operations within a container.<\/p>\n\n<h3 class=\"wp-block-heading\"><strong>Modularity, iteration, and open-sourcing<\/strong><\/h3>\n<p>As previously mentioned, the Docker Model Runner team is relatively small, which meant that we couldn\u2019t rely on a monolithic architecture if we wanted to effectively parallelize the development work for Docker Model Runner. Moreover, we had an early and overarching directive: open-source as much as possible.<\/p>\n\n<p>We decided on three high-level components around which we could organize development work: <a href=\"https:\/\/github.com\/docker\/model-runner\" target=\"_blank\">the model runner<\/a>, <a href=\"https:\/\/github.com\/docker\/model-distribution\" target=\"_blank\">the model distribution tooling<\/a>, and <a href=\"https:\/\/github.com\/docker\/model-cli\" target=\"_blank\">the model CLI plugin<\/a>.<\/p>\n<p>Breaking up these components allowed us to divide work more effectively, iterate faster, and define clean API boundaries between different concerns. 
While there have been some tricky dependency hurdles (in particular when integrating with closed-source components), we\u2019ve found that the modular approach has facilitated faster incremental changes and support for new platforms.<\/p>\n\n<h2 class=\"wp-block-heading\"><strong>The high-level architecture<\/strong><\/h2>\n<p>At a high level, the Docker Model Runner architecture is composed of the three components mentioned above (the runner, the distribution code, and the CLI), but there are also some interesting sub-components within each:<\/p>\n<div class=\"wp-block-ponyo-image\">\n<\/div>\n<p><strong>Figure 1: Docker Model Runner high-level architecture<\/strong><\/p>\n\n<p>How these components are packaged and hosted (and how they interact) also depends on the platform where they\u2019re deployed. In each case it looks slightly different. Sometimes they run on the host, sometimes they run in a VM, sometimes they run in a container, but the overall architecture looks the same.<\/p>\n\n<h3 class=\"wp-block-heading\"><strong>Model storage and client<\/strong><\/h3>\n<p>The core architectural component is the <strong>model store<\/strong>. This component, provided by the model distribution code, is where the actual model tensor files are stored. These files are stored differently (and separately) from images because (1) they\u2019re high-entropy and not particularly compressible and (2) the inference engine needs direct access to the files so that it can do things like mapping them into its virtual address space via <strong>mmap()<\/strong>. For more information, see the accompanying model distribution blog post.<\/p>\n\n<p>The model distribution code also provides the <strong>model distribution client<\/strong>. 
This component performs operations (such as pulling models) using the model distribution protocol against OCI registries.<\/p>\n\n<h3 class=\"wp-block-heading\"><strong>Model runner<\/strong><\/h3>\n<p>Built on top of the model store is the <strong>model runner<\/strong>. The model runner maps inbound inference API requests (e.g. <strong>\/v1\/chat\/completions<\/strong> or <strong>\/v1\/embeddings<\/strong> requests) to processes hosting pairs of inference engines and models. It includes scheduler, loader, and runner components that coordinate the loading of models in and out of memory so that concurrent requests can be serviced, even if models can\u2019t be loaded simultaneously (e.g. due to resource constraints). This makes the execution lifecycle of models different from that of containers, with engines and models operating as ephemeral processes (mostly hidden from users) that can be terminated and unloaded from memory as necessary (or when idle). A different backend process is run for each combination of engine (e.g. <strong>llama.cpp<\/strong>) and model (e.g. <strong>ai\/qwen3:8B-Q4_K_M<\/strong>) as required by inference API requests (though multiple requests targeting the same pair will reuse the same runner and backend processes if possible).<\/p>\n\n<p>The runner also includes an installer service that can dynamically download backend binaries and libraries, allowing users to selectively enable features (such as CUDA support) that might require downloading hundreds of MBs of dependencies.<\/p>\n\n<p>Finally, the model runner serves as the central server for all Docker Model Runner APIs, including the <strong>\/models<\/strong> APIs (which it routes to the model distribution code) and the <strong>\/engines<\/strong> APIs (which it routes to its scheduler). 
This API server will always opt to hold in-flight requests until the resources (primarily RAM or VRAM) are available to service them, rather than returning something like a <strong>503<\/strong> response. This is critical for a number of usage patterns, such as multiple agents running with different models, or concurrent requests for both embeddings and completions.<\/p>\n\n<h3 class=\"wp-block-heading\"><strong>Model CLI<\/strong><\/h3>\n<p>The primary user-facing component of the Docker Model Runner architecture is the <strong>model CLI<\/strong>. This component is a standard <a href=\"https:\/\/pkg.go.dev\/github.com\/docker\/cli\/cli-plugins\" target=\"_blank\">Docker CLI plugin<\/a> that offers an interface very similar to the <strong>docker image<\/strong> command. While the lifecycle of model execution is different from that of containers, the concepts (such as pushing, pulling, and running) should be familiar enough to existing Docker users.<\/p>\n\n<p>The model CLI communicates with the model runner\u2019s APIs to perform almost all of its operations (though the transport for that communication varies by platform). The model CLI is context-aware, allowing it to determine if it\u2019s talking to a Docker Desktop model runner, a Docker CE model runner, or a model runner on some custom platform. Because we\u2019re using the standard Docker CLI plugin framework, we get all of the standard <a href=\"https:\/\/docs.docker.com\/engine\/manage-resources\/contexts\/\" target=\"_blank\">Docker Context<\/a> functionality for free, making this detection much easier.<\/p>\n\n<h3 class=\"wp-block-heading\"><strong>API design and routing<\/strong><\/h3>\n<p>As previously mentioned, the Docker Model Runner comprises two sets of APIs: the Docker-style APIs and the OpenAI-compatible APIs. 
The Docker-style APIs (modeled after the <strong>\/images<\/strong> APIs) include the following endpoints:<\/p>\n<p><strong>POST \/models\/create<\/strong> (Model pulling)<\/p>\n<p><strong>GET \/models<\/strong> (Model listing)<\/p>\n<p><strong>GET \/models\/{namespace}\/{name}<\/strong> (Model metadata)<\/p>\n<p><strong>DELETE \/models\/{namespace}\/{name}<\/strong> (Model deletion)<\/p>\n<p>The bodies for these requests look very similar to their image analogs. There\u2019s no documentation at the moment, but you can get a glimpse of the format by looking at their <a href=\"https:\/\/github.com\/docker\/model-runner\/blob\/main\/pkg\/inference\/models\/api.go\" target=\"_blank\">corresponding Go types<\/a>.<\/p>\n<p>In contrast, the OpenAI endpoints follow a different but still RESTful convention:<\/p>\n<p><strong>GET \/engines\/{engine}\/v1\/models<\/strong> (OpenAI-format model listing)<\/p>\n<p><strong>GET \/engines\/{engine}\/v1\/models\/{namespace}\/{name}<\/strong> (OpenAI-format model metadata)<\/p>\n<p><strong>POST \/engines\/{engine}\/v1\/chat\/completions<\/strong> (Chat completions)<\/p>\n<p><strong>POST \/engines\/{engine}\/v1\/completions<\/strong> (Text completions (legacy endpoint))<\/p>\n<p><strong>POST \/engines\/{engine}\/v1\/embeddings<\/strong> (Create embeddings)<\/p>\n<p>At this point in time, only one <strong>{engine}<\/strong> value is supported (<strong>llama.cpp<\/strong>), and it can also be omitted to use the default (<strong>llama.cpp<\/strong>) engine.<\/p>\n<p>We make these APIs available on several different endpoints:<\/p>\n<p>First, in Docker Desktop, they\u2019re available on the Docker socket (<strong>\/var\/run\/docker.sock<\/strong>), both inside and outside containers. This is in service of our design goal of having models as first-class citizens in the Docker Engine API. 
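<\/p>\n<p>For example, listing models over the Docker socket looks something like this (a sketch using the current experimental path prefix, which is discussed next):<\/p>\n<pre><code>curl --unix-socket \/var\/run\/docker.sock http:\/\/localhost\/exp\/vDD4.40\/models<\/code><\/pre>\n<p>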
At the moment, these endpoints are prefixed with a <strong>\/exp\/vDD4.40<\/strong> path (to avoid dependencies on APIs that may evolve during development), but we\u2019ll likely remove this prefix in the next few releases since these APIs have now mostly stabilized and will evolve in a backward-compatible way.<\/p>\n<p>Second, also in Docker Desktop, we make the APIs available on a special <strong>model-runner.docker.internal<\/strong> endpoint that\u2019s accessible just from containers (though not currently from <a href=\"https:\/\/docs.docker.com\/security\/for-admins\/hardened-desktop\/enhanced-container-isolation\/\" target=\"_blank\">ECI<\/a> containers, because we want to have inference sandboxing implemented first). This TCP-based endpoint exposes just the <strong>\/models<\/strong> and <strong>\/engines<\/strong> API endpoints (not the whole Docker API) and is designed to serve existing tooling (which likely can\u2019t access APIs via a Unix domain socket). No <strong>\/exp\/vDD4.40<\/strong> prefix is used in this case.<\/p>\n<p>Finally, in both Docker Desktop and Docker CE, we make the <strong>\/models<\/strong> and <strong>\/engines<\/strong> API endpoints available on a host TCP endpoint (<strong>localhost:12434<\/strong>, by default, again without any <strong>\/exp\/vDD4.40<\/strong> prefix). In Docker Desktop this is optional and not enabled by default. In Docker CE, it\u2019s a critical component of how the API endpoints are accessed, because we currently lack the integration to add endpoints to Docker CE\u2019s <strong>\/var\/run\/docker.sock<\/strong> or to inject a custom <strong>model-runner.docker.internal<\/strong> hostname, so we advise using the standard <strong>172.17.0.1<\/strong> host gateway address to access this localhost-exposed port (e.g. setting your OpenAI API base URL to <strong>http:\/\/172.17.0.1:12434\/engines\/v1<\/strong>). 
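<\/p>\n<p>As a sketch, a chat completion request from the host against the default TCP endpoint (with a model that has already been pulled; the model name here is illustrative) might look like:<\/p>\n<pre><code>curl http:\/\/localhost:12434\/engines\/llama.cpp\/v1\/chat\/completions \\\n  -H \"Content-Type: application\/json\" \\\n  -d '{\"model\": \"ai\/qwen3:8B-Q4_K_M\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello!\"}]}'<\/code><\/pre>\n<p>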
Hopefully we\u2019ll be able to unify this across Docker platforms in the near future (see our roadmap below).<\/p>\n\n<h2 class=\"wp-block-heading\"><strong>First up: Docker Desktop<\/strong><\/h2>\n<p>The natural first step for Docker Model Runner was integration into Docker Desktop. In Docker Desktop, we have more direct control over integration with the Docker Engine, as well as existing processes that we can use to host the model runner components. In this case, the model runner and model distribution components live in the Docker Desktop host backend process (the <strong>com.docker.backend<\/strong> process you may have seen running) and we use special middleware and networking magic to route requests on <strong>\/var\/run\/docker.sock<\/strong> and <strong>model-runner.docker.internal<\/strong> to the model runner\u2019s API server. Since the individual inference backend processes run as subprocesses of <strong>com.docker.backend<\/strong>, there\u2019s no risk of a crash in Docker Desktop if, for example, an inference backend is killed by an Out Of Memory (OOM) error.<\/p>\n<p>We started initially with support for macOS on Apple Silicon, because it provided the most uniform platform for developing the model runner functionality, but we implemented most of the functionality along the way to build and test for all Docker Desktop platforms. This made it significantly easier to port to Windows on AMD64 and ARM64 platforms, as well as the GPU variations that we found there.<\/p>\n<p>The one complexity with Windows was the larger size of the supporting library dependencies for the GPU-based backends. It wouldn\u2019t have been feasible (or tolerated) if we added another 500 MB \u2013 1 GB to the Docker Desktop for Windows installer, so we decided to default to a CPU-based backend in Docker Desktop for Windows with optional support for the GPU backend. 
This was the primary motivating factor for the dynamic installer component of the model runner (in addition to our desire for incremental updates to different backends).<\/p>\n<p>This all sounds like a very well-planned exercise, and we did indeed start with a three-component design and strictly enforced API boundaries, but in truth we started with the model runner service code as a sub-package of the Docker Desktop source code. This made it much easier to iterate quickly, especially as we were exploring the architecture for the different services. Fortunately, by sticking to a relatively strict isolation policy for the code, and enforcing clean dependencies through APIs and interfaces, we were able to easily extract the code (kudos to the excellent <a href=\"https:\/\/github.com\/newren\/git-filter-repo\" target=\"_blank\">git-filter-repo<\/a> tool) into a separate repository for the purposes of open-sourcing.<\/p>\n\n<h2 class=\"wp-block-heading\"><strong>Next stop: Docker CE<\/strong><\/h2>\n<p>Aside from Docker\u2019s penchant for open-sourcing, one of the main reasons that we wanted to make the Docker Model Runner source code publicly available was to support integration into Docker CE. Our goal was to package the <strong>docker model<\/strong> command in the same way as <strong>docker buildx<\/strong> and <strong>docker compose<\/strong>.<\/p>\n<p>The trick with Docker CE is that we wanted to ship Docker Model Runner as a \u201cvanilla\u201d Docker CLI plugin (i.e. without any special privileges or API access), which meant that we didn\u2019t have a backend process that could host the model runner service. However, in the Docker CE case, the boundary between host hardware and container processes is much less disruptive, meaning that we could actually run Docker Model Runner in a container and simply make any accelerator hardware available to it directly. 
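<\/p>\n<p>Concretely, getting a model running on Docker CE looks something like this (a sketch; the model name is illustrative, and exact flags may vary by version):<\/p>\n<pre><code># Install the runner (pulls the docker\/model-runner image and starts it)\ndocker model install-runner\n\n# Pull a model and run a one-shot prompt against it\ndocker model pull ai\/qwen3:8B-Q4_K_M\ndocker model run ai\/qwen3:8B-Q4_K_M \"Hello!\"<\/code><\/pre>\n<p>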
So, much like a standalone BuildKit builder container, we run the Docker Model Runner as a standalone container in Docker CE, with a special named volume for model storage (meaning you can uninstall the runner without having to re-pull models). This \u201cinstallation\u201d is performed automatically by the model CLI, when necessary, by pulling the <strong>docker\/model-runner<\/strong> image and starting a container. Explicit configuration for the runner can also be specified using the <strong>docker model install-runner<\/strong> command. If you want, you can also remove the model runner (and optionally the model storage) using <strong>docker model uninstall-runner<\/strong>.<\/p>\n<p>This unfortunately leads to one small compromise with the UX: we don\u2019t currently support the model runner APIs on <strong>\/var\/run\/docker.sock<\/strong> or on the special <strong>model-runner.docker.internal<\/strong> URL. Instead, the model runner API server listens on the host system\u2019s loopback interface at <strong>localhost:12434<\/strong> (by default), which is available inside most containers at <strong>172.17.0.1:12434<\/strong>. If desired, users can also make this available at <strong>model-runner.docker.internal:12434<\/strong> by passing <strong>--add-host=model-runner.docker.internal:host-gateway<\/strong> to <strong>docker run<\/strong> or <strong>docker create<\/strong> commands. This can also be achieved by using the <strong>extra_hosts<\/strong> key in a Compose YAML file. We have plans to make this more ergonomic in future releases.<\/p>\n\n<h2 class=\"wp-block-heading\"><strong>The road ahead\u2026<\/strong><\/h2>\n<p>The status quo is Docker Model Runner support in Docker Desktop on macOS and Windows and support for Docker CE on Linux (including WSL2), but that\u2019s definitely not the end of the story. 
Over the next few months, we have a number of initiatives planned that we think will reshape the user experience, performance, and security of Docker Model Runner.<\/p>\n\n<h3 class=\"wp-block-heading\"><strong>Additional GUI and CLI functionality<\/strong><\/h3>\n<p>The most visible functionality coming out over the next few months will be in the model CLI and the \u201cModels\u201d tab in the Docker Desktop dashboard. Expect to see new commands (such as <strong>df<\/strong>, <strong>ps<\/strong>, and <strong>unload<\/strong>) that will provide more direct support for monitoring and controlling model execution. Also, expect to see new and expanded layouts and functionality in the Models tab.<\/p>\n\n<h3 class=\"wp-block-heading\"><strong>Expanded OpenAI API support<\/strong><\/h3>\n<p>A less-visible but equally important aspect of the Docker Model Runner user experience is our compatibility with the OpenAI API. There are dozens of endpoints and parameters to support (and we already support many), so we will continue to expand our API surface coverage, prioritizing practical use cases and compatibility with existing tools.<\/p>\n\n<h3 class=\"wp-block-heading\"><strong>containerd and Moby integration<\/strong><\/h3>\n<p>One of the longer-term initiatives that we\u2019re looking at is integration with <strong>containerd<\/strong>. containerd already provides a modular runtime system that allows for task execution coordinated with storage. We believe this is the right way forward and that it will allow us to better codify the relationship between model storage, model execution, and model execution sandboxing.<\/p>\n<p>In combination with the containerd work, we would also like tighter integration with the Moby project. While our existing Docker CE integration offers a viable and performant solution, we believe that better ergonomics could be achieved with more direct integration. 
In particular, niceties like support for <strong>model-runner.docker.internal<\/strong> DNS resolution in Docker CE are on our radar. Perhaps the biggest win from this tighter integration would be to expose Docker Model Runner APIs on the Docker socket and to include the API endpoints (e.g. <strong>\/models<\/strong>) in the official <a href=\"https:\/\/docs.docker.com\/reference\/api\/engine\/version\/v1.50\/\" target=\"_blank\">Docker Engine API documentation<\/a>.<\/p>\n\n<h3 class=\"wp-block-heading\"><strong>Kubernetes<\/strong><\/h3>\n<p>One of the product goals for Docker Model Runner was a consistent experience from development inner loop to production, and Kubernetes is inarguably a part of that path. The existing Docker Model Runner images that we\u2019re using for Docker CE will also work within a Kubernetes cluster, and we\u2019re currently developing instructions to set up a Docker Model Runner instance in a Kubernetes cluster. The big difference with Kubernetes is the variety of cluster and application architectures in use, so we\u2019ll likely end up with different \u201crecipes\u201d for how to configure the Docker Model Runner in different scenarios.<\/p>\n\n<h3 class=\"wp-block-heading\"><strong>vLLM<\/strong><\/h3>\n<p>One of the things we\u2019ve heard from a number of customers is that vLLM forms a core component of their production stack. This was also the first alternate backend that we stubbed out in the model runner repository, and the time has come to start poking at an implementation.<\/p>\n\n<h3 class=\"wp-block-heading\"><strong>Even more to come\u2026<\/strong><\/h3>\n<p>Finally, there are some bits that we just can\u2019t talk about yet, but they will fundamentally shift the way that developers interact with models. 
Be sure to tune in to Docker\u2019s sessions at <a href=\"https:\/\/www.wearedevelopers.com\/world-congress\" target=\"_blank\">WeAreDevelopers<\/a> from July 9\u201311 for some exciting announcements around AI-related initiatives at Docker.<\/p>\n\n<h3 class=\"wp-block-heading\">Learn more<\/h3>\n<p>Explore the <a href=\"https:\/\/www.docker.com\/blog\/oci-artifacts-for-ai-model-packaging\/\">story<\/a> behind our model distribution specification.<\/p>\n<p>Read our quickstart guide to <a href=\"https:\/\/www.docker.com\/blog\/run-llms-locally\/\">Docker Model Runner<\/a>.<\/p>\n<p>Find documentation for <a href=\"https:\/\/docs.docker.com\/model-runner\/\" target=\"_blank\">Model Runner<\/a>.<\/p>\n<p>Subscribe to the <a href=\"https:\/\/www.docker.com\/newsletter-subscription\/\">Docker Navigator Newsletter<\/a>.<\/p>\n<p>New to Docker? <a href=\"https:\/\/hub.docker.com\/signup\" target=\"_blank\">Create an account<\/a>.<\/p>\n<p>Have questions? 
The <a href=\"https:\/\/www.docker.com\/community\/\">Docker community is here to help<\/a>.<\/p>","protected":false}}