{"id":1907,"date":"2025-04-09T13:18:22","date_gmt":"2025-04-09T13:18:22","guid":{"rendered":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2025\/04\/09\/run-gemma-3-with-docker-model-runner-fully-local-genai-developer-experience\/"},"modified":"2025-04-09T13:18:22","modified_gmt":"2025-04-09T13:18:22","slug":"run-gemma-3-with-docker-model-runner-fully-local-genai-developer-experience","status":"publish","type":"post","link":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2025\/04\/09\/run-gemma-3-with-docker-model-runner-fully-local-genai-developer-experience\/","title":{"rendered":"Run Gemma 3 with Docker Model Runner: Fully Local GenAI Developer Experience"},"content":{"rendered":"<p>The landscape of generative AI development is evolving rapidly but comes with significant challenges. API usage costs can quickly add up, especially during development. Privacy concerns arise when sensitive data must be sent to external services. And relying on external APIs can introduce connectivity issues and latency.<\/p>\n<p><strong>Enter Gemma 3 and <\/strong><a href=\"https:\/\/docs.docker.com\/desktop\/features\/model-runner\/\" target=\"_blank\"><strong>Docker Model Runner<\/strong><\/a>, a powerful combination that brings state-of-the-art language models to your local environment, addressing these challenges head-on.<\/p>\n<p>In this blog post, we\u2019ll explore how to run <a href=\"https:\/\/hub.docker.com\/r\/ai\/gemma3\" target=\"_blank\">Gemma 3<\/a> locally using Docker Model Runner. 
We\u2019ll also walk through a practical case study: a <a href=\"https:\/\/github.com\/docker\/ai-reviewer\" target=\"_blank\">Comment Processing System<\/a> that analyzes user feedback about a fictional AI assistant named Jarvis.<\/p>\n<h2 class=\"wp-block-heading\">The power of local GenAI development<\/h2>\n<p>Before diving into the implementation, let\u2019s look at why local GenAI development is becoming increasingly important:<\/p>\n<p><strong>Cost efficiency<\/strong>: With no per-token or per-request charges, you can experiment freely without worrying about usage fees.<\/p>\n<p><strong>Data privacy<\/strong>: Sensitive data stays within your environment, with no third-party exposure.<\/p>\n<p><strong>Reduced network latency<\/strong>: Eliminates reliance on external APIs and enables offline use.<\/p>\n<p><strong>Full control<\/strong>: Run the model on your terms, with no intermediaries and full transparency.<\/p>\n<h2 class=\"wp-block-heading\">Setting up Docker Model Runner with Gemma 3<\/h2>\n<p>Docker Model Runner provides an OpenAI-compatible API interface to run models locally.<br \/>It is included in Docker Desktop for macOS, starting with <a href=\"https:\/\/www.docker.com\/blog\/docker-desktop-4-40\/\">version 4.40.0<\/a>.<\/p>\n<p>Here\u2019s how to set it up with <a href=\"https:\/\/hub.docker.com\/r\/ai\/gemma3\" target=\"_blank\">Gemma 3<\/a>:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\ndocker desktop enable model-runner --tcp 12434<br \/>\ndocker model pull ai\/gemma3\n<\/div>\n<p>Once setup is complete, the OpenAI-compatible API provided by the Model Runner is available at: <a href=\"http:\/\/localhost:12434\/engines\/v1\" target=\"_blank\">http:\/\/localhost:12434\/engines\/v1<\/a><\/p>\n<h2 class=\"wp-block-heading\">Case study: Comment processing system<\/h2>\n<p>To demonstrate the power of local GenAI development, we\u2019ve built a <a href=\"https:\/\/github.com\/docker\/ai-reviewer\" target=\"_blank\">Comment 
Processing System<\/a> that leverages Gemma 3 for multiple NLP tasks. This system:<\/p>\n<p>Generates synthetic user comments about a fictional AI assistant<\/p>\n<p>Categorizes comments as positive, negative, or neutral<\/p>\n<p>Clusters similar comments together using embeddings<\/p>\n<p>Identifies potential product features from the comments<\/p>\n<p>Generates contextually appropriate responses<\/p>\n<p>All tasks are performed locally with <strong>no external API calls.<\/strong><\/p>\n<h2 class=\"wp-block-heading\">Implementation details<\/h2>\n<h3 class=\"wp-block-heading\">Configuring the OpenAI SDK to use local models<\/h3>\n<p>To make this work, we configure the OpenAI SDK to point to the Docker Model Runner:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n\/\/ config.js\n<p>export default {<br \/>\n  \/\/ Model configuration<br \/>\n  openai: {<br \/>\n    baseURL: \"http:\/\/localhost:12434\/engines\/v1\", \/\/ Base URL for Docker Model Runner<br \/>\n    apiKey: 'ignored', \/\/ Required by the SDK but ignored by Model Runner<br \/>\n    model: \"ai\/gemma3\",<br \/>\n    commentGeneration: { \/\/ Each task has its own configuration; temperature is tuned per task to balance creativity and consistency<br \/>\n      temperature: 0.3,<br \/>\n      max_tokens: 250,<br \/>\n      n: 1,<br \/>\n    },<br \/>\n    embedding: {<br \/>\n      model: \"ai\/mxbai-embed-large\", \/\/ Model for generating embeddings<br \/>\n    },<br \/>\n  },<br \/>\n  \/\/ ... other configuration options<br \/>\n};<\/p>\n<\/div>\n<div class=\"wp-block-syntaxhighlighter-code \">\nimport OpenAI from 'openai';<br \/>\nimport config from '.\/config.js';\n<p>\/\/ Initialize OpenAI client with the local endpoint<br \/>\nconst client = new OpenAI({<br \/>\n  baseURL: config.openai.baseURL,<br \/>\n  apiKey: config.openai.apiKey,<br \/>\n});<\/p>\n<\/div>\n<h3 class=\"wp-block-heading\">Task-specific configuration<\/h3>\n<p>One key benefit of <a 
href=\"https:\/\/www.docker.com\/blog\/run-llms-locally\/\">running models locally<\/a> is the ability to experiment freely with different configurations for each task without worrying about API costs or rate limits.<\/p>\n<p>In our case:<\/p>\n<p>Synthetic comment generation uses a higher temperature for creativity.<\/p>\n<p>Categorization uses a lower temperature and a 10-token limit for consistency.<\/p>\n<p>Clustering allows up to 20 tokens to improve semantic richness in embeddings.<\/p>\n<p>This flexibility lets us iterate quickly, tune for performance, and tailor the model\u2019s behavior to each use case.<\/p>\n<h3 class=\"wp-block-heading\">Generating synthetic comments<\/h3>\n<p>To simulate user feedback, we use Gemma 3\u2019s ability to follow detailed, context-aware prompts.<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n\/**<br \/>\n * Create a prompt for comment generation<br \/>\n * @param {string} type &#8211; Type of comment (positive, negative, neutral)<br \/>\n * @param {string} topic &#8211; Topic of the comment<br \/>\n * @returns {string} &#8211; Prompt for OpenAI<br \/>\n *\/<br \/>\nfunction createPromptForCommentGeneration(type, topic) {<br \/>\n  let sentiment = &#8221;;\n<p>  switch (type) {<br \/>\n    case &#8216;positive&#8217;:<br \/>\n      sentiment = &#8216;positive and appreciative&#8217;;<br \/>\n      break;<br \/>\n    case &#8216;negative&#8217;:<br \/>\n      sentiment = &#8216;negative and critical&#8217;;<br \/>\n      break;<br \/>\n    case &#8216;neutral&#8217;:<br \/>\n      sentiment = &#8216;neutral and balanced&#8217;;<br \/>\n      break;<br \/>\n    default:<br \/>\n      sentiment = &#8216;general&#8217;;<br \/>\n  }<\/p>\n<p>  return `Generate a realistic ${sentiment} user comment about an AI assistant called Jarvis, focusing on its ${topic}.<\/p>\n<p>The comment should sound natural, as if written by a real user who has been using Jarvis.<br \/>\nKeep the comment concise (1-3 sentences) and focused on 
the specific topic.<br \/>\nDo not include ratings (like \"5\/5 stars\") or formatting.<br \/>\nJust return the comment text without any additional context or explanation.`;<br \/>\n}<\/p>\n<\/div>\n<p>Examples:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n\"Honestly, Jarvis is just a lot of empty promises. It keeps suggesting irrelevant articles and failing to actually understand my requests for help with my work \u2013 it\u2019s not helpful at all.\"\n<p>\"Jarvis is seriously impressive \u2013 the speed at which it responds is incredible! I\u2019ve never used an AI assistant that\u2019s so quick and efficient, it\u2019s a game changer.\"\n<\/p><\/div>\n<p>The ability to produce realistic feedback on demand is incredibly useful for simulating user data with <strong>zero API cost.<\/strong><\/p>\n<h3 class=\"wp-block-heading\">Generating contextual responses<\/h3>\n<p>We also use Gemma 3 to simulate polite, on-brand support responses to user comments. Here\u2019s the prompt logic:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\nconst response = await client.chat.completions.create({<br \/>\n    model: config.openai.model,<br \/>\n    messages: [<br \/>\n      {<br \/>\n        role: \"system\",<br \/>\n        content: `You are a customer support representative for an AI assistant called Jarvis. Your task is to generate polite, helpful responses to user comments.\n<p>Guidelines for responses:<br \/>\n1. Be empathetic and acknowledge the user's feedback<br \/>\n2. Thank the user for their input<br \/>\n3. If the comment is positive, express appreciation<br \/>\n4. If the comment is negative, apologize for the inconvenience and assure them you're working on improvements<br \/>\n5. If the comment is neutral, acknowledge their observation<br \/>\n6. If relevant, mention that their feedback will be considered for future updates<br \/>\n7. Keep responses concise (2-4 sentences) and professional<br \/>\n8. 
Do not make specific promises about feature implementation or timelines<br \/>\n9. Sign the response as \"The Jarvis Team\"`<br \/>\n      },<br \/>\n      {<br \/>\n        role: \"user\",<br \/>\n        content: `User comment: \"${comment.text}\"<br \/>\nComment category: ${comment.category || 'unknown'}<\/p>\n<p>${featuresContext}<\/p>\n<p>Generate a polite, helpful response to this user comment.`<br \/>\n      }<br \/>\n    ],<br \/>\n    temperature: 0.7,<br \/>\n    max_tokens: 200<br \/>\n  });<\/p>\n<\/div>\n<p>Examples:<\/p>\n<p>For a positive comment:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\nThank you so much for your positive feedback regarding Jarvis\u2019s interface! We\u2019re thrilled to hear you find it clean and intuitive \u2013 that\u2019s exactly what we\u2019re aiming for. We appreciate you pointing out your desire for more visual customization options, and your feedback will definitely be considered as we continue to develop Jarvis.\n<p>The Jarvis Team<\/p>\n<\/div>\n<p>For a negative comment:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\nThank you for your feedback \u2013 we appreciate you taking the time to share your observations about Jarvis. We sincerely apologize for the glitches and freezes you\u2019ve experienced; we understand how frustrating that can be. Your input is valuable, and we\u2019re actively working on improvements to enhance Jarvis\u2019s reliability and accuracy.\n<p>The Jarvis Team\n<\/p><\/div>\n<p>This approach ensures a consistent, human-like support experience <strong>generated entirely locally.<\/strong><\/p>\n<h3 class=\"wp-block-heading\">Extracting product features from user feedback<\/h3>\n<p>Beyond generating and responding to comments, we also use Gemma 3 to analyze user feedback and identify actionable insights. 
This helps simulate the role of a product analyst, surfacing recurring themes, user pain points, and opportunities for improvement.<\/p>\n<p>Here, we provide a prompt instructing the model to identify up to three potential features or improvements based on a set of user comments.<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n\/**<br \/>\n * Extract features from comments<br \/>\n * @param {string} commentsText - Text of comments<br \/>\n * @returns {Promise&lt;Array&gt;} - Array of identified features<br \/>\n *\/<br \/>\nasync function extractFeaturesFromComments(commentsText) {<br \/>\n  const response = await client.chat.completions.create({<br \/>\n    model: config.openai.model,<br \/>\n    messages: [<br \/>\n      {<br \/>\n        role: \"system\",<br \/>\n        content: `You are a product analyst for an AI assistant called Jarvis. Your task is to identify potential product features or improvements based on user comments.\n<p>For each set of comments, identify up to 3 potential features or improvements that could address the user feedback.<\/p>\n<p>For each feature, provide:<br \/>\n1. A short name (2-5 words)<br \/>\n2. A brief description (1-2 sentences)<br \/>\n3. The type of feature (New Feature, Improvement, Bug Fix)<br \/>\n4. Priority (High, Medium, Low)<\/p>\n<p>Format your response as a JSON array of features, with each feature having the fields: name, description, type, and priority.`<br \/>\n      },<br \/>\n      {<br \/>\n        role: \"user\",<br \/>\n        content: `Here are some user comments about Jarvis. 
Identify potential features or improvements based on these comments:<\/p>\n<p>${commentsText}`<br \/>\n      }<br \/>\n    ],<br \/>\n    response_format: { type: \"json_object\" },<br \/>\n    temperature: 0.5<br \/>\n  });<\/p>\n<p>  try {<br \/>\n    const result = JSON.parse(response.choices[0].message.content);<br \/>\n    return result.features || [];<br \/>\n  } catch (error) {<br \/>\n    console.error('Error parsing feature identification response:', error);<br \/>\n    return [];<br \/>\n  }<br \/>\n}<\/p>\n<\/div>\n<p>Here\u2019s an example of what the model might return:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n\"features\": [<br \/>\n    {<br \/>\n      \"name\": \"Enhanced Visual Customization\",<br \/>\n      \"description\": \"Allows users to personalize the Jarvis interface with more themes, icon styles, and display options to improve visual appeal and user preference.\",<br \/>\n      \"type\": \"Improvement\",<br \/>\n      \"priority\": \"Medium\",<br \/>\n      \"clusters\": [<br \/>\n        \"1\"<br \/>\n      ]<br \/>\n    },\n<\/div>\n<p>And just like everything else in this project, <strong>it\u2019s generated locally with no external services<\/strong>.<\/p>\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n<p>By combining <a href=\"https:\/\/hub.docker.com\/r\/ai\/gemma3\" target=\"_blank\">Gemma 3<\/a> with Docker Model Runner, we\u2019ve unlocked a local GenAI workflow that\u2019s fast, private, cost-effective, and fully under our control. 
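One step of the pipeline described above, clustering similar comments using embeddings, is mentioned but never shown in code. The following is a minimal sketch of how it might work, assuming the embedding vectors have already been fetched from the ai/mxbai-embed-large endpoint; the helper names `cosineSimilarity` and `clusterEmbeddings` and the 0.8 threshold are illustrative, not taken from the ai-reviewer repository:

```javascript
// Cosine similarity between two embedding vectors (plain number arrays).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Greedy clustering: assign each vector to the first cluster whose seed
// vector is similar enough, otherwise start a new cluster. The threshold
// is an assumption to tune per model and data set.
function clusterEmbeddings(embeddings, threshold = 0.8) {
  const clusters = []; // each cluster: { seed, indices }
  embeddings.forEach((vec, i) => {
    const match = clusters.find((c) => cosineSimilarity(c.seed, vec) >= threshold);
    if (match) {
      match.indices.push(i);
    } else {
      clusters.push({ seed: vec, indices: [i] });
    }
  });
  return clusters.map((c) => c.indices); // arrays of comment indices
}
```

A single greedy pass keeps the sketch short; a real implementation might instead use k-means or average cluster centroids as members are added.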
In building our Comment Processing System, we experienced firsthand the benefits of this approach:<\/p>\n<p><strong>Rapid iteration<\/strong> without worrying about API costs or rate limits<\/p>\n<p><strong>Flexibility<\/strong> to test different configurations for each task<\/p>\n<p><strong>Offline development<\/strong> with no dependency on external services<\/p>\n<p><strong>Significant cost savings<\/strong> during development<\/p>\n<p>And this is just one example of what\u2019s possible. Whether you\u2019re prototyping a new AI product, building internal tools, or exploring advanced NLP use cases, running models locally puts you in the driver\u2019s seat.<\/p>\n<p>As open-source models and local tooling continue to evolve, the barrier to entry for building powerful AI systems keeps getting lower.<\/p>\n<p>Don\u2019t just consume AI; develop, shape, and own the process.<\/p>\n<p>Try it yourself: <a href=\"https:\/\/github.com\/docker\/ai-reviewer\" target=\"_blank\">clone the repository<\/a> and start experimenting today.<\/p>","protected":false},"excerpt":{"rendered":"<p>The landscape of generative AI development is evolving rapidly but comes with significant challenges. 
API usage costs can quickly add [&hellip;]<\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[4],"tags":[],"class_list":["post-1907","post","type-post","status-publish","format-standard","hentry","category-docker"],"_links":{"self":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/1907","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/comments?post=1907"}],"version-history":[{"count":0,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/1907\/revisions"}],"wp:attachment":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media?parent=1907"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/categories?post=1907"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/tags?post=1907"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}