{"id":2681,"date":"2025-10-31T13:10:55","date_gmt":"2025-10-31T13:10:55","guid":{"rendered":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2025\/10\/31\/mr-bones-a-pirate-voiced-halloween-chatbot-powered-by-docker-model-runner\/"},"modified":"2025-10-31T13:10:55","modified_gmt":"2025-10-31T13:10:55","slug":"mr-bones-a-pirate-voiced-halloween-chatbot-powered-by-docker-model-runner","status":"publish","type":"post","link":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2025\/10\/31\/mr-bones-a-pirate-voiced-halloween-chatbot-powered-by-docker-model-runner\/","title":{"rendered":"Mr. Bones: A Pirate-Voiced Halloween Chatbot Powered by Docker Model Runner"},"content":{"rendered":"<p>My name is Mike Coleman, a staff solution architect at Docker. This year I decided to turn a Home Depot animatronic skeleton into an AI-powered,\u00a0 live, interactive Halloween chatbot. The result: kids walk up to Mr. Bones, a spooky skeleton in my yard, ask it questions, and it answers back \u2014 in full pirate voice \u2014 with actual conversational responses, thanks to a local LLM powered by <a href=\"https:\/\/docs.docker.com\/ai\/model-runner\/\" rel=\"nofollow noopener\" target=\"_blank\">Docker Model Runner<\/a>.<\/p>\n<div class=\"wp-block-ponyo-video\">\n<div data-player=\"YouTube\" data-id=\"k-VWz92tCq8\"><\/div>\n<\/div>\n<h2 class=\"wp-block-heading\"><strong>Why Docker Model Runner?<\/strong><\/h2>\n<p><a href=\"https:\/\/docs.docker.com\/ai\/model-runner\/?utm_source=chatgpt.com\" rel=\"nofollow noopener\" target=\"_blank\"><strong>Docker Model Runner<\/strong><\/a> is a tool from Docker that makes it dead simple to run open-source LLMs locally using standard Docker workflows. I pulled the model like I\u2019d pull any image, and it exposed an OpenAI-compatible API I could call from my app. 
Under the hood, it handled model loading, inference, and optimization.<\/p>\n<p>For this project, Docker Model Runner offered a few key benefits:<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>No API costs<\/strong> for LLM inference \u2014 unlike OpenAI or Anthropic<\/li>\n<li><strong>Low latency<\/strong> because the model runs on local hardware<\/li>\n<li><strong>Full control<\/strong> over model selection, prompts, and scaffolding<\/li>\n<li><strong>API-compatible with OpenAI<\/strong> \u2014 switching providers is as simple as changing an environment variable and restarting the service<\/li>\n<\/ul>\n<p>That last point matters: if I ever needed to switch to OpenAI or Anthropic for a particular use case, the change would take seconds.<\/p>\n\n<h2 class=\"wp-block-heading\"><strong>System Overview<\/strong><\/h2>\n<div class=\"wp-block-ponyo-image\">\n            <img data-opt-id=115296586  fetchpriority=\"high\" decoding=\"async\" width=\"1600\" height=\"757\" src=\"https:\/\/www.docker.com\/app\/uploads\/2025\/10\/mr-bones-1.png\" class=\"attachment-full size-full\" alt=\"mr bones 1\" title=\"- mr bones 1\" \/>\n    <\/div>\n<p class=\"has-xs-font-size\">Figure 1: System overview of Mr. 
Bones answering questions in pirate language<\/p>\n\n<p>Here\u2019s the basic flow:<\/p>\n<ol class=\"wp-block-list\">\n<li><strong>Kid talks to skeleton<\/strong><strong><br \/><\/strong><\/li>\n<li><strong>Pi 5 + USB mic<\/strong> records audio<\/li>\n<li><strong>Vosk STT<\/strong> transcribes speech to text<\/li>\n<li><strong>API call to a Windows gaming PC<\/strong> with an RTX 5070 GPU<\/li>\n<li><strong>Docker Model Runner<\/strong> runs a local LLaMA 3.1 8B (Q4 quant) model<\/li>\n<li><strong>LLM returns a text response<\/strong><strong><br \/><\/strong><\/li>\n<li><strong>ElevenLabs Flash TTS<\/strong> converts the text to speech (pirate voice)<\/li>\n<li><strong>Audio sent back to Pi<\/strong><strong><br \/><\/strong><\/li>\n<li><strong>Pi sends audio to skeleton via Bluetooth<\/strong>, which moves the jaw in sync<\/li>\n<\/ol>\n<div class=\"wp-block-ponyo-image\">\n            <img data-opt-id=1965081156  fetchpriority=\"high\" decoding=\"async\" width=\"1000\" height=\"1334\" src=\"https:\/\/www.docker.com\/app\/uploads\/2025\/10\/mr-bones-2.jpg\" class=\"attachment-full size-full\" alt=\"mr bones 2\" title=\"- mr bones 2\" \/>\n    <\/div>\n<p class=\"has-xs-font-size\">Figure 2: The controller box that holds the Raspberry Pi that drives the pirate<\/p>\n\n<p>That Windows machine isn\u2019t a dedicated inference server \u2014 it\u2019s my gaming rig. Just a regular setup running a quantized model locally.<\/p>\n<p>The biggest challenge with this project was balancing response quality (in character and age appropriate) and response time. 
With that in mind, there were four key areas that needed a little extra emphasis: model selection, efficient text-to-speech (TTS) processing, fault tolerance, and setting up guardrails.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Consideration 1: Model Choice and Local LLM Performance<\/strong><\/h2>\n<p>I tested several open models and found <strong>LLaMA 3.1 8B (Q4 quantized)<\/strong> to be the best mix of performance, fluency, and personality. On my RTX 5070, it handled real-time inference fast enough for the interaction to feel responsive.<\/p>\n<p>At one point I was struggling to keep Mr. Bones in character, so I tried OpenAI\u2019s ChatGPT API, but response times averaged <strong>4.5 seconds<\/strong>.<\/p>\n<p>By revising the prompt and serving the right model with Docker Model Runner, I got that down to <strong>1.5 seconds<\/strong>. That\u2019s a huge difference when a kid is standing there waiting for the skeleton to talk.<\/p>\n<p>In the end, GPT-4 was only <strong>marginally better<\/strong> at staying in character and avoiding inappropriate replies. With a solid prompt scaffold and some guardrails, the local model held up just fine.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Consideration 2: TTS Pipeline: Kokoro to ElevenLabs Flash<\/strong><\/h2>\n<p>I first tried using <strong>Kokoro<\/strong>, a local TTS engine. It worked, but the voices were too generic. I wanted something more pirate-y, without adding custom audio effects.<\/p>\n<p>So I moved to <strong>ElevenLabs<\/strong>, starting with their multilingual model. The voice quality was excellent, but latency was painful, especially when combined with LLM processing. Full responses could take up to <strong>10 seconds<\/strong>, which is way too long.<\/p>\n<p>Eventually I found <strong>ElevenLabs Flash<\/strong>, a much faster model. That helped a lot. 
I also changed the logic so that instead of waiting for the entire LLM response, I <strong>chunked<\/strong> the output and sent it to ElevenLabs in parts. Not true streaming, but it allowed the Pi to start playing the audio as each chunk came back.<\/p>\n<p>This turned the skeleton from slow and laggy into something that felt snappy and responsive.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Consideration 3: Weak Points and Fallback Ideas<\/strong><\/h2>\n<p>While the LLM runs locally, the system still depends on the internet for ElevenLabs. If the network goes down, the skeleton stops talking.<\/p>\n<p>One fallback idea I\u2019m exploring: creating a set of common Q&amp;A pairs (e.g., \u201cWhat\u2019s your name?\u201d, \u201cAre you a real skeleton?\u201d), embedding them in a local <strong>vector database<\/strong>, and having the Pi serve those in case the TTS call fails.<\/p>\n<p>But the deeper truth is: this is a <strong>multi-tier system<\/strong>. If the Pi loses its connection to the Windows machine, the whole thing is toast. There\u2019s no skeleton-on-a-chip mode yet.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Consideration 4: Guardrails and Prompt Engineering<\/strong><\/h2>\n<p>Because kids will say anything, I put some safeguards in place via my system prompt.\u00a0<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n<pre class=\"brush: plain; gutter: false; title: ; notranslate\">\nYou are \"Mr. Bones,\" a friendly pirate who loves chatting with kids in a playful pirate voice.\n\nIMPORTANT RULES:\n- Never break character or speak as anyone but Mr. 
Bones\n- Never mention or repeat alcohol (rum, grog, drink), drugs, weapons (sword, cannon, gunpowder), violence (stab, destroy), or real-world safety\/danger\n- If asked about forbidden topics, do not restate the topic; give a kind, playful redirection without naming it\n- Never discuss inappropriate content or give medical\/legal advice\n- Always be kind, curious, and age-appropriate\n\nBEHAVIOR:\n- Speak in a warm, playful pirate voice using words like \"matey,\" \"arr,\" \"aye,\" \"shiver me timbers\"\n- Be imaginative and whimsical - talk about treasure, ships, islands, sea creatures, maps\n- Keep responses conversational and engaging for voice interaction\n- If interrupted or confused, ask for clarification in character\n- If asked about technology, identity, or training, stay fully in character; respond with whimsical pirate metaphors about maps\/compasses instead of tech explanations\n\nFORMAT:\n- Target 30 words; must be 10-50 words. If you exceed 50 words, stop early\n- Use normal punctuation only (no emojis or asterisks)\n- Do not use contractions. Always write \"Mister\" (not \"Mr.\"), \"Do Not\" (not \"Don't\"), \"I Am\" (not \"I'm\")\n- End responses naturally to encourage continued conversation\n\n<\/pre>\n<\/div>\n<p>The prompt is designed to deal with a few different issues. First and foremost, it keeps things appropriate for the intended audience. That means avoiding sensitive topics and also staying in character at all times. Next, I added some instructions to deal with pesky parents trying to trick Mr. Bones into revealing his true identity. Finally, there is some guidance on response format to help keep things conversational; for instance, it turns out that some TTS engines can have problems with things like contractions.<\/p>\n<p>Instead of just refusing to respond, the prompt redirects sensitive or inappropriate inputs in-character. 
For example, if a kid says \u201cI wanna drink rum with you,\u201d the skeleton might respond, \u201cArr, matey, seems we have steered a bit off course. How about we sail to smoother waters?\u201d<\/p>\n<p>This approach keeps the interaction playful while subtly correcting the topic. So far, it\u2019s been enough to keep Mr. Bones spooky-but-family-friendly.<\/p>\n<div class=\"wp-block-ponyo-image\">\n            <img data-opt-id=1741258038  data-opt-src=\"https:\/\/www.docker.com\/app\/uploads\/2025\/10\/Mr-bones-3.png\"  decoding=\"async\" width=\"1000\" height=\"1332\" src=\"data:image/svg+xml,%3Csvg%20viewBox%3D%220%200%20100%%20100%%22%20width%3D%22100%%22%20height%3D%22100%%22%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%3E%3Crect%20width%3D%22100%%22%20height%3D%22100%%22%20fill%3D%22transparent%22%2F%3E%3C%2Fsvg%3E\" class=\"attachment-full size-full\" alt=\"Mr bones 3\" title=\"- Mr bones 3\" \/>\n    <\/div>\n<p class=\"has-xs-font-size\">Figure 3: Mr. Bones is powered by AI and talks to kids in pirate-speak with built-in safety guardrails.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Final Thoughts<\/strong><\/h2>\n<p>This project started as a Halloween goof, but it\u2019s turned into a surprisingly functional proof-of-concept for real-time, local voice assistants.<\/p>\n<p>Using <strong>Docker Model Runner<\/strong> for LLMs gave me speed, cost control, and flexibility. ElevenLabs Flash handled voice. A Pi 5 managed the input and playback. And a Home Depot skeleton brought it all to life.<\/p>\n<p>Could you build a more robust version with better failover and smarter motion control? Absolutely. But even as he stands today, Mr. 
Bones has already made a bunch of kids smile \u2014 and probably a few grown-up engineers think, \u201cWait, I could build one of those.\u201d\u00a0<\/p>\n<p>Source code:<a href=\"https:\/\/github.com\/mikegcoleman\/pirate\" rel=\"nofollow noopener\" target=\"_blank\"> <\/a><a href=\"http:\/\/github.com\/mikegcoleman\/pirate\" rel=\"nofollow noopener\" target=\"_blank\">github.com\/mikegcoleman\/pirate<\/a><\/p>\n<div class=\"wp-block-ponyo-image\">\n            <img data-opt-id=1434841267  data-opt-src=\"https:\/\/www.docker.com\/app\/uploads\/2025\/10\/mr-bones-4.jpg\"  decoding=\"async\" width=\"1000\" height=\"1334\" src=\"data:image/svg+xml,%3Csvg%20viewBox%3D%220%200%20100%%20100%%22%20width%3D%22100%%22%20height%3D%22100%%22%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%3E%3Crect%20width%3D%22100%%22%20height%3D%22100%%22%20fill%3D%22transparent%22%2F%3E%3C%2Fsvg%3E\" class=\"attachment-full size-full\" alt=\"mr bones 4\" title=\"- mr bones 4\" \/>\n    <\/div>\n<p class=\"has-xs-font-size\">Figure 4: Aye aye! Ye can build a Mr. Bones too and bring smiles to all the young mateys in the neighborhood!<\/p>\n<h3 class=\"wp-block-heading\"><strong>Learn more<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li>Check out the Docker Model Runner General Availability<a href=\"https:\/\/www.docker.com\/blog\/announcing-docker-model-runner-ga\/\"> announcement<\/a><\/li>\n<li>Visit our<a href=\"https:\/\/github.com\/docker\/model-runner\" rel=\"nofollow noopener\" target=\"_blank\"> Model Runner GitHub repo<\/a>! Docker Model Runner is open-source, and we welcome collaboration and contributions from the community!<\/li>\n<li>Get started with Docker Model Runner with a simple<a href=\"https:\/\/github.com\/docker\/hello-genai\" rel=\"nofollow noopener\" target=\"_blank\"> hello GenAI application<\/a><\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>My name is Mike Coleman, a staff solution architect at Docker. 
This year I decided to turn a Home Depot [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2682,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[4],"tags":[],"class_list":["post-2681","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-docker"],"_links":{"self":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/2681","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/comments?post=2681"}],"version-history":[{"count":0,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/2681\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media\/2682"}],"wp:attachment":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media?parent=2681"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/categories?post=2681"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/tags?post=2681"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}