{"id":1397,"date":"2024-11-04T14:16:20","date_gmt":"2024-11-04T14:16:20","guid":{"rendered":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2024\/11\/04\/using-ai-tools-to-convert-a-pdf-into-images\/"},"modified":"2024-11-04T14:16:20","modified_gmt":"2024-11-04T14:16:20","slug":"using-ai-tools-to-convert-a-pdf-into-images","status":"publish","type":"post","link":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2024\/11\/04\/using-ai-tools-to-convert-a-pdf-into-images\/","title":{"rendered":"Using AI Tools to Convert a PDF into Images"},"content":{"rendered":"<p><em>This ongoing <\/em><a href=\"https:\/\/www.docker.com\/blog\/tag\/genai-docker-labs\/\" target=\"_blank\" rel=\"noopener\"><em>Docker Labs GenAI series<\/em><\/a><em> explores the exciting space of AI developer tools. At Docker, we believe there is a vast scope to explore, openly and without the hype. We will share our explorations and collaborate with the developer community in real time. Although developers have adopted autocomplete tooling like GitHub Copilot and use chat, there is significant potential for AI tools to assist with more specific tasks and interfaces throughout the entire software lifecycle. Therefore, our exploration will be broad. We will be releasing software as open source so you can play, explore, and hack with us, too.<\/em><\/p>\n<p>How do you turn a PDF into a set of images? A modern suggestion could be to get Claude or GPT to write you a CLI command for something like ImageMagick. Let\u2019s follow that train of thought.<\/p>\n<p>If we enter a prompt like \u201cHow do I turn a PDF into a bunch of images using ImageMagick,\u201d we will indeed get something that looks like it might be a solution.<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\nImagemagick includes a command-line tool called convert which you can use to convert PDF files into images. Here&#8217;s how you can do it:\n<p>1. Open your terminal<\/p>\n<p>2. 
Change directory to where your PDF is located:<\/p>\n<p>```<br \/>\ncd \/path\/to\/your\/pdf<\/p>\n<p>```<br \/>\n3. Use the following command to convert the PDF into images (in this case JPEG):<\/p>\n<p>```<br \/>\nconvert -density 300 -quality 100 your_file.pdf output_name.jpg<\/p>\n<p>```\n<\/p><\/div>\n<p>The agent will then explain how convert works and finish with a suggestion to \u201cplease make sure that you have installed ImageMagick before running these commands.\u201d<\/p>\n<p>Instead of worrying about whether <a href=\"https:\/\/imagemagick.org\/index.php\" target=\"_blank\" rel=\"noopener\">ImageMagick<\/a> is installed, however, let\u2019s use Docker as our tool provider and prompt it to go ahead and run the command. If you\u2019re following along in this series, you\u2019ll have seen that we are using Markdown files to mix together tools and prompts. Here\u2019s the first prompt we tried:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n---<br \/>\ntools:<br \/>\n  - name: imagemagick<br \/>\n---<br \/>\n# prompt user\n<p>Use Imagemagick to convert the family.pdf file into a bunch of jpg images.\n<\/p><\/div>\n<p>After executing this prompt, the LLM generated a tool call, which we executed in the <a href=\"https:\/\/www.docker.com\/products\/container-runtime\/\" target=\"_blank\" rel=\"noopener\">Docker runtime<\/a>, and it successfully converted family.pdf into nine .jpg files (my family.pdf file had nine pages).\u00a0<\/p>\n<p>Figure 1 shows the flow from our <a href=\"https:\/\/www.linkedin.com\/redir\/redirect?url=https%3A%2F%2Fgithub.com%2Fdocker%2Flabs-ai-tools-vscode&amp;urlhash=5XDU&amp;trk=article-ssr-frontend-pulse_little-text-block\" target=\"_blank\" rel=\"noopener\">VSCode Extension<\/a>.<\/p>\n<p><a href=\"https:\/\/www.docker.com\/wp-content\/uploads\/2024\/10\/F1-workflow.gif\" target=\"_blank\" rel=\"noopener\"><\/a><strong>Figure 1: <\/strong>Workflow from VSCode 
Extension.<\/p>\n<p>We have given enough context to the LLM that it is able to plan a call to this ImageMagick binary. And, because this tool is available on <a href=\"https:\/\/www.docker.com\/products\/docker-hub\/\" target=\"_blank\" rel=\"noopener\">Docker Hub<\/a>, we don\u2019t have to \u201cmake sure that ImageMagick is installed.\u201d\u00a0This would be the equivalent command if you were to use docker run directly:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n# family.pdf must be located in your $PWD\n<p>docker run --rm -v $PWD:\/project --workdir \/project vonwig\/imagemagick:latest convert -density 300 -quality 100 family.pdf family.jpg\n<\/p><\/div>\n<h2 class=\"wp-block-heading\">The tool ecosystem<\/h2>\n<p>How did this work? The process relied on two things:<\/p>\n<p>Tool distribution and discovery (pulling tools into Docker Hub for distribution to our <a href=\"https:\/\/www.docker.com\/products\/docker-desktop\/\" target=\"_blank\" rel=\"noopener\">Docker Desktop<\/a> runtime).<\/p>\n<p>Automatic generation of Agent Tool interfaces.<\/p>\n<p>When we first started this project, we expected that we\u2019d begin with a small set of tools because the interface for each tool would take time to design. We thought we were going to need to bootstrap an ecosystem of tools that had been prepared to be used in these agent workflows.\u00a0<\/p>\n<p>However, we learned that we can use a much more generic approach. Most tools already come with documentation, such as command-line help, examples, and man pages. 
Instead of treating each tool as something special, we are using an architecture where an agent responds to failures by reading documentation and trying again (Figure 2).<\/p>\n<p><a href=\"https:\/\/www.docker.com\/wp-content\/uploads\/2024\/10\/F2-process-loop.png\" target=\"_blank\" rel=\"noopener\"><\/a><strong>Figure 2:<\/strong> Agent process.<\/p>\n<p>We see a process of experimenting with tools that is not unlike what we, as developers, do on the command line. Try a command line, read a doc, adjust the command line, and try again.<\/p>\n<p>The value of this kind of looping has changed our expectations. Step one is simply pulling the tool into Docker Hub and seeing whether the agent can use it with nothing more than its out-of-the-box documentation. We are also pulling open source software (OSS)\u00a0 tools directly from <a href=\"https:\/\/www.linkedin.com\/redir\/redirect?url=https%3A%2F%2Fgithub.com%2FNixOS%2Fnixpkgs&amp;urlhash=uOBn&amp;trk=article-ssr-frontend-pulse_little-text-block\" target=\"_blank\" rel=\"noopener\">nixpkgs<\/a>, which gives us access to tens of thousands of different tools to experiment with.\u00a0<\/p>\n<p>Docker keeps our runtimes isolated from the host operating system, while the nixpkgs ecosystem and maintainers provide a rich source of OSS tools.<\/p>\n<p>As expected, packaging agents still run into issues that force us to re-plan how tools are packaged. 
For example, the prompt we showed above might have generated the correct tool call on the first try, but the ImageMagick container failed on the first run with this terrible-looking error message:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\nfunction call failed call exited with non-zero code (1): Error: sh: 1: gs: not found\n<\/div>\n<p>Fortunately, feeding that error back into the LLM resulted in the suggestion that convert needs another tool, called <a href=\"https:\/\/www.ghostscript.com\/\" target=\"_blank\" rel=\"noopener\">Ghostscript<\/a>, to run successfully. Today, our agent is not able to fix this automatically. However, we adjusted the image build slightly, and the vonwig\/imagemagick:latest image no longer has this issue. This is an example of something we only need to learn once.<\/p>\n<p>The LLM figured out convert on its own. But its agency came from the addition of a tool.<\/p>\n<p>Read the <a href=\"https:\/\/www.docker.com\/blog\/tag\/genai-docker-labs\/\" target=\"_blank\" rel=\"noopener\">Docker Labs GenAI series<\/a> to see more of what we\u2019ve been working on.<\/p>\n<h2 class=\"wp-block-heading\">Learn more<\/h2>\n<p>Subscribe to the <a href=\"https:\/\/www.docker.com\/newsletter-subscription\/\" target=\"_blank\" rel=\"noopener\">Docker Newsletter<\/a>.\u00a0<\/p>\n<p>Get the latest release of <a href=\"https:\/\/www.docker.com\/products\/docker-desktop\/\" target=\"_blank\" rel=\"noopener\">Docker Desktop<\/a>.<\/p>\n<p>Have questions? The <a href=\"https:\/\/www.docker.com\/community\/\" target=\"_blank\" rel=\"noopener\">Docker community is here to help<\/a>.<\/p>\n<p>New to Docker? <a href=\"https:\/\/docs.docker.com\/desktop\/\" target=\"_blank\" rel=\"noopener\">Get started<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>This ongoing Docker Labs GenAI series explores the exciting space of AI developer tools. 
At Docker, we believe there is [&hellip;]<\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[4],"tags":[],"class_list":["post-1397","post","type-post","status-publish","format-standard","hentry","category-docker"],"_links":{"self":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/1397","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/comments?post=1397"}],"version-history":[{"count":0,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/1397\/revisions"}],"wp:attachment":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media?parent=1397"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/categories?post=1397"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/tags?post=1397"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}