{"id":2550,"date":"2025-10-02T13:20:05","date_gmt":"2025-10-02T13:20:05","guid":{"rendered":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2025\/10\/02\/from-shell-scripts-to-science-agents-how-ai-agents-are-transforming-research-workflows\/"},"modified":"2025-10-02T13:20:05","modified_gmt":"2025-10-02T13:20:05","slug":"from-shell-scripts-to-science-agents-how-ai-agents-are-transforming-research-workflows","status":"publish","type":"post","link":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2025\/10\/02\/from-shell-scripts-to-science-agents-how-ai-agents-are-transforming-research-workflows\/","title":{"rendered":"From Shell Scripts to Science Agents: How AI Agents Are Transforming Research Workflows"},"content":{"rendered":"<p>It\u2019s 2 AM in a lab somewhere. A researcher has three terminals open, a half-written Jupyter notebook on one screen, an Excel sheet filled with sample IDs on another, and a half-eaten snack next to shell commands. They\u2019re juggling scripts to run a protein folding model, parsing CSVs from the last experiment, searching for literature, and Googling whether that one Python package broke in the latest update, again.<\/p>\n<p>This isn\u2019t the exception; it\u2019s the norm. Scientific research today is a patchwork of tools, scripts, and formats, glued together by determination and late-night caffeine. Reproducibility is a wishlist item. Infrastructure is an afterthought. 
And while automation exists, it\u2019s usually hand-rolled and stuck on someone\u2019s laptop.<\/p>\n<p>But what if science workflows could be orchestrated, end-to-end, by an intelligent agent?<\/p>\n<p>What if, instead of writing shell scripts and hoping the dependencies don\u2019t break, a scientist could describe the goal, \u201cread this CSV of compounds and proteins, search the literature, predict ADMET properties, and more\u201d, and an AI agent could plan the steps, spin up the right tools in containers, execute the tasks, and even summarize the results?<\/p>\n<p>That\u2019s the promise of <strong>science agents<\/strong>: AI-powered systems that don\u2019t just answer questions like ChatGPT, but autonomously carry out entire research workflows. And thanks to the convergence of LLMs, GPUs, Dockerized environments, and open scientific tools, this shift isn\u2019t theoretical anymore.<\/p>\n<p>It\u2019s happening now.<\/p>\n<div class=\"wp-block-ponyo-image\"><\/div>\n<h2 class=\"wp-block-heading\"><strong>What is a Science Agent?<\/strong><\/h2>\n<p>A <strong>Science Agent<\/strong> is more than just a chatbot or a smart prompt generator; it\u2019s an autonomous system designed to <em>plan, execute, and iterate<\/em> on entire scientific workflows with minimal human input.<\/p>\n<p>Instead of relying on one-off questions like \u201cWhat is ADMET?\u201d or \u201cSummarize this paper,\u201d a science agent operates like a digital research assistant.
It understands goals, breaks them into steps, selects the right tools, runs computations, and even reflects on results.<\/p>\n<div class=\"style-plain wp-block-ponyo-houston\">\n<p>CrewAI: AI agents framework -&gt;<a href=\"https:\/\/www.crewai.com\/\" target=\"_blank\"> https:\/\/www.crewai.com\/<\/a><br \/>ADMET: how a drug is absorbed, distributed, metabolized, and excreted, and its toxicity<\/p>\n<\/div>\n<p>Let\u2019s make it concrete:<\/p>\n<p>Take this multi-agent system you might build with <strong>CrewAI<\/strong>:<\/p>\n<p><strong>Curator<\/strong>: Data-focused agent whose primary role is to ensure data quality and standardization.<\/p>\n<p><strong>Researcher<\/strong>: Literature specialist. Its main goal is to find relevant academic papers on PubMed for the normalized entities provided by the Curator.<\/p>\n<p><strong>Web Scraper<\/strong>: Specialized agent for extracting information from websites.<\/p>\n<p><strong>Analyst<\/strong>: Predicts ADMET properties and toxicity using models or APIs.<\/p>\n<p><strong>Reporter<\/strong>: Compiles all results into a clean Markdown report.<\/p>\n<div class=\"wp-block-ponyo-image\"><\/div>\n<p>Each of these agents acts independently but works as part of a coordinated system. Together, they automate what would take a human team hours or even days, now in minutes and reproducibly.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Why This Is Different from ChatGPT<\/strong><\/h2>\n<p>You\u2019ve probably used ChatGPT to summarize papers, write Python code, or explain complex topics. And while it might seem like a simple question-answer engine, there\u2019s often more happening behind the scenes, prompt chains, context windows, and latent loops of reasoning. 
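The coordinated, role-based pattern above (Curator, Researcher, Analyst, Reporter) can be sketched in a few lines of plain Python. This is a minimal illustrative stand-in, not the actual CrewAI API: each agent reads a shared context and merges its output back in, and the hypothetical lambda tasks stand in for real LLM, PubMed, and ADMET calls.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    """Minimal stand-in for a role-based agent (not the CrewAI API)."""
    role: str
    task: Callable[[dict], dict]  # reads shared context, returns its additions

def run_crew(agents: list[Agent], context: dict) -> dict:
    """Run agents in order, merging each one's output into the shared context."""
    for agent in agents:
        context.update(agent.task(context))
    return context

# Hypothetical tasks mirroring the blog's roles; a real system would call an
# LLM, the PubMed API, or an ADMET service inside each task.
crew = [
    Agent("Curator", lambda ctx: {"entities": [e.strip().lower() for e in ctx["raw_csv"]]}),
    Agent("Researcher", lambda ctx: {"papers": {e: f"pubmed-hits-for-{e}" for e in ctx["entities"]}}),
    Agent("Analyst", lambda ctx: {"admet": {e: "predicted" for e in ctx["entities"]}}),
    Agent("Reporter", lambda ctx: {"report": f"# Report\n{len(ctx['entities'])} entities analyzed"}),
]

result = run_crew(crew, {"raw_csv": [" Aspirin ", "Ibuprofen"]})
print(result["report"])
```

The real system layers autonomy on top of this skeleton: each agent decides which tool to call and can loop back when a step fails.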
But even with those advances, these interactions are still mostly human-in-the-loop: you ask, it answers.<\/p>\n<p><strong>Science agents are a different species entirely.<\/strong><\/p>\n<p>Instead of waiting for your next prompt, they plan and execute entire workflows autonomously. They decide which tools to use based on context, how to validate results, and when to pivot. Where ChatGPT responds, agents act. They\u2019re less like assistants and more like collaborators.<\/p>\n<p>Let\u2019s break down the key differences:<\/p>\n<div class=\"wp-block-ponyo-table style__default\">\n<table>\n<thead>\n<tr>\n<th>Feature<\/th>\n<th>LLMs (ChatGPT &amp; similar)<\/th>\n<th>Science Agents (CrewAI, LangGraph, etc.)<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Interaction<\/td>\n<td>Multi-turn, often guided by user prompts or system instructions<\/td>\n<td>Long-running, autonomous workflows across multiple tools<\/td>\n<\/tr>\n<tr>\n<td>Role<\/td>\n<td>Assistant with agentic capabilities abstracted away<\/td>\n<td>Explicit research collaborator executing role-specific tasks<\/td>\n<\/tr>\n<tr>\n<td>Autonomy<\/td>\n<td>Semi-autonomous; requires external prompting or embedded system orchestration<\/td>\n<td>Fully autonomous planning, tool selection, and iteration<\/td>\n<\/tr>\n<tr>\n<td>Tool Use<\/td>\n<td>Some tools used via plugins\/functions (e.g., browser, code interpreter)<\/td>\n<td>Explicit tool integration (APIs, simulations, databases, Dockerized tools)<\/td>\n<\/tr>\n<tr>\n<td>Memory<\/td>\n<td>Short- to medium-term context (limited per session or chat, non-explicit workspace)<\/td>\n<td>Persistent long-term memory (vector DBs, file logs, databases; explicit and programmable)<\/td>\n<\/tr>\n<tr>\n<td>Reproducibility<\/td>\n<td>Very limited; no way to define agents\u2019 roles\/tasks and their tools<\/td>\n<td>Fully containerized, versioned, reproducible workflows with defined agent roles\/tasks<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<h2 class=\"wp-block-heading\"><strong>Try it yourself<\/strong><\/h2>\n<p>If you\u2019re curious, here\u2019s a two-container demo you can run
in minutes.<\/p>\n<p>Git repo: <a href=\"https:\/\/github.com\/estebanx64\/docker_blog_ai_agents_research\" target=\"_blank\">https:\/\/github.com\/estebanx64\/docker_blog_ai_agents_research<\/a><\/p>\n<h3 class=\"wp-block-heading\"><strong>This example uses just two containers\/services<\/strong>:<\/h3>\n<div class=\"wp-block-ponyo-image\"><\/div>\n<h3 class=\"wp-block-heading\"><strong>Prerequisites<\/strong><\/h3>\n<p>Docker and Docker Compose<\/p>\n<p>OpenAI API key (for GPT-4o model access)<\/p>\n<p>Sample CSV file with biological entities<\/p>\n<p>Follow the instructions in the README.md in our repo to set up your OpenAI API key.<\/p>\n<div class=\"style-plain wp-block-ponyo-houston\">\n<p>Running the workflow with the example included in our repo will cost roughly 1\u20132 USD in OpenAI API usage.<\/p>\n<\/div>\n<p>Run the workflow:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\ndocker compose up\n<\/div>\n<div class=\"wp-block-ponyo-image\"><\/div>\n<p>The logs above demonstrate how our agents autonomously plan and execute a complete workflow.
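At a high level, that five-step workflow can be sketched as a plain-Python pipeline. The stub functions below are illustrative stand-ins, not the repo\u2019s actual code; the real steps call the PubMed API, an LLM, and an external ADMET service.

```python
# Illustrative sketch of the demo's five workflow steps. Stubs stand in for
# real PubMed/LLM/ADMET calls; they are NOT the repo's actual implementation.
import csv
import io

def ingest_csv(text: str) -> list[dict]:
    """Step 1: load and parse the input CSV dataset."""
    return list(csv.DictReader(io.StringIO(text)))

def query_pubmed(entity: str) -> list[str]:
    """Step 2 (stub): the real step searches PubMed for relevant articles."""
    return [f"PMID-for-{entity}"]

def summarize(papers: list[str]) -> str:
    """Step 3 (stub): the real step summarizes articles with an LLM."""
    return f"{len(papers)} relevant paper(s)"

def predict_admet(entity: str) -> dict:
    """Step 4 (stub): the real step calls an external ADMET API."""
    return {"entity": entity, "toxicity": "unknown"}

def build_report(rows: list[dict]) -> str:
    """Step 5: aggregate all findings into a structured Markdown report."""
    lines = ["# Research Report"]
    for row in rows:
        entity = row["compound"]
        lines.append(f"## {entity}")
        lines.append(summarize(query_pubmed(entity)))
        lines.append(str(predict_admet(entity)))
    return "\n".join(lines)

report = build_report(ingest_csv("compound\naspirin\nibuprofen"))
print(report)
```

In the agent version, the same stages run autonomously: the agents pick the order, retry failures, and write report.md and the JSON outputs described below.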
<\/p>\n<p><strong>Ingest CSV File<\/strong><\/p>\n<p>The agents load and parse the input CSV dataset.<\/p>\n<p><strong>Query PubMed<\/strong><\/p>\n<p>They automatically search PubMed for relevant scientific articles.<\/p>\n<p><strong>Generate Literature Summaries<\/strong><\/p>\n<p>The retrieved articles are summarized into concise, structured insights.<\/p>\n<p><strong>Calculate ADMET Properties<\/strong><\/p>\n<p>The agents call an external API to compute ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity) predictions.<\/p>\n<p><strong>Compile Results into Markdown Report<\/strong><\/p>\n<p>All findings are aggregated and formatted into a structured report.md.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Output Files<\/strong><\/h3>\n<p><strong>report.md<\/strong> \u2013 Comprehensive research report.<\/p>\n<p><strong>JSON files<\/strong> \u2013 Contain normalized entities, literature references, and ADMET predictions.<\/p>\n<p>This showcases the agents\u2019 ability to make decisions, use tools, and coordinate tasks without manual intervention.<\/p>\n<p>If you want to explore further, check the README.md included in the GitHub repository.<\/p>\n\n<p><strong>Imagine if your lab could run 100 experiments overnight: what would you discover first?<\/strong><\/p>\n<p>But to make this vision real, the hard part isn\u2019t just the agents; it\u2019s the infrastructure they need to run.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Infrastructure: The Bottleneck<\/strong><\/h2>\n<p>AI science agents are powerful, but without the right infrastructure, they break quickly or can\u2019t scale. Real research workflows involve GPUs, complex dependencies, and large datasets.
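As one concrete illustration, Docker Compose can pin a GPU-heavy step declaratively. The service name, image, and paths below are hypothetical; only the `deploy.resources.reservations.devices` block is Compose\u2019s documented way to request GPUs from the NVIDIA runtime.

```yaml
# Hypothetical Compose service for a GPU-bound folding step.
services:
  folding:
    image: example/protein-folding:latest   # placeholder image name
    volumes:
      - ./data:/data                        # placeholder data mount
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

With `docker compose up folding`, the same GPU reservation travels with the workflow from a laptop to a cluster.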
Here\u2019s where things get challenging, and where Docker becomes essential.<\/p>\n<h3 class=\"wp-block-heading\"><strong>The Pain Points<\/strong><\/h3>\n<p><strong>Heavy workloads:<\/strong> Running tools like AlphaFold or Boltz requires high-performance GPUs and smart scheduling (e.g., EKS, Slurm).<\/p>\n<p><strong>Reproducibility chaos:<\/strong> Different systems = broken environments. Scientists spend hours debugging libraries instead of doing science.<\/p>\n<p><strong>Toolchain complexity:<\/strong> Agents rely on multiple scientific tools (RDKit, PyMOL, Rosetta, etc.), each with its own dependencies.<\/p>\n<p><strong>Versioning hell:<\/strong> Keeping track of dataset\/model versions across runs is non-trivial, especially when collaborating.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Why Containers Matter<\/strong><\/h3>\n<p><strong>Standardized environments:<\/strong> Package your tools once, run them anywhere, from a laptop to the cloud.<\/p>\n<p><strong>Reproducible workflows:<\/strong> Every step of your agent\u2019s process is containerized, making it easy to rerun or share experiments.<\/p>\n<p><strong>Composable agents:<\/strong> Treat each step (e.g., literature search, folding, ADMET prediction) as a containerized service.<\/p>\n<p><strong>Smooth orchestration:<\/strong> You can use CrewAI or other frameworks to spin up containers that isolate tasks, running or validating generated code without compromising the host.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Open Challenges &amp; Opportunities<\/strong><\/h2>\n<p>Science agents are powerful, but still early. There\u2019s a growing list of challenges where developers, researchers, and hackers can make a huge impact.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Unsolved Pain Points<\/strong><\/h3>\n<p><strong>Long-term memory:<\/strong> Forgetful agents aren\u2019t useful.
We need better semantic memory systems (e.g., vector stores, file logs) for scientific reasoning over time.<\/p>\n<p><strong>Orchestration frameworks:<\/strong> Complex workflows require robust pipelines. Temporal, Kestra, Prefect, and friends could be game changers for bio.<\/p>\n<p><strong>Safety &amp; bounded autonomy:<\/strong> How do we keep agents focused and avoid \u201challucinated science\u201d? Guardrails are still missing.<\/p>\n<p><strong>Benchmarking agents:<\/strong> There\u2019s no standard way to compare science agents. We need tasks, datasets, and metrics to measure real-world utility.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Ways to Contribute<\/strong><\/h3>\n<p><strong>Containerize<\/strong> more tools (models, pipelines, APIs) to plug into agent systems.<\/p>\n<p><strong>Create tests and benchmarks<\/strong> for evaluating agent performance in scientific domains.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h3>\n<p>We\u2019re standing at the edge of a new scientific paradigm, one where research isn\u2019t just accelerated by AI, but partnered with it. Science agents are transforming what used to be days of fragmented work into orchestrated workflows that run autonomously, reproducibly, and at scale.<\/p>\n<p>This shift from messy shell scripts and notebooks to containerized, intelligent agents isn\u2019t just about convenience. It\u2019s about opening up research to more people, compressing discovery cycles, and building infrastructure that\u2019s as powerful as the models it runs.<\/p>\n<p>Science is no longer confined to the lab. It\u2019s being automated in containers, scheduled on GPUs, and shipped by developers like you.<\/p>\n<p>Check out the repo and try building your own science agent. What workflow would you automate first?<\/p>","protected":false},"excerpt":{"rendered":"<p>It\u2019s 2 AM in a lab somewhere.
A researcher has three terminals open, a half-written Jupyter notebook on one screen, [&hellip;]<\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[4],"tags":[],"class_list":["post-2550","post","type-post","status-publish","format-standard","hentry","category-docker"],"_links":{"self":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/2550","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/comments?post=2550"}],"version-history":[{"count":0,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/2550\/revisions"}],"wp:attachment":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media?parent=2550"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/categories?post=2550"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/tags?post=2550"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}