{"id":426,"date":"2024-04-05T03:12:18","date_gmt":"2024-04-05T03:12:18","guid":{"rendered":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2024\/04\/05\/what-is-retrieval-augmented-generation-and-what-does-it-do-for-generative-ai\/"},"modified":"2024-04-05T03:12:18","modified_gmt":"2024-04-05T03:12:18","slug":"what-is-retrieval-augmented-generation-and-what-does-it-do-for-generative-ai","status":"publish","type":"post","link":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2024\/04\/05\/what-is-retrieval-augmented-generation-and-what-does-it-do-for-generative-ai\/","title":{"rendered":"What is retrieval-augmented generation, and what does it do for generative AI?"},"content":{"rendered":"<p>One of the hottest topics in AI right now is <strong>RAG, or retrieval-augmented generation<\/strong>, which is a retrieval method used by some AI tools to improve the quality and relevance of their outputs.<\/p>\n<p><strong>Organizations want AI tools that use RAG<\/strong> because it makes those tools aware of proprietary data without the effort and expense of custom model training. RAG also keeps models up to date. \u2009When generating an answer without RAG, models can only draw upon data that existed when they were trained. 
With RAG, on the other hand, models can leverage a private database of newer information for more informed responses.<\/p>\n<p>We talked to <a href=\"https:\/\/githubnext.com\/\">GitHub Next<\/a>\u2019s Senior Director of Research, <a href=\"https:\/\/github.com\/idan\">Idan Gazit<\/a>, and Software Engineer, <a href=\"https:\/\/github.com\/colin353\">Colin Merkel<\/a>, to learn more about RAG and how it\u2019s used in generative AI tools.<\/p>\n<h2>Why everyone\u2019s talking about RAG<a href=\"https:\/\/github.blog\/category\/engineering\/#why-everyones-talking-about-rag\" class=\"heading-link pl-2 text-italic text-bold\"><\/a><\/h2>\n<p>One of the reasons you should always verify outputs from a generative AI tool is that its training data <strong>has a knowledge cut-off date<\/strong>. While models are able to produce outputs that are tailored to a request, they can only reference information that existed at the time of their training. But with RAG, an AI tool can use data sources beyond its model\u2019s training data to generate an output.<\/p>\n<h3>The difference between RAG and fine-tuning<a href=\"https:\/\/github.blog\/category\/engineering\/#the-difference-between-rag-and-fine-tuning\" class=\"heading-link pl-2 text-italic text-bold\"><\/a><\/h3>\n<p>Most organizations currently don\u2019t train their own AI models. Instead, they customize pre-trained models to their specific needs, often using RAG or <a href=\"https:\/\/github.blog\/2024-02-28-customizing-and-fine-tuning-llms-what-you-need-to-know\/#fine-tuning\">fine-tuning<\/a>. Here\u2019s a quick breakdown of how these two strategies differ.<\/p>\n<p><strong>Fine-tuning<\/strong> requires adjusting a model\u2019s weights, which results in a highly customized model that excels at a specific task. 
It\u2019s a good option for organizations that rely on codebases written in a specialized language, especially if the language isn\u2019t well represented in the model\u2019s original training data.<\/p>\n<p><strong>RAG<\/strong>, on the other hand, doesn\u2019t require weight adjustment. Instead, it retrieves and gathers information from a variety of data sources to augment a prompt, which results in an AI model generating a more contextually relevant response for the end user.<\/p>\n<p>Some organizations start with RAG and then fine-tune their models to accomplish a more specific task. Other organizations find that RAG alone is sufficient for AI customization.<\/p>\n<h2>How AI models use context<a href=\"https:\/\/github.blog\/category\/engineering\/#how-ai-models-use-context\" class=\"heading-link pl-2 text-italic text-bold\"><\/a><\/h2>\n<p>In order for an AI tool to generate helpful responses, it <a href=\"https:\/\/github.blog\/2023-04-14-how-generative-ai-is-changing-the-way-developers-work\/\">needs the right context<\/a>. This is the same dilemma we face as humans when making a decision or solving a problem. It\u2019s hard to do when you don\u2019t have the right information to act on.<\/p>\n<p>So, let\u2019s talk more about context in the context of generative AI:<\/p>\n<p>Today\u2019s generative AI applications are powered by large language models (LLMs) that are structured as <strong>transformers<\/strong>, and all transformer LLMs have a <strong>context window<\/strong>: the amount of data that they can accept in a single prompt. Though context windows are limited in size, they can and will continue to grow larger as more powerful models are released.<\/p>\n<p><strong>Input data<\/strong> will vary depending on the AI tool\u2019s capabilities. For instance, when it comes to <strong>GitHub Copilot in the IDE<\/strong>, input data comprises all of the code in the file that you\u2019re currently working on. 
This is made possible by our <strong><a href=\"https:\/\/github.blog\/2023-05-17-how-github-copilot-is-getting-better-at-understanding-your-code\/\">Fill-in-the-Middle<\/a> (FIM)<\/strong> paradigm, which makes GitHub Copilot aware of both the code before your cursor (the prefix) and after your cursor (the suffix).<\/p>\n<p>GitHub Copilot also processes code from your other open tabs (a process we call <a href=\"https:\/\/github.blog\/2023-05-17-how-github-copilot-is-getting-better-at-understanding-your-code\/#improving-semantic-understanding\">neighboring tabs<\/a>) to potentially find and add relevant information to the prompt. When there are a lot of open tabs, GitHub Copilot will scan the most recently reviewed ones.<\/p>\n<p>Because of the context window\u2019s limited size, the challenge for ML engineers is to figure out what input data to add to the prompt, and in what order, to generate the most relevant suggestion from the AI model. This task is known as <strong><a href=\"https:\/\/github.blog\/2023-06-20-how-to-write-better-prompts-for-github-copilot\/#whats-a-prompt-and-what-is-prompt-engineering:~:text=Prompts-,Prompt%20engineering,-Context\">prompt engineering<\/a><\/strong>.<\/p>\n<h2>How RAG enhances an AI model\u2019s contextual understanding<a href=\"https:\/\/github.blog\/category\/engineering\/#how-rag-enhances-an-ai-models-contextual-understanding\" class=\"heading-link pl-2 text-italic text-bold\"><\/a><\/h2>\n<p>With RAG, an LLM can go beyond training data and retrieve information from <strong>a variety of data sources, including customized ones<\/strong>.<\/p>\n<p>When it comes to <strong>GitHub Copilot Chat within GitHub.com and in the IDE<\/strong>, input data can include your conversation with the chat assistant, whether it\u2019s code or natural language, through a process called <a 
href=\"https:\/\/github.blog\/2023-10-30-the-architecture-of-todays-llm-applications\/#:~:text=or%20fine-tuning.-,In-context%20learning,-%2C%20sometimes%20referred\">in-context learning<\/a>. It can also include data from <strong>indexed repositories<\/strong> (public or private), <strong>a collection of Markdown documentation<\/strong> across repositories (that we refer to as <a href=\"https:\/\/docs.github.com\/enterprise-cloud@latest\/copilot\/github-copilot-enterprise\/copilot-chat-in-github\/managing-copilot-knowledge-bases\">knowledge bases<\/a>), and results from integrated <strong>search engines<\/strong>.  From these other sources, RAG will retrieve additional data to augment the initial prompt. As a result, it can generate a more relevant response.<\/p>\n<p>The type of input data used by GitHub Copilot will depend on which GitHub Copilot plan you\u2019re using.<\/p>\n\n<h2>RAG and semantic search<a href=\"https:\/\/github.blog\/category\/engineering\/#rag-and-semantic-search\" class=\"heading-link pl-2 text-italic text-bold\"><\/a><\/h2>\n<p><strong>Unlike keyword search or <a href=\"https:\/\/docs.github.com\/search-github\/github-code-search\/understanding-github-code-search-syntax#using-boolean-operations\">Boolean search operators<\/a><\/strong>, an ML-powered semantic search system uses its training data to understand the relationship between your keywords. So, rather than view, for example, \u201ccats\u201d and \u201ckittens\u201d as independent terms as you would in a keyword search, a semantic search system can understand, from its training, that those words are often associated with cute videos of the animal. 
Because of this, a search for just \u201ccats and kittens\u201d might rank a cute animal video as a top search result.<\/p>\n<p><strong>How does semantic search improve the quality of RAG retrievals?<\/strong> When using a customized database or search engine as a RAG data source, semantic search can improve the context added to the prompt and overall relevance of the AI-generated output.<\/p>\n<p>The semantic search process is at the heart of retrieval. \u201cIt surfaces great examples that often elicit great results,\u201d Gazit says.<\/p>\n<div class=\"wp-video\"><a href=\"https:\/\/github.blog\/wp-content\/uploads\/2024\/02\/BLOG2_chat-knowledge-base_002.mp4\">https:\/\/github.blog\/wp-content\/uploads\/2024\/02\/BLOG2_chat-knowledge-base_002.mp4<\/a><\/div>\n<p><em>Developers can use Copilot Chat on GitHub.com to ask questions and receive answers about a codebase in natural language, or surface relevant documentation and existing solutions.<\/em><\/p>\n<h2>RAG data sources: Where RAG uses semantic search<a href=\"https:\/\/github.blog\/category\/engineering\/#rag-data-sources-where-rag-uses-semantic-search\" class=\"heading-link pl-2 text-italic text-bold\"><\/a><\/h2>\n<p>You\u2019ve probably read dozens of articles (including some of our own) that talk about RAG, vector databases, and embeddings. And even if you haven\u2019t, here\u2019s something you should know: <strong>RAG doesn\u2019t require embeddings or vector databases<\/strong>.<\/p>\n<p>A RAG system can use semantic search to retrieve relevant documents, whether from an embedding-based retrieval system, traditional database, or search engine. The snippets from those documents are then formatted into the model\u2019s prompt. 
We\u2019ll provide a quick recap of vector databases and then, using GitHub Copilot Enterprise as an example, cover how <strong>RAG retrieves data from a variety of sources<\/strong>.<\/p>\n<h3>Vector databases<a href=\"https:\/\/github.blog\/category\/engineering\/#vector-databases\" class=\"heading-link pl-2 text-italic text-bold\"><\/a><\/h3>\n<p><strong>Vector databases<\/strong> are optimized for storing embeddings of your repository code and documentation. They allow us to use novel search parameters to find matches between similar vectors.<\/p>\n<p>To make code and documentation retrievable from a vector database, they are first converted into <strong>embeddings<\/strong>, a type of high-dimensional vector that a RAG system can search.<\/p>\n<p><strong>Here\u2019s how RAG retrieves data from vector databases<\/strong>: while you code in your IDE, algorithms create embeddings for your code snippets, which are stored in a vector database. Then, an AI coding tool can search that database by <strong>embedding similarity<\/strong> to find snippets from across your codebase that are related to the code you\u2019re currently writing. Those snippets are often highly relevant context, enabling an AI coding assistant to generate a more contextually relevant coding suggestion. GitHub Copilot Chat uses embedding similarity in the IDE and on GitHub.com, so it finds code and documentation snippets related to your query.<\/p>\n<p>Embedding similarity is incredibly powerful because it identifies code that has subtle relationships to the code you\u2019re editing.<\/p>\n<p>\u201cEmbedding similarity might surface code that uses the same APIs, or code that performs a similar task to yours but that lives in another part of the codebase,\u201d Gazit explains. 
\u201cWhen those examples are added to a prompt, the model\u2019s primed to produce responses that mimic the idioms and techniques that are native to your codebase\u2014even though the model was not trained on your code.\u201d<\/p>\n<h3>General text search and search engines<a href=\"https:\/\/github.blog\/category\/engineering\/#general-text-search-and-search-engines\" class=\"heading-link pl-2 text-italic text-bold\"><\/a><\/h3>\n<p>With a <strong>general text search<\/strong>, any documents that you want to be accessible to the AI model are indexed ahead of time and stored for later retrieval. For instance, RAG in GitHub Copilot Enterprise can retrieve data from files in an indexed repository and <a href=\"https:\/\/docs.github.com\/enterprise-cloud@latest\/copilot\/github-copilot-enterprise\/copilot-chat-in-github\/managing-copilot-knowledge-bases\">Markdown files across repositories<\/a>.<\/p>\n\n<div class=\"content-button-wrap text-center\"><a href=\"https:\/\/docs.github.com\/copilot\/github-copilot-enterprise\/overview\/github-copilot-enterprise-feature-set\" target=\"_self\" class=\"btn-mktg arrow-target-mktg\" rel=\"noopener\">Learn more about GitHub Copilot Enterprise features<\/a><\/div>\n<p><\/p>\n<p>RAG can also retrieve information from <strong>external and internal search engines<\/strong>. When integrated with an external search engine, RAG can search and retrieve information from the entire internet. When integrated with an internal search engine, it can also access information from within your organization, like an internal website or platform. Integrating both kinds of search engines supercharges RAG\u2019s ability to provide relevant responses.<\/p>\n<p>For instance, GitHub Copilot Enterprise integrates both Bing, an external search engine, and an <a href=\"https:\/\/github.blog\/2023-05-08-github-code-search-is-generally-available\/\">internal search engine<\/a> built by GitHub into Copilot Chat on GitHub.com. 
Bing integration allows GitHub Copilot Chat to conduct a web search and retrieve up-to-date information, such as details about the latest Java release. But without a search engine searching internally, \u201cCopilot Chat on GitHub.com cannot answer questions about your private codebase unless you provide a specific code reference yourself,\u201d explains Merkel, who helped to build GitHub\u2019s internal search engine from scratch.<\/p>\n<p><strong>Here\u2019s how this works in practice.<\/strong> When a developer asks GitHub Copilot Chat on GitHub.com a question about a repository, RAG in Copilot Enterprise uses the internal search engine to find relevant code or text from indexed files to answer that question. To do this, the internal search engine conducts a semantic search by analyzing the content of documents from the indexed repository, and then ranking those documents based on relevance. GitHub Copilot Chat then uses RAG, which also conducts a semantic search, to find and retrieve the most relevant snippets from the top-ranked documents. Those snippets are added to the prompt so GitHub Copilot Chat can generate a relevant response for the developer.<\/p>\n<h2>Key takeaways about RAG<a href=\"https:\/\/github.blog\/category\/engineering\/#key-takeaways-about-rag\" class=\"heading-link pl-2 text-italic text-bold\"><\/a><\/h2>\n<p>RAG offers an effective way to customize AI models, helping to ensure outputs are up to date with organizational knowledge and best practices, and the latest information on the internet.<\/p>\n<p>GitHub Copilot uses a variety of methods to improve the quality of input data and contextualize an initial prompt, and that ability is enhanced with RAG. 
What\u2019s more, the RAG retrieval method in GitHub Copilot Enterprise goes beyond vector databases and includes data sources like general text search and search engine integrations, which provides even more cost-efficient retrievals.<\/p>\n<p>Context is everything when it comes to getting the most out of an AI tool. To improve the relevance and quality of a generative AI output, you need to improve the relevance and quality of the input.<\/p>\n<p>As Gazit says, \u201cQuality in, quality out.\u201d<\/p>\n<div class=\"post-content-cta\">\n<p>Looking to bring the power of GitHub Copilot Enterprise to your organization? <a href=\"https:\/\/github.com\/features\/copilot\/\">Learn more<\/a> about GitHub Copilot Enterprise or <a href=\"https:\/\/github.com\/features\/copilot\/plans\">get started now<\/a>.<\/p>\n<\/div>\n<p>The post <a href=\"https:\/\/github.blog\/2024-04-04-what-is-retrieval-augmented-generation-and-what-does-it-do-for-generative-ai\/\">What is retrieval-augmented generation, and what does it do for generative AI?<\/a> appeared first on <a href=\"https:\/\/github.blog\/\">The GitHub Blog<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>One of the hottest topics in AI right now is RAG, or retrieval-augmented generation, which is a retrieval method used 
[&hellip;]<\/p>\n","protected":false},"author":0,"featured_media":427,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[8],"tags":[],"class_list":["post-426","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-github-engineering"],"_links":{"self":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/426","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/comments?post=426"}],"version-history":[{"count":0,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/426\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media\/427"}],"wp:attachment":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media?parent=426"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/categories?post=426"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/tags?post=426"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}