{"id":3860,"date":"2026-04-16T16:12:08","date_gmt":"2026-04-16T16:12:08","guid":{"rendered":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2026\/04\/16\/how-github-uses-ebpf-to-improve-deployment-safety\/"},"modified":"2026-04-16T16:12:08","modified_gmt":"2026-04-16T16:12:08","slug":"how-github-uses-ebpf-to-improve-deployment-safety","status":"publish","type":"post","link":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2026\/04\/16\/how-github-uses-ebpf-to-improve-deployment-safety\/","title":{"rendered":"How GitHub uses eBPF to improve deployment safety"},"content":{"rendered":"<p>Did you know that, at GitHub, we host all of our own source code on <a href=\"http:\/\/github.com\/\">github.com<\/a>? We do this because we\u2019re our own biggest customer\u2014testing out changes internally before they go to users. However, there\u2019s one downside: If github.com were ever to go down, we wouldn\u2019t be able to access our own source code.<\/p>\n<p>This is what you\u2019d call a very simple circular dependency: to deploy GitHub, we need GitHub. If GitHub were down, we wouldn\u2019t be able to deploy something to fix it. We mitigate this by maintaining a mirror of our code for fixing forward and built assets for rolling back.<\/p>\n<p>So we\u2019re done, right? Problem solved? Nope, there are more circular dependencies to consider. For example, how do you stop a deployment script from introducing a circular dependency of its own by relying on an internal service or downloading a binary from GitHub?<\/p>\n<p>When we started to design our new host-based deployment system, we evaluated some new approaches to prevent deployment code from creating circular dependencies. We found that using eBPF, we could selectively monitor and block those calls. 
In this blog post, we\u2019ll take you through our findings and show how you can get started writing your own eBPF programs.<\/p>\n<h2 class=\"wp-block-heading\">Types of circular dependencies<\/h2>\n<p>Let\u2019s start by looking at the types of circular dependencies through a hypothetical scenario.<\/p>\n<p>Suppose a MySQL outage occurs, which leaves GitHub unable to serve <code>release<\/code> data from repositories. To resolve the incident, we need to roll out a configuration change to the stateful MySQL nodes that are impacted. This configuration change is applied by executing a deploy script on each node.<\/p>\n<p>Now, let\u2019s look at the different types of circular dependencies that could impact GitHub during this scenario.<\/p>\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>Direct dependency<\/strong>: The MySQL deploy script attempts to pull the latest release of an open source tool from GitHub. Since GitHub can\u2019t serve the release data (due to the outage), the script can\u2019t complete.<\/li>\n<\/ol>\n<figure class=\"wp-block-image size-full\"><img data-opt-id=1931513887  fetchpriority=\"high\" decoding=\"async\" data-recalc-dims=\"1\" width=\"1433\" height=\"194\" src=\"https:\/\/github.blog\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-06-at-6.36.24-PM.png?resize=1433%2C194\" alt=\"Diagram showing a MySQL deploy script failing after attempting to pull the latest release of an open source tool from GitHub.\" class=\"wp-image-95085\" \/><\/figure>\n<ol start=\"2\" class=\"wp-block-list\">\n<li><strong>Hidden dependencies<\/strong>: The MySQL deploy script uses a servicing tool that is already present on the machine\u2019s disk. However, when the tool runs, it checks GitHub to see if an update is available. 
If it\u2019s unable to contact GitHub (due to the outage), the script may fail or hang, depending on how the tool handles the error when checking for updates.<\/li>\n<\/ol>\n<figure class=\"wp-block-image size-full\"><img data-opt-id=1118018614  fetchpriority=\"high\" decoding=\"async\" data-recalc-dims=\"1\" width=\"1439\" height=\"363\" src=\"https:\/\/github.blog\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-06-at-6.36.34-PM.png?resize=1439%2C363\" alt=\"Diagram showing a script failing after being unable to contact GitHub (due to the outage).\" class=\"wp-image-95086\" \/><\/figure>\n<ol start=\"3\" class=\"wp-block-list\">\n<li><strong>Transient dependencies<\/strong>: The MySQL deploy script calls, via an API, another internal service (for example, a migrations service), which in turn attempts to fetch the latest release of an open source tool from GitHub to use the new binary. The failure propagates back to the deploy script.<\/li>\n<\/ol>\n<figure class=\"wp-block-image size-full\"><img data-opt-id=1593835597  data-opt-src=\"https:\/\/github.blog\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-06-at-6.36.41-PM.png\"  decoding=\"async\" data-recalc-dims=\"1\" width=\"1450\" height=\"202\" src=\"data:image/svg+xml,%3Csvg%20viewBox%3D%220%200%20100%%20100%%22%20width%3D%22100%%22%20height%3D%22100%%22%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%3E%3Crect%20width%3D%22100%%22%20height%3D%22100%%22%20fill%3D%22transparent%22%2F%3E%3C%2Fsvg%3E?resize=1450%2C202\" alt=\"Diagram showing a MySQL deploy script calling, via an API, another internal service, which in turn attempts to fetch the latest release of an open source tool from GitHub to use the new binary. 
The failure propagates back to the deploy script.\" class=\"wp-image-95088\" \/><\/figure>\n<h2 class=\"wp-block-heading\">How do you solve these circular dependencies?<\/h2>\n<p>Until recently, the onus has been on every team that owns stateful hosts to review their deployment scripts and identify circular dependencies.<\/p>\n<p>In practice, however, many dependencies aren\u2019t identified until an incident occurs, which can delay recovery.<\/p>\n<p>The obvious route would be to block access to github.com from the machines to validate that the system can deploy without it. But these hosts are stateful and serve customer traffic even during rolling deploys, drains, or restarts. Blocking github.com entirely would impact their ability to handle production requests.<\/p>\n<p>This is where we started to look at eBPF, which lets you load custom programs into the Linux kernel and hook into core system primitives like networking.<\/p>\n<p>We were particularly interested in the <a href=\"https:\/\/docs.ebpf.io\/linux\/program-type\/BPF_PROG_TYPE_CGROUP_SKB\/\"><code>BPF_PROG_TYPE_CGROUP_SKB<\/code> program type<\/a> because it lets you hook network egress from a particular cGroup.<\/p>\n<p>A <a href=\"https:\/\/en.wikipedia.org\/wiki\/Cgroups\">cGroup<\/a> is a Linux primitive (used heavily by Docker but not limited to it) that enforces resource limits and isolation for sets of processes. You can create a cGroup, configure it, and move processes into it\u2014no Docker required.<\/p>\n<p>This started to look very promising. Could we create a cGroup, place only the deployment script inside it, and then limit the outbound network access of only that script? 
It certainly looked possible, so we started to build a proof of concept.<\/p>\n<h2 class=\"wp-block-heading\">Building out per-process conditional network filtering with eBPF<\/h2>\n<p>We started on a proof of concept in <code>go<\/code> that used the <code><a href=\"https:\/\/github.com\/cilium\/ebpf\">cilium\/ebpf<\/a><\/code> library.<\/p>\n<p>ebpf-go is a pure-Go library to read, modify, and load eBPF programs and attach them to various hooks in the Linux kernel.<\/p>\n<p>It massively simplifies the process of authoring, building, and running programs that use eBPF. For example, to hook the <a href=\"https:\/\/docs.ebpf.io\/linux\/program-type\/BPF_PROG_TYPE_CGROUP_SKB\/\"><code>BPF_PROG_TYPE_CGROUP_SKB<\/code> program type<\/a>, we can do this as follows: \ud83d\udc47<\/p>\n<div class=\"wp-block-code-wrapper\">\n<pre class=\"wp-block-code\"><code>\/\/go:generate go tool bpf2go -tags linux bpf cgroup_skb.c -- -I..\/headers \n\n \n\nfunc main() { \n\n   \/\/ Load pre-compiled programs and maps into the kernel. \n\n   objs := bpfObjects{} \n\n   if err := loadBpfObjects(&amp;objs, nil); err != nil { \n\n       log.Fatalf(\"loading objects: %v\", err) \n\n   } \n\n   defer objs.Close() \n\n \n\n   \/\/ Link the count_egress_packets program to the cgroup. \n\n   l, err := link.AttachCgroup(link.CgroupOptions{ \n\n       Path:    \"\/sys\/fs\/cgroup\/system.slice\", \n\n       Attach:  ebpf.AttachCGroupInetEgress, \n\n       Program: objs.CountEgressPackets, \n\n   }) \n\n   if err != nil { \n\n       log.Fatal(err) \n\n   } \n\n   defer l.Close() \n\n \n\n   log.Println(\"Counting packets...\") \n\n \n\n   \/\/ Read loop reporting the total amount of times the kernel \n\n   \/\/ function was entered, once per second. 
\n\n   ticker := time.NewTicker(1 * time.Second) \n\n   defer ticker.Stop() \n\n \n\n   for range ticker.C { \n\n       var value uint64 \n\n       if err := objs.PktCount.Lookup(uint32(0), &amp;value); err != nil { \n\n           log.Fatalf(\"reading map: %v\", err) \n\n       } \n\n       log.Printf(\"number of packets: %d\\n\", value) \n\n   } \n\n} <\/code><\/pre>\n<\/div>\n<p>With the eBPF program:<\/p>\n<div class=\"wp-block-code-wrapper\">\n<pre class=\"wp-block-code\"><code>\/\/go:build ignore \n\n \n\n#include \"common.h\" \n\n \n\nchar __license[] SEC(\"license\") = \"Dual MIT\/GPL\"; \n\n \n\nstruct { \n\n   __uint(type, BPF_MAP_TYPE_ARRAY); \n\n   __type(key, u32); \n\n   __type(value, u64); \n\n   __uint(max_entries, 1); \n\n} pkt_count SEC(\".maps\"); \n\n \n\nSEC(\"cgroup_skb\/egress\") \n\nint count_egress_packets(struct __sk_buff *skb) { \n\n   u32 key      = 0; \n\n   u64 init_val = 1; \n\n \n\n   u64 *count = bpf_map_lookup_elem(&amp;pkt_count, &amp;key); \n\n   if (!count) { \n\n       bpf_map_update_elem(&amp;pkt_count, &amp;key, &amp;init_val, BPF_ANY); \n\n       return 1; \n\n   } \n\n   __sync_fetch_and_add(count, 1); \n\n \n\n   return 1; \n\n} <\/code><\/pre>\n<\/div>\n<p>When you run <code>go generate<\/code>, the <code>\/\/go:generate<\/code> line handles compiling the eBPF C code and auto-generating the <code>bpfObjects<\/code> struct, which allows us to attach and interact with the program. Once the generated files are in place, a simple <code>go build<\/code> is all you need. \ud83e\udd73<\/p>\n<p>(<code>cilium\/ebpf<\/code> has a great set of examples to get started. <a href=\"https:\/\/github.com\/cilium\/ebpf\/tree\/main\/examples\/cgroup_skb\">Review the full code for the example above<\/a>).<\/p>\n<p>There was still a missing piece though: <code>CGROUP_SKB<\/code> operates on IP addresses. Given the breadth of GitHub\u2019s systems and rate of change, keeping an up-to-date IP block list would be very hard.<\/p>\n<p>Could we use more eBPF to create a DNS-based block list? 
Yes, it turns out we could.<\/p>\n<p>An eBPF <a href=\"https:\/\/docs.ebpf.io\/linux\/program-type\/BPF_PROG_TYPE_CGROUP_SOCK_ADDR\/\">program type of <code>BPF_PROG_TYPE_CGROUP_SOCK_ADDR<\/code><\/a> allows you to hook socket address syscalls such as <code>connect<\/code> <strong>and change the destination IP<\/strong>.<\/p>\n<p>Here is a simplified example where the <code>connect4<\/code> hook rewrites any IPv4 connection targeting DNS (port 53) to a DNS proxy on localhost.<\/p>\n<div class=\"wp-block-code-wrapper\">\n<pre class=\"wp-block-code\"><code>cgroupLink, err := link.AttachCgroup(link.CgroupOptions{ \n\n       Path:    cgroup.Name(), \n\n       Attach:  ebpf.AttachCGroupInet4Connect, \n\n       Program: obj.Connect4, \n\n   }) \n\n   if err != nil { \n\n       return nil, fmt.Errorf(\"attaching eBPF program Connect4 to cgroup: %w\", err) \n\n   } <\/code><\/pre>\n<\/div>\n<div class=\"wp-block-code-wrapper\">\n<pre class=\"wp-block-code\"><code>\/* This is the hexadecimal representation of 127.0.0.1 address *\/ \n\nconst __u32 ADDRESS_LOCALHOST_NETBYTEORDER = bpf_htonl(0x7f000001); \n\n \n\nSEC(\"cgroup\/connect4\") \n\nint connect4(struct bpf_sock_addr *ctx) { \n\n __be32 original_ip = ctx-&gt;user_ip4; \n\n __u16 original_port = bpf_ntohs(ctx-&gt;user_port); \n\n \n\n if (ctx-&gt;user_port == bpf_htons(53)) { \n\n   \/* For DNS Query (*:53) rewire service to backend \n\n    * 127.0.0.1:const_dns_proxy_port *\/ \n\n   ctx-&gt;user_ip4 = const_mitm_proxy_address; \n\n   ctx-&gt;user_port = bpf_htons(const_dns_proxy_port); \n\n } \n\n \n\n return 1; \n\n} <\/code><\/pre>\n<\/div>\n<p>We used this to intercept DNS queries from the cGroup and forward them to a userspace DNS proxy we run.<\/p>\n<p>Now, any DNS queries initiated by the deployment script are routed through our DNS proxy. 
Our proxy evaluates each requested domain against our block list and uses <a href=\"https:\/\/docs.ebpf.io\/linux\/concepts\/maps\/\">eBPF Maps<\/a> to communicate with the <code>CGROUP_SKB<\/code> program, allowing or denying the request accordingly.<\/p>\n<p>If you\u2019d like to dig into the code, here\u2019s <a href=\"https:\/\/github.com\/lawrencegripper\/ebpf-cgroup-firewall\/\">an early proof of concept<\/a> we put together. Our current implementation has progressed since then, but this should serve as a good intro.<\/p>\n<p>Like any fun project, the deeper we got, the more we realized we could do.<\/p>\n<p>For example, could we correlate blocked DNS requests back to the specific command or process that triggered them, so teams could more easily debug and fix issues? Yes, we can!<\/p>\n<p>Inside the <a href=\"https:\/\/docs.ebpf.io\/linux\/program-type\/BPF_PROG_TYPE_CGROUP_SKB\/\"><code>BPF_PROG_TYPE_CGROUP_SKB<\/code> program type<\/a>, we have <a href=\"https:\/\/docs.ebpf.io\/linux\/program-context\/__sk_buff\/\">the <code>__sk_buff<\/code> context<\/a>, from which we can pull the <a href=\"https:\/\/beta.computer-networking.info\/syllabus\/default\/protocols\/dns.html\">DNS transaction ID<\/a> and also <a href=\"https:\/\/docs.ebpf.io\/linux\/helper-function\/bpf_get_current_pid_tgid\/\">capture the Process ID<\/a> (PID) that initiated the request. 
We place this information into another eBPF Map tracking <code>DNS Transaction ID -&gt; Process ID<\/code>.<\/p>\n<p>Here is a simplified version of the eBPF code (see this <a href=\"https:\/\/github.com\/lawrencegripper\/ebpf-cgroup-firewall\/blob\/main\/pkg\/ebpf\/bpf.c#L338-L360\">PoC code<\/a> for the full example):<\/p>\n<div class=\"wp-block-code-wrapper\">\n<pre class=\"wp-block-code\"><code>  __u32 pid = bpf_get_current_pid_tgid() &gt;&gt; 32; \n\n     __u16 skb_read_offset = sizeof(struct iphdr) + sizeof(struct udphdr); \n\n     __u16 dns_transaction_id = \n\n         get_transaction_id_from_dns_header(skb, skb_read_offset); \n\n \n\n     if (pid &amp;&amp; dns_transaction_id != 0) { \n\n       bpf_map_update_elem(&amp;dns_transaction_id_to_pid, &amp;dns_transaction_id, \n\n                           &amp;pid, BPF_ANY); \n\n     } <\/code><\/pre>\n<\/div>\n<p>As we\u2019re redirecting all DNS calls to our userspace DNS proxy, we can look at the transaction ID of each request, find the domain being resolved, and look it up in the eBPF Map to see which process made the request. By reading <code>\/proc\/{PID}\/cmdline<\/code>, we can even extract the full command line that triggered the request.<\/p>\n<p>Then we can output a log line with all the information:<\/p>\n<div class=\"wp-block-code-wrapper\">\n<pre class=\"wp-block-code\"><code>&gt; WARN DNS BLOCKED reason=FromDNSRequest blocked=true blockedAt=dns domain=github.com. 
pid=266767 cmd=\"curl github.com \" firewallMethod=blocklist<\/code><\/pre>\n<\/div>\n<p>With that, we\u2019re done.<\/p>\n<p>We can now:<\/p>\n<ul class=\"wp-block-list\">\n<li>Conditionally block domains that would cause circular dependencies from deployment scripts.<\/li>\n<li>Inform the owning team which command triggered the blocked request.<\/li>\n<li>Provide an audit list of all domains contacted during a deployment.<\/li>\n<li>Use the cGroups to enforce CPU and memory limits on deploy scripts, preventing runaway resource usage from impacting workloads.<\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\">What\u2019s next?<\/h2>\n<p>Our new circular dependency detection process is live after a six-month rollout.<\/p>\n<p>Now, if a team accidentally adds a problematic dependency, or if an existing binary tool we use takes a new dependency, the tooling will detect that problem and flag it to the team.<\/p>\n<p>The net result is a more stable GitHub and faster mean time to recovery during incidents (due to the removal of these circular dependencies).<\/p>\n<p>Are there ways for circular dependencies to still trip things up? 
You bet\u2014and we\u2019ll look to improve the tool as we discover them.<\/p>\n<h2 class=\"wp-block-heading\">Want to dive in?<\/h2>\n<p>Has this piqued your interest in what you might be able to do with eBPF?<\/p>\n<p>Get started by having a look through the examples in <a href=\"https:\/\/github.com\/cilium\/ebpf\/tree\/main\/examples\">cilium\/ebpf<\/a> and the great documentation on the <a href=\"http:\/\/docs.ebpf.io\/\">docs.ebpf.io<\/a> site.<\/p>\n<p>If you\u2019re not quite ready to start writing your own eBPF tools, try open source tools powered by eBPF, like <a href=\"https:\/\/bpftrace.org\/tutorial-one-liners#lesson-3-file-opens\">bpftrace for deep tracing<\/a> or <a href=\"https:\/\/github.com\/mozillazg\/ptcpdump\">ptcpdump to get TCP dumps<\/a> with container-level metadata.<\/p>\n<p>The post <a href=\"https:\/\/github.blog\/engineering\/infrastructure\/how-github-uses-ebpf-to-improve-deployment-safety\/\">How GitHub uses eBPF to improve deployment safety<\/a> appeared first on <a href=\"https:\/\/github.blog\/\">The GitHub Blog<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Did you know that, at GitHub, we host all of our own source code on github.com? 
We do this because [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":3861,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[8],"tags":[],"class_list":["post-3860","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-github-engineering"],"_links":{"self":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/3860","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/comments?post=3860"}],"version-history":[{"count":0,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/3860\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media\/3861"}],"wp:attachment":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media?parent=3860"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/categories?post=3860"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/tags?post=3860"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}