We recently tackled a difficult engineering problem at the intersection of eBPF, SSL, and Kubernetes. The goal was to build an automated tracing program that would capture all HTTP, MySQL, and PostgreSQL requests and queries within a Kubernetes cluster, without the need for a reverse proxy, a service mesh, or sidecar containers.
To achieve this, we ran a DaemonSet that attached a BPF program to the virtual interfaces of all of our pods and, in real time, intercepted the TCP packets, decrypted them, and exported them to an external system for further analysis.
In this article, we’ll explain how it all worked.
Getting the packets
To decrypt SSL, the first thing you need is the raw encrypted packets. There are many options for packet capture: netlink, BPF classic, and of course eBPF. Within eBPF, the options for packet introspection are TC (Traffic Control) programs, XDP (eXpress Data Path) programs, and cgroup socket programs.
We started with XDP but ran into problems pretty quickly on AWS, where XDP on ENI (elastic network interface) devices is unsupported out of the box. Wanting a smooth user experience, we quickly ported the XDP program to TC. Libbpf is poorly documented with regard to attaching TC programs, so we spent many hours comparing trace logs of the TC program with our code to correctly attach the TC program in the right part of the filter chain.
The next problem was getting the packets into Wireshark. Wireshark has supported old-school PCAP since eBPF was just BPF (Berkeley Packet Filter), and it also supports PcapNG. Wanting a modern extensible foundation to build on, we chose PcapNG. After evaluating the PcapNG libraries available and noting they lacked support for advanced features, such as DSB (Decryption Secrets Block), that we knew we would need, we decided to create our own framework for writing PcapNG files.
Early on, it became clear that with perf buffers, having per-CPU buffers meant that packets would arrive out of order as they were buffered and that the kernel signaled data availability to epoll somewhat randomly. Each per-CPU buffer is its own file descriptor and needs to be registered with epoll for availability; luckily, libbpf took care of this for us, but it did not solve the ordering issue.
We eventually decided to use the ktime APIs and CLOCK_MONOTONIC to provide in-kernel ordering of the packets without having to dissect the packet flow and use TCP ordering. After dusting off the old data structures and algorithms text, we decided std::priority_queue was a perfect fit for reordering the packets based on ktime. After we applied some heuristics, it seemed that a delay of about 10ms reordered the packets well enough for Wireshark to finish the job of reassembling the TCP flows. On newer kernels, it’s possible to use atomic increments on a BPF map to create true serialization. However, ktime has nanosecond accuracy, so it’s good enough, and it might even be faster given the cache flushing that atomic increments cause. After considering maintaining two code paths, we went with the simple solution of just using ktime.
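The reorder buffer described above can be sketched roughly as follows. This is a minimal illustration, not the production code: the CapturedPacket and ReorderBuffer names, their fields, and the drain interface are all assumptions made for the example. It only shows the idea of a min-heap keyed on ktime that releases packets once they are older than a fixed delay.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical record for one captured packet; field names are illustrative.
struct CapturedPacket {
    std::uint64_t ktime_ns;          // timestamp taken in the BPF program
    std::vector<std::uint8_t> data;  // raw packet bytes
};

// Comparator that puts the packet with the smallest ktime on top of the heap.
struct LaterFirst {
    bool operator()(const CapturedPacket& a, const CapturedPacket& b) const {
        return a.ktime_ns > b.ktime_ns;
    }
};

class ReorderBuffer {
public:
    explicit ReorderBuffer(std::uint64_t delay_ns) : delay_ns_(delay_ns) {}

    // Called for every packet read from any per-CPU perf buffer.
    void push(CapturedPacket pkt) { heap_.push(std::move(pkt)); }

    // Returns, in ktime order, every packet at least delay_ns older than now_ns.
    std::vector<CapturedPacket> drain(std::uint64_t now_ns) {
        std::vector<CapturedPacket> ready;
        while (!heap_.empty() && heap_.top().ktime_ns + delay_ns_ <= now_ns) {
            // Invariant: the heap always hands packets back oldest first.
            assert(ready.empty() || ready.back().ktime_ns <= heap_.top().ktime_ns);
            ready.push_back(heap_.top());
            heap_.pop();
        }
        return ready;
    }

    std::size_t pending() const { return heap_.size(); }

private:
    std::uint64_t delay_ns_;
    std::priority_queue<CapturedPacket, std::vector<CapturedPacket>, LaterFirst> heap_;
};
```

With a 10ms delay, a caller would construct ReorderBuffer(10000000), push packets as they arrive from the perf buffers, and periodically call drain with the current monotonic time. A packet that arrives later than the delay window can still come out in the wrong position, which is the trade-off the heuristic accepts.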
We then tested unencrypted traffic on MySQL, PostgreSQL, and HTTP to make sure we had a solid foundation and could start tackling SSL/TLS.
Simplified struct ssl_st / struct s3
>> Client Hello
unsigned char client_random[SSL3_RANDOM_SIZE];
<< Server Hello
unsigned char server_random[SSL3_RANDOM_SIZE];
Calculate on server and client
<< Server Change Cipher
<< Server Encrypted Extensions
<< Server Certificate
<< Server Certificate Verify
<< Server Handshake Finished
Calculate on server and client
>> Client Change Cipher
>> Client Application Data
<< Server New Session Ticket 1
<< Server New Session Ticket 2
<< Server Application Data
The big problem with libssl is its reliance on opaque pointers and its inability to call SSL functions from within an eBPF program. You can hook the start of a function or the end of the function, but all you can do is read memory, or CPU registers. Unlike with a debugger, functions cannot be called from an eBPF program. Technically, you can write process memory, too, but that didn’t solve any of our problems, so using any of the OpenSSL API to read the keys was out of the question.
Finding the offsets
The 11 fields needed to decrypt TLSv1.3 (the struct is simplified)
Behind the typedefs and the external and internal headers, SSL* is a pointer to struct ssl_st, which contains an anonymous struct s3. Of the hundred or so variables in the SSL structs, we needed 11, so we leveraged X Macros, rather than C++ templates, to calculate the offsets of the variables we needed.
For those who haven’t heard of X Macros, they let you define a list once and expand it several times, applying a different macro to each entry. In our case, we needed to define a struct and then populate it. X Macros are likely the root of Greenspun’s tenth rule of programming: Any sufficiently complicated C or Fortran program contains an ad hoc, informally specified, bug-ridden, slow implementation of half of Common Lisp.
The ssl_offsets.c program was just copied into the OpenSSL source tree. We didn’t even need to compile OpenSSL itself, since all we needed were the struct offsets. Once ssl_offsets.c was compiled, we ran the program, and it worked: we had the offsets we needed for that version of OpenSSL. Luckily, the SSL struct doesn’t change that much, so we only needed to run it against a few versions of OpenSSL.
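To make the X Macro technique concrete, here is a small sketch. It is not the original ssl_offsets.c: fake_ssl_st is a made-up stand-in whose layout does not match any OpenSSL version, and the three listed fields are placeholders for the real eleven. The point is only that a single field list is expanded three times: to declare the offsets struct, to fill it with offsetof, and to print it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Stand-in for the real OpenSSL struct. Layout and field names are
// illustrative only and do NOT match any OpenSSL version.
struct fake_ssl_st {
    int version;
    unsigned char client_random[32];
    unsigned char server_random[32];
    void* handshake_secret;
};

// The X Macro: one list of fields, expanded several times below.
#define SSL_FIELDS(X) \
    X(client_random)  \
    X(server_random)  \
    X(handshake_secret)

// Expansion 1: declare one offset member per field.
struct ssl_offsets {
#define X(field) std::size_t field;
    SSL_FIELDS(X)
#undef X
};

// Expansion 2: populate every member with offsetof and sanity-check it.
inline ssl_offsets compute_offsets() {
    ssl_offsets o{};
#define X(field)                             \
    o.field = offsetof(fake_ssl_st, field);  \
    assert(o.field < sizeof(fake_ssl_st));
    SSL_FIELDS(X)
#undef X
    return o;
}

// Expansion 3: print one "name offset" pair per field.
inline void print_offsets(const ssl_offsets& o) {
#define X(field) std::printf("%s %zu\n", #field, o.field);
    SSL_FIELDS(X)
#undef X
}
```

Adding a twelfth field then means adding one line to SSL_FIELDS; the struct, the population code, and the printer all pick it up automatically.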
SSL_do_handshake is the core of SSL connections for the purposes of decrypting traffic. Everything you need is here: SSL_write calls it, SSL_read calls it, SSL_accept calls it, and SSL_connect calls it. So rather than hooking all of those functions individually, you can just attach to SSL_do_handshake.
TLSv1.2 and earlier
The keys for TLSv1.2 are pretty easy to get: they are two fields, client_random and server_random. We only needed to hook SSL_do_handshake in libssl and store the SSL* pointer address in an array. In the return hook, we grabbed the pointer to the struct stored on entry and read the two values we needed. Technically, the process can migrate CPUs between function entry and exit, but it tends not to. Once we had the keys, we copied the struct into the perf buffer, passed it to userspace, and wrote it out as a PcapNG Decryption Secrets Block.
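For readers who have not seen one, a Decryption Secrets Block is a small framed blob. The sketch below serializes a DSB under a few stated assumptions: little-endian byte order (the real order is set by the capture's section header), no block options, and a payload that is already keylog-formatted text. The make_dsb and put_u32 helpers are hypothetical names for this example, not part of any library.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

constexpr std::uint32_t kDsbBlockType = 0x0000000A;  // Decryption Secrets Block
constexpr std::uint32_t kTlsKeyLog    = 0x544c534b;  // secrets type: TLS key log

// Appends a 32-bit value in little-endian byte order (an assumption here).
inline void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>(v >> shift));
}

// Builds one DSB carrying `keylog` as its secrets data, with no options.
inline std::vector<std::uint8_t> make_dsb(const std::string& keylog) {
    const std::uint32_t secrets_len = static_cast<std::uint32_t>(keylog.size());
    const std::uint32_t padded = (secrets_len + 3u) & ~3u;  // pad to 32 bits
    // type + total length + secrets type + secrets length + data + total length
    const std::uint32_t total = 4 + 4 + 4 + 4 + padded + 4;

    std::vector<std::uint8_t> block;
    block.reserve(total);
    put_u32(block, kDsbBlockType);
    put_u32(block, total);
    put_u32(block, kTlsKeyLog);
    put_u32(block, secrets_len);
    block.insert(block.end(), keylog.begin(), keylog.end());
    block.resize(block.size() + (padded - secrets_len), 0);  // zero padding
    put_u32(block, total);  // the total length is repeated at the end
    assert(block.size() == total);
    return block;
}
```

The returned bytes can be appended to a PcapNG file after its section header. The secrets length field records the unpadded size, so a reader can tell the payload from the padding.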
TLSv1.3 and perfect forward secrecy
Perfect forward secrecy sounds great: no matter what happens to your private keys, no one can decrypt your connection unless they get the handshake keys during the few milliseconds when the TLS handshake is taking place. It turns out, rather unsurprisingly, that Wireshark needs these keys, and OpenSSL goes to great lengths to make them unreadable once the handshake is finished. Even if you have access to a program’s memory once they are derived, they are useless to Wireshark. It is indeed perfect secrecy, or almost.
For about 30 lines of code in OpenSSL, or a few nanoseconds, the keys are still readable. Libbpf can hook only functions that exist in the symbol table, and none of the TLSv1.3 handshake functions existed in the symbol table, so we had to dig deeper into the primitives in libcrypto. Since the handshake keys are keys, they are created using EVP_PKEY_CTX_ctrl and EVP_PKEY_derive. When SSL_do_handshake was called, we stored the addresses of the handshake keys and then waited for EVP_PKEY_CTX_ctrl and EVP_PKEY_derive to be called. Obviously, most of the calls to these functions would not be for the keys we were looking for, so if the address of the key didn’t exist in the hash of keys, we exited and didn’t record anything. When the address existed in the hash, we started recording how many times EVP_PKEY_derive had been called, because the key is in the correct state for Wireshark only the second time it is called for that address.
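The bookkeeping in that last step can be modeled in plain userspace code. The sketch below is only a model of the logic, not BPF code: in the tracer this state would live in a BPF hash map and the methods would be probe handlers. HandshakeKeyTracker and its method names are invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Userspace model of the per-address derive counter described above.
class HandshakeKeyTracker {
public:
    // Model of the SSL_do_handshake entry hook: remember where a handshake
    // key will live so later derive calls can be matched against it.
    void on_handshake(std::uint64_t key_addr) {
        assert(key_addr != 0);  // a null address is never a key
        derive_calls_.emplace(key_addr, 0u);
    }

    // Model of the EVP_PKEY_derive hook. Returns true only when the key at
    // key_addr should be captured: on the second derive for a tracked address.
    bool on_derive(std::uint64_t key_addr) {
        auto it = derive_calls_.find(key_addr);
        if (it == derive_calls_.end()) return false;  // not one of ours: ignore
        return ++it->second == 2;
    }

private:
    std::unordered_map<std::uint64_t, unsigned> derive_calls_;
};
```

Calls for untracked addresses fall through without touching any state, which mirrors the early exit in the hook, and a tracked address reports true exactly once.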
We had the TLSv1.3 keys and it still didn’t work, but TLSv1.2 did
Once we had the keys, half the battle was won. We used curl and the SSLKEYLOGFILE environment variable to determine that, yes, Wireshark can decrypt TLSv1.3 — but it won’t decrypt if the keys are in a DSB. Wait … what? This was probably the most maddening part of the whole exercise. We tested decryption with curl with TLS1.2, and it worked, every time. We tested with TLS1.3, and it didn’t work. But when we added the keylog file to Wireshark, all of a sudden it worked. We checked byte for byte that the Decryption Secrets Block was identical to the keylog file, and it still didn’t work.
We wondered if there might be a difference that we somehow couldn’t see between the Decryption Secrets Block and the keylog file, so we took the keylog file and injected it directly as a DSB into the PcapNG file. There’s a Wireshark tool to do just this, so we ran it and reran Wireshark — and it worked!
OK, but why? We compared the two DSBs in the packet capture and they were still identical, byte for byte. The only difference was that the Wireshark tool had injected the keylog as the first block of the packet capture, which was our first clue as to why it wasn’t working.
The TLSv1.3 handshake is very complex, though simple in terms of the number of packets. In TLSv1.2, the first packet gives you everything you need to decrypt. In TLSv1.3, you need a reply packet with the other half of the handshake keys, and Wireshark needs all of these keys in the PcapNG file before the first packet. Obviously, we had a problem: we needed data from a packet before that packet had been seen.
So what was the solution? Delay, delay, delay! Just delay the packets! Remember the packet reorder buffer? We asked ourselves, “What if whenever we get a DSB, we write it immediately to the PcapNG file and just turn up the delay on the reorder buffer?” A few lines of code later, and voila! Wireshark is happy, and the new packet delay is 500ms. If the handshake takes longer than 500ms, we won’t be able to decrypt that stream, but it turns out that with the new reduced handshake procedure, this rarely happens.
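The two-path idea can be sketched as a tiny writer in which secrets skip the queue and packets wait. This is an illustration under assumptions, not the real writer: CaptureWriter is a hypothetical name, strings stand in for PcapNG blocks, and a std::multimap keyed on ktime is used here purely to keep the example short.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

class CaptureWriter {
public:
    explicit CaptureWriter(std::uint64_t packet_delay_ns) : delay_ns_(packet_delay_ns) {}

    // Secrets bypass the reorder buffer and are written at once.
    void on_secrets(const std::string& label) { file_.push_back("DSB:" + label); }

    // Packets wait, sorted by ktime, until they are older than the delay.
    void on_packet(std::uint64_t ktime_ns, const std::string& label) {
        pending_.emplace(ktime_ns, "PKT:" + label);
    }

    // Writes out every pending packet at least delay_ns older than now_ns.
    void flush(std::uint64_t now_ns) {
        while (!pending_.empty() && pending_.begin()->first + delay_ns_ <= now_ns) {
            file_.push_back(pending_.begin()->second);
            pending_.erase(pending_.begin());
        }
    }

    const std::vector<std::string>& file() const { return file_; }

private:
    std::uint64_t delay_ns_;
    std::multimap<std::uint64_t, std::string> pending_;
    std::vector<std::string> file_;  // stands in for the PcapNG output
};
```

With a 500ms delay, a secrets block that shows up any time within 500ms of the first handshake packet still lands in the output ahead of that packet, which is the ordering Wireshark needs.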
Overall, while working with eBPF can be opaque and lead you down some strange rabbit holes, the possibilities are endless, and it truly feels like an unexplored frontier. If you have questions about anything in this article or just want to chat about eBPF, give us a shout on Twitter.