The XZ backdoor - the unfolding of an accidentally caught severe supply chain attack

Fadhili Njagi
16 min read

The accidental discovery of an insidious and well-planned supply chain attack on the XZ library sent massive shock waves throughout the open-source community. This attack effectively introduced a backdoor in the SSH service, rendering several Linux distributions vulnerable to unfettered access by the attacker.

The Linux community caught and thwarted one of the worst supply chain attacks in the history of open source. This attack eclipses the SolarWinds incident of 2020. The attack was an alarmingly well-planned and carefully executed campaign to compromise OpenSSH in all Linux distributions. OpenSSH powers Secure Shell (SSH), the industry-standard and most commonly used way to remotely access Linux systems. The SSH backdoor granted root access to anyone holding the attacker's private key. With Linux powering over 90% of servers and 100% of the top 500 supercomputers in the world, had this attack gone unnoticed and been distributed to all major stable Linux distributions, the attacker would have had complete access to most of the world's critical systems.

Backdoor meaning
A backdoor is a covert, undocumented way to gain access to a computer system by bypassing normal authentication.

Let's break down what happened, what was affected, how it was caught, and what this means for Linux and open source software.

Background

What is XZ Utils?

As a computer user, you have probably come across compressed archive files. The most common ones are .zip and .tar.gz. These file formats use the DEFLATE compression algorithm to reduce file size so the resulting archive takes up less space on disk or uses less network bandwidth. Windows users may be more familiar with .rar (Roshal Archive), a proprietary compression format by RARLAB. The .xz format, which uses the LZMA2 algorithm (a successor to LZMA), typically achieves higher compression ratios, which makes the resulting archives smaller. As a result, it is a popular compression format on Linux, for both file compression and compression of network traffic. On Linux, the xz command-line tool and the liblzma library are both part of XZ Utils.
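To make the size difference concrete, here is a minimal sketch that uses Python's standard zlib (DEFLATE) and lzma (.xz) modules to compress the same data. The sample data is invented for this illustration, and the exact sizes will vary with the input, so treat it as a way to experiment and not as a benchmark.

```python
import lzma
import zlib

# Repetitive sample text, invented purely for this illustration.
sample = b"".join(
    b"log entry %d: service started without errors\n" % i for i in range(2000)
)

# DEFLATE is the algorithm behind .zip and .gz archives.
deflate_blob = zlib.compress(sample, level=9)

# lzma.compress produces an .xz container by default.
xz_blob = lzma.compress(sample, preset=9)

print(f"original: {len(sample)} bytes")
print(f"DEFLATE : {len(deflate_blob)} bytes")
print(f"xz      : {len(xz_blob)} bytes")
```

Both formats are lossless, so decompressing either result gives back the original bytes exactly.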

But Open source software is supposed to be secure!

It is true that open-source software (OSS) has more eyes on it and it should be, and most of the time is, more secure than proprietary software. The source code of open-source software is available on publicly accessible version control systems for all to inspect and contribute to. Code contributions are reviewed, approved and merged into the project source code by the designated project maintainers. This level of transparency, community involvement, and peer review makes it hard for vulnerabilities or malicious code to go unnoticed. In addition to that, there are automated testing tools like Google's OSS-Fuzz project, a continuous fuzzing service, that continuously test open-source software for vulnerabilities and exploits.

However, the state of open source and the sophistication of the attack process made the attack not only possible but alarmingly successful.

How did this happen?

Lasse Collin, a developer living in Finland, has been thanklessly maintaining XZ Utils, XZ for Java and XZ Embedded since 2009 as hobby projects. He was starting to struggle to actively maintain these projects in 2022, due to personal reasons. Even though a lot of systems, including corporate systems, depend on XZ Utils, consumers often don't give back to the project in terms of monetary support or code contributions.

The preparation

Enter Jia Tan. Jia Tan opened a GitHub account in January 2021 under the username JiaT75. After creating and committing to their own private repositories, Jia started making open source contributions around November 2021, including a commit to the libarchive library. The work Jia did seems to revolve around data compression, using the C language. Jia made their first commit to the tukaani-project/xz GitHub repo on 6th February 2022.

The repo is currently unavailable on GitHub after the discovery of the exploit.

With Lasse struggling to maintain the xz repos and falling behind on merging pull requests and delivering new features, Jia's contributions seemed like a godsend. Jia also contributed to tukaani-project/xz-java later that year. In hindsight, Jia's commit to the libarchive repo was devious and suspect, and should have been a red flag.

Phase 1: Infiltrating the xz library

Users and contributors of the xz repos began getting concerned about the state of the xz repos. The full mail thread can be found on this mail-archive.

Dennis Ens, Thu, 19 May 2022 12:26

Dear XZ Java Community

Is XZ for Java still maintained? I asked a question here a week ago and have not heard back. When I view the git log I can see it has not updated in over a year. I am looking for things like multithreaded encoding / decoding and a few updates that Brett Okken had submited (but are still waiting for merge). Should I add these things to only my local version, or is there a plan for these things in the future?


Lasse Collin, Thu, 19 May 2022 13:41

Is XZ for Java still maintained?

Yes, by some definition at least, like if someone reports a bug it will get fixed. Development of new features definitely isn't very active. :-(

I asked a question here a week ago and have not heard back.

I saw. I have lots of unanswered emails at the moment and obviously that isn't a good thing. After the latest XZ for Java release I've tried focus on XZ Utils (and ignored XZ for Java), although obviously that hasn't worked so well either even if some progress has happened with XZ Utils.

When I view the git log I can see it has not updated in over a year. I am looking for things like multithreaded encoding / decoding and a few updates that Brett Okken had submited (but are still waiting for merge). Should I add these things to only my local version, or is there a plan for these things in the future?

Brett Okken's patches I haven't reviewed so I cannot give definite answers about if you should include them in your local version, sorry.

[... some technical details omitted]

Jia Tan has helped me off-list with XZ Utils and he might have a bigger role in the future at least with XZ Utils. It's clear that my resources are too limited (thus the many emails waiting for replies) so something has to change in the long term.

Enter an entitled, unappreciative and demanding xz user - Jigar Kumar - who puts unreasonable pressure on Lasse. Jigar had never participated on this forum before.

Jigar Kumar, Tue, 07 Jun 2022 09:00

Progress will not happen until there is new maintainer. XZ for C has sparse commit log too. Dennis you are better off waiting until new maintainer happens or fork yourself. Submitting patches here has no purpose these days. The current maintainer lost interest or doesn't care to maintain anymore. It is sad to see for a repo like this.


Lasse Collin, Wed, 08 Jun 2022 03:28

I haven't lost interest but my ability to care has been fairly limited mostly due to longterm mental health issues but also due to some other things. Recently I've worked off-list a bit with Jia Tan on XZ Utils and perhaps he will have a bigger role in the future, we'll see.

It's also good to keep in mind that this is an unpaid hobby project.

Anyway, I assure you that I know far too well about the problem that not much progress has been made. The thought of finding new maintainers has existed for a long time too as the current situation is obviously bad and sad for the project.

A new XZ Utils stable branch should get released this year with threaded decoder etc. and a few alpha/beta releases before that. Perhaps the moment after the 5.4.0 release would be a convenient moment to make changes in the list of project maintainer(s).

Forks are obviously another possibility and I cannot control that. If those happen, I hope that file format changes are done so that no silly problems occur (like using the same ID for different things in two projects). 7-Zip supports .xz and keeping its developer Igor Pavlov informed about format changes (including new filters) is important too.

Even after Lasse revealed he was dealing with mental health issues, Jigar persisted.

Jigar Kumar, Tue, 14 Jun 2022 11:16

With your current rate, I very doubt to see 5.4.0 release this year. The only progress since april has been small changes to test code. You ignore the many patches bit rotting away on this mailing list. Right now you choke your repo. Why wait until 5.4.0 to change maintainer? Why delay what your repo needs?


Dennis Ens, Tue, 21 Jun 2022 13:24

I am sorry about your mental health issues, but its important to be aware of your own limits. I get that this is a hobby project for all contributors, but the community desires more. Why not pass on maintainership for XZ for C so you can give XZ for Java more attention? Or pass on XZ for Java to someone else to focus on XZ for C? Trying to maintain both means that neither are maintained well.

Hmm. Curious. Dennis takes a more sympathetic tone but makes the same recommendation: handing over maintainership of the repo. Could they be in cahoots? Could they actually be working together, in a good cop (Dennis), bad cop (Jigar) arrangement?

Lasse Collin, Wed, 29 Jun 2022 13:07

Finding a co-maintainer or passing the projects completely to someone else has been in my mind a long time but it's not a trivial thing to do. For example, someone would need to have the skills, time, and enough long-term interest specifically for this. There are many other projects needing more maintainers too.

As I have hinted in earlier emails, Jia Tan may have a bigger role in the project in the future. He has been helping a lot off-list and is practically a co-maintainer already. :-) I know that not much has happened in the git repository yet but things happen in small steps. In any case some change in maintainership is already in progress at least for XZ Utils.

Phase 2: Getting everything in place

The social engineering worked: Lasse finally caved. But Jia didn't strike immediately and continued to make meaningful contributions and earn the trust of Lasse and the community at large. With more trust came more access. Jia made a pull request (PR) to google/oss-fuzz to replace Lasse's contacts with theirs.

Another mysterious GitHub account, under the name Hans Jansen, made the pull request "liblzma: Add ifunc implementation to crc64_fast.c", which introduced the indirect function (ifunc) mechanism that would later be used in executing the attack. There is little activity on Hans Jansen's GitHub account before or after this PR, suggesting that the account was made specifically for this PR. On 8th July 2023, Jia also made a PR in google/oss-fuzz to disable ifunc for fuzzing builds. This is a feature the attack would later leverage, and Jia did not want the oss-fuzz tool to detect it. Eventually, Lasse granted Jia the rights to build and release xz versions.
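For readers unfamiliar with ifunc: it is a GNU toolchain feature where a library supplies a small "resolver" function that runs once, early during program start-up, and decides which implementation a symbol should point to (for example, a CPU-optimised CRC routine). The sketch below only imitates that idea in Python. The function names and the cpu_has_fast_path flag are hypothetical and are not taken from liblzma.

```python
from typing import Callable, List

POLY = 0xC96C5795D7870F42  # reflected CRC-64 polynomial used by the .xz format
MASK = 0xFFFFFFFFFFFFFFFF


def crc64_bitwise(data: bytes) -> int:
    """Portable, slow implementation: processes one bit at a time."""
    crc = MASK
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
    return crc ^ MASK


def _build_table() -> List[int]:
    table = []
    for n in range(256):
        crc = n
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
        table.append(crc)
    return table


_TABLE = _build_table()


def crc64_table(data: bytes) -> int:
    """Faster implementation: processes one byte at a time via a lookup table."""
    crc = MASK
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ MASK


def resolve_crc64(cpu_has_fast_path: bool) -> Callable[[bytes], int]:
    """Hypothetical resolver: runs once and returns the implementation to bind."""
    return crc64_table if cpu_has_fast_path else crc64_bitwise


# Bound once, "at load time". Callers only ever see the name crc64.
crc64 = resolve_crc64(cpu_has_fast_path=True)
print(hex(crc64(b"123456789")))
```

The relevant point for this story is that a resolver is code that runs very early and gets to decide what a function name actually refers to, which makes it an attractive place to hide something.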

On 16th February 2024, Jia added build-to-host.m4 to .gitignore. The malicious version of this file shipped only in the release tarballs and not in the git repository, and it would be instrumental in the final phase.

Phase 3: Executing the attack

On 23rd February 2024, Jia added two obfuscated and encrypted malicious binary files into the test folder of the xz repo, and updated them on 9th March 2024:

  1. tests/files/bad-3-corrupt_lzma2.xz

  2. tests/files/good-large_compressed.lzma

Obfuscation meaning
Obfuscation is making source or compiled code obscure intentionally to make it difficult for humans and computers to understand what the code does.

At this point, it was common to add corrupted .xz archives to test xz. The test files usually did not make it to the final build and, as such, were less scrutinised by developers. During the build, the m4/build-to-host.m4 macro ran and "uncorrupted" bad-3-corrupt_lzma2.xz through character substitution. The repaired file was then decompressed into a bash script, which ran, decompressing the good-large_compressed.lzma file, removing junk and decrypting the result. The output was the liblzma_la-crc64-fast.o binary backdoor, which was then added to the compilation and linking process. This build process culminated in a compromised liblzma tarball (.tar.xz archive), released to the world as version 5.6.0.
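The "uncorrupting through character substitution" step can be sketched in a few lines of Python. This is only an illustration of the idea under stated assumptions: the substitution table below is hypothetical (it is not the mapping used in the real attack), and the payload is a harmless stand-in string.

```python
import lzma

# Hypothetical substitution table, chosen for illustration only.
# It swaps two pairs of byte values, so applying it twice restores the input.
SWAP = bytes.maketrans(b"7zXZ", b"z7ZX")


def is_valid_xz(blob: bytes) -> bool:
    """Return True if blob decompresses as a well-formed .xz stream."""
    try:
        lzma.decompress(blob, format=lzma.FORMAT_XZ)
        return True
    except lzma.LZMAError:
        return False


# Harmless stand-in for the hidden second-stage script.
script = b"echo 'a second-stage script would run here'\n"

good = lzma.compress(script)     # a perfectly normal .xz stream
bad = good.translate(SWAP)       # looks like a corrupt test file: its magic bytes are wrong
restored = bad.translate(SWAP)   # the substitution step quietly repairs it

print("corrupt file is valid xz: ", is_valid_xz(bad))
print("restored file is valid xz:", is_valid_xz(restored))
```

To a reviewer, the "bad" file is indistinguishable from the deliberately broken archives a compression library needs for its tests, which is why this kind of hiding place draws little attention.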

The success of this attack is attributed to the way the backdoor was uploaded to the repo as corrupt, obfuscated and encrypted archives that got injected into the final program during the build process, instead of straight-up adding malicious code to the repo. This novel and genius approach enabled the exploit to go undetected by both man and AI. How ironic that a malicious .xz archive was used to compromise the xz project!

Here is how xz and SSH are related. A library can tamper with the inner workings of any executable it is linked into as a dependency. Even though OpenSSH does not directly link to liblzma, Debian and other major Linux distributions patch sshd, OpenSSH's server process, to link against libsystemd, which in turn links to liblzma. This allowed the compromised liblzma to tamper with sshd. Specifically, it replaced the RSA_public_decrypt function pointer so that it pointed to the backdoor code, which would check for the attacker's predefined key. This way, the sshd process would accept authentication with the attacker's private key and grant the attacker root access to the server. With root access, the attacker could steal data, install malware, take over systems, you name it. Imagination would be the only limit!
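The pointer-swapping idea can be illustrated with a toy model. The sketch below is not how the real backdoor is implemented. It is a conceptual Python imitation in which a dictionary plays the role of the process's table of function pointers, and every name in it (function_table, genuine_verify, install_hook, the "attacker:" marker) is hypothetical.

```python
from typing import Callable, Dict

Verifier = Callable[[bytes], bool]

# Hypothetical stand-in for the table of function pointers a process resolves at start-up.
function_table: Dict[str, Verifier] = {}


def genuine_verify(credential: bytes) -> bool:
    """Stand-in for the legitimate check: only the real credential passes."""
    return credential == b"valid-user-credential"


def install_hook(table: Dict[str, Verifier], name: str) -> None:
    """What a malicious dependency could do once it is loaded into the process."""
    original = table[name]

    def hooked_verify(credential: bytes) -> bool:
        # Invented marker standing in for "signed with the attacker's key".
        if credential.startswith(b"attacker:"):
            return True              # bypass the genuine check entirely
        return original(credential)  # everyone else sees normal behaviour

    table[name] = hooked_verify


def authenticate(credential: bytes) -> bool:
    # The server always calls through the table, never the function directly.
    return function_table["verify"](credential)


function_table["verify"] = genuine_verify
before = authenticate(b"attacker:let-me-in")   # rejected by the genuine check

install_hook(function_table, "verify")
after = authenticate(b"attacker:let-me-in")    # accepted by the hook

print("before hook:", before)
print("after hook: ", after)
```

Because the hook forwards every other request to the original function, ordinary users notice nothing, which mirrors why the real backdoor was so hard to spot from the outside.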

Hans Jansen appealed to Ubuntu, Red Hat and Debian to merge the new xz versions into their OS updates around the same week Hans' Debian GitLab account was created. Around the same time, users started complaining of Valgrind notifications when using v5.6.0. Valgrind is a programming tool for debugging memory, detecting memory leaks, and profiling software. Random users popped up in the Debian forum reassuring other users that it was a false warning and that this would be fixed in the next version. As a result, Jia further obfuscated the attack in version 5.6.1 to make it harder to detect.

Once the backdoor was in place, the attackers could now just wait until Linux distributions upgraded to liblzma v5.6.0 or v5.6.1. They were in it for the long game, remember?

The spread of the xz exploit

The bleeding edge Linux distributions that usually use the latest versions of software packages promptly upgraded to the compromised versions of liblzma. This includes distributions like:

  • Debian Testing

  • Debian Unstable (Sid) and Experimental

  • Arch Linux

  • openSUSE Tumbleweed and openSUSE MicroOS

  • Fedora 41

  • Fedora Rawhide

  • Kali Linux

  • Other unstable/experimental/rolling Linux releases

However, more stable or long-term support (LTS) versions of Linux distributions like Ubuntu and Linux Mint receive updates less frequently, which allows for extensive testing and catching bugs before updates are rolled out. Fortunately, the backdoor was discovered before the LTS releases upgraded to the compromised liblzma version. Since the majority of servers use LTS versions of Linux instead of unstable bleeding-edge versions, the impact of this backdoor was limited. If you are using Ubuntu, you are probably safe. But just to be sure, run the following commands:

xz --version

If you see 5.6.0 or 5.6.1, run sudo apt update and sudo apt upgrade immediately.

Or

apt-cache policy liblzma5

If you see Installed: 5.6.0-0.2, run sudo apt update and sudo apt upgrade immediately.
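Both checks above boil down to reading a version string and comparing it against the two known-bad releases. Here is a minimal Python sketch of that comparison. The helper names are hypothetical, and the sample strings in the usage lines are assumptions about what the command output looks like, so adapt them to what your system actually prints.

```python
import re
from typing import Optional

# The two releases named in this article as carrying the backdoor.
COMPROMISED = {"5.6.0", "5.6.1"}


def extract_version(text: str) -> Optional[str]:
    """Pull the first x.y.z version number out of command output.

    Debian-style "A+reallyB" package versions mean the packaged code is
    actually version B, so that part takes priority when present.
    """
    match = re.search(r"really(\d+\.\d+\.\d+)", text) or re.search(
        r"(\d+\.\d+\.\d+)", text
    )
    return match.group(1) if match else None


def is_compromised(text: str) -> bool:
    """Return True if the output reports one of the known-bad versions."""
    return extract_version(text) in COMPROMISED


# Sample strings (assumed output formats, for illustration only).
print(is_compromised("xz (XZ Utils) 5.6.0\nliblzma 5.6.0"))
print(is_compromised("Installed: 5.6.0-0.2"))
print(is_compromised("Installed: 5.6.1+really5.4.5-1"))
```

A check like this only tells you which version is installed. It does not prove a machine was or was not accessed, so upgrading remains the important step.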

I am a Kali Linux user, so imagine my horror when I ran xz --version and found out that I was using liblzma v5.6.0. Luckily, I don't leave SSH running on my PC and I did upgrade my system immediately.

The accidental discovery

Enter Andres Freund, a software engineer employed by Microsoft, who accidentally discovered the XZ exploit while benchmarking PostgreSQL.

Andres Freund, Fri, 29 Mar 2024 18:32 • via Mastodon

I was doing some micro-benchmarking at the time, needed to quiesce the system to reduce noise. Saw sshd processes were using a surprising amount of CPU, despite immediately failing because of wrong usernames etc. Profiled sshd, showing lots of cpu time in liblzma, with perf unable to attribute it to a symbol. Got suspicious. Recalled that I had seen an odd valgrind complaint in automated testing of postgres, a few weeks earlier, after package updates.

Really required a lot of coincidences.

The same day, Andres sent an alert to the Openwall oss-security mailing list titled "Backdoor in upstream xz/liblzma leading to ssh server compromise". Red Hat then announced a Common Vulnerabilities and Exposures (CVE) entry, CVE-2024-3094, with a severity score of 10. Read more details from Freund here. This GitHub Gist also contains all the technical information concerning the backdoor.

Response

The discovery of this exploit reverberated intensely within the open source community, as evidenced by the number of YouTube videos and blog articles that cover this incident. GitHub immediately suspended Jia Tan and Lasse Collin's GitHub accounts. Suspending Lasse's account seems a bit unfair since he is a victim here. The GitHub repo of XZ Utils is also unreachable till now. It has been moved here. Debian also banned Hans Jansen's Debian GitLab account. All open source contributions by Jia Tan and Hans Jansen are being reviewed with a fine-tooth comb.

What this means for Linux and Open-source

So, we caught and foiled this insidious plan. Yaaay? No, we should be scared. We only accidentally caught the exploit. Andres, our hero, was not even doing security research or penetration testing. We just got lucky and thwarted the worst supply chain attack the world has seen in a while. The attack incorporated both effective social engineering against an overwhelmed, struggling open-source maintainer and simple but meticulous obfuscation that escaped detection by both man and AI.

Understanding the severity of the backdoor

As much as I am a fan of movies, I refused to accept that there exists a program like the Fast and Furious Saga's God's Eye, that could hack into any system, activate cameras and gather the data to track anyone on Earth. "No one can hack into just any device in the world," I told myself. These past few weeks proved that the world got close to such a scenario. The attacker would have been able to access any Linux system with SSH enabled and do as they pleased. This is real, and now more than ever, supply chain attacks remain critical threats that we have to fiercely defend against. Even with incorporation of automated security checks using A.I., much more needs to be done.

Suspected state actor

There seems to be consensus among the security community that the attackers were not just rookies or some common hacking group. These attackers showed incredible coordination, skill and patience. They were playing 3D chess with open source software, patiently biding their time, and almost quietly succeeded. It is most likely that this was a well-funded organisation with good talent, such as a state actor.

Nonviable open-source model

It is now clear that having the world's infrastructure, commercial or non-commercial, depend on the efforts of unappreciated, unsupported and overworked open source contributors is not a viable strategy in the long run. They are more likely to be subverted by devious organisations or state actors, or fall victim to social engineering like Lasse did.

This could have happened to proprietary software

Such well-coordinated, skilled and most likely well-funded bad actors could just as easily have infiltrated companies and compromised them from the inside. Severe zero-day vulnerabilities are being discovered every day in ubiquitous proprietary software. We have had several in the past, such as WhatsApp's call vulnerability in 2019, Zoom's numerous vulnerabilities over the years and even the numerous and frequent Windows ones.

Conclusion

No system or program is 100% secure. The XZ Utils backdoor serves as a stark reminder: vigilance is essential even in the most trusted corners of the open-source world. Supply chain attacks remain one of the most severe threats of the last few years, following the SolarWinds attack, the Log4j vulnerability and now the XZ exploit. We missed this one by the skin of our teeth, and we need to be prepared for the next one.

Jonathan Blow, as controversial as he may be, was correct in stating that there are thousands of skilled hackers employed by nefarious organisations or state actors specifically for the purpose of subverting or compromising software and systems, especially open source. In turn, the open source community needs to up its game on security lest the next attack go undetected.

Stay alert, fellow tech enthusiasts!

💡
Open Source Software of the week: Valkey, the Linux Foundation's open source alternative to Redis, an in-memory, NoSQL, key-value data store.

Written by

Fadhili Njagi

I'm a developer from Nairobi, Kenya. I have a BSc in Computer Science and I'm pursuing an MSc in AI. I mostly do full-stack web development. I do pay attention to the finer details, so stay tuned for awesome articles.