Theory of a Dependency Confusion
Dependency Confusion is a vulnerability that can exist if our organization uses internal dependencies that are managed through a dependency manager. In short, an attacker can abuse the way the dependency manager chooses between package sources, which could lead to a malicious dependency being used instead of the internal one. In this task, we will look into the theory of a Dependency Confusion vulnerability before practically exploiting one in the next task.
Background
Dependency Confusion was discovered by Alex Birsan in 2021. The issue stems from how internal dependencies are managed. Let's take a look at a simple example in Python, where we install the numpy package by running pip install numpy.
What actually happens in the background when we run this command? pip connects to the external PyPI repository, looks for a package called numpy, finds the latest version, and installs it. In the past, there have been some interesting ways a package like this could be compromised through a supply chain attack:
Typosquatting - An attacker hosts a package called nunpy, hoping that a developer will mistype the name and install their malicious package.
Source Injection - An attacker contributes to the package for a new feature through a pull request but also embeds a vulnerability in the code that could be used to compromise applications that make use of the package.
Domain Expiry - Sometimes, the developers of packages may forget to renew the domain where their email is being hosted. If this happens, an attacker can buy the expired domain, allowing them full control over email, which could be used to reset the password of a package maintainer to gain full control over the package. This is a common risk for legacy packages on these external repositories.
There are several other supply chain attack methods, but all of them target the dependency or its maintainers directly. If we wanted to use pip to install an internal package, and we followed the example on StackOverflow (like all good developers do), our build step would look something like this:
The --extra-index-url argument tells pip that an additional PyPI server should be inspected for the package. But what if numpy exists in both the internal repo and the external, public-facing PyPI repo? How does pip know which package to install? Well, it's simple: pip collects candidates for the package from all available repos, compares the version numbers, and installs the one with the highest version number. You should start to see the problem here.
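The selection behavior described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not pip's actual resolver: the index labels, the catalogue contents, and the version numbers are all hypothetical, and versions are assumed to be plain dotted integers.

```python
# Simplified model of multi-index resolution. This is NOT pip's real resolver.
# All index labels, catalogue contents, and versions below are hypothetical.

def parse_version(text):
    """Turn a plain dotted version such as '1.4.2' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def pick_candidate(package, indexes):
    """Gather every version of `package` from all indexes and pick the highest."""
    candidates = [
        (parse_version(version), index_name)
        for index_name, catalogue in indexes.items()
        for version in catalogue.get(package, [])
    ]
    if not candidates:
        raise LookupError(f"{package} not found on any index")
    version, index_name = max(candidates)
    return ".".join(str(number) for number in version), index_name


# The same name exists on both the internal and the public index.
indexes = {
    "internal": {"numpy": ["1.2.0", "1.3.1"]},
    "public": {"numpy": ["1.26.4"]},
}

print(pick_candidate("numpy", indexes))  # the public copy wins on version alone
```

Note that the index a candidate came from plays no part in the decision; only the version number does. That is the property a dependency confusion attack relies on.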
Staging a Dependency Confusion Attack
All an attacker really needs to stage a dependency confusion attack is the name of one of your internal dependencies. While this might seem like a challenge, such names leak more frequently than you would expect:
Developers often ask questions on public forums such as StackOverflow but do not obfuscate sensitive information such as the names of libraries being used, some of which could be internal dependencies.
Some applications, such as those built with Node.js, will often disclose internal package names in their package.json file, which is usually exposed in the application itself.
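To illustrate how little effort this takes, here is a minimal sketch of pulling dependency names out of a recovered package.json file. The manifest content, including every package name and version in it, is made up for the example. Deciding which of the listed names are actually internal would still require checking them against the public registry.

```python
import json

# Hypothetical package.json content; every name and version here is made up.
PACKAGE_JSON = """
{
  "name": "example-webapp",
  "dependencies": {
    "express": "^4.18.2",
    "@example-corp/billing-utils": "^1.0.3",
    "example-corp-auth-client": "^2.1.0"
  },
  "devDependencies": {
    "jest": "^29.7.0"
  }
}
"""


def list_dependency_names(package_json_text):
    """Return the sorted names of all dependencies declared in a manifest."""
    manifest = json.loads(package_json_text)
    names = set()
    for section in ("dependencies", "devDependencies"):
        # Each section maps a package name to a version range.
        names.update(manifest.get(section, {}))
    return sorted(names)


for name in list_dependency_names(PACKAGE_JSON):
    print(name)
```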
Once an attacker learns the name of an internal dependency, they can attempt to host a package with the same name on one of the external package repos but with a higher version number. Any system that builds the application and installs the dependency then has to choose between the internal and external package, and if the external one is chosen, the attacker's dependency confusion attack will succeed. The full attack is shown in the diagram below:
Considerations
There are a couple of things that should be kept in mind:
Since we only know the name of the internal package, not the actual source code of the package, if we perform a dependency confusion attack, the build process of the pipeline will most likely fail at a later step since the actual package was not installed.
Our external version number must be higher than the version number of the internal package for the confusion to work in our favor. This is easy to achieve: most packages have major version numbers lower than 10, so we can simply publish our package with version number 9000.
Dependency confusion can affect any type of package, such as Python pip packages, JavaScript npm packages, or RubyGems packages.
Defenses
Protecting internal dependencies is a massive security endeavor. Since we have to create, maintain, and host these dependencies ourselves, the security project is much larger than that of external dependencies. The following defense strategies should be considered for all internal dependencies:
Internal dependencies should be actively maintained. This will ensure that vulnerabilities in these dependencies do not affect multiple applications and services.
The hosting infrastructure of internal dependencies should be secured. A Microsoft whitepaper on this topic provides three key focus areas:
Reference one private feed, not multiple. This contributes to protecting against dependency confusion attacks. With our Python example, we would then use the --index-url argument instead of --extra-index-url to indicate that the package must be collected from the specified index.
Protect your packages using controlled scopes. Controlling the scopes of dependencies ensures that dependencies are locked to the applications that require them.
Utilize client-side verification features. Controls such as sub-resource integrity or version locking will ensure that applications and services will detect when malicious code is introduced into a dependency and refuse to execute it.
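As a rough illustration of client-side verification, the sketch below pins the SHA-256 digest of a known-good artifact and refuses anything that does not match. The byte strings stand in for real package archives and are purely hypothetical. pip provides a comparable built-in control through the hash-checking mode of its requirements files.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, pinned_sha256: str) -> bool:
    """Accept the artifact only if its digest matches the pinned value."""
    return sha256_hex(data) == pinned_sha256.lower()


# Hypothetical stand-ins for the trusted internal archive and an attacker's
# archive that was published under the same package name.
trusted_artifact = b"internal package contents"
malicious_artifact = b"attacker package contents"

# The digest is recorded once, when the trusted artifact is reviewed.
pinned = sha256_hex(trusted_artifact)

print(verify_artifact(trusted_artifact, pinned))
print(verify_artifact(malicious_artifact, pinned))
```

With a pinned digest in place, a higher version number is no longer enough for the attacker: the substituted package fails verification before it is ever used.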
As an additional defense measure against dependency confusion attacks, the names of internal dependencies can be registered on external package managers without the source code to claim the name. This will prevent an attacker from registering a package with the same name.
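Before claiming names, it helps to know which internal names are still unregistered on the public index. The sketch below queries PyPI's JSON endpoint for a project and treats a 404 response as "not registered". The helper names are our own, and the package name in the usage comment is hypothetical. The status lookup is passed in as a parameter so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request

PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def http_status(url):
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx responses; the code is still what we need.
        return error.code


def is_registered_on_pypi(name, get_status=http_status):
    """Return True if a project called `name` exists on the public index."""
    status = get_status(PYPI_JSON_URL.format(name=name))
    if status == 200:
        return True
    if status == 404:
        return False
    raise RuntimeError(f"unexpected status {status} for {name}")


# Example usage (performs a real network request; the name is hypothetical):
# print(is_registered_on_pypi("example-corp-auth-client"))
```

Any internal name that comes back as unregistered is a candidate for a placeholder registration.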