npm is a dependency manager for the NodeJS ecosystem, composed of two main elements:
- a registry with all the packages that can be installed in a NodeJS project
- the npm CLI, which works out which dependencies are required for the project
npm is bundled with NodeJS, so if you have a NodeJS environment on your machine, npm is already installed too.
The packages in the registry are immutable for each version: it’s not possible to update a version already submitted to the npm registry, so any change requires publishing a new version.
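To make the immutability rule concrete, here is a minimal sketch of a toy in-memory registry. The `ToyRegistry` class and its `publish` and `fetch` methods are hypothetical names invented for illustration, and are not the real npm registry API.

```python
class ToyRegistry:
    """Hypothetical in-memory registry, only meant to illustrate immutability."""

    def __init__(self):
        self._published = {}  # (name, version) -> package content

    def publish(self, name, version, content):
        key = (name, version)
        if key in self._published:
            # A version that already exists can never be overwritten.
            raise ValueError(f"{name}@{version} is already published")
        self._published[key] = content

    def fetch(self, name, version):
        return self._published[(name, version)]


registry = ToyRegistry()
registry.publish("left-pad", "1.0.0", "original code")

try:
    # Trying to change an existing version is rejected...
    registry.publish("left-pad", "1.0.0", "patched code")
except ValueError as error:
    print(error)

# ...so the fix has to ship as a new version.
registry.publish("left-pad", "1.0.1", "patched code")
```

Anyone who installed version 1.0.0 keeps getting exactly the same content, which is the property the lock file relies on later.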
How does the npm CLI work?
It all starts with npm install.
The npm CLI is the tool that computes the dependencies required by the project, fetches them from the registry (if they have not been downloaded yet, e.g. when adding a new library to an existing project) and executes build scripts, putting all the results into the node_modules folder. The best feature of the npm CLI is that each project gets its own local dependency tree, so no dependency conflicts can arise when one server hosts multiple applications.
The first releases of the npm CLI used a pretty simple approach, recursively downloading the dependencies of each package: that led to deeply nested folders inside node_modules and pretty “fat” project folders, hence the well-known meme that depicts node_modules as the heaviest object in the universe.
The process was also slow as many packages had to be downloaded from the registry and a lot of disk I/O was required to complete the process.
Managing the dependency tree is a tricky task, as we’ll see later in the “lock file” section. Indeed, it requires not only a smarter strategy for downloads and disk I/O but also a foundation for the reproducibility of the installation process.
Since version 3, the npm CLI has taken a smarter approach: shared dependencies are worked out to build a flattened tree, which generally requires fewer downloads and less disk I/O. Newer versions of the npm CLI refined this approach to reduce the installation overhead as much as possible, spurred by a competitor that had emerged in the meantime: yarn.
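The difference between the nested and the flattened layout can be sketched with a toy model. This is only an illustration under simplifying assumptions: the package names and the `PACKAGES` table are made up, every dependency is pinned to an exact version, scoped names and cycles are ignored, and the hoisting rule is a simplification rather than npm’s actual algorithm.

```python
# Hypothetical packages: "name@version" -> list of exact dependencies.
PACKAGES = {
    "app@1.0.0": ["a@1.0.0", "b@1.0.0"],
    "a@1.0.0": ["util@1.0.0", "c@1.0.0"],
    "b@1.0.0": ["util@1.0.0", "c@1.0.0"],
    "c@1.0.0": ["util@2.0.0"],
    "util@1.0.0": [],
    "util@2.0.0": [],
}


def nested_install(package, prefix=""):
    """Old style: every package gets a private copy of each dependency."""
    paths = []
    for dep in PACKAGES[package]:
        name = dep.split("@")[0]
        path = f"{prefix}node_modules/{name}"
        paths.append((path, dep))
        paths.extend(nested_install(dep, path + "/"))
    return paths


def flat_install(root):
    """Flattened style: hoist a dependency to the top level unless a
    different version of the same package already sits there."""
    top_level = {}  # name -> "name@version" installed at node_modules/<name>
    paths = []

    def visit(package, own_path):
        for dep in PACKAGES[package]:
            name = dep.split("@")[0]
            if name not in top_level:
                top_level[name] = dep
                path = f"node_modules/{name}"
            elif top_level[name] != dep:
                # Version conflict: keep this copy nested under its dependent.
                path = f"{own_path}node_modules/{name}"
            else:
                continue  # same version already hoisted, nothing to install
            paths.append((path, dep))
            visit(dep, path + "/")

    visit(root, "")
    return paths


nested = nested_install("app@1.0.0")
flat = flat_install("app@1.0.0")
print(len(nested), "folders with nested installs")
print(len(flat), "folders with a flattened tree")
for path, package in flat:
    print(path, "->", package)
```

In this toy tree the shared packages are installed once at the top level, and only the conflicting version of util stays nested under the package that needs it.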
What is yarn?
yarn is another CLI application, an npm registry client, which aims to offer the same features as npm, but faster. The yarn project was created by several big companies, Google and Facebook among them, to provide a secure and faster alternative to npm. It periodically makes a copy of the npm registry and provides a lock file (yarn.lock) by default.
Once yarn introduced the lock file by default, npm also started providing one (package-lock.json); before that, locking was an opt-in feature available through the npm shrinkwrap command.
But what exactly is this lock file? Why did yarn decide it was so important to have it by default?
The lock file
The lock file aims to solve a typical problem of dependency management systems: the reproducibility of the installation process. With a single package.json file (as in the original npm approach), developers can choose which versions of the libraries they want to use, but they have no control over the rest of the dependency tree (which can be very deep). Since dependencies are usually declared as version ranges, the author of a dependency listed in package.json (or of a sub-dependency at any level of the tree) can publish a new version that still matches the range and that breaks the final project.
The lock file takes a snapshot of the whole tree once it is installed and uses this snapshot, when available, to replicate the same setup: a sort of “works on my machine” extended to dependency management.
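A minimal sketch of the problem and of the fix follows. It assumes a toy registry with a single hypothetical package called lib, a simplified caret range (same major version, greater than or equal to the minimum) and a flat lock; a real lock file records the whole tree and more metadata than this.

```python
def parse(version):
    """Turn "1.4.2" into (1, 4, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, range_spec):
    """Simplified caret range: "^1.2.0" accepts >=1.2.0 and <2.0.0."""
    minimum = parse(range_spec.lstrip("^"))
    candidate = parse(version)
    return candidate[0] == minimum[0] and candidate >= minimum


def install(manifest, available, lock=None):
    """Resolve each dependency, replaying the lock snapshot when there is one."""
    resolved = {}
    for name, range_spec in manifest.items():
        if lock and name in lock:
            resolved[name] = lock[name]  # reuse the recorded version
        else:
            matching = [v for v in available[name] if satisfies(v, range_spec)]
            resolved[name] = max(matching, key=parse)  # newest match wins
    return resolved


manifest = {"lib": "^1.2.0"}              # what package.json asks for
available = {"lib": ["1.2.0", "1.3.0"]}   # what the toy registry offers today

first = install(manifest, available)
lock = dict(first)                        # snapshot taken after the first install

available["lib"].append("1.4.0")          # the author publishes a new release

print("without lock:", install(manifest, available))
print("with lock:   ", install(manifest, available, lock))
```

Without the snapshot, the same package.json resolves to a different version as soon as a new release is published; with the snapshot, the second install reproduces the first one.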
The next generation
So far we’ve been talking about npm and its history, as well as about yarn. But have they reached the peak in terms of speed and user experience for dependency management? Well, not yet.
Both projects are still working on performance, aiming for an almost zero-time install experience. Interestingly, at JSConf EU 2019 the two projects showcased completely opposite approaches to this goal.
npm presented the tink project, which will eventually be merged into the next npm CLI. The idea is to let tink compute the dependencies just before runtime, fetching only the strictly required files from each dependency. This means no more packages in node_modules with test and documentation folders, source code or unused dependencies. For production environments, a special command will be provided to bundle this list of files and install them.
On the other hand, the yarn project is going in the completely opposite direction, keeping the node_modules folder and filling it with compressed package files. Not having to decompress each package saves a lot of CPU and disk I/O time during installation, and additional security checks can be applied to the archives. In the end, their idea is to commit the node_modules folder to the repository, bringing lock safety to the next level.
With both projects presenting their (opposite) ideas on how to improve the install experience, we can now try to figure out what the future will bring.
What’s next?
We have now seen the state of the art for both the npm CLI and yarn, but the registry side of npm has not evolved in any notable way for many years. This year, at the JSConf EU 2019 conference, a new project on this topic was presented by none other than the former npm CTO, C. J. Silverio.
We recommend watching the full talk, as it provides very interesting insights into the npm registry world that only a few people can offer.
For those who prefer to jump to the final point, here’s the conclusion. npm has always been a centralized registry for the NodeJS ecosystem, beating the competition because it got bundled with the NodeJS binary. Due to some recent rumors about economic difficulties at npm Inc., the NodeJS world started to look around for alternatives before doomsday arrives and half of the Internet collapses because the npm registry has had to close (remember those funny days of left-pad?).
C. J. Silverio proposes a federated package manager called entropic, which works like the npm CLI but looks for packages across multiple federated registries. This should bring more resilience to the NodeJS ecosystem, with a robust solution that does not depend on a single company’s infrastructure.
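The resilience argument can be illustrated with a small sketch. To be clear, this is not entropic’s actual protocol or API: the `Registry` class, the `federated_lookup` function and the host names are all hypothetical, and the only assumption is that a client can ask several independent registries for the same package.

```python
class Registry:
    """Hypothetical registry node; `online` simulates an outage."""

    def __init__(self, host, packages, online=True):
        self.host = host
        self.packages = packages  # name -> latest version
        self.online = online

    def lookup(self, name):
        if not self.online:
            raise ConnectionError(f"{self.host} is unreachable")
        return self.packages.get(name)


def federated_lookup(name, registries):
    """Ask each registry in turn and return the first one that has the package."""
    for registry in registries:
        try:
            version = registry.lookup(name)
        except ConnectionError:
            continue  # this registry is down, try the next one
        if version is not None:
            return registry.host, version
    return None  # no reachable registry knows the package


registries = [
    Registry("registry-a.example", {"left-pad": "1.3.0"}, online=False),
    Registry("registry-b.example", {"left-pad": "1.3.0"}),
]
print(federated_lookup("left-pad", registries))
```

With a single centralized registry, the outage of the first host would have blocked the install; here the second host can still answer.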