This is great, but all that glitters is not gold… Look, for instance, at this picture for a second:
Every time you run npm install you get so many files that you might feel like you are downloading the entire World Wide Web onto your hard drive! 😰
There are even tools that scout for node_modules folders in your system and get rid of them (e.g. wipe-modules). Some developers have also shown how all the node_modules folders in their system are making their backups too slow (see tweet)!
Some like to make fun of this issue, others just complain about it. In this article, I don’t want to do either. I’d rather be a bit more constructive and share some simple techniques to keep your NPM modules as lean as possible, so that other developers will save bandwidth and time when pulling your modules from NPM!
Repository vs Registry
In some languages like Go or PHP, what you have in a module repository is exactly what you get through the package manager when you install the module. This is because the code you download through the package manager comes straight from the repository (or from a proxy that keeps a copy of the repository). In these cases, the structure of your repository is fundamentally tied to the file structure of your module: what you get by installing a module is pretty much what you would get by cloning the repository.
NPM doesn’t work this way. In fact, NPM allows you to selectively push files into the registry, so you might end up with a very different file structure compared to what you have in your git repository.
While this interesting property of the system has caused some security issues in the past (see the event-stream module incident if you are curious), it also offers us an opportunity to be very selective with what we publish and keep the module lean.
Conversely, you probably don’t want to keep dist code in your repository. This code can easily be regenerated by the build toolchain when necessary and there’s no point in tracking changes on the dist files when what you are really changing over time is the source code. In git you can use .gitignore to make sure dist files are kept out of the repository.
In short, registries are for production-ready code (dist) while repositories are for development code (src).
In the rest of this article we will see some ways to configure an NPM package so that all the unnecessary files will be excluded from the registry.
Publishing on NPM
With the NPM command line, npm publish is the de facto way of publishing new modules (or new versions of a module) into the NPM registry.
An NPM module is nothing more than a folder with a valid package.json file in it. It doesn’t have to be a git repository. (In reality, the definition of what an NPM module can be is a little bit more complicated. To get the full spiel, check out the official NPM documentation.)
npm publish will publish all the files in the package directory (including subfolders recursively).
So the first thing to do is to make sure that you don’t have files containing passwords, tokens or other sensitive information in your project folder. It’s generally a good idea to keep those away from the module folder, just in case…
You should also avoid keeping unrelated files in the same folder. Yeah, I admit that many times I did a quick n’dirty wget to get something I needed while I was working on a module and ended up with a lot of unrelated stuff published in my module. Please be smarter than me, don’t do that! 😜
Before diving into the different ways you can specify the files to be included or excluded when you publish your package, let’s first see what the default rules are.
No matter what you do, there are some files that are always excluded:
Similarly, there are files that are always included:
README.md (and its variants)
CHANGELOG.md (and its variants)
package-lock.json is NOT automatically included.
The first interesting property of npm publish is that, if your folder is also a git repository and you are using a .gitignore file, all the patterns listed in it will be used to exclude files.
So, for instance, if you have a *.cache pattern in your .gitignore, none of the files matching the pattern will be published in the registry.
We already discussed that you might want different rules for what you track in your repository and what you publish to the registry, so relying on one configuration to ignore files for both targets might not always be a good idea.
In those cases you can create a more specific file called .npmignore (which supports exactly the same syntax as .gitignore). If this file exists, npm publish will use that to exclude files, rather than using .gitignore.
This means that there’s no inheritance: the two files are totally independent. If you want a pattern to exclude files for both your repository and your registry, you will have to put the pattern in both configuration files.
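As a toy model of that rule (not npm’s actual implementation), the choice between the two files can be sketched in a few lines of JavaScript. The function name pickIgnoreFile and its input, a plain list of file names found in the package folder, are made up for illustration.

```javascript
// Toy model of the rule described above, NOT npm's real code:
// if .npmignore exists it is used on its own, otherwise .gitignore is used.
// `pickIgnoreFile` is a hypothetical helper name.
function pickIgnoreFile(fileNames) {
  if (fileNames.includes('.npmignore')) return '.npmignore';
  if (fileNames.includes('.gitignore')) return '.gitignore';
  return null; // no ignore file: only the default rules apply
}

const both = pickIgnoreFile(['package.json', '.gitignore', '.npmignore']);
const gitOnly = pickIgnoreFile(['package.json', '.gitignore']);
console.log(both);    // .npmignore
console.log(gitOnly); // .gitignore
```

The point of the sketch is that the patterns are never merged: as soon as .npmignore is present, whatever is in .gitignore stops mattering for publishing.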
One interesting, lesser-known (and rarely used) tip is that you can also put .npmignore files in subdirectories. The patterns specified in these files will apply only to the subtree of directories where the .npmignore file is found.
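For instance, a root-level .npmignore for a small library might look something like the following. The folder and file names are assumptions for the sake of the example, not taken from a real project.

```
# .npmignore (hypothetical project layout)
# Source and docs stay in the repository only
src/
docs/
# Test files, wherever they live in the tree
*.test.js
# Tooling configuration that users of the package never need
.eslintrc
```

Since the syntax is the same as .gitignore, a pattern without a slash such as *.test.js matches at any depth, while src/ matches the whole directory.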
If you don’t like the idea of blacklisting some files (in fairness, you might forget to exclude a file with some sensitive information in it…) you can also follow a whitelisting approach.
In fact, NPM allows you to use a field called files in your package.json to specify an array of file patterns to include in the package.
From the official documentation:
The optional files field is an array of file patterns that describes the entries to be included when your package is installed as a dependency. File patterns follow a similar syntax to .gitignore, but reversed: including a file, directory, or glob pattern (*, **/*, and such) will make it so that file is included in the tarball when it’s packed. Omitting the field will make it default to ["*"], which means it will include all files.
One important rule is that files included with the files field cannot be excluded through .npmignore. In other words, the files field has higher priority than .npmignore.
.npmignore vs the files field
As we said, .npmignore is effectively a blacklist of files, while the files field acts as a whitelist.
This means that, if the files field is populated, everything is excluded by default and only those files explicitly listed will be included in the packaged tarball.
You are probably wondering now: should I use the files field or the .npmignore file?
To be honest, I don’t think there’s a silver bullet here. Just pick the mental model (whitelist vs blacklist) that comes more naturally to you.
I generally prefer to keep my folder structure simple and explicit by having folders for source (src) and distribution files (dist).
With this approach you can simply say that src is what you want to keep in your repo (excluding dist) and, vice versa, dist is what you want to publish on NPM (excluding src).
To give a very simple example, let’s say we are building a new library and our code base contains the following files:
src/index.js: source code for our module logic (using ES2019 syntax, because we like to be cool! 😎)
src/index.test.js: unit test file
dist/index.js: distributable version of our module (transpiled to ES5 with babel)
Now we want to keep src/index.test.js in our repository (but not in our final package) and dist/index.js in our package (but not in our repository).
One way we can achieve this result is by adding dist/ to our .gitignore. This will make sure we never commit files from the dist folder to the repository. Then we can use either the .npmignore file or the files field to specify what goes in our package.
I personally prefer to use the files field, which in this case will be super simple.
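A minimal package.json along these lines should do the job. Treat it as a sketch: the name and version are the ones that appear in the npm pack output later on, the rest is an assumption, and a real package would carry more fields (description, license, scripts and so on).

```json
{
  "name": "some-test-package",
  "version": "1.0.0",
  "main": "dist/index.js",
  "files": ["dist"]
}
```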
Notice that I am also pointing the entrypoint (main) to our index.js file in dist. This is what will be used when our module is imported.
With this approach I can add all sorts of other files to my repo (e.g. integration tests, functional tests, images, documentation, etc.) and I won’t have to worry about polluting my final package and making the end user download a lot of stuff that they won’t need!
Testing the package files
But how do we know if our setup is correct? We don’t want to publish the package just to see if our setup is correct.
Thankfully there are at least 2 ways to preview what’s gonna end up in the registry with npm publish without having to actually publish anything.
The first way is npm pack. This command creates a tarball that contains all the files that would be published in the registry.
The output is actually pretty nice and it will list all the included files.
If we run npm pack on the package folder from the example above we should see something like this:
npm notice 📦 some-test-package@1.0.0
npm notice === Tarball Contents ===
npm notice 74B dist/index.js
npm notice 266B package.json
npm notice 13B LICENSE.md
npm notice 39B README.md
npm notice === Tarball Details ===
npm notice name: some-test-package
npm notice version: 1.0.0
npm notice filename: some-test-package-1.0.0.tgz
npm notice package size: 428 B
npm notice unpacked size: 392 B
npm notice shasum: 738776acad3cb41c549a884c6f9e946e7f367657
npm notice integrity: sha512-QQS68QqFtfTGE[...]XmPGJpSYqmpKw==
npm notice total files: 4
Note that only 4 files have been included: dist/index.js, package.json, LICENSE.md and README.md.
An alternative approach is to run npm publish in dry run mode with the flag --dry-run. With this approach no tarball is created but you will see the output of all the files that would be published with a normal npm publish run.
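If you want to automate this check (for example in CI), one option is to compare the reported file list against an allow-list. The sketch below assumes your npm version supports npm pack --dry-run --json, which prints an array of reports, each with a files array of objects that have a path property. The helper name findUnexpectedFiles, the allow-list and the sample data are all made up for illustration.

```javascript
// Hypothetical helper, not part of npm: returns the packed paths that are
// not covered by the allow-list. Entries ending in "/" are treated as
// directory prefixes, all other entries must match the path exactly.
function findUnexpectedFiles(packReports, allowed) {
  const isAllowed = (filePath) =>
    allowed.some((entry) =>
      entry.endsWith('/') ? filePath.startsWith(entry) : filePath === entry
    );
  return packReports
    .flatMap((report) => report.files)
    .map((file) => file.path)
    .filter((filePath) => !isAllowed(filePath));
}

// Sample data shaped like the parsed output of `npm pack --dry-run --json`
// (assumed shape), with a stray test file added on purpose.
const sampleReports = [
  {
    name: 'some-test-package',
    version: '1.0.0',
    files: [
      { path: 'dist/index.js', size: 74 },
      { path: 'package.json', size: 266 },
      { path: 'LICENSE.md', size: 13 },
      { path: 'README.md', size: 39 },
      { path: 'src/index.test.js', size: 120 },
    ],
  },
];

const unexpected = findUnexpectedFiles(sampleReports, [
  'dist/',
  'package.json',
  'LICENSE.md',
  'README.md',
]);
console.log(unexpected); // [ 'src/index.test.js' ]
```

In a real project you would replace sampleReports with the parsed output of the command itself, for instance by passing the result of child_process.execSync('npm pack --dry-run --json') to JSON.parse, and fail the build when the returned array is not empty.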
In summary, these are the main points I wanted to get across with this article:
- What you have in your repository can (and probably should) be different from what you publish in the NPM registry.
- You can exclude files by specifying patterns in .npmignore (or .gitignore, if there is no .npmignore).
- Alternatively, you can whitelist files by specifying patterns of files to be included in the files field in your package.json.
- There’s a list of files that are always included and, similarly, a list of files that are always excluded (see list above).
- Be smart and only publish the bare minimum needed for people to use your library: keep your NPM package lean!
With this advice we are probably not going to solve the node_modules drama, but at least we can do our part to make it a little bit more bearable.
Please let me know what you think about this advice here in the comments. Did you know about these configuration options? Do you use other strategies to keep your NPM packages lean?
I’ll see you in the next article. Until then, keep your NPM modules lean! 🤗📦