Every developer on the planet knows how modular Node.js and the JavaScript ecosystem have become. This is probably due to the great job that package managers and registries like bower (discontinued) and npm have done over the last few years. I personally believe that this is also a consequence of the “many small modules” philosophy that has been popularised within the JavaScript ecosystem.
This is great, but all that glitters is not gold… Look, for instance, at this picture for a second:
Yeah, you have probably seen this picture before and it’s probably not funny anymore… Anyway, this picture is a good summary of how this “many small modules” idea got a little bit out of hand within the JavaScript ecosystem.
Every time you run npm install you basically start to get so many files that you might feel like you are downloading the entire World Wide Web onto your hard drive! 😰
There are even tools that try to scout for node_modules folders in your system and get rid of them (e.g. wipe-modules). There are also some developers who showed how all the node_modules folders in their system are making their backups too slow (see tweet)!
Some like to make fun of this issue or they just complain about it. In this article, I don’t want to do any of those things. I’d rather be a little bit more constructive and share some simple techniques to keep your NPM modules as lean as possible, so that other developers will save bandwidth and time when pulling your modules from NPM!
Repository vs Registry
In some languages like Go or PHP, what you have in a module repository is exactly what you get through the package manager when trying to install the module. This is because the code you download through the package manager is actually coming straight from the repository (or from a proxy that keeps a copy of the repository). In these cases, the structure of your repository is fundamentally tied to the file structure of your module: what you get by installing a module is pretty much what you would get by cloning the repository.
NPM doesn’t work this way. In fact, NPM allows you to selectively push files into the registry, so you might end up with a very different file structure compared to what you have in your git repository.
While this interesting property of the system has caused some security issues in the past (see the event-stream module incident if you are curious), it also offers us an opportunity to be very selective with what we publish and keep the module lean.
This is especially important if you “build” your JavaScript code (e.g. using TypeScript, Babel or a module bundler), so that the “distribution” (dist) version of your module is the result of a compilation/transpilation/bundling process. In such cases, you don’t need to publish the entire codebase on NPM, as your users will be using only the dist version of your code. The same goes for tests, documentation, images and other files that won’t be used by the users of your module in their codebase: you should keep them only in your repository and avoid publishing them in the registry.
Conversely, you probably don’t want to keep dist code in your repository. This code can easily be regenerated by the build toolchain when necessary and there’s no point in tracking changes on the dist files when what you are really changing over time is the source code. In git you can use .gitignore to make sure dist files are kept out of the repository.
In short, registries are for production-ready code (dist) while repositories are for development code (src).
In the rest of this article we will see some ways to configure an NPM package so that all the unnecessary files will be excluded from the registry.
Publishing on NPM
With the NPM command line, npm publish is the de facto way of publishing new modules (or new versions of a module) into the NPM registry.
An NPM module is nothing more than a folder with a valid package.json file in it. It doesn’t have to be a git repository (in reality the definition of what an NPM module can be is a little bit more complicated; to get the full spiel, check out the official NPM documentation).
By default npm publish will publish all the files in the package directory (including subfolders recursively).
So the first thing to do is to be careful and make sure that you don’t have sensitive files containing passwords, tokens or other sensitive information in your project folder. It’s generally a good idea to keep those away from the module folder, just in case…
You should also try to avoid keeping unrelated files in the same folder. Yeah, I admit that many times I did some quick n’dirty wget to get something I needed while I was working on a module and ended up with a lot of unrelated stuff published in my module. Please be smarter than me, don’t do that! 😜
Default rules
Before diving deep into the different ways you can specify the files to be included/excluded when you publish your package, let’s first see what the default rules are.
No matter what you do, there are some files that are always excluded:
.*.swp
._*
.DS_Store
.git
.hg
.npmrc
.lock-wscript
.svn
.wafpickle-*
config.gypi
CVS
npm-debug.log
Similarly, there are files that are always included:
package.json
README.md (and its variants, like README.markdown or README.rst)
CHANGELOG.md (and its variants, like CHANGELOG.markdown or CHANGELOG.rst)
LICENSE and LICENCE
Note that package-lock.json is NOT automatically included.
.gitignore & .npmignore
The first interesting property of npm publish is that, if your folder is also a git repository and you are using a .gitignore file, all the patterns listed in it will be used to exclude files.
So, for instance, if you have a *.cache pattern in your .gitignore, all the files matching the pattern won’t be published in the registry.
We already discussed that you might want different rules for what you track in your repository and what you publish to the registry, so relying on one configuration to ignore files for both targets might not always be a good idea.
In those cases you can create a more specific file called .npmignore (which supports exactly the same syntax as .gitignore). If this file exists, npm publish will use that to exclude files, rather than using .gitignore.
This means that there’s no inheritance: the two files are totally independent. If you want a pattern to exclude files for both your repository and your registry, you will have to put the pattern in both configuration files.
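To make the “no inheritance” rule a bit more concrete, here is a minimal sketch of it in Node.js. The function name pickIgnoreFile is something I made up for illustration; it is not part of the npm CLI, and the real implementation also deals with nested ignore files and the default rules.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: returns the name of the ignore file that would drive
// the exclusions for `packageDir`. .npmignore wins outright when it exists;
// .gitignore is only a fallback and the two are never merged.
function pickIgnoreFile(packageDir) {
  for (const name of [".npmignore", ".gitignore"]) {
    if (fs.existsSync(path.join(packageDir, name))) {
      return name;
    }
  }
  return null; // no ignore file at all: only the default rules apply
}
```

Calling pickIgnoreFile(process.cwd()) in a package folder tells you which of the two files you should be editing when something unexpected shows up in (or disappears from) your package.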
One interesting lesser known (and rarely used) tip is that you can put .npmignore files also in subdirectories. The patterns specified in these files will apply only to the subtree of directories where the .npmignore is found.
The files field
If you don’t like the idea of blacklisting some files (in fairness, you might forget to exclude a file with some sensitive information in it…) you can also follow a whitelisting approach.
In fact, NPM allows you to use a field called files in your package.json to specify an array of file patterns to include in the package.
From the official documentation:
The optional files field is an array of file patterns that describes the entries to be included when your package is installed as a dependency. File patterns follow a similar syntax to .gitignore, but reversed: including a file, directory, or glob pattern (*, **/*, and such) will make it so that file is included in the tarball when it’s packed. Omitting the field will make it default to ["*"], which means it will include all files.
One important rule is that files included with the files field cannot be excluded through .npmignore. In other words, the files field has higher priority than .npmignore.
.npmignore vs the files field
As we said, .npmignore is effectively a blacklist of files, while the files field acts as a whitelist.
This means that, if the files field is populated, everything is excluded by default and only those files explicitly listed will be included in the packaged tarball.
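If you find it easier to reason about this in code, the whitelist behaviour can be sketched roughly like this. This is a toy model and not how NPM is implemented: I am assuming that every files entry is a plain file or directory path (real NPM also accepts glob patterns), the always-included list only covers the names mentioned in this article, and selectPublishedFiles and the sample paths are made up for the example.

```javascript
// Toy model of the whitelist. Assumption: entries in `files` are plain paths,
// not globs. The patterns below only cover the always-included names
// mentioned in this article.
const ALWAYS_INCLUDED = [
  /^package\.json$/,
  /^README(\.[^.]+)?$/i,
  /^CHANGELOG(\.[^.]+)?$/i,
  /^LICEN[CS]E(\.[^.]+)?$/i,
];

// Hypothetical helper: keeps only the paths that the whitelist lets through.
function selectPublishedFiles(allPaths, files) {
  return allPaths.filter((filePath) => {
    // Always-included files only count at the root of the package.
    const isTopLevel = !filePath.includes("/");
    if (isTopLevel && ALWAYS_INCLUDED.some((re) => re.test(filePath))) {
      return true;
    }
    // Everything else must sit under (or be) one of the whitelisted entries.
    return files.some((entry) => {
      const prefix = entry.replace(/\/+$/, "");
      return filePath === prefix || filePath.startsWith(prefix + "/");
    });
  });
}

// Made-up project layout, whitelisting only the lib folder.
const published = selectPublishedFiles(
  ["package.json", "README.md", "lib/index.js", "test/index.test.js", "notes.txt"],
  ["lib/"]
);
console.log(published); // package.json, README.md and lib/index.js
```

Notice how notes.txt and the test file are dropped without being mentioned anywhere: with a whitelist, forgetting about a file means it stays out of the package.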
You are probably wondering now, should I use the files field or the .npmignore file?
To be honest, I don’t think there’s a silver bullet here. Just pick the mental model (whitelist vs blacklist) that comes easier to you.
An example
I generally prefer to keep my folder structure simple and explicit by having folders for source (src) and distribution files (dist).
With this approach you can simply say that src is what you want to keep in your repo (excluding dist) and, vice versa, dist is what you want to publish on NPM (excluding src).
Just to make a very simple example, let’s say we are building a new library and our code base contains the following files:
src/index.js: source code for our module logic (using ES2019 syntax, because we like to be cool! 😎)
src/index.test.js: unit test file
dist/index.js: distributable version of our module (transpiled to ES5 with babel)
Now we want to keep src/index.js and src/index.test.js in our repository (but not in our final package) and dist/index.js in our package (but not in our repository).
One way we can achieve this result is by adding dist/ to our .gitignore. This will make sure we never commit files from the dist folder to the repository. Then we can either use the .npmignore file or the files field to specify what goes in our package.
I personally prefer to use the files field, which in this case will be super simple.
{
  "name": "some-test-package",
  "version": "1.0.0",
  "main": "dist/index.js",
  "files": [
    "dist/"
  ]
}
Notice that I am also pointing the entrypoint (main) to our index.js file in dist. This is what will be used when our module is imported.
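A mistake that is easy to make with this setup is pointing main to a file that the files whitelist leaves out. Here is a small sanity check you could adapt. The function name mainIsWhitelisted is my own invention, and I am again assuming that the files entries are plain paths rather than glob patterns.

```javascript
// Hypothetical sanity check: is the entrypoint covered by the whitelist?
// Assumption: `files` entries are plain paths (no globs).
function mainIsWhitelisted(pkg) {
  // Without a `files` array there is no whitelist, so there is nothing to check.
  if (!Array.isArray(pkg.files)) return true;

  // Node falls back to index.js when `main` is missing.
  const main = (pkg.main || "index.js").replace(/^\.\//, "");

  return pkg.files.some((entry) => {
    const prefix = entry.replace(/^\.\//, "").replace(/\/+$/, "");
    return main === prefix || main.startsWith(prefix + "/");
  });
}

// The same fields used in the package.json of this example.
const pkg = {
  name: "some-test-package",
  version: "1.0.0",
  main: "dist/index.js",
  files: ["dist/"],
};

console.log(mainIsWhitelisted(pkg)); // true
console.log(mainIsWhitelisted({ ...pkg, main: "lib/index.js" })); // false
```

In a real project you would load the object with require("./package.json") instead of writing it inline.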
With this approach I can add all sorts of other files to my repo (e.g. integration tests, functional tests, images, documentation, etc.) and I won’t have to worry about polluting my final package and making the end user download a lot of stuff that they won’t need!
Testing the package files
But how do we know if our setup is correct? We don’t want to publish the package just to find out.
Thankfully there are at least 2 ways to preview what’s gonna end up in the registry with npm publish without having to actually publish anything.
The first way is npm pack: this command will create a tarball that contains all the files that will be published in the registry.
The output is actually pretty nice and it will list all the included files.
If we run npm pack on the package folder from the example above we should see something like this:
npm notice
npm notice 📦 some-test-package@1.0.0
npm notice === Tarball Contents ===
npm notice 74B dist/index.js
npm notice 266B package.json
npm notice 13B LICENSE.md
npm notice 39B README.md
npm notice === Tarball Details ===
npm notice name: some-test-package
npm notice version: 1.0.0
npm notice filename: some-test-package-1.0.0.tgz
npm notice package size: 428 B
npm notice unpacked size: 392 B
npm notice shasum: 738776acad3cb41c549a884c6f9e946e7f367657
npm notice integrity: sha512-QQS68QqFtfTGE[...]XmPGJpSYqmpKw==
npm notice total files: 4
npm notice
some-test-package-1.0.0.tgz
Note that only 4 files have been included:
dist/index.js
package.json
LICENSE.md
README.md
An alternative approach is to run npm publish in dry run mode with the flag --dry-run. With this approach no tarball is created but you will see the output of all the files that would be published with a normal npm publish run.
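If you want to automate this check (for instance in a test or a CI step), one option is to ask npm pack for machine-readable output and inspect the file list from Node.js. The sketch below assumes an NPM version where npm pack accepts both --dry-run and --json and prints a JSON array whose first element has a files list; the helper names listPackedFiles and previewPackage are made up for this example.

```javascript
const { execFileSync } = require("child_process");

// Extracts the file paths from the JSON report printed by `npm pack --json`.
// Assumption: the report is an array and its first entry has a `files` array
// of objects with a `path` property.
function listPackedFiles(packJson) {
  const [report] = JSON.parse(packJson);
  return report.files.map((file) => file.path);
}

// Runs a dry-run pack in `packageDir` and returns the paths that would ship.
// Nothing is published and no tarball is written to disk.
// On Windows npm is a .cmd shim, so this call may need the `shell: true` option.
function previewPackage(packageDir) {
  const stdout = execFileSync("npm", ["pack", "--dry-run", "--json"], {
    cwd: packageDir,
    encoding: "utf8",
  });
  return listPackedFiles(stdout);
}

// Usage (not executed here):
//   console.log(previewPackage(process.cwd()));
```

From there it is easy to fail a build when, say, a file from src or a test file sneaks into the list.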
Conclusion
In summary, these are the main points I wanted to get across with this article:
- What you have in your repository can (and probably should) be different from what you publish in the NPM registry.
- You can exclude files by specifying patterns in .npmignore (similarly to .gitignore)
- Alternatively, you can whitelist files by specifying patterns of files to be included in the files field in your package.json
- There’s a list of files that are always included and, similarly, a list of files that are always excluded (see list above).
- Be smart and only publish the bare minimum needed for people to use your library: keep your NPM package lean!
With this advice we are probably not going to solve the node_modules drama, but at least we can do our part to make it a little bit more bearable.
Please let me know what you think about this advice here in the comments. Did you know about these configuration options? Do you use other strategies to keep your NPM packages lean?
I’ll see you in the next article. Until then, keep your NPM modules lean! 🤗📦
CIAO 👋