Download everything (async wget -i in NodeJS)

One of my most-used command-line tools is wget (and its companion, curl). Almost everything you can do on the web can be written as a wget command, which makes it repeatable and scriptable. You also get tons of neat helper features. One of them is wget's -i flag. It takes a file containing a list of URLs, one per line, then downloads each one and saves it to your disk.

As you can guess, this is helpful when you have a large number of files to download. wget does this in sequence, one after the other. I've found it can be sped up considerably by downloading the files in parallel. Here's what I came up with in a Node script.

const fs = require('fs');
const request = require('request');
const async = require('async');
const last = require('lodash/last');
const trim = require('lodash/trim');

const argv = require('minimist')(process.argv.slice(2));

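// Build a task for async.parallelLimit that downloads one URL into the current directory.
const saveUrl = url => callback => {
  // Name the file after the last path segment, e.g. http://host/a/b.zip -> b.zip
  const file = last(trim(trim(url), '/').split('/'));
  console.log(file);
  // Finish exactly once; log failures but continue so one bad URL doesn't stop the rest.
  let finished = false;
  const done = err => {
    if (finished) return;
    finished = true;
    if (err) console.error(`${file}: ${err.message}`);
    callback();
  };
  request({uri: url})
      .on('error', done)
      .pipe(fs.createWriteStream(file))
      .on('error', done)
      .on('close', () => done());
};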

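const NUM_PARALLEL = 10;
fs.readFile(argv.i, 'utf8', (err, data) => {
  if (err) throw err;
  // Skip blank lines (such as a trailing newline) so they aren't requested as URLs.
  const urls = data.split('\n').map(line => trim(line)).filter(line => line !== '');
  const saves = urls.map(url => saveUrl(url));
  async.parallelLimit(saves, NUM_PARALLEL, (saveErr) => {
    if (saveErr) throw saveErr;
    console.log('complete');
  });
});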
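The script names each download after the last segment of its URL. As a rough sketch of that step on its own, here is a standalone helper built on Node's built-in URL class instead of string replacement. The name fileNameFromUrl is hypothetical and not part of the script above, and unlike the script it ignores any query string.

```javascript
// Hypothetical helper (not part of the original script): derive a local
// file name from a URL using Node's built-in WHATWG URL parser.
function fileNameFromUrl(rawUrl) {
  const { hostname, pathname } = new URL(rawUrl.trim());
  // Drop empty segments so leading and trailing slashes are ignored.
  const segments = pathname.split('/').filter(Boolean);
  // Fall back to the host name when the URL has no path at all.
  return segments.length > 0 ? segments[segments.length - 1] : hostname;
}

console.log(fileNameFromUrl('http://example.com/a/b.zip'));
```

Because the helper only looks at the path, a file name that itself contains underscores or a URL with a trailing slash still maps to a sensible name. Two URLs that end in the same segment will still overwrite each other, so a real tool would want some way to disambiguate them.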

You can use the script just like the wget command, with the -i flag. I limited the number of files being downloaded in parallel to 10, but it can be much larger if your connection and OS can handle it. Keep in mind that you can only have a limited number of sockets and file handles open at once, so don't set it too high.
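To make the idea of a parallel limit concrete, here is a minimal sketch of bounded concurrency written with plain promises rather than the async library. The function runWithLimit and the simulated tasks are assumptions for illustration only. It starts a fixed number of workers, and each worker picks up the next unstarted task as soon as its current one finishes.

```javascript
// Hypothetical helper: run `tasks` (functions that return promises) with at
// most `limit` of them in flight at any time. Results keep the input order.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task that has not been started yet

  // Each worker keeps pulling the next unstarted task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Simulated downloads: each task just waits a few milliseconds.
const fakeDownloads = [1, 2, 3, 4, 5].map(n => () =>
  new Promise(resolve => setTimeout(() => resolve(`file-${n}`), 5)));

runWithLimit(fakeDownloads, 2).then(names => console.log(names));
```

With a limit of 2, no more than two of the simulated downloads are ever pending at once, which is the same role NUM_PARALLEL plays in the script. In this sketch a rejected task makes the returned promise reject while tasks that are already running are left to finish.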

Bret Lowrey
