Quick parallelization with GNU parallel.

With GNU parallel we can quickly parallelize any shell command. The documentation is extensive, so the following are my getting-started notes.

In general, the syntax is:

parallel [parallel-options] [command] [command-arguments] [::: input]

The ::: separator marks the beginning of the inputs to the command. By default, the inputs are the space-separated words that follow it. Each element is passed to one run of the command, so the command runs once per input element. For example:

parallel --max-procs 2 echo ::: 1 2 3 4

runs echo four times, using two job slots, so at most two of the runs happen in parallel.

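Instead of writing out all the input after :::, we can generate a sequence with seq and pipe it to parallel, which then reads one input element per line from standard input. Since the command below does not reference its input explicitly, parallel appends each element as the last argument, so it prints "Running. 1", "Running. 2", and so on: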

seq 1 10 | parallel --max-procs 2 echo "Running."

Some more tidbits:

  • We can use {} to refer to the input:
parallel --max-procs 2 echo "Run {}" ::: 1 2 3 4
  • Use --verbose to see exactly which command line parallel runs. This matters because parallel joins the command words and hands them to a shell, so arguments containing spaces or shell special characters may be split or interpreted. In that case, use --quote.
  • To run the command once per input element without passing the element to it, use --max-args 0.
  • With --tag, parallel prefixes each output line with the input arguments of the job that produced it.

I used parallel to make the same curl request several times. The following curl parameters were useful:

  • --output /dev/null: discard the response body
  • --silent --show-error: hide the progress meter, but still report errors
  • --write-out "Total time: %{time_total}s\n": print information about the request after it completes. The curl man page lists all available variables.
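Before parallelizing, it can help to try these flags on a single request. In this sketch, a throwaway local file fetched via file:// stands in for the real URL (an assumption so the example runs offline); in practice you would put the actual URL there.

```shell
#!/usr/bin/env bash
# Assumption for this sketch: a temporary local file served via file://,
# so no network access is needed. Replace with the real URL in practice.
tmp="$(mktemp)"
echo "hello" > "$tmp"

# Same flags as in the list: discard the body, hide the progress meter
# but keep errors, and print the total time afterwards.
curl \
    --output /dev/null \
    --silent --show-error \
    --write-out "Total time: %{time_total}s\n" \
    "file://$tmp"

rm -f "$tmp"
```

The only output is a single line of the form "Total time: <seconds>s"; the exact number varies from run to run.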

So, in the end:

seq 1 4 | parallel \
    --max-args 0 \
    --max-procs 2 \
    --quote \
    curl \
        --output /dev/null \
        --silent --show-error \
        --write-out "Total time: %{time_total}s - response code: %{response_code}\n" \
        www.google.de