TechEarl

How to Run find in Parallel with xargs -P

find . -type f -name '*.log' -print0 | xargs -0 -P 4 -n 1 gzip compresses every matched file, four at a time. The flags that make it work: -P for parallel workers, -n 1 so each worker gets one job, and -0 paired with find's -print0 for safety. This guide also covers when parallelism helps (CPU-bound work) and when it just thrashes the disk.

Ishan Karunaratne · 15 min read

find . -type f -name '*.log' -print0 | xargs -0 -P 4 -n 1 gzip compresses every matched .log file, four at a time, in parallel. The -P 4 is the part that does the work: it tells xargs to keep up to four gzip processes running at once instead of one after another. On a four-core box that can approach a 4× speedup for a CPU-bound job like compression, provided the disk keeps up.

This is the focused parallelism deep-dive. If you want the broader "should I use -exec or xargs at all" decision, that's the find -exec vs xargs comparison. Here I assume you've already picked xargs and want to make it run jobs concurrently: the flags, how to size -P, which workloads actually speed up, and the mistakes that quietly kill the parallelism you think you turned on.

Set your values

The commands below use two placeholders: :search_path is the directory to search and :jobs is the number of parallel jobs. Substitute your own values before running them.

The one-liner

bash· Linux (GNU)
find :search_path -type f -name '*.log' -print0 | xargs -0 -P :jobs -n 1 gzip

That runs up to :jobs gzip processes concurrently, each compressing one file. When one process exits, xargs immediately starts another on the next file in the list. The pool stays full until the file list is exhausted.
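To see the pool behaviour without compressing anything, one option is to substitute sleep for the real command. This is a minimal sketch: the four one-second sleeps are stand-ins (they are not CPU-bound work), and the timings use whole seconds from date +%s, so treat the numbers as approximate.

```bash
#!/usr/bin/env bash
# Illustrative only: sleep stands in for a real per-file command.

# Serial: -P 1 runs the four one-second jobs back to back.
start=$(date +%s)
printf '%s\0' 1 1 1 1 | xargs -0 -P 1 -n 1 sleep
serial_secs=$(( $(date +%s) - start ))

# Parallel: -P 4 lets all four jobs run at the same time.
start=$(date +%s)
printf '%s\0' 1 1 1 1 | xargs -0 -P 4 -n 1 sleep
parallel_secs=$(( $(date +%s) - start ))

echo "serial:   ${serial_secs}s"
echo "parallel: ${parallel_secs}s"
```

The serial run should take about four seconds and the parallel run about one, which is the pool keeping four processes alive at once.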

The three flags that matter

Three xargs flags turn a serial pipeline into a parallel one safely.

| Flag | Meaning | Why you need it |
| --- | --- | --- |
| -P N | Run up to N command processes at once | This is the parallelism. -P 1 (default) is fully serial. |
| -n N | Pass N arguments per command invocation | With -P, set -n 1 so each worker gets exactly one job. |
| -0 | Read NUL-delimited input | Pairs with find's -print0. The only safe way to handle filenames with spaces and newlines. |

-P alone isn't enough. The next two sections explain why -n 1 and -0 are not optional.

Why -n 1 is required with -P

Without -n, xargs packs as many file paths as it can into each command invocation, up to its command-line size limit. That's the right default for a serial xargs gzip, because one invocation processing 5,000 files is faster than 5,000 invocations. But it destroys parallelism.

Here's the trap. find ... -print0 | xargs -0 -P 4 gzip looks parallel. It has -P 4. But xargs packs the matched files into as few invocations as it can, and for anything short of a very large file list that is one giant invocation, so there is only one gzip command to run. -P 4 has nothing to spread across four workers. You get one process, zero speedup, and a command that looks correct in code review.

-n 1 fixes it by forcing one file per invocation. Now there are as many gzip commands as there are files, and -P 4 can keep four of them running. The tradeoff is fork overhead (one process per file), but for any job where the per-file work takes more than a few milliseconds, that overhead is noise.

A middle ground exists: -n 10 -P 4 gives each worker a batch of 10 files. Useful when the per-file work is tiny and fork cost would dominate, but for most CPU-bound jobs -n 1 is the right call because it keeps the workers evenly loaded.
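The batching effect is easy to check on a small input. In this sketch each invocation prints how many arguments it received, so counting output lines counts invocations; the six single-letter items are placeholders for file names.

```bash
#!/usr/bin/env bash
# Each invocation prints its argument count ($#),
# so the number of output lines equals the number of invocations.

# Without -n: all six items are packed into one invocation.
batched=$(printf '%s\0' a b c d e f |
  xargs -0 -P 4 sh -c 'echo "$#"' _ | wc -l | tr -d ' ')

# With -n 1: one invocation per item, which -P 4 can spread across workers.
per_item=$(printf '%s\0' a b c d e f |
  xargs -0 -P 4 -n 1 sh -c 'echo "$#"' _ | wc -l | tr -d ' ')

echo "invocations without -n: $batched"
echo "invocations with -n 1:  $per_item"
```

With only one invocation in the first case, -P 4 has a single process to manage, which is the trap described above.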

Why -0 and -print0 are not optional

find ... -print0 separates output filenames with NUL bytes. xargs -0 reads NUL-separated input. NUL is the only byte that cannot appear in a Unix path, so it's the only safe separator.

Plain find ... | xargs splits on whitespace and also treats quotes and backslashes specially. The moment a matched file has a space in its name, xargs passes the two halves as separate arguments and your command fails on "no such file". With -P in the mix this is worse, because the failure is now interleaved into concurrent output and harder to spot. Always pair -print0 with -0. The find -exec vs xargs article covers the filename-safety rules in full.
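A quick way to watch the split happen is to count the arguments xargs produces for one file whose name contains a space. This sketch works in a throwaway directory from mktemp, and the file name is made up for the demonstration.

```bash
#!/usr/bin/env bash
# Demo in a throwaway directory; the file name is a made-up example.
tmp=$(mktemp -d)
touch "$tmp/my file.log"

# Newline-delimited output: xargs splits the single path at the space.
unsafe_args=$(find "$tmp" -type f |
  xargs -n 1 printf '%s\n' | wc -l | tr -d ' ')

# NUL-delimited output: the path arrives as one argument.
safe_args=$(find "$tmp" -type f -print0 |
  xargs -0 -n 1 printf '%s\n' | wc -l | tr -d ' ')

echo "arguments seen without -0: $unsafe_args"
echo "arguments seen with -0:    $safe_args"
rm -rf "$tmp"
```

One file becomes two arguments in the unsafe pipeline, and neither half names a file that exists.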

Sizing -P to the core count

For CPU-bound work, the sweet spot for -P is the number of logical CPU cores. More workers than cores just means the OS scheduler time-slices them, adding context-switch overhead without doing more work in parallel.

Don't hardcode 4. Query the core count at runtime:

bash· Linux (GNU)
find :search_path -type f -name '*.log' -print0 | xargs -0 -P "$(nproc)" -n 1 gzip

nproc reports the number of logical cores available on Linux. sysctl -n hw.ncpu is the macOS equivalent. On Windows, [Environment]::ProcessorCount does the same in PowerShell. GNU xargs also accepts -P 0, but that is not a core-count shorthand: it means "run as many processes as possible at a time", so with -n 1 and thousands of files it can launch thousands of processes at once. BSD xargs handling of -P 0 varies by version, so for CPU-bound work and cross-platform scripts use the explicit nproc / sysctl form.

One refinement: if the job is partly IO-bound (some CPU, some disk wait), going slightly above the core count can help, because workers blocked on IO leave a core free for another worker. -P at 1.5× cores is a reasonable starting point for mixed workloads. Measure, don't guess.
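For scripts that have to run on both Linux and macOS, the sizing rules above can be wrapped in a small helper. This is a sketch: core_count, cpu_jobs, and mixed_jobs are names invented here, and falling back to 1 when neither tool is available is an assumption, chosen because it is the safe serial default.

```bash
#!/usr/bin/env bash
# Print the logical core count using whichever tool this system provides.
core_count() {
  if command -v nproc >/dev/null 2>&1; then
    nproc                        # Linux (GNU coreutils)
  elif sysctl -n hw.ncpu >/dev/null 2>&1; then
    sysctl -n hw.ncpu            # macOS / BSD
  else
    echo 1                       # assumed fallback: stay serial if unknown
  fi
}

cores=$(core_count)
cpu_jobs=$cores                          # CPU-bound: one job per core
mixed_jobs=$(( (cores * 3 + 1) / 2 ))    # mixed CPU/IO: about 1.5x cores, rounded up

echo "cores=$cores cpu_jobs=$cpu_jobs mixed_jobs=$mixed_jobs"
```

The result plugs straight into the pipeline, for example xargs -0 -P "$cpu_jobs" -n 1 gzip. Treat mixed_jobs as a starting point to measure against, not a rule.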

CPU-bound vs IO-bound: when parallelism actually helps

This is the single most important thing to understand before reaching for -P. Parallelism speeds up CPU-bound work and often does nothing for IO-bound work.

CPU-bound tasks spend their time computing: compression (gzip, zstd), hashing (sha256sum), image processing (resize, convert), video transcoding, minification. Each file keeps a core busy. Running N of them on N cores gives close to N× throughput. This is the case -P was built for.

IO-bound tasks spend their time waiting on the disk or network: copying files across a single spinning disk, rsync over one network link, reading many small files off a slow mount. The bottleneck is the device, not the CPU. Running four parallel cp jobs against one disk doesn't move data four times faster; it makes the disk head seek between four locations, and on a spinning disk that thrashing can make it slower than serial. NVMe SSDs tolerate parallel IO far better, but the device bandwidth is still a hard ceiling.

The quick test: if running the job on one file pegs a CPU core, it's CPU-bound and -P will help. If running it on one file leaves the CPU idle while the disk light is solid, it's IO-bound and -P mostly won't.
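The quick test can be made a little less subjective by comparing CPU time with wall-clock time for one run. The sketch below assumes bash (it relies on the time keyword and TIMEFORMAT) and uses a generated 20 MB sample file as a stand-in for one of your real inputs. The 0-to-1 "CPU share" figure is a rough heuristic, not a standard metric.

```bash
#!/usr/bin/env bash
# Rough check: what share of wall-clock time did one job spend on the CPU?
export LC_ALL=C                        # keep "." as the decimal point for awk
sample=$(mktemp)                       # throwaway sample file (stand-in input)
head -c 20000000 /dev/urandom > "$sample"

TIMEFORMAT='%3R %3U %3S'               # bash time output: real user sys
timing=$( { time gzip -c "$sample" > /dev/null; } 2>&1 )

# (user + sys) / real: near 1.00 suggests CPU-bound, near 0 suggests waiting on IO.
cpu_share=$(echo "$timing" |
  awk '{ if ($1 > 0) printf "%.2f", ($2 + $3) / $1; else printf "0.00" }')

echo "real/user/sys: $timing"
echo "cpu share:     $cpu_share"
rm -f "$sample"
```

Swap gzip -c "$sample" for the command you plan to parallelize. A share close to 1 means the job kept a core busy and is a good candidate for -P.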

Worked examples

Parallel gzip (the canonical case, CPU-bound):

bash· Linux (GNU)
find :search_path -type f -name '*.log' -print0 | xargs -0 -P :jobs -n 1 gzip

Parallel image resize with ImageMagick (CPU-bound, big speedup on a folder of photos):

bash· Linux (GNU)
find :search_path -type f -name '*.jpg' -print0 | xargs -0 -P :jobs -I {} mogrify -resize 1200x1200 {}

-I {} lets you place the filename mid-command instead of at the end. Because -I already runs one invocation per input item, -n 1 is unnecessary alongside it; GNU xargs treats -n and -I as mutually exclusive and may print a warning when both are given. The -P still applies.

Parallel checksum with sha256sum (CPU-bound, useful for verifying a large file set):

bash· Linux (GNU)
find :search_path -type f -print0 | xargs -0 -P :jobs -n 1 sha256sum

macOS ships shasum rather than sha256sum; shasum -a 256 is the equivalent. Output ordering is not deterministic here: see the next section.

Parallel file conversion with ffmpeg (CPU-bound, transcoding is heavy):

bash· Linux (GNU)
find :search_path -type f -name '*.wav' -print0 | xargs -0 -P :jobs -I {} ffmpeg -i {} {}.mp3

Caveat for ffmpeg specifically: many builds are already multi-threaded per file, so running four ffmpeg processes on a four-core box oversubscribes the cores. For internally-threaded tools, keep -P low (2 or 3) or cap each process's thread count.

The output-ordering caveat

Parallel workers write to the same terminal, and their output interleaves. With -P 4 -n 1 sha256sum, the four running processes finish in unpredictable order, and a process that emits multiple lines can have its lines split by another process's output. The result is a scrambled, sometimes mangled stream.

If order does not matter (compressing files in place, where the output is just status noise), ignore it. If order does matter, you have two options:

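  1. Capture per-file output. Have each worker write to its own file: xargs -0 -P 4 -n 1 sh -c 'sha256sum "$1" > "$1.sha256"' _. Passing the filename as $1, rather than splicing {} into the script text, keeps names containing quotes or $ from being interpreted by the shell. No interleaving because each process owns its own output file.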
  2. Use GNU parallel with --keep-order. parallel buffers each job's output and replays it in input order once the job completes, so the combined stream reads as if the jobs ran serially even though they ran concurrently. xargs -P has no equivalent.
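The first option can be tried end to end in a scratch directory. In this sketch the three .txt files are made up, and the merge step (concatenating the per-file results in sorted name order) is one possible way to get a stable list; it is not something xargs does for you. On macOS, substitute shasum -a 256 for sha256sum.

```bash
#!/usr/bin/env bash
# Demo in a throwaway directory with three made-up input files.
tmp=$(mktemp -d)
for name in a b c; do
  echo "contents of $name" > "$tmp/$name.txt"
done

# Each worker writes its checksum next to its input, so nothing interleaves.
find "$tmp" -type f -name '*.txt' -print0 |
  xargs -0 -P 4 -n 1 sh -c 'sha256sum "$1" > "$1.sha256"' _

# Merge afterwards in a stable order: the glob expands sorted by name.
combined=$(cat "$tmp"/*.sha256)
echo "$combined"
rm -rf "$tmp"
```

The workers still finish in whatever order they like; the ordering comes from the merge step, which runs only after xargs has exited.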

xargs -P vs GNU parallel

xargs -P and GNU parallel solve the same core problem. The differences come down to features versus ubiquity.

| Feature | xargs -P | GNU parallel |
| --- | --- | --- |
| Availability | Built in everywhere (GNU and BSD) | Separate install (apt install parallel, brew install parallel) |
| Parallel jobs | -P N | -j N |
| Progress bar | No | Yes (--bar) |
| Ordered output | No | Yes (--keep-order) |
| Per-job logs | No | Yes (--joblog) |
| Retry failed jobs | No | Yes (--retries) |
| Input grouping | -n, -L | -N, :::, richer |

The honest summary: xargs -P is good enough for the large majority of "run this on every matched file, N at a time" jobs, and it's already installed on every Unix box. Reach for GNU parallel when you specifically need a progress bar on a long run, ordered output, a job log to see which files failed, or automatic retries. For a one-off gzip of a log directory, xargs -P is the right tool and the smaller dependency.

macOS BSD vs GNU xargs

The good news: BSD xargs (the macOS default) supports the parallel flags. You do not need to install GNU findutils just to get -P.

| Feature | GNU xargs | BSD xargs (macOS default) |
| --- | --- | --- |
| -P N (parallel workers) | Supported | Supported |
| -P 0 (as many processes as possible, not a core count) | Supported | Varies by version (use sysctl -n hw.ncpu) |
| -0 (NUL delimiter) | Supported | Supported |
| -n N (args per invocation) | Supported | Supported |
| -I {} (replacement token) | Supported | Supported |
| -r / --no-run-if-empty | Supported | -r not needed; BSD skips empty input by default |
| -L N (lines per invocation) | Supported | Supported |

The two practical differences: -P 0 is not a portable way to size the pool (on GNU it means "as many processes as possible", and BSD handling varies by version), so use -P "$(sysctl -n hw.ncpu)" explicitly; and BSD already skips the command on empty input, so the GNU -r flag is unnecessary there. Everything in the parallel pipeline (-P, -n, -0, -I) works the same on both.

Common mistakes

1. -P without -n 1, so batching kills the parallelism. xargs -0 -P 4 gzip packs the files into as few invocations as possible, usually one, leaving -P 4 nothing to parallelize. You get one process and zero speedup. Always add -n 1 (or use -I {}, which also runs one item per invocation) when you want per-file parallelism.

2. Forgetting -print0 / -0. Plain find ... | xargs -P 4 splits filenames on whitespace and breaks on any path with a space. Use find ... -print0 | xargs -0 -P 4.

3. Setting -P far above the core count. -P 64 on an 8-core box doesn't run 64× faster. It runs 8 jobs' worth of work with 64 processes fighting over 8 cores, paying context-switch and memory overhead for nothing. Cap -P at (or near) the core count for CPU-bound work.

4. Parallelizing IO-bound work. Running -P 8 cp against a single spinning disk makes the head thrash and can be slower than serial. Parallelism is for CPU-bound jobs. Check whether one file pegs a core before scaling out.

5. Expecting ordered output. Parallel workers interleave their stdout. If you pipe xargs -P 4 sha256sum into a file expecting a clean checksum list, you'll get a scrambled one. Capture per-file output or use parallel --keep-order.

6. Oversubscribing with internally-threaded tools. ffmpeg, zstd -T0, and many image tools already use multiple threads per file. Running -P 8 of them on 8 cores oversubscribes by 8×. For self-threading tools, keep -P at 2 or 3, or cap each process's thread count.

When NOT to use xargs -P

Parallelism is not free, and a few situations call for something else:

  • IO-bound work on shared storage. Copying, moving, or rsync-ing across one disk or one network link. The device bandwidth is the ceiling; more workers just add seek thrashing. Run it serially.
  • Tasks with shared-resource contention. If every worker writes to the same database, the same log file, or the same lock, they serialize on that resource anyway, and you've added process overhead for no gain. Parallelism only helps when the workers are genuinely independent.
  • When you need progress, retries, or ordered output. xargs -P has none of these. For a long-running batch where you want a progress bar, a job log of failures, or automatic retries, GNU parallel is the better fit. Install it and use, for example, parallel --bar --joblog jobs.log --retries 3 (--joblog takes a log file name and --retries takes a count).
  • Complex per-file logic. If each file needs conditional branching ("transcode only if the output doesn't already exist and the input is newer"), use a bash loop instead: find ... -print0 | while IFS= read -r -d '' f; do ...; done. That's serial, but if you need the parallel version, GNU parallel can run a shell function cleanly where xargs would need an awkward sh -c wrapper.
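When the condition is simple, the wrapper form mentioned in the last bullet can stay readable. The sketch below applies a "skip if the output is already up to date" rule to gzip rather than a transcoder, in a scratch directory with made-up files. It uses bash -c because the -nt file test is not guaranteed in every sh.

```bash
#!/usr/bin/env bash
# Demo in a throwaway directory; file names and contents are made up.
tmp=$(mktemp -d)
echo "first log"    > "$tmp/a.log"
echo "second log"   > "$tmp/b.log"
echo "already done" > "$tmp/a.log.gz"   # placeholder standing in for an earlier run

find "$tmp" -type f -name '*.log' -print0 |
  xargs -0 -P 4 -n 1 bash -c '
    src=$1
    out=$src.gz
    # Skip when the output exists and the input is not newer than it.
    if [ -e "$out" ] && ! [ "$src" -nt "$out" ]; then
      exit 0
    fi
    gzip -c "$src" > "$out"
  ' _

skipped=$(cat "$tmp/a.log.gz")          # untouched placeholder
created=$(gzip -dc "$tmp/b.log.gz")     # freshly compressed by a worker
echo "a.log.gz: $skipped"
echo "b.log.gz: $created"
rm -rf "$tmp"
```

Once the per-file script grows beyond a few lines, moving it into its own script file and calling that from xargs is usually easier to maintain than a long quoted string.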
