find . -type f -name '*.log' -print0 | xargs -0 -P 4 -n 1 gzip compresses every matched .log file, four at a time, in parallel. The -P 4 is the part that does the work: it tells xargs to keep up to four gzip processes running at once instead of one after another. On a four-core box that can approach a 4× speedup for a CPU-bound job like compression, as long as the disk keeps up.
This is the focused parallelism deep-dive. If you want the broader "should I use -exec or xargs at all" decision, that's the find -exec vs xargs comparison. Here I assume you've already picked xargs and want to make it run jobs concurrently: the flags, how to size -P, which workloads actually speed up, and the mistakes that quietly kill the parallelism you think you turned on.
Set your values
The commands below use two placeholders: :search_path is the directory to search and :jobs is the number of parallel workers. Substitute your own values before running them.
The one-liner
find :search_path -type f -name '*.log' -print0 | xargs -0 -P :jobs -n 1 gzip
That runs up to :jobs gzip processes concurrently, each compressing one file. When a worker finishes, xargs immediately hands it the next file off the list. The pool stays full until the file list is exhausted.
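To try the one-liner without touching real logs, a scratch-directory run might look like the sketch below. The temporary directory, the six sample files, and -P 2 are illustrative choices, not values from this article.

```shell
# Scratch run of the one-liner; directory, file names, and -P 2 are illustrative.
tmp=$(mktemp -d)
for i in 1 2 3 4 5 6; do
  printf 'sample log line %s\n' "$i" > "$tmp/app $i.log"
done

# Up to two gzip processes at a time, one file per process.
find "$tmp" -type f -name '*.log' -print0 | xargs -0 -P 2 -n 1 gzip

# gzip replaces each input with <name>.gz, so no plain .log files should remain.
gz_count=$(find "$tmp" -type f -name '*.log.gz' | wc -l | tr -d ' ')
log_count=$(find "$tmp" -type f -name '*.log' | wc -l | tr -d ' ')
echo "compressed: $gz_count, uncompressed left: $log_count"

rm -rf "$tmp"
```

The file names deliberately contain a space, which is why the pipeline keeps -print0 and -0 even in a throwaway test.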
The three flags that matter
Three xargs flags turn a serial pipeline into a parallel one safely.
| Flag | Meaning | Why you need it |
|---|---|---|
| -P N | Run up to N command processes at once | This is the parallelism. -P 1 (default) is fully serial. |
| -n N | Pass N arguments per command invocation | With -P, set -n 1 so each worker gets exactly one job. |
| -0 | Read NUL-delimited input | Pairs with find's -print0. The only safe way to handle filenames with spaces and newlines. |
-P alone isn't enough. The next two sections explain why -n 1 and -0 are not optional.
Why -n 1 is required with -P
Without -n, xargs packs as many file paths as it can into a single command invocation, up to the kernel argument-list limit. That's the right default for a serial xargs gzip, because one invocation processing 5,000 files is faster than 5,000 invocations. But it destroys parallelism.
Here's the trap. find ... -print0 | xargs -0 -P 4 gzip looks parallel. It has -P 4. But xargs batches all matched files into (typically) one giant invocation, so there is only one gzip command to run. -P 4 has nothing to spread across four workers. You get one process, zero speedup, and a command that looks correct in code review.
-n 1 fixes it by forcing one file per invocation. Now there are as many gzip commands as there are files, and -P 4 can keep four of them running. The tradeoff is fork overhead (one process per file), but for any job where the per-file work takes more than a few milliseconds, that overhead is noise.
A middle ground exists: -n 10 -P 4 gives each worker a batch of 10 files. Useful when the per-file work is tiny and fork cost would dominate, but for most CPU-bound jobs -n 1 is the right call because it keeps the workers evenly loaded.
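One way to see the batching effect without compressing anything is to have each invocation print how many arguments it received. The sketch below swaps gzip for a tiny sh -c script; the scratch directory, the eight sample files, and the count_invocations helper are made up for the demonstration.

```shell
tmp=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8; do : > "$tmp/file $i.log"; done

# Each invocation prints one line (its argument count), so the number of
# lines equals the number of commands xargs actually started.
# Extra xargs flags (such as -n 1) are passed through via "$@".
count_invocations() {
  find "$tmp" -type f -name '*.log' -print0 |
    xargs -0 -P 4 "$@" sh -c 'echo "$#"' _ | wc -l | tr -d ' '
}

batched=$(count_invocations)         # no -n: all eight files fit in one command
per_file=$(count_invocations -n 1)   # one command per file
grouped=$(count_invocations -n 3)    # batches of 3, 3, and 2

echo "no -n: $batched, -n 1: $per_file, -n 3: $grouped"
rm -rf "$tmp"
```

With only one invocation in the first case, -P 4 has a single job to schedule; the -n 1 and -n 3 runs give it eight and three jobs respectively.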
Why -0 and -print0 are not optional
find ... -print0 separates output filenames with NUL bytes. xargs -0 reads NUL-separated input. NUL is the only byte that cannot appear in a Unix filename, so it's the only safe separator.
Plain find ... | xargs splits on whitespace and also treats quotes and backslashes specially. The moment a matched file has a space in its name, xargs passes the two halves as separate arguments and your command fails on "no such file", or worse, acts on a different file that happens to match one half. With -P in the mix this is harder to catch, because the failure is now interleaved into concurrent output. Always pair -print0 with -0. The find -exec vs xargs article covers the filename-safety rules in full.
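The difference is easy to observe by counting the arguments xargs hands to its command. This is a minimal sketch with two invented files in a scratch directory, one of which has a space in its name.

```shell
tmp=$(mktemp -d)
: > "$tmp/a b.log"   # name containing a space
: > "$tmp/c.log"

# Whitespace splitting: "./a b.log" arrives as two arguments, "./a" and "b.log".
split_count=$(cd "$tmp" && find . -type f -name '*.log' |
  xargs -n 1 printf '%s\n' | wc -l | tr -d ' ')

# NUL splitting: each filename arrives intact.
safe_count=$(cd "$tmp" && find . -type f -name '*.log' -print0 |
  xargs -0 -n 1 printf '%s\n' | wc -l | tr -d ' ')

echo "arguments seen without -0: $split_count, with -0: $safe_count"
rm -rf "$tmp"
```

Two files produce three arguments in the unsafe form, which is exactly the mismatch that turns into "no such file" errors in a real job.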
Sizing -P to the core count
For CPU-bound work, the sweet spot for -P is the number of logical CPU cores. More workers than cores just means the OS scheduler time-slices them, adding context-switch overhead without doing more work in parallel.
Don't hardcode 4. Query the core count at runtime:
find :search_path -type f -name '*.log' -print0 | xargs -0 -P "$(nproc)" -n 1 gzip
nproc reports the logical cores available on Linux. sysctl -n hw.ncpu is the macOS equivalent. On Windows, [Environment]::ProcessorCount does the same. One shorthand to avoid here: on GNU xargs, -P 0 means "run as many processes as possible at a time", which is not the same as one per core. Combined with -n 1 on a large file list, it can start a process for every file at once. Support for -P 0 also differs between BSD xargs versions, so for cross-platform scripts use the explicit nproc / sysctl form.
One refinement: if the job is partly IO-bound (some CPU, some disk wait), going slightly above the core count can help, because workers blocked on IO leave a core free for another worker. -P at 1.5× cores is a reasonable starting point for mixed workloads. Measure, don't guess.
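A script that has to run on both Linux and macOS could wrap the core-count lookup in a small function. This is a sketch, not a standard utility: detect_cores, cores, and mixed_jobs are hypothetical names, and the fallback to 1 is an assumption for platforms where neither command exists.

```shell
# Pick a worker count at runtime instead of hardcoding it.
detect_cores() {
  if command -v nproc >/dev/null 2>&1; then
    nproc                       # Linux (GNU coreutils)
  elif sysctl -n hw.ncpu >/dev/null 2>&1; then
    sysctl -n hw.ncpu           # macOS / BSD
  else
    echo 1                      # unknown platform: fall back to serial
  fi
}

cores=$(detect_cores)
mixed_jobs=$(( (cores * 3 + 1) / 2 ))   # about 1.5x cores, rounded up

echo "CPU-bound: -P $cores; mixed CPU/IO starting point: -P $mixed_jobs"
```

The result then slots into the pipeline as xargs -0 -P "$cores" -n 1 gzip, with "$mixed_jobs" as the first value to try for partly IO-bound work before measuring.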
CPU-bound vs IO-bound: when parallelism actually helps
This is the single most important thing to understand before reaching for -P. Parallelism speeds up CPU-bound work and often does nothing for IO-bound work.
CPU-bound tasks spend their time computing: compression (gzip, zstd), hashing (sha256sum), image processing (resize, convert), video transcoding, minification. Each file keeps a core busy. Running N of them on N cores gives close to N× throughput. This is the case -P was built for.
IO-bound tasks spend their time waiting on the disk or network: copying files across a single spinning disk, rsync over one network link, reading many small files off a slow mount. The bottleneck is the device, not the CPU. Running four parallel cp jobs against one disk doesn't move data four times faster; it makes the disk head seek between four locations, and on a spinning disk that thrashing can make it slower than serial. NVMe SSDs tolerate parallel IO far better, but the device bandwidth is still a hard ceiling.
The quick test: if running the job on one file pegs a CPU core, it's CPU-bound and -P will help. If running it on one file leaves the CPU idle while the disk light is solid, it's IO-bound and -P mostly won't.
Worked examples
Parallel gzip (the canonical case, CPU-bound):
find :search_path -type f -name '*.log' -print0 | xargs -0 -P :jobs -n 1 gzip
Parallel image resize with ImageMagick (CPU-bound, big speedup on a folder of photos):
find :search_path -type f -name '*.jpg' -print0 | xargs -0 -P :jobs -I {} mogrify -resize 1200x1200 {}
-I {} lets you place the filename mid-command instead of at the end. With -I, xargs already runs one invocation per input item, so no -n 1 is needed; GNU xargs treats -n and -I as mutually exclusive and prints a warning if you pass both. The -P still applies.
Parallel checksum with sha256sum (CPU-bound, useful for verifying a large file set):
find :search_path -type f -print0 | xargs -0 -P :jobs -n 1 sha256sum
macOS ships shasum rather than sha256sum; shasum -a 256 is the equivalent. Output ordering is not deterministic here: see the next section.
Parallel file conversion with ffmpeg (CPU-bound, transcoding is heavy):
find :search_path -type f -name '*.wav' -print0 | xargs -0 -P :jobs -I {} ffmpeg -i {} {}.mp3
Each output lands next to its input with .mp3 appended (song.wav becomes song.wav.mp3). There is no -n 1 because -I already runs one invocation per file. Caveat for ffmpeg specifically: many builds are already multi-threaded per file, so running four ffmpeg processes on a four-core box oversubscribes the cores. For internally-threaded tools, keep -P low (2 or 3) or cap each process's thread count.
The output-ordering caveat
Parallel workers write to the same terminal, and their output interleaves. With -P 4 -n 1 sha256sum, the four running processes finish in unpredictable order, and a process that emits multiple lines can have its lines split by another process's output. The result is a scrambled, sometimes mangled stream.
If order does not matter (compressing files in place, where the output is just status noise), ignore it. If order does matter, you have two options:
- Capture per-file output. Have each worker write to its own file: xargs -0 -P 4 -I {} sh -c 'sha256sum "$1" > "$1.sha256"' _ {}. No interleaving because each process owns its own output file. Passing the filename as "$1" instead of pasting {} into the script keeps names containing quotes or $ from being interpreted by the shell.
- Use GNU parallel with --keep-order. parallel buffers each job's output and replays it in input order once the job completes, so the combined stream reads as if the jobs ran serially even though they ran concurrently. xargs -P has no equivalent.
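The per-file capture option can be sketched end to end as follows. To keep the example portable it uses POSIX cksum in place of sha256sum (a swap made for this sketch only), and the scratch directory, sample files, and .sum suffix are invented for illustration.

```shell
tmp=$(mktemp -d)
printf 'alpha\n' > "$tmp/a file.txt"
printf 'beta\n'  > "$tmp/b.txt"
printf 'gamma\n' > "$tmp/c.txt"

# Each worker writes its result to a sidecar file next to its input.
# The filename reaches the script as "$1", so the shell never parses it.
find "$tmp" -type f -name '*.txt' -print0 |
  xargs -0 -P 4 -I {} sh -c 'cksum < "$1" > "$1.sum"' _ {}

sum_count=$(find "$tmp" -type f -name '*.sum' | wc -l | tr -d ' ')
got=$(cat "$tmp/a file.txt.sum")           # what the parallel worker wrote
expected=$(cksum < "$tmp/a file.txt")      # same checksum computed serially
echo "sidecar files written: $sum_count"

rm -rf "$tmp"
```

Because every worker owns its output file, the order in which the jobs finish no longer matters; the sidecar files can be read back in whatever order the caller wants.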
xargs -P vs GNU parallel
xargs -P and GNU parallel solve the same core problem. The differences come down to features versus ubiquity.
| | xargs -P | GNU parallel |
|---|---|---|
| Availability | Built in everywhere (GNU and BSD) | Separate install (apt install parallel, brew install parallel) |
| Parallel jobs | -P N | -j N |
| Progress bar | No | Yes (--bar) |
| Ordered output | No | Yes (--keep-order) |
| Per-job logs | No | Yes (--joblog) |
| Retry failed jobs | No | Yes (--retries) |
| Input grouping | -n, -L | -N, :::, richer |
The honest summary: xargs -P is good enough for the large majority of "run this on every matched file, N at a time" jobs, and because both GNU findutils and the BSD userland ship it, it's already on nearly every Linux and macOS machine. Reach for GNU parallel when you specifically need a progress bar on a long run, ordered output, a job log to see which files failed, or automatic retries. For a one-off gzip of a log directory, xargs -P is the right tool and the smaller dependency.
macOS BSD vs GNU xargs
The good news: BSD xargs (the macOS default) supports the parallel flags. You do not need to install GNU findutils just to get -P.
| Feature | GNU xargs | BSD xargs (macOS default) |
|---|---|---|
| -P N (parallel workers) | Supported | Supported |
| -P 0 (as many processes as possible, not a core count) | Supported | Not portable; use sysctl -n hw.ncpu |
| -0 (NUL delimiter) | Supported | Supported |
| -n N (args per invocation) | Supported | Supported |
| -I {} (replacement token) | Supported | Supported |
| -r / --no-run-if-empty | Supported | -r not needed; BSD skips empty input by default |
| -L N (lines per invocation) | Supported | Supported |
The two practical gaps: -P 0 is not something to rely on with BSD xargs, so pass -P "$(sysctl -n hw.ncpu)" explicitly; and BSD already skips the command on empty input, so the GNU -r flag is unnecessary there. Everything else in the parallel pipeline (-P N, -n, -0, -I) works the same on both.
Common mistakes
1. -P without -n 1, so batching kills the parallelism. xargs -0 -P 4 gzip packs the files into as few invocations as the argument-size limit allows, usually one, leaving -P 4 nothing to parallelize. You get one process and zero speedup. Always add -n 1 (or use -I {}, which already runs one invocation per input item) when you want per-file parallelism.
2. Forgetting -print0 / -0. Plain find ... | xargs -P 4 splits filenames on whitespace and breaks on any path with a space. Use find ... -print0 | xargs -0 -P 4.
3. Setting -P far above the core count. -P 64 on an 8-core box doesn't run 64× faster. It runs 8 jobs' worth of work with 64 processes fighting over 8 cores, paying context-switch and memory overhead for nothing. Cap -P at (or near) the core count for CPU-bound work.
4. Parallelizing IO-bound work. Running -P 8 cp against a single spinning disk makes the head thrash and can be slower than serial. Parallelism is for CPU-bound jobs. Check whether one file pegs a core before scaling out.
5. Expecting ordered output. Parallel workers interleave their stdout. If you redirect xargs -P 4 -n 1 sha256sum into a file expecting a clean checksum list, the lines will arrive in whatever order the workers finish. Capture per-file output or use parallel --keep-order.
6. Oversubscribing with internally-threaded tools. ffmpeg, zstd -T0, and many image tools already use multiple threads per file. Running -P 8 of them on 8 cores oversubscribes by 8×. For self-threading tools, keep -P at 2 or 3, or cap each process's thread count.
When NOT to use xargs -P
Parallelism is not free, and a few situations call for something else:
- IO-bound work on shared storage. Copying, moving, or rsync-ing across one disk or one network link. The device bandwidth is the ceiling; more workers just add seek thrashing. Run it serially.
- Tasks with shared-resource contention. If every worker writes to the same database, the same log file, or the same lock, they serialize on that resource anyway, and you've added process overhead for no gain. Parallelism only helps when the workers are genuinely independent.
- When you need progress, retries, or ordered output. xargs -P has none of these. For a long-running batch where you want a progress bar, a job log of failures, or automatic retries, GNU parallel is the better fit. Install it and use its --bar, --joblog, and --retries options.
- Complex per-file logic. If each file needs conditional branching ("transcode only if the output doesn't already exist and the input is newer"), wrap a shell loop instead: find ... -print0 | while IFS= read -r -d '' f; do ...; done. That's serial, but if you need the parallel version, GNU parallel can run a shell function cleanly where xargs would need an awkward sh -c wrapper.
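The serial loop for per-file logic could look like the sketch below. It needs bash, because read -d '' is not POSIX sh. The scratch files, the .gz naming, and the "skip if the output is already up to date" rule are illustrative assumptions, with gzip standing in for whatever per-file tool you actually run.

```shell
# Requires bash: read -d '' and <( ... ) are not POSIX sh.
tmp=$(mktemp -d)
printf 'one\n' > "$tmp/new.log"
printf 'two\n' > "$tmp/done.log"
gzip -c "$tmp/done.log" > "$tmp/done.log.gz"   # pretend this one was already processed

processed=0
while IFS= read -r -d '' f; do
  out="$f.gz"
  # Skip inputs whose output exists and is at least as new as the input.
  if [ -e "$out" ] && [ ! "$f" -nt "$out" ]; then
    continue
  fi
  gzip -c "$f" > "$out"
  processed=$((processed + 1))
done < <(find "$tmp" -type f -name '*.log' -print0)

made_new=no
[ -s "$tmp/new.log.gz" ] && made_new=yes
echo "processed this run: $processed"

rm -rf "$tmp"
```

Feeding the loop through process substitution rather than a pipe keeps it in the current shell, so the processed counter is still set after the loop ends.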
See also
- find Command Cheat Sheet: the full find reference covering name, type, size, time, and exec patterns
- find -exec vs xargs: when to use each, the decision matrix, and the safety rules for weird filenames
- Find files containing text (find + grep): the most common find-then-tool pipeline, which xargs -P can also parallelize
- Bash while loop: the while IFS= read -r -d '' f pattern for per-file logic that doesn't fit xargs
- External: GNU findutils manual, xargs(1) man page, GNU parallel.





