Run find in Parallel with xargs -P: -P, -n 1, -print0 (2026)

find . -type f -name '*.log' -print0 | xargs -0 -P 4 -n 1 gzip compresses every matched .log file, four at a time, in parallel. The -P 4 is the part that does the work: it tells xargs to keep up to four gzip processes running at once instead of one after another. On a four-core box that can approach a 4× speedup for a CPU-bound job like compression, as long as the disk keeps up.

This is the focused parallelism deep-dive. If you want the broader "should I use -exec or xargs at all" decision, that's the find -exec vs xargs comparison. Here I assume you've already picked xargs and want to make it run jobs concurrently: the flags, how to size -P, which workloads actually speed up, and the mistakes that quietly kill the parallelism you think you turned on.

Set your values

The commands below use two placeholders: :search_path is the directory to search (for example, .), and :jobs is the number of parallel jobs passed to -P (for example, 4). Substitute your own values before running them.

The one-liner

bash· Linux (GNU)

find :search_path -type f -name '*.log' -print0 | xargs -0 -P :jobs -n 1 gzip

That runs up to :jobs gzip processes concurrently, each compressing one file. When a worker finishes, xargs immediately hands it the next file off the list. The pool stays full until the file list is exhausted.
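To watch the pool work without touching real logs, the sketch below builds a throwaway directory and compresses it four at a time. The scratch directory, the file names (with a space in them on purpose), and the count of eight files are all invented for the demonstration.

```shell
# Scratch directory with eight small, hypothetical log files.
dir=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8; do
  printf 'request %s served\n' "$i" > "$dir/app $i.log"   # space in the name on purpose
done

# Up to four gzip processes at a time, one file per process.
find "$dir" -type f -name '*.log' -print0 | xargs -0 -P 4 -n 1 gzip

# Every .log should now have been replaced by a .log.gz.
compressed=$(find "$dir" -type f -name '*.log.gz' | wc -l | tr -d ' ')
remaining=$(find "$dir" -type f -name '*.log' | wc -l | tr -d ' ')
echo "compressed: $compressed, left uncompressed: $remaining"

rm -rf "$dir"
```

Swap the scratch directory for your real search path once the counts look right.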

The three flags that matter

Three xargs flags turn a serial pipeline into a parallel one safely.

Flag	Meaning	Why you need it
`-P N`	Run up to N command processes at once	This is the parallelism. `-P 1` (default) is fully serial.
`-n N`	Pass N arguments per command invocation	With `-P`, set `-n 1` so each worker gets exactly one job.
`-0`	Read NUL-delimited input	Pairs with find's `-print0`. The only safe way to handle filenames with spaces and newlines.

-P alone isn't enough. The next two sections explain why -n 1 and -0 are not optional.

Why -n 1 is required with -P

Without -n, xargs packs as many file paths as it can into a single command invocation, up to its command-line length limit (which is itself capped by the kernel's argument-list limit). That's the right default for a serial xargs gzip, because one invocation processing 5,000 files is faster than 5,000 invocations. But it destroys parallelism.

Here's the trap. find ... -print0 | xargs -0 -P 4 gzip looks parallel. It has -P 4. But xargs batches all matched files into (typically) one giant invocation, so there is only one gzip command to run. -P 4 has nothing to spread across four workers. You get one process, zero speedup, and a command that looks correct in code review.

-n 1 fixes it by forcing one file per invocation. Now there are as many gzip commands as there are files, and -P 4 can keep four of them running. The tradeoff is fork overhead (one process per file), but for any job where the per-file work takes more than a few milliseconds, that overhead is noise.

A middle ground exists: -n 10 -P 4 gives each worker a batch of 10 files. Useful when the per-file work is tiny and fork cost would dominate, but for most CPU-bound jobs -n 1 is the right call because it keeps the workers evenly loaded.
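The batching effect is easy to observe directly. In this sketch, six dummy inputs (a to f) are fed to a stand-in command that prints how many arguments it received, so each output line corresponds to one invocation. The helper name count_runs is made up for the example.

```shell
# Count how many commands xargs launches for six inputs.
# Extra xargs flags (such as -n 1) are passed through as "$@".
count_runs() {
  printf '%s\0' a b c d e f |
    xargs -0 -P 4 "$@" sh -c 'echo "$#"' sh | wc -l | tr -d ' '
}

batched=$(count_runs)        # no -n: everything fits on one command line
single=$(count_runs -n 1)    # one argument per invocation
grouped=$(count_runs -n 4)   # batches of up to four arguments
echo "no -n: $batched   -n 1: $single   -n 4: $grouped"
```

With no -n there is a single invocation for -P 4 to schedule; -n 1 produces six, and -n 4 produces two (a batch of four and a batch of two).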

Why -0 and -print0 are not optional

find ... -print0 separates output filenames with NUL bytes. xargs -0 reads NUL-separated input. NUL is the only byte that cannot appear in a Unix path (a single filename also cannot contain /, but the paths find prints are full of them), so it's the only safe separator.

Plain find ... | xargs splits on whitespace. The moment a matched file has a space in its name, xargs passes the two halves as separate arguments and your command fails on "no such file". With -P in the mix this is worse, because the failure is now interleaved into concurrent output and harder to spot. Always pair -print0 with -0. The find -exec vs xargs article covers the filename-safety rules in full.
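A small experiment makes the splitting visible. The sketch below creates one hypothetical file with a space in its name and asks a stand-in command how many arguments xargs handed it. It assumes the temporary directory path itself contains no whitespace.

```shell
dir=$(mktemp -d)
touch "$dir/app server.log"   # hypothetical file with a space in its name

# The stand-in command prints the number of arguments it received.
plain=$(find "$dir" -type f -name '*.log' | xargs sh -c 'echo "$#"' sh)
safe=$(find "$dir" -type f -name '*.log' -print0 | xargs -0 sh -c 'echo "$#"' sh)
echo "whitespace splitting: $plain argument(s); NUL-delimited: $safe argument(s)"

rm -rf "$dir"
```

One file turns into two arguments in the plain pipeline and stays a single argument with -print0 and -0.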

Sizing -P to the core count

For CPU-bound work, the sweet spot for -P is the number of logical CPU cores. More workers than cores just means the OS scheduler time-slices them, adding context-switch overhead without doing more work in parallel.

Don't hardcode 4. Query the core count at runtime:

bash· Linux (GNU)

find :search_path -type f -name '*.log' -print0 | xargs -0 -P "$(nproc)" -n 1 gzip

nproc reports the logical cores available on Linux. sysctl -n hw.ncpu is the macOS equivalent. On Windows, [Environment]::ProcessorCount in PowerShell does the same. GNU xargs also accepts -P 0, but it does not mean "one per core": it means "run as many processes as possible at a time", so with -n 1 a large file list can start far more jobs than you have cores. -P 0 is also not portable to every BSD xargs, so for CPU-bound work and cross-platform scripts use the explicit nproc / sysctl form.

One refinement: if the job is partly IO-bound (some CPU, some disk wait), going slightly above the core count can help, because workers blocked on IO leave a core free for another worker. -P at 1.5× cores is a reasonable starting point for mixed workloads. Measure, don't guess.
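For a script that has to run on both Linux and macOS, the core-count lookup can be wrapped in a small fallback chain. This is a sketch under the assumption that either nproc or sysctl -n hw.ncpu is available; the variable names and the fallback to 1 are choices made for the example.

```shell
# Detect logical cores: nproc (Linux), then sysctl (macOS/BSD), else assume 1.
if command -v nproc >/dev/null 2>&1; then
  cores=$(nproc)
elif cores=$(sysctl -n hw.ncpu 2>/dev/null); then
  :   # sysctl succeeded; cores is already set
else
  cores=1
fi

cpu_jobs=$cores                          # CPU-bound: one worker per core
mixed_jobs=$(( (cores * 3 + 1) / 2 ))    # mixed CPU/IO: about 1.5x, rounded up
echo "CPU-bound: -P $cpu_jobs   mixed: -P $mixed_jobs"
```

The result then goes straight into the pipeline, for example xargs -0 -P "$cpu_jobs" -n 1 gzip.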

CPU-bound vs IO-bound: when parallelism actually helps

This is the single most important thing to understand before reaching for -P. Parallelism speeds up CPU-bound work and often does nothing for IO-bound work.

CPU-bound tasks spend their time computing: compression (gzip, zstd), hashing (sha256sum), image processing (resize, convert), video transcoding, minification. Each file keeps a core busy. Running N of them on N cores gives close to N× throughput. This is the case -P was built for.

IO-bound tasks spend their time waiting on the disk or network: copying files across a single spinning disk, rsync over one network link, reading many small files off a slow mount. The bottleneck is the device, not the CPU. Running four parallel cp jobs against one disk doesn't move data four times faster; it makes the disk head seek between four locations, and on a spinning disk that thrashing can make it slower than serial. NVMe SSDs tolerate parallel IO far better, but the device bandwidth is still a hard ceiling.

The quick test: if running the job on one file pegs a CPU core, it's CPU-bound and -P will help. If running it on one file leaves the CPU idle while the disk light is solid, it's IO-bound and -P mostly won't.
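One rough way to put numbers on the quick test is to time a single file and compare wall-clock time with CPU time. The sketch below assumes bash is installed (it relies on bash's time keyword and TIMEFORMAT) and uses 5 MB of random bytes as a stand-in for one real input file.

```shell
# Hypothetical sample: 5 MB of random bytes standing in for one real input.
sample=$(mktemp)
head -c 5000000 /dev/urandom > "$sample"

# bash's `time` keyword with this TIMEFORMAT prints: real user sys (seconds).
timing=$(bash -c 'TIMEFORMAT="%R %U %S"; { time gzip -c "$1" > /dev/null; } 2>&1' bash "$sample")
echo "real user sys: $timing"

rm -f "$sample"
```

If user plus sys is close to real, the job spent its time computing and -P should help. If real is much larger than user plus sys, the process was mostly waiting, and more workers are unlikely to change much.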

Worked examples

Parallel gzip (the canonical case, CPU-bound):

bash· Linux (GNU)

find :search_path -type f -name '*.log' -print0 | xargs -0 -P :jobs -n 1 gzip

Parallel image resize with ImageMagick (CPU-bound, big speedup on a folder of photos):

bash· Linux (GNU)

find :search_path -type f -name '*.jpg' -print0 | xargs -0 -P :jobs -I {} mogrify -resize 1200x1200 {}

-I {} lets you place the filename mid-command instead of at the end. -I already runs one invocation per input item, so an explicit -n 1 is redundant; recent GNU xargs versions treat -n and -I as mutually exclusive and print a warning when -n comes before -I, which is why the command above leaves -n 1 out. The -P still applies.
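Mid-command placement can be tried without ImageMagick installed. In this sketch cp stands in for mogrify purely to keep the example dependency-free (the point is where {} sits, not the copy itself), and the file names are invented.

```shell
src=$(mktemp -d)
dest=$(mktemp -d)
touch "$src/holiday 1.jpg" "$src/holiday 2.jpg" "$src/notes.txt"   # hypothetical files

# {} sits in the middle of the command; the destination argument follows it.
find "$src" -type f -name '*.jpg' -print0 | xargs -0 -P 4 -I {} cp {} "$dest/"

copied=$(find "$dest" -type f | wc -l | tr -d ' ')
echo "copied $copied file(s)"

rm -rf "$src" "$dest"
```

Only the two .jpg files reach the destination, each through its own cp invocation.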

Parallel checksum with sha256sum (CPU-bound, useful for verifying a large file set):

bash· Linux (GNU)

find :search_path -type f -print0 | xargs -0 -P :jobs -n 1 sha256sum

macOS ships shasum rather than sha256sum; shasum -a 256 is the equivalent. Output ordering is not deterministic here: see the next section.

Parallel file conversion with ffmpeg (CPU-bound, transcoding is heavy):

bash· Linux (GNU)

find :search_path -type f -name '*.wav' -print0 | xargs -0 -P :jobs -I {} ffmpeg -i {} {}.mp3

Two caveats for this ffmpeg example. First, the output name {}.mp3 appends rather than replaces the extension, so clip.wav becomes clip.wav.mp3. For a clean swap, wrap it in a shell that does the parameter expansion: -I {} sh -c 'ffmpeg -i "$0" "${0%.wav}.mp3"' {}. Second, many ffmpeg builds are already multi-threaded per file, so running four ffmpeg processes on a four-core box oversubscribes the cores. For internally-threaded tools, keep -P low (2 or 3) or cap each process's thread count.
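The extension swap from the first caveat can be checked without ffmpeg. In the sketch below, cp stands in for ffmpeg -i and the two .wav files are empty placeholders; only the output naming is being demonstrated.

```shell
dir=$(mktemp -d)
touch "$dir/clip one.wav" "$dir/clip two.wav"   # hypothetical inputs

# cp stands in for `ffmpeg -i "$0"`; the interesting part is the output name.
# ${0%.wav} strips the trailing .wav before .mp3 is appended.
find "$dir" -type f -name '*.wav' -print0 |
  xargs -0 -P 2 -I {} sh -c 'cp "$0" "${0%.wav}.mp3"' {}

swapped=$(find "$dir" -type f -name '*.mp3' ! -name '*.wav.mp3' | wc -l | tr -d ' ')
doubled=$(find "$dir" -type f -name '*.wav.mp3' | wc -l | tr -d ' ')
echo "clean .mp3 names: $swapped, .wav.mp3 names: $doubled"

rm -rf "$dir"
```

Because the filename arrives as $0 rather than being pasted into the script text, names with spaces survive intact.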

The output-ordering caveat

Parallel workers write to the same terminal, and their output interleaves. With -P 4 -n 1 sha256sum, the four running processes finish in unpredictable order, and a process that emits multiple lines can have its lines split by another process's output. The result is a scrambled, sometimes mangled stream.

If order does not matter (compressing files in place, where the output is just status noise), ignore it. If order does matter, you have two options:

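Capture per-file output. Have each worker write to its own file: xargs -0 -P 4 -I {} sh -c 'sha256sum "$1" > "$1.sha256"' sh {}. Passing the filename as a positional parameter, rather than splicing {} into the script text, keeps names containing quotes or $ from being interpreted by the shell. No interleaving because each process owns its own output file.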
Use GNU parallel with --keep-order. parallel buffers each job's output and replays it in input order once the job completes, so the combined stream reads as if the jobs ran serially even though they ran concurrently. xargs -P has no equivalent.
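The per-file capture option can be followed by a merge step that restores a deterministic order. This is a sketch assuming GNU coreutils (sha256sum and sort -z); the three input files are placeholders, and sorting by path is just one possible ordering.

```shell
dir=$(mktemp -d)
for name in a b c; do
  printf '%s\n' "$name" > "$dir/$name.txt"   # hypothetical inputs
done

# Each worker writes its checksum next to its input, so nothing interleaves.
find "$dir" -type f -name '*.txt' -print0 |
  xargs -0 -P 4 -I {} sh -c 'sha256sum "$1" > "$1.sha256"' sh {}

# Merge the per-file results in sorted path order.
merged=$(find "$dir" -type f -name '*.sha256' -print0 | sort -z | xargs -0 cat)
printf '%s\n' "$merged"
lines=$(printf '%s\n' "$merged" | wc -l | tr -d ' ')

rm -rf "$dir"
```

The merged list has one line per input and always starts with a.txt, regardless of which worker finished first.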

xargs -P vs GNU parallel

xargs -P and GNU parallel solve the same core problem. The differences come down to features versus ubiquity.

Aspect	`xargs -P`	GNU `parallel`
Availability	Built in everywhere (GNU and BSD)	Separate install (`apt install parallel`, `brew install parallel`)
Parallel jobs	`-P N`	`-j N`
Progress bar	No	Yes (`--bar`)
Ordered output	No	Yes (`--keep-order`)
Per-job logs	No	Yes (`--joblog`)
Retry failed jobs	No	Yes (`--retries`)
Input grouping	`-n`, `-L`	`-N`, `:::`, richer

The honest summary: xargs -P is good enough for the large majority of "run this on every matched file, N at a time" jobs, and it's already installed on every Unix box. Reach for GNU parallel when you specifically need a progress bar on a long run, ordered output, a job log to see which files failed, or automatic retries. For a one-off gzip of a log directory, xargs -P is the right tool and the smaller dependency.

macOS BSD vs GNU xargs

The good news: BSD xargs (the macOS default) supports the parallel flags. You do not need to install GNU findutils just to get -P.

Feature	GNU xargs	BSD xargs (macOS default)
`-P N` (parallel workers)	Supported	Supported
`-P 0` (as many processes as possible, not a core count)	Supported	Depends on the version; pass an explicit count (`sysctl -n hw.ncpu`)
`-0` (NUL delimiter)	Supported	Supported
`-n N` (args per invocation)	Supported	Supported
`-I {}` (replacement token)	Supported	Supported
`-r` / `--no-run-if-empty`	Supported	`-r` not needed; BSD skips empty input by default
`-L N` (lines per invocation)	Supported	Supported

The two practical gaps: -P 0 is not a core-count shorthand on either implementation (GNU runs as many processes as possible, and BSD support depends on the version), so pass -P "$(sysctl -n hw.ncpu)" explicitly on macOS; and BSD already skips the command on empty input, so the GNU -r flag is unnecessary there. Everything in the parallel pipeline (-P, -n, -0, -I) works the same on both.

Common mistakes

1. -P without -n 1, so batching kills the parallelism. xargs -0 -P 4 gzip packs as many files as fit into each invocation, often all of them into one, leaving -P 4 little or nothing to parallelize. You typically get one process and zero speedup. Always add -n 1 (or use -I {}, which already runs one invocation per input) when you want per-file parallelism.

2. Forgetting -print0 / -0. Plain find ... | xargs -P 4 splits filenames on whitespace and breaks on any path with a space. Use find ... -print0 | xargs -0 -P 4.

3. Setting -P far above the core count. -P 64 on an 8-core box doesn't run 64× faster. It runs 8 jobs' worth of work with 64 processes fighting over 8 cores, paying context-switch and memory overhead for nothing. Cap -P at (or near) the core count for CPU-bound work.

4. Parallelizing IO-bound work. Running -P 8 cp against a single spinning disk makes the head thrash and can be slower than serial. Parallelism is for CPU-bound jobs. Check whether one file pegs a core before scaling out.

5. Expecting ordered output. Parallel workers interleave their stdout. If you redirect xargs -0 -P 4 -n 1 sha256sum into a file expecting a checksum list in input order, you'll get the lines in completion order instead. Capture per-file output or use parallel --keep-order.

6. Oversubscribing with internally-threaded tools. ffmpeg, zstd -T0, and many image tools already use multiple threads per file. Running -P 8 of them on 8 cores oversubscribes by 8×. For self-threading tools, keep -P at 2 or 3, or cap each process's thread count.

When NOT to use xargs -P

Parallelism is not free, and a few situations call for something else:

IO-bound work on shared storage. Copying, moving, or rsync-ing across one disk or one network link. The device bandwidth is the ceiling; more workers just add seek thrashing. Run it serially.
Tasks with shared-resource contention. If every worker writes to the same database, the same log file, or the same lock, they serialize on that resource anyway, and you've added process overhead for no gain. Parallelism only helps when the workers are genuinely independent.
When you need progress, retries, or ordered output. xargs -P has none of these. For a long-running batch where you want a progress bar, a job log of failures, or automatic retries, GNU parallel is the better fit. Install it and use parallel --bar --joblog FILE --retries N (--joblog takes a log file path and --retries takes a count).
Complex per-file logic. If each file needs conditional branching ("transcode only if the output doesn't already exist and the input is newer"), use a shell loop instead: find ... -print0 | while IFS= read -r -d '' f; do ...; done (read -d '' is a bash feature, so run it under bash). That's serial, but if you need the parallel version, GNU parallel can run a shell function cleanly where xargs would need an awkward sh -c wrapper.
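For a single simple condition, the sh -c wrapper is still workable with xargs -P. The sketch below skips any input whose output already exists; cp stands in for the real transcode, and the file names and the pre-existing old.mp3 are invented for the example.

```shell
dir=$(mktemp -d)
touch "$dir/old.wav" "$dir/new.wav"
printf 'existing output\n' > "$dir/old.mp3"   # pretend old.wav was already converted

find "$dir" -type f -name '*.wav' -print0 |
  xargs -0 -P 2 -I {} sh -c '
    out="${1%.wav}.mp3"
    # Skip inputs whose output already exists; cp stands in for the real transcode.
    [ -e "$out" ] || cp "$1" "$out"
  ' sh {}

kept=$(cat "$dir/old.mp3")
created=$(find "$dir" -type f -name 'new.mp3' | wc -l | tr -d ' ')
echo "old.mp3 still says: $kept; new.mp3 created: $created"

rm -rf "$dir"
```

Once the script body grows past a couple of lines, the quoting gets hard to maintain, which is where a shell function run by GNU parallel becomes the cleaner choice.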

FAQ

What does xargs -P do?

-P N tells xargs to run up to N command processes in parallel instead of one at a time. The default is -P 1, which is fully serial. When a worker finishes, xargs immediately starts the next job, keeping the pool of N workers full until the input is exhausted.

It only helps if there are multiple command invocations to spread across the workers, which means you also need -n 1 (or -I {}) so each invocation handles one input.

Why is my xargs -P command not running in parallel?

Almost always because -n 1 is missing. Without it, xargs packs as many inputs as fit into one command invocation, often all of them, so there is only one process to run and -P has nothing to parallelize.

The fix: find ... -print0 | xargs -0 -P 4 -n 1 cmd. The -n 1 forces one input per invocation, giving -P 4 up to four jobs to run concurrently. Using -I {} instead of -n 1 also works because it runs one invocation per input.

What should I set xargs -P to?

For CPU-bound work like compression or hashing, set -P to the number of logical CPU cores. Query it at runtime instead of hardcoding: -P "$(nproc)" on Linux, -P "$(sysctl -n hw.ncpu)" on macOS. On GNU xargs, -P 0 does not mean "one job per core"; it means "run as many processes as possible at a time", which usually oversubscribes a CPU-bound job.

For mixed CPU-and-IO work, slightly above the core count (around 1.5×) can help, since workers waiting on IO free a core for another worker. For tools that are already multi-threaded per file, like ffmpeg, keep -P low to avoid oversubscribing.

Does xargs -P speed up file copying?

Usually not. Copying is IO-bound: the bottleneck is the disk or network, not the CPU. Running four parallel cp jobs against one spinning disk makes the drive head seek between four locations, which can be slower than copying serially. NVMe SSDs handle parallel IO better, but the device bandwidth is still a hard ceiling.

xargs -P shines on CPU-bound work: gzip, sha256sum, image resizing, video transcoding. The quick test is whether running the job on one file pegs a CPU core.

How is xargs -P different from GNU parallel?

They do the same core job: run a command on many inputs, N at a time. xargs -P is built into every Unix system, including macOS. GNU parallel is a separate install but adds a progress bar (--bar), ordered output (--keep-order), per-job logs (--joblog), and automatic retries (--retries).

Use xargs -P for most jobs: it is good enough and already installed. Reach for GNU parallel when you need progress visibility, ordered output, or retry handling on a long batch run.

Why is the output of xargs -P scrambled?

Parallel workers all write to the same terminal, and their output interleaves unpredictably. A process that prints multiple lines can have its lines split apart by another process's output. This is expected behavior, not a bug.

If order matters, capture each worker's output to its own file: xargs -0 -P 4 -I {} sh -c 'cmd "$1" > "$1.out"' sh {}. Or use GNU parallel with --keep-order, which buffers each job's output and replays it in input order.

Does macOS xargs support -P?

Yes. The BSD xargs that ships with macOS supports -P, -n, -0, and -I {}, so the full parallel pipeline works without installing GNU findutils.

The one gap is -P 0: GNU xargs treats it as "run as many processes as possible at a time" (not as a core count), and BSD xargs support for it depends on the version. On macOS use -P "$(sysctl -n hw.ncpu)" to get one job per core explicitly.


Ishan Karunaratne


