TechEarl

How to Archive Files Matching a find Pattern with tar

find locates the files, tar archives them. The safe pairing is find -print0 piped into tar reading a NUL-delimited list from stdin, which does not break on spaces or newlines. Covered below: the flag breakdown, the macOS BSD tar vs GNU tar difference, the -exec append alternative, archiving by modification time, and the compression choices.

Ishan Karunaratne · 13 min read

find is good at selecting files. tar is good at packing them into one archive. The job of gluing them together has one safe answer and several unsafe ones. The safe answer is to have find emit a NUL-delimited list of paths and have tar read that list from stdin:

bash
find . -type f -name '*.log' -print0 | tar --null -czf logs.tar.gz --files-from=-

That finds every .log file under the current directory and writes a gzip-compressed tar archive containing exactly those files. No temporary file list, no breakage when a filename has a space or a newline in it. This page is the reference I keep open whenever a deploy script or a log-rotation job needs to bundle a filtered set of files.

The safe one-liner

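bash
find :search_path -type f -name '*.log' -print0 | tar --null -czf :archive_name --files-from=-

:search_path and :archive_name are placeholders: substitute the directory to search and the archive file to write.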

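The Linux and macOS variants of this command differ in exactly one place: the Linux one uses the long option --files-from=-, the macOS one uses the short option -T -. These are two spellings of the same option. Both GNU tar and BSD tar (what macOS ships) read the file list from stdin when given -, and both accept --null to say that list is NUL-delimited. bsdtar's man page also lists --files-from as the long name of -T, so either spelling should work on both; -T - is simply the shorter one. Everything else is identical.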

Breaking down the flags

The pipeline has two halves. find produces the list, tar consumes it.

| Flag | Side | What it does |
| --- | --- | --- |
| -print0 | find | Emit each matched path followed by a NUL byte instead of a newline |
| --null | tar | Tell tar the incoming list is NUL-delimited, not newline-delimited |
| --files-from=- | tar | Read the list of files to archive from this file; - means stdin |
| -T - | tar | Short form of --files-from=-; accepted by both GNU tar and BSD tar |
| -c | tar | Create a new archive |
| -z | tar | Compress the archive with gzip |
| -f :archive_name | tar | Write to this file (the f flag always takes the next argument as the filename) |

The -czf cluster is just -c -z -f collapsed. The position of f inside the cluster matters: f takes whatever comes next as the archive filename, so it must be last in the group. Written as -cfz, tar would create an archive named z and treat the intended archive name as a file to add.

Why -print0 and --null go together

A Unix filename can contain any byte except / and NUL. That includes spaces, tabs, and newlines. The naive pipeline:

bash
find . -name '*.log' | tar -czf logs.tar.gz --files-from=-

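splits the list on newlines. A space in the middle of a name survives that, because tar reads one name per line, but a file named weird\nname.log with an embedded newline is cut into two paths that do not exist, and tar either archives the wrong path or fails outright. GNU tar by default also trims leading and trailing whitespace from each line and interprets backslash escapes when it reads a newline-delimited list, so names with trailing spaces or literal backslashes can be mangled too. -print0 separates entries with NUL, which is the one byte that cannot appear in a filename, so the split is always unambiguous. --null tells tar to expect that separator. This is the exact same principle as find -print0 | xargs -0: NUL in, NUL out, nothing in between can corrupt the list.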

If you only ever archive files you named yourself and you are certain none contain newlines, backslashes, or trailing whitespace, the newline version works. I still use -print0 everywhere because the cost is one flag and the failure mode is a silently wrong archive.
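To see the difference rather than take it on faith, here is a minimal sketch that builds three throwaway files, one with a newline in its name, and archives them both ways. The scratch directory, the file names, and the count_files helper are all made up for the demo.

```bash
#!/usr/bin/env bash
# Sketch: newline-delimited vs NUL-delimited file lists on awkward names.
# The scratch directory and file names are invented for this demo.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir src naive_out safe_out
touch src/plain.log "src/app server.log"
touch "src/weird"$'\n'"name.log"    # one file whose name contains a newline

# Naive list: tar reads one name per line, so the last file arrives as two
# non-existent paths ("src/weird" and "name.log") and is skipped with errors.
find src -type f -name '*.log' | tar -czf naive.tar.gz -T - 2>/dev/null || true

# Safe list: every path ends in NUL, so all three names arrive intact.
find src -type f -name '*.log' -print0 | tar --null -czf safe.tar.gz -T -

tar -xzf naive.tar.gz -C naive_out
tar -xzf safe.tar.gz -C safe_out

# Count NUL terminators instead of lines so the odd name is counted once.
count_files() { find "$1" -type f -print0 | tr -cd '\0' | wc -c | tr -d ' '; }
naive_count=$(count_files naive_out)
safe_count=$(count_files safe_out)
echo "naive archive: $naive_count files, safe archive: $safe_count files"
```

The naive run still produces a valid archive, which is the dangerous part: nothing looks broken until you notice a file is missing.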

The macOS BSD tar vs GNU tar difference

macOS ships bsdtar (from libarchive) as /usr/bin/tar. Linux distributions ship GNU tar. They agree on the common short flags (-c, -z, -f, -x, -t) and on -T; where they diverge is mostly in the less common long options.

| Behavior | GNU tar (Linux) | BSD tar (macOS) |
| --- | --- | --- |
| Read file list from a file | -T FILE or --files-from=FILE | -T FILE (long name: --files-from) |
| Read file list from stdin | -T - or --files-from=- | -T - |
| NUL-delimited input list | --null | --null |
| Append to existing archive | -r / --append | -r / --append |
| Create with gzip | -czf | -czf |
| Create with xz | -cJf | -cJf |
| Create with zstd | --zstd | --zstd (newer libarchive) |

The portable choice is -T - plus --null: GNU tar accepts -T as a synonym for --files-from, so a script using -T - runs unchanged on both platforms. That is why the macOS variant above uses -T - and you can safely use it on Linux too. If you want GNU tar's behavior on macOS, install it with brew install gnu-tar and call gtar.
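As a rough sketch of what a portable script could look like, the function below wraps the -T - pipeline and reports which tar it found. The name archive_logs and the demo files are hypothetical, and the flavor detection is informational only: the pipeline itself is the same on both platforms.

```bash
#!/usr/bin/env bash
# Sketch of a portable wrapper; archive_logs is a hypothetical helper name.
# Usage: archive_logs SEARCH_DIR ARCHIVE_PATH
archive_logs() {
  # --null is placed before -T so it is already in force when the list is read.
  find "$1" -type f -name '*.log' -print0 | tar --null -czf "$2" -T -
}

# Report which implementation is on PATH (informational only).
case "$(tar --version 2>/dev/null | head -n 1)" in
  *GNU*)    tar_flavor="GNU tar" ;;
  *bsdtar*) tar_flavor="bsdtar" ;;
  *)        tar_flavor="unknown" ;;
esac
echo "using: $tar_flavor"

# Demo on throwaway files.
demo=$(mktemp -d)
mkdir -p "$demo/logs"
echo one > "$demo/logs/a.log"
echo two > "$demo/logs/b.log"
(cd "$demo" && archive_logs logs logs.tar.gz)
entries=$(tar -tzf "$demo/logs.tar.gz" | wc -l | tr -d ' ')
echo "archived $entries files"
```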

The -exec alternative (append mode)

You can skip the pipe entirely and have find invoke tar directly with -exec:

bash
find . -type f -name '*.log' -exec tar -rvf logs.tar {} +

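-r is append mode: tar adds each batch of files to an existing (or new) archive. The {} + form batches many paths into one tar call, so this is not one fork per file. If the list is too long for a single command line, find runs tar more than once, which is why the mode has to be -r rather than -c: -c would overwrite the archive on every batch. It also means that rerunning the command appends the same files a second time, so delete a stale logs.tar first.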

The catch: you cannot append to a compressed archive. tar -r needs to seek to the end of the archive, and a gzip or xz stream is not seekable. So this is a two-step process:

bash
find . -type f -name '*.log' -exec tar -rvf logs.tar {} +
gzip logs.tar

First build the uncompressed logs.tar, then compress it to logs.tar.gz as a separate step. For most jobs the find -print0 | tar --null pipeline is simpler because it creates the compressed archive in one pass. Reach for -exec ... -r only when you genuinely need to append to an archive that already exists.
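A small sketch of the append workflow on throwaway files, including the duplicate-entry behavior when the same command runs twice. The directory and file names are assumptions for the demo.

```bash
#!/usr/bin/env bash
# Sketch: append mode with find -exec, then compress as a separate step.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir logs
echo a > logs/a.log
echo b > logs/b.log

rm -f logs.tar    # start clean: -r adds to whatever archive is already there
find logs -type f -name '*.log' -exec tar -rf logs.tar {} +
first=$(tar -tf logs.tar | wc -l | tr -d ' ')

# A second run does not replace anything; it appends the same members again.
find logs -type f -name '*.log' -exec tar -rf logs.tar {} +
second=$(tar -tf logs.tar | wc -l | tr -d ' ')
echo "entries after first run: $first, after second run: $second"

gzip logs.tar     # replaces logs.tar with logs.tar.gz
```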

Archive files modified today or in the last N days

Because the file selection is just a find expression, any find test composes in. Add -mtime to archive by modification time:

bash
find :search_path -type f -mtime -1 -print0 | tar --null -czf :archive_name -T -

-mtime -1 matches files modified less than 24 hours ago, so this archives everything changed in the past day (a rolling 24-hour window, not the calendar day). For the last 7 days use -mtime -7; for minute resolution use -mmin -60 (last hour). The sign convention and the off-by-one rounding rule are covered in find files modified in the last 7 days. You can stack tests freely: find . -type f -name '*.log' -mtime -7 -print0 archives only the log files touched this week.
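A quick way to convince yourself the time filter does what you expect is to backdate one file and check what lands in the archive. This is a sketch on made-up files; touch -t with a fixed timestamp is used to age the second file.

```bash
#!/usr/bin/env bash
# Sketch: archive only recently modified logs.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir logs
echo new > logs/fresh.log
echo old > logs/stale.log
touch -t 202001010000 logs/stale.log    # set mtime to 2020-01-01 00:00

# Only files modified within the last 7 days pass the -mtime test.
find logs -type f -name '*.log' -mtime -7 -print0 | tar --null -czf recent.tar.gz -T -

recent_members=$(tar -tzf recent.tar.gz)
echo "$recent_members"
```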

Directory structure: preserved vs flattened

tar stores whatever path string find hands it, verbatim. If find . emits ./var/log/app.log, the archive stores ./var/log/app.log, and extracting recreates var/log/app.log under your current directory. The structure is preserved because the paths are relative and include their directories.

Two things to know:

  • Run find from the directory you want as the archive root. find . -type f ... gives you relative paths; find /var/log -type f ... gives you absolute paths starting /var/log/..., and GNU tar strips the leading / with a warning. Use cd /var/log && find . -type f ... so the archive is rooted cleanly.
  • To flatten (strip directories), tar has no dedicated option when creating from a file list. If you genuinely need every file at the archive's top level, either copy the files into one directory first or rewrite the stored paths with GNU tar's --transform (bsdtar's counterpart is -s). Flattening risks name collisions, so I avoid it unless the job specifically calls for it.
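The rooting advice can be sketched with a subshell, so the cd does not leak into the rest of a script. The tree under the scratch directory stands in for /var/log, and the archive path is kept absolute so the cd does not change where it is written.

```bash
#!/usr/bin/env bash
# Sketch: root the archive at a chosen directory by running find from inside it.
demo=$(mktemp -d)
mkdir -p "$demo/var/log/app"
echo x > "$demo/var/log/app/app.log"
echo y > "$demo/var/log/sys.log"
archive="$demo/rooted.tar.gz"    # absolute path, unaffected by the cd below

(
  cd "$demo/var/log" &&
  find . -type f -name '*.log' -print0 | tar --null -czf "$archive" -T -
)

# Member names are relative to var/log, not to the scratch directory.
members=$(tar -tzf "$archive" | sort)
echo "$members"
```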

Extract and list contents

To see what is inside without unpacking:

bash
tar -tzf logs.tar.gz

-t lists, -z says it is gzip, -f names the file. To extract everything back:

bash
tar -xzf logs.tar.gz

-x extracts. Add -C /target/dir to extract somewhere other than the current directory: tar -xzf logs.tar.gz -C /tmp/restore. To pull out a single file, name it after the archive: tar -xzf logs.tar.gz ./var/log/app.log (the path must match exactly what tar -t shows).
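Put together, a round trip might look like the sketch below: create from a find list, list the members, restore everything into one directory, and pull a single member into another. All paths are invented for the demo.

```bash
#!/usr/bin/env bash
# Sketch: create, list, extract all, extract one member.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p src/var/log restore single
echo hello > src/var/log/app.log
echo other > src/var/log/db.log

(cd src && find . -type f -name '*.log' -print0 | tar --null -czf ../logs.tar.gz -T -)

tar -tzf logs.tar.gz                               # list without unpacking
tar -xzf logs.tar.gz -C restore                    # extract everything
tar -xzf logs.tar.gz -C single ./var/log/app.log   # extract a single member
```

The member name in the last command is copied from the listing, including the leading ./, because tar matches it literally.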

Compression choice: gzip, bzip2, xz, zstd

tar itself does not compress; it pipes the archive through a compressor selected by a flag. The four common choices:

| Compressor | tar flag | Speed | Ratio | Notes |
| --- | --- | --- | --- | --- |
| gzip | -z | Fast | Moderate | Universal, the safe default |
| bzip2 | -j | Slow | Better than gzip | Largely superseded by xz and zstd |
| xz | -J | Slowest | Best ratio | Great for archives you store and rarely touch |
| zstd | --zstd | Very fast | Near xz at high levels | Best speed-to-ratio balance; needs a recent tar |

For day-to-day log bundling I use -z (gzip): it is everywhere, decompresses fast, and the ratio is fine for text. For archives I am shipping over a slow link or storing long-term, --zstd is the modern pick. -J (xz) wins on pure ratio if archive size is the only thing that matters and you do not mind the CPU cost. -j (bzip2) has no real niche left.

The file extension is just convention: .tar.gz / .tgz for gzip, .tar.bz2 for bzip2, .tar.xz for xz, .tar.zst for zstd. tar does not enforce it, but matching the extension to the compressor keeps everyone sane.
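If you want numbers for your own data rather than a general table, a sketch like this archives the same file list with different flags and prints the sizes. The generated log lines are synthetic and highly repetitive, so the ratios here will look better than on real logs; the xz step is skipped when no xz program is on PATH, which is an assumption about how your tar invokes it.

```bash
#!/usr/bin/env bash
# Sketch: compare archive sizes for the same file list.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir logs
for i in 1 2 3; do
  # Synthetic, repetitive log lines (made up for the demo).
  yes "GET /health 200 12ms worker=$i" | head -n 2000 > "logs/app$i.log"
done

list_logs() { find logs -type f -name '*.log' -print0; }

list_logs | tar --null -cf  logs.tar    -T -    # uncompressed baseline
list_logs | tar --null -czf logs.tar.gz -T -    # gzip
if command -v xz >/dev/null 2>&1; then
  list_logs | tar --null -cJf logs.tar.xz -T -  # xz, only if available
fi

size_of() { wc -c < "$1" | tr -d ' '; }
plain_size=$(size_of logs.tar)
gzip_size=$(size_of logs.tar.gz)
for f in logs.tar logs.tar.gz logs.tar.xz; do
  if [ -f "$f" ]; then echo "$f: $(size_of "$f") bytes"; fi
done
```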

Common mistakes

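1. Newline-delimited list with awkward filenames. find . | tar -T - without -print0 and --null breaks on any filename containing a newline, and GNU tar can also mangle names with trailing whitespace or backslashes when it reads a newline-delimited list. Always pair -print0 with --null.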

2. Trying to append to a .tar.gz. tar -rf logs.tar.gz newfile fails: append mode needs a seekable archive and a gzip stream is not seekable. Build the .tar uncompressed, append to it, then compress as a final step.

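3. Forgetting --null after using -print0. If find emits NUL-delimited paths but tar still expects newlines, tar sees the whole stream as one giant filename. The two flags are a matched pair. On GNU tar, --null only affects the -T options that come after it on the command line, so keep it ahead of -T as in the commands above.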

4. Path vs basename confusion. The archive stores the exact path string find produced. find /var/log ... puts var/log/... in the archive (leading slash stripped); cd /var/log && find . ... puts ./.... Decide where the archive should be rooted and run find from there.

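5. Assuming the Linux and macOS commands must differ. --files-from and -T are the long and short names of the same option, and bsdtar's man page lists both. For portable scripts, use -T -, which both tars accept, and skip any per-OS branching.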

6. Archiving an absolute-path tree and being surprised on extract. GNU tar strips leading / on create and warns; on extract it lands relative to the current directory. If you expected files to restore to their original absolute locations, they will not (by design, for safety).
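Most of these mistakes show up as a mismatch between what find selected and what ended up in the archive, so a cheap sanity check is to compare the two counts. This is a sketch on made-up files; note that counting tar -t output by lines would overcount if a member name contained a newline.

```bash
#!/usr/bin/env bash
# Sketch: verify that the archive holds as many members as find selected.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p logs/sub
touch logs/a.log "logs/b c.log" logs/sub/d.log logs/skip.txt

find logs -type f -name '*.log' -print0 | tar --null -czf logs.tar.gz -T -

# Count NUL terminators on the find side so spaces in names do not matter.
found=$(find logs -type f -name '*.log' -print0 | tr -cd '\0' | wc -c | tr -d ' ')
# One line per member; assumes no member name contains a newline.
archived=$(tar -tzf logs.tar.gz | wc -l | tr -d ' ')

if [ "$found" -eq "$archived" ]; then
  echo "ok: $archived of $found files archived"
else
  echo "mismatch: find matched $found, archive holds $archived" >&2
fi
```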

When NOT to use find + tar

This pipeline is for "pack a filtered snapshot of files into one archive". It is the wrong tool when:

  • You need incremental sync. rsync copies only what changed and can mirror a directory efficiently. For keeping two locations in step, use find and rsync for selective transfers, not a fresh tar every time.
  • You need a cross-platform archive. Windows has no native tar in older versions. If a non-technical recipient or a Windows machine has to open the archive, zip is the safer interchange format. On Windows itself, PowerShell's Compress-Archive produces a .zip.
  • You need a real backup. tar is an archiver, not a backup system. It has no deduplication, no encryption, no retention policy, no integrity verification across snapshots. For actual backups use a tool built for it (restic, borg, or your platform's backup service). tar is fine as one building block inside a backup script, not as the whole thing.
  • The selection is trivial. If you just want to archive a whole directory, tar -czf out.tar.gz mydir/ needs no find at all. Bring in find only when the selection is a real filter.

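For Windows, Compress-Archive is the closest built-in equivalent. Note that piping individual files into it, as here, stores them at the top level of the zip rather than preserving their directories: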

powershell
Get-ChildItem -Path . -Recurse -File -Filter '*.log' | Compress-Archive -DestinationPath archive.zip
