TechEarl
Topic · Linux

Freedom to innovate, power to perform

35 articlesWritten by Ishan Karunaratne
Why find scripts break between macOS and Linux: -printf and -regextype are GNU only, regex flavors and stat format strings differ. The portable find subset, the gotchas, and brew install findutils for gfind.
FeaturedLinux

BSD find vs GNU find: Every macOS vs Linux Difference That Matters

macOS ships BSD find; Linux ships GNU find. The two share a name and most of an interface, but -printf, -regextype, and the stat format strings diverge hard enough to break scripts shipped between platforms. The full divergence list, the portable subset that works on both, and how to get GNU find on a Mac.

Read post
More in Linux
The grep -o | sort | uniq -c | sort -rn pipeline counts unique matches and ranks them. Why sort comes before uniq, worked log-analysis examples, sort -u, uniq -d, and the awk one-pass alternative.

How to Count Unique Matches with grep, sort, and uniq

The grep -o 'pattern' file | sort | uniq -c | sort -rn pipeline is the classic log-analysis one-liner. Why sort must come before uniq, how each stage works, worked examples for top IPs and status codes, the awk one-pass alternative for huge files, and the BSD vs GNU notes.

find walks the live filesystem every time; locate and plocate query a prebuilt updatedb database. Compare freshness, speed, permission-awareness, and filtering, plus the mlocate vs plocate split and the macOS mdfind alternative.

find vs locate vs mlocate: Which File Search Tool to Use

find walks the live filesystem every time it runs: always current, sometimes slow. locate queries a prebuilt database: instant, but stale until the next updatedb. This breaks down the locate family (mlocate, plocate, slocate), the macOS situation, and exactly when to reach for each one.

Use grep 'pattern' file | awk '{print $2}' to filter lines and print a specific column. awk field basics, custom separators with -F, multi-column output, grep -o and cut alternatives, and when awk alone replaces the pipe.

How to grep and Print a Specific Column (grep + awk)

grep filters lines, awk extracts fields. The classic pipe is grep 'pattern' file | awk '{print $2}'. This covers awk field basics ($1, $NF), custom separators with -F, multi-column output, the cases grep -o and cut cover on their own, and the fact that awk's own pattern match makes the grep half optional.

rsync files modified in the last N days by piping find -print0 into rsync --files-from=- --from0. The relative-path gotcha, the dry run, BSD vs GNU notes, and when rsync filters replace find.

How to rsync Only the Files find Selected

rsync has no native time filter, so the standard trick is to let find pick the files and feed the list to rsync. The one-liner is find ... -print0 | rsync --files-from=- --from0, and the failure mode is always the same: the paths in the list have to be relative to the rsync source argument. The breakdown, the dry run habit, and when rsync's own filters make find unnecessary.

Archive every file matching a find pattern with tar. The safe find -print0 | tar --null --files-from=- one-liner, the macOS BSD tar -T difference, archiving by modification time, and gzip vs bzip2 vs xz vs zstd.

How to Archive Files Matching a find Pattern with tar

find locates the files, tar archives them. The safe pairing is find -print0 piped into tar reading a NUL-delimited list from stdin: no breakage on spaces or newlines. The flag breakdown, the macOS BSD tar vs GNU tar difference, the -exec append alternative, archiving by modification time, and the compression choices.

grep is universal and searches everything you point it at; ripgrep (rg) is the fast Rust default that skips .gitignore'd and binary files; ag is the older fast-grep now superseded. Compare speed, defaults, and regex engines.

grep vs ripgrep vs ag: Which Search Tool to Use

grep is on every system and searches exactly what you point it at. ripgrep (rg) is the fast Rust-based default for code search: it skips .gitignore'd, hidden, and binary files unless told otherwise. ag (the_silver_searcher) was the older fast-grep, now largely superseded by ripgrep. This breaks down speed, defaults, regex engines, and exactly when to reach for each one.

Use xargs -P to run find results in parallel: find ... -print0 | xargs -0 -P 4 -n 1 cmd. Set -P to the core count, why -n 1 matters, CPU-bound vs IO-bound work, and xargs -P vs GNU parallel.

How to Run find in Parallel with xargs -P

find . -type f -name '*.log' -print0 | xargs -0 -P 4 -n 1 gzip compresses every matched file four at a time. The flags that make it work: -P for parallel workers, -n 1 so each worker gets one job, -0 paired with find's -print0 for safety. When parallelism helps (CPU-bound work) and when it just thrashes the disk.

find -name uses shell globs on the basename; find -regex matches a full regular expression against the whole path. The -regextype flavors, the GNU emacs vs BSD basic default drift, and when each one is the right tool.

find -regex vs -name: When to Use Regex in find

find -name takes a shell glob and matches the basename; find -regex takes a full regular expression and matches the whole path. That whole-path detail is the number one surprise: -regex '.*\.txt' works but -regex '.txt' matches nothing. The flag reference, -regextype flavors, the GNU vs BSD default-flavor drift, and when -name is the better tool.

Skip node_modules, .git, and build output from grep with --exclude-dir, exclude filename globs with --exclude, and search only matching files with --include. The one-liner, brace expansion, --exclude-from, and the BSD grep fallback.

How to Exclude Files and Directories from grep

grep does not read .gitignore, so skipping node_modules, .git, and build output is on you. The flags that do it: --exclude for filename globs, --exclude-dir for whole directories, --include for the inverse, --exclude-from to read the list from a file, plus the find -prune fallback for older macOS grep.

Exclude a directory in find with -path './node_modules' -prune -o ... -print. Why the trailing -print is mandatory, the multi-directory form, the slower -not -path alternative, and BSD vs GNU notes.

How to Exclude a Directory in find (the -prune Pattern Explained)

find -path './node_modules' -prune -o -type f -print skips a directory subtree instead of walking into it. The pattern looks strange because -prune is an action, not a test, and the trailing -print is mandatory once you write an explicit action. The breakdown, the multi-directory form, the slower -not -path alternative, and when each one is the right call.

Use grep -l 'pattern' files to list only the filenames that contain a match. The inverted grep -L, the recursive grep -rl one-liner, the NUL-safe xargs pipeline for find-and-replace, and the macOS BSD vs GNU notes.

How to List Only Filenames with grep -l

grep -l prints the name of each file that contains a match and stops reading at the first hit, which makes it the fast answer to 'which files contain this string'. The lowercase -l, the inverted -L for files missing a pattern, the grep -rl one-liner, the NUL-safe xargs pipeline for find-and-replace, and the BSD vs GNU notes.

Rank the biggest files on a full disk with find -printf '%s %p' piped to sort -rn. The GNU one-liner, the BSD stat variant for macOS, why -xdev matters, human-readable sizes, and when du or ncdu beats find.

How to Find the Largest Files on Disk (find, sort, du)

find / -xdev -type f -printf '%s %p\n' | sort -rn | head -20 gives you a ranked list of the biggest files on a full disk. The GNU one-liner, the BSD/macOS stat variant, why -xdev matters, human-readable output with numfmt, when to switch to du or ncdu for per-directory totals, and the mistakes that send a scan into /proc.

grep -E vs grep -P explained: basic regex (BRE) treats + ? | ( ) { } as literal text, extended regex (ERE) makes them metacharacters, and PCRE adds lookaround and \d. Plus why macOS BSD grep has no -P.

grep Regex: BRE vs ERE vs PCRE Explained

grep has three regex engines and the default one surprises everyone: in basic regex (BRE) the characters + ? | ( ) { } are literal text until you backslash-escape them. -E switches to extended regex (ERE) where they work bare, and -P unlocks Perl-compatible regex with lookaround and \d. The full BRE vs ERE vs PCRE comparison, the same pattern in all three, and why -P does not exist on macOS.

Use find -user, -group, and -perm to locate files by ownership and mode. The -perm -mode vs -perm /mode distinction explained, world-writable and SUID/SGID audit recipes, orphaned-file checks, and the BSD vs GNU find differences on macOS.

How to Find Files by Owner, Group, or Permission with find

find -user www-data lists every file owned by a user; -group developers filters by group; -perm matches the mode bits. The subtle part is -perm -mode (all of these bits set) versus -perm /mode (any of these bits set). Plus the security-audit recipes for world-writable files and SUID/SGID binaries, the BSD vs GNU divergences, and the orphaned-file checks.

Use grep -w to match a whole word instead of a substring. What grep counts as a word boundary, the \b and \< \> regex equivalents, -x for whole-line match, and BSD vs GNU differences.

How to Match a Whole Word with grep -w

grep cat also matches category, concatenate, and scatter. grep -w cat matches only the standalone word. The whole-word flag, what grep counts as a word boundary, the regex equivalents with \b and \< \>, the stricter -x whole-line cousin, and the BSD vs GNU differences that bite on macOS.

Use find ... -print0 | xargs -0 grep -l 'PATTERN' to find every file containing a string. When grep -r is enough, when to add find as a pre-filter for performance, multi-pattern matching with -E, and the safe NUL-delimited pipeline.

How to Find Files Containing Specific Text (find + grep)

find ... -print0 | xargs -0 grep -l 'PATTERN' finds every file containing a piece of text. The combo handles weird filenames, scales to huge trees, and replaces three other common but broken pipelines. When to use grep -r alone, when to add find as a pre-filter, and the BSD vs GNU pitfalls.

Use find -type d -empty to list empty directories and find -type f -empty for empty files. The -depth trap for deleting nested empty trees, the hidden-file gotcha, the safe two-pass cleanup, and BSD vs GNU find notes.

How to Find (and Delete) Empty Directories and Files

find . -type d -empty lists every empty directory; find . -type f -empty lists every empty file. The catch is what 'empty' means (a hidden file makes a directory not empty) and the -depth trap that lets find -delete collapse whole nested empty trees in one pass. The flag reference, the safe two-pass cleanup, the BSD vs GNU notes, and the mistakes that bite.

Search multiple patterns with grep: grep -e 'A' -e 'B', grep -E 'A|B' alternation, and grep -f patterns.txt. Covers -F fixed strings, AND logic with chained greps and PCRE lookahead, and BSD vs GNU differences on macOS.

How to Search Multiple Patterns with grep

grep can OR several patterns three ways: -e per pattern, -E with alternation, or -f reading the list from a file. The one-liner is grep -E 'ERROR|WARN|FATAL' file. Here is when to pick each, how -F speeds up literal multi-pattern search, why grep has no single-pass AND, and the BSD vs GNU differences that bite on macOS.

Use grep -C 3 'pattern' file to print 3 lines before and after each match. The -A, -B, -C context flags, the -- group separator, asymmetric context, recursive search, and BSD vs GNU grep differences.

How to Show Lines Before and After a grep Match (Context)

grep -C 3 'pattern' file prints the matching line plus 3 lines on each side. The three context flags (-A after, -B before, -C both), how the -- group separator works between match blocks, asymmetric context, recursive context search, and the macOS BSD vs GNU differences that bite.

find -exec ... {} + batches arguments into one command (fast). find ... -exec ... {} \; forks per file (slow). xargs adds shell flexibility but needs -0 for safety. The decision matrix and performance comparison.

find -exec vs xargs: Which to Use (and the {} + Trick That Beats Both)

find -exec ... {} + and find -print0 | xargs -0 are roughly equivalent for batch operations on matched files. find -exec ... {} \; forks once per match and is much slower. The decision matrix: when -exec is enough, when xargs adds value, and the safety rules for filenames with spaces, newlines, and quotes.

grep -c counts matching lines, not occurrences. Use grep -o piped into wc -l for the true count, grep -rc for per-file counts, grep -vc to count non-matching lines, plus the macOS BSD vs GNU differences.

How to Count Matches with grep -c (and the Line-vs-Occurrence Trap)

grep -c counts matching LINES, not occurrences. A line with three hits still counts as 1. The fix is grep -o piped into wc -l, which puts every match on its own line first. Per-file counts, filtering out the :0 noise, counting non-matching lines, and the BSD vs GNU differences.

Use find -type f -name '*.txt' for one extension, or group -name tests in escaped parens joined by -o for many (.jpg, .png, .gif). Case-insensitive -iname, files with no extension, the -regex shortcut, and BSD vs GNU find differences.

How to Find Files by Extension (One or Many) with find

find . -type f -name '*.txt' lists every file with one extension. For many extensions you group -name tests with escaped parens and join them with -o. This covers the single one-liner, the multi-extension OR pattern, why the parens are mandatory, case-insensitive -iname, files with no extension at all, the -regex shortcut, and the BSD vs GNU divergence that bites on macOS.

find -delete removes every matched file with no confirmation. The safe -print-first dry-run pattern, depth-first directory deletion, when to use -exec rm vs xargs rm -f, and the BSD vs GNU differences.

How to Find and Delete Files Safely with find -delete

find -delete removes every matched file with no confirmation and no undo. The safe pattern is to write the command with -print first, eyeball the list, then swap -print for -delete. Plus the directory-depth-first trap, when to use -exec rm instead, and the find -delete vs xargs rm -f tradeoff.

How to grep case-insensitively with grep -i. Combine -i with -r, -w, -v, -c, the locale caveat for non-ASCII case folding, the PCRE (?i) inline flag, and BSD vs GNU grep differences.

How to grep Case-Insensitively (grep -i)

grep -i 'pattern' file matches regardless of case. The flag pairs with -r, -w, -v, and -c the way you would expect, but -i only folds ASCII case reliably. Non-ASCII case folding (accented characters, the Turkish dotted-i) depends on your locale. The combinations, the locale caveat, the PCRE per-pattern (?i) flag, and the BSD vs GNU differences.

Use find -size +100M to list files larger than 100 megabytes. Unit suffixes (c/k/M/G), +/- sign convention, combine with sort -rn to surface the biggest files on disk, and BSD vs GNU rendering differences.

How to Find Files Larger Than a Size with find -size

find . -size +100M lists every file larger than 100 megabytes. The unit suffixes (c, k, M, G), the +/- sign convention, how to combine with sort to find the biggest files on disk, the BSD vs GNU divergence for printing sizes, and the wc -c trick for byte-exact thresholds.

Use grep -v 'pattern' file to print every line that does not match. Exclude multiple patterns with -e or -vE, strip comments and blank lines, count with -vc, and avoid the OR-becomes-AND double-negative trap.

How to Exclude Matches with grep -v (Invert Match)

grep -v 'pattern' file prints every line that does NOT match. The flag reference, how to exclude multiple patterns, the strip-comments-and-blank-lines pipeline, the double-negative trap where -v of an OR becomes an AND of negations, and the macOS BSD vs GNU differences.

Use find -mtime -7 to list files modified in the last 7 days. The off-by-one (-7 means under 7 days, +7 means over 7 days), -mmin for minute resolution, -newer for exact timestamps, and the BSD rounding gotcha on macOS.

How to Find Files Modified in the Last 7 Days (find -mtime)

find -mtime -7 lists every file modified in the last 7 days. The catch is the off-by-one: -7 means less than 7 days ago, +7 means more than 7 days ago, and exact-7 almost never matches what people expect. The flag reference, worked variations for hours and minutes, the BSD vs GNU rounding difference, and the safe cleanup patterns.

Use grep -r 'pattern' . to search every file in a directory tree. The -r vs -R symlink difference, --include and --exclude-dir filters, -rl and -rn, and the macOS BSD vs GNU grep gaps.

How to grep Recursively Through a Directory

grep -r 'pattern' . searches every file under a directory tree. The catch is the path argument people forget, the -r vs -R symlink difference, and the unfiltered crawl into node_modules and .git. The flag reference, the --include and --exclude-dir filters, the macOS BSD vs GNU gaps, and when to reach for ripgrep or git grep instead.

jpegoptim CLI usage: lossless and lossy modes, --max quality, --strip-all metadata removal, bulk find + xargs -P parallel pipelines, comparisons with mozjpeg, squoosh-cli, and sharp-cli.

How to Optimize JPEG Images Using jpegoptim

Use jpegoptim to losslessly or lossy-compress JPEGs from the command line, in bulk, and inside CI pipelines. Includes the install path on macOS/Linux/Windows, mozjpeg / squoosh-cli / sharp comparisons, and the parallel xargs pattern for tens of thousands of images.