Find the Largest Files on Disk: find, sort, du (2026)

find / -xdev -type f -printf '%s %p\n' 2>/dev/null | sort -rn | head -20 gives you a ranked list of the 20 biggest files on the disk, largest first. This is the command I reach for the moment a server alerts on a full root filesystem and I need to know what is actually eating the space.

It is three tools doing one job. find walks the tree and prints each file's size and path. sort -rn ranks those lines by the leading number, descending. head -20 trims the output to something I can scan in one screen. The whole thing runs in seconds on a normal server and tells me, concretely, which files to investigate first.

This article is the focused "what is using my disk space" version. If you want the threshold-filter mechanics (the +100M sign convention, unit suffixes, size ranges), the find files larger than a size article covers that. Here the goal is different: I do not have a threshold in mind, I just want the disk's worst offenders ranked.

Set your values

The commands below are the Linux (GNU) variants. In each one, replace :search_path with the directory you want to scan, for example / or /var.

The one-liner

bash· Linux (GNU)

find :search_path -xdev -type f -printf '%s %p\n' 2>/dev/null | sort -rn | head -20

The output is <size_in_bytes> <path>, one file per line, biggest at the top. On GNU systems -printf '%s %p\n' prints the raw byte count and the path with almost no overhead because it never forks a subprocess. The BSD variant has to call stat once per file (more on that gap below). Bump head -20 to head -50 for a wider sweep.
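If you run this often, it can help to wrap the pipeline in a small function. The following is a minimal sketch, assuming GNU find; the function name largest_files and its optional count argument are illustrative and not part of any standard tool.

```shell
# largest_files DIR [N]: print the N largest regular files under DIR
# (default 20) as "<bytes> <path>", biggest first. Assumes GNU find (-printf).
largest_files() {
    find "$1" -xdev -type f -printf '%s %p\n' 2>/dev/null |
        sort -rn |
        head -n "${2:-20}"
}

# Example: the five largest files under /var/log
# largest_files /var/log 5
```

Quoting "$1" keeps paths with spaces intact, and the default of 20 matches the one-liner above.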

A few of those flags carry real weight, so it is worth being precise about what each one buys.

Why -xdev matters

-xdev tells find to stay on a single filesystem and never cross a mount point. When you anchor a scan at /, that one flag is the difference between a clean result and a scan that runs for an hour and returns garbage.

Without -xdev, find / descends into every mounted filesystem under the root tree:

/proc and /sys are virtual filesystems. The "files" in them are kernel interfaces, not disk data. find will report nonsense sizes (/proc/kcore reports the size of the kernel's virtual address space, which can be around 128 TiB on x86-64, while occupying no disk at all) and waste time enumerating thousands of process entries.
Network mounts (NFS, SMB, sshfs) get walked over the wire. A scan that should take five seconds locally now takes minutes and hammers a remote server.
/dev is a device filesystem. Its device nodes are not regular files, so -type f already skips them, but tmpfs mounts beneath it such as /dev/shm hold RAM-backed files that take no space on the disk you are diagnosing.

-xdev skips all of it. The scan walks only the filesystem that / itself lives on. If you have separate partitions for /var or /home, that means you run the command once per partition (find /var -xdev ..., find /home -xdev ...), which is exactly what you want when you are diagnosing which partition is full.
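To see in advance which directories a find / -xdev scan will cover, you can compare device numbers, which is how one filesystem is told apart from another. This is a rough sketch, assuming GNU stat (-c '%d'); the helper name same_filesystem and the list of directories are illustrative choices.

```shell
# same_filesystem A B: succeed if A and B report the same device number.
# Assumes GNU stat; BSD stat would need -f '%d' instead of -c '%d'.
same_filesystem() {
    [ "$(stat -c '%d' "$1")" = "$(stat -c '%d' "$2")" ]
}

# Report whether some common directories are separate mounts from /.
for dir in /var /home /tmp /proc; do
    [ -d "$dir" ] || continue
    if same_filesystem / "$dir"; then
        echo "$dir: same filesystem as /, covered by find / -xdev"
    else
        echo "$dir: separate filesystem, needs its own scan"
    fi
done
```

Any directory reported as separate is one that needs its own find DIR -xdev run.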

The 2>/dev/null redirection is the companion in spirit: it discards the permission-denied errors find prints when it hits a directory you cannot read. Without it, those errors interleave with your results and clutter the output.

Human-readable sizes

Raw byte counts are precise but hard to eyeball. 4823949312 does not register as "4.5 GB" at a glance. On GNU systems, numfmt fixes that:

bash· Linux (GNU)

find :search_path -xdev -type f -printf '%s %p\n' 2>/dev/null | sort -rn | head -20 | numfmt --to=iec --field=1 --padding=8

The sort happens on raw bytes first, then numfmt --to=iec converts only the displayed output to 4.5G, 812M, and so on. Doing it in that order matters: if you converted to human-readable strings before sorting, sort -rn would rank 9K above 8G because 9 is numerically larger than 8.
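The ordering problem is easy to reproduce with two made-up lines. This small demonstration uses invented file names and assumes a sort that supports -h (GNU coreutils does).

```shell
# Two fake entries: 9K is tiny, 8G is huge.
sample='9K small.log
8G huge.img'

# Plain numeric sort compares 9 and 8 and ignores the suffix.
wrong=$(printf '%s\n' "$sample" | sort -rn)

# Human-numeric sort understands K, M, G.
right=$(printf '%s\n' "$sample" | sort -rh)

printf 'sort -rn puts first: %s\n' "$(printf '%s\n' "$wrong" | head -n 1)"
printf 'sort -rh puts first: %s\n' "$(printf '%s\n' "$right" | head -n 1)"
```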

numfmt is part of GNU coreutils and is not installed on macOS by default. The BSD variant of this command uses an inline awk block to do the same conversion, which works on any system with awk (so, everything).
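A portable awk conversion might look like the following. It is a sketch, not the exact block from the BSD variant: the function name human_sizes and the one-decimal output format are assumptions, and it expects "<bytes> <path>" lines on standard input.

```shell
# human_sizes: read "<bytes> <path>" lines and print the size in binary
# units (B, K, M, G, T, P), leaving the path untouched.
human_sizes() {
    awk '{
        size = $1
        split("B K M G T P", unit, " ")
        i = 1
        while (size >= 1024 && i < 6) { size /= 1024; i++ }
        path = $0
        sub(/^[0-9]+ /, "", path)      # drop the leading byte count only
        printf "%7.1f%s %s\n", size, unit[i], path
    }'
}

# Example: rank first on raw bytes, then format for display
# find /var -xdev -type f -exec stat -f '%z %N' {} + | sort -rn | head -20 | human_sizes
```

Taking the path from $0 rather than $2 preserves file names that contain spaces.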

The du companion: per-directory totals

find answers "which individual file is biggest". That is the wrong question about as often as it is the right one. Frequently the disk is full not because of one huge file but because of ten thousand medium ones in a directory you forgot about: a runaway log directory, a Docker image cache, months of database backups. find would spread those across the bottom of the ranking and you would never see the pattern.

That is du's job. du sums sizes per directory subtree:

bash· Linux (GNU)

du -xh --max-depth=1 :search_path 2>/dev/null | sort -rh | head -20

du -h prints human-readable sizes, -x is du's equivalent of find's -xdev (stay on one filesystem), and --max-depth=1 (GNU) or -d 1 (BSD) limits the output to immediate children rather than every nested directory. sort -rh does a human-readable numeric sort that understands the K, M, G suffixes, so 2.1G ranks above 900M correctly.
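For scripting, a variant that keeps raw kibibyte counts and sorts numerically avoids any dependence on sort -h. This is a sketch assuming GNU du (--max-depth); the function name heaviest_dirs is illustrative.

```shell
# heaviest_dirs DIR [N]: print the N heaviest entries one level below DIR
# (default 20) as "<KiB><TAB><path>". The first line is DIR's own total.
heaviest_dirs() {
    du -xk --max-depth=1 "$1" 2>/dev/null |
        sort -rn |
        head -n "${2:-20}"
}

# Example: top-level directories on the root filesystem
# heaviest_dirs / 15
```

A directory holding fifty 100 KB files will outrank one holding a single 1 MB file here, which is the accumulation pattern a per-file ranking hides.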

The decision is simple:

Use find ... -printf '%s %p' when you suspect one or a few large files (a forgotten core dump, a giant log, a stray ISO). It gives you per-file granularity.
Use du -xh --max-depth=1 when you suspect accumulation: a directory that is large because of its contents in aggregate, not any single file. It gives you per-directory totals.

In practice I run du first to find the heavy directory, then cd into it and run du again one level deeper, drilling down until I hit either a single fat file (switch to find) or a directory full of small files I can now explain.
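The manual drill-down can be approximated with a loop that always follows the heaviest child directory. Treat this as a rough sketch: it assumes GNU du, the function name drill_down is hypothetical, and it will misbehave on paths that contain tabs, newlines or backslashes.

```shell
# drill_down DIR: print DIR, then its heaviest subdirectory, then that
# directory's heaviest subdirectory, and so on until none remain.
drill_down() {
    dir=$1
    while :; do
        printf '%s\n' "$dir"
        # du prints "<KiB><TAB><path>"; skip the line for $dir itself and
        # take the largest remaining entry.
        next=$(du -xk --max-depth=1 "$dir" 2>/dev/null |
            sort -rn |
            awk -F'\t' -v self="$dir" '$2 != self { print $2; exit }')
        [ -n "$next" ] || break
        dir=$next
    done
}

# Example
# drill_down /var
```

The last line printed is the deepest heavy directory, which is a reasonable place to switch to the per-file find ranking. Following only the single heaviest branch can miss a second large tree, so it complements the manual pass rather than replacing it.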

ncdu: the interactive option

When the drill-down is more than two or three levels deep, repeating du gets tedious. ncdu ("NCurses disk usage") does the same scan once and gives you an interactive, navigable tree: arrow keys to move, Enter to descend, sizes recalculated as you go, and a delete key bound right in the interface.

bash

# Debian/Ubuntu
sudo apt install ncdu
# macOS
brew install ncdu

# Scan a path, stay on one filesystem
ncdu -x /

ncdu -x / scans once, then hands you a browsable view sorted by size at every level. For an unfamiliar server where I do not yet know the shape of the disk, ncdu is the fastest way to build a mental model. The -x flag is the same one-filesystem guard as find's -xdev and du's -x. The trade-off versus the one-liners: ncdu is interactive, so it does not compose into a script or a cron job. For automation, stay with find and du.

The full-server disk audit recipe

When a production box alerts on disk and I need to diagnose it quickly, this is the sequence I run, top to bottom:

bash

# 1. Which filesystem is actually full?
df -h

# 2. Which top-level directories are heaviest on that filesystem?
du -xh --max-depth=1 / 2>/dev/null | sort -rh | head -20

# 3. Drill into the heavy directory (repeat as needed)
du -xh --max-depth=1 /var 2>/dev/null | sort -rh | head -20

# 4. Once narrowed down, rank individual files in the suspect path
find /var/log -xdev -type f -printf '%s %p\n' 2>/dev/null | sort -rn | head -20

# 5. Check for deleted-but-open files holding space (the silent killer)
lsof -nP 2>/dev/null | awk '/deleted/ && $7 ~ /^[0-9]+$/ && $7 > 104857600'

Step 5 catches a case neither find nor du can see: a process that has deleted a large file but still holds the file descriptor open. The disk space is not freed until that process closes the handle or restarts, and the file no longer appears in the directory tree at all. df reports the space as used, du does not count it because it only sums what it can reach through the directory tree, and the two disagreeing is the tell. When that happens, restart the process holding the descriptor (lsof names it) and the space comes back.
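On Linux the same condition can be checked for a single process without lsof, because /proc exposes each open descriptor as a symlink whose target ends in "(deleted)" once the file is unlinked. The following is a sketch that assumes a mounted /proc; the function name deleted_open_fds is illustrative. Where lsof is installed, lsof +L1 (open files with a link count below one) is another way to list them.

```shell
# deleted_open_fds PID: list descriptors of PID that still point at
# deleted files. Linux only; needs permission to read /proc/PID/fd.
deleted_open_fds() {
    for fd in /proc/"$1"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)') printf '%s -> %s\n' "$fd" "$target" ;;
        esac
    done
}

# Example: check the process with PID 1234 (a placeholder)
# deleted_open_fds 1234
```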

macOS BSD vs GNU find

The headline difference for this task is -printf. GNU find has it; BSD find (the default on macOS) does not. That single gap is why every example in this article ships a separate mac variant.

Feature	GNU find (Linux)	BSD find (macOS default)
`-printf '%s %p'` (size + path, no fork)	Supported	NOT supported
`-xdev` (stay on one filesystem)	Supported	Supported (also spelled `-x`)
Size ranking without `-printf`	n/a	`-exec stat -f '%z %N'` (forks per file)
`numfmt` for human sizes	In coreutils	Not installed by default
`du --max-depth=N`	Supported	Use `-d N` instead
`du -x` (one filesystem)	Supported	Supported

Because BSD find lacks -printf, the macOS one-liner falls back to -exec stat -f '%z %N' {} \;, which forks a stat process for every file. On a tree with hundreds of thousands of files that is noticeably slower than the GNU version. Two ways around it: install GNU findutils (brew install findutils, then use gfind with -printf), or skip per-file ranking entirely and lean on du, which is fast on both platforms and differs only in the depth flag (-d 1 on BSD in place of --max-depth=1).
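A third option is a wrapper that probes for -printf and otherwise batches the stat calls with -exec ... {} +, which passes many files to each stat invocation instead of one. This is a sketch: the function name largest_files_portable and the probing trick are my own assumptions, and only the GNU branch has been exercised here.

```shell
# largest_files_portable DIR [N]: rank the N largest files under DIR
# (default 20) using -printf when find supports it, BSD stat otherwise.
largest_files_portable() {
    if find /dev/null -maxdepth 0 -printf '' 2>/dev/null; then
        # GNU find: no subprocess per file.
        find "$1" -xdev -type f -printf '%s %p\n' 2>/dev/null
    else
        # BSD find: "{} +" batches many paths into each stat call.
        find "$1" -xdev -type f -exec stat -f '%z %N' {} + 2>/dev/null
    fi | sort -rn | head -n "${2:-20}"
}
```

Batching does not remove the stat subprocess entirely, but it replaces one fork per file with one fork per batch.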

Common mistakes

1. Forgetting -xdev on a / scan. This is the big one. Without it, find walks /proc, /sys, /dev, and every network mount. You get bogus multi-terabyte "files" from kernel interfaces and a scan that takes an order of magnitude longer. Always pair a /-anchored find with -xdev.

2. Confusing logical size with physical size. find -printf '%s' reports the logical file size, the same number ls -l shows. That is not always how much disk the file consumes. The next two mistakes are specific cases of this.

3. Sparse files. A sparse file has a large logical size but holds far fewer actual blocks on disk (the unwritten regions are not allocated). VM disk images and database files are commonly sparse. find -printf '%s' shows the logical size and overstates the real disk impact. For allocated blocks, use find -printf '%b' (512-byte block count, GNU) or du, which reports actual allocation by default.

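4. Compression and copy-on-write filesystems. On ZFS or btrfs with transparent compression, a file's logical size and its on-disk footprint can diverge sharply. find -printf '%s' reports logical bytes; du reports the blocks the filesystem says are allocated. On ZFS that block count reflects compression, so trust du for "how much disk is this costing me". On btrfs the reported block count generally does not account for compression, so du can still overstate the footprint there and a btrfs-aware tool such as compsize gives a truer figure.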

5. Sorting human-readable output with sort -rn. If you run numfmt or du -h before sorting, sort -rn reads 9K and 8G as 9 and 8 and ranks the kilobyte file higher. Either sort raw bytes then format (the find | sort -rn | head | numfmt order), or use sort -rh which understands the suffixes.

6. Running the scan without 2>/dev/null. Permission-denied errors interleave with results and bury the actual data. Redirect stderr. (Conversely, if you suspect a permissions issue is hiding files, drop the redirect once to see the errors.)
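The logical-versus-allocated gap from mistakes 2 and 3 is easy to observe with a deliberately sparse file. This sketch assumes GNU find and GNU truncate, plus a filesystem that supports sparse files; the function name logical_vs_allocated and the file name are illustrative.

```shell
# logical_vs_allocated FILE: print "<logical bytes> <allocated bytes>".
# %s is the logical size; %b is the number of 512-byte blocks in use.
logical_vs_allocated() {
    find "$1" -maxdepth 0 -printf '%s %b\n' | awk '{ print $1, $2 * 512 }'
}

# Example: a 100 MiB file with no data written to it
# truncate -s 100M sparse.img
# logical_vs_allocated sparse.img
```

On most filesystems the second number will be at or near zero for such a file, while a size ranking based on '%s' would still list it as 100 MiB.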

When NOT to use this

find -printf '%s %p' | sort -rn is the per-file tool. Reach for something else when:

You want per-directory totals, not per-file. Use du -xh --max-depth=1. A disk filled by accumulation (thousands of small files in one tree) is invisible to a per-file ranking. du is the right lens.
You are exploring an unfamiliar disk interactively. Use ncdu. Repeatedly re-running du to drill down is slower and more error-prone than a single ncdu scan you can navigate.
You need physical disk usage, not logical size. Use du (allocated blocks) or find -printf '%b'. On filesystems with sparse files or compression, logical size from -printf '%s' is misleading.
The space is genuinely gone but nothing shows up. df says the filesystem is full while du accounts for far less: that is a deleted-but-open file. Use lsof | grep deleted, not find. find cannot see a file that no longer has a directory entry.
You want a fixed-size threshold filter. "Everything over 1 GB" is find -size +1G, covered in find files larger than a size. This article is for ranking when you do not have a threshold in mind.

FAQ

How do I find the largest files on a Linux disk?

The canonical one-liner is find / -xdev -type f -printf '%s %p\n' 2>/dev/null | sort -rn | head -20. find prints each file's size in bytes and its path, sort -rn ranks those lines by size descending, and head -20 trims to the top 20. The -xdev flag keeps the scan on a single filesystem so it does not waste time in /proc and /sys, and 2>/dev/null discards permission-denied noise.

What is using my disk space on Linux?

Start with df -h to see which filesystem is full, then run du -xh --max-depth=1 / 2>/dev/null | sort -rh | head -20 to rank the top-level directories by total size. Drill into the heaviest one by re-running du against it. When you reach a directory whose weight comes from one or a few large files, switch to the find per-file ranking. Use du for per-directory totals and find for individual files.

Why does my find scan recurse into /proc and /sys?

Because you anchored it at / without -xdev. By default find crosses mount points, and /proc, /sys, and /dev are virtual filesystems mounted under the root tree. Their entries are kernel interfaces, not disk data, so find reports nonsense sizes and wastes time enumerating them. Adding -xdev confines the scan to the one filesystem that / lives on.

Why is the macOS command different from the Linux one?

macOS ships BSD find, which does not support the -printf action that GNU find uses to print sizes. The macOS variant falls back to -exec stat -f '%z %N' {} \;, which forks a stat process for every file and so runs slower on large trees. If you do this often on a Mac, install GNU findutils with brew install findutils and use gfind with -printf, or lean on du, which works on both platforms (with -d 1 in place of --max-depth=1 on BSD).

When should I use du or ncdu instead of find?

Use du when the disk is full because of accumulation: thousands of small or medium files in one directory tree. A per-file ranking from find spreads those across the bottom of the list and hides the pattern, whereas du sums them per directory. Use ncdu when you want to explore an unfamiliar disk interactively; it scans once and gives you a navigable, size-sorted tree instead of forcing you to re-run du at each level. Stick with find when you suspect one or a few individually large files.

df says the disk is full but du shows far less in use. Why?

That gap usually means a process deleted a large file but still holds the file descriptor open. The disk space is not reclaimed until the process closes the handle or restarts, and because the file has no directory entry anymore, neither du nor find can see it. Run lsof | grep deleted to identify the process holding the descriptor, then restart that process to free the space.

Why does find report a different size than du for the same file?

find -printf '%s' reports the logical file size (the byte count ls -l shows), while du reports the physical blocks actually allocated on disk. The two diverge for sparse files (large logical size, few real blocks) and on filesystems with transparent compression like ZFS or btrfs. For "how much disk is this file actually costing me", trust du or use find -printf '%b' for the allocated block count.


Ishan Karunaratne

Related posts

How to Find and Delete Files Safely with find -delete

How to Find Files Larger Than a Size with find -size

How to List the Files Changed in Git
