To turn a whole video into a single image full of thumbnails, sample frames at a fixed interval, scale them down, and let ffmpeg's tile filter arrange them into a grid:
ffmpeg -i in.mp4 -vf "fps=1/10,scale=320:-1,tile=4x3" -frames:v 1 sheet.png
That one command reads in.mp4, takes one frame every 10 seconds (fps=1/10), scales each to 320px wide while keeping the aspect ratio (scale=320:-1), and packs the first 12 of them into a 4-wide by 3-tall grid (tile=4x3). The -frames:v 1 tells ffmpeg to write a single output image (the assembled sheet), not 12 separate files. No ImageMagick, no temp directory, no second step.
This is what magazines and film editors call a contact sheet and video tools call a storyboard or preview: a glanceable map of what happens across the runtime. I reach for it constantly when I need to find roughly where a scene is, or hand someone a one-image summary of a long recording.
What each filter in the chain does
The -vf value is a single filter chain, comma-separated, applied left to right. Order matters: sample first, then scale the smaller set of frames, then tile.
- fps=1/10 sets the sampling rate to one frame every 10 seconds. fps=1/60 is one a minute; fps=2 would be two a second (rarely what you want for a sheet). This is the knob that decides how much of the video each tile represents.
- scale=320:-1 resizes each sampled frame to 320px wide. The -1 height means "whatever keeps the aspect ratio". Use -2 instead of -1 if a codec complains about an odd height, since -2 rounds to an even number.
- tile=4x3 lays the frames out in a grid that is 4 columns across and 3 rows down, so 12 cells. The tile filter consumes frames from its input and emits one combined image once it has enough to fill the grid.
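If you build sheets at more than one size, it can help to generate the filter chain from its three knobs instead of editing the string by hand. The sketch below is one way to do that; sheet_filter is a hypothetical helper name of my own, not something ffmpeg provides, and it assumes the same sample, scale, tile order described above.

```shell
#!/bin/sh
# sheet_filter INTERVAL WIDTH COLS ROWS
# Prints a -vf filter chain in the order sample -> scale -> tile.
# sheet_filter is a hypothetical helper name, not part of ffmpeg.
sheet_filter() {
    # $1 = seconds between samples
    # $2 = thumbnail width in pixels (height follows the aspect ratio)
    # $3 = grid columns, $4 = grid rows
    printf 'fps=1/%s,scale=%s:-1,tile=%sx%s\n' "$1" "$2" "$3" "$4"
}
```

A call such as ffmpeg -i in.mp4 -vf "$(sheet_filter 10 320 4 3)" -frames:v 1 sheet.png should then be equivalent to the command at the top of the article.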
tile=4x3 is shorthand for tile=layout=4x3. Pick the grid to match how many samples you expect: a 10-minute video at fps=1/10 produces 60 frames, so a 4x3 sheet shows only the first 12. To capture the whole thing in one sheet, either widen the grid (tile=10x6 for 60 cells) or sample less often. The math is simple: grid_cells = columns x rows must be at least duration_seconds / sample_interval to fit everything on one sheet.
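The grid arithmetic can be sketched as a small shell function. Given a duration, a sample interval, and a column count you choose, it prints a layout with enough rows to hold every sample. The name grid_for is hypothetical, and the rounding is my assumption: it rounds up so a partial last row still gets a place.

```shell
#!/bin/sh
# grid_for DURATION_SECONDS INTERVAL_SECONDS COLUMNS
# Prints COLSxROWS with enough cells for every sampled frame.
# grid_for is a hypothetical helper name, not part of ffmpeg.
grid_for() (
    duration=${1%.*}    # drop a fractional part, e.g. 600.023 -> 600
    interval=$2
    cols=$3
    # frames = ceil(duration / interval), using integer arithmetic
    frames=$(( (duration + interval - 1) / interval ))
    # rows = ceil(frames / cols)
    rows=$(( (frames + cols - 1) / cols ))
    printf '%sx%s\n' "$cols" "$rows"
)
```

For the 10-minute example, grid_for 600 10 10 prints 10x6, which matches the tile=10x6 layout mentioned above.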
Grab keyframes instead of fixed intervals
Fixed-interval sampling can land on a fade, a black frame, or a blurry pan. If you would rather the tiles show the video's actual cut points, sample I-frames (keyframes), which encoders place at scene changes and other "important" moments:
ffmpeg -i in.mp4 -vf "select='eq(pict_type,I)',scale=320:-1,tile=4x3" -vsync vfr -frames:v 1 sheet.png
select='eq(pict_type,I)' passes only frames whose picture type is I (intra-coded keyframes) and drops the rest. The -vsync vfr (variable frame rate) flag is important here: without it ffmpeg duplicates the selected frames to pad back out to a constant rate, and your sheet fills with repeats. With vfr, only the genuinely-selected frames reach the tile step.
One housekeeping note for current ffmpeg: -vsync is deprecated (it still works and does the right thing, but newer builds print a warning and it may be removed eventually). The modern spelling is -fps_mode vfr, which behaves identically here, so on a recent ffmpeg you can write -fps_mode vfr in place of -vsync vfr.
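If one script has to run on both older and newer builds, one option is to pick the flag at run time. The sketch below reads ffmpeg's help text on standard input and prints whichever spelling to use. It assumes that builds supporting -fps_mode list it in the output of ffmpeg -h full, which I have not checked on every version; sync_flag is a hypothetical helper name.

```shell
#!/bin/sh
# sync_flag: reads ffmpeg help text on stdin and prints the frame-sync
# flag to use. Assumption: a build that supports -fps_mode mentions it
# in its help output. sync_flag is a hypothetical helper name.
sync_flag() {
    if grep -q -e '-fps_mode'; then
        printf '%s\n' '-fps_mode'
    else
        printf '%s\n' '-vsync'
    fi
}
```

Usage would look like flag=$(ffmpeg -hide_banner -h full 2>/dev/null | sync_flag), followed by the keyframe command with "$flag" vfr in place of -vsync vfr.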
How useful a keyframe sheet is depends on how the file was encoded. Encoders drop an I-frame on a regular GOP cadence (commonly every ~250 frames, roughly 10 seconds at 24fps) and, unless scene detection is turned off, an extra one on hard cuts. So on a typical encode the keyframes are partly evenly spaced (the GOP rhythm) and partly clustered around cuts: an action sequence with many cuts yields denser keyframes than a static interview. That makes a keyframe sheet a decent map of the distinct shots, while fixed fps sampling is the cleaner map of time. I pick interval sampling for "show me the timeline" and keyframe sampling for "show me the cuts".
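To see which pattern a particular file follows, you can look at the gaps between its keyframe timestamps. The sketch below is a small filter that turns a list of timestamps (one per line, in seconds) into the gaps between them. The helper name keyframe_gaps is hypothetical, and the ffprobe invocation in the comment is one way to produce the input that I believe works on current builds. It reports keyframes as the decoder flags them, which may not match pict_type I frame for frame.

```shell
#!/bin/sh
# keyframe_gaps: reads timestamps in seconds, one per line, and prints the
# gap between each consecutive pair, rounded to two decimals.
# keyframe_gaps is a hypothetical helper name.
#
# Possible input source (an assumption, not run here):
#   ffprobe -v error -select_streams v:0 -skip_frame nokey \
#       -show_entries frame=best_effort_timestamp_time -of csv=p=0 in.mp4
keyframe_gaps() {
    # -F, tolerates a trailing comma on a CSV line; blank lines are skipped
    awk -F, '
        $1 == "" { next }
        seen     { printf "%.2f\n", $1 - prev }
                 { prev = $1; seen = 1 }
    '
}
```

A column of near-identical gaps suggests a fixed GOP rhythm, while a mix of long and short gaps suggests extra keyframes at cuts.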
Padding and margin between tiles
By default the tiles butt up against each other with no gap. The tile filter takes padding (pixels between cells), margin (pixels around the outside edge), and color (the fill for any gaps and unused cells):
ffmpeg -i in.mp4 -vf "fps=1/10,scale=320:-1,tile=4x3:padding=4:margin=8:color=white" -frames:v 1 sheet.png
That puts a 4px white gutter between every thumbnail and an 8px white border around the whole sheet, which reads far more like a printed contact sheet than the seamless default. Both padding and margin accept 0 to 1024 pixels. color takes any ffmpeg color name or a hex value like 0x202020; if the last row is short, the leftover cells fill with this color too.
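Padding and margin also change the pixel size of the finished sheet, which matters if it has to fit a fixed slot. The sketch below works that out, assuming padding is applied once between each pair of adjacent cells and margin once on each outer edge, as described above. sheet_size is a hypothetical helper name, and the 180px tile height in the example assumes a 16:9 source scaled to 320px wide.

```shell
#!/bin/sh
# sheet_size COLS ROWS TILE_W TILE_H PADDING MARGIN
# Prints the WIDTHxHEIGHT of the assembled sheet in pixels.
# sheet_size is a hypothetical helper name, not part of ffmpeg.
sheet_size() (
    cols=$1
    rows=$2
    tile_w=$3
    tile_h=$4
    pad=$5
    margin=$6
    # tiles + the gutters between them + a margin on both outer edges
    w=$(( cols * tile_w + (cols - 1) * pad + 2 * margin ))
    h=$(( rows * tile_h + (rows - 1) * pad + 2 * margin ))
    printf '%sx%s\n' "$w" "$h"
)
```

For the command above with 320x180 thumbnails, sheet_size 4 3 320 180 4 8 prints 1308x564.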
Several sheets for a long video
If the video is long enough that one grid cannot hold every sample without the tiles becoming postage stamps, drop -frames:v 1 and let tile emit a new image every time it fills the grid:
ffmpeg -i in.mp4 -vf "fps=1/10,scale=320:-1,tile=4x3" sheet_%03d.png
Without the -frames:v 1 cap, ffmpeg keeps going: the first 12 frames become sheet_001.png, the next 12 become sheet_002.png, and so on across the whole runtime. The %03d in the output name is a zero-padded counter (001, 002, 003). This is how you contact-sheet a feature-length file at a useful thumbnail size without one absurdly large image.
To know how many sheets to expect, or to size the grid, check the duration first:
ffprobe -v error -show_entries format=duration -of csv=p=0 in.mp4
That prints the runtime in seconds. Divide by your sample interval for the frame count, then by the cells per grid for the sheet count.
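That division can be sketched as a shell function that takes the ffprobe duration directly. The name sheet_count is hypothetical. It rounds up at both steps, on the assumption that a final, partly filled sheet is still written.

```shell
#!/bin/sh
# sheet_count DURATION_SECONDS INTERVAL_SECONDS CELLS_PER_SHEET
# Prints how many sheet images to expect.
# sheet_count is a hypothetical helper name, not part of ffmpeg.
sheet_count() (
    duration=${1%.*}    # ffprobe prints e.g. 5400.500000; keep whole seconds
    interval=$2
    cells=$3
    # frames = ceil(duration / interval)
    frames=$(( (duration + interval - 1) / interval ))
    # sheets = ceil(frames / cells), counting a partly filled last sheet
    printf '%s\n' $(( (frames + cells - 1) / cells ))
)
```

For a two-hour file sampled every 10 seconds into 4x3 sheets, sheet_count 7200 10 12 prints 60. To feed it the real runtime, pass "$(ffprobe -v error -show_entries format=duration -of csv=p=0 in.mp4)" as the first argument.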
A few practical notes. JPEG output (sheet.jpg with -q:v 2) is much smaller than PNG for photographic frames and fine for a preview. The contact sheet shows the whole video at a glance; if you only need one representative still, see extract a thumbnail or frame from a video, and if you want to sample only a portion of the file, trim the clip first and feed that to the sheet command.
See also
- The ffmpeg command cheat sheet: every common convert, trim, crop, and compress command in one reference.
- Extract a thumbnail or frame from a video: grab a single still by timestamp, frame number, or the thumbnail filter.
- Trim and cut a video without re-encoding: cut the clip down first, then contact-sheet just the part you care about.