TechEarl

How to Extract a YouTube Transcript (Captions) with yt-dlp

Pull a YouTube video's transcript free from the command line with yt-dlp: list the caption tracks, download the subtitles without the video, convert SRT/VTT to clean plain text, and transcribe with Whisper when no captions exist.

Ishan Karunaratne · 8 min read

The fastest free way to get a YouTube video's transcript in 2026 is yt-dlp, an open-source command-line downloader. It pulls the caption track straight from YouTube without downloading the video, and a one-line sed strips the timestamps so you are left with plain text. That plain-text transcript is exactly what you paste into ChatGPT, Claude, or any AI summarizer to get a summary, key points, or a searchable record of a long talk without watching it. Swap your own URL into this command:

bash
yt-dlp --skip-download --write-subs --write-auto-subs --sub-langs en --convert-subs srt "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

That command gives you the download. The rest of this page is the detail: which caption types exist, listing what a video actually has, getting the words without the megabytes of video, cleaning the file up to plain text, and the fallback for videos that have no captions at all.

Pull transcripts you have the right to use. Reading a transcript for your own notes or feeding it to a summarizer is one thing; republishing someone else's captions as your own content is another, and bulk scraping can violate YouTube's Terms of Service. This guide is for the legitimate cases. What you do with the text is on you.

Two kinds of captions: uploaded vs auto-generated

Before you download anything, know what you are pulling. YouTube has two distinct caption tracks, and yt-dlp treats them with two different flags:

  • Creator-uploaded captions (--write-subs). The video's owner uploaded or hand-corrected these. They are accurate, properly punctuated, and the ones you want when they exist. Many videos do not have them.
  • Auto-generated captions (--write-auto-subs). YouTube's own speech-to-text. Almost every video with spoken English has them, so they are the reliable fallback, but they have no punctuation to speak of and they mangle names, jargon, and homophones. Good enough to feed an AI summarizer that fixes the grammar for you; not good enough to publish verbatim.

The practical move is to ask for both and let yt-dlp grab whichever is present. Uploaded wins when available; auto fills the gap.

List what a video actually has

Do not guess. List the caption tracks (both manual and automatic) first:

bash
yt-dlp --list-subs "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

That prints two tables: the available subtitles (creator-uploaded) and the available automatic captions, each with a language code (en, en-US, es, and so on) and the formats on offer (vtt, srt, ttml). If the subtitles table is empty but the automatic-captions table lists en, you will be relying on --write-auto-subs. If both tables are empty, there are no captions at all and you skip to the Whisper fallback below.
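The decision described above can be sketched as a small shell function that reads the --list-subs output and names the next step. This is a rough illustration, not part of yt-dlp: caption_plan is a hypothetical name, and it assumes yt-dlp introduces each non-empty table with a line containing "Available subtitles for" or "Available automatic captions for". It also only checks that a table exists, not that the table contains en.

```bash
# Sketch: read `yt-dlp --list-subs` output on stdin and name the next step.
# ASSUMPTION: each non-empty table is announced by a line containing
# "Available subtitles for" or "Available automatic captions for".
caption_plan() {
  local listing
  listing=$(cat)
  if printf '%s\n' "$listing" | grep -q 'Available subtitles for'; then
    echo "uploaded"   # creator-uploaded captions exist
  elif printf '%s\n' "$listing" | grep -q 'Available automatic captions for'; then
    echo "auto"       # only auto-generated captions exist
  else
    echo "whisper"    # no captions at all: transcribe the audio instead
  fi
}
```

Typical use would be yt-dlp --list-subs "URL" | caption_plan. If the wording of yt-dlp's messages changes, the grep patterns need to change with it.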

Download the transcript only, no video

This is the whole point: get the words without fetching hundreds of megabytes of video. --skip-download tells yt-dlp to skip the media stream and pull only the subtitle file:

bash
# Uploaded + auto captions, English, written out as SRT, no video
yt-dlp --skip-download --write-subs --write-auto-subs --sub-langs en --convert-subs srt "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

Breaking that down:

  • --skip-download skips the video entirely; you only get the caption file.
  • --write-subs requests creator-uploaded captions.
  • --write-auto-subs requests YouTube's auto captions, so you still get something when there are no uploaded ones.
  • --sub-langs en limits it to English. Use "en.*" (quoted, so the shell does not expand the asterisk) to catch en-US/en-GB variants, or a comma list like en,es for several languages.
  • --convert-subs srt normalizes whatever YouTube serves into SubRip (.srt).

You get a file named something like Video Title [VIDEO_ID].en.srt. Prefer WebVTT? Swap the format:

bash
# Same, but WebVTT output instead of SRT
yt-dlp --skip-download --write-subs --write-auto-subs --sub-langs en --convert-subs vtt "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

SRT and VTT carry the same words; they differ mainly in timecode syntax (SRT writes 00:00:01,200 with a comma, VTT writes 00:00:01.200 with a period), in VTT's WEBVTT header line, and in SRT's numbered cues. Pick SRT if a downstream tool expects it, VTT if you are embedding in HTML5 video. For feeding an AI model, neither matters once you strip the timing in the next step.
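If you run this often, the flags broken down above can be wrapped in a function. The sketch below is one way to do it; fetch_captions and the YTDLP override variable are hypothetical names introduced here so the command can be dry-run with echo, and the -o template is an extra that keeps filenames down to the video ID.

```bash
# Hypothetical helper: caption-only download for one URL.
# YTDLP is an assumed override hook, e.g. YTDLP=echo prints the command
# instead of running it.
fetch_captions() {
  local url="$1" langs="${2:-en.*}" format="${3:-srt}"
  # "$langs" stays quoted so a pattern like en.* reaches yt-dlp untouched
  "${YTDLP:-yt-dlp}" --skip-download --write-subs --write-auto-subs \
    --sub-langs "$langs" --convert-subs "$format" \
    -o "%(id)s.%(ext)s" "$url"
}
```

For example, fetch_captions "URL" grabs English variants as SRT, and fetch_captions "URL" "en,es" vtt grabs English and Spanish as WebVTT.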

Convert SRT/VTT to clean plain text

Here is the gap people hit: yt-dlp converts between subtitle formats, but it does not output a plain .txt. An SRT file is interleaved with sequence numbers, 00:00:01,200 --> 00:00:03,800 timestamp lines, and blank separators, which is noise when all you want is the prose. One sed removes all three:

bash
# Strip indices, timestamp lines, and blanks down to plain prose
sed -E '/^[0-9]+$/d; /-->/d; /^$/d' "Video Title [VIDEO_ID].en.srt" > transcript.txt

That deletes any line that is only a number (the sequence index), any line containing --> (the timestamp), and any empty line, leaving the spoken text. One caveat: a caption line that consists solely of digits is deleted along with the indices. The result is one line per caption cue, which most summarizers handle fine; if you want it reflowed into paragraphs, pipe it through fmt or paste it into your editor and reflow there. The same sed mostly works on a .vtt file, since VTT timestamp lines also contain -->, but it leaves the leading WEBVTT header line behind, so delete that line too.
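A slightly more defensive variant of that cleanup is sketched below. It rests on some assumptions about what the files contain: that auto-generated tracks often repeat the same line across consecutive cues, that YouTube's VTT files carry Kind: and Language: header lines and inline tags such as <c>, and that some files use Windows line endings. subs_to_text is a hypothetical name, and the de-duplication only collapses adjacent repeats.

```bash
# Sketch: flatten SRT or VTT cues on stdin to plain text, one line per cue.
subs_to_text() {
  tr -d '\r' |                            # drop carriage returns (CRLF files)
    sed -E \
      -e '/^WEBVTT/d' \
      -e '/^(Kind|Language):/d' \
      -e '/-->/d' \
      -e '/^[0-9]+$/d' \
      -e 's/<[^>]*>//g' \
      -e 's/^[[:space:]]+//; s/[[:space:]]+$//' \
      -e '/^$/d' |                        # headers, timing, indices, tags, blanks
    uniq                                  # collapse adjacent repeated lines
}
```

Usage would look like subs_to_text < "VIDEO_ID.en.srt" | fmt -w 80 > transcript.txt, where the fmt step is optional and only reflows the lines into paragraphs. Note that the Kind:/Language: filter would also drop a spoken line that happens to start with one of those words followed by a colon.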

For a one-shot pipeline that downloads and cleans in a single command:

bash
# Download the auto/uploaded EN captions, then immediately flatten to text
yt-dlp --skip-download --write-subs --write-auto-subs --sub-langs en --convert-subs srt -o "%(id)s.%(ext)s" "URL" \
  && sed -E '/^[0-9]+$/d; /-->/d; /^$/d' *.en.srt > transcript.txt

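The -o "%(id)s.%(ext)s" keeps the filename predictable (just the video ID, so the caption file is VIDEO_ID.en.srt), which keeps the sed glob simple. Note that *.en.srt matches every English caption file in the current directory, so run this in a clean directory or the transcripts of several videos end up concatenated in one transcript.txt. For more on output templates and every other flag, the yt-dlp cheat sheet is the reference.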
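When a directory holds caption files for several videos, writing one text file per video is usually more useful than a single transcript.txt. A minimal sketch, assuming the files were saved with the ID-based template above; flatten_dir is a hypothetical name:

```bash
# Sketch: write VIDEO_ID.en.txt next to each VIDEO_ID.en.srt in a directory.
flatten_dir() {
  local dir="${1:-.}" f
  for f in "$dir"/*.srt; do
    [ -e "$f" ] || continue   # the glob matched nothing: skip quietly
    # same filter as above: drop indices, timestamp lines, and blanks
    sed -E '/^[0-9]+$/d; /-->/d; /^$/d' "$f" > "${f%.srt}.txt"
  done
}
```

Calling flatten_dir with no argument processes the current directory; flatten_dir ~/captions processes that folder instead.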

No captions at all? Transcribe locally with Whisper

Some videos have neither uploaded nor auto captions: music, very new uploads, languages YouTube does not auto-caption well. When --list-subs comes back empty, the fallback is to download the audio and transcribe it yourself with OpenAI Whisper, a free, open-source speech-to-text model that runs entirely on your own machine, with no API key and no per-minute cost:

bash
# Grab the audio, then transcribe it to a .txt locally
yt-dlp -x --audio-format m4a -o "audio.%(ext)s" "URL" && whisper audio.m4a --model small --output_format txt

yt-dlp -x extracts the audio (see download YouTube audio for the full -x flow). whisper then transcribes audio.m4a, and --output_format txt writes a clean audio.txt with no timestamps to strip, so there is no sed step here. --model small is a good speed/accuracy balance on a laptop; tiny and base are faster, while medium and large are more accurate but slow without a GPU. Install Whisper with pip install -U openai-whisper. It needs ffmpeg, which you likely already installed for yt-dlp, since --convert-subs and -x both depend on it.

If you want faster transcription on CPU, whisper.cpp is a C/C++ port of the same model; it is typically noticeably faster on a machine with no GPU and is also free and open source. Either one turns a captionless video into text without sending your audio to a third-party service.
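Putting the page together, the whole flow (captions first, Whisper only when no caption file appears) might be sketched like this. The function name transcribe, the YTDLP and WHISPER override variables, and the captions/audio filenames are all hypothetical choices made for this example; the yt-dlp and whisper flags are the ones used above, plus -P and --output_dir to keep everything in one working directory.

```bash
# Sketch: try captions, fall back to local Whisper transcription.
# YTDLP / WHISPER are assumed override hooks for testing or alternate binaries.
transcribe() {
  local url="$1" workdir="${2:-.}"
  local ytdlp="${YTDLP:-yt-dlp}" whisper="${WHISPER:-whisper}"

  # Step 1: ask for uploaded + auto captions, no video
  "$ytdlp" --skip-download --write-subs --write-auto-subs --sub-langs en \
    --convert-subs srt -P "$workdir" -o "captions.%(ext)s" "$url"

  # Step 2: decide by whether a caption file was actually written
  set -- "$workdir"/captions.*.srt
  if [ -e "$1" ]; then
    sed -E '/^[0-9]+$/d; /-->/d; /^$/d' "$1" > "$workdir/transcript.txt"
  else
    # Step 3: no captions, so download the audio and transcribe it locally
    "$ytdlp" -x --audio-format m4a -P "$workdir" -o "audio.%(ext)s" "$url" &&
      "$whisper" "$workdir/audio.m4a" --model small \
        --output_format txt --output_dir "$workdir"
  fi
}
```

The check is on the file rather than on yt-dlp's exit status, on the assumption that a video without captions is not reported as a failure. In the caption case the result is transcript.txt; in the Whisper case it is audio.txt in the same directory.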


Ishan Karunaratne

Tech Architect · Software Engineer · AI/DevOps

Tech architect and software engineer with 20+ years building software, Linux systems, and DevOps infrastructure, and lately working AI into the stack. Currently Chief Technology Officer at a healthcare tech startup, which is where most of these field notes come from.
