Extract a YouTube Transcript Free with yt-dlp (Captions to Text)

The fastest free way to get a YouTube video's transcript in 2026 is yt-dlp, the free, open-source command-line downloader. It pulls the caption track straight from YouTube without downloading the video, and a one-line sed strips the timestamps so you are left with plain prose. That plain-text transcript is exactly what you paste into ChatGPT, Claude, or any AI summarizer to get a summary, key points, or a searchable record of a long talk without watching it. Paste a URL below and copy the command:

Build your yt-dlp command

Paste a YouTube URL, choose what to grab, and copy the command. It updates as you change the options.

Your command

bash

yt-dlp --skip-download --write-subs --write-auto-subs --sub-langs en --convert-subs srt "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

AI CLI prompt

Paste this into your terminal AI and it runs the command for you, installing the tool first if you do not have it. Works with Claude Code, OpenAI Codex CLI, Gemini CLI, GitHub Copilot CLI, Aider, Cursor Agent, Warp, OpenCode, Cline and any other CLI coding agent.

prompt

Goal: download the transcript from this YouTube link.

Run this command in my shell:

    yt-dlp --skip-download --write-subs --write-auto-subs --sub-langs en --convert-subs srt "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

First check whether `yt-dlp` is installed (run `command -v yt-dlp`). If it is not, install it with my system's package manager before running anything: macOS `brew install yt-dlp ffmpeg`, Debian/Ubuntu `sudo apt install yt-dlp ffmpeg`, Fedora `sudo dnf install yt-dlp ffmpeg`, Windows `winget install yt-dlp ffmpeg`. Then run the command above.

Show me any command that needs sudo before you run it, tell me where the output file landed when you are done, and do not run anything else.

You run this at your own risk. An AI agent can execute commands on your machine; review what it does before approving. TechEarl is not liable for the outcome, see the Terms of Service.

The builder gives you the download. The rest of this page is the detail: which caption types exist, listing what a video actually has, getting the words without the megabytes of video, cleaning the file up to plain text, and the fallback for videos that have no captions at all.

Pull transcripts you have the right to use. Reading a transcript for your own notes or feeding it to a summarizer is one thing; republishing someone else's captions as your own content is another, and bulk scraping can violate YouTube's Terms of Service. This guide is for the legitimate cases. What you do with the text is on you.

Two kinds of captions: uploaded vs auto-generated

Before you download anything, know what you are pulling. YouTube has two distinct caption tracks, and yt-dlp treats them with two different flags:

Creator-uploaded captions (--write-subs). The video's owner uploaded or hand-corrected these. They are accurate, properly punctuated, and the ones you want when they exist. Many videos do not have them.
Auto-generated captions (--write-auto-subs). YouTube's own speech-to-text. Almost every video with spoken English has them, so they are the reliable fallback, but they have no punctuation to speak of and they mangle names, jargon, and homophones. Good enough to feed an AI summarizer that fixes the grammar for you; not good enough to publish verbatim.

The practical move is to ask for both and let yt-dlp grab whichever is present. Uploaded wins when available; auto fills the gap.

List what a video actually has

Do not guess. List the caption tracks (both manual and automatic) first:

bash

yt-dlp --list-subs "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

That prints two tables: the available subtitles (creator-uploaded) and the available automatic captions, each with a language code (en, en-US, es, and so on) and the formats on offer (typically vtt, ttml, json3, and a few others; SRT is usually produced by conversion rather than served directly). If the subtitles table is empty but the automatic-captions table lists en, you will be relying on --write-auto-subs. If both tables are empty, there are no captions at all and you skip to the Whisper fallback below.
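
If you want a script to make that decision, the listing can be classified with a few lines of awk. This is only a sketch: caption_source is a hypothetical helper name, and it assumes the two tables are introduced by lines containing "Available subtitles for" and "Available automatic captions for", with the language code in the first column. The sample listing is illustrative, not captured output.

```bash
# caption_source LANG: reads `yt-dlp --list-subs` output on stdin and prints
# which kind of track carries LANG: uploaded, auto, or none.
# Assumes the table headings described above; adjust if your version differs.
caption_source() {
  awk -v lang="$1" '
    /Available subtitles for/          { section = "uploaded"; next }
    /Available automatic captions for/ { section = "auto"; next }
    section != "" && $1 == lang        { found[section] = 1 }
    END {
      if (found["uploaded"])  print "uploaded"
      else if (found["auto"]) print "auto"
      else                    print "none"
    }'
}

# Illustrative listing only; a real one has many more rows.
caption_source en <<'EOF'
[info] Available automatic captions for VIDEO_ID:
Language Name    Formats
en       English vtt, ttml, srv3, srv2, srv1, json3
es       Spanish vtt, ttml, srv3, srv2, srv1, json3
EOF
# auto
```

Against a real video it would be run as yt-dlp --list-subs "URL" | caption_source en. The match on the language code is exact, so en does not count en-US.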

Download the transcript only, no video

This is the whole point: get the words without fetching hundreds of megabytes of video. --skip-download tells yt-dlp to skip the media stream and pull only the subtitle file:

bash

# Uploaded + auto captions, English, written out as SRT, no video
yt-dlp --skip-download --write-subs --write-auto-subs --sub-langs en --convert-subs srt "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

Breaking that down:

--skip-download skips the video entirely; you only get the caption file.
--write-subs requests creator-uploaded captions.
--write-auto-subs requests YouTube's auto captions, so you still get something when there are no uploaded ones.
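--sub-langs en limits it to English. Use "en.*" (quoted, so the shell does not expand the *) to catch en-US/en-GB variants, or a comma list like en,es for several languages.
--convert-subs srt normalizes whatever YouTube serves into SubRip (.srt). The conversion is done by ffmpeg, so ffmpeg needs to be installed.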
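
The language patterns are worth a quick check before use. yt-dlp treats --sub-langs values as regular expressions; the sketch below uses grep -Ex as a local stand-in for that matching (an approximation for illustration, not yt-dlp's own code) to show which of a few example codes each pattern would select.

```bash
# Example language codes, as they might appear in a --list-subs table.
codes='en
en-US
en-GB
es
fr'

# Plain "en" selects only the exact code.
printf '%s\n' "$codes" | grep -Ex 'en'
# en

# "en.*" also picks up the regional variants.
printf '%s\n' "$codes" | grep -Ex 'en.*'
# en
# en-US
# en-GB
```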

You get a file named something like Video Title [VIDEO_ID].en.srt. Prefer WebVTT? Swap the format:

bash

# Same, but WebVTT output instead of SRT
yt-dlp --skip-download --write-subs --write-auto-subs --sub-langs en --convert-subs vtt "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

SRT and VTT carry the same words; they differ mainly in syntax. SRT numbers each cue and writes timecodes with a comma (00:00:01,200), while VTT opens with a WEBVTT header and uses a period (00:00:01.200). Pick SRT if a downstream tool expects it, VTT if you are embedding in HTML5 video. For feeding an AI model, neither matters once you strip the timing in the next step.
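
Since only the language and the format change between these commands, a small shell function can assemble them. The helper below is a hypothetical convenience (transcript_cmd is not part of yt-dlp); it prints the command instead of running it, so you can review it before pasting it into a terminal.

```bash
# transcript_cmd URL [LANG] [FORMAT]
# Hypothetical helper: prints the yt-dlp transcript command for review.
transcript_cmd() {
  url=$1
  lang=${2:-en}      # language code or pattern, e.g. en, es, "en.*"
  format=${3:-srt}   # srt or vtt
  printf 'yt-dlp --skip-download --write-subs --write-auto-subs --sub-langs "%s" --convert-subs %s "%s"\n' \
    "$lang" "$format" "$url"
}

transcript_cmd "https://www.youtube.com/watch?v=dQw4w9WgXcQ" "en.*" vtt
# yt-dlp --skip-download --write-subs --write-auto-subs --sub-langs "en.*" --convert-subs vtt "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
```

The defaults (en, srt) mirror the command used throughout this page; pass a second and third argument to override them.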

Convert SRT/VTT to clean plain text

Here is the gap people hit: yt-dlp converts between subtitle formats, but it does not output a plain .txt. An SRT file is interleaved with sequence numbers, 00:00:01,200 --> 00:00:03,800 timestamp lines, and blank separators, which is noise when all you want is the prose. One sed removes all three:

bash

# Strip indices, timestamp lines, and blanks down to plain prose
sed -E '/^[0-9]+$/d; /-->/d; /^$/d' "Video Title [VIDEO_ID].en.srt" > transcript.txt

That deletes any line that is only a number (the sequence index), any line containing --> (the timestamp), and any empty line, leaving the spoken text. The result is one line per caption cue, which most summarizers handle fine; if you want it reflowed into paragraphs, pipe it through fmt or paste it into your editor and reflow there. The same sed mostly works on a .vtt file too, since VTT timestamp lines also contain -->, but it leaves behind the WEBVTT header (and, on YouTube files, the Kind: and Language: lines under it), and it does not touch the inline word-timing tags that auto-generated VTT tracks carry.
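
To see what the filter does, here is a small worked example on a two-cue SRT file. The file name and its contents are made up for illustration.

```bash
# Write a tiny sample SRT file (contents are invented for this example).
cat > sample.en.srt <<'EOF'
1
00:00:01,200 --> 00:00:03,800
Welcome back to the channel.

2
00:00:03,800 --> 00:00:06,500
Today we are looking at captions.
EOF

# Same filter as above: drop index lines, timestamp lines, and blank lines.
sed -E '/^[0-9]+$/d; /-->/d; /^$/d' sample.en.srt > sample.txt
cat sample.txt
# Welcome back to the channel.
# Today we are looking at captions.
```

One caveat of the first pattern: a cue whose entire text is a number (a spoken "2024" on a line of its own) is deleted along with the indices.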

For a one-shot pipeline that downloads and cleans in a single command:

bash

# Download the auto/uploaded EN captions, then immediately flatten to text
yt-dlp --skip-download --write-subs --write-auto-subs --sub-langs en --convert-subs srt -o "%(id)s.%(ext)s" "URL" \
  && sed -E '/^[0-9]+$/d; /-->/d; /^$/d' *.en.srt > transcript.txt

The -o "%(id)s.%(ext)s" keeps the filename predictable (just the video ID) so the sed glob is easy. For more on output templates and every other flag, the yt-dlp cheat sheet is the reference.
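
For WebVTT input, and for auto-generated tracks in particular, a slightly longer filter tends to be needed. The sketch below assumes the usual shape of a YouTube auto-caption .vtt: a WEBVTT header followed by Kind: and Language: lines, inline word-timing tags such as <00:00:00.320><c>, and the same line repeated across consecutive cues. The sample file is invented for illustration.

```bash
# A made-up sample in the shape of an auto-generated YouTube .vtt file.
cat > sample.en.vtt <<'EOF'
WEBVTT
Kind: captions
Language: en

00:00:00.160 --> 00:00:02.310 align:start position:0%
welcome<00:00:00.320><c> back</c><00:00:00.560><c> everyone</c>

00:00:02.310 --> 00:00:04.000 align:start position:0%
welcome back everyone
today<00:00:02.500><c> we</c><00:00:02.800><c> begin</c>
EOF

# Drop header lines and timestamps, strip inline <...> tags, drop blank
# lines, then collapse consecutive duplicate lines with uniq.
sed -E '/^WEBVTT/d; /^(Kind|Language):/d; /-->/d; s/<[^>]*>//g; /^[[:space:]]*$/d' \
  sample.en.vtt | uniq > sample-vtt.txt
cat sample-vtt.txt
# welcome back everyone
# today we begin
```

uniq only removes adjacent repeats, so a phrase the speaker genuinely says twice in a row is collapsed as well; for summarizer input that is usually an acceptable trade.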

No captions at all? Transcribe locally with Whisper

Some videos have neither uploaded nor auto captions: music, very new uploads, languages YouTube does not auto-caption well. When --list-subs comes back empty, the fallback is to download the audio and transcribe it yourself with OpenAI Whisper, a free, open-source speech-to-text model that runs entirely on your own machine, no API key, no per-minute cost:

bash

# Grab the audio as audio.m4a, then transcribe it to audio.txt locally
yt-dlp -x --audio-format m4a -o "audio.%(ext)s" "URL" && whisper audio.m4a --model small --output_format txt

yt-dlp -x extracts the audio (see download YouTube audio for the full -x flow). Then whisper transcribes audio.m4a and --output_format txt writes a clean audio.txt with no timestamps to strip, so there is no sed step here. The --model small is a good speed/accuracy balance on a laptop; tiny/base are faster, medium/large are more accurate but want a GPU. Install it with pip install -U openai-whisper (it needs ffmpeg, which you likely already have from yt-dlp).

If you want faster transcription on CPU, whisper.cpp runs the same Whisper models through a C/C++ implementation; it is noticeably faster on a machine with no GPU and is also free and open source. Either one turns a captionless video into text without sending your audio to a third-party service.
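
Putting the caption path and the Whisper fallback together, a wrapper might look like the sketch below. Everything named here is an assumption made for illustration: get_transcript is a hypothetical function, YTDLP and WHISPER are hypothetical override variables, and the transcript.* and audio.* file names are arbitrary. It also assumes yt-dlp exits successfully without writing a subtitle file when the video has no captions.

```bash
# get_transcript URL: try captions first, fall back to local Whisper.
# Hypothetical sketch; YTDLP and WHISPER let you point at other binaries.
get_transcript() {
  url=$1
  ytdlp=${YTDLP:-yt-dlp}
  whisper=${WHISPER:-whisper}

  # Step 1: ask for uploaded and auto captions, no video.
  "$ytdlp" --skip-download --write-subs --write-auto-subs --sub-langs en \
    --convert-subs srt -o "transcript.%(ext)s" "$url" || return 1

  if [ -s transcript.en.srt ]; then
    # Captions exist: flatten the SRT to plain text.
    sed -E '/^[0-9]+$/d; /-->/d; /^$/d' transcript.en.srt > transcript.txt
  else
    # No caption file was written: transcribe the audio locally instead.
    "$ytdlp" -x --audio-format m4a -o "audio.%(ext)s" "$url" &&
      "$whisper" audio.m4a --model small --output_format txt &&
      mv audio.txt transcript.txt
  fi
}
```

Call it as get_transcript "URL"; either way the text ends up in transcript.txt in the current directory.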

FAQ

Is yt-dlp free for downloading transcripts?

Yes. yt-dlp is free and open source (released into the public domain under the Unlicense), with no ads, no paywall, and no usage limits of its own. Pulling a caption track with --skip-download downloads only a small text file, so it is fast and uses almost no bandwidth. The Whisper fallback is free and open source too, and runs entirely on your own machine.

What is the difference between --write-subs and --write-auto-subs?

--write-subs pulls captions the creator uploaded or hand-corrected, which are accurate and punctuated. --write-auto-subs pulls YouTube's automatic speech-to-text, which most videos have but which lacks punctuation and misreads names and jargon. Pass both so yt-dlp uses the uploaded track when it exists and falls back to the auto one when it does not.

How do I convert the .srt or .vtt file to plain text?

yt-dlp converts between subtitle formats but does not output .txt, so strip the timing yourself: sed -E '/^[0-9]+$/d; /-->/d; /^$/d' file.en.srt > transcript.txt. That deletes the sequence numbers, the timestamp lines, and the blank separators, leaving just the spoken words. The same command works on a .vtt file, apart from the WEBVTT header lines it leaves behind.

The video has no captions. Can I still get a transcript?

Yes, by transcribing the audio yourself. Extract it with yt-dlp -x, then run OpenAI Whisper: whisper audio.m4a --model small --output_format txt. Whisper is free, open source, runs locally with no API key, and writes a clean text file directly. On a machine with no GPU, whisper.cpp does the same job much faster.

Can I download captions in a language other than English?

Yes. Run yt-dlp --list-subs URL to see which language codes the video offers, then set --sub-langs to that code (for example --sub-langs es for Spanish, or --sub-langs en,es for both). Use a quoted pattern like "en.*" to catch regional variants such as en-US and en-GB in one go.

Why does my transcript download grab the whole video?

You left out --skip-download. Without it, yt-dlp downloads the video and the captions. Add --skip-download and yt-dlp fetches only the subtitle file, which is what makes transcript extraction fast and bandwidth-cheap.

Related posts

How to Download a YouTube Video (Any Quality) with yt-dlp

How to Download YouTube Audio (M4A, MP3, or Opus) with yt-dlp

Download a YouTube Playlist or Entire Channel with yt-dlp

Ishan Karunaratne