QZ qz thoughts
a blog from Eli the Bearded

On Using ffmpeg


The ffmpeg package (with ffmpeg and associated other tools) is an enormously powerful suite for dealing with video files. Video files are, however, very complicated and ffmpeg command line options are enormously complicated to match.

Slowly I have been learning to understand the command line for the things I want to do with video files. The first thing to know is "What are you dealing with?" and the ffprobe tool can tell you that. But that's a verbose tool which unhelpfully sends all output to standard error. I cope with a simple wrapper script:

#!/bin/sh

if [ "$#" = 0 ] ; then
  echo "Usage: ffid videofile [videofile ...]"
  echo "Ease of use wrapper for 'ffprobe'."
  exit
fi

blank=

for file ; do
  if [ "$blank" = yes ] ; then
    # blank line between files
    echo ""
  fi

  if [ "$#" != 1 ] ; then
    # show file names when more than one
    echo "$file --"
    blank=yes
  fi

  # Don't show compile settings ("-hide_banner") and send all
  # stderr to stdout for grep, etc
  ffprobe -hide_banner "$file" 2>&1
done

Sample output for some old stop-motion videos downloaded from YouTube (yt-dlp for the win):

The Cameraman's Revenge (1912) animation [U424m8utJnA].webm --
Input #0, matroska,webm, from 'The Cameraman's Revenge (1912) animation [U424m8utJnA].webm':
  Metadata:
    COMPATIBLE_BRANDS: iso6mp41
    MAJOR_BRAND     : dash
    MINOR_VERSION   : 0
    ENCODER         : Lavf58.29.100
  Duration: 00:13:22.92, start: -0.007000, bitrate: 377 kb/s
    Stream #0:0: Video: vp9 (Profile 0), yuv420p(tv, bt709), 480x360, SAR 1:1 DAR 4:3, 23.98 fps, 23.98 tbr, 1k tbn, 1k tbc (default)
    Metadata:
      HANDLER_NAME    : ISO Media file produced by Google Inc. Created on: 12/22/2024.
      DURATION        : 00:13:22.844000000
    Stream #0:1(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
    Metadata:
      DURATION        : 00:13:22.921000000

The Insects' Christmas [CXYpvBS7RWE].webm --
Input #0, matroska,webm, from 'The Insects' Christmas [CXYpvBS7RWE].webm':
  Metadata:
    ENCODER         : Lavf58.29.100
  Duration: 00:06:35.04, start: -0.007000, bitrate: 453 kb/s
    Stream #0:0(eng): Video: vp9 (Profile 0), yuv420p(tv, smpte170m/smpte170m/bt709), 640x480, SAR 1:1 DAR 4:3, 24 fps, 24 tbr, 1k tbn, 1k tbc (default)
    Metadata:
      DURATION        : 00:06:34.999000000
    Stream #0:1(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
    Metadata:
      DURATION        : 00:06:35.041000000

Reading that mess takes a bit of practice. The key things I'm looking for when I'm reading it are the "Duration" line (not "DURATION") and the "Stream #" lines. The first one has Duration: 00:13:22.92, which is zero hours, thirteen minutes, 22.92 seconds. Next I can see that each has two streams, Stream #0:0: Video and Stream #0:1(eng): Audio. That's a bog-standard combination. In this case, both of these were originally silent short films and the audio was added later.
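Since "Duration" and "Stream #" are the lines that matter most, a small filter can cut the noise. This is a minimal sketch, assuming the wrapper above is installed as ffid; ff_summary is a made-up helper name, not part of the ffmpeg package.

```shell
# Keep only the "Duration:" and "Stream #" lines of ffprobe-style output.
# grep is case-sensitive, so the per-stream "DURATION" metadata is skipped.
# ff_summary is a hypothetical helper name, not an ffmpeg tool.
ff_summary() {
  grep -E '^[[:space:]]*(Duration:|Stream #)'
}

# Intended use (assumes the ffid wrapper above is on your PATH):
#   ffid some-video-file.webm | ff_summary
```

Because ffid already sends everything to standard output, the filter works in an ordinary pipe.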

Another more complicated example:

Input #0, matroska,webm, from 'The.Seventh.Seal.1957.mkv':
  Metadata:
    title           : 
    encoder         : libebml v1.4.2 + libmatroska v1.6.4
    creation_time   : 2022-03-12T09:32:02.000000Z
  Duration: 01:36:59.28, start: 0.000000, bitrate: 37589 kb/s
    Chapter #0:0: start 0.000000, end 72.322000
    Metadata:
      title           : Logos / Opening Credits
    Chapter #0:1: start 72.322000, end 493.993000
    Metadata:
      title           : On the Beach
    Chapter #0:2: start 493.993000, end 984.650000
    Metadata:
      title           : Jof's Vision
    Chapter #0:3: start 984.650000, end 1638.053000
    Metadata:
      title           : At the Church
    Chapter #0:4: start 1638.053000, end 1939.896000
    Metadata:
      title           : The Deserted Village
    Chapter #0:5: start 1939.896000, end 2184.849000
    Metadata:
      title           : The Seduction of Skat
    Chapter #0:6: start 2184.849000, end 2574.029000
    Metadata:
      title           : The Procession of Flagellants
    Chapter #0:7: start 2574.029000, end 2920.334000
    Metadata:
      title           : Torture at the Tavern
    Chapter #0:8: start 2920.334000, end 3509.089000
    Metadata:
      title           : Strawberries and Milk at Dusk
    Chapter #0:9: start 3509.089000, end 3735.773000
    Metadata:
      title           : "Love is the blackest of all plagues"
    Chapter #0:10: start 3735.773000, end 4186.724000
    Metadata:
      title           : "The deadest actor I've ever seen"
    Chapter #0:11: start 4186.724000, end 4718.797000
    Metadata:
      title           : The Burning of the Witch
    Chapter #0:12: start 4718.797000, end 5161.239000
    Metadata:
      title           : "Mate at the next move"
    Chapter #0:13: start 5161.239000, end 5641.219000
    Metadata:
      title           : The Last Supper
    Chapter #0:14: start 5641.219000, end 5819.280000
    Metadata:
      title           : The Dance of Death
    Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 23.98 fps, 23.98 tbr, 1k tbn, 47.95 tbc (default)
    Metadata:
      title           : 
    Stream #0:1(swe): Audio: flac, 48000 Hz, mono, s16 (default)
    Metadata:
      title           : 
    Stream #0:2(swe): Audio: flac, 48000 Hz, mono, s32 (24 bit)
    Metadata:
      title           : Original / FLAC Audio / 1.0 / 48 kHz / 649 kbps / 24-bit
    Stream #0:3(eng): Audio: ac3, 48000 Hz, mono, fltp, 192 kb/s
    Metadata:
      title           : English Dub / Dolby Digital Audio / 1.0 / 48 kHz / 192 kbps
    Stream #0:4(eng): Audio: ac3, 48000 Hz, mono, fltp, 192 kb/s
    Metadata:
      title           : Commentary with film scholar Peter Cowie / Dolby Digital Audio / 1.0 / 48 kHz / 192 kbps
    Stream #0:5(eng): Audio: flac, 48000 Hz, stereo, s16
    Metadata:
      title           : Commentary with film critic Kat Ellinger / FLAC Audio / 2.0 / 48 kHz / 457 kbps / 16-bit
    Stream #0:6(eng): Subtitle: hdmv_pgs_subtitle, 1920x1080 (default)
    Metadata:
      title           : 
    Stream #0:7(eng): Subtitle: hdmv_pgs_subtitle
    Metadata:
      title           : English (UK / BFI)
    Stream #0:8(eng): Subtitle: hdmv_pgs_subtitle
    Metadata:
      title           : English (UK / Tartan)
    Stream #0:9(dut): Subtitle: dvd_subtitle, 1440x1080
    Stream #0:10(fre): Subtitle: hdmv_pgs_subtitle
    Stream #0:11(ger): Subtitle: hdmv_pgs_subtitle

This file has chapter information, including meaningful titles; the chapters use plain seconds for timestamps. "Strawberries and Milk at Dusk" is the ninth chapter (Chapter #0:8, since counting begins with zero) and runs from 2920.334000 to 3509.089000 (48:40.334 to 58:29.089). There are five audio streams: two in Swedish (Stream #0:1(swe) and Stream #0:2(swe)) and three in English (Stream #0:3(eng), a dubbed audio stream, plus Stream #0:4(eng) and Stream #0:5(eng), two commentary streams). There are six subtitle streams, five in hdmv_pgs_subtitle format and one in dvd_subtitle format; three are in English (eng), with one each in Dutch (dut), French (fre) and German (ger).
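The seconds-to-clock conversion for chapter times is easy to get wrong by hand. Here is a small sketch that does it with awk; secs_to_hms is a hypothetical helper name, and the two values are the chapter start and end quoted above.

```shell
# Convert plain seconds (as used in ffprobe chapter lines) to HH:MM:SS.mmm.
# Works in whole milliseconds to avoid floating point rounding surprises.
# secs_to_hms is a hypothetical helper name.
secs_to_hms() {
  awk -v s="$1" 'BEGIN {
    ms = int(s * 1000 + 0.5)
    printf "%02d:%02d:%02d.%03d\n",
      int(ms / 3600000), int(ms % 3600000 / 60000),
      int(ms % 60000 / 1000), ms % 1000
  }'
}

secs_to_hms 2920.334000   # prints 00:48:40.334
secs_to_hms 3509.089000   # prints 00:58:29.089
```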

About subtitles. I'm not really sure what dvd_subtitle streams are, just that they are not too much trouble. However, the hdmv_pgs_subtitle ones are hard to work with. Sometimes these are called "bluray" subtitles. They are images to be superimposed on the video. This is awkward if you want to resize the video, because those images are not as easily resized. The easiest subtitles to work with are "subrip" (e.g. Stream #0:2(eng): Subtitle: subrip (default)). Subrips are embedded text strings, to be rendered and superimposed at playback time.

Attached pictures. Sometimes you'll find other things in a video file. It's possible to embed a thumbnail still image, and yt-dlp has options to do that if you want the "cover" art from YouTube. Such an embed might look like Stream #0:3: Video: png, rgba(pc), 640x346, 90k tbr, 90k tbn, 90k tbc (attached pic). Note how it lies about being "video". This can be trouble.

Let's say you are me and you like to archive stuff. However archive quality is not necessarily ideal for everyday use. My laptop screen is 1920x1040 and my phone screen is smaller. If I watch a 3840x2076 video on either, I'll get stuttering because the pitiful GPU can't keep up with realtime. So a rescale in advance of playing the file makes life pleasant. But some of those streams cause problems for me.

Simplest case scaling. In the best case, I can have ffmpeg scale the video, copy the audio as-is, copy the subtitles as-is, and have a useful smaller video file. Here's my method for doing that:

outsize=1280x720  # small: 720x480, medium: 1280x720, large: 1920x1080

# .suf here can be .mp4, .mkv, .webm, .ogv, etc. Usually I will keep the
# same suffix from input to output. 
input=some-video-file.HUGE.suf
output=some-video-file.$outsize.suf

nice ffmpeg \
        -i "$input" \
        -s $outsize \
        -map 0 \
        -acodec copy \
        -scodec copy \
        "$output" \
        < /dev/null > /tmp/ffmpeg.out 2>&1

Here's what's happening. I specify the input file and the size to rescale the video to; those two options should be obvious. Next comes -map 0, which tells ffmpeg to carry over every stream from the first input (input number 0) instead of picking just one video, one audio, and one subtitle stream as it does by default; I'm still a little hazy on the finer points of how it works. The -acodec copy will copy the audio streams without re-encoding them, and the -scodec copy will do the same for subtitle streams such as subrip. The suffix in the $output file name tells ffmpeg which video file format to use. I remap standard input to /dev/null, so ffmpeg won't read from the keyboard, and I send all output to /tmp/ffmpeg.out so it won't be too noisy in my terminal. I can check progress with a command like this (which I have as an alias):

tail -400c /tmp/ffmpeg.out | grep frame= ; echo

That will give output like:

frame=93001 fps= 17 q=28.0 size=  776495kB time=01:02:00.41 bitrate=1709.8kbits/s speed=0.672x    

The line tells me it is at the 93001st individual frame of the input, just over one hour through the source (time=01:02:00.41). It is running at 67.2% of real time (speed=0.672x). If I checked the Duration before running, I'll have a good idea how much is left to process. Or I can grep for the Duration: from the tmp file.
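The "how much is left" arithmetic can be scripted. The sketch below pulls the last time= value out of a progress line and compares it with a total Duration. Both function names are hypothetical helpers, and the two-hour total is an assumed figure for illustration, not the real length of the file in the sample.

```shell
# Convert HH:MM:SS.ss to seconds.
hms_to_secs() {
  printf '%s\n' "$1" | awk -F: '{ printf "%.2f\n", $1 * 3600 + $2 * 60 + $3 }'
}

# Print percent complete, given ffmpeg progress text and the total Duration
# reported by ffprobe. ffmpeg separates progress updates with carriage
# returns, so split on those and keep the most recent time= value.
ff_percent_done() {
  at=$(printf '%s\n' "$1" | tr '\r' '\n' |
       sed -n 's/.*time=\([0-9:.]*\).*/\1/p' | tail -n 1)
  awk -v d="$(hms_to_secs "$at")" -v t="$(hms_to_secs "$2")" \
    'BEGIN { printf "%.1f\n", 100 * d / t }'
}

line='frame=93001 fps= 17 q=28.0 size=  776495kB time=01:02:00.41 bitrate=1709.8kbits/s speed=0.672x'
# The total below is an assumed Duration, for illustration only.
ff_percent_done "$line" 02:00:00.00   # prints 51.7
```

Against a live run it could be called as ff_percent_done "$(tail -400c /tmp/ffmpeg.out)" followed by the Duration value taken from ffid.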

A Note on Concurrency. I'm writing from the perspective of someone who will never have multiple concurrent ffmpeg operations going on, so I have made no attempt to create code that is safe for concurrent use. It's always using the same file names, and if there were a second instance the output would be clobbered.
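For anyone who does need to run several jobs at once, one possible approach is a separate log file per run. This is only a sketch: it assumes the common mktemp utility is available, and new_ff_log is a made-up helper name.

```shell
# Give each run its own log file instead of the shared /tmp/ffmpeg.out.
# mktemp creates the file and prints its unique name.
# new_ff_log is a hypothetical helper name; mktemp is assumed to be installed.
new_ff_log() {
  mktemp "${TMPDIR:-/tmp}/ffmpeg.XXXXXX"
}

log=$(new_ff_log)
echo "logging to $log"
# Then end the ffmpeg command with:  < /dev/null > "$log" 2>&1
rm -f "$log"   # clean-up for this demonstration only
```

The progress check would then tail "$log" rather than the fixed file name.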

Complications. What if there is an embedded image or those bluray subtitles? The ffmpeg command fails fast. Here's an example for the embedded image case, with "[ ... ]" standing in for some overly verbose output.

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'has-cover-art.mp4':
[ ... ]
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))
  Stream #0:1 -> #0:1 (copy)
  Stream #0:2 -> #0:2 (copy)
  Stream #0:3 -> #0:3 (png (native) -> h264 (libx264))
Press [q] to stop, [?] for help
[ ... ]
[mp4 @ 0x56216635bc00] Could not find tag for codec h264 in stream #3, codec not currently supported in container
Could not write header for output file #0 (incorrect codec parameters ?): Invalid argument
Error initializing output stream 0:0 -- 
Conversion failed!

The stream mapping bit shows that the PNG "video" is going to be converted to h264 video, but then when conversion is attempted, it fails. Ugh! Thwarted!

Time for the mysterious -map to come back. With a -map -0:3 we can tell ffmpeg to just not copy stream #0:3. Note the placement of the option. The order of arguments matters to ffmpeg, and in this case we want the second, drop-a-stream -map to come sometime after the first one.

# $input/$output/$outsize set as above

# Scale video, copying everything else except stream #0:3 (the fourth stream)
nice ffmpeg \
        -i "$input" \
        -s $outsize \
        -map 0 \
        -acodec copy \
        -scodec copy \
        -map -0:3 \
        "$output" \
        < /dev/null > /tmp/ffmpeg.out 2>&1
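The stream number to drop will differ from file to file. As a sketch, the text that ffid prints can be scanned for "(attached pic)" and turned into the matching -map options. attached_pic_unmaps is a hypothetical helper name, and the first sample line below is invented for the demonstration.

```shell
# Read ffprobe-style text on stdin and print a "-map -I:S" option for every
# stream flagged "(attached pic)". attached_pic_unmaps is a hypothetical name.
attached_pic_unmaps() {
  sed -n 's/^[[:space:]]*Stream #\([0-9]*:[0-9]*\).*(attached pic).*/-map -\1/p'
}

# Example: the first line is a made-up ordinary video stream, the second is
# the attached picture line quoted earlier.
printf '%s\n' \
  '    Stream #0:0(und): Video: h264 (High), yuv420p, 1280x720 (default)' \
  '    Stream #0:3: Video: png, rgba(pc), 640x346, 90k tbr, 90k tbn, 90k tbc (attached pic)' |
  attached_pic_unmaps
# prints: -map -0:3
```

The printed options would go after -map 0 in the command, the same place as the hand-written -map -0:3.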

Okay, what about the bluray subtitle issue? I have two methods, which I select between based on how important the subtitles are to me. If it is an English language film with English language subtitles, then I don't really care, and I'll just drop them from the output. Again a mysterious -map option (and again order matters). Here we drop all subtitles with -map -0:s

# Scale, copy audio but omit all subtitle streams
nice ffmpeg \
        -i "$input" \
        -s $outsize \
        -map 0 \
        -acodec copy \
        -map -0:s \
        "$output" \
        < /dev/null > /tmp/ffmpeg.out 2>&1

My second method is a "cargo-cult" filter that tells ffmpeg to superimpose the subtitles on video before scaling. This will create an output video that has subtitles which cannot be toggled on/off but are always on. I don't fully understand this, but it has worked for me. This will not copy any subtitles as separate streams into the output (for cases where there are multiple subtitles) but it will copy all of the audio streams.

# Scale with burned in subtitles.

# Roughly I can see that this will scale the video to the $outsize
# setting I have, and will take the first subtitle stream ("0:s:0")
# scale it by the same amount as the video ("scale2ref"),
# and "overlay" it on the video stream. But I could not write this
# filter.
filter="[0:v]scale=$outsize[ref];[0:s:0][ref]scale2ref[sub][vid];[vid][sub]overlay[v]"

nice ffmpeg \
        -i "$input" \
        -filter_complex "$filter" \
        -map '[v]' \
        -map 0:a \
        -acodec copy \
        "$output" \
        < /dev/null > /tmp/ffmpeg.out 2>&1

Additional manipulations. What if you want to swap the order of two streams? Use -map to copy them in a specific order: -map 0:a:1 -map 0:a:0 says to copy audio stream 1 first, then audio stream 0. If you additionally want to change which stream is flagged default, for players that know to respect that, you'll need a bit more magic to change the "disposition" of the streams.

# Scale video and swap first two audio streams
nice ffmpeg \
        -i "$input" \
        -vf scale=$outsize \
        -map 0:v \
        -map 0:s \
        -map 0:a:1 -map 0:a:0 \
        -disposition:a:0 default \
        -disposition:a:1 none \
        -acodec copy \
        "$output" \
        < /dev/null > /tmp/ffmpeg.out 2>&1

What if you have a file that someone split up into multiple parts? Well, if the streams all match in codec, count, order, etc., you can join the parts back together, but it will take a config file. This is what I use. It runs pretty fast. With scaling, I expect to see speed=0.45 to speed=1.5, but with this I expect more like speed=100 or more.

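# Concatenate compatible video files with no other modifications.
# Note: the concat demuxer resolves relative names against the directory
# holding the list file, so use absolute paths (or keep the list file
# next to the videos) when $ffconfig lives in /tmp.
inputs="file.1.mp4 file.2.mp4 file.3.mp4"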

ffconfig=/tmp/ff.config.txt

for input in $inputs ; do
  printf "file '%s'\n" "$input"
done > $ffconfig

ffmpeg \
        -f concat \
        -safe 0 \
        -i $ffconfig \
        -c copy \
        "$output" \
        < /dev/null > /tmp/ffmpeg.out 2>&1

rm $ffconfig
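The loop above splits $inputs on whitespace, so it will trip over file names containing spaces or single quotes. Below is a sketch of a more defensive list writer. concat_list is a hypothetical helper name, and the quoting rule (close the quote, add an escaped quote, reopen) follows my reading of ffmpeg's quoting documentation, so treat it as an assumption to verify.

```shell
# Write concat-demuxer "file" lines for each argument, one per line.
# Passing names as separate arguments keeps spaces intact, and a single
# quote inside a name is written as '\'' (close quote, escaped quote, reopen).
# concat_list is a hypothetical helper name.
concat_list() {
  for f in "$@" ; do
    escaped=$(printf '%s\n' "$f" | sed "s/'/'\\\\''/g")
    printf "file '%s'\n" "$escaped"
  done
}

concat_list "part one.mp4" "it's part two.mp4"
```

The output would be redirected to the config file in place of the printf loop, for example concat_list *.mp4 > "$ffconfig".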

Create a fragment. Of course, one can go in the opposite direction, and create a video from a slice of another. There are three particularly helpful flags for this:

  1. -ss TIME specify a starting time (defaults to very beginning)
  2. -to TIME specify an ending time (defaults to very end)
  3. -t TIME specify a time duration (defaults to all remaining)

The ending time and time duration flags should not be used together. Pick whichever one makes sense for the slice you are making and use that. The relative times (start and ending) relate to the input file. So -ss 60 -to 120 copies a minute of input, starting sixty seconds in, and -ss 60 -t 60 would do the same.

# Copy all streams, 30 seconds duration starting 125 seconds in
ffmpeg \
        -i "$input" \
        -ss 125 \
        -t 30 \
        -map 0 \
        -c copy \
        "$output" \
        < /dev/null > /tmp/ffmpeg.out 2>&1

# Copy just video, 10 seconds to 25 seconds (15 seconds worth)
ffmpeg \
        -i "$input" \
        -ss 10 \
        -to 25 \
        -map 0:v \
        -c copy \
        "$output" \
        < /dev/null > /tmp/ffmpeg.out 2>&1
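The chapter listing shown earlier already has start and end times in the plain seconds that -ss and -to accept. As a sketch, a chapter can be turned into slice options by scanning the ffid output; chapter_args is a hypothetical helper name.

```shell
# Read ffprobe-style text on stdin and print "-ss START -to END" for the
# chapter with the given index (counting from zero, as ffprobe does).
# chapter_args is a hypothetical helper name.
chapter_args() {
  sed -n "s/^[[:space:]]*Chapter #[0-9]*:$1: start \([0-9.]*\), end \([0-9.]*\).*/-ss \1 -to \2/p"
}

printf '%s\n' '    Chapter #0:8: start 2920.334000, end 3509.089000' |
  chapter_args 8
# prints: -ss 2920.334000 -to 3509.089000
```

In a real command the result would be left unquoted so the shell splits it into separate options, e.g. ffmpeg -i "$input" $(ffid "$input" | chapter_args 8) -map 0 -c copy "$output".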

Rough bonus one: create an animated GIF from video with a two pass process. In this example, the input file (stamp.mkv) was a slice created without sound using a command line very similar to the preceding example. I like to do it that way to preview the exact slice I'm getting quickly (speed=900) before I begin the much slower convert to GIF process.

# First pass: Find an optimal color palette: GIF is limited to 256 colors
# The input file is 30 seconds of video at 640x480 resolution 23.98
# frames per second, no audio or other streams. It is 3.3 megabytes.
ffmpeg \
        -i stamp.mkv \
        -filter_complex "[0:v] palettegen" \
        palette.png

# palette.png is 16x16 pixels and under 1 kilobyte. It took 9.7
# seconds to be generated.

# Second pass: scale, quantize using palette, and encode as GIF.
# The output file is 10 frames per second, 320x240 pixels.
ffmpeg \
        -i stamp.mkv \
        -i palette.png \
        -filter_complex "[0:v] fps=10,scale=320x240 [new];[new][1:v] paletteuse" \
        stamp.gif

# stamp.gif is 9.2 megabytes and took 7.4 seconds to be generated.

And bonus two: turn a bunch of jpeg files into a video. For this I have used various tools (camera on tripod, stop motion animation, then clean up of individual pictures) to create a sequence of frames which I named f-0001.jpg, f-0002.jpg, f-0003.jpg, ... all in a directory called filmparts. I used hard links when I wanted a frame to be duplicated. I decided I wanted a soundless film at 24 frames per second and 1500k bits per second. The input specifier uses printf style notation to indicate how to find the frames.

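# -framerate before -i: read frames at 24 fps (default is 25, and a bare -r 24 would drop frames)
ffmpeg -framerate 24 -i "filmparts/f-%04d.jpg" -b:v 1500k mov-24fps-1500.mp4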
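Before encoding, it can help to know how long the film will run. This sketch counts the frame files and divides by the frame rate; film_seconds is a hypothetical helper name, and it assumes the f-NNNN.jpg naming described above.

```shell
# Count the f-*.jpg frames in a directory and print the running time at a
# given frame rate. Hard-linked duplicates count as separate frames, which
# is what we want. film_seconds is a hypothetical helper name.
film_seconds() {
  count=0
  for frame in "$1"/f-*.jpg ; do
    if [ -e "$frame" ] ; then
      count=$((count + 1))
    fi
  done
  awk -v n="$count" -v fps="$2" \
    'BEGIN { printf "%d frames, %.2f seconds\n", n, n / fps }'
}

# Intended use:
#   film_seconds filmparts 24
```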

All of this knowledge has been built through years of web searches on ffmpeg "recipes" and fragmentary reading of the ffmpeg manpage. I have hardly scratched the surface of what ffmpeg can do, but I feel a lot more confident in my usage these days.