QZ qz thoughts
a blog from Eli the Bearded

On Using ffmpeg


The ffmpeg package (with ffmpeg and associated other tools) is an enormously powerful suite for dealing with video files. Video files are, however, very complicated and ffmpeg command line options are enormously complicated to match.

Slowly I have been learning to understand the command line for the things I want to do with video files. The first thing to know is "What are you dealing with?" and the ffprobe tool can tell you that. But that's a verbose tool which unhelpfully sends all output to standard error. I cope with a simple wrapper script:

#!/bin/sh

if [ "$#" = 0 ] ; then
  echo "Usage: ffid videofile [videofile ...]"
  echo "Ease of use wrapper for 'ffprobe'."
  exit
fi

blank=

for file ; do
  if [ "$blank" = yes ] ; then
    # blank line between files
    echo ""
  fi

  if [ "$#" != 1 ] ; then
    # show file names when more than one
    echo "$file --"
    blank=yes
  fi

  # Don't show compile settings ("-hide_banner") and send all
  # stderr to stdout for grep, etc
  ffprobe -hide_banner "$file" 2>&1
done

Sample output for some old stop-motion videos downloaded from Youtube. (yt-dlp for the win)

The Cameraman's Revenge (1912) animation [U424m8utJnA].webm --
Input #0, matroska,webm, from 'The Cameraman's Revenge (1912) animation [U424m8utJnA].webm':
  Metadata:
    COMPATIBLE_BRANDS: iso6mp41
    MAJOR_BRAND     : dash
    MINOR_VERSION   : 0
    ENCODER         : Lavf58.29.100
  Duration: 00:13:22.92, start: -0.007000, bitrate: 377 kb/s
    Stream #0:0: Video: vp9 (Profile 0), yuv420p(tv, bt709), 480x360, SAR 1:1 DAR 4:3, 23.98 fps, 23.98 tbr, 1k tbn, 1k tbc (default)
    Metadata:
      HANDLER_NAME    : ISO Media file produced by Google Inc. Created on: 12/22/2024.
      DURATION        : 00:13:22.844000000
    Stream #0:1(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
    Metadata:
      DURATION        : 00:13:22.921000000

The Insects' Christmas [CXYpvBS7RWE].webm --
Input #0, matroska,webm, from 'The Insects' Christmas [CXYpvBS7RWE].webm':
  Metadata:
    ENCODER         : Lavf58.29.100
  Duration: 00:06:35.04, start: -0.007000, bitrate: 453 kb/s
    Stream #0:0(eng): Video: vp9 (Profile 0), yuv420p(tv, smpte170m/smpte170m/bt709), 640x480, SAR 1:1 DAR 4:3, 24 fps, 24 tbr, 1k tbn, 1k tbc (default)
    Metadata:
      DURATION        : 00:06:34.999000000
    Stream #0:1(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
    Metadata:
      DURATION        : 00:06:35.041000000

Reading that mess takes a bit of practice. The key things I'm looking for when I'm reading it are the "Duration" line (not "DURATION") and the "Stream #" lines. The first one has Duration: 00:13:22.92, which is zero hours, thirteen minutes, 22.92 seconds. Next I can see that each has two streams, Stream #0:0: Video and Stream #0:1(eng): Audio. That's a bog standard combination. In this case, both of these were originally silent short films and the audio was added later.
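
Since those are usually the only lines that matter, the wrapper's output can be filtered down to them. This is a small sketch, not part of the script above: the function name ffsummary is made up for illustration, and it assumes the ffprobe text layout shown in the samples.

```sh
# Keep only the container-level "Duration:" line and the "Stream #" lines.
# The match is case sensitive, so the per-stream "DURATION" metadata
# lines are skipped.
ffsummary() {
  grep -E '^ *(Duration:|Stream #)'
}

# Hypothetical usage:
#   ffid some-video.webm | ffsummary
```

For the two-stream files above, that should leave three lines per file: one Duration line and two Stream lines.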

Another more complicated example:

Input #0, matroska,webm, from 'The.Seventh.Seal.1957.mkv':
  Metadata:
    title           : 
    encoder         : libebml v1.4.2 + libmatroska v1.6.4
    creation_time   : 2022-03-12T09:32:02.000000Z
  Duration: 01:36:59.28, start: 0.000000, bitrate: 37589 kb/s
    Chapter #0:0: start 0.000000, end 72.322000
    Metadata:
      title           : Logos / Opening Credits
    Chapter #0:1: start 72.322000, end 493.993000
    Metadata:
      title           : On the Beach
    Chapter #0:2: start 493.993000, end 984.650000
    Metadata:
      title           : Jof's Vision
    Chapter #0:3: start 984.650000, end 1638.053000
    Metadata:
      title           : At the Church
    Chapter #0:4: start 1638.053000, end 1939.896000
    Metadata:
      title           : The Deserted Village
    Chapter #0:5: start 1939.896000, end 2184.849000
    Metadata:
      title           : The Seduction of Skat
    Chapter #0:6: start 2184.849000, end 2574.029000
    Metadata:
      title           : The Procession of Flagellants
    Chapter #0:7: start 2574.029000, end 2920.334000
    Metadata:
      title           : Torture at the Tavern
    Chapter #0:8: start 2920.334000, end 3509.089000
    Metadata:
      title           : Strawberries and Milk at Dusk
    Chapter #0:9: start 3509.089000, end 3735.773000
    Metadata:
      title           : "Love is the blackest of all plagues"
    Chapter #0:10: start 3735.773000, end 4186.724000
    Metadata:
      title           : "The deadest actor I've ever seen"
    Chapter #0:11: start 4186.724000, end 4718.797000
    Metadata:
      title           : The Burning of the Witch
    Chapter #0:12: start 4718.797000, end 5161.239000
    Metadata:
      title           : "Mate at the next move"
    Chapter #0:13: start 5161.239000, end 5641.219000
    Metadata:
      title           : The Last Supper
    Chapter #0:14: start 5641.219000, end 5819.280000
    Metadata:
      title           : The Dance of Death
    Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 23.98 fps, 23.98 tbr, 1k tbn, 47.95 tbc (default)
    Metadata:
      title           : 
    Stream #0:1(swe): Audio: flac, 48000 Hz, mono, s16 (default)
    Metadata:
      title           : 
    Stream #0:2(swe): Audio: flac, 48000 Hz, mono, s32 (24 bit)
    Metadata:
      title           : Original / FLAC Audio / 1.0 / 48 kHz / 649 kbps / 24-bit
    Stream #0:3(eng): Audio: ac3, 48000 Hz, mono, fltp, 192 kb/s
    Metadata:
      title           : English Dub / Dolby Digital Audio / 1.0 / 48 kHz / 192 kbps
    Stream #0:4(eng): Audio: ac3, 48000 Hz, mono, fltp, 192 kb/s
    Metadata:
      title           : Commentary with film scholar Peter Cowie / Dolby Digital Audio / 1.0 / 48 kHz / 192 kbps
    Stream #0:5(eng): Audio: flac, 48000 Hz, stereo, s16
    Metadata:
      title           : Commentary with film critic Kat Ellinger / FLAC Audio / 2.0 / 48 kHz / 457 kbps / 16-bit
    Stream #0:6(eng): Subtitle: hdmv_pgs_subtitle, 1920x1080 (default)
    Metadata:
      title           : 
    Stream #0:7(eng): Subtitle: hdmv_pgs_subtitle
    Metadata:
      title           : English (UK / BFI)
    Stream #0:8(eng): Subtitle: hdmv_pgs_subtitle
    Metadata:
      title           : English (UK / Tartan)
    Stream #0:9(dut): Subtitle: dvd_subtitle, 1440x1080
    Stream #0:10(fre): Subtitle: hdmv_pgs_subtitle
    Stream #0:11(ger): Subtitle: hdmv_pgs_subtitle

This file has chapter information, including meaningful titles; these use plain seconds for timestamps. "Strawberries and Milk at Dusk" is the ninth chapter (Chapter #0:8, but counting begins with zero) and runs from 2920.334000 to 3509.089000 (48:40.334 to 58:29.089). There are five audio streams: two in Swedish (Stream #0:1(swe) and Stream #0:2(swe)) and three in English (Stream #0:3(eng), a dubbed audio stream, plus Stream #0:4(eng) and Stream #0:5(eng), two commentary streams). There are six subtitle streams, five in hdmv_pgs_subtitle format and one in dvd_subtitle format; three in English (eng), one each in Dutch (dut), French (fre) and German (ger).
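
The conversion from chapter seconds to clock time is plain arithmetic, and a sketch of it in shell follows. The function name secs_to_hms is hypothetical; it leans on awk for the math and rounds to the nearest millisecond.

```sh
# Convert plain seconds (as used in the Chapter lines) to HH:MM:SS.mmm.
secs_to_hms() {
  awk -v s="$1" 'BEGIN {
    ms  = int(s * 1000 + 0.5)             # whole milliseconds, rounded
    h   = int(ms / 3600000)
    m   = int((ms % 3600000) / 60000)
    sec = int((ms % 60000) / 1000)
    printf "%02d:%02d:%02d.%03d\n", h, m, sec, ms % 1000
  }'
}
```

Running secs_to_hms 2920.334 prints 00:48:40.334, and secs_to_hms 3509.089 prints 00:58:29.089, matching the chapter times given for "Strawberries and Milk at Dusk".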

About subtitles. I'm not really sure what dvd_subtitle streams are, just that they are not too much trouble. However, the hdmv_pgs_subtitle streams are hard to work with. Sometimes these are called "bluray" subtitles. They are images to be superimposed on the video. This is awkward if you want to resize the video, because those images are not as easily resized. The easiest subtitles to work with are "subrip" (eg Stream #0:2(eng): Subtitle: subrip (default)). Subrips are embedded text strings, to be rendered and superimposed at playback time.
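
To see at a glance which kind of subtitles a file carries, the Stream lines can be sorted into text and image formats. A rough sketch follows. The name subtitle_kinds is made up, the codec lists are an assumption covering common cases rather than a complete table, and as far as I know dvd_subtitle belongs with the image formats.

```sh
# Read ffprobe-style text on stdin and print one line per subtitle
# stream: stream id, codec name, and a guess at text vs image.
subtitle_kinds() {
  sed -n 's/^ *Stream #\([0-9]*:[0-9]*\)[^ ]* Subtitle: \([A-Za-z0-9_]*\).*/\1 \2/p' |
  while read -r id codec ; do
    case "$codec" in
      subrip|ass|ssa|mov_text|webvtt)              kind=text ;;
      hdmv_pgs_subtitle|dvd_subtitle|dvb_subtitle) kind=image ;;
      *)                                           kind=unknown ;;
    esac
    echo "$id $codec $kind"
  done
}

# Hypothetical usage:
#   ffid The.Seventh.Seal.1957.mkv | subtitle_kinds
```

Anything reported as image is a candidate for being dropped or burned in before scaling.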

Attached pictures. Sometimes you'll find other things in a video file. It's possible to embed a thumbnail still image, and yt-dlp has options to do that if you want the "cover" art from Youtube. Such an embed might look like Stream #0:3: Video: png, rgba(pc), 640x346, 90k tbr, 90k tbn, 90k tbc (attached pic). Note how it lies about being "video". This can be trouble.

Let's say you are me and you like to archive stuff. However archive quality is not necessarily ideal for everyday use. My laptop screen is 1920x1040 and my phone screen is smaller. If I watch a 3840x2076 video on either, I'll get stuttering because the pitiful GPU can't keep up with realtime. So a rescale in advance of playing the file makes life pleasant. But some of those streams cause problems for me.

Simplest case scaling. In the best case, I can have ffmpeg scale the video, copy the audio as-is, copy the subtitles as-is, and have a useful smaller video file. Here's my method for doing that:

outsize=1280x720  # small: 720x480, medium: 1280x720, large: 1920x1080

# .suf here can be .mp4, .mkv, .webm, .ogv, etc. Usually I will keep the
# same suffix from input to output. 
input=some-video-file.HUGE.suf
output=some-video-file.$outsize.suf

nice ffmpeg \
        -i "$input" \
        -s $outsize \
        -map 0 \
        -acodec copy \
        -scodec copy \
        "$output" \
        < /dev/null > /tmp/ffmpeg.out 2>&1

Here's what's happening. I specify the input file and the size to rescale the video to; those two options should be obvious. Next is -map 0, which has ffmpeg carry over every stream from the first input instead of its default pick of one stream per type, but I'm still a little hazy on how it works. The -acodec copy will copy the audio streams without re-encoding them, and the -scodec copy will copy subrip subtitles without re-encoding them. The suffix in the $output file will tell ffmpeg the video file format to use. I remap standard input to /dev/null, so ffmpeg won't read from the keyboard, and I send all output to /tmp/ffmpeg.out so it won't be too noisy in my terminal. I can check progress with a command like this (which I have as an alias):

tail -400c /tmp/ffmpeg.out | grep frame= ; echo

That will give output like:

frame=93001 fps= 17 q=28.0 size=  776495kB time=01:02:00.41 bitrate=1709.8kbits/s speed=0.672x    

The line tells me it is at the 93001st individual frame of the input, just over one hour through the source (time=01:02:00.41). It is running at 67.2% of real time (speed=0.672x). If I checked the Duration before running, I'll have a good idea how much is left to process. Or I can grep for the Duration: from the tmp file.
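
The "how much is left" estimate can be worked out from the time= value, the Duration and the speed. This is a sketch with made-up function names (hms_to_secs, progress_pct, eta_secs), and it assumes the speed stays roughly constant for the rest of the run.

```sh
# Convert HH:MM:SS.ss to seconds.
hms_to_secs() {
  echo "$1" | awk -F: '{ printf "%.2f\n", $1 * 3600 + $2 * 60 + $3 }'
}

# Percent complete: $1 is the time= value, $2 is the Duration value.
progress_pct() {
  awk -v cur="$(hms_to_secs "$1")" -v total="$(hms_to_secs "$2")" \
    'BEGIN { printf "%.1f\n", 100 * cur / total }'
}

# Wall-clock seconds remaining: $3 is the speed= value without the "x".
eta_secs() {
  awk -v cur="$(hms_to_secs "$1")" -v total="$(hms_to_secs "$2")" -v speed="$3" \
    'BEGIN { printf "%.0f\n", (total - cur) / speed }'
}
```

With the progress line above and the 01:36:59.28 Duration of the Seventh Seal file, progress_pct 01:02:00.41 01:36:59.28 prints 63.9 and eta_secs 01:02:00.41 01:36:59.28 0.672 prints 3123, or about 52 minutes to go.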

A Note on Concurrency. I'm writing from the perspective of someone who will never have multiple concurrent ffmpeg operations going on, so I have made no attempt to create code that is safe for concurrent use. It's always using the same file names, and if there were a second instance the output would be clobbered.
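
For anyone who does need two runs at once, one option is a log file per run. A minimal sketch, assuming the mktemp utility is available (it is not strictly POSIX, but it is common); new_fflog is a made-up name.

```sh
# Create a fresh, uniquely named log file and print its path.
# mktemp replaces the XXXXXX part of the template.
new_fflog() {
  mktemp "${TMPDIR:-/tmp}/ffmpeg.XXXXXX"
}

# Hypothetical usage:
#   fflog=$(new_fflog) || exit 1
#   nice ffmpeg -i "$input" ... "$output" < /dev/null > "$fflog" 2>&1
#   tail -400c "$fflog" | grep frame= ; echo
```

The cost is that the progress alias has to be told which log to read, and the logs need cleaning up afterwards.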

Complications. What if there is an embedded image or those bluray subtitles? The ffmpeg command fails fast. Here's an example for the embedded image case, with "[ ... ]" standing in for overly verbose output.

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'has-cover-art.mp4':
[ ... ]
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))
  Stream #0:1 -> #0:1 (copy)
  Stream #0:2 -> #0:2 (copy)
  Stream #0:3 -> #0:3 (png (native) -> h264 (libx264))
Press [q] to stop, [?] for help
[ ... ]
[mp4 @ 0x56216635bc00] Could not find tag for codec h264 in stream #3, codec not currently supported in container
Could not write header for output file #0 (incorrect codec parameters ?): Invalid argument
Error initializing output stream 0:0 -- 
Conversion failed!

The stream mapping bit shows that the PNG "video" is going to be converted to h264 video, but then when conversion is attempted, it fails. Ugh! Thwarted!

Time for the mysterious -map to come back. With a -map -0:3 we can tell ffmpeg to just not copy stream #0:3. Note the placement of the option. The order of arguments matters to ffmpeg, and in this case we want the second, drop-a-stream -map to come sometime after the first one.

# $input/$output/$outsize set as above

# Scale video, copying everything else except stream #0:3 (the fourth stream)
nice ffmpeg \
        -i "$input" \
        -s $outsize \
        -map 0 \
        -acodec copy \
        -scodec copy \
        -map -0:3 \
        "$output" \
        < /dev/null > /tmp/ffmpeg.out 2>&1
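
The stream number to drop will differ from file to file, so it can be read out of the ffprobe text instead of typed by hand. This is a sketch under the assumption that the stream is labelled "(attached pic)" as in the earlier example; drop_attached_pics is a made-up name.

```sh
# Read ffprobe-style text on stdin and print a "-map -0:N" pair for
# every stream labelled "(attached pic)", one pair per line.
drop_attached_pics() {
  sed -n 's/^ *Stream #\([0-9]*:[0-9]*\).*(attached pic).*/-map -\1/p'
}

# Hypothetical usage, relying on the shell splitting the unquoted
# command substitution into separate arguments:
#   nice ffmpeg -i "$input" -s $outsize -map 0 \
#           $(ffid "$input" | drop_attached_pics) \
#           -acodec copy -scodec copy "$output"
```

When a file has no attached picture the function prints nothing, and the command falls back to the simplest case.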

Okay, what about the bluray subtitle issue? I have two methods, which I select between based on how important the subtitles are to me. If it is an English language film with English language subtitles, then I don't really care, and I'll just drop them from the output. Again a mysterious -map option (and again order matters). Here we drop all subtitles with -map -0:s

# Scale, copy audio but omit all subtitle streams
nice ffmpeg \
        -i "$input" \
        -s $outsize \
        -map 0 \
        -acodec copy \
        -map -0:s \
        "$output" \
        < /dev/null > /tmp/ffmpeg.out 2>&1

My second method is a "cargo-cult" filter that tells ffmpeg to superimpose the subtitles on video before scaling. This will create an output video that has subtitles which cannot be toggled on/off but are always on. I don't fully understand this, but it has worked for me. This will not copy any subtitles as separate streams into the output (for cases where there are multiple subtitles) but it will copy all of the audio streams.

# Scale with burned in subtitles.

# Roughly I can see that this will scale the video to the $outsize
# setting I have, and will take the first subtitle stream ("0:s:0")
# scale it by the same amount as the video ("scale2ref"),
# and "overlay" it on the video stream. But I could not write this
# filter.
filter="[0:v]scale=$outsize[ref];[0:s:0][ref]scale2ref[sub][vid];[vid][sub]overlay[v]"

nice ffmpeg \
        -i "$input" \
        -filter_complex "$filter" \
        -map '[v]' \
        -map 0:a \
        -acodec copy \
        "$output" \
        < /dev/null > /tmp/ffmpeg.out 2>&1
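
For a file with several subtitle streams, the same filter should work with a different stream picked by changing the "0:s:0" part. The sketch below only builds the filter string; burn_filter is a made-up name, and the filter text is copied from the one above with the size and subtitle index made into parameters.

```sh
# Print the burn-in filter for a given output size ($1) and subtitle
# stream index ($2, counted from zero among subtitle streams only).
burn_filter() {
  size=$1
  sub=${2:-0}
  printf '[0:v]scale=%s[ref];[0:s:%s][ref]scale2ref[sub][vid];[vid][sub]overlay[v]\n' \
    "$size" "$sub"
}

# Hypothetical usage:
#   filter=$(burn_filter "$outsize" 1)
```

In the Seventh Seal example, index 1 would be Stream #0:7, the second subtitle stream.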

Additional manipulations. What if you want to swap the order of two streams? Use -map to copy them in a specific order: -map 0:a:1 -map 0:a:0 says to copy audio stream 1 first, then audio stream 0. If you additionally want to change which stream is flagged default, for players that know to respect that, you'll need a bit more magic to change the "disposition" of the streams.

# Scale video and swap first two audio streams
nice ffmpeg \
        -i "$input" \
        -vf scale=$outsize \
        -map 0:v \
        -map 0:s \
        -map 0:a:1 -map 0:a:0 \
        -disposition:a:0 default \
        -disposition:a:1 none \
        -acodec copy \
        "$output" \
        < /dev/null > /tmp/ffmpeg.out 2>&1
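
With more than two audio streams the option list gets long, so it can be generated. This is a sketch, with audio_order_args as a made-up name. It reuses only the -map and -disposition forms shown above, and it assumes the first stream listed should be the default.

```sh
# Given audio stream indexes in the wanted output order, print the
# -map options followed by -disposition options that flag the first
# output audio stream as default and clear the rest.
audio_order_args() {
  n=0 ; maps= ; disps=
  for idx ; do
    if [ "$n" -eq 0 ] ; then flag=default ; else flag=none ; fi
    maps="$maps -map 0:a:$idx"
    disps="$disps -disposition:a:$n $flag"
    n=$((n + 1))
  done
  printf '%s\n' "${maps# }$disps"
}
```

Running audio_order_args 1 0 prints the same four options used in the command above: -map 0:a:1 -map 0:a:0 -disposition:a:0 default -disposition:a:1 none.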

What if you have a file that someone split up into multiple parts? Well, if the streams all match in codec, count, order, etc, you can join the parts back together, but it will take a config file. This is what I use. It runs pretty fast. With scaling, I expect to see speed=0.45 to speed=1.5, but with this I expect more like speed=100 or more.

# Concatenate compatible video files with no other modifications
inputs="file.1.mp4 file.2.mp4 file.3.mp4"

ffconfig=/tmp/ff.config.txt

for input in $inputs ; do
  printf "file '%s'\n" "$input"
done > $ffconfig

ffmpeg \
        -f concat \
        -safe 0 \
        -i $ffconfig \
        -c copy \
        "$output" \
        < /dev/null > /tmp/ffmpeg.out 2>&1

rm $ffconfig
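
The loop above is fine for simple names, but it will trip on file names with spaces or apostrophes (like the "Cameraman's Revenge" download earlier). Here is a sketch of a list writer that copes with both. The name concat_list is made up, and the escaping follows my reading of the ffmpeg quoting rules, where an apostrophe inside single quotes is written as '\''.

```sh
# Print one "file '...'" line per argument for the concat config,
# escaping any apostrophes in the names.
concat_list() {
  for f ; do
    escaped=$(printf '%s\n' "$f" | sed "s/'/'\\\\''/g")
    printf "file '%s'\n" "$escaped"
  done
}

# Hypothetical usage:
#   concat_list "part one.mp4" "part two.mp4" > "$ffconfig"
```

Taking the names as separate arguments, instead of one space-separated string, is what keeps names with spaces intact.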

Create a fragment. Of course, one can go in the opposite direction, and create a video from a slice of another. There are three particularly helpful flags for this:

  1. -ss TIME specify a starting time (defaults to very beginning)
  2. -to TIME specify an ending time (defaults to very end)
  3. -t TIME specify a time duration (defaults to all remaining)

The ending time and time duration flags should not be used together. Pick which one makes sense for the slice you are making and use that. The relative times (start and ending) relate to the input file. So -ss 60 -to 120 copies a minute of input, starting sixty seconds in, and -ss 60 -t 60 would do the same.

# Copy all streams, 30 seconds duration starting 125 seconds in
ffmpeg \
        -i "$input" \
        -ss 125 \
        -t 30 \
        -map 0 \
        -c copy \
        "$output" \
        < /dev/null > /tmp/ffmpeg.out 2>&1

# Copy just video, 10 seconds to 25 seconds (15 seconds worth)
ffmpeg \
        -i "$input" \
        -ss 10 \
        -to 25 \
        -map 0:v \
        -c copy \
        "$output" \
        < /dev/null > /tmp/ffmpeg.out 2>&1
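
Since -to and -t differ only by a subtraction, a start and end pair (say, from a Chapter line) can be turned into the -ss and -t form. This is a sketch with a made-up name, slice_args, and it takes times in plain seconds only.

```sh
# Print "-ss START -t DURATION" for a start ($1) and end ($2) given
# in seconds. Fails if the end is not after the start.
slice_args() {
  awk -v s="$1" -v e="$2" 'BEGIN {
    if ((e + 0) <= (s + 0)) { exit 1 }
    printf "-ss %s -t %.3f\n", s, e - s
  }'
}
```

So slice_args 60 120 prints -ss 60 -t 60.000, and slice_args 2920.334 3509.089 prints -ss 2920.334 -t 588.755, the length of the "Strawberries and Milk at Dusk" chapter.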

Rough bonus one: create animated GIF from video with a two pass process. In this example, the input file (stamp.mkv) was a slice created without sound using a command line very similar to the preceding example. I like to do it that way to preview the exact slice I'm getting quickly (speed=900) before I begin the much slower convert to GIF process.

# First pass: Find an optimal color palette: GIF is limited to 256 colors
# The input file is 30 seconds of video at 640x480 resolution 23.98
# frames per second, no audio or other streams. It is 3.3 megabytes.
ffmpeg \
        -i stamp.mkv \
        -filter_complex "[0:v] palettegen" \
         palette.png

# palette.png is 16x16 pixels and under 1 kilobyte. It took 9.7
# seconds to be generated.

# Second pass: scale, quantize using palette, and encode as GIF.
# The output file is 10 frames per second, 320x240 pixels.
ffmpeg \
        -i stamp.mkv \
        -i palette.png \
        -filter_complex "[0:v] fps=10,scale=320x240 [new];[new][1:v] paletteuse" \
        stamp.gif

# stamp.gif is 9.2 megabytes and took 7.4 seconds to be generated.
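
The two passes can be wrapped up so the frame rate and size are parameters. A sketch only: gif_filter and vid2gif are made-up names, the filter text is the one used above, and the palette goes in a throwaway directory made with mktemp -d (assumed to be available).

```sh
# Print the second-pass filter for a frame rate ($1) and size ($2).
gif_filter() {
  printf '[0:v] fps=%s,scale=%s [new];[new][1:v] paletteuse\n' "$1" "$2"
}

# Run both passes: vid2gif INPUT OUTPUT.gif [FPS] [WxH]
vid2gif() {
  in=$1 ; out=$2 ; fps=${3:-10} ; size=${4:-320x240}
  tmpdir=$(mktemp -d) || return 1
  pal="$tmpdir/palette.png"
  ffmpeg -i "$in" -filter_complex "[0:v] palettegen" "$pal" < /dev/null &&
  ffmpeg -i "$in" -i "$pal" \
         -filter_complex "$(gif_filter "$fps" "$size")" "$out" < /dev/null
  status=$?
  rm -rf "$tmpdir"
  return $status
}
```

Calling vid2gif stamp.mkv stamp.gif would repeat the example above with the same 10 frames per second and 320x240 settings.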

And bonus two: turn a bunch of jpeg files into a video. For this I have used various tools (camera on tripod, stop motion animation, then clean up of individual pictures) to create a sequence of frames which I named f-0001.jpg, f-0002.jpg, f-0003.jpg, ... all in a directory called filmparts. I used hard links when I wanted a frame to be duplicated. I decided I wanted a soundless film at 24 frames per second and 1500k bits per second. The input specifier uses printf style notation to indicate how to find the frames.

ffmpeg -i "filmparts/f-%04d.jpg" -r 24 -b:v 1500k mov-24fps-1500.mp4
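
One assumption worth checking in that command: as far as I understand the image sequence reader, it treats the input as 25 frames per second unless the input option -framerate says otherwise, so -r 24 on the output side may drop an occasional frame to get from 25 to 24. Putting -framerate 24 before -i should make every jpeg exactly one frame. The sketch below (film_seconds is a made-up helper) counts the frames and reports the running time to expect, which makes a mismatch easy to spot in the Duration line afterwards.

```sh
# Count files named f-NNNN.jpg in directory $1 and print the running
# time they would give at $2 frames per second (default 24).
film_seconds() {
  dir=$1
  fps=${2:-24}
  n=$(ls "$dir" | grep -c '^f-[0-9][0-9][0-9][0-9]\.jpg$' || true)
  awk -v n="$n" -v fps="$fps" \
    'BEGIN { printf "%d frames, %.2f seconds\n", n, n / fps }'
}

# Variant of the encode with the rate set on the input side
# (not run here):
#   ffmpeg -framerate 24 -i "filmparts/f-%04d.jpg" -b:v 1500k mov-24fps-1500.mp4
```

For a directory of 48 frames, film_seconds filmparts 24 prints 48 frames, 2.00 seconds.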

All of this knowledge has been built through years of web searches on ffmpeg "recipes" and fragmentary reading of the ffmpeg manpage. I have hardly scratched the surface of what ffmpeg can do, but I feel a lot more confident in my usage these days.

SF Film Festival 2024


The San Francisco Film Festival gets smaller every year. It's now down to five days, plus a few more "encore" days. I watched four of the films in this year's lineup, one documentary and three fictional.

Porcelain War
Janet Planet
Dìdi
Thelma

Porcelain War is a much more hopeful film about the war in Ukraine than the recent Oscar-winning 20 Days in Mariupol, and possibly the most beautiful film you can imagine being made inside a country at war about the people trying to keep on with their lives there. After the film, both co-directors Brendan Bellomo and Slava Leontyev, plus cinematographer Andrey Stefanov (and little dog Frodo[*]), were there to talk about how the production was done. Bellomo had previously met these people and was planning a film about their art before war broke out. Then they changed the subject matter. Because of the deep involvement in the war by two artists, and the working relationship with a third artist, this is a lovely film about art in wartime with some slightly awkward cuts to artists waging war.

Four out of four bombs dropped by a whimsically painted DJI drone.

[*] No questions were directed to Frodo and he did not speak on his own.

Janet Planet is a film about an 11-year-old girl one summer in 1991. Her mom is divorced and has a series of relationships that constitute the major chapters of the film. It's from A24 and a first time movie director and follows in the footsteps of A24 giving a chance to first time directors to tell stories of girls (eg, Greta Gerwig's _Lady Bird_). This would fail a gender reversed Bechdel Test, if that sort of thing bothers you. What bothered me about the film was just how slow it is. At one point, near a major plot event, we see some blintzes get put in a microwave and cooked for 30 seconds. The camera just stays there immobile watching the food cook. I can see the message there: 1991, home alone in the woods, no friends, no video games, no cellphone, no cable TV: life is slow and you watch your food cook. It's not a thing kids today (and there were a lot of under 30s in the audience) really know. But it is a little tedious. Afterwards, over the next few days, I did find myself thinking a lot about the film, but boy was it slow to watch.

Two out of four frozen blintzes.

Dìdi makes an interesting contrast to Janet Planet. This is about a boy one spring and summer in 2008, at the end of 8th grade up to the first few days in high school, in Fremont, California. His dad works in Taiwan, and he lives with his mother, his paternal grandmother (Nai Nai), and his older sister, about to head off to dental school. The boy goes by different names with different people, "Dìdi" ("younger brother") to his family, "Wang Wang" to some of his friends, "Chris" to others. He is trying hard to fit in and become more popular, definitely falling into the tag-along friend role in his social groups, and having some rough times getting along with his family. And so the movie follows his failings and growth at that time when socialization was in person, on Myspace, on Facebook, on Youtube, and on text messages by flip phone.

Three out of four hasty Google searches.

Thelma is a delightful film that manages to deftly juggle the concerns of seniors (Thelma is 93 and played by a 94 year old actress) and a loving homage to Mission: Impossible. It starts with Thelma discussing Tom Cruise doing all his own stunts while they watch a clip from MI Fallout, which is meta-relevant in that June Squibb (long a supporting actress, now in her first starring role) did "many" of her stunts for Thelma. "Stunts" in the Hollywood union rules sense of stunts, at least, and not really challenging for the able-bodied. The story concerns a common scam that targets seniors, and Thelma's quest to recover her lost money after falling for it.

Three out of four hearing aids.

Links for other movies mentioned:
20 Days in Mariupol
Lady Bird
Mission: Impossible - Fallout

Banana bread


This is a very forgiving recipe that has a lot of room for slightly wrong amounts, different sized bananas, and tolerance for cooking time. It's a bit of a slow cook but not tricky at all. We make this a couple of times a month, sometimes doubling the recipe and freezing some.

Start off with

  • 1/2 cup vegetable oil
  • 1 cup sugar

Cream together. I use a stand mixer but whatever.

Add in and mix well:

  • 1 teaspoon baking soda
  • 1/2 teaspoon salt
  • 3 very ripe bananas

Next add and mix:

  • 2 cups flour (up to 1/2 cup whole wheat)
  • 1/2 cup chopped walnuts or chocolate chips

In our household, adults always use walnuts and kids always bake it with chocolate chips.

Pour into oiled/floured loaf pan. Or multiple pans. Or make muffins.

Bake 45 minutes to an hour at 350°F. Use the clean toothpick test. Smaller cooks faster, so cupcakes may be 30 minutes, big loaves closer to the hour mark.

Long Silence Again


With no feedback from people reading, it's really hard for me to maintain motivation for writing. So I stop and write in places I do get feedback. That's been Net News (eg Usenet and Usenet like) forever and Mastodon-flavored Fediverse recently.

I created a Mastodon account perhaps four years ago, but due to the "no feedback" thing, and not knowing anyone else on the platform, it got little use until Musk started his Twitter purchase and then Twitter destruction. So Twitter link no-more on Contact section and Fediverse QAZ link instead.

I've also revamped the robots.txt file, because of other Internet "enshittification". Google is useless as a search engine now, time to drop their bot. At the same time, I added some more "SEO" related bots and the one "AI" bot I've noticed ("ChatGPT", which people tell me is phonetically the same as the French phrase "Cat, I farted", « Chat, j'ai pété »).

Part of the prompt for robots.txt was a persistent highly personalized campaign from some Internet advertising company urging me to put ads on my site through their service. I suspect one of the "SEO" metrics companies was how I came to this guy's attention. Better to just block those bozos. Web advertising is just a downhill spiral to the worst profit motives on the web.