On Using ffmpeg
The ffmpeg package (with ffmpeg and its associated tools)
is an enormously powerful suite for dealing with video files. Video
files are, however, very complicated, and the ffmpeg command line
options are correspondingly complicated.
Slowly I have been learning to understand the command line for the
things I want to do with video files. The first thing to know is "What
are you dealing with?" and the ffprobe
tool can tell you that. But
that's a verbose tool which unhelpfully sends all output to standard
error. I cope with a simple wrapper script:
#!/bin/sh
if [ "$#" = 0 ] ; then
    echo "Usage: ffid videofile [videofile ...]"
    echo "Ease of use wrapper for 'ffprobe'."
    exit
fi
blank=
for file ; do
    if [ "$blank" = yes ] ; then
        # blank line between files
        echo ""
    fi
    if [ "$#" != 1 ] ; then
        # show file names when more than one
        echo "$file --"
        blank=yes
    fi
    # Don't show compile settings ("-hide_banner") and send all
    # stderr to stdout for grep, etc
    ffprobe -hide_banner "$file" 2>&1
done
Sample output for some old stop-motion videos downloaded from Youtube. (yt-dlp for the win)
The Cameraman's Revenge (1912) animation [U424m8utJnA].webm --
Input #0, matroska,webm, from 'The Cameraman's Revenge (1912) animation [U424m8utJnA].webm':
  Metadata:
    COMPATIBLE_BRANDS: iso6mp41
    MAJOR_BRAND     : dash
    MINOR_VERSION   : 0
    ENCODER         : Lavf58.29.100
  Duration: 00:13:22.92, start: -0.007000, bitrate: 377 kb/s
  Stream #0:0: Video: vp9 (Profile 0), yuv420p(tv, bt709), 480x360, SAR 1:1 DAR 4:3, 23.98 fps, 23.98 tbr, 1k tbn, 1k tbc (default)
    Metadata:
      HANDLER_NAME    : ISO Media file produced by Google Inc. Created on: 12/22/2024.
      DURATION        : 00:13:22.844000000
  Stream #0:1(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
    Metadata:
      DURATION        : 00:13:22.921000000

The Insects' Christmas [CXYpvBS7RWE].webm --
Input #0, matroska,webm, from 'The Insects' Christmas [CXYpvBS7RWE].webm':
  Metadata:
    ENCODER         : Lavf58.29.100
  Duration: 00:06:35.04, start: -0.007000, bitrate: 453 kb/s
  Stream #0:0(eng): Video: vp9 (Profile 0), yuv420p(tv, smpte170m/smpte170m/bt709), 640x480, SAR 1:1 DAR 4:3, 24 fps, 24 tbr, 1k tbn, 1k tbc (default)
    Metadata:
      DURATION        : 00:06:34.999000000
  Stream #0:1(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
    Metadata:
      DURATION        : 00:06:35.041000000
Reading that mess takes a bit of practice. The key things I look for
are the "Duration" line (not "DURATION") and the "Stream #" lines.
The first file has Duration: 00:13:22.92, that is zero hours,
thirteen minutes, 22.92 seconds. Next I can see that each file has two
streams, Stream #0:0: Video and Stream #0:1(eng): Audio. That's a
bog standard combination. In this case, both of these were originally
silent short films and the audio was added later.
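Since the wrapper sends everything to stdout, it is easy to narrow the
output down to just those interesting lines with grep. A minimal example
(the pattern is just what I would reach for first):

ffid "The Cameraman's Revenge (1912) animation [U424m8utJnA].webm" | grep -E 'Duration:|Stream #'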
Another more complicated example:
Input #0, matroska,webm, from 'The.Seventh.Seal.1957.mkv':
  Metadata:
    title           :
    encoder         : libebml v1.4.2 + libmatroska v1.6.4
    creation_time   : 2022-03-12T09:32:02.000000Z
  Duration: 01:36:59.28, start: 0.000000, bitrate: 37589 kb/s
  Chapter #0:0: start 0.000000, end 72.322000
    Metadata:
      title           : Logos / Opening Credits
  Chapter #0:1: start 72.322000, end 493.993000
    Metadata:
      title           : On the Beach
  Chapter #0:2: start 493.993000, end 984.650000
    Metadata:
      title           : Jof's Vision
  Chapter #0:3: start 984.650000, end 1638.053000
    Metadata:
      title           : At the Church
  Chapter #0:4: start 1638.053000, end 1939.896000
    Metadata:
      title           : The Deserted Village
  Chapter #0:5: start 1939.896000, end 2184.849000
    Metadata:
      title           : The Seduction of Skat
  Chapter #0:6: start 2184.849000, end 2574.029000
    Metadata:
      title           : The Procession of Flagellants
  Chapter #0:7: start 2574.029000, end 2920.334000
    Metadata:
      title           : Torture at the Tavern
  Chapter #0:8: start 2920.334000, end 3509.089000
    Metadata:
      title           : Strawberries and Milk at Dusk
  Chapter #0:9: start 3509.089000, end 3735.773000
    Metadata:
      title           : "Love is the blackest of all plagues"
  Chapter #0:10: start 3735.773000, end 4186.724000
    Metadata:
      title           : "The deadest actor I've ever seen"
  Chapter #0:11: start 4186.724000, end 4718.797000
    Metadata:
      title           : The Burning of the Witch
  Chapter #0:12: start 4718.797000, end 5161.239000
    Metadata:
      title           : "Mate at the next move"
  Chapter #0:13: start 5161.239000, end 5641.219000
    Metadata:
      title           : The Last Supper
  Chapter #0:14: start 5641.219000, end 5819.280000
    Metadata:
      title           : The Dance of Death
  Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 23.98 fps, 23.98 tbr, 1k tbn, 47.95 tbc (default)
    Metadata:
      title           :
  Stream #0:1(swe): Audio: flac, 48000 Hz, mono, s16 (default)
    Metadata:
      title           :
  Stream #0:2(swe): Audio: flac, 48000 Hz, mono, s32 (24 bit)
    Metadata:
      title           : Original / FLAC Audio / 1.0 / 48 kHz / 649 kbps / 24-bit
  Stream #0:3(eng): Audio: ac3, 48000 Hz, mono, fltp, 192 kb/s
    Metadata:
      title           : English Dub / Dolby Digital Audio / 1.0 / 48 kHz / 192 kbps
  Stream #0:4(eng): Audio: ac3, 48000 Hz, mono, fltp, 192 kb/s
    Metadata:
      title           : Commentary with film scholar Peter Cowie / Dolby Digital Audio / 1.0 / 48 kHz / 192 kbps
  Stream #0:5(eng): Audio: flac, 48000 Hz, stereo, s16
    Metadata:
      title           : Commentary with film critic Kat Ellinger / FLAC Audio / 2.0 / 48 kHz / 457 kbps / 16-bit
  Stream #0:6(eng): Subtitle: hdmv_pgs_subtitle, 1920x1080 (default)
    Metadata:
      title           :
  Stream #0:7(eng): Subtitle: hdmv_pgs_subtitle
    Metadata:
      title           : English (UK / BFI)
  Stream #0:8(eng): Subtitle: hdmv_pgs_subtitle
    Metadata:
      title           : English (UK / Tartan)
  Stream #0:9(dut): Subtitle: dvd_subtitle, 1440x1080
  Stream #0:10(fre): Subtitle: hdmv_pgs_subtitle
  Stream #0:11(ger): Subtitle: hdmv_pgs_subtitle
This file has chapter information, including meaningful titles; these
use plain seconds for timestamps. "Strawberries and Milk at Dusk" is
the ninth chapter (Chapter #0:8, but counting begins with zero) and
runs from 2920.334000 to 3509.089000 (48:40.334 to 58:29.089). There
are five audio streams, two in Swedish (Stream #0:1(swe) and
Stream #0:2(swe)) and three in English (Stream #0:3(eng), a dubbed
audio stream, plus Stream #0:4(eng) and Stream #0:5(eng), two
commentary streams). There are six subtitle streams, five in
hdmv_pgs_subtitle format and one in dvd_subtitle format; three in
English (eng), one each in Dutch (dut), French (fre) and German (ger).
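A side note: because those chapter times are plain seconds, they can be
fed straight into the -ss and -to slicing options described further
down. A rough sketch (output file name invented) for pulling out just
that ninth chapter:

# Copy the "Strawberries and Milk at Dusk" chapter using its start/end times
ffmpeg \
    -i The.Seventh.Seal.1957.mkv \
    -ss 2920.334 \
    -to 3509.089 \
    -map 0 \
    -c copy \
    chapter9.mkv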
About subtitles.
I'm not really sure what dvd_subtitle streams are, just that they are
not too much trouble. However, the hdmv_pgs_subtitle streams are hard
to work with. Sometimes these are called "bluray" subtitles. They are
images to be superimposed on the video. This is awkward if you want to
resize the video, because those images are not as easily resized.
The easiest subtitles to work with are "subrip" (eg
Stream #0:2(eng): Subtitle: subrip (default)). Subrips are embedded
text strings, to be rendered and superimposed at playback time.
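When a file does have subrip subtitles, they can also be pulled out
into a standalone .srt text file for a look. A minimal sketch (the
stream choice and output name are just examples):

# Extract the first subtitle stream to a plain .srt file.
# Only works when that stream is a text format such as subrip.
ffmpeg -i "$input" -map 0:s:0 subs.srt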
Attached pictures.
Sometimes you'll find other things in a video file. It's possible
to embed a thumbnail still image, and yt-dlp
has options to do
that if you want the "cover" art from Youtube. Such an embed might
look like
Stream #0:3: Video: png, rgba(pc), 640x346, 90k tbr, 90k tbn, 90k tbc (attached pic)
Note how it lies about being "video". This can be trouble.
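If memory serves, the yt-dlp flag in question is --embed-thumbnail,
used roughly like this:

# Download and attach the Youtube cover art as an extra stream
yt-dlp --embed-thumbnail 'https://www.youtube.com/watch?v=U424m8utJnA'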
Let's say you are me and you like to archive stuff. However, archive quality is not necessarily ideal for everyday use. My laptop screen is 1920x1040 and my phone screen is smaller. If I watch a 3840x2076 video on either, I'll get stuttering because the pitiful GPU can't keep up in real time. So a rescale in advance of playing the file makes life pleasant. But some of those streams cause problems for me.
Simplest case scaling.
In the best case, I can have ffmpeg scale the video, copy the
audio as-is, copy the subtitles as-is, and have a useful smaller
video file. Here's my method for doing that:
outsize=1280x720   # small: 720x480, medium: 1280x720, large: 1920x1080

# .suf here can be .mp4, .mkv, .webm, .ogv, etc. Usually I will keep the
# same suffix from input to output.
input=some-video-file.HUGE.suf
output=some-video-file.$outsize.suf

nice ffmpeg \
    -i "$input" \
    -s $outsize \
    -map 0 \
    -acodec copy \
    -scodec copy \
    "$output" \
    < /dev/null > /tmp/ffmpeg.out 2>&1
Here's what's happening. I specify the input file and the size to
rescale the video; those two options should be obvious. Next comes
-map 0, which has ffmpeg copy all of the input's streams rather than
just one of each type, though I'm still a little hazy on the fine
points. The -acodec copy will copy the audio streams without
re-encoding them, and the -scodec copy will copy subrip subtitles
without re-encoding them. The suffix on the $output file name tells
ffmpeg which video file format to use. I redirect standard input from
/dev/null so ffmpeg won't read from the keyboard, and I send all
output to /tmp/ffmpeg.out so it won't be too noisy in my terminal.
I can check progress with a command like this (which I have as an
alias):
tail -400c /tmp/ffmpeg.out | grep frame= ; echo
That will give output like:
frame=93001 fps= 17 q=28.0 size= 776495kB time=01:02:00.41 bitrate=1709.8kbits/s speed=0.672x
The line tells me it is at the 93001st frame of the output, just over
one hour through the source (time=01:02:00.41). It is running at
67.2% of real time (speed=0.672x). If I checked the Duration before
running, I'll have a good idea of how much is left to process. Or I
can grep for the Duration: line in the tmp file.
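For reference, that alias and the duration check amount to something
like this (the alias name is arbitrary):

# Progress check: last progress line from the ffmpeg log
alias ffprog='tail -400c /tmp/ffmpeg.out | grep frame= ; echo'

# Total run time of the input, from the same log (first match only)
grep -m 1 'Duration:' /tmp/ffmpeg.out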
A Note on Concurrency.
I'm writing from the perspective of someone who will never have
multiple concurrent ffmpeg operations going on, so I have made
no attempt to create code that is safe for concurrent use. It
always uses the same file names, and if there were a second instance
the output would be clobbered.
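If I ever did need two at once, the obvious change would be a per-run
log file, for example via mktemp. A sketch, not something I actually
run:

# Give each run its own log file so concurrent runs don't clobber each other
ffout=$(mktemp /tmp/ffmpeg.out.XXXXXX)
nice ffmpeg \
    -i "$input" \
    -s $outsize \
    -map 0 \
    -acodec copy \
    -scodec copy \
    "$output" \
    < /dev/null > "$ffout" 2>&1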
Complications.
What if there is an embedded image or those bluray subtitles? The
ffmpeg command fails fast. Here's an example for the embedded image
case, with "[ ... ]" standing in for verbose output.
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'has-cover-art.mp4':
[ ... ]
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))
  Stream #0:1 -> #0:1 (copy)
  Stream #0:2 -> #0:2 (copy)
  Stream #0:3 -> #0:3 (png (native) -> h264 (libx264))
Press [q] to stop, [?] for help
[ ... ]
[mp4 @ 0x56216635bc00] Could not find tag for codec h264 in stream #3, codec not currently supported in container
Could not write header for output file #0 (incorrect codec parameters ?): Invalid argument
Error initializing output stream 0:0 --
Conversion failed!
The stream mapping bit shows that the PNG "video" is going to be converted to h264 video, but then when conversion is attempted, it fails. Ugh! Thwarted!
Time for the mysterious -map to come back. With a -map -0:3 we can
tell ffmpeg to just not copy stream #0:3. Note the placement of the
option. Order of arguments matters to ffmpeg, and in this case we
want the drop-a-stream -map sometime after the first -map.
# $input/$output/$outsize set as above
# Scale video copying everything else except the third stream
nice ffmpeg \
    -i "$input" \
    -s $outsize \
    -map 0 \
    -acodec copy \
    -scodec copy \
    -map -0:3 \
    "$output" \
    < /dev/null > /tmp/ffmpeg.out 2>&1
Okay, what about the bluray subtitle issue? I have two methods, which
I select between based on how important the subtitles are to me. If
it is an English language film with English language subtitles, then
I don't really care, and I'll just drop them from the output. Again
a mysterious -map option (and again order matters). Here we drop
all subtitle streams with -map -0:s:
# Scale, copy audio but omit all subtitle streams
nice ffmpeg \
    -i "$input" \
    -s $outsize \
    -map 0 \
    -acodec copy \
    -map -0:s \
    "$output" \
    < /dev/null > /tmp/ffmpeg.out 2>&1
My second method is a "cargo-cult" filter that has ffmpeg
superimpose the subtitles on the video as part of the scaling. This
creates an output video whose subtitles cannot be toggled on and off;
they are always on. I don't fully understand this, but it has worked
for me. This will not copy any subtitles as separate streams into the
output (for cases where there are multiple subtitles), but it will
copy all of the audio streams.
# Scale with burned in subtitles.
# Roughly I can see that this will scale the video to the $outsize
# setting I have, and will take the first subtitle stream ("0:s:0"),
# scale it by the same amount as the video ("scale2ref"),
# and "overlay" it on the video stream. But I could not write this
# filter.
filter="[0:v]scale=$outsize[ref];[0:s:0][ref]scale2ref[sub][vid];[vid][sub]overlay[v]"
nice ffmpeg \
    -i "$input" \
    -filter_complex "$filter" \
    -map '[v]' \
    -map 0:a \
    -acodec copy \
    "$output" \
    < /dev/null > /tmp/ffmpeg.out 2>&1
Additional manipulations.
What if you want to swap the order of two streams? Use -map
to copy
them in a specific order: -map 0:a:1 -map 0:a:0
says to copy audio
stream 1 first, then audio stream 0. If you additionally want to
change which stream is flagged default, for players that know to
respect that, you'll need a bit more magic to change the "disposition"
of the streams.
# Scale video and swap first two audio streams
nice ffmpeg \
    -i "$input" \
    -vf scale=$outsize \
    -map 0:v \
    -map 0:s \
    -map 0:a:1 -map 0:a:0 \
    -disposition:a:0 default \
    -disposition:a:1 none \
    -acodec copy \
    "$output" \
    < /dev/null > /tmp/ffmpeg.out 2>&1
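After a swap like that, a quick sanity check with the ffid wrapper
from the top of this page shows whether the order and the (default)
flag came out as intended:

# Verify stream order and which audio stream is now flagged (default)
ffid "$output" | grep 'Stream #'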
What if you have a file that someone split up into multiple parts?
Well, if the streams all match in codec, count, order, etc, you can
concatenate them, but it will take a config file. This is what I use.
It runs pretty fast: with scaling I expect to see speed=0.45 to
speed=1.5, but with this I expect speed=100 or more.
# Concatenate compatible video files with no other modifications
inputs="file.1.mp4 file.2.mp4 file.3.mp4"
ffconfig=/tmp/ff.config.txt
for input in $inputs ; do
    printf "file '%s'\n" "$input"
done > $ffconfig

ffmpeg \
    -f concat \
    -safe 0 \
    -i $ffconfig \
    -c copy \
    "$output" \
    < /dev/null > /tmp/ffmpeg.out 2>&1
rm $ffconfig
Create a fragment.
Of course, one can go in the opposite direction, and create a video from a slice of another. There are three particularly helpful flags for this:
-ss TIME   specify a starting time (defaults to the very beginning)
-to TIME   specify an ending time (defaults to the very end)
-t TIME    specify a time duration (defaults to all remaining)
The ending time and time duration flags should not be used together.
Pick whichever one makes sense for the slice you are making and use
that. The start and ending times are relative to the input file.
So -ss 60 -to 120 copies a minute of input, starting sixty seconds
in, and -ss 60 -t 60 would do the same.
# Copy all streams, 30 seconds duration starting 125 seconds in
ffmpeg \
    -i "$input" \
    -ss 125 \
    -t 30 \
    -map 0 \
    -c copy \
    "$output" \
    < /dev/null > /tmp/ffmpeg.out 2>&1

# Copy just video, 10 seconds to 25 seconds (15 seconds worth)
ffmpeg \
    -i "$input" \
    -ss 10 \
    -to 25 \
    -map 0:v \
    -c copy \
    "$output" \
    < /dev/null > /tmp/ffmpeg.out 2>&1
Rough bonus one: create an animated GIF from video with a two pass
process. In this example, the input file (stamp.mkv) was a slice
created without sound using a command line very similar to the
preceding example. I like to do it that way so I can quickly preview
the exact slice I'm getting (speed=900) before I begin the much
slower convert-to-GIF process.
# First pass: Find an optimal color palette: GIF is limited to 256 colors
# The input file is 30 seconds of video at 640x480 resolution, 23.98
# frames per second, no audio or other streams. It is 3.3 megabytes.
ffmpeg \
    -i stamp.mkv \
    -filter_complex "[0:v] palettegen" \
    palette.png
# palette.png is 16x16 pixels and under 1 kilobyte. It took 9.7
# seconds to be generated.

# Second pass: scale, quantize using palette, and encode as GIF.
# The output file is 10 frames per second, 320x240 pixels.
ffmpeg \
    -i stamp.mkv \
    -i palette.png \
    -filter_complex "[0:v] fps=10,scale=320x240 [new];[new][1:v] paletteuse" \
    stamp.gif
# stamp.gif is 9.2 megabytes and took 7.4 seconds to be generated.
And bonus two: turn a bunch of jpeg files into a video. For this I
have used various tools (camera on tripod, stop motion animation,
then clean up of individual pictures) to create a sequence of
frames which I named f-0001.jpg, f-0002.jpg, f-0003.jpg, ... all in
a directory called filmparts. I used hard links when I wanted a
frame to be duplicated. I decided I wanted a soundless film at 24
frames per second and 1500k bits per second. The input specifier
uses printf style notation to indicate how to find the frames.
ffmpeg -i "filmparts/f-%04d.jpg" -r 24 -b:v 1500k mov-24fps-1500.mp4
All of this knowledge has been built through years of web searches on
ffmpeg
"recipes" and fragmentary reading of the ffmpeg
manpage. I
have hardly scratched the surface of what ffmpeg
can do, but I feel
a lot more confident in my usage these days.