Evaluate: Perform all media processing (including images) via FFmpeg libraries
Pros:
- Widely used, available basically everywhere
- Fast
- Covers image, video and audio formats, including metadata
- Exposes things like rotation in stream side data so it can be handled explicitly
- Lots of options for filtering, including GPU acceleration where available
Cons:
- Somewhat poorly documented
- API changes rather often
- No good high-level bindings
Other thoughts:
- Since many MP4/ISOBMFF files are not muxed for streaming, something worth considering is remuxing them to move the moov atom to the start of the file (like passing -movflags +faststart to ffmpeg) so that clients can start decoding immediately without having to do extra range requests or download the entire file.
- When generating multiple outputs (e.g. thumbnail + webpublic), because of how much control FFmpeg gives us over the filter graph, we can most likely avoid having to perform some work like colorspace conversion twice.
- libavfilter has support for image classification via DNNs, with results exposed through side data. This is very useful for moderation because it can be used to automatically flag potentially unwanted media for review. With 0x0.st I have been relying on automatic classification so much that I now consider it essential.
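
To make the remuxing idea concrete, here is a minimal Python sketch (not Iceshrimp.NET code) of how a backend might decide whether a file needs the faststart treatment at all. It walks the top-level ISOBMFF boxes and checks whether moov comes before mdat. The function names are hypothetical, and the remux step is shown as an ffmpeg command line for brevity; an implementation on top of the FFmpeg libraries would set the equivalent muxer option instead.

```python
import struct
from typing import BinaryIO, List, Optional


def top_level_boxes(f: BinaryIO) -> List[str]:
    """Return the types of the top-level ISOBMFF boxes in file order."""
    boxes = []
    f.seek(0)
    while True:
        offset = f.tell()
        header = f.read(8)
        if len(header) < 8:
            break  # end of file (or truncated header)
        # Each box starts with a 32-bit big-endian size and a 4-byte type.
        size, box_type = struct.unpack(">I4s", header)
        boxes.append(box_type.decode("latin-1"))
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit "largesize" follows the type field.
            ext = f.read(8)
            if len(ext) < 8:
                break
            size = struct.unpack(">Q", ext)[0]
            header_len = 16
        elif size == 0:
            break  # a size of 0 means the box extends to the end of the file
        if size < header_len:
            break  # malformed box; stop rather than loop forever
        f.seek(offset + size)
    return boxes


def moov_before_mdat(f: BinaryIO) -> Optional[bool]:
    """True if moov precedes mdat, False if it follows, None if either is missing."""
    boxes = top_level_boxes(f)
    if "moov" not in boxes or "mdat" not in boxes:
        return None
    return boxes.index("moov") < boxes.index("mdat")


def faststart_remux_args(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command that copies all streams and moves moov to the front."""
    return ["ffmpeg", "-i", src, "-c", "copy", "-movflags", "+faststart", dst]
```

A caller could open an upload in binary mode, call moov_before_mdat on it, and only run the remux (for example via subprocess.run(faststart_remux_args(src, dst), check=True)) when the result is False. Since the remux is a stream copy, no re-encoding is involved.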
Project: Iceshrimp.NET
Priority: Normal
Type: Feature
State: Untriaged
Assignee: Laura Hausmann
Subsystem: Backend
Component: No component
Target version: Unscheduled
Released in version: Unreleased
libmpv is another option if maintaining FFmpeg wrappers is too much work. Transcoding is not its primary use case, but I’ve been using it for exactly that via a LuaJIT wrapper that took a couple of hours to write. It’s quite high-level and doesn’t expose some features that might be of interest, but on the plus side the API is very easy to use and also very stable (I haven’t seen breakage in the parts I’m using in I don’t even know how many years).