Hacking the VLC - How the VLC media player works

This is a translation of my previous article from Japanese.

So, let's begin with analyzing how the vlc media player is working.

The version I used is 2.2.1, which can be downloaded from here. https://www.videolan.org/vlc/download-sources.html

Multi-Threading of VLC

Since VLC is highly multi-threaded program, I must first explain the role of each thread. The important threads to note are the following:

  • Playlist: src/playlist/thread.c
    • does managing of the playlist
  • Video Output: src/video_output/video_output.c
    • does output of the video
  • Decoder: src/input/decoder.c
    • does decoding of the video. Filters are also applied here.
  • The drawing thread (i.e. Qt)

In fact there seems to be more than 10 additional threads running, but I couldn't follow all of them.

If you are interested in those threads, you might want to watch the function "vlc_clone", which is called every time the new thread is created. But be sure that, since VLC is designed to work on multi-platform, it depends on the device which GUI thread is actually called.

Video Processing of VLC

Whoo! Now we are ready to understand how the video is processed inside the VLC!

The video is processed in the following steps:

  1. The video is loaded by Playlist thread.
    • The signals (Play/Stop, Pause, etc.) from GUI thread is received and processed here.
  2. The Decoder thread decodes the video with ffmpeg.
    • The decoded frames are then pushed to picture_fifo.
  3. VideoOutput thread outputs the frame as decoding goes on.
    • the frame is popped from the fifo.
    • the process decides whether the frame should be displayed or not based on the timestamp.

This is basically all what VLC does.

The picture_fifo mentioned above is the fifo of decoded pictures. The frames are pushed in order decoded and popped when played. See picture_fifo.c for further implementations.

There is another struct similar to picture_fifo - picture_pool. This is written in picture_pool.c. It seems like it does bind some pictures or mutex-and-gc-like things. I didn't need to touch these components, so I don't know what it actually does.


There are many VLC-unique structs defined. Fortunately, today there are BUNCH of libraries floating around which make me not write these stuffs. But it wasn't that easy in 2001. These unique structures make the VLC sources very very complicated.

The astronomical numbers of structs are defined in VLC, but just let me list the ones I think are important (and interesting).

vlc_object_t: Object

Defined in vlc_common.h. The main function can be found in vlc_variables.h.

Since the VLC was developed in 2001, and maybe C++ wan't popular then, the core of VLC is written in C. But I can't imagine how hard to write such a huge program without object-oriented language.

Therefore, VLC has written all those object-oriented structures only with C. Maybe developers were too mad that they couldn't have private variables and member functions in C structs.

I guess that the developers were too happy to use those structures then, but they are slightly outdated today. Hmm...

vlc_object_t creates an object as it is given the type and the name string. The content of an object is described in vlc_value_t union as following:

 * VLC value structure
typedef union
    int64_t         i_int;
    bool            b_bool;
    float           f_float;
    char *          psz_string;
    void *          p_address;
    vlc_object_t *  p_object;
    vlc_list_t *    p_list;
    mtime_t         i_time;
    struct { int32_t x; int32_t y; } coords;

} vlc_value_t;

vlc_variables.h has many functions defined for each of the supported types. The functions are given the prefix of var_ as a namespace. All basic variables used in VLC are using this structure.

block_t, picture_t: Pictures

VLC reads the video by block, and writes by frame. block_t and picture_t are defined for these purposes.

block_t is a multi-purpose data block and defined in vlc_block.h:

struct block_t
    block_t    *p_next;

    uint8_t    *p_buffer; /**< Payload start */
    size_t      i_buffer; /**< Payload length */
    uint8_t    *p_start; /**< Buffer start */
    size_t      i_size; /**< Buffer total size */

    uint32_t    i_flags;
    unsigned    i_nb_samples; /* Used for audio */

    mtime_t     i_pts;
    mtime_t     i_dts;
    mtime_t     i_length;

    /* Rudimentary support for overloading block (de)allocation. */
    block_free_t pf_release;

As you can see, block_t is a lineare list, which can store encoding/decoding information in pts/dts. Thus, block_t is used for managing data by or transfer data in block.

It is obvious that picture_t stores a picture. The purpose of picture_t is limited to storing decoded frames, and single object itself contains enough information for displaying the picture.

 * Video picture
struct picture_t
     * The properties of the picture
    video_frame_format_t format;

    plane_t         p[PICTURE_PLANE_MAX];     /**< description of the planes */
    int             i_planes;                /**< number of allocated planes */

    /** \name Picture management properties
     * These properties can be modified using the video output thread API,
     * but should never be written directly */
    mtime_t         date;                                  /**< display date */
    bool            b_force;

    /** \name Picture dynamic properties
     * Those properties can be changed by the decoder
     * @{
    bool            b_progressive;          /**< is it a progressive frame ? */
    bool            b_top_field_first;             /**< which field is first */
    unsigned int    i_nb_fields;                  /**< # of displayed fields */
    void          * context;          /**< video format-specific data pointer,
             * must point to a (void (*)(void*)) pointer to free the context */

    /** Private data - the video output plugin might want to put stuff here to
     * keep track of the picture */
    picture_sys_t * p_sys;

    /** This way the picture_Release can be overloaded */
        atomic_uintptr_t refcount;
        void (*pf_destroy)( picture_t * );
        picture_gc_sys_t *p_sys;
    } gc;

    /** Next picture in a FIFO a pictures */
    struct picture_t *p_next;

You can think of IplImage in OpenCV. Yet, picture_t has no such capability that it can be directly casted to IplImage. I won't explain the reason today, so wait for a while.


    • PTS is an abbreviation for Presentation Time Stamp, and presents the time when the frame SHOULD be played.
    • DTS stands for Decode Time Stamp, which is the time the frame was decoded.
  • I/P/B frame
    • The idea of I/P/B frames is that the frames can be classified to 3 types, I/P/B. This method is used widely, including ffmpeg.
    • Briefly explaining, there are I frames which contain all pixel information. Between I frames, there 2 types of frames, P and B, which differ in compressing level. The P frames are softly compressed and has DIFFERENCES from the previous I or P frame. B frames are hardly compressed frames between P frames, which contains complementary infomation between them. B frame can only be decoded by referencing to previous and next P frames.
    • By using this method, the process can decode the frames under both slow and fast environments with a good quality. (from Wikipedia)
    • decoder_synchro.c has further descriptions.
 * DISCUSSION : How to Write an efficient Frame-Dropping Algorithm
 * ==========
 * This implementation is based on mathematical and statistical
 * developments. Older implementations used an enslavement, considering
 * that if we're late when reading an I picture, we will decode one frame
 * less. It had a tendancy to derive, and wasn't responsive enough, which
 * would have caused trouble with the stream control stuff.
 * 1. Structure of a picture stream
 *    =============================
 * Between 2 I's, we have for instance :
 *    I   B   P   B   P   B   P   B   P   B   P   B   I
 *    t0  t1  t2  t3  t4  t5  t6  t7  t8  t9  t10 t11 t12
 * Please bear in mind that B's and IP's will be inverted when displaying
 * (decoding order != presentation order). Thus, t1 < t0.
 * 2. Definitions
 *    ===========
 * t[0..12]     : Presentation timestamps of pictures 0..12.
 * t            : Current timestamp, at the moment of the decoding.
 * T            : Picture period, T = 1/frame_rate.
 * tau[I,P,B]   : Mean time to decode an [I,P,B] picture.
 * tauYUV       : Mean time to render a picture (given by the video_output).
 * tau´[I,P,B] = 2 * tau[I,P,B] + tauYUV
 *              : Mean time + typical difference (estimated to tau/2, that
 *                needs to be confirmed) + render time.
 * DELTA        : A given error margin.
 * 3. General considerations
 *    ======================
 * We define three types of machines :
 *      14T > tauI : machines capable of decoding all I pictures
 *      2T > tauP  : machines capable of decoding all P pictures
 *      T > tauB   : machines capable of decoding all B pictures
 * 4. Decoding of an I picture
 *    ========================
 * On fast machines, we decode all I's.
 * Otherwise :
 * We can decode an I picture if we simply have enough time to decode it
 * before displaying :
 *      t0 - t > tau´I + DELTA
 * 5. Decoding of a P picture
 *    =======================
 * On fast machines, we decode all P's.
 * Otherwise :
 * First criterion : have time to decode it.
 *      t2 - t > tau´P + DELTA
 * Second criterion : it shouldn't prevent us from displaying the forthcoming
 * I picture, which is more important.
 *      t12 - t > tau´P + tau´I + DELTA
 * 6. Decoding of a B picture
 *    =======================
 * On fast machines, we decode all B's. Otherwise :
 *      t1 - t > tau´B + DELTA
 * Since the next displayed I or P is already decoded, we don't have to
 * worry about it.
 * I hope you will have a pleasant flight and do not forget your life
 * jacket.
 *                                                  --Meuuh (2000-12-29)

This is how VLC can play the video on both fast and slow machines.

I will add more info if I could recall more...