Introduction: Solving Developers' Core Pain Points
As a developer, when you want to add smooth adaptive video streaming to your application or website, you will quickly run into the terms DASH and MPD. DASH is one of today's mainstream streaming protocols, and the MPD is its core. Simply put, the MPD is the navigation map that tells the player "where the video is, what versions are available, and how to play it".
DASH stands for Dynamic Adaptive Streaming over HTTP. As the name suggests, it is an HTTP-based adaptive streaming media technology. Its core concept is:
Cutting video files into many short segments, typically a few seconds each.
Providing multiple different quality versions (such as resolution, bitrate) for the same video.
The player dynamically selects the most appropriate quality segments to download and play based on current network speed, thereby ensuring a smooth viewing experience.
MPD stands for Media Presentation Description. It is an XML manifest file that holds the metadata for everything listed above: the segments, the quality versions, and where to find them. The player must fetch and parse the MPD first; only then can it start pulling video segments intelligently.
We can imagine an MPD file as the structure of a book, which makes it very intuitive to understand.
<MPD> Root Element: The book cover, carrying top-level information such as the total duration (mediaPresentationDuration) and the publication time (publishTime).
<Period> Element: The chapters in the book. A live stream may have only one chapter, while a movie might be divided into two chapters: main feature and trailer. Chapters are arranged in chronological order, playing one after another.
<AdaptationSet> Element: The categories under a chapter. For example, a chapter typically contains two major categories: "video" and "audio". It defines common attributes for the same type of media stream, such as video codec format being AVC and audio codec format being EC-3.
<Representation> Element: The specific versions under a category. This is the most critical link, representing a quality specification of the video or audio.
<SegmentTemplate> / <SegmentList>: Defines how to find each small segment. It provides the download path template (or an explicit list) for the segments, and it is what the player uses to build every segment URL.
Hierarchical Relationship Summary: MPD -> Period (Chapter) -> AdaptationSet (Category: Video/Audio) -> Representation (Quality Version) -> SegmentTemplate (Segment Path Rules)
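The hierarchy above can be walked with a few lines of Python. This is a minimal sketch using only the standard library; SAMPLE_MPD is a tiny hand-written manifest made up for illustration, and the function name outline is hypothetical, not part of any DASH library.

```python
import xml.etree.ElementTree as ET

# XML namespace declared by DASH MPD documents.
NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# A tiny hand-written MPD, used only for illustration (an assumption,
# not a real stream).
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet contentType="video">
      <Representation id="v_low" bandwidth="254320"/>
      <Representation id="v_high" bandwidth="4952892"/>
    </AdaptationSet>
    <AdaptationSet contentType="audio">
      <Representation id="a_main" bandwidth="67071"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def outline(mpd_text: str) -> list[tuple[int, str, str]]:
    """Return (period_index, content_type, representation_id) triples."""
    root = ET.fromstring(mpd_text)
    rows = []
    # MPD -> Period -> AdaptationSet -> Representation
    for p_index, period in enumerate(root.findall("mpd:Period", NS)):
        for aset in period.findall("mpd:AdaptationSet", NS):
            ctype = aset.get("contentType", "unknown")
            for rep in aset.findall("mpd:Representation", NS):
                rows.append((p_index, ctype, rep.get("id")))
    return rows


for row in outline(SAMPLE_MPD):
    print(row)
```

Note the namespace prefix in every lookup: because the MPD declares a default namespace, a plain findall("Period") would return nothing.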
Let's look at a real MPD file (from the well-known Big Buck Bunny test video). This file describes a video-on-demand (VOD) presentation with a total duration of approximately 634 seconds (10 minutes 34 seconds). It provides 10 video qualities ranging from 180p to 4K and 1 audio quality. Download URL: https://dash.akamaized.net/akamai/bbb_30fps/bbb_30fps.mpd
<MPD mediaPresentationDuration="PT634.566S" minBufferTime="PT2.00S" profiles="urn:hbbtv:dash:profile:isoff-live:2012,urn:mpeg:dash:profile:isoff-live:2011" type="static" xmlns="urn:mpeg:dash:schema:mpd:2011" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:mpeg:DASH:schema:MPD:2011 DASH-MPD.xsd">
  <BaseURL>./</BaseURL>
  <Period>
    <AdaptationSet mimeType="video/mp4" contentType="video" subsegmentAlignment="true" subsegmentStartsWithSAP="1" par="16:9">
      <SegmentTemplate duration="120" timescale="30" media="$RepresentationID$/$RepresentationID$_$Number$.m4v" startNumber="1" initialization="$RepresentationID$/$RepresentationID$_0.m4v"/>
      <Representation id="bbb_30fps_1024x576_2500k" codecs="avc1.64001f" bandwidth="3134488" width="1024" height="576" frameRate="30" sar="1:1" scanType="progressive"/>
      <Representation id="bbb_30fps_1280x720_4000k" codecs="avc1.64001f" bandwidth="4952892" width="1280" height="720" frameRate="30" sar="1:1" scanType="progressive"/>
      <Representation id="bbb_30fps_1920x1080_8000k" codecs="avc1.640028" bandwidth="9914554" width="1920" height="1080" frameRate="30" sar="1:1" scanType="progressive"/>
      <Representation id="bbb_30fps_320x180_200k" codecs="avc1.64000d" bandwidth="254320" width="320" height="180" frameRate="30" sar="1:1" scanType="progressive"/>
      <Representation id="bbb_30fps_320x180_400k" codecs="avc1.64000d" bandwidth="507246" width="320" height="180" frameRate="30" sar="1:1" scanType="progressive"/>
      <Representation id="bbb_30fps_480x270_600k" codecs="avc1.640015" bandwidth="759798" width="480" height="270" frameRate="30" sar="1:1" scanType="progressive"/>
      <Representation id="bbb_30fps_640x360_1000k" codecs="avc1.64001e" bandwidth="1254758" width="640" height="360" frameRate="30" sar="1:1" scanType="progressive"/>
      <Representation id="bbb_30fps_640x360_800k" codecs="avc1.64001e" bandwidth="1013310" width="640" height="360" frameRate="30" sar="1:1" scanType="progressive"/>
      <Representation id="bbb_30fps_768x432_1500k" codecs="avc1.64001e" bandwidth="1883700" width="768" height="432" frameRate="30" sar="1:1" scanType="progressive"/>
      <Representation id="bbb_30fps_3840x2160_12000k" codecs="avc1.640033" bandwidth="14931538" width="3840" height="2160" frameRate="30" sar="1:1" scanType="progressive"/>
    </AdaptationSet>
    <AdaptationSet mimeType="audio/mp4" contentType="audio" subsegmentAlignment="true" subsegmentStartsWithSAP="1">
      <Accessibility schemeIdUri="urn:tva:metadata:cs:AudioPurposeCS:2007" value="6"/>
      <Role schemeIdUri="urn:mpeg:dash:role:2011" value="main"/>
      <SegmentTemplate duration="192512" timescale="48000" media="$RepresentationID$/$RepresentationID$_$Number$.m4a" startNumber="1" initialization="$RepresentationID$/$RepresentationID$_0.m4a"/>
      <Representation id="bbb_a64k" codecs="mp4a.40.5" bandwidth="67071" audioSamplingRate="48000">
        <AudioChannelConfiguration schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011" value="2"/>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
1. Global Information (<MPD> Tag)
mediaPresentationDuration="PT634.566S": The total duration of the entire media, 634.566 seconds.
minBufferTime="PT2.00S": A buffering hint from the content author. As a rule of thumb, the player should buffer at least 2 seconds of media before it starts playing in order to keep playback smooth.
type="static": One of the most critical attributes! static indicates this is a video-on-demand (VOD). All video segments have been generated.
<BaseURL>./</BaseURL>: The base URL for all media segment paths. ./ indicates a relative path, meaning the segment folders are in the same directory as the MPD file.
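Values such as PT634.566S and PT2.00S are ISO 8601 durations, so a player has to convert them to seconds before doing any arithmetic. The sketch below handles only the time-only subset (hours, minutes, seconds) that appears in this example; parse_duration is a hypothetical helper name, and a full implementation would also need to accept day, month, and year components.

```python
import re

# Matches the time-only subset of ISO 8601 durations, e.g. "PT1H2M3.5S".
# Assumption: no date components (days, months, years) are present.
_DURATION_RE = re.compile(
    r"^PT(?:(?P<h>\d+(?:\.\d+)?)H)?"
    r"(?:(?P<m>\d+(?:\.\d+)?)M)?"
    r"(?:(?P<s>\d+(?:\.\d+)?)S)?$"
)


def parse_duration(value: str) -> float:
    """Convert a time-only ISO 8601 duration to seconds."""
    match = _DURATION_RE.match(value)
    if match is None or not any(match.groupdict().values()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours = float(match.group("h") or 0)
    minutes = float(match.group("m") or 0)
    seconds = float(match.group("s") or 0)
    return hours * 3600 + minutes * 60 + seconds


print(parse_duration("PT634.566S"))  # mediaPresentationDuration
print(parse_duration("PT2.00S"))     # minBufferTime
```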
2. Video Track (<AdaptationSet> for Video)
mimeType="video/mp4": The container format is MP4.
<SegmentTemplate>: Segment Rules - This is the recipe for how the player assembles the file URL!
timescale="30" + duration="120": 120 / 30 = 4 seconds. Each video segment is 4 seconds long.
initialization="$RepresentationID$/$RepresentationID$_0.m4v": Initialization file path. $RepresentationID$ is a variable that will be replaced with the specific quality ID.
media="$RepresentationID$/$RepresentationID$_$Number$.m4v": Media segment path. $Number$ is a variable representing the segment sequence number.
Example: For the version with id="bbb_30fps_1280x720_4000k", the path to its 5th segment is: ./bbb_30fps_1280x720_4000k/bbb_30fps_1280x720_4000k_5.m4v
<Representation>: Quality Versions - Here 10 qualities are listed.
4K Version Example:
<Representation id="bbb_30fps_3840x2160_12000k" codecs="avc1.640033" bandwidth="14931538" width="3840" height="2160" frameRate="30"/>
bandwidth="14931538" indicates a bitrate of ~14.9 Mbps, which is the most critical number for the player to make switching decisions.
Low Definition Version Example:
<Representation id="bbb_30fps_320x180_200k" codecs="avc1.64000d" bandwidth="254320" width="320" height="180" frameRate="30"/>
The bitrate is only ~254 kbps, very data-efficient.
3. Audio Track (<AdaptationSet> for Audio)
This example has a single audio stream. Additional languages would normally be placed in separate AdaptationSets.
Audio <SegmentTemplate>:
timescale="48000" + duration="192512": 192512 / 48000 ≈ 4.01 seconds. The audio segment duration is close to the video's 4 seconds, so audio and video segments can be requested in step with each other.
Audio <Representation>:
codecs="mp4a.40.5": Represents HE-AAC audio encoding (AAC with Spectral Band Replication). Plain AAC-LC would be signalled as mp4a.40.2.
bandwidth="67071": Bitrate is approximately 67 kbps.
<AudioChannelConfiguration value="2"/>: Dual-channel stereo.
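The timing given by the two SegmentTemplate elements can be checked with exact arithmetic. The sketch below assumes number-based templates with a fixed duration, a Period that starts at time 0, and no presentationTimeOffset; the helper names are hypothetical. It also shows why a player maps time to segment numbers separately for audio and video: the audio segments are slightly longer than 4 seconds, so the two numberings slowly drift apart.

```python
from fractions import Fraction


def segment_start(number: int, duration: int, timescale: int,
                  start_number: int = 1) -> Fraction:
    """Start time in seconds of segment `number`, as an exact fraction."""
    return Fraction((number - start_number) * duration, timescale)


def segment_for_time(seconds: float, duration: int, timescale: int,
                     start_number: int = 1) -> int:
    """Number of the segment that contains playback time `seconds`."""
    return int(Fraction(seconds) * timescale // duration) + start_number


VIDEO = {"duration": 120, "timescale": 30}
AUDIO = {"duration": 192512, "timescale": 48000}

# Where does segment 5 start in each track?
print(float(segment_start(5, **VIDEO)))
print(float(segment_start(5, **AUDIO)))

# Which segments does a seek to t = 60 s need?
print(segment_for_time(60, **VIDEO), segment_for_time(60, **AUDIO))
```

With these values, a seek to 60 seconds lands in video segment 16 but audio segment 15, which is why the segment number must be computed per track rather than shared.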
Putting it all together, a player that receives this MPD works through the following steps:
Obtain MPD: Download and parse this XML file.
Understand Structure: Know that this is a video-on-demand, with 10 video qualities and 1 audio quality, all segments being approximately 4 seconds each.
Download Initialization Segments: Download audio and video initialization segments according to the template (..._0.m4v and ..._0.m4a).
Adaptive Loop:
Monitor Network: Calculate current download speed.
Select Quality: Compare current network speed with bandwidth values, select the highest quality that can be played smoothly (such as 720p).
Download Media Segments: According to the media attribute template, assemble the URL for the next 4-second video segment and download it. Simultaneously download the corresponding audio segment.
Decode and Play: Feed segments into the decoder for playback.
Continuous Loop: After downloading each segment, re-evaluate network conditions, repeat this process until playback ends.
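The "Select Quality" step of the loop above can be sketched as a simple throughput rule. This is an assumption-laden illustration, not the algorithm used by dash.js, Shaka Player, or zwplayer: it picks the highest bandwidth that fits within a fraction of the measured throughput, and the 0.8 safety factor is an arbitrary value chosen for the example. Real players usually also take buffer level and throughput history into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    id: str
    bandwidth: int  # bits per second, from the MPD's bandwidth attribute


def measure_throughput(bytes_downloaded: int, seconds: float) -> float:
    """Throughput of the last segment download, in bits per second."""
    return bytes_downloaded * 8 / seconds


def select_representation(reps: list[Representation],
                          throughput_bps: float,
                          safety: float = 0.8) -> Representation:
    """Pick the highest-bandwidth representation within the budget.

    `safety` is a hypothetical headroom factor (assumption, not from
    the DASH specification). Falls back to the lowest quality when
    nothing fits.
    """
    budget = throughput_bps * safety
    affordable = [r for r in reps if r.bandwidth <= budget]
    if affordable:
        return max(affordable, key=lambda r: r.bandwidth)
    return min(reps, key=lambda r: r.bandwidth)


# A subset of the video representations from the example MPD.
REPS = [
    Representation("bbb_30fps_320x180_200k", 254320),
    Representation("bbb_30fps_640x360_1000k", 1254758),
    Representation("bbb_30fps_1280x720_4000k", 4952892),
    Representation("bbb_30fps_1920x1080_8000k", 9914554),
]

# Example: the last segment was 1,750,000 bytes and took 2 seconds.
speed = measure_throughput(1_750_000, 2.0)
print(select_representation(REPS, speed).id)
```

In this example the measured speed is 7 Mbps, the budget is 5.6 Mbps, and the 720p representation is the highest one that fits.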
Static vs. Dynamic: type="static" indicates video-on-demand (VOD), where the entire MPD information is complete. type="dynamic" indicates live streaming, where the MPD will be continuously updated, containing attributes like minimumUpdatePeriod.
Bitrate Adaptive Logic: The DASH protocol itself only provides the "navigation map"; how to select the most appropriate Representation based on bandwidth is an algorithm that player developers need to implement. You can use open-source libraries like dash.js, Shaka Player, or zwplayer, which already have mature algorithms built-in.
Debugging Tools: The browser F12 Developer Tools "Network" tab is the best place to observe the DASH workflow. You will see requests for .mpd files, and requests for numerous .m4s or .m4v/.m4a segments.
The MPD file is the "brain" and "navigation map" of a DASH streaming system. Its clear XML structure organizes complex multi-bitrate media stream information in a loosely coupled way. The walkthrough of the example above shows how the player relies on core elements like BaseURL, SegmentTemplate, and Representation to piece together real media segment URLs, and how it uses the bandwidth attribute to switch qualities adaptively. Once you understand these pieces, you have taken the critical first step toward integrating DASH streaming into your own application.
zwplayer supports multi-bitrate adaptive playback of DASH streams as well as embedded subtitle playback. Everyone is welcome to use it 😀