Thoughts Cory Carpenter

Freeing the Video Industry’s Data from Its Black Box

What it means to unbox the black box, release video data from its silos, and improve the online video experience. 


The impact of these unforeseen times has sharpened the focus on video analytics dramatically. People have embraced video streaming not only as the main way to consume entertainment content, but also as a technology that is rapidly starting to play a major role in other industries.

We’re seeing video streaming adopted for work conferences, telehealth appointments, education, and more during this pandemic. Video has had a huge boost not just because of the yearning for connectivity to keep us sane and entertained, but because organizations are realizing they can still survive and thrive by taking their business to the screen. With a somewhat “no hands on deck” mentality, organizations are using virtual technologies, including video streaming, to replace in-person touchpoints and conduct business as usual.

Where the Industry’s at Now

Even before recent months, the industry saw big changes as a few major conglomerates shuffled their platforms. Disney swallowed up ESPN and the majority of Hulu, while Viacom scooped up PlutoTV and merged with CBS. Comcast acquired NBC and Sky, and AT&T followed suit with DirecTV, Turner, and Otter Media. These moves show the big media corporations scrambling to stake their claim in the new streaming world order.

With 80% of internet traffic now being video, companies can’t dispute that their consumers prefer to consume content via video. With that in mind, those who value the end-user experience are constantly looking for answers, and specifically for more data to help them better understand and control the streaming video experience.

Today’s Black Box

A common pain point we hear is that analysts, marketers, and other decision-makers are frustrated by the walled gardens of information they’re forced to operate in. Their patchwork of single-platform tools scatters data across the video stack and fails to generate insights that drive actions or business outcomes. The result is a black box of information without context or visibility.

For product, operations, marketing, advertising, and business teams at a content publisher, insights come from several systems and technologies that perform analysis without anyone fully understanding their inner workings. These black boxes lack transparency because they’re composed of very complex systems, contrasting inputs, and complicated algorithms.

When it comes to video streaming, improving the experience of consuming video content requires real-time data. For this to happen, the data must be unified in real time to power observability, adaptability, and optimization. For the video stack to truly become actionable, you must maintain a constant pulse on the health of the data in your system. Identifying and evaluating data quality and discoverability issues leads to healthier pipelines, more productive teams, and happier customers.

Why It’s Taken So Long

Streaming is complicated. It’s unique in that it requires an uninterrupted experience from start to finish. Customers expect their viewing experience to be seamless, without spinning hourglasses. Most other uses of the internet are more forgiving: you can deliver a file here and a text there, and the user’s expectations of file, text, and photo sharing are barely affected by momentary changes across the end-to-end environment. Video is not that forgiving. It’s highly susceptible, down to the very millisecond, to anything that happens along the delivery chain.

The unique challenge with video is that it’s not a system under a single entity’s control; control is spread across many entities. From the content owner, to the vendors who support those owners, to the internet providers who balance traffic and connectivity, it’s a distributed system that needs to come together and work in concert for the final outcome to meet expectations. That requires the ability to observe, trace, and influence the interactions between multiple back-end systems.

The industry hasn’t been able to apply consistent measurement across the end-to-end process, which prevents businesses from toggling the variables to change the output. Many systems (encoders, origin, CDN, transit, ISPs) are used to prepare, deliver, and play content, but all are monitored independently. Moreover, the data and metric outputs from those systems are inconsistent and unstandardized, preventing true apples-to-apples analysis. And without a common understanding of end-to-end performance, we can’t pinpoint operational breakages or areas to improve. This leaves us with the black box, because we never used a precise, consistent measurement, standardized as an industry, or tried to pull together a framework as we do for user experience.

Generally speaking, when it comes to data sharing, how nice would it be to share some non-private, telemetry data about how certain services are performing? Sharing this technical feedback with multiple outside vendors would in turn help them work cohesively and serve you better. Are there ways to collect the right data about system performance and the effects on video quality and share them today? 

How To Unbox the Black Box

Breaking the seal around what’s happening for the end-user is the first order of business. However, the tools available today to monitor the end-user experience vary greatly in how they measure, which makes it hard to interpret the current state. Even when these insights are combined with those from other back-end platforms, the inconsistency of the results makes it difficult to align all stakeholders efficiently to take action. Today, manual re-interpretation of metrics is often required, and this prevents any scalable, automated, or real-time improvements from being deployed.

So how do we get our data and metrics to be reliable and insightful for all? Investments need to be made to ensure consistent data collection and measurement at every stage. Establishing a single methodology for what and how things are monitored and measured will create a common understanding of system performance. Therefore, when we tie together insights, we can easily deduce what variables impact our end to end workflows, and thus actually control the outcome. 

Agreeing not only to share insights but also to use shared data collection and measurement methodologies will allow all stakeholders, including external vendors, to align and take action to best support the end-user experience.

A quality video data platform will help all stakeholders involved in the end-to-end video pipeline to do their job more efficiently, and thus help level the playing field for customers of any size to take advantage of the internet to deliver content. You don’t have to be Comcast or Disney to create a great user experience for your users if you can efficiently and effectively align all parties involved to make it happen. You can start by creating data pipes to customize which datasets are shared internally, and which can be provided as feedback for vendors.

Essentially, optimizing an end-to-end workflow requires that everyone be able to optimize their own system, and thus do their part. If we do that, we can raise the bar for all video delivery and deliver flawless video experiences.


A Glimpse into the Future

A business operates best when everyone’s on the same page. Your video systems should be run the same way. If you can tap into the power of raw data to align your technologies with a single source of truth, you’ll create a vast ecosystem.  

The future surely holds more data standards creation, adoption, and technical data sharing between entities at different stages of the end-to-end workflow. Together we can eliminate the black box. If all parties can be more transparent, practices can be improved and opportunity cost can be reduced. The more data is shared in a controlled, standardized manner, the more likely end-users are to get a premium experience.

Thoughts Cory Carpenter

Joining CDN Logs and Playback Data with the Datazoom Session_ID

When a playback error occurs, a Law & Order-esque drama unfolds for video product managers seeking to understand the root cause of the issue. First, they review the analytics which indicated the error. These often include metrics like high buffer ratios, user drop-offs, and Exits Before Video Starts (EBVS). But when it comes time to dig down through the delivery chain to identify the failing links, siloed data fails them. Then the real mystery begins: what caused the problem?

Today, we have no shortage of alerts, indicators, metrics, and reports which define playback errors. However, aside from institutional knowledge (really a glorified ‘best guess’), there are few resources available to identify the culprit, or culprits, causing the problem. The resulting confusion affects user QoE and ultimately, revenues. 

Fortunately, there’s a way to avoid these mysteries and perform efficient and effective root-cause analysis. This methodology centers around an identifier traveling through the delivery chain: the Datazoom Session_ID.

What is the Datazoom Session_ID

The Datazoom Session_ID is like an anchor, a unique 1-to-1 identifier which allows you to correlate events generated during playback against other events generated “upstream.” These events could include ISP drop-offs, CDN abnormalities, a problem with the encoder, et cetera.

As a common variable spanning the entire delivery chain from CDN to end-point, the Session_ID is a key nexus with which logs and events from each link can be correlated. This means information like CDN logs can be queried and correlated with client-side player events in an analytics system. Today, we’ll focus on this CDN use case and provide a starting point for testing it.
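To make the correlation idea concrete, here is a minimal Python sketch that groups player events and CDN log lines under a shared session identifier. Everything about the record layout is an assumption for illustration: the `session_id`, `cache_status`, and `pop` keys and the sample values are hypothetical, not Datazoom’s or any CDN’s actual schema.

```python
from collections import defaultdict

# Hypothetical records: every field name here is an illustrative assumption.
player_events = [
    {"session_id": "abc-123", "type": "Play_Request"},
    {"session_id": "abc-123", "type": "Buffering"},
    {"session_id": "def-456", "type": "Play_Request"},
]
cdn_logs = [
    {"session_id": "abc-123", "cache_status": "MISS", "pop": "LAX"},
    {"session_id": "abc-123", "cache_status": "HIT", "pop": "LAX"},
    {"session_id": "def-456", "cache_status": "HIT", "pop": "JFK"},
]


def join_by_session(events, logs):
    """Group player events and CDN log lines under their shared session ID."""
    joined = defaultdict(lambda: {"player_events": [], "cdn_logs": []})
    for event in events:
        joined[event["session_id"]]["player_events"].append(event)
    for line in logs:
        joined[line["session_id"]]["cdn_logs"].append(line)
    return dict(joined)


sessions = join_by_session(player_events, cdn_logs)

# For sessions that buffered, show the cache statuses the CDN logged for them.
for session_id, data in sessions.items():
    if any(e["type"] == "Buffering" for e in data["player_events"]):
        print(session_id, [line["cache_status"] for line in data["cdn_logs"]])
```

In a real deployment this join would more likely be a query inside your analytics system than a script, but the logic is the same: the session identifier is the join key.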

Implementation of the Datazoom Session_ID is possible for Self-Service and Enterprise customers of Datazoom. For step-by-step guides, click the links below:

1. Setting Up Custom Header Requests: This article lays out the steps necessary for configuring the Datazoom Session_ID on a webpage hosting a supported Datazoom Collector. 

2. Configuring CDN logs to Accept the Datazoom Session_ID: This article lays out the steps for configuring a CDN to accept the Datazoom Session_ID to facilitate the joining of client-side player events with CDN logs. Fastly enables customers to set this themselves, while other CDNs like Akamai, Edgecast, CloudFront, and Limelight can support this functionality via a request made to your account representative.

Visualizing CDN Data with Playback Data

Once the Datazoom Session_ID is implemented across players and the CDNs, you can begin constructing metrics and visualizations for this data. Our team has prepared a sample dashboard (as an XML file) for Splunk users which can be easily imported into their account. 

Alongside conventional QoE metrics built using Datazoom’s Data Dictionary (KPIs like Minutes Viewed, Requests, Starts, Average Time to First Frame, Exits Before Video Start, Average Bitrate, and Buffer Ratio), this dashboard includes CDN focused metrics for Cache Status, Fastly State (for this example), Edge v. Shield, as well as Cache and Cluster Hit Ratios. This dashboard is a great starting point for conducting root cause analysis and obtaining a grasp on how different links in the video delivery chain affect the performance of your service. 
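As a rough illustration of one of the CDN-focused metrics, the sketch below computes a cache hit ratio as the share of logged requests served from cache. The `cache_status` key and its `"HIT"`/`"MISS"` values are assumptions for the example; real CDN logs name and encode this field differently, and the sample dashboard may define the ratio in its own way.

```python
def cache_hit_ratio(cdn_logs):
    """Percentage of logged requests that were served from cache."""
    if not cdn_logs:
        return 0.0
    hits = sum(1 for line in cdn_logs if line["cache_status"] == "HIT")
    return hits / len(cdn_logs) * 100


# Hypothetical log lines: three cache hits and one miss.
logs = [
    {"cache_status": "HIT"},
    {"cache_status": "HIT"},
    {"cache_status": "MISS"},
    {"cache_status": "HIT"},
]
print(cache_hit_ratio(logs))  # 75.0
```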


Getting Started

Interested in implementing the Datazoom Session_ID across your video delivery stack? Click here to sign up for your 15-day, 5GB free trial of Datazoom. Reach out to us if you want more information on how to get started.

Thoughts Cory Carpenter

8 Core Video KPIs you can build with Datazoom’s Data Dictionary

Every video player exposes data points differently. Datazoom’s Data Dictionary defines how we normalize different player terminologies into a common nomenclature. When data is standardized across platforms and players, and centralized across the video delivery chain (from encoders, to CDNs, to playback data), we have a more holistic view of performance and operations, as well as the ability to understand the causes behind the numbers we see.
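The normalization idea can be pictured as a lookup from each player’s own event names to a common name. The Python sketch below is only an illustration: the player labels (`player_a`, `player_b`) and their raw event names are invented for the example, and only the common names `Play_Request` and `First_Frame` come from the formulas in this post.

```python
# Hypothetical mapping from player-specific event names to common names.
EVENT_NAME_MAP = {
    "player_a": {"playRequested": "Play_Request", "firstFrameRendered": "First_Frame"},
    "player_b": {"play_attempt": "Play_Request", "video_start": "First_Frame"},
}


def normalize(player, raw_event):
    """Translate a raw player event into the common nomenclature.

    Returns None when the player does not expose a matching data point.
    """
    common_name = EVENT_NAME_MAP.get(player, {}).get(raw_event["name"])
    if common_name is None:
        return None
    return {"type": common_name, "player": player}


print(normalize("player_a", {"name": "playRequested"}))
print(normalize("player_b", {"name": "video_start"}))
print(normalize("player_b", {"name": "unknown_event"}))
```

Once every player’s events share the same names, a single query can cover all of them.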

Here, we’ll review eight KPIs you can build in any analytics system based on Datazoom’s Data Dictionary. We’ll provide you with some generalized formulas, with Data Dictionary terms bolded.

You can then translate these expressions into the specific querying language of the tool you’re using, or another format your system of choice may require. Keep in mind that these formulas are sample starting points. They are by no means the “end all and be all” formulas. Part of the beauty of the Data Dictionary is its role as a springboard for customizing metrics in a fashion which best suits your organization. 

However, please note that just as players expose data points under different names, each player also exposes a different set of data points. Thus, some of the metrics below may not be supported across players. To check if the players your team relies on are supported, check out our documentation here.

General Metric Formulas

1. Play Requests

Alternatively called “Play Attempts,” this metric is the summation of total user attempts to initiate video playback. This QoE metric provides a good way to obtain an understanding of an audience’s interaction with a specific video asset title. 

Using Data Dictionary nomenclature, a general formula is: 

=(sum of Play_Requests)

2. Play Starts

Sometimes referred to as simply “Plays,” this QoE metric is the total count of First Frame events. As such, this metric indicates the number of playback experiences which successfully initiated. 

Using Data Dictionary nomenclature, a general formula is: 

=(sum of First_Frame)
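As a starting point, Play Requests and Play Starts can both be computed by counting events by type. This Python sketch assumes each event is a flat record with a `type` key, which is an illustrative layout rather than Datazoom’s actual output format.

```python
from collections import Counter

# Hypothetical flat event records; only the event type names come from the text.
events = [
    {"type": "Play_Request"},
    {"type": "First_Frame"},
    {"type": "Play_Request"},
    {"type": "Play_Request"},
    {"type": "First_Frame"},
]

counts = Counter(event["type"] for event in events)
play_requests = counts["Play_Request"]  # total attempts to start playback
play_starts = counts["First_Frame"]     # attempts that reached the first frame
print(play_requests, play_starts)  # 3 2
```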

3. Video Start Failures

This QoE metric is an important gauge of service performance. Expressed as a percentage, Video Start Failures indicates the share of play requests which fail to reach First Frame. In other words, this metric compares Play Requests with Play Starts. Under ideal circumstances, a value close to 0% is desired.

Using Data Dictionary nomenclature, a general formula is: 

=(((sum of Play_Requests) - (sum of First_Frame))/(sum of Play_Requests))*100
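The same calculation as a small Python function, taking the two counts as inputs. The guard for zero play requests is an addition for the sketch, since the formula is undefined when nothing was requested.

```python
def video_start_failure_pct(play_requests, first_frames):
    """Percentage of play requests that never reached the first frame."""
    if play_requests == 0:
        return 0.0  # assumption: report 0% when there were no requests at all
    return (play_requests - first_frames) / play_requests * 100


# 4 play requests, 3 of which reached First_Frame
print(video_start_failure_pct(4, 3))  # 25.0
```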

4. Average Bitrate

This metric reflects the mean bitrate persisting over the course of a playback experience. It is useful for understanding the average data being transferred per second of playback. 

This QoE metric is useful for understanding the average connectivity of the end-user during their experience. 

Using Data Dictionary nomenclature, a generalized formula for Average Bitrate is: 

=((sum of bitrate)/(count of events with bitrate))/1000
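A minimal Python sketch of this average follows. It assumes each event is a dictionary that may or may not carry a `bitrate` value, and that the value is in bits per second, so dividing by 1000 yields kilobits per second. The unit is an assumption; check how your player reports bitrate.

```python
def average_bitrate_kbps(events):
    """Mean of all reported bitrate values, divided by 1000."""
    bitrates = [e["bitrate"] for e in events if e.get("bitrate") is not None]
    if not bitrates:
        return 0.0
    return sum(bitrates) / len(bitrates) / 1000


# Hypothetical events: two carry a bitrate, one does not and is ignored.
events = [
    {"type": "FluxData", "bitrate": 2_000_000},
    {"type": "FluxData", "bitrate": 4_000_000},
    {"type": "Pause"},
]
print(average_bitrate_kbps(events))  # 3000.0
```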

5. Average Time to First Frame

This metric reflects the mean time which has elapsed between the user initiating playback by pressing the play button and the commencement of said playback. This is an important QoE metric which, under ideal circumstances, should be kept as low as possible. A metric complementary to Average Time to First Frame is Exits Before Video Start (EBVS). 

Using Data Dictionary nomenclature, a generalized formula for Average Time to First Frame is: 

=((sum of timeSinceRequested for First_Frame events)/(count of First_Frame events))/1000
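Here is a hedged Python sketch of the metric as a mean: it sums `timeSinceRequested` over First_Frame events, divides by the number of those events, and converts to seconds. It assumes `timeSinceRequested` is reported in milliseconds and that events are flat dictionaries with a `type` key.

```python
def average_time_to_first_frame_s(events):
    """Mean time from play request to first frame, in seconds."""
    waits_ms = [
        e["timeSinceRequested"] for e in events if e["type"] == "First_Frame"
    ]
    if not waits_ms:
        return 0.0
    return sum(waits_ms) / len(waits_ms) / 1000


# Hypothetical events: two playbacks started after 1200 ms and 800 ms.
events = [
    {"type": "Play_Request"},
    {"type": "First_Frame", "timeSinceRequested": 1200},
    {"type": "First_Frame", "timeSinceRequested": 800},
]
print(average_time_to_first_frame_s(events))  # 1.0
```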

6. Exit Before Video Start

This metric computes the percentage of users who exit a video playback experience before the first frame is visible. A high value indicates that start times are lasting longer than users are willing to wait.

Using Data Dictionary nomenclature, a generalized formula for Exit Before Video Start is:

=((count Play_Request - count First_Frame)/count Play_Request)*100

7. Total Time Watched

This metric reflects the total playback time viewed by users. This KPI is useful for obtaining an understanding of the total time users spent viewing content. 

Total Time Watched is also a useful example of the versatility of the Data Dictionary as a foundation for metrics. Our team has identified two different generalized formulas which will yield a value for Total Time Watched.  

Using Data Dictionary nomenclature, some generalized formulas for Total Time Watched are: 

=sum (timeSinceLastFluxData from event.type=FluxData)/1000/60

OR

=sum(max(totalPlayTime for each unique sessionViewId))/1000/60
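Both approaches can be sketched in Python as follows. The flat event layout is an assumption, as is the premise that `timeSinceLastFluxData` and `totalPlayTime` are in milliseconds (which is what dividing by 1000 and then 60 to reach minutes suggests).

```python
def minutes_from_flux(events):
    """Sum the intervals between FluxData events, converted to minutes."""
    total_ms = sum(
        e["timeSinceLastFluxData"] for e in events if e["type"] == "FluxData"
    )
    return total_ms / 1000 / 60


def minutes_from_play_time(events):
    """Sum the largest totalPlayTime seen per sessionViewId, in minutes."""
    max_by_view = {}
    for e in events:
        view = e["sessionViewId"]
        max_by_view[view] = max(max_by_view.get(view, 0), e["totalPlayTime"])
    return sum(max_by_view.values()) / 1000 / 60


# Hypothetical events: view v1 played two minutes, view v2 played one minute.
events = [
    {"type": "FluxData", "sessionViewId": "v1",
     "timeSinceLastFluxData": 60_000, "totalPlayTime": 60_000},
    {"type": "FluxData", "sessionViewId": "v1",
     "timeSinceLastFluxData": 60_000, "totalPlayTime": 120_000},
    {"type": "FluxData", "sessionViewId": "v2",
     "timeSinceLastFluxData": 60_000, "totalPlayTime": 60_000},
]
print(minutes_from_flux(events))       # 3.0
print(minutes_from_play_time(events))  # 3.0
```

On clean data the two agree, as they do here. On real data they can drift apart, for example when FluxData events are dropped, so it is worth comparing both before settling on one.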

8. Rebuffer Ratio

Alternatively called the “Buffer Ratio,” this metric, reflected as a percentage, compares the amount of time a viewer spends re-buffering (waiting for video) against time spent watching a playing video. Rebuffer Ratio is useful for understanding the fraction of a user’s playback experience spent loading the video again once playback commenced. 

Note that for this particular calculation, our formula does include buffer events generated during the initialization of the playback (i.e. the original “buffering” period), though some calculations would exclude these events.

Using Data Dictionary nomenclature, a generalized formula for Rebuffer Ratio is:  

=sum(timeSinceBufferBegin from event.type=BufferEnd)/sum(max(totalPlayTime for each unique sessionViewId))
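A Python sketch of this ratio, under the same assumptions as before: flat event dictionaries, with `timeSinceBufferBegin` and `totalPlayTime` in the same unit. The function returns a plain ratio, as the formula does; multiply by 100 to express it as a percentage.

```python
def rebuffer_ratio(events):
    """Total buffering time divided by total play time across all views."""
    buffer_time = sum(
        e["timeSinceBufferBegin"] for e in events if e["type"] == "BufferEnd"
    )
    max_by_view = {}
    for e in events:
        view = e["sessionViewId"]
        max_by_view[view] = max(max_by_view.get(view, 0), e["totalPlayTime"])
    total_play_time = sum(max_by_view.values())
    if total_play_time == 0:
        return 0.0
    return buffer_time / total_play_time


# Hypothetical events: 5 seconds of buffering over 200 seconds of playback.
events = [
    {"type": "BufferEnd", "sessionViewId": "v1",
     "timeSinceBufferBegin": 3_000, "totalPlayTime": 60_000},
    {"type": "FluxData", "sessionViewId": "v1", "totalPlayTime": 100_000},
    {"type": "BufferEnd", "sessionViewId": "v2",
     "timeSinceBufferBegin": 2_000, "totalPlayTime": 100_000},
]
print(rebuffer_ratio(events))  # 0.025
```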

Another way to view Buffering is to create a time-series metric that charts the average time viewers are in a Buffering state:

=time series chart with a span=10sec of avg(timeSinceBufferBegin from event.type=Buffering)
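Outside of a charting tool, the same series can be approximated by bucketing Buffering events into 10-second windows and averaging within each window. In this Python sketch the `timestamp` key (seconds since the start of the observation period) is a hypothetical field added for illustration.

```python
from collections import defaultdict


def avg_buffering_by_window(events, span_s=10):
    """Average timeSinceBufferBegin of Buffering events per time window."""
    buckets = defaultdict(list)
    for e in events:
        if e["type"] == "Buffering":
            window_start = e["timestamp"] // span_s * span_s
            buckets[window_start].append(e["timeSinceBufferBegin"])
    return {start: sum(v) / len(v) for start, v in sorted(buckets.items())}


# Hypothetical events: two in the first window, one in the second.
events = [
    {"type": "Buffering", "timestamp": 0, "timeSinceBufferBegin": 500},
    {"type": "Buffering", "timestamp": 4, "timeSinceBufferBegin": 1500},
    {"type": "Buffering", "timestamp": 12, "timeSinceBufferBegin": 300},
    {"type": "Play_Request", "timestamp": 13},
]
print(avg_buffering_by_window(events))  # {0: 1000.0, 10: 300.0}
```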


See for yourself

Equipped with these metrics, you can begin to visualize video KPIs in new analytics and data visualization systems ranging from application performance monitoring (APM) tools to customer analytics tools. 

You can start building these metrics now when you visit app.datazoom.io/signup and begin your 15-day, 5GB free trial of Datazoom. Reach out to us if you want more information on how to get started or customize your plan. We would love to hear about your use case so that together, we can create an action plan to assist you in operationalizing your video data.
