Thoughts Cory Carpenter

Don’t Let Video Analytics Keep You From Seeing the Bigger Picture

Are You Seeing the Forest For the Trees?

If you talk to streaming operators about which video data matters, they will often point to QoE metrics: rebuffer ratio, time-to-first-byte, cache-hit ratio. And, yes, those metrics are definitely important. But player-level video analytics alone only scratches the surface: it shows the output of the dozens of technologies involved in delivering and servicing viewer requests. It’s like knowing nothing about how a car moves through the manufacturing line and only seeing what rolls off at the end. What if it doesn’t work? How can you diagnose the problem if you don’t know how the guts were assembled?

End-to-End Instrumentation is Key to Visualizing Video Analytics

In order to understand how all of the component technologies within the workflow influence a metric like rebuffer ratio, it’s crucial to monitor everything within the stack. You need to collect data from encoders, from packagers, from DRM servers, from server-side ad insertion, from CDN delivery, and more. Everything that’s involved in the workflow, from content ingestion to playback, is critical to getting a true picture of everything. In keeping with the title of this post, all of those data sources are the trees and your workflow is the forest.

So how do you see the forest? There are three key steps to any end-to-end instrumentation: data collection, data normalization, and data storage.

Data Collection

This is the most basic step to seeing the forest. If you can’t get to the data from each of the workflow components, you can’t get a complete picture. This may require a programmatic connection, in the case of technologies like virtualized encoders that provide API access, or it may require third-party software or hardware, such as a probe, to monitor the technology. If a technology doesn’t expose data, or a third party doesn’t allow its data (such as CDN logs) to be consumed programmatically, then it might be time to look at a replacement. You can’t have a data black hole in your workflow instrumentation.
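As a minimal sketch of this collection step, the function below parses one metrics payload from a hypothetical encoder API into a flat record. The endpoint shape and field names (`id`, `ts`, `stats`) are illustrative assumptions; a real encoder exposes its own schema.

```python
import json

def collect_encoder_metrics(raw_payload: str) -> dict:
    """Parse one metrics payload from a (hypothetical) encoder API
    into a flat record ready for the rest of the pipeline."""
    data = json.loads(raw_payload)
    stats = data.get("stats", {})
    return {
        "component": "encoder",               # tag the source technology
        "instance_id": data.get("id"),
        "timestamp": data.get("ts"),
        "errors": stats.get("error_count", 0),
        "output_bitrates": stats.get("bitrates", []),
    }

# Example payload as such an API might return it
sample = '{"id": "enc-01", "ts": 1700000000, "stats": {"error_count": 2, "bitrates": [1500, 3000]}}'
record = collect_encoder_metrics(sample)
```

In practice each component type (packager, DRM server, SSAI, CDN) would get its own small collector like this, all emitting records of the same shape.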

Data Normalization

Once the data has been collected, it has to be normalized. You can probably surmise that most workflow technology vendors are not coordinating with each other regarding data representation. They employ different fields and different values, sometimes for the same metric! So to make sense of it all, to ensure there is a relationship between the encoding data about a chunk and that same chunk in the cache, all of the data being collected should be normalized against some standardized schema. Doing so will ensure that the forest you see has all the same types of trees.

Data Storage

Of course, collecting and normalizing all this data without a place to store it doesn’t make much sense. You need a repository that is both flexible and programmatically accessible. This could be a data lake provided by a cloud operator, like Google BigQuery, supported by an additional, transient storage mechanism, like Memcache, for lightning-fast retrieval.
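The two-tier storage idea can be sketched as a write-through store. Here a plain list stands in for the durable data lake and a dict for the transient cache; in production those would be the actual BigQuery and Memcache clients.

```python
class MetricsStore:
    """Toy write-through store: a list stands in for the data lake
    (e.g. BigQuery) and a dict for the transient cache (e.g. Memcache)."""

    def __init__(self):
        self._lake = []    # durable, queryable store
        self._cache = {}   # hot copies keyed for fast retrieval

    def write(self, key: str, record: dict) -> None:
        self._lake.append(record)   # durable append for later analysis
        self._cache[key] = record   # lightning-fast lookup path

    def get(self, key: str):
        # Real code would fall back to querying the lake on a cache miss.
        return self._cache.get(key)

store = MetricsStore()
store.write("chunk:c42", {"chunk_id": "c42", "rebuffer_ratio": 0.02})
```

The design point is simply that the same record lands in both tiers on write, so dashboards read from the fast tier while deep analysis queries the lake.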

With The Forest in View, Your Video Analytics Can Make You a Data Ranger

With end-to-end instrumentation of all the workflow technologies, you can get down to making sense of it all. For those just getting started with this kind of approach, that will require a lot of manual connections. You will spend your time tending the forest: pruning trees, grouping them together, and relating them. That work, of course, will pay dividends in the future as your video analytics visualizations and dashboards become ever smarter. But making manual connections between data sets within your storage isn’t scalable. Most streaming operators will look for ways to automate this through machine-learning or artificial-intelligence systems. These systems, once trained, could propose connections on their own, making suggestions about the nature of a data value. For example, if your rebuffer ratio is high and your encoder is throwing errors, a system like this could bubble up a recommendation that one of the bitrates in the bitrate ladder is corrupt. An intelligent system might even analyze each of the bitrates and identify the one causing the higher rebuffer ratio.
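Before any trained model is involved, that kind of recommendation can start as a simple rule. A minimal sketch of the rebuffer-plus-encoder-errors heuristic described above, with an invented threshold:

```python
def diagnose(rebuffer_ratio, encoder_errors_by_bitrate):
    """If rebuffering is high AND the encoder is throwing errors,
    point at the rendition with the most encode errors.

    Returns a recommendation string, or None if nothing stands out.
    The 5% threshold is illustrative, not an industry standard.
    """
    REBUFFER_THRESHOLD = 0.05
    if rebuffer_ratio <= REBUFFER_THRESHOLD:
        return None
    if not any(encoder_errors_by_bitrate.values()):
        return None
    worst = max(encoder_errors_by_bitrate, key=encoder_errors_by_bitrate.get)
    return f"check the {worst} kbps rendition: {encoder_errors_by_bitrate[worst]} encode errors"

hint = diagnose(0.12, {"1500": 0, "3000": 14, "6000": 1})
```

An ML layer would effectively learn rules like this from historical incidents instead of having them hand-written.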

Let the Forest Tend Itself

With a continual flow of data coming from throughout the workflow, normalized and visualized for quick decisions, you are well on your way to taking the next step in streaming operations: automation. Edge-based, serverless processes, such as Lambda functions, could analyze results from different data sets in real time (leveraging that machine-learning or AI layer mentioned previously) and take action based on pre-determined thresholds. For example, if viewers in a specific geographic region were experiencing high TTFB values, the system could automatically switch to an alternate CDN. If that did not fix the problem, the system could then serve a lower bitrate, overriding the player’s logic. You get the idea. A system like this not only provides granular, robust analysis to operations (through real-time, dynamic visualizations), it also participates in continuous improvement and self-healing. Automation within the streaming video workflow could even get predictive by comparing the real-time video analytics being collected with historical heuristics. What if the system knew that on Mondays, CDN A usually had a tough time in a certain geographic region? Rather than waiting on the analysis of data to make a switch, why not automatically switch to CDN B during that time frame?
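The CDN-switching decision in that example reduces to a small, testable function. This is a sketch of the threshold logic only (the 400 ms threshold is an assumption); the real work is in wiring it to live TTFB data and to the traffic-steering API of your CDNs.

```python
def choose_cdn(ttfb_ms_by_cdn, active, threshold_ms=400.0):
    """Stay on the active CDN while its regional TTFB is acceptable;
    otherwise switch to the best-performing alternative."""
    if ttfb_ms_by_cdn[active] <= threshold_ms:
        return active
    # Breach: pick whichever CDN currently has the lowest TTFB.
    return min(ttfb_ms_by_cdn, key=ttfb_ms_by_cdn.get)

# CDN A is struggling in this region, so traffic moves to CDN B
current = choose_cdn({"cdn_a": 950.0, "cdn_b": 120.0}, active="cdn_a")
```

A predictive version would feed `threshold_ms` or the candidate set from historical heuristics (for example, pre-emptively demoting CDN A on Mondays in a known trouble region).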

Data Enables Decisions, Video Analytics Visualizes Them, But You Have to Make Them

Don’t be the streaming provider that just sees the numbers. That’s looking just at the trees. To truly make informed business decisions that affect QoE and QoS, which in turn affect subscriber satisfaction and churn rates, you need end-to-end instrumentation. With a system that collects all the data, normalizes it, and visualizes it, you can be assured that your operations personnel can see the forest to make better, holistic decisions rather than fixing the value for a single data point.


A Video Data Platform First, Analytics Second

It’s becoming more generally accepted in the streaming video industry that having more data can provide greater insight. With better graphs, better algorithms, better analytics, more improvements can be made. As such, many streaming operators look for new tools or providers to make more data from their technology stack available. However, the result can be overwhelming: a dozen different interfaces, each providing detail and insight into a different aspect of the streaming workflow. Of course, none of them are connected.

Unfortunately, this is an all-too-common approach to streaming video data monitoring. Operators put analytics first, seeking a way to visualize a data source without having an overall strategy for data in general.

What’s needed first, before meaning can be derived, before any attempt at analysis, is a video data platform.

What Are Analytics?

Analytics, simply put, is the use of visualization tools to display data in a form that can be analyzed. Although it’s possible to analyze data points within a table, the concept of analytics, especially within streaming video, usually includes some sort of dashboard or graphical interpretation of the data.
But analytics does not necessarily map to insight. That’s because analytics, in the current state of the streaming industry, is often carried out against siloed data. Each source of data often has its own tool or dashboard that makes “sense” of the data itself. Of course, these are helpful. Just looking at data tables doesn’t reveal much. Yet because each dashboard is independent, true insight must be inferred by looking across multiple tools. This not only increases the complexity of deriving meaning from the data, such as root-cause analysis, but it also significantly increases the time it takes.

Putting the Cart Before the Horse

When streaming operators focus on analysis without a video data platform strategy, they get short-term gains at the cost of long-term benefits. For example, with access to CDN log data through the CDN visualization, a network operations engineer may be able to ascertain a low cache efficiency on a particular piece of content. But the cause of that may not be the CDN at all. Rather, it may be a bad encode for a specific bitrate in the bitrate ladder. Without an overall data platform strategy, which would include a means to relate the CDN log data to the encoder data and even other sources like the player, the analysis is only partially helpful. The low cache-hit ratio reveals a problem. With some help from the CDN operations engineers, the streaming operator may be able to discover that it’s a result of the encoder. This kind of approach is repeated over and over again as new data sources from streaming stack technologies are made available. It’s an analysis-first mindset.

When You Put Strategy First

Rather than looking at how to visualize each type of data, streaming operators need to implement a video data platform. A good video data platform is composed of three layers:

  • Data retrieval and transport (Level 1)

  • Data normalization and standardization (Level 2)

  • Data relating and visualization (Level 3)

Video Data Retrieval and Transport (Level 1)

The first layer of a video data platform is getting the data from the tools. In many cases, this means programmatic access to log files or the tool’s database. Once access has been achieved, the data must be transported to a common location (i.e., a data lake). Most often, this is cloud-based, such as through a provider like Amazon Web Services or Google Cloud Platform, and has programmatic access built-in. Key to this as well is the speed at which data can be transported. Some data should be provided in real-time, such as QoE data from the player, while other data can take longer.
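The transport half of this layer is essentially buffering and flushing. Below is a minimal sketch of a batching transport: real-time sources (like player QoE events) would use a batch size of 1 or a short flush interval, while slower sources can batch heavily. The `sink` callable stands in for the actual data-lake client.

```python
class Transport:
    """Buffers records and flushes them to a sink (the data lake) in
    batches. Batch size trades latency for efficiency."""

    def __init__(self, sink, batch_size=3):
        self.sink = sink            # callable receiving a list of records
        self.batch_size = batch_size
        self.buffer = []

    def send(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(list(self.buffer))  # hand off a copy of the batch
            self.buffer.clear()

lake = []  # stand-in for the cloud data lake
t = Transport(lake.append, batch_size=2)
t.send({"event": "play"})
t.send({"event": "pause"})  # second record triggers a flush
```

A real implementation would also flush on a timer so slow trickles of data don’t sit in the buffer indefinitely.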

Data Normalization and Standardization (Level 2)

The second layer of the video data platform is a process by which to normalize and standardize the data. Many tools collect similar data points. For example, the average video player utilizes over 15 software development kits (SDKs) from various technology vendors. These may collect data points that are duplicative and need to be scrubbed, normalized, and de-duped. The streaming operator can build a machine-learning system on top of the data lake to take care of this normalization.
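Before any machine-learning layer, de-duplication can be done with a simple key-based pass. This sketch assumes duplicate events from different SDKs share a session, event name, and timestamp; real matching rules would be more nuanced.

```python
def dedupe(events):
    """Drop duplicate events reported by multiple SDKs, keyed on
    (session, event name, timestamp). First occurrence wins."""
    seen = set()
    out = []
    for e in events:
        key = (e["session"], e["event"], e["timestamp"])
        if key not in seen:
            seen.add(key)
            out.append(e)
    return out

events = [
    {"session": "s1", "event": "play", "timestamp": 10, "sdk": "player"},
    {"session": "s1", "event": "play", "timestamp": 10, "sdk": "ads"},   # duplicate
    {"session": "s1", "event": "pause", "timestamp": 20, "sdk": "player"},
]
clean = dedupe(events)
```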

Data Relating and Visualization (Level 3)

The final layer of the video platform is making the connections between elements in different data sources and carrying out the calculations that are needed to derive meaning. This usually involves a mapping of data elements or utilizing a master table (based on standardized data elements) which links data elements between sources together under a single master element. This can often be accomplished through third-party tools like Tableau, Datadog, or Looker. These tools also provide visualization features so streaming operators can create customized dashboards for different business groups or roles.
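The master-table idea boils down to joining normalized data sets on a shared key. A minimal sketch, with invented rows, linking a CDN log record to the encoder record for the same chunk:

```python
# Hypothetical rows, already normalized so both sources share "chunk_id".
cdn_rows = [{"chunk_id": "c42", "cache_hit": False}]
encoder_rows = [{"chunk_id": "c42", "encode_errors": 3}]

def relate(left, right, key):
    """Join two normalized data sets on a shared key, i.e. the lookup a
    master table performs before visualization."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

joined = relate(cdn_rows, encoder_rows, "chunk_id")
```

A tool like Tableau or Looker performs the equivalent join declaratively; the point of Level 3 is that the keys exist to join on at all.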

A Streaming Video Data Platform Grows With the Business

The best part about making analytics a product of your video data platform is that you don’t have to rebuild everything when you want to include new data in your visualizations. A video data platform is flexible by nature. The architecture is intended to accommodate new data sources by simply connecting them through an API (which is a function of the platform itself). New logic can be added to the normalization and standardization layer, again without any “rip and replace,” enabling new relationships to be created between different data sets, which can then be exposed through enhanced visualizations. The video data platform, then, becomes the foundation for all monitoring and analytics across the organization.

Datazoom: A Ready-to-Implement Video Data Platform

Of course, you can build all of this yourself. However, is building a video data platform your core business? As a streaming operator, probably not. Furthermore, you can’t just rely on any provider to supply something so fundamental to the health and success of your streaming business. You need a fire-tested, battle-hardened, proven platform to ensure that the data you need to provide the best possible video experience is available quickly, normalized to your business needs, and visualized for actionable business decisions.


Increase Engagement and Stickiness

This is the sixth and final blog in a series talking about how video streaming data, pulled from various parts of the workflow, can be used to support business goals. This post will focus on increasing subscriber engagement and stickiness.


Do you know how much content your viewers are watching? On what device? At what time of day? If you don’t, then you are missing a critical piece of the long-term success puzzle for your streaming platform: user engagement. Understanding how often your viewers watch, from where, and on what device is fundamental data for your business. Not only can this data help shape advertising, it can also help you determine the long-term viability of your platform. If users are watching little content, or only a small portion of your subscribers are logging in every day, it may signal troubling times for your business. Thankfully, with access to the data from your streaming workflow, you can take action to increase engagement and stickiness.

The Two Core Values of Engagement and Stickiness: DVU and MVU

Two key metrics, Daily Views per User (DVU) and Monthly Views per User (MVU), tell you how often your users are returning to your platform to watch content. Industry statistics tell us that users who are more engaged and return more often are five times more likely to continue paying for the service. In short, a retained user is far more valuable than a newly acquired one!
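Computed from raw playback events, these metrics are straightforward aggregations. A minimal sketch, assuming view events have already been reduced to `(user_id, date)` pairs:

```python
from collections import defaultdict
from datetime import date

def views_per_user(view_events):
    """Average daily and monthly views per user from (user_id, date)
    view events. Averages are taken over active user-days/user-months."""
    daily = defaultdict(int)    # (user, day)   -> view count
    monthly = defaultdict(int)  # (user, month) -> view count
    for user, day in view_events:
        daily[(user, day)] += 1
        monthly[(user, (day.year, day.month))] += 1
    dvu = sum(daily.values()) / len(daily)
    mvu = sum(monthly.values()) / len(monthly)
    return dvu, mvu

events = [
    ("u1", date(2024, 1, 1)), ("u1", date(2024, 1, 1)),
    ("u1", date(2024, 1, 2)), ("u2", date(2024, 1, 1)),
]
dvu, mvu = views_per_user(events)
```

Note the denominator choice matters: averaging over active user-days (as here) measures engagement intensity, while averaging over the whole subscriber base would also capture how many users never show up at all.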

Measuring these values and using them to experiment with a variety of levers in the platform (i.e., ad placement, subscription tiers, content recommendations, etc.) can help ensure you drive up the level of repeat engagement. Most streaming platforms, for example, get only about 20% of their subscriber base to continually engage. In fact, studies have shown that 80% of users churn after three days of subscribing. Improving those numbers, which will have a demonstrable impact on sustainable revenue, ad impression sales, and ad impression value, involves continual monitoring.
So how do you get users to come back and consume content every day, week, and month? You utilize player and delivery data to build content journeys for different user personas. For example, some users like to binge on content. So you offer them that experience. Other users like the slow burn, such as releasing a single episode each week at the same time (similar to traditional linear television). By looking at the N-day retention of users who perform the playback start event for a series, you can see how the airing date affects their engagement, and can segment and adjust marketing campaigns to re-engage accordingly.
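The N-day retention measurement mentioned above can be sketched as follows, given each user’s first playback-start date for a series and their subsequent play dates:

```python
from datetime import date

def n_day_retention(first_play, later_plays, n):
    """Fraction of users who, after their first playback-start event,
    played again within n days."""
    retained = 0
    for user, start in first_play.items():
        plays = later_plays.get(user, [])
        if any(0 < (d - start).days <= n for d in plays):
            retained += 1
    return retained / len(first_play)

first = {"u1": date(2024, 1, 1), "u2": date(2024, 1, 1)}
later = {"u1": [date(2024, 1, 3)], "u2": [date(2024, 2, 1)]}
r = n_day_retention(first, later, n=7)  # u1 returns within 7 days, u2 does not
```

Segmenting this by series airing date (weekly drop vs. full-season release) is what lets you compare how the release model affects return behavior.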

Cohort Analysis: An Example of Using Subscriber Data to Affect Meaningful Improvement in Your Streaming Platform

As defined by Bill Su in his Medium article, cohort analysis is, “…an analytical technique that focuses on analyzing the behavior of a group of users/customers over time, thereby uncovering insights about the experiences of those customers, and what companies can do to better those experiences.”

Capturing data about individual users, for instance to customize content recommendations, is important, but understanding how groups of similar, or dissimilar, users behave is critical to improving the overall experience. For example, employing cohort analysis with data gathered through Datazoom could give you a deep understanding of consumer behavior over the first 72 hours (a critical OTT timeframe, when users generally decide whether to stay or churn; of course, this could be aligned with a free-trial time frame as well).

Using this data, you could then make needed changes to the content recommendation engines, platform features, etc. to try and mitigate that early churn. Another example might be to understand how a decline in viewing minutes relates to churn. You could look at users who have a certain percentage of decline over a given time frame. Then you could reference that against churn rates for that group.
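The decline-versus-churn comparison described above amounts to splitting users into two groups and comparing churn rates. A minimal sketch, with an invented 50% decline threshold and made-up rows:

```python
def decline_cohort_churn(users, decline_threshold=0.5):
    """Churn rate among users whose viewing minutes dropped by at least
    `decline_threshold` between two periods, vs. everyone else."""
    flagged, other = [], []
    for u in users:
        drop = 1 - u["minutes_now"] / u["minutes_before"]
        (flagged if drop >= decline_threshold else other).append(u)

    def rate(group):
        return sum(u["churned"] for u in group) / len(group) if group else 0.0

    return rate(flagged), rate(other)

users = [
    {"minutes_before": 100, "minutes_now": 20, "churned": True},   # sharp decline
    {"minutes_before": 100, "minutes_now": 30, "churned": False},  # sharp decline
    {"minutes_before": 100, "minutes_now": 90, "churned": False},  # stable viewer
]
declined_rate, baseline_rate = decline_cohort_churn(users)
```

If the declined cohort churns at a meaningfully higher rate than the baseline, the decline threshold becomes the trigger for the alerting and re-engagement campaigns described next.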

Ultimately, you could set system alerts, perhaps automated emails that hit employee inboxes every morning, when multiple users have hit the start of the declination threshold enabling marketers to target specific programs at those users in the hopes of driving their viewing numbers back up.

Although cohort analysis can be complicated to set up to ensure it’s reporting the insights you need, it is becoming a valuable tool in the OTT operator’s toolbox for preventing churn and increasing subscriber satisfaction.

Keep Them Coming Back For More

Of course, having a rich and popular content library is sure to bring users back to your platform day-after-day. But even if you don’t have a billion-dollar content budget, you can still utilize viewer data to not only understand user receptiveness to your content, but also a host of other behaviors which all relate to how often your users will return to your platform. Keeping them engaged with the content that is most relevant to their behaviors will ensure they return often and, hopefully, bring others with them.
