
Understanding the Datatecture Part 3: Video Infrastructure Deep-Dive

In Part Three of this series, we dig into some of the deeper layers of the Streaming Video Datatecture in the Infrastructure category, defining many of the individual sub-categories and explaining their purpose in the broader workflow.


As we covered in the first post of the series, the Datatecture is governed by three main categories: Operations, Infrastructure, and Workflow. Within these categories are also a myriad of other sub-categories, often branching into even more specific groups. This structure isn’t intended as a parent-child hierarchy. Rather, it is just a way of illustrating relationships between specific components and categories of functionality. For example, there are many systems and technologies within analytics that don’t compete against each other because they handle different sets of data, from video player metrics to customer behavior.


What is Video Infrastructure?

As was discussed in the initial blog post, Infrastructure refers to systems which house many of the streaming stack technologies. Infrastructure components represent the most foundational aspect of the stack: storage, databases, containers, and even queueing systems. These systems enable many of the other streaming technologies to work at scale. 


Containers and Virtualization, Storage and Caching, and Queueing Systems

Within the Infrastructure category, there are three primary sub-categories which were outlined in the first post of this blog series. Let’s dig past those and go deeper into video Infrastructure to understand the individual systems involved in this area of the Datatecture.


Containers and Virtualization

As streaming providers have adopted cloud-based components within their technology stack and have moved from monolithic software architectures to microservices, containers and virtualization have become increasingly important. That’s because hardware-based approaches don’t scale well to global audiences. 

To meet the needs of audiences in every geography, including those that expect low-latency delivery, providers would have to host physical servers around the globe. As those audiences grew, they would need to add more servers to keep up with demand. It quickly becomes a very expensive proposition.

Virtualization, and especially containerization, allows operators to deploy new streaming infrastructure on existing cloud providers, enabling operations to grow or shrink programmatically. Containerization is especially exciting because it provides a simplified way, particularly when paired with one of the many available orchestration and management tools, to spin up new streaming components that are already pre-configured for production use.
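To make that concrete, here is a minimal sketch of programmatic scaling, assuming a Kubernetes cluster and a hypothetical containerized packager deployment; the deployment name and namespace are illustrative, not part of any specific stack:

```python
# Minimal sketch: scaling a containerized streaming component programmatically.
# Assumes a reachable Kubernetes cluster and a hypothetical Deployment named
# "packager" in a "streaming" namespace; all names are illustrative.
from kubernetes import client, config

def scale_packager(replicas: int) -> None:
    config.load_kube_config()  # use load_incluster_config() when running inside the cluster
    apps = client.AppsV1Api()
    apps.patch_namespaced_deployment_scale(
        name="packager",
        namespace="streaming",
        body={"spec": {"replicas": replicas}},
    )

# Grow ahead of a big live event, then shrink afterwards.
scale_packager(20)
```

The same call with a smaller replica count shrinks the footprint once demand subsides, which is exactly the elasticity that hardware-based approaches lack.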


Storage and Caching

Streaming is dependent upon storage. Without somewhere to keep the segmented HTTP video files, there would be no way to provide them to requesting viewers. Of course, sometimes the storage of those segments is transitory, such as in a caching system, and other times it is more permanent, such as for an on-demand library.

In addition to physical storage, this category of the Datatecture also includes other storage mechanisms such as databases.

  • Object Storage — This is part of the core infrastructure of a streaming service: a place to house the video segments or transcoded copies. In most cases, this will be a cloud provider which offers a geographically distributed, redundant storage solution and can work in conjunction with CDN caching systems (see the sketch after this list).

  • Origin Services — This is where the content begins. It represents the storage of the original assets, which are then transcoded or packaged into different formats for delivery and storage downstream. In many cases, this storage isn’t as distributed as object storage, which is why it needs to be protected from low-efficiency caches: if there are lots of cache misses and requests have to travel back to the origin, the resulting flood can easily tip it over. Given that, many streaming operators opt for origin services offered by other providers who can protect the origin against flooding and ensure that the master content is always available to be ingested into the delivery workflow.

  • Open Caching Management — Open Caching, a set of specifications developed by the Streaming Video Alliance, is an interoperable, API-based caching system that gives streaming operators, network operators, and content rights holders visibility into and control over the caching topology. As a developing set of specifications, Open Caching isn’t something that can simply be downloaded and installed; it needs to be built and implemented. As such, there are vendors entering the market who can implement and support Open Caching software.

  • Time-series Databases — Some aspects of streaming data, such as player analytics, are time-based. It’s critical to monitor and ultimately troubleshoot player events, understanding exactly when each event happened. That way, it can be correlated with other streaming data, such as CDN logs, to provide the telemetry needed for root-cause analysis.

  • Data Warehouses — Streaming is driven by data. Every component within the workflow, as evidenced by the Datatecture, throws off data. But to provide opportunity for insight, that data needs to be related, and for that to happen, it needs to be stored in a single location. Data warehouses, and more recently data lakes, provide a single storage location for all data sources, enabling streaming operators to see patterns and connections across datasets. By storing the data in a single location, analysis can be significantly sped up as there is no need to query multiple data sources when relating variables.
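As a simple illustration of the object storage piece, here is a minimal sketch, assuming AWS S3 via boto3, of pushing a transcoded segment into a bucket so it can be fronted by a CDN; the bucket and key names are hypothetical:

```python
# Minimal sketch: pushing a transcoded segment into object storage so it can be
# served through a CDN. Assumes AWS S3 via boto3; bucket and key names are
# hypothetical.
import boto3

s3 = boto3.client("s3")

def store_segment(local_path: str, key: str) -> None:
    with open(local_path, "rb") as segment:
        s3.put_object(
            Bucket="vod-library",                      # hypothetical bucket
            Key=key,                                   # e.g. "show-123/1080p/segment_00042.ts"
            Body=segment,
            ContentType="video/MP2T",
            CacheControl="public, max-age=31536000",   # segments are immutable, cache aggressively
        )

store_segment("segment_00042.ts", "show-123/1080p/segment_00042.ts")
```

The aggressive Cache-Control header reflects the fact that, once written, segments are typically immutable, so downstream caches can hold them for a long time.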


Queueing Systems

The streaming workflow is built upon servicing requests. Sometimes, those requests may come from viewers. Sometimes, they may come from other systems. For example, consider a user that requests a video that is not in cache. This request is passed up through the workflow to higher caches until it gets to the origin. 

But what if the content requested is for a specific device or format that hasn’t been prepared? That triggers the workflow to push the origin content through transcoding so it can be returned to the user. But what if there are thousands or millions of such requests? A queueing system, such as a message bus, can help prioritize and organize those requests to ensure that the affected systems receive them without being overloaded.
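As a simplified sketch of that pattern, the example below uses Python’s standard-library PriorityQueue in place of a production message bus (such as RabbitMQ, Kafka, or SQS); the asset names and priorities are illustrative:

```python
# Minimal sketch of the queueing pattern described above: transcode requests are
# queued with a priority so downstream workers are never overloaded. A real
# deployment would use a message bus; the standard library is used purely for
# illustration.
import queue
import threading

transcode_queue: "queue.PriorityQueue[tuple[int, str]]" = queue.PriorityQueue()

def request_rendition(asset_id: str, priority: int) -> None:
    # Lower number = higher priority (e.g. live viewers waiting on a cache miss).
    transcode_queue.put((priority, asset_id))

def transcode_worker() -> None:
    while True:
        priority, asset_id = transcode_queue.get()
        print(f"transcoding {asset_id} (priority {priority})")
        # ... invoke the transcoder here ...
        transcode_queue.task_done()

# A small, fixed pool of workers caps the load on the transcoding farm.
for _ in range(4):
    threading.Thread(target=transcode_worker, daemon=True).start()

request_rendition("movie-987/720p", priority=1)
request_rendition("archive-club-2003/480p", priority=10)
transcode_queue.join()
```

Because the worker pool is fixed in size, a sudden burst of requests simply lengthens the queue rather than overwhelming the transcoders.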


Infrastructure All Works Together

These components don’t work in a vacuum. Data warehouses are linked to time-series databases, which are linked to object storage, which is linked to queueing systems. When looking at your own Datatecture, understanding the interplay between systems means you aren’t seeing data in a silo: data from one component is often used by another component, and data from one technology is often affected by data from another. Seeing these relationships will help you get better visibility across the entire workflow.


To learn more, visit and explore the Datatecture site. In the next blog post, we will explore the groups within the Workflow category.


Understanding the Datatecture Part 2: Operations Deep-Dive

In this second post of the series, we dig into some of the deeper layers of the Streaming Video Datatecture in the Operations category, defining many of the individual sub-categories and explaining their purpose in the broader workflow.


Just a reminder: as we covered in the first post of this series, the Datatecture is governed by three main categories: Operations, Infrastructure, and Workflow. Within these categories are also a myriad of other sub-categories, sometimes themselves branching into even more specific groups. This structure isn’t intended as a parent-child hierarchy. Rather, it is just a way of illustrating relationships between specific components and categories of functionality. For example, there are many systems and technologies within analytics that don’t compete against each other because they handle different sets of data, from video player metrics to customer behavior.

What is Operations?

As was discussed in the initial blog post, Operations refers to systems that are involved in the operation of the streaming service. Many of these systems, like dashboards, video analytics, video quality assurance, and multi-CDN solutions, are part of the Network Operations Center (NOC), where operations and customer support engineers can keep careful track of what’s happening within the streaming video technology stack. But because the operation of a streaming platform extends beyond just traffic and network management, there are also a lot of other systems in use by non-engineering employees, such as customer and product analytics and ad viewability tools.

Analytics, Monitoring, and Configuration Management

Within the Operations category, there are three primary sub-categories which were outlined in the first post of this blog series. Let’s dig past those and go deeper into Operations to understand the individual systems involved in this area of the Datatecture.

Analytics

Analytics is a core function within the streaming technology stack. As such, there are many systems (gathered into separate categories) that address a broad range of activities ranging from quality assurance to ad viewability.

  • Ad Viewability and Verification. One of the biggest issues with delivering digital advertising is ensuring that advertising impressions are legitimate and meet advertiser requirements, such as how much viewing time constitutes a view. Some of these systems are also involved in fraud and bot detection. The systems in this category are critical to any streaming operator whose business model includes advertising.

  • Identity and Attribution. Understanding the impact of marketing campaigns and other subscriber touchpoints is crucial to maximizing viewer engagement, which can have a positive impact on advertiser and subscriber revenue. The platforms in this subcategory enable streaming operators to deeply understand each user touchpoint and maximize revenue opportunities.

  • Customer and Product Analytics. While operations engineers are busy looking at performance data, others in the business are focused on activity within the streaming platform, trying to answer such questions as, “What features are users engaging with the most?”, “How easy is it for users to find what they need?”, or “What are the most visited parts of the interface?” Answering these can be important to maximizing engagement and revenue. The service providers in this subcategory offer platforms that help product managers and product developers better understand user interaction with platform features.

  • Video Quality Assurance. One of the biggest challenges to delivering a great viewing experience is ensuring a high visual quality of the content. There may be points within the workflow, such as encoding, transcoding, and delivery where the quality of the content degrades. The systems in this group of the Datatecture analyze content visually, identifying areas where degradation has occurred (such as blocking) so that it can be remedied before delivering to the viewer.

  • Audience Measurement. An important dataset to the business of streaming is audience measurement. This data provides, in short, an understanding of what viewers are watching, which can be instrumental in influencing future content investments. These well-known providers, such as Comscore and Nielsen, can provide invaluable data about the popularity of and engagement with content.

  • Video Analytics. Much like broadcast, understanding the Quality of Experience (QoE) is crucial to ensuring a great viewing experience. This means gathering data about bitrates, buffering, start time, and more from the player itself. The providers in this subcategory offer specialized services to help both engineers and business-focused employees understand what the viewer is experiencing. Although many of these providers offer data via an API, they also provide proprietary visualization tools.

Monitoring

Unlike Analytics, which can involve more detailed and in-depth exploration of datasets, Monitoring is purely looking at streams of data, such as performance data, most often in a dashboard or visualization tool. 

  • Synthetic Monitoring and Testing. It can be difficult to understand the impact of sudden scale on streaming platform features because it isn’t feasible to employ a million or more real users. Synthetic testing can simulate those users and provide valuable data to understand what the real-world impact of scale might be (a minimal sketch appears after this list). In addition, these same monitors can be employed to continually track operation and performance throughout the video stack, including on-premise, cloud-based, and even third-party components, like CDNs, to provide a holistic view of the workflow.

  • Visualization and Dashboards. The picture of streaming operations is always the same: screens on the walls of the Network Operations Center displaying content and a myriad of dashboards. That’s because without visualization it would be impossible to understand what is happening. There is simply too much data coming too quickly to make sense of raw numbers. Dashboards and visualization tools empower operations engineers to have visibility into performance issues, KPIs, and other data thresholds without having to dig into the numbers themselves.
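For the synthetic monitoring bullet above, here is a minimal sketch of what a simulated audience might look like, assuming a hypothetical HLS manifest URL; a real monitor would also fetch segments and probe from many regions:

```python
# Minimal sketch of synthetic monitoring: a handful of simulated "viewers"
# repeatedly fetch a manifest and report status and response time. The URL is
# hypothetical and purely illustrative.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

MANIFEST_URL = "https://cdn.example.com/live/channel1/master.m3u8"  # hypothetical

def probe(viewer_id: int) -> dict:
    start = time.monotonic()
    response = requests.get(MANIFEST_URL, timeout=5)
    return {
        "viewer": viewer_id,
        "status": response.status_code,
        "latency_ms": round((time.monotonic() - start) * 1000, 1),
    }

# 200 simulated viewers, 50 at a time; the results feed a dashboard or alerting rule.
with ThreadPoolExecutor(max_workers=50) as pool:
    for result in pool.map(probe, range(200)):
        print(result)
```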

Configuration Management

This subcategory within operations addresses systems which are deeply involved in how the streaming platform functions, from managing the data that is collected to how CDNs are used to deliver streams.

  • Data Management Platforms. Streaming is not just about content. It’s about data. Unlike broadcast, the content delivered to viewers through streaming platforms is all bits and bytes. Not only that, but each component within the technology stack throws off data: CDNs have logs, video players have metrics, and so on. All of this data must be managed. The providers in this subcategory offer technologies and Software-as-a-Service products that enable streaming operators to have more control over the data behind their business.

  • Multi-CDN Solutions. As streaming platforms have gone global, it has become necessary to utilize multiple CDNs, as no one CDN has the best performance in every region. Using multi-CDN services, like those offered by the providers in this Datatecture group, streaming operators can quickly and easily move between CDNs to ensure that content is always delivered on the CDN that meets the provider’s requirements, whether those requirements are performance- or price-based (see the sketch after this list).
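As a rough sketch of the kind of decision logic behind CDN switching, the example below scores each CDN on recent performance and price and picks the best one; the CDN names, metrics, and weights are all illustrative:

```python
# Minimal sketch of a multi-CDN switching decision: pick the CDN with the best
# recent score, blending performance and price. All values are illustrative.
RECENT_METRICS = {
    "cdn-a": {"error_rate": 0.002, "p95_ttfb_ms": 180, "cost_per_gb": 0.010},
    "cdn-b": {"error_rate": 0.015, "p95_ttfb_ms": 120, "cost_per_gb": 0.008},
    "cdn-c": {"error_rate": 0.004, "p95_ttfb_ms": 210, "cost_per_gb": 0.006},
}

def score(metrics: dict, performance_weight: float = 0.7) -> float:
    # Lower is better: blend a performance penalty with a cost penalty.
    performance = metrics["error_rate"] * 1000 + metrics["p95_ttfb_ms"] / 100
    cost = metrics["cost_per_gb"] * 100
    return performance_weight * performance + (1 - performance_weight) * cost

def pick_cdn() -> str:
    return min(RECENT_METRICS, key=lambda name: score(RECENT_METRICS[name]))

print(pick_cdn())  # the player or DNS/steering layer would then be pointed at this CDN
```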

Customer Data Platform (CDP)

Sitting outside the other subcategories within operations is a very important system to subscription-based streaming services: CDPs. These platforms enable streaming operators to leverage their first-party data to better and more deeply understand their subscribers. By enabling that understanding, insights can be derived which are critical to the success of marketing campaigns and other targeted communications with subscribers.

Separate, But Not Alone

Although these operations systems are all in discrete and separate groups, they aren’t independent. Many of them provide data that can be used by other systems. For example, some of the platforms have their own dashboards but, with programmatic access to the data, that data can be pulled into more flexible visualization tools, such as Looker. By doing so, both operations engineers and business leaders can exchange simple analysis and monitoring for observability: with all of the data in one place, it can be easier to see patterns across all of the sources (of course, it helps when that data is standardized such as through a Datazoom Data Dictionary). 


CDN Management: The Secret to the Future Success of Streaming Video Platforms

Content delivery networks (CDNs) are critical to the success of streaming platforms. Without their huge networks and experienced engineers, streaming video experiences might be spotty at best. Resilience, consistency, scalability… achieving those streaming platform attributes requires the use of multiple CDNs. But managing multiple delivery networks is hard when each provides its own logs and its own visualization tools. Savvy streaming providers often build complex systems and tools not only to switch quickly between CDNs but also to collect all that log data, normalize it, and visualize it for use by operations engineers.

But CDN management means more than just understanding which CDN is doing well. Identifying root cause is often a collaborative effort between multiple CDNs, streaming operations, and other engineers. To facilitate that, though, everyone has to be able to trace data from each system back to the same playback session… and that means access to a unified set of data.

Step 1 of CDN Management: Set Up a Video Data Platform for Your CDNs

The first step is to implement a video data platform. This platform serves the central purpose of aggregating all of the data, from different providers and sources, and normalizing it against a standard, agreed-upon set of data elements. In many cases, the owner of this platform will be the streaming operator. As the owner of all the data for their streaming video technology stack, the operator can provide access to any third parties, such as the various CDNs. In an ideal world, the video data platform will support programmatic access so that the data can be consumed by other services, like a visualization tool.
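To illustrate the normalization step, here is a minimal sketch that maps each provider’s field names onto one agreed-upon data dictionary so downstream dashboards compare like with like; the provider and field names are hypothetical:

```python
# Minimal sketch of the normalization a video data platform performs: raw log
# fields from each provider are mapped onto a standard data dictionary.
# Provider names and field names are hypothetical.
FIELD_MAP = {
    "cdn_a": {"ttfb": "time_to_first_byte_ms", "sc_status": "http_status", "cs_uri": "asset_path"},
    "cdn_b": {"firstByteMs": "time_to_first_byte_ms", "status": "http_status", "url": "asset_path"},
}

def normalize(provider: str, record: dict) -> dict:
    mapping = FIELD_MAP[provider]
    return {standard: record[raw] for raw, standard in mapping.items() if raw in record}

print(normalize("cdn_a", {"ttfb": 143, "sc_status": 200, "cs_uri": "/live/seg42.ts"}))
print(normalize("cdn_b", {"firstByteMs": 97, "status": 200, "url": "/live/seg42.ts"}))
```

Once every source speaks the same field names and units, records from different CDNs and from the player can be stored side by side and compared directly.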

Step 2 of CDN Management: Create Shared Dashboards

Now that you have a video data platform which is collecting, normalizing, and storing all of the data from the workflow, you can create dashboards representing different aspects of performance monitoring. For example, you might have a QoE dashboard which includes data from a CDN provider in addition to all of the telemetry coming from the player. By giving that CDN access to the dashboard, the streaming operator and the CDN can work hand-in-hand to identify root-cause issues involving that delivery network. What’s more, CDN network and operations engineers can see other data to which they might not normally be privy. They can see how potential issues in their delivery may be impacting aspects of the viewer experience such as start-up time, video exits (because of latency or buffering), and so on.
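As a minimal sketch of the kind of aggregation behind such a shared QoE dashboard, the example below groups player telemetry by the CDN that served each session; the data and field names are illustrative:

```python
# Minimal sketch: player telemetry grouped by the CDN that served each session,
# so a delivery partner can see how its performance shows up in viewer experience.
# All data and field names are illustrative.
from collections import defaultdict
from statistics import mean

player_sessions = [
    {"session": "s1", "cdn": "cdn-a", "startup_time_s": 1.8, "rebuffer_ratio": 0.004},
    {"session": "s2", "cdn": "cdn-a", "startup_time_s": 2.1, "rebuffer_ratio": 0.010},
    {"session": "s3", "cdn": "cdn-b", "startup_time_s": 4.6, "rebuffer_ratio": 0.051},
]

by_cdn: dict[str, list[dict]] = defaultdict(list)
for session in player_sessions:
    by_cdn[session["cdn"]].append(session)

for cdn, sessions in by_cdn.items():
    print(cdn,
          "avg startup:", round(mean(s["startup_time_s"] for s in sessions), 2), "s,",
          "avg rebuffer ratio:", round(mean(s["rebuffer_ratio"] for s in sessions), 4))
```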

A Video Data Platform: The Modern Day Rosetta Stone

This approach to creating a unified solution for streaming video telemetry data is as much about enabling collaboration as it is about gaining consensus. Part of the issue with a streaming video operator sharing data with others in its provider ecosystem is the disconnect over the data itself. Providers may have different measurements, different data names, different values, etc. This can all create confusion when the streaming operator complains about a low value which the provider is measuring as normal or high. The video data platform can be like a Rosetta Stone within a provider ecosystem. Through normalization against a standard set of data elements, there is no longer any need to reconcile competing data sets or data values.

Not Just a Unified Solution for Ecosystem Providers

Of course, having this kind of solution is great for getting third-party providers, like CDNs, onto the same page as the streaming operator. But a video data platform, collecting data from throughout the streaming workflow, both internal and external components, can also help to standardize the metrics, measurements, and values that are used to drive the business. Marketing, for example, may already have their own way of calculating a viewer engagement rate. But if that rate is calculated within the video data platform by employing specific metadata collected from the player, marketing doesn’t have to do that work anymore and doesn’t have to justify or validate its approach. They can simply point to a shared dashboard that captures and visualizes the metrics important to them.

Part of the Value of This CDN Management Approach Is In the Conversation Itself

Of course, having all of the ecosystem providers, like CDNs, working from the same playbook is ideal. But setting up this kind of solution can also be collaborative. Streaming operators can involve a variety of vested parties, such as CDN operations and network engineers, to help identify what’s important, the thresholds for values, and the data elements themselves. When the solution is approached in such a manner, when conversations happen between streaming operator resources and third-party resources, everyone has skin in the game. The conversation about the solution itself, even before conversations involving the shared dashboard, serves to align everyone more closely. And that kind of collaboration is just as valuable as having a unified data solution.
