Understanding the Datatecture Part 4: Workflow Deep-Dive
In Part four of this series, we dig into some of the deeper layers of the Streaming Video Datatecture in the Workflow category, defining many of the individual sub-categories and explaining their purpose in the broader workflow.
As we covered in the first post of the series, the Datatecture is governed by three main categories: Operations, Infrastructure, and Workflow. Within these categories are also a myriad of other sub-categories, often branching into even more specific groups. This structure isn’t intended as a parent-child hierarchy. Rather, it is just a way of illustrating relationships between specific components and categories of functionality. For example, there are many systems and technologies within analytics that don’t compete against each other because they handle different sets of data from video player metrics to customer behavior.
What is Workflow?
As was discussed in the initial blog post, Workflow refers to the core systems which enable a stream to be ingested, secured, delivered and played.
Delivery, Security, Playback, Transformation, Monetization, and Content Recommendations
Within the Workflow category, there are six primary sub-categories which were outlined in the first post of this blog series. Let’s dig past those and go deeper into Workflow to understand the individual systems involved in this area of the Datatecture.
Delivery
At the heart of streaming is delivering a video stream to a viewer’s player. In technical terms, this most often means a web server sending video segments in response to an HTTP request. But there are many ways to accomplish that, as evidenced by the sub-categories within this Datatecture group:
Content Delivery Network (CDN). A CDN is a cache-based network which speeds up the delivery of video segments by placing popular segments closer to the user, reducing the round-trip time. Most streaming operators employ multiple CDNs which have strengths in specific regions (because of network saturation and size) or overall global penetration. CDNs often work hand-in-hand with network operators (ISPs) by existing within their networks (as caching boxes) or terminating at their networks in peering fabrics. There are three primary types of CDNs: private networks, cloud deployments, and algorithm-based networks (Akamai being the notable example). Private networks often lease wavelengths and run their own optical gear to build a private loop network. Cloud deployments leverage existing Cloud Service Providers (CSPs) to provide distribution and scale without having to build physical infrastructure.
Ultra-Low Latency Streaming. Certain use cases, such as online gambling, which require real-time interaction need sub-second delivery. Often relying on non-traditional streaming technologies like WebRTC, these services (sometimes offered by traditional CDNs) ensure super-fast round-trip times at the cost of scalability.
Multicast ABR. Streaming has historically been a unicast approach: each user that requests a stream gets their own unique copy of it. This is because streaming is often over-the-top (OTT) and relies on the public internet for last-mile delivery. The distributed nature of the internet doesn’t provide the network services needed to manage that delivery the way a traditional broadcast (multicast) network does. So, when there are millions of concurrent users, the unicast approach can require significant bandwidth and ultimately force a reduction in quality to meet bandwidth constraints. Multicast Assisted Adaptive Bitrate, or Multicast ABR, is a suite of technologies to enable the use of multicast (a single stream that is consumed by every viewer) over the internet.
Peer-to-Peer (P2P) Streaming. P2P streaming uses peers, such as viewers’ devices, to deliver content to other viewers within a very limited geographic region. The technology “seeds” peers with video segments. These peers act as local caches for other peers within the P2P network. This approach can significantly reduce bandwidth requirements for a platform operator by taking advantage of viewer bandwidth that would otherwise go unused. P2P can be an especially useful approach for live content when working in conjunction with a traditional CDN.
Security
Unlike traditional broadcast which has a closed end-to-end system (from network operator to set-top box), streaming is a more open ecosystem. As such, content rights holders must utilize other technologies to ensure the security of their content when delivered via streaming. These methods can include:
Geo IP. This security technology limits access to viewers who meet specific geographic requirements. For example, a streaming operator may only have the rights to distribute content in a specific geography. If viewers from outside that geography attempt to access the content, they can be blocked by resolving their IP address to a geographic location and comparing it against a whitelist of permitted locations.
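To make the mechanics concrete, here is a minimal sketch of a geo-blocking check in Python. The lookup table, country codes, and allowlist are hypothetical placeholders; a production system would resolve the IP against a commercial GeoIP database.

```python
# Hypothetical geo-blocking sketch: names and lookup data are illustrative only.

# A stand-in for a GeoIP database lookup (e.g., a commercial IP-to-country service).
IP_TO_COUNTRY = {
    "203.0.113.7": "US",
    "198.51.100.23": "DE",
}

# Territories the operator holds distribution rights for (the "whitelist").
ALLOWED_COUNTRIES = {"US", "CA"}

def is_playback_allowed(client_ip: str) -> bool:
    """Resolve the client IP to a country and compare it against the whitelist."""
    country = IP_TO_COUNTRY.get(client_ip)   # unknown IPs resolve to None
    return country in ALLOWED_COUNTRIES      # block unknown or out-of-region viewers

print(is_playback_allowed("203.0.113.7"))    # True  (US is licensed)
print(is_playback_allowed("198.51.100.23"))  # False (DE is not licensed)
```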
Digital Rights Management (DRM). This security technology employs encryption and decryption to keep content secure. A viewer who has purchased the rights to watch content can be provided a license. When they request DRM-encrypted content, the license is checked against a licensing server to verify those rights. If the rights are verified, the player can decrypt the content.
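As a rough illustration of that license check, the sketch below models a license server verifying a viewer’s entitlement before releasing a content key. The entitlement store, key store, and function names are hypothetical; real DRM systems (Widevine, FairPlay, PlayReady) wrap this exchange in their own protocols.

```python
# Hypothetical DRM license-check sketch; real systems use vendor-specific protocols.

ENTITLEMENTS = {("viewer-42", "movie-001")}           # purchases recorded by the business system
CONTENT_KEYS = {"movie-001": "base64-key-material"}   # keys used to encrypt each title

def request_license(viewer_id: str, content_id: str) -> str:
    """Return the decryption key only if the viewer is entitled to the content."""
    if (viewer_id, content_id) not in ENTITLEMENTS:
        raise PermissionError("No valid license for this viewer/content pair")
    return CONTENT_KEYS[content_id]                   # the player uses this key to decrypt

key = request_license("viewer-42", "movie-001")       # succeeds; playback can begin
```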
Watermarking. In some cases, such as live content, DRM may not be a viable option (as it can introduce additional latency). In these cases, watermarking can be a significant deterrent. The watermark is layered into the frames of a video pixel-by-pixel. The resulting pattern of pixel manipulation is a binary hash representing critical data about the content such as who originally purchased it, the IP address of the purchasing user, etc. If watermarked content is found on the internet, forensic technologies can pull the data from the watermark to identify how the content was made available.
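The sketch below shows the general idea of forensic watermarking in a heavily simplified form: encoding a payload into the least significant bits of pixel values. It is illustrative only; commercial watermarking uses far more robust and imperceptible schemes.

```python
# Simplified illustration of pixel-level watermarking (not a production technique).
import numpy as np

def embed_watermark(frame: np.ndarray, payload_bits: list[int]) -> np.ndarray:
    """Write payload bits into the least significant bit of the first N pixels."""
    marked = frame.copy()
    flat = marked.reshape(-1)
    for i, bit in enumerate(payload_bits):
        flat[i] = (flat[i] & 0xFE) | bit      # overwrite the LSB with the payload bit
    return marked

def extract_watermark(frame: np.ndarray, n_bits: int) -> list[int]:
    """Read the payload back out of the least significant bits."""
    return [int(v & 1) for v in frame.reshape(-1)[:n_bits]]

frame = np.random.randint(0, 256, size=(720, 1280), dtype=np.uint8)
payload = [1, 0, 1, 1, 0, 0, 1, 0]            # e.g., a hash of the purchaser's account ID
marked = embed_watermark(frame, payload)
assert extract_watermark(marked, len(payload)) == payload
```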
Playback
This is where the rubber meets the road. Unlike traditional broadcast, in which there is a single endpoint, streaming supports a virtually unlimited number of endpoints from which the viewer can consume content. In fact, any device with a screen and an operating system that can support a video player can be an endpoint. This includes SmartTVs, mobile phones, tablets, gaming consoles, and more. As such, this Datatecture category is broken down into a multitude of sub-categories which reflect both the endpoints and the player technologies themselves:
Devices. The sub-categories within this category represent the endpoints on which a video player might exist and allow playback of streaming content. These endpoints can include:
Connected TVs. These are TVs with a software platform that allows the installation of applications, such as streaming services, which include a player.
Gaming Consoles. Many gaming consoles, such as the Microsoft Xbox, Sony PlayStation, and Nintendo Switch, include video player software for content playback.
Mobile. Not only do the main mobile operating systems provide a player, but each OS also supports an application ecosystem which may include other players as well.
Set-Top Boxes/OS. These companies create IP-based STBs, which include a player component, as well as STB operating systems that can be installed on generic hardware. These operating systems include built-in video player software and sometimes also support the installation of third-party players.
Connected Streaming Devices. Perhaps the newest entrants in the endpoint category, these represent self-contained platforms for users to consume video from a variety of service providers. They are similar to a SmartTV, but portable, so they can be moved from television to television. They include built-in video player software and also support third-party applications, such as streaming services, which can include their own proprietary video player software.
Players. The sub-categories within this category represent the three main flavors of player implementation:
Commercial. These are companies which have created and support video player software that can be installed within an application or as a standalone implementation.
Open-Source. Similar to commercial but without the price tag, open source player technology includes software created and supported by a community of developers.
Offline. A key capability of many streaming platforms is allowing the viewer to download a movie and watch it offline (rather than streaming it). To facilitate this, the player needs to support it. Rather than building such functionality into a commercial or open-source player themselves, some streaming operators opt for a commercial player that supports download-to-go functionality out of the box.
Transformation
Unlike traditional broadcast, streaming video must be transformed (encoded and packaged) prior to delivery to provide a stream which does not consume all the available bandwidth. What’s more, different player implementations on different devices (often a reflection of licensing costs) require different formats. All in all, this can significantly complicate the video workflow by requiring operators to support multiple codecs and packaging formats. The sub-categories within this Datatecture group represent the technologies which streaming platforms use to ensure the content is consumable at the viewer endpoints. This can sometimes happen in real-time.
Encoding. This is the process by which the source material, say from camera acquisition, is converted into a format playable by an endpoint. This requires a specific codec which is often optimized for the kind of delivery, such as broadcast versus streaming. Once the content is encoded, the endpoint player will also need the same codec to decode it. There are a variety of ways to encode, ranging from on-premise equipment (most often with traditional broadcast) to virtualized encoders (offering scalability) to an encoding-as-a-service provider (which obviates the need to keep the encoding software up-to-date).
Transcoding. This technology represents the re-encoding of content into a different format without changing the underlying aspect ratio of the content. Transcoding is the primary technology employed in adaptive bitrate (ABR) ladders, allowing endpoint players to “switch” between bitrates depending upon the current parameters of the environment, such as available bandwidth, CPU, and memory. Transcoding can happen via commercial and open-source software (e.g., FFmpeg) as well as service providers. Unlike encoding, it can also happen in real-time, enabling streaming operators to deliver specific renditions when requested.
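As a rough sketch of how an ABR ladder might be produced, the snippet below drives FFmpeg from Python to transcode a mezzanine file into a few renditions. The ladder values and file names are illustrative assumptions, not a recommended encoding recipe.

```python
# Illustrative ABR-ladder transcode using FFmpeg via subprocess (assumes ffmpeg is installed).
import subprocess

RENDITIONS = [                  # hypothetical ladder: (height, video bitrate, audio bitrate)
    (1080, "5000k", "192k"),
    (720,  "3000k", "128k"),
    (480,  "1500k", "96k"),
]

def transcode_ladder(source: str) -> None:
    for height, v_bitrate, a_bitrate in RENDITIONS:
        out = f"output_{height}p.mp4"
        subprocess.run([
            "ffmpeg", "-y", "-i", source,
            "-c:v", "libx264", "-b:v", v_bitrate,
            "-vf", f"scale=-2:{height}",      # keep aspect ratio, set target height
            "-c:a", "aac", "-b:a", a_bitrate,
            out,
        ], check=True)

transcode_ladder("mezzanine.mp4")
```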
Packaging. Packaging is a group of technologies to “wrap” encoded or transcoded content into a format that is playable by the endpoint. There are a host of popular packaging formats, including Apple HLS, MPEG-DASH, and CMAF. Streaming operators can build their own packaging services or opt to utilize a service provider. In the latter implementation, there is little maintenance involved for the streaming provider, and they can rest assured that the packages are always up-to-date.
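Continuing the sketch above, packaging a rendition into HLS can be done with FFmpeg’s built-in segmenter. The segment duration and naming below are illustrative; operators typically use a dedicated packager or a packaging service for multi-format (HLS/DASH/CMAF) output.

```python
# Illustrative HLS packaging step using FFmpeg's segmenter (assumes ffmpeg is installed).
import os
import subprocess

def package_hls(rendition: str, out_dir: str = "hls") -> None:
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run([
        "ffmpeg", "-y", "-i", rendition,
        "-c", "copy",                                 # already encoded; just re-wrap into segments
        "-hls_time", "6",                             # ~6-second segments
        "-hls_playlist_type", "vod",
        "-hls_segment_filename", f"{out_dir}/seg_%05d.ts",
        f"{out_dir}/index.m3u8",
    ], check=True)

package_hls("output_720p.mp4")
```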
Metadata. One of the fundamental differences between streaming and broadcast content is metadata. This data, which is part of the streaming package, represents information about the content, from the title to the content creator to the actors and other details. Metadata is crucial to streaming platforms as it provides the means by which content can be organized and recommended. The providers within this Datatecture group represent stores of content metadata from which a streaming provider can draw to add metadata to their content.
Monetization
The transition from broadcast distribution to streaming distribution is fraught with technical challenges. One of those is monetization, especially for streaming operators that have opted for advertising-based distribution models (rather than, or in conjunction with, subscriptions). The delivery of advertising in a traditional television broadcast is based on numerous standards with technology that has been tested and improved over time. With streaming, though, monetization of a video platform, such as embedding advertising into the videos, can involve a multitude of technologies which often aren’t built to interoperate. Furthermore, streaming operators are still gathering data to better understand the translation of the broadcast television advertising model to the streaming ecosystem. The sub-categories within this Datatecture group reflect the myriad of technologies involved in monetizing streaming video.
Paywall. As the name suggests, this is a barrier between free content and content which the viewer must pay to watch. This monetization strategy can often complement an advertising-based approach and be used to create FOMO which can lead to more consistent and predictable revenue, such as a subscription.
Advertising Systems.
Supply-side Platforms (SSPs). SSPs are software used to sell advertising in an automated fashion and are most often used by online publishers to help them sell display, video, and mobile ads. SSPs are designed for publishers to do the opposite of a DSP: to maximize the prices at which their impressions sell. SSPs and DSPs utilize very similar technologies.
Ad Exchange. An ad exchange is a digital marketplace which enables advertisers and publishers to buy and sell advertising space, often through real-time auctions. They’re most often used to sell display, video, and mobile ad inventory.
Video Ad Insertion. Getting advertisements into a video stream is in no way as easy or straightforward as doing so in broadcast television. Streaming workflows which want to monetize content through advertising need technology to stitch the ad into the video stream. This process can happen server-side (SSAI) or client-side (CSAI). SSAI is often used for live content while CSAI is used more for on-demand content.
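To illustrate the stitching idea, here is a simplified sketch of server-side insertion at the manifest level: ad segments are spliced into an HLS media playlist with discontinuity markers so the player transitions cleanly between content and ad encodes. The segment names and durations are invented; real SSAI systems handle ad decisioning, conditioning, and tracking on top of this.

```python
# Simplified SSAI sketch: splice ad segments into an HLS media playlist at a cue point.
CONTENT_SEGMENTS = ["content_000.ts", "content_001.ts", "content_002.ts", "content_003.ts"]
AD_SEGMENTS = ["ad_000.ts", "ad_001.ts"]

def stitch_playlist(cue_index: int, segment_duration: float = 6.0) -> str:
    lines = ["#EXTM3U", "#EXT-X-VERSION:3", f"#EXT-X-TARGETDURATION:{int(segment_duration)}"]
    for i, seg in enumerate(CONTENT_SEGMENTS):
        if i == cue_index:                    # splice the ad break in at the cue point
            lines.append("#EXT-X-DISCONTINUITY")
            for ad in AD_SEGMENTS:
                lines += [f"#EXTINF:{segment_duration:.3f},", ad]
            lines.append("#EXT-X-DISCONTINUITY")
        lines += [f"#EXTINF:{segment_duration:.3f},", seg]
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines)

print(stitch_playlist(cue_index=2))
```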
Buy-Side Ad Servers. Buy-Side Ad Servers are video ad servers utilized by the advertiser.
Ad Network.
Video Ad Servers. An ad server is a technology which manages, serves, tracks, and reports online display advertising campaigns. The process by which ad servers operate is relatively simple. First, a user visits a video, and the publisher’s ad server receives a request to display an ad. Second, once the ad server receives the request, it examines the data to choose the most appropriate ad for the viewer. The ad tag contains an extensive list of criteria fed by the advertiser, and ads are selected based on factors such as age, geography, size, behavior, etc. Third, once the best match has been made, it’s passed to the video ad insertion technology (again, client-side or server-side) where it can be delivered to the player for playback. Finally, the player gathers information relating to the user’s interaction with the ad, such as clicks, impressions, conversions, etc.
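As a toy version of the selection step described above, the sketch below checks candidate ads against a viewer profile and returns the first match. The targeting fields and values are invented for illustration; real ad servers evaluate far richer criteria before handing the winner to the ad-insertion layer.

```python
# Toy ad-selection sketch; targeting fields and values are illustrative only.
ADS = [
    {"id": "ad-1", "geo": {"US"}, "min_age": 18, "max_age": 34},
    {"id": "ad-2", "geo": {"US", "CA"}, "min_age": 25, "max_age": 54},
]

def choose_ad(viewer: dict):
    """Return the first ad whose targeting criteria match the viewer, if any."""
    for ad in ADS:
        if viewer["geo"] in ad["geo"] and ad["min_age"] <= viewer["age"] <= ad["max_age"]:
            return ad["id"]                   # winner is handed to SSAI/CSAI for stitching
    return None

print(choose_ad({"geo": "US", "age": 30}))    # "ad-1"
```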
Demand-Side Platforms (DSPs). Demand-side Platforms (DSPs) are used by marketers to buy ad impressions from exchanges as cheaply and as efficiently as possible. These are the marketer’s equivalent to the SSP.
Content Recommendation
Perhaps one of the most exciting aspects of delivering video via streaming rather than broadcast is data. With streaming video, there is a myriad of data generated from each view, data that is not available in a broadcast environment. As such, streaming platform operators can tailor the viewing, content, and even advertising experience more tightly to each individual viewer, providing for a far more personalized experience. One technology that enables this personalization is content recommendation. Often packaged into an “engine,” these software components installed within the delivery workflow analyze data and, using the metadata attached to each piece of content, can recommend content for the viewer to watch based on what they, or people like them, have watched. This can significantly improve engagement metrics, such as viewing time, as well as revenue.
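A recommendation engine can take many forms; the sketch below shows one of the simplest, an item-to-item similarity over viewing history using cosine similarity on a viewer–title matrix. The catalog and viewing data are made up for illustration.

```python
# Minimal item-to-item recommendation sketch using cosine similarity (illustrative data).
import numpy as np

TITLES = ["Drama A", "Drama B", "Comedy A", "Documentary A"]
# Rows = viewers, columns = titles; 1 means the viewer watched the title.
VIEWS = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
])

def recommend(watched_index: int, top_n: int = 2) -> list[str]:
    """Recommend the titles most co-watched with the given title."""
    target = VIEWS[:, watched_index].astype(float)
    scores = []
    for j in range(VIEWS.shape[1]):
        if j == watched_index:
            continue
        other = VIEWS[:, j].astype(float)
        denom = np.linalg.norm(target) * np.linalg.norm(other)
        sim = float(target @ other / denom) if denom else 0.0
        scores.append((sim, TITLES[j]))
    return [title for _, title in sorted(scores, reverse=True)[:top_n]]

print(recommend(TITLES.index("Drama A")))
```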
The Workflow is a Process
Unlike the other two categories, Infrastructure and Operations, the Workflow category of the Datatecture represents a somewhat linear progression: content is transformed, secured, delivered, played back, and monetized. Of course, some of the individual technologies may be integrated within different functional components of the workflow (such as watermarking happening during transformation) but there is generally a flow within the workflow pipeline. What this demonstrates, like in the other categories, is a very intricate web of technologies which must all work in harmony to provide a scalable, resilient, and high-performing streaming service.
To learn more, visit and explore the Datatecture site.
Understanding the Datatecture Part 3: Video Infrastructure Deep-Dive
In Part three of this series, we dig into some of the deeper layers of the Streaming Video Datatecture in the Infrastructure category, defining many of the individual sub-categories and explaining their purpose in the broader workflow.
As we covered in the first post of the series, the Datatecture is governed by three main categories: Operations, Infrastructure, and Workflow. Within these categories are also a myriad of other sub-categories, often branching into even more specific groups. This structure isn’t intended as a parent-child hierarchy. Rather, it is just a way of illustrating relationships between specific components and categories of functionality. For example, there are many systems and technologies within analytics that don’t compete against each other because they handle different sets of data from video player metrics to customer behavior.
What is Video Infrastructure?
As was discussed in the initial blog post, Infrastructure refers to systems which house many of the streaming stack technologies. Infrastructure components represent the most foundational aspect of the stack: storage, databases, containers, and even queueing systems. These systems enable many of the other streaming technologies to work at scale.
Containers and Virtualization, Storage and Caching, and Queueing Systems
Within the Infrastructure category, there are three primary sub-categories which were outlined in the first post of this blog series. Let’s dig past those and go deeper into video Infrastructure to understand the individual systems involved in this area of the Datatecture.
Containers and Virtualization
As streaming providers have adopted cloud-based components within their technology stack and have moved from monolithic software architectures to microservices, containers and virtualization have become increasingly important. That’s because hardware-based approaches don’t scale well to global audiences.
To meet the needs of all geographic audiences, such as those requiring low latency, providers would have to host physical servers around the globe. As those audiences grew, they would need to add more servers to support the demand. It quickly becomes a very expensive proposition.
Virtualization, though, and especially containers, allow operators to deploy new streaming infrastructure into existing cloud providers, enabling operations to grow or shrink programmatically. Containerization is especially exciting because it provides a simplified way (particularly when using one of a variety of management tools) to spin up new streaming components that are already pre-configured for production use.
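For a concrete feel, the sketch below uses the Docker SDK for Python to launch a containerized packaging component programmatically. The image name, port, and environment variable are hypothetical stand-ins; in practice this kind of scaling is usually delegated to an orchestrator such as Kubernetes rather than scripted by hand.

```python
# Hypothetical example: programmatically launching a pre-configured streaming component
# in a container using the Docker SDK for Python (pip install docker).
import docker

client = docker.from_env()

container = client.containers.run(
    "example/hls-packager:latest",           # hypothetical pre-built packager image
    detach=True,                             # run in the background
    ports={"8080/tcp": 8080},                # expose the packager's HTTP endpoint
    environment={"SEGMENT_DURATION": "6"},   # hypothetical runtime configuration
)

print(container.status)                      # scaling down is just container.stop() / .remove()
```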
Storage and Caching
Streaming is dependent upon storage. Without somewhere to keep the segmented HTTP video files, there would be no way to provide them to requesting viewers. Of course, sometimes the storage of those segments is transitory, such as in a caching system, and other times is more permanent, such as for an on-demand library.
In addition to physical storage, this category of the datatecture also includes other storage mechanisms such as databases.
Object Storage. This is part of the core infrastructure of a streaming service: a place to house the video segments or transcoded copies. In most cases, this will be a cloud provider which offers a geographically distributed, redundant storage solution and can work in conjunction with CDN caching systems.
Origin Services. This is where the content begins. It represents the storage of the original assets which are then transcoded or packaged into different formats for delivery and storage downstream. In many cases, this storage isn’t as distributed as object storage, which is why it needs to be protected from the request floods that low-efficiency caches can create. If there are lots of cache misses and requests need to travel back to the origin, a flood can easily tip these servers over. Given that, many streaming operators opt for origin services offered by other providers who can protect the origin against flooding and ensure that the master content is always available to be ingested into the delivery workflow.
Open Caching Management. Open Caching, a development by the Streaming Video Alliance, is an interoperable, API-based caching system that gives streaming operators, network operators, and content rights holders visibility into and control over the caching topology. As a developing set of specifications, Open Caching isn’t something that can be downloaded and installed. It needs to be built and implemented. As such, there are vendors entering the market who can implement and support Open Caching software implementations.
Time-series Databases. There are some aspects of streaming data, such as player analytics, which are time-based. It’s critical to monitor and ultimately troubleshoot player events, understanding at what point each event happened. That way, it can be correlated with other streaming data, such as CDN logs, to provide telemetry on root cause.
Data Warehouses. Streaming is driven by data. Every component within the workflow, as evidenced by the Datatecture, throws off data. But to provide opportunity for insight, that data needs to be related. For that to happen, it needs to be stored in a single location. Data warehouses and, more recently, data lakes provide a single storage location for all data sources, enabling streaming operators to see patterns and connections across datasets. By storing the data in a single location, analysis can be significantly sped up as there is no need to query multiple data sources when relating variables.
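As a small illustration of why co-locating data matters, the sketch below joins player QoE events with CDN log records on a shared session ID using pandas; in a warehouse the same relationship would typically be expressed as a SQL join. All field names and values are invented.

```python
# Illustrative join of player metrics and CDN logs on a shared session ID (made-up data).
import pandas as pd

player_events = pd.DataFrame({
    "session_id": ["s1", "s2", "s3"],
    "rebuffer_ms": [0, 4200, 150],
})

cdn_logs = pd.DataFrame({
    "session_id": ["s1", "s2", "s3"],
    "cdn": ["cdn-a", "cdn-b", "cdn-a"],
    "cache_status": ["HIT", "MISS", "HIT"],
})

# With both datasets in one place, relating QoE to delivery behavior is a single join.
joined = player_events.merge(cdn_logs, on="session_id")
print(joined.groupby("cdn")["rebuffer_ms"].mean())   # e.g., average rebuffering by CDN
```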
Queueing Systems
The streaming workflow is built upon servicing requests. Sometimes, those requests may come from viewers. Sometimes, they may come from other systems. For example, consider a user that requests a video that is not in cache. This request is passed up through the workflow to higher caches until it gets to the origin.
But what if the content requested is for a specific device or format that isn’t prepared? That then triggers the workflow to push the origin content through transcoding so it can be returned to the user. But what if there are thousands or millions of such requests? A queueing system, such as a message bus, can help prioritize and organize those requests to ensure that affected systems are receiving them without being overloaded.
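The sketch below shows the pattern in miniature using Python’s standard-library queue: cache misses for unprepared formats become transcode jobs, and a small, fixed pool of workers drains the queue at a rate the transcoders can sustain. A production system would use a message bus (e.g., Kafka or RabbitMQ) rather than an in-process queue; job names here are placeholders.

```python
# Minimal producer/consumer sketch of a transcode job queue (in-process stand-in for a message bus).
import queue
import threading

job_queue = queue.Queue()

def worker(worker_id: int) -> None:
    while True:
        job = job_queue.get()                # blocks until a job is available
        if job is None:                      # sentinel: no more work
            job_queue.task_done()
            break
        print(f"worker {worker_id} transcoding {job}")
        job_queue.task_done()

# A small, fixed worker pool keeps the transcoders from being overloaded.
workers = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in workers:
    t.start()

# Cache misses for unprepared formats become transcode jobs.
for title in ["movie-001:480p", "movie-002:720p", "movie-003:1080p"]:
    job_queue.put(title)

job_queue.join()                             # wait for all jobs to finish
for _ in workers:
    job_queue.put(None)                      # shut the workers down
for t in workers:
    t.join()
```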
Infrastructure All Works Together
These components don’t work in a vacuum. An important point to understand is that data warehouses are linked to time-series databases, which are linked to object storage, which is linked to queueing systems. When looking at your own Datatecture, understanding the interplay between systems means you aren’t seeing data in a silo: data from one component is often used by another component, and data from one technology is often affected by data from another. Seeing these relationships will help you get better visibility across the entire workflow.
To learn more, visit and explore the Datatecture site. In the next blog post, we will explore the groups within the Workflow category.
Understanding the Datatecture Part 2: Operations Deep-Dive
In the second post of this series, we dig into some of the deeper layers of the Streaming Video Datatecture in the Operations category, defining many of the individual sub-categories and explaining their purpose in the broader workflow.
Just a reminder, as we covered in the first post of this series, the Datatecture is governed by three main categories: Operations, Infrastructure, and Workflow. Within these categories are also a myriad of other sub-categories, sometimes themselves branching to even more specific groups. This structure isn’t intended as a parent-child hierarchy. Rather, it is just a way of illustrating relationships between specific components and categories of functionality. For example, there are many systems and technologies within analytics that don’t compete against each other because they handle different sets of data from video player metrics to customer behavior.
What is Operations?
As was discussed in the initial blog post, Operations refers to systems that are involved in the operation of the streaming service. Many of these systems, like dashboards, video analytics, video quality assurance, and multi-CDN solutions are part of the Network Operations Center (NOC) where operations and customer support engineers can keep careful track of what’s happening within the streaming video technology stack. But because the operation of a streaming platform extends beyond just traffic and network management, there are also a lot of other systems in use by non-engineering employees such as customer and product analytics and ad viewability.
Analytics, Monitoring, and Configuration Management
Within the Operations category, there are three primary sub-categories which were outlined in the first post of this blog series. Let’s dig past those and go deeper into Operations to understand the individual systems involved in this area of the Datatecture.
Analytics
Analytics is a core function within the streaming technology stack. As such, there are many systems (gathered into separate categories) that address a broad range of activities ranging from quality assurance to ad viewability.
Ad Viewability and Verification. One of the biggest issues with delivering digital advertising is ensuring advertising impressions are legitimate and meet the advertiser requirements such as how much time constitutes a view. Some of these systems are also involved in fraud and bot detection. The systems in this category are critical to any streaming operator whose business model includes advertising.
Identity and Attribution. Understanding the impact of marketing campaigns and other subscriber touchpoints is crucial to maximizing viewer engagement, which can have a positive impact on advertiser and subscriber revenue. The platforms in this subcategory enable streaming operators to deeply understand each user touchpoint and maximize revenue opportunities.
Customer and Product Analytics. While operations engineers are busy looking at performance data, others in the business are focused on activity within the streaming platform, trying to answer such questions as, “What features are users engaging with the most?”, “How easy is it for users to find what they need?”, and “What are the most visited parts of the interface?” Answering these can be important to maximizing engagement and revenue. The service providers in this subcategory offer platforms to help product managers and product developers better understand user interaction with platform features.
Video Quality Assurance. One of the biggest challenges to delivering a great viewing experience is ensuring a high visual quality of the content. There may be points within the workflow, such as encoding, transcoding, and delivery where the quality of the content degrades. The systems in this group of the Datatecture analyze content visually, identifying areas where degradation has occurred (such as blocking) so that it can be remedied before delivering to the viewer.
Audience Measurement. An important dataset to the business of streaming is audience measurement. This data provides, in short, an understanding of what viewers are watching which can be instrumental in influencing future content investments. These well-known providers, such as Comscore and Nielsen, can provide invaluable data about the popularity and engagement with content.
Video Analytics. Much like broadcast, understanding the Quality of Experience (QoE) is crucial to ensuring a great viewing experience. This means gathering data about bitrates, buffering, start time, and more from the player itself. The providers in this subcategory offer specialized services to help both engineers and business-focused employees understand what the viewer is experiencing. Although many of these providers offer data via an API, they also provide proprietary visualization tools.
Monitoring
Unlike Analytics, which can involve more detailed and in-depth exploration of datasets, Monitoring is purely looking at streams of data, such as performance data, most often in a dashboard or visualization tool.
Synthetic Monitoring and Testing. It can be difficult to understand the impact of sudden scale on streaming platform features because it isn’t feasible to employ a million or more users. Synthetic testing can simulate those users and provide valuable data to understand what the real-world impact of scale might be. In addition, these same monitors can be employed to continually track operation and performance throughout the video stack including on-premise, cloud-based, and even third-parties, like CDNs, to provide a holistic view of the workflow.
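A trivial version of a synthetic monitor is sketched below: concurrent simulated players fetch a manifest URL and report status and latency. The URL and concurrency level are placeholders; commercial synthetic testing simulates full playback sessions at far larger scale.

```python
# Tiny synthetic-monitoring sketch: concurrent simulated players fetching a manifest (placeholder URL).
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

MANIFEST_URL = "https://example.com/live/index.m3u8"   # placeholder endpoint

def simulated_player(player_id: int) -> tuple[int, int, float]:
    start = time.monotonic()
    with urllib.request.urlopen(MANIFEST_URL, timeout=5) as resp:
        status = resp.status
    return player_id, status, time.monotonic() - start

with ThreadPoolExecutor(max_workers=20) as pool:        # 20 "viewers" hitting the service at once
    for player_id, status, latency in pool.map(simulated_player, range(20)):
        print(f"player {player_id}: HTTP {status} in {latency:.3f}s")
```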
Visualization and Dashboards. The visual image of streaming operations is always the same: screens on the walls of the Network Operations Center displaying content and a myriad of dashboards. That’s because without visualization it would be impossible to understand what is happening. There is simply too much data coming too quickly to make sense of raw numbers. Dashboards and visualization tools empower operations engineers to have visibility into performance issues, KPIs, and other data thresholds without having to dig into the numbers themselves.
Configuration Management
This subcategory within operations addresses systems which are deeply involved in how the streaming platform functions, from managing the data that is collected to how CDNs are used to deliver streams.
Data Management Platforms. Streaming is not just about content. It’s about data. Unlike broadcast, the content delivered to viewers through streaming platforms is all bits and bytes. Not only that, but each component within the technology stack throws off data: CDNs have logs, video players have metrics, etc. All of this data must be managed. The providers in this subcategory provide technologies and Software-as-a-Service offerings that enable streaming operators to have more control over the data behind their business.
Multi-CDN solutions. As streaming platforms have gone global, it has become necessary to utilize multiple CDNs as no one CDN has the best performance in every region. Using Multi-CDN services, like those offered by the providers in this Datatecture group, streaming operators can quickly and easily move between CDNs to ensure that content is always delivered on the CDN that meets the provider’s requirements, whether that is performance or price-based.
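Multi-CDN switching logic can be as simple as the sketch below: pick the delivery network with the best recent performance score, subject to a cost cap. The CDN names, metrics, and weighting are invented to show the shape of the decision, not a real policy.

```python
# Illustrative multi-CDN selection: scores and thresholds are invented for the example.
CDN_METRICS = {
    "cdn-a": {"error_rate": 0.002, "p95_latency_ms": 180, "cost_per_gb": 0.012},
    "cdn-b": {"error_rate": 0.010, "p95_latency_ms": 120, "cost_per_gb": 0.009},
    "cdn-c": {"error_rate": 0.001, "p95_latency_ms": 220, "cost_per_gb": 0.020},
}

def pick_cdn(max_cost_per_gb: float = 0.015) -> str:
    """Choose the lowest-scoring (best) CDN under the cost cap."""
    candidates = {
        name: m["error_rate"] * 1000 + m["p95_latency_ms"] / 100   # crude combined score
        for name, m in CDN_METRICS.items()
        if m["cost_per_gb"] <= max_cost_per_gb
    }
    return min(candidates, key=candidates.get)

print(pick_cdn())   # the player or request router would then be steered to this CDN
```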
Customer Data Platform (CDP)
Sitting outside the other subcategories within operations is a very important system to subscription-based streaming services: CDPs. These platforms enable streaming operators to leverage their first-party data to better and more deeply understand their subscribers. By enabling that understanding, insights can be derived which are critical to the success of marketing campaigns and other targeted communications with subscribers.
Separate, But Not Alone
Although these operations systems are all in discrete and separate groups, they aren’t independent. Many of them provide data that can be used by other systems. For example, some of the platforms have their own dashboards but, with programmatic access to the data, that data can be pulled into more flexible visualization tools, such as Looker. By doing so, both operations engineers and business leaders can exchange simple analysis and monitoring for observability: with all of the data in one place, it can be easier to see patterns across all of the sources (of course, it helps when that data is standardized such as through a Datazoom Data Dictionary).