Architecting A Video Transcoding System | Associated Problems & Solutions


After reading the well-written and thought-provoking article Architecting a Video Transcoding System by Brian Peebles, I thought I’d talk a bit about the problems and consequent solutions in this field.

I’ve been involved in building audio/video and graphics systems for over 20 years, and architecting these systems for performance, functionality, cost, and so on has been a daunting task, with consequences that are large and long-standing.

I like to tell the story of when I was a junior engineer building mainframe graphics systems for CAD/CAM applications. We had a director who, in every product meeting, would remind us of our top three product goals: 1) performance, 2) performance, and 3) ah – performance.

He would then go on with a wry smile and say there was actually a fourth goal, but he didn’t think he had to mention what it was.

Performance Is Not The Only Requirement

As Brian pointed out in his article, performance is not the only requirement. These systems must also be architected for functionality, quality, serviceability, and upgradeability, as well as for less obvious but equally important factors such as space and power requirements.

Architecting video conferencing systems for six years, including going through three different architectures, has further heightened my awareness of the critical impact of choosing the right architecture now, and of ensuring that architecture remains viable as your business grows.

Architecting video transcoding systems involves requirements very similar to those demanded by these earlier graphics and more recent video conferencing systems.

Hardware Selection Is Critical

Given the technology requirements of audio and video transcoding, architecture and hardware selection are critical. Demand for audio and video processing is increasing exponentially – from social networking sites, to the delivery of video content to mobile phones and laptops, to IPTV, and beyond.

The plethora of audio and video algorithms (e.g. AMR, AAC, MP3, AC3, H.264, WMV, Flash, MPEG), along with all of their various profiles, as well as the almost infinite number of resolutions, frame rates, and bit rates, has generated a need for multimedia processing that is hard to comprehend.
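
To make the scale concrete, here is a hypothetical sketch (every codec, resolution, and bit-rate value below is illustrative, not from any real product catalog) of how quickly a modest menu of output options multiplies into distinct renditions:

```python
# Illustrative only: a small menu of codecs, resolutions, frame rates, and
# bit rates still multiplies into a large matrix of output renditions.
from itertools import product

codecs = ["H.264", "WMV", "MPEG-4", "Flash/VP6"]
resolutions = ["1920x1080", "1280x720", "640x480", "320x240"]
frame_rates = [15, 24, 30]
bit_rates_kbps = [300, 800, 1500, 4000]

targets = list(product(codecs, resolutions, frame_rates, bit_rates_kbps))
print(len(targets))  # 4 * 4 * 3 * 4 = 192 renditions of one source clip
```

Four choices along each of four axes already yields nearly two hundred renditions of a single source clip, before profiles and device quirks even enter the picture.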

Add to this all of the required image and audio processing (e.g. scaling, de-interlacing, sample rate conversion, audio gain and normalization), and we effectively need the horsepower of a top fuel dragster with the functionality, reliability, maintainability, and earth-friendly features of a Toyota Prius.
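
As one small example of the audio side, here is a minimal sketch of peak normalization – computing the gain needed to bring a clip’s peak up to a target level. The sample values and the target level are illustrative assumptions:

```python
# Minimal sketch of peak normalization: compute the gain (in dB) needed to
# bring a clip's peak sample up to a target level, then apply that gain.
import math

def peak_dbfs(samples):
    """Peak level in dBFS for samples normalized to [-1.0, 1.0]."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak)

def normalize(samples, target_dbfs=-1.0):
    """Scale samples so their peak sits at target_dbfs."""
    gain_db = target_dbfs - peak_dbfs(samples)
    gain = 10 ** (gain_db / 20)
    return [s * gain for s in samples]

quiet = [0.05, -0.1, 0.25, -0.2]   # peak at 0.25, roughly -12 dBFS
loud = normalize(quiet)            # peak now at -1 dBFS (about 0.891)
```

A production pipeline would more likely normalize to an RMS or loudness target rather than a raw peak, but the gain arithmetic has the same shape.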

So, what does all of this mean? Well, it means we need ultra-fast (and for the most part parallel) mathematical processing, extremely efficient data movement, and the flexibility and programming model to react quickly to customers’ changing demands and the ever-changing compression standards.

ASIC Has Its Problems

Unfortunately, these goals have traditionally been at odds with each other. Typically, the fastest hardware for video processing has been an ASIC. But ASICs have their own inherent problems.

They typically support only one or a few codec standards, won’t support new codec standards, and would very likely not support a new profile or appendix of an existing standard.

And in case anyone gets the notion that ‘this standard and its profiles are the last’, take a look at the relatively new H.264 – the ever-growing number of profiles is becoming staggering.

Add to that the fact that it is usually not cost-effective for a company to build its own ASIC, forcing reliance on a third-party vendor, and an ASIC solution becomes a risky, non-extensible one.

And As For GPPs

At the other end of the spectrum is flexibility. GPPs (general-purpose processors) have traditionally been the most flexible platform. But it is a nearly universal feeling that GPPs just are not designed for the heavy mathematical processing required for video compression and image processing.

Throwing multiple cores at the problem helps some, but diminishing returns quickly kick in with regard to data movement and power consumption.
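
Amdahl’s law gives a rough sketch of why: any serial fraction of the pipeline – and data movement tends to fall in that bucket – caps the achievable speedup no matter how many cores you add. A quick illustration (the 10% serial fraction is an assumed figure, not a measurement):

```python
# Amdahl's law: speedup is limited by the serial fraction of the workload,
# regardless of how many cores execute the parallel portion.
def amdahl_speedup(cores, parallel_fraction):
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Assume 10% of the pipeline (control, data movement) stays serial:
for n in (2, 4, 8, 16):
    print(n, round(amdahl_speedup(n, 0.90), 2))
```

With 10% serial work, 16 cores deliver only about a 6.4x speedup, and no number of cores can push past 10x.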

Other Solutions

Other solutions are starting to gain some traction, such as FPGAs and the new class of parallel programmable processors (e.g. stream processors, the IBM Cell). FPGAs have the advantage that they’re quite fast and retain a level of programmability.

However, they suffer drawbacks in that the engineers with FPGA programming competence are typically not video algorithm engineers, resulting in either sub-optimal implementations or difficult project collaboration.

FPGAs also typically require some sort of GPP for general control and system interfacing, so you end up with a multi-architecture solution with sometimes significant data movement.

The new class of parallel programmable processors is certainly an interesting piece of technology worth watching. The claims from a couple of years ago are quite intriguing – ASIC-level speed with the ease and programmability of a GPP or DSP.

I think the jury is still out on those claims, but as some of these technologies come to fruition, we’ll start to get some visibility into their actual performance and practical programmability.

What About DSPs?

OK, you noticed I left DSPs for last. DSPs have traditionally been the choice for many multimedia architectures, but they have always left architects wishing for more. They were never quite fast enough, never quite easy enough to extract maximum performance from.

Many companies have shifted DSPs and architectures in search of the ‘holy grail’. Well, I think we may finally be getting there with the next class of traditional DSPs that include ‘GPP’ cores and hardware acceleration for the mathematical functions required by video compression.

The result is the potential to have it all – the ease of GPP programmability for control software and system interfacing, the speed and programmability of a DSP core for traditional video and audio processing, and the raw speed of an ASIC for the heavy lifting required by a video compression algorithm like H.264, in functions like motion estimation and de-blocking.
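
To give a feel for the kind of inner loop that acceleration targets, here is a toy sketch of block motion estimation via sum of absolute differences (SAD). Frames are plain 2D lists of luma values, and the block and search-window sizes are made-up toy parameters:

```python
# Toy sketch of SAD-based motion estimation: find where a block from the
# current frame best matches the reference frame within a search window.
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def extract(frame, y, x, size):
    """Cut a size-by-size block out of a 2D frame at (y, x)."""
    return [row[x:x + size] for row in frame[y:y + size]]

def best_motion_vector(ref, cur, y, x, size=2, search=1):
    """Exhaustively search the window for the (dy, dx) minimizing SAD."""
    target = extract(cur, y, x, size)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = y + dy, x + dx
            if 0 <= ry <= len(ref) - size and 0 <= rx <= len(ref[0]) - size:
                cost = sad(extract(ref, ry, rx, size), target)
                if best is None or cost < best[0]:
                    best = (cost, (dy, dx))
    return best  # (cost, motion_vector)

# A 2x2 block that shifted one pixel to the right between frames:
ref = [[10, 20, 0, 0],
       [30, 40, 0, 0],
       [0,  0,  0, 0],
       [0,  0,  0, 0]]
cur = [[0, 10, 20, 0],
       [0, 30, 40, 0],
       [0, 0,  0,  0],
       [0, 0,  0,  0]]
```

Here `best_motion_vector(ref, cur, 0, 1)` returns `(0, (0, -1))` – a zero-cost match one pixel to the left in the reference, i.e. the block moved one pixel right. A real encoder runs searches like this over every macroblock at much larger ranges, which is exactly why it gets offloaded to hardware.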

The result is a single core that can be used to build a transcoding product that is flexible, sustainable, eco-friendly, and – reflecting back to that director of mine many years ago – just plain ol’ fast.

Originally written by Rich Hall of the RipCode Blog. RipCode offers on-demand video transcoding solutions to ease the process of re-purposing video into multiple viewing formats.
