All About Dubbing

Written by Mission Cloud | Nov 9, 2023 4:00:00 PM

In this episode we explore how generative AI is automating media translation and building content dubbing pipelines for huge gains in both efficiency and cost. We explore a real solution we built for one of our customers, MagellanTV, to demonstrate what’s possible with a state-of-the-art AWS-native solution.


Show notes:

A real-life example of a gen AI-powered dubbing pipeline we built for one of our customers: https://www.missioncloud.com/case-studies/magellantv-uses-generative-ai-for-expansion

Amazon's blog on multi-language translation with Amazon Polly: https://aws.amazon.com/blogs/machine-learning/create-audio-for-content-in-multiple-languages-with-the-same-tts-voice-persona-in-amazon-polly/

About Amazon Titan, AWS's homegrown large language model: https://aws.amazon.com/bedrock/titan/

Official Transcript:

Ryan

Welcome to Mission Generate, the podcast where we explore the power of AWS and the cutting-edge techniques we use with real customers to build generative AI solutions for them. Join us each episode as we dig into the technical details, cut through the hype, and uncover the business possibilities.

I'm Ryan Ries, our generative AI practice lead and your host. In today's episode, we're going to talk about content production, translation, and getting your media into the hands of a global audience.

If you've been in the AWS ecosystem for a while, you may have heard one of their value propositions: Go Global in Minutes. It's a succinct way of framing the power of the cloud, where you can launch an application in any part of the world in just minutes.

Today, what we're tackling is another example of this, and it's going to change the media landscape in the same way. Because it gives every producer of content a global audience that they can easily scale to reach.

To kick things off, a disclaimer.

Yes, this is Ryan—my voice anyway—but I'm not actually speaking to you at the moment. An AI is. If you happened to miss it, Mission Generate is a podcast about gen AI that also happens to be made with gen AI. Our senior product marketing manager, Casey, came up with this concept as a way of letting me host because we would have never found the time to actually sit down and record.

We think it also exemplifies the power of this technology. It's convincing enough we even fooled some of my colleagues on a first listen, so we try to give folks a heads up at the start of each episode so that anyone listening knows this is AI-Ryan. Or as we like to call him, RyAIn.

Now that we've got that cleared up, let's talk about dubbing. If you're like me, your first experience of dubbing was all those incredible Kung Fu movies from the 70s. Characters' mouths were moving and someone else was speaking, often in some of the most strained, over-the-top, or just plain weird performances. They're a meme at this point.

And they're amusing, sure, but not exactly what an international media company would willingly do to their brand these days. Nowadays, companies spend huge amounts of money to cater to international audiences. Translators, voice actors, audio engineers--it's a huge undertaking to get something into another language in a way that retains accuracy without losing nuance. To do so without losing the magic of the original performance is nearly impossible. And for each new language you intend to reach, you're going to have to multiply all that cost and effort.

Well, I should have said it "used to be" nearly impossible. Because gen AI, especially when paired with the power of AWS services, is completely overhauling this process. Using tools like Translate, Transcribe, and Polly you can get real-time translation and perfect voice synchronization. But generative AI really takes this to the next level, by allowing you to intercept hard-to-translate phrases, or translations that won't match the speaking length of the original delivery, and reformulate them.
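To make the speaking-length check a little more concrete, here is a minimal Python sketch. It is not the pipeline described in this episode, just one way the idea could work. It assumes each segment already has start and end times (the kind of timing a transcription step can provide) and a draft translation. The names Segment, estimated_duration_s, and needs_reformulation, the 150 words-per-minute speaking rate, and the 15% tolerance are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One spoken line: its timing in the source audio and a draft translation."""
    start_s: float
    end_s: float
    translated_text: str


def estimated_duration_s(text: str, words_per_minute: float = 150.0) -> float:
    """Rough speaking time for text at an assumed, constant speaking rate."""
    return len(text.split()) / words_per_minute * 60.0


def needs_reformulation(segment: Segment, tolerance: float = 0.15,
                        words_per_minute: float = 150.0) -> bool:
    """True when the draft translation would overrun the original time slot."""
    available_s = segment.end_s - segment.start_s
    spoken_s = estimated_duration_s(segment.translated_text, words_per_minute)
    return spoken_s > available_s * (1.0 + tolerance)


# Illustrative data only: two segments with made-up timings.
segments = [
    Segment(0.0, 2.0, "Bienvenidos al programa"),
    Segment(2.0, 3.0, "Hoy vamos a hablar de doblaje y de traducción automática"),
]

# Segments in this list would be handed to a language model to be shortened.
flagged = [s for s in segments if needs_reformulation(s)]
```

Counting whitespace-separated words is a crude proxy and would not work for languages written without spaces, such as Mandarin. A more careful version might synthesize the draft audio and measure its actual length before deciding whether to rephrase.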

This kind of work used to take a lot of manpower, but not anymore.

Here's the deal: if you're in corporate communications, if you host webinars, if you're running an e-learning platform, if you're building a video game, or even like me, just hosting a podcast, this completely changes the economics of reaching a global audience. And when I say completely changes, let me put a hard number on that: we have seen an almost 500x reduction in cost in a real-world implementation we've built. So this is an industry-shattering development.

We're at the very beginning of this technology's potential, but from the results we've seen, I think it's going to upend the content production pipeline of just about every form of media you can imagine.

Let me share a real-life transformation story from one of our customers. MagellanTV approached us with a vision. They had been making inroads in Spanish- and Mandarin-speaking markets. But like many before them, they were relying on a traditional translation pipeline which was slow, expensive, and inefficient.

Consider a common problem: how do you translate English slang or idiom into another language without losing its meaning or charm? Often, there isn't a good one-to-one match for these types of phrases. So normally you'd hire someone really gifted in both languages, or realistically speaking, a team of them. But we set up a system that detects English idiom and slang, and with the combined power of Amazon Translate and Amazon's own large language model, Amazon Titan, we automated this translation process. Then we used Polly to capture and synthesize the original voices in their newly dubbed languages.
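For anyone who wants to picture the idiom-routing step, here is a small Python sketch of the general shape. It is an illustration under stated assumptions, not the system we built for the customer. The IDIOMS list, find_idioms, and translate_sentence are hypothetical names, and the two translator functions are stand-ins: in a real pipeline one might wrap a machine translation service and the other might prompt a large language model.

```python
from typing import Callable, Iterable, List

# Hypothetical phrase list for illustration; a real detector would be far richer.
IDIOMS = ("piece of cake", "break a leg", "hit the sack")


def find_idioms(sentence: str, idioms: Iterable[str] = IDIOMS) -> List[str]:
    """Return every known idiom that appears in the sentence (case-insensitive)."""
    lowered = sentence.lower()
    return [phrase for phrase in idioms if phrase in lowered]


def translate_sentence(
    sentence: str,
    literal_translate: Callable[[str], str],
    idiom_aware_translate: Callable[[str, List[str]], str],
) -> str:
    """Send idiomatic sentences to the idiom-aware path, everything else to the literal one."""
    hits = find_idioms(sentence)
    if hits:
        return idiom_aware_translate(sentence, hits)
    return literal_translate(sentence)


# Stand-in translators that only label which path was taken (assumptions, no real services).
def fake_literal(sentence: str) -> str:
    return f"[literal] {sentence}"


def fake_idiom_aware(sentence: str, idioms: List[str]) -> str:
    return f"[rephrase {', '.join(idioms)}] {sentence}"


results = [
    translate_sentence(s, fake_literal, fake_idiom_aware)
    for s in (
        "The documentary begins at dawn.",
        "Filming underwater was no piece of cake.",
    )
]
```

Passing the translators in as functions keeps the routing logic testable without calling any cloud service, and it lets the two paths be swapped independently.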

In fact, this is all very similar to the techniques that are bringing my synthesized voice to you right now.

Let me sum this up for you. The speed, the accuracy, the cost, the fidelity of this system: all of these are an improvement over the status quo. Such an improvement, I'd say, that solutions built this way are going to outperform their human-driven equivalents, and they're going to get even better over time. Think about it: the more media your models ingest and can train on, the more precise they will get, leading to better and better translations.

We're at the leading edge of this technology, and I still think we haven't seen all that it can do for us. But what I have seen already represents such an improvement that I think any business producing any kind of content should consider it.

Well, I hope you learned something and are just as excited as we are. Because I think we've only scratched the surface of this technology's potential.

And hey, if you found yourself thinking of opportunities to expand your reach, or if you're already dealing with the challenges of addressing a global audience, we'd love to talk to you more. My team works with customers of all sizes, and we're familiar with the challenges of reaching a large audience. And if you'd like, we'll give you an hour of our time, free of charge, just to discuss what you're working on.

We enjoy building solutions like these and you can read more about how we did exactly that for MagellanTV. Just hop on over to mission cloud dot com and read the case study. You can also reach us there, chat with my team, and discuss your own project.

Okay, that's it for this episode. Good luck out there, and happy building!