[G] GenAI subtitles for Korean BW content

Kraekkling
Profile Blog Joined June 2007
Romania383 Posts
Last Edited: 2025-05-07 04:12:04
May 07 2025 01:42 GMT
#1
ASL RO8, Soulkey vs Rush, from Flash/Shuttle stream

ASL RO8 spoilers below!!

g1:
https://www.captionfy.com/video/youtube/Ixu6V3pCQf8?c=en

g2:
https://www.captionfy.com/video/youtube/p7l6c5qzoDw?c=en

g3:
https://www.captionfy.com/video/youtube/p_rvWNRKhgw?c=en

g4:
https://www.captionfy.com/video/youtube/a478rarEBTY?c=en

g5:
https://www.captionfy.com/video/youtube/YZWpi_IUi94?c=en-Ntb

g6:
https://www.captionfy.com/video/youtube/YZWpi_IUi94?c=en-Ntb

g7:
https://www.captionfy.com/video/youtube/YZWpi_IUi94?c=en-Ntb


The latest Gemini model by Google can handle video input and works surprisingly well for generating English subtitles for Korean Brood War videos. It still makes mistakes here and there and sometimes hallucinates, but it's a big step up from the gibberish you get from YouTube's auto-subtitles. If I had to guesstimate, I’d say it gets >80% right, which feels pretty impressive.

Workflow below.

I'm using Gemini 2.5 Pro Preview (05-06) at https://aistudio.google.com/ with default settings. The model is currently free to test. It supports up to 1 million tokens of context; one minute of video is roughly 20k tokens, so the videos above ended up around 160k–170k tokens each. At that rate the context limit is reached at roughly 50 minutes of video, so long videos such as the daily Proleague or KCM would not work. Maybe chopping them up somehow could work?
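
As a rough illustration of the chopping idea, here is a minimal sketch that uses the token figures quoted above to estimate input size and split a long video into chunks that fit the context window. The 80% budget and the function names are assumptions made for this example, not anything Gemini or AI Studio prescribes.

```python
import math

TOKENS_PER_MINUTE = 20_000   # rough figure from the post
CONTEXT_LIMIT = 1_000_000    # context size from the post
BUDGET_PERCENT = 80          # assumption: keep 20% free for prompts, summary and output


def estimate_tokens(video_minutes):
    """Approximate input tokens for a video of the given length."""
    return int(video_minutes * TOKENS_PER_MINUTE)


def max_chunk_minutes():
    """Longest chunk, in whole minutes, that fits the assumed token budget."""
    budget = CONTEXT_LIMIT * BUDGET_PERCENT // 100
    return budget // TOKENS_PER_MINUTE


def plan_chunks(video_minutes):
    """Split a video into near-equal (start_minute, end_minute) chunks."""
    n_chunks = max(1, math.ceil(video_minutes / max_chunk_minutes()))
    step = video_minutes / n_chunks
    return [(round(i * step, 2), round((i + 1) * step, 2)) for i in range(n_chunks)]


if __name__ == "__main__":
    print(estimate_tokens(8.5))   # an 8.5-minute game
    print(plan_chunks(180))       # a hypothetical 3-hour broadcast
```

Subtitles generated per chunk would still need their timestamps shifted by the chunk's start time before merging, and the model would not carry context from one chunk to the next, so this is only a starting point.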

Basically, I just pass it the YouTube link and ask it to generate English subtitles.

I've found it works better if I do this in two steps. First, I give it the link and just ask, "what is happening here?".

It will take a while and output a summary.

Interestingly, this summary often has hallucinations and often doesn’t accurately describe the video. Still, I noticed that when I skip this step and instead ask for subtitles right away, the results are worse. It seems like preloading the context window with Brood War jargon actually helps when it comes time to generate the subtitles. The summary itself being wrong doesn't seem to have any effect on the quality of the subtitles.


After that, I ask it to create the subtitles. The prompt I use looks like this:


create english subtitles (.srt)

Quick sanity checklist for SRT files:

Sequential numbers starting at 1.

Timestamp line exactly HH:MM:SS,mmm --> HH:MM:SS,mmm.

The video is less than 1 hour long so all timestamps must start with 00 for HH.

One subtitle text line.

A blank line after every cue.
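
(End of prompt.) The same checklist can also be applied mechanically to whatever the model returns. Below is a minimal sketch of such a check; the function name `check_srt` and the error messages are made up for this example, and it only covers the rules listed in the prompt, not the full range of what SRT players accept.

```python
import re

# Strict timestamp line: HH:MM:SS,mmm --> HH:MM:SS,mmm
TIMESTAMP_LINE = re.compile(
    r"^(\d{2}):\d{2}:\d{2},\d{3} --> (\d{2}):\d{2}:\d{2},\d{3}$"
)


def check_srt(text, max_hours=0):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    text = text.replace("\r\n", "\n").strip()
    # Cues are expected to be separated by blank lines.
    blocks = [b for b in re.split(r"\n\s*\n", text) if b.strip()]
    for expected, block in enumerate(blocks, start=1):
        lines = block.split("\n")
        if len(lines) < 3:
            problems.append(f"cue {expected}: needs a number, a timestamp and a text line")
            continue
        if lines[0].strip() != str(expected):
            problems.append(f"cue {expected}: found number {lines[0].strip()!r}")
        match = TIMESTAMP_LINE.match(lines[1].strip())
        if not match:
            problems.append(f"cue {expected}: bad timestamp line {lines[1]!r}")
        elif int(match.group(1)) > max_hours or int(match.group(2)) > max_hours:
            problems.append(f"cue {expected}: hour field is past the video length")
        if len(lines) > 3:
            # Also triggers when the blank line between two cues is missing.
            problems.append(f"cue {expected}: more than one text line")
    return problems
```

Calling `check_srt(open("game1.srt", encoding="utf-8").read())` on a saved file (the filename is hypothetical) gives a quick yes/no before trying the file in a player.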


This should give you subtitles you can copy, save as an .srt file, and use with the video.


The resulting .srt file sometimes has errors that result in missing text, often because the generated formatting is wrong. Most of the time I found it best to just re-run until it worked. Alternatively, you could adjust the prompt or fix the .srt yourself. I found the browser add-on Substital useful because it lets you use a local .srt file for YouTube videos, and it reported errors caused by wrongly formatted .srt files faster than captionfy.
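
For the "fix the .srt yourself" route, a small script can handle the most mechanical formatting slips. The sketch below is an illustration only: the function names are invented here, and the set of repairs (renumbering, padding a missing hour field, normalising the millisecond separator and arrow, joining multi-line text, restoring blank lines) is an assumption about which errors are common, not a list taken from the post.

```python
import re

# Loose timestamp line: optional hours, "," or "." before the milliseconds, any "->" arrow.
LOOSE_TIMESTAMP = re.compile(
    r"^((?:\d{1,2}:)?\d{1,2}:\d{1,2}[,.]\d{1,3})\s*-+>\s*"
    r"((?:\d{1,2}:)?\d{1,2}:\d{1,2}[,.]\d{1,3})$"
)


def _normalise_time(stamp):
    """Turn e.g. '1:02.5' into '00:01:02,500'."""
    main, millis = re.split(r"[,.]", stamp)
    parts = [int(p) for p in main.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # add the missing hour field
    hours, minutes, seconds = parts
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis.ljust(3, '0')}"


def repair_srt(text):
    """Rebuild an SRT file with clean numbering, timestamps and spacing."""
    cues = []  # each cue is [start, end, [text lines]]
    for line in text.replace("\r\n", "\n").split("\n"):
        line = line.strip()
        match = LOOSE_TIMESTAMP.match(line)
        if match:
            # The line right before a timestamp is the old cue number: drop it.
            if cues and cues[-1][2] and cues[-1][2][-1].isdigit():
                cues[-1][2].pop()
            cues.append([_normalise_time(match.group(1)),
                         _normalise_time(match.group(2)), []])
        elif line and cues:
            cues[-1][2].append(line)
    blocks = [f"{number}\n{start} --> {end}\n{' '.join(lines)}\n"
              for number, (start, end, lines) in enumerate(cues, start=1)]
    return "\n".join(blocks)
```

This cannot restore text the model never produced, so re-running the prompt remains the fallback when whole passages are missing.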

I’m still figuring out the best way to share these or upload them to YouTube. I found captionfy, which seems pretty easy to use. You sign up and can create a shareable overlay for any YouTube video. The good thing is that traffic still goes to the original creator, and anyone can upload subtitles that are then available to everyone.

I guess the end goal would be to automate the full pipeline and translate a lot of stuff? It seems captionfy does not have an API, so maybe something else would be better suited?

Also, the Gemini model likely won't be free forever, but at current pricing it should cost about 6 cents per minute of content (for videos of similar length), which seems cheap enough? The price scales with (video) input length, so longer videos will be more expensive.
(*^^)(^*)
Last.Midnight
Profile Blog Joined July 2006
Australia902 Posts
May 07 2025 01:53 GMT
#2
I was curious about doing this. Surely there are models/n8n setups that can automatically replace/overdub the voice too?

Thanks for sharing man this is great.
Last.Midnight
Profile Blog Joined July 2006
Australia902 Posts
May 07 2025 02:49 GMT
#3
Recall (https://www.getrecall.ai/) provides written translations and app.vozo.ai apparently does voiceover dubs, but I'm not sure how accurate they are and it's expensive.
Simplistik
Profile Blog Joined November 2007
1977 Posts
May 07 2025 03:34 GMT
#4
I feel like there is a web service niche for automating this workflow if anyone has the patience to make it work.
Dear BW Gods, I know it's not autumn (in the Northern hemisphere), but please have mercy on Protoss.
Last.Midnight
Profile Blog Joined July 2006
Australia902 Posts
Last Edited: 2025-05-07 04:25:05
May 07 2025 04:24 GMT
#5
yt-dlp for download into ElevenLabs overdub most likely. Only problem is the EL credits.

Possibly with a specialised Kor>Eng model in between.
rtyrt7
Profile Joined August 2018
46 Posts
May 07 2025 07:34 GMT
#6
Maybe the free models over here would also be helpful, as API:
https://openrouter.ai/models?max_price=0

But it has these limits for the models whose ID is ending in ":free":
- Per-Minute Limit: 20 requests per minute
- Daily Limit: 50 requests per day per account
prosatan
Profile Joined September 2009
Romania7775 Posts
May 07 2025 07:57 GMT
#7
Thank you, Kraekkling!
Lee JaeDong Fighting! The only church that illuminates is the one that burns.
Kraekkling
Profile Blog Joined June 2007
Romania383 Posts
May 07 2025 11:56 GMT
#8
On May 07 2025 10:53 Last.Midnight wrote:
Surely there are models/n8n setups that can automatically replace/overdub the voice too?


This is likely not feasible yet. What you're talking about is basically a different piece of technology.

You're right, though, that there are models that can translate audio and output sound in a voice similar to the speaker's. However, those models are several orders of magnitude smaller than what we have here and work purely audio-to-audio. They can't handle long-term context. Also, there just isn't much training data for these models to handle BW jargon properly.

The advantage of the Gemini model is that we're using information from the video itself (not only the audio) and also tapping into its "general intelligence", which comes from the very large model size. Additionally, we have inference-time scaling here, which means the model internally outputs an ensemble of chain-of-thought threads in which it discusses the best way to translate a given passage of video given the overall context, before giving an answer to the user.

However, I think we might not be too far from having models that can do what you suggested; give it 1-2 years at most and we'll be there. The next iteration of OpenAI's omni-series might already do it.

(*^^)(^*)
yubo56
Profile Joined May 2014
687 Posts
May 07 2025 20:31 GMT
#9
On May 07 2025 20:56 Kraekkling wrote:
On May 07 2025 10:53 Last.Midnight wrote:
Surely there are models/n8n setups that can automatically replace/overdub the voice too?


This is likely not feasible yet. What you're talking about is basically a different piece of technology.

You're right, though, that there are models that can translate audio and output sound in a voice similar to the speaker's. However, those models are several orders of magnitude smaller than what we have here and work purely audio-to-audio. They can't handle long-term context. Also, there just isn't much training data for these models to handle BW jargon properly.

The advantage of the Gemini model is that we're using information from the video itself (not only the audio) and also tapping into its "general intelligence", which comes from the very large model size. Additionally, we have inference-time scaling here, which means the model internally outputs an ensemble of chain-of-thought threads in which it discusses the best way to translate a given passage of video given the overall context, before giving an answer to the user.

However, I think we might not be too far from having models that can do what you suggested; give it 1-2 years at most and we'll be there. The next iteration of OpenAI's omni-series might already do it.


Wait, but you're describing the difficulty of direct audio-audio translation. If you already can do audio -> translated text though, can't you just slap a text-to-speech and have a (basic) audio-audio translation?

I guess you'd have trouble matching the duration of the sentences, but with some simple squeezing and stretching of audio bytes it's still surely quite feasible compared to direct audio-to-audio translation...
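
To make the squeezing and stretching concrete, here is a minimal sketch of the arithmetic only: given the length of a synthesized clip and the length of the subtitle cue it has to fit, it returns a playback-speed multiplier. The function name and the clamping range are assumptions for this example; actual time-stretching would be done by an audio tool.

```python
def speed_factor(tts_seconds, cue_seconds, min_factor=0.8, max_factor=1.5):
    """Playback-speed multiplier that fits a TTS clip into a subtitle cue.

    Values above 1 speed the clip up (squeeze), values below 1 slow it down
    (stretch). The result is clamped so the voice stays intelligible; the
    default limits are guesses, not measured thresholds.
    """
    if tts_seconds <= 0 or cue_seconds <= 0:
        raise ValueError("durations must be positive")
    raw = tts_seconds / cue_seconds
    return max(min_factor, min(max_factor, raw))
```

When the clamp kicks in, the clip either overruns the cue or leaves a gap, which is the duration-matching trouble mentioned above.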
Jung Yoon Jong fighting, even after retirement! Feel better soon.
prion_
Profile Joined September 2022
64 Posts
Last Edited: 2025-05-07 22:10:15
May 07 2025 22:08 GMT
#10
The problem is that it would sound like TikTok caption voice. I mean, not exactly that, but you wouldn't be able to keep the rhythm and modulation of their voices by going audio->text->audio, even if you adjusted for time.
IntoTheWow
Profile Blog Joined May 2004
is awesome32273 Posts
May 08 2025 02:26 GMT
#11
This is really cool!

Do you think that adding some keywords in the prompt could help the model? Like units, BW jargon, etc? Or are errors due to other factors?
Moderator<:3-/-<
Last.Midnight
Profile Blog Joined July 2006
Australia902 Posts
May 08 2025 03:36 GMT
#12
I tried ElevenLabs' dubbing feature and it works pretty great. Of course I can't speak to the accuracy of the translation, but it's certainly more accurate than "translate to English" on Chrome. The only funny thing is that it also dubs the unit sounds, so whenever the player isn't speaking he'll repeat SCV commands etc. haha
Lorch
Profile Joined June 2011
Germany3672 Posts
May 08 2025 13:00 GMT
#13
This is completely useless if you don't speak Korean.
You can never know which parts of the translation are accurate and which aren't. Thinking that it sounds reasonable or makes sense is not a great heuristic, especially given how AIs tend to hallucinate.

Would probably need a dedicated BW AI model trained under the supervision of someone who speaks Korean and English and is knowledgeable in StarCraft to create something worth using.
Kraekkling
Profile Blog Joined June 2007
Romania383 Posts
May 08 2025 14:32 GMT
#14
On May 08 2025 11:26 IntoTheWow wrote:
This is really cool!

Do you think that adding some keywords in the prompt could help the model? Like units, BW jargon, etc? Or are errors due to other factors?


We're pre-filling the prompt with BW jargon by asking for a video summary first. As to why there are errors: I guess the easiest answer is that the technology is not 100% there yet. Machine translation in general only became useful in the last decade or so... Additionally, BW is a niche domain, and one needs a sufficient world model to make sense of the meaning behind the words. Koreans often use abbreviations; for example, they'd say "zildra" for a zealot/dragoon army, or "sam-hat" (삼햇) for a 3-hatchery opening. I've also tried older models, but this one is by far the best at making sense of stuff like this.

To me, the fact that any of this works at all is pretty crazy.

On May 08 2025 22:00 Lorch wrote:
Would probably need a dedicated BW AI model trained under the supervision of someone who speaks Korean and English and is knowledgeable in StarCraft to create something worth using.


Unfortunately this won't happen, so for now it's either YouTube auto-subs or this.
also this is not how models are trained


This is completely useless if you don't speak Korean.
You can never know which parts of the translation are accurate and which aren't. Thinking that it sounds reasonable or makes sense is not a great heuristic, especially given how AIs tend to hallucinate.


Maybe someone who speaks Korean could comment? I'm only comparing this to yt auto-subs, and it felt like even with some obvious hallucinations the overall commentary was pretty easy to grasp?
(*^^)(^*)
Last.Midnight
Profile Blog Joined July 2006
Australia902 Posts
Last Edited: 2025-05-08 21:25:04
May 08 2025 21:24 GMT
#15
On May 08 2025 22:00 Lorch wrote:
This is completely useless if you don't speak Korean.
You can never know which parts of the translation are accurate and which aren't. Thinking that it sounds reasonable or makes sense is not a great heuristic, especially given how AIs tend to hallucinate.

Would probably need a dedicated BW AI model trained under the supervision of someone who speaks Korean and English and is knowledgeable in StarCraft to create something worth using.


Not useless, but not optimal either. Some phrases are lost, but things like "focus fire the tank here" when he's also clicking a tank are pretty clear. Hallucinations happen less when a model draws on source material; they tend to happen when it has to answer from parameters trained on a massive dataset and misinterprets the request.

That's why for enterprise integration RAG is all the rage, since the "database" the models link to is the company's data.