Miwa — Real-Time Discord Voice Translation
AMD MI300X · 192 GB HBM3 · Llama 3.3 70B · FP16

Miwa 美話

Language barriers cost gaming communities millions of shared moments every day. Miwa closes the gap — per-speaker Discord voice translation with romaji, a three-agent CrewAI suggestion pipeline, and style-matched LLM refinement, targeting under 800ms on AMD MI300X.

Francis Daniel Genese (Mizu) AMD Developer Hackathon 2026 · lablab.ai · Track 1: AI Agents & Agentic Workflows · May 4–11, 2026
<800ms · End-to-end latency
70B · Model parameters
3 · AI reply suggestions
FP16 · Precision on MI300X
200M+ · Monthly Discord users
Discord has no native real-time translation. Miwa is a transparent overlay that requires nothing from the other person.
<800ms · End-to-end latency target. Fast enough to follow the conversation as it happens — no awkward lag, no interrupting the flow of the call.
$0 · New hardware required. Pure software overlay — runs on any PC already running Discord, no special headsets or upgrades needed.
Demo

What you see during a call

The Miwa overlay sits transparently above your game or browser. Per-speaker cards appear as each person talks, fading out when they go silent.

Word-by-word karaoke highlight
Each word lights up in sync with the speaker's voice as openai-whisper emits word-level timestamps.
Two-pass translation
A fast translation appears the moment speech ends. The LLM then refines it to your chosen style — the card updates in place with no flicker.
Reply with one key
Press 1, 2, or 3 to instantly send that suggestion to Discord chat. Each card also has individual buttons — Bot Speaks in VC, Bot Sends in text, or I’ll Speak (opens fullscreen romaji).
AI agents generate each suggestion Track 1
Three CrewAI agents — Analyst → Strategist → Writer — run on AMD MI300X to produce context-aware Japanese replies. Per-speaker Qdrant vector memory means suggestions improve as the conversation continues.
Miwa
connected
Formal Neutral Casual Gaming
Opacity
100%
?
Type English — translates as Casual…
🍁 Valo Tomodachi 🌵 · #🔥 | ざつだん 🍁_ · 1 in call ▾
M
Mizu
lol kusa 草w
💬 🔊
Ehh?! ee~ えー!
💬 🔊
Seriously? maji de? マジで?
💬 🔊
Insane! yabai! やばい!
💬 🔊
Nice one! ii ne! いいね!
💬 🔊
Speakers
M
Mizu 🎤
AI
それは面白いですね
soreha omoshiroi desune
That’s interesting.
Reply With
1
そうですね
Yes, it is
📌
🔊 Bot Speaks 💬 Bot Sends 👤 I’ll Speak
2
どうして面白いと思ったんですか
Why do you think it’s interesting?
📌
🔊 Bot Speaks 💬 Bot Sends 👤 I’ll Speak
3
私も興味があります
I’m interested too
📌
🔊 Bot Speaks 💬 Bot Sends 👤 I’ll Speak
LLM ~500ms · WS open
Pipeline

Voice to translation targeting under 800ms

01 · Audio Capture · <5ms
02 · openai-whisper STT · ~150ms
03 · Google Translate · <100ms
04 · Llama 3.3 70B FP16 · ~500ms
05 · CrewAI Agents · ~8–15s†
06 · Tauri Overlay · <20ms
miwa — live pipeline simulation
01 — Audio Capture <5ms
IN Discord Opus stream (per-speaker, 48 kHz)
Latency breakdown target <800ms end-to-end
01 Capture (<5ms)
02 Transcription (~150ms)
03 Fast Translation (<100ms)
04 LLM refinement (~500ms)
05 Overlay (<20ms) — 06 Suggestions deferred (~8–15s, agentic pipeline)
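The stage targets above sum comfortably under the 800 ms budget, leaving headroom for network jitter on the WebSocket hop. A quick sanity check, treating each `<`/`~` figure as its worst case:

```python
# Per-stage latency targets (ms), taken from the breakdown above
stages_ms = {
    "capture": 5,             # 01, <5 ms
    "transcription": 150,     # 02, ~150 ms
    "fast_translation": 100,  # 03, <100 ms
    "llm_refinement": 500,    # 04, ~500 ms
    "overlay": 20,            # 05, <20 ms
}

# 06 (suggestions) is deferred and runs off the critical path, so it is excluded
total = sum(stages_ms.values())  # 775 ms
```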
Video

Watch it in a live call

Full end-to-end walkthrough recorded live on AMD MI300X hardware.

Infrastructure

Why AMD MI300X matters for this

Used in Miwa — AMD Developer Cloud
AMD MI300X
ROCm 7.2 · vLLM 0.17.1 · PyTorch 2.6.0 · 20 vCPU · 240 GB RAM
192GB · HBM3 VRAM
5.3TB/s · Memory bandwidth
FP16 · Full precision
RTX 5090 (consumer) · NVIDIA H100 (data center) · AMD MI300X (Miwa) ✓
VRAM: 32 GB GDDR7 · 80 GB HBM2e · 192 GB HBM3
Llama 3.3 70B: INT4 only — quality loss · FP8/INT8 — barely fits · full FP16, single GPU
Bandwidth: 1.79 TB/s · 3.35 TB/s · 5.3 TB/s
Multi-GPU: required for 70B · often needed · not needed
Ecosystem: CUDA only · CUDA only · ROCm — open source
Zero quantization loss
192GB unified HBM3 memory fits Llama 3.3 70B entirely in FP16. No INT4/INT8 rounding errors that degrade Japanese nuance and translation style.
5.3 TB/s memory bandwidth
Token generation is memory-bound, not compute-bound. The MI300X moves model weights to compute units 3× faster than an RTX 5090 — every transformer layer stays fed without stalling, which is why full FP16 inference completes under 500ms.
ROCm open ecosystem
openai-whisper, vLLM 0.17.1, and PyTorch 2.6.0 all run on ROCm 7.2 — the same PyTorch code that runs on CUDA runs here with a single env flag.
Multi-speaker concurrency
Discord voice channels can have 10+ speakers. The MI300X handles concurrent openai-whisper transcription and vLLM inference requests without GPU contention.
Features

Everything you need to stay in the conversation

Two-pass translation
Instant display, then refined
Google Translate fires in under 100ms so you read immediately. Llama 3.3 70B follows with a style-aware refinement — Formal, Neutral, Casual, or Gaming — updating the card in place.
Google
<100ms
LLM
~500ms
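One way the two-pass update could be wired: the server emits two WebSocket packets that share a card id, and the client replaces the text in place when the refined packet lands. The packet field names here are assumptions for illustration, not Miwa's actual protocol:

```python
# Pass 1: Google Translate result, sent the moment speech ends
fast = {"type": "fast", "card_id": 42, "en": "That is interesting."}

# Pass 2: style-refined Llama 3.3 70B result for the same card
refined = {
    "type": "refined",
    "card_id": 42,
    "en": "Oh nice, that's pretty interesting.",
    "style": "casual",
}

cards: dict[int, str] = {}
for packet in (fast, refined):
    # Same key both times, so the card updates in place with no flicker
    cards[packet["card_id"]] = packet["en"]
```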
Always-on-top overlay
Tauri v2 window, zero distraction
Transparent, frameless, always-on-top Tauri window sits over your game or browser. Drag the header to reposition. Double-click to snap to corners. Resize vertically from the bottom edge.
Quick Reply
Type English, see Japanese live
Debounced auto-translation as you type. Preview your message in Japanese and romaji before sending — the bot delivers it in your chosen style.
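The debounce pattern here can be sketched with asyncio: each keystroke cancels the pending translation and restarts the delay, so only the final text triggers a call. This is an illustrative stand-in (the delay and the `Debouncer` helper are assumptions, not Miwa's actual code, which runs in the TypeScript UI):

```python
import asyncio

class Debouncer:
    """Collapse rapid keystrokes into a single translation call."""

    def __init__(self, delay: float):
        self.delay = delay
        self._task = None

    def submit(self, text, translate):
        # Cancel the in-flight wait; only the last keystroke survives
        if self._task and not self._task.done():
            self._task.cancel()
        self._task = asyncio.create_task(self._fire(text, translate))

    async def _fire(self, text, translate):
        await asyncio.sleep(self.delay)
        await translate(text)

async def demo():
    calls = []
    async def fake_translate(text):
        calls.append(text)

    d = Debouncer(0.05)
    for chunk in ["h", "he", "hel", "hello"]:
        d.submit(chunk, fake_translate)
        await asyncio.sleep(0.01)  # keystrokes arrive faster than the delay
    await asyncio.sleep(0.1)       # let the last timer fire
    return calls                   # only "hello" was translated
```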
Shortcuts
Keyboard-first workflow
1 / 2 / 3 · Send suggestion to chat
Ctrl+1–9 · Phrasebook slots
? · Show all shortcuts
Esc · Close romaji popup
Quick Reactions
Mode-adaptive reactions library
80 pre-written reactions (20 per mode) that automatically swap when you change style — Casual, Gaming, Formal, Neutral. Live search filters by JP text, romaji, or English. One click sends via Bot Sends (chat) or Bot Speaks (TTS in voice channel).
草w えー! マジで? やばい! いいね! +75
Memory
Per-speaker Qdrant vector store
Each utterance is embedded and stored against the speaker’s Discord user ID. When the CrewAI Analyst agent runs, it retrieves the most relevant past exchanges via vector similarity — giving the suggestion pipeline actual conversation context, not just the last sentence. Suggestions get more accurate as the call progresses.
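The retrieval pattern, sketched without the Qdrant dependency — a pure-Python stand-in where Qdrant would perform the same top-K cosine search filtered by a speaker-id payload field. Embeddings here are toy 2-D vectors; the real store uses full sentence embeddings:

```python
import math

# speaker_id -> [(embedding, utterance)]
memory: dict[str, list] = {}

def remember(speaker_id: str, embedding: list, text: str) -> None:
    memory.setdefault(speaker_id, []).append((embedding, text))

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def recall(speaker_id: str, query: list, k: int = 2) -> list:
    """Top-k most similar past utterances from this speaker only."""
    ranked = sorted(memory.get(speaker_id, []),
                    key=lambda e: cosine(e[0], query), reverse=True)
    return [text for _, text in ranked[:k]]

remember("mizu#1234", [1.0, 0.0], "ランクやばい")    # gaming talk
remember("mizu#1234", [0.0, 1.0], "明日は仕事です")  # work talk
remember("other#5678", [1.0, 0.1], "別の人の発言")   # different speaker, never returned
```

Keying the store by Discord user ID is what keeps one speaker's history from bleeding into another's suggestions.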
Romaji
Fullscreen popup
Hit 3 to open a large romaji pronunciation overlay and speak the reply yourself.
Stack

Built on AMD MI300X and open-source AI

Primary GPU
AMD MI300X
192GB HBM3 unified memory — runs Llama 3.3 70B in full FP16 with zero quantization loss
192GB VRAM FP16 ROCm 7.2 5.3 TB/s vLLM 0.17.1
Inference
Serving
vLLM + ROCm
STT
openai-whisper
AI & Agents
LLM
Llama 3.3 70B
Alt. LLM
Qwen2.5 72B
Agents
CrewAI
Vector DB
Qdrant
TTS
edge-tts
Translation
Google Translate
Romaji
pykakasi
Application
Server
FastAPI
Cache
SQLite WAL
Desktop
Tauri v2
Runtime
Rust
UI
React 19
Animation
Framer Motion
Discord
discord.js v14
Engineering

Built to production standard

Solo build · 7 days · AMD Developer Hackathon 2026
Full-stack, from GPU to glass.
One engineer. Five runtimes. Real-time AI inference on AMD MI300X.
TypeScript — React 19 overlay UI
Python — FastAPI AI pipeline server
Rust — Tauri desktop backend
JavaScript — Discord bot & voice capture
GLSL — Hyperspeed WebGL shaders
System architecture — data flow
01
A friend speaks Japanese in Discord
Each person’s voice is captured as a separate stream — not mixed with others. Miwa listens only to the Discord call and forwards the audio over an encrypted connection to AMD’s AI cloud for processing.
Node.js 18 · discord.js v14 · @discordjs/voice · per-speaker Opus stream · SSH tunnel
Encrypted SSH tunnel → AMD AI cloud
02
The AI cloud converts speech and translates it
Speech recognition converts the audio to Japanese text. Google Translate returns an English result in under 100ms — shown to you immediately. A large language model then refines the translation to match your chosen style (formal, casual, or gaming slang). Three context-aware reply suggestions are generated based on the conversation.
AMD MI300X · 192 GB HBM3 · openai-whisper STT · Llama 3.3 70B FP16 · vLLM 0.17.1 · Google Translate · CrewAI agents · Qdrant memory
WebSocket JSON → local machine · total <800ms
03
You see the translation floating above Discord
A transparent window stays on top of your screen without covering Discord. You see the Japanese text, a romaji pronunciation guide so you can read it aloud, and the English translation. Select a suggested reply and the Discord bot delivers it — or tap “I’ll Speak” to say it yourself using the phonetic guide.
Tauri v2 · Rust · React 19 · TypeScript strict · Jotai · Framer Motion · transparent always-on-top window
Step 02 in detail
Inside the AI cloud — AMD MI300X · 192 GB HBM3
Python · ROCm 7.2 · vLLM 0.17.1 · PyTorch 2.6.0
STT
openai-whisper
PCM → Japanese text + word-level timestamps for karaoke
<150ms
TRANSLATE · pass 1
Google Translate
Fast EN result → shown immediately as fast packet
<100ms
TRANSLATE · pass 2
vLLM — Llama 3.3 70B FP16
Style-refined EN → updates card as refined packet
<700ms
ROMAJI
pykakasi
JP text → phonetic reading for pronunciation guide
<5ms
AGENTS — deferred, does not block translation
CrewAI 3-agent pipeline · vLLM + Qdrant
Analyst — retrieves top-K past utterances from Qdrant (per-speaker vector memory), assesses conversation context and emotional register
Strategist — receives Analyst brief, decides reply type (agreement / question / reaction / game callout), produces structured handoff
Writer — generates 3 style-matched Japanese reply suggestions from Strategist’s brief · each suggestion pre-synthesized via edge-tts
~8–15s (deferred)
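The Analyst → Strategist → Writer chain can be pictured as structured briefs passed down the line. A schematic sketch — the field names and stub decision logic are assumptions for illustration; the real pipeline runs these as CrewAI agents backed by vLLM:

```python
from dataclasses import dataclass

@dataclass
class AnalystBrief:
    context: list          # top-K utterances retrieved from per-speaker memory
    register: str          # e.g. "casual", "excited"

@dataclass
class StrategistBrief:
    reply_type: str        # agreement / question / reaction / game callout
    analyst: AnalystBrief

def strategist(brief: AnalystBrief) -> StrategistBrief:
    # Stub decision rule; the real agent is an LLM call
    reply_type = "reaction" if brief.register == "excited" else "question"
    return StrategistBrief(reply_type, brief)

def writer(brief: StrategistBrief, n: int = 3) -> list:
    # Stub: the real Writer generates n style-matched Japanese replies via vLLM
    return [f"[{brief.reply_type} suggestion {i + 1}]" for i in range(n)]

suggestions = writer(strategist(AnalystBrief(["ランクやばい"], "excited")))
```

The structured handoff is the point: each agent consumes the previous brief rather than re-reading the raw transcript.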
JavaScript — Discord bot
Python — AI server
Rust — Tauri backend
TypeScript — React UI
Python — FastAPI AI server (AMD MI300X)
Key engineering decisions
01
openai-whisper, not WhisperX
WhisperX uses CTranslate2 for inference acceleration — CTranslate2 has no ROCm backend. Running WhisperX on AMD MI300X silently falls back to CPU: 10–20× slower. openai-whisper runs natively on PyTorch + ROCm with full GPU utilization. Not a preference — the alternative would have broken the latency target entirely.
02
Tauri, not Electron
50 MB binary vs Electron’s 200+ MB. The Rust backend gives native always-on-top transparent windows with no chrome, direct OS API access, and a real security sandbox — non-negotiable for a persistent screen overlay.
03
SSH tunnel, not public port
All traffic between the local bot and AMD cloud routes through SSH port 22. Zero firewall rules, zero exposed WebSocket endpoints, zero attack surface. The production path is identical to the development path.
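The tunnel amounts to a single local port forward; hostnames and port numbers below are placeholders, not the project's actual values:

```shell
# Forward the bot's local WebSocket port to the FastAPI server on the
# AMD cloud instance, over SSH — nothing else is exposed to the network.
ssh -N -L 8765:localhost:8765 user@amd-cloud-instance
```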
04
Per-speaker, not mixed audio
Discord’s VoiceReceiver provides separate Opus streams per user ID. Mixing destroys speaker identity before transcription. Each stream gets its own openai-whisper task, enabling per-speaker karaoke cards and context-aware reply suggestions.
05
FP16 + 80% VRAM cap
192 GB HBM3 fits Llama 3.3 70B in full FP16 — no INT4/INT8. Japanese honorifics, register, and verb endings encode in the tail of the probability distribution; INT4 truncates that tail. vLLM is capped at --gpu-memory-utilization 0.80, reserving ~38 GB for Whisper. vLLM's default of 0.90 leaves too little headroom for STT and fails silently at runtime.
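The VRAM split behind the 0.80 cap, as arithmetic (FP16 weight size is params × 2 bytes; vLLM's share also has to hold the KV cache, which is why the weights fit with room to spare):

```python
hbm_gb = 192
vllm_share = hbm_gb * 0.80            # 153.6 GB for vLLM (weights + KV cache)
whisper_share = hbm_gb - vllm_share   # 38.4 GB left for openai-whisper
llama_weights_gb = 70e9 * 2 / 1e9     # 140 GB of FP16 weights
```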
06
pykakasi, not MeCab
MeCab requires a native shared library compiled against a specific OS and dictionary version — inside a ROCm Docker container with non-standard Python, that is a multi-hour dependency tangle. pykakasi is pure Python: one pip install, zero native deps, zero build step. The romaji quality difference is negligible for pronunciation guidance.