Real-time speech-to-speech translation pipeline recommendations for Android (offline-first)


I’m developing an Android app for instant speech-to-speech translation with a strong focus on offline operation and low latency.

Requirements:

  • Target devices: Android 10+ on mid-range hardware, such as a Snapdragon 778G with 8 GB of RAM

  • Language pairs: EN↔RU and EN↔FR, with possible expansion later

  • Offline-first approach

  • Privacy-focused design, ideally without cloud APIs

  • Target latency: under 500 ms from microphone input to translated audio output

Current pipeline:
Speech-to-text → local translation model → text-to-speech
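For context, this is roughly how I wire the three stages together today: each stage runs on its own background thread and passes results downstream through a bounded queue, so no single stage blocks its caller. This is a simplified sketch with placeholder stage functions; the real stages call the local STT/MT/TTS models, and all class and variable names here are my own, not from any library.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

// Sketch: each pipeline stage is a daemon thread reading from an input
// queue, applying its model function, and writing to an output queue.
public class PipelineSketch {
    static <I, O> Thread stage(BlockingQueue<I> in, BlockingQueue<O> out,
                               Function<I, O> fn) {
        Thread t = new Thread(() -> {
            try {
                while (true) out.put(fn.apply(in.take()));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> sttOut = new ArrayBlockingQueue<>(8);  // recognized text
        BlockingQueue<String> mtOut  = new ArrayBlockingQueue<>(8);  // translated text
        BlockingQueue<String> ttsOut = new ArrayBlockingQueue<>(8);  // synthesized "audio"

        // Placeholder stage functions; in the app these invoke the local models.
        stage(sttOut, mtOut, s -> "[ru] " + s);
        stage(mtOut, ttsOut, s -> "speak(" + s + ")");

        sttOut.put("hello world");
        System.out.println(ttsOut.take()); // prints: speak([ru] hello world)
    }
}
```

The bounded queues also give natural backpressure: if translation falls behind, the STT stage blocks on `put` instead of piling up unbounded work.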

Problems I’m running into:

  1. Speech recognition latency is around 1.5 seconds even when using partial results

  2. Local translation models are too slow on mid-range devices

  3. The audio pipeline can block the UI if not handled carefully

Main questions:

  1. What architecture is most practical for sub-500 ms speech-to-speech translation on Android?

  2. Which local translation models are currently the best fit for mobile devices if size and inference speed are the main constraints?

  3. What is the best strategy for handling partial speech recognition results without triggering translation too early?

  4. What threading or pipeline design works best for this kind of audio workflow on Android?
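Regarding question 3, what I have tried so far is a simple stability gate: a partial hypothesis is only forwarded to translation after it has come back unchanged for several consecutive recognizer callbacks. The class name and the threshold below are placeholders of mine, not a recommendation:

```java
import java.util.Objects;

// Sketch: commit a partial ASR hypothesis to translation only after it has
// been identical for STABLE_UPDATES consecutive recognizer updates.
public class PartialGate {
    static final int STABLE_UPDATES = 3; // tuning knob, arbitrary here
    private String last = null;
    private int stableCount = 0;

    /** Returns the text to translate, or null while still unstable. */
    public String onPartial(String hypothesis) {
        if (Objects.equals(hypothesis, last)) {
            stableCount++;
        } else {
            last = hypothesis;
            stableCount = 1;
        }
        return stableCount >= STABLE_UPDATES ? hypothesis : null;
    }

    public static void main(String[] args) {
        PartialGate gate = new PartialGate();
        System.out.println(gate.onPartial("hel"));   // null
        System.out.println(gate.onPartial("hello")); // null
        System.out.println(gate.onPartial("hello")); // null
        System.out.println(gate.onPartial("hello")); // hello
    }
}
```

This avoids translating hypotheses that the recognizer is still revising, but it adds a few callback intervals of delay, which is part of why I am asking about better strategies.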

I’d especially appreciate answers based on practical experience with TensorFlow Lite, ONNX Runtime Mobile, or similar mobile inference setups. My main goal is to understand what architecture and optimization strategy are realistic for low-latency offline translation on Android.

May 12 at 1:09 PM
Eugene Jackson
#android #translation #real-time #text-to-speech #question2answer

Accepted Answer

If it's not a secret, how much space does an offline language database take up on a device?

user31774114
May 12 at 10:44 PM