EdgeMind — A Privacy-First Google Assistant Alternative

Status: active
Tags: Android · Kotlin · Gemma 4 E2B · LiteRT-LM · Native audio · Tool calling · On-device AI

EdgeMind is an Android voice assistant that runs entirely on the device — no cloud, no account, no telemetry. You hold a button, talk to it, and Gemma 4 — running locally — either answers or calls a tool (timer, calendar, music control, web search, flashlight…). Your voice never leaves the phone.

The model at the centre is Gemma 4 E2B with native audio input — Google’s mobile-first ~2B-effective-param multimodal LLM. It’s loaded through LiteRT-LM (Google’s on-device runtime), with the audio bytes from the mic going straight into the model and tool calls dispatched automatically through litertlm’s reflection layer.

Why this exists

Open-source voice assistants on Android exist — Dicio is the obvious one and a genuinely good project. But Dicio (and the Rhasspy-style siblings before it) is built around command grammars and skill plugins: a fixed pattern matcher with handlers wired in. That works really well for “set a timer for 10 minutes” and falls off a cliff the moment the user phrases something the grammar didn’t predict.

I wanted to build the version where an LLM does the reasoning instead of a parser, so the same assistant can handle “what’s on my calendar” and “remind me to take the chicken out in 25 minutes” and “what was the F1 result yesterday” without anyone having to enumerate the patterns up front. With on-device LLMs finally fast enough on modern phones — and Gemma 4 shipping with native audio input so you don’t even need a separate STT — that’s tractable. EdgeMind is what falls out of doing it that way.

The privacy framing isn’t decorative either. An assistant where every prompt is recorded by an ad company is a different product than one that works on the device alone. Google Assistant ships your audio to a server. EdgeMind doesn’t.

How it works

The model. Gemma 4 E2B with native audio input, distributed as a .litertlm file (~2.5–3 GB). It’s downloaded from Hugging Face’s litert-community on first run with HTTP Range resume, not bundled in the APK. With native audio, there’s no separate Whisper/STT stage: the mic bytes go straight into the model alongside the system prompt, and Gemma transcribes and reasons in one pass.
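
To make the resume behaviour concrete, here is a minimal sketch of a Range-based download using only java.net. It is not EdgeMind’s actual downloader: the function names (rangeHeaderFor, downloadWithResume) are hypothetical, the URL is passed in by the caller, and treating HTTP 416 as “already complete” is an assumption that skips any size or checksum verification.

```kotlin
import java.io.File
import java.io.FileOutputStream
import java.net.HttpURLConnection
import java.net.URL

// Hypothetical helper: the Range header value that asks the server to
// continue after the bytes already on disk, or null for a fresh download.
fun rangeHeaderFor(bytesOnDisk: Long): String? =
    if (bytesOnDisk > 0) "bytes=$bytesOnDisk-" else null

// Hypothetical helper: download `url` into `target`, resuming a partial file.
fun downloadWithResume(url: String, target: File) {
    val existing = if (target.exists()) target.length() else 0L
    val conn = URL(url).openConnection() as HttpURLConnection
    try {
        rangeHeaderFor(existing)?.let { conn.setRequestProperty("Range", it) }
        val append = when (val code = conn.responseCode) {
            HttpURLConnection.HTTP_PARTIAL -> true  // 206: server honoured the range
            HttpURLConnection.HTTP_OK -> false      // 200: range ignored, start over
            416 -> return                           // assumption: nothing left to fetch
            else -> error("Unexpected HTTP status $code")
        }
        conn.inputStream.use { input ->
            FileOutputStream(target, append).use { output -> input.copyTo(output) }
        }
    } finally {
        conn.disconnect()
    }
}
```

The important branch is 200 versus 206: if the server ignores the Range header, appending would corrupt the file, so the sketch overwrites from the start instead.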

The runtime — and why not MediaPipe. I started on com.google.mediapipe:tasks-genai, the obvious choice. It refused to load Gemma 4 with audio: the model’s audio adapter section is CPU-pinned in the .litertlm metadata, but tasks-genai’s audio executor is hardcoded GPU-only and dies trying to materialise LlmParameters. The fix was to drop down a layer to LiteRT-LM (com.google.ai.edge.litertlm:litertlm-android:0.10.2), which exposes per-modality backends — GPU LLM, CPU audio — which is what the model actually requires. The Kotlin API itself is documented well enough (Engine, ConversationConfig, ToolSet / @Tool / @ToolParam); what isn’t documented is the operational layer below.

Audio wiring (the WAV-header trick). The mic produces raw 16 kHz mono 16-bit PCM through AudioRecord — exactly the format the model expects. But pushing those raw bytes into LiteRT-LM throws INTERNAL: Failed to initialize miniaudio decoder, error code: -10. LiteRT-LM decodes audio through miniaudio, which sniffs format from the header — raw PCM has none. The fix is a 44-byte RIFF/WAVE header prepended in-process; same bytes, miniaudio recognises the format, the model gets its audio. That single trick was the difference between “audio works” and “audio is impossible.”
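
A minimal sketch of that header, assuming the canonical 44-byte PCM layout (RIFF descriptor, a 16-byte "fmt " chunk, then the "data" chunk). The function name wrapPcmAsWav and its default parameters are illustrative, not taken from the EdgeMind source.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Prepend a canonical 44-byte RIFF/WAVE header to raw PCM samples.
// All multi-byte header fields are little-endian.
fun wrapPcmAsWav(
    pcm: ByteArray,
    sampleRate: Int = 16_000,
    channels: Int = 1,
    bitsPerSample: Int = 16,
): ByteArray {
    val blockAlign = channels * bitsPerSample / 8   // bytes per sample frame
    val byteRate = sampleRate * blockAlign          // bytes per second
    val header = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN)
    header.put("RIFF".toByteArray(Charsets.US_ASCII))
    header.putInt(36 + pcm.size)                    // file size minus the first 8 bytes
    header.put("WAVE".toByteArray(Charsets.US_ASCII))
    header.put("fmt ".toByteArray(Charsets.US_ASCII))
    header.putInt(16)                               // size of the fmt chunk for PCM
    header.putShort(1.toShort())                    // audio format 1 = linear PCM
    header.putShort(channels.toShort())
    header.putInt(sampleRate)
    header.putInt(byteRate)
    header.putShort(blockAlign.toShort())
    header.putShort(bitsPerSample.toShort())
    header.put("data".toByteArray(Charsets.US_ASCII))
    header.putInt(pcm.size)                         // size of the sample data
    return header.array() + pcm
}
```

The samples themselves are untouched; the header only tells the decoder how to interpret them.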

Backend cascade. GPU (OpenCL) → GpuArtisan → CPU, tried in order at engine init. Android’s System.loadLibrary("OpenCL") only searches the standard linker namespace and misses vendor-installed OpenCL on most Mali/Adreno devices, so I probe the common absolute paths (/vendor/lib64/, /system/vendor/lib64/, etc.) explicitly. If a backend dies mid-generation, it’s marked bad, the engine is closed, and the next turn re-inits on the next-best backend. Audio always stays pinned to CPU, because the audio section of the model’s .litertlm metadata requires it.
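
The cascade and the path probe can be sketched in plain Kotlin as below. Everything here is illustrative: the Backend enum is a local stand-in rather than litertlm’s type, BackendCascade and findOpenClLibrary are hypothetical names, only the two directories named above are listed, and the libOpenCL.so file name is an assumption.

```kotlin
import java.io.File

// Local stand-in for the three backends, in preference order.
enum class Backend { GPU, GPU_ARTISAN, CPU }

// Probe candidate vendor directories for an OpenCL library.
// `exists` is injectable so the probe can be exercised off-device.
fun findOpenClLibrary(
    dirs: List<String> = listOf("/vendor/lib64/", "/system/vendor/lib64/"),
    exists: (String) -> Boolean = { File(it).exists() },
): String? = dirs.map { it + "libOpenCL.so" }.firstOrNull(exists)

class BackendCascade(private val order: List<Backend> = Backend.values().toList()) {
    private val quarantined = mutableSetOf<Backend>()

    // Try each non-quarantined backend in order; a backend whose init
    // throws is quarantined and the next one is attempted.
    fun <E> initFirstWorking(tryInit: (Backend) -> E): Pair<Backend, E> {
        for (backend in order) {
            if (backend in quarantined) continue
            try {
                return backend to tryInit(backend)
            } catch (e: Exception) {
                quarantined += backend
            }
        }
        error("No usable backend left")
    }

    // Call when a backend fails mid-generation, so the next turn skips it.
    fun markFailed(backend: Backend) {
        quarantined += backend
    }
}
```

In use, tryInit would build and initialise the engine for the given backend; after a mid-generation failure the caller closes that engine, calls markFailed, and runs initFirstWorking again on the next turn.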

Tool calling. Each tool (set a timer, read or create calendar events, control music, look up contacts, toggle flashlight, change volume, search the web, launch apps) is a Kotlin function on a ToolSet annotated with @Tool / @ToolParam, registered into Hilt as a multibound set. At conversation start, all ToolSets are passed to litertlm with automaticToolCalling=true, and the JNI handles the loop natively: model emits a tool call → JNI reflects into the @Tool function → result is fed back → generation continues. From the app’s side it’s just a stream of Content.Text tokens.
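
To illustrate the reflection step in isolation, here is a self-contained sketch of annotation-driven dispatch. It does not use litertlm: DemoTool and DemoToolParam are local stand-ins for @Tool and @ToolParam, TimerTools and dispatch are hypothetical, and the real runtime performs this lookup on the native side with its own argument parsing.

```kotlin
// Local stand-ins for the runtime's annotations (not the litertlm ones).
@Retention(AnnotationRetention.RUNTIME)
@Target(AnnotationTarget.FUNCTION)
annotation class DemoTool(val description: String)

@Retention(AnnotationRetention.RUNTIME)
@Target(AnnotationTarget.VALUE_PARAMETER)
annotation class DemoToolParam(val name: String)

// A hypothetical tool set with one annotated function.
class TimerTools {
    @DemoTool("Start a countdown timer")
    fun setTimer(
        @DemoToolParam("minutes") minutes: Int,
        @DemoToolParam("label") label: String,
    ): String = "Timer '$label' set for $minutes min"
}

// Find the annotated method by name, order the arguments by their
// @DemoToolParam names, and invoke it reflectively.
fun dispatch(toolSet: Any, toolName: String, args: Map<String, Any>): Any? {
    val method = toolSet.javaClass.declaredMethods.firstOrNull {
        it.name == toolName && it.isAnnotationPresent(DemoTool::class.java)
    } ?: error("Unknown tool: $toolName")
    val ordered = method.parameterAnnotations.map { annotations ->
        val param = annotations.filterIsInstance<DemoToolParam>().firstOrNull()
            ?: error("Parameter of $toolName lacks @DemoToolParam")
        args[param.name] ?: error("Missing argument: ${param.name}")
    }
    return method.invoke(toolSet, *ordered.toTypedArray())
}
```

Calling dispatch(TimerTools(), "setTimer", mapOf("minutes" to 10, "label" to "pasta")) invokes setTimer(10, "pasta"); the returned string is what would be fed back to the model as the tool result.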

Voice in / voice out. Push-to-talk only today — hold the mic button, talk, release. AudioRecord captures at 16 kHz mono with a 30 s cap matching the model’s max audio segment, and Gemma’s reply is piped back through Android’s built-in TextToSpeech engine. The system prompt is rebuilt each turn with the actual current date, time, and zone — without that, the model fabricates dates from its 2023 training cutoff.
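
A minimal sketch of the per-turn prompt rebuild and of the buffer size implied by the 30 s cap. The function names and the exact prompt wording are hypothetical; only the ingredients (current date, time, zone, 16 kHz mono 16-bit audio, 30 s limit) come from the description above.

```kotlin
import java.time.ZonedDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale

// Append the current date, time and zone to a base system prompt.
// `now` is a parameter so the function stays deterministic under test.
fun buildSystemPrompt(basePrompt: String, now: ZonedDateTime = ZonedDateTime.now()): String {
    val date = now.format(DateTimeFormatter.ofPattern("EEEE, d MMMM yyyy", Locale.ENGLISH))
    val time = now.format(DateTimeFormatter.ofPattern("HH:mm", Locale.ENGLISH))
    return "$basePrompt\nCurrent date: $date\nCurrent time: $time\nTime zone: ${now.zone.id}"
}

// Upper bound on captured PCM bytes for one push-to-talk segment.
fun maxPcmBytes(
    seconds: Int = 30,
    sampleRate: Int = 16_000,
    channels: Int = 1,
    bytesPerSample: Int = 2,
): Int = seconds * sampleRate * channels * bytesPerSample
```

With the defaults, a full 30 s segment is 30 × 16,000 × 2 = 960,000 bytes of PCM, which is a convenient hard limit for the capture buffer.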

What works today

End-to-end voice loop with real tool calls:

  • “What’s on my calendar tomorrow?” → reads upcoming events from the system calendar.
  • “Add an event for dinner with Marina at 8 tonight.” → writes to the primary calendar with the right timezone.
  • “Turn the flashlight on.” → toggles the camera torch.
  • “Search the web for the F1 standings.” → opens the browser with the query.
  • Plus timers, volume, music transport (play/pause/next/prev), now-playing, and launching installed apps by name.

All of that with audio captured locally, no STT round-trip, and the model deciding when to call a tool versus when to answer directly.

Stack

  • Kotlin 1.9.20 · min SDK 26 (Android 8.0) · target SDK 34 (Android 14)
  • Gemma 4 E2B with native audio (.litertlm, downloaded on first run from litert-community/gemma-4-E2B-it-litert-lm)
  • LiteRT-LM 0.10.2 (com.google.ai.edge.litertlm:litertlm-android) — per-modality backends
  • Android TextToSpeech for replies · AudioRecord for the mic (16 kHz mono PCM, WAV-wrapped before hand-off)
  • Hilt 2.57 for DI · multibound ToolSets for tools
  • Jetpack Compose + Material 3 · Coroutines + Flow for streaming
  • Room for conversation history
  • Clean Architecture: domain · data · presentation

Things I’m proud of

Picking the right runtime. Both MediaPipe tasks-genai and LiteRT-LM are documented and load Gemma 4 in the happy path. Neither doc tells you that Gemma 4 E2B’s audio adapter is CPU-pinned in the .litertlm and that tasks-genai’s audio executor is hardcoded GPU. That’s the kind of thing you find by reading the failure mode and matching it against what the runtime actually exposes. LiteRT-LM lets you set the audio backend per-modality; that’s the only path that works today.

The WAV-header workaround. Three lines of ByteBuffer and a 44-byte RIFF/WAVE header (with its “fmt ” and “data” chunks) prepended to the PCM, and audio input goes from “miniaudio error -10” to working end-to-end. Smallest fix for the biggest unblock.

Backend cascade with vendor OpenCL probing. GPU → GpuArtisan → CPU, with manual dlopen against the seven common Android vendor paths because the system linker namespace doesn’t see them. If a backend fails mid-generation it’s quarantined and the next turn falls through to the next option, transparent to the user.

Tool calling without a tool-loop. litertlm runs the call → execute → continue dance natively through reflection on @Tool-annotated Kotlin functions. The app just defines tools and streams tokens; the assistant loop is the runtime’s problem, not mine.

No cloud, no account, no telemetry. That falls out of the architecture; it isn’t a marketing tagline. There’s no server-side dependency to break.

Where it’s going

The hard parts are working: Gemma 4 loads, audio input goes through, tools dispatch, and replies come out of the speaker, all push-to-talk for now. What’s left is the polish layer: registering EdgeMind as the device’s default assistant via RoleManager.ROLE_ASSISTANT and a VoiceInteractionService, then a hands-free mode behind on-device VAD for users who’d rather not hold a button.

Source: github.com/IgnacioLD/edgemind.