I'm Building My Kids an AI Learning Console. Here's Why.
My kids — Jaxsen (9) and Adalind (6) — ask Alexa questions all day. The answers are terrible. Shallow, ad-adjacent, often wrong in the specific way that sounds plausible enough that a kid won't question it. So I'm building something better.
What exists, and why it falls short
Tablets work. They also capture attention in a way that makes them hard to put down, and the "educational" framing usually lasts about ten minutes before it turns into YouTube. I'm not opposed to screens — I'm opposed to screens that are optimized for engagement metrics instead of actual learning.
Voice assistants are fast and always available, which is genuinely useful. But they're designed to answer one question and move on. They can't tutor. They can't notice when a kid is struggling, change the pacing, try a different explanation. They're lookup tables with a voice.
Educational apps are the most honest product category: they do a specific thing, and then you're done. Khan Kids is good. Duolingo is fine. They just can't talk to your kid like a person.
What I'm building instead
Blip is a mini PC tucked behind the living room TV. Wake word is "Hey Blip." It listens, thinks, and responds — and it can carry a real conversation. Spelling bee, math practice, interactive story, trivia, help with homework. When the TV is on it shows visual feedback. When the TV is off it works in audio-only mode, like a smart speaker that actually knows how to teach.
The hardware is a Beelink SER5 MAX — a small AMD Ryzen box that costs around $250 and fits in one hand. Plus an Anker USB speakerphone for far-field mic pickup. Total hardware cost under $300. It sits behind the TV and the kids don't interact with it directly at all.
Privacy by design
Speech-to-text runs locally on the device using Whisper — an open-source model from OpenAI that you can run on your own hardware. My kids' voices never leave the house. The only network call is to Claude (Anthropic's API) to generate a response, and that's just text.
I have a separate machine on my home network with a lot of GPU memory that handles structured tasks locally — spelling, math, trivia — using a smaller model called Llama 3.3. Simpler questions don't need to leave the house at all. Claude handles anything creative or emotionally nuanced.
What's next
Phase 1 — getting the audio pipeline working — is done. Wake word detection, transcription, response, speech output. The loop works. Phase 2 is integrating Claude properly so Blip can actually hold a conversation and run structured activities.
I'll keep writing about this as it comes together. If you've built something similar, or have thoughts on voice interfaces for kids, I'd like to hear from you.
You don't need a computer science degree to start building things for your kids. You just need a reason to start.