I have a couple of Raspberry Pi-based voice satellites still running Rhasspy software linked to Home Assistant. Mike, the guy behind Rhasspy, previously worked for Mycroft and now works for Nabu Casa (the company behind Home Assistant), where he is doing a wonderful job of developing a fully local voice assistant.
Mike developed Rhasspy as a generalised voice toolkit, but it hasn't received much attention in the last couple of years. The modules he has developed since then (Rhasspy v3 morphed into Wyoming) are designed for general use … but the focus has been on integrating them into Home Assistant rather than documenting how to use them stand-alone.
I am interested in Home Automation rather than AI, so these are my limited experiences/opinions and not to be taken as fact.
From a RasPi hardware point of view, we commonly used reSpeaker 2-mic HATs from seeed (and several copies, including the Adafruit Voice Bonnet), but seeed didn't put much effort into the drivers and software, and then stopped supporting them. The device proudly boasts 2 mics … and the hardware for both mics does work, but the supplied driver only uses one.
One user has kindly been updating the driver for later kernel releases, but there has been no active development for many years.
Since then seeed has released 4-mic and USB reSpeakers … but they never caught on (in the Rhasspy community, anyway).
Certainly you can add a microphone and speaker to your magic mirror RasPi … but you won't get good quality without adding Digital Signal Processing (DSP).
Recently I have seen several ESP32-S3-based voice satellites using an XMOS DSP chip, including the ReSpeaker Lite (which claims to work with either an ESP32 or a Raspberry Pi). I am curious to try it … real soon now.
As for the back-end … RasPi is a great general-purpose computing platform, but voice processing (speech-to-text and text-to-speech) really benefits from more specialised AI-style hardware. Google and Amazon do it "in the cloud" and don't want you to think about the implications thereof. They have spent years and millions of dollars on high-powered hardware and AI software to make it look easy.
The fact is that it requires millions of calculations, similar to those graphics cards perform to render the reflections on waves in your latest game. You can get slow, low-quality responses on a RasPi, but it is preferable for your Raspberry Pi to pass AI requests either to one of the cloud-based services (with their massive hardware and quick response times) or to your own PC with a modern graphics card doing the AI processing.
Objective
If you want it mainly to control your home ("open the blind", "turn off the light", "Is the garage door closed?"), I would definitely recommend you look at Home Assistant and its "Year of Voice" (now in its second year). Recently they have been adding general AI capabilities (including the option to ask Alexa or Google anonymously) for when Home Assistant itself can't provide the answer.
Personally, I have just purchased a Home Assistant Voice Preview Edition device (an ESP32-S3 microcontroller with an XMOS Digital Signal Processor, 2 mics, a speaker and status LEDs, for Aus$120 delivered) for my Home Assistant … and my next job will be to upgrade the software on my RasPi units to bring them into the same Wyoming system.