Distributed DIY Voice Assistant

"Tell me about any recent project you worked on using Paithan."
- Random Recruiter on call

My online friends made fun of me because I didn't have a mic.
So I made my own mic using my game console.
That mic evolved into a node for my voice assistant.

I couldn't afford a microphone, so I used my PS Vita as one. I ended up using Dear ImGui to build a custom client app that connects the PS Vita to my computer, which runs my custom voice assistant built on an Ollama model plus lots of algorithms, rules and fuzzy matching.

Initially, I built a single-node version of this using my laptop and its built-in mic. You can see it in the video below.

Table of contents

  1. Setting up the Environment
  2. Sending Audio over WiFi
  3. Server Setup
  4. Voice Assistant (alter-ego)
  5. Client App
  6. Demo Video on YouTube

Setting up the Environment

  • My PS Vita was the only device I owned that had a mic and could stay on my desk for long periods. It is also hacked (jailbroken) and can run custom code, so I decided to use it.
  • I first installed vitasdk, the toolchain for building homebrew apps for the PSV. I grabbed the sample apps from the official GitHub repository and compiled a hello world program to check that it would run. Compiling produced a .vpk file, which is similar in design to an Android .apk: both are essentially zip archives with rules for where each file should be installed.
    Once I moved the file to my PSV via FTP and installed it, the hello world program ran.
  • Hello world program (using vitasdk) where

Hmmm...
Those hexadecimal numbers?
Seems like I could send them to my PC over WiFi and make a DIY wireless mic for calls.

Sending Audio over WiFi

  • I copied the networking boilerplate from the samples and used SCE_NET_SOCK_DGRAM to create a UDP socket. I chose UDP because I first wanted to check whether the PSV's 2.4 GHz 802.11n WiFi is good enough by modern standards (it does not reach true 802.11n speeds; I only get around 2 megabytes/sec at most). Surprisingly, it worked very well, though it helps that my router is literally above my computer monitor.
  • Connecting to something over WiFi requires an IP address. Initially I hardcoded my computer's IP address, but since I sometimes use my laptop instead, hardcoding it was not ideal. So I used file operations to read the IP address from a file instead:
    // Convert the IP string to binary form and set the port in network byte order
    sceNetInetPton(SCE_NET_AF_INET, SERVER_IP, &server_audio.sin_addr);
    server_audio.sin_port = sceNetHtons(SERVER_PORT);
  • #define NET_PARAM_MEM_SIZE (1*1024*1024) sets how much memory is allotted to the networking stack for the entire program: 1*1024*1024 bytes is 1 MB of RAM.
  • Code for the network stack

I used port 2012 because I had this feeling of impending doom when I was writing the code for this.
Therefore -> 2012.
(。﹏。) don't ask...
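The Vita side of this is C using vitasdk's sceNet calls. The same idea can also be sketched in Python for testing from another machine: read the server IP from a file, then send raw audio chunks to it as UDP datagrams on port 2012. This is a minimal sketch and not the code running on the Vita. The file name `server_ip.txt`, the 1024-byte chunk size and the helper names are hypothetical.

```python
import ipaddress
import socket
from pathlib import Path

SERVER_PORT = 2012   # same port as the Vita client
CHUNK_SIZE = 1024    # hypothetical payload size per datagram, in bytes


def load_server_ip(path):
    """Return the first non-empty line of `path`, validated as an IPv4 address."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line:
            # IPv4Address() raises a ValueError subclass for malformed text
            return str(ipaddress.IPv4Address(line))
    raise ValueError(f"no IP address found in {path}")


def send_audio(sock, server_ip, pcm, port=SERVER_PORT, chunk_size=CHUNK_SIZE):
    """Send raw PCM bytes as a series of UDP datagrams; return how many were sent."""
    sent = 0
    for offset in range(0, len(pcm), chunk_size):
        # UDP has no handshake or retransmission, so a lost datagram
        # becomes a tiny gap in the audio instead of a delay.
        sock.sendto(pcm[offset:offset + chunk_size], (server_ip, port))
        sent += 1
    return sent


if __name__ == "__main__":
    ip = load_server_ip("server_ip.txt")  # hypothetical config file
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        send_audio(sock, ip, bytes(4096))  # 4096 bytes of silence as test audio
```

Keeping the address in a file means switching between the desktop and the laptop only requires editing one line, with no rebuild.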

やった (I did it!)
Math time
(ミ^ᆽ^ミ)
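A quick back-of-the-envelope check shows why roughly 2 MB/s is plenty for voice. The post doesn't state the mic's audio format, so this sketch assumes 16 kHz, 16-bit, mono PCM and takes 1 MB as 10^6 bytes:

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Data rate of uncompressed PCM audio, in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


WIFI_BYTES_PER_SECOND = 2_000_000  # ~2 MB/s, the max observed on the PSV

# Assumed format: 16 kHz, 16-bit samples, 1 channel (not stated in the post)
audio_rate = pcm_bytes_per_second(16_000, 16, 1)
headroom = WIFI_BYTES_PER_SECOND / audio_rate

print(audio_rate)  # 32000 bytes/s, i.e. 32 KB/s
print(headroom)    # 62.5, so the stream uses under 2% of the link
```

Even at 48 kHz, the same formula gives 96,000 bytes/s, which is still under 5% of the link.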

Server Setup (The Receiver End)

  • Since I use Linux with PipeWire, I can use the pacat command-line utility to create a fake audio device.
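On the receiving PC, something has to listen on the UDP port and hand the raw samples to pacat. Below is a minimal Python sketch of that relay. It assumes the Vita sends raw 16-bit little-endian mono PCM at 16 kHz. That format and the pacat flags in the usage line are assumptions to adjust to the real stream.

```python
import socket
import sys

LISTEN_PORT = 2012     # port the Vita sends to
MAX_DATAGRAM = 65535   # buffer big enough for any UDP payload


def relay(sock, out, max_packets=None):
    """Write each UDP payload received on `sock` to the binary stream `out`.

    Runs forever when max_packets is None; returns the number of bytes written.
    """
    written = 0
    packets = 0
    while max_packets is None or packets < max_packets:
        payload, _sender = sock.recvfrom(MAX_DATAGRAM)
        out.write(payload)
        out.flush()  # forward audio immediately instead of buffering it
        written += len(payload)
        packets += 1
    return written


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", LISTEN_PORT))  # accept datagrams on every interface
        relay(sock, sys.stdout.buffer)
```

Usage: `python3 receiver.py | pacat --playback --raw --format=s16le --rate=16000 --channels=1`. pacat's `--device` option can point playback at a specific sink, for example a virtual one.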

Only walkie.
No walkie-talkie yet,
not until I implement two-way communication.
(。╯︵╰。)

Voice Assistant (alter-ego)

It's called alter-ego because this project was heavily inspired by Chihiro from Danganronpa, who creates an AI program called Alter Ego.

I then made a few rules, like this entry in the command dictionary:
    # A tuple lets one lambda run several calls in order
    "open browser": lambda: (
        speak("Opening browser"),
        send2vita("Opening browser"),
        notify("Opening Browser"),
        subprocess.Popen(["zen-browser"]),
    ),

command dictionary

Here are some more example commands that show the assistant handling system-level tasks such as changing the volume, screen brightness and keyboard backlight, taking a screenshot, opening YouTube, etc.
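The rules above store lambdas with side effects. A data-only variant of the same command dictionary is easier to test. This sketch maps phrases to argv lists for common Linux tools (pactl, brightnessctl, xdg-open). Those tools are assumptions about the setup, not the commands the post actually uses.

```python
import subprocess

# Hypothetical phrase -> argv table; adjust the tools to your own system.
COMMANDS = {
    "volume up": ["pactl", "set-sink-volume", "@DEFAULT_SINK@", "+5%"],
    "volume down": ["pactl", "set-sink-volume", "@DEFAULT_SINK@", "-5%"],
    "brightness up": ["brightnessctl", "set", "10%+"],
    "brightness down": ["brightnessctl", "set", "10%-"],
    "open youtube": ["xdg-open", "https://www.youtube.com"],
}


def run_command(phrase, runner=subprocess.Popen):
    """Launch the program mapped to an exact phrase; return False if unknown."""
    argv = COMMANDS.get(phrase)
    if argv is None:
        return False
    runner(argv)  # Popen starts the process without waiting for it to finish
    return True
```

Passing `runner` in lets a test swap in a fake instead of launching real programs. The lambda style in the post is more flexible when one command needs several side effects, such as speaking, notifying and launching.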

But the assistant was not perfect when I tried to use it. Vosk transcribes speech into English text, so a slight difference in accent could make it mishear a command as something that is not defined in its command dictionary, and then it would not do anything at all. Therefore, I implemented fuzzy matching using RapidFuzz so it can narrow down what the user said to the closest entry in the command dictionary.

fuzzy matching code snippet

Fuzzy matching means checking how similar two strings are, instead of giving a strict binary answer (the same or not the same). It gives a score for how similar the two strings are, on a 0 to 100 scale in RapidFuzz.
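RapidFuzz is a third-party package, so this dependency-free sketch swaps in the standard library's difflib.SequenceMatcher, scaled to 0 to 100. The scores are not identical to RapidFuzz's, but the idea is the same. It scores every known command against what Vosk heard and keeps the best one if it clears a threshold (60, the cutoff this post uses for its fallback replies). The command list is hypothetical.

```python
from difflib import SequenceMatcher

COMMANDS = ["open browser", "open youtube", "volume up", "volume down", "shutdown"]
THRESHOLD = 60  # scores below this are treated as "didn't understand"


def similarity(a, b):
    """Similarity score from 0 (nothing in common) to 100 (identical), case-insensitive."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(heard, commands=COMMANDS, threshold=THRESHOLD):
    """Return (command, score) for the closest command, or None if it is too dissimilar."""
    command = max(commands, key=lambda c: similarity(heard, c))
    score = similarity(heard, command)
    return (command, score) if score >= threshold else None
```

`best_match("open browsr")` picks "open browser" with a score of about 95.7, while an unrelated phrase such as "banana" returns None. With RapidFuzz installed, `process.extractOne(heard, COMMANDS, scorer=fuzz.ratio, score_cutoff=60)` plays the same role, returning a `(choice, score, index)` tuple or None.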

Simple, effective,
elegant
ヽ(´▽`)/

wake word detection code

A voice assistant needs to respond to sound, not to us pressing a button.

I added this because once, when I was yelling "shut up" at myself, it fuzzy matched that to "shutdown" and shut down my PC...
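With a wake word gate, a stray phrase like "shut up" never reaches the fuzzy matcher at all. This is a minimal sketch that assumes the transcript is plain text from Vosk. The wake phrases are hypothetical ones based on the assistant's name.

```python
WAKE_WORDS = ("alter ego", "hey alter ego")  # hypothetical wake phrases


def extract_command(transcript, wake_words=WAKE_WORDS):
    """Return the text after a leading wake word, or None if there is no wake word."""
    # Normalize: lowercase, treat hyphens as spaces, collapse repeated whitespace.
    text = " ".join(transcript.lower().replace("-", " ").split())
    # Check longer phrases first so "hey alter ego" wins over "alter ego".
    for wake in sorted(wake_words, key=len, reverse=True):
        if text == wake or text.startswith(wake + " "):
            return text[len(wake):].strip()
    return None
```

Only a non-None result would be passed on to the fuzzy matcher. An empty string means the wake word was heard on its own, which is the moment to play a wake response.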

wake word responses

A voice assistant needs a personality. These are the things it says when it detects a wake word.

fuzzy score < 60 responses

It says one of these if the fuzzy match score is below 60.
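Both kinds of replies boil down to a random pick from a list, with the fallback list gated on the score threshold of 60. The phrases in this sketch are placeholders, not the ones from the post. The random generator is a parameter so the behaviour can be tested deterministically.

```python
import random

# Placeholder lines, not the assistant's actual responses.
WAKE_RESPONSES = ["Yes?", "I'm listening.", "What do you need?"]
FALLBACK_RESPONSES = ["Sorry, I didn't catch that.", "Could you say that again?"]
THRESHOLD = 60


def wake_reply(rng=random):
    """Pick a random acknowledgement to say after the wake word is detected."""
    return rng.choice(WAKE_RESPONSES)


def choose_reply(score, rng=random):
    """Return a fallback line when the best fuzzy score is under the threshold, else None."""
    if score < THRESHOLD:
        return rng.choice(FALLBACK_RESPONSES)
    return None
```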

Client App

It's basically my test code, which I use to test the graphical capabilities of various hardware. Yes, it runs everywhere, like Doom.

Screenshot of the program