Rolyantrauts #3
Replies: 2 comments 3 replies
-
Hey @StuartIanNaylor, moving this to Discussions as these topics are more suited to that format.
-
More broadly, I can't really speak to higher-level design decisions around open-source digital assistant platforms and how to structure the core services for them. It seems like there are many viable ways to handle the distribution of services to centralized and edge devices, but that isn't something I focused on with openWakeWord. It could easily be used in either type of configuration. I am interested in the concept of improving a KWS model through use, and I have some (very) initial thoughts about how that might be done using the method behind openWakeWord. If you also have some ideas about this, can you open a new discussion topic focused on that? Thanks!
-
I got barred from the Rhasspy forum for continuing with the same argument for 2 years, and because I have constantly tried to dispel myths about audio hardware that for some reason seem aimed at sales.
Finally it looks like Rhasspy is going to be partitioned into modules, with much of the superfluous website and methods cast off. What we are doing is really so easy that the majority of Rhasspy's complexity exists to support the web interface and the strangely over-complex 'Hermes' protocol, which is also there without need.
You can read what I wrote, where I pretty much boiled the need down to the lowest common denominator. Too right I was critical of the Rhasspy 'Satellite' being raised one more time, and I actually had the temerity to put forward some ideas.
https://community.rhasspy.org/t/2023-year-of-voice/4130/8?u=rolyan_trauts
We need an open and simple voice system for Linux that is a bring-and-buy of hardware, KWS and skill servers, which can be used with multiple systems without hardcoded system requirements. We need the absolute opposite of the Google Assistants, Siris and Bixbys that are there to enforce system and hardware choice, and worst of all, the idea that a small herd can do so is just delusional.
I don't use Rhasspy because it just doesn't work well. I have been trying to research ways to fill the gaps, and have been critical of what doesn't work well in order to highlight what needs development and implementation.
So I cannot converse with you guys on the forum as I am locked out. Whether this email from Michael is insincere or delusional I don't know: 'I'd still like you to be part of the community and Rhasspy going forward, with civil discussions about what should be done differently from everyone'. How, when my account has been deactivated, and when there was absolutely nothing uncivilised about what I said anyway?
So I have had the stuffing knocked out of my KWS motivation, just as there was finally interest in, and knowledge of, trying to provide something that actually works well and proves it via empirical testing and, hopefully, discourse and exchange of opinion.
I might regain some interest in the new year, but I'm not too sure at the moment.
Voice infrastructure is purely serial, and my take on KWS is a 'KWS server' that is nothing more than a queue router to the next step in the voice chain. It is pretty much standalone and can pass metadata in a zonal format, probably inherited from the audio-out system in use.
All that is needed is audio, zonal data and trigger source, and those can just be passed in files from conf without any embedded protocol needs.
As for audio: if you're passing to ASR it's likely file-based, as much SOTA ASR has quite wide beam sizes where phonetic sentence context is a huge part of accuracy. But if you are passing to an intermediary audio processing stage it should likely be a stream, so the ability to do both is likely needed.
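To make the 'queue router' idea concrete, here is a rough sketch in Python. It is only an illustration of the shape described above, not anything taken from Rhasspy or openWakeWord: the names `Trigger` and `KwsRouter`, the zone strings and the handler functions are all made up for the example, and it only covers the file-based hand-off, not the streaming case.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Trigger:
    """One wake-word event carrying the three items mentioned above (hypothetical shape)."""
    audio_path: str  # audio captured to a file, for a file-based hand-off
    zone: str        # zonal metadata, e.g. the room that heard the keyword
    source: str      # trigger source, e.g. which mic or KWS device fired


Handler = Callable[[Trigger], None]


class KwsRouter:
    """Serial queue router: triggers are handled strictly in arrival order."""

    def __init__(self, routes: Dict[str, Handler], default: Optional[Handler] = None):
        self.routes = routes      # zone -> next step in the voice chain
        self.default = default    # used when a zone has no explicit route
        self.pending: "queue.Queue[Trigger]" = queue.Queue()

    def submit(self, trigger: Trigger) -> None:
        """Queue a trigger; nothing is processed until drain() is called."""
        self.pending.put(trigger)

    def drain(self) -> int:
        """Hand every queued trigger to its next step; return how many were handled."""
        handled = 0
        while True:
            try:
                trigger = self.pending.get_nowait()
            except queue.Empty:
                return handled
            handler = self.routes.get(trigger.zone, self.default)
            if handler is not None:  # triggers with no route and no default are dropped
                handler(trigger)
                handled += 1
```

In use, the routes would come from whatever conf file the system already has, for example `KwsRouter({"kitchen": send_to_asr})`, where `send_to_asr` is whatever function passes the audio file on to the ASR step.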
I am interested in whether you guys have any ideas on local data collection and on-device training in a 'KWS Server', so KWS can improve through use.