Hi! Thanks for the excellent library! I'm testing the model with the pre-trained 'hey jarvis' wake word. It appears that the example streaming detection implementation feeds each 80 ms sample chunk to the model to detect the word. Since 'hey jarvis' is fairly long (probably 300 ms-ish), the model ends up detecting multiple activations for a single utterance of the wake word. I think simple debounce logic would take care of this issue, but I wanted to ask whether there are standard techniques to handle it. Would it make sense to increase the chunk size, for example? Thanks!
Thank you, I'm glad you are finding it useful!

Yes, the current way the model works (independent predictions on chunks of audio with a sliding window) does often produce multiple activations within a short time, as the audio data associated with the word is still present in the chunk. To your point, openWakeWord does not currently implement any debounce logic. This was originally an intentional choice, as different deployment environments and application scenarios might require different types of logic for what happens after an activation. But I agree that providing at least a default approach for this situation would help.

As for increasing the chunk size, that would actually have a similar outcome to debounce logic. In the current version of openWakeWord, increasing the chunk size will still predict internally at the same rate (every 80 ms), but will then take the maximum prediction within the chunk and simply return that. So by increasing the chunk size you increase the latency of the response, but also increase the chances that you'll get just one activation.
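For reference, here is a minimal sketch of what application-side debounce logic could look like on top of the streaming predictions. It assumes the `Model.predict()` interface shown in the README (a dict of `{model_name: score}` per frame); the threshold, cooldown length, and `get_audio_frame()` helper are illustrative placeholders, not part of the library:

```python
import time

import numpy as np
from openwakeword.model import Model

THRESHOLD = 0.5    # activation threshold (illustrative)
COOLDOWN_S = 2.0   # ignore repeat activations of the same model for this long (illustrative)

model = Model()          # loads the included pre-trained models
last_activation = {}     # model name -> timestamp of the last accepted activation


def get_audio_frame() -> np.ndarray:
    """Placeholder: return the next 80 ms of 16 kHz, 16-bit PCM audio (1280 samples)."""
    return np.zeros(1280, dtype=np.int16)


while True:
    scores = model.predict(get_audio_frame())
    now = time.time()
    for name, score in scores.items():
        # Accept an activation only if the score clears the threshold and
        # enough time has passed since the last accepted activation.
        if score >= THRESHOLD and now - last_activation.get(name, 0.0) >= COOLDOWN_S:
            last_activation[name] = now
            print(f"Activation: {name} (score={score:.2f})")
```

A per-model cooldown like this suppresses the repeated activations caused by overlapping windows without changing how often the model itself runs.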