That is very interesting! Thanks for contacting the creator of that IR dataset; if you wouldn't mind sharing the email, that would be useful. As for the non-commercial components, it's actually not just the validation features. For the pre-trained models, much of the training data itself either has non-commercial licensing or is composed of data with mixed/unknown licensing, and thus should conservatively be considered non-commercial. It should be possible to train models without this data, but you'd have to find a good (and large) combination of speech, background noise, and music, all with permissive licensing. I'd be happy to discuss some options here if you already have some datasets you are considering.
Wasn't sure exactly where to put this, but I reached out to the creator of the impulse response dataset used for the pretrained models and for the background noise regarding the license, and confirmed it was MIT 4.0 for the Hugging Face-hosted data and for the reference in the Colab notebook. I can share the email as well if you would like to confirm.
Related to this, I would just like to confirm: the non-commercially licensed component of the pretrained models is only the validation features, correct? If possible, would you be able to provide the code that creates the validation features? I assume it is just what is in the Colab notebook, but it mentions that specific subsets of some of the data are selected, so I would like to keep that consistent, then remove the non-commercially licensed datasets and try ones that are commercially licensed.
Thanks!