
Add CombDetector, a naive polyphonic pitch detector #6

Draft
wants to merge 2 commits into master

Conversation

alesgenova
Owner

Simple approach to attempt multi pitch detection:

The new CombDetector does the following:

  • has an internal monophonic pitch detector (e.g. McLeod)
  • detects the monophonic pitch of the signal using the internal detector
  • if a pitch is found, create a comb filter that removes the pitch frequency and all of its octaves from the original signal
  • feed the new signal to the monophonic detector
  • iterate... (a rough sketch of this loop is shown below)
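
A rough sketch of the loop (detect_mono and apply_comb_filter are just placeholders standing in for the internal monophonic detector and the comb filter, not the actual API in this PR):

// Placeholders for the real pieces:
struct Pitch {
    frequency: f64,
    clarity: f64,
}

fn detect_mono(_signal: &[f64], _sample_rate: usize) -> Option<Pitch> {
    unimplemented!("stand-in for the internal monophonic detector (e.g. McLeod)")
}

fn apply_comb_filter(_signal: &[f64], _freq: f64, _sample_rate: usize) -> Vec<f64> {
    unimplemented!("stand-in for the comb filter that notches out a pitch and its octaves")
}

// The CombDetector iteration itself:
fn detect_poly(signal: &[f64], max_pitches: usize, sample_rate: usize) -> Vec<Pitch> {
    let mut working = signal.to_vec();
    let mut pitches = Vec::new();
    for _ in 0..max_pitches {
        // run the monophonic detector on the current (filtered) signal
        let pitch = match detect_mono(&working, sample_rate) {
            Some(p) => p,
            None => break, // nothing confident left, stop iterating
        };
        // remove that pitch (and its octaves) before the next pass
        working = apply_comb_filter(&working, pitch.frequency, sample_rate);
        pitches.push(pitch);
    }
    pitches
}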

Caveats:

  • It doesn't work that well: McLeod and autocorrelation don't work well with signals that contain multiple pitches
  • Looks like a larger signal window is needed to have any type of result (4096?)

@wagnerf42
Contributor

hi alessandro, that's very nice of you.

so, i gave it a try and it does not work.

actually i have been thinking of using the fact that i know which frequency the parasite string is at, and filtering it out, but i'm really a beginner with sound processing and my attempt also did not work.

i've uploaded my code here : https://gitlab.ensimag.fr/wagnerf/listen/
it's not meant for release yet. i need to clean a lot of things inside. i don't know if you are a musician ? you can give it a try if you like. i've already had a lot of fun with it. it works nicely with clarinet and saxophone and well with my mandolin, even at high speed. the 'multipitch_test' contains some wav files i recorded with my mandolin. A and F are single pitch and FA are both together.

thanks for your help

@alesgenova
Owner Author

@wagnerf42

Thank you for providing the sound samples, I was too lazy to record them myself :)

So I played around with the F# A mandolin sample from your repo and the polyphonic detector actually works better than I expected. I'm writing some test cases with it that you can use as an example. I'll push the test cases tomorrow to this branch, but in the meantime this is the preliminary result I'm getting:

For reference, I'm using samples 4000...20384 from your wave file (removing the plucking part and cutting the low-volume tail).

I pick a 4096-sample moving window (the window moves by 2048 samples at each iteration) and feed it to the polyphonic detector. As you can see from the output, in most cases the algorithm is able to detect both notes (without considering the octave); a sketch of the windowing loop is included after the output.

---- mcleod_poly_wave stdout ----
Window t = 0 s
Poly #0 - Chosen Peak idx: 101.15764; clarity: 0.7213225; freq: 435.95325 +/- 4.352661
Poly #1 - Chosen Peak idx: 119.00466; clarity: 0.76116675; freq: 370.57373 +/- 3.1403198
Window t = 0.023219954 s
Poly #0 - Chosen Peak idx: 101.48783; clarity: 0.73747027; freq: 434.53485 +/- 4.3242493
Poly #1 - Chosen Peak idx: 118.77221; clarity: 0.7906468; freq: 371.29898 +/- 3.1526794
Window t = 0.04643991 s
Poly #0 - Chosen Peak idx: 101.55038; clarity: 0.76859653; freq: 434.2672 +/- 4.3189087
Poly #1 - Chosen Peak idx: 118.98824; clarity: 0.8014235; freq: 370.62485 +/- 3.1412048
Window t = 0.06965987 s
Poly #0 - Chosen Peak idx: 101.64789; clarity: 0.7927725; freq: 433.85062 +/- 4.3105774
Poly #1 - Chosen Peak idx: 118.74246; clarity: 0.78980315; freq: 371.392 +/- 3.1542664
Window t = 0.09287982 s
Poly #0 - Chosen Peak idx: 101.66748; clarity: 0.8066324; freq: 433.76703 +/- 4.308899
Poly #1 - Chosen Peak idx: 119.20936; clarity: 0.758316; freq: 369.9374 +/- 3.1295166
Window t = 0.116099775 s
Poly #0 - Chosen Peak idx: 101.78122; clarity: 0.8120416; freq: 433.2823 +/- 4.2992554
Poly #1 - Chosen Peak idx: 119.21018; clarity: 0.71151227; freq: 369.93484 +/- 3.1294556
Window t = 0.13931973 s
Poly #0 - Chosen Peak idx: 101.92821; clarity: 0.8117292; freq: 432.65747 +/- 4.2867737
Poly #1 - Chosen Peak idx: 120.873375; clarity: 0.6857633; freq: 364.8446 +/- 3.0436096
Window t = 0.16253968 s
Poly #0 - Chosen Peak idx: 102.32695; clarity: 0.79495287; freq: 430.9715 +/- 4.253296
Poly #1 - Chosen Peak idx: 123.24959; clarity: 0.65354437; freq: 357.81052 +/- 2.9269104
Window t = 0.18575963 s
Poly #0 - Chosen Peak idx: 102.93913; clarity: 0.77097565; freq: 428.4085 +/- 4.2025757
Poly #1 - Chosen Peak idx: 127.076485; clarity: 0.65991896; freq: 347.0351 +/- 2.7525635
Window t = 0.20897959 s
Poly #0 - Chosen Peak idx: 104.08295; clarity: 0.7293231; freq: 423.70053 +/- 4.1102905
Poly #1 - Chosen Peak idx: 104.537506; clarity: 0.62709546; freq: 421.85815 +/- 4.074463
Window t = 0.23219955 s
Poly #0 - Chosen Peak idx: 105.37428; clarity: 0.72733015; freq: 418.50818 +/- 4.0097046
Poly #1 - Chosen Peak idx: 103.932655; clarity: 0.62606245; freq: 424.31323 +/- 4.122223
Window t = 0.2554195 s
Poly #0 - Chosen Peak idx: 104.56516; clarity: 0.7400443; freq: 421.74658 +/- 4.072296
Poly #1 - Chosen Peak idx: 105.45749; clarity: 0.6326864; freq: 418.17798 +/- 4.0033264
FAILURES: 4 TOTAL: 24
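
For completeness, the loop driving this output is roughly the following (just a sketch: the sample rate is an assumption and the detector call is left out, the actual test code is in the branch):

// Sliding-window driver (sketch). WINDOW and HOP match the values above.
const WINDOW: usize = 4096;
const HOP: usize = 2048;
const SAMPLE_RATE: usize = 44100; // assumption, use whatever the wav file reports

fn run(signal: &[f64]) {
    let mut start = 0;
    while start + WINDOW <= signal.len() {
        // copy the current window out of the recording
        let chunk: Vec<f64> = signal[start..start + WINDOW].to_vec();
        println!("Window t = {} s", start as f64 / SAMPLE_RATE as f64);
        // ... feed `chunk` to the polyphonic detector here ...
        start += HOP;
    }
}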

@wagnerf42
Contributor

hey that's really nice. I'll definitely give it a try again. maybe my window size was too small actually.
i only tested with 1024 and moving by 256, which is maybe why i did not get it.
octave is no big deal for me. I do a modulo 12 on my notes and that's fine.

by the way i was wondering how you handle the moving part. since your input is a slice you are forced to copy the whole window at least from time to time (in my current code i do it at each move).

I've changed the input types a bit to allow structs which are not slices :
https://github.com/wagnerf42/pitch-detection (current head)
this way we could use a ring buffer as input.
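
roughly the kind of abstraction i have in mind (just a sketch with made-up names, the actual code in my branch looks a bit different):

// made-up trait, only to show the idea: anything that can hand out its
// samples in order can feed the detector, a slice or a ring buffer alike.
trait SignalSource<T> {
    fn num_samples(&self) -> usize;
    fn sample(&self, index: usize) -> T;
}

// plain slices keep working...
impl<T: Copy> SignalSource<T> for &[T] {
    fn num_samples(&self) -> usize {
        self.len()
    }
    fn sample(&self, index: usize) -> T {
        self[index]
    }
}

// ...and a ring buffer could implement the same trait without copying
// the whole window at every move.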

the drawback is that the input types are less readable somehow. the inner code does not change much though.
what do you think ?

as a side note i'm going to be traveling home today, so i might be offline for a few days but i'll definitely be back soon.

thanks again and have a nice christmas.

@alesgenova
Owner Author

hey that's really nice. I'll definitely give it a try again. maybe my window size was too small actually.
i only tested with 1024 and moving by 256, which is maybe why i did not get it.
octave is no big deal for me. I do a modulo 12 on my notes and that's fine.

I added a few tests that use recordings of real instruments (for example your mandolin), see my latest commit to this PR.
I'm starting to think that it might even be worth merging this PR at some point, if the results are even slightly useful. What do you think?

by the way i was wondering how you handle the moving part. since your input is a slice you are forced to copy the whole window at least from time to time (in my current code i do it at each move).

Yes, I'm making a copy (see the get_chunk helper function in the test file).

I've changed the input types a bit to allow structs which are not slices :
https://github.com/wagnerf42/pitch-detection (current head)
this way we could use a ring buffer as input.

the drawback is that the input types are less readable somehow. the inner code does not change much though.
what do you think ?

I wouldn't worry about the performance hit of copying the full window (rather than only the smaller part that changed) when consecutive windows overlap.
This is because within the pitch detection algorithm we already have to do several operations that are more expensive, for example:

  • 2 Fast Fourier Transforms (forward and inverse), which are O(N log N)
  • Copying the real signal to/from scratch complex float arrays

However, I'm open to considering a change to the signature of the input types if it increases the compatibility of the library with different common inputs. Could you give me some examples?

as a side note i'm going to be traveling home today, so i might be offline for a few days but i'll definitely be back soon.

thanks again and have a nice christmas.

Thank you, have fun on your break!

@wagnerf42
Contributor

hi alessandro, happy new year.

so, i took a quick look.

  • for the copy i wanted to use https://crates.io/crates/dasp_ring_buffer but &Fixed does not implement IntoIterator. it's not a big deal anyway, we can postpone that.
  • for the polyphonic api, don't you think it would be better to return a Vec<Pitch<T>> instead of a Vec<Option<Pitch<T>>> ? it makes more sense to me
  • if you are interested, the way to write the comb detector in a functional style is to loop on the range iterator, using scan with the signal as the state. scan is a bit tricky the first time you see it though. you can then directly collect into the vector (i put a small sketch after this list).
  • now, I tested the polyphonic algorithm and I must have a mistake somewhere, because i'm getting way worse results. it makes no sense, because i said it would be ok if any detected pitch matches my target, even with the limit equal to 1. I looked at your code and mine but didn't manage to find what is wrong. i must have made a silly mistake somewhere. i'll look again tomorrow.
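
here is roughly what i mean by the scan version, reusing the placeholder detect_mono / apply_comb_filter names from the sketch in your description (untested, just to show the shape):

// scan carries the filtered signal as its state between iterations
// and stops as soon as detect_mono returns None.
fn detect_poly(signal: &[f64], max_pitches: usize, sample_rate: usize) -> Vec<Pitch> {
    (0..max_pitches)
        .scan(signal.to_vec(), |working, _| {
            // detect one pitch on the current residual signal (None ends the scan)
            let pitch = detect_mono(working, sample_rate)?;
            // comb-filter that pitch out before the next iteration
            *working = apply_comb_filter(working, pitch.frequency, sample_rate);
            Some(pitch)
        })
        .collect()
}

collecting like this also gives you the Vec<Pitch<T>> directly instead of a Vec<Option<Pitch<T>>>.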

@wagnerf42
Contributor

hi, so i did some more testing.

i get consistent results now. the bad news is that in my specific case the comb detector did not really manage to improve detections.
i now save all mis-detected notes on the disk and post-process them with different window sizes / detectors.
the funny thing is that when playing it, my brain could swear everything is fine, but when listening to the audio sample I cannot figure out which note it is.

the good news is that i will get a steady stream of mis-identified samples now. i'm also still going to use the multi pitch detector when i add chords.

as a side note i wonder if it wouldn't be a better api to add a new method to the PitchDetector instead of having a new PolyphonicDetector trait.

Fred

@alesgenova
Owner Author

hi, so i did some more testing.

i get consistent results now. the bad news is that in my specific case the comb detector did not really manage to improve detections.
i now save all mis-detected notes on the disk and post-process them with different window sizes / detectors.
the funny thing is that when playing it, my brain could swear everything is fine, but when listening to the audio sample I cannot figure out which note it is.

the good news is that i will get a steady stream of mis-identified samples now. i'm also still going to use the multi pitch detector when i add chords.

as a side note i wonder if it wouldn't be a better api to add a new method to the PitchDetector instead of having a new PolyphonicDetector trait.

I'm not sure I want to add another method to the basic PitchDetector trait, because then the other two monophonic pitch detectors would have to implement it without the actual functionality. But I like your suggestion of changing the return type to Vec<Pitch<T>> instead of Vec<Option<Pitch<T>>>, so I'll do that before merging this PR.

I will also include the tips you gave above regarding the loop in the CombDetector, thank you for that.

Finally, I'll write an isolated function to apply the comb filter to a signal, so you can play around with the various mis-identified samples and check if tweaking the parameters in the filter could help or if it's a hopeless approach.
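
As a starting point it will probably be something like a plain feedforward comb (just a sketch, the parameters and the exact filter CombDetector ends up using may differ):

// y[n] = x[n] - alpha * x[n - period] puts notches at `freq` and its
// integer multiples; `alpha` controls how strongly the pitch is attenuated.
fn comb_filter(signal: &[f64], freq: f64, sample_rate: f64, alpha: f64) -> Vec<f64> {
    let period = (sample_rate / freq).round() as usize;
    signal
        .iter()
        .enumerate()
        .map(|(n, &x)| if n >= period { x - alpha * signal[n - period] } else { x })
        .collect()
}

That should make it easy to re-run your mis-identified samples with different alpha values and see whether tweaking actually helps.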

@wagnerf42
Contributor

hi, sorry for the delay.

I tried another api.
it's here https://github.com/wagnerf42/pitch-detection/
in the poly branch.

in my opinion it is better. you don't need an extra trait, you don't need to box the detector and you get an iterator instead of a vector.

i did it quite quickly, so it's untested for now, but i just refactored your code.

@wagnerf42
Contributor

actually I may have found something nice. if i decrease the clarity threshold then suddenly the polyphonic detector gives me better results than the standard one. it actually makes sense i guess. i'll do more testing with that.

@alesgenova
Owner Author

Sorry for the delay, I need to get around to reviewing and merging your branch!

I'm surprised that the results improve; the first pitch detected by the polyphonic and regular detector should be the same (given the same clarity value). It might make sense that a lower clarity threshold gives you better results though: with a value that is too high it's possible that we discard relevant peaks, but with a value that is too low we may pick up irrelevant noise frequencies. What value is working best for you?

@wagnerf42
Contributor

hi, no worries about the delays, it's all fine.
the first peak is always the same between monophonic and polyphonic.
what changes is that when i actually have two peaks and a threshold of 0.7, i see no peak at all. but in polyphonic mode with a threshold of 0.5, i see both peaks.
