scoring: remove IDF from BM25 scoring #912
base: main
We remove IDF because we want to use BM25 scoring for keyword search, and our query-time calculation of IDF won't work anymore if terms are AND-ed (keyword search) instead of OR-ed (codycontext). Our evaluations show a slight improvement, which supports the decision to treat IDF as constant. This is also in line with how we calculate line-based BM25.
Test plan: updated unit tests
@@ -79,8 +79,8 @@ func TestBM25(t *testing.T) {
query: &query.Substring{Pattern: "example"},
content: exampleJava,
language: "Java",
// bm25-score: 0.58 <- sum-termFrequencyScore: 14.00, length-ratio: 1.00
I removed the score from the comments because it is redundant and would need to change with every update to the scoring function. sum-termFrequencyScore and length-ratio are more robust.
Couple questions
Although I had the same reaction as @mmanela, that we should really avoid deviating from the classic BM25 formula, we did an analysis here that made me feel okay about this: https://linear.app/sourcegraph/issue/SPLF-838/use-bm25-for-multi-term-keyword-searches. TL;DR: eval results on every dataset improved, and IDF may not be as critical for the code search use case.
We remove IDF from our BM25 scoring, effectively treating it as constant.
This is supported by our evaluations, which showed that for keyword-style queries, IDF can down-weight the score of important keywords too much, leading to worse ranking. The intuition is that for code search, each keyword is important regardless of how frequently it appears in the corpus.
Removing IDF allows us to apply BM25 scoring to a wider range of query types. Previously, BM25 was limited to queries with individual terms combined using OR, as IDF was calculated on the fly at query time.
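To illustrate what "treating IDF as constant" could look like, here is a minimal sketch rather than the code in this PR. It uses the textbook BM25 term-frequency saturation with the common defaults k1 = 1.2 and b = 0.75; those values, and the names `termFrequencyScore` and `scoreWithoutIDF`, are assumptions made for the example. The point is that the score depends only on per-document term frequencies and the length ratio, so no corpus-wide statistics are needed at query time.

```go
package main

import "fmt"

// Assumed parameters: the common BM25 defaults, not taken from this PR.
const (
	k1 = 1.2
	b  = 0.75
)

// termFrequencyScore is the saturating term-frequency part of BM25 for a
// single term. lengthRatio is documentLength / averageDocumentLength.
func termFrequencyScore(tf, lengthRatio float64) float64 {
	return tf * (k1 + 1) / (tf + k1*(1-b+b*lengthRatio))
}

// scoreWithoutIDF sums the term-frequency scores of all query terms.
// Classic BM25 would multiply each summand by the term's IDF; here the
// IDF factor is treated as the constant 1 and therefore dropped.
func scoreWithoutIDF(termFreqs map[string]int, lengthRatio float64) float64 {
	sum := 0.0
	for _, tf := range termFreqs {
		sum += termFrequencyScore(float64(tf), lengthRatio)
	}
	return sum
}

func main() {
	// Hypothetical per-document match counts for two query terms.
	freqs := map[string]int{"example": 14, "parser": 1}
	fmt.Printf("%.2f\n", scoreWithoutIDF(freqs, 1.0))
}
```

Because each term contributes independently of how common it is in the corpus, the same function works whether the query terms are combined with AND or with OR.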
Test plan:
updated tests