ChunkMatches: drop trailing newline #709
Fuzz test failed on commit dfff4eb. To troubleshoot locally, use the GitHub CLI to download the seed corpus with
Fuzz test failed on commit d12ed33. To troubleshoot locally, use the GitHub CLI to download the seed corpus with
Fuzz test failed on commit c840017. To troubleshoot locally, use the GitHub CLI to download the seed corpus with
c840017 to 6ac8e22
# Pinned a commit to make go version configurable.
# This should be safe to upgrade once this commit is in a released version:
# https://github.com/jidicula/go-fuzz-action/commit/23cc553941669144159507e2cccdbb4afc5b3076
- uses: jidicula/go-fuzz-action@0206b61afc603b665297621fa5e691b1447a5e57
  with:
    packages: 'github.com/sourcegraph/zoekt' # This is the package where the Protobuf round trip tests are defined
    fuzz-time: 30s
    fuzz-minimize-time: 1m
    go-version: '1.21'
Needed to update this so the build doesn't fail when using new Go 1.21 features.
// Trim the last newline before splitting because a trailing newline does not constitute a new line
lines := bytes.Split(bytes.TrimSuffix(content, []byte{'\n'}), []byte{'\n'})
e.g. `one\ntwo\nthree\n` is three lines, not four. Same as `one\ntwo\nthree`.
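To make the line counts in the comment above concrete, here is a minimal, self-contained sketch. The helper name `splitLines` is hypothetical and only wraps the one-line change from the diff; it is not a function in zoekt.

```go
package main

import (
	"bytes"
	"fmt"
)

// splitLines is a hypothetical helper that wraps the expression from the diff:
// it drops a single trailing newline before splitting on newlines.
func splitLines(content []byte) [][]byte {
	return bytes.Split(bytes.TrimSuffix(content, []byte{'\n'}), []byte{'\n'})
}

func main() {
	withTrailing := []byte("one\ntwo\nthree\n")
	withoutTrailing := []byte("one\ntwo\nthree")

	// Splitting without trimming yields an extra empty slice after the final newline.
	fmt.Println(len(bytes.Split(withTrailing, []byte{'\n'}))) // 4

	// Trimming first gives the same count for both inputs.
	fmt.Println(len(splitLines(withTrailing)))    // 3
	fmt.Println(len(splitLines(withoutTrailing))) // 3
}
```

Only one trailing newline is trimmed, so content ending in `\n\n` still yields a final empty line, which is the intended behaviour for a genuinely blank last line.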
Co-authored-by: Keegan Carruthers-Smith <[email protected]>
This updates our "line model" to fix the edge cases that led to sourcegraph/sourcegraph#60605. In short, it changes the definition of a "line" to include its terminating newline (if one exists). Before this, we had defined a "line" as starting at the byte after a newline (or the beginning of the file) and ending at the byte before a newline (or the end of the file). The problem with that definition is that a newline that is the last byte in the file can never be matched successfully: we would trim it from the returned content, so any range that matched that trailing newline would be out of bounds in the result returned to the client. That's the reason behind the panics caused by #709, which was an attempt to formalize the "line does not include a trailing newline" definition. So, instead, this redefines a line as ending at the byte after a newline (or the end of the file). This means that a regex can successfully and safely match a terminating newline.
I'd like to start taking advantage of the `NumContextLines` option, but I have been running into an issue: if context lines extend past the end of the file, we get the trailing newline at the end of the file. This is not desirable because the empty slice after a trailing newline is not treated as a "line" by any editor, so when we split on newlines, an empty line is shown to the user. By the time we're splitting on newlines, we do not know whether we're at the end of the file, so we cannot know whether to trim that trailing newline or whether it is, correctly, an empty line that should be rendered.
This is really annoying stuff that bites us elsewhere as well (searcher, syntax highlighter). It might be nice to unify the definitions of "what is a line" in some library, but for now, I'll be happy with "don't return an extra line."