Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ChunkMatches: drop trailing newline #709

Merged
merged 7 commits into from
Dec 11, 2023
Merged

Conversation

camdencheek
Copy link
Member

@camdencheek camdencheek commented Dec 8, 2023

I'd like to start taking advantage of the NumContextLines option, but have been running into an issue where, if context lines were to extend past the end of the file, we get the trailing newline at the end of the file.

This is not desirable because the empty slice after a trailing newline is not treated as a "line" by any editor, so when we split on newlines, an empty line is shown to the user. By the time we're splitting on newlines, we do not know whether or not we're at the end of the file, so we cannot know whether we can trim that trailing newline, or whether it is, correctly, an empty line that should be rendered.

This is really annoying stuff that bites us elsewhere as well (searcher, syntax highlighter). It might be nice to unify the definitions of "what is a line" in some library, but for now, I'll be happy with "don't return an extra line."

Copy link
Contributor

github-actions bot commented Dec 8, 2023

Fuzz test failed on commit dfff4eb. To troubleshoot locally, use the GitHub CLI to download the seed corpus with

gh run download 7146729606 -n testdata

Copy link
Contributor

github-actions bot commented Dec 8, 2023

Fuzz test failed on commit d12ed33. To troubleshoot locally, use the GitHub CLI to download the seed corpus with

gh run download 7146864038 -n testdata

Copy link
Contributor

github-actions bot commented Dec 8, 2023

Fuzz test failed on commit c840017. To troubleshoot locally, use the GitHub CLI to download the seed corpus with

gh run download 7146890000 -n testdata

@camdencheek camdencheek force-pushed the cc/drop-trailing-newline branch from c840017 to 6ac8e22 Compare December 8, 2023 22:24
Comment on lines +26 to +34
# Pinned a commit to make go version configurable.
# This should be safe to upgrade once this commit is in a released version:
# https://github.com/jidicula/go-fuzz-action/commit/23cc553941669144159507e2cccdbb4afc5b3076
- uses: jidicula/go-fuzz-action@0206b61afc603b665297621fa5e691b1447a5e57
with:
packages: 'github.com/sourcegraph/zoekt' # This is the package where the Protobuf round trip tests are defined
fuzz-time: 30s
fuzz-minimize-time: 1m
go-version: '1.21'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed to update this so the build doesn't fail when using new go 1.21 features.

Comment on lines +35 to +36
// Trim the last newline before splitting because a trailing newline does not constitute a new line
lines := bytes.Split(bytes.TrimSuffix(content, []byte{'\n'}), []byte{'\n'})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. one\ntwo\nthree\n is three lines, not four. Same as one\ntwo\nthree

@camdencheek camdencheek marked this pull request as ready for review December 8, 2023 22:42
@camdencheek camdencheek requested a review from a team December 8, 2023 22:42
@isker
Copy link
Contributor

isker commented Dec 10, 2023

I endorse this change 🌞:
https://github.com/isker/neogrok/blob/01e04579bb89a358ea8a015b6c31c46558045a31/src/lib/server/content-parser.ts#L134-L143

contentprovider.go Outdated Show resolved Hide resolved
Co-authored-by: Keegan Carruthers-Smith <[email protected]>
@camdencheek camdencheek merged commit e92f6c7 into main Dec 11, 2023
8 checks passed
@camdencheek camdencheek deleted the cc/drop-trailing-newline branch December 11, 2023 16:02
camdencheek added a commit that referenced this pull request Apr 18, 2024
The updates our "line model" to fix the edge cases that led to sourcegraph/sourcegraph#60605. In short, this changes the definition of a "line" to include its terminating newline (if it exists).

Before this, we had defined a "line" as starting at the byte after a newline (or the beginning of a file) and ending at the byte before a newline (or the end of the file).

The problem with that definition is that a newline that is the last byte in the file can never successfully be matched because we would trim that from the returned content, so any ranges that would match that trailing newline would be out of bounds in the result returned to the client. That's the reason behind the panics caused by #709, which was an attempt to formalize the "line does not include a trailing newline" definition.

So, instead, this redefines a line as ending at the byte after a newline (or the end of the file). This means that a regex can successfully and safely match a terminating newline.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants