ChunkMatches: drop trailing newline #709

camdencheek · 2023-12-08T22:04:06Z

I'd like to start taking advantage of the NumContextLines option, but have been running into an issue where, if context lines were to extend past the end of the file, we get the trailing newline at the end of the file.

This is not desirable because the empty slice after a trailing newline is not treated as a "line" by any editor, so when we split on newlines, an empty line is shown to the user. By the time we're splitting on newlines, we do not know whether or not we're at the end of the file, so we cannot know whether we can trim that trailing newline, or whether it is, correctly, an empty line that should be rendered.

This is really annoying stuff that bites us elsewhere as well (searcher, syntax highlighter). It might be nice to unify the definitions of "what is a line" in some library, but for now, I'll be happy with "don't return an extra line."

github-actions · 2023-12-08T22:05:14Z

Fuzz test failed on commit dfff4eb. To troubleshoot locally, use the GitHub CLI to download the seed corpus with

gh run download 7146729606 -n testdata

github-actions · 2023-12-08T22:21:09Z

Fuzz test failed on commit d12ed33. To troubleshoot locally, use the GitHub CLI to download the seed corpus with

gh run download 7146864038 -n testdata

github-actions · 2023-12-08T22:24:12Z

Fuzz test failed on commit c840017. To troubleshoot locally, use the GitHub CLI to download the seed corpus with

gh run download 7146890000 -n testdata

camdencheek · 2023-12-08T22:40:29Z

.github/workflows/ci.yml

+      # Pinned a commit to make go version configurable.
+      # This should be safe to upgrade once this commit is in a released version:
+      # https://github.com/jidicula/go-fuzz-action/commit/23cc553941669144159507e2cccdbb4afc5b3076
+      - uses: jidicula/go-fuzz-action@0206b61afc603b665297621fa5e691b1447a5e57
        with:
          packages: 'github.com/sourcegraph/zoekt' # This is the package where the Protobuf round trip tests are defined
          fuzz-time: 30s
          fuzz-minimize-time: 1m
+          go-version: '1.21'


Needed to update this so the build doesn't fail when using new go 1.21 features.

camdencheek · 2023-12-08T22:41:06Z

contentprovider_test.go

+			// Trim the last newline before splitting because a trailing newline does not constitute a new line
+			lines := bytes.Split(bytes.TrimSuffix(content, []byte{'\n'}), []byte{'\n'})


e.g. "one\ntwo\nthree\n" is three lines, not four, the same as "one\ntwo\nthree".

isker · 2023-12-10T05:19:11Z

I endorse this change 🌞:
https://github.com/isker/neogrok/blob/01e04579bb89a358ea8a015b6c31c46558045a31/src/lib/server/content-parser.ts#L134-L143

contentprovider.go

Co-authored-by: Keegan Carruthers-Smith <[email protected]>

This updates our "line model" to fix the edge cases that led to sourcegraph/sourcegraph#60605. In short, it changes the definition of a "line" to include its terminating newline (if one exists). Before this, we had defined a "line" as starting at the byte after a newline (or the beginning of the file) and ending at the byte before a newline (or the end of the file). The problem with that definition is that a newline that is the last byte in the file can never be matched successfully: we would trim it from the returned content, so any range that matched that trailing newline would be out of bounds in the result returned to the client. That is the reason behind the panics caused by #709, which was an attempt to formalize the "line does not include a trailing newline" definition. So, instead, this redefines a line as ending at the byte after a newline (or the end of the file). This means that a regex can successfully and safely match a terminating newline.

drop trailing newline

dfff4eb

camdencheek added 2 commits December 8, 2023 15:19

add explanatory comment

76aafd6

update go version for fuzz testing

d12ed33

use pinned version

6ac8e22

camdencheek force-pushed the cc/drop-trailing-newline branch from c840017 to 6ac8e22 Compare December 8, 2023 22:24

camdencheek added 2 commits December 8, 2023 15:25

add explanation

5c2e4fc

update e2e_test

083cb67

camdencheek commented Dec 8, 2023

camdencheek marked this pull request as ready for review December 8, 2023 22:42

camdencheek requested a review from a team December 8, 2023 22:42

keegancsmith approved these changes Dec 11, 2023

Update contentprovider.go

cc6096f

Co-authored-by: Keegan Carruthers-Smith <[email protected]>

camdencheek merged commit e92f6c7 into main Dec 11, 2023
8 checks passed

camdencheek deleted the cc/drop-trailing-newline branch December 11, 2023 16:02

camdencheek mentioned this pull request Feb 24, 2024

search: panic in ChunkMatch.MatchedContent for fileContainsFilterJob sourcegraph/sourcegraph-public-snapshot#60605

Open

camdencheek mentioned this pull request Mar 18, 2024

Always include trailing newline #747

Merged
