
Improving test suite legibility #8

Open
Azeirah opened this issue Jan 29, 2025 · 3 comments

Azeirah (Owner) commented Jan 29, 2025

I've been working hard on a high-quality test suite! It's already getting somewhere and is giving good results, but it could be so, so, so much better!

Testing documentation

There's already a pretty good start for the testing documentation in testing.md

What makes a good test-suite?

When I think of automated testing in the context of ReMarkable and remarks, I think of two things: running automated tests to detect issues, and generating a visual report that humans can inspect.

The test suite as it stands right now

The test suite has three major parts:

  1. Datatests - Where I run remarks on a large set of remarkable notebooks from real customers, to check for the most important issues
  2. PDF output testing - Where PDF outputs are tested against various assertions
  3. Markdown output testing - Where the Markdown output is tested against various assertions

Datatesting

This is proprietary and sensitive, because it tests on actual user data. I cannot and will not visually inspect the output data. But we can test whether we get crashes, we can look at warnings, we can look at execution time. This is what datatest.py does.

The most important output that datatest.py provides is the overview metric:

[Image: the overview metric reported by datatest.py]

It also generates a simple, inspectable SQLite database with stdout and stderr output as well as execution time. This is useful for finding common errors.
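
To make the shape of that data concrete, here is a minimal sketch of how one run could be recorded. The table layout, column names and the remarks CLI invocation are assumptions for illustration, not the actual datatest.py implementation.

import sqlite3
import subprocess
import time

def record_run(db_path: str, notebook_path: str) -> None:
    # Hypothetical schema: one row per notebook run with its exit code,
    # captured output and wall-clock time.
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS runs "
        "(notebook TEXT, returncode INTEGER, stdout TEXT, stderr TEXT, seconds REAL)"
    )
    start = time.monotonic()
    # The exact remarks invocation is an assumption.
    proc = subprocess.run(
        ["python", "-m", "remarks", notebook_path, "out/"],
        capture_output=True, text=True,
    )
    con.execute(
        "INSERT INTO runs VALUES (?, ?, ?, ?, ?)",
        (notebook_path, proc.returncode, proc.stdout, proc.stderr, time.monotonic() - start),
    )
    con.commit()
    con.close()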

I don't think I am looking to change anything about datatesting for now. It works well enough for what it is, and it runs on sensitive data, which I'm not a fan of.

PDF output testing

This testing happens in test_initial.py

This is the most important output test set. The set-up requires quite a lot:

  1. An input remarkable notebook (supports either .rmn archives OR the legacy format of a directory)
  2. Notebook metadata (see below)
  3. A pytest test function

Let's take a look at the notebook metadata for a notebook called "gosper_notebook"

# A metadata object MUST be entirely hand-crafted and hand-checked
gosper_notebook = {
    "description": """
        The "gosper" notebook is a notebook with 3 pages with some notes on the
        Gosper curve and Lindenmayer systems.
        It was made on the ReMarkable 2.
        
        It is not an ebook, has no background, no layers and all pages are the default sizes.
        Everything in the notebook is written in black.
    """,
    # ReMarkable document name
    "notebook_name": "Gosper",
    # Where the ReMarkable document can be found
    ".rmn_source": "tests/in/v2_notebook_complex",
    "notebook_type": ReMarkableNotebookType.NOTEBOOK,
    # The amount of pages that are coming from a source PDF
    "pdf_pages": 0,
    ".rm_files": [
        {
            ".rm_file_version": ReMarkableAnnotationsFileHeaderVersion.V3,
            "output_document_position": 0
        }, {
            ".rm_file_version": ReMarkableAnnotationsFileHeaderVersion.V3,
            "output_document_position": 1
        }, {
            ".rm_file_version": ReMarkableAnnotationsFileHeaderVersion.V3,
            "output_document_position": 2
        }
    ],
    "export_properties": {
        "merged_pages": 3
    }
}

This is just a work-in-progress metadata object. The associated test case looks like this:

@with_remarks(gosper_notebook['.rmn_source'])
def test_pdf_output():
    gosper_rmc = fitz.open(f"tests/out/{gosper_notebook['notebook_name']} _remarks.pdf")
    assert is_valid_pdf(gosper_rmc)
    assert gosper_rmc.page_count == gosper_notebook["export_properties"]["merged_pages"]

    # There should be a warning, since v3 is not yet supported by the rmc-renderer
    assert_scrybble_warning_appears_on_page(gosper_rmc, gosper_notebook['.rm_files'][0]['output_document_position'])
    assert_scrybble_warning_appears_on_page(gosper_rmc, gosper_notebook['.rm_files'][1]['output_document_position'])
    assert_scrybble_warning_appears_on_page(gosper_rmc, gosper_notebook['.rm_files'][2]['output_document_position'])

Now, when you run the test suite, you will get red/green on these assertions as well as for the test itself.
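
As an aside, the three warning assertions could also be driven directly from the metadata. A small sketch, not how test_initial.py is written today:

# Sketch: derive the warning assertions from the metadata instead of repeating them.
# Every .rm file in this notebook is V3, so every corresponding output page
# should carry the Scrybble warning.
for rm_file in gosper_notebook[".rm_files"]:
    assert_scrybble_warning_appears_on_page(gosper_rmc, rm_file["output_document_position"])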

In my personal experience, this isn't enough context. It's nice, but it's not enough. Most of what happens in this project is visual, and you need to be able to inspect what is happening from start to finish. If you get an error, you then have to manually trace:

  • "Wait, what was that file again?"
  • "Umm.. where does this come from?"
  • "Why does this error happen only for this document, and not for the other one?"

I believe this is a failure in providing the right context.

Markdown output testing

The markdown output testing is actually quite interesting, and luckily quite easy to work with. Assertions on text are almost always binary, which means most of the tests have very clear yes-or-no results.

For these tests I use parsita by Dr. Hagen, an amazing parser combinator library. (Parser combinators are basically "what if regex was composable like Lego bricks?")

The entirety of the parsers looks like this:

r"""
 __  __            _       _                     
|  \/  |          | |     | |                    
| \  / | __ _ _ __| | ____| | _____      ___ __  
| |\/| |/ _` | '__| |/ / _` |/ _ \ \ /\ / / '_ \ 
| |  | | (_| | |  |   < (_| | (_) \ V  V /| | | |
|_|  |_|\__,_|_|  |_|\_\__,_|\___/ \_/\_/ |_| |_|

Lessons about parsita.

1. When invoking a parser, you _must_ consume all the tokens until the EOD or you will get a failure
   You can do this with 
   `{...} << whatever`
2. When you want to extract _one_ value out of a big text. You can say the following:
   parser_that_must_exist_around_it >> parser_that_follows >> another_parser << the_parser_you_care_about >> after_the_parser_you_care_about
   So:
   `{...} >> yes << whatever` => `Success<yes>`
3. Lambdas are evil. Do not use lambdas to create abstractions.
   While it may seem attractive to write a lambda to express a common pattern, this is not a good idea.
   The operators in parsita have specific meaning, and parsita is a language expressed with operators.
   When you write a function, the result of the operator is lost.
"""

any_char = reg(r'.') | lit("\n")
whatever = rep(any_char)
newline = lit('\n')

to_newline = reg(r'[^\n]+')

obsidian_tag = reg(r"#([a-z/])+")
frontmatter = opt(
    lit('---') >> newline >>
    lit("tags") >> lit(":\n") >> lit("- ") >> lit("'") >> obsidian_tag << lit("'") << rep(newline) <<
    lit("---") << rep(newline)
)
autogeneration_warning = lit("""> [!WARNING] **Do not modify** this file
> This file is automatically generated by Scrybble and will be overwritten whenever this file in synchronized.
> Treat it as a reference.""")
h1_tag = lit("# ")
h2_tag = lit("## ")
h3_tag = lit("### ")
h4_tag = lit("#### ")
h5_tag = lit("##### ")
h6_tag = lit("###### ")

With these parsers, it is both possible and easy to test whether a particular piece of text is in the right place in a document. Or to check ordering. Or to check that something is explicitly not there, etc. It feels like "speaking" parser (thanks, Dr. Hagen, parser combinator library maker! This library is amazing!)
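
For instance, an assertion built from the parsers above could look roughly like this. The output path is hypothetical and the exact composition would need checking against a real export, so treat it as a sketch:

from parsita import Success

def test_warning_follows_frontmatter():
    markdown = open("tests/out/Gosper.md").read()  # hypothetical output path

    # Optional frontmatter first, then the autogeneration warning; everything
    # after the warning is consumed and ignored.
    result = (frontmatter >> autogeneration_warning << whatever).parse(markdown)
    assert isinstance(result, Success)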

Here, there is again a similar problem of missing context. When you're running the tests and something fails, you still need to jump back to the source document. It is less problematic here because the results are much more binary: the text is either correct, or it is not. With the PDF tests there are a lot more details to take into consideration.

Problems

Given what I have written here so far, I think the test suite currently shines in two ways:

  1. Size: there is very little testing code for a lot of gain
  2. Simplicity: it is very easy to run
  • Step 1: nix develop
  • Step 2: pytest

The biggest issue with the tests is what I've been calling a "context" issue. A lot of what the ReMarkable does is visual. You can write code and write tests all you want, but the only thing that really matters in the end is whether what you see on the device is "the same" as the output PDF and output Markdown.

It is often difficult to find the source files, as well as information about the source files, because they're scattered in the repository. Sometimes we don't even have pictures of the ReMarkable tablet at all! That in particular is the biggest problem.

I've been trying to remedy this issue by creating what I call "metadata objects", i.e. the gosper_notebook metadata object shown above under PDF output testing.

But this requires you to go back and forth between all kinds of files, metadata and such. It's a mess! It also forces you to interpret the metadata. Some of it is self-evident, but a lot of it isn't. Just because I know what I mean by "pdf_pages" doesn't mean someone else does (for example, should a ReMarkable quicksheets notebook have a "pdf_pages" value of 0, or something else?)
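
One possible step in that direction is giving the metadata an explicit type, so every field carries its own documentation in one place. A sketch, assuming hypothetical class names (the suite currently uses plain dicts; the enums come from the existing test code):

from dataclasses import dataclass, field

@dataclass
class RMFile:
    rm_file_version: ReMarkableAnnotationsFileHeaderVersion
    # Zero-based page index this .rm file ends up at in the exported PDF
    output_document_position: int

@dataclass
class NotebookMetadata:
    description: str
    # ReMarkable document name
    notebook_name: str
    # Where the ReMarkable document can be found (.rmn archive or legacy directory)
    rmn_source: str
    notebook_type: ReMarkableNotebookType
    # Number of pages coming from a source PDF; the quicksheet question above
    # would need to be answered here, once, for everyone.
    pdf_pages: int
    rm_files: list[RMFile] = field(default_factory=list)
    # Expected page count of the merged output PDF
    merged_pages: int = 0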

TL;DR

The test suite is already well underway, and this is super important for the Scrybble workflow to work well :)

The primary issue with the current test-suite is having to jump between code, input files, metadata about the notebook, pictures of the notebook and output files all the time. This is annoying.

I'm looking for a way to streamline this process. I'm thinking of something in the vein of Donald Knuth's literate programming, or perhaps moldable development?

Ideas welcome!

Azeirah (Owner) commented Jan 29, 2025

Ideas

This is a placeholder for ideas.

  • Literate programming Donald Knuth-style (all context in one place)
  • Add CI? Probably not a bad idea in general
  • Create a slightly more moldable development environment with Nix.

Azeirah mentioned this issue on Feb 3, 2025
Azeirah (Owner) commented Feb 4, 2025

Idea: Tests as a source-of-truth for functionality

I was thinking about something somewhat unconventional: adding known-failure test cases, so that we can measure progress over time.

There are still quite a few open issues for things that remarks doesn't handle well, like:

  • Color output
  • Per-page tags
  • Extract text written with the folio
  • Smart highlights are missing
  • Probably more

It would be really nice if the test suite could distinguish ordinary passing tests from

"We know that X is not implemented yet, but we automatically test against it"

That's some hardcore TDD right there!

From this we can also easily derive documentation for the site, for example the roadmap and the FAQ.

The test suite then becomes a sort of queryable database for the frontend :)
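
pytest can already express most of this with the xfail marker: the test runs on every invocation, is reported as xfail while the feature is missing, and (with strict=True) turns into a hard failure the moment it unexpectedly starts passing. A sketch with a hypothetical notebook and assertion:

import pytest

@pytest.mark.xfail(reason="Smart highlights are not exported yet", strict=True)
def test_smart_highlights_appear_in_markdown():
    # Hypothetical input/output; the point here is the marker, not the assertion.
    markdown = open("tests/out/Highlighted book.md").read()
    assert "Some highlighted sentence" in markdown

The xfail/xpass outcomes show up in the test summary and in machine-readable reports (e.g. --junitxml), which is roughly the "queryable database" shape the frontend would need.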

Azeirah (Owner) commented Feb 4, 2025

Idea: Generate small self-contained markdown reports to highlight outputs

I think it would be nice to create a folder with markdown reports that highlight how an input document looks as output. This would again be great as a source-of-truth for documentation on the site.

For example, I think it would be trivial to generate a markdown document like the one below; see #13 (comment) for an example of how the images are sourced. That part is already implemented.

# Remarks {version} - {date}

## On computable numbers, page 1

{HERE IS A PHOTO OF HOW IT LOOKS ON THE REMARKABLE PAPER PRO}

{HERE IS THE SCREENSHOT OF THE PDF GENERATED BY REMARKS OF THE SAME PAGE}

## On computable numbers, page 2

{HERE IS A PHOTO OF HOW IT LOOKS ON THE REMARKABLE PAPER PRO}

{HERE IS THE SCREENSHOT OF THE PDF GENERATED BY REMARKS OF THE SAME PAGE}

## {...Another interesting page}

{HERE IS A PHOTO OF HOW IT LOOKS ON THE REMARKABLE PAPER PRO}

{HERE IS THE SCREENSHOT OF THE PDF GENERATED BY REMARKS OF THE SAME PAGE}

This way, I'd never have to manually update the site's documentation if it is linked to the output of the test suite.
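
A sketch of what the generator side could look like, using PyMuPDF (fitz, already used by the PDF tests) to render the output pages. The paths, version string and page selection are placeholders:

import fitz  # PyMuPDF

def write_report(pdf_path: str, photos: dict[int, str], report_path: str,
                 version: str, date: str) -> None:
    doc = fitz.open(pdf_path)
    lines = [f"# Remarks {version} - {date}", ""]
    for page_number, photo_path in photos.items():
        # Render the remarks output page to a PNG next to the report.
        screenshot_path = f"report/page_{page_number}.png"
        doc[page_number].get_pixmap(dpi=150).save(screenshot_path)
        lines += [
            f"## {doc.name}, page {page_number + 1}",
            "",
            f"![photo of the device]({photo_path})",
            "",
            f"![remarks output]({screenshot_path})",
            "",
        ]
    with open(report_path, "w") as f:
        f.write("\n".join(lines))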
