python.gram Rust/C++/Borgo compatible match syntax: parse without delimiters #107

adsharma · 2024-12-12T18:38:20Z

If I remove the endpmatch delimiter from both the test case and the grammar, the parser fails.
I ran parser.py -vv test3.py to understand why it fails, but the cause still isn't clear from the output.

Is there a solution to this problem such as:

Reordering rules
Adding more invalid_foo rules

I would really like to avoid using endpmatch.


adsharma · 2024-12-14T21:57:27Z

Replacing endpmatch with a semicolon works. However, I'd like to avoid these delimiters in the interest of keeping things clean.

Here's the fundamental issue:

def foo():
    a = 10
    return a

works fine. However,

def foo():
    a = pmatch ...
    return a

fails to parse because there is a DEDENT at the end of match_expr, which eats up multiple indentations and multiple newlines.

So after parsing a = pmatch ... as a simple_stmt or a statement, the parser is not able to proceed to consume return a as another statement.
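To make the DEDENT behavior concrete, here is a minimal sketch using the standard library tokenize module. It is an assumption that this matches the tokenizer the generated parser actually uses, and the snippet uses a plain if block rather than pmatch, but it shows the general pattern: when the source drops back several indentation levels at once, the tokenizer emits one NEWLINE followed by a run of DEDENT tokens.

```python
import io
import tokenize

# Illustrative source only: two nested blocks that both end on the same line.
SOURCE = (
    "def foo():\n"
    "    if x:\n"
    "        a = 1\n"
    "b = 2\n"
)


def token_names(source):
    """Return the token type names produced by the stdlib tokenizer."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]


names = token_names(SOURCE)
# Closing both blocks at once yields two consecutive DEDENT tokens
# right after the NEWLINE that ends "a = 1".
print(names)
```

A grammar rule that ends with DEDENT therefore has to account for the fact that the NEWLINE was already consumed before the DEDENT tokens arrive.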

How can I change the grammar so that INDENT means newline + n spaces and DEDENT means newline - n spaces? In this example n=4.

If it's not possible because this is baked deeply into python grammar, I could give up and accept the least intrusive delimiter and move on.

adsharma · 2025-01-24T20:28:46Z

I spent some time looking into how DEDENT is handled in the tokenizer. My reading of the code is that you're using Python's C tokenizer and that this behavior comes from there. I understand the motivation is to maximize compatibility.

Does it make sense to have another tokenizer that handles DEDENT explicitly, for cases where fine-grained control matters more than compatibility?

adsharma · 2025-01-25T19:18:55Z

https://github.com/adsharma/python-grammar/blob/main/tokenizer.py

This is a pure Python implementation of the tokenizer. I haven't tested it with the generated parser yet, but it serves as a useful starting point for discussing how the tokenizer should handle DEDENT.
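For discussion purposes, the usual indentation-stack approach can be sketched in a few lines. This is not the code in the linked tokenizer.py; the function name indent_events and the event tuples are hypothetical, and the sketch assumes space-only indentation and ignores brackets, comments, and line continuations. Each INDENT or DEDENT event carries the resulting column width, so a consumer can see exactly how many levels were opened or closed.

```python
def indent_events(source):
    """Yield (kind, value) events: INDENT/DEDENT carry a width, LINE carries text.

    Hypothetical helper for illustration; assumes indentation uses spaces only.
    """
    stack = [0]  # indentation widths of the currently open blocks
    for line in source.splitlines():
        stripped = line.lstrip(" ")
        if not stripped:
            continue  # blank lines do not change the indentation level
        width = len(line) - len(stripped)
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT", width)
        else:
            # Close one block per stack entry that is deeper than this line.
            while width < stack[-1]:
                stack.pop()
                yield ("DEDENT", stack[-1])
            if width != stack[-1]:
                raise IndentationError("dedent does not match any outer level")
        yield ("LINE", stripped)
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", stack[-1])


events = list(indent_events("def foo():\n    a = 10\n    return a\n"))
print(events)
```

Because each DEDENT is a separate event with its own target width, a parser built on a tokenizer like this could decide per level whether a newline has been consumed, which is the kind of control asked about above.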

adsharma mentioned this issue Jan 24, 2025

PEG Parser? evhub/coconut#862

Open

adsharma mentioned this issue Jan 24, 2025

Add test_dedent #108

Open
