Identifying unexpected token error positions #7
Comments
We inject invalid rules and we have a mechanism in the C parser to abort the backtracking. For instance, check: https://github.com/python/cpython/blob/master/Grammar/python.gram#L106
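To illustrate the idea of an error rule that aborts backtracking, here is a minimal Python sketch. It is not the C parser's code: the `ToyParser` class, the `call_invalid_rules` flag, the `invalid_assignment` rule, and the two-pass driver are all assumptions made up for this example. The point is only that an "invalid" alternative can raise immediately instead of returning a failure, so the parser never backtracks past it.

```python
class ToyParser:
    """Toy backtracking parser over a list of string tokens (illustrative only)."""

    def __init__(self, tokens, call_invalid_rules=False):
        self.tokens = tokens
        self.pos = 0
        # Hypothetical flag: error rules are only tried when this is True.
        self.call_invalid_rules = call_invalid_rules

    def expect(self, text):
        if self.pos < len(self.tokens) and self.tokens[self.pos] == text:
            self.pos += 1
            return True
        return False

    def name(self):
        if self.pos < len(self.tokens) and self.tokens[self.pos].isidentifier():
            self.pos += 1
            return self.tokens[self.pos - 1]
        return None

    def assignment(self):
        """assignment: NAME '=' NAME | invalid_assignment"""
        start = self.pos
        target = self.name()
        if target and self.expect("="):
            value = self.name()
            if value:
                return (target, value)
        self.pos = start  # ordinary failure: backtrack
        if self.call_invalid_rules:
            self.invalid_assignment()
        return None

    def invalid_assignment(self):
        """invalid_assignment: NAME '=' '='  (raises instead of failing)"""
        start = self.pos
        if self.name() and self.expect("=") and self.expect("="):
            # Raising here aborts all backtracking at once.
            raise SyntaxError(f"unexpected '=' at token index {self.pos - 1}")
        self.pos = start


def parse(tokens):
    # First pass: plain grammar, no error rules.
    result = ToyParser(tokens).assignment()
    if result is not None:
        return result
    # Second pass: same input, error rules enabled to get a better message.
    ToyParser(tokens, call_invalid_rules=True).assignment()
    raise SyntaxError("invalid syntax")  # no error rule matched
```

With this sketch, `parse(["a", "=", "b"])` succeeds, `parse(["a", "=", "="])` raises the specific message from the error rule, and input that no error rule recognises falls through to the generic "invalid syntax".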
That is a tokenizer error: the tokenizer has reached the end of the source while expecting more tokens. This error has been improved in Python 3.10 (when it is possible to retain the source, like when using "-c"):
Without special error rules you can still do a decent job. Just make the error point at the last token read (assuming your tokenizer is “lazy”, i.e. it only tokenizes as far as needed by the parser). In most cases this gives adequate errors.
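A minimal sketch of that approach, assuming a lazy tokenizer built on the standard `tokenize` module. The `LazyTokenizer` class, its method names, and the toy `parse_sum` grammar are hypothetical and written for this example; they are not pegen's actual code. Because tokens are only fetched on demand, the last token in the buffer is the farthest one any rule looked at, even after the parser has backtracked to the start.

```python
import io
import tokenize


class LazyTokenizer:
    """Fetch tokens from the underlying generator only when asked."""

    def __init__(self, source):
        self._gen = tokenize.generate_tokens(io.StringIO(source).readline)
        self._tokens = []  # every token fetched so far
        self._pos = 0      # current parser position in _tokens

    def mark(self):
        return self._pos

    def reset(self, pos):
        self._pos = pos

    def peek(self):
        while self._pos >= len(self._tokens):
            tok = next(self._gen)
            if tok.type in (tokenize.NL, tokenize.COMMENT):
                continue  # not significant for parsing
            self._tokens.append(tok)
        return self._tokens[self._pos]

    def getnext(self):
        tok = self.peek()
        self._pos += 1
        return tok

    def last_token_read(self):
        # Farthest token ever fetched, regardless of later backtracking.
        return self._tokens[-1]


def parse_sum(tok):
    """start: NUMBER ('+' NUMBER)* NEWLINE. Returns the sum, or None."""
    start = tok.mark()
    first = tok.getnext()
    if first.type != tokenize.NUMBER:
        tok.reset(start)
        return None
    total = int(first.string)
    while tok.peek().string == "+":
        tok.getnext()
        operand = tok.getnext()
        if operand.type != tokenize.NUMBER:
            tok.reset(start)  # backtrack all the way
            return None
        total += int(operand.string)
    if tok.getnext().type != tokenize.NEWLINE:
        tok.reset(start)
        return None
    return total


def parse(source):
    tok = LazyTokenizer(source)
    result = parse_sum(tok)
    if result is None:
        bad = tok.last_token_read()
        line, col = bad.start
        raise SyntaxError(
            f"invalid syntax at line {line}, column {col + 1}: {bad.string!r}"
        )
    return result
```

For the input `"1 + * 2\n"` the parser backtracks to position 0, but the tokenizer never fetched anything past the `*`, so the error points there. A tokenizer that eagerly tokenized the whole input would lose this information and would need to track the farthest position explicitly.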
@pablogsal, I was rather looking for information on how the parser handles cases where there isn't a special error rule.
@gvanrossum Thanks, that works. Earlier I found the part of the code in the repo here that is doing what you say, and I just finished getting it to work for my parser - seems to do the job! I really enjoyed reading the blogs BTW. If I hadn't found them I'd still be hacking away at my project's old LALR(1) parser.
It is not technically "fixed" but "improved". Notice the old error is still correct: there was an unexpected end-of-file token while parsing. With our parser, it is not always easy to emit the improved version because we don't have all the text that we parsed (for instance, when reading from stdin).
I hope it's ok to ask a general question about pegen here. I've built a parser following the blog posts and code here, and it generally works really nicely, but one thing I found missing in the blog is how to handle unexpected tokens. As far as I understand, the recursive descent parser will continue to backtrack on unexpected input until it reaches the first rule again, unless you define an explicit rule to handle particular errors. I assume there is some strategy to identify the token that caused the error, like how Python's parser knows the error in the following line is the "*":
How/where is pegen handling this sort of error? Or, if pegen doesn't handle this error, where is Python's parser handling it (since I know it does!)?
Aside: while trying a different invalid Python syntax example I got something unexpected:
With 3.9.2 this gives the error
which seems not to show the line with the error (although it is marking it). The same behaviour happens when "hi(" is in a file. Is this a bug in the error handling of Python's new parser (and if so, should I report it)?