Replies: 18 comments 9 replies
-
I like the suggestion. A similar topic was raised some time ago in #941, but your structure looks more thought-out.
-
Could you please describe the detailed transform for all grammars in the repository? I'll suggest fixes if required.
-
@KOLANICH I've resisted changes like this for quite a while. However, with the number of grammars there are now, I think it might be time. I like the structure you've proposed. One reason I've been hesitant to accept a change like this is that I worry it will become a barrier to people finding the grammar they're looking for. Could an index of grammars be generated and published as part of this?
-
The problem with any index is that it has to be updated. It can be automated, though.
-
I think the first thing to do is to propose the new directory structure and say where each grammar that currently resides here would be moved. We don't have a "C++" directory but "cpp", and we don't have a Bash grammar at all. I'm sure there will be several grammars that fit into multiple categories. For example, I have grammars for many parser generator systems, including tree-sitter, whose grammar is a JSON-structured document that represents a context-free grammar. What would these all fall under?
I worry that unless there is an index, I won't be able to find a grammar. As @teverett suggests, perhaps what we should have is a generated index page where one would enter search terms. And if I'm working on a particular grammar, I can set up an alias to combine a
Note, the only other realistic grammar database that I know of is Grammar Zoo (index page for the repo). The GitHub repository for this website is https://github.com/slebok/zoo. You can peruse that repo and see how Zaytsev (https://grammarware.net/) organized it. Each grammar is described by a meta file (zoo.xml) containing the author, date written, how it was written (e.g., "scraped"), source, DOI for papers, etc. See this example: https://github.com/slebok/zoo/blob/master/zoo/ada/ada83/ichbiah/zoo.xml. The meta file could contain searchable terms, which would be a way of generating the index page.
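As a rough illustration of pulling searchable terms out of such a meta file, here is a minimal Python sketch. It assumes a record shaped like the `<source>` fragment of the Ada example; the real zoo.xml schema may hold more elements, and the function name `search_terms` is made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Sample record reusing fields from the Ada example; an assumption about
# the overall shape, not the full zoo.xml schema.
SAMPLE = """
<source>
  <author>Jean D. Ichbiah</author>
  <title>Preliminary Ada reference manual; Syntax Summary</title>
  <date>June 1979</date>
  <link><doi>10.1145/956650.956651</doi></link>
</source>
"""


def search_terms(xml_text):
    """Return lowercase search terms taken from author, title and date."""
    root = ET.fromstring(xml_text.strip())
    terms = set()
    for tag in ("author", "title", "date"):
        text = root.findtext(tag, default="")
        # Treat semicolons as separators so "manual;" becomes "manual".
        for word in text.replace(";", " ").split():
            terms.add(word.lower())
    return terms


terms = search_terms(SAMPLE)
print(sorted(terms))
```

An index page generator could union these term sets per grammar and map each term back to the grammar's directory.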
-
Thanks a lot for letting me know about this project. In fact, I didn't know about Zaytsev's work and have created something similar (well, not really "created"; it is very immature): my own DSL, meant to be transpiled into the DSLs of as many different parser generators as possible and to keep working after transpilation. A wrapper is also generated so that the built AST can be used uniformly. My main motivation for this proposal was to have the grammars structured, so that I don't go mad when porting your grammars into my DSL.
It seems that its organization relies more on XML files than on directory structure; at least https://github.com/slebok/zoo/tree/master/zoo looks like a pile similar to the one we see in this repo. The hierarchy I propose for this repo is influenced more by the one we use in https://github.com/kaitai-io/kaitai_struct_formats/ (I'm a contributor to that repo).
Fortunately, one can enter search terms into GitHub search, and it works without JavaScript, but to be honest, I dislike the ranking: https://github.com/antlr/grammars-v4/search?q=json&type=code&l=ANTLR doesn't show the JSON grammars at the top.
-
<source>
<author>Jean D. Ichbiah</author>
<title>Preliminary Ada reference manual; Syntax Summary</title>
<subtitle>ACM SIGPLAN Notices, Volume 14 Issue 6a</subtitle>
<date>June 1979</date>
<specific>pages E-1 to E-5 (142-146)</specific>
<link>
<doi>10.1145/956650.956651</doi>
</link>
</source>
In UG and KS we inline this kind of metadata into grammars themselves under a
-
The .g4 files can have comments (block
-
I have to say, as a retired long-time programmer, I don't find the proposed organization any better than the flat model we currently have. One person's obvious hierarchy is another person's chaos. I think we'd be far better off with a structured metadata file in each grammar's root directory, and an automatically-recreated index file in the repo's root based on those files.
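For illustration only, the per-grammar metadata idea could be sketched in Python roughly as follows. The file name `grammar-meta.json`, its `name` field, and the sample directories are all assumptions made up for this sketch; nothing like them exists in the repo today.

```python
import json
import tempfile
from pathlib import Path

META_NAME = "grammar-meta.json"  # hypothetical file name, not a repo convention


def build_index(repo_root):
    """Collect every metadata file under repo_root into a list sorted by path."""
    repo_root = Path(repo_root)
    entries = []
    for meta_path in repo_root.rglob(META_NAME):
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        # Record the grammar directory relative to the repo root.
        meta["path"] = meta_path.parent.relative_to(repo_root).as_posix()
        entries.append(meta)
    return sorted(entries, key=lambda e: e["path"])


def demo():
    """Build an index over a throwaway tree so the sketch runs anywhere."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel, name in [("json", "JSON"), ("asm/demo", "Demo assembler")]:
            (root / rel).mkdir(parents=True)
            (root / rel / META_NAME).write_text(
                json.dumps({"name": name}), encoding="utf-8"
            )
        return build_index(root)


index = demo()
print(json.dumps(index, indent=2))
```

Because the index is rebuilt from the files on every run, it cannot drift from the directory layout, whichever layout is chosen.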
-
I have a preliminary PR for GitHub Actions to generate an index file (grammars.json for now) that calls @parrt's _script/mkindex.py script. It's only kicked off on a push to "master", not on PRs. The file contains everything lab.antlr.org needs for selecting a grammar and input file. Right now, it's a control that offers a flat view of all the grammars. You can try it out here. If we offer a structured view, we're going to need to be able to find a grammar easily with a search term, like "cpp" or "c++", as "one person's ... hierarchy is another person's chaos." I couldn't find anything without setting up some scripts. If we change the structure of the repo, the select control should probably also be redesigned to reflect the file system organization.
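A minimal sketch of how a lookup might treat "cpp" and "c++" as the same search term. The alias table and the entry shape (`name`, `keywords`) are assumptions for illustration; they are not the actual grammars.json schema.

```python
# Hypothetical alias table: maps what a user might type to a directory name.
ALIASES = {"c++": "cpp", "cplusplus": "cpp"}


def normalize(term):
    """Lowercase the term and replace it by its alias target, if any."""
    term = term.strip().lower()
    return ALIASES.get(term, term)


def find(index, term):
    """Return entries whose name or keywords match the normalized term."""
    wanted = normalize(term)
    return [
        entry
        for entry in index
        if wanted == entry["name"] or wanted in entry.get("keywords", [])
    ]


# Invented sample entries, only to exercise the lookup.
index = [
    {"name": "cpp", "keywords": ["cpp14"]},
    {"name": "json", "keywords": []},
]

print(find(index, "C++"))
print(find(index, "bash"))
```

With a table like this, the select control could stay flat or become hierarchical without changing how a search term resolves.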
-
My thought was to merge the generated markdown into the main repo readme.md, and then link to the appropriate readme files where they exist.
-
I just looked at all 228 README files. 200 (87%) of them are for grammars where the directory name in this repository is the proper noun for the thing being parsed. That's pretty good - it means that 62% of the 321 grammars in this repository are verifiably stored in the most likely place someone seeking them would look. Here is the list of all 28 grammars with READMEs that aren't in proper-noun directories. Some of them are still pretty obvious. Do we actually have a problem that is worth discussing and trying to solve?
-
I think the best way to implement this is to use GitHub Pages. A nice website can be implemented for the repo on a branch, say "gh-pages". Then, when the repo is updated with new grammars, the gh-pages branch is updated with the new information on grammars and "deployed" using GitHub Actions. The basic github.com/antlr/grammars-v4 view would stay as it is, but you could have a UI at https://antlr.github.io/grammars-v4/ that presents the grammars the way you want to organize them. Markdown itself doesn't have tables that can be sorted by column selection. You need JavaScript for that, but GitHub Pages allows you to do that.
-
@parrt that was my perspective too; fold it into the readme.
-
I'm liking Ken's idea to tag grammars with various classifications, but echoing @RossPatterson, maybe we don't actually have a real problem here. Have we gotten any feedback that suggests people can't find what they need just by digging around in the subdirectories?
-
Currently, there is no problem other than that some grammars are hidden under subdirectories such as /asm and /esolang. However, if we did refactor into numerous subdirectories, I'd rather provide an index than ask people to dig through the source tree.
-
I could be OK with an alphabetical table. While I don't support asking people to recurse through directories, a Ctrl-F search on readme.md seems reasonable?
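To make the alphabetical-table idea concrete, here is a small Python sketch that renders one as Markdown. The sample rows and the two-column layout are assumptions for illustration, not the output of any existing script.

```python
def markdown_table(entries):
    """Render (name, path) pairs as a Markdown table sorted by name."""
    lines = ["| Grammar | Directory |", "| --- | --- |"]
    # Sort case-insensitively so "ada" and "Ada" land in the same place.
    for name, path in sorted(entries, key=lambda entry: entry[0].lower()):
        lines.append(f"| {name} | [{path}]({path}) |")
    return "\n".join(lines)


# Illustrative rows only; a real run would take them from a generated index.
table = markdown_table([("JSON", "json"), ("C++", "cpp"), ("Ada", "ada")])
print(table)
```

Since every row carries the display name as plain text, a Ctrl-F for "C++" finds the cpp directory even though the directory name differs.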
-
It is a bit hard to navigate this repo when all the dirs are piled into the main dir.
It is proposed to reorganize it by introducing dirs with semantic names and moving the parsers' dirs into them.
The proposed dir hierarchy:
- config: config files and records.
- grammar: DSLs describing other grammars.
- text: grammars like the ones for tools like ANTLR.
- ddl: DSLs for describing binary grammars, like protobuf, flatbuffers, capnproto, FlexT and so on.
- programming: programming and scripting languages, like C++ or bash.
- programms: for parsing output of software, when it is infeasible to use a machine-readable interface.
- protocols: for interfacing with servers or devices, single command per line, such as SCPI, AT, JTAG consoles, SMTP, stuff like this.
- serialization: serialization languages, like JSON, YAML, protobuf and CSV.
- embedded: grammars used as parts of other formats that don't belong anywhere else.
- identifiers: various identifiers, like SSNs, phone numbers, VIN codes, UUIDs and so on.
- network: network addresses (IPv4, IPv6, MAC, IMEI).
- products: product numbers, like HTE721010A9E630.
The rest of the identifiers should stay in root until it is decided where they are to be moved.
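One way to review such a reorganization before touching the tree is to print the moves as a dry run. The Python sketch below assumes a hand-written mapping from existing directories to proposed categories; the two entries shown are examples drawn from the category descriptions, not a mapping anyone has agreed on.

```python
import shlex

# Hypothetical assignment of existing directories to proposed categories;
# the discussion has not fixed any such mapping.
PLAN = {"json": "serialization", "cpp": "programming"}


def move_commands(plan):
    """Return `git mv` commands for review; nothing is executed."""
    return [
        f"git mv {shlex.quote(src)} {shlex.quote(dest + '/' + src)}"
        for src, dest in sorted(plan.items())
    ]


commands = move_commands(PLAN)
for command in commands:
    print(command)
```

The category directories would have to exist before these commands are run, and directories missing from the mapping simply stay where they are.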