Line breaking rules #237
Those requirements can be generalized to:
(*) This rule applies, for example, to the Spanish use of inverted question and exclamation marks. It's easier to treat them as the anti-parallel counterpart of sentence-ending punctuation than to give the regular question and exclamation marks a dual nature by sometimes treating them as part of a pair. These rules can then be combined with those that govern whether and how words themselves can be broken.
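To make the footnote concrete, here is a minimal Python sketch of that approach. It assumes two hypothetical character sets, `STARTING_PUNCT` and `ENDING_PUNCT`, chosen purely for illustration (they are not taken from UAX #14, CLDR, or alreq). Inverted marks count as starting punctuation, which must not end a line, and sentence-ending marks count as ending punctuation, which must not start a line. It also simplifies by considering breaks only after spaces, ignoring word-internal breaking entirely.

```python
# Hypothetical, illustrative sets only; real requirements would have to
# specify these per language (see the discussion in this thread).
STARTING_PUNCT = frozenset("¿¡(«")     # must not appear at the end of a line
ENDING_PUNCT = frozenset("?!.,;:)»")   # must not appear at the start of a line


def break_allowed(before: str, after: str) -> bool:
    """Return True if a line may end with `before` and the next start with `after`."""
    if before in STARTING_PUNCT:
        return False  # a starting mark would be stranded at the end of a line
    if after in ENDING_PUNCT:
        return False  # an ending mark would be pushed to the start of a line
    return True


def break_opportunities(text: str) -> list[int]:
    """Return indices i at which a new line may begin (text[i] starts the line).

    Simplification: only positions directly after a space are considered,
    and the trailing spaces are skipped to find the last visible character.
    """
    result = []
    for i in range(1, len(text)):
        if text[i - 1] != " " or text[i] == " ":
            continue  # not the first non-space character after a space
        prev = text[:i].rstrip(" ")
        if prev and break_allowed(prev[-1], text[i]):
            result.append(i)
    return result
```

For example, `break_opportunities("Ella dijo ¿vienes ?")` returns `[5, 10]`: a line may start at "dijo" or at "¿vienes", but not at the "?" that follows a space. Treating "¿" as ordinary starting punctuation keeps the regular "?" single-purpose, which is the simplification the footnote argues for.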
A generalized summary would help, but I think we also need to write the requirements out clearly; otherwise implementers won't know which punctuation marks count as starting or ending punctuation (I don't know Arabic, but for Chinese, I don't know whether connector marks or interpuncts are considered "ending punctuation"). Although UAX #14 and CLDR/ICU provide some data, that data is not necessarily accurate, and we can make it clearer in the requirements.
Indeed, that's one of the more obvious sections for which the task force didn't yet provide detail.
Certainly, but not in this document: the group charter limited its scope to Arabic and Persian, because those two are similar and the group participants were not familiar with Uighur. I'd certainly be interested in getting hold of a copy (in English) of the standard you mentioned, so that we can apply the information it contains to our language enablement program.
Actually, what needs to be said here is a little more complicated than listing characters that should or shouldn't appear at one particular end of a line. FWIW, at https://r12a.github.io/scripts/arabic/#linebreak_props you can find the default Unicode line-break properties for the (non-ASCII) characters that I think are needed for Arabic (not Persian) language support. That list differs slightly from the one in alreq, which was more closely tied to CLDR. Tailoring may need to be applied to it for Arabic-language text.
Reading the Chinese national standard GB/T 32411-2015 Information technology for the Uyghur, Kazagh, and Kirghiz editor common software, I noticed the following text:
I wonder if Arabic/Persian has something similar. If so, I think we should document it (perhaps in § 4.1 Line breaking; see the similar sections in clreq and jlreq).
By the way, should we document requirements for other languages that use the Arabic script? For example, Uyghur (Uighur), written in an Arabic-derived script, requires all vowels to be marked and uses hyphenation, both of which differ from Arabic and Persian.