Line breaking rules #237

xfq · 2020-08-22T08:12:37Z

Reading the Chinese national standard GB/T 32411-2015 Information technology for the Uyghur, Kazakh, and Kirghiz editor common software, I noticed the following text:

No line should begin with period, comma, question mark, exclamation mark, exclamatory question mark, colon, dash, closing single quotation mark, closing double quotation mark, closing parenthesis, and closing book title mark.

No line should end with opening single quotation mark, opening double quotation mark, opening parenthesis, and opening book title marks.
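As a rough illustration only, the two rules quoted above could be checked with a pair of lookup sets. The standard names the marks but the code points below are my own assumption (for example, U+060C and U+061F for the Arabic comma and question mark, and U+300A/U+300B for the book title marks), and `break_allowed` is a hypothetical helper, not something taken from the standard.

```python
# Assumed code points for the marks named in the quoted rules; the mapping
# from names to characters is illustrative, not taken from the standard.
NO_LINE_START = set(".,?!:)" "\u060C\u061F\u2014\u2019\u201D\u300B")
NO_LINE_END = set("(" "\u2018\u201C\u300A")


def break_allowed(text: str, i: int) -> bool:
    """Return True if breaking the line between text[i-1] and text[i]
    violates neither of the two rules."""
    if i <= 0 or i >= len(text):
        return False  # not an interior position, so no break to evaluate
    # The character before the break ends the line; the one after starts the next.
    return text[i - 1] not in NO_LINE_END and text[i] not in NO_LINE_START
```

A real implementation would also need word-level rules; this only covers the punctuation constraints.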

I wonder if Arabic/Persian have something similar. If so, I think we should document them (perhaps in § 4.1 Line breaking, see similar sections in clreq and jlreq).

By the way, should we document requirements in other languages using the Arabic script? For example, Arabic-derived Uyghur/Uighur requires marking of all vowels and uses hyphenation, which is different from Arabic and Persian.

asmusf · 2020-08-23T03:32:46Z

Those requirements can be generalized to:

No line should begin with sentence, clause or phrase-ending punctuation.
No line should end with sentence, clause or phrase-starting punctuation. (*)
No paired punctuation should appear on a line that does not contain some of the contents enclosed by the pair.

(*) this rule applies, for example, to Spanish use of inverted question and exclamation marks - it's easier to treat them as an anti-parallel case to sentence-ending punctuation instead of having regular question and exclamation mark have a dual nature by sometimes treating them as part of a pair...

These rules can then be combined with those that govern whether and how words themselves can be broken.
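A minimal sketch of how the generalized rules might be checked against already-broken lines, assuming Unicode general categories are an acceptable first approximation: `Ps`/`Pi` for starting punctuation and `Pe`/`Pf` for ending punctuation. The `ENDING` and `STARTING` sets and the function names are hypothetical and not exhaustive; real data would need per-language tailoring, and ambiguous marks such as the ASCII double quote are not handled.

```python
import unicodedata

# Assumed, non-exhaustive sets for marks whose general category (Po) does not
# say which side they belong to.
ENDING = set(".,;:?!\u060C\u061B\u061F")  # incl. Arabic comma, semicolon, question mark
STARTING = set("\u00A1\u00BF")            # Spanish inverted exclamation and question marks


def is_closer(ch: str) -> bool:
    return ch in ENDING or unicodedata.category(ch) in ("Pe", "Pf")


def is_opener(ch: str) -> bool:
    return ch in STARTING or unicodedata.category(ch) in ("Ps", "Pi")


def check_lines(lines):
    """Return (line_number, problem) pairs for lines that break the rules."""
    problems = []
    for n, line in enumerate(lines, start=1):
        s = line.strip()
        if not s:
            continue
        if is_closer(s[0]):
            problems.append((n, "starts with ending punctuation"))
        if is_opener(s[-1]):
            problems.append((n, "ends with starting punctuation"))
    return problems
```

Because only the first and last characters are inspected, a run of opening marks at the end of a line (or closing marks at the start) is also flagged, which covers the simplest case of the third rule.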

xfq · 2020-08-24T00:57:54Z

A generalized summary would help, but I think we also need to spell out the requirements clearly; otherwise implementers won't know which punctuation marks count as starting or ending punctuation (I don't know Arabic, but for Chinese, I don't know whether connector marks or interpuncts are considered "ending punctuation"). Although there is some data in UAX #14 and CLDR/ICU, it is not necessarily accurate, and we can make it clearer in the requirements.

r12a · 2020-08-25T12:16:58Z

I wonder if Arabic/Persian have something similar. If so, I think we should document them (perhaps in § 4.1 Line breaking, see similar sections in clreq and jlreq).

Indeed, that's one of the more obvious sections for which the task force didn't yet provide detail.

By the way, should we document requirements in other languages using the Arabic script? For example, Arabic-derived Uyghur/Uighur requires marking of all vowels and uses hyphenation, which is different from Arabic and Persian.

Certainly, but not in this document, whose scope was limited in the group charter to Arabic and Persian because they were similar and the group participants were not familiar with Uighur.

I'd certainly be interested in getting hold of a copy (in English) of the standard you mentioned, so that we can apply the information it contains to our language enablement program.

r12a · 2020-08-25T12:36:13Z

Actually, what needs to be said here is a little more complicated than listing characters that should or shouldn't appear at one particular end of a line. Fwiw, at https://r12a.github.io/scripts/arabic/#linebreak_props you can find a list of the default Unicode line-break properties for the (non-ASCII) characters that I think are needed for Arabic (not Persian) language support (slightly different from the list in alreq, which was more closely tied to CLDR). It's possible that tailoring needs to be applied to the list for Arabic language text.

xfq added the question label Aug 22, 2020

r12a added the l:ug Uighur label Feb 15, 2023
