In the section on typographic units there is a discussion on extended grapheme clusters, using स्कूल as an example. The text says:
There are two syllables in this word: SA+VIRAMA+KA+UU and LA. Note, however, that there are three Unicode grapheme clusters here: SA+VIRAMA, KA+UU and LA.
Styling is done on the basis of the whole orthographic syllable, not the first character, nor even the first grapheme.
Unicode 15.1, UAX #29 added a new rule specifically for some Indic scripts:
The GB9c rule applies only to extended grapheme clusters:
Do not break within certain combinations with Indic_Conjunct_Break (InCB)=Linker.
So स्कूल will be three extended grapheme clusters (['स्', 'कू', 'ल'] – SA+VIRAMA, KA+UU and LA) in Unicode 15.0 and prior versions, and two extended grapheme clusters (['स्कू', 'ल'] – SA+VIRAMA+KA+UU and LA) in Unicode 15.1 onwards.
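The difference can be sketched with a toy segmenter. This is only an illustration under stated assumptions, not an implementation of UAX #29: the `segment` function, its `use_gb9c` flag and the `is_consonant` helper are hypothetical names, consonants are assumed to be only Devanagari KA..HA (U+0915..U+0939), and the full GB9/GB9a/GB9c rules are approximated by general-category checks and a single virama test.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (InCB=Linker as of Unicode 15.1)


def is_consonant(ch: str) -> bool:
    # Assumption: only Devanagari KA..HA (U+0915..U+0939) count as consonants.
    return "\u0915" <= ch <= "\u0939"


def segment(text: str, use_gb9c: bool) -> list[str]:
    """Toy segmenter: approximates GB9/GB9a, plus GB9c when use_gb9c is True."""
    clusters: list[str] = []
    for ch in text:
        prev = clusters[-1] if clusters else ""
        # Rough stand-in for GB9/GB9a: combining marks join the previous cluster.
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        # Rough stand-in for GB9c: consonant + virama, then another consonant.
        joins_conjunct = (
            use_gb9c
            and is_consonant(ch)
            and len(prev) >= 2
            and prev[-1] == VIRAMA
            and is_consonant(prev[-2])
        )
        if prev and (is_mark or joins_conjunct):
            clusters[-1] = prev + ch
        else:
            clusters.append(ch)
    return clusters


word = "\u0938\u094D\u0915\u0942\u0932"  # SA + VIRAMA + KA + UU + LA
print(segment(word, use_gb9c=False))  # Unicode 15.0 style: 3 clusters
print(segment(word, use_gb9c=True))   # Unicode 15.1 style: 2 clusters
```

A real toolchain would rely on a segmentation library built against a specific Unicode version rather than a hand-rolled check like this one.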
So the effect of extended grapheme cluster level segmentation will depend on the version of Unicode the toolchain is using at the point of segmentation.
So characters with InCB=Linker, such as the Devanagari VIRAMA in स्कूल, can now extend a grapheme cluster across a following consonant.