CLDR-18080 add Hant-Latn transform, add Hans-Latn alias for Hani-Latn #4295

pedberg-icu · 2025-01-19T18:40:29Z

This PR completes the ticket.

This PR adds a Hant-Latn transform that first applies 101 readings which differ for Hant/TW, then calls the main Hani-Latn transform for everything else. It also adds Hans-Latn as an alias for the current Hani-Latn transform, and adds test data for both. Also, as noted in the ticket:

The current Han-Latin transform has the following context rule:

沈 } \u0020? 阳 → shěn; # 沈 is shěn (not chén) if followed by 阳 yáng: 沈阳 city Shěnyáng

This dates from before Unicode 14, at which point the kMandarin value for U+6C88 沈 was changed from “chén” to “shěn chén”. With that change the rule became obsolete for Hans. However it is still slightly relevant for Hant; if the simplified form 沈阳 of the name for the city Shenyang is encountered in a Hant/Taiwan context, the first character should be transliterated as shěn rather than as chén. So this rule should be moved to the Hant-Latn transform.
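As a rough illustration of what the context rule does, here is a minimal Python sketch that emulates it with a regex lookahead. This is not how ICU evaluates transform rules; the function name `apply_context_rule` is made up for this example. The point is only that the text after `}` (an optional space, then 阳) is context: it must be present for the rule to fire, but it is not consumed or rewritten.

```python
import re

# Emulates the context rule  沈 } \u0020? 阳 → shěn  with a lookahead.
# Only 沈 is replaced; the optional space and 阳 are left in place
# so that later rules can still transliterate them.
SHEN_BEFORE_YANG = re.compile(r"沈(?=\u0020?阳)")


def apply_context_rule(text: str) -> str:
    """Rewrite 沈 as shěn only when it is followed by 阳 (optionally after a space)."""
    return SHEN_BEFORE_YANG.sub("shěn", text)
```

With this sketch, both `沈阳` and `沈 阳` get their first character rewritten, while 沈 in any other position is left untouched for the per-character readings to handle.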

ALLOW_MANY_COMMITS=true

jira-pull-request-webhook · 2025-01-19T18:56:03Z

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

robertbastian · 2025-01-22T17:19:21Z

common/transforms/Hant-Latin.xml

+# Convert compounds; these are added individually, not derived from Unihan kMandarin.
+# Here Han-Spacedhan() has not yet been applied.
+# The following was moved from Hans-Latn; in a Hant/Taiwan context, the simplified-form city name 沈阳 should still transform to shěnyáng.
+沈 } 阳 →shěn;# 沈 is shěn (not chén) if followed by 阳 yáng: 沈阳 city Shěnyáng


Hans-Latn also matched 沈 阳 (with an optional space between the two characters), but this only matches 沈阳 without a space; is this intended?

pedberg-icu · 2025-01-22

@robertbastian @srl295 I think the behavior for the new Hant transform is more correct. We should fix Hans-Latn (under a different PR) to apply Han-SpacedHan after handling the compounds. Filed a ticket: https://unicode-org.atlassian.net/browse/CLDR-18254
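To make the ordering point concrete, here is a small Python sketch. Both functions are hypothetical stand-ins: `space_han` only approximates a Han-SpacedHan-style pass by inserting a space between adjacent Han characters, and `compound_rule` emulates the space-less rule `沈 } 阳 → shěn` with a regex. Neither is the actual ICU implementation.

```python
import re

HAN = r"[\u4e00-\u9fff]"  # assumption: basic CJK Unified Ideographs block only


def space_han(text: str) -> str:
    """Stand-in for a spacing pass: insert a space between adjacent Han characters."""
    return re.sub(rf"(?<={HAN})(?={HAN})", " ", text)


def compound_rule(text: str) -> str:
    """Emulates  沈 } 阳 → shěn  with no optional space in the context."""
    return re.sub(r"沈(?=阳)", "shěn", text)


# Spacing first: the compound is split before the rule sees it, so no match.
spaced_first = compound_rule(space_han("沈阳"))

# Compounds first: the rule sees the original adjacent characters and fires.
compounds_first = space_han(compound_rule("沈阳"))
```

If spacing runs first, the compound rule needs the optional `\u0020?` in its context to match at all; if compounds are handled first, the simpler space-less rule is enough.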

srl295

Looks good, apart from @robertbastian's comment about spaces.

srl295 · 2025-01-22T19:52:02Z

common/transforms/Hant-Latin.xml

+[苧]→zhù;   # U+82E7
+# END From Unicode 17, the above should be autogenerated:
+# Then run the normal Hani-Latn transform for the rest
+::Hani-Latn();


nice solution!
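The override-then-fall-through structure can be sketched in Python as a lookup in a small override table followed by a lookup in the base table. The tables below are tiny hypothetical samples, not the real data: 苧 → zhù is taken from the diff above, 沈 → chén for Hant is inferred from the PR description, and the base readings for 沈 and 阳 follow the kMandarin discussion there. The function name `hant_latn` is invented for this example.

```python
# Hypothetical miniature tables; the real transform has 101 Hant overrides
# and a much larger base table.
HANT_OVERRIDES = {
    "苧": "zhù",   # U+82E7, from the diff above
    "沈": "chén",  # assumption: Hant/TW reading inferred from the PR description
}
HANI_BASE = {
    "沈": "shěn",  # first kMandarin reading since Unicode 14
    "阳": "yáng",
}


def hant_latn(text: str) -> str:
    """Apply Hant-specific readings first, then fall back to the base readings."""
    out = []
    for ch in text:
        if ch in HANT_OVERRIDES:
            out.append(HANT_OVERRIDES[ch])
        else:
            # Characters in neither table pass through unchanged.
            out.append(HANI_BASE.get(ch, ch))
    return "".join(out)
```

The actual transform works in passes over the text rather than per character, but the effect is similar: only the characters whose readings differ need to be listed in the Hant file, and everything else is inherited from Hani-Latn.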

pedberg-icu requested review from macchiati, btangmu and FrankYFTang January 19, 2025 18:40

pedberg-icu self-assigned this Jan 19, 2025

CLDR-18080 add Hant-Latn transform, add Hans-Latn alias for Hani-Latn

2a08662

pedberg-icu force-pushed the CLDR-18080-add-Hant-Latn-transform-and-adjust-Hani-Latn branch from 70dd582 to 2a08662 Compare January 19, 2025 18:55

pedberg-icu requested a review from robertbastian January 22, 2025 16:48

AEApple requested a review from srl295 January 22, 2025 17:38

robertbastian reviewed Jan 22, 2025

srl295 approved these changes Jan 22, 2025

pedberg-icu merged commit 958d087 into unicode-org:main Jan 22, 2025
12 checks passed

pedberg-icu deleted the CLDR-18080-add-Hant-Latn-transform-and-adjust-Hani-Latn branch January 22, 2025 21:08
