CLDR-18080 add Hant-Latn transform, add Hans-Latn alias for Hani-Latn #4295

pedberg-icu
Contributor

@pedberg-icu pedberg-icu commented Jan 19, 2025

CLDR-18080

  • This PR completes the ticket.

This PR adds a Hant-Latn transform that applies 101 readings that differ for Hant/TW, then calls the main Hani-Latn transform for everything else. It also adds Hans-Latn as an alias for the current Hani-Latn transform, and adds test data for both. Also, as noted in the ticket:

  • The current Han-Latin transform has the following context rule:
    沈 } \u0020? 阳 ->shěn;# 沈 is shěn (not chén) if followed by 阳 yáng: 沈阳 city Shěnyáng
    This rule dates from before Unicode 14, at which point the kMandarin value for U+6C88 沈 was changed from “chén” to “shěn chén”. With that change, the rule became obsolete for Hans. However, it is still slightly relevant for Hant: if the simplified form 沈阳 of the name for the city Shenyang is encountered in a Hant/Taiwan context, the first character should be transliterated as shěn rather than as chén. So this rule should be moved to the Hant-Latn transform.
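
For readers unfamiliar with layered transforms, the approach described above (context rule first, then Hant-specific readings, then the general transform for everything else) can be sketched roughly in Python. This is a toy model, not the CLDR/ICU implementation: the names `HANT_OVERRIDES`, `BASE_READINGS`, and `hant_to_latn` are hypothetical, the base table is a two-entry stand-in for the real Hani-Latn data, and the 沈 → chén Hant reading is an assumption inferred from the description above. Only 苧 → zhù and the 沈阳 rule come from this PR.

```python
# Toy model of "specific rules first, then fall back to the general transform".
# All names here are hypothetical; this is not how ICU evaluates rules.

# Stand-in for the Hant/TW-specific readings. 苧 → zhù is from this PR's diff;
# 沈 → chén is an assumption based on the PR description.
HANT_OVERRIDES = {"苧": "zhù", "沈": "chén"}

# Tiny stand-in for the much larger Hani-Latn table.
BASE_READINGS = {"沈": "shěn", "阳": "yáng", "陽": "yáng"}


def hant_to_latn(text: str) -> list[str]:
    """Return one reading per character, trying rules in priority order."""
    readings = []
    for i, ch in enumerate(text):
        following = text[i + 1] if i + 1 < len(text) else ""
        if ch == "沈" and following == "阳":
            # Context rule: 沈 is shěn when immediately followed by 阳.
            readings.append("shěn")
        elif ch in HANT_OVERRIDES:
            # Hant-specific reading takes precedence over the base table.
            readings.append(HANT_OVERRIDES[ch])
        else:
            # Fall back to the general table; pass unknown characters through.
            readings.append(BASE_READINGS.get(ch, ch))
    return readings
```

In this sketch, `hant_to_latn("沈阳")` picks the context rule for the first character, while a bare `hant_to_latn("沈")` falls through to the Hant override.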

ALLOW_MANY_COMMITS=true

@pedberg-icu pedberg-icu force-pushed the CLDR-18080-add-Hant-Latn-transform-and-adjust-Hani-Latn branch from 70dd582 to 2a08662 Compare January 19, 2025 18:55
@jira-pull-request-webhook

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@AEApple AEApple requested a review from srl295 January 22, 2025 17:38
# Convert compounds; these are added individually, not derived from Unihan kMandarin.
# Here Han-Spacedhan() has not yet been applied.
# The following was moved from Hans-Latn; in a Hant/Taiwan context, the simplified-form city name 沈阳 should still transform to shěnyáng.
沈 } 阳 →shěn;# 沈 is shěn (not chén) if followed by 阳 yáng: 沈阳 city Shěnyáng
Member

Hans-Latn also matched 沈 阳, but this only matches 沈阳 without a space; is this intended?

Contributor Author

@pedberg-icu pedberg-icu Jan 22, 2025

@robertbastian @srl295 I think the behavior of the new Hant-Latn rule is more correct; we should fix Hans-Latn (under a different PR) to apply Han-SpacedHan after handling the compounds. Filed a ticket: https://unicode-org.atlassian.net/browse/CLDR-18254
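
To make the difference between the two rule forms concrete, here is a rough analogy using Python regular-expression lookahead in place of the `}` context operator. It is only an approximation of the matching behavior discussed in this thread, not ICU's rule engine; the names `WITH_OPTIONAL_SPACE`, `WITHOUT_SPACE`, and `apply_rule` are hypothetical.

```python
import re

# Analogue of the older rule form:  沈 } \u0020? 阳 → shěn
# The lookahead tolerates one optional space before 阳.
WITH_OPTIONAL_SPACE = re.compile(r"沈(?=\u0020?阳)")

# Analogue of the new Hant-Latn rule:  沈 } 阳 → shěn
# The lookahead requires 阳 to follow immediately.
WITHOUT_SPACE = re.compile(r"沈(?=阳)")


def apply_rule(pattern: re.Pattern, text: str) -> str:
    """Replace 沈 with shěn wherever the pattern's context matches."""
    return pattern.sub("shěn", text)
```

With the unspaced input 沈阳 both patterns rewrite the first character; with the spaced input 沈 阳 only `WITH_OPTIONAL_SPACE` does. That is why the order of the spacing step relative to the compound rules matters.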

Member

@srl295 srl295 left a comment

Looks good, apart from @robertbastian's comment about spaces.

[苧]→zhù; # U+82E7
# END From Unicode 17, the above should be autogenerated:
# Then run the normal Hani-Latn transform for the rest
::Hani-Latn();
Member

nice solution!

@pedberg-icu pedberg-icu merged commit 958d087 into unicode-org:main Jan 22, 2025
12 checks passed
@pedberg-icu pedberg-icu deleted the CLDR-18080-add-Hant-Latn-transform-and-adjust-Hani-Latn branch January 22, 2025 21:08