CLDR-18080 add Hant-Latn transform, add Hans-Latn alias for Hani-Latn #4295
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot |
# Convert compounds; these are added individually, not derived from Unihan kMandarin.
# Here Han-Spacedhan() has not yet been applied.
# The following was moved from Hans-Latn; in a Hant/Taiwan context, the simplified-form city name 沈阳 should still transform to shěnyáng.
沈 } 阳 →shěn;# 沈 is shěn (not chén) if followed by 阳 yáng: 沈阳 city Shěnyáng
Hans-Latn also matched 沈 阳, but this only matches 沈阳 without a space; is this intended?
@robertbastian @srl295 I think the behavior of the new Hant-Latn is more correct; we should fix Hans-Latn (under a different PR) so that it applies Han-SpacedHan after handling the compounds. Filed a ticket: https://unicode-org.atlassian.net/browse/CLDR-18254
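The difference discussed in this thread can be illustrated outside ICU. The following is only a rough Python analogy, not the transform engine itself: a regex lookahead stands in for the rule context after `}`, and the names `STRICT`, `LENIENT`, and `apply_rule` are hypothetical.

```python
import re

# Analogy for "沈 } 阳": 阳 must follow immediately; it is checked but not consumed.
STRICT = re.compile("沈(?=阳)")

# Analogy for "沈 } \u0020? 阳": an optional space may sit between the characters,
# which is what happens if spacing has already been inserted before this rule runs.
LENIENT = re.compile("沈(?= ?阳)")


def apply_rule(pattern: re.Pattern, text: str) -> str:
    """Replace 沈 with shěn wherever the pattern's context matches."""
    return pattern.sub("shěn", text)
```

With the strict form, spaced input such as 沈 阳 is left untouched, so the order in which spacing and compound rules run determines whether the context rule needs the optional space.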
Looks good besides @robertbastian's comment about spaces
[苧]→zhù; # U+82E7
# END From Unicode 17, the above should be autogenerated:
# Then run the normal Hani-Latn transform for the rest
::Hani-Latn();
nice solution!
CLDR-18080
This ticket adds a Hant-Latn transform which uses 101 readings that are different for Hant/TW, then calls the main Hani-Latn transform for everything else. It also adds Hans-Latn as an alias for the current Hani-Latn transform, and adds test data for both.

Also, as noted in the ticket: 沈 } \u0020? 阳 ->shěn;# 沈 is shěn (not chén) if followed by 阳 yáng: 沈阳 city Shěnyáng; this dates from before Unicode 14, at which point the kMandarin value for U+6C88 沈 was changed from “chén” to “shěn chén”. With that change the rule became obsolete for Hans. However it is still slightly relevant for Hant; if the simplified form 沈阳 of the name for the city Shenyang is encountered in a Hant/Taiwan context, the first character should be transliterated as shěn rather than as chén. So this rule should be moved to the Hant-Latn transform.

ALLOW_MANY_COMMITS=true
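
The layering described above (compound rules first, then Hant-specific readings, then the general Hani-Latn transform) can be sketched in Python. This is a toy model under stated assumptions, not the CLDR rule files: the tables hold only characters mentioned in this PR, the 沈 → chén override is inferred from the description rather than copied from the data, spacing between syllables is ignored, and all names are hypothetical.

```python
# Toy tables; the real readings live in the CLDR transform files.
HANT_COMPOUND_RULES = [("沈", "阳", "shěn")]  # (character, following character, reading)
HANT_OVERRIDES = {"沈": "chén", "苧": "zhù"}  # 沈 → chén is an assumption from the description
HANI_READINGS = {"沈": "shěn", "阳": "yáng"}  # stand-in for the general Hani-Latn data


def hani_latn(text: str) -> str:
    """General fallback: map each known character, pass everything else through."""
    return "".join(HANI_READINGS.get(ch, ch) for ch in text)


def hant_latn(text: str) -> str:
    """Apply compound rules, then Hant overrides, then delegate to hani_latn."""
    out = []
    for i, ch in enumerate(text):
        following = text[i + 1] if i + 1 < len(text) else ""
        for char, context, reading in HANT_COMPOUND_RULES:
            if ch == char and following == context:
                out.append(reading)  # context rule wins over the plain override
                break
        else:
            out.append(HANT_OVERRIDES.get(ch, ch))
    # Anything still untransliterated is handled by the general transform.
    return hani_latn("".join(out))
```

In this sketch, `hant_latn("沈阳")` yields `shěnyáng` because the compound rule fires before the Hant override, while `hant_latn("沈")` on its own yields `chén`.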