-
Notifications
You must be signed in to change notification settings - Fork 388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLDR-13589 Add unit test that locale time cycle matches region prefs #4383
base: main
Are you sure you want to change the base?
Conversation
-Copy icu4j DateTimeGeneratorTest.testJjMapping almost verbatim
tools/cldr-code/src/test/java/org/unicode/cldr/unittest/TestTimeCycle.java
Outdated
Show resolved
Hide resolved
ULocale[] locales = DateFormat.getAvailableULocales(); | ||
for (ULocale locale : locales) { | ||
String localeID = locale.getName(); | ||
DateTimePatternGenerator dtpg = DateTimePatternGenerator.getInstance(locale); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we want to populate the DateTimePatternGenerator with current CLDR data, not ICU data. This typically uses ICUServiceBuilder. See for example: tools/cldr-code/src/main/java/org/unicode/cldr/test/FlexibleDateFromCLDR.java
for (ULocale locale : locales) { | ||
String localeID = locale.getName(); | ||
DateTimePatternGenerator dtpg = DateTimePatternGenerator.getInstance(locale); | ||
DateFormat dfmt = DateFormat.getTimeInstance(DateFormat.SHORT, locale); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And again here we want to get the current CLDR data, using a CLDRFile
method, instead of the ICU data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to revise to use CLDR data instead of ICU data
-Similar to icu4j DateTimeGeneratorTest.testJjMapping -Use current CLDR data, not ICU data
This comment was marked as outdated.
This comment was marked as outdated.
// Compare ICU data version: | ||
// DateTimePatternGenerator dtpg = DateTimePatternGenerator.getInstance(uloc); | ||
|
||
DateTimePatternGenerator dtpg = DateTimePatternGenerator.getEmptyInstance(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getEmptyInstance
? Is it OK that dtpg
itself isn't associated with the locale? Instead it's icuServiceBuilder
that's locale-specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will need to load up the DateTimePatternGenerator yourself with the CLDR data for the locale, it does not automatically do that. Take a look at tools/cldr-code/src/main/java/org/unicode/cldr/test/FlexibleDateFromCLDR.java (in fact you may be able to use that class in this test)
// DateFormat dfmt = DateFormat.getTimeInstance(DateFormat.SHORT, uloc); | ||
|
||
SimpleDateFormat dfmt = | ||
icuServiceBuilder.getDateFormat(LDMLConstants.GREGORIAN, jPattern); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DateFormat.getTimeInstance
specifies "time" while icuServiceBuilder.getDateFormat
doesn't. I didn't see any method like icuServiceBuilder.getTime...
I'm just guessing about jPattern
here, versus jPatSkeleton
or shortPatSkeleton
or something completely different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Speaking generally here.
You have to be careful not to use "plain" ICU calls for formatting in tests or examples, since that will use ICU's (old) copy of CLDR data, not what is in the repository. So it won't test what needs to be tested.
If ICUServiceBuilder has API for what you want, that's great. Internally, it uses API code, but makes calls that load up CLDR data and then calls mechanisms that build the ICU classes using CLDR data.
There are some cases where ICUServiceBuilder doesn't have an API for what you want. In that case, do a search for the class you want to use on the ICU side within the CLDR code. For example, for DateTimePatternGenerator, there are a few places in the code that build it up, like
CheckDates:
DateTimePatternGenerator dateTimePatternGenerator = DateTimePatternGenerator.getEmptyInstance();
(This ought to be more centralized and documented.) I have to run to a meeting, but will check later.
tools/cldr-code/src/test/java/org/unicode/cldr/unittest/TestTimeCycle.java
Outdated
Show resolved
Hide resolved
-This is only a test; draft PR
I've added a 3rd commit, with comments and debugging, and changed this to a DRAFT PR. Calling |
@@ -30,7 +30,7 @@ | |||
* | |||
* @author markdavis | |||
*/ | |||
class FlexibleDateFromCLDR { | |||
public class FlexibleDateFromCLDR { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an old (?) comment above, "Temporary class while refactoring", which casts suspicion on making this public
Hmm. It may be that using DateTimePatternGenerator to handle the 'j' character is not the right way to approach this. Basically what we are trying to do is make sure that the gregorian and/or preferred-calendar (if different) time formats in each locale - which may be inherited - use the preferred hour cycle for the region or region-language combination as specified by the supplemental
|
@pedberg-icu I'm starting to have some success with that, thanks! The path you gave doesn't seem quite right. I'm using this instead, OK?
|
-This is only a test; draft PR
The 4th commit uses It fails for locale "aa", since
Is there some other way to get a region/territory from a locale ID? |
You should be calling LikelySubtags to fill in the region if there isn't one. |
Yes you are right, I had forgotten about the (implied) type="standard" |
Yes. Tom, see my step 4, I had mentioned "you can use sdi.getLikelySubtags() to get a map". This returns a Map<String, String> that should mirror the content of |
-This is only a test; draft PR
The 5th commit uses 1 locale apc, calendar gregorian, expected h to occur in both patterns h and HH:mm |
for (char timeCycleChar : timeCycleChars) { | ||
boolean has1 = jPatSkeleton.indexOf(timeCycleChar) >= 0; | ||
boolean has2 = shortPatSkeleton.indexOf(timeCycleChar) >= 0; | ||
if (has1 && !has2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this (like the original ICU code) is unsymmetrical and leaves out checking if (has2 && !has1)
-- however, I tried (has1 != has2)
and just got exactly twice the number of failures, for example:
105 locale yue_Hant, calendar gregorian, expected H to occur in both patterns H and ah:mm
106 locale yue_Hant, calendar gregorian, expected h to occur in both patterns H and ah:mm
I guess (has2 && !has1)
could be more significant if jPatSkeleton
had none of h, H, k, K, but at least currently that's not the case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just taking one example, apc.xml
It has no timeFormatLength values, so it will just inherit from root.
Lines like the following also just inherit from root.
↑↑↑
root has (correctly) HH.
But its region is:
which matches
and results in 'h'.
So we should filter out locales that are top-level locals with no explicit value for timeFormatLength.
(Might also have to refine this further).
+ " get (region " | ||
+ region | ||
+ ") null, falling back to 001"); | ||
prefAndAllowedHr = timeData.get("001" /* world */); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fallback to 001 happens for az_Cyrl, az_Latn, bal_Arab, bal_Latn, be_TARASK, ..., altogether 53 locales. Strangely not the same as the 53 failing locales, though there is some overlap, such as for el_POLYTON
Fallback to 001 is expected, and not a problem. 53 is just a coincidence,
tho a pretty unlikely one!
…On Thu, Feb 20, 2025 at 2:50 PM Tom Bishop ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In
tools/cldr-code/src/test/java/org/unicode/cldr/unittest/TestTimeCycle.java
<#4383 (comment)>:
> + + localeID
+ + " loc2 = "
+ + loc2
+ + " region = "
+ + region);
+ }
+ }
+ prefAndAllowedHr = timeData.get(region);
+ if (prefAndAllowedHr == null) {
+ System.out.println(
+ "getCldrJPatternForLocale "
+ + localeID
+ + " get (region "
+ + region
+ + ") null, falling back to 001");
+ prefAndAllowedHr = timeData.get("001" /* world */);
fallback to 001 happens for az_Cyrl, az_Latn, bal_Arab, bal_Latn,
be_TARASK, ..., altogether 53 locales. Strangely not the same as the 53
failing locales, though there is some overlap, such as for el_POLYTON
—
Reply to this email directly, view it on GitHub
<#4383 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACJLEMGLYN4RZCDCC3BC3VD2QZL4DAVCNFSM6AAAAABXML2M3SVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDMMZRGM3DOMZQGU>
.
You are receiving this because your review was requested.Message ID:
***@***.***>
|
Mark wrote: "So we should filter out locales that are top-level locals with no explicit value for timeFormatLength." I disagree, finding those is part of the point of this (otherwise they will cause errors in ICU). If a top level locale has a default region that prefers 'h' and has no standard time formats, we need to add them. That is part the criteria for Basic I believe. But maybe I misunderstood... |
-Similar to icu4j DateTimeGeneratorTest.testJjMapping
-Use current CLDR data, not ICU data
CLDR-13589
ALLOW_MANY_COMMITS=true