Mapping icu4c APIs Again #2810
-
Hi, now that ICU4X has had a while to mature, I am picking up some of the work Eric Erhardt did investigating ICU4X for use in .NET: #499. Since there have been many changes since then, I decided to start a new thread. Right now I am just trying to evaluate two things:
I see that ICU4X has since implemented something roughly equivalent to ICU4C's .dat data files, which was a missing feature before: #78. I am comparing the size of these files to what we get out of ICU4C. I know the files presumably become smaller with fewer locales in them. We normally construct our .dat files by providing a list of locales to filter out, rather than a list of locales to include, and I didn't see any way to do that using icu_datagen: https://github.com/unicode-org/icu4x/blob/main/docs/tutorials/data_management.md. I could generate the complement set of locales myself, but I'm not sure how to find out what icu_datagen considers "all". Also, if anyone has other ideas for making that file smaller, I am definitely interested. Finally, has there been any sort of ICU4C→ICU4X "translation guide" since Eric's thread?
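A minimal sketch of the "complement set" workaround, assuming you already have the full locale list (for example from CLDR) and your existing ICU4C exclusion list as plain strings. The locale values in `main` are hypothetical, and this does not call icu_datagen; it only computes the include list you would then pass to it.

```rust
use std::collections::BTreeSet;

/// Returns every locale in `all` that is not in `excluded`,
/// sorted and de-duplicated (via BTreeSet).
fn complement<'a>(all: &[&'a str], excluded: &[&str]) -> Vec<&'a str> {
    // Put the exclusion list in a set for fast membership checks.
    let excluded: BTreeSet<&str> = excluded.iter().copied().collect();
    let kept: BTreeSet<&'a str> = all
        .iter()
        .copied()
        .filter(|loc| !excluded.contains(*loc))
        .collect();
    kept.into_iter().collect()
}

fn main() {
    // Hypothetical inputs: `all` would come from CLDR, `excluded`
    // from the list currently used to filter ICU4C .dat files.
    let all = ["de", "en", "en-GB", "fr", "ja"];
    let excluded = ["en-GB", "ja"];
    for loc in complement(&all, &excluded) {
        println!("{loc}");
    }
}
```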
Replies: 1 comment
-
From a quick skim it seems like we implement those APIs in broad strokes. I'm not an ICU4C expert so I'm not sure of the details though.
It's all locales found in CLDR. I think the full list matches the set of subfolders in any of the `main` folders of the CLDR JSON repo, e.g. https://github.com/unicode-org/cldr-json/tree/main/cldr-json/cldr-misc-full/main.
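One way to extract that list, assuming a local clone of the cldr-json repo: each subdirectory of a `main` folder such as `cldr-json/cldr-misc-full/main` is named after a locale, so listing the directory names gives the set. The path in `main` below is an assumption about where the clone lives, and `locales_in` is a hypothetical helper; it only reads directory names and does not use icu_datagen.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Lists locale IDs in a directory laid out like cldr-json's `main`
/// folders, where each locale is a subdirectory.
fn locales_in(dir: &Path) -> io::Result<Vec<String>> {
    let mut locales = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // Only subdirectories are locales; skip stray files.
        if entry.file_type()?.is_dir() {
            let name = entry.file_name();
            // Skip names that are not valid UTF-8 instead of panicking.
            if let Some(name) = name.to_str() {
                locales.push(name.to_string());
            }
        }
    }
    // read_dir order is unspecified, so sort for stable output.
    locales.sort();
    Ok(locales)
}

fn main() {
    // Assumed location of a local clone of unicode-org/cldr-json.
    let dir = Path::new("cldr-json/cldr-json/cldr-misc-full/main");
    match locales_in(dir) {
        Ok(locales) => {
            for loc in locales {
                println!("{loc}");
            }
        }
        Err(e) => eprintln!("could not read {}: {e}", dir.display()),
    }
}
```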
The main thing to do is to use the postcard format.
Not really. We're happy to answer questions about individual APIs. There are some plans to make a compatibility layer so that people can abstract over both ICU4C and ICU4X at once, but that doesn't exist yet. Separately, there is also the "baked" format, which produces Rust code that compiles straight into Rust statics. It cannot be loaded dynamically, but it can be built as part of the application. It benefits greatly from compiler optimization and can be very efficient; however, its actual size impact is hard to measure, because the compiler can be smart enough to optimize things out of test applications. Either way, the data loading infrastructure is fully pluggable, so you can start with code using postcard and then try plugging in baked data to see what happens.
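To illustrate what "pluggable" buys you, here is a deliberately simplified, std-only sketch. The `DataSource` trait and the `Baked` and `Blob` types are hypothetical stand-ins, not ICU4X's API (ICU4X has its own data provider traits), and a tab-separated text format stands in for postcard. The point is only that code written against the trait stays the same whether the data is a compiled-in static or is parsed at runtime.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a data provider: looks up a localized
/// string by (locale, key). Not ICU4X's API.
trait DataSource {
    fn get(&self, locale: &str, key: &str) -> Option<String>;
}

/// "Baked"-style source: data compiled into the binary as a static.
struct Baked;

static BAKED: &[(&str, &str, &str)] = &[
    ("en", "greeting", "Hello"),
    ("fr", "greeting", "Bonjour"),
];

impl DataSource for Baked {
    fn get(&self, locale: &str, key: &str) -> Option<String> {
        BAKED
            .iter()
            .find(|(l, k, _)| *l == locale && *k == key)
            .map(|(_, _, v)| v.to_string())
    }
}

/// Runtime-loaded source: parses data read at startup. The
/// "locale\tkey\tvalue" format is a hypothetical stand-in for postcard.
struct Blob {
    entries: HashMap<(String, String), String>,
}

impl Blob {
    fn parse(text: &str) -> Blob {
        let mut entries = HashMap::new();
        for line in text.lines() {
            let mut parts = line.splitn(3, '\t');
            if let (Some(l), Some(k), Some(v)) = (parts.next(), parts.next(), parts.next()) {
                entries.insert((l.to_string(), k.to_string()), v.to_string());
            }
        }
        Blob { entries }
    }
}

impl DataSource for Blob {
    fn get(&self, locale: &str, key: &str) -> Option<String> {
        self.entries.get(&(locale.to_string(), key.to_string())).cloned()
    }
}

/// Application code depends only on the trait, so the source can be swapped.
fn greet(source: &dyn DataSource, locale: &str) -> String {
    source.get(locale, "greeting").unwrap_or_else(|| "?".to_string())
}

fn main() {
    let blob = Blob::parse("en\tgreeting\tHello\nde\tgreeting\tHallo\n");
    println!("{}", greet(&Baked, "fr"));
    println!("{}", greet(&blob, "de"));
}
```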