From e43cf40ce0e0c46c528937b3ed49a721bc370499 Mon Sep 17 00:00:00 2001
From: Anne van Kesteren
Date: Tue, 17 Sep 2024 11:58:05 +0200
Subject: [PATCH] Support GB18030-2022

One legacy encoding, gb18030, was updated by GB18030-2022, and the
relevant regulation requires software to match. As such, the Encoding
Standard should match as well.

This aims to make the minimum number of changes necessary and does not
impact GBK, only gb18030.

Updated tests are in
https://github.com/WebKit/WebKit/tree/main/LayoutTests/imported/w3c/web-platform-tests/encoding/legacy-mb-schinese/gb18030.
If these are deemed satisfactory, they will be exported.
---
 encoding.bs           | 406 ++++++++++++++++++++++++++++++++++++++++--
 tools-gb18030-2022.py |  70 ++++++++
 2 files changed, 463 insertions(+), 13 deletions(-)
 create mode 100644 tools-gb18030-2022.py

diff --git a/encoding.bs b/encoding.bs
index 9694e88..32b156d 100644
--- a/encoding.bs
+++ b/encoding.bs
@@ -835,7 +835,8 @@ specification, excluding index single-byte, which have their own table:
 This matches the GB18030-2005 standard for code points encoded as two bytes, except for 0xA3
 0xA0 which maps to U+3000 to be compatible with deployed content. This index covers the CJK
 Unified Ideographs block of Unicode in its entirety. Entries from that block that are above or
- to the left of (the first) U+3000 in the visualization are in the Unicode order.
+ to the left of (the first) U+3000 in the visualization are in the Unicode order. (Support for the
+ GB18030-2022 standard is handled separately to avoid impacting GBK.)
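The note about 0xA3 0xA0 and "the first" U+3000 can be made concrete with the two-byte pointer arithmetic. What follows is a minimal sketch, assuming the Encoding Standard's two-byte gb18030 scheme (pointer = (lead - 0x81) * 190 + (trail - offset)). `INDEX_EXCERPT` is a hypothetical two-entry stand-in for index gb18030 that holds only the two pointers for U+3000. The helper names are illustrative. The four-byte path (second byte 0x30 to 0x39) and the GB18030-2022 mapping changes are not modeled.

```python
from typing import Dict, Optional

# Hypothetical excerpt of index gb18030 (pointer -> code point); the real
# index has tens of thousands of entries.
INDEX_EXCERPT: Dict[int, int] = {
    6176: 0x3000,  # 0xA1 0xA1: the first U+3000 in the index
    6555: 0x3000,  # 0xA3 0xA0: also U+3000, for compatibility with deployed content
}


def decode_pointer(lead: int, trail: int) -> Optional[int]:
    """Return the index pointer for a two-byte sequence, or None if out of range."""
    if not 0x81 <= lead <= 0xFE:
        return None
    if not (0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFE):
        return None
    # Trail bytes above 0x7F skip over 0x7F, so they use an offset of 0x41.
    offset = 0x40 if trail < 0x7F else 0x41
    return (lead - 0x81) * 190 + (trail - offset)


def decode_two_bytes(lead: int, trail: int) -> Optional[int]:
    """Map a two-byte sequence to a code point via the hypothetical excerpt."""
    pointer = decode_pointer(lead, trail)
    return None if pointer is None else INDEX_EXCERPT.get(pointer)


def encode_code_point(code_point: int) -> Optional[bytes]:
    """Encode using the first (lowest) pointer for code_point, if any."""
    pointers = [p for p, cp in INDEX_EXCERPT.items() if cp == code_point]
    if not pointers:
        return None
    pointer = min(pointers)
    lead = pointer // 190 + 0x81
    trail = pointer % 190
    # Inverse of the decoder's offset choice: trail values 0..62 map to
    # 0x40..0x7E, and 63..189 map to 0x80..0xFE.
    offset = 0x40 if trail < 0x3F else 0x41
    return bytes([lead, trail + offset])
```

Both 0xA1 0xA1 and 0xA3 0xA0 decode to U+3000. Because the encoder picks the lowest pointer, U+3000 always encodes back to 0xA1 0xA1, so the 0xA3 0xA0 mapping is one-way.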