Support utf-8 encoding guessing #84495
Comments
These issues will be solved by #84503.
In particular, aadsm/jschardet#48 will be treated as |
I like this change because it makes |
@sunbohong I noticed a bad regression, though, which made me push ddfca30 to ignore the guessed encoding "ascii" for a simple reason:
=> the file is still guessed as "ascii" because it was not saved with a proper encoding
Verification:
=> the encoding should be |
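As a rough illustration of why a guess of "ascii" carries little information: text that contains only ASCII characters produces the same bytes whether it is saved as ASCII or as UTF-8, so a detector cannot tell from the bytes which encoding the user intended. The helper name `isPureAscii` below is hypothetical and is not part of VS Code or jschardet; this is a sketch of the general point, not of the actual fix in ddfca30.

```typescript
// Hypothetical helper (not from VS Code or jschardet):
// true when every byte is in the 7-bit ASCII range 0x00-0x7F.
function isPureAscii(bytes: Uint8Array): boolean {
  return bytes.every((b) => b < 0x80);
}

// TextEncoder always produces UTF-8 bytes.
const asciiOnly = new TextEncoder().encode("hello world");
const withAccent = new TextEncoder().encode("héllo world");

// ASCII-only text encoded as UTF-8 is still pure ASCII at the byte level,
// so the bytes alone cannot reveal the encoding the file was meant to have.
console.log(isPureAscii(asciiOnly)); // true
console.log(isPureAscii(withAccent)); // false
```

Under that reading, treating "ascii" as "no usable guess" lets the editor fall back to whatever encoding is configured instead.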
Given scary issues such as #85821, I am putting UTF-16 and UTF-32 back on the list of encodings that are ignored for guessing. UTF-8 can still be guessed.
utf-8 encoding guessing is disabled by this line:
![image](https://user-images.githubusercontent.com/7285119/68597276-a622ce00-04d7-11ea-95e9-a77d30e501ea.png)
['ascii', 'utf-8', 'utf-16', 'utf-32']
The reason given is "Ignore encodings that cannot guess correctly", but this link does not mention that.
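A minimal sketch of how an ignore list like the one quoted above suppresses a guess. The list contents come from the issue, but the function `filterGuess`, the constant names, and the lowercase comparison are assumptions made for illustration, not the actual VS Code code.

```typescript
// The list as quoted in the issue.
const ORIGINAL_IGNORED: string[] = ['ascii', 'utf-8', 'utf-16', 'utf-32'];
// The same list with 'utf-8' removed, which is what the issue asks for.
const WITHOUT_UTF8: string[] = ORIGINAL_IGNORED.filter((e) => e !== 'utf-8');

// Hypothetical helper: drop a guessed encoding when it is on the ignore list.
// Returning null means "no usable guess; fall back to the configured encoding".
function filterGuess(guessed: string | null, ignored: string[]): string | null {
  if (guessed === null) {
    return null;
  }
  // Assumption: compare case-insensitively, since detectors may report 'UTF-8'.
  return ignored.indexOf(guessed.toLowerCase()) >= 0 ? null : guessed;
}

console.log(filterGuess('utf-8', ORIGINAL_IGNORED)); // null
console.log(filterGuess('utf-8', WITHOUT_UTF8)); // utf-8
console.log(filterGuess('ascii', WITHOUT_UTF8)); // null
```

With the original list, a correct utf-8 guess is thrown away before it can be used. Removing only 'utf-8' keeps the other three entries ignored.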
In my test, the following text is correctly guessed as utf-8 (result='utf-8').
jschardet result:
On the other hand, with the following setting, it is detected as GBK.