Float string formatting with no specified presentation type behavior. #99694
Comments
The docs could possibly be clarified here; "same as 'g'" is intended to mean "the same style as 'g'-formatting", but without specifying the precision used. The precision is described later in the same paragraph:
So in the example you give, the relevant precision would be
>>> value = 3141500.0
>>> f"{value:.7g}"
'3141500'
>>> f"{value}"
'3141500.0'
Perhaps we'd do better not to mention the "g" format at all here, and to give a lower-level description of the behaviour. |
OTOH, a precision of 5 is also large enough to represent the given value faithfully for this particular example, so that text is indeed misleading. |
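A minimal sketch of the distinction discussed in these two comments, assuming a CPython build that uses the short float repr (the common case). It only restates the example value from the thread: with no presentation type and no precision the output matches `repr`, whereas `'g'` falls back to its default precision of 6 unless one is given.

```python
value = 3141500.0

default = format(value, "")      # no presentation type, no precision
g_default = format(value, "g")   # 'g' with its default precision of 6
g_five = format(value, ".5g")    # 'g' with 5 significant digits
g_seven = format(value, ".7g")   # 'g' with 7 significant digits

print(default)    # 3141500.0 (same text as repr(value))
print(g_default)  # 3.1415e+06
print(g_five)     # 3.1415e+06
print(g_seven)    # 3141500
```

None of the `'g'` variants reproduces the default output exactly, which is why "same as 'g'" reads as misleading.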
I think you mean when the presentation type is the empty string? I don't think it can be None.
|
Even
is basically meaningless without further clarification:
gives
There should at least be information about how floats get rounded when cast to strings. @ericvsmith Yes, the question is what happens when there is no explicit presentation type in the format spec. under |
The doc says None, not |
Are there any suggested changes here? If not, I think we should close this. |
Yes. On this page https://docs.python.org/3/library/string.html#formatspec the "None" row for presentation type is ambiguous at best and flat-out wrong at worst. See the two examples above. The main issues are (1) the comparison with the I tried to dig into some source code to figure out how and where floats get rounded for string formatting, but I wasn't able to figure it out since I have very little familiarity with the C code underlying Python. (1) could be addressed by changing the documentation in the linked page. (2) could be addressed on this page or possibly another documentation page. Also, for what it's worth, though this shouldn't really be necessary, the reason I became interested in this is that I've been looking at a project that involves custom numeric string formatting. The project will either extend or re-implement the native Python float string formatting, but some edge cases are challenging to code to or unit test without clearer documentation on what native behavior is expected. |
Without a format spec, it can still show scientific notation:
>>> format(1e100, '')
'1e+100'
Or am I missing your point? For the other point, @mdickinson can answer better than I, but I don't think there are any guarantees. Where possible, we use the so-called "short float repr", but this isn't guaranteed on all platforms. |
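As a rough illustration of the point above, the following sketch reports which repr style the running interpreter uses and shows where an empty format spec switches to scientific notation. The sample values are chosen here for illustration, and the outputs in the comments assume the short float repr.

```python
import sys

# 'short' on most platforms; 'legacy' where the shortest-repr code is unavailable.
print(sys.float_repr_style)

samples = [1e15, 1e16, 1e100, 0.0001, 0.00001]
results = {x: format(x, "") for x in samples}

for x, text in results.items():
    # With an empty format spec the text is the same as repr(x).
    print(text, text == repr(x))
# 1000000000000000.0 True
# 1e+16 True
# 1e+100 True
# 0.0001 True
# 1e-05 True
```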
Ignoring the docs completely for a moment, it may be helpful to look at the source to understand the current behaviour. On a typical machine (i.e., one where we're not being forced back to the legacy code instead of using Lines 1006 to 1011 in 2e80c2a
With some debugging
>>> x = 1729.3141
>>> format(x, '')
format_code=r, mode=0, precision=0
flags: add_dot_0_if_integer
'1729.3141'
>>> format(x, 'g')
format_code=g, mode=2, precision=6
flags: none
'1729.31'
>>> format(x, '.3')
format_code=g, mode=2, precision=3
flags: add_dot_0_if_integer
'1.73e+03'
>>> format(x, '.3g')
format_code=g, mode=2, precision=3
flags: none
'1.73e+03'
>>> format(x, '21')
format_code=r, mode=0, precision=0
flags: add_dot_0_if_integer
'            1729.3141'
>>> format(x, '21g')
format_code=g, mode=2, precision=6
flags: none
'              1729.31'
There are two separate cases for a "no presentation type" format specification:
As to the effects of that
Here's an example where that difference manifests itself:
>>> format(157.6, ".3g")
'158'
>>> format(157.6, ".3")
'1.58e+02'
In that second case, if we'd followed the exact same rules as for
>>> format(157.6, ".4g")
'157.6'
>>> format(157.6, ".4")
'157.6'
Similarly, with a precision of
>>> format(157.6, ".2g")
'1.6e+02'
>>> format(157.6, ".2")
'1.6e+02'
And a case where that extra
>>> format(1576.0, ".6g")
'1576'
>>> format(1576.0, ".6")
'1576.0' |
So going back to the docs, it looks as though we're not clearly expressing that "with precision" versus "without precision" distinction, and then we're kinda mashing the two cases together in the text. The
text is accurate, but only when a precision is used, while the
text is accurate in the case where no precision is used (and is valid for both the legacy and the dtoa.c-based floating-point repr in that case). |
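The "with precision" versus "without precision" split can be checked directly. This is a small sketch that reuses the value from the debugging session, not a statement about the implementation: without a precision the digits are those of `repr` and a width only pads, while with a precision the digits follow `'g'`-style rounding.

```python
x = 1729.3141

# Case 1: no precision. The digits come from repr(); a width only pads.
no_precision = format(x, "")
padded = format(x, "21")

# Case 2: precision given. The digits follow 'g'-style rounding.
with_precision = format(x, ".3")

print(no_precision)    # 1729.3141
print(repr(padded))    # right-aligned in a field of width 21
print(with_precision)  # 1.73e+03
```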
@mdickinson thanks so much for the thorough analysis, it's just what I needed. I'll parse through this and see if I can draft revised documentation for this. A note to all, I'm not really familiar with |
Here's my attempt at a summary of the cases. I'm considering the formatting of a float to a string when no presentation type is specified. In this case:
Do these statements seem correct? @mdickinson? The usage of |
Doesn't that example match the docs? Can you say what output you were expecting instead? |
Sorry, I just deleted my comment because I started to think the same. I'll see if I can restore it. The case was
I was expecting 100.00 because that is the result of I guess this is the use case for
|
Thanks. Yes, the doc wording is using "insignificant" in the sense that removing the zero doesn't change the value (as opposed to, for example, removing a trailing zero from the string |
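To make the "insignificant trailing zeros" wording concrete, here is a short sketch. The value and format specs are chosen for illustration and are not necessarily the ones from the deleted comment. It relies on the documented `#` (alternate form) option, which stops `'g'` from removing trailing zeros.

```python
value = 100.0

stripped = format(value, ".5g")  # trailing zeros removed by 'g'
kept = format(value, "#.5g")     # alternate form keeps all 5 significant digits
fixed = format(value, ".2f")     # fixed-point always shows 2 decimals

print(stripped)  # 100
print(kept)      # 100.00
print(fixed)     # 100.00
```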
Responding to your earlier comment:
Yes, I think that's accurate.
Yes. For floats, one of the constraints being applied for the no-precision case is that the output should "look like" a float rather than appearing to be an integer - that is, the output string should have either a decimal separator or an exponent indicator. A second constraint is that (for whatever reason - it's essentially cosmetic), we never produce a trailing "." without a following digit.
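The two constraints described above can be spot-checked with a small sketch. `looks_like_float` is a hypothetical helper written for this illustration, and the sample values are arbitrary finite floats (`inf` and `nan` are outside what this check covers).

```python
def looks_like_float(text: str) -> bool:
    """Hypothetical check: the text has a decimal point or an exponent marker."""
    return "." in text or "e" in text


samples = [5.0, 1576.0, 0.1, 123456789.0, 1e22, 1e-7]
formatted = [format(x, "") for x in samples]

for text in formatted:
    # Constraint 1: never looks like an integer.
    # Constraint 2: never ends in a bare ".".
    print(text, looks_like_float(text), not text.endswith("."))

# By contrast, 'g' is free to drop the fractional part entirely.
print(format(5.0, "g"))  # 5
```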
Again, this seems accurate, yes. I do understand the frustration: for a lot of this, the reason behind the behaviour is no better than history + "C did it/does it that way", so this is the behaviour that users expected at the time. But it's virtually impossible to change these sorts of details without breaking someone's code, somewhere. (For this exact reason I have a back-burner project for a third-party library to generalise formatting and rounding tasks and make them more consistent and flexible, but it may be some time before it sees the light of day.) |
No, I was incorrect that no format string plus no precision is similar to Your formatting and rounding package sounds interesting. I think the Python community is missing a standalone package for easy formatting of scientific numbers. The native formatting is close enough that I can understand why something hasn't arisen. But it seems to be solving slightly different problems, so it ends up being a little clunky to get just what you want if you're being very particular about digits (like you sometimes are in certain sciences). The uncertainties package does a pretty nice job of this, but its formatting syntax is intertwined pretty heavily with the native Python string formatting, which, in my opinion, is a bit clunky. I've written my own uncertainties formatting which incorporates rounding using the |
I don't yet know how to contribute to the code that generates the Python documentation. I'll work on learning that, but in the meantime, here's a first draft of my proposed changes. under
to
Simply remove the word "insignificant". under rewrite the following:
A comment here: The main blind spot in the documentation of the "None" presentation type is that the rules for the Under
to
At the bottom (or top) of the table include the comment
|
Well, the docs now say: "For float this is like the 'g' type, except that when fixed-point notation is used to format the result, it always includes at least one digit past the decimal point, and switches to the scientific notation when exp >= p - 1. When the precision is not specified, the latter will be as large as needed to represent the given value faithfully." This issue seems to me to be a duplicate of #123560, so it can probably be closed now. |
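The quoted rule can be sketched in pure Python for the case where a precision is given. `format_no_type` is a hypothetical helper written for this illustration; it is an emulation built from the wording above, not CPython's actual code path, and it has only been checked against the examples that appear in this thread.

```python
import math


def format_no_type(x: float, p: int) -> str:
    """Hypothetical emulation of format(x, f".{p}") for a float x.

    'g'-style digits, scientific notation when exp >= p - 1 (or exp < -4),
    and at least one digit after the point in fixed notation.
    """
    if not math.isfinite(x):
        return format(x, "")  # inf and nan have no digits to round
    p = max(p, 1)  # assumption: like 'g', a precision of 0 is treated as 1
    # Round to p significant digits and read off the decimal exponent.
    mantissa, exp_text = f"{x:.{p - 1}e}".split("e")
    exp = int(exp_text)
    if -4 <= exp < p - 1:
        # Fixed notation: p significant digits means p - 1 - exp decimals.
        text = f"{x:.{p - 1 - exp}f}".rstrip("0")
        return text + "0" if text.endswith(".") else text
    # Scientific notation: drop trailing zeros from the mantissa.
    if "." in mantissa:
        mantissa = mantissa.rstrip("0").rstrip(".")
    return f"{mantissa}e{exp:+03d}"


cases = [(157.6, 2), (157.6, 3), (157.6, 4), (1576.0, 6), (1729.3141, 3)]
for x, p in cases:
    print(x, p, format_no_type(x, p), format(x, f".{p}"))
```

The only differences from plain `'g'` are the `p - 1` threshold and the restored `.0`, which is what produces `'1.58e+02'` for `format(157.6, ".3")` and `'1576.0'` for `format(1576.0, ".6")`.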
Documentation
The string format specification mini-language is used to customize the presentation of floats (and other types) as strings. Details like the minimum field width and precision of the representation of the float can be controlled. Documentation for the mini-language is found at https://docs.python.org/3/library/string.html#format-specification-mini-language.
In the documentation it states that
However, I don't find this to be the case in practice:
results in
Is there accurate documentation about the expected formatting of floats when the presentation type is not specified? If not, could someone describe the expected behavior?