-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some improvements around Q_rsqrt()
#1458
Conversation
1146cce
to
f6d637e
Compare
Hmm, I removed the |
f6d637e
to
d535089
Compare
I readded that commit, what breaks the test is the removal of the second iteration. How do we check test to be true, by comparing against previous values we computed with Dæmon? |
Yeah that's how I made the patch trace tests. Since there is a long chain of calculations for converting a patch to collision geometry, I wasn't able to find the platonically ideal value (unlike with brushes where it easy). Making traces imprecise is probably much more likely to cause bugs than graphics stuff. Let's not mess with traces. |
But is it correct to do |
src/engine/qcommon/q_math.cpp
Outdated
@@ -1082,7 +1082,7 @@ void PerpendicularVector( vec3_t dst, const vec3_t src ) | |||
/* | |||
* * normalize the result | |||
*/ | |||
VectorNormalize( dst ); | |||
VectorNormalizeFast( dst ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely don't want to see the inaccurate version used for basic math routines. It's OK to use the sloppy version for graphics-specific stuff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it inaccurate? They both use Q_sqrt()
(and then they both use _mm_rsqrt_ps()
).
// returns vector length
inline vec_t VectorNormalize( vec3_t v )
{
vec_t length = DotProduct( v, v );
if ( length != 0.0f )
{
vec_t ilength = Q_rsqrt( length );
/* sqrtf(length) = length * (1 / sqrtf(length)) */
length *= ilength;
VectorScale( v, ilength, v );
}
return length;
}
// does NOT return vector length, uses rsqrt approximation
// fast vector normalize routine that does not check to make sure
// that length != 0, nor does it return length
inline void VectorNormalizeFast( vec3_t v )
{
vec_t ilength = Q_rsqrt( DotProduct( v, v ) );
VectorScale( v, ilength, v );
}
The comment before VectorNormalizeFast()
saying uses rsqrt approximation
is misleading, they both uses rsqrt. Maybe long time ago in the past the other one didn't, but they both do.
The difference is that VectorNormalize()
tests for the length not being zero, skips an useless scale if it is, and returns the length.
In the past I did the contrary, replacing VectorNormalizeFast()
with VectorNormalize()
in hope the extra test would actually make it faster by skipping the scale, it was slower because of the test breaking the vectorization.
The two functions should produce the same results as they use the same operations, one of them is just slower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I commented on an irrelevant diff here, but I stand by the words themselves. The old version of Q_rsqrt was pretty accurate, having only the least significant 1-2 binary places wrong, so it was good enough to use more or less anywhere. Now if we are going to make it a rough approximation, we shouldn't be using it all the time.
95fa8f8
to
4ba3860
Compare
Actually 3 constants are needed to be modified to get better precision. I reimplemented the |
Was it correct to always do the second iteration of the magic trick even when using SSE's |
The inline __m128 sseQuatNormalize( __m128 q ) {
__m128 p = _mm_mul_ps( q, q );
__m128 t, h;
p = _mm_add_ps( sseSwizzle( p, XXZZ ),
sseSwizzle( p, YYWW ) );
p = _mm_add_ps( sseSwizzle( p, XXXX ),
sseSwizzle( p, ZZZZ ) );
t = _mm_rsqrt_ps( p );
h = _mm_mul_ps( _mm_set1_ps( 0.5f ), t );
t = _mm_mul_ps( _mm_mul_ps( t, t ), p );
t = _mm_sub_ps( _mm_set1_ps( 3.0f ), t );
t = _mm_mul_ps( h, t );
return _mm_mul_ps( q, t );
} |
92c11f2
to
a2e2874
Compare
Using the values from http://rrrola.wz.cz/inv_sqrt.html the compute is wrong if we do two iterations, but it is meant to give good results with a single iteration anyway. |
ioquake3 has a second function named On our side in
Do you know some tracing code that uses |
I've checked many idtech-derivated sources like ioquake3, vkQuake, Qfusion, ET:Xreal, ET:Legacy, RBQUAKE3, JediOutcast, OpenJK, KingpinQ3, dhewm3, UFO:AI, Smokin' Guns, World of Padman, Urban Terror, ETe, Quake3e, Qio, iodfe, FTE:QW and many others… Some use two iterations in a second function used exclusively in tracing code. The only ones using two iterations in generic KingPingQ3 is a fork of XreaL like us, so it's just inherited. UFO:AI has very different code so maybe the function is also used for tracing (unlike what we do). |
I'm just losing my time, the question should not be:
It doing two iterations is actually useless, we're the only ones doing that. The question should be:
Many others do that. |
Here is something we may use for tracing (we currently use inline float Q_rsqrt_precise( const float n )
{
static const float a = 0.5f;
static const float b = 3.0f;
#if defined(DAEMON_USE_ARCH_INTRINSICS_i686_sse)
float o;
// SSE rsqrt relative error bound: 3.7 * 10^-4
_mm_store_ss( &o, _mm_rsqrt_ss( _mm_load_ss( &n ) ) );
#else
/* Magic constant from Quake 3.
See https://github.com/id-Software/Quake-III-Arena/blob/dbe4ddb/code/game/q_math.c#L561
const uint32_t magic = 0x5f3759dful; */
/* Magic constant computed by Chris Lomont.
See: https://www.lomont.org/papers/2003/InvSqrt.pdf */
static const uint32_t magic = 0x5f375a86ul;
float o = RsqrtMagic( n, magic );
o *= a * ( b - n * o * o );
#endif
// Two iterations for higher precision.
o *= a * ( b - n * o * o );
return o;
} Here is what we may use for everything else: inline float Q_rsqrt( const float n )
{
#if defined(DAEMON_USE_ARCH_INTRINSICS_i686_sse)
float o;
_mm_store_ss( &o, _mm_rsqrt_ss( _mm_load_ss( &n ) ) );
#else
static const float a = 0.703952253f;
static const float b = 2.38924456f;
static const uint32_t magic = 0x5f1ffff9ul;
float o = RsqrtMagic( n, magic );
o *= a * ( b - n * o * o );
#endif
return o;
} Both codes assume this exist: #if defined(DAEMON_USE_ARCH_INTRINSICS_i686_sse)
#else
inline float RsqrtMagic( const float n, const uint32_t magic )
{
return Util::bit_cast<float>( magic - ( Util::bit_cast<uint32_t>( n ) >> 1 ) );
}
#endif |
b0069d3
to
1fc022f
Compare
If you get a worse result after a second iteration, you are doing something wrong. The Newton's method iteration should always get closer to the answer.
Yes that improves the precision a lot
To get good precision
I would rather phrase it as a fast one for graphics code, and a correct one for everything else. |
On the other hand Also I looked at the game code, there is no usage of it. It looks like only some of the renderer is using |
Not sure it applies here, rrrola not only provides a different magic value, but also provides other values than |
Not in our code base. Apparently Q_rsqrt has used a precise version since 2013. |
Well, one can just use the |
1fc022f
to
4abf136
Compare
So now is the status of the PR:
I removed other things. Anyway:
In that regard, not naming The purpose of this PR is to speedup |
4abf136
to
4cb2b6f
Compare
For information, the only functions remaining to use
The functions using
|
src/engine/qcommon/q_shared.h
Outdated
// relative error bound after the initial iteration: 1.8 * 10^-3 | ||
/* Magic constants by Jan Kadlec. | ||
See: http://rrrola.wz.cz/inv_sqrt.html */ | ||
static const float a = 0.703952253f; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use constexpr
not static const
src/engine/qcommon/q_shared.h
Outdated
static const float a = 0.5f; | ||
static const float b = 3.0f; | ||
float o = Q_rsqrt_fast( n ); | ||
// Second iteration for higher precision/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment no longer makes sense since the preceding calculations are different. You could describe this as "Do an iteration of Newton's method for finding the zero of f(x) = 1/x^2 - n"
The relative error bound is: 2.00010826×10⁻⁷ */ | ||
|
||
// Compute approximate inverse square root. | ||
inline float Q_rsqrt_fast( const float n ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what is the error bound for Q_rsqrt_fast?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The page http://rrrola.wz.cz/inv_sqrt.html says:
max rel_error | avg rel_error² |
---|---|
6.50196699×10⁻⁴ | 2.00010826×10⁻⁷ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what means the 2
in avg rel_error²
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah it may simple be the squared relative error
, annoying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I guess the relative error is √2.00010826×10⁻⁷
so 4.4722569917212941×10⁻⁴
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
max rel error is what you want for a unit test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, when compiled with -DUSE_ARCH_INTRINSICS=OFF
, with a relative tolerance of 4.4722569917212941e-4
, the tests for Q_rsqrt_fast()
fail this way:
src/engine/qcommon/q_math_test.cpp:228: Failure
Value of: Q_rsqrt_fast(1e-6)
Expected: is approximately 1000 (absolute error <= 0.44722569)
Actual: 1000.5 (of type float), which is 0.502686 from 1000
src/engine/qcommon/q_math_test.cpp:229: Failure
Value of: Q_rsqrt_fast(0.036)
Expected: is approximately 5.270463 (absolute error <= 0.0023570864)
Actual: 5.27387 (of type float), which is 0.00340557 from 5.27046
src/engine/qcommon/q_math_test.cpp:232: Failure
Value of: Q_rsqrt_fast(3)
Expected: is approximately 0.57735032 (absolute error <= 0.0002582059)
Actual: 0.576975 (of type float), which is -0.00037539 from 0.57735
[ FAILED ] QSharedMathTest.FastInverseSquareRoot (0 ms)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
max rel error is what you want for a unit test
Ah OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the tests pass (both with SSE and without).
565861e
to
6879ccb
Compare
LGTM but I want the error bounds to be more clearly documented. The error bound number for Q_rsqrt_fast is in there somewhere, but it is unclear whether it applies to that function or something else from the giant wall of text. And Q_rsqrt now says nothing at all. Maybe we don't know what it is exactly but we can say it's as least as good as the old one. Like the relative error is at most 5e-6, probably less. |
6879ccb
to
7d522ec
Compare
Done. |
Q_sqrt_fast()
that skips the second iteration, it is believed to be useless:Do we need this level of precision in
Q_rsqrt
? #1457Q_rsqrt
trick,the values come from: http://rrrola.wz.cz/inv_sqrt.html
Q_sqrt_fast()
inR_TBNtoQtangents()
andVectorNormalizeFast()
.