Clean up path processing #319
base: develop
Conversation
The following constant expression:

    1024 * 1024

should be evaluated by the compiler as though it were:

    (int)1024 * (int)1024

and one would expect the result to be:

    (int)1048576

However, strictly speaking, an `int' is only required to be able to represent at least the number 32767, whereas a `long int' must be able to represent at least 2147483647. Thus, strictly speaking, proper C code should specify that at least one of the integer constants is to be represented as a `long int':

    1024L * 1024

Consider, also, the following program:

```c
#include <limits.h> // UINT_MAX
#include <stdio.h>  // printf

#define PRINT_SIZE_OF(type) \
    printf("sizeof(" #type ") = %u\n", (unsigned int)sizeof(type))

#define PRINT_UNSIGNED_LONG_LONG_INT(expr) \
    printf(#expr " = %llu\n", (unsigned long long int)(expr))

int main()
{
    PRINT_SIZE_OF(int);
    PRINT_SIZE_OF(long long int);
    PRINT_UNSIGNED_LONG_LONG_INT(UINT_MAX + 1);
    PRINT_UNSIGNED_LONG_LONG_INT(UINT_MAX + 1ULL);
    PRINT_UNSIGNED_LONG_LONG_INT(65536U * 65536);
    PRINT_UNSIGNED_LONG_LONG_INT(65536ULL * 65536);
}
```

That program might produce the following output:

    sizeof(int) = 4
    sizeof(long long int) = 8
    UINT_MAX + 1 = 0
    UINT_MAX + 1ULL = 4294967296
    65536U * 65536 = 0
    65536ULL * 65536 = 4294967296

As you can see, specifying the type of an integer constant can be important. Take, for example, one of the above calculations:

    65536U * 65536

Without the "U" suffix to specify that the integer constant has an unsigned type, the compiler would compute:

    (int)65536 * (int)65536

On a machine where `int' is 4 bytes (as above), this would result in overflow of a signed integer type, which invokes undefined behavior; a good compiler would probably warn about it, or even produce an error.
The main problem was that the error message used `Negativ' rather than `Negative'. However, the message was also a little too playful; this commit changes the entire message to something a little more dour.
lib/cmdline.c (Outdated)

```diff
@@ -1165,11 +1165,11 @@ void rm_cmd_set_paths_from_cmdline(
     g_assert(v->paths);

     for(char *path, **list = v->paths; (path = *list); ++list) {
-        if(strcmp(path, "-") == 0) {
+        if(path[0] == '-' && path[1] == 0) {
```
Please revert that. That's over optimization and hurts readability.
The following reply is taken from the log message of the updated commit.
The way `rmlint` currently works is that it produces a shell script that sometimes performs its work by running `rmlint` many times with varying arguments; `rmlint` can be run a large number of times, and each invocation involves argument processing whereby each path argument undergoes multiple char-by-char comparisons in search of a match for a special, well-known value (e.g., "-" or "//"). Therefore, this commit optimizes the process by using tailored, idiomatic C-string calculations in lieu of invoking a generic, open-ended, arbitrary comparison function; this commit replaces the Standard C Library's `strcmp()` with boolean operators and character comparisons.
The calculations are idiomatic in that they are a familiar way of processing strings, but are somewhat particular to C code. To aid reading this code, comments have been embedded to clarify the purpose, as in the following:
    if(/* path is "-" */ path[0] == '-' && path[1] == 0)
It's true that an optimizing compiler might perform this substitution automatically, or it might even make a better, platform-specific optimization. However, this seems unlikely based on the following rudimentary benchmarks; first define some Bash functions:
```shell
# Copy this script with the following command line, where "$THIS"
# expands to the revision of this commit log message:
#
# git log -1 --format=%b "$THIS" |
#     sed -n '/^ #-/,/^ #-/p' | cut -c3- >/tmp/script.sh
#
#----------------------------------------------------------------
make_test()
{
	# Note that the body of this function is
	# explicitly indented with tab characters
	# for the benefit of the "<<-" operator.
	local name="$1" expression="$2"
	cat >"$name".c <<-EOF
	#include <stdbool.h> // bool
	#include <string.h>  // strcmp
	#define ITERATIONS 10000000 // 10 million
	int main()
	{
		int result = 0;
		char string[3] = {0};
		const char *const volatile pointer = string;
		bool toggle = 0;
		for (long i = 0; i < ITERATIONS; ++i) {
			string[0] = string[1] = '/' + toggle;
			toggle = !toggle;
			const char *const s = pointer;
			result += $expression;
		}
		return result;
	}
	EOF
}
build()
{
	local name="$1"; shift
	gcc -std=c99 -pedantic -Wall -static -fno-plt -fno-pie "$@" \
		"$name".c -o "$name"
}
build_all() { build strcmp "$@" && build equals "$@"; }
run() { time -p ./"$1"; }
benchmark()
{
	echo =========== "$@"
	run strcmp
	echo -----------
	run equals
}
build_and_benchmark()
{
	build_all "$@" || return 1
	benchmark "$@"; return 0
}
#----------------------------------------------------------------
```
Then, create and run some benchmarks:
```shell
make_test strcmp 'strcmp(s,"//")'
make_test equals "(s[0] == '/' && s[1] == '/' && s[2] == 0)"
gcc --version | head -n 1
build_and_benchmark -O0
build_and_benchmark -Og
build_and_benchmark -O1
build_and_benchmark -O2
```
All together, this produces the following output on my [very old and slow] system; at each optimization level, timing information is provided first for `strcmp()` and then for the manual calculation. Perversely, optimization seems to make `strcmp()` slower, not faster; in contrast, optimization drastically improves the performance of the manual calculation, which is also always faster than using `strcmp()`.
```
gcc (GCC) 7.3.1 20180312
=========== -O0
real 0.35
user 0.35
sys 0.00
-----------
real 0.24
user 0.24
sys 0.00
=========== -Og
real 0.58
user 0.57
sys 0.00
-----------
real 0.15
user 0.15
sys 0.00
=========== -O1
real 0.60
user 0.59
sys 0.00
-----------
real 0.16
user 0.16
sys 0.00
=========== -O2
real 0.58
user 0.57
sys 0.00
-----------
real 0.07
user 0.07
sys 0.00
```
Assembler code for the unoptimized case can be produced with the following:
```shell
build_all -S -O0
cat strcmp # review
cat equals
```
Here is the meat of the comparison:
```
strcmp                    |  equals
------------------------- | -------------------------
.LC0:                     | movl -20(%ebp), %eax
.string "//"              | movzbl (%eax), %eax
...                       | cmpb $47, %al // s[0] == '/'
pushl $.LC0               | jne .L3
pushl -20(%ebp)           | movl -20(%ebp), %eax
call *strcmp@GOT          | addl $1, %eax
                          | movzbl (%eax), %eax
                          | cmpb $47, %al // s[1] == '/'
                          | jne .L3
                          | movl -20(%ebp), %eax
                          | addl $2, %eax
                          | movzbl (%eax), %eax
                          | testb %al, %al // s[2] == 0
                          | jne .L3
                          | movl $1, %eax
                          | jmp .L4
                          | .L3:
                          | movl $0, %eax
```
As you can see, one is calling a function, while the other is performing the manual calculations. Now, consider the optimized version:
```shell
build_all -S -O2
cat strcmp
cat equals
```
This yields:
```
strcmp                    |  equals
------------------------- | -------------------------
.LC0:                     | movl -20(%ebp), %ebx
.string "//"              | cmpb $47, (%ebx)
...                       | jne .L2
movl -36(%ebp), %esi      | cmpb $47, 1(%ebx)
movl $.LC0, %edi          | jne .L2
...                       | cmpb $1, 2(%ebx)
movl $3, %ecx             |
repz cmpsb                |
seta %cl                  |
setb %bl                  |
subl %ebx, %ecx           |
movsbl %cl, %ecx          |
```
There is no longer a call to `strcmp()`, but the compiler has substituted what is essentially an exact replica of `strcmp()` in terms of x86 assembler code; the `repz cmpsb` iterates through 2 strings in memory, comparing the bytes, stopping when there is a mismatch. One would think that a native x86 string-comparison instruction sequence would be inherently faster, but strangely the benchmarks reveal that the naive integer comparisons are much faster. I suspect the reason for this is 2-fold:
- Perhaps `repz cmpsb` is performing a lot more memory access; in contrast, each naive integer comparison instruction can encode a comparand directly (e.g., the number 47, which is ASCII for '/').
- Maybe `repz cmpsb` cannot benefit from instruction pipelining as readily as separate integer comparisons.
Perhaps the GNU GCC team should reconsider how `strcmp()` is optimized.
Anyway, the results are clear: It's better to do these small calculations by hand, rather than rely on the compiler to do something magical.
Your answer was again enlightening and I learned (again!) something out of it.
Thanks. But...
> Anyway, the results are clear: It's better to do these small calculations by hand, rather than rely on the compiler to do something magical.
The results are clear for a question that was not asked. The question you answered is »How fast is manually comparing two characters by hand vs. strcmp on different optimization levels?«. The question I had in mind (and I apologize for not being more clear on that) was »Is it really worth optimizing this small detail of the command-line parsing code, with respect to other traits like total performance, readability, maintainability, and the general cost of changing code?«
By reading your other comment, you seem to answer this question with a »yes« though:
> In addition to the other comment, I do not think there is such a thing as "over-optimization"; the problem with optimization is not that it's over done, but rather as you imply: It might damage some other aspect of the program, such as readability.
(The following is mostly general about optimizations, not exactly about strcmp)
You're right in that optimizing too much can hurt programs in other ways. Not only readability (optimized code is often longer), but also maintainability (changes get harder and harder to understand), suddenly having problems on weird platforms, badly carried-out optimizations that produce wrong results (sometimes without you noticing!), or simply losing time that could have been better spent on e.g. bug fixes or features. I have been bitten by every one of these.
Maybe I should have used the already-coined and more often used term »premature optimization«. Still, optimizations are not an asset by themselves. They are a liability. If I have to decide whether something needs to be optimized, it boils down to two rules (other projects might handle that differently, obviously):

- Don't optimize before something has become too slow. It does not matter how fast you're calculating a wrong result.
- If there is a need for optimization, try to optimize the big portions first. In rmlint's case, that is everything I/O-related.
Defining "too slow" for rule 1 is difficult, but "too slow" for rmlint means: some user (or I or @SeeSpotRun) complained it is too slow for their use case and convinced us that it is indeed a problem of rmlint.
This is of course not an exact science, but "exact science" is a description that rarely fits leading an open-source project. After all, code is not about machines; it's about people, and merely for machines. rmlint is something that is actually used by people, which shifts the priorities a lot more toward stability and being conservative regarding changes, and less toward optimizing every cycle out of it to reach »perfection« (google keywords: "perfect is the enemy of good").
Now, to the strcmp example:
> Then again, why should readability be the winning aspect? Perhaps strcmp() is over-readablized, to the unnecessary detriment of optimization. Indeed, a C-string is explicitly an array of characters, and inspecting individual characters (especially when looking for a very short string) is quite normal—it's one of the killer features C was designed to have.
As described above, it's not only about readability. A more appropriate benchmark here would have been running the rmlint binary with a few hundred paths (piped via stdin, e.g.) and making it exit after parsing command-line options. I highly doubt you'd see any significant difference in execution time there (see rule #2). And no, I'm not exactly asking for that number. ;-)
> If there is a need to improve readability, then other means should be employed before sacrificing run-time efficiency: Abstraction facilities (such as macros or inlinable functions or structs), or even just plain, free-form, natural human descriptions in the form of embedded comments.
My point is not how you optimized it and how to improve from there.
My point is that you did optimize it without taking rule #1 into account.
> I mean is something like strcmp(x,"y") == 0 actually that readable?
Probably not—like a lot of stuff we got used to in C (and if I were writing rmlint today, I would not use C). But it's a change, which needs to be evaluated to be worth it, and this explains my initial reaction of rejecting it. It would have been a different case if the code in question had been in a really tight loop somewhere, where it could keep I/O from happening.
Don't get this wrong, I'm not opposed to changes in general, and I'm thankful for every contribution. It's more that I, as project maintainer, have (to have) different priorities. Optimization for the sake of optimization may be fun, and okay-ish only if you don't have users yet. Imagine you managed to shave off all possible additional CPU cycles rmlint is currently spending. What would be the result? A hard-to-change code base that would use tons of specific unrolled code and compiler-specific stuff. The user wouldn't even notice—in the best case. I actually wanted to write something similar already about the previous inline PR, but decided in the end that it was a low-hanging fruit that did not involve logic changes. This PR, though, includes some more logic changes.
What's the result of this discussion now? I'm not sure that rejecting the "strcmp" change makes sense at this point, since we (well, mostly you) talked and thought about it in so much detail, so this can stay if you add a comment (e.g. with a link to this thread). See this more as future advice for your PRs.
EDIT: This of course does not mean that micro-optimizations are okay in future PRs.
On a different note: I have to go over the PR one more time since I forgot most
of the other changes in the meantime. Sorry for the delay, was a bit busy.
lib/cmdline.c (Outdated)

```c
g_assert(v->paths);

for(char *path, **list = v->paths; (path = *list); ++list) {
    if(path[0] == '-' && path[1] == 0) {
```
Please revert that to `strcmp` if you don't have numbers that prove otherwise. That is just over-optimization for me and does not help readability.
In addition to the other comment, I do not think there is such a thing as "over-optimization"; the problem with optimization is not that it's over done, but rather as you imply: It might damage some other aspect of the program, such as readability.
Then again, why should readability be the winning aspect? Perhaps `strcmp()` is over-readablized, to the unnecessary detriment of optimization. Indeed, a C-string is explicitly an array of characters, and inspecting individual characters (especially when looking for a very short string) is quite normal—it's one of the killer features C was designed to have.
If there is a need to improve readability, then other means should be employed before sacrificing run-time efficiency: Abstraction facilities (such as macros or inlinable functions or structs), or even just plain, free-form, natural human descriptions in the form of embedded comments.
I mean, is something like `strcmp(x,"y") == 0` actually that readable? Sure, a competent C programmer should be familiar with it, but he should also be familiar with indexing individual characters, too; both are equally readable from that perspective, so one might as well choose the solution that has better performance.
This simply introduces `const' discipline. Also, `g_assert(cfg);' has been inserted.
It was already returning values as though it did.
Before, `rm_cmd_parse_replay()' simply ignored whether there was a failure; now, it both returns a status and sets an error message.
This commit introduces a number of finer-grained functions related to resolving and verifying paths, and then uses those functions not only to refactor `rm_cfg_add_path()', but also to introduce a new function, `rm_cfg_prepend_json()'. The finer-grained functions are defined in a new source file: lib/path-funcs.h Similarly, `RmPath' has been moved to its own header: lib/path.h Additionally, some of the `#include' directives have been cleaned up as part of this refactoring; if source file `X' uses an identifier whose declaration comes from source file `Y', then `X' *always* includes `Y' explicitly (even if `Y' is included indirectly through some other source file), and each `#include' directive is associated with a comment that lists each such identifier that it provides.
Move it closer to `rm_cmd_set_paths()'.
By definition, `sizeof(char)' evaluates to 1.
A pointer to an `RmSession' object was being passed around, but only a pointer to its associated `RmCfg' object is needed.
Variables that need to be shared across `rm_cmd_set_paths*()' may be placed in an instance of this struct.
Before, it was just ignoring failures in `malloc()' and `getdelim()'. Also, one of the error messages in `lib/treemerge.c' has been set to match the error message introduced here.
It's only used locally in one function, so it can just be a local variable in that one function.
The compiler probably already does this, but I find it clearer and more satisfying to do this explicitly, especially since it allows avoiding a call to `g_strfreev()', rather than relying on that function to check `paths' for us again. For the sake of clarity in the diff, the existing code has not yet been indented to reflect the fact that it's part of the `if' statement's consequent.
You can check that only white space changed:

    $ git diff HEAD^ --ignore-space-change && echo Nothing\ changed
    Nothing changed
On each iteration of the loop, a path string is freed when processing is finished, and then the whole list of strings is simply freed without having to iterate through the list again to free each string.
These variables were handled directly in `rm_cfg_prepend_path()'; now, they're handled directly in `rm_cmd_set_paths*()'. This has allowed for optimizing the call to `rm_cfg_prepend_path()' that is used for adding the current working directory when no other path has been supplied by the user; there's no reason to check whether the current working directory is a JSON file.
The field `idx' of struct `RmPath' is now `index'.
Though an optimizing compiler might do this already, one might as well just do it explicitly, so as not to depend on the whims of any particular compiler.
The field `treat_as_single_vol' of struct `RmPath' is now `single_volume'.
Now, member `path_count' of a struct `RmCfg' has a more guaranteed and expected purpose: During and after the user-input processing of non-option path arguments, it represents the number of items in the associated list `paths'. This has allowed for the removal of a call to `g_slist_length()'.
This breaks the project, and is thus meant to be merged into a working commit.
This makes it consistent with `rm_cfg_prepend_json()', and it also implies that the list of files is being constructed such that it will be reversed with respect to the order in which paths are provided by the user.
Code from `rm_cmd_set_paths()' has been moved to its own function: rm_cmd_set_paths_from_cmdline()
Each of these functions is pretty much single-purpose, and should therefore be inlined at the few call sites.
When `cfg->replay' is false, there's no reason to check whether a path is a `json' file.
It's a static function that is used in only one place.
The functions in question are now finer-grained and inlinable, which is worthwhile because these functions are only used in basically one part of the program, and thus exist mainly as a matter of organization (abstraction, documentation, etc.).
The code has been organized into finer-grained functions/macros, and has thereby been easily optimized for certain cases.
Force-pushed 4627418 to 0d332c9.
The original patch series has essentially been updated as per the following diff:
This series entails cleanup and refactoring, which clarify the code and improve the handling of errors in user input and runtime. Here are the main commits:
Each of those is composed of finer-grained commits: