Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up path processing #319

Open
wants to merge 37 commits into
base: develop
Choose a base branch
from

Conversation

mfwitten
Copy link
Contributor

This series entails cleanup and refactoring, which clarify the code and improve the handling of errors in user input and runtime. Here are the main commits:

4 [HEAD] lib/cmdline.c: Refactor `rm_cmd_set_paths*()'
3 lib/cfg.c -> lib/cfg-funcs.h: Refactor functions
2 Miscellaneous improvements
1 [develop]

Each of those is composed of finer-grained commits:

4 [HEAD] lib/cmdline.c: Refactor `rm_cmd_set_paths*()'
|\  
| o lib/cmdline.c: INLINE `rm_cmd_set_paths()'
| o lib/cmdline.c: Tell compiler to optimize for the value of `replay'
| o lib/cmdline.c: Replace `strcmp()' with direct character comparisons
| o lib/cmdline.c: Abstract out `rm_cmd_set_paths_from_cmdline()'
| o lib/cfg.h: Use `RmCfg::path_count' more purposefully
| o lib/cmdline.c: Move `cfg->replay' and `cfg->path_count++' out of loops
| o lib/cfg-funcs.h: Move `cfg->replay' and `cfg->path_count++'
| o lib/cmdline.c: Free each command-line path as it is processed.
| o lib/cmdline.c: Use a pointer rather than an index integer for looping
| o lib/cmdline.c: Indent code to reflect the fact that it's in a block
| o lib/cmdline.c: Move constant loop condition (`paths') out of the loop
| o lib/cmdline.c: Consolidate the logic around whether to read paths from `stdin'
| o lib/cfg.h: Remove member `read_stdin'
| o lib/cmdline.c: Add error handling to `rm_cmd_set_paths_from_stdin()'
| o lib/cmdline.c: Abstract out struct `rm_cmd_set_paths_vars'
| o lib/cmdline.c: Replace `RmSession' with `RmCfg' in `rm_cmd_set_paths*()'
| o lib/cmdline.c: Simplify size in call to `malloc()'
| o lib/cmdline.c: Move `rm_cmd_read_paths_from_stdin()'
|/  
3 lib/cfg.c -> lib/cfg-funcs.h: Refactor functions
|\  
| o lib/cfg-funcs.h: Insert `g_assert(cfg);' into `rm_cfg_set_default()'
| o lib/cfg.c -> lib/cfg-funcs.h: Make functions inlinable
| |\  
| | o lib/cfg-funcs.h: Complete `git mv lib/cfg.c lib/cfg-funcs.h'
| | o lib/cfg.c: git mv lib/cfg.c lib/cfg-funcs.h
| |/  
| o lib/cfg.*: rm_cfg_prepend_path(): swap parameters `preferred' and `path'
| o lib/cfg.*: Rename `rm_cfg_add_path' to `rm_cfg_prepend_path()'
| o lib/path.h: s/treat_as_single_vol/single_volume/ in `RmPath'
| o lib/path.h: s/idx/index/ in `RmPath'
| o lib/cfg.*: Refactor `rm_cfg_add_path()'
| o lib/cmdline.c: Test parameters of `rm_cmd_parse_replay()' with `g_assert()'
| o lib/cmdline.c: return the status of `rm_cfg_add_path()'
| o lib/cfg.*: `rm_cfg_add_path()' now returns `bool'
| o lib/cfg.*: Clean up `rm_cfg_free_paths()'
|/  
2 Miscellaneous improvements
|\  
| o lib/cmdline.c: Fix error message
| o Use explicit types for some integer constants
|/  
1 [develop]

The following constant expression:

    1024 * 1024

should be evaluated by the compiler at runtime as though it were:

    (int)1024 * (int)1024

and one would expect the result to be:

    (int)1048576

However, strictly speaking, an `int'  is only required to be able
to represent  at least  the number 32767.  However, a  `long int'
must be able  to represent at least  `2147483647'. Thus, strictly
speaking, proper C  code should specify that at least  one of the
integer constants is to be represented as a `long int':

    1024L * 1024

Consider, also, the following program:

    #include <limits.h> // UINT_MAX
    #include <stdio.h>  // printf

    #define PRINT_SIZE_OF(type) \
        printf("sizeof(" #type ") = %u\n", (unsigned int)sizeof(type))

    #define PRINT_UNSIGNED_LONG_LONG_INT(expr) \
        printf(#expr " = %llu\n", (unsigned long long int)(expr))

    int main()
    {
        PRINT_SIZE_OF(int);
        PRINT_SIZE_OF(long long int);

        PRINT_UNSIGNED_LONG_LONG_INT(UINT_MAX + 1);
        PRINT_UNSIGNED_LONG_LONG_INT(UINT_MAX + 1ULL);

        PRINT_UNSIGNED_LONG_LONG_INT(65536U * 65536);
        PRINT_UNSIGNED_LONG_LONG_INT(65536ULL * 65536);
    }

That program might produce the following output:

    sizeof(int) = 4
    sizeof(long long int) = 8
    UINT_MAX + 1 = 0
    UINT_MAX + 1ULL = 4294967296
    65536U * 65536 = 0
    65536ULL * 65536 = 4294967296

As you can see, specifying the type of an integer constant can be
important. Take, for example, one of the above calculations:

    65536U * 65536

Without the "U"  suffix to specify that the  integer constant has
an unsigned type, the compiler would compute:

    (int)65536 * (int)65536

On the  same machine, this would  result in overflow of  a signed
integer type,  which invokes undefined behavior;  a good compiler
would probably warn about it, or even produce an error.
The main  problem was that  the error message used  `Negativ' rather
than `Negative'. However, the message was also a little too playful;
this commit  changes the entire  message to something a  little more
dour.
lib/cfg.c Show resolved Hide resolved
lib/cmdline.c Show resolved Hide resolved
lib/cmdline.c Outdated Show resolved Hide resolved
lib/cfg.c Outdated Show resolved Hide resolved
lib/cfg.c Outdated Show resolved Hide resolved
lib/cmdline.c Outdated Show resolved Hide resolved
lib/cmdline.c Outdated Show resolved Hide resolved
lib/cmdline.c Outdated
@@ -1165,11 +1165,11 @@ void rm_cmd_set_paths_from_cmdline(
g_assert(v->paths);

for(char *path, **list = v->paths; (path = *list); ++list) {
if(strcmp(path, "-") == 0) {
if(path[0] == '-' && path[1] == 0) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert that. That's over optimization and hurts readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following reply is taken from the log message of the updated commit.


The way rmlint currently works is that it produces a shell script that sometimes performs its work by running rmlint many times with varying arguments; rmlint can be run a large number of times, and each invocation involves argument processing whereby each path argument undergoes multiple char-by-char comparisons in search of a match for a special, well-known value (e.g., "-" or "//").

Therefore, this commit optimizes the process by using tailored, idiomatic C-string calculations in lieu of invoking a generic, open-ended, arbitrary comparison function; this commit replaces the Standard C Library's strcmp() with boolean operators and character comparisons.

The calculations are idiomatic in that they are a familiar way of processing strings, but are somewhat particular to C code. To aid reading this code, comments have been embedded to clarify the purpose, as in the following:

if(/* path is "-" */ path[0] == '-' && path[1] == 0)

It's true that an optimizing compiler might perform this substitution automatically, or it might even make a better, platform-specific optimization. However, this seems unlikely based on the following rudimentary benchmarks; first define some Bash functions:

# Copy this script with the following command line, where "$THIS"
# expands to the revision of this commit log message:
#
#   git log -1 --format=%b "$THIS" |
#     sed -n '/^  #-/,/^  #-/p' | cut -c3- >/tmp/script.sh
#
#----------------------------------------------------------------
make_test()
{
	# Note that the body of this function is
	# explicitly indented with tab characters
	# for the benefit of the "<<-" operator.

	local name="$1" expression="$2"

	cat >"$name".c <<-EOF
	#include <stdbool.h> // bool
	#include <string.h>  // strcmp

	#define ITERATIONS 10000000 // 10 million

	int main()
	{
	    int result = 0;
	    char string[3] = {0};
	    const char *const volatile pointer = string;
	    bool toggle = 0;
	    for (long i = 0; i < ITERATIONS; ++i) {
	        string[0] = string[1] = '/' + toggle;
	        toggle = !toggle;
	        const char *const s = pointer;
	        result += $expression;
	    }
	    return result;
	}
	EOF
}

build()
{
  local name="$1"; shift
  gcc -std=c99 -pedantic -Wall -static -fno-plt -fno-pie "$@" \
    "$name".c -o "$name"
}

build_all() { build strcmp "$@" && build equals "$@"; }

run() { time -p ./"$1"; }

benchmark()
{
  echo =========== "$@"
  run strcmp
  echo -----------
  run equals
}

build_and_benchmark()
{
  build_all "$@" || return 1
  benchmark "$@"; return 0
}
#----------------------------------------------------------------

Then, create and run some benchmarks:

make_test strcmp 'strcmp(s,"//")'
make_test equals "(s[0] == '/' && s[1] == '/' && s[2] == 0)"

gcc --version | head -n 1
build_and_benchmark -O0
build_and_benchmark -Og
build_and_benchmark -O1
build_and_benchmark -O2

All together, this produces the following output on my [very old and slow] system; at each optimization level, timing information is provided first for strcmp() and then for the manual calculation. Perversely, optimization seems to make strcmp() slower, not faster; in contrast, optimization drastically improves the performance of manual calculation, which is also always faster than using strcmp().

gcc (GCC) 7.3.1 20180312
=========== -O0
real 0.35
user 0.35
sys 0.00
-----------
real 0.24
user 0.24
sys 0.00
=========== -Og
real 0.58
user 0.57
sys 0.00
-----------
real 0.15
user 0.15
sys 0.00
=========== -O1
real 0.60
user 0.59
sys 0.00
-----------
real 0.16
user 0.16
sys 0.00
=========== -O2
real 0.58
user 0.57
sys 0.00
-----------
real 0.07
user 0.07
sys 0.00

Assembler code for the unoptimized case can be produced with the following:

build_all -S -O0
cat strcmp # review
cat equals

Here is the meat of the comparison:

         strcmp                    equals
------------------------- | -------------------------
   .LC0:                  |  movl    -20(%ebp), %eax
   .string "//"           |  movzbl  (%eax), %eax
   ...                    |  cmpb    $47, %al           // s[0] == '/'
   pushl   $.LC0          |  jne     .L3
   pushl   -20(%ebp)      |  movl    -20(%ebp), %eax
   call    *strcmp@GOT    |  addl    $1, %eax
                          |  movzbl  (%eax), %eax
                          |  cmpb    $47, %al           // s[1] == '/'
                          |  jne     .L3
                          |  movl    -20(%ebp), %eax
                          |  addl    $2, %eax
                          |  movzbl  (%eax), %eax
                          |  testb   %al, %al           // s[2] == 0
                          |  jne     .L3
                          |  movl    $1, %eax
                          |  jmp     .L4
                          |  .L3:
                          |  movl    $0, %eax

As you can see, one is calling a function, while the other is performing the manual calculations. Now, consider the optimized version:

build_all -S -O2
cat strcmp
cat equals

This yields:

         strcmp                    equals
------------------------- | -------------------------
 .LC0:                    |  movl    -20(%ebp), %ebx
 .string "//"             |  cmpb    $47, (%ebx)
 ...                      |  jne     .L2
 movl    -36(%ebp), %esi  |  cmpb    $47, 1(%ebx)
 movl    $.LC0, %edi      |  jne     .L2
 ...                      |  cmpb    $1, 2(%ebx)
 movl    $3, %ecx         |
 repz cmpsb               |
 seta    %cl              |
 setb    %bl              |
 subl    %ebx, %ecx       |
 movsbl  %cl, %ecx        |

There is no longer a call to strcmp(), but the compiler has substituted what is essentially an exact replica of strcmp() in terms of x86 assember code; the repz cmpsb iterates through 2 strings in memory, comparing the bytes, stopping when there is a mismatch. Strangely, one would think that a native x86 instruction sequence might be inherently faster, but the benchmarks reveal that the naive integer comparisons are much faster. I suspect the reason for this is 2-fold:

  • Perhaps, repz cmpsb is performing a lot more memory access; in contrast, each naive integer comparison instruction can encode a comparand directly (e.g., the number 47, which is ASCII for '/').

  • Maybe, repz cmpsb cannot benefit from instruction pipelining as readily as separate integer comparisons.

Perhaps the GNU GCC team should re-consider how strcmp() is optimized.

Anyway, the results are clear: It's better to do these small calculations by hand, rather than rely on the compiler to do something magical.

Copy link
Owner

@sahib sahib Feb 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your answer was again enlightening and I learned (again!) something out of it.
Thanks. But...

Anyway, the results are clear: It's better to do these small calculations by
hand, rather than rely on the compiler to do something magical.

The results are clear for a question that was not asked. The question you
answered is »How fast is manually comparing two characters by hand vs. strcmp
on different optimization levels?«. The question I had in mind (and I apologize
for not being more clear on that) was »Is it really worth optimizing this small
detail of the command line parsing code, in respect of other traits like
total performance, readability, maintainability and the general cost of
changing code?«

By reading your other comment, you seem to answer this question with a »yes« though:

In addition to the other comment, I do not think there is such a thing as
"over-optimization"; the problem with optimization is not that it's over done,
but rather as you imply: It might damage some other aspect of the program, such
as readability.

(The following is mostly general about optimizations, not exactly about strcmp)

You're right in that point that optimizing too much can hurt programs in other
ways. Not only readability (optimized code is often longer), but also
maintainability (changes get harder and harder to understand), suddenly having
problems on weird platforms, not well carried-out optimization that produce
wrong results (sometimes, without you noticing!) or simply loosing time that
could have been better spend with e.g. bug fixes or features. I have been
bitten by every one of these.

Maybe I should have used the already coined and more often used term »premature
optimization«. Still, optimizations are not an asset by themselves. They are
a liability. If I have to decide, if something needs to be optimized, it boils
down to two rules (other projects might handle that differently obviously):

  1. Don't optimize before something became too slow.
    It does not matter how fast you're calculating a wrong result.
  2. If there is need for optimization, try to optimize the big portions first.
    In rmlint's case that is everything I/O related.

Defining "too slow" for rule 1 is difficult, but "too slow" for rmlint is: Some
user (or I or @SeeSpotRun) complained it is too slow for his use case and
convinces us that it is indeed a problem of rmlint.

This is of course not an exact science, but that's a description that rarely
fits to leading an open source project. After all code is not about machines.
It's about people and merely for machines. rmlint is something that is
actually used by people, which shifts the priorities a lot more into the
direction of stability and being conservative regarding changes and less about
optimizing every cycle out of it for reaching »perfection« (google keywords:
"perfect is the enemy of good").

Now, to the strcmp example:

Then again, why should readability be the winning aspect? Perhaps strcmp() is
over-readablized, to the unnecessary detriment of optimization. Indeed, a
C-string is explicitly an array of characters, and inspecting individual
characters (especially when looking for a very short string) is quite
normal—it's one of the killer features C was designed to have.

As described above, it's not only about readability. A more appropriate
benchmark here would have been running the rmlint binary with a few hundred
paths (pipe via stdin e.g.) and make it exit after parsing command line
options. I highly doubt you'd see any significant difference in execution time
there (See rule #2). And no, I'm not exactly asking for that number. ;-)

If there is a need to improve readability, then other means should be employed
before sacrificing run-time efficiency: Abstraction facilities (such as macros
or inlinable functions or structs), or even just plain, free-form, natural
human descriptions in the form of embedded comments.

My point is not how you optimized it and how to improve from there.
My point is that you did optimize it without taking rule #1 into account.

I mean is something like strcmp(x,"y") == 0 actually that readable?

Probably not - as a lot of stuff we got used to in C (and If I would write
rmlint today, I would not use C). But it's a change, which needs to be
evaluated to be worth and this explains my initial reaction of rejecting it. It
would have been a different case if the code in question would have been in a
really tight loop somewhere, where it could keep I/O from happening.

Don't get this wrong, I'm not opposed to changes in general and I'm thankful
for every contribution. It's more that I, as project maintainer, have (to have)
different priorities. Optimization for the sake of optimizations may be fun and
okay-ish only if you don't have users yet. Imagine you managed to shave of all
possible additional CPU cycles rmlint is currently doing. What would be the
result? A hard-to-change code base that would use tons of specific unrolled
code and compiler specific stuff. The user wouldn't even notice - in the best
case. I actually wanted to write something similar already about the previous
inline PR, but decided in the end that it's a low hanging fruit that did not
involve logic changes. This PR though includes some more logic changes.

What's the result of this discussion now? I'm not sure that rejecting the
"strcmp" change makes sense now, since we (well, mostly you) talked and thought
about it in so much detail now, so this can stay if you add an comment (e.g.
with a link to this thread). See this more as future advice for your PRs.

EDIT: This of course does not mean that micro optimizations are okay in future PRs.


On a different note: I have to go over the PR one more time since I forgot most
of the other changes in the meantime. Sorry for the delay, was a bit busy.

lib/cmdline.c Outdated
g_assert(v->paths);

for(char *path, **list = v->paths; (path = *list); ++list) {
if(path[0] == '-' && path[1] == 0) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert that to strcmp if you don't have numbers that prove otherwise. That is just over-optimization for me and does not help readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to the other comment, I do not think there is such a thing as "over-optimization"; the problem with optimization is not that it's over done, but rather as you imply: It might damage some other aspect of the program, such as readability.

Then again, why should readability be the winning aspect? Perhaps strcmp() is over-readablized, to the unnecessary detriment of optimization. Indeed, a C-string is explicitly an array of characters, and inspecting individual characters (especially when looking for a very short string) is quite normal—it's one of the killer features C was designed to have.

If there is a need to improve readability, then other means should be employed before sacrificing run-time efficiency: Abstraction facilities (such as macros or inlinable functions or structs), or even just plain, free-form, natural human descriptions in the form of embedded comments.

I mean is something like strcmp(x,"y") == 0 actually that readable? Sure, a competent C programmer should be familiar with it, but he should also be familiar with indexing individual characters, too; both are equally readable from that perspective, so one might as well choose the solution that has better performance.

lib/cmdline.c Outdated Show resolved Hide resolved
This simply  introduces `const'  discipline.
Also, `g_assert(cfg);' has been inserted.
It was already returning values as though it did.
Before, `rm_cmd_parse_replay()' simply ignored whether there was a
failure; now, it both returns a status and sets an error message.
This commit introduces  a number of finer-grained  functions related to
resolving and verifying  paths, and then uses those  functions not only
to refactor `rm_cfg_add_path()', but also to introduce  a new function,
`rm_cfg_prepend_json()'. The  finer-grained functions are defined  in a
new source file:

    lib/path-funcs.h

Similarly, `RmPath' has been moved to its own header:

    lib/path.h

Additionally, some of the `#include' directives have been cleaned up as
part of this  refactoring; if source file `X' uses  an identifier whose
declaration comes from source file  `Y', then `X' *always* includes `Y'
explicitly  (even if  `Y'  is included  indirectly  through some  other
source  file),  and each  `#include'  directive  is associated  with  a
comment that lists each such identifier that it provides.
This commit introduces  a number of finer-grained  functions related to
resolving and verifying  paths, and then uses those  functions not only
to refactor `rm_cfg_add_path()', but also to introduce  a new function,
`rm_cfg_prepend_json()'. The  finer-grained functions are defined  in a
new source file:

    lib/path-funcs.h

Similarly, `RmPath' has been moved to its own header:

    lib/path.h

Additionally, some of the `#include' directives have been cleaned up as
part of this  refactoring; if source file `X' uses  an identifier whose
declaration comes from source file  `Y', then `X' *always* includes `Y'
explicitly  (even if  `Y'  is included  indirectly  through some  other
source  file),  and each  `#include'  directive  is associated  with  a
comment that lists each such identifier that it provides.
Move it closer to `rm_cmd_set_paths()'.
By definition, `sizeof(char)' evaluates to 1.
A pointer to an `RmSession' object was being passed around, but only a
pointer to its associated `RmCfg' object is needed.
Variables that need to be shared across `rm_cmd_set_paths*()' may
be placed in an instance of this struct.
Before, it was just ignoring failures in `malloc()' and `getdelim()'.
Also, one of the error messages in `lib/treemerge.c' has been set to
match the error message introduced here.
It's only used locally in one function, so it can just be a local
variable in that one function.
The compiler  probably already does  this, but I find  it clearer
and more  satisfying to do  this explicitly, especially  since it
allows avoiding a call to  `g_strfreev()', rather than relying on
that function to check `paths' for us again.

For the  sake of clarity in  the diff, the existing  code has not
yet been indented to reflect the  fact that it's part of the `if'
statement's consequent.
You can check that only white space changed:

    $ git diff HEAD^ --ignore-space-change && echo Nothing\ changed
    Nothing changed
On each iteration of the loop, a path string is freed when
processing is finished, and then the whole list of strings
is simply freed without having to iterate through the list
again to free each string.
These variables were handled directly in `rm_cfg_prepend_path()';
now, they're handled directly in `rm_cmd_set_paths*()'.

This has allowed for optimizing the call to `rm_cfg_prepend_path()'
that is used for adding the current working directory when no other
path has been supplied by the user; there's no reason to check
whether the current working directory is a JSON file.
The field `idx' of struct `RmPath' is now `index'.
Though an optimizing compiler might do this already, one might as
well just do it explicitly, so as not to depend on the whims of
any particular compiler.
The field `treat_as_single_vol' of struct `RmPath' is now `single_volume'.
Now, member `path_count' of a struct `RmCfg' has a more guaranteed
and expected  purpose: During and after  the user-input processing
of non-option path arguments, it represents the number of items in
the associated list `paths'.

This has allowed for the removal of a call to `g_slist_length()'.
This breaks the project, and is thus meant to be merged into
a working commit.
This makes it consistent with `rm_cfg_prepend_json()', and it also
implies that the  list of files is being constructed  such that it
will be  reversed with  respect to  the order  in which  paths are
provided by the user.
Code from `rm_cmd_set_paths()' has been moved to its own function:

    rm_cmd_set_paths_from_cmdline()
The way `rmlint' currently works is that it produces a shell script
that sometimes  performs its  work by  running `rmlint'  many times
with  varying arguments;  `rmlint' can  be  run a  large number  of
times,  and each  invocation involves  argument processing  whereby
each path  argument undergoes multiple char-by-char  comparisons in
search of  a match for  a special,  well-known value (e.g.,  "-" or
"//").

Therefore,  this commit  optimizes the  process by  using tailored,
idiomatic  C-string calculations  in  lieu of  invoking a  generic,
open-ended,  arbitrary comparison  function;  this commit  replaces
the  Standard C  Library's  `strcmp()' with  boolean operators  and
character comparisons.

The calculations are  idiomatic in that they are a  familiar way of
processing strings, but  are somewhat particular to C  code. To aid
reading  this code,  comments  have been  embedded  to clarify  the
purpose, as in the following:

  if(/* path is "-" */ path[0] == '-' && path[1] == 0)

It's  true   that  an   optimizing  compiler  might   perform  this
substitution  automatically,  or  it  might  even  make  a  better,
platform-specific optimization. However,  this seems unlikely based
on  the following  rudimentary benchmarks;  first define  some Bash
functions:

  # Copy this script with the following command line, where "$THIS"
  # expands to the revision of this commit log message:
  #
  #   git log -1 --format=%b "$THIS" |
  #     sed -n '/^  #-/,/^  #-/p' | cut -c3- >/tmp/script.sh
  #
  #----------------------------------------------------------------
  make_test()
  {
  	# Note that the body of this function is
  	# explicitly indented with tab characters
  	# for the benefit of the "<<-" operator.

  	local name="$1" expression="$2"

  	cat >"$name".c <<-EOF
  	#include <stdbool.h> // bool
  	#include <string.h>  // strcmp

  	#define ITERATIONS 10000000 // 10 million

  	int main()
  	{
  	    int result = 0;
  	    char string[3] = {0};
  	    const char *const volatile pointer = string;
  	    bool toggle = 0;
  	    for (long i = 0; i < ITERATIONS; ++i) {
  	        string[0] = string[1] = '/' + toggle;
  	        toggle = !toggle;
  	        const char *const s = pointer;
  	        result += $expression;
  	    }
  	    return result;
  	}
  	EOF
  }

  build()
  {
    local name="$1"; shift
    gcc -std=c99 -pedantic -Wall -static -fno-plt -fno-pie "$@" \
      "$name".c -o "$name"
  }

  build_all() { build strcmp "$@" && build equals "$@"; }

  run() { time -p ./"$1"; }

  benchmark()
  {
    echo =========== "$@"
    run strcmp
    echo -----------
    run equals
  }

  build_and_benchmark()
  {
    build_all "$@" || return 1
    benchmark "$@"; return 0
  }
  #----------------------------------------------------------------

Then, create and run some benchmarks:

  make_test strcmp 'strcmp(s,"//")'
  make_test equals "(s[0] == '/' && s[1] == '/' && s[2] == 0)"

  gcc --version | head -n 1
  build_and_benchmark -O0
  build_and_benchmark -Og
  build_and_benchmark -O1
  build_and_benchmark -O2

All together,  this produces the  following output on my  [very old
and slow] system; at each optimization level, timing information is
provided first for `strcmp()' and  then for the manual calculation.
Perversely,  optimization  seems  to make  `strcmp()'  slower,  not
faster;   in  contrast,   optimization  drastically   improves  the
performance of manual calculation, which is also always faster than
using `strcmp()'.

  gcc (GCC) 7.3.1 20180312
  =========== -O0
  real 0.35
  user 0.35
  sys 0.00
  -----------
  real 0.24
  user 0.24
  sys 0.00
  =========== -Og
  real 0.58
  user 0.57
  sys 0.00
  -----------
  real 0.15
  user 0.15
  sys 0.00
  =========== -O1
  real 0.60
  user 0.59
  sys 0.00
  -----------
  real 0.16
  user 0.16
  sys 0.00
  =========== -O2
  real 0.58
  user 0.57
  sys 0.00
  -----------
  real 0.07
  user 0.07
  sys 0.00

Assembler code  for the unoptimized  case can be produced  with the
following:

  build_all -S -O0
  cat strcmp # review
  cat equals

Here is the meat of the comparison:

           strcmp                    equals
  ------------------------- | -------------------------
     .LC0:                  |  movl    -20(%ebp), %eax
     .string "//"           |  movzbl  (%eax), %eax
     ...                    |  cmpb    $47, %al           // s[0] == '/'
     pushl   $.LC0          |  jne     .L3
     pushl   -20(%ebp)      |  movl    -20(%ebp), %eax
     call    *strcmp@GOT    |  addl    $1, %eax
                            |  movzbl  (%eax), %eax
                            |  cmpb    $47, %al           // s[1] == '/'
                            |  jne     .L3
                            |  movl    -20(%ebp), %eax
                            |  addl    $2, %eax
                            |  movzbl  (%eax), %eax
                            |  testb   %al, %al           // s[2] == 0
                            |  jne     .L3
                            |  movl    $1, %eax
                            |  jmp     .L4
                            |  .L3:
                            |  movl    $0, %eax

As  you can  see, one  is calling  a function,  while the  other is
performing  the manual  calculations. Now,  consider the  optimized
version:

  build_all -S -O2
  cat strcmp
  cat equals

This yields:

           strcmp                    equals
  ------------------------- | -------------------------
   .LC0:                    |  movl    -20(%ebp), %ebx
   .string "//"             |  cmpb    $47, (%ebx)
   ...                      |  jne     .L2
   movl    -36(%ebp), %esi  |  cmpb    $47, 1(%ebx)
   movl    $.LC0, %edi      |  jne     .L2
   ...                      |  cmpb    $1, 2(%ebx)
   movl    $3, %ecx         |
   repz cmpsb               |
   seta    %cl              |
   setb    %bl              |
   subl    %ebx, %ecx       |
   movsbl  %cl, %ecx        |

There  is no  longer a  call to  `strcmp()', but  the compiler  has
substituted what is  essentially an exact replica  of `strcmp()' in
terms of  x86 assember  code; the `repz cmpsb'  iterates  through 2
strings in  memory, comparing the  bytes, stopping when there  is a
mismatch. Strangely, one would think  that a native x86 instruction
sequence might be inherently faster, but the benchmarks reveal that
the naive integer comparisons are much faster. I suspect the reason
for this is 2-fold:

  * Perhaps, `repz cmpsb'  is performing a lot  more memory access;
    in  contrast, each  naive  integer  comparison instruction  can
    encode  a comparand  directly (e.g.,  the number  47, which  is
    ASCII for '/').

  * Maybe, `repz cmpsb' cannot  benefit from instruction pipelining
    as readily as separate integer comparisons.

Perhaps  the GNU  GCC  team should  re-consider  how `strcmp()'  is
optimized.

Anyway,  the results  are  clear:  It's better  to  do these  small
calculations  by hand,  rather  than  rely on  the  compiler to  do
something magical.
Each of these functions is pretty much single-purpose, and should
therefore be inlined at the few call sites.
When `cfg->replay' is false, there's no reason to check whether a
path is a `json' file.
It's a static function that is used in only place.
The functions in question are now finer-grained and inlinable, which
is worthwhile because these functions are only used in basically one
part  of  the  program,  and  thus  exist  mainly  as  a  matter  of
organization (abstraction, documentation, etc.).
The code has been organized into finer-grained functions/macros,
and has thereby been easily optimized for certain cases.
@mfwitten
Copy link
Contributor Author

The original patch series has essentially been updated as per the following diff:

diff --git a/lib/cfg-funcs.h b/lib/cfg-funcs.h
index 9b218c31..c2e4f000 100644
--- a/lib/cfg-funcs.h
+++ b/lib/cfg-funcs.h
@@ -121,8 +121,10 @@ bool rm_cfg_prepend_json(
             real_path,
             cfg->path_count++,
             false /* not preferred */
-        ); return true;
-    } return false;
+        );
+        return true;
+    }
+    return false;
 }
 
 #define PREPEND_TO(list)    \
@@ -149,8 +151,10 @@ bool rm_cfg_prepend_path(
         } else {
             PREPEND_TO(cfg->paths);
             ++(cfg->path_count);
-        } return true;
-    } return false;
+        }
+        return true;
+    }
+    return false;
 }
 
 #undef PREPEND_TO
@@ -159,9 +163,9 @@ static INLINE
 void rm_cfg_free_paths(RmCfg *const cfg) {
     g_assert(cfg);
     g_slist_free_full(cfg->paths, (GDestroyNotify)rm_path_free);
-    cfg->paths = 0;
+    cfg->paths = NULL;
     g_slist_free_full(cfg->json_paths, (GDestroyNotify)rm_path_free);
-    cfg->json_paths = 0;
+    cfg->json_paths = NULL;
 }
 
 #endif /* end of include guard */
diff --git a/lib/cmdline.c b/lib/cmdline.c
index 56d07a8e..a3ab75f9 100644
--- a/lib/cmdline.c
+++ b/lib/cmdline.c
@@ -1074,12 +1074,25 @@ static gboolean rm_cmd_parse_rankby(_UNUSED const char *option_name,
 
 static gboolean rm_cmd_parse_replay(_UNUSED const char *option_name,
                                     const gchar *path, RmSession *session,
-                                    _UNUSED GError **error) {
+                                    GError **error) {
     g_assert(session);
     g_assert(session->cfg);
     session->cfg->replay = true;
     session->cfg->cache_file_structs = true;
-    return rm_cfg_prepend_json(session->cfg, path);
+
+    if(rm_cfg_prepend_json(session->cfg, path)) {
+        return true;
+    }
+
+    g_assert(error);
+    g_assert(*error == NULL);
+
+    g_set_error(
+        error, RM_ERROR_QUARK, 0,
+        _("Failed to include this replay file: %s"), path
+    );
+
+    return false;
 }
 
 static gboolean rm_cmd_parse_equal(_UNUSED const char *option_name,
@@ -1145,7 +1158,7 @@ static bool rm_cmd_set_cmdline(RmCfg *cfg, int argc, char **argv) {
     return true;
 }
 
-typedef struct rm_cmd_set_paths_vars {
+typedef struct RmCmdSetPathVars {
     RmCfg *const cfg;
     char **const paths;
     unsigned int index;
@@ -1154,23 +1167,21 @@ typedef struct rm_cmd_set_paths_vars {
     bool stdin_paths_preferred;
     const bool null_separated;
     bool all_paths_valid;
-} rm_cmd_set_paths_vars;
+} RmCmdSetPathVars;
 
 static INLINE
 void rm_cmd_set_paths_from_cmdline(
-    rm_cmd_set_paths_vars *const v,
+    RmCmdSetPathVars *const v,
     const bool replay
 ) {
     g_assert(v);
     g_assert(v->paths);
 
     for(char *path, **list = v->paths; (path = *list); ++list) {
-        if(path[0] == '-' && path[1] == 0) {
+        if(/* path is "-" */ path[0] == '-' && path[1] == 0) {
             v->read_stdin = true;
-            /* remember whether to treat stdin paths as preferred paths */
             v->stdin_paths_preferred = v->is_prefd;
-        } else if(path[0] == '/' && path[1] == '/' && path[2] == 0) {
-            /* the '//' separator separates non-preferred paths from preferred */
+        } else if(/* path is "//" */ path[0] == '/' && path[1] == '/' && path[2] == 0) {
             v->is_prefd = !v->is_prefd;
         } else {
             v->all_paths_valid &= rm_cfg_prepend_path(
@@ -1184,7 +1195,7 @@ void rm_cmd_set_paths_from_cmdline(
 
 static INLINE
 bool rm_cmd_set_paths_from_stdin(
-    rm_cmd_set_paths_vars *const v,
+    RmCmdSetPathVars *const v,
     const bool replay
 ) {
     g_assert(v);
@@ -1195,8 +1206,8 @@ bool rm_cmd_set_paths_from_stdin(
 
     char delim = v->null_separated ? 0 : '\n';
 
-    size_t buf_len;
-    char *path_buf = malloc(buf_len = PATH_MAX);
+    size_t buf_len = PATH_MAX;
+    char *path_buf = malloc(buf_len);
     if(!path_buf) {
         rm_log_perror(_("Failed to allocate memory"));
         return false;
@@ -1243,7 +1254,7 @@ bool rm_cmd_set_paths(RmCfg *const cfg, char **const paths) {
 
     const bool replay = cfg->replay;
     const bool read_stdin0 = cfg->read_stdin0;
-    rm_cmd_set_paths_vars v = {
+    RmCmdSetPathVars v = {
         .cfg = cfg,
         .paths = paths,
         .index = cfg->path_count,
diff --git a/lib/path-funcs.h b/lib/path-funcs.h
index 5599387b..1f46ac0b 100644
--- a/lib/path-funcs.h
+++ b/lib/path-funcs.h
@@ -39,7 +39,7 @@
 #endif
 
 #include "path.h"       // RmPath
-#include "config.h"     // _, rm_log_warning_line, rm_log_perrorf
+#include "config.h"     // _, rm_log_warning_line
 
 static INLINE
 void rm_path_free(RmPath *const p) {
@@ -63,7 +63,9 @@ bool rm_path_is_real(
     rm_log_warning_line(
         _("Can't get real path for directory or file \"%s\": %s"),
         path, strerror(errno)
-    ); return false;
+    );
+
+    return false;
 }
 
 #if HAVE_FACCESSAT
@@ -79,8 +81,10 @@ bool rm_path_is_accessible(const char *const path) {
         rm_log_warning_line(
             _("Can't open directory or file \"%s\": %s"),
             path, strerror(errno)
-        ); return false;
-    } return true;
+        );
+        return false;
+    }
+    return true;
 }
 
 #undef NOT_ACCESSIBLE
@@ -109,8 +113,10 @@ bool rm_path_is_file(const char *const path) {
         rm_log_warning_line(
             _("Could not get metadata for path \"%s\": %s"),
             path, strerror(errno)
-        ); return false;
-    } return S_ISREG(s.st_mode);
+        );
+        return false;
+    }
+    return S_ISREG(s.st_mode);
 }
 
 static INLINE
diff --git a/po/de.po b/po/de.po
index feb59eb5..e4d23497 100644
--- a/po/de.po
+++ b/po/de.po
@@ -7,7 +7,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: rmlint 2.0.0\n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2018-10-15 04:46+0000\n"
+"POT-Creation-Date: 2019-01-21 21:40+0000\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language-Team: LANGUAGE <[email protected]>\n"
@@ -1028,6 +1028,11 @@ msgstr ""
 msgid "No stamp file at `%s`, will create one after this run."
 msgstr ""
 
+#: lib/cmdline.c
+#, c-format
+msgid "Failed to include this replay file: %s"
+msgstr ""
+
 #: lib/path-funcs.h
 #, c-format
 msgid "Invalid path \"%s\""
diff --git a/po/es.po b/po/es.po
index d0a4c46e..fd81c42d 100644
--- a/po/es.po
+++ b/po/es.po
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: rmlint 2.4.0\n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2018-10-15 04:46+0000\n"
+"POT-Creation-Date: 2019-01-21 21:40+0000\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language-Team: LANGUAGE <[email protected]>\n"
@@ -1005,6 +1005,11 @@ msgstr ""
 msgid "No stamp file at `%s`, will create one after this run."
 msgstr ""
 
+#: lib/cmdline.c
+#, c-format
+msgid "Failed to include this replay file: %s"
+msgstr ""
+
 #: lib/path-funcs.h
 #, c-format
 msgid "Invalid path \"%s\""
diff --git a/po/fr.po b/po/fr.po
index 2b2c58c1..ea5b2db4 100644
--- a/po/fr.po
+++ b/po/fr.po
@@ -7,7 +7,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: rmlint 2.0.0\n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2018-10-15 04:46+0000\n"
+"POT-Creation-Date: 2019-01-21 21:40+0000\n"
 "PO-Revision-Date: 2014-12-02 13:37+0100\n"
 "Last-Translator: F. <[email protected]>\n"
 "Language-Team: none\n"
@@ -1001,6 +1001,11 @@ msgstr ""
 msgid "No stamp file at `%s`, will create one after this run."
 msgstr ""
 
+#: lib/cmdline.c
+#, c-format
+msgid "Failed to include this replay file: %s"
+msgstr ""
+
 #: lib/path-funcs.h
 #, c-format
 msgid "Invalid path \"%s\""
diff --git a/po/rmlint.pot b/po/rmlint.pot
index 94b92658..348f86fa 100644
--- a/po/rmlint.pot
+++ b/po/rmlint.pot
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: rmlint 2.7.0\n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2018-10-15 04:46+0000\n"
+"POT-Creation-Date: 2019-01-21 21:40+0000\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language-Team: LANGUAGE <[email protected]>\n"
@@ -950,6 +950,11 @@ msgstr ""
 msgid "No stamp file at `%s`, will create one after this run."
 msgstr ""
 
+#: lib/cmdline.c
+#, c-format
+msgid "Failed to include this replay file: %s"
+msgstr ""
+
 #: lib/path-funcs.h
 #, c-format
 msgid "Invalid path \"%s\""

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants