
Add pre_check: :all_same option #443

Closed
sabiwara wants to merge 5 commits from the pre_check_all_same branch

Conversation

@sabiwara commented Dec 5, 2024

Close #435

I tried to split individual commits to help with the review.

Note regarding b33b5cf

I noticed that Benchee.UnknownProfilerError might have accidentally been declared as Benchee.Profile.Benchee.UnknownProfilerError, hence the proposed move to a new file where we can define custom errors in the top-level Benchee. If the nesting was intentional here, I'm happy to replicate this pattern and define it within Benchee.Benchmark.Runner.

Note regarding the name

I'm OK with the originally proposed :assert_all_same as well, but since it is used within pre_check, the notion of checking/asserting felt slightly redundant, and pre_check: :all_same felt sufficient. Happy to change it back, or to something else entirely 🙂
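
For illustration, a minimal sketch of how the proposed option might look in use (the jobs and values here are made up; only pre_check: :all_same comes from this PR):

  # Hypothetical usage: during the pre-check, every job must return the
  # same value, otherwise the run aborts before any benchmarking starts.
  Benchee.run(
    %{
      "Enum.sort" => fn -> Enum.sort([3, 1, 2]) end,
      ":lists.sort" => fn -> :lists.sort([3, 1, 2]) end
    },
    pre_check: :all_same
  )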

Note regarding 44dff6f

I implemented a custom collector in order to retrieve the return value, but thanks to dialyzer I realized it didn't, strictly speaking, match the behaviour, which expects a number. We could perhaps override scenario.function with a wrapper sending a message to the parent, but I wanted to consult you first about the best approach.
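
For context, a minimal sketch of what such a collector might look like (the module name is hypothetical, and the actual diff is quoted in the review below). Dialyzer complains because the Collect behaviour expects the first tuple element, the measurement, to be a number:

  # Hypothetical module name; assuming the behaviour lives at
  # Benchee.Benchmark.Collect. Returning the function's result in the
  # "measurement" slot is what violates the behaviour's spec.
  defmodule Benchee.Benchmark.Collect.Return do
    @behaviour Benchee.Benchmark.Collect

    @spec collect((-> any)) :: {any, any}
    def collect(function) do
      result = function.()
      {result, result}
    end
  end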

@PragTob (Member) commented Dec 31, 2024

Again, apologies for my tardiness; there was a lot going on. Getting to it these days 😅

@PragTob (Member) commented Dec 31, 2024

Hm, a failing test looks like it concerns reductions... curious whether that's just a flake or consistent:



  1) test collect/1 returns the reduction count and result of the function (Benchee.Collect.ReductionsTest)
     test/benchee/benchmark/collect/reductions_test.exs:6
     Assertion with <= failed
     code:  assert reductions <= 71
     left:  227
     right: 71
     stacktrace:
       test/benchee/benchmark/collect/reductions_test.exs:13: (test)

@PragTob (Member) commented Dec 31, 2024

Thanks for the great description! 💚

The error declaration definitely wasn't on purpose, thanks for catching that. I'm also good with :all_same.

The other one I'll have to ponder 🤔

@PragTob (Member) left a review comment

Thanks a lot, this is brilliant! 💚

Being tardy on this again, I'm happy to finish it up myself. I'm curious about the Mac test failure and hope it's only a flake, because I have no idea how that would have happened or started failing.

The only real point to address (I think) is the runner implementation vs. the separate collector. Oh, and I'm curious about the choice of Task.async, but yeah :)



  @spec collect((-> any)) :: {any, any}
  def collect(function) do
    result = function |> Task.async() |> Task.await(:infinity)
@PragTob (Member) commented:

This is curious to me - why did you decide to implement it with Task.async & await? As we're immediately awaiting, there is no time benefit, and copying the function including its closure to the process likely makes it slower.

I think I can see an argument for isolating the different functions & inputs being run - but then I think I'd rather do it "on the outside" - i.e. wrapping the invocation of this inside the pre_check function.
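
To illustrate the "on the outside" idea (a hypothetical sketch, not code from this PR): the process isolation would wrap the pre-check invocation, leaving the collector itself a plain function call:

  # Hypothetical sketch: spawn a task around the whole pre-check run,
  # so the collector can simply do `result = function.()`.
  defmodule IsolationSketch do
    def pre_check_isolated(function) do
      fn -> function.() end
      |> Task.async()
      |> Task.await(:infinity)
    end
  end

  IsolationSketch.pre_check_isolated(fn -> Enum.sort([3, 1, 2]) end)
  #=> [1, 2, 3]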

@sabiwara (Author) replied:

I think it was for isolation indeed, since it seemed the other collectors were all running it in a new process and I tried to be consistent - I just noticed that isn't the case for Benchee.Benchmark.Collect.Time, so there is no reason for it indeed 😁

Should be:

Suggested change:
  - result = function |> Task.async() |> Task.await(:infinity)
  + result = function.()

@PragTob (Member) replied:

Yeah, reductions and memory run in a separate process to isolate their respective measurements :)

  @spec collect((-> any)) :: {any, any}
  def collect(function) do
    result = function |> Task.async() |> Task.await(:infinity)
    {result, result}
@PragTob (Member) commented:

First I thought we don't need a new collector/it's odd, but then I saw that we already implemented it with collectors and thought "ok, maybe it's good". But seeing here that we return the result twice made me question this again, because we already take the result and pass it on for the hooks! Runner.collect/3 then returns the measurement, but it has the return value of the function:

  @doc """
  Takes one measure with the given collector.

  Correctly dispatches based on the number of iterations to perform.
  """
  def collect(
        scenario = %Scenario{function: function},
        scenario_context = %ScenarioContext{
          num_iterations: 1
        },
        collector
      ) do
    new_input = Hooks.run_before_each(scenario, scenario_context)
    function = main_function(function, new_input)

    {measurement, return_value} = invoke_collector(collector, function)

    Hooks.run_after_each(return_value, scenario, scenario_context)
    measurement
  end

Returning both seems like overhead, but maybe we could just implement a different Runner.collect function that in the end returns the return_value instead of the measurement? It could also be an option, but that feels odd.

The more I think about it, just returning both here (the measurement and the return value) seems like a fun move, and then we can do with it what we wish.

Although... there is the case of multiple iterations in Runner.collect - which would be a real headache to deal with. In that case I believe doing a separate runner function for this purpose that only ever does one iteration (as is its purpose) works better.

Sorry, I hope I make sense; I just wanted to talk through the whole thought process with you :)

What do you think?

@sabiwara (Author) replied:

> Returning both seems like overhead, but maybe we could just implement a different Runner.collect function that in the end returns the return_value instead of the measurement? It could also be an option, but that feels odd.

Perhaps not so odd, I think - it could be Runner.run_once(scenario, scenario_context), which doesn't even need to take in a collector.

> In that case I believe doing a separate runner function for this purpose that only ever does one iteration (as is its purpose) works better.

Agreed 👍

The only issue is that the new API would need to be bubbled up to Benchee.Benchmark.RunOnce as well. Perhaps RunOnce could be merged with Runner to simplify this, I'm not sure.
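
For reference, a sketch of what that Runner.run_once could look like, mirroring the num_iterations: 1 clause of Runner.collect/3 quoted above (hypothetical; the implementation that was eventually merged in #448 may differ):

  # Hypothetical single-iteration runner: same hook handling as the
  # quoted collect/3 clause, but it returns the function's return value
  # instead of a measurement and needs no collector at all.
  def run_once(scenario = %Scenario{function: function}, scenario_context) do
    new_input = Hooks.run_before_each(scenario, scenario_context)
    function = main_function(function, new_input)

    return_value = function.()

    Hooks.run_after_each(return_value, scenario, scenario_context)
    return_value
  end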

@PragTob (Member) replied:

Yeah, run_once would need to use it. I'm not quite sure why I/we created RunOnce - I guess so Profile doesn't have to depend on all of Runner, but that seems disingenuous since RunOnce depends on Runner - so merging that back in seems fine!

@sabiwara (Author) commented:

> Again, apologies for my tardiness; there was a lot going on. Getting to it these days 😅

No worries and no need to apologize at all. Thanks a lot for taking the time!

> I'm happy to finish it up myself.

Thanks! 💜 Since what remains mostly involves "where should we add/move this" kinds of decisions, which I can't really make as a non-maintainer, I think that would be the more efficient approach; otherwise we might end up with many iterations.

> I'm curious about the Mac test failure and hope it's only a flake, because I have no idea how that would have happened or started failing.

I wouldn't be surprised if it were a flake in this case. I've been getting some flakiness too on CI for my Dune project regarding reduction counts.

> Oh, and I'm curious about the choice of Task.async, but yeah :)

That was just me cargo culting 🤣

@PragTob (Member) commented Jan 1, 2025

Oha, Dune looks interesting! 😁 I'll see to taking this over then, thank you so much for your work! 💚

@PragTob (Member) commented Jan 5, 2025

Finished up in #448

@PragTob closed this Jan 5, 2025
@sabiwara deleted the pre_check_all_same branch on Jan 5, 2025
Linked issue: #435 "Feature suggestion: assert_all_same option to compare alternative implementations"