Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-9377. Implement some metrics with RW-lock instead of synchronization #7708

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

juncevich
Copy link
Contributor

What changes were proposed in this pull request?

Change synchronized metrics collect methods to read/write lock implementation.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-9377

How was this patch tested?

I wrote the JUnit test (I'm not sure I should add this test to this PR) to compare performance metric realization.

  @RepeatedTest(100)
  void testMyMetrics() throws InterruptedException {
    OzoneMutableRate rate = new OzoneMutableRate("Test name", "Test description", true);
    final MetricsRecordBuilder builder = getMetricsRecordBuilder();
    ExecutorService executor = Executors.newFixedThreadPool(10);
    for (int i = 0; i < 100; i++) {
      int finalI = i;

      executor.submit(() -> {
        rate.add(finalI);
        if (finalI % 100 == 0) {
          rate.snapshot(getMetricsRecordBuilder());
        }
      });
    }
    executor.awaitTermination(1, TimeUnit.SECONDS);
    rate.snapshot(builder);
    assertEquals(100, builder.metrics().get(0).value().intValue());
  }

@juncevich juncevich force-pushed the HDDS-9377 branch 2 times, most recently from 2f66e63 to 907cb96 Compare January 16, 2025 12:28
@juncevich juncevich marked this pull request as ready for review January 28, 2025 07:38
@adoroszlai adoroszlai changed the title HDDS-9377. Call MutableStat.add(long x) method as a runnable task in a separate service executor HDDS-9377. Implement metrics system with RW-lock instead of synchronization Jan 29, 2025
@adoroszlai adoroszlai requested a review from smengcl January 30, 2025 10:41
@adoroszlai
Copy link
Contributor

Thanks @juncevich for the patch. It seems this removes synchronization in some individual metrics, but not in MetricsSytemImpl itself. (see also HDDS-12193)

My 2 cents: I don't think we should reimplement Hadoop metrics system in Ozone with tweaks. Either improve it in Hadoop to benefit the ecosystem, or use a better existing metrics library in Ozone.

@adoroszlai adoroszlai changed the title HDDS-9377. Implement metrics system with RW-lock instead of synchronization HDDS-9377. Implement some metrics with RW-lock instead of synchronization Feb 4, 2025
@juncevich
Copy link
Contributor Author

Thanks @juncevich for the patch. It seems this removes synchronization in some individual metrics, but not in MetricsSytemImpl itself. (see also HDDS-12193)

My 2 cents: I don't think we should reimplement Hadoop metrics system in Ozone with tweaks. Either improve it in Hadoop to benefit the ecosystem, or use a better existing metrics library in Ozone.

Thanks for the notices. Could you tell me more about the better existing metrics library in Ozone? Is it an existing library in Ozone, or have you offered to change the metrics library from Hadoop to another metric system?

@adoroszlai
Copy link
Contributor

Could you tell me more about the better existing metrics library in Ozone? Is it an existing library in Ozone, or have you offered to change the metrics library from Hadoop to another metric system?

Not "existing in Ozone", I meant "existing somewhere else, which we could start using in Ozone".

  1. copy Hadoop's metrics system and improve
  2. improve in Hadoop
  3. replace

I consider (1) to be the worst option, both for Ozone and globally.

@juncevich
Copy link
Contributor Author

Could you tell me more about the better existing metrics library in Ozone? Is it an existing library in Ozone, or have you offered to change the metrics library from Hadoop to another metric system?

Not "existing in Ozone", I meant "existing somewhere else, which we could start using in Ozone".

  1. copy Hadoop's metrics system and improve
  2. improve in Hadoop
  3. replace

I consider (1) to be the worst option, both for Ozone and globally.

I think (2) is the best option, but it takes the longest. I will ask about not synchronizing metrics in the Hadoop project. And will return with the answer.

@juncevich
Copy link
Contributor Author

I looked at communication in Hadoop(mailing lists) and discussion and realization can take a long time )
I think the best way is not to depend on Hadoop libraries and change the metric library to another instead of improving the Hadoop metric library.
I offer to close this PR and create one task to change the metric library and (I'm not sure) another to create an Ozone metric API which can be implemented by any metric library.
What's your opinion, @adoroszlai ?

@kerneltime
Copy link
Contributor

@juncevich I have a few questions, what is the desired impact of this code change? Do you see a performance difference? How much is it? Answers to those questions will help us decide if it is ok to duplicate code but fix it to gain benefits. In the linked jira you do describe the problem but do not mention how much this PR will help solve the issue. Also, internal improvements sometimes do not lead to externally visible improvements, any details on how it helped speed up the test you'll deviced?

@juncevich
Copy link
Contributor Author

juncevich commented Feb 5, 2025

@juncevich I have a few questions, what is the desired impact of this code change? Do you see a performance difference? How much is it? Answers to those questions will help us decide if it is ok to duplicate code but fix it to gain benefits. In the linked jira you do describe the problem but do not mention how much this PR will help solve the issue. Also, internal improvements sometimes do not lead to externally visible improvements, any details on how it helped speed up the test you'll deviced?

@kerneltime The desired impact is to get rid of the locks in metrics collecting and improve writing speed. Without a fix, it was a 3000-3200 writing operation (CreateFile or CommitKey). After fixing it was 3600-3700. Improvement of more than 12%.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants