chore: parallel test and build #7093

Open
wants to merge 1 commit into
base: master

Contributor

@sgammon sgammon commented Mar 10, 2024

Summary

This changeset improves Guava's build and test speed by activating Maven's built-in features for parallel build and test execution. Guava's test suite is enormous, with over a million tests executed during a build; this PR cuts the execution time for all tests from ~6.5 minutes to ~2 minutes. Compile steps also see a ~33% speedup.

Some of Guava's tests are understandably flaky when executed like this. I've seen only 4-5 tests be flaky after many runs. Setting a reasonable test-retry count (3 in this PR) covers this, and I haven't seen the test suite flake to a failure state since. Split out from work in #7094.

Representative sample when built and executed serially:

[INFO] Guava Maven Parent ..................... SUCCESS [  0.129 s]
[INFO] Guava: Google Core Libraries for Java .. SUCCESS [ 15.653 s]
[INFO] Guava BOM .............................. SUCCESS [  0.064 s]
[INFO] Guava Testing Library .................. SUCCESS [01:26 min]
[INFO] Guava Unit Tests ....................... SUCCESS [06:26 min] <--
[INFO] Guava GWT compatible libs .............. SUCCESS [ 11.092 s]

Representative sample with this PR applied:

[INFO] Guava Maven Parent ..................... SUCCESS [  0.121 s]
[INFO] Guava: Google Core Libraries for Java .. SUCCESS [  9.681 s]
[INFO] Guava BOM .............................. SUCCESS [  0.120 s]
[INFO] Guava Testing Library .................. SUCCESS [ 47.883 s]
[INFO] Guava Unit Tests ....................... SUCCESS [01:57 min]  <--
[INFO] Guava GWT compatible libs .............. SUCCESS [  6.909 s]

Raw benchmark data is available here. Averaged over 5 runs.

guavabench.mp4

Benchmark was run with full clean/build/test, in two different sessions, on the same machine. Builds are not sharing resources. At times, this video plays at 8x speed to skip over longer stretches of build time; the timecode counter reflects this.

Benchmark command:

mvnw clean && sleep 1 && mvnw package -Dmaven.javadoc.skip=true -Dgpg.skip=true

CI runs

Concurrency tuning

The current settings are:

  • Maven will run with up to 2 threads per core for builds, and
  • Will run with up to 2.5 test forks per core
  • Reuse of forks was also turned on
  • Parallel GC was activated for Maven itself

The full JVM argline for Maven:

-XX:-TieredCompilation -XX:TieredStopAtLevel=1 -XX:+UseParallelGC -Djava.awt.headless=true

Benchmarking environment

  • Apple M2 Max, 96GB RAM
  • macOS Sonoma 14.3.1
  • GraalVM CE JVM 21.0.2
openjdk version "21.0.2" 2024-01-16
OpenJDK Runtime Environment GraalVM CE 21.0.2+13.1 (build 21.0.2+13-jvmci-23.1-b30)
OpenJDK 64-Bit Server VM GraalVM CE 21.0.2+13.1 (build 21.0.2+13-jvmci-23.1-b30, mixed mode, sharing)

PR Tree

This PR includes the following PRs, which should be rebased away after they are merged:

Changelog

  • chore: enable parallel build
  • chore: enable parallel test execution
  • chore: enable parallel gc for maven
  • chore: tune tiered compilation for maven
  • chore: tune thread count for maven
  • fix: enable test retries (max = 3) for parallel-flaky tests

cc / @cushon @cpovirk

@@ -0,0 +1 @@
-XX:-TieredCompilation -XX:TieredStopAtLevel=1 -XX:+UseParallelGC -Djava.awt.headless=true
Contributor Author

Older versions of Java and Maven don't seem to pick these up, but I need to confirm with testing that -XX:+UseParallelGC etc. don't break specific JVM versions.

@@ -0,0 +1,2 @@
-T2C
--strict-checksums
Contributor Author

--strict-checksums made it in, but this is a good thing: Maven will reject dependencies with checksum failures. Surprisingly, this is not the default.

Comment on lines +15 to +18
<!-- Enable parallel test execution -->
<parallel>all</parallel>
<perCoreThreadCount>false</perCoreThreadCount>
<threadCount>48</threadCount>
Contributor Author

Parallel settings; tunable as properties from outside the build
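
A minimal sketch of how settings like these could be exposed as overridable properties. The property names `guava.test.parallel` and `guava.test.threadCount` are hypothetical and are not taken from this PR; only the Surefire parameters `parallel`, `perCoreThreadCount`, and `threadCount` come from the snippet above.

```xml
<!-- Fragment of a pom.xml <project>; not a complete POM. -->
<properties>
  <!-- Hypothetical property names; defaults mirror the values quoted above. -->
  <guava.test.parallel>all</guava.test.parallel>
  <guava.test.threadCount>48</guava.test.threadCount>
</properties>

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <configuration>
        <!-- Enable parallel test execution -->
        <parallel>${guava.test.parallel}</parallel>
        <!-- false: threadCount is an absolute number, not a per-core multiplier -->
        <perCoreThreadCount>false</perCoreThreadCount>
        <threadCount>${guava.test.threadCount}</threadCount>
      </configuration>
    </plugin>
  </plugins>
</build>
```

With this shape, a run such as `mvn test -Dguava.test.threadCount=8` would override the default, since user properties passed with `-D` take precedence over properties declared in the POM.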

@@ -31,6 +35,7 @@
<otherVariant.version>HEAD-android-SNAPSHOT</otherVariant.version>
<otherVariant.jvmEnvironment>android</otherVariant.jvmEnvironment>
<otherVariant.jvmEnvironmentVariantName>android</otherVariant.jvmEnvironmentVariantName>
<surefire.rerunFailingTestsCount>3</surefire.rerunFailingTestsCount>
Contributor Author

Surefire retries
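
For context, a sketch of the equivalent explicit plugin configuration, assuming the stock Surefire parameter `rerunFailingTestsCount` (the PR itself sets it through the `surefire.rerunFailingTestsCount` property shown above). A test that fails and then passes on a rerun is reported by Surefire as a flake rather than a failure, so the build still succeeds.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Rerun each failing test up to 3 times before reporting it as failed. -->
    <rerunFailingTestsCount>3</rerunFailingTestsCount>
  </configuration>
</plugin>
```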

Comment on lines +257 to +258
<reuseForks>true</reuseForks>
<forkCount>2C</forkCount>
Contributor Author

Test forks

@cpovirk
Member

cpovirk commented Mar 11, 2024

Thanks, this is something we probably should have looked into long ago. I have lots of thoughts:

  • While this helps local tests a lot (I see ~8:50->~3:40 for Guava Unit Tests in my testing, for which I just used mvn clean install, a noisier test than yours), it seems like the time for CI doesn't change dramatically. That doesn't make it bad, but it sets up nicely for my next bullet :)
  • I do worry a lot about flakiness, both the possibility of full-on failures in the future and the possibility that flakiness looks scary to users. In particular, I saw a failure in testLocationsFrom_idempotentScan that produced a massive list of files ("expected : [target/surefire/surefire-20240311133955068_90tmp, target/surefire/surefire_42-20240311133955068_131tmp, target/surefire/surefire-20240311133955068_74tmp, target/surefire/surefire-20240311133955068_98tmp...."). Many users would take that trade in advance for much faster feedback, but maybe we could put it behind a profile that people could opt into? That would keep the CI from flaking, and it would keep random users' builds from flaking, but people could opt in to more speed if they accept the risks. Of course, I won't pretend that most users would ever find that lever. (Maybe the build could be made to print out a note to recommend it?) But if you personally can opt in to faster runs as you work on your various PRs, that itself might provide a good chunk of the value.
  • There might also be ways to get tests to actually work when run in parallel... :) It's a shame that ours don't do that. I'm not even immediately sure what's up with the testLocationsFrom_idempotentScan failure. I guess that target/surefire may be on the classpath for test runs or something?? That seems a little unfortunate. Maybe our test could use some other directory that it creates. Or we could take the easy way out and disable that test in our external builds (while continuing to run it internally). I don't want to push you to take that on, just to list it as an option if we want to pursue making parallelism the default in the future. I'd probably always be a little nervous about parallelism, though, given that it's not the default for us internally. Or rather, our approach to parallelism internally includes not just separate JVMs but more strongly separated filesystems, too.
  • And that brings me to another thing I was wondering about: Can we get most of the gains from only forks, without thread-level parallelism? With only <forkCount>2C</forkCount> (and <reuseForks>true</reuseForks>), I get comparable performance to what I get with those plus the 48-threads setting. (Perhaps both those values could be tweaked further, of course, but that's my initial point of comparison.) Now, the fork-only approach is the one I was running when I got the testLocationsFrom_idempotentScan error discussed above, so clearly it doesn't solve all our problems, either. But it might minimize the chance of problems with lower risk. (Again, separate VMs (with additional isolation) is what we use internally, so it's more likely for our tests to work with that than with threading.)
  • When I'm sitting at my desktop in the office tomorrow, I want to test what happens to my machine when I have n JVMs with m threads running at the same time :) All is well when I'm running it remotely on my desktop from my laptop, but we'll see what happens to responsiveness in person. This may end up as another reason to have people opt in to the parallelism, or maybe it won't matter, especially for the fork-only approach or with tweaked numbers in the settings? We'll see.
  • In your comment above, you mention "2 threads per core for builds" (and 2.5 forks per core for tests). I think the current PR is 2 forks per core for tests and no forking for builds, right? Or does Maven/javac support forking for builds? I've heard of schemes to parallelize builds, but I think they require building atop javac, and I don't know if Maven offers anything like that.
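
The opt-in, fork-only idea from the bullets above could look roughly like the following. This is a sketch, not part of the PR: the profile id `parallel-tests` is hypothetical, and only `forkCount` and `reuseForks` are taken from the discussion.

```xml
<!-- Fragment of a pom.xml <project>; not a complete POM. -->
<profiles>
  <profile>
    <!-- Hypothetical profile id; opt in with: mvn -Pparallel-tests test -->
    <id>parallel-tests</id>
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-surefire-plugin</artifactId>
          <configuration>
            <!-- Two forked test JVMs per CPU core, each reused across test classes. -->
            <forkCount>2C</forkCount>
            <reuseForks>true</reuseForks>
            <!-- No <parallel>/<threadCount>: tests stay single-threaded within each fork. -->
          </configuration>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>
```

Without the profile, Surefire falls back to its defaults (a single reused fork), so CI and casual builds would keep running tests serially.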

@sgammon
Contributor Author

sgammon commented Mar 11, 2024

@cpovirk My thoughts are surprisingly aligned with yours, actually:

it seems like the time for CI doesn't change dramatically

I noticed this too and found it to be an interesting point. I hope to flesh it out a bit more in my CI PR.

In particular, I saw a failure in testLocationsFrom_idempotentScan that produced a massive list of files [...] but maybe we could put it behind a profile that people could opt into

I thought about that, or perhaps just suppressing failures until they actually fail--that test seems to flake but always passes when it is retried, rendering the error messages useless anyway. I'll look into options in the Surefire plugin. The build still hasn't actually flaked because of this test. I assume that, if someone is building Guava from source, they should be okay with failure logs as long as the build itself doesn't consider it a failure, but the point you make here is valid and it bothers me too.

There might also be ways to get tests to actually work when run in parallel

Yes, I figure this is worth looking at. Honestly, it's literally 4-5 tests that seem to be angry out of 1M+. I could maybe gather which tests are flaky so a determination could be made about simply disabling them. I think there are ~500 ignored tests as it is so it may not be a big deal, I don't know. I definitely didn't want to do this without asking first, though.

And that brings me to another thing I was wondering about: Can we get most of the gains from only forks, without thread-level parallelism?

I noticed this, too, once I actually visualized the benchmarks. I think you are right, and I think most of the "risk" comes from the threaded building in Maven, which is new and not as well supported as VM forking during tests. I'll keep the configurable properties but roll back the threaded build as a default; that should offer a nice balance, where the threaded build can be tried but it isn't on by default.

The test forking obviously provides a huge benefit so I'll leave that on by default unless otherwise advised

This may end up as another reason to have people opt in to the parallelism

I think that's smart if only because threaded building is still a bit buggy in Maven. But you're right that the build should not necessarily assume it is the only thing running on the machine. (Still bugs me, then, that CI sees less of an improvement, but anyway.)

In your comment above, you mention "2 threads per core for builds" (and 2.5 forks per core for tests). I think the current PR is 2 forks per core for tests and no forking for builds

That's right, it's Maven's multi-threaded building, not forking. I'm not sure if Maven supports forking/re-use for builds, but I do know there is now a Maven daemon; maybe that is where forking can safely take place, as jobs are assigned/consumed?

In any case, it's a moot point because of the risk identified above for multi-threaded builds. It should just be optional bc risk / low return.

This changeset optimizes the Guava build significantly by enabling
parallel build and test features supported by Maven. With these
flags enabled, only a few tests exhibit flaky behavior; applying a
sensible count of test retries (3) solves the problem.

As a result, the testsuite can now be executed often, because it
takes about 2 minutes to run. Building is also much faster. After
benchmarking different configurations, 2-threads-per-core and
2-test-forks-per-core seems optimal:

```
[INFO] Guava Maven Parent ..................... SUCCESS [  0.121 s]
[INFO] Guava: Google Core Libraries for Java .. SUCCESS [  9.681 s]
[INFO] Guava BOM .............................. SUCCESS [  0.120 s]
[INFO] Guava Testing Library .................. SUCCESS [ 47.883 s]
[INFO] Guava Unit Tests ....................... SUCCESS [01:57 min]  <--
[INFO] Guava GWT compatible libs .............. SUCCESS [  6.909 s]
```

When built and executed serially:
```
[INFO] Guava Maven Parent ..................... SUCCESS [  0.129 s]
[INFO] Guava: Google Core Libraries for Java .. SUCCESS [ 15.653 s]
[INFO] Guava BOM .............................. SUCCESS [  0.064 s]
[INFO] Guava Testing Library .................. SUCCESS [01:26 min]
[INFO] Guava Unit Tests ....................... SUCCESS [06:26 min] <--
[INFO] Guava GWT compatible libs .............. SUCCESS [ 11.092 s]
```

Benchmark hardware:
- Apple M2 Max, 96GB RAM
- macOS Sonoma 14.3.1
- GraalVM CE JVM 21.0.2

```
openjdk version "21.0.2" 2024-01-16
OpenJDK Runtime Environment GraalVM CE 21.0.2+13.1 (build 21.0.2+13-jvmci-23.1-b30)
OpenJDK 64-Bit Server VM GraalVM CE 21.0.2+13.1 (build 21.0.2+13-jvmci-23.1-b30, mixed mode, sharing)
```

- chore: enable parallel build
- chore: enable parallel test execution
- chore: enable parallel gc for maven
- chore: tune tiered compilation for maven
- chore: tune thread count for maven
- fix: enable test retries (max = 3) for parallel-flaky tests

Signed-off-by: Sam Gammon <[email protected]>
@sgammon sgammon force-pushed the chore/build-test-performance branch from a726c91 to e946f2b Compare March 12, 2024 01:14
@cpovirk
Member

cpovirk commented Mar 12, 2024

  • Thanks for the link on multi-threaded building. I was confusing "building" broadly (which is "easy" to parallelize) with "compiling" specifically (which is harder). And I had been focused on the pom.xml from my comments on the tests, so I'd forgotten about -T2C. So now I understand that the multi-threaded building is in place.

  • The thread-based parallelism does slow my machine's responsiveness. It doesn't grind things to a halt, but it does make things a bit harder. The fork-based parallelism seems gentler.

  • But in my new run of fork-based parallelism, I got a mysterious crash:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.2.5:test (default-test) on project guava-tests: 
[ERROR] 
[ERROR] Please refer to /usr/local/google/home/cpovirk/clients/guava-black/guava/guava-tests/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd '/usr/local/google/home/cpovirk/clients/guava-black/guava/guava-tests' && '/usr/local/buildtools/java/jdk11/bin/java' '-Xmx1536M' '-Duser.language=hi' '-Duser.country=IN' '--add-opens' 'java.base/java.lang=ALL-UNNAMED' '--add-opens' 'java.base/java.util=ALL-UNNAMED' '--add-opens' 'java.base/sun.security.jca=ALL-UNNAMED' '-jar' '/usr/local/google/home/cpovirk/clients/guava-black/guava/guava-tests/target/surefire/surefirebooter-20240312164054787_92.jar' '/usr/local/google/home/cpovirk/clients/guava-black/guava/guava-tests/target/surefire' '2024-03-12T16-37-51_542-jvmRun5' 'surefire-20240312164054787_88tmp' 'surefire_28-20240312164054787_91tmp'
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 139
[ERROR] Crashed tests:
[ERROR] com.google.common.util.concurrent.FuturesTest
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd '/usr/local/google/home/cpovirk/clients/guava-black/guava/guava-tests' && '/usr/local/buildtools/java/jdk11/bin/java' '-Xmx1536M' '-Duser.language=hi' '-Duser.country=IN' '--add-opens' 'java.base/java.lang=ALL-UNNAMED' '--add-opens' 'java.base/java.util=ALL-UNNAMED' '--add-opens' 'java.base/sun.security.jca=ALL-UNNAMED' '-jar' '/usr/local/google/home/cpovirk/clients/guava-black/guava/guava-tests/target/surefire/surefirebooter-20240312164054787_92.jar' '/usr/local/google/home/cpovirk/clients/guava-black/guava/guava-tests/target/surefire' '2024-03-12T16-37-51_542-jvmRun5' 'surefire-20240312164054787_88tmp' 'surefire_28-20240312164054787_91tmp'
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 139
[ERROR] Crashed tests:
[ERROR] com.google.common.util.concurrent.FuturesTest
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456)
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkOnceMultiple(ForkStarter.java:358)
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:296)
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:250)
[ERROR] 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1241)
[ERROR] 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1090)
[ERROR] 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:910)
[ERROR] 	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:126)
[ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2(MojoExecutor.java:328)
[ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute(MojoExecutor.java:316)
[ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
[ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:174)
[ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.access$000(MojoExecutor.java:75)
[ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor$1.run(MojoExecutor.java:162)
[ERROR] 	at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute(DefaultMojosExecutionStrategy.java:39)
[ERROR] 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:159)
[ERROR] 	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:105)
[ERROR] 	at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:193)
[ERROR] 	at org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:180)
[ERROR] 	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[ERROR] 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
[ERROR] 	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[ERROR] 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
[ERROR] 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
[ERROR] 	at java.base/java.lang.Thread.run(Thread.java:830)
[ERROR] Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd '/usr/local/google/home/cpovirk/clients/guava-black/guava/guava-tests' && '/usr/local/buildtools/java/jdk11/bin/java' '-Xmx1536M' '-Duser.language=hi' '-Duser.country=IN' '--add-opens' 'java.base/java.lang=ALL-UNNAMED' '--add-opens' 'java.base/java.util=ALL-UNNAMED' '--add-opens' 'java.base/sun.security.jca=ALL-UNNAMED' '-jar' '/usr/local/google/home/cpovirk/clients/guava-black/guava/guava-tests/target/surefire/surefirebooter-20240312164054787_92.jar' '/usr/local/google/home/cpovirk/clients/guava-black/guava/guava-tests/target/surefire' '2024-03-12T16-37-51_542-jvmRun5' 'surefire-20240312164054787_88tmp' 'surefire_28-20240312164054787_91tmp'
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 139
[ERROR] Crashed tests:
[ERROR] com.google.common.util.concurrent.FuturesTest
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:643)
[ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.lambda$null$3(ForkStarter.java:350)
[ERROR] 	... 4 more

I couldn't find the alleged log files or dump files, and the normal FuturesTest output file contained only the log messages from that test. I wonder if I need to increase the timeout? I'm going to run again and see what happens. If it fails, I'll run the given command directly, which may provide a lead.

  • If we can work that out, then I'll go back to being interested in how many tests we'd need to disable in order to get parallelism working reliably without retries.

@cpovirk
Member

cpovirk commented Mar 12, 2024

I just kicked off another run with -Dmaven.surefire.debug -X. It's been chatty. With one fork outstanding, I see stuff like this at the end of the log:

ERROR: transport error 202: No sockets to listen to: Address already in use
JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c:735]
[DEBUG] Fork Channel [3] connected to the client.JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c:735]

ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c:735]
JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [src/jdk.jdwp.agent/share/native/libjdwp/debugInit.c:735]
[DEBUG] Fork Channel [5] connected to the client.
[DEBUG] Closing the fork 6 after not saying Good Bye.
[DEBUG] Closing the fork 16 after not saying Good Bye.

I'll be interested to see what the "normal" output looks like, but I'm out of time for the day.

(If it's related to ports: I want to say that our machines use up a surprising number of ports, but I wouldn't have expected it to be enough to interfere with some modest Maven parallelism. Maybe the messages above are mostly spam and the real problem is still a timeout? I haven't experimented with a higher timeout yet....)

@sgammon
Contributor Author

sgammon commented Mar 12, 2024

@cpovirk I'm working on an approach locally which will let us tag tests which need to be run serially. The vast majority of the tests work totally fine under parallel execution, and I think -T2C is doing a lot more than I initially expected. My benchmark numbers may not be showing the full gain for that because I was measuring the test step compile, rather than install, and at that phase in CI it would have access to cached classes.

So, if we can tag such tests and just run them after in their own suite, we should get most of the gain with no interference with the actual test code, which I'd like to avoid. If this can be a build-only PR it should be easier to understand the impacts.
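
One way such a split could be expressed in the build alone is with two Surefire executions. This is only a sketch under assumptions: the actual tagging mechanism is not shown in this thread, and the `*SerialTest` naming convention and the `serial-tests` execution id are hypothetical. Guava's real include/exclude patterns would need to be merged in, because setting `<excludes>` replaces Surefire's default excludes.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <executions>
    <!-- Reconfigure the built-in execution: forked in parallel, minus the serial-only tests. -->
    <execution>
      <id>default-test</id>
      <configuration>
        <forkCount>2C</forkCount>
        <reuseForks>true</reuseForks>
        <excludes>
          <!-- Hypothetical naming convention for tests that must run serially. -->
          <exclude>**/*SerialTest.java</exclude>
        </excludes>
      </configuration>
    </execution>
    <!-- Separate execution: a single fork that runs only the serial-only tests. -->
    <execution>
      <id>serial-tests</id>
      <phase>test</phase>
      <goals>
        <goal>test</goal>
      </goals>
      <configuration>
        <forkCount>1</forkCount>
        <includes>
          <include>**/*SerialTest.java</include>
        </includes>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The appeal of this shape is that it leaves test sources untouched apart from naming, which matches the goal of keeping this a build-only change.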

@cpovirk
Member

cpovirk commented Mar 13, 2024

Nice, that sounds good.

Updates on my end:

  • My run from yesterday never un-hung itself. That's discouraging, but maybe it was a fluke.
  • I kicked off fork-only runs in a loop today, and I got a crash like the one above on the 6th try.
  • Then I set forkedProcessExitTimeoutInSeconds to 600, and I kicked off a new loop. That's up to its 12th try without a failure. (It's still detecting the flake, of course.)
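
A sketch of how that timeout could be set in the POM. The value 600 is the one mentioned above; placing it next to the fork settings is an assumption about where it would live, not something confirmed in this thread.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <forkCount>2C</forkCount>
    <reuseForks>true</reuseForks>
    <!-- Seconds Surefire waits for a forked JVM to exit before treating it as stuck. -->
    <forkedProcessExitTimeoutInSeconds>600</forkedProcessExitTimeoutInSeconds>
  </configuration>
</plugin>
```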

@cpovirk
Member

cpovirk commented Mar 13, 2024

Ha, I spoke too soon! Try number 12 went on to crash. The problem was in FuturesTest again. (I want to say that the one I saw earlier today was in a different util.concurrent test, but I didn't save the output, so I'm not sure.)

This makes me nervous about even fork-level parallelism for tests :\ But even if we rule out all test parallelism, we should be able to get some benefits from the rest of the changes here.

Or, if we're lucky, the crashes are restricted to a few tests, too, in which case we can exclude them. (Hopefully we can narrow it down to something more specific than "all of FuturesTest" :))

@cpovirk
Member

cpovirk commented Mar 13, 2024

I later bumped the timeout to 6000, and try number 13 failed, this time in AbstractFutureTest. That might have been the one that had failed earlier, but I'm not certain.

@cgdecker cgdecker requested a review from cpovirk March 18, 2024 14:04
@cgdecker cgdecker added the P3 no SLO label Mar 18, 2024
@cpovirk
Member

cpovirk commented Apr 5, 2024

It occurred to me that there's one small thing that we could do to improve build performance: disable Error Prone. We run it over the code internally, so there's no strict need to run it externally, too. (I think I added it partially to bring Error Prone to people's attention and partially to help anyone who might write a PR.)

I just did 1 run of mvn clean install -Dmaven.source.skip -Dmaven.javadoc.skip -DskipTests in each of 3 configurations. (Note that those flags make the build for guava-gwt fail.)

As things are today:

[INFO] Guava: Google Core Libraries for Java .............. SUCCESS [ 32.405 s]
...
[INFO] Guava Testing Library .............................. SUCCESS [ 16.671 s]
[INFO] Guava Unit Tests ................................... SUCCESS [ 41.239 s]

-XepDisableAllWarnings:

[INFO] Guava: Google Core Libraries for Java .............. SUCCESS [ 16.384 s]
...
[INFO] Guava Testing Library .............................. SUCCESS [ 10.338 s]
[INFO] Guava Unit Tests ................................... SUCCESS [ 21.953 s]

-XepDisableAllChecks:

[INFO] Guava: Google Core Libraries for Java .............. SUCCESS [ 10.660 s]
...
[INFO] Guava Testing Library .............................. SUCCESS [  6.852 s]
[INFO] Guava Unit Tests ................................... SUCCESS [ 13.033 s]

-XepDisableAllWarnings might make a good sweet spot. (It's close to what we use internally.)
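
For illustration, a sketch of where such a flag would go, assuming Error Prone is already wired into maven-compiler-plugin as a javac plugin (Guava's actual compiler configuration may differ). Error Prone flags have to share the same `<arg>` as `-Xplugin:ErrorProne`.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <compilerArgs>
      <arg>-XDcompilePolicy=simple</arg>
      <!-- Keep error-level checks, but turn off all warning-level checks. -->
      <arg>-Xplugin:ErrorProne -XepDisableAllWarnings</arg>
    </compilerArgs>
    <!-- annotationProcessorPaths for error_prone_core omitted; assumed to be configured already. -->
  </configuration>
</plugin>
```

Swapping in `-XepDisableAllChecks` would correspond to the third, fastest configuration measured above.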

@sgammon
Contributor Author

sgammon commented Apr 5, 2024

@cpovirk Nice! I wonder if I can get a quick build matrix going so we can test all of these variants out. I'll be returning to this soon

Labels
P3 no SLO
3 participants