WIP - run 'runtime-tests' on Android #746
base: main
Have you tried running this locally?
I'm wondering if we are leaking some memory during tests or if it is just this test itself. |
Just curious: I was expecting the time to increase a lot. How long is it taking now? Anyway, +1 on running nightly and/or on-demand. |
Yes, same result, but thanks for the command line, it's helpful to continue debugging.
It takes approx 10 minutes to get to the end of the tests. |
Not too bad after all :) but it still probably makes sense to make it optional/nightly for now |
I can take a look at the test time eventually. We can also shard and/or run tests in parallel. There is always the option to use Firebase Test Lab to speed them up further. It will not cost much, but that is a consideration for later. I'm just mentioning it for reference: we have options if we need these tests to run fast. |
yes! the test actually passes with:
So we pair-debugged this a little, and it looks like the main issue is memory pressure, because a lot of tests run in a short amount of time. We tried adding a so, either we find a way to set up the VM so that this is not an issue (we did try to change the finally, one last resort would be to add a stack count in the interpreter and flag that specific test so that we put a limit on it. In the meantime, we could also just skip that one test. |
With the updated commits, especially after Now the next issue is that, as @evacchi noticed too, the Android SDK disconnects/crashes when trying to send back the test results (it's reproducible in CI). @yigit any clue? Can you try to take a look? |
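A minimal sketch of the "stack count in the interpreter" idea mentioned above, assuming the interpreter can run a hook on function entry and exit. `CallDepthGuard`, `CallStackExhausted`, and `CallDepthDemo` are hypothetical names for illustration only, not the project's API; the point is just that an explicit counter turns unbounded recursion into a predictable error well before the host stack or heap is exhausted.

```java
// Hypothetical sketch: none of these names come from the project.
final class CallStackExhausted extends RuntimeException {
    CallStackExhausted(String message) {
        super(message);
    }
}

final class CallDepthGuard {
    private final int maxDepth;
    private int depth;

    CallDepthGuard(int maxDepth) {
        if (maxDepth <= 0) {
            throw new IllegalArgumentException("maxDepth must be positive");
        }
        this.maxDepth = maxDepth;
    }

    /** Call on function entry; throws once the configured limit is reached. */
    void enter() {
        if (depth >= maxDepth) {
            throw new CallStackExhausted("call stack exhausted at depth " + depth);
        }
        depth++;
    }

    /** Call on function exit, typically from a finally block. */
    void exit() {
        depth--;
    }

    int depth() {
        return depth;
    }
}

final class CallDepthDemo {
    /** Unbounded recursion, standing in for a runaway guest function. */
    static void recurse(CallDepthGuard guard) {
        guard.enter();
        try {
            recurse(guard);
        } finally {
            guard.exit();
        }
    }

    /** Returns true if the guard stopped the recursion and the counter unwound to zero. */
    static boolean trappedBeforeOverflow(int maxDepth) {
        CallDepthGuard guard = new CallDepthGuard(maxDepth);
        try {
            recurse(guard);
            return false;
        } catch (CallStackExhausted e) {
            return guard.depth() == 0;
        }
    }
}
```

The limit could be set per test, so only the test that checks for stack exhaustion would run with a small bound while everything else keeps the default behavior.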
@yigit I forgot to answer:
This is not going to happen unless we have a real world use-case for it.
I'll integrate the changes when we finalize this PR.
Let's get to a working and more complete setup, and we iterate as we see the need. |
This is the first time I've seen that issue, but it looks like it was already reported: https://issuetracker.google.com/issues/330756856 For the short term, I think sharding is the easiest solution we have (assuming the junit5 runner doesn't have an issue with it). Hopefully this weekend I'll have some time to play with this.
Is there a limit on the virtual stack, or is it basically the limit of the host machine? If it is the limit of the host, yes, we'll likely hit that no matter what. If it is bounded by the host process, it might make sense to limit it to (max memory - some buffer like a MB). But I'm guessing the same thing would be a problem on the JVM. Maybe the overhead of the test runner is bigger, as it has to keep everything in memory until reporting (about which I'm surprised but also not that surprised :) ) |
I've created 2 variants with sharding.
We also don't have to use a matrix build and can simply invoke Gradle multiple times in the same build to run shard by shard. We might need some massaging to prevent results from overwriting each other. We may also not want to shard small flavors (e.g. run The other 2 things in those branches are:
It is failing with
like errors but I didn't get a chance to diagnose. |
@yigit thanks a lot for getting back and all the information!
I did the research (specs, proposals, etc.) and haven't found any good reference; it looks like everyone is relying on the underlying runtime limit.
already reported here.
I'm fine with it if it's reproducible locally (and we can bundle the command to execute everything locally in a bash script). Trying out the commands in your branch, sharding is trying to execute single test methods on different shards, and this is not going to work (that's the root cause of the NPE you noticed). Regarding "zipping the results" and "Gradle scans," I truly appreciate your enthusiasm and your drive to make things better, thank you! That said, it's important to keep the scope in mind. The Gradle build we use for running tests on Android serves as a basic sanity check. The simpler it is, the better. More specifically:
Thank you again for your efforts—I really value your commitment and thoughtfulness! |
Forgot to follow up regarding the build matrix: |
I'm afraid the Android test runner does not support sharding by class 😞 .
Not sure if I understand. Do you mean invoking Gradle multiple times, once per shard? We can definitely extract this to a shell script or a single Gradle task (which might be nicer for local analysis, as Studio would discover it as well, so one can debug a specific failure). But I'm guessing a shell script will be easier for contributors. Also, about the test results/scans: scans are good if we ever want to optimize this, but they don't really matter (it also doesn't cost anything to publish them for open source projects). We can remove either of these (though I would recommend keeping test results). |
I played a bit more with this, sharding by class etc. Maybe that can be avoided by changing codegen to divide tests, but based on the order problem you've mentioned, I'm not even sure if that is feasible. I think the only viable option might be to generate fat tests for Android, e.g. JavaTestGen could generate tests that look like this when invoked for Android.
Not sure if that is something you would be interested in doing, but I'm a bit out of options here :'(. Sorry, I didn't expect Android support to get so complicated :/. |
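To illustrate the difference between sharding by method and sharding by class discussed above, here is a small sketch of a class-level assignment, assuming the list of test class names is available to whatever script or task drives the shards. `ClassSharder` is a hypothetical helper, not part of the Android test runner or of this build; how the resulting list would be handed to the runner is left open.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps every method of a test class in the same shard
// by assigning shards from the class name alone.
final class ClassSharder {
    /** Maps a class name to a shard index in [0, shardCount). */
    static int shardOf(String className, int shardCount) {
        // String.hashCode is specified by the JDK, so the mapping is stable across runs.
        return Math.floorMod(className.hashCode(), shardCount);
    }

    /** Returns the class names assigned to the given shard, preserving input order. */
    static List<String> classesForShard(List<String> classNames, int shardIndex, int shardCount) {
        List<String> selected = new ArrayList<>();
        for (String name : classNames) {
            if (shardOf(name, shardCount) == shardIndex) {
                selected.add(name);
            }
        }
        return selected;
    }
}
```

Because the shard is derived from the class name only, tests that depend on the execution order within a class stay together, which is the property that method-level sharding breaks.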
no worries, all your suggestions are very welcome :D
I think what @andreaTP meant is "a single command to run the entire suite", but let's hear from him
I don't know what you guys think of this approach, but if we know which tests cause the problem (and I believe there will be a handful), maybe we could just exclude them from the main run and then create a task that runs those separately? e.g.
we could also "tag" the tests with an annotation (e.g. |
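A rough sketch of the tagging idea, written in plain Java so it stands alone. JUnit 5 ships its own `@Tag` annotation for this purpose; the `@HeavyTest` marker, `SampleSuite`, and `HeavyTestFilter` below are hypothetical names that only show the mechanism of splitting a suite into a main run and a separate run.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical marker for tests known to cause problems in the main run. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface HeavyTest {}

/** Hypothetical suite with one ordinary and one tagged test. */
final class SampleSuite {
    void addsNumbers() {}

    @HeavyTest
    void exhaustsCallStack() {}
}

final class HeavyTestFilter {
    /** Sorted names of declared methods whose tagged state equals {@code heavy}. */
    static List<String> select(Class<?> suite, boolean heavy) {
        List<String> names = new ArrayList<>();
        for (Method method : suite.getDeclaredMethods()) {
            if (method.isSynthetic()) {
                continue; // skip compiler- or tool-generated methods
            }
            if (method.isAnnotationPresent(HeavyTest.class) == heavy) {
                names.add(method.getName());
            }
        }
        Collections.sort(names); // getDeclaredMethods has no guaranteed order
        return names;
    }
}
```

With this split, `HeavyTestFilter.select(SampleSuite.class, false)` would feed the main run and `select(SampleSuite.class, true)` the separate task.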
Quickly answering from the phone.
No, I mean removing the matrix for the Android SDK and testing only on 33. Re: test results, I'm happy to keep whatever comes for free, e.g. a step in CI is fine. Re: scans, no strong opinions; the Gradle build is a side build for this project, and I'm not sure how I would act on feedback from the scans.
This is really worrying!
The current test gen is already able to split the generation based on configuration, but I don't think we should invest in a complex setup there. All in all, I think the priority here is to open a bug report and try to overcome the gRPC limit/client disconnection (this is the biggest blocker). Second, we should start considering running a few smoke tests instead of the full TCK. |
@yigit I was honestly expecting some hiccups :-) and I'm grateful to have someone knowledgeable like you assisting in the process. Thanks a ton and no worries! |
I noticed that the bug somehow didn't get triaged and was reported in the wrong component. I moved it to the Android Gradle Plugin and will ping someone during the week. I'm afraid this is the code that creates that gRPC client: And it doesn't receive any configuration parameters. Are the |
Separating the tests in groups with annotations is definitely possible and easy to implement 👍 as soon as we verify that it can work I volunteer to make the changes (e.g.
|
Usually it is possible to use env vars and global properties anyway; I'll check on Monday. |
Investigating #746 I noticed that the resizes of the `ArrayList` were consuming a lot of memory.
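For context on why resizes are costly: when an `ArrayList` runs out of capacity it allocates a larger backing array and copies the elements over, so both arrays are live for a moment. A minimal sketch of the usual mitigation, assuming the element count is known or can be estimated up front; `PresizedList` is a hypothetical name and this is not necessarily the change that was made.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing a capacity hint; not taken from the project.
final class PresizedList {
    /** Builds a list of n elements with the backing array allocated once up front. */
    static List<Integer> build(int n) {
        // The capacity hint avoids repeated grow-and-copy cycles while filling the list.
        List<Integer> values = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            values.add(i);
        }
        return values;
    }
}
```

When the size is not known in advance, `ArrayList.ensureCapacity` can play the same role once an estimate becomes available.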
@yigit no pressure and no timelines, just curious if you managed to keep the ball rolling on this subject and have any kind of news 🙂 . I'd be happy to help, but this is all very new for me, and I'm unsure where to start, for example, to try to tweak the gRPC limit. Any link/information will be much appreciated! |
Hey, sorry, I forgot to update here, but I did re-route it internally. The team is looking into it, but getting it fixed and then shipped in a version that we can use is going to take a long time :'( And there doesn't seem to be a workaround. I think our 2 options here are:
If (2) is not too complex of a code-generator change, I think that is the better option so we get coverage without waiting for another AGP release. I can give it a try if you are open to that solution. |
Thanks a lot for getting back @yigit !
Given the past conversation where you mentioned that even some single test classes are exceeding the limits, I'm not sure how it will work. In my mind, after #748, you should be able to avoid generating the test jars and directly invoke the codegen from Gradle to materialize test classes on the fly. If everything fails for running the test suite, we can always fall back to smoke testing; in this case the next step would be to include the manually written wasi tests, as they are running some popular languages' "hello world" samples. |
This should be doable but a bit of work ... I'd try to avoid it if possible. Re Gradle, I was thinking about doing something like this: |
Opening for visibility @yigit and @evacchi.
I'm not done with this:
wasi (Wat2Wasm), decided to give up and disabled the tests, @yigit help here will be appreciated 🙏
skip-stack-guard-page test (it's checking the call stack exhausted)
Thoughts?