Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue: compile times #1303

Open
LukasKalbertodt opened this issue Dec 12, 2024 · 1 comment
Open

Tracking issue: compile times #1303

LukasKalbertodt opened this issue Dec 12, 2024 · 1 comment
Labels
area:backend Everything backend related kind:tracking-issue Long lived for tracking the progress of a more general topic

Comments

@LukasKalbertodt
Copy link
Member

The incremental compile times for the Tobira backend have gotten quite annoying. touch src/api/id.rs && cargo build takes roughly 16s for me, which is not great. And I have a Ryzen 7 7700X which is quite modern, especially regarding single core performance, which is most relevant for incremental compile times.
The full compile time (including all dependencies) with 51s isn't that bad, that's quite acceptable. But yeah the incremental compile time really makes development ... uhg.

This issue is just here to collect information on the issue and potentially track progress (or lack thereof...).


Unfortunately, it's not too surprising that Tobira takes a while to compile. We use lots of procedural macro helpers (confique for configuration, juniper for the GraphQL API, serde for various (de)serializations, clap for CLI args, reinda for assets, ...). I don't want to shit on proc-macros, I do like the tool, but they have to be expanded and, likely more importantly, they can quickly generate quite a lot of code that rustc then has to churn through.


Some resources:

@LukasKalbertodt LukasKalbertodt added area:backend Everything backend related kind:tracking-issue Long lived for tracking the progress of a more general topic labels Dec 12, 2024
@LukasKalbertodt
Copy link
Member Author

I did some investigations. (On c82a279).


Running RUSTC_BOOTSTRAP=1 cargo rustc -- -Z self-profile twice (first time rebuilds everything, second time is incremental), and taking the newer written .mm_profdata file, running summarize on it, I get:

+-------------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| Item                                      | Self time | % of total time | Time     | Item count | Incremental load time | Incremental result hashing time |
+-------------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| LLVM_module_codegen_emit_obj              | 9.28s     | 30.332          | 9.28s    | 66         | 0.00ns                | 0.00ns                          |
+-------------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| run_linker                                | 5.52s     | 18.038          | 5.52s    | 1          | 0.00ns                | 0.00ns                          |
+-------------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| LLVM_passes                               | 3.29s     | 10.757          | 3.29s    | 1          | 0.00ns                | 0.00ns                          |
+-------------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| codegen_module                            | 2.97s     | 9.714           | 3.97s    | 65         | 0.00ns                | 0.00ns                          |
+-------------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| eval_to_allocation_raw                    | 649.51ms  | 2.124           | 2.98s    | 6500       | 2.73ms                | 2.29ms                          |
+-------------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| LLVM_module_codegen                       | 641.36ms  | 2.097           | 9.92s    | 66         | 0.00ns                | 0.00ns                          |
+-------------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| typeck                                    | 572.67ms  | 1.872           | 2.03s    | 6207       | 51.22ms               | 11.76ms                         |
+-------------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| monomorphization_collector_graph_walk     | 523.41ms  | 1.711           | 1.35s    | 1          | 0.00ns                | 0.00ns                          |
+-------------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| LLVM_module_optimize                      | 499.39ms  | 1.633           | 499.39ms | 66         | 0.00ns                | 0.00ns                          |
+-------------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+

It's pretty clear from this, that the code generation phase takes by far the longest. Here I used the default linker it seems (why is my mold config gone... oh well). These phases take so long because rustc is likely emitting tons of LLVM code. This is unfortunately also a well known problem (thanks to proc macros and monomorphization). It's not impossible to improve the situation, however!

Here is the flame graph for the same data:

rustc

Running cargo llvm-lines, I get:

  Lines                  Copies               Function name
  -----                  ------               -------------
  3245204                88453                (TOTAL)
    85339 (2.6%,  2.6%)     61 (0.1%,  0.1%)  juniper::types::async_await::resolve_selection_set_into_async_recursive::{{closure}}
    35733 (1.1%,  3.7%)    129 (0.1%,  0.2%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
    34648 (1.1%,  4.8%)    310 (0.4%,  0.6%)  <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
    33079 (1.0%,  5.8%)    290 (0.3%,  0.9%)  <futures_util::future::future::flatten::Flatten<Fut,<Fut as core::future::future::Future>::Output> as core::future::future::Future>::poll
    31785 (1.0%,  6.8%)    291 (0.3%,  1.2%)  <alloc::sync::Weak<T,A> as core::ops::drop::Drop>::drop
    30391 (0.9%,  7.7%)    310 (0.4%,  1.6%)  futures_util::future::future::map::_::<impl futures_util::future::future::map::Map<Fut,F>>::project_replace
    28252 (0.9%,  8.6%)     84 (0.1%,  1.7%)  <futures_util::stream::futures_unordered::FuturesUnordered<Fut> as futures_core::stream::Stream>::poll_next
    26474 (0.8%,  9.4%)    366 (0.4%,  2.1%)  juniper::types::async_await::resolve_selection_set_into_async_recursive::{{closure}}::{{closure}}
    26070 (0.8%, 10.2%)    975 (1.1%,  3.2%)  <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
    24569 (0.8%, 11.0%)   1024 (1.2%,  4.3%)  alloc::boxed::Box<T>::new
    21480 (0.7%, 11.6%)    179 (0.2%,  4.5%)  juniper::executor::Registry<S>::get_type
    19372 (0.6%, 12.2%)    684 (0.8%,  5.3%)  core::result::Result<T,E>::map_err
    18564 (0.6%, 12.8%)    156 (0.2%,  5.5%)  juniper::executor::Executor<CtxT,S>::resolve_async::{{closure}}
    18330 (0.6%, 13.4%)    136 (0.2%,  5.6%)  <core::slice::iter::Iter<T> as core::iter::traits::iterator::Iterator>::fold
    17042 (0.5%, 13.9%)    624 (0.7%,  6.4%)  core::option::Option<T>::map
    16182 (0.5%, 14.4%)     87 (0.1%,  6.5%)  alloc::raw_vec::RawVec<T,A>::try_allocate_in
    16159 (0.5%, 14.9%)    113 (0.1%,  6.6%)  juniper::executor::Executor<CtxT,S>::resolve_with_ctx_async::{{closure}}
    14808 (0.5%, 15.4%)    102 (0.1%,  6.7%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
    14774 (0.5%, 15.8%)     83 (0.1%,  6.8%)  <futures_util::stream::futures_ordered::FuturesOrdered<Fut> as futures_core::stream::Stream>::poll_next
    14722 (0.5%, 16.3%)     84 (0.1%,  6.9%)  futures_util::stream::futures_unordered::FuturesUnordered<Fut>::new
    14641 (0.5%, 16.7%)    510 (0.6%,  7.5%)  core::result::Result<T,E>::map
    14048 (0.4%, 17.1%)    979 (1.1%,  8.6%)  core::ops::function::FnOnce::call_once
    14001 (0.4%, 17.6%)     44 (0.0%,  8.6%)  <toml_edit::de::value::ValueDeserializer as serde::de::Deserializer>::deserialize_any
    13921 (0.4%, 18.0%)    309 (0.3%,  9.0%)  core::pin::Pin<Ptr>::set
    12790 (0.4%, 18.4%)     84 (0.1%,  9.1%)  futures_util::stream::futures_unordered::FuturesUnordered<Fut>::release_task
    12705 (0.4%, 18.8%)    314 (0.4%,  9.4%)  std::panicking::try
    11928 (0.4%, 19.2%)     84 (0.1%,  9.5%)  futures_util::stream::futures_unordered::FuturesUnordered<Fut>::unlink
    11472 (0.4%, 19.5%)    201 (0.2%,  9.7%)  alloc::sync::Arc<T>::new
    11440 (0.4%, 19.9%)   2860 (3.2%, 13.0%)  core::pin::Pin<Ptr>::new_unchecked
    11346 (0.3%, 20.2%)    206 (0.2%, 13.2%)  <alloc::boxed::Box<T,A> as core::ops::drop::Drop>::drop
    11256 (0.3%, 20.6%)     84 (0.1%, 13.3%)  alloc::sync::Arc<T,A>::from_raw_in
    11047 (0.3%, 20.9%)    290 (0.3%, 13.6%)  futures_util::future::future::flatten::_::<impl futures_util::future::future::flatten::Flatten<Fut1,Fut2>>::project
    11012 (0.3%, 21.2%)    200 (0.2%, 13.9%)  <alloc::sync::Arc<T,A> as core::ops::drop::Drop>::drop
    11004 (0.3%, 21.6%)     84 (0.1%, 13.9%)  futures_util::stream::futures_unordered::FuturesUnordered<Fut>::link
    10798 (0.3%, 21.9%)     22 (0.0%, 14.0%)  <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_struct
    10578 (0.3%, 22.2%)     86 (0.1%, 14.1%)  alloc::sync::Arc<T,A>::downgrade
    10416 (0.3%, 22.6%)     84 (0.1%, 14.2%)  futures_util::stream::futures_unordered::ready_to_run_queue::ReadyToRunQueue<Fut>::dequeue
    10364 (0.3%, 22.9%)    188 (0.2%, 14.4%)  <alloc::sync::Arc<T,A> as core::clone::Clone>::clone
    10121 (0.3%, 23.2%)    310 (0.4%, 14.7%)  futures_util::future::future::map::_::<impl futures_util::future::future::map::Map<Fut,F>>::project
    10103 (0.3%, 23.5%)      3 (0.0%, 14.7%)  <meilisearch_sdk::errors::_::<impl serde::de::Deserialize for meilisearch_sdk::errors::ErrorCode>::deserialize::__Visitor as serde::de::Visitor>::visit_enum
     9576 (0.3%, 23.8%)     84 (0.1%, 14.8%)  futures_util::stream::futures_unordered::FuturesUnordered<Fut>::push
     9177 (0.3%, 24.1%)     57 (0.1%, 14.9%)  <toml_edit::de::spanned::SpannedDeserializer<T> as serde::de::MapAccess>::next_value_seed

Independent of all that, I tried disabling built to see if it changed anything. Kind of? FWIW I used this as dummy replacement:

mod build_info {
    // include!(concat!(env!("OUT_DIR"), "/built.rs"));
    pub const PKG_VERSION: &str = "2.12";
    pub const BUILT_TIME_UTC: &str = "now";
    pub const GIT_COMMIT_HASH: Option<&str> = Some("123");
    pub const GIT_DIRTY: Option<bool> = Some(false);
    pub const TARGET: &str = "beteradf";
}

And it reduced build times from 16.5s to 16s. Which isn't a lot relatively speaking, but if we manage to reduce the build times drastically by some other way, it's worth remembering these 500ms we can maybe shave off.


I will however, not actually spend any time now trying to improve the situation. I just wanted to write down my observations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:backend Everything backend related kind:tracking-issue Long lived for tracking the progress of a more general topic
Projects
None yet
Development

No branches or pull requests

1 participant