
Speedup Accumulation process #137

Open
TravisH18 opened this issue Nov 19, 2024 · 2 comments

@TravisH18 (Collaborator)

A few parts of the Accumulation loop are bottlenecks at the moment.

  1. make_all_cat_comids takes ~15 minutes.
  2. The zone-processing for loop in the MakeVectors function takes ~80 minutes.
  3. The Bastards function itself takes ~2-3 minutes.

Generic parallelism, switching I/O to libraries like pyogrio, and NumPy vectorization should give us large speedups without having to alter the code much.
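As a sketch of the kind of NumPy vectorization meant here, an element-by-element Python loop that remaps IDs can be replaced with a single sorted lookup via np.searchsorted. The function name and arrays below are hypothetical, not code from StreamCat:

```python
import numpy as np

def remap_ids_vectorized(ids, old_ids, new_ids):
    """Map each value in `ids` to its replacement without a Python loop.

    Assumes every value in `ids` appears in `old_ids`
    (hypothetical illustration, not the StreamCat implementation).
    """
    order = np.argsort(old_ids)                    # sort the lookup keys once
    idx = np.searchsorted(old_ids[order], ids)     # binary search, all at once
    return new_ids[order][idx]

old = np.array([10, 20, 30])
new = np.array([1, 2, 3])
print(remap_ids_vectorized(np.array([20, 10, 30, 20]), old, new))
# → [2 1 3 2]
```

The whole remap runs in compiled NumPy code, so the cost per element is a binary search rather than a Python-level iteration.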

@TravisH18 TravisH18 self-assigned this Nov 19, 2024
@TravisH18 (Collaborator, Author)

Created a Speedup branch and pushed changes to the makeVector process, which was the main slowdown in the accumulation process. I could continue working on the children/bastard functions, as well as adding numba to the swapper function, since it uses pure NumPy operations.
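For the numba idea, a pure-NumPy loop can often be compiled with the @njit decorator. The function below is a hypothetical stand-in for a swapper-style routine, not the actual StreamCat code; the try/except lets the sketch still run where numba is not installed:

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fall back to a no-op decorator so the sketch runs without numba.
    def njit(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func

@njit
def swap_values(arr, old_vals, new_vals):
    """Replace each occurrence of old_vals[j] with new_vals[j].

    Hypothetical example: with numba the explicit loops compile to
    machine code, so loop-heavy NumPy logic avoids interpreter overhead.
    """
    out = arr.copy()
    for i in range(arr.size):
        for j in range(old_vals.size):
            if arr[i] == old_vals[j]:
                out[i] = new_vals[j]
                break
    return out

a = np.array([5, 7, 5, 9])
print(swap_values(a, np.array([5, 9]), np.array([50, 90])))
# → [50  7 50 90]
```

Because the function body uses only NumPy arrays and scalar operations, it compiles in numba's nopython mode; the first call pays a compilation cost, after which repeated calls run at compiled speed.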

@TravisH18 (Collaborator, Author)

Adding another parallel loop to the main Accumulation function in StreamCat_functions.py, where a for loop currently iterates over the columns of the table of watershed values to build the NumPy ndarray (data) used to create the final parquet file. The parallel version will use one thread per column, since at this point the columns do not share data and do not need to be processed in any particular order.
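A minimal sketch of that thread-per-column pattern, assuming a dict-of-lists stands in for the watershed-value table and a placeholder doubling transform stands in for the real per-column work (both hypothetical):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def build_data_array(table):
    """Fill a 2-D `data` array with one worker per column.

    `table` maps column name -> sequence of values; the doubling is a
    placeholder for the real per-column computation (hypothetical sketch).
    """
    cols = list(table.keys())
    n_rows = len(next(iter(table.values())))
    data = np.empty((n_rows, len(cols)))

    def fill(j):
        # Each worker writes a disjoint column slice, so no locking
        # is needed and column order does not matter.
        data[:, j] = np.asarray(table[cols[j]], dtype=float) * 2.0

    with ThreadPoolExecutor() as pool:
        # Consume the iterator so any worker exception is re-raised here.
        list(pool.map(fill, range(len(cols))))
    return data

table = {"a": [1, 2], "b": [3, 4]}
print(build_data_array(table))
# → [[2. 6.]
#    [4. 8.]]
```

Threads (rather than processes) fit here because the heavy lifting happens inside NumPy calls that release the GIL, and all workers can write into the shared preallocated array without copying.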
