
Speedup Accumulation process #137

Open
TravisH18 opened this issue Nov 19, 2024 · 2 comments

@TravisH18 (Collaborator)

A few parts of the Accumulation loop are bottlenecks at the moment.

  1. make_all_cat_comids takes ~15 minutes.
  2. The zone-processing for loop in the MakeVectors function takes ~80 minutes.
  3. The Bastards function itself takes ~2-3 minutes.

Generic parallelism, switching I/O to libraries like pyogrio, and NumPy vectorization should give us large speedups without having to alter the code much.
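As a sketch of the kind of NumPy vectorization meant here, an element-by-element Python loop that remaps IDs can be replaced with a single sorted lookup via np.searchsorted. The function name and arrays below are hypothetical, not code from StreamCat:

```python
import numpy as np

def remap_ids_vectorized(ids, old_ids, new_ids):
    """Map each value in `ids` to its replacement without a Python loop.

    Assumes every value in `ids` appears in `old_ids`
    (hypothetical illustration, not the StreamCat implementation).
    """
    order = np.argsort(old_ids)                    # sort the lookup keys once
    idx = np.searchsorted(old_ids[order], ids)     # binary search, all at once
    return new_ids[order][idx]

old = np.array([10, 20, 30])
new = np.array([1, 2, 3])
print(remap_ids_vectorized(np.array([20, 10, 30, 20]), old, new))
# → [2 1 3 2]
```

The whole remap runs in compiled NumPy code, so the cost per element is a binary search rather than a Python-level iteration.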

@TravisH18 TravisH18 self-assigned this Nov 19, 2024
@TravisH18 (Collaborator, Author)

Created a Speedup branch and pushed changes to the makeVector process, which was the main slowdown in the accumulation process. I could continue working on the children/bastard functions, as well as adding numba to the swapper function, since it uses pure NumPy operations.
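For the numba idea, a pure-NumPy loop can often be compiled with the @njit decorator. The function below is a hypothetical stand-in for a swapper-style routine, not the actual StreamCat code; the try/except lets the sketch still run where numba is not installed:

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fall back to a no-op decorator so the sketch runs without numba.
    def njit(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func

@njit
def swap_values(arr, old_vals, new_vals):
    """Replace each occurrence of old_vals[j] with new_vals[j].

    Hypothetical example: with numba the explicit loops compile to
    machine code, so loop-heavy NumPy logic avoids interpreter overhead.
    """
    out = arr.copy()
    for i in range(arr.size):
        for j in range(old_vals.size):
            if arr[i] == old_vals[j]:
                out[i] = new_vals[j]
                break
    return out

a = np.array([5, 7, 5, 9])
print(swap_values(a, np.array([5, 9]), np.array([50, 90])))
# → [50  7 50 90]
```

Because the function body uses only NumPy arrays and scalar operations, it compiles in numba's nopython mode; the first call pays a compilation cost, after which repeated calls run at compiled speed.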

@TravisH18 (Collaborator, Author)

Adding another parallel loop to the main Accumulation function in StreamCat_functions.py, where a for loop currently iterates over the columns of the table of watershed values to build the NumPy ndarray (data) used to create the final parquet file. The parallel version will use one thread per column, since at this point the columns do not share data and do not need to be processed in any particular order.
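A minimal sketch of that thread-per-column pattern, assuming a dict-of-lists stands in for the watershed-value table and a placeholder doubling transform stands in for the real per-column work (both hypothetical):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def build_data_array(table):
    """Fill a 2-D `data` array with one worker per column.

    `table` maps column name -> sequence of values; the doubling is a
    placeholder for the real per-column computation (hypothetical sketch).
    """
    cols = list(table.keys())
    n_rows = len(next(iter(table.values())))
    data = np.empty((n_rows, len(cols)))

    def fill(j):
        # Each worker writes a disjoint column slice, so no locking
        # is needed and column order does not matter.
        data[:, j] = np.asarray(table[cols[j]], dtype=float) * 2.0

    with ThreadPoolExecutor() as pool:
        # Consume the iterator so any worker exception is re-raised here.
        list(pool.map(fill, range(len(cols))))
    return data

table = {"a": [1, 2], "b": [3, 4]}
print(build_data_array(table))
# → [[2. 6.]
#    [4. 8.]]
```

Threads (rather than processes) fit here because the heavy lifting happens inside NumPy calls that release the GIL, and all workers can write into the shared preallocated array without copying.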
