Implement task restart policies #280

Open
wants to merge 63 commits into
base: main
from

Conversation

ianmkenney
Member

closes #277

* Test: test_add_task_restart_policy_patterns
* Test: test_get_task_restart_policy_patterns
* Test: test_remove_task_restart_policy_patterns
* Test: test_clear_task_restart_policy_patterns
* Test: test_task_resolve_restarts
* TaskRestartPattern
* TaskRestartPolicy
* TaskHistory
* Removed TaskRestartPolicy and TaskHistory
* Added Traceback
* TaskRestartPattern: Confirm that the input pattern is a string and that it is not empty.
* Traceback: Confirm that the input is a list of strings and that none of them are empty.
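The two validation items above could look roughly like the following. This is a minimal sketch, not the code in this PR: the class bodies, the constructor signatures, and the choice of `ValueError` are assumptions made for illustration; only the checks themselves (non-empty string pattern, list of non-empty strings) come from the checklist.

```python
class TaskRestartPattern:
    """Hypothetical stand-in for the model: one pattern string plus a retry cap."""

    def __init__(self, pattern, max_retries):
        # The pattern must be a string and must not be empty.
        if not isinstance(pattern, str) or pattern == "":
            raise ValueError("`pattern` must be a non-empty string")
        self.pattern = pattern
        self.max_retries = max_retries


class Traceback:
    """Hypothetical stand-in for the model: a list of traceback strings."""

    def __init__(self, tracebacks):
        # The input must be a list, and every entry a non-empty string.
        if not isinstance(tracebacks, list):
            raise ValueError("`tracebacks` must be a list of strings")
        if not all(isinstance(tb, str) and tb != "" for tb in tracebacks):
            raise ValueError("`tracebacks` must contain only non-empty strings")
        self.tracebacks = tracebacks
```

Rejecting bad input at construction time means that anything stored later can be assumed to be well-formed.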
@ianmkenney ianmkenney force-pushed the feature/iss-277-restart-policy branch from 7e82f54 to 6a167f1 Compare July 18, 2024 20:46
ianmkenney and others added 24 commits July 22, 2024 14:24
Similar to `TaskHub`s, the `TaskRestartPattern` needs additional hashed
data to uniquely identify it as a Neo4j node (via the gufe key). The
unit tests have been updated to reflect this change.
`statestore` methods have been added to modify the database state:

* add_task_restart_patterns
* remove_task_restart_patterns
* get_task_restart_patterns

Tests were added for each method in the integration tests for the
statestore.
The `add_task_restart_patterns` method now establishes the APPLIES
relationship between each new pattern and all Tasks ACTIONED
on the corresponding TaskHub.

Added testing for creation of the APPLIES relationship, asserting
the number of created connections over multiple TaskHubs and Tasks.

Further subdivided the test classes.

Additionally added a `set_task_restart_patterns_max_retries` method
for updating the max_retries of a TaskRestartPattern.
Actioning a Task on a TaskHub with preexisting TaskRestartPatterns
now creates the APPLIES relationship between them with a num_retries
value of 0. This behavior is tested in the test_action_task function
for the statestore.
When an actioned Task is canceled and also has an APPLIES relationship
with a TaskRestartPattern, the APPLIES relationship between the two
nodes is removed.

Removed org, project, and campaign fields since they are not
necessary for the APPLIES relationship.
Setting an actioned Task status to the following statuses now removes
the APPLIES relationship from attached TaskRestartPatterns:

* complete
* invalid
* deleted

NOTE: tests have not been added for this yet
Confirming that changing the status of an actioned Task to any of
the following removes the APPLIES relationship:

* complete
* invalid
* deleted
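The APPLIES lifecycle described in the commits above can be summarized with a small in-memory model. This is only a sketch under stated assumptions: the real implementation lives in Neo4j via the statestore, and the `RestartGraph` class, its method names, and its dict-based storage are all hypothetical. Only the rules it encodes (APPLIES created with `num_retries` of 0 on adding a pattern or actioning a Task, removed on cancel or on the statuses `complete`, `invalid`, `deleted`) come from the commit messages.

```python
# Statuses that remove APPLIES from an actioned Task, per the commit message.
TERMINAL_STATUSES = {"complete", "invalid", "deleted"}


class RestartGraph:
    """Hypothetical in-memory stand-in for the graph state."""

    def __init__(self):
        self.actioned = {}  # taskhub -> set of actioned tasks
        self.patterns = {}  # taskhub -> set of pattern strings
        self.applies = {}   # (taskhub, pattern, task) -> num_retries

    def add_patterns(self, taskhub, patterns):
        # New patterns apply to every Task already actioned on the TaskHub.
        self.patterns.setdefault(taskhub, set()).update(patterns)
        for pattern in patterns:
            for task in self.actioned.get(taskhub, set()):
                self.applies.setdefault((taskhub, pattern, task), 0)

    def action_task(self, taskhub, task):
        # Preexisting patterns apply to a newly actioned Task, starting at 0.
        self.actioned.setdefault(taskhub, set()).add(task)
        for pattern in self.patterns.get(taskhub, set()):
            self.applies.setdefault((taskhub, pattern, task), 0)

    def cancel_task(self, taskhub, task):
        # Canceling removes APPLIES between the Task and that hub's patterns.
        self.actioned.get(taskhub, set()).discard(task)
        self._drop(lambda key: key[0] == taskhub and key[2] == task)

    def set_status(self, task, status):
        # Terminal statuses remove APPLIES from all attached patterns.
        if status in TERMINAL_STATUSES:
            self._drop(lambda key: key[2] == task)

    def _drop(self, predicate):
        for key in [k for k in self.applies if predicate(k)]:
            del self.applies[key]
```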
New statestore method placeholders:
  - add_task_traceback
  - resolve_task_restarts

The compute api will add a Task Traceback and resolve restarts for
returned failed Tasks.

When a list of restart patterns is added, restarts are resolved.
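For readers unfamiliar with the idea, resolving a restart for one failed Task might be sketched as below. The commit message does not spell out the decision rule, so everything here is an assumption: the function name `resolve_restart`, the use of `re.search` for matching, the rule that an unmatched failure is canceled, and the rule that a Task is canceled once any matching pattern exceeds its `max_retries`.

```python
import re


def resolve_restart(tracebacks, max_retries_by_pattern, num_retries):
    """Decide what to do with one failed Task (hypothetical rule set).

    tracebacks: list of traceback strings from the failed result.
    max_retries_by_pattern: dict mapping pattern string -> max_retries.
    num_retries: dict mapping pattern string -> retries so far (mutated).
    Returns "restart" or "cancel".
    """
    matched = [
        pattern
        for pattern in max_retries_by_pattern
        if any(re.search(pattern, tb) for tb in tracebacks)
    ]
    if not matched:
        # Assumption: a failure no pattern recognizes is not retried.
        return "cancel"
    for pattern in matched:
        num_retries[pattern] = num_retries.get(pattern, 0) + 1
    if any(num_retries[p] > max_retries_by_pattern[p] for p in matched):
        # Assumption: exceeding any matching pattern's cap stops retries.
        return "cancel"
    return "restart"
```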
* Renamed add_task_traceback to add_protocol_dag_result_ref_traceback
* Added tests for add_protocol_dag_result_ref_traceback
Implemented half of the resolve_task_restarts test
With this decorator, if a transaction isn't passed as a keyword arg, one
is automatically created (and closed). This allows a chaining behavior
where many method calls share a single transaction object.
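A decorator with the behavior described above might look like this. It is a minimal sketch, not the decorator added in this PR: the name `chainable`, the keyword name `tx`, and the `transaction()` context manager on the store are assumptions, and `FakeStore` exists only to make the example runnable without a database.

```python
import functools
from contextlib import contextmanager


def chainable(func):
    """Open a transaction if the caller did not pass one as `tx` (assumed name)."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if kwargs.get("tx") is not None:
            # A transaction was supplied: reuse it and leave closing to the caller.
            return func(self, *args, **kwargs)
        # No transaction supplied: create one and close it when the call returns.
        with self.transaction() as tx:
            kwargs["tx"] = tx
            return func(self, *args, **kwargs)

    return wrapper


class FakeStore:
    """Hypothetical store that only counts opened and closed transactions."""

    def __init__(self):
        self.opened = 0
        self.closed = 0

    @contextmanager
    def transaction(self):
        self.opened += 1
        try:
            yield object()
        finally:
            self.closed += 1

    @chainable
    def inner(self, value, *, tx=None):
        return (value, tx)

    @chainable
    def outer(self, value, *, tx=None):
        # Passing tx along lets both calls share one transaction.
        return self.inner(value, tx=tx)
```

Calling `FakeStore().outer(1)` opens exactly one transaction even though two decorated methods run, which is the chaining behavior the commit message describes.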
* Removed custom tokenization
* Implemented _defaults to allow default tokenization to work
cancel_map has been changed from a defaultdict to a plain dict; lookups
now use the dict.get method, which returns None for missing keys.
Additionally added a set of all task/taskhub pairs that is later used to
determine what should be canceled.

I've also added grouping on taskhubs so the number of calls to
cancel_tasks is minimized.
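The grouping step can be illustrated as follows. This is a sketch under assumptions: `group_by_taskhub` and `cancel_all` are hypothetical helper names, the `cancel_tasks` callable passed in stands in for the statestore method, and its `(tasks, taskhub)` argument order is assumed rather than taken from the code.

```python
def group_by_taskhub(pairs):
    """Group (task, taskhub) pairs into {taskhub: [task, ...]}."""
    grouped = {}
    # Sorting makes the output deterministic when `pairs` is a set.
    for task, taskhub in sorted(pairs):
        grouped.setdefault(taskhub, []).append(task)
    return grouped


def cancel_all(pairs, cancel_tasks):
    """Call `cancel_tasks` once per TaskHub instead of once per Task."""
    for taskhub, tasks in group_by_taskhub(pairs).items():
        cancel_tasks(tasks, taskhub)
```

With three pairs spread over two TaskHubs, `cancel_all` makes two calls rather than three; the saving grows with the number of Tasks per hub.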
@ianmkenney ianmkenney force-pushed the feature/iss-277-restart-policy branch from 1071369 to f03417c Compare October 8, 2024 17:26
codecov bot commented Oct 8, 2024

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

@dotsdl dotsdl marked this pull request as ready for review October 9, 2024 21:32
@dotsdl dotsdl self-requested a review October 25, 2024 23:01
Member

@dotsdl dotsdl left a comment

Thanks for this impressive feature @ianmkenney! I have a few notes, and I've made some modifications where it was obvious to me what to do.

Can you address the notes and fix any broken tests? After that, I think we should be good to merge!

alchemiscale/storage/statestore.py
@@ -1411,30 +1461,51 @@ def action_tasks(
# so we can properly return `None` if needed
task_map = {str(task): None for task in tasks}

q = f"""
query_safe_task_list = [str(task) for task in tasks if task]
Member

Why the trailing if task? Unclear under what conditions this would apply.

Member Author

If a None were passed in this would filter it out. This was made more explicit in the current branch with a sync with main (d331cc4).
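To make the filtering concrete, here is a small standalone illustration of the comprehension under discussion. The task strings are made up, and the `is not None` variant is only one possible explicit spelling, not necessarily the one used in d331cc4.

```python
# Hypothetical input: two task identifiers and a stray None.
tasks = ["Task-abc", None, "Task-def"]

# Truthiness filter, as in the diff: drops None (and any other falsy value).
query_safe_task_list = [str(task) for task in tasks if task]

# A more explicit spelling that drops only None.
explicit_task_list = [str(task) for task in tasks if task is not None]

# Without any filter, None would be stringified and sent along as "None".
unfiltered = [str(task) for task in tasks]
```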

alchemiscale/storage/statestore.py Outdated
alchemiscale/storage/models.py Outdated
alchemiscale/storage/statestore.py Outdated
alchemiscale/tests/unit/test_storage_models.py Outdated
alchemiscale/tests/unit/test_storage_models.py Outdated
alchemiscale/tests/integration/storage/test_statestore.py Outdated
alchemiscale/tests/integration/storage/test_statestore.py Outdated
@ianmkenney ianmkenney requested a review from dotsdl January 22, 2025 00:57
Development

Successfully merging this pull request may close these issues.

Add user-settable server-side Task restart policy, per-AlchemicalNetwork
3 participants