You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'm not sure if this the origin of this particular bug, but I have not successfully reproduced this error on other DRMAA implementations.
I've submitted jobs to my SLURM 18.08 system where, occassionally, I get a reported "unknown signal?!". The exact same job, when resubmitted, may or may not have this issue. I cannot track down exactly what happens when this occurs or what causes this.
I have run strace on the job itself that was submitted on equivalent jobs, one which reports the "unknown signal" vs a regular exiting job and I cannot find any discernable difference and notably when tracing specifically for any signals.
sacct reports nothing unusual, and actually seems to indicate that the job exited without issue. The sysadmin for our cluster system seems to agree and cannot find any issue.
This could be a cluster-specific issue, DRMAA issue, or not. If I'm looking in the wrong place please kindly redirect me. I'm not sure where or how I could start tracking down this issue.
Thanks for your time.
The text was updated successfully, but these errors were encountered:
Hello,
I'm not sure if this the origin of this particular bug, but I have not successfully reproduced this error on other DRMAA implementations.
I've submitted jobs to my SLURM 18.08 system where, occassionally, I get a reported "unknown signal?!". The exact same job, when resubmitted, may or may not have this issue. I cannot track down exactly what happens when this occurs or what causes this.
I have run strace on the job itself that was submitted on equivalent jobs, one which reports the "unknown signal" vs a regular exiting job and I cannot find any discernable difference and notably when tracing specifically for any signals.
sacct
reports nothing unusual, and actually seems to indicate that the job exited without issue. The sysadmin for our cluster system seems to agree and cannot find any issue.This could be a cluster-specific issue, DRMAA issue, or not. If I'm looking in the wrong place please kindly redirect me. I'm not sure where or how I could start tracking down this issue.
Thanks for your time.
The text was updated successfully, but these errors were encountered: