Disease-Gene exclusions #180
@joeflack4 did you mean to leave this as Draft for now?
Added a way for us to manually make exclusions, such that entries in morbidmap.txt will not get populated as disease-gene associations.
- Add: data/exclusions-disease-gene.tsv: Manually curated file.
- Update: Logic to utilize the above TSV.
1c688b2 to 8b41b67
@twhetzel Yes, a draft because there was a small refactor and I want to run the output before and after and ensure that there is no diff (other than the intended effects of the exclusions). That will be quick for me to do. If you don't hear from me by tomorrow about this, it means that everything is good on my end. Otherwise, this code is ready to review!
@joeflack4 yes, that would be great! Please include the updated file.
@joeflack4 when will this be ready to review with the updated file?
It's possible tonight but likely tomorrow
@twhetzel Done! See: I examined all of the I did make an update to the SPARQL query which creates There are a couple HGNC declaration removals in the diff. That's the only thing that confused me. But for those removals, there were also axioms that were removed related to the relationships having previously been causal (RO:0004013), and I think these are removed as a consequence of that; because perhaps there are no longer any such references to those HGNC... I'm not sure. Maybe you can examine and see what you think. But I am not worried by it. Next steps:
- Update: Added Sabrina's ORCID to exclusions file.
- Bug fix: Was still not filtering exclusions correctly; was missing a logical condition.
- Bug fix?: Added RO:0003302 entries to disease-gene-relationships (.sparql / .tsv). I think these were previously left out by mistake.
- Update: A comment to be more clear
c893901 to 1be4a68
OMIM:108770 MONDO:0007171 atrial standstill 1' https://orcid.org/0000-0002-4142-7153 digenic
OMIM:620040 MONDO:0031057 "dyskeratosis congenita, digenic'" https://orcid.org/0000-0002-4142-7153 digenic
OMIM:619478 MONDO:0030355 "facioscapulohumeral muscular dystrophy 4, digenic'" https://orcid.org/0000-0002-4142-7153 digenic
OMIM:300818 MONDO:0010438 paroxysmal nocturnal hemoglobinuria 1 https://orcid.org/0000-0002-4142-7153 "disease caused by a somatic mutation, therefore a gene association stating this is due to a germline mutation should not be added"
There are no disease to gene associations on any of these diseases (OMIM phenotypes) now.
Indeed!
Just for clarification, you mean no more disease-defining (RO:0004013) associations. They are now RO:0003302.
@@ -0,0 +1,8 @@
omim_id mondo_id mondo_label orcid exclusion_reason_comment
OMIM:603956 MONDO:0002974 cervical cancer' https://orcid.org/0000-0002-4142-7153 evidence of various genes involved
OMIM:619151 MONDO:0030894 "AMED syndrome, digenic'" https://orcid.org/0000-0002-4142-7153 digenic
To explain the HGNC removals in the diff.diff file, using this as an example: when creating the omim.owl file from the omim.ttl file, there is a step in the make goal that adds the HGNC links where there is a disease2gene association represented with RO:0004003. Since this disease no longer has the RO:0004003 association, and the gene ADH5 is not used as a disease-defining gene for any other disease, it does not get converted to an HGNC identifier. The same thing happens with OMIM:620040 MONDO:0031057 "dyskeratosis congenita, digenic'".
If you open the omim-b4.owl and omim-after.owl files in Protege and go to the Object properties tab and check the Usage of RO:0003302 it's easy to see where the new associations exist that use the property (the number is small enough to not need to sparql this information). And looking at the gene entry in Protege, you can also check which of these genes that now have the RO:0003302 association also still have other disease-defining associations for other diseases.
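The HGNC behavior described above can be illustrated with a small Python sketch. This is only an approximation of the idea, not the repository's make goal: it assumes associations are available as (disease, predicate, gene) tuples, and `genes_needing_hgnc`, `OMIM:PLACEHOLDER`, and `GENE_X` are hypothetical names introduced for illustration.

```python
# Predicate that, per the comment above, triggers the HGNC link step.
DISEASE_DEFINING = "RO:0004003"


def genes_needing_hgnc(associations):
    """Return genes that appear in at least one disease-defining association.

    `associations` is an iterable of (disease, predicate, gene) tuples.
    This tuple layout is an assumption made for this sketch.
    """
    return {
        gene
        for _disease, predicate, gene in associations
        if predicate == DISEASE_DEFINING
    }


# Before the exclusion: ADH5 is disease-defining for OMIM:619151.
before = [
    ("OMIM:619151", "RO:0004003", "ADH5"),
    ("OMIM:PLACEHOLDER", "RO:0004003", "GENE_X"),  # placeholder row
]
# After the exclusion: the same pair now uses RO:0003302.
after = [
    ("OMIM:619151", "RO:0003302", "ADH5"),
    ("OMIM:PLACEHOLDER", "RO:0004003", "GENE_X"),  # placeholder row
]

# Genes that would lose their HGNC declaration as a side effect.
removed = genes_needing_hgnc(before) - genes_needing_hgnc(after)
print(removed)
```

A gene drops out only when none of its remaining associations is disease-defining, which is why the placeholder gene is unaffected here.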
Really glad you looked into the HGNC stuff! Thanks. I was thinking that might be the case.
Also, good Protégé tips!
@@ -24,6 +24,7 @@ WHERE {
FILTER(
?PredUri IN (
<http://purl.obolibrary.org/obo/RO_0003302>,
For completeness, this is good to add in case any analysis comparing the data ever looks at gene association properties other than the disease-defining property.
Looks good! If you can make a new release from this branch (it would be the latest release, so after the scheduled GH Action release is created today), I can use the files to test them out in the downstream pipelines.
@joeflack4 I have not had time to do the test I wanted to run before this is merged into |
addresses #174
Added a way for us to manually make exclusions, such that entries in morbidmap.txt will not get populated as disease-gene associations.
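A minimal Python sketch of how such an exclusion list might be applied. It assumes the tab-separated column layout shown in the data/exclusions-disease-gene.tsv diff and the two predicates named in the review thread (RO:0004013 for disease-defining, RO:0003302 otherwise). The function names and the placeholder ID `OMIM:000000` are hypothetical and are not taken from this PR's code, whose actual predicate logic is likely more involved.

```python
import csv
import io

# Predicates as named in the review thread.
DISEASE_DEFINING = "RO:0004013"
NON_DEFINING = "RO:0003302"


def load_excluded_omim_ids(tsv_text: str) -> set[str]:
    """Collect the omim_id column from the exclusions TSV text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["omim_id"].strip() for row in reader if row.get("omim_id")}


def assign_predicate(phenotype_id: str, excluded: set[str]) -> str:
    """Pick the non-defining predicate for manually excluded phenotypes."""
    return NON_DEFINING if phenotype_id in excluded else DISEASE_DEFINING


# One row copied from the diff; tabs are assumed as the delimiter.
EXAMPLE_TSV = (
    "omim_id\tmondo_id\tmondo_label\torcid\texclusion_reason_comment\n"
    "OMIM:603956\tMONDO:0002974\tcervical cancer'\t"
    "https://orcid.org/0000-0002-4142-7153\t"
    "evidence of various genes involved\n"
)

excluded = load_excluded_omim_ids(EXAMPLE_TSV)
print(assign_predicate("OMIM:603956", excluded))  # excluded phenotype
print(assign_predicate("OMIM:000000", excluded))  # placeholder, not excluded
```

Keying the lookup on `omim_id` alone means every gene listed for an excluded phenotype is treated the same way, which matches the file having no gene column.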
Changes
data/exclusions-disease-gene.tsv: Manually curated file.
Additional info