
Disease-Gene exclusions #180

Merged
merged 2 commits into from
Dec 15, 2024

joeflack4 (Contributor) commented Dec 12, 2024

addresses #174

Added a way for us to manually make exclusions, so that entries in morbidmap.txt will not get populated as disease-gene associations.

Changes

  • Add: data/exclusions-disease-gene.tsv: Manually curated file.
  • Update: Logic to utilize the above TSV.
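A minimal sketch of how the exclusions TSV could feed the predicate choice. It assumes, based on later comments in this thread, that excluded OMIM phenotypes end up with the non-causal RO:0003302 rather than the disease-defining RO:0004013. The helper names `load_excluded_omim_ids` and `assign_predicate` are hypothetical and are not the repo's actual functions. The TSV row is a simplified excerpt of the file.

```python
import csv
import io

# Predicate CURIEs as described in this PR discussion (assumed constants,
# not names taken from the repository): RO:0004013 is the disease-defining
# (causal) property, RO:0003302 the non-causal one.
CAUSAL = "RO:0004013"
NON_CAUSAL = "RO:0003302"


def load_excluded_omim_ids(tsv_text: str) -> set[str]:
    """Collect the omim_id column from exclusions TSV content (header row required)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["omim_id"].strip() for row in reader if row.get("omim_id")}


def assign_predicate(omim_id: str, excluded: set[str]) -> str:
    """Hypothetical helper: excluded phenotypes get the non-causal predicate."""
    return NON_CAUSAL if omim_id in excluded else CAUSAL


# Usage with a simplified one-row excerpt of the exclusions file.
EXAMPLE_TSV = (
    "omim_id\tmondo_id\tmondo_label\torcid\texclusion_reason_comment\n"
    "OMIM:619151\tMONDO:0030894\tAMED syndrome, digenic\t"
    "https://orcid.org/0000-0002-4142-7153\tdigenic\n"
)
excluded_ids = load_excluded_omim_ids(EXAMPLE_TSV)
print(assign_predicate("OMIM:619151", excluded_ids))  # RO:0003302
```

With this approach the exclusion is a lookup against a set of curated IDs, so adding a new exclusion only requires editing the TSV.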

Additional info

@joeflack4 joeflack4 requested a review from twhetzel December 12, 2024 00:48
@joeflack4 joeflack4 self-assigned this Dec 12, 2024
@joeflack4 joeflack4 added the enhancement New feature or request label Dec 12, 2024
@joeflack4 joeflack4 changed the base branch from main to develop December 12, 2024 00:52
@joeflack4 joeflack4 marked this pull request as draft December 12, 2024 00:54
twhetzel (Contributor)

@joeflack4 did you mean to leave this as Draft for now?

joeflack4 (Contributor Author) commented Dec 12, 2024

@twhetzel Yes, it's a draft because there was a small refactor, and I want to run the output before and after to ensure there is no diff (other than the intended effects of the exclusions).

That will be quick for me to do. If you don't hear from me by tomorrow about this, it means that everything is good on my end.

Otherwise, this code is ready to review!
If you like, I can show you snippets of the output so you can see the non-causal RO property being used, as well as the ORCID source annotations.
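The before/after check described above could be sketched as follows. This assumes rows are reduced to hypothetical (omim_id, predicate, gene) tuples. The real outputs are omim.owl and the TSV, and this only models the comparison. Any change not tied to an excluded omim_id is flagged. A second helper confirms that the excluded IDs now use RO:0003302. IDs other than OMIM:619151 are placeholders.

```python
# Hypothetical row shape: (omim_id, predicate, gene). The real outputs are
# omim.owl and disease-gene-relationships.tsv; this only models the comparison.
Row = tuple[str, str, str]


def unexpected_changes(before: set[Row], after: set[Row], excluded: set[str]) -> set[Row]:
    """Rows added or removed between runs that the exclusions do not explain."""
    changed = before ^ after  # symmetric difference: rows present in only one run
    return {row for row in changed if row[0] not in excluded}


def exclusions_applied(after: set[Row], excluded: set[str], non_causal: str = "RO:0003302") -> bool:
    """True if every row for an excluded omim_id now uses the non-causal predicate."""
    return all(pred == non_causal for omim_id, pred, _gene in after if omim_id in excluded)


# Placeholder IDs (OMIM:000000, GENE_A, GENE_B) for illustration only.
before = {("OMIM:619151", "RO:0004013", "GENE_A"), ("OMIM:000000", "RO:0004013", "GENE_B")}
after = {("OMIM:619151", "RO:0003302", "GENE_A"), ("OMIM:000000", "RO:0004013", "GENE_B")}
print(unexpected_changes(before, after, {"OMIM:619151"}))  # set()
print(exclusions_applied(after, {"OMIM:619151"}))  # True
```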

twhetzel (Contributor)

@joeflack4 yes, that would be great! Please include the updated file.

twhetzel (Contributor)

@joeflack4 when will this be ready to review with the updated file?

joeflack4 (Contributor Author)

It may be possible tonight, but likely tomorrow.

@joeflack4 joeflack4 marked this pull request as ready for review December 14, 2024 02:44
joeflack4 (Contributor Author) commented Dec 14, 2024

@twhetzel Done! See:

I examined every omim_id in the exclusions TSV that Sabrina made, and confirmed that all of them changed to non-causal RO:0003302 in both the omim.owl diff and the TSV output.

I did make an update to the SPARQL query which creates disease-gene-relationships.tsv. Previously, it was not including RO:0003302 entries. I think that might have been a mistake. Now that I've added it, I do see those rows in the Google Sheet. I'm surprised there are only 38 of them.
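To show why the RO:0003302 rows were missing, here is a Python analogue of the query's FILTER(?PredUri IN (...)) clause. For illustration only, it assumes that RO_0004013 was the predicate already in the list. The query's complete IN list is not shown in this thread.

```python
OBO = "http://purl.obolibrary.org/obo/"
CAUSAL_IRI = OBO + "RO_0004013"
NON_CAUSAL_IRI = OBO + "RO_0003302"

# Assumed allowed-predicate sets mirroring FILTER(?PredUri IN (...)).
# RO_0004013 stands in for whatever predicates were already listed.
OLD_ALLOWED = frozenset({CAUSAL_IRI})
NEW_ALLOWED = OLD_ALLOWED | {NON_CAUSAL_IRI}


def filter_by_predicate(rows, allowed):
    """Keep rows whose predicate IRI is in `allowed`, like SPARQL's IN operator."""
    return [row for row in rows if row["PredUri"] in allowed]


rows = [
    {"Disease": "OMIM:619151", "PredUri": NON_CAUSAL_IRI},
    {"Disease": "OMIM:000000", "PredUri": CAUSAL_IRI},  # placeholder ID
]
print(len(filter_by_predicate(rows, OLD_ALLOWED)))  # 1
print(len(filter_by_predicate(rows, NEW_ALLOWED)))  # 2
```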

There are a couple of HGNC declaration removals in the diff. That's the only thing that confused me. But for those removals, axioms related to the relationships having previously been causal (RO:0004013) were also removed. I think the declarations are removed as a consequence of that, perhaps because there are no longer any references to those HGNC IDs. I'm not sure. Maybe you can examine it and see what you think, but I am not worried about it.

Next steps:

  • 1. @twhetzel Review
  • 2. @joeflack4 New release using this branch
  • 3. @joeflack4 Merge this PR -> develop -> main
    • @twhetzel I'm thinking I will do this this weekend as well, and I will do the review for develop --> main, but let me know if you want me to leave it open and have you review it as well.

- Update: Added Sabrina's ORCID to exclusions file.
- Bug fix: Was still not filtering exclusions correctly; was missing a logical condition.
- Bug fix?: Added RO:0003302 entries to disease-gene-relationships (.sparql / .tsv). I think these were previously left out by mistake.
- Update: Made a comment clearer.
data/exclusions-disease-gene.tsv
OMIM:108770 MONDO:0007171 atrial standstill 1' https://orcid.org/0000-0002-4142-7153 digenic
OMIM:620040 MONDO:0031057 "dyskeratosis congenita, digenic'" https://orcid.org/0000-0002-4142-7153 digenic
OMIM:619478 MONDO:0030355 "facioscapulohumeral muscular dystrophy 4, digenic'" https://orcid.org/0000-0002-4142-7153 digenic
OMIM:300818 MONDO:0010438 paroxysmal nocturnal hemoglobinuria 1 https://orcid.org/0000-0002-4142-7153 "disease caused by a somatic mutation, therefore a gene association stating this is due to a germline mutation should not be added"
twhetzel (Contributor), Dec 14, 2024

There are no disease to gene associations on any of these diseases (OMIM phenotypes) now.

joeflack4 (Contributor Author)

Indeed!
Just for clarification, you mean no more disease-defining (RO:0004013) associations. They are now RO:0003302.

data/exclusions-disease-gene.tsv
@@ -0,0 +1,8 @@
omim_id mondo_id mondo_label orcid exclusion_reason_comment
OMIM:603956 MONDO:0002974 cervical cancer' https://orcid.org/0000-0002-4142-7153 evidence of various genes involved
OMIM:619151 MONDO:0030894 "AMED syndrome, digenic'" https://orcid.org/0000-0002-4142-7153 digenic
twhetzel (Contributor)

Here is an explanation of the HGNC removals in the diff.diff file, using this entry as an example. When the omim.owl file is created from the omim.ttl file, a step in the make goal adds the HGNC links wherever a disease2gene association is represented with RO:0004003. This disease no longer has the RO:0004003 association, and the gene ADH5 is not used as a disease-defining gene for any other disease. As a result, it does not get converted to an HGNC identifier. The same thing happens with OMIM:620040 MONDO:0031057 "dyskeratosis congenita, digenic'".
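The HGNC rule described above can be modelled as a set computation. A gene gets its HGNC link only if some disease-defining association still references it. This is a sketch under assumed data shapes, not the actual make goal. The predicate follows the comment above (RO:0004003). GENE_B and OMIM:000000 are placeholders.

```python
# Assumed triple shape: (disease, predicate, gene). The predicate value follows
# the comment above (RO:0004003 for the disease-to-gene association).
DISEASE_DEFINING = "RO:0004003"


def genes_getting_hgnc(associations) -> set[str]:
    """Genes referenced by at least one disease-defining association."""
    return {gene for _disease, pred, gene in associations if pred == DISEASE_DEFINING}


# OMIM:000000 and GENE_B are placeholders.
before = [("OMIM:619151", "RO:0004003", "ADH5"), ("OMIM:000000", "RO:0004003", "GENE_B")]
after = [("OMIM:619151", "RO:0003302", "ADH5"), ("OMIM:000000", "RO:0004003", "GENE_B")]
dropped = genes_getting_hgnc(before) - genes_getting_hgnc(after)
print(dropped)  # {'ADH5'}
```

GENE_B keeps its HGNC link because another disease still uses it as disease-defining. That matches why only some genes lose their declarations.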

Open the omim-b4.owl and omim-after.owl files in Protege, go to the Object properties tab, and check the Usage of RO:0003302. This makes it easy to see where the new associations that use the property exist (the number is small enough that a SPARQL query isn't needed). By looking at a gene entry in Protege, you can also check which of the genes that now have the RO:0003302 association still have disease-defining associations for other diseases.
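Browsing the diff in Protégé may become impractical if it gets much larger. In that case, the same two checks could be scripted. This is a minimal sketch over hypothetical (disease, predicate, gene) tuples with placeholder gene and OMIM IDs. It is not a query against the actual OWL files.

```python
from collections import Counter, defaultdict

NON_CAUSAL = "RO:0003302"
DISEASE_DEFINING = "RO:0004003"  # as named in the comment above


def predicate_usage(associations) -> Counter:
    """Count how often each predicate is used (cf. Protégé's Usage view)."""
    return Counter(pred for _disease, pred, _gene in associations)


def genes_with_both(associations) -> set[str]:
    """Genes that have a non-causal association and still a disease-defining one."""
    preds_by_gene = defaultdict(set)
    for _disease, pred, gene in associations:
        preds_by_gene[gene].add(pred)
    return {g for g, preds in preds_by_gene.items() if {NON_CAUSAL, DISEASE_DEFINING} <= preds}


# Placeholder IDs for illustration.
after = [
    ("OMIM:619151", NON_CAUSAL, "GENE_A"),
    ("OMIM:000001", DISEASE_DEFINING, "GENE_A"),
    ("OMIM:000002", NON_CAUSAL, "GENE_C"),
]
print(predicate_usage(after)[NON_CAUSAL])  # 2
print(genes_with_both(after))  # {'GENE_A'}
```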

joeflack4 (Contributor Author)

Really glad you looked into the HGNC stuff! Thanks. I was thinking that might be the case.

Also, good Protégé tips!

@@ -24,6 +24,7 @@ WHERE {

FILTER(
?PredUri IN (
<http://purl.obolibrary.org/obo/RO_0003302>,
twhetzel (Contributor)

For completeness, this is good to add in case any analyses or comparisons of the data ever look at gene association properties other than the disease-defining property.

twhetzel (Contributor) left a comment

Looks good! Could you make a new release from this branch? It needs to be the latest release, so please create it after today's scheduled GH Action release. Then I can use the files to test them out in the downstream pipelines.

@joeflack4 joeflack4 merged commit 20f7580 into develop Dec 15, 2024
1 check passed
@joeflack4 joeflack4 deleted the d2g-exclusions branch December 15, 2024 00:05
twhetzel (Contributor)

@joeflack4 I have not had time to run the test I wanted to run before this is merged into main, so I will need to move on without it and confirm this when the pipeline is run. With that said, can you merge this into main and create a release?

joeflack4 (Contributor Author)

@twhetzel We don't actually need to merge into main to run a release. The latest release was run off this branch.

As for merging into main, you wanted to review that. Since we don't need to merge it to do the release, you can still take your time with that:

Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add an exclusion file for OMIM entries to exclude from being added as a gene association
2 participants