Relating PRO to UniProt #165
Comments
I would very much like there to be a single URI for a concept like "human Shh protein" (or at least two equivalent interchangeable URIs).
This will be possible once we find out what the UniProt PURLs are intended to mean. I recall @JervenBolleman saying that in his talks he treats them as meaning the same thing as the corresponding PRO terms, but I'm not sure there's agreement on that (several people on the previous thread, myself included, indicated that they consider them to refer to database entries). In PRO we consider them exactly that: database entries that are about some protein class (for example, http://purl.uniprot.org/uniprot/P05067 is_about http://purl.obofoundry.org/obo/PR_P05067).
My main concern is that the UniProt PURLs might be overloaded in meaning. That is, some people consider them to refer to classes of proteins, some say they refer to database entries, and others might consider them to refer to sequences. If they are database entries, fine, but for PRO purposes we'll need a way to refer to the sequence. If they are protein classes, fine, we'll provide the appropriate equivalency statements, but we'll still need a way to refer to the sequence. If they are sequences, fine, we'll make the appropriate connection.
I recall @cmungall suggesting that for the sequences we use a URL such as https://www.uniprot.org/uniprot/P05067.fasta?version=1. That would be fine, but there are also PURLs such as http://purl.uniprot.org/isoforms/P05067-1. I asked whether that PURL is intended to represent the (current) sequence or the class of proteins derived from that isoform. I did not get an answer.
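To make the "database entry" reading concrete, here is a minimal Python sketch that writes the is_about example above as a single N-Triples line. The helper name `is_about_triple` is hypothetical, and using IAO_0000136 ("is about") as the predicate IRI is an assumption on my part, since the comment only names the relation is_about.

```python
# Base IRIs copied from the example in the comment above.
UNIPROT_ENTRY_BASE = "http://purl.uniprot.org/uniprot/"
PRO_BASE = "http://purl.obofoundry.org/obo/PR_"

# Assumption: "is_about" is taken to be IAO_0000136 ("is about").
IS_ABOUT = "http://purl.obolibrary.org/obo/IAO_0000136"


def is_about_triple(accession: str) -> str:
    """Return an N-Triples line stating that the UniProt entry
    (read as a database entry) is about the PRO protein class."""
    subject = UNIPROT_ENTRY_BASE + accession
    obj = PRO_BASE + accession
    return f"<{subject}> <{IS_ABOUT}> <{obj}> ."


if __name__ == "__main__":
    print(is_about_triple("P05067"))
```

Under this reading the two PURLs stay distinct and are linked by a relation. The sequence would still need its own identifier, which this sketch does not attempt to model.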
[broken record] Our IDs can do double duty, representing both database entities and things in nature. There is no need to get meta and introduce an extra layer of indirection. At least, I am not aware of a use case where someone really needs to track both of these things and keep them distinct. I think the sequence vs. protein molecule aspect is a bit more nuanced.
I believe you missed my point. I am not introducing a layer. The question is "What kind of entity does UniProt consider its entries to be?" And one possible answer is "Database entries."
@cmungall asked "What are the semantics of a non-GCRP trembl ID according to PRO?" TrEMBL entries fall into the following types: A) If there already exists a Swiss-Prot entry describing the products of some gene G (SP_of_G), then the TrEMBL entry describing a product of the same gene (Tr_of_G) can be:
B) If no Swiss-Prot entry describes the products of the TrEMBL gene, then the TrEMBL entry describing a product of that gene (Tr_of_G) can be:
C) If no gene is indicated in the TrEMBL entry (call it TrX), then...
Technically speaking, TrEMBL entries (like some Swiss-Prot entries) can also describe fragments.
I'm going to post a strawman proposal:
PRO gene-level protein classes and UniProt canonical/GCRP entries are to be considered equivalent in the strict OWL sense. (Ergo, the URIs could be collapsed with no loss of logical entailment and no introduction of inconsistency. This would be a win, as the community would not have to make an arbitrary selection between two distinct PURLs/CURIEs.)
Ontologically, these are protein classes, which are material entity classes (as is currently the case in PRO). (The UniProt docs talk about these as sequences, which is perfectly valid, as the main use case for them involves treating them as sequences; in the ontological treatment, however, the sequence would be a property of the material entity.)
They are the superclasses of isoform classes (as they are now in PRO).
The isoform-level classes in PRO would be equivalent to the UniProt isoform entries (e.g. P12345-1).
There could be some kind of has-canonical-form relationship between the main class and isoform-1 (see http://purl.obolibrary.org/obo/RO_0002214).
Note that at the database level, the canonical entry will have annotations for things such as protein domains, functions, etc. At the ontological level this will not be taken to mean that all instances of that protein have those properties; otherwise we end up with logical inconsistencies. Instead it will be a some-some relation.
Note that neither resource needs to make any changes to implement this. It would be a semantic MOU about the ontological commitment of PURLs, and both would agree not to publish logical axioms that introduce logical inconsistencies.
However, if both parties agree, then there is a strong case for PRO switching from PRO PURLs for gene-level classes to UniProt PURLs.
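As a rough illustration of what the strawman would assert, the following Python sketch emits the proposed axioms as N-Triples lines: an equivalence between the gene-level PRO class and the UniProt entry, an equivalence between each isoform-level PRO class and the UniProt isoform entry, and a subclass link from each isoform class to the gene-level class. The function names are hypothetical, and the IRI pattern for PRO isoform classes (`PR_<accession>-<n>`) is an assumption. Only the UniProt bases and the `PR_P05067` form appear in this thread.

```python
# UniProt and PRO bases as they appear earlier in this thread.
UNIPROT_ENTRY_BASE = "http://purl.uniprot.org/uniprot/"
UNIPROT_ISOFORM_BASE = "http://purl.uniprot.org/isoforms/"
PRO_BASE = "http://purl.obofoundry.org/obo/PR_"

# Standard OWL and RDFS vocabulary IRIs.
OWL_EQUIVALENT_CLASS = "http://www.w3.org/2002/07/owl#equivalentClass"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def triple(subject: str, predicate: str, obj: str) -> str:
    """Format one N-Triples statement whose three terms are all IRIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def strawman_axioms(accession: str, isoform_numbers: list[int]) -> list[str]:
    """Return the equivalence and subclass statements of the strawman
    for one accession and the given isoform numbers."""
    pro_gene_level = PRO_BASE + accession
    axioms = [
        triple(pro_gene_level, OWL_EQUIVALENT_CLASS,
               UNIPROT_ENTRY_BASE + accession)
    ]
    for number in isoform_numbers:
        isoform_id = f"{accession}-{number}"
        # Assumption: PRO isoform classes follow the PR_<accession>-<n> pattern.
        pro_isoform = PRO_BASE + isoform_id
        axioms.append(triple(pro_isoform, OWL_EQUIVALENT_CLASS,
                             UNIPROT_ISOFORM_BASE + isoform_id))
        axioms.append(triple(pro_isoform, RDFS_SUBCLASS_OF, pro_gene_level))
    return axioms


if __name__ == "__main__":
    for line in strawman_axioms("P05067", [1]):
        print(line)
```

The has-canonical-form link and the some-some reading of database annotations are left out, since the thread does not settle how either would be expressed.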
Unfortunately, the PRO meeting is heavily focused on preparing for work proposed as part of an upcoming grant, and will be rather high-level. It is possible (and likely) that this can be discussed with a few people outside the meeting, but there just isn't time to do so during the meeting itself (plus, we won't have the required stakeholders present). Given my schedule, I myself will not be able to address your proposal for another few weeks.
This issue is a continuation of the discussion here:
geneontology/neo#34
This thread will focus on:
Interested parties (so far):
@JervenBolleman
@cmungall
@goodb
@alanruttenberg