Add 2022_KumarScience_Xinjiang #252

Draft
wants to merge 6 commits into
base: master

Conversation

stschiff
Member

@stschiff stschiff commented Feb 13, 2025

This is a take-over of #206 by @ainashch. A first review by @nevrome was:

  • The package name does not follow our expected standard of Year_AuthorName_RelevantKeyword. I propose 2022_Kumar_Xinjiang.
  • Please remove all columns that are completely empty/filled only with n/a.
  • The Relation_To column works with the Alternative_IDs, not the Poseidon_IDs. Is there a reason why there are two sample naming schemes existing in parallel? Why did you opt for the alternative one for the Relation_To column? I think there are multiple possible solutions to this.
  • It seems you used a combination of Relation_Degree == first + Relation_Type == identical to express that two samples are from the same individual. This is not necessary. Relation_Degree can be set to identical directly.
  • There is a site called G218 - just to make sure: This is a proper site name?
  • The last sample has the Site set to Unknown. I think it would be better to put it to n/a.
  • The Date_Type should be set to contextual for contextual ages. Date_Note then does not need the redundant *Date contextual (what does the * mean?).
  • Date_BC_AD_Median can be computed as the mean of Date_BC_AD_Start and Date_BC_AD_Stop for contextual ages.
  • The Publication column is typically used for a bibtex key in a complete package. In this .janno-only submission we can leave it like it is for now.
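Two of the review points above (dropping columns that are entirely empty or `n/a`, and deriving `Date_BC_AD_Median` for contextual ages) are mechanical enough to sketch in code. The following is only an illustration, not the tooling used for this package: it assumes the .janno rows have already been loaded as dictionaries of strings keyed by column name, that `n/a` marks missing values, and that the midpoint is rounded down to a whole year. The function names are hypothetical.

```python
MISSING = "n/a"  # assumed missing-value marker, as used in the review above


def contextual_median(start: int, stop: int) -> int:
    """Midpoint of a date range in BC/AD years (negative = BC), rounded down."""
    return (start + stop) // 2


def fill_contextual_medians(rows: list[dict[str, str]]) -> None:
    """Fill Date_BC_AD_Median in place for contextual rows that lack one."""
    for row in rows:
        start, stop = row["Date_BC_AD_Start"], row["Date_BC_AD_Stop"]
        if (
            row["Date_Type"] == "contextual"
            and row["Date_BC_AD_Median"] == MISSING
            and MISSING not in (start, stop)
        ):
            row["Date_BC_AD_Median"] = str(contextual_median(int(start), int(stop)))


def drop_empty_columns(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return copies of the rows without columns that are empty or n/a everywhere."""
    if not rows:
        return []
    keep = [
        col for col in rows[0]
        if any(row[col] not in ("", MISSING) for row in rows)
    ]
    return [{col: row[col] for col in keep} for row in rows]
```

Rows that already carry a median, or that lack one of the two boundaries, are left untouched, so point estimates taken from the publication are not overwritten.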

@stschiff
Member Author

I was able to download the genotype data for this package from a platform in China to which the author had uploaded it.

@stschiff
Member Author

Oookay, so I've gone through some of the review points. Some quick remarks on them:

  1. I've added genotype data from the authors. They came with a more fine-grained group labeling, so I added those labels as the first group in the Janno file, but kept the more coarse-grained ones that @ainashch added from the Supplement.
  2. I've fixed the issue about the "identical" relationships as flagged by @nevrome.
  3. I've not fixed the issue that the Relationships are currently not given in terms of Poseidon_IDs but in terms of Alternative_IDs. I would hope that perhaps @ainashch could help with this?
  4. Yes, the site's name appears to be "G218", so I've kept that.
  5. The date information is a mess, but @ainashch has indeed dutifully filled all we have. We don't have uncalibrated dates or errors, just calibrated ones. In some cases they seem to be indirect, and in some cases published elsewhere. I've made clearer notes to indicate this. I would leave the dates as they are now, including the fact that for some dates we just have a point estimate (entered in the median column), and for others we have boundaries, but no median. I don't see a good way around that for now.
  6. I've added bibliographic information.

@stschiff
Member Author

So the one task left to do is fixing the relationships in terms of Poseidon_IDs. I think we need to do this with a short script and a lookup table to exchange Alternative_IDs and Poseidon_IDs. @ainashch do you think you could just download the Janno file from this PR, work on this, and send the fixed one back to me so I can include it?
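A minimal sketch of what such a script might look like, under some assumptions: the rows are dictionaries of strings keyed by column name, multiple values within `Alternative_IDs` and `Relation_To` are separated by `;`, and `n/a` marks missing values. The function names are hypothetical, and the sketch does not touch the parallel `Relation_Degree` and `Relation_Type` columns, whose entry order it preserves.

```python
MISSING = "n/a"  # assumed missing-value marker
SEP = ";"        # assumed separator for multi-valued columns


def build_lookup(rows: list[dict[str, str]]) -> dict[str, str]:
    """Map every alternative ID to the Poseidon_ID of the row it appears in."""
    lookup: dict[str, str] = {}
    for row in rows:
        for alt in row["Alternative_IDs"].split(SEP):
            alt = alt.strip()
            if alt and alt != MISSING:
                lookup[alt] = row["Poseidon_ID"]
    return lookup


def remap_relations(rows: list[dict[str, str]]) -> list[tuple[str, str]]:
    """Rewrite Relation_To in place using Poseidon_IDs.

    Returns (Poseidon_ID, unresolved ID) pairs for entries that match
    neither a Poseidon_ID nor an alternative ID; those are left unchanged
    so they can be checked by hand.
    """
    lookup = build_lookup(rows)
    poseidon_ids = {row["Poseidon_ID"] for row in rows}
    unresolved: list[tuple[str, str]] = []
    for row in rows:
        if row["Relation_To"] in ("", MISSING):
            continue
        remapped = []
        for rel in row["Relation_To"].split(SEP):
            rel = rel.strip()
            if rel in poseidon_ids:
                remapped.append(rel)  # already a Poseidon_ID
            elif rel in lookup:
                remapped.append(lookup[rel])
            else:
                remapped.append(rel)
                unresolved.append((row["Poseidon_ID"], rel))
        row["Relation_To"] = SEP.join(remapped)
    return unresolved
```

The rows could be loaded from the tab-separated file with `csv.DictReader(fh, delimiter="\t")`. Returning the unresolved entries rather than dropping them keeps relations to individuals outside this package visible for manual review.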

@stschiff stschiff marked this pull request as draft February 13, 2025 08:28
@stschiff stschiff self-assigned this Feb 13, 2025
@nevrome nevrome changed the title from "Add 2022 kumar science xinjiang" to "Add 2022_KumarScience_Xinjiang" Feb 13, 2025
2 participants