Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 2021_Popovic_Tiwanaku #253

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

martynamolak
Copy link

@martynamolak martynamolak commented Feb 13, 2025

PR Checklist for a new package submission

  • The package does not exist already in the community archive, also not with a different name.
  • The package title in the POSEIDON.yml conforms to the general title structure suggested here: <Year>_<Last name of first author>_<Region, time period or special feature of the paper>, e.g. 2021_Zegarac_SoutheasternEurope, 2021_SeguinOrlando_BellBeaker or 2021_Kivisild_MedievalEstonia.
  • The package is stored in a directory that is named like the package title.

  • The package is complete and features the following elements:
    • Genotype data in binary PLINK format (not EIGENSTRAT format).
    • A POSEIDON.yml file with not just the file-referencing fields, but also the following meta-information fields present and filled: poseidonVersion, title, description, contributor, packageVersion, lastModified (see here for their definition)
    • A reasonably filled .janno file (for a list of available fields look here and here for more detailed documentation about them).
    • A .bib file with the necessary literature references for each sample in the .janno file.
  • Every file in the submission is correctly referenced in the POSEIDON.yml file and there are no additional, supplementary files in the submission that are not documented there.
  • Genotype data, .janno and .bib file are all named after the package title and only differ in the file extension.
  • The package version in the POSEIDON.yml file is 1.0.0.
  • The poseidonVersion of the package in the POSEIDON.yml file is set to the latest version of the Poseidon schema.
  • The POSEIDON.yml file contains the corresponding checksums for the fields genoFile, snpFile, indFile, jannoFile and bibFile.
  • There is either no CHANGELOG file or one with a single entry for version 1.0.0.

  • The Publication column in the .janno file is filled and the respective .bib file has complete entries for the listed mentioned keys.
  • The .janno file does not include any empty columns or columns only filled with n/a.
  • The order of columns in the .janno file adheres to the standard order as defined in the Poseidon schema here.
  • The .janno and the .ssf files are not fully quoted, so they only use single- or double quotes ("...", '...') to enclose text fields where it is strictly necessary (i.e. their entry includes a TAB).

  • The package passes a validation with trident validate --fullGeno.

  • Large genotype data files are properly tracked with Git LFS and not directly pushed to the repository. For an instruction on how to set up Git LFS please look here. If you accidentally pushed the files the wrong way you can fix it with git lfs migrate import --no-rewrite path/to/file.bed (see here).

@nevrome
Copy link
Member

nevrome commented Feb 13, 2025

@martynamolak Thanks for this submission! The automatic validation runs through, and this is ready for review!

@smpeltola Thanks for joining our new reviewer team! We're getting a lot more submissions now and have to be more organized about this. Would you be available to have a look at this package, specifically the .janno, .bib, and .yml file? Feel free to decline if you're not, at the moment.

@nevrome nevrome assigned nevrome and martynamolak and unassigned nevrome Feb 13, 2025
@nevrome nevrome requested a review from smpeltola February 13, 2025 14:47
@nevrome nevrome changed the title first draft of Popovic 2021 Sci Adv Tiwanaku paper dataset Add 2021_Popovic_Tiwanaku Feb 13, 2025
@smpeltola
Copy link
Contributor

@nevrome I can take this. Do you have a separate checklist for reviewers?

@nevrome
Copy link
Member

nevrome commented Feb 14, 2025

Thank you very much!

Unfortunately we don't have a checklist yet. The process is a bit hard to formalize, I fear. Feel free to suggest ToDos for a potential checklist!

For the moment the idea is that you as a reviewer look through the (human-readable) package data and report anything that strikes you as odd. You have submitted a package earlier and you are a potential user of this data. So I think you can focus on things that would make using this data harder for you. And of course concordance with the specification (e.g. https://www.poseidon-adna.org/#/janno_details) and structural correctness.

Copy link
Contributor

@smpeltola smpeltola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This package look pretty good! I spotted some minor issues (some of which I was unsure of), listed below.

.janno-file

  • Latitude and Longitude: round up to 5 decimals
  • Y_Haplogroup: na (female) -> n/a
  • Y_Haplogroup: Notation does not follow the YFull syntax. I'm quite sure this has not been enforced in the past, but should there be a note in the Notes column to indicate which tree/nomenclature was used? "notation should follow a syntax with the main branch + the most terminal derived Y-SNP separated with a minus symbol (e.g. R1b-P312), similar to that used by Yfull."
  • Damage: Should be given as %, appears to be in fraction now
  • Contamination_Err: Should this be a single number rather than a range? "Some tools for contamination estimation do not return a mean plus a standard error. ContamMix, for example, yields a 95% confidence interval instead, to better represent assymetric output distributions. Contamination and Contamination_Err can not represent this. We suggest to derive a mean and a standard error from these alternative outputs. The latter can be calculated as the largest distance from the mean to the limits of the confidence interval."

.bib-file

  • Add DOI
  • Add full journal name
  • Abstract field doesn't actually contain an abstract

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants