-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ampir Panaroo and Yara ToolFactory generated tools. #1326
Conversation
…m/fubar2/nftoolmaker It actually does work. Sort of. Hope this is useful to someone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Ross. Lets try to figure this out.
<requirements> | ||
<requirement version="1.1.0" type="package">r-ampir</requirement> | ||
</requirements> | ||
<stdio> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This very simple stdio is deprecated and should be replaced with <command detect_errors="aggressive">
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the galaxyxml supplied default. Not trivial to fix but needs a galaxyxml PR to replace the default stdio addition and to add keywords to the <command> tag. More work than I need right now - can we live with it until galaxyxml is fixed?
Easy to replace the actual contents of the tag - for example, would this from one of bgruening's tools be better for an automated boilerplate stdio?
<stdio>
<exit_code range=":-1" level="fatal" description="Error occurred. Please check Tool Standard Error" />
<exit_code range="137" level="fatal_oom" description="Out of Memory" />
<exit_code range="1:" level="fatal" description="Error occurred. Please check Tool Standard Error" />
</stdio>
</xml>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just don't use it, put in a profile version for reasonable defaults.
tools/ampir/ampir.xml
Outdated
<stdio> | ||
<exit_code range="1:" level="fatal"/> | ||
</stdio> | ||
<version_command><![CDATA[echo "0.01"]]></version_command> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mh, that should be ampir --version
or something similar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or in this case something like:
<version_command><![CDATA[Rscript -e 'library(vegan)' 2>&1 > /dev/null | grep 'This is vegan']]></version_command>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have always wondered about this. The ToolFactory form allows one to be specified for manual building. However for automation, it's a little tricky....it will now echo the first Conda dependency version if given, or the supplied ToolFactory form version string for manual builds.
What should that "version" command report?
- Should it report the tool XML/wrapper version? (makes most sense to me, giving all the other metadata)
- If there are >1 conda package(s), is the first one the version?
- What about a container?
For the record, every module script includes code to generate a fresh version.txt when run. It seems to be in every nf-core module. Unfortunately, sometimes it is buried inside an Rscript and sometimes it is part of the bash script. Either way, it's too complex to automate reliably. I could echo the conda version but that won't work for a container.
Suggestions or PR welcomed if anyone thinks it's a show-stopper.
This is not going to be simple or reliable to automate as your suggestion shows so some imperfect but good enough for nf-core (TM) fudging is called for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now uses the first conda dependency version if there is one, or else the tool version as the version string.
Does that work?
@@ -0,0 +1,15 @@ | |||
#!/usr/bin/env Rscript |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that script is part of the configfile isn't it? Why is this included in the PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user wanting to fix a converted module will need to get the script working first so it is supplied as a convenience - no need to copypasta - just setup a conda env and get it working on the command line - then it can be fixed in my nf-core module fork permanently so the next build run will work correctly.
Will permanently remove it if you think that's better - please let me know?
Tool generation produces a yaml file containing the generated command line.
Any ToolFactory instance can regenerate the tool because the TF will (soon) have a ToolFactory that accepts a yaml file as input instead of requiring a form to be filled in - but then the history job can be rerun and the form easily modifed so changes are easier. Will make those available on the github repository for ToolFactory users.
<inputs> | ||
<param name="faa" type="data" optional="false" label="FASTA file containing amino acid sequences" help="" format="fasta" multiple="false"/> | ||
<param name="min_length" type="integer" value="10" label="Minimum protein length for which predictions will be generated" help=""/> | ||
<param name="min_probability" type="float" value="0.7" label="Cut-off for AMP prediction" help=""/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add somehow min/max?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible only if there is a machine readable source of truth.
That is never specified for any parameter in any of the nf-core module stuff I have found. If you can suggest an automatable way, let me know. It's not part of the TF either and would make the form even more horribly complex so probably won't fix that.
tools/ampir/ampir.xml
Outdated
</inputs> | ||
<outputs> | ||
<data name="amps_faa" format="fasta" label="File containing AMP predictions in amino acid FASTA format" hidden="false"/> | ||
<data name="amps_tsv" format="tsv" label="File containing AMP predictions in TSV format" hidden="false"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tsv should be tabular I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tsv is indeed in datatypes_config
<datatype extension="tsv" type="galaxy.datatypes.tabular:TSV" display_in_upload="true">
<converter file="tabular_to_csv.xml" target_datatype="csv"/>
</datatype>
nf-core to Galaxy formats is challenging. Needs a translate table so it is easy enough to convert tsv (they always use that) to tabular if you prefer ..... will do since it's easy.
</test> | ||
</tests> | ||
<help><![CDATA[description: A toolkit to predict antimicrobial peptides from protein sequences on a genome-wide scale. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are we doing with the help? Fix manually and later keep it during merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, it is grabbed from the yaml - can't really see any other useful sources but happy to use anything that you'd like to see.
This should all be automated. Manual fixes are not going to be part of these tools since the whole point is that they can be re-autogenerated whenever the underlying nf-core repository is updated in CI?
|
||
|
||
|
||
doi: 10.1093/bioinformatics/btaa653 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this DOI is different from the one below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
It comes from the nf-core module yaml and correctly leads to https://pubmed.ncbi.nlm.nih.gov/32683445/
The other is the ToolFactory paper DOI.
Automating the nf-core doi is possible but it is not mandatory so messy - could be done if you want?
…ply to all future generated tools :)
Thanks @bgruening! |
Allows single ended fastq or fasta to be mapped, or a pair of forward and reverse - both must be fastq
@bgruening
Anyone know why $foo.ext will return fastqsanger instead of fastq for a fastq in my history? It caused me hours of wasted time even though the data was definitely showing fastq not fastqsanger.. I noticed others have used startswith('fastq') so I used that....but it's a horrible hack for an odd wart. |
grrrr - vanishing single quotes added back to give |
tools/ampir/ampir.xml
Outdated
@@ -0,0 +1,87 @@ | |||
<tool name="ampir" id="ampir" version="1.1.0"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You definitely want a profile version here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed it when @bgruening mentioned it but now I see it was in the wrong place :( Currently rebuilding everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<tool profile="22.05" name="yara_mapper" id="yara_mapper" version="1.17">
<!--Source in git at: https://github.com/fubar2/galaxy_tf_overlay-->
<!--Created by [email protected] at 04/09/2023 18:31:39 using the Galaxy Tool Factory.-->
now and for any future generated tool. Thanks @mvdbeek! Will upload in the morning. Been a long day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mvdbeek @bgruening - are these ok now?
Thanks for reviewing!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only commented on the obvious stuff. You want to drop the stdio section too, drop the unused script, write a help section and address @bgruening's comments. I don't think the auto-generation should lower the usual threshold we have for IUC tools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all these too @mvdbeek ?
Looks like most of them use stdio here, so am not sure how to understand your request, or which of @bgruening's comments remain unresolved. Help appreciated.
tools/openms/FidoAdapter.xml:11: <expand macro="stdio"/>
tools/openms/RTEvaluation.xml:11: <expand macro="stdio"/>
tools/openms/FeatureFinderMetabo.xml:11: <expand macro="stdio"/>
tools/openms/OpenSwathDecoyGenerator.xml:11: <expand macro="stdio"/>
tools/openms/FFEval.xml:11: <expand macro="stdio"/>
tools/openms/OpenSwathWorkflow.xml:11: <expand macro="stdio"/>
tools/openms/PTPredict.xml:11: <expand macro="stdio"/>
tools/openms/ExternalCalibration.xml:11: <expand macro="stdio"/>
tools/openms/CVInspector.xml:11: <expand macro="stdio"/>
tools/openms/XTandemAdapter.xml:11: <expand macro="stdio"/>
tools/openms/PhosphoScoring.xml:11: <expand macro="stdio"/>
tools/openms/HighResPrecursorMassCorrector.xml:11: <expand macro="stdio"/>
tools/openms/PeptideIndexer.xml:11: <expand macro="stdio"/>
tools/openms/OpenSwathFileSplitter.xml:11: <expand macro="stdio"/>
tools/openms/PeakPickerWavelet.xml:11: <expand macro="stdio"/>
tools/openms/MapRTTransformer.xml:11: <expand macro="stdio"/>
tools/openms/QCMerger.xml:11: <expand macro="stdio"/>
tools/openms/MRMMapper.xml:11: <expand macro="stdio"/>
tools/openms/SpectraMerger.xml:11: <expand macro="stdio"/>
tools/openms/XMLValidator.xml:11: <expand macro="stdio"/>
tools/openms/FeatureLinkerLabeled.xml:11: <expand macro="stdio"/>
tools/openms/TransformationEvaluation.xml:11: <expand macro="stdio"/>
tools/numeric_clustering/numeric_clustering.xml:6: <stdio>
tools/augustus/augustus.xml:12: </stdio>
tools/crt/crt.xml:8: <stdio>
tools/quast/quast.xml:28: </stdio>
tools/metilene/metilene.xml:12: <stdio>
tools/metilene/metilene.xml:15: </stdio>
tools/text_processing/text_processing/macros.xml:9: <xml name="stdio">
tools/minced/minced.xml:7: <stdio>
tools/minced/minced.xml:10: </stdio>
tools/format_cd_hit_output/format_cd_hit_output.xml:7: <stdio>
tools/format_cd_hit_output/format_cd_hit_output.xml:9: </stdio>
tools/rna_tools/miRDeep2/miRDeep2_quantifier/quantifier.xml:6: <stdio>
tools/rna_tools/miRDeep2/miRDeep2_quantifier/quantifier.xml:13: </stdio>
tools/rna_tools/cmv/cmcv.xml:7: <expand macro="stdio" />
tools/rna_tools/dotknot/dotknot.xml:12: </stdio>
tools/rna_tools/infernal/cmstat.xml:7: <expand macro="stdio" />
tools/rna_tools/locarna/locarna_exparnap.xml:12: <expand macro="stdio" />
tools/rna_tools/vienna_rna/rnaplex.xml:9: <expand macro="stdio" />
tools/rna_tools/vienna_rna/rnaplot.xml:8: <expand macro="stdio" />
tools/rna_tools/ribotaper/ribotaper_part2_create_metaplots.xml:7: <stdio>
tools/instagraal/instagraal.xml:22: </stdio>
tools/GraphClust/Plotting/MotifFinderPlot.xml:7: <stdio>
tools/GraphClust/Plotting/MotifFinderPlot.xml:9: </stdio>
tools/GraphClust/PrepareForMlocarna/preMlocarna.xml:6: <stdio>
tools/GraphClust/PrepareForMlocarna/preMlocarna.xml:8: </stdio>
tools/GraphClust/NSPDK/NSPDK_sparseVect.xml:6: <stdio>
tools/GraphClust/NSPDK/NSPDK_candidateClusters.xml:8: </stdio>
tools/GraphClust/GSPAN/fasta2shrep_gspan.xml:6: <stdio>
tools/GraphClust/GSPAN/fasta2shrep_gspan.xml:8: </stdio>
tools/GraphClust/LocARNAGraphClust/locarna_best_subtree.xml:8: <stdio>
tools/GraphClust/LocARNAGraphClust/locarna_best_subtree.xml:10: </stdio>
tools/GraphClust/Preprocessing/preprocessing.xml:8: <stdio>
tools/GraphClust/Preprocessing/preprocessing.xml:10: </stdio>
tools/jamm/jamm.xml:28:<stdio>
tools/jamm/jamm.xml:32:</stdio>
tools/platypus/platypus.xml:6: <stdio>
tools/image_processing/imagej2/imagej2_macros.xml:10: <stdio>
tools/image_processing/imagej2/imagej2_macros.xml:15: </stdio>
tools/EDeN/EDeN/EDeN_feature.xml:36: <stdio>
tools/EDeN/EDeN/EDeN_feature.xml:41: </stdio>
tools/twistdna/twistdna.xml:6: <stdio>
tools/twistdna/twistdna.xml:9: </stdio>
tools/protease_prediction/protease.xml:7: <expand macro="stdio"/>
tools/protease_prediction/macros.xml:8: <xml name="stdio">
tools/protease_prediction/macros.xml:9: <stdio>
tools/protease_prediction/macros.xml:11: </stdio>
tools/labels/labels.xml:8: <stdio>
tools/labels/labels.xml:13: </stdio>
tools/ampir/ampir.xml:8: <stdio>
tools/ampir/ampir.xml:12: </stdio>
tools/sqlite_merger/sqlite_merger.xml:19: <stdio>
tools/sqlite_merger/sqlite_merger.xml:24: </stdio>
tools/mafft/mafft-add.xml:8: <stdio>
tools/vt/vt_macros.xml:16: <xml name="stdio">
tools/vt/vt_macros.xml:17: <stdio>
tools/vt/vt_macros.xml:22: </stdio>
tools/vt/vt_macros.xml:22: </stdio>
tools/vt/vt_macros.xml:22: </stdio>
tools/vt/vt_normalize.xml:8: <expand macro="stdio" />
tools/vt/vt_decompose.xml:8: <expand macro="stdio" />
tools/bigwig_to_bedgraph/bigwig_to_bedgraph.xml:8: <stdio>
tools/bigwig_to_bedgraph/bigwig_to_bedgraph.xml:10: </stdio>
tools/sklearn/pre_process.xml:7: <expand macro="macro_stdio" />
tools/sklearn/feature_selection.xml:8: <expand macro="macro_stdio" />
tools/sklearn/to_categorical.xml:7: <expand macro="macro_stdio" />
tools/sklearn/model_prediction.xml:8: <expand macro="macro_stdio" />
tools/sklearn/discriminant.xml:8: <expand macro="macro_stdio" />
tools/sklearn/keras_macros.xml:4: <xml name="macro_stdio">
tools/sklearn/keras_macros.xml:5: <stdio>
tools/sklearn/keras_macros.xml:7: </stdio>
tools/sklearn/model_validation.xml:7: <expand macro="macro_stdio" />
tools/sklearn/regression_metrics.xml:7: <expand macro="macro_stdio" />
tools/sklearn/pca.xml:7: <expand macro="macro_stdio" />
tools/sklearn/label_encoder.xml:7: <expand macro="macro_stdio"/>
tools/sklearn/nn_classifier.xml:7: <expand macro="macro_stdio" />
tools/sklearn/stacking_ensembles.xml:41: <expand macro="macro_stdio" />
tools/sklearn/fitted_model_eval.xml:8: <expand macro="macro_stdio" />
tools/sklearn/main_macros.xml:13: <xml name="macro_stdio">
tools/sklearn/main_macros.xml:14: <stdio>
tools/sklearn/main_macros.xml:18: </stdio>
tools/sklearn/numeric_clustering.xml:7: <expand macro="macro_stdio" />
tools/sklearn/lightgbm.xml:9: <expand macro="macro_stdio" />
tools/sklearn/simple_model_fit.xml:8: <expand macro="macro_stdio" />
tools/sklearn/association_rules.xml:7: <expand macro="macro_stdio"/>
tools/sklearn/train_test_eval.xml:8: <expand macro="macro_stdio" />
tools/sklearn/svm.xml:8: <expand macro="macro_stdio" />
tools/sklearn/estimator_attributes.xml:7: <expand macro="macro_stdio" />
tools/sklearn/pairwise_metrics.xml:7: <expand macro="macro_stdio" />
tools/sklearn/train_test_split.xml:12: <expand macro="macro_stdio" />
tools/sklearn/pipeline.xml:7: <expand macro="macro_stdio" />
tools/sklearn/keras_train_and_eval.xml:8: <expand macro="macro_stdio" />
tools/sklearn/search_model_validation.xml:32: <expand macro="macro_stdio" />
tools/sklearn/generalized_linear.xml:7: <expand macro="macro_stdio" />
tools/sklearn/clf_metrics.xml:7: <expand macro="macro_stdio" />
tools/sklearn/keras_model_builder.xml:8: <expand macro="macro_stdio" />
tools/sklearn/sparse.xml:7: <expand macro="macro_stdio" />
tools/sklearn/ml_visualization_ex.xml:8: <expand macro="macro_stdio" />
tools/sklearn/keras_model_config.xml:8: <expand macro="macro_stdio" />
tools/sklearn/sample_generator.xml:7: <expand macro="macro_stdio" />
tools/sklearn/keras_batch_models.xml:8: <expand macro="macro_stdio" />
tools/sklearn/ensemble.xml:7: <expand macro="macro_stdio" />
tools/diff/diff.xml:11: <stdio>
tools/diff/diff.xml:14: </stdio>
tools/format_metaphlan2_output/format_metaphlan2_output.xml:7: <stdio>
tools/format_metaphlan2_output/format_metaphlan2_output.xml:8: </stdio>
data_managers/data_manager_sortmerna_database_downloader/data_manager/data_manager_sortmerna_download.xml:10: <stdio>
data_managers/data_manager_sortmerna_database_downloader/data_manager/data_manager_sortmerna_download.xml:13: </stdio>
chemicaltoolbox/openbabel/macros.xml:33: <xml name="stdio">
chemicaltoolbox/openbabel/macros.xml:34: <stdio>
chemicaltoolbox/openbabel/macros.xml:36: </stdio>
chemicaltoolbox/opsin/opsin.xml:7: <stdio>
chemicaltoolbox/opsin/opsin.xml:12: </stdio>
chemicaltoolbox/data_source/get_pubchem/get_pubchem_as_smiles.xml:7: <stdio>
chemicaltoolbox/data_source/get_pubchem/get_pubchem_as_smiles.xml:20: </stdio>
chemicaltoolbox/autodock_vina/prepare_ligand/prepare_ligand.xml:10: <stdio>
chemicaltoolbox/autodock_vina/prepare_ligand/prepare_ligand.xml:12: </stdio>
chemicaltoolbox/autodock_vina/prepare_receptor/prepare_receptor.xml:10: <stdio>
chemicaltoolbox/autodock_vina/prepare_receptor/prepare_receptor.xml:12: </stdio>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, just your new tool please. I can't review everything, and if you do need something special not covered by the defaults you will indeed need a stdio tag set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mvdbeek hexylena/galaxyxml#38 fixes it I think as far as is feasible adding detect_errors="aggressive" to the command tag. I copied that stdio from one of Bjoern's recent tools and see that most stull use it but am happy to lead the charge once the PR is in. Automated help is a stretch since the nf-core stuff has none to speak of, but the hand-crafted ones, panaroo and yara have handcrafted and I hope quite reasonable help from their documentation. Ampir is completely automated from the nf-core module and test. A different order of not IUC.
…dio replacement...
adding first tested and working generated tool from the nf-core module to galaxy tool converter at https://github.com/fubar2/nftoolmaker
It passes planemo tests and lint. Generated by parsing the nf-core module yaml and using the ToolFactory to generate the XML.
Will need cleaning up - specific suggestions welcomed - changes will affect all future generated tools as I'd like to make them as consistent as possible with IUC guidelines that make sense for autogenerated tools.
The test data is a real challenge because it has to be extracted from special test descriptions written in their nextflow DDL.
Fortunately, most nf-core developers have line oriented habits making it possible to decompose these without needing a proper parser. Mind you, I did try - pyparsing is great but my skills were weak.
Where DSL trickery is used, the automation fails and the nextflow source can be fixed if needed so the tool will work.
However, in this trivial test case, it works completely automagically.
Hope this is useful to someone.