Overview of repository statistics
The germinator project now includes scripts that generate JSON-format statistics for phylesystem instances and for synthetic trees.
Both scripts pull data via API calls, accumulate counts of studies and of the OTUs within those studies, and generate a JSON record. The record is added to a JSON structure holding the records from previous runs, keyed by the time at the end of the analysis. Time is recorded as ISO 8601 with hour resolution (i.e., formatted '%Y-%m-%dT%HZ').
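The keyed-record scheme described above can be sketched as follows. This is a minimal illustration, not the germinator code itself: the function names `timestamp_key` and `append_record`, the file layout, and the use of UTC are assumptions made for the example.

```python
import json
import os
import time


def timestamp_key(epoch_seconds=None):
    """Format a time as ISO 8601 with hour resolution, e.g. '2015-01-26T00Z'.

    Assumes UTC; with no argument the current time is used.
    """
    return time.strftime('%Y-%m-%dT%HZ', time.gmtime(epoch_seconds))


def append_record(path, record, epoch_seconds=None):
    """Add `record` to the JSON structure stored at `path`, keyed by time.

    Hypothetical helper: loads earlier records if the file exists,
    stores the new one under its time key, and writes the file back.
    """
    records = {}
    if os.path.exists(path):
        with open(path) as f:
            records = json.load(f)
    records[timestamp_key(epoch_seconds)] = record
    with open(path, 'w') as f:
        json.dump(records, f, indent=2, sort_keys=True)
    return records
```

Because the key has hour resolution, two runs finishing within the same hour would share a key and the later record would replace the earlier one in this sketch.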
These scripts use rrsync to push the output of the generation scripts to the appropriate location in a web2py application running on a server machine. Use of rrsync requires a web2py installation and an appropriate ssh public-key configuration. The current configuration pushes to a folder called 'statistics' within a web2py installation (e.g., HOST:/home/USER/web2py/applications/opentree/static/). The push script specifies the statistics folder, and the location on the web2py server is specified in the server's .ssh/authorized_keys file.
The scripts take three command line arguments:
- a local folder holding the working copy of the report (which the Python script appends to). There should be a separate folder for each server (e.g., dev, production) being monitored.
- the API server used to retrieve studies and OTUs. This may not be the same as the target host.
- the target host. The scripts will authenticate as the user 'opentree' by default.
At present, these scripts are run from cron with the following crontab entries:
0 0 * * * /home/opentree/statistics/phylesystem_stats.sh devstats devapi.opentreeoflife.org devtree.opentreeoflife.org
0 0 * * * /home/opentree/statistics/synthesis_stats.sh devstats devapi.opentreeoflife.org devtree.opentreeoflife.org
Each phylesystem statistics record contains the following fields:
- reported_study_count - integer length of list of studies returned
- study_count - integer number of studies that returned OTUs when queried
- OTU_count - integer count of OTUs in studies
- unique_OTU_count - count of OTUs without duplicates
- unmapped_OTU_count - count of OTUs that have not been mapped to OTT
- nominated_study_count - count of 'nominated' studies (= not marked ot:notIntendedForSynthesis)
- nominated_study_OTU_count - count of OTUs in nominated studies
- nominated_study_unique_OTU_count - count of OTUs in nominated studies w/o duplicates
- nominated_study_unmapped_OTU_count - count of OTUs in nominated studies that have not been mapped to OTT
- run_time - elapsed time for processing, including queries, in seconds
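As a rough sketch of how several of the counts above relate to one another, the function below computes them from data that has already been fetched. The input shape (a dict mapping study ids to lists of OTU dicts), the `ott_id` key, and the choice to deduplicate by OTT id are all assumptions made for illustration; the actual scripts may define duplicates differently.

```python
def summarize_studies(studies):
    """Compute a few phylesystem-style counts.

    `studies` is a hypothetical structure: a dict mapping a study id to
    the list of OTU dicts returned for that study. An OTU is treated as
    mapped when it carries a non-None 'ott_id' (an assumed key name).
    """
    all_otus = [otu for otus in studies.values() for otu in otus]
    mapped_ids = [otu['ott_id'] for otu in all_otus
                  if otu.get('ott_id') is not None]
    return {
        'reported_study_count': len(studies),
        # Only studies that actually returned OTUs are counted here.
        'study_count': sum(1 for otus in studies.values() if otus),
        'OTU_count': len(all_otus),
        # Assumption: duplicates are mapped OTUs sharing an OTT id.
        'unique_OTU_count': len(set(mapped_ids)),
        'unmapped_OTU_count': len(all_otus) - len(mapped_ids),
    }
```

The nominated-study counts could be produced the same way by first filtering `studies` down to those not marked ot:notIntendedForSynthesis.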
As of 2015-01-26 there have only been two synthetic trees, one from April 2014 and one from September 2014.
- date (TO BE DONE) - date of synthesis
- reported_study_count - integer length of the returned list of studies in the synthesis
- study_count - integer number of studies that returned OTUs when queried (subset of previous)
- total_OTU_count - sum of number of OTUs across all studies
- unique_OTU_count - integer length of OTU list without duplicates
- run_time - elapsed time for processing, including queries, in seconds
As of 2015-01-26 we have taxonomy versions 2.0 through 2.8.
- date (TO BE DONE) - date of taxonomy release
- version - a string e.g. "2.8"
- taxon_count - total number of taxa including 'hidden' taxa
- visible_taxon_count - number of taxa exclusive of 'hidden' taxa
- release page - URL (does this belong with statistics?)
- download link - URL for compressed tarball, see http://files.opentreeoflife.org/ott/ - credits are on release page (does this belong with statistics?)
- [other fields TBD]