
Better method for scaling metrics #20

Open
maplesond opened this issue Jan 7, 2015 · 1 comment
@maplesond
Collaborator

Currently, in V0.11.0, we use a scaling procedure to reduce any given metric to a range between 0 and 1, which can then be weighted and combined with other scaled and weighted metrics to give a final score for the assembly. An issue with this approach is that in assembly projects we often sweep large ranges of k values and get at least one useless assembly. Excluding that assembly could change the overall ranking of the 1st and 2nd assemblies (depending on the weightings of the other metrics). Scaling by the range means that the effective weight of a metric can be decided entirely by two outliers.
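Here's a rough Python sketch of the problem (not the actual V0.11.0 code; the metric values are made up):

```python
def min_max_scale(values):
    """Scale metric values into [0, 1] by the observed range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # degenerate case: all assemblies equal
    return [(v - lo) / (hi - lo) for v in values]

# N50 for four comparable assemblies plus one useless assembly from a k sweep.
n50 = [52000, 48000, 50000, 51000, 500]
print(min_max_scale(n50))  # the four good assemblies are squeezed into ~[0.92, 1.0]
# Dropping the outlier would spread them across the full range, which can
# reorder the final weighted score.
```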

An alternative approach, taken by Abbas et al., 2014 (http://link.springer.com/article/10.1186/1471-2164-15-S9-S10/fulltext.html), is to rank the assemblies on each metric before weighting and combining the results. A problem with this method is that it can give too much credence to a metric for which all assemblies have similar values.

We are therefore looking for a better means of scaling assembly metrics. Methods based on the interquartile range or the standard deviation of the metric may potentially work better. This thread is for discussing this topic.
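For example, a sketch of median/IQR scaling (function names here are illustrative, not from the codebase):

```python
import statistics

def iqr_scale(values):
    """Centre on the median and scale by the interquartile range.
    Unlike min-max scaling, a couple of outliers no longer set the scale."""
    med = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    spread = iqr if iqr > 0 else statistics.stdev(values)  # fallback for flat IQR
    return [(v - med) / spread for v in values]

# Same N50 values as above: the good assemblies stay well separated and
# only the useless one lands far from zero.
print(iqr_scale([52000, 48000, 50000, 51000, 500]))
```

Note the output is centred on zero rather than mapped to [0, 1], so the downstream weighting step would need to accept negative scores.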

@sjackman

Assemblathon 1 used sum of ranks.
http://genome.cshlp.org/content/early/2011/09/16/gr.126599.111

> The sum of the rankings from each category was then used to create an overall rank for the assemblies
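A minimal Python sketch of that sum-of-ranks scheme (the assembly names and metric values are invented for illustration):

```python
def rank(values, higher_is_better=True):
    """Rank values, 1 = best (naive: ties share the best position)."""
    order = sorted(values, reverse=higher_is_better)
    return [order.index(v) + 1 for v in values]

metrics = {                    # assemblies A, B, C
    "N50":    [52000, 48000, 50000],
    "errors": [12, 30, 9],     # lower is better
}
ranks = [rank(metrics["N50"]),
         rank(metrics["errors"], higher_is_better=False)]
sum_of_ranks = [sum(r) for r in zip(*ranks)]
print(sum_of_ranks)  # [3, 6, 3]; the lowest total rank wins overall
```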

Assemblathon 2 used average rank and z-scores.
http://www.gigasciencejournal.com/content/2/1/10

> In addition to ranking assemblies by each of these metrics and then calculating an average rank (Additional file 2: Figures S9–S11), we also calculated z-scores for each key metric and summed these. This has the benefit of rewarding/penalizing those assemblies with exceptionally high/low scores in any one metric. One way of addressing the reliability and contribution of each of these key metrics is to remove each metric in turn and recalculate the z-score. This can be used to produce error bars for the final z-score, showing the minimum and maximum z-score that might have occurred if we had used any combination of nine (bird and snake) or six (fish) metrics.
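And a sketch of that summed z-score approach with leave-one-metric-out error bars (toy data; assumes every metric is higher-is-better):

```python
import statistics
from itertools import combinations

metrics = {                    # assemblies A, B, C, D
    "N50":      [52000, 48000, 50000, 51000],
    "genes":    [21000, 19500, 20800, 20100],
    "coverage": [0.97, 0.91, 0.95, 0.96],
}

def z_scores(values):
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]

z = [z_scores(vals) for vals in metrics.values()]
total = [sum(col) for col in zip(*z)]  # summed z-score per assembly

# Remove each metric in turn, recompute the sum, and keep the min/max as
# error bars on the final score.
loo_totals = [[sum(col) for col in zip(*subset)]
              for subset in combinations(z, len(z) - 1)]
bars = [(min(t), max(t)) for t in zip(*loo_totals)]
print(total)
print(bars)
```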
