
Better method for scaling metrics #20

Open
maplesond opened this issue Jan 7, 2015 · 1 comment
@maplesond
Collaborator

Currently, in V0.11.0, we use a scaling procedure to reduce any given metric to a range between 0 and 1, which can then be weighted and combined with other scaled and weighted metrics to give a final score for the assembly. An issue with this approach is that in assembly projects we often sweep large ranges of k values and get at least one useless assembly. Excluding that assembly could change the overall ranking of the 1st and 2nd assemblies (depending on the weightings of the other metrics). Scaling by the range means that the effective weight of a metric can be decided entirely by two outliers.
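Here's a rough Python sketch of the problem (not the actual V0.11.0 code; the metric values are made up):

```python
def min_max_scale(values):
    """Scale metric values into [0, 1] by the observed range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # degenerate case: all assemblies equal
    return [(v - lo) / (hi - lo) for v in values]

# N50 for four comparable assemblies plus one useless assembly from a k sweep.
n50 = [52000, 48000, 50000, 51000, 500]
print(min_max_scale(n50))  # the four good assemblies are squeezed into ~[0.92, 1.0]
# Dropping the outlier would spread them across the full range, which can
# reorder the final weighted score.
```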

An alternative approach, taken by Abbas et al., 2014 (http://link.springer.com/article/10.1186/1471-2164-15-S9-S10/fulltext.html), is to rank the assemblies on each metric before weighting and combining the results. A problem with this method is that it can give too much credence to a metric for which all assemblies have similar values.

We are therefore looking for a better means of scaling assembly metrics. Methods based on the interquartile range or the standard deviation of the metric may potentially work better. This thread is for discussing this topic.
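For example, a sketch of median/IQR scaling (function names here are illustrative, not from the codebase):

```python
import statistics

def iqr_scale(values):
    """Centre on the median and scale by the interquartile range.
    Unlike min-max scaling, a couple of outliers no longer set the scale."""
    med = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    spread = iqr if iqr > 0 else statistics.stdev(values)  # fallback for flat IQR
    return [(v - med) / spread for v in values]

# Same N50 values as above: the good assemblies stay well separated and
# only the useless one lands far from zero.
print(iqr_scale([52000, 48000, 50000, 51000, 500]))
```

Note the output is centred on zero rather than mapped to [0, 1], so the downstream weighting step would need to accept negative scores.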

@sjackman

Assemblathon 1 used sum of ranks.
http://genome.cshlp.org/content/early/2011/09/16/gr.126599.111

> The sum of the rankings from each category was then used to create an overall rank for the assemblies
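A minimal Python sketch of that sum-of-ranks scheme (the assembly names and metric values are invented for illustration):

```python
def rank(values, higher_is_better=True):
    """Rank values, 1 = best (naive: ties share the best position)."""
    order = sorted(values, reverse=higher_is_better)
    return [order.index(v) + 1 for v in values]

metrics = {                    # assemblies A, B, C
    "N50":    [52000, 48000, 50000],
    "errors": [12, 30, 9],     # lower is better
}
ranks = [rank(metrics["N50"]),
         rank(metrics["errors"], higher_is_better=False)]
sum_of_ranks = [sum(r) for r in zip(*ranks)]
print(sum_of_ranks)  # [3, 6, 3]; the lowest total rank wins overall
```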

Assemblathon 2 used average rank and z-scores.
http://www.gigasciencejournal.com/content/2/1/10

> In addition to ranking assemblies by each of these metrics and then calculating an average rank (Additional file 2: Figures S9–S11), we also calculated z-scores for each key metric and summed these. This has the benefit of rewarding/penalizing those assemblies with exceptionally high/low scores in any one metric. One way of addressing the reliability and contribution of each of these key metrics is to remove each metric in turn and recalculate the z-score. This can be used to produce error bars for the final z-score, showing the minimum and maximum z-score that might have occurred if we had used any combination of nine (bird and snake) or six (fish) metrics.
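And a sketch of that summed z-score approach with leave-one-metric-out error bars (toy data; assumes every metric is higher-is-better):

```python
import statistics
from itertools import combinations

metrics = {                    # assemblies A, B, C, D
    "N50":      [52000, 48000, 50000, 51000],
    "genes":    [21000, 19500, 20800, 20100],
    "coverage": [0.97, 0.91, 0.95, 0.96],
}

def z_scores(values):
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]

z = [z_scores(vals) for vals in metrics.values()]
total = [sum(col) for col in zip(*z)]  # summed z-score per assembly

# Remove each metric in turn, recompute the sum, and keep the min/max as
# error bars on the final score.
loo_totals = [[sum(col) for col in zip(*subset)]
              for subset in combinations(z, len(z) - 1)]
bars = [(min(t), max(t)) for t in zip(*loo_totals)]
print(total)
print(bars)
```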
