Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FR: Mann-Whitney-U test #182

Open
wants to merge 3 commits into
base: developer
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions src/FSharp.Stats/FSharp.Stats.fsproj
Original file line number Diff line number Diff line change
Expand Up @@ -113,6 +113,7 @@
<Compile Include="Testing\Correlation.fs" />
<Compile Include="Testing\TTest.fs" />
<Compile Include="Testing\FTest.fs" />
<Compile Include="Testing\UTest.fs" />
<Compile Include="Testing\FriedmanTest.fs" />
<Compile Include="Testing\HTest.fs" />
<Compile Include="Testing\ChiSquareTest.fs" />
Expand Down
24 changes: 23 additions & 1 deletion src/FSharp.Stats/Testing/TestStatistics.fs
Original file line number Diff line number Diff line change
Expand Up @@ -96,4 +96,26 @@ module TestStatistics =
let cdf = Distributions.Continuous.Normal.CDF 0. 1. statistic
let pvalue = 1.- cdf
let pvalueTwoTailed = pvalue * 2.
{Statistic=statistic; PValueLeft=pvalue;PValueRight = cdf; PValueTwoTailed = pvalueTwoTailed}
{Statistic=statistic; PValueLeft=pvalue;PValueRight = cdf; PValueTwoTailed = pvalueTwoTailed}


/// <summary>
/// Computes the Mann-Whitney U-test statistics for a given statistic.
/// </summary>
/// <param name="Statistic">The test statistic.</param>
/// <param name="PValue">One Tailed/Sided.</param>
/// <param name="PValueTwoTailed">Two Tailed/Sided.</param>
type UTestTestStatistics = {
Statistic : float
PValueLeft : float
PValueRight : float
PValueTwoTailed : float
}
let createUTest statistic : UTestTestStatistics =
let cdf = Distributions.Continuous.Normal.CDF 0. 1. statistic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approximation via the z-Distribution is only valid for large samples.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the distribution is discrete, I cannot find any other (continuous) approximation than the one via normal distribution.
https://stats.stackexchange.com/a/251734

So, what do you suggest? Delete the whole test because of normal approximation might be too inaccurate at low sample sizes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should aim to implement the exact distribution for n+m<100 and the approximation for larger values

see #213 (comment)

{
Statistic = statistic
PValueLeft = 1. - cdf
PValueRight = cdf
PValueTwoTailed = cdf * 2.
}
51 changes: 51 additions & 0 deletions src/FSharp.Stats/Testing/UTest.fs
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
namespace FSharp.Stats.Testing

// taken/implemented from: https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test#U_statistic
module UTest =

open FSharp.Stats
open FSharp.Stats.Testing

// TO DO: Bergmann et al. (2000) showed that there are different implementations of this test that lead to different results.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check if your computations are correct before PR can be merged. Additionally, check your results against established packages in R, SPSS or puplications rather than volatile wikipedia chapters.

// They implied that some of them are using a false algorithm. Check if the mathematical derivation from above is wrong too.
// Read: https://www.jstor.org/stable/2685616
let inline private compute (seq1 : seq<'T>) (seq2 : seq<'T>) =
let sortedMerge =
(seq1 |> Seq.map (fun v -> float v, 0), seq2 |> Seq.map (fun v -> float v, 1)) // 0 = first group; 1 = second group
||> Seq.append
|> Seq.sortByDescending (fun (v,groupIndex) -> v)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary that the sequences are sorted? The ranking is order-independent and as I can see there is no need for sorting.

|> Array.ofSeq
// let abundance = // method for equal ranks instead of mean ranks when identical values occur.
// sortedMerge
// |> Array.map (
// fun v -> Array.filter (fun v2 -> v2 = v) sortedMerge
// >> Array.length
// )
// let myMap = sortedMerge |> Array.mapi (fun i x -> x, i + 2 - Array.item i abundance) |> Map // wrong: must return mean of ranksums with equal ranks, not always the same rank!
// let rankedMerge = sortedMerge |> Array.map (fun (v,group) -> float myMap.[(v,group)],v,group)
Comment on lines +18 to +25
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove comments

let rankedMerge = // method for mean ranks instead of equal ranks when identical values occur.
sortedMerge
|> Array.map fst
|> Rank.rankAverage
|> fun res ->
(sortedMerge, res)
||> Array.map2 (fun (v,group) rank -> rank, v, group)
let calcRankSum group =
rankedMerge
|> Array.filter (fun (rank,v,group') -> group' = group)
|> Array.fold (fun state (rank,v,group') -> state + rank) 0.
Comment on lines +35 to +36
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace with countBy

Comment on lines +35 to +36
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace with countBy

let rankSumSeq1 = calcRankSum 0
let rankSumSeq2 = calcRankSum 1
let seq1Length = Seq.length seq1 |> float
let seq2Length = Seq.length seq2 |> float
let u1 = seq1Length * seq2Length + (seq1Length * (seq1Length + 1.) / 2.) - rankSumSeq1
let u2 = seq1Length * seq2Length + (seq2Length * (seq2Length + 1.) / 2.) - rankSumSeq2
Comment on lines +41 to +42
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

temporary results should not be calculated twice

let uMin = min u1 u2
let z = (uMin - seq1Length * seq2Length / 2.) / System.Math.Sqrt (seq1Length * seq2Length * (seq1Length + seq2Length + 1.) / 12.)
z

/// Computes a Mann-Whitney U-test. Aka Wilcoxon-Mann-Whitney test.
/// Use this test for independent samples and the Wilcoxon test (= Wilcoxon ranksum test) for dependent samples.
let inline computeUtest (seq1 : seq<'T>) (seq2 : seq<'T>) =
let z = compute seq1 seq2
TestStatistics.createUTest z
28 changes: 28 additions & 0 deletions tests/FSharp.Stats.Tests/Testing.fs
Original file line number Diff line number Diff line change
Expand Up @@ -235,6 +235,34 @@ let tTestTests =
Expect.floatClose Accuracy.low tTest4.Statistic 0.514 "t statistic should be equal."
]


[<Tests>]
let uTestTests =
// taken from https://de.wikipedia.org/wiki/Wilcoxon-Mann-Whitney-Test#Beispiel
let testList1 =
([0;400;500;550;600;650;750;800;900;950;1000;1100;1200;1500;1600;1800;1900;2000;2200;3500 ],["M";"W";"M";"W";"M";"W";"M";"M";"W";"W";"M";"M";"W";"M";"W";"M";"M";"M";"M";"M"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add duplicates to check ranking procedure

||> List.map2 (fun pay sex -> sex, pay) |> List.sortBy fst

let testList1A = testList1 |> List.choose (fun (sex,pay) -> if sex = "W" then Some pay else None)
let testList1B = testList1 |> List.choose (fun (sex,pay) -> if sex = "M" then Some pay else None)

let observedResult1 = UTest.computeUtest testList1A testList1B
let expectedResult1 : TestStatistics.UTestTestStatistics = {
Statistic = -1.15
PValueTwoTailed = 0.2505
PValueLeft = 0.875
PValueRight = 0.1253
}

testList "Testing.UTest" [
testCase "TwoSample" <| fun () ->
Expect.floatClose Accuracy.low observedResult1.PValueLeft expectedResult1.PValueLeft "left p-value should be equal"
Expect.floatClose Accuracy.low observedResult1.PValueRight expectedResult1.PValueRight "right p-value should be equal"
Expect.floatClose Accuracy.low observedResult1.PValueTwoTailed expectedResult1.PValueTwoTailed "p-value should be equal"
Expect.floatClose Accuracy.low observedResult1.Statistic expectedResult1.Statistic "test statistic should be equal"
]


[<Tests>]
let chiSquaredTests =
// ChiSquared https://www.graphpad.com/quickcalcs/chisquared2/
Expand Down