class pclStatsBox::ConfusionMatrix
sys::Obj pclStatsBox::ConfusionMatrix
A Confusion Matrix is often used in statistics or machine learning to hold the number of observed against predicted labels from an experiment.
A confusion matrix represents "the relative frequencies with which each of a number of stimuli is mistaken for each of the others by a person in a task requiring recognition or identification of stimuli" (R. Colman, A Dictionary of Psychology, 2008). Each row represents the predicted label of an instance, and each column represents the observed label of that instance. Numbers at each (row, column) reflect the total number of instances of predicted label "row" which were observed as having label "column".
A two-class example is:
          Observed      |
     Positive  Negative | Predicted
    --------------------+----------
        a         b     | Positive
        c         d     | Negative
Here the value:
- a: the true positives (those predicted positive and observed positive)
- b: the false negatives (those predicted positive but observed negative)
- c: the false positives (those predicted negative but observed positive)
- d: the true negatives (those predicted negative and observed negative)
From this table we can calculate statistics like:
- true positive rate - a/(a+b)
- positive precision - a/(a+c)
Statistics can also be calculated for the negative label; for example, the true negative rate is d/(c+d). The functions below therefore take an optional "label" parameter, which specifies the label they are calculated for. The default is to report for the first label named when the matrix is created.
The implementation supports confusion matrices with more than two labels. When more than two labels are in use, the statistics are calculated as if the first, or named, label were positive and all the other labels are grouped as if negative.
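As a minimal sketch of the multi-label case, the following uses only the constructor, addCount, recall and precision documented on this page. The three label names and all of the counts are made up for illustration.

```fantom
using pclStatsBox

class ExampleThreeLabels
{
  static Void main()
  {
    // Hypothetical labels: any three distinct strings would do.
    cm := ConfusionMatrix(["cat", "dog", "bird"])

    // Arguments are (predicted, observed, count), as in addCount below.
    cm.addCount("cat", "cat", 8)
    cm.addCount("cat", "dog", 2)
    cm.addCount("dog", "dog", 7)
    cm.addCount("dog", "bird", 1)
    cm.addCount("bird", "bird", 9)
    cm.addCount("bird", "cat", 3)

    // No label given: the statistic is reported for "cat", the first label.
    catRecall := cm.recall

    // Named label: "dog" is treated as positive, "cat" and "bird" as negative.
    dogRecall := cm.recall("dog")
    dogPrecision := cm.precision("dog")

    echo("Recall (cat)   : $catRecall")
    echo("Recall (dog)   : $dogRecall")
    echo("Precision (dog): $dogPrecision")
  }
}
```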
Usage
The following example creates a simple two-label confusion matrix, prints a few statistics and displays the table:
    using pclStatsBox

    class ExampleConfusionMatrix
    {
      static Void main()
      {
        cm := ConfusionMatrix(["pos", "neg"])
        cm.addCount("pos", "pos", 10)
        cm.addCount("pos", "neg", 3)
        cm.addCount("neg", "neg", 20)
        cm.addCount("neg", "pos", 5)

        echo("Confusion Matrix")
        echo("")
        echo(cm)
        echo("Precision: ${cm.precision}")
        echo("Recall   : ${cm.recall}")
        echo("MCC      : ${cm.matthewsCorrelation}")
      }
    }
which outputs:
    Confusion Matrix

    Observed|
    pos neg | Predicted
    --------+----------
     10   3 | pos
      5  20 | neg

    Precision: 0.6666666666666666
    Recall   : 0.7692307692307693
    MCC      : 0.5524850114241865
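The printed precision and recall can be checked by hand against the two-class table, taking a = 10, b = 3, c = 5 and d = 20 from the addCount calls in the example:

```latex
\[
\text{precision} = \frac{a}{a+c} = \frac{10}{10+5} \approx 0.6667,
\qquad
\text{recall} = \frac{a}{a+b} = \frac{10}{10+3} \approx 0.7692
\]
```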
- addCount
Void addCount(Str predicted, Str observed, Int count := 1)
Adds count to the total for the given (predicted, observed) labels. Throws an error if either label is not valid.
- cohenKappa
Float cohenKappa(Str label := this.labels.first())
Cohen's Kappa statistic compares the observed accuracy with the accuracy expected by chance.
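For reference, the textbook definition of Cohen's kappa is sketched below in terms of the two-class table from the overview. This page does not show how the library computes the statistic, so the correspondence is an assumption. Here n = a+b+c+d, p_o is the observed accuracy and p_e is the agreement expected by chance.

```latex
\[
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_o = \frac{a+d}{n},
\qquad
p_e = \frac{(a+b)(a+c) + (c+d)(b+d)}{n^2}
\]
```

With the counts from the Usage example (a = 10, b = 3, c = 5, d = 20), this formula gives p_o = 30/38, p_e = 770/1444 and kappa = 370/674, or roughly 0.549.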
- count
Int count(Str predicted, Str observed)
Returns the count for the given (predicted, observed) labels. Throws an error if either label is not valid.
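The following sketch shows addCount and count used together, including the default count and an invalid label. The concrete error type is not documented on this page, so catching the general Err class is an assumption.

```fantom
using pclStatsBox

class ExampleCounts
{
  static Void main()
  {
    cm := ConfusionMatrix(["pos", "neg"])

    // Omitting the third argument uses the default count of 1.
    cm.addCount("pos", "neg")
    cm.addCount("pos", "neg", 4)

    // Read back the cell that the two calls above added to.
    echo(cm.count("pos", "neg"))

    // "maybe" was not given to the constructor, so this call should throw.
    // Catching Err (the base error class) is an assumption about the error type.
    try
    {
      cm.addCount("pos", "maybe")
    }
    catch (Err e)
    {
      echo("Rejected: ${e.msg}")
    }
  }
}
```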
- fMeasure
Float fMeasure(Str label := this.labels.first())
Harmonic mean of the precision and recall for the given label.
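Written out, the harmonic mean of precision and recall is the following (the balanced form, with both terms weighted equally):

```latex
\[
F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
\]
```

With the Usage example's values (precision = 10/15, recall = 10/13), this works out to 40/56, or roughly 0.714.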
- falseNegative
Int falseNegative(Str label := this.labels.first())
Returns the number of instances of the given label which are incorrectly observed.
- falsePositive
Int falsePositive(Str label := this.labels.first())
Returns the number of instances incorrectly observed as the given label.
- falseRate
Float falseRate(Str label := this.labels.first())
Returns the proportion of instances incorrectly observed as the given label, out of all instances not originally of that label.
- geometricMean
Float geometricMean()
Returns the Nth root of the product of the true rates (see trueRate) of all labels, where N is the number of labels.
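As a formula, with N labels written here as l_1 to l_N (a notation introduced only for this sketch):

```latex
\[
G = \left( \prod_{i=1}^{N} \text{trueRate}(l_i) \right)^{1/N}
\]
```

For the two-label Usage example, the true rates are 10/13 for "pos" and 20/25 for "neg", so G is the square root of (10/13)(20/25), roughly 0.784.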
- make
new make(Str[] labels := ["positive","negative"])
Constructor takes a list of labels for the confusion matrix. There should be at least two labels; the default is ["positive", "negative"].
- matthewsCorrelation
Float matthewsCorrelation(Str label := this.labels.first())
The Matthews correlation coefficient is a measure of the quality of a binary classification, ranging from -1 (total disagreement) to +1 (perfect agreement).
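In terms of the two-class table from the overview, the standard formula is:

```latex
\[
\text{MCC} = \frac{ad - bc}{\sqrt{(a+b)(a+c)(b+d)(c+d)}}
\]
```

Checking this against the Usage example (a = 10, b = 3, c = 5, d = 20) gives (200 - 15) divided by the square root of 13 x 15 x 23 x 25 = 112125, which is roughly 0.5525 and agrees with the value printed there.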
- overallAccuracy
Float overallAccuracy()
Proportion of instances which are correctly observed.
- precision
Float precision(Str label := this.labels.first())
Precision is the proportion of instances observed as the given label which are correctly observed, i.e. a/(a+c) for the positive label in the two-class table.
- prevalence
Float prevalence(Str label := this.labels.first())
Prevalence is the proportion of instances of the given label out of the total.
- recall
Float recall(Str label := this.labels.first())
Recall is equal to the trueRate, for a given label.
- sensitivity
Float sensitivity(Str label := this.labels.first())
Sensitivity is another name for the true positive rate (recall).
- specificity
Float specificity(Str label := this.labels.first())
Specificity is 1-falseRate for a given label.
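For the positive label in the two-class table, and assuming falseRate is the number of false positives (c) divided by all instances not originally of that label (c + d), the relationship can be written as:

```latex
\[
\text{falseRate} = \frac{c}{c+d},
\qquad
\text{specificity} = 1 - \frac{c}{c+d} = \frac{d}{c+d}
\]
```

This is the same quantity as the true negative rate d/(c+d) given in the overview. With the Usage example's counts (c = 5, d = 20), the false rate would be 0.2 and the specificity 0.8.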
- toStr
virtual override Str toStr()
Returns a string representation of the matrix across multiple lines in a table-like format.
- total
Int total()
Returns total of all counts.
- trueNegative
Int trueNegative(Str label := this.labels.first())
Returns the number of instances NOT of the given label which are correctly observed.
- truePositive
Int truePositive(Str label := this.labels.first())
Returns the number of instances of the given label correctly observed.
- trueRate
Float trueRate(Str label := this.labels.first())
Returns the proportion of instances of the given label which are correctly observed.