class pclStatsBox::ConfusionMatrix

sys::Obj
  pclStatsBox::ConfusionMatrix

A confusion matrix is often used in statistics or machine learning to hold the counts of observed against predicted labels from an experiment.

A confusion matrix represents "the relative frequencies with which each of a number of stimuli is mistaken for each of the others by a person in a task requiring recognition or identification of stimuli" (R. Colman, A Dictionary of Psychology, 2008). Each row represents the predicted label of an instance, and each column represents the observed label of that instance. Numbers at each (row, column) reflect the total number of instances of predicted label "row" which were observed as having label "column".

A two-class example is:

Observed        Observed      | 
Positive        Negative      | Predicted
------------------------------+------------
    a               b         | Positive
    c               d         | Negative

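Here the value:

a is the number of true positives (those predicted positive and observed positive)
b is the number of false negatives (those predicted positive but observed negative)
c is the number of false positives (those predicted negative but observed positive)
d is the number of true negatives (those predicted negative and observed negative)

From this table we can calculate statistics like:

true positive rate (recall): a/(a+b)
precision: a/(a+c)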

Statistics can also be calculated for the negative label; for example, the true negative rate is d/(c+d). The functions below therefore take an optional "label" parameter specifying which label they are calculated for. The default is to report for the first label named when the matrix is created.

The implementation supports confusion matrices with more than two labels. When more than two labels are in use, the statistics are calculated as if the first, or named, label were positive and all the other labels were grouped together as negative.
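The grouping described above can be checked by hand. The following is a minimal sketch, assuming only the methods documented on this page; the three labels and their counts are made up for illustration. It treats "b" as the positive label and compares each statistic with the same quantity summed from individual cells. If the method descriptions below are read literally, the two values printed on each line should agree.

```fantom
using pclStatsBox

class MultiLabelExample
{
  static Void main()
  {
    cm := ConfusionMatrix(["a", "b", "c"])

    // arguments are (predicted, observed, count); the counts are illustrative only
    cm.addCount("a", "a", 8)
    cm.addCount("a", "b", 1)
    cm.addCount("b", "b", 6)
    cm.addCount("b", "a", 2)
    cm.addCount("b", "c", 1)
    cm.addCount("c", "c", 7)
    cm.addCount("c", "b", 3)

    // treat "b" as positive; "a" and "c" are grouped as negative
    tp := cm.truePositive("b")
    fn := cm.falseNegative("b")
    fp := cm.falsePositive("b")

    // the same quantities summed by hand from individual cells
    tpByHand := cm.count("b", "b")
    fnByHand := cm.count("b", "a") + cm.count("b", "c")
    fpByHand := cm.count("a", "b") + cm.count("c", "b")

    echo("truePositive(b)  = $tp, by hand = $tpByHand")
    echo("falseNegative(b) = $fn, by hand = $fnByHand")
    echo("falsePositive(b) = $fp, by hand = $fpByHand")
  }
}
```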

Usage

The following example creates a simple two-label confusion matrix, prints a few statistics and displays the table:

using pclStatsBox

class ExampleConfusionMatrix
{
  static Void main()
  {
    cm := ConfusionMatrix(["pos", "neg"])

    cm.addCount("pos", "pos", 10)
    cm.addCount("pos", "neg", 3)
    cm.addCount("neg", "neg", 20)
    cm.addCount("neg", "pos", 5)

    echo("Confusion Matrix")
    echo("")
    echo(cm)
    echo("Precision: ${cm.precision}")
    echo("Recall   : ${cm.recall}")
    echo("MCC      : ${cm.matthewsCorrelation}")
  }
}

which outputs:

Confusion Matrix

Observed|
pos neg | Predicted
--------+----------
 10   3 | pos
  5  20 | neg

Precision: 0.6666666666666666
Recall   : 0.7692307692307693
MCC      : 0.5524850114241865
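
The printed precision and recall can be reproduced by hand. Writing the example's counts in the layout of the two-class table in the introduction gives a = 10, b = 3, c = 5, d = 20. The formulas below are inferred from the printed output rather than taken from the implementation:

```latex
\[
\text{precision} = \frac{a}{a+c} = \frac{10}{15} \approx 0.6667,
\qquad
\text{recall} = \frac{a}{a+b} = \frac{10}{13} \approx 0.7692
\]
```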

addCount

Void addCount(Str predicted, Str observed, Int count := 1)

Adds count to the running total for the given (predicted, observed) labels. Throws an error if either label is not valid.

cohenKappa

Float cohenKappa(Str label := this.labels.first())

Cohen's Kappa statistic compares the observed accuracy with the accuracy expected by chance.
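
As a worked illustration, assuming the standard two-class definition of kappa (the implementation may differ in detail), take the usage example's counts a = 10, b = 3, c = 5, d = 20 with n = a + b + c + d = 38:

```latex
\[
p_o = \frac{a+d}{n} = \frac{30}{38},
\qquad
p_e = \frac{(a+b)(a+c) + (c+d)(b+d)}{n^2} = \frac{13\cdot 15 + 25\cdot 23}{38^2} = \frac{770}{1444}
\]
\[
\kappa = \frac{p_o - p_e}{1 - p_e} = \frac{1140 - 770}{1444 - 770} = \frac{370}{674} \approx 0.549
\]
```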

count

Int count(Str predicted, Str observed)

Returns the count for the given (predicted, observed) labels. Throws an error if either label is not valid.
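
A short sketch of how addCount and count might be used together, relying only on the signatures shown on this page. The class name and the label "maybe" are made up for illustration, and the exact error type is not documented here, so the sketch catches the base Err class.

```fantom
using pclStatsBox

class CountExample
{
  static Void main()
  {
    cm := ConfusionMatrix(["pos", "neg"])

    // count defaults to 1, so each of these calls records a single instance
    cm.addCount("pos", "neg")
    cm.addCount("pos", "neg")

    // an explicit count records several instances at once
    cm.addCount("pos", "neg", 3)

    // expected to be 5 if repeated calls accumulate as described
    n := cm.count("pos", "neg")
    echo("count(pos, neg) = $n")

    // "maybe" was not passed to the constructor, so this call should throw
    try
    {
      cm.addCount("maybe", "pos")
    }
    catch (Err e)
    {
      echo("invalid label rejected: ${e.msg}")
    }
  }
}
```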

fMeasure

Float fMeasure(Str label := this.labels.first())

Harmonic mean of the precision and recall for the given label.
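
For illustration, assuming the balanced form of the harmonic mean and the usage example's counts (a = 10, b = 3, c = 5 in the two-class table), with precision P = 10/15 and recall R = 10/13:

```latex
\[
F = \frac{2PR}{P+R} = \frac{2a}{2a+b+c} = \frac{20}{28} \approx 0.7143
\]
```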

falseNegative

Int falseNegative(Str label := this.labels.first())

Returns the number of instances of the given label which are incorrectly observed.

falsePositive

Int falsePositive(Str label := this.labels.first())

Returns the number of instances incorrectly observed as the given label.

falseRate

Float falseRate(Str label := this.labels.first())

Returns the proportion of instances incorrectly observed as the given label, out of all instances not originally of that label.

geometricMean

Float geometricMean()

Returns the Nth root of the product of the true rates of all N labels.
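
A worked two-label case (N = 2), using the usage example's counts and taking the true rate of each label as its correctly observed count divided by its row total:

```latex
\[
\sqrt{\frac{a}{a+b}\cdot\frac{d}{c+d}}
= \sqrt{\frac{10}{13}\cdot\frac{20}{25}}
\approx \sqrt{0.6154} \approx 0.7845
\]
```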

make

new make(Str[] labels := ["positive","negative"])

Constructor takes a list of labels for the confusion matrix. There should be at least two labels, and the default is ["positive", "negative"].

matthewsCorrelation

Float matthewsCorrelation(Str label := this.labels.first())

The Matthews correlation coefficient is a measure of the quality of a binary classification.
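
The standard formula, applied to the usage example's counts (a = 10, b = 3, c = 5, d = 20 in the two-class table), reproduces the MCC value printed there:

```latex
\[
\mathrm{MCC} = \frac{ad - bc}{\sqrt{(a+b)(a+c)(b+d)(c+d)}}
= \frac{10\cdot 20 - 3\cdot 5}{\sqrt{13\cdot 15\cdot 23\cdot 25}}
= \frac{185}{\sqrt{112125}} \approx 0.5525
\]
```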

overallAccuracy

Float overallAccuracy()

Proportion of instances which are correctly observed.
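
With the usage example's counts, the correctly observed instances are the diagonal cells a = 10 and d = 20, out of 38 in total:

```latex
\[
\text{overall accuracy} = \frac{a+d}{a+b+c+d} = \frac{30}{38} \approx 0.7895
\]
```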

precision

Float precision(Str label := this.labels.first())

Precision is the proportion of instances observed as the given label which are correctly observed.

prevalence

Float prevalence(Str label := this.labels.first())

Prevalence is the proportion of instances of the given label out of the total.
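
As a sketch, assuming "instances of the given label" means the row total for that label in the table layout used on this page, the usage example gives for "pos":

```latex
\[
\text{prevalence} = \frac{a+b}{a+b+c+d} = \frac{13}{38} \approx 0.3421
\]
```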

recall

Float recall(Str label := this.labels.first())

Recall is equal to the trueRate, for a given label.

sensitivity

Float sensitivity(Str label := this.labels.first())

Sensitivity is another name for the true positive rate (recall).

specificity

Float specificity(Str label := this.labels.first())

Specificity is 1-falseRate for a given label.
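
A worked check for the label "pos" with the usage example's counts (c = 5, d = 20), assuming falseRate is c/(c+d) so that specificity agrees with the true negative rate d/(c+d) given in the introduction:

```latex
\[
\text{falseRate} = \frac{c}{c+d} = \frac{5}{25} = 0.2,
\qquad
\text{specificity} = 1 - \text{falseRate} = \frac{d}{c+d} = \frac{20}{25} = 0.8
\]
```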

toStr

virtual override Str toStr()

Returns a string representation of the matrix across multiple lines in a table-like format.

total

Int total()

Returns total of all counts.

trueNegative

Int trueNegative(Str label := this.labels.first())

Returns the number of instances NOT of the given label which are correctly observed.

truePositive

Int truePositive(Str label := this.labels.first())

Returns the number of instances of the given label correctly observed.

trueRate

Float trueRate(Str label := this.labels.first())

Returns the proportion of instances of the given label which are correctly observed.