Confusion Matrix
The Confusion Matrix task computes and visualizes the performance of a classification model. Each column of the matrix corresponds to the patterns in a predicted class, whereas each row corresponds to the patterns in an actual class.
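The row and column convention described above can be illustrated with a minimal sketch in plain Python. It is independent of the task itself: the function name `confusion_matrix` and the sample labels are assumptions made up for illustration, not part of the product.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Count (actual, predicted) pairs.

    Rows follow the actual class, columns follow the predicted class,
    both in the order given by `labels`.
    """
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Hypothetical labels, for illustration only.
labels = ["<=50K", ">50K"]
actual = ["<=50K", "<=50K", ">50K", ">50K", ">50K", "<=50K"]
predicted = ["<=50K", ">50K", "<=50K", ">50K", "<=50K", "<=50K"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

Cells on the diagonal count correctly classified patterns; every other cell counts one specific kind of error.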
Confusion Matrix layout
The Confusion Matrix task has a single tab, which is divided into three panes:
The Options, where users can configure different options.
The Confusion Matrix, where users can view the percentages of correctly and incorrectly forecast samples in a grid.
The Confusion Plot, where users can view the same information contained in the Confusion Matrix pane, but in a more readable grid.
Options
Within this pane, users can set the following options:
Show percentage. If selected, percentages are displayed in parentheses in the table.
Outputs. Through the drop-down list, users can select the preferred output attribute.
Confusion Matrix
Within this pane, the numerical matrix represents the percentages of correctly and incorrectly forecast samples in a grid.
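As a rough illustration of a cell that shows a percentage in parentheses, the sketch below formats a matrix of counts. The page does not state which total the percentages refer to, so this example assumes that each row (actual class) sums to 100%. The function name `with_percentages` and the counts are hypothetical.

```python
def with_percentages(matrix):
    """Format each cell as 'count (percent%)'.

    Assumption: percentages are relative to the row total, that is,
    to the number of patterns in the actual class.
    """
    formatted = []
    for row in matrix:
        row_total = sum(row)
        cells = []
        for count in row:
            # Guard against empty classes to avoid dividing by zero.
            pct = 100.0 * count / row_total if row_total else 0.0
            cells.append(f"{count} ({pct:.1f}%)")
        formatted.append(cells)
    return formatted

# Hypothetical counts: rows are actual classes, columns are predicted classes.
counts = [[50, 10],
          [5, 35]]

cells = with_percentages(counts)
for row in cells:
    print(" | ".join(row))
```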
Confusion Plot
This pane contains a matrix that graphically displays the same information in a more readable grid, with each column representing patterns in a predicted class and each row representing instances in an actual class.
Hovering over the matrix displays the following plot controls:
Download plot as a png
Zoom
Pan
Box select
Lasso select
Zoom in
Zoom out
Autoscale
Reset axes
For more information about these plot controls, please refer to the Data Manager page.
Example
After importing the dataset with the Import from Text File task, split it into test and training sets (30% test and 70% training) with the Split Data task. Then, compute an LLM Classification task, specifying the Income attribute as Output. Apply the model with the Apply Model task, leaving the default options. Finally, add a Confusion Matrix task and link it to the Apply Model task.
The test set's confusion matrix shows that the errors are unevenly distributed between the classes >50K and <=50K: few examples of class <=50K are classified as >50K, but many examples of class >50K are classified as <=50K. Although the confusion matrix may seem trivial in a two-class problem, when more classes are present the information it contains can help to understand the phenomenon under examination and improve classification accuracy.
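The kind of imbalance described here can be quantified with per-class recall, that is, the share of each actual class that was predicted correctly. The sketch below uses made-up counts chosen only to mimic that pattern. They are not the results of this example, and the variable names are assumptions.

```python
# Hypothetical counts, NOT the actual results of the example above.
# Rows are actual classes, columns are predicted classes.
labels = ["<=50K", ">50K"]
matrix = [
    [700, 40],   # actual <=50K: 700 correct, 40 predicted as >50K
    [150, 110],  # actual >50K: 150 predicted as <=50K, 110 correct
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(labels)))
accuracy = correct / total

# Recall per class: diagonal cell divided by the row total.
recall = {
    label: matrix[i][i] / sum(matrix[i])
    for i, label in enumerate(labels)
}

print(f"accuracy: {accuracy:.3f}")
for label, value in recall.items():
    print(f"recall for {label}: {value:.3f}")
```

With counts like these, overall accuracy looks acceptable while recall for >50K is low, which is the sort of detail a single accuracy figure hides and the matrix exposes.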