Confusion Matrix

The Confusion Matrix task calculates and visualizes the performance of a classification method. Each column of the matrix corresponds to the samples assigned to a predicted class, whereas each row corresponds to the samples belonging to an actual class.
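As a minimal sketch of the layout described above (an illustration only, not the task's actual implementation), the following Python snippet counts actual/predicted pairs into a grid with one row per actual class and one column per predicted class. The function name `confusion_matrix` and the sample labels are hypothetical.

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """Return a grid where rows are actual classes and columns are predicted classes."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Count how many times each (actual, predicted) pair occurs.
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

# Hypothetical labels, for illustration only.
classes = ["<=50K", ">50K"]
actual = ["<=50K", "<=50K", ">50K", ">50K", ">50K", "<=50K"]
predicted = ["<=50K", ">50K", "<=50K", ">50K", "<=50K", "<=50K"]

matrix = confusion_matrix(actual, predicted, classes)
for label, row in zip(classes, matrix):
    print(label, row)
```

Cells on the diagonal hold the correctly classified samples; every other cell holds one specific kind of error.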


Confusion Matrix layout

The Confusion Matrix task consists of a single tab, which is divided into three panes:

  • The Options pane, where users can set the task options.

  • The Confusion Matrix, where users can view the percentages of correctly and incorrectly forecast samples in a grid.

  • The Confusion Plot, where users can view the same information contained in the Confusion Matrix pane, but in a more readable grid.

Options

Within this pane, users can set the following options:

  • Show percentage. If selected, percentages are displayed in parentheses in the table.

  • Outputs. Through the drop-down list, users can select the preferred output attribute.
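The effect of an option like Show percentage can be sketched as follows. This is only an illustration: the way the task normalizes percentages is not stated here, so the sketch assumes each percentage is relative to the total of its row (the actual class). The function `format_cells` and the counts are hypothetical.

```python
def format_cells(matrix, show_percentage=True):
    """Format counts as strings, optionally adding row percentages in parentheses.

    Assumption: percentages are computed against each row (actual class) total.
    """
    formatted = []
    for row in matrix:
        total = sum(row)
        cells = []
        for count in row:
            if show_percentage and total > 0:
                cells.append(f"{count} ({100 * count / total:.1f}%)")
            else:
                cells.append(str(count))
        formatted.append(cells)
    return formatted

matrix = [[90, 10], [40, 60]]  # hypothetical counts
with_pct = format_cells(matrix)
plain = format_cells(matrix, show_percentage=False)

for row in with_pct:
    print(" | ".join(row))
```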

Confusion Matrix

This pane contains a numerical matrix showing the percentages of correctly and incorrectly forecast samples in a grid.

Confusion Plot

This pane contains a matrix that graphically displays the same information in a more readable grid, with each column representing patterns in a predicted class and each row representing instances in an actual class.

Hovering over this matrix makes the following plot controls available:

  • Download plot as a png

  • Zoom

  • Pan

  • Box select

  • Lasso select

  • Zoom in

  • Zoom out

  • Autoscale

  • Reset axes

For more information about these plot controls, refer to the Data Manager page.


Example

  • After importing the dataset with the Import from Text File task, split it into training and test sets (70% training and 30% test) with the Split Data task. Then, compute an LLM Classification task, specifying the Income attribute as Output. Apply the model with the Apply Model task, leaving the default options. Finally, add a Confusion Matrix task and link it to the Apply Model task.

    https://cdn.rulex.ai/docs/Factory/confusion-matrix-ex1.webp

  • The test set's confusion matrix shows that the errors are unevenly distributed between the two classes: few samples of class <=50K are classified as >50K, whereas many samples of class >50K are classified as <=50K. Although the confusion matrix may seem trivial in a two-class problem, when more classes are present it shows which classes are confused with each other, which helps to understand the phenomenon under examination and to improve classification accuracy.

    https://cdn.rulex.ai/docs/Factory/confusion-matrix-ex2.webp
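
The kind of reading described in this example can also be done numerically. The sketch below, under stated assumptions, computes the fraction of each actual class that was classified correctly and finds the largest off-diagonal cell, that is, the most frequent confusion. The counts are hypothetical and are not taken from the screenshot; they only mimic the pattern described (few <=50K samples classified as >50K, many >50K samples classified as <=50K). The helper names are also hypothetical.

```python
def per_class_recall(matrix, classes):
    """Fraction of each actual class (row) that was predicted correctly."""
    recall = {}
    for i, label in enumerate(classes):
        row_total = sum(matrix[i])
        recall[label] = matrix[i][i] / row_total if row_total else 0.0
    return recall

def largest_confusion(matrix, classes):
    """Return (actual, predicted, count) for the biggest off-diagonal cell."""
    best = None
    for i, actual in enumerate(classes):
        for j, predicted in enumerate(classes):
            if i != j and (best is None or matrix[i][j] > best[2]):
                best = (actual, predicted, matrix[i][j])
    return best

classes = ["<=50K", ">50K"]
# Rows are actual classes, columns are predicted classes (hypothetical counts).
matrix = [[950, 50], [300, 200]]

recall = per_class_recall(matrix, classes)
worst = largest_confusion(matrix, classes)
print(recall)
print(worst)
```

With more than two classes, the same off-diagonal scan points to the pair of classes that the model confuses most often.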