Statistics configuration panels

Each time you want to evaluate statistics on an attribute of your underlying dataset in the Statistic panel, you first drag and drop the desired input attributes and then select the class of statistics you want to compute.

You can further customize the list of statistics provided, either by adding a statistic that is not included by default or by adjusting the computation options of the operation.

All these customizations are performed in the Statistics configuration panels, which can be accessed by clicking the pencil icon on the right-hand side of each row of the Statistics area.

Since these panels differ according to the selected statistical class, one section is dedicated to each class. All the panels are divided into two tabs:

  • Statistics tab: where you can choose the actual stats to evaluate.

  • Options tab: where you can configure computation options for the evaluation.


Single statistics

Stats available in the Statistics tab for this subclass are:

  • Sample size, stat category which includes:
    • Number of total valid samples

  • Descriptive, location and central tendency measures, stat category which includes:
    • Number of distinct values (default)

    • Number of missing values (default)

    • Minimum value (default)

    • Index of minimum element

    • Maximum value (default)

    • Index of maximum element

    • Sum value (default)

    • Absolute sum value

    • Product value

    • Absolute product value

    • Mean value (default)

    • Absolute mean value

    • Geometric mean value

    • Geometric absolute mean value

    • Harmonic mean value

    • Harmonic absolute mean value

    • Mode value (default)

    • Number of mode elements (default)

    • Index of mode element

    • Median value (default)

    • Lower quartile

    • Upper quartile

    • Lower whisker for box plot

    • Upper whisker for box plot

  • Dispersion and heterogeneity measures, stat category which includes:
    • Range of values

    • Interquartile range

    • Standard error of mean

    • Standard deviation (default)

    • Standard error of standard deviation

    • Variance

    • Standard error of variance

    • Coefficient of variation

    • Mean absolute deviation

    • Median absolute deviation

    • Pietra index

    • Entropy

    • Normalized entropy

    • Gini coefficient

    • Normalized Gini coefficient

  • Concentration measures, stat category which includes:
    • Gini concentration index

  • Symmetry and shape measures, stat category which includes:
    • Skewness value

    • Standard error of skewness

    • Kurtosis value

    • Standard error of kurtosis

You can check/uncheck single statistics to add/remove them from your computed list. You can also check/uncheck the whole stat category to add/remove all its entries from your evaluated list.

For this subclass, the only computation option, which can be found in the Options tab, is:

  • Statistics on integer variables are continuous: a checkbox controlling whether the result of a statistic evaluated on an integer attribute is kept as a continuous value (option checked) or converted to an integer as well (option unchecked).
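
As an illustration of the option above, the following minimal Python sketch shows its effect on the mean of an integer attribute. The function name evaluate_mean and the use of rounding for the integer conversion are assumptions made for this example only; the conversion rule actually applied by the software is not stated here.

```python
from statistics import mean


def evaluate_mean(values, integers_are_continuous):
    """Mean of an integer attribute, with or without integer conversion."""
    result = mean(values)
    if integers_are_continuous:
        # Option checked: keep the continuous (floating point) result.
        return float(result)
    # Option unchecked: convert the result to an integer.
    # Assumption: conversion by rounding; the real rule may differ.
    return round(result)


ages = [1, 2, 2, 4]
continuous_result = evaluate_mean(ages, integers_are_continuous=True)
integer_result = evaluate_mean(ages, integers_are_continuous=False)
print(continuous_result, integer_result)
```

With the option checked the mean keeps its fractional part; with the option unchecked it is returned as an integer, like the attribute it was computed on.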


Values, frequencies and quantiles

Stats available in the Statistics tab for this subclass are:

  • Sample size, stat category which includes:
    • Number of total valid samples

  • Frequencies indicators, stat category which includes:
    • Distinct values (default)

    • Absolute frequencies (default)

    • Relative frequencies

    • Cumulative frequencies

    • Partial sums

    • Partial means

    • Lorenz Curve

    • Pietra Curve

    • Generic quantiles

    • Rank

You can check/uncheck single statistics to add/remove them from your computed list. You can also check/uncheck the whole stat category to add/remove all its entries from your evaluated list.

For this subclass, you can specify in the Options tab the following options:

  • Statistics on integer variables are continuous: a checkbox controlling whether the result of a statistic evaluated on an integer attribute is kept as a continuous value (option checked) or converted to an integer as well (option unchecked).

  • Value for generic quantiles: the number of quantiles to use when generic quantiles are requested in the frequencies evaluation.
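
To make the first frequency indicators concrete, here is a minimal Python sketch of how distinct values, absolute, relative and cumulative frequencies relate to each other. The function name frequency_table is hypothetical, and the sketch assumes that cumulative frequencies are running totals of the absolute counts; the software may report them differently (for example as proportions).

```python
from collections import Counter
from itertools import accumulate


def frequency_table(values):
    """Return distinct values with absolute, relative and cumulative frequencies."""
    counts = Counter(values)
    distinct = sorted(counts)                      # distinct values, ascending
    absolute = [counts[v] for v in distinct]       # occurrences of each value
    total = sum(absolute)
    relative = [c / total for c in absolute]       # share of each value
    cumulative = list(accumulate(absolute))        # assumption: running counts
    return distinct, absolute, relative, cumulative


distinct, absolute, relative, cumulative = frequency_table([3, 1, 2, 3, 3, 1])
for row in zip(distinct, absolute, relative, cumulative):
    print(row)
```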


Correlation/Covariance

Stats available in the Statistics tab for this subclass are:

  • Sample size, stat category which includes:
    • Number of total valid samples

  • Pearson correlation coefficient, stat category which includes:
    • r-value of Pearson coefficient (default)

    • P-value of Pearson coefficient (default)

  • Spearman Correlation Coefficient, stat category which includes:
    • ρ-value for Spearman coefficient (default)

    • P-value for Spearman coefficient

  • Kendall Tau, stat category which includes:
    • τ-value for Kendall Tau

    • P-value for Kendall Tau

  • Simple regression coefficient, stat category which includes:
    • β-value for Simple regression coefficient

    • P-value for Simple regression coefficient

You can check/uncheck single statistics to add/remove them from your computed list. You can also check/uncheck the whole stat category to add/remove all its entries from your evaluated list.

For this subclass, no computation options are present in the Options tab.
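
For readers who want to see what the coefficients listed above measure, the following Python sketch computes the Pearson r-value, the Spearman ρ-value (Pearson correlation of average ranks) and the slope β of a simple linear regression using the textbook formulas. All function names are hypothetical, P-values are left out, and nothing here is meant to describe how the software computes these values internally.

```python
from math import sqrt


def _centered_sums(x, y):
    """Return (Sxy, Sxx, Syy), the sums of centered products and squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy, sxx, syy


def pearson_r(x, y):
    sxy, sxx, syy = _centered_sums(x, y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))


def regression_slope(x, y):
    """Slope of the least-squares line of y on x."""
    sxy, sxx, _ = _centered_sums(x, y)
    return sxy / sxx


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))      # perfectly linear relation
print(spearman_rho(x, [1, 4, 9, 16, 25]))  # monotonic but not linear
print(regression_slope(x, [2, 4, 6, 8, 10]))
```

The second call shows why the two coefficients can differ: a monotonic but non-linear relation gives a perfect rank correlation even though the Pearson coefficient is below 1.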


Cross tabulation statistics

Stats available in the Statistics tab for this subclass are:

  • Sample size, stat category which includes:
    • Number of total valid samples

  • Contingency tables, stat category which includes:
    • Contingency table (default)

    • Expected contingency table

  • Statistical test, stat category which includes:
    • Pearson χ square (default)

    • P-value for Pearson χ square (default)

You can check/uncheck single statistics to add/remove them from your computed list. You can also check/uncheck the whole stat category to add/remove all its entries from your evaluated list.

For this subclass, the only computation option, which can be found in the Options tab, is:

  • Use missing values: controls whether missing values are considered during stats evaluation.
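
The relation between the contingency table, the expected contingency table and the Pearson χ square statistic can be sketched in a few lines of Python. The function names are hypothetical, and the sketch uses the plain textbook formula without any continuity correction; whether the software applies a correction is not stated here.

```python
def expected_table(observed):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(observed):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_table(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: levels of the first attribute; columns: levels of the second one.
observed = [[10, 20],
            [20, 10]]
print(expected_table(observed))
print(pearson_chi_square(observed))
```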


ROC Curve

Stats available in the Statistics tab for this subclass are:

  • Sample size, stat category which includes:
    • Number of valid positive samples

    • Number of valid negative samples

    • Number of total valid samples

  • ROC curve (scalar), stat category which includes:
    • Area Under Curve (default)

    • P-value of Area Under Curve (default)

    • Standard Error of Area Under Curve (default)

    • Point of maximum Youden index

    • Point closest to (0, 1)

    • Point of maximum accuracy

    • Point with specificity = sensitivity

  • ROC curve (vector), stat category which includes:
    • AUC 95% confidence interval (default)

    • 1-Specificity (default)

    • Sensitivity (default)

    • Accuracies

    • Thresholds (default)

    • Youden indices (default)

    • Likelihood ratio - (default)

    • Likelihood ratio + (default)

You can check/uncheck single statistics to add/remove them from your computed list. You can also check/uncheck the whole stat category to add/remove all its entries from your evaluated list.

For this subclass, you can specify in the Options tab the following computation options:

  • Statistics on integer variables are continuous: a checkbox controlling whether the result of a statistic evaluated on an integer attribute is kept as a continuous value (option checked) or converted to an integer as well (option unchecked).

  • Use target attribute: select if you want to use the terms within the Var_2/Target area as the target for the ROC curve.

  • Consider missing value as target with negative outcome: if selected, in a binary classification the cases with missing outputs are considered as if they have a negative output.

  • Positive test for: this drop-down menu allows you to select one of the following options:
    • Greater Values

    • Lower Values

    • Automatic Selection
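
The ROC quantities listed above can be illustrated with a short Python sketch. It assumes a binary target coded as 1 (positive) and 0 (negative) and treats greater values as positive, which corresponds to the Greater Values choice; the function names, the 0/1 coding and the trapezoidal rule for the area are assumptions made for this example, not a description of the software's internals.

```python
def roc_points(scores, labels):
    """Return thresholds and (1 - specificity, sensitivity) points."""
    thresholds = sorted(set(scores), reverse=True)
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # nothing classified as positive
    for t in thresholds:
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        points.append((fp / negatives, tp / positives))
    return thresholds, points


def area_under_curve(points):
    """Trapezoidal area under the piecewise-linear ROC curve."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def max_youden(thresholds, points):
    """Threshold maximizing sensitivity + specificity - 1."""
    indices = [sens - fpr for fpr, sens in points[1:]]
    best = max(range(len(thresholds)), key=lambda i: indices[i])
    return thresholds[best], indices[best]


scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
thresholds, points = roc_points(scores, labels)
print(area_under_curve(points))
print(max_youden(thresholds, points))
```

Selecting Lower Values instead would amount to reversing the comparison, so that a case is classified as positive when its value is at or below the threshold.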


Test for independent samples

Stats available in the Statistics tab for this subclass are:

  • Sample size, stat category which includes:
    • Number of valid positive samples

    • Number of valid negative samples

    • Number of total valid samples

  • Wilcoxon and Mann-Whitney test, stat category which includes:
    • Mann-Whitney U-value

    • Mann-Whitney normalized U-value (default)

    • Wilcoxon R1-value

    • Wilcoxon Normalized R1-value

    • P-value of Wilcoxon test (default)

  • Kolmogorov-Smirnov test, stat category which includes:
    • KS value (default)

    • P-value for KS test (default)

  • Student t-test, stat category which includes:
    • Student t-value (default)

    • P-value for Student t-test (default)

  • Levene test, stat category which includes:
    • F-value for Levene test (default)

    • P-value for Levene test (default)

You can check/uncheck single statistics to add/remove them from your computed list. You can also check/uncheck the whole stat category to add/remove all its entries from your evaluated list.

For this subclass, you can specify in the Options tab the following computation options:

  • Use target attribute: select if you want to use the terms within the Var_2/Target area as the target for the test.
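
As a rough illustration of how a target attribute separates the data into two independent samples, the following Python sketch splits a column by a binary target and computes the Mann-Whitney U-value together with the corresponding Wilcoxon rank sum R1. The function names and the 0/1 coding of the target are assumptions for this example; normalized values and P-values are omitted.

```python
def split_by_target(values, target):
    """Split values into (positive, negative) samples by a 0/1 target."""
    positives = [v for v, t in zip(values, target) if t == 1]
    negatives = [v for v, t in zip(values, target) if t == 0]
    return positives, negatives


def mann_whitney_u(sample_1, sample_2):
    """Count pairs where sample_1 exceeds sample_2; ties count one half."""
    u = 0.0
    for a in sample_1:
        for b in sample_2:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def wilcoxon_rank_sum(sample_1, sample_2):
    """Rank sum of sample_1, from the identity R1 = U1 + n1 * (n1 + 1) / 2."""
    n1 = len(sample_1)
    return mann_whitney_u(sample_1, sample_2) + n1 * (n1 + 1) / 2


values = [3, 1, 5, 2, 7, 6]
target = [1, 0, 1, 0, 1, 0]
positives, negatives = split_by_target(values, target)
print(mann_whitney_u(positives, negatives))
print(wilcoxon_rank_sum(positives, negatives))
```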


Test for paired samples

Stats available in the Statistics tab for this subclass are:

  • Sample size, stat category which includes:
    • Number of total valid pairs

  • Student t-test, stat category which includes:
    • Student t-value (default)

    • P-value for Student t-test (default)

  • Wilcoxon test, stat category which includes:
    • W-value for Wilcoxon test (default)

    • W-value for normalized Wilcoxon test

    • P-value for Wilcoxon test (default)

    • Number of unequal pairs

You can check/uncheck single statistics to add/remove them from your computed list. You can also check/uncheck the whole stat category to add/remove all its entries from your evaluated list.

For this subclass, no computation options are present in the Options tab.
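
The paired Student t-value and the number of unequal pairs listed above can be sketched in Python as follows. The function names are hypothetical and the formula is the standard paired t statistic (mean of the differences divided by its standard error); the Wilcoxon values and all P-values are left out of this sketch.

```python
from math import sqrt
from statistics import mean, stdev


def pair_differences(first, second):
    """Differences second - first for each pair of observations."""
    return [b - a for a, b in zip(first, second)]


def paired_t_value(first, second):
    """Mean difference divided by its standard error."""
    diffs = pair_differences(first, second)
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def unequal_pairs(first, second):
    """Number of pairs whose two values differ."""
    return sum(1 for d in pair_differences(first, second) if d != 0)


before = [10, 12, 9, 11]
after = [12, 14, 10, 14]
print(paired_t_value(before, after))
print(unequal_pairs(before, after))
```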