Statistics configuration panels

Whenever you want to evaluate statistics on an attribute of the underlying dataset in the Statistic Manager, you first drag and drop the desired input attributes and then select the class of statistics you want to compute.

You may then want to customize the proposed list of statistics, either by adding a statistic that is not included by default or by setting computation options for the operation.

All these customizations are made in the Statistics configuration panel, which opens when you click the Pencil icon at the right of each row in the Statistics area.

Because these panels differ according to the selected statistic class, one section is dedicated to each class. All panels have two tabs:

  • Statistics tab: where the actual stats to evaluate are chosen.

  • Options tab: where computation options for the evaluation are set.


Single statistics

The statistics available in the Statistics tab for this subclass are listed below (defaults are shown in bold):

  • Sample size, stat category which includes:
    • Number of total valid samples

  • Descriptive, location and central tendency measures, stat category which includes:
    • Number of distinct values

    • Number of missing values

    • Minimum value

    • Index of minimum element

    • Maximum value

    • Index of maximum element

    • Sum value

    • Absolute sum value

    • Product value

    • Absolute product value

    • Mean value

    • Absolute mean value

    • Geometric mean value

    • Geometric absolute mean value

    • Harmonic mean value

    • Harmonic absolute mean value

    • Mode value

    • Number of mode elements

    • Index of mode element

    • Median value

    • Lower quartile

    • Upper quartile

    • Lower whisker for box plot

    • Upper whisker for box plot

  • Dispersion and heterogeneity measures, stat category which includes:
    • Range of values

    • Interquartile range

    • Standard error of mean

    • Standard deviation

    • Standard error of standard deviation

    • Variance

    • Standard error of variance

    • Coefficient of variation

    • Mean absolute deviation

    • Median absolute deviation

    • Pietra index

    • Entropy

    • Normalized entropy

    • Gini coefficient

    • Normalized Gini coefficient

  • Concentration measures, stat category which includes:
    • Gini concentration index

  • Symmetry and shape measures, stat category which includes:
    • Skewness value

    • Standard error of skewness

    • Kurtosis value

    • Standard error of kurtosis
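
To make a few of the location and dispersion measures listed above concrete, the following minimal sketch computes the quartiles, the interquartile range and the box-plot whiskers with the Python standard library. It is only an illustration: the quantile interpolation method (`"inclusive"`) and the whisker rule (the most extreme samples within 1.5 times the interquartile range of the quartiles) are assumptions, and the function name `box_plot_summary` is hypothetical. The tool may use different conventions.

```python
import statistics


def box_plot_summary(values, whisker_factor=1.5):
    """Return quartiles, IQR and box-plot whiskers for a list of numbers.

    Assumption: whiskers are the most extreme samples that still lie within
    whisker_factor * IQR of the lower and upper quartiles.
    """
    # Three cut points splitting the sorted data into four groups.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    return {
        "lower_quartile": q1,
        "median": median,
        "upper_quartile": q3,
        "interquartile_range": iqr,
        "lower_whisker": min(v for v in values if v >= low_fence),
        "upper_whisker": max(v for v in values if v <= high_fence),
    }


# The value 30 lies beyond the upper fence, so it is not used as a whisker.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```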

You can check or uncheck a single statistic to add it to or remove it from the computed list. You can also check or uncheck a whole stat category to add or remove all its entries at once.

For this subclass, the only computation option that can be specified is:

  • Statistics on integer variables are continuous: a checkbox controlling whether the result of a statistic evaluated on an integer attribute is converted to an integer as well (option unchecked) or kept as a continuous value (option checked).
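
The effect of this option can be pictured with a short sketch. The function `evaluate_mean` and its parameter are hypothetical, and rounding to the nearest integer is an assumption: the text only states that the result is converted to an integer, not how.

```python
import statistics


def evaluate_mean(values, integers_are_continuous):
    """Mean of an integer attribute under the two settings of the option.

    Assumption: when the option is unchecked the result is rounded to the
    nearest integer; the actual conversion rule of the tool may differ.
    """
    result = statistics.mean(values)
    if integers_are_continuous:
        return float(result)  # option checked: keep the continuous value
    return round(result)      # option unchecked: convert back to integer


ages = [21, 22, 24, 28]
continuous_mean = evaluate_mean(ages, integers_are_continuous=True)
integer_mean = evaluate_mean(ages, integers_are_continuous=False)
```

With these sample values the exact mean is 23.75, so the two settings give visibly different results.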


Values, frequencies and quantiles

The statistics available in the Statistics tab for this subclass are listed below (defaults are shown in bold):

  • Sample size, stat category which includes:
    • Number of total valid samples

  • Frequencies indicators, stat category which includes:
    • Distinct values

    • Absolute frequencies

    • Relative frequencies

    • Cumulative frequencies

    • Partial sums

    • Partial means

    • Lorenz Curve

    • Pietra Curve

    • Generic quantiles

    • Rank

You can check or uncheck a single statistic to add it to or remove it from the computed list. You can also check or uncheck a whole stat category to add or remove all its entries at once.

For this subclass, you can specify the following options:

  • Statistics on integer variables are continuous: a checkbox controlling whether the result of a statistic evaluated on an integer attribute is converted to an integer as well (option unchecked) or kept as a continuous value (option checked).

  • Value for generic quantiles: the number of quantiles to use when generic quantiles are requested in the frequency evaluation.
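
As an illustration of the vector statistics in this subclass, the sketch below builds a frequency table, a Lorenz curve and a set of generic quantiles with the Python standard library. The function names are hypothetical, and several details are assumptions: cumulative frequencies are reported as running counts, the Lorenz curve is the running share of the total over the sorted values, and the "Value for generic quantiles" is read as the number of equal-probability groups (4 gives three cut points).

```python
from collections import Counter
from itertools import accumulate
import statistics


def frequency_table(values):
    """Distinct values with absolute, relative and cumulative frequencies."""
    counts = Counter(values)
    distinct = sorted(counts)
    absolute = [counts[v] for v in distinct]
    relative = [a / len(values) for a in absolute]
    cumulative = list(accumulate(absolute))  # assumption: running counts
    return distinct, absolute, relative, cumulative


def lorenz_curve(values):
    """Share of the total sum held by the k smallest samples, for each k."""
    ordered = sorted(values)
    total = sum(ordered)
    return [partial / total for partial in accumulate(ordered)]


data = [1, 1, 2, 2, 2, 4]
distinct, absolute, relative, cumulative = frequency_table(data)
lorenz = lorenz_curve(data)

# Hypothetical "Value for generic quantiles" of 4: three cut points.
quartile_cuts = statistics.quantiles(data, n=4, method="inclusive")
```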


Correlation/Covariance

The statistics available in the Statistics tab for this subclass are listed below (defaults are shown in bold):

  • Sample size, stat category which includes:
    • Number of total valid samples

  • Pearson correlation coefficient, stat category which includes:
    • r-value of Pearson coefficient

    • P-value of Pearson coefficient

  • Spearman Correlation Coefficient, stat category which includes:
    • ρ-value for Spearman coefficient

    • P-value for Spearman coefficient

  • Kendall Tau, stat category which includes:
    • τ-value for Kendall Tau

    • P-value for Kendall Tau

  • Simple regression coefficient, stat category which includes:
    • β-value for Simple regression coefficient

    • P-value for Simple regression coefficient

You can check or uncheck a single statistic to add it to or remove it from the computed list. You can also check or uncheck a whole stat category to add or remove all its entries at once.

For this subclass, no computation options are present.
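
For readers who want to see how some of these coefficients relate to each other, here is a minimal sketch using textbook formulas in plain Python. The function names are hypothetical, P-values and the Kendall tau are left out, and nothing here is meant to describe how the tool computes its results: Spearman's ρ is simply taken as the Pearson coefficient of the average ranks, and β as the least-squares slope of y on x.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient as the Pearson coefficient of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def simple_regression_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cov / sum((a - mean_x) ** 2 for a in x)


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not linear in x
r = pearson_r(x, y)
rho = spearman_rho(x, y)
beta = simple_regression_slope(x, y)
```

Because y grows monotonically with x, the rank-based coefficient reaches 1 while the Pearson coefficient stays slightly below it.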


Cross tabulation statistics

The statistics available in the Statistics tab for this subclass are listed below (defaults are shown in bold):

  • Sample size, stat category which includes:
    • Number of total valid samples

  • Contingency tables, stat category which includes:
    • Contingency table

    • Expected contingency table

  • Statistical test, stat category which includes:
    • Pearson χ square

    • P-value for Pearson χ square

You can check or uncheck a single statistic to add it to or remove it from the computed list. You can also check or uncheck a whole stat category to add or remove all its entries at once.

For this subclass, the only computation option that can be specified is:

  • Use missing values: controls whether missing values are considered during the evaluation of the statistics.
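
The relation between the contingency table, the expected contingency table and the Pearson χ square can be sketched as follows. All names are hypothetical. The handling of missing values is an assumption made for illustration only: they are encoded as `None`, form a category of their own when `use_missing` is true, and cause the pair to be dropped otherwise.

```python
from collections import Counter


def contingency_table(var_1, var_2, use_missing=False):
    """Observed counts for each pair of categories.

    Assumption: missing values are encoded as None; with use_missing=True
    they form their own category, otherwise incomplete pairs are dropped.
    """
    pairs = [(a, b) for a, b in zip(var_1, var_2)
             if use_missing or (a is not None and b is not None)]
    row_labels = sorted({a for a, _ in pairs}, key=str)
    col_labels = sorted({b for _, b in pairs}, key=str)
    counts = Counter(pairs)
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table


def expected_table(table):
    """Counts expected under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_table(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


var_1 = ["a", "a", "a", "a", "b", "b", "b", "b", None]
var_2 = ["x", "x", "x", "y", "x", "y", "y", "y", "x"]

rows, cols, observed = contingency_table(var_1, var_2)
chi_square = pearson_chi_square(observed)
rows_with_missing, _, observed_with_missing = contingency_table(
    var_1, var_2, use_missing=True)
```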


ROC Curve

The statistics available in the Statistics tab for this subclass are listed below (defaults are shown in bold):

  • Sample size, stat category which includes:
    • Number of valid positive samples

    • Number of valid negative samples

    • Number of total valid samples

  • ROC curve (scalar), stat category which includes:
    • Area Under Curve

    • P-value of Area Under Curve

    • Standard Error of Area Under Curve

    • Point of maximum Youden index

    • Point closest to (0, 1)

    • Point of maximum accuracy

    • Point with specificity = sensitivity

  • ROC curve (vector), stat category which includes:
    • AUC 95% confidence interval

    • 1-Specificity

    • Sensitivity

    • Accuracies

    • Thresholds

    • Youden indices

    • Likelihood ratio -

    • Likelihood ratio +

You can check or uncheck a single statistic to add it to or remove it from the computed list. You can also check or uncheck a whole stat category to add or remove all its entries at once.

For this subclass, you can specify the following computation options:

  • Statistics on integer variables are continuous: a checkbox controlling whether the result of a statistic evaluated on an integer attribute is converted to an integer as well (option unchecked) or kept as a continuous value (option checked).

  • Use target attribute: select this option to use the attributes placed in the Var_2/Target area as the ROC curve target.

  • Consider missing value as target with negative outcome

  • Positive test for: this drop-down menu lets you select one of the following choices:
    • Greater Values

    • Lower Values

    • Automatic Selection
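
The vector and scalar ROC statistics listed in this section are closely related, as the following minimal sketch shows. It assumes a target encoded as 1 (positive) and 0 (negative) and a test that is positive when the value is greater than or equal to the threshold, which would correspond to the "Greater Values" choice. All function names are hypothetical, and the area is computed with the plain trapezoidal rule, which may differ from the method used by the tool.

```python
def roc_points(scores, labels):
    """One (threshold, 1-specificity, sensitivity) triple per distinct score.

    Assumption: a sample tests positive when its score is >= the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels)
                 if s >= threshold and l == 1)
        fp = sum(1 for s, l in zip(scores, labels)
                 if s >= threshold and l == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under the curve, starting from the (0, 0) corner."""
    area, prev_fpr, prev_tpr = 0.0, 0.0, 0.0
    for _, fpr, tpr in points:
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_fpr, prev_tpr = fpr, tpr
    return area


def max_youden_point(points):
    """Point maximizing J = sensitivity + specificity - 1."""
    return max(points, key=lambda p: p[2] - p[1])


scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

points = roc_points(scores, labels)
auc = area_under_curve(points)
best = max_youden_point(points)
```

In this example 14 of the 16 positive/negative pairs are ranked correctly, which matches the computed area of 0.875.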


Test for independent samples

The statistics available in the Statistics tab for this subclass are listed below (defaults are shown in bold):

  • Sample size, stat category which includes:
    • Number of valid positive samples

    • Number of valid negative samples

    • Number of total valid samples

  • Wilcoxon and Mann-Whitney test, stat category which includes:
    • Mann-Whitney U-value

    • Mann-Whitney normalized U-value

    • Wilcoxon R1-value

    • Wilcoxon Normalized R1-value

    • P-value of Wilcoxon test

  • Kolmogorov-Smirnov test, stat category which includes:
    • KS value

    • P-value for KS test

  • Student t-test, stat category which includes:
    • Student t-value

    • P-value for Student t-test

  • Levene test, stat category which includes:
    • F-value for Levene test

    • P-value for Levene test

You can check or uncheck a single statistic to add it to or remove it from the computed list. You can also check or uncheck a whole stat category to add or remove all its entries at once.

For this subclass, you can specify the following computation options:

  • Use target attribute: select this option to use the attributes placed in the Var_2/Target area as the target.
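
To show what two of these test statistics measure, the sketch below splits a variable into two independent samples according to a binary target and computes the Mann-Whitney U-value and the Student t-value. The names are hypothetical, the target is assumed to be encoded as 1 and 0, and the t-value uses the pooled-variance (equal variances) formula, which is only one of the possible variants. P-values are not computed.

```python
import math
import statistics


def split_by_target(values, target):
    """Split values into the samples with target 1 and target 0."""
    positives = [v for v, t in zip(values, target) if t == 1]
    negatives = [v for v, t in zip(values, target) if t == 0]
    return positives, negatives


def mann_whitney_u(sample_a, sample_b):
    """U of sample_a: pairs where a > b, with ties counting one half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in sample_a for b in sample_b)


def student_t(sample_a, sample_b):
    """Two-sample t-value with pooled variance (equal variances assumed)."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled = ((n_a - 1) * statistics.variance(sample_a)
              + (n_b - 1) * statistics.variance(sample_b)) / (n_a + n_b - 2)
    difference = statistics.mean(sample_a) - statistics.mean(sample_b)
    return difference / math.sqrt(pooled * (1 / n_a + 1 / n_b))


values = [5, 7, 9, 1, 2, 3]
target = [1, 1, 1, 0, 0, 0]

positives, negatives = split_by_target(values, target)
u_value = mann_whitney_u(positives, negatives)
t_value = student_t(positives, negatives)
```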


Test for paired samples

The statistics available in the Statistics tab for this subclass are listed below (defaults are shown in bold):

  • Sample size, stat category which includes:
    • Number of total valid pairs

  • Student t-test, stat category which includes:
    • Student t-value

    • P-value for Student t-test

  • Wilcoxon test, stat category which includes:
    • W-value for Wilcoxon test

    • W-value for normalized Wilcoxon test

    • P-value for Wilcoxon test

    • Number of unequal pairs

You can check or uncheck a single statistic to add it to or remove it from the computed list. You can also check or uncheck a whole stat category to add or remove all its entries at once.

For this subclass, no computation options are present.
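
As a rough illustration of the paired statistics, the sketch below computes the paired Student t-value and the Wilcoxon signed-rank W together with the number of unequal pairs. The function names are hypothetical, and the conventions are assumptions: pairs with a zero difference are discarded before ranking, tied magnitudes share their average rank, and W is the smaller of the positive and negative rank sums. Other conventions exist, and P-values are not computed.

```python
import math
import statistics


def paired_t(before, after):
    """t-value of the mean of the pairwise differences."""
    diffs = [a - b for a, b in zip(after, before)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


def wilcoxon_signed_rank(before, after):
    """Return (W, number of unequal pairs).

    Assumption: W is the smaller of the positive and negative rank sums,
    and pairs with equal values are discarded.
    """
    diffs = [a - b for a, b in zip(after, before) if a != b]
    magnitudes = [abs(d) for d in diffs]

    def rank(magnitude):
        # Average rank: tied magnitudes share the mean of their positions.
        less = sum(1 for m in magnitudes if m < magnitude)
        equal = sum(1 for m in magnitudes if m == magnitude)
        return less + (equal + 1) / 2

    positive = sum(rank(abs(d)) for d in diffs if d > 0)
    negative = sum(rank(abs(d)) for d in diffs if d < 0)
    return min(positive, negative), len(diffs)


before = [10, 12, 9, 11, 8]
after = [12, 15, 8, 11, 12]

t_value = paired_t(before, after)
w_value, unequal_pairs = wilcoxon_signed_rank(before, after)
```

In this example one pair has identical values, so only four of the five pairs contribute to the Wilcoxon statistic.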