Statistics configuration panels
Each time you want to evaluate statistics on an attribute of your dataset in the Statistic panel, you first drag and drop the desired input attributes and then select the class of statistics you want to compute.
You can further customize the list of computed statistics, either by adding statistics that are not included by default or by adjusting the computation options.
All these customizations are performed in the Statistics configuration panels, which can be accessed by clicking the pencil icon on the right-hand side of each row of the Statistics area.
Since these panels differ according to the selected statistical class, one section below is dedicated to each class. All the panels are divided into two tabs:
Statistics tab: where you can choose the actual stats to evaluate.
Options tab: where you can configure computation options for the evaluation.
Single statistics
Stats available in the Statistics tab for this subclass are:
- Sample size, stat category which includes:
Number of total valid samples
- Descriptive, location and central tendency measures, stat category which includes:
Number of distinct values (default)
Number of missing values (default)
Minimum value (default)
Index of minimum element
Maximum value (default)
Index of maximum element
Sum value (default)
Absolute sum value
Product value
Absolute product value
Mean value (default)
Absolute mean values
Geometric mean value
Geometric absolute mean value
Harmonic mean value
Harmonic absolute mean value
Mode value (default)
Number of mode elements (default)
Index of mode element
Median value (default)
Lower quartile
Upper quartile
Lower whisker for box plot
Upper whisker for box plot
- Dispersion and heterogeneity measures, stat category which includes:
Range of values
Interquartile range
Standard error of mean
Standard deviation (default)
Standard error of standard deviation
Variance
Standard error of variance
Coefficient of variation
Mean absolute deviation
Median absolute deviation
Pietra index
Entropy
Normalized entropy
Gini coefficient
Normalized Gini coefficient
- Concentration measures, stat category which includes:
Gini concentration index
- Symmetry and shape measures, stat category which includes:
Skewness value
Standard error of skewness
Kurtosis value
Standard error of kurtosis
You can check/uncheck single statistics to add/remove them from your computed list. You can also check/uncheck the whole stat category to add/remove all its entries from your evaluated list.
For this subclass, the only computation option, which can be found in the Options tab, is:
Statistics on integer variables are continuous: a checkbox that controls whether the result of a statistic computed on an integer attribute is kept as a continuous value (option checked) or converted to an integer as well (option unchecked).
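As a rough illustration of what this option changes, the following Python sketch computes a few single statistics on an integer attribute and then optionally converts the results back to integers. The function name `summarize`, the `continuous` flag and the choice of rounding to the nearest integer are assumptions made for this example; the conversion rule actually applied by the panel is not stated here.

```python
import statistics


def summarize(values, continuous=True):
    """Compute a few single statistics on a list of integers.

    `continuous` is a hypothetical stand-in for the checkbox in the
    Options tab. Rounding to the nearest integer is an assumption.
    """
    stats = {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation
    }
    if not continuous:
        # Option unchecked: results are converted to integers as well.
        stats = {name: round(value) for name, value in stats.items()}
    return stats


ages = [21, 22, 22, 26]
continuous_stats = summarize(ages, continuous=True)
integer_stats = summarize(ages, continuous=False)
print(continuous_stats)
print(integer_stats)
```

With the option checked the mean stays at 22.75, while with the option unchecked it is reported as an integer.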
Values, frequencies and quantiles
Stats available in the Statistics tab for this subclass are:
- Sample size, stat category which includes:
Number of total valid samples
- Frequencies indicators, stat category which includes:
Distinct values (default)
Absolute frequencies (default)
Relative frequencies
Cumulative frequencies
Partial sums
Partial means
Lorenz Curve
Pietra Curve
Generic quantiles
Rank
You can check/uncheck single statistics to add/remove them from your computed list. You can also check/uncheck the whole stat category to add/remove all its entries from your evaluated list.
For this subclass, you can specify in the Options tab the following options:
Statistics on integer variables are continuous: a checkbox that controls whether the result of a statistic computed on an integer attribute is kept as a continuous value (option checked) or converted to an integer as well (option unchecked).
Value for generic quantiles: the number of quantiles to use when generic quantiles are requested in the frequency evaluation.
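To make the frequency indicators more concrete, here is a minimal Python sketch that derives several of them from a small list of values. The definitions used for partial sums and for the Lorenz curve, and the mapping of the Value for generic quantiles option onto the `n` argument of `statistics.quantiles`, are assumptions for illustration; the panel may use different conventions.

```python
from collections import Counter
from itertools import accumulate
import statistics

values = [3, 1, 2, 3, 3, 2]

counts = Counter(values)
distinct = sorted(counts)                       # distinct values
absolute = [counts[v] for v in distinct]        # absolute frequencies
relative = [c / len(values) for c in absolute]  # relative frequencies
cumulative = list(accumulate(absolute))         # cumulative frequencies

# Partial sums (assumed definition): running total of value * count
# over the sorted distinct values.
partial_sums = list(accumulate(v * c for v, c in zip(distinct, absolute)))

# Lorenz curve ordinates (assumed definition): share of the grand total
# reached at each distinct value.
lorenz = [s / partial_sums[-1] for s in partial_sums]

# n=4 requests quartiles, which yields three cut points.
quartile_cuts = statistics.quantiles(values, n=4)

print(distinct, absolute, cumulative, partial_sums)
```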
Correlation/Covariance
Stats available in the Statistics tab for this subclass are:
- Sample size, stat category which includes:
Number of total valid samples
- Pearson correlation coefficient, stat category which includes:
r-value of Pearson coefficient (default)
P-value of Pearson coefficient (default)
- Spearman Correlation Coefficient, stat category which includes:
ρ-value for Spearman coefficient (default)
P-value for Spearman coefficient
- Kendall Tau, stat category which includes:
τ-value for Kendall Tau
P-value for Kendall Tau
- Simple regression coefficient, stat category which includes:
β-value for Simple regression coefficient
P-value for Simple regression coefficient
You can check/uncheck single statistics to add/remove them from your computed list. You can also check/uncheck the whole stat category to add/remove all its entries from your evaluated list.
For this subclass, no computation options are present in the Options tab.
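The three correlation coefficients listed above differ in what they measure: Pearson captures linear association, while Spearman and Kendall work on ranks and orderings. The following Python sketch shows textbook versions of each on a small example. The function names are hypothetical, and the Kendall variant shown (tau-a, which ignores ties) is an assumption, since the variant used by the panel is not stated.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 30]  # increasing but not linear in x
print(pearson_r(x, y), spearman_rho(x, y), kendall_tau_a(x, y))
```

Because `y` grows monotonically with `x`, the two rank-based coefficients reach their maximum, while the Pearson coefficient stays below 1.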
Cross tabulation statistics
Stats available in the Statistics tab for this subclass are:
- Sample size, stat category which includes:
Number of total valid samples
- Contingency tables, stat category which includes:
Contingency table (default)
Expected contingency table
- Statistical test, stat category which includes:
Pearson χ square (default)
P-value for Pearson χ square (default)
You can check/uncheck single statistics to add/remove them from your computed list. You can also check/uncheck the whole stat category to add/remove all its entries from your evaluated list.
For this subclass, the only computation option, which can be found in the Options tab, is:
Use missing values: controls whether missing values are taken into account during the evaluation of the statistics.
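The relationship between the contingency table, the expected contingency table and the Pearson χ square can be sketched in a few lines of Python. The function names and the `use_missing` flag are hypothetical; in particular, treating a missing value as its own level when the flag is set is an assumption about how the Use missing values option behaves.

```python
from collections import Counter


def contingency(rows, cols, use_missing=False):
    """Build the observed contingency table of two categorical attributes."""
    pairs = list(zip(rows, cols))
    if not use_missing:
        # Drop every pair in which either value is missing (None).
        pairs = [(r, c) for r, c in pairs if r is not None and c is not None]
    counts = Counter(pairs)
    row_levels = sorted({r for r, _ in pairs}, key=str)
    col_levels = sorted({c for _, c in pairs}, key=str)
    observed = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, observed


def expected_table(observed):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[rt * ct / total for ct in col_totals] for rt in row_totals]


def pearson_chi_square(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_table(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


smoker = ["yes", "yes", "no", "no", "yes", "no", "no", "yes"]
outcome = ["ill", "ill", "well", "well", "ill", "ill", "well", "well"]

row_levels, col_levels, observed = contingency(smoker, outcome)
print(row_levels, col_levels, observed)
print(expected_table(observed), pearson_chi_square(observed))
```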
ROC Curve
Stats available in the Statistics tab for this subclass are:
- Sample size, stat category which includes:
Number of valid positive samples
Number of valid negative samples
Number of total valid samples
- ROC curve (scalar), stat category which includes:
Area Under Curve (default)
P-value of Area Under Curve (default)
Standard Error of Area Under Curve (default)
Point of maximum Youden index
Point closest to (0, 1)
Point of maximum accuracy
Point with specificity = sensitivity
- ROC curve (vector), stat category which includes:
AUC 95% confidence interval (default)
1-Specificity (default)
Sensitivity (default)
Accuracies
Thresholds (default)
Youden indices (default)
Likelihood ratio - (default)
Likelihood ratio + (default)
You can check/uncheck single statistics to add/remove them from your computed list. You can also check/uncheck the whole stat category to add/remove all its entries from your evaluated list.
For this subclass, you can specify in the Options tab the following computation options:
Statistics on integer variables are continuous: a checkbox that controls whether the result of a statistic computed on an integer attribute is kept as a continuous value (option checked) or converted to an integer as well (option unchecked).
Use target attribute: select this option if you want to use the attributes in the Var_2/Target area as the target for the ROC curve.
Consider missing value as target with negative outcome: if selected, cases with a missing output in a binary classification are treated as having a negative output.
- Positive test for: this drop-down menu allows you to select one of the following options:
Greater Values
Lower Values
Automatic Selection
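The scalar and vector ROC statistics are closely related: the vector entries describe one point of the curve per threshold, and the scalar entries summarize or pick among those points. The Python sketch below illustrates this on a toy example. It assumes the Greater Values choice (a case is called positive when its value is at or above the threshold) and a trapezoidal estimate of the area under the curve; the function names are hypothetical and the panel may compute these quantities differently.

```python
import math


def roc_points(scores, labels):
    """Return (threshold, 1 - specificity, sensitivity) per distinct score."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Area under the curve, starting from the origin (0, 0)."""
    xs = [0.0] + [p[1] for p in points]
    ys = [0.0] + [p[2] for p in points]
    return sum(
        (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2 for i in range(1, len(xs))
    )


def best_youden(points):
    """Point maximizing sensitivity + specificity - 1."""
    return max(points, key=lambda p: p[2] - p[1])


def closest_to_corner(points):
    """Point with the smallest distance to the ideal corner (0, 1)."""
    return min(points, key=lambda p: math.hypot(p[1], 1 - p[2]))


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 1, 0, 0]  # 1 = positive outcome, 0 = negative

points = roc_points(scores, labels)
auc = auc_trapezoid(points)
print(auc, best_youden(points), closest_to_corner(points))
```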
Test for independent samples
Stats available in the Statistics tab for this subclass are:
- Sample size, stat category which includes:
Number of valid positive samples
Number of valid negative samples
Number of total valid samples
- Wilcoxon and Mann-Whitney test, stat category which includes:
Mann-Whitney U-value
Mann-Whitney normalized U-value (default)
Wilcoxon R1-value
Wilcoxon Normalized R1-value
P-value of Wilcoxon test (default)
- Kolmogorov-Smirnov test, stat category which includes:
KS value (default)
P-value for KS test (default)
- Student t-test, stat category which includes:
Student t-value (default)
P-value for Student t-test (default)
- Levene test, stat category which includes:
F-value for Levene test (default)
P-value for Levene test (default)
You can check/uncheck single statistics to add/remove them from your computed list. You can also check/uncheck the whole stat category to add/remove all its entries from your evaluated list.
For this subclass, you can specify in the Options tab the following computation options:
Use target attribute: select this option if you want to use the attributes in the Var_2/Target area as the target for the test.
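As an illustration of how a target attribute can define the two independent samples, the following Python sketch splits a column of values by a binary target and computes the Mann-Whitney U-value for each group. The helper names and the convention of counting ties as one half are assumptions for this example, not a description of the panel's internals.

```python
def split_by_target(values, target):
    """Split values into (positive, negative) samples using a 0/1 target."""
    positive = [v for v, t in zip(values, target) if t == 1]
    negative = [v for v, t in zip(values, target) if t == 0]
    return positive, negative


def mann_whitney_u(sample_a, sample_b):
    """U for sample_a: number of (a, b) pairs with a > b, ties count 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


values = [5.1, 4.8, 6.0, 5.5, 4.2, 4.9]
target = [1, 1, 1, 0, 0, 0]

positive, negative = split_by_target(values, target)
u_positive = mann_whitney_u(positive, negative)
u_negative = mann_whitney_u(negative, positive)
print(u_positive, u_negative)
```

The two U-values always add up to the product of the two sample sizes, which gives a quick consistency check.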
Test for paired samples
Stats available in the Statistics tab for this subclass are:
- Sample size, stat category which includes:
Number of total valid pairs
- Student t-test, stat category which includes:
Student t-value (default)
P-value for Student t-test (default)
- Wilcoxon test, stat category which includes:
W-value for Wilcoxon test (default)
W-value for normalized Wilcoxon test
P-value for Wilcoxon test (default)
Number of unequal pairs
You can check/uncheck single statistics to add/remove them from your computed list. You can also check/uncheck the whole stat category to add/remove all its entries from your evaluated list.
For this subclass, no computation options are present in the Options tab.
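For paired samples, the statistics are computed on the differences within each pair rather than on the two columns separately. The Python sketch below shows the textbook paired Student t-value and a count of unequal pairs, assuming that a pair counts as unequal when its difference is non-zero. The function names and data are hypothetical.

```python
import math
import statistics


def paired_t_value(before, after):
    """Student t-value for paired samples: mean difference / standard error."""
    diffs = [a - b for a, b in zip(after, before)]
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_error


def unequal_pairs(before, after):
    """Number of pairs whose two values differ."""
    return sum(1 for a, b in zip(after, before) if a != b)


before = [10, 12, 9, 11, 8]
after = [12, 14, 10, 14, 8]

t_value = paired_t_value(before, after)
n_unequal = unequal_pairs(before, after)
print(t_value, n_unequal)
```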