The Attribute List
Located on the left side of the Data Manager task, the Attribute list contains the whole list of features of your dataset. It is meant to be the primary location for the metadata information about your underlying data.
In Rulex Platform, we refer to the actual columns of the dataset as attributes or features. Features are further classified as Attribute or Result according to their origin:
if they are provided by the user through import tasks, or added through explicit column addition in the Data Manager, they are called Attributes
if they are generated by an automatic routine of a Machine Learning task, they are called Results.
As a matter of convenience, Results are always placed after the Attributes in the table visualization.
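The ordering rule above can be sketched in a few lines of Python. This is only an illustration: the `Feature` class, its `origin` field and the `display_order` function are hypothetical names, not part of Rulex Platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    origin: str  # hypothetical field: "attribute" (user-provided) or "result" (ML task)


def display_order(features):
    """Return the features with every Attribute placed before every Result.

    sorted() is stable, so the relative order inside each group is preserved.
    """
    return sorted(features, key=lambda f: f.origin == "result")
```

With features `age` (attribute), `pred` (result) and `income` (attribute), in that order, the sketch returns `age`, `income`, `pred`.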
Feature properties
Each feature has several properties that completely define it.
Property |
Graphical visualization |
Comments or Details |
---|---|---|
Name |
Name is written inside the Attribute chip in the text contrast color. |
In Rulex Studio, to permit multi-table attribute lists, the Table name is prepended to the actual feature name, keeping it separated thanks to the |
Position |
Index is written on the left side of the Attribute chip. |
|
Type |
Type is given by the color of the Attribute chip and by the Type avatar letter on the left side of the Attribute chip. |
A complete list of the available types is given in this table. |
Role |
Role is only shown in the Edit attribute panel opened by the Edit entry in the attribute context menu or in the Attribute tab. |
A complete list of the available roles is given in one of the following summary tables. |
Ignored |
An ignored attribute shows its Attribute chip as greyed out. |
In the spreadsheet, an ignored column has a gray background by default. |
Label |
Label setting is only shown in the Edit attribute panel opened by the Edit entry in the attribute context menu or in the Attribute tab. |
If set, it marks the attribute as a label attribute. Some ML algorithms (such as label clustering) use this setting to identify the columns to be used as label constructors. |
Normalization |
Normalization is only shown in the Edit attribute panel opened by the Edit entry in the attribute context menu or in the Attribute tab. |
It lets you override the type of Normalization to be used when required in pre-processing or in classification, regression and clustering tasks. |
Distance |
Distance is only shown in the Edit attribute panel opened by the Edit entry in the attribute context menu or in the Attribute tab. |
It lets you override the Distance algorithm to be used when required in pre-processing or in classification, regression and clustering tasks. |
Maximum |
Maximum is only shown in the Edit attribute panel opened by the Edit entry in the attribute context menu or in the Attribute tab. |
It lets you fix the Maximum of the feature to be used when required in pre-processing or in classification, regression and clustering tasks. The value is then not evaluated from the data. |
Minimum |
Minimum is only shown in the Edit attribute panel opened by the Edit entry in the attribute context menu or in the Attribute tab. |
It lets you fix the Minimum of the feature to be used when required in pre-processing or in classification, regression and clustering tasks. The value is then not evaluated from the data. |
Mean |
Mean is only shown in the Edit attribute panel opened by the Edit entry in the attribute context menu or in the Attribute tab. |
It lets you fix the Mean of the feature to be used when required in pre-processing or in classification, regression and clustering tasks. The value is then not evaluated from the data. |
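To make the effect of a fixed Minimum or Maximum concrete, here is a minimal Python sketch. It assumes plain min-max scaling as the normalization, which may not be what Rulex Platform applies internally, and the names `FeatureOverrides` and `min_max_normalize` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class FeatureOverrides:
    # Hypothetical container: None means "evaluate the value from the data".
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def min_max_normalize(values: Sequence[float], overrides: FeatureOverrides) -> List[float]:
    """Scale values with (v - min) / (max - min), preferring fixed bounds when set."""
    lo = overrides.minimum if overrides.minimum is not None else min(values)
    hi = overrides.maximum if overrides.maximum is not None else max(values)
    span = hi - lo
    if span == 0:
        # Degenerate range: every value maps to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]
```

For the values 2, 4 and 6, fixing the maximum at 10 gives 0.0, 0.25 and 0.5, whereas evaluating both bounds from the data gives 0.0, 0.5 and 1.0.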
One of the most important characteristics of a Rulex Platform feature is its type. The list of the 14 built-in attribute types is contained in the next table:
Type |
Description |
Color |
Avatar Letter |
---|---|---|---|
Nominal |
Type describing general string, text or categorical information. |
N |
|
Binary |
Type describing boolean information: only two distinct values are permitted; in logical operations the first is treated as false and the second as true. |
B |
|
Integer |
Type describing integer numbers with sign. |
I |
|
Continuous |
Type describing real double-precision numbers. |
C |
|
Percentage |
Type describing a percentage. Although shown as a percentage, the value is saved in the dataset as a standard double-precision number. The type handles the percentage conversion automatically. |
% |
|
Currency |
Type describing a currency amount. The currency units currently available are Dollar (default) and Euro. Changing the currency unit does not convert the stored value. |
$ |
|
Date |
Type describing a full date in the following default format (YYYY-MM-DD). |
D |
|
Week |
Type describing the year and the week in the following format (YYYY-Www). |
W |
|
Month |
Type describing the year and the month in the following format (YYYY-MM). |
M |
|
Quarter |
Type describing the year and the quarter in the following format (YYYY-Qq). |
Q |
|
Datetime |
Type describing the full datetime in the following format (YYYY-MM-DD HH:mm:ss.000) at millisecond precision. |
DT |
|
Time |
Type describing the elapsed time (in a non-cyclic form) in the following format ((H)HH:mm:ss.000000) at microsecond precision. |
T |
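The date and time formats listed in the table can be reproduced with Python's standard `datetime` module. The sketch below is illustrative only: the function names are hypothetical, the week is assumed to follow ISO 8601 week numbering, and elapsed times are assumed to be non-negative with hours padded to at least two digits.

```python
from datetime import date, datetime, timedelta


def format_date(d: date) -> str:
    return d.strftime("%Y-%m-%d")                 # YYYY-MM-DD


def format_week(d: date) -> str:
    iso_year, iso_week, _ = d.isocalendar()       # assumption: ISO 8601 weeks
    return f"{iso_year}-W{iso_week:02d}"          # YYYY-Www


def format_month(d: date) -> str:
    return d.strftime("%Y-%m")                    # YYYY-MM


def format_quarter(d: date) -> str:
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"  # YYYY-Qq


def format_datetime(dt: datetime) -> str:
    # Truncate microseconds to milliseconds: YYYY-MM-DD HH:mm:ss.000
    return dt.strftime("%Y-%m-%d %H:%M:%S") + f".{dt.microsecond // 1000:03d}"


def format_elapsed(td: timedelta) -> str:
    # Non-cyclic: hours keep growing past 24 instead of wrapping around.
    total_us = td // timedelta(microseconds=1)
    total_seconds, micro = divmod(total_us, 1_000_000)
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{micro:06d}"
```

For 15 March 2024 these helpers give `2024-03-15`, `2024-W11`, `2024-03` and `2024-Q1`; an elapsed time of 30 hours, 5 minutes, 7 seconds and 42 microseconds is rendered as `30:05:07.000042`.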
Another important characteristic for a Rulex Platform feature is its role. The list of the built-in attribute roles is contained in the next table:
Role |
Description |
Comments |
---|---|---|
Input |
Column used as a standard input for any machine learning algorithm. |
This is the default role for any imported or new column. |
Output |
Column used as output in any subsequent machine learning tasks. |
Column is indicated by default with a yellow background in the spreadsheet visualization. Moreover, if a Prediction role column with Name |
Prediction |
Column containing the expected prediction for a particular output. This prediction was obtained by applying some Machine Learning model on the provided dataset. Default name is |
A conditional formatting highlight rule is automatically applied to the column: the cell background is red when the prediction differs from the output and green when the prediction equals the output. |
Confidence |
Column containing the confidence level for a particular prediction of a considered output. This confidence was obtained by applying some Machine Learning model on the provided dataset. Default name is |
Column is indicated by default with an orange background in the spreadsheet visualization. |
Rule |
Column containing the most important rule index for a particular prediction of a considered output. This rule index was obtained by applying some Machine Learning model on the provided dataset. Default name is |
Column is indicated by default with a purple background in the spreadsheet visualization. |
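The default background colors described in the table can be summarized with a small Python sketch. The function and dictionary names are hypothetical, and the handling of missing values is not covered because the table does not describe it.

```python
# Default backgrounds stated in the table for role-specific columns.
ROLE_BACKGROUND = {"Output": "yellow", "Confidence": "orange", "Rule": "purple"}


def prediction_background(output_value, predicted_value) -> str:
    """Highlight rule for Prediction columns: green on a match, red otherwise."""
    return "green" if predicted_value == output_value else "red"


def cell_background(role, cell_value=None, output_value=None):
    """Return the default background for a cell, or None when the role has none."""
    if role == "Prediction":
        return prediction_background(output_value, cell_value)
    return ROLE_BACKGROUND.get(role)
```

For example, a Prediction cell holding `yes` next to an output of `no` is colored red, while an Input cell gets no role-specific background.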
Selecting main data columns
From the Attribute list you can select which columns you want to show in the main spreadsheet of the Data tab. This operation is performed by clicking on the checkboxes located next to each feature. Further checkboxes located next to the Attributes and Results labels allow the user to check/uncheck a whole category.
Procedure
Click on a single checkbox to toggle its state: if checked, the column is shown in the main panel; if unchecked, the column disappears from the dataset visualization.
To check/uncheck a set of features in a single bulk operation, select them with Ctrl/Shift click and then use the Check submenu of the context menu (see more information here).
Clicking on the Attributes or Results checkbox unchecks the whole category if it was checked, or checks it if it was not.
Whenever some columns inside the Attributes or Results category are shown and others are not, the checkbox next to the corresponding category appears in the indeterminate state. It becomes checked if clicked again.
Note
The selection operation performed here is purely a visualization operation: none of these modifications is going to be stored in the manager history or recorded in any undo/redo queue.
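The tri-state behavior of the category checkboxes can be sketched as follows. The function names are hypothetical, and treating an empty category as unchecked is an assumption.

```python
def category_state(checked_flags):
    """Derive the category checkbox state from its columns' checkboxes."""
    flags = list(checked_flags)
    if flags and all(flags):
        return "checked"
    if not any(flags):
        return "unchecked"  # assumption: an empty category also counts as unchecked
    return "indeterminate"


def click_category(checked_flags):
    """Return the columns' new states after a click on the category checkbox.

    A checked category becomes fully unchecked; an unchecked or
    indeterminate category becomes fully checked.
    """
    flags = list(checked_flags)
    target = category_state(flags) != "checked"
    return [target for _ in flags]
```

A category with one shown and one hidden column is indeterminate, and clicking its checkbox shows both columns.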
Dragging and dropping attributes
Selected attributes are then available for drag-and-drop operations, which are the core operations of the whole Data Manager user interface.
By dragging and dropping attributes or results into different places of the Data Manager interface you can:
Control plot inputs in the Plot tab.
Evaluate univariate and bivariate statistics through the Statistic manager in the Sheet tab.
More information about these interactions is available at the provided links.
By dragging and dropping attributes or results within the Attribute list itself you can perform a Move operation, which moves all the selected columns after the dropping position. If the dropping position is the Attributes or Results label, they are moved to the beginning of the corresponding category.
Procedure
Select a list of attributes from the Attribute list by clicking on them while holding the Ctrl key.
Drag and drop this selection onto the unselected attribute at the bottom of your list.
The list of considered columns will be moved to the bottom of your list.
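The Move operation used in this procedure can be modeled with a short Python sketch. `move_after` is a hypothetical helper; it assumes the moved columns keep their original relative order, and it does not cover dropping on the Attributes or Results label.

```python
def move_after(columns, selected, target):
    """Return a new column order with the selected columns placed after target."""
    chosen = set(selected)
    if target in chosen:
        raise ValueError("the drop target must not be one of the selected columns")
    # Assumption: moved columns keep the order they had in the original list.
    moved = [c for c in columns if c in chosen]
    remaining = [c for c in columns if c not in chosen]
    insert_at = remaining.index(target) + 1
    return remaining[:insert_at] + moved + remaining[insert_at:]
```

Starting from columns `a, b, c, d, e`, selecting `a` and `c` and dropping them on `e` yields `b, d, e, a, c`.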
Centering an attribute
As the number of columns grows, it becomes unlikely that all of them fit in the viewport of the main spreadsheet pane in the Data tab, so you have to use the horizontal scrollbar to bring a column into view. This can be far from trivial when the total number of columns is large compared with the number of columns actually visible.
For this reason it is possible at any time to center your view on a particular attribute or result by clicking on its chip in the Attribute List. This operation will immediately move your viewport in the Data tab to be centered with respect to the selected chip.
Procedure
Scroll the Attribute List until the attribute you are looking for is visible. You can also use the Magnifier icon to filter the Attribute List.
Click on the colored chip of the selected attribute.
The viewport is now centered on the selected attribute, which is visible at the center of the main spreadsheet.
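As a rough illustration of what centering involves, the sketch below computes the horizontal scroll offset that places a column in the middle of the viewport. It is not the Rulex implementation: the function name, the pixel-based column widths and the clamping at the sheet edges are all assumptions.

```python
def centered_scroll_offset(column_widths, target_index, viewport_width):
    """Return the horizontal scroll offset that centers the target column."""
    left_edge = sum(column_widths[:target_index])
    column_center = left_edge + column_widths[target_index] / 2
    # Assumption: the view cannot scroll past either end of the sheet.
    max_offset = max(0, sum(column_widths) - viewport_width)
    return min(max(column_center - viewport_width / 2, 0), max_offset)
```

With ten columns of width 100 and a viewport of width 300, centering the sixth column (index 5) gives an offset of 400, while the first and last columns are clamped to offsets 0 and 700.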