The Attribute List
Located on the left side of the Data Manager task, the Attribute List shows every feature of your dataset. It is the primary location for metadata about your underlying data.
In Rulex Platform, the actual columns of the dataset are referred to as attributes or features. Features are further classified as Attribute or Result according to their origin:
if they are provided by the user through import tasks, or added by explicitly inserting columns in a Data Manager, they are called Attribute;
if they originate from an automatic routine of a machine learning task, they are called Result.
Results are always placed after the Attributes in the table display.
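The ordering rule above can be illustrated with a minimal Python sketch. It is not Rulex code: the `Feature` class, its `origin` field and the `display_order` helper are hypothetical names introduced only for illustration, and the sketch assumes that the relative order inside each category is preserved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical stand-in for a dataset column (not a Rulex API)."""
    name: str
    origin: str  # "attribute" (user-provided) or "result" (produced by an ML task)


def display_order(features):
    """Return features with every Result placed after every Attribute.

    sorted() is stable, so the relative order inside each category is kept.
    """
    return sorted(features, key=lambda f: f.origin == "result")


features = [
    Feature("pred(target)", "result"),
    Feature("age", "attribute"),
    Feature("target", "attribute"),
]
print([f.name for f in display_order(features)])
```

The column names used here are placeholders, not names generated by the platform.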
Feature properties
A single feature has several properties that completely define it. The following table illustrates the properties of a feature in a Data Manager task:
| Property | Graphical visualization | Comments or Details |
| --- | --- | --- |
| Name | The name is written inside the Attribute chip in the text contrast color. | In Rulex Studio, to allow attributes to be listed in more than one table, the Table name is preceded by the actual feature name, keeping it separated by using the |
| Position | The index is written on the left side of the Attribute chip. | There are no comments or further details for this property. |
| Type | The type property is given by the color of the Attribute chip and by the Type avatar letter on the left side of the Attribute chip. | A complete list of all the available types is given in the table below. |
| Role | The role can only be shown in the Edit attribute panel, by clicking on the Edit entry in the attribute context menu or by clicking on the double arrow button at the left of the Data tab and switching to the Attributes tab. | A complete list of all the available roles is given in one of the following summary tables. |
| Ignored | An ignored attribute has its Attribute chip greyed out. | On the spreadsheet, an ignored column has a gray background color by default. |
| Label | The Label setting can only be shown in the Edit attribute panel, by clicking on the Edit entry in the attribute context menu or by clicking on the double arrow button at the left of the Data tab and switching to the Attribute tab. | If this property is checked, the attribute is defined as a label attribute. This labelling is used in some ML algorithms (e.g. the Label Clustering task) to identify the columns to be used as label constructors. |
| Normalization | Normalization can only be shown in the Edit attribute panel, by clicking on the Edit entry in the attribute context menu or by clicking on the double arrow button at the left of the Data tab and switching to the Attribute tab. | It allows the Normalization type to be overridden when required in pre-processing or classification, regression and clustering tasks. |
| Distance | The distance property can only be shown in the Edit attribute panel, by clicking on the Edit entry in the attribute context menu or by clicking on the double arrow button at the left of the Data tab and switching to the Attribute tab. | It allows the Distance algorithm to be overridden when required in pre-processing or classification, regression and clustering tasks. |
| Maximum | The maximum can only be shown in the Edit attribute panel, by clicking on the Edit entry in the attribute context menu or by clicking on the double arrow button at the left of the Data tab and switching to the Attribute tab. | It allows the Maximum of the feature to be fixed when required in pre-processing or classification, regression and clustering tasks. It will then not be evaluated from the data. |
| Minimum | The minimum can only be shown in the Edit attribute panel, by clicking on the Edit entry in the attribute context menu or by clicking on the double arrow button at the left of the Data tab and switching to the Attribute tab. | It allows the Minimum of the feature to be fixed when required in pre-processing or classification, regression and clustering tasks. It will then not be evaluated from the data. |
| Mean | The mean property can only be shown in the Edit attribute panel, by clicking on the Edit entry in the attribute context menu or by clicking on the double arrow button at the left of the Data tab and switching to the Attribute tab. | It allows the Mean of the feature to be fixed when required in pre-processing or classification, regression and clustering tasks. It will then not be evaluated from the data. |

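
The Maximum, Minimum and Mean properties described above share one idea: a value fixed by the user replaces the one that would otherwise be evaluated from the data. The following Python sketch illustrates that idea only. The `FixedStats` class and the `effective_stats` function are hypothetical names, not part of Rulex Platform, and the sketch assumes a non-empty numeric column.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FixedStats:
    """Hypothetical container: None means "evaluate from the data"."""
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    mean: Optional[float] = None


def effective_stats(values: Sequence[float], fixed: FixedStats) -> FixedStats:
    """Use each fixed value when present, otherwise compute it from the column."""
    return FixedStats(
        minimum=fixed.minimum if fixed.minimum is not None else min(values),
        maximum=fixed.maximum if fixed.maximum is not None else max(values),
        mean=fixed.mean if fixed.mean is not None else sum(values) / len(values),
    )


column = [2.0, 4.0, 9.0]
# Only the maximum is fixed; minimum and mean still come from the data.
print(effective_stats(column, FixedStats(maximum=10.0)))
```

In this example the maximum is reported as 10.0 even though the largest observed value is 9.0, which mirrors the statement that a fixed value is not evaluated from the data.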
One of the most important characteristics of a Rulex Platform feature is its type. The list of the 14 built-in attribute types is covered in the next table:
| Type | Description | Color | Avatar Letter |
| --- | --- | --- | --- |
| Nominal | Type describing a general string, text or categorical information. | | N |
| Binary | Type describing a boolean expression: only two distinct values are allowed. When logical operators are used, the value false is assigned to the first and the value true to the second. | | B |
| Integer | Type describing signed integer numbers. | | I |
| Continuous | Type describing real double-precision numbers. | | C |
| Percentage | Type describing a percentage. Even if shown as a percentage, the value is saved in the dataset as a standard double-precision number. The type handles the conversion to a percentage automatically. | | % |
| Currency | Type describing a currency amount. The currency units available so far are Dollar (default) and Euro. Changing the currency unit does not imply any conversion of the stored value. | | $ |
| Date | Type describing a full date in the following default format (YYYY-MM-DD). | | D |
| Week | Type describing the year and the week in the following format (YYYY-Www). | | W |
| Month | Type describing the year and the month in the following format (YYYY-MM). | | M |
| Quarter | Type describing the year and the quarter in the following format (YYYY-Qq). | | Q |
| Datetime | Type describing the full datetime in the following format (YYYY-MM-DD HH:mm:ss.000) with millisecond accuracy. | | DT |
| Time | Type describing the elapsed time (in non-cyclic form) in the following format ((H)HH:mm:ss.000000) with microsecond accuracy. | | T |

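
To make the display formats of the temporal types more concrete, here is a small Python sketch that renders standard-library values in the formats listed in the table. It is an illustration, not Rulex code: all function names are hypothetical, the week is assumed to follow ISO 8601 week numbering (the table does not say which convention applies), and negative elapsed times are not handled.

```python
from datetime import date, datetime, timedelta


def format_date(d: date) -> str:
    return d.strftime("%Y-%m-%d")                      # YYYY-MM-DD


def format_week(d: date) -> str:
    iso_year, iso_week, _ = d.isocalendar()            # assumption: ISO 8601 weeks
    return f"{iso_year}-W{iso_week:02d}"               # YYYY-Www


def format_month(d: date) -> str:
    return d.strftime("%Y-%m")                         # YYYY-MM


def format_quarter(d: date) -> str:
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"       # YYYY-Qq


def format_datetime(dt: datetime) -> str:
    millis = dt.microsecond // 1000                    # truncate to milliseconds
    return dt.strftime("%Y-%m-%d %H:%M:%S.") + f"{millis:03d}"


def format_time(elapsed: timedelta) -> str:
    """Elapsed time in non-cyclic form: hours may exceed 23."""
    total_us = elapsed // timedelta(microseconds=1)
    total_seconds, micros = divmod(total_us, 1_000_000)
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{micros:06d}"


day = date(2024, 3, 15)
print(format_date(day), format_week(day), format_month(day), format_quarter(day))
print(format_datetime(datetime(2024, 3, 15, 9, 5, 7, 123456)))
print(format_time(timedelta(hours=30, minutes=5, seconds=7, microseconds=42)))
```

The last call shows what "non-cyclic" means in practice: 30 hours are rendered as `30:05:07.000042` rather than wrapping around after 24 hours.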
Another important characteristic of a Rulex Platform feature is its role. The built-in attribute roles are described in the following table:
| Role | Description | Comments |
| --- | --- | --- |
| Input | Column used as a standard input for any machine learning algorithm. | This is the default role for any imported or new column. |
| Output | Column used as output in any subsequent machine learning tasks. | The output role column is displayed by default with a yellow background in the spreadsheet main panel and if there is also a Prediction role column with the name: |
| Prediction | Column containing the expected prediction for a particular output. This prediction was extracted by applying some machine learning model to the provided dataset. Its default name is | A conditional formatting highlighting rule is automatically applied to the column, coloring the cell background red for each prediction not equal to the output and green for each prediction equal to the output. |
| Confidence | Column containing the confidence level for a particular prediction of a considered output. This confidence was extracted by applying some machine learning model to the provided dataset. Its default name is | This confidence column is displayed by default with an orange background in the spreadsheet main panel. |
| Rule | Column containing the index of the most important rule for a particular prediction of a considered output. This rule index was extracted by applying some machine learning model to the provided dataset. Its default name is | The column is displayed by default with a purple background in the spreadsheet main panel. |

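
The highlighting rule described for the Prediction role can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the table (green when the prediction equals the output, red otherwise); the function name `prediction_cell_color` and the sample values are hypothetical, and the real comparison performed by the platform may differ in details such as missing values.

```python
def prediction_cell_color(output, prediction):
    """Return the background color for one Prediction cell."""
    return "green" if prediction == output else "red"


# Hypothetical output and prediction columns, row by row.
outputs = ["yes", "no", "yes"]
predictions = ["yes", "yes", "yes"]

colors = [prediction_cell_color(o, p) for o, p in zip(outputs, predictions)]
print(colors)
```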
Selecting main data columns
From the Attribute List you can select which columns you want to show in the Data tab of the main spreadsheet. To do so, use the checkboxes located at the far left of each feature. Additional checkboxes are located next to the Attributes and Results labels, allowing you to select or clear each of these categories as a whole.
Follow the guidelines below if you want to effectively manage the display of columns in the main pane of the Data tab:
See also
Click on a single checkbox to change the column display in the main panel: if the checkbox of a feature is selected, the column is shown in the main panel; if it is cleared, the column disappears from the dataset visualization.
To check or uncheck a set of features in a single bulk operation, select them using CTRL/SHIFT+click, right-click on them and then choose the entry you need from the Check context submenu (more information on this Check entry can be found here).
Clicking on the Attributes or Results checkbox toggles the whole category: all its features are unchecked if the checkbox was checked, and checked if it was not.
When some columns inside the Attributes or Results category are shown and others are not, the checkbox next to the corresponding category appears in an undefined state. It will become selected if you click on it again.
Note
The selection operation performed here is purely a visualization operation: none of these changes will be stored in the manager’s history or recorded in an undo/redo queue.
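The behavior of the category checkboxes described above amounts to a three-state toggle, sketched here in Python. The sketch is not Rulex code: the function names and the state labels `"checked"`, `"unchecked"` and `"undefined"` are hypothetical, and each feature is represented only by a boolean telling whether its column is shown.

```python
def category_state(shown):
    """Summarize a category from the per-feature checkbox values."""
    if all(shown):
        return "checked"
    if not any(shown):
        return "unchecked"
    return "undefined"  # some columns shown, others hidden


def click_category(shown):
    """Clicking the category checkbox: a checked category is cleared,
    while an unchecked or undefined one becomes fully selected."""
    select_all = category_state(shown) != "checked"
    return [select_all] * len(shown)


attributes = [True, False, True]
print(category_state(attributes))        # mixed selection
attributes = click_category(attributes)  # becomes fully selected
attributes = click_category(attributes)  # becomes fully cleared
print(category_state(attributes))
```

Consistent with the note above, this state is purely a display concern, so the sketch keeps no history of the changes.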
Dragging and dropping attributes
Selected attributes are then available for drag-and-drop operations, which are at the core of the entire Data Manager user interface.
By dragging and dropping attributes or results to different places of the Data Manager interface, you can:
Control plot inputs in the Plot tab.
Evaluate univariate and bivariate statistics through the Statistic manager in the Sheet tab.
More information on these interactions can be found in the links above.
When you drag and drop attributes or results within the Attribute List itself, you perform a Move operation, which places all the selected columns after the release position. If the drop position is the Attributes or Results label, the selected columns are moved to the beginning of the respective category.
To move a set of attributes to the bottom of the Attribute List, follow the steps listed below:
Procedure
Select a list of attributes from the Attribute List by using CTRL/SHIFT+click.
Drag and drop this list over the unselected attributes at the bottom of the Attribute List.
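The effect of the Move operation on the column order can be sketched as a list manipulation in Python. This is a simplified model under stated assumptions, not the platform's implementation: `move_after` is a hypothetical helper, a single category is modeled as a plain list of names, the moved columns are assumed to keep their relative order, and `target=None` stands for a drop on the category label.

```python
def move_after(order, selected, target=None):
    """Move the selected names right after `target`.

    With target=None (drop on the category label) the selection goes
    to the beginning of the list.
    """
    chosen = set(selected)
    if target in chosen:
        raise ValueError("the drop target must be an unselected attribute")
    moved = [name for name in order if name in chosen]          # keep relative order
    remaining = [name for name in order if name not in chosen]
    if target is None:
        return moved + remaining
    cut = remaining.index(target) + 1
    return remaining[:cut] + moved + remaining[cut:]


order = ["age", "income", "city", "score", "zip"]
# Dropping the selection over the last unselected attribute moves it to the bottom.
print(move_after(order, ["income", "age"], target="zip"))
# Dropping on the category label moves the selection to the top.
print(move_after(order, ["score"]))
```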
Centering an attribute
As the number of columns in the Data tab increases, they can no longer all fit in the viewport of the main spreadsheet pane, and the horizontal scroll bar must be used to bring a column into view. This can be far from trivial when the total number of columns is much larger than the number of columns actually displayed.
For this reason, you can center your view on a particular attribute or result at any time by clicking on its chip in the Attribute List. The viewport in the Data tab is instantly centered on the column of the selected chip.
To center an attribute in the main Data pane of your Data Manager, follow these step-by-step instructions:
Procedure
Scroll through the Attribute List to find the attribute you are looking for. You can also use the magnifying glass icon to filter the Attribute List.
Click on the colored chip of the chosen attribute.
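Conceptually, centering a column means choosing a horizontal scroll offset that puts the middle of that column in the middle of the viewport. The Python sketch below shows one way to compute such an offset. It is a generic illustration, not a description of how Rulex Platform does it: the function name, the pixel widths and the clamping to the scrollable range are all assumptions.

```python
def centered_scroll_offset(column_widths, index, viewport_width):
    """Horizontal offset that centers column `index` in the viewport.

    The result is clamped so the view never scrolls past either edge.
    """
    left_edge = sum(column_widths[:index])
    column_center = left_edge + column_widths[index] / 2
    max_offset = max(0, sum(column_widths) - viewport_width)
    return min(max(0, column_center - viewport_width / 2), max_offset)


widths = [100] * 10  # ten hypothetical columns, 100 px each
print(centered_scroll_offset(widths, 5, viewport_width=300))
```

Columns near the first or last position cannot be exactly centered, which is why the offset is clamped to the valid scroll range.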