The Attribute List#

Located on the left side of the Data Manager task, the Attribute List contains the full list of features in your dataset. It is meant to be the primary location for metadata about your underlying data.

In Rulex Platform, the actual columns of the dataset are referred to as attributes or features. Features are further classified as Attribute or Result according to their origin:

  • if they are provided by the user through import tasks or added by explicitly inserting columns in a Data Manager, they are called Attribute

  • if they originate from an automatic routine of a machine learning task, they are called Result.

Results are always placed after the Attributes in the table display.


Feature properties#

A single feature has several properties that completely define it. The following table illustrates the properties of a feature in a Data Manager task:

Property

Graphical visualization

Comments or Details

Name

The name is written inside the Attribute chip in the text contrast color.

In Rulex Studio, to allow attributes to be listed in more than one table, the Table name is preceded by the actual feature name, with the ! character used as a separator. To see this graphically, click repeatedly on the Type avatar letter on the Attribute chip.

Position

The index is written on the left side of the Attribute chip.

There are no comments or further details for this property.

Type

The type property is indicated by the color of the Attribute chip and by the Type avatar letter on the left side of the Attribute chip.

A complete list of all the available types is given in the table below.

Role

The role can only be shown in the Edit attribute panel, by clicking on the Edit entry in the attribute context menu or by clicking on the double arrow button at the left of the Data tab and switching to the Attributes tab.

A complete list of all the available roles is given in one of the following summary tables.

Ignored

An ignored attribute has its Attribute chip greyed out.

On the spreadsheet an ignored column shows by default a gray background color.

Label

Label setting can only be shown in the Edit attribute panel by clicking on the Edit entry in the attribute context menu or by clicking on the double arrow button at the left of the Data tab and switching to the Attribute tab.

If this property is checked, the attribute will be defined as a label attribute. This labelling is used in some ML algorithms (e.g. Label Clustering task) to identify the columns to be used as label constructors.

Normalization

Normalization can only be shown in the Edit attribute panel by clicking on the Edit entry in the attribute context menu or by clicking on the double arrow button at the left of the Data tab and switching to the Attribute tab.

It allows the Normalization type to be overridden when required in pre-processing or classification, regression and clustering tasks.

Distance

The distance property can only be shown in the Edit attribute panel by clicking on the Edit entry in the attribute context menu or by clicking on the double arrow button at the left of the Data tab and switching to the Attribute tab.

It allows the Distance algorithm to be overridden when required in pre-processing or classification, regression and clustering tasks.

Maximum

The maximum can only be shown in the Edit attribute panel by clicking on the Edit entry in the attribute context menu or by clicking on the double arrow button at the left of the Data tab and switching to the Attribute tab.

It allows the Maximum of the feature to be fixed when required in pre-processing or classification, regression and clustering tasks. It will then not be evaluated from the data.

Minimum

The minimum can only be shown in the Edit attribute panel by clicking on the Edit entry in the attribute context menu or by clicking on the double arrow button at the left of the Data tab and switching to the Attribute tab.

It allows the Minimum of the feature to be fixed when required in pre-processing or classification, regression and clustering tasks. It will then not be evaluated from the data.

Mean

The mean property can only be shown in the Edit attribute panel by clicking on the Edit entry in the attribute context menu or by clicking on the double arrow button at the left of the Data tab and switching to the Attribute tab.

It allows the Mean of the feature to be fixed when required in pre-processing or classification, regression and clustering tasks. It will then not be evaluated from the data.
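
To illustrate what fixing a statistic means in practice, the following Python sketch applies a min-max scaling in which the minimum and maximum are evaluated from the data unless a fixed value is supplied. The function `min_max_scale` and its parameters are hypothetical names chosen for this example; they are not part of Rulex Platform, and the normalization actually performed by Rulex tasks may differ.

```python
from typing import Optional, Sequence


def min_max_scale(
    values: Sequence[float],
    fixed_min: Optional[float] = None,
    fixed_max: Optional[float] = None,
) -> list[float]:
    """Min-max scale values, using fixed bounds when given (hypothetical helper)."""
    # A fixed bound replaces the one that would be evaluated from the data.
    lo = min(values) if fixed_min is None else fixed_min
    hi = max(values) if fixed_max is None else fixed_max
    if hi == lo:
        # Degenerate range: avoid a division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [20.0, 30.0, 40.0]
from_data = min_max_scale(ages)  # bounds evaluated from the data
with_fixed = min_max_scale(ages, fixed_min=0.0, fixed_max=100.0)  # bounds fixed by the user
```

With fixed bounds the scaled values no longer depend on which rows happen to be present in the dataset, which is the practical reason for overriding a statistic.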

One of the most important characteristics of a Rulex Platform feature is its type. The list of the 14 built-in attribute types is covered in the next table:

Type

Description

Color

Avatar Letter

Nominal

Type describing a general string, text or categorical information.

N

Binary

Type describing a boolean expression: only two distinct values are allowed. When logical operators are used, the value false is assigned to the first value and the value true to the second.

B

Integer

Type describing integer numbers with signs.

I

Continuous

Type describing real double-precision numbers.

C

Percentage

Type describing a percentage. Even if shown as a percentage, the value is saved in the dataset as a standard double-precision number. The type automatically handles the conversion to a percentage.

%

Currency

Type describing a currency amount. The currency units available so far are Dollar (default) and Euro. Changing the currency unit does not imply any conversion of the stored value.

$

Date

Type describing a full date in the following default format (YYYY-MM-DD).

D

Week

Type describing the year and the week in the following format (YYYY-Www).

W

Month

Type describing the year and the month in the following format (YYYY-MM).

M

Quarter

Type describing the year and the quarter in the following format (YYYY-Qq).

Q

Datetime

Type describing the full datetime in the following format (YYYY-MM-DD HH:mm:ss.000) with millisecond accuracy.

DT

Time

Type describing the elapsed time (in a non-cyclic form) in the following format ((H)HH:mm:ss.000000) with microsecond accuracy.

T
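
The display formats listed in the table can be reproduced outside Rulex Platform with a short Python sketch. This is only an illustration under stated assumptions: the week is computed with ISO 8601 week numbering, which the table does not confirm, and the helper `format_elapsed` is a hypothetical name that handles non-negative durations only.

```python
from datetime import datetime, timedelta

moment = datetime(2024, 5, 17, 14, 30, 5, 123000)

# Assumption: weeks follow ISO 8601 numbering.
iso_year, iso_week, _ = moment.isocalendar()
quarter = (moment.month - 1) // 3 + 1

formatted = {
    "Date": moment.strftime("%Y-%m-%d"),
    "Week": f"{iso_year}-W{iso_week:02d}",
    "Month": moment.strftime("%Y-%m"),
    "Quarter": f"{moment.year}-Q{quarter}",
    # Millisecond accuracy: keep only the first three fractional digits.
    "Datetime": moment.strftime("%Y-%m-%d %H:%M:%S.") + f"{moment.microsecond // 1000:03d}",
}


def format_elapsed(delta: timedelta) -> str:
    """Format a non-negative duration as (H)HH:mm:ss.000000 without wrapping at 24 hours."""
    total_us = delta // timedelta(microseconds=1)
    hours, rem = divmod(total_us, 3_600_000_000)
    minutes, rem = divmod(rem, 60_000_000)
    seconds, micros = divmod(rem, 1_000_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{micros:06d}"


elapsed = format_elapsed(timedelta(hours=27, minutes=5, seconds=9, microseconds=250))

# A Percentage value is stored as a plain double and only displayed as a percentage.
stored = 0.256
displayed = f"{stored:.1%}"
```

Note that the elapsed time keeps counting past 24 hours, which is what distinguishes the non-cyclic Time type from the time-of-day part of a Datetime.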

Another important characteristic of a Rulex Platform feature is its role. The list of the built-in attribute roles is explained in the following table:

Role

Description

Comments

Input

Column used as a standard input for any machine learning algorithm.

This is the default role for any imported or new column.

Output

Column used as output in any subsequent machine learning tasks.

The output role column is displayed by default with a yellow background in the spreadsheet main panel. If there is also a Prediction role column named pred(<output name>), the classifier information is shown in the dataset info row.

Prediction

Column containing the expected prediction for a particular output. This prediction was extracted by applying some machine learning model to the provided dataset. Its default name is pred(<output name>).

A conditional formatting highlighting rule is automatically applied to the column, coloring the cell background red for each prediction not equal to the output and green for each prediction equal to the output.

Confidence

Column containing the confidence level for a particular prediction of a considered output. This confidence was extracted by applying some machine learning model to the provided dataset. Its default name is conf(<output name>).

This confidence column is displayed by default with an orange background in the spreadsheet main panel.

Rule

Column containing the index of the most important rule for a particular prediction of a considered output. This rule index was extracted by applying some machine learning model to the provided dataset. Its default name is rule(<output name>).

The column is displayed by default with a purple background in the spreadsheet main panel.
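
The default names of the result columns and the highlighting rule applied to predictions can be summarized with a small Python sketch. The functions `result_column_names` and `prediction_colors` are hypothetical helpers written for this example; they only mirror the naming and coloring conventions described in the table and are not Rulex Platform functions.

```python
def result_column_names(output_name: str) -> dict[str, str]:
    """Return the default column name for each result role of a given output."""
    return {
        "Prediction": f"pred({output_name})",
        "Confidence": f"conf({output_name})",
        "Rule": f"rule({output_name})",
    }


def prediction_colors(outputs: list[str], predictions: list[str]) -> list[str]:
    """Green where the prediction equals the output, red where it does not."""
    return ["green" if p == o else "red" for o, p in zip(outputs, predictions)]


names = result_column_names("Churn")
colors = prediction_colors(["yes", "no", "yes"], ["yes", "yes", "yes"])
```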


Selecting main data columns#

From the Attribute List you can select which columns you want to show in the Data tab of the main spreadsheet. This operation is performed by selecting the checkboxes located at the far left of each feature. Additional checkboxes are located next to the Attributes and Results labels, allowing users to select or clear these entire categories.

Follow the guidelines below if you want to effectively manage the display of columns in the main pane of the Data tab:

  • Click on a single checkbox to change the column display in the main panel: if the checkbox of a feature is selected, the column is shown in the main panel; if it is cleared, the column disappears from the dataset visualization.

  • To check or uncheck a set of features in a single bulk operation, select them using CTRL/SHIFT+click, right-click on them and then choose the entry you need from the Check context submenu (the Check entry is described in the Attribute context menu section).

  • Clicking on the Attributes or Results checkbox toggles the whole category: all its columns are unchecked if the checkbox was selected, and checked if it was cleared.

  • When some columns inside the Attributes or Results category are shown and others are not, the checkbox next to the corresponding category appears in an undefined state. It becomes selected if you click on it again.
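
The behavior of the category checkbox described in the guidelines above can be modeled as a three-state value. The Python sketch below is only an illustration: `CheckState`, `category_state` and `click_category` are hypothetical names, and treating an empty category as unchecked is an assumption made for this example.

```python
from enum import Enum


class CheckState(Enum):
    CHECKED = "checked"
    UNCHECKED = "unchecked"
    UNDEFINED = "undefined"


def category_state(shown: list[bool]) -> CheckState:
    """Derive the category checkbox state from the per-column checkboxes."""
    if not shown or not any(shown):
        # Assumption: an empty category is reported as unchecked.
        return CheckState.UNCHECKED
    if all(shown):
        return CheckState.CHECKED
    return CheckState.UNDEFINED


def click_category(shown: list[bool]) -> list[bool]:
    """Checked -> everything unchecked; unchecked or undefined -> everything checked."""
    target = category_state(shown) is not CheckState.CHECKED
    return [target] * len(shown)


mixed = [True, False, True]
state_before = category_state(mixed)
after_click = click_category(mixed)
after_second_click = click_category(after_click)
```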

Note

The selection operation performed here is purely a visualization operation: none of these changes will be stored in the manager’s history or recorded in an undo/redo queue.


Attribute context menu#

At any time, you can select a set of Attributes, Results or a mix of the two, and then open the corresponding context menu by right-clicking on one of them.

The Attribute context menu contains the following entries:

  • Check: the Check submenu allows users to change the data visualization of the main spreadsheet in a single bulk operation. The submenu consists of:
    • Check entry: all the selected columns will be checked.

    • Uncheck entry: all the selected columns will be unchecked.

    • Invert entry: the status of all the selected columns will be inverted.

  • Delete: all the selected columns will be erased.

  • Edit: opens the Edit attribute dialog box which allows users to change all the feature properties for the selected columns. For more information on the Edit attribute dialog box, see subsection: Editing an attribute.

  • Move: the Move submenu allows users to change the position of all the selected columns. Possible submenu entries are:
    • To the top entry: moves all the selected attributes to the top of the corresponding category (Attributes or Results).

    • To the bottom entry: moves all the selected attributes to the last entry of the corresponding category (Attributes or Results).

    • Advanced entry: opens the Move attribute dialog box which allows users to control the move operation in detail. For more information on this floating panel, see subsection: Moving an attribute.

  • Rename: allows users to change the name of all the selected columns.

  • Ignored: the Ignored submenu allows users to change the Ignored feature property for all the selected columns in a single bulk operation. This submenu has two options:
    • Set entry: ignores all the selected columns.

    • Clear entry: restores the display for all the selected columns.

  • Impute Missing: replaces any missing values in all the selected columns with the value provided through the Set missing values panel, which opens after clicking on the entry.

  • Type: changes the data type of all the selected columns, casting them to the chosen type. The type is selected through the submenu or by clicking on the Advanced submenu entry and using the Set Type panel.

  • Split: opens the Split Attribute dialog box which allows users to control the string split operation applied on all the selected columns. This split operation is only available if all the selected columns have a nominal or binary data type. For additional information on the Split attribute dialog box, see subsection: Splitting a nominal attribute.

Right-clicking on the Attributes or Results label opens a different context menu containing only one entry:

  • Add attribute/result: adds new columns to the end of the selected category. Users provide the names, types and roles of the new columns through the Add Attribute dialog box that opens after this entry is selected.

Note

At the end of any interaction with the Attribute context menu, the selected columns remain selected. To discard a selection, click anywhere outside the Attribute List pane.


Dragging and dropping attributes#

Selected attributes are available for drag-and-drop operations, which are at the core of the entire Data Manager user interface.

By dragging and dropping attributes or results to different places of the Data Manager interface, you can:

  • Run a query on a column using basic SQL operations.

  • Control plot inputs in the Plot tab.

  • Evaluate univariate and bivariate statistics through the Statistic manager in the Sheet tab.

More information on these interactions can be found in the links above.

When dragging and dropping attributes or results within the Attribute List itself, you perform a Move operation, which moves all the selected columns after the release position. If the drop position is the Attributes or Results label, the selected columns are moved to the beginning of the respective category.

To move a set of attributes to the bottom of the Attribute List, follow the steps listed below:

Procedure

  1. Select a list of attributes from the Attribute List by using CTRL/SHIFT+click.

  2. Drag and drop this list over the unselected attributes at the bottom of the Attribute List.
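
The effect of the Move operation on the column order can be sketched in Python as follows. The function `move_after` is a hypothetical helper, not a Rulex Platform function; it assumes that the moved columns keep their original relative order, and it uses `None` as the target to represent a drop on the Attributes or Results label.

```python
from typing import Optional


def move_after(columns: list[str], selected: list[str], target: Optional[str]) -> list[str]:
    """Move the selected columns right after target, or to the beginning if target is None."""
    if target in selected:
        raise ValueError("the drop target cannot be one of the selected columns")
    chosen = set(selected)
    # Assumption: moved columns keep the relative order they had in the list.
    moved = [c for c in columns if c in chosen]
    remaining = [c for c in columns if c not in chosen]
    if target is None:
        return moved + remaining
    idx = remaining.index(target) + 1
    return remaining[:idx] + moved + remaining[idx:]


columns = ["id", "age", "income", "city", "score"]
to_bottom = move_after(columns, ["age", "income"], "score")  # drop on the last attribute
to_top = move_after(columns, ["city"], None)  # drop on the category label
```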


Centering an attribute#

As the number of columns grows, they can rarely all fit in the viewport of the main spreadsheet pane on the Data tab, so the horizontal scroll bar must be used to bring a column into view. This can be far from trivial when the total number of columns is huge compared to the number of columns actually displayed.

For this reason, you can center your view at any time on a particular attribute or result by clicking on its chip in the Attribute List. This instantly moves the viewport in the Data tab so that it is centered on the selected column.

To center an attribute in the main Data pane of your Data Manager, follow these step-by-step instructions:

Procedure

  1. Scroll through the Attribute List to find the attribute you are looking for. You can also use the magnifying glass icon to filter the Attribute List.

  2. Click on the colored chip of the chosen attribute.
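
Conceptually, centering a column amounts to choosing a horizontal scroll offset that places the middle of the column in the middle of the viewport. The Python sketch below shows one way to compute such an offset; the function `centered_scroll_offset`, the pixel widths and the clamping at the edges are assumptions made for illustration and do not describe how Rulex Platform is implemented.

```python
def centered_scroll_offset(column_widths: list[int], index: int, viewport_width: int) -> int:
    """Return the horizontal scroll offset that centers the column at the given index."""
    left_edge = sum(column_widths[:index])
    column_center = left_edge + column_widths[index] / 2
    # The view cannot scroll before the first column or past the last one.
    max_offset = max(0, sum(column_widths) - viewport_width)
    offset = column_center - viewport_width / 2
    return int(min(max(offset, 0), max_offset))


widths = [100] * 20  # twenty columns, 100 pixels each (hypothetical values)
middle = centered_scroll_offset(widths, 10, viewport_width=500)
first = centered_scroll_offset(widths, 0, viewport_width=500)
last = centered_scroll_offset(widths, 19, viewport_width=500)
```

Columns near either end of the dataset cannot be placed exactly in the middle, so the offset is clamped to the scrollable range.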