The Attribute List

Located on the left side of the Data Manager task, the Attribute list contains the whole list of features of your dataset. It is meant to be the primary location for the metadata information about your underlying data.

In Rulex Platform, the columns of the dataset are called attributes or features. Features are further classified as Attribute or Result according to their origin:

  • if they are provided by the user through import tasks, or added explicitly as new columns in the Data Manager, they are called Attribute;

  • if they are generated by an automatic routine of a Machine Learning task, they are called Result.

As a matter of convenience, Results are always placed after the Attributes in the table visualization.
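As an illustration only, the ordering rule can be sketched in Python. The Feature class and its origin labels are hypothetical names chosen for this example and are not part of any Rulex API; the sketch also assumes that the relative order inside each category is preserved.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    origin: str  # hypothetical labels: "attribute" or "result"


def display_order(features):
    """Return all Attributes first, then all Results, keeping relative order."""
    attributes = [f for f in features if f.origin == "attribute"]
    results = [f for f in features if f.origin == "result"]
    return attributes + results


features = [
    Feature("age", "attribute"),
    Feature("pred(income)", "result"),
    Feature("income", "attribute"),
]
ordered_names = [f.name for f in display_order(features)]
```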


Feature properties

Each feature has several properties that completely define it.

| Property | Graphical visualization | Comments or Details |
| --- | --- | --- |
| Name | The name is written inside the Attribute chip in the text contrast color. | In Rulex Studio, to support multi-table attribute lists, the table name is prepended to the feature name, separated from it by the ! separator. To see it graphically, click continuously on the Type avatar letter on the Attribute chip. |
| Position | The index is written on the left side of the Attribute chip. | |
| Type | The type is given by the color of the Attribute chip and by the Type avatar letter on the left side of the Attribute chip. | A complete list of the available types is given in the types table below. |
| Role | The role is shown only in the Edit attribute panel, opened through the Edit entry of the attribute context menu, or in the Attribute tab. | A complete list of the available roles is given in one of the summary tables below. |
| Ignored | An ignored attribute shows its Attribute chip greyed out. | On the spreadsheet, an ignored column has a gray background color by default. |
| Label | The Label setting is shown only in the Edit attribute panel, opened through the Edit entry of the attribute context menu, or in the Attribute tab. | If set, it marks the attribute as a label attribute. This is used by some ML algorithms (such as label clustering) to identify the columns to be used as label constructors. |
| Normalization | Normalization is shown only in the Edit attribute panel, opened through the Edit entry of the attribute context menu, or in the Attribute tab. | It overrides the type of normalization to be used when required in pre-processing or in classification, regression and clustering tasks. |
| Distance | Distance is shown only in the Edit attribute panel, opened through the Edit entry of the attribute context menu, or in the Attribute tab. | It overrides the distance algorithm to be used when required in pre-processing or in classification, regression and clustering tasks. |
| Maximum | Maximum is shown only in the Edit attribute panel, opened through the Edit entry of the attribute context menu, or in the Attribute tab. | It fixes the maximum of the feature to be used when required in pre-processing or in classification, regression and clustering tasks. The value is then not evaluated from the data. |
| Minimum | Minimum is shown only in the Edit attribute panel, opened through the Edit entry of the attribute context menu, or in the Attribute tab. | It fixes the minimum of the feature to be used when required in pre-processing or in classification, regression and clustering tasks. The value is then not evaluated from the data. |
| Mean | Mean is shown only in the Edit attribute panel, opened through the Edit entry of the attribute context menu, or in the Attribute tab. | It fixes the mean of the feature to be used when required in pre-processing or in classification, regression and clustering tasks. The value is then not evaluated from the data. |
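To make two of these properties more concrete, here is a minimal Python sketch. It assumes a simple min-max normalization (the text does not say which normalization Rulex applies) and shows how a fixed Minimum or Maximum would take precedence over the value evaluated from the data. It also shows how a name carrying a table prefix could be split on the ! separator. All class and function names are hypothetical and are not Rulex APIs.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FeatureStats:
    # Fixed values; None means "evaluate from the data" (assumed convention).
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def min_max_normalize(values: Sequence[float], stats: FeatureStats) -> list:
    """Scale values to [0, 1] using fixed bounds when set, data bounds otherwise."""
    lo = stats.minimum if stats.minimum is not None else min(values)
    hi = stats.maximum if stats.maximum is not None else max(values)
    if hi == lo:
        return [0.0 for _ in values]  # degenerate range: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def split_qualified_name(qualified: str):
    """Split 'Table!feature' into (table, feature); table is None if no prefix."""
    table, separator, feature = qualified.partition("!")
    return (table, feature) if separator else (None, qualified)


data = [10.0, 20.0, 30.0]
from_data = min_max_normalize(data, FeatureStats())
with_fixed = min_max_normalize(data, FeatureStats(minimum=0.0, maximum=40.0))
```

With bounds evaluated from the data the result spans the whole [0, 1] interval, whereas with the fixed bounds 0 and 40 the same values map to 0.25, 0.5 and 0.75.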

One of the most important characteristics of a Rulex Platform feature is its type. The list of the 14 built-in attribute types is contained in the next table:

| Type | Description | Avatar Letter |
| --- | --- | --- |
| Nominal | Describes general string, text or categorical information. | N |
| Binary | Describes boolean information: only two distinct values are permitted. In logical operations the first value is treated as false and the second as true. | B |
| Integer | Describes signed integer numbers. | I |
| Continuous | Describes real double-precision numbers. | C |
| Percentage | Describes a percentage. Although shown as a percentage, the value is stored in the dataset as a standard double-precision number; the type handles the percentage conversion automatically. | % |
| Currency | Describes a currency amount. The currency units currently available are Dollar (default) and Euro. Changing the currency unit does not convert the stored value. | $ |
| Date | Describes a full date in the default format YYYY-MM-DD. | D |
| Week | Describes the year and the week in the format YYYY-Www. | W |
| Month | Describes the year and the month in the format YYYY-MM. | M |
| Quarter | Describes the year and the quarter in the format YYYY-Qq. | Q |
| Datetime | Describes the full datetime in the format YYYY-MM-DD HH:mm:ss.000, at millisecond precision. | DT |
| Time | Describes an elapsed time (in non-cyclic form) in the format (H)HH:mm:ss.000000, at microsecond precision. | T |
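The textual formats of the date and time types can be illustrated with a short Python sketch. The helper names are hypothetical and are not Rulex functions. The sketch assumes that weeks follow ISO 8601 numbering, which the text does not state, and that elapsed times are non-negative.

```python
from datetime import date, datetime, timedelta


def format_week(d: date) -> str:
    # Assumption: ISO 8601 week numbering (the ISO year can differ from d.year).
    iso_year, iso_week, _ = d.isocalendar()
    return f"{iso_year:04d}-W{iso_week:02d}"


def format_month(d: date) -> str:
    return f"{d.year:04d}-{d.month:02d}"


def format_quarter(d: date) -> str:
    return f"{d.year:04d}-Q{(d.month - 1) // 3 + 1}"


def format_datetime(dt: datetime) -> str:
    # Millisecond precision: keep only the first three fractional digits.
    return dt.strftime("%Y-%m-%d %H:%M:%S") + f".{dt.microsecond // 1000:03d}"


def format_elapsed(delta: timedelta) -> str:
    # Non-cyclic elapsed time: hours are not wrapped at 24.
    total_us = delta // timedelta(microseconds=1)
    hours, rest = divmod(total_us, 3_600_000_000)
    minutes, rest = divmod(rest, 60_000_000)
    seconds, micros = divmod(rest, 1_000_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{micros:06d}"


day = date(2024, 5, 17)
moment = datetime(2024, 5, 17, 9, 30, 15, 123456)
elapsed = timedelta(hours=30, minutes=5, seconds=7, microseconds=250)

print(day.isoformat())           # Date
print(format_week(day))          # Week
print(format_month(day))         # Month
print(format_quarter(day))       # Quarter
print(format_datetime(moment))   # Datetime
print(format_elapsed(elapsed))   # Time
```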

Another important characteristic of a Rulex Platform feature is its role. The built-in attribute roles are listed in the next table:

| Role | Description | Comments |
| --- | --- | --- |
| Input | Column used as a standard input for any machine learning algorithm. | This is the default role for any imported or new column. |
| Output | Column used as output in any subsequent machine learning task. | By default, the column is shown with a yellow background in the spreadsheet visualization. Moreover, if a column with the Prediction role and the name pred(<output name>) is also present, classifier information is shown in the dataset info row. |
| Prediction | Column containing the prediction for a particular output, obtained by applying a Machine Learning model to the provided dataset. The default name is pred(<output name>). | A conditional formatting highlight rule is automatically applied to the column: the cell background is red when the prediction differs from the output and green when it equals the output. |
| Confidence | Column containing the confidence level for a particular prediction of a given output, obtained by applying a Machine Learning model to the provided dataset. The default name is conf(<output name>). | By default, the column is shown with an orange background in the spreadsheet visualization. |
| Rule | Column containing the index of the most important rule for a particular prediction of a given output, obtained by applying a Machine Learning model to the provided dataset. The default name is rule(<output name>). | By default, the column is shown with a purple background in the spreadsheet visualization. |
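The default naming scheme and the red/green highlight rule described for the Prediction role can be sketched as follows. This is only an illustration: the function names are hypothetical, and the comparison assumes plain equality between output and prediction values.

```python
def result_column_names(output_name: str) -> dict:
    """Default names of the result columns tied to a given output column."""
    return {
        "Prediction": f"pred({output_name})",
        "Confidence": f"conf({output_name})",
        "Rule": f"rule({output_name})",
    }


def prediction_highlight(output_value, predicted_value) -> str:
    """Green when the prediction equals the output, red otherwise."""
    return "green" if predicted_value == output_value else "red"


names = result_column_names("income")
outputs = ["high", "low", "low"]
predictions = ["high", "high", "low"]
colors = [prediction_highlight(o, p) for o, p in zip(outputs, predictions)]
```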


Selecting main data columns

From the Attribute list you can select which columns to show in the main spreadsheet of the Data tab. This is done by clicking on the checkboxes located next to each feature. Further checkboxes, located next to the Attributes and Results labels, allow the user to check or uncheck a whole category.

Procedure

  1. Click on a single checkbox to toggle its state: if checked, the column is shown in the main panel; if unchecked, the column disappears from the dataset visualization.

  2. To check or uncheck a set of features in a single bulk operation, select them with CTRL/SHIFT click and then use the Check submenu of the context menu (see the Attribute context menu section).

  3. Click on the Attributes or Results checkbox to uncheck the whole category (if it was checked) or to check it (if it was not).

  4. Whenever some columns in the Attributes or Results category are shown and others are not, the checkbox next to that category appears in the indeterminate state. It becomes checked if clicked again.
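The behavior of the category checkbox described in steps 3 and 4 is a three-state logic, which can be sketched as follows. The names are hypothetical and the code only models the rules stated above; it is not the actual implementation.

```python
from enum import Enum


class CheckState(Enum):
    UNCHECKED = "unchecked"
    CHECKED = "checked"
    INDETERMINATE = "indeterminate"


def category_state(shown: list) -> CheckState:
    """Derive the category checkbox state from the per-column 'shown' flags."""
    if shown and all(shown):
        return CheckState.CHECKED
    if not any(shown):
        return CheckState.UNCHECKED
    return CheckState.INDETERMINATE


def click_category(shown: list) -> list:
    """Clicking unchecks a fully checked category and checks it in any other case."""
    new_value = category_state(shown) is not CheckState.CHECKED
    return [new_value] * len(shown)


mixed = [True, False, True]
state_before = category_state(mixed)
after_click = click_category(mixed)
```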

Note

The selection operation performed here is purely a visualization operation: none of these modifications is going to be stored in the manager history or recorded in any undo/redo queue.


Attribute context menu

At any time you can select a set of Attributes or Results, or a mix of the two; you can then open the corresponding context menu by right-clicking on one of them.

The Attribute context menu contains the following entries:

  • Check: the Check submenu lets the user change, in a single bulk operation, which columns are shown in the main spreadsheet. The submenu contains:
    • Check: all the selected columns will be checked.

    • Uncheck: all the selected columns will be unchecked.

    • Invert: the state of all the selected columns will be inverted.

  • Delete: erases all the selected columns after a confirmation dialog box.

  • Edit: opens the Edit attribute dialog, which lets you change all the feature properties of the selected columns. A complete description of the Edit attribute dialog can be found in its dedicated section.

  • Move: the Move submenu lets the user change the position of all the selected columns. The possible submenu entries are:
    • To the top: moves all the selected attributes to the beginning of the corresponding category (Attributes or Results).

    • To the bottom: moves all the selected attributes to the end of the corresponding category (Attributes or Results).

    • Advanced: opens the Move attribute dialog, which allows the movement operation to be controlled in detail. Further information about this floating panel is given in its dedicated section.

  • Rename: lets the user change the names of all the selected columns.

  • Ignored: the Ignored submenu lets the user modify, in a single bulk operation, the Ignored feature property of all the selected columns. In particular:
    • Set: ignores all the selected columns.

    • Clear: resets the Ignored property for all the selected columns.

  • Impute Missing: assigns the value provided through the dedicated dialog box, opened by clicking the entry, to the missing values in all the selected columns.

  • Type: changes the type of all the selected columns, casting them to the chosen type. The type is chosen through the submenu or through a dedicated dialog box opened by clicking on the Advanced submenu entry.

  • Split: opens the Split attribute dialog, which controls the string split operation applied to all the selected columns. The Split operation is available only if all the selected columns have type Nominal or Binary. A complete description of the Split attribute dialog is given in a subsequent section.
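As a rough illustration of what the Impute Missing entry does, the following Python sketch replaces the missing cells of the selected columns with a single user-provided value. It assumes a dataset stored as a dictionary of column lists with missing values represented by None; both choices, like the function name, are hypothetical and unrelated to how Rulex stores data.

```python
def impute_missing(table: dict, selected: list, value) -> dict:
    """Return a copy of the table where missing cells of the selected columns are filled."""
    filled = {}
    for name, cells in table.items():
        if name in selected:
            filled[name] = [value if cell is None else cell for cell in cells]
        else:
            filled[name] = list(cells)  # untouched columns are copied as they are
    return filled


table = {
    "age": [31, None, 47],
    "city": ["Genoa", None, "Milan"],
}
imputed = impute_missing(table, ["age"], 0)
```

Only the selected column is modified: the missing value in city is left as it is.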

By right-clicking on the Attributes or Results label you open a different context menu containing only one entry:

  • Add attribute/result: add new columns at the end of the selected category. Users provide the names, the types and the roles for the new columns through a dedicated dialog box opened after the entry selection.

Note

At the end of any interaction with the Attribute context menu, any selected columns remain selected. To discard a selection, click anywhere outside the Attribute list pane.


Dragging and dropping attributes

Selected attributes are available for drag and drop operations, which are the core operations of the whole Data Manager user interface.

By dragging and dropping attributes or results into different places of the Data Manager interface you can:

  • Query the column with SQL base operation:
  • Control plot inputs in the Plot tab.

  • Evaluate univariate and bivariate statistics through the Statistic manager in the Sheet tab.

More information about these interactions is available at the provided links.

By dragging and dropping attributes or results within the Attribute list itself you can perform a Move operation, which moves all the selected columns after the dropping position. If the dropping position is the Attributes or Results label, they are moved to the beginning of that category.

Procedure

  1. Select a list of attributes from the Attribute list by clicking on them while holding the Ctrl key.

  2. Drag and drop this selection onto the unselected attribute at the bottom of your list.

  3. The selected columns are moved to the bottom of your list.
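The Move operation triggered by a drop inside the Attribute list can be modeled with a short Python sketch. The function name is hypothetical, a drop on the category label is represented by None, and the sketch assumes that the moved columns keep their relative order.

```python
from typing import Optional


def move_after(columns: list, selected: set, drop_target: Optional[str] = None) -> list:
    """Move the selected columns right after drop_target.

    drop_target=None models a drop on the Attributes or Results label,
    which moves the selection to the beginning of the category.
    """
    picked = [c for c in columns if c in selected]
    rest = [c for c in columns if c not in selected]
    if drop_target is None:
        return picked + rest
    if drop_target in selected:
        raise ValueError("the drop target must be an unselected column")
    position = rest.index(drop_target) + 1
    return rest[:position] + picked + rest[position:]


columns = ["id", "age", "income", "city"]
to_bottom = move_after(columns, {"id", "age"}, "city")
to_top = move_after(columns, {"city"}, None)
```

The first call reproduces the procedure above: the selected columns are dropped on the last unselected attribute and end up at the bottom of the list.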


Centering an attribute

As the number of columns grows, it becomes unlikely that all of them fit in the viewport of the main spreadsheet pane in the Data tab. To center the view you then have to use the horizontal scrollbar, which can be far from trivial when the total number of columns is much larger than the number of columns currently visible.

For this reason it is possible at any time to center your view on a particular attribute or result by clicking on its chip in the Attribute List. This operation will immediately move your viewport in the Data tab to be centered with respect to the selected chip.

Procedure

  1. Scroll the Attribute List until the attribute you are looking for is visible. You can also use the Magnifier icon to filter the Attribute List.

  2. Click on the colored chip of the selected attribute.

  3. The viewport is now centered on the selected attribute, which is visible at the center of the main spreadsheet.
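Conceptually, centering a column amounts to choosing a horizontal scroll offset that places the middle of the column at the middle of the viewport, limited to the valid scroll range. The following Python sketch shows this computation under the assumption of known pixel widths; the function and parameter names are hypothetical and do not describe the actual implementation.

```python
def centered_scroll_offset(column_widths: list, target_index: int, viewport_width: float) -> float:
    """Horizontal offset that centers the target column, clamped to the scroll range."""
    left_edge = sum(column_widths[:target_index])
    column_center = left_edge + column_widths[target_index] / 2
    max_offset = max(0.0, sum(column_widths) - viewport_width)
    wanted = column_center - viewport_width / 2
    return min(max(wanted, 0.0), max_offset)


widths = [100] * 20  # twenty columns, 100 pixels each
middle = centered_scroll_offset(widths, 10, 500)
first = centered_scroll_offset(widths, 0, 500)
last = centered_scroll_offset(widths, 19, 500)
```

Columns close to either end of the table cannot be placed exactly at the center, so the offset is clamped to the first or last valid scroll position.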