The Data tab

The Data tab in the Data Manager task displays the dataset in a spreadsheet format.

https://cdn.rulex.ai/docs/Factory/data-tab.webp

Next to the Data label on the tab header a Left-Right arrow icon allows the user to switch from the Data view to the Attribute view. We will concentrate in describing the Attribute view and its interaction in this dedicated page.

The Data view is organized in four different regions, from top to the bottom:

  • The Formula toolbar

  • The Main spreadsheet

  • The Query Manager

  • The Info row bar


Executing a formula

The Formula toolbar allows the user to execute functions or procedure on the various columns of the dataset. The formula is divided in two different terms:

  • Left term, controlling the column where the result is going to be inserted

  • Right term, presenting the real formula.

Right term formula must follow the syntax rules of the underlying GOLD language. Main rules and operators about this language useful to write complex formula are presented here.

To control the Left term of your operation you can:

  • Click on a spreadsheet column (both on the header or on any cell of the visible grid) to automatically fill the left value. Right value will be filled with the latest executed formula, if any.

  • Click directly on the attribute drop down menu and use its autocomplete functionality to find the desired attribute to be selected.

Hint

In the Left term, by interacting directly with the attribute drop down menu you can insert two custom placeholders, leading to some particular behavior of the formula execution:

  • If you write $names, the right term formula will be applied on the list of header names, permitting you to apply a bulk transformation of column names. Similar operations can be performed in the attribute view. Consider this possibility as a shortcut instead of switching view directly. Further information are available below.

  • If you write $null, you are able to execute a formula which does not return anything since this placeholder means your formula has no left term. This could be useful for example to raise warnings or errors through the errorGold under some conditions.

When the Right term of the formula toolbar is focused (this is highlighted by a colored border for the Formula text area and by the blinking cursor) some filling shortcuts are available:

  1. By clicking on a spreadsheet column (both on the header or on any cell of the visible grid) its corresponding formula value $"<column-name>" will be inserted in the text area next to the blinking position cursor. Cursor is then moved accordingly. Using Ctrl and Shift during clicking operation will allows you to create more complicated list attribute expression as for example ($"<column-name-1>", $"<column-name-2>", ...) or ($"<column-name-1>":$"<column-name-2>").

  2. By dropping an attribute or a list of attributes coming from the Attribute list their corresponding formula value $"<column-name>" will be inserted in the text area next to the blinking position cursor. Cursor is then moved accordingly.

  3. Autocompletion is available during your formula typing:
    • Possible functions are suggested any time you start writing a new not quote enclosed value.

    • Possible attributes or results are suggested any time you start typing $.

To blur the Right term of the formula toolbar you need to click outside the whole Data view for example on the Attribute list panel.

Variables can also be used in formulas. They are recognized by the @ prefix. For example, @Today could be a variable that contains the current date. For more information on variables see the variables page.

See also

When typing formulas, the complete formula with the parameters is displayed under the formula bar with a MONACO editor.

Between the Left and the Right term the Formula icon enables the user to open the Functions Menu: further information about the available functions and its menu is presented in this dedicated page. Formula toolbar supports also a whole list of operators inherited directly from the GOLD language which is underlying the whole platform: more information about this language, its syntax and its operators is present in this section. The use of operators in Formula toolbar and in Data Manager formulas is described more in detail here.

Important

When using functions in formulas, pay close attention to the following syntax:

  • Function names are case-sensitive.

  • When specifying the attributes, the syntax is $"attributename"

  • When specifying a string, the syntax is 'string'

  • When specifying continuous values and integers, just type them.

  • Strings can contain any characters, but must always enclose with the symbol ". For example, "sales".

  • Variable names can contain letters (capital and small), numbers and underscore only.

  • If you need to select more than one attribute at a time, you can use either press Ctrl and select all the required attributes: ($"att1", $"att5", $"att11")

  • If you need to select a range of attributes, you can either press Shift and select the first and last attribute in the range, or select them all with the cursor. The formula bar will include a colon to indicate the range: ($"att1":$"att5").

Examples of multiple selection syntax:. * $"att_sum" = sum($"att1",$"att5", $"att11") * $"att_sum" = sum($"att1":$"att5")

At the right end of the Formula Toolbar you can find two buttons performing the following operations:

  • Approve button: which launch the execution of the formula. Same result can be obtained by pressing Enter during Right term typing.

  • Magnifier button: which performs a search for Text operation. Please refer to the searching for a text section presented below for its usage.

Procedure

  1. In the formula bar, in the Left term, enter the attribute you will be defining in one of the following ways:
    • select an existing attribute from a drop-down menu;

    • click on a cell in the spreadsheet.

  2. In the Right term, create the formula which will define the attribute by:
    • entering attributes

    • entering constants

  3. Press enter or click on the Approve button.

Bulk editing attributes’ headers with the $names parameters

If you need to edit the attributes’ headers, willing to apply them a certain naming rule, you can specify the $names parameter in the left-hand side member of the formula bar.

This way, you can write a function which applies to all attributes, it doesn’t matter how many they are.

For example, if you need to change a separator in the attribute’s headers, you can combine different Rulex functions together.

Example

In this example, the ifelse, find and replace functions have been combined. In the Turkish calendar dataset, the __ header separator must be switched to -. Follow the steps listed below:

  • In the formula bar, write $names in the left-hand side member.

  • In the right-hand side member, type the following function: ifelse(find($names,"_")==1, replace($names,"_","-"), $names).

  • Press Enter or click on the green tick at the right of the formula bar. The results are shown below:

https://cdn.rulex.ai/docs/Factory/names_example.webp

Searching for a text

By clicking on the Magnifier icon located at the end of the Formula toolbar you can search into the grid for the presence of a particular text.

In the search box tooltip opened after the click on the icon you can customize your search operation:

  • In the Search text field you can insert the text to search for.

  • From the Search type drop down menu, you can select the type of searching criteria we are going to apply. Possible values are:
    • Include: we consider any cell containing the searched text.

    • Match: we consider only cell containing the exact searched text.

    • Starts with: we consider only cell starting with the searched text.

    • Ends with: we consider only cell ending with the searched text.

    • Does not include: we consider only cell not containing the searched text.

    • Does not match: we consider only cell different from the searched text.

    • Does not start with: we consider only cell not starting with the searched text.

    • Does not end with: we consider only cell not ending with the searched text.

  • The Search by column checkbox changes the scanning direction from left to right row by row to top to bottom column by column.

  • The Case sensitive checkbox controls if Search type check has to be performed in case-sensitive mode.

  • Direction radio button will control if next occurrence will be searched for before the current one (Up value) or after the current one (Down value).

Procedure

  1. Click on the Magnifier icon located at the right end of the Formula toolbar.

  2. Type in the Search text field the text to be searched.

  3. Control search direction by checking Search by column checkbox or Direction radio button.

  4. Click on Find next button to highlight on the spreadsheet next occurrence, or click on Close to end the search operation.


The Main spreadsheet

The central panel of the whole Data Manager task is the main spreadsheet where data contained in the dataset are presented in a spreadsheet format.

If the number of rows or columns of the whole table are greater than the current viewport a set of horizontal and vertical scrollbar will appear to permit you the navigation of the table.

Note

To easily navigate by using the scrollbar you can:

  • Use your mouse wheel to granular scroll in vertical direction (or horizontal by pressing Shift together with wheel interaction). Each wheel step corresponds to a one row/column shift.

  • Click on the scrollbar track to move in vertical or horizontal direction of an entire viewport page.

Spreadsheet visualization is mainly divided into three parts:

  • Index column: where row indexes are displayed.

  • Header row: where attribute names and types are presented.

  • Main grid: containing the actual values of the dataset in a grid shape.

By clicking on any of these three sectors you can highlight a subset of the whole datasets. Multiple selection can be combined with the usual Ctrl or Shift click procedure. The selection is often used in most of the operations we can trigger from the various context menu we can open from the main spreadsheet.

By double-clicking on one of the headers you will automatically resize its column according to the visualized content. Width resizing for columns and height resizing for rows can be granular changed by moving the right end of the header cell or the bottom end of the index cell with the mouse pointer.

Index column context menu

By right-clicking on any index row you can open the following context menu:

  • Set value: it allows the user to insert a unique value which is going to be written in all the selected cells.

  • Add rows: it will add a provided number of rows before or after the selected rows. If you are selecting more than one continuous rows you can decide to insert a unique set of rows at the end of the whole block or before/after each row. In the second case more than one number of new rows needs to be provided. All these configuration options are decided in the dedicated dialog which is shown after this entry selection.

  • Delete rows: it will delete all the selected rows.

  • Go to row: it enables the user to automatically center a particular provided row in the main spreadsheet viewport.

Header column context menu

By right-clicking on any header column you can open the following context menu:

  • Set value: it allows the user to insert a unique value which is going to be written in all the selected cells.

  • Add attributes: it will add a provided number of columns before or after the selected columns. For any new column the user should provide new name, type and role. If you are selecting more than one continuous columns you can decide to insert a unique set of columns at the end of the whole block or before/after each column. In the second case more than one number of new columns needs to be provided. All these configuration options are decided in the dedicated dialog which is shown after this entry selection.

  • Delete attributes: it will delete all the selected columns.

  • Sort ascending: it will sort the selected column in ascending order.

  • Sort descending: it will sort the selected column in descending order.

  • No sort: it will restore the selected column to the original ordering, resetting eventual ascending or descending ordering on them.

  • Go to row: it enables the user to automatically center a particular provided row in the main spreadsheet viewport.

  • Copy formula: if on the selected column it was executed a formula, this operation copy its Right term.

  • Paste formula: it applies the previous copied formula on the whole set of selected columns.

  • Set style column: it changes the style of the whole dataset column. To configure the style a dedicated panel is opened; its content is described in this dedicated page.

  • Set conditional formatting: it changes the formatting rules for the whole dataset column. This rules can change style of the different column cells according to their contained values. To configure the formatting rules a dedicated panel is opened; its content is described in a separated section.

  • Make default column width: it sets the current column width as the default width for the considered column. This implies a change in column style for the property width.

Note

In Paste formula operation, Attribute references expressed by $ are accordingly shifted by the difference in position between the original copied column and the final pasted column. To fix a particular attribute term you can use the equivalent GOLD operator $$. Indeed, the unique difference between $ and $$ is its behavior under copy and paste operation.

Main grid context menu

  • Copy: it copies a unique cell value (the first of your selection) for subsequent paste operation in another cell.

  • Paste: it pastes the copied value in all the selected cells.

  • Copy to clipboard: it copies the text of the selected cells into your computer clipboard to be available for pasting operation even in programs different by Rulex Platform.

  • Set value: it allows the user to insert a unique value which is going to be written in all the selected cells.

  • Go to row: it enables the user to automatically center a particular provided row in the main spreadsheet viewport.

Editing a cell

To start the manual editing of a cell you can:

  • Double-clicking on it.

  • Selecting it and start typing.

Note

Even if both these operations will start the editing operation, there is a substantial difference between the two behaviors:

  • when you double-click on a cell the value already present will be maintained and the cursor will appear blinking at the end of the old text.

  • when you start typing after a cell selection any value already present in the cell will be erased at the beginning of the new typing operation.

Once you have finished your manual editing you can use Arrows or the Enter button to move to a continuous cell (Enter is equal to the Down arrow) and continuing your editing operation. Real value modifications will be triggered only when no input editing will be performed on one of the cell or blur has been triggered by clicking on a different cell or location outside the scope.

A value modification triggers automatically a decideType operation on the considered column to eventually change its type according to the new provided text.


The Query Manager

In the Query Manager you can execute simple and complex SQL-like operation on your underlying dataset.

In particular, the Query Manager is divided into five regions, any of them controlling one of the base SQL-like operation:

  • Pre-filter: executing a WHERE clause operation as the first operation of your query execution. Its internal structure and its internal interactions are described in Filter section.

  • Group: executing a GROUP clause operation on the considered columns. Its internal structure and its internal interactions are described in Group section.

  • Apply: executing an APPLY clause operation on the considered columns. Aggregator function can be selected differently column by column. Its internal structure and its internal interactions are described in Apply section.

  • Sort: executing a SORT clause operation on the considered columns. Its internal structure and its internal interactions are described in Sort section.

  • Post-filter: executing a WHERE clause operation as the last operation of your query execution. Its internal structure and its internal interactions are described in Filter section.

Order of the operations is described by region positions in the Query Manager: regions are executed from left to right in appearing order.

Note

Pre-filter and Post-filter regions are conceptually equal, they only differs by their execution order.

On the right end of the Query Manager panel some query buttons controls query panel execution behavior. In particular:

  • Clear query will reset all the query erasing all the attributes present in the five regions.

  • Make persistent will apply the query by cutting off all the filtered lines and by reordering the whole dataset. All the attributes present in the five regions are erased.

  • Autocommit: this is a two state button. If turned on at any modification of the query, query is executed on the dataset loaded in memory to present the updated state. Otherwise, the query is executed only by pressing the Play button.

  • Play: present only if Autocommit is turned off, it starts the execution of the provided query on the dataset loaded in memory.


The Info row bar

In the Info row bar some general information about the dataset and the selected column is displayed. In particular, it is always present in the first section of the bar:

  • The total number of rows

  • The number of rows displayed, following filter and group operations performed in the Query Manager.

  • The total number of columns

  • The number of columns displayed, following filter and selection operation performed in the Attribute list.

If a column with output role and a column with prediction role sharing the same output name is present, in second section of the bar you can find:

  • Information about Correct Classification: in case of Machine Learning Classification.

  • Information about Standard Error and Novelty: in case of Machine Learning Regression.

Last part on the right of the whole row bar is filled only following a command; indeed the third area of the Info row bar contains some statistics about the selected feature. In particular for the last selected column the following values are shown:

  • Minimum (ordered)

  • Maximum (ordered)

  • Sum (ordered)

  • Mean (ordered)

  • Std. Dev. (ordered)

  • N. missing (ordered and nominal)

  • Mode (nominal)

  • N. distinct. (nominal)