The Data tab¶
The Data tab in the Data Manager task displays the dataset in a spreadsheet format.
Next to the Data label on the tab header a Left-Right arrow icon allows the user to switch from the Data view to the Attribute view. We will concentrate in describing the Attribute view and its interaction in this dedicated page.
The Data view is organized in four different regions, from top to the bottom:
The Formula toolbar
The Main spreadsheet
The Query Manager
The Info row bar
Executing a formula¶
The Formula toolbar allows the user to execute functions or procedure on the various columns of the dataset. The formula is divided in two different terms:
Left term, controlling the column where the result is going to be inserted
Right term, presenting the real formula.
Right term formula must follow the syntax rules of the underlying GOLD language. Main rules and operators about this language useful to write complex formula are presented here.
To control the Left term of your operation you can:
Click on a spreadsheet column (both on the header or on any cell of the visible grid) to automatically fill the left value. Right value will be filled with the latest executed formula, if any.
Click directly on the attribute drop down menu and use its autocomplete functionality to find the desired attribute to be selected.
Hint
In the Left term, by interacting directly with the attribute drop down menu you can insert two custom placeholders, leading to some particular behavior of the formula execution:
If you write
$names
, the right term formula will be applied on the list of header names, permitting you to apply a bulk transformation of column names. Similar operations can be performed in the attribute view. Consider this possibility as a shortcut instead of switching view directly. Further information are available below.If you write
$null
, you are able to execute a formula which does not return anything since this placeholder means your formula has no left term. This could be useful for example to raise warnings or errors through the errorGold under some conditions.
When the Right term of the formula toolbar is focused (this is highlighted by a colored border for the Formula text area and by the blinking cursor) some filling shortcuts are available:
By clicking on a spreadsheet column (both on the header or on any cell of the visible grid) its corresponding formula value
$"<column-name>"
will be inserted in the text area next to the blinking position cursor. Cursor is then moved accordingly. Using Ctrl and Shift during clicking operation will allows you to create more complicated list attribute expression as for example($"<column-name-1>", $"<column-name-2>", ...)
or($"<column-name-1>":$"<column-name-2>")
.By dropping an attribute or a list of attributes coming from the Attribute list their corresponding formula value
$"<column-name>"
will be inserted in the text area next to the blinking position cursor. Cursor is then moved accordingly.- Autocompletion is available during your formula typing:
Possible functions are suggested any time you start writing a new not quote enclosed value.
Possible attributes or results are suggested any time you start typing
$
.
To blur the Right term of the formula toolbar you need to click outside the whole Data view for example on the Attribute list panel.
Variables can also be used in formulas. They are recognized by the @
prefix. For example, @Today
could be a variable that contains the current date.
For more information on variables see the variables page.
See also
When typing formulas, the complete formula with the parameters is displayed under the formula bar with a MONACO editor.
Between the Left and the Right term the Formula icon enables the user to open the Functions Menu: further information about the available functions and its menu is presented in this dedicated page. Formula toolbar supports also a whole list of operators inherited directly from the GOLD language which is underlying the whole platform: more information about this language, its syntax and its operators is present in this section. The use of operators in Formula toolbar and in Data Manager formulas is described more in detail here.
Important
When using functions in formulas, pay close attention to the following syntax:
Function names are case-sensitive.
When specifying the attributes, the syntax is
$"attributename"
When specifying a string, the syntax is
'string'
When specifying continuous values and integers, just type them.
Strings can contain any characters, but must always enclose with the symbol
"
. For example,"sales"
.Variable names can contain letters (capital and small), numbers and underscore only.
If you need to select more than one attribute at a time, you can use either press Ctrl and select all the required attributes:
($"att1", $"att5", $"att11")
If you need to select a range of attributes, you can either press Shift and select the first and last attribute in the range, or select them all with the cursor. The formula bar will include a colon to indicate the range:
($"att1":$"att5")
.
Examples of multiple selection syntax:.
* $"att_sum"
= sum($"att1",$"att5", $"att11")
* $"att_sum"
= sum($"att1":$"att5")
At the right end of the Formula Toolbar you can find two buttons performing the following operations:
Approve button: which launch the execution of the formula. Same result can be obtained by pressing Enter during Right term typing.
Magnifier button: which performs a search for Text operation. Please refer to the searching for a text section presented below for its usage.
Procedure
- In the formula bar, in the Left term, enter the attribute you will be defining in one of the following ways:
select an existing attribute from a drop-down menu;
click on a cell in the spreadsheet.
- In the Right term, create the formula which will define the attribute by:
entering attributes
entering constants
Press enter or click on the Approve button.
Bulk editing attributes’ headers with the $names
parameters¶
If you need to edit the attributes’ headers, willing to apply them a certain naming rule, you can specify the $names
parameter in the left-hand side member of the formula bar.
This way, you can write a function which applies to all attributes, it doesn’t matter how many they are.
For example, if you need to change a separator in the attribute’s headers, you can combine different Rulex functions together.
Example
In this example, the ifelse
, find
and replace
functions have been combined. In the Turkish calendar dataset, the __
header separator must be switched to -
. Follow the steps listed below:
In the formula bar, write
$names
in the left-hand side member.In the right-hand side member, type the following function:
ifelse(find($names,"_")==1, replace($names,"_","-"), $names)
.Press Enter or click on the green tick at the right of the formula bar. The results are shown below:
Searching for a text¶
By clicking on the Magnifier icon located at the end of the Formula toolbar you can search into the grid for the presence of a particular text.
In the search box tooltip opened after the click on the icon you can customize your search operation:
In the Search text field you can insert the text to search for.
- From the Search type drop down menu, you can select the type of searching criteria we are going to apply. Possible values are:
Include: we consider any cell containing the searched text.
Match: we consider only cell containing the exact searched text.
Starts with: we consider only cell starting with the searched text.
Ends with: we consider only cell ending with the searched text.
Does not include: we consider only cell not containing the searched text.
Does not match: we consider only cell different from the searched text.
Does not start with: we consider only cell not starting with the searched text.
Does not end with: we consider only cell not ending with the searched text.
The Search by column checkbox changes the scanning direction from left to right row by row to top to bottom column by column.
The Case sensitive checkbox controls if Search type check has to be performed in case-sensitive mode.
Direction radio button will control if next occurrence will be searched for before the current one (Up value) or after the current one (Down value).
Procedure
Click on the Magnifier icon located at the right end of the Formula toolbar.
Type in the Search text field the text to be searched.
Control search direction by checking Search by column checkbox or Direction radio button.
Click on Find next button to highlight on the spreadsheet next occurrence, or click on Close to end the search operation.
The Main spreadsheet¶
The central panel of the whole Data Manager task is the main spreadsheet where data contained in the dataset are presented in a spreadsheet format.
If the number of rows or columns of the whole table are greater than the current viewport a set of horizontal and vertical scrollbar will appear to permit you the navigation of the table.
Note
To easily navigate by using the scrollbar you can:
Use your mouse wheel to granular scroll in vertical direction (or horizontal by pressing Shift together with wheel interaction). Each wheel step corresponds to a one row/column shift.
Click on the scrollbar track to move in vertical or horizontal direction of an entire viewport page.
Spreadsheet visualization is mainly divided into three parts:
Index column: where row indexes are displayed.
Header row: where attribute names and types are presented.
Main grid: containing the actual values of the dataset in a grid shape.
By clicking on any of these three sectors you can highlight a subset of the whole datasets. Multiple selection can be combined with the usual Ctrl or Shift click procedure. The selection is often used in most of the operations we can trigger from the various context menu we can open from the main spreadsheet.
By double-clicking on one of the headers you will automatically resize its column according to the visualized content. Width resizing for columns and height resizing for rows can be granular changed by moving the right end of the header cell or the bottom end of the index cell with the mouse pointer.
Editing a cell¶
To start the manual editing of a cell you can:
Double-clicking on it.
Selecting it and start typing.
Note
Even if both these operations will start the editing operation, there is a substantial difference between the two behaviors:
when you double-click on a cell the value already present will be maintained and the cursor will appear blinking at the end of the old text.
when you start typing after a cell selection any value already present in the cell will be erased at the beginning of the new typing operation.
Once you have finished your manual editing you can use Arrows or the Enter button to move to a continuous cell (Enter is equal to the Down arrow) and continuing your editing operation. Real value modifications will be triggered only when no input editing will be performed on one of the cell or blur has been triggered by clicking on a different cell or location outside the scope.
A value modification triggers automatically a decideType operation on the considered column to eventually change its type according to the new provided text.
The Query Manager¶
In the Query Manager you can execute simple and complex SQL-like operation on your underlying dataset.
In particular, the Query Manager is divided into five regions, any of them controlling one of the base SQL-like operation:
Pre-filter: executing a WHERE clause operation as the first operation of your query execution. Its internal structure and its internal interactions are described in Filter section.
Group: executing a GROUP clause operation on the considered columns. Its internal structure and its internal interactions are described in Group section.
Apply: executing an APPLY clause operation on the considered columns. Aggregator function can be selected differently column by column. Its internal structure and its internal interactions are described in Apply section.
Sort: executing a SORT clause operation on the considered columns. Its internal structure and its internal interactions are described in Sort section.
Post-filter: executing a WHERE clause operation as the last operation of your query execution. Its internal structure and its internal interactions are described in Filter section.
Order of the operations is described by region positions in the Query Manager: regions are executed from left to right in appearing order.
Note
Pre-filter and Post-filter regions are conceptually equal, they only differs by their execution order.
On the right end of the Query Manager panel some query buttons controls query panel execution behavior. In particular:
Clear query will reset all the query erasing all the attributes present in the five regions.
Make persistent will apply the query by cutting off all the filtered lines and by reordering the whole dataset. All the attributes present in the five regions are erased.
Autocommit: this is a two state button. If turned on at any modification of the query, query is executed on the dataset loaded in memory to present the updated state. Otherwise, the query is executed only by pressing the Play button.
Play: present only if Autocommit is turned off, it starts the execution of the provided query on the dataset loaded in memory.
The Info row bar¶
In the Info row bar some general information about the dataset and the selected column is displayed. In particular, it is always present in the first section of the bar:
The total number of rows
The number of rows displayed, following filter and group operations performed in the Query Manager.
The total number of columns
The number of columns displayed, following filter and selection operation performed in the Attribute list.
If a column with output role and a column with prediction role sharing the same output name is present, in second section of the bar you can find:
Information about Correct Classification: in case of Machine Learning Classification.
Information about Standard Error and Novelty: in case of Machine Learning Regression.
Last part on the right of the whole row bar is filled only following a command; indeed the third area of the Info row bar contains some statistics about the selected feature. In particular for the last selected column the following values are shown:
Minimum (ordered)
Maximum (ordered)
Sum (ordered)
Mean (ordered)
Std. Dev. (ordered)
N. missing (ordered and nominal)
Mode (nominal)
N. distinct. (nominal)
- Executing queries
- Data Manager functions
- The Attributes tab
- Styling dataset columns