Convert Dataset to Structure
The Convert Dataset to Structure task can produce several kinds of structure from an input dataset.
These structures include replacement rules, autoregressive models, cluster labels, clusters, discretization cutoffs, frequent itemsets, frequent sequences, results, rules, models and PCA eigenvectors. For more information on these structures, refer to the data structures page.
There are many reasons why users might wish to convert a dataset to a structure, for example:
To quickly add many heuristic rules to a flow: the rules are entered in a table, which is imported into the flow as a dataset and then converted into a ruleset.
To convert structures that were previously turned into a dataset with the Convert Structure to Dataset task back to their original format, for in-depth analysis in the Data Manager.
To create a model from a dataset, which can then be used in an Apply Model task to compute its responses for given samples.
The task consists of a single tab, the Options tab.
The Options tab
The Options tab contains the following panes:
Available attributes list, where the attributes of the input dataset are listed.
Structure pane, where users can select the structure they want to convert the dataset into.
Configuration pane, which varies according to the chosen conversion structure.
Structure pane
Within this pane, users can select the structure they want to convert the dataset into. The available options are:
Association rules
Auto regressive models
Clusters
Cluster labels
Discretization cutoffs
Frequent itemsets
Frequent sequences
Monitor
Results
Rules
Models
Pca eigenvectors
Configuration pane
Every structure except Rules requires no specific configuration. When no structure is selected, or when any structure other than Rules is selected, the pane displays the message “No parameters need to be set for this task: just compute the task by right-clicking it and selecting Compute > Compute selected”.
When the Rules structure is chosen, this pane is updated and the following elements become available:
On the left, the Available attributes list, which contains all the attributes available for the conversion operation.
On the right, the following options, which define the Rules structure in more detail:
Rule conditions (NOMINAL): drag the attributes containing the rule conditions here. Instead of dragging and dropping them manually, users can define them via a filtered list.
Rule ID attribute: specify the attribute containing the ID of each rule. The ID attribute must contain unique, consecutive numbers. If no attribute contains the rule ID, this field can be left empty, and the task automatically assigns a number to each row.
Rule output name attribute: specify the attribute containing the name of the output attribute.
Rule output value attribute: specify the attribute containing the output attribute values.
Rule covering attribute: specify the attribute containing the covering value.
Rule error attribute: specify the attribute containing the error value.
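To make the role of each option more concrete, the following minimal Python sketch shows how the rows of a rules table could be read into rule objects. It does not reproduce how the task works internally: the column names (`cond_1`, `cond_2`, `output_name`, `output_value`, `covering`, `error`), the `Rule` class and the `rows_to_rules` function are all hypothetical, and numbering rows from 1 when no ID column is given is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical column names, chosen only for this illustration.
CONDITION_COLUMNS = ["cond_1", "cond_2"]


@dataclass
class Rule:
    rule_id: int
    conditions: List[str]
    output_name: str
    output_value: str
    covering: Optional[float] = None
    error: Optional[float] = None


def rows_to_rules(rows, id_column=None):
    """Build Rule objects from table rows given as dictionaries.

    If id_column is None, IDs are assigned in row order (assumed here
    to start from 1). Otherwise the IDs are read from that column and
    must be unique and consecutive.
    """
    rules = []
    for position, row in enumerate(rows, start=1):
        rule_id = position if id_column is None else int(row[id_column])
        # Empty condition cells are skipped.
        conditions = [row[c] for c in CONDITION_COLUMNS if row.get(c)]
        rules.append(
            Rule(
                rule_id,
                conditions,
                row["output_name"],
                row["output_value"],
                row.get("covering"),
                row.get("error"),
            )
        )
    ids = sorted(rule.rule_id for rule in rules)
    if ids and ids != list(range(ids[0], ids[0] + len(ids))):
        raise ValueError("rule IDs must be unique and consecutive")
    return rules


# Made-up sample rows, not taken from any real dataset.
sample_rows = [
    {"cond_1": "age > 30", "cond_2": "", "output_name": "class",
     "output_value": "A", "covering": 0.4, "error": 0.05},
    {"cond_1": "age <= 30", "cond_2": "income > 50", "output_name": "class",
     "output_value": "B", "covering": 0.2, "error": 0.10},
]
rules = rows_to_rules(sample_rows)
```

In this sketch the ID check runs after all rows are read, so a table with duplicate or missing IDs is rejected as a whole rather than partially loaded.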
Example
The following example uses the Adult dataset.
It shows how to use a Convert Dataset to Structure task on the output of a Label Clustering task.
After importing the dataset with an Import from Text File task, add a Data Manager task, then split the dataset into training and test sets (70% training, 30% test) with the Split Data task.
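For readers who want to see what a 70/30 split amounts to, here is a minimal Python sketch of a plain random split. It is only an illustration: the `split_rows` function and its parameters are hypothetical, and the Split Data task may select rows differently.

```python
import random


def split_rows(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows and return (training, test) lists."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# 200 placeholder row indices stand in for the dataset rows.
training, test = split_rows(range(200))
print(len(training), len(test))  # prints "140 60"
```

Every row ends up in exactly one of the two sets, so the test set can be used to evaluate what was learned on the training set.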
Then, add a Label Clustering task. Specify the following constraints:
Attributes to consider for clustering:
CustomerID
Annual Income (k$)
Age
Label attributes: Gender
Add a Data Manager task to the Label Clustering task, then save and compute it.
Add a Convert Dataset to Structure task to the previously added Data Manager task. As described in the Structure pane section, select the structure you want to convert the dataset into; in this case, select Cluster labels.
No specific parameters need to be set for this task. Save and compute it.