Convert Dataset to Structure
The Convert Dataset to Structure task produces a structure from an input dataset. The structures it can produce include replacement rules, autoregressive models, cluster labels, clusters, discretization cutoffs, frequent itemsets, frequent sequences, results, rules, models and PCA eigenvectors. For more information, refer to the data structures page.
There are many reasons why users might wish to convert a dataset into a structure, for instance:
to quickly add many heuristic rules to a flow by inserting the rules into a table, importing the table into the flow as a dataset, and then converting the dataset into a ruleset.
to convert structures that were previously turned into a dataset with the Convert Structure to Dataset task back to their original format, for in-depth analysis in the Data Manager.
to create a model from a dataset, which can then be used in an Apply Model task to derive the model's responses for given samples.
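As an illustration of the first use case, the sketch below writes a small table of heuristic rules to CSV text that could then be imported as a dataset. The column layout that the platform expects for a rules dataset is not described on this page, so the column names (`rule_id`, `condition`, `output`) and the rules themselves are hypothetical.

```python
import csv
import io

# Hypothetical heuristic rules: the column names and layout are assumptions
# made for illustration, not the format required by the platform.
RULES = [
    {"rule_id": 1, "condition": "Age > 60", "output": "senior"},
    {"rule_id": 2, "condition": "Age <= 60", "output": "standard"},
]


def rules_to_csv(rules):
    """Serialize a list of rule dictionaries to CSV text, one rule per row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["rule_id", "condition", "output"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rules)
    return buffer.getvalue()


csv_text = rules_to_csv(RULES)
```

The resulting text can be saved to a file and loaded with an import task. The actual column names should be taken from a dataset produced by the Convert Structure to Dataset task on an existing ruleset.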
The task consists of two tabs: the Options tab and the Results tab.
The Options tab
The Options tab consists of two panes:
Structure pane, where users can select the structure they want the dataset to be converted into.
Information pane, where users can visualize useful information about the task.
Structure pane
Within this pane, users can select the structure they want the dataset to be converted into. Available options are:
Association rules
Autoregressive models
Clusters
Cluster labels
Discretization cutoffs
Frequent itemsets
Frequent sequences
Monitor
Results
Rules
Models
PCA eigenvectors
Information pane
The Convert Dataset to Structure task has no parameters or options to define: the only operation needed to transform the dataset is to compute the task. When the task is opened, the following message appears: “No parameters need to be set for this task: just compute the task by right-clicking it and selecting Compute > Compute selected”.
The Results tab
Within this tab, users can visualize a summary of the computation.
This tab is divided into two panes:
General Info
Within this pane, users can find the following information:
Result Quantities
Within this pane, users can set and configure the following options:
Average dispersion
Average dispersion of clusters
Average weight
Davies-Bouldin index
Dispersion of default cluster
Inter-clusters distance variance
Intra-cluster distance variance
Maximum dispersion
Maximum number of points in a cluster
Minimum dispersion
Minimum number of points in a cluster
Number of clusters
Number of distinct samples
Number of samples
Number of single samples
Number of singleton clusters
These checkboxes are checked by default.
To the right of these checkboxes, a drop-down list lets users choose which of the following to display the information for:
Train
Test
Valid
Whole
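Several of the result quantities listed above have widely used textbook definitions. The sketch below computes a few of them (number of clusters, number of singleton clusters, minimum and maximum number of points in a cluster, and the Davies-Bouldin index) from a list of samples and their cluster labels. It assumes Euclidean distance and the standard Davies-Bouldin formula; the platform's exact definitions are not given on this page and may differ, and the function and key names are illustrative.

```python
import math
from collections import defaultdict


def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def cluster_summary(samples, labels):
    """Compute simple cluster quantities from samples and their labels.

    Assumes at least two clusters with distinct centroids, which the
    standard Davies-Bouldin formula requires.
    """
    clusters = defaultdict(list)
    for point, label in zip(samples, labels):
        clusters[label].append(point)

    centers = {k: centroid(v) for k, v in clusters.items()}
    # Scatter: mean distance of a cluster's points from its centroid.
    scatter = {
        k: sum(math.dist(p, centers[k]) for p in v) / len(v)
        for k, v in clusters.items()
    }

    keys = list(clusters)
    # For each cluster, take the worst (largest) similarity ratio to any other.
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in keys
            if j != i
        )
        for i in keys
    ]
    sizes = [len(v) for v in clusters.values()]
    return {
        "number_of_clusters": len(keys),
        "number_of_singleton_clusters": sum(1 for s in sizes if s == 1),
        "min_points_in_cluster": min(sizes),
        "max_points_in_cluster": max(sizes),
        "davies_bouldin_index": sum(worst) / len(worst),
    }


summary = cluster_summary(
    [(0, 0), (0, 2), (10, 0), (10, 2), (5, 50)],
    [0, 0, 1, 1, 2],
)
```

Lower Davies-Bouldin values indicate clusters that are compact relative to the distance between them.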
Example
The following example shows how to use a Convert Dataset to Structure task on the output of a Label Clustering task.
After importing the dataset with an Import from Text File task, add a Data Manager task, then split the dataset into test and training sets (30% test and 70% training) with the Split Data task.
Then, add a Label Clustering task. Specify the following constraints:
Attributes to consider for clustering:
CustomerID
Annual Income (k$)
Age
Label attributes: Gender
Add a Data Manager task to the Label Clustering task, then save and compute it.
Add a Convert Dataset to Structure task to the previously added Data Manager. As described in the Structure pane, select the structure you want the dataset to be converted into (in this case, Cluster labels).
No further parameters need to be set for this task. Save and compute it.
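For readers who want to reproduce the 30% test and 70% training split from the first step of this example outside the platform, the following is a minimal sketch using only the Python standard library. It is not how the Split Data task is implemented; the function name, the fixed seed, and the random shuffling strategy are assumptions.

```python
import random


def split_train_test(rows, test_fraction=0.3, seed=0):
    """Shuffle rows reproducibly and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled rows form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


# 200 placeholder row indices stand in for the imported dataset.
train, test = split_train_test(range(200))
```

Every row ends up in exactly one of the two sets, and fixing the seed makes the split repeatable.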