Convert Dataset to Structure

The Convert Dataset to Structure task produces a structure from an input dataset. The structures it can produce include replacement rules, autoregressive models, cluster labels, clusters, discretization cutoffs, frequent itemsets, frequent sequences, results, rules, models and PCA eigenvectors. For more information, refer to the data structures page.

Users might want to convert a dataset into a structure for several reasons, for instance:

  • to quickly add many heuristic rules to a flow: the rules are written into a table, which is imported into the flow as a dataset and then converted into a ruleset.

  • to convert structures that were previously converted to a dataset with the Convert Structure to Dataset task back to their original format for in-depth analysis in the Data Manager.

  • to create a model from a dataset, which can then be used in an Apply Model task to compute the model's responses for given samples.
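To make the first use case more concrete, the sketch below shows the general idea of turning table rows into a rule structure. It is not Rulex code: the column names (`rule_id`, `attribute`, `operator`, `value`, `output`), the `Rule` class and the `rows_to_rules` function are hypothetical, and the layout Rulex actually expects for a rules dataset may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical in-memory rule: a list of conditions and one output."""
    rule_id: int
    output: str
    conditions: list = field(default_factory=list)


def rows_to_rules(rows):
    """Group condition rows by rule_id into Rule objects (illustrative layout)."""
    rules = {}
    for row in rows:
        # Create the rule the first time its id is seen, then reuse it.
        rule = rules.setdefault(row["rule_id"], Rule(row["rule_id"], row["output"]))
        rule.conditions.append((row["attribute"], row["operator"], row["value"]))
    return list(rules.values())


# Assumed table layout: one row per condition, rows of the same rule share rule_id.
rows = [
    {"rule_id": 1, "attribute": "Age", "operator": ">", "value": 60, "output": "senior"},
    {"rule_id": 1, "attribute": "Annual Income (k$)", "operator": "<", "value": 30, "output": "senior"},
    {"rule_id": 2, "attribute": "Age", "operator": "<=", "value": 25, "output": "young"},
]
rules = rows_to_rules(rows)
```

In this toy layout the table is the editable form and the list of `Rule` objects is the structured form, which mirrors the dataset-to-ruleset conversion described above.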

The task has two tabs: the Options tab and the Results tab.


The Options tab

The Options tab is made of two panes:

  • Structure pane, where users can select the structure the dataset will be converted into.

  • Information pane, where users can visualize useful information about the task.

Structure pane

Within this pane, users can select the structure the dataset will be converted into. Available options are:

  • Association rules

  • Auto regressive models

  • Clusters

  • Cluster labels

  • Discretization cutoffs

  • Frequent itemsets

  • Frequent sequences

  • Monitor

  • Results

  • Rules

  • Models

  • Pca eigenvectors

Information pane

The Convert Dataset to Structure task has no parameters or options to define: the only operation needed to transform the dataset is to compute the task. When the task is opened, the following message appears: “No parameters need to be set for this task: just compute the task by right-clicking it and selecting Compute > Compute selected”.


The Results tab

Within this tab, users can visualize a summary of the computation.

This tab is divided into two panes:

General Info

Within this pane, users can find the following information:

Result Quantities

Within this pane, users can set and configure the following options:

  • Average dispersion

  • Average dispersion of clusters

  • Average weight

  • Davies-Bouldin index

  • Dispersion of default cluster

  • Inter-clusters distance variance

  • Intra-cluster distance variance

  • Maximum dispersion

  • Maximum number of points in a cluster

  • Minimum dispersion

  • Minimum number of points in a cluster

  • Number of clusters

  • Number of distinct samples

  • Number of samples

  • Number of single samples

  • Number of singleton clusters

These checkboxes are checked by default.

To the right of these checkboxes, a drop-down list lets users visualize the following information:

  • Train

  • Test

  • Valid

  • Whole
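To give a feel for what some of the count-based quantities listed above describe, the following sketch computes them from a plain list of cluster assignments. It is an illustration, not Rulex's implementation: the function name `cluster_count_summary` and the input format (one cluster id per sample) are assumptions, and the exact definitions used by the task may differ.

```python
from collections import Counter


def cluster_count_summary(assignments):
    """Count-based summary of a clustering, given one cluster id per sample."""
    sizes = Counter(assignments)  # cluster id -> number of samples in it
    return {
        "Number of samples": len(assignments),
        "Number of clusters": len(sizes),
        "Maximum number of points in a cluster": max(sizes.values()),
        "Minimum number of points in a cluster": min(sizes.values()),
        "Number of singleton clusters": sum(1 for n in sizes.values() if n == 1),
    }


# Six samples assigned to three clusters; cluster 1 holds a single sample.
summary = cluster_count_summary([0, 0, 1, 2, 2, 2])
```

The dispersion and distance-based quantities (for example the Davies-Bouldin index) also need the sample coordinates, so they are left out of this sketch.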


Example

The following example shows how to use a Convert Dataset to Structure task with the output of a Label Clustering task.

  • After importing the dataset with an Import from Text File task, add a Data Manager task and split the dataset into test and training sets (30% test and 70% training) with the Split Data task.

    Then, add a Label Clustering task. Specify the following constraints:

    • Attributes to consider for clustering:

      • CustomerID

      • Annual Income (k$)

      • Age

    • Label attributes: Gender

    https://cdn.rulex.ai/docs/Factory/convert-dataset-structure-example2.webp
  • Add a Data Manager task to the Label Clustering task, then save and compute it.

    Add a Convert Dataset to Structure task to the previously added Data Manager. As described in the Structure pane section, select the structure you want to convert (in this case, Cluster labels).

    No further parameters need to be set for this task. Save and compute it.

    https://cdn.rulex.ai/docs/Factory/convert-dataset-structure-example1.webp
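For readers who think in code, the 30% test and 70% training split from the first step of the example can be pictured with the sketch below. It only illustrates the idea of a random train/test partition: the function `split_rows`, the fixed seed and the shuffling strategy are assumptions and say nothing about how the Split Data task works internally.

```python
import random


def split_rows(rows, test_fraction=0.3, seed=0):
    """Randomly partition rows into (training, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# 200 placeholder rows stand in for the imported dataset.
training, test = split_rows(range(200))
```

Every row ends up in exactly one of the two sets, so the training and test sets together cover the whole dataset.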