Hierarchical Basket Analysis

The Hierarchical Basket Analysis task generates association rules from frequent itemsets identified by the Frequent Itemsets Mining task.
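To illustrate how association rules can be derived from frequent itemsets, here is a minimal sketch. It assumes that the frequent itemsets are available as a dictionary mapping each itemset to its absolute support, and that a rule is kept when its confidence reaches a threshold. The function `generate_rules` and the `min_confidence` parameter are hypothetical and are not the task's internal interface.

```python
from itertools import combinations


def generate_rules(itemset_support, min_confidence):
    """Derive premise -> consequence rules from frequent itemsets.

    itemset_support maps each frequent itemset (a frozenset) to its support
    count. Every subset of a frequent itemset is itself frequent, so the
    support of any premise can be looked up in the same mapping.
    """
    rules = []
    for itemset, support in itemset_support.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one premise and one consequence
        # Try every non-empty proper subset of the itemset as the premise
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                premise = frozenset(combo)
                consequence = itemset - premise
                confidence = support / itemset_support[premise]
                if confidence >= min_confidence:
                    rules.append((premise, consequence, support, confidence))
    return rules


supports = {
    frozenset({"milk"}): 4,
    frozenset({"bread"}): 5,
    frozenset({"milk", "bread"}): 3,
}
rules = generate_rules(supports, min_confidence=0.7)
# milk -> bread has confidence 3 / 4 = 0.75 and is kept;
# bread -> milk has confidence 3 / 5 = 0.6 and is dropped
```

Because confidence only needs the support of the whole itemset and of the premise, no extra pass over the transactions is required once the frequent itemsets are known.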

The task is divided into three tabs:

- the Options tab, where you can configure the analysis options.
- the Association rules tab, where you can view the generated association rules in a spreadsheet format.
- the Results tab, where the results of the computation are shown.

The Options tab

The Options tab contains all the task’s features that can be customized to obtain the desired output.

It is divided into three tabs: the Basic, the Advanced and the Output tabs.

The Available attributes list is always displayed, no matter which Options tab is opened.

The Basic tab has a full range of options:

In the Advanced tab, you will find two Attribute filters related to the following options:

- Attribute to filter to select rows including relevant items: drag the attribute from the Available attributes list to specify a filtering criterion. Items satisfying this criterion are considered as items to be replaced, regardless of the number of transactions in which the item appears (that is, its support).
- Attribute to filter to discard rows including irrelevant items: drag the attribute from the Available attributes list to specify a filtering criterion. Items satisfying this criterion are discarded, regardless of the number of transactions in which the item appears (that is, its support). If both the selecting and the discarding filters are specified, the discarding filter prevails.
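The precedence between the two filters can be sketched as follows. This is an illustration only: the predicates `select` and `discard` stand in for the two Attribute filters, and the function name and labels are hypothetical. Items matched by neither filter are left to the usual support threshold.

```python
def classify_items(items, select=None, discard=None):
    """Label each item with the filter that decides how it is handled."""
    labels = {}
    for item in items:
        if discard is not None and discard(item):
            labels[item] = "discarded"   # the discarding filter prevails
        elif select is not None and select(item):
            labels[item] = "selected"    # handled regardless of its support
        else:
            labels[item] = "by support"  # left to the support threshold
    return labels


labels = classify_items(
    ["milk", "bread", "candles"],
    select=lambda item: item in {"bread", "candles"},
    discard=lambda item: item == "candles",
)
# candles matches both filters, so it ends up discarded
```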

In the Output tab, you can set the output’s features. The following options are provided:

- No maximum # of premises/consequences: if selected, no limit is imposed on the number of premises and consequences in each rule.
- Minimum number of different attributes involved in each rule: specify the minimum number of different attributes that must be included in each rule.
- Negative rules (NOT A implies B, A implies NOT B): if selected, negative rules are also generated. Negative rules are rules in which the premise(s) or consequence(s) appear in negative form, for instance A implies NOT B or NOT A implies B.
- Maximum Kulczynski value which triggers the check for negative rules: a high value of the Kulczynski index indicates a strong, robust correlation between the premises and consequences of a rule, so the same index can also be used to guide the mining of negative rules. If the Kulczynski index of a rule is low (up to the specified maximum value), the task evaluates whether the rule becomes strong when expressed in negative form (for instance, when the premise is negated). This option is enabled only if the Negative rules (NOT A implies B, A implies NOT B) option is selected.
- Maximum # of premises/consequences, negative rules: the maximum number of premises and consequences of the negative association rules. This option is enabled only if the Negative rules (NOT A implies B, A implies NOT B) option is selected.
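The Kulczynski-driven check for negative rules can be sketched with support counts alone. The sketch uses the standard Kulczynski index, the mean of P(B|A) and P(A|B), and assumes that "strong" means the negative rule reaches a confidence threshold; the function names and the `min_confidence` parameter are hypothetical, and the task may use a different strength criterion.

```python
def kulczynski(s_a, s_b, s_ab):
    """Kulczynski index: mean of P(B|A) and P(A|B), from support counts."""
    return 0.5 * (s_ab / s_a + s_ab / s_b)


def negative_rules(n, s_a, s_b, s_ab, max_kulc, min_confidence):
    """Check the negative forms of the pair (A, B) when Kulczynski is low.

    n is the number of transactions; s_a, s_b and s_ab are the support
    counts of A, of B and of A and B together.
    """
    found = []
    if kulczynski(s_a, s_b, s_ab) > max_kulc:
        return found  # correlation not weak enough: skip the negative check
    # A implies NOT B: share of A-transactions that do not contain B
    conf = (s_a - s_ab) / s_a
    if conf >= min_confidence:
        found.append(("A", "NOT B", conf))
    # NOT A implies B: share of transactions without A that contain B
    if n > s_a:
        conf = (s_b - s_ab) / (n - s_a)
        if conf >= min_confidence:
            found.append(("NOT A", "B", conf))
    return found


# Kulczynski = 0.5 * (4/40 + 4/50) = 0.09, which is below the 0.2 maximum
result = negative_rules(n=100, s_a=40, s_b=50, s_ab=4,
                        max_kulc=0.2, min_confidence=0.8)
# A implies NOT B: 36/40 = 0.9 (kept); NOT A implies B: 46/60 (dropped)
```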

In the Association rules tab, the details about the generated association rules are provided in a spreadsheet format.

The spreadsheet contains the following attributes:

The Results tab provides information about the computation. It is divided into two sections:

The General info area provides the following information:
- Task Label, which is the task’s name on the interface.
- Elapsed time (sec), which indicates how long it took to complete the computation.
The Result Quantities area contains the data quantities: select the quantities you want to view, then click the arrow button to display their values. The following information is provided:
- Minimum support # threshold for items: the minimum support threshold for items applied during the latest computation, as an absolute number. It is divided into Item Support and Item.
- Minimum support threshold for items (percentage): the minimum support threshold for items applied during the latest computation, as a percentage. It is divided into Item Support and Item.
- Number of different items in input: the number of distinct items which were fed to the task during the latest computation. It is divided into Items, Orders, Rules.
- Number of different orders in input: the number of distinct orders which were fed to the task during the latest computation. It is divided into Items, Orders, Rules.
- Number of generated association rules: the number of association rules displayed in the Association Rules tab. It is divided into Items, Orders, Rules.
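The absolute and percentage forms of the minimum support threshold are related through the number of orders. The sketch below shows the conversion in both directions; rounding the absolute threshold up is an assumption (the task may round differently), and the function names are hypothetical.

```python
import math


def absolute_min_support(percent, n_orders):
    """Convert a percentage threshold into a count of orders.

    Rounding up is an assumption: it keeps the count from falling
    below the requested percentage.
    """
    return math.ceil(percent * n_orders / 100)


def percent_min_support(count, n_orders):
    """Convert an absolute support count into a percentage of orders."""
    return 100 * count / n_orders


print(absolute_min_support(2, 500))  # 10
print(percent_min_support(15, 500))  # 3.0
```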

After extracting the frequent itemsets with the Frequent Itemsets Mining task, add the Hierarchical Basket Analysis task to the flow.

Set the following options:

The generated association rules are listed in the Association Rules tab. Each association rule consists of one or more premises and one or more consequences. For instance, if a rule has tropical fruit as its premise and citrus fruit as its consequence, a transaction that includes tropical fruit is also likely to include citrus fruit.
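To make the meaning of such a rule concrete, the sketch below computes support, confidence and lift for tropical fruit implies citrus fruit on a handful of toy baskets. The baskets are invented for illustration, and the standard textbook definitions of the three indicators are assumed; `rule_metrics` is a hypothetical helper.

```python
# Toy baskets, invented for illustration only
baskets = [
    {"tropical fruit", "citrus fruit", "milk"},
    {"tropical fruit", "citrus fruit"},
    {"tropical fruit", "yogurt"},
    {"citrus fruit", "bread"},
    {"milk", "bread"},
]


def rule_metrics(baskets, premise, consequence):
    """Support, confidence and lift of the rule premise -> consequence."""
    n = len(baskets)
    # A set of items is contained in a basket when it is a subset of it (<=)
    n_premise = sum(premise <= b for b in baskets)
    n_consequence = sum(consequence <= b for b in baskets)
    n_both = sum((premise | consequence) <= b for b in baskets)
    support = n_both / n
    confidence = n_both / n_premise
    lift = confidence / (n_consequence / n)
    return support, confidence, lift


support, confidence, lift = rule_metrics(
    baskets, {"tropical fruit"}, {"citrus fruit"})
# support = 2/5 = 0.4, confidence = 2/3, lift = (2/3) / (3/5), about 1.11
```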

Different indicators qualify and quantify the strength of this cross-selling relationship. To view which rules have the highest confidence, right-click on the Confidence column in the Association Rules tab and select Sort Descending.
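Outside the interface, the same "Sort Descending" ordering on the Confidence column amounts to a descending sort on one field. The rules and metric values below are hypothetical, made up only to show the ordering.

```python
from operator import itemgetter

# Hypothetical rules; the metric values are made up for illustration
rules = [
    {"premise": "A", "consequence": "B", "confidence": 0.40, "lift": 1.6},
    {"premise": "B", "consequence": "C", "confidence": 0.75, "lift": 0.9},
    {"premise": "C", "consequence": "A", "confidence": 0.55, "lift": 1.2},
]

# Highest confidence first, like "Sort Descending" on the Confidence column
by_confidence = sorted(rules, key=itemgetter("confidence"), reverse=True)
print([(r["premise"], r["consequence"]) for r in by_confidence])
# [('B', 'C'), ('C', 'A'), ('A', 'B')]
```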
We can now take a few more steps to analyze the extracted rules in greater detail, applying filters and statistical operations to them.

Add a Data Manager task to the flow, importing the rules from the Hierarchical Basket Analysis task (Import from Task), and analyze the rules by filtering them in the Query Manager pane.
In this example, we filtered the rules to keep only those with a Lift value higher than 1.
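A Lift value higher than 1 means that the premise and the consequence occur together more often than they would if they were independent. The equivalent of this filter on a list of rules is sketched below; the rules and their values are hypothetical.

```python
# Hypothetical rules; the metric values are made up for illustration
rules = [
    {"premise": "A", "consequence": "B", "confidence": 0.40, "lift": 1.6},
    {"premise": "B", "consequence": "C", "confidence": 0.75, "lift": 0.9},
    {"premise": "C", "consequence": "A", "confidence": 0.55, "lift": 1.2},
]

# Keep only positively correlated rules (Lift > 1)
positive = [r for r in rules if r["lift"] > 1]
print([(r["premise"], r["consequence"]) for r in positive])
# [('A', 'B'), ('C', 'A')]
```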

Alternatively, you can compute statistics such as the minimum, maximum, average or variance of a rule indicator in the Sheets tab, for example by applying the Variance option from the univariate statistics to the Confidence attribute.
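The same summary statistics can be reproduced on a list of confidence values with Python's standard `statistics` module. The values are hypothetical, and whether the Sheets tab reports the sample or the population variance is not stated here, so both functions are mentioned.

```python
import statistics

# Confidence values of a few hypothetical rules (made up for illustration)
confidences = [0.40, 0.75, 0.55]

summary = {
    "min": min(confidences),
    "max": max(confidences),
    "mean": statistics.mean(confidences),
    # statistics.variance is the sample variance (divides by n - 1);
    # statistics.pvariance gives the population variance instead
    "variance": statistics.variance(confidences),
}
```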