Reshape to Long

The Reshape to Long pivoting operation reshapes a dataset by transforming a set of similar attributes into a new pair of attributes: one containing the name of the original column, which is removed, and one containing its value.

Consequently, the number of attributes in the dataset decreases and the number of rows increases accordingly.
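The effect on the shape of the table can be illustrated outside Rulex Factory with a minimal pandas sketch. This is only an analogy, not how the task is implemented: the table, its column names (id, q1, q2, q3) and the output names (attribute, value) are all hypothetical.

```python
import pandas as pd

# Hypothetical wide table: one row per key, three similar attributes.
wide = pd.DataFrame({
    "id": [1, 2],
    "q1": [10, 40],
    "q2": [20, 50],
    "q3": [30, 60],
})

# Turn q1..q3 into a pair of columns: the original column name and its value.
long = wide.melt(
    id_vars="id",
    value_vars=["q1", "q2", "q3"],
    var_name="attribute",   # name of the removed column
    value_name="value",     # value that the column held
)

print(wide.shape)  # (2, 4): 2 rows, 4 attributes
print(long.shape)  # (6, 3): 6 rows, 3 attributes
```

Three attributes are replaced by two, and each original row is repeated once per transformed attribute.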

Rulex Factory’s reshaping operations are similar to creating pivot tables in Microsoft Excel, where you can create a new table with the structure you require by summarizing the data from the original table. However, the Reshape to Long task reshapes the original table without creating a second table. The resulting structure is determined automatically: the number of rows and columns increases or decreases, and some rows are repeated to fit the new shape if necessary.

The task has a single tab, the Options tab, where you can configure it according to your requirements.

You will find more information on how to configure this task in the next section.


The Options tab

The Options tab is divided into two main areas:

  • The Available attributes list, where you can uncheck the attributes you want to remove (for more information, see the corresponding page).

  • The configuration area, where you will find the following configuration options:

    • The Attributes to be transformed in long format: drag and drop the attributes that will become a single column in the resulting data table. Instead of dragging and dropping attributes manually, you can also define them via a filtered list.

    • The Number of long attributes in the final table: specify how many columns the combined values should be spread over. By default, the values of all the Attributes to be transformed in long format are inserted in a single column (with an extra column specifying which attribute each value refers to).

    • The Contiguous attributes in a group belong to the same final long attribute: if selected, contiguous attributes are assigned to the same final column; otherwise, attributes are assigned alternately.

    • The Keep at least one row for each key: if selected, at least one row is displayed for each key, even if it contains only missing values.
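The difference between contiguous and alternate assignment can be sketched in plain Python. This reflects one reading of the option descriptions above, not a confirmed account of the task's internal behavior. The function split_into_groups, its parameters and the attribute names are hypothetical.

```python
def split_into_groups(attributes, n_groups, contiguous=True):
    """Assign the selected attributes to n_groups final long columns.

    Illustrative only: an assumed interpretation of the
    'Number of long attributes' and 'Contiguous attributes' options.
    """
    if n_groups < 1 or len(attributes) % n_groups != 0:
        raise ValueError("attribute count must be a multiple of n_groups")
    size = len(attributes) // n_groups
    if contiguous:
        # Neighbouring attributes share the same final column.
        return [attributes[i * size:(i + 1) * size] for i in range(n_groups)]
    # Otherwise attributes are dealt out in turn, one per final column.
    return [attributes[i::n_groups] for i in range(n_groups)]


attrs = ["price_1", "price_2", "qty_1", "qty_2"]

print(split_into_groups(attrs, 2, contiguous=True))
# [['price_1', 'price_2'], ['qty_1', 'qty_2']]

print(split_into_groups(attrs, 2, contiguous=False))
# [['price_1', 'qty_1'], ['price_2', 'qty_2']]
```

Under this reading, the contiguous setting suits tables where similar attributes sit next to each other, while the alternate setting suits tables where they are interleaved.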


Example

The following example uses the Groceries dataset.

  • After importing the file onto the stage, drag a Reshape to Long task onto the stage and link it to the source task.

https://cdn.rulex.ai/docs/Factory/reshapetolong-1.webp
  • Double-click on the task to open it and set the reshaping parameters. In the example here, we want our dataset to be reshaped to two columns, one with the order ID, and one with the products. Select all the Item attributes and drag them onto the Attributes to be transformed in long format area. Save and compute the task.

https://cdn.rulex.ai/docs/Factory/reshape-to-long-example-2.webp
  • Add a Data Manager to check the results and link it to the Reshape to Long task. Then, double-click on it. As you can see, the selected attributes have been transformed into the following columns:

    • long_1, which contains the labels of the items (i.e. item 1, item 2 and so on)

    • wide_1, which contains the values.

https://cdn.rulex.ai/docs/Factory/reshape-to-long-example-3.webp
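For readers who want to reproduce a comparable result outside Rulex Factory, the following pandas sketch reshapes a small stand-in table in the same way. The data and the column names Order ID, Item 1 and Item 2 are hypothetical and are not taken from the actual Groceries file. Only the output names long_1 and wide_1 mirror the example above. The sketch also assumes that rows with a missing value are dropped unless Keep at least one row for each key is selected, which the option description suggests but does not state.

```python
import pandas as pd

# Hypothetical stand-in for the Groceries data; column names are assumed.
orders = pd.DataFrame({
    "Order ID": [1, 2, 3],
    "Item 1": ["milk", "bread", None],
    "Item 2": ["eggs", None, None],
})

item_cols = [c for c in orders.columns if c.startswith("Item")]

# long_1 receives the original column label, wide_1 receives the value.
long = orders.melt(
    id_vars="Order ID",
    value_vars=item_cols,
    var_name="long_1",
    value_name="wide_1",
)

# Assumed default: rows whose value is missing are dropped.
filled = long.dropna(subset=["wide_1"])

# Assumed effect of "Keep at least one row for each key":
# keys that lost every row get a single row back, with missing values.
lost_keys = orders.loc[~orders["Order ID"].isin(filled["Order ID"]), ["Order ID"]]
kept = (
    pd.concat([filled, lost_keys], ignore_index=True)
    .sort_values("Order ID", kind="stable")
    .reset_index(drop=True)
)

print(filled)
print(kept)
```

In this sketch, order 3 has no items, so it disappears from filled but is retained in kept as a single row with missing values.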