Import File Info

The Import File Info task collects statistics on folders and/or the files they contain.

Its output is a dataset containing information on the analyzed files and folders, plus the extra details selected in the task parameters. The output dataset is described in detail in the corresponding section of this page.

The task layout consists of a single tab, the Options tab, described below.


The Options tab

The Options tab can be divided into three areas:

  • the Source area, where the source containing the files/folders to be analyzed can be specified. More information on the Source area can be found in the corresponding page.

  • the Statistic extra fields area, where users can choose which additional details are added to the output dataset, each in a dedicated column:
    • File extension

    • Size (in bytes)

    • Last modified date

    • Creation date

    • Owner

    • Reader group

    • Permissions

  • the folder details area, made of the following options:
    • Consider subfolders in a recursive way: if selected, the details of the files contained in the subfolders of the folder specified in the Source area, at any depth, are also added to the output dataset.

    • Insert rows also for folder stats: if selected, a row containing the details of each folder is added to the output dataset. In addition, a new column called Folder is added, containing True if the row describes a folder and False if it describes a file.
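The effect of the two folder options can be illustrated with a minimal Python sketch that lists a local directory using only the standard library. This is not Rulex Factory's implementation: the function name `list_entries`, its `recursive` and `include_folders` parameters (standing in for Consider subfolders in a recursive way and Insert rows also for folder stats) and the row layout are hypothetical, chosen only to show which rows each option produces.

```python
import os
from typing import Dict, List, Union

Row = Dict[str, Union[str, bool]]


def list_entries(root: str, recursive: bool = False,
                 include_folders: bool = False) -> List[Row]:
    """Hypothetical sketch: one row per file, plus one per folder if requested."""
    rows: List[Row] = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place also fixes the traversal order
        if include_folders:
            # One row for each subfolder found in the current folder
            for d in dirnames:
                rows.append({"Location": dirpath, "Name": d, "Folder": True})
        for f in sorted(filenames):
            row: Row = {"Location": dirpath, "Name": f}
            if include_folders:
                row["Folder"] = False  # the Folder column exists only with this option
            rows.append(row)
        if not recursive:
            break  # stop after the folder given as source
    return rows
```

With both parameters left at `False`, only the files directly inside the source folder are returned and no Folder column appears.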

Warning

The Size of a folder may be 0 even when the folder contains files, because the reported value depends on how the filesystem records it. For example, Google Drive folders have size 0, while folders on the local filesystem have a size equal to the sum of the sizes of the files they contain.
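When a filesystem does not report a meaningful folder size, the value described above for the local filesystem (the sum of the sizes of the contained files) can be computed directly. The following Python sketch does this for a local directory with the standard library; the function name `folder_size` is hypothetical, and this page does not state that Rulex Factory computes the value this way.

```python
import os


def folder_size(path: str) -> int:
    """Hypothetical helper: total size in bytes of all files under `path`."""
    total = 0
    # os.walk does not descend into symlinked folders by default
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):  # skip file symlinks to avoid double counting
                total += os.path.getsize(full)
    return total
```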


The output dataset

As mentioned above, the output of the task is a dataset whose columns vary according to the selected Statistic extra fields.

The dataset always contains at least two columns:

  • Location: the location of the file/folder.

  • Name: the name of the file/folder.

All the other columns depend on the options chosen when configuring the task. However, some of them may be empty even if they were selected in the Statistic extra fields.

This is because the available information depends on the filesystem: if the filesystem does not record a specific detail, the corresponding cells are left empty.
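The same behavior can be observed on a local filesystem with Python's standard library. The sketch below collects the optional details for a single path and returns `None` (an empty cell) when the platform does not record a detail. The function name `extra_fields` and the value formats (extension without the leading dot, timestamps as ISO strings, permissions in `ls -l` style) are assumptions for illustration, not the formats used by Rulex Factory.

```python
import datetime
import os
import stat
from typing import Any, Dict, Optional


def iso(ts: Optional[float]) -> Optional[str]:
    """Convert a POSIX timestamp to an ISO string, keeping None as None."""
    return datetime.datetime.fromtimestamp(ts).isoformat() if ts is not None else None


def extra_fields(path: str) -> Dict[str, Any]:
    """Hypothetical sketch: optional details for one local path, None if unavailable."""
    st = os.stat(path)

    # Creation time is exposed as st_birthtime only on some platforms
    # (e.g. macOS, and Windows from Python 3.12); elsewhere it stays empty.
    created = getattr(st, "st_birthtime", None)

    try:
        import pwd  # Unix-only module, missing on Windows
        owner = pwd.getpwuid(st.st_uid).pw_name
    except (ImportError, KeyError):
        owner = None  # not recorded or not resolvable on this platform
    try:
        import grp  # Unix-only module, missing on Windows
        group = grp.getgrgid(st.st_gid).gr_name
    except (ImportError, KeyError):
        group = None

    return {
        "Extension": os.path.splitext(path)[1].lstrip(".") or None,
        # For folders, st_size is the size of the directory entry, not of its contents
        "Size": st.st_size,
        "Modified": iso(st.st_mtime),
        "Created": iso(created),
        "Owner": owner,
        "Group": group,
        "Permission": stat.filemode(st.st_mode),
    }
```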

The table below lists all the possible columns of the output dataset, together with all the filesystems supported by Rulex Factory. A marked cell (X) means that the corresponding filesystem provides that information.

Information

Location

Name

Extension

Folder

Size

Modified

Created

Owner

Group

Permission

Azure Storage

FTP

Google Drive

Local

Outlook

S3

SFTP

Sharedrive

SharePoint

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X

X (Windows only)

X

X

X

X

X

X

X

X

X

X

X

X

X

X


Example

The following example does not use any input dataset.

  • Add an Import File Info task to the flow and open it.

  • Configure the Stat Source: in this example, a Saved connection to a Filesystem (SharePoint) is used.

  • Check all the options contained in the Statistic extra fields area.

  • Save and compute the task.

https://cdn.rulex.ai/docs/Factory/import-file-info-example-1.webp
  • Add a Data Manager task and link it to the Import File Info task.

  • Double-click the task to open it:
    • The Owner, Group and Permission columns are empty in the output dataset, even though they were selected when configuring the task, because SharePoint does not store these details.

https://cdn.rulex.ai/docs/Factory/import-file-info-example-2.webp
  • Change the Import File Info task configuration by selecting the Consider subfolders in a recursive way option.

  • Save and compute the task.

  • Add a Data Manager task and open it:
    • Since this option is selected, the files in the subfolders have been added: the dataset now has 191 rows, compared with 126 rows in the previous screenshot.

https://cdn.rulex.ai/docs/Factory/import-file-info-example-3.webp
  • Change the Import File Info task configuration by keeping the Consider subfolders in a recursive way option selected and also selecting the Insert rows also for folder stats option.

  • Save and compute the task.

  • Add a Data Manager task and open it:
    • The dataset now has 210 rows, as the rows describing folders have been added. It also has 10 columns, since the Folder column, which indicates whether each row describes a folder, has been added.

https://cdn.rulex.ai/docs/Factory/import-file-info-example-4.webp
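The row and column counts reported in the three runs are consistent with each other. As a quick check, the short Python snippet below derives how many rows each option added; the counts come from the screenshots of this example, and the breakdown of the 10 columns assumes the two mandatory columns, the seven Statistic extra fields and the Folder column.

```python
# Row counts reported in the three runs of the example
base_rows = 126       # source folder only, all Statistic extra fields selected
recursive_rows = 191  # + Consider subfolders in a recursive way
folder_rows = 210     # + Insert rows also for folder stats

files_in_subfolders = recursive_rows - base_rows  # rows added by recursion
folder_stat_rows = folder_rows - recursive_rows   # rows describing folders

# Location and Name, the seven Statistic extra fields, and the Folder column
columns = 2 + 7 + 1

print(files_in_subfolders, folder_stat_rows, columns)  # prints: 65 19 10
```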