Import from Text File

The Import from Text File task imports data directly from a text file and lets you define the basic parsing options.

The Import from Text File task is divided into two tabs: the Options tab (please refer to the Import Overview page for more detailed information) and the Text Configuration tab, whose characteristics and configuration are explained in the next section.


The Text Configuration tab

The Text Configuration tab is divided into three panes:

Parsing options

Within this pane, users can set and configure the following options:

  • Data separators: the available options are TABBING, COMMA, SEMICOLON, SPACE, OTHER.

  • Number separators: this option consists of two drop-down lists, Thousands separator and Decimals separator.

  • Missing string: enter the string you want to remove from the dataset.

  • Text delimiter: select ' or " if these symbols have been used as string delimiters. The delimiters will not be included in the imported data. For example, the string "apartment" will be imported as apartment. This option removes all instances of the text delimiter in the string, not only the opening and closing symbols. The only exception is a delimiter preceded by a backslash. For example, "ad\"cb" will be imported as ad"cb, while "ad"cb" will be imported as adcb. Values enclosed in text delimiters have the nominal data type, and removing the delimiters does not change it. For example, "3" will be imported as 3 but will remain a nominal value instead of being converted to an integer.

  • Use contiguous separators as a single one: select the checkbox to make the parser treat any group of adjacent separators as a single separator. For example, with the comma as separator, the string ‘1,2,,,3’ will be parsed as 1, 2, 3 if the option is selected, and as 1, 2, ‘’, ‘’, 3 if it is not.
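The behavior of the Text delimiter and Use contiguous separators as a single one options can be illustrated with a minimal Python sketch. This is not the task's actual implementation: the function names `strip_text_delimiters` and `split_fields` are hypothetical, and the sketch only reproduces the examples given in the list above.

```python
import re


def strip_text_delimiters(value: str, delimiter: str = '"') -> str:
    """Remove every delimiter from value, except those preceded by a backslash.

    Hypothetical helper: it mimics the documented examples only.
    The result stays a string, mirroring the fact that delimited
    values keep the nominal type.
    """
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value) and value[i + 1] == delimiter:
            out.append(delimiter)  # escaped delimiter is kept, backslash is dropped
            i += 2
        elif ch == delimiter:
            i += 1  # unescaped delimiter is removed, wherever it appears
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def split_fields(line: str, separator: str = ",", merge_contiguous: bool = False) -> list[str]:
    """Split a line on separator, optionally treating runs of separators as one."""
    if merge_contiguous:
        return re.split(re.escape(separator) + "+", line)
    return line.split(separator)


print(strip_text_delimiters('"ad\\"cb"'))                # ad"cb
print(strip_text_delimiters('"ad"cb"'))                  # adcb
print(split_fields("1,2,,,3", merge_contiguous=True))    # ['1', '2', '3']
print(split_fields("1,2,,,3"))                           # ['1', '2', '', '', '3']
```

Note that `strip_text_delimiters('"3"')` returns the string `'3'`, not the integer 3, which corresponds to the value remaining nominal after import.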

Import options

Within this pane, users can set the following options:

  • Start importing from line: the number of the line from which the importing operations will start.

  • Stop importing at line (0 means all): the number of the line where the importing operations will end. Leave the value 0 if you want the whole dataset to be imported.

  • Get names from line: the number of the line from which the column names will be taken.

  • Get types from line: the number of the line from which the attributes’ types will be taken.

  • Column to be imported (empty for all): the numbers of the columns to be imported. If left empty, all the columns will be imported.

  • Remove empty rows: select the checkbox if you want to remove the empty rows from the imported dataset.

  • Add an attribute containing filename: select this option to add an extra column with the name of the file to the dataset.

  • Strip spaces: select this option if you want to remove spaces surrounding strings. For example, the string “ class “ will be imported as “class”.

  • Case sensitive: select this checkbox if you want values written in uppercase letters to be considered different from the same values written in lowercase letters.

  • Compress white spaces: select the checkbox to remove extra consecutive spaces from within strings. For example, the string “university    program” (with several spaces between the two words) would be imported as “university program”.

  • Turn off smart type recognition: if selected, prevents automatic recognition of data types, leaving the generic nominal type. This option is useful when manual identification is preferable, for example when there is the risk of a code being misinterpreted as a date.

  • Remove empty columns: select the checkbox if you want to remove the empty columns from the imported dataset.
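Some of the import options above can be sketched in a few lines of Python to make their effect concrete. This is an illustration only, not the task's implementation: all function names are hypothetical, and treating the stop line as inclusive and line numbers as 1-based are assumptions that the text above does not confirm.

```python
import re


def select_lines(lines, start=1, stop=0):
    """Keep lines from start to stop; stop=0 means up to the last line.

    Assumption: line numbers are 1-based and the stop line is included.
    """
    end = len(lines) if stop == 0 else stop
    return lines[start - 1:end]


def strip_spaces(value):
    """Remove the spaces surrounding a string (Strip spaces)."""
    return value.strip(" ")


def compress_spaces(value):
    """Replace runs of consecutive spaces with one space (Compress white spaces)."""
    return re.sub(r" {2,}", " ", value)


def remove_empty_rows(rows):
    """Drop rows in which every cell is empty (Remove empty rows)."""
    return [row for row in rows if any(cell != "" for cell in row)]


print(select_lines(["a", "b", "c", "d"], start=2, stop=3))    # ['b', 'c']
print(strip_spaces(" class "))                                # class
print(compress_spaces("university    program"))               # university program
print(remove_empty_rows([["1", "2"], ["", ""], ["3", ""]]))   # [['1', '2'], ['3', '']]
```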

Table Preview

Within this pane, users will be able to visualize a preview of their imported data.

On the right of the Table Preview pane, users can find the Number of records in preview spin box.


Example

  • Drag an Import from Text File task onto the stage and select the file you want to import.

  • Configure the task as explained in the sections above and link it to a Data Manager.

  • Depending on the selected text file, your Import from Text File task should look like the example provided below.

https://cdn.rulex.ai/docs/Factory/import_textfile_ex.webp