Import from Word File

The Import from Word File task allows users to import data stored in an MS Word file.

The Import from Word File task is divided into two tabs:


The Options tab

The Options tab follows the structure shown in the Import Overview page.


The Word Configuration tab

The Word Configuration tab is divided into three panes:

Parsing options

Users can transform the chosen data into a more readable format. The following options are available:

  • Data separators: the character that delimits the values of the data to be imported. Users can select one of the following options:

    • TABBING

    • COMMA

    • SEMICOLON

    • SPACE

    • Other

  • Number separators: users can define the symbols that mark thousands and decimals in numbers by selecting them from the Thousands and Decimals drop-down lists.

  • Missing string: users can enter the string they want to remove from the dataset.

  • Text delimiter: select ‘ or “ if these symbols have been used as string delimiters. They will not be included in the imported file.

  • Use contiguous separators as a single one: if selected, the parser treats any group of adjacent separators as a single separator. For example, with the comma as a separator, the string ‘1,2,,,3’ is parsed as 1, 2, 3 when this option is selected, and as 1, 2, ‘’, ‘’, 3 when it is not.
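
The effect of these parsing options can be illustrated with a short Python sketch. It is not the task's actual implementation: the function names split_row and parse_number and their parameters are hypothetical, chosen only to mirror the options above. The text delimiter handling is deliberately simplified, since it only strips delimiters wrapped around a value and does not handle separators embedded in delimited text.

```python
import re


def split_row(line, separator=",", contiguous_as_one=False, text_delimiter=None):
    """Split one line into values (hypothetical helper, for illustration only)."""
    if contiguous_as_one:
        # Treat any run of adjacent separators as a single separator.
        parts = re.split(f"(?:{re.escape(separator)})+", line)
    else:
        # Every separator starts a new value, so adjacent separators
        # produce empty strings.
        parts = line.split(separator)

    if text_delimiter:
        # Drop the delimiter characters wrapped around a value.
        parts = [
            p[1:-1]
            if len(p) >= 2 and p[0] == text_delimiter and p[-1] == text_delimiter
            else p
            for p in parts
        ]
    return parts


def parse_number(text, thousands=".", decimals=","):
    """Convert a string to a float using the chosen separator symbols."""
    cleaned = text.replace(thousands, "").replace(decimals, ".")
    return float(cleaned)


print(split_row("1,2,,,3", contiguous_as_one=True))   # ['1', '2', '3']
print(split_row("1,2,,,3", contiguous_as_one=False))  # ['1', '2', '', '', '3']
print(parse_number("1.234,56"))                       # 1234.56
```

The two split_row calls reproduce the ‘1,2,,,3’ example given for the Use contiguous separators as a single one option.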

Import options

Within this pane, users will find the following options:

  • Starting importing from line: the number of the line from which the importing operations will start.

  • Stop importing at line: the number of the line where the importing operations will end. Leave the value 0 if you want the whole dataset to be imported.

  • Get names from line: the number of the line from which the names of the columns will be taken.

  • Get types from line: the number of the line from which the data types will be taken.

  • Column to be imported (empty for all): the number of columns to be imported. If left empty, all the columns will be imported.

  • Remove empty rows: if selected, it removes the empty rows from the imported dataset.

  • Add an attribute containing filename: if selected, it adds an extra column with the name of the file to the dataset.

  • Remove empty columns: if selected, it removes the empty columns from the imported dataset.

  • Case sensitive: if selected, uppercase letters are considered different from lowercase letters.

  • Strip spaces: if selected, it removes the leading and trailing spaces from strings. For example, the string “ class ” will be imported as “class”.

  • Turn off smart type recognition: if selected, prevents automatic recognition of data types. This option is useful when manual identification is preferable, for example when there is the risk of a code being misinterpreted as a date. However, if data types have been specifically defined in incoming MS Excel files, these data types will be maintained, even when the Turn off smart type recognition option has been selected.

  • Compress white spaces: if selected, it compresses contiguous occurrences of white spaces into a single occurrence. For example, the string “university    program” would be imported as “university program”.
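
As a rough illustration of how some of these import options act on the data, the following Python sketch mimics the line range, Strip spaces, Compress white spaces and Remove empty rows options. The helper names are hypothetical, and the sketch assumes that line numbers are 1-based and that the stop line is included in the import; neither assumption is confirmed by this page.

```python
import re


def select_lines(lines, start=1, stop=0):
    """Keep lines start..stop (assumed 1-based and inclusive).

    A stop value of 0 means that all remaining lines are imported.
    """
    end = len(lines) if stop == 0 else stop
    return lines[start - 1:end]


def clean_value(value, strip_spaces=False, compress_spaces=False):
    """Apply the Strip spaces and Compress white spaces options to one value."""
    if compress_spaces:
        # Collapse each run of white space into a single space.
        value = re.sub(r"\s+", " ", value)
    if strip_spaces:
        # Remove leading and trailing white space.
        value = value.strip()
    return value


def remove_empty_rows(rows):
    """Drop rows in which every value is an empty string."""
    return [row for row in rows if any(v != "" for v in row)]


lines = ["name", "alpha", "beta", "gamma"]
print(select_lines(lines, start=2, stop=3))  # ['alpha', 'beta']
print(select_lines(lines, start=2, stop=0))  # ['alpha', 'beta', 'gamma']
print(clean_value(" class ", strip_spaces=True))                      # class
print(clean_value("university    program", compress_spaces=True))     # university program
print(remove_empty_rows([["a", "b"], ["", ""], ["c", ""]]))           # [['a', 'b'], ['c', '']]
```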


Example

  • Drag an Import from Word File task onto the stage.

  • Double-click the task to open it.

  • Move the Source slider to Custom.

  • Select Local File System from the drop-down menu.

  • Click Select and browse to the file you want to import.

  • Configure the selected task as explained above, and if needed, change its configuration in the Word Configuration tab.

  • Depending on the selected Microsoft Word file, your Import from Word File task should look similar to the example provided below.

https://cdn.rulex.ai/docs/Factory/import-word.webp