Import from Word File

The Import from Word File task in Rulex Factory allows users to import data stored in a Word file, whether or not the data has a table layout.

The Import from Word File task is divided into two tabs: the Options tab (refer to the Import Overview page for details) and the Word Configuration tab, which is described in the next section.


The Word Configuration tab

The Word Configuration tab is divided into three panes:

Parsing options

Users can transform the chosen data into a more readable format. The following options are available:

  • Data separators: the available options are TABBING, COMMA, SEMICOLON, SPACE, OTHER.

  • Number separators: this option is split into two drop-down lists, Thousands separator and Decimals separator.

  • Missing string: enter the word you want to remove from the dataset.

  • Text delimiter: select ‘ or “ if these symbols have been used as string delimiters. They will not be included in the imported file.

  • Use contiguous separators as a single one: select the checkbox if you want to force the parser to consider any possible group of adjacent separators as one in text files. For example, if you select this option, the string ‘1,2,,,3’, with the comma as a separator, will be parsed as 1, 2, 3, while if not checked it will be parsed as 1, 2, ‘’, ‘’, 3.
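The parsing behavior described above can be illustrated with a minimal Python sketch. It is not Rulex code: the function names `split_fields` and `parse_number` and their parameters are hypothetical, chosen only to mirror the Use contiguous separators as a single one and Number separators options.

```python
import re


def split_fields(line: str, sep: str = ",", merge_contiguous: bool = False) -> list[str]:
    """Split a line on `sep` (hypothetical helper, not part of Rulex)."""
    if merge_contiguous:
        # One or more adjacent separators are treated as a single delimiter.
        return re.split(f"(?:{re.escape(sep)})+", line)
    # Every separator delimits a field, so adjacent separators yield empty fields.
    return line.split(sep)


def parse_number(text: str, thousands: str = ".", decimal: str = ",") -> float:
    """Convert a number written with custom separators (hypothetical helper)."""
    # Drop the thousands separator, then normalize the decimal separator to ".".
    return float(text.replace(thousands, "").replace(decimal, "."))


print(split_fields("1,2,,,3", merge_contiguous=True))   # ['1', '2', '3']
print(split_fields("1,2,,,3", merge_contiguous=False))  # ['1', '2', '', '', '3']
print(parse_number("1.234,56"))                         # 1234.56
```

With the option enabled, the run of three commas collapses into one separator; with it disabled, each extra comma produces an empty field, matching the ‘1,2,,,3’ example above.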

Import options

Within this pane, users will find the following options:

  • Starting importing from line: select the number of the line from which the importing operations will start.

  • Stop importing at line: select the number of the line where the importing operations will stop. Leave the value at 0 to import the whole dataset.

  • Get names from line: select from this spin box which line in the dataset contains the column header names.

  • Get types from line: select from this spin box which row contains the data type values users would like to set for the column.

  • Column to be imported (empty for all): the number of columns to be imported. If left empty, all the columns will be imported.

  • Remove empty rows: select the checkbox if you want to remove the empty rows from the imported dataset.

  • Add an attribute containing filename: select this checkbox to add an extra column with the name of the file to the dataset.

  • Remove empty columns: select this checkbox if you want to remove the empty columns from the imported dataset.

  • Case sensitive: select this checkbox if you want uppercase values to be considered different from lowercase ones.

  • Strip spaces: select this option if you want to remove spaces surrounding strings. For example, the string “ class “ will be imported as “class”.

  • Turn off smart type recognition: if selected, prevents automatic recognition of data types, leaving the generic nominal type. This option is useful when manual identification is preferable, for example when there is the risk of a code being misinterpreted as a date. However, if data types have been specifically defined in incoming MS Excel files, these data types will be maintained, even when the Turn off smart type recognition option has been selected.

  • Compress white spaces: select this option to remove extra consecutive spaces from within strings. For example, the string “university    program” would be imported as “university program”.
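As a rough illustration of three of the options above (Strip spaces, Compress white spaces, and the start/stop line range), here is a minimal Python sketch. It is not Rulex code: all function names are hypothetical, and it assumes that line numbers are 1-based and that the stop line is included in the import, which the text above does not state.

```python
import re


def strip_spaces(value: str) -> str:
    """Remove the spaces surrounding a string (hypothetical helper)."""
    return value.strip()


def compress_white_spaces(value: str) -> str:
    """Collapse runs of consecutive spaces inside a string into one space."""
    return re.sub(r" {2,}", " ", value)


def select_lines(lines: list[str], start: int = 1, stop: int = 0) -> list[str]:
    """Keep lines from `start` to `stop`.

    Assumption: lines are counted from 1 and `stop` is inclusive.
    A `stop` of 0 means "import up to the end of the dataset".
    """
    end = len(lines) if stop == 0 else stop
    return lines[start - 1:end]


print(repr(strip_spaces(" class ")))                    # 'class'
print(compress_white_spaces("university    program"))   # university program
print(select_lines(["h", "a", "b", "c"], start=2))      # ['a', 'b', 'c']
```

The two string helpers are independent, so a value such as “ university    program ” would need both applied to become “university program”.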

Table Preview

This pane displays a preview of the imported tables.

On the right of the Table preview pane, users can find the Number of records in preview spin box.


Example

  • After having imported a Microsoft Word file onto the stage, drag an Import from Word task onto the stage and link it to the source task.

  • Double-click the task to open it, then configure it as explained in the Options tab section (refer to the Import Overview page) and in the Word Configuration tab section.

  • Depending on the selected Microsoft Word file, your Import from Word task should look similar to the example provided below.

https://cdn.rulex.ai/docs/Factory/import-word.webp