Changes between Rulex 4 and Rulex Factory

This page summarizes all the changes implemented between Rulex 4, the company's previous software, and Rulex Factory, the new component of Rulex Platform.

This document focuses only on the differences in options and computation behavior. The user experience has been completely redesigned: the new component is in fact new software, which has its roots in Rulex 4. Rulex Factory is part of a bigger project called Rulex Platform.

This list is meant to be a guide for the migration and the transition from the old Rulex 4 process to the new Factory flow. The migration is designed to be effortless and is performed automatically by the Rulex Factory Flow import routine.

However, some critical situations need to be understood and solved in Rulex 4 directly before the import operation in the new software.

These situations are listed in the section Changes not mitigated or corrected by the converter.

Changes managed automatically by import converter

The modifications required by Rulex Factory listed here are automatically imposed by the process-flow converter used in the import operation when a prcx file is selected:

In Rulex 3.2, the functions currdate and tostring were available to evaluate the current date and to cast the result to string (an intermediate attribute type, different from nominal, which has been removed in Rulex 4). While Rulex 4 exceptionally allows the use of these two functions in the procvars option of module tasks (Execute Process File task or Rulex Process File Source task), in the import operation Rulex Factory converts these two functions to currDate and cast functions, respectively.
In Rulex Factory, comment lines before any history rows (the ones starting with //DATASET OPERATION) are used to automatically build the visual description. Therefore, their presence is mandatory. Since some operations in Rulex 4 did not have these comment lines, the Rulex Factory converter adds them when needed. Namely:
- //DATASET OPERATION\n//REMOVE ROW is added to Rulex 3.2 removeRow operation.
- //DATASET OPERATION\n or //RULESET OPERATION\n is added where missing.
In Rulex 4, in the history, the data structure for rules is mentioned as rules. Rulex Factory converter changes it to ruleset to align it with the dataset naming, which was already present in Rulex 4 for data.
In Rulex 4, the status of the history lines is not reset when the task is disconnected from the source task. This may lead to misleading results by using a combination of Import from Task tasks and the computation of part of the history. Therefore, in Rulex Factory the status is always updated when a computed task is disconnected. During the conversion from prcx to rfl, the status of the history lines belonging to not computed tasks is set to ready.
The following options have been removed, as they are mainly unused Rulex 3.2 options or Bridge options that are no longer useful in the new bridge task configuration:
- compat32
- filenamefrc
- delimfrc
- alertstart
- alertend
- alerterror
- alertrecipients
- alerterrortype
- alertduration
- alertdurationtn
- debugfilename
- language
- rhostname
- report
- rportaux
- routputname
- rinputname
- scriptfromfile
- rulboundselrows
The converter deletes these options during the import operation, if they are present.
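
As a rough illustration of this clean-up step, the following Python sketch drops the removed options from a task's option set. It is not the converter's actual code: representing options as a plain dictionary, the helper name `strip_removed_options`, and the sample values are assumptions made for the example.

```python
# Options dropped on import (names taken from the list above).
REMOVED_OPTIONS = frozenset({
    "compat32", "filenamefrc", "delimfrc", "alertstart", "alertend",
    "alerterror", "alertrecipients", "alerterrortype", "alertduration",
    "alertdurationtn", "debugfilename", "language", "rhostname", "report",
    "rportaux", "routputname", "rinputname", "scriptfromfile",
    "rulboundselrows",
})


def strip_removed_options(options: dict) -> dict:
    """Return a copy of the options without the removed entries."""
    return {name: value for name, value in options.items()
            if name not in REMOVED_OPTIONS}


# Hypothetical Rulex 4 task options, for illustration only.
legacy = {"filelist": ["data.csv"], "compat32": False, "language": "en"}
cleaned = strip_removed_options(legacy)
print(cleaned)  # {'filelist': ['data.csv']}
```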

In Rulex 4, graphical drop-down menu options are sometimes associated with string values (for example, the uri option in import/export tasks) and sometimes with numbers representing the position of the entry in the list. In the latter case, if the list has two entries, the binary values False / True are also permitted, since they are automatically converted to the integers 0 / 1. Consequently, adding a new entry to the list can change the position numbers, thus affecting the execution of existing processes. For this reason, every drop-down menu option in Rulex Factory is now associated with a list of string values. The converter automatically changes the options according to the table below:

Option	Value mapping
byname	`0` → `position` `1` → `name`
cattype	`0` → `inner` `1` → `outer`
oplist	`0` → `X` `1` → `=` `2` → `!=` `3` → `<` `4` → `<=` `5` → `>` `6` → `>=` `17` → `substr` `18` → `superstr` `19` → `not_substr` `20` → `not_superstr` `21` → `begin` `22` → `is_begin` `23` → `not_begin` `24` → `is_not_begin` `25` → `end` `26` → `is_end` `27` → `not_end` `28` → `is_not_end` `29` → `damerau_levenshtein_<=` `30` → `levenshtein_<=` `31` → `hamming_<=` `32` → `long_common_substr_<=` `33` → `damerau_levenshtein_>` `34` → `levenshtein_>` `35` → `hamming_>` `36` → `long_common_substr_>` `37` → `is_anagram` `38` → `is_word` `39` → `include_word` `40` → `primary_phonetic` `41` → `secondary_phonetic` `42` → `some_phonetic_common` `43` → `both_phonetic_common` `44` → `is_included` `45` → `included`
ordimptype	`0` → `fixed` `1` → `mean` `2` → `median` `3` → `mode` `4` → `minimum` `5` → `maximum` `6` → `minimumchange`
timeimptype	`0` → `fixed` `1` → `mean` `2` → `median` `3` → `mode` `4` → `minimum` `5` → `maximum` `6` → `minimumchange`
incrdecr (see note <#incrdecrnote>)	`-1` → `decrement` `1` → `increment`
jointype	`0` → `inner` `1` → `louter` `2` → `router` `3` → `outer` `4` → `lcomplement` `5` → `rcomplement` `6` → `complement`
mergetype	`0` → `nofill` `1` → `left` `2` → `right`
misspolicy	`0` → `normal` `1` → `always` `2` → `never`
inpdisctype	`0` → `incremental` `1` → `entropy` `2` → `chi` `3` → `width` `4` → `frequency` `5` → `roc`
outdisctype	`0` → `width` `1` → `frequency`
distmethod	`0` → `euclidean` `2` → `euclidean-norm` `3` → `manhattan` `4` → `manhattan-norm` `5` → `pearson`
evaldistmethod	`0` → `euclidean` `2` → `euclidean-norm` `3` → `manhattan` `4` → `manhattan-norm` `5` → `pearson`
normtype	`0` → `nonorm` `1` → `attribute` `2` → `normal` `3` → `minmax01` `4` → `minmax-11`
timeunit	`0` → `second` `1` → `minute` `2` → `hour` `3` → `day` `4` → `week` `5` → `month` `6` → `quarter` `7` → `year`
impurtype	`0` → `entropy` `1` → `gini` `2` → `error`
centroidtype	`0` → `means` `1` → `median` `2` → `medoids`
kmeanstype	`0` → `standard` `1` → `incremental` `2` → `error`
arsmoothfunc	`0` → `log` `1` → `box` `2` → `nosmooth`
shuffletype	`0` → `noshuffle` `1` → `reshuffle` `2` → `keepshuffle`
rulewide	`0` → `term` `1` → `condition` `2` → `eule`
ruleinterval	`0` → `operator` `1` → `mix` `2` → `interval`
treepruning	`0` → `no` `1` → `complexity` `2` → `reduced` `3` → `pessimistic`
treeusemissing	`0` → `average` `1` → `remove` `2` → `include`
adefiltmode	`0` → `maximal` `1` → `closed` `2` → `confidence`
appenddata	`0` → `dropinsert` `1` → `appendinsert` `2` → `update` `3` → `updateinsert` `4` → `delete`
nomimptype	`0` → `fixed` `1` → `mode`
allonly	`0` → `allbut` `1` → `only`
firstlast	`0` → `first` `1` → `last`
svmkernel	`0` → `linear` `1` → `polynomial` `2` → `radial` `3` → `sigmoid`
assigntype	`0` → `random` `1` → `smart` `2` → `weight`
fulldeploy	`0` → `all` `1` → `requested` `2` → `fair`
anomaly	`0` → `one-class` `1` → `anomaly`
ordroll	`1` → `minimum` `2` → `maximum` `3` → `summation` `4` → `average` `5` → `median` `6` → `mode` `7` → `standdev` `8` → `absdev`
nomroll	`1` → `minimum` `2` → `maximum` `3` → `summation` `4` → `average` `5` → `median` `6` → `mode` `7` → `standdev` `8` → `absdev`
svmtype	If task is Svm classification task: * `0` → `c_svc` * `1` → `nu_svc` If task is Svm regression task: * `0` → `epsilon_svr` * `1` → `nu_svr`

The converter automatically applies the table above to change the options coming from Rulex 4 during the flow import operation.
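
The mapping can be pictured as a per-option lookup table. The Python sketch below uses a small excerpt of the table above; the dictionary layout and the function name `convert_option` are assumptions for illustration and do not reflect the converter's real implementation.

```python
# Excerpt of the value-mapping table; the full table covers many more options.
VALUE_MAP = {
    "byname": {0: "position", 1: "name"},
    "jointype": {0: "inner", 1: "louter", 2: "router", 3: "outer",
                 4: "lcomplement", 5: "rcomplement", 6: "complement"},
    "incrdecr": {-1: "decrement", 1: "increment"},
}


def convert_option(name, value):
    """Map a Rulex 4 positional value to the corresponding string value."""
    mapping = VALUE_MAP.get(name)
    if mapping is None:
        return value        # option not in the table: left untouched
    if isinstance(value, str):
        return value        # already a string value
    key = int(value)        # False / True become 0 / 1
    if key not in mapping:
        raise ValueError(f"{name}: no mapping for value {value!r}")
    return mapping[key]


print(convert_option("byname", True))   # name
print(convert_option("jointype", 3))    # outer
```

In this sketch an unmapped number raises an error rather than being passed through silently; that is a design choice of the example, not documented converter behavior.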

Attention

The only option that still keeps its numerical values (even though its graphical representation in Rulex Factory is a drop-down menu) is winauth in the Import from Database task and the Conditional Import from Database task. This option still has 0 / 1 or False / True as possible values. This choice was made to reduce the transition effort, especially when dealing with possible runtime parametrization.

Note

In the incrdecr option of module tasks (Execute Process File task or Rulex Process File Source task), the use of numbers greater than 1 or lower than -1, which led to inconsistent behavior in Rulex 4 as well, is no longer allowed.

The following options have been renamed in the transition from Rulex 4 to Rulex Factory:
- rcode → scriptcode
- rcommand → exepath
In Rulex 4, there are conflicting options for import tasks (for example filename / filelist). The list version is used if it is set, overriding the value of the name version. For this reason, in Rulex Factory the filename, sheetname, query and tablename options have been removed and only the filelist, sheetlist, querylist and tablelist options are available. When needed, the converter moves the value inserted in the filename, sheetname, query or tablename entry into the corresponding list option, turning it into a single-element list.
In Rulex Factory, macro code has completely changed from Rulex 4. In Rulex 4, it was written as a list of internal command codes (representing the socket language between the Rulex 4 Client and the Rulex 4 Server). In Rulex Factory, the macro code is made of CLI/API commands, and it is aligned with the new Rulex Platform API service. The converter automatically translates the old Rulex 4 commands into the corresponding Rulex Platform ones.
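
The option renaming and the move from single-value options to list options described above can be sketched in Python as follows. The dictionary representation of options and the function name `migrate_options` are assumptions made for the example; only the option names come from this page.

```python
# Renamed options and single-value options replaced by their list versions.
RENAMED = {"rcode": "scriptcode", "rcommand": "exepath"}
NAME_TO_LIST = {"filename": "filelist", "sheetname": "sheetlist",
                "query": "querylist", "tablename": "tablelist"}


def migrate_options(options: dict) -> dict:
    """Rename options and fold single-value options into list options."""
    migrated = {}
    for name, value in options.items():
        if name in NAME_TO_LIST:
            continue  # single-value options are handled below
        migrated[RENAMED.get(name, name)] = value
    for single, plural in NAME_TO_LIST.items():
        # The list version wins if it is already set, as in Rulex 4.
        if options.get(single) and not migrated.get(plural):
            migrated[plural] = [options[single]]
    return migrated


# Hypothetical task options, for illustration only.
print(migrate_options({"rcode": "x <- 1", "filename": "a.csv"}))
# {'scriptcode': 'x <- 1', 'filelist': ['a.csv']}
```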

Changes mitigated by import converter through value modifications

The modifications required by Rulex Factory listed here are automatically imposed by the process-flow converter through the introduction of ad-hoc code routines, which cannot be re-executed directly from Rulex Factory. They should be left untouched and never replicated at any other point of the Rulex Factory flow.

Comparison operations (==, !=, <=, >=) executed in formulas or in ifelse functions on nominal columns return different results on None entries in Rulex 4 and in Rulex Factory. The differences are mitigated when converting from Rulex 4 by enclosing these operations in a backward-compatibility function, rns, which restores the behavior of Rulex 4.

Attention

The function rns must NOT be used in any newly created code, as the new behavior of Rulex Factory is strongly recommended.

Conditions in Rulex Factory have been greatly expanded in operational possibilities. Now operations can be performed directly into condition codes, thus avoiding the need to create many additional support columns. However, as a side effect, conditions received as input in the Convert Ruleset to Dataset task must now be more specific to avoid conflicts between the old and the new structure. For example, while the condition code PROD_SIZE in {3/Midi} is accepted in Rulex 4, in Rulex Factory the string 3/Midi must be surrounded by quotes to specify that it is a string. The converter takes care of these occurrences by writing the condition in the following form: PROD_SIZE in {"3/Midi"}.
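
A minimal Python sketch of this quoting step is shown below. It assumes that set entries are separated by commas and that plain numbers must stay unquoted; both assumptions, as well as the function name `quote_set_values`, are illustrative and are not taken from the converter.

```python
import re

_SET = re.compile(r"\{([^{}]*)\}")          # content of a {...} value set
_NUMBER = re.compile(r"[+-]?\d+(\.\d+)?")   # plain integer or decimal


def _quote(entry: str) -> str:
    entry = entry.strip()
    if not entry or _NUMBER.fullmatch(entry):
        return entry                         # assumption: numbers stay as-is
    if len(entry) >= 2 and entry[0] == entry[-1] and entry[0] in "\"'":
        return entry                         # already quoted
    return f'"{entry}"'


def quote_set_values(condition: str) -> str:
    """Surround unquoted string entries of {...} sets with double quotes."""
    def repl(match):
        entries = match.group(1).split(",")
        return "{" + ", ".join(_quote(e) for e in entries) + "}"
    return _SET.sub(repl, condition)


print(quote_set_values("PROD_SIZE in {3/Midi}"))
# PROD_SIZE in {"3/Midi"}
```

The naive split on commas would break quoted values that themselves contain a comma, so this sketch only covers the simple case shown in the example.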
In Rulex 4, setting the procvar option to modify one specific variable in a module evaluation leads to a final option reporting the whole list of variables and not only the modified ones. To mitigate this effect, the converter modifies all the procvar options to delete any entry that is equal both in the module task and the parent flow, keeping the ones that are not present in the parent flow.
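
The pruning rule can be illustrated with a short Python sketch: an entry is dropped only when the parent flow defines the same variable with the same value. The dictionary representation and the function name `prune_procvar` are assumptions for the example.

```python
def prune_procvar(procvar: dict, parent_vars: dict) -> dict:
    """Keep only entries that differ from, or are missing in, the parent flow."""
    return {name: value for name, value in procvar.items()
            if name not in parent_vars or parent_vars[name] != value}


# Hypothetical variables, for illustration only.
procvar = {"year": 2024, "region": "EU", "limit": 10}
parent = {"year": 2024, "region": "US"}
print(prune_procvar(procvar, parent))  # {'region': 'EU', 'limit': 10}
```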
In Rulex Factory, when more than one long attribute is selected to be expanded, the Reshape to Wide task creates all the new columns at the table position of the original long attribute. In Rulex 4, instead, when there is more than one long attribute, the new columns are inserted at the end of the whole table, regardless of the position number or the order of the outputs. In Rulex Factory, a dedicated flag has been added to the task, and the converter sets its value to ensure the same behavior as in Rulex 4 in the imported task.
In Rulex Factory, the module execution operation requires an rfl file as entry point, while Rulex 4 requires a prcx. In some situations, the options’ names of the module task are defined by using the Runtime Variables task. This makes the update of all the options from .prcx to .rfl more error-prone. For this reason, a dedicated flag has been introduced in the Runtime Variables task; the converter sets it to True to ensure that all .prcx extensions are converted into .rfl at runtime during the execution of the task itself.

Warning

This dedicated flag does NOT convert the module itself at runtime, but only the option value. The .rfl version is expected to be already present in the same location as the original .prcx version.
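
The effect of the flag on an option value can be pictured with the Python sketch below. Only the string is rewritten; no file is converted. The function name `to_rfl`, the case-insensitive match, and the sample path are assumptions made for the example.

```python
import re

# Matches a .prcx extension; case-insensitive matching is an assumption.
_PRCX = re.compile(r"\.prcx\b", re.IGNORECASE)


def to_rfl(option_value: str) -> str:
    """Rewrite .prcx extensions to .rfl inside an option value."""
    return _PRCX.sub(".rfl", option_value)


# Hypothetical module path, for illustration only.
print(to_rfl("modules/scoring.prcx"))  # modules/scoring.rfl
```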

In Rulex Factory, the value of the option process in the Import from Task task to specify the current flow has been changed from -- THIS -- to __this__. When importing a prcx, the converter changes all the options process in the Import from Task task with value -- THIS -- into __this__.
Rulex 4 allowed values like @var in the loopvar option and took @var itself as the iterator instead of its value, while Rulex Platform correctly uses the value of @var. This difference is mitigated by converting this particular option when importing a prcx.
In Rulex Factory, when importing a .prcx file, the hard reset macros are converted into soft reset macros.
In Rulex Factory, when importing a .prcx file, in the Network Optimizer task, the option defaultcost is set to 1, instead of its default value -1, to keep the Network Optimizer task’s behavior in Rulex Platform equal to the one in Rulex 4.

Changes not mitigated or corrected by the converter

The remaining differences between Rulex 4 and Rulex Factory have been assessed as having low or minor impact. They are therefore not mitigated by the converter and, where relevant, require a manual modification of the imported flow.

The module execution has been moved from Python (used in Rulex 4) to GOLD in Rulex Factory, aligning the two module tasks (Execute Process File task and Rulex Process File Source task) with the rest of the tasks during the execution process. In Rulex 4, all tasks except the module ones required their options to match their assigned type; Rulex Factory corrects this inconsistency. The casting operations performed with Python in Rulex 4 modules allowed tasks to work even if the option values provided were not of the correct type, which is no longer possible in Rulex Factory. However, the impact of this change has been assessed as low/minimal.
When a module is executed in Rulex 4, the process variables used in the module flow have the following priority:
1. Parent workflow
2. Procvar option of the module task
3. Module workflow variables
This makes the procvar option meaningless in most cases. In Rulex Factory, the priority order has been changed as follows:
1. Procvar option of the module task
2. Parent flow
3. Module flow variables
The impact has been assessed as minimal, since the procvar option is not frequently used in Rulex 4.
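
The two priority orders above can be compared with a small Python sketch based on `collections.ChainMap`, where the first mapping that defines a variable wins. The variable names and values are hypothetical, and the sketch only models the lookup order, not how either product stores its variables.

```python
from collections import ChainMap


def resolve_rulex4(parent: dict, procvar: dict, module: dict) -> dict:
    """Rulex 4 order: parent workflow, then procvar, then module variables."""
    return dict(ChainMap(parent, procvar, module))


def resolve_factory(parent: dict, procvar: dict, module: dict) -> dict:
    """Rulex Factory order: procvar, then parent flow, then module variables."""
    return dict(ChainMap(procvar, parent, module))


# Hypothetical variables, for illustration only.
parent = {"year": 2023}
procvar = {"year": 2024}
module = {"year": 2000, "region": "EU"}

print(resolve_rulex4(parent, procvar, module)["year"])   # 2023
print(resolve_factory(parent, procvar, module)["year"])  # 2024
```

In the Rulex 4 order the procvar value is shadowed whenever the parent workflow defines the same variable, which is why the option had little effect there.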
In Rulex Factory, while executing modules, the selected loopvar must be defined and then selected only in the parent flow and no longer in the module only, as required in Rulex 4. This is related to the disaster recovery policy set in Rulex Factory for module computation, which executes the loop iteration in a completely different thread now. This allows the system to recover itself even if a hard crash due to memory consumption happens. The change only deals with the iteration of an unclear loop and its impact has been estimated as minimal.
In Rulex Factory, when using an Import from Task task into a module, if the Flow option is set to __this__, the flow which is taken into account is the module itself, which is the flow where the Import from Task task is located. In Rulex 4 instead, when selecting __this__ as the Flow option, the software took into account the parent flow.
In Rulex 4, process variables are calculated before any tasks leading to an evaluation, which in some situations depends on the running time (execution time) of the current process. In Rulex Factory, to make the whole system more deterministic, the flow variables are evaluated only at the beginning of each computation and then updated only by a Runtime Variables task. This also leads to an improvement in performance. The impact is only related to functions depending on external factors, such as the currDatetime function. However, the use of this function in all the studied cases is affected and not supported by Rulex 4 behavior. In Rulex Factory, more deterministic cases will be obtained without any change in the meaningful part of the flow.
Starting from 1.1.2-21, the Network Optimizer task has been completely rewritten in Rulex Factory to introduce native Priority management. The modification of the optimization routine may lead to differences compared to previous versions, even with the same input data and options. However, all these differences lead to the same or better cost function values, achieving an overall better optimization.
Starting from 1.1.2-21, the Network Optimizer task raises an error both when the Destination value doesn’t match with the corresponding Destination Quantity value and when the Cost value related to a Source and Destination pair doesn’t match.
Starting from 1.1.2-21, in the Import from Text File task, when a text delimiter option has been defined, the system recognizes the attributes whose values are enclosed in text delimiters as nominal ones.
In the Import from Excel File task, percentage columns defined in MS Excel files are recognized as percentage attributes. In Rulex 4, percentage columns in MS Excel files were imported as continuous attributes, instead.
Starting from 1.2.1-27, the Import from Text File task has been modified to treat a general inconsistency of Rulex 4 in the management of empty lines at the beginning of the file itself. This leads to a necessary modification of nameline and typeline option if set to match the new empty line count. However, the new empty line count is now aligned to what the user can see in a normal text editor.
In Rulex 4, both the soft reset and the hard reset operations didn’t delete data from the database. In Rulex Factory, the hard reset operation now erases data from the database, while the soft reset operation has the same behavior as in Rulex 4.
In Rulex 4, when exporting an MS Excel file, continuous values were converted to integers on export, while in Rulex Factory they are exported keeping the continuous attribute type.
In Rulex Factory, when a continuous column was cast from continuous to nominal in the Data Manager, the software applied a round function; while if the cast operation was performed in other tasks, the software applied a floor function. Starting from Rulex Factory 1.2.1-144, when a column is converted from continuous to nominal, the software applies a round function, no matter which task performs the cast operation.
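
The practical difference between the two cast behaviors can be seen with a small Python sketch. The helper names and the string representation of nominal values are assumptions for illustration; note also that Python's built-in `round` resolves exact .5 ties to the nearest even integer, which may differ from the rounding rule used by Rulex Factory, so the example avoids tie values.

```python
import math


def cast_floor(value: float) -> str:
    """Cast to nominal by truncating towards minus infinity."""
    return str(math.floor(value))


def cast_round(value: float) -> str:
    """Cast to nominal by rounding to the nearest integer."""
    return str(round(value))


print(cast_floor(2.7))  # 2
print(cast_round(2.7))  # 3
```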
Starting from 1.2.2-12, the database name is not mandatory anymore when setting up a connection to Spark and Databricks.