Text functions¶

Text functions perform operations on text, such as extracting part of it, removing the start/end of a word.

The following functions are available:

charReplace
distance
find
head
isPrefix
isSuffix
isWord
numExt
pad
phonetic
prefix
replace
strip
suffix
tail
textConcat
textCount
textDecode
textEncode
textFormat
textLen
textLower
textSort
textUpper

Warning

All the functions reported in this page, work only on nominal attribute. If an attribute with type different from nominal is provided, it will be automatically cast to nominal before executing the desired operation. Therefore if you are using one of these functions directly on continuous attributes, it will be cast to nominal retaining a fixed precision, controlled by the Flow Execution Parameters.

charReplace¶

The charReplace function replaces fonts with new ones.

Parameters

charReplace(column, oldchar, newchar, unchanged, charforothers, considersequence)

Parameter	Description
column	The nominal attribute from which you extract the substring. The column parameter is mandatory.
oldchar	The string or list of strings (it can be even a word, but every letter of it will be considered as a separate font) which contain all the fonts that have to be replaced with the newchar value. The oldchar parameter is mandatory.The oldchar parameter is case-sensitive.
newchar	The string or list of strings which replace all the fonts in oldchar. The newchar parameter is mandatory. The newchar parameter is case-sensitive.
unchanged	The string which contains all the fonts that must not be replaced. By default, all the fonts which are not included in oldchar are included in this list.
charforothers	The string which replaces all the other fonts. The default value is “X”.
considersequence	It is set as `False` by default. If it is `True`, every subset of contiguous fonts present in oldchar is replaced by one occurrence of oldchar.

Example - charReplace(column, oldchar, newchar)

The following example uses the Adult dataset.

In this example, we want to replace the ‘married’ string (that are the ‘m’, ‘a’, ‘r’, ‘r’, ‘i’, ‘e’, ‘d’ fonts) with a ‘_’: add a new attribute, called replace, and type the following formula: charReplace($"marital-status",'married','_') and the ‘m’, ‘a’, ‘r’, ‘r’, ‘i’, ‘e’, ‘d’ strings will be replaced by a ‘_’.
As you can see, all these letters are missing in the new values.

Example - charReplace(column, oldchar, newchar, unchanged, charforothers, considersequence)

The following example uses the Adult dataset.

If we want to add more complexity to the function, and we want to edit all the string and also define which fonts must not be changed, we need to specify the optional parameters.
In this example, we want to keep the ‘N’ unchanged, and replace all the other fonts which have not been specified in the oldchar parameter with an ‘*’, which will be the charforothers parameter.
Specify True as the considersequence parameter, so that there will be only one occurrence of that font.
The function will become: charReplace($"marital-status",'married','_','n','*',True)

distance¶

The distance function computes the distance between the values of two columns, column1, column2, according to one of the following methods:

levenshtein (l)
damerau-levenshtein (dl)
lcs (lcs)
hamming (hamming)

Parameters

distance(column1, column2, method)

Parameter	Description
column1	The first attribute used to evaluate the distance. If it is not nominal, it will be cast to nominal upon function’s computation. The column1 parameter is mandatory.
column2	The second attribute used to evaluate the distance. If it is not nominal, it will be cast to nominal upon function’s computation. The column2 parameter is mandatory.
method	The algorithm (”levenshtein“, “damerau-levenshtein“, “lcs“, “hamming“) used to evaluate the string distance. Each method is associated to a string: The method parameter is mandatory. ‘l’ for the Levenshtein algorithm ‘dl’ for the Damerau-Levenshtein algorithm ‘lcs’ for the lcs algorithm ‘hamming’ for the hamming algorithm

Parameter

Description

column1

The first attribute used to evaluate the distance. If it is not nominal, it will be cast to nominal upon function’s computation. The column1 parameter is mandatory.

column2

The second attribute used to evaluate the distance. If it is not nominal, it will be cast to nominal upon function’s computation. The column2 parameter is mandatory.

method

The algorithm (”levenshtein“, “damerau-levenshtein“, “lcs“, “hamming“) used to evaluate the string distance. Each method is associated to a string: The method parameter is mandatory.

‘l’ for the Levenshtein algorithm
‘dl’ for the Damerau-Levenshtein algorithm
‘lcs’ for the lcs algorithm
‘hamming’ for the hamming algorithm

Example - distance(column1, column2, method)

The following example uses the Turkish calendar dataset.

In this example, we want to calculate the distance between the RAMADAN_FLAG attribute and the PUBLIC_HOLIDAY_FLAG attributes using the Levenshtein algorithm.
Add a new attribute, called distance, and type the following formula: distance($"RAMADAN_FLAG",$"PUBLIC_HOLIDAY_FLAG",'l').
The function has returned 0, when the value in both attributes is the same, so no changes has to be made to make them equal.
It has returned 1 when the values were not equal, indicating the operations to perform to make the values equal.

find¶

The find function looks for a value within a column and returns True or False.

Parameters

find(column, value, binary, ischarlist, charpos)

Parameter	Description
column	The nominal attribute where the value will be searched. If it is not nominal, it will be cast to nominal upon function’s computation. The column parameter is mandatory.
value	The value to be searched in the column. If it is not nominal, it will be cast to nominal upon function’s computation. The value parameter is mandatory.The value parameter is case-sensitive.
binary	If it is `True`, or not specified, results will be provided as boolean values (`True` / `False`), while if it is `False`, results will be provided in binary form (`1` / `0`).
ischarlist	If it is `False`, or not specified, only the value as a block will be considered, while if it is `True`, all the fonts in the value will be considered separately.
charpos	If it is =0, True is returned if the value is present in the column. If `charpos > 0` the function returns in each row the position of the value within the column if it is present in it, otherwise the function returns an empty cell.

Example - find(column, value, charpos=charpos)

The following example uses the Adult dataset.

In this example, we want to find the Bachelors value in the Education attribute.
Add a new attribute, called find, and type the following formula: find($"education",'Bachelors',charpos=2).
As we didn’t specify the binary and ischarlist parameters (we want to keep the default options), we needed to write charpos=2.
If the binary and is charlist parameters would have been specified, we should have written only 2 in the formula.
The function returns a 1 when the Bachelors value is found, or an empty cell when the value is not present in the corresponding row.

head¶

The head function returns in each row the first n letters of the corresponding value in the column.

Parameters

head(column, nchar)

Parameter	Description
column	The nominal attribute containing the strings. If the column is not nominal, it will be cast to nominal upon function’s computation. The column parameter is mandatory.
nchar	The number of letters we want to select for each string in the column. It can be either a specified value or an integer attribute. The nchar parameter is mandatory.

Example - head(column, nchar)

The following example uses the Adult dataset.

In this example, we want to transform the Sex attribute values into their initial.
Add a new attribute, called newvalues, and write the following formula: head($"sex",1).
The function has returned the first letter of each value in the corresponding row.

isPrefix¶

The isPrefix function checks whether a string is a prefix or not, and returns True in the rows of the column starting with the string value; otherwise the function returns False.

Parameters

isPrefix(column, value, binary)

Parameter	Description
column	The nominal attribute in which the value will be checked. If it is not nominal, it will be cast to nominal upon function’s computation. The column parameter is mandatory.
value	The value to be checked in the column. If it is not nominal, it will be cast to nominal upon function’s computation. The value parameter is mandatory.The value parameter is case-sensitive.
binary	If it is `True`, or not specified, results will be provided as boolean values (`True` / `False`), while if it is `False`, results will be provided in binary form (`1` / `0`).

Example - isPrefix(column, value)

The following example uses the HR-Employee-Attrition dataset.

In this example, we want to identify if the string ‘Travel’ is a prefix for the values in the BusinessTravel attribute.
Add a new attribute, which we have called travel, and type the following formula: isPrefix($"BusinessTravel",'Travel_').
The results are provided as boolean values, as we didn’t specify the binary parameter.

isSuffix¶

The isSuffix function checks whether a string is a suffix or not, and returns True in the rows of the column ending with the string value; otherwise False is returned.

Parameters

isSuffix(column, value, binary)

Parameter	Description
column	The nominal attribute in which the value will be checked. If it is not nominal, it will be cast to nominal upon function’s computation. The column parameter is mandatory.
value	The value to be checked in the column. If it is not nominal, it will be cast to nominal upon function’s computation. The value parameter is mandatory.The value parameter is case-sensitive.
binary	If it is `True`, or not specified, results will be provided as boolean values (`True` / `False`), while if it is `False`, results will be provided in binary form (`1` / `0`).

Example - isSuffix(column, value, binary)

The following example uses the Adult dataset.

In this example, we want to know if the string ‘gov’ is a suffix for all the values of the occupation attribute.
Add a new attribute, which we have called suffix, and type the following formula: isSuffix($"workclass",'gov',False).
As we have specified the binary parameter as False, the results will be provided in binary form (1 / 0), where 1 means True, and 0 False.

isWord¶

The isWord function checks whether a string (which can have a delimiter) is contained in an attribute or not.

Parameters

isWord(column, substring, delimiter, binary)

Parameter	Description
column	The nominal attribute in which the substring will be checked. If it is not nominal, it will be cast to nominal upon function’s computation. The column parameter is mandatory.
substring	The substring to be checked in the column. If it is not nominal, it will be cast to nominal upon function’s computation. The substring parameter is mandatory.The substring parameter is case-sensitive.
delimiter	The delimiter of the substring. If it is not specified, the delimiter is (space) as default.
binary	If it is `True`, or not specified, results will be provided as boolean values (`True` / `False`), while if it is `False`, results will be provided in binary form (`1` / `0`).

Example - isWord(column, substring)

The following example uses the supermarket_sales dataset.

In this example, we want to check if the string ‘and’ is present in the Product line attribute.
Add a new attribute, which we have called isword, and type the following formula: isWord($"Product line",'and').
As we didn’t specify the binary parameter, the results are provided as boolean values. The delimiter parameter hasn’t been specified.

numExt¶

The numExt function returns a string containing only the numerical characters of the input string. If more than one number is present, numbers are delimited by a separator decided by the user (by default “-“).

Parameters

numExt(column, onlyint, separator)

Parameter	Description
column	The nominal attribute used to evaluate the numeric extraction. If it is not nominal, it will be cast to nominal upon function’s computation. The column parameter is mandatory.
onlyint	If it is `True`, or not specified, only integer values will be extracted, while if it is `False`, also continuous values will be extracted.
separator	The separator used in case of multiple number extraction. It is ‘-’ as default.

Example - numExt(column)

The following example uses the Bike sales dataset.

In this example, we want to create a new attribute, filled with the ages of the Age_Group attribute.
Add a new attribute, which we have called age, and type the following formula: numExt($"Age_Group") and the attribute will be filled with the numbers only.

pad¶

The pad function returns in each row of the result, the values of the column, filled (padded) with the padstring value to reach the specified length. The string can be added at the beginning (where = “begin” or by default) or at the end (where = “end”) of the string, according to the value of the parameter where.

Parameters

pad(column, len, value, where)

Parameter	Description
column	The nominal attribute used to compute the text lengths. If it is not nominal, it will be cast to nominal upon function’s computation. The column parameter is mandatory.
len	The length (number of letters) of the resulting words. The len parameter is mandatory.
value	The padstring to be added to fill the desired length, if the original word is shorter than the len. It is set to 0 as default. If the padstring is not nominal, it will be cast to nominal upon function’s computation.
where	It controls the position where the padstring is added. If it is ‘begin’, the padstring will be added at the beginning of the word, while if it is ‘end’, the padstring will be added at the end.

Example - pad(column, len)

The following example uses the E-commerce shipping data dataset.

In this example, we want the ID attribute values to have the same length, which is 6 for administrative reasons.
Add a new attribute, called newID and type the following formula: pad($"ID",6).
As we didn’t specify any value parameters, 0 is added to reach the len.
As we didn’t specify any where parameters, the values are added at the beginning of the word.

Example - pad(column, len, value, where)

The following example uses the E-commerce shipping data dataset.

If we want to add specific values (x) at the end of the ID attribute’s strings, we need to add the value and where parameters to the function.
The function becomes: pad($"ID",6,'x','end').

phonetic¶

The phonetic function returns the phonetic encoding of the strings contained in the column using the Metaphone algorithm. Phonetic may return the primary Metaphone component (component = “Primary” or component = “P”) or the secondary component (component = “Secondary” or component = “S”). By default, the primary component is returned.

Note

The Metaphone algorithm is a phonetic algorithm for indexing words by their English pronunciation. Its features are the as follows:

Drop duplicate adjacent letters, except for C.
If the word begins with ‘KN’, ‘GN’, ‘PN’, ‘AE’, ‘WR’, drop the first letter.
Drop ‘B’ if after ‘M’ at the end of the word.
‘C’ transforms to ‘X’ if followed by ‘IA’ or ‘H’ (unless in latter case, it is part of ‘-SCH-’, in which case it transforms to ‘K’). ‘C’ transforms to ‘S’ if followed by ‘I’, ‘E’, or ‘Y’. Otherwise, ‘C’ transforms to ‘K’.
‘D’ transforms to ‘J’ if followed by ‘GE’, ‘GY’, or ‘GI’. Otherwise, ‘D’ transforms to ‘T’.
Drop ‘G’ if followed by ‘H’ and ‘H’ is not at the end or before a vowel. Drop ‘G’ if followed by ‘N’ or ‘NED’ and is at the end.
‘G’ transforms to ‘J’ if before ‘I’, ‘E’, or ‘Y’, and it is not in ‘GG’. Otherwise, ‘G’ transforms to ‘K’.
Drop ‘H’ if after vowel and not before a vowel.
‘CK’ transforms to ‘K’.
‘PH’ transforms to ‘F’.
‘Q’ transforms to ‘K’.
‘S’ transforms to ‘X’ if followed by ‘H’, ‘IO’, or ‘IA’.
‘T’ transforms to ‘X’ if followed by ‘IA’ or ‘IO’. ‘TH’ transforms to ‘0’. Drop ‘T’ if followed by ‘CH’.
‘V’ transforms to ‘F’.
‘WH’ transforms to ‘W’ if at the beginning. Drop ‘W’ if not followed by a vowel.
‘X’ transforms to ‘S’ if at the beginning. Otherwise, ‘X’ transforms to ‘KS’.
Drop ‘Y’ if not followed by a vowel.
‘Z’ transforms to ‘S’.
Drop all vowels unless it is the beginning.

Parameters

phonetic(column, component)

Parameter	Description
column	The nominal attribute used to evaluate the phonetic encoding. If it is not nominal, it will be cast to nominal upon function’s computation. The column parameter is mandatory.
component	If Primary, P or not specified, the function returns the primary component. If Secondary or S, the secondary component is returned.

Example - phonetic(column)

The following example uses the Students_performance dataset.

In this example, we want to retrieve the pronunciation of the test preparation course attribute.
Add a new attribute, called phonetic, and type the following formula: phonetic($"test preparation course") and the attribute is filled with the primary component, as the component parameter hasn’t been specified.

prefix¶

The prefix function considers the chosen value as prefix and returns the preceding characters.

Parameters

prefix(column, value, last)

Parameter	Description
column	The nominal attribute containing the strings. If it is not nominal, it will be cast to nominal upon function’s computation. The column parameter is mandatory.
value	The substring value to be searched into the strings of the column. The value parameter is mandatory.The value parameter is case-sensitive.
last	If `False`, or not specified, the function selects the first occurrence of the value within the string, while if it is `True`, the function selects the last occurrence of the value which has to be considered.

Example - prefix(column, value, last)

The following example uses the Adult dataset.

In this example, we want to retrieve as prefixes the characters before the i in the marital-status attribute.
Add a new attribute, called prefix, and type the following formula: prefix($"marital-status",'i',true).
As we have specified True in the last parameter, the function will consider the last occurrence of the value: as we can see in row 2, Married-civ-spouse becomes Married-c, as the function considers the last occurrence of the value.

replace¶

The replace function replaces the current strings of the values in the column with the new ones.

Parameters

replace(column, oldvalue, newvalue, ntimes)

Parameters	Description
column	The nominal attribute containing the strings of the values to be replaced. If it is not nominal, it will be cast to nominal upon function’s computation. The column parameter is mandatory.
oldvalue	The value to be replaced within the column. The oldvalue parameter is mandatory.
newvalue	The new value which replaces the oldvalue. The newvalue parameter is mandatory. The newvalue parameter is case-sensitive.
ntimes	If 0, or not specified, all the occurrences of the oldvalue within all the attribute’s values will be replaced. If a positive number is specified, it indicates the number of times the newvalue will replace the oldvalue starting from the beginning of all the values within the attribute. If a negative number is specified, it indicates the number of times the newvalue will replace the oldvalue starting from the end of all the values within the attribute.

Example - replace(column, oldvalue, newvalue)

The following example uses the Adult dataset.

In this example, we want to replace the United-States values of the native-country attribute with the value USA.
Add a new attribute, called replace, and type the following formula first: replace($"native-country",'United-States','USA').
Where present, the value United-States has been replaced with the USA value, while the other values of the native-country attribute haven’t been changed.

Example - replace(column, oldvalue, newvalue, ntimes)

The following example uses the Adult dataset.

In this example, we want to replace the t character twice, with the @ character in the native-country attribute. This means that, if the character t is present three times within the value, only the first two occurrences will be replaced by @, while the rest will be left unchanged.
Then, type: replace($"native-country",'t','@',2).
As you can see, the first two occurrences of the ‘t’ are replaced by the @.

strip¶

The strip function returns the value without the specified characters or list of characters located at the beginning, at the end or on both sides of the value.

Parameters

strip(column, value, where, ischarlist)

Parameter	Description
column	The nominal attribute containing the original values. If it is not nominal, it will be cast to nominal upon function’s computation. The column parameter is mandatory.
value	The value containing the string or the list of characters to be removed. If it is not nominal, it will be cast to nominal upon function’s computation. The value parameter is mandatory.The value parameter is case-sensitive.
where	If it is begin, or not specified, the value will be removed from the beginning of the word. If it is end, the value will be removed at the end of the word. If it is both, the value will be removed both at the end and the beginning of the word.
ischarlist	If it is `True`, or not specified, all the characters in the value will be considered separately, while if it is `False`, only the value as a block will be considered.

Example - strip(column, value, ischarlist=ischarlist)

The following example uses the Adult dataset.

In this example, we want to cancel the ‘S’ not only as a block, but also as a list of characters from the occupation attribute.
Add a new attribute, which we will call strip, and type the following formula:strip($"occupation",'S',ischarlist=True).
As we didn’t specify the where parameter, we needed to specify the parameter name in the formula.

suffix¶

The suffix function considers the chosen value as suffix and returns the subsequent characters.

Parameters

suffix(column, value, last)

Parameter	Description
column	The nominal attribute containing the strings. If it is not nominal, it will be cast to nominal upon function’s computation. The column parameter is mandatory.
value	The substring value to be searched into the strings of the column. The value parameter is mandatory. The value parameter is case-sensitive.
last	If `False`, or not specified, the function selects the first occurrence of the value within the string, while if it is `True`, the function selects the last occurrence of the value which has to be considered.

Example - suffix(column, value, last)

The following example uses the Adult dataset.

In this example, we want to retrieve as suffixes the characters after the i in the marital-status attribute.
Add a new attribute, which we have called suffix, and type the following formula: suffix($"marital-status",'i').
As we didn’t specify the last parameter, the function considers as suffix all the characters after the first value (i in this example) occurrence.

tail¶

The tail function returns the last n letters of the corresponding value in the column.

Parameters

tail(column, nchar)

Parameter	Description
column	The nominal attribute containing the strings. If it is not nominal, it will be cast to nominal upon function’s computation. The column parameter is mandatory.
nchar	Number of characters we want to select at the end of the string. It can be either a specified value or an integer attribute. The nchar parameter is mandatory.

Example - tail(column, nchar)

The following example uses the Bike sales dataset.

In this example, we want to retrieve the tail of the values of the Year attribute.
Select the Year attribute and type the following formula: tail($"Year",2).
As Year is an integer attribute, the function cast it to nominal.

textConcat¶

The textConcat function returns the concatenation of all the strings in a column.

Parameters

textConcat(column,separator,group)

Parameter	Description
column	The nominal attribute used to evaluate the text concatenation. The column parameter is mandatory.It can also be defined as a list, and in this case the function returns per each row the concatenated strings of the attributes listed. If it is not nominal, it will be cast to nominal upon function’s computation.
separator	The customized separator to be used in the concatenation.
group	The attribute used to group results; The group parameter can also be defined as a list. If the column parameter is defined as a list, then the group parameter is ignored.

Example - textConcat(column, separator)

The following example uses the Adult dataset.

In this example, we want to concatenate the strings of 2 attributes, and separated by a dash.
To achieve this goal we’re going to use the following formula: textConcat(($"age","education"),"-").
If you only define the education attribute as column parameter, all the strings contained in that column will be concatenated in each row.

textCount¶

The textCount function returns the number of occurrences of a provided substring in each string of the evaluated column.

Parameters

textCount(column, substring)

Parameter	Description
column	The nominal attribute where the occurrences will be counted. The column parameter is mandatory.
substring	The researched string. It must be a single nominal string or a nominal attribute. If a nominal attribute is specified, the operation is performed row by row. The substring parameter is mandatory.

Example - textCount(column,substring)

The following example uses the Adult dataset.

After having imported the dataset into the flow, add a Data Manager and link it to the import task.
Add a new column to the dataset, which we have called textCount.
Select the attribute, then type the following function in the formula bar: textCount($"sex","e") and press Enter.

textDecode¶

The textDecode function decodes all the strings in a column.

Parameters

textDecode(column,enctype)

Parameter	Description
column	The nominal attribute used to evaluate the text decoding. If it is not nominal, it will be cast to nominal upon function’s computation. The column parameter is mandatory.
enctype	The decoding type. The default value is base64, while other possible values are url and hex.

Example - textDecode(column,enctype)

The following example uses the Adult dataset.

In this example, we want to decode a column which has been previously encoded to hex encoding type.
Add a new column to the dataset, in this case we have called it textDecode.
Select the attribute, then type the following function in the formula bar: textDecode($"Encode","hex").

textEncode¶

The textEncode function encodes all the strings in a column.

Parameters

textEncode(column,enctype,alsoslash)

Parameter	Description
column	The nominal attribute used to evaluate the text encoding. If it is not nominal, it will be cast to nominal upon function’s computation. The column parameter is mandatory.
enctype	The encoding type. The default value is base64, while other possible values are `url` and `hex`.
alsoslash	Used only in url encoding, it defines if the `/` and the `\ \` characters should be encoded. The default value is `False`.

Example - textEncode(column,enctype)

The following example uses the Adult dataset.

In this example, we want to encode the values contained in the marital-status attribute.
Add a new column to the dataset, in this case we have called it Encode.
Select the Encode attribute, then type the following function in the formula bar: textEncode($"marital-status","hex").
The values in the marital-status attribute have been encoded according to the hex encoding type.

textExtract¶

The textExtract function returns the string ranging from a defined starting position to defined ending position.

Parameters

textExtract(column,startpos,endpos)

Parameter	Description
column	The nominal attribute used to extract the substring. The column parameter is mandatory.If it is not nominal, it will be cast to nominal upon function’s computation.
startpos	The position of the letter where the extraction starts (the first letter is 1). It can be either a specified value or an integer attribute. The startpos parameter is mandatory.
endpos	The position of the letter where the extraction ends. It can be either a specified value or an integer attribute. The endpos parameter is mandatory.

Example - textExtract(column, startpos, endpos)

The following example uses the Adult dataset.

In this example, we want to retrieve a subset of the string contained in the fnlwgt attribute.
To achieve this goal we’re going to use the following formula: textExtract($"fnlwgt",3,5).
3 is the starting position of the string we need, and 5 is the ending.

textFormat¶

The textFormat function returns the type of the strings in each row of the column.

Parameters

textFormat(column)

Parameter	Description
column	The attribute from which we want to retrieve the type of each string. Its type must be nominal. The column parameter is mandatory.

Example - textFormat(column)

The following example uses the Bike sales dataset.

In this example, we have manually updated certain values within the Age Group attribute.
We have added: * 35 in row 3; * 64 in row 4; * 100% in row 6; * 5.4 in row 10.

Rename the Var_19 attribute to textFormat, then type the following formula: textFormat($"Age_Group").
As you can see, the function has returned ‘nominal’ for all the values except: * row 3, which is ‘integer’; * row 4, which is ‘integer’; * row 6, which is ‘percentage’; * row 10, which is ‘continuous’.

textLen¶

The textLen function returns the length of the string contained in each row of the column.

Parameters

textLen(column,mode,leaveother)

Parameter	Description
column	The attribute used to count the string length. The column parameter is mandatory.If it is not nominal, it will be cast to nominal upon function’s computation.

Example - textLen(column)

The following example uses the Adult dataset.

In this example, we want to count the length of the strings contained in each row of an attribute.
To achieve this goal we’re going to use the following formula: textLen($"fnlwgt").
Here we can see that the formula has counted the number of characters each row contains.

textLower¶

The textLower function changes uppercase fonts of a nominal attribute to lowercase fonts.

Parameters

textLower(column,mode,leaveother)


Parameter	Description
column	The nominal attribute to which modify the fonts to lowercase. *The column* parameter is mandatory.**
mode	By default the textLower function changes all fonts to lowercase, and the following are the permitted parameters: `mode="a"` - changes all fonts to lowercase (default parameter). `mode="b"` - changes the first font to lowercase. `mode="bw"` - changes the first font of each string to lowercase.
leaveother	If left empty or not specified, the default leaveother parameter is `False`. If the leaveother is `False`, the function changes all fonts that are not converted in lowercase to uppercase. If the leaveother is `True`, the function leaves any lowercase font to lowercase. The effects of the leaveother parameter are visible only when the mode `"b"` or mode `"bw"` are defined.

Example - textLower(column, mode, leaveother)

The following example uses the Adult dataset.

In this example, we want to change the uppercase fonts of a nominal attribute to lowercase.
But we want to change only the first font of each string to lower case, and leave to lowercase fonts that are already lowercase.
To achieve this goal we’re going to use the following formula: textLower($"education","bw",True)

textSort¶

The textSort function sorts in ascending order the strings contained in each cell of a nominal attribute.

Parameters

textSort(column,ascending)

Parameter	Description
column	The nominal attribute whose strings you want to sort. The column parameter is mandatory.
ascending	Sets the order of the sort according to the following parameters:if it’s set as `True`, it sorts the strings in ascending order. If it’s set as `False`, it sorts the strings in descending order. If left empty or unspecified, the default parameter is `True`.

Example - textSort(column, ascending)

The following example uses the Adult dataset.

In this example, we want to sort the characters of a nominal attribute in descending order.
To achieve this goal we’re going to use the following formula: textSort($"education",False).
And, in the screenshot we can see the results obtained, now in each cell of the education attribute letters have been sorted in descending order.

textUpper¶

The textUpper function changes lowercase fonts of a nominal attribute to uppercase fonts.

Parameters

textUpper(column,mode,leaveother)

Parameter	Description
column	The nominal attribute to which you want to change the fonts to uppercase. The column parameter is mandatory.
mode	By default the textUpper function changes all fonts to uppercase, and the following are the permitted parameters: `mode="a"` - changes all fonts to uppercase (default parameter). `mode="b"` - changes the first font to uppercase. `mode="bw"` - changes the first font of each string to uppercase.
leaveother	If left empty or not specified, the default leaveother parameter is `False`. If the leaveother is `False`, the function changes all fonts that are not converted in uppercase to lowercase. If the leaveother is `True`, the function leaves any uppercase font as it is. The effects of the leaveother parameter are visible only when the mode `"b"` or mode `"bw"` are defined.

Parameter

Description

column

The nominal attribute to which you want to change the fonts to uppercase. The column parameter is mandatory.

mode

By default the textUpper function changes all fonts to uppercase, and the following are the permitted parameters:

mode="a" - changes all fonts to uppercase (default parameter).
mode="b" - changes the first font to uppercase.
mode="bw" - changes the first font of each string to uppercase.

leaveother

If left empty or not specified, the default leaveother parameter is False. If the leaveother is False, the function changes all fonts that are not converted in uppercase to lowercase. If the leaveother is True, the function leaves any uppercase font as it is.
The effects of the leaveother parameter are visible only when the mode "b" or mode "bw" are defined.

Example - textUpper(column, mode, leaveother)

The following example uses the Adult dataset.

In this example, we want to change the lowercase fonts of a nominal attribute to uppercase.
We want to change only the first font, and convert any other uppercase font to lowercase.
To achieve this goal we’re going to use the following formula:textUpper($"education","b",False).