CheckStandardizeColumns
CheckStandardizeColumns combines the source language (such as SQL) and functions (see the Functions section) to validate and transform/format/standardize columns of the input table.
- Source, Space, Table - defines the input table.
- Key column - name of a key column (must be unique and not null) in the input table. This parameter is only required if you use at least one Function in Columns.
- Target source, Target space, Target table - defines the name of the result table to be created.
- Columns table contains a list of the result columns.
- Do not pass input columns by default - If unchecked, the result table will contain all columns from the input table plush Columns. If checked, the result table will contain only Columns. In the latter case you can still pass the input columns explicitly if you add a row in the Columns table and enter
<any column>in the Input column field. - Update source table - will update Columns in the source table instead of creating a new table specified in Target.
Defining result columns
Each record in the Columns table represent a column or a group of columns of the result table. The Result column name column defines the name of the column. The Pre expression, Function, Post expression columns are used to compute the value of the result column.
Result column name, Pre expression, Post expression can refer by
${col_name} to a column of the input table specified in the Input
column column.
The Input column column can have the following values:
-
Select
<any column>to copy all columns from the input table. -
<expression>used when expressions do not need to refer to a column from the input table. -
name of a column in the input table.
Standardization allows you to combine the power of the source query language (usually it is SQL), in which Pre expression and Post expression are specified, with the Java functions that you specify in the Function column. When Function is specified for at least one column then a round trip of data from the source to InfoLink instance and back is required. So there is a performance penalty for using functions that still can be negligible in practice. For your understanding, below we describe how standardization works in presence of any Function and without them.
When Function is used at least for one column, the value of the
result column is computed as follows. First Pre expression for all
columns are computed in the source database as a single query. You can
refer to the input column from Pre expression using a string
literal ${col_name}. InfoLink iterates over the result of the
query and applies the corresponding Function to each columns. The
result of the function application is stored to the source as temporaty
table. Then another query is evaluated to evaluate Post expression for
each column. You can refer to the result of function application from
Post expression using a string literal ${col_name}.
When Function is not used for any column, the whole operation is
executed as a single query in the source. First Pre expression is
evaluated then Post expression is evaluated for each column. You can
refer to the input column from Pre expression using a string
litteral ${col\_name}. You can refer to the result of Pre
expression from Post expression using a string literal
${col\_name}.
The following Functions are supported:
- pre-built functions specified in the Functions -> “Prebuilt functions” section.
- user-defined functions as described in the Functions -> “User-defined functions” section.