Skip to Content
DocsOperationsCheckStandardizeColumns

CheckStandardizeColumns

CheckStandardizeColumns combines the source language (such as SQL) and functions (see the Functions section) to validate and transform/format/standardize columns of the input table.

  • Source, Space, Table - defines the input table.
  • Key column - name of a key column (must be unique and not null) in the input table. This parameter is only required if you use at least one Function in Columns.
  • Target source, Target space, Target table - defines the name of the result table to be created.
  • Columns table contains a list of the result columns.
  • Do not pass input columns by default - If unchecked, the result table will contain all columns from the input table plush Columns. If checked, the result table will contain only Columns. In the latter case you can still pass the input columns explicitly if you add a row in the Columns table and enter <any column> in the Input column field.
  • Update source table - will update Columns in the source table instead of creating a new table specified in Target.

Defining result columns

Each record in the Columns table represent a column or a group of columns of the result table. The Result column name column defines the name of the column. The Pre expression, Function, Post expression columns are used to compute the value of the result column.

Result column name, Pre expression, Post expression can refer by ${col_name} to a column of the input table specified in the Input column column.

The Input column column can have the following values:

  • Select <any column> to copy all columns from the input table.

  • <expression> used when expressions do not need to refer to a column from the input table.

  • name of a column in the input table.

Standardization allows you to combine the power of the source query language (usually it is SQL), in which Pre expression and Post expression are specified, with the Java functions that you specify in the Function column. When Function is specified for at least one column then a round trip of data from the source to InfoLink instance and back is required. So there is a performance penalty for using functions that still can be negligible in practice. For your understanding, below we describe how standardization works in presence of any Function and without them.

When Function is used at least for one column, the value of the result column is computed as follows. First Pre expression for all columns are computed in the source database as a single query. You can refer to the input column from Pre expression using a string literal ${col_name}. InfoLink iterates over the result of the query and applies the corresponding Function to each columns. The result of the function application is stored to the source as temporaty table. Then another query is evaluated to evaluate Post expression for each column. You can refer to the result of function application from Post expression using a string literal ${col_name}.

When Function is not used for any column, the whole operation is executed as a single query in the source. First Pre expression is evaluated then Post expression is evaluated for each column. You can refer to the input column from Pre expression using a string litteral ${col\_name}. You can refer to the result of Pre expression from Post expression using a string literal ${col\_name}.

The following Functions are supported:

  • pre-built functions specified in the Functions -> “Prebuilt functions” section.
  • user-defined functions as described in the Functions -> “User-defined functions” section.
Last updated on