Profiling
Data profiling lets you examine data and collect statistics or informative summaries about it. The results of data profiling can help you:
- Understand the data, which is the first critical step of any data engineering project.
- Find data quality rules and requirements that will support a more thorough data quality assessment in a later step.
InfoLink supports the following data profiling operations:
- Column Frequency Analysis
- Columns Profile
- Join Analysis
- Reference Discovery
The Specifications section of the navigation tree has a subsection for each type of data profiling. First create a data profile specification of the required type. Then you can either:
- Execute it and view the result by right-clicking the specification name.
- Create an operation in a scenario that executes the specification.
Column frequency analysis
Column frequency analysis gives you the frequency distribution of values in a column. The result is a table with two columns: the first contains each unique value of the input column, and the second contains the corresponding count, that is, how many times the value appears in the input table.
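To make the shape of the result concrete, here is a minimal Python sketch of a frequency distribution computed over a small in-memory table. It is not InfoLink code and does not use any InfoLink API: the function `column_frequency` and the sample `customers` table are hypothetical, and counting a missing value (`None`) as its own row is an assumption, not documented InfoLink behavior.

```python
from collections import Counter


def column_frequency(rows, column):
    """Return (value, count) pairs for one column, most frequent first."""
    # Count how many times each distinct value occurs in the column.
    counts = Counter(row[column] for row in rows)
    # Sort by descending count, then by the value's text for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], str(item[0])))


# Hypothetical input table, invented for illustration only.
customers = [
    {"id": 1, "country": "DE"},
    {"id": 2, "country": "FR"},
    {"id": 3, "country": "DE"},
    {"id": 4, "country": None},  # assumption: missing values are counted too
    {"id": 5, "country": "DE"},
]

result = column_frequency(customers, "country")
for value, count in result:
    print(value, count)
```

Each printed line corresponds to one row of the two-column result table: a unique value followed by its count. The counts always add up to the number of rows in the input table.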
Create column frequency profile specification
- Right-click Specifications -> Profiling -> Column Frequency in the navigation tree and select Create profile.
- Enter the profile name and click the Create button. The current window then shows the column frequency specification parameters.
- Enter the parameters: type or select the source, space, table, and column for which you want to compute the frequency distribution, then enter the name of the target table where the result will be stored.
Execute column frequency profile specification
- Right-click the column frequency profile specification, select Execute, and wait for completion.
- Right-click the column frequency profile specification and select View result.
Alternatively, you can create a ColumnFrequencyProfile operation in a scenario, provide the name of the specification as a parameter, and then execute it.
Delete column frequency profile specification
- Right-click the specification and select Delete.