InfoLink Unified Data Model (UDM)
InfoLink provides unified access to all data sources by mapping their data structures onto the InfoLink unified data model (UDM). The InfoLink UDM consists of the following concepts:
-
Table - a named set of records. Objects stored in data sources are represented as tables in InfoLink. For example, CSV files on S3, Salesforce objects (e.g. Account, User, etc.), and tables in a relational database are all represented as tables in InfoLink.
-
Table schema - an ordered list of columns that defines the structure of a table (and all its records).
-
Record - an ordered list of column values.
-
Space - a named collection of tables. For example, a schema in PostgreSQL is represented as a space in the InfoLink UDM.
-
Source - a named collection of spaces.
-
Format specification - specifies the archive format (e.g. ZIP) and the content format (e.g. CSV, XML) of a table. The format specification is provided by the user as a parameter of the Load operation to describe the format of a file in a file-based source.
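The concepts above nest naturally: a source holds spaces, a space holds tables, and a table pairs a schema with its records. The following Python sketch is only an illustration of that nesting. All class and field names (Column, TableSchema, FormatSpec, add_record, and so on) are hypothetical and are not part of any InfoLink API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Column:
    name: str
    type: str  # illustrative type label, e.g. "integer" or "text"


@dataclass
class TableSchema:
    columns: List[Column]  # ordered: position matters


@dataclass
class FormatSpec:
    content: str                   # content format, e.g. "CSV" or "XML"
    archive: Optional[str] = None  # archive format, e.g. "ZIP"; None if not archived


@dataclass
class Table:
    name: str
    schema: TableSchema
    records: List[Tuple[Any, ...]] = field(default_factory=list)

    def add_record(self, *values: Any) -> None:
        # A record is an ordered list of values, one per schema column.
        if len(values) != len(self.schema.columns):
            raise ValueError("record length does not match the table schema")
        self.records.append(tuple(values))


@dataclass
class Space:
    name: str
    tables: Dict[str, Table] = field(default_factory=dict)


@dataclass
class Source:
    name: str
    spaces: Dict[str, Space] = field(default_factory=dict)


# Usage: a PostgreSQL-like source with one space (schema) and one table.
accounts = Table(
    "accounts",
    TableSchema([Column("id", "integer"), Column("name", "text")]),
)
accounts.add_record(1, "Acme")
public = Space("public", {"accounts": accounts})
pg = Source("postgres_prod", {"public": public})

# A format specification for a zipped CSV file in a file-based source.
zipped_csv = FormatSpec(content="CSV", archive="ZIP")
```

Keeping the schema separate from the records mirrors the definitions above: the schema fixes the column order once, and every record is checked against it.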
Many operations (e.g. Load) support access to data sources at two levels:
-
file-based level - low-level access that treats a data source as a set of files, where a file is a stream of bytes whose internal structure is not analyzed. An example of a data source that is typically accessed at this level is Amazon S3;
-
record-based level - high-level access that provides an interface to objects as sets of structured records (e.g. Salesforce CRM, PostgreSQL).
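The difference between the two levels can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not InfoLink's implementation: the function names are hypothetical, in-memory streams stand in for real data sources, and CSV and XML are chosen only as familiar content formats.

```python
import csv
import io
import shutil
import xml.etree.ElementTree as ET


def copy_file_level(src, dst):
    """File-based level: move bytes without looking at their structure."""
    shutil.copyfileobj(src, dst)


def csv_to_xml_record_level(src_text, dst_text):
    """Record-based level: parse CSV into records, then write them as XML."""
    reader = csv.DictReader(src_text)  # the first row supplies the column names
    root = ET.Element("records")
    for row in reader:
        record = ET.SubElement(root, "record")
        for column, value in row.items():
            # The column name goes into an attribute so that any name is valid XML.
            ET.SubElement(record, "field", name=column).text = value
    dst_text.write(ET.tostring(root, encoding="unicode"))


# Usage: the same CSV content handled at each level.
raw = b"id,name\n1,Acme\n"

copied = io.BytesIO()
copy_file_level(io.BytesIO(raw), copied)  # bytes in, identical bytes out

converted = io.StringIO()
csv_to_xml_record_level(io.StringIO(raw.decode("utf-8")), converted)
```

The file-level copy never decodes or parses anything, which is why it is cheaper. The record-level path has to understand the content, which is what makes a format conversion possible.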
Any data source can be accessed at either of the levels listed above. InfoLink applies an optimization that chooses the access level for the source and target data sources to achieve the best performance. If a data source supports a native mechanism to access data at the required level, the native mechanism is used, because native implementations usually provide the best performance. Otherwise, InfoLink’s own mechanism is used.
For example, both Amazon S3 and Azure Blob Storage provide only file-based access. To copy a CSV file from S3 to Azure Blob Storage, InfoLink opens the file as a stream without analyzing its content (i.e. file-based access) and copies it to Azure Blob Storage. If you want to convert the file from CSV to XML, InfoLink parses the content of the file into records (i.e. record-based access), converts the records into XML, composes an XML file, and stores it in Azure Blob Storage.
Another instructive example is loading a table from PostgreSQL into MySQL. Both systems natively support exporting and importing CSV files (i.e. file-based access). The Load operation uses the file-based access level for both source and target, because exporting and then importing a CSV file is much faster than reading the source table record by record and inserting the records into the target table.
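The selection logic described above can be approximated by a small decision function. The rule below is a simplified assumption made for illustration. The text does not specify how InfoLink's optimizer actually works, and every name here (AccessLevel, DataSource, choose_level, mechanism) is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class AccessLevel(Enum):
    FILE = "file-based"
    RECORD = "record-based"


@dataclass(frozen=True)
class DataSource:
    name: str
    native_levels: FrozenSet[AccessLevel]  # levels the source supports natively


def choose_level(source, target, needs_conversion):
    """Pick one access level for a transfer (assumed, simplified rule)."""
    if needs_conversion:
        # Changing the content format requires parsing the data into records.
        return AccessLevel.RECORD
    if (AccessLevel.FILE in source.native_levels
            and AccessLevel.FILE in target.native_levels):
        # Both sides can move whole files natively, so no parsing is needed.
        return AccessLevel.FILE
    return AccessLevel.RECORD


def mechanism(data_source, level):
    """Prefer the native mechanism; otherwise fall back to InfoLink's own."""
    return "native" if level in data_source.native_levels else "infolink"


FILE, RECORD = AccessLevel.FILE, AccessLevel.RECORD
s3 = DataSource("Amazon S3", frozenset({FILE}))
azure = DataSource("Azure Blob Storage", frozenset({FILE}))
postgres = DataSource("PostgreSQL", frozenset({FILE, RECORD}))
mysql = DataSource("MySQL", frozenset({FILE, RECORD}))

plain_copy = choose_level(s3, azure, needs_conversion=False)
csv_to_xml = choose_level(s3, azure, needs_conversion=True)
table_load = choose_level(postgres, mysql, needs_conversion=False)
```

Under these assumptions, the S3 to Azure Blob Storage copy and the PostgreSQL to MySQL load both resolve to the file-based level, while the CSV to XML conversion resolves to the record-based level, where S3 has no native mechanism and InfoLink's own would be used.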