---
title: "What is an ETL and why PLM should care?"
date: 2018-10-18
categories: 
  - "software-solutions"
  - "technological-stack"
coverImage: "Shémas_dintégration_de_données_1.jpg"
---

Your PLM project should not create a new isolated island. If it does, you have not understood the digitalisation process and the digital thread concept, which apply not only to PLM but to your whole organisation. Therefore, you need to understand up front how your system will communicate with the rest of the company's tools. You will also have to define how it interacts with the outside world. An ETL is part of an ecosystem of tools that helps you with this.

> Don't start a PLM project without knowing what an ETL is!!

## ETL stands for Extract Transform Load

This is, I believe, the simplest TLA (three-letter acronym) I have seen so far. It says exactly what it does: Extract, Transform and Load data.

#### Extract

The main strength of an ETL in the Extract phase is that it lets you retrieve data from as many sources as possible. The sources can be diverse, from applications that expose an API to a simple text file stored in a folder.

Types of systems you may query:

- Business applications

<figure>

![](images/applicationIntegration.png)

<figcaption>

Talend native list of application connectors

</figcaption>

</figure>

- Web services
- Databases
- Files

<figure>

![](images/filesInputs.png)

<figcaption>

Talend input filetypes

</figcaption>

</figure>

The goal of an ETL is to offer as many connectors as possible. Talend's open source model has let the community build a lot of integrations.
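To make the idea concrete outside any particular tool, here is a minimal Python sketch of the Extract phase. It is not how Talend implements its connectors. It reads from two hypothetical sources, a CSV export and a SQLite table, and returns plain dictionaries so that the following steps do not need to know where a record came from. The function names, the `parts` table and its columns are assumptions made for illustration.

```python
import csv
import io
import sqlite3
from typing import Iterator


def extract_csv(text: str) -> Iterator[dict]:
    """Yield one dict per row of a CSV document with a header line."""
    yield from csv.DictReader(io.StringIO(text))


def extract_sqlite(conn: sqlite3.Connection, query: str) -> Iterator[dict]:
    """Yield one dict per row returned by a SQL query."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))


# Hypothetical source 1: a flat file exported by a legacy tool.
LEGACY_EXPORT = "part_number,description\nP-100,Bracket\nP-200,Housing\n"

# Hypothetical source 2: a table in a business application's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (part_number TEXT, unit_cost REAL)")
conn.executemany("INSERT INTO parts VALUES (?, ?)", [("P-100", 4.5), ("P-200", 12.0)])

file_records = list(extract_csv(LEGACY_EXPORT))
db_records = list(
    extract_sqlite(conn, "SELECT part_number, unit_cost FROM parts ORDER BY part_number")
)
```

Both sources end up as lists of dictionaries, which is roughly what an ETL connector gives you: a uniform row format regardless of the origin.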

#### Transform

Transform is where it becomes much trickier. Transformation requires a lot of different capabilities, like mapping fields, converting flows into arrays of data or into objects, filtering data, joining tables, aggregating data, etc. When you have exhausted the tools your ETL provides, most of them let you add custom code to make sure you are not limited.

<figure>

![](images/processingData.png)

<figcaption>

Talend toolset for transforming data

</figcaption>

</figure>
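The operations named in this section (mapping fields, filtering, joining and aggregating) can be sketched in a few lines of plain Python. This is only an illustration of what the graphical components do for you. The BOM lines, the cost table and every field name below are hypothetical.

```python
from collections import defaultdict

# Hypothetical extracted records; field names are illustrative only.
bom_lines = [
    {"PARENT": "A-1", "CHILD": "P-100", "QTY": "2"},
    {"PARENT": "A-1", "CHILD": "P-200", "QTY": "1"},
    {"PARENT": "A-2", "CHILD": "P-100", "QTY": "4"},
    {"PARENT": "A-2", "CHILD": "P-999", "QTY": "1"},  # no known cost
]
costs = {"P-100": 4.5, "P-200": 12.0}


def map_fields(line: dict) -> dict:
    """Map source field names to target names and convert types."""
    return {
        "assembly": line["PARENT"],
        "part": line["CHILD"],
        "quantity": int(line["QTY"]),
    }


def transform(lines: list, unit_costs: dict) -> dict:
    """Map, filter, join and aggregate BOM lines into a cost per assembly."""
    totals = defaultdict(float)
    for line in map(map_fields, lines):
        if line["part"] not in unit_costs:  # filter: skip parts without a cost
            continue
        cost = unit_costs[line["part"]]  # join: look up the cost by part number
        totals[line["assembly"]] += line["quantity"] * cost  # aggregate
    return dict(totals)


cost_per_assembly = transform(bom_lines, costs)
```

Silently skipping the part without a cost is a design choice made for brevity here. In a real flow you would more likely route such rows to a reject output so that someone can review them.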

#### Load

Finally, the Load process has the same technical goal as the Extract process: load the prepared data into as many target systems as possible.
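As a rough sketch of the Load step, the snippet below writes prepared records into a SQLite table standing in for the target system. The `part` table, its columns and the `load` function are assumptions for illustration. A real target would more often be reached through its own API or a dedicated connector.

```python
import sqlite3

# Hypothetical prepared records coming out of the Transform step.
prepared = [
    {"part_number": "P-100", "description": "Bracket"},
    {"part_number": "P-200", "description": "Housing"},
]


def load(conn: sqlite3.Connection, records: list) -> int:
    """Insert or update records in the target table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS part ("
        "part_number TEXT PRIMARY KEY, description TEXT)"
    )
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO part (part_number, description) "
            "VALUES (:part_number, :description)",
            records,
        )
    return conn.execute("SELECT COUNT(*) FROM part").fetchone()[0]


target = sqlite3.connect(":memory:")
row_count = load(target, prepared)
row_count_after_rerun = load(target, prepared)  # re-running does not duplicate rows
```

Keying the load on `part_number` makes the flow safe to run again, which matters when a migration or synchronization has to be replayed.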

## How does it fit in your PLM environment?

### Migration

The #1 scenario for an ETL is migration. I have managed a few migrations perfectly with an ETL. Usually the graphical UI and the versioning of your ETL setup make it possible to explain how the migration flow works without getting too technical.

### Keeping in touch with legacy systems and files

I have used ETLs several times to make sure legacy systems could still be integrated with the new solution being deployed. This is usually where we work the most with extracting and inserting data in databases, or even playing with files. ETLs often have a cool feature that watches a folder for changes, so whenever a new file appears it can trigger an ETL flow.

<figure>

![](images/waitForTalend.png)

<figcaption>

Talend's tools for triggering a flow on a change

</figcaption>

</figure>
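The folder-watching behaviour can be approximated with simple polling. The sketch below is not Talend's mechanism; it only shows the idea of comparing the folder content with what has already been seen and triggering a flow for each new file. `new_files`, `run_flow` and the file name are hypothetical.

```python
import tempfile
from pathlib import Path


def new_files(folder: Path, seen: set) -> list:
    """Return files that appeared in `folder` since the previous call."""
    current = {p.name for p in folder.iterdir() if p.is_file()}
    added = sorted(current - seen)
    seen.update(added)
    return [folder / name for name in added]


def run_flow(path: Path) -> str:
    """Placeholder for the ETL flow triggered by a new file."""
    return f"processed {path.name}"


# Demonstration in a temporary folder standing in for the watched drop folder.
with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp)
    seen = set()
    first_scan = new_files(inbox, seen)  # nothing there yet
    (inbox / "export_001.csv").write_text("part_number\nP-100\n")
    results = [run_flow(p) for p in new_files(inbox, seen)]
    third_scan = new_files(inbox, seen)  # the file was already handled
```

In practice the scan would run in a loop with a pause between iterations, or be replaced by the operating system's file notification facilities.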

### Connecting authoring tools and central data

The long-term use case for an ETL is the connection with a larger enterprise system like an ESB (Enterprise Service Bus), which I will describe in a future blog post. The goal is to have a central system that manages the different data sources and connects triggers and data on a single bus. The connection between this bus and any other system would be handled by an ETL, which makes it possible to standardize the data on the bus as much as possible.

## The Risk!

The one risk with an ETL is to start creating too many one-to-one connections, which become complicated to maintain at some point. Depending on the context, this might suit you very well because you need to keep these integrations independent in their evolution. But the bigger the system becomes, the more you will need to look for a better-organized system using an ESB.
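A quick back-of-the-envelope calculation shows why one-to-one connections stop scaling. The sketch assumes the worst case, where every system has to exchange data with every other one, and compares it with a setup where each system connects once to a shared bus. Real landscapes are usually sparser, so treat the numbers as an upper bound.

```python
def point_to_point_links(systems: int) -> int:
    """Links needed if every system is wired directly to every other one."""
    return systems * (systems - 1) // 2


def bus_links(systems: int) -> int:
    """Links needed if every system connects once to a shared bus."""
    return systems


# Maps a number of systems to (point-to-point links, bus links).
comparison = {n: (point_to_point_links(n), bus_links(n)) for n in (3, 5, 10)}
```

With 3 systems both approaches need 3 links, but with 10 systems the worst case grows to 45 direct links against 10 connections to the bus.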

## Sample use-cases

- Retrieve part and labor costs from various ERPs and inject them into your change management process, in order to give the right cost-saving information to the cost engineer.
- Replace software that was reading files from another manufacturing application, using an ETL to transform the input file into web-service calls to the new system.
- Migrate data from Excel files and Access databases to fill the newly deployed PLM solution.
- Synchronize an engineering BOM from one PLM to another.
- Synchronize a Problem Report listing in a PLM from a bug tracker like Mantis, Jira or others.
- ...
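To illustrate the second use case, here is a minimal sketch of turning a file into web-service calls. The endpoint URL, the semicolon-separated layout and the field names are all hypothetical. The requests are only prepared and never sent, so the snippet runs without any network access.

```python
import csv
import io
import json
from urllib.request import Request

# Hypothetical endpoint of the new system; replace with the real one.
ENDPOINT = "https://plm.example.com/api/work-orders"


def file_to_requests(text: str) -> list:
    """Turn each row of the legacy input file into a prepared POST request."""
    requests = []
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        body = json.dumps({"order": row["ORDER_ID"], "quantity": int(row["QTY"])})
        requests.append(
            Request(
                ENDPOINT,
                data=body.encode("utf-8"),
                headers={"Content-Type": "application/json"},
                method="POST",
            )
        )
    return requests


legacy_file = "ORDER_ID;QTY\nWO-1;10\nWO-2;25\n"
prepared_calls = file_to_requests(legacy_file)
# Sending is left out on purpose: urllib.request.urlopen(call) would perform it.
```

An ETL tool does the same thing with a file input component wired to a REST or SOAP output component, plus error handling and logging that this sketch leaves out.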

## Some existing solutions you can download:

I have found everything I wanted using Talend. I haven't tried others, except CloverETL a few years back.

- [Talend](https://www.talend.com/download/?_ga=2.235379266.2112442318.1539861869-190797227.1539861869)
- [Pentaho data integrator](https://www.hitachivantara.com/en-us/products/big-data-integration-analytics/pentaho-trial-download.html)
- [Jaspersoft ETL](https://community.jaspersoft.com/project/jaspersoft-etl)
- [GeoKettle](http://www.spatialytics.org/projects/geokettle/)

- [CloverETL](https://www.cloverdx.com/product)
- HPCC systems
- Jedox
- Apatar

Here is a great video introducing ETL:

https://www.youtube.com/watch?v=K_FCHYWGGug
