Each day, The ETL converter should copy the file to a local directory via FTP; convert the file to SQL, then delete the file from the local dir. It could be 4 times a day, maybe one hour after the txt file is completed on the PLC.
We can not delete the file from the PLC, since others should also have access to the data.
The data on the PLC will be deleted after 10 days using FIFO
How do I keep track of which files I have copied ?
Could create a TXT file on the local drive and write the file names, compare with what’s on the FTP drive ?
But how do we do this? That is file comparison..can ETL do this ?
Is there any other way to do this ?
Keep in mind, it could be that the internet connection is offline for a day or 2, the server could be down, The PLC could be down. In some situations you will need to copy 4 files, sometimes 6 files..depends.. Or you could have gaps in the timestamps of the file names from the PLC if it has been down for a while. I know we would then also have gaps in the SQL, but that is OK.
There are several ways of performing this task
First of all, you need to speak with the company which provides the files. It takes time to upload files and you need to make sure that you are not downloading files while they are being uploaded. For example, you can ask them to put additional files once the process is completed. If the file is there the process completed and you can download the files.
The next step is to load data into the database. That is easy, there are several tutorials about it.
Avoiding loading the same data twice.
First of all, we need to get the source file name, this can be done by using a metadata transformation object
Then we need to check if the file was loaded before, we can use Validator “In List object” for that
and the last step to load the data into the table and load the list of processed files into a separate table
Overall logic as follows