Project Title
Extract and Upload TSV file into database
Description
The data to be imported into the database is available in a .tar format which is about 700MB large. When unzipped there are around 50 compressed files with the extension .txt.gz. These files are grouped into approximately 17 categories.
Requirements
Task 1: We need a executable application which can be initiated via windows scheduled tasks that will download the 700MB data file from an ftp location, extract the 700mb file then extract each of the resulting compressed files produced to produce txt files.
Task 2: We also require a sql script that when run will create a sql job in mssql server 2008 to run at a specific time daily which will read all the resulting txt files and bulk insert the values into one table in the database.
The data file in the ftp location is placed there daily so these 2 tasks must run daily except when the Datafile has not been accurately put in the ftp location. Therefore a check needs to be done to determine that the Datafile is OK for processing. To determine if the Datafile is OK for processing in the FTP location there is a status.txt file which will contain similar data to what is shown below
Mon Jan 11 03:33:03 EST 2010|1263198783|OFFER|3262698|SUCCESS
ESS
The whole process must not execute if the Datafile is corrupt, out of date or if the status.txt file does not indicate that the Datafile in the ftp location was successfully added. The Datafile must also be not older the 24hrs.
When the process executes it should take no longer than 30 minutes to complete.
There needs to be flexibility allowing changes in the time the process is executed
Task 1 must always complete before task 2 starts. If task 1 does not take place because the Datafile is corrupt, out of date or if the status.txt file does not indicate that the Datafile in the ftp location was successfully added then task 2 must not execute. Some system of checking programatically whether the executable processes task 1 and whether the sql job should process task 2 needs to be put in place.
A daily email needs to be sent giving relevant details of how the process went. This could be more than 2 email's using the executable to send the first email stating the success or failure with a brief description. The 2nd email could be sent by sqlserver detailing success or failure with a brief description.
These 2 tasks will be used on different servers so must be configurable with different connection strings, ftp logins, smtp settings, execution times, database names e.t.c.
Server details