Utah Python Meeting March 2015 - ETL (Extract, Transform and Load) by Robert Stewart

Clint Savage

Mar 5, 2015, 12:23:02 PM
to Utah Python List
The March 2015 Utah Python Meeting is next Thursday, March 12, 2015!

We're solving a problem again!

Robert Stewart replied to my request and suggested looking at tools for ETL (Extract, Transform and Load). Below is a sample problem; we are looking for solutions built around this concept.

Data and Python
=====================

A small training program wants to improve its curriculum. It supplies
comma-separated value (CSV) dumps from the spreadsheets it currently
uses to track its program. The task?

1. Get the data in these csv files into a relational database.
2. Automate the process.
3. Update the process as the source and target schemas change.
4. Update the process without causing the automated piece to malfunction.
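
To make the task concrete, here is a minimal sketch of steps 1 and 2 using only the standard library. Everything specific in it is an assumption for illustration: the `enrollment` table, the `COLUMN_MAP` header names, and SQLite as the target are all hypothetical, since the actual spreadsheets and database are not described. Matching columns by header name rather than by position is one way to keep the loader working when columns are added or reordered, and `INSERT OR REPLACE` makes reruns safe by replacing rows that share a primary key.

```python
import csv
import io
import sqlite3

# Hypothetical mapping from spreadsheet headers to target column names.
COLUMN_MAP = {"Student ID": "student_id", "Name": "name", "Course": "course"}


def load_csv(conn, csv_file):
    """Load rows from an open CSV file into the (hypothetical) enrollment table.

    Returns the number of rows written. Re-running with the same keys
    replaces the existing rows instead of duplicating them.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS enrollment ("
        "student_id TEXT, name TEXT, course TEXT, "
        "PRIMARY KEY (student_id, course))"
    )
    reader = csv.DictReader(csv_file)
    # Fail loudly if the source schema lost a column we depend on.
    missing = set(COLUMN_MAP) - set(reader.fieldnames or [])
    if missing:
        raise ValueError("CSV is missing expected columns: %s" % sorted(missing))

    # Pick fields by header name; extra or reordered columns are ignored.
    rows = [
        tuple(record[header].strip() for header in COLUMN_MAP)
        for record in reader
    ]
    columns = ", ".join(COLUMN_MAP.values())
    placeholders = ", ".join("?" for _ in COLUMN_MAP)
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO enrollment (%s) VALUES (%s)"
            % (columns, placeholders),
            rows,
        )
    return len(rows)


if __name__ == "__main__":
    sample = io.StringIO("Student ID,Name,Course\n1,Ann,Python\n2,Bob,SQL\n")
    connection = sqlite3.connect(":memory:")
    print(load_csv(connection, sample))
```

Note that `INSERT OR REPLACE` deletes and reinserts the conflicting row, which may not be what you want if other tables reference it; a real target database would likely need its own upsert syntax.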

Questions to discuss:

- What's the best way to manage schema changes?
- How should inserts and updates be handled?
- How should the loading process be automated and monitored for errors?
- How should the data be normalized?
- What tool set is best for handling this kind of problem?
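
As one possible starting point for the normalization and error-monitoring questions, the sketch below splits flat spreadsheet rows into a student lookup and a set of enrollments, and sets aside rows that fail a basic check instead of aborting the whole load. The field names (`student_id`, `name`, `course`), the validation rule, and the `etl` logger name are all assumptions made for illustration, not a description of the real data.

```python
import logging

logger = logging.getLogger("etl")  # hypothetical logger name


def split_rows(records):
    """Split flat records into normalized pieces plus a list of rejects.

    records: iterable of dicts with (assumed) keys student_id, name, course.
    Returns (students, enrollments, rejects).
    """
    students = {}        # student_id -> cleaned name, one entry per student
    enrollments = set()  # (student_id, course) pairs, duplicates collapse
    rejects = []         # (line number, original record) for later review

    # Start at 2 on the assumption that line 1 of the CSV is the header.
    for line_no, record in enumerate(records, start=2):
        student_id = (record.get("student_id") or "").strip()
        name = " ".join((record.get("name") or "").split())
        course = (record.get("course") or "").strip().lower()

        # Assumed rule: a row is usable only with both an id and a course.
        if not student_id or not course:
            rejects.append((line_no, record))
            logger.warning("Rejected line %d: %r", line_no, record)
            continue

        students[student_id] = name
        enrollments.add((student_id, course))

    return students, enrollments, rejects
```

Returning the rejects rather than only logging them lets an automated job decide what to do next, for example failing the run when too many rows are rejected.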

What's your solution?

Although the focus is on Python, I'm open to good solutions from any
technology.

-- Where --

Needle.com
14864 S. Pony Express Road
Bluffdale, Utah 84065
Map: http://goo.gl/maps/EEhw9

-- When --

Thursday, March 12, 2015 @ 7pm

Cheers,

herlo

Vernon D. Cole

Mar 6, 2015, 12:11:10 PM
to utahp...@googlegroups.com
I spent a lot of time doing this for a series of CDC surveys in Nigeria.  Copying spreadsheet data into a DBMS is easy. (I was feeding a Django PostGIS database.) Normalizing and validating the data before it goes in -- that is a challenge.
I have some code samples of how I did it.  I will bring them if I can -- the good car will be in the body shop, so I will have to make the pilgrimage in my pickup.
--
Vernon Cole
