Processing Multple CSV Files into DB

562 views
Skip to first unread message

jp

unread,
Feb 16, 2012, 9:00:42 AM2/16/12
to Scriptella ETL
Hi,

I am looking to use scriptella to import multiple csv files into an
Oracle database, but am not sure about the file handling side. I have
the CSV to Oracle part working, but how do i pass the file name for
processing by scriptella from the command line - is it possible to do?
I am looking to run a scheduled job to pick up all the files in a
directory (*.csv) and use scriptella to load them into the database.
Can anyone point me in the right direction with this? I also need to
reference the file name passed from the command line to use in the
insert command as it needs to be stored in the database. I have this
set-up as a basic test...


<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>

<connection id="in" driver="csv" url="data.csv"/>
<connection id="out" url="jdbc:oracle:thin:user/
pa...@1.2.3.4:1521:DBNAME" classpath="ojdbc6.jar"/>

<query connection-id="in">
<!--Empty query means select all-->
<script connection-id="out">
INSERT INTO CSV_LOAD_TEST VALUES
( csv_load_test_seq.nextval, ?priority, ?summary, ?status,data.csv,?
id)
</script>
</query>
</etl>

Any help appreciated,

Thanks,

jp

Fyodor Kupolov

unread,
Feb 17, 2012, 11:03:01 AM2/17/12
to Scriptella ETL
Hi,

You can write a shell script which gets the list of files and calls
scriptella with the file name specified as a system property. Example
of invocation from the command line:
java -DcsvFile=/path/to/data.csv -jar scriptella.jar <ETL_FILE_NAME>

A standard properties syntax can be used in the ETL file:
<connection id="in" driver="csv" url="$csvFile"/>

Regards,
Fyodor


On Feb 16, 3:00 pm, jp <jpfranc...@gmail.com> wrote:
> Hi,
>
> I am looking to use scriptella to import multiple csv files into an
> Oracle database, but am not sure about the file handling side. I have
> the CSV to Oracle part working, but how do i pass the file name for
> processing by scriptella from the command line - is it possible to do?
> I am looking to run a scheduled job to pick up all the files in a
> directory (*.csv) and use scriptella to load them into the database.
> Can anyone point me in the right direction with this? I also need to
> reference the file name passed from the command line to use in the
> insert command as it needs to be stored in the database. I have this
> set-up as a basic test...
>
> <!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
> <etl>
>
> <connection id="in" driver="csv" url="data.csv"/>
> <connection id="out" url="jdbc:oracle:thin:user/
> p...@1.2.3.4:1521:DBNAME" classpath="ojdbc6.jar"/>

jp

unread,
Feb 22, 2012, 11:40:08 AM2/22/12
to Scriptella ETL
Thanks for this - works great. I have a couple of questions relating
to this process i hope you can help with. I want to run some
validation on the csv file prior to loading it into the database. Can
this be acheived in scriptella or should i be looking at using
something else for this bit? What i want to do is:

1. Validate the csv file is utf8 encoded. This is a requirement for
our data load.
2. Check the csv has the correct number of columns.
3. Check the columns are named correctly.
4. Run a pass of the data ensuring the data complies with our schema.
I need to test if fields contain valid dates, ip address, urls etc

I have not tried anything like this in scriptella before so any help
appreaciated.

cheers,

jp
Reply all
Reply to author
Forward
0 new messages