trouble with bulk_import and postgresql sequences

Giovanni Messina

unread,

Sep 3, 2012, 5:19:57 AM9/3/12

to activewareh...@googlegroups.com

Hi all,

I'm migrating an entire system from the old database with a few changes

I always pass by the production of a csv file to verify the correctness of conversions

bulk_import works perfectly with the data entry, but does not update sequence with the value of the last record created for the generation of ids

This problem did not appear with mysql but remains with postgresql

when I try to create by console or by rails interface works perfectly even the autoincrement

this is the relevant code

# import.ctl

ActiveRecord::Base.establish_connection(:development)

post_process :bulk_import, {

:file => file,

:truncate => true,

:columns => destination_fields,

:field_separator => ',',

:target => :development,

:table => 'punti_misura'

}

any idea?

Giovanni

Thibaut Barrère

unread,

Sep 3, 2012, 6:18:48 AM9/3/12

to activewareh...@googlegroups.com

Hi Giovanni,

I'm not using Postgres with bulk load currently, but I suspect you'll
have to set the sequence manually in that case (using a post process
for instance).

I think I can find a link on how to do that, I'll try to post it back!

Thibaut
--
http://www.logeek.fr

Giovanni Messina

unread,

Sep 5, 2012, 10:56:24 AM9/5/12

to activewareh...@googlegroups.com

Thank,

this is the sql code to do

SELECT setval('your_table_id_seq', (SELECT MAX(id) FROM your_table)+1);

how integrate it in a post process?

Giovanni

2012/9/3 Thibaut Barrère <thibaut...@gmail.com>

> --
> You received this message because you are subscribed to the Google Groups "ActiveWarehouse Discuss" group.
> To post to this group, send email to activewareh...@googlegroups.com.
> To unsubscribe from this group, send email to activewarehouse-d...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/activewarehouse-discuss?hl=en.
>

Thibaut Barrère

unread,

Sep 5, 2012, 3:16:39 PM9/5/12

to activewareh...@googlegroups.com

Hi Giovanni,

this is the sql code to do

how integrate it in a post process?

you need to retrieve the ActiveRecord connection then use "execute" on the adapter. It could be:

post_process do

query = "SELECT setval('#{your_table_id_seq}', (SELECT MAX(id) FROM #{your_table})+1);"

ETL::Engine.connection("your_db_section_name_in_database_yml").execute(query)

end

or you could also directly ask ActiveRecord with ActiveRecord::Base.connection.

Let me know how it goes!

Thibaut

--

http://www.logeek.fr

Giovanni Messina

unread,

Sep 10, 2012, 11:50:54 AM9/10/12

to activewareh...@googlegroups.com

work all! :-)

post_process do

query = "SELECT setval('#{table}_id_seq', (SELECT MAX(id) FROM #{table})+1);"

ETL::Engine.connection(environment).execute(query)

end

To document this little option needed with postgresql I will continue to add in my wiki repository

Thibaut Barrère

unread,

Sep 10, 2012, 2:02:52 PM9/10/12

to activewareh...@googlegroups.com

Hi,

work all! :-)

Pretty cool! Thanks for the confirmation.

To document this little option needed with postgresql I will continue to add in my wiki repository

Please do so we keep a trace!

-- Thibaut

Giovanni Messina

unread,

Sep 11, 2012, 4:10:37 AM9/11/12

to activewareh...@googlegroups.com

This is an attempt to structure, in a form more navigable, the documentation available updating it at some points

https://github.com/gmgp/activewarehouse-etl/wiki

After a lot of emails I lost the conclusions on what are the guidelines for documentation and I do not know if I've done things in alignment

But I think this is a little off topic :-)

Gio

2012/9/10 Thibaut Barrère <thibaut...@gmail.com>

--

Thibaut Barrère

unread,

Sep 11, 2012, 4:13:37 AM9/11/12

to activewareh...@googlegroups.com

This is an attempt to structure, in a form more navigable, the documentation available updating it at some points

https://github.com/gmgp/activewarehouse-etl/wiki
After a lot of emails I lost the conclusions on what are the guidelines for documentation and I do not know if I've done things in alignment

Great work! Thanks! Well the conclusions were that I don't currently have enough time to bootstrap the documentation effort.

I started a "docs" branch here inspired by Vagrant but this is not finished:

https://github.com/activewarehouse/activewarehouse-etl/tree/docs

In the interim, can I add you on the master wiki as a contributor? Maybe we can merge it this way?

Let me know what you think!

-- Thibaut

Giovanni Messina

unread,

Sep 11, 2012, 4:50:10 AM9/11/12

to activewareh...@googlegroups.com

for me is ok!

Gio

PS

Before push I try a little in a dummy repo not to do too much damage with the wiki's git :-)

2012/9/11 Thibaut Barrère <thibaut...@gmail.com>

--

Thibaut Barrère

unread,

Sep 11, 2012, 5:13:47 AM9/11/12

to activewareh...@googlegroups.com

Hey,

for me is ok!

Cool! Can you try cloning and pushing via this?:

https://github.com/activewarehouse/activewarehouse-etl/wiki/_access

I *think* it should work as is. If it does not let me know and I'll fix that.

Before push I try a little in a dummy repo not to do too much damage with the wiki's git :-)

I trust you for that :)

Thanks for helping out!

Thibaut

--

http://www.logeek.fr

Jamie van Dyke

unread,

Oct 7, 2012, 10:04:01 AM10/7/12

to activewareh...@googlegroups.com

Thanks for this, I was about to implement the same myself. I have a question though, are you using 'use_temp_tables' at any point? I find it never copies sequences or indexes across, so let's say I was importing to a 'companies' table, I'd get it replaced with one that has no indexes or sequences.

The only I've got it to work (with my little understanding of AW) is to hack the gems to support:

CREATE TABLE tmp_companies ( LIKE companies INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES)

INSERT INTO tmp_companies SELECT * FROM companies

Then my only issue is the drop_table fails because it has dependents i.e. the sequence constraints, which I can get around by using DROP ... CASCADE.

Thoughts? Am I doing it wrong?

Thibaut Barrère

unread,

Oct 8, 2012, 6:09:32 AM10/8/12

to activewareh...@googlegroups.com, Giovanni Messina

Hey,

a couple of (random) notes:

- I'm not using the built-in temp tables mechanism on my side (I used other non built-in mechanisms, such as creating a temp table or staging area myself, though).

- if I remember well, I read that it's fairly frequent to disable indexes and constraints completely in ETL/DW

- my current impression is that temp tables, like job scheduling, or error handling, are areas where we should let the end-user decide more in the future (ie: provide hooks etc) rather than implement a specific scenario that does not makes anyone happy :)

- I suspect you already found this, but in case; the copy for pg is done here:

https://github.com/activewarehouse/adapter_extensions/blob/master/lib/adapter_extensions/adapters/postgresql_adapter.rb

If you describe your underlying needs a bit more, I or someone else will be able to give more thoughts on this?

Giovanni, I'd be also curious to know if you used temp tables on this one!

cheers,

Thibaut

--

http://www.logeek.fr

Jamie van Dyke

unread,

Oct 8, 2012, 8:42:15 AM10/8/12

to activewareh...@googlegroups.com, Giovanni Messina

Thanks Thibaut,

Could you give me an example of how you did that temp_table manually? I know the SQL to do it in postgres, as well as ensure constraints and indexes etc. work, but not from within AW.

Giovanni Messina

unread,

Oct 8, 2012, 8:45:30 AM10/8/12

to Thibaut Barrère, activewareh...@googlegroups.com

I'm really under pressure at the time

briefly

I've always used for my temporary migration data to a csv file for checking the correctness of the procedure and then load in the table post process

below a ctl file that describes how I tried to do, I hope will be helpful.

I hope in particular to be able to breathe to take back this part because at the time the "emergency" have become more :-)

environment ||= "development"

ActiveRecord::Base.establish_connection(environment)

require File.expand_path('../../config/environment', __FILE__)

file = File.expand_path(File.dirname(__FILE__) + '/ana_source/DEV.csv')

file2 = File.expand_path(File.dirname(__FILE__) + '/out/scale.csv')

source_fields = [

:codice,

...

]

source :in, {

:file => file,

:parser => :csv #,

# :skip_lines => 13

}, source_fields

after_read do |r|

r[:location] =~ /\A(\d{5}) (.*) \((\w{2})\)\z/ ? r : nil

end

## find rails id for condominio with codice

copy :codice, :condominio_id

transform(:condominio_id) do |key, value, r|

r[:condominio_id] = r[:condominio_id].gsub(/\AC(\d{3})\.\d\z/, '\\1')

end

transform :condominio_id, :foreign_key_lookup, {

:resolver => ActiveRecordResolver.new(Condominio, :find_by_codice),

}

before_write do |r|

r[:codice] = r[:codice].gsub(/\AC\d{3}.(\d)\z/, '\\1')

r

end

ora = DateTime.now

destination :out, {

:file => file2,

:truncate => true

},

{

:order => [

:id,

:codice,

:condominio_id,

...

:created_at,

:updated_at

],

:virtual => {

:id => :surrogate_key,

:created_at => ora,

:updated_at => ora

}

table = "scale"

post_process :bulk_import, {

:file => file2,

:truncate => true,

:columns => [

:id,

:codice,

...

:created_at,

:updated_at

],

:field_separator => ',',

:target => environment,

:table => table

}

post_process do

query = "SELECT setval('#{table}_id_seq', (SELECT MAX(id) FROM #{table})+1);"

ETL::Engine.connection(environment).execute(query)

end

2012/10/8 Thibaut Barrère <thibaut...@gmail.com>

Thibaut Barrère

unread,

Oct 8, 2012, 1:48:35 PM10/8/12

to ActiveWarehouse Discuss, Giovanni Messina

Hello Jamie,

> Could you give me an example of how you did that temp_table manually?

> I know the SQL to do it in postgres, as well as ensure constraints and indexes etc. work, but not from within AW.

You can either use a specific control file with a pre_process block, or a pre_process block inside the same control file.

Here is a related example where I launch migrations (but you could run arbitrary SQL here instead):

https://github.com/activewarehouse/activewarehouse-etl-sample/blob/master/advanced/etl/prepare_db.ctl

The pre_process blocks are invoked right after the control file is loaded, to achieve things like creating folders, transcoding files, preparing db stuff etc.

Hope this helps!

Thibaut

--

http://www.logeek.fr

Reply all

Reply to author

Forward