Bulk Create for models?


Rob Galanakis

Aug 29, 2021, 10:51:36 PM
to sequel-talk
I am trying to optimize some code to use bulk creation of models (on the order of dozens, not hundreds). I still need model instances afterward, so raw dataset methods aren't suitable.

I don't see a way to do this with Sequel. We have Dataset#import and #multi_insert, and they can return an array of primary keys, but there is internal state in Model (the @new and @modified instance variables, among others) that isn't publicly settable, so even if I set the primary key value, the model still thinks it is new. It would be nice at times to "enqueue" an unsaved model for bulk insert, save it, and then use it once saved.

Is this something reasonable to add? Django, which for the most part I loathe, has this sort of thing (of course its bulk_create is entirely hand-rolled and does not use the underlying database machinery, so it is riddled with other issues). It would also seem possible to work around this by being able to update a model in place, or by 'refreshing' it from some data rather than from the database (`_clear_changed_columns` is not enough because of the internal state).

Anyway, this can be worked around, but I did want to bring it up in case there is interest.

Thanks as always for Sequel!
- Rob

Jeremy Evans

Aug 29, 2021, 11:01:10 PM
to seque...@googlegroups.com
On Sun, Aug 29, 2021 at 7:51 PM Rob Galanakis <r...@lithic.tech> wrote:
> I am trying to optimize some code to use bulk creation of models- on the order of dozens, not hundreds, and I do still need models, so raw dataset methods aren't suitable.
>
> I don't see a way to do this with Sequel- we have Dataset#import and #multi_insert, and they can return an array of primary keys. But there is internal machinery in Model (@new and @modified attributes among others) that is unavailable, so even if I set the primary key value, the model still thinks it is new. It'd be nice at times to "enqueue" an unsaved model for bulk insert; save it; and then use it once saved.
>
> Is this something reasonable to add? Django, which for the most part I loathe, has this sort of thing (of course its bulk create is entirely hand-rolled and does not use underlying database machinery so is filled with other issues). It would also seem possible to work around this by being able to update a model in-place, or something like 'refreshing' it from some data rather than the database (`_clear_changed_columns` is not enough because of the internal attributes).

I don't think it sounds like something I'd want to add to Sequel::Model, but I'd have to see an implementation.  There would have to be some significant advantage over the equivalent of:

  enumerable.each{ModelClass.create(...)}
 
It would also have to be general enough that I could see a substantial number of people using it.

You might want to start out creating a plugin for what you want.

Thanks,
Jeremy

Rob Galanakis

Aug 29, 2021, 11:48:44 PM
to seque...@googlegroups.com
For now I went with something like this (my actual implementation isn't quite so general):

```ruby
module Bulk
  class Ref
    attr_accessor :instance
    def initialize(instance)
      @instance = instance
    end
  end

  class Creator
    def initialize(model_cls)
      @model_cls = model_cls
      @refs = []
    end

    # Register an unsaved model instance for later bulk insertion.
    def prepare(instance)
      r = Ref.new(instance)
      @refs << r
      return r
    end

    # Insert all prepared instances in one call, then swap each Ref's
    # instance for the freshly loaded row (assumes an :id primary key).
    def create_all
      pks = @model_cls.dataset.multi_insert(@refs.map(&:instance), return: :primary_key)
      by_pk = @model_cls.where(id: pks).as_hash(:id)
      refs = @refs
      @refs = []
      refs.each_with_index do |ref, i|
        ref.instance = by_pk[pks[i]]
      end
      return refs
    end
  end
end
```

I don't mind the additional caller complexity of having to use `ref.instance` for now. I'm a bit hesitant to go the plugin route, since it would have to muck around with the guts of Model, but I'll keep it in mind if I need it in the future. (For what it's worth, I have had to add a Sequel-style multi_insert helper to every significant Django project I work on; this is the first time I've needed a Django-style bulk_create helper in seven years of Sequel.)

- Rob

--
You received this message because you are subscribed to a topic in the Google Groups "sequel-talk" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/sequel-talk/w8pWnUWHaC4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to sequel-talk...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sequel-talk/CADGZSSfZpD2JnMowS9oh_H_tffDdfDGBw%3Dhec0WKRpz2_QoD0A%40mail.gmail.com.

Jeremy Evans

Aug 30, 2021, 1:28:09 AM
to seque...@googlegroups.com
For what you want, assuming you have an array of model instances, you can use this approach (should work on PostgreSQL and SQLite):

```ruby
DB.create_table(:ts) do
  primary_key :id
  String :name
end

class T < Sequel::Model
end

a = [T.new(name: 'A'), T.new(name: 'B')]
columns = a.first.keys
a = T.dataset.
  returning.
  with_sql(:insert_sql, columns, DB.values(a.map{|x| x.values.values_at(*columns)})).
  all
```

Thanks,
Jeremy

Rob Galanakis

Aug 30, 2021, 3:19:36 PM
to seque...@googlegroups.com
Thanks, that is some useful code!
For this case, though, it doesn't fulfill the need: the original models in `a` are replaced with new instances. For example, this situation would need to pass:

```ruby
x = T.new(name: 'A')
a = [x]
bulk_create_ala_jeremy(a)
assert x.id > 0
assert !x.new?
```

Admittedly, this does seem rather obscure. In this case, I have a series of what are basically paired models; I want to do something like this:

```ruby
model_a = []
model_b = []
(0..5).each do |i|
  model_a << A.new
  model_b << B.new
end
bulk_create(model_b)
model_a.zip(model_b).each { |a, b| a[:b_id] = b.id }
bulk_create(model_a)
```

My experience with Django's bulk_create tells me there are more (and less exotic) cases, but I'm not sure that's true in general; it may be an artifact of the designs the Django ORM pushes you toward (or the bad modeling of the Django projects I've worked in, or both).

Thanks,
Rob


Jeremy Evans

Aug 30, 2021, 3:25:38 PM
to seque...@googlegroups.com
On Mon, Aug 30, 2021 at 12:19 PM Rob Galanakis <r...@lithic.tech> wrote:
> Thanks, that is some useful code!
> For this case though, it doesn't fulfill the need- the original models in `a` are replaced with new references. For example, this situation would need to pass:
>
> x = T.new(name: 'A')
> a = [x]
> bulk_create_ala_jeremy(a)
> assert x.id > 0
> assert !x.new?
>
> Admittedly, this does seem rather obscure- in this case, I have a series of what are basically paired models; I want to do something like this:
>
> model_a = []
> model_b = []
> (0..5).each do |i|
>   model_a << A.new
>   model_b << B.new
> end
> bulk_create(model_b)
> model_a.zip(model_b).each { |a, b| a[:b_id] = b.id }
> bulk_create(model_a)

For this type of thing, it sounds like you would want to use the nested_attributes plugin, so you could automatically create and associate the appropriate model_b when saving model_a.

Thanks,
Jeremy