The bulk processor itself will rely on the "insert only" bulk features of the database (see adapter_extensions on GitHub) and will make no attempt to upsert instead of insert.
If you need to upsert, here are a few possibilities:
- slow but easy: map an ActiveRecord model to your destination and rely on first_or_create or similar
- you could also mix the "update_database_destination" with bulk insert (using the bulk destination for inserts only, and the slower update destination for updates), but this would probably require two passes
- there is an "insert_update_database_destination" too, though it is probably a bit slow
Basically it really depends on the upsert capabilities of your target datastore (if any).
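To make the "slow but easy" option concrete, here is a minimal sketch of an upsert destination in the Kiba style (a class exposing `write(row)`). The class name, the `key:` option, and the in-memory Hash standing in for the target table are all illustrative assumptions, not an existing API; in a real pipeline you would back this with an ActiveRecord model and `where(...).first_or_create`, then update attributes.

```ruby
# Hypothetical upsert destination (illustrative names, not a Kiba API).
# An in-memory Hash stands in for the target table so the pattern stays
# runnable; swap it for Model.where(key => value).first_or_create in a
# real ActiveRecord-backed destination.
class UpsertDestination
  attr_reader :store

  def initialize(key:)
    @key = key    # natural key column, e.g. :email
    @store = {}   # stands in for the target table
  end

  # Kiba destinations implement write(row)
  def write(row)
    k = row.fetch(@key)
    if @store.key?(k)
      @store[k].merge!(row)  # "update" branch of the upsert
    else
      @store[k] = row.dup    # "insert" branch
    end
  end
end
```

Usage: writing two rows with the same key leaves a single, updated record — row-by-row, hence "slow but easy" compared to a bulk insert.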
Sidenote on handling upserts (recalling a note you sent me privately): here are common patterns for handling incremental loading:
- if your input data is immutable (it doesn't change once you've got it), you can just store an incremental id or timestamp and ask the source store for anything more recent than that id/timestamp; bulk insert will do marvels here
- if your input data is mutable, you'll have to rely on "modified_at > x" combined with "created_at > x" (or similar), and use proper upsert processing
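The immutable case above can be sketched in a few lines: keep a cursor (the last seen timestamp), ask the source only for rows newer than it, and advance the cursor as you go. `Cursor` and `fetch_new_rows` are illustrative names, and plain Ruby arrays stand in for the source store.

```ruby
# Sketch of the immutable-data pattern: a cursor plus "give me anything
# newer" queries. Names are illustrative, not a Kiba or AR API.
Cursor = Struct.new(:last_seen_at)

def fetch_new_rows(source_rows, cursor)
  fresh = source_rows.select { |r| r[:created_at] > cursor.last_seen_at }
  cursor.last_seen_at = fresh.map { |r| r[:created_at] }.max unless fresh.empty?
  fresh  # bulk insert these; no upsert needed since rows never change
end
```

Running it twice against the same source returns the new rows once, then nothing, which is exactly what lets plain bulk insert work here.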
Be extra careful to handle errors properly here, i.e. you'll want to restart the whole process if something goes wrong, to make sure you don't miss any record or record update.
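One way to get that restart safety is to advance the persisted cursor only after the whole batch has succeeded, so a crash mid-batch replays the entire batch instead of silently skipping records. This is a hedged sketch with illustrative names (`run_increment`, the block standing in for your destination writes); how you persist the cursor is up to your state store.

```ruby
# Sketch of restart-safe incremental processing: the new cursor value is
# returned only if every row was written, so the caller persists it only
# on success. If the block raises, the exception propagates and the old
# cursor stays in place, forcing a full replay of the batch on restart.
def run_increment(rows, saved_cursor)
  batch = rows.select { |r| r[:updated_at] > saved_cursor }
  batch.each { |r| yield r }  # may raise at any point
  batch.map { |r| r[:updated_at] }.max || saved_cursor
end
```

Note this assumes replaying a batch is harmless, which is exactly why you want upsert (or immutable, insert-only data) rather than blind inserts.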