The tables don't have the same schema/data.
I need to denormalize product data from sql server to solr documents (with multivalued fields).
the source schema is somewhat like i posted before (product / categories / articlegroup / product attributes).
But we don't need all data, we need just the products matching some sql criteria.
We found this query is somewhat costly, so we optimized it.
All in one sql command:
- We first determine all product id's we want and insert them into a temp table.
- we select al product data (one row for each product)
- then select categories from products which match the temp table.
- then select other attributes for products in the temp table
- etc.
on the client side, I fetch the first results (products) and construct a Product object from it.
i put those in a dictionary<productid, product> for quick access later.
Then I fetch the next resultset.
I loop trough this set once, and add that data to the correct product using the previous dictionary: productsById[productid].Categories.Add(rdr["CatId"], etc.)
In the end, this leaves me with a large collection of denormalized product data.
I do all this in one AbstractOperation currently, passing all the products as a result for further processing.
I was hoping I could split this one big operation into several, so we can for example add a new operation for each new product attribute to extract/transform/load.
I could express this as one big SQL query and figure the joining out on the client side, but this would lead to one big cartesian product.
Op zondag 22 juli 2012 16:35:44 UTC+2 schreef Nathan Palmer het volgende: