Getting validation errors on lookup tasks with a dynamic data flow


Nathaniel Roark

Jul 19, 2017, 5:48:35 PM
to COZYROC

I'm trying to build a dynamic data flow to handle a potentially growing and changing set of queries that pull data from Salesforce. My control flow has a loop that processes details from a table-returning function and sets the query filters through a script task:


The data flow collects the records, converts all columns to Unicode strings, unpivots them into field_name and field_value, adds a couple of variables as columns, does a lookup for an Id based on the field_name column, then drops the result into a recordset:

Finally, the control flow appends the results from each recordset into a datatable, then sends the datatable to a stored procedure to process and insert.

This all works in a more static configuration where I have a separate data flow for each query, but I wanted something more dynamic, since we have an ever-growing list of fields to import. The problem I'm having now is with the lookup task. I've rebuilt it into the setup above with everything set dynamically via a Data Flow Task Plus:


With a standard lookup set to the lookup query variable:

I get no errors during the build, but once I try to run it, I get this error:

Information: 0x4004300A at Import to recordset, SSIS.Pipeline: Validation phase is beginning.

Error: 0xC004701C at Import to recordset, SSIS.Pipeline: Add definition Ids.Inputs[Lookup Input].Columns[THUNK_COLUMN] has lineage ID 106 that was not previously used in the Data Flow task.

Error: 0xC004706B at Import to recordset, SSIS.Pipeline: "Add definition Ids" failed validation and returned validation status "VS_NEEDSNEWMETADATA".

Error: 0xC004700C at Import to recordset, SSIS.Pipeline: One or more component failed validation.

Error: 0xC0024107 at Import to recordset: There were errors during task validation.


I have ValidateExternalMetadata set to false on the task and DelayValidation set to true on my connections. The error complains about the THUNK_COLUMN, but I can't seem to remove that to get the package to build. I've also tried it with the original lookup query, connecting the field_name columns, but I get the same error and have no idea how to fix it.

I also tried using the Lookup Plus task in the data flow (copied directly from the samples) and get the same validation error above concerning the lineage ID.

Ivan Peev

Jul 20, 2017, 4:23:20 AM
to COZYROC
Hi Nathaniel,

I would recommend against using the Recordset object in your design. First, it is only available in 32-bit execution, and second, it has poor memory management, so it cannot be used with large amounts of data. That is my advice regarding your overall architecture.

To find the issue, I would recommend a more iterative approach: start with a very simple design and make sure it works, then gradually add more and more elements from your final design. This will help you determine what's causing the validation failure.

Small tip: enable package logging and then execute to see what Data Flow Task Plus is doing behind the scenes. This could help you determine what might be wrong in your setup.

Nathaniel Roark

Jul 20, 2017, 1:38:16 PM
to COZYROC
Yes, I realized the problem wasn't some simple config I'd missed when you didn't immediately recognize the error, so I started rebuilding it yesterday. Early on I learned the data conversion component needed a DataSet and I was giving it only a DataTable, so I added the table to a DataSet to get that piece working. However, that's also when I realized it won't do what I need unless the converted columns can be replaced; I'd have to process the output with a script component to remove or replace the old columns before carrying on to the unpivot. That seemed like a lot of work. Instead I'm simply using the recordset destination and processing it in a C# script that handles the conversion and unpivot in one select expression. Here it is, in case search terms bring someone to this post:
      // Load the recordset destination's ADO object into a DataTable.
      // (Requires System.Data, System.Data.OleDb, System.Linq, and a reference to
      // System.Data.DataSetExtensions for AsEnumerable; the recordset variable
      // name below is illustrative.)
      DataTable dt = new DataTable();
      new OleDbDataAdapter().Fill(dt, Dts.Variables["User::result_recordset"].Value);

      // Parse the column names into a string array, which we'll use in a select to unpivot the result.
      string column_names = (string)Dts.Variables["User::column_names"].Value;
      string[] column_array = column_names.Split(',').Select(sValue => sValue.Trim()).ToArray();

      // Unpivot: project each row into one { field_name, field_value } pair per column,
      // converting every value to a string along the way.
      var resultSet = dt.AsEnumerable()
        .SelectMany(row =>
          column_array.Select(col_name =>
            new
            {
              field_name = col_name,
              field_value = Convert.ToString(row[col_name])
            }))
        .ToList();

This may not perform optimally for big data, but my result sets are numerous and small, so gathering them temporarily into a recordset works well. I read the recordset into a new DataTable, then merge it into another DataTable I'm holding in a package variable. I do this at the end of each query in the loop, so I finish with one DataTable that contains all my unpivoted results, which I send to a proc to bulk insert within a transaction.
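That per-query accumulation step can be sketched roughly like this; the table shape matches the unpivoted output above, but the class and variable names are illustrative, not taken from the actual package:

```csharp
using System;
using System.Data;

class MergeSketch
{
    // Build an empty table with the unpivoted shape (field_name, field_value).
    static DataTable NewResultTable()
    {
        var t = new DataTable();
        t.Columns.Add("field_name", typeof(string));
        t.Columns.Add("field_value", typeof(string));
        return t;
    }

    static void Main()
    {
        // Accumulator that would be held in a package variable across loop iterations.
        DataTable all_results = NewResultTable();

        // One loop iteration's unpivoted result set.
        DataTable iteration = NewResultTable();
        iteration.Rows.Add("Email", "a@example.com");
        iteration.Rows.Add("Phone", "555-0100");

        // DataTable.Merge appends the iteration's rows into the accumulator,
        // matching columns by name (no primary key, so rows are simply added).
        all_results.Merge(iteration);

        Console.WriteLine(all_results.Rows.Count); // 2
    }
}
```

Since neither table defines a primary key, Merge just appends rows each time, which is exactly the accumulate-per-iteration behavior described above.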

Thanks for responding. Starting simple and adding components back in one at a time is good advice.

