Groovy script to group records based on a field


jayesh wasnik

Mar 31, 2021, 4:42:14 AM
to sdc-user
Hi,
In our project we receive Wi-Fi data and process it.
In this data, I have records that contain a "connectedDevices" array and a "networkDeviceId" field.
I want to combine the records that share the same "networkDeviceId" into a single record and merge their "connectedDevices" arrays. Each distinct "networkDeviceId" should end up as its own merged record.
I am trying to use the Groovy Evaluator for this. Can you please help me with it?
The attached figure 1 is the input and figure 2 is the output we need.

If you know any other way besides Groovy, you are most welcome to share it.
figure 1:
figure 1.png
figure 2:
figure 2.png

Thanks & Regards,
Jayesh
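As a concrete statement of the requested transformation, here is a minimal standalone Groovy sketch that runs on plain maps rather than Data Collector records. The sample values and the mergeByDevice name are hypothetical, since the real input and output appear only in the attached figures. groupBy keeps the first-seen order of each networkDeviceId, and collectMany concatenates the connectedDevices arrays within each group.

```groovy
// Hypothetical sample input: the real field contents are only in the attached figures
def input = [
    [networkDeviceId: 'ap-1', connectedDevices: [[mac: 'aa'], [mac: 'bb']]],
    [networkDeviceId: 'ap-2', connectedDevices: [[mac: 'cc']]],
    [networkDeviceId: 'ap-1', connectedDevices: [[mac: 'dd']]],
]

// Hypothetical helper: group rows by networkDeviceId (first-seen order is kept,
// because groupBy returns a LinkedHashMap), then flatten each group's devices.
def mergeByDevice(List<Map> rows) {
    rows.groupBy { it.networkDeviceId }
        .collect { id, group ->
            [networkDeviceId : id,
             // collectMany concatenates the lists; a missing array counts as empty
             connectedDevices: group.collectMany { it.connectedDevices ?: [] }]
        }
}

def output = mergeByDevice(input)
```

Here the two "ap-1" rows collapse into one entry whose connectedDevices holds aa, bb and dd, while "ap-2" stays a separate entry.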

Dima Spivak

Mar 31, 2021, 9:54:37 AM
to jayesh wasnik, sdc-user
Hi Jayesh,

Wanna post your existing script and describe the way in which it's not working? Either the Jython or Groovy processors should work for this scenario.

--
-Dima

jayesh wasnik

Mar 31, 2021, 12:02:36 PM
to sdc-user, di...@streamsets.com, sdc-user, jayesh wasnik
Hi Dima,

Here is the Groovy code we have:
records = sdc.records
try {
    if (sdc.records.getAt(0).toString().contains("com.streamsets")) {
        deviceNames = []
        int t2 = 0

        for (record in records) {
            connectedDevices = []
            if (deviceNames.contains(record.value['networkDeviceId'])) {
                // Empty: this networkDeviceId was seen before, and this is where we need to
                // find the record created earlier and add record.value['connectedDevices'] to it
            } else {
                t2 = deviceNames.size()

                newRecord = sdc.createRecord(record.value['networkDeviceId'])

                newRecord.value = ['connectedDevices' : 'val2']
                newRecord.value = ['networkDeviceName' : 'val2']

                deviceNames.add(record.value['networkDeviceId'])

                newRecord.value = ['networkDeviceName' : record.value['networkDeviceId']]

                connectedDevices.add(record.value['connectedDevices'])

                newRecord.value['connectedDevices'] = connectedDevices
            }
            sdc.output.write(newRecord)
        }
    }
} catch (ed) {}

In this script we add each new "networkDeviceId" to the deviceNames[] list in the "else" branch, so whenever we encounter a "networkDeviceId" that is already in deviceNames[], we end up in the "if" branch.
We are stuck on that "if" branch: there we need to find the record created earlier in the "else" branch for the same "networkDeviceId" and add the current record's "connectedDevices" array to it.
When creating each record we pass networkDeviceId to sdc.createRecord (newRecord = sdc.createRecord(record.value['networkDeviceId'])), so every output record carries its networkDeviceId in the record header.
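One way to close the gap in the "if" branch is to key the output records by networkDeviceId in a map instead of keeping a separate deviceNames list, so the earlier record can be looked up directly. The Groovy sketch below is a minimal, unofficial version of that idea, not the poster's or StreamSets' code. Assumptions: the records are map-rooted, mergeBatch is a hypothetical helper, and createRecord is passed in as a closure so the same code can run outside Data Collector (inside the Groovy Evaluator you would pass { id -> sdc.createRecord(id) }). It also addresses three issues in the posted loop: each merged record is emitted once instead of once per input record, devices are appended with addAll instead of nesting each array, and the record value is built in one assignment instead of being overwritten three times. Since sdc.records only holds the current batch, records with the same networkDeviceId that arrive in different batches are not merged.

```groovy
// Hypothetical helper: merge all records in one batch that share a networkDeviceId.
// createRecord is a closure parameter (an assumption of this sketch) so the code can
// run both inside the Groovy Evaluator and against plain-map stubs.
def mergeBatch(records, Closure createRecord) {
    // networkDeviceId -> merged output record; [:] is a LinkedHashMap, so the
    // output keeps the order in which each id was first seen
    def merged = [:]
    for (record in records) {
        def deviceId = record.value['networkDeviceId']
        def newRecord = merged[deviceId]
        if (newRecord == null) {
            // First occurrence: create the output record once and remember it.
            // 'networkDeviceName' is the field name used in the posted script.
            newRecord = createRecord(deviceId.toString())
            newRecord.value = ['networkDeviceName': deviceId, 'connectedDevices': []]
            merged[deviceId] = newRecord
        }
        // addAll appends the individual devices; add() would nest the whole array
        def devices = record.value['connectedDevices']
        if (devices) {
            newRecord.value['connectedDevices'].addAll(devices)
        }
    }
    // Each merged record is returned exactly once, after the whole batch was scanned
    return merged.values().toList()
}

// Inside the Groovy Evaluator the script body would then be (assumed wiring):
//   for (merged in mergeBatch(sdc.records, { id -> sdc.createRecord(id) })) {
//       sdc.output.write(merged)
//   }
```

The map lookup merged[deviceId] is exactly the "find the record created in the else part" step that the empty if branch was missing.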



jayesh wasnik

Mar 31, 2021, 12:05:37 PM
to sdc-user, jayesh wasnik, di...@streamsets.com, sdc-user

The records have the networkId in their header, as shown here:

Untitled.png

Dima Spivak

Mar 31, 2021, 12:06:55 PM
to jayesh wasnik, sdc-user
Jayesh,

Got it, thanks. So there's actually a good deal of nuance here and some serious problems that can arise from what you're attempting to do. Unfortunately, the full details go beyond what I can do justice over an email list. If you're a StreamSets customer, I highly recommend opening a support ticket and letting an engineer walk you through some of the pitfalls and help you come up with a solution that works for your use case.

Cheers,

-Dima

jayesh wasnik

Mar 31, 2021, 12:37:27 PM
to sdc-user, di...@streamsets.com, sdc-user, jayesh wasnik
Hi Dima,

I had already raised a ticket with StreamSets, but the assigned engineer told me that Groovy implementations are not supported.
So I am posting in open forums, hoping for help from wherever I can get it.

Dima Spivak

Mar 31, 2021, 1:49:50 PM
to jayesh wasnik, sdc-user
Jayesh,

I don't really have insight into specific support tickets, but we definitely support Groovy (perhaps the answer you got speaks to some of the concerns I have around performance/stability that your script may lead to?). I think the ticket would be the route to keep this going.

Cheers,

-Dima
