Hi Werner,
If your Hadoop cluster is Kerberized, you must have a Kerberos service principal for the data collector, typically sdc/<HOST> (where HOST is the hostname where the data collector runs), and the data collector user name for Hadoop is 'sdc'.
If your Hadoop cluster is not Kerberized, the data collector user name for Hadoop is the unix user name that started the data collector. This could be 'sdc' if you are running it as a service, or your own user name if you started it yourself.
Please determine your data collector user name for Hadoop. For the remainder of this email I'll refer to it as user 'foo'.
In the Hadoop FS destination, if you want to impersonate a different Hadoop user than the one running the data collector (user 'foo'), set the 'HDFS User' property in the 'Hadoop FS' tab to the desired user. This is all you have to do in the data collector.
Next, you'll have to configure the HDFS Namenode to allow the data collector user (user 'foo') to be a proxy user for other users. You do that by setting the following properties in the core-site.xml of your Namenode:
hadoop.proxyuser.foo.hosts=*
hadoop.proxyuser.foo.groups=*
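For reference, here is a minimal sketch of how those two properties look in core-site.xml (assuming your Hadoop user is 'foo'; substitute your actual user name in the property names):

```xml
<!-- core-site.xml on the Namenode: allow user 'foo' to impersonate other users -->
<property>
  <name>hadoop.proxyuser.foo.hosts</name>
  <!-- hosts from which 'foo' may impersonate; '*' means any host -->
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.foo.groups</name>
  <!-- groups whose members 'foo' may impersonate; '*' means any group -->
  <value>*</value>
</property>
```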
Remember, this is assuming your data collector is using the Hadoop user name 'foo'.
Once you make those changes, you need to restart the Namenode.
Then you should be all set.
If you are running a production setup, make sure you configure the proxy user properties above as restrictively as your usage allows, instead of using '*' (which means ALL hosts and ALL groups).
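For example, a more restrictive sketch might look like this (the host and group names below are hypothetical; substitute the actual hosts running the data collector and the groups of the users you need to impersonate):

```xml
<!-- restrict impersonation to specific hosts and groups -->
<property>
  <name>hadoop.proxyuser.foo.hosts</name>
  <value>sdc-host1.example.com,sdc-host2.example.com</value>
</property>
<property>
  <name>hadoop.proxyuser.foo.groups</name>
  <value>etl-users</value>
</property>
```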
NOTE: If you leave the 'Hadoop FS' destination 'HDFS User' configuration empty, then your pipeline will interact with HDFS as the Hadoop user running the data collector (user 'foo').
Hope this helps.
Alejandro