Hi Anand/Rahul,
In the given scenario, the lag can come from the rekeying step, and possibly also from the last map, where you set the keys to null to distribute data across the partitions of the output topic.
This is called data skew, a very common issue in distributed processing of key-value data. To do joins, the data is first partitioned on the basis of a hash of the key, and each hash range is assigned to a partition. If a collection of keys falls into the same hash range, those keys all end up in the same partition, making that partition bigger than the rest (note that this can happen without the keys being duplicates). To illustrate, suppose your keys are long values (after rekeying), you have 8 partitions, and the hash of a key is simply its value. If all the keys are multiples of 8, then hash(key) % 8 is always 0, so all the records end up in a single partition, the rest of the partitions stay empty, and the processing happens on one node.
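The multiples-of-8 case can be reproduced with a toy partitioner. This is only a sketch: SkewDemo, partitionFor and histogram are made-up names, and Long.hashCode stands in for the hash. Kafka's default partitioner hashes the serialized key bytes with murmur2 instead, so the exact keys that collide will differ, but the effect of a bad key distribution is the same.

```java
import java.util.Arrays;

final class SkewDemo {
    // Toy partitioner (illustrative only): hash the key, then take it modulo the partition count.
    static int partitionFor(long key, int numPartitions) {
        return Math.floorMod(Long.hashCode(key), numPartitions);
    }

    // Count how many of the given keys land in each partition.
    static int[] histogram(long[] keys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (long key : keys) {
            counts[partitionFor(key, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] multiplesOfEight = new long[1000];
        long[] consecutive = new long[1000];
        for (int i = 0; i < 1000; i++) {
            multiplesOfEight[i] = 8L * i; // every key is a multiple of 8
            consecutive[i] = i;           // keys 0..999 for comparison
        }
        // prints [1000, 0, 0, 0, 0, 0, 0, 0]
        System.out.println(Arrays.toString(histogram(multiplesOfEight, 8)));
        // prints [125, 125, 125, 125, 125, 125, 125, 125]
        System.out.println(Arrays.toString(histogram(consecutive, 8)));
    }
}
```

None of the 1000 keys in the first array are duplicates, yet they all land in partition 0.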
Having said that, in your case you can make the KTable a GlobalKTable (if the table is not too large). Then you do not have to rekey the input stream, because the state of a GlobalKTable is replicated to, and kept up to date on, every node. This is similar to broadcast joins in Spark (where you copy the entire table to all nodes). However, using a GlobalKTable would increase your storage on each node, as the same table is copied to all nodes, so use it judiciously.
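To make the broadcast idea concrete, here is a plain-Java sketch with no Kafka dependency. Order, joinOnNode and the sample data are all hypothetical. Each "node" keeps a full copy of the table and looks up the join key from the record's value, which is why the stream never has to be rekeyed. In Kafka Streams the corresponding pieces are StreamsBuilder#globalTable and KStream#join(GlobalKTable, KeyValueMapper, ValueJoiner), where the KeyValueMapper plays the role of reading customerId below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BroadcastJoinDemo {
    // Hypothetical stream record: keyed by orderId, but the join needs customerId.
    static final class Order {
        final String orderId;
        final String customerId;

        Order(String orderId, String customerId) {
            this.orderId = orderId;
            this.customerId = customerId;
        }
    }

    // Inner join of one node's share of the stream against its full local copy of the table.
    // The lookup key is read from the record's value, so the stream is never rekeyed.
    static List<String> joinOnNode(List<Order> localOrders, Map<String, String> customerTable) {
        List<String> joined = new ArrayList<>();
        for (Order order : localOrders) {
            String customerName = customerTable.get(order.customerId);
            if (customerName != null) { // inner join: skip records with no match
                joined.add(order.orderId + ":" + customerName);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, String> customers = new HashMap<>();
        customers.put("c1", "Asha");
        customers.put("c2", "Vikram");

        // Each node holds its own full copy of the table (this is the extra storage cost).
        Map<String, String> copyOnNodeA = new HashMap<>(customers);
        Map<String, String> copyOnNodeB = new HashMap<>(customers);

        // The stream stays partitioned by orderId; any node can join any record.
        List<Order> ordersOnNodeA = Arrays.asList(new Order("o1", "c2"), new Order("o2", "c1"));
        List<Order> ordersOnNodeB = Arrays.asList(new Order("o3", "c2"));

        System.out.println(joinOnNode(ordersOnNodeA, copyOnNodeA));
        System.out.println(joinOnNode(ordersOnNodeB, copyOnNodeB));
    }
}
```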
Coming to the next issue, making the key null in the last map to distribute the final data across all partitions. With null keys there is nothing to hash, so placement is left entirely to the producer's partitioner, and depending on the client version and configuration it may not spread records as evenly as you expect. Since you are making the keys null, you are evidently not interested in them, so instead of making them null, make them unique. You can use the current time in milliseconds, microseconds, or nanoseconds as the key (depending on your data volume and rate), and you should get the desired output without skew. Important: DO NOT reuse one timestamp computed outside the last map; read the current time inside the lambda so that every record gets its own value.
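The difference between reusing one timestamp and reading the clock inside the lambda can be seen in a small plain-Java sketch. FreshKeyDemo and distinctKeys are made-up names. In the topology itself the last step would look roughly like stream.map((k, v) -> KeyValue.pair(System.nanoTime(), v)). Note that a clock reading is not strictly guaranteed to be unique per record, so the counter shown at the end is a stricter alternative that I am adding as an assumption.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

final class FreshKeyDemo {
    // Draw one key per record from keySource and count how many distinct keys result.
    static int distinctKeys(LongSupplier keySource, int records) {
        Set<Long> seen = new HashSet<>();
        for (int i = 0; i < records; i++) {
            seen.add(keySource.getAsLong());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Wrong: the clock is read once, outside the lambda, so every record shares one key.
        final long readOnce = System.nanoTime();
        LongSupplier reused = () -> readOnce;

        // Intended: the clock is read inside the lambda, once per record.
        LongSupplier fresh = () -> System.nanoTime();

        // Stricter alternative (an assumption, not from the advice above): a per-instance counter.
        AtomicLong counter = new AtomicLong();
        LongSupplier counted = counter::incrementAndGet;

        System.out.println("reused:  " + distinctKeys(reused, 1000));  // always 1 distinct key
        System.out.println("fresh:   " + distinctKeys(fresh, 1000));   // depends on clock resolution
        System.out.println("counted: " + distinctKeys(counted, 1000)); // always 1000 distinct keys
    }
}
```

With a single reused key every record hashes to the same partition, which is exactly the skew you are trying to avoid.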
Also, if rekeying the input stream is not causing skew, then do not use a GlobalKTable; it would be unnecessary.
Hope it helps!