Handling malformed JSON records with leading special characters in Kafka source

Joice Jacob

Nov 11, 2025, 8:50:36 AM
to Nussknacker
Hi,
I'm experiencing issues processing Kafka messages in Nussknacker where some records arrive with a leading semicolon before the valid JSON data.

Expected record format:
{"msisdn":"9876543210","status":1,"eventtimestamp":1730975400000} 

Problematic record format:
;{"msisdn":"9876543210","status":1,"eventtimestamp":1730975400000} 

  • Is there a configuration-based approach in Nussknacker to handle malformed JSON records (with leading special characters) without writing custom deserializers?
  • What's the recommended pattern in Nussknacker for handling such data quality issues?
  • Has anyone successfully implemented a similar solution for cleaning malformed JSON at the source level in Nussknacker?
Environment:
  Nussknacker version: 1.18.0
  Deployment: Flink

Joice Jacob

Nov 12, 2025, 8:56:09 AM
to Nussknacker
Hi,

Please assist me in resolving the above issue.

What I’m looking for

  1. Is there a configuration-based approach in Nussknacker to handle/clean such malformed JSON (e.g., strip leading special characters) before parsing, without writing a custom Kafka deserializer?

  2. What’s the recommended pattern in Nussknacker for basic data-quality sanitation on input (e.g., trimming, regex remove) prior to JSON parsing?

  3. Has anyone successfully implemented this kind of cleanup at the source level in Nussknacker and could share a snippet or node pattern?

Thanks in advance for any pointers or examples!

Arkadiusz Burdach

Nov 12, 2025, 9:19:48 AM
to Nussknacker
Hi,

In the current, not yet released version, it is possible to consume any string from a Kafka topic. To do this, you must use a topic without a registered schema and choose Content type: PLAIN.
After doing that, #input will have the String type.
Then you can use the expression #CONV.toJson(#input.replaceFirst("^;", "")) to clean up such records.
The result of #CONV.toJson is of the Any (aka Unknown) type, so code completion won't work and you'll have to use dynamic field accessors (#input['fieldA']) and conversions (#input['fieldA'].toInteger) to access the data.
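
For example, assuming a variable node named "cleanedInput" placed right after the PLAIN source (the node name is just illustrative, and the field names are taken from your sample record):

    #CONV.toJson(#input.replaceFirst("^;", ""))

and then, in downstream nodes:

    #cleanedInput['msisdn']
    #cleanedInput['status'].toInteger
    #cleanedInput['eventtimestamp']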

The other approach I have in mind is to put a lightweight application in front of Nussknacker (e.g. Kafka Connect) that cleans up the data and rewrites it to another topic.

Would one of these approaches work for you?

Joice Jacob

Nov 13, 2025, 12:27:07 AM
to Arkadiusz Burdach, Nussknacker
Hi,

Thanks for the update. Could you please let me know when we can expect the new version containing this feature to be released?

Thanks & Regards, 
Joice Jacob 


