Can I exclude Yesterday from the Normalize date action?

Miro Lucassen

Jan 9, 2026, 5:53:45 PM
to TextSoap
Hi, I am using TextSoap to clean a set of written texts (in Dutch). Each text has a date stamp formatted yyyymmdd, the publication date. In the Netherlands we use dd-mm-yyyy, and TextSoap performs the conversion perfectly. However, the writer also uses the words Yesterday (gisteren), Today (vandaag) and Tomorrow (morgen) in these texts. These are converted to dates as well, which is undesirable. How can I exclude those words from the action in the cleaner?
Greetings from Amsterdam,

Miro

Mark Munz

Jan 9, 2026, 5:58:42 PM
to text...@googlegroups.com
Create a custom cleaner, then filter out the numbers you want to transform (which will ignore the word dates like cistern, vandaag, and morgen):

image.png

--
Mark Munz
unmarked software
https://textsoap.com/

Mark Munz

Jan 11, 2026, 9:21:16 AM
to text...@googlegroups.com
Apologies, looks like Gmail auto-corrected "gistern" to "cistern" (and I didn't catch that) before I sent it.

Also, I wanted to give a more detailed description of how this custom cleaner works.
The "Normalize Dates" cleaner uses macOS to match dates, but as you've seen, that matching can be very broad. In many contexts that is a good thing, but not in all cases.

The If Matches Regex action lets you limit the text to process, and then applies the cleaners contained within it to only the matched text. This effectively prevents the "Normalize Dates" cleaner from trying to convert anything else it might guess to be a date (words like gisteren, vandaag, morgen, or yesterday, today, tomorrow).

In this case, we're taking a list of numbers (which you indicated would likely be yyyymmdd) and applying the "Normalize Dates" to just that text.
I went with the simplest format, but you could construct a more limiting regular expression to ensure that you only capture dates. For example, you could verify the date starts with a 19 or a 20 for 19xx and 20xx as the year. The expression would look something like: (19|20)\d+
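
For anyone who wants to try the matching idea outside TextSoap, here is a minimal Python sketch of the same approach: only text matched by the regular expression is handed to the date conversion, so words like gisteren or vandaag are never touched. This is not how TextSoap implements it. The function name `normalize_stamps` and the pattern are assumptions made for illustration; the pattern is a stricter variant of the one above that requires exactly eight digits starting with 19 or 20.

```python
import re
from datetime import datetime

# Assumed pattern: exactly eight digits starting with 19 or 20,
# not embedded in a longer run of digits.
DATE_STAMP = re.compile(r"(?<!\d)(?:19|20)\d{6}(?!\d)")


def normalize_stamps(text: str) -> str:
    """Rewrite yyyymmdd stamps as dd-mm-yyyy and leave all other text alone."""

    def convert(match: re.Match) -> str:
        raw = match.group(0)
        try:
            parsed = datetime.strptime(raw, "%Y%m%d")
        except ValueError:
            # Eight digits, but not a real calendar date: keep the original.
            return raw
        return parsed.strftime("%d-%m-%Y")

    return DATE_STAMP.sub(convert, text)


sample = "20260109 Gisteren was het koud, vandaag regent het."
print(normalize_stamps(sample))
# prints: 09-01-2026 Gisteren was het koud, vandaag regent het.
```

Because the conversion only ever sees the matched digits, relative-date words cannot be rewritten, whatever language they are in.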

Hope that provides a bit clearer explanation.

Miro Lucassen

Jan 12, 2026, 1:37:14 PM
to TextSoap
Thanks Mark, great help! I was not aware of the possibility to nest cleaners.

On Sunday, January 11, 2026 at 15:21:16 UTC+1, Mark Munz wrote: