When you care about software quality, it's always interesting to see where the problematic bugs emerge from. There are always lessons in it.
I had a stupid, problematic bug today. Entirely avoidable, yet I didn't avoid it, and instead had to waste time tracing it and then clean up the mess afterwards.
An upstream system that emits spreadsheets containing share transaction records changed (actually improved) their format slightly, splitting a column that used to contain a Symbol and a Name, eg "ABB - Aussie Broadband", into two separate columns.
They don't publish or version their data formats, so the first symptom was an import that failed. As I'd deemed the importer boring and non-critical code, it was vibe-coded, and it failed rather ungracefully, importing zero rows rather than raising any sort of validation failure.
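A minimal sketch of what "failing gracefully" could have looked like here: reject a header row the importer doesn't recognise, and refuse to report success on zero rows. The column names, `SheetFormat`, `detectFormat` and `requireRows` are all made up for illustration; the real spreadsheet layout isn't shown in this post, and an empty file may be legitimate in other pipelines.

```scala
// Hypothetical names throughout: the real column headers are assumptions.
enum SheetFormat:
  case CombinedSymbolName // old layout: one "Symbol - Name" column
  case SeparateSymbolName // new layout: "Symbol" and "Name" columns

// Decide the format from the header row, or fail loudly if it is unknown.
def detectFormat(header: Seq[String]): Either[String, SheetFormat] =
  val cols = header.map(_.trim.toLowerCase).toSet
  if cols.contains("symbol") && cols.contains("name") then
    Right(SheetFormat.SeparateSymbolName)
  else if cols.contains("symbol - name") then
    Right(SheetFormat.CombinedSymbolName)
  else
    Left(s"Unrecognised header: ${header.mkString(", ")}")

// Treat an empty result as an error instead of a silent success.
def requireRows[A](rows: Seq[A]): Either[String, Seq[A]] =
  if rows.isEmpty then Left("Import produced zero rows; refusing to treat that as success")
  else Right(rows)
```

With something like this in front of the parser, an unannounced format change surfaces as a `Left` with a readable message at import time.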
I jumped in and patched a fix, branching for the two versions of the format. But what got accidentally left just on one side of the branch was a `symbol.trim` call that stripped off invisible whitespace the spreadsheet carried.
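One way to make that mistake structurally harder is to have each branch only extract the raw fields, and do the trimming once, after the branches rejoin. This is a sketch rather than the actual importer: `RawRow`, `ParsedAsset`, `normalise` and `parseAsset` are hypothetical names, and the " - " separator is taken from the "ABB - Aussie Broadband" example above.

```scala
// Hypothetical row shapes; the real importer's types are not shown in the post.
enum RawRow:
  case Combined(symbolAndName: String)     // old format: "ABB - Aussie Broadband"
  case Split(symbol: String, name: String) // new format: two separate columns

final case class ParsedAsset(symbol: String, name: String)

// The single place where surrounding whitespace is stripped.
def normalise(s: String): String = s.trim

def parseAsset(row: RawRow): Either[String, ParsedAsset] =
  // Each branch only extracts the raw fields for its format.
  val fields: Either[String, (String, String)] = row match
    case RawRow.Combined(text) =>
      text.split(" - ", 2) match
        case Array(symbol, name) => Right((symbol, name))
        case _                   => Left(s"Expected 'SYMBOL - Name', got '$text'")
    case RawRow.Split(symbol, name) => Right((symbol, name))
  // Normalisation runs after the branches rejoin, so neither format can skip it.
  fields.map((symbol, name) => ParsedAsset(normalise(symbol), normalise(name)))
```

Under these assumptions, `parseAsset(RawRow.Split("ABB ", "Aussie Broadband"))` and `parseAsset(RawRow.Combined("ABB - Aussie Broadband"))` both yield the symbol `ABB` with no trailing space.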
The bug lay low and the new importer seemed to work, while importing transactions for stock codes like `ABB `. Then downstream jobs kicked off and transformed & copied the incorrect data into other tables, smearing the invalid data through the system.
Eventually, it showed up in a non-obvious place: reconciliation reports failed to balance. Basically, we'd sold some `ABB ` stock, but it couldn't find the corresponding acquisition event.
It took some crafted regexes to clean the bad data out of the database, and a certain amount of risk; `DELETE FROM asset_transaction WHERE ...` is not a comfortable query to run in production.
Anyway, reflecting: what's the best defence against this sort of bug?
Well, a good unit test would have found this, but increasingly, I'm wanting to tackle these problems in the type system. Make invalid states unrepresentable.
I think a neat (albeit partial) solution is to use Refinement Types. Go the Iron library if you use Scala 3: https://github.com/Iltotore/iron. Make asset codes Alphanumeric:
case class Asset(code: String :| Alphanumeric, typ: AssetType)
With a type constraint in place, even lazy vibe-coding AIs, trying the quickest hack that will please the human, will be forced to validate the code doesn't contain whitespace to satisfy the type-checker. And lazy humans too :P
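To show the mechanism without pulling in a dependency, here is a hand-rolled approximation of the same idea using a plain Scala 3 opaque type and a smart constructor. It is not Iron's API; a refinement library gives you this with far less boilerplate. The `Assets` wrapper, `AssetCode.from`, `value` and the `AssetType` cases are hypothetical names for illustration.

```scala
object Assets:
  // Outside this object, AssetCode is not interchangeable with String.
  opaque type AssetCode = String

  object AssetCode:
    // The only way to obtain an AssetCode: validation cannot be skipped.
    def from(raw: String): Either[String, AssetCode] =
      if raw.nonEmpty && raw.forall(_.isLetterOrDigit) then Right(raw)
      else Left(s"Asset code must be alphanumeric, got '$raw'")

  extension (code: AssetCode)
    // Read-only access to the underlying, already validated string.
    def value: String = code

import Assets.AssetCode

// Hypothetical asset kinds; the post does not list the real ones.
enum AssetType:
  case Share, Etf

final case class Asset(code: AssetCode, typ: AssetType)
```

Here `Asset("ABB ", AssetType.Share)` does not compile, because a raw `String` is not an `AssetCode`. The importer has to go through `AssetCode.from`, which rejects `"ABB "` at the boundary instead of letting it reach the database.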
It's just a crude constraint, of course, but once the initial cost of getting fluent with refinement types is paid down, it looks like an appealing cost/benefit ratio to me.
-Ben