Hi Ben,
I think some simple fixes like these to “normalize” synthetic output for better matching/parsing are definitely a good idea. Assuming the discarded data can probably be salvaged, I don’t think users should be stuck waiting for data just because we didn’t strip some noise.
The more complicated solutions to make the model more “pluggable” with per-model output parsing (or other approaches) also sound like something we will need, at least long term. As it gets more complicated, it would be best if it were not too speculative. That can be tricky, but it sounds like you’ll be able to come up with a reasonable problem and solution in the short term using your example.
Regards,
Mark (markstur)
--
You received this message because you are subscribed to the Google Groups "dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dev+uns...@instructlab.ai.
To view this discussion on the web visit https://groups.google.com/a/instructlab.ai/d/msgid/dev/CAOpbpxFoRz52Vc9o9wTsKB0Hw3RjbtYkCQpHmty3PUfHdEMckw%40mail.gmail.com.