Serious eye-rolling at the ~2M case.
> Currently for debugging we preserve the tail of output snippet up to the
> limit, and to get full output one can run outside of the launcher
> (--single-process-tests/--single_process). Does that sound good?
It's fine for cases where the log spew is cross-platform, but
it is often a substantial investment to build something
on a specific platform if it's not one of the core platforms you
routinely build on. If I make a change and it tips a test from just
under the threshold to just over the threshold, then I'm more likely
to do the right thing if it's quick and easy to do it immediately
without adding barriers.
My understanding is that storing many megabytes of log spew is a
scalability issue for the steady state, where we're storing it for the
successful runs. There should be orders of magnitude fewer failures
due to excessive logging, so keeping more in that case shouldn't be a
huge infrastructure issue.
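As a rough sketch of what I mean (Python, and every name and number
here is made up for illustration, not what the launcher actually
does): keep a small tail for ordinary runs and a much larger one only
when the test failed because it logged too much.

```python
# Hypothetical limits, purely for illustration.
NORMAL_TAIL_BYTES = 100 * 1024          # kept for ordinary runs
LOG_FAILURE_TAIL_BYTES = 2 * 1024 * 1024  # kept when the log limit was exceeded


def retain_tail(output: bytes,
                failed_on_log_limit: bool,
                normal_limit: int = NORMAL_TAIL_BYTES,
                failure_limit: int = LOG_FAILURE_TAIL_BYTES) -> bytes:
    """Return the tail of `output` to store for this test run."""
    limit = failure_limit if failed_on_log_limit else normal_limit
    if limit <= 0:
        return b""
    # Slicing from the end keeps the most recent output, matching the
    # "preserve the tail" behavior described in the quoted text.
    return output[-limit:]
```

Since failures due to excessive logging should be rare, the larger
limit would only apply to a small fraction of stored results.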
Also, this is more of an issue when we tighten the screws. If we keep
100k of data, in all likelihood there's more than enough there to
identify major culprits.  If we keep 10k, it's less likely.  If we
keep 2k, it's very possible that the biggest offender doesn't even
show up, or isn't obvious. You can work around it by commenting out
the causes of that 2k and repeating until you get traction, but being
able to do it in one pass would be much nicer.
> For post-processing, I was considering that. I came to conclusion that some
> examples would be useful to see how effective such heuristics are. I'm not
> sure if the lines would be identical - timestamps and PIDs would probably
> vary. In some cases the postprocessing might remove something important from
> the logs - it seems prudent to keep original version, but then it could
> defeat the point of deduping.
Just to be clear - I don't mean de-duping, which just encourages
people to leave things alone. I'm thinking more in terms of
cross-test analysis. If someone is logging 15 lines into EVERY test,
then we should probably revisit whether we can remove it entirely or
condense it into one line. We definitely shouldn't elide it in any
way.
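Something like the following is the kind of analysis I have in mind.
It's only a sketch in Python: the input shape (test name mapped to its
captured log text) and the "[pid:tid:MMDD/HHMMSS.micros:" prefix
pattern are assumptions on my part, and the prefix is stripped only
for counting, since timestamps and PIDs will vary between runs. The
stored logs are left untouched.

```python
import re
from collections import Counter

# Assumed log prefix shape: "[pid:tid:MMDD/HHMMSS.micros:". Adjust to
# whatever the real log lines look like.
_VARIABLE_PREFIX = re.compile(r"^\[\d+:\d+:\d{4}/\d{6}(?:\.\d+)?:")


def normalize(line: str) -> str:
    """Drop the per-run PID/TID/timestamp so identical messages compare equal."""
    return _VARIABLE_PREFIX.sub("[", line.strip())


def common_lines(logs_by_test: dict, min_fraction: float = 0.9) -> list:
    """Return (line, test_count) for lines seen in at least
    `min_fraction` of the tests, most widespread first."""
    tests_seen = Counter()
    for log in logs_by_test.values():
        # Count each distinct line once per test, however often it repeats.
        unique = {normalize(l) for l in log.splitlines() if l.strip()}
        tests_seen.update(unique)
    cutoff = min_fraction * len(logs_by_test)
    hits = [(line, n) for line, n in tests_seen.items() if n >= cutoff]
    return sorted(hits, key=lambda item: (-item[1], item[0]))
```

Anything this reports is a candidate for removal or for condensing
into one line at the source, not for filtering out of the logs after
the fact.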
Maybe another angle on this is to have thresholds for per-test log
output, and additional thresholds per test target. We might need to
loosen the threshold per test target as new tests are added, but that
loosening could provide the signal to analyze for common log lines
again.
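Roughly, and again only as a Python sketch with hypothetical names
and limits, the two-level check could look like this: each test is
held to its own budget, and the sum over the target is held to a
separate one.

```python
def check_log_budgets(bytes_by_test: dict,
                      per_test_limit: int,
                      per_target_limit: int):
    """Check per-test and per-target log budgets.

    Returns (tests_over_limit, target_total_bytes, target_over_limit).
    """
    # Tests that individually exceed their budget, in a stable order.
    over = sorted(name for name, size in bytes_by_test.items()
                  if size > per_test_limit)
    # The target can exceed its budget even if no single test does.
    total = sum(bytes_by_test.values())
    return over, total, total > per_target_limit
```

A target going over its limit while every individual test is under
would be the cue to look for log lines shared across tests before
simply raising the limit.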
-scott