I'm running into some issues with logging from mappers, and I'm hoping someone can help me figure them out.

Relevant context (please let me know if I missed anything): I'm on Python 3.6.4, working on a map-reduce job that runs on EMR, with end-to-end unit tests that use the inline runner, and I'm bumping mrjob from 0.6.6 to 0.6.8.

Up until now I've had no problems logging within the mapper to sys.stderr via util.log_to_stream (as a shortcut for attaching a stream handler for sys.stderr etc.), and on 0.6.8 that still seems to work when running on EMR. However, all of my unit tests are failing, and as far as I can tell it's because the inline runner reassigns sys.stderr to a file handle, and the redirected stream now accepts only bytes, which breaks with something as simple as self.log.info("Hello").

One of the things I've tried is to call
self.set_up_logging() in my MRJob class with codecs.getwriter("utf_8")(self.stderr) as the stream, but I haven't tested it on EMR yet, so I don't know if it'll work as expected there. Even with that workaround, I still see issues in modules/libs my MRJob class depends on, which sometimes log to (what they think is) sys.stderr, or just use the warnings lib, which also goes to stderr.

Another workaround might be to switch to the local runner in unit tests and/or use sandbox() to redirect stderr, but given that I also use the inline runner for manually testing the job, I'd like to figure out how to make it work again.

Any tips would be greatly appreciated :)
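
For reference, here is a minimal sketch of the codecs.getwriter workaround mentioned above, written so it runs without mrjob. It assumes the redirected stderr behaves like a bytes-only stream, so io.BytesIO is used as a stand-in for it; the helper name make_byte_safe_handler and the logger name are made up for illustration and are not part of mrjob.

```python
import codecs
import io
import logging


def make_byte_safe_handler(byte_stream):
    """Return a logging handler that encodes text as UTF-8 before writing.

    Hypothetical helper: byte_stream is assumed to accept only bytes,
    which is how the redirected stderr appears to behave in my tests.
    """
    text_stream = codecs.getwriter("utf_8")(byte_stream)
    return logging.StreamHandler(text_stream)


# Stand-in for the bytes-only stream; inside a job this would be self.stderr.
fake_stderr = io.BytesIO()

log = logging.getLogger("byte_safe_demo")
log.setLevel(logging.INFO)
log.propagate = False  # keep the demo output out of the root logger
log.addHandler(make_byte_safe_handler(fake_stderr))

# The handler passes a str to the writer, which encodes it to bytes.
log.info("Hello")
captured = fake_stderr.getvalue()
```

This only covers loggers that the handler is explicitly attached to. Code that writes straight to sys.stderr, or goes through the warnings lib, never passes through the handler, which matches the remaining failures described above.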