PYTHONUNBUFFERED - Good place to set this environment variable


Sourya Kakarla

Aug 6, 2022, 11:08:33 AM
to kaldi-developers
Hello,

I have noticed that I have been missing some errors because stdout/stderr is buffered by default in Python (before 3.7).

I am using a machine with Python 2.7.17 and 3.6.9, and I see many people and institutions using similar Python setups. Therefore, I think making sure Kaldi scripts run Python unbuffered will help catch errors that might otherwise be invisible because stdout/stderr are buffered. I can come up with a specific example where this happens if needed. This would also allow more real-time tracking of logs.

For that reason, I believe setting the PYTHONUNBUFFERED environment variable in the main recipes, such as WSJ and TEDLIUM, would be helpful.

Let me know if I am on the right path. If so, what would be a good place to add the `export PYTHONUNBUFFERED=1` line? Is the beginning of run.sh a good place to add this and submit a PR?
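A minimal sketch (Python 3, pipes assumed) of how buffering can hide output: the child process logs a line to stdout and then exits through `os._exit`, which skips Python's normal flush at shutdown, standing in for a crash or a killed job. `CHILD` and `run_child` are illustrative names, not anything in Kaldi. Setting the variable in the child's environment has the same effect as `export PYTHONUNBUFFERED=1` in a shell script that launches it.

```python
import os
import subprocess
import sys

# Illustrative child program: logs a line to stdout, then dies without
# Python's normal shutdown (os._exit skips flushing), like a crash or kill.
CHILD = "import os; print('LOG: finished step 1'); os._exit(1)"


def run_child(unbuffered: bool) -> str:
    """Run CHILD with stdout piped and return whatever actually reached the pipe."""
    env = dict(os.environ)
    env.pop("PYTHONUNBUFFERED", None)  # start from a known state
    if unbuffered:
        env["PYTHONUNBUFFERED"] = "1"  # what `export PYTHONUNBUFFERED=1` would pass down
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        # stdout is a pipe, not a terminal, so the child's stdout is block-buffered
        # by default and the pending line is lost when os._exit runs.
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.stdout.decode()


if __name__ == "__main__":
    print("buffered:  ", repr(run_child(unbuffered=False)))
    print("unbuffered:", repr(run_child(unbuffered=True)))
```

With the default buffering the captured output is empty; with the variable set, the log line makes it out before the process dies.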

Regards,
Sourya



Jan Yenda Trmal

Aug 6, 2022, 11:10:07 AM
to kaldi-de...@googlegroups.com
I'd suggest path.sh

--
visit http://kaldi-asr.org/forums.html to find out how to join.
---
You received this message because you are subscribed to the Google Groups "kaldi-developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-develope...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-developers/1d1b8f84-83d8-4196-913b-5faf7d6d478cn%40googlegroups.com.

Sourya Kakarla

Aug 6, 2022, 11:14:37 AM
to kaldi-developers

Thank you Yenda for the prompt response.

When submitting the PR, should I stick to a few recipes from "egs", such as WSJ and TEDLIUM, or is it a good idea to add this line to every applicable "path.sh" across the recipes in "egs"?


Regards,
Sourya

Sourya Kakarla

Aug 8, 2022, 1:20:48 AM
to kaldi-developers
I created a pull request for WSJ and TEDLIUM that adds the env variable to their path.sh scripts.

Am I on the right path? If so, I am open to adding it for all the examples' path.sh scripts where it might be relevant.

Regards,
Sourya

Jan Yenda Trmal

Aug 8, 2022, 9:27:10 AM
to kaldi-de...@googlegroups.com
hi, that looks ok.
I would prefer to wait a while before making more modifications, because this can have performance implications.
I'm not sure what errors you missed initially -- it might be better to figure out how to disable buffering only for stderr...
y.


Sourya Kakarla

Aug 8, 2022, 11:31:20 AM
to kaldi-developers
Thanks for the response.

I am curious to know more about performance implications. Which aspects of kaldi's outputs/logs are text heavy where performance can be impacted significantly with the unbuffered option? Would any kind of benchmarking help to test the implications?

I have noticed that it is sometimes difficult to follow even the stdout logs in real time when tracking an experiment run without the unbuffered option. Would you like me to provide a reproducible example of that?

The error at the end of this log is the one that led me to this. The error was only visible when the unbuffered option was enabled.

Meanwhile, let me see if I can find a way to set the option only for stderr.

Regards,
Sourya

Sourya Kakarla

Aug 10, 2022, 8:32:47 AM
to kaldi-developers
So far I have not found a simple way to make only stderr unbuffered using an environment variable.

To do that, we might need to change the Python code itself, which might be too complicated to enforce across all the Python scripts.
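If it did come to per-script changes, a stderr-only version could look roughly like the sketch below (Python 3.7+ assumed). `make_unbuffered_writer` and `unbuffer_stderr` are hypothetical helpers, not existing Kaldi code. They rebuild `sys.stderr` the way `-u` builds each standard stream (a `TextIOWrapper` with `write_through=True` directly over the raw `FileIO`) while leaving `sys.stdout` block-buffered.

```python
import io
import sys


def make_unbuffered_writer(fd, encoding="utf-8", errors="backslashreplace"):
    """Hypothetical helper: an unbuffered text stream over an existing file descriptor.

    Mirrors what -u / PYTHONUNBUFFERED sets up for each std stream in Python 3.7+:
    no BufferedWriter, and the text layer hands every write straight to the raw file.
    """
    raw = io.FileIO(fd, "w", closefd=False)  # don't close the fd when this wrapper goes away
    return io.TextIOWrapper(raw, encoding=encoding, errors=errors, write_through=True)


def unbuffer_stderr():
    """Hypothetical helper: make only sys.stderr unbuffered.

    stdout keeps its default buffering, so scripts used as filters in a pipe
    are unaffected.
    """
    sys.stderr.flush()  # push out anything already sitting in the old buffer
    sys.stderr = make_unbuffered_writer(
        sys.stderr.fileno(), sys.stderr.encoding, sys.stderr.errors
    )
```

A script would call `unbuffer_stderr()` once near the top. The catch, as noted above, is that every entry-point script would need that call, which a variable exported from path.sh avoids.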

Also, bumping the questions I asked in my previous post.

Regards,
Sourya

Jan Yenda Trmal

Aug 10, 2022, 9:05:33 AM
to kaldi-developers
as for your question about why it could impose a substantial performance loss -- I'm worried about cases where the script is used as a filter, printing data on stdout that is fed (via '|') into another consumer.
y.
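To make the filter concern concrete without timing anything, the sketch below counts how many low-level writes reach the sink when a filter emits 1,000 lines. It does this once with the default block-buffered setup and once with a write-through text layer directly over the raw stream, which is roughly how Python 3.7+ configures stdout under `-u`/`PYTHONUNBUFFERED`. `CountingSink` and `emit_lines` are hypothetical names. On a real pipe, each counted write would be a separate `write` system call.

```python
import io


class CountingSink(io.RawIOBase):
    """Hypothetical raw sink that counts the low-level write() calls reaching it."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.data.extend(b)
        return len(b)


def emit_lines(unbuffered, n_lines=1000):
    """Write n_lines as a filter would; return (raw write calls, bytes written)."""
    sink = CountingSink()
    if unbuffered:
        # Roughly the -u / PYTHONUNBUFFERED setup: the text layer writes straight
        # through to the raw stream, with no BufferedWriter in between.
        out = io.TextIOWrapper(sink, encoding="utf-8", write_through=True)
    else:
        # Default setup for a piped stdout: block buffering via BufferedWriter.
        out = io.TextIOWrapper(io.BufferedWriter(sink), encoding="utf-8")
    for i in range(n_lines):
        out.write("utt%04d 1.0 2.0 3.0\n" % i)
    out.flush()
    return sink.calls, bytes(sink.data)
```

The output bytes are identical in both modes. Only the number of writes differs: one per line when unbuffered, and a few large writes otherwise. A real benchmark would time an actual pipe, but the call count shows where the overhead comes from.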


Sourya Kakarla

Aug 10, 2022, 10:15:36 AM
to kaldi-developers
Thanks for your response.

I understand. Will look into that.

Regards,
Sourya

Sourya Kakarla

Aug 16, 2022, 4:04:42 AM
to kaldi-developers
Hi,

I have done some investigation into the buffering config of python.

The unbuffered option is actually enabled by default (for character level) for python >=3.7 (Source: https://docs.python.org/3/using/cmdline.html#cmdoption-u).
More on the text layer: https://peps.python.org/pep-3116/#text-i-o.
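The docs describe the 3.7 change in terms of what `-u`/`PYTHONUNBUFFERED` does to the text layer. To see how a given interpreter actually sets up its streams, with and without the variable, you can ask it directly. The probe below is a sketch that assumes Python 3.7+ (for the `write_through` attribute). `PROBE` and `probe` are illustrative names.

```python
import os
import subprocess
import sys

# Runs inside the interpreter being checked and reports how its std streams are set up.
PROBE = """
import sys
for name in ("stdout", "stderr"):
    s = getattr(sys, name)
    print(name, "line_buffering=%s write_through=%s binary_layer=%s"
          % (s.line_buffering, s.write_through, type(s.buffer).__name__))
print("flags.unbuffered=%d" % sys.flags.unbuffered)
"""


def probe(unbuffered: bool) -> str:
    """Run PROBE in a child interpreter with piped stdout/stderr and return its report."""
    env = dict(os.environ)
    env.pop("PYTHONUNBUFFERED", None)  # start from a known state
    if unbuffered:
        env["PYTHONUNBUFFERED"] = "1"
    out = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    return out.stdout.decode()


if __name__ == "__main__":
    print("default:\n" + probe(False))
    print("PYTHONUNBUFFERED=1:\n" + probe(True))
```

With stdout piped, the default run reports a `BufferedWriter` under stdout with `write_through=False`. Setting `PYTHONUNBUFFERED=1` gives `write_through=True` directly over `FileIO`.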

Since Kaldi scripts are already being run on Python >= 3.7, I assume the performance impact may not be that significant?

Regards,
Sourya

Jan Yenda Trmal

Aug 16, 2022, 4:24:57 AM
to kaldi-developers
good point, I will merge this week
y.

Sourya Kakarla

Aug 16, 2022, 4:29:01 AM
to kaldi-developers
Thanks.

Is it a good idea to add the env variable in the path.sh files for other recipes as well? Or should we stick to a few like wsj, tedlium for now?

Regards,
Sourya

Jan Yenda Trmal

Aug 16, 2022, 4:30:23 AM
to kaldi-developers
hm, I think if you are willing to do that, I would merge it.
y.

Sourya Kakarla

Aug 16, 2022, 4:32:18 AM
to kaldi-developers
Yes, I am.

I will go through all the recipes and make the required changes and update here once I am confident about the PR.
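A bulk edit like this can be scripted. The sketch below assumes the usual `egs/<corpus>/<recipe>/path.sh` layout (e.g. `egs/wsj/s5/path.sh`). `add_unbuffered_export` is a hypothetical helper, not part of Kaldi. It defaults to a dry run so that you can review the list of files before anything is written.

```python
import pathlib

EXPORT_LINE = "export PYTHONUNBUFFERED=1"


def add_unbuffered_export(egs_root, dry_run=True):
    """Hypothetical helper: append EXPORT_LINE to every path.sh under egs_root lacking it.

    Returns the files that were changed (or, with dry_run=True, would be changed).
    """
    changed = []
    for path_sh in sorted(pathlib.Path(egs_root).glob("*/*/path.sh")):
        text = path_sh.read_text()
        if "PYTHONUNBUFFERED" in text:
            continue  # already set (or deliberately configured); leave it alone
        changed.append(path_sh)
        if not dry_run:
            # Make sure the export starts on its own line.
            sep = "" if (not text or text.endswith("\n")) else "\n"
            path_sh.write_text(text + sep + EXPORT_LINE + "\n")
    return changed
```

Run it once with `dry_run=True` and review the list, then run it with `dry_run=False`. Because files that already mention `PYTHONUNBUFFERED` are skipped, running it a second time changes nothing.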

Regards,
Sourya