otree performance gets slower and slower over experiment


Scott

Aug 6, 2021, 6:12:25 AM
to oTree help & discussion

Hi there,

I’m having some performance issues with an otree app. I’m running otree 3.4.0.

  • Group size: 5
  • Session size 75 (15 groups)
  • Rounds: 26

Players are grouped as they arrive and stay in those groups throughout.

I’ve switched browser bots on for testing, and launch browsers on 25 machines in the lab, wait for those to complete, then run another 25, let them complete and then run the last 25.

These are the average times in seconds for each browser bot to complete the experiment, followed by the average CPU% for the oTree/database processes:

Batch   Avg. time (s)   CPU% (oTree/database)
1st     410             80/40
2nd     570             85/33
3rd     860             90/16

 

If I start another identical session after this one completes, I get these times:

1st batch:             1000 seconds    90/15

2nd batch:            ….

 

If I archive the data… the next batch then takes 1460 seconds.

I then deleted all the data and the next session took 1800 seconds.

Why is it taking longer and longer to run the experiment? It also seems not to be database related, as the oTree process is taking more and more CPU, and the database less and less.
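
To put numbers on the slowdown, here is a minimal sketch that expresses each run's average completion time as a multiple of the first batch. The times are the ones reported above; the labels and the helper name `slowdown_factors` are only for illustration.

```python
# Average completion times (seconds) as reported in this post, in run order.
TIMES = [
    ("session 1, batch 1", 410),
    ("session 1, batch 2", 570),
    ("session 1, batch 3", 860),
    ("session 2, batch 1", 1000),
    ("after archiving data", 1460),
    ("after deleting data", 1800),
]


def slowdown_factors(times):
    """Return (label, factor) pairs, where factor = time / first time."""
    baseline = times[0][1]
    return [(label, round(seconds / baseline, 2)) for label, seconds in times]


if __name__ == "__main__":
    for label, factor in slowdown_factors(TIMES):
        print(f"{label:<24} {factor:.2f}x")
```

By this measure the last run is roughly 4.4 times slower than the first, even though the data had been deleted beforehand.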

When we run this in the live environment with similar sized sessions, the same thing happens, and we have people dropping out as the experiment moves on.

Ideas welcome

Scott

Ebenezer Yakbain

Aug 6, 2021, 11:57:36 AM
to oTree help & discussion
Hey Scott,

I can't offer any useful explanation for why this happens, but I have noticed it as well. I have to regularly run "otree resetdb" and back up my database, otherwise oTree gets bloated and nothing works (people get stuck on wait pages, people get forwarded after timeouts very slowly, and chat is delayed and requires a refresh for participants to see new messages).

Weirdly, this never happened in earlier versions of oTree which used worker dynos and redis and I ran a much larger study over a longer period of time.

Best,
E

Chris @ oTree

Aug 6, 2021, 2:48:20 PM
to Scott, oTree help & discussion
Hi, firstly I would encourage you to upgrade to oTree Lite (oTree 5) if possible. oTree Lite has better performance and a much smaller codebase, so if something goes wrong it is easier for me to narrow it down. Performance issues like this can end up taking a long time to investigate.


Chris @ oTree

Aug 6, 2021, 2:59:32 PM
to Scott, oTree help & discussion
(By codebase I mean oTree + dependencies. oTree Lite has a much smaller dependency tree. With oTree 3, a lot of my performance investigations go deep into Django, Twisted, Channels, various plugins, etc.)


Scott

Aug 10, 2021, 11:01:50 AM
to oTree help & discussion
Hi Chris,

I have converted the app to otree 5, but when I run the bots against it I get this exception in the trace (full trace attached):

websockets.exceptions.ConnectionClosedOK: code = 1001 (going away), no reason

... and some pages get stuck on the grouping page when they throw that message.

I suspect this is a library incompatibility issue. I'm using a virtual environment with these packages:

aiofiles (0.6.0)
asgiref (3.4.1)
click (7.1.2)
h11 (0.12.0)
httptools (0.2.0)
itsdangerous (1.1.0)
MarkupSafe (1.1.1)
otree (5.2.8)
pip (9.0.1)
pkg-resources (0.0.0)
psycopg2 (2.9.1)
python-dotenv (0.19.0)
python-multipart (0.0.5)
PyYAML (5.4.1)
setuptools (39.0.1)
six (1.16.0)
SQLAlchemy (1.3.22)
starlette (0.14.1)
typing-extensions (3.10.0.0)
uvicorn (0.13.4)
uvloop (0.15.3)
watchgod (0.7)
websockets (8.1)
wheel (0.37.0)
WTForms (2.3.3)
WTForms-SQLAlchemy (0.2)

I'm using otree version 5.2.8, python 3.7.5, ubuntu server 18.04
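
If a library mismatch is the suspicion, one quick check is to print the installed versions of just the packages on the websocket path. This is a minimal sketch, assuming Python 3.8+ for `importlib.metadata` (on Python 3.7 the `importlib_metadata` backport offers the same function); the package selection and the helper name `installed_versions` are illustrative, not something oTree provides.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages from the list above that sit on the HTTP/websocket path.
# This selection is an assumption for illustration.
PACKAGES = ["otree", "starlette", "uvicorn", "websockets", "h11", "httptools"]


def installed_versions(names):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:<12} {ver or 'not installed'}")
```

Running it inside the same virtual environment as the server makes it easy to compare two machines side by side.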

Any help appreciated
Scott
stack_trace.txt

Chris @ oTree

Aug 10, 2021, 1:48:18 PM
to oTree help & discussion
I don't think the ConnectionClosedOK error is significant; as far as I can tell this is an error that happens sometimes with things like navigating away from a page when a connection is being established. There is an issue logged for it in the underlying websocket libraries oTree uses and I will update when it gets fixed.
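
For context, close code 1001 is defined in RFC 6455 as "going away" (for example, a browser navigating off the page), and the `websockets` library raises `ConnectionClosedOK` for codes 1000 and 1001 rather than treating them as failures. The sketch below only encodes that distinction for log triage; the function names `is_clean_close` and `describe_close` are hypothetical, and this is not oTree's own handling.

```python
# Close codes defined in RFC 6455, section 7.4.1 (subset).
CLOSE_CODES = {
    1000: "normal closure",
    1001: "going away",
    1002: "protocol error",
    1006: "abnormal closure (no close frame received)",
    1011: "internal error",
}

# Codes that the websockets library reports as ConnectionClosedOK.
CLEAN_CODES = {1000, 1001}


def is_clean_close(code):
    """True if the close code signals an orderly shutdown, not a failure."""
    return code in CLEAN_CODES


def describe_close(code):
    """Human-readable summary of a close code for log triage."""
    meaning = CLOSE_CODES.get(code, "unknown")
    kind = "clean" if is_clean_close(code) else "error"
    return f"{code} ({meaning}): {kind}"
```

For the message in this thread, `describe_close(1001)` classifies the close as clean, which matches the "OK" in the exception name.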

Is there any other error in the logs? Maybe the ConnectionClosedOK is the inner/outer exception and there is another one.

Otherwise, can you send me your project or a minimal example I can use to reproduce this issue?

How about command line bots? Do they have the same performance issue you observed with oTree 3.x?

Scott

Aug 13, 2021, 7:36:18 AM
to oTree help & discussion
Hi Chris,

The issue with the ConnectionClosedOK error is that it shows in the client's browser; see the attached screenshot. I have attached a project that reproduces this effect: the grouping method waits for all participants at the start of each round and randomises them. (The full code caters for dropouts, but that's not important here.)

I have two issues which appear on both Windows and Linux. Create a session of size 8, and run it in browsers.

  1. The ConnectionClosedOK error will appear in at least one browser at some point.
  2. The first round runs fine, but subsequent rounds all hang for a minute after letting the first group go and recognising the second group is still waiting.
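
The regrouping described above (wait for everyone, then randomise) boils down to a shuffle followed by chunking. This is a minimal sketch in plain Python, not the code from the attached project; `regroup_randomly`, the participant IDs, and the group size of 4 are assumptions for illustration.

```python
import random


def regroup_randomly(participant_ids, group_size, rng=None):
    """Shuffle participants and split them into groups of `group_size`.

    Raises ValueError if the participants do not divide evenly, since the
    sketch does not handle dropouts.
    """
    if len(participant_ids) % group_size != 0:
        raise ValueError("participants do not divide evenly into groups")
    rng = rng or random.Random()
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return [
        shuffled[i:i + group_size]
        for i in range(0, len(shuffled), group_size)
    ]


if __name__ == "__main__":
    # Session of size 8, as in the reproduction steps; group size is assumed.
    for round_number in range(1, 4):
        print(round_number, regroup_randomly(range(1, 9), group_size=4))
```

Passing a seeded `random.Random` as `rng` makes the grouping repeatable, which helps when comparing runs.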

Hope you can reproduce this

Thanks for looking
Scott
conclosed.png
test_connection_closed.zip

Scott

Aug 16, 2021, 8:42:03 AM
to oTree help & discussion
I've spun this up on Heroku with oTree version 5, and get the same error. Has anyone else seen this issue?

FWIW the command line bots don't show the error, just the web bots

Chris @ oTree

Aug 17, 2021, 6:27:16 AM
to Scott, oTree help & discussion
Hi, I will investigate this when I have a chance, but it will require me to dig in since it’s not a simple issue. It may also require waiting for the underlying websockets issue in a third-party library to get fixed.




ccr...@gmail.com

Nov 18, 2021, 6:19:06 PM
to oTree help & discussion
Hi all -

I would like to add to this conversation: running the oTree 5.4.1 devserver on Windows, I have encountered this "ConnectionClosedOK" error while running a session that uses browser bots to test a game which has a group_by_arrival_time method. It appears on the page that uses group_by_arrival_time, and each time it has cleared after refreshing the browser window.
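
For readers unfamiliar with the feature: as documented for oTree 5, a wait page with `group_by_arrival_time = True` can be paired with a function `group_by_arrival_time_method(subsession, waiting_players)` that returns the players to group, or nothing to keep waiting. The sketch below shows only that function's logic so it runs without oTree; `GROUP_SIZE` and the stand-in player values are assumptions, and it is not the code that triggered the error.

```python
GROUP_SIZE = 2  # assumed for illustration


def group_by_arrival_time_method(subsession, waiting_players):
    """Form a group from the first GROUP_SIZE waiting players.

    Returning None means: not enough players yet, keep everyone waiting.
    """
    if len(waiting_players) >= GROUP_SIZE:
        return waiting_players[:GROUP_SIZE]
    return None
```

Called with a single waiting player it returns `None`; with three it returns the first two and leaves the third waiting.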

I'll investigate more to see if it happens without bots or not (I haven't seen it without bots yet).

Would it still be useful to try to make a minimal working example of this?

Thanks,
--Chris

Geoffrey Castillo

Feb 2, 2022, 3:11:56 AM
to oTree help & discussion

Hi everyone, just wanted to chime in to say that we're also seeing this error in our lab.

Looking at the logs we first get some 

starlette.requests.ClientDisconnect, 

then 

websockets.exceptions.ConnectionClosedOK: code = 1001 (going away), no reason
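
To check whether a server log shows the same sequence, a small script can count the two exception names and report where each first appears. This is a minimal sketch, assuming the log is available as plain-text lines; the helper name `scan_log` and the sample lines are made up for illustration.

```python
from collections import Counter

# Exception names quoted in this thread.
PATTERNS = (
    "starlette.requests.ClientDisconnect",
    "websockets.exceptions.ConnectionClosedOK",
)


def scan_log(lines):
    """Return (counts, first_seen) for the exception names in PATTERNS.

    counts maps each name to its number of matching lines; first_seen maps
    each name to the index of the first line that contains it.
    """
    counts = Counter()
    first_seen = {}
    for index, line in enumerate(lines):
        for name in PATTERNS:
            if name in line:
                counts[name] += 1
                first_seen.setdefault(name, index)
    return counts, first_seen


if __name__ == "__main__":
    # Hypothetical sample, not a real oTree log.
    sample = [
        "starlette.requests.ClientDisconnect",
        "INFO: some unrelated line",
        "websockets.exceptions.ConnectionClosedOK: code = 1001 (going away), no reason",
    ]
    counts, first_seen = scan_log(sample)
    print(dict(counts), first_seen)
```

Comparing the two `first_seen` indices shows whether the disconnect really precedes the websocket close in a given log.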

Chris @ oTree

Feb 2, 2022, 1:31:30 PM
to Geoffrey Castillo, oTree help & discussion
Hi, I’m working on this now. It might be fixed in the latest version of starlette, according to my investigations. Anyway I am working on the upgrade: https://github.com/encode/starlette/discussions/1457




Sam

Apr 4, 2022, 11:04:54 AM
to oTree help & discussion
Hi all, does anyone know if this issue has been resolved?

Chris @ oTree

Apr 4, 2022, 11:11:19 AM
to oTree help & discussion
Yes, as far as I can tell it no longer occurs in oTree 5.8.
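
A quick way to tell whether an installation predates that release is to compare version tuples. This is a minimal sketch; `parse_version` and `has_fix` are hypothetical helpers that only handle plain dotted numeric versions, and the 5.8 threshold is taken from the reply above.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '5.2.8' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def has_fix(installed, fixed_in="5.8"):
    """True if `installed` is at least the version reported as fixed."""
    return parse_version(installed) >= parse_version(fixed_in)


if __name__ == "__main__":
    # 5.2.8 and 5.4.1 are versions mentioned in this thread; 5.8.0 is an example.
    for ver in ("5.2.8", "5.4.1", "5.8.0"):
        print(ver, has_fix(ver))
```

Comparing tuples of integers avoids the string-comparison trap where "5.10" would sort before "5.8".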