Import works in Jupyter but not from Terminal

Drew Forbes

Feb 10, 2021, 2:56:44 PM
to modin-dev
Hi all, I'm just getting started on my Modin journey and seeing if it's a good fit for a current project. I've got Modin installed (with both modin[ray] and modin[dask]), and I'm trying to run some basic proof-of-concept tests.

I've found that I can import Modin and read in a CSV with no issue from within a Jupyter notebook (spawned from my terminal), but when I try to run a script or just Python in the terminal, I get the following error in a loop upon import:

"OSError: [Errno 24] Too many open files"

This is before I even attempt to open files, just immediately after the import I get a ton of these and have to CTRL-Z my way out of it. 

Just wondering if anyone has run into this before? I'm on a Mac, using Python 3, and I've tried closing and restarting the terminal to no avail.

Thanks y'all! Modin looks great and I'm excited to get started.

Devin Petersohn

Feb 10, 2021, 4:14:38 PM
to Drew Forbes, modin-dev
Hi Drew,

How did you import and run Modin? The "too many open files" issue can come from either Dask or Ray, because of how they communicate between workers.
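
To narrow down which engine is involved, one option is to pin the engine explicitly before the first Modin import. The sketch below assumes the MODIN_ENGINE environment variable that Modin reads at import time; the helper name select_modin_engine is made up for illustration and is not part of Modin.

```python
import os


def select_modin_engine(engine):
    """Pin the Modin engine through the MODIN_ENGINE environment variable.

    Hypothetical helper: only "ray" and "dask" are accepted here because
    those are the two engines discussed in this thread.
    """
    engine = engine.lower()
    if engine not in {"ray", "dask"}:
        raise ValueError(f"unexpected engine: {engine!r}")
    os.environ["MODIN_ENGINE"] = engine
    return engine


select_modin_engine("ray")
# The variable has to be set before Modin is imported for the first time:
# import modin.pandas as pd
```

If the error disappears with one engine and persists with the other, that points at the engine whose workers are exhausting the descriptors.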

Dask has this fix:

The fix for Ray is essentially identical, since it is a limitation set by the operating system.
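
Since the limit is an operating-system setting, it can also be inspected and raised from inside the Python process with the standard-library resource module. This is a minimal sketch, not the fix referenced above; the function name raise_open_file_limit and the target of 4096 are assumptions chosen for illustration.

```python
import resource


def raise_open_file_limit(target=4096):
    """Try to raise this process's soft open-file limit toward `target`.

    Returns the (soft, hard) limits in effect afterwards. The limit is
    never lowered, and the hard limit is left untouched.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Never ask for more than the hard limit; RLIM_INFINITY means "no cap".
    wanted = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft == resource.RLIM_INFINITY or wanted <= soft:
        return soft, hard  # already high enough
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    except (ValueError, OSError):
        # Some systems reject values above a kernel maximum; keep the old limit.
        pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


print(raise_open_file_limit())
# Import Modin only after the limit has been adjusted:
# import modin.pandas as pd
```

The change applies to the current process and the children it starts afterwards, so it has to run before the engine spawns its workers.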

Let me know if that helps!

Devin

--
You received this message because you are subscribed to the Google Groups "modin-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to modin-dev+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/modin-dev/5662c211-5fba-4dc1-85d4-e1497396b80en%40googlegroups.com.

Drew Forbes

Feb 10, 2021, 5:03:06 PM
to modin-dev
Thanks Devin. 

I installed with a simple "pip3 install modin", then did the same with modin[dask] and modin[ray]. I've also got Dask installed separately from Modin.

I'm running it just by opening a Python 3 interpreter from the terminal and running "import modin.pandas as pd". That's enough to start the error loop. Like I said, though, I don't have this problem with a local Jupyter notebook.

I just tried raising my open file limit with "sudo launchctl limit maxfiles 1048576 1048576", which I'd really hope would be enough, but I'm getting the exact same error. I confirmed in a new terminal window that the limits have been raised.
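
The limit that matters is the one the Python process itself inherits, which may differ from the system-wide value. A minimal sketch for checking it from the same interpreter that fails, using only the standard library (the function name describe_fd_usage is hypothetical):

```python
import os
import resource


def describe_fd_usage():
    """Return the soft/hard open-file limits and the descriptors in use."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        # /dev/fd lists this process's open descriptors on macOS and Linux.
        # The count is approximate: listing the directory opens one itself.
        open_fds = len(os.listdir("/dev/fd"))
    except OSError:
        open_fds = None  # not available on this platform
    return {"soft": soft, "hard": hard, "open": open_fds}


print(describe_fd_usage())
```

A small "soft" value here would mean the raised limit never reached this process, regardless of what another tool reports.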

Devin Petersohn

Feb 11, 2021, 11:27:53 AM
to Drew Forbes, modin-dev
Thanks Drew, would you mind pasting the entire terminal window input and output here (a screenshot will also do)?

Devin

Drew Forbes

Feb 12, 2021, 12:07:13 AM
to modin-dev
drewforbes@Drews-MacBook-Pro ~ % python3
Python 3.8.2 (default, Nov  4 2020, 21:23:28)
[Clang 12.0.0 (clang-1200.0.32.28)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import modin.pandas as pd
UserWarning: The Dask Engine for Modin is experimental.
UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 49752 instead
>>> distributed.worker - WARNING - Heartbeat to scheduler failed
Exception in callback BaseAsyncIOLoop._handle_events(15, 1)
handle: <Handle BaseAsyncIOLoop._handle_events(15, 1)>
Traceback (most recent call last):
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/Users/drewforbes/Library/Python/3.8/lib/python/site-packages/tornado/platform/asyncio.py", line 189, in _handle_events
    handler_func(fileobj, events)
  File "/Users/drewforbes/Library/Python/3.8/lib/python/site-packages/tornado/netutil.py", line 266, in accept_handler
    connection, address = sock.accept()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/socket.py", line 292, in accept
OSError: [Errno 24] Too many open files
[... the same warning and traceback repeat five more times ...]

zsh: suspended  python3
drewforbes@Drews-MacBook-Pro ~ %

Devin Petersohn

Feb 15, 2021, 9:15:52 AM
to Drew Forbes, modin-dev
I see that an existing Dask cluster is alive on the machine. Unfortunately, I am only aware of one way to fix the issue: rebooting the machine. It may be that the number of Dask schedulers alive on the machine at the same time is causing this. In my experience, this can happen when Dask doesn't shut down completely.

Does rebooting fix the problem?
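
The "Port 8787 is already in use" warning in the output can be checked independently of Modin. The sketch below only tests whether anything accepts TCP connections on that port; it cannot tell whether the listener is actually Dask. The helper name port_in_use and the localhost address are assumptions.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


print("8787 in use:", port_in_use(8787))
```

Running this in a fresh interpreter, before importing Modin, shows whether the port was already taken by another program.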

Devin

Drew Forbes

Feb 16, 2021, 1:28:35 PM
to Devin Petersohn, Drew Forbes, modin-dev
Nope, just did a restart and immediately tried again. Same exact error. 

I agree that it's likely another dask scheduler running somewhere, but I'm not aware of any programs I'm using that actually make use of dask.
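
One way to look for a stray scheduler or worker is to scan the process list for commands that mention Dask. This is a rough sketch that shells out to the standard ps utility; the keywords and both function names are assumptions, and a match on "distributed" may well be an unrelated program.

```python
import subprocess


def find_processes(ps_output, keywords=("dask", "distributed")):
    """Return (pid, command) pairs whose command mentions one of the keywords."""
    matches = []
    for line in ps_output.splitlines():
        pid, _, command = line.strip().partition(" ")
        if pid.isdigit() and any(k in command.lower() for k in keywords):
            matches.append((int(pid), command.strip()))
    return matches


def list_candidate_processes():
    # Prints every process as "PID COMMAND"; the empty "=" suppresses headers.
    out = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "args="],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_processes(out)


# Usage: for pid, command in list_candidate_processes(): print(pid, command)
```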

If I'm feeling motivated I'll keep trying to look into it on my end, thank you for your troubleshooting help.
