Gremlin Python: too many open files


Debasish Kanhar

Oct 6, 2017, 11:04:34 AM
to Gremlin-users
Hi all,

I wrote my own function to run queries on JanusGraph.

I call the function from an outer loop, once for each node and edge. The query checks whether the node or edge already exists before adding it.

My function is as follows:

def __run(self, query):
    """
    The backend method which executes a query. The query is run asynchronously with Gremlin Server interface.

    Used package Goblin's Cluster as OGM for querying Janus DB.

    Args:
        query (str): The Query to run on Janus DB

    Returns:
        messages (list): The list of responses, usually a single element list.
    """
    messages = []
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
    try:
        loop = asyncio.get_event_loop()
    except RuntimeError:
        logger.debug("Couldn't get event loop for current thread. Creating a new event loop to be used!")
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)

    async def run(loop, query):
        cluster = await Cluster.open(loop)
        client = await cluster.connect()
        resp = await client.submit(query)
        async for msg in resp:
            messages.append(msg)
        await cluster.close()

    loop.run_until_complete(run(loop, query))

    return messages


There are 304 nodes and 631 edges.

But while running the query I face the following error:

Traceback (most recent call last):
  File "/share/janusgraph-0.1.1-hadoop2/debasish/CodeBase/utils/JanusGraphDBCon.py", line 236, in __data_pushing_wrapper
    vertex_dict = self.__add_vertex(property_information, label, id=node_id_name)
  File "/share/janusgraph-0.1.1-hadoop2/debasish/CodeBase/utils/JanusGraphDBCon.py", line 303, in __add_vertex
    ret = self.query(q)
  File "/share/janusgraph-0.1.1-hadoop2/debasish/CodeBase/utils/JanusGraphDBCon.py", line 101, in query
    response = self.__run(query)
  File "/share/janusgraph-0.1.1-hadoop2/debasish/CodeBase/utils/JanusGraphDBCon.py", line 167, in __run
    loop = asyncio.get_event_loop()
  File "/home/prjadmin/anaconda3/lib/python3.6/asyncio/events.py", line 678, in get_event_loop
    return get_event_loop_policy().get_event_loop()
  File "/home/prjadmin/anaconda3/lib/python3.6/asyncio/events.py", line 581, in get_event_loop
    self.set_event_loop(self.new_event_loop())
  File "/home/prjadmin/anaconda3/lib/python3.6/asyncio/events.py", line 599, in new_event_loop
    return self._loop_factory()
  File "/home/prjadmin/anaconda3/lib/python3.6/site-packages/uvloop/__init__.py", line 35, in _loop_factory
    return new_event_loop()
  File "/home/prjadmin/anaconda3/lib/python3.6/site-packages/uvloop/__init__.py", line 19, in new_event_loop
    return Loop()
  File "uvloop/loop.pyx", line 96, in uvloop.loop.Loop.__cinit__ (uvloop/loop.c:7554)
OSError: [Errno 24] Too many open files

I tried debugging this myself and thought it was caused by the Linux limit on the number of open files.

I ran `ulimit -n` and it returned 1024.

I followed https://www.tecmint.com/increase-set-open-file-limits-in-linux/ to raise the open-file limit to a very high value, but the problem still persists.
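
For anyone checking the same thing from inside Python, here is a small sketch using the standard-library `resource` module (Unix only). It reads the limit the Python process itself sees, which can differ from the shell where `ulimit -n` was run because limits are inherited from whatever started the process. The `/proc/self/fd` count is a Linux-only assumption and is skipped elsewhere.

```python
import os
import resource

# Soft and hard limits on open file descriptors for *this* process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)

# Linux only (assumption): count descriptors currently open by this process.
open_fds = None
if os.path.isdir("/proc/self/fd"):
    open_fds = len(os.listdir("/proc/self/fd"))
    print("open descriptors:", open_fds)
```

Printing the open-descriptor count before and after a batch of queries is a quick way to see whether something is leaking.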

The import I use:
from goblin import Cluster

Any idea how to tackle this problem?

I'm using gremlin_python 3.2.3, and hence don't have gremlin_python's Client to create OGM connection to Janus!

Thanks

David Brown

Oct 7, 2017, 12:11:04 PM
to Gremlin-users
Hello,

It seems the problem here doesn't have anything to do with goblin/aiogremlin or TinkerPop. Instead, it looks like uvloop opens a file descriptor each time you create a new loop instance, and those descriptors accumulate until the OS raises the error. I think on Linux the default limit is 1024 open file descriptors per process. I can reproduce this error without anything TinkerPop-related:

In [1]: import uvloop

In [2]: import asyncio

In [3]: asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

In [4]: for x in range(1025):
   ...:     loop = asyncio.new_event_loop()
   ...:     asyncio.set_event_loop(loop)
   ...:     
deallocating an open event loop
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-4-b20ee820c9fa> in <module>()
      1 for x in range(1025):
----> 2     loop = asyncio.new_event_loop()
      3     asyncio.set_event_loop(loop)
      4 

/usr/lib/python3.5/asyncio/events.py in new_event_loop()
    640 def new_event_loop():
    641     """Equivalent to calling get_event_loop_policy().new_event_loop()."""
--> 642     return get_event_loop_policy().new_event_loop()
    643 
    644 

/usr/lib/python3.5/asyncio/events.py in new_event_loop(self)
    591         loop.
    592         """
--> 593         return self._loop_factory()
    594 
    595 

~/.virtualenvs/goblin/lib/python3.5/site-packages/uvloop/__init__.py in _loop_factory(self)
     33 
     34     def _loop_factory(self):
---> 35         return new_event_loop()

~/.virtualenvs/goblin/lib/python3.5/site-packages/uvloop/__init__.py in new_event_loop()
     17 def new_event_loop():
     18     """Return a new event loop."""
---> 19     return Loop()
     20 
     21 

~/.virtualenvs/goblin/lib/python3.5/site-packages/uvloop/loop.pyx in uvloop.loop.Loop.__cinit__ (uvloop/loop.c:7554)()

OSError: [Errno 24] Too many open files

You will need to control the number of loops being created. If your app is single threaded, I would try to share a global event loop. If it is multi-threaded, you could use one loop per thread, or alternatively you could call the loop's `close` method after you run it.

Debasish Kanhar

Oct 10, 2017, 10:59:53 AM
to Gremlin-users
Thanks for that. I was creating a loop each time I ran a query. I was able to resolve it to some extent, but I guess your option is the better one. If I may rephrase: each time a new thread starts, I'll check whether an event loop is already present; if not, I create one, otherwise I fetch the existing one and use it. Is that right?
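
One possible reading of that per-thread approach, as a sketch rather than a confirmed recipe. It uses `threading.local` to hold one loop per thread; `loop_for_this_thread`, `echo`, and `worker` are hypothetical names, and `echo` stands in for a real query coroutine.

```python
import asyncio
import threading

_local = threading.local()  # hypothetical per-thread storage


def loop_for_this_thread():
    """Fetch this thread's loop, creating it on first use."""
    loop = getattr(_local, "loop", None)
    if loop is None or loop.is_closed():
        loop = asyncio.new_event_loop()
        _local.loop = loop
    return loop


async def echo(value):
    # Stand-in for a real query coroutine.
    await asyncio.sleep(0)
    return value


def worker(n, out):
    loop = loop_for_this_thread()
    try:
        for i in range(n):
            out.append(loop.run_until_complete(echo(i)))
    finally:
        # Close the loop when the thread finishes so nothing leaks.
        loop.close()


results_a, results_b = [], []
threads = [
    threading.Thread(target=worker, args=(3, results_a)),
    threading.Thread(target=worker, args=(3, results_b)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread creates at most one loop and closes it in `finally`, so the number of open loops is bounded by the number of live threads.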

Augusto Will

Oct 11, 2017, 9:00:56 PM
to Gremlin-users
Same problem here. I'm trying to read from a MySQL database and insert into Janus.
After a few thousand edges are inserted, I get this "too many open files" error. This is killing me.

Marko Rodriguez

Oct 11, 2017, 9:15:36 PM
to gremli...@googlegroups.com
There is a ulimit setting or something like that for Linux-based operating systems.

Dunno. Google the problem and read how to solve it.

Marko.



Stephen Mallette

Oct 12, 2017, 6:02:36 AM
to Gremlin-users
The ulimit setting is important, but it also goes back to how the library is being used, as David Brown's post explained. Addressing those two things should solve the problem.


David Brown

Oct 12, 2017, 12:36:42 PM
to Gremlin-users
Yeah, I wouldn't consider this a valid reason to raise the system limit on open files. Instead, I would clean up your application to take care of these kinds of resource leaks. In general, if you guys are running ETL (Relational -> GraphDB), you need to be very careful with the way you handle all of the networking-related code (connections, event loops, etc.) to avoid these kinds of problems, especially if you are using threads or processes with something like Python's multiprocessing module. I don't know your use case, but I typically use a task queue like Celery for something like this. Then I can use a single connection pool per db per process, a single loop, etc., each of which is set up on process init and torn down on process shutdown.
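
The "set up on init, torn down on shutdown" idea can be sketched roughly as below, with the standard library only. `QueryRunner` and `insert_row` are hypothetical; in a real ETL job the runner would also own the connection pool, and its construction and `close` would be wired to whatever init and shutdown hooks the worker framework provides.

```python
import asyncio


class QueryRunner:
    """Hypothetical helper: owns one event loop for its whole lifetime."""

    def __init__(self):
        # Set up once, e.g. when the worker process starts.
        self.loop = asyncio.new_event_loop()

    def run(self, coro):
        return self.loop.run_until_complete(coro)

    def close(self):
        # Tear down once, e.g. when the worker process shuts down.
        if not self.loop.is_closed():
            self.loop.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


async def insert_row(row):
    # Stand-in for writing one relational row into the graph.
    await asyncio.sleep(0)
    return row["id"]


rows = [{"id": i} for i in range(5)]
with QueryRunner() as runner:
    inserted = [runner.run(insert_row(r)) for r in rows]
loop_closed = runner.loop.is_closed()
```

Using a context manager makes the teardown happen even if one of the inserts raises.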

Good luck!

