Cheers,
/peter neubauer
On Thu, Apr 5, 2012 at 4:50 PM, Marius Stroe <laurent...@gmail.com> wrote:
> Hello guys,
>
> I apologize if this is a duplicate message; I can't find the one I sent
> previously in this group.
>
> I would like to ask for your help for the following use case:
> * I have 10K keywords and a lot of tweets coming from Twitter. Let's say we
> have a retweet: @b retweets @a's URL.
> * Thus, an edge like (@a, @b, url) is created.
> * A tweet may match multiple keywords, so the edge (@a, @b, url) may belong
> to several keywords.
> * The output should be an influence graph for each keyword.
> * Every couple of seconds or minutes a keyword is chosen, and its influence
> graph should be fetched from the db, processed, and some results sent...
>
> Estimates: about 100 million edges and, since the graph is really sparse,
> around 25 million nodes.
>
> I couldn't find a way to use labels on vertices and edges that makes this
> blazing fast.
>
> Also, are Neo4j and Bulbs a good combination for Python?
>
> Is this going to scale beyond 100 million vertices? Is sharding a tough task?
>
> Any help would be appreciated. :)
>
> Thank you,
>
> - Marius
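
For what it's worth, the use case quoted above can be sketched in plain Python as a keyword-to-edges index. This is only a minimal in-memory sketch under stated assumptions, not Neo4j or Bulbs code: the names KeywordInfluenceIndex, add_retweet and influence_graph are hypothetical, and keyword matching is assumed to be a simple lowercase whitespace split of the tweet text.

```python
from collections import defaultdict


class KeywordInfluenceIndex:
    """Hypothetical in-memory model: one set of retweet edges per keyword."""

    def __init__(self, keywords):
        # Assumption: keywords are single lowercase tokens.
        self.keywords = {k.lower() for k in keywords}
        # keyword -> set of (source, retweeter, url) edges
        self.edges_by_keyword = defaultdict(set)

    def add_retweet(self, source, retweeter, url, text):
        """Store the edge (source, retweeter, url) under every matching keyword."""
        tokens = set(text.lower().split())
        matched = self.keywords & tokens
        for keyword in matched:
            self.edges_by_keyword[keyword].add((source, retweeter, url))
        return matched

    def influence_graph(self, keyword):
        """Return {source: set of retweeters} for a single keyword."""
        graph = defaultdict(set)
        edges = self.edges_by_keyword.get(keyword.lower(), ())
        for source, retweeter, _url in edges:
            graph[source].add(retweeter)
        return dict(graph)


if __name__ == "__main__":
    index = KeywordInfluenceIndex(["neo4j", "python"])
    # @b retweets @a's url; the tweet text matches both keywords.
    index.add_retweet("@a", "@b", "http://example.com/x", "neo4j and python")
    print(index.influence_graph("neo4j"))
```

Grouping edges by keyword up front means fetching one keyword's influence graph touches only that keyword's edges. In a graph database the same idea would probably map to a keyword property or index on the relationship, but that mapping is an assumption and not something tested here.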