Modelling a filesystem with a GraphDB


Mark McKeown

Mar 27, 2023, 7:28:18 PM
to gremli...@googlegroups.com

Hi,
    I have some newbie questions - I am looking at modelling parts of a filesystem using a GraphDB. I want to store metadata about parts of the filesystem in the GraphDB.

Is this a good use of GraphDB and Gremlin?

If I want to store information about a path /foo/bar/blah it seems I should have vertices foo, bar and blah and link them together?

How would I query if /foo/bar/blah exists in the GraphDB? I could have another path /soup/cheese/blah for example which would mean a node with the same name?

If I want to add /foo/bar/blah as it does not exist, but /foo/bar does exist how do I query for existing nodes? Can I query and add in the same operation?

Are there any tutorials that provide good examples for this type of case?

cheers
Mark


https://wandisco.com/

THIS MESSAGE AND ANY ATTACHMENTS ARE CONFIDENTIAL, PROPRIETARY AND MAY BE PRIVILEGED

If this message was misdirected, WANdisco, Inc. and its subsidiaries, ("WANdisco") does not waive any confidentiality or privilege. If you are not the intended recipient, please notify us immediately and destroy the message without disclosing its contents to anyone. Any distribution, use or copying of this email or the information it contains by other than an intended recipient is unauthorized. The views and opinions expressed in this email message are the author's own and may not reflect the views and opinions of WANdisco, unless the author is authorized by WANdisco to express such views or opinions on its behalf. All email sent to or from this address is subject to electronic storage and review by WANdisco. Although WANdisco operates anti-virus programs, it does not accept responsibility for any damage whatsoever caused by viruses being passed.

Stephen Mallette

Mar 29, 2023, 6:44:15 AM
to gremli...@googlegroups.com
Hi there - some answers inline below:

On Mon, Mar 27, 2023 at 7:28 PM 'Mark McKeown' via Gremlin-users <gremli...@googlegroups.com> wrote:

Hi,
    I have some newbie questions - I am looking at modelling parts of a filesystem using a GraphDB. I want to store metadata about parts of the filesystem in the GraphDB.

Is this a good use of GraphDB and Gremlin?

A file system is a tree, and a tree is a graph, so the use case fits.
  
If I want to store information about a path /foo/bar/blah it seems I should have vertices foo, bar and blah and link them together?

Yes, that's likely the way to do it: one vertex per path segment, linked parent to child by an edge.
 
How would I query if /foo/bar/blah exists in the GraphDB? I could have another path /soup/cheese/blah for example which would mean a node with the same name?

The two "blah" entries don't need to refer to the same vertex. Here's the graph you described. The final two queries traverse it from "foo" and from "soup", and you can see that the "blah" vertices are distinct, each with its own generated identifier.

gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV('directory').property('name','foo').as('v1').
......1>   addV('directory').property('name','bar').as('v2').
......2>   addV('directory').property('name','blah').as('v3').
......3>   addV('directory').property('name','soup').as('v4').
......4>   addV('directory').property('name','cheese').as('v5').
......5>   addV('directory').property('name','blah').as('v6').
......6>   addE('child').from('v1').to('v2').
......7>   addE('child').from('v2').to('v3').
......8>   addE('child').from('v4').to('v5').
......9>   addE('child').from('v5').to('v6').iterate()
gremlin> g.V().has('name','foo').out().out().elementMap()
==>[id:4,label:directory,name:blah]
gremlin> g.V().has('name','soup').out().out().elementMap()
==>[id:10,label:directory,name:blah]

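The two lookups above start from any vertex named "foo" or "soup" and follow edges of any label. As a hedged sketch of a stricter existence check for /foo/bar/blah, the traversals below anchor at a top-level directory and follow only 'child' edges. They assume the 'directory' label, 'name' property and 'child' edge label from the example above; treating a vertex with no incoming 'child' edge as a root is an assumption of this sketch, not something stated in the thread.

```groovy
// Does /foo/bar/blah exist? Run against the graph built above.
// Assumption: a root directory is one with no incoming 'child' edge.
g.V().hasLabel('directory').has('name','foo').
  not(inE('child')).
  out('child').has('name','bar').
  out('child').has('name','blah').
  hasNext()

// A path that was never added, /foo/cheese/blah, for comparison.
g.V().hasLabel('directory').has('name','foo').
  not(inE('child')).
  out('child').has('name','cheese').
  out('child').has('name','blah').
  hasNext()
```

On the six-vertex graph above the first call should report true and the second false, since "cheese" is a child of "soup", not of "foo".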
 
If I want to add /foo/bar/blah as it does not exist, but /foo/bar does exist how do I query for existing nodes? Can I query and add in the same operation?

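Yes, here's a simplified example. It uses coalesce() to look for a child of "bar" named "blah" and adds the vertex and its edge only if none exists. Running the same traversal a second time returns the same vertex, v[5], rather than creating a duplicate.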

gremlin> g.addV('directory').property('name','foo').as('v1').
......1>   addV('directory').property('name','bar').as('v2').
......2>   addE('child').from('v1').to('v2').iterate()    
gremlin> g.V().has('name','foo').
......1>   out().has('name','bar').as('v1').
......2>   coalesce(out().has('name','blah'),
......3>            addV('directory').property('name','blah').as('v2').
......4>            addE('child').from('v1').to('v2').
......5>            select('v2'))
==>v[5]
gremlin> g.V().has('name','foo').
......1>   out().has('name','bar').as('v1').
......2>   coalesce(out().has('name','blah'),
......3>            addV('directory').property('name','blah').as('v2').
......4>            addE('child').from('v1').to('v2').
......5>            select('v2'))
==>v[5]

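The example above assumes /foo/bar already exists. The same coalesce() idea can be repeated per path segment so that any missing directory is created along the way. The following is a sketch under that assumption rather than a tested recipe: the step labels 'd0', 'd1', 'n1' and 'n2' are arbitrary names introduced here, and the fold()/unfold() form is used for the root because coalesce() needs an incoming traverser even when no "foo" vertex matches.

```groovy
// Get-or-create every segment of /foo/bar/blah in one traversal.
// d0/d1 label the found-or-created parent at each level;
// n1/n2 label a vertex that had to be created.
// Assumption: a root directory is one with no incoming 'child' edge.
g.V().hasLabel('directory').has('name','foo').not(inE('child')).
  fold().
  coalesce(unfold(),
           addV('directory').property('name','foo')).as('d0').
  coalesce(out('child').has('name','bar'),
           addV('directory').property('name','bar').as('n1').
           addE('child').from('d0').to('n1').
           select('n1')).as('d1').
  coalesce(out('child').has('name','blah'),
           addV('directory').property('name','blah').as('n2').
           addE('child').from('d1').to('n2').
           select('n2'))
```

The traversal returns the "blah" vertex whether it was found or created. Whether this is safe with concurrent writers depends on the transaction semantics of the graph database in use.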
 
Are there any tutorials that provide good examples for this type of case?

There is a recipe on trees: https://tinkerpop.apache.org/docs/current/recipes/#tree but I'm not sure what else is out there in this regard. Since you are new, I'd suggest you start learning by reading Kelvin Lawrence's book and by playing around with Gremlin and your use case in Gremlify.


Mark McKeown

Mar 30, 2023, 3:55:51 AM
to gremli...@googlegroups.com
Thanks Stephen, this is a great starter for me.

cheers
Mark



--
MARK MC KEOWN DEVELOPER
