Metadata Management

Eric Keeney

Aug 3, 2015, 6:57:56 PM

to Neo4j

Has anyone heard of using Neo4j to manage metadata and lineage for a data warehouse, or for information management and data governance scenarios? I know this is a fairly open question, but I wanted to get a sense from the group of whether anyone is applying Neo4j to manage data lineage.

Thanks in advance!

-Eric

Michael Hunger

Aug 3, 2015, 7:18:54 PM

to ne...@googlegroups.com, Byron Ruth, david...@oldhat.org, William Gorder, David Fauth

I think Dave, Bill and Byron can help you there.

We also have some other use-cases in this area. Let me see what else I can find out.

Cheers, Michael

M. David Allen

Aug 3, 2015, 9:36:16 PM

to Michael Hunger, erick...@gmail.com, ne...@googlegroups.com, Byron Ruth, William Gorder, David Fauth

There's software for managing data lineage on GitHub, based on Neo4j, here: https://github.com/plus-provenance/plus

Managing data lineage for data warehouses and governance scenarios is something that a lot of the government agencies we work with want to do. That said, most implementations are fairly custom. Some issues are still subject to ongoing research (there's a reasonably healthy data provenance/lineage research community out there that we participate in).

I gave a tech talk on this topic at a graph meetup in Arlington; you can see the slides here: http://www.slideshare.net/MDavidAllen1/family-tree-of-data-provenance-and-neo4j-46405323

The question is broad; if you'd like to refine it, I can provide a more useful pointer in a particular direction.

But yes -- there are people using Neo4j to manage lineage.

--

M. David Allen
Mobile: (804) 787-0289

Eric Keeney

Aug 4, 2015, 10:41:37 AM

to Neo4j, michael...@neotechnology.com, erick...@gmail.com, b...@devel.io, william...@gmail.com, david...@neotechnology.com, david...@oldhat.org

Thank you, Michael and David, for replying to my post. My question was intended to get a sense from the group of whether graph database technology had been applied to the metadata lineage space, and you have answered that question. It sounds like there are no commercially available products that use a graph database for lineage. If there are, please let me know of them or point me in the right direction to get more information.

I'll review plus-provenance on GitHub and go from there.

Thanks again!

Eric

Péterson Sampaio Procópio Júnior

Aug 5, 2015, 3:16:36 PM

to neo4j, michael...@neotechnology.com, erick...@gmail.com, b...@devel.io, william...@gmail.com, david...@neotechnology.com, david...@oldhat.org

Hi Eric

I've been working experimentally for some time on importing data warehouse metadata into Neo4j and extracting information from it.

I even submitted a visualization to the 2014 Graph Connect contest:

https://twitter.com/petjr/status/517496119937757184/photo/1

If you want to share some thoughts on it, please let me know.

Best regards,

Péterson

ps: for more information, here is the paragraph I sent to the contest:

There are many ways to tackle the complexity of a system. For a data warehouse (DW) system in particular, one way to achieve this goal is by taking a closer look at its metadata. Typically, DW metadata describes the system's different objects, their properties, and the relationships between them. Objects, in this context, are DW concepts such as cubes, operational data stores, and source systems.
In this visualization, we focus on the "data flows to" relationship between DW objects. We show a real DW data flow network. The network is a directed graph in which nodes represent DW objects and edges represent the "data flows to" relationship between objects, indicating that data is transferred from one object to another. The nodes are colored according to the objects they represent. For example, source systems are colored pink, operational data stores light blue, cubes dark blue, and queries red.
Through this visualization we are able to view the full path that data takes in the data warehouse, from its origin in the source systems until it is finally presented to the end user. We are also able to view the complexity of the various existing ETL processes, and how the existing data marts are related to each other.
To construct this visualization we collected the metadata of a real data warehouse system that runs on the SAP NetWeaver Business Warehouse platform (SAP BW). We then stored this information in the Neo4j graph database, which gave us great flexibility to work with and query the data. For the visualization, we used Gephi.
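
As a rough illustration of the "data flows to" graph described above, here is a minimal Python sketch of lineage tracing over a directed graph. It is not the SAP BW metadata or the Neo4j model behind the visualization: the object names, the `FLOWS_TO` dictionary, and the `downstream`/`upstream` helpers are all hypothetical, and a plain in-memory dictionary stands in for the graph database.

```python
from collections import deque

# Hypothetical "data flows to" edges between DW objects (invented names).
# Each key is an object; its list holds the objects it sends data to.
FLOWS_TO = {
    "source_system:ERP": ["ods:sales_orders"],
    "source_system:CRM": ["ods:customers"],
    "ods:sales_orders": ["cube:sales"],
    "ods:customers": ["cube:sales"],
    "cube:sales": ["query:monthly_revenue"],
    "query:monthly_revenue": [],
}


def downstream(graph, start):
    """Return every object reachable from `start`, in breadth-first order."""
    seen = {start}
    queue = deque([start])
    reached = []
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                reached.append(nxt)
                queue.append(nxt)
    return reached


def upstream(graph, target):
    """Return every object whose data eventually flows into `target`."""
    # Reverse each edge, then reuse the same breadth-first traversal.
    reverse = {}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    return downstream(reverse, target)


if __name__ == "__main__":
    print(downstream(FLOWS_TO, "source_system:ERP"))
    print(upstream(FLOWS_TO, "query:monthly_revenue"))
```

Calling `downstream` on a source system answers "where does this data end up?", while `upstream` on a query answers "where did this data come from?". In a graph database, the same two questions would typically be asked as variable-length path queries rather than hand-written traversals.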

Péterson Sampaio Procópio Júnior

Aug 5, 2015, 3:17:33 PM

to neo4j, michael...@neotechnology.com, erick...@gmail.com, b...@devel.io, william...@gmail.com, david...@neotechnology.com, david...@oldhat.org

By the way, Provenance seems to be a nice tool; thanks for sharing.
