Referencing locations

Krzy Kli

Apr 10, 2014, 5:39:37 PM
to open-m...@googlegroups.com
Hey Marcus, I would like to hear your thoughts on referencing locations.
I have gone through the code briefly and there seems to be no apparent way of doing that, but of course I might be wrong ;)
Say we have a data structure as follows:


How do you see Open Metadata handling a situation where there's an arbitrary master location from which we want to reference data? Ideally the referenced data would be able to override data coming from upstream, while we still retain the ability to disable local overrides.

It would also be worth considering different ways of referencing: continuous, timed (for example once a day) or manual.
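The three referencing modes could be expressed as a simple refresh policy. A minimal sketch, assuming each reference records the time of its last sync; `SyncMode` and `should_refresh` are hypothetical names, not part of Open Metadata.

```python
import time
from enum import Enum


class SyncMode(Enum):
    """Hypothetical referencing modes (not an Open Metadata API)."""
    CONTINUOUS = "continuous"  # always follow the master location
    TIMED = "timed"            # refresh at most once per interval
    MANUAL = "manual"          # refresh only when explicitly requested


def should_refresh(mode, last_sync, now=None, interval=24 * 60 * 60, requested=False):
    """Decide whether a reference should re-read its master location.

    last_sync and now are POSIX timestamps in seconds; interval defaults to one day.
    """
    if now is None:
        now = time.time()
    if mode is SyncMode.CONTINUOUS:
        return True
    if mode is SyncMode.TIMED:
        return now - last_sync >= interval
    # MANUAL: only when the user asks for it
    return requested
```

A timed reference last synced an hour ago would not refresh yet; one synced a day ago would.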

And an off-topic thingy:
When I started playing with OM, after setting up the location I wanted to create some metadata for it. My initial intuition of how it would work was as follows:

myLoc = om.Location(path)
myLoc.add_attr("myAttrName", myvalue)

Currently we have to go through the whole package, either to get the write method or to create datasets and then dump them to a specific location. In my opinion, a set of convenience methods for manipulating data on a given Location object would be a nice addition and would give it more of an OO feel ;)
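A minimal sketch of the proposed `add_attr` convenience method, assuming each attribute is stored as one JSON file inside a `.meta` subfolder of the location. This `Location` class, its `add_attr`/`get_attr` methods and that on-disk layout are hypothetical, not Open Metadata's actual API or storage format.

```python
import json
import os


class Location:
    """Hypothetical OO wrapper around a folder; not OM's real Location class."""

    def __init__(self, path, meta_dir=".meta"):
        self.path = path
        # Assumed layout: attributes live in <path>/<meta_dir>/<name>.json
        self.meta_path = os.path.join(path, meta_dir)

    def add_attr(self, name, value):
        """Write one JSON-serialisable attribute and return its file path."""
        os.makedirs(self.meta_path, exist_ok=True)
        file_path = os.path.join(self.meta_path, name + ".json")
        with open(file_path, "w") as f:
            json.dump(value, f)
        return file_path

    def get_attr(self, name):
        """Read an attribute previously written with add_attr."""
        file_path = os.path.join(self.meta_path, name + ".json")
        with open(file_path) as f:
            return json.load(f)
```

Usage mirrors the snippet above: `myLoc = Location(path)` followed by `myLoc.add_attr("myAttrName", myvalue)`.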
Let me know what you think, thanks! ;)


Marcus Ottosson

Apr 11, 2014, 2:24:23 AM
to Krzy Kli, open-m...@googlegroups.com
Hey Krzysztof! Glad to see you here. :)

I read three questions in your post:

1. Data referencing: having one set of data appear in multiple locations at once
2. Overriding metadata of parents in children
3. Object oriented access to metadata of a location

1. You're right, there is currently no way to do this. Referencing is something we've talked about, and it would be pretty cool. I can see a few use-cases for it, and I'm summarising the idea in RFC20; it's quite empty at the moment, so this thread will help. :) In a nutshell, OM would hardlink datasets and use junctions for groups. In your case, assuming the nivea entries are datasets (i.e. files), there would be a mechanism (off the top of my head, referenced_nivea.data = om.Reference(original_nivea)) that hardlinks one dataset to another.
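The hardlinking part of the idea can be sketched with the standard library's `os.link`. `reference` is a hypothetical helper standing in for the `om.Reference` idea, not an existing Open Metadata function; it handles single dataset files only, since hard links to directories are not allowed (which is why groups would need junctions instead).

```python
import os


def reference(original, target):
    """Hypothetical helper: make `target` a hard link to the dataset at `original`.

    Both paths must be on the same filesystem. os.link raises
    FileExistsError if `target` already exists.
    """
    os.link(original, target)
    return target
```

Both paths then point at the same data on disk, so an in-place edit through either one is visible through the other, and deleting one path leaves the other intact. Tools that save by writing a new file and renaming it over the old one would break the link, though.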

For the sake of documenting, what is your use-case?

2. Overriding metadata is a feature on trial in the current release (0.3); have a look at the motivation and idea in RFC12. In a nutshell, any child within a hierarchy acts as an inheriting object: it inherits all metadata from the levels above it, and any metadata it contains itself overrides the inherited values. You could think of it as cascading data, similar to CSS.
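The cascading lookup can be illustrated with `collections.ChainMap`. This is a sketch of the idea only, not how Open Metadata implements it; plain dicts stand in for the metadata read from each level of the hierarchy, and `resolve` is a hypothetical name.

```python
from collections import ChainMap


def resolve(hierarchy):
    """Return the effective metadata for the last location in `hierarchy`.

    `hierarchy` lists metadata dicts from the root down to the child.
    ChainMap searches its mappings left to right, so after reversing, the
    child's own values are found first and shadow anything inherited.
    """
    return ChainMap(*reversed(hierarchy))
```

For example, with `root = {"fps": 24, "resolution": "1080p"}` and `shot = {"fps": 25}`, `resolve([root, shot])` gives `fps` 25 from the shot and `resolution` "1080p" from the root.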

3. Dot-notation access is also included in this release, and described in that RFC, but at the moment it only applies to reading, not writing.

>>> from openmetadata import instance
>>> barcelona = instance.Instance(barcelona_path)
>>> barcelona.nivea.data
'Contents of nivea'

Note that this also "inherits" data from parents above. I'm considering whether or not to separate this functionality and make it so that you can use dot-notation on regular objects without inheritance too, since it could potentially be quite useful:

# Pseudo-code
>>> from openmetadata import instance
>>> barcelona = instance.Instance(barcelona_path)
>>> barcelona.myAttrName = myvalue
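Dot-notation writing on top of inherited reading could look roughly like this. `Node` is a hypothetical in-memory container, not Open Metadata's `Instance`; reads fall back to the parent, while writes always land on the node itself, so a child can override a value without touching its parent. Nothing here touches the file system, so persisting the values would still be a separate, explicit step.

```python
class Node:
    """Hypothetical metadata container with dot-notation read and write."""

    def __init__(self, parent=None, **data):
        # Bypass __setattr__ so these internals are not stored as metadata.
        object.__setattr__(self, "_parent", parent)
        object.__setattr__(self, "_data", dict(data))

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        data = object.__getattribute__(self, "_data")
        if name in data:
            return data[name]
        parent = object.__getattribute__(self, "_parent")
        if parent is not None:
            return getattr(parent, name)  # inherit from above
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Writes are always local to this node (an override).
        self._data[name] = value
```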

Does this answer your questions?


--
You received this message because you are subscribed to the Google Groups "Open Metadata" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-metadat...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Marcus Ottosson
konstr...@gmail.com

Krzy Kli

Apr 12, 2014, 2:45:43 PM
to open-m...@googlegroups.com, Krzy Kli
Thanks a lot!

My use-case for referencing is straightforward: avoiding duplication of data.
And yes, I agree, the feature would be really cool and powerful!

In my opinion, dot notation for writing data seems sensible and useful; I'd like to see that in the future :)

Cool, I guess that's it for now.

Cheers,
kk

Marcus Ottosson

Apr 12, 2014, 2:55:20 PM
to Krzy Kli, open-m...@googlegroups.com
Good use-case. :)

About the dot-notation. At the moment, it's only available with Instance objects, which will also "inherit" data from parent Locations.

Instance is separate from other objects, completely isolated in fact, simply because all other objects are "dumb" containers with functionality relevant only to themselves. Any interaction with the outside world is handled by functions such as om.pull() and om.dump().

This is so that any interaction will be made explicit and so that interacting with only the object will be both quick and safe.
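The "dumb container plus explicit I/O functions" split can be sketched as follows. `Dataset`, `pull` and `dump` here are hypothetical stand-ins that only mirror the idea behind om.pull() and om.dump(); their signatures, and the use of JSON files, are assumptions rather than Open Metadata's actual API.

```python
import json
import os


class Dataset:
    """Hypothetical 'dumb' container: holds a path and a value, never touches disk."""

    def __init__(self, path, data=None):
        self.path = path
        self.data = data


def pull(dataset):
    """Explicitly read the dataset's contents from disk into the object."""
    with open(dataset.path) as f:
        dataset.data = json.load(f)
    return dataset


def dump(dataset):
    """Explicitly write the object's contents to disk."""
    os.makedirs(os.path.dirname(dataset.path) or ".", exist_ok=True)
    with open(dataset.path, "w") as f:
        json.dump(dataset.data, f)
    return dataset
```

Working with a `Dataset` object alone is pure in-memory work; the only places that hit the file system are the two functions, which makes every disk access visible at the call site.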

The current dot-notation on Instance objects breaks this encapsulation, since it lets the objects themselves access the outside world in order to tell you whether or not the attribute exists.

Another reason for this separation is so that the interaction can be served from outside the local computer; if five people are all accessing metadata at the same time, it may be a good idea to concentrate the file-system requests in one spot.
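Concentrating file-system requests in one spot can be illustrated inside a single process with a queue and one worker thread. This is only an analogy for the networked service described in RFC13, whose actual design isn't shown here; `RequestBroker` and `submit` are hypothetical names.

```python
import queue
import threading


class RequestBroker:
    """Hypothetical broker: every request is executed by one worker thread.

    Clients submit callables; only the worker runs them, so access to the
    shared storage happens in one place and in order.
    """

    def __init__(self):
        self._requests = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            func, args, reply = self._requests.get()
            try:
                reply.put((True, func(*args)))
            except Exception as exc:  # hand errors back to the caller
                reply.put((False, exc))

    def submit(self, func, *args):
        """Queue func(*args) and block until the worker has run it."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((func, args, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value
```

In a real deployment the queue would sit behind a server process that all five users talk to, rather than inside each user's program.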

RFC13 has all the juicy details on this.

Anyway, I'm conflicted about whether or not to keep the dot-notation as-is. What are your thoughts?

Krzy Kli

Apr 13, 2014, 4:43:52 PM
to open-m...@googlegroups.com, Krzy Kli
Well, I don't really know either.

The dot notation seems more natural from the outside, but if there are good reasons for it not to be present on Instance objects, then I think it's alright, as long as there is another way of doing the same thing and it's not complex or trick-ish :)

Marcus Ottosson

Apr 13, 2014, 4:45:26 PM
to Krzy Kli, open-m...@googlegroups.com
Yes, I think the only way to know for sure is usage. The current method works for me so far, but I'm interested in hearing what works for you too. :) 

How's it going so far?