Backward compatibility of GREL etc functions?


Tom Morris

Sep 20, 2011, 1:17:30 AM
to google-refine-dev
What backward compatibility guarantees have we (implicitly) made, or
should we make, with regard to GREL/Clojure/etc?

Possible options:

1. All functions produce the same results in all Refine versions forever
2. Once you change the Refine version, all bets are off and no
guarantees are made
3. Function results are upward compatible for later versions of Refine
4. as for 3, with the exception of obvious bug fixes
5. <other variations>

This thought was triggered by the combination of folks who are
interested in provenance and digital forensics and perceived changes
in default results for functions. We should probably put a stake in
the ground that describes what our intentions are.

This probably deserves wider discussion on the general list, but I
figured I'd start here with a more limited community...

Tom

David Huynh

Sep 20, 2011, 4:00:07 AM
to google-r...@googlegroups.com
Hi Tom,

On Tue, Sep 20, 2011 at 1:17 PM, Tom Morris <tfmo...@gmail.com> wrote:
What backward compatibility guarantees have we (implicitly) made, or
should we make, with regard to GREL/Clojure/etc?

This issue has been perplexing me, too.


 
Possible options:

1. All functions produce the same results in all Refine versions forever
2. Once you change the Refine version, all bets are off and no
guarantees are made

You probably meant these 2 to be straw-man examples.
 
3. Function results are upward compatible for later versions of Refine
4. as for 3, with the exception of obvious bug fixes

It seems like the only safe way a function can change is to take more parameters, which can then alter its behavior. Even bug fixes are not safe: a sequence of operations may not yield the same effect before vs. after the fix.
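
For instance, a minimal sketch of that parameter idea (purely illustrative, not Refine's actual function API):

// Hypothetical sketch, not Refine's real API: a GREL-style function that gains
// an optional second argument. Existing one-argument calls keep their old
// result; the new behavior is only reachable by explicitly passing the flag.
public class ExampleFunction {
    public Object call(Object[] args) {
        if (args.length == 0) {
            return "Error: expected at least one argument";
        }
        boolean optIn = args.length > 1 && Boolean.TRUE.equals(args[1]);
        return optIn ? newBehavior(args[0]) : legacyBehavior(args[0]);
    }

    private Object legacyBehavior(Object v) { return String.valueOf(v); }        // what old expressions rely on
    private Object newBehavior(Object v)    { return String.valueOf(v).trim(); } // changed semantics, opt-in only
}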
 
5. <other variations>

This thought was triggered by the combination of folks who are
interested in provenance and digital forensics and perceived changes
in default results for functions. We should probably put a stake in
the ground that describes what our intentions are.

This probably deserves wider discussion on the general list, but I
figured I'd start here with a more limited community...

One solution I can think of is to separate out the GREL support completely into a "scripting language module" of some sort (much like the current Jython and Clojure support). Each GREL version would be its own "scripting language module", and several versions of GREL could be installed on the same Refine instance. Each expression would then be prefixed with the language as well as its version.

And if that turns out well for GREL, we should also extract out and version "operation modules". Refine would then be just a UI on top of these scripting language and operation plugin modules.
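
Roughly, the dispatch might look like the sketch below (the version suffix and the registry are my own invention for illustration; Refine already prefixes expressions with the language name, but not with a version):

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: several GREL versions installed side by side, selected
// by a "language-version:" prefix stored with each expression.
public class EvaluatorRegistry {
    public interface Evaluator { Object evaluate(String expression, Object value); }

    private final Map<String, Evaluator> evaluators = new HashMap<>();

    public void register(String language, String version, Evaluator e) {
        evaluators.put(language + "-" + version, e);            // e.g. "grel-1.2"
    }

    public Object evaluate(String prefixedExpression, Object value) {
        int colon = prefixedExpression.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Expression has no language prefix");
        }
        String key = prefixedExpression.substring(0, colon);    // "grel-1.2"
        String body = prefixedExpression.substring(colon + 1);  // "value.trim()"
        Evaluator e = evaluators.get(key);
        if (e == null) {
            throw new IllegalStateException("No evaluator installed for " + key);
        }
        return e.evaluate(body, value);
    }
}

Old projects would keep evaluating against whatever "grel-1.x" they were saved with, while new projects would default to the latest registered version.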

David

Thad Guidry

Sep 20, 2011, 9:50:55 AM
to google-r...@googlegroups.com
If I give you toString() and suddenly toString() performs toString() + "Thad" every time, then you have changed the function's previous usefulness. In that case, the new version of toString() should not be called TO STRING any longer.

David says that you could add parameters to it; sure, that would make it safe: toString(value, +"Thad")

But depending on and becoming accustomed to GREL 2.1 is easy to do, I think most would agree.
If we make switching from GREL 2.1 to 2.5 or 3.0 easy for the user, then we eliminate a lot of problems for them, I would think.
Making switching GREL versions easy might mean that GREL should be a "scripting language module".

R language modules (well, some of them) do the same thing, and some folks depend on a tspaleo15 version versus tspaleo16 (just an example, but it's out there).

David Huynh

Sep 20, 2011, 7:51:56 PM
to google-r...@googlegroups.com
On Tue, Sep 20, 2011 at 9:50 PM, Thad Guidry <thadg...@gmail.com> wrote:
If I give you toString() and suddenly toString() performs toString() + "Thad" every time, then you have changed the function's previous usefulness. In that case, the new version of toString() should not be called TO STRING any longer.

David says that you could add parameters to it; sure, that would make it safe: toString(value, +"Thad")

But depending on and becoming accustomed to GREL 2.1 is easy to do, I think most would agree.
If we make switching from GREL 2.1 to 2.5 or 3.0 easy for the user, then we eliminate a lot of problems for them, I would think.
Making switching GREL versions easy might mean that GREL should be a "scripting language module".

R language modules (well, some of them) do the same thing, and some folks depend on a tspaleo15 version versus tspaleo16 (just an example, but it's out there).

Cool, good to see precedents!

I think the challenge isn't allowing the user to easily switch versions; the challenges are that the language modules might have a different release cycle than the core product, and how to support installing language modules.

Overall, I think Refine has 3 "language" abstractions (and by "language" I don't mean programming language per se):
1. a language for computing data (e.g., GREL)
2. a language for selecting which data to operate on (i.e., faceted browsing), which can depend on (1)
3. a language for transforming data (i.e., operations), which depends on both (1) and (2)

Ideally, all three are pluggable, but I don't know if that's going overboard.
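
As a very rough sketch of what those three extension points might look like (the names are invented, not Refine's actual interfaces):

// Hypothetical sketch of the three extension points; none of these names exist in Refine.
public class LanguageAbstractions {
    public interface ExpressionLanguage {        // (1) computing data, e.g. GREL
        Object evaluate(String expression, Object cellValue);
    }

    public interface RowSelection {              // (2) selecting which data to operate on (a facet's result)
        boolean matches(Object[] row);
    }

    public interface Operation {                 // (3) transforming data, built on (1) and (2)
        void apply(Object[][] rows, RowSelection selection, ExpressionLanguage language);
    }
}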

David

Stefano Mazzocchi

Sep 21, 2011, 10:49:35 PM
to google-r...@googlegroups.com
On Wed, Sep 21, 2011 at 1:51 AM, David Huynh <dfh...@gmail.com> wrote:

Cool, good to see precedents!

I think the challenge isn't allowing the user to easily switch versions; the challenges are that the language modules might have a different release cycle than the core product, and how to support installing language modules.

Overall, I think Refine has 3 "language" abstractions (and by "language" I don't mean programming language per se):
1. a language for computing data (e.g., GREL)
2. a language for selecting which data to operate on (i.e., faceted browsing), which can depend on (1)
3. a language for transforming data (i.e., operations), which depends on both (1) and (2)

Ideally, all three are pluggable, but I don't know if that's going overboard.

Pluggability is a false feature, IMO: we are just passing the costs to somebody else.

I am much more inclined to use a process that has worked well for other projects in the past:

 1) within the same generation (the first number in the version number), we guarantee back-compatibility
 2) within the same evolution (the second number), we can introduce new things that were not available to previous evolutions
 3) within revision numbers (the third number), we don't introduce anything new (but we fix bugs)
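
For GREL that contract could translate into a check along these lines (a sketch only; the class and method names are made up):

// Hypothetical sketch: a script saved under generation.evolution.revision runs
// iff the current Refine has the same generation and an equal or later evolution.
public final class GrelVersion {
    final int generation, evolution, revision;

    GrelVersion(int generation, int evolution, int revision) {
        this.generation = generation;
        this.evolution = evolution;
        this.revision = revision;
    }

    static GrelVersion parse(String s) {             // "2.1.3" -> generation 2, evolution 1, revision 3
        String[] parts = s.split("\\.");
        return new GrelVersion(Integer.parseInt(parts[0]),
                               Integer.parseInt(parts[1]),
                               Integer.parseInt(parts[2]));
    }

    boolean canRun(GrelVersion savedWith) {
        return generation == savedWith.generation    // same generation: back-compatible
            && evolution >= savedWith.evolution;     // later evolutions only add, never break
    }
}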

also, we should export the version number for the "redo" scripts so that a new refine could reject (or adapt) an older script.
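
Concretely, something like the sketch below (the "grelVersion" field name and the message are assumptions, reusing the GrelVersion sketch above):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: stamp the exported "redo" script with the language
// version, and reject it with a clear message rather than fail later with a
// confusing "function not found".
public class HistoryScripts {
    public static Map<String, Object> export(List<Map<String, Object>> operations, String grelVersion) {
        Map<String, Object> script = new HashMap<>();
        script.put("grelVersion", grelVersion);      // assumed field name, for illustration
        script.put("operations", operations);
        return script;
    }

    public static void apply(Map<String, Object> script, GrelVersion running) {
        GrelVersion savedWith = GrelVersion.parse((String) script.get("grelVersion"));
        if (!running.canRun(savedWith)) {
            throw new IllegalStateException("This history was recorded with GREL "
                + script.get("grelVersion") + " and is not compatible with this Refine.");
        }
        // ... otherwise apply each operation in order ...
    }
}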

If we make things pluggable, we won't have much less work to do to maintain such contracts *and* we will have made the future-compatibility of 'redo' scripts a lot more fragile, passing the cost of that fragility onto our users.



--
Stefano Mazzocchi  <stef...@google.com>
Software Engineer, Google Inc.

David Huynh

Sep 22, 2011, 12:23:52 AM
to google-r...@googlegroups.com
On Thu, Sep 22, 2011 at 10:49 AM, Stefano Mazzocchi <stef...@google.com> wrote:

Pluggability is a false feature, IMO: we are just passing the costs to somebody else.

I am much more inclined to use a process that has worked well for other projects in the past:


What kinds of projects are you referring to? I'm tempted to think that Refine is peculiar enough in its own ways that such a process doesn't apply.

 
 1) within the same generation (the first number in the version number), we guarantee back-compatibility
 2) within the same evolution (the second number), we can introduce new things that were not available to previous evolutions
 3) within revision numbers (the third number), we don't introduce anything new (but we fix bugs)


But fixing bugs in GREL functions already changes their contracts.


also, we should export the version number for the "redo" scripts so that a new refine could reject (or adapt) an older script.

+1 to having the version somewhere in there. Do you mean such adaptation would be automatic? I.e., each GREL function would look at the version and behave correspondingly.
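
Something like this sketch is what I have in mind (the bindings key and the names are invented):

import java.util.Properties;

// Hypothetical sketch: a GREL-style function that consults the version the
// expression was recorded against (passed in through the evaluation bindings)
// and reproduces the old behavior for old scripts. "grelVersion" is invented.
public class VersionAwareFunction {
    public Object call(Properties bindings, Object[] args) {
        if (args.length == 0) {
            return "Error: expected one argument";
        }
        String recordedVersion = bindings.getProperty("grelVersion", "2.0");
        if (recordedVersion.startsWith("1.")) {
            return legacyResult(args[0]);    // behave exactly as the 1.x releases did
        }
        return currentResult(args[0]);       // current (possibly bug-fixed) behavior
    }

    private Object legacyResult(Object v)  { return String.valueOf(v); }
    private Object currentResult(Object v) { return String.valueOf(v).trim(); }
}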


If we make things pluggable, we won't have much less work to do to maintain such contracts *and* we will have made the future-compatibility of 'redo' scripts a lot more fragile, passing the cost of that fragility onto our users.


I'm guessing that we need both--contract as you described, as much as we can handle with limited human resources, plus pluggability. Imagine an agency who has managed to come up with several history scripts that it knows would work for several kinds of data it deals with, and then it needs to upgrade to a later version of Refine (because of some crucial bug fixes). But then some of the scripts no longer work as expected. If there's no pluggability, what can it do? Rather than just picking an older GREL version, it'd need to debug those scripts.

I think this might work out like how Eclipse lets you select different Java versions. You rarely care to, but when you really need an old version, you have the option to use it.

Stefano Mazzocchi

Sep 22, 2011, 12:41:14 AM
to google-r...@googlegroups.com
On Thu, Sep 22, 2011 at 6:23 AM, David Huynh <dfh...@gmail.com> wrote:

But fixing bugs in GREL functions already changes their contracts.

Oh, c'mon.
 
also, we should export the version number for the "redo" scripts so that a new refine could reject (or adapt) an older script.

+1 to having the version somewhere in there. Do you mean such adaptation would be automatic? I.e., each GREL function would look at the version and behave correspondingly.

At the very least say "this is not compatible" and avoid throwing weirder errors such as 'function not found' or worse.

If we make things pluggable, we don't have much less work to do to maintain such contracts *and* we have made the future-compatibility of 'redo' scripts a lot more fragile and pass the cost of that fragility onto our users.


I'm guessing that we need both--contract as you described, as much as we can handle with limited human resources, plus pluggability. Imagine an agency who has managed to come up with several history scripts that it knows would work for several kinds of data it deals with, and then it needs to upgrade to a later version of Refine (because of some crucial bug fixes). But then some of the scripts no longer work as expected. If there's no pluggability, what can it do?

You are moving the cost of maintaining the language interface to the language/Refine plugin interface... there's a win only if one can be solidified more than the other... it might be the case here, or not, I honestly don't know.

But my gut feeling is that we're doing this to punt the problem and still feel good about ourselves, and that rarely ends well.
 
Rather than just picking an older GREL version, it'd need to debug those scripts.

This project is open source and will remain that. If you have a script that works with version 3.2.1, people will download that and run it. If they need bugfixes there, they can get the diffs and apply them. If they want to update it to the latest version, they can spend resources to understand the back incompatibilities introduced later and fix those.

I think that trying to avoid future compatibility problems is hopeless. We can try to be as consistent and open as possible and provide enough safety nets that people can understand what's going on without having to talk to us.

David Huynh

Sep 22, 2011, 1:11:38 AM
to google-r...@googlegroups.com
On Thu, Sep 22, 2011 at 12:41 PM, Stefano Mazzocchi <stef...@google.com> wrote:
But fixing bugs in GREL functions already changes their contracts.

Oh, c'mon.


To be more concrete: the toDate() function previously returned null when it failed to parse something into a date. Now, in trunk/, it returns an error. Any subsequent expression that tests such a result for error or null will not work the same anymore.

Another example: toTitlecase() used to convert "C.R. BLAH" to "C.r. Blah", but now it returns "C.R. Blah".
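
To spell out why such changes break things, here is a stand-alone sketch (not Refine's code) of a guard written against the old null contract:

// Hypothetical stand-alone sketch: a guard written for the old contract
// (null on parse failure) stops firing once the function returns an error
// value instead of null.
public class NullGuardExample {
    static Object toDateOld(String s) { return null; }                                     // old: unparseable -> null
    static Object toDateNew(String s) { return new RuntimeException("Not a date: " + s); } // new: unparseable -> error value

    public static void main(String[] args) {
        // The expression equivalent of: if(value.toDate() == null, "missing", value.toDate())
        Object before = toDateOld("no-such-date");
        Object after  = toDateNew("no-such-date");

        System.out.println(before == null ? "missing" : before);  // prints "missing", as the old script expects
        System.out.println(after  == null ? "missing" : after);   // the error object slips through the null check
    }
}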

 
also, we should export the version number for the "redo" scripts so that a new refine could reject (or adapt) an older script.

+1 to having the version somewhere in there. Do you mean such adaptation would be automatic? I.e., each GREL function would look at the version and behave correspondingly.

At the very least say "this is not compatible" and avoid throwing weirder errors such as 'function not found' or worse.

If we make things pluggable, we won't have much less work to do to maintain such contracts *and* we will have made the future-compatibility of 'redo' scripts a lot more fragile, passing the cost of that fragility onto our users.


I'm guessing that we need both--contract as you described, as much as we can handle with limited human resources, plus pluggability. Imagine an agency who has managed to come up with several history scripts that it knows would work for several kinds of data it deals with, and then it needs to upgrade to a later version of Refine (because of some crucial bug fixes). But then some of the scripts no longer work as expected. If there's no pluggability, what can it do?

You are moving the cost of maintaining the language interface to the language/Refine plugin interface... there's a win only if one can be solidified more than the other... it might be the case here, or not, I honestly don't know.

But my gut feeling is that we're doing this to punt the problem and still feel good about ourselves, and that rarely ends well.


Punting would be doing nothing, which is not what I was suggesting. I'm trying for a solution that we can feel good about. We might not have enough engineering resources to implement it, but let's see if we can find the ideal solution for users first.


Rather than just picking an older GREL version, it'd need to debug those scripts.

This project is open source and will remain that. If you have a script that works with version 3.2.1, people will download that and run it. If they need bugfixes there, they can get the diffs and apply them. If they want to update it to the latest version, they can spend resources to understand the back incompatibilities introduced later and fix those.


The difference here is that our user base is not technical enough to apply patches. Furthermore, subtle changes in the behavior of functions may have unforeseen consequences that are hard to understand, detect, and fix.


I think that trying to avoid future compatibility problems is hopeless. We can try to be as consistent and open as possible and provide enough safety nets that people can understand what's going on without having to talk to us.


Being able to plug in an older version of GREL can be such a safety net.

Thad Guidry

Sep 22, 2011, 11:13:19 AM
to google-r...@googlegroups.com
Rapid Miner has an R extension. One key feature I noticed is the ability to map an R variable. A nice 8-minute video gave me lots of ideas for future features in Refine. There is SO much overlap among all these data tools now.


Perhaps "pluggability" can also mean an interface for mapping Refine's variables ?
Imagine that you can facet and manipulate data, while cleaning it and use Refine's powerful faceting to narrow down to a "Set".  And then that "Live Set" is easily accessible as input to other tools, like Rapid Miner, etc...to run the larger analysis, even using the R extension.

I think, in terms of "pluggability", that other data tools are already THERE. We just need to provide some sort of "Live Set" interface. Who knows... maybe those can be real Refine extensions? A Rapid Miner Refine extension, for example, would output the clustered, cleaned-up, faceted "Live Set" to Rapid Miner.
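
Just to make the idea concrete, a purely hypothetical sketch of what a "Live Set" extension point could look like (none of these names exist in Refine):

import java.util.List;

// Purely hypothetical sketch of a "Live Set" extension point: another tool
// (Rapid Miner, an R bridge, ...) subscribes to a named set and receives the
// currently faceted rows every time the selection changes in Refine.
public class LiveSetSketch {
    public interface LiveSetConsumer {
        void onSelectionChanged(String setName, List<String> columnNames, List<Object[]> rows);
    }

    public interface LiveSetPublisher {
        void subscribe(String setName, LiveSetConsumer consumer);    // e.g. "cleaned-addresses"
        void unsubscribe(String setName, LiveSetConsumer consumer);
    }
}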

It's that "Live, Interactive, Previewing" ability of Refine (and the undo) that makes it extremely easy to "experiment".  That experimentation is powerful...and other data tools do not let you experiment easily with the raw data, but instead are intended more for analysis, plotting, and visualization.   Let those tools handle that, they're good at it already.  Just allow more extensions or an even better extension interface that can quickly wire up Refine variables for output to those tools, and keep it "Live and Interactive" through those variable spaces.   You could even have 2 or 3 "live interfaces" going, each using prefixed variable names to deal with that.

Lots of ideas in my mind of what "pluggability" means to me with Refine. 

--
-Thad
http://www.freebase.com/view/en/thad_guidry