Denormalize operation


Antonin Delpeuch (lists)

Jun 19, 2020, 12:54:32 PM
to openref...@googlegroups.com
Hi all,

I noticed that we have a "Denormalize" operation in the backend. It is
not clear to me:
- whether this operation is accessible from the UI;
- what this operation actually does (looking at the source, it seems
related to the records mode);
- whether we should do anything about it (expose it in the UI, delete it…).

https://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/operations/row/DenormalizeOperation.java

Best,
Antonin

Tom Morris

Jun 19, 2020, 1:13:05 PM
to openref...@googlegroups.com
The commit that introduced it says:

commit 3f40195ea1fa23381ba8cc250f3f85d990b80e11
Author: David Huynh <dfh...@gmail.com>
Date:   Thu Apr 29 22:07:07 2010 +0000
    Implemented but disabled the denormalize operation.

As far as I can tell, it's never been used, although it's gotten plenty of maintenance over the years (tests, CSRF support, etc). I'd delete it so it stops wasting effort.

Tom

--
You received this message because you are subscribed to the Google Groups "OpenRefine Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openrefine-dev/23ba001f-3b5c-ef8a-d780-549abb08fd56%40antonin.delpeuch.eu.

Thad Guidry

Jun 19, 2020, 1:19:12 PM
to openref...@googlegroups.com
David had some issues when implementing "denormalize" to fit a user's need. See the thread below:

HISTORICAL THREAD

April 28, 2010, 11:29pm CST

When I take a column and split the value into new rows, is there a way to have those new rows inherit all the other values of the row, so that they are not dependent rows of the original parent?

Thanks!
Jeanne

----

Jeanne,

I think this would be a useful feature.

-Raymond


----

In the meantime, you could go over each of the other columns and do this
transform:
     row.record.cells[columnName].value[0]

David
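David's workaround relies on the records model: a row with a non-blank value in the first (key) column starts a record, and the rows below it with a blank key belong to that record. The Python sketch below imitates applying his expression to every column except the one that was split. It assumes rows are lists with `None` for blank cells, and it assumes `row.record.cells[columnName].value` yields that column's non-blank values in row order, so `[0]` picks the first one. `split_records` and `fill_from_record` are hypothetical names, not OpenRefine functions.

```python
from typing import Optional

Row = list[Optional[str]]  # one row; None stands for a blank cell


def split_records(rows: list[Row], key_col: int = 0) -> list[list[Row]]:
    """Group rows into records: a row with a non-blank key cell starts a
    new record, and the blank-key rows below it belong to that record."""
    records: list[list[Row]] = []
    for row in rows:
        if row[key_col] is not None or not records:
            records.append([])
        records[-1].append(row)
    return records


def fill_from_record(rows: list[Row], split_col: int, key_col: int = 0) -> list[Row]:
    """Assumed equivalent of row.record.cells[columnName].value[0] applied to
    every column except split_col: each cell becomes the first non-blank
    value of its column within the enclosing record."""
    out: list[Row] = []
    for record in split_records(rows, key_col):
        width = len(record[0])
        first = [next((r[c] for r in record if r[c] is not None), None)
                 for c in range(width)]
        for r in record:
            out.append([r[c] if c == split_col else first[c] for c in range(width)])
    return out


rows = [
    ["Alice", "a", "x"],   # record 1: the cell "a;b" was split into two rows
    [None, "b", None],
    ["Bob", "c", "y"],     # record 2
]
print(fill_from_record(rows, split_col=1))
# [['Alice', 'a', 'x'], ['Alice', 'b', 'x'], ['Bob', 'c', 'y']]
```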

----

In the same vein, I would like to be able to collapse rows together when I use clustering to update values. My specific case is one in which I want to be able to add values together when I discover they belong in the same row.

For example, the original data might look like this:

ABCDEFG    35    100
ABcdeFG    20     50

After clustering and finding the match, I would love to end up with 1 row that looks like this:

ABCDEFG    55    150

Plausible? Does anyone else think this would be useful? I am not sure what should happen to non-numeric values; perhaps all the values could be put in a single cell with a given value separator.

Jeanne
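Jeanne's request can be sketched in a few lines of Python. This is a minimal illustration, not an OpenRefine feature: it assumes clustering has already rewritten "ABcdeFG" to "ABCDEFG", sums numeric cells, and joins non-numeric cells with a separator as she suggests. The function names and the `|` separator are hypothetical choices.

```python
from numbers import Number


def merge_cells(a, b, sep="|"):
    """Add numeric cells; join anything else with a separator
    (Jeanne's suggestion for non-numeric values)."""
    if isinstance(a, Number) and isinstance(b, Number):
        return a + b
    return f"{a}{sep}{b}"


def collapse_rows(rows, key_col=0, sep="|"):
    """Collapse rows that share the same key cell into one row.
    Assumes clustering has already unified variant spellings of the key."""
    merged = {}  # key -> accumulated row; dicts keep first-seen order
    for row in rows:
        key = row[key_col]
        if key not in merged:
            merged[key] = list(row)
            continue
        target = merged[key]
        for i, value in enumerate(row):
            if i != key_col:
                target[i] = merge_cells(target[i], value, sep)
    return list(merged.values())


# After clustering, "ABcdeFG" has been edited to "ABCDEFG".
print(collapse_rows([["ABCDEFG", 35, 100], ["ABCDEFG", 20, 50]]))
# [['ABCDEFG', 55, 150]]
```

Summing every numeric column is only right when each column is a count; the replies below point out that other column semantics need other operations.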


----

Definitely -- this kind of thing is something I'd probably drop out to
a programming language to do most of the time. What operation to
perform on the numbers in the collapsed targets would vary by the
semantics of each column, though; for columns that are "counts of
ABCDEFG", summing is of course the interesting operation, whereas for
columns that are "average price of ABCDEFGs", it would be an
average(count(each type of reconciled ABCDEFG) * value), however that
would or would not be expressible in something like GEL. Yet other
column semantics might apply too, of course.

I would love this kind of collapsing functionality, especially if
columns could be marked up to carry along some of their semantics,
guiding per-column default choices of operation like this, letting
Gridworks get smarter (without overruling power users with some other
agenda).

--
 / Johan Sundström, http://ecmanaut.blogspot.com/
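Johan's point, that the collapse operation depends on each column's meaning, can be sketched as a per-column rule table. One reading of his formula for an "average price" column is a mean weighted by each row's count; that interpretation, the column names, and the helpers `total`, `weighted_by` and `collapse` are assumptions for illustration only.

```python
from typing import Callable

Row = dict[str, float]
Aggregator = Callable[[list[Row], str], float]


def total(rows: list[Row], col: str) -> float:
    """For 'count of ABCDEFG' columns: counts simply add up."""
    return sum(r[col] for r in rows)


def weighted_by(count_col: str) -> Aggregator:
    """For 'average price' columns: a mean weighted by each row's count
    (an assumed reading of Johan's formula)."""
    def agg(rows: list[Row], col: str) -> float:
        weight = sum(r[count_col] for r in rows)
        return sum(r[count_col] * r[col] for r in rows) / weight
    return agg


def collapse(rows: list[Row], rules: dict[str, Aggregator]) -> Row:
    """Collapse one cluster of rows into a single row, picking the
    operation per column according to that column's semantics."""
    return {col: agg(rows, col) for col, agg in rules.items()}


cluster = [{"count": 30, "avg_price": 2.0}, {"count": 10, "avg_price": 6.0}]
merged = collapse(cluster, {"count": total, "avg_price": weighted_by("count")})
print(merged)  # {'count': 40, 'avg_price': 3.0}
```

A plain, unweighted average of 2.0 and 6.0 would give 4.0, which shows why a single default operation for every column would be wrong.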


----

Yeah, very interesting idea.

I would note that there are two different tasks here:

  1) find the sets of rows to conflate

  2) how to conflate them

Right now I think Gridworks is pretty good at #1 but totally lacking on
#2; designing the UI will be pretty interesting.

--
Stefano Mazzocchi                              Application Catalyst
Metaweb Technologies, Inc.                      ste...@metaweb.com
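Stefano's two tasks can be kept separate in code: one function finds the sets of rows to conflate, and another applies a pluggable merge policy to each set. The sketch below uses a simplified key-collision key in the spirit of the "fingerprint" method offered by OpenRefine's clustering (it skips accent folding); `find_sets`, `conflate` and `sum_numbers` are hypothetical names, and summing is only one possible policy.

```python
import re
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Simplified fingerprint key: trim, lowercase, drop punctuation,
    then sort the unique whitespace-separated tokens."""
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    return " ".join(sorted(set(cleaned.split())))


def find_sets(rows, key_col=0):
    """Task 1: indices of rows whose key cells share a fingerprint."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[fingerprint(row[key_col])].append(i)
    return [idx for idx in groups.values() if len(idx) > 1]


def conflate(rows, index_sets, merge):
    """Task 2: replace each set with merge(set_rows); other rows pass
    through unchanged and come first in the output."""
    grouped = {i for s in index_sets for i in s}
    out = [row for i, row in enumerate(rows) if i not in grouped]
    out += [merge([rows[i] for i in s]) for s in index_sets]
    return out


def sum_numbers(group):
    """One possible merge policy: keep the first key, sum the other columns."""
    width = len(group[0])
    return [group[0][0]] + [sum(row[c] for row in group) for c in range(1, width)]


rows = [["ABCDEFG", 35, 100], ["ABcdeFG", 20, 50], ["XYZ", 1, 2]]
print(conflate(rows, find_sets(rows), sum_numbers))
# [['XYZ', 1, 2], ['ABCDEFG', 55, 150]]
```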


----

I just tried a simple implementation of a "denormalize" command (which
is what I think Jeanne wanted), but my implementation didn't work as
expected. This command involves the record model--an under-designed
aspect of Gridworks currently. I'll need to think much more about this
issue, and I'm afraid it'll have to be a post-1.0 feature.

David


END OF HISTORICAL THREAD

Thad Guidry

Jun 19, 2020, 1:25:24 PM
to openref...@googlegroups.com

Tom Morris

Jun 19, 2020, 1:32:45 PM
to openref...@googlegroups.com
> This command involves the record model--an under-designed aspect of Gridworks currently.

This remains true 10 years later, and it's another reason to drop it, since we don't know what Record Mode would look like done right.

It's effectively a multi-column Fill Down, so general multi-column/all-column operation support would be another way to tackle this.

Tom
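Tom describes Denormalize as effectively a multi-column Fill Down. A minimal Python sketch of that idea follows, assuming blank cells are `None`; `fill_down_all` is a hypothetical name, and this is not the code in `DenormalizeOperation.java`.

```python
from typing import Optional

Row = list[Optional[str]]  # one row; None stands for a blank cell


def fill_down_all(rows: list[Row], columns: Optional[list[int]] = None) -> list[Row]:
    """Multi-column Fill Down: in each chosen column, a blank cell takes
    the nearest non-blank value above it. Defaults to every column."""
    if not rows:
        return []
    cols = range(len(rows[0])) if columns is None else columns
    last: dict[int, Optional[str]] = {}  # column -> last non-blank value seen
    out: list[Row] = []
    for row in rows:
        new = list(row)
        for c in cols:
            if new[c] is None:
                new[c] = last.get(c)
            else:
                last[c] = new[c]
        out.append(new)
    return out


rows = [
    ["Alice", "a", "x"],
    [None, "b", None],
    ["Bob", "c", None],
]
print(fill_down_all(rows))
# [['Alice', 'a', 'x'], ['Alice', 'b', 'x'], ['Bob', 'c', 'x']]
```

Note Bob's third cell: a plain fill down crosses record boundaries and copies Alice's `x` into Bob's row, whereas a fill scoped to each record would leave it blank. Deciding where those boundaries lie is exactly the kind of question the record model has to answer.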


Antonin Delpeuch (lists)

Jun 19, 2020, 1:35:48 PM
to openref...@googlegroups.com
Ah, thank you for this wonderful piece of history!

So it is intended to be a sort of fill down on multiple columns at the
same time.

Given that it has never been exposed in the UI, and David says it does
not work, I think it is safe to delete it indeed.

Antonin

On 19/06/2020 19:25, Thad Guidry wrote:
> Found the thread link here:
>
> https://groups.google.com/d/topic/openrefine/4DRi-hcOZKw/discussion
>
> Thad
> https://www.linkedin.com/in/thadguidry/
