Analyzing Tweets | blank result | String[] value not storable


Rob H-Man

Nov 19, 2015, 7:30:21 PM
to OpenRefine

Dear Open/Google-Refine Community,

To start with: I am very new to data analysis, and my skills in any kind of coding or programming are (obviously) very limited.

My goal is to analyze Twitter data as shown and explained here: http://schoolofdata.org/harvesting-and-analyzing-tweets/

The issue is that when I run the code as shown in the guide and in the screenshot, I get an error called "String[] value not storable", even though the preview seemed to show the results successfully.

Searching Google for the issue, I found https://code.google.com/p/google-refine/issues/detail?id=434, but unfortunately that's not very helpful (or maybe I did something wrong?).

What can I do to fix the issue and analyze my data? Is the guide outdated or wrong?

With best regards and thanks,
Rob



value.split(/[^a-z0-9-_@#]/)

Thad Guidry

Nov 19, 2015, 11:10:20 PM
to openrefine
The output of the split() function is not a String but an Array. I know, it's confusing for beginners, but there is a difference. Our Reference section tells you what each function outputs, by the way.

You can convert an Array to a String pretty easily. Most folks like to keep some character between the elements to show where each one ends.

value.split(/[^a-z0-9-_@#]/).join(",")

By simply adding .join(","), you're telling OpenRefine to join the elements with a comma between each pair and output the result as one long string.
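GREL is not Java, but the array-versus-string distinction can be sketched with Java's String.split and String.join standing in for GREL's split() and join(). This is an analogy under that assumption, not OpenRefine's code; the class and method names are made up for illustration, and the input is assumed to be lowercased already, since the character class only keeps a-z.

```java
import java.util.Arrays;

// Analogy only: Java's String.split/String.join standing in for GREL's split()/join().
class SplitJoinDemo {
    // Same character class as the GREL expression in this thread; the '-' is escaped for clarity.
    static final String NON_TOKEN = "[^a-z0-9\\-_@#]";

    static String[] tokenize(String tweet) {
        // Like GREL split(): the result is an array (String[]), not a String.
        return tweet.split(NON_TOKEN);
    }

    static String tokensAsCell(String tweet) {
        // Like GREL join(","): collapse the array into one String that a cell can hold.
        return String.join(",", tokenize(tweet));
    }

    public static void main(String[] args) {
        String tweet = "rt @openrefine: #data cleaning";
        System.out.println(Arrays.toString(tokenize(tweet))); // [rt, @openrefine, , #data, cleaning]
        System.out.println(tokensAsCell(tweet));              // rt,@openrefine,,#data,cleaning
    }
}
```

The empty element comes from the two adjacent delimiters ": " in the sample tweet.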

Give it a try and experiment.  Make sure to look at our Documentation and Recipes.  Scroll down on this page: https://github.com/OpenRefine/OpenRefine

The preview showed that you will be getting an Array: the elements appear between [ ] brackets. We designed the preview to help you understand and see what split(), or any other function, is doing. Just keep in mind that what you see in the OpenRefine expression preview is not always a String. You have to make sure yourself that the result is converted into the datatype your cells and workflow need.

Thad Guidry

Nov 19, 2015, 11:18:46 PM
to openrefine
Oh, and for those wondering what is happening under the covers in our code in these cases, and who perhaps want to hack on it so it gives a smarter hint back to users like Rob: https://github.com/OpenRefine/OpenRefine/blob/a2aa8dffb4d21146156647f979e65b5ce376abd1/main/src/com/google/refine/expr/ExpressionUtils.java#L156
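As a hypothetical illustration of the "smarter hint" idea (not OpenRefine's actual code), here is a standalone check modelled on the scalar types that isStorable() accepts. EvalError is left out so the sketch compiles without OpenRefine on the classpath, and the hint wording is only a placeholder.

```java
import java.util.Calendar;
import java.util.Collection;
import java.util.Date;

// Hypothetical sketch, not OpenRefine code: a storability check that returns a
// user-facing hint instead of a bare "value not storable" error.
class StorableHint {
    static boolean isScalarStorable(Object v) {
        return v == null
            || v instanceof Number
            || v instanceof String
            || v instanceof Boolean
            || v instanceof Date
            || v instanceof Calendar;
    }

    // Returns null when the value can be stored as-is, otherwise a hint for the preview.
    static String hintFor(Object v) {
        if (isScalarStorable(v)) {
            return null;
        }
        if (v.getClass().isArray() || v instanceof Collection) {
            return "Error: array preview (not stored); reduce it to one value, e.g. with join()";
        }
        return "Error: " + v.getClass().getSimpleName() + " value not storable";
    }

    public static void main(String[] args) {
        System.out.println(hintFor("105907"));                        // null
        System.out.println(hintFor(new String[] { "5434", "ress" })); // the array hint
        System.out.println(hintFor(new Object()));                    // Error: Object value not storable
    }
}
```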

Rob H-Man

Nov 20, 2015, 5:34:21 AM
to OpenRefine
Thank you very much Thad; I will try it! Have a nice weekend!

Tom Morris

Nov 20, 2015, 7:29:27 AM
to openr...@googlegroups.com
On Thu, Nov 19, 2015 at 11:10 PM, Thad Guidry <thadg...@gmail.com> wrote:

> The preview showed that you will be getting an Array of elements with the [ ] brackets and elements in between those brackets. [...] Just keep in mind that what you see in the OpenRefine Expression preview is not necessarily always a String datatype.

We could probably improve the preview to make it clearer to the user what's going on.  It is a little deceptive that the preview shows no indication of a problem, even when the result can't be stored.

Tom 

Thad Guidry

Nov 20, 2015, 10:05:11 AM
to openrefine
Agreed, Tom.

I thought we had an issue for that somewhere...hmm.


Owen Stephens

Nov 20, 2015, 4:26:32 PM
to OpenRefine
Thad, you created an issue for this recently: https://github.com/OpenRefine/OpenRefine/issues/1088

Thad Guidry

Nov 20, 2015, 5:41:19 PM
to openrefine
Haha... I forgot it had my name attached to it (figures). Thanks, Owen.

Thad Guidry

Nov 21, 2015, 6:58:17 PM
to openrefine
How does this look to everyone?

I just gave it an extra hint of an error (light grey, but still interactive, and the value itself is still usable in facets: the best of both worlds).
The code is not merged yet.

Row  Value          Expression preview
1.   105907         Error: array preview (not stored): [ "105907" ]
2.   5434address    Error: array preview (not stored): [ "5434", "ress" ]
3.   488352address  Error: array preview (not stored): [ "488352", "ress" ]
4.   239355address  Error: array preview (not stored): [ "239355", "ress" ]
5.   96530address   Error: array preview (not stored): [ "96530", "ress" ]
6.   167408address  Error: array preview (not stored): [ "167408", "ress" ]

If we like this, we need to agree on the text displayed. (The "Error:" prefix has to stay; that's how we hint to users that they probably need to do more to get a stored output.)

Thoughts?

Thad Guidry

Nov 21, 2015, 7:01:21 PM
to openrefine
Or this?

Row  Value          Expression preview
1.   105907         Error: add join() to store array: [ "105907" ]
2.   5434address    Error: add join() to store array: [ "5434", "ress" ]
3.   488352address  Error: add join() to store array: [ "488352", "ress" ]
4.   239355address  Error: add join() to store array: [ "239355", "ress" ]
5.   96530address   Error: add join() to store array: [ "96530", "ress" ]
6.   167408address  Error: add join() to store array: [ "167408", "ress" ]
7.   141235address  Error: add join() to store array: [ "141235", "ress" ]

Owen Stephens

Nov 23, 2015, 5:19:50 AM
to OpenRefine
I'm torn. I use the ability to preview the array a lot, and from my perspective anything that gets in the way of that is bad, so I'm not very keen on adding the error to the preview. On the other hand, I recognise that we need to flag to users somehow that they cannot store the array.

Also, note that in a Custom Text Facet an Array is OK as an output. This may be a complication, because the error that applies to a cell transform does not apply to a Custom Text Facet.
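One way to see why an array is fine in a facet: a facet can count each element as a separate choice. A minimal sketch of that counting, assuming this per-element behaviour; the class is hypothetical and is not OpenRefine's facet code.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: counting facet choices over per-row expression results.
class ArrayFacetSketch {
    // Scalars count once; for array results, each element counts as its own choice.
    static Map<String, Integer> countChoices(Object[] rowResults) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Object result : rowResults) {
            if (result instanceof Object[]) {
                for (Object element : (Object[]) result) {
                    counts.merge(String.valueOf(element), 1, Integer::sum);
                }
            } else {
                counts.merge(String.valueOf(result), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Object[] rows = {
            new String[] { "#data", "@openrefine" },
            new String[] { "#data", "cleaning" },
            "#data"
        };
        System.out.println(countChoices(rows)); // {#data=3, @openrefine=1, cleaning=1}
    }
}
```

A row whose result contains both #data and @openrefine is counted under both choices, which is what you want when faceting on hashtags or mentions. A cell, by contrast, holds a single value, which is why the transform case needs a decision about arrays.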

If we are adding an error, I prefer the former over the latter, because there are multiple ways of going from an array to a storable value; e.g. length() would also be valid.

However, to throw another idea into the mix: is there any reason why the array should not be stored? IMO this would be useful, as it would simplify transformations: you could create an array in one transform, store it in a cell, and then carry out further manipulations on it. I realise, of course, that this is likely to be considerably more complicated than just displaying the error!

Owen

Thad Guidry

Nov 23, 2015, 12:32:40 PM
to openrefine
Owen,

OK, it looks like I have your other idea implemented: an array, collection, or list is now being stored...

And then, for a GREL function to convert it back, you would want smartSplit() to be just a teeny weeny bit smarter... right?
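To see why converting back needs something smarter, here is a hypothetical naive reader (not a GREL function) for the "[a, b, c]" text that Java's Arrays.deepToString produces. It round-trips simple values, but since deepToString adds no quoting, an element that itself contains ", " cannot be told apart from two elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: read back a value written by Arrays.deepToString / List.toString.
class BracketListParser {
    static List<String> parse(String stored) {
        if (stored.length() < 2 || !stored.startsWith("[") || !stored.endsWith("]")) {
            throw new IllegalArgumentException("not a bracketed list: " + stored);
        }
        String inner = stored.substring(1, stored.length() - 1);
        if (inner.isEmpty()) {
            return new ArrayList<>();
        }
        // deepToString separates elements with ", " and adds no quoting, so this is the only cue.
        return new ArrayList<>(Arrays.asList(inner.split(", ", -1)));
    }

    public static void main(String[] args) {
        String simple = Arrays.deepToString(new String[] { "5434", "ress" });
        System.out.println(parse(simple));        // [5434, ress]  (round trip OK)

        String tricky = Arrays.deepToString(new String[] { "a, b", "c" });
        System.out.println(parse(tricky).size()); // 3, not 2: the comma inside "a, b" is ambiguous
    }
}
```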

Thad

Thad Guidry

Nov 23, 2015, 12:57:50 PM
to openrefine
Actually Owen,

parseJson() seems to work fine with it.

Thoughts? Anything I'm missing?

Owen Stephens

Nov 23, 2015, 1:03:40 PM
to OpenRefine
Thanks Thad - that looks cool.
I was originally thinking the array could be stored as an array data type, but defaulting to storing it as a JSON array when the result is an array seems like a pretty neat approach to me.

Do you need to check that this doesn't cause any problems with custom text facets based on arrays? Or is that code separate?

Owen

Thad Guidry

Nov 23, 2015, 1:28:15 PM
to openrefine
Owen,

Custom Text Faceting works as it always has (though you might want to bang-test it to make sure I did not assume too much).
This facet looks correct, just as it did before my changes.

For this, I only made changes to ExpressionUtils.java:

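    // Imports this excerpt relies on: java.io.Serializable, java.util.Arrays, java.util.Calendar,
    // java.util.Collection, java.util.Date, org.json.JSONArray, org.json.JSONObject
    // (EvalError lives in the same package, com.google.refine.expr)

    static public boolean isStorable(Object v) {
        return v == null ||
            v instanceof Number ||
            v instanceof String ||
            v instanceof Boolean ||
            v instanceof Date ||
            v instanceof Calendar ||
            v instanceof EvalError ||
            v.getClass().isArray() ||   // NEW: any Java array (String[], Object[], int[], ...)
            v instanceof Collection;    // NEW: also covers List
    }

    static public Serializable wrapStorable(Object v) {
        if (v instanceof JSONArray) {
            return ((JSONArray) v).toString();
        } else if (v instanceof JSONObject) {
            return ((JSONObject) v).toString();

// NEW
        } else if (v instanceof Object[]) {
            // e.g. the String[] from split(); deepToString also prints nested arrays
            // http://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html#deepToString(java.lang.Object[])
            return Arrays.deepToString((Object[]) v);
        } else if (v != null && v.getClass().isArray()) {
            // primitive arrays (int[], double[], ...) can't be cast to Object[];
            // deepToString prints a primitive-array element, so wrap it and drop the outer brackets
            String s = Arrays.deepToString(new Object[] { v });
            return s.substring(1, s.length() - 1);
        } else if (v instanceof Collection<?>) {
            // List, Set, ...: standard collections print as "[a, b, c]"
            // (toArray().toString() would print an identity string like "[Ljava.lang.Object;@1a2b3c")
            return v.toString();
// NEW

        } else {
            return isStorable(v) ?
                (Serializable) v :
                new EvalError(v.getClass().getSimpleName() + " value not storable");
        }
    }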
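The array and collection handling in wrapStorable() can be tried outside OpenRefine. This standalone sketch (hypothetical class name, no org.json or EvalError) shows what each kind of value turns into as text. It also shows why calling toString() on the result of Collection.toArray() is not useful: Java arrays inherit Object.toString(), which prints a type tag and hash code rather than the elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;

// Standalone sketch: the text an array or collection would be stored as.
class WrapDemo {
    static String toStoredText(Object v) {
        if (v instanceof Object[]) {
            // String[], Object[], nested arrays
            return Arrays.deepToString((Object[]) v);
        } else if (v != null && v.getClass().isArray()) {
            // primitive arrays: wrap in an Object[] so deepToString prints them, then drop the outer brackets
            String s = Arrays.deepToString(new Object[] { v });
            return s.substring(1, s.length() - 1);
        } else if (v instanceof Collection<?>) {
            return v.toString();
        }
        throw new IllegalArgumentException("not an array or collection: " + v);
    }

    public static void main(String[] args) {
        System.out.println(toStoredText(new String[] { "5434", "ress" })); // [5434, ress]
        System.out.println(toStoredText(new int[] { 1, 2, 3 }));           // [1, 2, 3]
        System.out.println(toStoredText(Arrays.asList("a", "b")));         // [a, b]
        // Prints something like [Ljava.lang.Object;@<hash>, not the elements.
        System.out.println(new ArrayList<>(Arrays.asList("a", "b")).toArray().toString());
    }
}
```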

Thad Guidry

Nov 23, 2015, 3:54:10 PM
to openrefine
I'm also happy with the way jsonize() wraps each Array or List element in double quotes:

["1", "2", "3"]

Otherwise, without appending jsonize(), you just get the plain array output from the new Arrays.deepToString() call:

[1, 2, 3]



I think this is good enough for all the possible use cases.
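The difference between the two outputs matters once elements contain commas. Below is a hypothetical jsonize-like helper; it is not OpenRefine's jsonize(), and it only escapes backslashes and double quotes, whereas a full JSON encoder would also escape control characters such as newlines. It shows the quoted form keeping element boundaries that deepToString loses.

```java
import java.util.Arrays;

// Hypothetical jsonize-like helper: quote each element so element boundaries survive.
class JsonArraySketch {
    static String toJsonArray(String[] items) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < items.length; i++) {
            if (i > 0) sb.append(", ");
            // Escape backslashes first, then double quotes (control characters are not handled here).
            String escaped = items[i].replace("\\", "\\\\").replace("\"", "\\\"");
            sb.append('"').append(escaped).append('"');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        String[] items = { "1", "2", "3" };
        System.out.println(toJsonArray(items));          // ["1", "2", "3"]
        System.out.println(Arrays.deepToString(items));  // [1, 2, 3]

        String[] tricky = { "a, b", "c" };
        System.out.println(toJsonArray(tricky));         // ["a, b", "c"]  (still two elements)
        System.out.println(Arrays.deepToString(tricky)); // [a, b, c]      (looks like three)
    }
}
```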

Owen Stephens

Nov 24, 2015, 5:43:02 AM
to OpenRefine
Agreed - looks good to me

Owen Stephens

Nov 24, 2015, 5:43:24 AM
to OpenRefine
Text Faceting looks good as well