I'm analyzing tweets. As you know, some tweets contain hashtags and links, and extracting them with Python is very easy.
The problem is that I don't want to switch from one tool to another to analyze the data. Is it possible to do these
steps with Open Refine?
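import re
# Collect every hashtag (without the leading '#'), ignoring case, then join them with '::'
thashtags = re.findall(r"#([a-z0-9]+)", result['text'], re.I)
data['hashtags'] = '::'.join(thashtags)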
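For reference, a self-contained version of the snippet above might look like the sketch below. The sample tweet, the `extract_entities` helper name, and the link pattern are assumptions made for illustration; only the hashtag pattern and the '::' separator come from the snippet itself.

```python
import re

# Hashtag pattern taken from the snippet above; the link pattern is an
# assumption (a simple "http(s)://... up to the next whitespace" match).
HASHTAG_RE = re.compile(r"#([a-z0-9]+)", re.I)
LINK_RE = re.compile(r"https?://\S+")


def extract_entities(text):
    """Return (hashtags, links) found in a tweet's text (hypothetical helper)."""
    hashtags = HASHTAG_RE.findall(text)
    links = LINK_RE.findall(text)
    return hashtags, links


# Hypothetical sample tweet, for illustration only.
sample = "Cleaning data with #OpenRefine and #python3 http://example.org/post"
hashtags, links = extract_entities(sample)
joined = "::".join(hashtags)  # same '::' separator as in the snippet
```

The link pattern is deliberately crude and would probably need tightening for real tweets, for example to drop trailing punctuation.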
The [0] at the end of the expression indicates which hashtag you want to extract:
[0] for the first element
[1] for the second
And so on ...
To find out how many hashtags your string contains, you can use a countif facet (see http://googlerefine.blogspot.ca/2011/09/countif-in-google-refine-with.html)
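In Python terms, the indexing and counting described above behave roughly as in the sketch below. It is only an analogy: the sample text and variable names are made up, and it does not reproduce the Refine expression itself.

```python
import re

# Hypothetical sample text, for illustration only.
text = "Reading about #data and #cleaning with #refine"

# All hashtags in order of appearance, without the leading '#'.
hashtags = re.findall(r"#([a-z0-9]+)", text, re.I)

first = hashtags[0]    # [0] selects the first hashtag
second = hashtags[1]   # [1] selects the second
count = len(hashtags)  # how many hashtags the string contains
```

Knowing the count first is useful because, in Python at least, asking for an index beyond the last hashtag raises an IndexError.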
Martin
--
OK. I have no idea how to do that in a single expression.