I've spent a little time looking at some example Pipes as well as
going through a tutorial, and my question is: which transformations
are most commonly used?
From the examples I've seen, the ones that look like they're used
most often are:
sort by (some attribute)
select by (some attribute)
truncate by (number of entries)
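For concreteness, here is a rough sketch of those three transformations in Python. It assumes each feed item is a plain dict of key:value pairs; the function names (sort_by, select_by, truncate) and the sample data are made up for illustration and are not how Pipes itself implements or names anything.

```python
from operator import itemgetter

# Hypothetical records: each item is a dict of key:value pairs.
items = [
    {"title": "c", "score": 3},
    {"title": "a", "score": 1},
    {"title": "b", "score": 2},
]

def sort_by(records, key, reverse=False):
    """Return the records ordered by the value stored under `key`."""
    return sorted(records, key=itemgetter(key), reverse=reverse)

def select_by(records, key, predicate):
    """Keep only records that have `key` and whose value passes `predicate`."""
    return [r for r in records if key in r and predicate(r[key])]

def truncate(records, n):
    """Keep at most the first `n` records."""
    return records[:n]

# Chain them: select, then sort, then truncate.
selected = select_by(items, "score", lambda s: s >= 2)
result = truncate(sort_by(selected, "score"), 1)
```

Each step takes a list of records and returns a new list, so the steps can be chained in any order.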
Any others that people see in common use?
- Serge
Can you give examples of "ifs"? Is an "if" like a filter?
My implementation questions are mainly around the filters and some of the sorts.
- Serge
I've since found the documentation:
http://pipes.yahoo.com/pipes/docs?doc=modules
This maps basically onto what I thought...
Pipes seems to (and people can correct me if I'm wrong) map generally
onto a set of records, each a set of key:value pairs, with the
possibility of looping deeper through nested key:value pairs if necessary.
> As for implementation... Here's my take on it.
>
> Filters end up being very simple. For each record, if they meet some
> condition, pass it on. If not, throw it away.
Yes, the trick is that particular data types need some special attention.
For example, let's say I have two data sources: a CSV file with dates
in ISO 8601 format and an RSS 2.0 feed with dates in RFC 822 format.
You want to be able to say "newer" on both date formats, so you can't
simply treat them as text.
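A minimal sketch of what I mean, using only the Python standard library: both formats are parsed into the same datetime type, and the "newer" filter then compares those values instead of strings. The helper names, the sample records, and the choice to treat timestamps without a zone as UTC are all my own assumptions for the example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def parse_iso8601(text):
    """Parse an ISO 8601 timestamp; assume UTC if no offset is given."""
    dt = datetime.fromisoformat(text)
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)

def parse_rfc822(text):
    """Parse an RFC 822 style date (as used in RSS 2.0 pubDate)."""
    dt = parsedate_to_datetime(text)
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)

def newer_than(records, key, cutoff):
    """Filter: pass on only the records whose date is after `cutoff`."""
    return [r for r in records if r[key] > cutoff]

# Hypothetical items from the two sources, normalized on the way in.
csv_rows = [{"title": "from csv",
             "date": parse_iso8601("2007-03-01T12:00:00+00:00")}]
rss_items = [{"title": "from rss",
              "date": parse_rfc822("Tue, 27 Feb 2007 08:00:00 GMT")}]

cutoff = datetime(2007, 2, 28, tzinfo=timezone.utc)
recent = newer_than(csv_rows + rss_items, "date", cutoff)
```

Once both sources carry real datetime values, the filter itself stays as simple as the "pass it on or throw it away" description.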
Similarly you may want to work with geo-encoded data and say "Inside
this box" or "Outside this polygon". This means you need to have an
internal representation of this new data type.
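As a rough illustration of the geo case, here is what "inside this box" and "inside this polygon" tests might look like if a point is represented internally as a (longitude, latitude) tuple. That representation and the function names are assumptions for the sketch; it treats coordinates as a flat plane, so it ignores things like boxes that cross the 180-degree meridian.

```python
def inside_box(point, box):
    """True if point (lon, lat) lies within box (min_lon, min_lat, max_lon, max_lat)."""
    lon, lat = point
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

def inside_polygon(point, polygon):
    """Ray-casting test: count how many polygon edges a ray going
    in the +x direction from the point crosses; odd means inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical square used as both a box and a polygon.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
```

An "outside this polygon" filter is then just the negation of inside_polygon.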
Or may not... I may be over-thinking this.
Having not worked with Pipes before, I'm getting my head around the
problem space.
- Serge
> A better solution, in my opinion, is to provide transformers to rename
> keys & reformat data, and make it the responsibility of the person
> designing the pipe flow. If they wanted to compare dates from two
> different data sources, they would be responsible for inserting a
> "dateParser" transformer component to normalize the data before
> feeding it to the comparator transformer.
I've thought of this problem a little too. Here's my take:
I think ultimately we want to be always oriented toward the output.
Everything up to the output is "necessary work" to arrive at the
output.
That said, I think the easiest way to achieve this would be a
transformer that does associations between key names.
Some data begs to be normalized; time, I think, is one of them. But
other data is going to be so difficult to manage that it seems
overall easier to solve with a transformation of "associations",
where you'd provide a mapping between the keys in one pipe input and
those of another. You'd just have:
'pubDate' -> 'publication_date'
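A minimal sketch of such an "associations" transformer, assuming records are dicts; rename_keys and the sample item are hypothetical names for illustration only:

```python
def rename_keys(records, mapping):
    """Return copies of the records with keys renamed per `mapping`.
    Keys that are not in the mapping are passed through unchanged."""
    return [{mapping.get(k, k): v for k, v in r.items()} for r in records]

# Hypothetical RSS item whose date key differs from the other source.
rss_items = [{"title": "Hello", "pubDate": "Tue, 27 Feb 2007 08:00:00 GMT"}]

normalized = rename_keys(rss_items, {"pubDate": "publication_date"})
```

Note this only lines up the key names; the values are left untouched, so date normalization would still be a separate step.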
How does Pipes handle this BTW?
I don't know how often this problem will actually come up in real life
either, so I'm less inclined to try to solve it in an elegant way (vs.
data types, which I feel ought to be supported).
- Serge