On May 26, 4:22 pm, Oscar Boykin <os...@twitter.com> wrote:
> I'm not aware of an example of crawling with Scalding. We (Twitter)
> use it for log processing. That said, it shouldn't be too hard to
> represent the crawl function they have as a map operation:
>
> map('url -> 'crawlData) { url : String => getPageData(url) }
>
> then process the return of getPageData.
>
> Scalding has built-in support for Kryo serialization, so you don't
> need to worry about mapping the return of getPageData onto a Tuple if
> you don't want to.
>
> Hope this helps somewhat.
>
> On Fri, May 25, 2012 at 6:36 PM, Jyotirmoy Sundi <sundi...@gmail.com> wrote:
> > Hi,
> > I am looking to crawl a site.
> > The contents of a page can include:
> > 1. information to extract, and
> > 2. more URLs to crawl (pagination).
> > I tried to follow the Bixo tutorial,
> > but am wondering if there is anything similar in the Scalding (by
> > Twitter) framework.
>
> > Regards
> > Sundi
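
For what it's worth, here is a minimal sketch of the idea in the reply above: treat the crawl as repeated rounds of "map getPageData over the current URLs, then collect the pagination links". It uses plain Scala collections rather than Scalding so it runs without a cluster. PageData, fakeSite, crawl, maxRounds and the example.com URLs are all hypothetical names made up for illustration, and the body of getPageData is only a stand-in for a real fetch-and-parse step.

```scala
// Hypothetical result type: the extracted information plus further URLs to visit.
case class PageData(info: String, nextUrls: List[String])

object CrawlSketch {
  // Stand-in for a real site: a tiny in-memory map of URL -> page contents.
  val fakeSite: Map[String, PageData] = Map(
    "http://example.com/list?page=1" ->
      PageData("items 1-10", List("http://example.com/list?page=2")),
    "http://example.com/list?page=2" ->
      PageData("items 11-20", List("http://example.com/list?page=3")),
    "http://example.com/list?page=3" ->
      PageData("items 21-30", Nil)
  )

  // Stand-in for the fetch-and-parse function; unknown URLs yield an empty page.
  def getPageData(url: String): PageData =
    fakeSite.getOrElse(url, PageData("", Nil))

  // One round = map getPageData over the frontier, then gather unseen links.
  // maxRounds bounds the crawl so a cyclic site cannot loop forever.
  @annotation.tailrec
  def crawl(
      frontier: Set[String],
      seen: Set[String] = Set.empty,
      acc: Map[String, String] = Map.empty,
      maxRounds: Int = 10
  ): Map[String, String] =
    if (frontier.isEmpty || maxRounds == 0) acc
    else {
      val fetched = frontier.toList.map(url => url -> getPageData(url))
      val newAcc  = acc ++ fetched.map { case (url, data) => url -> data.info }
      val newSeen = seen ++ frontier
      val next    = fetched.flatMap(_._2.nextUrls).toSet -- newSeen
      crawl(next, newSeen, newAcc, maxRounds - 1)
    }
}
```

Calling CrawlSketch.crawl(Set("http://example.com/list?page=1")) walks the three fake pages and returns a map from URL to extracted info. In a Scalding job, each round would presumably correspond to the map('url -> 'crawlData) step quoted above followed by a flatMap over the next URLs, but I have not tested that arrangement.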