Re: [cc] Digest for common-crawl@googlegroups.com - 2 updates in 1 topic


R Lee Prevost

Jun 21, 2025, 10:47:02 AM
to common...@googlegroups.com
My two cents: you have to do this via S3. It sure seems like you could go a long way on the AWS free tier. I use Databricks, but I've explored doing this kind of work with AWS Glue or EMR, and I'm convinced you can go a long way with just Glue. Also, have you considered AWS search? Writing a Lambda function to query file names and offsets and write them back to something seems fairly affordable.
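
For what it's worth, here is a minimal sketch of what "file names and offsets" looks like in practice, whether it runs in a Lambda or over plain HTTPS range requests. It assumes you already have a file name, byte offset, and record length from the index, and that the public endpoint is `https://data.commoncrawl.org/`. The helper names `range_header` and `fetch_record` are made up for illustration, not part of any library.

```python
import gzip
import urllib.request

# Assumption: Common Crawl's public HTTPS endpoint (S3 is the alternative).
BASE_URL = "https://data.commoncrawl.org/"


def range_header(offset: int, length: int) -> str:
    """Build an HTTP Range header value; the end byte is inclusive."""
    return f"bytes={offset}-{offset + length - 1}"


def fetch_record(filename: str, offset: int, length: int) -> bytes:
    """Fetch one gzip-compressed WARC record and return it decompressed.

    filename, offset, and length are expected to come from an index lookup.
    """
    req = urllib.request.Request(
        BASE_URL + filename,
        headers={"Range": range_header(offset, length)},
    )
    with urllib.request.urlopen(req) as resp:
        # Each record is its own gzip member, so the slice decompresses alone.
        return gzip.decompress(resp.read())
```

Only the requested bytes are transferred, so the cost is one small request per record rather than a full file download.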


Sent from my iPhone

> On Jun 21, 2025, at 8:47 AM, common...@googlegroups.com wrote:
>
> But the thing is, due to a very limited budget, I am unable to use AWS for
> this. As an alternative, I am considering using HTTPS range requests