Replacing local blobstore with cloud storage, what is the replacement for BlobstoreLineInputReader?

Emlyn

Jun 18, 2015, 3:44:26 AM
to google-a...@googlegroups.com
This is a Python App Engine question, using mapreduce 1.9.21.

I have code writing lines to a blob in the local blobstore, then processing that using mapreduce BlobstoreLineInputReader. 

Given that the Files API is going away, I thought I'd retarget all my processing to Cloud Storage.

I would expect to find a class called GoogleCloudStorageLineInputReader, but there isn't anything like that. 

Is there some way I can use GoogleCloudStorageInputReader to read lines?

Another possibility is GoogleCloudStorageRecordInputReader, but that requires my input file to be in LevelDB format. The only way I know to create that is with a GoogleCloudStorageConsistentRecordOutputWriter, which I don't know how to use outside a mapreduce context. How might I do that?

Or am I doing this all wrong? Is there some other possibility I've missed?
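
For context on what a line-oriented input reader has to do: each shard is given a byte range, and every line must be processed by exactly one shard even when a shard boundary falls mid-line. The following is a minimal sketch in plain Python of one common way to do that, not the mapreduce library's own code. The function name `read_lines_in_range` and the shard boundaries are hypothetical, and an in-memory stream stands in for a Cloud Storage file.

```python
import io


def read_lines_in_range(stream, start, end):
    """Yield (offset, line) for every line that begins in [start, end)."""
    if start > 0:
        # Step back one byte and discard through the next newline, so a
        # line that straddles `start` is left to the previous shard.
        stream.seek(start - 1)
        stream.readline()
    else:
        stream.seek(0)
    while stream.tell() < end:
        offset = stream.tell()
        line = stream.readline()
        if not line:
            break  # end of file
        yield offset, line.rstrip(b"\n")


# Illustrative data and shard boundaries (hypothetical values).
data = b"alpha\nbeta\ngamma\ndelta\n"
shards = [(0, 8), (8, 16), (16, len(data))]

all_lines = []
for shard_start, shard_end in shards:
    stream = io.BytesIO(data)
    for _, line in read_lines_in_range(stream, shard_start, shard_end):
        all_lines.append(line)
```

A line that is still being read when the shard's `end` is passed is finished by that shard, and the next shard skips it, so no line is lost or duplicated.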



Ryan (Cloud Platform Support)

Jun 18, 2015, 9:51:02 AM
to google-a...@googlegroups.com, emlyn...@gmail.com
Salutations Emlyn,

There is work on line input readers here. It is still beta and has not been merged into the master branch yet, so be careful using it.

Emlyn

Jun 18, 2015, 9:27:31 PM
to google-a...@googlegroups.com
Thanks very much Ryan,

I'll give it a shot. If I hit any walls or notice any issues, I'll let you know.



--
Emlyn

http://point7.wordpress.com - My blog
https://plus.google.com/u/0/100281903174934656260 - Google+

Emlyn

Jun 19, 2015, 12:23:16 AM
to google-a...@googlegroups.com
I'm trying this library here as pointed to by Ryan:
https://github.com/rbruyere/appengine-mapreduce

But I'm getting an odd mismatch between it and the Cloud Storage
client library; it's looking for "file_name" on GCSFileStat objects,
but in the library these only have "filename".

I'm using version 1.9.21.0 of the cloud storage client library. Is
there some other version I should be using?

Emlyn

Jun 19, 2015, 12:26:45 AM
to google-a...@googlegroups.com
Oh, update:

In the cloudstorage library, in common.py, I added this:

    @property
    def file_name(self):
        return self.filename

to the class GCSFileStat, and now GoogleCloudStorageLineInputReader
appears to be working. Woohoo!
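
An alternative to editing the library's common.py directly is to attach the same alias at application startup, so the change survives a library upgrade. The sketch below is a hedged illustration only: the `GCSFileStat` class in it is a stand-in, and `add_file_name_alias` is a hypothetical helper. In a real app you would presumably pass the actual class from the cloudstorage library's common.py instead.

```python
class GCSFileStat(object):
    """Stand-in for the library's GCSFileStat (assumption: only
    the `filename` attribute matters for this illustration)."""

    def __init__(self, filename):
        self.filename = filename


def add_file_name_alias(stat_cls):
    """Attach a read-only `file_name` property mirroring `filename`.

    Hypothetical helper; does nothing if the class already has `file_name`.
    """
    if not hasattr(stat_cls, "file_name"):
        stat_cls.file_name = property(lambda self: self.filename)
    return stat_cls


# Apply the alias once, before any reader touches the stat objects.
add_file_name_alias(GCSFileStat)

stat = GCSFileStat("/my-bucket/input.txt")  # hypothetical object path
print(stat.file_name)
```

Because the alias is a property on the class, it also applies to stat objects created before or after the patch, and it stays in sync with `filename`.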

Ryan (Cloud Platform Support)

Jun 19, 2015, 10:24:35 AM
to google-a...@googlegroups.com, emlyn...@gmail.com
Glad that helped. As stated, it's still being developed, so bugs are to be expected.