Picasa post/get synchronization/merge

Ferran

Jun 28, 2010, 8:40:36 PM
to GoogleCL Discuss
Hi,

I'm working on synchronizing Picasa pictures because I have a lot of
photos, nearly 10GB. There is a request here:
http://code.google.com/p/googlecl/issues/detail?id=170

Currently I've implemented it for "picasa post" and "picasa get"
tasks. I need some feedback to know whether I'm on the right track. So,
please, all comments are welcome! In my local copy I can run the
following (and it works):

> picasa post --sync --title "test" c:\tmp\test\*.jpg
> picasa get --sync --title "test" c:\tmp

That implementation just skips the existing files on the local (get)
and remote (post) targets. It seems more like a "merge" than a sync.
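
The skip-existing behaviour can be sketched as a simple filter on file
names. This is only an illustration, not the code from the patch:
files_to_transfer is a hypothetical helper, and comparing by base name
is an assumption.

```python
import os


def files_to_transfer(source_paths, target_names):
    """Return the source paths whose base name is not already on the target.

    Hypothetical helper: for "post" the source is the local file list and
    the target is the remote album; for "get" it is the other way round.
    """
    existing = set(target_names)
    return [p for p in source_paths if os.path.basename(p) not in existing]


local = ["/tmp/test/a.jpg", "/tmp/test/b.jpg", "/tmp/test/c.jpg"]
remote = ["a.jpg", "c.jpg"]
print(files_to_transfer(local, remote))  # only b.jpg is missing remotely
```

Nothing is ever removed or overwritten here, which is why it behaves
like a merge rather than a full sync.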

What do you think about sync/merge? What should it provide? Should it
also delete files? What should the comparison be based on? Time? Some
hash?

Thanks!

Ferran

Dalton Barreto

Jun 28, 2010, 9:09:26 PM
to googlecl...@googlegroups.com
2010/6/28 Ferran <fer...@gmail.com>:

> Hi,
>
> I'm working on synchronizing Picasa pictures because I have a lot of
> photos, nearly 10GB. There is a request here:
> http://code.google.com/p/googlecl/issues/detail?id=170
>
> Currently I've implemented it for "picasa post" and "picasa get"
> tasks. I need some feedback to know whether I'm on the right track. So,
> please, all comments are welcome! In my local copy I can run the
> following (and it works):
>
>> picasa post --sync --title "test" c:\tmp\test\*.jpg
>> picasa get --sync --title "test" c:\tmp
>
> That implementation just skips the existing files on the local (get)
> and remote (post) targets. It seems more like a "merge" than a sync.
>

That's a very good feature. Reading this issue, I was wondering which
side would prevail at synchronization time, and combining the --sync
option with "post" and "get" resolves this in a very clever way.
Congratulations!


> What do you think about sync/merge? What should it provide? Should it
> also delete files? What should the comparison be based on? Time? Some
> hash?
>

I think that ideally it should be the hash, but if gdata does not
provide this value for each photo, googlecl will need to download every
photo and only then calculate the hashes.

I haven't read the gdata specification yet to see whether it provides
such a value, but I'll take a look.

As for deleting files, I think a --delete option would be a good way to go.
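
A --delete option along these lines might compute, besides the files to
copy, the files that exist only on the target. A minimal sketch,
assuming file names are the comparison key; plan_sync and its delete
parameter are hypothetical and not part of googlecl.

```python
def plan_sync(source_names, target_names, delete=False):
    """Return (to_copy, to_delete) as sorted lists of file names.

    to_copy: names present on the source but missing on the target.
    to_delete: names present only on the target; empty unless delete=True.
    """
    source, target = set(source_names), set(target_names)
    to_copy = sorted(source - target)
    to_delete = sorted(target - source) if delete else []
    return to_copy, to_delete


print(plan_sync(["a.jpg", "b.jpg"], ["b.jpg", "c.jpg"], delete=True))
```

Keeping deletion off by default means a plain --sync stays a
non-destructive merge.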

--
Dalton Barreto

Dalton Barreto

Jun 28, 2010, 9:27:12 PM
to googlecl...@googlegroups.com
2010/6/28 Dalton Barreto <dalto...@gmail.com>:

> I haven't read the gdata specification yet to see whether it provides
> such a value, but I'll take a look.

This should do the trick.
<http://code.google.com/intl/pt-BR/apis/picasaweb/docs/2.0/reference.html#gphoto_checksum>

But to make this work, googlecl would have to calculate the hash value
for every photo it sends and put this value inside the gphoto:checksum
element.

Does this seem reasonable?
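
As a rough illustration of the idea, the element could be built like
this with the standard library alone. checksum_element is a
hypothetical helper, SHA-1 is just one possible choice, and the
namespace URI is the gphoto one as I understand it from the API
reference, so please verify it. Attaching the element to the photo
entry and sending it is not shown.

```python
import hashlib
import xml.etree.ElementTree as ET

# Assumed gphoto namespace URI; check it against the API reference.
GPHOTO_NS = "http://schemas.google.com/photos/2007"


def checksum_element(data):
    """Build a gphoto:checksum element holding the SHA-1 hex digest of data."""
    ET.register_namespace("gphoto", GPHOTO_NS)
    element = ET.Element("{%s}checksum" % GPHOTO_NS)
    element.text = hashlib.sha1(data).hexdigest()
    return element


xml_bytes = ET.tostring(checksum_element(b"example photo bytes"))
print(xml_bytes.decode("ascii"))
```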

--
Dalton Barreto

Ferran Busquets

Jun 29, 2010, 3:04:34 AM
to googlecl...@googlegroups.com
Thanks for your feedback. The big question is how to compute the checksum for the local file...

Anyway, I attached a patch to issue 170.


--
http://twitter.com/ferranb

Dalton Barreto

Jun 29, 2010, 8:15:57 AM
to googlecl...@googlegroups.com
2010/6/29 Ferran Busquets <fer...@gmail.com>:

> Thanks for your feedback. The big question is how to compute the checksum
> for the local file...
>

This should be simple; I ran a quick test.

First I created a 1MB random file:
$ dd if=/dev/random of=/tmp/file bs=1024k count=1

Then I calculated a sha1 hash for this file using the Python interactive shell:

Python 2.5.2 (r252:60911, Jan 20 2010, 21:48:48)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import hashlib
>>> sha1 = hashlib.sha1()
>>> sha1.update(file('/tmp/file').read())
>>> sha1.hexdigest()
'da0b76d89aea72c13408702ff2e98f070c0545bf'
>>>

Then I used a command-line tool to do the same:

$ sha1sum /tmp/file
da0b76d89aea72c13408702ff2e98f070c0545bf /tmp/file

Note that, as expected, we get the same result. So this Python code
is consistent with other tools and should be good.

Note that you can't reuse the same sha1 object for two different
files. You will have to create a new sha1 object for each file whose
hash you calculate.

Also, if you don't feel comfortable using sha1 you can choose any
other algorithm; this was just a simple example. =)
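
For large photo collections, the same idea could be wrapped in a small
function that reads each file in binary chunks instead of all at once.
A minimal sketch in current Python; file_checksum and its parameters
are hypothetical names, not something googlecl provides.

```python
import hashlib


def file_checksum(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Return the hex digest of the file at path, read in binary chunks."""
    digest = hashlib.new(algorithm)  # fresh hash object for every file
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at EOF
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling file_checksum("/tmp/file") should give the same value as
sha1sum, and passing algorithm="md5" or "sha256" switches the
algorithm without changing the rest.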

--
Dalton Barreto

Ferran Busquets

Jun 29, 2010, 9:32:48 AM
to googlecl...@googlegroups.com

Is it the same checksum that Picasa saves in the uploaded pictures' metadata? Is Picasa using sha1?

Ferran

Dalton Barreto

Jun 29, 2010, 10:01:08 AM
to googlecl...@googlegroups.com
2010/6/29 Ferran Busquets <fer...@gmail.com>:

> Is it the same checksum that Picasa saves in the uploaded pictures' metadata?

I don't know. But if I understood the documentation correctly,
gphoto:checksum is filled in by the client application, so the
developers can choose what to put in this field.

> Is Picasa using sha1?

If the above is right, this does not matter, as the value will be
controlled by the application.
Even if Google is using sha1, this can change at any time, so having
the application control the hash value is the better way to go, in my
opinion.

In this case the problem will be albums uploaded with other tools:
googlecl will not be able to sync them. Is this a major problem?
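
To make that limitation concrete, the comparison could report a third
state for photos that carry no checksum, instead of treating them as
changed. A small sketch; classify is a hypothetical helper, and an
empty or missing checksum is assumed to mean "uploaded by another
tool".

```python
def classify(local_checksum, remote_checksum):
    """Compare one local file against its remote counterpart.

    remote_checksum is None or "" when the photo was uploaded by a tool
    that did not set gphoto:checksum.
    """
    if not remote_checksum:
        return "unknown"  # nothing to compare against
    if remote_checksum == local_checksum:
        return "same"
    return "different"
```

What to do with "unknown" photos (skip, re-upload, or fall back to
another criterion) would still be a policy decision.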

--
Dalton Barreto

Ferran

Jun 29, 2010, 4:36:30 PM
to GoogleCL Discuss
I agree that SHA1 is a good choice. I've done a little test and only
the Picasa desktop application sets the checksum. If you upload a file
through the web browser or through the gdata API, no checksum is
stored.

For me that's a problem, because it would start uploading all 10GB of
photos again, and that would take too long for my patience ;-)

It might be interesting to have an option to "post" only the checksum
for photos that are already uploaded.




Dalton Barreto

Jun 29, 2010, 7:31:54 PM
to googlecl...@googlegroups.com
2010/6/29 Ferran <fer...@gmail.com>:

> I agree that SHA1 is a good choice. I've done a little test and only
> the Picasa desktop application sets the checksum. If you upload a file
> through the web browser or through the gdata API, no checksum is
> stored.

That's good to know.

>
> For me that's a problem, because it would start uploading all 10GB of
> photos again, and that would take too long for my patience ;-)
>

For sure!

> It might be interesting to have an option to "post" only the checksum
> for photos that are already uploaded.
>
>

That's right! It could be an --update-checksum option. I think the
"partial update" should help.
<http://code.google.com/intl/pt-BR/apis/picasaweb/docs/2.0/developers_guide_protocol.html#PartialUpdate>
But in this case we are back to the first problem: "What will be
used to decide when one photo is equal to another?" I think the
filename is good enough for this.
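
Pairing by filename, the set of checksums to push could be worked out
like this. Just a sketch under that assumption: checksum_updates is a
hypothetical helper, and the actual partial-update request is not
shown.

```python
def checksum_updates(local_checksums, remote_photos):
    """Return {remote name: checksum} for photos that still need a checksum.

    local_checksums: dict mapping local file name -> hex digest.
    remote_photos: dict mapping remote file name -> stored checksum or None.
    Photos are paired by file name only, which is the assumption here.
    """
    updates = {}
    for name, stored in remote_photos.items():
        if not stored and name in local_checksums:
            updates[name] = local_checksums[name]
    return updates


local = {"a.jpg": "1111", "b.jpg": "2222"}
remote = {"a.jpg": None, "b.jpg": "2222", "c.jpg": None}
print(checksum_updates(local, remote))  # only a.jpg needs a checksum
```

Photos that already have a checksum, or that have no local file with
the same name, are left alone.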

--
Dalton Barreto

Tamer Ziady

Jan 20, 2014, 1:13:27 PM
to googlecl...@googlegroups.com
Hello; 

    I have a huge photo library (200+ GB) up at Google and I am looking for a way to sync it with my local directory. Is there an easy way to do this without deleting everything on Google and starting from scratch?

Tamer Ziady

Jan 20, 2014, 1:14:39 PM
to googlecl...@googlegroups.com
I am willing to use GoogleCL or any other method you think might work. I only have Linux workstations, but if I have to I will install a windblows box (shudder) :(....