Imgee storage

Devi

Mar 26, 2013, 8:25:44 AM
to hasgeek-code
Couple of queries -

- Currently, the uploaded images are saved on to the disk first and
then uploaded to AWS. Is this 2 step process intended? Do we want to
maintain a local copy too? In that case, do we want to do batch
uploads to S3 periodically?

- We make a unique name for each image uploaded and store it in S3
with that unique name. We also store the mapping of names in the DB.
We display the original name and use the unique name in the URL. Now,
should we allow a user to upload images with the same name twice?

--
~Devi

Mitesh Ashar

Mar 26, 2013, 9:53:54 AM
to hasgee...@googlegroups.com

-- 
Mitesh Ashar
Sent with Sparrow

On Tuesday, 26 March 2013 at 5:55 PM, Devi wrote:

> Couple of queries -
>
> - Currently, the uploaded images are saved on to the disk first and
> then uploaded to AWS. Is this 2 step process intended? Do we want to
> maintain a local copy too? In that case, do we want to do batch
> uploads to S3 periodically?

Yesterday I came across a useful command-line utility called s3cmd. I used it to upload two large files from my EC2 server to a bucket. It worked well, but I didn't explore it beyond that.
I remember reading that it supports sync. If it holds up in reliability tests and we choose to use it, the application may not need to handle this at all, assuming of course that the answer to Devi's question about maintaining a local copy is yes.

> --
> ~Devi

--
You received this message because you are subscribed to the Google Groups "HasGeek Code" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hasgeek-code...@googlegroups.com.
To post to this group, send email to hasgee...@googlegroups.com.

Deepak Jois

Mar 26, 2013, 10:05:48 AM
to hasgee...@googlegroups.com

On Tue, Mar 26, 2013 at 1:25 PM, Devi <asl...@gmail.com> wrote:
> - Currently, the uploaded images are saved on to the disk first and
> then uploaded to AWS. Is this 2 step process intended? Do we want to
> maintain a local copy too? In that case, do we want to do batch
> uploads to S3 periodically?

I am not aware of the exact reason for the query, but I'm jumping in here just in case it is useful. It is possible to upload directly from a browser to S3: http://aws.amazon.com/articles/1434

Deepak

Devi

Mar 26, 2013, 11:29:08 AM
to hasgee...@googlegroups.com
On Tue, Mar 26, 2013 at 7:23 PM, Mitesh Ashar <em...@miteshashar.com> wrote:
>
> --
> Mitesh Ashar
> Sent with Sparrow
>
> On Tuesday, 26 March 2013 at 5:55 PM, Devi wrote:
>
> Couple of queries -
>
> - Currently, the uploaded images are saved on to the disk first and
> then uploaded to AWS. Is this 2 step process intended? Do we want to
> maintain a local copy too? In that case, do we want to do batch
> uploads to S3 periodically?
>
> Yesterday I came across a useful command line utility called s3cmd. I used
> it for uploading two large files from my EC2 server to a bucket. Worked
> well. Didn't explore it beyond that.
> Apparently, I remember reading that it supports sync. After doing
> reliability tests & choosing to use it could possibly take away the need of
> the application handling this at all, i.e. of course in the case answer to
> Devi's question about maintaining a local copy is Yes.

I know of s3cmd; in fact, I was using it to test the uploads to S3
today. I've also used the sync option before. But I guess what s3cmd
can do (get, put, sync etc.) is just a small part of the project (just
a couple of lines of code each with boto). Beyond that, Imgee needs to
manage permissions, sizes, ownership etc.

My question was whether the local copy is really required and if yes, why?

Kiran Jonnalagadda

Mar 27, 2013, 9:28:11 AM
to HasGeek Code

Devi, the local copy is meant to be a cache for when the image is requested at a different resolution and needs to be rescaled. We don't want to store everything locally long term, so this behaviour is temporary, until we add a cache management layer that discards the local copy after a day or so.
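
A minimal sketch of what such a cache management step might look like, assuming the local copies live as plain files in a single directory and that age is judged by modification time. The function name `purge_stale_cache` and the one-day threshold are hypothetical, not part of Imgee.

```python
import os
import time

ONE_DAY = 24 * 60 * 60  # seconds; stands in for "a day or so"


def purge_stale_cache(cache_dir, max_age=ONE_DAY, now=None):
    """Delete regular files in cache_dir last modified more than max_age ago.

    Returns the sorted list of removed file names.
    """
    now = time.time() if now is None else now
    # Snapshot the directory first so we never delete while iterating.
    with os.scandir(cache_dir) as it:
        entries = list(it)
    removed = []
    for entry in entries:
        if not entry.is_file(follow_symlinks=False):
            continue  # skip subdirectories and symlinks
        age = now - entry.stat(follow_symlinks=False).st_mtime
        if age > max_age:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Something like this could run from a periodic job; the copy in S3 remains the long-term store, so a purged file can be fetched again when it is next needed.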

We are assigning the image a uuid4 (stored as urlsafe_base64). The original filename should be stored as the default title, but can be changed by the user. It shouldn't be used for dupe detection.
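
For illustration, one way to produce such a name with the Python standard library. Stripping the trailing `=` padding is an assumption made here for shorter URLs, and the helper names are hypothetical.

```python
import base64
import uuid


def new_image_name():
    """Return a random uuid4 encoded as URL-safe base64, padding stripped."""
    raw = uuid.uuid4().bytes  # 16 random bytes
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def name_to_uuid(name):
    """Inverse of new_image_name: restore the padding and decode to a UUID."""
    padded = name + "=" * (-len(name) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))
```

With the padding removed, each name is 22 characters drawn from letters, digits, `-` and `_`.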

However, dupe detection is a good idea, so maybe we can add an md5sum column and check against that.
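
A rough sketch of computing the value for such a column, assuming the upload is available as a binary file-like object. The function name `md5sum` and the chunk size are illustrative choices.

```python
import hashlib


def md5sum(fileobj, chunk_size=64 * 1024):
    """Return the hex MD5 digest of a binary file-like object.

    Reads in chunks so that large images are never held in memory whole.
    """
    digest = hashlib.md5()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()
```

The digest depends only on the bytes of the file, so two uploads of the same file match regardless of filename or title.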

Kiran

--
Kiran Jonnalagadda
+91-99452-35123
http://hasgeek.com/

(Sent from my phone)

Devi

Mar 28, 2013, 5:35:50 AM
to hasgee...@googlegroups.com
On Wed, Mar 27, 2013 at 6:58 PM, Kiran Jonnalagadda <ki...@hasgeek.com> wrote:
> Devi, the local copy is meant to be a cache for when the image is requested
> at a different resolution and needs to be rescaled. We don't want to store
> everything locally long term, so this behaviour is until we add a cache
> management layer that discards local copy after a day or so.

Ah. I see the point now. Got confused by the code which reads the
other way - get the original image from S3 and create a rescaled image
locally. Will fix that.

> We are assigning the image a uuid4 (stored as urlsafe_base64).
and we are calling that "name". Shouldn't we preserve the extension of
the image in the name?

> The original
> filename should be stored as the default title, but can be changed by the
> user. It shouldn't be used for dupe detection.
>
> However, dupe detection is a good idea, so maybe we can add an md5sum column
> and check against that.

Makes sense.

Kiran Jonnalagadda

Mar 28, 2013, 6:11:50 AM
to HasGeek Code

Browsers only care about the MIME type, so extension-free is fine as long as the right MIME type is delivered.
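
With no extension in the name, the MIME type has to come from somewhere else. One possibility, sketched here as an assumption rather than as what Imgee does, is to sniff it from the file's leading bytes at upload time and store it alongside the image (S3 keeps a Content-Type with each object and serves it back). The helper name and the short list of formats are hypothetical.

```python
# Leading "magic" bytes of a few common image formats.
_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)


def sniff_image_mime(data, default="application/octet-stream"):
    """Guess a MIME type from the first bytes of an uploaded file."""
    for magic, mime in _SIGNATURES:
        if data.startswith(magic):
            return mime
    return default  # unknown format: fall back to a generic type
```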

Kiran

--
Kiran Jonnalagadda
+91-99452-35123
http://hasgeek.com/

(Sent from my phone)

Mitesh Ashar

Mar 28, 2013, 6:19:17 AM
to hasgee...@googlegroups.com

-- 
Mitesh Ashar
Sent with Sparrow

We considered the case where the file names are the same but the files are different. With md5sum used for dupe detection, one more case surfaces: since the name is used for display, what if the user wants the same image displayed in two locations with different titles?

Devi

Mar 28, 2013, 6:57:41 AM
to hasgee...@googlegroups.com
We could check for the uniqueness of the title and the md5sum.

--
~Devi

Anand Chitipothu

Mar 28, 2013, 8:49:40 AM
to hasgee...@googlegroups.com
I don't think uniqueness of title and md5sum is a good idea.

The md5sum uniquely identifies an uploaded image. There could be two different entries for the same file with two different titles; you just need to make sure your database can support that.

Anand
Anand
http://anandology.com/

Devi

Mar 28, 2013, 9:17:26 AM
to hasgee...@googlegroups.com
Instead of using uuid4, how about using md5sum of the image?

Devi

Mar 28, 2013, 11:22:54 PM
to hasgee...@googlegroups.com
On Thu, Mar 28, 2013 at 3:41 PM, Kiran Jonnalagadda <ki...@hasgeek.com> wrote:
> Browsers only care for mime type, so extension-free is fine as long as the
> right mime type is delivered.

Yes, I suggested that for humans to understand.

Kiran Jonnalagadda

Mar 30, 2013, 10:50:02 PM
to HasGeek Code

The same md5sum can legitimately appear twice if two different users upload the same image. This can happen when two users represent different departments of the same company. They'll want to use the same image but with independent permission management.

Titles are only shown to the uploading user to help them organize their images. Others will only ever see the image itself, embedded on another web page. The title is also an editable free text field. Therefore a unique constraint on the title would be meaningless.
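
To make the constraints above concrete, here is a small sketch using SQLite from the Python standard library. The table and column names are hypothetical, and scoping the duplicate lookup to a single owner is an assumption; only the generated name is unique, while title and md5sum are free to repeat.

```python
import sqlite3

SCHEMA = """
CREATE TABLE image (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE,  -- uuid4-based name used in the URL
    owner   TEXT NOT NULL,         -- uploading user (hypothetical column)
    title   TEXT NOT NULL,         -- free text, editable, not unique
    md5sum  TEXT NOT NULL          -- indexed for lookups, not unique
);
CREATE INDEX image_owner_md5sum ON image (owner, md5sum);
"""


def find_duplicates(conn, owner, md5sum):
    """Return (name, title) rows this owner already has for the same file."""
    rows = conn.execute(
        "SELECT name, title FROM image"
        " WHERE owner = ? AND md5sum = ? ORDER BY id",
        (owner, md5sum),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO image (name, owner, title, md5sum) VALUES (?, ?, ?, ?)",
    [
        ("name-1", "alice", "Logo", "abc123"),
        ("name-2", "bob", "Company logo", "abc123"),     # same file, other user
        ("name-3", "alice", "Logo (footer)", "abc123"),  # same file, new title
    ],
)
```

All three inserts succeed, and `find_duplicates(conn, "alice", "abc123")` lets the application tell a user they already hold that file without blocking the upload.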

Kiran

--
Kiran Jonnalagadda
+91-99452-35123
http://hasgeek.com/

(Sent from my phone)
