Using S3 document storage


Ami Ganguli

Jul 22, 2012, 10:17:30 PM
to mayan...@googlegroups.com
Hi everybody,

I've been trying to get Mayan to work with Amazon S3. 

There've been various problems, which I've hacked around one-by-one just to get something working.  But now I'm running into issues that might require me to touch a fair bit of code.

Before I start doing that, I thought I would ask if anybody has gotten this to work already, and if so, how?

If not, I could start again and ask one-by-one for the "right" solution to the problems I've had, and perhaps contribute some proper patches.  I'm fairly new to Django, so the solutions I've come up with on my own so far are probably not the best.

Cheers,
Ami.

Roberto Rosario

Jul 23, 2012, 11:39:48 AM
to mayan...@googlegroups.com
Hi Ami,

What you are trying to do sounds really interesting.  I don't have any experience working with Amazon S3, but since Mayan uses Django's abstracted storage system, any project that extends Django's storage options should work basically out of the box with Mayan.  One such project is django-storages (http://django-storages.readthedocs.org/en/latest/index.html) (http://code.larlet.fr/django-storages/).  Getting django-storages to work with Mayan should go as follows:

  • Activate the virtualenv where you installed Mayan
  • Run pip install django-storages
  • Create a settings_local.py file with the following:
    # Import and activate the storages app
    from django.conf import settings
    settings.INSTALLED_APPS += ('storages',)

    # Set the simple S3Storage backend as Mayan's document storage backend 
    from storages.backends.s3 import S3Storage
    DOCUMENTS_STORAGE_BACKEND = S3Storage
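
The S3 backend also needs to know which account and bucket to use. A minimal sketch of the extra lines for settings_local.py, assuming the usual django-storages setting names; the values are placeholders and are not part of the steps above:

```python
# settings_local.py (continued): credentials read by the django-storages
# S3 backends. All three values below are placeholders, not real data.
AWS_ACCESS_KEY_ID = 'your-access-key-id'
AWS_SECRET_ACCESS_KEY = 'your-secret-access-key'
AWS_STORAGE_BUCKET_NAME = 'your-bucket-name'
```

Keeping these in settings_local.py rather than in the main settings file keeps the credentials out of the shared repository.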

Please keep the group up to date on your progress; I'm sure a lot of people are interested in this kind of setup.

/Roberto

    Ami Ganguli

    Jul 23, 2012, 11:57:08 AM
    to mayan...@googlegroups.com
    Hi Roberto,

    Yup, that's more-or-less what I've done. There are a number of
    problems that I ran into, unfortunately.

    I've forked your repository on GitHub. I'll try to work through the
    issues one-by-one and send you links to the commits so that you can
    suggest better workarounds, or integrate the fixes into your
    repository.

    Cheers,
    Ami.

    Roberto Rosario

    Jul 23, 2012, 3:23:26 PM
    to mayan...@googlegroups.com
    Hi Ami,

    I found your fork on GitHub and will keep an eye out for your patches. Thanks!

    /Roberto

    Ami Ganguli

    Jul 23, 2012, 3:40:57 PM
    to mayan...@googlegroups.com
    Cool.

    I'm creating a couple of branches there - one for working on the API
    (which is a bit more urgent for me), and the other for the S3 stuff
    (not a big deal if that takes a few weeks to sort out).

    Cheers,
    Ami.

    Carlos Aguilar

    Jul 23, 2012, 3:42:01 PM
    to mayan...@googlegroups.com

    You can check out django-storages to make your app compatible with S3.


    Nate Aune

    Jul 23, 2012, 4:22:42 PM
    to mayan...@googlegroups.com
    I think one problem is that much of the Mayan functionality (OCR, metadata, etc.) expects that the files are on a locally accessible file system, which is not the case if the files are on S3. 

    So it would seem necessary to temporarily download the files from S3 to a /tmp folder to process them with the Unix command-line tools.

    I don't have any direct experience doing this with Mayan, but I know it from another project in which we were storing MP3 files in a database: when we wanted to extract the ID3 tags using the id3 command-line tools, we had to copy the MP3 files out of the database and put them in a temp folder.
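
A minimal sketch of that copy-to-a-temp-file approach, assuming only that the storage object exposes open(name, mode) returning a binary file-like object, as Django storage backends do. The local_copy helper and the InMemoryStorage class are hypothetical names made up for illustration; they are not Mayan code:

```python
import io
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def local_copy(storage, name):
    """Yield the path of a temporary local copy of `name`, then delete it."""
    # Keep the extension so external tools can still detect the file type.
    suffix = os.path.splitext(name)[1]
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as dst:
            src = storage.open(name, "rb")
            try:
                shutil.copyfileobj(src, dst)
            finally:
                src.close()
        yield path
    finally:
        os.remove(path)


class InMemoryStorage(object):
    """Hypothetical stand-in for a remote backend such as S3."""

    def __init__(self, files):
        self._files = files

    def open(self, name, mode="rb"):
        return io.BytesIO(self._files[name])


if __name__ == "__main__":
    storage = InMemoryStorage({"docs/scan.pdf": b"%PDF-1.4 fake"})
    with local_copy(storage, "docs/scan.pdf") as path:
        # An external command-line tool could be run on `path` here.
        print(os.path.getsize(path))
```

The temporary file only lives for the duration of the with block, so nothing accumulates in the temp folder even if the external tool fails.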

    Nate


    Roberto Rosario

    Jul 23, 2012, 9:54:03 PM
    to mayan...@googlegroups.com
    That is why I avoid external binary dependencies like the plague :)  Otherwise I could just pass the file handle returned by the storage class to whatever Python code needs to process a document.  But whenever there is processing by an external utility Mayan already copies the document file locally and treats it like a cached version of the original document:


    which in turn calls the Document's latest version's open method:


    which in turn calls the storage class open method :)


    A document is never assumed to be local.  I tested this decoupling in the past by storing documents in a GridFS clustered storage and it worked, but that was quite some time ago, so I'm very eager to see what Ami found out and to fix it.

    /Roberto

    Ami Ganguli

    Jul 23, 2012, 11:20:57 PM
    to mayan...@googlegroups.com
    There were several issues, but the one I finally got stuck on (well, thought I'd better ask before mucking things up too badly), was "file.path". 

    According to the docs, this method is supposed to return the local path for the file (suitable for use by the standard python "open"), or throw an exception for objects that don't have local paths (like S3).  Obviously S3 throws the exception.

    This isn't caught, so things blow up in apps/documents/models.py.  For example DocumentVersion.exists calls self.file.path, and crashes there.  I hacked around that to see if that was the only problem, but ran into trouble elsewhere for the same reason.  Decided I didn't understand the code well enough to fix this - at least not yet.
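
A rough sketch of the kind of guard that would avoid the crash, assuming the check can go through the storage object instead of a local path. Django's base Storage.path() raises NotImplementedError for backends without local paths, and Storage.exists() is part of the same API. RemoteStorage and both helper functions are hypothetical stand-ins for illustration, not the actual Mayan code:

```python
class RemoteStorage(object):
    """Hypothetical stand-in for a backend without local paths (e.g. S3)."""

    def __init__(self, names):
        self._names = set(names)

    def path(self, name):
        # Mirrors what Django's base Storage.path() does for remote backends.
        raise NotImplementedError("This backend doesn't support absolute paths.")

    def exists(self, name):
        return name in self._names


def local_path_or_none(storage, name):
    """Return the local filesystem path, or None if the backend has none."""
    try:
        return storage.path(name)
    except NotImplementedError:
        return None


def file_exists(storage, name):
    """Ask the storage backend directly instead of testing a local path."""
    return storage.exists(name)


if __name__ == "__main__":
    storage = RemoteStorage(["contract.pdf"])
    print(local_path_or_none(storage, "contract.pdf"))
    print(file_exists(storage, "contract.pdf"))
```

Going through the storage API keeps the check backend-agnostic, whereas anything built on a local path only works for filesystem storage.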

    Cheers,
    Ami.



    Angel Rosario-Sierra (rtm-it)

    Jul 23, 2012, 11:46:25 PM
    to mayan...@googlegroups.com

    This is getting messy!!!  Ha, hahaha.. :)

     
