mediacache doc

Laurent Savaete

Jul 22, 2012, 12:34:13 PM

to ductus-d...@googlegroups.com

I'm working on getting silence into podcasts. I got this seemingly
working for webm podcasts as follows:
- upload a 3 s silent .ogg file and add its URN to ductus_local_settings.py
- in the podcast generation code, insert the silent audio filename
between each pair of consecutive files
The result is fine for webm audio, but for some reason I can't figure
out, mp4 doesn't work. The podcast is generated properly, but without
the silence.
Tracing the code, it appears that my file is properly inserted into
the list, but somehow the result doesn't include the silence. I tried
using a different audio file (not silent, so I could hear it if it was
inserted) in case MP4Box would just munch silence for some odd reason,
but got the same result.
The log contains the following lines (from MP4Box, I assume):
<<<
Appending file /home/laurent/Sites/ductus/storage3/sha384/zj/Z7/zjZ75nRwdOAPvC1cn0ua8wbQS_bsFtCwrCLutqpbJU7DV08r3V2fgkS8uxqKpI_p.m4a
No suitable destination track found - creating new one (type soun)
Appending file /home/laurent/Sites/ductus/storage3/sha384/wX/c6/wXc62Dazygu2VCmc54hOBe1SHb9yET3IZAH9TTg_xm3-xLI-EOapR35L5deWlFEG.m4a
No suitable destination track found - creating new one (type soun)
>>>
These 2 URNs are the resulting podcast filename (I had just deleted the
file) and the silence file (which did not exist at first, but did in
subsequent requests, which is what I expected). So I'm not sure what's
going on here. Any ideas welcome.
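
For reference, the interleaving step described above can be sketched as
follows. This is only an illustration of the idea, not the actual podcast
generation code: the function name interleave_silence and its arguments
are hypothetical.

```python
def interleave_silence(audio_files, silence_file):
    """Return a new list with silence_file between consecutive items.

    Hypothetical helper: the real podcast code may build its list
    differently. No silence is added before the first or after the
    last file.
    """
    result = []
    for index, path in enumerate(audio_files):
        if index > 0:
            result.append(silence_file)
        result.append(path)
    return result
```

With three input files, the resulting list has five entries, and the
silence file sits at positions 2 and 4.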

I'll also take the opportunity to ask a few questions whose answers
should go in the doc about mediacache:
- what is the naming logic for files? I assume a hash for basic files,
but what about podcasts named like
/mediacache/sha384/zjZ75nRwdOAPvC1cn0ua8wbQS_bsFtCwrCLutqpbJU7DV08r3V2fgkS8uxqKpI_p.c260c2476a92b443a7bce5d2886fe6f186774551.m4a
- if I upload an audio file, I find 3 versions of it in the storage, with names:
> urn:sha384:hash
> urn:sha384:hash.m4a
> urn:sha384:hash.oga
Two of them are the same, except that the one with no extension has a
blob\0 header. Why do you store some files twice? I'm assuming you
have a good reason to do so, but I can't think of it :)
- mediacache being a cache, did you include an expiration mechanism
for old files?

That's all I can think of for now.

Jim Garrison

Jul 29, 2012, 8:58:46 PM

to ductus-d...@googlegroups.com

On 07/22/12 09:34, Laurent Savaete wrote:
> I'm working on getting silence into podcasts. I got this seemingly
> working for webm podcasts as follows:
> - upload a 3 s silent .ogg file and add its URN to ductus_local_settings.py
> - in the podcast generation code, insert the silent audio filename
> between each pair of consecutive files

There should be a way to reference the 3s audio file by pointing
directly to its location in the code repository, instead of assuming the
user has uploaded it to the resourcedatabase and provided a URN. Did
you attempt this and run into any difficulty?

> The result is fine for webm audio, but for some reason I can't figure
> out, mp4 doesn't work. The podcast is generated properly, but without
> the silence.
> Tracing the code, it appears that my file is properly inserted into
> the list, but somehow the result doesn't include the silence. I tried
> using a different audio file (not silent, so I could hear it if it was
> inserted) in case MP4Box would just munch silence for some odd reason,
> but got the same result.
> The log contains the following lines (from MP4Box, I assume):
> <<<
> Appending file /home/laurent/Sites/ductus/storage3/sha384/zj/Z7/zjZ75nRwdOAPvC1cn0ua8wbQS_bsFtCwrCLutqpbJU7DV08r3V2fgkS8uxqKpI_p.m4a
> No suitable destination track found - creating new one (type soun)
> Appending file /home/laurent/Sites/ductus/storage3/sha384/wX/c6/wXc62Dazygu2VCmc54hOBe1SHb9yET3IZAH9TTg_xm3-xLI-EOapR35L5deWlFEG.m4a
> No suitable destination track found - creating new one (type soun)
> >>>
> These 2 URNs are the resulting podcast filename (I had just deleted the
> file) and the silence file (which did not exist at first, but did in
> subsequent requests, which is what I expected). So I'm not sure what's
> going on here. Any ideas welcome.
>
> I'll also take the opportunity to ask a few questions whose answers
> should go in the doc about mediacache:
> - what is the naming logic for files? I assume a hash for basic files,
> but what about podcasts named like
> /mediacache/sha384/zjZ75nRwdOAPvC1cn0ua8wbQS_bsFtCwrCLutqpbJU7DV08r3V2fgkS8uxqKpI_p.c260c2476a92b443a7bce5d2886fe6f186774551.m4a

See get_joined_audio_mediacache_url() and resolve_relative_mediacache_url().

Basically it's hash1.hash2.m4a where hash1 is the hash of the podcast
itself and hash2 is the hash of the list of audio URNs which are
contained in the podcast.

Turns out, hash2 is actually pointless.

What I /meant/ to do was to make hash1 the hash of the *first audio
file* in the podcast, and hash2 the hash of the list of audio URNs as
described above. This way, if two podcasts contain the exact same
material but are under different URNs (which would happen e.g. if
somebody simply adds a tag to a lesson and re-saves it), then we don't
need to have the podcast regenerated and re-saved.
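
A minimal sketch of the intended scheme, for illustration only. The
function name joined_audio_filename is hypothetical and is not the real
get_joined_audio_mediacache_url(). Using SHA-1 over the newline-joined
URNs for hash2 is an assumption (the second component in the example
filename is 40 hex characters long, but the actual serialization is not
stated here).

```python
import hashlib


def joined_audio_filename(audio_urns, extension):
    """Build 'hash1.hash2.ext' from an ordered list of audio URNs.

    hash1: digest part of the first audio URN ("urn:sha384:<digest>").
    hash2: SHA-1 of the newline-joined URN list (an assumed encoding).
    """
    if not audio_urns:
        raise ValueError("a podcast needs at least one audio URN")
    hash1 = audio_urns[0].rsplit(":", 1)[-1]
    joined = "\n".join(audio_urns).encode("utf-8")
    hash2 = hashlib.sha1(joined).hexdigest()
    return "{}.{}.{}".format(hash1, hash2, extension)
```

Because the name depends only on the audio URNs, two lessons with the
same audio in the same order map to the same cached file, whatever the
URN of the lesson itself.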

The attached diff would fix this to be the way I originally intended,
but applying it would mean regenerating all existing audio under their
new filenames :). I am not opposed to doing this -- it does make things
conceptually cleaner and will save space over time. Feel free to apply
it if you agree that it makes sense.

> - if I upload an audio file, I find 3 versions of it in the storage, with names:
> > urn:sha384:hash
> > urn:sha384:hash.m4a
> > urn:sha384:hash.oga
> Two of them are the same, except that the one with no extension has a
> blob\0 header. Why do you store some files twice? I'm assuming you
> have a good reason to do so, but I can't think of it :)

Sadly no good reason other than that we were already committed to saving
things with blob\0 at the beginning before we implemented mediacache,
but we can't serve things with that prefix if we want them to be useful
to a user :). One potential way out that I've considered is to make a
new LocalStorageBackend that gets rid of blob\0 or xml\0 from the
beginning of each file, and simply saves them in an xml/ or blob/
directory depending on what it is. We'd have two places to look for any
given resource, but it would mean that we could (in some instances)
simply use a symlink from the mediacache location to that file.
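
A rough sketch of that idea, assuming Python 3 and a flat blob/ and xml/
layout. All three function names are hypothetical, and none of this is
existing LocalStorageBackend code.

```python
import os

# Header prefixes mentioned above, mapped to the directory for each kind.
KNOWN_HEADERS = {b"blob\0": "blob", b"xml\0": "xml"}


def split_header(data):
    """Return (kind, payload) with the leading blob\\0 or xml\\0 removed."""
    for header, kind in KNOWN_HEADERS.items():
        if data.startswith(header):
            return kind, data[len(header):]
    raise ValueError("unrecognized resource header")


def store_without_header(storage_root, digest, data):
    """Write the payload under <storage_root>/<kind>/<digest>."""
    kind, payload = split_header(data)
    directory = os.path.join(storage_root, kind)
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, digest)
    with open(path, "wb") as stored:
        stored.write(payload)
    return path


def link_into_mediacache(stored_path, mediacache_path):
    """Expose the stored payload in mediacache via a symlink, not a copy."""
    os.makedirs(os.path.dirname(mediacache_path), exist_ok=True)
    if not os.path.lexists(mediacache_path):
        os.symlink(os.path.abspath(stored_path), mediacache_path)
```

The symlink only helps when the file mediacache serves is byte-identical
to the stored payload. Transcoded variants would still need their own
files.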

One reason I haven't forged ahead with implementing this is that I
don't know how long we should expect LocalStorageBackend to scale.
Storing everything on the local filesystem assumes we have a single
filesystem with everything we know about -- i.e. it does not scale
horizontally the way a distributed filesystem such as gridfs would.

We might find that it does make sense to do the "new
LocalStorageBackend" as an intermediate solution though. We should keep
this open as a possibility (and feel free to suggest an alternative
solution).

> - mediacache being a cache, did you include an expiration mechanism
> for old files?

Nope. In principle anything can be deleted at any time. I figured
eventually we would have a script that runs periodically, looks at the
access logs, and deletes anything that hasn't been accessed for some
set period. But I've not actually explored this idea fully.
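
As a starting point, such a script might look like the sketch below. It
deliberately swaps in the filesystem's last-access time (st_atime) for
the access logs mentioned above, which is an assumption: atime is
unreliable on filesystems mounted with noatime or relatime, so parsing
the web server logs may still be the better signal. The function name
expire_mediacache is hypothetical.

```python
import os
import time


def expire_mediacache(cache_root, max_age_seconds, now=None):
    """Delete cached files not accessed within max_age_seconds.

    Uses st_atime as a stand-in for access-log data (an assumption).
    Returns the list of removed paths.
    """
    now = time.time() if now is None else now
    removed = []
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if now - os.stat(path).st_atime > max_age_seconds:
                os.remove(path)
                removed.append(path)
    return removed
```

Since anything in mediacache can be regenerated on demand, deleting too
eagerly only costs regeneration time, not data.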