Binary files with the same name but different extensions


AlexGhitza

Mar 10, 2012, 8:38:03 PM
to na...@googlegroups.com
Hi,

I have two files: blah.pdf and blah.tex, which I want to place on a site created with nanoc.  To do this, I tried to simply follow what I normally do: if I just have blah.pdf, I put it somewhere under content/ and link to it from the markdown file of the page I want it to appear on.  However, copying both blah.pdf and blah.tex under content/ results in an error of the type: "Found 2 content files for blah; expected 0 or 1".

I guess I'm looking for the "right way" of doing this, and given my limited understanding of how nanoc works I can't seem to be able to extract this from the documentation or previous discussions in this group.

Best,
Alex

Arno Hautala

Mar 10, 2012, 9:20:55 PM
to na...@googlegroups.com
On Sat, Mar 10, 2012 at 20:38, AlexGhitza <agh...@gmail.com> wrote:
>
> I have two files: blah.pdf and blah.tex

nanoc strips the extension when generating item IDs. Denis has talked
about implementing an option to include the extension in the item ID,
but this isn't possible yet (that I'm aware of).

I resolved this by naming them blah_pdf.pdf and blah_tex.tex and then
including a routing rule to set the correct destination.
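
A rough sketch of the routing half of this workaround, in plain Ruby. The helper name `output_path_for` is made up for illustration; it assumes the renamed file `blah_pdf.pdf` ends up with the identifier `/blah_pdf/` and the extension `pdf`, and it is not Arno's actual rule.

```ruby
# Hypothetical helper: map the identifier of a renamed file back to the
# output path it had before renaming.
def output_path_for(identifier, extension)
  # Drop the trailing slash, then the "_<ext>" suffix that was added
  # only to make the base name unique.
  base = identifier.chomp('/').sub(/_#{Regexp.escape(extension)}\z/, '')
  "#{base}.#{extension}"
end

# Assumed usage inside a nanoc 3 Rules file (not run here):
#   route '/blah_pdf/' do
#     output_path_for(item.identifier, item[:extension])
#   end

output_path_for('/blah_pdf/', 'pdf') # => "/blah.pdf"
output_path_for('/blah_tex/', 'tex') # => "/blah.tex"
```

With a rule along these lines the published URLs stay `blah.pdf` and `blah.tex`, even though the source files carry unique base names.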

--
arno  s  hautala    /-|   ar...@alum.wpi.edu

pgp b2c9d448

Justin Clift

Mar 10, 2012, 10:11:12 PM
to na...@googlegroups.com

If it's not too much hassle, a way that works is to change the file names to be unique.
The main (non-extension) part, that is, e.g. to blah_pdf.pdf and blah_tex.tex or similar
(just not both "blah").

I've had the same problem when importing existing sites before. Doing the rename thing
works... though there might be a better way. (I've not asked. :>)

Regards and best wishes,

Justin Clift

Alex Ghitza

Mar 16, 2012, 6:37:37 PM
to Justin Clift, na...@googlegroups.com
On Sun, 11 Mar 2012 14:11:12 +1100, Justin Clift <jus...@salasaga.org> wrote:
> If it's not too much hassle, a way that works is to change the file names to be unique.
> The main (non-extension) part, that is, e.g. to blah_pdf.pdf and blah_tex.tex or similar
> (just not both "blah").

Thanks Arno and Justin, this does work.

So nanoc has text items and binary items, and both are managed/processed
by nanoc. Would it be a good idea to introduce a new type of item that
is simply not processed by nanoc? The compiler would just take those
files from content/ and drop them into output/ without any changes. So
I could have blah.tex, blah.bib, blah.pdf, blah.ps, etc. all tagged as
"items not to be processed" and wouldn't need to mess with the
filenames.

I realize that I could just do this myself, i.e. simply put these files
into output/ directly, but conceptually it makes more sense to have all
the content in content/, and have output/ be completely
computer-generated.

--
Best,
Alex

Alex Ghitza -- http://aghitza.org/
Lecturer in Mathematics -- The University of Melbourne -- Australia

Denis Defreyne

Apr 15, 2012, 9:25:55 AM
to na...@googlegroups.com
On 16 Mar 2012, at 23:37, Alex Ghitza wrote:

> Would it be a good idea to introduce a new type of item that is simply not processed by nanoc? The compiler would just take those
> files from content/ and drop them into output/ without any changes. So I could have blah.tex, blah.bib, blah.pdf, blah.ps, etc. all tagged as
> "items not to be processed" and wouldn't need to mess with the filenames.

I certainly understand the need for this, but adding a new type of item would add complexity, which I’d like to avoid if possible.

For me, the ideal solution to this problem is to let identifiers include extensions, but still allow matching identifiers without extensions. Unfortunately, as far as I can tell, it is impossible to make this work without breaking backwards compatibility. Here’s an example of how it would work:

    item.identifier.to_s
    # => '/about.txt'
    item.identifier =~ '/about'
    # => true
    item.identifier =~ '/about.html'
    # => false
    item.identifier =~ '/about.txt'
    # => true
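
The matching behaviour in that example could be sketched roughly as follows. `SketchIdentifier` is a made-up class for illustration only, not nanoc's implementation; it assumes an identifier is a plain path string whose last component may carry an extension.

```ruby
# Hypothetical identifier that keeps its extension but also matches
# the same path written without one.
class SketchIdentifier
  def initialize(path)
    @path = path
  end

  def to_s
    @path
  end

  # True when `other` is the full path, or the full path minus its extension.
  def =~(other)
    other == @path || other == without_extension
  end

  private

  # Strip a trailing ".ext" from the last path component, if there is one.
  def without_extension
    @path.sub(/\.[^\/.]+\z/, '')
  end
end

id = SketchIdentifier.new('/about.txt')
p(id =~ '/about')      # prints true
p(id =~ '/about.html') # prints false
p(id =~ '/about.txt')  # prints true
```

This sketch sidesteps the compatibility problem entirely, since it does not try to behave like a String in existing rules.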

I think I _might_ be able to get it to work without breaking compatibility by letting this identifier class behave a lot like String (overriding and abusing #==), but I have a feeling that I will regret this later on. So this will likely not change before nanoc 4.0.

> I realize that I could just do this myself, i.e. simply put these files into output/ directly, but conceptually it makes more sense to have all
> the content in content/, and have output/ be completely computer-generated.

Yes… the output directory should be entirely managed by nanoc. This is an assumption nanoc makes for e.g. the “prune” command.

Cheers

Denis


kimo.johnson

Apr 15, 2012, 8:23:29 PM
to na...@googlegroups.com

I'm new to nanoc and this forum. I recently converted two sites from other systems to nanoc (one from Webby and one from symfony):
http://www.gelsight.com
http://people.csail.mit.edu/kimo

Although I don't know ruby, the process was fairly straightforward. I'm sure my nanoc setup could be improved but everything seems to be working.

One of the major issues I ran into when converting my sites is static files with different extensions. I had quite a few of these (paper.pdf, paper.tex, paper.mov, etc.). I ended up changing some of the names, which I didn't want to do because it breaks links from other sites. But I couldn't figure out an easy way to handle this.

In my case, I just want these items moved from content to output and I don't need an identifier for them. So I add a vote for Alex's suggestion of items that aren't processed by nanoc and just moved.

Anyway, thanks for the great system!

Kimo

Sean Davis

Apr 16, 2012, 6:25:13 AM
to na...@googlegroups.com
On Sun, Apr 15, 2012 at 8:23 PM, kimo.johnson <kimo.j...@gmail.com> wrote:
>
> I'm new to nanoc and this forum. I recently converted two sites from other
> systems to nanoc (one from Webby and one from symfony):
> http://www.gelsight.com
> http://people.csail.mit.edu/kimo
>
> Although I don't know ruby, the process was fairly straightforward. I'm sure
> my nanoc setup could be improved but everything seems to be working.
>
> One of the major issues I ran into when converting my sites is static files
> with different extensions. I had quite a few of these (paper.pdf, paper.tex,
> paper.mov, etc.). I ended up changing some of the names, which I didn't want
> to do because it breaks links from other sites. But I couldn't figure out an
> easy way to handle this.
>
> In my case, I just want these items moved from content to output and I don't
> need an identifier for them. So I add a vote for Alex's suggestion of items
> that aren't processed by nanoc and just moved.

I'll add a +1 here, too. For me, one of the most significant
limitations with nanoc is the lack of a simple "copy these contents"
directive; I do not think it is an uncommon use case. I currently
maintain a "static output" directory that I simply rsync to the server
on top of the nanoc content directory, but this is definitely
suboptimal and not in the spirit of having nanoc in control.

The identifier description below may allow such a "copy" directive,
but I haven't thought through clearly how that would look in the Rules
file.

Sean


Justin Clift

Apr 16, 2012, 6:54:47 AM
to na...@googlegroups.com
On 16/04/2012, at 8:25 PM, Sean Davis wrote:
<snip>

>>
>> In my case, I just want these items moved from content to output and I don't
>> need an identifier for them. So I add a vote for Alex's suggestion of items
>> that aren't processed by nanoc and just moved.
>
> I'll add a +1 here, too. For me, one of the most significant
> limitations with nanoc is the lack of a simple "copy these contents"
> directive; I do not think it is an uncommon use case. I currently
> maintain a "static output" directory that I simply rsync to the server
> on top of the nanoc content directory, but this is definitely
> suboptimal and not in the spirit of having nanoc in control.

Yeah, it would be really useful to have *some* way for this to work.

Maybe some code changes, so files not covered by the "text_extensions"
list, can be copied as is...?

Wonder if it would be possible, for files not covered by the
"text_extensions" list, to have their whole filename become the
identifier?

Pseudo logic something like this:

If file_extension is in the list, then the identifier is the first part of
the filename, e.g. foo.haml -> /foo/

If file_extension is NOT in the list, then the identifier is the whole
filename, e.g. foo.png -> /foo.png/
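
In Ruby, that pseudo logic might look roughly like this. It is only a sketch: `TEXT_EXTENSIONS` is an assumed stand-in for the site's `text_extensions` setting, and `identifier_for` is a made-up name, not something nanoc provides.

```ruby
# Assumed subset of the text_extensions setting, for illustration only.
TEXT_EXTENSIONS = %w[haml html md txt css].freeze

# Sketch of the proposed rule: strip the extension only for text files.
# `path` is a file name or path relative to content/, without a leading slash.
def identifier_for(path)
  ext = File.extname(path).delete_prefix('.')
  stem = TEXT_EXTENSIONS.include?(ext) ? path.delete_suffix(".#{ext}") : path
  "/#{stem}/"
end

identifier_for('foo.haml') # => "/foo/"
identifier_for('foo.png')  # => "/foo.png/"
identifier_for('blah.pdf') # => "/blah.pdf/"
identifier_for('blah.tex') # => "/blah.tex/"
```

Under this rule blah.pdf and blah.tex no longer collide, as long as neither extension is in the text list.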

Sounds a bit half arsed, but might not be too hard to implement (for
someone who can code), plus also gets the desired result.

?

Denis Defreyne

Apr 16, 2012, 7:28:32 AM
to na...@googlegroups.com
Letting files with extensions not in `text_extensions` include their
extension in the identifier is a step in the right direction, but will
break backwards compatibility.

I was thinking about adding an `add_extensions_below:` attribute,
containing a list of path prefixes. Items with a path starting with
any of those prefixes will have an identifier containing the
extension. For example:

Configuration:

    add_extensions_below: [ '/assets' ]

Files and directories:

    content/
      assets/
        stylesheet.css
      blog.html

Identifiers:

    /assets/stylesheet.css/
    /blog/
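
A minimal sketch of how such a prefix rule could be evaluated. The constant and the method name are hypothetical, and `add_extensions_below` is only the proposed setting, not an existing nanoc option.

```ruby
# Hypothetical value of the proposed add_extensions_below setting.
ADD_EXTENSIONS_BELOW = ['/assets'].freeze

# `path` is relative to content/ and starts with a slash, e.g. "/blog.html".
def identifier_with_prefix_rule(path)
  keep = ADD_EXTENSIONS_BELOW.any? { |prefix| path.start_with?("#{prefix}/") }
  # Outside the listed prefixes, strip the extension as before.
  stem = keep ? path : path.sub(/\.[^.\/]+\z/, '')
  "#{stem}/"
end

identifier_with_prefix_rule('/assets/stylesheet.css') # => "/assets/stylesheet.css/"
identifier_with_prefix_rule('/blog.html')             # => "/blog/"
```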

What do you think?

Denis

Andreas Drop

Apr 16, 2012, 8:19:36 AM
to nanoc
I maintain all static content in a /static directory beside the
/content dir, and use a simple helper method to copy all the files in
there to output, including all subdirs.

If there is a huge number of those files, maybe the copying should be
done with rsync, because it can be set to copy only updated and new
files.
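
A helper of this kind might look roughly like the sketch below. It is not Andy's actual code: the name `copy_static` and the default directory names are assumptions, and it uses only Ruby's standard library, skipping files whose copy in the target is already up to date.

```ruby
require 'fileutils'

# Copy every file under `source` into `target`, keeping subdirectories.
# Files whose copy in `target` is at least as new are skipped.
# Returns the relative paths that were actually copied.
def copy_static(source = 'static', target = 'output')
  copied = []
  Dir.glob('**/*', base: source).sort.each do |relative|
    from = File.join(source, relative)
    next unless File.file?(from)

    to = File.join(target, relative)
    next if File.exist?(to) && File.mtime(to) >= File.mtime(from)

    FileUtils.mkdir_p(File.dirname(to))
    # preserve: true keeps the modification time, so an unchanged file
    # is skipped on the next run.
    FileUtils.cp(from, to, preserve: true)
    copied << relative
  end
  copied
end
```

It could be run after compiling the site, for example from a Rake task. Note that `Dir.glob` skips dotfiles by default, so hidden files would need separate handling.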

Andy



kimo.johnson

Apr 16, 2012, 10:31:13 AM
to na...@googlegroups.com
Hi Denis,

I'm not sure how this would work for my setup. I have my files organized by "publication" where each publication has its own folder and there can be various types of files associated with a publication (images, pdfs, movies, etc). For example, my content directory might look like this:

content/
  publications/
    conference2010/
      paper10.pdf
      index.html
      paper10.tex
    journal2009/
      paper09.pdf
      index.html
      paper09.mov
  blog/
    blog1.html
    blog2.html

Of course, I choose more descriptive names for the papers but the point is that a top level prefix filter would make identifiers for the publications part of the site different from the blog part of the site, which seems awkward to me.

Some of the ideas previously suggested would work for my setup, such as appending the extension to the identifier for unrecognized types or just moving these files without generating an identifier. A variant on the first idea would be defining an 'append_extension' list where files with a type in the list have their extensions appended to the identifier.

Kimo

Sean Davis

Apr 16, 2012, 1:51:30 PM
to na...@googlegroups.com

I would aim for the same model as Kimo, generally, with each folder
representing a page, in some sense. I routinely write a tutorial or a
blog post and want to have supporting files in the same folder for
easy referencing on the web and on the file system. I'm not sure how
common this setup is, though.

Sean



Denis Defreyne

Apr 17, 2012, 2:31:49 AM
to na...@googlegroups.com
On 16 Apr 2012, at 16:31, kimo.johnson wrote:

> I'm not sure how this would work for my setup. I have my files organized by "publication" where each publication has its own folder and there can be various types of files associated with a publication (images, pdfs, movies, etc).

Ahh, in that case an add_extensions_below would not be useful. Two alternative suggestions:

1. keep_extensions: [ 'jpg', 'png', 'js', 'pdf', 'mov' ] -- this will cause the data source not to remove any of the extensions in the list from the identifier. Other extensions are removed. The disadvantage is that this list could become rather long, so perhaps instead of specifying a list of extensions to keep, it is better to do the opposite:

2. remove_extensions: [ 'html', 'md', 'txt', 'css' ] -- this will cause the data source to remove any of the extensions in the list from the identifier. Other extensions are kept. This list can grow fairly large as well, but not as large as a keep_extensions list, I believe. If set to “nil”, all extensions will be removed (the default, for backwards compatibility).

I would also like a keep_extensions_below attribute, because it can be quite useful to people with an assets/ directory which contains all assets.
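
The two options could be sketched in Ruby as below. This is an illustration under assumptions, not an implementation: `identifier_from` is a made-up name, neither option exists in nanoc, and special handling of index files is ignored.

```ruby
# Sketch of the proposed keep_extensions / remove_extensions options.
# Pass at most one of keep: or remove: (extensions without the dot).
# With neither given, every extension is removed (the nil default).
def identifier_from(path, keep: nil, remove: nil)
  ext = File.extname(path).delete_prefix('.')
  strip =
    if keep
      !keep.include?(ext)
    elsif remove
      remove.include?(ext)
    else
      true
    end
  stem = (strip && !ext.empty?) ? path.delete_suffix(".#{ext}") : path
  "#{stem}/"
end

remove = %w[html md txt css]
identifier_from('/journal2009/paper09.pdf', remove: remove) # => "/journal2009/paper09.pdf/"
identifier_from('/journal2009/paper09.mov', remove: remove) # => "/journal2009/paper09.mov/"
identifier_from('/blog/blog1.html', remove: remove)         # => "/blog/blog1/"
```

Either list gives paper09.pdf and paper09.mov distinct identifiers without a per-directory setting.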


kimo.johnson

Apr 17, 2012, 8:28:45 AM
to na...@googlegroups.com

Yes, I believe either option here would work for my setup. Option 2 would probably be a shorter list as you suggested. I can't think of any problems, so I would be interested in such a solution.

Thanks,
Kimo

Carter Charbonneau

Apr 18, 2012, 7:00:02 PM
to na...@googlegroups.com
Another way to do this would be to have a folder alongside content whose items just get copied as-is after content is processed.

Justin Clift

Apr 23, 2012, 1:02:02 AM
to na...@googlegroups.com
On 16/04/2012, at 9:28 PM, Denis Defreyne wrote:
> Letting files with extensions not in `text_extensions` include their
> extension in the identifier is a step in the right direction, but will
> break backwards compatibility.
>
> I was thinking about adding an `add_extensions_below:` attribute,
> containing a list of path prefixes. Items with a path starting with
> any of those prefixes will have an identifier containing the
> extension. For example:
>
> Configuration:
>
>     add_extensions_below: [ '/assets' ]
>
> Files and directories:
>
>     content/
>       assets/
>         stylesheet.css
>       blog.html
>
> Identifiers:
>
>     /assets/stylesheet.css/
>     /blog/
>
> What do you think?

Wouldn't work for one of the major uses of it... importing an
existing code base. :(

For example, we pull in javascript libraries, templating engines,
and so forth. So far, for every one, I have had to manually go
through them and change their filenames and internal references
just to make them work with nanoc. (sometimes painful)

An effect of that, is that we don't really upgrade our javascript
libraries and similar when a new release comes out, unless there's
a _really good reason_ for it. (hasn't been so far :>)

To my thinking (so far), this is an important scenario that the
"right" solution should somehow support. :)

+ Justin

> Denis

Denis Defreyne

Apr 27, 2012, 3:25:56 AM
to na...@googlegroups.com
On 23 Apr 2012, at 07:02, Justin Clift wrote:

> Wouldn't work for one of the major uses of it... importing an
> existing code base. :(
>
> For example, we pull in javascript libraries, templating engines,
> and so forth.

Where do those files all end up? Can you give a more concrete example of what files end up in what directories?

> An effect of that, is that we don't really upgrade our javascript
> libraries and similar when a new release comes out, unless there's
> a _really good reason_ for it. (hasn't been so far :>)

Yes… that certainly is something that needs to be resolved! :)

Cheers

Denis
