a2x non-english language title

candido

Aug 2, 2011, 1:28:44 PM
to asciidoc
Hi,

I am writing a Spanish document whose section titles contain acute
accents. HTML generation works without problems, but a2x fails and no
PDF is produced. If I remove the accents from the titles, the PDF is
generated.

I have tried both UTF-8 and ISO-8859-1 encodings for the asciidoc source,
and a2x fails with either.

The asciidoc source header has language properties like:

:lang: es
:encode: ISO-8859-15

It seems that a2x works with Latin-1 encoded documents by default.

Thanks
Cándido

Lex Trotman

Aug 2, 2011, 8:19:05 PM
to asci...@googlegroups.com

Hi Candido,

My guess would be that your latex engine doesn't support non-ascii
encodings. IIUC xetex supports unicode, but I'm no texpert.

Cheers
Lex


candido

Aug 3, 2011, 8:42:24 AM
to asciidoc
Thanks Lex,

The problem is possibly in how section IDs are synthesized. In the tex
document generated by a2x, I see cross-reference markup (\label) like this:

\section{Comando en línea dblatex}
\label{_comando_en_línea_dblatex}\hyperlabel{_comando_en_lí
nea_dblatex}%

The word "línea", with an acute accent on the i, is not synthesized
correctly, and the resulting "~A" character is not a legal character
in ID text.

If I write the ID explicitly, with the accents, the warning message
is: "missing section".

A temporary workaround is to edit the titles, or to set the IDs
explicitly, without accented characters.

Thanks
Candido


Lex Trotman

Aug 3, 2011, 7:31:28 PM
to asci...@googlegroups.com
On 3 August 2011 22:42, candido <cana...@gmail.com> wrote:
> Thanks Lex,
>
> The problem possibly is the ID section synthesis. I see the tex
> document generated by a2x with cross references markup \label like:
>
> \section{Comando en línea dblatex}
> \label{_comando_en_línea_dblatex}\hyperlabel{_comando_en_lí
> nea_dblatex}%

This problem is in dblatex; asciidoc generates this docbook:

<section id="_comando_en_línea_dblatex">


The small i with acute accent is Unicode U+00ED (0xed), which is a legal XML ID character.

You need to ask on the dblatex ML.

Cheers
Lex
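
A quick way to check the code point is sketched below in Python. It also shows one plausible, unconfirmed explanation for the "~A"-like character reported in the generated tex: the two UTF-8 bytes of í being read as Latin-1. This is only an illustration and does not reproduce what dblatex does internally.

```python
import unicodedata

ch = "\u00ed"  # i with acute accent, as in "línea"

# Code point and official Unicode name of the character.
code_point = hex(ord(ch))
name = unicodedata.name(ch)

# The character takes two bytes in UTF-8 ...
utf8_bytes = ch.encode("utf-8")

# ... and reading those two bytes as Latin-1 yields two unrelated characters.
misread = utf8_bytes.decode("latin-1")

print(code_point, name)
print(utf8_bytes, repr(misread))
```

If the misreading assumption holds, the first of the two resulting characters is a capital A with tilde, which would match the symptom described earlier in the thread.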

Fabrice Flore-Thébault

Nov 7, 2011, 4:55:32 PM
to asci...@googlegroups.com
So for now, since the tools used to convert docbook to PDF do not support accents in ids, there is a workaround that allows accents in titles: manually set each title's id by declaring it just before the title.

For example:

[[this-is-an-id-for-my-title-without-accents]]
=== now i càn have a tîtle with accénts

A more pragmatic, automated approach could be to remove accents from generated ids automatically (but I don't know how to achieve this).

PS: I am posting this answer for the next time I forget this workaround and get stuck, because in French you often have accents in titles.

Lex Trotman

Nov 7, 2011, 5:05:11 PM
to asci...@googlegroups.com
On Tue, Nov 8, 2011 at 8:55 AM, Fabrice Flore-Thébault
<fabr...@gmail.com> wrote:
> So for the present, as tools used to convert docbook to pdf are not
> supporting accents in ids, there is this workaround to use accents in
> titles: manually redefine each title id by declaring it before the title:
>
> for example:
>
> [[this-is-an-id-for-my-title-without-accents]]
> === now i càn have a tîtle with accénts
>
> A more pragmatic/automated approach could be to automatically remove accents
> in generated ids. (But i don't know how to achieve this)

Hi Fabrice,

Doing it manually is indeed a (laborious) workaround.

The problem with just dropping accented characters is that the
resulting id then may not be unique.

Cheers
Lex
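
The uniqueness concern can be illustrated with a minimal Python sketch. The helpers `ascii_id` and `unique_ids` are hypothetical names made up for this example, not asciidoc's actual code; the sketch assumes ids are built by dropping non-ASCII characters and that a numeric suffix is an acceptable way to resolve clashes.

```python
import re

def ascii_id(title):
    """Hypothetical helper: build an id from a title, dropping non-ASCII characters."""
    stripped = title.encode("ascii", "ignore").decode("ascii")
    return "_" + re.sub(r"\W+", "_", stripped).strip("_").lower()

def unique_ids(titles):
    """Hypothetical helper: add a numeric suffix when a generated id is already taken."""
    used = set()
    result = []
    for title in titles:
        base = ascii_id(title)
        candidate, n = base, 1
        while candidate in used:
            n += 1
            candidate = "%s_%d" % (base, n)
        used.add(candidate)
        result.append(candidate)
    return result

# Two different French titles collapse to the same id once accents are dropped.
ids = unique_ids(["Pêche", "Pèche"])
print(ids)
```

The same clash can of course happen with hand-written ids, so some form of de-duplication or manual checking is needed either way.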


Fabrice Flore-Thebault

Nov 7, 2011, 5:23:55 PM
to asci...@googlegroups.com
On 07/11/11 23:05, Lex Trotman wrote:
> On Tue, Nov 8, 2011 at 8:55 AM, Fabrice Flore-Thébault

> <fabr...@gmail.com> wrote:
>> So for the present, as tools used to convert docbook to pdf are not
>> supporting accents in ids, there is this workaround to use accents in
>> titles: manually redefine each title id by declaring it before the title:
>>
>> for example:
>>
>> [[this-is-an-id-for-my-title-without-accents]]
>> === now i càn have a tîtle with accénts

>>
>> A more pragmatic/automated approach could be to automatically remove accents
>> in generated ids. (But i don't know how to achieve this)
>
> Hi Fabrice,
>
> Doing it manually is indeed a (laborious) workaround.
>
> The problem with just dropping accented characters is that the
> resulting id then may not be unique.

Yep, same issue when you create the id manually :/

I don't know whether it would be possible to have an option in asciidoc
conversion such as --no-accents-in-ids? I guess the xhtml output also
removes the accents, since they look so bad in a web browser?

--
Fabrice Flore-Thébault

Stuart Rackham

Nov 7, 2011, 6:05:14 PM
to asci...@googlegroups.com

I committed a patch that should work around the problem: If the 'ascii-ids'
attribute is defined then non-ascii characters are stripped from auto-generated IDs:

http://code.google.com/p/asciidoc/source/detail?r=10e2d4dc4655a2310a321b2b904c14513cea98d5


Cheers, Stuart


Dag Wieers

Nov 8, 2011, 2:33:51 PM
to asci...@googlegroups.com
On Mon, 7 Nov 2011, Fabrice Flore-Thebault wrote:

> On 07/11/11 23:05, Lex Trotman wrote:

>> On Tue, Nov 8, 2011 at 8:55 AM, Fabrice Flore-Thébault
>> <fabr...@gmail.com> wrote:
>> > So for the present, as tools used to convert docbook to pdf are not
>> > supporting accents in ids, there is this workaround to use accents in
>> > titles: manually redefine each title id by declaring it before the
>> > title:
>> >
>> > for example:
>> >
>> > [[this-is-an-id-for-my-title-without-accents]]
>> > === now i càn have a tîtle with accénts
>> >
>> > A more pragmatic/automated approach could be to automatically remove
>> > accents
>> > in generated ids. (But i don't know how to achieve this)
>>

>> Doing it manually is indeed a (laborious) workaround.
>>
>> The problem with just dropping accented characters is that the
>> resulting id then may not be unique.
>
> Yep, same issue when you manually create the id :/
>
> I don't know if it would be possible to have an option in asciidoc conversion
> like --no-accents-in-ids ? I guess the xhtml output removes the accents also,
> as it is so bad looking in a web browser ?

Now that the 'ascii-ids' attribute has been added, we shouldn't stop
trying to get the tools to support accented characters.

I would hate to see this workaround block any progress in the tools. So,
please, Fabrice, do report and track those issues in the tools you use
as a courtesy to AsciiDoc ;-)

--
-- dag wieers, d...@wieers.com, http://dag.wieers.com/
-- dagit linux solutions, in...@dagit.net, http://dagit.net/

[Any errors in spelling, tact or fact are transmission errors]

Simon

Dec 31, 2011, 8:42:26 PM
to asciidoc
Hi list!

Here's a description of a method for a slightly better workaround:

http://www.peterbe.com/plog/unicode-to-ascii

This way, accented letters such as å, ö, ï will become a, o, i, making
the ids a bit more recognizable for some languages. Characters such as
¿ and œ will still get deleted, though.

Happy New Year!
/Simon Kågedal Reimer


Stuart Rackham

Dec 31, 2011, 11:35:14 PM
to asci...@googlegroups.com
Hi Simon

Good suggestion, I've applied a patch to the trunk:
http://code.google.com/p/asciidoc/source/detail?r=29361394ca62637dc9c34c7d7f09656ddd41944a


Cheers, Stuart
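
For readers who want to see the idea in isolation, here is a minimal Python sketch of the usual standard-library technique: decompose each character with NFKD normalization, then drop whatever is still non-ASCII. The function name `to_ascii` is made up for this example, and the thread does not confirm that the linked article or the committed patch does exactly this.

```python
import unicodedata

def to_ascii(text):
    """Decompose accented letters (NFKD) and drop whatever is still non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

# Accented letters keep their base letter.
print(to_ascii("Comando en línea dblatex"))
print(to_ascii("å ö ï"))

# Characters with no decomposition, such as ¿ and œ, are dropped entirely.
print(repr(to_ascii("¿œ")))
```

Compared with stripping non-ASCII characters outright, this keeps the ids readable, but distinct titles can still end up with the same id.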


Eduardo Santana

May 19, 2015, 2:11:58 PM
to asci...@googlegroups.com
There is a solution for dblatex, so we don't have to use `ascii-ids`.

http://aerostitch.github.io/misc/asciidoc/asciidoc-title_uft8.html

Just add:

--dblatex-opts="--param=latex.encoding=utf8"
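
When a2x is driven from a script, the option can be passed along as in the Python sketch below. The `--dblatex-opts` value is the one given above; the function name `a2x_pdf_command`, the `-f pdf` format choice and the file name `manual.txt` are assumptions made for this example.

```python
import shlex

def a2x_pdf_command(source, latex_encoding="utf8"):
    """Build an a2x argument list that forwards the encoding parameter to dblatex."""
    dblatex_opts = "--param=latex.encoding=%s" % latex_encoding
    return ["a2x", "-f", "pdf", "--dblatex-opts=" + dblatex_opts, source]

cmd = a2x_pdf_command("manual.txt")  # "manual.txt" is a placeholder file name
print(" ".join(shlex.quote(arg) for arg in cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell-quoting problems if the source file name contains spaces or accented characters.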