osascript and encoding of CLI arguments

Thomas Kaiser

Jan 19, 2006, 6:17:57 AM
Hi all,

I recently ran into the following problem: when calling an
AppleScript via osascript from within the shell, AppleScript assumes the
parameters supplied on the command line are encoded in MacRoman (regardless
of system environment settings such as $LANG and so on). I've already taken
a close look at

<http://developer.apple.com/technotes/tn2002/tn2065.html>

but that didn't give me a hint how to make AppleScript interpret the
'argv' arguments in a different encoding -- preferably UTF-8.

My script (saved as encoding-test.scpt) looks like

    on run argv
        set FirstFile to POSIX file (item 1 of argv)
        return FirstFile & (system attribute "LANG")
    end run

When I call it with a file name that contains special characters such as
German umlauts (for example "Hüttenkäse" ;-)

osascript encoding-test.scpt /Users/tk/Hüttenkäse

then AppleScript assumes the parameters in argv are encoded in MacRoman
instead of UTF-8, which I would consider a reasonable default:

powerbook-tk:~ tk$ osascript encoding-test.scpt /Users/tk/Hüttenkäse
file MacOS X:Users:tk:H√ºttenk√§se, de_DE.UTF-8

My $LANG is set to "de_DE.UTF-8" as you can see from the output. Any ideas
how to solve the problem?
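
The garbling in the output above can be reproduced outside AppleScript. The following is a minimal Python sketch, assuming the argument reaches osascript as precomposed UTF-8 bytes and is then read as MacRoman; the function name is made up for illustration. A terminal or archive that cannot display "√" may show it as "?".

```python
def misread_as_macroman(text: str) -> str:
    """Encode text as UTF-8, then decode the same bytes as MacRoman.

    This mimics (as an assumption) what happens when UTF-8 command line
    bytes are treated as a MacRoman string.
    """
    return text.encode("utf-8").decode("mac_roman")


# "Hüttenkäse" written with escapes so the precomposed form is unambiguous.
PATH = "/Users/tk/H\u00fcttenk\u00e4se"

if __name__ == "__main__":
    # ü is the bytes C3 BC in UTF-8; in MacRoman, C3 is "√" and BC is "º".
    # ä is the bytes C3 A4 in UTF-8; in MacRoman, A4 is "§".
    print(misread_as_macroman(PATH))  # prints /Users/tk/H√ºttenk√§se
```

Plain ASCII parts of the path pass through unchanged, which is why only the umlauts are affected.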

TIA,

Thomas

Simon Slavin

Jan 21, 2006, 6:44:23 PM
On 19/01/2006, Thomas Kaiser wrote in message
<dqnsh5$t9o$1@svr7.m-online.net>:


> When I call it with a file name that contains special characters such as
> German umlauts (for example "Hüttenkäse" ;-)
>
> osascript encoding-test.scpt /Users/tk/Hüttenkäse
>
> then AppleScript assumes the parameters in argv are encoded in MacRoman
> instead of UTF-8, which I would consider a reasonable default:
>
> powerbook-tk:~ tk$ osascript encoding-test.scpt /Users/tk/Hüttenkäse
> file MacOS X:Users:tk:H?ºttenk?§se, de_DE.UTF-8

Try something like

set argvInternational to argv as international text

Simon.
--
http://www.hearsay.demon.co.uk

Thomas Kaiser

Jan 22, 2006, 5:22:13 AM
Simon Slavin wrote on 2006-01-21 in <news:dqui9s$4pr$1$830f...@news.demon.co.uk>

> On 19/01/2006, Thomas Kaiser wrote
>> When I call it with a file name that contains special characters such as
>> German umlauts (for example "Hüttenkäse" ;-)
>>
>> osascript encoding-test.scpt /Users/tk/Hüttenkäse
>>
>> then AppleScript assumes the parameters in argv are encoded in MacRoman
>> instead of UTF-8 [...]

>
> Try something like
>
> set argvInternational to argv as international text

Well, it *is* already interpreted as international text (MacRoman). What I
want is something like

set argvUTF8 to argv as «class utf8»

which does not work. File names on HFS+ filesystems are stored as decomposed
UTF-8, so I need a way to make AppleScript interpret them in that encoding.
No luck so far.
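
To illustrate what "decomposed" means for such a name, here is a small Python sketch. It assumes standard Unicode NFD is a close enough stand-in for the HFS+ decomposition (HFS+ uses its own variant of it); the helper names are hypothetical.

```python
import unicodedata


def decompose(name: str) -> str:
    """Return the canonically decomposed (NFD) form of a name."""
    return unicodedata.normalize("NFD", name)


def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of text as space-separated hex pairs."""
    return " ".join(f"{byte:02x}" for byte in text.encode("utf-8"))


if __name__ == "__main__":
    composed = "\u00fc"  # precomposed "ü", as typically typed in a terminal
    print(utf8_hex(composed))             # prints c3 bc
    print(utf8_hex(decompose(composed)))  # prints 75 cc 88
```

In the decomposed form the umlaut becomes a plain "u" followed by a combining diaeresis (U+0308), so the same visible name has different bytes depending on where it came from.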

BTW: It seems to work when I convert the UTF-8 strings to MacRoman in the
shell before calling osascript [1], but some characters may not be
mappable to MacRoman at all, so this isn't a real solution either :-(

Regards,

Thomas

[1] echo "$filename" | iconv -f UTF-8-MAC -t MACROMAN
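
Whether a given name survives the conversion in [1] can be checked up front. The following Python sketch is only an approximation of that iconv call: it assumes NFC normalization stands in for what the UTF-8-MAC source encoding does with decomposed names, and `to_macroman` is a hypothetical helper name.

```python
import unicodedata
from typing import Optional


def to_macroman(name: str) -> Optional[bytes]:
    """Return name encoded as MacRoman, or None if it cannot be mapped.

    The name is recomposed (NFC) first, because MacRoman has no separate
    combining diaeresis; it only has precomposed letters such as "ü".
    """
    composed = unicodedata.normalize("NFC", name)
    try:
        return composed.encode("mac_roman")
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    # Decomposed "Hüttenkäse" maps fine once recomposed.
    print(to_macroman("Hu\u0308ttenka\u0308se") is not None)  # prints True
    # A Japanese name has no MacRoman representation.
    print(to_macroman("\u65e5\u672c") is not None)            # prints False
```

A caller could fall back to another mechanism whenever the helper returns None, which is exactly the case the workaround cannot handle.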

David Phillip Oster

Jan 23, 2006, 12:30:53 PM
In article <dqnsh5$t9o$1...@svr7.m-online.net>,
Thomas Kaiser <Thomas...@phg-online.de> wrote:

> Hi all,
>
> I recently ran into the following problem: when calling an
> AppleScript via osascript from within the shell, AppleScript assumes the
> parameters supplied on the command line are encoded in MacRoman (regardless
> of system environment settings such as $LANG and so on). I've already taken
> a close look at
>
> <http://developer.apple.com/technotes/tn2002/tn2065.html>
>
> but that didn't give me a hint how to make AppleScript interpret the
> 'argv' arguments in a different encoding -- preferably UTF-8.

In the OS X 10.3 days, osascript didn't support passing arguments at all.

I wrote osasubr, an open-source replacement that did, and I make it
available with the C++ source code at:
<http://www.turbozen.com/mac/osasubr/>

It would be easy to modify the source to convert UTF-8 shell
command line arguments to aeDescriptors of type international text
('utxt'; CFString provides conversion routines), then pass those as
arguments to OSADoEvent().

Thomas Kaiser

Jan 23, 2006, 3:58:42 PM
David Phillip Oster wrote at 2006-01-23 in <news:oster-69312D....@newsclstr02.news.prodigy.com>

> In the OS X 10.3 days, osascript didn't support passing arguments at all.

You can export environment variables and read them via 'system
attribute', but the same problem applies to that approach: AS always
interprets 'system attribute' values as well as CLI arguments as
international text.

> I wrote osasubr, an open-source replacement that did, and I make it
> available with the C++ source code at:
> <http://www.turbozen.com/mac/osasubr/>
>
> It would be easy to modify the source to convert UTF-8 shell
> command line arguments to aeDescriptors of type international text
> ('utxt'; CFString provides conversion routines), then pass those as
> arguments to OSADoEvent().

O.k.

But that's the opposite of what I want/need, which is to have the arguments
interpreted as Unicode.

BTW: You say in the docs:

You can pass a Unix (Posix) path for the file argument.

Does this mean you do some sort of encoding conversion, or do you pass the
POSIX path through unchanged?

Regards,

Thomas

David Phillip Oster

Jan 24, 2006, 12:01:27 AM
In article <dr3g22$fap$1...@svr7.m-online.net>,
Thomas Kaiser <Thomas...@phg-online.de> wrote:

> BTW: You say in the docs:
>
> You can pass a Unix (Posix) path for the file argument.
>
> Does this mean you do some sort of encoding conversion, or do you pass the
> POSIX path through unchanged?

It means that the argument to 'osasubr' naming the file that holds the
AppleScript can be a POSIX path. It is just used to access the file
holding the AppleScript. The next argument is the routine within the
file to run, and the following arguments are passed to that routine.

has

Jan 24, 2006, 2:22:52 PM
Thomas Kaiser wrote:

> I recently ran into the following problem: when calling an
> AppleScript via osascript from within the shell, AppleScript assumes the
> parameters supplied on the command line are encoded in MacRoman

I'd say it's a design flaw in osascript. While there are good reasons
for passing all input as raw data and letting the script perform any
conversions necessary (this is what more traditional scripting
languages do), the limited nature of AS renders those moot and the
logical thing would be for osascript to treat all input as UTF8 data
and pass it to AS as Unicode text values. (It already treats return
values as Unicode text, writing results to stdout as UTF8-encoded
data.) Instead, it's packing arguments into AS strings, which AS
interprets according to your system's primary encoding (MacRoman in
your case).

There's no way to convert string values containing raw UTF8 data to
Unicode text values within AS itself, and it gets rather awkward trying
to do it via 'do shell script', so the easiest solution would be to use
an osax or similar to perform the conversion for you. For example,
using TextCommands <http://osaxen.com/files/textcommands1.0.1.html>:

    on run argv
        -- kludge to convert arguments to Unicode text
        tell application "TextCommands"
            repeat with itemRef in argv
                set itemRef's contents to convert to unicode itemRef from "utf8"
            end repeat
        end tell
        -- do stuff with argv...
    end run
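
Conceptually, the conversion requested in the kludge above amounts to taking back the bytes that were misread as MacRoman and decoding them as UTF-8 instead. The Python sketch below illustrates that idea only; it is not how TextCommands is implemented, the function names are hypothetical, and it assumes the primary encoding is MacRoman and that the misreading lost no bytes.

```python
from typing import List


def reinterpret_as_utf8(misread: str) -> str:
    """Undo a MacRoman misreading of UTF-8 data.

    Encoding back to MacRoman recovers the original bytes; decoding those
    bytes as UTF-8 yields the intended text. Raises UnicodeError if the
    input was not produced by such a misreading.
    """
    return misread.encode("mac_roman").decode("utf-8")


def fix_args(argv: List[str]) -> List[str]:
    """Apply the reinterpretation to every argument, like the repeat loop."""
    return [reinterpret_as_utf8(arg) for arg in argv]


if __name__ == "__main__":
    garbled = ["/Users/tk/H\u221a\u00battenk\u221a\u00a7se"]  # H√ºttenk√§se
    print(fix_args(garbled))  # prints ['/Users/tk/Hüttenkäse']
```

Doing the repair per argument mirrors the loop over argv, so arguments that are pure ASCII come out unchanged.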

I'd also suggest you file a bug report with Apple, explaining why
arguments really ought to be treated as UTF8.

...

Alternatively, if you're a regular command line user, consider using
some other language than AppleScript. Perl, Python, Ruby and Tcl are
all native to Unix and all support application scripting to some degree
or other (Perl and Python are best). I can certainly vouch for Python
as a command line-friendly alternative to AS (I wrote the bridge for
it).

HTH
