Unicode characters as data input

pi

Mar 26, 2010, 10:21:21 PM
to robotframework-users
Hi,

I am wondering if there is a way to use unicode characters as data
input.

For example, one can write a carriage return with a \ (backslash) escape,
as in "text\rmore text", so that Robot parses \r as a carriage return
instead of a plain r character.

what if I want to use unicode such as "text\uxxxx\more text"?

Is it possible?

Thanks,
-Snejana

Pekka Klärck

Mar 29, 2010, 7:42:35 PM
to bug...@gmail.com, robotframework-users
2010/3/27 pi <bug...@gmail.com>:

>
> I am wondering if there is a way to use unicode characters as data
> input.

Yes it is. You can actually use real characters in your data if you
follow these format specific rules:

- With HTML test data you need to specify the correct encoding in the
head section of the file (using a META element). Alternatively, in HTML
you can use entity references such as "&auml;".
- With TXT and TSV formats you need to encode your files using UTF-8.
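A minimal sketch, assuming Python 3, of the two format rules above: an HTML entity reference such as "&auml;" stands for the real character, and a plain text file written with UTF-8 encoding stores "ä" as the two bytes 0xC3 0xA4. The file name `example.txt` and its test content are hypothetical.

```python
import html
import os
import tempfile

# An HTML entity reference decodes to the actual character.
assert html.unescape("&auml;") == "\u00e4"

# For TXT/TSV test data, write the real characters and save the file as UTF-8.
# The file name and test content are hypothetical examples.
content = "*** Test Cases ***\nExample\n    Log    \u00e4\u00f6\u00e5\n"
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(content)

# The raw bytes contain the UTF-8 encoding of "ä" (0xC3 0xA4) exactly once.
with open(path, "rb") as f:
    raw = f.read()
print(raw.count(b"\xc3\xa4"))  # 1
```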

For more information, see the format specific sections in the User Guide:
http://robotframework.googlecode.com/svn/tags/robotframework-2.1.3/doc/userguide/RobotFrameworkUserGuide.html#test-data-syntax

Another solution is using variables. Keywords can return values
containing Unicode, and if you need many strings, or want to be able
to use different strings on different runs, you can use variable
files:
http://robotframework.googlecode.com/svn/tags/robotframework-2.1.3/doc/userguide/RobotFrameworkUserGuide.html#variable-files
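As a sketch of the "keywords can return Unicode" approach, assuming a Python 3 keyword library: the module name `UnicodeKeywords.py` and the function below are hypothetical. Robot Framework exposes module-level functions of a library as keywords, so this one would be used as `Get Unicode Character`, e.g. `${char}=    Get Unicode Character    00e4` after `Library    UnicodeKeywords.py`.

```python
# UnicodeKeywords.py (hypothetical library name)

def get_unicode_character(code_point):
    """Return the character for a hexadecimal code point such as '00e4'.

    Arguments from test data arrive as strings, so the hex digits are
    parsed here. chr() accepts the full range up to 10FFFF, so values
    above FFFF (e.g. '1F600') work as well.
    """
    return chr(int(code_point, 16))
```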

> For example, one can escape carriage return with \ (backslash) such
> as  "text\rmore text" which will prevent robot from parsing r
> character
>
> what if I want to use unicode such as "text\uxxxx\more text"?
>
> Is it possible?

Unicode escape sequences like this aren't supported. We could probably
add support for them relatively easily, but because you can just use
the actual characters, I don't see a big need for that.
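For illustration only, support of the kind mentioned above could look roughly like this Python sketch. It is not Robot Framework code; the function name `decode_unicode_escapes` is hypothetical, and only a backslash, `u` and exactly four hex digits are treated as an escape.

```python
import re

# A literal backslash, 'u', and exactly four hexadecimal digits.
_UNICODE_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")

def decode_unicode_escapes(text):
    """Replace every literal \\uXXXX sequence in text with the character it names."""
    return _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

raw = r"Google\u000DScholar"   # 19 characters: the escape is still plain text
decoded = decode_unicode_escapes(raw)
print(repr(decoded))            # 'Google\rScholar'
```

Sequences that are not followed by four hex digits (e.g. a literal `\uzzzz`) are left unchanged.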

Cheers,
.peke
--
Agile Tester/Developer/Consultant :: http://eliga.fi
Lead Developer of Robot Framework :: http://robotframework.org

Shark

Mar 30, 2010, 10:48:16 AM
to Pekka Klärck, robotframework-users

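I used the following snippet of Java code to parse Robot input data like Google\\u000DScholar (\\u000D is a carriage return):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// cp holds the raw input text; each escape is a backslash, 'u' and four hex digits.
String codePoint = cp;
Pattern pattern = Pattern.compile("\\\\u[0-9A-Fa-f]{4}");
Matcher matcher = pattern.matcher(cp);
while (matcher.find()) {
    // Drop the backslash and 'u', then parse the four hex digits.
    char unicode = (char) Integer.parseInt(matcher.group().substring(2), 16);
    codePoint = codePoint.replace(matcher.group(), String.valueOf(unicode));
}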

This gives me Google, an actual carriage return (\u000D) and Scholar, which is printed like:
Google
Scholar

Without this code snippet, the literal string Google\u000DScholar is returned.

What did you mean by "you can just use those actual characters"? \uxxxx is a valid Unicode string, where xxxx can be anything in the range 0000-FFFF.

Shark

Mar 30, 2010, 10:54:06 AM
to Pekka Klärck, robotframework-users
Forgot to mention:

Of course, Robot supports \r as a carriage return, but the goal of the test is to make sure the application can process Unicode strings handed over by the user.

For that, the Java snippet is sufficient.

It would be nice if support for Unicode escape sequences were added some time in the future, though I agree it is not a high priority.

Thanks,
Snejana

Pekka Klärck

Mar 30, 2010, 4:03:20 PM
to Shark, robotframework-users
2010/3/30 Shark <bug...@gmail.com>:

>
> I used following snipped of java code to parse robot input data like
> Google\\u000DScholar (\\u000D is a carriage return)
>
> Pattern pattern = Pattern.compile("\\\\u[\\w]{4}");
> Matcher matcher = pattern.matcher(cp);
> while (matcher.find()){
> char unicode = (char) Integer.parseInt(matcher.group().substring(2),16);
> codePoint = codePoint.replace(matcher.group(), unicode + "");
> }
> This returns me Google'\u000D'Scholar, which is printed like:
> Google
> Scholar
> Without this code snippet, literal string Google\u000DScholar is returned.

Yes, your code decodes the escape sequences so that \u000D is
converted to a carriage return.

> What did you mean by "you can just use those actual characters"? \uxxxx is a
> valid unicode string, where xxxx in this case can be anything in a range
> 0000-FFFF

'\uxxxx' is a Unicode escape sequence that Java and many other
programming languages support. This syntax isn't, however, supported
in Robot Framework test data. Instead of using those escape
sequences you can use the actual characters. For example, instead of
'\u00e4' you can just use 'ä'. For this to work, you just need to take
care of the encoding as I explained in the previous mail.
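The difference can be illustrated in Python, used here only as one example of a language with this syntax: inside a normal string literal, `\u00e4` becomes `ä` when the source code is parsed, while the same six characters read from a data file (simulated below with a raw string) stay as plain text unless something decodes them.

```python
# In source code, the escape is decoded when the literal is parsed.
escaped = "\u00e4"
actual = "ä"
print(escaped == actual, len(escaped))      # True 1

# Read from data (simulated with a raw string), the escape is just six characters.
from_data = r"\u00e4"
print(from_data == actual, len(from_data))  # False 6
```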

If you really want to use Unicode escape sequences instead of the real
characters, you can create variables using them. You could create and
return a Unicode character e.g. in your Java keyword, but if you need
more of them, variable files are probably a better approach. You could,
for example, have this code in a `myvars.py` file:

auml = u'\u00e4'
cr = u'\u000d'

and then use them in your test data like:

*** Settings ***
Variables    myvars.py

*** Test Cases ***
Example
    Log    ${AUML}
    Log    ${CR}
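Variable files can also define a `get_variables` function instead of module-level names; Robot Framework then creates a variable for each key of the dictionary it returns, and arguments given after the file name in the `Variables` setting are passed to the function. A sketch under those assumptions, where the file name `dynvars.py`, the argument and the variable names are hypothetical:

```python
# dynvars.py (hypothetical file name)

def get_variables(code_point="00e4"):
    """Return variables as a dict; each key becomes a ${NAME} variable.

    code_point is an optional hex value given after the file name,
    e.g. "Variables    dynvars.py    20ac", so runs can use different characters.
    """
    return {
        "CHAR": chr(int(code_point, 16)),  # character selected by the argument
        "CR": "\u000d",                    # carriage return
    }
```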

Cheers,
.peke

Pekka Klärck

Mar 30, 2010, 4:07:19 PM
to Shark, robotframework-users
2010/3/30 Shark <bug...@gmail.com>:

> Forgot to mention:
> of course, robot supports \r as a carriage return, but the goal of the test
> is to make sure application can process unicode string handed from user.
> Thus, the snippet of java is sufficient.

I recommend trying out variable files.

> It would be nice if support for unicode sequences is added some time in the
> future, though I agree there is no high priority

I agree that direct support for \uxxxx escape sequences would sometimes
be handy. Please submit an enhancement request about it to the issue
tracker.

Cheers,
.peke
