"'abc'.split('')" gives me a "ValueError: empty separator".
However, "''.join(['a', 'b', 'c'])" gives me "'abc'".
Why this asymmetry? I was under the impression that the two would be
complementary.
Uli
Anyway, if you want to turn a string into a list of single-character
strings, then use
list("abcde")
DaveA
Maybe it isn't quite obvious what the behaviour in this case should be;
re.split also accepts an empty delimiter (and, in this Python version,
returns the original string unsplit):
>>> re.split("", "abcde")
['abcde']
If you need to split the string into a list of single characters,
as in your example, list() is one way to do it:
>>> list("abcde")
['a', 'b', 'c', 'd', 'e']
>>>
vbr
Most of the file names are plain ASCII, but about 10% of them have Unicode
characters in them. When I try to print the string containing the name, I get
an exception:
'ascii' codec can't encode character '\udce9' in position 37: ordinal not in range(128)
The string is:
'./Julio_Iglesias-Un_Hombre_Solo-05-Qu\udce9_no_se_rompa_la_noche.mp3'
This is on a Windows XP system, using Python 3.1, which I compiled
with the Cygwin Linux compatibility layer.
Can you tell me what encoding I need to print \udce9 and how to set Python to
that encoding mode?
thanks,
jeff
That looks like a "surrogate escape" (see PEP 383,
http://www.python.org/dev/peps/pep-0383/). It indicates that the byte 0xE9
in the filename could not be decoded, i.e. the wrong encoding was used to
decode the filename.
-Mark
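For illustration, here is a minimal sketch of how such a surrogate escape can arise. It assumes the name is stored on disk as Latin-1 bytes and that Python decodes it as UTF-8 with the "surrogateescape" error handler from PEP 383; both encodings are assumptions, since the thread does not confirm them.

```python
# Assumed raw filename bytes: "é" stored as the single Latin-1 byte 0xE9.
raw = b"Qu\xe9_no_se_rompa_la_noche.mp3"

# 0xE9 is not valid UTF-8 here, so the handler maps it to the lone
# surrogate U+DC00 + 0xE9 = U+DCE9 instead of raising an error.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))

# Encoding with the same handler restores the original bytes exactly.
restored = name.encode("utf-8", "surrogateescape")
print(restored == raw)
```

The round trip is lossless, which is why the file can still be opened even though its name cannot be printed.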
That seems likely. How do I set the encoding to something correct to
decode the filename?
Clearly Windows knows how to display it.
I suspect that since I compiled Python with Cygwin, it is using a
POSIX standard rather than a Windows-specific one. Of course, ideally I
would like my code to work on Linux as well as Windows, as I back up all
of my data to a Linux machine with Samba.
thanks,
jeff
Have you perhaps tried using the native Windows version of Python?
Cheers,
Chris
--
http://blog.rebertia.com
I fell into this trap some time ago too.
There is no such string method.
The opposite of "".join(aListOfChars) is
list(aString)
- Hendrik
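A short sketch of that round trip, using only built-ins (the variable names are just for illustration):

```python
text = "abc"

# list() iterates over the string and yields one-character strings.
chars = list(text)
print(chars)

# "".join() is the inverse: it concatenates the pieces with nothing between.
rebuilt = "".join(chars)
print(rebuilt == text)
```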
>>That looks like a "surrogate escape" (See PEP 383)
>>http://www.python.org/dev/peps/pep-0383/. It indicates the wrong
>>encoding was used to decode the filename.
>
> That seems likely. How do I set the encoding to something correct to
> decode the filename?
>
> Clearly windows knows how to display it.
> I suspect since I complied python with cygwin, that it is using a
> POSIX standard,
> rather than a windows specific standard. Of course ideally, I would
> like my code to work
> on linux as well as windows, as I back up all of my data to a linux
> machine with
> samba.
>
If you are running on a Linux system then the filenames are stored encoded
as bytes but the system does not store the encoding. In fact different
files in the same directory could use different encodings. That's why
Python 3.1 uses the surrogate escapes so that you can at least work with
the files even if you can't display the filenames.
If you are running on Windows and using the native Python to access an
NTFS-formatted partition, then there shouldn't be a problem: the filenames
are stored as Unicode and Python uses the Unicode APIs. Of course, you may
still not be able to display the filenames if they contain characters not
available in your output codepage.
If you use Cygwin, a quick search on Google turned up some old discussions
implying that it uses the 8-bit APIs, which convert characters using the
current codepage and replace characters they cannot handle with '?'. I have
no idea whether that still applies.
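As a rough sketch of how one might work with such a name in the meantime: the helpers below are hypothetical (not part of any library), and the default encodings are guesses that would need to match the real system. The first recovers the raw bytes and re-decodes them with a guessed encoding; the second produces an ASCII-safe form for printing.

```python
def recover_name(name, fs_encoding="utf-8", guessed_encoding="latin-1"):
    """Hypothetical helper: undo the surrogate escapes, then decode again.

    fs_encoding is the encoding Python is assumed to have used for the
    filename; guessed_encoding is an assumption about the real one.
    """
    raw = name.encode(fs_encoding, "surrogateescape")
    return raw.decode(guessed_encoding)


def printable(name):
    """Hypothetical helper: replace unencodable characters with \\u escapes."""
    return name.encode("ascii", "backslashreplace").decode("ascii")


escaped = "./Julio_Iglesias-Un_Hombre_Solo-05-Qu\udce9_no_se_rompa_la_noche.mp3"
print(printable(escaped))                 # safe on an ASCII-only console
fixed = recover_name(escaped)
print(printable(fixed))                   # now shows \xe9 instead of \udce9
```

If the guess is wrong, the recovered name will contain the wrong characters, so this is only useful when the encoding of the backup source is known.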
> Hi!
>
> "'abc'.split('')" gives me a "ValueError: empty separator". However,
> "''.join(['a', 'b', 'c'])" gives me "'abc'".
>
> Why this asymmetry?
The docs say
"If sep is given, consecutive delimiters are not grouped together and are
deemed to delimit empty strings (for example, '1,,2'.split(',') returns
['1', '', '2']). "
Now suppose sep = ''. That means split() should return an infinitely
long list of empty strings! Because if sep = '' then the
string 'hello' starts with an empty string followed by sep
followed by an empty string followed by sep followed by an
empty string followed by sep... that's all before we get to
the 'h'.
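The documented behaviour and the error can be checked with a small sketch like this one (variable names are only for illustration):

```python
# Consecutive delimiters delimit empty strings, as the docs describe.
parts = "1,,2".split(",")
print(parts)

# For a non-empty separator, join() undoes split() exactly.
print(",".join(parts) == "1,,2")

# An empty separator has no well-defined result, so split() refuses it.
try:
    "abc".split("")
except ValueError as exc:
    message = str(exc)
    print("ValueError:", message)
```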