SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

zljubi...@gmail.com

May 8, 2015, 3:01:06 PM

The script is very simple (abc.txt exists in the ROOTDIR directory):

import os
import shutil

ROOTDIR = 'C:\Users\zoran'

file1 = os.path.join(ROOTDIR, 'abc.txt')
file2 = os.path.join(ROOTDIR, 'def.txt')

shutil.move(file1, file2)

But it returns the following error:

C:\Python34\python.exe C:/Users/bckslash_test.py
File "C:/Users/bckslash_test.py", line 4
ROOTDIR = 'C:\Users'
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

Process finished with exit code 1

As I saw, I could solve the problem by changing line 4 to use a raw string (a lowercase "r" before the string):
ROOTDIR = r'C:\Users\zoran'

but that is not an option for me because I am using configparser to read ROOTDIR from the underlying cfg file.

I need a mechanism to read the path string with single backslashes into a variable, and then escape every backslash in it.

How can I do that?

rand...@fastmail.us

May 8, 2015, 3:30:11 PM

to pytho...@python.org

On Fri, May 8, 2015, at 15:00, zljubi...@gmail.com wrote:
> As I saw, I could solve the problem by changing line 4 to use a raw string
> (a lowercase "r" before the string):
> ROOTDIR = r'C:\Users\zoran'
>
> but that is not an option for me because I am using configparser in order
> to read the ROOTDIR from underlying cfg file.

configparser won't have that problem, since "escaping" is only an issue
for python source code. No escaping for backslashes is necessary in
files read by configparser.

>>> import sys
>>> import configparser
>>> config = configparser.ConfigParser()
>>> config['DEFAULT'] = {'ROOTDIR': r'C:\Users\zoran'}
>>> config.write(sys.stdout)
[DEFAULT]
rootdir = C:\Users\zoran
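
The reverse direction can be checked the same way. The following is a minimal sketch, assuming a [Dir] section and a ROOTDIR option like the ones the original poster describes later in the thread; the text is parsed from a string instead of a real cfg file so the example is self-contained.

```python
import configparser

# Stand-in for the contents of a cfg file. The raw triple-quoted string is
# only needed because this text lives in Python source for the demo; a real
# file on disk needs no special treatment.
cfg_text = r"""
[Dir]
ROOTDIR = C:\Users\zoran
"""

config = configparser.ConfigParser()
config.read_string(cfg_text)

rootdir = config.get('Dir', 'ROOTDIR')

# The value comes back with single backslashes, exactly as written.
print(rootdir)
print(len(rootdir))
```

The value is ordinary data by the time Python sees it, so no escape processing is applied to it.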

MRAB

May 8, 2015, 3:36:49 PM

to pytho...@python.org

If you're reading the path from a file, it's not a problem. Try it!

zljubi...@gmail.com

May 8, 2015, 4:40:09 PM

Thanks for clarifying.
Looks like the error message was wrong.
On Windows NTFS I had a file name longer than 259 characters, which is the Windows limit.
After cutting the file name to 259 characters, everything works as it should.
If I cut the file name to 260 characters, I get the error from the subject line, which is wrong.

Anyway, case closed. Thank you very much; I had suspected that something was wrong with configparser.

Best regards.

Chris Angelico

May 8, 2015, 6:55:03 PM

to pytho...@python.org

On Sat, May 9, 2015 at 5:00 AM, <zljubi...@gmail.com> wrote:
> But it returns the following error:
>
>
> C:\Python34\python.exe C:/Users/bckslash_test.py
> File "C:/Users/bckslash_test.py", line 4
> ROOTDIR = 'C:\Users'
> ^
> SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

Strong suggestion: Use forward slashes for everything other than what
you show to a human - and maybe even then (some programs have always
printed stuff out that way - zip/unzip, for instance). The backslash
has special meaning in many contexts, and you'll just save yourself so
much trouble...

ROOTDIR = 'C:/Users/zoran'

Problem solved!
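
As a rough illustration of why this works, the sketch below compares the three common spellings of the same Windows path. It uses the standard-library ntpath module (the Windows flavour of os.path) so that it behaves the same on any platform; the variable names are only for illustration.

```python
import ntpath  # Windows path rules, importable on any platform

raw = r'C:\Users\zoran'        # raw string: backslashes kept as typed
doubled = 'C:\\Users\\zoran'   # every backslash escaped by hand
forward = 'C:/Users/zoran'     # forward slashes: nothing to escape

# The raw and doubled spellings produce the identical string.
same_literal = raw == doubled

# Normalising the forward-slash spelling yields the backslash form.
same_after_normalising = ntpath.normpath(forward) == raw

print(same_literal, same_after_normalising)
```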

ChrisA

Steven D'Aprano

May 8, 2015, 10:53:14 PM

On Sat, 9 May 2015 06:39 am, zljubi...@gmail.com wrote:

> Thanks for clarifying.
> Looks like the error message was wrong.

No, the error message was right.

Your problem was that you used backslashes in *Python program code*, rather
than reading it from a text file.

In Python, a string-literal containing \U is an escape sequence which
expects exactly 8 hexadecimal digits to follow:

py> path = '~~~~\U000000a7~~~~'
py> print(path)
~~~~§~~~~

If you don't follow the \U with eight hex digits, you get an error:

py> path = '~~~~\Users~~~~'
File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 4-6: truncated \UXXXXXXXX escape

This applies only to string literals in code. For data read from files,
backslash \ is just an ordinary character which has no special meaning.
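
The compile-time nature of the error can be shown without crashing a script by handing the source text to the built-in compile(). This is only a sketch; try_compile is a hypothetical helper name, not something from the thread.

```python
def try_compile(source):
    """Return None if the source compiles, else the SyntaxError message."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None


# Two one-line programs, held as text. The outer raw strings just let us
# write the inner program with single backslashes.
plain = r"ROOTDIR = 'C:\Users\zoran'"    # ordinary literal: \U starts an escape
raw = r"ROOTDIR = r'C:\Users\zoran'"     # raw literal: backslashes are literal

plain_error = try_compile(plain)
raw_error = try_compile(raw)

print(plain_error)
print(raw_error)
```

The first program is rejected before any of it runs, which is why no run-time condition such as a long file name can produce this message.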

> On windows ntfs I had a file name more than 259 characters which is Windows
> limit. After cutting file name to 259 characters everything works as it
> should. If I cut file name to 260 characters I get the error from subject
> which is wrong.

What you describe is impossible. You cannot possibly get a SyntaxError at
compile time because the path is too long. You must have made other changes
at the same time, such as using a raw string r'C: ... \Users\ ...'.

--
Steven

zljubi...@gmail.com

May 9, 2015, 6:31:17 AM

Steven,

please look at the code below:

# C:\Users\zoran\PycharmProjects\mm_align\hrt3.cfg contents
# [Dir]
# ROOTDIR = C:\Users\zoran\hrt

import os
import shutil
import configparser
import requests
import re

Config = configparser.ConfigParser()
Config.optionxform = str # preserve case in ini file
cfg_file = os.path.join('C:\\Users\\zoran\\PycharmProjects\\mm_align\\hrt3.cfg' )
Config.read(cfg_file)

ROOTDIR = Config.get('Dir', 'ROOTDIR')

print(ROOTDIR)

html = requests.get("http://radio.hrt.hr/prvi-program/arhiva/ujutro-prvi-poligraf-politicki-grafikon/118/").text

art_html = re.search('<article id="aod_0">(.+?)</article>', html, re.DOTALL).group(1)
for p_tag in re.finditer(r'<p>(.*?)</p>', art_html, re.DOTALL):
    if '<strong>' not in p_tag.group(1):
        title = p_tag.group(1)

title = title[:232]
title = title.replace(" ", "_").replace("/", "_").replace("!", "_").replace("?", "_")\
    .replace('"', "_").replace(':', "_").replace(',', "_").replace('"', '')\
    .replace('\n', '_').replace('&#39', '')

print(title)

src_file = os.path.join(ROOTDIR, 'src_' + title + '.txt')
dst_file = os.path.join(ROOTDIR, 'des_' + title + '.txt')

print(len(src_file), src_file)
print(len(dst_file), dst_file)

with open(src_file, mode='w', encoding='utf-8') as s_file:
    s_file.write('test')

shutil.move(src_file, dst_file)

It works, but if you change title = title[:232] to title = title[:233], you will get "FileNotFoundError: [Errno 2] No such file or directory".
As you can see ROOTDIR contains \U.

Regards.

Dave Angel

May 9, 2015, 8:26:26 AM

to pytho...@python.org

On 05/09/2015 06:31 AM, zljubi...@gmail.com wrote:
>
> title = title[:232]
> title = title.replace(" ", "_").replace("/", "_").replace("!", "_").replace("?", "_")\
> .replace('"', "_").replace(':', "_").replace(',', "_").replace('"', '')\
> .replace('\n', '_').replace('&#39', '')
>
> print(title)
>
> src_file = os.path.join(ROOTDIR, 'src_' + title + '.txt')
> dst_file = os.path.join(ROOTDIR, 'des_' + title + '.txt')
>
> print(len(src_file), src_file)
> print(len(dst_file), dst_file)
>
> with open(src_file, mode='w', encoding='utf-8') as s_file:
>     s_file.write('test')
>
>
> shutil.move(src_file, dst_file)
>
> It works, but if you change title = title[:232] to title = title[:233], you will get "FileNotFoundError: [Errno 2] No such file or directory".
> As you can see ROOTDIR contains \U.

No, we can't see what ROOTDIR is, since you read it from the config
file. And you don't show us the results of those prints. You don't
even show us the full exception, or even the line it fails on.

I doubt that the problem is in the ROOTDIR value, but of course nothing
in your program bothers to check that that directory exists. I expect
you either have too many characters total, or the 232nd character is a
strange one. Or perhaps title has a backslash in it (you took care of
forward slash).

While we're at it, if you do have an OS limitation on size, your code is
truncating at the wrong point. You need to truncate the title based on
the total size of src_file and dst_file, and since the code cannot know
the size of ROOTDIR, you need to include that in your figuring.
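
One way to do that figuring is sketched below. It assumes a limit of 259 characters for the whole path, which is simply the value reported earlier in this thread, and it uses ntpath so the arithmetic follows Windows rules on any platform. build_path and MAX_PATH_CHARS are hypothetical names.

```python
import ntpath  # Windows path rules, importable on any platform

# Assumption: the longest full path that worked for the original poster.
MAX_PATH_CHARS = 259


def build_path(rootdir, prefix, title, suffix=".txt", limit=MAX_PATH_CHARS):
    """Join rootdir and prefix + title + suffix, trimming title to fit limit."""
    # Length of everything except the title, including the path separator.
    fixed = len(ntpath.join(rootdir, prefix + suffix))
    room = limit - fixed
    if room <= 0:
        raise ValueError("the directory path leaves no room for a title")
    return ntpath.join(rootdir, prefix + title[:room] + suffix)


long_title = "x" * 500
src_file = build_path(r"C:\Users\zoran\hrt", "src_", long_title)
dst_file = build_path(r"C:\Users\zoran\hrt", "des_", long_title)

print(len(src_file), len(dst_file))
```

With the ROOTDIR from this thread the helper leaves room for 232 title characters, the same cut-off the original poster found by trial and error; a longer or shorter ROOTDIR would move that number, which is the point of computing it.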

--
DaveA

Steven D'Aprano

May 9, 2015, 11:14:08 AM

On Sat, 9 May 2015 08:31 pm, zljubi...@gmail.com wrote:

> It works, but if you change title = title[:232] to title = title[:233],
> you will get "FileNotFoundError: [Errno 2] No such file or directory".

Which is a *completely different* error from

SyntaxError: 'unicodeescape' codec can't decode bytes in position 2-3:
truncated \UXXXXXXXX escape

> As you can see ROOTDIR contains \U.

How can I possibly see that? Your code reads ROOTDIR from the config file,
which you don't show us.

I agree with you that Windows has limitations on the length of file names,
and that you get an error if you give a file name that cannot be found. The
point is that before you can get that far, you *first* have to fix the
SyntaxError. That's a completely different problem.

You can't fix the \U syntax error by truncating the total file length. But
you can fix that syntax error by changing your code so it reads the ROOTDIR
from a config file instead of a hard-coded string literal -- exactly like
we told you to do!

An essential skill when programming is to read and understand the error
messages. One of the most painful things to use is a programming language
that just says

"An error occurred"

with no other explanation. Python gives you lots of detail to explain what
went wrong:

SyntaxError means you made an error in the syntax of the code and the
program cannot even run.

FileNotFoundError means that the program did run, it tried to open a file,
but the file doesn't exist.

They're a little bit different, don't you agree?
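
For contrast with the compile-time SyntaxError, here is a small sketch of FileNotFoundError being raised while the program is running. It writes to a file inside a directory that does not exist; the directory and file names are made up for the demo.

```python
import os
import tempfile

caught = None

with tempfile.TemporaryDirectory() as tmp:
    # "no_such_dir" is never created, so the open() below must fail.
    target = os.path.join(tmp, "no_such_dir", "new.txt")
    try:
        with open(target, mode="w", encoding="utf-8") as fh:
            fh.write("test")
    except FileNotFoundError as exc:
        # Unlike SyntaxError, this can be caught: the program is running.
        caught = exc

print(type(caught).__name__, caught.errno)
```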

--
Steven

Chris Angelico

May 9, 2015, 11:22:28 AM

to pytho...@python.org

On Sun, May 10, 2015 at 1:13 AM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> FileNotFoundError means that the program did run, it tried to open a file,
> but the file doesn't exist.

Normally it does, at least. Sometimes it means that a *directory*
doesn't exist (for instance, you can get this when you try to create a
new file, which otherwise wouldn't make sense), and occasionally,
Windows will give you rather peculiar errors when weird things go
wrong, which may be what's going on here (maximum path length - though
that can be overridden by switching to a UNC-style path).
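
The override mentioned here presumably means the Windows extended-length prefix, written as two backslashes, a question mark and a backslash in front of an absolute path. The sketch below only builds such a string; extended_length is a hypothetical helper, and whether a given program or API accepts the result is not something this example checks.

```python
import ntpath  # Windows path rules, importable on any platform

EXTENDED_PREFIX = "\\\\?\\"         # the four characters \\?\
EXTENDED_UNC_PREFIX = "\\\\?\\UNC\\"


def extended_length(path):
    """Return path with the extended-length prefix added (sketch only)."""
    if path.startswith(EXTENDED_PREFIX):
        return path  # already prefixed
    path = ntpath.normpath(path)  # the prefix requires backslashes
    if path.startswith("\\\\"):
        # Network share: \\server\share becomes \\?\UNC\server\share
        return EXTENDED_UNC_PREFIX + path[2:]
    if not ntpath.isabs(path):
        raise ValueError("an absolute path is required")
    return EXTENDED_PREFIX + path


prefixed = extended_length("C:/Users/zoran/hrt")
print(prefixed)
```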

Steven's point still stands - very different from SyntaxError - but
unfortunately it's not always as simple as the name suggests. Thank
you oh so much, Windows.

ChrisA

zljubi...@gmail.com

May 10, 2015, 5:10:44 PM

> No, we can't see what ROOTDIR is, since you read it from the config
> file. And you don't show us the results of those prints. You don't
> even show us the full exception, or even the line it fails on.

Sorry I forgot. This is the output of the script:

C:\Python34\python.exe C:/Users/zoran/PycharmProjects/mm_align/bckslash_test.py
C:\Users\zoran\hrt
Traceback (most recent call last):
File "C:/Users/zoran/PycharmProjects/mm_align/bckslash_test.py", line 43, in <module>
with open(src_file, mode='w', encoding='utf-8') as s_file:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\zoran\\hrt\\src_70._godišnjica_pobjede_nad_fašizmom_Zašto_većina_čelnika_Europske_unije_bojkotira_vojnu_paradu_u_Moskvi__Kako_će_se_obljetnica_pobjede_nad_nacističkom_Njemačkom_i_njenim_satelitima_obilježiti_u_našoj_zemlji__Hoće_li_Josip_Broz_Tito_o.txt'
70._godišnjica_pobjede_nad_fašizmom_Zašto_većina_čelnika_Europske_unije_bojkotira_vojnu_paradu_u_Moskvi__Kako_će_se_obljetnica_pobjede_nad_nacističkom_Njemačkom_i_njenim_satelitima_obilježiti_u_našoj_zemlji__Hoće_li_Josip_Broz_Tito_o
260 C:\Users\zoran\hrt\src_70._godišnjica_pobjede_nad_fašizmom_Zašto_većina_čelnika_Europske_unije_bojkotira_vojnu_paradu_u_Moskvi__Kako_će_se_obljetnica_pobjede_nad_nacističkom_Njemačkom_i_njenim_satelitima_obilježiti_u_našoj_zemlji__Hoće_li_Josip_Broz_Tito_o.txt
260 C:\Users\zoran\hrt\des_70._godišnjica_pobjede_nad_fašizmom_Zašto_većina_čelnika_Europske_unije_bojkotira_vojnu_paradu_u_Moskvi__Kako_će_se_obljetnica_pobjede_nad_nacističkom_Njemačkom_i_njenim_satelitima_obilježiti_u_našoj_zemlji__Hoće_li_Josip_Broz_Tito_o.txt

Process finished with exit code 1

The cfg file (C:\Users\zoran\PycharmProjects\mm_align\hrt3.cfg) has the following contents:

[Dir]
ROOTDIR = C:\Users\zoran\hrt

> I doubt that the problem is in the ROOTDIR value, but of course nothing
> in your program bothers to check that that directory exists. I expect
> you either have too many characters total, or the 232nd character is a
> strange one. Or perhaps title has a backslash in it (you took care of
> forward slash).

How can I determine that?

> While we're at it, if you do have an OS limitation on size, your code is
> truncating at the wrong point. You need to truncate the title based on
> the total size of src_file and dst_file, and since the code cannot know
> the size of ROOTDIR, you need to include that in your figuring.

Well, in my program I define the file name as category-id-description.mp3.
If the file name is too long, I cut the description (that wasn't clear from my example).

Regards.

zljubi...@gmail.com

May 10, 2015, 5:15:01 PM

> > It works, but if you change title = title[:232] to title = title[:233],
> > you will get "FileNotFoundError: [Errno 2] No such file or directory".
>
>
> Which is a *completely different* error from
>
> SyntaxError: 'unicodeescape' codec can't decode bytes in position 2-3:
> truncated \UXXXXXXXX escape

I don't know when the original error disappeared and became this one (I'm confused).

Regards.

Dave Angel

May 10, 2015, 9:33:39 PM

to pytho...@python.org

Probably by calling os.path.isdir()
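
A sketch of such checks, run before trying to open the file, might look like the following. diagnose is a hypothetical helper, and the 259-character limit is again just the figure reported in this thread, not a value read from the operating system.

```python
import os
import tempfile


def diagnose(rootdir, filename, limit=259):
    """Return a list of likely reasons why opening rootdir/filename could fail."""
    problems = []
    if not os.path.isdir(rootdir):
        problems.append("directory does not exist: " + rootdir)
    if "\\" in filename or "/" in filename:
        problems.append("file name contains a path separator")
    full = os.path.join(rootdir, filename)
    if len(full) > limit:
        problems.append("path is %d characters, over the assumed limit of %d"
                        % (len(full), limit))
    return problems


with tempfile.TemporaryDirectory() as tmp:
    ok = diagnose(tmp, "abc.txt")
    missing = diagnose(os.path.join(tmp, "nope"), "abc.txt")
    bad_name = diagnose(tmp, "a\\b.txt")
    too_long = diagnose(tmp, "x" * 300 + ".txt")

print(ok, missing, bad_name, too_long, sep="\n")
```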

>
>> While we're at it, if you do have an OS limitation on size, your code is
>> truncating at the wrong point. You need to truncate the title based on
>> the total size of src_file and dst_file, and since the code cannot know
>> the size of ROOTDIR, you need to include that in your figuring.
>
> Well, in my program I am defining a file name as category-id-description.mp3.
> If the file is too long I am cutting description (it wasn't clear from my example).

Since you've got non-ASCII characters in that name, the utf-8 version of
the name will be longer. I don't run Windows, but perhaps it's just a
length problem after all.
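
The different ways of measuring such a name can be compared directly. The sketch below takes one word from the title above. As an assumption worth checking against the Windows documentation: Windows file APIs work with UTF-16 text, so the path limit would be counted in UTF-16 code units and not in UTF-8 bytes, and for these characters that count equals len().

```python
name = "godišnjica"  # one word from the title in this thread

chars = len(name)                                  # what len() reports
utf8_bytes = len(name.encode("utf-8"))             # size if stored as UTF-8
utf16_units = len(name.encode("utf-16-le")) // 2   # UTF-16 code units

print(chars, utf8_bytes, utf16_units)
```

Under that assumption, the 260 printed by len() in the output above is the relevant number, and it is one more than the 259 that worked.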

--
DaveA

zljubi...@gmail.com

May 12, 2015, 2:58:01 PM

I would say so as well.
Thanks to everyone who helped.

Regards and best wishes.
