mongodb couldn't start with non-ASCII characters in path?

Anta Huang

Mar 19, 2012, 5:15:18 AM
to mongodb-user
I start MongoDB with the following command:


mongod.exe --dbpath "\Users\test\NON-ASCII\data\db" --port 27017 --logappend --logpath "\Users\test\NON-ASCII\logs\mongod.log" --rest --vvvvv

and the log shows

Mon Mar 19 16:59:47 [initandlisten] User Assertion: 13518:couldn't open file /Users/test/NON-ASCII/data/db/journal/tempLatencyTest for writing errno:3
Mon Mar 19 16:59:47 [initandlisten] info preallocateIsFaster couldn't run; returning false
Mon Mar 19 16:59:47 [initandlisten] User Assertion: 13518:couldn't open file /Users/test/NON-ASCII/data/db/journal/j._0 for writing errno:3
Mon Mar 19 16:59:47 [initandlisten] exception in initAndListen: 13518 couldn't open file /Users/test/NON-ASCII/data/db/journal/j._0 for writing errno:3
terminating

If I replace the path with one that contains only ASCII characters, it works fine.

I am not sure whether the NON-ASCII path is the reason mongod couldn't
start, but I guess so.

Eliot Horowitz

Mar 19, 2012, 8:18:02 AM
to mongod...@googlegroups.com
Are you sure that directory exists and is accessible?
What are the exact characters?
What OS is this?


Tad Marshall

Mar 19, 2012, 9:04:46 AM
to mongodb-user
I just tested this in a recent master build (an intermediate
2.1.1-pre- version) with a simple

md c:\data\á
mongod --dbpath c:\data\á

I did see issues: mongod's initial display of the command line options
showed 'options: { dbpath: "c:\data\ß" }' which is not correct and
trying to display them from the shell "failed":
> db.serverCmdLineOpts()
Mon Mar 19 08:51:00 decode failed. probably invalid utf-8 string [c:\data\ß]
Mon Mar 19 08:51:00 why: InternalError: buffer too small
Mon Mar 19 08:51:00 Error: invalid utf8 src/mongo/shell/utils.js:1010

But I did not get the failure you got, and mongod.exe did create the
database in the directory I specified and writing to and reading from
a collection worked.

I'll post a Jira ticket on the symptoms I can reproduce. If you can
fill in the blanks from Eliot's questions and a few more (exact file
path you used, version of Windows, version of MongoDB, language and
keyboard settings, code page of your command window [type
"chcp<Enter>" in the command window]) I can try to repro your failure
and add it to the Jira ticket.

Thanks for the report!

Glenn Maynard

Mar 19, 2012, 9:48:18 AM
to mongod...@googlegroups.com
On Mon, Mar 19, 2012 at 8:04 AM, Tad Marshall <t...@10gen.com> wrote:
> I just tested this in a recent master build (an intermediate
> 2.1.1-pre- version) with a simple
>
> md c:\data\á
> mongod --dbpath c:\data\á

Try it with characters that aren't in your ACP, eg. 漢字 (assuming you're in US English).  That's a common--nearly ubiquitous--problem in non-Unicode Windows applications; having non-ACP characters in your Users path would be a very bad idea...

--
Glenn Maynard

Tad Marshall

Mar 19, 2012, 10:14:52 AM
to mongodb-user
Hi Glenn,

Thanks for the tip. I'm (I think) slightly limited in what I can test
because I need a monospaced font with my test characters in the
console window if I want to see what I'm doing, but perhaps if I
installed the Chinese language pack (in Windows 7 Ultimate) I'd have
what I need. I've been testing with the Consolas font.

So your suggestion is to use any character that is not in ISO Latin-1
(more or less Windows code page 1252)? So the Unicode character
U+0100 'Ā' (LATIN CAPITAL LETTER A WITH MACRON) would work? Or are
there further issues with Asian characters that I wouldn't see with
European characters?

I did see bad behavior in just my simple test so it's not like we work
right with non-ASCII (but in ISO Latin-1) characters anyway ...

Tad Marshall

Glenn Maynard

Mar 19, 2012, 11:03:33 AM
to mongod...@googlegroups.com
On Mon, Mar 19, 2012 at 9:14 AM, Tad Marshall <t...@10gen.com> wrote:
> Thanks for the tip.  I'm (I think) slightly limited in what I can test
> because I need a monospaced font with my test characters in the
> console window if I want to see what I'm doing, but perhaps if I
> installed the Chinese language pack (in Windows 7 Ultimate) I'd have
> what I need.  I've been testing with the Consolas font.

Be careful testing in the console.  The console uses the OEM codepage, where the rest of the system uses the ANSI codepage.  That's where the á vs. ß difference is coming from: á is 0xE1 in CP1252, and ß is 0xE1 in CP437.  This is probably only a problem with displaying characters--the rest (reading the string in argv[] and making fopen, etc. calls using it) is probably working fine.
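
To make the á vs. ß point concrete, here is a minimal Python sketch (not from the thread) that decodes the single byte 0xE1 under the two code pages mentioned above. The helper name `describe_byte` is made up for illustration; the codec names `cp1252` and `cp437` are Python's standard names for those code pages.

```python
import unicodedata


def describe_byte(value: int, codepage: str) -> str:
    """Decode one byte in the given code page and describe the character.

    Hypothetical helper for illustration only.
    """
    ch = bytes([value]).decode(codepage)
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The same byte means different characters in the ANSI and OEM code pages.
for codepage in ("cp1252", "cp437"):
    print(codepage, "0xE1 ->", describe_byte(0xE1, codepage))
# cp1252 0xE1 -> U+00E1 LATIN SMALL LETTER A WITH ACUTE
# cp437 0xE1 -> U+00DF LATIN SMALL LETTER SHARP S
```

The byte that reaches the program is the same in both cases; only the interpretation at display time differs, which fits the observation that the directory was still created correctly.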

This is probably a different problem than you'll see if you use characters that aren't in your language at all.

(The solution to most codepage problems is to use Unicode APIs.  Unfortunately, that's a pain in portable programs; you typically end up having to wrap all libc functions that receive encoded strings, eg. fopen -> CreateFileW and printf -> WriteConsoleW.)

> So your suggestion is to use any character that is not in ISO Latin-1
> (more or less Windows code page 1252)?  So the Unicode character
> U+0100 'Ā' (LATIN CAPITAL LETTER A WITH MACRON) would work?  Or are
> there further issues with Asian characters that I wouldn't see with
> European characters?

To test without console codepages adding to the confusion, use something in neither your ANSI codepage nor your OEM codepage.  U+0100 should be fine.  Filenames with those characters in them simply can't be accessed with the libcrt POSIX file I/O functions (eg. fopen) in Windows (unless you happen to be in a UTF-8 codepage, which you almost never are).
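
One way to pick a test character that is in neither code page is to check whether it can be encoded at all. The following is a rough Python sketch under that assumption; `representable` is a hypothetical helper, and Python's codecs stand in for the Windows code page tables, which may differ in small details.

```python
def representable(text: str, codepage: str) -> bool:
    """Return True if every character in text exists in the code page.

    Hypothetical helper; uses Python's codec tables as a stand-in for
    the Windows ANSI/OEM code pages.
    """
    try:
        text.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


samples = {
    "a with acute (U+00E1)": "\u00e1",
    "A with macron (U+0100)": "\u0100",
    "U+6E2C U+8A66": "\u6e2c\u8a66",
}

for label, text in samples.items():
    for codepage in ("cp1252", "cp437"):
        print(f"{label:24} {codepage}: {representable(text, codepage)}")
```

A character that reports False for both the ANSI and the OEM code page is a good candidate for a file-access test that console display quirks cannot mask.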

--
Glenn Maynard


Anta Huang

Mar 19, 2012, 11:30:18 AM
to mongodb-user
Sorry, I just noticed a typo in the last paragraph of my original
post. It should read:
"I am NOT sure if the NON-ASCII path is the reason why mongod couldn't
start but I guess so"

the exact path I use is like this : /Users/test/測試/data/db
OS: windows 7 ultimate
mongodb version : v2.0.1
code page : 950 Traditional Chinese

The NON-ASCII word "測試" in the path is Traditional Chinese. I agree
with Glenn that having non-ACP characters in the Users path is a bad
idea, but users tend to choose usernames in their own language, and
the username then appears in the path.

Thanks for noticing this post!

Tad Marshall

Mar 19, 2012, 12:04:27 PM
to mongodb-user
Thanks Glenn and Anta for all the helpful information!

Glenn, nice catch on the 0xE1 == 'ß' in code page 437, I should have
caught that. Yes, changing my code page to 1252 makes mongod.exe
display the 'á' correctly. We still have the bad UTF-8 being sent to
the shell, or the shell's misinterpreting good UTF-8 or something.

Anta, thanks for the exact path that caused you problems. The
characters are U+6E2C U+8A66, both perfectly good Unicode characters
in the Basic Multilingual Plane, so correct Unicode handling and UTF-8
translations should not have issues with them.
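
As a quick illustration of why these two characters are unproblematic for Unicode-aware code, this Python sketch (an illustration, not taken from MongoDB) prints the code point, UTF-8 bytes and UTF-16LE bytes of each character. The helper name `encodings` is hypothetical.

```python
def encodings(ch: str) -> tuple:
    """Return (code point label, UTF-8 hex, UTF-16LE hex) for one character.

    Hypothetical helper for illustration only.
    """
    return (
        f"U+{ord(ch):04X}",
        ch.encode("utf-8").hex(),
        ch.encode("utf-16-le").hex(),
    )


directory_name = "\u6e2c\u8a66"  # the two characters from the reported path

for ch in directory_name:
    label, utf8_hex, utf16_hex = encodings(ch)
    in_bmp = ord(ch) <= 0xFFFF  # BMP characters need no surrogate pair
    print(f"{label} BMP={in_bmp} utf8={utf8_hex} utf16le={utf16_hex}")
# U+6E2C BMP=True utf8=e6b8ac utf16le=2c6e
# U+8A66 BMP=True utf8=e8a9a6 utf16le=668a
```

Each character is three bytes in UTF-8 and a single 16-bit code unit in UTF-16, so no surrogate handling is involved.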

I filed a Jira ticket ( https://jira.mongodb.org/browse/SERVER-5333 )
which I will update now. Vote for it if you like.

Glenn, the next version of the shell (2.1.1 when available) uses
CP_UTF8 to make a lot of the Unicode/UTF-8 issues fade. Probably if
mongod.exe did this as well it would display the 'á' properly.

I'm not sure what you're thinking regarding not testing in the
console. For better or worse, the Windows console is the main home of
the mongo shell and of the mongodb servers when they are not running
as Windows services, so that's the environment that we need to work
in. I know that some people run the shell in Cygwin or in an Emacs
window and we should work right there too, but these will never be the
main environment. Possibly I'm just missing your point.

Thanks for all your help!

Tad

Glenn Maynard

Mar 19, 2012, 1:15:24 PM
to mongod...@googlegroups.com
On Mon, Mar 19, 2012 at 11:04 AM, Tad Marshall <t...@10gen.com> wrote:
> Glenn, the next version of the shell (2.1.1 when available) uses
> CP_UTF8 to make a lot of the Unicode/UTF-8 issues fade.  Probably if
> mongod.exe did this as well it would display the 'á' properly.

I'm guessing you mean SetConsoleCP?  (I don't see any calls to that function in https://github.com/mongodb/mongo.git...)

That only changes the console codepage, though.  It doesn't help with accessing files with non-ACP filenames.  As far as I know, the only way to do that is to replace all fopen, rename, fstat, etc. calls with their Win32 equivalents like CreateFile and MoveFile.

The same problem exists with argv: it's in the ACP, and you need to use a Unicode main() function to get non-ACP strings on the commandline, eg. wmain(int argc, wchar_t *argv[]).  (You might be able to sidestep that with GetCommandLineW; I'm not sure off-hand if that works.)

> I'm not sure what you're thinking regarding not testing in the
> console.

There are two separate sets of problems: displaying strings in the console and opening files.  If you test opening files in the console, you'll have console I/O problems confusing your tests.  Testing without the console isolates the test better.

Did you check with U+0100?

--
Glenn Maynard

Tad Marshall

Mar 19, 2012, 1:47:35 PM
to mongodb-user
Yes, Glenn, you are correct on all points.

The code that uses SetConsoleCP and SetConsoleOutputCP is not checked
in to the mongodb/mongo.git codebase yet, but you can look at it in my
fork at https://github.com/tadmarshall/mongo/commit/780b1d968c0d1cf04f6f74a791fbcc98c4f32230.

My comment in
https://jira.mongodb.org/browse/SERVER-5333?focusedCommentId=100189&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-100189
suggests how I'd fix it: switch to wmain(), translate to UTF-8 and
then translate to UTF-16 and use CreateFileW() etc. for file opens.
Pretty much what you are saying, I think.
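
The difference between the two routes can be modelled in a few lines of Python. This is only a sketch of the data flow (wide string to UTF-8 to UTF-16, versus squeezing the path through an ANSI code page); it does not call wmain() or CreateFileW(), and the function names are made up for illustration.

```python
def via_utf8_to_utf16(path: str) -> str:
    """Model the proposed route: wide argv -> UTF-8 internally -> UTF-16 for the file API.

    Illustrative only; no Win32 functions are called.
    """
    utf8_bytes = path.encode("utf-8")                # internal representation
    utf16_bytes = utf8_bytes.decode("utf-8").encode("utf-16-le")
    return utf16_bytes.decode("utf-16-le")           # what a wide file API would see


def via_codepage(path: str, codepage: str) -> str:
    """Model the narrow route: the path is forced through one code page.

    Unencodable characters become '?' here; Windows itself may pick a
    different substitute.
    """
    return path.encode(codepage, errors="replace").decode(codepage)


original = "c:\\data\\\u0100"  # c:\data\ followed by U+0100

print(via_utf8_to_utf16(original) == original)        # lossless round trip
print(via_codepage(original, "cp1252") == original)   # information is lost
```

The first route preserves every character, while the second cannot name the directory at all once a character falls outside the code page.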

And yes, the console code page issues are related to but not the same
as the file system issues: we need to fix both.

And another yes, I tested with U+0100 and it bombed as predicted.

13:31:49.78 G:\MongoDB\tadmarshall\mongo> mkdir c:\data\Ā

13:32:06.18 G:\MongoDB\tadmarshall\mongo> chcp
Active code page: 437

13:32:27.38 G:\MongoDB\tadmarshall\mongo> mongod --dbpath c:\data\Ā
// ... snip ...
Mon Mar 19 13:32:51 [initandlisten] options: { dbpath: "c:\data\A" }
Mon Mar 19 13:32:51 [initandlisten] exception in initAndListen: 10296
*********************************************************************
ERROR: dbpath (c:\data\A) does not exist.
Create this directory or give existing directory in --dbpath.
See http://www.mongodb.org/display/DOCS/Starting+and+Stopping+Mongo
*********************************************************************
, terminating
Mon Mar 19 13:32:51 dbexit:
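
The plain "A" in the dbpath above is consistent with Windows substituting a similar-looking character when it converts the command line to the ANSI code page. The Python sketch below only approximates that effect by decomposing the character and dropping combining marks; it is an assumption-laden illustration, not the actual Windows mapping table, and `approximate_best_fit` is a hypothetical name.

```python
import unicodedata


def approximate_best_fit(text: str) -> str:
    """Decompose characters and drop combining marks.

    This only approximates a best-fit conversion; the real Windows
    tables are not reproduced here.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


requested = "c:\\data\\\u0100"  # the directory that was created
seen_by_server = approximate_best_fit(requested)

# ascii() keeps the output printable on any console code page.
print(ascii(requested), "->", ascii(seen_by_server))
```

Under this approximation the server ends up looking for a directory named with a plain "A", which does not exist, matching the error message.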

Thanks for all your help!

Tad
