Issue 682 in pdfium: FPDF_LoadDocument unable to load file whose name has multibyte characters

jr.ta… via monorail

Mar 21, 2017, 6:45:24 AM
to pdfiu...@googlegroups.com
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 682 by jr.ta...@gmail.com: FPDF_LoadDocument unable to load file whose name has multibyte characters
https://bugs.chromium.org/p/pdfium/issues/detail?id=682

What steps will reproduce the problem?
1. Create a PDF whose filename contains Chinese, Greek or other such wide characters
2. Try to load the document with FPDF_LoadDocument(filename, password)
3. The document cannot be opened

What is the expected output? What do you see instead?
The document whose filename contains wide characters should be opened. Instead, FPDF_GetLastError() returns 2 (FPDF_ERR_FILE), which means the file was not found or could not be opened.

What version of the product are you using? On what operating system?

Own compiled DLL from commit: ce88e77e62d818dafeeb79cd4a58a0aeff6e4444
With the following gn args:
pdf_is_standalone = true
is_component_build = false
is_official_build = false
is_debug = true

target_cpu = "x86"

Please provide any additional information below.

I'm integrating with PDFium through JNA. I pass the string containing the filename from Java into the JNA interface, which calls into the DLL.


dsincl… via monorail

Mar 21, 2017, 9:45:52 AM
to pdfiu...@googlegroups.com

Comment #1 on issue 682 by dsin...@chromium.org: FPDF_LoadDocument unable to load file whose name has multibyte characters
https://bugs.chromium.org/p/pdfium/issues/detail?id=682#c1

It's possible the name is getting mangled as it's passed through the system. We treat the name internally as a ByteString which is what gets passed to open().

jr.ta… via monorail

Mar 21, 2017, 10:12:54 PM
to pdfiu...@googlegroups.com

Comment #2 on issue 682 by jr.ta...@gmail.com: FPDF_LoadDocument unable to load file whose name has multibyte characters
https://bugs.chromium.org/p/pdfium/issues/detail?id=682#c2

Hi dsinclair,

By "passed through the system", do you mean as it's passed from Java through JNA to the DLL, or as it's processed inside the DLL itself?

I checked the source:
DLLEXPORT FPDF_DOCUMENT STDCALL FPDF_LoadDocument(FPDF_STRING file_path,
                                                  FPDF_BYTESTRING password) {
  // NOTE: the creation of the file needs to be by the embedder on the
  // other side of this API.
  CFX_RetainPtr<IFX_SeekableReadStream> pFileAccess =
      IFX_SeekableReadStream::CreateFromFilename((const char*)file_path);
  if (!pFileAccess)
    return nullptr;

  auto pParser = pdfium::MakeUnique<CPDF_Parser>();
  pParser->SetPassword(password);

  auto pDocument = pdfium::MakeUnique<CPDF_Document>(std::move(pParser));
  CPDF_Parser::Error error =
      pDocument->GetParser()->StartParse(pFileAccess, pDocument.get());
  if (error != CPDF_Parser::SUCCESS) {
    ProcessParseError(error);
    return nullptr;
  }
  return FPDFDocumentFromCPDFDocument(pDocument.release());
}


Does the cast to (const char*) cause the passed string to be read byte by byte? Specifically, are the multibyte parts of the string split into individual bytes when cast to const char*?

dsincl… via monorail

Mar 22, 2017, 8:57:28 AM
to pdfiu...@googlegroups.com

Comment #3 on issue 682 by dsin...@chromium.org: FPDF_LoadDocument unable to load file whose name has multibyte characters
https://bugs.chromium.org/p/pdfium/issues/detail?id=682#c3

Yeah, we're casting it to a char*. Reading a bit more, that in and of itself should be fine: as long as the bytes we have match the bytes that make up the filename, it should still work. So the best bet is to print out each of the bytes in file_path and make sure they match the bytes of the name on disk.
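
One way to carry out the byte check suggested above is to dump the path as hex rather than as text, since a text viewer may decode the bytes in whatever encoding it guesses and hide a mismatch. The following is a minimal sketch; HexDumpPath is a hypothetical helper and not part of PDFium. If the caller encodes the name as UTF-8, a character such as ά shows up as the two bytes ce ac, whereas the narrow-character file APIs on Windows generally interpret path bytes in the active ANSI code page, so the two can disagree even when the printed text looks right.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not a PDFium function): returns the bytes of a
// NUL-terminated path as space-separated, two-digit lowercase hex values.
std::string HexDumpPath(const char* path) {
  std::string out;
  char buf[4];
  for (const unsigned char* p = reinterpret_cast<const unsigned char*>(path);
       *p != 0; ++p) {
    std::snprintf(buf, sizeof(buf), "%02x", static_cast<unsigned>(*p));
    if (!out.empty())
      out += ' ';
    out += buf;
  }
  return out;
}
```

Calling HexDumpPath((const char*)file_path) at the top of FPDF_LoadDocument and logging the result would show exactly which encoding the caller used for the filename.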

jr.ta… via monorail

Mar 23, 2017, 4:20:34 AM
to pdfiu...@googlegroups.com

Comment #4 on issue 682 by jr.ta...@gmail.com: FPDF_LoadDocument unable to load file whose name has multibyte characters
https://bugs.chromium.org/p/pdfium/issues/detail?id=682#c4

I've tried printing file_path into a file by doing:


DLLEXPORT FPDF_DOCUMENT STDCALL FPDF_LoadDocument(FPDF_STRING file_path,
                                                  FPDF_BYTESTRING password) {
  // NOTE: the creation of the file needs to be by the embedder on the
  // other side of this API.
  CFX_RetainPtr<IFX_SeekableReadStream> pFileAccess =
      IFX_SeekableReadStream::CreateFromFilename((const char*)file_path);
  std::ofstream myfile;
  myfile.open("C:/filenames.txt");
  myfile << (const char*)file_path;
  myfile.close();


The output written to C:/filenames.txt is exactly the filename I passed from Java via the JNA interface, so the filename is most likely being mangled inside the DLL itself.

Attached is a sample document that will fail when opened via FPDF_LoadDocument

Attachments:
test with greek in name άλφα.pdf 15.6 KB

jr.ta… via monorail

Mar 30, 2017, 3:18:59 AM
to pdfiu...@googlegroups.com

Comment #5 on issue 682 by jr.ta...@gmail.com: FPDF_LoadDocument unable to load file whose name has multibyte characters
https://bugs.chromium.org/p/pdfium/issues/detail?id=682#c5

Hello, I'd just like to check if this is already accepted as a bug or if there is anything else you'd like me to do for this ticket to be handled?

dsincl… via monorail

Mar 30, 2017, 8:56:01 AM
to pdfiu...@googlegroups.com
Updates:
Status: Accepted

Comment #6 on issue 682 by dsin...@chromium.org: FPDF_LoadDocument unable to load file whose name has multibyte characters
https://bugs.chromium.org/p/pdfium/issues/detail?id=682#c6

(No comment was entered for this change.)

farhad.k… via monorail

Nov 23, 2017, 2:32:57 PM
to pdfiu...@googlegroups.com

Comment #7 on issue 682 by farhad.k...@gmail.com: FPDF_LoadDocument unable to load file whose name has multibyte characters
https://bugs.chromium.org/p/pdfium/issues/detail?id=682#c7

I also came across this problem when trying to open a file with a Unicode path. I have traced the problem and can suggest the following fix:

Add a new static function in fx_stream.h to support wide strings:

static RetainPtr<IFX_SeekableReadStream> CreateFromFilename(const wchar_t* filename);

With the corresponding implementation in fx_stream.cpp:

// static
RetainPtr<IFX_SeekableReadStream>
IFX_SeekableReadStream::CreateFromFilename(const wchar_t* filename) {
  return IFX_SeekableStream::CreateFromFilename(filename, FX_FILEMODE_ReadOnly);
}

Change the interface in fpdfview.h to take a Unicode string:

FPDF_EXPORT FPDF_DOCUMENT FPDF_CALLCONV
FPDF_LoadDocument(FPDF_WIDESTRING file_path, FPDF_BYTESTRING password);

With corresponding changes to the implementation in fpdfview.cpp:

FPDF_EXPORT FPDF_DOCUMENT FPDF_CALLCONV
FPDF_LoadDocument(FPDF_WIDESTRING file_path, FPDF_BYTESTRING password) {
  // NOTE: the creation of the file needs to be by the embedder on the
  // other side of this API.
  return LoadDocumentImpl(
      IFX_SeekableReadStream::CreateFromFilename((const wchar_t*)file_path),
      password);
}

With these changes, I was able to call the function through P/Invoke with Unicode paths using:

[DllImport("Pdfium.dll")]
public static extern IntPtr FPDF_LoadDocument(
    [MarshalAs(UnmanagedType.LPWStr)] string file_path,
    [MarshalAs(UnmanagedType.LPStr)] string password);

dsincl… via monorail

Mar 11, 2018, 11:29:20 AM
to pdfiu...@googlegroups.com
Updates:
Labels: Api

Comment #8 on issue 682 by dsin...@chromium.org: FPDF_LoadDocument unable to load file whose name has multibyte characters
https://bugs.chromium.org/p/pdfium/issues/detail?id=682#c8


(No comment was entered for this change.)

gipet… via monorail

Feb 20, 2019, 8:47:21 AM
to pdfiu...@googlegroups.com

Comment #9 on issue 682 by gipet...@gmail.com: FPDF_LoadDocument unable to load file whose name has multibyte characters
https://bugs.chromium.org/p/pdfium/issues/detail?id=682#c9

Are there any plans to fix this?

thes… via monorail

Feb 20, 2019, 4:52:37 PM
to pdfiu...@googlegroups.com
Updates:
Owner: the...@chromium.org

Comment #10 on issue 682 by the...@chromium.org: FPDF_LoadDocument unable to load file whose name has multibyte characters
https://bugs.chromium.org/p/pdfium/issues/detail?id=682#c10

Sure, I will add this to my queue.

patrick.… via monorail

Aug 28, 2019, 9:53:16 AM
to pdfiu...@googlegroups.com

Comment #11 on issue 682 by patrick....@gmail.com: FPDF_LoadDocument unable to load file whose name has multibyte characters
https://bugs.chromium.org/p/pdfium/issues/detail?id=682#c11

Hi,
I'm facing the same issue. Is there a plan to fix it (or a possible workaround)?

thes… via monorail

Aug 28, 2019, 12:47:54 PM
to pdfiu...@googlegroups.com

Comment #12 on issue 682 by the...@chromium.org: FPDF_LoadDocument unable to load file whose name has multibyte characters
https://bugs.chromium.org/p/pdfium/issues/detail?id=682#c12

The workaround is to open and read the document separately, and then load the data with FPDF_LoadMemDocument().
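
A minimal sketch of that workaround, assuming a C++17 caller: the file is opened through std::filesystem::path, which on Windows can be built from a wide string and so keeps a Unicode name intact, and its bytes are then handed to FPDF_LoadMemDocument(). ReadFileBytes is a hypothetical helper, not a PDFium function, and the PDFium call is shown only in a comment so the block compiles without the library.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <vector>

// Hypothetical helper: reads the whole file into memory. Returns an empty
// vector if the file cannot be opened; a real caller would probably want a
// distinct error signal.
std::vector<char> ReadFileBytes(const std::filesystem::path& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in)
    return {};
  return std::vector<char>(std::istreambuf_iterator<char>(in),
                           std::istreambuf_iterator<char>());
}

// Assumed call site, with fpdfview.h included and the library initialised:
//   std::vector<char> data = ReadFileBytes(path);
//   FPDF_DOCUMENT doc = FPDF_LoadMemDocument(
//       data.data(), static_cast<int>(data.size()), password);
//   // `data` must stay alive until the document is closed.
```

The trade-off is that the whole file is held in memory for as long as the document is open.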

patrick.… via monorail

Sep 2, 2019, 6:06:16 AM
to pdfiu...@googlegroups.com

Comment #13 on issue 682 by patrick....@gmail.com: FPDF_LoadDocument unable to load file whose name has multibyte characters
https://bugs.chromium.org/p/pdfium/issues/detail?id=682#c13

Thanks, but as a workaround I prefer to use FPDF_LoadCustomDocument(), so we don't need to read the entire file into memory.

I think farhad's proposal (in comment 7) is a safe approach (at least on Windows...).

seanche… via monorail

Aug 7, 2020, 3:54:21 PM
to pdfiu...@googlegroups.com

Comment #15 on issue 682 by seanche...@gmail.com: FPDF_LoadDocument unable to load file whose name has multibyte characters
https://bugs.chromium.org/p/pdfium/issues/detail?id=682#c15

The workaround is not very satisfying. Please add a version of FPDF_LoadDocument which accepts FPDF_WIDESTRING for the file path, or fix FPDF_LoadDocument so that UTF-8 encoded paths can be opened on Windows.

wolfr… via monorail

Nov 13, 2020, 2:56:52 PM
to pdfiu...@googlegroups.com

Comment #16 on issue 682 by wolfr...@gmail.com: FPDF_LoadDocument unable to load file whose name has multibyte characters
https://bugs.chromium.org/p/pdfium/issues/detail?id=682#c16

This seems like a pretty big oversight with a fairly straightforward fix.