Status: Unconfirmed
Owner: ----
Labels: Type-Defect Priority-Medium
New issue 1597 by gipet...@gmail.com: Wrong text when using FPDFTextObj_GetText
https://bugs.chromium.org/p/pdfium/issues/detail?id=1597
What steps will reproduce the problem?
Use the following code to read the text of the attached PDF file.
FPDF_LIBRARY_CONFIG config;
config.version = 2;
config.m_pUserFontPaths = NULL;
config.m_pIsolate = NULL;
config.m_v8EmbedderSlot = 0;
FPDF_InitLibraryWithConfig(&config);

FPDF_DOCUMENT pdfDocument = FPDF_LoadDocument("PATH_TO_embedded_images.pdf", NULL);
FPDF_PAGE page = FPDF_LoadPage(pdfDocument, 0);
FPDF_TEXTPAGE textPage = FPDFText_LoadPage(page);

// Walk every page object and print the text of each text object.
int objectCount = FPDFPage_CountObject(page);
for (int i = 0; i < objectCount; i++)
{
    FPDF_PAGEOBJECT pageObject = FPDFPage_GetObject(page, i);
    int type = FPDFPageObj_GetType(pageObject);
    if (type == FPDF_PAGEOBJ_TEXT)
    {
        // First call: query the required buffer size in bytes.
        unsigned long size = FPDFTextObj_GetText(pageObject, textPage, nullptr, 0);
        // Second call: fetch the text as UTF-16LE code units.
        std::vector<unsigned short> buffer(size / 2);
        FPDFTextObj_GetText(pageObject, textPage, buffer.data(), size);
        std::wstring str(buffer.begin(), buffer.end());
        std::wcout << "Size: " << size << " \"" << str << "\"" << std::endl;
    }
}
What is the expected output? What do you see instead?
The fourth text object is returned as "LZW tiff; Flate RGB jpeg; Flate CMYK j"
Shouldn't this be returned as three separate text objects?
When we try to visualize this, we get a wrong result, as depicted in WrongText.png.
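
A side note on the snippet above, separate from the grouping question: as far as I understand the API, the byte count returned by FPDFTextObj_GetText includes the trailing NUL terminator, so the std::wstring built from the whole buffer ends with an extra NUL character. A minimal sketch of a conversion that drops it is shown here. The helper name Utf16BufferToWString is hypothetical and not part of PDFium, and it copies code units one by one, so text outside the Basic Multilingual Plane would not be converted correctly where wchar_t is 32 bits wide.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a PDFium function): turns a UTF-16LE buffer, as
// filled by a call such as FPDFTextObj_GetText, into a std::wstring.
// byte_count is the value returned by that call; it is assumed to be a size
// in bytes that includes the trailing NUL terminator.
std::wstring Utf16BufferToWString(const std::vector<unsigned short>& buffer,
                                  unsigned long byte_count) {
  // A UTF-16 byte count should always be a multiple of the code unit size.
  assert(byte_count % sizeof(unsigned short) == 0);

  // Convert bytes to code units and never read past the end of the buffer.
  std::size_t units = byte_count / sizeof(unsigned short);
  if (units > buffer.size()) {
    units = buffer.size();
  }

  // Drop the trailing NUL terminator, if there is one.
  if (units > 0 && buffer[units - 1] == 0) {
    --units;
  }

  // Copy code unit by code unit; surrogate pairs are not recombined.
  return std::wstring(buffer.begin(),
                      buffer.begin() + static_cast<std::ptrdiff_t>(units));
}
```

In the loop above this would replace the direct construction of str, for example: std::wstring str = Utf16BufferToWString(buffer, size);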
Attachments:
embedded_images.pdf 33.5 KB
WrongText.png 64.6 KB