DOCUMENT_TEXT_DETECTION of content in PDF returning words in incorrect order

January December

May 31, 2021, 8:02:34 AM

to cloud-visi...@googlegroups.com

Hi All,

The words recognized from a PDF file using file:annotate are returned in the wrong order.

Any ideas on how to hint the Vision API to return the words in the correct order?

For example, a PDF document page has the page number at the top left and the chapter subtitle at the top right, both on the same line. OCR returns the "chapter subtitle" first and then the "page number". The XY coordinates reflect the same thing, so I have no way to correct the word order programmatically.

Regards,

Jan.

Monica (Google Cloud Platform)

Jul 23, 2021, 7:28:22 PM

to cloud-vision-discuss

Hello Jan,

Since the page number is at the top left and the chapter subtitle is at the top right, it looks like they might be interpreted as being on the same line. The OCR returns the "chapter subtitle" first and then the "page number". The XY coordinates show the same thing, and I couldn't find a way to correct the word order programmatically.

Since DOCUMENT_TEXT_DETECTION doesn't return the expected result, can you check whether TEXT_DETECTION gives better results? If it makes no difference, I would advise opening a case with support or creating a public issue that includes a (non-confidential) sample document.
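
For readers who want to try a coordinate-based fix anyway, here is a minimal sketch of re-sorting words by position. It assumes the word boxes have already been flattened into a hypothetical `Word` tuple (text plus the top-left x and y of its bounding box) and that a fixed `line_tolerance` is enough to decide which words share a line; neither name comes from the Vision API. This thread reports that the coordinates were not sufficient for the document in question, so treat it as a starting point only.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    """Hypothetical flattened word: text plus top-left corner of its box."""
    text: str
    x: float
    y: float


def reading_order(words: List[Word], line_tolerance: float = 10.0) -> List[str]:
    """Group words into lines by y, then read each line left to right."""
    lines: List[List[Word]] = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        # A word joins the current line if its y is close to the line's first word.
        if lines and abs(word.y - lines[-1][0].y) <= line_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Within each line, order by x so the left-most word comes first.
    return [w.text for line in lines for w in sorted(line, key=lambda w: w.x)]


# Header words arrive in the wrong order, as described in this thread.
ocr_words = [
    Word("Chapter", 400, 42),
    Word("Subtitle", 470, 41),
    Word("12", 50, 40),
    Word("Body", 50, 100),
]
ordered = reading_order(ocr_words)
```

The tolerance is the fragile part: if it is too small, a slightly skewed header splits into two lines, and if it is too large, adjacent lines merge.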

January December

Jul 30, 2021, 12:35:41 AM

to cloud-vision-discuss

Hello Monica,

I have tried TEXT_DETECTION. Its output looks better, but the word order is still not fully correct.

I have reported this issue under public ticket https://issuetracker.google.com/u/1/issues/189099533

Regards,

Jan.

January December

Jul 30, 2021, 11:06:36 PM

to cloud-vision-discuss

Hello Monica,

I would like to know whether it is possible to use both TEXT_DETECTION and DOCUMENT_TEXT_DETECTION together in a single request. Something like:

"requests": [{
    "features": [{
        "type": "TEXT_DETECTION",
        "type": "DOCUMENT_TEXT_DETECTION",
        "model": "builtin/stable"
      }
    ],..............

My requirement is to extract text from PDF documents. I have observed that TEXT_DETECTION misses (skips) some characters compared to DOCUMENT_TEXT_DETECTION, but TEXT_DETECTION gives better word order and character recognition. So I am wondering whether there is a way to leverage both feature types together in a single request.

I am referring to feature types mentioned in https://cloud.google.com/vision/docs/reference/rest/v1/Feature#Type.
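
One possible workaround, sketched below under some assumptions: build one request body per feature type and send them as two separate calls, then compare or merge the results on the client side. Two things motivate this. First, a JSON object with two "type" keys is not a reliable way to ask for both, since most parsers keep only the last one. Second, as far as I can tell from the Feature reference linked above, DOCUMENT_TEXT_DETECTION takes precedence when both types are listed, so a single request may not return both outputs. The helper name `build_bodies` and the bucket path are hypothetical, and the field layout (`inputConfig`, `gcsSource`, `mimeType`) is my reading of the REST reference, so please verify it against the documentation before relying on it.

```python
import json
from typing import Dict, List


def build_bodies(gcs_uri: str, feature_types: List[str],
                 model: str = "builtin/stable") -> Dict[str, dict]:
    """Return one annotate-file request body per feature type (hypothetical helper)."""
    bodies: Dict[str, dict] = {}
    for feature_type in feature_types:
        bodies[feature_type] = {
            "requests": [{
                "inputConfig": {
                    "gcsSource": {"uri": gcs_uri},  # placeholder bucket path
                    "mimeType": "application/pdf",
                },
                # Each feature object carries exactly one "type".
                "features": [{"type": feature_type, "model": model}],
            }]
        }
    return bodies


bodies = build_bodies(
    "gs://example-bucket/sample.pdf",
    ["TEXT_DETECTION", "DOCUMENT_TEXT_DETECTION"],
)

# Duplicate keys in one JSON object: Python's parser keeps only the last value.
duplicate = json.loads('{"type": "TEXT_DETECTION", "type": "DOCUMENT_TEXT_DETECTION"}')

for feature_type, body in bodies.items():
    print(feature_type, json.dumps(body))
```

Each body can then be posted on its own, and the two responses compared page by page, for example taking the word order from one and filling in missing characters from the other.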