Document AI Invoice Parser - Strange bounding polygons returned for certain input image orientations


Antonis Karydas

May 25, 2021, 10:48:17 AM
to Google Cloud Developers
Hi,

I use the Document AI Invoice Processor for processing scanned invoices. I am using the Java client libraries.
I recently noticed an inconsistency in the bounding polygons of the extracted entities when the input image is rotated 90deg ccw from the default reading orientation.

When the service receives an invoice image that is rotated 90deg counterclockwise from the upright (reading) orientation, the returned bounding polygon is not correct. Suppose the invoice-id field is located near the top-right corner of the invoice when the image is in the reading orientation. If I send a 90deg ccw rotated image of this invoice to the service (for example, because a user misplaced the document on the scanner), it still detects the invoice-id field (in fact, all fields) very well. However, because of the rotation I would expect the bounding polygon to be near the top-left corner, whereas the engine returns a bounding poly that is still in the top-right corner. It very much looks like the returned bounding polygon is relative to the image in its upright (reading) orientation rather than relative to the image I actually uploaded (i.e. the 90deg ccw rotated one), which is what the documentation suggests it should be.
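To make the expected mapping concrete, here is a minimal sketch of the transform between the two coordinate frames. It assumes normalized vertices (x and y in [0, 1], origin at the top-left, y pointing down) and assumes the returned polygon really is expressed in the upright frame, which is only what the observed behavior suggests. `PolygonRotation` and `Vertex` are hypothetical helper names and are not part of the Document AI client library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Document AI client library.
final class PolygonRotation {

    /** A vertex in normalized coordinates: origin top-left, x to the right, y downward. */
    static final class Vertex {
        final double x;
        final double y;

        Vertex(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Maps a vertex given relative to the upright (reading) orientation onto the
     * same page after that page has been rotated 90 degrees counterclockwise.
     * Under that rotation the upright top-right corner becomes the top-left corner,
     * so (x, y) maps to (y, 1 - x).
     */
    static Vertex uprightToCcw90(Vertex v) {
        return new Vertex(v.y, 1.0 - v.x);
    }

    /** Applies the same mapping to every vertex of a polygon, preserving vertex order. */
    static List<Vertex> uprightToCcw90(List<Vertex> polygon) {
        List<Vertex> rotated = new ArrayList<>(polygon.size());
        for (Vertex v : polygon) {
            rotated.add(uprightToCcw90(v));
        }
        return rotated;
    }

    public static void main(String[] args) {
        // A point near the top-right corner of the upright invoice.
        Vertex upright = new Vertex(0.75, 0.25);
        Vertex uploaded = uprightToCcw90(upright);
        System.out.println(uploaded.x + ", " + uploaded.y);
    }
}
```

With this mapping, a point near the top-right of the upright page, such as (0.75, 0.25), lands near the top-left of the rotated image at (0.25, 0.25), which is where I would expect the invoice-id polygon to be reported.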

NOTE that I get the bounding polygon by following this reference chain:
Document->Entity->PageAnchor->PageRef->BoundingPoly.
If I wanted the bounding poly of, say, a paragraph, I would follow this reference chain: Document->Page->Paragraph->Layout->BoundingPoly. Note that this chain passes through a Document.Page.Layout object which, along with the bounding poly, has an 'orientation' property that specifies the orientation of this layout object relative to the page's orientation. Unfortunately, the reference chain for the bounding poly of an extracted entity does not include a Layout object. Instead, it goes through a PageRef object, which has a bounding poly but NOT an orientation that would allow me to make sense of the returned bounding poly.

So, to get to my question: is this a bug? Should the returned entity bounding polygon be relative to the uploaded image, or is the observed behavior correct and I should transform the polygon somehow? And what about the orientation of an extracted entity? Why is it not conveyed in the PageRef object (as it is for Blocks, Lines, Tokens etc. through the Document.Page.Layout object)? Is this something that will be added in the future?

Thanks in advance,
Antonis

Olu

May 31, 2021, 10:42:40 AM
to Google Cloud Developers
Hello, 

I understand your concern is about the quality of the results returned by the Cloud Document AI Invoice Parser. As you may already be aware, the Invoice Parser is a limited-access feature, and a request for API access has to be explicitly submitted in order to use it[0]. Hence, reviewing quality issues may require collaborating with internal GCP Support Engineers.

I suggest you go ahead and open a private issue[1] with one of our GCP Support Engineers for further evaluation of the quality issue.
