How to get the text inside a bounding box from Google Vision OCR results for the entire document?


Abhijith Nair

Jan 9, 2021, 12:31:39 PM
to cloud-vision-discuss
I have a document image with a bounding box on it, and I am trying to extract the text inside that bounding box.

Here's a sample document image:

[Image: TOwMB.png]

Here is the code I use to draw the bounding box:

[Image: AP-RTF-docx-pdf-0.jpeg]

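    import matplotlib.pyplot as plt
    from matplotlib.patches import Rectangle
    from PIL import Image

    # Load the page image (assumed to be the same file that is OCR'd below)
    path = "docs_converted/AP_RTF.docx.pdf_0.jpeg"
    image = Image.open(path)

    # Display the image
    plt.figure(figsize=(60, 13))
    plt.imshow(image)

    # Get the current Axes
    ax = plt.gca()

    # Create a Rectangle patch; (x, y) is in image pixel coordinates and, with
    # imshow's default origin='upper', marks the box's top-left corner
    rect = Rectangle((10.1, 193.3), width=2477.5, height=1417.3, linewidth=6,
                     edgecolor='r', facecolor='none')

    # Add the patch to the Axes and show the figure
    ax.add_patch(rect)
    plt.show()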

My goal is to extract the text inside that bounding box. I have OCR results from the Google Vision API, but they cover the entire image rather than just the bounding box.

Here is the OCR code:

    import io

    from google.cloud import vision

    path = "docs_converted/AP_RTF.docx.pdf_0.jpeg"

    client = vision.ImageAnnotatorClient()
    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    image = vision.Image(content=content)
    response = client.document_text_detection(image=image)
    texts = response.text_annotations
    # document = response.full_text_annotation

    # texts[0] holds the full page text; each later entry is one word with its bounding polygon
    for text in texts[1:]:
        print(text)

These are the results from Google's OCR. One word annotation looks like this:

    description: "ANDREW"
    bounding_poly {
      vertices {
        x: 290
        y: 65
      }
      vertices {
        x: 599
        y: 66
      }
      vertices {
        x: 599
        y: 138
      }
      vertices {
        x: 290
        y: 137
      }
    }

These coordinates are different from my bounding box coordinates.
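
If the matplotlib rectangle and the OCR vertices are both in pixel coordinates of the same image at the same resolution, the two formats can be compared directly. Below is a minimal sketch of that approach. It does not come from the thread, and `Box`, `poly_to_box`, `is_inside` and `words_in_region` are hypothetical helpers. The sketch converts each word's four vertices into an (x, y, width, height) box and keeps the words that lie fully inside the rectangle. It uses plain tuples, so it runs without the Vision client.

```python
from typing import Iterable, List, NamedTuple, Sequence, Tuple

Vertex = Tuple[float, float]


class Box(NamedTuple):
    """Axis-aligned box in image pixels, like matplotlib's Rectangle((x, y), width, height)."""

    x: float
    y: float
    width: float
    height: float


def poly_to_box(vertices: Sequence[Vertex]) -> Box:
    """Turn the four corner vertices of a Vision bounding_poly into an (x, y, width, height) box."""
    xs = [vx for vx, _ in vertices]
    ys = [vy for _, vy in vertices]
    return Box(min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))


def is_inside(inner: Box, outer: Box) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (
        inner.x >= outer.x
        and inner.y >= outer.y
        and inner.x + inner.width <= outer.x + outer.width
        and inner.y + inner.height <= outer.y + outer.height
    )


def words_in_region(words: Iterable[Tuple[str, Sequence[Vertex]]], region: Box) -> List[str]:
    """Return the descriptions of the words whose boxes fall entirely inside `region`."""
    return [text for text, vertices in words if is_inside(poly_to_box(vertices), region)]


# The rectangle drawn with matplotlib and the "ANDREW" vertices from the OCR output
region = Box(10.1, 193.3, 2477.5, 1417.3)
andrew = [(290, 65), (599, 66), (599, 138), (290, 137)]
print(poly_to_box(andrew))  # Box(x=290, y=65, width=309, height=73)
print(words_in_region([("ANDREW", andrew)], region))  # [] because ANDREW sits above the box
```

With a real response, the word list could be built as `[(t.description, [(v.x, v.y) for v in t.bounding_poly.vertices]) for t in texts[1:]]`, skipping `texts[0]` because it holds the whole page's text. Requiring full containment drops words that straddle the box edge. Testing only each word's center point is a looser alternative.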


If anyone knows how to achieve this, please help me out.

Monica (Google Cloud Platform)

Feb 19, 2021, 11:12:53 PM
to cloud-vision-discuss
Hello,
There is no API feature that tells the Vision API to scan only a specific section of a file. I recommend pre-processing your images instead: crop the section you need into a new image, then run Vision API OCR on that smaller image.

You can find many threads on Stack Overflow (e.g. this one) that can help you implement the cropping step.
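
Here is a minimal sketch of the crop-then-OCR approach. It assumes Pillow is installed and that the box is given as (x, y, width, height) in pixels of the original image, like the Rectangle in the question. `crop_region`, `to_png_bytes` and `ocr_region` are hypothetical helper names. The Vision calls are the ones used in the question, plus `full_text_annotation.text` to read the recognized text.

```python
import io

from PIL import Image


def crop_region(image: Image.Image, x: float, y: float, width: float, height: float) -> Image.Image:
    """Crop an (x, y, width, height) box, as used by matplotlib's Rectangle, out of a PIL image."""
    # PIL's crop() expects (left, upper, right, lower) pixel edges
    left, top = round(x), round(y)
    right, bottom = round(x + width), round(y + height)
    return image.crop((left, top, right, bottom))


def to_png_bytes(image: Image.Image) -> bytes:
    """Encode a PIL image as PNG bytes, suitable for vision.Image(content=...)."""
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return buffer.getvalue()


def ocr_region(path: str, x: float, y: float, width: float, height: float) -> str:
    """Crop the region out of the image at `path` and run Vision document OCR on just that part."""
    # Imported here so the cropping helpers above work without the Vision client installed
    from google.cloud import vision

    with Image.open(path) as image:
        content = to_png_bytes(crop_region(image, x, y, width, height))

    client = vision.ImageAnnotatorClient()
    response = client.document_text_detection(image=vision.Image(content=content))
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.full_text_annotation.text
```

For the question's box, a call might look like `ocr_region("docs_converted/AP_RTF.docx.pdf_0.jpeg", 10.1, 193.3, 2477.5, 1417.3)`. Any word coordinates in the cropped result are relative to the crop, so add the box's left and top offsets to map them back onto the full page.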

Richard Warburton

Feb 20, 2021, 6:42:08 AM
to cloud-vision-discuss
I am doing something similar right now. I call another API, such as Cloudinary or imgix, to get a cropped version of the image:

1. Get your bounding box.
2. Send your crop coordinates to imgix.
3. Retrieve the cropped image and send it back to document text OCR.
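
The steps above could look roughly like this with imgix. The sketch assumes an imgix Source already serves the page image; the `example.imgix.net` domain is hypothetical. It uses imgix's `rect` parameter, which crops to `x,y,w,h` in source-image pixels. Unlike step 3 as written, it does not download the crop. It passes the URL straight to Vision through `ImageSource(image_uri=...)`, which only works if Vision can reach the URL publicly. `imgix_crop_url` and `ocr_url` are hypothetical helper names.

```python
from urllib.parse import urlencode


def imgix_crop_url(base_url: str, x: float, y: float, width: float, height: float) -> str:
    """Append imgix's rect=x,y,w,h crop parameter to an image URL (assumed to have no query string)."""
    left, top = round(x), round(y)
    w, h = round(x + width) - left, round(y + height) - top
    # safe="," keeps the commas literal instead of encoding them as %2C
    query = urlencode({"rect": f"{left},{top},{w},{h}"}, safe=",")
    return f"{base_url}?{query}"


def ocr_url(url: str) -> str:
    """Have the Vision API fetch the image at `url` and run document text OCR on it."""
    # Imported here so the URL helper works without the Vision client installed
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    image = vision.Image(source=vision.ImageSource(image_uri=url))
    response = client.document_text_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.full_text_annotation.text


# Hypothetical imgix source domain; replace it with your own
url = imgix_crop_url("https://example.imgix.net/AP_RTF.docx.pdf_0.jpeg", 10.1, 193.3, 2477.5, 1417.3)
print(url)  # https://example.imgix.net/AP_RTF.docx.pdf_0.jpeg?rect=10,193,2478,1418
```

The width and height are computed from the rounded edges rather than rounded on their own. This keeps the cropped area aligned with the box's right and bottom edges.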
