I have a document image and a bounding box for a region of that image, and I am trying to extract the text that lies inside the bounding box.
Here's a sample document image,

Here is the code I use to draw the bounding box:

from matplotlib.patches import Rectangle
import matplotlib.pyplot as plt
from PIL import Image

# Load the page image (assumed here to be the same file that is sent to OCR)
image = Image.open("docs_converted/AP_RTF.docx.pdf_0.jpeg")

# Display the image
plt.figure(figsize=(60, 13))
plt.imshow(image)
# Get the current Axes
ax = plt.gca()
# Create a Rectangle patch: anchor corner (x, y), then width and height
rect = Rectangle((10.1, 193.3), width=2477.5, height=1417.3, linewidth=6,
                 edgecolor='r', facecolor='none')
# Add the patch to the Axes
ax.add_patch(rect)
My goal is to extract the text inside that bounding box. I have OCR results from the Google Vision API, but they cover the entire image rather than just the bounding box.
Here is the OCR code:
from google.cloud import vision
import io

path = "docs_converted/AP_RTF.docx.pdf_0.jpeg"
client = vision.ImageAnnotatorClient()
# Read the image bytes and run document text detection on the whole page
with io.open(path, 'rb') as image_file:
    content = image_file.read()
image = vision.Image(content=content)
response = client.document_text_detection(image=image)
texts = response.text_annotations
# document = response.full_text_annotation
This is one entry from the OCR results returned by Google:
description: "ANDREW"
bounding_poly {
  vertices {
    x: 290
    y: 65
  }
  vertices {
    x: 599
    y: 66
  }
  vertices {
    x: 599
    y: 138
  }
  vertices {
    x: 290
    y: 137
  }
}
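These are in a different format from my bounding box coordinates: the OCR gives four corner vertices per word, while my box is given as an (x, y) corner plus a width and a height.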
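To make the mismatch concrete, here is a rough sketch of how the two formats might be compared. It assumes both sets of coordinates are pixel positions on the same image with the origin at the top-left, which I have not verified. The helper names (`vertices_to_box`, `box_contains`, `words_inside`) are made up for illustration and are not part of the Vision API; the annotations are passed in as plain `(text, vertices)` tuples rather than real response objects.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x, y, width, height)


def vertices_to_box(vertices: Iterable[Point]) -> Box:
    """Convert corner vertices into an axis-aligned (x, y, width, height) box."""
    xs, ys = zip(*vertices)
    x_min, y_min = min(xs), min(ys)
    return (x_min, y_min, max(xs) - x_min, max(ys) - y_min)


def box_contains(outer: Box, inner: Box) -> bool:
    """Return True if `inner` lies entirely within `outer`."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return (ox <= ix and oy <= iy
            and ix + iw <= ox + ow
            and iy + ih <= oy + oh)


def words_inside(region: Box,
                 annotations: Sequence[Tuple[str, Sequence[Point]]]) -> List[str]:
    """Keep the text of every annotation whose box falls inside `region`."""
    return [text for text, verts in annotations
            if box_contains(region, vertices_to_box(verts))]


# The rectangle drawn with matplotlib, as (x, y, width, height)
region = (10.1, 193.3, 2477.5, 1417.3)
# The "ANDREW" vertices from the OCR output
andrew = [(290, 65), (599, 66), (599, 138), (290, 137)]

print(vertices_to_box(andrew))                     # (290, 65, 309, 73)
print(words_inside(region, [("ANDREW", andrew)]))  # [] since y=65 is above y=193.3
```

With the real response, the tuples could presumably be built as `(t.description, [(v.x, v.y) for v in t.bounding_poly.vertices])` for each `t` in `texts[1:]`, skipping the first annotation because it appears to hold the full page text. Requiring the word box to be fully contained is a choice; an overlap or center-point test may suit words that straddle the border better.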
If anyone knows how to achieve this, please help me out.