Parsing PDFs From Gmail to Sheets. Is this possible?

TJ Huntley

Jun 27, 2022, 3:38:05 AM

to Google Apps Script Community

For several weeks I have been trying to get data from PDF attachments in Gmail into Google Sheets. Is there a way to do this?

CBMServices Web

Jun 27, 2022, 2:20:27 PM

to google-apps-sc...@googlegroups.com

Have you looked at this solution?

https://stackoverflow.com/questions/1554280/how-to-extract-text-from-pdf-in-javasript

Share what code you have so far and we can potentially give you pointers on where you are stuck.


Remco Edelenbos

Jun 27, 2022, 4:05:20 PM

to Google Apps Script Community

You can use the advanced Drive service to copy the PDF and convert it to a Google Doc with OCR. Then get the body text, and from there you need some regex skills...

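const blob = file.getBlob();
// `name` and `rootId` are assumed to be defined earlier in the script.
const resource = {
  title: `Temp-${name}`,
  mimeType: file.getMimeType(),
  parents: [{ id: rootId }]
};
// Requires the Drive advanced service (v2); ocr: true converts the upload to a Google Doc.
const tempFile = Drive.Files.insert(resource, blob, { ocr: true, ocrLanguage: "en" });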
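
The snippet stops once the temporary file exists. The remaining steps (read the OCR text, then pick values out with regular expressions) might look roughly like the sketch below. It assumes the Drive advanced service (v2) is enabled; the helper names `ocrPdfToText` and `parseFields`, and the "Invoice No" and "Total" labels, are made up for illustration and would need to match your own PDFs.

```javascript
// Sketch only: assumes the Drive advanced service (v2) is enabled in the project.
// ocrPdfToText and parseFields are hypothetical helper names.

/** OCR a PDF blob into a temporary Google Doc and return its plain text. */
function ocrPdfToText(blob, language) {
  const resource = { title: `Temp-${blob.getName()}`, mimeType: blob.getContentType() };
  const tempFile = Drive.Files.insert(resource, blob, { ocr: true, ocrLanguage: language || "en" });
  const text = DocumentApp.openById(tempFile.id).getBody().getText();
  Drive.Files.remove(tempFile.id); // delete the temporary Doc once the text is read
  return text;
}

/** Pull labelled values out of OCR text; the labels below are assumed examples. */
function parseFields(text) {
  const patterns = {
    invoiceNo: /Invoice\s*(?:Number|No|#)\.?\s*:?\s*([A-Z0-9-]+)/i,
    total: /\bTotal\s*:?\s*\$?\s*([\d,]+\.\d{2})/i,
  };
  const result = {};
  for (const [key, regex] of Object.entries(patterns)) {
    const match = text.match(regex);
    result[key] = match ? match[1] : ""; // empty string when the label is missing
  }
  return result;
}
```

The parsed object can then be written to a sheet with something like `sheet.appendRow([fields.invoiceNo, fields.total])`. OCR output varies a lot between layouts, so expect to tune the patterns against real samples.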

dimud...@gmail.com

Jun 28, 2022, 10:14:09 AM

to Google Apps Script Community

It's possible, but it's not exactly trivial to pull off.

I'm not aware of a turn-key solution so I'd build something custom using one or more intermediary services.

Some have already recommended Google Drive OCR, but it leaves a lot to be desired, especially if the content you want to extract from your PDFs has an inconsistent layout.

If your PDFs fall within a common class of documents (invoices, purchase orders, etc.), I'd go for an OCR + NLP solution like Google's Document AI.

Document AI is not a free service, but it's one of the more robust solutions out there. There is no native integration with Google Apps Script, but there is a REST API that can be accessed from GAS.

However, if you need an automated process that can scale to handle a large volume of documents in real time, I'd recommend a different tech stack (GCP Cloud Functions + Node.js).
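
For anyone wondering what calling that REST API from Apps Script could look like, here is a rough sketch of a `process` request. It assumes the script is linked to a Cloud project with the Document AI API enabled, that the `https://www.googleapis.com/auth/cloud-platform` scope is listed in the manifest, and that a processor already exists. The function names, the config values, and the entity types you ask for are placeholders; the entity types in particular depend on which processor you use.

```javascript
// Sketch only: the project, location and processor IDs are placeholders, and the
// script's linked Cloud project must have the Document AI API enabled.

/** Build the URL and UrlFetchApp options for a Document AI "process" call. */
function buildProcessRequest(config, base64Pdf, accessToken) {
  const { projectId, location, processorId } = config;
  const url = `https://${location}-documentai.googleapis.com/v1/projects/${projectId}` +
    `/locations/${location}/processors/${processorId}:process`;
  const options = {
    method: "post",
    contentType: "application/json",
    headers: { Authorization: `Bearer ${accessToken}` },
    payload: JSON.stringify({ rawDocument: { content: base64Pdf, mimeType: "application/pdf" } }),
    muteHttpExceptions: true, // inspect the status code ourselves instead of throwing
  };
  return { url, options };
}

/** Turn the entities in a process response into one spreadsheet row. */
function entitiesToRow(response, entityTypes) {
  const entities = (response.document && response.document.entities) || [];
  return entityTypes.map((type) => {
    const hit = entities.find((entity) => entity.type === type);
    return hit ? hit.mentionText : ""; // blank cell when the entity was not found
  });
}

/** Apps Script entry point: send one PDF blob to Document AI and return a row. */
function pdfBlobToRow(blob, config, entityTypes) {
  const base64Pdf = Utilities.base64Encode(blob.getBytes());
  const { url, options } = buildProcessRequest(config, base64Pdf, ScriptApp.getOAuthToken());
  const response = UrlFetchApp.fetch(url, options);
  if (response.getResponseCode() !== 200) {
    throw new Error(`Document AI returned ${response.getResponseCode()}: ${response.getContentText()}`);
  }
  return entitiesToRow(JSON.parse(response.getContentText()), entityTypes);
}
```

Keeping the request building and the response flattening separate from the `UrlFetchApp` call makes both easy to check with sample data before spending money on real requests.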

Clark Lind

Oct 4, 2022, 8:12:09 AM

to Google Apps Script Community

Not sure if you ever found a solution. I was looking for a solution also and found this thread. There are a few Apps Script developers I usually turn to, and for something like this, Amit Agarwal usually has cracked this nut already. He didn't disappoint!
See if this helps: https://www.labnol.org/extract-text-from-pdf-220422

Clark Lind

Oct 4, 2022, 8:14:30 AM

to Google Apps Script Community

Amit's solution pretty much implements the answer provided by Remco above.
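
Neither approach covers the Gmail and Sheets ends of the original question, so here is a rough sketch of that glue. It assumes a script bound to the target spreadsheet, and `blobToRow` is a hypothetical callback that turns one PDF blob into an array of cell values (for example via Drive OCR plus regex parsing). The function names, the search query, and the 20-thread limit are illustrative choices only.

```javascript
// Sketch only: blobToRow is a hypothetical callback that turns one PDF blob
// into an array of cell values (for example via Drive OCR plus regex parsing).

/** Collect the PDF attachments from every message in the given threads. */
function collectPdfAttachments(threads) {
  const pdfs = [];
  threads.forEach((thread) => {
    thread.getMessages().forEach((message) => {
      message.getAttachments().forEach((attachment) => {
        if (attachment.getContentType() === "application/pdf") pdfs.push(attachment);
      });
    });
  });
  return pdfs;
}

/** Search Gmail, convert each PDF attachment to a row, and append it to a sheet. */
function importPdfAttachments(query, sheetName, blobToRow) {
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(sheetName);
  const threads = GmailApp.search(query, 0, 20); // first 20 matching threads only
  const pdfs = collectPdfAttachments(threads);
  // First column is the attachment name, followed by whatever the callback extracts.
  pdfs.forEach((pdf) => sheet.appendRow([pdf.getName()].concat(blobToRow(pdf))));
  return pdfs.length;
}
```

A call could look like `importPdfAttachments("has:attachment filename:pdf newer_than:7d", "Sheet1", myParser)`, run from a time-driven trigger. You would probably also want to label or otherwise mark processed threads so the same PDF is not imported twice.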
