Can this be automated? Pull information from PDF into new Google Doc

Kenneth Stuart

Oct 3, 2024, 1:45:11 PM
to Google Apps Script Community
I have a workflow that generates a PDF in a Shared Google Drive. The PDF always has the same format and headings but different text under each heading. This PDF originates from a separate program unrelated to the Google suite. Anytime a new PDF is generated, I manually create a new Google Doc from a blank template file, name the Doc based on information in the PDF, pull information from the PDF into the Doc body, and insert a link in the Doc body back to the original PDF.

Can any part of this workflow for creating and populating a new Google Doc be automated? I am entirely unfamiliar with what can be automated with Google Docs.

I do not need the automation to run immediately when a new PDF appears. Once per day would be sufficient.

George Ghanem

Oct 3, 2024, 10:14:56 PM
to google-apps-sc...@googlegroups.com

All of it can be automated except for pulling the info from the PDF file into the Google Doc file. That may be possible programmatically, but it will require some special operations to achieve.



Brian Meehan

Oct 4, 2024, 1:04:00 AM
to google-apps-sc...@googlegroups.com
I run a very similar workflow for our payroll stubs. It relies on one computer with Google Drive installed, so the Shared Drive appears as a local file system. That computer runs a quick Python script using pdfplumber, which can do amazing things with PDFs: extract text, rotate and add pages, re-order pages, and so on. The script then touches a file, and the Google Apps Script has a trigger that looks for the "i_am_done" file, runs its script, and then deletes the i_am_done file.
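
A rough sketch of what the Python side of a workflow like this might look like, not the actual script described above. It assumes each heading sits on its own line in the extracted text; the function names and the example headings are hypothetical. pdfplumber is a third-party package and is imported only inside the function that needs it.

```python
from pathlib import Path


def split_by_headings(text, headings):
    """Return {heading: body text}, assuming each heading is on its own line."""
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in headings:
            # Start collecting lines for a new section.
            current = stripped
            sections[current] = []
        elif current is not None:
            sections[current].append(stripped)
    return {h: "\n".join(body).strip() for h, body in sections.items()}


def extract_pdf_text(pdf_path):
    """Concatenate the text of every page using pdfplumber (third-party)."""
    import pdfplumber  # imported lazily so the rest works without it

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() can return None for pages with no text layer.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def mark_done(folder, name="i_am_done"):
    """Create the empty marker file that the Apps Script trigger looks for."""
    marker = Path(folder) / name
    marker.touch()
    return marker
```

With hypothetical headings "Summary" and "Details", `split_by_headings(extract_pdf_text(path), ["Summary", "Details"])` would give one dictionary entry per heading, after which `mark_done(folder)` signals the Apps Script side.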



--

Brian Meehan 明益友
Director of Learning Technology 資訊部主任
School: 886-7-586-3320
Email: bme...@kas.tw 


Kaohsiung American School | +886-7-5863300 |  http://kas.tw
No.889 Cueihua Rd. Zuoying Dist., Kaohsiung City, Taiwan 81354



DimuDesigns

Oct 4, 2024, 9:44:48 AM
to Google Apps Script Community
Can you directly access the data used to generate the PDF, preferably in a portable data exchange format such as JSON or CSV? If so, you can avoid parsing the PDF entirely and just fetch the data directly to populate your Google Doc template. If that's not an option, you can look into using PDFLib, a PDF library that can be used directly from GAS. If you're feeling fancy, you can even leverage Google's Document AI to extract the data for you (it's not free, though).
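
To illustrate the first option, here is a minimal sketch of filling a template from a JSON record. It is plain Python rather than Apps Script, and the `{{Key}}` placeholder syntax, the field names, and the sample record are all assumptions made up for the example.

```python
import json
import re


def fill_template(template, record):
    """Replace each {{Key}} placeholder with the matching value from record."""
    def lookup(match):
        key = match.group(1).strip()
        # Leave unknown placeholders untouched so gaps are easy to spot.
        return str(record.get(key, match.group(0)))

    return re.sub(r"\{\{(.*?)\}\}", lookup, template)


# Hypothetical record, as it might arrive from a JSON export.
record = json.loads('{"Title": "Report 17", "Summary": "All checks passed."}')
template = "{{Title}}\n\nSummary: {{Summary}}\nSource: {{Link}}"
filled = fill_template(template, record)
```

Here `{{Link}}` stays in the output because the record has no "Link" field, which makes missing data visible instead of silently blank.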

Kenneth Stuart

Oct 5, 2024, 9:52:56 AM
to Google Apps Script Community
I cannot directly access the database that produces the PDF.

Kenneth Stuart

Oct 5, 2024, 9:54:58 AM
to Google Apps Script Community
George, do you mean there is a way with Apps Script to notice a new PDF and then create a new Doc? That would still be helpful even if I can’t populate the Doc with data from the PDF.

George Ghanem

Oct 5, 2024, 3:54:17 PM
to google-apps-sc...@googlegroups.com
Yes, you can set it on a trigger that checks whether any new files have been added and, if so, kicks off the new Doc build as needed.
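
The reply above is about an Apps Script trigger. The sketch below is not Apps Script; it only shows the "has anything new appeared since the last run?" check in Python, assuming the Shared Drive is synced to a local folder and the script is scheduled once per day. The function name and the state file are hypothetical.

```python
import json
from pathlib import Path


def find_new_pdfs(folder, state_file):
    """Return names of PDFs not seen on a previous run, then record them as seen."""
    state_path = Path(state_file)
    # Load the names recorded by earlier runs, if any.
    seen = set(json.loads(state_path.read_text())) if state_path.exists() else set()
    current = sorted(p.name for p in Path(folder).glob("*.pdf"))
    new = [name for name in current if name not in seen]
    # Persist the union so the next run skips everything handled so far.
    state_path.write_text(json.dumps(sorted(seen | set(current))))
    return new
```

Each name returned by `find_new_pdfs` would then be handed to whatever step builds the new Doc. Tracking by file name assumes names are never reused; a modification time or file ID would be sturdier if they are.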
