ChatGPT Helpful In Translating Tables


Thomas Passin

Jun 18, 2023, 11:06:30 PM
to leo-editor
Very thoughtful piece by Jon Udell: "Why LLM-assisted table transformation is a big deal".

David Szent-Györgyi

Jun 22, 2023, 8:52:14 AM
to leo-editor
On Sunday, June 18, 2023 at 11:06:30 PM UTC-4 tbp1...@gmail.com wrote:
Very thoughtful piece by Jon Udell: "Why LLM-assisted table transformation is a big deal".
 
In my day job, I have to pull useful items out of PDFs: pictures, text, tables. PDFs often make this difficult, both because of password-protected access and because the information that renders as neatly organized text and tables when printed or displayed in a viewer is not stored in that neat order; the data in the PDF requires rearrangement. Jon Udell's article mentions this without discussing the specifics of the articles he processes.
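
As a rough illustration of the rearrangement mentioned above, here is a minimal Python sketch. It assumes some PDF library has already handed back individual words with page coordinates, in whatever order they happen to be stored. The `Word` type, the `rows_from_words` function, and the `y_tolerance` parameter are hypothetical names made up for this example, not part of any real library, and real tables (merged cells, wrapped cell text) would need considerably more care.

```python
from typing import NamedTuple


class Word(NamedTuple):
    """A word plus the page position of its corner (hypothetical type)."""
    x: float
    y: float
    text: str


def rows_from_words(words, y_tolerance=2.0):
    """Group words into visual rows, then order each row left to right.

    A word joins the current row when its y coordinate is within
    y_tolerance of the first word in that row (an assumed heuristic).
    """
    rows = []
    current = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        if current and abs(word.y - current[0].y) > y_tolerance:
            rows.append(current)
            current = []
        current.append(word)
    if current:
        rows.append(current)
    # Within each row, reading order is left to right by x coordinate.
    return [[w.text for w in sorted(row, key=lambda w: w.x)] for row in rows]


# Words arrive in storage order, not reading order.
scrambled = [
    Word(100, 10.5, "Qty"),
    Word(10, 30, "Bolt"),
    Word(10, 10, "Item"),
    Word(100, 29.6, "12"),
]
table = rows_from_words(scrambled)
```

Even this toy version shows why the result is error-prone: the row grouping depends entirely on a tolerance that has to be guessed for each document.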

It is true that tools like ChatGPT are trained on text and as such are most likely to work well on text, but they do not reason about non-text. I would argue that a PDF is non-text, and that recreating neatly organized text and tables from one is therefore error-prone. If we really value the facts in a technical publication, we need to start with a suitable source, which probably needs carefully done markup created by experts in the subject matter of the publication.

I would not trust a complex table produced by ChatGPT: not only is it not a subject matter expert, it also cannot reason as a human being can when making sense of such a document.

I don't know what to say about the extraordinary domain of software that produces those PDFs. How many of those software applications incorporate features meant to allow exploration of the structure of a document? This sounds to me like the sort of job for which Leo is well-equipped! 

Thomas Passin

Jun 22, 2023, 11:22:20 AM
to leo-editor
Even copying selected text out of a PDF file can be unpleasant. Often there will be no newlines, so words may run together when they were visually separated by a line break.
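
For what it's worth, if an extraction tool can at least return one string per visual line, the run-together problem can be sidestepped when joining. The following is only a sketch under that assumption; `join_lines` is a hypothetical helper, and its handling of trailing hyphens is a guess that will be wrong for genuinely hyphenated compounds.

```python
def join_lines(lines):
    """Join visually separate lines into one string without fusing words."""
    pieces = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if pieces and pieces[-1].endswith("-"):
            # Assumption: a trailing hyphen marks a word split across
            # two lines, so drop the hyphen and glue the halves together.
            pieces[-1] = pieces[-1][:-1] + line
        else:
            pieces.append(line)
    # A line break separated these pieces visually, so restore a space.
    return " ".join(pieces)


joined = join_lines(["words may run", "together when copied"])
```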

David Szent-Györgyi

Jun 23, 2023, 5:50:55 AM
to leo-editor
On Thursday, June 22, 2023 at 11:22:20 AM UTC-4 tbp1...@gmail.com wrote:
Even copying selected text out of a PDF file can be unpleasant. Often there will be no newlines, so words may run together when they were visually separated by a line break.

Yes, indeed. Part of my day job involves serving as acquisitions editor for content for my employer's Web site, and I must pull content from PDFs from all over the world, some of them produced by companies whose workers' native language is not written in Roman letters. Fishing data out of PDFs is more of a toothache than a headache!