A PDF contains blocks of text with their coordinates, a bunch of PostScript-style formatting directives (it's a fun language, you should learn it), and other binary objects. A line that gets split might generate multiple blocks of text. It's a drawing: there isn't much structure to read. If you embark on that project, you have to use a bunch of heuristics to figure out the original structure from the blocks of text and their locations. It's not quite a research problem, but it's not a job you'll ace in one or two weekends of hacking either (though it might be very rewarding; I have an incomplete stab at extracting tables from PDF that looks promising, so I have a feeling it's definitely doable).
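To give a flavor of the kind of heuristic involved, here is a minimal sketch that regroups positioned text blocks into lines by comparing their vertical coordinates. The `Block` type, the `group_into_lines` function, and the `y_tolerance` value are all made up for illustration; this is not how any particular PDF library or my table extractor works, and it assumes the usual PDF convention where y grows upward on the page.

```python
from collections import namedtuple

# Hypothetical representation of a text block extracted from a PDF page.
Block = namedtuple("Block", "x y text")


def group_into_lines(blocks, y_tolerance=2.0):
    """Group blocks whose y coordinates are close together into text lines.

    Assumes y grows upward, so larger y means higher on the page.
    y_tolerance is an arbitrary illustrative threshold, in page units.
    """
    lines = []
    # Walk the page top to bottom, then left to right.
    for block in sorted(blocks, key=lambda b: (-b.y, b.x)):
        # Compare against the first block of the current line.
        if lines and abs(lines[-1][0].y - block.y) <= y_tolerance:
            lines[-1].append(block)
        else:
            lines.append([block])
    # Within each line, order blocks left to right and join their text.
    return [
        " ".join(b.text for b in sorted(line, key=lambda b: b.x))
        for line in lines
    ]


blocks = [
    Block(200, 700.5, "COFFEE"),
    Block(10, 700.0, "2018-04-01"),
    Block(400, 699.8, "-4.50"),
    Block(10, 685.0, "2018-04-02"),
    Block(200, 685.0, "GROCERIES"),
]
for line in group_into_lines(blocks):
    print(line)
```

Real statements need more than this (multi-line descriptions, columns that drift, page headers), which is where the weekends go.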
QFX and QBO are variants of a markup format called OFX (SGML-flavored in its 1.x versions, proper XML in 2.x) with a large set of tags, which are often produced inconsistently by the same Java programmers I alluded to in the other thread. (Or they might have been .NET programmers; if I'm not mistaken, OFX was born in the guts of the evil empire itself, but probably before the .NET days.)
Which one is better?
If all have the same numbers, the simplest one.
If some have more information, judge whether the extra data is something you want and worth the extra hacking effort.
I'd shoot for CSV myself (I like it simple, and I'd rather spend weekend time in the kitchen than in front of the computer).
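As a rough illustration of why CSV is the low-effort option, the sketch below reads a statement using only the Python standard library. The column names (`Date`, `Description`, `Amount`) and the `read_transactions` helper are hypothetical; every bank names and orders its columns differently, so treat this as a starting point rather than a working importer.

```python
import csv
import io
from decimal import Decimal


def read_transactions(fileobj):
    """Return (date, description, amount) tuples from a CSV statement.

    The column names used here are assumptions for illustration only.
    """
    reader = csv.DictReader(fileobj)
    return [
        (row["Date"], row["Description"], Decimal(row["Amount"]))
        for row in reader
    ]


# A tiny in-memory sample standing in for a downloaded statement.
sample = io.StringIO(
    "Date,Description,Amount\n"
    "2018-04-01,COFFEE,-4.50\n"
    "2018-04-02,GROCERIES,-32.10\n"
)
for date, description, amount in read_transactions(sample):
    print(date, description, amount)
```

Amounts are parsed as `Decimal` rather than `float` so that cents don't pick up rounding errors.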
mpl...@gmail.com
Apr 17, 2018, 7:32:35 AM
to Beancount
Martin, thank you for your detailed reply! Now it's super clear :)