Veryfi OCR API (and Emacs front-end for it)


TRS-80

Jun 21, 2025, 10:15:25 AM
to bean...@googlegroups.com
Hello Friends,

I am very pleased to announce that I have switched careers and should no
longer have to travel all the time, so now I can get back to the many
projects I have been wanting to work on for years. :)

One of those is dealing with the tedium of paper receipts in a more
automated way.

I have been looking for some kind of receipt OCR for years now. I even
played with some of the available F/LOSS OCR tools, but it seemed to me
you have to tweak them quite a bit to get good results. Or maybe I was
just bad at it. :D Anyway, at some point I realized the best way might
just be to pay someone to figure out all those fiddly details.

Now it turns out I didn't even need to pay any money after all! Anyway,
I'm just trying to illustrate my thought process here.

So finally I stumbled across this Veryfi API. I am sure there are
others, but so far I have found this one works very well and very
accurately. I have been really happy with it.

You can get up to 100 API calls per month for free with a developer
account (which I was able to obtain without any trouble). And that is
enough for my personal needs.
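For the curious, the whole upload is basically one authenticated POST. Here is a minimal Python sketch; the endpoint URL, header names, and body fields are what I remember from Veryfi's docs, so treat them as assumptions and verify against the current documentation before relying on this:

```python
import base64

# Endpoint and auth header names as I recall them from Veryfi's docs --
# these are assumptions; double-check against the official API reference.
VERYFI_URL = "https://api.veryfi.com/api/v8/partner/documents"

def build_veryfi_request(client_id, username, api_key, image_path):
    """Build the headers and JSON body for a Veryfi document upload."""
    with open(image_path, "rb") as f:
        file_data = base64.b64encode(f.read()).decode("ascii")
    headers = {
        "CLIENT-ID": client_id,
        "AUTHORIZATION": f"apikey {username}:{api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "file_name": image_path.rsplit("/", 1)[-1],
        "file_data": file_data,
    }
    return headers, body

# Actually sending it would then be something like:
#   import requests
#   headers, body = build_veryfi_request("my-id", "me", "my-key", "receipt.jpg")
#   resp = requests.post(VERYFI_URL, headers=headers, json=body)
#   doc = resp.json()  # the structured receipt fields come back here
```

(The same request is easy to reproduce from Emacs Lisp with `url-retrieve`, which is what my front-end does.)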

I thought others might find this interesting, so I wanted to share it
with the list. I would encourage you to make a developer account (I
think you can even get a 2-week free trial first), upload some receipts
via their web interface, and see how accurate it is for you.

I have no affiliation with them, just been looking for something like
this for years.

Now, I prefer to work in Emacs Lisp (rather than Python), so of course
the next thing was writing a front-end to this API in Emacs. :) I
realize fewer of you are probably interested in that, but I thought I
would mention it anyway. It has been working flawlessly for me; I just
need to write a README and publish it. Maybe some interest here would
motivate me to do so sooner.

--
Cheers,
TRS-80

Martin Blais

Jun 21, 2025, 12:41:05 PM
to bean...@googlegroups.com
These days for OCR I think you can just download a free vision model from Hugging Face and run it locally, and it would work.
I remember doing that in the recent past.


--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/beancount/87frft89lc.fsf%40isnotmyreal.name.

TRS-80

Jun 21, 2025, 2:25:05 PM
to bean...@googlegroups.com
Martin Blais <bl...@furius.ca> writes:

> These days for OCR I think you can just download a free vision model
> from Hugging Face and run it locally, and it would work.
> I remember doing that in the recent past.

I imagine something like that just returns a big blob of plain text,
amirite? Or can you have it return the data to you in a more structured
format, with specific fields (key-value pairs) more relevant to a
receipt?

With this Veryfi API, they return JSON with very specific fields
(e.g., card number, date, payee, tax, tip, total, even individual
receipt lines, including UPCs when available), and so far it has been
very accurate for me.

When I played with other general OCR tools in the past, I remember the
OCR itself was only half the battle. Even if you got that part accurate,
you then had to write regexes to pull all the other specific info out
of the text.
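With the structured JSON, turning a receipt into a Beancount entry is mostly string formatting. A rough sketch; the key names ("date", "vendor", "total") are my guesses at the schema, not necessarily Veryfi's exact field names, and the accounts are placeholders:

```python
def veryfi_to_beancount(doc,
                        expense_account="Expenses:Misc",
                        funding_account="Liabilities:CreditCard"):
    """Render a Veryfi-style receipt dict as a Beancount transaction.
    The keys used here are assumptions about the JSON schema -- adjust
    to the response you actually receive."""
    date = doc["date"][:10]  # "YYYY-MM-DD HH:MM:SS" -> just the date
    payee = doc.get("vendor", {}).get("name", "Unknown")
    total = doc["total"]
    return "\n".join([
        f'{date} * "{payee}" ""',
        f"  {expense_account}  {total:.2f} USD",
        f"  {funding_account}  {-total:.2f} USD",
    ])

sample = {
    "date": "2025-06-21 10:15:00",
    "vendor": {"name": "Corner Grocery"},
    "total": 12.34,
}
print(veryfi_to_beancount(sample))
# 2025-06-21 * "Corner Grocery" ""
#   Expenses:Misc  12.34 USD
#   Liabilities:CreditCard  -12.34 USD
```

A real importer would also map the line items to per-category postings, but the point is there is no regex archaeology involved.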

--
Cheers,
TRS-80

Timothy Jesionowski

Jun 21, 2025, 6:30:50 PM
to bean...@googlegroups.com
OCR plus some LLM prompts would be a more general solution than regexes, if it works. But if Veryfi works, then I wouldn't bother. (LLMs are probably what they use anyway...)
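The OCR-plus-LLM route amounts to asking the model for a fixed JSON object instead of writing per-merchant regexes. A model-agnostic sketch; the field list and the fence-tolerant parsing are just illustrative:

```python
import json
import re

FIELDS = ["date", "payee", "total", "tax", "tip"]

def build_extraction_prompt(ocr_text):
    """Prompt for any chat LLM: request one JSON object with fixed keys."""
    return (
        "Extract the following fields from this receipt text and reply "
        f"with ONLY a JSON object with keys {FIELDS}; use null for "
        "anything you cannot find.\n\n" + ocr_text
    )

def parse_llm_reply(reply):
    """Pull the JSON object out of the reply, tolerating ```json fences
    or chatter around it."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if not match:
        raise ValueError("no JSON object in reply")
    return json.loads(match.group(0))
```

The prompt gets sent to whatever model you like; the parser is the same either way.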



Sincerely,
Timothy Jesionowski


Alen Šiljak

Jun 22, 2025, 5:29:00 PM
to Beancount
> I imagine something like that just returns a big blob of plain text, 

It returns whatever you ask it to.
For example, I've had a bunch of lab-result photos, taken with a phone, translated into JSON files and later analysed and compared.


Martin Blais

Jun 22, 2025, 7:35:13 PM
to bean...@googlegroups.com
Today's models are pretty amazing actually.
You can say things like "output the text under the white cat" and that would likely work.
I'm blown away every moment of the day these days using these.


Stefano Zacchiroli

Jun 23, 2025, 2:51:51 AM
to bean...@googlegroups.com
On Sun, Jun 22, 2025 at 07:34:58PM -0400, Martin Blais wrote:
> Today's models are pretty amazing actually.
> You can say things like "output the text under the white cat" and that
> would likely work.
> I'm blown away every moment of the day these days using these.

Yeah, but especially for vision models it seems to me that the quality
gap between self-hostable open-weight models and remote proprietary ones
is still pretty big. Last time I tried vision models locally for OCR in
the context of personal finance, the results weren't great (= not usable
yet), but that was ~1 year ago and things move fast. If someone on this
list has concrete experience with self-hostable vision models that work
well for this, I'd love to hear the specifics (which model, system
prompt, etc.).

Cheers
--
Stefano Zacchiroli . za...@upsilon.cc . https://upsilon.cc/zack
Full professor of Computer Science
Télécom Paris, Polytechnic Institute of Paris
Co-founder & CSO Software Heritage
Mastodon: https://mastodon.xyz/@zacchiro