Tesseract with phone images of receipts

Wayne Rumble

Jun 23, 2016, 10:09:35 AM
to tesseract-ocr

I am writing a program for my final project, and part of it extracts the quantity, item name, and price from a restaurant receipt using Tesseract. I am using Ionic with Angular and a Rails API: the phone passes the image to the Rails API, which converts it and passes the extracted information back via a server to be displayed by Angular and Ionic again.

When I tested with a restaurant receipt found online (the receipt image I was using) and cropped the image to contain just the items and total, it worked fine. But when I print out the same receipt image, take a photo of it with my phone, then crop it and pass it to the following methods, the results are basically useless.

Here is the image processing code:


module Converter

  # Downscale and grayscale the latest bill image with ImageMagick,
  # run Tesseract on it, then parse output.txt into a total and items.
  def tesseract
    system("convert #{Bill.last.image.url}  -scale 50% receipt.jpg")
    system("convert receipt.jpg -type Grayscale receipt.jpg")
    system("tesseract receipt.jpg output")
    find_total
    create_items
    system("rm output.txt")
    system("rm receipt.jpg")
  end

  private

  # Take the first number on the first line containing "TOTAL".
  def find_total
    a = File.readlines('./output.txt').grep(/TOTAL/)
    b = a.map { |x| x[/\d+(?:[.,]\d+)?/].to_f }[0]
    Bill.last.update(total: "#{b}")
  end

  # Create one Item per line that contains at least one word.
  def create_items
    File.open './output.txt', 'r' do |file|
      file.each_line do |line|
        if search_for_words(line).length != 0
          Item.create(
            name: search_for_words(line),
            price: search_for_float(line),
            quantity: search_for_integer(line),
            bill_id: Bill.last.id
          )
        end
      end
    end
  end

  # First decimal number on the line, treating a decimal comma as a dot.
  def search_for_float(line)
    line.tr(',', '.').scan(/(\d+[,.]\d+)/).flatten[0].to_f
  end

  # First run of digits on the line.
  def search_for_integer(line)
    line.tr(',', '.').scan(/(\d+)/).flatten[0].to_i
  end

  # Words containing at least one lowercase letter, joined by spaces.
  def search_for_words(line)
    line.split(" ").select { |word| word.match(/([a-z])/) }.join(" ")
  end

end
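
To separate parsing problems from OCR problems, the line-parsing step can be exercised on its own, without Rails, ImageMagick, or Tesseract. The following is only a minimal sketch: `parse_receipt_line` is a hypothetical helper (it is not part of the module above) that mirrors the three `search_for_*` methods, and the sample lines are made up rather than taken from a real receipt.

```ruby
# Hypothetical helper: parse one line of OCR output into a hash,
# or return nil when the line contains no word with a lowercase letter.
def parse_receipt_line(line)
  # Same word filter as search_for_words: keep words with a lowercase letter.
  # All-caps lines such as "TOTAL 14.48" therefore produce no item.
  name = line.split(' ').select { |word| word.match(/[a-z]/) }.join(' ')
  return nil if name.empty?

  # Treat a decimal comma as a dot without mutating the caller's string.
  normalized = line.tr(',', '.')

  {
    name: name,
    price: normalized[/\d+\.\d+/].to_f,  # first decimal number, 0.0 if none
    quantity: normalized[/\d+/].to_i     # first run of digits, 0 if none
  }
end

# Made-up sample lines standing in for the contents of output.txt.
sample = ['2 Cheese Burger 11,98', '----------', '1 Cola 2.50', 'TOTAL 14.48']
items = sample.map { |line| parse_receipt_line(line) }.compact
items.each { |item| puts item.inspect }
```

If clean, typed-in lines like these parse as expected, the bad results from phone photos are more likely coming from the image or the OCR step than from the regular expressions.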

I had version and compatibility troubles when using the tesseract gem, so I resorted to calling Tesseract from the command line instead. Any insights on whether I should be resizing the image, or preprocessing it in some other way, would be great.

Thanks in advance
