I am writing a program for my final project, and part of it extracts the quantity, item name and price from a restaurant receipt using Tesseract. I am using Ionic with Angular on the phone and a Rails API on the server: the phone sends the image to the Rails API, which converts the image, runs OCR on it and passes the extracted information back to be displayed by Angular and Ionic.
The issue I'm having is this. When I tested with restaurant receipt images found online, cropped to contain just the items and the total, it worked fine. But when I print out one of those receipt images, take a photo of it with my phone, crop it and pass it to the following methods, the results are basically unusable.
Here is the image processing code:
    module Converter
      def tesseract
        system("convert #{Bill.last.image.url} -scale 50% receipt.jpg")
        system("convert receipt.jpg -type Grayscale receipt.jpg")
        system("tesseract receipt.jpg output")
        find_total
        create_items
        system("rm output.txt")
        system("rm receipt.jpg")
      end

      private

      def find_total
        a = File.readlines('./output.txt').grep(/TOTAL/)
        b = a.map { |x| x[/\d+(?:[.,]\d+)?/].to_f }[0]
        Bill.last.update(total: "#{b}")
      end

      def create_items
        File.open './output.txt', 'r' do |file|
          file.each_line do |line|
            if search_for_words(line).length != 0
              Item.create(
                name: search_for_words(line),
                price: search_for_float(line),
                quantity: search_for_integer(line),
                bill_id: Bill.last.id
              )
            end
          end
        end
      end

      def search_for_float(line)
        line.gsub!(',', '.')
        line.scan(/(\d+[,.]\d+)/).flatten[0].to_f
      end

      def search_for_integer(line)
        line.gsub!(',', '.')
        line.scan(/(\d+)/).flatten[0].to_i
      end

      def search_for_words(line)
        line.split(" ").select { |word| word.match(/([a-z])/) }.join(" ")
      end
    end
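To separate OCR problems from parsing problems, the line-parsing logic can be exercised on hand-typed strings, without Tesseract or the database. The following is only a minimal sketch under some assumptions: `parse_receipt_line` is a hypothetical helper that is not part of the module above, it assumes the quantity (if present) is the first token on the line, and it matches letters case-insensitively, which is a change from `search_for_words`, where only words containing a lowercase letter are kept.

```ruby
# Hypothetical helper (not part of the Converter module above): parses one
# OCR'd receipt line without mutating it or touching the database.
def parse_receipt_line(line)
  normalized = line.tr(',', '.') # treat "4,50" as "4.50"; returns a new string

  # Keep tokens that contain at least one letter, upper or lower case.
  words = normalized.split.select { |word| word.match?(/[a-z]/i) }
  return nil if words.empty? # no item name on this line

  price = normalized[/\d+\.\d+/] # first decimal number, if any
  # Assumption: a quantity is a whole number at the very start of the line.
  quantity = normalized[/\A\s*(\d+)\b(?!\.\d)/, 1]

  {
    name: words.join(' '),
    price: price ? price.to_f : nil,
    quantity: quantity ? quantity.to_i : nil
  }
end

item = parse_receipt_line("2 Cheese Burger 11,98")
# item[:name] is "Cheese Burger", item[:price] is 11.98, item[:quantity] is 2
```

Returning `nil` for a missing price or quantity, rather than `0`, makes it easier to tell a line that genuinely had no number from one where the OCR output was garbled.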
I had version and compatibility troubles when using the tesseract gem, so I resorted to calling Tesseract from the command line instead. Any insights on whether I should be resizing the image, or preprocessing it in some other way, would be great.
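For concreteness, this is the kind of alternative preprocessing I have in mind. It is only a sketch under assumptions, not something verified on the receipts: `preprocess_command` and `tesseract_command` are hypothetical helpers, the scale and threshold values are untested guesses that would need tuning per photo, and the file names are placeholders. It relies on ImageMagick's `-colorspace`, `-resize`, `-normalize` and `-threshold` options and on Tesseract's page segmentation mode option.

```ruby
# Hypothetical helpers: each returns an argument array for system(*args),
# which avoids passing the image path through a shell.
def preprocess_command(input_path, output_path, scale: '100%', threshold: '50%')
  [
    'convert', input_path,
    '-colorspace', 'Gray',    # convert to grayscale
    '-resize', scale,         # guess: keep full resolution instead of halving it
    '-normalize',             # stretch contrast across the full range
    '-threshold', threshold,  # binarize; the cut-off is an untested guess
    output_path
  ]
end

def tesseract_command(image_path, output_base, psm: 6)
  # Page segmentation mode 6 asks Tesseract to assume a single uniform block
  # of text. The flag is spelled -psm on Tesseract 3.x.
  ['tesseract', image_path, output_base, '--psm', psm.to_s]
end

# Usage (requires ImageMagick and Tesseract on the PATH):
#   system(*preprocess_command('photo.jpg', 'receipt.png'))
#   system(*tesseract_command('receipt.png', 'output'))
```

Building the commands as arrays also makes it easy to try different `scale` and `threshold` values and compare the resulting `output.txt` files side by side.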
Thanks in advance