Help in using Tesseract via perl system command

Chula Vista

May 29, 2020, 4:03:25 AM
to tesseract-ocr
Hello,

I just got started with Tesseract. From a command prompt, the following runs successfully:

"C:\\Program Files\\Tesseract-OCR\\tesseract.exe" I:\\OCR\\arf-0026.pbm "I:\\OCR\\aa" --psm 6 --dpi 300

However, when I try to replicate this in a Perl script, I cannot get the --psm 6 --dpi 300 parameters to work.

Without --psm 6, I get an 'Empty page!!' result.

I have tried

my @args = ('I:\\OCR\\arf-0026.pbm',  'I:\\OCR\\aa', ' --psm 6',  ' --dpi 300');

passed to

system("C:\\Program Files\\Tesseract-OCR\\tesseract.exe", @args);

but that leads to: read_params_file: Can't open  --psm 6

This must be something simple, but I am stuck. Thanks.

Robin Watts

May 29, 2020, 4:46:36 AM
to tesser...@googlegroups.com, Chula Vista
On 29/05/2020 08:50, Chula Vista wrote:
> I have tried  my @args = ('I:\\OCR\\arf-0026.pbm', 'I:\\OCR\\aa', ' --psm 6', ' --dpi 300'),

my @args = ('I:\\OCR\\arf-0026.pbm', 'I:\\OCR\\aa', '--psm', '6', '--dpi', '300');

I suspect: each option name and each value as its own list element, with no leading space.

Robin
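
A minimal sketch of why the split matters, assuming the usual behaviour of Perl's list-form system(): every list element is handed to the program as one argument, so ' --psm 6' arrives as a single string that Tesseract apparently takes for a config file name (which would explain the read_params_file message). The helper name tesseract_args is hypothetical and only there for illustration; the paths are the ones from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list from name/value pairs so that
# every option name and every value becomes its own list element.
sub tesseract_args {
    my ($image, $outbase, @pairs) = @_;
    my @args = ($image, $outbase);
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        push @args, "--$name", $value;
    }
    return @args;
}

my @args = tesseract_args('I:\\OCR\\arf-0026.pbm', 'I:\\OCR\\aa',
                          psm => 6, dpi => 300);

# Show each argument exactly as the program would receive it.
print "[$_]\n" for @args;

# Path taken from the original post; the call is skipped if it is not installed there.
my $exe = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe';
if (-e $exe) {
    system($exe, @args) == 0
        or warn "tesseract failed (\$? = $?)\n";
}
```

Because the list form is used, the space in "Program Files" needs no extra quoting on the Perl side.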

Helmut Wollmersdorfer

Jul 7, 2020, 1:38:45 PM
to tesseract-ocr
I can't tell you for Windows.

On a Unix-like system it works like this:

sub tesseract {
  my ($conf, $imagefile) = @_;

# tesseract  my_0002.png my_0002.png
# -c load_bigram_dawg=false -c load_freq_dawg=false -c load_system_dawg=false
# -c tessedit_write_images=true
# --oem 3

  # e.g. 'tessdata'  => '/usr/local/share/tessdata',
  $ENV{'TESSDATA_PREFIX'} = $conf->{'tessdata'} if $conf->{'tessdata'};

  # e.g. 'tesseract' => '/usr/local/bin/tesseract',
  my $command  = $conf->{'tesseract'};
  my $basename = $imagefile;
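  # NOTE: $options is not declared in this sub; it must be a hash ref visible from the enclosing scope
  my $language = '-l ' . $options->{'language'};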
  my $tess_options  = '-c tessedit_write_images=true'; # writes tessinput.tif
  #my $files    = 'makebox hocr txt pdf';       # writes $base.box $base.hocr $base.txt
  my $files    = 'txt';          # writes $base.txt
  $files = $options->{'file_format'} if $options->{'file_format'};   # keep the default if none is given
  my $tessdata = '';
  $tessdata = '--tessdata-dir ' . $conf->{'tessdata'} if $conf->{'tessdata'};
  my $psm = '--psm 4';
  if (defined $options->{'psm'} && $options->{'psm'} =~ m/^\d{1,2}$/) {
    $psm = '--psm ' . $options->{'psm'};
  }

  $basename =~ s/\.(png|jpg|tif|gif)$//i;

  #my @command = ($command, $imagefile, $basename, $language, $tess_options, $tessdata, $files);
  my @command = ($command, $imagefile, $basename, $language, $psm, $tessdata, $files);


  my $command_string = join(' ', @command);
  print STDERR $command_string, "\n" if ($options->{'verbose'} >= 1);
  system($command_string);

  # -1 means the command could not be started; a non-zero tesseract exit status is not checked here
  if ($? == -1) {
    die "$command $imagefile failed: $!";
  }

  my $new_name = $basename . '.tessinput.tif';
  if (-e 'tessinput.tif' && -f 'tessinput.tif') {
    rename('tessinput.tif',"$new_name");
  }

  my $txtfile = $basename . '.txt';
  $basename =~ s/_\d+$//i;
  my $txtall  = $basename . '.tess.txt';

  if (($files =~ m/txt/) && -e $txtfile && -f $txtfile) {
    $command_string = "cat $txtfile >> $txtall";
    print STDERR $command_string, "\n" if ($options->{'verbose'} >= 1);
    system($command_string);

    if ($? == -1) {
      die "$command_string failed: $!";
    }
  }
}
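
The sub above joins everything into one string, so the shell splits it again on whitespace. A hedged alternative, sketched below, is to keep the command as a list and to append the text file from Perl instead of calling cat, which should also carry over to Windows. The helper names run_list and append_file are hypothetical and not part of the code above; the demo runs the Perl interpreter itself ($^X) rather than tesseract so that it works without an OCR installation.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command in list form and return its exit code.
sub run_list {
    my @cmd = @_;
    my $rc = system(@cmd);    # list form: each element is passed as one argument
    die "could not start $cmd[0]: $!" if $rc == -1;
    die sprintf("%s died with signal %d", $cmd[0], $? & 127) if $? & 127;
    return $? >> 8;           # exit code of the child process
}

# Hypothetical helper: append the contents of $src to $dst without calling cat.
sub append_file {
    my ($src, $dst) = @_;
    open(my $in,  '<',  $src) or die "cannot read $src: $!";
    open(my $out, '>>', $dst) or die "cannot append to $dst: $!";
    print {$out} $_ while <$in>;
    close($in);
    close($out) or die "cannot close $dst: $!";
}

# Demo: a child process that exits with status 3.
my $exit = run_list($^X, '-e', 'exit 3');
print "exit code: $exit\n";
```

With tesseract, the call would look like run_list($command, $imagefile, $basename, '-l', $lang, '--psm', 4, 'txt'), where each option name and value is again a separate element.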

 

 