(Linux) Convert text to Excel (and back) - what are people's preferred methods?

13 views
Skip to first unread message

Kenny McCormack

unread,
May 19, 2022, 4:15:42 PMMay 19
to
First of all, I know the standard way - the way I've been doing it for
decades - which is to:
1) From Excel, write it out as a text file (tab delimited).
2) Work on it in Linux, using, e.g., AWK, creating a new tab-delimited file.
3) Load the tab-delimited file back into Excel

This, of course, works fine, but is tedious and not really automate-able;
you still have lots of manual steps.

What I'm looking for is something that can actually read/write Excel's
native format - i.e., an XLS file. Note that, for the purposes of this
discussion, we are only talking about XLS (the old, classic Excel format),
not XLSX.

I believe there are packages available for the more popular scripting
languages, such as Perl, Tcl, maybe Python, but not for AWK, that do this.
But I've never used any of them. Even though I'm doing my main processing
in AWK, I would not have a problem with using something in one of these other
languages, if someone can/would "spoon feed" me on how to do it.

Finally, note that I've done this sort of thing in the past using Excel
macros and/or COM automation, to do it (i.e., automate it) in Excel itself.
This is a possibility, but is kind of messy. I'd prefer a straight command
line way.

Also, one of the issues that I'm trying to get around is that when you do
step 3 in the above list, you then have to mess around a fair amount in
Excel to re-size your columns, and restore other bits of meta-information
that got lost in step 1. This again, would be nice to have be automated by
the command line tool that I am seeking. As noted, I had previously done
this using Excel macros.

--
Trump - the President for the rest of us.

https://www.youtube.com/watch?v=JSkUJKgdcoE

Lew Pitcher

unread,
May 19, 2022, 5:21:06 PMMay 19
to
I can't speak to your step 3; I don't know of any way to preserve the
metadata (column width, highlighting, etc) when converting an XLS file to
CSV.

However, I can suggest a simplification of your steps 1 and 2: use
LibreOffice/OpenOffice Calc to convert the XLS file to CSV. It's a
simple, one-line commandline invocation:
scalc --invisible --convert-to csv --outdir target/directory input.XLS

Take a look at
scalc --help
for these and other options

HTH
--
Lew Pitcher
"In Skills, We Trust"

Kenny McCormack

unread,
May 19, 2022, 6:02:20 PMMay 19
to
In article <t66cbd$f1o$1...@dont-email.me>,
Lew Pitcher <lew.p...@digitalfreehold.ca> wrote:
...
>I can't speak to your step 3; I don't know of any way to preserve the
>metadata (column width, highlighting, etc) when converting an XLS file to
>CSV.

Actually, step 3 is about converting from tab-delimited to XLS, but I get
the your meanining. To do this in full generality, you'd need to preserve
it when you went down to text so that you could put it back when you went
back up to XLS. But what I had in mind was some sort of option collection
(either on the command line or in a config file) that you'd feed to the
tab-delimited-to-XLS program to cause it to insert that meta-data into the
output (XLS) file.

>However, I can suggest a simplification of your steps 1 and 2: use
(Actually, only step 1. You still need to do the actual processing of the
data in step 2)

>LibreOffice/OpenOffice Calc to convert the XLS file to CSV. It's a
>simple, one-line commandline invocation:
> scalc --invisible --convert-to csv --outdir target/directory input.XLS
>
>Take a look at
> scalc --help
>for these and other options

This is not bad. I've actually done this before - and you're right, using
LO on both the input and output sides might not be such a bad idea.

But it does seem like overkill. I had hoped for something more
lightweight, such as the packages that I alluded to in the OP - which seem
to be present for languages such as Perl and Tcl.

--
The randomly chosen signature file that would have appeared here is more than 4
lines long. As such, it violates one or more Usenet RFCs. In order to remain
in compliance with said RFCs, the actual sig can be found at the following URL:
http://user.xmission.com/~gazelle/Sigs/ItsTough

Eli the Bearded

unread,
May 19, 2022, 8:12:04 PMMay 19
to
In comp.unix.shell, Kenny McCormack <gaz...@shell.xmission.com> wrote:
> First of all, I know the standard way - the way I've been doing it for
> decades - which is to:
> 1) From Excel, write it out as a text file (tab delimited).
> 2) Work on it in Linux, using, e.g., AWK, creating a new tab-delimited file.
> 3) Load the tab-delimited file back into Excel
...
> I believe there are packages available for the more popular scripting
> languages, such as Perl, Tcl, maybe Python, but not for AWK, that do
> this. But I've never used any of them. Even though I'm doing my main
> processing in AWK, I would not have a problem with using something in
> one of these other languages, if someone can/would "spoon feed" me on
> how to do it.

The case of read/write one sheet in one xls file is covered in the
sample code in the documentation for the Perl module. Here's my version
of that sample code:

#!/usr/bin/perl -w
# Parse and dump xls files, from the Spreadsheet::ParseExcel documentation.
# This version puts all the sheets in a single file.
# 1 June 2001
use strict;
use Spreadsheet::ParseExcel;

my $oExcel = new Spreadsheet::ParseExcel;
my $xlsfile = shift;

die "$0: usage 'parsexls foo.xls > foo.out'\n"
unless (defined($xlsfile) and (-f $xlsfile));

#1.1 Normal Excel97
my $oBook = $oExcel->Parse($xlsfile);
my($iR, $iC, $oWkS, $oWkC);

print "FILE :", $oBook->{File} , "\n";
print "COUNT :", $oBook->{SheetCount} , "\n";
print "AUTHOR:", $oBook->{Author} , "\n";

my $nl;

for(my $iSheet=0; $iSheet < $oBook->{SheetCount} ; $iSheet++) {
$oWkS = $oBook->{Worksheet}[$iSheet];
print "--------- SHEET:", $oWkS->{Name}, "\n";

for(my $iR = $oWkS->{MinRow} ;
defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ; $iR++) {

$nl = '';
for(my $iC = $oWkS->{MinCol} ;
defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxCol} ; $iC++) {

$oWkC = $oWkS->{Cells}[$iR][$iC];
if ($oWkC) {
print "\t" if $nl;
print $oWkC->Value;
$nl = "\n";
} else {
print "\t" if $nl;
}

} # for column in row
print $nl;
} # for row in sheet
} # for sheet in book

exit
__END__


And here's my tsvtoxls reverse tool:

#!/usr/bin/perl -w
# Turn one or more TSV files into an XLS file.
# Each input file will be a sheet.
#
# Usage:
# tsvtoxls TSVfile [ TSVfile ] output.xls
# March 2011

use strict;
use Spreadsheet::WriteExcel;

# WriteExcel objects
use vars qw(
$WB $WS
);

# Other vars
use vars qw(
@sheetfiles $outfile $thisfile $line @values $value
);

for $thisfile (@ARGV) {
if( $thisfile =~ /[.]xls$/i) {
if(!defined($outfile)) {
$outfile = $thisfile;
} else {
die "$0: cannot have two output files: $outfile and $thisfile\n";
}

next;
}

if(! -f $thisfile) {
die "$0: this file $thisfile is not a file\n";
}

push(@sheetfiles, $thisfile);
}

if($#sheetfiles < 0) {
die "$0: no input files\n";
}
if(!defined($outfile)) {
die "$0: no output file\n";
}

$WB = Spreadsheet::WriteExcel->new($outfile);

for $thisfile (@sheetfiles) {
if(!open(SHEET, "< $thisfile")) {
warn "$0: cannot open $thisfile: $!, skipping\n";
next;
}
$thisfile =~ s:.*/::; # drop directory
$thisfile =~ s/[.]tsv$//; # drop suffix
$thisfile =~ tr/_/ /; # underscore to space

$WS = $WB->addworksheet($thisfile);

while(defined($line = <SHEET>)) {
chomp $line;
@values = split(/\t/, $line);

# row = line from file, start in column 0
$WS->write_row($., 0, \@values);
}
close SHEET;
}
$WB->close;
__END__

> Also, one of the issues that I'm trying to get around is that when you
> do step 3 in the above list, you then have to mess around a fair
> amount in Excel to re-size your columns, and restore other bits of
> meta-information that got lost in step 1. This again, would be nice
> to have be automated by the command line tool that I am seeking. As
> noted, I had previously done this using Excel macros.

Those spreadsheet modules could probably read all the column, row, and
cell formatting from one shhet and apply it to another. I haven't tried.
I think I set column widths once, but I can't find my code for that now.

Also note, there's an a whole separate module for xlxs, but interface
is pretty similar as I recall.

Elijah
------
comp.lang.perl.misc will be good for further Perl questions

vgersh99

unread,
Jun 27, 2022, 10:54:32 AM (13 hours ago) Jun 27
to
On Thursday, May 19, 2022 at 4:15:42 PM UTC-4, Kenny McCormack wrote:
> First of all, I know the standard way - the way I've been doing it for
> decades - which is to:
> 1) From Excel, write it out as a text file (tab delimited).
> 2) Work on it in Linux, using, e.g., AWK, creating a new tab-delimited file.
> 3) Load the tab-delimited file back into Excel
>
...
Maybe Ed Morton's SO post could help somewhat:
https://stackoverflow.com/questions/38805123/how-do-i-use-awk-under-cygwin-to-print-fields-from-an-excel-spreadsheet

Reply all
Reply to author
Forward
0 new messages