
Convert word table to SQL table


Jens Benecke

Jul 5, 2000, 3:00:00 AM

Hi,

there is a Word file here that contains one single table with about 10,000
rows (it's a literature database). Word takes about one and a half hours to
read this file on an SCSI-Athlon system (don't ask me who created it and
how!), and my job is to convert this file to a format that a SQL database
can read or that can be converted into SQL format with little effort, e.g.
CSV.

How can I convert a Word table into a text-like format? Excel and Access
refuse to read or import that file. catdoc exports it to plain text all right
(takes about 5 seconds) but screws up the formatting (e.g. makes multiple
lines out of one, leaves empty cells out altogether, etc.) and cannot cope
with the umlauts, of which this file contains a lot.


Does anyone have an idea?

As long as I get something readable (e.g. plain text) that I can feed to a
Perl (or similar) script, I'm satisfied. It doesn't need to be
anything fancy.

Oh, btw: there are copies of the 95, 97 and 2000 releases of MS Office
available, so take your pick.


Thanks in advance! :)


PS: Please CC me in your replies, so that I don't miss anything. Thanks.

--
Jens Benecke

Suzanne S. Barnhill

Jul 5, 2000, 3:00:00 AM
You could start with Convert Table to Text and then save in a text format.

--
Suzanne S. Barnhill
Microsoft Word MVP
Words into Type
Fairhope, AL USA

Jens Benecke <je...@pinguin.conetix.de> wrote in message
news:3aa0k8...@192.168.1.100...

John A Fotheringham

Jul 6, 2000, 3:00:00 AM
>How can I convert a Word table into a text-like format? Excel and Access
>refuse to read or import that file. catdoc exports it to plain text alright
>(takes about 5 seconds) but screws up the formatting (i.e. makes multiple
>lines out of one, leaves empty cells out altogether etc) and cannot cope
>with umlauts which this file contains a lot.
>
>Does anyone have an idea?
>
>As long as I get something readable (i.e. plain text) that I can feed a
>Perl (or something) script with, I'm satisfied. It doesn't need to be
>anything fancy.

Try saving as RTF. RTF files are plain text with tags in them. The tags
all have the form \tag, and groups of them are delimited by curly braces {}.

If there's not too much formatting, it should be pretty easy to parse
the resulting text using something like perl.

Alternatively save as HTML, although I'm not sure how easy that would
be to parse.

Of course if it takes that long to load it may take even longer to
convert to either format.
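
For what it's worth, here is a rough Python sketch of the RTF route. It relies only on the fact that RTF ends each table cell with \cell and each row with \row. The function names and the Windows-1252 assumption for \'hh escapes (which is how umlauts usually show up) are illustrative guesses, not anything taken from Word. It is deliberately naive: it does not skip the font table or stylesheet groups at the top of a real Word RTF file, so the first cell may pick up header text that needs trimming.

```python
import re

# \cell ends a table cell and \row ends a table row in RTF.
# The lookahead keeps \cellx1234 (a cell-width tag) from matching \cell.
ROW_END = re.compile(r"\\row(?![a-z])")
CELL_END = re.compile(r"\\cell(?![a-z])")
HEX_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")   # e.g. \'fc for u-umlaut
CONTROL_WORD = re.compile(r"\\[a-zA-Z]+-?\d* ?")  # e.g. \intbl, \cellx3000


def clean(fragment, encoding="cp1252"):
    """Turn one raw RTF cell into plain text (assumes a cp1252 document)."""
    text = HEX_ESCAPE.sub(
        lambda m: bytes.fromhex(m.group(1)).decode(encoding, errors="replace"),
        fragment,
    )
    text = CONTROL_WORD.sub("", text)
    return text.replace("{", "").replace("}", "").strip()


def rtf_table_rows(rtf):
    """Return the table as a list of rows, each a list of cell strings."""
    rows = []
    # Whatever follows the last \row is trailer, not table data.
    for row_chunk in ROW_END.split(rtf)[:-1]:
        # Whatever follows the last \cell in a row is formatting only.
        cells = CELL_END.split(row_chunk)[:-1]
        rows.append([clean(cell) for cell in cells])
    return rows
```

Calling rtf_table_rows() on the file contents gives a list of rows that keeps empty cells as empty strings, which is the part catdoc was dropping.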

HTH.
--
John A Fotheringham (Jaf)
Convert text files to HTML or RTF in seconds
http://www.jafsoft.com/asctohtm/index.html or
http://www.jafsoft.com/asctortf/index.html

Jens Benecke

Jul 6, 2000, 3:00:00 AM
In comp.os.ms-windows.apps.word-proc Suzanne S. Barnhill <sbar...@zebra.net> wrote:

> You could start with Convert Table to Text and then save in a text format.

Sorry, I forgot: Word has the file format box in the "Save As" dialog
disabled. Don't ask me why, apparently it cannot cope with converting such
a big file.

Otherwise, this would be quite trivial. Oh, and other things, like "Select
All" + paste into notepad etc. don't work either. Too much data for the
clipboard, I suppose.


Thanks anyway. =;)

--
ciao, Jens int main() { http://www.pinguin.conetix.de; }
jben...@web.de http://www.hitch-hiker.de
je...@pinguin.conetix.de http://www.linuxhelp.de
je...@linuxfaq.de http://www.linuxfaq.de

Jens Benecke

Jul 6, 2000, 3:00:00 AM
In comp.os.ms-windows.apps.word-proc John A Fotheringham <spam.my....@jafsoft.com> wrote:

>>How can I convert a Word table into a text-like format? Excel and Access

>>refuse to read or import that file. catdoc exports it to plain te..
>>.. long as I get something readable (i.e. plain text) that I can feed a


>>Perl (or something) script with, I'm satisfied. It doesn't need to be
>>anything fancy.
> Try saving as RTF. RTF files are text with tags in them. The tags
> all have the form \tag and may be delimited by curly braces {}.

> Of course if it takes that long to load it may take even longer to
> convert to either format.

Sorry, I forgot:

for some strange reason, Word has disabled the document format box in the
Save As.. dialog. Otherwise, I wouldn't have asked ... =;) And it crashes
after ~30min when I select "Convert Table to Text".

I've put the file up on my FTP site at 134.28.73.83, port 7014, in
/pub/Literaturdaten.doc.zip. Free Pizza Hut coupon for the first to convert
this into plain text/CSV/... ;)

Cindy Meister -WordMVP-

Jul 6, 2000, 3:00:00 AM
Hi Jens,

> Sorry, I forgot: Word has the file format box in the "Save As" dialog
> disabled.
>
This sounds like a template (rather than document) file, even if it
doesn't have a *.dot extension.

From Explorer, right-click the file and see whether New (rather than Open)
is the default action. If it is, select it to get a true document, which
you can then save in any other format.

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister
http://go.compuserve.com/MSOfficeForum

Follow-up questions and replies in the newsgroup only, please!


Suzanne S. Barnhill

Jul 6, 2000, 3:00:00 AM

Well, you certainly wouldn't be able to paste into Notepad. WordPad maybe.
Have you tried selecting a little bit at a time and cutting and pasting? Or
possibly selecting a few rows at a time and converting to text?

--
Suzanne S. Barnhill
Microsoft Word MVP
Words into Type
Fairhope, AL USA

Jens Benecke <je...@pinguin.conetix.de> wrote in message

news:f0s1k8...@192.168.1.100...



Jens Benecke

Jul 6, 2000, 3:00:00 AM
In comp.os.ms-windows.apps.word-proc Cindy Meister -WordMVP- <CindyM...@swissonline.ch> wrote:
> Hi Jens,

>> Sorry, I forgot: Word has the file format box in the "Save As" dialog
>> disabled.

> This sounds like a template (rather than document) file, even if it
> doesn't have a *.dot extension. From Explorer, right-click the file and
> see if NEW isn't the default? If it is, select that to get a true
> document, then you can save in any other format.

Yes!

That was it. At least, it seems to have been. I chose "New" and Word opened
(well, it's still opening - started about 45min ago =;) the file and even
now, I can go to Save As ... and select other file types.

"*.txt" seems to export it one line per table cell - but that is parseable,
I'll manage. Finally I can populate the MySQL database on our Linux server.
:)
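
A minimal Python sketch of that parsing step, for anyone in the same spot. It assumes every table cell really is exactly one line of the text export, that empty cells come out as empty lines, and that the number of columns is known in advance. None of that is confirmed beyond "seems to" above, and the function names are made up for illustration.

```python
import csv
import io


def cells_to_rows(lines, columns):
    """Group a flat list of one-cell-per-line strings into table rows.

    Assumes no cell contains a line break of its own; if one does,
    the grouping shifts and the length check below will usually fail.
    """
    cells = [line.rstrip("\r\n") for line in lines]
    if len(cells) % columns:
        raise ValueError(
            f"{len(cells)} cells do not divide evenly into {columns} columns"
        )
    return [cells[i:i + columns] for i in range(0, len(cells), columns)]


def rows_to_csv(rows):
    """Serialise rows as CSV text; the csv module handles quoting."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

When reading the real file, opening it with encoding="cp1252" is a reasonable first guess for a Word text export with umlauts, but that too is an assumption. The resulting CSV can then go through whatever bulk import the database offers.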


Thanks to all who helped!


--
ciao, Jens je...@pinguin.conetix.de http://www.pinguin.conetix.de
jben...@web.de je...@linuxfaq.de http://www.linuxfaq.de
http://www.hitch-hiker.de

Steve Atkinson

Jul 6, 2000, 3:00:00 AM

So did you give yourself a Pizza Hut coupon?
Jens Benecke wrote in message ...


Samuel Webster

Jul 7, 2000, 3:00:00 AM
What do you get when you save the document as text?

Alternatives:
* convert the table to text inside Word, and choose a unique char to delimit
each field e.g. Alt+0184; save the result as a text file.
* save the file as HTML, open the resulting text file and strip out the
non-tabular data. Open that in Access/Excel.
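
The first alternative could be post-processed roughly as follows. This Python sketch assumes the text file was saved in Windows-1252, where Alt+0184 is the cedilla character (code 184), and that each table row ends up on its own line. The function name is hypothetical.

```python
import csv
import io

# Alt+0184 types character code 184 (a cedilla) in Windows-1252.
DELIMITER = "\u00b8"


def delimited_to_csv(raw_bytes, encoding="cp1252"):
    """Convert delimiter-separated text, one row per line, into CSV text."""
    text = raw_bytes.decode(encoding)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in text.splitlines():
        if line:  # skip blank lines between rows
            writer.writerow(line.split(DELIMITER))
    return buffer.getvalue()
```

The point of an unusual delimiter is that commas and tabs inside the literature entries survive untouched; the csv module then quotes any field that contains a comma.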

"Jens Benecke" <je...@pinguin.conetix.de> wrote in message

news:3aa0k8...@192.168.1.100...

Cindy Meister -WordMVP-

Jul 7, 2000, 3:00:00 AM
> So did you give yourself a Pizza Hut coupon?
>
<LOL>!

-- Cindy


John McGhie [MVP - Word]

Jul 8, 2000, 3:00:00 AM

Jens:

I think you will have better luck if you take on this monster in Word 95.
Word 97/2000 have various "issues" with the new table editor in long complex
tables which all end up with a long, long wait. If the doc has already been
converted to Word 97/2000, this won't help you much, though.

If you can get the file open at all in Word 2000, save it out to HTML, which
you can readily parse. Use the "Export to Compact HTML format" command
available on the File menu if you have HTML Filter 2 installed (if you
haven't, download it from Microsoft).
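
As a sketch of what parsing the HTML might look like, here is a short Python example using the standard library's html.parser. It assumes the export uses ordinary <tr> and <td> markup for a single, un-nested table; the class and function names are invented for illustration and the actual output of Word's export has not been checked against it.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()  # character references such as &uuml; are decoded
        self.rows = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace and line breaks inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_rows(html):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Everything outside table cells is ignored, so there is no need to strip the non-tabular markup by hand first, and empty cells are kept as empty strings.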

As a last resort, "Print" the file to a dummy Text-only printer using Print
To File and use Perl to parse it.

FWIW the document is now seriously corrupt, which is why it is taking so
long to open. Things to try:

1) Tools>Options>Save and turn Allow Fast Saves OFF, then save the document
and close it. That cleans it out.

2) Remove all Versions, turn Track Changes OFF then Accept all changes
(cleans it out further).

Hope this helps.

In microsoft.public.word.tables on Wed, 5 Jul 2000 23:45:39 +0200, Jens
Benecke <je...@pinguin.conetix.de> wrote:



Please post follow-up questions to the newsgroup so that all may follow the thread.

John McGhie <jo...@mcghie-information.com.au>
Consultant Technical Writer
Microsoft MVP (Word)
Sydney, Australia (GMT +10 hrs) +61 (04) 1209 1410
