I process financial documents which start in Word and end up either in
QuarkXPress or Adobe InDesign. Occasionally, a client will send a Word
document over that has financial tables that originated in Excel, but were
inserted using the Paste Special > Paste as Picture command. When tables
are inserted this way, it causes me additional work because I need them not
as pictures but as text. However, since they are pictures, double-clicking
them does not let you access their original cells in Excel; it only lets you
access picture formatting options.
I have figured out a workaround, but let me ask this first:
Why can't Word let you extract the text from these pictures? I see no Macro
commands appropriate to this task... Ironically, the text information is all
there, embedded in the picture (in other words, it has not been rasterized).
Why does Word not allow this?????
OK, the workaround. I copy the picture, switch to FileMaker, paste the
picture in a graphic container, print to PDF, open the PDF in Acrobat
Professional, then save the PDF as a Word document (!!!!). When the
document is opened in Word, VOILA - the picture is not only converted to a
table, it is a very nicely constructed one, right down to the correctly
sized cells (even cells that were merged together in Excel, where the chart
originated). One strange artifact of this conversion is that dollar signs
sometimes get swapped around so that they follow the number they belong to,
rather than precede it... (WHY???)
There are always so many head-scratching WHYs whenever dealing with MS
products... The biggest one here is: WHY can Acrobat access this information
(the text correctly arrayed in table cells) while Word cannot??? Is there
no way to avoid this crazy workaround?
Assuming not, I have the following plea: can someone explain how to
construct a macro that will:
1) determine the total number of picture objects (the kind that have this
embedded text in them) in the current Word document
2) go to the next picture object and copy it (I can use AppleScript to control
how many times it goes to the next one by using the total number of picture
objects as a variable that controls the number of times this step is called
in a loop. As for the FileMaker and Acrobat steps, I can use AppleScript for
those too).
Many thanks,
Bill Planey
if you wish to reply to me directly, please fix/use the following address:
mac.info _at_ sbcglobal.net
I do not think that it is correct to say that the table has not been
rasterized. The way that you are doing it is probably the only way to do
it.
--
Please post any further questions or followup to the newsgroups for the
benefit of others who may be interested. Unsolicited questions forwarded
directly to me will only be answered on a paid consulting basis.
Hope this helps
Doug Robbins - Word MVP
"BillP" <n3wsr3ad3r_@_sbcglobal.net> wrote in message
news:BC833FF1.4F8AA%n3wsr3ad3r_@_sbcglobal.net...
Well, if the table _was_ rasterized, it wouldn't be possible to get font
information for it, since everything would be pixels. The font/text is
clearly embedded in such charts, so the question is: why can't Word get at
it when other tools _can_? It would make life a whole lot easier to not have
to program things that should be available within Word to begin with...
Bill
On 3/21/04 17:44, in article entLB65D...@TK2MSFTNGP09.phx.gbl, "Doug
Robbins - Word MVP - DELETE UPPERCASE CHARACTERS FROM EMAIL ADDRESS"
I might have to take that back. I haven't tried to do anything with an Excel
Spreadsheet inserted as a Picture since Word 6 or 97 - I threatened to kill
the next person who did that sort of thing with the particular type of
information concerned and my threat worked <g>
I note, however, that in Word 2003, if I use Paste Special to paste an Excel
Spreadsheet as a Picture (either Windows Metafile or Enhanced Metafile) and
then right-click the Picture and select Edit Picture from the context menu
that appears, it is possible to select and edit the text.
The same can be done in Word 2000 but the layout changes a bit during the
editing so while that's going on, it's not exactly WYSIWYG.
Hope this helps
Doug Robbins - Word MVP
"BillP" <n3wsr3ad3r_@_sbcglobal.net> wrote in message
news:BC83915D.50884%n3wsr3ad3r_@_sbcglobal.net...
In any event, I have no desire to edit this text. I want to extract it into
Word, either in the Word Table format, or at the very least, in
tab-separated items that form one row per paragraph. Then I can subsequently
format it properly.
I was hoping (back to my original request) that Word would have some sort of
Visual Basic command that would allow me to automate this process. I did
mention a ridiculous workaround that involved saving a PDF of each table
back to Word format (which Acrobat Professional DOES!!), and it seems even
more ridiculous that Word would do a less good job of letting me get at this
data than Acrobat (but this would be par for the course for Microsoft...)
So:
1) VB command for counting total number (X) of Picture Objects, and then
going to next Picture Object X times
2) VB command (if exists) that converts to Word Table or plain tab separated
text right there on the spot.
Thanks!
Bill Planey
On 3/22/04 01:04, in article OvnQfv9D...@TK2MSFTNGP09.phx.gbl, "Doug
Robbins - Word MVP - DELETE UPPERCASE CHARACTERS FROM EMAIL ADDRESS"
<d...@mOSTvALUABLEpROFESSIONALs.org> wrote:
> Hi Bill,
>
Doug's reply points you in the general direction for extracting the data
without resorting to Adobe.
If you select the Excel 'picture' for editing, you get access to a series of
text boxes that hold the data. This could probably be automated with VBA,
following which you could also use VBA to extract the text from each of the
text boxes (checking each shape's TextFrame.HasText property) and paste the
results wherever else you wanted them.
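A rough sketch of the text-box idea, for illustration only. It assumes the
picture has already been opened with Edit Picture, so that its text boxes show
up as ordinary shapes in the active document, and that they are not grouped.
The macro name CollectTextBoxText is made up, and none of this has been tried
in Mac Word.

```vba
Sub CollectTextBoxText()
    ' Assumption: the picture is already open for editing, so its text
    ' boxes are plain (ungrouped) shapes in the active document.
    Dim shp As Shape
    Dim result As String

    For Each shp In ActiveDocument.Shapes
        If shp.Type = msoTextBox Then
            If shp.TextFrame.HasText Then
                ' Each text box's text normally ends with a paragraph mark;
                ' a tab is appended here only as a visible separator.
                result = result & shp.TextFrame.TextRange.Text & vbTab
            End If
        End If
    Next shp

    MsgBox result
End Sub
```

The text boxes come back in whatever order Word stores them, which need not
match the row and column order of the original table, so some sorting by
position would probably still be needed.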
Cheers
"BillP" <n3wsr3ad3r_@_sbcglobal.net> wrote in message
news:BC833FF1.4F8AA%n3wsr3ad3r_@_sbcglobal.net...
Those pictures *should* have been pasted originally as Windows Metafiles.
If you copy from Word and paste into a graphics app as "Word Picture" or
WMF, you will be able to read both the text and the lines.
If you paste as plain text, you will get the text out.
If you are doing this on a Mac, Word will convert the picture to PICT when
it displays the picture. I am not sure under those circumstances, but I
believe that both the text and the lines should remain available as vector
graphic objects.
If, on the other hand, you select the picture in the document then choose
Edit>Picture, what opens up in the Word picture editor is a vector graphic
in WMF format and you can then copy and paste the text as formatted text.
It helps to know that a picture embedded this way is a document within a
document. You need to open the embedded document: you can then copy
anything you like as a Word object.
However, if the originator has done the wrong thing and pasted the picture
as a bitmap, then you have a problem :-)
Hope this helps
This responds to article <BC833FF1.4F8AA%n3wsr3ad3r_@_sbcglobal.net>, from
"BillP" <n3wsr3ad3r_@_sbcglobal.net> on 22/3/04 5:49 AM:
--
Please respond only to the newsgroup to preserve the thread.
John McGhie, Consultant Technical Writer,
McGhie Information Engineering Pty Ltd
Sydney, Australia. GMT + 10 Hrs
+61 4 1209 1410, mailto:jo...@mcghie.name
The pictures are indeed metafiles. As I explained in my earliest post on
this subject, I am able to get the text by the following process:
1) copy the pictures into a container field in FileMaker Pro
2) print layout to PDF
3) open PDF in Acrobat Professional, Save As -> MS Word .doc
4) open saved document in Word to find that Acrobat managed (amazingly) to
preserve the original cellular structure of the Word table (!) and that I
can access the text perfectly fine, with one minor artifact of this
conversion: the misplacement of dollar signs that happened to be in front of
some 1st column numbers (the $ signs were switched to come AFTER the
number).
All I dream of learning is how I can build a macro that will:
1) tell me if this type of metafile picture object is present in a document
I am opening,
2) allow me to go to each such object, one by one, and copy it, so that I
can paste it into FileMaker Pro for subsequent processing (see above).
I plan to control this macro from AppleScript, so it follows that I am on a
Macintosh.
My four step process above may seem outlandish, but I am willing to keep the
entire thing within Word if Word will only offer me the ability to do so.
Given that this is a Microsoft product I am dealing with, I fully expect the
need to do something fairly convoluted to achieve something that would, on
the surface, seem rather simple.
That's all!
Thanks,
Bill Planey
On 3/24/04 01:53, in article BC8789BA.B3AF%jo...@mcghie.name, "John McGhie
This responds to article <BC8A2AEF.519BD%n3wsr3ad3r_@_sbcglobal.net>, from
"BillP" <n3wsr3ad3r_@_sbcglobal.net> on 27/3/04 10:46 AM:
> All I dream of learning is how I can build a macro that will:
>
> 1) tell me if this type of metafile picture object is present in a document
> I am opening,
No. You can make a good guess, but you cannot precisely determine the
content of a picture from VBA. You can tell if it's a "Picture" (i.e. A
vector) or a "Bitmap" (i.e. Anything else). By computing where it is in the
document and parsing the caption you may be able to make a very good guess.
But you won't know for sure.
>
> 2) allow me to go to each such object, one by one, and copy it, so that I
> can paste it into FileMaker Pro for subsequent processing (see above).
Yes, that bit is easy. But why not go straight into Illustrator? You will
get the whole thing out in one easy operation!
> I plan to control this macro from AppleScript, so it follows that I am on a
> Macintosh.
Ah! Yes, I had assumed you were on a Macintosh because I am reading you in
the Mac Word group. Now: In Mac Word earlier than 2004, simply forget
this: the AppleScript is not good enough. Wait for Word 2004: you will be
amazed: you may then find that you elect to stay in AppleScript all the way
through without bothering with any VBA.
> My four step process above may seem outlandish, but I am willing to keep the
> entire thing within Word if Word will only offer me the ability to do so.
> Given that this is a Microsoft product I am dealing with, I fully expect the
> need to do something fairly convoluted to achieve something that would, on
> the surface, seem rather simple.
No. I would do this whole operation in Word using VBA, and I am not sure
why you don't. You do not seem to have discovered Edit>Picture yet (or the
pictures you are dealing with are not in fact WMF, or you do not have the
full version of Word X loaded). Check your Office X CD and make sure you
have all the graphics converters installed.
Mind you: when I say that I would do it all in Word, I would be using some
very sophisticated string-parsing VBA to do it, which means I would write
the macro in PC Word 2003 because the Macro Editor in Word X is simply not
up to the task. You would spend weeks writing the macro, and who has time
for that.
On the PC, the Visual Basic for Applications Development Environment will
talk you through: you could write this macro in an hour or so. Then move it
to the Mac when you're finished :-)
Cheers
--
>> I plan to control this macro from AppleScript, so it follows that I am on a
>> Macintosh.
>
> Ah! Yes, I had assumed you were on a Macintosh because I am reading you in
> the Mac Word group. Now: In Mac Word earlier than 2004, simply forget
> this: the AppleScript is not good enough. Wait for Word 2004: you will be
> amazed: you may then find that you elect to stay in AppleScript all the way
> through without bothering with any VBA.
There has been no information released about any change to AppleScript in
Word 2004, and in any case nobody here has Word 2004. Furthermore, it can
all be done by AppleScript in Word X, 2001 and 98 by using the 'do Visual
Basic' command, but that of course does involve your learning the basics of
VBA to do so, which is what John meant. Native AppleScript in Word is
unusable because it crashes and is faulty. See
http://word.mvps.org/FAQs/WordMac/WordAppleScript.htm
in Internet Explorer - it doesn't open in Safari.
--
Paul Berkowitz
MVP Entourage
Entourage FAQ Page: <http://www.entourage.mvps.org/toc.html>
AppleScripts for Entourage: <http://macscripter.net/scriptbuilders/>
Please "Reply To Newsgroup" to reply to this message. Emails will be
ignored.
PLEASE always state which version of Entourage you are using - 2001 or X.
It's often impossible to answer your questions otherwise.
I'll try to clarify/address these points further...
On 3/28/04 16:51, in article BC8D941B.B8E5%jo...@mcghie.name, "John McGhie
[MVP - Word]" <jo...@mcghie.name> wrote:
> Hi Bill:
>
> This responds to article <BC8A2AEF.519BD%n3wsr3ad3r_@_sbcglobal.net>, from
> "BillP" <n3wsr3ad3r_@_sbcglobal.net> on 27/3/04 10:46 AM:
>
>> All I dream of learning is how I can build a macro that will:
>>
>> 1) tell me if this type of metafile picture object is present in a document
>> I am opening,
>
> No. You can make a good guess, but you cannot precisely determine the
> content of a picture from VBA. You can tell if it's a "Picture" (i.e. A
> vector) or a "Bitmap" (i.e. Anything else). By computing where it is in the
> document and parsing the caption you may be able to make a very good guess.
> But you won't know for sure.
Fair enough. Let me rephrase: it will be highly unlikely in my situation
that the objects will be bitmapped. Therefore, any way to address these
objects (even if it does not discriminate between ones that are bitmapped
and ones that contain real text information) I will regard as acceptable. I
just want to automatically know how many there are, find the first one,
copy, paste into FileMaker and go to the next one and repeat the cycle until
the last one has been done.
I did receive the following suggestion for the VBA phrasing from one Helen
Feddema, who is a VB expert in New York, after I posed the question to her:
>Bill,
>There are several ways you could get at these objects. To select one, a
>line like the following will do:
>
> Selection.GoTo What:=wdGoToGraphic, Which:=wdGoToFirst, Count:=5,
>Name:=""
>
>(this selects the fifth image in a Word document).
>
>If you want to cycle through all of the images, since there is no
>Graphics or Images collection in Word (though there is a Tables
>collection), you could set up an incrementing lngCount variable and
>access each image in turn, or try the wdGoToNext named constant as with
>tables.
...but unfortunately, I could not make it work. Syntax error or compile
error. I made sure that the document I tried it on had more than five of
such objects before trying it... Also, I am quite lost reading her last
paragraph. I am not that good with VBA; my best bits of code have come from
Word's AS recordability, and the incredible kindness of strangers on message
boards like this one.
>>
>> 2) allow me to go to each such object, one by one, and copy it, so that I
>> can paste it into FileMaker Pro for subsequent processing (see above).
>
> Yes, that bit is easy. But why not go straight into Illustrator? You will
> get the whole thing out in one easy operation!
>
I tried this, and, yes - you do get the text out. But each piece of text
(i.e., column) is a separate object and since the goal is to eventually get
this text back into Word, this seems an ungainly step. I have conclusively
tested to find that a PDF created from such a table can be saved as a WORD
document by Acrobat Professional, and it puts it right into the desired
cellular table structure that the chart/picture would have had if properly
prepared from the start. The whole point of my endeavor with this problem
is to build a script capable of detecting these "wrong" kind of tables and
fix them to the right kind, but only IF THEY ARE THERE to begin with. I
already have built the code that detects and processes the right kind of
tables. Just not this kind, which happens from time to time in the kind of
documents I am handling.
>> I plan to control this macro from AppleScript, so it follows that I am on a
>> Macintosh.
>
> Ah! Yes, I had assumed you were on a Macintosh because I am reading you in
> the Mac Word group. Now: In Mac Word earlier than 2004, simply forget
> this: the AppleScript is not good enough. Wait for Word 2004: you will be
> amazed: you may then find that you elect to stay in AppleScript all the way
> through without bothering with any VBA.
I fully expect Microsoft to not only disappoint with regard to the
robustness of their AppleScript implementation, but to leave _in_place_ a
major bug in the search/replace function that has been there since Office
2001 (the wildcard search bug). I had to beg MS to take my support call
without charging me just to report this bug 4 years ago (and it turned out
that they knew about it). At any rate, I was able to get my hands on a
preliminary copy of Office 2004 and I'll check under the hood, FWIW.
>
>> My four step process above may seem outlandish, but I am willing to keep the
>> entire thing within Word if Word will only offer me the ability to do so.
>> Given that this is a Microsoft product I am dealing with, I fully expect the
>> need to do something fairly convoluted to achieve something that would, on
>> the surface, seem rather simple.
>
> No. I would do this whole operation in Word using VBA, and I am not sure
> why you don't. You do not seem to have discovered Edit>Picture yet (or the
> pictures you are dealing with are not in fact WMF, or you do not have the
> full version of Word X loaded). Check your Office X CD and make sure you
> have all the graphics converters installed.
>
> Mind you: when I say that I would do it all in Word, I would be using some
> very sophisticated string-parsing VBA to do it, which means I would write
> the macro in PC Word 2003 because the Macro Editor in Word X is simply not
> up to the task. You would spend weeks writing the macro, and who has time
> for that.
>
> On the PC, the Visual Basic for Applications Development Environment will
> talk you through: you could write this macro in an hour or so. Then move it
> to the Mac when you're finished :-)
I find it icky in the extreme that I have to deal with MS Word at all; only
the reality of the marketplace (i.e., that 99% of this type of document is
prepared in Word) keeps me having to create kludgy solutions. It would be
great if BBEdit was the word processor of choice out there; based on my own
efforts in this regard, I can only estimate the annual $$$$billions in
unnecessary software development and support that takes place just to create
solutions that work _in_spite_of_ Microsoft. Sorry, I saw a soapbox and I
took the opportunity...
Bill Planey
That's *MY* damn soap-box, you interloper... gerrroff!!
I knew I was going to regret getting involved in this thread.
This responds to microsoft.public.mac.office.word on Mon, 29 Mar 2004
18:20:12 -0600, BillP <n3wsr3ad3r_@_sbcglobal.net>:
> I did receive the following suggestion for the VBA phrasing from one Helen
> Feddema, who is a VB expert in New York, after I posed the question to her:
>
> >Bill,
>
> >There are several ways you could get at these objects. To select one, a
> >line like the following will do:
> >
> > Selection.GoTo What:=wdGoToGraphic, Which:=wdGoToFirst, Count:=5,
> >Name:=""
> >
> >(this selects the fifth image in a Word document).
Try this (all as ONE line)...
Selection.GoTo What:=wdGoToGraphic, Which:=wdGoToAbsolute, Count:=5
They are functionally equivalent, but using the second statement enables you
to increment the Count parameter to step through the graphics in the
document.
> >If you want to cycle through all of the images, since there is no
> >Graphics or Images collection in Word (though there is a Tables
> >collection), you could set up an incrementing lngCount variable and
> >access each image in turn, or try the wdGoToNext named constant as with
> >tables.
She's made a mistake: there *is* a collection, but it's called "Shapes" not
graphics or images. Note: I haven't checked this on the Mac, I wonder if
she knows something I don't: maybe the Shapes collection does not exist on
the Mac: that's unlikely I think. Anyway... this is what she means:
Sub Macro1()
'
' Macro1 Macro
' Macro recorded 1/30/2004 by John McGhie
'
    Dim i As Long
    ' Shapes holds floating shapes only; inline pictures are in InlineShapes
    For i = 1 To ActiveDocument.Shapes.Count
        Selection.GoTo What:=wdGoToGraphic, Which:=wdGoToAbsolute, Count:=i
        MsgBox "Number " & Str(i)
    Next i
End Sub
> ...but unfortunately, I could not make it work. Syntax error or compile
> error.
Yeah. Although this VBA looks simple, we're actually getting into the more
arcane parts of VBA where you can expect some fireworks getting it to go on
the Mac. I have not tested any of these examples on the Mac (sorry: I
haven't time to re-install Office X tonight) but it may well be that they
won't compile and/or run.
The implementation of VBA in Mac Word 2001/X is intentionally very limited
to keep the price of Office down :-) Regrettably, the fix would be a huge
and massively expensive undertaking, and there are many places they could
spend their money that I think Mac users would appreciate more. Couple that
with the fact that VBA is a dead technology anyway, and you can see that
when it comes to a choice of where they are going to spend development
funds, I think we can safely assume it won't be VBA... VBA is being
replaced simply because it was designed to be small and efficient and easy
to use in happier times when a small number of computers networked together
were all operated by friends who could be trusted. By design, VBA cannot be
made sufficiently secure for today's Internet firestorm, so they are
replacing it on the PC too, with VB dot Net. Dot Net is not only much
faster, more flexible, and more powerful, but -- by design -- it doesn't
trust *anybody* :-) It's designed to remain secure in a
massively-distributed application where the entire world is networked
together.
> I made sure that the document I tried it on had more than five of
> such objects before trying it... Also, I am quite lost reading her last
> paragraph. I am not that good with VBA; my best bits of code have come from
> Word's AS recordability, and the incredible kindness of strangers on message
> boards like this one.
Yeah. There's a reason I told you I wouldn't attempt this in Mac Word :-)
Quite frankly, unless you are very good at VBA, I would not be walking in
this valley of death. It's just too hard, wrestling with a language that is
incomplete on the Mac, a development environment that is incomplete on the
Mac, and trying to learn VBA at the same time. Find yourself a copy of Word
2003 on a PC and write your macro there: you will save literally weeks of
your time.
I would not have chosen the GoTo command to do this with (although, it does
save you having to work out what is a picture and what is not).
The following code will loop through all of the Shapes in the document, and
for each one that is a Drawing Canvas, copy it.
Sub Macro1()
'
' Macro1 Macro
' Macro recorded 1/30/2004 by John McGhie
'
    Dim aPic As Shape
    For Each aPic In ActiveDocument.Shapes
        MsgBox aPic.Type
        If aPic.Type = msoCanvas Then
            aPic.Select
            Selection.Copy
        End If
        ' Put the rest of your code here
    Next aPic
End Sub
' MsoShapeType can be one of these MsoShapeType constants.
' msoAutoShape
' msoCanvas
' msoComment
' msoFormControl
' msoCallout
' msoChart
' msoEmbeddedOLEObject
' msoFreeform
' msoGroup
' msoLine
' msoLinkedOLEObject
' msoLinkedPicture
' msoMedia
' msoOLEControlObject
' msoPicture
' msoPlaceholder
' msoScriptAnchor
' msoShapeTypeMixed
' msoTable
' msoTextBox
' msoTextEffect
Again: notice that those are all prefixed "mso"? That means they are part
of Microsoft Office, but not necessarily available in Word, particularly on
the Mac. So if you get problems, that could be a reason.
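One more possibility, offered as a sketch rather than a tested answer: a
picture pasted in line with the text is normally held in the InlineShapes
collection, not in Shapes, so a loop over Shapes may find nothing. The version
below assumes the pasted tables are inline pictures. The macro name
CopyEachInlinePicture is made up, and it has not been run in Mac Word.

```vba
Sub CopyEachInlinePicture()
    ' Assumption: the pasted tables are inline pictures, so they live in
    ' InlineShapes rather than Shapes. Not tested in Mac Word.
    Dim pic As InlineShape
    Dim total As Long

    For Each pic In ActiveDocument.InlineShapes
        If pic.Type = wdInlineShapePicture Then
            total = total + 1
            pic.Select
            Selection.Copy
            ' Hand the clipboard off to the next step (e.g. FileMaker) here
        End If
    Next pic

    MsgBox "Pictures found: " & total
End Sub
```

This covers both halves of the request in one pass: the counter gives the total
number of pictures, and each one is on the clipboard in turn. It cannot tell a
metafile from a bitmap, so any picture that turns out to be a bitmap would
still need to be caught further down the line.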
> I fully expect Microsoft to not only disappoint
Hey, that's my line! Seriously: If you think Microsoft is bad, try some of
the others :-)
> I find it icky in the extreme that I have to deal with MS Word at all; only
> the reality of the marketplace (i.e., that 99% of this type of document is
> prepared in Word) keeps me having to create kludgy solutions.
Again: Try the others :-) Word in my experience is the best there is, by
quite a long way :-) If you really enjoy tilting at windmills, try doing
this in FrameMaker :-)
Hope this helps
--
Please post all comments to the newsgroup to maintain the thread.
John McGhie, Consultant Technical Writer