
Using WSH to save web pages as text files?


Oliver Hill
Jan 29, 2003, 11:35:41 AM

Hi,

I've just started using WSH and have written a script to access a series of
web pages which I want to save as text files for further processing. I can't
seem to find the appropriate object and method to access the SaveAs command
in Internet Explorer. Can anyone enlighten me?

Thanks
Oliver

Alex K. Angelopoulos (MVP)
Jan 29, 2003, 1:01:25 PM

Oliver, are you wanting to save the entire markup for the page, or just the
text?


Here's how you would extract the text from a web page and write it to a file; if
you want to keep the HTML markup in the file, comment out the line [sData =
HtmlToText(sData)]:


' First get the HTML source into variable sData.
sData = GetXml("http://www.yahoo.com")

' Now use "htmlfile" to extract the text from the HTML.
' If you want the complete HTML markup saved, comment out
' the next line.
sData = HtmlToText(sData)

' Now save the text to a file.
WriteFile "C:\tmp\web.txt", sData


Function HtmlToText(sHtml)
    ' This function uses the "htmlfile" object to
    ' extract text from well-formed HTML;
    ' lighter and faster than using IE!
    With CreateObject("htmlfile")
        .write sHtml
        HtmlToText = .body.innerText
    End With
End Function

Function GetXml(sURL)
    ' Create an XMLHTTP object and request the page
    ' (asynchronously, since no third argument is passed to open).
    Dim Xml
    Set Xml = CreateObject("Microsoft.XMLHTTP")
    Xml.open "GET", sURL
    Xml.send
    ' Poll until readyState reaches 4 (request complete).
    Do
        WScript.Sleep 10
    Loop While Xml.readyState <> 4
    GetXml = Xml.responseText
End Function

Sub WriteFile(FilePath, sData)
    ' Writes sData to FilePath. OpenTextFile arguments:
    ' 2 = ForWriting (overwrite), True = create the file if missing.
    Dim fso
    Set fso = CreateObject("Scripting.FileSystemObject")
    With fso.OpenTextFile(FilePath, 2, True)
        .Write sData
        .Close
    End With
End Sub
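Oliver's goal was a whole series of pages. A minimal sketch of how the same fetch / strip / write approach might be extended to a list of URLs, assuming every page is reachable with a plain GET and that numbered output files are acceptable. `SaveAll`, `FetchPage`, `PageText`, `SaveText` and `NumberedFileName` are hypothetical names, and `C:\tmp` is only an example folder. The one design change from `GetXml` is that `FetchPage` opens the request synchronously (third argument `False`) and checks the HTTP status instead of polling `readyState`.

```vbscript
' Sketch only: hypothetical helpers for saving a series of pages as text.
' Needs Windows Script Host (cscript.exe / wscript.exe) plus the MSXML,
' MSHTML and Scripting COM components.
Option Explicit

' Saves each URL in aUrls as sFolder\page001.txt, page002.txt, ...
Sub SaveAll(aUrls, sFolder)
    Dim i, sHtml
    For i = 0 To UBound(aUrls)
        sHtml = FetchPage(aUrls(i))
        If Len(sHtml) > 0 Then
            SaveText NumberedFileName(sFolder, i + 1), PageText(sHtml)
        Else
            WScript.Echo "Skipped (request failed): " & aUrls(i)
        End If
    Next
End Sub

' Builds a zero-padded name such as C:\tmp\page007.txt.
Function NumberedFileName(sFolder, nIndex)
    NumberedFileName = sFolder & "\page" & Right("000" & nIndex, 3) & ".txt"
End Function

' Synchronous GET: passing False as the third argument of open makes
' send block until the response arrives, so no polling loop is needed.
' Returns "" on a network error or a non-200 status.
Function FetchPage(sURL)
    Dim oHttp
    FetchPage = ""
    Set oHttp = CreateObject("Microsoft.XMLHTTP")
    On Error Resume Next
    oHttp.open "GET", sURL, False
    oHttp.send
    If Err.Number = 0 Then
        If oHttp.status = 200 Then FetchPage = oHttp.responseText
    End If
    On Error GoTo 0
End Function

' Same "htmlfile" technique as HtmlToText: let MSHTML parse the
' markup and return only the text of the body.
Function PageText(sHtml)
    With CreateObject("htmlfile")
        .write sHtml
        PageText = .body.innerText
    End With
End Function

' Overwrites the file (2 = ForWriting), creating it if necessary.
Sub SaveText(sPath, sText)
    Dim oFso
    Set oFso = CreateObject("Scripting.FileSystemObject")
    With oFso.OpenTextFile(sPath, 2, True)
        .Write sText
        .Close
    End With
End Sub
```

For example, `SaveAll Array("http://www.yahoo.com", "http://www.google.com"), "C:\tmp"` would write C:\tmp\page001.txt and C:\tmp\page002.txt; the folder must already exist, since OpenTextFile creates files but not folders. If a page contains characters outside the system's ANSI code page, the ASCII-mode write may fail; passing -1 (TristateTrue) as a fourth OpenTextFile argument writes the file as Unicode instead.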


--
Please respond in the newsgroup so everyone may benefit.
http://dev.remotenetworktechnology.com
(email requests for support contract information welcomed)
----------
Subscribe to Microsoft's Security Bulletins:
http://www.microsoft.com/technet/security/bulletin/notify.asp


"Oliver Hill" <o.h...@cantab.net> wrote in message
news:3e38...@212.67.96.135...

Mythran
Jan 29, 2003, 1:54:17 PM

The following code shows two ways to do this: IE's own SaveAs command, and writing the page's text (and HTML) out with the FileSystemObject :) Hope this helps!

Mythran

Option Explicit

'
' Constants
'
Const READYSTATE_UNINITIALIZED = 0
Const READYSTATE_LOADING = 1
Const READYSTATE_LOADED = 2
Const READYSTATE_INTERACTIVE = 3
Const READYSTATE_COMPLETE = 4

Const OLECMDEXECOPT_DODEFAULT = 0
Const OLECMDEXECOPT_PROMPTUSER = 1
Const OLECMDEXECOPT_DONTPROMPTUSER = 2
Const OLECMDEXECOPT_SHOWHELP = 3


Const IDM_COPY = 15
Const IDM_CUT = 16
Const IDM_PASTE = 26
Const IDM_PRINT = 27
Const IDM_PROPERTIES = 28
Const IDM_REDO = 29
Const IDM_SELECTALL = 31
Const IDM_UNDO = 43
Const IDM_ZOOMPERCENT = 50
Const IDM_GETZOOM = 68
Const IDM_SAVE = 70
Const IDM_SAVEAS = 71
Const IDM_OPEN = 2000
Const IDM_NEW = 2001
Const IDM_SAVECOPYAS = 2002
Const IDM_PRINTPREVIEW = 2003
Const IDM_PAGESETUP = 2004
Const IDM_SPELL = 2005
Const IDM_PASTESPECIAL = 2006
Const IDM_CLEARSELECTION = 2007
Const IDM_SHOWPRINT = 2010
Const IDM_SHOWPAGESETUP = 2011
Const IDM_STOP = 2138

'
' Call Sub Main() to get the ball rolling.
'
Call Main()

Sub Main()
    Dim strWebPage
    Dim strFile1
    Dim strFile2
    Dim lngReturn

    '
    ' Set the location of the web page to point to a web site.
    '
    strWebPage = "http://www.google.com"

    '
    ' Set the names of the files to save to.
    '
    strFile1 = "Google1.txt"
    strFile2 = "Google2.txt"

    '
    ' Load and save the page.
    '
    MsgBox "Attempting to save using IE's SaveAs command."
    lngReturn = SaveWebPageAsTextFile(strWebPage, strFile1)

    MsgBox "Attempting to save using the FileSystemObject object."
    Call SaveAsTextOnly(strWebPage, strFile2)

    '
    ' Let the user know we are finished.
    '
    Call MsgBox("Files saved.")
End Sub


Sub SaveAsTextOnly(ByVal strWebPage, ByVal strFile)
    Dim objIE
    Dim objFS
    Dim objFile

    '
    ' Create the InternetExplorer.Application object.
    '
    Set objIE = CreateObject("InternetExplorer.Application")

    '
    ' Make sure IE is visible.
    '
    objIE.Visible = True

    '
    ' Navigate to the website.
    '
    Call objIE.Navigate(strWebPage)

    '
    ' Sleep until IE is ready.
    '
    Do Until (objIE.readyState = READYSTATE_COMPLETE)
        WScript.Sleep 100
    Loop

    '
    ' Create the Scripting.FileSystemObject object.
    '
    Set objFS = CreateObject("Scripting.FileSystemObject")

    '
    ' Create a new file to save to, overwriting the file if necessary.
    '
    Set objFile = objFS.CreateTextFile(strFile, True)

    '
    ' Write the text of the web page (no markup) to the file.
    '
    objFile.Write objIE.Document.Body.innerText

    '
    ' Close the file.
    '
    objFile.Close

    '
    ' Create a second file for the HTML, e.g. "Google2__code.html"
    ' (drops the 4-character ".txt" extension from strFile).
    '
    Set objFile = objFS.CreateTextFile(Left(strFile, Len(strFile) - 4) & _
        "__code.html", True)

    '
    ' Write the HTML of the page body to the file.
    '
    objFile.Write objIE.Document.Body.innerHTML

    '
    ' Close the file.
    '
    objFile.Close

    '
    ' Destroy the object references.
    '
    Set objFS = Nothing
    Set objFile = Nothing

    objIE.Quit
    Set objIE = Nothing
End Sub


Function SaveWebPageAsTextFile(ByVal strWebPage, ByVal strFile)
    Dim objIE

    '
    ' Create the InternetExplorer.Application object.
    '
    Set objIE = CreateObject("InternetExplorer.Application")

    '
    ' Make sure IE is visible.
    '
    objIE.Visible = True

    '
    ' Navigate to the website.
    '
    Call objIE.Navigate(strWebPage)

    '
    ' Sleep until IE is ready.
    '
    Do Until (objIE.readyState = READYSTATE_COMPLETE)
        WScript.Sleep 100
    Loop

    '
    ' Attempt to save the web page. Because showUI is True, IE shows
    ' its Save As dialog; execCommand returns a Boolean success flag.
    '
    SaveWebPageAsTextFile = _
        objIE.Document.execCommand("SaveAs", True, strFile)

    '
    ' Alternative using ExecWB; in testing this did not save the page.
    ' Note that ExecWB expects an OLECMDID value (OLECMDID_SAVEAS = 4) as
    ' its first argument, not one of the MSHTML IDM_ constants above.
    '
    'SaveWebPageAsTextFile = _
    '    objIE.ExecWB(IDM_SAVEAS, OLECMDEXECOPT_DONTPROMPTUSER, strFile, 0)

    '
    ' Close and destroy the IE window and object reference.
    '
    objIE.Quit
    Set objIE = Nothing
End Function
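Both procedures above wait with `Do Until (objIE.readyState = READYSTATE_COMPLETE)`, a loop that never exits if the page fails to finish loading. A minimal sketch of a bounded wait, assuming a 100 ms polling interval and a caller-supplied timeout; `WaitForIE` is a hypothetical name, and it also checks IE's `Busy` property before trusting `readyState`.

```vbscript
' Sketch only: a hypothetical bounded wait for an
' InternetExplorer.Application object. Returns True once IE reports
' it is no longer busy and its readyState is complete; returns False
' if nTimeoutMs milliseconds pass first.
Option Explicit

Const READYSTATE_COMPLETE = 4

Function WaitForIE(objIE, nTimeoutMs)
    Dim nWaited
    nWaited = 0
    ' VBScript's Or does not short-circuit; both properties are read.
    Do While objIE.Busy Or objIE.readyState <> READYSTATE_COMPLETE
        If nWaited >= nTimeoutMs Then
            WaitForIE = False
            Exit Function
        End If
        WScript.Sleep 100
        nWaited = nWaited + 100
    Loop
    WaitForIE = True
End Function
```

To use it in Mythran's script, leave out the `Option Explicit` and `Const` lines (that script already has both), then replace each `Do Until ... Loop` with a call such as `If Not WaitForIE(objIE, 30000) Then objIE.Quit`, followed by `Exit Sub` or `Exit Function` as appropriate.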


Mythran
Jan 29, 2003, 1:58:56 PM

>
> '
> ' Constants
> '
> Const READYSTATE_UNINITIALIZED = 0
> Const READYSTATE_LOADING = 1
> Const READYSTATE_LOADED = 2
> Const READYSTATE_INTERACTIVE = 3
> Const READYSTATE_COMPLETE = 4
>
> Const OLECMDEXECOPT_DODEFAULT = 0
> Const OLECMDEXECOPT_PROMPTUSER = 1
> Const LECMDEXECOPT_DONTPROMPTUSER = 2
> Const OLECMDEXECOPT_SHOWHELP = 3
>

Oops... you can remove the IDM constants that come after these :P You don't need them; I was
testing IE's ExecWB method and, as it turned out, it didn't successfully save the page
:(

Oliver Hill
Jan 30, 2003, 3:51:06 AM

Many thanks to Alex and Mythran for the responses, exactly what I
needed!

Oliver



Mythran
Feb 3, 2003, 11:44:05 AM

"Alex K. Angelopoulos (MVP)" <a...@mvps.org> wrote in message
news:OxJBkB8xCHA.2120@TK2MSFTNGP11...

> Oliver, are you wanting to save the entire markup for the page, or just the
> text?
>
>

Alex, go back to microsoft.public.scripting.vbscript. This is my newsgroup! :P
Jk... heh

Mythran


Alex K. Angelopoulos (MVP)
Feb 3, 2003, 12:20:42 PM

wah, wah, wah... ;)



"Mythran" <kip_p...@hotmail.com> wrote in message
news:u1WErN6yCHA.2648@TK2MSFTNGP11...
