I've just started using WSH and have written a script to access a series of
web pages which I want to save as text files for further processing. I can't
seem to find the appropriate object and method to access the SaveAs command
in Internet Explorer. Can anyone enlighten me?
Thanks
Oliver
Here's how you would extract the text from a web page and write it to a file; if
you want to keep the HTML markup in the file, comment out the line [sData =
HtmlToText(sData)]:
' first get the HTML source into variable sData
sData = GetXml("http://www.yahoo.com")
' now use "htmlfile" to extract the text from the HTML
' if you want the complete HTML markup saved, comment out
' the next line.
sData = HtmlToText(sData)
'now save the text to a file.
WriteFile "C:\tmp\web.txt", sData
Function HtmlToText(sHtml)
    ' This function uses the "htmlfile" object to
    ' extract text from well-formed HTML;
    ' lighter and faster than using IE!
    With CreateObject("htmlfile")
        .write sHtml
        HtmlToText = .body.innerText
    End With
End Function
Function GetXml(sURL)
    ' Create an xmlhttp object and request the page source.
    Dim Xml
    Set Xml = CreateObject("Microsoft.XMLHTTP")
    Xml.open "GET", sURL
    Xml.send
    ' Wait until the request has completed (ReadyState 4).
    Do
        WScript.Sleep 10
    Loop While Xml.ReadyState <> 4
    GetXml = Xml.responseText
End Function
Sub WriteFile(FilePath, sData)
    ' Writes sData to FilePath (2 = ForWriting, True = create if missing).
    With CreateObject("Scripting.FileSystemObject").OpenTextFile(FilePath, 2, True)
        .Write sData
        .Close
    End With
End Sub
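Since the original question was about a series of pages, the three helpers above can be driven from a loop. What follows is only a minimal sketch under stated assumptions: the URL list and the page0.txt, page1.txt, ... output names are hypothetical, C:\tmp is assumed to exist, and the request is made synchronous (third argument False to open) instead of polling ReadyState.

```vbscript
Option Explicit

' Hypothetical list of pages to fetch; replace with your own URLs.
Dim aUrls, i, sText
aUrls = Array("http://www.yahoo.com", "http://www.google.com")

For i = 0 To UBound(aUrls)
    sText = HtmlToText(GetXml(aUrls(i)))
    ' Output names page0.txt, page1.txt, ... are an assumption.
    WriteFile "C:\tmp\page" & i & ".txt", sText
Next

Function GetXml(sURL)
    ' Passing False as the third argument makes the request synchronous,
    ' so send does not return until the response has arrived.
    Dim Xml
    Set Xml = CreateObject("Microsoft.XMLHTTP")
    Xml.open "GET", sURL, False
    Xml.send
    GetXml = Xml.responseText
End Function

Function HtmlToText(sHtml)
    ' Load the markup into an "htmlfile" document and read back its text.
    With CreateObject("htmlfile")
        .write sHtml
        HtmlToText = .body.innerText
    End With
End Function

Sub WriteFile(FilePath, sData)
    ' 2 = ForWriting; True = create the file if it does not exist.
    With CreateObject("Scripting.FileSystemObject").OpenTextFile(FilePath, 2, True)
        .Write sData
        .Close
    End With
End Sub
```

Run it with cscript.exe. If a page contains characters outside the system code page, the Write call may fail; opening the file as Unicode (a fourth OpenTextFile argument of -1) is one possible workaround.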
--
Please respond in the newsgroup so everyone may benefit.
http://dev.remotenetworktechnology.com
(email requests for support contract information welcomed)
----------
Subscribe to Microsoft's Security Bulletins:
http://www.microsoft.com/technet/security/bulletin/notify.asp
"Oliver Hill" <o.h...@cantab.net> wrote in message
news:3e38...@212.67.96.135...
Mythran
Option Explicit
'
' Constants
'
Const READYSTATE_UNINITIALIZED = 0
Const READYSTATE_LOADING = 1
Const READYSTATE_LOADED = 2
Const READYSTATE_INTERACTIVE = 3
Const READYSTATE_COMPLETE = 4
Const OLECMDEXECOPT_DODEFAULT = 0
Const OLECMDEXECOPT_PROMPTUSER = 1
Const OLECMDEXECOPT_DONTPROMPTUSER = 2
Const OLECMDEXECOPT_SHOWHELP = 3
Const IDM_COPY = 15
Const IDM_CUT = 16
Const IDM_PASTE = 26
Const IDM_PRINT = 27
Const IDM_PROPERTIES = 28
Const IDM_REDO = 29
Const IDM_SELECTALL = 31
Const IDM_UNDO = 43
Const IDM_ZOOMPERCENT = 50
Const IDM_GETZOOM = 68
Const IDM_SAVE = 70
Const IDM_SAVEAS = 71
Const IDM_OPEN = 2000
Const IDM_NEW = 2001
Const IDM_SAVECOPYAS = 2002
Const IDM_PRINTPREVIEW = 2003
Const IDM_PAGESETUP = 2004
Const IDM_SPELL = 2005
Const IDM_PASTESPECIAL = 2006
Const IDM_CLEARSELECTION = 2007
Const IDM_SHOWPRINT = 2010
Const IDM_SHOWPAGESETUP = 2011
Const IDM_STOP = 2138
'
' Call Sub Main() to get the ball rolling.
'
Call Main()
Sub Main()
Dim strWebPage
Dim strFile1
Dim strFile2
Dim lngReturn
'
' Set the location of the web page to point to a web site.
'
strWebPage = "http://www.google.com"
'
' Set the names of the output files.
'
strFile1 = "Google1.txt"
strFile2 = "Google2.txt"
'
' Load and save the page.
'
MsgBox "Attempting to save using IE's SaveAs command."
lngReturn = SaveWebPageAsTextFile(strWebPage, strFile1)
MsgBox "Attempting to save using the FileSystemObject object."
Call SaveAsTextOnly(strWebPage, strFile2)
'
' Let the user know we are finished.
'
Call MsgBox("Files saved.")
End Sub
Sub SaveAsTextOnly(ByVal strWebPage, ByVal strFile)
Dim objIE
Dim objFS
Dim objFile
'
' Create the InternetExplorer.Application object.
'
Set objIE = CreateObject("InternetExplorer.Application")
'
' Make sure IE is visible.
'
objIE.Visible = True
'
' Navigate to the website.
'
Call objIE.Navigate(strWebPage)
'
' Sleep until IE is ready.
'
Do Until (objIE.readyState = READYSTATE_COMPLETE)
WScript.Sleep 100
Loop
'
' Create the Scripting.FileSystemObject object.
'
Set objFS = CreateObject("Scripting.FileSystemObject")
'
' Create a new file to save to, overwriting the file if necessary.
'
Set objFile = objFS.CreateTextFile(strFile, True)
'
' Write the contents of the web page to the file as text.
'
objFile.Write objIE.Document.Body.innerText
'
' Close the file.
'
objFile.Close
'
' Open the code file.
'
Set objFile = objFS.CreateTextFile(Left(strFile, Len(strFile) - 4) & _
"__code.html", True)
'
' Write the HTML to the file.
'
objFile.Write objIE.Document.Body.innerHTML
'
' Close the file.
'
objFile.Close
'
' Destroy the object references.
'
Set objFS = Nothing
Set objFile = Nothing
objIE.Quit
Set objIE = Nothing
End Sub
Function SaveWebPageAsTextFile(ByVal strWebPage, ByVal strFile)
Dim objIE
'
' Create the InternetExplorer.Application object.
'
Set objIE = CreateObject("InternetExplorer.Application")
'
' Make sure IE is visible.
'
objIE.Visible = True
'
' Navigate to the website.
'
Call objIE.Navigate(strWebPage)
'
' Sleep until IE is ready.
'
Do Until (objIE.readyState = READYSTATE_COMPLETE)
WScript.Sleep 100
Loop
'
' Attempt to save the web page.
'
SaveWebPageAsTextFile = _
objIE.Document.execCommand("SaveAs", True, strFile)
'SaveWebPageAsTextFile = _
' objIE.ExecWB(IDM_SAVEAS, OLECMDEXECOPT_DONTPROMPTUSER, strFile, 0)
' Note: ExecWB expects an OLECMDID_* value (OLECMDID_SAVEAS = 4), not an IDM_* command ID.
'
' Close and destroy the IE window and object reference.
'
objIE.Quit
Set objIE = Nothing
End Function
Oops... you can remove the IDM constants from the script :P You don't need them; I was
testing IE's ExecWB method and, as it turned out, it didn't successfully save the page
:(
Oliver
Alex, go back to microsoft.public.scripting.vbscript. This is my newsgroup! :P
Jk... heh
Mythran
"Mythran" <kip_p...@hotmail.com> wrote in message
news:u1WErN6yCHA.2648@TK2MSFTNGP11...