We currently have a dynamic webpage with text on it. We would like to
capture this text every 5 minutes. To do this, we currently use the
following manual process:
- Maximise the Internet Explorer window which has the page loaded
- Click Refresh
- Press Ctrl-A to select all the text
- Press Ctrl-C to copy all the text
- Paste the text at the bottom of a text file located on the desktop
This is a very tedious process and we believe that it can be automated.
I have some VBScript experience and was wondering if some sort of
script can be written to achieve this???
If so, does anyone have any ideas how this could be done?? Is there
sample code out there anywhere??
All help is greatly appreciated!!! It is going to save us a lot of time
and hassle!!
Once again, thank you!!!
You could just script your actions using something like:
http://www.autoitscript.com/autoit3/
Or you could use a script like:
Const sURL = "http://www.microsoft.com/"
'Uses Microsoft XML, v3.0
Dim oSvrHTTP
Set oSvrHTTP = CreateObject("MSXML2.ServerXMLHTTP")
oSvrHTTP.Open "GET", sURL, False
oSvrHTTP.send
'Debug.Print oSvrHTTP.responseText
MsgBox oSvrHTTP.responseText
Set oSvrHTTP = Nothing
You could save the responseText to a different .htm file every five minutes,
or you could strip the HTML tags and just append the text to a common file.
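As a rough sketch of the second option, the script above can be wrapped in a loop that appends each download to one text file. The output path C:\capture.txt and the separator line are my own placeholders, and this assumes the page can be fetched without a logged-in browser session. Run it with cscript or wscript so that WScript.Sleep is available.

```vbscript
Option Explicit

Const sURL = "http://www.microsoft.com/"   ' page to capture (placeholder)
Const sLogFile = "C:\capture.txt"          ' hypothetical output file
Const ForAppending = 8                     ' FileSystemObject I/O mode
Const TristateTrue = -1                    ' open as Unicode so odd characters do not fail

Dim oFSO, oSvrHTTP, oFile
Set oFSO = CreateObject("Scripting.FileSystemObject")

Do
    Set oSvrHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    oSvrHTTP.Open "GET", sURL, False
    oSvrHTTP.send

    ' Only log successful responses
    If oSvrHTTP.Status = 200 Then
        ' Third argument True = create the file if it does not exist yet
        Set oFile = oFSO.OpenTextFile(sLogFile, ForAppending, True, TristateTrue)
        oFile.WriteLine "---- " & Now & " ----"
        oFile.WriteLine oSvrHTTP.responseText
        oFile.Close
    End If

    Set oSvrHTTP = Nothing
    WScript.Sleep 300000                   ' wait 5 minutes
Loop
```

The file will contain the raw HTML, tags and all. Stop the script from Task Manager or with Ctrl-C if it runs under cscript.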
The hard bit (if it's required) is stripping out the tags and javascript.
One example I found is at http://www.codeproject.com/asp/removehtml.asp, but
it needs improvement.
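For what it's worth, a crude way to do the stripping in VBScript is with the built-in RegExp object. This is only a sketch: StripTags is a name I made up, it does not decode entities such as &amp;, and regular expressions will trip over badly formed HTML.

```vbscript
Option Explicit

' Hypothetical helper: crude tag removal with regular expressions.
Function StripTags(ByVal sHtml)
    Dim oRE
    Set oRE = New RegExp
    oRE.Global = True
    oRE.IgnoreCase = True

    ' Drop script and style blocks together with their contents
    oRE.Pattern = "<(script|style)[^>]*>[\s\S]*?</\1\s*>"
    sHtml = oRE.Replace(sHtml, " ")

    ' Drop whatever tags are left
    oRE.Pattern = "<[^>]+>"
    sHtml = oRE.Replace(sHtml, " ")

    ' Collapse runs of whitespace into single spaces
    oRE.Pattern = "\s+"
    StripTags = Trim(oRE.Replace(sHtml, " "))
End Function

WScript.Echo StripTags("<p>Hello <b>world</b></p><script>var x = 1;</script>")
```

You would call StripTags on the responseText before appending it to the file.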
Good luck.
I have downloaded AutoItScript as directed but am struggling with it.
All I need done is:
- a certain page refreshed
- text from that page appended to a text file
(it doesn't matter if there are HTML tags or other rubbish with it, as
long as all the text that is on the page is somehow in the text file)
I have played around with AutoIt script but wasn't able to pull off
anything successful.
Do you know of a sample script I can use that will give me a head
start?
Thanks heaps!!!!
However, what I am struggling with at the moment is actually getting a
webpage to refresh every 5 minutes and then grab all the text from it
and dump it in a text file - it's OK if there are html tags in the text
file!!
so basically I want to be able to do the following:
- start internet explorer and then go to the webpage
then I would run the script which should:
- refresh the page every 5 minutes (the page I manually opened)
- copy all text from it to a text file
I know my requirements would only need a few lines of code but I have
no idea where to start. I have had a look at an application called
AutoIT but it seems very complicated.
Can anyone help?? Has anyone got any ideas about how this can be done and
where I should start???
Thanks heaps!!!
Richard Cole wrote:
> On Sat, 4 Jun 2005 18:57:24 +1000, "Jason Keats"
> <jke...@melbpcDeleteThis.org.au> wrote:
>
> >rusl...@yahoo.com wrote:
> >> Hi,
> >>
> >> We currently have a dynamic webpage with text on it. We would like to
> >> capture this text every 5 minutes. To do this, we currently use the
> >> following manual process:
> <<SNIP>>
> >The hard bit (if it's required) is stripping out the tags and javascript.
> >One example I found is at http://www.codeproject.com/asp/removehtml.asp, but
> >it needs improvement.
> To Ruslan
>
> If you download the source of my MetaTag generator
> (www.rcole.org/download\pages\downloads.htm), it contains VB6 code that
> will strip out all tags and javascript. Either copy this or use it as a
> guide as a way to do this.
>
> Richard
Ruslan, here's some code I wrote...
You will need VB.
Start a new project and paste the following into Form1.
On the form you will need a textbox called txtURL and a Timer called Timer1.
Either change the URL in the code, or create a shortcut to the EXE and pass
in the URL you want to save.
Option Explicit

Private sURL As String
Private sPath As String
Private sFileOut As String
Private nMins As Integer
Const nInterval As Integer = 5 'minutes

Private Sub Form_Load()
    Timer1.Enabled = False
    txtURL.Text = "http://www.microsoft.com/"
    If Len(Command$) Then txtURL.Text = Command$
    sURL = txtURL.Text
    txtURL.Locked = True
    sPath = App.Path
    If Right$(sPath, 1) <> "\" Then sPath = sPath & "\"
    sPath = sPath & App.EXEName
    GetUrl
    Timer1.Interval = 60000 '1 min
    Timer1.Enabled = True
End Sub

Private Sub Form_Unload(Cancel As Integer)
    Timer1.Enabled = False
End Sub

Private Sub Timer1_Timer()
    nMins = nMins + 1
    If nMins Mod nInterval = 0 Then nMins = 0 Else Exit Sub
    GetUrl
End Sub

Private Sub GetUrl()
    sFileOut = sPath
    sFileOut = sFileOut & "_" & Format$(Now, "yyyyMMddhhnnss") & ".htm"
    Debug.Print sFileOut
    'Reference to Microsoft XML, v3.0
    Dim oSvrHTTP 'As MSXML2.ServerXMLHTTP
    Set oSvrHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    oSvrHTTP.Open "GET", sURL, False
    oSvrHTTP.send
    'Debug.Print oSvrHTTP.responseText
    AppendToFile sFileOut, oSvrHTTP.responseText
    Set oSvrHTTP = Nothing
End Sub

Private Sub AppendToFile(ByVal sFile As String, ByVal sAppend As String)
    Dim nF As Integer
    nF = FreeFile
    Open sFile For Append As #nF
    Print #nF, sAppend
    Close #nF
End Sub
Approximately every 5 minutes a new timestamped HTML file will be created
containing all the text (no pictures will be saved) for the page in the URL.
With VB you could make a program with a
WebBrowser control that loads the page at
intervals. The IE DOM is then available through
the WB.Document. You can use WB.Navigate
to go to the page, WB.Refresh to reload the page,
WB.Document.Body.innerText to get the text on
the page.
With script it's nearly the same because the WB is
the same as the InternetExplorer.Application object.
With a single instance of IE open to the page you want,
use the following to get the IE object and access the
DOM:
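Set IE = GetObject(, "InternetExplorer.Application")
Do
    WScript.Sleep 300000 '-- pause 5 minutes.
    IE.Refresh
    WScript.Sleep 1000 '-- give the reload a moment to start.
    Do While IE.Busy Or IE.ReadyState <> 4 '-- wait until the page has loaded.
        WScript.Sleep 200
    Loop
    s = IE.Document.Body.innerText '-- get text of page
Loop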
(Then there's also the MSXML approach that Jason
Keats detailed. MSXML has an XML DOM, but I don't think
you can access the IE DOM with it, so you'd need to parse
the downloaded file yourself without the help of
Body.innerText, etc.)
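To round off the IE approach, here is a sketch that also writes the text out. It finds the open IE window through Shell.Application rather than GetObject, as an alternative in case GetObject cannot attach to the running browser on your machine. The path C:\capture.txt is a placeholder of mine, and the script simply takes the first IE window it finds, so it assumes only one is open.

```vbscript
Option Explicit

Const sLogFile = "C:\capture.txt"   ' hypothetical output file
Const ForAppending = 8              ' FileSystemObject I/O mode
Const TristateTrue = -1             ' open the file as Unicode

Dim oFSO, oWin, IE, oFile
Set oFSO = CreateObject("Scripting.FileSystemObject")

' Shell windows include Explorer folders, so pick the first one hosted by iexplore.exe
Set IE = Nothing
For Each oWin In CreateObject("Shell.Application").Windows
    If InStr(1, oWin.FullName, "iexplore.exe", vbTextCompare) > 0 Then
        Set IE = oWin
        Exit For
    End If
Next

If IE Is Nothing Then
    WScript.Echo "No Internet Explorer window found."
    WScript.Quit 1
End If

Do
    IE.Refresh
    WScript.Sleep 1000                       ' let the reload start
    Do While IE.Busy Or IE.ReadyState <> 4   ' 4 = page fully loaded
        WScript.Sleep 200
    Loop

    ' Append the visible text of the page, creating the file if needed
    Set oFile = oFSO.OpenTextFile(sLogFile, ForAppending, True, TristateTrue)
    oFile.WriteLine "---- " & Now & " ----"
    oFile.WriteLine IE.Document.Body.innerText
    oFile.Close

    WScript.Sleep 300000                     ' pause 5 minutes
Loop
```

Open the page in IE first, then start the script with cscript or wscript. Because innerText is used, the file gets the page text without the HTML tags.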
Cheers!!!
Try http://www.rcole.org/Pages/downloads.htm instead.
Regards,
Eric