Doug Steele
This month, Doug Steele explores a way to get information from the Internet. Silver Collection
There's a Web page that has information that changes over time. I want to be able to read that page, and get the most up-to-date information from it. How can I do this?
Realistically, this sounds like a job for Web Services. However, sometimes the provider doesn't provide a Web Service. Depending on the Web page, it may be possible to simulate a Web Service using a technique often referred to as "screen scraping."
To do this, I'm going to take advantage of the XMLHTTP object that should already exist on your machine, even if you don't know about it. As long as you've installed at least Internet Explorer, version 4.0, you should have the file msxml.dll on your machine. If it's there, you're in business.
What I'm going to describe isn't the prettiest solution, but I find that this method works. This method does require that you have some knowledge of HTML, since what you're going to be retrieving from the Web site is the actual HTML that makes up the page, not the text that appears in your browser. As well, you'll see that this technique isn't really that amenable to sites that change their format frequently.
Also, I'm going to completely ignore how you initiate communication with the Internet if you don't have a dedicated connection (I have cable, so I have no way of testing any code for linking to the Internet). If it's any consolation, I can at least point you to code that you can use to determine whether or not you have a live Internet connection. Microsoft VB MVP Randy Birch has some code on his VBNet site at www.mvps.org/vbnet/code/network/internetgetconnectedstate.htm. You could also take a look at his www.mvps.org/vbnet/code/network/isdestinationreachable.htm, but that solution requires some tweaking to be applied in Access (the code uses a control array, which Access doesn't support). Another bit of code you can check is Michael Kaplan's connect.bas module at www.trigeminal.com/codes.asp (I found that using 0&—a zero followed by an ampersand—instead of StrPtr(App.Title) in the call to Michael's InternetOpen function worked for Access).
Having said all that, the following code is a module that will return the HTML associated with a given URL and print it to the debug window:
Function GetFromWebpage( URL As String) As String
On Error GoTo Err_GetFromWebpage
Dim objWeb As Object
Dim strXML As String
Set objWeb = CreateObject("Microsoft.XMLHTTP")
objWeb.Open "GET", URL, False
objWeb.Send
strXML = objWeb.responseText
GetFromWebpage = strXML
End_GetFromWebpage:
Set objWeb = Nothing
Exit Sub
Err_GetFromWebpage:
MsgBox Err.Description & " (" & Err.Number & ")"
Resume End_GetFromWebpage
End Sub
The routine begins by instantiating an instance of the XMLHTTP object and then gives the object a URL to retrieve. When the Send method is done, the code looks at the responseText property of the object. The problem with this code is that there's a lot of stuff returned that you probably don't care about: You get all of the HTML required to render the page in your browser, not just the data you're seeking. If there's a specific piece of information on the page that you want, you now have to parse the page (using functions like InStr, Left, Mid, and Right) to get only that part of the HTML that's of interest to you.
I'll present a simple example. Let's say you want to be able to get stock quotes. There are lots of Web sites that will give you stock quotes. For no particular reason, I'll choose to use http://finance.yahoo.com. Looking at that site, you can go directly to a given stock's metrics if you know the symbol for that stock. Picking everyone's favorite, you can look for Microsoft's stock price by going to http://finance.yahoo.com/q?s=msft. (Warning: Here's where writing these articles weeks or months in advance becomes a problem! When I wrote this at the beginning of August, the site appeared as shown in Figure 1. However, there was a warning on the page that they were in the process of testing a new interface. Hopefully they haven't implemented it before you read this!)
When I look at the HTML associated with that table, I see something like this:
<table cellspacing=1 cellpadding=4 border=0 width=100%>
<tr bgcolor=dcdcdc><th nowrap><font face=arial
size=-1>Symbol</font></th><th nowrap colspan=2><font
face=arial size=-1>Last Trade</font></th><th nowrap
colspan=2><font face=arial size=-1>Change</font></th>
<th nowrap><font face=arial size=-1>Volume</font></th>
</tr>
<tr align=right bgcolor=white>
<td nowrap align=left><font face=arial size=-1>
<a href="/q?s=MSFT&d=t">MSFT</a></font></td>
<td nowrap align=center><font face=arial size=-1>
12:22pm</font></td><td nowrap><font face=arial
size=-1><b>25.62</b></font></td><td nowrap><font
face=arial size=-1><font color=ff0020>-0.09</font>
</font></td><td nowrap><font face=arial size=-1><font
color=ff0020>-0.35%</font></font></td><td nowrap>
<font face=arial size=-1>17,721,532</font></td>
</tr>
and so on
The pattern for this page is consistent:
• The stock symbol appears after the HTML tags <font face=arial size=-1><a href="/q?s=.
• Last Trade information appears after the following <font face=arial size=-1> tag.
• The actual stock price follows the <font face=arial size=-1> following the Last Trade Information (and is enclosed in <b></b> tags).
It isn't that difficult to write a function that retrieves the HTML for the entire page, and then searches for those specific bits of text in the HTML. Since the code I showed before retrieves the entire HTML, all I need to do is write code to parse all of the ugly parts and find the important stuff buried in it:
Function GetStockQuote(CompanySymbol As String) _
As String
On Error GoTo Err_GetStockQuote
Dim objWeb As Object
Dim lngDateEnd As Long
Dim lngDateLen As Long
Dim lngDateStart As Long
Dim lngPointer As Long
Dim lngQuoteEnd As Long
Dim lngQuoteLen As Long
Dim lngQuoteStart As Long
Dim strDate As String
Dim strHTML As String
Dim strQuote As String
Dim strReturn As String
Dim strSearchFor As String
Dim strURL As String
strURL = "http://finance.yahoo.com/q?s=" & _
LCase$(CompanySymbol)
strHTML = GetFromWebpage(strURL)
strReturn = "Sorry: Couldn't get the quote for " & _
UCase$(CompanySymbol)
strSearchFor = "<font face=arial size=-1>" & _
"<a href=""/q?s=" & _
CompanySymbol & "&d=t"
lngPointer = InStr(1, strHTML, _
strSearchFor, vbTextCompare)
If lngPointer > 0 Then
strSearchFor = "<font face=arial size=-1>"
lngPointer = lngPointer + Len(strSearchFor)
lngDateStart = InStr(lngPointer, strHTML, _
strSearchFor, vbTextCompare)
If lngDateStart > 0 Then
lngDateStart = lngDateStart + Len(strSearchFor)
lngDateEnd = InStr(lngDateStart, strHTML, _
"</font>", vbTextCompare)
If lngDateEnd > 0 Then
lngDateLen = lngDateEnd - lngDateStart
strDate = Mid$(strHTML, lngDateStart, lngDateLen)
lngQuoteStart = InStr(lngDateEnd, strHTML, _
strSearchFor, vbTextCompare)
If lngQuoteStart > 0 Then
lngQuoteStart = lngQuoteStart + _
Len(strSearchFor)
lngQuoteEnd = InStr(lngQuoteStart, strHTML, _
"</font>", vbTextCompare)
If lngQuoteEnd > 0 Then
lngQuoteLen = lngQuoteEnd - lngQuoteStart
strQuote = Mid$(strHTML, lngQuoteStart, _
lngQuoteLen)
strReturn = UCase$(CompanySymbol) & _
" last traded for " & _
RemoveBold(Trim$(strQuote)) & _
" at " & Trim$(strDate)
End If
End If
End If
End If
End If
End_GetStockQuote:
Set objWeb = Nothing
GetStockQuote = strReturn
Exit Function
The RemoveBold function I use when concatenating everything for output is just this very simple code that replaces all occurrences of either <b> or </b> with a zero length string (""):
Function RemoveBold(StringToEdit As String) As String
RemoveBold = Replace( _
Replace(StringToEdit, "<b>", vbNullString), _
"</b>", vbNullString)
End Function
As you can see, my sample search function isn't particularly pretty code, but it does what it's supposed to. The problem, of course, is that this code will fail to work if the Web page changes (including errors in the HTML generated to produce the page). And as you can see, a reasonable understanding of HTML code is required to be able to pinpoint the exact location in the file of the information you want.
Your download file is called AA1103.ZIP in the file SA2003-11down.zip
This is found in the Silver Collection on this page
See Also