ASP.NET Screen Scraping

ASP.NET and the .NET Framework make it unbelievably easy to retrieve web content (that is, whole web pages) from remote servers. You might have various reasons to retrieve remote web content. For example, you might want to get the latest news headlines from popular news sites and link to them from your website.

To accomplish screen scraping in classic ASP, we had to resort to COM objects such as AspHttp, ASPTear and Microsoft.XMLHTTP. The good news is that the .NET Framework has built-in classes that make retrieving remote web content easy.

We are going to use two .NET classes from the System.Net namespace, WebRequest and WebResponse, to get the remote web page content.

Here is how ASP.NET screen scraping works. First, we create an instance of the WebRequest class for the page we want by calling WebRequest.Create with the page's URL. We can request either a static page (.htm, .html, .txt, etc.) or a dynamic page (.asp, .aspx, .php, .pl, etc.). The type of page we request is not important, because we receive what the server sends to the browser (usually HTML), not the page's source code.

Next, we call the request object's GetResponse() method. This is the point at which the request is actually sent, and the method returns a WebResponse object holding the server's reply.

Once we have the WebResponse object, we call its GetResponseStream() method and read the page with the System.IO.Stream class (which provides a generic view of a sequence of bytes) and the System.IO.StreamReader class. Stream is designed for byte input and output, while StreamReader reads characters from a byte stream in a particular encoding, so together they let us read the response as text.
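The encoding you hand to StreamReader matters: if it is wrong, accented and non-Latin characters come out garbled. Rather than always assuming UTF-8, one option is to use the charset the server reports. HttpWebResponse (the HTTP-specific subclass of WebResponse) exposes it through its CharacterSet property, which may be empty or contain a name .NET does not recognize. Here is a minimal sketch of a helper that turns that name into an Encoding; the module name CharsetHelper, the function GetEncodingOrDefault and the choice of UTF-8 as the fallback are all assumptions for illustration, not part of the original example.

```vbnet
Imports System
Imports System.Text

Module CharsetHelper
    ' Hypothetical helper: maps a charset name (e.g. the value of
    ' HttpWebResponse.CharacterSet) to an Encoding, falling back to
    ' UTF-8 (an assumption) when the name is missing or unrecognized.
    Function GetEncodingOrDefault(charset As String) As Encoding
        If String.IsNullOrEmpty(charset) Then
            Return Encoding.UTF8
        End If
        Try
            ' Servers sometimes send the name in quotes, e.g. "utf-8"
            Return Encoding.GetEncoding(charset.Trim().Trim(""""c))
        Catch ex As ArgumentException
            ' Unknown or unsupported charset name
            Return Encoding.UTF8
        End Try
    End Function

    Sub Main()
        Console.WriteLine(GetEncodingOrDefault("ISO-8859-1").WebName)
        Console.WriteLine(GetEncodingOrDefault("").WebName)
        Console.WriteLine(GetEncodingOrDefault("no-such-charset").WebName)
    End Sub
End Module
```

In a page, you would pass it CType(oResponse, HttpWebResponse).CharacterSet and give the result to the StreamReader constructor in place of a hard-coded Encoding.UTF8.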

In our example below, we simply write the response to the browser window with Response.Write, but you can parse this content and keep only the parts you need.
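The scraped page arrives as one big HTML string, so any string-processing technique can pick it apart. A minimal sketch for the news-headlines case, assuming the headlines are ordinary links with double-quoted href values, is to pull out every link with a regular expression. The module LinkExtractor and the function ExtractLinks are hypothetical names, and a regular expression is a quick shortcut rather than a real HTML parser: it will miss single-quoted or unquoted attributes and can be fooled by unusual markup.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LinkExtractor
    ' Matches <a ... href="url" ...>text</a>. Deliberately simple: assumes
    ' double-quoted href values and no nested <a> tags.
    Private ReadOnly LinkPattern As New Regex( _
        "<a\s[^>]*href\s*=\s*""([^""]*)""[^>]*>(.*?)</a>", _
        RegexOptions.IgnoreCase Or RegexOptions.Singleline)

    ' Returns (URL, link text) pairs for every link found in the HTML.
    Function ExtractLinks(html As String) As List(Of KeyValuePair(Of String, String))
        Dim links As New List(Of KeyValuePair(Of String, String))()
        For Each m As Match In LinkPattern.Matches(html)
            ' Drop tags inside the link text, e.g. "<b>Headline</b>" -> "Headline"
            Dim text As String = Regex.Replace(m.Groups(2).Value, "<[^>]+>", "").Trim()
            links.Add(New KeyValuePair(Of String, String)(m.Groups(1).Value, text))
        Next
        Return links
    End Function

    Sub Main()
        Dim html As String = "<ul><li><a href=""/news/1"">First <b>headline</b></a></li>" & _
            "<li><A class=""x"" HREF=""/news/2"">Second headline</A></li></ul>"
        For Each link As KeyValuePair(Of String, String) In ExtractLinks(html)
            Console.WriteLine(link.Value & " -> " & link.Key)
        Next
    End Sub
End Module
```

Relative URLs such as /news/1 still need to be combined with the scraped site's address (for example with New Uri(baseUri, relativeUrl)) before you link to them from your own site.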

Here is a full working example of ASP.NET screen scraping, written as an ASP.NET page in VB.NET:

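<%@ Page Language="VB" %>
<%@ Import Namespace="System" %>
<%@ Import Namespace="System.Net" %>
<%@ Import Namespace="System.IO" %>
<%@ Import Namespace="System.Text" %>

<script language="VB" runat="server">

Sub Page_Load(Sender As Object, E As EventArgs)

    ' Create the request; nothing is sent until GetResponse() is called
    Dim oRequest As WebRequest = WebRequest.Create("http://www.aspdev.org/asp.net/")

    ' Send the request and wait for the server's reply
    Using oResponse As WebResponse = oRequest.GetResponse()

        ' The response body as a raw byte stream
        Dim oStream As Stream = oResponse.GetResponseStream()

        ' StreamReader decodes the bytes as text (assumes the page is UTF-8)
        Using oStreamReader As New StreamReader(oStream, Encoding.UTF8)

            ' Echo the remote HTML into this page
            Response.Write(oStreamReader.ReadToEnd())

        End Using ' closes the reader and the underlying stream
    End Using ' closes the response and releases the connection

End Sub

</script>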
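The example above assumes the request succeeds. In practice GetResponse() throws a WebException when the server cannot be reached, when the request times out, or when the server answers with an HTTP error such as 404. Here is a minimal sketch of a more defensive fetch, written as a stand-alone console module so it can be tried outside a web page. TryGetPage is a hypothetical helper name, and returning Nothing on failure (instead of showing an error page) is just one possible design choice.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module SafeFetch
    ' Hypothetical helper: downloads a page as text, or returns Nothing if
    ' the request fails (refused connection, timeout, HTTP 4xx/5xx, ...).
    Function TryGetPage(url As String, timeoutMs As Integer) As String
        Dim request As WebRequest = WebRequest.Create(url)
        request.Timeout = timeoutMs ' in milliseconds; the default is 100 seconds
        Try
            Using response As WebResponse = request.GetResponse()
                Using reader As New StreamReader(response.GetResponseStream(), Encoding.UTF8)
                    Return reader.ReadToEnd()
                End Using
            End Using
        Catch ex As WebException
            ' ex.Status describes the failure (ConnectFailure, Timeout,
            ' ProtocolError for HTTP error codes, ...)
            Console.WriteLine("Request failed: " & ex.Status.ToString())
            ' For HTTP errors the server's reply is attached; release it too
            If ex.Response IsNot Nothing Then ex.Response.Close()
            Return Nothing
        End Try
    End Function

    Sub Main()
        ' Port 1 on localhost normally has no listener, so this should fail
        Dim html As String = TryGetPage("http://localhost:1/", 5000)
        Console.WriteLine(If(html Is Nothing, "no content", html.Length.ToString() & " characters"))
    End Sub
End Module
```

In Page_Load you would call TryGetPage and only Response.Write the result when it is not Nothing. An invalid URL still throws from WebRequest.Create, since only WebException is caught here.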


About alamzyah
Name : Alamsyah Nick Name : Alamzyah Place of Birth : Jakarta, 04 June 1983 sex : Male Religion : Moslem Region : Jakarta, Indonesia Specialist : IT, Computer mail : alamzyah@gmail.com
