<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DavidCraddock.net</title>
	<atom:link href="http://www.davidcraddock.net/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.davidcraddock.net</link>
	<description>My Technology Site</description>
	<lastBuildDate>Tue, 22 Nov 2011 13:45:33 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>JSoup Method for Page Scraping</title>
		<link>http://www.davidcraddock.net/2011/09/07/jsoup-method-for-page-scraping/</link>
		<comments>http://www.davidcraddock.net/2011/09/07/jsoup-method-for-page-scraping/#comments</comments>
		<pubDate>Wed, 07 Sep 2011 18:35:17 +0000</pubDate>
		<dc:creator>David Craddock</dc:creator>
				<category><![CDATA[Solutions to a Specific Problem]]></category>
		<category><![CDATA[BeautifulSoup]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[JSoup]]></category>
		<category><![CDATA[Scraper]]></category>
		<category><![CDATA[Scraping webpages]]></category>
		<category><![CDATA[screen scrape]]></category>
		<category><![CDATA[web scraping]]></category>

		<guid isPermaLink="false">http://www.davidcraddock.net/?p=938</guid>
		<description><![CDATA[I&#8217;m currently in the process of writing a web scraper for the forums on Gaia Online. Previously, I used Python to develop web scrapers, with the very handy Python library BeautifulSoup. Java has an equivalent called JSoup. Here I have written a class which is extended by each class in my project that [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.davidcraddock.net/wp-content/uploads/2011/09/soup.jpg"><img src="http://www.davidcraddock.net/wp-content/uploads/2011/09/soup.jpg" alt="Soup bowl" title="Soup" width="300" height="300" class="aligncenter size-full wp-image-946" /></a></p>
<p>I&#8217;m currently in the process of writing a web scraper for the forums on <a href="http://www.gaiaonline.com/forum" title="Gaia Online">Gaia Online</a>. Previously, I used Python to develop web scrapers, with the very handy Python library <a href="http://www.crummy.com/software/BeautifulSoup/" title="BeautifulSoup">BeautifulSoup</a>. Java has an equivalent called JSoup.</p>
<p>Here I have written a class which is extended by each class in my project that wants to scrape HTML. This &#8216;Scraper&#8217; class fetches the HTML and converts it into a JSoup tree that can be navigated to pick out the data. It advertises itself as a &#8216;web spider&#8217; type of web agent, and it waits a random 0-7 seconds before fetching each page so that it cannot be used to overload a web server. It also converts the entire page to ASCII, which may not be the best thing to do for multi-language web pages, but has certainly made scraping the English-language site Gaia Online much easier.</p>
<p>Here it is:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.io.IOException</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.io.InputStream</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.io.StringWriter</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.text.Normalizer</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.util.Random</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.commons.io.IOUtils</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.http.HttpEntity</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.http.HttpResponse</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.http.client.HttpClient</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.http.client.methods.HttpGet</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.http.impl.client.DefaultHttpClient</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.jsoup.Jsoup</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.jsoup.nodes.Document</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #008000; font-style: italic; font-weight: bold;">/**
* Generic scraper object that contains the basic methods required to fetch
* and parse HTML content. Extended by other classes that need to scrape.
*
* @author David
*/</span>
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> Scraper <span style="color: #009900;">&#123;</span>
&nbsp;
        <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #003399;">String</span> pageHTML <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;&quot;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// the HTML for the page</span>
        <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #003399;">Document</span> pageSoup<span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// the JSoup scraped hierarchy for the page</span>
&nbsp;
&nbsp;
        <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #003399;">String</span> fetchPageHTML<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span> <span style="color: #003399;">URL</span><span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">IOException</span><span style="color: #009900;">&#123;</span>
&nbsp;
            <span style="color: #666666; font-style: italic;">// this makes sure we don't scrape the same page twice</span>
            <span style="color: #000000; font-weight: bold;">if</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">!</span><span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">pageHTML</span>.<span style="color: #006633;">isEmpty</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
                <span style="color: #000000; font-weight: bold;">return</span> <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">pageHTML</span><span style="color: #339933;">;</span>
            <span style="color: #009900;">&#125;</span>
&nbsp;
            <span style="color: #003399;">System</span>.<span style="color: #006633;">getProperties</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>.<span style="color: #006633;">setProperty</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;httpclient.useragent&quot;</span>, <span style="color: #0000ff;">&quot;spider&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
            <span style="color: #003399;">Random</span> randomGenerator <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">Random</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
            <span style="color: #000066; font-weight: bold;">int</span> sleepTime <span style="color: #339933;">=</span> randomGenerator.<span style="color: #006633;">nextInt</span><span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">7000</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
            <span style="color: #000000; font-weight: bold;">try</span><span style="color: #009900;">&#123;</span>
                <span style="color: #003399;">Thread</span>.<span style="color: #006633;">sleep</span><span style="color: #009900;">&#40;</span>sleepTime<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">//sleep for x milliseconds</span>
            <span style="color: #009900;">&#125;</span><span style="color: #000000; font-weight: bold;">catch</span><span style="color: #009900;">&#40;</span><span style="color: #003399;">Exception</span> e<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
                <span style="color: #666666; font-style: italic;">// only fires if the thread is interrupted while sleeping, should never happen</span>
            <span style="color: #009900;">&#125;</span>
&nbsp;
            <span style="color: #003399;">String</span> pageHTML <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;&quot;</span><span style="color: #339933;">;</span>
&nbsp;
            HttpClient httpclient <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> DefaultHttpClient<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
            HttpGet httpget <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> HttpGet<span style="color: #009900;">&#40;</span><span style="color: #003399;">URL</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
                HttpResponse response <span style="color: #339933;">=</span> httpclient.<span style="color: #006633;">execute</span><span style="color: #009900;">&#40;</span>httpget<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                HttpEntity entity <span style="color: #339933;">=</span> response.<span style="color: #006633;">getEntity</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
                <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>entity <span style="color: #339933;">!=</span> <span style="color: #000066; font-weight: bold;">null</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
                    <span style="color: #003399;">InputStream</span> instream <span style="color: #339933;">=</span> entity.<span style="color: #006633;">getContent</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                    <span style="color: #003399;">String</span> encoding <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;UTF-8&quot;</span><span style="color: #339933;">;</span>
&nbsp;
                    <span style="color: #003399;">StringWriter</span> writer <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">StringWriter</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                    IOUtils.<span style="color: #006633;">copy</span><span style="color: #009900;">&#40;</span>instream, writer, encoding<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
                    pageHTML <span style="color: #339933;">=</span> writer.<span style="color: #006633;">toString</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
                    <span style="color: #666666; font-style: italic;">// convert entire page scrape to ASCII-safe string</span>
                    pageHTML <span style="color: #339933;">=</span> Normalizer.<span style="color: #006633;">normalize</span><span style="color: #009900;">&#40;</span>pageHTML, Normalizer.<span style="color: #006633;">Form</span>.<span style="color: #006633;">NFD</span><span style="color: #009900;">&#41;</span>.<span style="color: #006633;">replaceAll</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;[^<span style="color: #000099; font-weight: bold;">\\</span>p{ASCII}]&quot;</span>, <span style="color: #0000ff;">&quot;&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
                <span style="color: #009900;">&#125;</span>
&nbsp;
                <span style="color: #000000; font-weight: bold;">return</span> pageHTML<span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
&nbsp;
        <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #003399;">Document</span> fetchPageSoup<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span> pageHTML<span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> FetchSoupException<span style="color: #009900;">&#123;</span>
&nbsp;
            <span style="color: #666666; font-style: italic;">// this makes sure we don't soupify the same page twice</span>
            <span style="color: #000000; font-weight: bold;">if</span><span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">pageSoup</span> <span style="color: #339933;">!=</span> <span style="color: #000066; font-weight: bold;">null</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
                <span style="color: #000000; font-weight: bold;">return</span> <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">pageSoup</span><span style="color: #339933;">;</span>
            <span style="color: #009900;">&#125;</span>
&nbsp;
            <span style="color: #000000; font-weight: bold;">if</span><span style="color: #009900;">&#40;</span>pageHTML.<span style="color: #006633;">equalsIgnoreCase</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
                <span style="color: #000000; font-weight: bold;">throw</span> <span style="color: #000000; font-weight: bold;">new</span> FetchSoupException<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;We have no supplied HTML to soupify.&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
            <span style="color: #009900;">&#125;</span>
&nbsp;
            <span style="color: #003399;">Document</span> pageSoup <span style="color: #339933;">=</span> Jsoup.<span style="color: #006633;">parse</span><span style="color: #009900;">&#40;</span>pageHTML<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
            <span style="color: #000000; font-weight: bold;">return</span> pageSoup<span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>
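
<p>To see what the ASCII conversion step does in isolation, here is a minimal sketch that uses only the JDK. The class name AsciiFoldDemo and the method toAscii are hypothetical names chosen for illustration; the normalisation call itself is the same one used in the Scraper class above. NFD decomposition splits an accented letter into a base letter plus a combining mark, and the regular expression then drops everything outside ASCII. Accents are stripped, while characters with no ASCII base are removed entirely.</p>

```java
import java.text.Normalizer;

public class AsciiFoldDemo {

    // Same two steps as in Scraper.fetchPageHTML: decompose, then drop non-ASCII.
    public static String toAscii(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // Non-ASCII input is written with Unicode escapes so this file stays ASCII.
        System.out.println(toAscii("caf\u00e9"));         // accented e becomes plain e
        System.out.println(toAscii("\u65e5\u672c ok"));   // the two CJK characters are dropped
    }
}
```

<p>This is why the approach suits an English-language site but would lose most of the text on a page written in a non-Latin script.</p>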

<p>Then each class subclasses this scraper class, and adds the actual drilling down through the JSoup hierarchy tree to get what is required:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">...
<span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">pageHTML</span> <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">fetchPageHTML</span><span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">rootURL</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">pageSoup</span> <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">fetchPageSoup</span><span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">pageHTML</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #666666; font-style: italic;">// get the first &lt;div id=&quot;forum_hd_topic_pagelinks&quot;&gt;..&lt;/div&gt; section on the page</span>
<span style="color: #003399;">Element</span> forumPageLinkSection <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">pageSoup</span>.<span style="color: #006633;">getElementsByAttributeValue</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;id&quot;</span>,<span style="color: #0000ff;">&quot;forum_hd_topic_pagelinks&quot;</span><span style="color: #009900;">&#41;</span>.<span style="color: #006633;">first</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;">// get all the links in the above &lt;div&gt; section</span>
Elements forumPageLinks <span style="color: #339933;">=</span> forumPageLinkSection.<span style="color: #006633;">getElementsByAttribute</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;href&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
...</pre></div></div>

<p>I&#8217;ve found that this method provides a simple and effective way of scraping pages and using the resultant JSoup tree to pick out important data.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.davidcraddock.net/2011/09/07/jsoup-method-for-page-scraping/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Disabling Control-Enter and Control-B shortcut keys in Outlook 2003</title>
		<link>http://www.davidcraddock.net/2011/07/13/disabling-control-enter-and-control-b-shortcut-keys-in-outlook-2003/</link>
		<comments>http://www.davidcraddock.net/2011/07/13/disabling-control-enter-and-control-b-shortcut-keys-in-outlook-2003/#comments</comments>
		<pubDate>Wed, 13 Jul 2011 16:34:39 +0000</pubDate>
		<dc:creator>David Craddock</dc:creator>
				<category><![CDATA[Solutions to a Specific Problem]]></category>
		<category><![CDATA[disabling shortcut]]></category>
		<category><![CDATA[outlook 2003]]></category>
		<category><![CDATA[regedit]]></category>

		<guid isPermaLink="false">http://www.davidcraddock.net/?p=924</guid>
		<description><![CDATA[At work, I still have to use Windows XP and Outlook 2003. I don&#8217;t particularly mind this, except when I draft an email to someone and accidentally press Control-B instead of Control-V. Control-B will go ahead and send your partially composed email, resulting in some embarrassment as you have to tell everyone to disregard [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.davidcraddock.net/wp-content/uploads/2011/07/email-oops.jpg"><img src="http://www.davidcraddock.net/wp-content/uploads/2011/07/email-oops.jpg" alt="" title="" width="240" height="159" class="aligncenter size-full wp-image-936" /></a></p>
<p>At work, I still have to use Windows XP and Outlook 2003. I don&#8217;t particularly mind this, except when I draft an email to someone and accidentally press Control-B instead of Control-V. Control-B will go ahead and send your partially composed email, resulting in some embarrassment as you have to tell everyone to disregard it.</p>
<p>So I wanted to remove the &#8216;send email&#8217; shortcut keys in Outlook 2003. There are two ways of doing this. One involves editing your group policy, which is something only my IT administration team can do, and I didn&#8217;t want to have to involve them. The other is to make a change to your registry, which I will describe here.</p>
<ol>
<li>Open up regedit, and browse to the following registry key: HKEY_CURRENT_USER -> Software -> Microsoft -> office -> 11.0 -> outlook</li>
<li>Then create a new key called: &#8220;DisabledShortcutKeysCheckBoxes&#8221;.</li>
<li>Under that key, create two new String Values:<br />
Name: CtrlB Data: 66,8<br />
Name: CtrlEnter Data: 13,8
</li>
<li>Then restart Outlook and those keys will be disabled.</li>
</ol>
<p>Click on the thumbnail below to see what the finished edit should look like:</p>
<p><a href="http://www.davidcraddock.net/wp-content/uploads/2011/07/disablingshortcutkeys.jpg"><img src="http://www.davidcraddock.net/wp-content/uploads/2011/07/disablingshortcutkeys-300x123.jpg" alt="" title="disablingshortcutkeys" width="300" height="123" class="aligncenter size-medium wp-image-928" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.davidcraddock.net/2011/07/13/disabling-control-enter-and-control-b-shortcut-keys-in-outlook-2003/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Directory names not visible under ls? Change your colours.</title>
		<link>http://www.davidcraddock.net/2011/05/04/directory-names-not-visable-under-ls-change-your-colours/</link>
		<comments>http://www.davidcraddock.net/2011/05/04/directory-names-not-visable-under-ls-change-your-colours/#comments</comments>
		<pubDate>Wed, 04 May 2011 16:03:55 +0000</pubDate>
		<dc:creator>David Craddock</dc:creator>
				<category><![CDATA[Solutions to a Specific Problem]]></category>
		<category><![CDATA[centos]]></category>
		<category><![CDATA[console]]></category>
		<category><![CDATA[directory name not visable]]></category>
		<category><![CDATA[fedora]]></category>
		<category><![CDATA[ls]]></category>
		<category><![CDATA[LS_COLORS]]></category>
		<category><![CDATA[putty]]></category>
		<category><![CDATA[redhat]]></category>

		<guid isPermaLink="false">http://www.davidcraddock.net/?p=910</guid>
		<description><![CDATA[There is a problem I frequently encounter on Redhat/Fedora/CentOS systems with the output of the ls command. Under those distributions, the default setup is to display directories in a very dark colour. If you usually use a white foreground and a black background on your terminal client (such as Putty) then you will struggle to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.davidcraddock.net/wp-content/uploads/2011/05/range_of_colours.jpg"><img style="border: none;" src="http://www.davidcraddock.net/wp-content/uploads/2011/05/range_of_colours-300x199.jpg" alt="" title="range_of_colours" width="300" height="199" class="aligncenter size-medium wp-image-913" /></a></p>
<p>There is a problem I frequently encounter on Redhat/Fedora/CentOS systems with the output of the <strong>ls</strong> command. Under those distributions, the default setup is to display directories in a very dark colour. If you usually use a white foreground and a black background on your terminal client (such as Putty) then you will struggle to read the names of the directories under Redhat-based distributions.</p>
<p>There are two solutions that I have used:</p>
<p><strong>1. Change the colour settings in Putty </strong></p>
<p><a href="http://www.davidcraddock.net/wp-content/uploads/2011/05/screenshot-of-use-system-colors.bmp"><img src="http://www.davidcraddock.net/wp-content/uploads/2011/05/screenshot-of-use-system-colors.bmp" alt="" style="border: none;" title="screenshot of use system colors" class="aligncenter size-full wp-image-911" /></a></p>
<p>If you use Putty, ticking &#8216;Use System Colours&#8217; here changes the &#8220;white foreground, black background&#8221; default into a &#8220;white background, black foreground&#8221;. This way you can at least read the console properly, which is good for a quick fix. You can also save these settings in Putty as the default for the host that you are connecting to, or even for all hosts.</p>
<p><strong>2. Change the LS_COLORS directive temporarily in the shell.</strong></p>
<p>Alternatively, you can ask the <strong>ls</strong> command to display directories and other entries in colours that you specify. You could add these lines to the bottom of your .bashrc to make the changes permanent or, if you are using a shared machine, just copy and paste the following lines into the terminal and they will change the colours to a more visible, reddish set until you log out:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #7a0874; font-weight: bold;">alias</span> <span style="color: #007800;">ls</span>=<span style="color: #ff0000;">'ls --color'</span> <span style="color: #666666; font-style: italic;"># just to make sure we are using coloured ls</span>
<span style="color: #007800;">LS_COLORS</span>=<span style="color: #ff0000;">'di=94:fi=0:ln=31:pi=5:so=5:bd=5:cd=5:or=31:mi=0:ex=35:*.rpm=90'</span>
<span style="color: #7a0874; font-weight: bold;">export</span> LS_COLORS</pre></div></div>

<p>(Original source for this particular LS_COLORS combo: <a href="http://linux-sxs.org/housekeeping/lscolors.html">http://linux-sxs.org/housekeeping/lscolors.html</a>)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.davidcraddock.net/2011/05/04/directory-names-not-visable-under-ls-change-your-colours/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Scraping Gumtree Property Adverts with Python and BeautifulSoup</title>
		<link>http://www.davidcraddock.net/2011/05/01/scraping-gumtree-property-adverts-with-python-and-beautifulsoup/</link>
		<comments>http://www.davidcraddock.net/2011/05/01/scraping-gumtree-property-adverts-with-python-and-beautifulsoup/#comments</comments>
		<pubDate>Sun, 01 May 2011 14:07:02 +0000</pubDate>
		<dc:creator>David Craddock</dc:creator>
				<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[BeautifulSoup]]></category>
		<category><![CDATA[property adverts]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[scraping]]></category>
		<category><![CDATA[scraping Gumtree]]></category>
		<category><![CDATA[web scraping]]></category>

		<guid isPermaLink="false">http://www.davidcraddock.net/?p=886</guid>
		<description><![CDATA[I am moving to Manchester soon, and so I thought I&#8217;d get an idea of the housing market there by scraping all the Manchester Gumtree property adverts into a MySQL database. Once in the database, I could do things like find the average monthly price for a 2 bedroom flat in an area, and spot [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.davidcraddock.net/wp-content/uploads/2011/05/soup.jpg"><img style="border: none" src="http://www.davidcraddock.net/wp-content/uploads/2011/05/soup-300x199.jpg" alt="" title="soup" width="300" height="199" class="aligncenter size-medium wp-image-897" /></a></p>
<p>I am moving to Manchester soon, and so I thought I&#8217;d get an idea of the housing market there by scraping all the Manchester Gumtree property adverts into a MySQL database. Once in the database, I could do things like find the average monthly price for a 2 bedroom flat in an area, and spot bargains by looking at how many standard deviations a price falls below the mean, all with simple SQL queries via <a href="http://www.phpmyadmin.net/home_page/index.php">phpMyAdmin</a>.</p>
<p>I really like the Python library <a href="http://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a> for writing scrapers; there is also a Java equivalent called <a href="http://jsoup.org/">JSoup</a>. BeautifulSoup does a really good job of tolerating markup mistakes in the input data, and transforms a page into a tree structure that is easy to work with.</p>
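<p>As a rough illustration of the bargain-spotting idea, the sketch below computes the mean and population standard deviation of a handful of monthly prices and flags any price that falls more than a chosen number of standard deviations below the mean. It is written in Java rather than SQL purely so that it is self-contained. The class name BargainSketch, the sample prices, and the cutoff of one standard deviation are all assumptions made for illustration, not values taken from the scraped data.</p>

```java
import java.util.Arrays;

public class BargainSketch {

    // Arithmetic mean of the prices (0.0 for an empty array).
    public static double mean(double[] values) {
        return Arrays.stream(values).average().orElse(0.0);
    }

    // Population standard deviation: square root of the mean squared deviation.
    public static double stdDev(double[] values) {
        double m = mean(values);
        double sumSq = 0.0;
        for (double v : values) {
            sumSq += (v - m) * (v - m);
        }
        return Math.sqrt(sumSq / values.length);
    }

    // Prices lying more than k standard deviations below the mean.
    public static double[] bargains(double[] prices, double k) {
        double cutoff = mean(prices) - k * stdDev(prices);
        return Arrays.stream(prices).filter(p -> cutoff > p).toArray();
    }

    public static void main(String[] args) {
        // Hypothetical monthly prices for one area and bedroom count.
        double[] pricesPcm = {400, 500, 600, 700, 800};
        System.out.println(Arrays.toString(bargains(pricesPcm, 1.0))); // prints [400.0]
    }
}
```

<p>With these sample figures the mean is 600 and the standard deviation is about 141, so only the 400 advert falls below the cutoff of roughly 459.</p>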
<p>I chose the following layout for the program:</p>
<p><strong>advert.py</strong> &#8211; Stores all information about each property advert, with a &#8216;save&#8217; method that inserts the data into the MySQL database<br />
<strong>listing.py</strong> &#8211; Stores all the information on each listing page, which is broken down into links for specific adverts, and also the link to the next listing page in the sequence (i.e. the &#8216;next page&#8217; link)<br />
<strong>scrapeAdvert.py</strong> &#8211; When given an advert URL, this creates and populates an advert object<br />
<strong>scrapeListing.py</strong> &#8211; When given a listing URL, this creates and populates a listing object<br />
<strong>scrapeSequence.py</strong> &#8211; This walks through a series of listings, calling scrapeListing and scrapeAdvert for all of them, and finishes when there are no more listings in the sequence to scrape</p>
<p>Here is the MySQL table I created for this project (which you will have to set up if you want to run the scraper):</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">--</span>
<span style="color: #808080; font-style: italic;">-- Database: `manchester`</span>
<span style="color: #808080; font-style: italic;">--</span>
&nbsp;
<span style="color: #808080; font-style: italic;">-- --------------------------------------------------------</span>
&nbsp;
<span style="color: #808080; font-style: italic;">--</span>
<span style="color: #808080; font-style: italic;">-- Table structure for table `adverts`</span>
<span style="color: #808080; font-style: italic;">--</span>
&nbsp;
<span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> <span style="color: #993333; font-weight: bold;">IF</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">EXISTS</span> <span style="color: #ff0000;">`adverts`</span> <span style="color: #66cc66;">&#40;</span>
  <span style="color: #ff0000;">`url`</span> <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">255</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
  <span style="color: #ff0000;">`title`</span> text <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
  <span style="color: #ff0000;">`pricePW`</span> <span style="color: #993333; font-weight: bold;">INT</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">10</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
  <span style="color: #ff0000;">`pricePCM`</span> <span style="color: #993333; font-weight: bold;">INT</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">11</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
  <span style="color: #ff0000;">`location`</span> text <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
  <span style="color: #ff0000;">`dateAvailable`</span> <span style="color: #993333; font-weight: bold;">DATE</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
  <span style="color: #ff0000;">`propertyType`</span> text <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
  <span style="color: #ff0000;">`bedroomNumber`</span> <span style="color: #993333; font-weight: bold;">INT</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">11</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
  <span style="color: #ff0000;">`description`</span> text <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
  <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span> <span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">`url`</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#41;</span> ENGINE<span style="color: #66cc66;">=</span>MyISAM <span style="color: #993333; font-weight: bold;">DEFAULT</span> CHARSET<span style="color: #66cc66;">=</span>latin1;</pre></div></div>

<p>pricePCM is the price per calendar month and pricePW is the price per week. Usually each advert will have one or the other specified; the scraper stores 0 for whichever one is missing.</p>
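<p>If you later want to compare adverts on a single figure, the two columns can be folded into one. The following is only a sketch: the helper name monthly_price is made up for illustration, it assumes 0 means "not specified", and it uses the rough 52 weeks / 12 months rule of thumb to convert a weekly price.</p>

```python
def monthly_price(price_pw, price_pcm):
    """Return a per-calendar-month figure from whichever price is set.

    Assumption: 0 means "not specified". Weekly prices are converted
    with the approximate 52 weeks / 12 months rule of thumb.
    """
    if price_pcm:
        return price_pcm
    return int(round(price_pw * 52 / 12.0))


print(monthly_price(150, 0))   # advert with a weekly price only
print(monthly_price(0, 600))   # advert with a monthly price only
```

<p>Nothing in the scraper itself depends on this; it is just one way to make the two price columns comparable after the data has been collected.</p>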
<p><b>advert.py:</b></p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> MySQLdb
<span style="color: #ff7700;font-weight:bold;">import</span> chardet
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">sys</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> advert:
&nbsp;
        url = <span style="color: #483d8b;">&quot;&quot;</span>
        title = <span style="color: #483d8b;">&quot;&quot;</span>
        pricePW = <span style="color: #ff4500;">0</span>
        pricePCM = <span style="color: #ff4500;">0</span>
        location = <span style="color: #483d8b;">&quot;&quot;</span>
        dateAvailable = <span style="color: #483d8b;">&quot;&quot;</span>
        propertyType = <span style="color: #483d8b;">&quot;&quot;</span>
        bedroomNumber = <span style="color: #ff4500;">0</span>
        description = <span style="color: #483d8b;">&quot;&quot;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> save<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
                <span style="color: #808080; font-style: italic;"># you will need to change the following to match your mysql credentials:</span>
                db=MySQLdb.<span style="color: black;">connect</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;localhost&quot;</span>,<span style="color: #483d8b;">&quot;root&quot;</span>,<span style="color: #483d8b;">&quot;secret&quot;</span>,<span style="color: #483d8b;">&quot;manchester&quot;</span><span style="color: black;">&#41;</span>
                c=db.<span style="color: black;">cursor</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
                <span style="color: #008000;">self</span>.<span style="color: black;">description</span> = <span style="color: #008000;">unicode</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">description</span>, errors=<span style="color: #483d8b;">'replace'</span><span style="color: black;">&#41;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">description</span> = <span style="color: #008000;">self</span>.<span style="color: black;">description</span>.<span style="color: black;">encode</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'ascii'</span>,<span style="color: #483d8b;">'ignore'</span><span style="color: black;">&#41;</span>
                <span style="color: #808080; font-style: italic;"># TODO: might need to convert the other strings in the advert if there are any unicode conversion errors</span>
&nbsp;
                <span style="color: #808080; font-style: italic;"># use %s placeholders so that MySQLdb quotes and escapes the values itself</span>
                sql = <span style="color: #483d8b;">&quot;INSERT INTO adverts (url,title,pricePCM,pricePW,location,dateAvailable,propertyType,bedroomNumber,description) VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s)&quot;</span>
                values = <span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">url</span>, <span style="color: #008000;">self</span>.<span style="color: black;">title</span>, <span style="color: #008000;">self</span>.<span style="color: black;">pricePCM</span>, <span style="color: #008000;">self</span>.<span style="color: black;">pricePW</span>, <span style="color: #008000;">self</span>.<span style="color: black;">location</span>, <span style="color: #008000;">self</span>.<span style="color: black;">dateAvailable</span>, <span style="color: #008000;">self</span>.<span style="color: black;">propertyType</span>, <span style="color: #008000;">self</span>.<span style="color: black;">bedroomNumber</span>, <span style="color: #008000;">self</span>.<span style="color: black;">description</span><span style="color: black;">&#41;</span>
&nbsp;
                c.<span style="color: black;">execute</span><span style="color: black;">&#40;</span>sql, values<span style="color: black;">&#41;</span></pre></div></div>

<p>In advert.py we convert the Unicode output that BeautifulSoup gives us for the advert description into plain ASCII, so that it goes into the MySQL database without any encoding problems; any non-ASCII characters are simply dropped. I could have stored Unicode in the database as well, but the chances of really needing it to represent Gumtree ads are quite slim. If you intend to use this code, you will also need to enter the MySQL credentials for your own database.</p>
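<p>To see what that conversion does to a string, here is a minimal standalone sketch. The helper name to_ascii and the sample text are invented for illustration, and the snippet is written so that it also runs on Python 3, whereas the scraper itself targets Python 2.</p>

```python
def to_ascii(text):
    """Drop every character that has no ASCII representation."""
    # encode(..., 'ignore') silently skips anything outside ASCII;
    # decoding again turns the resulting bytes back into text.
    return text.encode('ascii', 'ignore').decode('ascii')


# The accented letter and the pound sign are removed, the rest is kept.
print(to_ascii(u'Caf\xe9 flat, \xa3500 pcm'))
```

<p>The trade-off is that information is lost: a pound sign or an accented letter in a description disappears rather than being stored. For Gumtree adverts that is usually acceptable, but it is worth knowing before reusing the approach elsewhere.</p>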
<p><b>listing.py:</b></p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">class</span> listing:
&nbsp;
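        <span style="color: #ff7700;font-weight:bold;">def</span> __init__<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
                <span style="color: #808080; font-style: italic;"># set these per instance: a list defined on the class is shared by every listing object</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">url</span> = <span style="color: #483d8b;">&quot;&quot;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">adverturls</span> = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">nextLink</span> = <span style="color: #483d8b;">&quot;&quot;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> addAdvertURL<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>,url<span style="color: black;">&#41;</span>:
&nbsp;
                <span style="color: #008000;">self</span>.<span style="color: black;">adverturls</span>.<span style="color: black;">append</span><span style="color: black;">&#40;</span>url<span style="color: black;">&#41;</span></pre></div></div>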

<p><b>scrapeAdvert.py:</b></p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> BeautifulSoup <span style="color: #ff7700;font-weight:bold;">import</span> BeautifulSoup          <span style="color: #808080; font-style: italic;"># For processing HTML</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">urllib2</span>
<span style="color: #ff7700;font-weight:bold;">from</span> advert <span style="color: #ff7700;font-weight:bold;">import</span> advert
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">time</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> scrapeAdvert:
&nbsp;
        page = <span style="color: #483d8b;">&quot;&quot;</span>
        soup = <span style="color: #483d8b;">&quot;&quot;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> scrape<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>,advertURL<span style="color: black;">&#41;</span>:
&nbsp;
                <span style="color: #808080; font-style: italic;"># give it a bit of time so gumtree doesn't</span>
                <span style="color: #808080; font-style: italic;"># ban us</span>
                <span style="color: #dc143c;">time</span>.<span style="color: black;">sleep</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#41;</span>
&nbsp;
                url = advertURL
                <span style="color: #808080; font-style: italic;"># print &quot;-- scraping &quot;+url+&quot; --&quot;</span>
                page = <span style="color: #dc143c;">urllib2</span>.<span style="color: black;">urlopen</span><span style="color: black;">&#40;</span>url<span style="color: black;">&#41;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">soup</span> = BeautifulSoup<span style="color: black;">&#40;</span>page<span style="color: black;">&#41;</span>
&nbsp;
                <span style="color: #008000;">self</span>.<span style="color: black;">anAd</span> = advert<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
                <span style="color: #008000;">self</span>.<span style="color: black;">anAd</span>.<span style="color: black;">url</span> = url
                <span style="color: #008000;">self</span>.<span style="color: black;">anAd</span>.<span style="color: black;">title</span> = <span style="color: #008000;">self</span>.<span style="color: black;">extractTitle</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">anAd</span>.<span style="color: black;">pricePW</span> = <span style="color: #008000;">self</span>.<span style="color: black;">extractPricePW</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">anAd</span>.<span style="color: black;">pricePCM</span> = <span style="color: #008000;">self</span>.<span style="color: black;">extractPricePCM</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
                <span style="color: #008000;">self</span>.<span style="color: black;">anAd</span>.<span style="color: black;">location</span> = <span style="color: #008000;">self</span>.<span style="color: black;">extractLocation</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">anAd</span>.<span style="color: black;">dateAvailable</span> = <span style="color: #008000;">self</span>.<span style="color: black;">extractDateAvailable</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">anAd</span>.<span style="color: black;">propertyType</span> = <span style="color: #008000;">self</span>.<span style="color: black;">extractPropertyType</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">anAd</span>.<span style="color: black;">bedroomNumber</span> = <span style="color: #008000;">self</span>.<span style="color: black;">extractBedroomNumber</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">anAd</span>.<span style="color: black;">description</span> = <span style="color: #008000;">self</span>.<span style="color: black;">extractDescription</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> extractTitle<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
&nbsp;
                location = <span style="color: #008000;">self</span>.<span style="color: black;">soup</span>.<span style="color: black;">find</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'h1'</span><span style="color: black;">&#41;</span>
                <span style="color: #dc143c;">string</span> = location.<span style="color: black;">contents</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
                stripped = <span style="color: #483d8b;">' '</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">string</span>.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
                stripped = stripped.<span style="color: black;">replace</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;'&quot;</span>,<span style="color: #483d8b;">'&amp;quot;'</span><span style="color: black;">&#41;</span>
                <span style="color: #808080; font-style: italic;"># print '|' + stripped + '|'</span>
                <span style="color: #ff7700;font-weight:bold;">return</span> stripped
&nbsp;
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> extractPricePCM<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
&nbsp;
                location = <span style="color: #008000;">self</span>.<span style="color: black;">soup</span>.<span style="color: black;">find</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'span'</span>,attrs=<span style="color: black;">&#123;</span><span style="color: #483d8b;">&quot;class&quot;</span> : <span style="color: #483d8b;">&quot;price&quot;</span><span style="color: black;">&#125;</span><span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">try</span>:
                        <span style="color: #dc143c;">string</span> = location.<span style="color: black;">contents</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
                        <span style="color: #dc143c;">string</span>.<span style="color: black;">index</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'pcm'</span><span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">AttributeError</span>: <span style="color: #808080; font-style: italic;"># for ads with no prices set</span>
                        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #ff4500;">0</span>
                <span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">ValueError</span>: <span style="color: #808080; font-style: italic;"># for ads with pw specified</span>
                        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #ff4500;">0</span>
&nbsp;
                stripped = <span style="color: #dc143c;">string</span>.<span style="color: black;">replace</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'&amp;pound;'</span>,<span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>
                stripped = stripped.<span style="color: black;">replace</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'pcm'</span>,<span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>
                stripped = stripped.<span style="color: black;">replace</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">','</span>,<span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>
                stripped = stripped.<span style="color: black;">replace</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;'&quot;</span>,<span style="color: #483d8b;">'&amp;quot;'</span><span style="color: black;">&#41;</span>
                stripped = <span style="color: #483d8b;">' '</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span>stripped.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
                <span style="color: #808080; font-style: italic;"># print '|' + stripped + '|'</span>
                <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">int</span><span style="color: black;">&#40;</span>stripped<span style="color: black;">&#41;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> extractPricePW<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
&nbsp;
                location = <span style="color: #008000;">self</span>.<span style="color: black;">soup</span>.<span style="color: black;">find</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'span'</span>,attrs=<span style="color: black;">&#123;</span><span style="color: #483d8b;">&quot;class&quot;</span> : <span style="color: #483d8b;">&quot;price&quot;</span><span style="color: black;">&#125;</span><span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">try</span>:
                        <span style="color: #dc143c;">string</span> = location.<span style="color: black;">contents</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
                        <span style="color: #dc143c;">string</span>.<span style="color: black;">index</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'pw'</span><span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">AttributeError</span>: <span style="color: #808080; font-style: italic;"># for ads with no prices set</span>
                        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #ff4500;">0</span>
                <span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">ValueError</span>: <span style="color: #808080; font-style: italic;"># for ads with pcm specified</span>
                        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #ff4500;">0</span>
                stripped = <span style="color: #dc143c;">string</span>.<span style="color: black;">replace</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'&amp;pound;'</span>,<span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>
                stripped = stripped.<span style="color: black;">replace</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'pw'</span>,<span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>
                stripped = stripped.<span style="color: black;">replace</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">','</span>,<span style="color: #483d8b;">''</span><span style="color: black;">&#41;</span>
                stripped = stripped.<span style="color: black;">replace</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;'&quot;</span>,<span style="color: #483d8b;">'&amp;quot;'</span><span style="color: black;">&#41;</span>
                stripped = <span style="color: #483d8b;">' '</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span>stripped.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
                <span style="color: #808080; font-style: italic;"># print '|' + stripped + '|'</span>
                <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">int</span><span style="color: black;">&#40;</span>stripped<span style="color: black;">&#41;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> extractLocation<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
&nbsp;
                location = <span style="color: #008000;">self</span>.<span style="color: black;">soup</span>.<span style="color: black;">find</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'span'</span>,attrs=<span style="color: black;">&#123;</span><span style="color: #483d8b;">&quot;class&quot;</span> : <span style="color: #483d8b;">&quot;location&quot;</span><span style="color: black;">&#125;</span><span style="color: black;">&#41;</span>
                <span style="color: #dc143c;">string</span> = location.<span style="color: black;">contents</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
                stripped = <span style="color: #483d8b;">' '</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">string</span>.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
                stripped = stripped.<span style="color: black;">replace</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;'&quot;</span>,<span style="color: #483d8b;">'&amp;quot;'</span><span style="color: black;">&#41;</span>
                <span style="color: #808080; font-style: italic;"># print '|' + stripped + '|'</span>
                <span style="color: #ff7700;font-weight:bold;">return</span> stripped
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> extractDateAvailable<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
&nbsp;
                current_year = <span style="color: #483d8b;">'2011'</span>
&nbsp;
                ul = <span style="color: #008000;">self</span>.<span style="color: black;">soup</span>.<span style="color: black;">find</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'ul'</span>,attrs=<span style="color: black;">&#123;</span><span style="color: #483d8b;">&quot;id&quot;</span> : <span style="color: #483d8b;">&quot;ad-details&quot;</span><span style="color: black;">&#125;</span><span style="color: black;">&#41;</span>
                firstP = ul.<span style="color: black;">findAll</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'p'</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
                <span style="color: #dc143c;">string</span> = firstP.<span style="color: black;">contents</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
                stripped = <span style="color: #483d8b;">' '</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">string</span>.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
                date_to_convert = stripped + <span style="color: #483d8b;">'/'</span>+current_year
                <span style="color: #ff7700;font-weight:bold;">try</span>:
                        date_object = <span style="color: #dc143c;">time</span>.<span style="color: black;">strptime</span><span style="color: black;">&#40;</span>date_to_convert, <span style="color: #483d8b;">&quot;%d/%m/%Y&quot;</span><span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">ValueError</span>: <span style="color: #808080; font-style: italic;"># for adverts with no date available</span>
                        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #483d8b;">&quot;&quot;</span>
&nbsp;
                full_date = <span style="color: #dc143c;">time</span>.<span style="color: black;">strftime</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'%Y-%m-%d %H:%M:%S'</span>, date_object<span style="color: black;">&#41;</span>
                <span style="color: #808080; font-style: italic;"># print '|' + full_date + '|'</span>
                <span style="color: #ff7700;font-weight:bold;">return</span> full_date
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> extractPropertyType<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
&nbsp;
                ul = <span style="color: #008000;">self</span>.<span style="color: black;">soup</span>.<span style="color: black;">find</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'ul'</span>,attrs=<span style="color: black;">&#123;</span><span style="color: #483d8b;">&quot;id&quot;</span> : <span style="color: #483d8b;">&quot;ad-details&quot;</span><span style="color: black;">&#125;</span><span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">try</span>:
                        secondP = ul.<span style="color: black;">findAll</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'p'</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>
                <span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">IndexError</span>: <span style="color: #808080; font-style: italic;"># for properties with no type</span>
                        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #483d8b;">&quot;&quot;</span>
                <span style="color: #dc143c;">string</span> = secondP.<span style="color: black;">contents</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
                stripped = <span style="color: #483d8b;">' '</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">string</span>.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
                stripped = stripped.<span style="color: black;">replace</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;'&quot;</span>,<span style="color: #483d8b;">'&amp;quot;'</span><span style="color: black;">&#41;</span>
                <span style="color: #808080; font-style: italic;"># print '|' + stripped + '|'</span>
                <span style="color: #ff7700;font-weight:bold;">return</span> stripped
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> extractBedroomNumber<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
&nbsp;
                ul = <span style="color: #008000;">self</span>.<span style="color: black;">soup</span>.<span style="color: black;">find</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'ul'</span>,attrs=<span style="color: black;">&#123;</span><span style="color: #483d8b;">&quot;id&quot;</span> : <span style="color: #483d8b;">&quot;ad-details&quot;</span><span style="color: black;">&#125;</span><span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">try</span>:
                        thirdP = ul.<span style="color: black;">findAll</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'p'</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span>
                <span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">IndexError</span>: <span style="color: #808080; font-style: italic;"># for properties with no bedroom number</span>
                        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #ff4500;">0</span>
                <span style="color: #dc143c;">string</span> = thirdP.<span style="color: black;">contents</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
                stripped = <span style="color: #483d8b;">' '</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">string</span>.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
                stripped = stripped.<span style="color: black;">replace</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;'&quot;</span>,<span style="color: #483d8b;">'&amp;quot;'</span><span style="color: black;">&#41;</span>
                <span style="color: #808080; font-style: italic;"># print '|' + stripped + '|'</span>
                <span style="color: #ff7700;font-weight:bold;">return</span> stripped
&nbsp;
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> extractDescription<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
&nbsp;
                div = <span style="color: #008000;">self</span>.<span style="color: black;">soup</span>.<span style="color: black;">find</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'div'</span>,attrs=<span style="color: black;">&#123;</span><span style="color: #483d8b;">&quot;id&quot;</span> : <span style="color: #483d8b;">&quot;description&quot;</span><span style="color: black;">&#125;</span><span style="color: black;">&#41;</span>
                description = div.<span style="color: black;">find</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'p'</span><span style="color: black;">&#41;</span>
                contents = description.<span style="color: black;">renderContents</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
                contents = contents.<span style="color: black;">replace</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;'&quot;</span>,<span style="color: #483d8b;">'&amp;quot;'</span><span style="color: black;">&#41;</span>
                <span style="color: #808080; font-style: italic;"># print '|' + contents + '|'</span>
                <span style="color: #ff7700;font-weight:bold;">return</span> contents</pre></div></div>

<p>In scrapeAdvert.py there are a lot of string manipulation statements to pull out any unwanted characters, such as the &#8216;pw&#8217; characters (short for per week) found in the price string, which we need to remove in order to store the property price per week as an integer.</p>
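<p>As a rough illustration of that kind of clean-up, here is a minimal sketch in Python 3. It is not the code from scrapeAdvert.py: the function name parse_weekly_price and the sample price strings are made up, and it assumes prices are quoted in whole pounds.</p>

```python
import re


def parse_weekly_price(price_text):
    """Return the weekly price as an int, or None if no digits are found."""
    # Drop everything that is not a digit: currency sign, commas, 'pw', spaces.
    digits = re.sub(r"[^0-9]", "", price_text)
    return int(digits) if digits else None


print(parse_weekly_price("£150pw"))     # 150
print(parse_weekly_price("£1,250 pw"))  # 1250
```

<p>Returning None when there are no digits at all leaves it to the caller to decide what to store for adverts that do not quote a price.</p>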
<p>Using BeautifulSoup to pull out elements is quite easy, for example:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">ul = <span style="color: #008000;">self</span>.<span style="color: black;">soup</span>.<span style="color: black;">find</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'ul'</span>,attrs=<span style="color: black;">&#123;</span><span style="color: #483d8b;">&quot;id&quot;</span> : <span style="color: #483d8b;">&quot;ad-details&quot;</span><span style="color: black;">&#125;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>That finds the first &lt;ul id=&quot;ad-details&quot;&gt; element and returns it as a tag object, from which you can reach all the list items inside that list. More detail can be found in the <a href="http://www.crummy.com/software/BeautifulSoup/documentation.html">Beautiful Soup documentation</a>, which is very good.</p>
<p><b>scrapeListing.py:</b></p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> BeautifulSoup <span style="color: #ff7700;font-weight:bold;">import</span> BeautifulSoup          <span style="color: #808080; font-style: italic;"># For processing HTML</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">urllib2</span>
<span style="color: #ff7700;font-weight:bold;">from</span> listing <span style="color: #ff7700;font-weight:bold;">import</span> listing
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">time</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> scrapeListing:
&nbsp;
        soup = <span style="color: #483d8b;">&quot;&quot;</span>
        url = <span style="color: #483d8b;">&quot;&quot;</span>
        aListing = <span style="color: #483d8b;">&quot;&quot;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> scrape<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>,url<span style="color: black;">&#41;</span>:
                <span style="color: #808080; font-style: italic;"># give it a bit of time so gumtree doesn't</span>
                <span style="color: #808080; font-style: italic;"># ban us</span>
                <span style="color: #dc143c;">time</span>.<span style="color: black;">sleep</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">3</span><span style="color: black;">&#41;</span>
&nbsp;
                <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;scraping url = &quot;</span>+<span style="color: #008000;">str</span><span style="color: black;">&#40;</span>url<span style="color: black;">&#41;</span>
&nbsp;
                page = <span style="color: #dc143c;">urllib2</span>.<span style="color: black;">urlopen</span><span style="color: black;">&#40;</span>url<span style="color: black;">&#41;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">soup</span> = BeautifulSoup<span style="color: black;">&#40;</span>page<span style="color: black;">&#41;</span>
&nbsp;
                <span style="color: #008000;">self</span>.<span style="color: black;">aListing</span> = listing<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">aListing</span>.<span style="color: black;">url</span> = url
                <span style="color: #008000;">self</span>.<span style="color: black;">aListing</span>.<span style="color: black;">adverturls</span> = <span style="color: #008000;">self</span>.<span style="color: black;">extractAdvertURLs</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">aListing</span>.<span style="color: black;">nextLink</span> = <span style="color: #008000;">self</span>.<span style="color: black;">extractNextLink</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> extractAdvertURLs<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
&nbsp;
                toReturn = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
                h3s = <span style="color: #008000;">self</span>.<span style="color: black;">soup</span>.<span style="color: black;">findAll</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;h3&quot;</span><span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">for</span> h3 <span style="color: #ff7700;font-weight:bold;">in</span> h3s:
                        links = h3.<span style="color: black;">findAll</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'a'</span>,<span style="color: black;">&#123;</span><span style="color: #483d8b;">&quot;class&quot;</span>:<span style="color: #483d8b;">&quot;summary&quot;</span><span style="color: black;">&#125;</span><span style="color: black;">&#41;</span>
                        <span style="color: #ff7700;font-weight:bold;">for</span> link <span style="color: #ff7700;font-weight:bold;">in</span> links:
                                <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;|&quot;</span>+link<span style="color: black;">&#91;</span><span style="color: #483d8b;">'href'</span><span style="color: black;">&#93;</span>+<span style="color: #483d8b;">&quot;|&quot;</span>
                                toReturn.<span style="color: black;">append</span><span style="color: black;">&#40;</span>link<span style="color: black;">&#91;</span><span style="color: #483d8b;">'href'</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
&nbsp;
                <span style="color: #ff7700;font-weight:bold;">return</span> toReturn
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> extractNextLink<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
&nbsp;
                links = <span style="color: #008000;">self</span>.<span style="color: black;">soup</span>.<span style="color: black;">findAll</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;a&quot;</span>,<span style="color: black;">&#123;</span><span style="color: #483d8b;">&quot;class&quot;</span>:<span style="color: #483d8b;">&quot;next&quot;</span><span style="color: black;">&#125;</span><span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">try</span>:
                        <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;&gt;&quot;</span>+links<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #483d8b;">'href'</span><span style="color: black;">&#93;</span>+<span style="color: #483d8b;">&quot;&gt;&quot;</span>
                <span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">IndexError</span>: <span style="color: #808080; font-style: italic;"># if there is no 'next' link found..</span>
                        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #483d8b;">&quot;&quot;</span>
                <span style="color: #ff7700;font-weight:bold;">return</span> links<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #483d8b;">'href'</span><span style="color: black;">&#93;</span></pre></div></div>

<p>The extractNextLink method here extracts the pagination &#8216;next&#8217; link which will bring up the next listing page from the selection of listing pages to browse. We use it to step through the pagination &#8216;sequence&#8217; of resultant listing pages.</p>
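<p>The idea of stepping through a pagination sequence can be sketched without touching the network. In the Python 3 example below, the FAKE_PAGES dictionary and the walk_listing_pages function are hypothetical stand-ins for the real listing pages and for the scraping loop. An empty &#8216;next&#8217; value plays the same role as the empty string returned by extractNextLink.</p>

```python
# Hypothetical stand-in for the site: each listing page maps to its adverts
# and its 'next' link ("" means there is no next page).
FAKE_PAGES = {
    "page1": {"adverts": ["ad-a", "ad-b"], "next": "page2"},
    "page2": {"adverts": ["ad-c"], "next": "page3"},
    "page3": {"adverts": ["ad-d"], "next": ""},
}


def walk_listing_pages(start_url, pages):
    """Follow 'next' links from start_url and collect every advert URL."""
    collected = []
    seen = set()
    url = start_url
    while url and url not in seen:  # stop on "" or on a page already visited
        seen.add(url)
        page = pages[url]
        collected.extend(page["adverts"])
        url = page["next"]
    return collected


print(walk_listing_pages("page1", FAKE_PAGES))
```

<p>The seen set is an extra safeguard that the original scraper does not have. It stops the loop if a &#8216;next&#8217; link ever points back to a page that has already been visited.</p>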
<p><b>scrapeSequence.py:</b></p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> scrapeListing <span style="color: #ff7700;font-weight:bold;">import</span> scrapeListing
<span style="color: #ff7700;font-weight:bold;">from</span> scrapeAdvert <span style="color: #ff7700;font-weight:bold;">import</span> scrapeAdvert
<span style="color: #ff7700;font-weight:bold;">from</span> listing <span style="color: #ff7700;font-weight:bold;">import</span> listing
<span style="color: #ff7700;font-weight:bold;">from</span> advert <span style="color: #ff7700;font-weight:bold;">import</span> advert
<span style="color: #ff7700;font-weight:bold;">import</span> MySQLdb
<span style="color: #ff7700;font-weight:bold;">import</span> _mysql_exceptions
&nbsp;
<span style="color: #808080; font-style: italic;"># change this to the gumtree page you want to start scraping from</span>
url = <span style="color: #483d8b;">&quot;http://www.gumtree.com/flats-and-houses-for-rent/salford-quays&quot;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">while</span> url <span style="color: #66cc66;">!</span>= <span style="color: #008000;">None</span>:
        <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;scraping URL = &quot;</span>+url
        sl = <span style="color: #483d8b;">&quot;&quot;</span>
        sl = scrapeListing<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        sl.<span style="color: black;">scrape</span><span style="color: black;">&#40;</span>url<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> advertURL <span style="color: #ff7700;font-weight:bold;">in</span> sl.<span style="color: black;">aListing</span>.<span style="color: black;">adverturls</span>:
                sa = <span style="color: #483d8b;">&quot;&quot;</span>
                sa = scrapeAdvert<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
                sa.<span style="color: black;">scrape</span><span style="color: black;">&#40;</span>advertURL<span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">try</span>:
                        sa.<span style="color: black;">anAd</span>.<span style="color: black;">save</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">except</span> _mysql_exceptions.<span style="color: black;">IntegrityError</span>:
                        <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;** Advert &quot;</span> + sa.<span style="color: black;">anAd</span>.<span style="color: black;">url</span> + <span style="color: #483d8b;">&quot; already saved **&quot;</span>
                sa.<span style="color: black;">anAd</span> = <span style="color: #483d8b;">&quot;&quot;</span>
&nbsp;
        url = <span style="color: #483d8b;">&quot;&quot;</span>
        <span style="color: #ff7700;font-weight:bold;">if</span> sl.<span style="color: black;">aListing</span>.<span style="color: black;">nextLink</span>:
                <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;nextLink = &quot;</span>+sl.<span style="color: black;">aListing</span>.<span style="color: black;">nextLink</span>
                url = sl.<span style="color: black;">aListing</span>.<span style="color: black;">nextLink</span>
        <span style="color: #ff7700;font-weight:bold;">else</span>:
                <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'all done.'</span>
                <span style="color: #ff7700;font-weight:bold;">break</span></pre></div></div>

<p>This is the file you run to kick off the scrape. It wraps each save in a try/except block that catches the MySQL IntegrityError, which is how it detects that an advert has already been entered into the database. The error is raised because the URL of the advert is the primary key in the database, and no two records can have the same primary key.</p>
<p>The url variable near the top of the script sets the page the scrape starts from.</p>
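<p>The duplicate check can be tried out without a MySQL server. The Python 3 sketch below swaps in the sqlite3 module from the standard library, which raises its own IntegrityError in the same situation. The adverts table, its columns and the save_advert function are assumptions made for this example, not the schema used by the scraper.</p>

```python
import sqlite3

# In-memory database; the table and column names are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE adverts (url TEXT PRIMARY KEY, title TEXT)")


def save_advert(conn, url, title):
    """Insert an advert; return False if its URL is already stored."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO adverts (url, title) VALUES (?, ?)", (url, title)
            )
        return True
    except sqlite3.IntegrityError:
        # Raised because url is the primary key and must be unique.
        return False


print(save_advert(conn, "http://example.com/ad/1", "Two bed flat"))  # True
print(save_advert(conn, "http://example.com/ad/1", "Two bed flat"))  # False
```

<p>Letting the database reject the duplicate saves a separate lookup before every insert, which is the same trade-off the scraper makes.</p>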
<p>The above code worked well for scraping several hundred Manchester Gumtree ads into a database, from which point I was able to use a combination of phpMyAdmin and OpenOffice Spreadsheet to analyse the data and find out useful statistics about the property market in said area.</p>
<p><center><a href="http://www.davidcraddock.net/uploads/gumtree-scraper.tgz">Download the scraper source code in a tar.gz archive</a></center></p>
<p>Note: Due to the nature of web scraping, if &#8211; or more accurately, when &#8211; Gumtree changes its user interface, the scraper I have written will need to be tweaked accordingly to find the right data. This is meant to be an informative tutorial, not a finished product.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.davidcraddock.net/2011/05/01/scraping-gumtree-property-adverts-with-python-and-beautifulsoup/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>RESTful Web Services</title>
		<link>http://www.davidcraddock.net/2011/03/02/restful-web-services/</link>
		<comments>http://www.davidcraddock.net/2011/03/02/restful-web-services/#comments</comments>
		<pubDate>Wed, 02 Mar 2011 14:21:23 +0000</pubDate>
		<dc:creator>David Craddock</dc:creator>
				<category><![CDATA[Tutorials]]></category>

		<guid isPermaLink="false">http://www.davidcraddock.net/?p=876</guid>
		<description><![CDATA[REST (Representational State Transfer) is a way of delivering web services. When a web service conforms to REST, it is known as RESTful. The largest RESTful web service is the Hypertext Transfer Protocol (HTTP) which you use every day to send and receive information from web servers while browsing the internet. To implement RESTful web [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.davidcraddock.net/wp-content/uploads/2011/03/hammock-200x300.jpg" alt="Hammock with the background of a clear blue sky" title="hammock" width="200" height="300" class="aligncenter size-medium wp-image-882" /></p>
<p>REST (Representational State Transfer) is a way of delivering web services. When a web service conforms to REST, it is known as RESTful. The largest RESTful system is the World Wide Web itself, which is built on the Hypertext Transfer Protocol (HTTP), the protocol you use every day to send and receive information from web servers while browsing the internet.</p>
<p>To implement RESTful web services, you should implement four methods: GET, PUT, POST and DELETE. Resources on RESTful web services are typically defined as collections of elements. The REST methods can act either on a whole collection or on a specific element in a collection.</p>
<p>A collection is usually defined as a level in the URL hierarchy. For example, take this fictitious layout:</p>
<p><strong>Collection:</strong> www.bbc.co.uk/iplayer/programmes/<br />
<strong>Element:</strong> www.bbc.co.uk/iplayer/programmes/24<br />
<strong>Element:</strong> www.bbc.co.uk/iplayer/programmes/25<br />
<strong>Element:</strong> www.bbc.co.uk/iplayer/programmes/26</p>
<p>The REST methods you use do different things depending on whether you are interacting with a Collection resource or an Element resource. See below:</p>
<p><strong>On a Collection, e.g. www.bbc.co.uk/iplayer/programmes/</strong><br/><br />
GET – List the URLs of the collection’s members.<br />
PUT – Replace the entire collection with another collection.<br />
POST – Create a new element in the collection, returning the new element’s URL.<br />
DELETE – Delete the entire collection.</p>
<p><strong>On an Element, e.g. www.bbc.co.uk/iplayer/programmes/24</strong><br/><br />
GET – Retrieve the addressed element in the appropriate internet media type, e.g. a music file or an image.<br />
PUT – Replace the addressed element of the collection, or if it doesn’t exist, create it in the parent collection.<br />
POST – Treat the addressed element of the collection as a new collection, and add an element into it.<br />
DELETE – Delete the addressed element of the collection.</p>
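<p>The two lists above can be sketched as a small dispatcher. This Python 3 example assumes an in-memory dictionary in place of real HTTP routing. The ProgrammeStore class, its handle method and the integer ids are invented for illustration. POST on an element is left out and raises an error.</p>

```python
class ProgrammeStore:
    """In-memory collection keyed by element id (a hypothetical example)."""

    def __init__(self):
        self.items = {}

    def handle(self, method, element_id=None, body=None):
        if element_id is None:  # the request addressed the collection
            if method == "GET":
                return sorted(self.items)  # list the member ids
            if method == "PUT":
                self.items = dict(body)  # replace the whole collection
                return None
            if method == "POST":
                new_id = max(self.items, default=0) + 1  # create an element
                self.items[new_id] = body
                return new_id
            if method == "DELETE":
                self.items = {}  # delete the whole collection
                return None
        else:  # the request addressed one element
            if method == "GET":
                return self.items[element_id]
            if method == "PUT":
                self.items[element_id] = body  # replace, or create if absent
                return None
            if method == "DELETE":
                del self.items[element_id]
                return None
        raise ValueError("unsupported method: " + method)


store = ProgrammeStore()
print(store.handle("POST", body="Programme A"))  # 1
store.handle("PUT", 24, "Programme B")
print(store.handle("GET"))  # [1, 24]
```

<p>The only thing that decides which branch runs is whether an element id was supplied, which mirrors the way the same HTTP method means different things on a collection URL and on an element URL.</p>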
<p>REST is a simple and clear way of exposing the basic operations of data storage, known as CRUD (Create, Read, Update and Delete). See: <a href="http://en.wikipedia.org/wiki/Create,_read,_update_and_delete">http://en.wikipedia.org/wiki/Create,_read,_update_and_delete</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.davidcraddock.net/2011/03/02/restful-web-services/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>&#8216;Weather Forecast&#8217; Calendar Service in PHP</title>
		<link>http://www.davidcraddock.net/2011/02/24/a-3-day-weather-forecast-calendar-service/</link>
		<comments>http://www.davidcraddock.net/2011/02/24/a-3-day-weather-forecast-calendar-service/#comments</comments>
		<pubDate>Thu, 24 Feb 2011 19:31:48 +0000</pubDate>
		<dc:creator>David Craddock</dc:creator>
				<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[bbc weather feed]]></category>
		<category><![CDATA[ical service]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[weather forecast]]></category>
		<category><![CDATA[web services]]></category>

		<guid isPermaLink="false">http://www.davidcraddock.net/?p=857</guid>
		<description><![CDATA[The BBC provide 3 day weather RSS feeds for most locations in the UK. I thought it would be interesting to create a web service to turn the weather feed into calendar feed format, so I could have a constantly updated forecast of the next 3 days of weather mapped on to my iPhone’s calendar. [...]]]></description>
			<content:encoded><![CDATA[<p>The BBC provide 3 day weather RSS feeds for most locations in the UK. I thought it would be interesting to create a web service to turn the weather feed into calendar feed format, so I could have a constantly updated forecast of the next 3 days of weather mapped on to my iPhone’s calendar. Here it is on my iPhone:</p>
<p><a href="http://www.davidcraddock.net/wp-content/uploads/2011/02/weathercal.png"><img src="http://www.davidcraddock.net/wp-content/uploads/2011/02/weathercal.png" alt="Picture shows weather forecast on an iPhone calendar screenshot" title="weathercal" width="320" height="480" class="aligncenter size-full wp-image-868" /></a></p>
<p><strong>Overview</strong></p>
<p>The service is separated into five files:</p>
<ul>
<li><b>ical.php</b> – this contains the class ical which corresponds to a single calendar feed. A method called ‘addevent’ allows you to add new events to the calendar, and a method called ‘returncal’ redirects the resulting calendar file to the browser so people can subscribe to it using their calendar application.</li>
<li><b>forecast.php</b> – this file contains the class forecast, which has properties for every aspect that we want to record for each day’s forecast, e.g. Wind Speed and Humidity. It also contains the forecast set, which is a collection of forecast objects. The set class is serializable, which means each forecast object can be stored in a text file, including the Wind Speed, Humidity and everything else we want to record for each day.</li>
<li><b>scrape-weather.php</b> – this file contains code that scrapes the weather feed, populates the forecast set with all the weather information for the next 3 days, and stores the result in a file called forecasts.ser.</li>
<li><b>forecasts.ser</b> – this is all the data for the three day weather forecast, in serialized format. It is automatically deleted and recreated when the scrape-weather.php script is run.</li>
<li><b>reader.php</b> – this file converts the forecasts.ser file into an iCal calendar, and outputs the iCal-formatted result to the calendar application that accesses the reader.php page.</li>
</ul>
<p>It uses two external libraries:</p>
<ul>
<li><b>MagpieRSS 0.72</b> – this popular library is used for reading the weather RSS feed and converting it into a PHP object that is easier for scrape-weather.php to manipulate.</li>
<li><b>iCalcreator 2.8</b> – this is used for creating the output iCal format of the calendar in ical.php and outputting it to the browser in reader.php.</li>
</ul>
<p><strong>Files</strong></p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span>
<span style="color: #666666; font-style: italic;">// ical.php</span>
<span style="color: #b1b100;">require_once</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">'ical/iCalcreator.class.php'</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">class</span> ical <span style="color: #009900;">&#123;</span>
	<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000088;">$v</span><span style="color: #339933;">;</span>
&nbsp;
	<span style="color: #000000; font-weight: bold;">function</span> ical<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
		<span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">init</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>	
&nbsp;
	<span style="color: #000000; font-weight: bold;">function</span> init<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
		<span style="color: #000088;">$config</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">'unique_id'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #0000ff;">'weather.davidcraddock.net'</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		  <span style="color: #666666; font-style: italic;">// set Your unique id</span>
		<span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">v</span> <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> vcalendar<span style="color: #009900;">&#40;</span> <span style="color: #000088;">$config</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		  <span style="color: #666666; font-style: italic;">// create a new calendar instance</span>
&nbsp;
		<span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">v</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">setProperty</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">'method'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'PUBLISH'</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		  <span style="color: #666666; font-style: italic;">// required of some calendar software</span>
		<span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">v</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">setProperty</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">&quot;x-wr-calname&quot;</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">&quot;Calendar Sample&quot;</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		  <span style="color: #666666; font-style: italic;">// required of some calendar software</span>
		<span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">v</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">setProperty</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">&quot;X-WR-CALDESC&quot;</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">&quot;Calendar Description&quot;</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		  <span style="color: #666666; font-style: italic;">// required of some calendar software</span>
		<span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">v</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">setProperty</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">&quot;X-WR-TIMEZONE&quot;</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">&quot;Europe/London&quot;</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		  <span style="color: #666666; font-style: italic;">// required of some calendar software</span>
	<span style="color: #009900;">&#125;</span>
&nbsp;
	<span style="color: #000000; font-weight: bold;">function</span> addevent<span style="color: #009900;">&#40;</span><span style="color: #000088;">$start_year</span><span style="color: #339933;">,</span><span style="color: #000088;">$start_month</span><span style="color: #339933;">,</span><span style="color: #000088;">$start_day</span><span style="color: #339933;">,</span><span style="color: #000088;">$start_hour</span><span style="color: #339933;">,</span><span style="color: #000088;">$start_min</span><span style="color: #339933;">,</span>
		  <span style="color: #000088;">$finish_year</span><span style="color: #339933;">,</span><span style="color: #000088;">$finish_month</span><span style="color: #339933;">,</span><span style="color: #000088;">$finish_day</span><span style="color: #339933;">,</span><span style="color: #000088;">$finish_hour</span><span style="color: #339933;">,</span><span style="color: #000088;">$finish_min</span><span style="color: #339933;">,</span>
		  <span style="color: #000088;">$summary</span><span style="color: #339933;">,</span><span style="color: #000088;">$description</span><span style="color: #339933;">,</span><span style="color: #000088;">$comment</span>		
	<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
		<span style="color: #000088;">$vevent</span> <span style="color: #339933;">=</span> <span style="color: #339933;">&amp;</span> <span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">v</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">newComponent</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">'vevent'</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		  <span style="color: #666666; font-style: italic;">// create an event calendar component</span>
		<span style="color: #000088;">$start</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">'year'</span><span style="color: #339933;">=&gt;</span><span style="color: #000088;">$start_year</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'month'</span><span style="color: #339933;">=&gt;</span><span style="color: #000088;">$start_month</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'day'</span><span style="color: #339933;">=&gt;</span><span style="color: #000088;">$start_day</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'hour'</span><span style="color: #339933;">=&gt;</span><span style="color: #000088;">$start_hour</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'min'</span><span style="color: #339933;">=&gt;</span><span style="color: #000088;">$start_min</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'sec'</span><span style="color: #339933;">=&gt;</span><span style="color: #cc66cc;">0</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #000088;">$vevent</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">setProperty</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">'dtstart'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$start</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #000088;">$end</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">'year'</span><span style="color: #339933;">=&gt;</span><span style="color: #000088;">$finish_year</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'month'</span><span style="color: #339933;">=&gt;</span><span style="color: #000088;">$finish_month</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'day'</span><span style="color: #339933;">=&gt;</span><span style="color: #000088;">$finish_day</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'hour'</span><span style="color: #339933;">=&gt;</span><span style="color: #000088;">$finish_hour</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'min'</span><span style="color: #339933;">=&gt;</span><span style="color: #000088;">$finish_min</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'sec'</span><span style="color: #339933;">=&gt;</span><span style="color: #cc66cc;">0</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #000088;">$vevent</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">setProperty</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">'dtend'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$end</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #000088;">$vevent</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">setProperty</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">'LOCATION'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">''</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		  <span style="color: #666666; font-style: italic;">// property name - case independent</span>
		<span style="color: #000088;">$vevent</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">setProperty</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">'summary'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$summary</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #000088;">$vevent</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">setProperty</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">'description'</span><span style="color: #339933;">,</span><span style="color: #000088;">$description</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #000088;">$vevent</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">setProperty</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">'comment'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$comment</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #000088;">$vevent</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">setProperty</span><span style="color: #009900;">&#40;</span> <span style="color: #0000ff;">'attendee'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'contact@davidcraddock.net'</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
&nbsp;
	<span style="color: #000000; font-weight: bold;">function</span> returncal<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
		<span style="color: #666666; font-style: italic;">// redirect calendar file to browser</span>
		<span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">v</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">returnCalendar</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #000000; font-weight: bold;">?&gt;</span></pre></div></div>


<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span>
<span style="color: #666666; font-style: italic;">//forecast.php</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">class</span> forecast <span style="color: #009900;">&#123;</span>
	<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000088;">$day</span><span style="color: #339933;">;</span>
	<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000088;">$month</span><span style="color: #339933;">;</span>
	<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000088;">$year</span><span style="color: #339933;">;</span>
&nbsp;
	<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000088;">$high</span><span style="color: #339933;">;</span>
	<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000088;">$low</span><span style="color: #339933;">;</span>
	<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000088;">$summary</span><span style="color: #339933;">;</span>
&nbsp;
	<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000088;">$humidity</span><span style="color: #339933;">;</span>
	<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000088;">$windspeed</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">class</span> forecast_set <span style="color: #009900;">&#123;</span>
	<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000088;">$forecasts</span><span style="color: #339933;">;</span>
&nbsp;
	<span style="color: #000000; font-weight: bold;">function</span> __construct<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
		<span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">forecasts</span> <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> ArrayObject<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>


<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span>
<span style="color: #666666; font-style: italic;">// scrape-weather.php</span>
<span style="color: #b1b100;">require_once</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'magpierss/rss_fetch.inc'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">require_once</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'forecast.php'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">class</span> scrape3day <span style="color: #009900;">&#123;</span>
	<span style="color: #000000; font-weight: bold;">var</span> <span style="color: #000088;">$set</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// forecast set</span>
&nbsp;
	<span style="color: #666666; font-style: italic;">// configuration variables</span>
&nbsp;
	<span style="color: #666666; font-style: italic;">// weather forecasts are stored in this file:</span>
	<span style="color: #000000; font-weight: bold;">var</span> <span style="color: #000088;">$store_path</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;/home/david_craddock/work.davidcraddock.net/weather/forecasts.ser&quot;</span><span style="color: #339933;">;</span>
	<span style="color: #666666; font-style: italic;">// weather forecasts are fetched from this BBC feed:</span>
	<span style="color: #000000; font-weight: bold;">var</span> <span style="color: #000088;">$feed_url</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;http://newsrss.bbc.co.uk/weather/forecast/2376/Next3DaysRSS.xml&quot;</span><span style="color: #339933;">;</span>
&nbsp;
	<span style="color: #000000; font-weight: bold;">function</span> __construct<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
		<span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">scrapecurrent</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">store</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
&nbsp;
	<span style="color: #000000; font-weight: bold;">function</span> store<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
		<span style="color: #000088;">$store_path</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">store_path</span><span style="color: #339933;">;</span>
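		<span style="color: #666666; font-style: italic;">// remove any previous store; skip this on the first run, when no file exists yet</span>
		<span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">file_exists</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$store_path</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
			<span style="color: #990000;">unlink</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$store_path</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #009900;">&#125;</span>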
		<span style="color: #990000;">file_put_contents</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$store_path</span><span style="color: #339933;">,</span> <span style="color: #990000;">serialize</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">set</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
&nbsp;
	<span style="color: #000000; font-weight: bold;">function</span> scrapecurrent<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
		<span style="color: #000088;">$url</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">feed_url</span><span style="color: #339933;">;</span>
		<span style="color: #000088;">$rss</span> <span style="color: #339933;">=</span> fetch_rss<span style="color: #009900;">&#40;</span> <span style="color: #000088;">$url</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #000088;">$message</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;&quot;</span><span style="color: #339933;">;</span>
		<span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">sizeof</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$rss</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">items</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">!=</span> <span style="color: #cc66cc;">3</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
			<span style="color: #990000;">die</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Problem with BBC weather feed.. dying&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #009900;">&#125;</span>
		<span style="color: #000088;">$i</span><span style="color: #339933;">=</span><span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span>
		<span style="color: #000088;">$set</span> <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> forecast_set<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #000088;">$curdate</span> <span style="color: #339933;">=</span> <span style="color: #990000;">date</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Y-m-d&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #b1b100;">echo</span> <span style="color: #000088;">$curdate</span><span style="color: #339933;">;</span>
		<span style="color: #b1b100;">foreach</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$rss</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">items</span> <span style="color: #b1b100;">as</span> <span style="color: #000088;">$item</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
			<span style="color: #000088;">$href</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$item</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'link'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
			<span style="color: #000088;">$title</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$item</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'title'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
			<span style="color: #000088;">$description</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$item</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'description'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
			<span style="color: #990000;">print_r</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$item</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #666666; font-style: italic;">// date this forecast one day after $curdate; $curdate itself advances at the end of each pass</span>
			<span style="color: #000088;">$curyear</span> <span style="color: #339933;">=</span> <span style="color: #990000;">date</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Y'</span><span style="color: #339933;">,</span><span style="color: #990000;">strtotime</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">date</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Y-m-d&quot;</span><span style="color: #339933;">,</span> <span style="color: #990000;">strtotime</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$curdate</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">.</span> <span style="color: #0000ff;">&quot; +1 day&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #000088;">$curmonth</span> <span style="color: #339933;">=</span> <span style="color: #990000;">date</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'m'</span><span style="color: #339933;">,</span><span style="color: #990000;">strtotime</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">date</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Y-m-d&quot;</span><span style="color: #339933;">,</span> <span style="color: #990000;">strtotime</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$curdate</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">.</span> <span style="color: #0000ff;">&quot; +1 day&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #000088;">$curday</span> <span style="color: #339933;">=</span> <span style="color: #990000;">date</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'d'</span><span style="color: #339933;">,</span><span style="color: #990000;">strtotime</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">date</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Y-m-d&quot;</span><span style="color: #339933;">,</span> <span style="color: #990000;">strtotime</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$curdate</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">.</span> <span style="color: #0000ff;">&quot; +1 day&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #666666; font-style: italic;">// pull the summary and temperatures out of the title, and wind speed and humidity out of the description</span>
			<span style="color: #990000;">preg_match</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/:.+?,/'</span><span style="color: #339933;">,</span><span style="color: #000088;">$title</span><span style="color: #339933;">,</span><span style="color: #000088;">$summary</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #990000;">preg_match</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/Min Temp:.+?-*\d*/'</span><span style="color: #339933;">,</span><span style="color: #000088;">$title</span><span style="color: #339933;">,</span><span style="color: #000088;">$mintemp</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #990000;">preg_match</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/Max Temp:.+?-*\d*/'</span><span style="color: #339933;">,</span><span style="color: #000088;">$title</span><span style="color: #339933;">,</span><span style="color: #000088;">$maxtemp</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #990000;">preg_match</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/Wind Speed:.+?-*\d*/'</span><span style="color: #339933;">,</span><span style="color: #000088;">$description</span><span style="color: #339933;">,</span><span style="color: #000088;">$windspeed</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #990000;">preg_match</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/Humidity:.+?-*\d*/'</span><span style="color: #339933;">,</span><span style="color: #000088;">$description</span><span style="color: #339933;">,</span><span style="color: #000088;">$humidity</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #000088;">$summary</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">': '</span><span style="color: #339933;">,</span><span style="color: #0000ff;">''</span><span style="color: #339933;">,</span><span style="color: #000088;">$summary</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #000088;">$summary</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">','</span><span style="color: #339933;">,</span><span style="color: #0000ff;">''</span><span style="color: #339933;">,</span><span style="color: #000088;">$summary</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #000088;">$mintemp</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Min Temp: '</span><span style="color: #339933;">,</span><span style="color: #0000ff;">''</span><span style="color: #339933;">,</span><span style="color: #000088;">$mintemp</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #000088;">$maxtemp</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Max Temp: '</span><span style="color: #339933;">,</span><span style="color: #0000ff;">''</span><span style="color: #339933;">,</span><span style="color: #000088;">$maxtemp</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #000088;">$windspeed</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Wind Speed: '</span><span style="color: #339933;">,</span><span style="color: #0000ff;">''</span><span style="color: #339933;">,</span><span style="color: #000088;">$windspeed</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #000088;">$humidity</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Humidity: '</span><span style="color: #339933;">,</span><span style="color: #0000ff;">''</span><span style="color: #339933;">,</span><span style="color: #000088;">$humidity</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #000088;">$mins</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$i</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>int<span style="color: #009900;">&#41;</span><span style="color: #000088;">$mintemp</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>	
			<span style="color: #000088;">$maxs</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$i</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>int<span style="color: #009900;">&#41;</span><span style="color: #000088;">$maxtemp</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
			<span style="color: #000088;">$forecast</span> <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> forecast<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #000088;">$forecast</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">low</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>int<span style="color: #009900;">&#41;</span><span style="color: #000088;">$mintemp</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
			<span style="color: #000088;">$forecast</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">high</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>int<span style="color: #009900;">&#41;</span><span style="color: #000088;">$maxtemp</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
			<span style="color: #000088;">$forecast</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">year</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>int<span style="color: #009900;">&#41;</span><span style="color: #000088;">$curyear</span><span style="color: #339933;">;</span>
			<span style="color: #000088;">$forecast</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">month</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>int<span style="color: #009900;">&#41;</span><span style="color: #000088;">$curmonth</span><span style="color: #339933;">;</span>
			<span style="color: #000088;">$forecast</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">day</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>int<span style="color: #009900;">&#41;</span><span style="color: #000088;">$curday</span><span style="color: #339933;">;</span>
			<span style="color: #000088;">$forecast</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">windspeed</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$windspeed</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
			<span style="color: #000088;">$forecast</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">humidity</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$humidity</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
			<span style="color: #000088;">$forecast</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">summary</span> <span style="color: #339933;">=</span> <span style="color: #990000;">ucwords</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$summary</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #000088;">$set</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">forecasts</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">append</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$forecast</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
			<span style="color: #000088;">$i</span><span style="color: #339933;">++;</span>	
			<span style="color: #000088;">$curdate</span> <span style="color: #339933;">=</span> <span style="color: #990000;">date</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Y-m-d'</span><span style="color: #339933;">,</span><span style="color: #990000;">strtotime</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">date</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Y-m-d&quot;</span><span style="color: #339933;">,</span> <span style="color: #990000;">strtotime</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$curdate</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">.</span> <span style="color: #0000ff;">&quot; +1 day&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #009900;">&#125;</span>
		<span style="color: #990000;">print_r</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$set</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
		<span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">set</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$set</span><span style="color: #339933;">;</span>
&nbsp;
	<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #009900;">&#125;</span>
<span style="color: #000088;">$s</span> <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> scrape3day<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>


<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span>
<span style="color: #b1b100;">require_once</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'ical.php'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">require_once</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'forecast.php'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000088;">$c</span> <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> ical<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$f</span> <span style="color: #339933;">=</span> <span style="color: #990000;">unserialize</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">file_get_contents</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'forecasts.ser'</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">for</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$i</span><span style="color: #339933;">=</span><span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span><span style="color: #000088;">$i</span><span style="color: #339933;">&lt;</span><span style="color: #cc66cc;">3</span><span style="color: #339933;">;</span><span style="color: #000088;">$i</span><span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
	<span style="color: #000088;">$curforecast</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$f</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">forecasts</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$i</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
	<span style="color: #000088;">$weather_digest</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;Max: &quot;</span><span style="color: #339933;">.</span><span style="color: #000088;">$curforecast</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">high</span><span style="color: #339933;">.</span><span style="color: #0000ff;">&quot; Min: &quot;</span><span style="color: #339933;">.</span><span style="color: #000088;">$curforecast</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">low</span><span style="color: #339933;">.</span><span style="color: #0000ff;">&quot; Humidity: &quot;</span><span style="color: #339933;">.</span><span style="color: #000088;">$curforecast</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">humidity</span><span style="color: #339933;">.</span><span style="color: #0000ff;">&quot;% Wind Speed: &quot;</span><span style="color: #339933;">.</span><span style="color: #000088;">$curforecast</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">windspeed</span><span style="color: #339933;">.</span><span style="color: #0000ff;">&quot;mph.&quot;</span><span style="color: #339933;">;</span>
	<span style="color: #000088;">$c</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">addevent</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$curforecast</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">year</span><span style="color: #339933;">,</span><span style="color: #000088;">$curforecast</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">month</span><span style="color: #339933;">,</span><span style="color: #000088;">$curforecast</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">day</span><span style="color: #339933;">,</span><span style="color: #cc66cc;">7</span><span style="color: #339933;">,</span><span style="color: #cc66cc;">0</span><span style="color: #339933;">,</span><span style="color: #000088;">$curforecast</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">year</span><span style="color: #339933;">,</span><span style="color: #000088;">$curforecast</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">month</span><span style="color: #339933;">,</span><span style="color: #000088;">$curforecast</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">day</span><span style="color: #339933;">,</span><span style="color: #cc66cc;">7</span><span style="color: #339933;">,</span><span style="color: #cc66cc;">30</span><span style="color: #339933;">,</span><span style="color: #000088;">$curforecast</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">summary</span><span style="color: #339933;">,</span><span style="color: #000088;">$weather_digest</span><span style="color: #339933;">,</span><span style="color: #000088;">$weather_digest</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #000088;">$c</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">returncal</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">?&gt;</span></pre></div></div>

<p><strong>SVN Version</strong></p>
<p>If you have Subversion, you can check out the project from http://svn.davidcraddock.net/weather-services/. There are a couple of extra files in that directory for my automated freezing weather alerts, but you can safely ignore those.</p>
<p><strong>Installation</strong></p>
<p>Add an entry to your crontab so that the script runs once per day. To run it at midnight, add the following:</p>
<pre>0 0 * * * &lt;path to PHP interpreter&gt; &lt;path to scrape-weather.php&gt;</pre>
<p>For example, in my case:</p>
<pre>0 0 * * * /usr/local/bin/php /home/david_craddock/work.davidcraddock.net/weather/scrape-weather.php </pre>
<p>You will then need to edit the $store_path and $feed_url variables in scrape-weather.php. $store_path should be a file path in which the web server can create and edit files, and $feed_url should be the RSS feed for your local area, copied from the <a href="http://news.bbc.co.uk/weather/">http://news.bbc.co.uk/weather/</a> site. Don&#8217;t use mine, because your area is likely different. After that, you&#8217;re set to go.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.davidcraddock.net/2011/02/24/a-3-day-weather-forecast-calendar-service/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Find large files by using the OSX commandline</title>
		<link>http://www.davidcraddock.net/2011/02/22/find-large-files-by-using-the-osx-commandline/</link>
		<comments>http://www.davidcraddock.net/2011/02/22/find-large-files-by-using-the-osx-commandline/#comments</comments>
		<pubDate>Tue, 22 Feb 2011 00:16:12 +0000</pubDate>
		<dc:creator>David Craddock</dc:creator>
				<category><![CDATA[Solutions to a Specific Problem]]></category>
		<category><![CDATA[command line]]></category>
		<category><![CDATA[finding large files]]></category>
		<category><![CDATA[osx]]></category>

		<guid isPermaLink="false">http://www.davidcraddock.net/?p=852</guid>
		<description><![CDATA[To quickly find large files to delete if you have filled your startup disk, enter this command in the OS X terminal: sudo find / -size +500000 -print This will find and print out file paths to files over roughly 256MB. You can then go through them and delete them individually by typing rm &#8220;&#60;file path&#62;&#8221;, although [...]]]></description>
			<content:encoded><![CDATA[<p>To quickly find large files to delete if you have filled your startup disk, enter this command in the OS X terminal:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">sudo</span> <span style="color: #c20cb9; font-weight: bold;">find</span> <span style="color: #000000; font-weight: bold;">/</span> <span style="color: #660033;">-size</span> +<span style="color: #000000;">500000</span> <span style="color: #660033;">-print</span></pre></div></div>

<p>Without a unit suffix, <strong>-size</strong> counts 512-byte blocks, so this will find and print out the paths of files larger than 500,000 blocks, which is roughly 256MB (use +1000000 for files over about 500MB). You can then go through them and delete them individually by typing <strong>rm &#8220;&lt;file path&gt;&#8221;</strong>, although there is no undelete, so make sure you won&#8217;t miss them.</p>
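<p>Because the bare number given to <strong>-size</strong> is a count of 512-byte blocks, it can help to work out the block count from a size in megabytes. The following is a minimal sketch rather than a definitive recipe: the scratch directory, the file names and the 1MB threshold are all made up for illustration, so that it can be tried without touching the real disk.</p>
<pre>
# Hypothetical scratch directory standing in for the startup disk
scandir=$(mktemp -d)

# Create one 2MB file and one empty file to search through
# (dd reports its transfer statistics on stderr)
dd if=/dev/zero of="$scandir/big.bin" bs=1024 count=2048
touch "$scandir/small.txt"

# Threshold in megabytes (an assumed value, adjust to taste)
threshold_mb=1

# Convert megabytes to the 512-byte blocks that -size expects
blocks=$((threshold_mb * 1024 * 1024 / 512))

# Only regular files larger than the threshold are printed
large=$(find "$scandir" -type f -size +"$blocks")
echo "$large"
</pre>
<p>To scan the whole startup disk instead, point find at / and run it with sudo, as in the command above.</p>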
]]></content:encoded>
			<wfw:commentRss>http://www.davidcraddock.net/2011/02/22/find-large-files-by-using-the-osx-commandline/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Finding files in Linux modified between two dates</title>
		<link>http://www.davidcraddock.net/2011/02/16/finding-files-in-linux-modified-between-two-dates/</link>
		<comments>http://www.davidcraddock.net/2011/02/16/finding-files-in-linux-modified-between-two-dates/#comments</comments>
		<pubDate>Wed, 16 Feb 2011 12:33:44 +0000</pubDate>
		<dc:creator>David Craddock</dc:creator>
				<category><![CDATA[Solutions to a Specific Problem]]></category>

		<guid isPermaLink="false">http://www.davidcraddock.net/?p=848</guid>
		<description><![CDATA[Use the &#8216;touch&#8217; command to create two blank files with last-modified dates that you specify: one dated at the start of the range you want, and the second dated at the end of the range. Then reference those two [...]]]></description>
			<content:encoded><![CDATA[<p>Use the &#8216;touch&#8217; command to create two blank files with last-modified dates that you specify: one dated at the start of the range you want, and the second dated at the end of the range. Then reference those two files in your find command:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">touch</span> <span style="color: #000000; font-weight: bold;">/</span>tmp<span style="color: #000000; font-weight: bold;">/</span>temp <span style="color: #660033;">-t</span> <span style="color: #000000;">200604141130</span>
<span style="color: #c20cb9; font-weight: bold;">touch</span> <span style="color: #000000; font-weight: bold;">/</span>tmp<span style="color: #000000; font-weight: bold;">/</span>ntemp <span style="color: #660033;">-t</span> <span style="color: #000000;">200604261630</span>
<span style="color: #c20cb9; font-weight: bold;">find</span> <span style="color: #000000; font-weight: bold;">/</span>data<span style="color: #000000; font-weight: bold;">/</span> <span style="color: #660033;">-newer</span> <span style="color: #000000; font-weight: bold;">/</span>tmp<span style="color: #000000; font-weight: bold;">/</span>temp <span style="color: #660033;">-and</span> <span style="color: #000000; font-weight: bold;">!</span> <span style="color: #660033;">-newer</span> <span style="color: #000000; font-weight: bold;">/</span>tmp<span style="color: #000000; font-weight: bold;">/</span>ntemp</pre></div></div>
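
<p>To try the technique without touching real data, here is a small self-contained sketch. It uses find&#8217;s <strong>-newer</strong> test, which compares modification times. The scratch directory and the three sample file names are hypothetical, chosen only to show one file before the range, one inside it and one after it.</p>
<pre>
# Hypothetical scratch directory so the sketch does not touch real data
workdir=$(mktemp -d)
mkdir "$workdir/data"

# Reference files marking the start and the end of the range
touch -t 200604141130 "$workdir/start"
touch -t 200604261630 "$workdir/end"

# Three sample files: modified before, inside and after the range
touch -t 200604010900 "$workdir/data/before.txt"
touch -t 200604201200 "$workdir/data/inside.txt"
touch -t 200605011200 "$workdir/data/after.txt"

# Keep files modified after the start marker but not after the end marker
matches=$(find "$workdir/data" -type f -newer "$workdir/start" ! -newer "$workdir/end")
echo "$matches"
</pre>
<p>Only the path to inside.txt should be printed. The <strong>-type f</strong> test is there so that the data directory itself, which was modified just now, is left out of the results.</p>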

]]></content:encoded>
			<wfw:commentRss>http://www.davidcraddock.net/2011/02/16/finding-files-in-linux-modified-between-two-dates/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Writing simple email alerts in PHP with MagpieRSS</title>
		<link>http://www.davidcraddock.net/2011/02/12/writing-simple-email-alerts-in-php-with-magpierss/</link>
		<comments>http://www.davidcraddock.net/2011/02/12/writing-simple-email-alerts-in-php-with-magpierss/#comments</comments>
		<pubDate>Sat, 12 Feb 2011 20:01:13 +0000</pubDate>
		<dc:creator>David Craddock</dc:creator>
				<category><![CDATA[Solutions to a Specific Problem]]></category>

		<guid isPermaLink="false">http://www.davidcraddock.net/?p=842</guid>
		<description><![CDATA[I wrote an email alerter that sends me an email whenever the upcoming temperature may dip below freezing. It uses the Magpie RSS reader to pull down a 3-day weather forecast that is provided for my area in RSS form by the BBC weather site. It then parses this forecast and determines if either [...]]]></description>
			<content:encoded><![CDATA[<p>I wrote an email alerter that sends me an email whenever the upcoming temperature may dip below freezing. It uses the <a href="http://magpierss.sourceforge.net/">Magpie RSS reader</a> to pull down a 3-day weather forecast that is provided for my area in RSS form by the BBC weather site. It then parses this forecast and determines if either today&#8217;s or tomorrow&#8217;s weather may dip below freezing. If so, it emails me a warning.</p>
<p>I scheduled this script to run every day by adding it as a <a href="http://en.wikipedia.org/wiki/Cron">daily cron job</a> on my web host. You can set this up on any web host that supports cron jobs.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span>
<span style="color: #b1b100;">require_once</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'magpierss/rss_fetch.inc'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #000088;">$url</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;http://newsrss.bbc.co.uk/weather/forecast/2376/Next3DaysRSS.xml&quot;</span><span style="color: #339933;">;</span>
        <span style="color: #000088;">$rss</span> <span style="color: #339933;">=</span> fetch_rss<span style="color: #009900;">&#40;</span> <span style="color: #000088;">$url</span> <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #000088;">$message</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;&quot;</span><span style="color: #339933;">;</span>
        <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">sizeof</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$rss</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">items</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">!=</span> <span style="color: #cc66cc;">3</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
                <span style="color: #000088;">$message</span> <span style="color: #339933;">.=</span> <span style="color: #0000ff;">'Error: problem parsing BBC weather feed'</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
        <span style="color: #000088;">$i</span><span style="color: #339933;">=</span><span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span>
        <span style="color: #b1b100;">foreach</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$rss</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">items</span> <span style="color: #b1b100;">as</span> <span style="color: #000088;">$item</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
                <span style="color: #000088;">$href</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$item</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'link'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
                <span style="color: #000088;">$title</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$item</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'title'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
                <span style="color: #990000;">preg_match</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/Min Temp:.+?-*\d*/'</span><span style="color: #339933;">,</span><span style="color: #000088;">$title</span><span style="color: #339933;">,</span><span style="color: #000088;">$mintemp</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                <span style="color: #990000;">preg_match</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'/Max Temp:.+?-*\d*/'</span><span style="color: #339933;">,</span><span style="color: #000088;">$title</span><span style="color: #339933;">,</span><span style="color: #000088;">$maxtemp</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                <span style="color: #000088;">$mintemp</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Min Temp: '</span><span style="color: #339933;">,</span><span style="color: #0000ff;">''</span><span style="color: #339933;">,</span><span style="color: #000088;">$mintemp</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                <span style="color: #000088;">$maxtemp</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #990000;">str_replace</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Max Temp: '</span><span style="color: #339933;">,</span><span style="color: #0000ff;">''</span><span style="color: #339933;">,</span><span style="color: #000088;">$maxtemp</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                <span style="color: #000088;">$mins</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$i</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>int<span style="color: #009900;">&#41;</span><span style="color: #000088;">$mintemp</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
                <span style="color: #000088;">$maxs</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$i</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>int<span style="color: #009900;">&#41;</span><span style="color: #000088;">$maxtemp</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
                <span style="color: #000088;">$i</span><span style="color: #339933;">++;</span>
        <span style="color: #009900;">&#125;</span>
&nbsp;
        <span style="color: #666666; font-style: italic;">// freezing warnings</span>
&nbsp;
        <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$mins</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">&lt;</span> <span style="color: #cc66cc;">0</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
                <span style="color: #000088;">$message</span> <span style="color: #339933;">.=</span> <span style="color: #0000ff;">&quot;Today's temperature in W3 may go below freezing, anything down to &quot;</span><span style="color: #339933;">.</span><span style="color: #000088;">$mins</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">0</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">.</span><span style="color: #0000ff;">&quot;\n&quot;</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
        <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$mins</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">&lt;</span> <span style="color: #cc66cc;">0</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
                <span style="color: #000088;">$message</span> <span style="color: #339933;">.=</span> <span style="color: #0000ff;">&quot;Tomorrow's temperature in W3 may go below freezing, anything down to &quot;</span><span style="color: #339933;">.</span><span style="color: #000088;">$mins</span><span style="color: #009900;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">.</span><span style="color: #0000ff;">&quot;\n&quot;</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
&nbsp;
        <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$message</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
                <span style="color: #000088;">$to</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;contact@davidcraddock.net&quot;</span><span style="color: #339933;">;</span>
                <span style="color: #000088;">$subject</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;Freezing weather alert for &quot;</span> <span style="color: #339933;">.</span> <span style="color: #990000;">date</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'l jS \of F'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                <span style="color: #990000;">mail</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$to</span><span style="color: #339933;">,</span><span style="color: #000088;">$subject</span><span style="color: #339933;">,</span><span style="color: #000088;">$message</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">?&gt;</span></pre></div></div>

<p>You can right click on this link and &#8216;save as&#8217; to <a href="http://svn.davidcraddock.net/weather-services/freezing.php">download the script</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.davidcraddock.net/2011/02/12/writing-simple-email-alerts-in-php-with-magpierss/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Reverting back to a previous version in CVS &#8211; the magic &#8220;undo&#8221; feature</title>
		<link>http://www.davidcraddock.net/2011/01/28/reverting-back-to-a-previous-version-in-cvs-the-magic-undo-feature/</link>
		<comments>http://www.davidcraddock.net/2011/01/28/reverting-back-to-a-previous-version-in-cvs-the-magic-undo-feature/#comments</comments>
		<pubDate>Fri, 28 Jan 2011 16:27:43 +0000</pubDate>
		<dc:creator>David Craddock</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.davidcraddock.net/?p=836</guid>
		<description><![CDATA[If you&#8217;ve committed some code to CVS and made a mistake in that commit, you will want to know how to revert to a previously saved version. Here is the command for the command-line CVS client: $ cvs update -D '1 week ago' Run this command in the main directory of your [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;ve committed some code to CVS and made a mistake in that commit, you will want to know how to revert to a previously saved version. Here is the command for the command-line CVS client:</p>
<pre>
$ cvs update -D '1 week ago'
</pre>
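<p>Run this command in the main directory of your checked out working copy. This will revert your working copy to the version of the code that was checked in &#8216;1 week ago&#8217; from the present date. You can also use date expressions such as &#8220;1 day ago&#8221; and &#8220;5 days ago&#8221;.</p>
<p>Then commit the changes with a log message:</p>
<pre>
$ cvs commit -m "Oops! Made a mistake, had to revert back to the 21/1/2011 version"
</pre>
<p>Be aware that <strong>-D</strong> sets a sticky date on the working copy, and CVS will not commit files that carry a sticky date. If the commit does not create a new revision, first clear the sticky date with <strong>cvs update -A</strong>, then fetch the old contents of each affected file with <strong>cvs update -p -D '1 week ago' &lt;file&gt; &gt; &lt;file&gt;</strong> and commit that file.</p>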
]]></content:encoded>
			<wfw:commentRss>http://www.davidcraddock.net/2011/01/28/reverting-back-to-a-previous-version-in-cvs-the-magic-undo-feature/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

