Tutorials

Detailed tutorials on how to accomplish a specific task.

 

Scraping Gumtree Property Adverts with Python and BeautifulSoup

Sunday, May 1st, 2011

I am moving to Manchester soon, so I thought I’d get an idea of the housing market there by scraping all the Manchester Gumtree property adverts into a MySQL database. Once the adverts were in the database, I could run simple SQL queries via phpMyAdmin to do things like find the average monthly price for a 2 bedroom flat in an area, or spot bargains by looking for prices that sit well below the mean, measured in standard deviations.

I really like the Python library BeautifulSoup for writing scrapers. There is also a similar library for Java called jsoup. BeautifulSoup does a really good job of tolerating markup mistakes in the input data, and it transforms a page into a tree structure that is easy to work with.

I chose the following layout for the program:

advert.py – Stores all information about each property advert, with a ‘save’ method that inserts the data into the MySQL database
listing.py – Stores all the information on each listing page, which is broken down into links for specific adverts, and also the link to the next listing page in the sequence (i.e. the ‘next page’ link)
scrapeAdvert.py – When given an advert URL, this creates and populates an advert object
scrapeListing.py – When given a listing URL, this creates and populates a listing object
scrapeSequence.py – This walks through a series of listings, calling scrapeListing and scrapeAdvert for all of them, and finishes when there are no more listings in the sequence to scrape

Here is the MySQL table I created for this project (which you will have to set up if you want to run the scraper):

--
-- Database: `manchester`
--
 
-- --------------------------------------------------------
 
--
-- Table structure for table `adverts`
--
 
CREATE TABLE IF NOT EXISTS `adverts` (
  `url` VARCHAR(255) NOT NULL,
  `title` text NOT NULL,
  `pricePW` INT(10) UNSIGNED NOT NULL,
  `pricePCM` INT(11) NOT NULL,
  `location` text NOT NULL,
  `dateAvailable` DATE NOT NULL,
  `propertyType` text NOT NULL,
  `bedroomNumber` INT(11) NOT NULL,
  `description` text NOT NULL,
  PRIMARY KEY (`url`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

pricePCM is the price per calendar month, and pricePW is the price per week. Usually each advert will have one or the other specified.

advert.py:

import MySQLdb
 
class advert:
 
        url = ""
        title = ""
        pricePW = 0
        pricePCM = 0
        location = ""
        dateAvailable = ""
        propertyType = ""
        bedroomNumber = 0
        description = ""
 
        def save(self):
                # you will need to change the following to match your mysql credentials:
                db=MySQLdb.connect("localhost","root","secret","manchester")
                c=db.cursor()
 
                self.description = unicode(self.description, errors='replace')
                self.description = self.description.encode('ascii','ignore')
                # TODO: might need to convert the other strings in the advert if there are any unicode conversion errors
 
                # use placeholders so that MySQLdb escapes the values for us
                sql = "INSERT INTO adverts (url,title,pricePCM,pricePW,location,dateAvailable,propertyType,bedroomNumber,description) VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s)"
 
                c.execute(sql, (self.url, self.title, self.pricePCM, self.pricePW, self.location, self.dateAvailable, self.propertyType, self.bedroomNumber, self.description))
                db.commit()
                db.close()

In advert.py we convert the unicode output that BeautifulSoup gives us into plain ASCII so that we can put it in the MySQL database without any problems. I could have used Unicode in the database as well, but the chances of really needing Unicode to represent Gumtree ads are quite slim. If you intend to use this code then you will also need to enter the MySQL credentials for your database.

listing.py:

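class listing:
 
        def __init__(self):
                # set these per instance: a class-level list would be
                # shared by every listing object
                self.url = ""
                self.adverturls = []
                self.nextLink = ""
 
        def addAdvertURL(self,url):
 
                self.adverturls.append(url)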

scrapeAdvert.py:

from BeautifulSoup import BeautifulSoup          # For processing HTML
import urllib2
from advert import advert
import time
 
class scrapeAdvert:
 
        page = ""
        soup = ""
 
        def scrape(self,advertURL):
 
                # give it a bit of time so gumtree doesn't
                # ban us
                time.sleep(2)
 
                url = advertURL
                # print "-- scraping "+url+" --"
                page = urllib2.urlopen(url)
                self.soup = BeautifulSoup(page)
 
                self.anAd = advert()
 
                self.anAd.url = url
                self.anAd.title = self.extractTitle()
                self.anAd.pricePW = self.extractPricePW()
                self.anAd.pricePCM = self.extractPricePCM()
 
                self.anAd.location = self.extractLocation()
                self.anAd.dateAvailable = self.extractDateAvailable()
                self.anAd.propertyType = self.extractPropertyType()
                self.anAd.bedroomNumber = self.extractBedroomNumber()
                self.anAd.description = self.extractDescription()
 
        def extractTitle(self):
 
                location = self.soup.find('h1')
                string = location.contents[0]
                stripped = ' '.join(string.split())
                stripped = stripped.replace("'",'"')
                # print '|' + stripped + '|'
                return stripped
 
 
        def extractPricePCM(self):
 
                location = self.soup.find('span',attrs={"class" : "price"})
                try:
                        string = location.contents[0]
                        string.index('pcm')
                except AttributeError: # for ads with no prices set
                        return 0
                except ValueError: # for ads with pw specified
                        return 0
 
                stripped = string.replace('£','')
                stripped = stripped.replace('pcm','')
                stripped = stripped.replace(',','')
                stripped = stripped.replace("'",'"')
                stripped = ' '.join(stripped.split())
                # print '|' + stripped + '|'
                return int(stripped)
 
        def extractPricePW(self):
 
                location = self.soup.find('span',attrs={"class" : "price"})
                try:
                        string = location.contents[0]
                        string.index('pw')
                except AttributeError: # for ads with no prices set
                        return 0
                except ValueError: # for ads with pcm specified
                        return 0
                stripped = string.replace('£','')
                stripped = stripped.replace('pw','')
                stripped = stripped.replace(',','')
                stripped = stripped.replace("'",'"')
                stripped = ' '.join(stripped.split())
                # print '|' + stripped + '|'
                return int(stripped)
 
        def extractLocation(self):
 
                location = self.soup.find('span',attrs={"class" : "location"})
                string = location.contents[0]
                stripped = ' '.join(string.split())
                stripped = stripped.replace("'",'"')
                # print '|' + stripped + '|'
                return stripped
 
        def extractDateAvailable(self):
 
                # adverts only give the day and month, so assume the current year
                current_year = time.strftime('%Y')
 
                ul = self.soup.find('ul',attrs={"id" : "ad-details"})
                firstP = ul.findAll('p')[0]
                string = firstP.contents[0]
                stripped = ' '.join(string.split())
                date_to_convert = stripped + '/'+current_year
                try:
                        date_object = time.strptime(date_to_convert, "%d/%m/%Y")
                except ValueError: # for adverts with no date available
                        return ""
 
                full_date = time.strftime('%Y-%m-%d %H:%M:%S', date_object)
                # print '|' + full_date + '|'
                return full_date
 
        def extractPropertyType(self):
 
                ul = self.soup.find('ul',attrs={"id" : "ad-details"})
                try:
                        secondP = ul.findAll('p')[1]
                except IndexError: # for properties with no type
                        return ""
                string = secondP.contents[0]
                stripped = ' '.join(string.split())
                stripped = stripped.replace("'",'"')
                # print '|' + stripped + '|'
                return stripped
 
        def extractBedroomNumber(self):
 
                ul = self.soup.find('ul',attrs={"id" : "ad-details"})
                try:
                        thirdP = ul.findAll('p')[2]
                except IndexError: # for properties with no bedroom number
                        return 0
                string = thirdP.contents[0]
                stripped = ' '.join(string.split())
                stripped = stripped.replace("'",'"')
                # print '|' + stripped + '|'
                return stripped
 
 
        def extractDescription(self):
 
                div = self.soup.find('div',attrs={"id" : "description"})
                description = div.find('p')
                contents = description.renderContents()
                contents = contents.replace("'",'"')
                # print '|' + contents + '|'
                return contents

In scrapeAdvert.py there are a lot of string manipulation statements to pull out any unwanted characters, such as the ‘pw’ characters (short for per week) found in the price string, which we need to remove in order to store the property price per week as an integer.
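As a small illustration of that clean-up, here is a minimal sketch written for Python 3 rather than the Python 2 of the scraper itself. The function name parse_price and the sample strings are assumptions made for the example and are not part of the scraper's source.

```python
def parse_price(text, unit):
    """Return the price in `text` as an int if it is quoted in `unit`, else 0.

    `unit` is 'pw' (per week) or 'pcm' (per calendar month).
    """
    if text is None or unit not in text:
        return 0  # no price set, or the price is quoted in the other unit
    cleaned = text.replace('£', '').replace(unit, '').replace(',', '')
    cleaned = ''.join(cleaned.split())  # drop any remaining whitespace
    return int(cleaned) if cleaned.isdigit() else 0


print(parse_price('£1,250pcm', 'pcm'))  # a monthly price
print(parse_price('£150pw', 'pcm'))     # a weekly price, so 0 for pcm
```

Returning 0 for a missing or mismatched unit mirrors what extractPricePCM and extractPricePW do, so an advert ends up with one of the two price columns filled in and the other left at zero.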

Using BeautifulSoup to pull out elements is quite easy, for example:

ul = self.soup.find('ul',attrs={"id" : "ad-details"})

That returns the first <ul id="ad-details"> element in the page as a tag object, and calling findAll('p') on that object then returns the <p> elements inside the list. More detail can be found in the Beautiful Soup documentation, which is very good.

scrapeListing.py:

from BeautifulSoup import BeautifulSoup          # For processing HTML
import urllib2
from listing import listing
import time
 
class scrapeListing:
 
        soup = ""
        url = ""
        aListing = ""
 
        def scrape(self,url):
                # give it a bit of time so gumtree doesn't
                # ban us
                time.sleep(3)
 
                print "scraping url = "+str(url)
 
                page = urllib2.urlopen(url)
                self.soup = BeautifulSoup(page)
 
                self.aListing = listing()
                self.aListing.url = url
                self.aListing.adverturls = self.extractAdvertURLs()
                self.aListing.nextLink = self.extractNextLink()
 
        def extractAdvertURLs(self):
 
                toReturn = []
                h3s = self.soup.findAll("h3")
                for h3 in h3s:
                        links = h3.findAll('a',{"class":"summary"})
                        for link in links:
                                print "|"+link['href']+"|"
                                toReturn.append(link['href'])
 
                return toReturn
 
        def extractNextLink(self):
 
                links = self.soup.findAll("a",{"class":"next"})
                try:
                        print ">"+links[0]['href']+">"
                except IndexError: # if there is no 'next' link found..
                        return ""
                return links[0]['href']

The extractNextLink method extracts the pagination ‘next’ link, which points to the next listing page in the results. We use it to step through the ‘sequence’ of listing pages, and when there is no such link it returns an empty string.
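The stepping idea can be sketched on its own, without any network access. This is an illustrative Python 3 sketch only: walk_listings, fetch_listing and FAKE_PAGES are hypothetical names, and the dictionary stands in for real listing pages.

```python
def walk_listings(start_url, fetch_listing):
    """Yield every advert URL reachable by following 'next' links."""
    url = start_url
    seen = set()
    while url and url not in seen:  # stop on a missing link or a loop
        seen.add(url)
        advert_urls, next_link = fetch_listing(url)
        for advert_url in advert_urls:
            yield advert_url
        url = next_link


# stand-in for scraping: each page maps to (advert URLs, next link)
FAKE_PAGES = {
    "page1": (["ad/1", "ad/2"], "page2"),
    "page2": (["ad/3"], ""),
}

print(list(walk_listings("page1", FAKE_PAGES.get)))
```

Keeping a set of visited pages is an extra safeguard that the scraper does not have. It stops the walk if a ‘next’ link ever points back to a page that has already been scraped.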

scrapeSequence.py:

from scrapeListing import scrapeListing
from scrapeAdvert import scrapeAdvert
from listing import listing
from advert import advert
import MySQLdb
import _mysql_exceptions
 
# change this to the gumtree page you want to start scraping from
url = "http://www.gumtree.com/flats-and-houses-for-rent/salford-quays"
 
while url:
        print "scraping URL = "+url
        sl = scrapeListing()
        sl.scrape(url)
        for advertURL in sl.aListing.adverturls:
                sa = scrapeAdvert()
                sa.scrape(advertURL)
                try:
                        sa.anAd.save()
                except _mysql_exceptions.IntegrityError:
                        print "** Advert " + sa.anAd.url + " already saved **"
 
        # follow the 'next page' link; an empty string ends the loop
        url = sl.aListing.nextLink
        if url:
                print "nextLink = "+url
        else:
                print 'all done.'

This is the file you run to kick off the scrape. It wraps the save in a try/except block that catches MySQL’s IntegrityError, which tells us when an advert has already been entered into the database. The error is raised because the URL of the advert is the primary key of the table, and no two records can have the same primary key.
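The same duplicate-detection pattern can be tried without a MySQL server. The sketch below is in Python 3 and swaps in the standard library's sqlite3 module for MySQLdb, so it is an illustration of the idea rather than the scraper's own code. The save_advert function and the cut-down two-column table are assumptions made for the example.

```python
import sqlite3


def save_advert(conn, url, title):
    """Insert one advert; return False if its URL is already stored."""
    try:
        # placeholders let the driver quote the values safely
        conn.execute("INSERT INTO adverts (url, title) VALUES (?, ?)", (url, title))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # primary key clash: this URL was saved on an earlier run
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE adverts (url TEXT PRIMARY KEY, title TEXT NOT NULL)")
print(save_advert(conn, "http://example.com/ad/1", "2 bed flat"))  # first save
print(save_advert(conn, "http://example.com/ad/1", "2 bed flat"))  # duplicate
```

Letting the database enforce uniqueness means the scraper can be rerun over the same listing pages without first checking which adverts it has already seen.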

The URL you set at the top of the file is the starting page for the scrape.

The above code worked well for scraping several hundred Manchester Gumtree ads into a database, from which point I was able to use a combination of phpMyAdmin and OpenOffice Spreadsheet to analyse the data and find out useful statistics about the property market in said area.
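The bargain-spotting idea from the introduction, prices that sit well below the mean, can also be sketched outside SQL. This is a minimal Python 3 sketch: find_bargains, the threshold of one standard deviation and the sample rents are all assumptions chosen for illustration, not figures from the scraped data.

```python
from statistics import mean, stdev


def find_bargains(prices_by_url, threshold=1.0):
    """Return URLs priced more than `threshold` standard deviations below the mean."""
    prices = list(prices_by_url.values())
    if len(prices) < 2:
        return []  # stdev needs at least two data points
    cutoff = mean(prices) - threshold * stdev(prices)
    return sorted(url for url, price in prices_by_url.items() if price < cutoff)


# made-up monthly rents for 2 bedroom flats in one area
sample = {"ad/1": 700, "ad/2": 720, "ad/3": 690, "ad/4": 710, "ad/5": 450}
print(find_bargains(sample))
```

In practice you would compare like with like, for example only 2 bedroom flats in one location, since a single mean across all property types would flag every studio as a bargain.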

Download the scraper source code in a tar.gz archive

Note: Due to the nature of web scraping, if – or more accurately, when – Gumtree changes its user interface, the scraper I have written will need to be tweaked accordingly to find the right data. This is meant to be an informative tutorial, not a finished product.

RESTful Web Services

Wednesday, March 2nd, 2011

REST (Representational State Transfer) is an architectural style for delivering web services. When a web service conforms to REST, it is known as RESTful. The largest system built in this style is the World Wide Web itself, which runs on the Hypertext Transfer Protocol (HTTP) that you use every day to send and receive information from web servers while browsing the internet.

To implement RESTful web services, you should implement four methods: GET, PUT, POST and DELETE. Resources on RESTful web services are typically defined as collections of elements. The REST methods can either act on a whole collection, or a specific element in a collection.

A collection is usually logically defined as a hierarchy on the URL, for example take this fictitious layout:

Collection: www.bbc.co.uk/iplayer/programmes/
Element: www.bbc.co.uk/iplayer/programmes/24
Element: www.bbc.co.uk/iplayer/programmes/25
Element: www.bbc.co.uk/iplayer/programmes/26

The REST methods you use do different things depending on whether you are interacting with a Collection resource or an Element resource. See below:

On a Collection, e.g. www.bbc.co.uk/iplayer/programmes/

GET – Lists the URLs of the collection’s members.
PUT – Replace the entire collection with another collection.
POST – Create a new element in a collection, returning the new element’s URL.
DELETE – Deletes the entire collection.

On an Element, e.g. www.bbc.co.uk/iplayer/programmes/24

GET – Retrieve the addressed element in the appropriate internet media type, e.g. a music file or an image.
PUT – Replace the addressed element of the collection, or if it doesn’t exist, create it in the parent collection.
POST – Treat the addressed element of the collection as a new collection, and add an element into it.
DELETE – Delete the addressed element of the collection.
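To make the collection and element semantics concrete, here is a minimal in-memory sketch in Python 3. It is not a web server: the Collection class and its method names are hypothetical, and it covers only GET and DELETE on both resource types, POST on a collection and PUT on an element.

```python
class Collection:
    """In-memory stand-in for a REST collection such as /iplayer/programmes/."""

    def __init__(self, base_url):
        self.base_url = base_url
        self.elements = {}  # element id -> stored representation
        self.next_id = 1

    def get(self, element_id=None):
        if element_id is None:
            # GET on the collection: list the URLs of its members
            return [self.base_url + str(i) for i in sorted(self.elements)]
        # GET on an element: return its representation
        return self.elements[element_id]

    def post(self, data):
        # POST on the collection: create an element and return its new URL
        element_id = self.next_id
        self.elements[element_id] = data
        self.next_id += 1
        return self.base_url + str(element_id)

    def put(self, element_id, data):
        # PUT on an element: replace it, or create it if it does not exist
        self.elements[element_id] = data
        self.next_id = max(self.next_id, element_id + 1)

    def delete(self, element_id=None):
        if element_id is None:
            self.elements.clear()          # DELETE on the collection
        else:
            del self.elements[element_id]  # DELETE on one element


programmes = Collection("/iplayer/programmes/")
programmes.put(24, "episode A")
print(programmes.post("episode B"))
print(programmes.get())
```

The difference between POST and PUT shows up in who chooses the identifier: with PUT the client names the element (24 here), while with POST the collection picks the next free id and reports the resulting URL back.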

REST is a simple and clear way of implementing the basic operations of data storage, known as CRUD (Create, Read, Update and Delete). See: http://en.wikipedia.org/wiki/Create,_read,_update_and_delete

‘Weather Forecast’ Calendar Service in PHP

Thursday, February 24th, 2011

The BBC provide 3 day weather RSS feeds for most locations in the UK. I thought it would be interesting to create a web service to turn the weather feed into calendar feed format, so I could have a constantly updated forecast of the next 3 days of weather mapped on to my iPhone’s calendar. Here it is on my iPhone:

Picture shows weather forecast on an iPhone calendar screenshot

Overview

The service is separated into five files:

  • ical.php – this contains the class ical which corresponds to a single calendar feed. A method called ‘addevent’ allows you to add new events to the calendar, and a method called ‘returncal’ redirects the resulting calendar file to the browser so people can subscribe to it using their calendar application.
  • forecast.php – this file contains the class forecast, which has properties for everything that we want to record for each day’s forecast, e.g. Wind Speed and Humidity. It also contains the forecast set, which is a collection of forecast objects. The set class is serializable, which means each forecast object can be stored in a text file, including the Wind Speed, Humidity and all the other things we want to record for each day.
  • scrape-weather.php – this file contains code that scrapes the weather feed, populates the forecast set with all the weather information for the next 3 days, and stores the result in a file called forecasts.ser.
  • forecasts.ser – this is all the data for the three day weather forecast, in serialized format. It is automatically deleted and recreated when the scrape-weather.php script is run.
  • reader.php – this file converts the forecasts.ser file into an iCal calendar, and outputs the iCal formatted result to the calendar application that requests the reader.php page.

It uses two external libraries:

  • MagpieRSS 0.72 – this popular library is used for reading the weather RSS feed and converting it into a PHP object that is easier to manipulate by scrape-weather.php.
  • iCalcreator 2.8 – this is used for creating the output iCal format of the calendar in ical.php and outputting it to the browser in reader.php.
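For readers who have not seen the iCal format before, the sketch below builds a tiny calendar feed by hand in Python 3. It does not use iCalcreator and is not how ical.php works internally. The function names, the PRODID string and the example UID are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def escape_text(value):
    """Escape the characters that are special inside iCalendar text values."""
    return (value.replace('\\', '\\\\').replace(';', '\\;')
                 .replace(',', '\\,').replace('\n', '\\n'))


def make_event(uid, start, end, summary, description):
    """Return the lines of one VEVENT; start and end are naive local datetimes."""
    fmt = '%Y%m%dT%H%M%S'
    return [
        'BEGIN:VEVENT',
        'UID:' + uid,
        'DTSTAMP:' + datetime.now(timezone.utc).strftime(fmt) + 'Z',
        'DTSTART:' + start.strftime(fmt),  # no time zone: a "floating" time
        'DTEND:' + end.strftime(fmt),
        'SUMMARY:' + escape_text(summary),
        'DESCRIPTION:' + escape_text(description),
        'END:VEVENT',
    ]


def make_calendar(events):
    """Wrap the events in a VCALENDAR and join the lines with CRLF."""
    lines = ['BEGIN:VCALENDAR', 'VERSION:2.0', 'PRODID:-//example//weather sketch//EN']
    for event in events:
        lines.extend(event)
    lines.append('END:VCALENDAR')
    return '\r\n'.join(lines) + '\r\n'


event = make_event('forecast-1@example.invalid',
                   datetime(2011, 2, 24, 7, 0), datetime(2011, 2, 24, 7, 30),
                   'Sunny Intervals', 'Max: 9 Min: 2')
print(make_calendar([event]))
```

A library such as iCalcreator also takes care of details this sketch ignores, such as folding long lines and attaching time zone information.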

Files

<?php
// ical.php
require_once( 'ical/iCalcreator.class.php' );
 
class ical {
	public $v;
 
	function ical(){
		$this->init();
	}	
 
	function init(){
		$config = array( 'unique_id' => 'weather.davidcraddock.net' );
		  // set Your unique id
		$this->v = new vcalendar( $config );
		  // create a new calendar instance
 
		$this->v->setProperty( 'method', 'PUBLISH' );
		  // required of some calendar software
		$this->v->setProperty( "x-wr-calname", "Calendar Sample" );
		  // required of some calendar software
		$this->v->setProperty( "X-WR-CALDESC", "Calendar Description" );
		  // required of some calendar software
		$this->v->setProperty( "X-WR-TIMEZONE", "Europe/London" );
		  // required of some calendar software
	}
 
	function addevent($start_year,$start_month,$start_day,$start_hour,$start_min,
		  $finish_year,$finish_month,$finish_day,$finish_hour,$finish_min,
		  $summary,$description,$comment		
	){
		$vevent = & $this->v->newComponent( 'vevent' );
		  // create an event calendar component
		$start = array( 'year'=>$start_year, 'month'=>$start_month, 'day'=>$start_day, 'hour'=>$start_hour, 'min'=>$start_min, 'sec'=>0 );
		$vevent->setProperty( 'dtstart', $start );
		$end = array( 'year'=>$finish_year, 'month'=>$finish_month, 'day'=>$finish_day, 'hour'=>$finish_hour, 'min'=>$finish_min, 'sec'=>0 );
		$vevent->setProperty( 'dtend', $end );
		$vevent->setProperty( 'LOCATION', '' );
		  // property name - case independent
		$vevent->setProperty( 'summary', $summary );
		$vevent->setProperty( 'description',$description );
		$vevent->setProperty( 'comment', $comment );
		$vevent->setProperty( 'attendee', 'contact@davidcraddock.net' );
	}
 
	function returncal(){
		// redirect calendar file to browser
		$this->v->returnCalendar();
	}
}
?>
<?php
//forecast.php
 
class forecast {
	public $day;
	public $month;
	public $year;
 
	public $high;
	public $low;
	public $summary;
 
	public $humidity;
	public $windspeed;
}
 
class forecast_set {
	public $forecasts;
 
	function forecast_set(){
		$this->forecasts = new ArrayObject();
	}
}
?>
<?php
// scrape-weather.php
require_once('magpierss/rss_fetch.inc');
require_once('forecast.php');
 
class scrape3day {
	var $set; // forecast set
 
	// configuration variables
 
	// weather forecasts are stored in this file:
	var $store_path = "/home/david_craddock/work.davidcraddock.net/weather/forecasts.ser";
	// weather forecasts are fetched from this BBC feed:
	var $feed_url = "http://newsrss.bbc.co.uk/weather/forecast/2376/Next3DaysRSS.xml";
 
	function scrape3day(){
		$this->scrapecurrent();
		$this->store();
	}
 
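	function store(){
		$store_path = $this->store_path;
		// only delete the old file if it exists, so the first run does not raise a warning
		if(file_exists($store_path)){
			unlink($store_path);
		}
		file_put_contents($store_path, serialize($this->set));
	}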
 
	function scrapecurrent(){
		$url = $this->feed_url;
		$rss = fetch_rss( $url );
		$message = "";
		if(sizeof($rss->items) != 3){
			die("Problem with BBC weather feed.. dying");
		}
		$i=0;
		$set = new forecast_set();
		$curdate = date("Y-m-d");
		echo $curdate;
		foreach ($rss->items as $item) {
			$href = $item['link'];
			$title = $item['title'];
			$description = $item['description'];
			print_r($item);
			$curyear = date('Y',strtotime(date("Y-m-d", strtotime($curdate)) . " +1 day"));
			$curmonth = date('m',strtotime(date("Y-m-d", strtotime($curdate)) . " +1 day"));
			$curday = date('d',strtotime(date("Y-m-d", strtotime($curdate)) . " +1 day"));
			preg_match('/:.+?,/',$title,$summary);
			preg_match('/Min Temp:.+?-*\d*/',$title,$mintemp);
			preg_match('/Max Temp:.+?-*\d*/',$title,$maxtemp);
			preg_match('/Wind Speed:.+?-*\d*/',$description,$windspeed);
			preg_match('/Humidity:.+?-*\d*/',$description,$humidity);
			$summary[0] = str_replace(': ','',$summary[0]);
			$summary[0] = str_replace(',','',$summary[0]);
			$mintemp[0] = str_replace('Min Temp: ','',$mintemp[0]);
			$maxtemp[0] = str_replace('Max Temp: ','',$maxtemp[0]);
			$windspeed[0] = str_replace('Wind Speed: ','',$windspeed[0]);
			$humidity[0] = str_replace('Humidity: ','',$humidity[0]);
			$mins[$i] = (int)$mintemp[0];	
			$maxs[$i] = (int)$maxtemp[0];
			$forecast = new forecast();
			$forecast->low = (int)$mintemp[0];
			$forecast->high = (int)$maxtemp[0];
			$forecast->year = (int)$curyear;
			$forecast->month = (int)$curmonth;
			$forecast->day = (int)$curday;
			$forecast->windspeed = $windspeed[0];
			$forecast->humidity = $humidity[0];
			$forecast->summary = ucwords($summary[0]);
			$set->forecasts->append($forecast);
			$i++;	
			$curdate = date('Y-m-d',strtotime(date("Y-m-d", strtotime($curdate)) . " +1 day"));
		}
		print_r($set);
		$this->set = $set;
 
	}
 
}
$s = new scrape3day();
?>
<?php
// reader.php
require_once('ical.php');
require_once('forecast.php');
 
$c = new ical();
$f = unserialize(file_get_contents('forecasts.ser'));
for($i=0;$i<3;$i++){
	$curforecast = $f->forecasts[$i];
	$weather_digest = "Max: ".$curforecast->high." Min: ".$curforecast->low." Humidity: ".$curforecast->humidity."% Wind Speed: ".$curforecast->windspeed."mph.";
	$c->addevent($curforecast->year,$curforecast->month,$curforecast->day,7,0,$curforecast->year,$curforecast->month,$curforecast->day,7,30,$curforecast->summary,$weather_digest,$weather_digest);
}
$c->returncal();
?>
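The title parsing that scrape-weather.php does with preg_match can be sketched in Python 3 as follows. The sample title is hypothetical: it is shaped the way the regular expressions in the script expect, and it is not copied from the BBC feed. The parse_title function is likewise an assumption made for the example.

```python
import re


def parse_title(title):
    """Pull the summary and min/max temperatures out of a feed item title."""
    summary = re.search(r':\s*(.+?),', title)
    low = re.search(r'Min Temp:\s*(-?\d+)', title)
    high = re.search(r'Max Temp:\s*(-?\d+)', title)
    return {
        # str.title() is used here as a rough equivalent of PHP's ucwords()
        'summary': summary.group(1).title() if summary else '',
        'low': int(low.group(1)) if low else None,
        'high': int(high.group(1)) if high else None,
    }


# hypothetical title in the assumed "Day: summary, Max Temp: .., Min Temp: .." shape
sample = "Thursday: sunny intervals, Max Temp: 9°C (48°F), Min Temp: -2°C (28°F)"
print(parse_title(sample))
```

Using capture groups means the labels never have to be stripped out afterwards with str_replace, and a missing field comes back as None instead of causing an undefined index notice.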

SVN Version

If you have Subversion, you can check out the project from: http://svn.davidcraddock.net/weather-services/. There are a couple of extra files in that directory for my automated freezing weather alerts, but you can safely ignore those.

Installation

You will have to add an entry to your crontab so that the scraper runs once per day. For example, to run the script at midnight, add the following:

0 0 * * * <path to PHP interpreter> <path to scrape-weather.php>

For example, in my case:

0 0 * * * /usr/local/bin/php /home/david_craddock/work.davidcraddock.net/weather/scrape-weather.php 

You will also need to edit the $store_path and $feed_url variables in scrape-weather.php. $store_path should refer to a file path that the web server can create and edit files in, and $feed_url should refer to the RSS feed for your local area, copied from the http://news.bbc.co.uk/weather/ site. Don’t use mine, because your area is likely to be different. After that, you’re set to go.

Restoring Ubuntu 10.04’s Bootloader after a Windows 7 Install

Tuesday, July 13th, 2010

I installed Windows 7 after I had installed Ubuntu 10.04. Windows 7 overwrote the Linux bootloader “grub” on my master boot record, so I had to restore it.

I used the Ubuntu 10.04 LiveCD to start up a live version of Ubuntu. While under the LiveCD, I restored the Grub bootloader by chrooting into my old install, using the Linux command line. This is a fairly complex thing to do, so I recommend you use this approach only if you are confident with the Linux command line:

(as root under Ubuntu's LiveCD)

# prepare chroot directory

mkdir /chroot
d=/chroot

# mount my linux partition

mount /dev/sda1 $d   # my linux partition was installed on my first SATA hard disk, on the first partition (hence sdA1).

# mount the system filesystems inside the new chroot directory

mount -o bind /dev $d/dev
mount -o bind /sys $d/sys
mount -o bind /dev/shm $d/dev/shm
mount -o bind /proc $d/proc

# accomplish the chroot

chroot $d

# proceed to update the grub config file to include the option to boot into my new windows 7 install

update-grub

# install grub with the new configuration options from the config file, to the master boot record on my first hard disk

grub-install /dev/sda

# close down the liveCD instance of linux, and boot from the newly restored grub bootloader

reboot

Ripping Movies onto the iPhone

Monday, May 17th, 2010

I’m currently watching Persepolis, the 2008 animated film about a tomboy anarchist growing up in Iran. I’m watching this on my new iPhone 3GS, and the picture and audio quality is very good.

Here’s what I used to convert my newly bought Persepolis DVD, for watching on the iPhone.

1x Macbook (but you can use any Intel Mac)
1x iTunes
1x RipIt – Commercial Mac DVD Ripper (rips up to 10 DVDs on the free trial, $20 after)
1x Handbrake 32 – Freely available transcoder
1x VLC 32 – Freely available media player
1x DVD

* RipIt – rips the video and audio from the DVD onto your computer.
* Handbrake 32 – ‘transcodes’ the ripped video and audio, meaning it converts them into an iPhone-compatible video file.
* VLC 32 – is used by Handbrake 32 to get past any problems with converting the media.

Go to the following sites to fetch the software:

1. RipIt – http://thelittleappfactory.com/ripit/
2. Handbrake 32 – http://handbrake.fr/downloads.php (get the 32 bit version)
3. VLC 32 – http://www.videolan.org/vlc/download-macosx.html (be sure to get the 32 bit version)

The 64 bit version of VLC for the Mac is currently difficult to get hold of, so although the 64 bit versions are faster, you’re probably better off with the 32 bit versions of both Handbrake and VLC for now.

The Process

1) Rip the DVD.

Start RipIt. It will ask for a DVD, so insert the DVD and set the save location to the desktop. The ripping process takes about 40 minutes on my Macbook. You can check the progress by looking at the icon in the dock, which is updated with the percentage completed. You can do other things on your Mac while it’s ripping, even though the DVD drive will be occupied. Wait until it has finished before continuing.

2) Transcode (convert) the ripped video file for use on the iPhone.

Start Handbrake. It comes with a set of transcoding settings called presets, which tell Handbrake what type of media player you want the converted video to work on. In the right section of the Handbrake window, select the iPhone preset. Then go to the File menu, select ‘Open’, and choose the video file that RipIt saved onto your desktop. Next, select the destination for the converted video file, and press the green Start button in the Handbrake window. You can now minimise Handbrake and do other things. The transcoding time depends on the film, but it takes about an hour on my Macbook. You can check on progress by restoring the Handbrake window and looking at the progress bar.

3) Move the converted video file onto your iPhone.

Once that’s done, you will have another media file on your desktop. This is the end result: a video file that will play on your iPhone. Simply connect your iPhone to your Mac, start up iTunes, and drag that file from your desktop onto the iPhone icon in your iTunes window. It will take a couple of minutes to transfer, after which you can eject the iPhone as normal.

Now you can watch this new movie on your iPhone by going to the ‘Videos’ tab of your iPod app.