July 6th, 2008 by
David Craddock
For some reason, under ubuntu-server, my default MySQL 5 character encoding was latin1. This caused no end of problems when grabbing data from the web, which was not necessarily in the latin1 character set.
If you are ever in this situation, I suggest you handle everything as UTF-8. That means setting the following lines in my.cnf:
    [mysqld]
    ..
    default-character-set=utf8
    skip-character-set-client-handshake
If you already have tables in your database that you have created, and they have defaulted to the latin1 charset, you’ll be able to tell by looking at the mysqldump SQL:
    DROP TABLE IF EXISTS `ARTISTS`;
    CREATE TABLE `ARTISTS` (
      .. some col declarations..
    ) ENGINE=MyISAM AUTO_INCREMENT=4519 DEFAULT CHARSET=latin1;
Here the ARTISTS table has been given a default charset of latin1 by MySQL. This is bad. So what I recommend is:
1. Dump the full database structure + data to a file using mysqldump
2. Replace ‘latin1’ with ‘utf8’ throughout that file using your favourite text editor
3. Import the resultant file into mysql using the mysql -uroot -p -Dyourdb < dump.sql method
Then everything will be in utf8, and your character encoding issues will be solved.
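If you would rather script step 2 than do it by hand, here is a minimal Python 3 sketch. It is an assumption on my part rather than what I actually ran: instead of replacing every occurrence of latin1, it only rewrites the charset declarations that mysqldump emits (CHARSET=latin1, CHARACTER SET latin1, SET NAMES latin1), so row data that happens to contain the word is left alone. The function names and file names are hypothetical.

```python
import re

# Charset declarations as they appear in mysqldump output, e.g.
# "DEFAULT CHARSET=latin1" or "/*!40101 SET NAMES latin1 */".
# The trailing \b stops the pattern matching names like latin1_swedish_ci.
_CHARSET_DECL = re.compile(rb"(CHARSET=|CHARACTER SET |SET NAMES )latin1\b")


def latin1_decls_to_utf8(sql_bytes):
    """Rewrite latin1 charset declarations to utf8, leaving other bytes untouched."""
    return _CHARSET_DECL.sub(rb"\g<1>utf8", sql_bytes)


def convert_dump(src_path, dst_path):
    """Read a dump file as raw bytes, rewrite the declarations, write the result."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(latin1_decls_to_utf8(data))


if __name__ == "__main__":
    line = b") ENGINE=MyISAM AUTO_INCREMENT=4519 DEFAULT CHARSET=latin1;"
    print(latin1_decls_to_utf8(line).decode("ascii"))
```

Working on bytes means the script never has to guess the encoding of the dump. Something like convert_dump('dump.sql', 'dump-utf8.sql') would sit between steps 1 and 3.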
Posted in Uncategorized |
No Comments »
June 18th, 2008 by
David Craddock
I’ve been hacking away at BrightonSound.com and looking for a way of automatically sourcing biographical information about artists, so that visitors are presented with more information on each event.
The Songbird media player plugin ‘mashTape’ draws upon a number of web services to grab the artist bio, event listings, YouTube videos and Flickr pictures for the currently playing artist. I was reading through the mashTape code, and then found this posting by its developer, which helpfully provided the exact method I needed.
I then hacked up two versions of the code, a PHP version using simpleXML:
    <?php
    // Find a band's English Wikipedia page via Yahoo search,
    // then return the English abstract for that page from DBpedia.
    function grabwiki($band){
        $band = urlencode($band);
        $yahoourl = "http://api.search.yahoo.com/WebSearchService/V1/webSearch?".
            "appid=YahooDemo&query=%22$band%22%20music&site=wikipedia.org";
        $x = file_get_contents($yahoourl);
        $s = new SimpleXMLElement($x);
        // http://en.wikipedia.org/wiki/Key splits into
        // ['http:', '', 'en.wikipedia.org', 'wiki', 'Key']
        $ar = explode('/', (string)$s->Result->Url);
        if(isset($ar[4]) && $ar[2] == 'en.wikipedia.org'){
            $wikikey = $ar[4]; // more than likely to be the wikipedia page
        }else{
            return ""; // nothing on wikipedia
        }
        $url = "http://dbpedia.org/data/$wikikey";
        $x = file_get_contents($url);
        $s = new SimpleXMLElement($x);
        $b = $s->xpath("//p:abstract[@xml:lang='en']");
        // return an empty string if there is no English abstract
        return isset($b[0]) ? (string)$b[0] : "";
    }
    ?>
and a Python version using the amara XML library (which has to be installed separately):
    # Python 2 code: urllib2 and the "except Exception, e" syntax are Python 2 only.
    import amara
    import urllib2
    from urllib import urlencode

    def getwikikey(band):
        """Search Yahoo for the band on wikipedia.org and return the page key of the first hit."""
        url = "http://api.search.yahoo.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=%22"+band+"%22&site=wikipedia.org"
        print url
        c = urllib2.urlopen(url)
        f = c.read()
        doc = amara.parse(f)
        url = str(doc.ResultSet.Result[0].Url)
        # 'http://en.wikipedia.org/wiki/Key'.split('/') gives
        # ['http:', '', 'en.wikipedia.org', 'wiki', 'Key']
        return url.split('/')[4]

    def uurlencode(text):
        """single URL-encode a given 'text'. Do not return the 'variablename=' portion."""
        blah = urlencode({'u': text})
        blah = blah[2:]  # strip the leading 'u='
        return blah

    def getwikibio(key):
        """Fetch the DBpedia data for a page key and return its English abstract, or ''."""
        url = "http://dbpedia.org/data/"+str(key)
        print url
        try:
            c = urllib2.urlopen(url)
            f = c.read()
        except Exception, e:
            return ''
        doc = amara.parse(f)
        b = doc.xml_xpath("//p:abstract[@xml:lang='en']")
        try:
            r = str(b[0])
        except Exception, e:
            return ''
        return r

    def scrapewiki(band):
        """Return the Wikipedia bio for a band name, or '' if nothing was found."""
        try:
            key = getwikikey(uurlencode(band))
        except Exception, e:
            return ''
        return getwikibio(key)

    #unit test
    #print scrapewiki('guns n bombs')
    #print scrapewiki('diana ross')
There we go, artist bio scraping from wikipedia.
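Both versions pull the Wikipedia page key out of the search result by splitting the URL on '/' and taking the fifth piece, which cuts short any title that itself contains a slash. A slightly more defensive variant of that one step might look like the sketch below. It is Python 3 and standard library only, and the function name wiki_key_from_url is hypothetical, not part of the code above.

```python
from urllib.parse import urlparse


def wiki_key_from_url(url):
    """Return the page key from an English Wikipedia article URL, or None."""
    parts = urlparse(url)
    # Only accept the English Wikipedia host, as the PHP version does.
    if parts.netloc != "en.wikipedia.org":
        return None
    prefix = "/wiki/"
    if not parts.path.startswith(prefix):
        return None
    # Keep everything after /wiki/, including any further slashes in the title.
    key = parts.path[len(prefix):]
    return key or None


if __name__ == "__main__":
    print(wiki_key_from_url("http://en.wikipedia.org/wiki/Diana_Ross"))
```

Returning None for anything that is not an en.wikipedia.org article URL gives the caller one simple check before asking DBpedia for the abstract.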
March 21st, 2008 by
David Craddock

I passed the adExcellence exam first time.. woo! It wasn’t that difficult really.
“David Craddock of iCrossing is accredited as an official Microsoft adExcellence Member. A Microsoft adExcellence Member has completed comprehensive online training on managing Microsoft adCenter search engine marketing campaigns and has demonstrated expert knowledge by passing the Microsoft adExcellence accreditation exam.”
As of 21/3/08, I’m somehow also now #1 on Google.co.uk for the keyword “adExcellence exam”.. if that’s what you googled for, you probably want the adExcellence main site instead. Or use Live Search.
March 17th, 2008 by
David Craddock

I have just seen Yahoo! Pipes, and am convinced this is going to change the web. For real.
Data source sites will become ‘content providers’, data will be aggregated and filtered from multiple content providers, either by the user or by ‘intermediary’ sites. The user will be able to choose his ‘data view’ of the content on the internet, just as Google is currently doing.
This is fascinating stuff if you’re involved in the web industry.
March 15th, 2008 by
David Craddock

We’ve been working on a Brighton music events Google maps mashup project:
www.BrightonSound.com
It’s still developing, but it looks quite good, and we’re ready to start showing it off to people. So check it out!
February 28th, 2008 by
David Craddock
I finally set up my Dell Latitude D630 laptop the way I wanted it last night, and thought I’d do a quick writeup about it. Here is the partition table:
- A 40GB Windows XP partition, with VMWare Player installed, which I will be using for Windows applications that don’t play well in virtualised mode (e.g. media applications). I will also be using it as the main platform for running VMs.
- A basic 5GB root + 1.4GB swap 7.10 Ubuntu server partition, with VMWare Server installed (for creating, advanced editing and performing network testing on VMs). I used these VMWare server on Ubuntu 7.10 tutorials.
- A 36GB NTFS partition for storing VMs
- A 26GB NTFS media partition for media I want to share between VMs and the two operating systems on the disk.
We use VMWare servers at work to host our infrastructure, so this setup will be very useful for me. I can now:
- Take images off the servers at work and bring them up, edit them and test their network interactions under my local VMWare Server running on my Linux install.
- From within my windows install, I can bring up a Linux VM and use Windows and Linux side by side.
February 22nd, 2008 by
David Craddock
I will be attending Brighton Barcamp 2 on the weekend of the 14th March, and presenting on a new web project I’ve been working on.
See: http://barcamp.pbwiki.com/BarCampBrighton2 and http://www.barcampbrighton.org/ for more info.
Update: Brighton Barcamp 2 is now over.
This was really interesting, and I learned a huge amount in a very short amount of time. Thanks to everyone who talked to me. I’ll definitely be attending future Barcamps.
November 12th, 2007 by
David Craddock
My friend Adam has a blog featuring interesting internet finds. Check it out:
One Idea
June 2nd, 2007 by
David Craddock
I’ve been trying to learn a lot about search engines lately, as I’ve been starting at an internet marketing firm. I found this excellent list of online materials for university courses related to search engines:
http://clair.si.umich.edu:8080/wordpress/?p=11
In particular, these seem especially relevant: