In a web application that I have been working on recently, it has been necessary to answer the question: what country is lat,lon in?
There are a number of ways of determining this, but as a starting point I decided to find the boundary of each country from OpenStreetMap (OSM) data as a set of polygons.
After extracting the boundary data from OSM, I found that the linestrings, or ways (which, joined together, form a polygon), were badly ordered and had lots of holes in them.
I spent a day trying to beat the osm boundaries into something decent, until I finally decided that there had to be an easier way. Then I stumbled across world boundaries at thematicmapping.org. The dataset is available under a Creative Commons Attribution-Share Alike License, so I’m sharing my success and files here.
I extracted the boundaries from shapefiles into MySQL, with some inspiration from Spincloud Labs, and then set about writing some basic spatial SQL queries.
There is not much information on the internet about the current state of MySQL spatial queries, so I upgraded my test MySQL server to the latest version in case anything had changed, and set about finding out whether the Within() and Contains() functions work properly with MySQL multipolygons. Even in MySQL 5.1.32, these two functions test against the MBR (minimum bounding rectangle) rather than the actual shape. So rather than answering the point-in-polygon question, a query such as
SELECT ID, NAME, CONTAINS( ogc_geom, GeomFromText('POINT(15 60)') ) AS C
FROM `world_boundaries` ORDER BY C DESC
would return Norway, Sweden, Russia and the United States, all flagged as containing the point (longitude 15, latitude 60).
From there, it would be necessary to extract the multipolygon data, parse it into x and y values, and then check which polygon the point is actually within. I have written point-in-polygon code before in a few languages, but parsing the whole of the US, Norwegian, Swedish and Russian boundaries when a position is well within Sweden seemed a bit silly. There had to be a better way!
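To see why an MBR-based test gives that answer, here is a minimal Python sketch of a bounding-rectangle containment check. The L-shaped polygon is a made-up example, not real boundary data: a point in its empty notch lies outside the shape but inside its bounding rectangle, which is exactly the kind of false positive described above.

```python
# Minimal sketch of a minimum-bounding-rectangle (MBR) containment test.
# The polygon below is a hypothetical L-shape, not real boundary data.

def mbr(polygon):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)

def mbr_contains(polygon, point):
    """True if the point lies inside the polygon's bounding rectangle."""
    min_x, min_y, max_x, max_y = mbr(polygon)
    x, y = point
    return min_x <= x <= max_x and min_y <= y <= max_y

# An L-shaped polygon: the top-right notch is NOT part of the shape.
l_shape = [(0, 0), (10, 0), (10, 4), (4, 4), (4, 10), (0, 10)]

print(mbr_contains(l_shape, (2, 2)))  # True: inside the shape and its MBR
print(mbr_contains(l_shape, (8, 8)))  # True: in the notch, so a false positive
```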
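For reference, the exact test that MBR skips is a point-in-polygon check. Below is a minimal Python sketch of the standard even-odd (ray casting) method; it is a generic illustration, not the code mentioned above, and it treats each part of a multipolygon as a simple ring, ignoring holes.

```python
# Minimal sketch of the even-odd (ray casting) point-in-polygon test.
# Polygons are lists of (x, y) = (lon, lat) vertices; the closing edge is implied.

def point_in_polygon(polygon, point):
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside

def point_in_multipolygon(polygons, point):
    """True if any part contains the point (holes are ignored in this sketch)."""
    return any(point_in_polygon(p, point) for p in polygons)

l_shape = [(0, 0), (10, 0), (10, 4), (4, 4), (4, 10), (0, 10)]
print(point_in_polygon(l_shape, (2, 2)))  # True
print(point_in_polygon(l_shape, (8, 8)))  # False: the notch
```

Running this against every vertex of several large multipolygons is what makes the approach heavy, which is the motivation for the raster trick that follows.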

To inspect the boundaries, I had written a simple polygon renderer, so I set it to work on the world's boundaries. Even at a multiplier of 20 (the pixel width of the map is 360 times the multiplier, so with a multiplier of 20 the map is 7200 x 3600 pixels), it only took a few seconds to render the map. Here, each country is drawn in a colour whose RGB value is its ID. Now, with a simple 'what is the colour of the pixel at lat,lon?', you can find out which country a point is in! Once the image is loaded into memory, this lookup is fast. (I made it faster later; read on!)
It is important to ensure that there is no anti-aliasing on the image, otherwise blended colours along the edges would decode to the wrong IDs. But even so, at country borders one pixel (1/20th of a degree) is a big brush, and I needed some way of flagging that a pixel lies on a country border, perhaps so that a finer algorithm can be used to find out exactly which country. So, at country borders, I changed the pixel colour to black. If the search returns 0 (black) as the country, then the location is ambiguous! If the search returns 255 (white), then it is sea. Otherwise, it is a country! I also did quite a bit of cleaning up of the data, such as removing small holes and expanding the edges at the coast.
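Here is a minimal Python sketch of the two halves of that idea: packing an ID into a colour, and mapping a longitude/latitude to a pixel on the 7200 x 3600 map. The exact packing is an assumption (the ID is taken to be a 24-bit RGB value, `(r << 16) | (g << 8) | b`); the pixel mapping mirrors the PHP lookup later in the post, with clamping added so that longitude 180 or latitude -90 stays on the map.

```python
# Sketch of the "colour as country ID" idea.
# Assumption: the ID is packed into a 24-bit RGB value as (r << 16) | (g << 8) | b.

MULT = 20             # pixels per degree, as in the post
WIDTH = 360 * MULT    # 7200 columns
HEIGHT = 180 * MULT   # 3600 rows

def id_to_rgb(country_id):
    return (country_id >> 16) & 0xFF, (country_id >> 8) & 0xFF, country_id & 0xFF

def rgb_to_id(r, g, b):
    return (r << 16) | (g << 8) | b

def lonlat_to_pixel(lon, lat):
    """Column counts east from 180W, row counts south from 90N; clamped to the map.
    Note: Python's round() rounds exact halves to even, unlike PHP's round()."""
    x = min(max(round((lon + 180) * MULT), 0), WIDTH - 1)
    y = min(max(round((90 - lat) * MULT), 0), HEIGHT - 1)
    return x, y

print(id_to_rgb(42))                # (0, 0, 42)
print(lonlat_to_pixel(-2.0, 53.1))  # (3560, 738)
```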
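The border-flagging step can be sketched in Python on a small grid of country IDs. The neighbour rule here is an assumption, since the post does not say how borders were detected: a land pixel becomes black (0) if any of its four direct neighbours holds a different land ID. Sea (255) never triggers a border, and this sketch does not wrap around at the 180° meridian.

```python
# Sketch of border marking on a grid of country IDs (a list of rows).
# Assumption: a land pixel is a border pixel if any 4-connected neighbour
# holds a different land ID. 255 = sea, 0 = border, as in the post.

SEA, BORDER = 255, 0

def mark_borders(grid):
    h, w = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # compare against the original, write to a copy
    for y in range(h):
        for x in range(w):
            cid = grid[y][x]
            if cid == SEA:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h:  # no wrap-around at the map edge
                    other = grid[ny][nx]
                    if other not in (SEA, cid):
                        out[y][x] = BORDER
                        break
    return out

grid = [[1, 1, 2, 2],
        [1, 1, 2, 2],
        [255, 255, 255, 255]]
print(mark_borders(grid))  # [[1, 0, 0, 2], [1, 0, 0, 2], [255, 255, 255, 255]]
```

Note that both sides of a boundary turn black, so a border is two pixels wide.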
I overlaid the map on the vector data, and found the two to be quite consistent with each other.

Now for some fine-tuning, to remove the need to load all of the roughly 26 million pixels (7200 x 3600) into memory before a search can begin.
There are 246 countries in the dataset, plus 2 extra values (black and white) that I added for ambiguous and sea. That's a total of 248, handy for fitting into 1 byte! Now we simply walk through the image row by row, take each pixel's RGB value, fit it into 1 byte (& 0xFF) and write it into a binary file. That file does end up being about 25MB, but I can cope with that.
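A minimal Python sketch of writing that file, assuming each pixel is held as a 24-bit RGB integer whose low byte is the ID (which is what `& 0xFF` implies). Pixels are written row by row, so the byte for pixel (x, y) ends up at offset `y * width + x`, matching the lookup below. The function names are hypothetical.

```python
# Sketch of packing RGB pixels into one byte per pixel, row-major.
# Assumption: each pixel is a 24-bit RGB int whose low byte holds the country ID.

def pack_grid(rgb_rows):
    """Keep only the low byte of each pixel; offset of (x, y) is y * width + x."""
    return bytes(pixel & 0xFF for row in rgb_rows for pixel in row)

def write_lookup_file(path, rgb_rows):
    with open(path, "wb") as f:
        f.write(pack_grid(rgb_rows))

# 2 x 2 example: white (0xFFFFFF) becomes 255, i.e. sea.
print(list(pack_grid([[0x000001, 0xFFFFFF], [0x000102, 0x000000]])))  # [1, 255, 2, 0]
```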
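Now, to find the country ID of a latitude and longitude. I think some code here would be easier than explaining it! Here is some PHP:
// Open the 7200 x 3600, one-byte-per-pixel lookup file in binary mode.
$handle = fopen("world_boundaries.bin", 'rb');
if ($handle === false) {
    die("Could not open world_boundaries.bin");
}
function get_country($handle, $long, $lat) {
    $mult = 20;                // pixels per degree
    $width = 360 * $mult;      // 7200 columns
    $height = 180 * $mult;     // 3600 rows
    // Column counts east from 180W, row counts south from 90N; clamp to the map.
    $x = min(max((int)round(($long + 180) * $mult), 0), $width - 1);
    $y = min(max((int)round((90 - $lat) * $mult), 0), $height - 1);
    // One byte per pixel, stored row by row.
    fseek($handle, $y * $width + $x, SEEK_SET);
    $byte = fgetc($handle);    // false on read failure
    return $byte === false ? null : ord($byte);
}
print "You are in country " . get_country($handle, -2.0, 53.1);
fclose($handle);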
The x value is between 0 and 7200, and the y value (which increases southwards) is between 0 and 3600. From these, the byte offset in the file is y × 7200 + x, and the easiest way to read 1 byte from a file in PHP is to use fgetc.
I expected this to be fast: only one file is involved, and there is no locking or checking of pretty much anything!
When looking up random locations around the world, each lookup took about 0.02 milliseconds on my test server. That, I suggest, is pretty good!
Now, if this function returns 255, you are at sea. If it returns 0, you are on a country border and will need to check in more detail.
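For comparison, here is the same seek-and-read lookup sketched in Python. It is a translation for illustration, not the author's code; the `mult` parameter is added only so the function can be exercised on a small file (mult=1 gives a 360 x 180 grid).

```python
# Python sketch of the same lookup: seek to y * width + x and read one byte.

def get_country(f, lon, lat, mult=20):
    width, height = 360 * mult, 180 * mult
    # Column counts east from 180W, row counts south from 90N; clamp to the map.
    x = min(max(round((lon + 180) * mult), 0), width - 1)
    y = min(max(round((90 - lat) * mult), 0), height - 1)
    f.seek(y * width + x)   # absolute offset from the start of the file
    b = f.read(1)
    return b[0] if b else None  # None if the file is too short

# Usage with the downloadable file:
# with open("world_boundaries.bin", "rb") as f:
#     print("You are in country", get_country(f, -2.0, 53.1))
```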
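Below is a minimal sketch of interpreting the result, plus one possible way (an assumption, not something described here) to narrow down a border pixel: collect the distinct country IDs a couple of pixels around it, then hand only those candidates to an exact point-in-polygon test. `classify` and `candidate_ids` are hypothetical helpers; the radius defaults to 2 because both sides of a boundary may be black.

```python
# Sketch of interpreting a lookup result (255 = sea, 0 = border, as in the post).
# candidate_ids is a hypothetical helper for narrowing down border pixels.

SEA, BORDER = 255, 0

def classify(country_id):
    if country_id == SEA:
        return "sea"
    if country_id == BORDER:
        return "border"
    return "country"

def candidate_ids(grid, x, y, radius=2):
    """Distinct land IDs within `radius` pixels of (x, y) in a grid of rows."""
    h, w = len(grid), len(grid[0])
    found = set()
    for ny in range(max(0, y - radius), min(h, y + radius + 1)):
        for nx in range(max(0, x - radius), min(w, x + radius + 1)):
            cid = grid[ny][nx]
            if cid not in (SEA, BORDER):
                found.add(cid)
    return sorted(found)

grid = [[1, 0, 0, 2],
        [1, 0, 0, 2],
        [255, 255, 255, 255]]
print(classify(grid[0][1]), candidate_ids(grid, 1, 0))  # border [1, 2]
```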
Downloads:
The dataset is available under a Creative Commons Attribution-Share Alike License
World image file: 7200 x 3600. 375kB
World boundary binary file. Compressed, 215kB