Donations to MPs are in the news again, and TheyWorkForYou allows users to easily see what any individual MP has received. In fact, the site has carried a copy of the Register of Members’ Financial Interests (in which, as Parliament’s website explains, “MPs must register within 28 days any interest which someone might reasonably consider to influence their actions or words as an MP”) since at least 2005.
Keeping that copy up to date hasn’t always been straightforward, and it has recently become slightly trickier.
The official register is published as static HTML or PDF, with a simple list of all MPs. We scrape that HTML, convert it into light XML and import it onto the site – which means you can easily see not only the current entry on an individual MP’s page, but also a complete history of their register, without having to view many different copies of the official register.
The XML contains all the data from the official register, but it only parses out basic information like the category of interest. Providing more detail would be great, but is quite a hard problem to tackle.
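To give a flavour of what "convert it into light XML" can mean, here is a minimal sketch in Python using only the standard library. It is not our actual scraper: the sample markup, the element names (`register`, `member`, `category`, `item`) and the helper `html_to_light_xml` are all assumptions made for illustration, and the real register pages are a good deal messier. Like the approach described above, it only picks out the category of interest and leaves each entry as free text.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical, heavily simplified markup. The real register is not
# guaranteed to look like this.
SAMPLE_HTML = """
<h2>Example, Alex (Anytown)</h2>
<p><strong>1. Employment and earnings</strong></p>
<p>Payment of £500 from Example Ltd for an article.</p>
<p><strong>4. Visits outside the UK</strong></p>
<p>Visit to Examplestan, funded by the Example Foundation.</p>
"""

# A category heading is assumed to look like "1. Employment and earnings".
CATEGORY_RE = re.compile(r"^(\d+)\.\s*(.+)$")


class RegisterParser(HTMLParser):
    """Collects members, their categories and the free-text entries."""

    def __init__(self):
        super().__init__()
        # Each member is (name, [(category_number, category_name, [items])]).
        self.members = []
        self._tag = None          # block element currently being read
        self._buf = []            # text collected for that element
        self._saw_strong = False  # did the paragraph contain <strong>?

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "p"):
            self._tag = tag
            self._buf = []
            self._saw_strong = False
        elif tag == "strong":
            self._saw_strong = True

    def handle_data(self, data):
        if self._tag:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = " ".join("".join(self._buf).split())  # collapse whitespace
        self._tag = None
        if not text:
            return
        if tag == "h2":
            self.members.append((text, []))
        elif self.members:
            categories = self.members[-1][1]
            match = CATEGORY_RE.match(text)
            if self._saw_strong and match:
                categories.append((match.group(1), match.group(2), []))
            elif categories:
                # Entry text is kept verbatim, not parsed any further.
                categories[-1][2].append(text)


def html_to_light_xml(html):
    """Return an ElementTree element holding the parsed register."""
    parser = RegisterParser()
    parser.feed(html)
    parser.close()
    root = ET.Element("register")
    for name, categories in parser.members:
        member = ET.SubElement(root, "member", name=name)
        for number, cat_name, items in categories:
            category = ET.SubElement(member, "category", type=number, name=cat_name)
            for item in items:
                ET.SubElement(category, "item").text = item
    return root
```

Calling `ET.tostring(html_to_light_xml(SAMPLE_HTML), encoding="unicode")` gives a small XML document with one `member`, two `category` elements and the entry text held in `item` elements. Pulling structured details such as amounts or donors out of that free text is the harder problem mentioned above, and this sketch makes no attempt at it.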
Recently, Parliament has started using Cloudflare’s bot-protection technology. We assume this change was made with good reason, but as a side effect it has prevented effective scraping of the website, as Cloudflare don’t distinguish between good and bad bots or scrapers.
We know that Parliament was working on an API at least as far back as 2016, from their now-removed data blog, but if this is still in development, it is yet to see the light of day. What they said at the time still stands: their website is still the only means of accessing this data. We don’t think it’s necessary to protect purely static HTML pages such as the Register in quite such a heavy-handed manner.
We do have ways of continuing to get the Register, and TheyWorkForYou is still up to date, so anyone else who has been scraping the official site and has hit issues because of this is welcome to use our data, either via the XML or our API.
Image: Adeolu Eletu