Trial #35: Querying the gov.uk website for COVID-19 Tier by PostCode
Problem:
The UK Government has a special COVID-19 Data Website.
It has a handy tool to pull up local information for a particular postcode.
However, looking up a whole collection of postcodes this way would be time-consuming. There is an API, with docs provided, but I could not see an endpoint that takes a postcode.
Solution:
Fortunately, the custom postcode dashboards are accessible via a query string, so we can work through a set of postcodes by making a series of GET requests.
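As a rough sketch of that loop, the snippet below builds one dashboard URL per postcode. The base URL and the parameter name are placeholders I have assumed for illustration, so substitute whatever the dashboard's address bar actually shows.

```powershell
# Hypothetical base URL and parameter name; replace with the real dashboard values.
$baseUrl   = 'https://example.invalid/search'
$paramName = 'postcode'

function Get-PostcodeUrl {
    param(
        [Parameter(Mandatory)] [string] $Postcode
    )
    # Normalise the postcode, then URL-encode it for use in the query string.
    $clean = $Postcode.Trim().ToUpper()
    '{0}?{1}={2}' -f $baseUrl, $paramName, [uri]::EscapeDataString($clean)
}

$postcodes = 'SW1A 1AA', 'm1 1ae'
$urls = $postcodes | ForEach-Object { Get-PostcodeUrl -Postcode $_ }
$urls
```

Each URL can then be passed to Invoke-WebRequest -Uri in turn.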
I put together a minimum viable PowerShell script. I will not explain every step here, but it uses the Invoke-WebRequest cmdlet to GET and parse the webpage, drills into the required page elements and interprets the human-readable text into structured data. There is quite a lot of regex for validation and data scraping, which I plan to cover in another post.
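To give a flavour of the regex side, here is a minimal sketch rather than the script itself. The postcode pattern is a simplified shape check, and the sample text and its "Tier <number>: <label>" wording are invented for illustration; the real page wording may well differ.

```powershell
# Simplified UK postcode shape; it does not cover every special case.
$postcodePattern = '^[A-Z]{1,2}[0-9][A-Z0-9]?\s?[0-9][A-Z]{2}$'

function Test-Postcode {
    param([string] $Postcode)
    $Postcode.Trim().ToUpper() -match $postcodePattern
}

function ConvertFrom-TierText {
    param([string] $Text)
    # Assumed wording: "Tier <number>: <label>". Returns nothing if it is absent.
    if ($Text -match 'Tier\s+(?<Level>\d)\s*:\s*(?<Label>[^.\r\n]+)') {
        [pscustomobject]@{
            Tier  = [int] $Matches['Level']
            Label = $Matches['Label'].Trim()
        }
    }
}

# Invented sample text standing in for the element scraped from the page.
$sample = 'Current alert level for Exampleshire: Tier 2: High'
$result = ConvertFrom-TierText -Text $sample
$result
```

In real use the text would come from the page itself, for example the Content property of the Invoke-WebRequest response, rather than a hard-coded string.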
Pitfalls:
Invoke-WebRequest and its automatic parser are not especially quick or optimised. For heavy use it would be better to write a custom tool using the .NET HttpClient with parallel execution and an optimised scraper.
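For what it is worth, the HttpClient idea can be sketched from PowerShell too. This is an untuned illustration, not something I have benchmarked: Get-PageBatch is a made-up helper name, it starts every request at once with no throttling, and a single failed request makes WaitAll throw.

```powershell
# Needed on Windows PowerShell 5.1, where the assembly is not loaded by default.
Add-Type -AssemblyName System.Net.Http

function Get-PageBatch {
    param(
        [Parameter(Mandatory)] [string[]] $Url
    )
    $client = [System.Net.Http.HttpClient]::new()
    try {
        # Start every request first, then wait for the whole batch to finish.
        $tasks = @(foreach ($u in $Url) { $client.GetStringAsync($u) })
        [System.Threading.Tasks.Task]::WaitAll([System.Threading.Tasks.Task[]] $tasks)
        for ($i = 0; $i -lt $Url.Count; $i++) {
            [pscustomobject]@{ Url = $Url[$i]; Html = $tasks[$i].Result }
        }
    }
    finally {
        $client.Dispose()
    }
}
```

For a large list you would want to send the URLs in modest batches rather than all at once, out of courtesy to the site as much as anything.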
As my address list contained duplicate postcodes, I first made a list of unique postcodes and then constructed a Hashtable holding the result of my script for each postcode. I then joined this Hashtable to the original set to avoid repeated lookups of the same postcode.
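The deduplicate-then-join step looks roughly like this. The address list is made up, and Get-TierForPostcode is a stand-in for the scraping script that returns a fake result so the sketch runs offline.

```powershell
# Stand-in for the scraping script (hypothetical name, fake result).
function Get-TierForPostcode {
    param([string] $Postcode)
    [pscustomobject]@{ Postcode = $Postcode; Tier = 'unknown' }
}

# Made-up address list containing a repeated postcode.
$addresses = @(
    [pscustomobject]@{ Name = 'Office A'; Postcode = 'SW1A 1AA' }
    [pscustomobject]@{ Name = 'Office B'; Postcode = 'M1 1AE' }
    [pscustomobject]@{ Name = 'Office C'; Postcode = 'SW1A 1AA' }
)

# One lookup per distinct postcode, cached in a Hashtable keyed by postcode.
$unique = $addresses.Postcode | Sort-Object -Unique
$lookup = @{}
foreach ($pc in $unique) {
    $lookup[$pc] = Get-TierForPostcode -Postcode $pc
}

# Join the cached results back onto every original row.
$joined = foreach ($row in $addresses) {
    [pscustomobject]@{
        Name     = $row.Name
        Postcode = $row.Postcode
        Tier     = $lookup[$row.Postcode].Tier
    }
}

# Postcodes whose lookup returned nothing can be inspected on their own.
$missing = $lookup.Keys | Where-Object { -not $lookup[$_] }
$joined
```

Here three address rows cost only two lookups, and $missing gives a short list to investigate without walking the full address set again.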
Another advantage is that you can investigate missing values without iterating over the entire set of postcodes.
There are other data points on the page that could be scraped. However, at this point we have a geographic area for each postcode. We would be far better served using this to query the API and obtain the structured data directly.