From our Swindon Correspondent:
From the Guardian:-
PHE was responsible for collating the test results from public and private labs, and publishing the daily updates on case count and tests performed.
But the rapid development of the testing programme has meant that much of the work is still done manually, with individual labs sending PHE spreadsheets containing their results. Although the system has improved from the early days of the pandemic, when some of the work was performed with phone calls, pens and paper, it is still far from automated.
In this case, the Guardian understands, one lab had sent its daily test report to PHE in the form of a CSV file – the simplest possible database format, just a list of values separated by commas. That report was then loaded into Microsoft Excel, and the new tests at the bottom were added to the main database.
I haven’t done this sort of work for a while, but I used to do quite a lot of it. People put things into Excel and then you load all of it into a system. Sometimes, that’s just the specification. Excel is a common tool that pretty much everyone owns, so maybe you send out a template, they fill in a sheet for their sales or faults, return it, and you import it.
But here’s what you don’t do: you don’t import it into Excel. You don’t store data in Excel in any permanent way. Excel is not a database. You want something like SQL Server, MySQL or Oracle. You take the data sent in and validate it, then you apply it to a database. Here’s the sort of reason why:-
But while CSV files can be any size, Microsoft Excel files can only be 1,048,576 rows long – or, in older versions which PHE may have still been using, a mere 65,536. When a CSV file longer than that is opened, the bottom rows get cut off and are no longer displayed. That means that, once the lab had performed more than a million tests, it was only a matter of time before its reports failed to be read by PHE.
It’s also hard in Excel to do things like checking for duplicates, applying any sort of controls, or producing any sort of fast reporting.
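To make the contrast concrete, here is a minimal sketch of the validate-then-load approach, using Python and SQLite as stand-ins for a real database server. The column names (`test_id`, `result`), the allowed result values and the function name `load_lab_report` are all assumptions for illustration, not anything PHE or the labs actually use. The CSV is read row by row, so there is no row limit to hit, and the database's primary key does the duplicate checking.

```python
import csv
import io
import sqlite3

# Hypothetical set of acceptable values; a real feed would define its own.
VALID_RESULTS = {"positive", "negative", "void"}

def load_lab_report(conn, lab, csv_text):
    """Validate each CSV row, insert the good ones, and return
    (inserted, duplicates, failures) for this file."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS test_results ("
        " lab TEXT NOT NULL, test_id TEXT NOT NULL, result TEXT NOT NULL,"
        " PRIMARY KEY (lab, test_id))"
    )
    inserted = duplicates = failures = 0
    # DictReader streams the file one row at a time.
    for row in csv.DictReader(io.StringIO(csv_text)):
        test_id = (row.get("test_id") or "").strip()
        result = (row.get("result") or "").strip().lower()
        if not test_id or result not in VALID_RESULTS:
            failures += 1  # malformed row: count it, do not load it
            continue
        cur = conn.execute(
            "INSERT OR IGNORE INTO test_results (lab, test_id, result)"
            " VALUES (?, ?, ?)",
            (lab, test_id, result),
        )
        if cur.rowcount == 1:
            inserted += 1
        else:
            duplicates += 1  # primary key already present, row was ignored
    conn.commit()
    return inserted, duplicates, failures
```

The three counts it returns are exactly the sort of numbers you would want to keep for each file that comes in.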
As part of the import process, you also create some reports that show the following:-
- How many cases were in the database before the update
- How many cases came in from each lab
- How many failures there were (e.g. validation problems)
- How many cases are now in the database.
You send those reports to a manager who checks that the numbers reconcile. This is basic, BASIC stuff that banks have been doing for decades. You could even have an automatic check when it processes and just alert someone if there’s an issue with the numbers. Or, you know, just have a developer and a manager in every day, 7 days a week. This is supposed to be critical data for understanding the pandemic, so why would you want any delay whatsoever?
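The automatic check is not much code. The sketch below takes the four figures from the list above and flags a run where they don't add up. The `ImportRun` class and the `reconcile` function are hypothetical names, and the rule that every received row either lands in the database or counts as a failure is my assumption about how the totals are defined.

```python
from dataclasses import dataclass

@dataclass
class ImportRun:
    count_before: int        # cases in the database before the update
    received_per_lab: dict   # lab name -> cases received from that lab
    failures: int            # rows rejected, e.g. by validation
    count_after: int         # cases in the database after the update

def reconcile(run):
    """Return a list of discrepancies; an empty list means the run balances."""
    received = sum(run.received_per_lab.values())
    expected_after = run.count_before + received - run.failures
    problems = []
    if run.count_after != expected_after:
        problems.append(
            f"expected {expected_after} cases after the update, "
            f"found {run.count_after}"
        )
    return problems

run = ImportRun(1000, {"lab_a": 200, "lab_b": 50}, 5, 1100)
for problem in reconcile(run):
    print("ALERT:", problem)
```

In a real job the `print` would be an email or a page to whoever is on call, but the arithmetic is the whole of the control.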
I’m going to presume there are no controls in this process. Lashing Excel together is what some Johnny in a user department with no software development experience does, not experienced software designers. I’m guessing there are no controls for missing files from a lab, duplicate data from a lab, or malformed data from a lab, and no audit trails or testing process.
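For what it’s worth, the file-level controls are simple too. This is a sketch, assuming each lab is expected to send one file a day and that a re-sent file is byte-for-byte identical to the original; `check_deliveries` and its arguments are made-up names for illustration.

```python
import hashlib

def check_deliveries(expected_labs, deliveries, seen_hashes):
    """deliveries is a list of (lab, file_bytes) pairs received today.
    seen_hashes is a set of digests of files already processed, which
    is updated in place. Returns (missing_labs, duplicate_labs)."""
    received = set()
    duplicates = []
    for lab, content in deliveries:
        received.add(lab)
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen_hashes:
            duplicates.append(lab)   # same file has been seen before
        else:
            seen_hashes.add(digest)
    # Any lab we expected but heard nothing from
    missing = sorted(set(expected_labs) - received)
    return missing, duplicates
```

Keep the digests in a table along with the lab, the date and the row counts, and you have the start of an audit trail.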
And there’s simply no excuse. The people with skills to do this aren’t cheap, but they also aren’t that expensive. There are probably people in some parts of government on furlough who could do this properly. This is what you get from government.