There’s a clear argument for opening up public sector information for reuse. It increases transparency. It’s the raw material for new kinds of services with public or commercial value. It improves measurability and makes possible new kinds of analysis. Therefore, the public sector should open up more of its data, in reusable formats.
Building a catalogue of data held by public sector organisations (like the great one set up in Washington, DC and now the Federal data.gov) is a logical starting point. Audit what’s held in departments, who’s responsible for it, and publish the list (with links to the relevant datasets) for potential reusers to come and browse. Or even go a step further as they’re doing at Kent County Council with their Pic-and-Mix pilot, and develop a mashup hub too, for end users themselves to develop and discuss applications which use that data.
Richard is now asking for feedback on the formats and structure of the UK’s data catalogue – do go and give him your thoughts. But I’m still troubled by some more fundamental problems than whether we publish the data in JSON or RSS:
- Which data? For example, I suspect a simple, definitive postcoded list of UK higher education institutions would be useful to a fair number of people developing map-based mashups – though I’m not sure a civil servant would identify just how useful that kind of thing could be. I wonder whether a mechanism a bit like a souped-up, less forlorn version of OPSI’s ‘data unlocking service’ might provide a forum for potential reusers to make ‘bids’ for useful data – even if they don’t know where it sits or what form it takes – and develop a community which can assess, prioritise and refine those data specifications.
- Who decides whether to publish? Proactively publishing data almost inevitably increases an organisation’s short-term potential exposure to criticism (even if it reduces it in the long term). It invariably generates tedious work for which the perceived ‘market’ is tiny. To play devil’s advocate: from a civil servant’s perspective, what makes the ‘open data people’ any different from the cranks who’ve always made trouble for bureaucrats by asking vexatious questions? There’s no big queue of citizens asking for data right now, and only a hypothetical end user audience for hypothetical tools based upon it. Ask an IT manager, a press officer and a policy official whether to publish any given dataset and you’re likely to get three radically different answers. We need some pretty clear principles to determine what gets published, to prevent our data catalogues being reduced to the blandest lowest common denominator.
- Who benefits? The civil service isn’t the machine it’s sometimes portrayed as: ours is a surprisingly small, somewhat stretched, government of humans for whom opening up data is not – to put it mildly – a top priority, even where the data itself is simple and uncontroversial. We can tell these people to do it, but until we can show them where the benefits lie – not just in the social value, but the benefit for their organisation and for them personally – we’re unlikely to get buy-in on a large scale.
- Who pays? Cleaning up data for publication, documenting it, checking it for errors or personal details, reformatting it, uploading it, answering queries about it – there’s a lot of work involved in open data. It’s not in established job descriptions – so we’re likely to need more people to do this work, if it’s to happen on a large scale. Now, as taxpayers, we might decide that’s a cost worth paying. But for how many datasets? And at what maximum cost?
- For how long? As any knowledge manager will tell you, information has a life cycle. Publish it now, and in six months’ or six years’ time, bit rot may have rendered it useless. Who is going to be responsible for maintaining the data when published, and what liability should public bodies accept for its misuse or inaccuracy when used by third parties? If the hospital Mashup A told you was at map co-ordinates X,Y turns out not to be, who are you going to be able to shout at about it?
Here’s my thought: open data needs a new breed of data gardeners – not necessarily civil servants, but people who know data, what it means and how to use it, and have a role like the editors of Wikipedia or the mods of a busy forum in keeping it clean and useful for the rest of us. Encourage three or four independent people passionate about, say, transport or secondary education, who know and respect the system enough to know how to extract useful data, without rattling too roughly the cages of the people who will be asked to provide it. They’ll know when the data changes, or what a reasonable request is, or where something can be found because… they just know that area like the back of their hands. Support them with some data groundsmen with heavy-lifting tools and technical skills to organise, format, publish and protect large datasets. And then point the digital mentors at the data garden, to get communities to come and enjoy the flowers in ways that enrich their lives.
Personally, I passionately want to see open data work in the UK. But as with so much on the web, I think the primary challenges will be sociological, not technical.
Photo credit: Omargurnah