How Simon Willison built a search engine for Datasette
Simon Willison is just the absolute best kind of nerd, and Datasette is a prime example of why: a rigorous set of tools that makes it easy, especially for data journalists but really for anyone, to make sense of a bunch of unstructured (or barely structured) data.
This is a class of problem that needs better tools, to be honest; we’re all so awash in data these days and we really have no control over any of it. At best this leaves us feeling confused, at worst exploited.
Earlier this year, working on a personal project, I ended up with a big set of data about schools across a bunch of different states, scraped from state and county and even individual school district websites. This isn’t a Big Data class of problem, but it’s also unwieldy in a spreadsheet. The data was a gnarled, unstructured mess and Datasette handled it all brilliantly.
Datasette is also extensible through plugins, and Willison has built a personal data tracking tool called Dogsheep on top of it. The basic idea behind Dogsheep is that it lets you export the data about you that gets warehoused by various tech companies, either to glean something useful about yourself or simply to have control over it.
Willison built a single, faceted search engine that works across all the various projects and docs and plugins. The details get fairly technical but there’s plenty about this kind of project that leaves me feeling pretty hopeful. All too often it’s easy to feel powerless in the face of how much control Big Tech exerts over our daily lives. It’s wonderful to be reminded this isn’t destiny.