FromCursiveToDatabase

An open-source suite of tools for manipulating facsimiles, crowdsourcing transcription and searching transcripts

FreeUKGen's open source/open data system for crowdsourcing transcription of structured manuscript material

We're building a general-purpose, open-source tool for crowdsourced transcription of structured manuscript data into a searchable database.

We're basing our system on the Scribe tool, developed for the Citizen Science Alliance's What's the Score project at the Bodleian. Scribe grew out of the Alliance's experience building OldWeather and other citizen science sites.

There are four parts to the system:

  1. A new tool, MyopicVicar, for loading image sets into the Scribe system and attaching them to data-entry templates.
  2. Modifications to the Scribe system to handle our volunteer organization's workflow, plus some usability enhancements.
  3. A publicly-accessible search-and-display website to mine the database created through data entry.
  4. A reporting, monitoring, and coordinating system for our volunteer supervisors.

We also plan to add support for geocoding during transcription and GIS support within the search-and-display system. Initial development of part 1 is mostly finished, and work is now moving on to parts 2 and 3.

Although this tool is focused on parish registers and census forms, we intend it to be a general-purpose system for any tabular or structured data. Scribe defines its data-entry templates in its database, and different templates can be assigned to different images or sets of images. As a result, we can use a simple template for a 1750 register of burials and a much more complex one for an 1881 census form. Because each transcribed record is linked to the section of the page image it represents, we can display the facsimile of a record alongside its transcript in a list of search results, or, more ambitiously, pre-populate a transcriber's form with frequently repeated information such as months or birthplaces.
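To make the template and record linkage above concrete, here is a minimal sketch of the idea. It is not Scribe's or MyopicVicar's actual schema: the classes `Template`, `PageImage` and `Record`, the `repeated` field set, the `(x, y, width, height)` region tuple, and the helpers `blank_form` and `search` are all hypothetical names chosen for illustration. It shows a per-image template, a record tied to a region of its page, pre-filling repeated fields from the previous record, and a search that returns enough to crop the facsimile next to the transcript.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical, simplified model; names are illustrative, not Scribe's real schema.

@dataclass
class Template:
    name: str
    fields: list[str]                                   # columns the transcriber fills in
    repeated: set[str] = field(default_factory=set)     # fields worth pre-filling

@dataclass
class PageImage:
    url: str
    template: Template          # each image (or image set) can get its own template

@dataclass
class Record:
    image: PageImage
    region: tuple[int, int, int, int]   # assumed (x, y, width, height) of the entry on the page
    values: dict[str, str]

def blank_form(image: PageImage, previous: Record | None = None) -> dict[str, str]:
    """Start an entry form for this image, copying repeated fields from the previous record."""
    form = {name: "" for name in image.template.fields}
    if previous is not None:
        for name in image.template.repeated:
            form[name] = previous.values.get(name, "")
    return form

def search(records: list[Record], field_name: str, term: str):
    """Case-insensitive substring match; returns (transcript, image URL, region) per hit
    so a results page could show the cropped facsimile beside the transcript."""
    term = term.lower()
    return [
        (r.values, r.image.url, r.region)
        for r in records
        if term in r.values.get(field_name, "").lower()
    ]

# Usage with a made-up census row and a placeholder image URL.
census = Template("1881 census", ["name", "relation", "age", "occupation", "birthplace"],
                  repeated={"birthplace"})
page = PageImage("https://example.org/census-page.jpg", census)
first = Record(page, (40, 310, 900, 28),
               {"name": "Mary Smith", "relation": "Head", "age": "41",
                "occupation": "Laundress", "birthplace": "Norfolk Swaffham"})
print(blank_form(page, previous=first))
# {'name': '', 'relation': '', 'age': '', 'occupation': '', 'birthplace': 'Norfolk Swaffham'}
```

A 1750 burial register would simply get a different, shorter `Template` instance; nothing else in the sketch changes, which is the point of keeping templates as data rather than code.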

Under the guidance of Ben Laurie, the trustee directing the project, we are committed to open source and open data. We're releasing the source code under an Apache license and planning to build API access to the full set of record data.

How to Help

We need your help to make this the best project it can be, and there are plenty of ways to contribute:

  1. Fix a bug! Check out the issues list on our fork of Scribe and see if anything looks intriguing.
  2. Join the mailing list!
  3. Tackle a feature from our database of user stories (click the Icebox link for the issues discussed so far).
  4. Design a site! We want to build the best record-search database on the web, so if you can put together your vision of what that looks like, we want to know!

FAQ