FreeUKGen's open source/open data system for crowdsourcing transcription of structured manuscript material
We're building a general-purpose, open-source tool for crowdsourced transcription of structured manuscript data into a searchable database.
We're basing our system on the Scribe tool developed for the Citizen Science Alliance for What's the Score at the Bodleian, which grew out of their experience building OldWeather and other citizen science sites.
There are four parts to the system:
- A new tool, MyopicVicar, for loading image sets into the Scribe system and attaching them to data-entry templates.
- Modifications to the Scribe system to handle our volunteer organization's workflow, plus some usability enhancements.
- A publicly-accessible search-and-display website to mine the database created through data entry.
- A reporting, monitoring, and coordinating system for our volunteer supervisors.
We also plan to add support for geocoding during transcription and GIS support within the search and display system. Currently, initial development of the first part (MyopicVicar) is mostly finished, and work is moving on to the second and third parts listed above.
Although this tool is focused on support for parish registers and census forms, we are intent on creating a general-purpose system for any tabular/structured data. Scribe's data-entry templates are defined in its database, and different templates can be assigned to different images or sets of images. As a result, we can use a simple template for a 1750 register of burials or a much more complex template for an 1881 census form. Since each transcribed record is linked to the section of the page image it represents, we can display the facsimile version of a record alongside its transcript in a list of search results, or get fancy and pre-populate a transcriber's form with frequently repeated information like months or birthplaces.
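To make the template idea concrete, here is a minimal sketch in plain Ruby of how templates, image assignments, and region-linked records could fit together. Every name in it (`Field`, `Template`, `Region`, `Record`, `assign_template`, `suggested_value`, and the sample file names and values) is hypothetical and chosen for illustration; none of it is taken from Scribe's or MyopicVicar's actual schema, which lives in the database rather than in structs like these.

```ruby
# Hypothetical data model; these names are illustrative, not Scribe's schema.
Field    = Struct.new(:key, :label)
Template = Struct.new(:name, :fields)
Region   = Struct.new(:x, :y, :width, :height) # rectangle on the page image, in pixels
Record   = Struct.new(:image_id, :region, :transcript)

# Map each image in a set to the template chosen for that set.
def assign_template(assignments, template, image_ids)
  image_ids.each { |id| assignments[id] = template }
  assignments
end

# Most frequently entered value for a field, usable as a form default.
# Returns nil when no record has a value for that field.
def suggested_value(records, key)
  counts = Hash.new(0)
  records.each do |record|
    value = record.transcript[key]
    counts[value] += 1 unless value.nil?
  end
  best = counts.max_by { |_value, count| count }
  best && best.first
end

# A simple template and a more complex one (made-up fields).
burials = Template.new("1750 burials",
                       [Field.new(:name, "Name"), Field.new(:month, "Month")])
census  = Template.new("1881 census",
                       [Field.new(:name, "Name"), Field.new(:age, "Age"),
                        Field.new(:birthplace, "Birthplace")])

assignments = {}
assign_template(assignments, burials, ["reg-001.jpg", "reg-002.jpg"])
assign_template(assignments, census, ["cen-001.jpg"])

# Each record keeps the region of the page image it was transcribed from.
records = [
  Record.new("cen-001.jpg", Region.new(40, 120, 900, 32),
             { name: "Ann Lee", age: "34", birthplace: "Leeds" }),
  Record.new("cen-001.jpg", Region.new(40, 152, 900, 32),
             { name: "Tom Lee", age: "36", birthplace: "Leeds" }),
  Record.new("cen-001.jpg", Region.new(40, 184, 900, 32),
             { name: "Eli Roe", age: "61", birthplace: "York" })
]

puts assignments["cen-001.jpg"].name
puts suggested_value(records, :birthplace)
```

Because each record carries its own region, a search result could crop the page image to that rectangle and show it next to the transcript, and `suggested_value` illustrates one simple way a form default might be chosen from earlier entries.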
Under the guidance of Ben Laurie, the trustee directing the project, we are committed to open source and open data. We're releasing the source code under an Apache license and planning to build API access to the full set of record data.
How to Help
We need your help to make this the best project it can be, and there are plenty of ways to contribute:
- Fix a bug! Check out the issues list on our fork of Scribe and see if anything looks intriguing.
- Join the mailing list!
- Tackle a feature from our database of user stories (click the Icebox link for the issues discussed so far).
- Design a site! We want to build the best record-search database on the web, so if you can put together your vision of what that looks like, we want to know!
FAQ
- Why should I help FreeREG/FreeCEN? I don't even have any ancestors from the British Isles.
We want to build a general-purpose indexing/search tool, rather than one that's specialized for British genealogy. That's why we're especially looking for people who will make the tool their own, working together to build a system that will solve any structured transcription project. We need your own project ideas, your own experience and your own goals for the tool to make it work.
- I don't know Ruby on Rails or MongoDB. How can I help?
That's okay. Currently our greatest needs don't involve Ruby on Rails or MongoDB at all. We need help coming up with mockups and site design. We need help with our front-end implementation of the search site. The Scribe codebase requires a great deal of expertise in JavaScript and DHTML. There are plenty of ways to contribute with any one of HTML, CSS, JavaScript, or Photoshop/GIMP.