About Nile Travelogues

Who we are

Nile Travelogues is a part of Newbook Digital Texts, a digital humanities publishing house sponsored by the University of Washington's Department of Near Eastern Languages and Civilization.

How it was made

The database was developed by Frederick Chan during an internship. He built automated methods for extracting publication data, created the database, and programmed this web application.

Data acquisition and processing

The data in Nile Travelogues originally came from Nile Notes of a Howadji by Martin Kalfatovic, a bibliography of travelogues by people who travelled to Egypt up to 1918. Some records were first entered manually into a spreadsheet, but with around 400 entries still remaining, it quickly became clear that automation was needed.

An OCRed version of the book was used to extract its contents and load them into an SQL database. Because every publication listed in Kalfatovic follows a particular format described in the preface, a Python script was written to scan the book line by line and parse each entry accordingly. The line-by-line approach was chosen because OCR text is noisy, and scanning artifacts may appear in unexpected locations. When an entry cannot be parsed, the script logs which entry it tried to parse and skips to the next one. The script successfully parsed 90% of the entries it attempted; the rest were entered manually by Dr. Sarah Ketchley, director of the Nile Travelogues Project.
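The parse, log, and skip behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the entry layout ("number. Author. Title. City: Publisher, Year.") is a hypothetical stand-in for the format given in Kalfatovic's preface, and the names split_entries and parse_entries are invented for this example.

```python
import logging
import re

logger = logging.getLogger("kalfatovic_parser")

# Hypothetical entry layout, NOT the format from Kalfatovic's preface:
#   "<number>. <Author>. <Title>. <City>: <Publisher>, <Year>."
ENTRY_START = re.compile(r"^(\d+)\.\s")
ENTRY_BODY = re.compile(
    r"^(?P<number>\d+)\.\s+(?P<author>[^.]+)\.\s+(?P<title>[^.]+)\.\s+"
    r"(?P<city>[^:]+):\s+(?P<publisher>[^,]+),\s+(?P<year>\d{4})\.$"
)


def split_entries(lines):
    """Group OCR lines into entries; a numbered line starts a new entry."""
    current = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line) and current:
            yield " ".join(current)
            current = []
        current.append(line)
    if current:
        yield " ".join(current)


def parse_entries(lines):
    """Parse each entry; log and skip any entry that OCR noise has broken."""
    parsed, failed = [], []
    for entry in split_entries(lines):
        match = ENTRY_BODY.match(entry)
        if match is None:
            logger.warning("Could not parse entry: %r", entry[:60])
            failed.append(entry)  # kept aside for manual entry
            continue
        parsed.append(match.groupdict())
    return parsed, failed


# Invented sample lines; the third entry imitates OCR damage.
sample_lines = [
    "1. Smith, John. A Winter on the Nile. London: Example Press, 1875.",
    "2. Doe, Jane. Letters from",
    "Egypt. New York: Sample House, 1890.",
    "3. Roe, R1chard ~~ Up the N!le London Example 18?5",
]
parsed, failed = parse_entries(sample_lines)
```

In this sketch a damaged entry affects only itself: the next numbered line starts a fresh entry, so one scanning artifact does not derail the rest of the run, and the failed list records what is left for manual entry.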

Book scans and other data sources

Data not available in Kalfatovic's book is acquired through other methods. For example, the genders of many travelers in the database were inferred using the Wolfram Engine's NameGender classifier function, which takes a name and guesses the gender.

Scans of the books were acquired by searching the Internet Archive for books in the Travelogues database. We get each book's IIIF manifest from the Internet Archive, which tells the IIIF reader where the scans are located on the Internet Archive's servers. The IIIF reader we use is Mirador, which provides a web component that displays book scans right here in our web app!
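To illustrate how a manifest points a reader to the scans, here is a small sketch that walks a manifest and lists the image location for each page. It assumes the IIIF Presentation API 2.x layout (sequences, then canvases, then images); which version the project actually uses is not stated here. The sample manifest and its example.org URLs are invented, and canvas_image_urls is a hypothetical helper, not part of Mirador or the Internet Archive.

```python
def canvas_image_urls(manifest):
    """Return (page label, image URL) pairs from a IIIF Presentation 2.x manifest."""
    pages = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                resource = image.get("resource", {})
                if "@id" in resource:
                    pages.append((canvas.get("label", ""), resource["@id"]))
    return pages


# Invented, heavily trimmed manifest; real manifests carry many more fields.
sample_manifest = {
    "@id": "https://example.org/iiif/sample-book/manifest.json",
    "@type": "sc:Manifest",
    "label": "Sample travelogue",
    "sequences": [
        {
            "canvases": [
                {
                    "label": "p. 1",
                    "images": [
                        {"resource": {"@id": "https://example.org/iiif/sample-book/p1.jpg"}}
                    ],
                },
                {
                    "label": "p. 2",
                    "images": [
                        {"resource": {"@id": "https://example.org/iiif/sample-book/p2.jpg"}}
                    ],
                },
            ]
        }
    ],
}

pages = canvas_image_urls(sample_manifest)
```

A reader such as Mirador does this kind of lookup itself once it is given the manifest, so the web app only needs to supply the manifest's address.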