The PBMP: Getting Started

This is the first in a series of posts – frequent, but unlikely to be regular – discussing the work we are doing to make the bibliography of Pompeian scholarship searchable via a map. The Pompeii Bibliography and Mapping Project (PBMP) is committed to this task and now has the resources to complete it, thanks to generous funding from both an American Council of Learned Societies Digital Innovation Fellowship and a National Endowment for the Humanities Digital Humanities Start-Up Grant. What the PBMP is and who we are can be found elsewhere on our website, so I won’t repeat that information here. Instead, what follows are some preliminary ruminations on our goals and what we have accomplished in our first month of operation.

Two of the three elements of the PBMP – the bibliographic catalog/subject repository and the online GIS – are already online in beta form and linked together through only the most basic toponymic information. Replacing both of these demos has been the focus of much of our first weeks of work. The bibliographic catalog is based on the incredible compilation of citations published by Laurentino Garcia y Garcia in his three-volume Nova Biblioteca Pompeiana, which has also recently been released online as a PDF. Our continuing work to digitize this resource was led last year by UMass student Jackson Mitchell with funding through a Project Funds Grant from the UMass College of Humanities and Fine Arts. Mitchell is continuing his leadership role this year, working with Kevin Nguyen (among others, some named below) to proof the catalog data and to add essential metadata.

The Nova Biblioteca Pompeiana’s collection is current only through 2011, however, and new scholarship is constantly being published. One challenge, therefore, is to bring the catalog up to date and to find new sources as they are produced. It dawned on me that libraries and research databases are able to learn about new books quickly. How do they do it? In addition to this line of investigation, Leslie Bradshaw (UMass Digital Humanities Initiative Graduate Assistant) is researching programmatic and crowdsourcing solutions so that the PBMP can keep current with the continually expanding list of publications on Pompeii. When we learn answers to these questions, they will be published on this blog, which is carefully managed by Chris Caro, the PBMP’s IT assistant.

We are also actively expanding the number of full-text documents that can be linked to these citations. UMass Librarian Annette Vadnais has been working with the HathiTrust to gain access to their vast corpus of Google’s scanned books. Additional works, mainly from the Internet Archive, are being added to our collections through a collaboration with the UMass Center for Intelligent Information Retrieval’s (CIIR) A Million Scanned Books and Proteus projects. Digitization of some works of particular topographic importance is also underway in-house. In 2010, the PBMP was awarded one of the initial UMass Digital Humanities Initiative Seed Grants, which funded work on our bibliographic catalog and the purchase of an Atiz BookDrive Mini book scanner. Currently, the Pompei: Pitture e Mosaici volumes are being scanned by Tess Brickley. At the same time, Danielle Dyer is parsing the data in the OCR text from our scans of the Corpus Topographicum Pompeianum into a database of individual property records. Later, this database will be supplemented (including a concordance) by property information from other sources, such as Eschebach’s Gebäudeverzeichnis und Stadtplan der antiken Stadt Pompeji, Astrid Schoonhoven’s Metrology and Meaning in Pompeii (Appendix I), and Damian Robinson’s The Shape of Space in Pompeii: Studies in the Social Production of a Roman Urban Landscape (Appendix, unpublished diss., Univ. of Bradford). Danielle is also correcting the scanned concordance of place names and addresses (“The Toponymy”, CTP II, pp. 1-203). These efforts to generate authority lists of the Pompeian topography will be essential to the task of making our full-text corpus searchable and connected to our GIS map. This work will begin in earnest this spring, as we apply Natural Language Processing techniques to our digitized catalog under the direction of Prof. David Smith.

Last week, Annette and I also had a fascinating and important meeting with Laura Quilter, UMass Copyright and Information Policy Librarian, to discuss copyright and what content the PBMP can and cannot make available. I am inspired to think this topic needs a post of its own, but two important points can be shared here. The first point is that the last three-quarters of the 20th century is a “copyright black hole” (my descriptor, not Laura’s). That is, between our current open access boom and 1938, there is a period of roughly seventy years of digital silence. Within this period, copyright holders (mostly publishers) can prevent the wide redistribution of the important information within those titles. A second point potentially ameliorates the impact of the first: authors can petition publishers to release their copyright if the work is no longer in print and cannot be purchased for a reasonable price. The publisher has the option to reprint the work, but if it chooses not to, the copyright must go back to the author. Later in the life of the PBMP, we will certainly pursue a strategy of contacting authors and publishers to get the 20th century to rejoin the 21st, as well as the 18th and 19th centuries, in terms of accessibility.

Fig. 1. Location of 2006 survey points, from Morichi, Panone, Rispoli, and Sampaolo (2006, 554-555).

The second major component of the PBMP is our online GIS map. Digitization of Pompeii’s landscape, in its most basic form, is complete and already online. The digitization process, however, was done in piecemeal fashion and for a separate purpose. These files served as the underlying GIS for my dissertation on Pompeii’s traffic system. Because the spatial data for the PBMP was born in dissertation research, the current GIS is overly specialized in some areas and lacks detail in others. Several subsequent publication projects (e.g. on stables and on the drainage system) also added new datasets of general (e.g. a new contour map of local topography) and very specific (e.g. the locations of street blockages) interest. The first task was therefore to inventory the hundreds of files (and their versions) in a dozen file formats, spread across three computers and six external drives, to find out what we are missing. That inventory is shared here as a Google Spreadsheet.

Fig. 2. Example of SAP survey point sheet (ST 60).

Even the very first step in creating the GIS, georeferencing the Soprintendenza Archeologica di Pompei’s (SAP) AutoCAD plan of the site, was done as part of another project. In July of 2006, I was completing my charge to build an accurate 3D wireframe model of Insula VI 1 for the Anglo-American Project in Pompeii (AAPP) using a reflectorless laser theodolite. During the same period, the SAP commissioned a theodolite transit of the city and the creation of nearly 100 georeferenced survey points (figs. 1-2). We were given three of these points, and I began the process of shifting the AAPP model, joined to the SAP’s basic CAD plan of the entire city, into real-world coordinates. The process, conceptually, was very easy:

1. Capture the SAP survey points within the local AAPP grid and model.
2. Create a point in UTM coordinates in the CAD model for each of the SAP survey points.
3. Copy the joined SAP and AAPP models and paste them into UTM coordinates based on the location of one of the points in step #2.
4. Rotate the entire model so that a second point in the original model is moved to match the location of a second point in step #2.
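
For readers who think in code, the move-then-rotate procedure can be sketched in a few lines of Python. This is only an illustration of the geometry, not the AutoCAD workflow actually used: the function names and every coordinate below are hypothetical, and the sketch assumes a flat 2D plan with no scaling.

```python
import math

def translate_and_rotate(points, local_a, local_b, target_a, target_b):
    """Shift points so local_a lands on target_a, then rotate about target_a
    so the direction local_a -> local_b matches target_a -> target_b."""
    dx = target_a[0] - local_a[0]
    dy = target_a[1] - local_a[1]
    # Rotation angle is the difference between the two bearings.
    bearing_local = math.atan2(local_b[1] - local_a[1], local_b[0] - local_a[0])
    bearing_target = math.atan2(target_b[1] - target_a[1], target_b[0] - target_a[0])
    theta = bearing_target - bearing_local
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    moved = []
    for x, y in points:
        # Translate, then express the point relative to the pivot (target_a).
        rx = x + dx - target_a[0]
        ry = y + dy - target_a[1]
        moved.append((target_a[0] + rx * cos_t - ry * sin_t,
                      target_a[1] + rx * sin_t + ry * cos_t))
    return moved

def snap_residual(local_a, local_b, target_a, target_b):
    """Distance by which the second point still misses its target after the move."""
    bx, by = translate_and_rotate([local_b], local_a, local_b, target_a, target_b)[0]
    return math.hypot(bx - target_b[0], by - target_b[1])

# Hypothetical coordinates, chosen only to make the arithmetic easy to follow.
local_a, local_b = (0.0, 0.0), (10.0, 0.0)
target_a, target_b = (100.0, 200.0), (100.0, 210.0)
moved = translate_and_rotate([(10.0, 0.0), (0.0, 5.0)], local_a, local_b, target_a, target_b)
```

Because the transform never rescales the model, the second point lands exactly on its target only when the two measured distances agree. If the target distance were 10.1 instead of 10.0, `snap_residual` would report a miss of about 0.1.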

Fig. 3. Angular distortion across insula VI 1.

The actual transformation procedure, however, became relatively complex due to the differing levels of precision among GPS readings (+/- 10 cm), electronic survey measurements (+/- 1 cm), and the infinite scalability of AutoCAD. Thus, although the model could be moved to the exact location of any single point (step #3), the difference in precision meant that the model could not be rotated and “snapped” exactly to any second point (step #4). More importantly, errors in the GPS points’ positions were compounded in the rotation procedure, expanding over distance and ensuring that the southern and eastern portions of the city would suffer the greatest distortions (figs. 3-4). Fixing the errors in position (and projection) is one of the initial tasks the PBMP will undertake, led by UMass senior GIS analyst Alexander Stepanov.
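
To give a rough sense of how such an error expands over distance, here is a small Python sketch. It assumes the distortion behaves as a pure rotation about a fixed point, which is a simplification, and it uses the worst case reported for insula VI 1 in fig. 4 (0.26 m over 51.36 m). The 1000 m distance is a hypothetical round number, not a measurement of the city.

```python
import math

def angular_error_deg(offset_m, baseline_m):
    """Angle (in degrees) implied by a lateral offset at the end of a baseline."""
    return math.degrees(math.atan2(offset_m, baseline_m))

def offset_at_distance(offset_m, baseline_m, distance_m):
    """Lateral offset at another distance, assuming the same rotation error."""
    return distance_m * offset_m / baseline_m

# Worst case from fig. 4: 0.26 m over 51.36 m.
angle = angular_error_deg(0.26, 51.36)
# Hypothetical distance of 1000 m from the pivot point.
far_offset = offset_at_distance(0.26, 51.36, 1000.0)
```

Under these assumptions the rotation error is a little under a third of a degree, which sounds negligible but grows to roughly five meters at a kilometer from the pivot.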

Fig. 4. Detail of angular distortion. Error is as much as 0.26m over 51.36m.

Following the correction of positional errors, the next task will be to assess the internal consistency of the spatial data. All the city’s features not found in the original CAD plan – for example, polygons of the streets and street features, unexcavated areas, gates and fortifications, fountains and water towers, as well as the individual insulae and properties – were hand-drawn by me over the course of a decade. Minor inconsistencies due to (my) human error in the shapes of features and especially in how those features interact with others – by overlapping, containing, forming a boundary, etc. – must be identified and corrected. In the course of this work we will also establish our basic topological rules for the data, defining how objects can and cannot function in the GIS. One obvious rule will be that properties cannot be overlapped by other properties, but others will also be devised.
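
As an illustration of what checking the “no overlapping properties” rule might look like, here is a minimal Python sketch. It is not the tooling the PBMP will use: it assumes every property is a convex polygon (real properties often are not), it relies on a separating-axis test, and the property IDs and coordinates are invented. Polygons that merely share a wall or a corner are treated as valid neighbors.

```python
def _project(polygon, axis):
    """Return the (min, max) extent of a polygon projected onto an axis."""
    dots = [x * axis[0] + y * axis[1] for x, y in polygon]
    return min(dots), max(dots)

def interiors_overlap(poly_a, poly_b, eps=1e-9):
    """Separating-axis test for two CONVEX polygons given as vertex lists.
    Shared edges or corners do not count as overlap."""
    for polygon in (poly_a, poly_b):
        n = len(polygon)
        for i in range(n):
            x1, y1 = polygon[i]
            x2, y2 = polygon[(i + 1) % n]
            axis = (y1 - y2, x2 - x1)  # normal to this edge
            min_a, max_a = _project(poly_a, axis)
            min_b, max_b = _project(poly_b, axis)
            if max_a <= min_b + eps or max_b <= min_a + eps:
                return False  # a gap (or mere contact) exists on this axis
    return True

def find_overlaps(properties):
    """Return sorted pairs of property IDs whose interiors overlap."""
    ids = sorted(properties)
    return [(a, b)
            for i, a in enumerate(ids)
            for b in ids[i + 1:]
            if interiors_overlap(properties[a], properties[b])]

# Invented example: A and B share a wall; C was drawn across both of them.
properties = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
    "C": [(8, 8), (12, 8), (12, 12), (8, 12)],
}
violations = find_overlaps(properties)
```

In this made-up example the check flags C against both A and B while leaving the shared wall between A and B alone. A production check would need a GIS topology tool that also handles non-convex shapes.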

In the coming weeks we hope not only to detail our progress in this blog, but also to have demos and beta products to offer for testing and feedback. And, let me say it now for the first of what will be many times: we are always interested in your suggestions for what bibliographic information should be included in our catalog, what aspects of Pompeii’s urban topography should be digitized, and what functionalities the PBMP should offer to you, our community of interested users. Having read this far, your interest must be genuine indeed.