Documentation on Linking Data

With the first GIS map of Pompeii now available online, we are turning more of our attention to the problem of connecting our spatial data to our bibliographic data. While there is still some important spatial work to be done with the current map, the planning and documentation for the bibliographic integration serves as a worthwhile distraction. To that end and following a discussion last week with Alexander Stepanov, the PBMP’s GIS architect, I’ve decided to write up some very quick documentation for our data and their connections as a blog post. I’ve also decided to try something else new. Below is a Google Slide with the designs and discussions we drew on a whiteboard as the background. Over this are shapes representing files we need to link together with their names hyperlinked to their locations on the web (as hosted sites or Dropbox objects). In this way, the blog post operates in three different dimensions:

  1. As a public discussion
  2. As a living, internal document
  3. As an interface to the repository of files we’re using.

The files listed are as follows:

To start, a single file of spatial data: the Properties by Eschebach (Prop_ESCH), representing all the buildings and occupied spaces in the city. Later this will expand to include other, more generalized features of the landscape, such as the City Blocks, Gates, and Fortification Walls.

Three files from the Nova Bibliotheca Pompeiana are given here:

  1. The first 10,000 citations (GYG Citations_BIBLIOGRAPHY) completed from the NBP, as they were prepared for uploading to Zotero (and then to Omeka). This shows how the data were divided and might be recombined.
  2. A list of property addresses from the Spatial Index of the NBP (GYG Citations_INDEX). This expresses a one-to-many relationship between the address of a property and the one or more citations that relate to it.
  3. A list of addresses per citation, as extracted from the full text of the first two volumes of the NBP (GYG Citations_TEXT). This expresses a one-to-many relationship between a bibliographic citation as given by Garcia y Garcia and the one or more addresses that relate to it.

Naturally, there will be significant overlap between #2 and #3. This will reduce the total number of connections, but it also offers a chance to perform quality-control tests on the data as extracted from the NBP.
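The following is a minimal Python sketch of that combine-and-compare step. It assumes both tables can be reduced to simple mappings; the real GYG Citations_INDEX and GYG Citations_TEXT layouts may differ, and the addresses and citation numbers here are made up. Once each table is flattened into (address, citation) pairs, the union becomes the deduplicated link table. The pairs that appear in only one source become the candidates for proofing.

```python
# Hypothetical miniature versions of the two NBP-derived tables.
index_rows = {          # GYG Citations_INDEX: address -> citation numbers
    "VI 8, 3": [1201, 1544],
    "VII 8": [87],
}
text_rows = {           # GYG Citations_TEXT: citation number -> addresses
    1201: ["VI 8, 3"],
    1544: ["VI 8, 3", "VI 8, 5"],
    2090: ["VII 8"],
}

def pairs_from_index(rows):
    """Flatten address -> citations into a set of (address, citation) pairs."""
    return {(addr, num) for addr, nums in rows.items() for num in nums}

def pairs_from_text(rows):
    """Flatten citation -> addresses into the same (address, citation) form."""
    return {(addr, num) for num, addrs in rows.items() for addr in addrs}

def compare_sources(index_rows, text_rows):
    a, b = pairs_from_index(index_rows), pairs_from_text(text_rows)
    return {
        "combined": a | b,      # deduplicated link table to join to the map
        "confirmed": a & b,     # attested by both sources
        "index_only": a - b,    # candidates for manual proofing
        "text_only": b - a,
    }

report = compare_sources(index_rows, text_rows)
for name, pairs in report.items():
    print(name, len(pairs), sorted(pairs))
```

Set operations suit this step because the two sources overlap heavily. The same comparison that removes duplicates also measures how far the two extractions agree.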

If we think of this merely as a spatial data problem, the work to be done is non-trivial, but also not conceptually difficult. That is, if all we wanted to do was to connect the bibliographic data to the map so that users could click on it and access that information, the process would be straightforward: combine and proof tables #2 and #3, then join them to the spatial data of Properties by Eschebach. Indeed, that *is* our primary goal, but we also want those bibliographic citations to be linked to their full references on our other platforms (i.e., Zotero and Omeka). Moreover, we want users to be able to use search functions in the map, beyond navigating and clicking, both to find and to leverage bibliographic information. For example, we want people to be able to search for an author in the map and see the sites and buildings associated with that author highlighted. The user should then be able to create a new search from this subset of data, using either additional bibliographic criteria or spatial definitions. To make these functions possible, however, the data stored in the map cannot be only reference numbers linked out to other resources. Finally, we would eventually like searches in our bibliography to be passed to (and be responsive in) the map, so that the results of regular bibliographic searches might be visualized in the map as well as in the listing of citations.
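This hedged sketch shows why the map needs more than bare reference numbers, using the author search and refinement described above. The tables, property IDs, and author names are hypothetical placeholders. The point is that once author data sit alongside the address links, a bibliographic query returns a set of map features. A second criterion, here a spatial one, can then narrow that set.

```python
# Hypothetical stand-ins for the combined link table and citation metadata.
citations = {                      # citation number -> bibliographic fields
    1201: {"author": "Author A", "year": 1899},
    1544: {"author": "Author B", "year": 1942},
    2090: {"author": "Author B", "year": 1930},
}
links = [("VI 8, 3", 1201), ("VI 8, 3", 1544), ("VII 8", 2090)]  # (address, citation)
properties = {                     # hypothetical property IDs on the map
    "PRE000101": {"address": "VI 8, 3", "region": "VI"},
    "PRE000102": {"address": "VII 8", "region": "VII"},
}

def properties_for(predicate):
    """IDs of map features linked to any citation whose record matches predicate."""
    hits = {num for num, rec in citations.items() if predicate(rec)}
    addresses = {addr for addr, num in links if num in hits}
    return {pid for pid, p in properties.items() if p["address"] in addresses}

# First search: every feature connected to Author B (to be highlighted).
by_author_b = properties_for(lambda rec: rec["author"] == "Author B")
# Second search built on that subset, here with a simple spatial criterion.
by_author_b_in_vi = {pid for pid in by_author_b if properties[pid]["region"] == "VI"}
print(sorted(by_author_b), sorted(by_author_b_in_vi))
```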

As you can see from the image, we’ve got an outline of how we’ll do this. Nonetheless, if you are a GIS architect, a digital collections librarian, a data designer, or an all-around smart person and have an opinion on how this might be done, in whole or in part, please do email me: Pompeiana[AT]gmail.com

– EP

Pompeii: The First Navigation Map

The PBMP’s first full map for navigation is now online. You can start to explore Pompeii in the map embedded below, or go to the full site for more space and options. If you want to customize the map or make a presentation from it, sign in to / sign up for your ArcGIS Online account and save a copy to your own webspace. The link is at the upper right of the embedded map page. Below the map is additional information about the files, the information they contain, and their display.

The “Pompeii: Navigation Map” is essentially a set of nested tiles that change the display of the city as one zooms in and out to change the scale of the map. Overlying these is a series of vector-based files, which are used almost exclusively as invisible, data-rich layers. That is, the transparency of many of the files is set to 100%, so that the information about Pompeii those files hold can be accessed (via a pop-up window) but their rendering does not slow the loading of the map.

Users may find the information in the following files to be of interest:

Data-Rich Layers

Elevation Points: This layer is turned off by default and is set not to appear until the view scale reaches 1:2500. It holds above-sea-level elevation data at 5 cm or 10 cm resolution from multiple sources: Corpus Topographicum Pompeianum (1984); De Caro, S. (1979); Eschebach and Müller-Trollius (1993); Etani et al. (2003); Pompeii Archaeological Research Project: Porta Stabia.

Eschebach ALL: (West & East). Due to the number of features in the original file (Properties by Eschebach), the file was split in two along the via Stabiana. The user should notice little difference. There are, however, some significant issues to be aware of in the spatial consistency of properties for those interested in the area of individual features. Because the properties were drawn to express the functional categories assigned by Eschebach and not the contiguous physical boundaries of the building, there are overlaps, gaps, and duplications in the data. We are working to improve these data. For the moment, caveat emptor. These files do, however, contain information of importance to researchers, including:

  1. Address of the Primary Door according to Eschebach (1970; 1993).
  2. Functional Category according to Eschebach (1970; 1993).
  3. A link to an image of the property at Pompeii in Pictures.
  4. The Date(s) of excavation according to the Corpus Topographicum Pompeianum (1984).
  5. Area of the property in square meters.

PBMP CTP (Features): The 628 properties in this file represent the properties described in the “Structures” section of the Corpus Topographicum Pompeianaum (1983), pars. II. This layer contains information of importance to researchers, including:

  1. Address of the Primary Door.
  2. Page number of the information in the Corpus Topographicum Pompeianaum (1983), pars. II.
  3. All known names of the property: Name (1) – Name (15).
  4. Bibliographic reference for each known name of the property: Ref. Name (1) – Ref. Name (15).
  5. A link to an image of the property at Pompeii in Pictures.
  6. Area of the property in square meters.

Display Layers

Fortification Walls: Sixteen sections of Pompeii’s extant fortifications, between the defensive towers and city gates, are shown and named.

Defensive Towers: Eleven of the twelve known (by inscription) defensive towers surrounding Pompeii are shown and named.

Gates: The seven known gates to Pompeii are shown and named.

Unexcavated Areas: Three primary areas not yet excavated (in Regions I, III, IV, V, and IX) are shown and named, as well as isolated areas along the interior of the fortification walls.

City Blocks: The excavated extent of the city blocks (insulae) is shown and labeled.

Streets: There are 97 streets and passage areas represented in this file, with the extent of each street and its name given according to modern conventional nomenclature (in Italian).

Alleys: Six passages within city blocks and disconnected from the street network are shown and named.

Sidewalks: The excavated extent of the pedestrian sidewalks is shown.

Stepping-Stones: The 316 known stepping-stones within the street network are shown and named.

Forum: The forum, though also given a designation as a city block (VII 8), is shown here as its own feature.

Water Towers: The twelve water towers are located and labeled according to the nomenclature established in Larsen, 1982.

Fountains: Thirty-four public fountains are shown, including both the complete footprint of each fountain and its interior basin; they are symbolized to show the basin with water and are named.

Projected City Blocks (insulae): This layer is turned off by default and shows 46 extrapolated city blocks that remain partially or completely unexcavated. Some areas are almost certainly accurate (Regions I, III, and IX), while others are somewhat more speculative (Regions IV and V).

PBMP CTP (Tiles): This layer is turned off by default and represents the location of the 628 properties described in the “Structures” section of the Corpus Topographicum Pompeianaum (1983), pars. II. This layer visualizes the locations in the PBMP CTP (Features) layer, but does not contain that layer’s attribute data.

Mapping the Mapping Project’s Design

On Wednesday of last week I had a fantastically productive meeting with my colleague and GIS architect for the PBMP, Alexander Stepanov. In about an hour we defined the current state of our mapping project, reconceived and reified how the GIS would move forward, and established how it will function as the “pivot” for the other elements of the PBMP. Below is an image of the whiteboard we marked up over that hour:

PBMP_MappingMeeting_2.26.2014

Much of this was already in our heads or sketched in broader terms in the notes of other meetings. What was different in this meeting was the previous four months of work to understand the production of the spatial data and to describe it in a specific and unique nomenclature. With this new foundation and confidence it was easier to define the phases of development and know how to realize them. From the image you might see these development phases of the GIS scribbled in my impenetrable handwriting. Here they are with a bit more detail:

Phase One: A Basic Map for Navigation. The first step in our plan (and the primary functionality intended for our GIS) is a map that allows one to effortlessly move across the landscape of Pompeii, to shift scales, to add and remove data layers, and to access the basic descriptive data of those layers. As I mentioned previously, we have focused on a subset of our spatial data for the Navigation Map. These layers are largely topographic rather than interpretive in their content:

GIS Name (Prefix_short-Name_Extent_Type_Version) | Alias                           | Code
-------------------------------------------------+---------------------------------+-----
PMBP_Alley_City_Poly_001                         | Alleys                          | ALS
PMBP_FtWall_City_Poly_001                        | Fortification Walls             | FTW
PMBP_Curb_City_Poly_001                          | Curbstones                      | CBS
PMBP_Elev_City_Pnts_001                          | Elevation Points                | ELP
PMBP_Forum_City_Poly_001                         | Forum                           | FRM
PMBP_Fount_City_Poly_001                         | Fountains                       | FNT
PMBP_Gates_City_Poly_001                         | Gates                           | CGT
PMBP_Ins_City_Poly_001                           | Intersections                   | INT
PMBP_PropWalls_City_Line_001                     | Property Walls (Muri)           | PRW
PMBP_NSS_City_Poly_001                           | Narrowing Stones                | NSS
PMBP_ProjIns_City_Poly_001                       | Projected City Blocks (insulae) | PJI
PMBP_ProjInt_City_Poly_001                       | Projected Intersections         | PJT
PMBP_ProjStr_City_Poly_001                       | Projected Streets               | PJS
PMBP_PropEsch_City_Poly_001                      | Properties by Eschebach         | PRE
PMBP_RutsTsuji_City_Poly_001                     | Ruts by Tsujimura               | RTS
PMBP_Sidwlk_City_Poly_001                        | Sidewalks                       | SDW
PMBP_SSS_City_Poly_001                           | Stepping-Stones                 | SSS
PMBP_Str_City_Poly_001                           | Streets                         | STR
PMBP_UnEx_City_Poly_001                          | Unexcavated Areas               | UNA
PMBP_WatTow_City_Poly_001                        | Water Towers                    | WTS

Embedded in the nomenclature are elements of the data structure itself, including the data’s producer, an abbreviated indication of the content, its scale, the geometry type, and a version number. Thus, the file name “PBMP_Forum_City_Poly_001” expresses that the file was made by the PBMP, encompasses the area of the forum, operates at the scale of the city, is a polygon, and is version 1. A more human-readable alias is also included, as is a unique three-letter code that serves as a prefix in the IDs of individual objects within that layer. In this case, there is only one Forum at Pompeii, so the polygon named FRM000001 describing it is the only feature in that layer.
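The naming convention is regular enough to be checked by machine. The Python sketch below is an illustration rather than PBMP tooling. The regular expression is one reading of the Prefix_short-Name_Extent_Type_Version pattern. The six-digit serial in feature IDs is inferred from the single example FRM000001.

```python
import re

# Assumed pattern for Prefix_short-Name_Extent_Type_Version.
LAYER_NAME = re.compile(
    r"^(?P<producer>[A-Z]+)_(?P<content>[A-Za-z]+)_(?P<extent>[A-Za-z]+)"
    r"_(?P<geometry>[A-Za-z]+)_(?P<version>\d{3})$"
)

def parse_layer_name(name):
    """Split a layer file name into its components, or raise ValueError."""
    m = LAYER_NAME.match(name)
    if m is None:
        raise ValueError(f"not a valid layer name: {name!r}")
    parts = m.groupdict()
    parts["version"] = int(parts["version"])
    return parts

def feature_id(code, n):
    """Feature ID: three-letter layer code plus a zero-padded six-digit serial."""
    if not (len(code) == 3 and code.isalpha() and code.isupper()):
        raise ValueError(f"layer code must be three capital letters: {code!r}")
    return f"{code}{n:06d}"

print(parse_layer_name("PBMP_Forum_City_Poly_001"))
print(feature_id("FRM", 1))
```

A check like this can flag files whose names drift from the convention before they are published.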

To finish the Navigation Map we have only a few simple tasks to complete. These include finalizing the metadata description of each layer and defining which parts of that metadata should be displayed when the user accesses it via the map (the “identify tool”). The last task is globally adjusting the position of our files to overlie publicly available satellite imagery. This final task is a kind of compromise between absolute accuracy and usability. Even a perfectly precise positioning of our Pompeii data would likely bring it closer to the satellite imagery without exactly overlying it. It is therefore preferable to produce our data in a way that better meets user expectations and better integrates with other applications. If we have to be “wrong”, let’s be wrong in the right direction.
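The post does not say whether rotation or scale is involved in this adjustment. If it is a pure shift, the least-squares translation that best aligns a set of matched control points is simply their mean offset. The sketch below works under that assumption. It uses made-up coordinates, in the same projected system, standing in for points identified in both our data and the imagery.

```python
# Hypothetical control points (metres): our data vs. the same spots in imagery.
ours = [(100.0, 200.0), (400.0, 250.0), (250.0, 500.0)]
imagery = [(101.2, 198.9), (401.0, 249.1), (251.3, 498.8)]

def best_fit_shift(src, dst):
    """Least-squares translation: the mean of the per-point offsets."""
    n = len(src)
    dx = sum(d[0] - s[0] for s, d in zip(src, dst)) / n
    dy = sum(d[1] - s[1] for s, d in zip(src, dst)) / n
    return dx, dy

def apply_shift(points, shift):
    """Move every point by the same (dx, dy)."""
    dx, dy = shift
    return [(x + dx, y + dy) for x, y in points]

shift = best_fit_shift(ours, imagery)
shifted = apply_shift(ours, shift)
residuals = [((sx - ix) ** 2 + (sy - iy) ** 2) ** 0.5
             for (sx, sy), (ix, iy) in zip(shifted, imagery)]
print(shift, [round(r, 3) for r in residuals])
```

The residuals that remain after the shift show how much of the mismatch a translation alone cannot absorb.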

Phase Two: An Information / Interpretation Map. The landscape of Pompeii will become far richer as we begin to add illustrative and academic information about each of the objects in the map. We already link each property to Pompeii in Pictures. Adding information on each property from the Corpus Topographicum Pompeianum (CTP) will offer a full listing of all the names given to these properties, as well as a basic bibliography for each. Such bibliographic content, together with the spatial index provided by Garcia y Garcia in the Nova Bibliotheca Pompeiana, will provide the first connection to our catalog of citations (see below for more on this). In Phase Two we will also include functional interpretations. City-wide interpretive data come from the CTP and, of course, from the Eschebach plan of Pompeii. The 1993 update of this plan in Liselotte Eschebach’s Gebäudeverzeichnis und Stadtplan der antiken Stadt Pompeji also contains additional information about each property, such as dates of excavation, finds, and decoration, as well as additional bibliography. We are in the process of adapting this information as well. To provide more up-to-date functional information, we will also include the published work of scholars who have focused on specific properties or types of properties, such as bars (Ellis), fullonicae (Flohr), or bakeries (Monteix). Finally, to increase the spatial resolution of the map, we are creating room-level data based on the definitions and nomenclature in the Pompei: Pitture e Mosaici volumes. Producing spatial data at this level will let us attach research data in a more specific and powerful way, such as the finds-by-room data produced by Penelope Allison.

All of this work will require an investment of time to ensure the spatial and descriptive integrity of every building in the ancient city. On the descriptive side, this work will start by getting each building’s address correct and associating it with all previous addresses. Luckily, the CTP has published concordances from which to work. On the spatial side, there is no such index. Each building in our map will need to be examined and compared with previous maps. This ensures that when we attach functional or interpretive data to a property, that property has the expected shape. Minor differences, especially in the interior of a single building, will be described in the metadata and illustrated in georeferenced plans (when possible). For major differences, a new polygon will be drawn to reflect the different interpretations of a building’s shape. A faithful representation of multiple interpretations is appropriate, but it will also require further attention to how different elements of the map interact with one another. The rules governing these interactions are called topology.

Phase Three: A Query Map.

Topological rules are the basis for one of the most powerful aspects of geographical information systems: the ability to search spatially. A spatial search can be on strictly spatially descriptive attributes, such as the elevation of a point, the length of a line, or the area of a polygon. It can also be used to find non-spatial attributes attached to the same geometries: the source of the point’s elevation data, the name of the street the line defines, or the kinds of floor treatment recorded for a room’s polygon. Most importantly, the spatial relationships among geometries can also be searched. For example, one could ask whether a kind of room was found within houses of a particular size. One could also ask whether a house lay within a certain distance of another kind of property, such as an inn or bakery. This valuable kind of search depends upon three components:

  1. How well defined the physical shape of Pompeii is;
  2. How much academic information we can attach to those shapes;
  3. How carefully and precisely we define the topological rules.

Part one is well underway as our Navigation Map. Part two is growing, but always needs more information. Help from the community is ALWAYS desired. Part three, the Queryable Map, is in the very near future.
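The Python sketch below is a small, library-free illustration of the kind of query described in Phase Three. It finds houses above a given area that lie within a given distance of a bakery. The features are hypothetical rectangles. For simplicity, distance is measured between polygon centroids; a full GIS would measure polygon-to-polygon distance and use its own spatial index. The area function is the standard shoelace formula, one way an attribute like “Area of the property in square meters” can be derived from a polygon.

```python
from math import dist

def area(poly):
    """Shoelace formula; poly is a list of (x, y) vertices in metres."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2

def centroid(poly):
    """Area-weighted centroid of a simple (non-self-intersecting) polygon."""
    a = cx = cy = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        cross = x1 * y2 - x2 * y1
        a += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    a /= 2
    return cx / (6 * a), cy / (6 * a)

def rect(x, y, w, h):
    """Axis-aligned rectangle as a counter-clockwise vertex list."""
    return [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]

# Hypothetical features: ID -> (function, polygon)
features = {
    "P1": ("house", rect(0, 0, 30, 20)),     # 600 m2
    "P2": ("house", rect(100, 0, 10, 10)),   # 100 m2
    "P3": ("bakery", rect(40, 0, 10, 10)),
    "P4": ("house", rect(300, 0, 40, 20)),   # 800 m2, far from the bakery
}

def large_houses_near(kind, min_area, max_dist):
    """Houses of at least min_area whose centroid is within max_dist of a `kind`."""
    targets = [centroid(p) for f, p in features.values() if f == kind]
    return sorted(
        fid for fid, (f, p) in features.items()
        if f == "house" and area(p) >= min_area
        and any(dist(centroid(p), t) <= max_dist for t in targets)
    )

print(large_houses_near("bakery", min_area=500, max_dist=50))
```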

Phase Four: Pompeii’s Bibliocartography.

For the PBMP, the essential and defining query, whatever its structure, is to access accurate bibliographic information through the map. Fortunately, realizing this functionality is not a terribly complex technological problem. Because of its robust native query functions, GIS will be the primary platform for combining the different types of data. Specifically, the unique ID of any map element will be “king”: the atomic bond between tables of attribute data, catalogs of bibliographic citations, and indexes of full-text publications. An example will make this clearer. As our drawing (failingly) attempts to illustrate, an individual map element, in this case the polygon of a house in Pompeii, will have descriptive information attached to it. The most important piece of that information will be the name of the house, or rather the many names given to a house over the centuries since its excavation. These names and their addresses will provide the handle for our processing of full-text documents. This will allow us not only to make the documents discoverable via the map, but also to make the map serve to illustrate bibliographic searches.
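One way to picture names as the “handle” for full text is a bidirectional index keyed on the map element’s ID. The Python sketch below is hypothetical: the IDs and documents are placeholders. Real matching would also have to cope with OCR errors, inflected forms, and abbreviations. The code compiles every known name of a property into one case-insensitive pattern and tags each document with the IDs it mentions. It then inverts those tags, so the same links serve both map-to-bibliography and bibliography-to-map lookups.

```python
import re

# Hypothetical attribute table: one map ID, several names for the same house.
names_by_id = {
    "PRE000101": ["House of the Tragic Poet", "Casa del Poeta Tragico"],
    "PRE000102": ["House of the Faun", "Casa del Fauno"],
}
# Hypothetical full-text snippets keyed by document ID.
documents = {
    "doc-1": "Notes on the mosaics of the Casa del Fauno.",
    "doc-2": "Remarks on the house of the tragic poet.",
    "doc-3": "A survey of the city walls.",
}

def build_matchers(names_by_id):
    """One case-insensitive, whole-phrase pattern per map ID."""
    matchers = {}
    for pid, names in names_by_id.items():
        # Longest names first, so a longer variant is preferred over a shorter one.
        alternatives = "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
        matchers[pid] = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return matchers

def link(documents, matchers):
    """Return both directions of the index: doc -> IDs and ID -> docs."""
    ids_by_doc = {doc: {pid for pid, pat in matchers.items() if pat.search(text)}
                  for doc, text in documents.items()}
    docs_by_id = {pid: set() for pid in matchers}
    for doc, pids in ids_by_doc.items():
        for pid in pids:
            docs_by_id[pid].add(doc)
    return ids_by_doc, docs_by_id

ids_by_doc, docs_by_id = link(documents, build_matchers(names_by_id))
print(docs_by_id)
```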

Connecting citations to places will require a number of approaches. As a proof of concept, in 2011 the PBMP first used the spatial index created by Garcia y Garcia for his first two volumes of the Nova Bibliotheca Pompeiana. An updated version is available online. This index specifically lists each citation number associated with a particular address in the city. As the effort of one dedicated scholar, this index is truly remarkable. As a bridge between Pompeii’s physical and publication landscapes, however, it reaches less than one quarter of the way across the gulf that divides them. Only about 25% of all the citations are given an address, and only about the same percentage of the places in Pompeii are listed. To associate more works, and to parse their contents more precisely, the PBMP is applying natural language processing techniques to all the full-text documents we can capture. Let me repeat here our call for help in growing our repository. Our authority list of Pompeii’s toponymy has been generated from its complete enumeration (more than 5000 entries) in volume II of the Corpus Topographicum Pompeianum. The collocation (and implicit disambiguation) comes from the “Numerical Index” of the same volume. Gathering the entire corpus of Pompeian scholarship in full text will take some time. In the meantime we plan to move forward with intermediate steps, including parsing title keywords, processing book indexes, and seeking community help in tagging works they have read (or written!).
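Pompeian addresses follow a regular pattern of Region, insula, and door, as in VI 8, 3. Even a simple pattern match checked against an authority list can therefore propose links. The Python sketch below is not the PBMP’s NLP pipeline. The regular expression, the normalized key format, and the tiny authority list are assumptions for illustration. The IDs are hypothetical, except the Forum’s VII 8 and FRM000001, both noted earlier in this post. The authority lookup is what discards false hits, such as a figure number that happens to look like an address.

```python
import re

# Hypothetical authority list: normalized address -> map feature ID.
AUTHORITY = {
    "VI 8, 3": "PRE000101",
    "VI 12, 2": "PRE000102",
    "VII 8": "FRM000001",
}

# Region (Roman numeral I-IX), insula number, optional door number.
ADDRESS = re.compile(
    r"\b(VIII|VII|VI|IX|IV|V|III|II|I)[\s.]+(\d{1,2})(?:\s*[,.]\s*(\d{1,3}))?\b"
)

def normalize(region, insula, door=None):
    """Render an address as 'Region insula, door', the authority-list key."""
    key = f"{region} {int(insula)}"
    return f"{key}, {int(door)}" if door else key

def extract_addresses(text, authority=AUTHORITY):
    """Find address-like strings and keep only those the authority list knows."""
    found = []
    for m in ADDRESS.finditer(text):
        key = normalize(*m.groups())
        if key in authority:
            found.append((key, authority[key]))
    return found

text = "Compare VI.8.3 with the forum (VII 8) and the house at VI 12, 2; see fig. I 4."
print(extract_addresses(text))
```

Normalizing before lookup lets differently punctuated forms (VI.8.3, VI 8, 3) resolve to the same entry.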

The Future. 

Careful viewers will have seen some notations in the image above about the future, especially concerning ways to extend the project and the multiple platforms for potential dissemination. We have always intended to have both download and upload capabilities: the ability for users to pull down our data and for the PBMP to ingest their additions, changes, and improvements. Since the project was conceived, a number of online platforms, resources, and coding practices have risen to prominence, including OpenJUMP, PostGIS, GeoJSON, and GitHub. It is our intention to keep the doors of our data open to these and future developments.

– EP

The PBMP: Getting Started

This is the first in a series of posts (frequent, but unlikely to be regular) discussing our work toward the goal of making the bibliography of Pompeian scholarship searchable via a map. The Pompeii Bibliography and Mapping Project (PBMP) is committed to this task. It now has the necessary resources to complete it, thanks to the generous funding of both an American Council of Learned Societies Digital Innovation Fellowship and a National Endowment for the Humanities Digital Humanities Start-Up Grant. What the PBMP is and who we are can be found elsewhere on our website, so I won’t repeat that information here. Instead, what follows are some preliminary ruminations on our goals and on what we have accomplished in our first month of operation.

Two of the three elements of the PBMP, the bibliographic catalog/subject repository and the online GIS, are already online in beta form. They are linked together through only the most basic toponymic information. Replacing both of these demos has been the focus of much of our first weeks. The bibliographic catalog is based on the incredible compilation of citations published by Laurentino Garcia y Garcia in his three-volume Nova Bibliotheca Pompeiana, which has also recently been released online as a PDF. Our continuing work to digitize this resource was led last year by UMass student Jackson Mitchell, with funding through a Project Funds Grant from the UMass College of Humanities and Fine Arts. Mitchell is continuing his leadership role this year, working with Kevin Nguyen (among others, some named below) to proof the catalog data and to add essential metadata.

The Nova Bibliotheca Pompeiana’s collection is current only through 2011, however, and new scholarship is constantly being published. One challenge, therefore, is to bring the catalog up to date and to find new sources as they are produced. It dawned on me that libraries and research databases are able to learn about new books quickly. How do they do it? Beyond this line of investigation, Leslie Bradshaw (UMass Digital Humanities Initiative Graduate Assistant) is researching programmatic and crowdsourcing solutions so that the PBMP can keep up with the continually expanding list of publications on Pompeii. When we learn the answers to these questions, they will be published on this blog, which is carefully managed by Chris Caro, the PBMP’s IT assistant.

We are also actively expanding the number of full-text documents that can be linked to these citations. UMass Librarian Annette Vadnais has been working with HathiTrust to gain access to its vast corpus of Google’s scanned books. Additional works, mainly from the Internet Archive, are being added to our collections through a collaboration with the UMass Center for Intelligent Information Retrieval’s (CIIR) A Million Scanned Books and Proteus projects. We are also digitizing in house some works of particular topographic importance. In 2010, the PBMP was awarded one of the initial UMass Digital Humanities Initiative Seed Grants, which funded work on our bibliographic catalog and the purchase of an Atiz BookDrive Mini book scanner. Currently, the Pompei: Pitture e Mosaici volumes are being scanned by Tess Brickley. At the same time, Danielle Dyer is parsing the data in the OCR text from our scans of the Corpus Topographicum Pompeianum into a database of individual property records. Later, this database will be supplemented (including concordance) by property information from other sources. These include Eschebach’s Gebäudeverzeichnis und Stadtplan der antiken Stadt Pompeji, Astrid Schoonhoven’s Metrology and Meaning in Pompeii (Appendix I), and Damian Robinson’s The Shape of Space in Pompeii: Studies in the Social Production of a Roman Urban Landscape (Appendix, unpublished diss., Univ. of Bradford). Danielle is also correcting the scanned concordance of place names and addresses (“The Toponymy”, CTP II, pp. 1-203). These efforts to generate authority lists of the Pompeian topography will be essential to making our full-text corpus searchable and connected to our GIS map. This work will begin in earnest this spring, as we apply Natural Language Processing techniques to our digitized catalog under the direction of Prof. David Smith.

Last week, Annette and I also had a fascinating and important meeting with Laura Quilter, UMass Copyright and Information Policy Librarian. We discussed the copyright issues that determine what content the PBMP can and cannot make available. This topic probably deserves a post of its own, but two important points can be shared here. The first is that the last three-quarters of the 20th century is a “copyright black hole” (my descriptor, not Laura’s). That is, between 1938 and our current open-access boom, there is a period of roughly seventy years of digital silence. Within this period, copyright holders (mostly publishers) can prevent the wide redistribution of the important information within those titles. A second point potentially softens the impact of the first. Authors can petition publishers to release their copyright if the work is no longer in print and cannot be purchased for a reasonable price. The publisher has the option to reprint the work, but if they choose not to, the copyright must go back to the author. Later in the life of the PBMP, we will certainly pursue a strategy of contacting authors and publishers. Our aim is to make the 20th century as accessible as the 21st, and as the 18th and 19th centuries already are.

Fig. 1. Location of 2006 survey points, from Morichi, Panone, Rispoli, and Sampaolo (2006, 554-555).

The second major component of the PBMP is our online GIS map. Digitization of Pompeii’s landscape, in its most basic form, is complete and already online. The digitization process, however, was done piecemeal and for a separate purpose: these files served as the underlying GIS for my dissertation on Pompeii’s traffic system. Because the spatial data for the PBMP were born in dissertation research, the current GIS is overly specialized in some areas and lacks detail in others. Several subsequent publication projects (e.g., on stables and on the drainage system) also added new datasets. Some are of general interest (e.g., a new contour map of local topography) and some of very specific interest (e.g., the locations of street blockages). The first task was therefore to inventory the hundreds of files (and their versions), in a dozen file formats spread across three computers and six external drives, to find out what we are missing. That inventory is shared here as a Google Spreadsheet.
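A short script can bootstrap an inventory like this. The following is a minimal sketch using only Python’s standard library. It assumes each computer or drive is reachable as a folder path. The script records every file with its extension and size and counts files per format. It also flags file names that occur in more than one place, a rough first pass at finding copies and versions. Finally, it writes a CSV that could be imported into a spreadsheet.

```python
import csv
import os
from collections import Counter

def inventory(roots):
    """Walk each root folder and record every file's path, extension and size."""
    rows = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                ext = os.path.splitext(name)[1].lower() or "(none)"
                rows.append({"path": path, "ext": ext, "bytes": os.path.getsize(path)})
    return rows

def summarize(rows):
    """Count files per format (extension)."""
    return Counter(row["ext"] for row in rows)

def duplicates(rows):
    """File names that occur in more than one place: likely copies or versions."""
    names = Counter(os.path.basename(row["path"]) for row in rows)
    return {name: n for name, n in names.items() if n > 1}

def write_csv(rows, out_path):
    """Save the inventory as CSV, e.g. for import into a spreadsheet."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["path", "ext", "bytes"])
        writer.writeheader()
        writer.writerows(rows)
```

For example, run `rows = inventory(["/Volumes/Drive1", "/Volumes/Drive2"])`, then `write_csv(rows, "inventory.csv")`. The drive paths and output file name are placeholders.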

Fig. 2. Example of SAP survey point sheet (ST 60).

Even the very first step in creating the GIS, georeferencing the Soprintendenza Archeologica di Pompei’s (SAP) AutoCAD plan of the site, was done as part of another project. In July of 2006, I was completing my charge to build an accurate 3D wireframe model of Insula VI 1 for the Anglo American Project in Pompeii (AAPP) using a reflectorless laser theodolite. During the same period the SAP commissioned a theodolite transit of the city and the creation of nearly 100 georeferenced survey points (figs. 1-2). We were given three of these points. I then began the process of shifting the AAPP model, joined to the SAP’s basic CAD plan of the entire city, into real-world coordinates. The process, conceptually, was very easy:

  1. Capture the SAP survey points within the local AAPP grid and model.
  2. Create a point in UTM coordinates in the CAD model for each of the SAP survey points.
  3. Copy the joined SAP and AAPP models and paste them into UTM coordinates based on the location of one of the points from step #2.
  4. Rotate the entire model so that a second point in the original model is moved to match the location of the corresponding second point from step #2.
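Steps #3 and #4 amount to a two-point rigid transformation: a translation that pins one control point, followed by a rotation about it. The sketch below shows that arithmetic in Python, using made-up coordinates. It illustrates the geometry, not the AutoCAD commands actually used. No scaling is applied, so the second point lands exactly only if the two measured distances agree. The gap left over is the residual.

```python
from math import atan2, cos, sin, hypot, degrees

def two_point_rigid(a_local, b_local, a_utm, b_utm):
    """Build a local-to-UTM transform from two control points.

    Step #3: translate so control point A lands exactly on its UTM position.
    Step #4: rotate about A so the direction A->B matches the UTM direction.
    No scale factor is applied, so B lands exactly only if the two measured
    A-B distances agree.
    """
    theta = (atan2(b_utm[1] - a_utm[1], b_utm[0] - a_utm[0])
             - atan2(b_local[1] - a_local[1], b_local[0] - a_local[0]))
    c, s = cos(theta), sin(theta)

    def transform(p):
        dx, dy = p[0] - a_local[0], p[1] - a_local[1]
        return (a_utm[0] + c * dx - s * dy, a_utm[1] + s * dx + c * dy)

    return transform, theta

# Made-up coordinates in metres: a local survey grid and an easting/northing grid.
a_local, b_local = (0.0, 0.0), (100.0, 0.0)
a_utm, b_utm = (500000.0, 4500000.0), (500000.0, 4500100.05)

to_utm, theta = two_point_rigid(a_local, b_local, a_utm, b_utm)
bx, by = to_utm(b_local)
print(f"rotation {degrees(theta):.4f} deg, residual at B "
      f"{hypot(bx - b_utm[0], by - b_utm[1]):.3f} m")
```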

Fig. 3. Angular distortion across insula VI 1.

The actual transformation procedure, however, became relatively complex due to the differing levels of precision among GPS readings (+/- 10cm), electronic survey measurements (+/- 1cm), and the infinite scalability of AutoCAD. Thus, although the model could be moved to the exact location of any single point (step #3), the difference in precision meant that the model could not be rotated and “snapped” exactly to any second point (step #4). More importantly, errors in the GPS points’ positions were compounded in the rotation procedure. They expanded over distance, ensuring that the southern and eastern portions of the city would suffer the greatest distortions (figs. 3-4). Fixing the errors in position (and projection) is one of the initial tasks the PBMP will undertake, led by UMass senior GIS analyst Alexander Stepanov.
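We can make the growth of error with distance concrete. Suppose, purely for illustration, that Fig. 4’s 0.26 m over 51.36 m is entirely a rotational error. The implied rotation is then about 0.29°. A rotation about a fixed point moves every other point along a chord proportional to its distance from that point. The distances in the loop below are arbitrary examples, not measurements of Pompeii.

```python
from math import atan2, degrees, sin

def rotation_error(offset_m, baseline_m):
    """Rotation angle (radians) implied by an offset of offset_m measured
    perpendicular to a baseline of baseline_m from the pivot point."""
    return atan2(offset_m, baseline_m)

def displacement_at(distance_m, angle_rad):
    """Chord length: how far a rotation of angle_rad about the pivot moves a
    point distance_m away from it. Grows linearly with distance."""
    return 2 * distance_m * sin(angle_rad / 2)

angle = rotation_error(0.26, 51.36)   # figures from the Fig. 4 caption
print(f"implied rotation: {degrees(angle):.2f} degrees")
for d in (50, 250, 500, 1000):        # illustrative distances in metres
    print(f"{d:>5} m from the pivot -> {displacement_at(d, angle):.2f} m")
```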

Fig. 4. Detail of angular distortion. Error is as much as 0.26m over 51.36m.

Following the correction of positional errors, the next task will be to assess the internal consistency of the spatial data. All the city’s features not found in the original CAD plan were hand-drawn by me over the course of a decade. These include polygons of the streets and street features, unexcavated areas, gates and fortifications, fountains and water towers, as well as the individual insulae and properties. Minor inconsistencies due to (my) human error must be identified and corrected. They appear in the shapes of features, and especially in how those features interact with others (by overlapping, containing, forming a boundary, etc.). In the course of this work we will also establish our basic topological rules for the data, defining how objects can and cannot function in the GIS. One obvious rule will be that properties cannot be overlapped by other properties, but others will also be devised.
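Checking the no-overlap rule does not require anything exotic. The Python sketch below is an approximate, library-free stand-in for a real GIS topology check. It samples points on a grid over the area where two polygons’ bounding boxes intersect, and reports the pair if any sample falls inside both. Neighbours that only share an edge are generally not flagged, while duplicated polygons are. Slivers thinner than the sampling step may be missed. The polygons are hypothetical squares.

```python
def point_in_polygon(x, y, poly):
    """Even-odd ray casting; poly is a list of (x, y) vertices."""
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def bbox(poly):
    xs, ys = zip(*poly)
    return min(xs), min(ys), max(xs), max(ys)

def overlaps(p, q, step=0.1):
    """Approximate: True if some grid sample (spacing ~step m) is inside both."""
    px0, py0, px1, py1 = bbox(p)
    qx0, qy0, qx1, qy1 = bbox(q)
    x0, y0 = max(px0, qx0), max(py0, qy0)
    x1, y1 = min(px1, qx1), min(py1, qy1)
    if x0 >= x1 or y0 >= y1:        # bounding boxes only touch or are apart
        return False
    nx = max(1, int((x1 - x0) / step))
    ny = max(1, int((y1 - y0) / step))
    for i in range(nx):
        for j in range(ny):
            x = x0 + (i + 0.5) * (x1 - x0) / nx    # cell centres, never on the box edge
            y = y0 + (j + 0.5) * (y1 - y0) / ny
            if point_in_polygon(x, y, p) and point_in_polygon(x, y, q):
                return True
    return False

def overlapping_pairs(properties, step=0.1):
    """All pairs of property IDs that break the no-overlap rule."""
    ids = sorted(properties)
    return [(a, b) for k, a in enumerate(ids) for b in ids[k + 1:]
            if overlaps(properties[a], properties[b], step)]

# Hypothetical 10 m squares: B shares an edge with A, C overlaps both, D duplicates A.
squares = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
    "C": [(5, 5), (15, 5), (15, 15), (5, 15)],
    "D": [(0, 0), (10, 0), (10, 10), (0, 10)],
}
print(overlapping_pairs(squares, step=0.5))
```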

In the coming weeks we hope not only to detail our progress in this blog, but also to have demos and beta products to offer for testing and feedback. And let me say it now, for the first of what will be many times: we are always interested in your suggestions. Tell us what bibliographic information should be included in our catalog and what aspects of Pompeii’s urban topography should be digitized. Tell us, too, what functionalities the PBMP should offer to you, our community of interested users. Having read this far, your interest must be genuine indeed.

EP