Documentation on Linking Data

With the first GIS map of Pompeii now available online, we are turning more of our attention to the problem of connecting our spatial data to our bibliographic data. While there is still some important spatial work to be done with the current map, the planning and documentation for the bibliographic integration serves as a worthwhile distraction. To that end and following a discussion last week with Alexander Stepanov, the PBMP’s GIS architect, I’ve decided to write up some very quick documentation for our data and their connections as a blog post. I’ve also decided to try something else new. Below is a Google Slide with the designs and discussions we drew on a whiteboard as the background. Over this are shapes representing files we need to link together with their names hyperlinked to their locations on the web (as hosted sites or Dropbox objects). In this way, the blog post operates in three different dimensions:

  1. As a public discussion
  2. As a living, internal document
  3. As an interface to the repository of files we’re using.

The files listed are as follows:

A single file of spatial data to start, the Properties by Eschebach (Prop_ESCH), representing all the buildings and occupied spaces in the city. Later this will expand to include other, more generalized features of the landscape, such as the City Blocks, Gates, and Fortification Walls.

Three files from the Nova Bibliotheca Pompeiana are given here:

  1. The first 10,000 citations (GYG Citations_BIBLIOGRAPHY) completed from the NBP as they were prepared for uploading to Zotero (and then to Omeka). This shows how the data were divided and might be recombined.
  2. A list of property addresses from the Spatial Index of the NBP (GYG Citations_INDEX). This gives, as a one-to-many relationship, the address of a property and the one or more citations that relate to it.
  3. A list of addresses per citation as extracted from the full text of the first two volumes of the NBP (GYG Citations_TEXT). This gives, as a one-to-many relationship, the bibliographic citation as given by Garcia y Garcia and the one or more addresses that relate to it.

Naturally, there will be significant overlap between #2 and #3, which will reduce the total number of connections but also offer a chance to perform a quality-control test on the data as extracted from the NBP.
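A minimal sketch of the combine-and-proof step, assuming both NBP tables have already been loaded into Python dictionaries (the sample addresses and citation numbers are invented). Flattening each table into a set of (address, citation) pairs puts both one-to-many relationships into the same shape, so their overlap can be measured directly: links found in both sources are confirmed, and links found in only one are flagged for proofing.

```python
# Hypothetical sample data; real addresses and citation numbers come from the NBP.
# GYG Citations_INDEX: address -> citations (one-to-many)
INDEX = {"I 7, 1": ["1021", "2450"], "VI 8, 3": ["0877"]}
# GYG Citations_TEXT: citation -> addresses (one-to-many)
TEXT = {"1021": ["I 7, 1", "I 7, 2"], "0877": ["VI 8, 3"], "3310": ["IX 1, 20"]}


def pairs_from_index(index):
    """Flatten address -> [citations] into {(address, citation), ...}."""
    return {(addr, cit) for addr, cits in index.items() for cit in cits}


def pairs_from_text(text):
    """Flatten citation -> [addresses] into the same (address, citation) shape."""
    return {(addr, cit) for cit, addrs in text.items() for addr in addrs}


def merge_with_qc(index, text):
    a, b = pairs_from_index(index), pairs_from_text(text)
    return {
        "combined": a | b,    # every link found in either source
        "agreed": a & b,      # links confirmed by both sources
        "index_only": a - b,  # to proof against the full text
        "text_only": b - a,   # to proof against the Spatial Index
    }


result = merge_with_qc(INDEX, TEXT)
```

The "combined" set is what would be joined to the property polygons; the two "only" sets are the worklist for quality control.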

If we think of this merely as a spatial data problem, the work to be done is non-trivial but not conceptually difficult. That is, if all we wanted to do was connect the bibliographic data to the map so that users could click on it and access that information, the process would be straightforward: combine and proof tables #2 and #3, then join them to the spatial data of Properties by Eschebach. Indeed, that *is* our primary goal, but we also want those bibliographic citations to be linked to their full references on our other platforms (i.e., Zotero and Omeka). Moreover, we want users to be able to use search functions in the map, beyond navigating and clicking, to both find and leverage bibliographic information. For example, we want people to be able to search for an author in the map and have the sites and buildings associated with that author appear highlighted. The user should then be able to create a new search from this subset of data, using either additional bibliographic criteria or spatial definitions. To make these functions possible, however, the data stored in the map cannot be only reference numbers linked out to other resources. Finally, we would eventually like searches in our bibliography to be passed to, and reflected in, the map, so that the results of regular bibliographic searches can be visualized in the map as well as in the list of citations.
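The author-search-then-refine behavior described above can be sketched with plain in-memory tables. Everything here is hypothetical: the citation numbers, authors, years, and feature IDs are placeholders (the PRE prefix borrows the layer code for Properties by Eschebach), and a real implementation would run inside the GIS or a database rather than in Python dictionaries. The sketch shows why the search needs the bibliographic attributes themselves, not just reference numbers, available alongside the geometry.

```python
# Hypothetical stand-ins: citation records, (address, citation) links such as
# the combined tables #2 and #3, and a lookup from address to feature ID.
CITATIONS = {
    "1021": {"author": "Author A", "year": 1896},
    "2450": {"author": "Author B", "year": 1942},
    "0877": {"author": "Author A", "year": 1908},
}
LINKS = {("I 7, 1", "1021"), ("I 7, 1", "2450"), ("VI 8, 3", "0877")}
PROPERTIES = {"I 7, 1": "PRE000101", "VI 8, 3": "PRE000245"}


def search(predicate, within=None):
    """Return feature IDs of properties linked to at least one citation that
    satisfies `predicate`; if `within` is given, only IDs in that earlier
    result set are kept (searching from a subset)."""
    hits = set()
    for address, citation_id in LINKS:
        record = CITATIONS.get(citation_id)
        feature_id = PROPERTIES.get(address)
        if record is None or feature_id is None:
            continue  # unresolved link: a candidate for proofing
        if within is not None and feature_id not in within:
            continue
        if predicate(record):
            hits.add(feature_id)
    return hits


# Highlight everything tied to one author, then narrow to pre-1900 citations.
by_author = search(lambda r: r["author"] == "Author A")
early = search(lambda r: r["year"] < 1900, within=by_author)
```

Passing an earlier result as `within` is one simple way to model a follow-up search on a subset; a spatial criterion could be applied to the same set of feature IDs.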

As you can see from the image, we’ve got an outline of how we’ll do this. Nonetheless, if you are a GIS architect, a digital collections librarian, a data designer, or an all-around smart person and have an opinion on how this might be done, in whole or in part, please email me: Pompeiana[AT]gmail.com

– EP

Zotero: the first 10,000 (almost) citations

The first 10,000 citations about Pompeii have now been prepared and 9,956 have been uploaded to our Zotero library. Users can search the library, reorder the display, export records, produce formatted citations, and add references to their own collections. These citations still have issues in need of correction due to both human error and text character translation. We hope to improve the citations and eventually add more using the PBMP Zotero Group. Please sign up and get in touch (PompeianaATgmail.com) if you are interested.

Most of the content in the Zotero library is self-explanatory, as the redundancy of the table below demonstrates. There are, however, two fields that need some clarification to be properly used or ignored. These are:

  1. Loc. In Archive: This is the PBMP ID, a unique, sequential number assigned by the project.
  2. Call Number: This is the NBP ID, a (mostly) unique, alphanumeric designation assigned by L. Garcia y Garcia in his landmark three-volume work, Nova Bibliotheca Pompeiana. Use this number to discover further information about the author, editions of the publication, reprints, and reviews. Volumes I and II are still in print and volume III is newly available. We encourage you to encourage your library to purchase the remaining copies of these works.

 

Item header: Title of the work

Item Type: Publication format, such as book, journal article, artwork, etc.
Title: Title of the publication.
Author: Author(s) of the publication.
Series Editor: Name(s) of editor(s), or various authors (AA.VV.) if there is no specific editor.
Series: Name of publication series, if any.
Place: Place of publication.
Date: Year of publication.
# of Pages: Number of pages in the publication.
Language: Language of publication.
URL: Link to the full text of the publication.
Loc. In Archive: PBMP ID
Call Number: NBP ID
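Because the PBMP ID is unique and the NBP ID is only mostly unique, an exported copy of the library is best indexed differently on each. The sketch below assumes a CSV export whose column headers match the field labels above; an actual Zotero export may label these columns differently, and the sample rows and NBP IDs are invented.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export; column names mirror the field labels above (an assumption).
SAMPLE = """Title,Loc. In Archive,Call Number
First work,1,A1
Second work,2,A2
Reprint of first work,3,A1
"""


def build_indexes(text):
    by_pbmp = {}                # PBMP ID is unique: exactly one record per ID
    by_nbp = defaultdict(list)  # NBP ID is only mostly unique: keep every match
    for row in csv.DictReader(io.StringIO(text)):
        pbmp_id = int(row["Loc. In Archive"])
        if pbmp_id in by_pbmp:
            raise ValueError(f"duplicate PBMP ID {pbmp_id}")
        by_pbmp[pbmp_id] = row
        by_nbp[row["Call Number"]].append(pbmp_id)
    return by_pbmp, dict(by_nbp)


by_pbmp, by_nbp = build_indexes(SAMPLE)
# NBP IDs shared by more than one record deserve a closer look.
shared = {nbp: ids for nbp, ids in by_nbp.items() if len(ids) > 1}
```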

Pompeii: The First Navigation Map

The PBMP’s first full map for navigation is now online. You can start to explore Pompeii in the map embedded below, or go to the full site for more space and options. If you want to customize the map or make a presentation from it, sign in to / sign up for your ArcGIS Online account and save a copy to your own webspace. The link is at the upper right of the embedded map page. [Click here to download the files as a map package or as Shapefiles (with minor improvements from online version) or as an illustrator file of just the architecture]. Below the map is additional information about the files, the information they contain, and their display.

The “Pompeii: Navigation Map” is essentially a set of nested tiles that change the display of the city as one zooms in and out to change the scale of the map. Overlying these are a series of vector-based files, which are used almost exclusively as invisible, data-rich layers. That is, the transparency for many of the files is set to 100% so that the information about Pompeii those files hold can be accessed (via a pop-up window), but their rendering does not slow the loading of the map.

Users may find the information in the following files to be of interest:

Data-Rich Layers

Elevation Points: This layer is turned off by default and set not to appear until the view scale reaches 1:2500. It holds above-sea-level elevation data at 5 cm or 10 cm resolution from multiple sources: Corpus Topographicum Pompeianum (1984); De Caro, S. (1979); Eschebach and Müller-Trollius (1993); Etani et al. (2003); Pompeii Archaeological Research Project: Porta Stabia.

Eschebach ALL: (West & East). Due to the number of features in the original file (Properties by Eschebach), the file was split in two along the via Stabiana. The user should notice little difference. There are, however, some significant issues to be aware of in the spatial consistency of properties for those interested in the area of individual features. Because the properties were drawn to express the functional categories assigned by Eschebach and not the contiguous physical boundaries of the building, there are overlaps, gaps, and duplications in the data. We are working to improve these data. For the moment, caveat emptor. These files do, however, contain information of importance to researchers, including:

  1. Address of the Primary Door according to Eschebach (1970; 1993).
  2. Functional Category according to Eschebach (1970; 1993).
  3. A link to an image of the property at Pompeii in Pictures.
  4. The Date(s) of excavation according to the Corpus Topographicum Pompeianum (1984).
  5. Area of the property in square meters.

PBMP CTP (Features): The 628 properties in this file represent the properties described in the “Structures” section of the Corpus Topographicum Pompeianaum (1983), pars. II. This layer contains information of importance to researchers, including:

  1. Address of the Primary Door.
  2. Page number of the information in the Corpus Topographicum Pompeianaum (1983), pars. II.
  3. All known names of the property: Name (1) – Name (15).
  4. Bibliographic reference for each known name of the property: Ref. Name (1) – Ref. Name (15).
  5. A link to an image of the property at Pompeii in Pictures.
  6. Area of the property in square meters.

Display Layers

Fortification Walls: Sixteen sections, between the defensive towers and city gates, of Pompeii’s extant fortifications are shown and named.

Defensive Towers: Eleven of the twelve known (by inscription) defensive towers surrounding Pompeii are shown and named.

Gates: The seven known gates to Pompeii are shown and named.

Unexcavated Areas: The three primary areas not yet excavated (in Regions I, III, IV, V, and IX) are shown and named, as well as isolated areas along the interior of the fortification walls.

City Blocks: The excavated extents of the city blocks (insulae) are shown and labeled.

Streets: There are 97 streets and passage areas represented in this file, with the extent of each street and its name given according to modern conventional nomenclature (in Italian).

Alleys: Six passages within city blocks and disconnected from the street network are shown and named.

Sidewalks: The excavated extents of the pedestrian sidewalks are shown.

Stepping-Stones: The 316 known stepping-stones within the street network are shown and named.

Forum: The forum, though also given a designation as a city block (VII 8), is shown here as its own feature.

Water Towers: The twelve water towers are located and labeled according to the nomenclature established in Larsen, 1982.

Fountains: Thirty-four public fountains are shown and named, including both the complete footprint of each fountain and its interior basin, which is symbolized as filled with water.

Projected City Blocks (insulae): This layer is turned off by default and shows 46 extrapolated city blocks that remain partially or completely unexcavated. Some areas are almost certainly accurate (Regions I, III, and IX), while others are somewhat more speculative (Regions IV and V).

PBMP CTP (Tiles): This layer is turned off by default and represents the location of the 628 properties described in the “Structures” section of the Corpus Topographicum Pompeianaum (1983), pars. II. This layer visualizes the locations in the PBMP CTP (Features) layer, but does not contain that layer’s attribute data.

The Elegance (and Importance) of Ugly: the “Errorscape”

Some pretty things are actually rather ugly. Take this map (below), for example:

[Image: Gaps & Overlaps]

While it is pleasing in its shifting colors (ordered here by the area of the polygon) and even, to me, a bit mesmerizing as the eye tries also to accommodate transparent, color-coded detail along the streets, this map is replete with error.* Indeed, its job is to reveal those errors. The blue and red shapes represent all the places where one polygon improperly meets another. This can mean that a street overlaps a sidewalk for a short stretch or that a small gap exists between that sidewalk and the house or shop beside it. The image links to a very large version (5312×2938) so you can see some of these problems for yourself. They’ll still be hard to see because, despite the three-meter buffers that surround them, each overlap or gap is less than three square meters in area, and the vast majority are less than one square meter. The problems all come, sadly, from me… well, from me being a human being living in a world of finite resources and time. Each problem is a drawing error produced when I digitized the landscape of Pompeii (approaching 5,000 individual features) between 2005 and 2008. Now, in order to have a more precisely rendered landscape of Pompeii, one that both looks good as a map free of drafting errors and works well as a table with information as accurate as possible, these errors must be fixed. There are 3,023 areas of interest to examine, so this won’t be fixed overnight.

In a related problem, the georeferencing of all the data is off. That is, when we overlay it on satellite imagery, it is shifted considerably, approximately 30 m to the southwest. The issue, however, is global and appeared to be one that would require only a reprojection of the data.** We tested this using our “Architectural Features” layer, which represents all the architectural ground features (i.e., not the walls). When the original and reprojected versions of the layer were overlaid, they looked aesthetically interesting and remarkably like an archaeological phase plan.

[Image: Projection error that looks like a phase plan]

The utility of this plan is to show us the scale and direction of our reprojection error. The interesting thing about it is the way it piques the archaeological imagination. I immediately want to push the blue and the brown apart in chronology. I try to imagine how the experience of the space changed between these periods and wonder about the significance of the altar built directly above the stairs of an earlier temple. Then I remind myself: this is not a real landscape, it is a functional “errorscape”. It’s a deliberate graphical representation of errors in (ultimately, tabular) data with the purpose of defining and resolving those errors. Still, I can’t help but reflect on its attraction and pull on me, the seduction of space and time distilled to a few lines in different colors, however unreal.

The point of the map, beyond the wonder it holds for me, is to be a guide between where my data are and where I want them to be. It shows me that my spatial data must be pushed through a new projection. This testing, however, shows not only that there is an error in projection but also that there is an error in location. The reprojected layer is still about 2.20 m off from the satellite image, this time to the southeast. This error is undoubtedly again my fault, created in the original georeferencing of the data. Really, it’s only partially my fault, as there are inaccuracies both in the underlying CAD data generously shared by the Soprintendenza*** and in the GPS survey that gave the real-world coordinates. Regardless, the next step will be to adjust our coordinates to match the satellite imagery because, as I said in the last post, “if we have to be ‘wrong’, let’s be wrong in the right direction.”
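Once reprojection is handled, the remaining location error can be treated, as a first approximation, as a single global shift. A minimal sketch, assuming pairs of matching control points (a point in our data and the same point identified on the satellite image) in a projected coordinate system measured in meters, with x increasing east and y increasing north: the average difference gives the translation to apply to every coordinate. The helper names and sample points are hypothetical. This corrects translation only; any rotation or scale error would need a fuller affine or similarity transform.

```python
def estimate_offset(pairs):
    """Mean (dx, dy) that moves our points onto the reference points.

    pairs: [((x_ours, y_ours), (x_ref, y_ref)), ...] in meters (assumed)."""
    if not pairs:
        raise ValueError("need at least one control-point pair")
    n = len(pairs)
    dx = sum(ref[0] - ours[0] for ours, ref in pairs) / n
    dy = sum(ref[1] - ours[1] for ours, ref in pairs) / n
    return dx, dy


def shift(points, offset):
    """Apply the same translation to every coordinate."""
    dx, dy = offset
    return [(x + dx, y + dy) for x, y in points]


# Hypothetical control points: ours versus the satellite image.
PAIRS = [((0.0, 0.0), (1.5, -1.5)), ((10.0, 20.0), (11.5, 18.5))]
offset = estimate_offset(PAIRS)  # positive dx = east, negative dy = south
```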

-EP

 

Some more technical details:

* To make the map of gaps and overlaps, I did the following:

  1. Create a new feature for the outline of the entire city;
  2. Union all polygons together;
  3. Sort the attribute table of the resulting layer by area, descending;
  4. Select all items with an area of less than three square meters;
  5. Make a layer from the selected items and export it to a geodatabase;
  6. Enlarge this layer’s visibility using the buffer tool (3 m);
  7. Symbolize the polygons of the Union layer by area as a color ramp;
  8. Symbolize gaps/overlaps smaller than 1 m² in red and those larger than 1 m² in blue;
  9. Set their display transparency to 50%;
  10. Export the map.
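Steps 3, 4, and 8 can be sketched outside the GIS, assuming the union in step 2 has already produced the pieces as lists of vertex coordinates in meters (the union itself needs a GIS or geometry library and is not shown). Areas come from the shoelace formula; pieces under the 3 m² threshold are kept and split at 1 m² into the red and blue classes. The piece IDs and shapes are hypothetical.

```python
def polygon_area(ring):
    """Planar area of a simple polygon via the shoelace formula.

    ring: list of (x, y) vertices in meters, without repeating the first vertex."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def classify_slivers(pieces, max_area=3.0, split=1.0):
    """Sort pieces by area (descending), keep those under max_area, and
    color them red (< split) or blue (>= split)."""
    by_area = sorted(((polygon_area(r), pid) for pid, r in pieces.items()), reverse=True)
    slivers = [(a, pid) for a, pid in by_area if a < max_area]
    colors = {pid: ("red" if a < split else "blue") for a, pid in slivers}
    return slivers, colors


# Hypothetical pieces from a union: one real property and two drafting errors.
PIECES = {
    "house": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "gap": [(0, 0), (0.5, 0), (0.5, 1), (0, 1)],
    "overlap": [(0, 0), (2, 0), (2, 1), (0, 1)],
}
slivers, colors = classify_slivers(PIECES)
```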

** To test the error for reprojection, I used an unusual process: I converted the shapefiles to Google Earth KML files because I had noticed in a previous experiment that these files far more precisely overlaid the satellite imagery. Some of that work, as images, is here:

[Image: Shift via KML 1]

[Image: Shift via KML 2]

We have struggled thus far (OK, after one afternoon) to get our files to reproject appropriately in ArcGIS or in FME. All suggestions are welcome.

*** The SAP CAD file was created by digitizing the RICA Maps of Pompeii, 1984, supported by World Monuments Watch and funded by American Express. The RICA Maps were drawn in part from aerial images taken at 820 m and supplemented by on-the-ground survey. Inconsistencies between the features of this layer and satellite imagery or other maps of Pompeii may be the result of errors in the rectification of the original aerial images. On the creation of the RICA Maps, see Van der Poel, H. B., Corpus Topographicum Pompeianum, vol. IIIA. Austin, TX, 1986, XI-XIX.

Mapping the Mapping Project’s Design

On Wednesday of last week I had a fantastically productive meeting with my colleague and GIS architect for the PBMP, Alexander Stepanov. In about an hour we defined the current state of our mapping project, reconceived and reified how the GIS would move forward, and established how it will function as the “pivot” for the other elements of the PBMP. Below is an image of the whiteboard we marked up over that hour:

[Image: PBMP mapping meeting whiteboard, 2.26.2014]

Much of this was already in our heads or sketched in broader terms in the notes of other meetings. What was different in this meeting was the previous four months of work to understand the production of the spatial data and to describe it in a specific and unique nomenclature. With this new foundation and confidence, it was easier to define the phases of development and know how to realize them. From the image you might see these development phases of the GIS scribbled in my impenetrable handwriting. Here they are with a bit more detail:

Phase One: A Basic Map for Navigation. The first step in our plan (and the primary functionality intended for our GIS) is a map that allows one to effortlessly move across the landscape of Pompeii, to shift scales, to add and remove data layers, and to access the basic descriptive data of those layers. As I mentioned previously, we have focused on a subset of our spatial data for the Navigation Map. These layers are largely topographic rather than interpretive in their content:

GIS Name (Prefix_short-Name_Extent_Type_Version) | Alias | Code
PBMP_Alley_City_Poly_001 | Alleys | ALS
PBMP_FtWall_City_Poly_001 | Fortification Walls | FTW
PBMP_Curb_City_Poly_001 | Curbstones | CBS
PBMP_Elev_City_Pnts_001 | Elevation Points | ELP
PBMP_Forum_City_Poly_001 | Forum | FRM
PBMP_Fount_City_Poly_001 | Fountains | FNT
PBMP_Gates_City_Poly_001 | Gates | CGT
PBMP_Ins_City_Poly_001 | Intersections | INT
PBMP_PropWalls_City_Line_001 | Property Walls (Muri) | PRW
PBMP_NSS_City_Poly_001 | Narrowing Stones | NSS
PBMP_ProjIns_City_Poly_001 | Projected City Blocks (insulae) | PJI
PBMP_ProjInt_City_Poly_001 | Projected Intersections | PJT
PBMP_ProjStr_City_Poly_001 | Projected Streets | PJS
PBMP_PropEsch_City_Poly_001 | Properties by Eschebach | PRE
PBMP_RutsTsuji_City_Poly_001 | Ruts by Tsujimura | RTS
PBMP_Sidwlk_City_Poly_001 | Sidewalks | SDW
PBMP_SSS_City_Poly_001 | Stepping-Stones | SSS
PBMP_Str_City_Poly_001 | Streets | STR
PBMP_UnEx_City_Poly_001 | Unexcavated Areas | UNA
PBMP_WatTow_City_Poly_001 | Water Towers | WTS

Embedded in the nomenclature are elements of the data structure itself, including the data’s producer, an abbreviated indication of the content, its scale, the geometry type, and a version number. Thus, the file name “PBMP_Forum_City_Poly_001” expresses that the file was made by the PBMP, encompasses the area of the forum, operates at the scale of the city, is a polygon, and is version 1. A more human-readable alias is also included, as is a unique three-letter code that serves as a prefix in the IDs of individual objects within that layer. In this case, there is only one Forum at Pompeii, so the polygon named FRM000001 describing it is the only feature in that layer.
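The naming convention is regular enough to be parsed mechanically. A small sketch, assuming that no name component itself contains an underscore (true of every name in the table above) and that feature IDs always use six zero-padded digits, as in FRM000001; the helper names are hypothetical.

```python
from typing import NamedTuple


class LayerName(NamedTuple):
    producer: str   # e.g. PBMP
    content: str    # abbreviated content, e.g. Forum
    extent: str     # scale, e.g. City
    geometry: str   # Poly, Line, or Pnts
    version: int    # "001" -> 1


def parse_layer_name(name):
    """Split 'Prefix_short-Name_Extent_Type_Version' into its five parts."""
    parts = name.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 underscore-separated parts: {name!r}")
    producer, content, extent, geometry, version = parts
    return LayerName(producer, content, extent, geometry, int(version))


def feature_id(code, n, width=6):
    """Build an object ID: three-letter layer code + zero-padded sequence number."""
    if len(code) != 3 or not (code.isalpha() and code.isupper()):
        raise ValueError(f"layer code must be three capital letters: {code!r}")
    return f"{code}{n:0{width}d}"
```

For example, `parse_layer_name("PBMP_Forum_City_Poly_001")` returns `LayerName(producer='PBMP', content='Forum', extent='City', geometry='Poly', version=1)`, and `feature_id("FRM", 1)` returns `'FRM000001'`.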

To finish the Navigation Map, we have only a few simple tasks to complete, including finalizing the metadata description of each layer, defining which parts of that metadata should be displayed when the user accesses it via the map (“identify tool”), and globally adjusting the position of our files to overlie publicly available satellite imagery. This final task is a kind of compromise between absolute accuracy and usability. That is, because a perfectly precise positioning of our Pompeii data would likely move it closer to the satellite imagery but not exactly overlie it, it is preferable to produce our data in a way that better meets user expectations and better integrates with other applications. If we have to be “wrong”, let’s be wrong in the right direction.

Phase Two: An Information / Interpretation Map. The landscape of Pompeii will become far richer as we begin to add illustrative and academic information about each of the objects in the map. We already link each property to Pompeii in Pictures, but the addition of information on each property from the Corpus Topographicum Pompeianum (CTP) will offer a full listing of all the names given to these properties as well as a basic bibliography for each. Such bibliographic content, together with the spatial index provided by Garcia y Garcia in the Nova Bibliotheca Pompeiana, will provide the first connection to our catalog of citations (see below for more on this). In Phase Two we will also include functional interpretations. City-wide interpretive data come from the CTP and, of course, from the Eschebach plan of Pompeii. The 1993 update of this plan in Liselotte Eschebach’s Gebäudeverzeichnis und Stadtplan der antiken Stadt Pompeji also contains additional information about each property, such as dates of excavation, finds, and decoration, as well as additional bibliography. We are in the process of adapting this information as well. To provide more up-to-date functional information, we will also include the published work of scholars who have focused on specific properties or types of properties, such as bars (Ellis), fullonicae (Flohr), or bakeries (Monteix). Finally, to increase the spatial resolution of the map, we are creating room-level data based on the definitions and nomenclature in the Pompei: Pitture e Mosaici volumes. Producing spatial data at this level will offer the potential to attach research data in a more specific and powerful way, such as the finds-by-room data produced by Penelope Allison.

All of this work will require an investment of time to ensure the spatial and descriptive integrity of every building in the ancient city. On the descriptive side, this work will start by getting each building’s address correct and associating it with all previous addresses. Luckily, the CTP has published concordances from which to work. For the spatial side, there is no such index. Each building in our map will need to be examined and compared with previous maps to ensure that when we attach functional or interpretive data to a property, that property has the expected shape. For minor differences, especially in the interior of a single building, those differences will be described in the metadata and illustrated in georeferenced plans (when possible). For major differences, a new polygon will be drawn to reflect the different interpretations of a building’s shape. While a faithful representation of multiple interpretations is appropriate, it will also necessitate further attention to how different elements of the map interact with one another. This is called topology.
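One way to start the descriptive side is to normalize every address to a single written form and then resolve it through a concordance. A minimal sketch, assuming addresses in the modern Region Insula, Door form (older numbering systems would need their own parsers); the concordance entry below is invented for illustration, not taken from the CTP.

```python
import re

# Modern address form: Region (Roman numeral), Insula, Door, e.g. "VI 8, 3".
ADDRESS = re.compile(r"\s*([IVX]+)\s*\.?\s*(\d+)\s*[,.]\s*(\d+)\s*")

# Hypothetical concordance entry: earlier address -> current address.
CONCORDANCE = {"VI 8, 4": "VI 8, 3"}


def normalize(address):
    """Rewrite spacing/punctuation variants such as 'VI.8.3' as 'VI 8, 3'."""
    m = ADDRESS.fullmatch(address)
    if m is None:
        raise ValueError(f"unrecognized address: {address!r}")
    region, insula, door = m.groups()
    return f"{region} {int(insula)}, {int(door)}"


def current_address(address):
    """Normalize, then follow the concordance if the address has changed."""
    a = normalize(address)
    return CONCORDANCE.get(a, a)
```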

Phase Three: A Query Map.

Topological rules are the basis for one of the most powerful aspects of geographic information systems: the ability to search spatially. A spatial search can be on strictly spatial attributes, such as the elevation of a point, the length of a line, or the area of a polygon. It can also be used to find non-spatial attributes attached to the same geometries: the source of a point’s elevation data, the name of the street a line defines, or the kind of floor treatment in a room’s polygon. Most importantly, the spatial relationships among geometries can also be searched. That is, one could ask whether a kind of room is found within houses of a particular size, or whether a house lies within a certain distance of another kind of property, such as an inn or bakery. This valuable kind of search depends on three components:

  1. How well defined the physical shape of Pompeii is;
  2. How much academic information we can attach to those shapes;
  3. How carefully and precisely we define the topological rules.
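A toy version of the house-and-bakery question, assuming each feature carries a kind and a polygon ring in meters. As a simplification, distance is measured between vertex-average centroids rather than between the polygons themselves, which is what a GIS would normally do; the features and the 50 m and 100 m² thresholds are hypothetical.

```python
import math


def area(ring):
    """Shoelace area of a simple polygon given as [(x, y), ...] in meters."""
    pairs = zip(ring, ring[1:] + ring[:1])
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)) / 2


def centroid(ring):
    """Vertex average: a cheap stand-in for a true polygon centroid."""
    xs, ys = zip(*ring)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def near(features, kind, other_kind, max_dist, min_area=0.0):
    """IDs of `kind` features of at least `min_area` whose centroid lies within
    `max_dist` of the centroid of any `other_kind` feature."""
    targets = [centroid(f["ring"]) for f in features.values() if f["kind"] == other_kind]
    hits = []
    for fid, f in features.items():
        if f["kind"] != kind or area(f["ring"]) < min_area:
            continue
        cx, cy = centroid(f["ring"])
        if any(math.hypot(cx - tx, cy - ty) <= max_dist for tx, ty in targets):
            hits.append(fid)
    return sorted(hits)


def square(x, y, side):
    return [(x, y), (x + side, y), (x + side, y + side), (x, y + side)]


# Hypothetical features: three houses and one bakery.
FEATURES = {
    "H1": {"kind": "house", "ring": square(0, 0, 20)},
    "H2": {"kind": "house", "ring": square(100, 0, 20)},
    "H3": {"kind": "house", "ring": square(30, 0, 5)},
    "B1": {"kind": "bakery", "ring": square(30, 10, 10)},
}
large_houses_near_bakery = near(FEATURES, "house", "bakery", 50, min_area=100)
```

The quality of such an answer rests on all three components listed above: well-drawn shapes, attached attributes (here just `kind`), and rules about how features relate.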

Part one is well underway as our Navigation Map. Part two is growing, but always needs more information. Help from the community is ALWAYS desired. Part three, the Queryable Map, is in the very near future.

Phase Four: Pompeii’s Bibliocartography.

For the PBMP, the essential and defining query – whatever its structure – is to access accurate bibliographic information through the map. Realizing this functionality is fortunately not a terribly complex technological problem. Because of its robust native query functions, the GIS will be the primary platform for combining the different types of data. Specifically, the unique ID of any map element will be “king”, the atomic bond between tables of attribute data, catalogs of bibliographic citations, and indexes of full-text publications. An example will make this clearer. As our drawing (failingly) attempts to illustrate, an individual map element, in this case the polygon of a house in Pompeii, will have descriptive information attached to it. The most important piece of this information will be the name of the house, or rather the many names given to a house over the centuries since its excavation. These names and their addresses will provide the handle for our processing of full-text documents, allowing us not only to make the documents discoverable via the map but also to make the map serve to illustrate bibliographic searches.
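To illustrate the unique ID as the bond between tables, here is a minimal sketch in which the attribute table, the citation catalog, and a full-text index are all keyed on the same feature ID, so a lookup works in both directions: from a map element to everything written about it, and from a set of documents (for example, bibliographic search results) back to the map elements to highlight. All IDs, names, and table layouts are hypothetical.

```python
# Hypothetical tables, all keyed on a map element's unique ID.
ATTRIBUTES = {
    "PRE000101": {"address": "I 7, 1", "names": ["Name 1", "Name 2"]},
    "PRE000245": {"address": "VI 8, 3", "names": ["Name 3"]},
}
CATALOG = {"PRE000101": [12, 48], "PRE000245": [7]}  # PBMP IDs of citations
FULLTEXT_INDEX = {"doc-7": ["PRE000101"], "doc-9": ["PRE000101", "PRE000245"]}


def record_for(fid):
    """Everything attached to one map element, gathered through its ID."""
    docs = sorted(d for d, fids in FULLTEXT_INDEX.items() if fid in fids)
    return {"id": fid, **ATTRIBUTES.get(fid, {}),
            "citations": CATALOG.get(fid, []), "documents": docs}


def features_for_documents(doc_ids):
    """Reverse lookup: map elements touched by a set of documents."""
    return sorted({fid for d in doc_ids for fid in FULLTEXT_INDEX.get(d, [])})
```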

Connecting citations to places will require a number of approaches. As a proof of concept, in 2011 the PBMP first used the spatial index created by Garcia y Garcia for the first two volumes of his Nova Bibliotheca Pompeiana. An updated version is available online. This index specifically lists each citation number associated with a particular address in the city. As the effort of one dedicated scholar, this index is truly remarkable. As a bridge between Pompeii’s physical and publication landscapes, however, it reaches less than one quarter of the way across the gulf that divides them. That is, only about 25% of all the citations are given an address, and only about the same percentage of the places in Pompeii are listed. To associate more works, and to parse their contents more precisely, the PBMP is applying natural language processing techniques to all the full-text documents we can capture. Let me echo our call for help again here to grow our repository. Our authority list of Pompeii’s toponymy has been generated from its complete enumeration (more than 5,000 entries) in volume II of the Corpus Topographicum Pompeianum, while the collocation (and implicit disambiguation) comes from the “Numerical Index” of the same volume. Because gathering the entire corpus of Pompeian scholarship in full text will take some time, we plan to move forward with intermediate steps, including parsing title keywords, processing book indexes, and seeking community help in tagging works they have read (or written!).
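Matching against an authority list can be sketched with regular expressions, as a simple stand-in for the fuller natural language processing described above. The sketch assumes an authority list mapping each toponym to its address, and it also picks up explicit addresses written in the Region Insula, Door form; only two sample entries are included, and real text would need handling for inflected and abbreviated names.

```python
import re

# Sample authority entries (toponym -> address); a real list would hold thousands.
AUTHORITY = {
    "Casa del Fauno": "VI 12, 2",
    "House of the Faun": "VI 12, 2",
}

# Explicit addresses in Region Insula, Door form, e.g. "VI 12, 2" or "VI.12.2".
ADDRESS = re.compile(r"\b([IVX]+)\s*\.?\s*(\d+)\s*[,.]\s*(\d+)\b")


def build_matcher(authority):
    """Compile one case-insensitive pattern for all names, longest first so a
    long name is preferred over any shorter name it contains."""
    names = sorted(authority, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b", re.IGNORECASE)
    lookup = {name.lower(): addr for name, addr in authority.items()}
    return pattern, lookup


def places_in(text, pattern, lookup):
    """Addresses of every place named or addressed in `text`."""
    found = {lookup[m.group(1).lower()] for m in pattern.finditer(text)}
    found |= {f"{r} {int(i)}, {int(d)}" for r, i, d in ADDRESS.findall(text)}
    return found


pattern, lookup = build_matcher(AUTHORITY)
sample = "The Casa del Fauno (VI.12.2) and the house at IX 1, 20 are both discussed."
```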

The Future. 

Careful viewers will have seen some notations in the image above about the future, especially concerning ways to extend the project and the multiple platforms for potential dissemination. We have always intended to have both download and upload capabilities: the ability for users to pull down our data and for the PBMP to ingest their additions, changes, and improvements. Since the project was conceived, a number of online platforms, resources, and coding practices have risen to prominence, including OpenJUMP, PostGIS, GeoJSON, and GitHub. It is our intention to keep the doors of our data open to these and future developments.

– EP