Making the connection: a first functioning prototype of the PBMP


In a previous post I described the frustration of simultaneously succeeding in building multiple components of the PBMP and being unable to see how they fit together as “trying to grow a hand from the fingers in”.  Today, in a brief, but very effective meeting, all those working on the different parts of the project discussed how the GIS and bibliography, in their current states, will be joined together. What follows is a “meeting minutes” style documentation of our conversation, set out here for two purposes:

  1. To remind us of our thoughts and plans at this stage of the project
  2. To easily share those ideas with a wider community for comment and criticism.

The meeting began with a quick recitation by me (Poehler) on the status of each part of the project and what I thought needed to be accomplished by the end of the week. The deadline is not arbitrary: I will be presenting the PBMP as one part of a workshop I’ll give at the University of Texas at Austin’s Graduate Student Conference entitled “Digital Archaeology at an urban scale.” The poster and schedule can be found at the bottom of this post.

Bibliography Team: Ron Peterson and Aaron Rubinstein have been working to transform our massive spreadsheet of citation information (currently 13,040 completed references) into a format acceptable for importation by Zotero. Thus far the idea has been to condense the spreadsheet – originally designed for a MODS implementation – into Dublin Core and convert that to RIS format. We’ve wanted to have mirrored Zotero and Omeka citation databases, and since Omeka can import via Zotero, the challenge is to get things into Zotero. Zotero can import from a number of formats, but RIS seems the best thus far. Aaron is working on that now.
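As a rough illustration of that conversion step, the sketch below turns one spreadsheet row into an RIS record. The column names (type, author, title, year), the type mapping, and the sample row are hypothetical and not our actual schema; only the RIS tags themselves (TY, AU, TI, PY, ER) come from the RIS format.

```python
# Minimal sketch: one citation row (a dict) -> RIS record text.
# Column names and the type mapping are assumptions for illustration only.

RIS_TYPES = {"book": "BOOK", "article": "JOUR"}  # anything else falls back to GEN


def row_to_ris(row):
    """Return an RIS record for one row of citation data."""
    lines = ["TY  - " + RIS_TYPES.get(row.get("type", ""), "GEN")]
    # Multiple authors are assumed to be separated by semicolons.
    for author in row.get("author", "").split(";"):
        author = author.strip()
        if author:
            lines.append("AU  - " + author)
    if row.get("title"):
        lines.append("TI  - " + row["title"])
    if row.get("year"):
        lines.append("PY  - " + str(row["year"]))
    lines.append("ER  - ")  # every RIS record ends with an ER tag
    return "\n".join(lines) + "\n"


sample = {
    "type": "book",
    "author": "Doe, J.; Roe, R.",
    "title": "A Hypothetical Title",
    "year": 1984,
}
print(row_to_ris(sample))
```

A real conversion would need many more fields (publisher, place, volume, and so on), but the one-tag-per-line structure stays the same.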

Mapping Team: Alexander Stepanov and I have been working to publish online a solid, clean, and complete first map of Pompeii. This will be a basic map for navigation, with some important attribute information included. The navigation and attribute functions relate to the Phase One and Phase Two maps described earlier on this blog. In many ways, this first published map is nearly done, but a few pieces remain to clean up. The first of these is to design and configure what information will appear, and what it will look like, when a user clicks on a place in the map. Pop-up windows in ArcGIS and ArcGIS Online are dynamic and scriptable, but with only a subset of all the data available, we’ll need to strike a balance between what we can show now and what is possible eventually. A second pressing problem is the need to be able to zoom as far into the map as possible. Because the features of Pompeii are naturally at the human scale rather than geographical scale, users need to be able to zoom in to 1:50 scale or even lower to examine those features. Currently, ArcGIS Online only scales to about 1:1000. Finally, I am working feverishly to complete the integration of information from the Corpus Topographicum Pompeianum II (“Toponymy” section) with a new spatial dataset. What is driving this is the symmetry of presenting that data at the University of Texas at Austin, which published the CTP volumes thirty years ago.

After writing the above, I’m compelled to expand into a little editorializing: Though it is only the first of what will be many versions of our map and mapping data, the importance of publishing this map should not be underestimated. Because we will allow users to download the entire map package – map and data together – this will be the first time a standard, fully digital map of Pompeii has been available. The CAD plan that underlies our GIS and the effort that scholars and the superintendency put into it should not be dismissed, but it is not available publicly. Remarkably, this will be the first major cartographic advance on the topography of Pompeii since the publication of the RICA maps of the Corpus Topographicum Pompeianum thirty years ago in 1984.

Natural Language Processing Team: As we await a new batch of full-text objects from HathiTrust, Tiger Wu and David Smith have parsed the three volumes of the Nova Bibliotheca Pompeiana graciously published online by Arbor Sapientiae, extracting from each citation its number and all the addresses listed by L. Garcia y Garcia as being relevant to that citation. You can see that rough, but valuable tabulation here as tab separated values. We plan to use this as a first iteration of a joining table that will link bibliographic references to places in Pompeii with their digital spatial representations. Additionally, because we do have the basic information for works held by the Internet Archive – especially their permalinks – our plan is to integrate that information with our Zotero and Omeka catalogs so that, whenever possible, a researcher can go from finding a place to be interesting in the map to reading about that place in only a matter of seconds.
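To make the idea of a joining table concrete, here is a small sketch that reads tab separated values and inverts them, so that each address points to the citations that mention it. The two-column layout (citation number, then semicolon-separated addresses) and the sample values are assumptions for illustration; the real file may be arranged differently.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; the real tabulation may use another layout.
SAMPLE_TSV = "101\tI.10.4; VI.15.1\n102\tVI.15.1\n"


def build_join_table(tsv_text):
    """Map each address to the sorted citation numbers that mention it."""
    by_address = defaultdict(set)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip lines without an address column
        citation = int(row[0])
        for address in row[1].split(";"):
            address = address.strip()
            if address:
                by_address[address].add(citation)
    return {addr: sorted(nums) for addr, nums in by_address.items()}


table = build_join_table(SAMPLE_TSV)
print(table["VI.15.1"])
```

Keyed this way, a click on a place in the map only needs that place's address to retrieve every citation associated with it.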

When this prototype is working later this week, I will post some links to it. Expect also to read what I learn from demoing the PBMP with the folks at UT.

– EP

Rebuilding the City Poster

 

The Elegance (and Importance) of Ugly: the “Errorscape”


Some pretty things are actually rather ugly. Take this map (below), for example:

Gaps&Overlaps

While it is pleasing in its shifting colors (ordered here by the area of the polygon) and even, to me, a bit mesmerizing as the eye tries also to accommodate transparent, color-coded detail along the streets, this map is replete with error.* Indeed, its job is to reveal those errors. The blue and red shapes represent all the places where one polygon improperly meets another. This can mean that a street for a short stretch overlaps a sidewalk or that a small gap exists between that sidewalk and the house or shop beside it. The image links to a very large version (5312×2938) so you can see some of these problems for yourself. They’ll still be hard to see because, despite the three meter buffers that surround them, each overlap or gap is less than three square meters in area, and the vast majority are less than one square meter. The problems all come, sadly, from me… well, from me being a human being living in a world of finite resources and time. Each problem is a drawing error produced when I digitized the landscape of Pompeii – approaching 5,000 individual features – between 2005 and 2008. Now, in order to have a more precisely rendered landscape of Pompeii, one that both looks good as a map free of drafting errors and works well as a table with as accurate information as possible, these errors must be fixed. There are 3023 areas of interest to examine, so this won’t be fixed overnight.

In a related problem, the georeferencing of all the data is off. That is, when we overlay it on satellite imagery, it is shifted considerably – approximately 30m to the southwest. The issue, however, is global and appeared to be one that would require only a reprojection of the data.** We did this using our “Architectural Features” layer, which represents all the architectural ground features (i.e., not the walls). When the two layers overlapped, they looked aesthetically interesting and remarkably like an archaeological phase plan.

Projection error looks like PhasePlan2

The utility of this plan is to show us the scale and direction of our reprojection error. The interesting thing about it is the way it piques the archaeological imagination. I immediately want to push the blue and the brown apart in chronology. I try to imagine how the experience of the space changed between these periods and wonder about the significance of the altar built directly above the stairs of an earlier temple. Then I remind myself, this is not a real landscape, it is a functional “errorscape”. It’s a deliberate graphical representation of errors in (ultimately, tabular) data with the purpose of defining and resolving those errors. Still, I can’t help but reflect on its attraction and pull on me, the seduction of space and time distilled to a few lines in different colors, however unreal.

The point of the map, beyond its wonderment to me, is to be a guide between where my data are and where I want them to be. It serves to show me that my spatial data must be pushed through a new projection. This testing, however, shows that not only is there an error in projection, there is an error in location. The reprojected layer is still about 2.20m off from the satellite image, this time to the southeast. This error is undoubtedly again my fault, created in the original georeferencing of the data. Really, it’s only partially my fault as there are inaccuracies both in the underlying CAD data generously shared by the Soprintendenza*** as well as in the GPS survey that gave the real world coordinates. Regardless, the next step will be to adjust our coordinates to match the satellite imagery because, as I said in the last post, “if we have to be ‘wrong’, let’s be wrong in the right direction.”
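For readers who want to see what such a global adjustment amounts to, here is a minimal sketch of the arithmetic. It assumes the coordinates are in a projected, meter-based system and that “southeast” means a compass bearing of exactly 135 degrees; both are simplifications, and the sample coordinates are invented.

```python
import math


def offset_from_bearing(distance_m, bearing_deg):
    """Return (dx east, dy north) for a distance along a compass bearing.

    Bearings run clockwise from north: 0 = north, 90 = east.
    """
    rad = math.radians(bearing_deg)
    return distance_m * math.sin(rad), distance_m * math.cos(rad)


def translate(points, dx, dy):
    """Shift every (x, y) coordinate by the same offset."""
    return [(x + dx, y + dy) for x, y in points]


# The layer sits about 2.20 m to the southeast (bearing 135), so the
# correction moves it the same distance to the northwest (bearing 315).
dx, dy = offset_from_bearing(2.20, 315.0)

outline = [(1000.0, 2000.0), (1010.0, 2000.0), (1010.0, 2010.0)]  # invented
print(translate(outline, dx, dy))
```

Because the same offset is applied to every vertex, the shapes and their relationships to one another are untouched; only their position against the imagery changes.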

-EP

 

Some more technical details:

* To make the map of gaps and overlaps, I did the following:

  1. Create a new feature for the outline of the entire city;
  2. Union all polygons together;
  3. Sort the attribute table of the resulting layer by area, descending;
  4. Select all those items with an area less than three square meters;
  5. Make a layer from the selected items and export to geodatabase;
  6. Enlarge this layer’s visibility using the buffer tool (3m);
  7. Symbolize the total area of the Union layer as a color ramp;
  8. Symbolize those gaps/overlaps of less than one square meter in red, those of more than one square meter in blue;
  9. Adjust their displayed transparency to 50%;
  10. Export map.
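The selection and symbolization logic (steps 3, 4, and 8) can be sketched outside of ArcGIS in plain Python. This is not the ArcGIS workflow itself: it assumes the pieces produced by the Union step are already available as lists of (x, y) vertices in meters, and the three sample polygons and their names are invented.

```python
def polygon_area(points):
    """Area of a simple polygon by the shoelace formula (vertices in meters)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def classify_slivers(polygons, max_area=3.0, split_area=1.0):
    """Sort by area (descending), keep pieces under max_area, and color them.

    Pieces under split_area are marked red, the rest blue.
    """
    measured = sorted(
        ((polygon_area(pts), name) for name, pts in polygons.items()),
        reverse=True,
    )
    return [
        (name, area, "red" if area < split_area else "blue")
        for area, name in measured
        if area < max_area
    ]


polygons = {
    "house": [(0, 0), (10, 0), (10, 10), (0, 10)],      # 100 square meters
    "gap_a": [(0, 0), (2, 0), (2, 1), (0, 1)],          # 2 square meters
    "overlap_b": [(0, 0), (1, 0), (1, 0.5), (0, 0.5)],  # 0.5 square meters
}
print(classify_slivers(polygons))
```

The large polygon is filtered out, leaving only the small pieces that are candidates for drawing errors.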

** To test the error for reprojection, I used an unusual process: I converted the shapefiles to Google Earth KML files because I had noticed in a previous experiment that these files far more precisely overlaid the satellite imagery. Some of that work, as images, is here:

Shift_via_KML_1

Shift_via_KML_2

 We have struggled thus far (ok, after one afternoon) to get our files to reproject appropriately in ArcGIS or in FME. All suggestions are welcome.

*** The SAP CAD file was created by digitizing the RICA Maps of Pompeii, 1984, supported by World Monuments Watch and funded by American Express. The RICA Maps were drawn in part from aerial images taken at 820m and supplemented by on-the-ground survey. Inconsistencies among the features of this layer with satellite imagery or other maps of Pompeii may be the result of errors in the rectification of the original aerial images. On the creation of the RICA Maps, see Van der Poel, H. B., Corpus Topographicum Pompeianum, vol. IIIA. Austin, TX, 1986, XI-XIX.

Mapping the Mapping Project’s Design

On Wednesday of last week I had a fantastically productive meeting with my colleague and GIS architect for the PBMP, Alexander Stepanov. In about an hour we defined the current state of our mapping project, reconceived and reified how the GIS would move forward, and established how it will function as the “pivot” for the other elements of the PBMP. Below is an image of the white board we marked up over that hour:

PBMP_MappingMeeting_2.26.2014

Much of this was already in our heads or sketched in broader terms in the notes of other meetings. What was different in this meeting was the previous four months of work to understand the production of the spatial data and to describe it in a specific and unique nomenclature. With this new foundation and confidence it was easier to define the phases of development and know how to realize them. From the image you might see these development phases of the GIS scribbled in my impenetrable handwriting. Here they are with a bit more detail:

Phase One: A Basic Map for Navigation. The first step in our plan (and the primary functionality intended for our GIS) is a map that allows one to effortlessly move across the landscape of Pompeii, to shift scales, to add and remove data layers, and to access the basic descriptive data of those layers. As I mentioned previously, we have focused on a subset of our spatial data for the Navigation Map. These layers are largely topographic rather than interpretive in their content:

GIS Name (Prefix_short-Name_Extent_Type_Version) | Alias                           | Code
PMBP_Alley_City_Poly_001                         | Alleys                          | ALS
PMBP_FtWall_City_Poly_001                        | Fortification Walls             | FTW
PMBP_Curb_City_Poly_001                          | Curbstones                      | CBS
PMBP_Elev_City_Pnts_001                          | Elevation Points                | ELP
PMBP_Forum_City_Poly_001                         | Forum                           | FRM
PMBP_Fount_City_Poly_001                         | Fountains                       | FNT
PMBP_Gates_City_Poly_001                         | Gates                           | CGT
PMBP_Ins_City_Poly_001                           | Intersections                   | INT
PMBP_PropWalls_City_Line_001                     | Property Walls (Muri)           | PRW
PMBP_NSS_City_Poly_001                           | Narrowing Stones                | NSS
PMBP_ProjIns_City_Poly_001                       | Projected City Blocks (insulae) | PJI
PMBP_ProjInt_City_Poly_001                       | Projected Intersections         | PJT
PMBP_ProjStr_City_Poly_001                       | Projected Streets               | PJS
PMBP_PropEsch_City_Poly_001                      | Properties by Eschebach         | PRE
PMBP_RutsTsuji_City_Poly_001                     | Ruts by Tsujimura               | RTS
PMBP_Sidwlk_City_Poly_001                        | Sidewalks                       | SDW
PMBP_SSS_City_Poly_001                           | Stepping-Stones                 | SSS
PMBP_Str_City_Poly_001                           | Streets                         | STR
PMBP_UnEx_City_Poly_001                          | Unexcavated Areas               | UNA
PMBP_WatTow_City_Poly_001                        | Water Towers                    | WTS

Embedded in the nomenclature are elements of the data structure itself, including the data’s producer, an abbreviated indication of the content, its scale, the geometry type, and a version number. Thus, the file name “PBMP_Forum_City_Poly_001” expresses that the file was made by the PBMP, encompasses the area of the forum, operates on the scale of the city, is a polygon, and is version 1. A more human-readable alias is also included, as is a unique three-letter code that serves as a prefix in the IDs of individual objects within that layer. In this case, there is only one Forum at Pompeii, so the polygon named FRM000001 describing it is the only feature in that layer.
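Because the naming scheme is regular, it can be read by a program as easily as by a person. The sketch below splits a layer name into its five parts and builds an object ID from a layer code. The function and field names are my own, and the six-digit padding is inferred from the single example FRM000001 rather than from a written rule.

```python
from collections import namedtuple

# Field names are illustrative labels for the five parts of a layer name.
LayerName = namedtuple("LayerName", "producer content extent geometry version")


def parse_layer_name(name):
    """Split Prefix_short-Name_Extent_Type_Version into its five parts."""
    parts = name.split("_")
    if len(parts) != 5:
        raise ValueError("expected 5 underscore-separated parts: " + name)
    producer, content, extent, geometry, version = parts
    return LayerName(producer, content, extent, geometry, int(version))


def feature_id(code, number):
    """Three-letter layer code plus a zero-padded sequence number."""
    return "{}{:06d}".format(code, number)


layer = parse_layer_name("PBMP_Forum_City_Poly_001")
print(layer)
print(feature_id("FRM", 1))
```

A check like this could also be run over a whole folder of files to catch names that drift from the convention.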

To finish the Navigation Map we have only a few simple tasks to complete, including finalizing the metadata description of each layer, defining which parts of that metadata should be displayed when the user accesses it via the map (“identify tool”), and globally adjusting the position of our files to overlie publicly available satellite imagery. This final task is a kind of compromise between absolute accuracy and usability. That is, because a perfectly precise positioning of our Pompeii data would likely move it closer to the satellite imagery but not exactly overlie it, it is preferable to produce our data in a way that better meets user expectations and better integrates with other applications. If we have to be “wrong”, let’s be wrong in the right direction.

Phase Two: An Information / Interpretation Map. The landscape of Pompeii will become far richer as we begin to add illustrative and academic information about each of the objects in the map. We already link each property to Pompeii in Pictures, but the addition of information on each property from the Corpus Topographicum Pompeianum (CTP) will offer a full listing of all the names given to these properties as well as a basic bibliography for each. Such bibliographic content, together with the spatial index provided by Garcia y Garcia in the Nova Bibliotheca Pompeiana, will provide the first connection to our catalog of citations (see below for more on this). In Phase Two we will also include functional interpretations. City-wide interpretive data come from the CTP and, of course, from the Eschebach plan of Pompeii. The 1993 update of this plan in Liselotte Eschebach’s Gebäudeverzeichnis und Stadtplan der antiken Stadt Pompeji also contains additional information about each property, such as dates of excavation, finds, and decoration as well as additional bibliography. We are in the process of adapting this information as well. To provide more up-to-date functional information, we will also be including the published work of scholars who have focused on specific properties or types of properties, such as bars (Ellis), fullonicae (Flohr), or bakeries (Monteix). Finally, to increase the spatial resolution of the map, we are creating room-level data based on the definitions and nomenclature in the Pompei: Pitture e Mosaici volumes. Producing spatial data at this level will offer the potential to attach research data in a more specific and powerful way, such as the finds-by-room data produced by Penelope Allison.

To do all of this work will require an investment of time to ensure the spatial and descriptive integrity of every building in the ancient city. On the descriptive side, this work will start by getting the building’s address correct and associating that with all previous addresses. Luckily, the CTP has published concordances from which to work. For the spatial side, there is no such index. Each building in our map will need to be examined and compared with previous maps to ensure that when we attach functional or interpretive data to a property, that property is the expected shape. For minor differences, especially in the interior of a single building, those differences will be described in the metadata and illustrated in georeferenced plans (when possible). For major differences, a new polygon will be drawn to reflect the different interpretations of a building’s shape. While a faithful representation of multiple interpretations is appropriate, it will also necessitate further attention to how different elements of the map interact with one another. This is called topology.

Phase Three: A Query Map.

Topological rules are the basis for one of the most powerful aspects of geographical information systems: the ability to search spatially. A spatial search can be on strictly spatially descriptive attributes, such as the elevation of a point, the length of a line, or the area of a polygon. It can also be used to find non-spatial attributes attached to the same geometries: the source of the point’s elevation data, the name of the street the line defines, or kinds of floor treatments of a room’s polygon. Most importantly, the spatial relationships among geometries can also be searched. That is, one could ask if a kind of room were found within houses of a particular size or if that house was within a certain distance of another kind of property, such as an inn or bakery. To generate this valuable kind of search depends upon three components:

  1. How well defined the physical shape of Pompeii is;
  2. How much academic information we can attach to those shapes;
  3. How carefully and precisely we define the topological rules.

Part one is well underway as our Navigation Map. Part two is growing, but always needs more information. Help from the community is ALWAYS desired. Part three, the Queryable Map, is in the very near future.
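To give a flavor of the kind of question a Query Map could answer, here is a toy version of “which large houses lie within a certain distance of a bakery?” Everything in it is invented for illustration (IDs, areas, coordinates), and it measures distance between centroids, which is a much cruder test than the polygon-to-polygon relationships a real GIS with topological rules would use.

```python
import math

# Invented features: id, kind, area in square meters, centroid in meters.
FEATURES = [
    {"id": "H1", "kind": "house", "area": 450.0, "centroid": (10.0, 10.0)},
    {"id": "H2", "kind": "house", "area": 120.0, "centroid": (15.0, 12.0)},
    {"id": "H3", "kind": "house", "area": 600.0, "centroid": (200.0, 90.0)},
    {"id": "B1", "kind": "bakery", "area": 80.0, "centroid": (30.0, 10.0)},
]


def distance(a, b):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def near(features, kind, min_area, other_kind, max_dist):
    """IDs of features of `kind` with area >= min_area near any `other_kind`."""
    others = [f["centroid"] for f in features if f["kind"] == other_kind]
    hits = []
    for f in features:
        if f["kind"] != kind or f["area"] < min_area:
            continue  # wrong type or too small
        if any(distance(f["centroid"], c) <= max_dist for c in others):
            hits.append(f["id"])
    return hits


print(near(FEATURES, "house", 300.0, "bakery", 50.0))
```

The query combines an attribute test (type and area) with a spatial test (distance), which is exactly the pairing that makes the three components listed above matter.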

Phase Four: Pompeii’s Bibliocartography.

For the PBMP, the essential and defining query – whatever its structure – is to access accurate bibliographic information through the map. Realizing this functionality is fortunately not a terribly complex technological problem. Because of its robust native query functions, GIS  will be the primary platform for combining the different types of data. Specifically, the unique ID of any map element will be “king”, the atomic bond between tables of attribute data, catalogs of bibliographic citations, and indexes of full-text publications. An example will make this clearer. As our drawing (failingly) attempts to illustrate, an individual map element, in this case the polygon of a house in Pompeii, will have descriptive information attached to it. The most important of these will be the name of the house, or rather the many names given a house over the centuries since its excavation. These names and their addresses will provide the handle for our processing of full text documents, allowing us to not only make the documents discoverable via the map, but also make the map serve to illustrate bibliographic searches.
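In code, the idea of the ID as “king” is simply a shared key across several tables. The sketch below follows one map element's ID to its attributes, then to citation numbers, then to any full-text links. All IDs, names, citation numbers, and URLs are hypothetical placeholders, and real tables would live in the GIS and the bibliographic catalog rather than in Python dictionaries.

```python
# All IDs, names, numbers, and URLs below are hypothetical placeholders.
ATTRIBUTES = {
    "PRE000001": {"names": ["House A", "Casa A"], "address": "VI.15.1"},
}
CITATIONS_BY_FEATURE = {"PRE000001": [101, 102]}
FULLTEXT_BY_CITATION = {101: "https://example.org/fulltext/101"}


def lookup(feature_id):
    """Follow one map element's ID to its attributes, citations, and texts."""
    citations = CITATIONS_BY_FEATURE.get(feature_id, [])
    return {
        "attributes": ATTRIBUTES.get(feature_id, {}),
        "citations": citations,
        # Only citations with a known full-text copy get a link.
        "fulltext": {
            c: FULLTEXT_BY_CITATION[c]
            for c in citations
            if c in FULLTEXT_BY_CITATION
        },
    }


result = lookup("PRE000001")
print(result["attributes"]["names"])
print(result["citations"])
print(result["fulltext"])
```

An unknown ID returns empty results rather than an error, which suits a map where many features will have no bibliography attached at first.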

Connecting citations to places will require a number of approaches to be employed. As a proof of concept, in 2011 the PBMP first used the spatial index created by Garcia y Garcia for his first two volumes of the Nova Bibliotheca Pompeiana. An updated version is available online. This index specifically lists each citation number associated with a particular address in the city. As the effort of one dedicated scholar, this index is truly remarkable. As a bridge between Pompeii’s physical and publication landscapes, however, it reaches less than one quarter of the way across the gulf that divides them. That is, only about 25% of all the citations are given an address and only about the same percentage of the places in Pompeii are listed. To associate more works, and to parse their contents more precisely, the PBMP is applying natural language processing techniques to all the full-text documents we can capture. Let me echo our call for help again here to grow our repository. Our authority list of Pompeii’s toponymy has been generated from its complete enumeration (more than 5000 entries) in volume II of the Corpus Topographicum Pompeianum while the collocation (and implicit disambiguation) comes from the “Numerical Index” of the same volume. Because gathering the entire corpus of Pompeian scholarship in full-text will take some time, we plan to move forward with intermediate steps including parsing title keywords, processing book indexes, and seeking community help in tagging works they have read (or written!).
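As a very simple stand-in for that processing (and not the method our NLP team is using), the sketch below pulls region-insula-doorway addresses such as VI.15.1 out of running text with a regular expression. The two spellings it accepts and the sample sentence are assumptions; real publications write addresses in many more ways, and a bare pattern like this will also produce false matches.

```python
import re

# Region (Roman numeral I-IX), insula, doorway, e.g. "VI.15.1" or "I 12, 2".
# The accepted separators are an assumption for illustration.
ADDRESS = re.compile(r"\b(IX|IV|VI{0,3}|I{1,3})[. ](\d{1,2})[.,] ?(\d{1,3})\b")


def find_addresses(text):
    """Return addresses found in text, normalized to Region.Insula.Door."""
    return [".".join(m.groups()) for m in ADDRESS.finditer(text)]


sample = "The house at VI.15.1 and the bakery at I 12, 2 are discussed."
print(find_addresses(sample))
```

Normalizing every match to one spelling is what lets the extracted addresses line up with the addresses stored on the map's features.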

The Future. 

Careful viewers will have seen some notations in the image above about the future, especially concerning ways to extend the project and the multiple platforms for potential dissemination. We have always intended to have both download and upload capabilities: the ability for users to pull down our data and for the PBMP to ingest their additions, changes, and improvements. Since the project was conceived, a number of online platforms, resources, and coding practices have risen to prominence, including OpenJUMP, PostGIS, GeoJSON, and GitHub. It is our intention to keep the doors of our data open to these and future developments.

– EP

Open access win: UMass Amherst and Google reach licensing agreement


Through the diligent and intelligent efforts of the UMass Libraries staff and UMass General Counsel’s Office, the University of Massachusetts Amherst (along with UMass Lowell) has signed a licensing agreement with Google, Inc. that allows for the use of their scanned books held within the HathiTrust. I previously wrote about the impasse created by the indemnity clause in the standard licensing agreement and made a forceful argument that the clause conflicted with the spirit of the Google Books project as well as their own arguments of legitimacy. Now that the parties have come to an amicable resolution, it seems appropriate (and with permission) to share some of the details so that the next university, library, museum, or individual scholar can better understand what resolution looks like. Here’s the agreement (adjust the width with fit width tool <[]> if the pdf is too large):

Umass-GoogleAgreement (Note that the signatories have been redacted).

There’s much one might say about the document, but I’ll focus on Section 1 and Section 4.

In Section 1, the language carves out a space for academic, non-commercial, non-competing projects like the Pompeii Bibliography and Mapping Project. Specifically, UMass (Institution) will:

“(1) use the Institution Digital Copy only for research, scholarly, or academic purposes;” – Academic research and public outreach are the goals of the PBMP.

“(2) not share, provide, license, or sell the Institution Digital Copy to any third party…” – the PBMP plans only to link to the actual human readable documents and will use the scanned books in our Natural Language Processing efforts to better connect Pompeii’s map and bibliography.

“(3) not use the Institution Digital Copy to provide commercial search or hosting services substantially similar to those provided by Google, including but not limited to those services substantially similar to Google Book Search;” – considering Google’s great combination of Map services, Google Book search, and Image collections, this is a bit closer to the PBMP’s mission to make the discovery of information about Pompeii as seamless and intuitive as possible. The PBMP, however, is not a commercial enterprise and in the totality of its content (c. 15,000 citations and dozens of mapping files) is surely so insignificant compared to Google that it cannot reasonably be considered “substantially similar”. Right? Fortunately, the rest of the clause speaks directly to exclusion of academic projects like the PBMP.

“(4) (A) use reasonable efforts to prevent third parties from bulk downloading any portion of the Institution Digital Copy, and (B) implement technological measures (…) to restrict automated access to any part of Institution’s website where substantial portions of the Institution Digital Copy are available.” – Again, because the PBMP will not host these documents, instead mining them for OCR transcripts and producing indexes, these digital copies will not be stored in web accessible storage.

Section 4, and its further definition in Section 5, is a major change from the former agreement language (is that why it’s shouted in all CAPS?). In essence the change is from UMass accepting liability for any and all legal challenges arising from third party litigation, to UMass accepting liability when it (as the most common means of breach, though not the only means) redistributes Google’s scanned books. This language seems to put the burden back on each party for its own actions: Google for any breach of copyright; UMass for compounding that error.

Our next steps are getting the books we need out of the HathiTrust: linked to our bibliographic database and captured for our Natural Language Processing efforts. Their datasets page offers information on extracting works not digitized by Google via their Data API and creating a formal request for Google’s digital copy.

I have a meeting next week with Laura Quilter (UMass Copyright and Information Policy Librarian) to hammer out a strategy for getting Google and Non-Google books. That process will be the subject of a subsequent post.

 

 

A request to the community of Pompeii scholars

It’s like trying to grow a hand from the fingers inward.

This is the analogy I have been using to describe the steady, but frustratingly dislocated process of constructing the different elements of the Pompeii Bibliography and Mapping Project. The bibliographic catalog and full text repository, the GIS map, the natural language processing of the repository, and the user interface are all growing apace, but none are yet connected or able to demonstrate the power of the final product. Nonetheless, now is the time to ask for your help. Because I anticipate that you will be the most avid users of this resource, I need your cooperation to make it as useful as possible. Therefore:

I need your bibliography, especially any and all full-text copies.

Citations: While the 15,000+ corpus of citations based on L. Garcia y Garcia’s Nova Bibliotheca Pompeiana is nearly ready for release as a searchable database (the first iteration will be here), there are still hundreds of citations that are unaccounted for, especially in the last decade or so. To tell the PBMP about an important citation, join our Zotero group and add items to the “New Citations” collection or simply fill out the form on this page.

Full text: There is an obvious benefit to being able to seamlessly go from discovering a potential source to reading it with only a single click. For thousands of the works about Pompeii this is possible and for those works it is being realized. There are many other thousands, however, that remain in copyrighted status and cannot therefore freely be accessed or redistributed. The PBMP will be working with authors and publishers to seek the release of as many of these works as possible. In the meantime, I am still asking for any full-text items – ebooks, electronic offprints, personal scans, etc – regardless of copyright status to use in our natural language processing of Pompeian sources. Let me be clear, these items WILL NOT be redistributed. Instead, they will be used to help make the search functions of the PBMP more robust. To put it another way, your copy will be read by a machine and not by humans. Please email me (Pompeiana@gmail.com) to make arrangements to transfer files.

I need your maps, underlying spatial data, and suggestions for inclusion.

The map of Pompeii will be based ultimately on the superintendency’s CAD plan of the city. In many ways, this base data is like the Eschebach map of property function – we all know its faults and we know there is little alternative. We have made many valuable improvements, but if you have or can recommend maps that are crucial to an updated topography of Pompeii – whether descriptive or interpretive – please send them along.

I need your expertise and your time.

If you have an interest and a skill that might be valuable to the PBMP, please do let me know.

Please email me directly at Pompeiana@gmail.com to offer your data, suggestions, or to ask questions. The more data I have today, the more data you will have tomorrow. Thank you all in advance for your help!

-Eric Poehler

The Elephant In The Room


Institution shall defend Google against any third party lawsuit or proceeding that relates to Institution’s use of the Institution Digital Copy or receipt of the Public Domain Digital Copies, including without limitation, any such use by a third party.  Institution shall select counsel reasonably appropriate for such defense and shall pay for all costs incurred by such counsel.  In addition, Institution shall pay any damage awards or settlement costs that may be incurred.  Google may participate in the defense with counsel of its own choice, at its own expense.

This, digital humanities and open access friends, is what is known in the common parlance as a deal-breaker. The preceding language comes from an agreement that all parties must sign in order to use the Google-scanned books in the collections of HathiTrust. If you don’t know, HathiTrust (pronounced “Hah-thee”) is an ingenious umbrella organization of research universities that administer a massive repository of digital content. Hathi’s biggest collection, however, comes from Google and using the materials means licensing them under the terms Google sets. Those terms effectively transform every licensee into a firewall for Google, whom Google can choose to aid in defense or not. For public institutions like the University of Massachusetts Amherst, which is an entity of the Commonwealth, signing such an agreement puts every taxpayer of the state on the hook should a third party sue. Of course, both Google and the Commonwealth of Massachusetts are naturally first responsible for protecting their business and the citizens of the state from liability. Indeed, Google should have some protection from flagrant misuse of intellectual property by their licensees. Similarly, individuals or entities should not be signatories to agreements that drag an entire state into court.

The losers from the inflexibility of this clause, however, are everyone else.

Even if reasonable from the perspective of major institutions, these licensing agreements are myopic and pernicious. If Universities cannot provide digital resources the way they provide physical resources, how can faculty and students be expected to advance scholarship in the 21st century with only the resources of the 20th century? Moreover, if libraries cannot expand their digital repositories except by what they digitize themselves, libraries will devolve into the dreaded data silos, re-imposing the limitations of time and distance that digital resources inherently overcome.

The Pompeii Bibliography and Mapping Project is (gladly) becoming the canary in the coal mine for this problem here at UMass. The problem has already passed from the UMass Libraries, to the Office of General Counsel, and back again. The issue has even become the kind of hallway banter and cocktail chatter that characterizes its intractability. For the PBMP, however, the conversation can’t end with a well-meaning sigh and shrug of the shoulders. The project is hamstrung without resolution.

There’s plenty of blame to go around here. Google should be more willing to negotiate the terms of its licensing agreement so that anyone can actually use what they have spent so much energy creating and defending. In fact, Google might look at how it just successfully defended itself against the Authors Guild complaint. Judge Chin wrote, “In my view, Google Books provides significant public benefits. […] Indeed, all society benefits.” Google’s licensing agreement stands in practical opposition to the theoretical benefits Judge Chin’s ruling describes. Here at UMass, negotiating access to digital resources should be a higher priority. Perhaps it’s an effect of faculty research being (perceived as) a small component of the university’s overall mission. What would happen if a licensing agreement were to prevent students from accessing course materials, such as e-reserves? More likely than not, we’ll soon find out the answer.

Photo: MikeBlogs/Flickr

We faculty have plenty to own up to as well. We’ve been donating our intellectual property to commercial enterprises in exchange for a physical binding and the imprimatur of legitimacy. Faculty need to remember when faced with a copyright release form that the production of scholarship – those ideas supported by hundreds of citations – is dependent on the consumption of scholarship. Now the very groups that want you to sign away your rights are trying to prevent you from using other people’s research unless you pay, because they own that too. We’re not just working for free, we’re working at a net loss.

It’s a business model that should worry universities more, especially public universities. What does it cost the university for a faculty member to write a book in terms of portions of salary when you consider leave time, research funding, physical infrastructure, and library resources, among other things? Most universities have technology transfer offices that directly transform mainly science scholarship into protected intellectual property. They help discoveries become inventions and eventually products. In the humanities and social sciences the process is the same, but the valuation of what we do is perverse. Of course, a book is unlikely to ever match the monetary rewards of a new heart medication, but the university puts as much skin in the game and then allows its investment to become someone else’s profit. It’s worse still. The university must then buy back its own investment, sometime after having supported the scholar via a publication subvention. Jon Stewart might say it this way: The university is a deli owner who makes sandwiches and lets the cashier keep the money from the sale and when the deli owner needs more ingredients, the cashier is suddenly the nearest supplier. So, why shouldn’t universities impose an “indirect costs” model upon for-profit publishers? If it is necessary for faculty to give up 59.5% of their federal grant money to universities because of the umbrella of services they provide, should not publishers and “copyright squatters” be forced into accepting more of the full process?

None of this is new, of course. But it’s the first time I’ve said it publicly and more and more people need to say it as well. Better yet, vote with your feet. Refuse to sign your copyright away. Publish in open access venues. And when it comes time to evaluate scholarship competitively for grants, fellowships, and for promotion, we must stop using the press name or journal title as shorthand for quality, or, rather, as proof that ideas found outside of major venues are of lesser value. University presses at public institutions should be at the vanguard of this change. Faculty should be encouraged to publish in-house and force major academic “content corporations” to license the scholarship from the university.

In the end, this specific impasse will likely be resolved. I’ve learned that my alma mater, the University of Virginia, has negotiated mutually acceptable indemnity language with Google. Some academics have even broken with their university counsel and signed the agreement as individuals, taking on the liability themselves. Considering that the Authors Guild had asked for $1500 in damages per book from Google, this seems like a dangerous, if not deadly, risk to take. That is, the likelihood that an individual would lose an infringement case may be very low, but the result would be catastrophic should it happen. What’s a scholar to do? People fly in planes, don’t they?

-EP

Postcards from Berlin: Reflections on Running a DH Project

I’m writing this post in Berlin’s Tegel airport on my way home from an excellent visit to the Digital Classicist Seminar hosted by the DIA and TOPOI. I had intended for our second post to share some reflections on running a grant-funded digital humanities project, in particular how we communicate in the PBMP and how much that matters. There was an excellent question from the audience last night, and it is still nagging me, so I’m going to wrestle with it in an addendum.

 

Running a grant-funded project is a lot like running a small business. Having co-directed an archaeological field project, the Pompeii Quadriporticus Project (PQP for short), for four years, this was not terribly surprising to me. There is, however, an important difference between the administration of the PQP and the PBMP. That difference is not in archaeological fieldwork vs. digital information production, but rather in duration. A graph of the PQP’s administrative duties forms a nice bell curve of work in my mind. It starts off at a slow pace with finding grants and writing applications for them. If successful, the next step is to begin recruiting students and staff. Then things really get moving as tasks shift to arranging travel, food, and accommodation. Intensity spikes during the field season itself. Afterwards there is the reckoning of what was accomplished. Some expect you to describe these accomplishments in archaeological reports, some in new course material, and some others need you to describe your activities in receipts. By September, there’s a peaceful lull before you yawn and look around again at the CFPs, thinking of next year’s work.

If a fieldwork project is a small business, then it is also a seasonal business. A grant-funded DH project, by contrast, is a construction contract. You’ve promised to deliver a product, even if that’s only proof of concept, and you’ve got a year (for example) to do it. Where I’ve noticed this difference most starkly is in the sheer volume of time spent in communication. On average, I spend three hours a day writing emails, making phone calls, attending meetings, and updating comments in our project management environment (more on that below)…and if I’m moving the project forward, this pace of communication needs to continue. The most disappointing aspect of this communication load is that it takes me away from the data itself, from reading the digitized books, browsing the names and places in our authority lists, or perusing the topography as it flows into our GIS. Perhaps this should not have surprised me, as directing an archaeological field project also takes one away from the day-to-day shifting of dirt (the old saw is that there’s an inversely proportional relationship between one’s authority in the field and the size of the tool she uses). It is sitting in the airport that brings the following metaphor to mind: if you’re good enough at flying planes, one day you’ll get promoted to air traffic control.

Fortunately, intermediaries, both people and technologies, help to ease that communication burden and permit some meaningful contact with data. In the first instance, having student managers who know the work and can communicate it well to their peers is of incalculable help. In passing information and instructions “downstream,” I only have to explain things once, and when confusion arises from those instructions, only the most confounding questions make their way back “upstream” to me. I also find the web-based project management tool, Trello, to be especially helpful in communicating aspects of the work. Trello functions like being in a room full of bulletin boards, to which you can affix individual tasks, also known as “cards”, like expandable “post-it” notes in categories of “To Do”, “Doing,” and “Done.” Trello and the PBMP are alike in this respect, each using a physical metaphor as a basic structure to organize information. Within each task is a flexible work space that permits users to assign a task to individuals or a team, to create checklists, to add comments, and to attach files to the card. The free account offers only 10MB of hosting space, but Trello is well integrated with both Dropbox and Google Drive. The latter is especially useful for extending the shared environment from task manager to work space. The checklists offer a great way to track the progress of work in both serial and sequential tasks and even the most bureaucratic duties, like students reporting the hours worked each week. Finally, because it tracks the progress of tasks and records not only the content of comments, but also their order, Trello functions as a de facto (but incomplete) documentation system. In the context of a fast-paced, communication-rich project like the PBMP, it is especially important to record not only that we accomplished something, but also how we did it. Since the PBMP is committed to serving as a model for other topographically and bibliographically rich subjects, whether archaeological or not, documenting our discursive work process at this micro level will provide a valuable record of successful strategies and common pitfalls.

Trello “card” assignment

This brings me neatly to one of the insightful questions raised Tuesday night at the Digital Classicist Seminar. Paraphrased, that question was:  “How should one go about doing their own project like the PBMP?” For all my rhetoric about the possibility of extending the lessons of the PBMP to another project, I must admit to having been temporarily stumped. My eventual answer was practical and technical in nature:

  1. Assess the suitability of the physical landscape in your project to scaffold your bibliographic data or other resources. For example, Pompeii is neatly divided into individual regions, city blocks, and properties with addresses and (mostly) unique names. How well divided and labeled is the space of your subject? How well does the content of your bibliographic data map onto that space? That is, do the texts use this spatial system regularly and consistently?
  2. Take an inventory of the available resources in both analog and digital formats. For Pompeii, we are fortunate to have its 250+ years of scholarship and 64 hectares of space already well defined in accessible print publications. The initial challenge was one of digitization. If your subject, however, has not codified its sources or maps into nominally canonical resources, the first step would be to undertake a comprehensive, “state-of-the-field” project. If starting here, building a carto-bibliographic system like the PBMP would be a far heavier lift.

Having had a night to think about it, I wish that I had added another issue to my response: take a personal census. Put simply, are YOU the best fit for such a project? In many cases, answering the technical questions above would act as a personality sorter, driving away all but those most committed to a subject with the volume of work the project represents. Still, running a major digital humanities project is facilitated by certain skill sets and personality traits. Let me express these, as I understand them (and not necessarily because I possess them), as a series of questions:

  • Do you prefer to work in teams?
  • Are you a good judge of talent? Of character?
  • Can you hire and fire people?
  • Can you motivate people with your own enthusiasm about “the big picture” when their daily work might be monotonous and dry?
  • Are you immune to the monotony of repetitive tasks?
  • Is salesmanship a positive word?
  • Do you like it when a new email arrives?
  • Can you share your work publicly before it is perfect?
  • Can you give away the data you’ve worked so hard to produce?
  • Is the recognition for what is essentially altruism a sufficient reward for you?

If you answered “Yes” to most of these questions, you may be well suited to accomplish and enjoy running a grant-funded digital humanities project.

 

Next time, the PBMP and the copyright hustle.

-EP

The PBMP: Getting Started

This is the first in a series of posts – frequent, but unlikely to be regular – discussing the work we are doing to achieve the goal of making the bibliography of Pompeian scholarship searchable via a map. The Pompeii Bibliography and Mapping Project (PBMP) is committed to this task and now has the necessary resources to complete it with the generous funding of both an American Council of Learned Societies Digital Innovation Fellowship and a National Endowment for the Humanities Digital Humanities Start-Up Grant. What the PBMP is and who we are can be found elsewhere on our website, so I won’t repeat that information here. Instead, what follows are some preliminary ruminations on our goals and what we have accomplished in our first month in operation.

Two of the three elements of the PBMP – the bibliographic catalog/subject repository and the online GIS – are already online in beta forms and linked together through only the most basic toponymic information. The work to replace both of these demos has been the focus of much of the first weeks of work. The bibliographic catalog is based on the incredible compilation of citations published by Laurentino Garcia y Garcia in his three-volume Nova Bibliotheca Pompeiana, which recently also has been released online in pdf. Our continuing work to digitize this resource was led last year by UMass student Jackson Mitchell with funding through a Project Funds Grant from the UMass College of Humanities and Fine Arts. Mitchell is continuing his leadership role this year, working with Kevin Nguyen (among others, some named below) to proof the catalog data and to add essential metadata.

The Nova Bibliotheca Pompeiana’s collection is current only through 2011, however, and new scholarship is constantly being published. Therefore, one challenge is to bring the catalog current and find new sources as they are produced. It dawned on me that libraries and research databases are able to quickly learn about new books. How do they do it? In addition to this line of investigation, Leslie Bradshaw (UMass Digital Humanities Initiative Graduate Assistant) is researching programmatic and crowdsourcing solutions so that the PBMP can keep current with the continually expanding list of publications on Pompeii. When we learn answers to these questions, they will be published on this blog, which is carefully managed by Chris Caro, the PBMP’s IT assistant.

We are also actively expanding the number of full-text documents that can be linked to these citations. UMass Librarian Annette Vadnais has been working with the Hathi Trust to gain access to their vast corpus of Google’s scanned books. Additional works, mainly from the Internet Archive, are being added to our collections through a collaboration with the UMass Center for Intelligent Information Retrieval’s (CIIR) A Million Scanned Books and Proteus projects. Digitizing some works with particular topographic importance is also underway in-house. In 2010, the PBMP was awarded one of the initial UMass Digital Humanities Initiative Seed Grants, which funded work on our bibliographic catalog and the purchase of an Atiz BookDrive Mini book scanner. Currently, the Pompei: Pitture e Mosaici volumes are being scanned by Tess Brickley. At the same time, Danielle Dyer is parsing the data in the OCR text from our scans of the Corpus Topographicum Pompeianum into a database of individual property records. Later, this database will be supplemented (including concordance) by property information from other sources, such as Eschebach’s Gebäudeverzeichnis und Stadtplan der antiken Stadt Pompeji, Astrid Schoonhoven’s Metrology and Meaning in Pompeii (Appendix I), and Damian Robinson’s The Shape of Space in Pompeii: Studies in the Social Production of a Roman Urban Landscape (Appendix, unpublished diss., Univ. of Bradford). Danielle is also correcting the scanned concordance of place names and addresses (“The Toponymy”, CTP II, pp. 1-203). These efforts to generate authority lists of the Pompeian topography will be essential to the task of making our full-text corpus searchable and connected to our GIS map. This work will begin in earnest this spring, as we apply Natural Language Processing techniques to our digitized catalog under the direction of Prof. David Smith.
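As a rough illustration of what parsing OCR text into individual property records might involve, here is a minimal Python sketch. It assumes the conventional Pompeian address form of Regio (Roman numeral I-IX), insula, and doorway, for example "VI 1, 7". The line layout, the `parse_record` helper, and the placeholder property name are all hypothetical; the post does not describe the actual structure of the CTP text or the database.

```python
import re

# Regio numerals I-IX, as used in Pompeian addresses.
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5,
         "VI": 6, "VII": 7, "VIII": 8, "IX": 9}

# Regio (Roman), insula (Arabic), doorway (Arabic), separated by any mix of
# spaces, dots, and commas. This pattern is an assumption about the OCR text.
ADDRESS = re.compile(
    r"\b(IX|IV|VI{0,3}|I{1,3})[\s.,]+(\d{1,2})[\s.,]+(\d{1,3})\b"
)


def parse_record(line):
    """Turn one line of OCR text into a property record, or None if no address is found."""
    match = ADDRESS.search(line)
    if match is None:
        return None
    # Whatever follows the address is treated as the (hypothetical) property name.
    name = line[match.end():].strip(" .,;:-")
    return {
        "regio": ROMAN[match.group(1)],
        "insula": int(match.group(2)),
        "doorway": int(match.group(3)),
        "name": name or None,
    }


if __name__ == "__main__":
    # Placeholder text, not a real CTP entry.
    print(parse_record("VI 1, 7 Placeholder House"))
```

Real OCR output would need more than this (ranges of doorways, lettered entrances, misread numerals), which is one reason hand correction of the concordance remains part of the work.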

Last week, Annette and I also had a fascinating and important meeting with Laura Quilter, UMass Copyright and Information Policy Librarian, to discuss the issues of copyright regarding what content the PBMP can and cannot make available. I am inspired to think this topic needs a post of its own, but two important points can be shared here. The first point is that the last three-quarters of the 20th century is a “copyright black hole” (my descriptor, not Laura’s). That is, between our current open access boom and 1938, there is a circa seventy-year period of digital silence. Within this period, copyright holders (mostly publishers) can prevent the wide redistribution of the important information within those titles. A second point potentially ameliorates the impact of the first: authors can petition publishers to release their copyright if the work is no longer in print and cannot be purchased for a reasonable price. The publisher has the option to reprint the work, but if they choose not to, the copyright must go back to the author. Later in the life of the PBMP, we will certainly pursue a strategy of contacting authors and publishers to get the 20th century to rejoin the 21st, as well as the 18th and 19th centuries, in terms of accessibility.

Fig. 1. Location of 2006 survey points, from Morichi, Panone, Rispoli, and Sampaolo (2006, 554-555).

The second major component of the PBMP is our online GIS map. Digitization of Pompeii’s landscape, in its most basic form, is complete and already online. The digitization process, however, was done in piecemeal fashion and for a separate purpose. These files served as the underlying GIS for my dissertation on Pompeii’s traffic system. Because the spatial data for the PBMP was born in dissertation research, the current GIS is overly specialized in some areas and lacks detail in others. Several subsequent publication projects (e.g. on stables and on the drainage system) also added new datasets of general (e.g. a new contour map of local topography) and very specific (e.g. the locations of street blockages) interest. The first task was therefore to inventory the hundreds of files (and their versions) in a dozen file formats, spread across three computers and six external drives, to find out what we are missing. That inventory is shared here as a Google Spreadsheet.
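An inventory like this can be started programmatically. The following is a minimal sketch, assuming each computer or drive is reachable as a folder path; the function names and column names are illustrative and are not the columns of the actual PBMP spreadsheet.

```python
import csv
from collections import Counter
from pathlib import Path


def inventory(roots):
    """Return one row per file found under the given root folders."""
    rows = []
    for root in roots:
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                rows.append({
                    "root": str(root),
                    "relative_path": str(path.relative_to(root)),
                    # Files sharing a stem (e.g. streets.shp, streets.dbf)
                    # are often parts or versions of the same dataset.
                    "stem": path.stem,
                    "format": path.suffix.lower().lstrip(".") or "(none)",
                    "bytes": path.stat().st_size,
                })
    return rows


def format_counts(rows):
    """Count how many files exist in each file format."""
    return Counter(row["format"] for row in rows)


def write_inventory(rows, out_path):
    """Write the inventory to a CSV file that a spreadsheet can import."""
    fields = ["root", "relative_path", "stem", "format", "bytes"]
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```

Calling `inventory` with a list of folder paths and passing the result to `write_inventory` produces a table that can then be annotated by hand, since deciding which version of a file is authoritative is not something a script can do.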

Fig. 2. Example of SAP survey point sheet (ST 60).

Even the very first step in creating the GIS, georeferencing the Soprintendenza Archeologica di Pompei’s (SAP) AutoCAD plan of the site, was done as part of another project. In July of 2006, I was completing my charge to build an accurate 3D wireframe model of Insula VI 1 for the Anglo American Project in Pompeii (AAPP) using a reflectorless laser theodolite. During the same period the SAP commissioned a theodolite transit of the city and the creation of nearly 100 georeferenced survey points (figs. 1-2). We were given three of these points and I began the process of shifting the AAPP model, joined to the SAP’s basic CAD plan of the entire city, into real world coordinates. The process, conceptually, was very easy:

  1. Capture the SAP survey points within the local AAPP grid and model.
  2. Create a point in UTM coordinates in the CAD model for each of the SAP survey points.
  3. Copy the joined SAP and AAPP models and paste them into UTM coordinates based on the location of one of the points in step #2.
  4. Rotate the entire model so that a second point in the original model is moved to match the location of a second point in step #3.
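The steps above amount to a translation followed by a rotation about the first matched point. Here is a minimal 2D sketch in Python; the work described in this post was done in AutoCAD, not in code, so the `georeference` function and the coordinates in the usage note are purely illustrative assumptions.

```python
import math


def georeference(local_pts, local_a, local_b, utm_a, utm_b):
    """Move local_a onto utm_a, then rotate about utm_a so that the direction
    from a to b in the model matches the direction from a to b in UTM.

    Returns (transformed_points, residual_at_b). Because a rotation preserves
    distances, the residual is nonzero whenever the two measured a-to-b
    distances disagree.
    """
    # Rotation angle: difference between the two a-to-b bearings.
    theta = (math.atan2(utm_b[1] - utm_a[1], utm_b[0] - utm_a[0])
             - math.atan2(local_b[1] - local_a[1], local_b[0] - local_a[0]))
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    def transform(p):
        # Position relative to the pivot point (this is the translation step).
        x, y = p[0] - local_a[0], p[1] - local_a[1]
        # Rotate about the pivot and place the result in UTM coordinates.
        return (utm_a[0] + x * cos_t - y * sin_t,
                utm_a[1] + x * sin_t + y * cos_t)

    moved_b = transform(local_b)
    residual = math.hypot(moved_b[0] - utm_b[0], moved_b[1] - utm_b[1])
    return [transform(p) for p in local_pts], residual
```

With made-up coordinates, `georeference([(5.0, 0.0)], (0.0, 0.0), (10.0, 0.0), (100.0, 200.0), (100.0, 210.1))` lands the first control point exactly but leaves the second about 0.1 units short, because the two sources disagree on the distance between the points.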

Fig. 3. Angular distortion across insula VI 1.

The actual transformation procedure, however, became relatively complex due to the differing levels of precision among GPS readings (+/- 10cm), electronic survey measurements (+/- 1cm), and the infinite scalability of AutoCAD. Thus, although the model could be moved to the exact location of any single point (step #3), the difference in precision meant that the model could not be rotated and “snapped” exactly to any second point (step #4). More importantly, errors in the GPS points’ positions were compounded in the rotation procedure, expanding over distance and ensuring that the southern and eastern portions of the city would suffer the greatest distortions (figs. 3-4). Fixing the errors in position (and projection) is one of the initial tasks the PBMP will undertake, led by UMass senior GIS analyst, Alexander Stepanov.
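To get a feel for how such an error grows with distance, the figure reported in the caption to Fig. 4 (0.26m over 51.36m) can be converted into an angle and projected over a longer baseline. This is a back-of-the-envelope sketch. It assumes the 0.26m is a perpendicular offset and that the same angular error applies unchanged across the site, neither of which the post states.

```python
import math


def angular_error_deg(offset_m, baseline_m):
    """Angle, in degrees, implied by a perpendicular offset over a baseline."""
    return math.degrees(math.atan2(offset_m, baseline_m))


def projected_offset_m(offset_m, baseline_m, distance_m):
    """Offset at a given distance from the pivot if the angular error stays constant."""
    return distance_m * offset_m / baseline_m


if __name__ == "__main__":
    # Values from the Fig. 4 caption; the angle comes out at roughly 0.29 degrees.
    print(round(angular_error_deg(0.26, 51.36), 2))
    # Hypothetical distance of ten times the measured baseline.
    print(round(projected_offset_m(0.26, 51.36, 513.6), 2))
```

Even a fraction of a degree matters: under these assumptions a feature ten times farther from the pivot than the measured baseline would be displaced by ten times as much.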

Fig. 4. Detail of angular distortion. Error is as much as 0.26m over 51.36m.

Following the correction of positional errors, the next task will be to assess the internal consistency of the spatial data. All the city’s features not found in the original CAD plan – for example, polygons of the streets and street features, unexcavated areas, gates and fortifications, fountains and water towers, as well as the individual insulae and properties – were hand drawn by me over the course of a decade. Minor inconsistencies due to (my) human error in the shapes of features and especially in how those features interact with others – by overlapping, containing, forming a boundary, etc. – must be identified and corrected. In the course of this work we will also establish our basic topological rules for the data, defining how objects can and cannot function in the GIS. One obvious rule will be that properties cannot be overlapped by other properties, but others will also be devised.
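A rule such as "properties cannot overlap" can be checked mechanically. The sketch below uses the separating axis test, which is valid only for convex polygons. That is a loud simplification, since real property outlines are often not convex and a GIS package's own topology tools would be the practical choice. The property identifiers and function names are hypothetical.

```python
from itertools import combinations


def _project(poly, axis):
    """Project every vertex onto an axis and return the (min, max) extent."""
    dots = [x * axis[0] + y * axis[1] for x, y in poly]
    return min(dots), max(dots)


def interiors_overlap(a, b, tol=1e-9):
    """True if two CONVEX polygons share interior area.

    Polygons that only touch along an edge or at a corner (as neighbouring
    properties should) are not counted as overlapping.
    """
    for poly in (a, b):
        n = len(poly)
        for i in range(n):
            x1, y1 = poly[i]
            x2, y2 = poly[(i + 1) % n]
            axis = (y1 - y2, x2 - x1)  # normal to this edge (not normalized)
            a_min, a_max = _project(a, axis)
            b_min, b_max = _project(b, axis)
            # A gap (or mere contact) along any edge normal separates them.
            if a_max <= b_min + tol or b_max <= a_min + tol:
                return False
    return True


def find_overlaps(properties):
    """Return pairs of property ids whose polygons violate the no-overlap rule."""
    return [
        (id_a, id_b)
        for (id_a, poly_a), (id_b, poly_b) in combinations(sorted(properties.items()), 2)
        if interiors_overlap(poly_a, poly_b)
    ]
```

Allowing contact but not shared area reflects the situation described here: adjacent properties legitimately share party walls and boundaries, so only true overlaps should be flagged for correction.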

In the coming weeks we hope to not only detail our progress in this blog, but also have demos and beta products to offer for testing and feedback. And, let me say it now for the first of what will be many times: we are always interested in your suggestions for what bibliographic information should be included in our catalog, what aspects of Pompeii’s urban topography should be digitized, and what functionalities the PBMP should offer to you, our community of interested users. Having read this far, your interest must be genuine indeed.

-EP