Mapping the Mapping Metadata
I would like to credit Alexander Stepanov for his help in generating the content and the language of this post. – EP
In preparation for the beginning of the PBMP’s grant term (with generous funding from both the ACLS and the NEH), I took a census of all the spatial data I had generated for Pompeii. The purpose of this inventory was, of course, to find the best, most recent data possible. While these files had been an excellent spatial foundation for running analyses on my own research interests at Pompeii (i.e., the architecture, economy, infrastructure, and the circulation of pedestrian and vehicular traffic), they are neither spatially consistent nor sufficiently described in metadata. In some cases, individual features did not even have unique identifiers beyond their object ID fields.
For most of October and November, therefore, the primary work of the PBMP’s mapping team was the careful examination and consideration of these data. Under the supervision of Alexander Stepanov, GIS architect for UMass Amherst and the PBMP, we examined each file, looking for important data to keep, necessary data to transform, and redundant or extraneous data to eliminate. For example, because many of the GIS files were first created in AutoCAD, there were a number of fields with data that were merely artifacts of this process. On the other hand, many of the files held objects that had been drawn in individual CAD layers, such as the outlines of the identified buildings in the city. In these cases, our task was to keep the unique attributes of each record within the feature class, but also to extend a uniformity across all features in our databases.

One of the stickier questions was what to do with features made up of multiple polygons, but which represented an architectural or interpretive whole. The city gates are the best example of this. While we are loath to collapse the complexity inherent in the parts, we are equally averse to denying the obvious unity of their function, especially when a researcher might be more interested in the entire footprint of a gate than in merely the sum of its parts. In the end, we kept the individual polygons of the gates, but also added a footprint class. This solution was actually generated for the fountains of Pompeii, which each have a complete footprint, area of masonry, and a basin to articulate spatially. Below are the old and new attribute tables, respectively, for the Gates feature at Pompeii.
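As a minimal illustration of the footprint idea, the sketch below dissolves the constituent polygons of each gate into a single aggregate feature while leaving the detailed polygons untouched. It assumes geopandas, and the file and field names are hypothetical; in practice this step lives inside our FME model rather than in standalone code.

```python
# Sketch: build a "footprint" class from the multiple polygons that make up
# each gate, while keeping the original detailed polygons as their own layer.
# Assumes geopandas; file and field names are hypothetical.
import geopandas as gpd

# Detailed polygons: one record per masonry element, tagged with its gate.
gates = gpd.read_file("gates_detailed.shp")

# Dissolve the parts of each gate into a single polygon for the footprint class.
footprints = gates.dissolve(by="gate_name", as_index=False)

# Keep only the attributes that make sense for the aggregate feature.
footprints = footprints[["gate_name", "geometry"]]
footprints.to_file("gates_footprint.shp")
```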
The result of these efforts was a model, constructed in Safe Software’s Feature Manipulation Engine (FME). FME is our software of choice for the way it manages workflows, allowing us to recreate a dataset from the original files on demand rather than requiring the creation of intermediate datasets. In this way we can recalculate and reassemble any particular piece of spatial data, changing parameters or classifications at any point in the process without creating a pile of redundant intermediary files. Thus, if a new attribute is required, it can be easily added, including being joined from an external database, before or after any other process in the model. For example, one of the first tasks set for the mapping team was the development of unique naming conventions (GIS Name) and prefixes (Code) for all our spatial data. We settled on an initial set of 28 layers and solidified their nomenclature in a spreadsheet. Using this spreadsheet and FME’s Joiner function, we were able to simultaneously funnel all our spatial data through this table and use it to rename the feature class as well as all the individual records within that file. After undergoing further transformations (up to 30 attributes condensed to 11) in what Stepanov has aptly titled the “Data Centrifuge”, these processed tables “fan out” once again into individual files.
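For readers who do not work in FME, the fragment below approximates in pandas what the Joiner and the “Data Centrifuge” do in our model: every record passes through the naming-convention spreadsheet, picks up its GIS Name and Code, is condensed to an agreed-upon set of fields, and is then fanned out into one file per layer. The file names, column names, and the particular field list are hypothetical.

```python
# Approximation of the FME Joiner + "Data Centrifuge" + fan-out, in pandas.
# Assumes pandas; all file and column names here are hypothetical.
import pandas as pd

# The naming-convention spreadsheet: one row per layer.
names = pd.read_excel("naming_convention.xlsx")    # columns: Layer, GIS_Name, Code

# An attribute table exported from one of the original shapefiles.
records = pd.read_csv("doorways_attributes.csv")   # includes a Layer column

# Join every record to its standardized name and prefix (FME's Joiner).
records = records.merge(names, on="Layer", how="left")

# Condense the attribute set (the "Data Centrifuge" step), keeping only
# an agreed-upon subset of fields.
keep = ["GIS_Name", "Code", "Source", "Date_Updated"]
condensed = records[[c for c in keep if c in records.columns]]

# "Fan out": write one table per GIS_Name, mirroring FME's fanout on write.
for gis_name, group in condensed.groupby("GIS_Name"):
    group.to_csv(f"{gis_name}.csv", index=False)
```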
For example, the first workflow transfers a set of shapefiles (each with different attributes) into a set of Feature Classes in a File Geodatabase with a uniform set of attributes, and automatically populates some of those attributes (e.g., setting data sources, date of update, etc.). The workflow also forms uniform Feature Class names according to the naming convention established in the spreadsheet described above. Then, at the end of the “Data Centrifuge” block, all features “fan out” into separate Feature Classes based on the value of the name attribute.
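To make that first workflow concrete, here is a rough stand-in written with geopandas: it reads each original shapefile, auto-populates the bookkeeping fields, and fans the features out by their name attribute. A GeoPackage stands in for the File Geodatabase (writing Esri File Geodatabases requires arcpy or a suitable GDAL driver), and the paths and field names are hypothetical.

```python
# Rough stand-in for the first FME workflow, assuming geopandas.
# A GeoPackage substitutes for the File Geodatabase; paths and field
# names are hypothetical.
import datetime
import glob

import geopandas as gpd

for path in glob.glob("original_shapefiles/*.shp"):
    layer = gpd.read_file(path)

    # Auto-populate the bookkeeping attributes.
    layer["Source"] = path
    layer["Date_Updated"] = datetime.date.today().isoformat()

    # Fan out into separate feature classes based on the name attribute
    # (a hypothetical "GIS_Name" field assigned earlier in the workflow).
    for gis_name, features in layer.groupby("GIS_Name"):
        features.to_file("pbmp.gpkg", layer=gis_name, driver="GPKG")
```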
Because of the iterative nature of our work, the development of such a model was essential. That is, because we only fully come to understand what we are doing and what needs doing by doing it, we must have a framework for writing and rewriting metadata that is as flexible as we are, one that can easily apply the lessons we learn. We have worked with these initial 28 shapefiles, but plan to process all the relevant files in the inventory. Before these can be widely shared, however, a bit more work must be done. Thus, they must be:
- nudged into place, as the files’ alignment is slightly off from standard mapping projections.
- reviewed for spatial consistency; e.g., making sure that features don’t have unwanted gaps or overlaps (see the sketch after this list).
- prepared for further data linkages, including data from within publications and from across the internet.
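A minimal sketch of the first two items, assuming geopandas: reprojecting a layer into a standard mapping projection and flagging overlapping features. The file name and the target projection (WGS 84 / UTM zone 33N, a plausible but purely illustrative choice for Pompeii) are hypothetical, and a gap check could be run along similar lines.

```python
# Sketch: reproject a layer into a standard mapping projection and flag
# unwanted overlaps between features. Assumes geopandas; names are hypothetical.
import geopandas as gpd

layer = gpd.read_file("insulae.shp")

# 1. Nudge into a standard projection (illustrative EPSG code).
layer = layer.to_crs(epsg=32633)

# 2. Flag overlapping feature pairs using the spatial index.
overlaps = []
sindex = layer.sindex
for i, geom in enumerate(layer.geometry):
    for j in sindex.query(geom, predicate="overlaps"):
        if i < j:
            overlaps.append((i, int(j)))

print(f"{len(overlaps)} overlapping pairs to review")
```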
These will be the subject of a subsequent post or series of posts.
– EP