1
Challenges and Solutions for Digital Geospatial Data Preservation

Jeff Essic
Geospatial Data Services Librarian
North Carolina State University Libraries
2
NC Geospatial Data Archiving Project
  • Partnership between university library (NCSU) and state agency (NCCGIA), with Library of Congress under the National Digital Information Infrastructure and Preservation Program (NDIIPP)


  • One of 8 initial NDIIPP collection building partnerships


  • Focus on state and local geospatial content in North Carolina (state demonstration)


  • Tied to NC OneMap initiative, which provides for seamless access to data, metadata, and inventories


3
NCGDAP Goals
  • Repository Goal
    • Capture at-risk data
    • Explore technical and organizational challenges
  • Project End Goal
    • Data Producers: Improved temporal data management practices
    • Archives: More efficient means of acquiring and preserving data; progress towards best practices

4
NCGDAP Specifics
  • Funding:
    • $520,000 for 2005-2007
    • $500,000 for 18-month extension

  • Staff:
    • 1.5 at NCSU
    • Approx. same at NCCGIA
5
 
6
Outline
  • Key Geospatial Data Types


  • Risks to Digital Geospatial Data


  • Value in Temporal/Historical Geospatial Data


  • Archiving Challenges


  • Solutions in Progress
7
Key Geospatial Content Types
8
 
9
 
10
 
11
 
12
 
13
 
14
Other Geospatial Data Types: Web 2.0 Content
15
Geospatial Data: Compelling Issues
  • Dynamic content
    • Constantly updated information
    • Data versioning


  • Digital object complexity
    • Spatially enabled databases
    • Complicated, multi-component formats
    • Proprietary formats
16
Risks to Geospatial Data
17
Digital Preservation Points of Failure
  • Data is not saved, or …
  • can’t be found, or …
  • media is obsolete, or …
  • media is corrupt, or …
  • format is obsolete, or …
  • file is corrupt, or …
  • meaning is lost
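Several of these failure points (corrupt media, corrupt files, files that can't be found) can be detected with routine fixity checks. The following is a minimal sketch, not the project's actual tooling: it assumes a staged data directory on disk, and the function names (`sha256_of`, `build_manifest`, `verify_manifest`) are hypothetical names chosen for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose checksum changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

A manifest built at acquisition time and re-verified on a schedule flags missing or altered files. It does not address format obsolescence or loss of meaning, which need format migration and metadata.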
18
Risks to Geospatial Data
  • Producer focus on current data
    • Data overwrite as common practice
  • Future support of data formats in question
    • No open, supported format for vector data
  • Shift to web services-based access
    • Data becoming more ephemeral
  • Inadequate or nonexistent metadata
    • Impedes discovery and use
  • Increasing use of spatial databases for data management
    • The whole is greater than the sum of the parts
19
Value in Historical/Temporal Geospatial Data
20
 
21
 
22
 
23
 
24
 
25
Preservation Challenges
26
Challenge: Data Capture
27
Challenge: Data Capture
  • Industry focus on “latest and greatest” data
  • Industry is “temporally impaired” with respect to data availability, software support, etc.


  • Loss of memory about the data
  • Of superseded county orthophoto flights in NC:
    • Only 22% recorded in the state’s GIS inventory
    • Only 30% accessible through county map servers
28
 
29
 
30
Challenge: Preservation Metadata
31
Challenge: Vector Data Formats
  • No widely-supported, open vector formats for geospatial data
    • Spatial Data Transfer Standard (SDTS) not widely supported
    • Geography Markup Language (GML) – diversity of application schemas and profiles a challenge for “permanent access”

  • Spatial Databases
    • The whole is more than the sum of the parts, and the whole is very difficult to preserve
    • Can export individual data layers for curation, but relationships and context are lost
32
 
33
Challenge: Digital Object Complexity
34
Challenge: Cartographic Representation
35
Challenge: Geospatial Web Services
36
 
37
Other Challenges

  • Rights management
  • Data versioning
  • Semantic issues
  • Large scale content transfer
  • Integrating older analog data
  • More …





38
Solutions in Progress
39
Different Ways to Approach Preservation
  • Technical solutions:  How do we preserve acquired content over the long term?


  • Cultural/Organizational solutions: How do we make the data more preservable—and more prone to be preserved—from point of production?


40
Different Ways to Approach Preservation
  • Technical solutions:  How do we archive acquired content over the long term?
    • Build data repositories: not just as an end in itself but also as a catalyst for discussion within the data community
    • Develop repository ingest workflows: create technical points of engagement with other NDIIPP preservation projects and build on collective learning experience


41
Different Ways to Approach Preservation
  • Cultural/Organizational solutions: How do we make the data more preservable—and more prone to be archived—from point of production?
    • Engage the data producer community and spatial data infrastructure through outreach; influence practice
    • Sell the problem to software vendors and standards development bodies
    • Find overlap with more compelling business problems: disaster preparedness, business continuity, road building, etc.
    • Start a discussion about roles at the local, state, and federal level
42
Content Identification
43
Formal Inventory Processes
  • Alleviate “contact fatigue” on part of local agencies
    • 20 different NC state agencies contact local agencies for data … also, federal/regional agencies
  • Geospatial data is complex, requiring lengthy inventory process
    • Must capture descriptive, technical, and administrative information related to the data
  • Make the inventory available as a sharable data store






44
What do Inventories Offer to Archives?
  • Data Availability Information
    • Detailed information by data layer
  • Contact Information
  • Minimal Metadata
    • Descriptive, technical, administrative
  • Rights Information
  • Document Technical Environment
    • Software used, formats, transfer methods
  • Future Data Development Plans





45
Detailed Information About Data
46
Inventories as Source of Metadata
Example: Surface Water
47
Content Selection
48
Selection Issues

  • Most content is already at some level of risk
  • Early-Middle-Late Stage issues
    • Middle stage is usually the “sweet spot”, e.g. TIFF orthophotos vs. raw images or compressed images
  • Also added-value products: digital maps, cartographic representation
    • Digital maps: “record” or not?
  • Frequency of capture






49
 
50
Sept. 2006 Frequency of Capture Survey
  • Survey objective:
    • Document current practices for obtaining archival snapshots of county/municipal geospatial vector data layers
    • Seek guidance about frequency of capture
  • Survey topics:
    • General questions about data archiving practice
    • Specific questions about parcels, street centerlines, jurisdictional boundaries, and zoning
  • Survey subjects:
    • All 100 counties and 25 municipalities
    • 58% response rate
    • Survey conducted September 2006

51
Frequency of Capture Survey
52
Data Capture Survey Results: Overview
  • Two-thirds of responding agencies create and retain periodic snapshots
  • Long-term retention more common in counties with larger populations
  • Storage environments vary, with servers and CD-ROMs most common
  • Offsite storage (or both onsite and offsite) is used by nearly half of the respondents
  • Popularity of historic images has resulted in scanning and geo-referencing of hardcopy aerial photos among one-third of the respondents





53
Survey Observations
  • Process of survey formulation and implementation helped to socialize the problem of archiving data
  • Local innovation needs to be mined further to inform development of best practices
  • Business drivers for archiving need more study (e.g., stated adherence to retention policy)
  • Exposure to peer practice encourages archiving
  • Pronounced local interest in scanning/rectifying older analog maps and imagery





54
Content Exchange
55
Solutions: Content Exchange Infrastructure
  • High volume of state/federal requests for local data
  • Solving the present-day problems of data sharing is a pre-requisite to solving the problem of long-term access
  • Leveraging more compelling business reasons to put the data in motion (disaster preparedness, business continuity, highway construction, census, …)
  • Content exchange networks:
    • Minimize need to make contact
    • Add technical, administrative, descriptive metadata
    • Establish rights and provenance






56
Solutions: Content Exchange Infrastructure

  • Nov. 2007: NC Geographic Information Coordinating Council (GICC) released Ten Recommendations in Support of Geospatial Data Sharing
    • Recommendation: “Establish archive and long term data access strategies”
    • Suggested best practices include: “Establish a policy and procedure for the provision of access to historic data, especially for framework data layers.”


    • http://www.ncgicc.org/CurrentActivities/TenRecommendationsinSupportofGeospatialData/tabid/156/Default.aspx
57
Solutions: Get the Data in Motion

  • Harvesting use cases for older data as part of outreach





58
Solutions: Getting the Data in Motion
59
 
60
"Tracking data"
  • Tracking data, map servers, and web services since 2000


  • Ranked 3rd in traffic among entry points to library website


  • Persistent identifiers
    • usage tracking
    • IDs used in other sites


  • Peers compare activities


  • Community help in site maintenance



61
Repository Development
62
General Workflow
  • Receive Data from Agency
  • Copy data from agency source to NCSU workstation
  • Create DSpace collection “space” for the data
  • Create administrative metadata
  • Process geospatial metadata
  • Scan geospatial formats and migrate to archival format
  • Ingest original and archival data objects, plus geospatial and administrative metadata, into DSpace
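As an illustration of the "create administrative metadata" step, here is a minimal sketch that summarizes a delivery copied to a workstation. It is not the project's actual workflow code: the function name `build_admin_record` and every field name in the record are assumptions made for this example.

```python
from collections import Counter
from datetime import date
from pathlib import Path


def build_admin_record(agency: str, data_dir: Path, received: date) -> dict:
    """Summarize a delivered data directory as a simple administrative record."""
    files = sorted(p for p in data_dir.rglob("*") if p.is_file())
    return {
        "source_agency": agency,                      # who supplied the data
        "date_received": received.isoformat(),        # when it arrived
        "file_count": len(files),
        "total_bytes": sum(p.stat().st_size for p in files),
        # Count files per extension as a rough indicator of formats present.
        "formats": dict(Counter(p.suffix.lower() or "(none)" for p in files)),
    }
```

A record like this could be stored alongside the data before ingest so that provenance and format information is captured while it is still known.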
63
Repository Status
  • Acquired 4 TB of data with more on the way


  • Disk space being used initially for “data staging”
    • Inventorying

  • In the process of ingesting content into DSpace
    • Metadata generation



64
Summary
65
Data Capture Challenge – Implemented Solutions:
  • Downloading or acquiring “low hanging fruit”
  • Frequency based on FOC survey
  • Tapping into existing content exchange networks
    • Orthophoto “sneakernet”
    • NC OneMap
    • NCStreetmaps.org
    • Floodplain Mapping data distribution
    • Others…


66
Preservation Metadata Challenge – Implemented Solutions:
  • Creating our own based on:
    • Non-standard documentation
    • Inventories
    • Personal information exchanges
    • Data context
    • Clues, memory, and other sleuthing


67
Vector Data Formats and Complexity Challenges – Implemented Solutions:
  • Converting and Preserving data in Shapefile format
    • Not ideal, but…
    • Specifications are published
    • Stable, widely accepted and known format

  • Ingest content into DSpace object model
    • Exportability, Transfer, Extraction, and Conversion being tested
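Because a Shapefile is a multi-component format, a dataset is only usable if its sidecar files travel together. The sketch below checks a staged Shapefile for its mandatory components (.shp, .shx, .dbf) and for the optional .prj file that carries the coordinate system. The function name `check_shapefile` and the report keys are hypothetical, and this is an illustration, not the project's validation procedure.

```python
from pathlib import Path

# Mandatory components per the published Shapefile specification.
REQUIRED = (".shp", ".shx", ".dbf")
# Optional, but without it the coordinate system is undocumented.
RECOMMENDED = (".prj",)


def check_shapefile(shp_path: Path) -> dict[str, list[str]]:
    """Report which component files are missing for the given .shp path."""
    stem = shp_path.stem.lower()
    # Compare names case-insensitively, since deliveries mix upper and lower case.
    found = {
        p.suffix.lower()
        for p in shp_path.parent.iterdir()
        if p.is_file() and p.stem.lower() == stem
    }
    return {
        "missing_required": [ext for ext in REQUIRED if ext not in found],
        "missing_recommended": [ext for ext in RECOMMENDED if ext not in found],
    }
```

Running a check like this before ingest catches incomplete transfers while the producing agency can still be asked for the missing pieces.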
68
Cartographic Representation Challenge – Implemented Solutions:
  • Scanned, georeferenced, and compressed over 286 NC geologic maps, in cooperation with NC Geologic Survey
69
Geospatial Web Services Challenge – Implemented Solutions:
  • Still searching
  • WMS (Web Map Service)
    • Can only capture derived static images, losing the underlying data intelligence
    • Possible use for agent-based image atlas creation
  • WFS (Web Feature Service)
    • Transfers actual vector data as GML
    • Not widely deployed; variation in configuration
    • Scalability for bulk transfer questionable
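To make the WMS/WFS contrast concrete, the sketch below builds the two kinds of request using the standard key-value parameters of WMS 1.1.1 GetMap (returns a rendered image) and WFS 1.1.0 GetFeature (returns features as GML). The endpoint URLs and layer names in the usage note are placeholders, and the helper names are hypothetical. Real servers differ in supported versions and configuration, so treat this as an assumption-laden illustration.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width, height,
                   srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL; the response is a static image."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


def wfs_getfeature_url(base_url, type_name, max_features=None):
    """Build a WFS 1.1.0 GetFeature URL; the response is vector data as GML."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "1.1.0",
        "REQUEST": "GetFeature",
        "TYPENAME": type_name,
    }
    if max_features is not None:
        params["MAXFEATURES"] = max_features  # cap the response size
    return f"{base_url}?{urlencode(params)}"
```

For example, `wms_getmap_url("http://example.org/wms", "roads", (-84.5, 33.8, -75.4, 36.6), 800, 400)` yields a request for an 800 by 400 pixel picture of the layer, while `wfs_getfeature_url("http://example.org/wfs", "roads")` asks for the features themselves. Only the second preserves the underlying data.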

70
Engaging Spatial Data Infrastructure
71
NC Spatial Data Infrastructure:  NC OneMap
  • NC OneMap is a next generation mechanism to coordinate and disseminate geographic information in North Carolina and interact with the NSDI.


  • Objectives:


  • Build a common understanding of North Carolina data resources


  • Enable widespread access and distribution of geospatial data




72
NC OneMap
  • Objectives (cont.):


  • Develop ongoing data inventory for all geospatial data holdings: http://nc.gisinventory.net


  • Develop content standards for key data themes
  • NC Geographic Information Coordinating Council (GICC)


  • One of the defined characteristics of NC OneMap is that “Historic and temporal data will be maintained and available”.


73
Points of Engagement with Spatial Data Infrastructure
  • Framework data communities
    • Snapshot frequency, naming schemes, classification, GML application schemas, format strategies
  • Metadata standards and outreach
    • Persistent identifiers, versioning, feedback on metadata quality
  • Content replication/transfer
    • For data improvement projects, disaster preparedness, aggregation by regional service providers, … and archives
  • Where does archiving and preservation fit in?







74
Archival and Long Term Access Working Group
  • Initiated by NC Geographic Information Coordinating Council in 2008 to address growing concerns of state and local agencies about long-term access to data
  • Federal, state, regional, and local agency representation
  • Key focus
    • Best practices for data snapshots and retention
    • State Archives processes: appraisal, selection, retention schedules, etc.
    • Who, What, Why, When, Where, How
  • Promising outcome of NCGDAP – multiple parties and levels discussing data archiving on their own.


75
Regional Partnerships
  • Focused on development of shared infrastructure for cultivating access to data
  • Becoming test beds for innovation in the area of data sharing and data management, including archiving


76
NDIIPP Multi-State Geospatial Project
  • Lead organizations: North Carolina Center for Geographic Information & Analysis (NCCGIA) and State Archives of NC
  • Partners:
    • Leading state geospatial organizations of Kentucky and Utah
    • State Archives of Kentucky and Utah
    • NCSU Libraries in catalytic/advisory role
  • State-to-state and geo-to-Archives collaboration
  • 2-year project: Nov. 2007-Dec. 2009
  • Archives as part of Spatial Data Infrastructure
77
Engaging Industry
78
Cultural: Changing Industry Thinking
  • Is the geospatial industry “temporally-impaired?”
    • Lack of access to older data
    • Lack of tool/model support for temporal analysis
    • Metadata: poor support for changing data
    • Education: building class projects around available data (i.e., not temporal)
  • Increased interest now in temporal applications?
    • Increased demand for temporal data?
    • Improved tool support: ArcGIS 9.2 animation tools; Geodatabase History, etc.




79
Project Status
80
Conclusions
81
Conclusions
  • “Supporting temporal analysis requirements” gets more attention than “archiving and preservation”
  • Leverage existing infrastructure
  • Current data sharing needs drive infrastructure improvements that help archiving
  • Leverage business needs that are more compelling than preservation (e.g., continuity of operations)
  • Facilitate stakeholder ownership of the solutions
  • Mine state and local archiving innovations






82