greenelk: Representing geodata, in and around Python

Scenario and use case: gps location, track and map data

The overall use case is the application ecosystem of geodata for the purpose of outdoor sports, with a dash of travel planning and travel photography. The end user need is to plan, execute and document adventures, where locations, tracks and maps play a central role.

My recent eager geodata blog entries are prompted by exciting stuff I’m doing to visualise sports adventures on maps. The hacking is done in Python, but I am less than certain it’s good code. So I allow myself to ask another question: How should I represent geodata objects, for maximum usability during editing and data interchange? And most of all: How should geodata be elegantly represented in the Python code itself?

Requirements

OVERALL APPROACH: HACKING AND PROTOTYPING

Given today’s technology, the possibilities for planning, executing and documenting adventures are plentiful and, honestly, not even fully understood. Add tomorrow’s technology, and the result is a constantly shifting set of requirements.

The end result is a scenario that requires prototyping. Quick code snippets are hacked and tested. Snippets are improved and extended. Needs are covered better, and more needs are covered.

Roles in the target audience

A distinction between developers and end users exists, but is somewhat blurred. Few outdoor enthusiasts are expected to be able or available to write code, but many are known to have a technical view on editing and interchanging data. Even more are pure end users.

On the continuum between end user and developer, individuals can certainly move over time, but let’s simplify by identifying three categories (and their roles during prototyping):

  1. Surfers (content viewers). The largest group. Use the web and apps. Point and click.

  2. Techies (content creators). A smaller but still large group. Accept (or even like) character based editors and the command line. Have some IT skills and a technical interest. Often coders and developers of other applications.

  3. Coders (developers). A tiny group. Contribute to the code of this application.

Until needs are well understood, only techies are expected to create content in the form of describing placemarks, uploading and editing tracks, and planning trips. At some point, as needs are stable and tools mature, part of the resources will be devoted to enabling surfers to create new content. When and how this happens is outside the scope of this text.

Surfer requirements and needs

  1. View: Browse maps of adventures, before and after participating in them.

Techie requirements and needs

  1. Offline: Functionality must be available when not connected to the internet (which can be either flaky or expensive when you’re on an outdoor expedition). This means that Python packages requiring a separate installation are undesirable. The standard library rulez.

  2. Trackers: Ability to read data from personal GPS trackers.

  3. Import: Obtain placemarks and tracks entered in other tools or made by others.

  4. Edit: Plan and annotate adventures, tune imported tracks and placemarks.

  5. Export: Share adventures in formats viewable and editable by other applications and by other users.

  6. Usability: Across all the other requirements (Offline, Trackers, Import, Edit, Export), data should be dead easy to edit, control, and keep track of.

Coder requirements and needs

  1. Coding: Elegant code. Compact. Easy to write, understand, maintain.

  2. Multi-language: Python is the chosen language for the foreseeable future. But one day, we may want to create an adventure viewer and browser on a platform where Python isn’t supported. So today, we should choose clever interchange formats.

  3. Metadata: No redundancy when describing data structures. Have only one place where data is described (such as a JSON file), and make all other places use it, with as little code as possible.

  4. Mapping: Have simple mapping mechanisms between data representations. Make it easy to import and export between code objects and various file formats, such as KML and the various CSV and JSON formats described below.
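A minimal sketch of how the Metadata and Mapping needs could fit together, assuming the data structure is described exactly once as a list of (field name, type) pairs. `PLACEMARK_FIELDS`, `to_csv`, `from_csv`, `to_json` and `from_json` are hypothetical names, not existing greenelk code. Only the standard `csv`, `io` and `json` modules are used.

```python
import csv
import io
import json

# Hypothetical single source of truth for a placemark: field names and types.
# This list could itself live in a JSON file and be loaded with json.load().
PLACEMARK_FIELDS = [
    ("name", str),
    ("lat", float),
    ("lon", float),
    ("description", str),
]

FIELD_NAMES = [name for name, _ in PLACEMARK_FIELDS]


def to_csv(placemarks):
    """Write placemark dicts as CSV; the header row comes from the schema."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELD_NAMES)
    writer.writeheader()
    writer.writerows(placemarks)
    return buf.getvalue()


def from_csv(text):
    """Read CSV back into dicts, converting each column with the schema's type."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: typ(row[name]) for name, typ in PLACEMARK_FIELDS} for row in reader]


def to_json(placemarks):
    """Serialise placemark dicts as a JSON array of objects."""
    return json.dumps(placemarks, indent=2)


def from_json(text):
    """Parse JSON and apply the same schema-driven type conversion as from_csv."""
    return [{name: typ(p[name]) for name, typ in PLACEMARK_FIELDS} for p in json.loads(text)]
```

Adding a field to `PLACEMARK_FIELDS` changes the CSV header and both readers at once, which is the kind of non-redundancy the Metadata requirement asks for.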

Externally defined, “given” formats

  1. KML: Keyhole Markup Language is an XML schema for expressing geographic annotation and visualisation within Internet-based Earth browsers, primarily Google Earth and Google Maps.

    1. Needs to be imported, as placemarks are best entered in geo-aware apps.

    2. Needs to be exported, as placemarks and tracks are best displayed in geo-aware apps.

  2. GPX: GPS eXchange Format is another XML schema, designed as a common GPS data format for software applications.

    1. Needs to be imported, as some trackers deliver GPX (in their specific dialect).

    2. Needs to be exported, so that apps and devices can use placemarks and tracks created and maintained in other apps and devices.

  3. CSV: A specific comma-separated value format used by GPS trackers. Currently, we support only Columbus V-900.

    1. Needs to be imported, to get track data.

  4. HTML: HyperText Markup Language is the main markup language for creating web pages and other information that can be displayed in a web browser.

    1. Needs to be exported, to view and share data in text form.

  5. GeoJSON: An open standard format for encoding collections of simple geographical features along with their non-spatial attributes using JavaScript Object Notation (.json).

    1. Could be exported and imported (but I’m not familiar with GeoJSON yet).

    2. Could it be used or extended as the basic format, the kingpin?

  6. Cesium: Cesium Language (CZML) is a JSON schema for describing dynamic scenes in virtual globes and maps.

    1. Could be exported for cool viewing of adventures in browsers.
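To illustrate the GPX import need with standard modules only, here is a sketch based on `xml.etree.ElementTree`. It assumes GPX 1.1 (namespace http://www.topografix.com/GPX/1/1) and Python 3.7 or later for `datetime.fromisoformat`. Tracker dialects, and GPX 1.0 files in particular, use a different namespace and may need adjusting. `read_gpx_trackpoints` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumed GPX 1.1 namespace; GPX 1.0 files use http://www.topografix.com/GPX/1/0.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def read_gpx_trackpoints(gpx_text):
    """Return (lat, lon, ele, time) tuples for every <trkpt> in a GPX document.

    ele and time are None when the trackpoint has no such child element.
    """
    root = ET.fromstring(gpx_text)
    points = []
    for trkpt in root.iterfind(".//gpx:trkpt", GPX_NS):
        lat = float(trkpt.get("lat"))
        lon = float(trkpt.get("lon"))
        ele_el = trkpt.find("gpx:ele", GPX_NS)
        time_el = trkpt.find("gpx:time", GPX_NS)
        ele = float(ele_el.text) if ele_el is not None else None
        # GPX times are ISO 8601 in UTC, usually with a trailing "Z",
        # which fromisoformat() before Python 3.11 does not accept directly.
        time = (datetime.fromisoformat(time_el.text.replace("Z", "+00:00"))
                if time_el is not None else None)
        points.append((lat, lon, ele, time))
    return points
```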

Representing data in Python code

We want to represent coordinates (in placemarks and time-stamped trackpoints), paths (sequences of coordinates), tracks (sequences of trackpoints) and other entities in Python code. But how? As objects? Lists? Dicts? What are the advantages of each?

Given the coder requirement of non-redundancy (the “metadata” need above), and the vague assumption that JSON is the shit (= the data representation of choice), which Python concept can best be mapped onto a JSON object? Would this requirement help us choose between Python objects, lists and dicts?
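To make the objects/lists/dicts question concrete, the sketch below represents the same trackpoint in three ways and shows how each lands in JSON. The names (`TrackpointNT`, `trackpoint_dict`, `Trackpoint`) and the fields `ele` and `time` are illustrative assumptions. Dataclasses need Python 3.7 or later.

```python
import json
from collections import namedtuple
from dataclasses import dataclass, asdict

# 1. A named tuple: compact and immutable, but json.dumps() writes it as a JSON array,
#    so the field names are lost on export.
TrackpointNT = namedtuple("TrackpointNT", ["lat", "lon", "ele", "time"])


# 2. A plain dict: maps one-to-one onto a JSON object, but nothing enforces the field list.
def trackpoint_dict(lat, lon, ele, time):
    return {"lat": lat, "lon": lon, "ele": ele, "time": time}


# 3. A dataclass: a fixed, documented field list that converts to a dict with asdict().
@dataclass
class Trackpoint:
    lat: float
    lon: float
    ele: float
    time: str  # ISO 8601 text, kept as a string so it stays JSON-friendly

    def to_json(self):
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text):
        # JSON object keys become keyword arguments for the constructor.
        return cls(**json.loads(text))
```

In short: a dict is the closest match to a JSON object, a tuple becomes a JSON array without field names, and a dataclass keeps a fixed field list while converting to a dict in one call.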

From the way I asked the question, you may gather that I am not quite 100% up to speed on Python. I do understand that Python objects can be complex, and that you can fine-tune their exact structure. But having done most of my coding last century, I am still a simple man in need of an elegant answer.

By the way, part of a great answer would also tell me which Python modules I need to import (such as import json or import xml.etree.ElementTree as ET), with a strong preference for standard modules, as I want any techie to be able to run the scripts without installation hassle (remember the “Offline” techie need above).

Representing data in flat files

We want to represent the geodata in flat files, so that the techie end user can use his favourite text editor to annotate adventures, plan tracks, add metadata to placemarks and do other things not easily accomplished in an Earth viewer or web app, both of which are constrained by a rigid graphical user interface (see, I did do most of my coding in the character-based past century).

XML-based and even JSON-based flat files are fine, but wordy. I coded a simple Python snippet for converting KML to CSV, and used it on my sample KML file with a thousand placemarks. What happened? The CSV file ended up at 15 % of the KML file’s size in bytes, and 5 % of its number of lines. How this satisfies the “Edit” and “Usability” needs of the techie should be fairly clear: within one eyeful, you get a much better overview. Much less error-prone, much faster to edit, much quicker to sort. Perfect edit usability for techies!
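The KML-to-CSV snippet itself isn’t shown here, so what follows is a comparable sketch, not the original. It assumes KML 2.2 (namespace http://www.opengis.net/kml/2.2), handles only Point placemarks, and writes the columns name, lon, lat, alt, description. `kml_to_csv` is a hypothetical name.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed KML 2.2 namespace, as written by Google Earth.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def kml_to_csv(kml_text):
    """Flatten KML Point placemarks into CSV rows: name, lon, lat, alt, description."""
    root = ET.fromstring(kml_text)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["name", "lon", "lat", "alt", "description"])
    for pm in root.iterfind(".//kml:Placemark", KML_NS):
        coords = pm.findtext("kml:Point/kml:coordinates", default="", namespaces=KML_NS).strip()
        if not coords:
            continue  # skip placemarks without a Point, e.g. LineStrings
        # KML writes a point as "lon,lat[,alt]"; altitude is optional.
        parts = coords.split(",")
        lon, lat = parts[0], parts[1]
        alt = parts[2] if len(parts) > 2 else ""
        writer.writerow([
            pm.findtext("kml:name", default="", namespaces=KML_NS),
            lon,
            lat,
            alt,
            pm.findtext("kml:description", default="", namespaces=KML_NS).strip(),
        ])
    return out.getvalue()
```

One short row per placemark, instead of a dozen lines of nested XML, is where most of the savings in bytes and lines come from.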

So what am I asking you?

While I did enough coding last century to be quite comfortable defining the greenelk comma-separated value format, loosely based on RFC 4180, I am only somewhat comfortable, if at all, with the other choices made so far, and I would like to verify them with you.
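For the RFC 4180 part, Python’s standard csv module already does most of the work. Its default “excel” dialect uses commas, CRLF line endings and double quotes (doubled when they occur inside a field), and quotes only the fields that need it. A minimal sketch, with hypothetical helper names `write_rows` and `read_rows`:

```python
import csv
import io


def write_rows(rows):
    """Serialise rows with the csv module's default 'excel' dialect:
    comma delimiter, CRLF line endings, minimal quoting, doubled inner quotes."""
    buf = io.StringIO(newline="")  # no newline translation, so CRLF survives
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def read_rows(text):
    """Parse CSV text back into lists of strings; quoted line breaks are preserved."""
    return list(csv.reader(io.StringIO(text, newline="")))
```

Descriptions containing commas, quotes or even line breaks then survive a round trip through a text editor without any hand-written escaping code.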

I am somewhat comfortable with JSON as the core format to define the data structures (for coordinates, placemarks, trackpoints, paths, tracks et al.).

I am somewhat comfortable with picking GeoJSON and Cesium CZML as additional formats to support beyond KML and GPX.

I am not at all comfortable with whether to base the greenelk JSON format on extending GeoJSON, or creating it from scratch.
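One way to weigh the “extend GeoJSON” option is to see how a track might look as a plain GeoJSON Feature. The sketch assumes trackpoints arrive as (lat, lon, ele, time) tuples. GeoJSON (RFC 7946) orders positions as longitude, latitude, elevation, and it defines no member for per-point timestamps, so the `times` property here is a hypothetical greenelk extension carried in `properties`, not part of the standard. `track_to_geojson` is likewise a hypothetical name.

```python
import json


def track_to_geojson(name, trackpoints):
    """Build a GeoJSON Feature (a plain dict) for a track.

    trackpoints: list of (lat, lon, ele, iso_time) tuples (assumed layout).
    """
    return {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            # GeoJSON positions are [longitude, latitude, elevation]: note the swap.
            "coordinates": [[lon, lat, ele] for lat, lon, ele, _ in trackpoints],
        },
        "properties": {
            "name": name,
            # Hypothetical extension: timestamps in the same order as the coordinates.
            "times": [t for _, _, _, t in trackpoints],
        },
    }
```

Because everything greenelk-specific sits inside `properties`, a generic GeoJSON viewer should still draw the line and simply ignore the extra data; that is the main argument for extending rather than starting from scratch.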

I am not at all comfortable with how best to represent JSON (GeoJSON or otherwise) in Python code. How should I map the JSON components: onto objects, lists, or dicts?
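Part of the answer comes from the standard library itself. json.loads() always produces dicts for JSON objects and lists for arrays, with str, int, float, bool or None for the rest. A class only needs to enter where it adds something, such as a fixed field list. The sketch below uses a hypothetical `Placemark` dataclass and `placemarks_from_geojson` helper. It keeps the GeoJSON properties as a dict and lifts only the name and coordinates into attributes.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Placemark:
    name: str
    lon: float
    lat: float
    properties: dict = field(default_factory=dict)  # any other attributes, kept as-is


def placemarks_from_geojson(text):
    """Wrap every Point Feature of a GeoJSON FeatureCollection in a Placemark.

    json.loads() has already turned the document into nested dicts and lists;
    non-Point features are skipped in this sketch.
    """
    data = json.loads(text)
    result = []
    for feature in data["features"]:
        geom = feature.get("geometry") or {}
        if geom.get("type") != "Point":
            continue
        lon, lat = geom["coordinates"][:2]  # GeoJSON order: longitude first
        props = dict(feature.get("properties") or {})
        result.append(Placemark(props.pop("name", ""), lon, lat, props))
    return result
```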