Deploying Open Source Geospatial Software – Part 1: Challenges

My blog has been quiet for some time. Like many of us, I’ve been busy doing projects, all involving Open Source Geospatial (OSGeo) software. Partly development, writing software, which I love, but more and more also “what comes next”: deploying and maintaining “the application” with all of its dependencies. For this I have been using several “deployment strategies” that I would like to share. To be specific, and as a TL;DR: over the years I went through custom compiles/installs, Debian/Ubuntu(GIS) package installation, writing Debian/RPM packages, using Puppet (not yet Chef), and I am now sitting on the Docker of the Bay. For many this last sentence may be gibberish, so I will try to sketch some context first. Calling this post Part 1 hopefully also keeps me attached to the subject and writing, as I have very good news. But today, ‘helas’, the bad and the ugly.

In terms of architecture I always prefer a “best-of-breed” selection of Open Source Geospatial (OSGeo) software components, rather than selecting a single platform/“Suite”. Nothing against Suites: this is a domain where Open Source Geo providers are, literally, “stacking up” against proprietary GIS providers. Boundless, GeoSolutions and Geomajas, to name a few, have great platforms you should check out. Because I like to dive deep into open source geospatial technology, trying to contribute where possible, even writing some myself, and having experienced the pros and cons of each individual component, I tend to go for a best fit in a project. For example, for WMS/WFS I may apply MapServer or GeoServer or deegree, for web clients OpenLayers or Leaflet. As for tiling, well, to be honest, nothing beats MapProxy. GDAL, QGIS, GRASS, GeoNetwork or pycsw, I could go on. I am a huge fan of each of these projects, standing on the shoulders of giants when using their products. What I choose depends on the project’s requirements.

But going for a “best-of-breed” architecture, where a selection of Open Source Geospatial components is made, usually extended with custom software and configurations, creates challenges in deployment and maintenance. By the latter I mean: going into production (live) and maintaining the system for N years through modifications and updates. “Getting it working” on a single system will often succeed, possibly after a great number of Google searches and mailing list threads, and after finally getting all components and dependencies installed, often by hand. In some cases this even means recompiling components, moving libraries, setting PATHs etc. At some point “it all works”, but at the same time we enter the “don’t touch it” phase. We have an “upgrading issue”, though one that is still doable on a single system/server.

To make matters worse: most professional IT departments employ a multi-step deployment strategy. There is not just a single system where the application runs, but several systems, each dedicated to, and named after, its phase in deployment. For example, governmental projects within The Netherlands often deploy “OTAP”. OTAP (in English: DTAP) stands for Development, Test, Acceptance, Production. These are, often rigorously, separated computing infrastructures (servers, clients). An application with all its dependencies has to be deployed sequentially on/through each of these phases, sometimes called “pillars” (Dutch: zuilen). In many cases a direct connection between these systems is blocked by the IT department. In the simplest case, we have only a Test and a Production system. Hence, our carefully handcrafted system will have a major challenge getting from one pillar to the next. But I am not finished yet: we also have the “tribal thing” going on in Open Source Geospatial software. Let me expand.
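
As an aside, before moving on: the sequential nature of OTAP/DTAP can be made concrete with a minimal sketch in Python. It is only an illustration under assumptions. The Release class, the promote function and the “geo-stack” name are hypothetical and not taken from any real project or tool. The point is just that one and the same release has to pass every pillar in order, with no shortcuts.

```python
from dataclasses import dataclass, field

# DTAP phases in the order a release must pass through them.
STAGES = ["development", "test", "acceptance", "production"]


@dataclass
class Release:
    """Hypothetical release: one versioned bundle of application plus dependencies."""
    name: str
    version: str
    passed: list = field(default_factory=list)  # stages already deployed to


def promote(release: Release) -> str:
    """Deploy the release to the next DTAP stage and return that stage's name."""
    if len(release.passed) == len(STAGES):
        raise ValueError(f"{release.name} {release.version} is already in production")
    next_stage = STAGES[len(release.passed)]
    # A real pipeline would ship the identical artifact here (package, image, ...).
    release.passed.append(next_stage)
    return next_stage


release = Release("geo-stack", "1.0.0")  # hypothetical application name
for _ in STAGES:
    print(promote(release))
```

A handcrafted server has no such “release” to promote, which is exactly why getting it from one pillar to the next is so hard.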

Diversity is good. Also in software. Over the years Open Source Geospatial software has been developed using a plethora of programming languages. Each came with a variety of deployment systems. I am talking about Java, Python, JavaScript/NodeJS, C/C++, and recently Go. These languages usually have some kind of library and deployment technology. Take Java: for server-side components we need to have a “J2EE Container”, in most cases Tomcat (strictly speaking a servlet container), and deploy .war files (e.g. GeoServer or GeoNetwork). For Python and “CGI-able” components like MapServer, we may just need a web server with CGI or FastCGI support, like Apache or Nginx (the latter via FastCGI). Each of these products deploys in its own way and has its own method for maintaining its configuration and managing updates. In Dutch we call this a “Lappendeken”. The closest translation I found is a “patchwork”, that is to say a diverse deployment and maintenance system. Individual products may provide a “GUI” to manage configurations, which are stored in diverse ways, from single XML/YAML files to even databases. There is no way to manage these products in a uniform way. For an outsider, or a cynical proprietary GIS provider, this all could be labeled as “Open Source Geospatial (deployment) is a big mess”.

So dear readers, having sketched this bag of problems, or in a positive sense: challenges, how do we go from here? As I indicated, there is good news. The answer, my friend, lies in abstraction. Abstraction is the way that software technology has always progressed: from machine instructions to assembly and programming languages, through data structures, objects and classes, to components and packages. Coupling and cohesion form another driving force: maximizing cohesion (do one thing well) and minimizing coupling (reduce dependencies). All in all I have been finding solutions to the above problems using very accessible technologies. In the next two parts I hope to expand on these further, picking just two deployment strategies for now. The first is Debian Packaging (with some Puppet), the second is Docker. In short, what to expect in my next two posts (Part 2 and Part 3):

  • Debian Packaging: writing Debian packages to maintain software and configuration in a multi-step deployment environment
  • Docker: building/maintaining Docker images while keeping control (on the host) over their configuration, state and functionality

Also with some telling images, as these are lacking in this post!

2 Comments:

  1. This is a great post. I hoped to find parts 2 and 3 somewhere … 🙂

    I myself put far too much time into deploying and trying to understand the different parts. Right now I wonder what I should choose to have MapServer running. I have tried uWSGI behind Nginx, but it only leads me to the “ok, it works now, don’t touch” phase that you describe. And I want to avoid the Apache beast 🙂

    Thanks for a good post
