In this talk we will discuss creating a distributed raster processing service, built with the GeoTrellis library, that can handle spatial, spatiotemporal and multi-band rasters. We rely on Apache Spark as the distributed computation engine and on Hadoop HDFS with Apache Accumulo for distributed persistence.
Big Data Day
We’re continuously building a comprehensive raster map of the world. We use imagery from dozens of providers, delivered in several ways, at scales from hemispheres to trampolines. In this talk, we’ll show some of the ways we use (and create) open-source tools to integrate terabytes of heterogeneous raw data into a single, constantly improving image of Earth. We’ll cover topics from ways of dealing with nodata values to how we pursue aesthetics, driven by real examples with lots of beautiful images.
Pelias is a modular open-source geocoder using ElasticSearch for fast autocomplete and forward and reverse geocoding. It's designed to support a variety of datasets, including Geonames, Quattroshapes, OSM, and whatever else you can throw at it.
This session discusses some of the core modules of Pelias and introduces the community to the design choices behind the geocoder. There will also be a few demos (projects powered by Pelias), followed by Q&A.
The Apache CouchDB project is a NoSQL database that enables users to store unstructured data with high availability and partition tolerance.
As an example of a geo big data problem, this presentation will show how to process the results of a large image classification and store them in CouchDB in GeoJSON format, and will cover the potential problems that can arise with:
* Database availability
* Multiple data types
* Sharding of large geospatial data across multiple database nodes
* Querying geospatial data efficiently using complex polygons
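As a rough illustration of the storage step, the sketch below wraps classified regions as GeoJSON Features and groups them into request bodies shaped for CouchDB's `_bulk_docs` endpoint. It is not taken from the presentation: the function names, the `class` property, the ID scheme and the batch size are all assumptions, and the sketch only builds the JSON payloads without contacting a database.

```python
import json
from typing import Dict, Iterable, Iterator, List


def to_feature_doc(region_id: str, label: str, ring: List[List[float]]) -> Dict:
    """Wrap one classified region as a GeoJSON Feature usable as a CouchDB document.

    `region_id`, `label` and the "class" property name are hypothetical choices.
    """
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # GeoJSON linear rings must be closed
    return {
        "_id": region_id,  # CouchDB document ID
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"class": label},
    }


def bulk_payloads(docs: Iterable[Dict], batch_size: int = 500) -> Iterator[str]:
    """Yield JSON bodies of the form {"docs": [...]}, one per batch.

    Each body could be POSTed to /<database>/_bulk_docs; batch_size is an
    arbitrary assumption, not a recommended value.
    """
    batch: List[Dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield json.dumps({"docs": batch})
            batch = []
    if batch:
        yield json.dumps({"docs": batch})
```

Batching keeps each request bounded in size, which matters when a single classification run produces far more polygons than fit comfortably in one HTTP request.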
Foursquare's data is extremely rich in location context--from our tens of millions of venues to billions of checkins and passive location pings, almost everything is tagged with geographic coordinates. Our homegrown and open-sourced Geo infrastructure powers the geocoding, geo aggregation and analysis of all that location data and is itself built on top of a mostly open set of geographic data.
We describe work towards enabling faster iteration for improvements to relevance, performance, index build times and hotfix deployment times of the Twofishes geocoder that is the core of our Geo infrastructure.
GeoMesa builds on the Hadoop and Accumulo ecosystem to scale up the indexing of billions of spatio-temporal data points. This presentation will showcase and discuss some of GeoMesa's existing distributed computational capabilities, such as K-nearest neighbor queries, and then highlight relevant work by the fall 2014 Facebook Open Academy (FOA) students. The FOA students have created a Web Processing Service (WPS) process that returns aggregate time series data for an Extended Common Query Language (ECQL) query.
(I would be fine with a one-hour talk slot.)