Teaching Twofishes to Fly: Faster Iteration on Foursquare's Open Geo Infrastructure
Foursquare's data is extremely rich in location context: from our tens of millions of venues to billions of check-ins and passive location pings, almost everything is tagged with geographic coordinates. Our homegrown, open-source Geo infrastructure powers the geocoding, geographic aggregation, and analysis of all that location data, and is itself built on a mostly open set of geographic data.
At the core of our Geo infrastructure is Twofishes [http://twofishes.net]: a coarse, splitting, autocomplete-capable forward geocoder and a high-performance coarse reverse geocoder, written in Scala using Thrift + Finagle. Twofishes combines Geonames gazetteer data with other sources of ranking data (such as Natural Earth) and polygons (many of them from our Quattroshapes project [http://quattroshapes.com]).
Twofishes has been used by our friends at Twitter and Pinterest, and we're seeing increased interest from other fast-moving startups that are committed to Open Geo.
In the past year, a consistent theme in our work on Twofishes has been improving iteration speed. We built a new autocomplete evaluation framework to speed up iteration on the quality of geocode results and the user experience. We built a new hotfix architecture that lets us deploy high-priority geo data fixes to our production servers in under 15 minutes. We integrated with Twitter's Iago load-testing framework to speed up iteration on performance. And we are rewriting our index build pipeline to run in parallel on MapReduce, so that we can build our indexes in under an hour and therefore consume and deploy updates to our underlying datasets more frequently. This rewrite will let us iterate rapidly on our next major project: improving the quality of the Twofishes query splitter. We will demonstrate the results of all this work and share what we learned.
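
For readers unfamiliar with the term, the following minimal Python sketch illustrates what a query splitter does in a "splitting" geocoder: it separates the geocodable "where" part of a query from the remaining "what" part. It is a toy illustration under stated assumptions, not the Twofishes implementation (which is written in Scala). The names `GAZETTEER` and `split_query`, the tiny place list, and the longest-suffix matching rule are all hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical toy gazetteer; a real index would be built from Geonames
# and other sources and would hold far more than a handful of names.
GAZETTEER = {"new york", "york", "paris", "san francisco"}


def split_query(query: str) -> Tuple[str, Optional[str]]:
    """Split a query into (what, where) using longest-suffix matching.

    Assumption for this sketch: the place name sits at the end of the query.
    """
    tokens = query.lower().split()
    # Try the longest suffix first so "new york" wins over "york".
    for start in range(len(tokens)):
        candidate = " ".join(tokens[start:])
        if candidate in GAZETTEER:
            return " ".join(tokens[:start]), candidate
    # No suffix matched a known place: everything is "what".
    return " ".join(tokens), None


if __name__ == "__main__":
    print(split_query("pizza new york"))
    print(split_query("coffee"))
```

Even in this toy form the quality problem is visible: the result depends on how ambiguous names ("york" versus "new york") are resolved and on where in the query the place name appears.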