Efficient Streaming Vector Processing in Scala at Socrata
(I would be fine with a 1-hour talk slot.)
At Socrata, the leaders in public open data, we process many datasets with geospatial location data. This talk covers how we perform efficient, in-memory streaming vector processing on the JVM using Scala. Why use the JVM, and Scala, for geo processing? What does an architecture for stream vector processing look like? What does vector processing on the JVM involve? What are the advantages of stream processing over in-database processing? How can we efficiently represent polygons on the JVM heap? How can we cache and manage memory? How can the current architecture be scaled out to distributed stream processing systems like Apache Spark? We will attempt to answer all of these questions and more.
Evan is Principal Engineer at Socrata, Inc., bringing the power of data to enhance citizens' lives. He loves to design, build, and improve bleeding-edge distributed data and backend systems using the latest in open source technologies. He has led the design and implementation of multiple big data platforms based on Storm, Spark, Kafka, Cassandra, and Scala/Akka, including a columnar real-time distributed query engine. He is an active contributor to the Apache Spark project and co-creator of the open-source Spark Job Server. He is a big believer in GitHub, open source, and meetups, and has given talks at various conferences including Spark Summit, Cassandra Summit, and PNWScala. He has Bachelor's and Master's degrees in Electrical Engineering from Stanford University.