Track: Big Data Frameworks, Architectures, and Data Science



Designing and building modern big data frameworks, and building machine learning models from that data. As big data tools and architectures continue to evolve, how do you architect and select technologies that work now but are also future-proof? What frameworks and architectures are interesting right now, and how do you get the best from them? How do you build and run performant machine learning models, and how do you do 'data science' with that big data?

Track Host:
Paul Miller
Analysis and insight; Cloud Computing, Big Data, Semantic Web and more
Paul Miller is an analyst and consultant, based in the East Yorkshire (UK) market town of Beverley and working with clients worldwide. He helps clients understand the opportunities (and pitfalls) around cloud computing, big data, and open data, as well as presenting, podcasting, and writing for a number of industry channels. His background includes public policy and standards roles, several years in senior management at a UK software company, and a Ph.D. in archaeology.
10:20am - 11:10am

by Emily Green
Backend Engineer at SoundCloud

We'll be looking at how SoundCloud uses Cassandra. At the moment we mostly use Cassandra for our stats product, which provides analytics data to people with sounds on SoundCloud. Since releasing that, more teams in SoundCloud have started using Cassandra, and so the exact content of this presentation will depend on how they get on between the time of writing this and QCon - whatever happens I'll let you know! The intention is to describe at least a couple of Cassandra instances, from the...

11:30am - 12:20pm

by Dave McCrory
CTO at Basho

Data Gravity has been recognized as affecting all aspects of distributed systems, data storage, and networks. When there are even stricter performance requirements or when there are very large volumes of data, Data Gravity’s effects are amplified. What is Data Gravity? How does it affect performance and portability? Why are these effects amplified when there are larger volumes of data? Get the answers to these questions and more in this session.

1:20pm - 2:10pm

by Richard Kasperowski
QCon Open Space Facilitator

Open Space

Join Paul Miller, our speakers, and other attendees for the Big Data Open Space.

What is Open Space?

Every day at QCon London, we’ll open space five times, once for each track. Open Space is a kind of unconference, a simple way to run productive meetings for 5 to 2000 or more people, and a powerful way to lead any kind of organization in everyday practice and extraordinary change.



2:30pm - 3:20pm

by Sean Owen
Director of Data Science at Cloudera

Apache Spark continues to gain momentum as the new processing paradigm for Apache Hadoop, and for the data scientist, it has a lot to like: natively distributed, REPL, Python APIs in addition to native Scala, and a library of machine learning algorithms, MLlib. Spark 1.2 includes an implementation of random decision forests, an important and popular ensemble classifier/regressor algorithm.


This talk will introduce Spark, Scala and random decision forests to the curious,...

3:40pm - 4:30pm

by Kristoffer Dyrkorn
Scientist at BEKK Consulting

Most people have experienced the boredom of being stuck in traffic. Up-to-date and credible information about congestion and detours could save us time and frustration in our everyday lives.

The Norwegian Public Roads Administration is now building a new infrastructure for road traffic measurements, and the system will provide high-quality, near-real-time information as publicly available open data. The project uses embedded devices for vehicle detection and classification, a network...

4:50pm - 5:40pm

by Simon Metson
Cloudant product manager at IBM

Organisations have numerous, overlapping data systems and teams, all fulfilling business-critical data processing roles. Often the use of these components has evolved, leading to painful or fragile data processing environments.

Who you gonna call?

From utilizing partitioning in a relational database to user-facing in-memory caches to a Lambda architecture, it’s important that pieces of the system are the “right tool for the job”; appropriate for the task at hand, and not...


Covering innovative topics

Wednesday, 4 March

  • Architecture Improvements

    Next gen architecture, Arch over the full lifecycle, Bleeding edge tech in legacy, Cognitive biases in architecture, Evolving Architecture.

  • Big Data Frameworks, Architectures, and Data Science

    As big data tools and architectures continue to evolve, how do you architect and select technologies that work now but are also future-proof?

  • DevOps and Continuous Delivery: Code Beyond the Dev Team

    As infrastructure becomes as malleable as code, a unified approach from reqs to ops is needed to deliver promised breakthroughs.

  • Engineering Culture

    The best teams and companies talk about how to create amazing engineering cultures.

  • Java - Not Dead Yet

    Java is evolving to meet developer and business needs, from lambdas in Java 8 to built-in support for money types rumoured for Java 9.

  • Mind Matters at Work

    How theories from neuroscience and psychology can help us better understand IT professionals and discover what really motivates them.

Thursday, 5 March

  • Docker, containers and application portability

    People building stuff for and with containers show why application portability is important, and what can be done with expanding ecosystems.

  • Evolving agile

    Reflecting on and learning from successes and failures in applying agile approaches since the creation of the Agile Manifesto and exploring ways of applying agile practices to increase business value.

  • HTML and JS Today

    The state of the art in web technologies. What is important to know and why?

  • Internet of Things

    What software devs need to know to design and build for instrumented environments and reactive things, and what new issues and questions this raises.

  • Modern CS in the Real World

    How modern CS helps you tackle today's problems.

  • Reactive Architecture

    Creating reactive systems takes more than simply learning a framework. Thinking in a reactive way helps you to design responsive architectures.

  • The Go Language

    The Go Language - Concurrency, Performance, Systems Programming.

Friday, 6 March

  • Architectures You've Always Wondered About

    Get a rare look behind the scenes and get to see the architectures of the most well-known sites with the least known architectures.

  • Low latency trading

    The 'race to zero' continues. Join us to learn about the latest techniques being deployed to optimise order routing and execution.

  • Open source in finance

    Financial services firms have moved from treating open source as a cost-saving measure to using it as a competitive weapon. See open source projects that are disrupting the finance industry.

  • Product Mastery

    Come have fun with fellow PMs and BAs as you learn about Value Management. We'll even tell you dark tales of Snarks, Hippos and other obstacles.

  • Taming Microservices

    Tackling the challenges of microservices in practice.

  • Taming Mobile

    Mobile is no longer the Next Big Thing but a requirement for your business. Hear from those who have implemented successful mobile systems.