Presentation: Policing The Stock Market with Machine Learning

Duration:
4:10pm - 5:00pm

Key Takeaways

  • Learn how terabyte-scale data can be processed on a single machine
  • Analyze how big data visualisations can highlight illegal trading
  • Understand how data storage and computing power are growing faster than problem sizes

Abstract

Neurensic has built SCORE, a solution for trade surveillance using H2O (an open-source, pure-Java big data machine learning tool), machine learning, and a whole lot of domain expertise and data munging. SCORE pulls in private and public market data and in a few minutes searches it for all sorts of bad behavior: spoofing, wash-trading and cross-trading, pinging, and a lot more. It then filters the billions of rows of data down to human scale with some great visualizations - good enough to use as hard legal evidence. Indeed, SCORE and its underlying tech are not just used by companies to police themselves; they are being used by the public sector to find and prosecute the Bad Guys. I'll close with a demo of a real-life bad guy - he was defrauding the markets out of tens of millions - who got caught via an early alpha version of SCORE. All data is anonymized, of course, so you'll have to go hunt through last year's Wall Street Journal to find his name for real.

Interview

Question: 
What is the focus of your work today?
Answer: 

I’m working on SCORE, which is a tool for trade surveillance. In the capital markets (stock markets, futures trading and the like) there’s an obligation to ensure that the trading is legal, that you’re not trying to be fraudulent in some way.

So most large firms that do trading engage a large number of traders who are not necessarily employees of the company but who are freelance experts. The companies who host them and who give them access to an exchange have a legal obligation to do surveillance to understand what those traders are doing. SCORE is a new tool for doing that trade surveillance.

Question: 
What’s the motivation for your talk?
Answer: 

The technology that’s in place right now is 20 years old, and SCORE is state-of-the-art. It’s H2O on a Java process (instead of an old-school Windows DLL), it uses machine learning instead of a rules-based engine, and it has a GUI in the browser that’s friendlier to work with. It’s also hugely faster and more accurate than what’s gone before.

The other side of it is that we’re solving a very human problem. We’re finding people who are attempting to cheat (or successfully cheating) the stock markets today. There’s a steady stream of people attempting to defraud the markets, and they are by and large successful because the tools for detecting them are really old. Plus, they know how to work around those tools, and they do so on a regular basis.

So as soon as we turn on SCORE in a new trading house, we immediately find people who have been cheating for a long time. It’s very obvious that they’re cheating as soon as you look at the visualisations coming out of the tool. We are catching people who are doing big-dollar cheating.

Question: 
How would you describe the persona of the target audience of this talk?
Answer: 

Probably data architects, some general architects, and some JVM developers. It also applies to CCOs/CROs (Chief Compliance Officers, Chief Risk Officers), but I’m not expecting too many of those in the audience.

There are two types of audience persona. The first is people who are interested in big data: using H2O as a big-data Java product, mining big data, and hearing a success story of how big data and machine learning work together.

The other part is the human-interest side: it’s fun to hear about someone who is cheating, how they cheat, and how you catch them. There will be examples shown, including people who have subsequently gone to jail.

Question: 
What tools and techniques am I going to be able to take away from this talk?
Answer: 

I took the existing toolchain behind SCORE and threw out the database (MariaDB and Hadoop). I’m running on a single structured file system with a single JVM process running H2O. This combination goes a long way towards solving problems that scale up to tens of terabytes (although not to petabytes and beyond).

There’s a GUI built with Elm - which will be covered during the talk - but the key takeaway is that a single JVM on a single machine with a structured file system can scale to handle terabytes’ worth of data.

Question: 
How would you rate the level of this talk?
Answer: 

Mid-to-expert.

Question: 
What do you feel is the most disruptive tech in IT right now?
Answer: 

I’ll say that the constant shrinking of memory cells and the constant increase in the amount of memory on a single node mean that a lot more problems don’t need clustering in order to be solved. Originally I used H2O for its potential clustering ability, but unless you’re Google you can solve a lot of problems on a single fat node. In addition, nodes are getting fatter faster than the problems are getting larger. So today I can buy a 512GB machine and tomorrow I can buy a 1TB machine, which is sufficient to solve most problems.

There’s a huge market opportunity for terabyte-scale big data problems using non-clustered technology. Hadoop is a solution for a giant filesystem or a giant MapReduce. Since disks are getting so big, for a lot of problems I don’t need a giant distributed filesystem just to store the data.

Question: 
QCon targets advanced architects and senior development leads; what actionable takeaways do you feel that type of persona will walk away from your talk with?
Answer: 

Simplify your architecture! No database (unless you really need atomic updates; append-only does NOT count). No Hadoop (unless you really need data scales in the high tens of terabytes and up). Single machines are not hugely faster… but memory on one node has jumped to low-terabyte counts. So: a single machine, in-memory.

Speaker: Cliff Click

CTO @Neurensic

Cliff Click is the CTO of Neurensic, and before that was the CTO and Co-Founder of h2o.ai, the makers of H2O, an open-source math and machine learning engine for Big Data. Cliff wrote his first compiler when he was 15 (Pascal to TRS Z-80!), although Cliff’s most famous compiler is the HotSpot Server Compiler (the Sea of Nodes IR). That compiler showed the world that JIT’d high-quality code was possible, and it was at least partially responsible for bringing Java into the mainstream. Cliff helped Azul Systems build an 864-core pure-Java mainframe that keeps GC pauses on 500GB heaps to under 10ms, and he worked on all aspects of that JVM. Cliff is regularly invited to speak at industry and academic conferences and has published many papers about HotSpot technology. He holds a PhD in Computer Science from Rice University and about 20 patents.

