Jason Hunter, Principal Technologist with MarkLogic

Jason Hunter is Principal Technologist with MarkLogic, and the father of MarkMail.org. He's the author of "Java Servlet Programming" (O'Reilly Media) and the creator of the JDOM open source project for Java-optimized XML manipulation. He's also an Apache Software Foundation Member and former Vice President, and as Apache's representative to the Java Community Process Executive Committee he established a landmark agreement for open source Java.

He's an original contributor to Apache Tomcat, a member of the expert groups responsible for Servlet, JSP, JAXP, and XQJ API development, and was recently appointed a Java Champion. He's also a frequent speaker. His largest audience was 15,000 at a JavaOne conference keynote.

Presentation: "Unifying the Search Engine and NoSQL DBMS with a Universal Index"

Track: Solution Track: Wednesday

Time: Wednesday 10:35 - 11:35

Location: Rutherford Room, Fourth Floor

Abstract:

Unlike single-function architectures, MarkLogic Server collapses the usual tiers of servers that make up a complete application, combining a search engine, a NoSQL DBMS, and an application server in a single kernel. The computational foundation for this hybrid is the Universal Index.

In this talk, we'll begin with the familiar text-indexing data structures and algorithms that underlie search engine technologies. We'll extend that index to cover document structure and semantics, add scalar range indexing in one and two dimensions (including geospatial applications), and then incorporate "reverse" indexing of queries. We will demonstrate a novel type of "matchmaking" query whose evaluation composes forward and reverse index evaluation. Finally, we'll explore how all of this indexing can run efficiently alongside querying, using Multi-Version Concurrency Control and Log-Structured Merge Trees, providing ACID transactions together with lock-free query evaluation, built-in sharding, terabyte-per-server scale-out, replication, and query distribution.
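
To make the forward/reverse composition concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not MarkLogic's implementation or API: the names ForwardIndex, ReverseIndex, and matchmake are hypothetical, queries are simplified to AND-of-terms, and everything lives in memory. The forward index maps terms to documents; the reverse index stores queries and, given a document, finds the queries that document satisfies. A "matchmaking" result is a pair in which each party's document satisfies the other's query.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return set(text.lower().split())


class ForwardIndex:
    """Hypothetical inverted index: term -> ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, terms):
        """AND query: ids of documents that contain every term."""
        sets = [self.postings.get(t.lower(), set()) for t in terms]
        return set.intersection(*sets) if sets else set()


class ReverseIndex:
    """Hypothetical index of stored AND queries, looked up by document."""

    def __init__(self):
        self.queries = {}                 # query id -> frozenset of terms
        self.by_term = defaultdict(set)   # term -> ids of queries using it

    def add(self, query_id, terms):
        terms = frozenset(t.lower() for t in terms)
        self.queries[query_id] = terms
        for t in terms:
            self.by_term[t].add(query_id)

    def matching_queries(self, text):
        """Ids of stored queries whose terms all occur in the text."""
        doc_terms = tokenize(text)
        candidates = set()
        for t in doc_terms:
            candidates |= self.by_term.get(t, set())
        return {q for q in candidates if self.queries[q] <= doc_terms}


def matchmake(profiles):
    """profiles: id -> (description text, list of wanted terms).

    Returns sorted id pairs where each side's description satisfies
    the other side's query.
    """
    fwd, rev = ForwardIndex(), ReverseIndex()
    for pid, (text, wants) in profiles.items():
        fwd.add(pid, text)
        rev.add(pid, wants)

    pairs = set()
    for pid, (text, wants) in profiles.items():
        liked = fwd.search(wants)               # forward: who satisfies pid's query
        liked_by = rev.matching_queries(text)   # reverse: whose query pid satisfies
        for other in (liked & liked_by) - {pid}:
            pairs.add(tuple(sorted((pid, other))))
    return pairs


profiles = {
    "ann": ("java developer in london", ["python", "paris"]),
    "bob": ("python developer in paris", ["java"]),
    "cat": ("python tester in paris", ["rust"]),
}
print(matchmake(profiles))
```

In this sample data only ann and bob match each other: cat satisfies ann's query, but ann does not satisfy cat's. A production system would persist these structures and handle far richer query types; the sketch only shows how the two lookups compose.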