Thanks for the memory
For years, Qlik’s key technical advantage was the performance and analytic agility of the engine. The power came from our in-memory technology, while speed-of-thought insights rested on what we called the associative experience.
Yet in-memory technology has always faced one limitation: the amount of data that memory can hold. Thanks to 64-bit processors, cheap chips, effective compression and the reality that most business use cases reference only a subset of data, this constraint long felt more theoretical than practical. I rarely met a case in the wild where the restriction caused problems.
Today, however, Big Data is a key resource for every business, and demand is evolving. The Qlik Analytics Platform attracts developers and analysts handling very large data volumes at companies such as Rovio (of Angry Birds fame), King.com (Candy Crush Saga!) and Spotify. Qlik users need to crunch serious numbers, but they want the associative experience too.
Experiencing the experience
The associative experience is possible because Qlik indexes all data loaded into memory. Because the engine discovers data relationships automatically, it always calculates aggregations properly (and instantly) in the correct table. A small point? It rests on complex tech, and you'll appreciate that complexity if you have ever painfully debugged double-counting, incorrect aggregations or confusing results in another tool.
The in-memory index of all data also enables one of my personal favorite features: the deceptively simple gesture to Select Excluded values or to Show Alternatives.
Imagine I’m analyzing game play for our latest mobile sensation. I have selected to see iPhone users who own our previous game and who have reached level 10 of the new game. I could be analyzing how many players give up at that point, how many make in-app purchases, or some such scenario.
Qlik enables me to select and analyze that case easily. Even today, with the right adaptor or Big Data partner, I might crunch many millions of rows of data. Qlik have demonstrated this kind of connectivity and data volume many times.
But now I have another question. Qlik Sense orients me within my selection: glancing at the histograms of currently selected data, I can see that my selection represents only a small proportion of all customers. From there, new questions are simple to formulate. What about customers who have not reached level 10? That's a much larger number. Or customers who are not using an iPhone? Even more.
But to follow up, you must now see the entire data set, aggregated, visualised and navigable. And you had better see that data quickly, because your mind is already moving on to other questions. You demand sub-second response times.
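The Select Excluded gesture described above can be sketched in a few lines. This is my own toy illustration, not Qlik internals: the table, field names and thresholds (players, device, level 10) are all invented to mirror the mobile-game scenario.

```python
# A toy sketch of the associative idea: a selection splits records into
# selected and excluded sets, and "Select Excluded" flips between them.
# All names (players, device, level, owns_prev) are illustrative only.

players = [
    {"id": 1, "device": "iPhone",  "level": 12, "owns_prev": True},
    {"id": 2, "device": "iPhone",  "level": 4,  "owns_prev": True},
    {"id": 3, "device": "Android", "level": 15, "owns_prev": False},
    {"id": 4, "device": "Android", "level": 9,  "owns_prev": True},
    {"id": 5, "device": "iPhone",  "level": 11, "owns_prev": False},
]

# Current selection: iPhone users who own the previous game, at level 10+.
selected = {p["id"] for p in players
            if p["device"] == "iPhone" and p["owns_prev"] and p["level"] >= 10}

# "Select Excluded" returns everything NOT in the current selection -
# one gesture, no reformulated query.
excluded = {p["id"] for p in players} - selected

print(sorted(selected))  # [1]
print(sorted(excluded))  # [2, 3, 4, 5]
```

The point of the gesture is that the complement comes for free from the index, rather than requiring the analyst to rewrite the filter by hand.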
Analytics at the next level
Therein lies both the advantage of the associative experience and its challenge. You can ask questions that defeat other tools, but how do you realise these benefits with huge volumes of data that don't fit into memory? This scenario is increasingly common as IT departments deploy data lakes without the transformations and data reduction of traditional data warehouses.
As I said, there are practical approaches for many customers: Qlik has some great relationships with Hadoop distributors such as Cloudera, and was the first BI vendor to integrate with Google BigQuery. Yet still, the R&D team want to bring the fullest Qlik experience to market.
The usual solution for BI vendors would be a SQL-over-Hadoop query interface. But the Qlik engine is unlike other BI interfaces. Traditional SQL optimizes for joining, filtering and restricting data. Qlik, in contrast, optimizes for groupings, relationships and set queries, and unique gestures such as Select Excluded (return all the records NOT in my current selection!) can break the spirit, and sometimes the execution, of standard query interfaces.
New Answers in a New Language
When I left Qlik about a year ago, Mike and Jose were exploring some promising ideas. In particular, they had a new approach to indexing all the data that the engine looks at. By dividing every table into manageable “chunks” - which in the world of Big Data is still tens of millions of rows - they can distribute the data across multiple servers for best performance. A global table summarizes the chunks.
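The chunking idea can be sketched simply. This is a minimal illustration under my own assumptions (a per-chunk min/max summary over a key), not a description of how Qlik actually summarizes its chunks; chunk sizes are tiny here, where the article says real chunks run to tens of millions of rows.

```python
# A minimal sketch of chunking: split a table into fixed-size "chunks"
# and keep a global summary table (here min/max of a key per chunk)
# so a query can skip every chunk that cannot contain its answer.

CHUNK_SIZE = 3  # tens of millions of rows in practice; tiny for illustration

rows = [{"player_id": i, "score": i * 10} for i in range(10)]

chunks = [rows[i:i + CHUNK_SIZE] for i in range(0, len(rows), CHUNK_SIZE)]

# Global table: one summary entry per chunk.
summary = [{"chunk": n,
            "min_id": c[0]["player_id"],
            "max_id": c[-1]["player_id"]}
           for n, c in enumerate(chunks)]

# To fetch player_id 7, consult the summary first and touch one chunk only.
target = 7
hits = [s["chunk"] for s in summary if s["min_id"] <= target <= s["max_id"]]
print(hits)  # [2]
```

Because each chunk is self-contained, the chunks themselves can live on different servers, with only the small global summary consulted centrally.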
This approach gives the necessary scale for handling Big Data, but what of the restrictions of SQL and the demands of the associative experience?
I have a rule of thumb for innovation: merely incremental work doesn’t count! I like to see radical new answers to basic questions. The Qlik R&D team’s answer to such tough questions is radical enough: they created a new language.
Transparently to the user, Selection Query maps your current selection over indexes of your data sources. The engine then translates any associative query into an efficient call to retrieve relevant rows. Mapping the current selection over indexes rather than data sources is important, because in a Qlik model the sources may be very diverse. Qlik mashes up indexes of source data sets which may otherwise be incompatible. From there, rows are efficiently retrieved from relevant chunks of each source as needed.
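The idea of mapping a selection over indexes rather than raw sources might be sketched like this. To be clear, this is my own speculative construction, not Qlik's Selection Query: the inverted indexes, source schemas and field names ("device" vs "handset") are invented to show how indexes can reconcile otherwise incompatible sources.

```python
# A speculative sketch: each source keeps an inverted index
# (field value -> row ids). A selection is mapped over the indexes,
# and only then are matching rows fetched from each source.

source_a = {10: {"device": "iPhone"}, 11: {"device": "Android"}}
source_b = {20: {"handset": "iPhone"}, 21: {"handset": "iPhone"}}

def build_index(rows, field):
    """Inverted index for one source: value -> set of row ids."""
    idx = {}
    for row_id, row in rows.items():
        idx.setdefault(row[field], set()).add(row_id)
    return idx

# The indexes reconcile incompatible schemas ("device" vs "handset").
indexes = {
    "a": build_index(source_a, "device"),
    "b": build_index(source_b, "handset"),
}

# Map the selection {"iPhone"} over every index; the row ids tell the
# engine exactly which rows (and so which chunks) to retrieve.
selection = {"iPhone"}
matches = {name: set().union(*(idx.get(v, set()) for v in selection))
           for name, idx in indexes.items()}
print(matches)  # {'a': {10}, 'b': {20, 21}}
```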
There’s some great intellectual property in this approach - and Qlik are actively patenting these techniques. More importantly for users, there’s the promise of the best of Qlik’s associative experience with the power and scale of Big Data. For me, there’s the satisfaction of seeing my old team still pushing at the edges of what’s possible - and what a difference a year makes!