Businesses should take warning from this latest blow, this time from Alteryx, which at first glance seemed analogous to the data breach headlines of 2017 led by Yahoo, Uber, Arby’s, and Equifax, to name a few. What makes the Alteryx event stand out goes beyond phone numbers, email addresses, and zip codes. Among the 248 fields of the 123M records lives what fraudsters crave most: consumer social context. In this case, the personally identifiable information (PII) that you would expect to find is paired with analytics-backed information on consumer behaviors and interests, for example, book buying, financial spending habits, and gourmet cooking. This additional behavioral context helps fraudsters up their game in winning the trust of their victims.
Events such as this highlight yet again the responsibility corporations bear to protect consumers’ privacy and the security of their personal information, especially as they run forward into the modern era of big data and analytics-driven business strategy.
The problem is that managing big data isn’t a sprint. It’s a marathon. Big data moving through a modern enterprise travels a long, complex journey from the moment it’s produced or acquired, through innumerable interim preparation or storage stops, to its final point of consumption by business users or analysts. Everywhere along this path, security, governance, and enterprise-grade management practices are essential.
The Last Mile: Where it all comes together – or apart.
Stand-alone Data Preparation Tools or Wranglers – a crowded category that Alteryx fits into – provide a necessary function and a series of benefits for data in that “last mile.” This is the sexy sprint to the finish line: data visualization, predictive analytics, advanced modeling, data transformations, aggregations, unions, joins, filters, and the like. These capabilities are great against data that is already clean, well organized, and governed.
However, in marathon terms, there are many miles in the data journey that Wranglers do not address, and it is here that enterprise-scale providers that manage data throughout its entire lifecycle – like Podium – stand out. The value of the Podium Data Marketplace lies in working with original source data and ultimately delivering it for business-ready use, at scale and with high agility, all while maintaining data integrity.
Every Mile Matters
Data integrity begins at ingest, that “first mile” of the data journey, and must be maintained through every mile that follows. Podium’s attention to data on ingest, through automated data validation and profiling, covers a spectrum of critical checkpoints that Wranglers take for granted, yet benefit from, with their last-mile toolset. A personal trainer for the big data journey, Podium checks up front for data errors, incorrect formatting, and other idiosyncrasies common in mainframe and legacy big data sources.
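To make the idea of ingest-time validation and profiling concrete, here is a minimal sketch in Python. It is not Podium’s implementation. The field names, format rules, and function names are hypothetical, chosen only to illustrate checking each incoming record for missing or badly formatted values and tallying what was found before the data moves further down the path.

```python
import re
from collections import Counter

# Hypothetical per-field format rules; a real feed would define its own.
RULES = {
    "zip_code": re.compile(r"\d{5}(-\d{4})?"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def validate_record(record):
    """Return a list of (field, problem) pairs for one ingested record."""
    problems = []
    for field, pattern in RULES.items():
        value = (record.get(field) or "").strip()
        if not value:
            problems.append((field, "missing"))
        elif not pattern.fullmatch(value):
            problems.append((field, "bad_format"))
    return problems


def profile(records):
    """Split records into clean and rejected, and count problems per field."""
    clean, rejected, counts = [], [], Counter()
    for record in records:
        problems = validate_record(record)
        if problems:
            rejected.append(record)
            counts.update(problems)
        else:
            clean.append(record)
    return clean, rejected, counts
```

In this sketch, rejected records are set aside rather than silently dropped, and the problem counts act as a simple profile of the source, so a spike in malformed values is visible at the first mile instead of the last.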
Further data preparation and exploration capabilities continue under Podium where Wranglers do not, under a consolidated catalog of all data. This is new territory for Wranglers because their tools were not built for cataloging and governing data; they were built for data manipulation. The Wranglers are now realizing this, which is why you see them attempting to reach back from the last mile toward the first, into ingestion, orchestration, preparation, governance, and exploration of data in a variety of modern and traditional repositories.
The enterprise-grade, automated data management Podium provides would have removed the blind spots in Alteryx. Complementary to all data wranglers, Podium ensures agility and productivity, providing secure, self-service, on-demand access to enterprise data that can be enriched with insights through every mile of the data journey.
We are sympathetic to Alteryx’s current debacle. However, there were steps (applied data security measures and automated controls) that could have greatly reduced the likelihood of leaking personally identifiable information (PII). This recent data breach is a catastrophe, not only for Alteryx but also for the 98% of American households whose private information might well be in the hands of malicious actors.
Without doubt, this can and will happen to others, and the next breach is not all on the shoulders of the Wranglers. Company data delivery teams charged with empowering a growing army of data scientists and business analysts with expanded access to big data need to broaden their thinking about big data management to consider the whole data marathon, not just the final mile.