How Do You Tame Your Data Demon?

Big Data has been around since 2005, but not everyone understands it. A couple of years ago, a potential client called me because they had a problem with their Big Data project. They said that if I volunteered my time to help them, I would have a chance to win the project. Naturally, out of the goodness of my heart (and a slight competitiveness about winning the project), I said yes. I told them, "Before we begin, I need access to your systems so that I can look at your data and get a feel for it." The gentleman quickly told me that, as a matter of policy, they do not give system access to external vendors, but that he could send me the Big Data as an email attachment.

A lot of people don’t quite realize what Big Data is. What is Big Data to you? 1 GB? 1 TB? 1 PB? I have a different perspective. Big Data is not just about size. It’s about Volume, Velocity, and Variety. Most modern systems can handle size with ease. SQL Server, Oracle, and even MySQL have progressed by leaps and bounds. So why do we need the scale of Hadoop or HPCC Systems? Because while these relational databases can handle the size, they crawl when it comes to velocity and variety.

A lot of industries have tamed the velocity issue by adding more compute power and memory, moving to fast SSDs, running multiple systems in parallel, and so on. The payments industry, for example, measures speed in transactions per second, and you get penalized if processing does not happen in time. Similarly, in the mobile industry or in the field of telemetry, data flows at much higher velocity.

What about Variety? On its own, the variety problem is comparatively easy to solve using document-style databases. Combined with Volume and Velocity, though, it creates a toxic mix that can cripple an organization.

This is where you need a truly resilient system that can tame this Data Demon, much like Goddess Kali, who is adept at killing all demons! Any Big Data system today should be able to handle all three Vs of the Data Conundrum and create a veracious system that can be relied upon.

DataSeers uses HPCC Systems to solve the 3V conundrum and to achieve the fourth V, Veracity. It is also cooking up a brand-new product that uses Hyperledger to create a “single source of truth”. This is how DataSeers tames the Data Demon. How do you do it?