The Best Kept Secret in Big Data is No Longer a Secret


For years, HPCC Systems was a best-kept secret within LexisNexis. It was the technology that let the company run at a million miles a minute while the rest of the industry was crawling under the influx of data. However, senior management at LexisNexis decided that such power should not be contained, and they chose to unleash it! In 2011, HPCC Systems was made open source. It's not only open source but truly free, with a huge community behind it.

HPCC Systems pre-dates most big data technologies out there. It was built from the ground up in C++. The idea behind it was to build a tool that allows processing and analysis of business data at any scale. LexisNexis ingests data from several thousand sources. This data is both structured and unstructured, and amounts to petabytes on disk! The team that manages the sourcing, quality, security and analysis of this data is fewer than 200 people. How is that possible? To answer that, you have to understand a little more about HPCC Systems.

HPCC Systems consists of two major components: Thor and ROXIE. Thor is the engine meant to crunch your data. As the name suggests, Thor’s hammer flattens the data out, while ROXIE delivers this data to the end user. The system itself is very flexible. If you have lots of data coming in but not many people consuming it, you could have a Thor-heavy implementation. If you have relatively little data coming in and a lot of users consuming it, you might have a ROXIE-heavy implementation. Think of HPCC Systems as putty that you can mold any way you want.
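
To make that division of labor concrete, here is a toy sketch in Python. It is not HPCC Systems code and does not use any HPCC Systems API; the record layout and the function names `build_index` and `lookup` are made up for illustration. It only shows the general pattern of a batch step that crunches raw data into an index (the role described for Thor) and a serving step that answers queries from that index (the role described for ROXIE).

```python
from collections import defaultdict

# Hypothetical raw records, standing in for ingested data.
RAW_RECORDS = [
    {"name": "alice", "city": "Atlanta"},
    {"name": "bob", "city": "Boston"},
    {"name": "alice", "city": "Austin"},
]


def build_index(records):
    """Batch step (the Thor-like role): crunch all records into a keyed index."""
    index = defaultdict(list)
    for rec in records:
        index[rec["name"]].append(rec["city"])
    # Sort each entry so lookups return results in a deterministic order.
    return {name: sorted(cities) for name, cities in index.items()}


def lookup(index, name):
    """Serving step (the ROXIE-like role): answer one query from the prebuilt index."""
    return index.get(name, [])


index = build_index(RAW_RECORDS)
print(lookup(index, "alice"))  # prints ['Atlanta', 'Austin']
```

In this picture, a "Thor-heavy" setup would put more capacity behind the batch step, and a "ROXIE-heavy" setup would put more capacity behind the lookup step.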

Most big data systems were invented for an internal need. Eventually the power became too big to contain and the platform was made open source. HPCC Systems is different in one way: it’s very modular. Each of its components has a very specific task and does it really well. This flexibility and containerized design allow the user to store and access sensitive data with ease.

At the core of HPCC Systems is a language called ECL (Enterprise Control Language). ECL is reminiscent of C++; however, it is declarative and data-centric. The compiler is forgiving, which means that you may make a mistake in a join and the compiler will correct it for you.

Another strength of HPCC Systems is its machine learning libraries. These libraries are built natively and use the power of distributed computing. This means you don’t have to learn multiple languages or end up with silos of programmers who don’t speak to each other. If you know ECL, you can do everything from data ingestion to analytics and AI.

The entire platform runs on commodity hardware and requires no special training to maintain. After all, it was developed from the ground up to “just keep working”.

While HPCC Systems may not be able to solve world hunger, I am sure it can solve most of your business problems.

*HPCC Systems is a LexisNexis Risk Solutions Technology.

**For more information about HPCC Systems, please visit

