
BigData Ecosystem

Apache Hadoop: an open-source software framework for storing and processing large volumes of data in a distributed computing environment.

Major components: HDFS (Hadoop Distributed File System) for storing data, and MapReduce for processing data.
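To get a feel for the MapReduce idea, here is a minimal sketch in plain Python that counts words in a few lines of text. It does not use Hadoop at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are made up for illustration, and a real Hadoop job would run these phases in parallel across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, roughly what the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files stored on HDFS.
    sample = ["big data is big", "data is everywhere"]
    print(word_count(sample))
```

The point of splitting the work this way is that each map call only looks at one line and each reduce call only looks at one key, so both can be spread over many nodes.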

Apache Pig: A high-level platform that runs on top of Hadoop and makes data analysis programs easier to write, using its scripting language, Pig Latin.

Apache Hive: A data warehouse layer on top of Hadoop that provides a SQL-like query language (HiveQL).

Apache Storm: A distributed, real-time processing system for big data. Because it processes data as it arrives, it is well suited to use cases such as real-time stock analytics, fraud detection, and event-driven applications.

Apache Spark: An in-memory data processing engine, which makes it faster and more flexible than Hadoop’s MapReduce for many use cases.

4 Vs of Big Data:

  • Volume: The amount of data, which a conventional RDBMS cannot store or is not meant to process.
  • Velocity: The speed at which data is being added.
  • Variety: Structured, semi-structured and unstructured data. Traditional systems are meant only for structured data. For example, reviews, comments, images, etc. are unstructured data.
  • Veracity: How trustworthy the data is. Unverified or inconsistent data may or may not be useful and cannot be used straight away.

BigData and the hype

You must be hearing a lot of noise around the term BigData these days. What exactly is big data and how is it impacting the tech world?

As the name suggests, it is about data: a lot of data, huge data. What’s new? The amount and type of data. Data mining is already an important part of most companies; it helps them understand current trends and predict the future up to a certain level. With BigData coming in, the concept is stretched further.

Let’s take an example: an ecommerce website wishes to understand customer behavior to improve its reputation and responsiveness. Where can it find the data? Past sales, logs, customer activity on the website, blogs, Facebook, Twitter, Google Plus... phew, that is tons of data. But the more data you can collect, the better the decisions you can make.

The challenges include collecting the data, searching for the relevant data, storage and, most importantly, analysis, i.e. converting raw data into useful information. Think about it: if a company can convert all the raw data related to its brand, available across various channels, into useful information such as what people think about it, what their expectations are, and how to improve the user experience, it will be a big win for the company.

Related post: http://kamalmeet.com/2013/03/infosys-launches-big-data-edge/