Big Data is a term that describes the techniques and technologies used to capture, store, distribute, manage, and analyze large, high-velocity datasets with varied structures. Such data may be structured, semi-structured, or unstructured, which makes conventional data management methods inadequate. The data is generated by many different sources and can arrive in the system at varying rates. Parallelism is used to process these large volumes of data inexpensively and efficiently. In short, Big Data is data whose scale, diversity, and complexity require new architectures, techniques, algorithms, and analytics to manage it and to extract value and hidden knowledge from it. Hadoop is an open-source software project that enables the distributed processing of large datasets across clusters of commodity servers. It serves as a core platform for structuring Big Data and addresses the problem of making that data useful for analytics. Hadoop is designed to scale from a single server to thousands of machines while providing a very high degree of fault tolerance.


Big Data, Hadoop, MapReduce, HDFS, Hadoop Components


