Monday, 2 January 2017

Big Data Mining Platforms: Distributed Aggregation for Data-Parallel Computing

Vol. 3, Issue 3
Year: 2014
Issue: Jun-Aug
Title: Big Data Mining Platforms: Distributed Aggregation for Data-Parallel Computing
Author Name: Bala Krishna Sapparam and Janardhan
Synopsis:
Big Data concerns large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development of networking, data storage, and data collection capacity, Big Data is now rapidly expanding in all science and engineering domains, including the physical, biological, and biomedical sciences. In typical data mining systems, the mining procedures require computationally intensive computing units for data analysis and comparison. A computing platform therefore needs efficient access to at least two types of resources: data and computing processors. For small-scale data mining tasks, a single desktop computer, with its hard disk and CPU, is sufficient to fulfill the data mining goals. Indeed, many data mining algorithms are designed for this type of problem setting. For medium-scale data mining tasks, the data are typically large (and possibly distributed) and cannot fit into main memory. Common solutions are to rely on parallel computing or collective mining to sample and aggregate data from different sources, and then to use parallel programming to carry out the mining. In this paper we concentrate on Tier I, i.e., Big Data mining platforms, using MapReduce (MR). Within MapReduce we follow distributed aggregation for data-parallel computing, a technique that reduces the amount of traffic sent over the network.
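The synopsis does not give the authors' implementation, so the following is only a minimal sketch of the general idea, assuming a word-count style job in which each mapper partially aggregates its own output (a combiner) before anything is sent to the reducer. All function names and the sample data are hypothetical, and the cluster is simulated in a single process; the number of shuffled pairs stands in for network traffic.

```python
from collections import Counter, defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Local aggregation on one mapper: sum the counts per key before shuffling."""
    partial = Counter()
    for key, value in pairs:
        partial[key] += value
    return list(partial.items())


def reduce_counts(shuffled_pairs):
    """Reduce step: sum all values received for each key."""
    totals = defaultdict(int)
    for key, value in shuffled_pairs:
        totals[key] += value
    return dict(totals)


def run_job(partitions, use_combiner):
    """Simulate one job; return (result, number of pairs sent to the reducer)."""
    shuffled = []
    for lines in partitions:  # each partition plays the role of one mapper
        pairs = [pair for line in lines for pair in map_words(line)]
        if use_combiner:
            pairs = combine(pairs)  # aggregate locally, then send
        shuffled.extend(pairs)  # stands in for the network shuffle
    return reduce_counts(shuffled), len(shuffled)


# Hypothetical input: two partitions, as if stored on two different nodes.
partitions = [
    ["big data big data", "data mining"],
    ["big data platforms", "mining mining"],
]

plain_result, plain_sent = run_job(partitions, use_combiner=False)
combined_result, combined_sent = run_job(partitions, use_combiner=True)

print("pairs shuffled without local aggregation:", plain_sent)
print("pairs shuffled with local aggregation:", combined_sent)
print("results identical:", plain_result == combined_result)
```

Local aggregation of this kind is safe here because addition is associative and commutative, so summing partial counts gives the same totals as summing the raw pairs. An aggregate such as a mean would need to carry (sum, count) pairs instead.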
