21CS753 Introduction To Big Data

Course Learning Objectives

CLO 1. Understand the Hadoop Distributed File System and examine MapReduce programming
CLO 2. Explore Hadoop tools and manage Hadoop with Sqoop
CLO 3. Appraise the role of data mining and its applications across industries
CLO 4. Identify various text mining techniques

SYLLABUS COPY

MODULE - 1

Hadoop Distributed File System

HDFS Design, Features, HDFS Components, HDFS User Commands. Hadoop MapReduce Framework: The MapReduce Model, MapReduce Parallel Data Flow, MapReduce Programming

MODULE - 2

Essential Hadoop Tools

Using Apache Pig, Using Apache Hive, Using Apache Sqoop, Using Apache Flume, Apache HBase

MODULE - 3

Data Warehousing

Introduction, Design Consideration, DW Development Approaches, DW Architectures

Data Mining

Introduction, Gathering and Selection, Data Cleaning and Preparation, Outputs of Data Mining, Data Mining Techniques

MODULE - 4

Decision Trees

Introduction, Decision Tree Problem, Decision Tree Construction, Lessons from Constructing Trees, Decision Tree Algorithm

Regressions

Introduction, Correlations and Relationships, Non-Linear Regression, Logistic Regression, Advantages and Disadvantages

MODULE - 5

Text Mining

Introduction, Text Mining Applications, Text Mining Process, Term Document Matrix, Mining the TDM, Comparison, Best Practices

Web Mining

Introduction, Web Content Mining, Web Structure Mining, Web Usage Mining, Web Mining Algorithms

Course Outcomes

At the end of the course the students will be able to:
CO 1. Master the concepts of HDFS and MapReduce framework.
CO 2. Investigate Hadoop-related tools for Big Data Analytics and perform basic
CO 3. Infer the importance of core data mining techniques for data analytics
CO 4. Use Machine Learning algorithms for real-world big data.

Suggested Learning Resources

Textbooks
1. Douglas Eadline, “Hadoop 2 Quick-Start Guide: Learn the Essentials of Big Data Computing in the Apache Hadoop 2 Ecosystem”, 1st Edition, Pearson Education, 2016.
2. Anil Maheshwari, “Data Analytics”, 1st Edition, McGraw Hill Education, 2017.
