21CS71 Big Data Analytics
Course Learning Objectives
CLO 1. Understand the fundamentals and applications of Big Data analytics.
CLO 2. Explore the Hadoop framework, the Hadoop Distributed File System, and essential Hadoop
tools.
CLO 3. Illustrate the concepts of NoSQL using MongoDB and Cassandra for Big Data.
CLO 4. Employ the MapReduce programming model to process big data.
CLO 5. Understand various machine learning algorithms for Big Data Analytics, Web Mining and
Social Network Analysis.
SYLLABUS COPY
MODULE - 1
Introduction to Big Data Analytics: Big Data, Scalability and Parallel Processing, Designing Data
Architecture, Data Sources, Quality, Pre-Processing and Storing, Data Storage and Analysis, Big Data
Analytics Applications and Case Studies.
MODULE - 2
Introduction to Hadoop (T1): Introduction, Hadoop and its Ecosystem, Hadoop Distributed File
System, MapReduce Framework and Programming Model, Hadoop YARN, Hadoop Ecosystem Tools.
Hadoop Distributed File System Basics (T2): HDFS Design Features, Components, HDFS User
Commands.
Essential Hadoop Tools (T2): Using Apache Pig, Hive, Sqoop, Flume, Oozie, HBase.
MODULE - 3
NoSQL Big Data Management, MongoDB and Cassandra: Introduction, NoSQL Data Store, NoSQL Data
Architecture Patterns, NoSQL to Manage Big Data, Shared-Nothing Architecture for Big Data Tasks,
MongoDB Databases, Cassandra Databases.
MODULE - 4
Introduction, MapReduce Map Tasks, Reduce Tasks and MapReduce Execution, Composing MapReduce
for Calculations and Algorithms, Hive, HiveQL, Pig.
MODULE - 5
Machine Learning Algorithms for Big Data Analytics: Introduction, Estimating the relationships,
Outliers, Variances, Probability Distributions, and Correlations, Regression analysis, Finding Similar
Items, Similarity of Sets and Collaborative Filtering, Frequent Itemsets and Association Rule Mining.
Text, Web Content, Link, and Social Network Analytics: Introduction, Text mining, Web Mining, Web
Content and Web Usage Analytics, Page Rank, Structure of Web and analyzing a Web Graph, Social
Network as Graphs and Social Network Analytics.
Course Outcomes
At the end of the course the student will be able to:
CO 1. Understand the fundamentals and applications of Big Data analytics.
CO 2. Investigate the Hadoop framework, the Hadoop Distributed File System, and essential Hadoop tools.
CO 3. Illustrate the concepts of NoSQL using MongoDB and Cassandra for Big Data.
CO 4. Demonstrate the MapReduce programming model to process big data along with Hadoop
tools.
CO 5. Apply Machine Learning algorithms to real-world big data, web content, and social networks
to provide analytics with relevant visualization tools.
Suggested Learning Resources
Textbooks
1. Raj Kamal and Preeti Saxena, “Big Data Analytics: Introduction to Hadoop, Spark, and Machine-Learning”, McGraw Hill Education, 2018. ISBN: 9789353164966, 9353164966
2. Douglas Eadline, “Hadoop 2 Quick-Start Guide: Learn the Essentials of Big Data Computing in
the Apache Hadoop 2 Ecosystem”, 1st Edition, Pearson Education, 2016. ISBN-13: 978-9332570351
Reference Books
1. Tom White, “Hadoop: The Definitive Guide”, 4th Edition, O’Reilly Media, 2015. ISBN-13: 978-9352130672
2. Boris Lublinsky, Kevin T Smith, Alexey Yakubovich, “Professional Hadoop Solutions”,
1st Edition, Wrox Press, 2014. ISBN-13: 978-8126551071
3. Eric Sammer, “Hadoop Operations: A Guide for Developers and Administrators”, 1st Edition,
O’Reilly Media, 2012. ISBN-13: 978-9350239261
4. Arshdeep Bahga, Vijay Madisetti, “Big Data Analytics: A Hands-On Approach”, 1st Edition, VPT
Publications, 2018. ISBN-13: 978-0996025577