BAI515E Exploratory Data Analysis
Course Learning Objectives
CLO1: To equip students with Python, IPython, and Jupyter for data analysis tasks.
CLO2: To provide a comprehensive understanding of NumPy for scientific computations.
CLO3: To introduce foundational and advanced data manipulation techniques using Pandas.
CLO4: To enhance data visualization skills using Matplotlib and Seaborn.
CLO5: To introduce Machine Learning concepts with practical applications using Scikit-Learn.
CLO6: To promote the practical application of data analysis tools and techniques on real-world datasets.
SYLLABUS COPY
MODULE - 1
Introduction to Python and NumPy
Getting Started in IPython and Jupyter, Enhanced Interactive Features, The Basics of NumPy Arrays, Sorted Arrays, Structured Data: NumPy’s Structured Arrays
MODULE - 2
Data Manipulation with Pandas – I
Introducing Pandas Objects, Handling Missing Data, Hierarchical Indexing, Pivot Tables.
MODULE - 3
Data Manipulation with Pandas – II
Vectorized String Operations, Working with Time Series, High-Performance Pandas: eval and query
MODULE - 4
Data Visualization with Matplotlib
General Matplotlib Tips, Simple Line Plots, Simple Scatter Plots, Visualization with Seaborn
MODULE - 5
Introduction to Machine Learning
What Is Machine Learning?, Introducing Scikit-Learn, Hyperparameters and Model Validation
Course Outcomes
1. Demonstrate the application of NumPy for performing data analysis tasks.
2. Make use of Pandas for various data manipulation tasks.
3. Apply advanced data manipulation techniques to real-world datasets.
4. Develop data visualizations using Matplotlib and Seaborn to effectively communicate data insights.
5. Explain the fundamental concepts of machine learning and model validation using Scikit-Learn.
Suggested Learning Resources
Text Books
1. Jake VanderPlas – Python Data Science Handbook: Essential Tools for Working with Data, O'Reilly, 2nd Edition, 2022.
Reference Book
1. https://python4csip.com/files/download/Data%20Visualization.pdf