Course Description

In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure the software needed for a statistical programming environment, and how generic programming language concepts are implemented in a high-level statistical language.

This course covers the essential exploratory techniques for summarizing data. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing data graphics. 

The course will also introduce a range of model-based and algorithmic machine learning methods, including regression, classification trees, Naive Bayes, and random forests. It will cover the complete process of building prediction functions, including data collection, feature creation, algorithms, and evaluation.

Course Content

Module 01 - Introduction to Data Science with R

1.1 What is Data Science?
1.2 Significance of Data Science in today’s data-driven world, its applications, its lifecycle, and its components
1.3 Introduction to R programming and RStudio

Module 02 - Data Exploration

2.1 Introduction to data exploration
2.2 Importing and exporting data to/from external sources
2.3 What are exploratory data analysis and data importing?
2.4 Data frames (working with them and accessing individual elements), vectors, factors, operators, built-in functions, conditional and looping statements, user-defined functions, and data types

Module 03 - Data Manipulation

3.1 Need for data manipulation
3.2 Introduction to the dplyr package
3.3 Selecting one or more columns with select(), filtering records on the basis of a condition with filter(), adding new columns with mutate(), sampling, and counting
3.4 Combining different functions with the pipe operator and implementing SQL-like operations with sqldf

Module 04 - Data Visualization

4.1 Introduction to visualization
4.2 Different types of graphs, the grammar of graphics, the ggplot2 package, categorical distributions with geom_bar(), numerical distributions with geom_histogram(), frequency polygons with geom_freqpoly(), and scatterplots with geom_point()
4.3 Multivariate analysis with geom_boxplot()
4.4 Univariate analysis with a barplot, a histogram and a density plot, and multivariate distribution
4.5 Creating barplots for categorical variables using geom_bar(), and adding themes with the theme() layer
4.6 Visualization with plotly, frequency plots with geom_freqpoly(), multivariate distributions with scatter plots and smooth lines, continuous vs categorical distributions with box plots, and subgrouping plots
4.7 Working with coordinates and themes to make graphs more presentable, understanding plotly and its various plots, and visualization with ggvis
4.8 Geographic visualization with ggmap() and building web applications with Shiny

Module 05 - Introduction to Statistics

5.1 Why do we need statistics?
5.2 Categories of statistics, statistical terminology, types of data, measures of central tendency, and measures of spread
5.3 Correlation and covariance, standardization and normalization, probability and its types, hypothesis testing, chi-square tests, ANOVA, the normal distribution, and the binomial distribution

Module 06 - Machine Learning

6.1 Introduction to Machine Learning
6.2 Introduction to linear regression and predictive modeling; simple vs multiple linear regression; concepts, formulas, assumptions, and residuals in linear regression; and building a simple linear model
6.3 Predicting results and finding the p-value and an introduction to logistic regression
6.4 Comparing linear regression with logistic regression, and bivariate logistic regression with multivariate logistic regression
6.5 Confusion matrices and model accuracy, understanding model fit, threshold evaluation with ROCR, and using qqnorm() and qqline()
6.6 Understanding summary results (the null hypothesis and the F-statistic), and building linear models with multiple independent variables

Module 07 - Logistic Regression

7.1 Introduction to logistic regression
7.2 Logistic regression concepts, linear vs logistic regression, and math behind logistic regression
7.3 Detailed formulas, logit function and odds, bivariate logistic regression, and Poisson regression
7.4 Building a simple binomial model and predicting the result, making a confusion matrix for evaluating the accuracy, true positive rate, false positive rate, and threshold evaluation with ROCR
7.5 Finding out the right threshold by building the ROC plot, cross validation, multivariate logistic regression, and building logistic models with multiple independent variables
7.6 Real-life applications of logistic regression


Module 08 - Decision Trees and Random Forest

8.1 What is classification? Different classification techniques
8.2 Introduction to decision trees
8.3 Algorithm for decision tree induction and building a decision tree in R
8.4 Confusion matrix and regression trees vs classification trees
8.5 Introduction to bagging
8.6 Random forest and implementing it in R
8.7 What is Naive Bayes? Computing probabilities
8.8 Understanding the impurity function, entropy, the Gini index, and information gain for choosing the right node split
8.9 Overfitting, pruning, pre-pruning, post-pruning, and cost-complexity pruning, pruning a decision tree and predicting values, finding out the right number of trees, and evaluating performance metrics

Module 09 - Unsupervised Learning

9.1 What is Clustering? Its use cases
9.2 What is k-means clustering? What is canopy clustering?
9.3 What is hierarchical clustering?
9.4 Introduction to unsupervised learning
9.5 Feature extraction, clustering algorithms, and the k-means clustering algorithm
9.6 Theoretical aspects of k-means, k-means process flow, k-means in R, implementing k-means, and finding out the right number of clusters using a scree plot
9.7 Dendrograms, understanding hierarchical clustering, and implementing it in R
9.8 Explanation of Principal Component Analysis (PCA) in detail and implementing PCA in R

Module 10 - Association Rule Mining and Recommendation Engines

10.1 Introduction to association rule mining and market basket analysis (MBA)
10.2 Measures of association rule mining (support, confidence, and lift), the Apriori algorithm, and implementing them in R
10.3 Introduction to recommendation engines
10.4 User-based collaborative filtering and item-based collaborative filtering, and implementing a recommendation engine in R
10.5 Recommendation engine use cases

Module 11 - Introduction to Artificial Intelligence

11.1 Introducing Artificial Intelligence and Deep Learning
11.2 What is an artificial neural network? TensorFlow: The computational framework for building AI models
11.3 Fundamentals of building ANNs using TensorFlow and working with TensorFlow in R

Module 12 - Time Series Analysis

12.1 What is a time series? The techniques, applications, and components of time series
12.2 Moving average, smoothing techniques, and exponential smoothing
12.3 Univariate time series models and multivariate time series analysis
12.4 ARIMA model
12.5 Time series in R, sentiment analysis in R (Twitter sentiment analysis), and text analysis

Module 13 - Support Vector Machine (SVM)

13.1 Introduction to Support Vector Machine (SVM)
13.2 Data classification using SVM
13.3 SVM algorithms for separable and non-separable cases
13.4 Linear SVM for identifying the maximum-margin hyperplane

Module 14 - Naïve Bayes

14.1 What is the Bayes theorem?
14.2 What is a Naïve Bayes classifier?
14.3 Classification workflow
14.4 How the Naïve Bayes classifier works and building a classifier in scikit-learn
14.5 Building a probabilistic classification model using Naïve Bayes and the zero probability problem

Module 15 - Text Mining

15.1 Introduction to the concepts of text mining
15.2 Text mining use cases, and understanding and manipulating text with ‘tm’ and ‘stringr’
15.3 Text mining algorithms and the quantification of the text
15.4 TF-IDF and beyond

Student feedback

10 Reviews


Parinita Beniwal

Perfect Course

Hi SparkAcademy, thank you for the wonderful course. It helps you understand the basics and more of Data Science concepts using R, with nice examples and algorithms explained in a way everyone can follow. The overall content and examples are really good. Thank you once again :)


Ravina Pawaria

Knowledgeable course

Very knowledgeable course. I'm extremely grateful. Learning Data Science using R became really easy and enjoyable for me. Useful one!!


Kavi Tomar

Great Course

Great course. Looking forward to completing the whole specialization! SparkAcademy seriously provides amazing training. I feel great that I joined you guys for my learning! Thanks!


Sachi Jaswal

Great Course

It's the best course for me. I'm very satisfied and gained the knowledge I needed. It's good for my future, as I received good training here, and now I've understood all the concepts.


Aanchal Jain

Well structured

Well-structured course. Gives a good explanation and relevant practical knowledge of Data Science with R, with the basics made clear. Learning was enjoyable here!


Shraddha Singh

Good course

Good. Understanding this course will help you learn more about Data Science. It was well structured, and the trainer was good and explained really well.


Pranay Mehent

I really loved the way concepts were elaborated in the course. It was a great and helpful course.


Raj Sinha

Good training

I would highly recommend SparkAcademy for any online technical classes. Just loved the course.


Yash Taragi

Amazing Course

It was an amazing session. Thanks to the trainer for sharing his knowledge.


Sandhya Singh


SparkAcademy team is the best. I love the format and logistics of SparkAcademy so much that I would choose them for future courses at any cost rather than anything else. Awesome guys.


Related Courses

Data Science Program

Deep Learning Course
Data Science Program

Statistics for Data Science Course
Data Science Program

Master's in Data Science Program

    Course Features

    • R Programming
    • Exploratory Data Analysis
    • Data Manipulation
    • Data Visualization
    • Statistics
    • Machine Learning Algorithms