In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure the software necessary for a statistical programming environment and how to describe generic programming language concepts as they are implemented in a high-level statistical language.
This course covers the essential exploratory techniques for summarizing data. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing data graphics.
The course will also introduce a range of model-based and algorithmic machine learning methods, including regression, classification trees, Naive Bayes, and random forests. The course will cover the complete process of building prediction functions, including data collection, feature creation, algorithms, and evaluation.
1.1 What is Data Science?
1.2 Significance of Data Science in today’s data-driven world, its applications, lifecycle, and components
1.3 Introduction to R programming and RStudio
2.1 Introduction to data exploration
2.2 Importing and exporting data to/from external sources
2.3 What are exploratory data analysis and data importing?
2.4 Data frames, working with them, accessing individual elements, vectors, factors, operators, built-in functions, conditional and looping statements, user-defined functions, and data types
3.1 Need for data manipulation
3.2 Introduction to the dplyr package
3.3 Selecting one or more columns with select(), filtering records on the basis of a condition with filter(), adding new columns with mutate(), sampling, and counting
3.4 Combining different functions with the pipe operator and implementing SQL-like operations with sqldf
4.1 Introduction to visualization
4.2 Different types of graphs, the grammar of graphics, the ggplot2 package, categorical distribution with geom_bar(), numerical distribution with geom_histogram(), building frequency polygons with geom_freqpoly(), and making a scatterplot with geom_point()
4.3 Multivariate analysis with geom_boxplot()
4.4 Univariate analysis with a barplot, a histogram and a density plot, and multivariate distribution
4.5 Creating barplots for categorical variables using geom_bar(), and adding themes with the theme() layer
4.6 Visualization with plotly, frequency plots with geom_freqpoly(), multivariate distribution with scatter plots and smooth lines, continuous distribution vs categorical distribution with boxplots, and subgrouping plots
4.7 Working with coordinates and themes to make graphs more presentable, understanding plotly and various plots, and visualization with ggvis
4.8 Geographic visualization with ggmap() and building web applications with Shiny
5.1 Why do we need statistics?
5.2 Categories of statistics, statistical terminology, types of data, measures of central tendency, and measures of spread
5.3 Correlation and covariance, standardization and normalization, probability and its types, hypothesis testing, chi-square testing, ANOVA, normal distribution, and binomial distribution
6.1 Introduction to Machine Learning
6.2 Introduction to linear regression, predictive modeling, simple linear regression vs multiple linear regression, concepts, formulas, assumptions, and residuals in linear regression, and building a simple linear model
6.3 Predicting results, finding the p-value, and an introduction to logistic regression
6.4 Comparing linear regression with logistic regression, and bivariate logistic regression with multivariate logistic regression
6.5 Confusion matrix and the accuracy of a model, understanding the fit of the model, threshold evaluation with ROCR, and using qqnorm() and qqline()
6.6 Understanding the summary results with the null hypothesis and F-statistic, and building linear models with multiple independent variables
7.1 Introduction to logistic regression
7.2 Logistic regression concepts, linear vs logistic regression, and the math behind logistic regression
7.3 Detailed formulas, logit function and odds, bivariate logistic regression, and Poisson regression
7.4 Building a simple binomial model and predicting the result, making a confusion matrix for evaluating the accuracy, true positive rate, false positive rate, and threshold evaluation with ROCR
7.5 Finding the right threshold by building the ROC plot, cross-validation, multivariate logistic regression, and building logistic models with multiple independent variables
7.6 Real-life applications of logistic regression
8.1 What is classification? Different classification techniques
8.2 Introduction to decision trees
8.3 Algorithm for decision tree induction and building a decision tree in R
8.4 Confusion matrix and regression trees vs classification trees
8.5 Introduction to bagging
8.6 Random forest and implementing it in R
8.7 What is Naive Bayes? Computing probabilities
8.8 Understanding the concepts of impurity function, entropy, Gini index, and information gain for choosing the right node split
8.9 Overfitting, pruning, pre-pruning, post-pruning, and cost-complexity pruning, pruning a decision tree and predicting values, finding the right number of trees, and evaluating performance metrics
9.1 What is clustering? Its use cases
9.2 What is k-means clustering? What is canopy clustering?
9.3 What is hierarchical clustering?
9.4 Introduction to unsupervised learning
9.5 Feature extraction, clustering algorithms, and the k-means clustering algorithm
9.6 Theoretical aspects of k-means, k-means process flow, k-means in R, implementing k-means, and finding the right number of clusters using a scree plot
9.7 Dendrograms, understanding hierarchical clustering, and implementing it in R
9.8 Explanation of Principal Component Analysis (PCA) in detail and implementing PCA in R
10.1 Introduction to association rule mining and market basket analysis (MBA)
10.2 Measures of association rule mining (support, confidence, and lift), the Apriori algorithm, and implementing them in R
10.3 Introduction to recommendation engines
10.4 User-based collaborative filtering and item-based collaborative filtering, and implementing a recommendation engine in R
10.5 Recommendation engine use cases
11.1 Introducing Artificial Intelligence and Deep Learning
11.2 What is an artificial neural network? TensorFlow: The computational framework for building AI models
11.3 Fundamentals of building ANN using TensorFlow and working with TensorFlow in R
12.1 What is a time series? The techniques, applications, and components of time series
12.2 Moving average, smoothing techniques, and exponential smoothing
12.3 Univariate time series models and multivariate time series analysis
12.4 ARIMA model
12.5 Time series in R, sentiment analysis in R (Twitter sentiment analysis), and text analysis
13.1 Introduction to Support Vector Machine (SVM)
13.2 Data classification using SVM
13.3 SVM algorithms for separable and inseparable cases
13.4 Linear SVM for identifying margin hyperplane
14.1 What is the Bayes theorem?
14.2 What is a Naive Bayes classifier?
14.3 Classification Workflow
14.4 How the Naive Bayes classifier works and building a classifier in scikit-learn
14.5 Building a probabilistic classification model using Naive Bayes and the zero-probability problem
15.1 Introduction to the concepts of text mining
15.2 Text mining use cases, and understanding and manipulating text with ‘tm’ and ‘stringr’
15.3 Text mining algorithms and the quantification of the text
15.4 TF-IDF and after TF-IDF
Course Rating
Parinita Beniwal
(5)
March 02, 2021
Perfect Course
Hi SparkAcademy, thank you for the wonderful course. Helps understand the basics and more of Data Science concepts using R. Nice examples and also understandable algorithms for everyone. However the overall content and examples are really good. Thank you once again :)
Ravina Pawaria
(5)
February 23, 2021
Knowledgeable course
Very knowledgeable course it is. I'm extremely grateful. Learning Data Science using R became really easier and enjoying for me. Useful one!!
Kavi Tomar
(5)
February 21, 2021
Great Course
Great course. Looking forward to completing the whole specialization! SparkAcademy seriously provides amazing training. I feel great that I joined you guys for my learning! Thanks!
Sachi Jaswal
(5)
February 13, 2021
Great Course
It's the best course for me. I'm very satisfied and got the knowledge required. It's better for my future as I received good training here; now I've understood all the concepts.
Aanchal Jain
(5)
February 08, 2021
Well structured
Well structured course. Gives a good explanation and relevant practical knowledge of Data Science with R subject with basics clear. Learning was enjoyable here!
Shraddha Singh
(5)
February 07, 2021
Good course
Good. Understanding of this course will help you to know more about Data Science. It was well structured, trainer was good. Explained really well.
Pranay Mehent
(5)
February 05, 2021
I really loved the way concepts were elaborated in the course. It was all great and helpful course.
Raj Sinha
(4.5)
February 04, 2021
Good training
I would highly recommend SparkAcademy for any online technical classes. Just loved the course.
Yash Taragi
(5)
February 02, 2021
Amazing Course
It was an amazing session. Thanks to the trainer for sharing his knowledge.
Sandhya Singh
(5)
February 02, 2021
Review
SparkAcademy team is the best. I love the format and logistics of SparkAcademy so much that I would choose them for future courses at any cost rather than take anything else. Awesome guys.