Data science continues to be in high demand across industries, and the need for data practitioners is booming. Upon completing this Professional Certificate program, you will be armed with the skills and experience you need to start your career in data science and machine learning.
Through hands-on assignments and high-quality instruction, you will build a portfolio using real data science tools and real-world problems and data sets. The curriculum covers a wide range of data science topics, including open source tools and libraries, methodologies, Python, databases, SQL, data visualization, data analysis, and machine learning.
1.1 Descriptive statistics basics
1.2 Mean, median, and mode
1.3 Standard deviation
1.4 Use of the central tendency measures
1.5 Bayes Theorem
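As a rough illustration of the measures listed in 1.2 and 1.3, the sketch below computes them with Python's standard statistics module. The sample values are hypothetical and chosen only so the results are easy to check by hand.

```python
import statistics

# Hypothetical sample, used only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)       # arithmetic average
median = statistics.median(scores)   # middle value of the sorted data
mode = statistics.mode(scores)       # most frequent value
pop_sd = statistics.pstdev(scores)   # population standard deviation

print(mean, median, mode, pop_sd)
```

For a sample drawn from a larger population, statistics.stdev() (which divides by n - 1) is usually the more appropriate choice than pstdev().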
2.1 Types of visualization
2.2 Calculation and interpretation of graphs, plots, and measures
3.1 Basics of probability distributions
3.2 Poisson probability function
3.3 Binomial distribution
3.4 Normal distribution
3.5 Probability distribution applications
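The Poisson and binomial probability functions named in 3.2 and 3.3 can be written directly from their textbook formulas. This is a minimal sketch using only the standard library; the function names binomial_pmf and poisson_pmf are illustrative and not part of the course material.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson-distributed count with mean rate lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Probability of exactly 2 heads in 4 fair coin flips.
two_heads = binomial_pmf(2, 4, 0.5)

# Probability of observing no events when 2 are expected on average.
no_events = poisson_pmf(0, 2.0)
```

Summing binomial_pmf over k = 0..n gives 1, which is a quick sanity check for any probability function of this kind.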
4.1 Tests to deal with data and their relationships
4.2 Assumptions taken in the tests
4.3 Language while interpreting the outcome of hypothesis tests
5.1 Use of Python for regression analysis
5.2 Test relationships and differences in sample means
5.3 Interpret the results of these tests
1.1 What is Data Science?
1.2 Significance of Data Science in today’s data-driven world, its applications, lifecycle, and components
1.3 Introduction to R programming and RStudio
2.1 Introduction to data exploration
2.2 Importing and exporting data to/from external sources
2.3 What are exploratory data analysis and data importing?
2.4 DataFrames, working with them, accessing individual elements, vectors, factors, operators, built-in functions, conditional and looping statements, user-defined functions, and data types
3.1 Need for data manipulation
3.2 Introduction to the dplyr package
3.3 Selecting one or more columns with select(), filtering records on the basis of a condition with filter(), adding new columns with mutate(), sampling, and counting
3.4 Combining different functions with the pipe operator and implementing SQL-like operations with sqldf
4.1 Introduction to visualization
4.2 Different types of graphs, the grammar of graphics, the ggplot2 package, categorical distribution with geom_bar(), numerical distribution with geom_histogram(), building frequency polygons with geom_freqpoly(), and making a scatterplot with geom_point()
4.3 Multivariate analysis with geom_boxplot
4.4 Univariate analysis with a barplot, a histogram and a density plot, and multivariate distribution
4.5 Creating barplots for categorical variables using geom_bar(), and adding themes with the theme() layer
4.6 Visualization with plotly, frequency plots with geom_freqpoly(), multivariate distribution with scatter plots and smooth lines, continuous distribution vs categorical distribution with boxplots, and sub grouping plots
4.7 Working with coordinates and themes to make graphs more presentable, understanding plotly and various plots, and visualization with ggvis
4.8 Geographic visualization with ggmap() and building web applications with Shiny
5.1 Why do we need statistics?
5.2 Categories of statistics, statistical terminology, types of data, measures of central tendency, and measures of spread
5.3 Correlation and covariance, standardization and normalization, probability and its types, hypothesis testing, chi-square testing, ANOVA, normal distribution, and binary distribution
6.1 Introduction to Machine Learning
6.2 Introduction to linear regression, predictive modeling, simple linear regression vs multiple linear regression, concepts, formulas, assumptions, and residuals in Linear Regression, and building a simple linear model
6.3 Predicting results and finding the p-value, and an introduction to logistic regression
6.4 Comparing linear regression with logistic regression and bivariate logistic regression with multivariate logistic regression
6.5 Confusion matrix and the accuracy of a model, understanding the fit of the model, threshold evaluation with ROCR, and using qqnorm() and qqline()
6.6 Understanding the summary results with null hypothesis, F-statistic, and building linear models with multiple independent variables
7.1 Introduction to logistic regression
7.2 Logistic regression concepts, linear vs logistic regression, and math behind logistic regression
7.3 Detailed formulas, logit function and odds, bivariate logistic regression, and Poisson regression
7.4 Building a simple binomial model and predicting the result, making a confusion matrix for evaluating the accuracy, true positive rate, false positive rate, and threshold evaluation with ROCR
7.5 Finding out the right threshold by building the ROC plot, cross validation, multivariate logistic regression, and building logistic models with multiple independent variables
7.6 Real-life applications of logistic regression
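The logit function, odds, and the true/false positive rates mentioned in 7.3 and 7.4 can be sketched in a few lines. The module itself works in R with ROCR; the Python below is only an illustrative sketch, and the labels, scores, and threshold are made-up values.

```python
import math

def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1 - p))

def sigmoid(z):
    """Inverse of logit: maps log-odds back to a probability."""
    return 1 / (1 + math.exp(-z))

def rates(y_true, scores, threshold):
    """Return (true positive rate, false positive rate) at a given threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical labels and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
tpr, fpr = rates(y_true, scores, threshold=0.5)
```

Computing (fpr, tpr) over a range of thresholds gives the points of the ROC plot referred to in 7.5.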
8.1 What is classification? Different classification techniques
8.2 Introduction to decision trees
8.3 Algorithm for decision tree induction and building a decision tree in R
8.4 Confusion matrix and regression trees vs classification trees
8.5 Introduction to bagging
8.6 Random forest and implementing it in R
8.7 What is Naive Bayes? Computing probabilities
8.8 Understanding the concepts of Impurity function, Entropy, Gini index, and Information gain for the right split of node
8.9 Overfitting, pruning, pre-pruning, post-pruning, and cost-complexity pruning, pruning a decision tree and predicting values, finding out the right number of trees, and evaluating performance metrics
9.1 What is Clustering? Its use cases
9.2 What is k-means clustering? What is canopy clustering?
9.3 What is hierarchical clustering?
9.4 Introduction to unsupervised learning
9.5 Feature extraction, clustering algorithms, and the k-means clustering algorithm
9.6 Theoretical aspects of k-means, k-means process flow, k-means in R, implementing k-means, and finding out the right number of clusters using a scree plot
9.7 Dendrograms, understanding hierarchical clustering, and implementing it in R
9.8 Explanation of Principal Component Analysis (PCA) in detail and implementing PCA in R
10.1 Introduction to association rule mining and MBA
10.2 Measures of association rule mining: Support, confidence, lift, and apriori algorithm, and implementing them in R
10.3 Introduction to recommendation engines
10.4 User-based collaborative filtering and item-based collaborative filtering, and implementing a recommendation engine in R
10.5 Recommendation engine use cases
11.1 Introducing Artificial Intelligence and Deep Learning
11.2 What is an artificial neural network? TensorFlow: The computational framework for building AI models
11.3 Fundamentals of building ANN using TensorFlow and working with TensorFlow in R
12.1 What is a time series? The techniques, applications, and components of time series
12.2 Moving average, smoothing techniques, and exponential smoothing
12.3 Univariate time series models and multivariate time series analysis
12.4 ARIMA model
12.5 Time series in R, sentiment analysis in R (Twitter sentiment analysis), and text analysis
13.1 Introduction to Support Vector Machine (SVM)
13.2 Data classification using SVM
13.3 SVM algorithms using separable and inseparable cases
13.4 Linear SVM for identifying margin hyperplane
14.1 What is the Bayes theorem?
14.2 What is Naïve Bayes Classifier?
14.3 Classification Workflow
14.4 How the Naïve Bayes classifier works and classifier building in scikit-learn
14.5 Building a probabilistic classification model using Naïve Bayes and the zero probability problem
15.1 Introduction to the concepts of text mining
15.2 Text mining use cases and understanding and manipulating text with ‘tm’ and ‘stringr’
15.3 Text mining algorithms and the quantification of the text
15.4 TF-IDF and after TF-IDF
1.1 What is Data Science, what does a data scientist do
1.2 Various examples of Data Science in the industries
1.3 How Python is deployed for Data Science applications
1.4 Various steps in Data Science process like data wrangling, data exploration and selecting the model.
1.5 Introduction to Python programming language
1.6 Important Python features, how is Python different from other programming languages
1.7 Python installation, Anaconda Python distribution for Windows, Linux and Mac
1.8 How to run a sample Python script, Python IDE working mechanism
1.9 Running some Python basic commands
1.10 Python variables, data types and keywords.
2.1 Introduction to a basic construct in Python
2.2 Understanding indentation like tabs and spaces
2.3 Python built-in data types
2.4 Basic operators in Python
2.5 Loop and control statements like break, if, for, continue, else, range() and more.
3.1 Central Tendency
3.2 Variability
3.3 Hypothesis Testing
3.4 ANOVA
3.5 Correlation
3.6 Regression
3.7 Probability Definitions and Notation
3.8 Joint Probabilities
3.9 The Sum Rule, Conditional Probability, and the Product Rule
3.10 Bayes’ Theorem
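Topics 3.9 and 3.10 fit together in one short calculation: the sum and product rules give the overall probability of the evidence, and Bayes' theorem then gives the posterior. The sketch below assumes a hypothetical screening test; the prior, sensitivity, and false positive rate are illustrative numbers only.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    # Product rule: joint probability of having the condition and testing positive.
    joint_true = sensitivity * prior
    # Joint probability of not having the condition and still testing positive.
    joint_false = false_positive_rate * (1 - prior)
    # Sum rule: total probability of a positive test.
    evidence = joint_true + joint_false
    return joint_true / evidence

# Hypothetical values: 1% prevalence, 90% sensitivity, 5% false positives.
p = posterior(prior=0.01, sensitivity=0.90, false_positive_rate=0.05)
print(round(p, 4))
```

With these assumed numbers the posterior is roughly 15%, which shows how a low prior keeps the posterior low even when the test is fairly accurate.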
4.1 Understanding the OOP paradigm like encapsulation, inheritance, polymorphism and abstraction
4.2 What are access modifiers, instances, class members
4.3 Classes and objects
4.4 Function parameter and return type functions
4.5 Lambda expressions.
5.1 Introduction to mathematical computing in Python
5.2 What are arrays and matrices, array indexing, array math, inspecting a NumPy array, NumPy array manipulation
6.1 Introduction to SciPy, building on top of NumPy
6.2 What are the characteristics of SciPy
6.3 Various SciPy subpackages like signal, integrate, fftpack, cluster, optimize, stats and more, Bayes' theorem with SciPy
7.1 What is data manipulation? Using the Pandas library
7.2 NumPy dependency of the Pandas library
7.3 Series object in pandas
7.4 Dataframe in Pandas
7.5 Loading and handling data with Pandas
7.6 How to merge data objects
7.7 Concatenation and various types of joins on data objects, exploring dataset
8.1 Introduction to Matplotlib
8.2 Using Matplotlib for plotting graphs and charts like Scatter, Bar, Pie, Line, Histogram and more
8.3 Matplotlib API
Hands-on Exercise:
1. Deploying Matplotlib for creating pie, scatter, line and histogram charts.
2. Subplots and Pandas built-in data visualization.
9.1 Revision of topics in Python (Pandas, Matplotlib, NumPy, scikit-learn)
9.2 Introduction to machine learning
9.3 Need for machine learning
9.4 Types of machine learning and the machine learning workflow
9.5 Use cases of machine learning and its various algorithms
9.6 What is supervised learning
9.7 What is Unsupervised Learning
10.1 What is linear regression
10.2 Step by step calculation of Linear Regression
10.3 Linear regression in Python
10.4 Logistic Regression
10.5 What is classification
10.6 Decision Tree, Confusion Matrix, Random Forest, Naïve Bayes classifier (self-paced), Support Vector Machine (self-paced), XGBoost (self-paced)
11.1 Introduction to unsupervised learning
11.2 Use cases of unsupervised learning
11.3 What is clustering
11.4 Types of clustering (self-paced): exclusive clustering, overlapping clustering, hierarchical clustering (self-paced)
11.5 What is k-means clustering
11.6 Step-by-step calculation of the k-means algorithm
11.7 Association rule mining (self-paced), market basket analysis (self-paced), measures in association rule mining (self-paced): support, confidence, lift
11.8 Apriori Algorithm
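The k-means calculation listed in 11.6 alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The following is a minimal pure-Python sketch of that loop, not the course's own implementation; the kmeans function, the sample points, and the fixed starting centroids are all assumptions made for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on tuples of numbers, starting from the given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)

        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(v) / len(cluster) for v in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place

        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D data with two visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
```

Fixed starting centroids keep the example deterministic; in practice the starting points are usually chosen at random, so results can differ between runs.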
12.1 Introduction to PySpark
12.2 Who uses PySpark, the need for Spark with Python
12.3 PySpark installation
12.4 PySpark fundamentals
12.5 Advantages of PySpark over MapReduce
12.6 PySpark use cases and demo
13.1 Introduction to Dimensionality
13.2 Why Dimensionality Reduction
13.3 PCA
13.4 Factor Analysis
13.5 LDA
14.1 White Noise
14.2 AR model
14.3 MA model
14.4 ARMA model
14.5 ARIMA model
14.6 Stationarity
14.7 ACF & PACF
1.1 What is data visualization?
1.2 Comparison and benefits against reading raw numbers
1.3 Real use cases from various business domains
1.4 Some quick and powerful examples using Tableau without going into the technical details of Tableau
1.5 Installing Tableau
1.6 Tableau interface
1.7 Connecting to DataSource
1.8 Tableau data types
1.9 Data preparation
2.1 Installation of Tableau Desktop
2.2 Architecture of Tableau
2.3 Interface of Tableau (Layout, Toolbars, Data Pane, Analytics Pane, etc.)
2.4 How to start with Tableau
2.5 The ways to share and export the work done in Tableau
3.1 Connection to Excel
3.2 Cubes and PDFs
3.3 Management of metadata and extracts
3.4 Data preparation
3.5 Joins (Left, Right, Inner, and Outer) and Union
3.6 Dealing with NULL values, cross-database joining, data extraction, data blending, refresh extraction, incremental extraction, how to build extract, etc.
4.1 Mark, highlight, sort, group, and use sets (creating and editing sets, IN/OUT, sets in hierarchies)
4.2 Constant sets
4.3 Computed sets, bins, etc.
5.1 Filters (addition and removal)
5.2 Filtering continuous dates, dimensions, and measures
5.3 Interactive filters, marks card, and hierarchies
5.4 How to create folders in Tableau
5.5 Sorting in Tableau
5.6 Types of sorting
5.7 Filtering in Tableau
5.8 Types of filters
5.9 Filtering the order of operations
6.1 Using the Formatting Pane to work with menus, fonts, alignments, settings, and copy-paste
6.2 Formatting data using labels and tooltips
6.3 Edit axes and annotations
6.4 K-means cluster analysis
6.5 Trend and reference lines
6.6 Visual analytics in Tableau
6.7 Forecasting, confidence interval, reference lines, and bands
7.1 Working on coordinate points
7.2 Plotting longitude and latitude
7.3 Editing unrecognized locations
7.4 Customizing geocoding, polygon maps, WMS: web mapping services
7.5 Working on the background image, including adding an image
7.6 Plotting points on images and generating coordinates from them
7.7 Map visualization, custom territories, map box, WMS map
7.8 How to create map projects in Tableau
7.9 Creating dual axes maps, and editing locations
8.1 Calculation syntax and functions in Tableau
8.2 Various types of calculations, including Table, String, Date, Aggregate, Logic, and Number
8.3 LOD expressions, including concept and syntax
8.4 Aggregation and replication with LOD expressions
8.5 Nested LOD expressions
8.6 Levels of details: fixed level, lower level, and higher level
8.7 Quick table calculations
8.8 The creation of calculated fields
8.9 Predefined calculations
8.10 How to validate
9.1 Creating parameters
9.2 Parameters in calculations
9.3 Using parameters with filters
9.4 Column selection parameters
9.5 Chart selection parameters
9.6 How to use parameters in the filter session
9.7 How to use parameters in calculated fields
9.8 How to use parameters in the reference line
10.1 Dual axes graphs
10.2 Histograms
10.3 Single and dual axes
10.4 Box plot
10.5 Charts: motion, Pareto, funnel, pie, bar, line, bubble, bullet, scatter, and waterfall charts
10.6 Maps: tree and heat maps
10.7 Market basket analysis (MBA)
10.8 Using Show me
10.9 Text table and highlighted table
11.1 Building and formatting a dashboard using size, objects, views, filters, and legends
11.2 Best practices for making creative as well as interactive dashboards using the actions
11.3 Creating stories, including the intro of story points
11.4 Creating as well as updating the story points
11.5 Adding catchy visuals in stories
11.6 Adding annotations with descriptions; dashboards and stories
11.7 What is a dashboard?
11.8 Highlight actions, URL actions, and filter actions
11.9 Selecting and clearing values
11.10 Best practices to create dashboards
11.11 Dashboard examples; using Tableau workspace and Tableau interface
11.12 Learning about Tableau joins
11.13 Types of joins
11.14 Tableau field types
11.15 Saving as well as publishing data source
11.16 Live vs extract connection
11.17 Various file types
12.1 Introduction to Tableau Prep
12.2 How Tableau Prep helps quickly combine, join, shape, and clean data for analysis
12.3 Creation of smart examples with Tableau Prep
12.4 Getting deeper insights into the data with great visual experience
12.5 Making data preparation simpler and accessible
12.6 Integrating Tableau Prep with Tableau analytical workflow
12.7 Understanding the seamless process from data preparation to analysis with Tableau Prep
13.1 Introduction to R language
13.2 Applications and use cases of R
13.3 Deploying R on the Tableau platform
13.4 Learning R functions in Tableau
13.5 The integration of Tableau with Hadoop
1.1 The field of machine learning and its impact on the field of artificial intelligence
1.2 The benefits of machine learning with respect to traditional methodologies
1.3 Introduction to deep learning and how it differs from other machine learning methods
1.4 Classification and regression in supervised learning
1.5 Clustering and association in unsupervised learning, and the algorithms used in these categories
1.6 Introduction to AI and neural networks
1.7 Machine learning concepts
1.8 Supervised learning with neural networks
1.9 Fundamentals of statistics, hypothesis testing, probability distributions, and hidden Markov models
2.1 Introduction to multilayer networks, regularization, deep neural networks
2.2 Multilayer perceptron
2.3 Overfitting and capacity
2.4 Neural network hyperparameters, logic gates
2.5 Different activation functions used in neural networks, including ReLU, softmax, sigmoid, and hyperbolic functions
2.6 Backpropagation, forward propagation, convergence, hyperparameters, and overfitting
3.1 Various methods used to train artificial neural networks
3.2 Perceptron learning rule, gradient descent rule, tuning the learning rate, regularization techniques, optimization techniques
3.3 Stochastic process, vanishing gradients, transfer learning, regression techniques
3.4 Lasso (L1) and ridge (L2), unsupervised pre-training, Xavier initialization
4.1 Understanding how deep learning works
4.2 Activation functions, illustrating the perceptron, perceptron training
4.3 Multilayer perceptron, key parameters of the perceptron
4.4 Introduction to TensorFlow, the open-source software library used to design, create, and train deep learning models
4.5 Google’s Tensor Processing Unit (TPU), a programmable AI accelerator
4.6 Python libraries in TensorFlow, code basics, variables, constants, placeholders
4.7 Graph visualization, use-case implementation, Keras, and more
5.1 Keras, a high-level neural network API that works on top of TensorFlow
5.2 Defining complex multi-output models
5.3 Composing models using Keras
5.4 Sequential and functional composition, batch normalization
5.5 Deploying Keras with TensorBoard, and customizing the neural network training process
6.1 Using the TFLearn API to implement neural networks
6.2 Defining and composing models, and deploying TensorBoard
7.1 Mapping the human mind with deep neural networks (DNNs)
7.2 Several building blocks of artificial neural networks (ANNs)
7.3 The architecture of a DNN and its building blocks
7.4 Reinforcement learning in DNN concepts; various parameters, layers, and optimization algorithms in DNNs; and activation functions
8.1 What is a convolutional neural network?
8.2 Understanding the architecture and use cases of CNNs
8.3 What is a pooling layer? How to visualize using a CNN
8.4 How to fine-tune a convolutional neural network
8.5 What is transfer learning?
8.6 Understanding recurrent neural networks, kernel filters, feature maps, and pooling, and deploying convolutional neural networks in TensorFlow
9.1 Introduction to the RNN model
9.2 Use cases of RNNs, modeling sequences
9.3 RNNs with backpropagation
9.4 Long short-term memory (LSTM)
9.5 Recursive neural tensor network theory, the basic RNN cell, unfolded RNN, dynamic RNN
9.6 Time-series predictions
10.1 Introduction to GPUs, how they differ from CPUs, and the significance of GPUs
10.2 Deep learning networks, forward pass and backward pass training techniques
10.3 GPU constituents with simpler cores and concurrent hardware
11.1 Introduction to RBMs and autoencoders
11.2 Deploying RBMs for deep neural networks, using RBMs for collaborative filtering
11.3 Features and applications of autoencoders
12.1 Image processing
12.2 Natural language processing (NLP), speech recognition, and video analytics
13.1 Automated conversation bots leveraging any of the following descriptive techniques: IBM Watson, Microsoft’s LUIS, open- and closed-domain bots
13.2 Generative model and the sequence-to-sequence model (LSTM)
RDBMS, types of relational databases, challenges of RDBMS, NoSQL database, its significance, how NoSQL suits Big Data needs, introduction to MongoDB and its advantages, MongoDB installation, JSON features, data types and examples
Installing MongoDB, basic MongoDB commands and operations, MongoChef (MongoGUI) installation and MongoDB data types
The need for NoSQL, types of NoSQL databases, OLTP, OLAP, limitations of RDBMS, ACID properties, CAP theorem, BASE property, learning about JSON/BSON, database collections and documents, MongoDB uses, MongoDB write concerns (acknowledged, replica acknowledged, unacknowledged, journaled), and fsync
Understanding CRUD and its functionality, CRUD concepts, MongoDB query syntax, read and write queries, and query optimization
Concepts of data modelling, difference between MongoDB and RDBMS modelling, model tree structure, operational strategies, monitoring and backup
In this module, you will learn MongoDB® Administration activities such as health check, backup, recovery, database sharding and profiling, data import/export, performance tuning, etc.
Concepts and types of data aggregation, and data indexing concepts, properties, and variations, using single key and multikey
Understanding database security risks, MongoDB security concept and security approach and MongoDB integration with Java and Robomongo
Implementing techniques to work with a variety of unstructured data like images, videos, log data and others, and understanding the GridFS MongoDB file system for storing data
Course Rating
Shivam Baliyan (5.0)
March 02, 2021: Great Course
Excellent explanation of concepts. The course was very informative and well explained; I will surely recommend it. I learned a lot and the illustrations and examples are great. The course is a great match.
Vishv Ajay Mittal (5.0)
February 24, 2021: Awesome and complete course
Thanks to the instructors for this awesome tutorial. The best Data Science Architect course. Just amazing!!! I was greatly helped by this course. Everything was mind-blowing to me, and it was very helpful for starting my career as a data scientist.
Aarchi Jain (5.0)
February 14, 2021: Wonderful Training
Amazingly wonderful course, extremely intuitive. Very pleased with this course. It was really good and informative. The speaker is knowledgeable and quite engaging. Thank you SparkAcademy!!
Malvika Sitlani Aryan (5.0)
February 11, 2021: Great Training
Well, it's definitely a great course: very deep knowledge yet detailed enough, great presentations, and easy to learn. Altogether it is an amazing study for everyone. It was a great learning experience!!!
Kritika Khurana (5.0)
February 04, 2021: Perfect Course
I am absolutely loving this course. This course is very thorough, structured around core concepts and practical coding experience is a huge plus. An enjoyable course with interesting content.
Guneet Virdi (5.0)
January 29, 2021: Amazing course
Thank you for the practical examples you gave, they made understanding easier, I really enjoyed the course. The course is very well structured. Happy I took this one!!!
Saksham Bisht (5.0)
January 24, 2021: Well Explained
It's a very well explained course and very well presented by the instructor. I have completed the whole course and loved it. It made me understand many concepts really clearly. Amazingly structured one!
Akanksha Maheshwari (5.0)
January 22, 2021: Clear Concepts
The knowledge and content are very valuable. Very well explained all the complex things in a simple and clear way. Overall I feel happy about my decision to take this course.
Kunaal Khanna (5.0)
January 19, 2021: Excellent tutorials
An excellent course for everyone who wants to start learning data science. I really liked its step-by-step structure and real-life problems that are backing up concepts taught in this course.
Apaar Bharadwaj (5.0)
January 14, 2021: Well-structured and complete course
It has been a real pleasure experiencing this long journey through data science. It's a very well-structured and complete course, with a perfectly balanced amount of theory and exercises. Thank you so much for this great work!