Looking to accelerate your career in data science? Gain in-demand knowledge you can apply to your work immediately with live online data science programs, powered by Metis.

Introduction to Data Science

Designed for those with a basic understanding of data analysis techniques, this course serves as an introduction to the data science principles required to tackle real-world, data-rich problems in business and academia.



Data acquisition, cleaning and aggregation
Exploratory data analysis and visualisation
Feature engineering
Model creation and validation
Basic statistical and mathematical foundations for data science


An understanding of problems solvable with data science and an ability to attack them from a statistical perspective.


An understanding of when to use supervised and unsupervised statistical learning methods on labelled and unlabelled data-rich problems.


The ability to create data analytical pipelines and applications in Python.


Familiarity with the Python data science ecosystem and the various tools needed to continue developing as a data scientist.


Students should have some familiarity with basic statistical and linear algebraic concepts such as mean, median, mode, standard deviation, correlation, and the difference between a vector and a matrix. Additionally, Python is a requirement for the course. In Python, it will be helpful to know basic data structures such as lists, tuples and dictionaries, and what distinguishes them (that is, when they should be used). Python v3 is currently used in the course.
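
As a rough self-check on these prerequisites, the sketch below uses only the Python 3 standard library. The variable names and numbers are made up for illustration and are not taken from the course materials.

```python
import statistics

# A list is ordered and mutable: suited to a collection that grows or changes.
daily_visits = [120, 135, 150, 135, 160]
daily_visits.append(170)

# A tuple is ordered and immutable: suited to a fixed record such as a coordinate pair.
office_location = (40.7128, -74.0060)

# A dictionary maps keys to values: suited to lookups by name.
visits_by_day = {"mon": 120, "tue": 135, "wed": 150}

# Basic descriptive statistics.
mean_visits = statistics.mean(daily_visits)      # 145
median_visits = statistics.median(daily_visits)  # 142.5
mode_visits = statistics.mode(daily_visits)      # 135
stdev_visits = statistics.stdev(daily_visits)    # sample standard deviation


def pearson_correlation(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / (spread_x * spread_y) ** 0.5


# A vector is a one-dimensional sequence of numbers; a matrix is two-dimensional.
vector = [1, 2, 3]
matrix = [[1, 2, 3],
          [4, 5, 6]]
```

If each line reads naturally, including why a tuple rather than a list holds the coordinate pair, the prerequisite level is probably met.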

To ensure everyone begins the course on the same page, students will be encouraged to complete approximately 8 hours of pre-work before the first day of instruction.

Students will need a GitHub account to access the course content and a Slack account to collaborate with their instructor and peers. Sign-up for both is free, fast and easy.

Course designed by Sergey Fogelson, VP of Analytics and Measurement Sciences, Viacom


Get answers to frequently asked questions.

Available in late 2020


Please register your interest in the live online Introduction to Data Science course, available in late 2020.

Course Structure and Syllabus


CS/Statistics/Linear Algebra Short Course

We start with the basics. For CS, we briefly cover basic data structures/types, program control flow and syntax in Python. For statistics, we go over basic probability and probability distributions, along with general properties of some common distributions. For linear algebra, we cover matrices, vectors and some of their properties, and how to use them in Python.
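
To give a flavour of the statistics and linear algebra topics, here is a minimal sketch using only the Python standard library. The helper names `binomial_pmf` and `mat_vec` are illustrative; the course itself may use other tools for the same ideas.

```python
import math
from statistics import NormalDist


def binomial_pmf(successes, trials, p):
    """Probability of exactly `successes` successes in `trials` independent trials."""
    return math.comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)


# A common property of the normal distribution: about 68% of the
# probability mass lies within one standard deviation of the mean.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
prob_within_one_sd = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


print(binomial_pmf(2, 4, 0.5))           # 0.375
print(round(prob_within_one_sd, 4))      # 0.6827
print(mat_vec([[1, 2], [3, 4]], [1, 1])) # [3, 7]
```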


Exploratory Data Analysis and Visualisation

We spend a considerable amount of time using the Pandas Python package to attack a dataset we’ve never seen before and uncover useful information from it. At this point, students decide on a course project that would benefit from the data-scientific approach. The project must involve public (freely accessible and usable) data and must answer an interesting question, or collection of questions, about that data (several sources of free data will be provided).
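
The kind of first-pass questions asked of an unseen dataset can be sketched as follows. The course uses Pandas for this step; to stay free of extra installs, this sketch uses only the standard library, and the inline dataset and its column names are made up for illustration.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical toy data standing in for an unseen public dataset.
RAW_CSV = """city,temperature_c,condition
Leeds,14.5,rain
Leeds,16.0,cloud
York,15.0,rain
York,,sun
Hull,13.5,rain
"""

rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# How big is the dataset, and what columns does it have?
n_rows = len(rows)
columns = list(rows[0].keys())

# How many values are missing in each column?
missing = {col: sum(1 for row in rows if row[col] == "") for col in columns}

# Summarise a numeric column, skipping missing entries.
temperatures = [float(row["temperature_c"]) for row in rows if row["temperature_c"]]
summary = {
    "min": min(temperatures),
    "mean": statistics.mean(temperatures),
    "max": max(temperatures),
}

# Count how often each category appears.
condition_counts = Counter(row["condition"] for row in rows)

print(n_rows, columns)
print(missing)
print(summary)
print(condition_counts.most_common())
```

Shape, missing values, numeric summaries and category counts are usually the first things worth checking before deciding which questions a dataset can answer.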


Data Modelling: Supervised/Unsupervised Learning and Model Evaluation

We learn about two basic kinds of statistical models that have classically been used for prediction (supervised learning): linear regression and logistic regression. We also look at clustering with k-means, one way to glean information from unlabelled data.
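
To make the supervised/unsupervised distinction concrete, here is a minimal standard-library sketch of two of the methods named above: least-squares linear regression on labelled data and k-means on unlabelled data. Logistic regression is left out for brevity. The function names, the fixed starting centroids and the toy data are assumptions for illustration, not course code.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def k_means(points, centroids, iterations=10):
    """Basic k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [math.dist(point, c) for c in centroids]
            clusters[distances.index(min(distances))].append(point)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Supervised: the ys are labels, and the model learns to predict them from xs.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])

# Unsupervised: there are no labels, only points to be grouped.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])

print(slope, intercept)  # 2.0 1.0
print(clusters)
```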


Data Modelling: Feature Selection, Engineering, and Data Pipelines

We switch gears from algorithms to features. What are they? How do we engineer them? What can be done (principal component analysis/independent component analysis, regularisation) to create and use them given the data at hand? We also cover how to construct complete data pipelines, going from data ingestion and pre-processing to model construction and evaluation.
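
The pipeline idea, together with regularisation, can be sketched in a few lines. This is a minimal standard-library example under stated assumptions: one feature, a ridge (L2) penalty as the regularisation technique, and a made-up dataset with one missing value. Principal and independent component analysis are not shown.

```python
import statistics


def standardise(values):
    """Pre-processing: rescale a feature to zero mean and unit variance."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    return [(v - mean) / spread for v in values]


def fit_ridge_slope(xs, ys, penalty):
    """One-feature ridge regression on centred data; penalty=0 gives least squares."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)


def run_pipeline(raw_rows, penalty):
    """Ingest -> clean -> pre-process -> fit -> evaluate. Returns (slope, training MSE)."""
    # Cleaning: drop rows with a missing feature or target.
    rows = [(x, y) for x, y in raw_rows if x is not None and y is not None]
    xs = standardise([x for x, _ in rows])
    mean_y = statistics.mean([y for _, y in rows])
    ys = [y - mean_y for _, y in rows]
    slope = fit_ridge_slope(xs, ys, penalty)
    mse = statistics.mean([(y - slope * x) ** 2 for x, y in zip(xs, ys)])
    return slope, mse


# Hypothetical raw data: (feature, target) pairs, one with a missing feature.
raw_rows = [(1.0, 2.0), (2.0, 4.1), (None, 5.0), (3.0, 6.2), (4.0, 7.9)]

plain_slope, plain_mse = run_pipeline(raw_rows, penalty=0.0)
ridge_slope, ridge_mse = run_pipeline(raw_rows, penalty=10.0)
print(plain_slope, plain_mse)
print(ridge_slope, ridge_mse)
```

The penalty shrinks the fitted slope towards zero and raises the training error slightly; that trade is what regularisation makes in exchange for a model that is less sensitive to noise.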


Data Modelling: Advanced Supervised/Unsupervised Learning

We delve into more advanced supervised learning approaches and get a feel for linear support vector machines, decision trees, and random forest models for regression and classification. We also explore DBSCAN, an additional unsupervised learning approach.
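
As a small taste of tree-based models, the sketch below fits a decision stump: a decision tree with a single split. Full decision trees repeat this kind of split recursively, and random forests combine many such trees. The function names and toy data are illustrative assumptions, and the code assumes binary labels (0 or 1) and at least two distinct feature values.

```python
def fit_stump(values, labels):
    """Return (threshold, left_label, right_label) with the fewest training errors."""
    distinct = sorted(set(values))
    # Candidate thresholds are the midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = None
    for threshold in candidates:
        for left_label, right_label in ((0, 1), (1, 0)):
            errors = sum(
                (left_label if value <= threshold else right_label) != label
                for value, label in zip(values, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label, right_label)
    return best[1:]


def predict_stump(stump, value):
    """Classify one value with a fitted stump."""
    threshold, left_label, right_label = stump
    return left_label if value <= threshold else right_label


# Hypothetical one-feature training set with binary labels.
values = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
labels = [0, 0, 0, 1, 1, 1]

stump = fit_stump(values, labels)
print(stump)                      # (4.5, 0, 1)
print(predict_stump(stump, 5.0))  # 1
```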


Data Modelling: Advanced Model Evaluation and Data Pipelines | Presentations

We explore more sophisticated model evaluation approaches (cross-validation and bootstrapping) with the goal of understanding how we can make our models as generalisable as possible. Students complete data science projects and share learnings and discoveries.
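
Both evaluation ideas can be sketched with the standard library alone. In this minimal example the "model" is deliberately trivial (it predicts the mean of its training data) so that the resampling logic stays in focus. The function names, fold scheme and data are assumptions for illustration.

```python
import random
import statistics


def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) so each sample is held out exactly once."""
    indices = list(range(n_samples))
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for held_out in folds:
        train = [i for i in indices if i not in held_out]
        yield train, held_out


def cross_validated_mse(values, n_folds):
    """Average held-out squared error of a predict-the-training-mean model."""
    fold_errors = []
    for train, test in k_fold_indices(len(values), n_folds):
        prediction = statistics.mean([values[i] for i in train])
        fold_errors.append(
            statistics.mean([(values[i] - prediction) ** 2 for i in test])
        )
    return statistics.mean(fold_errors)


def bootstrap_means(values, n_resamples, seed=0):
    """Resample with replacement and record the mean of each resample."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    ]


# Hypothetical measurements.
values = [4.0, 5.5, 6.0, 5.0, 4.5, 5.0]

print(cross_validated_mse(values, n_folds=3))
resampled = bootstrap_means(values, n_resamples=200)
print(min(resampled), max(resampled))  # spread of the estimated mean
```

Cross-validation estimates how a model performs on data it has not seen; the bootstrap estimates how much a statistic would vary if the data were collected again.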

Live online interactive learning

Learn from world-class data science practitioners

Live online instructors bring deep industry experience and will be available to support you throughout your learning process.

Interact with instructors and classmates in real-time

Ask questions, participate in discussions and join your course Slack channel for maximum engagement, collaboration and support.

The benefit of online learning with live instruction

Log in from wherever you are to access live online classes. If you miss a class or need to refer back, recordings are available 24/7.

Register your interest 

Register your interest in the live online Introduction to Data Science course, available in late 2020. We’ll let you know when classes are open for enrolment.

Frequently asked questions

Python is a requirement for the course. In Python, it will be helpful to know basic data structures such as lists, tuples and dictionaries, and what distinguishes them (that is, when they should be used). Python v3 is currently used in the course.

No, you’ll receive a certificate of completion stating you’ve completed the course. 

While there’s no official homework, you can expect to spend a minimum of 3 hours per week reviewing material or working on projects. The time spent outside class will depend on your background and the course itself. Each instructor will address this on the first day of class. There will be lab/office hours outside of class during which students and the instructor can collaborate.

Course instructors are from the industry and have real-world experience as practitioners of data science. Please visit the respective course pages for specific information on each instructor’s background and current job.

The course runs two nights per week over 6 weeks, totalling 36 hours of instruction.

The live online format allows you to attend class sessions from anywhere with a stable internet connection. Unlike other online options, where sessions may be pre-recorded, the live online format allows for interaction with the instructor, teaching assistant and other participants.

The curriculum will be provided via GitHub, a web-based hosting service for version control using Git, so you must register a GitHub account. Signing up on the GitHub site is free, fast and easy.