More on Model Evaluation and an Updated Syllabus

Monday July 9, 2018 at 12:06 pm CDT

As mentioned in the previous blog post, this month I’ll be working through Chollet’s Deep Learning with Python (DLP), which focuses on Keras, and Géron’s Hands-On Machine Learning with Scikit-Learn and TensorFlow (HML).

I’m definitely wishing I’d spent more of June digging into training models, since it feels like this month will be a mad dash to become proficient at it. Having said that, I’m glad I took time to dig into some of the fundamental concepts I’d need to understand before moving on to more complex material.

For both texts, the plan is to implement most of the code samples in a Jupyter Notebook. Like a lot of folks, I find it helps my understanding to actually play with code, as opposed to just reading it.

HML has exercises, so I’ll also choose a few of those each chapter to make sure I really know what I’m doing. DLP, in contrast, doesn’t have any exercises: there are plenty of code samples, but you’re not asked to write any further code. For each chapter in DLP, I’ll likely do one of three things: repeat an exercise from HML using Keras, reimplement the chapter code on a different dataset, or find an appropriate Kaggle competition in which I can apply the techniques covered in the chapter.

Model Evaluation

I’ve talked about model evaluation before. I’ll talk about it again, since it’s a significant part of the reading I did in HML this week, and it’s an extensive topic, so there’s plenty to discuss.

HML covers confusion matrices and useful evaluation metrics in detail. To start, let’s discuss what a confusion matrix is.

|                 | Predicted positive | Predicted negative |
| --------------- | ------------------ | ------------------ |
| Actual positive | true positive      | false negative     |
| Actual negative | false positive     | true negative      |

The rows correspond to the actual classes to which the samples belong and the columns to the predicted classes. A “true positive” is a sample the model correctly predicted as positive; a “false positive” is a sample the model predicted as positive that is actually negative; and so on.

Let’s say we have 100 samples in our data set. Our samples are images of frogs and toads. A confusion matrix for just the frog class might look like this:

|             | Predicted frog | Predicted not frog |
| ----------- | -------------- | ------------------ |
| Actual frog | 11             | 7                  |
| Actual toad | 9              | 73                 |

The confusion matrix shows that 11 samples that were predicted to be images of frogs were, in fact, images of frogs. However, 7 images that were predicted not to be frogs were actually frogs. The model also mistook 9 images of toads for frogs, but correctly guessed that 73 images of toads were toads.
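To make this concrete, here’s a minimal sketch of reproducing that matrix with scikit-learn. The label arrays are fabricated purely to match the counts above; nothing here comes from a real dataset.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# 1 = frog (the positive class), 0 = toad; counts fabricated to match the table
y_true = np.array([1] * 11 + [1] * 7 + [0] * 9 + [0] * 73)
y_pred = np.array([1] * 11 + [0] * 7 + [1] * 9 + [0] * 73)

# labels=[1, 0] puts frogs in the first row/column, so the output matches the
# table's layout (rows = actual class, columns = predicted class)
print(confusion_matrix(y_true, y_pred, labels=[1, 0]))
# [[11  7]
#  [ 9 73]]
```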

One potential metric we could use to evaluate our model is accuracy:

$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

which in this case would be (11 + 73) / 100 = 0.84. Not amazing, but still pretty decent results. Now let’s take a look at some other metrics:

$$\text{precision} = \frac{TP}{TP + FP} \qquad \text{recall} = \frac{TP}{P} = \frac{TP}{TP + FN}$$

where P is the total number of actual positive samples.

Precision measures what proportion of the model’s positive guesses were correct. Recall indicates what proportion of the samples belonging to the class in question (in this case, frogs) you correctly identified as such.

The precision of the model is 0.55, which is significantly lower than the accuracy score we saw before. The recall is somewhat better at 0.61.
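Continuing the sketch above, scikit-learn’s metric helpers reproduce those numbers directly (label 1, frogs, is treated as the positive class by default):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

print(accuracy_score(y_true, y_pred))   # 0.84       -> (11 + 73) / 100
print(precision_score(y_true, y_pred))  # 0.55       -> 11 / (11 + 9)
print(recall_score(y_true, y_pred))     # 0.6111...  -> 11 / (11 + 7)
```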

So…what’s up with that?

You’ll note that we have far more toads in the set than frogs: 82 toads to 18 frogs. Let’s say that our model is conservative in its predictions, i.e., it is more likely to guess that an image is a toad than a frog. Simply guessing “toad” more often than “frog” results in higher accuracy. For instance, if our model only returned “toad,” it would have an accuracy of 0.82. Accuracy is not particularly useful when you’re dealing with imbalanced data sets.
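Continuing the same sketch, the degenerate all-toad model really does score 0.82 while finding zero frogs:

```python
all_toad = np.zeros_like(y_true)         # predict 0 (toad) for every sample
print(accuracy_score(y_true, all_toad))  # 0.82 -- nearly as good as our model
print(recall_score(y_true, all_toad))    # 0.0  -- yet it never finds a frog
```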

It’s also important to note that there is a tradeoff between precision and recall. Given a dataset heavily skewed such that most samples are not of the class in question, if a model is more inclined to give positive predictions, then the precision will drop as the number of false positives increases. The recall, in contrast, will increase as the number of true positive predictions increases. On the other hand, if a model is more inclined to give negative predictions, then the precision will increase and the recall will drop.

You can use this tradeoff to your benefit. If you are more concerned with catching as many of the positive samples as possible and less concerned with the number of false positives, then you want to aim for high recall. This would be the case for a model that tells you whether a patient has a certain disease: the consequences of failing to discover that a patient has the disease are far greater than those of initially believing they have it when they don’t.
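Here’s a self-contained sketch of steering toward high recall by moving the decision threshold. The dataset and model are synthetic stand-ins (nothing from HML or this post), just to show the mechanics of scikit-learn’s precision_recall_curve:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: ~82% negatives, ~18% positives, echoing the
# toad/frog split above
X, y = make_classification(n_samples=1000, weights=[0.82], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression().fit(X_train, y_train)
scores = model.decision_function(X_test)  # higher = more "positive-looking"

precisions, recalls, thresholds = precision_recall_curve(y_test, scores)

# Lowering the threshold yields more positive predictions: recall rises while
# precision tends to fall. For a disease-screening model, take the highest
# threshold that still keeps recall above a floor, say 0.95.
ok = recalls[:-1] >= 0.95  # recalls[:-1] aligns one-to-one with thresholds
high_recall_threshold = thresholds[ok][-1] if ok.any() else thresholds[0]
print(high_recall_threshold)
```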

Syllabus Updates

Quite a few changes have been made to my syllabus in the past few weeks. Note that the ~ indicates that a resource is optional or can be skipped if there isn’t adequate time.

Books:

  • Ali, Cielen, Meysman - Introducing Data Science (IDS)
  • Kirk - Thoughtful Machine Learning with Python (TML)
  • Grus - Data Science from Scratch (DSS)
  • Bengio, Courville, Goodfellow - Deep Learning (DL)
  • Nielsen - Neural Networks and Deep Learning (NNDL)
  • Chollet - Deep Learning with Python (DLP)
  • Bostrom - Superintelligence: Paths, Dangers, Strategies (SPD)
  • Hawkins - On Intelligence: How a New Understanding of the Brain… (OI)
  • Bruce - Practical Statistics for Data Scientists (PSDS)
  • Raschka - Python Machine Learning (PML)
  • Géron - Hands-On Machine Learning with Scikit-Learn and TensorFlow (HML)
  • Sutton and Barto - Reinforcement Learning: An Introduction, 2nd edition (RL)
  • Kelleher, Mac Namee, D’Arcy - Fundamentals of Machine Learning for Predictive Data Analytics (FMLP)

Blogs:

  • Jason Brownlee - Machine Learning Mastery (machinelearningmastery.com) (MLM)
  • Andy Thomas - Adventures in Machine Learning (adventuresinmachinelearning.com) (AML)
  • Arthur Juliani - Reinforcement Learning (https://medium.com/@awjuliani) (AJRL)
  • Univ. of California, Berkeley - Berkeley Artificial Intelligence Research (http://bair.berkeley.edu/blog/) (BAIR)
  • OpenAI - OpenAI Blog (blog.openai.com) (OPAI)
  • Towards Data Science (towardsdatascience.com) (TDS)
  • DataCamp (https://www.datacamp.com/community/tutorials/) (DC)
  • Daniel Stang - Step by Step Object Detection API Tutorial (DSOD)

Week 1 - Linear algebra, calculus, statistics, python

  1. Linear algebra, calculus, and principal component analysis
    1. Coursera - Mathematics for Machine Learning Specialization
  2. Python
    1. Python docs
    2. DSS - ch. 2: A Crash Course in Python
  3. Statistics (refresher)
    1. DSS - ch. 5: Statistics

Practice

  1. ~ FastAI - Lesson 1: Recognizing Cats and Dogs
  2. ~ FastAI - Lesson 2: Improving Your Image Classifier

Week 2 - numpy, pandas, matplotlib, seaborn; K-nearest neighbors, curse of dimensionality, overfitting

  1. Numpy, pandas, matplotlib, seaborn
    1. see notes for Udacity - Introduction to Data Analysis
  2. KNN, curse of dimensionality
    1. DSS - ch. 12: K-Nearest Neighbors
    2. FMLP - ch. 5: Similarity-based Learning

Week 3 - probability, deep learning fundamentals, naive Bayesian classification, neural networks math basics

  1. Probability
    1. DSS - ch. 6: Probability
  2. Naive Bayesian Classification
    1. DSS - ch. 13: Naive Bayes
  3. Deep learning fundamentals
    1. DLP - ch. 1: What is Deep Learning?
  4. Neural networks math basics
    1. DLP - ch. 2: The Mathematical Building Blocks of Neural Networks

Practice

  1. HML - ch. 1: Machine Learning Landscape
  2. HML - ch. 2: End-to-End Machine Learning Project


Week 4 - gradient descent, decision trees, bias-variance tradeoff

  1. Gradient descent
    1. DSS - ch. 8: Gradient Descent
  2. Decision trees
    1. DSS - ch. 17: Decision Trees
  3. Overfitting, bias-variance tradeoff, feature selection
    1. DSS - ch. 11: Machine Learning

Practice

  1. AML - Word2Vec Keras Tutorial
  2. HML - ch. 3: Classification


Week 5 - simple linear regression, logistic regression

Practice

  1. DLP - ch. 3: Getting Started with Neural Networks
  2. ~ DLP - ch. 4: Fundamentals of Machine Learning
  3. DLP - ch. 5: Deep Learning for Computer Vision
  4. HML - ch. 4: Training Models


Week 6 - recommendation systems, natural language processing, meta-learning

  1. Natural language processing
    1. DSS - ch. 20 - Natural Language Processing
  2. Recommendation systems
    1. DSS - ch. 22: Recommender Systems
  3. Meta-learning
    1. AJRL - Learning Policies for Learning Policies: Meta Reinforcement Learning in Tensorflow
    2. OPAI - Learning a Hierarchy
    3. TDS - What is Model-Agnostic Meta-Learning?
    4. Finn, Abbeel, Levine - Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (https://arxiv.org/abs/1703.03400)

Practice

  1. AML - Word2Vec Word Embedding Tutorial in Python and Tensorflow
  2. HML - ch. 8: Dimensionality Reduction
  3. HML - ch. 9: Up and Running with Tensorflow
  4. HML - ch. 10: Introduction to Artificial Neural Networks


Week 7 - generative deep learning, Hadoop, Spark, resampling methods

  1. Hadoop, Spark
    1. ~ IDS - ch. 5: First Steps in Big Data
  2. Generative deep learning
    1. DLP - ch. 8: Generative Deep Learning
  3. Resampling methods
    1. MLM - How to Implement Resampling Methods from Scratch in Python

Practice

  1. DSOD - parts 1-5
  2. HML - ch. 11: Training Deep Neural Nets
  3. HML - ch. 16: Reinforcement Learning
  4. DLP - ch. 6: Deep Learning for Text and Sequences


Week 8 - Cognition, more reinforcement learning, papers around machine curiosity, math review

  1. Cognition
    1. OI, SPD
  2. Reinforcement learning
    1. AJRL - Simple Reinforcement Learning w/ Tensorflow
    2. Agrawal, et al. - ‘Learning to Poke by Poking: Experiential Learning of Intuitive Physics’
    3. Pathak, et al. - ‘Curiosity-driven Exploration by Self-supervised Prediction’

Practice

  1. AML - Reinforcement Learning Tutorial Using Python and Keras
  2. DLP - ch. 7: Advanced Deep-Learning Best Practices
  3. ~ HML - ch. 14: Recurrent Neural Networks


Photo by Sharon McCutcheon on Unsplash