Practical Deep Learning for Coders – Jeremy Howard (fast.ai)

Tools

  • PyTorch
  • FastAI library (docs), built on PyTorch (minimal usage sketches follow this list):
    • DataBlock: To create training set (data used to create a model) and validation set (data used to check the accuracy of a model)
    • DataLoaders: Iterate through data to train the model
    • fastdownload: download_url
    • Learner: Takes the data (DataLoaders) and a model (the actual neural network function)
    • fine_tune: For pretrained models (e.g., vision models); adjusts the weights so the model learns to recognize your particular dataset
    • learn.predict
    • TabularDataLoaders: A specific API for tabular analysis
    • fit_one_cycle: Similar to fine_tune, but used when there is no pretrained model to start from (e.g., tabular models)
    • CollabDataLoaders: For building recommendation systems
    • collab_learner: Learner for CollabDataLoaders
    • learn.show_results
  • resnet18/34: Computer vision models from the Residual Network (ResNet) family, commonly used for image classification. They have 18 or 34 layers with trainable weights (mainly convolutional layers, batch normalization, and ReLU activations).
  • convnext: A newer family of vision models that is generally more accurate.
  • Jupyter Notebook, Kaggle (cloud server for running notebooks and creating/exporting model files, e.g., .pkl files), Paperspace (better alternative to Kaggle), HuggingFace Spaces (to deploy and share models).
  • HuggingFace Transformers: An NLP library.
  • NLP: Use the pre-trained model's tokenizer for tokenization and numericalization (see the tokenization sketch after this list).
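
A minimal sketch of the vision workflow above (DataBlock → DataLoaders → vision_learner → fine_tune → predict → export). The folder layout, image size, and hyperparameters are assumptions for illustration; labels are taken from parent folder names.

```python
from fastai.vision.all import *

path = Path('images')  # assumed: images stored in per-class subfolders

dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),               # inputs are images, targets are categories
    get_items=get_image_files,                        # collect all image files under `path`
    splitter=RandomSplitter(valid_pct=0.2, seed=42),  # 80/20 training/validation split
    get_y=parent_label,                               # label = name of the parent folder
    item_tfms=Resize(224),                            # resize every image to 224x224
)
dls = dblock.dataloaders(path, bs=32)                 # DataLoaders iterate over the data in batches

learn = vision_learner(dls, resnet18, metrics=error_rate)  # pretrained ResNet-18 backbone
learn.fine_tune(3)                                    # adapt the pretrained weights to this dataset

pred, pred_idx, probs = learn.predict(PILImage.create('test.jpg'))
learn.export('model.pkl')                             # .pkl file for deployment (e.g., HuggingFace Spaces)
```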
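
A sketch of the tabular and collaborative-filtering APIs (TabularDataLoaders with fit_one_cycle, and CollabDataLoaders with collab_learner). The file names, column names, and hyperparameters are assumptions for illustration.

```python
import pandas as pd
from fastai.tabular.all import *
from fastai.collab import *

# Tabular: there is no pretrained model to start from, so train with fit_one_cycle.
dls = TabularDataLoaders.from_csv(
    'adult.csv', y_names='salary',                       # assumed CSV and target column
    cat_names=['workclass', 'education', 'occupation'],  # categorical columns (assumed)
    cont_names=['age', 'hours-per-week'],                # continuous columns (assumed)
    procs=[Categorify, FillMissing, Normalize],          # standard preprocessing steps
)
learn = tabular_learner(dls, metrics=accuracy)
learn.fit_one_cycle(3)
learn.show_results()

# Collaborative filtering: rows of (user, item, rating) for a recommendation system.
ratings = pd.read_csv('ratings.csv')                  # assumed columns: user, movie, rating
dls = CollabDataLoaders.from_df(ratings, item_name='movie', bs=64)
learn = collab_learner(dls, n_factors=50, y_range=(0.5, 5.5))
learn.fit_one_cycle(5)
```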
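
A sketch of tokenization and numericalization with HuggingFace Transformers. The checkpoint name is an assumption (a small DeBERTa model is used in the course notebooks); any pretrained checkpoint works the same way.

```python
from transformers import AutoTokenizer

tokz = AutoTokenizer.from_pretrained('microsoft/deberta-v3-small')  # assumed checkpoint

text = "Deep learning is fun."
tokens = tokz.tokenize(text)     # tokenization: split the text into sub-word tokens
ids = tokz(text)['input_ids']    # numericalization: map tokens to the integer ids the model consumes

print(tokens)
print(ids)
```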

Techniques

  • Avoid overfitting or underfitting the model to the dataset, and choose the validation set wisely. Cross-validation evaluates the model (and helps detect overfitting) by training it on subsets of the dataset and evaluating it on the complementary subsets; K-fold cross-validation is the most common form (see the sketch after this list).
  • Pearson correlation coefficient: Measures the degree of linear relationship between predicted and actual values; it ranges from -1 to 1, and values at or near zero indicate no or a very weak linear relationship (it is computed in the cross-validation sketch after this list).
  • In the last step of binary classification, pass the model's raw output through a sigmoid function (a non-linear activation) to squash it into a probability between 0 and 1; this makes training better behaved and improves the model's accuracy.
  • Activation Functions: Introduce non-linearity into a neural net so it can learn complex patterns and relationships. Without them, NNs behave like simple linear regression models, unable to capture non-linear patterns in data (see the sketch after this list).
    • Use ReLU (or Leaky ReLU) in hidden layers.
    • Use Sigmoid for binary classification output.
    • Use Softmax for multi-class classification output.
  • Random Forest: An ensemble of decision trees; each tree is a hierarchy of binary splits (each split divides the rows of the data into two groups). See the sketch after this list.
    • Bagging (Bootstrap Aggregating): Randomly sample a proportion of the rows (50%, 75%, etc.) for each tree, and build a separate decision tree on each sample.
    • Random forest also chooses a random subset of columns for each decision tree.
    • Take the mean/average of the predictions made by all the decision trees (e.g., predictions from 100 decision trees). Using around 100 decision trees in a random forest is a common rule of thumb.
    • Out-of-bag (OOB) error: If each decision tree was trained on 75% of the samples, the remaining 25% (not part of that tree's training) can be used as a validation set for that tree. Measuring the error of the trees on those held-out samples gives the out-of-bag error. The scikit-learn library can compute it for you (e.g., oob_score=True on a random forest).
    • Random forest advantages:
      • Requires little preprocessing of data.
      • Robust. Hard to mess up the model.
      • Less prone to overfitting.
      • Provides insight into which columns are strongest predictors and which ones can be ignored. Useful for datasets with very large numbers of columns.
  • Gradient Boosting
  • In NNs (especially when adapting a pretrained model), we mostly care about tweaking the first or the last layer (see the sketch after this list).
  • Lesson 7: PyTorch has methods for inspecting GPU memory utilization and for freeing memory (garbage collection); see the sketch after this list.
  • Cross-entropy loss: Used for multi-class and binary classification tasks; combined with softmax in multi-class classification. It measures the difference between the predicted probability distribution and the actual class labels (the sketch after this list shows it together with an SGD step).
    • Intuition Behind Cross-Entropy Loss:
      • If the model predicts the correct class with high confidence (probability close to 1), the loss is low.
      • If the model is wrong or uncertain, the loss is high.
      • The log function penalizes incorrect predictions more severely.
  • Stochastic Gradient Descent (SGD) for optimizing the parameters (included in the cross-entropy sketch after this list).
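
A sketch of K-fold cross-validation and the Pearson correlation coefficient, using scikit-learn and NumPy on a synthetic regression dataset (the data and model choice are assumptions for illustration).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# K-fold cross-validation: every sample is predicted by a model that never saw it in training.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
preds = cross_val_predict(RandomForestRegressor(random_state=0), X, y, cv=cv)

# Pearson correlation between predicted and actual values (ranges from -1 to 1).
r = np.corrcoef(preds, y)[0, 1]
print(f"Pearson r between predictions and targets: {r:.3f}")
```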
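
A sketch of the activation-function choices above in PyTorch: ReLU in hidden layers, sigmoid for a binary output, softmax for a multi-class output. The tensor shapes are arbitrary.

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 8)                          # a batch of 4 samples with 8 features

hidden = F.relu(x @ torch.randn(8, 16))        # ReLU in a hidden layer (F.leaky_relu also works)

binary_logit = torch.randn(4, 1)               # raw model output (logit)
binary_prob = torch.sigmoid(binary_logit)      # squashed into (0, 1): a probability

multi_logits = torch.randn(4, 3)               # raw outputs for 3 classes
multi_probs = F.softmax(multi_logits, dim=1)   # each row is a probability distribution
print(multi_probs.sum(dim=1))                  # every row sums to 1
```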
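
A sketch of a random forest with bagging, the out-of-bag error, and feature importances in scikit-learn (the synthetic dataset and the 75% row sample are assumptions matching the notes above).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,       # ~100 trees is the common rule of thumb
    max_samples=0.75,       # each tree is trained on a random 75% of the rows (bagging)
    max_features='sqrt',    # a random subset of columns is considered at each split
    oob_score=True,         # score each tree on the rows it never saw
    random_state=0,
)
forest.fit(X, y)

print("OOB error:", 1 - forest.oob_score_)                   # built-in out-of-bag estimate
print("Feature importances:", forest.feature_importances_)   # which columns are the strongest predictors
```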
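
A sketch of tweaking only the last layer of a pretrained network in plain PyTorch (torchvision's current weights API is assumed): the body is kept, and the final fully-connected layer is replaced to output an assumed 10 classes.

```python
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(weights='DEFAULT')                 # pretrained body + 1000-class head

# Replace only the last layer so the network outputs 10 classes instead of 1000.
model.fc = nn.Linear(model.fc.in_features, 10)

# Optionally freeze everything except the new last layer while fine-tuning.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith('fc.')
```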
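
A sketch of the PyTorch GPU-memory utilities referred to in Lesson 7, together with Python garbage collection; the exact numbers depend on your GPU and what you have allocated.

```python
import gc
import torch

if torch.cuda.is_available():
    print(torch.cuda.memory_allocated())   # bytes currently held by live tensors
    print(torch.cuda.memory_reserved())    # bytes reserved by PyTorch's caching allocator
    print(torch.cuda.memory_summary())     # human-readable memory report

    gc.collect()                           # Python garbage collection for unreferenced objects
    torch.cuda.empty_cache()               # release cached blocks back to the GPU driver
```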
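
A sketch of cross-entropy loss and a single SGD step in PyTorch, including the intuition above: the loss is -log(p) for the probability p assigned to the true class, so confident correct predictions give a small loss and wrong or uncertain ones give a large loss. The tiny linear model and random data are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Intuition: loss = -log(probability assigned to the true class).
probs = torch.tensor([0.95, 0.50, 0.05])
print(-torch.log(probs))                  # ~0.05 (confident & right), ~0.69 (uncertain), ~3.00 (wrong)

# In practice: CrossEntropyLoss applies (log-)softmax to raw logits internally.
model = nn.Linear(10, 3)                  # tiny 3-class classifier
x = torch.randn(16, 10)                   # a batch of 16 samples
y = torch.randint(0, 3, (16,))            # integer class labels
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

loss = loss_fn(model(x), y)               # compare predicted distribution to actual labels
loss.backward()                           # compute gradients
opt.step()                                # one stochastic gradient descent update
opt.zero_grad()
```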