DataBlock: To create a training set (data used to train the model) and a validation set (data used to check the accuracy of the model); see the workflow sketch below.
DataLoaders: Iterate through data to train the model
fastdownload: Provides download_url to fetch a file (e.g., an image) from a URL.
Learner: Combines the DataLoaders and a model (the actual neural network function) for training.
fine_tune: For pretrained models (transfer learning). Adjusts the weights so the model learns to recognize your particular dataset.
learn.predict: Runs the trained model on a single item and returns the prediction.
TabularDataLoaders: A specific API for tabular analysis.
fit_one_cycle: Similar to fine_tune, but trains from scratch (no pretrained weights), e.g., for tabular models.
CollabDataLoaders: For building recommendation systems
collab_learner: Learner for CollabDataLoaders
learn.show_results: Shows a few predictions alongside the actual targets.
resnet18/34: Computer vision models from the Residual Network (ResNet) family, commonly used for image classification; 18/34 layers with trainable weights (mainly convolutional layers, batch normalization, and ReLU activations).
convnext: Another, more accurate, vision model.
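A minimal sketch of how these pieces fit together, using the fastai Pets sample dataset and a resnet34; the label function (cat filenames start with an uppercase letter in that dataset) is specific to this example.

```python
from fastai.vision.all import *

path = untar_data(URLs.PETS)/'images'                 # sample dataset bundled with fastai

def is_cat(p): return p.name[0].isupper()             # Pets dataset: cat files start uppercase

dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),               # inputs are images, targets are categories
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),  # 20% held out as the validation set
    get_y=is_cat,
    item_tfms=Resize(224))
dls = dblock.dataloaders(path)                        # DataLoaders: batches for training/validation

learn = vision_learner(dls, resnet34, metrics=error_rate)  # pretrained resnet34
learn.fine_tune(1)                                    # adjust the weights to this dataset
learn.predict(get_image_files(path)[0])               # prediction for a single image
learn.show_results()                                  # a few predictions vs. actual labels
```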
Jupyter Notebook, Kaggle (cloud server for running notebooks and creating/exporting model files, e.g., .pkl files), Paperspace (better alternative to Kaggle), HuggingFace Spaces (to deploy and share models).
HuggingFace Transformers: An NLP library.
NLP: Text must be tokenized (split into tokens) and numericalized (tokens mapped to integers), using the tokenizer that matches the pre-trained model.
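A minimal sketch of tokenization and numericalization with HuggingFace Transformers; the checkpoint name is just an example.

```python
from transformers import AutoTokenizer

# The tokenizer must match the pretrained model you intend to fine-tune.
tok = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")

text = "Machine learning is fun"
print(tok.tokenize(text))        # tokenization: text split into (sub)word tokens
print(tok(text)["input_ids"])    # numericalization: tokens mapped to integer ids
```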
Avoid overfitting or underfitting the model to the dataset, and choose the validation set wisely. Cross-validation evaluates a model (and helps detect overfitting) by training it on subsets of the dataset and evaluating each on the complementary subset; k-fold cross-validation is the most common form.
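A sketch of k-fold cross-validation with scikit-learn; the dataset and model here are placeholders.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

X, y = load_diabetes(return_X_y=True)
model = RandomForestRegressor(n_estimators=100, random_state=0)

# 5-fold CV: each fold trains on 4/5 of the rows and validates on the remaining 1/5.
scores = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(scores, scores.mean())
```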
Pearson correlation coefficient: Measures the degree of linear relationship between the predicted and actual variables; the value ranges from -1 to 1. Values at or close to zero indicate no linear relationship or a very weak correlation.
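A toy check of the Pearson coefficient with NumPy; the arrays are made-up predicted/actual values.

```python
import numpy as np

actual    = np.array([3.0, 1.5, 4.2, 2.8, 5.0])
predicted = np.array([2.9, 1.7, 4.0, 3.1, 4.8])

r = np.corrcoef(actual, predicted)[0, 1]   # ranges from -1 to 1; near 0 => weak linear relationship
print(r)
```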
In the last step of binary classification, pass the model's raw output through a sigmoid function (a non-linear activation) to squash it into a value between 0 and 1, which matches the 0/1 targets and improves the model's accuracy.
Activation Functions: Introduce non-linearity into a neural net to learn complex patterns and relationships. Without them, NNs behave like simple linear regression models, unable to capture non-linear patterns in data.
Use ReLU (or Leaky ReLU) in hidden layers.
Use Sigmoid for binary classification output.
Use Softmax for multi-class classification output.
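A quick PyTorch sketch of the three activations above, applied to toy logits.

```python
import torch

x = torch.tensor([-2.0, 0.0, 3.0])   # raw model outputs (logits)

print(torch.relu(x))                 # hidden layers: negatives clamped to 0
print(torch.sigmoid(x))              # binary output: each value squashed into (0, 1)
print(torch.softmax(x, dim=0))       # multi-class output: values form a probability distribution (sum to 1)
```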
Random Forest: An ensemble of decision trees; each tree is a sequence of binary splits (each split divides the rows into two groups based on one column's value).
Bagging (Bootstrap Aggregating): Randomly sample a proportion of the rows (50%, 75%, etc.) for each subset, and build a separate decision tree on each subset of the samples.
Random forest also chooses a random subset of columns for each decision tree.
Take the mean/average of the predictions made by all the decision trees (e.g., the predictions from 100 decision trees). Using about 100 decision trees in a random forest is a common rule of thumb.
Out-of-bag (OOB) error: If each decision tree was trained on 75% of the samples, the remaining 25% (not seen during training) can be used as a validation set for that tree. Measuring the error of the trees on those held-out samples gives the out-of-bag error. scikit-learn has built-in support for this (oob_score=True).
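A sketch of the OOB error with scikit-learn; the dataset and hyperparameters are just examples.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor

X, y = fetch_california_housing(return_X_y=True)

# Each tree is fit on a 75% bootstrap sample; oob_score=True evaluates it on the rows it never saw.
rf = RandomForestRegressor(n_estimators=100, max_samples=0.75,
                           oob_score=True, random_state=0).fit(X, y)
print(rf.oob_score_)   # R^2 measured on the out-of-bag samples
```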
Random forest advantages:
Requires little preprocessing of data.
Robust. Hard to mess up the model.
Lower chance of overfitting.
Provides insight into which columns are the strongest predictors and which can be ignored; useful for datasets with very large numbers of columns.
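A sketch of reading feature importances from a fitted random forest in scikit-learn; the wine dataset is only an example.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

data = load_wine()
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(data.data, data.target)

imp = pd.Series(rf.feature_importances_, index=data.feature_names).sort_values(ascending=False)
print(imp)   # strongest predictors first; columns with near-zero importance can often be dropped
```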
Gradient Boosting: An alternative tree ensemble that builds trees sequentially, each one fitted to the errors of the previous trees; often more accurate than a random forest but easier to overfit.
In NNs, we mostly care about tweaking the first or the last layer (e.g., when adapting a pretrained model, the inner layers are usually kept as-is).
Lesson 7: PyTorch has methods for checking GPU memory utilization and for freeing it (garbage collection).
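A sketch of the kind of PyTorch calls meant here (checking GPU memory and freeing it); treat the exact calls as examples rather than the lesson's code.

```python
import gc
import torch

if torch.cuda.is_available():
    print(torch.cuda.memory_allocated())   # bytes currently used by tensors
    print(torch.cuda.memory_reserved())    # bytes held by PyTorch's caching allocator

    gc.collect()                            # run Python's garbage collector
    torch.cuda.empty_cache()                # release cached GPU memory back to the driver
```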
Cross-entropy loss: Used for multi-class and binary classification tasks. Used with softmax in multi-class classification. It measures the difference between the predicted probability distribution and the actual class labels.
Intuition Behind Cross-Entropy Loss:
If the model predicts the correct class with high confidence (probability close to 1), the loss is low.
If the model is wrong or uncertain, the loss is high.
The log function penalizes incorrect predictions more severely.
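A toy PyTorch illustration of that intuition; the logits are made up.

```python
import torch
import torch.nn.functional as F

target = torch.tensor([0])                         # the true class is index 0

confident_right = torch.tensor([[4.0, 0.0, 0.0]])  # logits strongly favoring class 0
confident_wrong = torch.tensor([[0.0, 4.0, 0.0]])  # logits strongly favoring class 1

# cross_entropy applies softmax internally, then takes -log(prob of the true class).
print(F.cross_entropy(confident_right, target))    # low loss
print(F.cross_entropy(confident_wrong, target))    # high loss: the log penalizes it heavily
```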
Stochastic Gradient Descent (SGD): Optimizes the parameters by repeatedly nudging them in the direction that reduces the loss, one mini-batch at a time.
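A minimal sketch of one SGD update in PyTorch; the model and data are placeholders.

```python
import torch
from torch import nn

model = nn.Linear(3, 1)                          # tiny stand-in model
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(8, 3), torch.randn(8, 1)      # one mini-batch of fake data

loss = nn.functional.mse_loss(model(x), y)       # measure how wrong the predictions are
loss.backward()                                  # compute gradients of the loss w.r.t. the parameters
opt.step()                                       # update: p <- p - lr * p.grad
opt.zero_grad()                                  # reset gradients before the next mini-batch
```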