Jeremy Howard
In this lesson on deep learning, the instructor demonstrates how to create and use callbacks in fastai to tweak and monitor the training process. Any callback can raise one of three exceptions to cancel a batch, an epoch, or the whole fitting process, and the training loop catches each one at the matching level. The instructor creates several callbacks, including ones to calculate metrics, track progress, and modify the learning rate schedule. He also shows how to subclass the Learner class to create a more flexible version with custom functions and parameters. The video also covers momentum and learning rate schedulers, and how to diagnose problems during model training.
In this section, we start working in the 09_learner notebook with a basic Learner that has all the essential pieces but isn't very flexible: it can only calculate accuracy and average loss, and the device and learning rate are hard-coded to default values. This Learner has a fit method in which training and evaluation happen depending on a flag, and a one_batch method that grabs the x and y parts of each batch and accumulates the statistics for accuracy. We then move to the Basic Callbacks Learner, which has almost full functionality, and from there toward a flexible Learner, after creating some callbacks and metrics. We can run callbacks manually for before_fit, after_batch, and after_fit to increment a counter and print out how many batches were completed.
In this section, Jeremy Howard explains how callbacks work in fastai by going through the process of creating and testing one. A callback is simply a class that includes one or more of the methods before_fit, after_fit, before_epoch, after_epoch, before_batch, and after_batch. Howard experimentally calls the different methods to show how they work. Try-except blocks in the training loop handle three types of exceptions: CancelFitException, CancelEpochException, and CancelBatchException. Any callback can raise one of these three exceptions to indicate that the current batch, epoch, or the whole fit should be skipped. Finally, Howard creates a single-batch callback that raises a CancelFitException after the first batch, which makes it easy to test the training loop quickly.
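The exception-based control flow described above can be sketched in plain Python. The names `run_cbs`, `fit_one_batch`, and `SingleBatchCB` follow the notebook's naming style, but the exact signatures here are assumptions, not the lesson's verbatim code:

```python
class CancelFitException(Exception): pass
class CancelEpochException(Exception): pass
class CancelBatchException(Exception): pass

def run_cbs(cbs, method_nm):
    # Call `method_nm` (e.g. "before_fit") on each callback that defines it,
    # in order of each callback's `order` attribute (default 0).
    for cb in sorted(cbs, key=lambda c: getattr(c, "order", 0)):
        method = getattr(cb, method_nm, None)
        if method is not None:
            method()

class SingleBatchCB:
    order = 1
    def after_batch(self):
        # Stop the whole fit after the first batch has run.
        raise CancelFitException()

def fit_one_batch(cbs):
    # The training loop wraps each level in try/except for the matching
    # exception; CancelBatchException skips only the rest of this batch,
    # while CancelFitException propagates up and ends the whole fit.
    try:
        run_cbs(cbs, "before_batch")
        # ... forward pass, loss, backward, and step would happen here ...
        run_cbs(cbs, "after_batch")
    except CancelBatchException:
        pass
```

Because `fit_one_batch` only catches `CancelBatchException`, the `CancelFitException` raised by `SingleBatchCB` escapes the batch loop, which is exactly how a callback can end fitting early.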
In this section, the instructor adds the single-batch callback to the list of previously created callbacks and finds that it cancels the fit before the metrics callback can execute. The callback can be made to run later by setting its order attribute to a higher number, since callbacks run sorted by order. The instructor then creates a Metric class to keep track of and calculate metrics like accuracy and loss: it stores per-batch values and batch sizes, and accuracy is computed as the mean of the inputs that equal the targets. Finally, the instructor creates a device callback to handle the GPU device being used.
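A minimal sketch of such a Metric class, in plain Python rather than on tensors; the weighted-mean accumulation over batch sizes matches the idea described above, though the exact method names are assumptions:

```python
class Metric:
    def __init__(self): self.reset()
    def reset(self): self.vals, self.ns = [], []
    def add(self, inp, targ=None, n=1):
        # Store the per-batch value and the batch size so the final
        # result is a batch-size-weighted mean.
        self.vals.append(self.calc(inp, targ))
        self.ns.append(n)
    @property
    def value(self):
        return sum(v * n for v, n in zip(self.vals, self.ns)) / sum(self.ns)
    def calc(self, inp, targ):
        # Base class just tracks a raw value (e.g. the loss).
        return inp

class Accuracy(Metric):
    def calc(self, inp, targ):
        # Fraction of predictions equal to targets in this batch.
        return sum(1 for i, t in zip(inp, targ) if i == t) / len(inp)
```

For example, a batch with accuracy 2/3 over 3 items followed by a batch with accuracy 1 over 1 item gives a weighted value of 0.75, not the unweighted mean of the two batch accuracies.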
In this section, the speaker explains how to use CUDA or an Apple GPU without complications when the DataLoader uses multiple worker processes: instead of moving data to the device inside the dataset, a callback moves the model and each batch to the device inside the Learner, and a constructor parameter gives users the flexibility to train on different devices. The speaker also introduces torcheval, an official PyTorch project that provides ready-made metric classes. After installing it with pip, he imports the MulticlassAccuracy and Mean metrics. Finally, the speaker creates a metrics callback to print out the metrics.
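A hedged sketch of such a device callback. It assumes a Learner exposing `model` and `batch` attributes, and passes the learner in explicitly for simplicity (in the lesson's Learner, callbacks reach it through an attribute instead):

```python
import torch

def to_device(x, device):
    # Recursively move a tensor, or a tuple/list of tensors, to `device`.
    if isinstance(x, torch.Tensor):
        return x.to(device)
    return type(x)(to_device(o, device) for o in x)

class DeviceCB:
    def __init__(self, device=None):
        # Default to CUDA when available, otherwise stay on the CPU.
        if device is None:
            device = "cuda" if torch.cuda.is_available() else "cpu"
        self.device = device
    def before_fit(self, learn):
        learn.model.to(self.device)
    def before_batch(self, learn):
        learn.batch = to_device(learn.batch, self.device)
```

Doing the move per-batch in a callback, rather than inside the dataset, keeps the worker processes of the DataLoader free of device handles.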
In this section, the instructor explains how to create a metrics callback that tracks and prints accuracy and loss for the training and validation sets during model fitting. A little function called to_cpu detaches a tensor, removing the gradient and computation history used to calculate gradients, and puts it on the CPU. The instructor uses **kwargs as a shortcut so a metric object can be passed in without explicitly writing accuracy=..., in which case the class name is used as the metric name. The metrics callback is then passed in as one of the callbacks; it resets the metrics before each epoch, updates them after each batch, and builds a dictionary of names and values to log after each epoch. The instructor also mentions that the way the metrics are displayed can be improved. This learner is of intermediate complexity and is still designed to fit on a single screen of code.
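A small sketch of what such a `to_cpu` helper can look like; the recursion into containers is an assumption about the general shape, not the lesson's exact code:

```python
import torch

def to_cpu(x):
    # Detach from the autograd graph (dropping gradient/computation
    # history) and move to the CPU; recurse into common containers.
    if isinstance(x, dict):
        return {k: to_cpu(v) for k, v in x.items()}
    if isinstance(x, (list, tuple)):
        return type(x)(to_cpu(o) for o in x)
    return x.detach().cpu()
```

Detaching before accumulating metrics matters: otherwise every stored batch would keep its whole computation graph alive in memory.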
In this section, the speaker explains how he has made the code much cleaner and shorter by using a context manager: a 'with' statement runs both the before_ and after_ callbacks around each level of the loop, inside a try-except block, so before_fit, before_epoch, before_batch and their after_ counterparts all get called at the right times. The loop body has also been changed to call self.predict, self.get_loss, self.backward, and similar methods rather than hard-coding those steps, which reduces the burden of maintaining the code and gives more flexibility. Additionally, the speaker shows how he looks up the appropriate Cancel...Exception class in Python's globals() dictionary instead of writing each one explicitly in the code.
In this section, the video discusses how a training callback can access the Learner via self.learn, which allows it to modify how each training step is carried out. The callback can access self.learn.preds, self.learn.model, self.learn.batch, and the independent and dependent variables. The progress callback is also introduced, which displays the current loss in real time and draws a graph of it. Additionally, the video explains how classes can take over how metrics are displayed, enabling users to change the display or export metrics to other services.
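A sketch of a training callback that supplies the core steps of the loop. The method names mirror the self.predict / self.get_loss / self.backward calls mentioned above; passing `learn` as an argument (rather than reading it from a self.learn attribute, as described in the lesson) is a simplification for this standalone example:

```python
import torch
import torch.nn as nn

class TrainCB:
    """Provide the five core steps of one training batch, so the
    Learner itself stays free of hard-coded training logic."""
    def predict(self, learn):
        learn.preds = learn.model(learn.batch[0])
    def get_loss(self, learn):
        learn.loss = learn.loss_func(learn.preds, learn.batch[1])
    def backward(self, learn):
        learn.loss.backward()
    def step(self, learn):
        learn.opt.step()
    def zero_grad(self, learn):
        learn.opt.zero_grad()
```

Because each step is a callback method, replacing any one of them (for example `predict`, for a model whose inputs aren't simply `batch[0]`) no longer requires touching the Learner.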
In this section, the instructor shows how to create a more flexible Learner by subclassing it. By subclassing, you can define the necessary functions directly in the Learner subclass, making them available without going through getattr. To demonstrate, the instructor changes the zero_grad function to multiply the gradients by a number instead of zeroing them. Because PyTorch accumulates gradients, this decays the previous gradients by some amount rather than discarding them, so the usual zeroing becomes unnecessary. The multiplier is defined as a parameter, so you can easily vary the amount of decay.
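A minimal sketch of the trick, shown directly on parameters rather than inside a full Learner subclass (the class name here is invented for illustration):

```python
import torch

class MomentumDemo:
    """Instead of zeroing gradients after each step, scale them down.
    Since PyTorch *accumulates* gradients, the next backward() adds the
    new gradient on top of the decayed old one, which is momentum
    without a separate moving-average buffer."""
    def __init__(self, params, mom=0.85):
        self.params, self.mom = list(params), mom
    def zero_grad(self):
        with torch.no_grad():
            for p in self.params:
                if p.grad is not None:
                    p.grad *= self.mom  # decay instead of zeroing
```

With mom=0.5 and a constant per-step gradient of 3, after "zeroing" once the accumulated gradient is 3*0.5 + 3 = 4.5 rather than 3.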
In this section, the instructor explains the concept of momentum, which is used in deep learning optimization methods. Normally, momentum is implemented by storing a complete copy of an exponentially weighted moving average of all the gradients. However, the instructor uses a trick: the .grad tensors themselves store these averages, since scaling rather than zeroing them lets each new gradient accumulate on top of a decayed history. Additionally, a learning rate finder callback is explained, which helps determine a learning rate at which the loss doesn't keep increasing.
In this section, the speaker explains how the learning rate finder decides when to stop training by keeping track of the minimum loss: training stops once the loss rises to three times the minimum seen so far. Learning rates live in the optimizer's param_groups dictionaries in PyTorch, which is where the per-batch updates to the learning rate are made. The speaker also notes that such callbacks can be entirely self-contained, and points out how simple a learning rate scheduler really is.
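The stopping rule and the param_groups update can be sketched like this; the class name and the exact multiplier default are assumptions in the spirit of the lesson:

```python
import math
import torch

class CancelFitException(Exception): pass

class LRFinderCB:
    """Multiply the LR after every batch; record (lr, loss) pairs and
    stop once the loss exceeds three times the best loss seen so far."""
    def __init__(self, lr_mult=1.3):
        self.lr_mult = lr_mult
        self.lrs, self.losses = [], []
        self.min_loss = math.inf
    def after_batch(self, opt, loss):
        self.lrs.append(opt.param_groups[0]["lr"])
        self.losses.append(loss)
        self.min_loss = min(self.min_loss, loss)
        if loss > self.min_loss * 3:
            raise CancelFitException()  # ends the fit early
        for g in opt.param_groups:      # LRs live in param_groups dicts
            g["lr"] *= self.lr_mult
```

Plotting `lrs` against `losses` afterwards shows the usual learning-rate-finder curve, from which a good learning rate is read off just before the loss starts climbing.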
In this section, the instructor explains how a scheduler adjusts the learning rates, introducing PyTorch's ExponentialLR scheduler as an example, which multiplies the learning rate by a fixed factor at each step. The main difference in the new version of the callback is that the self.sched object is created in before_fit. PyTorch schedulers are not doing anything extraordinary; the same effect can be achieved with just one line of code updating the param_groups. The instructor also suggests renaming the plot method to after_fit so it runs automatically when learn.fit completes, letting the extra plotting call be deleted. Finally, the instructor moves on to identifying and diagnosing problems with models during training, starting with a set_seed function.
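A small standalone example of what ExponentialLR actually does (the optimizer and values here are illustrative; in the callback version, the scheduler would be created in before_fit and stepped after each batch):

```python
import torch
from torch.optim.lr_scheduler import ExponentialLR

p = torch.ones(1, requires_grad=True)
opt = torch.optim.SGD([p], lr=0.1)
# ExponentialLR multiplies the learning rate by `gamma` on every step().
sched = ExponentialLR(opt, gamma=0.9)

opt.step()    # optimizer step first (PyTorch warns if the order is reversed)
sched.step()  # lr: 0.1 -> 0.09
```

Internally this is just `g["lr"] *= gamma` over the param_groups, which is why the instructor notes a scheduler can be replaced by one line of code.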
In this section, the instructor introduces a function that sets a reproducible seed in PyTorch and uses it to train a model on the Fashion MNIST dataset. He explains the benefits of training models quickly and at high learning rates, but notes that training can sometimes go awry. To investigate why, the instructor creates his own SequentialModel that records the means and standard deviations of each layer's activations and plots them, showing that in this case the activations started small, increased exponentially, and then crashed, leading to unstable training.
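A sketch of such a statistics-recording model; the attribute names are assumptions, but the idea of capturing each layer's output statistics inside forward matches the description above:

```python
import torch
import torch.nn as nn

class SequentialModel(nn.Module):
    """Sequential-like model that records each layer's activation mean
    and standard deviation on every forward pass, for later plotting."""
    def __init__(self, *layers):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.act_means = [[] for _ in layers]
        self.act_stds = [[] for _ in layers]
    def forward(self, x):
        for i, layer in enumerate(self.layers):
            x = layer(x)
            # detach() keeps the bookkeeping out of the autograd graph
            self.act_means[i].append(x.detach().mean().item())
            self.act_stds[i].append(x.detach().std().item())
        return x
```

Plotting `act_means[i]` over training batches per layer is exactly how the exponential-rise-then-crash pattern becomes visible.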
In this section, the importance of monitoring the activation means and standard deviations of a model during the training process is explained. If the activations are too close to zero, the model will not train properly, so it's important to keep an eye on the means and standard deviations throughout the training process. The section also introduces PyTorch hooks, a way to add a function to the execution of every forward pass or backward pass through a layer, making it easy to keep track of the activation means and standard deviations for each layer in the model. By using hooks, users can monitor their model's training progress without having to check it manually and update their model as necessary.
In this section, the instructor explains PyTorch hooks, which are less flexible than the callbacks used in the Learner because they have limited access to state and cannot change things. To simplify hook creation, the instructor creates a class called Hook and defines a remove function to ensure hooks are removed when no longer needed. A further simplification is a Hooks class, which makes it possible to add multiple hooks with just one extra line of code and is used as a context manager that can be looped through and indexed into. The instructor explains that this behavior comes from building the class using the most flexible, general way of creating context managers.
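A sketch of the Hook wrapper around PyTorch's register_forward_hook; the stats-collecting function here is illustrative:

```python
from functools import partial
import torch
import torch.nn as nn

def append_stats(hook, mod, inp, outp):
    # Forward-hook function: record the mean/std of this layer's output.
    if not hasattr(hook, "stats"):
        hook.stats = ([], [])
    means, stds = hook.stats
    means.append(outp.detach().mean().item())
    stds.append(outp.detach().std().item())

class Hook:
    """Wrap register_forward_hook so the hook carries its own state
    and is easy to clean up."""
    def __init__(self, mod, func):
        # partial binds this Hook as the first argument, so `func` can
        # stash results on the Hook object itself.
        self.hook = mod.register_forward_hook(partial(func, self))
    def remove(self):
        self.hook.remove()
    def __del__(self):
        self.remove()
```

Removing hooks matters: a forgotten forward hook keeps running (and keeps its closure alive) on every subsequent forward pass.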
In this section, the instructor goes over __enter__, __exit__, and __delitem__ in the context of subclassing Python's list class. The Hooks class, which inherits from list, is discussed, and the instructor demonstrates how it can be used to attach hooks to the layers of a PyTorch model. Finally, the concept of colorful dimension histograms is introduced as a tool for understanding a model's internal workings.
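A sketch of a Hooks class along these lines, bundled with a minimal Hook so the example is self-contained (the stats function is illustrative):

```python
from functools import partial
import torch
import torch.nn as nn

def append_stats(hook, mod, inp, outp):
    if not hasattr(hook, "stats"):
        hook.stats = ([], [])
    hook.stats[0].append(outp.detach().mean().item())
    hook.stats[1].append(outp.detach().std().item())

class Hook:
    def __init__(self, mod, func):
        self.hook = mod.register_forward_hook(partial(func, self))
    def remove(self):
        self.hook.remove()

class Hooks(list):
    """A list of Hook objects that is also a context manager, so all
    hooks are removed automatically when the `with` block exits."""
    def __init__(self, mods, func):
        super().__init__([Hook(m, func) for m in mods])
    def __enter__(self):
        return self
    def __exit__(self, *args):
        self.remove()
    def __delitem__(self, i):
        self[i].remove()           # unhook before forgetting it
        super().__delitem__(i)
    def remove(self):
        for h in self:
            h.remove()
```

Inheriting from list gives iteration and indexing for free; __enter__/__exit__ are the general, most flexible way to make any class usable in a `with` statement.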
In this section, the instructor explains how to create a histogram of the absolute values of the activations and turn each batch's histogram into a color-coded single column, stacking the columns over training to form an image. Applied to the first four layers, the plot shows that training follows the same pattern in each: the vast majority of values are close to 0, with only a few slightly bigger numbers.
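A sketch of the histogram computation behind such a plot; the function name, bin count, and value range are assumptions for illustration:

```python
import torch

def get_hist(acts_per_batch, bins=40, max_val=10.0):
    """Stack per-batch histograms of absolute activation values into a
    2D tensor of shape (bins, batches). Taking log1p of the counts makes
    the small counts in the upper bins visible when shown as an image."""
    hists = [
        torch.histc(a.detach().abs().float(), bins=bins, min=0.0, max=max_val)
        for a in acts_per_batch
    ]
    return torch.stack(hists, dim=1).log1p()
```

Shown with imshow, each column is one training batch and each row one magnitude bin; the bottom rows hold the near-zero activations, so the fraction of counts there is the "how dead is this layer" diagnostic discussed next.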
In this section, the speaker discusses the importance of monitoring the activations during model training. The goal is to see a nice even spread in which the number of activations gradually decreases at larger magnitudes, resembling a well-behaved distribution. If too many activations are inactive or nearly inactive, the model is not doing much work and will not improve. The speaker advises that if a rising-and-crashing pattern appears in the activations early in training, it is best to stop and restart, as the model will likely never fully recover. Overall, he emphasizes that monitoring the activations is key to understanding what is going on inside the model and to training it reliably and quickly.
In this section, the instructor concludes the class by mentioning that the next lesson will focus on important topics such as initialization in deep learning models. He advises students to be well-versed in concepts like standard deviations before diving into the next class. He thanks the viewers for joining him and expresses his excitement for the next lesson.