Jeremy Howard
The video covers a range of topics related to convolutional neural networks (CNNs). The instructor explains what convolutions are and how they convey structural information to the model. They walk through performing convolutions with sliding windows and the "im2col" trick, which turns a convolution into a matrix multiplication for faster computation on GPUs. The video also covers stride and padding, reducing dimensionality with max-pooling or stride-2 convolutions, and the importance of receptive fields. The instructor demonstrates how to count a model's parameters in PyTorch and even shows how to perform CNN operations in Microsoft Excel. Finally, they use the Fashion-MNIST dataset to build a classifier with a sequential model, along with a collate function and a data loader function.
In this section, the instructor talks about convolutions in neural networks and how they allow us to convey structural information to the model, making it easier to solve problems. Convolutions are particularly useful for image processing tasks as they can capture the relationships between pixels in an image. The instructor also introduces a new library called miniai, which they are developing using the nbdev tool. The library contains useful tools for deep learning and can be installed locally as a Python module.
In this section, the instructor discusses convolutions in deep learning and walks through how the process works. A convolution is demonstrated on the MNIST dataset using a 3 by 3 kernel, overlaying the kernel on the first 3 by 3 sub-grid of the image so that each coefficient is applied to its corresponding pixel. The process involves flattening the rank-2 tensors into vectors and taking a dot product as the kernel slides over the grid, producing the convolution. The instructor also creates a rank-2 tensor kernel with specific values for testing purposes.
In this section of the video, the instructor explains how sliding a 3x3 kernel over a 28x28 image can detect certain features such as edges. By multiplying the kernel element-wise with each 3x3 section of the image and summing the result, we end up with a positive or negative number that indicates the presence of an edge in a certain direction. This process can be applied to every 3x3 section of the image by sliding the kernel over it. The instructor demonstrates this process using a list comprehension, creating a tuple containing the coordinates of each 3x3 section and then applying the kernel to each of them using a custom function called apply_kernel().
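The sliding-window step described above can be sketched as follows. This is a minimal illustration, not the exact course code: the toy 5x5 image and the edge-kernel values are assumptions, and apply_kernel follows the shape described in the video.

```python
import torch

# A 3x3 top-edge kernel: negative weights above, positive weights below,
# so the sum is large where dark pixels sit above bright ones
top_edge = torch.tensor([[-1., -1., -1.],
                         [ 0.,  0.,  0.],
                         [ 1.,  1.,  1.]])

def apply_kernel(row, col, kernel, img):
    # Element-wise multiply the kernel with the 3x3 patch centred at (row, col), then sum
    return (img[row-1:row+2, col-1:col+2] * kernel).sum()

# Toy 5x5 "image": dark top half, bright bottom half
img = torch.zeros(5, 5)
img[3:, :] = 1.

# Slide the kernel over every valid 3x3 window with a list comprehension
out = torch.tensor([[apply_kernel(r, c, top_edge, img).item() for c in range(1, 4)]
                    for r in range(1, 4)])
```

The output is large exactly where the dark-to-bright horizontal edge sits, which is the edge-detection behaviour described above.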
In this section, the speaker explains how to perform convolutions on images and mentions that the first layer of a convolutional network often looks for edges and gradients. To compute convolutions quickly, a method of converting a convolution into a matrix multiplication, called im2col, was developed. The speaker notes that the technique has been accidentally reinvented more than once, since it had been around for a while before being described in a paper. The im2col technique is what made GPU acceleration practical for deep learning and is still widely used today.
In this section of the video, the presenter explains how to perform a convolution operation using matrix multiplication rather than sliding window. This is known as the "im2col" trick, where instead of matching a small window to the input, the input and kernel are flattened into matrices and multiplied together. The presenter provides a NumPy implementation of the trick and explains how to implement it in PyTorch using the "unfold" function. This results in a GPU-optimized implementation that is much faster than the sliding window approach, as shown by benchmark tests.
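A minimal sketch of the im2col idea using PyTorch's unfold, assuming a single-channel 28x28 input and one 3x3 kernel (the variable names are illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 28, 28)   # one single-channel image
k = torch.randn(1, 1, 3, 3)     # one 3x3 kernel

# im2col: unfold lays out every 3x3 patch as a column -> shape (1, 9, 26*26)
cols = F.unfold(x, kernel_size=3)

# Flatten the kernel to a row vector, matrix-multiply, reshape back to a 26x26 grid
out = (k.view(1, 9) @ cols).view(1, 1, 26, 26)

# The result matches PyTorch's built-in convolution
expected = F.conv2d(x, k)
```

The matrix multiplication is a single dense operation, which is what makes the GPU-optimized path so much faster than an explicit sliding window.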
In this section, the author explains how PyTorch's conv2d() can be used, and that it provides the same acceleration as executing unfold() followed by a matrix multiplication. They also explain how padding avoids losing pixels at the edges, and why odd-sized kernels are preferred over even-sized ones: for a ks-by-ks kernel with ks odd, a padding of ks//2 (integer division, which truncates) keeps the output the same size as the input. Finally, they note that the kernel can be made to jump more than one pixel at a time, which leads into the idea of stride.
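The ks//2 padding rule can be checked directly; this is a minimal sketch with a random image and kernel:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 28, 28)
k = torch.randn(1, 1, 5, 5)   # odd kernel size, ks = 5

# With padding = ks//2, the output grid stays the same size as the input
out = F.conv2d(x, k, padding=5 // 2)
```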
In this section, the video explains how stride and padding work in convolutional neural networks (CNNs). Stride is the number of pixels by which the kernel moves across the input image in each row and column; applying a stride-2 convolution halves the height and width of the input, which is useful for many applications. To create a CNN, we can use a helper function that performs a Conv2d with a stride of 2, optionally followed by an activation. The number of filters is the number of output channels the convolution will have, and we can add a comment after each layer to remind ourselves of the grid size at that point. Finally, we can train a CNN that outputs 10 probabilities, one per possible digit, using a training set, a validation set, and GPU acceleration.
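A helper along these lines, modelled on the description above (the exact layer widths are illustrative), builds a stride-2 conv with an optional activation and stacks them until the 28x28 grid reaches 1x1:

```python
import torch
from torch import nn

def conv(ni, nf, ks=3, stride=2, act=True):
    # Stride-2 halves the grid; ks//2 padding preserves "same" geometry for odd ks
    res = nn.Conv2d(ni, nf, kernel_size=ks, stride=stride, padding=ks // 2)
    if act:
        res = nn.Sequential(res, nn.ReLU())
    return res

simple_cnn = nn.Sequential(
    conv(1, 8),               # 14x14
    conv(8, 16),              # 7x7
    conv(16, 32),             # 4x4
    conv(32, 64),             # 2x2
    conv(64, 10, act=False),  # 1x1
    nn.Flatten(),
)

x = torch.randn(16, 1, 28, 28)   # a batch of MNIST-sized images
out = simple_cnn(x)              # one score per digit class
```

The trailing comments record the grid size after each layer, following the convention mentioned above.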
In this section, the speaker discusses device options for running deep learning models, including using MPS on an Apple Silicon Mac or CUDA on an Nvidia GPU. They also provide a function to move tensors to a specified device and create a custom collate function to train a convolutional neural network (CNN) on the GPU. With minimal code changes from the previous MLP model, the speaker is able to achieve similar accuracy with the CNN. They also mention the NCHW axes convention and the growing support for channels-last in PyTorch.
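A sketch of the device handling described here; the names to_device and collate_device are assumptions about the implementation, and the real course code may differ:

```python
import torch
from torch.utils.data.dataloader import default_collate

# Prefer MPS on Apple Silicon, then CUDA, then fall back to CPU
def_device = ('mps' if torch.backends.mps.is_available()
              else 'cuda' if torch.cuda.is_available() else 'cpu')

def to_device(x, device=def_device):
    # Recursively move a tensor, or a dict/list/tuple of tensors, to the device
    if isinstance(x, torch.Tensor):
        return x.to(device)
    if isinstance(x, dict):
        return {k: to_device(v, device) for k, v in x.items()}
    return type(x)(to_device(o, device) for o in x)

def collate_device(b):
    # Standard collation, then move the whole batch to the default device
    return to_device(default_collate(b))

batch = [(torch.zeros(2), 0), (torch.ones(2), 1)]
xb, yb = collate_device(batch)
```

Doing the device move inside the collate function keeps the training loop itself unchanged from the earlier MLP version.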
In this section, the speaker shows how to automatically calculate the number of parameters in a model, covering both weights and biases. Using a list comprehension and PyTorch, they compute the parameter counts for an MLP and a simple CNN. They also address the misconception that convolutional neural networks can handle any input size, explaining that this particular CNN only works for images whose grid reduces to 1x1 after passing through the stride-2 convolutions. Finally, they deconstruct the parameters inside a convolution and show an Excel workbook with a receptive field calculation.
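Counting parameters with a comprehension might look like this; the two-layer model here is a stand-in, not the model from the video:

```python
import torch
from torch import nn

# A stand-in two-layer CNN for illustration
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1),   # 8*1*3*3 weights + 8 biases
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1),  # 16*8*3*3 weights + 16 biases
)

# Sum the element counts of every parameter tensor (weights and biases alike)
n_params = sum(o.numel() for o in model.parameters())
# 72 + 8 + 1152 + 16 = 1248
```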
In this section of the video, the instructor demonstrates how to perform convolutional neural network (CNN) operations using Microsoft Excel. He shows how he created a top-edge filter and a left-edge filter, using conditional formatting to visualize the MNIST pixel values he copied into Excel. He also demonstrates that Excel supports simple broadcasting. He notes that when building a CNN in Excel, one needs a set of 3x3 filters for each input channel, which makes the weights a rank-4 tensor.
In this section, the instructor discusses two options for reducing dimensionality in a convolutional neural network (CNN): max-pooling and stride-2 convolution. Both have the same effect of halving the grid size in each dimension. To collapse the final grid to a single output, one approach is to apply a dense layer to the flattened 14x14 grid, but the more common approach now is global average pooling. The receptive field, the region of the input that affects the output of a specific neuron, is also covered. The receptive field of a neuron at the end of a deep network is very large, and inputs near the middle of the receptive field have the greatest impact on the output.
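Global average pooling can be sketched as follows: each channel is averaged over its spatial grid, leaving one value per channel regardless of the grid size (the tensor shapes here are illustrative):

```python
import torch
from torch import nn

# A batch of 7x7 feature maps with 32 channels
x = torch.randn(16, 32, 7, 7)

# Global average pooling: average each channel over its spatial grid
pooled = x.mean(dim=(-2, -1))              # -> (16, 32)

# The same result via AdaptiveAvgPool2d(1), as commonly used in model heads
pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten())
```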
In this section, the instructor takes a break after discussing the importance of receptive fields in convolutional neural networks, then switches to a different dataset, Fashion-MNIST. The instructor creates a classifier using a sequential model and moves the CNN's parameters over to the device. They also create a collate function and a data-loader function that builds a dictionary of data loaders, from which an x and y batch can be retrieved by calling next() on an iterator. Finally, the instructor displays a Fashion-MNIST mini-batch using the show_images function created earlier.
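A dictionary of data loaders along the lines described might look like this; the in-memory random tensors are a stand-in for Fashion-MNIST, and the function name dls is an assumption:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def dls(train_ds, valid_ds, bs, **kwargs):
    # Dictionary of data loaders; only the training set is shuffled
    return {'train': DataLoader(train_ds, batch_size=bs, shuffle=True, **kwargs),
            'valid': DataLoader(valid_ds, batch_size=bs * 2, **kwargs)}

# Stand-in data with Fashion-MNIST-like shapes (28x28, 10 classes)
train = TensorDataset(torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,)))
valid = TensorDataset(torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,)))

loaders = dls(train, valid, bs=16)
xb, yb = next(iter(loaders['train']))   # grab one x, y mini-batch
```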
In this section, the instructor explains why decoding the PNG images in the dataset slows down the DataLoader during training. Unlike the MNIST data used earlier, the Hugging Face dataset is not pre-converted into a single tensor; each image is a separate PNG that must be decoded. Passing the num_workers argument to the data loaders can speed this up, but it leads to an error when items are put onto the GPU in separate worker processes, and the only fix would be to rewrite the fit function entirely, which is not ideal. The model's accuracy is also low, and the instructor suggests using paperswithcode to compare it against similar models.
In this section, the speaker discusses creating an autoencoder and how to make it efficient. They use a stride-2 convolution to halve the size of the input, optionally adding extra channels, and nearest-neighbour upsampling followed by a convolution as a simple way to double the grid size. The autoencoder uses a "deconv" layer built from an upsampling layer and a convolution, with an optional activation, assembled into a sequential model from layers passed as separate arguments.
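A sketch of such a "deconv" layer, assuming nearest-neighbour upsampling followed by a padded stride-1 convolution (the name deconv and the channel counts below are illustrative):

```python
import torch
from torch import nn

def deconv(ni, nf, ks=3, act=True):
    # Double the grid with nearest-neighbour upsampling, then convolve;
    # ks//2 padding keeps the doubled grid size
    layers = [nn.UpsamplingNearest2d(scale_factor=2),
              nn.Conv2d(ni, nf, kernel_size=ks, padding=ks // 2)]
    if act:
        layers.append(nn.ReLU())
    return nn.Sequential(*layers)

# A 7x7 feature map is doubled to 14x14 while changing 4 channels to 2
x = torch.randn(8, 4, 7, 7)
out = deconv(4, 2)(x)
```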
In this section, the presenter walks through creating a fit function for an autoencoder that compresses an image by a factor of eight. They use mse_loss to compare the pixels of the input and output images, but have to write their own fit function. When they run it, the autoencoder unfortunately cannot recreate the original image well. The presenter highlights how difficult it is to get the autoencoder working and suggests possible improvements such as a better optimizer, a better architecture, or a variational autoencoder.
In this section, the instructor discusses the need for a framework that enables rapid testing and iteration of different ideas, and introduces the concept of a Learner to achieve this. The Learner is built to allow fast experimentation, introspection of the model, multi-process CUDA, data augmentation, trying different architectures, and more. They also introduce a DataLoaders class to hold the data and a fit function to train the model, using fastcore's store_attr() to reduce boilerplate. The Learner stores all the necessary information in its constructor and calls a one_epoch function for each epoch to train or evaluate the model.
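A minimal Learner along these lines might look like the following sketch. It is simplified relative to the course's version: store_attr() is replaced by plain assignments, and the data loaders are assumed to be a dict with 'train' and 'valid' keys.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

class Learner:
    def __init__(self, model, dls, loss_func, lr, opt_func=torch.optim.SGD):
        # fastcore's store_attr() would collapse this into one line
        self.model, self.dls, self.loss_func, self.lr, self.opt_func = \
            model, dls, loss_func, lr, opt_func

    def fit(self, n_epochs):
        self.opt = self.opt_func(self.model.parameters(), lr=self.lr)
        for epoch in range(n_epochs):
            self.one_epoch(train=True)
            with torch.no_grad():
                self.one_epoch(train=False)

    def one_epoch(self, train):
        self.model.train(train)
        for xb, yb in self.dls['train' if train else 'valid']:
            loss = self.loss_func(self.model(xb), yb)
            if train:
                loss.backward()
                self.opt.step()
                self.opt.zero_grad()

# Tiny smoke test: learn y = 2x with a linear model
torch.manual_seed(0)
x = torch.randn(64, 1)
y = 2 * x
ds = TensorDataset(x, y)
dls = {'train': DataLoader(ds, batch_size=16, shuffle=True),
       'valid': DataLoader(ds, batch_size=32)}
learn = Learner(nn.Linear(1, 1), dls, nn.functional.mse_loss, lr=0.1)
learn.fit(10)
```

Keeping all the state on the Learner is what later allows callbacks to inspect and modify any part of training.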
In this section, the video instructor explains the basics of the Learner class created in the previous sections, which handles training the model and computing loss functions and metrics. The instructor introduces a Metric base class; specific metrics subclass it to calculate a particular quantity, which lets the Learner be customized for tasks beyond multi-class or binary classification. The code for a basic Metric object, used for loss, is also provided. The instructor demonstrates feeding mini-batches into an Accuracy metric object, which accumulates them to compute the model's accuracy; because .value is a property, the accuracy can be read without parentheses.
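A sketch of the Metric/Accuracy pattern described; the weighted-average bookkeeping over mini-batch sizes is an assumption about the implementation:

```python
import torch

class Metric:
    def __init__(self): self.reset()
    def reset(self): self.vals, self.ns = [], []

    def add(self, inp, targ=None, n=1):
        # Record this mini-batch's value and its size
        self.vals.append(self.calc(inp, targ))
        self.ns.append(n)

    @property
    def value(self):
        # Weighted average over the mini-batches seen so far;
        # a property, so it reads without parentheses
        return sum(v * n for v, n in zip(self.vals, self.ns)) / sum(self.ns)

    def calc(self, inp, targ):
        # Default: the "metric" is just the value passed in (used for loss)
        return inp

class Accuracy(Metric):
    def calc(self, inps, targs):
        return (inps == targs).float().mean()

acc = Accuracy()
acc.add(torch.tensor([0, 1, 2, 0, 1]), torch.tensor([0, 1, 1, 2, 1]), n=5)
acc.add(torch.tensor([1, 1]), torch.tensor([0, 1]), n=2)
```

The weighting by n matters because the last batch of an epoch is usually smaller than the rest.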
In this section, the instructor explains how the Metric class can be used to create any metric by overriding the calc() function or adding new methods. They have also changed the Learner class, adding a decorator that wraps a method such as fit() with callbacks: it stores the method's name (e.g. 'fit') and arranges for a special callback() method to be called before the wrapped function executes, so callbacks can perform any extra functionality needed around the call. The Learner's fit() method now uses callbacks for work such as creating optimizers and looping through epochs, and callbacks can also be used to create charts and other visualizations.
In this section, the instructor explains the use of callbacks in PyTorch models, using the DeviceCB callback as an example. Callbacks are objects whose methods are called automatically while the fit() method runs. Callbacks are passed in as a parameter, and at each stage the learner's callback() method is invoked with the callback name; it iterates over all callbacks, sorted by their order attribute, calling the matching method on each. A try-except block catches control-flow exceptions so that fit() can be stopped gracefully. Finally, the instructor mentions the identity function that callback() falls back on when a callback does not define the requested method.
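The callback dispatch can be sketched like this. The names run_cbs, order, and CancelFitException mirror the description above but are simplified; here a getattr fallback to None stands in for the identity-function fallback mentioned:

```python
class Callback:
    order = 0  # callbacks run in ascending order of this attribute

class CancelFitException(Exception):
    # Raised by a callback to stop fit() gracefully
    pass

def run_cbs(cbs, method_nm, learn=None):
    # Call method_nm on each callback that defines it, sorted by order
    for cb in sorted(cbs, key=lambda o: o.order):
        method = getattr(cb, method_nm, None)
        if method is not None:
            method(learn)

class Logger(Callback):
    def __init__(self): self.events = []
    def before_fit(self, learn): self.events.append('before_fit')
    def after_fit(self, learn): self.events.append('after_fit')

cbs = [Logger()]
try:
    run_cbs(cbs, 'before_fit')
    # ... the training loop would run here ...
    run_cbs(cbs, 'after_fit')
except CancelFitException:
    pass
```

Because run_cbs silently skips undefined methods, each callback only implements the hooks it cares about.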
In this section, the speaker encourages learners to familiarize themselves with the different components of the framework such as try-except blocks, decorators, and getattr to reduce their cognitive load and facilitate their learning process. The speaker notes that the cognitive load theory asserts that learning could be made difficult if the load on one's cognitive abilities is high. Given the relative simplicity of the framework, the speaker believes that getting comfortable with these components will enable learners to write powerful and general code.