Build ML models using SageMaker Studio Notebooks - AWS Virtual Workshop

AWS-Online-Tech-Talks

The AWS Virtual Workshop on building ML models using SageMaker Studio Notebooks covers the essential capabilities of Amazon SageMaker Studio for building, training, and deploying machine learning models. SageMaker Studio provides pre-built images for popular machine learning frameworks, including MXNet, PyTorch, and TensorFlow. The workshop demonstrates how to explore data using SageMaker Studio Notebooks, install additional packages such as Plotly and Matplotlib into the kernel, and compute histograms and apply data transformations. It also covers SageMaker training jobs, experiments, and trials, and shows how to log and compare metrics by applying regexes to the container's standard output. The workshop concludes by touring other SageMaker Studio features, including SageMaker Projects, Pipelines, Experiments, Trials, the Model Registry, compilation jobs, and edge packaging for machine learning (ML) at the edge.

00:00:00

In this section, a senior product manager on the SageMaker team and a senior machine learning solutions architect present the learning objectives of the virtual workshop. The intermediate-level course showcases how to use SageMaker Studio Notebooks for building machine learning models, including installing and exploring popular open-source extensions that augment workflows within Studio Notebooks. Participants also learn how to track and manage training and data processing jobs and test machine learning model performance. The fully integrated development environment of SageMaker Studio eliminates months of writing custom integration code and provides purpose-built tools for every step of the machine learning development process, including data labeling, preparation, feature engineering, statistical bias detection, AutoML, training, tuning, hosting, explainability, monitoring, and workflows. The section also highlights the feature sets within SageMaker Studio that help with data preparation, such as Data Wrangler, SageMaker Processing, and Amazon SageMaker Feature Store. Additionally, Studio Notebooks provide essential capabilities for building machine learning models, including over 20 built-in and optimized algorithms, over 300 pre-built models, and 15 pre-built solution templates for different scenarios.

00:05:00

In this section, the capabilities of Amazon SageMaker Studio for building and training machine learning models are explained. SageMaker Studio has over 20 built-in algorithms and pre-built container images available for quickly training models and running inference. It also offers SageMaker JumpStart, with pre-built solutions and open-source models, to get started quickly. AutoML via SageMaker Autopilot is also available for automatically building and tuning machine learning models from data while maintaining full control. The key capabilities within SageMaker Studio for training machine learning models are experiment management and model tuning, with the ability to track iterations and use SageMaker Debugger for real-time performance monitoring. Finally, SageMaker makes it easy to deploy and manage machine learning models as a fully managed service, offering the scalability and cost-effectiveness to deploy thousands of models on a single endpoint.

00:10:00

In this section, the capabilities of SageMaker Studio Notebooks are presented, which are crucial to powering and enhancing the machine learning experience. One can start a notebook without waiting for an instance to spin up and can customize the environment to meet enterprise needs by bringing one's own images, packages, and extensions and by automating customizations with lifecycle configurations. The integrated debugger and language server extension provide modern IDE functionality such as tab completion, syntax highlighting, and variable renaming. Sharing notebooks with co-workers is effortless, and the latest open-source Jupyter extensions are supported. A step-by-step demonstration by Sean shows how to use SageMaker Studio Notebooks to boost productivity.

00:15:00

In this section, the speaker introduces the concept of lifecycle configurations in SageMaker Studio, which allow administrators or end users to run scripts when the Jupyter server or a kernel launches. She explains that a lifecycle configuration has been set up for the demo that pre-installs Jupyter extensions onto the users' SageMaker Studio domain, highlighting the customizability of SageMaker Studio and the launch of JupyterLab 3. The speaker then navigates through the SageMaker Studio landing page and shows viewers how to access the extension manager tab and the user's home directory, where they can place code repositories or datasets specific to their user. She concludes the section by providing a link to a lightweight code repository used in the demonstration.

00:20:00

In this section, the presenter goes through the steps for setting up an IPython notebook in SageMaker Studio and selecting the correct kernel and instance type. The presenter demonstrates how the IDE offers pre-built, optimized images for popular machine learning frameworks such as MXNet, PyTorch, and TensorFlow, making it easy to develop ML models. Additionally, he runs through some of the IDE's features, such as checking CPU and memory utilization and installing additional JupyterLab extensions from the terminal.

00:25:00

In this section of the workshop, the instructor shows that extensions are useful not only for customizing the JupyterLab interface but also for improving code formatting and standardization. He installs Jupyter extensions and libraries into a specific Conda environment, including the JupyterLab LSP (language server) and spell-checker extensions, as well as the JupyterLab code formatter and the Black library. These tools can help teams standardize their notebooks and Python files by automatically formatting them to meet certain specifications. The instructor then demonstrates how to import libraries and get started with building machine learning models using SageMaker Studio notebooks.

00:30:00

In this section of the video, the presenter starts by showing the syntax highlighting of a Python file. He then loads the California housing dataset and performs exploratory data analysis on it before building a model to predict the median house value from the population, income, longitude, and latitude. He shows how to view the source code of imported modules and installs additional packages, such as Plotly (with its notebook renderer) and Matplotlib, into the kernel. Finally, the presenter explains how to configure the notebook so that libraries can be imported throughout the notebook.

00:35:00

In this section of the video, the presenter demonstrates how to compute histograms over the dataset in SageMaker Studio Notebooks. They show how to bin the median income and age of households in California, as well as the average number of rooms and the population of the area. Using Plotly, they create an interactive histogram of the house age, with the ability to zoom in on particular regions and reset the zoom. They also demonstrate how to do data transformations natively within the notebook by creating Pandas data frames and using the train_test_split functionality from scikit-learn. They use the StandardScaler to transform the dataset, then save the files locally to disk. Next, the presenter shows how to build a model locally in the notebook, using the same instance and kernel to create a machine learning model from the prepared dataset. They do this with a function called "get_model," which returns a Keras model with the given input shape and three dense layers.
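The binning, splitting, and scaling steps above can be sketched as follows. This is a minimal example, not the workshop's exact code: it uses synthetic data in place of the actual California housing dataset (column names and distributions are illustrative), with scikit-learn's train_test_split and StandardScaler as described in the demo.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the California housing columns explored in the demo
# (the workshop uses the real dataset; these distributions are illustrative).
rng = np.random.default_rng(42)
n = 1000
X = np.column_stack([
    rng.lognormal(mean=1.0, sigma=0.5, size=n),   # median income
    rng.integers(1, 52, size=n).astype(float),    # house age
    rng.normal(5.0, 1.0, size=n),                 # average rooms
    rng.lognormal(mean=7.0, sigma=0.6, size=n),   # area population
])
y = rng.normal(2.0, 0.5, size=n)                  # median house value (target)

# Bin the house-age column, as done interactively with Plotly in the demo
counts, edges = np.histogram(X[:, 1], bins=10)

# Hold out a test set, then standardize features; the test set is transformed
# with the statistics learned from the training set only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

After this step, the scaled arrays can be saved to disk and fed to the local Keras model, as the presenter does in the notebook.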

00:40:00

In this section of the transcript, the speaker demonstrates how to use SageMaker Studio Notebooks to train a machine learning model using open-source frameworks like TensorFlow, Keras, and PyTorch. They highlight how SageMaker provides built-in algorithms to simplify training, but showcase how to do it with user scripts instead. The speaker then shows how to use the rest of the SageMaker environment to scale up model training with techniques such as hyperparameter optimization, ephemeral training jobs, and Spot Instances. Finally, the transcript mentions some helpful functionality integrated with JupyterLab 3, such as tab completion and the table of contents for easy navigation in large notebooks.
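In SageMaker script mode, the user script mentioned above receives hyperparameters as command-line flags and data/model paths via SM_* environment variables. A minimal sketch of the argument-parsing portion of such an entry point is below; the specific flag names (--epochs, --learning-rate, --batch-size) and defaults are illustrative, not the workshop's actual script.

```python
import argparse
import os

def parse_args(argv=None):
    """Parse the arguments SageMaker script mode passes to a training script.

    Hyperparameters set on the estimator arrive as command-line flags;
    input and output locations arrive via SM_* environment variables,
    which SageMaker sets inside the training container.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--learning-rate", type=float, default=0.001)
    parser.add_argument("--batch-size", type=int, default=32)
    # The fallbacks below only matter when running the script locally;
    # in the container, SM_MODEL_DIR and SM_CHANNEL_TRAINING are set.
    parser.add_argument("--model-dir",
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train",
                        default=os.environ.get("SM_CHANNEL_TRAINING",
                                               "/opt/ml/input/data/training"))
    return parser.parse_args(argv)

# Locally, all defaults apply; in a training job, SageMaker supplies the values
args = parse_args([])
```

The script would then load data from args.train, fit the model, and save it to args.model_dir so SageMaker uploads the artifacts to S3 when the job ends.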

00:45:00

In this section, the speaker discusses using SageMaker training jobs to scale training out by launching many training jobs that slightly alter the models' parameters. The speaker uses SageMaker Experiments to compare the various trials that are run to find optimal performance and shows how to chart, plot, and compare different runs. They also build a set of hyperparameters to search over and iterate through all of those sets using the TensorFlow estimator for SageMaker training. Additionally, the speaker logs various metric definitions using regex values that are extracted from the container's standard output, and points to an entry-point script that will be used for the ephemeral training job.
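Metric definitions like the ones described take the form of name/regex pairs that SageMaker runs against the container's stdout. A hedged sketch is below: the metric names match common Keras progress output, but the exact patterns depend on what the training script actually prints, and the sample log line is fabricated for illustration.

```python
import re

# Name/regex pairs of the kind passed to a SageMaker estimator via its
# metric_definitions parameter; SageMaker applies each regex to the
# container's standard output and records the captured group as a metric.
metric_definitions = [
    {"Name": "loss",         "Regex": "loss: ([0-9\\.]+)"},
    {"Name": "val_loss",     "Regex": "val_loss: ([0-9\\.]+)"},
    {"Name": "accuracy",     "Regex": "accuracy: ([0-9\\.]+)"},
    {"Name": "val_accuracy", "Regex": "val_accuracy: ([0-9\\.]+)"},
]

# A fabricated line of Keras-style progress output, standing in for the
# container logs; re.search takes the first match, so "loss:" captures the
# training loss rather than "val_loss:".
log_line = ("10/10 - 1s - loss: 0.4521 - accuracy: 0.8210 "
            "- val_loss: 0.5103 - val_accuracy: 0.7984")

extracted = {
    m["Name"]: float(re.search(m["Regex"], log_line).group(1))
    for m in metric_definitions
}
```

Metrics extracted this way are what SageMaker Experiments later charts and compares across trials.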

00:50:00

In this section of the workshop, the instructor explains the wait=False and wait=True parameters when launching training jobs from notebooks. wait=False allows multiple jobs to be kicked off in a loop, while wait=True waits for the job to complete synchronously and prints the logs directly into the notebook. The instructor demonstrates this by launching three jobs with wait=False and then a fourth job with wait=True, which shows the container logs directly in the notebook. Additionally, the instructor explores other features of the SageMaker Studio IDE, including SageMaker Projects, Pipelines, Experiments, Trials, the Model Registry, compilation jobs, and edge packaging for machine learning (ML) at the edge.
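The launch pattern above can be sketched with a hypothetical helper like the one below (launch_training_jobs is not from the workshop). It is deliberately duck-typed: it works with any SageMaker estimator, whose fit method accepts the wait flag described in the transcript.

```python
def launch_training_jobs(estimator, hyperparameter_sets):
    """Launch one SageMaker training job per hyperparameter set.

    fit(wait=False) returns as soon as the job is submitted, so several
    jobs can run in parallel; fit(wait=True) blocks until the job
    finishes and streams the container logs into the notebook.
    """
    for i, hps in enumerate(hyperparameter_sets):
        estimator.set_hyperparameters(**hps)
        is_last = i == len(hyperparameter_sets) - 1
        # As in the demo: fire-and-forget every job except the final one,
        # which waits synchronously and shows the container logs
        estimator.fit(wait=is_last)
```

With four hyperparameter sets, this reproduces the demo's behavior: three jobs submitted with wait=False, then a fourth with wait=True.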

00:55:00

In this section, the speaker demonstrates how to inspect the results of training jobs launched from experiments and trials in SageMaker Studio. The metric definitions, such as loss, accuracy, validation loss, and validation accuracy, are actively collected during the training jobs. The speaker also explains the different data input modes available for training jobs, such as file mode, fast file mode, and streaming data directly to the instances. Additionally, the speaker shows how to analyze the experiments and trials by selecting all trials, removing unnecessary columns, and plotting a time-series line chart to compare the results of the training jobs.

01:00:00

In this section, the workshop presenter demonstrates how to use the SageMaker hyperparameter tuner to find hyperparameters such as the learning rate. The tuner uses search algorithms to decide which hyperparameter values to try, and users can define continuous, categorical, and integer parameter ranges to search over. The presenter demonstrates how to wrap the estimator in the hyperparameter tuner and run multiple jobs at the same time using a Bayesian search strategy. The presenter also shows how to analyze the results of the hyperparameter tuning job and deploy the best model to a real-time endpoint for further testing. Other functionality of SageMaker Studio Notebooks includes native debugging, adjusting advanced settings, and configuring language server preferences.
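The tuner setup described above looks roughly like the following. This is a non-runnable outline, not the workshop's code: it assumes an AWS session, a previously defined TensorFlow estimator, and a train_s3_uri pointing at the training data, and the specific range names and bounds are illustrative.

```python
from sagemaker.tuner import (
    HyperparameterTuner,
    ContinuousParameter,
    CategoricalParameter,
    IntegerParameter,
)

# Ranges to search over; names and bounds are illustrative placeholders
hyperparameter_ranges = {
    "learning-rate": ContinuousParameter(1e-4, 1e-1, scaling_type="Logarithmic"),
    "batch-size": CategoricalParameter([32, 64, 128]),
    "epochs": IntegerParameter(5, 50),
}

tuner = HyperparameterTuner(
    estimator,                       # the TensorFlow estimator defined earlier
    objective_metric_name="val_loss",
    objective_type="Minimize",
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=[{"Name": "val_loss", "Regex": "val_loss: ([0-9\\.]+)"}],
    strategy="Bayesian",             # refines its guesses between rounds
    max_jobs=9,
    max_parallel_jobs=3,
)
tuner.fit({"training": train_s3_uri})
best = tuner.best_estimator()        # candidate for deployment to an endpoint
```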

01:05:00

In this section, the speaker discusses SageMaker Processing jobs, which work similarly to training jobs but are used for distributed data processing. These jobs can process large datasets, and they can be customized to use built-in containers for popular frameworks or generic Docker containers. The process involves passing data from S3 buckets into the processing container, which can run on one or more instances. A script processor is configured with input and output S3 locations, making it an easy and scalable way to process data, just like scaling model training. Finally, the speaker shows how a hyperparameter optimization tuner automatically creates an experiment with associated trials. The tuner finds optimal values better than a human can, even for continuous parameters like the learning rate, where it tries values a person would likely never have entered.
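A script processor of the kind described can be sketched as follows. This is a non-runnable outline assuming an AWS session; image_uri, role, input_s3_uri, output_s3_uri, and the preprocess.py filename are placeholders, not values from the workshop.

```python
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

# image_uri, role, and the S3 URIs below are placeholders
processor = ScriptProcessor(
    image_uri=image_uri,             # built-in framework image or custom Docker image
    command=["python3"],
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=2,                # scale the job out across instances
)

processor.run(
    code="preprocess.py",            # the data processing script
    inputs=[ProcessingInput(source=input_s3_uri,
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                              destination=output_s3_uri)],
)
```

SageMaker copies the S3 input into the container's local path, runs the script on each instance, and uploads whatever the script writes to the output path back to S3.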

01:10:00

In this section, the presenter walks through some analysis of the tuner's results and shows various methods that can be used to access them, including deploying the best model from the tuning job. They demonstrate how this enables programmatic use of SageMaker Experiments and hyperparameter optimization. Additionally, they discuss features of AWS SageMaker Studio such as the ability to run multiple instance types and leverage GPU-optimized kernels directly from the notebook without launching a training job. The presenter then deploys an endpoint configuration to an active endpoint, which will be used for real-time predictions.

01:15:00

In this section, the speaker demonstrates how to pass data to a predictor using RESTful endpoints and the SageMaker SDK. They show the results printed for each value and advise users to delete the endpoint once the demo is completed to avoid paying for a 24/7 endpoint unnecessarily. Lastly, they advise users to shut down the instances backing their notebooks. The speaker thanks the attendees and concludes the workshop, hoping it was a valuable learning experience.
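The predict-then-clean-up advice above can be captured in a small hypothetical helper (predict_and_clean_up is not from the workshop). It is duck-typed to work with any SageMaker predictor, whose predict and delete_endpoint methods are the calls the speaker uses.

```python
def predict_and_clean_up(predictor, samples):
    """Send each sample to the real-time endpoint, then tear it down.

    Deleting the endpoint afterwards avoids paying for a 24/7 endpoint
    that is no longer needed once the demo is over.
    """
    results = []
    try:
        for sample in samples:
            results.append(predictor.predict(sample))
    finally:
        predictor.delete_endpoint()  # always clean up, even on error
    return results
```

Shutting down the notebook's backing instances from the Studio UI is the remaining cleanup step, which has no SDK call in this sketch.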
