You've spent all afternoon coding out a complex, sophisticated feature engineering strategy. It just started working on your Amazon SageMaker Studio t3.medium notebook, and all you want to do is plug this onto a massive instance, scale it out over the rest of your dataset, and go home. You could upgrade your notebook instance, but the job would stop as soon as you close your laptop.

Amazon SageMaker provides a fully managed solution for building, training, and deploying machine learning (ML) models. Why not schedule the job from your notebook directly? In this post, we demonstrate using Amazon SageMaker Processing Jobs to execute Jupyter notebooks with the open-source project Papermill.

The combination of Amazon SageMaker with Amazon CloudWatch, AWS Lambda, and the entire AWS stack has always provided the modular backbone you need to scale up jobs like feature engineering, both on the fly and on a schedule. We're happy to provide a do-it-yourself toolkit to simplify this process, using AWS CloudFormation to set up permissions, Lambda to launch the job, and Amazon Elastic Container Registry (Amazon ECR) to create a customized execution environment. As of this writing, you can write code in a Jupyter notebook and run it on an Amazon SageMaker ephemeral instance with the click of a button, either immediately or on a schedule.

The toolkit includes a library and CLI to initiate notebook execution from any AWS client and a Jupyter plugin for a seamless user experience. With the tools provided here, you can do this from anywhere: at a shell prompt, in JupyterLab on Amazon SageMaker, in another JupyterLab environment you have, or automated in a program you've written. For more information about executing notebooks, see the GitHub repo.

We've written sample code that simplifies setup by using AWS CloudFormation to handle the heavy lifting and provides convenience tools to run and monitor executions. All the source code is available in aws-samples on GitHub. Read on to learn all about how to use scheduled notebook execution.

This toolkit is especially useful for running nightly reports. For example, you may want to analyze all the training jobs your data science team ran that day, run a cost/benefit analysis, and generate a report about the business value your models are going to bring after you deploy them into production. That would be a perfect fit for a scheduled notebook: all the graphs, tables, and charts are generated by your code, the same as if you stepped through the notebook yourself, except now they are handled automatically, in addition to persisting in Amazon Simple Storage Service (Amazon S3). You can start your day with the latest notebook, executed overnight, to move your analysis forward.

Or imagine that you want to scale up a feature engineering step. You've already perfected the for loop that knocks out all your Pandas transformations, and all you need is time and compute to run it on the full 20 GB of data. No problem: just drop your notebook into the toolkit, run a job, close your laptop, and you're done. Your code continues to run on the scheduled instance, regardless of whether or not you're actively using Jupyter at the moment.

Perhaps your data science team still trains models on local laptops or Amazon SageMaker notebooks and hasn't yet adopted Amazon SageMaker ephemeral instances for training jobs. With this toolkit, you can easily use the advanced compute options only for the time you're training a model. You can spin up a p3.xlarge only for the hour your model trains, but use your Studio environment all day on the affordable t3.medium. You can easily connect these resources to the Experiments SDK with a few lines of code. Although it's still fully supported to run Amazon SageMaker notebooks and Amazon SageMaker Studio on p3 instances, developing the habit of using the largest instances only for short periods is a net cost-savings exercise.

You may have an S3 bucket full of objects and need to run a full notebook on each object. These could be dates of phone call records in your call center or tweet streams from particular users in your social network.
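At the heart of this workflow is Papermill's parameterized execution: it injects values into a notebook cell tagged `parameters` and runs the notebook top to bottom, writing out the executed copy. A minimal sketch, assuming `papermill` is installed; the notebook name, S3 prefix, and parameter names are hypothetical examples:

```python
# Sketch of the Papermill call that underlies notebook execution.
# Notebook, bucket, and parameter names here are hypothetical.

def build_run_args(input_nb, output_prefix, params):
    """Pair an input notebook with an output location and injected parameters."""
    output_nb = f"{output_prefix}/{input_nb.replace('.ipynb', '-output.ipynb')}"
    return {"input_path": input_nb, "output_path": output_nb, "parameters": params}

def execute(args):
    """Run the notebook top to bottom; Papermill writes the executed copy out."""
    import papermill as pm  # pip install papermill
    pm.execute_notebook(**args)

# Example: inject a report date into the notebook's "parameters"-tagged cell.
run = build_run_args(
    "analysis.ipynb",                # hypothetical notebook
    "s3://my-bucket/notebook-runs",  # hypothetical S3 prefix for results
    {"report_date": "2020-12-01"},
)
# execute(run)  # needs papermill installed and S3 write access
```

Because the output path can be an S3 URI, the executed notebook, with all of its rendered graphs and tables, persists automatically.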
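The bucket-full-of-objects case reduces to generating one parameter set per S3 key and launching one execution for each. A sketch under the same assumptions, with hypothetical bucket, prefix, and notebook names; `list_keys` requires AWS credentials, while the parameter-building step is pure logic:

```python
# Sketch: run one notebook execution per S3 object (hypothetical names).

def params_per_key(bucket, keys):
    """Build one Papermill parameter dict per S3 object key."""
    return [{"s3_uri": f"s3://{bucket}/{key}"} for key in keys]

def list_keys(bucket, prefix=""):
    """List object keys under a prefix (requires AWS credentials)."""
    import boto3
    pages = boto3.client("s3").get_paginator("list_objects_v2").paginate(
        Bucket=bucket, Prefix=prefix
    )
    return [obj["Key"] for page in pages for obj in page.get("Contents", [])]

def run_per_object(bucket, prefix, notebook):
    """Launch one parameterized execution for every object under the prefix."""
    import papermill as pm  # pip install papermill
    for i, params in enumerate(params_per_key(bucket, list_keys(bucket, prefix))):
        out = notebook.replace(".ipynb", f"-{i}.ipynb")  # one output per object
        pm.execute_notebook(notebook, out, parameters=params)
```

Running the executions serially, as above, is the simplest form; each iteration could just as well submit a separate Processing job so the objects are handled in parallel.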
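For the nightly-report case, one way to wire the schedule is a CloudWatch Events cron rule pointed at the Lambda function that launches the job; the toolkit's CloudFormation template and CLI handle this for you, so the sketch below only illustrates the underlying calls, with hypothetical rule and function names:

```python
# Sketch: a nightly trigger via CloudWatch Events (hypothetical names).

def nightly_cron(hour_utc, minute=0):
    """CloudWatch Events cron expression for a daily run at the given UTC time."""
    return f"cron({minute} {hour_utc} * * ? *)"

def schedule_nightly(rule_name, lambda_arn, hour_utc):
    """Create the rule and point it at the Lambda that launches the notebook job."""
    import boto3
    events = boto3.client("events")
    events.put_rule(Name=rule_name, ScheduleExpression=nightly_cron(hour_utc))
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "notebook-launcher", "Arn": lambda_arn}],
    )
```

Note that the Lambda function must also grant `events.amazonaws.com` permission to invoke it, which is exactly the kind of wiring the CloudFormation setup takes off your hands.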