Become a Sage with AWS SageMaker
Machine learning is taking over every aspect of computing and business, with no end in sight. Every organization is looking for ways to leverage machine learning to boost productivity and deepen its business insights. AWS SageMaker is Amazon's answer: a service that helps you build, train, and deploy ML models faster and with less effort. Let’s start with the basics. What exactly is SageMaker?
What is AWS SageMaker?
SageMaker is a fully managed machine learning service for building, training, and deploying ML models to a production-ready hosting environment. Machine learning models are how we “teach” computers to make predictions. You train a model by feeding an algorithm example data, and the kind of example data depends on the business problem you face. After the data is gathered, data professionals must clean and transform it before it can be used. Once your data is ready you can begin training and then analyze the results for accuracy; you’ll also need compute resources to train your models. When the accuracy of your model’s inferences reaches an acceptable level, you can use the SageMaker hosting service to deploy the model independently of your application code, keeping everything decoupled. Machine learning is an ongoing process: you keep tweaking the model, collecting ground truth, monitoring the inferences, and watching for drift.
With that simple outline of the machine learning process in mind, let’s go into a little more detail about some of the features that help along the way. Keep in mind that SageMaker is a vast service with new features added all the time, so this is not an exhaustive list, but rather part one in a SageMaker series.
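To make that workflow a little more concrete, here is a minimal sketch of training and deploying a model with the SageMaker Python SDK. The S3 paths, the IAM role, and the choice of the built-in XGBoost container are placeholders for illustration, not a recommendation.

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # placeholder IAM role

# Use one of SageMaker's built-in algorithm containers (XGBoost here) as the training image.
image_uri = image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",          # compute resources for training
    output_path="s3://my-bucket/models/",  # placeholder bucket for model artifacts
    sagemaker_session=session,
)

# Train on the prepared example data, then deploy the model to a hosted endpoint,
# decoupled from your application code.
estimator.fit({"train": TrainingInput("s3://my-bucket/train/", content_type="text/csv")})
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```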
Amazon SageMaker Clarify
Clarify helps detect potential bias and helps explain the predictions a model makes. It can identify different types of bias both before and after model training, as well as in production. SageMaker Clarify uses a feature attribution approach to explain how models arrive at their predictions. You can use Clarify’s model governance reports to keep your risk and compliance teams, as well as external regulators, informed. An application that uses ML brings increased productivity, improved accuracy, and cost savings; it also helps meet regulatory requirements and improve business decisions. Another important concept is the fairness and explainability functionality: components that help you build more understandable and less biased machine learning models.
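As a rough illustration, here is how a pre-training bias check might look with the SageMaker Python SDK. The dataset path, column names, and the "gender" facet below are made-up examples.

```python
import sagemaker
from sagemaker import clarify

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # placeholder IAM role

clarify_processor = clarify.SageMakerClarifyProcessor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)

# Describe where the training data lives and which column is the label.
data_config = clarify.DataConfig(
    s3_data_input_path="s3://my-bucket/train/train.csv",  # placeholder dataset
    s3_output_path="s3://my-bucket/clarify-output/",
    label="approved",                                      # hypothetical label column
    headers=["approved", "age", "income", "gender"],
    dataset_type="text/csv",
)

# Check whether positive outcomes are skewed with respect to the "gender" facet.
bias_config = clarify.BiasConfig(
    label_values_or_threshold=[1],
    facet_name="gender",
)

# Runs a processing job and writes a pre-training bias report to the output path.
clarify_processor.run_pre_training_bias(
    data_config=data_config,
    data_bias_config=bias_config,
)
```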
Amazon SageMaker Autopilot
The Autopilot feature was built to bring ML to organizations and teams with limited machine learning experience. SageMaker Autopilot can be used with varying levels of human involvement, all the way up to full autopilot. It uses automatic machine learning (AutoML) to automate the important tasks: evaluating data, preparing data for model training and tuning, and picking the relevant algorithms for the problem domain. The AutoML process follows these steps:
Load raw data to train the model.
Select target columns for prediction.
Automatic model creation: Autopilot selects the correct algorithm, trains it, and tunes it.
Full visibility of model notebooks.
Pick the model that best fits your needs from a ranked list.
Deploy and monitor your model.
You can use Autopilot with code via the AWS SDK or without code through SageMaker Studio. Using a feature attribution approach first developed for SageMaker Clarify, Autopilot can help explain model predictions. Currently, Autopilot supports regression, binary classification, and multiclass classification problem types.
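For the code route, a minimal sketch using the SDK's AutoML class might look like the following; the data location, target column, and candidate limit are placeholder values.

```python
import sagemaker
from sagemaker.automl.automl import AutoML

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # placeholder IAM role

automl = AutoML(
    role=role,
    target_attribute_name="approved",  # hypothetical target column to predict
    sagemaker_session=session,
    max_candidates=10,                 # cap how many candidate models Autopilot tries
)

# Steps 1-3: point Autopilot at the raw data and let it prepare the data,
# pick algorithms, and train and tune candidate models.
automl.fit(inputs="s3://my-bucket/train/train.csv", wait=True, logs=True)

# Steps 5-6: inspect the best candidate from the ranked list, then deploy it.
best = automl.best_candidate()
print(best["CandidateName"], best["FinalAutoMLJobObjectiveMetric"])
predictor = automl.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```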
Amazon SageMaker Data Wrangler
This feature is a part of SageMaker Studio and provides an end-to-end solution to import, prepare, transform, featurize, and analyze data. To use Data Wrangler, you need an active Studio instance.
Amazon SageMaker Debugger
Just as its name implies, Debugger profiles and debugs training jobs to help resolve problems such as system bottlenecks, overfitting, saturated activation functions, and vanishing gradients. It offers tools that alert you when training anomalies are found, and it can visualize collected metrics and tensors to help identify problems. Debugger is available in nearly all regions, and you can use it with custom training containers. Debugger supports profiling functionality for performance optimization, which identifies computation issues, and debugging functionality for model optimization, which analyzes non-converging training issues.
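As a rough sketch, Debugger's built-in rules can be attached to a training job through the estimator. The specific rules chosen here are only examples, and the role, bucket, and XGBoost image mirror the placeholders from the earlier training sketch.

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.debugger import Rule, ProfilerRule, rule_configs
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # placeholder IAM role
image_uri = image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

rules = [
    # Debugging rules that watch tensors for model-optimization issues.
    Rule.sagemaker(rule_configs.vanishing_gradient()),
    Rule.sagemaker(rule_configs.overfit()),
    # Profiling rule that reports system bottlenecks and resource usage.
    ProfilerRule.sagemaker(rule_configs.ProfilerReport()),
]

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    rules=rules,  # Debugger evaluates these rules while the job runs
    sagemaker_session=session,
)
estimator.fit({"train": TrainingInput("s3://my-bucket/train/", content_type="text/csv")})
```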
Amazon Augmented AI (A2I)
This is a service that brings human review of ML predictions to all developers and organizations. Amazon A2I efficiently builds and manages human review workflows for ML applications. Many of the ML applications you build will need human review of low-confidence predictions or of random prediction samples; content moderation and text extraction from documents are two examples. A2I can integrate with a number of other AWS ML services.
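For instance, once a human review workflow (a flow definition) has been set up, a low-confidence prediction can be routed to reviewers through the A2I runtime API. In this sketch the flow definition ARN, loop name, and confidence threshold are all hypothetical.

```python
import json
import boto3

a2i = boto3.client("sagemaker-a2i-runtime")

prediction = {"label": "toxic", "confidence": 0.54}  # hypothetical model output

# Only route low-confidence predictions to the human review workflow.
if prediction["confidence"] < 0.70:
    a2i.start_human_loop(
        HumanLoopName="review-request-001",  # must be unique per review
        FlowDefinitionArn=(
            "arn:aws:sagemaker:us-east-1:123456789012:"
            "flow-definition/my-review-flow"  # hypothetical flow definition
        ),
        HumanLoopInput={"InputContent": json.dumps(prediction)},
    )
```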
Batch Transform
Batch Transform is a feature that helps with tasks like getting inferences from large datasets, running inference when you don't need a persistent endpoint, and preprocessing datasets to remove noise or bias that interferes with training or inference. You can use Batch Transform to test different hyperparameter settings or compare various models, and it works well for one-off evaluations of model fit. There are costs incurred when running Batch Transform jobs.
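Here is a minimal sketch of a one-off batch inference job, continuing from the estimator trained in the earlier example; the bucket paths are placeholders.

```python
# Continuing from the estimator trained in the earlier sketch: create a transformer
# instead of deploying a persistent endpoint.
transformer = estimator.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/batch-output/",  # placeholder output location
)

# Run inference over an entire dataset in S3, splitting the CSV input by line.
transformer.transform(
    data="s3://my-bucket/batch-input/records.csv",
    content_type="text/csv",
    split_type="Line",
)
transformer.wait()  # results land in output_path when the job finishes
```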
SageMaker JumpStart
JumpStart provides solution templates and open-source, pre-trained models for a wide range of problem types. You can access JumpStart through its landing page in SageMaker Studio or with the SageMaker Python SDK. You can then browse for solutions, models, data types, notebooks, frameworks and other resources. With just one click you can have an ML solution to many common problem types.
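As one hedged example, recent versions of the SageMaker Python SDK expose a JumpStartModel class that lets you deploy a catalog model in a couple of lines; the model ID and instance type below are only illustrative.

```python
from sagemaker.jumpstart.model import JumpStartModel

# Pull a pre-trained model from the JumpStart catalog by its ID and host it on an endpoint.
# The model ID and instance type are illustrative; browse JumpStart for the current catalog.
model = JumpStartModel(model_id="huggingface-text2text-flan-t5-base")
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.xlarge")

# predictor.predict(...) can then be called with the input format documented for that model.
```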
Amazon SageMaker Studio Notebooks
SageMaker Studio Notebooks are collaborative notebooks you can launch quickly and easily; you don’t have to provision compute resources or file storage. Studio notebooks provide persistent storage, so even when the instances are shut down you can still share and view notebooks. Teams can share access to a read-only copy of a notebook through a secure URL, and the copy opens in the same environment as the original. A Studio notebook runs in an environment defined by an EC2 instance type, a SageMaker image, a KernelGateway app, and a kernel. A number of instance types are designated as fast launch types, so they can be started quickly.
Conclusion
With that, this concludes part one of our exploration of SageMaker. In the coming articles, we’ll look at actually using and training ML models, as well as the deeper concepts behind what SageMaker is doing. Please like, share, and subscribe for more articles.