Featured

New SageMaker features cut machine learning curve

Published on 15 December 2020

AWS provided the following information on the new features, which it says build on more than 50 new Amazon SageMaker capabilities that AWS has delivered in the past year:

Amazon SageMaker Data Wrangler automated data preparation: Amazon SageMaker Data Wrangler provides the fastest and easiest way to prepare data for machine learning. Data preparation for machine learning is a difficult process. This difficulty arises from the fact that data attributes (known as features) used to train a machine learning model often come from different sources and exist in various formats. This means that developers must spend considerable time extracting and normalising this data so it’s consistently easy to use with machine learning. Customers might also want to combine features into composite features to give the machine learning model more helpful inputs.
- For example, a customer might want to combine features for items previously purchased, the amount spent, and the frequency of purchases into a single feature that identifies prolific spenders, so that those customers can be offered loyalty program rewards. The work associated with transforming data into features is called feature engineering, and it consumes a lot of time for developers when they’re building machine learning models. Amazon SageMaker Data Wrangler radically simplifies the process of data preparation and feature engineering.
- With Amazon SageMaker Data Wrangler, customers can choose the data they want from their various data stores and import it with a single click. Amazon SageMaker Data Wrangler contains over 300 built-in data transformers that can help customers normalise, transform, and combine features without having to write any code, while managing all of the processing infrastructure under the hood. Customers can preview these transformations in SageMaker Studio (the first end-to-end Integrated Development Environment for machine learning) to check that they do what was intended. Once the features have been engineered, Amazon SageMaker Data Wrangler will save them for reuse in the Amazon SageMaker Feature Store.
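To make the idea of a composite feature concrete, here is a minimal sketch in plain Python of the prolific-spender example described above. It does not use Amazon SageMaker Data Wrangler or any AWS API; the records, the field names, the equal weighting of the three inputs, and the 0.5 threshold are all assumptions chosen purely for illustration.

```python
from statistics import mean

# Hypothetical raw purchase records; the field names are assumptions.
purchases = [
    {"customer": "a", "items": 3, "amount": 120.0},
    {"customer": "a", "items": 5, "amount": 310.0},
    {"customer": "a", "items": 2, "amount": 95.0},
    {"customer": "b", "items": 1, "amount": 20.0},
    {"customer": "c", "items": 4, "amount": 400.0},
    {"customer": "c", "items": 6, "amount": 520.0},
]


def min_max(values):
    """Rescale values to the range [0, 1]; constant inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def spender_scores(records):
    """Combine items bought, amount spent and order count into one score."""
    totals = {}
    for r in records:
        t = totals.setdefault(
            r["customer"], {"items": 0, "amount": 0.0, "orders": 0}
        )
        t["items"] += r["items"]
        t["amount"] += r["amount"]
        t["orders"] += 1
    customers = sorted(totals)
    # Normalise each raw feature so that no single unit dominates the score.
    columns = [
        min_max([totals[c][key] for c in customers])
        for key in ("items", "amount", "orders")
    ]
    # The composite feature is the mean of the three normalised features.
    return {c: mean(col[i] for col in columns) for i, c in enumerate(customers)}


scores = spender_scores(purchases)
# Flag customers above an arbitrary, illustrative threshold.
prolific = [c for c, s in scores.items() if s >= 0.5]
print(prolific)
```

The normalisation step matters here: without it, the amount spent (in currency units) would swamp the item and order counts when the three are averaged.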
Amazon SageMaker Feature Store feature storage and management: Amazon SageMaker Feature Store provides a new repository that makes it easy to store, update, retrieve, and share machine learning features for training and inference. Today, customers can save their features to Amazon Simple Storage Service (S3). This works well for a simple set of features that are mapped to a single model, but most features are not mapped to only one model. Most features are used repeatedly by multiple models and multiple developers and data scientists, and as new features are created, developers also want to be able to reuse them repeatedly.
- This leads to a growing number of S3 objects, which can quickly become difficult to manage. Developers and data scientists try to solve this by using spreadsheets, paper notes, and emails. Sometimes they even try to build a custom application to keep track of the features, but this is a lot of work and is error-prone. Further, developers and data scientists need the same features in two settings: to train multiple models on all of the available data, which can take hours, and to serve inference, where predictions need to be returned in milliseconds and often use just a subset of the data in the relevant features.
- For example, a developer might want to create a model that predicts the next best song in a playlist. To do this, developers would train the model on thousands of songs and then, during inference, provide the model with the last three songs played to predict the next song. Training and inference are very different use cases. During training, the models can access the features offline and in batch, but for inference, the model needs only a subset of the features in near real-time. Since the features for a machine learning model come from a single source that needs to stay consistent, these different access patterns make it challenging to keep the features consistent and up to date.
- The Amazon SageMaker Feature Store solves this problem by providing a purpose-built feature store that makes it faster to name, organise, find, and share sets of features among teams of developers and data scientists. Since the Amazon SageMaker Feature Store resides in Amazon SageMaker Studio, close to where machine learning models are run, it provides single-digit millisecond latency for inference. Amazon SageMaker Feature Store makes it simple to organise and update large batches of features for training and smaller instantiations of them for inference. That way, there’s one consistent view of features for machine learning models to use and it becomes significantly easier to generate models that produce highly accurate predictions.
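The two access patterns described above, batch reads of the full history for training and fast lookups of the latest values for inference, can be illustrated with a toy in-memory store. This is only a sketch of the concept, not the Amazon SageMaker Feature Store API: the class `ToyFeatureStore`, its method names, and the listener records are all hypothetical.

```python
class ToyFeatureStore:
    """In-memory stand-in: an append-only offline log plus a
    latest-value online view, both fed by a single write path."""

    def __init__(self):
        self._offline = []  # every record ever written, for training
        self._online = {}   # newest record per entity, for inference

    def put(self, entity_id, event_time, features):
        record = {"entity_id": entity_id, "event_time": event_time, **features}
        self._offline.append(record)
        current = self._online.get(entity_id)
        # Keep only the most recent record in the online view.
        if current is None or event_time >= current["event_time"]:
            self._online[entity_id] = record

    def get_online(self, entity_id):
        """Fast lookup of the latest features for one entity."""
        return self._online.get(entity_id)

    def read_offline(self):
        """Batch read of the full history, as a training job would use."""
        return list(self._offline)


store = ToyFeatureStore()
store.put("listener-1", 1, {"last_three": ["s1", "s2", "s3"]})
store.put("listener-1", 2, {"last_three": ["s2", "s3", "s4"]})
store.put("listener-2", 1, {"last_three": ["s9", "s8", "s7"]})

training_rows = store.read_offline()         # all three records
latest = store.get_online("listener-1")      # only the newest record
print(len(training_rows), latest["last_three"])
```

Because both views are populated by the same `put` call, the features used for training and for inference cannot drift apart, which is the consistency property the passage describes.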
Amazon SageMaker Pipelines workflow management and automation: Amazon SageMaker Pipelines is the first purpose-built, easy-to-use continuous integration and continuous delivery (CI/CD) service for machine learning. As customers can see with feature engineering, machine learning comprises multiple steps that can benefit from orchestration and automation. This is not dissimilar to traditional programming, where customers have tools like CI/CD to help them develop and deploy applications more quickly. However, with machine learning, CI/CD tools are rarely used because they don’t exist or because they are hard to set up, configure, and manage. With Amazon SageMaker Pipelines, developers can define each step of an end-to-end machine learning workflow.
- These workflows include the data-load steps, transformations from Amazon SageMaker Data Wrangler, features stored in Amazon SageMaker Feature Store, training configuration and algorithm set up, debugging steps, and optimisation steps. With Amazon SageMaker Pipelines, developers can easily re-run an end-to-end workflow from Amazon SageMaker Studio, using the same settings to get the exact same model every time, or they can re-run the workflow on a regular schedule with new data to update a model. Amazon SageMaker Pipelines logs each step in Amazon SageMaker Experiments (an Amazon SageMaker capability that organises and tracks machine learning experiments and model versions) every time a workflow is run. This helps developers visualise and compare machine learning model iterations, training parameters, and outcomes. With Amazon SageMaker Pipelines, workflows can be shared and re-used between teams, either to recreate a model or to act as a starting point for making improvements through new features, algorithms, or optimisations.
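The idea of a workflow made of named steps that is logged on every run, and that yields the same model when re-run with the same settings, can be sketched in a few lines of plain Python. This is not the Amazon SageMaker Pipelines SDK; the step functions, the `run_pipeline` helper, and the stand-in "model" (a simple mean) are assumptions made for illustration.

```python
def load_data(state):
    # Hypothetical data-load step with fixed input data.
    state["rows"] = [1.0, 2.0, 3.0, 4.0]
    return state


def transform(state):
    # Scale every row by the largest value.
    top = max(state["rows"])
    state["features"] = [r / top for r in state["rows"]]
    return state


def train(state):
    # Stand-in for training: the "model" is just the mean of the features.
    state["model"] = sum(state["features"]) / len(state["features"])
    return state


def run_pipeline(steps, run_log):
    """Run the steps in order, recording each completed step in run_log."""
    state = {}
    for step in steps:
        state = step(state)
        run_log.append({"step": step.__name__, "outputs": sorted(state)})
    return state


workflow = [load_data, transform, train]

first_log, second_log = [], []
first = run_pipeline(workflow, first_log)
second = run_pipeline(workflow, second_log)

# Same steps and same settings give the same model on every run.
print(first["model"] == second["model"])
print([entry["step"] for entry in first_log])
```

Keeping the workflow as a plain list of steps is what makes it shareable: another team could reuse `workflow` unchanged, or swap in a different `train` step as a starting point for improvements.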
Amazon SageMaker Clarify bias detection and explainability: Amazon SageMaker Clarify provides bias detection across the machine learning workflow, enabling developers to build greater fairness and transparency into their machine learning models. Once developers have prepared data for training and inference, they need to try to ensure the data is free from statistical bias and that model predictions are transparent, so they can explain how the model features are contributing to predictions. Today, developers sometimes try to use open source tools to detect statistical bias in their data, but these tools require a lot of manual effort and coding and are typically error-prone.
- With Amazon SageMaker Clarify, developers can now more easily detect statistical bias across the entire machine learning workflow and provide explanations for predictions their machine learning models are making. Amazon SageMaker Clarify integrates with Amazon SageMaker Data Wrangler, where it runs a set of algorithms on features to identify bias during data preparation, with visualisations that include a description of the sources and severity of possible bias. This way, developers can take steps for mitigation. Amazon SageMaker Clarify also integrates with Amazon SageMaker Experiments to make it easier to check trained models for statistical bias. It also details how each input feature of the model is affecting predictions. Finally, Amazon SageMaker Clarify integrates with Amazon SageMaker Model Monitor (an Amazon SageMaker capability that continuously monitors the quality of machine learning models in production) to alert developers if the importance of model features shifts and causes model behaviour to change.
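As a rough illustration of what checking training data for statistical bias can involve, the sketch below computes one simple statistic: the gap between two groups in the proportion of positive labels. It is plain Python and makes no claim about which algorithms Amazon SageMaker Clarify actually runs; the records, the field names, and the helper functions are hypothetical.

```python
def positive_rate(labels):
    """Fraction of labels equal to 1."""
    return sum(labels) / len(labels)


def label_proportion_gap(records, group_key, label_key, group_a, group_b):
    """Positive-label rate of group_a minus that of group_b.

    A value near 0 suggests the two groups are labelled similarly;
    a large absolute value flags a possible imbalance to investigate.
    """
    a = [r[label_key] for r in records if r[group_key] == group_a]
    b = [r[label_key] for r in records if r[group_key] == group_b]
    return positive_rate(a) - positive_rate(b)


# Hypothetical training records: 'approved' is the label to predict.
records = [
    {"group": "x", "approved": 1},
    {"group": "x", "approved": 1},
    {"group": "x", "approved": 1},
    {"group": "x", "approved": 0},
    {"group": "y", "approved": 1},
    {"group": "y", "approved": 0},
    {"group": "y", "approved": 0},
    {"group": "y", "approved": 0},
]

gap = label_proportion_gap(records, "group", "approved", "x", "y")
print(gap)
```

A gap like this does not prove that a model will be unfair, but it is the kind of signal that would prompt a developer to look more closely at how the data was collected before training on it.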
Deep Profiling for Amazon SageMaker Debugger model training profiler: Deep Profiling for Amazon SageMaker Debugger now enables developers to train their models faster by automatically monitoring system resource utilisation and providing alerts for training bottlenecks. Today, developers don’t have a standard way to monitor system utilisation (e.g. GPU, CPU, network throughput, and memory I/O) to identify and troubleshoot bottlenecks in their training jobs. As a result, developers can’t train models as quickly and cost-effectively as possible.
- Amazon SageMaker Debugger solves this problem with Deep Profiling’s newly announced capabilities, which provide developers with the ability to visually profile and monitor system resource utilisation in Amazon SageMaker Studio. This makes it easier to identify the root cause of issues and reduce the time and cost of training machine learning models. With these new capabilities, Amazon SageMaker Debugger expands its scope to monitor the utilisation of system resources, send out alerts on problems during training in Amazon SageMaker Studio or via Amazon CloudWatch, and correlate usage to different phases in the training job or a specific point in time during training (e.g. 28 minutes after the training job started). Amazon SageMaker Debugger can also trigger actions based on alerts (e.g. stop a training job when irregularities in GPU usage are detected). Amazon SageMaker Debugger’s Deep Profiling works across frameworks (PyTorch, Apache MXNet, and TensorFlow) and collects the necessary system and training metrics automatically without requiring any code changes in training scripts. This allows developers to visualise how their system resources were used during training in Amazon SageMaker Studio.
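The kind of rule described above, raising an alert when GPU usage looks irregular and tying it to a point in time during training, might look something like the following sketch. It is plain Python rather than the Amazon SageMaker Debugger API; the sample data, the 10% threshold, and the three-sample window are assumptions chosen for illustration.

```python
def first_alert(samples, threshold=10.0, window=3):
    """Return the minute at which GPU utilisation has stayed below
    `threshold` percent for `window` consecutive samples, else None."""
    run = 0
    for minute, utilisation in samples:
        # Extend the low-utilisation streak, or reset it on a healthy sample.
        run = run + 1 if utilisation < threshold else 0
        if run >= window:
            return minute
    return None


# Hypothetical (minute since job start, GPU utilisation %) samples.
gpu_samples = [
    (24, 91.0),
    (25, 89.0),
    (26, 5.0),
    (27, 4.0),
    (28, 3.0),
    (29, 90.0),
]

alert_minute = first_alert(gpu_samples)
if alert_minute is not None:
    # A real system might stop the training job here; this sketch only reports.
    print(f"GPU under-utilised: alert at minute {alert_minute}")
```

Requiring several consecutive low samples, rather than reacting to a single one, avoids raising an alert on a momentary dip such as a checkpoint being written.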

