Case Study

Predictive Shipment ETA Using Machine Learning on AWS SageMaker


Supply Chain, Cross-platform

Client Story

Our client's customer, a global leader in agricultural equipment manufacturing, struggled to predict shipment delivery times accurately across its complex supply chain network. Traditional ETA estimates were unreliable because they depended on many variables, such as shipping lanes, carrier service types, geographic factors, and transit status updates. The lack of accurate ETA predictions resulted in:

Reduced supply chain visibility and planning efficiency
Difficulty in coordinating multi-stop international shipments (US, Europe, Asia)
Manual processes for tracking shipments through various transit statuses
Inability to leverage historical shipment data from 4,400+ shipments for predictive insights
Challenges in vendor service optimization with multiple carriers

The company needed a scalable, data-driven solution to predict shipment ETAs using historical patterns, geographic data, and real-time transit information, to boost operational decision-making and customer satisfaction.

We leveraged AWS SageMaker to build machine learning models (Linear Regression and XGBoost) that predict shipment estimated time of arrival (ETA) based on historical transit data, improving supply chain visibility and operational efficiency.

The Challenge

Building accurate ETA predictions required overcoming data inconsistencies, multi-stop complexities, and scaling ML across international lanes. Key insights emerged:

Technical Lessons:

Data Quality Critical: Geocoding accuracy significantly impacts model performance. We had to correct city/country mappings manually and filter out shipments with missing geographic coordinates, which reduced the raw dataset to ~4,400 usable shipments.
Feature Engineering Impact: Transforming raw transit data into meaningful features (shipment lanes, distance calculations, business day indicators) was more impactful than model complexity. The 64-feature set, including categorical encodings for lanes and service codes, proved effective.
Multi-Stop Complexity: Initial implementation excluded multi-stop pickups/deliveries, simplifying to single PICKEDUP → DELIVRED flows. Future iterations should incorporate multi-leg shipment patterns.
Model Selection: Both Linear Regression and XGBoost achieved similar RMSE (~21-24 hours). Simpler linear models may be preferable for interpretability in production logistics environments.
Endpoint Management: SageMaker endpoint costs require active management. Implementing automatic shutdown for unused endpoints prevented unnecessary charges during development.
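
The feature engineering lesson above can be illustrated with a minimal sketch. It derives a lane label, a distance, and a business-day indicator from one shipment record. The field names (origin_country, pickup_time, and so on) are hypothetical, the haversine formula is an assumed choice for the distance calculation, and the business-day check ignores public holidays; the project's actual 64-feature pipeline is not shown in this case study.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def build_features(shipment):
    """Turn one raw shipment record (hypothetical schema) into model features."""
    pickup = shipment["pickup_time"]
    return {
        # Lane as an origin-destination pair, to be one-hot encoded later.
        "lane": f"{shipment['origin_country']}-{shipment['dest_country']}",
        "service_code": shipment["service_code"],
        "distance_km": haversine_km(
            shipment["origin_lat"], shipment["origin_lon"],
            shipment["dest_lat"], shipment["dest_lon"],
        ),
        # Monday-Friday counts as a business day; holidays are not handled.
        "pickup_is_business_day": int(pickup.weekday() < 5),
    }
```

Categorical values such as the lane and service code would still need encoding (for example one-hot) before training.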

Operational Lessons:

Data Pipeline Design: Building reusable Jupyter notebooks for data cleaning, geocoding, and feature preparation enabled rapid iteration and model retraining.
Account-Based Routing: A Lambda function that routes by account ID provides a flexible, multi-tenant architecture for scaling to additional customers.
Validation Requirements: Transit status filtering (PICKEDUP, INTRANST, DELIVRED) and temporal consistency checks were essential for training data quality.
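
A minimal sketch of the validation step described above: it keeps only the three transit statuses, requires a single pickup and a single delivery (matching the single-leg simplification), and rejects shipments whose timestamps are out of order. The event structure (a list of dicts with status and timestamp keys) is an assumption for illustration, not the project's actual schema.

```python
from datetime import datetime

VALID_STATUSES = {"PICKEDUP", "INTRANST", "DELIVRED"}


def transit_hours(events):
    """Return PICKEDUP -> DELIVRED duration in hours, or None if invalid."""
    # Drop events with statuses outside the supported set.
    kept = [e for e in events if e["status"] in VALID_STATUSES]
    pickups = [e["timestamp"] for e in kept if e["status"] == "PICKEDUP"]
    deliveries = [e["timestamp"] for e in kept if e["status"] == "DELIVRED"]

    # Single-leg flows only: exactly one pickup and one delivery.
    if len(pickups) != 1 or len(deliveries) != 1:
        return None
    start, end = pickups[0], deliveries[0]

    # Temporal consistency: delivery must come after pickup ...
    if end <= start:
        return None
    # ... and every in-transit update must fall between the two.
    for e in kept:
        if e["status"] == "INTRANST" and not (start <= e["timestamp"] <= end):
            return None
    return (end - start).total_seconds() / 3600.0
```

Shipments for which the function returns None would be excluded from the training set; the returned duration is the regression target.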

The Solution

We developed an ML-powered predictive ETA solution on AWS that leverages historical shipment data to forecast delivery times with high accuracy.

Architecture Components

Data Layer: Amazon Redshift stores orders, transit updates, pickup/delivery, and geo-data.
Data Processing: Python ETL in SageMaker notebooks handles extraction, geocoding, distance calculations, and preparation of the 64 features (lanes, service codes, business days).
ML Training Pipeline: SageMaker trains Linear Regression (distance/categoricals) and XGBoost (tuned hyperparameters); models/datasets saved to S3.
Inference Layer: SageMaker endpoints for real-time predictions; Lambda (predictShipmentETA) routes by account ID.
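
The account-based routing in the inference layer could look roughly like the sketch below. Only the idea of routing by account ID comes from this case study; the account IDs, endpoint names, event fields (accountId, features), and response shape are hypothetical. The sketch also assumes the endpoint accepts a CSV row and answers with a single numeric value as text, so the parsing would need adjusting for a different response format.

```python
import json

# Hypothetical account-to-endpoint mapping; real values are not in the source.
ENDPOINTS_BY_ACCOUNT = {
    "acct-001": "eta-model-acct-001",
    "acct-002": "eta-model-acct-002",
}


def resolve_endpoint(account_id, routes=ENDPOINTS_BY_ACCOUNT):
    """Look up the SageMaker endpoint that serves a given account."""
    if account_id not in routes:
        raise ValueError(f"No ETA model deployed for account {account_id!r}")
    return routes[account_id]


def invoke_sagemaker(endpoint_name, csv_row):
    """Call a SageMaker endpoint with one CSV row and return the raw reply."""
    import boto3  # imported lazily so the routing logic is testable offline

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=csv_row,
    )
    return response["Body"].read().decode("utf-8")


def lambda_handler(event, context, invoke=invoke_sagemaker):
    """Route a prediction request to the endpoint for the caller's account."""
    try:
        endpoint = resolve_endpoint(event["accountId"])
    except (KeyError, ValueError) as exc:
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}

    csv_row = ",".join(str(value) for value in event["features"])
    raw = invoke(endpoint, csv_row)
    # Assumes the reply is a single number, e.g. "36.5".
    return {"statusCode": 200, "body": json.dumps({"etaHours": float(raw)})}
```

Keeping the mapping separate from the handler means a new customer can be added by deploying a model and registering one more entry, without changing the routing code.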

Technologies used:

AWS Lambda
Amazon SageMaker
Amazon Redshift
Amazon S3
Python

Team:

Senior Python Engineer
2 Python Engineers
Data Scientist
Software Architect

The Results

The ETA prediction solution was deployed to production without on-premises GPU hardware, avoiding $10K-50K in CapEx along with upfront hardware investments and ongoing maintenance. Pay-as-you-go SageMaker with auto-shutdown endpoints, serverless Lambda routing, and automated predictions cut the operational overhead of manual ETA processes and deliver enterprise ML with elastic scalability at a fraction of traditional infrastructure costs.
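
As an illustration of the auto-shutdown idea, the sketch below picks endpoints that have been idle longer than a threshold and deletes them. The two-hour threshold and the endpoint names are assumptions, and how the last invocation time is collected (for example from CloudWatch metrics) is left out; the project's actual shutdown mechanism is not described in this case study.

```python
from datetime import datetime, timedelta, timezone


def idle_endpoints(last_invoked, now, max_idle=timedelta(hours=2)):
    """Return names of endpoints not invoked within max_idle, sorted by name."""
    return sorted(
        name for name, timestamp in last_invoked.items()
        if now - timestamp > max_idle
    )


def delete_endpoints(names):
    """Delete the given SageMaker endpoints so they stop accruing charges."""
    import boto3  # imported lazily; requires AWS credentials at call time

    sagemaker = boto3.client("sagemaker")
    for name in names:
        sagemaker.delete_endpoint(EndpointName=name)
```

Deleting an endpoint leaves its model and endpoint configuration in place, so the endpoint can be recreated when predictions are needed again.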

Performance Metrics:

Linear Regression model achieved an RMSE of 21.4 hours on the test dataset (80,889 shipments)
XGBoost model achieved an RMSE of 23.9 hours with 50 training rounds
Models trained on 304,441 unique shipments across multiple international shipping lanes
Successfully processes 64 engineered features, including distance, geographic data, and service codes
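
For readers unfamiliar with the metric, RMSE (root mean squared error) is the square root of the average squared difference between actual and predicted transit times, so it is expressed in the same unit as the target (hours here). A minimal sketch of the calculation follows; the function name and the numbers in the usage note are illustrative only, not project data.

```python
import math


def rmse(actual_hours, predicted_hours):
    """Root mean squared error between actual and predicted transit times."""
    if len(actual_hours) != len(predicted_hours) or not actual_hours:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [
        (actual - predicted) ** 2
        for actual, predicted in zip(actual_hours, predicted_hours)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

For example, rmse([48, 72], [45, 76]) averages the squared errors 9 and 16 to 12.5 and returns its square root, about 3.54 hours.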

Operational Outcomes:

Automated ETA prediction capability
Real-time predictions via AWS Lambda routing to SageMaker endpoints
Multi-region support covering US, European, and international shipping lanes (60+ lane combinations)
Account-based model deployment, enabling scalability to additional customers without a linear cost increase
Production-ready REST API integration with existing shipment management systems

Business Impact:

Improved supply chain visibility through data-driven ETA forecasting
Enhanced ability to coordinate multi-stop international shipments
Foundation for predictive analytics across the agricultural equipment logistics network
Scalable ML infrastructure on AWS, enabling future model improvements and additional use cases

The solution validated the feasibility of ML-based shipment prediction and established a framework for continuous model refinement as more historical data becomes available.

Let’s discuss your needs

Unreliable shipment ETAs no longer have to disrupt your supply chain. Whether you're tackling complex international logistics, optimizing carrier performance, or scaling predictive analytics across regions, we can build a custom ML solution on AWS SageMaker for you. At Erbis, we specialize in data-driven supply chain innovations that boost visibility, cut costs, and enable real-time decisions, just like this deployment, which achieved an RMSE under 24 hours while avoiding hefty infrastructure expenses.

Ready to predict ETAs with machine learning precision and transform your logistics?

Let’s explore how we can tailor this AWS-powered framework to your operations.
