Safely distribute new Machine Learning models to millions of iPhones over-the-air

For some apps, it may be sufficient to train a machine learning (ML) model once and ship it with the app itself. However, most mobile apps are dynamic, constantly changing and evolving. It is therefore important to adapt and improve your machine learning models quickly, without doing a full app release and going through the whole App Store release & review process.

In this series, we will explore how to run machine learning models directly on the device instead of relying on external servers via network requests. Running models on-device enables immediate decision-making, removes the need for an active internet connection, and can significantly lower infrastructure costs.

Throughout this series, our example is a model that decides when to prompt the user to upgrade to the paid plan, based on a set of device signals. The goal is to annoy fewer users while gaining more paid subscribers.

Step 1: Shipping a base-model with your app’s binary

We believe in the craft of beautiful, reliable, and fast mobile apps. Running machine learning models on-device makes your app responsive, snappy, and reliable. One aspect to consider is the first app launch, which is critical to prevent churn and get the user hooked on your app.

To ensure your app works out of the box right after installation, we recommend shipping your pre-trained CoreML file with your app. Part 1 of this series covers how to achieve this with Xcode.

Step 2: Check for new CoreML updates

Your iOS app needs to know when a new version of the machine learning file is available. This is as simple as regularly sending an empty network request to your server. Your server doesn’t need to be sophisticated; we initially started with a static file host (such as S3 or similar) that we update whenever we have a new model ready.

The response could use whatever versioning you prefer:

A version number of your most recent model
The timestamp your most recent model was trained
A checksum
A randomly generated UUID

The iOS client then compares the version of its most recently downloaded model with whatever the server responds with. Which scheme you choose depends on how you want to roll out, monitor, and version your machine learning models.

Over time, you will most likely want to reduce the number of network requests. Our approach piggybacks the model update check on the outcome collection we already use to train our machine learning models, and flushes events in batches to minimize overhead and increase efficiency.

Ideally, the server’s response already contains the download URL of the latest model. Here is an example response: https://krausefx.github.io/CoreMLDemo/latest_model_details.json

The above example is a little simplified, and we’re using the model’s file name as our version to identify each model.

You’ll also need to consider which app version is supported. In our case, a new ContextSDK version may implement additional signals that are used as part of our model. Therefore, we provide the SDK version as part of our initial polling request, and our server responds with the latest model version that’s supported.
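
As a rough sketch of that polling request, the SDK version could travel as a query parameter. The `sdk_version` parameter name and the `example.com` endpoint below are assumptions for illustration, not ContextSDK's actual API, and a plain static file host would simply ignore the parameter.

```swift
import Foundation

/// Builds the URL for the update check, passing the SDK version so the
/// server can answer with the newest model this client supports.
/// The "sdk_version" parameter name is hypothetical.
func modelUpdateCheckURL(baseURL: URL, sdkVersion: String) -> URL? {
    var components = URLComponents(url: baseURL, resolvingAgainstBaseURL: false)
    components?.queryItems = [URLQueryItem(name: "sdk_version", value: sdkVersion)]
    return components?.url
}

// Hypothetical endpoint and version, for illustration only.
let exampleCheckURL = modelUpdateCheckURL(
    baseURL: URL(string: "https://example.com/latest_model_details.json")!,
    sdkVersion: "1.2.0"
)
```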

First, we’re doing some basic scaffolding, creating a new ModelDownloadManager class:
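
A minimal version of that scaffolding might look like the following. The property names and the `DownloadedModels` folder inside Application Support are assumptions of this sketch; the demo app may organize things differently.

```swift
import Foundation

/// Minimal scaffolding: knows where to ask for updates and where
/// downloaded models live on disk.
final class ModelDownloadManager {
    /// Endpoint that describes the latest model.
    let modelDetailsURL: URL
    /// Folder that holds the models downloaded over-the-air.
    let modelsDirectory: URL
    private let fileManager: FileManager

    init(
        modelDetailsURL: URL = URL(string: "https://krausefx.github.io/CoreMLDemo/latest_model_details.json")!,
        modelsDirectory: URL? = nil,
        fileManager: FileManager = .default
    ) throws {
        self.modelDetailsURL = modelDetailsURL
        self.fileManager = fileManager
        if let directory = modelsDirectory {
            self.modelsDirectory = directory
        } else {
            // Application Support survives app restarts and is not purged
            // by the system the way the caches or temporary folders can be.
            let appSupport = try fileManager.url(
                for: .applicationSupportDirectory,
                in: .userDomainMask,
                appropriateFor: nil,
                create: true
            )
            // "DownloadedModels" is a hypothetical folder name.
            self.modelsDirectory = appSupport.appendingPathComponent("DownloadedModels", isDirectory: true)
        }
        try fileManager.createDirectory(at: self.modelsDirectory, withIntermediateDirectories: true)
    }
}
```

Passing a custom `modelsDirectory` keeps the class easy to exercise in tests without touching the real app container.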

And now to the actual code: Downloading the model details to check if a new model is available:
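
Here is one way the check could be sketched. It assumes the details file is a JSON object with a single `url` field (an assumption, check the example response for the real shape) and uses the model's file name as the version, as described above.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // only needed when building off Apple platforms
#endif

/// Server response. The `url` field name is an assumption of this sketch.
struct ModelDetails: Decodable {
    let url: URL
    /// The file name without its extension doubles as the version identifier.
    var version: String { url.deletingPathExtension().lastPathComponent }
}

/// Returns the details if they describe a model we don't have yet, else nil.
func newModelDetails(in responseData: Data, localVersion: String?) throws -> ModelDetails? {
    let details = try JSONDecoder().decode(ModelDetails.self, from: responseData)
    return details.version == localVersion ? nil : details
}

/// Fetches the details file and reports whether a new model is available.
func checkForModelUpdate(
    at detailsURL: URL,
    localVersion: String?,
    completion: @escaping (Result<ModelDetails?, Error>) -> Void
) {
    // Skip local caches so a stale copy can't hide a fresh model.
    let request = URLRequest(url: detailsURL, cachePolicy: .reloadIgnoringLocalCacheData)
    URLSession.shared.dataTask(with: request) { data, _, error in
        if let error = error {
            completion(.failure(error))
            return
        }
        completion(Result(catching: {
            try newModelDetails(in: data ?? Data(), localVersion: localVersion)
        }))
    }.resume()
}
```

Keeping the decoding and comparison in `newModelDetails(in:localVersion:)` separates the decision from the networking, so it can be tested with canned JSON.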

Step 3: Download the latest CoreML file

If a new CoreML model is available, your iOS app now needs to download the latest version. You can use any method of downloading the static file from your server:
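
As one possible sketch, `URLSession`'s download task writes the file to a temporary location, which we then move into our own folder. The function names here are ours, and the sketch keeps the remote file name because that name serves as the version identifier.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // only needed when building off Apple platforms
#endif

/// Downloads the model file and moves it into `directory`.
func downloadModel(
    from remoteURL: URL,
    into directory: URL,
    completion: @escaping (Result<URL, Error>) -> Void
) {
    URLSession.shared.downloadTask(with: remoteURL) { temporaryURL, response, error in
        do {
            if let error = error { throw error }
            guard let temporaryURL = temporaryURL,
                  (response as? HTTPURLResponse)?.statusCode == 200 else {
                throw URLError(.badServerResponse)
            }
            // The temporary file is deleted once this closure returns,
            // so it has to be moved before we report success.
            let storedURL = try storeDownloadedFile(
                at: temporaryURL,
                named: remoteURL.lastPathComponent,
                in: directory
            )
            completion(.success(storedURL))
        } catch {
            completion(.failure(error))
        }
    }.resume()
}

/// Moves a downloaded file into `directory`, replacing any file of the same name.
func storeDownloadedFile(at temporaryURL: URL, named fileName: String, in directory: URL) throws -> URL {
    let fileManager = FileManager.default
    try fileManager.createDirectory(at: directory, withIntermediateDirectories: true)
    let destination = directory.appendingPathComponent(fileName)
    if fileManager.fileExists(atPath: destination.path) {
        try fileManager.removeItem(at: destination)
    }
    try fileManager.moveItem(at: temporaryURL, to: destination)
    return destination
}
```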

Considering Costs

Depending on the size of your user base, infrastructure costs will be a big factor in how you implement the on-the-fly update mechanism.

For example, an app with 5 million active users and a CoreML file size of 1 MB would generate a total data transfer of 5 TB for a single rollout. If you were to use a simple AWS S3 bucket directly at $0.09 per GB of egress, this would cost about $450 for each model rollout (not including the free tier).
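
The arithmetic behind that estimate is short enough to write down. This sketch assumes every active user downloads the model exactly once and uses decimal units (1 GB = 1,000 MB); the function name is ours.

```swift
import Foundation

/// Egress cost of one model rollout, assuming each active user downloads
/// the model exactly once. Uses decimal units: 1 GB = 1,000 MB.
func rolloutEgressCost(activeUsers: Double, modelSizeMB: Double, pricePerGB: Double) -> Double {
    let totalGB = activeUsers * modelSizeMB / 1_000
    return totalGB * pricePerGB
}

// 5 million users x 1 MB = 5,000 GB (5 TB), at $0.09 per GB.
let s3RolloutCost = rolloutEgressCost(activeUsers: 5_000_000, modelSizeMB: 1, pricePerGB: 0.09)
print(String(format: "$%.2f per rollout", s3RolloutCost))  // prints "$450.00 per rollout"
```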

Since this series is about constantly rolling out new, improved challenger models, running various models in parallel, and iterating quickly, paying this amount for every rollout isn’t feasible.

One easy fix for us was to switch to Cloudflare R2, which is faster and significantly cheaper. The same numbers as above cost us less than $2, and would be completely free if we include the free tier.

Step 4: Compile the CoreML file on-device

After successfully downloading the CoreML file, you need to compile it on-device. While this sounds daunting, Apple made it a seamless, easy, and safe experience. Compiling the CoreML file on-device is a requirement and ensures that the file is optimized for the specific hardware it runs on.

You are responsible for the file management: CoreML writes the compiled model to a temporary location, so you need to move it to a permanent one yourself. In general, file management on iOS can be a little tedious once you cover all the various edge cases.
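
A minimal sketch of the compile-and-store step, using `MLModel.compileModel(at:)`. The helper names and the choice to name the compiled bundle after the downloaded file are assumptions of this sketch; newer SDKs also offer an asynchronous variant of the compile call.

```swift
import CoreML
import Foundation

/// Compiles a downloaded .mlmodel file and moves the result to a permanent location.
func compileAndStoreModel(at downloadedModelURL: URL, in directory: URL) throws -> URL {
    // CoreML writes the compiled .mlmodelc bundle to a temporary location.
    let temporaryCompiledURL = try MLModel.compileModel(at: downloadedModelURL)

    let fileManager = FileManager.default
    try fileManager.createDirectory(at: directory, withIntermediateDirectories: true)

    let permanentURL = permanentCompiledModelURL(for: downloadedModelURL, in: directory)
    if fileManager.fileExists(atPath: permanentURL.path) {
        // Replace an earlier compile of the same model.
        try fileManager.removeItem(at: permanentURL)
    }
    try fileManager.moveItem(at: temporaryCompiledURL, to: permanentURL)
    return permanentURL
}

/// Maps "<name>.mlmodel" to "<directory>/<name>.mlmodelc" so the compiled
/// bundle keeps the version encoded in the file name.
func permanentCompiledModelURL(for modelURL: URL, in directory: URL) -> URL {
    let name = modelURL.deletingPathExtension().lastPathComponent
    return directory.appendingPathComponent(name).appendingPathExtension("mlmodelc")
}
```

Once the compiled bundle is stored, the downloaded `.mlmodel` source file is no longer needed and can be deleted.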

You can also find the official Apple docs under “Downloading and Compiling a Model on the User’s Device”.

Step 5: Additional checks and clean-ups

We don’t yet have any logic to decide whether to download the new model. In this example, we’ll do something very basic: each model’s file name is a unique UUID. All we need to do is check whether a model with that exact file name is already available locally:
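
That check can be sketched as a simple file lookup. It assumes compiled models are stored as `<file name>.mlmodelc` inside a dedicated models folder, which is a convention of this sketch rather than something CoreML enforces.

```swift
import Foundation

/// Returns true if a compiled model matching the remote file name is already on disk.
/// Assumes compiled models are stored as "<name>.mlmodelc" in `modelsDirectory`.
func isModelAvailableLocally(remoteModelURL: URL, in modelsDirectory: URL) -> Bool {
    let version = remoteModelURL.deletingPathExtension().lastPathComponent
    let compiledURL = modelsDirectory
        .appendingPathComponent(version)
        .appendingPathExtension("mlmodelc")
    return FileManager.default.fileExists(atPath: compiledURL.path)
}
```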

Of course, we want to be a good citizen and delete all older models from local storage. For this sample project it is also a requirement: because we’re using UUIDs for versioning, the iOS client has no way of knowing which version is newer. For sophisticated systems, it’s quite common that the client lacks this information, as the backend may be running multiple experiments and challenger models in parallel across all clients.
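
A clean-up pass along those lines could be sketched as follows. It assumes the models folder contains nothing but model files, since everything except the one item to keep is deleted.

```swift
import Foundation

/// Deletes every item in `modelsDirectory` except the one named `fileNameToKeep`.
/// Returns the names of the removed items.
@discardableResult
func removeOldModels(in modelsDirectory: URL, keeping fileNameToKeep: String) throws -> [String] {
    let fileManager = FileManager.default
    let contents = try fileManager.contentsOfDirectory(
        at: modelsDirectory,
        includingPropertiesForKeys: nil
    )
    var removedNames: [String] = []
    for item in contents where item.lastPathComponent != fileNameToKeep {
        try fileManager.removeItem(at: item)
        removedNames.append(item.lastPathComponent)
    }
    return removedNames
}
```

Running this only after the new model has been compiled and stored successfully means a failed download never leaves the app without a downloaded model.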

Step 6: Execute the newly downloaded CoreML file instead of the bundled version

Now all that’s left is to automatically switch between the CoreML file we bundled with our app and the file we downloaded from our servers, always preferring the downloaded one when it exists.

In our ModelDownloadManager, we want an additional function that exposes the model we want to use. This can either be the bundled CoreML model or the CoreML model downloaded most recently over-the-air.

There are almost no changes needed to our code base from part 1. Instead of using the MyFirstCustomModel initializer directly, we now need to use the newly created .latestModel() method.
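
The selection logic behind such a method can be sketched as a lookup that falls back to the bundled model. This version only resolves the URL of the compiled model and assumes the clean-up step leaves at most one downloaded model in the folder; the function name is ours.

```swift
import Foundation

/// Returns the compiled model to load: the downloaded one if present,
/// otherwise the model bundled with the app.
/// Assumes at most one downloaded .mlmodelc exists in the folder.
func latestCompiledModelURL(downloadedModelsDirectory: URL, bundledModelURL: URL) -> URL {
    let downloaded = (try? FileManager.default.contentsOfDirectory(
        at: downloadedModelsDirectory,
        includingPropertiesForKeys: nil
    )) ?? []
    return downloaded.first { $0.pathExtension == "mlmodelc" } ?? bundledModelURL
}
```

The resulting URL can then be passed to `MLModel(contentsOf:)`, or to the `init(contentsOf:)` initializer of the class Xcode generates for your model. If loading the downloaded model throws, falling back to the bundled URL is a sensible safety net.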

Step 7: Decide when you want to trigger the update check

The only code left to write is triggering the update check. When you do that will depend heavily on your app and on how urgently you want to update your models.
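
One common pattern, shown here purely as an illustration, is to run the check whenever the app becomes active but throttle it to a minimum interval. The type name, the `UserDefaults` key, and the one-hour interval in the usage example are all assumptions of this sketch.

```swift
import Foundation

/// Remembers when the last update check ran and decides whether another is due.
struct UpdateCheckThrottle {
    let minimumInterval: TimeInterval
    let defaults: UserDefaults

    // Hypothetical storage key.
    private static let lastCheckKey = "lastModelUpdateCheck"

    /// True if no check has been recorded yet or the interval has elapsed.
    func shouldCheck(now: Date = Date()) -> Bool {
        guard let lastCheck = defaults.object(forKey: Self.lastCheckKey) as? Date else {
            return true
        }
        return now.timeIntervalSince(lastCheck) >= minimumInterval
    }

    /// Call after a check has been started.
    func recordCheck(at date: Date = Date()) {
        defaults.set(date, forKey: Self.lastCheckKey)
    }
}
```

For example, with `UpdateCheckThrottle(minimumInterval: 3600, defaults: .standard)` you would call `shouldCheck()` when the app becomes active, start the update check if it returns true, and then call `recordCheck()`.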

Demo App

As part of this series, we’ve built out a demo app that shows all of this end-to-end in action. You can find it available here on GitHub: https://github.com/KrauseFx/CoreMLDemo

What’s next?

Today we’ve covered how you can roll out new machine learning models directly to your users’ iPhones, running them directly on their ML-optimized hardware. Using this approach, you can make decisions on what type of content or prompts you show based on the user's context, powered by on-device machine learning execution. Updating CoreML files quickly, on-the-fly without going through the full App Store release cycle, is critical to quickly react to changing user-behaviors when introducing new offers in your app, and to constantly improve your app, be it increasing your conversion rates, reducing annoyances and churn, or optimizing other parts of your app.

This is just the beginning: Next up, we will talk about how to manage the rollout of new ML models, in particular:

How to safely roll out new models: monitor, pause, or roll back faulty models
How to monitor performance of deployed models
How to reliably compare performance between models, and the baseline performance

We’re excited to share more of what we’ve learned while building ContextSDK to power hundreds of machine learning models distributed across more than 25 million devices. As a leading on-device AI startup, ContextSDK offers solutions that enable mobile apps to optimize engagement and user experience.
