Air-gapped migration with a local model: Ollama on a cloud VM

June 12, 2026

Many of the organisations that most need to migrate from Azure DevOps to GitHub Actions are exactly the ones that cannot send pipeline source to a public LLM API: banks, defence, health, anyone under data-residency or export rules. Bifrost is built for them. Its LLM layer is a single LlmProvider trait, and air-gap mode routes every request to a local provider only — with a test target that asserts zero external calls. The local provider of choice is Ollama, running a small model on a machine inside your own network.

This post is the practical version: what to stand up, CPU versus GPU, how it connects, and a real scenario for how it should be used.

What “air-gapped” actually means here

Air-gap does not have to mean “no LLM.” In most regulated environments it means in-network providers only: you can reach a model running on a VM in your own VPC, or a private, in-tenancy frontier endpoint, but not the public internet. Bifrost models this directly. A provider declares whether it is_local; in air-gap mode the router silently skips every non-local provider, so an external frontier model never receives pipeline data and no external call is ever made. The assistant, the bulk gap-fills and everything else run locally.
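
As a rough illustration of that rule, here is a Python sketch of an air-gap filter. It is not Bifrost's own code: the Provider class and both function names are hypothetical, and only the is_local flag comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """Hypothetical stand-in for a configured LLM provider."""
    name: str
    is_local: bool  # the provider's own declaration, as described above


def eligible_providers(providers, air_gap):
    """Return the providers a router may consider for a request."""
    if not air_gap:
        return list(providers)
    # Air-gap mode: drop every non-local provider before any request is built.
    return [p for p in providers if p.is_local]


def pick_provider(providers, air_gap):
    """Pick the first eligible provider; fail closed if there is none."""
    candidates = eligible_providers(providers, air_gap)
    if not candidates:
        raise RuntimeError(f"no eligible provider (air_gap={air_gap})")
    return candidates[0]
```

With a public frontier and a local Ollama provider configured, pick_provider(..., air_gap=True) returns the Ollama one. With only external providers it raises instead of falling back, which is the fail-closed behaviour an air-gapped deployment would presumably want.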

Figure: LLM routing policy and the air-gap toggle.

The setup: Ollama on an in-network VM

Stand up one VM inside your network (a cloud VPC subnet with no egress, or on-prem), install Ollama, and pull a small instruct or coding model. Bifrost talks to it over Ollama’s HTTP API.

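# On the in-network VM. The install and pull need internet access, so run them while provisioning (before egress is closed) or from an internal mirror.
curl -fsSL https://ollama.com/install.sh | sh
ollama pull <a-small-instruct-model>     # a few GB, quantised
# Ollama listens on 127.0.0.1 by default; set OLLAMA_HOST so other hosts in the network can reach it.
# If the installer already started Ollama as a service, set OLLAMA_HOST on that service instead of running serve by hand.
OLLAMA_HOST=0.0.0.0:11434 ollama serve    # exposes http://<vm>:11434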

Point Bifrost at it — either as the first-class Ollama provider or as a generic OpenAI-compatible endpoint (Ollama serves both):

export OLLAMA_BASE_URL="http://<vm>:11434"
export BIFROST_AIR_GAP=1                   # force local-only routing

Or add it on the Connections page as an LLM provider (Ollama, or “OpenAI-compatible” pointed at http://<vm>:11434/v1), marked local so the router treats it as air-gap-eligible. Secrets are stored as references, never values.
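
Before wiring the endpoint into Bifrost, it can help to confirm that it is reachable from the Bifrost host and that the model was actually pulled. The sketch below uses Ollama's GET /api/tags endpoint, which lists the models available on the server. The helper names are illustrative, and it assumes OLLAMA_BASE_URL is set as above.

```python
import json
import urllib.request


def model_names(tags_payload):
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in tags_payload.get("models", [])]


def list_models(base_url, timeout=5.0):
    """Ask the Ollama server which models it has available locally."""
    url = f"{base_url.rstrip('/')}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return model_names(json.load(resp))


# Example call, run from the Bifrost host:
#   import os
#   print(list_models(os.environ["OLLAMA_BASE_URL"]))
```

A connection error here usually means the server is bound to loopback only or a firewall rule is in the way; an empty list means the server is up but no model has been pulled yet.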

CPU or GPU — both are useful

Bifrost’s grounded gap-fill is a small, bounded task: it hands the model the source snippet, the Importer’s converted output, and the specific failure, and asks it to fill that gap. That is the key to making a small model viable.
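
To make "small, bounded task" concrete, here is a sketch of what such a request could look like against Ollama's POST /api/generate endpoint. The prompt wording and the function names are assumptions for illustration; Bifrost's actual prompt is not shown in this post. The request fields (model, prompt, stream, options) and the "response" field in the reply are part of Ollama's API.

```python
import json
import urllib.request


def build_gap_fill_request(model, source_snippet, converted_output, failure):
    """Build an /api/generate payload containing only the three grounded inputs."""
    prompt = (
        "Fill only the gap described below. Do not change anything else.\n\n"
        f"### Azure DevOps source\n{source_snippet}\n\n"
        f"### Converted GitHub Actions output\n{converted_output}\n\n"
        f"### Failure to fix\n{failure}\n"
    )
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,                 # one JSON reply instead of a token stream
        "options": {"temperature": 0},   # favour repeatable output
    }


def generate(base_url, payload, timeout=120.0):
    """Send the payload to the local Ollama server and return the generated text."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

The point of the shape is that the model sees one snippet, one conversion and one failure, never the whole repository, so the context stays short enough for a small quantised model.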

CPU. A small quantised model (think a few billion parameters at 4-bit) runs on an ordinary multi-core VM. Throughput is modest, on the order of seconds per gap, but a portfolio is converted in the background and reviewed afterwards, so latency rarely matters. This is the cheapest air-gap option and needs no special hardware.
GPU. Add a single mid-range GPU and the same model runs many times faster, and you can step up to a larger local model for the harder, classic-pipeline gaps. Use a GPU when you have thousands of pipelines, tight migration windows, or want a bigger model for reasoning-heavy conversions.
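
A back-of-the-envelope estimate is often enough to choose between the two. The helper below is a hypothetical sketch, and the inputs in the example are placeholders rather than measurements: 1,800 is the portfolio size from the scenario later in this post, while five gaps per pipeline and four seconds per gap are made-up values to replace with your own.

```python
def estimated_hours(pipelines, gaps_per_pipeline, seconds_per_gap, parallel_requests=1):
    """Estimate wall-clock hours for a bulk gap-fill run."""
    if parallel_requests < 1:
        raise ValueError("parallel_requests must be at least 1")
    total_seconds = pipelines * gaps_per_pipeline * seconds_per_gap / parallel_requests
    return total_seconds / 3600


# Placeholder inputs, not benchmarks:
print(estimated_hours(1800, 5, 4))  # 10.0
```

If the result fits comfortably in a night or a weekend, a CPU VM is probably enough; if it does not fit the migration window, that is the signal to add a GPU.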

Either way the rule is the same: route bulk, mechanical fills to the small local model, and reserve the heavier model — local GPU, or an in-network private frontier — for the hard reasoning. That is exactly what the routing policy expresses.
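
One way to picture such a policy is as a table from task class to an ordered list of preferred providers, with the air-gap check applied on top. This is a sketch under assumptions: the task classes ("bulk", "hard"), the provider names and the route function are all hypothetical, not Bifrost's configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """Hypothetical stand-in for a configured LLM provider."""
    name: str
    is_local: bool


# Hypothetical policy: task class -> provider names in order of preference.
POLICY = {
    "bulk": ["ollama-small"],
    "hard": ["public-frontier", "private-frontier", "ollama-large"],
}


def route(task_class, providers, air_gap):
    """Return the first preferred provider that is configured and allowed."""
    by_name = {p.name: p for p in providers}
    for name in POLICY[task_class]:
        provider = by_name.get(name)
        if provider is None:
            continue  # listed in the policy but not configured here
        if air_gap and not provider.is_local:
            continue  # air-gap excludes anything external
        return provider
    raise LookupError(f"no provider available for task class {task_class!r}")
```

With air-gap on, a "hard" task skips the public frontier and lands on the private in-network endpoint if one is configured, otherwise on the larger local model; "bulk" tasks always go to the small local model.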

Other local and in-network options

Ollama is the simplest, but the trait is the point — nothing in orchestration is tied to it:

llama.cpp / vLLM behind the OpenAI-compatible provider, for higher-throughput serving.
A private, in-tenancy frontier endpoint (Azure OpenAI on a private endpoint, Vertex in your project, or a Bedrock-style gateway), added as a provider and marked local/in-network. In air-gap mode these are eligible precisely because traffic to them never leaves your network.
A mix: the small Ollama model for bulk, a private frontier for the few hard gaps. The router picks per task class; air-gap still excludes anything truly external.

A real-life scenario

A bank is moving 1,800 Azure DevOps pipelines to GitHub Enterprise Cloud under a rule that no pipeline definition may touch a public AI service.

Stand up the model. Platform engineering provisions one GPU VM in the bank’s VPC, installs Ollama, pulls a small quantised instruct model, and confirms the subnet has no public egress.
Connect and lock down. Bifrost is deployed in the same VPC. On Connections they add the Ollama provider (local) and the Azure DevOps source. They set BIFROST_AIR_GAP=1 and lock it. The air-gap test asserts zero external calls; the audit log records the posture on every job.
Assess and forecast. They run the audit — the portfolio heatmap, the Assessment of source inventory, the Forecast of GitHub cost, and the Coverage matrix — all computed deterministically, no model involved.
Convert locally. Bulk, mechanical gap-fills run against the small local model overnight. The handful of classic-pipeline gaps that need real reasoning are routed to a larger model on the same GPU. Not one byte leaves the network.
Review and deliver. Engineers review each proposal in the three-pane diff, ask the grounded assistant — also local — about cost or coverage, approve or edit, and Bifrost opens pull requests. The change board gets the per-project PDF report.

The migration runs at portfolio scale, semantically assisted, fully reviewed — and provably inside the bank’s walls.
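
For a sense of what a zero-external-calls check can look like in a test suite, here is a Python sketch that fails as soon as code tries to open a socket outside an allowed network range. It is an illustration, not Bifrost's own test target; the function names and the CIDR in the usage note are assumptions.

```python
import ipaddress
import socket
from contextlib import contextmanager
from unittest import mock


def host_is_allowed(host, allowed_networks):
    """True only if host is a literal IP inside one of the allowed networks."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Hostnames are refused outright in this sketch; use IP literals.
        return False
    return any(ip in net for net in allowed_networks)


@contextmanager
def forbid_external_connections(allowed_cidrs):
    """Raise AssertionError on any socket connect outside allowed_cidrs."""
    nets = [ipaddress.ip_network(cidr) for cidr in allowed_cidrs]
    real_connect = socket.socket.connect

    def guarded_connect(sock, address):
        # AF_INET/AF_INET6 addresses are tuples whose first item is the host.
        if isinstance(address, tuple) and not host_is_allowed(address[0], nets):
            raise AssertionError(f"external call attempted: {address[0]!r}")
        return real_connect(sock, address)

    with mock.patch.object(socket.socket, "connect", guarded_connect):
        yield
```

A test would run a conversion job inside `with forbid_external_connections(["10.0.0.0/8"]):` using the VPC's real range. The guard only covers socket.connect in the current process, so it complements network-level egress controls and does not replace them.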

How it should be used

Small model for bulk, bigger for hard. Don’t reach for a large model to fill a PublishBuildArtifacts gap; a small local model does it. Save the GPU/larger model for the classic-pipeline tail.
Air-gap on, and locked. Turn it on and lock it for regulated work; let the zero-egress test be part of your evidence.
Review-first, always. The local model fills gaps from the diff and explains them; a human still approves every change before a PR. The model never scores risk and never prices cost — those stay deterministic.

If you can run one small model on one machine inside your network, you can run a reviewed, documented, portfolio-scale migration without any pipeline data ever leaving it.