We need one Azure subscription (`sga-ai`), a parallel PHI-ready subscription (`sga-ai-prod`), and an Anthropic Enterprise account with a BAA. Once the subscriptions exist and a service principal has Owner, our team provisions everything else by code — Foundry, Postgres, Blob, Search, Document Intelligence, Container Apps — without further IT cycles. Two BAA paths (Microsoft + Anthropic) are what unblock Phase B (clinical photos, patient comms, treatment data) on a realistic timeline.
- `sga-ai` — dev/sandbox environment, non-PHI by policy
- `sga-ai-prod` — PHI-ready environment, provisioned now so we have a promotion target ready when prototypes graduate (nothing in it until then)
- Team access to both `sga-ai` and `sga-ai-prod`, with prod access gated behind an approval workflow if IT prefers
- Service principal `sp-sga-ai-deploy` with Owner on `sga-ai` and `sga-ai-prod`

Quotas requested on `sga-ai`:

| Resource | Limit |
|---|---|
| Claude Sonnet 4.6 (workhorse) | 500k TPM |
| Claude Opus 4.7 (deep reasoning) | 200k TPM |
| Claude Haiku 4.5 (high-volume) | 1M TPM |
| GPT-5 / o-series | 200k TPM |
| text-embedding-3-large | 500k TPM |
| Azure AI Search | Standard SKU |
| Azure AI Document Intelligence | S0 |
| Azure Container Apps | 20 vCPU / 40 GB |
| Azure Database for PostgreSQL Flexible | enabled |
| Storage accounts | default |
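The TPM (tokens-per-minute) limits in the table are enforced server-side, but it helps to throttle client-side so bursts fail fast instead of hitting 429s. A minimal sketch of a fixed-window limiter; the class name and thresholds are illustrative, not an Azure feature:

```python
import time

class TPMLimiter:
    """Client-side guard so bursts stay under a deployment's tokens-per-minute quota."""

    def __init__(self, tpm_limit: int):
        self.tpm_limit = tpm_limit
        self.window_start = time.monotonic()
        self.used = 0

    def try_spend(self, tokens: int) -> bool:
        now = time.monotonic()
        if now - self.window_start >= 60:  # start a fresh 1-minute window
            self.window_start = now
            self.used = 0
        if self.used + tokens > self.tpm_limit:
            return False  # caller should back off and retry next window
        self.used += tokens
        return True

limiter = TPMLimiter(500_000)        # Sonnet quota from the table above
print(limiter.try_spend(400_000))    # True: fits in the window
print(limiter.try_spend(200_000))    # False: would exceed 500k TPM
```

A sliding-window or token-bucket variant would smooth bursts better; the fixed window keeps the sketch short.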
- Monthly budget on `sga-ai` with alerts at 50/80/100% to my email. Hard cap if our agreement supports it. (Expect this to need a bump to ~$1,500–2,000/month once active development is underway.) Prod budget set later when something is ready to promote.
- `sga-ai` (dev) RGs (`rg-sga-ai-dev-*`): public endpoints allowed, HIPAA/HITRUST blueprint not applied. Speed of iteration matters; PHI excluded by policy and team discipline. Written confirmation that no PHI is to be stored in `sga-ai`.
- `sga-ai-prod` RGs (`rg-sga-ai-prod-*`): full HIPAA/HITRUST blueprint applied from day 1. Private endpoints required, customer-managed keys on storage, audit logging mandatory throughout `sga-ai-prod`.
- Add `sp-sga-ai-deploy` as a member of the relevant Power BI workspaces so it can execute DAX queries against existing semantic models. This preserves the DAX measure library already built (powerbi-bridge, registered semantic models, provider-type thresholds, days-worked rules) — Foundry agents call into those measures rather than redefining them in raw SQL.
- Microsoft's BAA covers Anthropic models on Azure (via Foundry). Anthropic's direct BAA covers Claude via the native API and Claude Code. Having both means we're not single-vendor on the legal side, and Claude Code itself runs under a BAA — which it can't if we're only on Azure.
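The dev/prod resource-group naming convention (`rg-sga-ai-dev-*`, `rg-sga-ai-prod-*`) makes the PHI boundary mechanically checkable. A hypothetical guard, not an Azure Policy API; the function and names are illustrative:

```python
import re

# PHI-capable workloads may only land in blueprinted prod RGs;
# non-PHI workloads may go in either environment.
DEV_RG = re.compile(r"^rg-sga-ai-dev-[a-z0-9-]+$")
PROD_RG = re.compile(r"^rg-sga-ai-prod-[a-z0-9-]+$")

def placement_allowed(rg_name: str, contains_phi: bool) -> bool:
    if contains_phi:
        return bool(PROD_RG.match(rg_name))  # PHI only in prod RGs
    return bool(DEV_RG.match(rg_name) or PROD_RG.match(rg_name))

print(placement_allowed("rg-sga-ai-dev-search", contains_phi=False))  # True
print(placement_allowed("rg-sga-ai-dev-search", contains_phi=True))   # False
```

In practice the same rule would be enforced with Azure Policy on the prod subscription; this is just the decision logic spelled out.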
LLM usage dominates everything else. Infrastructure is ~$200–300/mo flat; model spend scales with how hard we push it. Plan ~$100k/year all-in for year one across Azure + Anthropic, climbing to $200–250k/year at mature production scale — still competitive with the cost of one full-time engineer delivering the same throughput.
Cost scenarios:
- Months 1–2 · exploration, light load
- Month 3+ · real workloads, 6 active users
- Full network usage at maturity
| Resource | Monthly |
|---|---|
| Postgres Flexible Server (Burstable B2ms) | ~$50 |
| Container Apps (scale-to-zero, light use) | ~$30 |
| Blob Storage (100 GB Hot + transfer) | ~$3 |
| AI Search (Basic) | ~$75 |
| Document Intelligence (S0, pay-per-page) | ~$20 |
| Key Vault | <$1 |
| App Insights + Log Analytics (5 GB) | ~$15 |
| Container Registry (Basic) | ~$5 |
| Infrastructure subtotal | ~$200 |
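The subtotal is just the rounded line items summed; a quick arithmetic check:

```python
# Dev-phase infrastructure line items from the table above, in $/month.
items = {
    "postgres_flexible_b2ms": 50,
    "container_apps": 30,
    "blob_storage_100gb": 3,
    "ai_search_basic": 75,
    "document_intelligence_s0": 20,
    "key_vault": 1,          # listed as <$1; rounded up
    "app_insights_log_analytics": 15,
    "container_registry_basic": 5,
}
print(sum(items.values()))   # 199 → quoted as ~$200/mo
```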
Mix-weighted across ~1.5B tokens/month combined usage.
| Model | Volume | Monthly |
|---|---|---|
| Claude Sonnet 4.6 (workhorse, 60%) | ~900M tokens | $2,500–4,000 |
| Claude Opus 4.7 (deep reasoning, 10%) | ~150M tokens | $1,500–2,500 |
| Claude Haiku 4.5 (high-volume cheap, 25%) | ~400M tokens | $400–700 |
| GPT-5 / o-series (fallback, 5%) | ~80M tokens | $300–500 |
| text-embedding-3-large | ~30M tokens | ~$4 |
| Total model spend | ~1.5B | $4.7–7.7k |
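The volumes and total in this table follow from the stated mix percentages and per-model cost ranges; a sketch that reproduces them:

```python
# Token volumes from the stated mix of ~1.5B tokens/month.
total_tokens = 1.5e9
mix = {"sonnet": 0.60, "haiku": 0.25, "opus": 0.10, "gpt5": 0.05}
volumes_m = {m: total_tokens * p / 1e6 for m, p in mix.items()}
# sonnet 900M, haiku 375M (~400M in the table), opus 150M, gpt5 75M (~80M)

# Monthly cost ranges per row: Sonnet, Opus, Haiku, GPT-5, embeddings.
cost_ranges = [(2500, 4000), (1500, 2500), (400, 700), (300, 500), (4, 4)]
low = sum(lo for lo, _ in cost_ranges)
high = sum(hi for _, hi in cost_ranges)
print(low, high)   # 4704 7704 → quoted as $4.7–7.7k
```

Haiku and GPT-5 volumes are rounded up in the table; the cost total is exact.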
Every Azure service we're asking for, explained in non-IT terms. What it does, what it replaces in our current stack, and a concrete example of how we'd use it. Skim the headers; dive in where you want detail.
In Foundry, a model is exposed as a named deployment (e.g. `sonnet-46-default` with 500k TPM), and your code calls that endpoint.

**Scope discipline.** If asked "is that everything?", these are deliberately deferred: