Local vs cloud
Every operation Midcore performs — running a robotics policy, embedding a document, generating a PDF report, charting a Monte Carlo — is described in a single canonical catalog. The Compute Router consults that catalog plus your machine’s live capability + your preferences and decides where to run.
The five tiers
| Tier | Where it runs | Examples |
|---|---|---|
| LOCAL_ONLY | Always your machine. Never billed. | Physics preview, image thumbnailing, password-derived key generation, PKB capsule export. |
| PREFER_LOCAL | Your machine when it can. Cloud when local capability is missing. | Monte Carlo simulation, document embedding, PDF extraction, OCR. |
| HYBRID | Meaningful local fast-path AND higher-fidelity cloud. User picks per-op. | τ₀-WM policy inference (local if you have an RTX 4080+; cloud otherwise), large LLM inference (local small model vs cloud premium model). |
| PREFER_CLOUD | Cloud first; local exists only as degraded fallback. | Update checks, light external API pass-through. |
| CLOUD_ONLY | Always cloud. No meaningful local path. | Multi-tenant DB writes, Stripe webhooks, external paid LLM APIs (Anthropic, OpenAI), τ₀-WM cloud fine-tune. |
The decision flow
When you invoke an operation, the router resolves a decision in roughly this order:
- Is the tier fixed? LOCAL_ONLY → LOCAL. CLOUD_ONLY → CLOUD. Done.
- Is the data sensitive? Operations flagged sensitive (passwords, biometric, raw camera feed) go LOCAL regardless of any preference.
- Did the user override this op? If your Account → Compute panel has a per-op override set, that wins (subject to the platform tier).
- What’s the default mode? If you’re on Cloud-first → CLOUD. Otherwise probe local capability against the op’s requirements.
- Can your machine actually run it? The router compares the op’s minimum requirements (CPU cores, free RAM, WebGPU, local model weights cached) against the live capability probe. If yes → LOCAL. If no → CLOUD (with the hold/charge path below).
- Hold + dispatch. For CLOUD, the router pre-flights the cost, places a balance hold, and then dispatches the operation. If the hold fails with 402 (insufficient balance) and a local fallback exists, the router demotes to LOCAL automatically.
- Finalize. When the operation completes, the router calls finalize: the hold becomes a debit + a signed receipt. The receipt lands on your project audit ledger.
What determines local capability
- CPU cores. Some ops declare a minimum (Monte Carlo: 4; some heavier sims: 8).
- Free RAM. Same idea — the router won’t fire a 4 GB local LLM on a machine with 2 GB free.
- Free disk. Per-op model weights have to live somewhere. The capability probe measures the userData partition’s free space.
- WebGPU. Required by a subset of ops (notably image gen, some neural inference). If your renderer doesn’t expose
navigator.gpu, those ops route to cloud. - Local-model readiness. For ops requiring a downloaded weight (τ₀-WM, embedding models, OCR), the Compute panel reports cached/not-cached per op. Until a weight is cached, the op can’t route locally.
Hybrid: choosing local vs cloud per op
HYBRID ops have a meaningful local path and a meaningful cloud path. Examples:
- τ₀-WM policy inference. Local: run the 5.5 B model on your RTX 4080+ at ~140-220 ms per chunk. Cloud: run it on our A100s with a flat per-call charge, useful when your machine doesn’t have the GPU.
- Embedding batch. Local: bge-small-en in ONNX, 30k tokens/sec. Cloud: larger model with slightly higher recall, per-token charge.
- Image generation. Local: small Stable Diffusion through WebGPU. Cloud: higher-fidelity SDXL via metered call.
For HYBRID ops the router defaults to your preference— Local-first or Cloud-first. You can also pin specific ops in the Per-operation overrides table.
The local-first dial isn’t free of trade-offs
Next
Now that you know where things run, Pricing breaks down what cloud compute costs in cents and per-unit.