iJarvis Compute Fabric · Tampa, FL · est. 2026

Rent a shard,
not a queue.

A five-node Blackwell fleet, hand-routed by the operator who built it. Pay per token, reserve a node, or take the fleet. Humans and agents both buy here.

01 · The fleet

Five nodes on a single Tailscale WireGuard mesh. We don't pretend it's a region. We name them.

spark-alpha · Online
NVIDIA DGX Spark FE
GB10 Grace Blackwell · 128 GB UMA
Util 62%
FP4 native · tampa-1
spark-bravo · Online
NVIDIA DGX Spark FE
GB10 Grace Blackwell · 128 GB UMA
Util 48%
FP4 native · tampa-1
ijarvis-slim · Reserved
NVIDIA RTX PRO 6000
Blackwell · 96 GB GDDR7
Util 91%
Anchor: NPCx · tampa-1
ijarvis-thick · Online
NVIDIA RTX 5090
Blackwell · 32 GB GDDR7
Util 34%
Ubuntu 24.04 · tampa-1
ijarvis-thin · Online
NVIDIA RTX 5090
Blackwell · 32 GB GDDR7
Util 22%
Bookable · tampa-1
Aggregate VRAM 416 GB · Nodes 5 · Mesh latency ~1 ms LAN · Updated just now
Demo telemetry · live data at status.ishard.us when ready · machine-readable at fleet.json
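fleet.json is not published yet, so the schema below is an assumption, not the real file format. A sketch of how an agent might filter the fleet for headroom once the file exists:

```python
import json

# Hypothetical fleet.json schema -- field names are illustrative, not confirmed.
FLEET_JSON = """
[
  {"name": "spark-alpha",   "status": "online",   "util": 0.62, "memory_gb": 128},
  {"name": "spark-bravo",   "status": "online",   "util": 0.48, "memory_gb": 128},
  {"name": "ijarvis-slim",  "status": "reserved", "util": 0.91, "memory_gb": 96},
  {"name": "ijarvis-thick", "status": "online",   "util": 0.34, "memory_gb": 32},
  {"name": "ijarvis-thin",  "status": "online",   "util": 0.22, "memory_gb": 32}
]
"""

def nodes_with_headroom(fleet: list[dict], max_util: float = 0.5) -> list[str]:
    """Return online node names whose utilization is below max_util."""
    return [n["name"] for n in fleet
            if n["status"] == "online" and n["util"] < max_util]

fleet = json.loads(FLEET_JSON)
print(nodes_with_headroom(fleet))
```

The same filter an Open Shard router would apply: reserved nodes drop out, busy nodes drop out, the rest are candidates.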
02 · Why iShard

What you can't buy from a hyperscaler.

01 / Silicon

Blackwell, not last gen.

Every node is GB10 or Blackwell. FP4 is native. The PRO 6000 ships 96 GB of GDDR7 in a single card. The DGX Sparks each carry 128 GB unified across CPU and GPU. Most providers still serve generic workloads on shared H100 pools.

02 / Topology

A heterogeneous mesh.

UMA Sparks for huge contiguous models. PRO 6000 for production-grade dedicated workloads. RTX 5090s for fast small-model serving. Routing is decided by the workload, not by which bin had a free slot. A 70B model in FP8 fits on one Spark; in FP16 it takes the Spark pair — no sharding overhead unless you want it.
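The fit claim is plain arithmetic. A back-of-envelope sketch (weights only; KV cache and activations add on top of this):

```python
def weight_footprint_gb(params_b: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB: parameter count (billions) x bytes each.
    Ignores KV cache, activations, and framework overhead."""
    return params_b * bytes_per_param

BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}

SPARK_UMA_GB = 128  # one DGX Spark's unified memory

for quant, bpp in BYTES_PER_PARAM.items():
    gb = weight_footprint_gb(70, bpp)
    where = "one Spark" if gb <= SPARK_UMA_GB else "Spark pair"
    print(f"70B @ {quant}: ~{gb:.0f} GB -> {where}")
```

70B at FP16 is ~140 GB of weights alone, which is why that workload lands on the 256 GB pair rather than a single 128 GB node.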

03 / Operations

One operator, on call.

iJarvis owns the silicon, the rack, the network, the OS, the inference layer, and the routing logic. When something breaks, the human who built it picks up. There is no L1 ticket queue. There is also no datacenter rent in our COGS — that is why our hourly rate is what it is.

03 · Pricing

Three tiers. No upsell. No PDF quote.

OPEN SHARD · per-token

Pay-per-token.

Drop in via OpenAI-compatible endpoint. We route to whichever Blackwell card has headroom. Best-effort, no reservation.

Llama 3.3 70B class · $0.55/M tok
DeepSeek V3 class · $1.10/M tok
Llama 405B class · $2.80/M tok
Custom checkpoint · +15% surcharge
  • OpenAI-compatible /v1/chat/completions
  • Standard open-weight catalog
  • Streaming, function calling, JSON mode
  • $10 credit on signup
SLA: Best-effort. No 99.x uptime commitment. Status page is honest. Use this tier for non-critical workloads or to qualify before reserving.
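A client-side cost estimate at the Open Shard rates listed above. The model-class keys are illustrative labels for this sketch, not API identifiers:

```python
# Open Shard per-million-token rates, copied from the pricing table above.
RATE_PER_M_TOK = {
    "llama-3.3-70b-class": 0.55,
    "deepseek-v3-class": 1.10,
    "llama-405b-class": 2.80,
}
CUSTOM_CHECKPOINT_SURCHARGE = 0.15  # +15% on any listed rate

def estimate_cost(model_class: str, tokens: int, custom_checkpoint: bool = False) -> float:
    """Estimated USD cost for a given token count at Open Shard rates."""
    rate = RATE_PER_M_TOK[model_class]
    if custom_checkpoint:
        rate *= 1 + CUSTOM_CHECKPOINT_SURCHARGE
    return tokens / 1_000_000 * rate

# 20M tokens on the 70B class:
print(f"${estimate_cost('llama-3.3-70b-class', 20_000_000):.2f}")
```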
DEDICATED SHARD · Recommended

Reserve a node.

Lock a specific GPU end-to-end. Your model, your quant, your batch settings. We do not share the card with anyone else while you hold it.

RTX 5090 (32 GB) · $1.50/hr
DGX Spark (128 GB UMA) · $2.50/hr
RTX PRO 6000 (96 GB) · $3.50/hr
Spark Pair (256 GB UMA) · $4.75/hr
  • Bring-your-own model + quant
  • vLLM tuning to your batch profile
  • Hourly billing, no commit
  • Slack channel with the operator
  • FP4 / FP8 / INT8 / FP16 routing
SLA: Best-effort 99.0% during the reserved window. Single-site, single-operator. Disclosed up front. Pager rotation: one human.
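A quick way to pick between this tier and Open Shard: at the listed rates, a reserved node wins once sustained throughput crosses a break-even point. A sketch:

```python
def breakeven_tokens_per_hour(hourly_usd: float, per_m_tok_usd: float) -> float:
    """Tokens per hour above which a dedicated node is cheaper than per-token."""
    return hourly_usd / per_m_tok_usd * 1_000_000

# DGX Spark at $2.50/hr vs the 70B-class rate of $0.55/M tok:
tok_per_hr = breakeven_tokens_per_hour(2.50, 0.55)
print(f"{tok_per_hr:,.0f} tok/hr (~{tok_per_hr / 3600:,.0f} tok/s sustained)")
```

Roughly 4.5M tokens per hour, about 1,260 tok/s sustained. Below that, pay per token; above it, reserve the card.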
PRIVATE FLEET · monthly

Take the fleet.

Reserve N nodes for a month or longer. Custom routing, custom isolation, custom SLA. White-glove operations. For teams shipping AI as a feature.

Discount vs hourly · 30–40%
Minimum term · 6 months
Onboarding · $0, included
Named SLA · Custom
  • Up to 5 nodes reserved
  • Dedicated Tailscale ACL
  • US-only data residency, written
  • Custom kernels / quants / models
  • Quarterly business review with operator
SLA: Negotiated. Single-site constraint disclosed in contract until Q4 2026 multi-site rollout.
Prices listed per million tokens / per GPU-hour. Machine-readable: pricing.json
04 · Case study

The first tenant is us.

NPCx · engine.npcx.gg

An AI dialogue engine for FiveM roleplay servers, in production for PENTA.

NPCx is iShard's first internal tenant. It serves real-time character dialogue and cinematic camera direction inside FiveM RP environments, with PENTA — a 364K-follower Twitch streamer — as its largest end-user. The model stack runs on ijarvis-slim via vLLM on port 8100, with Kokoro TTS on CPU port 8400 and an HTTP proxy on 8300.

It is our public proof that the substrate works. We list it because traction is the only honest thing to list. Pretending we have a thousand external customers when we do not is the fastest way to lose the ones we earn.

First iShard tenant · NPCx
NPCx end-user · PENTA
Downstream Twitch reach · 364K
External iShard customers · 0 today
05 · Hardware

What the iron actually is.

Node ID · Silicon · Memory · Native precision · Role · Bookable
spark-alpha · NVIDIA GB10 Grace Blackwell · 128 GB UMA · FP4 / FP8 / FP16 · Large model serving · Yes
spark-bravo · NVIDIA GB10 Grace Blackwell · 128 GB UMA · FP4 / FP8 / FP16 · Large model, pair-link capable · Yes
ijarvis-slim · NVIDIA RTX PRO 6000 Blackwell · 96 GB GDDR7 · FP4 / FP8 / INT8 / FP16 · Production, NPCx anchor · Partial
ijarvis-thick · NVIDIA RTX 5090 Blackwell · 32 GB GDDR7 · FP4 / FP8 / FP16 · Fast small-model serving · Yes
ijarvis-thin · NVIDIA RTX 5090 Blackwell · 32 GB GDDR7 · FP4 / FP8 / FP16 · Fast small-model serving · Yes
06 · How it works

Four steps to a shard.

i.

Tell us the workload

Model, quant, expected RPS, latency target, residency requirements. Two-paragraph form, no sales call required.

ii.

Match to a node

Routing is decided by the operator, not a load balancer. Spark for big-context UMA, PRO 6000 for prod throughput, 5090 for sub-100ms TTFT on small models.
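The matching above is a human judgment call; this toy heuristic only illustrates the shape of the decision. The thresholds and class names are ours, not iShard's actual routing logic:

```python
def match_node_class(weight_gb: float, latency_critical: bool) -> str:
    """Toy node-matching heuristic -- illustrative thresholds, not iShard's real rules."""
    if weight_gb > 96:
        return "spark"          # 128 GB UMA (or the pair) for big contiguous models
    if latency_critical and weight_gb <= 32:
        return "rtx-5090"       # fast TTFT on small models
    return "rtx-pro-6000"       # 96 GB production throughput

print(match_node_class(140, latency_critical=False))   # big model -> spark
print(match_node_class(14, latency_critical=True))     # small, latency-bound -> rtx-5090
print(match_node_class(70, latency_critical=False))    # mid-size prod -> rtx-pro-6000
```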

iii.

Provision & tune

vLLM stood up with your batch size, your kv-cache budget, your speculative-decoding setting. You get an OpenAI-compatible base_url, plus a Tailscale invite on Private Fleet.

iv.

Ship

Stream traffic. Watch tokens-per-second on the operator dashboard. Cancel any time on hourly. We will tell you if a different shard would suit you better. We would rather lose the upgrade than the trust.

07 · Agents

Agents are first-class buyers.

Pricing, fleet state, capabilities, and reservations are all published as machine-readable files at the same paths a human would visit. Crawlers and MCP-aware agents read the same data the homepage renders.

Reservation flow: an agent POSTs to the API, gets a single-use Stripe payment link back, and the link's holder (agent or delegated human) completes payment. iShard then issues a per-tenant Tailscale ACL and a scoped base_url. No card data ever crosses the agent boundary.

$ curl · reserve a Dedicated Shard via API
# POST /v1/reservations
curl https://api.ishard.us/v1/reservations \
  -H "Authorization: Bearer $ISHARD_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "node_class": "spark",
    "duration_hours": 4,
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "quant": "fp8"
  }'

# Response includes a single-use Stripe payment link.
# Settlement triggers Tailscale ACL issuance + base_url.
{
  "reservation_id": "rsv_01H...",
  "node": "spark-bravo",
  "window": { "start": "2026-05-07T18:00Z", "end": "2026-05-07T22:00Z" },
  "pay": "https://buy.stripe.com/...",
  "base_url_on_settlement": "https://spark-bravo.ishard.us/v1"
}

Agents pay via single-use Stripe payment links scoped to one reservation — no agent-readable card details ever required. Per-tenant Tailscale ACLs are issued on settlement. The API surface is open today for partners; general availability tracks the Q3 control-plane build. Email hello@ijarvis.ai to onboard early.
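For agents that prefer Python to curl, the documented request body can be built like this. Payload construction only; the endpoint itself is partner-gated today, and the field names are copied from the curl example above:

```python
import json

def reservation_payload(node_class: str, duration_hours: int,
                        model: str, quant: str) -> str:
    """Serialize a reservation request body matching the documented API shape."""
    return json.dumps({
        "node_class": node_class,
        "duration_hours": duration_hours,
        "model": model,
        "quant": quant,
    })

body = reservation_payload("spark", 4, "meta-llama/Llama-3.3-70B-Instruct", "fp8")
print(body)  # POST this to /v1/reservations with a Bearer key
```

Settlement of the returned Stripe link, not this request, is what triggers ACL issuance and the scoped base_url.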

WHAT'S WRONG WITH IT

Five nodes is the feature, not the bug.

iShard is small on purpose, and small for now. We do not run thousands of GPUs because we do not want to. The buyers we are built for need a Blackwell card with their model on it, tuned by a human who reads their batch profile and does not guess.

We would rather be specific about what we are than blur into the mid-tier of the inference market. So here is everything you would otherwise have to extract from us:

  • SITE · One physical site (Tampa, FL) until the Q4 2026 multi-site split. Your single Dedicated Shard goes down if our power does. Open Shard work fails over within the fleet only.
  • PAGER · One operator on call. You will know their name on Private Fleet. Pixalate is their day job.
  • CAPACITY · 4.5 nodes are externally bookable. ijarvis-slim runs internal NPCx workload — partially shared, never silently overcommitted.
  • PRICE FLOOR · We are not cheaper than the cheapest serverless 70B on the market. We are not trying to be. If price is the only variable, go there.
  • COMPLIANCE · No SOC 2 today. No HIPAA BAA today. US-only data residency, written into Private Fleet contracts. Encryption in transit (TLS 1.3) and at rest (LUKS / dm-crypt) on production volumes.
  • ROADMAP · Q3 2026: status.ishard.us with real fleet telemetry; control-plane GA. Q4: second site (colo) for multi-site Private Fleet. 2027: optional GPU pooling with vetted partners under iShard SLA.
ishard.us · iJarvis Compute

Open a shard. Or talk to the operator.