iJarvis Compute Fabric · Tampa, FL · est. 2026

A shard,
not a marketplace.

A five-node Blackwell fleet, hand-routed by the named operator who built it. Reserve a node by the hour, or take a private slice for six months — with US-only data residency in writing and one human on call. Not the cheapest. Not the biggest. Specific.

01 · The fleet

Five nodes on a single Tailscale WireGuard mesh. We don't pretend it's a region. We name them.

spark-alpha · Online
NVIDIA DGX Spark FE
GB10 Grace Blackwell · 128 GB UMA
Util 62% · FP4 native · tampa-1

spark-bravo · Online
NVIDIA DGX Spark FE
GB10 Grace Blackwell · 128 GB UMA
Util 48% · FP4 native · tampa-1

ijarvis-slim · Reserved
NVIDIA RTX PRO 6000
Blackwell · 96 GB GDDR7
Util 91% · Anchor: NPCx · tampa-1

ijarvis-thick · Online
NVIDIA RTX 5090
Blackwell · 32 GB GDDR7
Util 34% · Ubuntu 24.04 · tampa-1

ijarvis-thin · Online
NVIDIA RTX 5090
Blackwell · 32 GB GDDR7
Util 22% · Bookable · tampa-1

Aggregate VRAM 416 GB · Nodes 5 · Mesh latency ~1 ms LAN · Updated just now
Demo telemetry · live data at status.ishard.us when ready · machine-readable at fleet.json
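The fleet.json schema is not published yet. As a sketch of how an agent or script might consume it, here is a payload shaped like the tiles above; the field names are assumptions, not the real schema:

```python
import json

# Hypothetical fleet.json mirroring the tiles above.
# Field names are illustrative; the published schema may differ.
FLEET_JSON = """
{
  "site": "tampa-1",
  "nodes": [
    {"id": "spark-alpha",   "silicon": "GB10 Grace Blackwell",   "memory_gb": 128, "status": "online",   "util": 0.62},
    {"id": "spark-bravo",   "silicon": "GB10 Grace Blackwell",   "memory_gb": 128, "status": "online",   "util": 0.48},
    {"id": "ijarvis-slim",  "silicon": "RTX PRO 6000 Blackwell", "memory_gb": 96,  "status": "reserved", "util": 0.91},
    {"id": "ijarvis-thick", "silicon": "RTX 5090 Blackwell",     "memory_gb": 32,  "status": "online",   "util": 0.34},
    {"id": "ijarvis-thin",  "silicon": "RTX 5090 Blackwell",     "memory_gb": 32,  "status": "online",   "util": 0.22}
  ]
}
"""

fleet = json.loads(FLEET_JSON)
aggregate_gb = sum(n["memory_gb"] for n in fleet["nodes"])
available_now = [n["id"] for n in fleet["nodes"] if n["status"] == "online"]

print(aggregate_gb)     # 416, matching the banner above
print(available_now)
```

Five nodes is small enough that the whole fleet state fits in one file an agent can read in a single GET.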
02 · Why iShard

What you can't buy from a hyperscaler.

01 / Silicon

Blackwell, on every node.

Every node runs Blackwell silicon: GB10 Grace Blackwell superchips or discrete Blackwell cards. FP4 is native. The PRO 6000 ships 96 GB of GDDR7 on a single card. The DGX Sparks each carry 128 GB of memory unified across CPU and GPU, a configuration still rare on commodity rental marketplaces. If 128 GB UMA is what your model wants, this is the most direct way to get it.

02 / Topology

Heterogeneous, hand-routed.

UMA Sparks for huge contiguous models. PRO 6000 for production-grade dedicated workloads. RTX 5090s for fast small-model serving. Routing is decided by the operator reading your batch profile, not by a load balancer assigning whichever spot card surfaced first.

03 / Operations

A named operator. In writing.

On a marketplace you get an anonymous host ID and a support ticket. Here you get a name, a Slack channel, and a contract that puts US-only data residency in writing. When something breaks, the human who built it picks up. That is the package, and it is the entire reason to choose us over the cheaper option.

03 · Pricing

Two ways to buy. One honest answer to "how much."

We do not run an undifferentiated token API. Groq, Deepinfra, Together, and Fireworks are faster and cheaper at that specific job. Buy from us when you need a specific Blackwell card, your model, your routing, and a contract — not a marketplace lottery.

BRING YOUR OWN STACK · any tier

Your model. Your quant.

We run open-weight models at whatever precision your hardware budget supports — current generation (Llama 4, GPT-OSS-120B, DeepSeek V3.2, Qwen3) and your custom checkpoints. Model selection is per-tenant and per-deployment.

Open-weight catalog · Included in hourly
Custom checkpoint · +15% surcharge
Quantization · FP4 / FP8 / INT8 / FP16
Serving stack · vLLM, hand-tuned
  • OpenAI-compatible /v1/chat/completions per deployment
  • Streaming, function calling, structured outputs
  • vLLM tuned to your batch profile by the operator
  • FP4 native on Blackwell — request it
NOTE: This tile describes how inference works once you reserve a node — there is no separate per-token bill. You pay hourly (Dedicated) or monthly (Private Fleet) and run your model.
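Each deployment's endpoint speaks the standard OpenAI chat-completions shape, so any OpenAI-compatible client works. A minimal request body, assuming a hypothetical per-deployment base URL and whatever model name you deployed:

```python
import json

# Hypothetical values: the base_url and model name are per-deployment,
# issued when your node is provisioned.
BASE_URL = "https://spark-bravo.ishard.us/v1"  # example shape from the reservation API
payload = {
    "model": "your-deployed-model",            # whatever checkpoint you brought
    "stream": True,                            # streaming is supported per deployment
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the node specs in one line."},
    ],
}

# The wire format is plain JSON, POSTed to {BASE_URL}/chat/completions.
body = json.dumps(payload)
print(body)
```

Because the shape is standard, switching an existing app onto a shard is a one-line base_url change, not a rewrite.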
DEDICATED SHARD · Recommended

Reserve a node, by the hour.

Lock a specific Blackwell GPU end-to-end. Your model, your quant, your batch settings, your traffic. We do not share the card with anyone else while you hold it.

RTX 5090 (32 GB) · $0.95/hr
DGX Spark (128 GB UMA) · $1.65/hr
RTX PRO 6000 (96 GB) · $1.95/hr
Spark Pair (256 GB UMA) · $3.25/hr
  • Bring-your-own model + quant
  • vLLM tuning to your batch profile
  • 1-hour minimum, then per-second billing
  • Slack channel with the operator
  • US-only data residency, written into the engagement
SLA: Best-effort 99.0% during the reserved window. Single-site, single-operator. Disclosed up front. Pricing carries a modest premium over commodity spot rentals; that premium pays for the named operator and the residency commitment.
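The billing model above (1-hour minimum, then per-second) is easy to reason about. A quick sketch of what a Dedicated Shard hold costs at the listed rates:

```python
# Listed Dedicated Shard rates, USD per GPU-hour.
RATES = {
    "rtx-5090": 0.95,
    "dgx-spark": 1.65,
    "rtx-pro-6000": 1.95,
    "spark-pair": 3.25,
}

def shard_cost(node_class: str, seconds_held: int) -> float:
    """1-hour minimum, then per-second billing at the hourly rate."""
    billable = max(seconds_held, 3600)
    return round(RATES[node_class] * billable / 3600, 2)

print(shard_cost("rtx-5090", 45 * 60))    # 45 min still bills the 1-hour minimum: 0.95
print(shard_cost("dgx-spark", 4 * 3600))  # 4-hour Spark hold: 6.6
print(shard_cost("rtx-pro-6000", 9000))   # 2.5 h on the PRO 6000: 4.88
```

Past the first hour you only pay for seconds you actually hold the card.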
PRIVATE FLEET · monthly · the lead product

Take a slice of the fleet.

Reserve up to five nodes for six months or longer. Custom routing, custom isolation, custom SLA. White-glove operations from a named operator. For teams shipping AI as a feature and unwilling to be one ticket of many.

Discount vs hourly · 25–35%
Minimum term · 6 months
Onboarding · $0 (included)
Named SLA · Custom
  • Up to 5 nodes reserved
  • Dedicated Tailscale ACL
  • US-only data residency, written into contract
  • Custom kernels / quants / model
  • Quarterly business review with the operator
SLA: Negotiated. Single-site constraint disclosed in contract until the Q4 2026 multi-site rollout. This is the tier built for teams that need a contract; for steady-state production usage, the math beats hourly past roughly 400 hours per node per month.
Prices listed per GPU-hour. There is no separate per-token bill on either tier — you reserve a node and run your model on it. Machine-readable: pricing.json
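To compare hourly spend against a Private Fleet commitment, here is a toy break-even calculator. It assumes the monthly commitment bills a full 730-hour month at the hourly rate minus the quoted discount, which is a simplification; actual Private Fleet pricing is negotiated per contract.

```python
HOURS_PER_MONTH = 730  # average calendar month

def break_even_hours(discount: float, hours_per_month: int = HOURS_PER_MONTH) -> float:
    """Usage level at which paying hourly matches a full-month discounted commitment."""
    return round(hours_per_month * (1 - discount), 1)

print(break_even_hours(0.25))  # 547.5
print(break_even_hours(0.35))  # 474.5
```

Plug in your negotiated rate; heavier effective discounts pull the crossover lower.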
04 · Case study

The first tenant is us.

NPCx · engine.npcx.gg

An AI dialogue engine for FiveM roleplay servers, in production for PENTA.

NPCx is iShard's first internal tenant. It serves real-time character dialogue and cinematic camera direction inside FiveM RP environments, with PENTA — a 364K-follower Twitch streamer — as its largest end-user. The model stack runs on ijarvis-slim via vLLM on port 8100, with Kokoro TTS on CPU port 8400 and an HTTP proxy on 8300.

It is our public proof that the substrate works. We list it because traction is the only honest thing to list. Pretending we have a thousand external customers when we do not is the fastest way to lose the ones we earn.

First iShard tenant · NPCx
NPCx end-user · PENTA
Downstream Twitch reach · 364K
External iShard customers · 0 today
05 · Hardware

What the iron actually is.

Node ID       | Silicon                       | Memory      | Native precision        | Role                            | Bookable
spark-alpha   | NVIDIA GB10 Grace Blackwell   | 128 GB UMA  | FP4 / FP8 / FP16        | Large model serving             | Yes
spark-bravo   | NVIDIA GB10 Grace Blackwell   | 128 GB UMA  | FP4 / FP8 / FP16        | Large model · pair-link capable | Yes
ijarvis-slim  | NVIDIA RTX PRO 6000 Blackwell | 96 GB GDDR7 | FP4 / FP8 / INT8 / FP16 | Production · NPCx anchor        | Partial
ijarvis-thick | NVIDIA RTX 5090 Blackwell     | 32 GB GDDR7 | FP4 / FP8 / FP16        | Fast small-model serving        | Yes
ijarvis-thin  | NVIDIA RTX 5090 Blackwell     | 32 GB GDDR7 | FP4 / FP8 / FP16        | Fast small-model serving        | Yes
06 · How it works

Four steps to a shard.

i. Tell us the workload

Model, quant, expected RPS, latency target, residency requirements. Two-paragraph form, no sales call required.

ii. Match to a node

Routing is decided by the operator, not a load balancer. Spark for big-context UMA, PRO 6000 for prod throughput, 5090 for sub-100ms TTFT on small models.

iii. Provision & tune

vLLM stood up with your batch size, your kv-cache budget, your speculative-decoding setting. You get an OpenAI-compatible base_url and a Tailscale invite if Private Fleet.
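What "tuned to your batch profile" means in practice is a handful of vLLM server knobs. The flags below exist in current vLLM releases; the values are placeholders the operator would set from your workload form, not published defaults, and the model id is hypothetical:

```python
# Illustrative vLLM launch for a Dedicated Shard. Values are placeholders,
# chosen per tenant from the workload form, not fixed defaults.
model = "your-org/your-checkpoint"  # hypothetical model id

vllm_args = [
    "vllm", "serve", model,
    "--quantization", "fp8",             # FP4/FP8 are native on Blackwell
    "--max-num-seqs", "64",              # batch width sized from your expected RPS
    "--max-model-len", "16384",          # per-request context / kv-cache budget
    "--gpu-memory-utilization", "0.92",  # headroom left for traffic spikes
]

print(" ".join(vllm_args))
```

The endpoint this serves is the same OpenAI-compatible surface described under Pricing; only the knobs behind it change per tenant.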

iv. Ship

Stream traffic. Watch tokens-per-second on the operator dashboard. Cancel any time on hourly. We will tell you if a different shard would suit you better. We would rather lose the upgrade than the trust.

07 · Agents

Agents are first-class buyers.

Pricing, fleet state, capabilities, and reservations are all published as machine-readable files at the same paths a human would visit. Crawlers and MCP-aware agents read the same data the homepage renders.

Reservation flow: an agent POSTs to the API, gets a single-use Stripe payment link back, and the link's holder (agent or delegated human) completes payment. iShard then issues a per-tenant Tailscale ACL and a scoped base_url. No card data ever crosses the agent boundary.

$ curl · reserve a Dedicated Shard via API
# POST /v1/reservations
curl https://api.ishard.us/v1/reservations \
  -H "Authorization: Bearer $ISHARD_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "node_class": "spark",
    "duration_hours": 4,
    "model": "meta-llama/Llama-4-70B-Instruct",
    "quant": "fp8"
  }'

# Response includes a single-use Stripe payment link.
# Settlement triggers Tailscale ACL issuance + base_url.
{
  "reservation_id": "rsv_01H...",
  "node": "spark-bravo",
  "window": { "start": "2026-05-15T18:00Z", "end": "2026-05-15T22:00Z" },
  "pay": "https://buy.stripe.com/...",
  "base_url_on_settlement": "https://spark-bravo.ishard.us/v1"
}

Agents pay via single-use Stripe payment links scoped to one reservation — no agent-readable card details ever required. Per-tenant Tailscale ACLs are issued on settlement. The API surface is open today for partners; general availability tracks the Q3 control-plane build. Email [email protected] to onboard early.
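End to end, the agent flow reduces to: build the reservation body, POST it, then act on two fields of the response. A sketch using the sample response shown above (no network; the field names match the example, and the elided values are left as-is):

```python
import json

# Request body an agent would POST to /v1/reservations (mirrors the curl example).
reservation_request = {
    "node_class": "spark",
    "duration_hours": 4,
    "model": "meta-llama/Llama-4-70B-Instruct",
    "quant": "fp8",
}

# Sample response from the docs above; in production this is the POST's JSON body.
response = json.loads("""
{
  "reservation_id": "rsv_01H...",
  "node": "spark-bravo",
  "window": { "start": "2026-05-15T18:00Z", "end": "2026-05-15T22:00Z" },
  "pay": "https://buy.stripe.com/...",
  "base_url_on_settlement": "https://spark-bravo.ishard.us/v1"
}
""")

# The two fields the agent acts on: where to send its human (or itself) to pay,
# and which base_url goes live once Stripe settles.
pay_link = response["pay"]
future_base_url = response["base_url_on_settlement"]
print(pay_link, future_base_url)
```

No card data is ever in the agent's hands: the pay link is single-use and scoped to this one reservation.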

WHAT'S WRONG WITH IT

Five nodes is the feature, not the bug.

iShard is small on purpose, and small for now. We do not run thousands of GPUs because we do not want to. The buyers we are built for need a specific Blackwell card with their model on it, hand-tuned by an operator who reads their batch profile and does not guess.

We would rather be specific about what we are than blur into the mid-tier of the inference market. Here is everything you would otherwise have to extract from us — including the alternatives that are right for buyers we are not right for:

  • SCALE: Five GPUs, single site, one operator. We are not a hyperscaler and we will not pretend to be one. If you need 1,000 concurrent inference replicas, we are the wrong choice today.
  • PRICE FLOOR: Spot GPU rental on Vast, Spheron, Runpod, or Lambda will be cheaper for the same silicon. If price is the only variable, go there. The premium here pays for the named operator and the residency commitment.
  • TOKEN APIS: Groq is faster on Llama 3.3 70B. Deepinfra is cheaper. Together and Fireworks are battle-tested. Use them for undifferentiated, high-volume token traffic. Use us when the workload needs a specific card with your model on it.
  • SITE: One physical site (Tampa, FL) until the Q4 2026 multi-site split. Your single Dedicated Shard goes down if our power does. Disclosed in the SLA, not buried.
  • PAGER: One operator on call. You will know their name on Private Fleet. The operator's day job is senior product management in ad fraud and data integrity.
  • CAPACITY: Four nodes are fully externally bookable; ijarvis-slim runs internal NPCx workload as the anchor tenant, partially shared and never silently overcommitted. Effective external capacity: 4.5 of 5.
  • COMPLIANCE: No SOC 2 today. No HIPAA BAA today. US-only data residency, written into Private Fleet contracts. Encryption in transit (TLS 1.3) and at rest (LUKS / dm-crypt) on production volumes.
  • ROADMAP: Q3 2026 brings status.ishard.us with real fleet telemetry and control-plane GA. Q4 2026 adds a second site (colo) for multi-site Private Fleet. 2027 brings optional GPU pooling with vetted partners under the iShard SLA.
ishard.us · iJarvis Compute

Reserve a shard. Or talk to the operator.