iJarvis Compute Fabric · Tampa, FL · est. 2026

A shard,
not a marketplace.

A five-node Blackwell fleet, hand-routed by the named operator who built it. Reserve a node by the hour, or take a private slice for six months — with US-only data residency in writing and one human on call. Not the cheapest. Not the biggest. Specific.

01 · The fleet

Five nodes on a single Tailscale WireGuard mesh. We don't pretend it's a region. We name them.

spark-alpha · Online
NVIDIA DGX Spark FE
GB10 Grace Blackwell · 128 GB UMA
Util 62% · FP4 native · tampa-1

spark-bravo · Online
NVIDIA DGX Spark FE
GB10 Grace Blackwell · 128 GB UMA
Util 48% · FP4 native · tampa-1

ijarvis-slim · Reserved
NVIDIA RTX PRO 6000
Blackwell · 96 GB GDDR7
Util 91% · Anchor: NPCx · tampa-1

ijarvis-thick · Online
NVIDIA RTX 5090
Blackwell · 32 GB GDDR7
Util 34% · Ubuntu 24.04 · tampa-1

ijarvis-thin · Online
NVIDIA RTX 5090
Blackwell · 32 GB GDDR7
Util 22% · Bookable · tampa-1

Aggregate VRAM 416 GB · Nodes 5 · Mesh latency ~1 ms LAN · Updated just now
Demo telemetry · live data at status.ishard.us when ready · machine-readable at fleet.json
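The fleet.json schema is not published yet. As a sketch of how an agent or script might consume it, here is a payload shaped like the tiles above; the field names are assumptions, not the real schema:

```python
import json

# Hypothetical fleet.json mirroring the tiles above.
# Field names are illustrative; the published schema may differ.
FLEET_JSON = """
{
  "site": "tampa-1",
  "nodes": [
    {"id": "spark-alpha",   "silicon": "GB10 Grace Blackwell",   "memory_gb": 128, "status": "online",   "util": 0.62},
    {"id": "spark-bravo",   "silicon": "GB10 Grace Blackwell",   "memory_gb": 128, "status": "online",   "util": 0.48},
    {"id": "ijarvis-slim",  "silicon": "RTX PRO 6000 Blackwell", "memory_gb": 96,  "status": "reserved", "util": 0.91},
    {"id": "ijarvis-thick", "silicon": "RTX 5090 Blackwell",     "memory_gb": 32,  "status": "online",   "util": 0.34},
    {"id": "ijarvis-thin",  "silicon": "RTX 5090 Blackwell",     "memory_gb": 32,  "status": "online",   "util": 0.22}
  ]
}
"""

fleet = json.loads(FLEET_JSON)
aggregate_gb = sum(n["memory_gb"] for n in fleet["nodes"])
available_now = [n["id"] for n in fleet["nodes"] if n["status"] == "online"]

print(aggregate_gb)     # 416, matching the banner above
print(available_now)
```

Five nodes is small enough that the whole fleet state fits in one file an agent can read in a single GET.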
02 · Why iShard

What you can't buy from a hyperscaler.

01 / Silicon

Blackwell, on every node.

Every node runs Blackwell silicon: GB10 Grace Blackwell superchips or discrete Blackwell cards. FP4 is native. The PRO 6000 ships 96 GB of GDDR7 on a single card. The DGX Sparks each carry 128 GB of memory unified across CPU and GPU, a configuration still rare on commodity rental marketplaces. If 128 GB UMA is what your model wants, this is the most direct way to get it.

02 / Topology

Heterogeneous, hand-routed.

UMA Sparks for huge contiguous models. PRO 6000 for production-grade dedicated workloads. RTX 5090s for fast small-model serving. Routing is decided by the operator reading your batch profile, not by a load balancer assigning whichever spot card surfaced first.

03 / Operations

A named operator. In writing.

On a marketplace you get an anonymous host ID and a support ticket. Here you get a name, a Slack channel, and a contract that puts US-only data residency in writing. When something breaks, the human who built it picks up. That is the package, and it is the entire reason to choose us over the cheaper option.

03 · Pricing

Two ways to buy. One honest answer to "how much."

We do not run an undifferentiated token API. Groq, Deepinfra, Together, and Fireworks are faster and cheaper at that specific job. Buy from us when you need a specific Blackwell card, your model, your routing, and a contract — not a marketplace lottery.

BRING YOUR OWN STACK · any tier

Your model. Your quant.

We run open-weight models at whatever precision your hardware budget supports — current generation (Llama 4, GPT-OSS-120B, DeepSeek V3.2, Qwen3) and your custom checkpoints. Model selection is per-tenant and per-deployment.

Open-weight catalog · Included in hourly
Custom checkpoint · +15% surcharge
Quantization · FP4 / FP8 / INT8 / FP16
Serving stack · vLLM, hand-tuned
  • OpenAI-compatible /v1/chat/completions per deployment
  • Streaming, function calling, structured outputs
  • vLLM tuned to your batch profile by the operator
  • FP4 native on Blackwell — request it
NOTE: This tile describes how inference works once you reserve a node — there is no separate per-token bill. You pay hourly (Dedicated) or monthly (Private Fleet) and run your model.
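Each deployment's endpoint speaks the standard OpenAI chat-completions shape, so any OpenAI-compatible client works. A minimal request body, assuming a hypothetical per-deployment base URL and whatever model name you deployed:

```python
import json

# Hypothetical values: the base_url and model name are per-deployment,
# issued when your node is provisioned.
BASE_URL = "https://spark-bravo.ishard.us/v1"  # example shape from the reservation API
payload = {
    "model": "your-deployed-model",            # whatever checkpoint you brought
    "stream": True,                            # streaming is supported per deployment
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the node specs in one line."},
    ],
}

# The wire format is plain JSON, POSTed to {BASE_URL}/chat/completions.
body = json.dumps(payload)
print(body)
```

Because the shape is standard, switching an existing app onto a shard is a one-line base_url change, not a rewrite.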
DEDICATED SHARD · Recommended

Reserve a node, by the hour.

Lock a specific Blackwell GPU end-to-end. Your model, your quant, your batch settings, your traffic. We do not share the card with anyone else while you hold it.

RTX 5090 (32 GB) · $0.95/hr
DGX Spark (128 GB UMA) · $1.65/hr
RTX PRO 6000 (96 GB) · $1.95/hr
Spark Pair (256 GB UMA) · $3.25/hr
  • Bring-your-own model + quant
  • vLLM tuning to your batch profile
  • 1-hour minimum, then per-second billing
  • Slack channel with the operator
  • US-only data residency, written into the engagement
SLA: Best-effort 99.0% during the reserved window. Single-site, single-operator. Disclosed up front. Pricing carries a modest premium over commodity spot rentals; that premium pays for the named operator and the residency commitment.
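The billing model above (1-hour minimum, then per-second) is easy to reason about. A quick sketch of what a Dedicated Shard hold costs at the listed rates:

```python
# Listed Dedicated Shard rates, USD per GPU-hour.
RATES = {
    "rtx-5090": 0.95,
    "dgx-spark": 1.65,
    "rtx-pro-6000": 1.95,
    "spark-pair": 3.25,
}

def shard_cost(node_class: str, seconds_held: int) -> float:
    """1-hour minimum, then per-second billing at the hourly rate."""
    billable = max(seconds_held, 3600)
    return round(RATES[node_class] * billable / 3600, 2)

print(shard_cost("rtx-5090", 45 * 60))    # 45 min still bills the 1-hour minimum: 0.95
print(shard_cost("dgx-spark", 4 * 3600))  # 4-hour Spark hold: 6.6
print(shard_cost("rtx-pro-6000", 9000))   # 2.5 h on the PRO 6000: 4.88
```

Past the first hour you only pay for seconds you actually hold the card.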
PRIVATE FLEET · monthly · the lead product

Take a slice of the fleet.

Reserve up to five nodes for six months or longer. Custom routing, custom isolation, custom SLA. White-glove operations from a named operator. For teams shipping AI as a feature and unwilling to be one ticket of many.

Discount vs hourly · 25–35%
Minimum term · 6 months
Onboarding · $0 (included)
Named SLA · Custom
  • Up to 5 nodes reserved
  • Dedicated Tailscale ACL
  • US-only data residency, written into contract
  • Custom kernels / quants / model
  • Quarterly business review with the operator
SLA: Negotiated. Single-site constraint disclosed in contract until the Q4 2026 multi-site rollout. This is the tier built for teams that need a contract; for steady-state production usage, the math beats hourly past roughly 400 hours per node per month.
Prices listed per GPU-hour. There is no separate per-token bill on either tier — you reserve a node and run your model on it. Machine-readable: pricing.json
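To compare hourly spend against a Private Fleet commitment, here is a toy break-even calculator. It assumes the monthly commitment bills a full 730-hour month at the hourly rate minus the quoted discount, which is a simplification; actual Private Fleet pricing is negotiated per contract.

```python
HOURS_PER_MONTH = 730  # average calendar month

def break_even_hours(discount: float, hours_per_month: int = HOURS_PER_MONTH) -> float:
    """Usage level at which paying hourly matches a full-month discounted commitment."""
    return round(hours_per_month * (1 - discount), 1)

print(break_even_hours(0.25))  # 547.5
print(break_even_hours(0.35))  # 474.5
```

Plug in your negotiated rate; heavier effective discounts pull the crossover lower.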
04 · Case study

The first tenant is us.

NPCx · engine.npcx.gg

An AI dialogue engine for FiveM roleplay servers, in production for PENTA.

NPCx is iShard's first internal tenant. It serves real-time character dialogue and cinematic camera direction inside FiveM RP environments, with PENTA — a 364K-follower Twitch streamer — as its largest end-user. The model stack runs on ijarvis-slim via vLLM on port 8100, with Kokoro TTS on CPU port 8400 and an HTTP proxy on 8300.

It is our public proof that the substrate works. We list it because traction is the only honest thing to list. Pretending we have a thousand external customers when we do not is the fastest way to lose the ones we earn.

First iShard tenant · NPCx
NPCx end-user · PENTA
Downstream Twitch reach · 364K
External iShard customers · 0 today
05 · Hardware

What the iron actually is.

Node ID       | Silicon                       | Memory      | Native precision        | Role                            | Bookable
spark-alpha   | NVIDIA GB10 Grace Blackwell   | 128 GB UMA  | FP4 / FP8 / FP16        | Large model serving             | Yes
spark-bravo   | NVIDIA GB10 Grace Blackwell   | 128 GB UMA  | FP4 / FP8 / FP16        | Large model · pair-link capable | Yes
ijarvis-slim  | NVIDIA RTX PRO 6000 Blackwell | 96 GB GDDR7 | FP4 / FP8 / INT8 / FP16 | Production · NPCx anchor        | Partial
ijarvis-thick | NVIDIA RTX 5090 Blackwell     | 32 GB GDDR7 | FP4 / FP8 / FP16        | Fast small-model serving        | Yes
ijarvis-thin  | NVIDIA RTX 5090 Blackwell     | 32 GB GDDR7 | FP4 / FP8 / FP16        | Fast small-model serving        | Yes
06 · How it works

Four steps to a shard.

i. Tell us the workload

Model, quant, expected RPS, latency target, residency requirements. Two-paragraph form, no sales call required.

ii. Match to a node

Routing is decided by the operator, not a load balancer. Spark for big-context UMA, PRO 6000 for prod throughput, 5090 for sub-100ms TTFT on small models.

iii. Provision & tune

vLLM stood up with your batch size, your kv-cache budget, your speculative-decoding setting. You get an OpenAI-compatible base_url and a Tailscale invite if Private Fleet.
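What "tuned to your batch profile" means in practice is a handful of vLLM server knobs. The flags below exist in current vLLM releases; the values are placeholders the operator would set from your workload form, not published defaults, and the model id is hypothetical:

```python
# Illustrative vLLM launch for a Dedicated Shard. Values are placeholders,
# chosen per tenant from the workload form, not fixed defaults.
model = "your-org/your-checkpoint"  # hypothetical model id

vllm_args = [
    "vllm", "serve", model,
    "--quantization", "fp8",             # FP4/FP8 are native on Blackwell
    "--max-num-seqs", "64",              # batch width sized from your expected RPS
    "--max-model-len", "16384",          # per-request context / kv-cache budget
    "--gpu-memory-utilization", "0.92",  # headroom left for traffic spikes
]

print(" ".join(vllm_args))
```

The endpoint this serves is the same OpenAI-compatible surface described under Pricing; only the knobs behind it change per tenant.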

iv. Ship

Stream traffic. Watch tokens-per-second on the operator dashboard. Cancel any time on hourly. We will tell you if a different shard would suit you better. We would rather lose the upgrade than the trust.

07 · Agents

Agents are first-class buyers.

Pricing, fleet state, capabilities, and reservations are all published as machine-readable files at the same paths a human would visit. Crawlers and MCP-aware agents read the same data the homepage renders.

Reservation flow: an agent POSTs to the API, gets a single-use Stripe payment link back, and the link's holder (agent or delegated human) completes payment. iShard then issues a per-tenant Tailscale ACL and a scoped base_url. No card data ever crosses the agent boundary.

$ curl · reserve a Dedicated Shard via API
# POST /v1/reservations
curl https://api.ishard.us/v1/reservations \
  -H "Authorization: Bearer $ISHARD_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "node_class": "spark",
    "duration_hours": 4,
    "model": "meta-llama/Llama-4-70B-Instruct",
    "quant": "fp8"
  }'

# Response includes a single-use Stripe payment link.
# Settlement triggers Tailscale ACL issuance + base_url.
{
  "reservation_id": "rsv_01H...",
  "node": "spark-bravo",
  "window": { "start": "2026-05-15T18:00Z", "end": "2026-05-15T22:00Z" },
  "pay": "https://buy.stripe.com/...",
  "base_url_on_settlement": "https://spark-bravo.ishard.us/v1"
}

Agents pay via single-use Stripe payment links scoped to one reservation — no agent-readable card details ever required. Per-tenant Tailscale ACLs are issued on settlement. The API surface is open today for partners; general availability tracks the Q3 control-plane build. Email [email protected] to onboard early.
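End to end, the agent flow reduces to: build the reservation body, POST it, then act on two fields of the response. A sketch using the sample response shown above (no network; the field names match the example, and the elided values are left as-is):

```python
import json

# Request body an agent would POST to /v1/reservations (mirrors the curl example).
reservation_request = {
    "node_class": "spark",
    "duration_hours": 4,
    "model": "meta-llama/Llama-4-70B-Instruct",
    "quant": "fp8",
}

# Sample response from the docs above; in production this is the POST's JSON body.
response = json.loads("""
{
  "reservation_id": "rsv_01H...",
  "node": "spark-bravo",
  "window": { "start": "2026-05-15T18:00Z", "end": "2026-05-15T22:00Z" },
  "pay": "https://buy.stripe.com/...",
  "base_url_on_settlement": "https://spark-bravo.ishard.us/v1"
}
""")

# The two fields the agent acts on: where to send its human (or itself) to pay,
# and which base_url goes live once Stripe settles.
pay_link = response["pay"]
future_base_url = response["base_url_on_settlement"]
print(pay_link, future_base_url)
```

No card data is ever in the agent's hands: the pay link is single-use and scoped to this one reservation.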

WHAT'S WRONG WITH IT

Five nodes is the feature, not the bug.

iShard is small on purpose, and small for now. We do not run thousands of GPUs because we do not want to. The buyers we are built for need a specific Blackwell card with their model on it, hand-tuned by an operator who reads their batch profile and does not guess.

We would rather be specific about what we are than blur into the mid-tier of the inference market. Here is everything you would otherwise have to extract from us — including the alternatives that are right for buyers we are not right for:

  • SCALE: Five GPUs, single site, one operator. We are not a hyperscaler and we will not pretend to be one. If you need 1,000 concurrent inference replicas, we are the wrong choice today.
  • PRICE FLOOR: Spot GPU rental on Vast, Spheron, Runpod, or Lambda will be cheaper for the same silicon. If price is the only variable, go there. The premium here pays for the named operator and the residency commitment.
  • TOKEN APIS: Groq is faster on Llama 3.3 70B. Deepinfra is cheaper. Together and Fireworks are battle-tested. Use them for undifferentiated, high-volume token traffic. Use us when the workload needs a specific card with your model on it.
  • SITE: One physical site (Tampa, FL) until the Q4 2026 multi-site split. Your single Dedicated Shard goes down if our power does. Disclosed in the SLA, not buried.
  • PAGER: One operator on call. You will know their name on Private Fleet. The operator's day job is senior product management in ad fraud and data integrity.
  • CAPACITY: Four nodes are fully externally bookable; ijarvis-slim runs internal NPCx workload as the anchor tenant, partially shared and never silently overcommitted. Effective external capacity: 4.5 of 5.
  • COMPLIANCE: No SOC 2 today. No HIPAA BAA today. US-only data residency, written into Private Fleet contracts. Encryption in transit (TLS 1.3) and at rest (LUKS / dm-crypt) on production volumes.
  • ROADMAP: Q3 2026 brings status.ishard.us with real fleet telemetry and control-plane GA. Q4 2026 adds a second site (colo) for multi-site Private Fleet. 2027 brings optional GPU pooling with vetted partners under the iShard SLA.
ishard.us · iJarvis Compute

Reserve a shard. Or talk to the operator.