iJarvis Compute Fabric · Tampa, FL · est. 2026

Rent a shard,
not a queue.

A five-node Blackwell fleet, hand-routed by the operator who built it. Pay per token, reserve a node, or take the fleet. Humans and agents both buy here.

01 · The fleet

Five nodes on a single Tailscale WireGuard mesh. We don't pretend it's a region. We name them.

spark-alpha · Online
NVIDIA DGX Spark FE
GB10 Grace Blackwell · 128 GB UMA
Util 62%
FP4 native · tampa-1
spark-bravo · Online
NVIDIA DGX Spark FE
GB10 Grace Blackwell · 128 GB UMA
Util 48%
FP4 native · tampa-1
ijarvis-slim · Reserved
NVIDIA RTX PRO 6000
Blackwell · 96 GB GDDR7
Util 91%
Anchor: NPCx · tampa-1
ijarvis-thick · Online
NVIDIA RTX 5090
Blackwell · 32 GB GDDR7
Util 34%
Ubuntu 24.04 · tampa-1
ijarvis-thin · Online
NVIDIA RTX 5090
Blackwell · 32 GB GDDR7
Util 22%
Bookable · tampa-1
Aggregate VRAM 416 GB · Nodes 5 · Mesh latency ~1 ms LAN · Updated just now
Demo telemetry · live data at status.ishard.us when ready · machine-readable at fleet.json
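fleet.json is not published yet, so the schema below is an assumption, not the real file format. A sketch of how an agent might filter the fleet for headroom once the file exists:

```python
import json

# Hypothetical fleet.json schema -- field names are illustrative, not confirmed.
FLEET_JSON = """
[
  {"name": "spark-alpha",   "status": "online",   "util": 0.62, "memory_gb": 128},
  {"name": "spark-bravo",   "status": "online",   "util": 0.48, "memory_gb": 128},
  {"name": "ijarvis-slim",  "status": "reserved", "util": 0.91, "memory_gb": 96},
  {"name": "ijarvis-thick", "status": "online",   "util": 0.34, "memory_gb": 32},
  {"name": "ijarvis-thin",  "status": "online",   "util": 0.22, "memory_gb": 32}
]
"""

def nodes_with_headroom(fleet: list[dict], max_util: float = 0.5) -> list[str]:
    """Return online node names whose utilization is below max_util."""
    return [n["name"] for n in fleet
            if n["status"] == "online" and n["util"] < max_util]

fleet = json.loads(FLEET_JSON)
print(nodes_with_headroom(fleet))
```

The same filter an Open Shard router would apply: reserved nodes drop out, busy nodes drop out, the rest are candidates.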
02 · Why iShard

What you can't buy from a hyperscaler.

01 / Silicon

Blackwell, not last gen.

Every node is GB10 or Blackwell. FP4 is native. The PRO 6000 ships 96 GB of GDDR7 in a single card. The DGX Sparks each carry 128 GB unified across CPU and GPU. Most providers still serve generic workloads on shared H100 pools.

02 / Topology

A heterogeneous mesh.

UMA Sparks for huge contiguous models. PRO 6000 for production-grade dedicated workloads. RTX 5090s for fast small-model serving. Routing is decided by the workload, not by which bin had a free slot. A 70B model in FP8 fits on one Spark; in FP16 it takes the Spark pair — no sharding overhead unless you want it.
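The fit claim is plain arithmetic. A back-of-envelope sketch (weights only; KV cache and activations add on top of this):

```python
def weight_footprint_gb(params_b: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB: parameter count (billions) x bytes each.
    Ignores KV cache, activations, and framework overhead."""
    return params_b * bytes_per_param

BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}

SPARK_UMA_GB = 128  # one DGX Spark's unified memory

for quant, bpp in BYTES_PER_PARAM.items():
    gb = weight_footprint_gb(70, bpp)
    where = "one Spark" if gb <= SPARK_UMA_GB else "Spark pair"
    print(f"70B @ {quant}: ~{gb:.0f} GB -> {where}")
```

70B at FP16 is ~140 GB of weights alone, which is why that workload lands on the 256 GB pair rather than a single 128 GB node.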

03 / Operations

One operator, on call.

iJarvis owns the silicon, the rack, the network, the OS, the inference layer, and the routing logic. When something breaks, the human who built it picks up. There is no L1 ticket queue. There is also no datacenter rent in our COGS — that is why our hourly rate is what it is.

03 · Pricing

Three tiers. No upsell. No PDF quote.

OPEN SHARD · per-token

Pay-per-token.

Drop in via OpenAI-compatible endpoint. We route to whichever Blackwell card has headroom. Best-effort, no reservation.

Llama 3.3 70B class · $0.55/M tok
DeepSeek V3 class · $1.10/M tok
Llama 405B class · $2.80/M tok
Custom checkpoint · +15% surcharge
  • OpenAI-compatible /v1/chat/completions
  • Standard open-weight catalog
  • Streaming, function calling, JSON mode
  • $10 credit on signup
SLA: Best-effort. No 99.x uptime commitment. Status page is honest. Use this tier for non-critical workloads or to qualify before reserving.
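A client-side cost estimate at the Open Shard rates listed above. The model-class keys are illustrative labels for this sketch, not API identifiers:

```python
# Open Shard per-million-token rates, copied from the pricing table above.
RATE_PER_M_TOK = {
    "llama-3.3-70b-class": 0.55,
    "deepseek-v3-class": 1.10,
    "llama-405b-class": 2.80,
}
CUSTOM_CHECKPOINT_SURCHARGE = 0.15  # +15% on any listed rate

def estimate_cost(model_class: str, tokens: int, custom_checkpoint: bool = False) -> float:
    """Estimated USD cost for a given token count at Open Shard rates."""
    rate = RATE_PER_M_TOK[model_class]
    if custom_checkpoint:
        rate *= 1 + CUSTOM_CHECKPOINT_SURCHARGE
    return tokens / 1_000_000 * rate

# 20M tokens on the 70B class:
print(f"${estimate_cost('llama-3.3-70b-class', 20_000_000):.2f}")
```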
DEDICATED SHARD · Recommended

Reserve a node.

Lock a specific GPU end-to-end. Your model, your quant, your batch settings. We do not share the card with anyone else while you hold it.

RTX 5090 (32 GB) · $1.50/hr
DGX Spark (128 GB UMA) · $2.50/hr
RTX PRO 6000 (96 GB) · $3.50/hr
Spark Pair (256 GB UMA) · $4.75/hr
  • Bring-your-own model + quant
  • vLLM tuning to your batch profile
  • Hourly billing, no commit
  • Slack channel with the operator
  • FP4 / FP8 / INT8 / FP16 routing
SLA: Best-effort 99.0% during the reserved window. Single-site, single-operator. Disclosed up front. Pager rotation: one human.
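A quick way to pick between this tier and Open Shard: at the listed rates, a reserved node wins once sustained throughput crosses a break-even point. A sketch:

```python
def breakeven_tokens_per_hour(hourly_usd: float, per_m_tok_usd: float) -> float:
    """Tokens per hour above which a dedicated node is cheaper than per-token."""
    return hourly_usd / per_m_tok_usd * 1_000_000

# DGX Spark at $2.50/hr vs the 70B-class rate of $0.55/M tok:
tok_per_hr = breakeven_tokens_per_hour(2.50, 0.55)
print(f"{tok_per_hr:,.0f} tok/hr (~{tok_per_hr / 3600:,.0f} tok/s sustained)")
```

Roughly 4.5M tokens per hour, about 1,260 tok/s sustained. Below that, pay per token; above it, reserve the card.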
PRIVATE FLEET · monthly

Take the fleet.

Reserve N nodes for a month or longer. Custom routing, custom isolation, custom SLA. White-glove operations. For teams shipping AI as a feature.

Discount vs hourly · 30–40%
Minimum term · 6 months
Onboarding · $0, included
Named SLA · Custom
  • Up to 5 nodes reserved
  • Dedicated Tailscale ACL
  • US-only data residency, written
  • Custom kernels / quants / models
  • Quarterly business review with operator
SLA: Negotiated. Single-site constraint disclosed in contract until Q4 2026 multi-site rollout.
Prices listed per million tokens / per GPU-hour. Machine-readable: pricing.json
04 · Case study

The first tenant is us.

NPCx · engine.npcx.gg

An AI dialogue engine for FiveM roleplay servers, in production for PENTA.

NPCx is iShard's first internal tenant. It serves real-time character dialogue and cinematic camera direction inside FiveM RP environments, with PENTA — a 364K-follower Twitch streamer — as its largest end-user. The model stack runs on ijarvis-slim via vLLM on port 8100, with Kokoro TTS on CPU port 8400 and an HTTP proxy on 8300.

It is our public proof that the substrate works. We list it because traction is the only honest thing to list. Pretending we have a thousand external customers when we do not is the fastest way to lose the ones we earn.

First iShard tenant · NPCx
NPCx end-user · PENTA
Downstream Twitch reach · 364K
External iShard customers · 0 today
05 · Hardware

What the iron actually is.

Node ID · Silicon · Memory · Native precision · Role · Bookable
spark-alpha · NVIDIA GB10 Grace Blackwell · 128 GB UMA · FP4 / FP8 / FP16 · Large model serving · Yes
spark-bravo · NVIDIA GB10 Grace Blackwell · 128 GB UMA · FP4 / FP8 / FP16 · Large model, pair-link capable · Yes
ijarvis-slim · NVIDIA RTX PRO 6000 Blackwell · 96 GB GDDR7 · FP4 / FP8 / INT8 / FP16 · Production, NPCx anchor · Partial
ijarvis-thick · NVIDIA RTX 5090 Blackwell · 32 GB GDDR7 · FP4 / FP8 / FP16 · Fast small-model serving · Yes
ijarvis-thin · NVIDIA RTX 5090 Blackwell · 32 GB GDDR7 · FP4 / FP8 / FP16 · Fast small-model serving · Yes
06 · How it works

Four steps to a shard.

i.

Tell us the workload

Model, quant, expected RPS, latency target, residency requirements. Two-paragraph form, no sales call required.

ii.

Match to a node

Routing is decided by the operator, not a load balancer. Spark for big-context UMA, PRO 6000 for prod throughput, 5090 for sub-100ms TTFT on small models.
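The matching above is a human judgment call; this toy heuristic only illustrates the shape of the decision. The thresholds and class names are ours, not iShard's actual routing logic:

```python
def match_node_class(weight_gb: float, latency_critical: bool) -> str:
    """Toy node-matching heuristic -- illustrative thresholds, not iShard's real rules."""
    if weight_gb > 96:
        return "spark"          # 128 GB UMA (or the pair) for big contiguous models
    if latency_critical and weight_gb <= 32:
        return "rtx-5090"       # fast TTFT on small models
    return "rtx-pro-6000"       # 96 GB production throughput

print(match_node_class(140, latency_critical=False))   # big model -> spark
print(match_node_class(14, latency_critical=True))     # small, latency-bound -> rtx-5090
print(match_node_class(70, latency_critical=False))    # mid-size prod -> rtx-pro-6000
```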

iii.

Provision & tune

vLLM stood up with your batch size, your kv-cache budget, your speculative-decoding setting. You get an OpenAI-compatible base_url, plus a Tailscale invite on Private Fleet.

iv.

Ship

Stream traffic. Watch tokens-per-second on the operator dashboard. Cancel any time on hourly. We will tell you if a different shard would suit you better. We would rather lose the upgrade than the trust.

07 · Agents

Agents are first-class buyers.

Pricing, fleet state, capabilities, and reservations are all published as machine-readable files at the same paths a human would visit. Crawlers and MCP-aware agents read the same data the homepage renders.

Reservation flow: an agent POSTs to the API, gets a single-use Stripe payment link back, and the link's holder (agent or delegated human) completes payment. iShard then issues a per-tenant Tailscale ACL and a scoped base_url. No card data ever crosses the agent boundary.

$ curl · reserve a Dedicated Shard via API
# POST /v1/reservations
curl https://api.ishard.us/v1/reservations \
  -H "Authorization: Bearer $ISHARD_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "node_class": "spark",
    "duration_hours": 4,
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "quant": "fp8"
  }'

# Response includes a single-use Stripe payment link.
# Settlement triggers Tailscale ACL issuance + base_url.
{
  "reservation_id": "rsv_01H...",
  "node": "spark-bravo",
  "window": { "start": "2026-05-07T18:00Z", "end": "2026-05-07T22:00Z" },
  "pay": "https://buy.stripe.com/...",
  "base_url_on_settlement": "https://spark-bravo.ishard.us/v1"
}

Agents pay via single-use Stripe payment links scoped to one reservation — no agent-readable card details ever required. Per-tenant Tailscale ACLs are issued on settlement. The API surface is open today for partners; general availability tracks the Q3 control-plane build. Email hello@ijarvis.ai to onboard early.
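For agents that prefer Python to curl, the documented request body can be built like this. Payload construction only; the endpoint itself is partner-gated today, and the field names are copied from the curl example above:

```python
import json

def reservation_payload(node_class: str, duration_hours: int,
                        model: str, quant: str) -> str:
    """Serialize a reservation request body matching the documented API shape."""
    return json.dumps({
        "node_class": node_class,
        "duration_hours": duration_hours,
        "model": model,
        "quant": quant,
    })

body = reservation_payload("spark", 4, "meta-llama/Llama-3.3-70B-Instruct", "fp8")
print(body)  # POST this to /v1/reservations with a Bearer key
```

Settlement of the returned Stripe link, not this request, is what triggers ACL issuance and the scoped base_url.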

WHAT'S WRONG WITH IT

Five nodes is the feature, not the bug.

iShard is small on purpose, and small for now. We do not run thousands of GPUs because we do not want to. The buyers we are built for need a Blackwell card with their model on it, tuned by a human who reads their batch profile and does not guess.

We would rather be specific about what we are than blur into the mid-tier of the inference market. So here is everything you would otherwise have to extract from us:

  • SITE · One physical site (Tampa, FL) until the Q4 2026 multi-site split. Your single Dedicated Shard goes down if our power does. Open Shard work fails over within the fleet only.
  • PAGER · One operator on call. You will know their name on Private Fleet. Pixalate is their day job.
  • CAPACITY · 4.5 nodes are externally bookable. ijarvis-slim runs internal NPCx workload — partially shared, never silently overcommitted.
  • PRICE FLOOR · We are not cheaper than the cheapest serverless 70B on the market. We are not trying to be. If price is the only variable, go there.
  • COMPLIANCE · No SOC 2 today. No HIPAA BAA today. US-only data residency, written into Private Fleet contracts. Encryption in transit (TLS 1.3) and at rest (LUKS / dm-crypt) on production volumes.
  • ROADMAP · Q3 2026: status.ishard.us with real fleet telemetry; control-plane GA. Q4: second site (colo) for multi-site Private Fleet. 2027: optional GPU pooling with vetted partners under iShard SLA.
ishard.us · iJarvis Compute

Open a shard. Or talk to the operator.