Infinity Wars: XR on bare minimum hardware

Prince Larbi

So here’s how it began — I got a VR headset. Hurray! Not to game, but to build (that was the original plan, but hey, you know what all work and no play does to Jack).

You see, I’ve always had this natural pull towards understanding how tech pieces fit together — VR, IoT, AI, Blockchain, etc. But I also didn’t exactly have the luxury of unlimited hardware or funding. So I did what most curious builders do: make it work with what I’ve got.

Tools you can get your hands on with zero budget.

Tool / Lib          | Use Case                          | Works on
Three.js + WebXR    | Render 3D, handle VR interactions | Browser
TensorFlow.js       | Run ML in browser                 | CPU
MQTT.js / Socket.io | IoT communication                 | Localhost or broker
GGML + Whisper.cpp  | Voice control                     | Local CPU (no GPU needed)
A-Frame + AR.js     | Easy AR scenes                    | Browser (mobile too)

I’m running Fedora Linux (since 2023, proudly), and honestly, getting Blender, Unity, or even Godot to run smoothly? That’s a full-time job on its own. Blender works… until it doesn’t. Unity? Refused to open. Meta’s dev tools? Wahala from start to finish.

At some point, I had to stop forcing it.

Enter WebXR — Peace Treaty

One day, I stumbled on this WebXR demo on Glitch and something just clicked… could’ve been my back, not sure, but it was a moment. Hehe.

WebXR lets you build immersive VR/AR apps directly in the browser using HTML, JavaScript, and some WebGL or Three.js sauce. No heavy engines. No platform-specific SDKs.

I’ve left a side note of resources at the end; feel free to check them out.

In short: I finally had a way to start building without breaking my system.

I got a simple VR scene to render. Then I added buttons. Then I tried making the camera react to the headset's rotation. And just like that, I was in.
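If you’re curious how little code that first step takes, here’s a minimal sketch using Three.js and its VRButton helper (a rough starting point, assuming the usual three/addons import setup, not a full app):

import * as THREE from "three";
import { VRButton } from "three/addons/webxr/VRButton.js";

// Basic scene: a WebXR-enabled renderer, a camera, and one cube to look at.
const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(70, window.innerWidth / window.innerHeight, 0.1, 100);

const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(window.innerWidth, window.innerHeight);
renderer.xr.enabled = true;                                  // turn on WebXR support
document.body.appendChild(renderer.domElement);
document.body.appendChild(VRButton.createButton(renderer)); // adds an "Enter VR" button

const cube = new THREE.Mesh(
  new THREE.BoxGeometry(0.3, 0.3, 0.3),
  new THREE.MeshNormalMaterial()
);
cube.position.set(0, 1.5, -1); // roughly eye height, one metre ahead
scene.add(cube);

// Once the session starts, the headset's rotation drives the camera for you.
renderer.setAnimationLoop(() => renderer.render(scene, camera));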

But I Didn’t Just Want Pretty Scenes

I wanted more than just building a 3D room that looked nice.


I’ve always been big on connecting the virtual with the real. That’s where the idea landed: controlling real-world devices from inside VR. Lights, fans, AC units—standard IoT stuff, but this time in XR.

Now here's where it got spicy.

Big Word, Spatial Intelligence

Mid crashing-out on Blender and discovering WebXR, I stumbled into spatial data and this thing called hit testing. Basically, it's a way for the VR system to understand where stuff is in 3D space—and where you're pointing, looking, or moving.

🔍 Spatial Intelligence Isn’t Just Geometry — It’s Context

  1. Positional Tracking:

    • Your headset’s position (x, y, z)

    • Orientation (rotation quaternion or Euler angles)

  2. Reference Spaces:

    • viewer, local, local-floor, bounded-floor, unbounded

    • These define how your scene relates to the real world (is it anchored? floating? room-scale?); there’s a minimal pose-reading sketch after this list

  3. Hit Testing:

    • Shoot a virtual ray from your eyes/hands and detect what it intersects in the physical or augmented scene.
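
To make the first two items concrete, here’s a minimal pose-reading sketch. It assumes xrSession is an already-running immersive session (how you start it doesn’t matter here):

// Read the headset's position and orientation every frame,
// relative to a local-floor reference space (origin on the floor).
xrSession.requestReferenceSpace("local-floor").then(refSpace => {
  xrSession.requestAnimationFrame(function onFrame(time, frame) {
    const viewerPose = frame.getViewerPose(refSpace);
    if (viewerPose) {
      const { position, orientation } = viewerPose.transform; // (x, y, z) + quaternion
      console.log("Headset at", position.x, position.y, position.z);
    }
    xrSession.requestAnimationFrame(onFrame);
  });
});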

But alone, that’s just math and coordinates. Here’s where we go deeper.

🧠 Making XR Smarter with Contextual Models

Let’s say your VR app knows you're looking at a physical object. Cool. But what’s that object? A ceiling fan? A coffee machine? A gas leak sensor?

That’s where spatial awareness meets local AI models → spatial intelligence!

Data Pipeline Example

Headset / Controller
  └── Spatial Pose (x,y,z) + Direction Vector
        └── Hit Test Source
              └── 3D Anchor
                    └── Object Metadata / ID
                          └── Contextual Action Pipeline
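
To make the middle of that pipeline concrete, here’s a rough sketch of resolving a hit position to the nearest registered twin. The twin list, positions, and distance threshold are made up for illustration:

// Hypothetical registry of digital twins and where they sit in the room (metres).
const twinPositions = [
  { objectId: "ac-unit-01", position: { x: 1.2, y: 2.1, z: -0.5 } },
  { objectId: "fan-02", position: { x: -0.8, y: 2.3, z: 1.0 } },
];

// Given a hit-test position, return the closest twin within maxDistance, or null.
function resolveTwin(hitPosition, maxDistance = 0.5) {
  let best = null;
  let bestDist = Infinity;
  for (const twin of twinPositions) {
    const dist = Math.hypot(
      twin.position.x - hitPosition.x,
      twin.position.y - hitPosition.y,
      twin.position.z - hitPosition.z
    );
    if (dist < bestDist && dist <= maxDistance) {
      best = twin;
      bestDist = dist;
    }
  }
  return best;
}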

Practical Use Case

Let’s say your XR scene includes digital twins of real-world devices, and each twin has:

  • An object_id

  • A bounding box (via hit test)

  • A reference to a real IoT endpoint (via MQTT, REST, or local socket)

Now, your XR app detects you’re pointing at device_id: ac-unit-01, which is in a “Living Room” node of your home layout. Once that’s confirmed, your model can:

  1. Query real-time state: temperature, power draw, mode.

  2. Run a local LLM or intent parser (e.g., Whisper + GGML):
    “Lower the temperature” → AC.setTemp(current - 2)

  3. Evaluate safety or contextual constraints:

    • Is someone else in the room?

    • Is the room already cooling too fast?

    • Should you get a notification instead?

This is spatial + contextual awareness in action.
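
Wired up, the last hop of that pipeline might look something like the sketch below. Everything here is illustrative: the twins registry, topics, broker address, and intent names are made up, and the spatial side (knowing you’re pointing at ac-unit-01) is assumed to have already happened:

import mqtt from "mqtt";

// Hypothetical mapping from digital-twin IDs to real IoT endpoints (MQTT topics).
const twins = {
  "ac-unit-01": { room: "Living Room", topic: "home/living-room/ac" },
};

// MQTT.js connects over WebSockets in the browser; a local broker with a
// WebSocket listener on port 9001 is assumed here.
const client = mqtt.connect("ws://localhost:9001");

// Called once hit testing + the intent parser agree on a target and an action.
function handleIntent(objectId, intent) {
  const twin = twins[objectId];
  if (!twin) return;

  if (intent === "lower_temperature") {
    // Safety / contextual checks (occupancy, cooling rate, etc.) would go here.
    client.publish(twin.topic, JSON.stringify({ cmd: "setTemp", delta: -2 }));
  }
}

handleIntent("ac-unit-01", "lower_temperature");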

Instead of making people click around floating panels, what if I could let them just look at a device, point at it, and then instantly get options to control it?

Charle, imagine mapping these actions to events:

  • Gaze + dwell → Intent (rough sketch after this list)

  • Hand swipe → Command

  • Step into space → Activate mode

  • Point + hold + confirm → Device action
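
For the first of those mappings, a rough gaze-dwell sketch could look like this. The dwell time is an arbitrary choice, and the per-frame raycast that tells you which twin is being looked at is assumed to exist already:

const DWELL_MS = 1200; // how long a gaze must hold before it counts as intent
let gazedId = null;    // twin currently under the gaze
let gazeStart = 0;

// Call this once per frame with whatever the gaze raycast is currently hitting.
function onGazeFrame(hoveredTwinId, now) {
  if (hoveredTwinId !== gazedId) {
    gazedId = hoveredTwinId; // gaze moved to a new target: restart the clock
    gazeStart = now;
    return null;
  }
  if (gazedId && now - gazeStart >= DWELL_MS) {
    gazeStart = now;         // reset so it doesn't fire every frame afterwards
    return { type: "intent", target: gazedId };
  }
  return null;
}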

Juju As A Service (JAAS)

Kind of like how command-line users don’t want to dig through 5 menus just to install something—they just paste the command and boom, it’s done. That’s what I’m trying to bring into VR for smart home control: spatial efficiency.

The Bigger Picture (And a Small Demo)

I’ve been tinkering with routing live camera feeds into local models on my machine—just to see what’s possible. Still early, but it’s opening new doors.

Here’s a barebones sample if you're curious:

Camera Feed + Frame Routing

navigator.mediaDevices.getUserMedia({ video: true })
  .then(stream => {
    const video = document.querySelector("video"); // expects a <video> element on the page
    video.srcObject = stream;
    video.play();

    const canvas = document.createElement("canvas");
    const ctx = canvas.getContext("2d");

    setInterval(() => {
      if (!video.videoWidth) return;    // skip until the stream has real frames
      canvas.width = video.videoWidth;  // size the canvas to the actual frame
      canvas.height = video.videoHeight;
      ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
      const frame = canvas.toDataURL(); // route this to your model for inference
    }, 100);
  })
  .catch(err => console.error("Camera access failed:", err));
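
From there, routing a frame to a local model can be as simple as posting it to whatever you’re running on your machine. The endpoint below is a placeholder for your own local inference server, not a real API:

// Hypothetical helper: send one captured frame to a local inference server.
async function routeFrame(dataUrl) {
  try {
    const res = await fetch("http://localhost:8000/infer", { // placeholder endpoint
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ image: dataUrl }),
    });
    const result = await res.json(); // e.g. detected objects, labels, boxes
    console.log("Model said:", result);
  } catch (err) {
    console.error("Inference request failed:", err);
  }
}

Swapping routeFrame(frame) in where the toDataURL comment sits above is enough to wire the camera loop to the model.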

Hit Testing Basics

// Note: the XR session must be created with requiredFeatures: ['hit-test'].
xrSession.requestReferenceSpace('viewer').then(viewerSpace => {
  xrSession.requestHitTestSource({ space: viewerSpace }).then(source => {
    xrSession.requestAnimationFrame(function frameLoop(time, frame) {
      const hits = frame.getHitTestResults(source);
      if (hits.length > 0) {
        const pose = hits[0].getPose(viewerSpace); // pose relative to the viewer
        console.log("Hit at:", pose.transform.position);
      }
      xrSession.requestAnimationFrame(frameLoop);
    });
  });
});

The real magic comes when you layer sensor data, ML inference, and spatial understanding together.

It stops being “a VR scene” and becomes a reactive space.

Side Note for Nerds (You Know Who You Are)

Here are some technical docs I’ve been referencing along the way:

Also worth looking into:

Final Thoughts (For Now)

This combo of WebXR + IoT + spatial data is low-barrier enough for indie hackers, but rich enough to scale into complex use cases.

Layer spatial data, machine learning, and real-world control together, and it stops being just a VR app. It becomes an intelligent agent inside a digital twin of your home.

Not just "press button to switch fan" — but “look at your space, and have it respond to you.”

If you're playing with similar ideas or want to experiment with spatially-aware smart home interfaces, let’s link up. Email me at phiddyconcept@gmail.com

More soon — bye.
