How to Create a VTuber Studio with Three.js, React & VRM


Want to build your own VTuber studio on the web? Thanks to modern tools like Three.js, VRM models, and Mediapipe, creating an interactive 3D avatar that mimics your facial expressions is totally possible—even in your browser.
In this guide, we’ll go over how to bring a VTuber avatar to life using Three.js with React. For detailed code and implementation, check out the full walkthrough video below.
What is a VRM?
Let’s take a look at what a VRM is.
VRM stands for Virtual Reality Model. It’s a 3D avatar format widely used by VTubers and in games, thanks to its standardized structure that makes it easy to reuse the same character across different platforms and experiences.
You can find free avatars or create your own using the free tool VRoid Studio.
There are also utility libraries available to load VRM models into engines like Unity and Three.js, making integration super accessible for developers.
👉 For more info, visit the official VRM documentation.
Find VRM Models
If you’re just getting started, the fastest way is to download pre-made avatars from the community.
🎨 Explore a huge library of user-created characters here:
👉 hub.vroid.com
Once downloaded, you’ll get a .vrm file, which you can load directly into your Three.js scene using the right loader.
Build Your VRM
Want to create a custom avatar?
VRoid Studio is your go-to tool. It’s a free app for Windows and macOS that lets you:
Design anime-style avatars with a visual editor
Customize facial features, hair, clothing, and more
Export the finished avatar in the .vrm format
Here is a link to the documentation on how to export the VRM model.
The exported file will be ready to use in any VTuber application—including your own studio.
Load and Control VRM with Three.js
To bring your avatar into a web environment, you’ll need Three.js and the VRM loader from the @pixiv/three-vrm library.
The basic steps are:
Set up your Three.js scene
Load the .vrm file with the VRM loader
Add it to your scene and animate it with custom logic (see the sketch below)
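Here’s a minimal loading sketch using @pixiv/three-vrm (v1 or later). It assumes you already have a renderer and camera set up, and the model path is a placeholder:

```javascript
import * as THREE from 'three';
import { GLTFLoader } from 'three/examples/jsm/loaders/GLTFLoader.js';
import { VRMLoaderPlugin } from '@pixiv/three-vrm';

const scene = new THREE.Scene();

// A .vrm file is a glTF binary with VRM extensions, so we load it
// with the standard GLTFLoader plus the VRM plugin.
const loader = new GLTFLoader();
loader.register((parser) => new VRMLoaderPlugin(parser));

loader.load('/models/avatar.vrm', (gltf) => {
  // The plugin attaches the parsed VRM instance to gltf.userData.
  const vrm = gltf.userData.vrm;
  scene.add(vrm.scene);
});
```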
The VRM file includes bones and expressions that you can control in several ways: manually, through FBX animations, or with camera tracking.
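For example, with the three-vrm v1 API you can rotate a bone or drive an expression directly in your render loop. A rough sketch, where `vrm` comes from the loader callback above and `clock` is a THREE.Clock from your own setup:

```javascript
// Inside your render loop, once the model has loaded.
const delta = clock.getDelta();

// Bones: grab a humanoid bone by name and rotate it like any Object3D.
const head = vrm.humanoid.getNormalizedBoneNode('head');
head.rotation.y = Math.sin(clock.elapsedTime) * 0.5;

// Expressions: drive preset expressions ('blink', 'happy', 'aa', ...) by weight.
vrm.expressionManager.setValue('blink', 0.8);

// Required every frame so expressions, look-at, and spring bones update.
vrm.update(delta);
```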
Use Mediapipe with Kalidokit to Control the Avatar
For real-time face and body tracking, Google’s Mediapipe is incredible. It runs in the browser using WebAssembly and gives you landmark data for:
Eyes
Mouth
Face mesh
Hands
Pose
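Setting up face tracking with the current @mediapipe/tasks-vision package looks roughly like this (the CDN and model URLs follow Mediapipe’s published examples; adapt them to your setup):

```javascript
import { FilesetResolver, FaceLandmarker } from '@mediapipe/tasks-vision';

// Load the WASM runtime and the face landmark model.
const vision = await FilesetResolver.forVisionTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm'
);
const faceLandmarker = await FaceLandmarker.createFromOptions(vision, {
  baseOptions: {
    modelAssetPath:
      'https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/1/face_landmarker.task',
  },
  runningMode: 'VIDEO',
});

// Call this once per animation frame with your webcam <video> element.
function track(video) {
  const results = faceLandmarker.detectForVideo(video, performance.now());
  // results.faceLandmarks[0] is an array of { x, y, z } points for one face.
  return results.faceLandmarks[0];
}
```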
But Mediapipe only gives you raw data. This is where Kalidokit comes in. It maps Mediapipe’s landmarks to a VRM model's facial expressions and bones.
Here’s how it works:
Mediapipe tracks your webcam feed and extracts landmark data.
Kalidokit interprets that data and calculates bone rotations.
You apply these values to your VRM model in Three.js.
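Put together, a single tracking frame might look like the following sketch. Here `landmarks` is the face landmark array from Mediapipe (previous sketch), `video` is your webcam element, and `vrm` is the loaded three-vrm instance; the expression names ('aa', 'blink') are VRM presets, and the exact mapping is up to you:

```javascript
import { Face } from 'kalidokit';

// Solve Mediapipe landmarks into head rotation, eye, and mouth values.
const riggedFace = Face.solve(landmarks, { runtime: 'mediapipe', video });

if (riggedFace) {
  // Head rotation (radians) onto the VRM's normalized head bone.
  const head = vrm.humanoid.getNormalizedBoneNode('head');
  head.rotation.set(riggedFace.head.x, riggedFace.head.y, riggedFace.head.z);

  // Mouth openness onto the 'aa' viseme expression.
  vrm.expressionManager.setValue('aa', riggedFace.mouth.shape.A);

  // Kalidokit reports 1 = eye open, so invert for the 'blink' expression.
  vrm.expressionManager.setValue('blink', 1 - riggedFace.eye.l);
}
```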
It’s like motion capture, but in real time and in your browser.
Check out Kalidoface, made by the creators of Kalidokit. It’s a polished app that uses the same technology to track your face and apply it to a VRM model.
Conclusion
By combining VRM avatars, Three.js rendering, Mediapipe tracking, and Kalidokit’s mapping magic, you can build your very own VTuber Studio—all with web technologies.
Whether you’re planning to stream, build a game, or just experiment with 3D avatars, this setup gives you a powerful and accessible starting point.
🧠 Want to see it all in action?
Watch the full video tutorial above for a step-by-step guide.
Check out the code on GitHub
Check out the final project at https://vrm.wawasensei.dev/
Written by Wawa Sensei
3D Web Developer 💻 · Talks about Three.js and React Three Fiber · YouTuber · Indie Hacker · DevRel at Elestio