Understanding Embedded vs. Referenced Media in MongoDB: A Guide
Zeeshan Ali
Introduction
When building LocaleBlend—an open-source social media app—I faced a critical design decision: How should I store user-generated media (images and videos) in MongoDB? Should I embed media directly inside posts or reference them in separate collections? This blog walks through my thought process, the challenges I encountered, and why I ultimately chose the referencing approach. If you’re struggling with similar database design dilemmas, this post is for you.
The Problem: Media Storage in a Hybrid App
LocaleBlend combines social media features (posts, comments) with dating functionalities (swipes, matches). Users can:
Upload photos/videos to posts.
Share media during chats.
Sell items in a marketplace (like Facebook Marketplace).
Key Requirements:
Scalability: Handle thousands of posts with media.
Performance: Load posts and media quickly.
Flexibility: Allow media reuse (e.g., a profile picture in multiple posts).
Maintainability: Easily update or delete media without breaking posts.
The Two Contenders: Embedded vs. Referenced Media
1. Embedded Media: Simplicity at a Cost
Initially, I considered embedding media directly in the posts collection:
{
  _id: "post_123",
  content: "My weekend adventure!",
  media: [
    { type: "image", url: "image1.jpg" },
    { type: "video", url: "video1.mp4" }
  ]
}
Pros:
Single Query: Fetch posts and media in one go.
Atomic Updates: A post and its media entries live in one document, so deleting the post removes them in the same atomic operation.
Cons:
Document Bloat: Every embedded item grows the post document, which is capped at MongoDB’s 16MB limit. Posts with many media entries or heavy per-item data creep toward it.
Duplication: Reusing media across posts meant storing redundant copies.
Complex Queries: Hard to query media independently (e.g., "Find all videos by a user").
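To make that last drawback concrete, here is a rough sketch of what "find all videos by a user" could look like when media is embedded. The `author` field is an assumption (the sample post above does not show one), and the function only builds the pipeline; it would still need to be passed to something like `aggregate()` on the posts collection.

```javascript
// Sketch: aggregation pipeline for "all videos by a user" with embedded media.
// Assumption: each post has an `author` field (hypothetical, not in the sample document).
function embeddedVideosByUserPipeline(userId) {
  return [
    // Keep only this user's posts that contain at least one video.
    { $match: { author: userId, "media.type": "video" } },
    // Emit one document per embedded media item.
    { $unwind: "$media" },
    // Drop the images that were unwound from the same posts.
    { $match: { "media.type": "video" } },
    // Return just the owning post id and the video url.
    { $project: { _id: 0, postId: "$_id", url: "$media.url" } },
  ];
}

const embeddedPipeline = embeddedVideosByUserPipeline("user_123");
console.log(JSON.stringify(embeddedPipeline, null, 2));
```

Four stages for a simple question is the price of keeping media inside posts: every media query has to go through the posts collection first.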
2. Referenced Media: Complexity for Flexibility
I then explored separating media into their own collections:
// posts collection
{
  _id: "post_123",
  content: "My weekend adventure!",
  pictures: ["pic_456", "pic_789"], // ObjectIds from pictureMedia
  videos: ["vid_101"] // ObjectIds from videoMedia
}

// pictureMedia collection
{
  _id: "pic_456",
  url: "image1.jpg",
  owner: "user_123",
  post: "post_123"
}

// videoMedia collection
{
  _id: "vid_101",
  url: "video1.mp4",
  owner: "user_123",
  post: "post_123"
}
Pros:
Reusability: Media can be linked to multiple posts (e.g., a user’s profile picture).
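Scalability: Post documents stay small because each media item lives in its own document (the 16MB limit still applies, but per media document rather than per post).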
Independent Management: Update/delete media without touching posts.
Cons:
Multiple Queries: Fetching a post’s media requires $lookup aggregations or .populate().
Code Complexity: More collections to manage.
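For reference, a minimal sketch of the `$lookup` route, using the collection and field names from the sample documents above. The output field names `pictureDocs` and `videoDocs` are hypothetical, and the function only builds the pipeline rather than running it.

```javascript
// Sketch: fetch one post together with its pictures and videos in a single aggregation.
function postWithMediaPipeline(postId) {
  return [
    { $match: { _id: postId } },
    {
      $lookup: {
        from: "pictureMedia",   // collection to join
        localField: "pictures", // array of ids stored on the post
        foreignField: "_id",
        as: "pictureDocs",      // hypothetical output field
      },
    },
    {
      $lookup: {
        from: "videoMedia",
        localField: "videos",
        foreignField: "_id",
        as: "videoDocs",        // hypothetical output field
      },
    },
  ];
}

const lookupPipeline = postWithMediaPipeline("post_123");
console.log(JSON.stringify(lookupPipeline, null, 2));
```

When `localField` is an array, `$lookup` matches each element against `foreignField`, so no `$unwind` is needed here. The joined documents are not guaranteed to come back in the same order as the ids on the post.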
Why I Chose Referenced Media
After prototyping both approaches, I settled on referenced media. Here’s why:
1. Future-Proofing for Advanced Features
LocaleBlend’s roadmap includes:
Media Analytics: Track views/likes on individual videos.
Media Galleries: Let users curate photos/videos across posts.
Marketplace Listings: Reuse product images in multiple listings.
Referencing made these features easier to implement.
2. Avoiding Document Bloat
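A single post with 10+ high-resolution images/videos could push the posts document toward the 16MB limit if the files themselves, or large inline metadata, were embedded rather than just their URLs. Referencing kept the posts collection lean.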
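A quick way to get a feel for how close a document is to the limit is to estimate its size. The sketch below uses JSON byte length as a rough stand-in for BSON size (the two are not identical), and the inline `data` field is a hypothetical example of a heavy payload, not something from LocaleBlend's schema.

```javascript
// MongoDB caps a single BSON document at 16 MiB.
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Rough proxy only: JSON byte length differs from the exact BSON size,
// but it is enough to tell "tiny" from "dangerously large".
function approxDocBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

function fitsInOneDocument(doc) {
  return approxDocBytes(doc) < MAX_BSON_BYTES;
}

// Ten embedded items that only store a URL, as in the sample post.
const urlOnlyPost = {
  _id: "post_123",
  media: Array.from({ length: 10 }, (_, i) => ({ type: "image", url: `image${i}.jpg` })),
};

// Ten embedded items that each carry 2 MiB of inline data
// (hypothetical `data` field, e.g. a base64 payload).
const inlinePayloadPost = {
  _id: "post_124",
  media: Array.from({ length: 10 }, () => ({
    type: "image",
    data: "x".repeat(2 * 1024 * 1024),
  })),
};

console.log(fitsInOneDocument(urlOnlyPost));       // true
console.log(fitsInOneDocument(inlinePayloadPost)); // false
```

URL-only entries stay far below the cap, so the risk comes from what travels with each item, not from the item count alone.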
3. Performance Optimizations
While referenced media required more queries, I mitigated performance issues by:
Indexing: Added indexes on the post and owner fields in pictureMedia/videoMedia.
Caching: Used Redis to cache frequently accessed media.
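The indexing step could be sketched roughly like this with the MongoDB Node.js driver's `createIndex`. Here `db` is assumed to be a connected `Db` handle and the helper name is hypothetical; with Mongoose, calling `schema.index({ post: 1 })` on each media schema should achieve the same thing.

```javascript
// Sketch: single-field ascending indexes on the reference fields of both media collections.
const MEDIA_INDEX_SPECS = {
  pictureMedia: [{ post: 1 }, { owner: 1 }],
  videoMedia: [{ post: 1 }, { owner: 1 }],
};

// `db` is assumed to be a connected MongoDB Node.js driver `Db` instance.
async function ensureMediaIndexes(db) {
  const created = [];
  for (const [collectionName, specs] of Object.entries(MEDIA_INDEX_SPECS)) {
    for (const keys of specs) {
      // createIndex resolves to the index name and does nothing
      // if an identical index already exists.
      const indexName = await db.collection(collectionName).createIndex(keys);
      created.push(`${collectionName}.${indexName}`);
    }
  }
  return created;
}
```

With these in place, lookups such as "all pictures for this post" or "all videos by this owner" can use an index instead of scanning the whole collection.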
Challenges I Faced (and How I Solved Them)
1. Joining Data Across Collections
MongoDB isn’t built for joins, but I used Mongoose’s .populate() to fetch media efficiently:
const post = await Post.findById(postId)
  .populate("pictures videos")
  .exec();
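Under the hood, `.populate()` issues follow-up queries for the referenced ids and stitches the results back onto the parent documents. A minimal in-memory sketch of that stitching for a page of posts, assuming the pictures were fetched with a single `$in` query; both helper names and the second post are hypothetical.

```javascript
// Collect every distinct picture id referenced by a page of posts, so they can
// be fetched with one `$in` query instead of one query per post.
function collectPictureIds(posts) {
  return [...new Set(posts.flatMap((post) => post.pictures.map(String)))];
}

// Replace each post's picture ids with the fetched documents, preserving order.
// Ids without a matching document are dropped.
function attachPictures(posts, pictureDocs) {
  // Keys are stringified so the lookup also works with ObjectId values.
  const byId = new Map(pictureDocs.map((doc) => [String(doc._id), doc]));
  return posts.map((post) => ({
    ...post,
    pictures: post.pictures.map((id) => byId.get(String(id))).filter(Boolean),
  }));
}

const feedPosts = [
  { _id: "post_123", pictures: ["pic_456", "pic_789"] },
  { _id: "post_124", pictures: ["pic_456"] }, // hypothetical post reusing a picture
];
// Stand-in for the result of find({ _id: { $in: pictureIds } }) on pictureMedia.
const fetchedPictures = [
  { _id: "pic_456", url: "image1.jpg" },
  { _id: "pic_789", url: "image2.jpg" }, // hypothetical url
];

const pictureIds = collectPictureIds(feedPosts);
const hydratedFeed = attachPictures(feedPosts, fetchedPictures);
console.log(pictureIds);                       // [ 'pic_456', 'pic_789' ]
console.log(hydratedFeed[1].pictures[0].url);  // image1.jpg
```

The point is the query count: one query for the posts plus one per media collection, no matter how many posts are on the page.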
2. Atomic Deletes
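Deleting a post required cleaning up its media. I solved this with Mongoose query middleware. Note that the hook is registered on the schema rather than the compiled model, and that the cleanup runs as separate operations, so it is a cascade rather than a truly atomic delete:
// postSchema is assumed to be the schema the Post model is compiled from.
postSchema.pre("deleteOne", { query: true, document: false }, async function () {
  // In query middleware, `this` is the Query, so the post id comes from its filter.
  const postId = this.getQuery()._id;
  await PictureMedia.deleteMany({ post: postId });
  await VideoMedia.deleteMany({ post: postId });
});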
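If the cleanup really has to be all-or-nothing, one option is to wrap the deletes in a transaction. This is a sketch under a few assumptions: `connection` is a Mongoose connection to a replica set (transactions need one), the three models are the ones used in this post, and the helper name is hypothetical. It is meant as an alternative to the middleware above, not something to stack on top of it.

```javascript
// Sketch: delete a post and its media inside one transaction.
async function deletePostWithMedia({ connection, Post, PictureMedia, VideoMedia }, postId) {
  const session = await connection.startSession();
  try {
    // withTransaction commits if the callback resolves and aborts if it throws.
    await session.withTransaction(async () => {
      // Passing { session } ties each operation to the transaction.
      await PictureMedia.deleteMany({ post: postId }, { session });
      await VideoMedia.deleteMany({ post: postId }, { session });
      await Post.deleteOne({ _id: postId }, { session });
    });
  } finally {
    // Always release the session, even when the transaction failed.
    await session.endSession();
  }
}
```

If any of the three deletes fails, none of them is applied, which avoids posts that point at missing media or media that points at a missing post.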
3. Consistency
To ensure media is always linked to valid posts, I added reference validators:
const pictureSchema = new Schema({
  post: {
    type: Schema.Types.ObjectId,
    ref: "Post",
    validate: {
      validator: async (postId) => {
        const post = await Post.findById(postId);
        return post !== null;
      },
      message: "Post does not exist!",
    },
  },
});
Lessons Learned
Start Simple, Then Optimize: Begin with embedding for simplicity, then switch to referencing if needed.
Index Early: Indexes on foreign keys (post, owner) are non-negotiable.
Use the Right Tools: Libraries like Mongoose simplify referencing with .populate().
Plan for Scale: If your app might grow, design for flexibility upfront.
Conclusion
Referencing media added complexity, but for LocaleBlend’s use case, it was the right trade-off. If your app needs advanced media management, references are worth the effort. But if you’re building a simple blog or forum, embedding might save you time.
About the Author
Hi! I’m a full-stack developer building LocaleBlend, an open-source social-dating hybrid app. Follow my journey on GitHub or Twitter for more tech insights!
Written by
Zeeshan Ali
Hi there! My name is Zeeshan Ali, and I'm a passionate programmer and React Native mobile app developer. I've always had a fascination with technology and its ability to solve complex problems, which is what led me to pursue a career in software development. My love for React JS is what got me into front-end development, and I'm now working towards becoming a MERN stack developer. Aside from my day job, I enjoy contributing to open-source projects and giving back to the tech community. I'm always looking for ways to expand my knowledge and skills, and I'm currently learning Node JS to complement my existing skill set. As a tech blogger on Hashnode, I'm excited to share my experiences, insights, and knowledge with fellow developers and enthusiasts. My goal is to create informative and engaging content that inspires and empowers others to pursue their passion for programming. Whether you're just starting out in the field or looking to advance your career, I hope to provide valuable resources and insights that can help you succeed. Thanks for stopping by, and I look forward to connecting with you on Hashnode!