Understanding Embedded vs. Referenced Media in MongoDB: A Guide

Zeeshan Ali

Introduction

When building LocaleBlend, an open-source social media app, I faced a critical design decision: how should I store user-generated media (images and videos) in MongoDB? Should I embed media directly inside posts, or reference it from separate collections? This post walks through my thought process, the challenges I encountered, and why I ultimately chose the referencing approach. If you’re struggling with a similar database design dilemma, this post is for you.

The Problem: Media Storage in a Hybrid App

LocaleBlend combines social media features (posts, comments) with dating functionalities (swipes, matches). Users can:

  • Upload photos/videos to posts.

  • Share media during chats.

  • Sell items in a marketplace (like Facebook Marketplace).

Key Requirements:

  1. Scalability: Handle thousands of posts with media.

  2. Performance: Load posts and media quickly.

  3. Flexibility: Allow media reuse (e.g., a profile picture in multiple posts).

  4. Maintainability: Easily update or delete media without breaking posts.

The Two Contenders: Embedded vs. Referenced Media

1. Embedded Media: Simplicity at a Cost

Initially, I considered embedding media directly in the posts collection:

{
  _id: "post_123",
  content: "My weekend adventure!",
  media: [
    { type: "image", url: "image1.jpg" },
    { type: "video", url: "video1.mp4" }
  ]
}

Pros:

  • Single Query: Fetch posts and media in one go.

  • Atomic Updates: A post and its embedded media entries live in one document, so they are written and deleted together in a single operation.

Cons:

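  • Document Bloat: Every media entry grows the post document. URL-only entries are small, but embedding binary data or rich per-media metadata could push media-heavy posts toward MongoDB’s 16MB document limit.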

  • Duplication: Reusing media across posts meant storing redundant copies.

  • Complex Queries: Hard to query media independently (e.g., "Find all videos by a user").
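
To make the last point concrete: with embedded media, "find all videos by a user" means unwinding every matching post. Here is a rough sketch of such a pipeline. The author field and the videosByUserPipeline helper are assumptions for illustration, since the post document above does not show who owns it.

```javascript
// Hypothetical helper: builds an aggregation pipeline that returns every
// embedded video from one user's posts.
// Assumes each post also stores an `author` field (not shown in the post above).
function videosByUserPipeline(userId) {
  return [
    // Keep only this user's posts that contain at least one video.
    { $match: { author: userId, "media.type": "video" } },
    // Emit one document per embedded media entry.
    { $unwind: "$media" },
    // Drop the images that were unwound alongside the videos.
    { $match: { "media.type": "video" } },
    // Promote the embedded media entry to the top level of each result.
    { $replaceRoot: { newRoot: "$media" } },
  ];
}

// Usage (assumes a Mongoose model named Post):
//   const videos = await Post.aggregate(videosByUserPipeline("user_123"));
```

Four stages for what would be a single find() on a dedicated media collection is the kind of overhead this con refers to.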

2. Referenced Media: Complexity for Flexibility

I then explored separating media into their own collections:

// posts collection
{
  _id: "post_123",
  content: "My weekend adventure!",
  pictures: ["pic_456", "pic_789"], // ObjectIds from pictureMedia
  videos: ["vid_101"] // ObjectIds from videoMedia
}

// pictureMedia collection
{
  _id: "pic_456",
  url: "image1.jpg",
  owner: "user_123",
  post: "post_123"
}

// videoMedia collection
{
  _id: "vid_101",
  url: "video1.mp4",
  owner: "user_123",
  post: "post_123"
}

Pros:

  • Reusability: Media can be linked to multiple posts (e.g., a user’s profile picture).

  • Scalability: Post documents hold only media IDs, so they stay small and the 16MB document limit stops being a concern for posts.

  • Independent Management: Update/delete media without touching posts.

Cons:

  • Multiple Queries: Fetching a post’s media requires $lookup aggregations or .populate().

  • Code Complexity: More collections to manage.
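
As a point of comparison with .populate(), here is a sketch of the $lookup route. It assumes the collections are literally named pictureMedia and videoMedia, as in the comments above. The postWithMediaPipeline helper and the pictureDocs/videoDocs output fields are made-up names.

```javascript
// Hypothetical helper: builds a pipeline that loads one post together with
// its referenced pictures and videos.
function postWithMediaPipeline(postId) {
  return [
    // Select the post. With Mongoose, pass a value of the same type as the
    // stored _id, because aggregate() does not cast pipeline values.
    { $match: { _id: postId } },
    // Each ID in `pictures` is matched against pictureMedia._id.
    {
      $lookup: {
        from: "pictureMedia",
        localField: "pictures",
        foreignField: "_id",
        as: "pictureDocs",
      },
    },
    // Same join for the `videos` array.
    {
      $lookup: {
        from: "videoMedia",
        localField: "videos",
        foreignField: "_id",
        as: "videoDocs",
      },
    },
  ];
}

// Usage (assumes a Mongoose model named Post):
//   const [post] = await Post.aggregate(postWithMediaPipeline(postId));
```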


Why I Chose Referenced Media

After prototyping both approaches, I settled on referenced media. Here’s why:

1. Future-Proofing for Advanced Features

LocaleBlend’s roadmap includes:

  • Media Analytics: Track views/likes on individual videos.

  • Media Galleries: Let users curate photos/videos across posts.

  • Marketplace Listings: Reuse product images in multiple listings.

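Referencing made these features easier to implement: each media item is its own document, so it can hold its own view/like counters and be linked from a gallery or a listing without being copied.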

2. Avoiding Document Bloat

If the image or video data itself lived inside the post, a single post with 10+ high-resolution files could push the document close to the 16MB limit. Referencing kept documents in the posts collection lean.

3. Performance Optimizations

While referenced media required more queries, I mitigated performance issues by:

  • Indexing: Added indexes on post and owner fields in pictureMedia/videoMedia.

  • Caching: Used Redis to cache frequently accessed media.
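
The caching bullet does not say how Redis was wired in. One common shape is a cache-aside read, sketched here under that assumption. The InMemoryCache class is only a stand-in so the example runs without a Redis server, and getMediaCached, the media: key prefix, and the 300-second TTL are all hypothetical.

```javascript
// Minimal in-memory stand-in for a Redis client (illustration only).
// A real client would talk to a Redis server instead of a Map.
class InMemoryCache {
  constructor() {
    this.store = new Map();
  }
  async get(key) {
    const entry = this.store.get(key);
    // Treat missing and expired entries the same way: as a cache miss.
    if (!entry || entry.expiresAt <= Date.now()) return null;
    return entry.value;
  }
  async set(key, value, ttlSeconds) {
    this.store.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }
}

// Cache-aside read: try the cache first, fall back to the database loader,
// then store the result so the next read skips the database.
async function getMediaCached(cache, mediaId, loadFromDb, ttlSeconds = 300) {
  const key = `media:${mediaId}`;
  const cached = await cache.get(key);
  if (cached !== null) return JSON.parse(cached);

  const media = await loadFromDb(mediaId);
  // Only cache real documents, so a missing ID is looked up again next time.
  if (media !== null) await cache.set(key, JSON.stringify(media), ttlSeconds);
  return media;
}

// Usage (loadFromDb could wrap something like PictureMedia.findById(id).lean()):
//   const media = await getMediaCached(cache, "pic_456", loadFromDb);
```

The short TTL is a trade-off: stale entries expire on their own, at the cost of an occasional extra database read.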


Challenges I Faced (and How I Solved Them)

1. Joining Data Across Collections

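MongoDB has no relational-style joins (the closest built-in is the $lookup aggregation stage), so I used Mongoose’s .populate(), which resolves the references with additional queries behind the scenes: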

const post = await Post.findById(postId)
  .populate("pictures videos")
  .exec();

2. Atomic Deletes

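Deleting a post required cleaning up its media as well. I solved this with Mongoose middleware that cascades the delete to both media collections. The deletes still run as separate operations, so this is cleanup rather than a truly atomic delete: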

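// postSchema is assumed to be the schema the Post model is compiled from.
// Registered as query middleware so it runs on Post.deleteOne({ _id: postId }).
postSchema.pre("deleteOne", { document: false, query: true }, async function () {
  // In query middleware, `this` is the query; read the post id from its filter.
  const postId = this.getQuery()._id;
  await PictureMedia.deleteMany({ post: postId });
  await VideoMedia.deleteMany({ post: postId });
});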
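
If the cleanup has to be all-or-nothing, one option is a multi-document transaction instead of the hook above. This is a sketch under assumptions, not what LocaleBlend does: deletePostWithMedia is a hypothetical helper, the models are passed in explicitly, and transactions require a replica set or sharded cluster.

```javascript
// Sketch: delete a post and its media inside one transaction.
// `models` and `session` are passed in so the function has no hidden
// dependencies; with Mongoose the session would come from mongoose.startSession().
async function deletePostWithMedia(models, session, postId) {
  const { Post, PictureMedia, VideoMedia } = models;
  await session.withTransaction(async () => {
    // Passing { session } ties each operation to the transaction, so either
    // all three deletes commit or none of them do.
    await PictureMedia.deleteMany({ post: postId }, { session });
    await VideoMedia.deleteMany({ post: postId }, { session });
    await Post.deleteOne({ _id: postId }, { session });
  });
}

// Usage (assumes the three Mongoose models are in scope):
//   const session = await mongoose.startSession();
//   try {
//     await deletePostWithMedia({ Post, PictureMedia, VideoMedia }, session, postId);
//   } finally {
//     await session.endSession();
//   }
```

Note that a pre("deleteOne") hook would still fire on Post.deleteOne() here and run its own deletes outside the session, so the two approaches are alternatives rather than something to combine as-is.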

3. Consistency

To ensure media is always linked to valid posts, I added reference validators:

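const { Schema } = require("mongoose");

const pictureSchema = new Schema({
  post: {
    type: Schema.Types.ObjectId,
    ref: "Post",
    validate: {
      // Async validator: passes only if the referenced post exists.
      // Post is assumed to be the compiled Post model, already in scope.
      validator: async (postId) => {
        const post = await Post.findById(postId);
        return post !== null;
      },
      message: "Post does not exist!",
    },
  },
});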

Lessons Learned

  1. Start Simple, Then Optimize: Begin with embedding for simplicity, then switch to referencing if needed.

  2. Index Early: Indexes on the reference fields you query by (post, owner) are non-negotiable.

  3. Use the Right Tools: Libraries like Mongoose simplify referencing with .populate().

  4. Plan for Scale: If your app might grow, design for flexibility upfront.


Conclusion

Referencing media added complexity, but for LocaleBlend’s use case, it was the right trade-off. If your app needs advanced media management, references are worth the effort. But if you’re building a simple blog or forum, embedding might save you time.


About the Author
Hi! I’m a full-stack developer building LocaleBlend, an open-source social-dating hybrid app. Follow my journey on GitHub or Twitter for more tech insights!

