Git from the Inside: A Developer’s Guide to Git Internals


“Do you know how to use Git?”
That was the first question my colleague asked me during a GitHub Actions session I was preparing for at work. At that point, I had already been writing code for over three years. I’d worked with teams, pushed code, made pull requests—so I confidently replied, “Yes, of course.”
And I wasn’t lying. I knew Git well enough, or at least that’s what I thought.
I knew that Git was a version control system that allowed developers to collaborate on code stored in a central repository. Each developer could clone the project, make changes, and then push those changes back so the whole team stayed in sync.
I knew about commands like git add, git commit, and git push. I understood branches and how to merge them. I knew how to resolve some basic merge conflicts.
That was it. That was the extent of my Git knowledge.
"Don’t worry. Most of the time you’ll only use 6 or 7 commands. Just make sure you always work on a branch and never commit directly to main."
That advice stuck with me, and as time passed, I got pretty comfortable using Git for everyday development: git pull, git checkout, git add, git commit, git push, and git merge. I even picked up more advanced tools like git stash and git cherry-pick, and occasionally used git reset or git revert when things went wrong.
By then, I thought I had a good handle on Git.
Then one day I was working on a simple app with a colleague. We had a few commits, everything was going smoothly, and then... I ran a git reset to "fix something".
And suddenly, my changes were gone. The commit history looked weird. My friend couldn’t even pull my changes.
I panicked.
I googled every combination of git undo, git restore, git reflog... But it felt like I was trying to reverse-engineer a black box I didn’t really understand.
And that’s when it hit me:
I use Git every day. But I don’t really know how Git works.
What exactly is a commit? Where does Git store files? Why is it so fast and efficient?
That incident sparked a curiosity. I started digging into Git internals. Not just the commands — but the architecture, the object model, and the magic behind it.
What I discovered changed everything.
What is Git?
Let’s take a look at the official Git documentation:
"Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals."
In simpler terms: Git is a content tracker. And that content can be anything—not just code.
This blew my mind. I always thought Git was a code versioning system. But it turns out Git is a system for tracking any change in any content.
How does Git track content?
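Git works like a key-value store. It takes any content (a sequence of bytes), prepends a short header giving the object’s type and size, hashes the result with SHA-1, and stores the compressed content under that hash. The hash is the key; the content is the value.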
So, if you give Git the same content on two different machines, Git will generate the same hash for it.
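To make that concrete, here is a minimal Python sketch of how a blob ID can be computed. It assumes a repository using the default SHA-1 object format; the function name git_blob_id is my own and not part of Git.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content."""
    # Git hashes a small header ("blob <size in bytes>" plus a NUL byte)
    # followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# `echo "hello"` writes the five letters plus a trailing newline.
print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

Running echo "hello" | git hash-object --stdin on any machine should print the same ID, which is exactly the "same content, same hash" property described above.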
That was my first big revelation: Git doesn’t identify content by file name. It stores the content itself and addresses it by its hash.
Let’s understand by doing
I decided to build a sample project for my family’s task management. I initialized a Git repo:
git init
This created a hidden .git/ directory with internal folders like objects/, where Git stores data.
Initially, objects/ is empty (except for some internal optimization folders like info/ and pack/).
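For a sense of how objects/ fills up, the sketch below mimics the layout of a "loose" object: the first two hex characters of the ID become a folder name, the remaining 38 become the file name, and the file holds the zlib-compressed header plus content. It writes into a temporary directory rather than a real repository, and the helper names are my own.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path

def write_loose_object(objects_dir: Path, obj_type: str, payload: bytes) -> str:
    """Store payload in the loose-object layout and return its object ID."""
    data = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00" + payload
    oid = hashlib.sha1(data).hexdigest()
    path = objects_dir / oid[:2] / oid[2:]
    if not path.exists():  # identical content is only ever written once
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(data))
    return oid

def read_loose_object(objects_dir: Path, oid: str) -> tuple[str, bytes]:
    """Read an object back and return (type, payload)."""
    raw = zlib.decompress((objects_dir / oid[:2] / oid[2:]).read_bytes())
    header, _, payload = raw.partition(b"\x00")
    obj_type, _size = header.decode("ascii").split(" ")
    return obj_type, payload

with tempfile.TemporaryDirectory() as tmp:
    objects = Path(tmp) / "objects"
    oid = write_loose_object(objects, "blob", b"hello\n")
    stored_path = (objects / oid[:2] / oid[2:]).relative_to(tmp)
    round_trip = read_loose_object(objects, oid)

print(stored_path.as_posix())
print(round_trip)
```

This only covers loose objects. Git can later move objects into packfiles under pack/, which this sketch does not attempt to model.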
Then, I added a file:
echo "Manas" > Members.txt
git add Members.txt
git commit -m "Add Members file"
These commands created three new objects inside .git/objects/: a commit, a tree, and a blob. (git add wrote the blob; git commit wrote the tree and the commit.)
First object type: Commit
I checked the commit using:
git cat-file -p <commit-hash>
A commit in Git stores metadata (author, committer, timestamp, message), a reference to a tree object that represents the snapshot of the project at that point in time, and references to its parent commits (the very first commit has none).
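The text that git cat-file -p prints for a commit is simple enough to parse by hand. The sketch below splits a commit payload into its header fields and message. The sample payload is made up for illustration: the author identity and timestamp are placeholders, and the tree ID is just the well-known ID of an empty tree. It also ignores multi-line headers such as signatures.

```python
def parse_commit(payload: bytes) -> dict:
    """Split a raw commit payload into its header fields and message."""
    # Headers and message are separated by the first blank line.
    head, _, message = payload.decode("utf-8").partition("\n\n")
    fields = {"parents": [], "message": message}
    for line in head.split("\n"):
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parents"].append(value)  # a merge commit has several
        else:
            fields[key] = value
    return fields

# Placeholder payload, not taken from a real repository.
sample = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author Example Author <author@example.com> 1700000000 +0000\n"
    b"committer Example Author <author@example.com> 1700000000 +0000\n"
    b"\n"
    b"Example commit\n"
)
commit = parse_commit(sample)
print(commit["tree"])
print(commit["parents"])  # empty list: a first commit has no parent
```

Notice what is absent: there is no file content and no file name in a commit, only a pointer to a tree.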
Second object type: Tree
The tree object represents a directory. It maps each name in that directory to the hash of a blob (for a file) or of another tree (for a subdirectory), along with the entry’s mode.
This was my second big revelation: File names aren’t stored in the blob—they’re stored in the tree.
Why? Because Git may reuse a blob in different places, even with different filenames.
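Here is a rough sketch of how a tree’s content is laid out: each entry is the mode, a space, the name, a NUL byte, and then the 20 raw bytes of the object ID. The helper names are my own, and the sort is simplified (plain sort by name), which is only an approximation of Git’s ordering rules for directories.

```python
import hashlib

def git_object_id(obj_type: str, payload: bytes) -> str:
    """Hash a payload the way Git does: '<type> <size>' + NUL + payload."""
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()

def tree_payload(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex object ID) entries into a tree payload."""
    out = b""
    # Simplification: sort by name only.
    for mode, name, oid in sorted(entries, key=lambda e: e[1]):
        out += f"{mode} {name}".encode("utf-8") + b"\x00" + bytes.fromhex(oid)
    return out

blob_id = git_object_id("blob", b"Manas\n")
root = tree_payload([("100644", "Members.txt", blob_id)])

print(root[:19])   # the mode, the name, and the NUL separator
print(len(root))   # 18 bytes of "mode name" + 1 NUL + 20 bytes of ID
print(git_object_id("tree", root))
```

The name lives in the tree entry and the content lives in the blob, so pointing a second name at the same blob costs one more entry, not another copy of the data.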
Third object type: Blob
The blob holds the actual content of the file: no filename, no metadata. Just the raw data.
The second commit
I added a folder Tasks/ and inside it a file Wash the dishes.txt with the same content: Manas.
mkdir Tasks
echo "Manas" > Tasks/Wash\ the\ dishes.txt
git add .
git commit -m "Add task file"
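This second commit introduced a new commit object and two new trees (an updated root tree and one for the Tasks/ directory), but no new blob for Wash the dishes.txt.
Why? Because the content was identical to Members.txt, so it hashed to the same ID, and an object with that ID already existed.
So Git reused the same blob.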
This was my third big revelation: Git doesn't duplicate content—if the content is identical, Git reuses the blob.
That’s also why the filename is stored in the tree, not the blob—so that one blob can be used by multiple filenames.
Visualizing the object database
At this point, the structure looked like:
2 commits
3 trees
1 blob
One blob, reused across two different paths. That’s Git efficiency.
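Those counts can be reproduced with a small in-memory model of the object database. This is only a sketch under assumptions: a Python dict stands in for .git/objects/, the author identity and timestamp are placeholders, and the resulting commit IDs will not match a real repository.

```python
import hashlib
from collections import Counter

store: dict[str, tuple[str, bytes]] = {}  # object ID -> (type, payload)

def put(obj_type: str, payload: bytes) -> str:
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00"
    oid = hashlib.sha1(header + payload).hexdigest()
    store[oid] = (obj_type, payload)  # same content, same key: no duplicate
    return oid

def put_tree(entries: list[tuple[str, str, str]]) -> str:
    payload = b"".join(
        f"{mode} {name}".encode("utf-8") + b"\x00" + bytes.fromhex(oid)
        for mode, name, oid in sorted(entries, key=lambda e: e[1])
    )
    return put("tree", payload)

def put_commit(tree: str, parents: list[str], message: str) -> str:
    who = "Example Author <author@example.com> 1700000000 +0000"  # placeholder
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    return put("commit", ("\n".join(lines) + "\n").encode("utf-8"))

# First commit: Members.txt
blob1 = put("blob", b"Manas\n")
root1 = put_tree([("100644", "Members.txt", blob1)])
c1 = put_commit(root1, [], "Add Members file")

# Second commit: Tasks/Wash the dishes.txt with identical content
blob2 = put("blob", b"Manas\n")
tasks = put_tree([("100644", "Wash the dishes.txt", blob2)])
root2 = put_tree([("100644", "Members.txt", blob1), ("40000", "Tasks", tasks)])
c2 = put_commit(root2, [c1], "Add task file")

counts = Counter(obj_type for obj_type, _ in store.values())
print(sorted(counts.items()))  # [('blob', 1), ('commit', 2), ('tree', 3)]
```

The second put of b"Manas\n" lands on the key that already exists, which is why the blob count stays at one.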
Git References: Branches, HEAD, and Tags
Let’s go deeper.
Inside .git/refs/heads/, we find branches. A branch is just a pointer to a commit, stored as a small file with the commit hash inside.
If we create a new branch:
git branch branchA
This creates a new reference inside .git/refs/heads/branchA.
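Note that git branch only creates the reference; it does not switch to it. After switching (for example with git checkout branchA), the current branch is stored in .git/HEAD as: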
ref: refs/heads/branchA
So HEAD is a pointer to a pointer.
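Following that pointer chain takes only a few lines. The sketch below builds a fake .git directory in a temporary folder and resolves HEAD to a commit hash. The commit hash is a made-up placeholder, and the function assumes refs are stored as individual files, so it does not handle refs that Git has packed into .git/packed-refs.

```python
import tempfile
from pathlib import Path

def resolve_head(git_dir: Path) -> str:
    """Follow HEAD through any 'ref: ...' indirection to a commit hash."""
    value = (git_dir / "HEAD").read_text().strip()
    while value.startswith("ref: "):
        # HEAD names a ref file; that file holds the commit hash.
        value = (git_dir / value[len("ref: "):]).read_text().strip()
    return value

FAKE_COMMIT = "0123456789abcdef0123456789abcdef01234567"  # placeholder hash

with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "refs" / "heads" / "branchA").write_text(FAKE_COMMIT + "\n")
    (git_dir / "HEAD").write_text("ref: refs/heads/branchA\n")
    resolved = resolve_head(git_dir)

print(resolved)
```

If HEAD contained a raw commit hash instead of a ref: line, the loop would not run and the hash would be returned directly.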
Creating a new commit on branchA
After another commit on branchA, only the branchA reference moves forward. master stays where it was.
That’s how Git allows multiple lines of development.
What about tags?
Tags are like branches, except that they are not meant to move: making a new commit never advances a tag.
git tag v1.0
This creates a reference under .git/refs/tags/v1.0.
Tags are used to mark versions/releases in Git. Because a tag stays on the commit it was created for, it is well suited to marking fixed points in history.
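The difference between branches and tags shows up clearly in a toy model of the refs. In this sketch a dict stands in for the files under .git/refs/, the commit IDs c2 and c3 are placeholders, and I assume the tag was created while everything still pointed at c2.

```python
# Ref name -> commit ID (placeholders, not real hashes).
refs = {
    "refs/heads/master": "c2",
    "refs/heads/branchA": "c2",
    "refs/tags/v1.0": "c2",
}
head = "refs/heads/branchA"  # what .git/HEAD holds after "ref: "

def commit(new_commit_id: str) -> None:
    """Record a new commit: only the branch that HEAD names advances."""
    refs[head] = new_commit_id

commit("c3")
for name, target in sorted(refs.items()):
    print(name, "->", target)
```

After the commit, branchA points at c3 while master and the v1.0 tag still point at c2.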
Wrapping up: What I Learned
Let me summarize the most important things I learned from this journey:
Git tracks content, not files.
The three object types we met are blob (content), tree (structure), and commit (snapshot + metadata). Annotated tags add a fourth type, not covered here.
Same content → same blob. Git reuses blobs even across files with different names.
Branches, HEAD, and tags are just simple references to commits.
HEAD points to the current branch, which in turn points to the latest commit.
Final thoughts
I used to think I understood Git.
But it wasn't until I saw how Git actually works under the hood that things truly made sense.
If you’re like me—using Git every day but treating it like a magic box—take time to explore its internals. Create a repo, inspect the objects, look at the raw commits, trees, and blobs.
The beauty of Git is not just in how powerful it is—but how elegantly simple its model is when you truly understand it.
If you found this insightful, let me know. I’m considering a follow-up article about Git’s three-state workflow and how commands like add, commit, reset, and checkout affect it.
Written by Manas Shinde