Git Internals

Blob Objects and Content Addressing

Explain how blobs store raw content and why hashes become the identity of Git objects.

Written by Lance MQ · 12 years of Git

Who This Is For

Readers building a durable Git mental model
Developers who keep running into history, ref, or recovery confusion

Prerequisites

Comfort reading basic Git output
A rough idea of commits, branches, and HEAD

Common Risks

Learning low-level terms without connecting them to commands
Collapsing objects, refs, and working state into one concept

Citations & Further Reading

Pro Git, "Git Internals - Git Objects" [Book]

What you will learn

Understand what a blob object is and what it deliberately leaves out
See how Git derives an object ID from an object's type, size, and content
Explain how blobs store raw content and why hashes become the identity of Git objects.
Understand key concepts: How blob, tree, and commit connect
Recognize where the blob model explains everyday behavior, such as new IDs after edits and heuristic rename detection

If you want to understand Git below the command surface, blobs are one of the best places to start. A blob is one of Git's most basic object types, and it explains why Git can treat content itself as the primary identity.

Start with a problem

You use Git commands daily, but occasionally encounter 'strange' behavior, such as being told a file changed when you didn't touch it, or hitting unexpected conflicts during a rebase. You want to understand how Git works under the hood.

How blob, tree, and commit connect

Blob → Tree → Commit Chain

File content is first stored as blob objects. Tree objects organize those blobs into directory structures, and a commit object points to a root tree to form a complete snapshot.

Content Layer

blob: file A content
blob: file B content
blob: file C content

Structure Layer

tree: root directory
sub-tree: src/ directory
commit: snapshot + author + parent

Identical content shares the same blob. Blobs do not store filenames or paths.

What a blob stores

A blob stores raw file content.

That description matters because of what it does not store:

no filename
no directory path
no file history label
no commit message

From Git's point of view, a blob is just content.

That means the same content can appear in different paths, branches, or commits while still referring to the same underlying blob object.

Why content addressing matters

Git is content-addressed. That means an object's identity comes from the object data itself, not from an external database row or a mutable record ID.

Conceptually, Git computes an object ID from:

the object type
the object size
the object content

So if the content changes, the object ID changes too.
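The three inputs listed above can be sketched in a few lines of Python. This is a minimal illustration, assuming a repository that uses the default SHA-1 object format; the helper name blob_id is made up for this example and is not part of Git.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID a blob with this content would get."""
    # The hashed data is "<type> <size in bytes>", a NUL byte, then the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Same input that `echo 'test content' | git hash-object --stdin` would hash.
print(blob_id(b"test content\n"))   # d670460b4b4aece5915caf5c68d12f560a9fe3e4

# One extra character produces an unrelated-looking ID.
print(blob_id(b"test content!\n"))
```

Nothing about a filename or a commit enters the hash, which is why the ID depends only on the type, the size, and the bytes themselves.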

This gives Git a strong practical guarantee:

identical object content always leads to the same object identity
changed content leads to a different identity, barring a hash collision, which is treated as practically impossible

Why identical files can map to the same blob

Suppose two files in different directories contain exactly the same text.

Git does not need two different blob objects just because the paths differ. The path information lives elsewhere. The blob only represents content.

This is one reason Git can store snapshots efficiently while still thinking in terms of object identity.
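A toy content-addressed store makes the sharing visible. The sketch below is an assumption-laden model, not Git's storage code: ObjectStore, put, and the example paths are all hypothetical, and the store is just a dictionary keyed by object ID.

```python
import hashlib


def blob_id(content: bytes) -> str:
    # "blob <size>\0" followed by the raw content, hashed with SHA-1.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Toy store mapping object ID -> content (hypothetical, not Git's code)."""

    def __init__(self):
        self.objects = {}

    def put(self, content: bytes) -> str:
        oid = blob_id(content)
        # Writing the same content twice is a no-op: the key already exists.
        self.objects.setdefault(oid, content)
        return oid


store = ObjectStore()
files = {
    "docs/LICENSE": b"MIT\n",
    "vendor/lib/LICENSE": b"MIT\n",   # same bytes, different path
    "README": b"hello\n",
}
paths_to_ids = {path: store.put(data) for path, data in files.items()}

print(len(files), "paths,", len(store.objects), "blobs")  # 3 paths, 2 blobs
```

The paths live in the separate paths_to_ids mapping, which plays the role that tree objects play in a real repository.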

Where the filename actually lives

People often assume a file object in Git must know its own path. But that path is not stored in the blob.

The path is stored by a tree object, which maps:

a name
a mode
an object ID

So Git separates:

content itself: blob
placement in a directory tree: tree

That separation is one of the keys to understanding Git's internal model.
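To make the name, mode, and object ID mapping concrete, here is a rough sketch of how a tree's entries can be serialized and hashed. It assumes the SHA-1 object format and a single directory level; TreeEntry, object_id, and serialize_tree are illustrative names, not Git APIs.

```python
import hashlib
from typing import NamedTuple


class TreeEntry(NamedTuple):
    mode: str  # "100644" regular file, "100755" executable, "40000" subdirectory
    name: str  # one path component, never a full path
    oid: str   # hex object ID of the blob or subtree


def object_id(obj_type: str, body: bytes) -> str:
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def serialize_tree(entries) -> bytes:
    # Entries are ordered by name; directories sort as if the name ended in "/".
    def sort_key(entry):
        suffix = b"/" if entry.mode == "40000" else b""
        return entry.name.encode() + suffix

    # Each entry is "<mode> <name>", a NUL byte, then the 20 raw bytes of the ID.
    return b"".join(
        e.mode.encode() + b" " + e.name.encode() + b"\0" + bytes.fromhex(e.oid)
        for e in sorted(entries, key=sort_key)
    )


notes = object_id("blob", b"same text\n")
root = [
    TreeEntry("100644", "notes.txt", notes),
    TreeEntry("100644", "copy.txt", notes),  # second name, same blob
]

print(object_id("tree", serialize_tree(root)))
print(object_id("tree", serialize_tree([])))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Both entries carry the same blob ID, so the names exist only in the tree. The last line prints the well-known ID of the empty tree.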

Why this is different from a normal "file history" mental model

Many users imagine Git as tracking a changing file over time as one continuous thing.

Internally, Git is closer to:

blobs for content
trees for directory structure
commits for full repository snapshots

That is why Git can reason about content reuse, renames, and snapshot identity without needing one permanent file object that lives forever.
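The three layers can be chained in a small sketch to show how a content change propagates upward. This is a simplified model under stated assumptions: SHA-1 object format, one flat directory of regular files, no parent commit, and a hypothetical fixed author and timestamp so the results are reproducible. All function names are illustrative.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_id(entries: dict) -> str:
    """entries maps a file name to a blob ID (regular files, one directory)."""
    body = b""
    for name in sorted(entries, key=lambda n: n.encode()):
        body += b"100644 " + name.encode() + b"\0" + bytes.fromhex(entries[name])
    return object_id("tree", body)


def commit_id(tree: str, message: str) -> str:
    # Hypothetical identity and timestamp, fixed for reproducibility.
    who = "Example Author <author@example.com> 1700000000 +0000"
    body = f"tree {tree}\nauthor {who}\ncommitter {who}\n\n{message}\n"
    return object_id("commit", body.encode())


def snapshot(files: dict):
    blobs = {name: object_id("blob", data) for name, data in files.items()}
    tree = tree_id(blobs)
    return blobs, tree, commit_id(tree, "snapshot")


blobs_a, tree_a, commit_a = snapshot({"a.txt": b"one\n", "b.txt": b"two\n"})
blobs_b, tree_b, commit_b = snapshot({"a.txt": b"one\n", "b.txt": b"TWO\n"})

print(blobs_a["a.txt"] == blobs_b["a.txt"])  # True: unchanged file, same blob
print(tree_a == tree_b)                      # False: one entry points elsewhere
print(commit_a == commit_b)                  # False: the commit names a new tree
```

The unchanged file keeps its blob across both snapshots, while the tree and commit IDs change because each one hashes the IDs beneath it.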

Use case 1: why editing a file creates a new object identity

When you edit a file and stage it, Git does not record "the same blob with a new version number." Staging writes the new content as a blob that hashes to a different object ID, and the old blob is left unchanged.

That helps explain why even a small change results in a different object ID.

Use case 2: why renames are not stored as a special blob property

If blobs stored paths, a rename would have to change the blob. But a blob does not know its path.

So a rename is better understood as:

a tree-level change in where some content appears
not a mutation inside the blob itself

This is why Git does not record renames as a built-in object field. It infers them when comparing snapshots, heuristically, from content relationships.
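The simplest form of that inference can be sketched as follows: a path that disappeared and a path that appeared with the same blob ID are paired as a rename. This is an illustrative simplification, not Git's algorithm. It flattens trees into path-to-ID dictionaries, handles exact matches only, and omits the similarity scoring Git applies to files that were renamed and edited. The function exact_renames and the example paths are hypothetical.

```python
import hashlib


def blob_id(content: bytes) -> str:
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def exact_renames(old_tree: dict, new_tree: dict) -> list:
    """Pair removed and added paths that share a blob ID.

    Both arguments map a path to a blob ID (a flattened view of a tree).
    """
    removed = {p: oid for p, oid in old_tree.items() if p not in new_tree}
    added = {p: oid for p, oid in new_tree.items() if p not in old_tree}

    # Index removed paths by blob ID so matches are a dictionary lookup.
    by_oid = {}
    for path, oid in sorted(removed.items()):
        by_oid.setdefault(oid, []).append(path)

    renames = []
    for new_path, oid in sorted(added.items()):
        candidates = by_oid.get(oid)
        if candidates:
            renames.append((candidates.pop(0), new_path))
    return renames


code = blob_id(b"def f():\n    pass\n")
readme = blob_id(b"hi\n")

old = {"src/util.py": code, "README": readme}
new = {"src/helpers.py": code, "README": readme}

print(exact_renames(old, new))  # [('src/util.py', 'src/helpers.py')]
```

The blob ID never changes in this example. Only the mapping from path to ID differs between the two snapshots, which is the tree-level change described above.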

Use case 3: why `git hash-object` is so revealing

Low-level commands like git hash-object are useful because they expose the content-addressed model directly.

They make it easier to see that Git is not assigning arbitrary IDs. The ID comes from the object data.
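As a rough companion to what git hash-object -w and git cat-file expose, the sketch below encodes a blob the way a loose object is stored (header plus content, zlib-compressed) and then reads it back. It assumes the SHA-1 object format, works purely in memory, and uses made-up helper names encode_loose and decode_loose.

```python
import hashlib
import zlib


def encode_loose(obj_type: str, content: bytes):
    """Return (object ID, compressed bytes) for a loose object."""
    raw = f"{obj_type} {len(content)}".encode("ascii") + b"\0" + content
    # The ID is the hash of the uncompressed data; compression is storage only.
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def decode_loose(compressed: bytes):
    """Return (type, content), roughly what cat-file -t and -p report."""
    raw = zlib.decompress(compressed)
    header, _, content = raw.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


oid, stored = encode_loose("blob", b"hello world\n")

print(oid)                                   # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print("objects/" + oid[:2] + "/" + oid[2:])  # where a loose object would live under .git
print(decode_loose(stored))                  # ('blob', b'hello world\n')
```

The path is derived from the ID and the ID from the data, so no step assigns an arbitrary identifier.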

Special case: a blob is not "a file in the repo" in the everyday sense

It is tempting to say "a blob is a file." That is close enough for a first pass, but not really precise.

A blob is:

file content only
without location context
without commit context

That distinction becomes important once you start reasoning about trees, commits, or duplicate content.

Common misconceptions

"A blob stores the filename too"

No. The filename and path live in tree objects.

"Git tracks files as one permanent identity over time"

Not in the way many people first imagine. Git tracks snapshots and objects, and blob identity is derived from content.

"Two identical files must be stored as two separate blobs"

No. Identical content always hashes to the same object ID, so a single blob object represents both files.

Why this helps you understand commands

Once blobs and content addressing click, it becomes easier to understand:

why object IDs change with content
why staging an edited file produces a new blob rather than updating an old one
why renames are not baked into blob objects
why low-level inspection commands expose so much about Git's model

Suggested follow-up

This topic pairs especially well with:

git hash-object
git cat-file
git ls-tree
git show
git rev-parse

Try it yourself

Run git hash-object on a file in a test repository, change one character, and run it again to compare the two object IDs
Copy a file to a second path, commit both, and use git ls-tree -r HEAD to confirm that both names point to the same blob ID
Rename a tracked file, commit, and check with git ls-tree -r HEAD that the blob ID is unchanged while the entry name differs
