Git Internals

Tree Objects and Snapshots

Explain how tree objects encode directory structure and why commits represent full snapshot trees.

Written by Lance MQ · 12 years of Git

Who This Is For

Readers building a durable Git mental model
Developers who keep running into history, ref, or recovery confusion

Prerequisites

Comfort reading basic Git output
A rough idea of commits, branches, and HEAD

Common Risks

Learning low-level terms without connecting them to commands
Collapsing objects, refs, and working state into one concept

Citations & Further Reading

Git Internals - Git Objects [Book]

What you will learn

Understand the core purpose of tree objects and snapshots
Read a tree entry: its file mode, its name, and the object ID it points to
Explain how tree objects encode directory structure and why commits represent full snapshot trees
Understand the key concept: how trees organize snapshots
Know when the snapshot model explains Git's behavior better than thinking in patches

Tree objects are the part of Git that makes the repository look like a directory structure instead of a pile of unrelated blobs.

Start with a problem

You use Git commands daily, but occasionally encounter 'strange' behavior, such as being told a file changed when you didn't touch it, or hitting unexpected conflicts during a rebase. To make sense of cases like these, you want to understand how Git works under the hood.

How trees organize snapshots

Tree Directory Snapshot Organization

Root trees point to sub-trees and blobs, forming a complete directory hierarchy snapshot. Commits ultimately point to root trees to express the project's current state.

Directory Structure

src/app.ts → blob:app_hash
src/utils.ts → blob:utils_hash
README.md → blob:readme_hash
tests/ → sub-tree

Snapshot Expression

tree: src/ (with app, utils)
tree: tests/ (with test files)
tree: root dir (with README, src/, tests/)

Each commit points to a complete tree snapshot, not a diff. Storage stays efficient because unchanged blobs and sub-trees are reused across commits rather than copied.

What a tree stores

A tree records entries such as:

a name (a single path component such as README.md or src, not a full path)
a file mode
the object ID of the child entry

Those child entries may point to:

blobs for file content
other trees for subdirectories

So a tree is Git's way of representing a directory snapshot.
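The entry layout above can be sketched in a few lines of Python. This is a minimal illustration, assuming a SHA-1 repository; `hash_object` and `tree_body` are hypothetical helper names, and the file contents are made up. It serializes entries as mode, name, and raw object ID, then hashes the result the way Git hashes any object.

```python
import hashlib

def hash_object(kind: str, body: bytes) -> str:
    """Return the SHA-1 ID for an object: hash of '<kind> <size>\\0' plus the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries) -> bytes:
    """Serialize (mode, name, hex_id) entries as the body of a tree object."""
    def sort_key(entry):
        mode, name, _ = entry
        # Entries are ordered by name; directory names sort as if they ended in "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        # Each entry: ASCII mode, space, name, NUL byte, then the 20-byte binary ID.
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_id)
    return body

# Blobs hold content only (illustrative contents).
app_id = hash_object("blob", b"console.log('app');\n")
utils_id = hash_object("blob", b"export {};\n")
readme_id = hash_object("blob", b"hello\n")

# The src/ sub-tree maps two names to blobs.
src_id = hash_object("tree", tree_body([
    ("100644", "app.ts", app_id),
    ("100644", "utils.ts", utils_id),
]))

# The root tree maps one name to a blob and one name to the sub-tree.
root_id = hash_object("tree", tree_body([
    ("100644", "README.md", readme_id),
    ("40000", "src", src_id),
]))
print(root_id)
```

Note that the root tree never mentions app.ts. It only knows the name src and the ID of the sub-tree behind it.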

Why trees matter

A blob only knows content. It does not know where that content lives.

A tree adds the missing structure:

which names exist
which entries are files vs directories
which object each path points to

This is why a commit can represent a whole project state instead of just one changed file.

Commits point to a root tree

A commit object does not directly list every file. Instead, it points to one root tree.

That root tree recursively links to other trees and blobs, which together describe the repository snapshot at that commit.

So when people say "a commit stores a snapshot," the precise meaning is closer to:

a commit points to a root tree
that tree graph describes the full snapshot
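To make that concrete, here is a rough sketch of what a commit object's text looks like, assuming a SHA-1 repository. The helper names `hash_object` and `commit_body`, the author identity, and the timestamp are all illustrative. The point is that the commit names exactly one tree ID and no file paths.

```python
import hashlib

def hash_object(kind: str, body: bytes) -> str:
    """Return the SHA-1 ID for an object: hash of '<kind> <size>\\0' plus the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def commit_body(tree_id, parent_ids, identity, message) -> bytes:
    """Build the text of a commit: one tree line, parent lines, identities, message."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {parent}" for parent in parent_ids]
    lines.append(f"author {identity}")
    lines.append(f"committer {identity}")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()

# The ID of a tree with no entries stands in for a real root tree here.
root_tree_id = hash_object("tree", b"")

# Hypothetical identity and timestamp, for illustration only.
identity = "Example Author <author@example.com> 1700000000 +0000"

first = commit_body(root_tree_id, [], identity, "Initial commit")
first_id = hash_object("commit", first)

# A second commit records its parent, but still points to exactly one root tree.
second = commit_body(root_tree_id, [first_id], identity, "Second commit")

print(first.decode())
```

Everything about file names and contents lives behind the single tree line. The commit itself only adds history (parents) and metadata.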

Why Git is better understood as snapshots than patches

Many developers first learn Git through diffs, so they imagine each commit as mostly a patch.

Diffs are very important for display and review, but internally Git is more naturally described as storing snapshots through tree and blob relationships.

That snapshot model explains a lot:

why checkout can reconstruct full directory state
why commits represent repository state, not just textual changes
why comparing commits often means comparing two trees

Use case 1: why a filename change is not a blob change

Suppose you rename a file without changing its content.

The blob stays the same, because a blob's ID depends only on its content and the content did not change. What changes is the tree structure that maps names to objects.

That is a good example of Git separating:

content identity
path placement

Use case 2: why checkout restores whole project state

When you check out a commit, Git does not replay a chain of patches to work out what the files should contain. The commit already points to a tree-based snapshot, and Git uses it to reconstruct the directory layout and file contents for that commit.
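A toy version of that reconstruction is sketched below. It is not Git's storage format: the in-memory `store`, the `put` and `checkout` helpers, and the hash scheme are all assumptions made for illustration. What it does show is the recursive walk from a root tree down to file contents.

```python
import hashlib

# Toy object store: object ID -> ("blob", bytes) or ("tree", [(name, child_id), ...]).
store = {}

def put(kind, payload):
    """Save an object under a toy content hash (not Git's real format); return its ID."""
    oid = hashlib.sha1(repr((kind, payload)).encode()).hexdigest()
    store[oid] = (kind, payload)
    return oid

def checkout(tree_id, prefix=""):
    """Walk a tree recursively and return {path: content} for the whole snapshot."""
    kind, entries = store[tree_id]
    if kind != "tree":
        raise ValueError("expected a tree object")
    files = {}
    for name, child_id in entries:
        child_kind, payload = store[child_id]
        if child_kind == "tree":
            # A sub-tree becomes a directory: descend and extend the path.
            files.update(checkout(child_id, prefix + name + "/"))
        else:
            # A blob becomes a file at the accumulated path.
            files[prefix + name] = payload
    return files

# Build a small snapshot bottom-up: blobs first, then the trees that name them.
app = put("blob", b"console.log('app');\n")
readme = put("blob", b"# Demo\n")
src = put("tree", [("app.ts", app)])
root = put("tree", [("README.md", readme), ("src", src)])

for path, content in sorted(checkout(root).items()):
    print(path, len(content), "bytes")
```

Nothing in the walk needs a previous commit. The root tree alone is enough to rebuild every path.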

Use case 3: why comparing commits often means comparing trees

A commit comparison usually boils down to asking:

what did the old root tree contain?
what does the new root tree contain?

That is why so many diff and status operations make more sense once you see trees as the structural backbone of the snapshot model.
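As a simplified sketch of that idea, the snippet below compares two snapshots held as nested Python dicts, where bytes stand for blobs and dicts stand for trees. The `object_id` hash is a toy stand-in rather than Git's real format, and `diff_trees` is a hypothetical helper. It skips any sub-tree whose ID is unchanged, which is what makes the comparison cheap.

```python
import hashlib

def object_id(node):
    """Toy content hash: bytes are blobs, dicts are trees (not Git's exact format)."""
    if isinstance(node, bytes):
        return hashlib.sha1(b"blob\0" + node).hexdigest()
    parts = [name + "\0" + object_id(child) for name, child in sorted(node.items())]
    return hashlib.sha1(("tree\0" + "\n".join(parts)).encode()).hexdigest()

def diff_trees(old, new, prefix=""):
    """Return (status, path) pairs, skipping sub-trees whose IDs are identical."""
    changes = []
    for name in sorted(set(old) | set(new)):
        path = prefix + name
        a, b = old.get(name), new.get(name)
        if a is None:
            changes.append(("added", path))
        elif b is None:
            changes.append(("removed", path))
        elif object_id(a) == object_id(b):
            continue  # same ID means same content below this point
        elif isinstance(a, dict) and isinstance(b, dict):
            changes.extend(diff_trees(a, b, path + "/"))
        else:
            changes.append(("modified", path))
    return changes

old_root = {
    "README.md": b"hi\n",
    "src": {"app.ts": b"v1", "utils.ts": b"u"},
    "tests": {"app.test.ts": b"t"},
}
new_root = {
    "README.md": b"hi\n",
    "src": {"app.ts": b"v2", "utils.ts": b"u"},
    "tests": {"app.test.ts": b"t"},
}

print(diff_trees(old_root, new_root))  # [('modified', 'src/app.ts')]
```

In this sketch an added or removed directory is reported as a single path rather than expanded file by file.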

Special case: one commit can reuse many old objects

Because trees and blobs are identified by their content, a new commit does not need to rewrite every file as brand-new content if most of the repository stayed the same.

Unchanged parts of the snapshot can still point to existing objects. That is one reason the object graph is both powerful and efficient.
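The effect can be counted in a toy object store. The sketch below is an illustration under assumptions: `write` is a hypothetical helper, snapshots are nested dicts, and the hash scheme is simplified rather than Git's real format. Writing a second snapshot that changes one file adds only the new blob plus the trees on the path from that file up to the root.

```python
import hashlib

def write(node, store):
    """Write a blob (bytes) or tree (dict) into a toy store and return its ID."""
    if isinstance(node, bytes):
        kind, payload = "blob", node
    else:
        children = sorted((name, write(child, store)) for name, child in node.items())
        payload = "\n".join(name + "\0" + oid for name, oid in children).encode()
        kind = "tree"
    oid = hashlib.sha1(kind.encode() + b"\0" + payload).hexdigest()
    store[oid] = (kind, payload)  # writing an existing ID again changes nothing
    return oid

store = {}

snapshot_1 = {
    "README.md": b"hi\n",
    "src": {"app.ts": b"v1", "utils.ts": b"u"},
    "tests": {"app.test.ts": b"t"},
}
root_1 = write(snapshot_1, store)
objects_after_first = len(store)   # 4 blobs + 3 trees

# Second snapshot: only src/app.ts differs.
snapshot_2 = {
    "README.md": b"hi\n",
    "src": {"app.ts": b"v2", "utils.ts": b"u"},
    "tests": {"app.test.ts": b"t"},
}
root_2 = write(snapshot_2, store)
new_objects = len(store) - objects_after_first

print(objects_after_first, new_objects)  # 7 3
```

The three new objects are the changed blob, the src tree, and the root tree. The README blob, the utils blob, and the whole tests tree are shared by both snapshots.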

Common misconceptions

"A commit directly stores every file inline"

Not exactly. A commit points to a root tree, and the tree structure describes the snapshot.

"Git mainly stores patches"

Diffs are important, but the internal model is better understood as snapshots built from trees and blobs.

"A rename must create a totally new file object"

Not necessarily. The content blob may stay the same while the tree entries change.

Why this helps you understand commands

Once tree objects make sense, it becomes easier to understand:

why commits represent full repository states
why checkout can rebuild directory structures
why renames and path changes are structural
why diff and status are often comparing snapshots, not just patches

Suggested follow-up

This topic pairs especially well with hands-on use of:

git ls-tree
git cat-file
git show
git diff
git checkout

Try it yourself

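In a test repository, run git cat-file -p HEAD, find the tree line, then run git ls-tree on that ID to list the root tree's entries
Rename a file with git mv without editing it, commit, and compare the git ls-tree -r output for the two commits to see the same blob ID under a new path
Change one file inside a subdirectory, commit, and check which tree IDs changed between the two commits and which stayed the same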
