Performance

Scalar Git Deep Dive

Understand Scalar, the successor to VFS for Git (formerly GVFS): Microsoft's tooling for massive repos, combining partial clone, sparse-checkout, and background maintenance with standard Git compatibility.

Who This Is For
  • Developers managing large Git repositories
  • Developers optimizing CI pipeline speed
Prerequisites
  • Basic understanding of clone and fetch mechanisms
  • Awareness of the object database concept
Common Risks
  • Using partial clone on unsupported servers
  • Misconfigured sparse checkout leading to incomplete workspace

What you will learn

  • Understand what problem Scalar solves and how it relates to VFS for Git (formerly GVFS)
  • Master the basic usage and common options of the scalar command
  • Understand how on-demand downloads and background maintenance keep large repositories responsive
  • Know when to use Scalar and when a plain partial clone or sparse-checkout is enough

Start with a problem

Your Git repository keeps growing, clones are getting slower, and everyday operations are starting to feel sluggish. You want to know what optimization techniques are available and which ones fit your project.

Overview

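Scalar is Microsoft's toolkit for massive repositories (Windows kernel, Office, Azure DevOps; millions of files, hundreds of GBs). It is the successor to VFS for Git (originally GVFS, the Git Virtual File System) and, unlike its predecessor, does not virtualize the filesystem. Since Git 2.38 the scalar command ships with Git itself. It makes Git viable at scale with:

  • On-demand downloads: Partial clone fetches commits and trees up front and blobs only when needed
  • Sparse-checkout: Only the directories you work in are populated in the working tree
  • Background maintenance: Prefetch, commit-graph, and pack maintenance on a schedule
  • Standard Git compatibility: No toolchain changes needed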

Architecture

Core Components

flowchart LR
  A[scalar CLI] --> B[Partial Clone + Sparse-Checkout]
  A --> C[Scheduled Background Maintenance]
  A --> D[Recommended Git Config]
  B --> E[Missing Blobs Fetched On Demand]
  C --> F[Prefetch/Commit-Graph/Loose-Objects/Incremental-Repack]
  D --> G[Standard Git Commands]

Operating Modes

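| Mode | Description | Best For |
| --- | --- | --- |
| Full Clone | Download all objects | Small/medium repos, CI |
| Scalar Clone | Commits and trees up front, file contents on demand, plus sparse-checkout and background maintenance | Massive repos, daily dev |
| Partial Clone | git clone --filter=blob:none; full commit history, blobs fetched on demand | Bandwidth-limited, old file contents rarely needed |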
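
The difference between a full and a partial clone can be observed locally. The sketch below is an illustration under stated assumptions, not part of Scalar: it builds a throwaway "server" repository in a temporary directory (all paths and file names are made up), clones it both ways over a file:// URL, and counts objects that are reachable from HEAD but absent locally.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Build a tiny stand-in for a server repository: three distinct blobs across two commits.
git init -q "$work/server"
git -C "$work/server" config user.name "Example User"
git -C "$work/server" config user.email "user@example.com"
mkdir "$work/server/src" "$work/server/docs"
echo "app v1" > "$work/server/src/app.txt"
echo "guide v1" > "$work/server/docs/guide.txt"
git -C "$work/server" add .
git -C "$work/server" commit -q -m "first"
echo "app v2" > "$work/server/src/app.txt"
git -C "$work/server" commit -q -am "second"
# Partial clone only works when the serving side allows object filters.
git -C "$work/server" config uploadpack.allowFilter true

# --no-checkout keeps the comparison clean: nothing is fetched to populate files.
git clone -q --no-checkout "file://$work/server" "$work/full"
git clone -q --no-checkout --filter=blob:none "file://$work/server" "$work/partial"

# Count objects reachable from HEAD that are absent from the local object store.
count_missing() {
  git -C "$1" rev-list --objects --missing=print HEAD | grep -c '^[?]' || true
}
full_missing=$(count_missing "$work/full")
partial_missing=$(count_missing "$work/partial")
echo "full clone: $full_missing missing objects"
echo "partial clone: $partial_missing missing objects"
```

The partial clone holds every commit and tree but none of the file contents, which is why it suits repositories where most historical blobs are never read.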

Installation & Setup

Install

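# Scalar ships with Git 2.38 and later, so installing or upgrading Git is usually enough

# Windows
winget install --id Git.Git

# macOS
brew install git

# Linux
# Install Git 2.38+ from your distribution's packages or build from source

# Confirm the command is available
scalar version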
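
Because upstream Git has included the scalar command since version 2.38, a quick pre-flight check can tell whether a machine is likely to have it. The following is a minimal sketch; the helper name version_at_least is hypothetical and it compares only the major and minor version numbers.

```shell
#!/bin/sh
set -eu

# version_at_least MIN ACTUAL: succeeds when ACTUAL >= MIN, comparing major.minor only.
version_at_least() {
  min_major=${1%%.*}; min_rest=${1#*.}; min_minor=${min_rest%%.*}
  act_major=${2%%.*}; act_rest=${2#*.}; act_minor=${act_rest%%.*}
  [ "$act_major" -gt "$min_major" ] ||
    { [ "$act_major" -eq "$min_major" ] && [ "$act_minor" -ge "$min_minor" ]; }
}

# "git version 2.45.1" -> "2.45.1"; the third field holds the version number.
git_version=$(git --version | awk '{print $3}')

if version_at_least 2.38 "$git_version"; then
  echo "Git $git_version is new enough to include scalar"
else
  echo "Git $git_version predates 2.38; scalar is probably not bundled"
fi

# The version alone is not proof (packagers may split it out), so also check PATH.
if command -v scalar >/dev/null 2>&1; then
  echo "scalar found at $(command -v scalar)"
else
  echo "scalar not found on PATH"
fi
```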

Register Repo

# Scalar clone (recommended for large repos); the working tree is created in <dir>/src
scalar clone https://github.com/microsoft/Windows.git

# Or register an existing repo for background maintenance and recommended config
cd existing-repo
scalar register

Auto-configuration

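# Scalar applies a set of recommended settings, including the following
# (the exact list varies by Git version and platform):
git config core.fsmonitor true           # Built-in FS monitor, where supported
git config core.untrackedCache true      # Untracked cache
git config core.multiPackIndex true      # One index across many pack-files
git config index.threads true            # Multi-threaded index loading
git config index.version 4               # Compact index format
git config gc.auto 0                     # No foreground auto-gc
git config maintenance.auto false        # Scheduled maintenance replaces auto maintenance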
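
Since the applied settings differ between Git versions, it is safer to inspect a repository than to assume. The sketch below prints the effective value of a few of the keys mentioned above. The function name show_settings and the throwaway demo repository are illustrative assumptions.

```shell
#!/bin/sh
set -eu

# Print the effective value of each setting for the repository at $1.
show_settings() {
  for key in core.fsmonitor core.untrackedCache index.threads maintenance.auto; do
    # git config --get exits non-zero when the key is not set anywhere.
    value=$(git -C "$1" config --get "$key" || echo "(unset)")
    printf '%s=%s\n' "$key" "$value"
  done
}

# Demonstrate on a throwaway repository with one setting applied by hand.
demo=$(mktemp -d)
git init -q "$demo"
git -C "$demo" config core.untrackedCache true
settings=$(show_settings "$demo")
echo "$settings"
```

Pointing show_settings at a directory created by scalar clone or touched by scalar register shows what your Git version actually configured.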

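On-Demand Download (Partial Clone)

How It Works

flowchart TD
  A[Git command needs a file's content] --> B{Blob in local object store?}
  B -->|Yes| C[Read directly]
  B -->|No| D[Git requests blob from promisor remote]
  D --> E[Blob downloaded and stored locally]
  E --> C
  • Missing blob: An object the partial clone skipped on purpose; Git knows its ID from the tree but does not have its content yet
  • Trigger: A Git command that needs the content, such as checkout, diff, or blame
  • No placeholders: Files inside the sparse-checkout are ordinary files on disk, and files outside it are simply absent; stub files were a feature of VFS for Git, not Scalar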
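
In Git itself, on-demand download is implemented by partial clone: when a command needs a blob that was filtered out, Git fetches it from the remote it was cloned from. The sketch below shows one such fetch against a local stand-in server. The repository layout and file contents are made up for the demonstration.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# A stand-in server repository with two files, hence two blobs.
git init -q "$work/server"
git -C "$work/server" config user.name "Example User"
git -C "$work/server" config user.email "user@example.com"
mkdir "$work/server/src" "$work/server/docs"
echo "hello from app" > "$work/server/src/app.txt"
echo "hello from guide" > "$work/server/docs/guide.txt"
git -C "$work/server" add .
git -C "$work/server" commit -q -m "initial"
# Allow filtered clones and fetching single objects by ID.
git -C "$work/server" config uploadpack.allowFilter true
git -C "$work/server" config uploadpack.allowAnySHA1InWant true

git clone -q --no-checkout --filter=blob:none "file://$work/server" "$work/partial"

# Objects reachable from HEAD that are not in the local object store.
count_missing() {
  git -C "$work/partial" rev-list --objects --missing=print HEAD | grep -c '^[?]' || true
}

before=$(count_missing)
# Reading the file content forces Git to fetch exactly that blob from the remote.
content=$(git -C "$work/partial" cat-file -p HEAD:src/app.txt)
after=$(count_missing)

echo "missing before: $before, after: $after"
echo "content: $content"
```

Only the blob that was read is downloaded; the other file stays missing until something asks for it.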

Control Downloads

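# Bring a directory into the working tree (its missing blobs are fetched in one batch)
git sparse-checkout add src

# Prefetch new objects from the remote without updating your branches
scalar run fetch

# Collect diagnostics into a zip archive
scalar diagnose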

Background Sync & Maintenance

Auto Maintenance Tasks

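# After registration, the system scheduler runs git maintenance periodically:
# 1. prefetch: downloads new objects into refs/prefetch/ without touching your branches
# 2. commit-graph: updates the commit-graph incrementally
# 3. loose-objects: packs loose objects and removes those already packed
# 4. incremental-repack: consolidates small pack-files via the multi-pack-index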
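
Each scheduled task can also be run by hand with git maintenance, which makes it easy to see what a task produces. The following sketch runs only the commit-graph task in a throwaway repository and checks for the resulting file. The repository and its single commit are assumptions for the demo.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"
echo "one" > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" commit -q -m "first"

# Run a single maintenance task in the foreground instead of waiting for the scheduler.
git -C "$repo" maintenance run --task=commit-graph

# The task writes an incremental commit-graph chain; a single-file layout is also accepted.
if [ -e "$repo/.git/objects/info/commit-graphs/commit-graph-chain" ] ||
   [ -e "$repo/.git/objects/info/commit-graph" ]; then
  graph_written=yes
else
  graph_written=no
fi
echo "commit-graph written: $graph_written"
```

Scheduling itself is handled by git maintenance start, which installs entries in the platform's scheduler; it is left out here because it changes system state.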

Manual Triggers

# Full sync
git fetch

# Prefetch only (objects land under refs/prefetch/)
scalar run fetch

# Run all maintenance tasks
scalar run all

# Collect diagnostics into a zip archive
scalar diagnose

Standard Git Compatibility

Transparent Usage

# All standard Git commands work
git status
git add .
git commit -m "msg"
git push
git pull
git log
git blame
git diff

Known Limitations

| Operation | Status | Notes |
| --- | --- | --- |
| git grep | Limited | Undownloaded files not searched |
| git diff | Normal | Compares downloaded content |
| git blame | Normal | Requires file download |
| git stash | Normal | |
| Submodules | Partial | Need separate registration |

Best Practices for Large Repos

1. Use Scalar Clone

# Instead of git clone
scalar clone https://github.com/large/repo.git

2. Limit What Gets Downloaded

# Restrict the working tree to the directories you work in
git sparse-checkout set src docs

# Large dirs such as vendor/, third_party/, and bin/ stay out of the working tree
# until you add them explicitly
git sparse-checkout add vendor
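
One documented way to keep large directories out of the working tree is cone-mode sparse-checkout. The sketch below demonstrates it on a throwaway repository. The directory names are illustrative, and the older init --cone form is used so the example also runs on Git versions that predate set --cone.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"
mkdir "$repo/src" "$repo/vendor"
echo "code" > "$repo/src/main.txt"
echo "dependency" > "$repo/vendor/lib.txt"
echo "readme" > "$repo/README.txt"
git -C "$repo" add .
git -C "$repo" commit -q -m "initial"

# Cone mode restricts the working tree to whole directories, plus files at the root.
git -C "$repo" sparse-checkout init --cone
git -C "$repo" sparse-checkout set src

# vendor/ is still tracked and committed, but no longer present on disk.
ls "$repo"
```

Running git sparse-checkout add vendor later brings the directory back without a new clone.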

3. CI/CD Integration

# GitHub Actions
jobs:
  build:
    runs-on: windows-latest
    steps:
      - uses: actions/checkout@v4
        with:
          repository: microsoft/Windows
          filter: blob:none          # partial clone; blobs are fetched on demand
      - name: Setup Scalar
        run: scalar register         # applies Scalar's recommended config to this checkout
      - name: Build
        run: msbuild ...

Troubleshooting

Common Issues

# Repository in an odd state: collect diagnostics into a zip archive
scalar diagnose

# Sync failed
git fetch --verbose

# Background maintenance not running: reinstall the scheduled tasks
git maintenance start

# Reset repo registration
scalar unregister
scalar register

Log Locations

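# Locations depend on the Scalar version. For the scalar command bundled with Git,
# scalar diagnose writes its zip archive under the enlistment root on all platforms:
<enlistment>/.scalarDiagnostics/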

Alternatives Comparison

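| Solution | Pros | Cons | Best For |
| --- | --- | --- | --- |
| Scalar | Full compat, on-demand, bg maintenance | Needs Git 2.38+ or a separate install; server must support partial clone | Massive monorepos |
| Partial Clone | Native Git, no extra tools | No background maintenance by itself, needs network for missing objects | Medium-large repos |
| Sparse Checkout | Native, dir-level filter | Still downloads all objects unless combined with partial clone | Monorepo partial dev |
| Submodules | Native, modular | Complex mgmt, weak atomicity | Multi-repo architectures |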

Try it yourself

  1. Practice the scalar command in a test repository and observe state changes before and after
  2. Experiment with different options and compare the output differences
  3. Simulate a real scenario where you would need to use this, and walk through the full process

Continue Learning

  1. performance/partial-clone — Partial clone
  2. performance/git-maintenance — Maintenance framework
  3. internals/transfer-protocols-and-negotiation — Transfer protocols
  4. performance/large-repo-optimization — Large repo optimization