Cloning Just What You Need: Sparse Checkout for Large Repositories

I needed to fix a broken CI/CD pipeline, and it was blocking deploys. The problem? The repository was around 30GB, and a full clone would've taken hours. All I needed was the .github directory.

This is exactly what Git's sparse checkout was designed for. Combined with shallow cloning, it lets you clone just the directories you need in seconds instead of hours. It's not a workaround or a hack. It's the right tool for targeted changes in large repositories.

The Commands

Here's what worked for me:

# Shallow, blobless clone with only top-level files checked out
git clone --depth 1 --filter=blob:none --sparse git@github.com:org/repo.git

cd repo

# Check out only the .github directory
git sparse-checkout set .github

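Let me break down what each flag does:

--depth 1: fetch only the most recent commit instead of the full history. It also implies --single-branch, so only the default branch is fetched.

--filter=blob:none: make this a partial clone that skips file contents (blobs) up front. Git downloads them on demand when a checkout needs them.

--sparse: start with a sparse checkout that populates only the files in the repository root.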

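After running git sparse-checkout set .github, the working tree contains the .github directory, plus any files in the repository root when sparse checkout is in cone mode (the default since Git 2.37). The .git folder stays small because most of the blob data was never downloaded.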
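To try this without a 30GB repository, here is a minimal sketch that builds a throwaway local repository and clones it the same way. The sandbox path, file names, and the two uploadpack settings are assumptions for the demo only; a hosted remote such as GitHub already supports filtered clones.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for the real remote: a tiny throwaway repository.
sandbox=$(mktemp -d)
git init -q "$sandbox/origin"
cd "$sandbox/origin"
git config user.name "Demo"
git config user.email "demo@example.com"
git config uploadpack.allowFilter true         # allow --filter from clients
git config uploadpack.allowAnySHA1InWant true  # allow on-demand blob fetches
mkdir -p .github/workflows src
echo "name: deploy" > .github/workflows/deploy.yml
echo "app" > src/app.txt
git add .
git commit -q -m "first commit"
echo "more" >> src/app.txt
git commit -q -am "second commit"

# Same clone as above, pointed at the sandbox instead of GitHub.
# file:// is used because --depth is ignored for plain local paths.
git clone -q --depth 1 --filter=blob:none --sparse "file://$sandbox/origin" "$sandbox/clone"
cd "$sandbox/clone"
git sparse-checkout set .github

git sparse-checkout list               # paths currently in scope
git rev-parse --is-shallow-repository  # "true" when history is truncated
git config remote.origin.promisor      # "true" when blobs are fetched lazily
ls -A                                  # no src/ directory in the working tree
```

The three inspection commands are a quick way to confirm what kind of clone you ended up with. Remove the sandbox directory when you are done.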

Working with Partial History

Since you're working with a subset of the repository, some Git commands behave differently. This isn't a limitation. It's just a different workflow that matches what you're trying to do: targeted edits, not full-repo operations.

Switching Branches

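The usual git checkout other-branch doesn't work because --depth 1 implies --single-branch. The clone tracks only the default branch, so fetching another branch updates FETCH_HEAD but never creates its remote-tracking ref: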

git fetch origin develop
git checkout develop

# fatal: 'origin/develop' is not a commit and a branch 'develop' cannot be created from it

Instead, use FETCH_HEAD to reference what you just fetched:

git fetch origin develop
git checkout -b develop FETCH_HEAD

This creates a local branch from the fetched commit directly.
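If you would rather have a proper tracking branch, one alternative (not what I used above) is to widen the clone's fetch configuration with git remote set-branches. The sketch below assumes a throwaway local repository with a develop branch; the sandbox path and file contents are made up for the demo.

```shell
#!/bin/sh
set -eu

# Hypothetical sandbox remote with a default branch and a develop branch.
sandbox=$(mktemp -d)
git init -q "$sandbox/origin"
cd "$sandbox/origin"
git config user.name "Demo"
git config user.email "demo@example.com"
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true
mkdir -p .github/workflows
echo "name: deploy" > .github/workflows/deploy.yml
git add .
git commit -q -m "first commit"
git checkout -q -b develop
echo "name: deploy-dev" > .github/workflows/deploy.yml
git commit -q -am "develop-only change"
git checkout -q -   # back to the default branch

git clone -q --depth 1 --filter=blob:none --sparse "file://$sandbox/origin" "$sandbox/clone"
cd "$sandbox/clone"
git sparse-checkout set .github

# A single-branch clone lists only the default branch here.
git config --get-all remote.origin.fetch

# Track develop as well, then fetch and check it out the usual way.
git remote set-branches --add origin develop
git fetch -q --depth 1 origin develop
git checkout -q develop
git branch -vv   # develop now tracks origin/develop
```

The FETCH_HEAD approach is quicker for a one-off fix. This variant is worth the extra command if you expect to pull from the branch again.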

Getting Files from Other Branches

With a shallow, sparse clone, traditional merging is a poor fit. You have neither the shared history nor the full file tree that a merge relies on. If you try:

git merge origin/feature-branch

# fatal: refusing to merge unrelated histories

Git refuses because, at depth 1, the two branch tips share no commit that your clone can see, so it cannot find a merge base. Even with --allow-unrelated-histories, you'll get conflicts from files that aren't in your working directory.

The right approach is to check out specific files directly:

# Fetch the branch first
git fetch origin feature-branch:refs/remotes/origin/feature-branch

# Check out just the file you need
git checkout origin/feature-branch -- .github/workflows/deploy.yml

This grabs exactly what you need without trying to reconcile the entire repository.
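Before overwriting a file, it can help to preview the other branch's version with git show. The following is a sketch against a throwaway local repository; the branch name matches the example above, while the sandbox path and file contents are assumptions.

```shell
#!/bin/sh
set -eu

# Hypothetical sandbox remote with a feature branch that edits one workflow.
sandbox=$(mktemp -d)
git init -q "$sandbox/origin"
cd "$sandbox/origin"
git config user.name "Demo"
git config user.email "demo@example.com"
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true
mkdir -p .github/workflows
echo "name: deploy" > .github/workflows/deploy.yml
git add .
git commit -q -m "first commit"
git checkout -q -b feature-branch
echo "name: deploy-feature" > .github/workflows/deploy.yml
git commit -q -am "feature change"
git checkout -q -   # back to the default branch

git clone -q --depth 1 --filter=blob:none --sparse "file://$sandbox/origin" "$sandbox/clone"
cd "$sandbox/clone"
git sparse-checkout set .github

git fetch -q origin feature-branch:refs/remotes/origin/feature-branch

# Print the other branch's version without touching the working tree.
git show origin/feature-branch:.github/workflows/deploy.yml

# Bring that version into the working tree and the index.
git checkout origin/feature-branch -- .github/workflows/deploy.yml
git status --short   # the file shows up as a staged modification
```

Note that git checkout <ref> -- <path> stages the file as well as writing it, so the change is ready to commit.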

Pushing Changes: Use Pull Requests

For production branches, always use pull requests. I made my changes to the sparse checkout, committed them, pushed to a feature branch, and opened a PR. The PR had no conflicts because I was only modifying files within my sparse checkout scope.

For non-production branches (like a development environment), you can push directly since the stakes are lower. But PRs are still the cleaner workflow.

Adding Files Outside Sparse Checkout

If you create a new file in a directory that's not in your sparse checkout, Git will complain:

git add some-other-folder/test.txt

# error: The following paths are not covered by your sparse-checkout:
# some-other-folder/test.txt
# hint: If you intend to update such entries, try one of the following:
# hint: * Use the --sparse option.
# hint: * Disable or modify the sparsity rules.

The fix is in the hint. Use the --sparse flag:

git add --sparse some-other-folder/test.txt

Or expand your sparse checkout to include that directory:

git sparse-checkout add some-other-folder

When to Use This

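Sparse checkout with a shallow clone is the right tool when you need a targeted change to a few directories of a large repository and a full clone would take too long.

It's not the right tool for day-to-day development that needs the full tree and history.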

Expanding When Needed

If you start with a sparse shallow clone and later realize you need more history or files, you can incrementally fetch them:

# Get full history (but still sparse files)
git fetch --unshallow

# Or extend history to the latest 100 commits (use --deepen=<n> to add n more)
git fetch --depth=100

And if you need more directories:

# Add another directory to sparse checkout
git sparse-checkout add src/config

# Or switch to full checkout
git sparse-checkout disable
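To see the expansion steps take effect, here is a sketch that counts visible commits as history is deepened and then adds a directory. It assumes a throwaway local repository with three commits; the sandbox path and file names are hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical sandbox remote with three commits on the default branch.
sandbox=$(mktemp -d)
git init -q "$sandbox/origin"
cd "$sandbox/origin"
git config user.name "Demo"
git config user.email "demo@example.com"
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true
mkdir -p .github/workflows src
echo "name: deploy" > .github/workflows/deploy.yml
echo "v1" > src/app.txt
git add .
git commit -q -m "first commit"
echo "v2" > src/app.txt
git commit -q -am "second commit"
echo "v3" > src/app.txt
git commit -q -am "third commit"

git clone -q --depth 1 --filter=blob:none --sparse "file://$sandbox/origin" "$sandbox/clone"
cd "$sandbox/clone"
git sparse-checkout set .github

git rev-list --count HEAD              # 1 commit visible after --depth 1

git fetch -q --deepen=1                # one more commit past the shallow boundary
git rev-list --count HEAD              # 2 commits visible

git fetch -q --unshallow               # everything that is left
git rev-list --count HEAD              # 3 commits visible
git rev-parse --is-shallow-repository  # "false" once history is complete

git sparse-checkout add src            # src/ now appears in the working tree
ls src
```

History and working-tree scope are independent: you can deepen one without widening the other, in whichever order the task calls for.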

Quick Reference

# Initial sparse shallow clone
git clone --depth 1 --filter=blob:none --sparse <repo-url>
git sparse-checkout set <directory>

# Check out a branch
git fetch origin <branch>
git checkout -b <branch> FETCH_HEAD

# Get a file from another branch
git fetch origin <branch>:refs/remotes/origin/<branch>
git checkout origin/<branch> -- path/to/file

# Add file outside sparse checkout
git add --sparse path/to/file

# Expand sparse checkout
git sparse-checkout add <another-directory>

# Restore full history and a full working tree
git fetch --unshallow
git sparse-checkout disable

This approach saved me hours and unblocked a critical deploy. It's not a replacement for full clones in day-to-day development, but for scoped, urgent changes in large repositories, it's exactly what these Git features were designed for.