Software Engineering

Git LFS: Managing Large Files in Git Repositories

Learn how to use Git Large File Storage (LFS) to manage large binary files, images, videos, and datasets in your Git repositories without slowing down operations.

By Inventive HQ Team

Git excels at tracking text files, but struggles with large binary files. Every clone downloads the entire history, and binary files don't compress or diff efficiently. Git Large File Storage (LFS) solves this by replacing large files with lightweight pointers while storing actual content separately. This guide covers setup, workflows, migration, and alternatives for managing large files in Git.

Why Git Struggles with Large Files

┌─────────────────────────────────────────────────────────────┐
│              GIT WITHOUT LFS                                 │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   Repository: 50MB code + 2GB images                        │
│                                                              │
│   Clone operation:                                           │
│   ├── Download all commits                                  │
│   ├── Download ALL versions of ALL images                   │
│   └── Total: 8GB (historical versions)                      │
│                                                              │
│   Time: 20+ minutes                                          │
│                                                              │
├─────────────────────────────────────────────────────────────┤
│              GIT WITH LFS                                    │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   Repository: 50MB code + pointer files                     │
│                                                              │
│   Clone operation:                                           │
│   ├── Download all commits (code + pointers)                │
│   └── Download only CURRENT version of images               │
│                                                              │
│   Total: 250MB                                               │
│   Time: 2 minutes                                            │
│                                                              │
└─────────────────────────────────────────────────────────────┘
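Before adopting LFS, it helps to measure how much of your history is actually binary weight. A quick audit using plain Git plumbing (no LFS required; run it inside any repository):

```shell
# List the ten largest blobs in the full history, biggest last.
# rev-list emits "<sha> <path>" lines; cat-file resolves each
# object's type and size, and awk keeps only blobs.
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk '$1 == "blob" { print $2, $3 }' |
  sort -n |
  tail -n 10
```

If the top entries are multi-megabyte binaries with many revisions, that is exactly the history bloat LFS (or a migration) is meant to remove.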

When to Use Git LFS

File Type            Typical Size   LFS Recommended
Source code          < 100KB        No
Config files         < 1MB          No
Small images         < 500KB        Optional
PSD/AI files         10-500MB       Yes
Video files          100MB+         Yes
ML models            100MB+         Yes
Game assets          10MB+          Yes
Compiled binaries    10MB+          Yes
Datasets             10MB+          Yes

Rule of thumb: Track files with LFS if they're binary AND (larger than 1MB OR change frequently).
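To see which working-tree files cross that threshold, a plain find one-liner is enough (the 1MB cutoff matches the rule above; adjust to taste):

```shell
# List files over 1MB, skipping Git's own object store
find . -path ./.git -prune -o -type f -size +1M -print
```

Any binary this prints is a candidate for a git lfs track pattern.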

Setting Up Git LFS

Installation

# macOS
brew install git-lfs

# Ubuntu/Debian
sudo apt install git-lfs

# Windows
# Download from https://git-lfs.github.com/
# Or use: choco install git-lfs

# Initialize Git LFS for your user
git lfs install

Repository Setup

# Navigate to your repository
cd my-repo

# Track file types with LFS
git lfs track "*.psd"
git lfs track "*.mp4"
git lfs track "*.zip"
git lfs track "assets/large/**"

# Check tracked patterns
git lfs track

# This creates/updates .gitattributes
cat .gitattributes
# *.psd filter=lfs diff=lfs merge=lfs -text
# *.mp4 filter=lfs diff=lfs merge=lfs -text
# ...

# Commit the tracking configuration
git add .gitattributes
git commit -m "Configure Git LFS tracking"

How LFS Works

┌─────────────────────────────────────────────────────────────┐
│                    GIT LFS FLOW                              │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   ┌─────────────┐                                           │
│   │  git add    │──► LFS filter detects tracked file        │
│   │  logo.psd   │                                           │
│   └──────┬──────┘                                           │
│          │                                                   │
│          ▼                                                   │
│   ┌─────────────────────────────────────────┐               │
│   │  1. Calculate SHA-256 of file content   │               │
│   │  2. Store file in .git/lfs/objects/     │               │
│   │  3. Create pointer file for staging     │               │
│   └─────────────┬───────────────────────────┘               │
│                 │                                            │
│                 ▼                                            │
│   ┌─────────────────────────────────────────┐               │
│   │  Pointer file content:                   │               │
│   │  version https://git-lfs.github.com/... │               │
│   │  oid sha256:abc123...                    │               │
│   │  size 15728640                           │               │
│   └─────────────┬───────────────────────────┘               │
│                 │                                            │
│                 ▼                                            │
│   ┌─────────────┐      ┌────────────────────┐               │
│   │ git commit  │──►   │  Commit contains   │               │
│   │             │      │  only pointer      │               │
│   └──────┬──────┘      └────────────────────┘               │
│          │                                                   │
│          ▼                                                   │
│   ┌─────────────┐      ┌────────────────────┐               │
│   │  git push   │──►   │ Pointer to GitHub  │               │
│   │             │      │ File to LFS server │               │
│   └─────────────┘      └────────────────────┘               │
│                                                              │
└─────────────────────────────────────────────────────────────┘
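Because a pointer is just a small text file, you can check whether a path on disk holds real content or an un-smudged pointer without any LFS tooling. A minimal sketch (the filename is illustrative):

```shell
# Every Git LFS pointer begins with this version line
if head -c 64 assets/logo.psd | grep -q '^version https://git-lfs'; then
  echo "pointer file (content not downloaded)"
else
  echo "real content"
fi
```

Seeing pointer text where a binary should be usually means Git LFS is not installed or git lfs pull has not run yet.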

Common Workflows

Adding New Large Files

# Ensure file type is tracked
git lfs track "*.psd"

# Add and commit normally
git add design.psd
git commit -m "Add design file"
git push

Cloning Repositories with LFS

# Standard clone (downloads LFS files automatically)
git clone https://github.com/org/repo.git

# Clone without LFS files (faster for large repos)
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/org/repo.git
cd repo
# Download specific files later
git lfs pull --include="assets/needed/*"

# Or download all LFS files
git lfs pull

Checking LFS Status

# List tracked patterns
git lfs track

# List all LFS files in repository
git lfs ls-files

# Show LFS file information
git lfs ls-files -l

# Check LFS status
git lfs status

# Verify LFS files
git lfs fsck

Fetching and Pulling

# Fetch LFS objects (download without checkout)
git lfs fetch

# Fetch specific paths only
git lfs fetch --include="assets/textures/*"

# Pull (fetch + checkout)
git lfs pull

# Fetch from specific remote
git lfs fetch origin

# Fetch all refs (branches, tags)
git lfs fetch --all

Migrating Existing Files to LFS

Track New Files Going Forward

# Track pattern before adding files
git lfs track "*.psd"
git add .gitattributes
git commit -m "Track PSD files with LFS"

# Now add the files
git add designs/*.psd
git commit -m "Add design files"

Migrate Files Already in History

Warning: This rewrites Git history. Coordinate with your team.

# See what would be migrated
git lfs migrate info --include="*.psd"

# Migrate files in history
git lfs migrate import --include="*.psd" --everything

# For specific branches only
git lfs migrate import --include="*.psd" --include-ref=main --include-ref=develop

# Force push after migration
git push --force-with-lease

Cleaning Up After Migration

# Remove old objects from local repo
git reflog expire --expire=now --all
git gc --prune=now --aggressive

# Team members must re-clone
# Old clones still have bloated history

CI/CD Integration

GitHub Actions

name: Build
on: push

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout with LFS
        uses: actions/checkout@v4
        with:
          lfs: true

      - name: Build
        run: npm run build

Optimized with caching:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: List LFS objects for cache key
        run: git lfs ls-files -l | cut -d' ' -f1 | sort > .lfs-assets-id

      - name: Cache LFS objects
        uses: actions/cache@v4
        with:
          path: .git/lfs
          key: lfs-${{ hashFiles('.lfs-assets-id') }}
          restore-keys: lfs-

      - name: Pull LFS files
        run: git lfs pull

      - name: Build
        run: npm run build

Selective LFS pull (faster):

steps:
  - uses: actions/checkout@v4

  - name: Pull only needed LFS files
    run: |
      git lfs install
      git lfs pull --include="src/assets/images/*" --exclude="*.psd"

GitLab CI

build:
  variables:
    GIT_LFS_SKIP_SMUDGE: "1"  # Skip automatic LFS
  script:
    - git lfs pull --include="needed-files/*"
    - npm run build

Storage and Hosting

Provider Comparison

Provider       Free Storage   Free Bandwidth   Paid Plans
GitHub         1 GB           1 GB/month       $5/month per 50GB data pack
GitLab.com     5 GB           10 GB/month      Storage add-ons from $60/year
Bitbucket      1 GB           1 GB/month       Varies by plan
Self-hosted    Unlimited      Unlimited        Your storage costs

Self-Hosted LFS Server

Using lfs-test-server (the reference implementation; it stores content on the local filesystem and is best suited to testing):

# Install
go install github.com/git-lfs/lfs-test-server@latest

# Configure credentials and a local content directory
export LFS_ADMINUSER=admin
export LFS_ADMINPASS=secret
export LFS_CONTENTPATH=/var/lib/lfs-objects

# Run server
lfs-test-server

For production, prefer an LFS server with an object-storage backend (S3, MinIO) or your Git host's built-in LFS support.

Configure repository:

# Point repo to custom LFS server
git config lfs.url https://my-lfs-server.com/org/repo

Using .lfsconfig (committed to repo):

[lfs]
  url = https://my-lfs-server.com/org/repo

Troubleshooting

Common Issues

Problem: "This repository is over its data quota"

# Check storage usage in your provider's web UI; locally,
# list LFS file sizes to find the biggest offenders
git lfs ls-files -s

# Prune old, unreferenced versions locally
git lfs prune

# Stop tracking a pattern (future commits use regular Git;
# existing files stay in LFS until removed from history)
git lfs untrack "*.old"

Problem: LFS files showing as pointer text

# Check if LFS is installed
git lfs install

# Re-checkout LFS files
git lfs checkout

# Or pull all LFS content
git lfs pull

Problem: Slow clone/pull

# Clone without LFS, then selective pull
GIT_LFS_SKIP_SMUDGE=1 git clone <url>
cd repo
git lfs pull --include="needed/**"

# Parallel downloads
git config lfs.concurrenttransfers 8
git lfs pull

Problem: File too large for GitHub

# GitHub's per-file LFS limit is 2GB on the Free plan
# (higher on paid tiers); split large files or use other storage

# Check file sizes
git lfs ls-files -s

Debugging

# Verbose output
GIT_TRACE=1 GIT_TRANSFER_TRACE=1 git lfs pull

# Check LFS configuration
git lfs env

# Verify file integrity
git lfs fsck

Alternatives to Git LFS

Comparison

┌─────────────────────────────────────────────────────────────┐
│                  LARGE FILE SOLUTIONS                        │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   Git LFS                                                    │
│   ├── Best for: General binary file versioning              │
│   ├── Pros: Simple, well-supported, integrated              │
│   └── Cons: Bandwidth costs, requires LFS support           │
│                                                              │
│   DVC (Data Version Control)                                 │
│   ├── Best for: ML datasets, pipelines, experiments         │
│   ├── Pros: ML-focused features, remote storage options     │
│   └── Cons: Separate tool, learning curve                   │
│                                                              │
│   git-annex                                                  │
│   ├── Best for: Complex storage backends, partial sync      │
│   ├── Pros: Flexible, works with any storage                │
│   └── Cons: Complex setup, different mental model           │
│                                                              │
│   Partial Clone + Sparse Checkout                            │
│   ├── Best for: Huge monorepos with no LFS support          │
│   ├── Pros: Native Git, no extra tools                      │
│   └── Cons: Limited to recent Git versions                  │
│                                                              │
│   External Storage (S3 + references)                         │
│   ├── Best for: Truly massive files (10GB+)                 │
│   ├── Pros: No size limits, cheap storage                   │
│   └── Cons: Manual management, no versioning                │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Git Partial Clone (Native Alternative)

# Clone without blob content
git clone --filter=blob:none https://github.com/org/repo.git
cd repo

# Files downloaded on demand when accessed
cat large-file.bin  # Downloaded now

# Sparse checkout for large repos
git sparse-checkout init
git sparse-checkout set src/ docs/

DVC for ML Projects

# Install DVC
pip install dvc

# Initialize in repo
dvc init

# Track large file
dvc add data/training-set.parquet

# Configure remote storage
dvc remote add -d myremote s3://my-bucket/dvc

# Push data
dvc push

# Pull data on another machine
dvc pull

Best Practices

.gitattributes Patterns

# Images
*.png filter=lfs diff=lfs merge=lfs -text
*.jpg filter=lfs diff=lfs merge=lfs -text
*.gif filter=lfs diff=lfs merge=lfs -text
*.psd filter=lfs diff=lfs merge=lfs -text
*.ai filter=lfs diff=lfs merge=lfs -text

# Videos
*.mp4 filter=lfs diff=lfs merge=lfs -text
*.mov filter=lfs diff=lfs merge=lfs -text
*.avi filter=lfs diff=lfs merge=lfs -text

# Audio
*.mp3 filter=lfs diff=lfs merge=lfs -text
*.wav filter=lfs diff=lfs merge=lfs -text

# Archives
*.zip filter=lfs diff=lfs merge=lfs -text
*.tar.gz filter=lfs diff=lfs merge=lfs -text

# Binaries
*.exe filter=lfs diff=lfs merge=lfs -text
*.dll filter=lfs diff=lfs merge=lfs -text
*.so filter=lfs diff=lfs merge=lfs -text

# Data
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text

# Game/3D assets
*.fbx filter=lfs diff=lfs merge=lfs -text
*.blend filter=lfs diff=lfs merge=lfs -text
*.unitypackage filter=lfs diff=lfs merge=lfs -text

Repository Organization

project/
├── src/                    # Regular Git (code)
├── docs/                   # Regular Git (documentation)
├── assets/                 # LFS tracked
│   ├── images/
│   ├── videos/
│   └── designs/
├── data/                   # LFS tracked (or DVC for ML)
│   ├── raw/
│   └── processed/
├── .gitattributes         # LFS tracking patterns
└── .lfsconfig             # LFS server configuration
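With a layout like this, you can track whole directories instead of enumerating extensions. A sketch of the matching .gitattributes entries (paths assume the tree above):

```
# Everything under assets/ and data/ goes through LFS
assets/** filter=lfs diff=lfs merge=lfs -text
data/** filter=lfs diff=lfs merge=lfs -text
```

Running git lfs track "assets/**" generates these entries for you, which avoids typos in the filter attributes.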

Frequently Asked Questions


What is Git LFS and why do I need it?

Git LFS (Large File Storage) replaces large files in your repository with small pointer files while storing the actual content on a remote server. Git wasn't designed for large binaries: every clone downloads the entire history. LFS lets you version large files (images, videos, datasets, binaries) without bloating repository size or slowing down clones.

Which files should I track with LFS?

Track binary files that are large or change often: images (PSD, PNG, JPG), videos (MP4, MOV), audio (WAV, MP3), archives (ZIP, TAR), compiled binaries (DLL, EXE), datasets (CSV over 1MB, Parquet), game assets (FBX, textures), and ML models. Don't track small text files or files that compress and diff well in Git.

How much does Git LFS storage cost?

GitHub includes 1GB storage and 1GB/month bandwidth free with every account. Additional data packs cost $5/month for 50GB storage plus 50GB bandwidth. GitLab offers 5GB free storage. Self-hosted Git LFS has no storage limits but requires your own storage backend (S3, MinIO, local filesystem).

What happens if I clone without Git LFS installed?

You'll get the pointer files instead of the actual content. Each pointer is a small text file containing the SHA-256 hash and file size. Builds will likely fail because the actual binaries aren't present. Install Git LFS and run git lfs pull to download the real files.

How do I migrate existing files to LFS?

Use git lfs migrate: first track the file types (git lfs track '*.psd'), then rewrite history (git lfs migrate import --include='*.psd' --everything). This rewrites Git history, so coordinate with team members, who must re-clone. For files not yet committed, simply track before adding: git lfs track, then git add.

Does Git LFS work in CI/CD?

Yes. In GitHub Actions, use actions/checkout@v4 with lfs: true. For faster CI, fetch selectively with git lfs pull --include='needed-files/*' after checkout, and cache LFS objects between runs with actions/cache to avoid repeated downloads.

What are the alternatives to Git LFS?

git-annex stores files in configurable backends and supports partial sync. DVC (Data Version Control) is optimized for ML datasets with pipeline tracking. Git partial clone with sparse checkout fetches only needed files natively. External storage (S3, GCS) with references in the repo is simpler but offers no versioning. Rule of thumb: Git LFS for general use, DVC for ML, git-annex for flexibility.

How do I reduce LFS storage usage?

Prune old versions: git lfs prune removes unreferenced files locally. Delete remote objects you'll never need via your provider's API or web UI. Use git lfs migrate to remove files from history that shouldn't have been tracked. Set up lifecycle policies on your storage backend to auto-delete old versions.

How do I speed up slow LFS clones?

LFS downloads happen after the regular clone finishes. Speed things up with git lfs install --skip-smudge before cloning, then a selective git lfs pull --include='needed-paths/*'; raise parallelism with git config lfs.concurrenttransfers 8. Use git clone --depth 1 for a shallow clone, and consider an LFS caching proxy for CI.

How do I move to a different LFS server?

Run git lfs fetch --all to ensure all versions are local. Point the repo at the new server: git config lfs.url https://new-server/org/repo.git/info/lfs. Push everything: git lfs push --all origin. Update .lfsconfig if you use repo-committed configuration. Team members need git lfs pull after the switch.

Our engineers build systems that scale. Clean architecture, comprehensive testing, and security-first development.