
Git Repository Structure: A Complete Technical Guide

Git is a distributed version control system that stores project history as a directed acyclic graph (DAG) of immutable snapshot objects. Every Git repository is built from three logical zones (the working directory, the staging index, and the object store inside .git/) plus a set of lightweight pointers (branches, tags, remotes) that navigate that history. Understanding how these layers interact is the difference between using Git mechanically and using it with surgical precision.

If you self-host your repositories on a VPS, mastering this internal structure lets you recover from disasters, design efficient CI/CD pipelines, and audit every byte of your project's history without relying on a third-party platform.

The Three-Zone Model: How Git Moves Data

Before diving into individual components, internalize the data-flow model that governs every Git operation:

Working Directory  -->  Staging Area (Index)  -->  .git/ Object Store
     (edit)               (git add)                  (git commit)

Changes travel left-to-right when you build a commit, and right-to-left when you restore or reset. Every Git command is essentially a read or write operation on one or more of these zones.

Working Directory

The working directory (also called the working tree) is the filesystem view of your project at a specific checkout state. When you run git clone or git checkout, Git reconstructs files from compressed objects in .git/objects/ and writes them to this directory.

Files in the working directory exist in one of four states:

  • Untracked: Git has never seen this file; it exists only on disk.
  • Tracked, unmodified: the file matches the last committed snapshot exactly.
  • Tracked, modified: the file differs from the last committed snapshot but has not been staged.
  • Tracked, deleted: the file was removed from disk but the deletion has not been staged.
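git status --porcelain is a quick way to see these states. Its two-character code shows the index in the left column and the working directory in the right column for each path. The sketch below assumes Git and bash are installed, and the repository and file names are throwaway examples.

```shell
#!/usr/bin/env bash
# Show working-directory file states via `git status --porcelain`.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo && cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'a\n' > kept.txt
printf 'b\n' > edited.txt
printf 'c\n' > removed.txt
git add . && git commit -q -m "Initial snapshot"

printf 'new\n' > untracked.txt   # never added: untracked
printf 'b2\n' >> edited.txt      # tracked, modified, not staged
rm removed.txt                   # tracked, deleted, not staged
# kept.txt is tracked and unmodified, so porcelain output omits it.

git status --porcelain
#  M edited.txt
#  D removed.txt
# ?? untracked.txt

git add edited.txt               # move the modification into the index
git status --porcelain -- edited.txt
# M  edited.txt                  (left column set: the change is staged)
```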

A critical nuance that trips up many developers: the working directory is not a simple copy of the repository. Git reconstructs it by reading tree objects and decompressing blob objects. If .git/ is intact, you can always regenerate the committed contents of the working directory from scratch. The reverse is not true.

Sparse Checkout for Large Monorepos

On repositories with tens of thousands of files (common in monorepo architectures), you can limit which paths Git materializes in the working directory:

git sparse-checkout init --cone
git sparse-checkout set services/api services/auth

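This is invaluable on a VPS with constrained disk I/O, because Git does not write files outside the cone to the working directory. Sparse checkout on its own still downloads every blob. Pair it with a partial clone (git clone --filter=blob:none) if you also want to skip fetching them.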
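You can confirm what cone mode leaves on disk by building a throwaway monorepo and applying the same two commands. The sketch assumes Git 2.25 or newer (the release that introduced git sparse-checkout) and bash. The services/ layout is hypothetical.

```shell
#!/usr/bin/env bash
# Check which paths cone-mode sparse checkout leaves in the working directory.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q mono && cd mono
git config user.name "Demo User"
git config user.email "demo@example.com"

# Hypothetical monorepo layout
mkdir -p services/api services/auth services/web
printf 'api\n'  > services/api/main.txt
printf 'auth\n' > services/auth/main.txt
printf 'web\n'  > services/web/main.txt
printf 'root\n' > README.txt
git add . && git commit -q -m "Monorepo snapshot"

git sparse-checkout init --cone
git sparse-checkout set services/api services/auth

git sparse-checkout list                        # directories in the cone
find . -path ./.git -prune -o -type f -print    # files actually on disk
```

Top-level files such as README.txt stay checked out, because cone mode always includes files in the repository root.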

Staging Area (Index)

The staging area, internally called the index, is a binary file located at .git/index. It acts as a proposed next commit: a mutable snapshot that sits between your working directory and the permanent object store.

git add <file>          # Stage a specific file
git add -p              # Interactively stage hunks within a file
git add -u              # Stage all tracked modifications and deletions
git status              # Compare working directory and index against HEAD
git diff --cached       # Show diff between index and HEAD

Why the Index Exists

The index solves a problem that simpler VCS tools ignore: partial commits. You may have modified five files but only want three of them in the next commit. The index lets you compose exactly the snapshot you intend to record, independent of what your editor has open.

Edge case (index corruption): If a system crash interrupts a git add, the index file can become corrupt. Symptoms include commands failing with "fatal: index file corrupt" or git status reporting bizarre output. Recovery:

rm .git/index
git reset

Git rebuilds the index from HEAD without touching your working directory. Anything you had staged becomes unstaged, but the edits themselves remain on disk.
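You can rehearse this recovery safely in a scratch repository. The sketch deletes the index outright as a stand-in for a corrupt one, which is a simplifying assumption. It then confirms that a staged edit survives as an unstaged change. It assumes Git and bash.

```shell
#!/usr/bin/env bash
# Simulate losing .git/index and rebuilding it from HEAD with `git reset`.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo && cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'v1\n' > app.conf
git add app.conf && git commit -q -m "Add app.conf"

printf 'v2-edited\n' > app.conf
git add app.conf                 # staged, not yet committed

rm .git/index                    # stand-in for a corrupt index file
git reset -q                     # mixed reset: rebuild the index from HEAD

cat app.conf                     # the working-directory edit is still there
git status --porcelain           # ...but it is now an unstaged modification
```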

The Index as a Merge Conflict Register

During a merge conflict, the index stores up to three versions of each conflicted file simultaneously (stages 1, 2, and 3: base, ours, theirs). This is why git diff --cached shows nothing useful mid-conflict. To inspect the stages, use git diff --cc (plain git diff also uses this combined format for unmerged paths), git ls-files -u to list them, or a merge tool.
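You can read the three stages directly with git ls-files -u and the :<stage>:<path> revision syntax. The sketch below creates a conflict in a scratch repository and assumes Git and bash. The file name and contents are hypothetical.

```shell
#!/usr/bin/env bash
# Create a merge conflict and inspect index stages 1 (base), 2 (ours), 3 (theirs).
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main      # fixed branch name for the demo
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'mode=base\n' > settings.txt
git add settings.txt && git commit -q -m "Base"

git checkout -q -b feature
printf 'mode=theirs\n' > settings.txt
git commit -q -am "Feature change"

git checkout -q main
printf 'mode=ours\n' > settings.txt
git commit -q -am "Main change"

# The merge is expected to stop with a conflict (non-zero exit status).
if git merge -q feature >/dev/null 2>&1; then
    echo "expected a conflict" >&2
    exit 1
fi

git ls-files -u              # one line per stage: <mode> <blob SHA> <stage> settings.txt
git show :1:settings.txt     # base
git show :2:settings.txt     # ours (main)
git show :3:settings.txt     # theirs (feature)
```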

The .git/ Directory: Anatomy of the Object Store

The .git/ directory is the repository. Everything else (the working directory, remote clones) is derived from it. Deleting .git/ turns a repository into a plain directory with no history.

.git/
├── HEAD
├── config
├── description
├── index
├── COMMIT_EDITMSG
├── hooks/
├── info/
├── logs/
│   ├── HEAD
│   └── refs/
├── objects/
│   ├── info/
│   └── pack/
└── refs/
    ├── heads/
    ├── remotes/
    └── tags/

HEAD is a plain text file containing either a symbolic ref (pointing to a branch) or a raw SHA-1 hash (detached HEAD state).

cat .git/HEAD
# ref: refs/heads/main        <-- on a branch
# a3f1c9d...                  <-- detached HEAD

Detached HEAD is not an error state. It is intentional when you check out a tag or a specific commit for inspection. The danger is making commits in detached HEAD: those commits are reachable only via the reflog until you attach them to a branch.

git checkout -b rescue-branch   # Attach detached commits to a new branch
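The following sketch shows .git/HEAD switching between a symbolic ref and a raw SHA. It then rescues a commit made while detached. It assumes Git and bash and uses a throwaway repository with illustrative branch names.

```shell
#!/usr/bin/env bash
# Watch .git/HEAD change between attached and detached states,
# then attach a commit made while detached to a new branch.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "First"
cat .git/HEAD                      # ref: refs/heads/main

git checkout -q --detach HEAD      # detach at the current commit
git commit -q --allow-empty -m "Made while detached"
cat .git/HEAD                      # a raw commit SHA, no "ref:" prefix
detached_sha=$(git rev-parse HEAD)

git checkout -q -b rescue-branch   # attach the detached commit to a branch
cat .git/HEAD                      # ref: refs/heads/rescue-branch
```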

config

The local repository configuration file. It overrides global (~/.gitconfig) and system (/etc/gitconfig) settings. Common entries:

[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
[remote "origin"]
    url = git@github.com:user/repo.git
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
    remote = origin
    merge = refs/heads/main

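On a self-hosted server, you will frequently edit this file (directly or with git config). Typical reasons are rotating remote URLs or enabling server-side options such as uploadpack.allowFilter, which partial clones require, or uploadpack.allowReachableSHA1InWant, which lets clients fetch individual reachable commits by SHA.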
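You can check precedence between the global and local files without touching your real configuration. This sketch assumes Git and bash. It points HOME at a temporary directory so that git config --global writes to a scratch ~/.gitconfig. The user.name values are placeholders.

```shell
#!/usr/bin/env bash
# Show that a repository-level (local) setting overrides the global one.
# HOME is redirected so the real ~/.gitconfig is never modified.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp/home"
mkdir -p "$HOME"
cd "$tmp"
git init -q demo && cd demo

git config --global user.name "Global Name"   # written to $HOME/.gitconfig
git config --local  user.name "Local Name"    # written to .git/config

git config user.name                          # resolved value: Local Name
git config --show-origin user.name            # which file supplied it

git config --local --unset user.name
git config user.name                          # falls back to Global Name
```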

refs/

The refs/ directory contains plain text files, each holding a single SHA-1 hash. They are the named pointers that make Git's DAG navigable.

| Ref Type | Path | Description |
|---|---|---|
| Local branch | refs/heads/<name> | Points to the tip commit of a branch |
| Remote-tracking branch | refs/remotes/<remote>/<name> | Local cache of a remote branch's tip |
| Lightweight tag | refs/tags/<name> | Points directly to a commit object |
| Annotated tag | refs/tags/<name> | Points to a tag object, which points to a commit |
| Stash | refs/stash | Points to the stash commit |

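For performance, Git periodically packs refs into a single .git/packed-refs file (during git gc or an explicit git pack-refs). After that, a ref may exist only there and not as a file under refs/. When scripting, resolve refs with git rev-parse or git for-each-ref rather than reading files directly.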
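The sketch below shows a loose ref moving into packed-refs while git rev-parse keeps resolving it. It assumes Git and bash with the default "files" ref backend. Repositories that use the newer reftable backend store refs differently.

```shell
#!/usr/bin/env bash
# A ref can move from a loose file into .git/packed-refs,
# so scripts should resolve refs through Git instead of reading files.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "First"
git tag v1.0

ls .git/refs/heads .git/refs/tags   # loose ref files: main, v1.0
git pack-refs --all                 # move all refs into .git/packed-refs
cat .git/packed-refs                # the refs now live here

# Reliable ways to read refs, wherever they are stored:
git rev-parse refs/heads/main
git for-each-ref --format='%(refname) %(objectname)'
```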

Git Objects: The Immutable Core

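Everything stored in .git/objects/ is content-addressed. Each object's name is the SHA-1 hash (or SHA-256, for repositories created with git init --object-format=sha256) of a short header giving the object's type and size, followed by its content. This makes Git inherently tamper-evident: changing any byte changes the hash. Because commits and trees reference other objects by hash, that change breaks every link up the chain.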
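You can verify the header detail directly. Hashing the bytes "blob <size>", a NUL byte, and the file content should yield the same ID that git hash-object reports. This sketch assumes a SHA-1 repository (the default), bash, and GNU coreutils' sha1sum.

```shell
#!/usr/bin/env bash
# Recompute a blob ID by hand: SHA-1 over "blob <size>", a NUL byte, then the content.
# Assumes a SHA-1 repository (the default) and GNU coreutils' sha1sum.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q

printf 'test content\n' > file.txt
size=$(wc -c < file.txt)

# printf emits the header and the NUL separator; cat appends the raw bytes.
manual=$( { printf 'blob %d\0' "$size"; cat file.txt; } | sha1sum | cut -d' ' -f1 )
from_git=$(git hash-object file.txt)

echo "manual:          $manual"
echo "git hash-object: $from_git"
```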

The Four Object Types

| Object Type | What It Stores | Points To |
|---|---|---|
| Blob | Raw file content (no filename, no permissions) | Nothing |
| Tree | Directory listing: filenames, modes (permissions), blob/tree SHAs | Blobs and other trees |
| Commit | Author, committer, timestamps, message, tree SHA, parent SHA(s) | One tree + zero or more parent commits |
| Tag | Tagger identity, timestamp, message, optional GPG signature | Usually a commit |

Inspecting Objects Directly

# Show the type of any object
git cat-file -t a3f1c9d

# Show the content of any object
git cat-file -p a3f1c9d

# Show the tree of the current HEAD commit
git ls-tree HEAD

# Show a specific blob's content
git show HEAD:src/main.py
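A scratch repository makes it easy to follow the pointers from one object type to the next. This sketch assumes Git and bash. src/main.py and the tag name are illustrative.

```shell
#!/usr/bin/env bash
# Walk from a commit to its tree and blob, and inspect an annotated tag object.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo && cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir -p src
printf 'print("hi")\n' > src/main.py
git add . && git commit -q -m "Add main.py"
git tag -a v0.1 -m "First release"     # creates a tag object

git cat-file -t HEAD                   # commit
git cat-file -p HEAD                   # tree <sha>, author, committer, message
git cat-file -t 'HEAD^{tree}'          # tree
git ls-tree HEAD                       # 040000 tree <sha>    src
git cat-file -t HEAD:src/main.py       # blob
git cat-file -p HEAD:src/main.py       # raw file content, no name or mode
git cat-file -t v0.1                   # tag
git cat-file -p v0.1                   # object <commit sha>, type commit, tagger, message
```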

Loose Objects vs. Pack Files

Initially, each object is stored as an individual zlib-compressed file under .git/objects/<2-char-prefix>/<38-char-suffix>. These are loose objects. Over time, git gc (garbage collection) bundles loose objects into pack files (.git/objects/pack/pack-<hash>.pack), each with a matching index file (pack-<hash>.idx). Git also runs git gc --auto after certain operations, so this happens even if you never invoke it yourself.

Pack files use delta compression, storing the difference between similar objects rather than full copies. A repository with thousands of similar text files can shrink dramatically after packing. On a VPS with limited NVMe capacity, running git gc --aggressive on large repositories before archiving is standard practice.

git count-objects -vH    # Show loose object count and disk usage
git gc --aggressive      # Repack aggressively (CPU-intensive)
git verify-pack -v .git/objects/pack/*.idx | sort -k3 -n | tail -20
# List the 20 largest objects in the pack (column 3 is the uncompressed size)
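You can observe the effect of packing with git count-objects -v. In its output, count is the number of loose objects and in-pack is the number of packed ones. The sketch assumes Git, bash, and seq from GNU coreutils. The file contents are arbitrary.

```shell
#!/usr/bin/env bash
# Watch loose objects get packed by `git gc`.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo && cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"

for i in 1 2 3; do
    seq 1 2000 > numbers.txt        # nearly identical content each round: good delta candidates
    echo "round $i" >> numbers.txt
    git add numbers.txt && git commit -q -m "Round $i"
done

git count-objects -v                # count: loose objects, in-pack: packed objects
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

git gc -q                           # pack reachable objects, remove the loose copies
git count-objects -v
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
in_pack=$(git count-objects -v | awk '/^in-pack:/ {print $2}')
ls .git/objects/pack/               # pack-<hash>.pack and pack-<hash>.idx
```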

Commit History: The Directed Acyclic Graph

Each commit object contains exactly one pointer to a tree object (the root directory snapshot) and zero or more pointers to parent commits. This forms a DAG where:

  • Zero parents = the initial commit (root commit)
  • One parent = a normal commit
  • Two parents = a merge commit
  • Three or more parents = an octopus merge (rare, used for integrating many feature branches simultaneously)

git log --oneline --graph --all    # Visualize the full DAG
git log --format="%H %P"           # Show each commit's SHA and parent SHA(s)
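You can check the parent counts above on a tiny scratch history with a root commit, ordinary commits, and a merge. The sketch assumes Git and bash. The branch names are illustrative.

```shell
#!/usr/bin/env bash
# Build a tiny DAG and count parents: 0 (root), 1 (normal), 2 (merge).
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "Root"
git checkout -q -b feature
git commit -q --allow-empty -m "Feature work"
git checkout -q main
git commit -q --allow-empty -m "Main work"
git merge -q --no-edit feature     # histories diverged, so this creates a merge commit

git log --format='%h %s (parents: %p)'
root=$(git rev-list --max-parents=0 HEAD)   # the commit with zero parents
```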

Commit Immutability and Rewriting History

Because a commit's SHA is derived from its content (including parent SHAs), any rewrite creates a new commit with a new SHA. Operations like git rebase, git commit --amend, and git filter-repo do not modify history; they create parallel history. The old commits remain in the object store until garbage collected.

This is why force-pushing rewritten history to a shared branch is destructive: collaborators' local branches still point to the old commit chain.
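--amend makes this immutability easy to observe: the branch moves to a new commit while the old object stays in the store. The sketch assumes Git and bash and uses a throwaway repository.

```shell
#!/usr/bin/env bash
# `git commit --amend` writes a new commit; the old one stays in the object store.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo && cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'v1\n' > notes.txt
git add notes.txt && git commit -q -m "Draft message"
old=$(git rev-parse HEAD)

git commit -q --amend -m "Final message"   # same tree, new message, new SHA
new=$(git rev-parse HEAD)

echo "old: $old"
echo "new: $new"
git cat-file -t "$old"                     # still "commit": nothing was deleted
git reflog -2                              # the reflog records both commits
```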

Branches: Lightweight Pointers

A branch is nothing more than a 41-byte file containing a SHA-1 hash. Creating a branch is instantaneous regardless of repository size because Git only writes one small file.

git branch feature/auth           # Create branch at current HEAD
git checkout -b feature/auth      # Create and switch in one step
git switch -c feature/auth        # Modern equivalent (Git 2.23+)
git branch -d feature/auth        # Delete (safe: refuses if unmerged)
git branch -D feature/auth        # Delete (force: regardless of merge status)

Branch Internals

cat .git/refs/heads/main
# a3f1c9d8e2b1f4c7d9e0a1b2c3d4e5f6a7b8c9d0

When you commit on a branch, Git writes the new commit SHA to this file. That is the entirety of "advancing a branch pointer."
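The sketch below reads the ref file before and after a commit. It assumes Git, bash, a SHA-1 repository, and the default "files" ref backend. Under those conditions, refs/heads/main is a loose 41-byte file rather than an entry in packed-refs.

```shell
#!/usr/bin/env bash
# Inspect a branch ref file and watch a commit advance it.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "First"
wc -c < .git/refs/heads/main          # 41: 40 hex characters plus a newline
before=$(cat .git/refs/heads/main)

git commit -q --allow-empty -m "Second"
after=$(cat .git/refs/heads/main)     # the file now holds the new commit's SHA

echo "before: $before"
echo "after:  $after"
git rev-parse "$after^"               # the new commit's parent is the old tip
```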

Tracking Branches and Upstream Configuration

A tracking relationship tells Git which remote branch a local branch should compare against for git status divergence reporting and git pull behavior.

git branch --set-upstream-to=origin/main main
git branch -vv    # Show tracking relationships and ahead/behind counts
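You can reproduce ahead/behind counts without a network. The sketch uses a local bare repository as a stand-in for origin and assumes Git and bash. Paths and branch names are illustrative.

```shell
#!/usr/bin/env bash
# Set an upstream and read ahead/behind counts against a local bare "origin".
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q --bare origin.git
git init -q work && cd work
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$tmp/origin.git"

git commit -q --allow-empty -m "Shared base"
git push -q origin main                     # also updates refs/remotes/origin/main
git branch --set-upstream-to=origin/main main

git commit -q --allow-empty -m "Local only"
git branch -vv                              # shows [origin/main: ahead 1]

# Machine-readable form: "<behind><TAB><ahead>" relative to the upstream
counts=$(git rev-list --left-right --count '@{upstream}...HEAD')
echo "$counts"
```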

Tags: Permanent Markers in History

Tags mark specific commits as significant β€” typically software releases. Unlike branches, tags are not moved by new commits.

| Feature | Lightweight Tag | Annotated Tag |
|---|---|---|
| Storage | A ref file pointing to a commit | A tag object in the object store |
| Metadata | None | Tagger name, email, date, message |
| GPG signing | Not possible | Supported via git tag -s |
| Recommended for releases | No | Yes |
| Transferred with git push --tags | Yes | Yes |

git tag v2.1.0                              # Lightweight tag at HEAD
git tag -a v2.1.0 -m "Release 2.1.0"       # Annotated tag
git tag -s v2.1.0 -m "Signed release"      # GPG-signed annotated tag
git push origin --tags                      # Push all tags to remote
git push origin v2.1.0                      # Push a specific tag

Critical pitfall: git push does not push tags by default. Teams frequently forget this and publish release notes referencing a tag that does not exist on the remote.
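You can reproduce this pitfall against a local bare repository that stands in for the remote. The sketch assumes Git and bash. The tag names are examples.

```shell
#!/usr/bin/env bash
# A plain `git push` of a branch leaves tags behind on the remote.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q --bare remote.git
git init -q work && cd work
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$tmp/remote.git"

git commit -q --allow-empty -m "Release commit"
git tag -a v2.1.0 -m "Release 2.1.0"    # annotated: creates a tag object
git tag build-42                         # lightweight: just a ref

git push -q origin main
tags_after_branch_push=$(git ls-remote --tags origin | wc -l)
echo "tags on remote after pushing main: $tags_after_branch_push"   # 0

git push -q origin v2.1.0                # push one tag explicitly
remote_tags=$(git ls-remote --tags origin)
echo "$remote_tags"                      # refs/tags/v2.1.0 plus its peeled v2.1.0^{} entry

git cat-file -t v2.1.0                   # tag
git cat-file -t build-42                 # commit (the ref points straight at it)
```

If you want annotated tags to travel with the commits they mark, git push --follow-tags sends annotated tags that point at the pushed commits. Lightweight tags still have to be pushed by name or with --tags.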

Remotes: Distributed Collaboration

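A remote is a named URL stored in .git/config. Remote-tracking branches (under refs/remotes/) are local read-only snapshots of the remote's branches. They are updated only when you fetch (including the fetch inside git pull) or when a push to that remote succeeds, never in the background.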

git remote add origin git@github.com:user/repo.git
git remote -v                          # List remotes with URLs
git remote set-url origin <new-url>    # Change a remote URL
git fetch origin                       # Update remote-tracking branches
git fetch --prune                      # Remove stale remote-tracking branches
git push origin main                   # Push local main to remote
git push -u origin feature/auth        # Push and set upstream tracking

Multiple Remotes

A single repository can track multiple remotes, which is common when maintaining a fork alongside the upstream:

git remote add upstream git@github.com:original/repo.git
git fetch upstream
git merge upstream/main

When self-hosting bare repositories on a dedicated server for your team, each developer adds the server as a remote and uses SSH key authentication for push access.

Hooks: Automated Enforcement at Every Git Event

Hooks are executable scripts in .git/hooks/. Git calls them at defined points in the workflow. They are not transferred by git clone or git push; each developer (or server) must install them independently. This is a frequent source of confusion in team environments.

Client-Side Hooks

| Hook | Trigger | Common Use |
|---|---|---|
| pre-commit | Before commit message prompt | Linting, secret scanning, test execution |
| prepare-commit-msg | After default message created | Inject branch name into message |
| commit-msg | After user writes message | Enforce conventional commit format |
| post-commit | After commit is recorded | Local notifications |
| pre-push | Before git push executes | Run full test suite |
| pre-rebase | Before rebase starts | Prevent rebasing published branches |

Server-Side Hooks

| Hook | Trigger | Common Use |
|---|---|---|
| pre-receive | Before refs are updated | Enforce branch protection, reject force-push |
| update | Per-ref during receive | Per-branch policy enforcement |
| post-receive | After all refs updated | Trigger CI/CD, send notifications |

Example: Pre-commit Hook for Secret Detection

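#!/usr/bin/env bash
# .git/hooks/pre-commit
# Scan only the lines being added in the staged changes (what will actually be
# committed), not the working-tree copies of the files.
pattern='(AKIA[0-9A-Z]{16}|password[[:space:]]*=|api_key[[:space:]]*=)'

if git diff --cached -U0 --no-color | grep '^+' | grep -v '^+++ ' | grep -qE "$pattern"; then
    echo "ERROR: Potential secret detected in staged changes. Commit aborted." >&2
    exit 1
fi
exit 0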

Make it executable:

chmod +x .git/hooks/pre-commit
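A hook is easiest to trust once you have watched it reject something. This sketch installs a pre-commit hook with the same intent as the example above into a scratch repository. Its regular expression is an illustrative assumption, not an exhaustive secret scanner. The sketch checks that a harmless commit passes and that a commit containing a password assignment is blocked. It assumes Git and bash.

```shell
#!/usr/bin/env bash
# Throwaway-repo test: install a secret-scanning pre-commit hook and confirm it blocks a commit.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo && cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"

# Quoted heredoc: nothing is expanded while writing the hook file.
cat > .git/hooks/pre-commit <<'EOF'
#!/usr/bin/env bash
pattern='(AKIA[0-9A-Z]{16}|password[[:space:]]*=|api_key[[:space:]]*=)'
if git diff --cached -U0 --no-color | grep '^+' | grep -v '^+++ ' | grep -qE "$pattern"; then
    echo "ERROR: Potential secret detected in staged changes. Commit aborted." >&2
    exit 1
fi
exit 0
EOF
chmod +x .git/hooks/pre-commit

printf 'timeout = 30\n' > safe.conf
git add safe.conf
git commit -q -m "Harmless config"               # the hook allows this one

printf 'password = hunter2\n' > leaky.conf
git add leaky.conf
if git commit -q -m "Leaky config" 2>/dev/null; then
    blocked=no
else
    blocked=yes                                   # the hook rejected the commit
fi
echo "commit blocked: $blocked"
```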

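For team-wide hook distribution, use a tool like Husky (Node.js projects). Alternatively, commit the hooks to a hooks/ directory at the repository root and wire them up during project setup, either with git config core.hooksPath hooks (Git 2.9+) or by symlinking them into .git/hooks/.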

Reflog: The Safety Net

The reflog records every movement of HEAD and branch pointers, including operations that appear to destroy history (hard resets, rebases, amended commits). It is stored in .git/logs/.

git reflog                          # Show HEAD movement history
git reflog show main                # Show movement history for a specific branch
git checkout HEAD@{3}               # Check out the state HEAD was in 3 moves ago
git branch recovered HEAD@{5}       # Recover commits by branching from a reflog entry
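You can rehearse the recovery flow in a scratch repository: lose a commit with a hard reset, then create a branch from the reflog entry that still points at it. The sketch assumes Git and bash. The branch name recovered mirrors the command above.

```shell
#!/usr/bin/env bash
# Lose a commit with `git reset --hard`, then recover it from the reflog.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo && cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "Keep me"
git commit -q --allow-empty -m "Important work"
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD~1          # the branch no longer contains "Important work"
git reflog -3                       # HEAD@{1} is the commit we just moved away from

git branch recovered 'HEAD@{1}'     # pin the lost commit to a new branch
git log --oneline -1 recovered
```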

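Reflog entries expire after 90 days by default (gc.reflogExpire). Entries for commits that are no longer reachable from the branch tip expire after 30 days (gc.reflogExpireUnreachable). Bare repositories, which is what most servers host, keep no reflog at all unless core.logAllRefUpdates is enabled. On a production server, consider:

git config core.logAllRefUpdates true
git config gc.reflogExpire 180.days.ago
git config gc.reflogExpireUnreachable 30.days.ago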

Bare Repositories: Server-Side Hosting

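A bare repository has no working directory; the contents that would normally live in .git/ sit directly at the repository root. Bare repositories are the correct format for centralized hosting because they accept pushes to any branch. A non-bare repository, by contrast, refuses pushes to its checked-out branch by default (receive.denyCurrentBranch), since such an update would leave its working directory out of sync.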

git init --bare /srv/repos/myproject.git

When you push to GitHub, GitLab, or a self-hosted Git server, you are pushing to a bare repository. If you host your own Git server on a VPS with cPanel or a raw Linux VPS, bare repositories under /srv/repos/ with SSH access are the standard architecture.

Initializing a Shared Bare Repository

# On the server
git init --bare --shared=group /srv/repos/project.git
chown -R git:developers /srv/repos/project.git

# On a developer's machine
git remote add origin git@yourserver.com:/srv/repos/project.git
git push -u origin main
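You can simulate the same layout on one machine, with local paths standing in for git@yourserver.com:/srv/repos/. The sketch assumes Git and bash. It also shows why a non-bare repository is a poor push target: by default (receive.denyCurrentBranch), Git refuses to update its checked-out branch.

```shell
#!/usr/bin/env bash
# Local simulation of the server workflow: a bare "server" repository accepts
# pushes, while a non-bare clone refuses pushes to its checked-out branch.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
mkdir -p server

git init -q --bare server/project.git
git -C server/project.git symbolic-ref HEAD refs/heads/main   # default branch for clones
git -C server/project.git rev-parse --is-bare-repository      # true

git init -q dev && cd dev
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "First"
git remote add origin "$tmp/server/project.git"
git push -q -u origin main                                     # accepted by the bare repo

# For contrast: a non-bare clone that has "main" checked out.
git clone -q "$tmp/server/project.git" "$tmp/nonbare"
git commit -q --allow-empty -m "Second"
git push -q origin main                                        # bare repo: accepted again
if git push -q "$tmp/nonbare" main 2>/dev/null; then
    nonbare_push=accepted
else
    nonbare_push=refused    # receive.denyCurrentBranch defaults to refusing this
fi
echo "push to the non-bare clone's checked-out branch: $nonbare_push"
```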

Git Object Storage: Size, Integrity, and Maintenance

Checking Repository Health

git fsck --full          # Verify object integrity (finds dangling and corrupt objects)
git fsck --lost-found    # Write dangling objects to .git/lost-found/
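To see what fsck reports, the sketch writes a blob that nothing references and then recovers it with --lost-found. For blobs, --lost-found writes the content itself under .git/lost-found/other/. The sketch assumes Git and bash, and the content is arbitrary.

```shell
#!/usr/bin/env bash
# Create an object nothing refers to, then let fsck find and recover it.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo && cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "First"

# Write a blob straight into the object store without staging or committing it.
orphan=$(printf 'orphaned content\n' | git hash-object -w --stdin)

fsck_out=$(git fsck --full)
echo "$fsck_out"                          # includes: dangling blob <sha>

git fsck --lost-found >/dev/null          # copies it to .git/lost-found/other/<sha>
cat ".git/lost-found/other/$orphan"
```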

Finding and Removing Large Objects

Large binary files accidentally committed are a common cause of bloated repositories. Identify them before using git filter-repo to excise them:

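# Find the 10 largest blobs by compressed (in-pack) size and print each one's path.
# verify-pack -v columns: SHA-1, type, size, size-in-packfile, offset, [depth, base SHA-1]
git verify-pack -v .git/objects/pack/*.idx \
  | awk '$2 == "blob" {print $4, $1}' \
  | sort -rn \
  | head -10 \
  | while read -r size sha; do
      echo "$size $(git rev-list --objects --all | grep "^$sha" | cut -d' ' -f2-)"
    done

# Remove a file from all history (requires git-filter-repo)
git filter-repo --path path/to/large-file.bin --invert-paths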

After filtering, all collaborators must re-clone. Their existing clones still contain the old commits, and pushing or merging from them would reintroduce the history you just removed.

Comparison: Key Git Repository Concepts

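| Concept | Type | Mutable | Stored In | Transferred by Push/Fetch |
|---|---|---|---|---|
| Blob | Object | No | .git/objects/ | Yes (when reachable) |
| Tree | Object | No | .git/objects/ | Yes (when reachable) |
| Commit | Object | No | .git/objects/ | Yes (when reachable) |
| Annotated Tag | Object | No | .git/objects/ | Push: only when named or with --tags; fetch: auto-followed when it points into fetched history |
| Branch | Ref | Yes | .git/refs/heads/ | Yes (the branches you push or fetch) |
| Remote-tracking branch | Ref | Yes (on fetch or push) | .git/refs/remotes/ | No (local cache) |
| Lightweight Tag | Ref | No | .git/refs/tags/ | Push: only when named or with --tags; fetch: auto-followed when it points into fetched history |
| HEAD | Symref/hash | Yes | .git/HEAD | No |
| Index | Binary file | Yes | .git/index | No |
| Hooks | Scripts | Yes | .git/hooks/ | No |
| Reflog | Log | Yes (auto-expires) | .git/logs/ | No |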

Practical Decision Matrix and Key Takeaways

Use this checklist when setting up or auditing a Git repository on your infrastructure:

Repository initialization

  • Use git init --bare --shared=group for any repository that will receive pushes from multiple users.
  • Store bare repositories outside web-accessible directories (never under /var/www/).

Object store health

  • Run git fsck --full after any storage incident or filesystem error.
  • Schedule git gc periodically on long-lived repositories; automate it via cron on your server.
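  • Monitor loose objects and pack size with git count-objects -vH. A loose object count that keeps climbing past the gc.auto threshold (6,700 by default) suggests automatic garbage collection is not running.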

Branch and ref hygiene

  • Delete merged branches promptly. Stale refs clutter branch listings and, in large numbers, lengthen the ref list that every fetch has to process.
  • Use git fetch --prune in CI pipelines to avoid acting on deleted remote branches.

Hook deployment

  • Never rely on .git/hooks/ for team-wide policy, because hooks are not cloned. Use server-side pre-receive hooks or a CI gate instead.
  • Audit server-side hooks after every Git server upgrade; hook interpreter paths can change.

Security on self-hosted servers

  • Restrict SSH access to the git user with forced commands (command= in authorized_keys).
  • Use git-shell as the login shell for the git user to prevent arbitrary command execution.
  • Pair your repository server with a valid SSL certificate if you expose any web interface (Gitea, GitLab, cgit).

History rewriting

  • Never rewrite history on branches shared with others without a coordinated migration plan.
  • After git filter-repo, all collaborators must re-clone. Update CI/CD pipelines immediately so they fetch the rewritten history instead of reusing cached clones.

Disaster recovery

  • Enable and extend reflogs on production servers (core.logAllRefUpdates = true, gc.reflogExpire = 180.days.ago). Bare repositories keep no reflog by default.
  • Keep a mirror clone (git clone --mirror) on a separate host as a backup. A periodic git fetch --prune inside the mirror keeps every branch and tag in sync with the primary.
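You can sketch the mirror-based backup locally, with both repositories in a temporary directory standing in for two hosts. The sketch assumes Git and bash. On real infrastructure, the fetch would run over SSH, for example from cron.

```shell
#!/usr/bin/env bash
# Keep a backup mirror of a bare repository in sync with `git fetch --prune`.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Primary server repository with one branch and one tag
git init -q --bare primary.git
git -C primary.git symbolic-ref HEAD refs/heads/main
git init -q dev && cd dev
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$tmp/primary.git"
git commit -q --allow-empty -m "First"
git tag v1.0
git push -q origin main v1.0

# One-time setup on the backup host: a mirror clone copies every ref
git clone -q --mirror "$tmp/primary.git" "$tmp/backup.git"

# Later, new work lands on the primary and a ref is removed
git commit -q --allow-empty -m "Second"
git push -q origin main
git push -q origin --delete refs/tags/v1.0

# Periodic sync on the backup host; --prune drops refs deleted upstream
git -C "$tmp/backup.git" fetch -q --prune origin
git -C "$tmp/backup.git" for-each-ref --format='%(refname)'
```

Because a mirror clone fetches with the refspec +refs/*:refs/*, tags fall under --prune as well. That is why the deleted v1.0 disappears from the backup.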

FAQ

What is the difference between a bare and a non-bare Git repository?

A non-bare repository has a working directory where files are checked out, plus a .git/ subdirectory containing the object store. A bare repository contains only the object store at its root (no working directory) and is the correct format for a shared server that receives pushes.

Can I recover commits after running git reset --hard?

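Yes, as long as the commits have not been garbage collected. Run git reflog to find the SHA of the commit you want to recover, then git checkout -b recovery-branch <SHA> to attach it to a new branch. Reflog entries are kept for 90 days by default. However, entries for commits that are no longer reachable from the branch, which is exactly the situation after a hard reset, expire after 30 days (gc.reflogExpireUnreachable).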

Why does git push not transfer my tags?

By design, git push only transfers commits reachable from the refs you explicitly push. Tags are separate refs and must be pushed with git push origin --tags (all tags) or git push origin <tagname> (a specific tag).

What happens to the index during a merge conflict?

The index stores all three versions of each conflicted file simultaneously: stage 1 (common ancestor/base), stage 2 (your version), and stage 3 (their version). Once you have fixed a file, running git add on it replaces those entries with a single stage 0 entry, which marks it resolved. Until all conflicts are resolved and staged, git commit will refuse to proceed.

How do Git hooks differ between client-side and server-side deployments?

Client-side hooks run on the developer's machine and are not enforced centrally: any developer can bypass them with git commit --no-verify or by deleting the hook file. Server-side hooks (pre-receive, update, post-receive) run on the hosting server and cannot be bypassed by the client. That makes them the correct enforcement point for branch protection policies, code review requirements, and CI/CD triggers.
