Git Repository Structure: A Complete Technical Guide
Git is a distributed version control system that stores project history as a directed acyclic graph (DAG) of immutable snapshot objects. Every Git repository is built from three logical zones (the working directory, the staging index, and the object store inside .git/) plus a set of lightweight pointers (branches, tags, remotes) that navigate that history. Understanding how these layers interact is the difference between using Git mechanically and using it with surgical precision.
If you self-host your repositories on a VPS, mastering this internal structure lets you recover from disasters, design efficient CI/CD pipelines, and audit every byte of your project's history without relying on a third-party platform.
The Three-Zone Model: How Git Moves Data
Before diving into individual components, internalize the data-flow model that governs every Git operation:
Working Directory --> Staging Area (Index) --> .git/ Object Store
     (edit)               (git add)               (git commit)
Changes travel left-to-right when you build a commit, and right-to-left when you restore or reset. Every Git command is essentially a read or write operation on one or more of these zones.
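The left-to-right flow can be watched with git status --porcelain. This is a minimal sketch, assuming a throwaway repository created with mktemp; the file name notes.txt and the identity values are placeholders, not part of any real project.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"            # placeholder identity
git config user.email "dev@example.invalid"

echo "hello" > notes.txt                      # exists only in the working directory
state_untracked=$(git status --porcelain)

git add notes.txt                             # copy the content into the index
state_staged=$(git status --porcelain)

git commit -q -m "Add notes"                  # record the index as a commit
state_clean=$(git status --porcelain)         # empty once all three zones agree

printf '%s\n' "$state_untracked" "$state_staged" "[$state_clean]"
```

The first two lines of output carry the status codes ?? (untracked) and A (added to the index); the final pair of brackets is empty because nothing differs after the commit.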
Working Directory
The working directory (also called the working tree) is the filesystem view of your project at a specific checkout state. When you run git clone or git checkout, Git reconstructs files from compressed objects in .git/objects/ and writes them to this directory.
Files in the working directory exist in one of four states:
- Untracked: Git has never seen this file; it exists only on disk.
- Tracked, unmodified: the file matches the last committed snapshot exactly.
- Tracked, modified: the file differs from the last committed snapshot but has not been staged.
- Tracked, deleted: the file was removed from disk but the deletion has not been staged.
A critical nuance that trips up many developers: the working directory is not a simple copy of the repository. Git reconstructs it by reading tree objects and decompressing blob objects. If .git/ is intact, you can always regenerate the working directory from scratch; the reverse is not true.
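To make the one-way relationship concrete, the sketch below deletes every tracked file and asks Git to rebuild them. It assumes a temporary repository; the paths src/main.py and README are illustrative only.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"            # placeholder identity
git config user.email "dev@example.invalid"

mkdir src
echo "print('hi')" > src/main.py
echo "docs" > README
git add . && git commit -q -m "Initial snapshot"

rm -rf src README            # wipe the working directory, leave .git/ alone
git checkout -q -- .         # rewrite every tracked file from the index and object store

restored=$(cat src/main.py)
echo "$restored"
```

Only committed or staged content comes back this way; edits that were never added to the index are not stored anywhere in .git/.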
Sparse Checkout for Large Monorepos
On repositories with tens of thousands of files (common in monorepo architectures), you can limit which paths Git materializes in the working directory:
git sparse-checkout init --cone
git sparse-checkout set services/api services/auth
This is invaluable on a VPS with constrained disk I/O, because Git skips decompressing blobs for paths outside the cone.
Staging Area (Index)
The staging area, internally called the index, is a binary file located at .git/index. It acts as a proposed next commit: a mutable snapshot that sits between your working directory and the permanent object store.
git add <file> # Stage a specific file
git add -p # Interactively stage hunks within a file
git add -u # Stage all tracked modifications and deletions
git status # Compare working directory and index against HEAD
git diff --cached # Show diff between index and HEAD
Why the Index Exists
The index solves a problem that simpler VCS tools ignore: partial commits. You may have modified five files but only want three of them in the next commit. The index lets you compose exactly the snapshot you intend to record, independent of what your editor has open.
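A partial commit can be sketched as follows, assuming a scratch repository with three hypothetical files (a.txt, b.txt, c.txt). All three are edited, but only two are staged and recorded.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"            # placeholder identity
git config user.email "dev@example.invalid"

for f in a.txt b.txt c.txt; do echo "v1" > "$f"; done
git add . && git commit -q -m "Baseline"

for f in a.txt b.txt c.txt; do echo "version two" > "$f"; done   # edit all three
git add a.txt b.txt                                               # stage only two
git commit -q -m "Update a and b"

committed=$(git diff-tree --no-commit-id --name-only -r HEAD)    # paths in the commit
pending=$(git status --porcelain)                                 # still uncommitted
echo "$committed"
echo "$pending"
```

The commit lists a.txt and b.txt, while c.txt remains modified in the working directory and is untouched by the commit.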
Edge case (index corruption): If a system crash interrupts a git add, the index file can become corrupt. Symptoms include git status hanging or reporting bizarre output. Recovery:
rm .git/index
git reset
Git rebuilds the index from HEAD without touching your working directory.
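The recovery can be rehearsed safely in a scratch repository. The sketch below removes a healthy index (standing in for a corrupt one) and checks that an uncommitted edit survives; app.conf is a made-up file name.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"            # placeholder identity
git config user.email "dev@example.invalid"

echo "v1" > app.conf
git add app.conf && git commit -q -m "Add config"
echo "edited but not committed" > app.conf    # work we do not want to lose

rm .git/index                 # stand-in for discarding a corrupt index
git reset -q                  # mixed reset: rebuild the index from HEAD

content=$(cat app.conf)
after=$(git status --porcelain)
echo "$content"
echo "$after"
```

Anything that had been staged but not committed is lost along with the old index, so this is a last resort rather than a routine command.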
The Index as a Merge Conflict Register
During a merge conflict, the index stores three versions of each conflicted file simultaneously (stages 1, 2, and 3: base, ours, theirs). This is why git diff --cached shows nothing useful mid-conflict; you need git diff --cc or a merge tool to inspect all three stages.
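The three stages can be listed with git ls-files -u and read individually with the :N:path syntax. A minimal sketch, assuming a scratch repository in which two hypothetical branches edit the same line of file.txt:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main         # fix the branch name before the first commit
git config user.name "Example Dev"            # placeholder identity
git config user.email "dev@example.invalid"

echo "base" > file.txt
git add file.txt && git commit -q -m "Base"
git checkout -q -b feature
echo "theirs version" > file.txt && git commit -q -am "Feature change"
git checkout -q main
echo "ours version" > file.txt && git commit -q -am "Main change"

git merge feature >/dev/null 2>&1 || true     # expected to stop on a conflict

stages=$(git ls-files -u | awk '{print $3}' | xargs)   # stage numbers in the index
base=$(git show :1:file.txt)
ours=$(git show :2:file.txt)
theirs=$(git show :3:file.txt)
echo "stages: $stages"
```

Running git add file.txt after resolving the conflict collapses the three entries into a single stage 0 entry.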
The .git/ Directory: Anatomy of the Object Store
The .git/ directory is the repository. Everything else (the working directory, remote clones) is derived from it. Deleting .git/ turns a repository into a plain directory with no history.
.git/
├── HEAD
├── config
├── description
├── index
├── COMMIT_EDITMSG
├── hooks/
├── info/
├── logs/
│   ├── HEAD
│   └── refs/
├── objects/
│   ├── info/
│   └── pack/
└── refs/
    ├── heads/
    ├── remotes/
    └── tags/
HEAD
HEAD is a plain text file containing either a symbolic ref (pointing to a branch) or a raw SHA-1 hash (detached HEAD state).
cat .git/HEAD
# ref: refs/heads/main <-- on a branch
# a3f1c9d... <-- detached HEAD
Detached HEAD is not an error state; it is intentional when you check out a tag or a specific commit for inspection. The danger is making commits in detached HEAD: those commits are reachable only via reflog until you attach them to a branch.
git checkout -b rescue-branch # Attach detached commits to a new branch
config
The local repository configuration file. It overrides global (~/.gitconfig) and system (/etc/gitconfig) settings. Common entries:
[core]
repositoryformatversion = 0
filemode = true
bare = false
[remote "origin"]
url = git@github.com:user/repo.git
fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
remote = origin
merge = refs/heads/main
On a self-hosted server, you will frequently edit this file directly when rotating remote URLs or configuring uploadpack.allowFilter (and, where clients fetch objects by hash, uploadpack.allowReachableSHA1InWant) for partial clones.
refs/
The refs/ directory contains plain text files, each holding a single SHA-1 hash. They are the named pointers that make Git's DAG navigable.
| Ref Type | Path | Description |
|---|---|---|
| Local branch | refs/heads/<name> | Points to the tip commit of a branch |
| Remote-tracking branch | refs/remotes/<remote>/<name> | Local cache of a remote branch's tip |
| Lightweight tag | refs/tags/<name> | Points directly to a commit object |
| Annotated tag | refs/tags/<name> | Points to a tag object, which points to a commit |
| Stash | refs/stash | Points to the stash commit |
For performance, Git packs refs into .git/packed-refs once a repository accumulates many of them. Always check both locations when scripting against refs.
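Rather than reading both locations by hand, scripts can ask Git to resolve refs. The sketch below assumes a scratch repository using the default files ref backend and a hypothetical branch named topic; it shows a loose ref file disappearing after git pack-refs while git rev-parse and git for-each-ref keep working.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"            # placeholder identity
git config user.email "dev@example.invalid"

echo "x" > f && git add f && git commit -q -m "Init"
git branch topic

loose_before=$( [ -f .git/refs/heads/topic ] && echo yes || echo no )
git pack-refs --all                           # move loose refs into .git/packed-refs
loose_after=$( [ -f .git/refs/heads/topic ] && echo yes || echo no )

# Location-independent lookups: these work for loose and packed refs alike.
tip=$(git rev-parse refs/heads/topic)
listed=$(git for-each-ref --format='%(refname)' refs/heads/ | xargs)
echo "loose before: $loose_before, after: $loose_after"
echo "$listed"
```

Using the plumbing commands also keeps a script working if the repository is later switched to a different ref storage format.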
Git Objects: The Immutable Core
Everything stored in .git/objects/ is content-addressed: the filename is the SHA-1 (or SHA-256 in newer Git versions) hash of the object's content. This makes Git inherently tamper-evident: changing any byte changes the hash, breaking the chain.
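For a blob, the hashed content is a short header ("blob", a space, the size in bytes, a NUL byte) followed by the raw file bytes. The sketch below recomputes a blob ID by hand and compares it with git hash-object. It assumes the SHA-1 object format and that the sha1sum utility is installed (on some systems the equivalent is shasum); the sample string is arbitrary.

```shell
set -e
cd "$(mktemp -d)"                             # run outside any existing repository
content='hello git'
size=$(printf '%s' "$content" | wc -c | tr -d ' ')

# Git's own answer for this content.
git_id=$(printf '%s' "$content" | git hash-object --stdin)

# The same hash by hand: SHA-1 over "blob <size>\0" followed by the raw bytes.
manual_id=$({ printf 'blob %s\0' "$size"; printf '%s' "$content"; } | sha1sum | cut -d' ' -f1)

echo "$git_id"
echo "$manual_id"
```

Because the file name is not part of the hashed data, two files with identical content anywhere in the history share one blob.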
The Four Object Types
| Object Type | What It Stores | Points To |
|---|---|---|
| Blob | Raw file content (no filename, no permissions) | Nothing |
| Tree | Directory listing: filenames, permissions, blob/tree SHAs | Blobs and other trees |
| Commit | Author, committer, timestamp, message, parent SHA(s) | One tree + zero or more parent commits |
| Tag | Tagger identity, timestamp, message, GPG signature | Usually a commit |
Inspecting Objects Directly
# Show the type of any object
git cat-file -t a3f1c9d
# Show the content of any object
git cat-file -p a3f1c9d
# Show the tree of the current HEAD commit
git ls-tree HEAD
# Show a specific blob's content
git show HEAD:src/main.py
Loose Objects vs. Pack Files
Initially, each object is stored as an individual compressed file under .git/objects/<2-char-prefix>/<38-char-suffix>. These are loose objects. Over time, Git runs git gc (garbage collection) to bundle loose objects into pack files (.git/objects/pack/*.pack), each with a corresponding index file (*.idx).
Pack files use delta compression β storing the difference between similar objects rather than full copies. A repository with thousands of similar text files can shrink dramatically after packing. On a VPS with limited NVMe capacity, running git gc --aggressive on large repositories before archiving is standard practice.
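The transition from loose objects to a pack can be observed with git count-objects. A minimal sketch, assuming a scratch repository holding a single commit (one blob, one tree, one commit object):

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"            # placeholder identity
git config user.email "dev@example.invalid"

echo "data" > f.txt
git add f.txt && git commit -q -m "Init"

loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
git gc --quiet                                # pack reachable objects, drop the loose copies
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

echo "loose: $loose_before -> $loose_after, packed: $packed_after"
```

On a repository this small the pack saves nothing; the benefit appears once many similar objects can be stored as deltas.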
git count-objects -vH # Show loose object count and disk usage
git gc --aggressive # Repack aggressively (CPU-intensive)
# Find the 20 largest objects in the pack
git verify-pack -v .git/objects/pack/*.idx | sort -k3 -n | tail -20
Commit History: The Directed Acyclic Graph
Each commit object contains exactly one pointer to a tree object (the root directory snapshot) and zero or more pointers to parent commits. This forms a DAG where:
- Zero parents = the initial commit (root commit)
- One parent = a normal commit
- Two parents = a merge commit
- Three or more parents = an octopus merge (rare, used for integrating many feature branches simultaneously)
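The parent counts listed above can be checked directly. This sketch assumes a scratch repository with a hypothetical feature branch that is merged back into main; count_parents is a helper defined here for illustration, not a Git command.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"            # placeholder identity
git config user.email "dev@example.invalid"

echo 1 > a && git add a && git commit -q -m "Root"
git checkout -q -b feature
echo 2 > b && git add b && git commit -q -m "Feature work"
git checkout -q main
echo 3 > c && git add c && git commit -q -m "Main work"
git merge -q --no-edit feature >/dev/null     # histories diverged, so a merge commit is created

# rev-list --parents prints "<commit> <parent>..."; fields minus one = parent count.
count_parents() { git rev-list --parents -n 1 "$1" | awk '{print NF - 1}'; }

root_parents=$(count_parents "$(git rev-list --max-parents=0 HEAD)")
normal_parents=$(count_parents HEAD^1)
merge_parents=$(count_parents HEAD)
echo "root=$root_parents normal=$normal_parents merge=$merge_parents"
```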
git log --oneline --graph --all # Visualize the full DAG
git log --format="%H %P" # Show each commit's SHA and parent SHA(s)
Commit Immutability and Rewriting History
Because a commit's SHA is derived from its content (including parent SHAs), any rewrite creates a new commit with a new SHA. Operations like git rebase, git commit --amend, and git filter-repo do not modify history; they create parallel history. The old commits remain in the object store until garbage collected.
This is why force-pushing rewritten history to a shared branch is destructive: collaborators' local branches still point to the old commit chain.
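A small experiment illustrates the point, assuming a scratch repository: amending a commit message yields a different SHA, and the original commit object is still readable afterwards.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"            # placeholder identity
git config user.email "dev@example.invalid"

echo "draft" > f && git add f
git commit -q -m "Frist commit"               # typo on purpose
old=$(git rev-parse HEAD)

git commit -q --amend -m "First commit"       # "fix" the message
new=$(git rev-parse HEAD)

old_type=$(git cat-file -t "$old")            # the old object has not been deleted
echo "old=$old"
echo "new=$new"
echo "old object type: $old_type"
```

Anyone who had already fetched the old commit still holds it, which is why the branch histories diverge after a force-push.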
Branches: Lightweight Pointers
A branch is nothing more than a 41-byte file containing a SHA-1 hash. Creating a branch is instantaneous regardless of repository size because Git only writes one small file.
git branch feature/auth # Create branch at current HEAD
git checkout -b feature/auth # Create and switch in one step
git switch -c feature/auth # Modern equivalent (Git 2.23+)
git branch -d feature/auth # Delete (safe: refuses if unmerged)
git branch -D feature/auth # Delete (force: regardless of merge status)
Branch Internals
cat .git/refs/heads/main
# a3f1c9d8e2b1f4c7d9e0a1b2c3d4e5f6a7b8c9d0
When you commit on a branch, Git writes the new commit SHA to this file. That is the entirety of "advancing a branch pointer."
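The sketch below watches the ref file change across a commit. It assumes a scratch repository with the SHA-1 object format and loose (unpacked) refs stored as files; with other formats the file size or location differs.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"            # placeholder identity
git config user.email "dev@example.invalid"

echo "one" > f && git add f && git commit -q -m "One"
before=$(cat .git/refs/heads/main)
bytes=$(wc -c < .git/refs/heads/main | tr -d ' ')   # 40 hex characters plus a newline

echo "second version" > f && git commit -q -am "Two"
after=$(cat .git/refs/heads/main)
head_now=$(git rev-parse HEAD)

echo "ref file size: $bytes bytes"
echo "before: $before"
echo "after:  $after"
```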
Tracking Branches and Upstream Configuration
A tracking relationship tells Git which remote branch a local branch should compare against for git status divergence reporting and git pull behavior.
git branch --set-upstream-to=origin/main main
git branch -vv # Show tracking relationships and ahead/behind counts
Tags: Permanent Markers in History
Tags mark specific commits as significant, typically software releases. Unlike branches, tags are not moved by new commits.
| Feature | Lightweight Tag | Annotated Tag |
|---|---|---|
| Storage | A ref file pointing to a commit | A tag object in the object store |
| Metadata | None | Tagger name, email, date, message |
| GPG signing | Not possible | Supported via git tag -s |
| Recommended for releases | No | Yes |
| Transfer with git push --tags | Yes | Yes |
git tag v2.1.0 # Lightweight tag at HEAD
git tag -a v2.1.0 -m "Release 2.1.0" # Annotated tag
git tag -s v2.1.0 -m "Signed release" # GPG-signed annotated tag
git push origin --tags # Push all tags to remote
git push origin v2.1.0 # Push a specific tag
Critical pitfall: git push does not push tags by default. Teams frequently forget this and publish release notes referencing a tag that does not exist on the remote.
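Both the storage difference between the two tag kinds and the push pitfall can be reproduced locally. This is a sketch under stated assumptions: the "remote" is a bare repository in a temporary directory, and the tag names v1.0 and v1.0-light are invented for the example.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/remote.git"         # stand-in for the hosting server
git init -q "$work/local" && cd "$work/local"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"            # placeholder identity
git config user.email "dev@example.invalid"
echo 1 > f && git add f && git commit -q -m "Release prep"
git remote add origin "$work/remote.git"

git tag v1.0-light                            # lightweight: just a ref to the commit
git tag -a v1.0 -m "Release 1.0"              # annotated: a separate tag object
light_type=$(git cat-file -t v1.0-light)
annotated_type=$(git cat-file -t v1.0)

git push -q origin main                       # pushes the branch only
tags_before=$(git ls-remote --tags origin | wc -l | tr -d ' ')
git push -q origin v1.0                       # push one tag explicitly
tags_after=$(git ls-remote --tags origin | grep -c 'refs/tags/v1.0$' || true)

echo "types: $light_type / $annotated_type"
echo "remote tags before: $tags_before, v1.0 present after: $tags_after"
```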
Remotes: Distributed Collaboration
A remote is a named URL stored in .git/config. Remote-tracking branches (under refs/remotes/) are local read-only snapshots of the remote's branches, updated only when you explicitly fetch.
git remote add origin git@github.com:user/repo.git
git remote -v # List remotes with URLs
git remote set-url origin <new-url> # Change a remote URL
git fetch origin # Update remote-tracking branches
git fetch --prune # Remove stale remote-tracking branches
git push origin main # Push local main to remote
git push -u origin feature/auth # Push and set upstream tracking
Multiple Remotes
A single repository can track multiple remotes, which is common when maintaining a fork alongside the upstream:
git remote add upstream git@github.com:original/repo.git
git fetch upstream
git merge upstream/main
When self-hosting bare repositories on a dedicated server for your team, each developer adds the server as a remote and uses SSH key authentication for push access.
Hooks: Automated Enforcement at Every Git Event
Hooks are executable scripts in .git/hooks/. Git calls them at defined points in the workflow. They are not transferred by git clone or git push: each developer (or server) must install them independently. This is a frequent source of confusion in team environments.
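The fact that hooks stay behind on clone is easy to confirm. A minimal sketch, assuming two temporary directories and a do-nothing pre-commit hook written only for this test:

```shell
set -e
work=$(mktemp -d)
git init -q "$work/origin" && cd "$work/origin"
git config user.name "Example Dev"            # placeholder identity
git config user.email "dev@example.invalid"
echo 1 > f && git add f && git commit -q -m "Init"

# Install a trivial hook in the original repository.
mkdir -p .git/hooks
printf '#!/bin/sh\nexit 0\n' > .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit

git clone -q "$work/origin" "$work/clone"

in_origin=$( [ -x "$work/origin/.git/hooks/pre-commit" ] && echo yes || echo no )
in_clone=$( [ -e "$work/clone/.git/hooks/pre-commit" ] && echo yes || echo no )
echo "hook in origin: $in_origin, hook in clone: $in_clone"
```

The clone may contain *.sample files copied from the local Git template directory, but none of the hooks that were active in the source repository.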
Client-Side Hooks
| Hook | Trigger | Common Use |
|---|---|---|
| pre-commit | Before commit message prompt | Linting, secret scanning, test execution |
| prepare-commit-msg | After default message created | Inject branch name into message |
| commit-msg | After user writes message | Enforce conventional commit format |
| post-commit | After commit is recorded | Local notifications |
| pre-push | Before git push executes | Run full test suite |
| pre-rebase | Before rebase starts | Prevent rebasing published branches |
Server-Side Hooks
| Hook | Trigger | Common Use |
|---|---|---|
| pre-receive | Before refs are updated | Enforce branch protection, reject force-push |
| update | Per-ref during receive | Per-branch policy enforcement |
| post-receive | After all refs updated | Trigger CI/CD, send notifications |
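As one possible shape for the "reject force-push" use case in the table, the sketch below installs a pre-receive hook in a temporary bare repository and then attempts a forced, non-fast-forward push. The policy, paths, and commit contents are assumptions made for illustration; a production hook would usually scope the rule to specific branches.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/server.git"
mkdir -p "$work/server.git/hooks"

# Hypothetical policy: refuse any update that is not a fast-forward.
cat > "$work/server.git/hooks/pre-receive" <<'EOF'
#!/bin/sh
# stdin carries one "<old-sha> <new-sha> <refname>" line per pushed ref.
while read -r old new ref; do
    case "$old" in *[!0]*) ;; *) continue ;; esac   # all zeros: ref is being created
    case "$new" in *[!0]*) ;; *) continue ;; esac   # all zeros: ref is being deleted
    if ! git merge-base --is-ancestor "$old" "$new"; then
        echo "Rejected non-fast-forward push to $ref" >&2
        exit 1
    fi
done
exit 0
EOF
chmod +x "$work/server.git/hooks/pre-receive"

git init -q "$work/dev" && cd "$work/dev"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"            # placeholder identity
git config user.email "dev@example.invalid"
echo "one" > f && git add f && git commit -q -m "One"
echo "two, longer" > f && git commit -q -am "Two"
accepted_tip=$(git rev-parse HEAD)
git remote add origin "$work/server.git"
git push -q origin main                       # ref creation: accepted

git reset -q --hard HEAD~1                    # rewrite local history
echo "rewritten" > f && git commit -q -am "Rewritten"
if git push -q --force origin main 2>/dev/null; then
    force_push=accepted
else
    force_push=rejected
fi
server_tip=$(git --git-dir="$work/server.git" rev-parse refs/heads/main)
echo "force push: $force_push"
```

Git also offers the receive.denyNonFastForwards setting for this exact rule; a hook is the more flexible option when the policy needs exceptions.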
Example: Pre-commit Hook for Secret Detection
#!/usr/bin/env bash
# .git/hooks/pre-commit
if git diff --cached --name-only -z --diff-filter=ACM | xargs -0 grep -lE '(AKIA|password[[:space:]]*=|api_key[[:space:]]*=)' 2>/dev/null; then
echo "ERROR: Potential secret detected in staged files. Commit aborted."
exit 1
fi
exit 0
Make it executable:
chmod +x .git/hooks/pre-commit
For team-wide hook distribution, use a tool like Husky (Node.js projects) or store hooks in a hooks/ directory at the repository root and symlink them during project setup.
Reflog: The Safety Net
The reflog records every movement of HEAD and branch pointers, including operations that appear to destroy history (hard resets, rebases, amended commits). It is stored in .git/logs/.
git reflog # Show HEAD movement history
git reflog show main # Show movement history for a specific branch
git checkout HEAD@{3} # Check out the state HEAD was in 3 moves ago
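git branch recovered HEAD@{5} # Recover commits by branching from a reflog entry
Reflog entries expire after 90 days by default (gc.reflogExpire); entries for commits no longer reachable from the current tip expire after 30 days (gc.reflogExpireUnreachable). On a production server, consider extending this:
git config gc.reflogExpire "180 days"
git config gc.reflogExpireUnreachable "30 days" # The default; raise it to keep rewritten commits longer
Bare Repositories: Server-Side Hosting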
A bare repository has no working directory. It contains only the contents of .git/ at the root level. Bare repositories are the correct format for centralized hosting β they accept pushes without the complications of a checked-out branch.
git init --bare /srv/repos/myproject.git
When you push to GitHub, GitLab, or a self-hosted Git server, you are pushing to a bare repository. If you host your own Git server on a VPS with cPanel or a raw Linux VPS, bare repositories under /srv/repos/ with SSH access are the standard architecture.
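The layout difference is visible immediately after initialization. The sketch below uses a temporary directory instead of /srv/repos/ so it can run without root privileges; the repository name is a placeholder.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/myproject.git"

is_bare=$(git --git-dir="$work/myproject.git" rev-parse --is-bare-repository)
has_dotgit=$( [ -d "$work/myproject.git/.git" ] && echo yes || echo no )
has_objects=$( [ -d "$work/myproject.git/objects" ] && echo yes || echo no )
has_head=$( [ -f "$work/myproject.git/HEAD" ] && echo yes || echo no )

echo "bare: $is_bare"
echo ".git/ subdirectory: $has_dotgit, objects/ at root: $has_objects, HEAD at root: $has_head"
```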
Initializing a Shared Bare Repository
# On the server
git init --bare --shared=group /srv/repos/project.git
chown -R git:developers /srv/repos/project.git
# On a developer's machine
git remote add origin git@yourserver.com:/srv/repos/project.git
git push -u origin main
Git Object Storage: Size, Integrity, and Maintenance
Checking Repository Health
git fsck --full # Verify object integrity (finds dangling and corrupt objects)
git fsck --lost-found # Write dangling objects to .git/lost-found/
Finding and Removing Large Objects
Large binary files accidentally committed are a common cause of bloated repositories. Identify them before using git filter-repo to excise them:
# Find the 10 largest objects by uncompressed size (column 3; sort on -k4 for compressed size in the pack)
git verify-pack -v .git/objects/pack/*.idx \
  | grep -E '^[0-9a-f]{40,} ' \
  | sort -k3 -rn \
  | head -10
# Map an object ID from that list to its path ($sha is a placeholder for one of the IDs)
git rev-list --objects --all | grep "$sha"
# Remove a file from all history (requires git-filter-repo)
git filter-repo --path path/to/large-file.bin --invert-paths
After filtering, all collaborators must re-clone: their local repositories reference SHA hashes that no longer exist in the rewritten history.
Comparison: Key Git Repository Concepts
| Concept | Type | Mutable | Stored In | Transferred by Push/Fetch |
|---|---|---|---|---|
| Blob | Object | No | .git/objects/ | Yes (when reachable) |
| Tree | Object | No | .git/objects/ | Yes (when reachable) |
| Commit | Object | No | .git/objects/ | Yes (when reachable) |
| Annotated Tag | Object | No | .git/objects/ | With --tags or when pushed by name |
| Branch | Ref | Yes | .git/refs/heads/ | Yes |
| Remote-tracking branch | Ref | Yes (on fetch) | .git/refs/remotes/ | No (local cache) |
| Lightweight Tag | Ref | No | .git/refs/tags/ | With --tags or when pushed by name |
| HEAD | Symref/hash | Yes | .git/HEAD | No |
| Index | Binary file | Yes | .git/index | No |
| Hooks | Scripts | Yes | .git/hooks/ | No |
| Reflog | Log | Yes (auto-expires) | .git/logs/ | No |
Practical Decision Matrix and Key Takeaways
Use this checklist when setting up or auditing a Git repository on your infrastructure:
Repository initialization
- Use git init --bare --shared=group for any repository that will receive pushes from multiple users.
- Store bare repositories outside web-accessible directories (never under /var/www/).
Object store health
- Run git fsck --full after any storage incident or filesystem error.
- Schedule git gc periodically on long-lived repositories; automate it via cron on your server.
- Monitor pack file size with git count-objects -vH; investigate if loose object count exceeds 1,000.
Branch and ref hygiene
- Delete merged branches promptly; stale refs accumulate and slow down git fetch --prune operations.
- Use git fetch --prune in CI pipelines to avoid acting on deleted remote branches.
Hook deployment
- Never rely on .git/hooks/ for team-wide policy: hooks are not cloned. Use server-side pre-receive hooks or a CI gate instead.
- Audit server-side hooks after every Git server upgrade; hook interpreter paths can change.
Security on self-hosted servers
- Restrict SSH access to the git user with forced commands (command= in authorized_keys).
- Use git-shell as the login shell for the git user to prevent arbitrary command execution.
- Pair your repository server with a valid SSL certificate if you expose any web interface (Gitea, GitLab, cgit).
History rewriting
- Never rewrite history on branches shared with others without a coordinated migration plan.
- After git filter-repo, all collaborators must re-clone; update CI/CD remote URLs immediately.
Disaster recovery
- Extend reflog expiry on production servers (gc.reflogExpire = 180 days).
- Keep a secondary bare mirror (git clone --mirror) on a separate host as a backup; a periodic git fetch from the primary keeps it current.
FAQ
What is the difference between a bare and a non-bare Git repository?
A non-bare repository has a working directory where files are checked out, plus a .git/ subdirectory containing the object store. A bare repository contains only the object store at its root (no working directory) and is the correct format for a shared server that receives pushes.
Can I recover commits after running git reset --hard?
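Yes, as long as the commits have not been garbage collected. Run git reflog to find the SHA of the commit you want to recover, then git checkout -b recovery-branch <SHA> to attach it to a new branch. Reflog entries are retained for 90 days by default, but entries for commits that are no longer reachable from the branch tip (the usual case after a hard reset) expire after 30 days.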
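The recovery can be rehearsed end to end in a scratch repository. In this sketch the "lost" commit is the one HEAD pointed at immediately before the reset, so it is addressed as HEAD@{1}; in a real incident, read git reflog first to pick the right entry.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"            # placeholder identity
git config user.email "dev@example.invalid"

echo "one" > f && git add f && git commit -q -m "One"
echo "two, longer" > f && git commit -q -am "Two"
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD~1                    # "Two" is no longer reachable from main
found=$(git rev-parse 'HEAD@{1}')             # where HEAD was before the reset
git checkout -q -b recovery-branch "$found"
recovered_msg=$(git log -1 --format=%s)

echo "recovered commit: $recovered_msg"
```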
Why does git push not transfer my tags?
By design, git push only transfers commits reachable from the refs you explicitly push. Tags are separate refs and must be pushed with git push origin --tags (all tags) or git push origin <tagname> (a specific tag).
What happens to the index during a merge conflict?
The index stores all three versions of each conflicted file simultaneously: stage 1 (common ancestor/base), stage 2 (your version), and stage 3 (their version). Normal git add only writes stage 0 (resolved). Until all conflicts are resolved and staged, git commit will refuse to proceed.
How do Git hooks differ between client-side and server-side deployments?
Client-side hooks run on the developer's machine and are not enforced centrally: any developer can bypass them by deleting the hook file or, for the commit and push hooks, by passing --no-verify. Server-side hooks (pre-receive, update, post-receive) run on the hosting server and cannot be bypassed by the client, making them the correct enforcement point for branch protection policies, code review requirements, and CI/CD triggers.