
Voyage into Git’s Depths

In the realm of code, a story unfolds, A tale of creation, where wisdom beholds, A system of magic, of power and grace, The legend of Git, in cyberspace.

A whisper of branches, where ideas take flight, Merging their forces, from darkness to light, In the hands of wizards, their keyboards they wield, Harnessing power, Git’s secrets unsealed.

A dance of commits, where changes entwine, In a river of time, an unbroken line, Each snapshot preserved, like a treasure chest, A history engraved, as code manifests.

Forks diverge paths, where new tales arise, Collaboration soars, as minds harmonize, A pull request beckons, a merge awaits, In Git’s hallowed halls, where creation dictates.

Conflicts resolved, as wisdom prevails, A symphony of code, where harmony hails, Cherry-picking moments, from branches afar, Like stars in the night, each commit a memoir.

A push and a pull, in the dance of the Git, A rhythm eternal, as worlds interknit, In this digital realm, a universe thrives, The Git Version Control, where code comes alive.

-gpt4

The Gist of Git

Git enables a group of collaborators to maintain a body of work using a peer-to-peer network of repositories. At its core is a directed acyclic graph (DAG for short) of commits, and most Git operations add to or rearrange that graph. The edges are directed because each commit points to its parent, the commit it was built on top of. A merge commit has two or more parents, and the first commit in a repository has none. There are no cycles because a commit is identified by a hash of its contents, including the hashes of its parents, so a commit can only point at commits that already existed when it was created. Following parent edges from any commit therefore always leads back through its history and never returns to the starting point.
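To make the graph idea concrete, here is a minimal sketch of a commit graph as a plain dictionary. The commit names (A, B, C, D) and the ancestors helper are hypothetical and chosen for illustration; real Git identifies commits by hash. Commit D is a merge commit, so it has two parents.

```python
# Toy commit graph: each commit maps to the list of its parents.
# The names A..D are hypothetical; real Git uses hashes instead.
parents = {
    "A": [],          # root commit, no parent
    "B": ["A"],       # built on top of A
    "C": ["A"],       # a second line of work that also starts at A
    "D": ["B", "C"],  # merge commit with two parents
}


def ancestors(commit, graph):
    """Return every commit reachable from `commit` by following parent edges."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        for parent in graph[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


history_of_d = ancestors("D", parents)
print(sorted(history_of_d))
```

Because edges only ever point from a commit to older commits, the walk always terminates, which is the practical consequence of the graph being acyclic.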

Origins of Git

Before Git was created, the Linux kernel was maintained with tarballs and patches, which led the Linux community to search for a VCS that would meet all of its needs. The main options at the time included BitKeeper and CVS. BitKeeper was a proprietary VCS that was free to use for open source projects. CVS was a free and open source VCS. In 2002 the kernel project adopted BitKeeper, which was faster than CVS. However, BitKeeper did not stay free to use for open source projects. In 2005, BitMover, the company behind BitKeeper, withdrew the free-of-charge licenses used by Linux kernel developers. This led Linus Torvalds to create Git, a distributed VCS designed to be fast, free to use, and a good fit for the way the Linux community works. Torvalds had three design goals for Git.

These goals were:

  1. High performance
  2. Strong support for non-linear development (thousands of parallel branches)
  3. Safeguards against content corruption (accidental and malicious)

Benefits of Git

Despite the added complexity that often comes with a distributed version control system like Git, its benefits include:

  1. The ability to work and view the history of a project offline
  2. The ability to work with one or more remote repositories
  3. Work can be published to multiple repositories
  4. Incremental commits

Git Toolkit Philosophy

Git was created by Linus Torvalds and originally intended for use by the Linux community, so it has a toolkit philosophy in line with Unix tradition. It consists of both low-level and high-level commands. The low-level commands are the plumbing commands and the high-level commands are the porcelain commands. The plumbing commands work directly on the directed acyclic graph and the stored content, while the porcelain commands build on them to provide everyday workflows, communication between repositories, and general repository maintenance.

Git Object Primitives

Git is a content-addressable filesystem. This means that Git stores content in a way that allows it to be retrieved using the content’s hash. Git stores content in the form of objects. There are four types of objects in Git: blobs, trees, commits, and tags. All of them are stored in the .git/objects directory.

Blobs:

A blob (binary large object) stores the contents of a file. It holds only the file’s data, not its name or location, which are recorded by trees. Each blob is stored in the objects directory under a path derived from its hash: the first two hexadecimal characters name a subdirectory and the remaining characters name the file. Blobs are immutable. If a file is changed, a new blob is created with the new content and the old blob is left untouched.
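As an illustration of content addressing, the sketch below computes the identifier Git would assign to a blob. It assumes a repository using the default SHA-1 object format, where the hash covers a short header ("blob", the content length, and a NUL byte) followed by the content. The function name blob_hash is made up for this example.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    """Compute a blob's object ID, assuming the SHA-1 object format."""
    # The header is the object type, a space, the byte length, and a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Identical content always yields the identical ID, whatever the file is called.
first = blob_hash(b"hello world\n")
second = blob_hash(b"hello world\n")
print(first)
print(first == second)
```

The printed ID should match what git hash-object reports for a file with the same content, which is one way to check the sketch against a real installation.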

Trees:

A tree is the Git equivalent of a directory. It lists blobs and other trees, each with a name and a mode, and represents a directory stored in the repository.

Commits:

A commit is a snapshot of the repository at a given point in time. It contains a reference to the tree representing the top-level directory for that commit, references to all of its parent commits, and standard metadata such as the author, the committer, and the commit message.

Tags:

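A tag is a named reference to a commit that is used to mark a specific point in the repository’s history. An annotated tag is stored as a tag object that records the tagger, a date, and a message alongside the commit it points to.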

git init

The git init command transforms the current directory into a Git repository. In doing so, it generates a .git directory and populates it with various files. These files hold the repository’s configuration and, as work is committed, the project’s history. They are ordinary files, devoid of any magic. Text files such as HEAD and config can be read and edited with a text editor or a shell, while stored objects are compressed and are easier to inspect through Git’s own commands. Either way, the project’s history is as accessible as the project files themselves.
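To show how ordinary these files are, here is a rough sketch that builds a bare-bones .git skeleton by hand. It is not a reimplementation of git init, which also writes a config file, hook samples, and more. The helper name make_skeleton and the default branch name "main" are assumptions made for the example.

```python
import os
import tempfile


def make_skeleton(root: str, branch: str = "main") -> str:
    """Create a minimal .git layout: HEAD, objects/, and refs/heads/."""
    git_dir = os.path.join(root, ".git")
    os.makedirs(os.path.join(git_dir, "objects"))        # object store
    os.makedirs(os.path.join(git_dir, "refs", "heads"))  # one file per branch
    # HEAD is a plain text file naming the current branch.
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write(f"ref: refs/heads/{branch}\n")
    return git_dir


with tempfile.TemporaryDirectory() as tmp:
    git_dir = make_skeleton(tmp)
    entries = sorted(os.listdir(git_dir))
    with open(os.path.join(git_dir, "HEAD")) as f:
        head_contents = f.read()
    print(entries)
    print(head_contents, end="")
```

Everything here is created with standard file operations, which is the point: a repository is a directory tree that any tool can read.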

git add

Running git add creates a blob file containing the compressed contents of the file you want to add and saves it in the objects directory within the .git directory (.git/objects). The blob’s hash, which determines where the file is saved, is computed from a short header followed by the file’s content. git add also records the file in the index. The index, located at .git/index, holds an entry for every file Git keeps track of and maps each tracked path to the hash of its blob. The index is a binary file, but its entries can be listed with the git ls-files command (add --stage to see the blob hashes).
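The storage half of git add can be sketched as follows, assuming the SHA-1 object format and the loose-object layout (zlib-compressed data under a two-character subdirectory). The helpers write_blob and read_blob are hypothetical names, and the dictionary is only a toy stand-in for the real binary index file.

```python
import hashlib
import os
import tempfile
import zlib


def write_blob(objects_dir: str, content: bytes) -> str:
    """Store content as a loose blob object and return its hash."""
    store = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    sha = hashlib.sha1(store).hexdigest()
    # First two hex characters name the subdirectory, the rest name the file.
    path = os.path.join(objects_dir, sha[:2], sha[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(store))
    return sha


def read_blob(objects_dir: str, sha: str) -> bytes:
    """Read a loose blob back and strip its header."""
    path = os.path.join(objects_dir, sha[:2], sha[2:])
    with open(path, "rb") as f:
        store = zlib.decompress(f.read())
    _header, _, content = store.partition(b"\0")
    return content


with tempfile.TemporaryDirectory() as objects_dir:
    index = {}  # toy stand-in for .git/index: tracked path -> blob hash
    index["hello.txt"] = write_blob(objects_dir, b"hello world\n")
    roundtrip = read_blob(objects_dir, index["hello.txt"])
    print(index)
    print(roundtrip)
```

The real index stores more than a path-to-hash mapping (file modes and timestamps, for example), so the dictionary only captures the part described above.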

git commit

git commit creates a tree graph from the index, which represents the current state of the project. The tree graph records both the location and content of every tracked file. It consists of trees and blobs: blobs are stored by git add and trees are stored on each commit. Trees represent directories. When printed, an entry in a tree looks like this: 040000 tree 0eed1217a2947f4930583229987d90fe5e8e0b74 folder. The first part is the mode (the entry’s type and file permissions), the second part is the type of object, in this case tree instead of blob, and the third is the hash of the object the entry points to. The fourth part is the name of the folder, or of the file in the case of a blob.

After creating the tree graph, git commit creates a commit object, which is stored in compressed form in .git/objects/ and points to the top-level tree. It then points the current branch to the new commit object. The name of the current branch is recorded in .git/HEAD, and the branch itself is a file under .git/refs/heads/ containing the hash of the commit it points to.
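The way a commit ties a tree and blobs together can be sketched by hashing each object in turn, assuming the SHA-1 object format. The helper names, the file name hello.txt, and the author string are all invented for the example, and the entry sorting is simplified (Git applies a special rule for directory names that this sketch ignores).

```python
import hashlib


def object_hash(kind: str, body: bytes) -> str:
    """Hash an object body with its type-and-length header."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """Build a tree body from (mode, name, hex_hash) tuples."""
    body = b""
    # Simplified ordering: plain sort by name.
    for mode, name, hex_hash in sorted(entries, key=lambda e: e[1]):
        # Each entry is "mode name", a NUL byte, then the raw 20-byte hash.
        body += f"{mode} {name}".encode("utf-8") + b"\0" + bytes.fromhex(hex_hash)
    return body


def commit_body(tree_hash, parent_hashes, author, message):
    """Build the text of a commit object."""
    lines = [f"tree {tree_hash}"]
    lines += [f"parent {p}" for p in parent_hashes]
    lines += [f"author {author}", f"committer {author}", "", message]
    return ("\n".join(lines) + "\n").encode("utf-8")


# Hypothetical one-file project: blob -> tree -> commit.
blob = object_hash("blob", b"hello world\n")
tree = object_hash("tree", tree_body([("100644", "hello.txt", blob)]))
author = "A U Thor <author@example.com> 0 +0000"  # made-up identity and timestamp
commit = object_hash("commit", commit_body(tree, [], author, "first commit"))
print(blob, tree, commit, sep="\n")
```

Since the commit hash covers the tree hash, and the tree hash covers every blob hash, changing a single byte in any file changes the commit ID, which is how Git guards against content corruption.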

To be continued…