Implementation of a subset of git features
- Learn how git works in depth
- Try Scala3
- Have several loosely-coupled interchangeable components thanks to hexagonal architecture
- Try to integrate practices and patterns from DDD
- (double loop) TDD approach
- motivations and presentation of the objectives
- generated project
sbt new scala/scala3.g8
- hash a blob
- What is a blob?
- SHA1 of file with a prefix
blob <content_size>\0<content>
- Hash of a blob:
echo -n 'test content' | git hash-object --stdin
- Comparing with sha1 hash of the same string
printf 'blob 12\0test content' | shasum -a 1
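The blob format above can be sketched in Scala 3 with nothing but the JDK. This is a minimal illustration, not the project's actual code: the names `toHex`, `blobPayload`, `hashBlob` and `hashBlobDemo` are hypothetical.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.security.MessageDigest

// Render bytes as lowercase hexadecimal, two characters per byte.
def toHex(bytes: Array[Byte]): String =
  bytes.map(b => f"${b & 0xff}%02x").mkString

// Build the payload that gets hashed: "blob <content_size>\0<content>".
// The size is the number of bytes, not the number of characters.
def blobPayload(content: Array[Byte]): Array[Byte] =
  s"blob ${content.length}".getBytes(UTF_8) ++ Array[Byte](0) ++ content

// SHA-1 of the prefixed payload (hypothetical helper name).
def hashBlob(content: Array[Byte]): String =
  toHex(MessageDigest.getInstance("SHA-1").digest(blobPayload(content)))

@main def hashBlobDemo(): Unit =
  println(hashBlob("test content".getBytes(UTF_8)))
```

Running `hashBlobDemo` should print the same value as the `git hash-object --stdin` command above, which is a quick way to check compatibility with Git.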
📺 Episode 2: Refactoring to use hexagonal architecture and introduce concepts like Command and UseCase
- refactoring and extension of the code to support other input options (file, write in database, type, etc.)
- setup domain and infrastructure packages (hexagonal architecture)
- write a test for Main
- introducing a HashObjectCommand
- add zio (resource management, streaming, retries, parallelism, etc.)
- objective of the chapter: making a commit
- hash stdin string - change the way the command is used: hash-object --text "test content" instead of hash-object "test content"
- Fix the encoding issue
- Hashing a stream of bytes (ZStream and ZSink)
- Write test to hash a file
- Refactor so the hash object usecase accepts several types of command
- Implement hashing a file
- Model the return type of the usecase with a richer type
- Update test to hash several files and implement
- [Refactor/hexagonal arch.] extract reading a file and have the implementation in the infrastructure package.
- problem in the hash object usecase
- fixing the problem
- [Business Logic] write a blob in git objects directory
- create an ObjectRepository
- [/] write a test for HashObjectUseCase verifying that the repository is called
- [Business Logic] write a blob in git objects directory
- create an ObjectRepository
- write a test for HashObjectUseCase verifying that the repository is called
- [/] create the implementation for the repository and test
- what to test? we are looking to test compatibility with Git: right place, right format
- [Business Logic] write a blob in git objects directory
- Object Repository File System
- Refactor the ObjectRepositoryFileSystemSpec to generate a single hash to avoid a "cache" issue.
- Implement Object Repository File System
- Object Repository File System
- Check that hash object use case is calling the object repository with the right value (with the blob + size prefix)
- Put things together: hash and save a blob from the app and try to read it with git
- Missing test: do not call the repository when the save option is false
- refactor main to extract the parsing and the formatting part
- [Business Logic] read and write git index file
- read the git index file
- [Business Logic] read and write git index file
- create a dummy index file and read it
- refactor the code to use case classes
- [Business Logic] read and write git index file
- productionize the code
- [Business Logic] write a tree in git object directory
- refactor the MainSpec to separate the concerns
- use a more specific type than string for dealing with files
- [Business Logic] write a tree in git object directory
- [Business Logic] write a commit (with a tree hash provided)
Source: https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain
Git uses the concept of objects. There are 3 types of object described here (Git also has a fourth type, the annotated tag):
- blobs. A blob represents the content of a file. It is stored in a file named after the SHA-1 hash of the content (with the blob prefix described above).
- trees. Trees represent the hierarchy between blobs. A tree lists blobs and other trees along with their names. For instance:
100644 blob dc711f442241823069c499197accce1537f30928 .gitignore
100644 blob e5d351c3cd44aa1d8c1cb967c7e7fde1dee4b0ad README.md
100644 blob 7a010b786eb29b895ba5799306052b996516d63b build.sbt
040000 tree 8bac5f27882165d313f5732bb4f140003156c693 project
040000 tree 163727ec9bd17ef32ee088a52a31fe0b483fa18f src
- there are different types of files: 100644 is a normal file, 100755 is an executable file, 120000 is a symbolic link, 040000 is a tree, and 160000 is a sub-module
- commits. A commit captures:
- the tree snapshot of the code
- the parent(s) commits. Usually a commit has only one parent, but it can have 0 to n parents. The first commit does not have any parent. A merge commit has several parents (usually 2).
- the author
- the committer
- a blank line
- the commit message
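The tree and commit descriptions above can be modelled with a few Scala 3 types. This is only a sketch under assumptions: `EntryMode`, `TreeEntry` and `Commit` are hypothetical names, the author and committer are kept as opaque strings, and `render` produces the human-readable text form, not the binary layout Git stores for trees.

```scala
// File modes listed above. The case names are illustrative; the octal
// strings and object types follow the listing format shown earlier.
enum EntryMode(val octal: String, val objectType: String):
  case RegularFile  extends EntryMode("100644", "blob")
  case Executable   extends EntryMode("100755", "blob")
  case SymbolicLink extends EntryMode("120000", "blob")
  case Directory    extends EntryMode("040000", "tree")
  case Submodule    extends EntryMode("160000", "commit")

final case class TreeEntry(mode: EntryMode, hash: String, name: String):
  // One line in the style of the tree listing shown above.
  def render: String = s"${mode.octal} ${mode.objectType} $hash $name"

final case class Commit(
    tree: String,          // hash of the tree snapshot
    parents: List[String], // 0 to n parent commit hashes
    author: String,
    committer: String,
    message: String
):
  // Headers in order, then a blank line, then the commit message.
  def render: String =
    val headers =
      List(s"tree $tree") ++
        parents.map(p => s"parent $p") ++
        List(s"author $author", s"committer $committer")
    headers.mkString("\n") + "\n\n" + message
```

A first commit would be built with `parents = Nil`, and a merge commit with two (or more) parent hashes.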
Those objects are stored in .git/objects. Each file, representing a blob, a tree or a commit, is stored within a directory named after the first two characters of its hexadecimal hash. For instance, the object with the hash dc711f442241823069c499197accce1537f30928 is stored in the folder .git/objects/dc.
The filename is the hash without its first two characters. For the hash dc711f442241823069c499197accce1537f30928, the filename is 711f442241823069c499197accce1537f30928 (note that the prefix dc has been removed). The file corresponding to the hash dc711f442241823069c499197accce1537f30928 is therefore .git/objects/dc/711f442241823069c499197accce1537f30928.
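The mapping from a hash to its location on disk is small enough to sketch directly. The function name `objectPath` and its parameters are hypothetical, and the sketch assumes a 40-character SHA-1 hash in hexadecimal.

```scala
import java.nio.file.{Path, Paths}

// Split the hash into a directory name (first two characters) and a
// filename (the remaining 38), both resolved under the objects directory.
def objectPath(objectsDir: Path, hash: String): Path =
  require(hash.length == 40, "expected a 40-character hexadecimal hash")
  val (dir, file) = hash.splitAt(2)
  objectsDir.resolve(dir).resolve(file)

@main def objectPathDemo(): Unit =
  println(objectPath(Paths.get(".git", "objects"), "dc711f442241823069c499197accce1537f30928"))
```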
zlib is a C library used for data compression. It supports a single algorithm, DEFLATE, which is widely used (for example in the zip archive format). Git compresses each object with zlib before writing it to .git/objects.
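On the JVM, the zlib format is available through `java.util.zip`, so no native binding is needed. The sketch below uses hypothetical helper names (`deflate`, `inflate`) and works on whole byte arrays for simplicity; a streaming version would be preferable for large files.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{DeflaterOutputStream, InflaterInputStream}

// Compress bytes into a zlib stream (DEFLATE data with the zlib header).
def deflate(data: Array[Byte]): Array[Byte] =
  val out  = new ByteArrayOutputStream()
  val zlib = new DeflaterOutputStream(out)
  zlib.write(data)
  zlib.close() // finishes the stream and flushes the remaining compressed bytes
  out.toByteArray

// Decompress a zlib stream back into the original bytes.
def inflate(data: Array[Byte]): Array[Byte] =
  val in = new InflaterInputStream(new ByteArrayInputStream(data))
  try in.readAllBytes() // requires Java 9 or later
  finally in.close()
```

Reading the bytes of a file under .git/objects and passing them to `inflate` should give back the prefixed payload (for a blob, the `blob <content_size>\0<content>` bytes).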
https://git-scm.com/docs/index-format
- git cat-file: show information about an object
- git cat-file -p <hash>: show the content of an object. <hash> can be master^{tree} to reference the tree object pointed to by the latest commit on master.
- git cat-file -t <hash>: show the type of an object
- git hash-object: (explicit)
- git update-index: register file contents in the working tree to the index
- git write-tree: write the staging area to a tree object
- git ls-files --stage (or -s): show all tracked files
zlib-flate -uncompress < .git/objects/18/7fbaf52b4fdebd0111740829df5b51edc8b029
zlib-flate is another program that can decompress (inflate) object files
- https://git-scm.com/book/sv/v2/Git-Internals-Git-Objects
- https://stackoverflow.com/questions/4084921/what-does-the-git-index-contain-exactly
- https://git-scm.com/docs/gitglossary
- https://github.com/git/git/blob/master/Documentation/technical/index-format.txt
- https://git-scm.com/book/en/v2/Git-Internals-Packfiles
- Good explanations about the format of git tree https://stackoverflow.com/questions/14790681/what-is-the-internal-format-of-a-git-tree-object