Learn-git

Git and GitHub have revolutionized the way people create, maintain and share software code. Git is often called Linus Torvalds's second gift to the world, the first, obviously, being the Linux operating system. Nowadays it is common for job seekers to showcase their work as a set of GitHub repositories so that employers can evaluate them more effectively. Open source projects are thriving because of easy-to-use, Git-based social coding platforms. These platforms have become so popular that even non-programmers use Git and GitHub to version-control their work. I personally know two nonfiction writers/journalists who use Git to manage their documents.

My guess is that going from a newbie to a moderately skilled GitHub user might take a week or two. It is easy to memorize a few commands and just use them. But if you want to understand how Git does certain things and why, it is a good idea to spend some time on each of its main ideas. There are many tutorials, videos, screencasts and podcasts on the internet, and they may be a good starting point for a newbie. There are also book-length treatments of Git, GitHub, etc. Of the massive amount of content available for learning Git, I think this book is one of the best introductions, for the following reasons:

  • Each main idea is condensed into a one-hour lesson, meaning that someone else has already thought through the question "what's the ideal chunking for understanding Git?"
  • Each idea is illustrated with visuals, so the ideas you glean from the book stay with you longer
  • The examples throughout the book are easy to understand
  • There is a lab at the end of every chapter. One cannot learn without practice, especially when completely new to a subject
  • Examples are reused throughout the book, which reinforces the earlier ideas

In this post, I will summarize the main points of each chapter.


Chapter 1 - Before you begin

First, the rationale behind the title. Each chapter is designed to be read during your lunch hour, though not literally: all you need to read any chapter is an hour of your time. In that hour, you should be able to read the text and work through each of the chapter's "Try it now" exercises. The lab at the end of each chapter helps reinforce its main points.

The author suggests a learning path that a reader could follow: one chapter a day for 20 days. There are no prerequisites. Any novice can pick up the material if they set aside 20 hours. My guess is that even 14 hours with this book should make you conversant with how Git works and turn you into a moderately skilled Git user.


Chapter 2 - An Overview of Git and Version Control

When you write code, you typically have files that you modify and save. You might save a file many times, but only some sets of saves warrant a comment. Someone who isn't well versed in version control might try to record such comments in the file name itself by adding a prefix or suffix. This way of managing files doesn't scale, hence the need for a version control system.

Every version control system is built on three concepts:

  • Versioning
  • Auditing
  • Branching

The power of Git comes from the following features:

  • Distributed repositories: Each developer has their own full repository to commit to. The basic difference between Git and earlier version control systems is that Git is a DVCS (distributed version control system). You don't need to run a Git server to get its benefits, and you don't even need a network to run Git's commands. Backups are also less of a worry, because every developer holds a complete, version-controlled copy of the repository.
  • Fast branching: One way to keep development and production code apart is to put them in two separate folders and switch between the folders depending on what you are working on. Git makes branching extremely fast instead. Internally, a branch is just a pointer to a commit, so no files need to be copied. The speed with which you can create a new branch and begin working creates a new model for doing work: if you want to try a new idea in your code, create your own branch. Because you're working in a local repository, no other developer is disturbed by this new code stream. Your work is safe and isolated.
  • Staging area: Sometimes you want specific code while developing but sanitized code in production. For example, a username and password might be hard-coded during development but must not be in production. The staging area lets you choose exactly which changes go into a commit, so you can stage and commit a sanitized version of a file while the development-only edits stay in your working directory.

The author gives a tour of Git through both a GUI and the command line. Towards the end of the chapter, the author lists the terms one comes across while using Git:

  • Branch
  • Check out
  • Clone
  • Commit
  • Distributed
  • Repository
  • Staging area
  • Timeline
  • Version Control

 

Chapter 3 - Getting Oriented with Git

The syntax for using any git command is

git [switches] <command> [<args>]

Switches are optional arguments to git itself, command is the Git command, and args are the arguments to that command.

For example:

git -p config --global user.name "RK"

Here, -p is the switch that paginates output if needed, config is the Git command, and --global, user.name and "RK" are its three arguments.

This chapter introduces the basic command-line operations used to create, remove and rename files and directories. Its most important task is to have the user set the global user.name and user.email settings. Git records these two values in every commit, and they are part of the data hashed to produce each commit's SHA1 ID.

The following are the commands mentioned in the chapter:

Command   Description
git config --global user.name "Your Name"   Add your name to the global configuration
git config --global user.email "Your email"   Add your email to the global configuration
git config --list   Display all the Git configuration settings
git config user.name   Display the user.name configuration value
git config user.email   Display the user.email configuration value
git help help   Ask Git for help about its help system
git help -a   Print all available Git commands
git --paginate help -a   Paginate the list of all available Git commands
git help -g   Print all available Git guides
git help glossary   Display the Git glossary
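You can try the configuration commands above safely with a short shell script. It assumes a POSIX shell with Git installed. It also points HOME at a throwaway directory so that --global writes to a scratch ~/.gitconfig rather than your real one. "RK" and rk@example.com are placeholder values.

```shell
set -e
# Scratch HOME so "--global" writes to a throwaway ~/.gitconfig (assumption for the demo)
export HOME="$(mktemp -d)"
cd "$HOME"

# Record the identity Git stamps into every commit (placeholder values)
git config --global user.name "RK"
git config --global user.email "rk@example.com"

# Read individual values back
git config user.name                 # prints: RK
git config user.email                # prints: rk@example.com

# Show every setting Git can see, from all configuration levels
git config --list

# The global settings live in a plain-text file in the home directory
cat "$HOME/.gitconfig"
```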



Chapter 4 - Making and Using a git repository

This chapter introduces the basics of creating and using a git repository. git init creates a repository on your local machine.

There are two things to keep in mind:

  • No server was started
  • The repository was entirely local

The init command creates a hidden folder called .git, which contains a host of subfolders for managing commit objects, trees, references, etc. There is a difference between the working directory and the repository. The working directory is where you do your work. The repository is a specialized storage area in which you save versioned files. It lives inside the working directory, in that .git folder.

Git must be told about any file you create in the working directory, which you do with the git add command. Once git add is run on a file, Git knows about the file and tracks changes to it. But until the file is committed, no snapshot of it is recorded in the repository's history. A good way to imagine adding a file is putting it in a queue called the staging area. The file appears in the repository only after you commit it with a relevant commit message.

The following are the commands mentioned in the chapter:

Command   Description
git init   Initialize a Git repository in the current directory
git status   Display the status of the current directory as it relates to Git
git add FILE   Start tracking FILE in Git; adds FILE to the staging area
git commit -m MSG   Commit changes to the Git repository, with a message in quotes
git commit -a -m MSG   Stage all changes to already-tracked files and create a new commit object
git log   Display the log history
git log --stat   Display the log with the files that were modified in each commit
git ls-files   List the files Git is tracking
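Here is a minimal shell script that runs the chapter's init/add/commit cycle in a throwaway directory. Several parts are placeholders:

* The file name math.txt is made up for the demo.
* The "RK" identity is passed through Git's GIT_AUTHOR_* and GIT_COMMITTER_* environment variables, so your own config is untouched.
* `git symbolic-ref` is only there to make the first branch master regardless of your init.defaultBranch setting.

```shell
set -e
cd "$(mktemp -d)"                    # throwaway directory for the demo
# Placeholder identity via environment variables, so your real config is untouched (assumption)
export GIT_AUTHOR_NAME="RK" GIT_AUTHOR_EMAIL="rk@example.com"
export GIT_COMMITTER_NAME="RK" GIT_COMMITTER_EMAIL="rk@example.com"

mkdir math && cd math
git init -q                          # creates the hidden .git folder; no server is started
git symbolic-ref HEAD refs/heads/master   # call the first branch master, as the book does

echo "1 + 1 = 2" > math.txt
git status --short                   # ?? math.txt   (untracked)
git add math.txt                     # put the file in the staging area
git status --short                   # A  math.txt   (staged, not yet committed)
git commit -q -m "Add math.txt"      # record the snapshot in the repository
git log --stat                       # the history, with the files each commit touched
git ls-files                         # math.txt
```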

 


Chapter 5 - Using Git with a GUI

This chapter uses a GUI to create, add to and commit to the repository. I used Cola Git Gui to explore the lessons in this chapter. Towards the end of the chapter, the author touches upon Tcl/Tk. Tcl is a dynamic interpreted language invented in 1988 by John Ousterhout. Tk, a toolkit of GUI controls, was added to the language not long after. Both Git Gui and gitk are written in Tcl/Tk.


Chapter 6 - Tracking and Updating files in Git

The author introduces "staging area" in Git via the following analogy:

Pretend that your code is an actor in a theater production. The dressing room is the Git working directory. The actor gets a costume and makeup all prepared for the upcoming scene. The stage manager calls the actor and says that the scene is just about to start. The actor (the code) doesn't jump in front of the audience right then. Instead, it waits in an area behind the curtain. This is our Git staging area. Here, the actor might have one last look at the costume or makeup. If the actor (the code) looks good, the stage manager opens the curtains, and the code commits itself to its performance.

Whenever you change anything in the working directory, the change must be added to the staging area before it can be committed. It is the contents of the staging area that get committed to Git. The author shows, step by step, how to add a file to the staging area, commit it, check the log messages, see the differences between a staged file and the file in the working directory, etc.

The following are the commands mentioned in the chapter:

Command   Description
git commit -m "Message"   Commit changes with the log message given on the command line via the -m switch
git diff   Show any differences between tracked files in the working directory and the staging area
git diff --staged   Show any differences between the staging area and the repository
git commit -a -m "Message"   Stage changes to tracked files, then commit them with the given message
git add --dry-run   Show what git add would do
git add .   Stage all new and changed files under the current directory
git log --shortstat --oneline   Show history using one line per commit, with a summary of files changed, insertions and deletions per commit
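The following script stages a change in two steps to show what each form of git diff compares. It is an illustration, not the book's own lab. The file notes.txt and the placeholder identity are assumptions, and everything happens in a temporary directory.

```shell
set -e
cd "$(mktemp -d)"
# Placeholder identity (assumption)
export GIT_AUTHOR_NAME="RK" GIT_AUTHOR_EMAIL="rk@example.com"
export GIT_COMMITTER_NAME="RK" GIT_COMMITTER_EMAIL="rk@example.com"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "line one" > notes.txt
git add notes.txt && git commit -q -m "Add notes"

echo "line two" >> notes.txt         # change the working directory only
git diff                             # working directory vs staging area: +line two
git diff --staged                    # staging area vs repository: nothing yet

git add notes.txt                    # move the change into the staging area
git diff                             # now empty
git diff --staged                    # now shows +line two

echo "line three" >> notes.txt
git commit -q -a -m "Add lines two and three"   # -a stages changes to tracked files, then commits
git log --shortstat --oneline
```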


Chapter 7 - Committing parts of changes

To delete a file from the repository, remove it from the staging area and then commit. The shell's rm command removes the file only from the working directory, whereas git rm removes it from both the staging area and the working directory. The same logic applies to renaming: git mv renames a file in both the staging area and the working directory.

In a way, the staging area might seem redundant. However, it is extremely useful for committing parts of a file. With git add -p filename, Git presents the changes in the file hunk by hunk and lets you choose which hunks to stage and which to leave out. It took me some time to get used to this functionality.

The chapter also covers when to commit. It makes sense to commit to the repository under any of these conditions:

  • Adding or deleting a file
  • Renaming a file
  • Updating a file to a known good working state
  • When you anticipate being away from the work
  • When you introduce some questionable code

That said, the author feels that because all commits are local to your machine, it is better to commit as frequently as possible.

The following are the commands mentioned in the chapter:

Command   Description
git rm file   Remove file from the staging area and the working directory
git mv file1 file2   Rename file1 to file2 in the staging area and the working directory
git add -p   Pick parts of your changes to add to the staging area
git reset file   Unstage file, undoing any changes you staged with git add
git checkout file   Replace file in your working directory with its staged version (the latest committed version, if nothing is staged)


Chapter 8 - The time machine that is Git

Each commit has a unique SHA1 ID. It is computed from the commit's contents:

* the snapshot of the files in the staging area
* the SHA1 of the previous (parent) commit
* the author's name and email
* the time of the commit
* the commit message

Because each commit includes its parent's SHA1, you can traverse the entire history starting from the latest commit's SHA1. In practice, no two commit objects ever share a SHA1 ID. At the beginning of the project, HEAD and master point to the same commit. As you keep committing, master moves to the latest commit and HEAD follows it. However, once you check out a particular older commit, HEAD moves back in time to that commit. Tags offer an easy way to refer to SHA1s. You attach a name to a specific SHA1 and use it later, for example for a quick checkout.

The following are the commands mentioned in the chapter:

Command   Description
git log --parents   Show the history, displaying each commit's parent SHA1 ID
git log --parents --abbrev-commit   Same as the preceding command, but with shortened SHA1 IDs
git log --oneline   Display the history concisely, using one line per commit
git log --patch   Display the history, showing the file differences between each commit
git log --stat   Display the history, showing a summary of the file changes in each commit
git log --patch-with-stat   Display the history, combining the patch and stat output
git log --oneline file_one   Display the history of file_one
git rev-parse   Translate a branch name or tag into a specific SHA1
git checkout your_sha1id   Change your working directory to match the specified SHA1 ID
git tag tag_name -m "message" sha1id   Create a tag named tag_name, pointing to your SHA1 ID
git tag   List all tags
git show tag_name   Show information about the tag named tag_name
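The following script walks the history by SHA1, tags an older commit and briefly travels back to it. One detail is worth knowing. For an annotated tag (one created with -m), git rev-parse v1.0 returns the ID of the tag object itself. The script therefore appends ^{commit} to get the commit the tag points to. The tag name v1.0 and file_one's contents are placeholders.

```shell
set -e
cd "$(mktemp -d)"
# Placeholder identity (assumption)
export GIT_AUTHOR_NAME="RK" GIT_AUTHOR_EMAIL="rk@example.com"
export GIT_COMMITTER_NAME="RK" GIT_COMMITTER_EMAIL="rk@example.com"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "v1" > file_one && git add file_one && git commit -q -m "First version"
echo "v2" > file_one && git commit -q -a -m "Second version"

git log --parents --abbrev-commit --oneline   # each line: commit ID, then its parent's ID
first=$(git rev-parse HEAD~1)        # full SHA1 of the first commit (HEAD~1 = parent of HEAD)
git rev-parse master                 # branch name -> SHA1 of its latest commit

git tag -m "First release" v1.0 "$first"   # annotated tag pointing at the first commit
git tag                              # v1.0
git rev-parse v1.0                   # ID of the annotated tag object itself
git rev-parse 'v1.0^{commit}'        # ID of the commit it points to (same as $first)
git show --no-patch v1.0             # tag message plus the commit it points to

git checkout -q v1.0                 # HEAD moves back in time (a "detached HEAD")
old=$(cat file_one)                  # v1
git checkout -q master               # back to the latest commit
```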


Chapter 9 - Taking a fork in the road

Branching is one of the most important concepts in Git. Typically you start with a master branch and, as time goes on, keep creating divergent code bases. Each divergent code base could represent a bug fix, an enhancement, a new feature, etc. Each branch is a reference, named after the branch (master, dev and so on), that points to the latest commit on that branch. There is also a special reference named HEAD that points to the currently checked-out commit, usually by way of the current branch. When you have master checked out, HEAD and master point to the same commit object. It is easy to forget that every commit object records the SHA1 of its parent, and that this is part of what its own SHA1 is computed from. Once you create branches, you need commands to:

  • switch to another branch
  • list all the branches
  • draw a DAG showing all the branches
  • see the difference between the code bases of two branches
  • create and check out a branch in a single command

The author gradually introduces commands for all of the above and concludes the chapter by introducing the git stash and git stash pop commands.

The commands mentioned in this chapter are:

Command   Description
git branch   List all branches
git branch dev   Create a new branch named dev
git checkout dev   Change your working directory to the branch named dev
git branch -d master   Delete the branch named master
git log --graph --decorate --pretty=oneline --abbrev-commit   View the history as a text graph, with branch and tag names
git branch -v   List all branches with the SHA1 and subject of their latest commits
git branch fixing_readme YOUR_SHA1ID   Make a branch using YOUR_SHA1ID as the starting point
git checkout -b another_fix_branch fixing_readme   Make a branch named another_fix_branch, using branch fixing_readme as the starting point, and switch to it
git reflog   Show a record of every time HEAD moved, for example when you changed branches
git stash   Set aside the current work in progress, so you can perform a git checkout
git stash list   List the works in progress you have stashed away
git stash pop   Apply the most recently saved stash to the current working directory and remove it from the stash
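Here is a small shell script for branching and stashing, using the branch names from the table (dev, fixing_readme). The README contents and the placeholder identity are assumptions, and the script runs in a throwaway directory.

```shell
set -e
cd "$(mktemp -d)"
# Placeholder identity (assumption)
export GIT_AUTHOR_NAME="RK" GIT_AUTHOR_EMAIL="rk@example.com"
export GIT_COMMITTER_NAME="RK" GIT_COMMITTER_EMAIL="rk@example.com"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "# Math" > README.md
git add README.md && git commit -q -m "Add README"

git branch dev                       # a new pointer to the current commit; no files are copied
git checkout -q dev                  # HEAD now points at dev
echo "dev work" > dev.txt
git add dev.txt && git commit -q -m "Work on dev"

git checkout -q -b fixing_readme master   # create and switch in one step, starting from master
echo "Typo fixed" >> README.md       # uncommitted work in progress

git stash                            # shelve the change; the working directory is clean again
git stash list                       # stash@{0}: WIP on fixing_readme: ...
git checkout -q dev                  # free to switch branches and look at something else
git checkout -q fixing_readme
git stash pop                        # re-apply the change and drop it from the stash

git branch -v                        # every branch with its latest commit
git log --graph --decorate --oneline --all   # text DAG of all branches
```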


Chapter 10 - Merging Branches

"Branch often" is the mantra of a Git user, so merging a branch back into master or any other branch becomes very important. Branching diverges the code base; merging converges it. Using the mnemonic "traffic merges into us", the author reinforces the point that git merge merges other branches into the branch we are on. A merge creates a commit object that has two parent commits. One of the most useful commands for exploring the commit structure across branches is:

git log --graph --oneline --decorate --all --parents --abbrev-commit

In any merge, the code bases may conflict. You resolve conflicts by editing the conflicted files to keep the right version of each hunk, staging them, and committing to complete the merge. The author also walks through a merge using GUI tools. The chapter ends with a discussion of the fast-forward merge. It arises when the branch being merged in is a direct descendant of the current branch, so Git simply moves the current branch pointer forward without creating a merge commit. Git can also merge more than two branches at once; the jargon for this is "octopus merge".

The following are the commands mentioned in the chapter:

Command   Description
git diff BRANCH1...BRANCH2   Show the changes on BRANCH2 since the point where it diverged from BRANCH1
git diff --name-status BRANCH1...BRANCH2   Summarize the difference between BRANCH1 and BRANCH2
git merge BRANCH2   Merge BRANCH2 into the branch you are currently on
git log -1   Shorthand for git log -n 1
git mergetool   Open a tool to help resolve a conflicted merge
git merge --abort   Abandon a merge that has conflicts
git merge-base BRANCH1 BRANCH2   Show the common ancestor (merge base) of BRANCH1 and BRANCH2
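The script below does three things:

1. It creates two diverging branches.
2. It merges them to get a commit with two parents.
3. It provokes a conflict and then abandons it.

The branch names other than master (feature, rival) and the file contents are hypothetical.

```shell
set -e
cd "$(mktemp -d)"
# Placeholder identity (assumption)
export GIT_AUTHOR_NAME="RK" GIT_AUTHOR_EMAIL="rk@example.com"
export GIT_COMMITTER_NAME="RK" GIT_COMMITTER_EMAIL="rk@example.com"
git init -q
git symbolic-ref HEAD refs/heads/master
printf 'a\nb\nc\n' > letters.txt
git add letters.txt && git commit -q -m "Base"

# Two branches diverge from "Base"
git checkout -q -b feature
echo "feature" > feature.txt && git add feature.txt && git commit -q -m "Add feature.txt"
git checkout -q master
echo "master" > master.txt && git add master.txt && git commit -q -m "Add master.txt"

git merge-base master feature        # SHA1 of "Base", where the branches diverged
git diff --name-status master...feature   # A feature.txt: feature's changes since "Base"

# "Traffic merges into us": we are on master, so feature is merged into master
git merge -q --no-edit feature       # histories diverged, so Git creates a merge commit
parents=$(git log -1 --format=%P)    # the merge commit has two parents
echo "$parents"

# Now provoke a conflict: two branches change the same line
git checkout -q -b rival
printf 'a\nB from rival\nc\n' > letters.txt && git commit -q -a -m "Rival edit"
git checkout -q master
printf 'a\nB from master\nc\n' > letters.txt && git commit -q -a -m "Master edit"

if git merge rival; then echo "expected a conflict" >&2; exit 1; fi
conflict_status=$(git status --porcelain)   # "UU letters.txt" marks an unresolved conflict
git merge --abort                    # give up and restore the pre-merge state
```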


Chapter 11 - Cloning

When you want to share your code, you can either copy your working directory and send it across or, in Git's world, host your repository for others to clone. With the first approach, all your version history is lost, and the receiver has no way to track changes you make after the code has been shared. With the second, your version history is intact and anyone can clone your repository to get the entire history of commits. The crucial advantage of cloning is that the copy is linked to the original repository, so you can send changes to, and receive changes from, the original.

When you clone a repository, the only local branch created in the clone is the active branch of the original repository, i.e., the branch that HEAD points to. When you look at the branches in a clone, you also see a strange naming convention such as remotes/origin/branch_name. For each branch in the remote repository, Git creates a remote-tracking branch. Like regular branches, remote-tracking branches point to the last commit of their line of development. Because every commit points to its parent, the clone has the entire history. To develop on one of these branches, check it out in the usual way with git checkout branch_name. Git then creates a local branch that starts from, and tracks, the remote-tracking branch.

The author introduces the bare directory, i.e., a standalone directory that contains only a Git repository and nothing else. An important aspect of a bare directory is that it has no reference to the original repository. Unlike a clone, which has a reference to its originating repository, the bare directory is a completely standalone repository. Because of this, and because it has no working directory, bare directories often serve as the official copy of a repository. The only way to update one is to push to it, and the only way to retrieve its contents is to clone, or pull, from it.

The following are the commands mentioned in the chapter:

Command   Description
git clone source destination_dir   Clone the Git repository at source into destination_dir
git log --oneline --all   Display all commit log entries from all branches
git log --simplify-by-decoration --decorate --all --oneline   Display the history in a simplified form, keeping only commits that have a branch or tag name
git branch --all   Show remote-tracking branches in addition to local branches
git clone --bare source destination_dir   Create a bare clone of the source repository in destination_dir
git ls-tree HEAD   List the top-level files and directories in the commit HEAD points to
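To see remote-tracking branches and bare repositories in action, the following script clones a local repository twice, once normally and once with --bare. The repository names math, math.clone and math.git are placeholders, and local paths stand in for a network URL.

```shell
set -e
cd "$(mktemp -d)"
# Placeholder identity (assumption)
export GIT_AUTHOR_NAME="RK" GIT_AUTHOR_EMAIL="rk@example.com"
export GIT_COMMITTER_NAME="RK" GIT_COMMITTER_EMAIL="rk@example.com"

# The original repository, with two branches
git init -q math
cd math
git symbolic-ref HEAD refs/heads/master
echo "1 + 1 = 2" > add.txt
git add add.txt && git commit -q -m "Add addition"
git branch another_fix
cd ..

git clone -q math math.clone         # full history, linked back to "math" as origin
cd math.clone
git branch                           # * master: only the branch HEAD pointed to
git branch --all                     # plus remotes/origin/another_fix, remotes/origin/master, ...
git checkout -q another_fix          # creates a local branch tracking origin/another_fix
git log --oneline --all
cd ..

git clone -q --bare math math.git    # bare: the repository only, no working directory
ls math.git                          # HEAD, config, objects, refs, ...
git --git-dir=math.git ls-tree HEAD  # add.txt, read straight from the repository
```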


Chapter 12 - Collaborating with Remotes

This chapter covers creating references to one or more remote repositories, which can reside anywhere on the network. Once a remote is set up (cloning sets one up automatically), you can send changes to it and receive changes from it. The conventional name for the remote you cloned from is origin, but you can give a remote any name that fits your mental model.

The following are the commands mentioned in the chapter:

Command   Description
git checkout -f master   Check out the master branch, throwing away any changes in the current branch
git remote   Display the names of the remotes
git remote -v show   Display the names of the remotes along with their URLs
git remote add bob ../math.bob   Add a remote named bob that points to the local repository ../math.bob
git ls-remote bob   Display the references in a remote repository
GIT_TRACE_PACKET=1 git ls-remote REMOTE   Display the underlying network interaction
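Here is a minimal script for adding and inspecting a remote, mirroring the bob example from the table. A second local repository stands in for Bob's copy, which is an assumption for the demo. Git writes the packet trace to standard error, and the script redirects it into trace.log.

```shell
set -e
cd "$(mktemp -d)"
# Placeholder identity (assumption)
export GIT_AUTHOR_NAME="RK" GIT_AUTHOR_EMAIL="rk@example.com"
export GIT_COMMITTER_NAME="RK" GIT_COMMITTER_EMAIL="rk@example.com"

# Bob's repository, standing in for a collaborator's copy (hypothetical)
git init -q math.bob
cd math.bob
git symbolic-ref HEAD refs/heads/master
echo "bob's work" > bob.txt
git add bob.txt && git commit -q -m "Bob's commit"
cd ..

git init -q math
cd math
git remote add bob ../math.bob       # a named shortcut to Bob's repository
git remote                           # bob
git remote -v                        # bob ../math.bob (fetch) and (push)
git ls-remote bob                    # Bob's references, e.g. refs/heads/master
GIT_TRACE_PACKET=1 git ls-remote bob 2> trace.log   # packet-level trace goes to stderr
```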


Chapter 13 - Pushing your changes

git push is a command that affects a repository other than your own. Once you are done with changes in your local repository, you might want to share them with a remote repository. If the remote branch has not changed since you last synced with it, the push simply fast-forwards it. If your push is rejected because the remote has new commits, update your local repository by pulling the remote changes, then push again. If you create a new branch locally and try to push it, Git will complain. You have to use the --set-upstream switch so that Git creates the branch on the remote, pushes your code to it and records it as your branch's upstream. The author also explains how to delete branches on the remote. It is a two-step process: first delete the branch from the local repository, then push using a special syntax (an empty source before the colon) that deletes the branch on the remote. The last section of the chapter covers pushing and deleting tags on the remote.

The following are the commands mentioned in the chapter:

Command   Description
git push origin master   Push the master branch to the remote named origin
git push   Push the current branch to its upstream branch, as set up by git checkout or git push --set-upstream
git push --set-upstream origin new_branch   Create new_branch on the remote named origin, push to it and set it as the upstream of the local new_branch
git config --get-regexp branch   List all Git configuration settings whose names contain the word branch
git branch -d local_branch   Remove the local branch named local_branch
git push origin :remotebranch   Remove the branch named remotebranch from the remote named origin
git tag -a TAG_NAME -m TAG_MESSAGE SHA1   Create an annotated tag named TAG_NAME with the message TAG_MESSAGE, pointing to SHA1
git push origin TAGNAME   Push the tag named TAGNAME to the remote named origin
git push --tags   Push all tags to the default remote
git push origin :TAGNAME   Delete the tag named TAGNAME on the remote named origin
git tag -d TAGNAME   Remove the tag named TAGNAME from the local repository
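The push workflow from this chapter can be run against a local bare repository standing in for the shared remote. The name shared.git, the files and the placeholder identity are assumptions. The script performs these steps:

1. Push master.
2. Publish a new branch with --set-upstream.
3. Push a tag.
4. Delete the branch locally and on the remote.

```shell
set -e
cd "$(mktemp -d)"
# Placeholder identity (assumption)
export GIT_AUTHOR_NAME="RK" GIT_AUTHOR_EMAIL="rk@example.com"
export GIT_COMMITTER_NAME="RK" GIT_COMMITTER_EMAIL="rk@example.com"

# A bare repository standing in for the shared remote (hypothetical name)
git init -q --bare shared.git
git --git-dir=shared.git symbolic-ref HEAD refs/heads/master

git init -q math
cd math
git symbolic-ref HEAD refs/heads/master
git remote add origin ../shared.git
echo "1 + 1 = 2" > add.txt
git add add.txt && git commit -q -m "Add addition"
git push -q origin master            # publish master on the remote

git checkout -q -b new_branch
echo "2 * 3 = 6" > multiply.txt
git add multiply.txt && git commit -q -m "Add multiplication"
git push -q --set-upstream origin new_branch   # creates new_branch on the remote and tracks it
git config --get-regexp branch       # branch.new_branch.remote / branch.new_branch.merge

git tag -a v1.0 -m "First release" HEAD
git push -q origin v1.0              # tags travel only when pushed explicitly (or with --tags)

# Finish the feature: merge it, then delete the branch locally and on the remote
git checkout -q master
git merge -q new_branch              # fast-forward
git push -q origin master
git branch -d new_branch             # safe: the branch is fully merged
git push -q origin :new_branch       # empty source before ":" means "delete on the remote"
```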


Chapter 14 - Keeping in sync

The rationale for syncing is simple: Git will not let you push to a branch on the remote until your local branch includes the commits already there. git pull is a two-part operation consisting of git fetch followed by git merge.

* The first step, git fetch, downloads the new commits from the remote repository and updates your remote-tracking branches. Your working directory is left untouched. The fetch also records what it fetched in a reference named FETCH_HEAD, which points to the tip of the fetched branch.
* The second step runs git merge on your current branch, using FETCH_HEAD to merge in all the changes from the corresponding branch on the remote.

The following are the commands mentioned in the chapter:

Command   Description
git pull   Sync your repository with the repository you cloned from; this consists of git fetch followed by git merge
git fetch   The first part of git pull: bring in new commits from the remote repository and update the remote-tracking branches
git merge FETCH_HEAD   Merge the commits recorded in FETCH_HEAD into the current branch
git pull --ff-only   Merge only if it can be done as a fast-forward, i.e., if FETCH_HEAD is a descendant of the current branch
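The two halves of git pull can be run separately, as in this script. Two hypothetical collaborators, alice and bob, share a local bare repository (shared.git). Every name here is an assumption made for the demo.

```shell
set -e
cd "$(mktemp -d)"
# Placeholder identity (assumption)
export GIT_AUTHOR_NAME="RK" GIT_AUTHOR_EMAIL="rk@example.com"
export GIT_COMMITTER_NAME="RK" GIT_COMMITTER_EMAIL="rk@example.com"
git init -q --bare shared.git
git --git-dir=shared.git symbolic-ref HEAD refs/heads/master

# Alice publishes the first version
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master
echo "v1" > doc.txt
git add doc.txt && git commit -q -m "v1"
git remote add origin ../shared.git
git push -q -u origin master
cd ..
git clone -q shared.git bob          # Bob starts from v1

# Alice pushes v2; Bob syncs in two explicit steps
cd alice
echo "v2" > doc.txt && git commit -q -a -m "v2" && git push -q
cd ../bob
git fetch -q                         # downloads v2 and moves origin/master; working files untouched
before_merge=$(cat doc.txt)          # still v1
git log --oneline -1 FETCH_HEAD      # the fetched tip: v2
git merge -q FETCH_HEAD              # second half of a pull: fast-forwards master to v2

# Alice pushes v3; this time Bob pulls, allowing only a fast-forward
cd ../alice
echo "v3" > doc.txt && git commit -q -a -m "v3" && git push -q
cd ../bob
git pull -q --ff-only
```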


Chapter 15 - Software archaeology

This chapter explains in detail the various switches that go with the git log command, and how to configure gitk's views.

The following are the commands mentioned in the chapter:

Command   Description
git log --merges   List commits that are the result of merges
git log --oneline FILE   List commits that affect FILE
git log --grep=STRING   List commits that have STRING in the commit message
git log --since MM/DD/YYYY --until MM/DD/YYYY   List commits between two dates
git shortlog   Summarize commits by author
git shortlog -e   Summarize commits by author, including email addresses
git log --author=AUTHOR   List commits by AUTHOR
git log --stat HEAD^..HEAD   Show the most recent commit, i.e., the changes between HEAD and its immediate parent
git branch --column   List the branches in columns
git name-rev SHA1   Given a SHA1, show a name for it relative to a branch or tag (for example master~2)
git grep STRING   Search the tracked files for STRING
git blame FILE   Show, for each line of FILE, the commit and author that last changed it
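Here are a few of the archaeology commands, run on a tiny history with two hypothetical authors. Alice and Bob are set per commit through GIT_AUTHOR_NAME and GIT_AUTHOR_EMAIL. Note that the script passes HEAD to git shortlog explicitly. Without a revision, shortlog reads a log from standard input whenever that is not a terminal, which would make it wait inside a script.

```shell
set -e
cd "$(mktemp -d)"
# Placeholder committer; authors are set per commit below (assumption)
export GIT_COMMITTER_NAME="RK" GIT_COMMITTER_EMAIL="rk@example.com"
git init -q
git symbolic-ref HEAD refs/heads/master

echo "alpha" > notes.txt
git add notes.txt
GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com git commit -q -m "Add alpha notes"
echo "beta" >> notes.txt
GIT_AUTHOR_NAME=Bob GIT_AUTHOR_EMAIL=bob@example.com git commit -q -a -m "Add beta, fix typo"

git log --oneline --grep=typo        # commits whose message mentions "typo"
git log --oneline --author=Alice     # commits authored by Alice
git shortlog -s -n -e HEAD           # commit count per author, with email
git log --stat HEAD^..HEAD           # only the latest commit, with its file summary
git name-rev --name-only HEAD~1      # master~1
git grep beta                        # notes.txt:beta
git blame notes.txt                  # which commit and author last touched each line
```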


Chapter 16 - Understanding git rebase

Your local branch often falls out of sync with the remote master because collaborators keep pushing to it, and if you then try to push your branch, Git will complain. One way to deal with this situation is git rebase, which rewrites your branch's history. After you have brought your copy of master up to date, rebase takes the commits on your branch that are not on master and replays them as descendants of master's latest commit. The most important reason for using git rebase is to change the starting point of your local branches. If you rebase by accident, you can use git reflog to find the commit your branch pointed to before the rebase, and git reset --hard to move back to it. The chapter concludes by introducing git cherry-pick, which copies a specific commit onto the current branch.

The following are the commands mentioned in the chapter:

Command   Description
git log --oneline master..new_feature   Show the commits on new_feature that are not on master
git rebase master   Replay the commits of your current branch on top of the latest commit on master
git reflog   Display the reflog
git reset --hard HEAD@{4}   Reset the current branch and the working directory to the commit represented by HEAD@{4}, where HEAD was four moves ago
git cherry-pick SHA1_ID   Copy the commit SHA1_ID onto the current branch
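The script below rebases a feature branch onto a master that has moved on, undoes the rebase, and then copies the commit with cherry-pick instead. It does not count reflog entries as in HEAD@{4}. Instead it resets to ORIG_HEAD, which git rebase sets to the branch tip from before the rebase. The branch name new_feature follows the table, and the file names are placeholders.

```shell
set -e
cd "$(mktemp -d)"
# Placeholder identity (assumption)
export GIT_AUTHOR_NAME="RK" GIT_AUTHOR_EMAIL="rk@example.com"
export GIT_COMMITTER_NAME="RK" GIT_COMMITTER_EMAIL="rk@example.com"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "base" > base.txt
git add base.txt && git commit -q -m "Base"

git checkout -q -b new_feature
echo "feature" > feature.txt
git add feature.txt && git commit -q -m "Add feature"

# Meanwhile a collaborator's commit lands on master
git checkout -q master
echo "fix" > fix.txt
git add fix.txt && git commit -q -m "Fix on master"

git checkout -q new_feature
git log --oneline master..new_feature   # Add feature: the only commit master lacks
before=$(git rev-parse HEAD)
git rebase -q master                 # replay "Add feature" on top of "Fix on master"
rebased=$(git rev-parse HEAD)        # a new SHA1: the replayed commit has a new parent
git log --oneline --graph            # a straight line: Add feature, Fix on master, Base

# Undo the rebase: the reflog lists where HEAD has been,
# and ORIG_HEAD holds the branch tip from before the rebase
git reflog -n 5
git reset -q --hard ORIG_HEAD

# Alternatively, copy just that one commit onto master
git checkout -q master
git cherry-pick new_feature
```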


Chapter 17 - Workflows and branching conventions

This chapter discusses the unwritten rules, policies and conventions around Git:

  • Try to keep the Git commit subject under 50 characters
  • It might make sense to limit which users are given rights to push code
  • Standardize branch names
  • Decide whether to use git rebase based on whether you need to preserve the exact history of every commit
  • Standardize tag names

The author explains two workflows that are popular among Git users:

  • git-flow: There are two main branches in a git-flow repository, master and develop. Other branches, such as feature and release branches, are created temporarily and deleted when finished. The master branch contains released, production-level code. This is what the public sees, perhaps on a deployed website or in released software they've downloaded from you. The develop branch contains code that is about to be released.
  • GitHub flow: There is one master branch that lives forever. Feature branches are created whenever required, and once a feature is developed, it is merged into the master branch. Unlike the git-flow workflow, the branches are not deleted in this type of workflow.

The following are the commands mentioned in the chapter:

Command   Description
git commit --allow-empty -m "Initial commit"   Create a commit without adding any files
git merge --no-ff BRANCH   Merge BRANCH into the current branch, creating a merge commit even if a fast-forward is possible
git flow   A Git command that becomes available after installing git-flow
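To contrast --no-ff with an ordinary fast-forward, this script starts from an empty initial commit. It then merges one branch with --no-ff and another without it. The branch names develop and hotfix are hypothetical, loosely borrowed from git-flow's vocabulary.

```shell
set -e
cd "$(mktemp -d)"
# Placeholder identity (assumption)
export GIT_AUTHOR_NAME="RK" GIT_AUTHOR_EMAIL="rk@example.com"
export GIT_COMMITTER_NAME="RK" GIT_COMMITTER_EMAIL="rk@example.com"
git init -q
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Initial commit"   # a root commit containing no files

git checkout -q -b develop
echo "feature" > feature.txt
git add feature.txt && git commit -q -m "Add feature"
git checkout -q master
git merge -q --no-ff --no-edit develop   # master could fast-forward, but a merge commit is forced

git checkout -q -b hotfix
echo "fix" > fix.txt
git add fix.txt && git commit -q -m "Add fix"
git checkout -q master
git merge -q hotfix                  # plain merge: master just moves forward, no merge commit

git log --oneline --graph            # the develop work stays grouped under its merge commit
```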


Chapter 18 - Working with GitHub

GitHub is a service that hosts Git repositories. These repositories are typically bare directories containing all the version-control files and folders. You create a GitHub repository through a UI on the GitHub website, and once the bare repository is created, it is ready to use. You add its URL as a remote with the git remote add command. From then on it is the same as communicating with any other remote repository: push, fetch, merge and pull all work the same way.

The power of GitHub lies in widespread collaboration on a single project. If you want to contribute to a GitHub repository XYZ, the first thing to do is fork it. A fork creates a copy of XYZ under your account that serves as your own space to play with the entire repository. You can clone it to your local machine, hack on it, develop new code, etc. There is one key point to keep in mind. All the changes you push to GitHub will be present only in your fork. They will not be reflected in the original XYZ repository unless you send a pull request to XYZ's owner and the owner accepts it. GitHub has a convenient UI for sending pull requests to owners. It also has many features that help owners keep track of pull requests, maintain a wiki and much more.

The following are the commands mentioned in the chapter:

Command   Description
git remote add github https:///   Add a remote named github that points to your math repository on GitHub
git push -u github master   Push your master branch to the remote named github and set it as the upstream
git clone http:///   Clone your GitHub repository named math


Chapter 19 - Third Party Tools and Git

I skipped this chapter because I don't foresee using, at least in the near future, the third-party tools it covers, i.e., Atlassian's SourceTree and the Eclipse IDE integration.


Chapter 20 - Sharpening your Git

This chapter urges the reader to explore Git's configuration files. Config options can be set at three levels: local (repository), global (user) and system. The switches used to access the three levels are --local, --global and --system. The more specific level wins, so a local setting overrides a global one, which in turn overrides a system one. Each setting is stored as a name = value pair. The author explains how to configure various editors, such as Notepad++ and nano, for use with Git. He concludes the chapter with some general directions for continuing to learn Git.

The following are the commands mentioned in the chapter:

Command   Description
git config --local --list   List the local Git configuration
git config --global --list   List the global Git configuration
git config --system --list   List the system Git configuration
git -c log.date=relative log -n 2   Show the last two commits using the relative date format
git config --local log.date relative   Save the relative date format in the local Git configuration
git config --local --edit   Edit the local Git configuration
git config --global --edit   Edit the global Git configuration
git config --system --edit   Edit the system Git configuration
git -c core.editor=echo config --local --edit   Print the path of the local Git configuration file
git -c core.editor=nano config --local --edit   Edit the local Git configuration file using nano
git config core.excludesfile   Print the value of the core.excludesfile setting
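This script explores the three configuration levels and the one-off -c override. As with any experiment involving --global, it assumes a scratch HOME so your real settings are untouched. log.date is just a convenient setting to demonstrate precedence.

```shell
set -e
# Scratch HOME so --global never touches your real ~/.gitconfig (assumption for the demo)
export HOME="$(mktemp -d)"
cd "$HOME"
git init -q project
cd project

git config --global log.date iso     # global level: ~/.gitconfig
git config --local log.date relative # local level: .git/config
git config log.date                  # relative: the local value overrides the global one
git config --global --list           # log.date=iso
git config --local --list            # core.* entries plus log.date=relative

git -c log.date=short config log.date   # short: -c overrides everything, for this command only
path=$(git -c core.editor=echo config --local --edit)   # the "editor" just echoes the file path
echo "$path"
```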

 

Takeaway:

This is an excellent book for learning Git if you are short on time. Each chapter takes an hour, and depending on your needs, you can select the relevant chapters, read them, practice the lab exercises and become a moderately skilled Git user. Highly recommended for a Git newbie.