Fundamentals of Version Control Systems

Bikash Nishank

1. Introduction to Version Control

What is Version Control?

Version control is a tool that helps keep track of changes made to files over time. Imagine you’re working on a project with multiple versions or edits. Version control allows you to:

  • Track changes: Every change you make is recorded.

  • Collaborate easily: Multiple people can work on the same project without overwriting each other’s work.

  • Undo mistakes: If something goes wrong, you can easily go back to a previous version of the project.

It’s like having a “time machine” for your code or documents, allowing you to restore earlier versions whenever needed.


Why Is Version Control Important?

Version control is important for both individuals and teams:

  1. Prevents Data Loss: By saving multiple versions, you never lose important work. You can always go back to a previous version.

  2. Simplifies Collaboration: When many people work on the same project, version control gives everyone access to the latest version. It helps merge everyone's work and flags conflicts when changes overlap.

  3. Provides History: You can see what changes were made, who made them, and why. This helps in troubleshooting and tracking progress.

  4. Backup and Recovery: If files are accidentally deleted or corrupted, you can easily restore them from the version control system.


2. Types of Version Control Systems

Local Version Control Systems

  • Local Version Control is the simplest form of version control, where all changes are tracked on a single computer. You manually save each version of a file.

    Example:
    You might save a separate copy of a document for each version, like:

    • document_v1.doc

    • document_v2.doc

This method works if you're the only one working on the project, but it has limits:

  • You can't collaborate easily with others.

  • There’s a risk of data loss if your computer crashes.


Centralised Version Control Systems (CVCS)

  • Centralised Version Control stores all project files and their history in one central location (a server). Everyone who works on the project connects to this central server to download (checkout) the latest version of the files. When they make changes, they upload (commit) the new version back to the server.

    Example: Subversion (SVN) is a popular CVCS.

    How it works:

    1. The server holds the main project.

    2. Developers download files from the server, make changes, and upload them back to the server.

Advantages:

  • Simple to use for teams.

  • Central control means easier management.

Disadvantages:

  • If the server goes down, no one can commit, update, or browse history until it is back.

  • Changes can only be committed while connected to the server.


Distributed Version Control Systems (DVCS)

  • Distributed Version Control is more advanced. Instead of relying on one central server, every developer has a complete copy of the entire project and its history on their computer. Even if the server is down, developers can continue working.

    Example: Git is a popular DVCS.

    How it works:

    1. Developers clone (download) the entire project to their computer.

    2. They make changes locally and commit them to their own copy of the project.

    3. Once they’re ready, they push their changes back to a shared repository (like GitHub).

Advantages:

  • Works offline.

  • Faster, as most operations are done locally.

  • No single point of failure (since everyone has the full project history).

Disadvantages:

  • More complex than centralised systems.

  • Requires more disk space since everyone has a full copy of the project.


3. Centralised vs Distributed Version Control

Overview of CVCS (Subversion)

  • Subversion (SVN) is a centralised version control system. It stores all the files and the history of changes on a central server. Developers must connect to this server to access and update the project.

    Key Features:

    1. Checkout and Commit: Developers download (checkout) the latest files and upload (commit) their changes back to the server.

    2. Locking Mechanism: SVN can lock files so that only the lock holder can commit changes to them, which prevents conflicting edits.

    3. History: The server keeps a complete history of all changes, which can be reviewed or rolled back if needed.

Pros:

  • Simple and well-suited for small teams.

  • Easy to manage since everything is in one place.

Cons:

  • Requires a connection to the server to commit, update, or view history.

  • Can be slow for large teams or projects.


Overview of DVCS (Git)

  • Git is a distributed version control system. Each developer has a full copy of the project and its history on their local machine. Changes can be made offline and then shared with others through a process called pushing and pulling.

    Key Features:

    1. Branching: Git makes it easy to create branches (parallel versions of the project) and merge them later. This is great for experimenting with new features without affecting the main project.

    2. Commit Locally: Developers commit changes to their local repository and can push them to the shared repository when ready.

    3. Pull Requests: Hosting platforms built around Git, such as GitHub, let developers propose changes through pull requests (GitLab calls them merge requests), where others can review the changes before merging them.

Pros:

  • Works offline, so developers can continue working even if they can’t access the server.

  • Highly flexible, especially with branching and merging.

  • No single point of failure; even if the server crashes, every developer has a complete backup.

Cons:

  • Steeper learning curve, especially for beginners.

  • Requires more storage since every developer has the full project history.


Key Differences between CVCS and DVCS

| Feature | CVCS (e.g., Subversion) | DVCS (e.g., Git) |
| --- | --- | --- |
| Server Dependency | Must be connected to the central server to work | Can work offline, server not required for most tasks |
| History | Stored only on the central server | Stored locally on every developer’s computer |
| Performance | Slower for large teams/projects, relies on server | Faster, as most operations happen locally |
| Collaboration | Developers must work directly with the central server | Developers can collaborate by sharing changes without a central server |
| Complexity | Simpler, easier to learn | More complex, but offers more flexibility |
| Backup | Risk of losing all data if the server crashes | No single point of failure since everyone has a full copy of the project |

4. Subversion (SVN): A Centralized Version Control System

Setting Up Subversion

  1. Install Subversion:

    • Install Subversion on your computer. For example, on Linux, you can run:

        sudo apt-get install subversion
      
    • On Windows, install a binary package such as TortoiseSVN; the Apache Subversion website lists the available packages.

  2. Create a Repository:

    • After installation, create a new repository using the following command:

        svnadmin create /path/to/repository
      
    • This creates a folder structure where Subversion will store your project files and history.

  3. Accessing the Repository:

    • You can access the repository using an SVN client (such as TortoiseSVN or command-line).

    • The repository can be hosted on your local machine or a remote server for team collaboration.


SVN Repository Structure

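By convention, an SVN repository is organized into three top-level directories. Subversion does not create or enforce them; teams usually add them when setting up a project: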

  1. /trunk: This is where the main codebase resides. Developers usually work from this directory.

  2. /branches: Used to create parallel versions of the project. Developers create branches when they want to experiment with new features without affecting the main codebase.

  3. /tags: This directory holds snapshots of the project at specific points in time. Tags are used to mark important releases (e.g., version 1.0, version 2.0).

This structure helps in organizing the project, especially when multiple versions of the code are being maintained.


Basic SVN Commands

  1. Checkout: To start working on a project, developers need to download a local copy of the repository. This is done using the checkout command:

     svn checkout <repository_url>
    

    This command fetches the latest version of the files from the repository.

  2. Commit: Once changes are made, you can upload them back to the repository with the commit command:

     svn commit -m "Description of changes"
    
  3. Update: Before committing changes, it's important to sync your local copy with the repository. Use the update command:

     svn update
    
  4. Branching: To create a branch, you copy the trunk directory into branches. A URL-to-URL copy is committed immediately, so it needs a log message:

     svn copy <repository_url>/trunk <repository_url>/branches/feature-branch -m "Create feature-branch"
    
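  5. Merging: After working on a branch, you can merge the changes back into the trunk. Run the merge from an up-to-date working copy of the trunk, then commit the result:

     svn merge <repository_url>/branches/feature-branch
     svn commit -m "Merge feature-branch into trunk"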
    

Collaboration in SVN (Locking and Unlocking)

In SVN, files can be locked to prevent multiple people from editing them at the same time, which helps avoid conflicts.

  • Locking a File: To prevent others from editing a file, lock it:

      svn lock <file>
    
  • Unlocking a File: When done, unlock the file so others can edit it:

      svn unlock <file>
    

This ensures only the lock holder can commit changes to that file, reducing the risk of conflicts.


Pros and Cons of Subversion

Pros:

  1. Centralized control: It's easier to manage since all files are in one place.

  2. Simple branching model: Subversion's branching model is simple and easy to use.

  3. Locks: The locking feature prevents conflicts in binary files, which can’t be merged.

Cons:

  1. Server dependency: You must be connected to the server to commit, update, or view history, making it less ideal for offline work.

  2. Slower operations: Large projects with many files may lead to slow server performance.

  3. Merging issues: Merging can sometimes be difficult, especially when multiple people are working on the same files.


5. Git: A Distributed Version Control System

Setting Up Git

  1. Install Git:

    • On Linux, use:

        sudo apt-get install git
      
    • On Windows or macOS, download and install from git-scm.com.

  2. Configure Git:

    • Set up your name and email:

        git config --global user.name "Your Name"
        git config --global user.email "your.email@example.com"
      
  3. Create a Local Repository:

    • To start a new Git project, initialize a repository:

        git init
      

Git Repository Structure

  1. /.git directory: When you initialise a Git repository, Git creates a .git directory that stores all the project history and configuration files.

  2. Working Directory: This is where your files are stored and worked on.

  3. Staging Area: Changes are first added to the staging area before they are committed. It allows you to prepare commits in stages.

  4. Branches: A new repository starts with one default branch, named master or main depending on your Git version and configuration, and you can create new branches for feature development.
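
As a rough illustration of how the working directory, staging area, and history relate, the following sketch creates two files, stages one, and commits it. The temporary directory, file names, and identity values are placeholders chosen for this example, not part of any real project.

```shell
set -e
repo=$(mktemp -d)                      # throwaway directory (assumption for this sketch)
cd "$repo"
git init -q
git config user.name "Example Dev"     # placeholder identity
git config user.email "dev@example.com"

# Two new files appear in the working directory.
echo "first draft" > notes.txt
echo "scratch" > todo.txt

# Only notes.txt is added to the staging area.
git add notes.txt
status=$(git status --short)
echo "$status"

# The commit records what was staged; todo.txt remains untracked.
git commit -q -m "Add notes"
tracked=$(git ls-files)
echo "$tracked"
```

In the short status output, `A` marks a staged file and `??` marks an untracked one.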


Basic Git Commands

  1. Clone: To download a copy of an existing repository, use the clone command:

     git clone <repository_url>
    
  2. Commit: After making changes, you can save them to your local repository with commit:

     git add .
     git commit -m "Description of changes"
    
  3. Push: To send your changes to the remote repository, use push:

     git push origin <branch_name>
    
  4. Pull: To get the latest changes from the remote repository, use pull:

     git pull origin <branch_name>
    

Branching and Merging in Git

  • Branching: Git makes it easy to create separate branches for new features:

      git branch <branch_name>
      git checkout <branch_name>
    
  • Merging: After completing work on a branch, you can merge it into the main branch:

      git checkout main
      git merge <branch_name>
    

Advanced Git Commands

  1. Rebase: Replays the commits of your current branch on top of another branch (<branch_name>), which gives a cleaner, more linear commit history:

     git rebase <branch_name>
    
  2. Stash: Temporarily shelves your uncommitted changes so you can restore them later:

     git stash
    
  3. Revert: Creates a new commit that undoes a specific commit, leaving the rest of the history intact:

     git revert <commit_id>
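
To make the stash command more concrete, here is a minimal sketch that shelves an unfinished edit and then restores it. The repository location, file name, and identity are placeholders assumed for illustration.

```shell
set -e
repo=$(mktemp -d)                      # throwaway directory (assumption for this sketch)
cd "$repo"
git init -q
git config user.name "Example Dev"     # placeholder identity
git config user.email "dev@example.com"

echo "stable" > app.txt
git add app.txt
git commit -q -m "Add app file"

# Start an edit that is not ready to commit.
echo "half-finished change" >> app.txt

# Shelve it; the working directory returns to the last commit.
git stash push -q
clean=$(cat app.txt)
echo "while stashed: $clean"

# Bring the shelved change back when you are ready to continue.
git stash pop -q
restored=$(tail -n 1 app.txt)
echo "after pop: $restored"
```

This is handy when you need to switch branches or pull updates in the middle of a task.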
    

Collaboration in Git (Forking and Pull Requests)

  • Forking: Forking creates a personal copy of someone else's repository, allowing you to experiment without affecting the original project.

  • Pull Requests: After making changes in your fork, you can request to merge them back into the original project through a pull request. This allows others to review your changes before they are merged.


Pros and Cons of Git

Pros:

  1. Distributed: Every developer has the entire project history, so work can continue offline.

  2. Speed: Most operations (commits, branches) are local, making Git fast.

  3. Powerful branching: Git’s branching and merging features are flexible and highly efficient.

Cons:

  1. Steep learning curve: Git can be complex, especially for beginners.

  2. Disk space: Since each developer has a full copy of the repository, it can consume more disk space.

6. Core Concepts in Version Control

Repositories: Local vs Remote

  • Local Repository: A local repository is stored on your own computer. You work on files and commit changes locally, which allows you to keep a version history without needing internet access. For example, when using Git, every project folder with a .git directory is a local repository.

  • Remote Repository: A remote repository is hosted on a server or cloud platform (like GitHub, GitLab, or Bitbucket), enabling team collaboration. Developers can push their changes from local repositories to the remote one, and pull updates from others. This makes remote repositories essential for distributed teams.


Commits and Commit History

  • Commit: A commit is a snapshot of changes made to the files in the repository. When you make a commit, you save the current state of the code, along with a message explaining the changes. Each commit acts as a checkpoint in the project’s history, allowing you to track changes over time.

  • Commit History: The commit history is a timeline of all commits made in a project. It includes details like who made the changes, when, and why (via the commit message). Viewing the commit history helps in understanding how the codebase evolved.
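
The idea of commits as checkpoints can be sketched in a scratch repository: two commits are made, then the history is listed. All names here (directory, file, author identity) are assumptions for the example.

```shell
set -e
repo=$(mktemp -d)                      # throwaway directory (assumption for this sketch)
cd "$repo"
git init -q
git config user.name "Example Dev"     # placeholder identity
git config user.email "dev@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Add app file"

echo "v2" > app.txt
git commit -q -am "Update app file"

# Compact view: one line per commit, newest first.
git log --oneline

# Who, when, and why for each commit.
git log --format='%an | %ad | %s' --date=short

count=$(git rev-list --count HEAD)
latest=$(git log -1 --format=%s)
echo "$count commits, latest: $latest"
```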


Branching: Best Practices and Use Cases

  • Branching: Branching allows you to create an independent line of development separate from the main codebase. Developers use branches to work on new features, bug fixes, or experiments without affecting the main project.

  • Best Practices:

    1. Keep Branches Small and Focused: Don’t make branches too large. Work on specific features or tasks to avoid large, complex merges.

    2. Create Descriptive Branch Names: Names like feature/user-authentication or bugfix/login-issue help others understand the purpose of the branch.

  • Use Cases:

    • Feature Development: Create a branch for each new feature (e.g., feature/login-system).

    • Bug Fixing: Create a branch for each bug fix (e.g., bugfix/header-error).

    • Release Branches: Branches created for managing releases (e.g., release/v1.0).


Merging: Strategies and Conflict Resolution

  • Merging: Merging takes changes from one branch (typically a feature branch) and integrates them into another (like the main branch). Merging is a common practice when a developer finishes work on a branch and wants to bring those changes into the main codebase.

  • Strategies:

    1. Fast-forward Merge: If the main branch hasn’t changed since the feature branch started, Git will simply move the pointer forward, without creating a new commit.

    2. Three-way Merge: If both branches have diverged, Git creates a new commit that combines the changes from both branches.

  • Conflict Resolution: Merge conflicts happen when two people modify the same part of a file differently. When conflicts occur:

    1. Git highlights the conflicting sections.

    2. You manually resolve the conflict by choosing which changes to keep.

    3. After resolving, you commit the merged file.
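
The two merge strategies described above can be observed in a scratch repository. This is only a sketch: the branch names feature-a and feature-b, the file names, and the identity are made up for the example.

```shell
set -e
repo=$(mktemp -d)                      # throwaway directory (assumption for this sketch)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main  # name the first branch "main"
git config user.name "Example Dev"     # placeholder identity
git config user.email "dev@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Case 1: main has not moved, so the merge is a fast-forward.
git checkout -q -b feature-a
echo "feature a" > a.txt
git add a.txt
git commit -q -m "Add feature a"
git checkout -q main
git merge -q feature-a
ff_main=$(git rev-parse main)          # main now points at the same commit
ff_feature=$(git rev-parse feature-a)  # as feature-a; no new commit was made

# Case 2: both branches gain commits, so Git records a merge commit.
git checkout -q -b feature-b
echo "feature b" > b.txt
git add b.txt
git commit -q -m "Add feature b"
git checkout -q main
echo "main work" > m.txt
git add m.txt
git commit -q -m "Add main work"
git merge -q --no-edit feature-b
merge_parents=$(git log -1 --format=%P)   # a merge commit lists two parents
echo "merge commit parents: $merge_parents"
```

After the fast-forward, both branch names resolve to the same commit; after the three-way merge, the newest commit on main has two parents.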


Tags: Marking Versions

  • Tags: Tags mark specific points in the project’s history. They’re used to label important milestones, such as releases (e.g., v1.0, v2.0). Unlike branches, tags are fixed and don't change over time.

  • Use Case: When releasing version 1.0 of a software project, you can create a tag to capture the exact state of the code at that moment. This helps in referring back to that version when needed for updates or bug fixes.
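
A small sketch of tagging a release: the tag stays on the release commit while the branch moves on. The tag name v1.0 follows the example in the text; everything else (directory, file, identity) is assumed for illustration.

```shell
set -e
repo=$(mktemp -d)                      # throwaway directory (assumption for this sketch)
cd "$repo"
git init -q
git config user.name "Example Dev"     # placeholder identity
git config user.email "dev@example.com"

echo "release code" > app.txt
git add app.txt
git commit -q -m "Prepare release"

# Create an annotated tag on the current commit.
git tag -a v1.0 -m "Release 1.0"

# Development continues, but the tag does not move.
echo "next feature" >> app.txt
git commit -q -am "Start next feature"

tagged=$(git rev-parse "v1.0^{commit}")
head=$(git rev-parse HEAD)
git tag --list
git show -s --format='%h %s' "v1.0^{commit}"
```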


Comparing Versions (Diffs)

  • Diffs: A diff shows the differences between two versions of a file. It highlights what lines have been added, modified, or removed. Diffs are essential for code reviews and tracking changes in a project.

  • Usage:

    1. View Diffs Between Commits: Compare the state of the project at two different points in time.

    2. View Diffs Before Committing: Review changes before committing to ensure you’re only including relevant changes.
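
Both uses can be tried in a scratch repository. In this sketch, the file contents and identity are placeholders; `git diff` shows unstaged edits, `git diff --staged` shows what the next commit will contain, and `git diff <commit> <commit>` compares two points in history.

```shell
set -e
repo=$(mktemp -d)                      # throwaway directory (assumption for this sketch)
cd "$repo"
git init -q
git config user.name "Example Dev"     # placeholder identity
git config user.email "dev@example.com"

printf 'line one\nline two\n' > app.txt
git add app.txt
git commit -q -m "Add app file"
first=$(git rev-parse HEAD)

# Edit the file, then review the change before committing.
printf 'line one\nline 2\nline three\n' > app.txt
unstaged=$(git diff)
echo "$unstaged"

git add app.txt
git diff --staged            # what the next commit will contain
git commit -q -m "Edit app file"

# Compare the project at two different commits.
between=$(git diff "$first" HEAD)
echo "$between"
```

Lines starting with `-` were removed and lines starting with `+` were added.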


Rollback and Reverting Changes

  • Rollback: Rolling back refers to reverting the entire codebase to a previous commit, undoing all subsequent changes. This is useful if a major issue arises that requires returning to a stable version.

  • Reverting Changes: Reverting is a more specific operation where you undo individual commits without affecting later commits. This is often safer than a full rollback. For example:

      git revert <commit_id>
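
The difference between the two operations can be sketched as follows. Here `git reset --hard` is used as one common way to roll a branch back; that choice, along with the file names and identity, is an assumption of this example rather than something the text prescribes. Note that a hard reset discards later commits from the branch, so it is best kept to local, unshared history.

```shell
set -e
repo=$(mktemp -d)                      # throwaway directory (assumption for this sketch)
cd "$repo"
git init -q
git config user.name "Example Dev"     # placeholder identity
git config user.email "dev@example.com"

echo "a" > a.txt
git add a.txt
git commit -q -m "Add a"
first=$(git rev-parse HEAD)

echo "b" > b.txt
git add b.txt
git commit -q -m "Add b"
unwanted=$(git rev-parse HEAD)

echo "c" > c.txt
git add c.txt
git commit -q -m "Add c"

# Revert: undo only "Add b" with a new commit; "Add c" is kept.
git revert --no-edit "$unwanted" > /dev/null
after_revert=$(git ls-files | tr '\n' ' ')
commits_after_revert=$(git rev-list --count HEAD)
echo "after revert: $after_revert"

# Rollback: move the branch back to the first commit, dropping everything after it.
git reset -q --hard "$first"
after_reset=$(git ls-files | tr '\n' ' ')
echo "after reset: $after_reset"
```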
    

7. Best Practices for Version Control

Writing Meaningful Commit Messages

  • Why Commit Messages Matter: A clear commit message explains why changes were made, which helps other developers (and your future self) understand the purpose of each change.

  • Best Practices:

    1. Keep It Short and Clear: The subject line should summarize the change in about 50 characters or fewer (e.g., "Fix login issue for user authentication").

    2. Use Imperative Tone: Write commit messages as commands (e.g., "Add feature X" instead of "Added feature X").

    3. Provide Context: Include details in the extended description if needed, like bug numbers or explanations of tricky code changes.
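
As one possible way to apply these practices, the sketch below writes a short imperative subject line and a longer body by passing `-m` twice. The message text, issue number, file name, and identity are all hypothetical.

```shell
set -e
repo=$(mktemp -d)                      # throwaway directory (assumption for this sketch)
cd "$repo"
git init -q
git config user.name "Example Dev"     # placeholder identity
git config user.email "dev@example.com"

echo "redirect fix" > login.txt
git add login.txt

# First -m is the subject line; second -m becomes the extended description.
git commit -q \
  -m "Fix login redirect loop" \
  -m "Users with expired sessions were sent back to the login page repeatedly. Clear the stale session before redirecting. Refs: issue 123 (hypothetical)."

subject=$(git log -1 --format=%s)
body=$(git log -1 --format=%b)
echo "subject (${#subject} chars): $subject"
echo "body: $body"
```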


Branching Strategies (Gitflow, Trunk-based)

  • Gitflow:

    • A popular branching model with long-lived branches for development (develop) and production (main), plus short-lived feature (feature/*), release (release/*), and hotfix (hotfix/*) branches. It is ideal for large projects with defined release cycles.

Workflow:

  1. New features are developed in feature branches.

  2. Once completed, they are merged into the develop branch.

  3. For release, changes are merged from develop to main.

  • Trunk-Based Development:

    • All developers integrate their work into a single branch (the trunk, usually main), either directly or through short-lived feature branches that are merged quickly. Frequent integration, backed by automated tests, keeps the main branch stable.

Best for: Small teams or projects requiring continuous integration.
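
The three Gitflow workflow steps listed above can be sketched in a scratch repository. This is a simplified illustration: it merges develop straight into main without a release branch, uses `--no-ff` so each merge is visible in the history, and all branch contents, the v1.0 tag, and the identity are assumptions of the example.

```shell
set -e
repo=$(mktemp -d)                      # throwaway directory (assumption for this sketch)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main  # name the first branch "main"
git config user.name "Example Dev"     # placeholder identity
git config user.email "dev@example.com"
echo "v0" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git checkout -q -b develop

# 1. Develop the feature on its own branch.
git checkout -q -b feature/login
echo "login form" > login.txt
git add login.txt
git commit -q -m "Add login form"

# 2. Merge the finished feature into develop and remove the branch.
git checkout -q develop
git merge -q --no-ff --no-edit feature/login
git branch -q -d feature/login

# 3. For a release, merge develop into main and tag it.
git checkout -q main
git merge -q --no-ff --no-edit develop
git tag -a v1.0 -m "Release 1.0"

git log --oneline --graph --all
```

In trunk-based development the same feature branch would be merged directly into main, usually within a day or two.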


Code Review and Collaboration

  • Code Review: Reviewing code before merging it into the main branch ensures higher code quality and helps catch bugs early. Teams often use pull requests for code review.

    Process:

    1. Developer submits a pull request for their feature branch.

    2. Reviewers check the code for issues, offer feedback, and request changes if needed.

    3. Once approved, the pull request is merged into the main branch.


Handling Merge Conflicts

  • When Conflicts Occur: Merge conflicts happen when two developers change the same part of a file differently. These conflicts need to be manually resolved before the merge can complete.

  • Best Practices:

    1. Communicate: Notify team members when working on the same files to avoid conflicts.

    2. Resolve Conflicts Early: Address conflicts as soon as they appear to prevent them from growing larger.

    3. Use Tools: Tools such as git mergetool, or the merge view in your editor, help visualize conflicts, making resolution easier.
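
To see what a conflict looks like in practice, the sketch below makes two branches change the same line, lets the merge stop, and then resolves it by hand. The file config.txt, its contents, the branch name feature, and the decision to keep the feature branch's value are all assumptions made for the example.

```shell
set -e
repo=$(mktemp -d)                      # throwaway directory (assumption for this sketch)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main  # name the first branch "main"
git config user.name "Example Dev"     # placeholder identity
git config user.email "dev@example.com"
echo "color = blue" > config.txt
git add config.txt
git commit -q -m "Add config"

# Two branches change the same line in different ways.
git checkout -q -b feature
echo "color = green" > config.txt
git commit -q -am "Use green"
git checkout -q main
echo "color = red" > config.txt
git commit -q -am "Use red"

# The merge stops and leaves conflict markers in the file.
git merge feature || echo "Merge stopped: conflicts need manual resolution"
conflicted=$(git diff --name-only --diff-filter=U)
markers=$(grep -c '^<<<<<<<' config.txt)
cat config.txt

# Resolve by editing the file (here: keep green), then stage and commit.
echo "color = green" > config.txt
git add config.txt
git commit -q -m "Merge feature: keep green"
```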


Version Control in Large Teams

  • Challenges:

    1. Coordination: Ensuring everyone is aware of who is working on what to prevent overlapping changes.

    2. Handling Large Repositories: When the project grows, the repository can become slow or hard to manage.

  • Best Practices:

    1. Use Branching Models: Adopt a branching strategy like Gitflow or trunk-based development to keep the team’s work organized.

    2. Automate Tests: Set up automated testing with each commit to ensure code quality.

    3. Define Clear Rules: Set rules for merging, branching, and handling conflicts so everyone is on the same page.


8. Version Control for Collaboration

Step 1: Working with Multiple Developers

When multiple developers work on the same codebase, conflicts may arise if they try to modify the same files simultaneously. Version control systems (VCS) help in managing these conflicts and ensuring that everyone’s contributions are preserved.

  • Branching: Each developer can create a branch to work on a specific feature or bug fix. A branch is essentially a copy of the code that can be modified independently of the main codebase. Once the work is complete, the changes can be merged back into the main branch.

    • Git Example:

        git checkout -b feature-branch
        # Work on the feature
        git add .
        git commit -m "Add new feature"
        git push origin feature-branch
      
  • Merging: Once a feature or bug fix is completed on a branch, it is merged back into the main branch. This allows multiple developers to work in isolation and then integrate their work when ready.

    • Git Example:

        git checkout main
        git merge feature-branch
      
  • Conflict Resolution: If two developers modify the same part of the code, a conflict arises during the merge process. VCS will prompt the user to manually resolve these conflicts before merging the code.

Step 2: Remote Repositories and Collaboration Platforms

Remote repositories are hosted on platforms like GitHub, GitLab, and Bitbucket, which provide additional tools for collaboration:

  • GitHub: A popular platform with features like pull requests, issue tracking, and GitHub Actions for CI/CD.

  • GitLab: Known for its built-in CI/CD pipeline support, making it easier to manage both source code and automation in one place.

  • Bitbucket: Integrates well with Jira and other Atlassian products, popular among teams using the Atlassian suite.

Remote Repositories Workflow:

  • Developers push their changes to a remote repository.

    • Example:

        git push origin branch-name
      
  • Other team members can pull these changes to their local machines.

    • Example:

        git pull origin main
      

Step 3: Continuous Integration with Version Control

Continuous Integration (CI) ensures that code changes are automatically tested and integrated into the main branch whenever new code is pushed to the repository.

  • Integration with CI tools: Tools like Jenkins, Travis CI, and GitLab CI automatically run tests and deploy applications based on triggers (such as code commits).

    • Example (GitLab CI .gitlab-ci.yml):

        stages:
          - test
          - deploy

        test:
          stage: test
          script:
            - ./run-tests.sh

        deploy:
          stage: deploy
          script:
            - ./deploy.sh
      

Step 4: Pull Requests and Code Review Process

A pull request (PR) is a formal way to propose changes to the codebase. It allows team members to review the code before it is merged, ensuring that it adheres to best practices, doesn’t introduce bugs, and aligns with the overall project goals.

  • Pull Request Workflow:

    1. Developer pushes their changes to a branch.

    2. Developer creates a pull request on the platform (GitHub, GitLab, etc.).

    3. Team members review the code, comment, and request changes if needed.

    4. Once approved, the pull request is merged into the main branch.


9. Choosing Between Subversion and Git

Step 1: When to Use Subversion

Subversion (SVN) is a centralised version control system. It works well in environments that require strict control over the codebase and where a single point of truth is necessary.

  • Use Cases:

    • Large organisations with centralised workflows.

    • Projects that involve a lot of large binary files (e.g., design or media files).

    • Teams that require strict auditing and logging of changes.

  • Advantages:

    • Simpler for users who want a single central repository.

    • More straightforward history management (linear history).

Step 2: When to Use Git

Git is a distributed version control system (DVCS), meaning every developer has a complete copy of the repository on their local machine. It is particularly powerful for teams working remotely, open-source projects, and organizations with complex branching and merging needs.

  • Use Cases:

    • Open-source projects or distributed teams.

    • Projects that require frequent branching and merging.

    • Teams that want offline access to their repositories.

  • Advantages:

    • Better support for branching and merging.

    • Each developer has a local copy of the repository, allowing for offline work.

    • Faster performance for projects with many small commits.

Step 3: Pros and Cons of Subversion and Git

| Subversion | Git |
| --- | --- |
| Centralised, easy for strict control | Distributed, more flexible for remote work |
| Handles large binary files better | Better for merging and branching |
| Slower for large repositories | Faster, especially with smaller commits |
| Linear commit history, simpler | More complex, especially with multiple branches |

10. Version Control in DevOps

Step 1: Integrating Version Control with CI/CD Pipelines

Version control is a crucial part of DevOps, where continuous integration and deployment are automated through CI/CD pipelines. These pipelines trigger builds, tests, and deployments when changes are pushed to the repository.

  • CI/CD Integration:

    1. Code is committed to a Git repository.

    2. The CI/CD pipeline pulls the latest code, builds the application, and runs tests.

    3. If tests pass, the code is deployed to a development or production environment.

  • Example with Jenkins:

    • A Jenkins job is triggered on every commit to build the code, run tests, and deploy it.

Step 2: Automating Code Testing and Deployment

Automation tools like Jenkins, GitLab CI, and CircleCI work alongside version control systems to ensure that code is always tested and ready for deployment.

  • Code Testing Automation: On every commit, test suites are run to verify that the new changes don’t break existing functionality.

  • Automated Deployment: Once the code is successfully tested, it can be automatically deployed to a live environment.

Step 3: The Role of Version Control in DevOps

Version control is at the heart of DevOps, providing a way to:

  • Track changes to the codebase.

  • Automate workflows like testing and deployment.

  • Ensure collaboration between development and operations teams.

In DevOps, Git is the most common choice due to its distributed nature and ease of integration with CI/CD tools.


11. Conclusion

Key Takeaways

  • Version control is essential for managing and collaborating on code across teams.

  • Subversion is useful in centralised, structured environments, while Git is better for distributed, flexible teams.

  • In DevOps, version control integrates with automation tools and CI/CD pipelines, and supports smooth collaboration between development and operations.

The Future of Version Control in Software Development

  • Version control systems are likely to evolve with more support for cloud-based repositories, better handling of large files, and tighter security.

  • AI integration may also assist in automating code reviews, testing, and identifying potential issues based on commit history.

  • Collaboration tools will become more sophisticated, enhancing remote work and distributed development even further.
