Automate Your Tech Blogging: Introducing the GitHub to Blog Generator

Nikhil Ikhar
7 min read

As developers, we pour our hearts into building amazing projects. But when it comes to sharing our creations with the world through a blog post, it often feels like a whole new project in itself. Explaining the architecture, highlighting key features, and writing engaging content can be incredibly time-consuming, often taking a back seat to actual coding.

What if you could instantly transform your GitHub repository, a specific folder, or even a single file into a comprehensive and engaging technical blog post? That's exactly the problem the GitHub to Blog Generator aims to solve.

Introducing the GitHub to Blog Generator

The GitHub to Blog Generator is an AI-powered web application designed to revolutionize how developers document and share their work. Simply paste a public GitHub URL into the intuitive interface, and the application springs into action:

  1. Fetches the relevant code from your specified GitHub location.

  2. Analyzes its purpose and structure.

  3. Generates a well-structured, insightful, and ready-to-publish blog post in Markdown format.

It's your personal technical writer, available 24/7, ready to turn your code into compelling narratives.

Diving Under the Hood: How It Works

This project is a fantastic example of integrating powerful APIs and modern web technologies to create a truly useful tool. Let's break down its core components and clever architecture.

The Intelligent GitHub Fetcher

At the heart of the application is its ability to intelligently consume code from GitHub. This involves several critical steps:

1. URL Parsing

First, the application needs to understand what you've pasted. The parseGithubUrl utility efficiently dissects a GitHub URL, extracting the owner, repo, and the specific path (which could be a root, a folder, or a file). This ensures the subsequent API calls target the correct content.

// utils/urlParser.ts
export interface ParsedUrl {
  owner: string;
  repo: string;
  path: string;
}

export function parseGithubUrl(url: string): ParsedUrl | null {
  try {
    const parsed = new URL(url);
    if (parsed.hostname !== 'github.com') {
      return null;
    }

    const pathParts = parsed.pathname.slice(1).split('/').filter(p => p);

    if (pathParts.length < 2) {
      return null;
    }

    const owner = pathParts[0];
    const repo = pathParts[1];

    let pathIndex = 2; 

    if (pathParts[2] === 'blob' || pathParts[2] === 'tree') {
        pathIndex = 4; // Skip 'blob' or 'tree' and the branch name
    }

    const path = pathParts.slice(pathIndex).join('/');

    return { owner, repo, path };

  } catch (error) {
    console.error("URL parsing failed:", error);
    return null;
  }
}
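
To make the behavior concrete, here is an illustrative call (the file path and branch name below are hypothetical, chosen only to show the parsing result):

// Illustrative usage -- the branch and file path are assumptions
parseGithubUrl('https://github.com/nik-hil/github-to-blog-generator/blob/main/utils/urlParser.ts');
// => { owner: 'nik-hil', repo: 'github-to-blog-generator', path: 'utils/urlParser.ts' }

parseGithubUrl('https://github.com/nik-hil/github-to-blog-generator');
// => { owner: 'nik-hil', repo: 'github-to-blog-generator', path: '' }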

2. Breadth-First Content Fetching

Once the URL is parsed, the fetchRepoContents function (or fetchRepoContentsBFS on the server side) takes over. It performs a breadth-first search (BFS) traversal of the specified GitHub path using a queue, and it handles:

  • Directories: Adds each subdirectory to the queue so the entire tree under the starting path is explored.

  • Files: Fetches the content of each file.

  • Smart Filtering: Crucially, it excludes non-source files (images, .lock files, build artifacts) and commonly ignored directories (e.g., node_modules, dist). This keeps the AI's context clean and relevant instead of bogged down by irrelevant data (a sketch of these exclusion lists follows the code below).

  • Concatenation: All relevant source code is concatenated into a single string, with each file prefixed by a --- File: path --- header, ready for the AI model.

// services/githubService.ts (similar logic in pages/api/generate-blog.ts)
async function fetchRepoContents(owner: string, repo: string, startPath: string, token?: string): Promise<string> {
    let allContent = '';
    let fileCount = 0;
    const queue: string[] = [startPath || ''];
    const visited = new Set<string>(queue);

    while (queue.length > 0 && fileCount < MAX_FILES_TO_FETCH) {
        const currentPath = queue.shift()!;
        // ... (API call and sorting logic omitted for brevity) ...

        for (const item of items) {
            // ... (fileCount check and directory handling omitted) ...
            if (item.type === 'file') {
                const itemNameLower = item.name.toLowerCase();
                const isExcluded = EXCLUDED_FILES.includes(itemNameLower) || EXCLUDED_EXTENSIONS.some(ext => itemNameLower.endsWith(ext));
                if (!isExcluded) {
                    try {
                        const fileData = await fetchFromGithubApi<GithubContent>(`/repos/${owner}/${repo}/contents/${item.path}`, token);
                        if (fileData.content) {
                            const decodedContent = atob(fileData.content); // Buffer.from for server-side
                            allContent += `--- File: ${item.path} ---\n\n${decodedContent}\n\n`;
                            fileCount++;
                        }
                    } catch (e) {
                        console.warn(`Could not fetch content for file: ${item.path}`, e);
                    }
                }
            }
        }
    }
    return allContent;
}
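
The exclusion lists and the fetchFromGithubApi helper referenced above live elsewhere in the service. Here is a minimal sketch of what they might look like; the exact entries and the helper's signature are assumptions, not the repository's actual values, beyond the fact that the GitHub REST contents API is being called:

// Hypothetical values -- the real lists in the repo may differ
const MAX_FILES_TO_FETCH = 50;
const EXCLUDED_FILES = ['package-lock.json', 'yarn.lock', '.gitignore'];
const EXCLUDED_EXTENSIONS = ['.png', '.jpg', '.svg', '.ico', '.lock', '.map'];
const EXCLUDED_DIRS = ['node_modules', 'dist', 'build', '.git'];

// Thin wrapper over the GitHub REST API; the token is optional for public repos
async function fetchFromGithubApi<T>(endpoint: string, token?: string): Promise<T> {
  const headers: Record<string, string> = { Accept: 'application/vnd.github+json' };
  if (token) headers.Authorization = `Bearer ${token}`;

  const res = await fetch(`https://api.github.com${endpoint}`, { headers });
  if (!res.ok) {
    throw new Error(`GitHub API request failed: ${res.status} ${res.statusText}`);
  }
  return res.json() as Promise<T>;
}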

This meticulous preparation ensures the AI receives a focused and high-quality input, leading to more accurate and relevant blog posts.

AI-Powered Content Generation with Gemini

Once the code content is fetched and formatted, it's handed over to the Google Gemini API. The generateBlogPost function orchestrates this interaction, crafting a highly specific prompt that guides the AI on exactly what to produce.

The prompt acts as a detailed instruction manual for the AI, dictating:

  • The role of the AI ("expert technical writer").

  • The required structure (Title, Introduction, Core Concepts, Code Highlights, Conclusion).

  • Specific formatting rules (H1 for title, H2 for sections, code blocks with language identifiers, use of bold/italics).

  • Emphasis on selectivity for code snippets, avoiding full file dumps.

This prompt engineering is crucial for eliciting high-quality, structured output from the LLM.

// services/geminiService.ts (similar prompt in pages/api/generate-blog.ts)
export async function generateBlogPost(codeContent: string, modelName: string): Promise<string> {
  // ... (API key validation omitted) ...

  const prompt = `
You are an expert technical writer and blogger. Based on the following code and file structure from a GitHub project, write a comprehensive and engaging blog post in Markdown format.

**The blog post must have:**
1.  **A Catchy Title:** An engaging H1 heading (e.g., '# My Awesome Project') that reflects the project's purpose.
2.  **An Introduction:** Briefly explain the problem the project solves and what the code does.
3.  **Core Concepts/Features:** A detailed breakdown of the main logic, architecture, or key features. Explain *why* the code is written the way it is.
4.  **Code Highlights:** Include relevant code snippets (using Markdown code blocks with the correct language identifier like \`\`\`typescript) to illustrate your points. Do not just dump entire files. Be selective.
5.  **A Conclusion:** Summarize the project's value and suggest potential next steps or use cases.

**Formatting Rules:**
- The entire output MUST be a single block of valid Markdown.
- Use H2 headings ('##') for main sections like 'Introduction', 'Core Features', 'Conclusion'.
- Use lists for features or steps.
- Use bold and italics to emphasize key terms.

Here is the concatenated content from the project's files:
---
${codeContent}
---
`;

  const response = await ai.models.generateContent({
    model: modelName, // e.g., "gemini-2.5-flash"
    contents: prompt,
  });

  return response.text;
}
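
Putting the two services together, the client-side flow roughly amounts to the following. This is a simplified sketch, not the exact call sites in the app, and the variable names are illustrative:

// Simplified sketch of the client-side pipeline (variable names are illustrative)
const url = 'https://github.com/nik-hil/github-to-blog-generator';
const githubToken = undefined; // optional token for higher GitHub rate limits

const parsed = parseGithubUrl(url);
if (!parsed) throw new Error('Please enter a valid GitHub URL.');

const codeContent = await fetchRepoContents(parsed.owner, parsed.repo, parsed.path, githubToken);
const blogPost = await generateBlogPost(codeContent, 'gemini-2.5-flash');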

Seamless User Experience & Hybrid Architecture

The frontend, built with React and Next.js (for server-side rendering capabilities and API routes), provides a clean and responsive user interface powered by Tailwind CSS.

Key UI components include:

  • GithubInputForm: A simple input field for the GitHub URL and a "Generate Blog" button.

  • Loading States: Visual feedback (spinners, stage messages like "Parsing GitHub URL...") keeps the user informed during the generation process.

  • BlogOutput: Displays the generated Markdown, offering a convenient toggle between a rendered Markdown preview (using react-markdown and remark-gfm) and the raw Markdown text, along with a copy button. A simplified sketch of this component appears just below.
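
The preview toggle essentially boils down to react-markdown with the GFM plugin. The component and prop names below are assumptions for illustration, not the exact ones in the repo:

// Hypothetical BlogOutput sketch -- prop and state names are assumptions
import { useState } from 'react';
import ReactMarkdown from 'react-markdown';
import remarkGfm from 'remark-gfm';

function BlogOutput({ markdown }: { markdown: string }) {
  const [showRaw, setShowRaw] = useState(false);

  return (
    <div>
      <button onClick={() => setShowRaw(!showRaw)}>
        {showRaw ? 'Show Preview' : 'Show Raw Markdown'}
      </button>
      <button onClick={() => navigator.clipboard.writeText(markdown)}>Copy</button>
      {showRaw ? (
        <pre>{markdown}</pre>
      ) : (
        <ReactMarkdown remarkPlugins={[remarkGfm]}>{markdown}</ReactMarkdown>
      )}
    </div>
  );
}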

A notable architectural decision is the hybrid approach to API key handling and AI generation:

  • Default (Server-Side): By default, the application routes requests through a Next.js API endpoint (/api/generate-blog). This is the recommended and most secure approach, as your Gemini API key and GitHub token (if configured) reside on the server and are never exposed to the client. The server handles fetching and AI generation; a condensed sketch of this route follows the client-side excerpt below.

  • Optional (Client-Side with User Keys): The SettingsPanel allows advanced users to provide their own GitHub token and LLM API key/model name. If these are provided, the generation process happens entirely client-side. This gives users greater control and flexibility, especially for testing different models or bypassing default rate limits, while still ensuring their keys are only used from their browser session.

// pages/index.tsx (excerpt from handleGenerate function)
const handleGenerate = useCallback(async () => {
    // ... (input validation, loading state setup omitted) ...

    // Client-side generation if API key is provided in settings
    if (apiKey) {
        // ... (client-side fetchRepoContentsClient and generateBlogPostClient calls) ...
    } else {
        // Server-side generation (default flow)
        setLoadingStage('Preparing request to server...');
        try {
            const response = await fetch('/api/generate-blog', {
                method: 'POST',
                headers: { 'Content-Type': 'application/json' },
                body: JSON.stringify({ githubUrl: url, githubToken }),
            });
            // ... (server response handling) ...
        } catch (err: unknown) {
            // ... (error handling) ...
        } finally {
            setLoading(false);
            setLoadingStage('');
        }
    }
}, [url, apiKey, modelName, githubToken]);
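
On the server side, the /api/generate-blog route mirrors the same pipeline. Here is a condensed sketch of what such a handler could look like; the response shape, error messages, and environment variable are assumptions, and the real route in the repo includes more validation around its own BFS fetcher and prompt:

// pages/api/generate-blog.ts -- condensed sketch, not the repo's full implementation
import type { NextApiRequest, NextApiResponse } from 'next';
// parseGithubUrl, fetchRepoContentsBFS, and generateBlogPost are assumed in scope
// (the repo keeps similar logic in this file)

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  if (req.method !== 'POST') {
    return res.status(405).json({ error: 'Method not allowed' });
  }

  const { githubUrl, githubToken } = req.body;
  const parsed = parseGithubUrl(githubUrl);
  if (!parsed) {
    return res.status(400).json({ error: 'Invalid GitHub URL' });
  }

  try {
    // Server-side BFS fetch, then Gemini generation with the server's own API key
    const codeContent = await fetchRepoContentsBFS(parsed.owner, parsed.repo, parsed.path, githubToken);
    const blogPost = await generateBlogPost(codeContent, process.env.GEMINI_MODEL ?? 'gemini-2.5-flash');
    return res.status(200).json({ blogPost });
  } catch (err) {
    return res.status(500).json({ error: 'Blog generation failed' });
  }
}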

This dual-path architecture provides both security by default and powerful customization options for those who need them.

Conclusion

The GitHub to Blog Generator is more than just a tool; it's a testament to how AI can empower developers by automating repetitive, creative tasks. It significantly reduces the friction between developing a project and effectively communicating its value to a wider audience.

Imagine:

  • No more hours spent drafting explanations for your open-source contributions.

  • Rapidly generating documentation-like blog posts for internal company projects.

  • Easily creating marketing content for your new library or framework.

This project can evolve further by incorporating:

  • Support for more LLM providers: Expanding beyond Gemini to offer choices like OpenAI's GPT models, Claude, etc.

  • Customizable Prompts: Allowing users to tweak the AI's instructions for more tailored blog posts.

  • Deeper Code Understanding: Integrating static analysis tools to provide even richer insights into specific frameworks, design patterns, or performance considerations within the code.

The future of technical content creation is here, and it's powered by AI, making developers' lives a whole lot easier. Give the GitHub to Blog Generator a try and experience the magic of automated tech blogging for yourself!

GitHub repo

https://github.com/nik-hil/github-to-blog-generator
