Contributing to Open Source: My Attempt to Improve Input Validation in fancyflags

Sanat Kulkarni

How it started

I recently came across the fancyflags open-source project by Google DeepMind. The project is a Python library that extends absl.flags with additional structured flag types. It provides flags that correspond to Python structures such as dicts, dataclasses, and arbitrary callables. In a basic sense, these structured flags catch errors before they propagate into your program.

This project piqued my interest because in past Python projects that grew into considerable codebases, I kept running into the same frustrating pattern: passing around massive configuration dictionaries or scattering dozens of command-line arguments throughout my code. fancyflags caught my attention because it bridges the gap between simple flags and complex configuration structures.

The concept of dotted flags like replay.capacity and replay.exponents.priority felt really intuitive, and I appreciate the project a lot.
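
As a quick sketch of what this looks like (loosely based on the project's README example, written from memory - the exact item helpers such as ff.Integer and ff.Float may differ in detail):

import fancyflags as ff

# Defines the dotted flags --replay.capacity, --replay.exponents.priority
# and --replay.exponents.importance; values overridden on the command line
# are gathered back into a single nested dict under the name "replay".
ff.DEFINE_dict(
    "replay",
    capacity=ff.Integer(1_000_000, "Maximum replay capacity."),
    exponents=dict(
        priority=ff.Float(0.6, "Priority exponent."),
        importance=ff.Float(0.4, "Importance sampling exponent."),
    ),
)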

Understanding its design philosophy

fancyflags embodies a philosophy of selective complexity - it acknowledges that “in many cases a few regular absl flags are all you need” while providing a structured alternative when complexity is genuinely warranted. The library doesn’t try to replace simple command-line arguments with over-engineered solutions; instead, it promotes “mixing with regular absl.flags” and only steps in when you’re dealing with structured data like dictionaries, dataclasses, or complex configurations.

The library deliberately avoids becoming a dependency injection framework, preferring that users write regular Python for wiring up their code, because it’s explicit, simple to understand, and allows static analysis tools to identify problems. The philosophy treats configuration as data transformation rather than runtime magic, making the code more predictable and debuggable.

It establishes clear boundaries about where and how it should be used. The library should only be used in the main file and does not require the developer to modify library code - this constraint prevents configuration concerns from leaking throughout the codebase (imagine breaking the rest of the codebase just because you wanted richer flags in your main file). By integrating seamlessly with absl.flags and maintaining the same validation and error-handling patterns, it allows developers and teams to adopt structured configuration incrementally without disrupting existing flag usage.
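
To make that boundary concrete, here is a minimal sketch of what a main file might look like, mixing one regular absl flag with a fancyflags dict. This assumes ff.DEFINE_dict returns a flag holder with a .value, like the regular absl DEFINE_* functions; the flag names here are my own invention:

from absl import app
from absl import flags

import fancyflags as ff

# A plain absl flag and a structured fancyflags dict live side by side in
# the main file; library code elsewhere never needs to know about flags.
_SEED = flags.DEFINE_integer("seed", 0, "Random seed.")
_REPLAY = ff.DEFINE_dict(
    "replay",
    capacity=ff.Integer(1_000_000, "Maximum replay capacity."),
)


def main(argv):
    del argv  # Unused.
    # Wiring stays plain Python: the parsed values are passed on explicitly,
    # so static analysis tools can still follow the data flow.
    replay_settings = _REPLAY.value
    print("seed:", _SEED.value, "replay:", replay_settings)


if __name__ == "__main__":
    app.run(main)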

Setting up the Project

Setting up the library was fairly simple. It just involved forking the official Google DeepMind repository on GitHub, cloning it onto my local system, creating a Python virtual environment, and setting up the project according to the commands given in the README.

Once I had the project set up, I began researching and exploring the codebase and started noticing patterns that seemed worth exploring. Rather than immediately proposing changes, I spent time understanding the existing architecture - how ff.DEFINE_dict generates dotted flag names, how the ff.Item abstraction works, and how the library integrates with absl.flags under the hood.

I began noticing areas where the codebase could genuinely benefit from attention. Some internal functions in _argument_parsers.py lacked comprehensive docstrings, making it harder to understand the parsing logic. Type annotations in _auto.py were sometimes incomplete, which could impact the static analysis tools that the library explicitly supports. There were TODO comments scattered throughout the code indicating areas that the maintainers themselves had identified for improvement. These were incremental improvements that any mature codebase benefits from.
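
To give a sense of the kind of gap I mean, here is a hypothetical helper - invented for illustration, not actual fancyflags code - written the way I would want to see it, with the annotations and docstring that make both readers and static analysis tools happier:

from typing import Any, Dict


def _flatten(nested: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flattens a nested dict into dotted-name form, e.g. {"replay.capacity": 1}."""
    result: Dict[str, Any] = {}
    for key, value in nested.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested dicts, extending the dotted prefix.
            result.update(_flatten(value, prefix=name))
        else:
            result[name] = value
    return result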

Contributing to the Codebase

My first contribution was motivated by what I perceived as a security vulnerability in the argument parsers. Looking at the code in _argument_parsers.py, I saw ast.literal_eval being used to parse user input.

try:
    result = ast.literal_eval(argument)
except (ValueError, SyntaxError) as e:
    raise ValueError(
        "Failed to parse \"{}\" as a python literal: {}".format(
            argument, e)
    ) from e

This immediately set off alarm bells in my mind. I thought, “What if someone passes malicious code through the command line itself?” So I implemented comprehensive input validation.

# _MAX_STRING_LENGTH, _MAX_DEPTH and BASIC_SEQUENCE_TYPES are module-level
# constants referenced by these helpers.
def _validate_string_input(argument: str) -> None:
    """Validates string input before parsing."""
    if len(argument) > _MAX_STRING_LENGTH:
        raise ValueError(
            f"Input string too long ({len(argument)} chars). "
            f"Maximum allowed: {_MAX_STRING_LENGTH}"
        )

    # Checking for potentially dangerous patterns here
    dangerous_patterns = [
        '__import__',
        'exec',
        'eval',
        'open',
        'file'
    ]

    for pattern in dangerous_patterns:
        if pattern in argument:
            raise ValueError(
                f"Input contains potentially unsafe pattern: '{pattern}'"
            )

def _validate_sequence_depth(obj, current_depth=0):
    """Validates sequence nesting depth."""
    if current_depth > _MAX_DEPTH:
        raise ValueError(
            f"Sequence nesting too deep (depth > {_MAX_DEPTH})"
        )

    if isinstance(obj, BASIC_SEQUENCE_TYPES):
        for item in obj:
            _validate_sequence_depth(item, current_depth + 1)

I also added extensive test coverage for these validation functions:

class TestSequenceParserValidation:
    def test_dangerous_patterns(self):
        parser = SequenceParser()
        with pytest.raises(ValueError, match="unsafe pattern"):
            parser.parse("__import__('os').system('ls')")

        with pytest.raises(ValueError, match="unsafe pattern"):
            parser.parse("exec('print(1)')")

    def test_excessive_nesting(self):
        parser = SequenceParser()
        nested = "[" * 15 + "1" + "]" * 15
        with pytest.raises(ValueError, match="nesting too deep"):
            parser.parse(nested)

After submitting my pull request

Once I had made the changes in my code (+220, -12 lines added and removed), I submitted a detailed pull request explaining what I had observed and the changes I had made accordingly.

I received a pretty swift and educational response from the maintainers of the project (Senior Scientists at Google DeepMind). They pointed out that ast.literal_eval is specifically designed to be safe: it only evaluates literal expressions (strings, numbers, tuples, lists, dicts, sets, booleans and None) and already rejects all the “dangerous patterns” I was trying to detect in my contribution.

# What ast.literal_eval actually allows
import ast

# These work fine
ast.literal_eval("[1, 2, 3]")        # [1, 2, 3]
ast.literal_eval("{'a': 1}")         # {'a': 1}
ast.literal_eval("(1, 2, 3)")        # (1, 2, 3)

# These are already blocked by ast.literal_eval
ast.literal_eval("__import__('os')")  # ValueError: malformed node
ast.literal_eval("exec('print(1)')")  # ValueError: malformed node

What I have learnt from making the contribution

This was my first contribution to a major open source project. It taught me that open source work often involves unglamorous but essential improvements. Adding proper docstrings, enhancing type annotations, and addressing technical debt might not be as exciting as adding new features, but they directly improve the developer experience and maintainability of the codebase. Each small improvement makes the library more accessible to future contributors and users, aligning with its philosophy of being simple to understand. I enjoyed the process a lot; it introduced me to a major codebase and Google’s engineering philosophy for the first time. It feels like I’m one step closer to industry-standard practices and growing my knowledge day by day :)

Thank you for reading!
