Learn How to Navigate Code Structures and Extract Details Using Tree-sitter

If you’ve ever wished you could query code like data, Tree-sitter might be your new best friend.

Whether you're building a code analysis tool, writing an editor extension, or just exploring syntax trees, this guide will help you understand Tree-sitter queries from scratch using real Python examples. Let’s dive in!


What Is Tree-sitter?

Tree-sitter is a parser generator and runtime for building fast, accurate parsers for programming languages. It's used in editors such as Neovim and Zed, and in VS Code extensions, for:

  • Syntax highlighting

  • Structural editing

  • Code navigation

  • Language-aware tools

Tree-sitter Queries

Tree-sitter parses source code into a syntax tree. A Tree-sitter query is a way to search that tree for specific code patterns. Think of it as a search tool that matches on the structure of the code, not just on the words in it.
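To make the difference between searching text and searching structure concrete, here is a minimal sketch. It uses Python's built-in ast module as a stand-in for Tree-sitter (an assumption, chosen so the example runs without extra packages; ast node names differ from Tree-sitter's), and SOURCE is a trimmed, hypothetical view class.

```python
import ast

# Trimmed, hypothetical snippet used only for this illustration.
SOURCE = '''
class UserView:
    def get(self, request):
        user_id = request.GET.get('id')
        return user_id

    def post(self, request):
        return request.data.get('username')
'''

# Plain text search: counts every occurrence of the substring "get",
# including the .get(...) method calls.
text_hits = SOURCE.count("get")

# Structural search: only function definitions whose name is exactly "get".
# (ast is a stand-in here; a Tree-sitter query would play this role.)
structural_hits = [
    node.name
    for node in ast.walk(ast.parse(SOURCE))
    if isinstance(node, ast.FunctionDef) and node.name == "get"
]

print(text_hits)
print(structural_hits)
```

The text search cannot tell a function definition from a method call, while the structural search can, which is the kind of precision a query gives you.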

Let’s say we’re analyzing the following Python code using Tree-sitter:


from rest_framework.views import APIView
from rest_framework.response import Response
from rest_framework import generics, serializers
from django.contrib.auth.models import User


class UserView(APIView):
    def get(self, request):
        user_id = request.GET.get('id')
        if user_id:
            return Response({"user_id": user_id})
        return Response({"error": "User ID missing"}, status=400)

    def post(self, request):
        data = request.data
        username = data.get('username')
        return Response({"username": username})


class UserSerializer(serializers.ModelSerializer):
    class Meta:
        model = User
        fields = ['id', 'username', 'email']


class UserDetailUpdateView(generics.RetrieveUpdateAPIView):
    queryset = User.objects.all()
    serializer_class = UserSerializer
    lookup_field = 'pk'

We will go through various Tree-sitter queries that match parts of this code, so let's begin.

You can practice in the Tree-sitter playground, which is what we will use for the demos here.

Every Tree-sitter query is built from node patterns, so let's go through some of the node types first.

Node Types

Every piece of code is represented as a node in the syntax tree.

Some Examples (Python):

  • identifier

  • call

  • string

  • assignment

  • parameters

  • argument_list

  • return_statement

  • attribute

  • if_statement
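To picture how these node types nest, the sketch below builds a tiny tree by hand and prints it as an S-expression, the same general shape the queries in this article are written in. The Node class and to_sexp helper are hypothetical names made up for this illustration, and the tree is a simplified approximation of what Tree-sitter would produce for `data = request.data`, not actual parser output.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical, simplified stand-in for a syntax-tree node."""
    type: str                          # node type, e.g. "identifier"
    field_name: Optional[str] = None   # role within the parent, e.g. "left"
    children: List["Node"] = field(default_factory=list)


def to_sexp(node: Node) -> str:
    """Render a node and its children as an S-expression string."""
    label = f"{node.field_name}: " if node.field_name else ""
    inner = "".join(" " + to_sexp(child) for child in node.children)
    return f"{label}({node.type}{inner})"


# Hand-built approximation of the tree for:  data = request.data
tree = Node("assignment", children=[
    Node("identifier", "left"),
    Node("attribute", "right", [
        Node("identifier", "object"),
        Node("identifier", "attribute"),
    ]),
])

print(to_sexp(tree))
```

A query is essentially a partial version of such an S-expression, with @captures attached to the nodes you want back.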


identifier

An identifier is a name that the programmer gives to things like variables, functions, classes, or parameters.


(identifier) @var-name


string

In Tree-sitter, a (string) node represents a string literal in the source code — i.e., any value enclosed in quotation marks, like "hello" or 'world'.


(string) @string-val


call

In Tree-sitter, a (call) node represents a function call — when a function is being invoked/executed in the code.


(call
  function: (identifier) @called-func)
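One detail worth noting: because the pattern requires the function field to be an (identifier), it should only capture calls whose callee is a bare name. A method call such as request.GET.get('id') has an (attribute) node in that position instead. The sketch below cross-checks this on a trimmed copy of the sample view using Python's built-in ast module as a stand-in (an assumption for runnability; it is not Tree-sitter and its node names differ).

```python
import ast

# Trimmed copy of the sample view; it is only parsed, never executed.
SOURCE = '''
class UserView(APIView):
    def get(self, request):
        user_id = request.GET.get('id')
        if user_id:
            return Response({"user_id": user_id})
        return Response({"error": "User ID missing"}, status=400)
'''

bare_calls = []    # callee is a plain name, like Response(...)
method_calls = []  # callee is an attribute access, like request.GET.get(...)

for node in ast.walk(ast.parse(SOURCE)):
    if isinstance(node, ast.Call):
        if isinstance(node.func, ast.Name):
            bare_calls.append(node.func.id)
        elif isinstance(node.func, ast.Attribute):
            method_calls.append(node.func.attr)

print(bare_calls)    # the kind of calls the query above targets
print(method_calls)  # calls whose callee is not a bare identifier
```

To capture method calls as well, the function field would need a pattern that also accepts (attribute) nodes.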


assignment

In Tree-sitter, an (assignment) node represents an assignment statement, where a value is stored in a variable.


(assignment
  left: (identifier) @left-var
  right: (_) @right-value)


parameters

In Tree-sitter, a (parameters) node represents the list of parameters that a function accepts.

This query captures each (identifier) inside the parameter list and tags it as @param-name.


(parameters
  (identifier) @param-name)


argument_list

In Tree-sitter, an (argument_list) node represents the list of arguments passed to a function when it's being called.


(argument_list
  (string) @arg)
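This pattern matches strings that are direct children of the argument list, so a string nested inside a dictionary argument or a keyword argument like status=400 would not be captured. As a rough cross-check, the sketch below finds positional string arguments in a few lines adapted from the sample code, again using Python's ast module as a stand-in rather than Tree-sitter itself (the `resp =` assignment is added here only to make the snippet standalone).

```python
import ast

# Lines adapted from the sample code; parsed only, never executed.
SOURCE = '''
user_id = request.GET.get('id')
username = data.get('username')
resp = Response({"error": "User ID missing"}, status=400)
'''

direct_string_args = []
for node in ast.walk(ast.parse(SOURCE)):
    if isinstance(node, ast.Call):
        for arg in node.args:  # positional arguments only
            # Keep string literals passed directly; dict contents are skipped.
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                direct_string_args.append(arg.value)

print(direct_string_args)
```

Note that a Tree-sitter capture covers the whole string node, so the captured text includes the surrounding quotes.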


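return_statement

In Tree-sitter, a (return_statement) node represents a return statement in a function, used to send a value back to the caller.


(return_statement) @return-line


attribute

In Tree-sitter, an (attribute) node represents attribute access on an object, such as request.data. This query captures the object and the attribute name when the object is a plain identifier.


(attribute
  object: (identifier) @object
  attribute: (identifier) @prop)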
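In a chained access such as request.GET.get, only the innermost link (request.GET) has a plain identifier as its object, so that is the part this pattern would be expected to capture. The sketch below illustrates the same idea with Python's ast module as a stand-in for Tree-sitter (an assumption; the two trees use different node names but nest attribute access the same way).

```python
import ast

# Lines taken from the sample code; parsed only, never executed.
SOURCE = '''
user_id = request.GET.get('id')
data = request.data
queryset = User.objects.all()
'''

pairs = []
for node in ast.walk(ast.parse(SOURCE)):
    # Keep attribute accesses whose object is a bare name,
    # skipping outer links like (request.GET).get
    if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
        pairs.append((node.value.id, node.attr))

print(sorted(pairs))
```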


if_statement

In Tree-sitter, an (if_statement) node captures the structure of an if block in a language like Python. This query captures the condition and the body that runs when the condition is true.


(if_statement
  condition: (_) @cond
  consequence: (_) @if-body)

Named vs Anonymous Nodes

Named nodes are meaningful parts of the code defined by the grammar, like function calls, variable names, or statements.

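Anonymous nodes are literal tokens such as keywords and punctuation, like =, (, ), or commas. They don’t have their own names in the grammar, so in a query you match them by writing their text in double quotes, for example "=".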

| Node Type | Description            | Examples                           |
| --------- | ---------------------- | ---------------------------------- |
| Named     | Grammar-defined        | call, identifier, return_statement |
| Anonymous | Just syntax characters | '=', '(', ')', ','                 |

Logical Operators in Tree-sitter Queries

Logical operators (the Tree-sitter documentation calls them predicates) let you narrow a query down to exactly the matches you want.

Think of them as filters: they check whether the text of a captured node matches, or doesn't match, certain words or patterns.

You write them with a # before the name, and they operate on captures you have already defined in the same pattern.

1. #match?: Regex match


(function_definition
  name: (identifier) @func-name
  (#match? @func-name "^get"))

This matches any function whose name starts with get, such as get_user, getData, etc.

In the playground output, the matched text is highlighted in blue.
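To see which names a pattern like ^get accepts, you can try the same regular expression in plain Python. This is only a sketch: it uses Python's re module as a stand-in, and the regex engine behind #match? depends on the Tree-sitter binding you use, though a simple anchored pattern like this one behaves the same way. The list of names is made up for illustration.

```python
import re

# Same pattern as in the #match? predicate above.
pattern = re.compile(r"^get")

# Hypothetical function names, including two that contain "get"
# but do not start with it.
names = ["get", "post", "get_user", "getData", "forget", "target"]

matched = [name for name in names if pattern.search(name)]
print(matched)
```

The ^ anchor is what excludes forget and target; without it, any name containing get would pass the filter.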

Continue reading the full article here

Written by

Rijul Rajesh T P