Mastering Python List Structures: A Detailed Look at Features and Operations


This document covers an in-depth analysis of the List, one of the most fundamental data structures in Python. The goal is to move beyond the superficial usage of lists and to explore and organize their intrinsic features and core internal operating principles in depth.
We will first define the four basic characteristics of a list, organize its practical methods, and then proceed to a step-by-step analysis of its memory management and the Dunder methods that form the basis of operator overloading.
1. The 4 Core Features of a List
The List data structure is defined by the following four core features.
1-1. Ordered
All elements in a list have a distinct order, and this order is maintained as data is added. Each element can be accessed via a unique index, which allows for retrieving or modifying data at a specific position.
# The index starts from 0.
my_list = [10, 20, 30, 40, 50]
# Indexing: Accessing the element at index 1
print(my_list[1]) # Output: 20
# Slicing: Extracting a subset from index 2 up to (but not including) index 4
print(my_list[2:4]) # Output: [30, 40]
The guarantee of order forms the basis for using loops. The for
statement internally uses the iterator protocol to sequentially traverse from the first to the last element of the list, automatically terminating the loop once all elements have been visited.
1-2. Allows Duplicates
A list allows for the inclusion of multiple elements with the same value.
my_list = [10, 10, 10]
print(my_list) # Output: [10, 10, 10]
1-3. Mutable
A list is a 'Mutable' object. This means it is possible to modify, add, or delete its internal elements even after the object has been created.
my_list = [100, 200, 300]
print(f"Before modification: {my_list}") # Output: Before modification: [100, 200, 300]
# Changing the value at index 0 to 1000
my_list[0] = 1000
print(f"After modification: {my_list}") # Output: After modification: [1000, 200, 300]
1-4. Heterogeneous
A single list can store data of different types, such as integers, strings, and even other lists or tuples, all together.
my_list = [1, "hello", [10, 20], (30, 40)]
print(my_list) # Output: [1, 'hello', [10, 20], (30, 40)]
2. Key List Methods
The list object provides a variety of built-in methods for data manipulation. The key methods are as follows.
Method | Description | Example Usage |
append(x) | Adds element x to the very end of the list. | l = [1, 2]; l.append(3) # [1, 2, 3] |
extend(iterable) | Extends the list by adding all elements from another iterable. | l = [1, 2]; l.extend([3, 4]) # [1, 2, 3, 4] |
insert(i, x) | Inserts element x at index i . | l = [1, 2]; l.insert(1, 100) # [1, 100, 2] |
remove(x) | Removes the first occurrence of element x from the list. | l = [1, 2, 1]; l.remove(1) # [2, 1] |
pop(i) | Returns the element at index i and removes it from the list. If i is omitted, it targets the last element. | l = [1, 2, 3]; val = l.pop(1) # val=2, l=[1, 3] |
clear() | Removes all elements from the list, making it empty. | l = [1, 2, 3]; l.clear() # [] |
index(x) | Returns the index of the first occurrence of element x . Raises a ValueError if the element is not found. | l = [10, 20, 30]; l.index(20) # 1 |
count(x) | Returns the number of times element x appears in the list. | l = [1, 1, 2, 3]; l.count(1) # 2 |
sort() | Sorts the elements of the list in-place. (Default: ascending order) | l = [3, 1, 2]; l.sort() # l becomes [1, 2, 3] |
reverse() | Reverses the order of the elements of the list in-place. | l = [1, 2, 3]; l.reverse() # l becomes [3, 2, 1] |
copy() | Creates and returns a shallow copy of the list. | l1 = [1, 2]; l2 = l1.copy() # l2 is [1, 2], a different object from l1 |
3. In-Depth Analysis of Internal Workings
Moving beyond simple functionalities, we can discover some interesting points by examining the internal workings of a list.
3-1. The Phenomenon of Shared Memory Addresses for Identical Value Elements
When checking the memory addresses (id()
) of list elements with the same value, we can observe that they all refer to the same address.
# Elements with the same value in a list share the same memory address.
l = [10000, 10000, 10000]
print(f'id of l[0]: {id(l[0])}')
print(f'id of l[1]: {id(l[1])}')
print(f'id of l[2]: {id(l[2])}')
# Example Output:
# id of l[0]: 2258941012848
# id of l[1]: 2258941012848
# id of l[2]: 2258941012848
This phenomenon stems from Python's variable assignment method and memory management efficiency strategies. A list operates by referencing the same address value to save memory when values are identical. In Python, a variable is not a container that directly stores a value, but rather a name (reference) that points to an object created in memory. Particularly for 'Immutable' objects like numbers or strings, Python tends to reuse existing objects in memory instead of creating new ones if they have the same value.
Python uses an 'Integer Interning' mechanism, where it pre-creates integer objects from -5 to 256 for efficiency. It's guaranteed at the language level that numbers within this range will always refer to the same object.
What is Integer Interning?
Python pre-creates a range of small integer objects. Whenever an integer within this range is needed in the program, instead of creating a new object, it reuses the already existing one. This reduces memory usage and saves the time required for object creation.
# Integers in the range -5 to 256 are always the same object.
x = 10
y = 10
print(f'x is y: {x is y}') # Output: x is y: True
However, caution is needed as this behavior is not always guaranteed for general variable assignments.
# Caution is needed when comparing addresses of general variables.
a = 10000
b = 10000
# The values are the same (==), but it is not guaranteed that their addresses are the same (is).
print(f'a == b: {a == b}') # Output: a == b: True
print(f'a is b: {a is b}') # True or False depending on the execution environment
# Although the CPython interpreter may reuse the same object for optimization,
# this is not a language specification and should not be relied upon.
# Always use the `is` operator when comparing object addresses.
In conclusion, each slot in a list stores a reference (address value) to an object in memory, not the value itself. This is why the id()
values can appear identical.
Related Topics for Further Learning
Mutable vs Immutable: A clear understanding of the behavioral differences based on Python object 'mutability' and 'immutability' is essential.
Shallow Copy vs Deep Copy: Learning the difference between a shallow copy (the behavior of the
copy()
method) and a deep copy can prevent potential errors when copying lists.
3-2. Operators and Dunder Methods
If you run dir(list)
, you can see special methods with double underscores on both sides, such as __add__
and __mul__
. These are called Dunder (Double Underscores) methods or magic methods.
These methods are designed to be called internally when standard Python operators like +
and *
or built-in functions are used on an object. The +
and *
operations applied to lists are prime examples.
l = [1, 2, 3]
# The two lines of code below behave identically internally.
print(l + l) # Calls l.__add__(l)
# Output: [1, 2, 3, 1, 2, 3]
print(l * 3) # Calls l.__mul__(3)
# Output: [1, 2, 3, 1, 2, 3, 1, 2, 3]
This mechanism, where an object reacts to a specific operator to perform a predefined action, is called Operator Overloading.
3-3. Object String Representation: __str__
vs __repr__
Dunder methods play a key role not only in operator overloading but also in defining how an object is represented as a string. The representative dunder methods for this are __str__
and __repr__
.
The difference between the two becomes clear when you define a custom class.
class A:
def __str__(self):
return 'hello (from __str__)'
def __repr__(self):
return 'world (from __repr__)'
a = A()
# The print() function preferentially finds and calls the __str__ method.
print(a) # Output: hello (from __str__)
# Executing the variable directly calls the __repr__ method.
a # Output: world (from __repr__)
__str__
: Used for creating an informal, 'human-readable' string representation, like with theprint()
function.__repr__
: Used for creating an 'official' string representation that can unambiguously identify the object. Its main purpose is to provide an accurate state of the object, primarily during development and debugging. If__str__
is not defined, theprint()
function will call__repr__
instead.
Conclusion
This document has analyzed the four core features of Python lists (ordered, allows duplicates, mutable, heterogeneous), the functionality of key methods, and the principles of internal memory management and operation handling.
I believe that moving beyond simply using a data type to understanding its internal operating principles is a necessary foundation for writing predictable and efficient code. In particular, the concepts of object memory referencing and operator overloading through dunder methods are crucial for understanding Python's object-oriented nature, and I think further study is needed in these areas.
Subscribe to my newsletter
Read articles from Cheon Sejun directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
