Python Lists: Features and Operations Explained

This document covers an in-depth analysis of the List, one of the most fundamental data structures in Python. The goal is to move beyond the superficial usage of lists and to explore and organize their intrinsic features and core internal operating principles in depth.

We will first define the four basic characteristics of a list, organize its practical methods, and then proceed to a step-by-step analysis of its memory management and the Dunder methods that form the basis of operator overloading.

1. The 4 Core Features of a List

The List data structure is defined by the following four core features.

1-1. Ordered

All elements in a list have a distinct order, and this order is maintained as data is added. Each element can be accessed via a unique index, which allows for retrieving or modifying data at a specific position.

# The index starts from 0.
my_list = [10, 20, 30, 40, 50]

# Indexing: Accessing the element at index 1
print(my_list[1]) # Output: 20

# Slicing: Extracting a subset from index 2 up to (but not including) index 4
print(my_list[2:4]) # Output: [30, 40]

The guarantee of order forms the basis for using loops. The for statement internally uses the iterator protocol to sequentially traverse from the first to the last element of the list, automatically terminating the loop once all elements have been visited.

1-2. Allows Duplicates

A list allows for the inclusion of multiple elements with the same value.

my_list = [10, 10, 10]
print(my_list) # Output: [10, 10, 10]

1-3. Mutable

A list is a 'Mutable' object. This means it is possible to modify, add, or delete its internal elements even after the object has been created.

my_list = [100, 200, 300]
print(f"Before modification: {my_list}") # Output: Before modification: [100, 200, 300]

# Changing the value at index 0 to 1000
my_list[0] = 1000
print(f"After modification: {my_list}") # Output: After modification: [1000, 200, 300]

1-4. Heterogeneous

A single list can store data of different types, such as integers, strings, and even other lists or tuples, all together.

my_list = [1, "hello", [10, 20], (30, 40)]
print(my_list) # Output: [1, 'hello', [10, 20], (30, 40)]

2. Key List Methods

The list object provides a variety of built-in methods for data manipulation. The key methods are as follows.

Method	Description	Example Usage
`append(x)`	Adds element `x` to the very end of the list.	`l = [1, 2]; l.append(3) # [1, 2, 3]`
`extend(iterable)`	Extends the list by adding all elements from another iterable.	`l = [1, 2]; l.extend([3, 4]) # [1, 2, 3, 4]`
`insert(i, x)`	Inserts element `x` at index `i`.	`l = [1, 2]; l.insert(1, 100) # [1, 100, 2]`
`remove(x)`	Removes the first occurrence of element `x` from the list.	`l = [1, 2, 1]; l.remove(1) # [2, 1]`
`pop(i)`	Returns the element at index `i` and removes it from the list. If `i` is omitted, it targets the last element.	`l = [1, 2, 3]; val = l.pop(1) # val=2, l=[1, 3]`
`clear()`	Removes all elements from the list, making it empty.	`l = [1, 2, 3]; l.clear() # []`
`index(x)`	Returns the index of the first occurrence of element `x`. Raises a `ValueError` if the element is not found.	`l = [10, 20, 30]; l.index(20) # 1`
`count(x)`	Returns the number of times element `x` appears in the list.	`l = [1, 1, 2, 3]; l.count(1) # 2`
`sort()`	Sorts the elements of the list in-place. (Default: ascending order)	`l = [3, 1, 2]; l.sort() # l becomes [1, 2, 3]`
`reverse()`	Reverses the order of the elements of the list in-place.	`l = [1, 2, 3]; l.reverse() # l becomes [3, 2, 1]`
`copy()`	Creates and returns a shallow copy of the list.	`l1 = [1, 2]; l2 = l1.copy() # l2 is [1, 2], a different object from l1`

3. In-Depth Analysis of Internal Workings

Moving beyond simple functionalities, we can discover some interesting points by examining the internal workings of a list.

3-1. The Phenomenon of Shared Memory Addresses for Identical Value Elements

When checking the memory addresses (id()) of list elements with the same value, we can observe that they all refer to the same address.

# Elements with the same value in a list share the same memory address.
l = [10000, 10000, 10000]
print(f'id of l[0]: {id(l[0])}')
print(f'id of l[1]: {id(l[1])}')
print(f'id of l[2]: {id(l[2])}')
# Example Output:
# id of l[0]: 2258941012848
# id of l[1]: 2258941012848
# id of l[2]: 2258941012848

This phenomenon stems from Python's variable assignment method and memory management efficiency strategies. A list operates by referencing the same address value to save memory when values are identical. In Python, a variable is not a container that directly stores a value, but rather a name (reference) that points to an object created in memory. Particularly for 'Immutable' objects like numbers or strings, Python tends to reuse existing objects in memory instead of creating new ones if they have the same value.

Python uses an 'Integer Interning' mechanism, where it pre-creates integer objects from -5 to 256 for efficiency. It's guaranteed at the language level that numbers within this range will always refer to the same object.

What is Integer Interning?

Python pre-creates a range of small integer objects. Whenever an integer within this range is needed in the program, instead of creating a new object, it reuses the already existing one. This reduces memory usage and saves the time required for object creation.

# Integers in the range -5 to 256 are always the same object.
x = 10
y = 10
print(f'x is y: {x is y}') # Output: x is y: True

However, caution is needed as this behavior is not always guaranteed for general variable assignments.

# Caution is needed when comparing addresses of general variables.
a = 10000
b = 10000

# The values are the same (==), but it is not guaranteed that their addresses are the same (is).
print(f'a == b: {a == b}') # Output: a == b: True
print(f'a is b: {a is b}') # True or False depending on the execution environment

# Although the CPython interpreter may reuse the same object for optimization,
# this is not a language specification and should not be relied upon.
# Always use the `is` operator when comparing object addresses.

In conclusion, each slot in a list stores a reference (address value) to an object in memory, not the value itself. This is why the id() values can appear identical.

Mutable vs Immutable: A clear understanding of the behavioral differences based on Python object 'mutability' and 'immutability' is essential.
Shallow Copy vs Deep Copy: Learning the difference between a shallow copy (the behavior of the copy() method) and a deep copy can prevent potential errors when copying lists.

3-2. Operators and Dunder Methods

If you run dir(list), you can see special methods with double underscores on both sides, such as __add__ and __mul__. These are called Dunder (Double Underscores) methods or magic methods.

These methods are designed to be called internally when standard Python operators like + and * or built-in functions are used on an object. The + and * operations applied to lists are prime examples.

l = [1, 2, 3]
# The two lines of code below behave identically internally.
print(l + l)         # Calls l.__add__(l)
# Output: [1, 2, 3, 1, 2, 3]

print(l * 3)         # Calls l.__mul__(3)
# Output: [1, 2, 3, 1, 2, 3, 1, 2, 3]

This mechanism, where an object reacts to a specific operator to perform a predefined action, is called Operator Overloading.

3-3. Object String Representation: `str` vs `repr`

Dunder methods play a key role not only in operator overloading but also in defining how an object is represented as a string. The representative dunder methods for this are __str__ and __repr__.

The difference between the two becomes clear when you define a custom class.

class A:
    def __str__(self):
        return 'hello (from __str__)'

    def __repr__(self):
        return 'world (from __repr__)'

a = A()

# The print() function preferentially finds and calls the __str__ method.
print(a) # Output: hello (from __str__)

# Executing the variable directly calls the __repr__ method.
a # Output: world (from __repr__)

__str__: Used for creating an informal, 'human-readable' string representation, like with the print() function.
__repr__: Used for creating an 'official' string representation that can unambiguously identify the object. Its main purpose is to provide an accurate state of the object, primarily during development and debugging. If __str__ is not defined, the print() function will call __repr__ instead.

Conclusion

This document has analyzed the four core features of Python lists (ordered, allows duplicates, mutable, heterogeneous), the functionality of key methods, and the principles of internal memory management and operation handling.

I believe that moving beyond simply using a data type to understanding its internal operating principles is a necessary foundation for writing predictable and efficient code. In particular, the concepts of object memory referencing and operator overloading through dunder methods are crucial for understanding Python's object-oriented nature, and I think further study is needed in these areas.

Mastering Python List Structures: A Detailed Look at Features and Operations

1. The 4 Core Features of a List

1-1. Ordered

1-2. Allows Duplicates

1-3. Mutable

1-4. Heterogeneous

2. Key List Methods

3. In-Depth Analysis of Internal Workings

3-1. The Phenomenon of Shared Memory Addresses for Identical Value Elements

What is Integer Interning?

3-2. Operators and Dunder Methods

3-3. Object String Representation: `str` vs `repr`

Conclusion

Subscribe to my newsletter

Cheon Sejun

Cheon Sejun

Mastering Python List Structures: A Detailed Look at Features and Operations

1. The 4 Core Features of a List

1-1. Ordered

1-2. Allows Duplicates

1-3. Mutable

1-4. Heterogeneous

2. Key List Methods

3. In-Depth Analysis of Internal Workings

3-1. The Phenomenon of Shared Memory Addresses for Identical Value Elements

What is Integer Interning?

Related Topics for Further Learning

3-2. Operators and Dunder Methods

3-3. Object String Representation: __str__ vs __repr__

Conclusion

Subscribe to my newsletter

Cheon Sejun

Cheon Sejun

3-3. Object String Representation: `str` vs `repr`