Modules and Packages
Organize your code for reusability and scalability.
π¦ Introduction: Why Organize Code?
As your Python projects grow, keeping all your functions and classes in one file becomes messy. Modules and Packages are Python's way of organizing code into reusable and manageable chunks.
Imagine a library:
A **Module** is like a single book containing related topics (functions).
A **Package** is like a bookshelf (a folder) containing many related books (modules).
This system promotes **reusability**, prevents **name collisions** (two functions with the same name), and improves **readability** for large-scale projects, especially in Data Science (e.g., NumPy, Pandas).
π― Topic 1: The Python Module
A module is simply a Python file (`.py` extension) containing Python definitions and statements.
π Creating and Using a Module:
def add(a, b):
return a + b
result = calculator.add(5, 3)
print(result)
The module name acts as a **namespace** (calculator.add) to avoid confusion if another module also has an `add` function.
β‘ Topic 2: Import Statements and Aliases
There are three primary ways to import code, each with different effects on your namespace.
π Import Types:
import module)
import mathAccess: math.sqrt(4). **Recommended** for clarity.
import module as alias)
import pandas as pdAccess: pd.DataFrame(). Used for long module names (e.g., Data Science libraries).
from module import name)
from math import pi, sqrtAccess: sqrt(4). Brings names directly into the local namespace. Be cautious of name collisions.
π» Example: The Math Module
from math import pi
ποΈ Topic 3: Python Packages
A package is a collection of modules organized in a directory structure. This allows for hierarchical code organization.
The key to a Python package is the file: __init__.py.
π Package Structure Example:
__init__.py
data/
__init__.py
loader.py # Module 1
models/
__init__.py
train.py # Module 2
Any directory with an **\_\_init\_\_.py** file is considered a Python package. This file runs when the package is imported and can be used to set up package-wide variables or imports. (In modern Python 3.3+, this file is optional but still standard practice).
Importing from a Package:
from my_project.models.train import start_training
π» Topic 4: The Script vs. Module ($__name__$)
When Python executes a file, it sets a special variable called $__name__$. This is crucial for creating modules that can also be run directly.
The $__name__$ Variable:
- If the file is run directly: $
__name__$ is set to'__main__'. - If the file is imported as a module: $
__name__$ is set to the module's name (e.g.,'calculator').
We use the following idiom to ensure certain code (like tests or program start) only runs when the file is executed as the main script:
print("Running main application logic...")
if __name__ == "__main__":
main()
π Module Summary
- Module: A single
.pyfile containing functions, classes, and variables. - Package: A directory containing an (optional)
__init__.pyfile and other modules/sub-packages. - Import: Used to load a module. Use
import moduleorfrom module import name. - Alias: Use
import pandas as pdto shorten the name. __name__: Special variable used to determine if a file is the main script or an imported module.
π€ Interview Q&A
Tap on the questions below to reveal the answers.
A **Module** is a single file (.py) containing code. A **Package** is a directory (folder) containing multiple modules and an optional __init__.py file, used to structure a large codebase hierarchically.
Aliasing (e.g., import numpy as np) is used to shorten long module names, making the code cleaner and easier to read, especially in Data Science where standard aliases (like np for NumPy) are common practice.
It imports ALL public names (functions, variables, classes) from the module directly into the current namespace. It is discouraged because it can lead to **name collisions** (overwriting existing variables/functions) and makes it unclear where a specific function came from.
It historically marks a directory as a Python package. When a package is imported, the code in __init__.py is executed, allowing for initialization tasks, setting up package-level variables, and specifying which modules should be available when a user runs from package import *.