Competitive programming often reduces algorithmic efficiency to Big O
notation. While theoretically sound, Big O ignores the constants, interpreter
overhead, and memory allocation patterns inherent in actual execution. A
solution might be O(N) on paper yet still exceed a platform's time limit due to
Python's specific implementation details.
I developed Occam to bridge the gap between theoretical complexity and hardware reality. The goal was to build a local profiling tool capable of analyzing solution behavior under stress before submission to platforms like LeetCode.
Dynamic Analysis: Empirical Benchmarking
The primary component of Occam is a dynamic analyzer that measures runtime
behavior relative to input size (N). Rather than running a single test case,
the tool executes the target function against procedurally generated datasets of
increasing scale.
Implementation
The analyzer uses Python's `timeit` module for precise timing and
`tracemalloc` to snapshot peak memory usage. By collecting data points across a
range of input sizes (e.g., N=10 to N=10,000), the tool assembles a dataset
representing the algorithm's growth curve.
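The measurement loop looks roughly like the following sketch. The `solve` callable and `make_input` factory are hypothetical stand-ins for however the target solution and dataset generator are wired up in practice:

```python
import timeit
import tracemalloc

def profile(solve, make_input, sizes=(10, 100, 1_000, 10_000), repeats=5):
    """Measure wall time and peak memory of `solve` at each input size."""
    results = []
    for n in sizes:
        data = make_input(n)  # hypothetical factory for a size-n dataset
        # Best-of-`repeats` wall time; timeit disables GC during timing
        # and uses a high-resolution timer under the hood
        elapsed = min(timeit.repeat(lambda: solve(data), number=1, repeat=repeats))
        # Separate pass to capture peak allocation via tracemalloc
        tracemalloc.start()
        solve(data)
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results.append((n, elapsed, peak))
    return results
```

Taking the minimum over repeats filters out scheduler noise, and the memory pass runs separately because `tracemalloc` itself adds timing overhead.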
To determine the complexity class programmatically, Occam performs a linear regression on log(Time) versus log(N). The resulting slope approximates the polynomial degree:
- Slope approx. 1: Linear O(N)
- Slope approx. 2: Quadratic O(N^2)
- Slope approx. 0: Constant O(1)
This allows for the validation of algorithmic assumptions against actual hardware performance.
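The log-log fit works because if Time = c·N^k, then log(Time) = log(c) + k·log(N), so the degree k falls out as the slope. A minimal version of the fit, reusing the `(n, time, peak)` tuples from the sketch above; `scipy.stats.linregress` and the class thresholds here are illustrative assumptions rather than Occam's exact code:

```python
import math
from scipy.stats import linregress

def estimate_complexity(results):
    """Fit log(time) against log(N); the slope estimates the polynomial degree."""
    log_n = [math.log(n) for n, _, _ in results]
    log_t = [math.log(t) for _, t, _ in results]
    fit = linregress(log_n, log_t)
    # Bucket the slope into the nearest complexity class (illustrative cutoffs)
    if fit.slope < 0.5:
        return fit.slope, "O(1)"
    if fit.slope < 1.5:
        return fit.slope, "O(N)"
    return fit.slope, "O(N^2) or worse"
```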
Static Analysis: CPython Internals
Runtime metrics can sometimes mask inefficient implementation details. To address this, Occam includes a static analysis engine that inspects the raw CPython bytecode using the `dis` module.
Bytecode & Instruction Counting
The tool disassembles the function object to count specific bytecode instructions. This reveals hidden costs, such as implicit loop overheads or expensive global lookups, that high-level Python syntax obscures.
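The counting itself can be a thin wrapper over `dis.get_instructions`. The `hot_loop` function below is a hypothetical example, chosen to show how builtin calls surface as repeated `LOAD_GLOBAL` instructions:

```python
import dis
from collections import Counter

def count_instructions(func):
    """Tally bytecode opcode names for a function object."""
    return Counter(instr.opname for instr in dis.get_instructions(func))

def hot_loop(values):
    total = 0
    for v in values:
        total += len(str(v))  # len/str resolve via LOAD_GLOBAL each iteration
    return total

print(count_instructions(hot_loop).most_common(5))
```

Spotting a high `LOAD_GLOBAL` count inside a loop, for instance, suggests hoisting the lookup into a local variable.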
Cyclomatic Complexity
Occam calculates the cyclomatic complexity number by analyzing the Control Flow
Graph (CFG). It iterates through the bytecode to count branching instructions
(`POP_JUMP_IF_FALSE`, `FOR_ITER`, etc.). This metric provides a quantitative
measure of code maintainability and branch density, independent of execution
time.
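One way to approximate that count from raw bytecode is sketched below. The branching opcode set is version-dependent (CPython has renamed and merged jump opcodes across 3.10-3.12), so treat the set here as illustrative:

```python
import dis

# Decision-point opcodes; names vary by CPython version (shown for ~3.11/3.12)
BRANCH_OPS = {
    "POP_JUMP_IF_FALSE", "POP_JUMP_IF_TRUE",
    "POP_JUMP_IF_NONE", "POP_JUMP_IF_NOT_NONE",
    "FOR_ITER",
}

def cyclomatic_complexity(func):
    """Approximate V(G) = decision points + 1 for a single function."""
    branches = sum(1 for i in dis.get_instructions(func) if i.opname in BRANCH_OPS)
    return branches + 1
```

The +1 reflects the single entry path through the function's CFG.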
Battle Mode: Comparative Profiling
Optimization is often about trade-offs. To facilitate A/B testing, I implemented a “Battle Mode” that accepts multiple script files.
The tool executes all provided solutions against the exact same generated input vectors, then overlays the performance graphs and outputs a statistical comparison. This allows for direct analysis of competing approaches (for example, weighing the memory overhead of a recursive DFS against the runtime of an iterative BFS) to determine which implementation is optimal for a given constraint set.
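The core of the mode is a shared-input harness along these lines, where each entry in `solutions` maps a label to that script's entry-point callable; the loading of the script files is omitted, and `dfs_solve`, `bfs_solve`, and `make_graph` in the usage line are hypothetical:

```python
import timeit

def battle(solutions, make_input, sizes=(100, 1_000, 10_000), repeats=3):
    """Time every candidate against identical generated inputs."""
    table = {name: [] for name in solutions}
    for n in sizes:
        data = make_input(n)  # one dataset, shared by all candidates
        for name, solve in solutions.items():
            t = min(timeit.repeat(lambda: solve(data), number=1, repeat=repeats))
            table[name].append((n, t))
    return table

# e.g. battle({"recursive_dfs": dfs_solve, "iterative_bfs": bfs_solve}, make_graph)
```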
Visualization
Raw numerical data can be difficult to interpret at a glance. Occam integrates `scipy` for statistical smoothing and `matplotlib` to generate complexity graphs. These visualizations display not just the average runtime, but also the standard deviation (stability) and peak memory consumption at each input size.
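A sketch of how one such graph could be assembled, using `scipy.signal.savgol_filter` for the smoothing (the specific filter and styling are assumptions, not necessarily Occam's choices); `samples[i]` holds the repeated timings collected at input size `ns[i]`:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import savgol_filter

def plot_growth(ns, samples, label="solution"):
    """Plot mean runtime per input size with a smoothed curve and std-dev band."""
    means = np.array([np.mean(s) for s in samples])
    stds = np.array([np.std(s) for s in samples])
    # Savitzky-Golay smoothing; window must be odd and no larger than the data
    window = min(7, len(means))
    if window % 2 == 0:
        window -= 1
    smooth = savgol_filter(means, window_length=window, polyorder=min(2, window - 1))
    plt.plot(ns, smooth, label=f"{label} (smoothed mean)")
    plt.fill_between(ns, means - stds, means + stds, alpha=0.3, label="±1 std dev")
    plt.xlabel("Input size N")
    plt.ylabel("Runtime (s)")
    plt.legend()
    plt.show()
```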
Conclusion
Understanding code requires more than looking at the syntax; it requires understanding how that syntax executes on the machine. Occam provides the tooling necessary to dissect Python solutions, offering both the theoretical confirmation of static analysis and the practical reality of dynamic stress testing.
Source code: https://github.com/antonhibl/occam