Data Layout Matters: Why Your Python Objects Are Slower Than You Think

Track: Python Core, Internals, Extensions
Type: Talk
Level: intermediate
Duration: 30 minutes

Abstract

We often discuss algorithms, big-O complexity, and clever optimizations. But there’s a quieter performance killer hiding in plain sight: data layout. Developers in lower-level languages debate it constantly, because memory layout directly affects cache efficiency and CPU performance. This isn’t just a C++ concern, though. The same principles silently shape performance in Python, whether you’re using lists of objects, dictionaries, NumPy arrays, or pandas DataFrames.

In this talk, we’ll start with an intuitive mental model of how data is stored in memory, then connect it directly to practical Python examples, such as:

  • list[MyObject] vs. multiple lists of primitives
  • dataclass objects vs. structured NumPy arrays
  • pandas row iteration vs. vectorized column operations
  • Why vectorization works, and when it doesn’t
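
As a rough illustration of the first comparison, here is a minimal sketch using only the standard library. The `Particle` class, the field names, and the size `N` are hypothetical and chosen for illustration; they are not taken from the talk. A list of objects stores pointers to separately allocated instances, while one `array` per field keeps the raw values side by side in a single buffer.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Particle:
    """Hypothetical record type, used only for this illustration."""
    x: float
    y: float


N = 10_000  # arbitrary size for the sketch

# Layout 1: a list of objects. The list holds pointers; each Particle
# (and each of its float attributes) is a separate heap allocation.
particles = [Particle(float(i), float(2 * i)) for i in range(N)]

# Layout 2: one contiguous buffer of C doubles per field.
xs = array("d", (float(i) for i in range(N)))
ys = array("d", (float(2 * i) for i in range(N)))


def sum_x_objects(items):
    # Follows a pointer to each object, then looks up its .x attribute.
    return sum(p.x for p in items)


def sum_x_columns(column):
    # Walks a single contiguous buffer of doubles.
    return sum(column)


total_from_objects = sum_x_objects(particles)
total_from_columns = sum_x_columns(xs)
```

Both functions compute the same total; only the memory layout differs. Timing them with `timeit` on your own machine is one way to see how much the layout matters for a given workload, and the numbers will vary with Python version and hardware.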

You’ll leave with:

  • A new mental model for thinking about Python performance
  • Practical guidelines for designing data structures
  • Benchmarks and real-world examples
  • A clearer understanding of when to choose objects vs. arrays

This talk is for Python developers who want to go beyond surface-level optimization and understand what actually makes code fast, without needing to become systems programmers.

Because sometimes the biggest speedup isn’t a better algorithm... it’s better data layout.