Stop Guessing, Start Understanding: How Arrow and Pandas Exchange Data

Track: Data Engineering and MLOps
Type: Talk
Level: Intermediate
Duration: 30 minutes

Abstract

Pandas now natively supports PyArrow-backed data types. But what does that actually mean? If you've ever wondered how these two libraries relate to each other, whether they compete or complement each other, and what happens to your data when it moves between them, this talk is for you.

As PyArrow maintainers, we took on the challenge of digging into the conversion code between PyArrow and pandas, and we're here to share what we've learned. We'll show you what's really going on under the hood: how Arrow's columnar format differs from pandas' block-based memory layout (including what a BlockManager actually is), when data can be shared without copying, and when a full copy is unavoidable.

We'll also clarify what each library is designed for and how they work together rather than against each other. With pandas increasingly adopting PyArrow as a backend, understanding this relationship is becoming essential rather than optional.

This talk is aimed at Python developers and data engineers who want to deepen their understanding of what's happening beneath the surface.