Everything you always wanted to know about pandas*
- Track: Data Engineering and MLOps
- Type: Talk
- Level: intermediate
- Duration: 30 minutes
Abstract
*but were too afraid to ask!
pandas, the data wrangling workhorse, will celebrate its 18th year of existence in 2026. You rely on it daily, but are you truly confident in your code?
This session is dedicated to the unwritten rules and hidden mechanics that separate a confident user from one who constantly battles warnings and unexpected outputs. We will confront the infamous SettingWithCopyWarning that haunts chained operations, clarify the critical differences between deep and shallow copies, and weigh the true cost of using inplace=True. We’ll also demystify the complex handling of missing data (NaN values) and much more!
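As a taste of the pitfalls named above, here is a minimal sketch (not taken from the talk itself) of chained assignment versus a single .loc call, plus the NaN comparison trap. The DataFrame, its column names, and the variable names are made up for illustration. Which warning class pandas emits for the chained write depends on the installed version, so the sketch silences it rather than naming it.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical example data: column names "a" and "b" are illustrative only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10.0, np.nan, 30.0]})

# Chained assignment: the boolean filter returns a new object, so the
# assignment lands on a temporary and `df` is left untouched. pandas warns
# about this (the warning class depends on the pandas version), so the
# warning is silenced here only to keep the demo quiet.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df[df["a"] > 1]["b"] = 0.0
chained_changed_df = bool((df["b"] == 0.0).any())

# A single .loc call selects rows and column in one step and writes in place.
fixed = df.copy()  # copy() is deep by default, so `df` stays as it was
fixed.loc[fixed["a"] > 1, "b"] = 0.0

# NaN never compares equal to itself; use isna() to find missing values.
nan_equals_nan = bool((df["b"] == df["b"]).all())
missing_count = int(df["b"].isna().sum())

print(chained_changed_df, fixed["b"].tolist(), nan_equals_nan, missing_count)
```

The chained write silently does nothing to `df`, while the .loc version updates the copy it was called on and leaves the original alone.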
Crucially, we will look to the future. pandas is engaged in a DataFrame library race with newer, high-performance libraries like Polars and DuckDB. The latest advancements (pandas 2.0 and the new and shiny 3.0, with features like Copy-on-Write and Apache Arrow integration) are the direct response, promising dramatically improved speed, memory efficiency, and data types.
Join me to master the crucial concepts of the past and prepare for the performance gains and new behaviors of the future, ensuring your skills stay ahead of the curve. Stop guessing and start mastering pandas!