{palmerpenguins}
Overview
{palmerpenguins} is a Python package that provides easy access to the Palmer Penguins dataset for data science education, exploration, and visualization. It is the Python counterpart of the R package of the same name.
Project Links
- GitHub Repository: github.com/mcnakhaee/palmerpenguin
- PyPI: Install via pip install palmerpenguins
About Palmer Penguins
The Palmer Penguins dataset is a modern alternative to the classic Iris dataset. It contains size measurements for three penguin species observed on three islands in the Palmer Archipelago, Antarctica.
The dataset includes:
- 344 penguins across 3 species
- 7 variables including species, island, bill dimensions, flipper length, body mass, and sex
- Real data collected by Dr. Kristen Gorman at Palmer Station, Antarctica
Why This Package?
While the Palmer Penguins dataset is available in R, Python users needed an easy way to access it. This package:
- Simplifies Data Loading: One-line import of the dataset
- Multiple Formats: Returns pandas DataFrames or raw data
- Consistent API: Follows Python conventions and best practices
- Well-Documented: Clear examples and use cases
- Lightweight: Minimal dependencies
Installation
pip install palmerpenguins
Basic Usage
from palmerpenguins import load_penguins
# Load the penguins dataset
penguins = load_penguins()
# Start exploring
print(penguins.head())
print(penguins.describe())
Use Cases
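A per-species summary is a typical first step when exploring the data. The sketch below is only an illustration, not part of the package: `mean_by_species` is a hypothetical helper, and it works on plain row dictionaries (the shape produced by pandas' `DataFrame.to_dict("records")`) so that it runs without any dependencies. The sample rows are made-up values for demonstration, not real measurements.

```python
import math
from collections import defaultdict
from statistics import mean


def mean_by_species(rows, column):
    """Average `column` per species, skipping missing values (None or NaN)."""
    values = defaultdict(list)
    for row in rows:
        value = row.get(column)
        # Missing entries may appear as None or as float NaN.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        values[row["species"]].append(value)
    return {species: mean(vals) for species, vals in values.items()}


# Made-up rows for illustration only; NOT real measurements.
sample_rows = [
    {"species": "Adelie", "bill_length_mm": 39.0},
    {"species": "Adelie", "bill_length_mm": 41.0},
    {"species": "Adelie", "bill_length_mm": float("nan")},
    {"species": "Gentoo", "bill_length_mm": 47.0},
    {"species": "Gentoo", "bill_length_mm": None},
]

print(mean_by_species(sample_rows, "bill_length_mm"))
```

Assuming `load_penguins()` returns a pandas DataFrame as described above, the same helper could be fed `load_penguins().to_dict("records")`. In pandas itself, `penguins.groupby("species")["bill_length_mm"].mean()` is the more idiomatic route.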
Example Analysis
import matplotlib.pyplot as plt
import seaborn as sns
from palmerpenguins import load_penguins
# Load data
penguins = load_penguins()
# Create visualization
sns.scatterplot(
    data=penguins,
    x="bill_length_mm",
    y="bill_depth_mm",
    hue="species",
)
plt.title("Palmer Penguins: Bill Dimensions")
plt.show()
Credits
Dataset originally published by:
- Dr. Kristen Gorman: Palmer Station, Antarctica LTER
- Dr. Allison Horst: Artwork and R package
Open source project available on GitHub and PyPI