Daft: The Distributed Python Dataframe
Daft is a fast and scalable Python dataframe for Complex Data and Machine Learning workloads.

Get Started
Install Daft with a single pip command:
$ pip install getdaft
Community
Daft Blog
Daft is a fast, Pythonic, and scalable open-source dataframe library. Check out https://getdaft.io
By Sammy Sidhu
Integrations
Daft is open source, and you can use any Python library when processing data in a dataframe. It also integrates with many other open-source technologies, plugging directly into your current infrastructure and systems.
Data Science & Machine Learning

Cloud Platforms' Storage

Use Cases
# Data Science Experimentation
Daft lets data scientists and engineers work from their preferred Python notebook environment for interactive experimentation on complex data.
# Complex Data Warehousing
The Daft Python dataframe efficiently pipelines complex data from raw data lakes to clean, queryable datasets for analysis and reporting.
# Machine Learning Training Dataset Curation
Modern machine learning is data-driven and relies on clean data. The Daft Python dataframe integrates with data-loading frameworks such as Ray and PyTorch to feed data to distributed model training.
# Machine Learning Model Evaluation
Evaluating the performance of machine learning systems is challenging, but Daft Python dataframes make it easy to run models and SQL-style analyses at scale.
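As a rough illustration of a SQL-style evaluation query, the sketch below computes per-model accuracy twice: once in plain Python as a reference, and once as a Daft group-by. The column names (model, prediction, label) and both helper functions are hypothetical, and the Daft calls (daft.from_pydict, Expression.cast, groupby(...).mean(...)) are assumed to match your installed version, so check them against the Daft documentation.

```python
from collections import defaultdict


def accuracy_by_model(rows):
    """Plain-Python reference: fraction of correct predictions per model."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for model, prediction, label in rows:
        totals[model] += 1
        hits[model] += int(prediction == label)
    return {model: hits[model] / totals[model] for model in totals}


def accuracy_by_model_daft(rows):
    """The same analysis as a Daft group-by (requires Daft to be installed)."""
    import daft  # imported here so the reference function works without Daft

    df = daft.from_pydict({
        "model": [row[0] for row in rows],
        "prediction": [row[1] for row in rows],
        "label": [row[2] for row in rows],
    })
    # Mark each row 1.0 if the prediction matches the label, else 0.0.
    correct = (df["prediction"] == df["label"]).cast(daft.DataType.float64())
    df = df.with_column("correct", correct)
    # Roughly: SELECT model, AVG(correct) FROM df GROUP BY model
    result = df.groupby("model").mean("correct").to_pydict()
    return dict(zip(result["model"], result["correct"]))


if __name__ == "__main__":
    sample = [("a", "cat", "cat"), ("a", "dog", "cat"), ("b", "cat", "cat")]
    print(accuracy_by_model(sample))
    print(accuracy_by_model_daft(sample))
```

Keeping a small plain-Python reference next to the dataframe query is one way to sanity-check an analysis on a sample before running it at scale.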
Key Features
# User-Defined Functions
Daft supports running Python User-Defined Functions (UDFs) on columns of Python objects: if Python supports it, Daft can handle it!
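A minimal sketch of what a UDF can look like, assuming the daft.udf decorator with a return_dtype argument and a daft.Series input (verify both against the Daft version you have installed). The word_count helper, the add_word_counts wrapper, and the caption column are hypothetical names chosen for illustration, and the column holds plain strings for brevity.

```python
def word_count(text: str) -> int:
    """Plain Python logic that will be applied to each value in the column."""
    return len(text.split())


def add_word_counts(captions):
    """Apply word_count to a column through a Daft UDF (requires Daft)."""
    import daft  # imported here so word_count stays usable without Daft

    @daft.udf(return_dtype=daft.DataType.int64())
    def word_count_udf(column: daft.Series):
        # The UDF receives a whole column and returns one value per row.
        return [word_count(value) for value in column.to_pylist()]

    df = daft.from_pydict({"caption": captions})
    df = df.with_column("num_words", word_count_udf(df["caption"]))
    return df.to_pydict()


if __name__ == "__main__":
    print(add_word_counts(["a cat on a mat", "two dogs"]))
```

Keeping the per-value logic in an ordinary function makes it easy to unit-test on its own, with the UDF acting only as a thin adapter over the column.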
# Interactive Computing
Daft embraces Python's dynamic and interactive nature, enabling fast, iterative experimentation on data in your notebook and on your laptop.
# Distributed Computing
Daft integrates with frameworks such as Ray to run petabyte-scale dataframes on a cluster of machines in the cloud.