A framework for analyzing prediction market data, including the largest publicly available dataset of Polymarket and Kalshi market and trade data. Provides tools for data collection, storage, and running analysis scripts that generate figures and statistics.
Overview
This project enables research and analysis of prediction markets by providing:
- Pre-collected datasets from Polymarket and Kalshi
- Data collection indexers for gathering new data
- Analysis framework for generating figures and statistics
Currently supported features:
- Market metadata collection (Kalshi & Polymarket)
- Trade history collection via API and blockchain
- Parquet-based storage with automatic progress saving
- Extensible analysis script framework
Installation & Usage
Requires Python 3.9+. Install dependencies with uv:
    uv sync

Download and extract the pre-collected dataset (36 GiB compressed):
    make setup

This downloads data.tar.zst from Cloudflare R2 Storage and extracts it to data/.
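After extraction, it can be worth confirming that the expected directories are present before running an indexer or an analysis. The sketch below is not part of the repository; it only checks for the subdirectories listed under Project Structure, and the name `missing_data_dirs` is made up for illustration.

```python
from pathlib import Path

# Subdirectories taken from the Project Structure listing in this README.
EXPECTED_DIRS = (
    "kalshi/markets",
    "kalshi/trades",
    "polymarket/blocks",
    "polymarket/markets",
    "polymarket/trades",
)


def missing_data_dirs(root="data"):
    """Return the expected subdirectories that do not exist under root."""
    base = Path(root)
    return [rel for rel in EXPECTED_DIRS if not (base / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected data directories are present.")
```

An empty list means the layout matches the listing; it says nothing about whether the files inside are complete.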
Data Collection
Collect market and trade data from prediction market APIs:
    make index

This opens an interactive menu to select which indexer to run. Data is saved to the data/kalshi/ and data/polymarket/ directories. Progress is saved automatically, so you can interrupt and resume collection.
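The resume behavior can be pictured as a checkpoint written after each batch. The following is a minimal sketch of that general pattern, not the indexers' actual implementation: the `fetch_page` callback, the cursor semantics, and the checkpoint file layout are all assumptions, and it appends JSON Lines rather than Parquet to stay within the standard library.

```python
import json
from pathlib import Path


def load_state(checkpoint):
    """Return the saved state, or a fresh one when no checkpoint exists."""
    path = Path(checkpoint)
    if not path.exists():
        return {"cursor": None, "done": False}
    return json.loads(path.read_text())


def save_state(checkpoint, cursor, done=False):
    """Write the state to a temp file, then rename it over the checkpoint."""
    path = Path(checkpoint)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps({"cursor": cursor, "done": done}))
    tmp.replace(path)  # rename, so an interrupt never leaves a half-written file


def collect(fetch_page, out_file, checkpoint):
    """Fetch pages until the cursor runs out, checkpointing after each page.

    fetch_page(cursor) is a hypothetical callback returning (rows, next_cursor);
    next_cursor is None once there is nothing left to fetch.
    """
    state = load_state(checkpoint)
    if state["done"]:
        return
    cursor = state["cursor"]
    while True:
        rows, next_cursor = fetch_page(cursor)
        with open(out_file, "a", encoding="utf-8") as fh:
            for row in rows:
                fh.write(json.dumps(row) + "\n")
        done = next_cursor is None
        save_state(checkpoint, next_cursor, done)
        if done:
            return
        cursor = next_cursor
```

In this sketch an interrupt between the append and the checkpoint write would replay one page on the next run, so a real collector would need to deduplicate rows or write each page to its own file.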
Running Analyses
    make analyze

This opens an interactive menu to select which analysis to run. You can run all analyses or select a specific one. Output files (PNG, PDF, CSV, JSON) are saved to output/.
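To give a feel for what an analysis script might look like, here is a hypothetical sketch. The `Analysis` base class, its `run` method, and the `save` helper are invented for illustration and are not the framework's real interface (the Writing Analyses guide describes the actual one). It computes a toy statistic from in-memory data and writes CSV and JSON files into an output directory.

```python
import csv
import json
from abc import ABC, abstractmethod
from pathlib import Path


class Analysis(ABC):
    """Hypothetical base class: subclasses return a list of row dicts."""

    name = "analysis"

    @abstractmethod
    def run(self):
        """Compute and return the result rows."""

    def save(self, output_dir="output"):
        """Run the analysis and write <name>.csv and <name>.json."""
        rows = self.run()
        out = Path(output_dir)
        out.mkdir(parents=True, exist_ok=True)
        fieldnames = list(rows[0]) if rows else []
        with open(out / f"{self.name}.csv", "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        (out / f"{self.name}.json").write_text(json.dumps(rows, indent=2))
        return rows


class MeanTradeSize(Analysis):
    """Toy example: average trade size per market."""

    name = "mean_trade_size"

    def __init__(self, trades):
        self.trades = trades  # iterable of (market_id, size) pairs; made-up shape

    def run(self):
        totals = {}
        for market, size in self.trades:
            count, total = totals.get(market, (0, 0.0))
            totals[market] = (count + 1, total + size)
        return [
            {"market": m, "trades": c, "mean_size": t / c}
            for m, (c, t) in sorted(totals.items())
        ]
```

For example, `MeanTradeSize([("A", 10), ("B", 4), ("A", 20)]).save()` would write `output/mean_trade_size.csv` and `output/mean_trade_size.json`.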
Packaging Data
To compress the data directory for storage/distribution:
    make package

This creates a zstd-compressed tar archive (data.tar.zst) and removes the data/ directory.
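Because packaging deletes data/, the order of operations matters: the directory should go only after the archive has been written. The sketch below illustrates that ordering, assuming a `tar` binary with zstd support (the `--zstd` flag in GNU tar and bsdtar) is on the PATH. It is not taken from the project's Makefile, and the function names are illustrative.

```python
import shutil
import subprocess
from pathlib import Path


def package_command(data_dir="data", archive="data.tar.zst"):
    """Build a tar invocation that archives data_dir with zstd compression."""
    data = Path(data_dir)
    return ["tar", "--zstd", "-cf", str(archive), "-C", str(data.parent), data.name]


def package(data_dir="data", archive="data.tar.zst", run=subprocess.run):
    """Create the archive, then delete data_dir only if tar succeeded."""
    # check=True makes subprocess.run raise on a non-zero exit status,
    # so the rmtree below is never reached after a failed archive step.
    run(package_command(data_dir, archive), check=True)
    shutil.rmtree(data_dir)
```

The `run` parameter exists only so the command runner can be swapped out, for instance to do a dry run that prints the command instead of executing it.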
Project Structure
├── src/
│   ├── analysis/            # Analysis scripts
│   │   ├── kalshi/          # Kalshi-specific analyses
│   │   └── polymarket/      # Polymarket-specific analyses
│   ├── indexers/            # Data collection indexers
│   │   ├── kalshi/          # Kalshi API client and indexers
│   │   └── polymarket/      # Polymarket API/blockchain indexers
│   └── common/              # Shared utilities and interfaces
├── data/                    # Data directory (extracted from data.tar.zst)
│   ├── kalshi/
│   │   ├── markets/
│   │   └── trades/
│   └── polymarket/
│       ├── blocks/
│       ├── markets/
│       └── trades/
├── docs/                    # Documentation
└── output/                  # Analysis outputs (figures, CSVs)
Documentation
- Data Schemas - Parquet file schemas for markets and trades
- Writing Analyses - Guide for writing custom analysis scripts
Contributing
If you'd like to contribute to this project, please open a pull request with your changes, along with a detailed description of what was changed, added, or improved.
For more information, see the contributing guide.
Issues
If you've found an issue or have a question, please open an issue here.
Research & Citations
- Becker, J. (2026). The Microstructure of Wealth Transfer in Prediction Markets. Jbecker. https://jbecker.dev/research/prediction-market-microstructure
- Le, N. A. (2026). Decomposing Crowd Wisdom: Domain-Specific Calibration Dynamics in Prediction Markets. arXiv. https://arxiv.org/abs/2602.19520
If you have used or plan to use this dataset in your research, please reach out via email or Twitter. I'd love to hear about what you're using the data for! Additionally, feel free to open a PR and update this section with a link to your paper.