Trouble with the Curve: Predicting Future MLB Players Using Scouting Reports
Work primarily completed during an internship at Microsoft.
In baseball, a scouting report is a written profile about a player describing their characteristics and traits, usually intended for use in player valuation. This work presents a first-of-its-kind dataset of nearly 10,000 scouting reports for minor league, international, and draft prospects. Compiled from MLB.com and FanGraphs Baseball, each report consists of a written description of the player, numerical grades of their key attributes (known as the “20-80 scale”), metadata, and unique IDs to reference their profiles on popular resources like MLB.com, FanGraphs, and Baseball-Reference.
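As a rough illustration of the record layout described above, here is a minimal sketch of one report as a Python dataclass. The field names (`player_name`, `grades`, `mlb_id`, `fangraphs_id`, `bbref_id`) and the attribute labels in the comments are hypothetical and do not reflect the released files' actual schema. The only rule it enforces is that every grade lies within the 20-80 range.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Bounds of the 20-80 scouting scale.
GRADE_MIN, GRADE_MAX = 20, 80


@dataclass
class ScoutingReport:
    """One report: free-text description, 20-80 grades, metadata, external IDs.

    All field names are hypothetical, chosen for illustration only.
    """
    player_name: str
    text: str                                    # written description of the player
    grades: Dict[str, int]                       # e.g. {"Hit": 55, "Power": 50}
    metadata: Dict[str, str] = field(default_factory=dict)
    mlb_id: Optional[str] = None                 # MLB.com profile ID
    fangraphs_id: Optional[str] = None           # FanGraphs profile ID
    bbref_id: Optional[str] = None               # Baseball-Reference profile ID

    def __post_init__(self) -> None:
        # Reject any grade that falls outside the 20-80 scale.
        for attribute, grade in self.grades.items():
            if not GRADE_MIN <= grade <= GRADE_MAX:
                raise ValueError(
                    f"{attribute} grade {grade} is outside {GRADE_MIN}-{GRADE_MAX}"
                )

    def mean_grade(self) -> float:
        """Unweighted mean of the attribute grades (a simple summary, not an official metric)."""
        if not self.grades:
            raise ValueError("report has no grades")
        return sum(self.grades.values()) / len(self.grades)
```

Keeping the grades in a dictionary rather than fixed fields lets position players and pitchers, who are graded on different attributes, share one record type.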
With this dataset, we employ several deep neural networks to predict whether minor league players will reach the MLB based on their scouting reports. We open-source this data for the community and present a web application that highlights differences in the language used in the reports of successful and unsuccessful prospects.
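To make the idea of "language variations" concrete, below is a minimal sketch of one simple way to find words that separate the two groups: an add-alpha smoothed log ratio of word frequencies. This is a swapped-in illustrative technique. It is not the paper's neural models and not necessarily what the web application computes. The function names and the toy tokenizer are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and keep runs of letters/apostrophes (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def log_frequency_ratio(
    success_texts: Iterable[str],
    failure_texts: Iterable[str],
    alpha: float = 1.0,
) -> Dict[str, float]:
    """Add-alpha smoothed log ratio of each word's relative frequency.

    Positive scores lean toward reports of players who reached the MLB,
    negative scores toward reports of players who did not.
    """
    success = Counter(w for t in success_texts for w in tokenize(t))
    failure = Counter(w for t in failure_texts for w in tokenize(t))
    vocab = set(success) | set(failure)
    # Smoothed denominators: total tokens plus alpha for every vocabulary word.
    n_success = sum(success.values()) + alpha * len(vocab)
    n_failure = sum(failure.values()) + alpha * len(vocab)
    return {
        word: math.log((success[word] + alpha) / n_success)
        - math.log((failure[word] + alpha) / n_failure)
        for word in vocab
    }


def top_words(scores: Dict[str, float], k: int = 5) -> Tuple[List[str], List[str]]:
    """Return the k most success-leaning and k most failure-leaning words."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], ranked[::-1][:k]
```

Smoothing keeps words that appear in only one group from producing infinite scores. The trade-off is that rare words are pulled toward zero, which is usually desirable for short reports.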
Recommended citation
@misc{danovitch2019trouble,
  title={Trouble with the Curve: Predicting Future MLB Players Using Scouting Reports},
  author={Jacob Danovitch},
  year={2019},
  eprint={1910.12622},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
or
Danovitch, J. (2019). Trouble with the Curve: Predicting Future MLB Players Using Scouting Reports. 2019 Carnegie Mellon Sports Analytics Conference, Pittsburgh, USA. https://arxiv.org/abs/1910.12622