Trouble with the Curve: Predicting Future MLB Players Using Scouting Reports

CMSAC, 2019 | jacobdanovitch/Trouble-With-The-Curve

Work primarily completed during an internship at Microsoft.

In baseball, a scouting report is a written profile of a player that describes their characteristics and traits, usually intended for use in player valuation. This work presents a first-of-its-kind dataset of nearly 10,000 scouting reports for minor league, international, and draft prospects. Compiled from MLB.com and FanGraphs Baseball, each report consists of a written description of the player, numerical grades of their key attributes (given on what is known as the “20-80 scale”), metadata, and unique IDs to reference their profiles on popular resources like MLB.com, FanGraphs, and Baseball-Reference.
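As a rough illustration of the report structure described above, a single record could be modeled along these lines. This is only a sketch: the class name, field names, and helper function are hypothetical and are not taken from the released dataset's schema. The only rule it encodes is that grades fall between 20 and 80.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ScoutingReport:
    """Hypothetical record layout; field names are illustrative only."""
    name: str
    text: str                                               # written description of the player
    grades: Dict[str, int] = field(default_factory=dict)    # attribute -> grade on the 20-80 scale
    source: Optional[str] = None                            # e.g. "MLB.com" or "FanGraphs"
    external_ids: Dict[str, str] = field(default_factory=dict)  # site name -> profile ID

    def __post_init__(self) -> None:
        # Reject grades that fall outside the 20-80 scale.
        for attribute, grade in self.grades.items():
            if not 20 <= grade <= 80:
                raise ValueError(
                    f"{attribute} grade {grade} is outside the 20-80 scale"
                )


def average_grade(report: ScoutingReport) -> Optional[float]:
    """Return the mean of all grades, or None when the report has no grades."""
    if not report.grades:
        return None
    return sum(report.grades.values()) / len(report.grades)
```

A record built this way keeps the free text and the numerical grades together, so either can later be fed to a model.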

With this dataset, we employ several deep neural networks to predict whether minor league players will reach MLB given their scouting reports. We open-source this data for the community and present a web application demonstrating language variations in the reports of successful and unsuccessful prospects.

Recommended citation

@misc{
danovitch2019trouble,
title={Trouble with the Curve: Predicting Future MLB Players Using Scouting Reports},
author={Jacob Danovitch},
year={2019},
eprint={1910.12622},
archivePrefix={arXiv},
primaryClass={cs.CL}
}

or

Danovitch, J. (2019). Trouble with the Curve: Predicting Future MLB Players Using Scouting Reports. 2019 Carnegie Mellon Sports Analytics Conference, Pittsburgh, USA. https://arxiv.org/abs/1910.12622