Trouble with the Curve: Predicting Future MLB Players Using Scouting Reports

Recommended citation: Danovitch, J. (2019). Trouble with the Curve: Predicting Future MLB Players Using Scouting Reports. 2019 Carnegie Mellon Sports Analytics Conference, Pittsburgh, USA.

Work primarily completed during an internship at Microsoft.

In baseball, a scouting report is a written profile of a player describing their characteristics and traits, usually intended for use in player valuation. This work presents a first-of-its-kind dataset of nearly 10,000 scouting reports for minor league, international, and draft prospects. Compiled from MLB.com and FanGraphs Baseball, each report consists of a written description of the player, numerical grades of their key attributes on the “20-80 scale”, metadata, and unique IDs that reference their profiles on popular resources like MLB.com, FanGraphs, and Baseball-Reference.
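To make the report structure concrete, here is a minimal sketch of how a single record might be represented in Python. The class name `ScoutingReport` and all of its field names are hypothetical, not the dataset's published schema; the only validation assumed is that every grade falls between 20 and 80.

```python
from dataclasses import dataclass, field

# Hypothetical record layout; the released dataset's actual column names may differ.
MIN_GRADE, MAX_GRADE = 20, 80  # bounds of the 20-80 scouting scale (50 = MLB average)


@dataclass
class ScoutingReport:
    name: str
    report_text: str                 # the written description of the player
    grades: dict[str, int]           # attribute -> grade, e.g. {"Hit": 50, "Power": 60}
    metadata: dict[str, str] = field(default_factory=dict)  # e.g. position, source site
    ids: dict[str, str] = field(default_factory=dict)       # e.g. site name -> player ID

    def __post_init__(self) -> None:
        # Reject any grade that falls outside the 20-80 range.
        for attribute, grade in self.grades.items():
            if not MIN_GRADE <= grade <= MAX_GRADE:
                raise ValueError(f"{attribute} grade {grade} is outside the 20-80 scale")


# Usage with made-up values:
report = ScoutingReport(
    name="Example Prospect",
    report_text="Plus raw power, but the hit tool is still developing.",
    grades={"Hit": 45, "Power": 60, "Run": 50},
    metadata={"position": "OF", "source": "MLB.com"},
)
```

Keeping the IDs in a mapping rather than fixed fields lets one record point at any number of external profiles without changing the class.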

With this dataset, we employ several deep neural networks to predict whether minor league players will reach the MLB based on their scouting reports. We release this data publicly for the community, and present a web application demonstrating language variations between the reports of successful and unsuccessful prospects.
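One simple way to surface the kind of language differences the web application highlights is to compare word frequencies between reports of prospects who reached the MLB and those who did not. The sketch below uses an add-one smoothed log ratio. It is an assumed illustrative baseline, not the paper's deep neural networks and not necessarily how the application computes its comparisons; `tokenize` and `log_ratio` are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphabetic tokens only (a deliberately crude tokenizer).
    return re.findall(r"[a-z]+", text.lower())


def log_ratio(made_mlb: list[str], did_not: list[str], alpha: float = 1.0) -> dict[str, float]:
    """Add-alpha smoothed log ratio of word frequencies between two groups of reports.

    Positive scores mean a word is relatively more common in reports of players
    who reached the MLB; negative scores mean it is more common for those who did not.
    """
    a = Counter(w for doc in made_mlb for w in tokenize(doc))
    b = Counter(w for doc in did_not for w in tokenize(doc))
    vocab = set(a) | set(b)
    # Smoothed denominators: every vocabulary word gets alpha pseudo-counts.
    total_a = sum(a.values()) + alpha * len(vocab)
    total_b = sum(b.values()) + alpha * len(vocab)
    return {
        w: math.log((a[w] + alpha) / total_a) - math.log((b[w] + alpha) / total_b)
        for w in vocab
    }


# Usage with made-up report snippets:
scores = log_ratio(["plus plus arm, plus power"], ["fringy arm, below average power"])
top_words = sorted(scores, key=scores.get, reverse=True)[:3]
```

Words with large positive scores lean toward successful prospects; a real analysis would also need to handle stopwords and very rare terms, which this sketch ignores.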