Linking Social Media Posts to News with Siamese Transformers

NLCA, 2020 | jacobdanovitch/jdnlp

Many computational social science projects examine online discourse surrounding a specific event, such as a natural disaster, sporting match, or political event. A costly bottleneck in this work is collecting relevant social media posts. Typically, an initial corpus is collected using a high-recall, low-precision keyword search and then filtered manually by human judges.

We design an efficient Siamese architecture that minimizes the distance between the embeddings of articles and their comments. By allowing researchers to automatically filter a corpus down to the comments most similar to news articles describing the event of interest, this approach can greatly reduce the time and money spent on manual annotation.
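The filtering step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the `embed` function is a hypothetical bag-of-words stand-in for the trained transformer encoder, and the names `cosine`, `filter_posts`, and the `threshold` value are assumptions chosen for illustration. The only property it shares with the Siamese setup is that the same encoder is applied to both the article and each post before their similarity is compared.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for the shared (Siamese) encoder.

    A real system would use a trained transformer here; this toy version
    returns a sparse bag-of-words count vector.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_posts(article, posts, threshold=0.2):
    """Keep posts whose embedding is close to the article's embedding.

    Returns (similarity, post) pairs, most similar first. The threshold
    is an illustrative assumption, not a value from the paper.
    """
    article_vec = embed(article)  # same encoder on both sides
    scored = [(cosine(article_vec, embed(p)), p) for p in posts]
    kept = [(s, p) for s, p in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    article = "earthquake strikes city causing damage"
    posts = [
        "the earthquake damage in the city is terrible",
        "my cat is adorable",
    ]
    for score, post in filter_posts(article, posts):
        print(f"{score:.3f}  {post}")
```

In practice the keyword-collected corpus would be passed as `posts`, and only the retained posts would go on to any remaining manual review.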

Recommended citation

@misc{danovitch2020linking,
title={Linking Social Media Posts to News with Siamese Transformers},
author={Jacob Danovitch},
year={2020},
eprint={2001.03303},
archivePrefix={arXiv},
primaryClass={cs.IR}
}

or

Danovitch, J. (2020). Linking Social Media Posts to News with Siamese Transformers. International Conference on Natural Language Computing Advances (NLCA), Vancouver, CA. https://arxiv.org/abs/2001.03303