Basic principles of Information Retrieval. Indexing methods. Query processing. Linguistic aspects of Information Retrieval. Agents and artificial intelligence approaches to Information Retrieval. Relation of Information Retrieval to the World Wide Web. Search engines. Servers and clients. Browser and server side programming for Information Retrieval.
Semantic web technologies (RDF, RDFS, OWL). Ontology and knowledge base development. Data integration and normalization. Ontology matching. Semantic Web access through SPARQL queries. Semantic Web expansion from unstructured data (text), including Named Entity Recognition, Entity Linking and Relation Extraction from textual data. Question Answering over Linked Data. Data availability, redundancy, contextualization and trust.
Machine learning (ML) is the scientific study of algorithms and statistical models that computers use to perform a specific task effectively without explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence. This course covers advanced topics in machine learning, such as deep learning, transfer learning, multiview learning, clustering, and interpretability of ML methods.
Exploring a computational model of Lakoff’s conceptual metaphor theory.
Recommended citation: Lynch, B., Danovitch, J., & Davies, J. (2018). Towards a Computational Approach to Conceptual Metaphor. Poster session at CogSci 2019, Montreal, CA.
Using natural language processing to predict future MLB players.
Danovitch, J. (2019). Trouble with the Curve: Predicting Future MLB Players Using Scouting Reports. 2019 Carnegie Mellon Sports Analytics Conference, Pittsburgh, USA. https://arxiv.org/abs/1910.12622
We design an efficient Siamese architecture to minimize the distance between embeddings of articles and their comments.
Danovitch, J. (2019). Linking Social Media Posts to News with Siamese Transformers. International Conference on Natural Language Computing Advances (NLCA), Vancouver, CA. https://arxiv.org/abs/2001.03303
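As a rough illustration of the Siamese idea described above, the sketch below passes an article and a comment through one shared encoder and scores the pair with a contrastive loss. It is not the paper's model: the bag-of-words encoder stands in for a transformer, and the names `build_vocab`, `encode`, `contrastive_loss`, the margin value, and the example texts are all assumptions made for this example.

```python
import math


def build_vocab(texts):
    """Map every distinct lowercase token to a vector index (toy vocabulary)."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text, vocab):
    """Shared encoder: L2-normalized bag-of-words vector.

    Both branches of the Siamese setup call this same function, which is
    what "shared weights" means here. A real system would use a learned
    encoder (assumption: a transformer) instead of token counts.
    """
    vec = [0.0] * len(vocab)
    for token in text.lower().split():
        if token in vocab:
            vec[vocab[token]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def euclidean_distance(a, b):
    """Distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(distance, is_match, margin=1.0):
    """Pull matching pairs together; push other pairs at least `margin` apart."""
    if is_match:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


# Hypothetical example texts, not taken from the paper's data.
article = "city council approves new transit budget"
comment = "the transit budget the council approved is too small"
unrelated = "my cat refuses to eat dry food"

vocab = build_vocab([article, comment, unrelated])
d_pos = euclidean_distance(encode(article, vocab), encode(comment, vocab))
d_neg = euclidean_distance(encode(article, vocab), encode(unrelated, vocab))

print("article vs. its comment:", round(d_pos, 3))
print("article vs. unrelated:  ", round(d_neg, 3))
print("loss (matching pair):   ", round(contrastive_loss(d_pos, True), 3))
print("loss (non-matching pair):", round(contrastive_loss(d_neg, False), 3))
```

In a trained model, the loss would be minimized over the encoder's parameters so that comments end up close to the articles they discuss. Here the encoder is fixed, so the sketch only shows how the shared encoder, the distance, and the loss fit together.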
ComplexDataLab at W-NUT 2020 Task 2: Detecting Informative COVID-19 Tweets by Attending over Linked Documents
We present Gapformer, which effectively classifies content as informative or not. It reformulates the problem as graph classification, drawing not only on the tweet but also on connected webpages and entities.
Pelrine, Kellin, et al. ComplexDataLab at W-NUT 2020 Task 2: Detecting Informative COVID-19 Tweets by Attending over Linked Documents. Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020). 2020. https://www.aclweb.org/anthology/2020.wnut-1.63/
We examine the performance of a broad set of modern transformer-based language models and show that, with basic fine-tuning, these models are competitive with, and can even significantly outperform, recently proposed state-of-the-art methods.
Pelrine, Kellin, Jacob Danovitch, and Reihaneh Rabbany. The Surprising Performance of Simple Baselines for Misinformation Detection. arXiv preprint arXiv:2104.06952 (2021). https://arxiv.org/abs/2104.06952