Posts by Collection

CSI4107: Information Retrieval and the Internet

Basic principles of Information Retrieval. Indexing methods. Query processing. Linguistic aspects of Information Retrieval. Agents and artificial intelligence approaches to Information Retrieval. Relation of Information Retrieval to the World Wide Web. Search engines. Servers and clients. Browser-side and server-side programming for Information Retrieval.

CSI5180: Topics in Artificial Intelligence

Semantic web technologies (RDF, RDFS, OWL). Ontology and knowledge base development. Data integration and normalization. Ontology matching. Semantic Web access through SPARQL queries. Semantic Web expansion from unstructured data (text), including Named Entity Recognition, Entity Linking and Relation Extraction from textual data. Question Answering over Linked Data. Data availability, redundancy, contextualization and trust.

COMP5900: Advanced Machine Learning

Machine learning (ML) is the scientific study of algorithms and statistical models that computers use to perform a specific task effectively without explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence. This course will cover advanced topics in machine learning such as deep learning, transfer learning, multiview learning, clustering, and interpretability of ML methods.

On the Prediction Instability of Graph Neural Networks

Trouble with the Curve: Predicting Future MLB Players Using Scouting Reports

Using natural language processing to predict future MLB players.

Danovitch, J. (2019). Trouble with the Curve: Predicting Future MLB Players Using Scouting Reports. 2019 Carnegie Mellon Sports Analytics Conference, Pittsburgh, USA. https://arxiv.org/abs/1910.12622

Linking Social Media Posts to News with Siamese Transformers

We design an efficient Siamese architecture to minimize the distance between embeddings of articles and their comments.

Danovitch, J. (2019). Linking Social Media Posts to News with Siamese Transformers. International Conference on Natural Language Computing Advances (NLCA), Vancouver, CA. https://arxiv.org/abs/2001.03303

ComplexDataLab at W-NUT 2020 Task 2: Detecting Informative COVID-19 Tweets by Attending over Linked Documents

We present Gapformer, which effectively classifies content as informative or not. It reformulates the problem as graph classification, drawing not only on the tweet but also on connected webpages and entities.

Pelrine, Kellin, et al. ComplexDataLab at W-NUT 2020 Task 2: Detecting Informative COVID-19 Tweets by Attending over Linked Documents. Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020). 2020. https://www.aclweb.org/anthology/2020.wnut-1.63/

The Surprising Performance of Simple Baselines for Misinformation Detection

We examine the performance of a broad set of modern transformer-based language models and show that, with basic fine-tuning, these models are competitive with, and can even significantly outperform, recently proposed state-of-the-art methods.

Kellin Pelrine, Jacob Danovitch, and Reihaneh Rabbany. 2021. The Surprising Performance of Simple Baselines for Misinformation Detection. In Proceedings of the Web Conference 2021 (WWW '21). Association for Computing Machinery, New York, NY, USA, 3432–3441. https://doi.org/10.1145/3442381.3450111 https://dl.acm.org/doi/abs/10.1145/3442381.3450111

Fast and Attributed Change Detection on Dynamic Graphs with Density of States

Through extensive experiments using synthetic and real-world data, we show that SCPD (a) achieves state-of-the-art performance, (b) is significantly faster than the state-of-the-art methods and can easily process millions of edges in a few CPU minutes, (c) can effectively tackle a large quantity of node attributes, additions or deletions and (d) discovers interesting events in large real-world graphs.

Huang, S., Danovitch, J., Rabusseau, G., Rabbany, R. (2023). Fast and Attributed Change Detection on Dynamic Graphs with Density of States. In: Kashima, H., Ide, T., Peng, WC. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2023. Lecture Notes in Computer Science, vol 13935. Springer, Cham. https://doi.org/10.1007/978-3-031-33374-3_2 https://link.springer.com/chapter/10.1007/978-3-031-33374-3_2

Temporal Graph Benchmark for Machine Learning on Temporal Graphs

We present the Temporal Graph Benchmark (TGB), a collection of challenging and diverse benchmark datasets for realistic, reproducible, and robust evaluation of machine learning models on temporal graphs.

Huang, Shenyang, et al. Temporal graph benchmark for machine learning on temporal graphs. Advances in Neural Information Processing Systems 36 (2024). https://proceedings.neurips.cc/paper_files/paper/2023/hash/066b98e63313162f6562b35962671288-Abstract-Datasets_and_Benchmarks.html

Jacob Danovitch

education

CSI4107: Information Retrieval and the Internet

CSI5180: Topics in Artificial Intelligence

COMP5900: Advanced Machine Learning

presentations

On the Prediction Instability of Graph Neural Networks

publications

Trouble with the Curve: Predicting Future MLB Players Using Scouting Reports

Linking Social Media Posts to News with Siamese Transformers

ComplexDataLab at W-NUT 2020 Task 2: Detecting Informative COVID-19 Tweets by Attending over Linked Documents

The Surprising Performance of Simple Baselines for Misinformation Detection

Fast and Attributed Change Detection on Dynamic Graphs with Density of States

Temporal Graph Benchmark for Machine Learning on Temporal Graphs

talks

Gradient Descent

K-Means Clustering

Linear Regression

PBPRDF - Linked Data for Basketball Analytics

Introduction to Data Science and Artificial Intelligence

Word Embeddings for Information Retrieval

Tutorial Summary: ‘HybridNLP2018 - Tutorial on Hybrid Techniques for Knowledge-based NLP’

CMSAC ’19: Trouble with the Curve