I'm a third-year grad student in Computer Science and Engineering at the University of Washington. My research centers on natural language processing and computer vision. In particular, I'm excited about giving AI systems the tools to understand and reason about what's happening in a situation.
I work with Yejin Choi and Ali Farhadi, and I also spend time at AI2. I graduated from Harvey Mudd College in 2016, where I majored in Computer Science and Math. As an undergrad, I worked with Louis-Philippe Morency on multimodal machine learning and with Jacqueline Dresch on computational biology.
Here's a list of my publications. You can also check out my Google Scholar profile.
We formulate the new task of Visual Commonsense Reasoning: a model must not only answer challenging visual questions expressed in natural language, but also provide a rationale explaining why its answer is true. We introduce a new dataset, VCR, consisting of 290k multiple-choice QA problems derived from 110k movie scenes.
We release a new NLI dataset, SWAG, with 113k challenging multiple-choice questions about grounded situations. To build this dataset, we present Adversarial Filtering (AF), which allows for data collection at scale while minimizing annotation artifacts. Even state-of-the-art models for NLI struggle on our dataset.
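The core AF loop can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the classifier, the data layout, and the 0.5 threshold are all placeholder assumptions.

```python
import random

def adversarial_filtering(questions, candidate_pool, train_classifier, rounds=3):
    """Toy sketch of Adversarial Filtering: repeatedly retrain a classifier
    and swap out negative endings it finds easy for fresh candidates."""
    for _ in range(rounds):
        # fit an adversary on the current set of negatives
        clf = train_classifier(questions)
        for q in questions:
            # keep negatives the classifier can't reject; replace easy ones
            q["negatives"] = [
                n if clf(q, n) < 0.5 else random.choice(candidate_pool)
                for n in q["negatives"]
            ]
    return questions
```

Each round makes the surviving negatives harder for the current model, so annotation artifacts that a classifier can exploit are filtered out of the final dataset.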
We study scene graph generation: building a graph whose nodes are objects and whose edges are pairwise relationships. We find that the Visual Genome dataset contains many repeating structures (motifs) and build a model to capture them, outperforming the prior state of the art by an average of 11% relative improvement.
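The graph structure described above can be made concrete with a small sketch. This is just an illustrative data structure (the class and method names are mine), not the paper's model:

```python
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    """A scene graph: object nodes plus labeled pairwise relationship edges."""
    objects: list = field(default_factory=list)    # node labels, e.g. "person"
    relations: list = field(default_factory=list)  # (subject_id, predicate, object_id)

    def add_object(self, label):
        self.objects.append(label)
        return len(self.objects) - 1  # index doubles as the node id

    def add_relation(self, subject_id, predicate, object_id):
        self.relations.append((subject_id, predicate, object_id))

# a recurring structure (motif): person - riding - horse
g = SceneGraph()
person = g.add_object("person")
horse = g.add_object("horse")
g.add_relation(person, "riding", horse)
```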
We investigate zero-shot multimodal learning where the topics of classification are verbs, not objects. We crowdsource verb attributes and build a model to learn them from text (word embeddings and dictionary definitions). We also use these verb attributes alongside word embeddings for action recognition in images.
Multimodal Sentiment Intensity Analysis in Videos: Facial Gestures and Verbal Messages
Amir Zadeh, Rowan Zellers, Eli Pincus, Louis-Philippe Morency. IEEE Intelligent Systems 2016 [paper]
We provide a study of sentiment analysis applied to video data, not just text. We present a model that exploits the dynamics between gestures and verbal messages.
Nucleotide Interdependency in Transcription Factor Binding Sites in the Drosophila Genome
Jacqueline Dresch, Rowan Zellers, Daniel Bork, Robert Drewell. Gene Regulation and Systems Biology 2016. [paper]
MARZ: an algorithm to combinatorially analyze gapped n-mer models of transcription factor binding
Rowan Zellers, Robert Drewell, Jacqueline Dresch. BMC Bioinformatics 2015. [paper+code]
We model the specificity with which regulatory proteins bind to DNA sequences during embryonic development. We use this model to study binding sites for 15 distinct regulatory proteins in the Drosophila (fruit fly) genome.
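To give a flavor of what a gapped n-mer model looks like, here is a toy sketch that counts length-n DNA patterns with some positions masked as wildcards. This is an illustration under my own simplifying assumptions, not the MARZ algorithm itself:

```python
from collections import Counter
from itertools import combinations

def gapped_nmers(seq, n=3, gaps=1):
    """Count gapped n-mers in a DNA sequence: every length-n window
    contributes one pattern per choice of `gaps` wildcard positions ('_')."""
    counts = Counter()
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        for masked in combinations(range(n), gaps):
            pattern = "".join(
                "_" if j in masked else window[j] for j in range(n)
            )
            counts[pattern] += 1
    return counts
```

Allowing wildcard gaps lets the model capture binding sites where certain nucleotide positions matter much less than others.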
Deep Learning Workshop
In November 2015 at Harvey Mudd, I organized a deep learning workshop for students in CS158: Machine Learning and CS151: Artificial Intelligence. Slides are available here.