Yu Su

About Me

I'm an Assistant Professor in the Department of Computer Science and Engineering at the Ohio State University, where I serve as the Co-Lead of Foundational AI in the ICICLE AI Institute and Lead of Machine Learning Foundations in the Imageomics Institute. I also spend some fun time at Microsoft Semantic Machines, where I help develop a suite of new technologies for task-oriented dialogue systems that have been deployed in Microsoft Outlook. I got my PhD from the University of California, Santa Barbara and my bachelor's degree from Tsinghua University.

I have broad interests in understanding human languages, formal knowledge and languages, and their interplay, with the overarching goal of enabling humans and machines to communicate and collaborate via natural language. More specifically, I'm interested in conversational AI (dialogue systems, semantic parsing, question answering, and grounded language understanding) and knowledge representation (knowledge base construction, reasoning, and querying). I'm also fascinated by a series of general AI problems such as interpretability, robustness and out-of-distribution generalization, multimodality, and sample-efficient learning. Finally, I believe in translational research and seek to leverage AI capabilities to empower a variety of domains such as biomedicine and biology.

I'm always looking for highly motivated students. Drop me an email with your CV and transcripts if you are interested in natural language processing, machine learning, or AI in general (unfortunately, due to the high volume, I may not be able to reply to every email).

What's New

  • Check out a summary of the major achievements by the OSU NLP group in 2021!
  • 07/2022: Serve as Senior PC member for AAAI'23.
  • 07/2022: Invited talk at the DLG4NLP workshop at NAACL'22: Will Graphs Lead to the Next Breakthrough of Conversational AI?
  • 06/2022: Our OSU team won 3rd place in the inaugural Amazon Alexa Prize TaskBot Challenge! Check out our website.
  • 05/2022: Thank you, Walmart and Cisco, for supporting our research!
  • 04/2022: Talk at Nanjing University and JD.com on emerging frontiers of conversational AI.
  • 02/2022: Paper on long-horizon vision-and-language navigation accepted to CVPR 2022.
  • 02/2022: Paper on text-to-SQL generalization accepted to ACL 2022.
  • 11/2021: Our team was selected to participate in the Alexa Prize SimBot challenge!
  • 09/2021: Excited to be a part of the Imageomics Institute -- a new NSF HDR Institute dedicated to knowledge-guided machine learning for biology. I will lead the Machine Learning Foundations team.
  • 08/2021: Paper on pre-trained language models with better reasoning capabilities accepted to EMNLP 2021.
  • 07/2021: Excited to be a part of ICICLE -- a new NSF AI Institute dedicated to democratizing AI through AI and cyberinfrastructure innovations. I will lead the AI team with Eric Fosler-Lussier. Read more.
  • 07/2021: Talk at USC/ISI and Beijing Academy of Artificial Intelligence on Emerging Frontiers of Conversational AI.
  • 05/2021: Long paper on large-scale joint KB and text embedding accepted to ACL 2021.
  • 03/2021: Received an Accelerator Grant from OSU TDAI on NLP for Social Media Pharmacovigilance.
  • 03/2021: Short Paper on compositional generalization for neural semantic parsing accepted to NAACL-HLT 2021.
  • 01/2021: Paper on non-i.i.d. generalization of question answering on knowledge bases accepted to TheWebConf 2021 (previously WWW).
  • 11/2020: Will co-organize the First Workshop on Natural Language Processing for Programming at ACL-IJCNLP 2021.
  • 09/2020: Will serve on the organizing committee of NAACL 2021.
  • 09/2020: Super excited to share some of the work I've been doing at Microsoft Semantic Machines: Task-Oriented Dialogue as Dataflow Synthesis (TACL'20)
  • 09/2020: Two long papers (learning language interfaces from use and data-to-text generation) accepted to EMNLP'20. One short paper on document classification for COVID-19 literature accepted to Findings of EMNLP.
  • 05/2020: Serve as Area Chair (Conversational Bot/QA) at NLPCC'20. Serve in the Program Committee of ACL'20, KDD'20 (chair of Trustworthy Data Mining session), EMNLP'20, AAAI'21, AKBC'20, IntEx-SemPar'20.
  • 04/2020: Long paper on logical natural language generation accepted to ACL 2020
  • 03/2020: Thank you, Fujitsu Laboratories of America, for supporting our research!
  • 01/2020: Started as Assistant Professor of Computer Science and Engineering at the Ohio State University
  • 08/2019: Long paper on model-based interactive semantic parsing got accepted to EMNLP 2019
  • 08/2019: Long paper on taxonomic categorization of documents got accepted to ICDM 2019
  • 05/2019: Short paper on general-purpose textual relation embedding got accepted to ACL 2019
  • 05/2019: Received Outstanding Dissertation Award of Computer Science from UCSB. Thank you UCSB!
  • 05/2019: Check out what we are doing at Microsoft Semantic Machines (highlighted in Microsoft Build 2019)!
  • 02/2019: Full paper on vocabulary selection got accepted to NAACL 2019
  • 02/2019: Talk at Stanford NLP Seminar on democratizing data science with knowledge engines
  • 11/2018: Full paper on zero-shot video captioning got accepted to AAAI 2019
  • 10/2018: Started as researcher at Microsoft Semantic Machines in Berkeley working on conversational AI.
  • 08/2018: Full paper on concept mining from text got accepted to ICDM 2018.
  • 08/2018: Two long papers on dialog/semantic parsing got accepted to EMNLP 2018.
  • 07/2018: Our work on natural language interfaces to APIs highlighted in Microsoft Research Blog!
  • 06/2018: Serve as PC member for ACL'18, EMNLP'18, CoNLL'18, NLPCC'18, and AAAI'19.
  • 04/2018: Paper "DialSQL: Dialogue Based Structured Query Generation" accepted to ACL'18 as long paper: Improve semantic parsing with dialog.
  • 04/2018: Paper "Natural Language Interfaces with Fine-Grained User Interaction: A Case Study on Web APIs" accepted to SIGIR'18 as long paper.
  • 03/2018: Awarded the Best Distinguished Graduate Student Lecture of UCSB CS Summit.
  • 02/2018: Paper "Global Relation Embedding for Relation Extraction" accepted to NAACL-HLT'18: Robust relation extraction from text with global statistics.
  • 02/2018: Talk about "Bridging the Gap between Human and Data with AI" at the University of Massachusetts, Amherst.
  • 02/2018: Successfully organized the first Workshop on Knowledge Base Construction, Reasoning and Mining in Los Angeles. Check out the great invited talks and accepted papers!
  • 01/2018: Talk about "Bridging the Gap between Human and Data with AI" at the Ohio State University.
  • 12/2017: I will serve in the Program Committee (Research Track) of KDD'18
  • 12/2017: Paper "Unsupervised Neural Categorization for Scientific Publications" accepted to SDM'18.
  • 11/2017: Attended CIKM'17 in Singapore and gave a talk on natural language interfaces and a tutorial on construction and querying of large-scale knowledge bases.
  • 10/2017: Upcoming visits in China: 10.09-10.15 (Alibaba, Hangzhou), 10.10 (Fudan University, Shanghai), 10.11 (The Computing Conference, Hangzhou), 10.16 (Tsinghua University, Beijing), 10.17 (Toutiao AI Lab, Beijing)
  • 09/2017: I'm co-organizing the First Workshop on Knowledge Base Construction, Reasoning and Mining (KBCOM'18) co-located with WSDM'18 on Feb 9, 2018 in Los Angeles. CFP is out!
  • 09/2017: Finished summer internship at MSR. Flying to Copenhagen for EMNLP.
  • 08/2017: I will serve in the Program Committee of WWW'18
  • 08/2017: Paper on natural language interface to web API from zero user and data accepted to CIKM'17.
  • 07/2017: Tutorial on Construction and Querying of Large-Scale Knowledge Bases accepted to CIKM'17. See you in Singapore!
  • 06/2017: Three papers on semantic parsing/QA accepted to EMNLP'17. Thanks to my collaborators!
  • 06/2017: Started summer internship at Microsoft Research
  • 04/2017: I will serve in the Program Committee of CIKM'17
  • 03/2017: Attended a project meeting at UIUC and gave a talk on unsupervised document categorization
  • 03/2017: I will serve in the Program Committee of NLPCC'17
  • 02/2017: I will serve in the Program Committee of EMNLP'17
  • 01/2017: I will serve in the Program Committee of ACL'17
  • 11/2016: Attended EMNLP'16 in Austin, US
  • 09/2016: Our QA dataset GraphQuestions v1 is released. Check it out!
  • 09/2016: Two papers on knowledge base question answering got accepted to EMNLP'16!
  • 09/2016: Attended the Bay Area Deep Learning School, Stanford
  • 06/2016: Started summer internship at Microsoft Research, Redmond

Teaching

Students

     Ph.D. Students

     Undergraduate Students

  • Alison Zhong (2022-)
  • Sparsh Kustagishettar (2022-)
  • Clay Washington (2021-)
  • Nikolas McNeal (2021-)
  • Sam Stevens (2020-2021) → Ph.D. at OSU

Publications

(Google Scholar) (Semantic Scholar)

     Refereed Publications

  • When More Data Hurts: A Troubling Quirk in Developing Broad-Coverage Natural Language Understanding Systems
    Elias Stengel-Eskin, Emmanouil Antonios Platanios, Adam Pauls, Sam Thomson, Hao Fang, Benjamin Van Durme, Jason Eisner, Yu Su. arXiv preprint arXiv:2205.12228, 2022 [paper]
  • ArcaneQA: Dynamic Program Induction and Contextualized Encoding for Knowledge Base Question Answering
    Yu Gu and Yu Su. arXiv preprint arXiv:2204.08109, 2022 [paper]
  • Thinking about GPT-3 In-Context Learning for Biomedical IE? Think Again
    Bernal Jiménez Gutiérrez, Nikolas McNeal, Clay Washington, You Chen, Lang Li, Huan Sun, Yu Su. arXiv preprint arXiv:2203.08410, 2022 [paper]
  • Bootstrapping a User-Centered Task-Oriented Dialogue System
    Shijie Chen, Ziru Chen, Xiang Deng, Ashley Lewis, Lingbo Mo, Samuel Stevens, Zhen Wang, Xiang Yue, Tianshu Zhang, Yu Su, Huan Sun. Alexa Prize TaskBot Challenge Proceedings, 2022 [paper]
  • Detecting Drug-Drug Interactions Between COVID-19 Therapies and Concomitant Medications Using the FDA Adverse Event Reporting System
    Eugene Jeong, Scott D Nelson, Yu Su, Bradley Malin, Lang Li, You Chen. Frontiers in Pharmacology, 2022 [paper]
  • Bridging the Generalization Gap in Text-to-SQL Parsing with Schema Expansion
    Chen Zhao, Yu Su, Adam Pauls, Emmanouil Antonios Platanios. In the Annual Conference of the Association for Computational Linguistics, 2022 (ACL'22) [paper]
  • One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones
    Chan Hee Song, Jihyung Kil, Tai-Yu Pan, Brian M. Sadler, Wei-Lun Chao, Yu Su. In the Conference on Computer Vision and Pattern Recognition, 2022 (CVPR'22) [paper] [code]
  • Random Control Selection for Conducting High-throughput Adverse Drug Events Screening Using Large-scale Longitudinal Health Data
    Chien-Wei Chiang, Pengyue Zhang, Macarius Donneyong, You Chen, Yu Su, Lang Li. CPT: Pharmacometrics & Systems Pharmacology, 2021 (PSP'21) [paper]
  • Compositional Generalization for Natural Language Interfaces to Web APIs
    Saghar Hosseini, Ahmed Hassan Awadallah, Yu Su. arXiv preprint arXiv:2112.05209, 2021 [paper]
  • An Investigation of Language Model Interpretability via Sentence Editing
    Samuel Stevens and Yu Su. In the Proc. of the BlackboxNLP Workshop at EMNLP, 2021 (BlackboxNLP'21) [paper] [code and data]
  • ReasonBERT: Pre-trained to Reason with Distant Supervision
    Xiang Deng, Yu Su, Alyssa Lees, You Wu, Cong Yu and Huan Sun. In the Proc. of the Conference on Empirical Methods in Natural Language Processing, 2021 (EMNLP'21) [paper] [code and pre-trained models]
  • A Systematic Investigation of KB-Text Embedding Alignment at Scale
    Vardaan Pahuja, Yu Gu, Wenhu Chen, Mehdi Bahrami, Lei Liu, Wei-Peng Chen, and Yu Su. In the Proc. of the Annual Conference of the Association for Computational Linguistics, 2021 (ACL'21) [paper] [code]
  • Compositional Generalization for Neural Semantic Parsing via Span-level Supervised Attention
    Pengcheng Yin, Hao Fang, Graham Neubig, Adam Pauls, Emmanouil Antonios Platanios, Yu Su, Sam Thomson, and Jacob Andreas. In the Proc. of the Annual Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies, 2021, short paper (NAACL-HLT'21) [paper]
  • Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases
    Yu Gu, Sue Kase, Michelle Vanni, Brian Sadler, Percy Liang, Xifeng Yan, and Yu Su. In the Proc. of the Web Conference (previously WWW), 2021 (TheWebConf'21) [paper] [data and leaderboard] [code]
  • Task-Oriented Dialogue as Dataflow Synthesis
    Semantic Machines, Jacob Andreas, John Bufe, David Burkett, Charles Chen, Josh Clausman, Jean Crawford, Kate Crim, Jordan DeLoach, Leah Dorner, Jason Eisner, Hao Fang, Alan Guo, David Hall, Kristin Hayes, Kellie Hill, Diana Ho, Wendy Iwaszuk, Smriti Jha, Dan Klein, Jayant Krishnamurthy, Theo Lanman, Percy Liang, Christopher H. Lin, Ilya Lintsbakh, Andy McGovern, Aleksandr Nisnevich, Adam Pauls, Dmitrij Petters, Brent Read, Dan Roth, Subhro Roy, Jesse Rusak, Beth Short, Div Slomin, Ben Snyder, Stephon Striplin, Yu Su, Zachary Tellman, Sam Thomson, Andrei Vorobev, Izabela Witoszko, Jason Wolfe, Abby Wray, Yuchen Zhang and Alexander Zotov. Transactions of the Association for Computational Linguistics, 2020 (TACL’20) [blog][paper] [code] [data and leaderboard] [Satya Nadella's Presentation at Microsoft Build 2019] [talk at EMNLP'20]
  • An Imitation Game for Learning Semantic Parsers from User Interaction
    Ziyu Yao, Yiqi Tang, Wen-tau Yih, Huan Sun, Yu Su. In Proc. of the Conference on Empirical Methods in Natural Language Processing, 2020 (EMNLP’20) [paper] [code]
  • KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation
    Wenhu Chen, Yu Su, Xifeng Yan, William Yang Wang. In Proc. of the Conference on Empirical Methods in Natural Language Processing, 2020 (EMNLP’20) [paper] [code]
  • Document Classification for COVID-19 Literature
    Bernal Jiménez Gutiérrez, Juncheng Zeng, Dongdong Zhang, Ping Zhang, Yu Su. Findings of EMNLP'20, short paper. Presented at NLP-COVID workshop at ACL'20 [paper]
  • Logical Natural Language Generation from Open-Domain Tables
    Wenhu Chen, Jianshu Chen, Yu Su, Zhiyu Chen, William Yang Wang. In Proc. of the Annual Conference of the Association for Computational Linguistics, 2020 (ACL’20) [paper][code and data]
  • Model-based Interactive Semantic Parsing: A Unified Formulation and A Text-to-SQL Case Study
    Ziyu Yao, Yu Su, Huan Sun, Scott Wen-tau Yih. In Proc. of the Conference on Empirical Methods in Natural Language Processing, 2019 (EMNLP’19) [paper][code]
  • HierCon: Hierarchical Organization of Technical Documents based on Concepts
    Keqian Li, Shiyang Li, Semih Yavuz, Hanwen Zha, Yu Su, and Xifeng Yan. In Proc. of the IEEE International Conference on Data Mining, 2019 (ICDM’19) [paper]
    Best of ICDM 2019 Selection
  • Global Textual Relation Embedding for Relational Understanding
    Zhiyu Chen, Hanwen Zha, Honglei Liu, Wenhu Chen, Xifeng Yan and Yu Su. In Proc. of the Annual Conference of the Association for Computational Linguistics, 2019, short paper (ACL’19) [paper] [code and data]
  • How Large A Vocabulary Does Text Classification Need? A Variational Approach on Vocabulary Selection
    Wenhu Chen, Yu Su, Yilin Shen, Zhiyu Chen, Xifeng Yan and William Yang Wang. In Proc. of the Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2019 (NAACL-HLT’19) [paper]
  • Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning
    Xin Wang, Jiawei Wu, Da Zhang, Yu Su, William Yang Wang. In Proc. of the AAAI Conference on Artificial Intelligence, 2019 (AAAI’19) [paper]
  • Concept Mining via Embedding
    Keqian Li, Hanwen Zha, Yu Su, Xifeng Yan. In Proc. of the IEEE International Conference on Data Mining, 2018 (ICDM’18) [paper]
  • XL-NBT: A Cross-lingual Neural Belief Tracking Framework
    Wenhu Chen, Jianshu Chen, Yu Su, Xin Wang, Dong Yu, Xifeng Yan and William Yang Wang. In Proc. of the Conference on Empirical Methods in Natural Language Processing, 2018 (EMNLP’18) [paper]
  • What It Takes to Achieve 100% Condition Accuracy on WikiSQL
    Semih Yavuz, Izzeddin Gur, Yu Su and Xifeng Yan. In Proc. of the Conference on Empirical Methods in Natural Language Processing, 2018 (EMNLP’18) [paper]
  • DialSQL: Dialogue Based Structured Query Generation
    Izzeddin Gur, Semih Yavuz, Yu Su, Xifeng Yan. In Proc. of the Annual Meeting of the Association for Computational Linguistics, 2018, oral (ACL’18) [paper]
  • Natural Language Interfaces with Fine-Grained User Interaction: A Case Study on Web APIs
    Yu Su, Ahmed Hassan Awadallah, Miaosen Wang, Ryen White. In Proc. of the International ACM SIGIR Conference on Research and Development in Information Retrieval, 2018, oral (SIGIR’18) [paper] [Microsoft Research Blog]
  • Global Relation Embedding for Relation Extraction
    Yu Su*, Honglei Liu*, Semih Yavuz, Izzeddin Gur, Huan Sun, Xifeng Yan. In Proc. of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018 (NAACL-HLT’18) [paper] [code] (*: Equal Contribution)
  • Unsupervised Neural Categorization for Scientific Publications
    Keqian Li, Hanwen Zha, Yu Su, Xifeng Yan. In Proc. of the SIAM International Conference on Data Mining, 2018, oral (SDM’18) [paper]
  • Building Natural Language Interfaces to Web APIs
    Yu Su, Ahmed Hassan Awadallah, Madian Khabsa, Patrick Pantel, Michael Gamon, Mark Encarnacion. In Proc. of the ACM International Conference on Information and Knowledge Management, 2017, oral (CIKM’17) [paper]
  • Cross-domain Semantic Parsing via Paraphrasing
    Yu Su, Xifeng Yan. In Proc. of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP’17) [paper] [code]
  • An End-to-End Deep Framework for Answer Triggering with a Novel Group-Level Objective
    Jie Zhao, Yu Su, Ziyu Guan, Huan Sun. In Proc. of the 2017 Conference on Empirical Methods in Natural Language Processing, short paper (EMNLP’17) [paper]
  • Recovering Question Answering Errors via Query Revision
    Semih Yavuz, Izzeddin Gur, Yu Su, Xifeng Yan. In Proc. of the 2017 Conference on Empirical Methods in Natural Language Processing, short paper (EMNLP’17) [paper]
  • On Generating Characteristic-rich Question Sets for QA Evaluation
    Yu Su, Huan Sun, Brian Sadler, Mudhakar Srivatsa, Izzeddin Gur, Zenghui Yan, Xifeng Yan. In Proc. of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP’16) [paper] [appendix] [data]
  • Improving Semantic Parsing via Answer Type Inference
    Semih Yavuz, Izzeddin Gur, Yu Su, Mudhakar Srivatsa, Xifeng Yan. In Proc. of the 2016 Conference on Empirical Methods in Natural Language Processing, oral (EMNLP’16) [paper]
  • A Fast Kernel for Attributed Graphs
    Yu Su, Fangqiu Han, Richard E. Harang, Xifeng Yan. In Proc. of the SIAM International Conference on Data Mining, 2016, oral (SDM’16) [paper] [appendix] [slides] [poster]
  • Table Cell Search for Question Answering
    Huan Sun, Hao Ma, Xiaodong He, Wen-Tau Yih, Yu Su, Xifeng Yan. In Proc. of the International World Wide Web Conference, 2016, oral (WWW’16) [paper]
  • Visual Graph Query Formulation and Exploration: A New Perspective on Information Retrieval at the Edge
    Sue Kase, Michelle Vanni, Joanne Knight, Yu Su, Xifeng Yan. In Proc. of SPIE 9851, Next-Generation Analyst IV, 2016 (SPIE Defense+Security’16)
  • Exploiting Relevance Feedback in Knowledge Graph Search
    Yu Su, Shengqi Yang, Huan Sun, Mudhakar Srivatsa, Sue Kase, Michelle Vanni, Xifeng Yan. In Proc. of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, oral (KDD’15) [paper] [slides] [poster] [data]
  • On the Validity of Geosocial Mobility Traces
    Zengbin Zhang, Lin Zhou, Xiaohan Zhao, Gang Wang, Yu Su, Miriam Metzger, Haitao Zheng, and Ben Y. Zhao. In Proc. of the ACM Workshop on Hot Topics in Networks, 2013 (HotNets’13) [paper]

     Tutorials

  • Scalable Construction and Querying of Massive Knowledge Bases
    Xiang Ren, Yu Su, Pedro Szekely, Xifeng Yan. In Proc. of the International Conference on World Wide Web, 2018 (WWW’18) [website]
  • Construction and Querying of Large-scale Knowledge Bases
    Xiang Ren, Yu Su, Xifeng Yan. In Proc. of the ACM International Conference on Information and Knowledge Management, 2017 (CIKM’17) [website]

Service

  • Organizer/Co-chair: NAACL'21 (Faculty Advisor to Student Research Workshop), NLP4Prog'21, NLI'20 (co-located with ACL'20), KBCOM'18 (co-located with WSDM'18)
  • Area Chair: ACL'22, NLPCC'20
  • Program Committee Member: ACL'22, ACL'21, EMNLP'21, NAACL'21, AKBC'21, KDD'21, AAAI'21, KDD'20 (session chair), ACL'20, EMNLP'20, AKBC'20, KDD'19 (session chair), NAACL'19, AAAI'19, AKBC'19, KDD'18, ACL'18, EMNLP'18, WWW'18, NLPCC'18, CoNLL'18, ACL'17, EMNLP'17, NLPCC'17, CIKM'17 (session chair)
  • Reviewer: TACL, ACL Rolling Review, IEEE TNNLS, IEEE TKDE

Sponsors

  • We are grateful to NSF (awards 2118240, 2112606, 2137806), Amazon, Walmart, Cisco, Fujitsu, and OSU TDAI for supporting our research.

Contact

  • Email: %s@osu.edu % 'su.809'