Casting a Wider Net: NLP for the Social Web

Natural language text dominates the information available on the Web. Yet the language of online expression often differs substantially, in both style and substance, from the language found in more traditional sources such as news. Making natural language processing techniques robust to this sort of variation is thus important if applications are to behave intelligently when presented with Web text.

This talk presents new research on two sequence prediction tasks, part-of-speech tagging and named entity detection, applied to text from online social media platforms (Twitter and Wikipedia). For both tasks, we adapt standard forms of annotation to better suit the linguistic and topical characteristics of the data. We also propose techniques for training more accurate statistical taggers, including linguistic features inspired by the domain (for part-of-speech tagging of Twitter messages) as well as modifications to the learning algorithm (for named entity detection in Arabic Wikipedia).

Wednesday, October 5, 2011 - 12:00

Room 1213

Contact Info: Nancy Lacson

Nathan Schneider is a Ph.D. student in the Language Technologies Institute at Carnegie Mellon University, where he is advised by Noah Smith. Nathan’s current research focuses on machine learning as applied to various linguistic analysis problems in NLP, especially those involving semantics, multilingual or heterogeneous data, and second language acquisition. Other linguistic interests include Semitic languages, morphology, and the cognitive underpinnings of language. As an undergraduate, he studied Computer Science and Linguistics at the University of California, Berkeley.
