Research

Natural Language Processing studies the computational models that underlie the processing of human languages, and develops key technology that allows users to interact with computers using English or Italian rather than a programming language. One of the many challenges is to develop interactive applications in which users can hold a natural conversation with a computer. This has been the focus of the NLP laboratory at UIC for many years, especially the computational modeling of extended text (discourse) and of conversations between two or more agents (dialogue). In the last 15 years, the lab has pursued what could be called NLP with a purpose: interfaces whose core is NLP technology and that have the potential to positively affect society. In this vein, we have explored three main strands of research, briefly described below:

  • Natural Language interfaces for educational technology
  • Summarization and recommender systems, including in the health sciences
  • Human-robot interaction

 
Methodology and Foundational work

The methodology we employ consists of data mining from corpora, motivated as much by linguistic and cognitive principles as by empirical considerations; development of computational frameworks based on the corpus analysis; and rigorous evaluation of the ensuing software systems. Our work has provided significant contributions to the field, including on issues of resource creation and validation. The most cited paper from the lab is (Di Eugenio & Glass, 2004), a critical look at the Kappa coefficient of inter-annotator agreement. We still strongly believe in marrying linguistic insights and statistics, when we can show that linguistic knowledge results in better models. For example, we developed an innovative discourse parser that incorporates verb semantics and performs better than models based only on lexical and syntactic information (Subba & Di Eugenio, 2009). This work also resulted in a publicly available corpus of texts annotated with discourse structure (inquiries about the corpus can be directed to the lab). Similarly, we explored a variety of parameter settings in different corpora for centering, a theory of reference (Poesio et al., 2004). In work on recognizing dialogue acts, we showed that information about the hierarchical structure of the dialogue (dialogue games) improves empirical models (Di Eugenio et al., 2010b). Building on earlier work on discourse cues (Di Eugenio et al., 1997), we have recently explored the societally important application of translating from Italian into Italian Sign Language (Lugaresi & Di Eugenio, 2013).
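As a concrete illustration of the agreement measure analyzed in (Di Eugenio & Glass, 2004), the following minimal sketch computes Cohen's Kappa for two annotators; the labels and counts are invented for the example.

    from collections import Counter

    def cohens_kappa(labels_a, labels_b):
        """Cohen's Kappa: agreement between two annotators, corrected for chance."""
        n = len(labels_a)
        # Observed agreement: fraction of items the two annotators label identically.
        p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        # Expected chance agreement, from each annotator's label distribution.
        dist_a, dist_b = Counter(labels_a), Counter(labels_b)
        p_e = sum(dist_a[c] * dist_b[c] for c in dist_a) / (n * n)
        return (p_o - p_e) / (1 - p_e)

    # Hypothetical annotations of ten text segments for presence of a discourse cue.
    ann_1 = ["cue", "cue", "none", "cue", "none", "none", "cue", "none", "cue", "none"]
    ann_2 = ["cue", "none", "none", "cue", "none", "cue", "cue", "none", "cue", "none"]
    print(round(cohens_kappa(ann_1, ann_2), 2))  # 0.6

As (Di Eugenio & Glass, 2004) discusses, the value of Kappa depends not only on observed agreement but also on how skewed the label distributions are, which is why raw percent agreement alone can be misleading.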

 
NL for educational technology

Since 2000, we have been investigating what makes human conversations effective in a learning context, and what value NL interfaces add to educational technology applications. In collaboration with psychologists and instructional experts, we have been studying tutorial and peer dialogues, i.e., dialogues between a human tutor and a tutee, or between two peers studying together. We collect, analyze and mine these dialogues for instructional and collaboration strategies that are cognitively plausible and correlate with learning. Several software systems have resulted from this work. In our first project, DIAG-NLP (Di Eugenio et al., 2005; Di Eugenio et al., 2008), we showed that more concise and abstract feedback leads to more learning when students diagnose simulated malfunctions of a mechanical system. In the last few years, we have focused on instruction in introductory Computer Science classes. We investigated two different approaches: iList, an Intelligent Tutoring System (Fossati et al., 2008; Fossati et al., 2009a; Fossati et al., 2009b), and KSC-PaL, a system with which a student can collaborate as if with a real peer (Kersey et al., 2009; Kersey et al., 2010). Both systems have been found to be conducive to learning. In particular, iList has been evaluated with more than 200 students, and downloaded by about 1500 additional users (iList is publicly available at http://www.digitaltutor.net/). Our current project, funded by the Qatar Research Foundation (NPRP award 5-939-1-155), expands iList into ChiQat-Tutor. This research ranges from investigating the best pedagogical strategies for teaching recursion to building sophisticated data mining models of the iList log data (Di Eugenio et al., 2013b).

 
Summarization for entertainment and for health care

The vast amount of available language data calls for appropriate tools to manage it, such as summarization, a well-established field in NLP. However, summarization research has mostly produced summaries composed of sentences extracted whole from the text. We first worked on generating summaries composed of subsentential fragments that are recombined to create new sentences (Xie et al., 2004; Xie et al., 2008). In more recent work, we used information extraction and generation techniques to produce summaries of song reviews for a Music Recommendation System (Tata & Di Eugenio, 2010; Tata & Di Eugenio, 2012). One current project applies summarization to the generation of patient-centric summaries of hospital stays, which integrate information from both discharge notes, written by doctors, and nursing notes. Our first result computationally confirms our intuition that there is scant overlap between the notes written by doctors and by nurses for the same patient (Di Eugenio et al., 2013a).
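The overlap finding can be illustrated with a small sketch; the Jaccard measure and the note snippets below are simplified assumptions for exposition, not the actual metric or data of (Di Eugenio et al., 2013a).

    STOPWORDS = {"the", "a", "of", "and", "with", "to", "in", "on", "is"}

    def content_words(text):
        """Lowercase a note and keep its non-stopword tokens."""
        return {w.strip(".,;") for w in text.lower().split()} - STOPWORDS

    def jaccard_overlap(note_a, note_b):
        """Jaccard overlap between the content-word sets of two notes."""
        a, b = content_words(note_a), content_words(note_b)
        return len(a & b) / len(a | b)

    # Invented, highly simplified examples of a physician note and a nursing note.
    physician = "Patient admitted with pneumonia, treated with IV antibiotics, discharged on day 5."
    nursing = "Patient ambulated in hallway, tolerated diet, pain controlled with medication."
    print(round(jaccard_overlap(physician, nursing), 2))  # 0.06, i.e., scant overlap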

 
Human-robot interaction

In the last few years, the RoboHelper project (supported by NSF award IIS 0905593) has explored the development of robots tailored to the needs of the elderly (Di Eugenio et al., 2010a). We collected the multimodal ELDERLY-AT-HOME corpus, where one assistant collaborates with an elderly person in performing Activities of Daily Living (ADLs). The project has focused on building a multimodal interface for communication between the elderly person and the robot, since our data collection confirms that beyond language, gestures and haptic actions (gestures that involve touch) are pervasive in this sort of interaction. The corpus has been annotated for a variety of information and will be made available in due course. We have developed a multimodal dialogue manager that performs multimodal reference resolution (Chen & Di Eugenio, 2012), models the fact that these interactions comprise not only dialogue acts but also physical actions, and predicts the next dialogue act on the basis of the preceding multimodal signals (Chen & Di Eugenio, 2013).
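To make the last point concrete, the following sketch shows the general shape of predicting the next dialogue act from preceding multimodal signals; the feature names, labels, and classifier are assumptions for exposition, not the actual model of (Chen & Di Eugenio, 2013).

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Each instance records the signals preceding an utterance: the previous
    # dialogue act, whether a pointing gesture occurred, and whether a haptic
    # action (touching an object) occurred. The label is the next dialogue act.
    train_X = [
        {"prev_act": "request", "pointing": True,  "haptic": False},
        {"prev_act": "request", "pointing": False, "haptic": True},
        {"prev_act": "inform",  "pointing": False, "haptic": False},
        {"prev_act": "inform",  "pointing": True,  "haptic": True},
    ]
    train_y = ["accept", "accept", "acknowledge", "request"]

    model = make_pipeline(DictVectorizer(sparse=False), LogisticRegression(max_iter=1000))
    model.fit(train_X, train_y)

    # Predict the next dialogue act for an unseen multimodal context.
    print(model.predict([{"prev_act": "request", "pointing": True, "haptic": True}]))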