research

Current Projects

Responsiveness in Spoken Tutorial Dialogs
An interesting aspect of human-human conversation is the ability of a conversant to pick up on the nuances of the other conversant's speech and infer the other's state at the time, such as whether they are happy, frustrated, or confused. Using the prosody of the utterances and the timing between utterances, an insightful conversant is able to alter their own speaking style and feedback to encourage the other speaker. By evaluating a corpus collected from a skilled tutor, we will be able to develop rules for when and how to use specific acknowledgments and feedback. Using these rules, we hope to build a more human-like tutoring system that users will prefer to a system without these rules. This is an extension of a previous study done in Japanese.

Tasha Hollingsed, Ernesto Medina (Advisor: Nigel Ward)

Prosodic Cues that Lead to Back-Channel Feedback in Northern-Mexican Spanish
Although human-human spoken dialog is generally fluent and natural, interaction with today's spoken dialog systems is often rigid and unpleasant. Thus a research priority is to identify aspects of human-human interaction that may be exploited for better human-computer dialog systems. One such aspect is back-channel feedback, that is, the short utterances which a listener commonly produces while a speaker is talking, such as uh-huh in English and si, aja, and mjm in Spanish. A dialog system that appropriately produces such utterances may seem more encouraging and competent to users. To do this, systems need to know the places in which back-channels are welcome. This study analyzed prosodic features of five conversations to identify the cues which lead to back-channel feedback. In Northern Mexican Spanish, these places are mostly characterized by a pitch downslope followed by a pitch rise, accompanied by a rate reduction on the last syllable, and then a drop in energy followed by a slight pause. A quantitative model based on this feature gave 29% coverage and 14% accuracy.
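
The coverage and accuracy figures above can be illustrated with a minimal sketch. It assumes that coverage is the fraction of actual back-channels that the rule predicts, and that accuracy is the fraction of predictions that fall near an actual back-channel. These definitions, the function name, the 0.5-second tolerance, and the example times are assumptions for illustration, not details taken from the study.

```python
from typing import Sequence, Tuple


def coverage_and_accuracy(
    predicted: Sequence[float],
    actual: Sequence[float],
    tolerance: float = 0.5,  # assumed matching tolerance, in seconds
) -> Tuple[float, float]:
    """Compare predicted back-channel times with observed ones (in seconds)."""

    def near(t: float, others: Sequence[float]) -> bool:
        # True if some time in `others` lies within `tolerance` of t.
        return any(abs(t - o) <= tolerance for o in others)

    # Coverage: share of actual back-channels that some prediction matched.
    covered = sum(1 for a in actual if near(a, predicted))
    # Accuracy: share of predictions that matched some actual back-channel.
    correct = sum(1 for p in predicted if near(p, actual))

    coverage = covered / len(actual) if actual else 0.0
    accuracy = correct / len(predicted) if predicted else 0.0
    return coverage, accuracy


# Made-up times, purely for illustration.
actual_times = [2.0, 5.0, 9.0, 14.0]
predicted_times = [2.2, 7.0, 9.3, 11.0, 20.0]
coverage, accuracy = coverage_and_accuracy(predicted_times, actual_times)
print(f"coverage={coverage:.0%} accuracy={accuracy:.0%}")
```

Under these assumed definitions, a rule that fires often can raise coverage while lowering accuracy, which is why the two numbers are reported together.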

Luis H. Acosta Reyes and Anais Rivera (Advisor: Nigel Ward)

Prosodic Features that Invite Back-Channel Responses in Arabic
To be a good listener requires active listening, which can be achieved partly by producing small utterances called back-channel feedback. Second language learners, even those who have mastered grammar and vocabulary, can easily appear uninterested when they do not show responsiveness in real face-to-face conversations. For Arabic, there is a lack of resources that teach L2 learners when to produce back-channels in dialog. The most frequent prosodic feature used as a cue for back-channel feedback in Arabic was found to be a steep pitch downslope. As a predictive rule, this feature gave 43% coverage and 13% accuracy on a 168-minute corpus of Egyptian Arabic (Ward and Al Bayyari, 2006). The next most frequent cue (causing 14% of the total back-channels) is an upturn in pitch, whose predictive rule gave 19% coverage and 15% accuracy on 112 minutes of Iraqi Arabic dialogs. Further work remains to be done on whether any visual cues exist in Arabic dialog; for this, a video-recorded Arabic corpus will be used. Additionally, we have investigated whether speakers' gestures play a role in cueing back-channel feedback from listeners, using a video-recorded Iraqi Arabic corpus of face-to-face free-content dialogs which we collected. We found that the tendency of visual cues to co-occur with subsequent back-channels was not significant. We also found that a prosodic cue accompanied by a visual cue is no stronger than a prosodic cue alone in eliciting a back-channel response.

Yaffa Al Bayyari (Advisor: Nigel Ward)

A Training Tool for Teaching Arabic Back-channeling Behavior
We aim to develop a tool to teach back-channeling to learners of a foreign language. The current development version of the trainer has been used in pilot studies together with Flash-based tutorials, and the results show that a 15-minute session with the trainer and the tutorials is effective in teaching basic back-channeling behavior. The tool, called “Back-channel Trainer”, works as follows. The core trainer plays back to the user a conversation taken from real phone conversations in Arabic. These conversations present the user with back-channel opportunities, so that learners receive numerous examples of the cues. The listener’s side of each conversation, that is, the side containing the back-channel productions to be emulated by the learner, is removed, and the remaining side is used as the stimulus. The trainer features a visual indicator that calls attention to each cue in the stimulus track. The trainer scores the learner based on the timing and frequency of his or her productions.
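
As a rough illustration of scoring on timing and frequency, the sketch below credits a learner's production when it begins within a short window after a cue. The function name, the window bounds, and the two score definitions are hypothetical choices made for this example and do not describe how the Back-channel Trainer itself computes scores.

```python
from typing import Dict, Sequence, Tuple


def score_session(
    cue_times: Sequence[float],
    response_times: Sequence[float],
    window: Tuple[float, float] = (0.2, 1.0),  # assumed bounds, in seconds after a cue
) -> Dict[str, float]:
    """Score a learner's back-channel productions against cue times (seconds)."""
    lo, hi = window
    remaining = sorted(response_times)
    hits = 0
    for cue in sorted(cue_times):
        # Credit at most one response per cue, and each response at most once.
        for i, response in enumerate(remaining):
            if lo <= response - cue <= hi:
                hits += 1
                del remaining[i]
                break
    return {
        # Timing: share of the learner's productions that were well placed.
        "timing": hits / len(response_times) if response_times else 0.0,
        # Frequency: share of back-channel opportunities the learner took.
        "frequency": hits / len(cue_times) if cue_times else 0.0,
    }


# Made-up session: four cues, three learner productions.
scores = score_session([3.0, 8.0, 15.0, 21.0], [3.5, 10.5, 15.4])
print(scores)
```

Separating the two scores lets a learner see whether the problem is responding too rarely or responding at the wrong moments.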

Rafael Escalante, Nigel Ward, Yaffa Al Bayyari, Thamar Solorio.

Open Source Game System
This project provides public-domain programming resources for authors and developers of video games.

Joaquin A. Aguilar (Advisor: David Novick)

Adding the Ability to Eavesdrop on Opponent Communications to an Online Game
This project is aimed at determining whether allowing opponents to overhear each other's conversations during a team-based online capture-the-flag match improves the fun factor of a game. The implementation uses the Quake 3 source code as the core platform and incorporates elements of other open source library packages such as PortAudio, Speex, SoX, and RakNet. Several user studies are being conducted to test whether the new feature is valuable. The subjects will engage in capture-the-flag matches, with and without the eavesdropping ability; capture the flag intrinsically gives scenarios that require users to plan strategies, work in teams, and communicate. The positive and negative impacts of this new feature on subjective satisfaction will be measured, including vividness, realism, enjoyment, difficulty, novelty, and sensory gratification.

Jaime Acosta (Advisor: Nigel Ward)


Previous Projects

A Tool for Analysis of Sound Files
This project involved the extension of Wavesurfer, an existing open-source analysis tool. The original software allows a user to analyze various aspects of a sound file, including energy and pitch. It also provides a user with the ability to transcribe the files. The main feature added was support for stereo sound files, because the original version supported only mono files.

Ernesto Medina (Advisor: Thamar Solorio)

The Effects of Transmission Delay on Conversation Dynamics
In telephony, the effects of transmission delay on user satisfaction have long been studied, and the relationship is quantified as part of a standard predictive model, the E-model. However, this research has focused almost exclusively on telephony in traditional contexts, with both conversants seated in quiet offices and devoting full attention to the conversation. We plan to reexamine the effects of transmission delay on conversational dynamics, and explore second-order effects that may arise in mobile contexts.

Anais Rivera (Advisor: Nigel Ward)


Byrics Software, a Tool to Assist the Deaf and Hard of Hearing Experience Music
A person born deaf or hard of hearing enters a world of limited to complete silence. Their world contains limited auditory conversation, music, and ambient sound. Some deaf or hard of hearing people do not understand music because they cannot hear it. People with hearing disabilities have lost the ability to hear some or all of the frequencies in a song; some can hear the music but find the lyrics hard to place because they have a higher pitch. A tool is being developed that conveys music through sight, sound, and touch (via vibrations), to enhance the experience and give users a better appreciation and understanding of music. The tool can play music, display lyrics and spectrograms, and interact with Snugums. Snugums, from Somatron, is a vibro-acoustic speaker inside a furry creature. This device provides sound vibrations without having to raise the volume.

Rosario Chavez (Advisor: Nigel Ward)


Automated Processing of Interlanguages
The main goal is to extend Natural Language Processing tools by allowing them to process language alternations. We are currently working with Spanglish. In this context, we use the term Spanglish to refer to the language alternation patterns of English and Spanish observed in large Hispanic communities across the U.S. As a first step, we gathered a small corpus of Spanglish use in a natural conversation between several Hispanic speakers. We then explored the use of existing language modeling toolkits and evaluated how well these tools can be trained with this corpus. Currently we are working on developing a Part-of-Speech tagger for Spanglish.

Juan Carlos Franco (Advisor: Thamar Solorio)

Consistent Generation of Text and Graphics
Safety-critical applications such as aircraft maintenance manuals contain both textual and graphical descriptions of the same systems. Incidents have occurred as a result of inconsistencies between the textual and graphical descriptions. To address this problem, we are developing ways of producing both the text and graphics for a system from a common logical description of the system. This approach will also enable generation of the text in multiple languages.

PI: Dr. David Novick

Speech Recognition System Development
This project involves the study of speech recognition software and the development of Perl scripts that can be used to automatically build speech recognition systems. The purpose of these scripts is to build a speech recognition system that can recognize a user's single-word answers to quiz items presented by an interactive tutoring system (ITS) under development by Tasha Hollingsed. Eventually, the two systems will be integrated to form an ITS that can automatically recognize answers. In addition, these scripts can be used to customize the speech recognition system and evaluate its accuracy. In particular, we are producing scripts that work with the Hidden Markov Model Toolkit (HTK), a speech recognition system developed by the Cambridge University Engineering Department.

Ernesto Medina (Advisor: Nigel Ward)

Acknowledgments in Human-Computer Interaction
This study focuses on the use of acknowledgments as a form of understanding and agreement during Human-Computer dialogues. This study asks if people are unwilling to use acknowledgments when speaking to a machine or if their scarcity is due to the way that spoken dialogue systems are designed. This study was run in English and Spanish.

PI: Karen Ward
Other study members: Tasha Hollingsed, Javier Aldaz Salmon

Extended Direct Manipulation
This study's objective is to extend direct-manipulation interfaces by incorporating, via the direct-manipulation modality itself, interaction techniques that add kinds of language features associated with spoken conversation.

PI: David Novick
Other study members: Armando Sandoval, Gabriel Sotelo

An Improved Tool For Taking Notes in the Classroom
Although computers have become ubiquitous, largely replacing traditional tools such as paper and pen, there remains a common environment where computers are seldom used: the classroom. The reason for this is that class notes have unique properties not seen in other documents. In early 2002, Tatsukawa and Ward's NoteTaker demonstrated how to design a system suitable for taking notes in class, and studies found that it could be useful for students. However, weaknesses of the hardware available at that time limited the utility of this system. The system developed here provides similar functionality on the Tablet PC, a PC design with a high-resolution digitizing tablet released in late 2002. This version includes a GUI redesigned for Swing and various event-handling optimizations to provide good performance on the Tablet PC. This version was tested by five subjects using it to take notes in their classes, and most rated it positively.

Thesis Project by: Jabel Morales
Advisor: Nigel Ward

Situational Awareness in Medical Displays
In this study, we apply Endsley's Situation Awareness Model (Endsley, 1995) and PICTIVE's low-tech approach to participatory design (Muller, 1992) to the display of medical records. We believe that this process will yield a better way of displaying medical data, one that helps medical professionals better understand the condition of their patients.

Thesis Project by: Francisco Romero
Advisor: Karen Ward

Towards Automatic Transcription of Sports Play-by-Play
Play-by-play broadcasts of sports events are a challenge for speech recognition because they involve noise, multiple speakers, emotions, and other phenomena of spontaneous speech. This study is a first attempt to build a system to recognize football play-by-play, initially focusing on the combination of two language models, one for the event descriptions and one for the commentary.
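
One common way to combine two language models is linear interpolation, sketched below with toy unigram models. The source does not say which combination method the study used, so the interpolation scheme, the function names, the 0.5 weight, and the example sentences are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def train_unigram(sentences: Iterable[str]) -> Dict[str, float]:
    """Estimate unigram probabilities from whitespace-tokenized sentences."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(
    p_event: Dict[str, float],
    p_commentary: Dict[str, float],
    weight: float = 0.5,  # assumed weight on the event-description model
) -> Dict[str, float]:
    """Mix two models: P(w) = weight * P_event(w) + (1 - weight) * P_commentary(w)."""
    vocab = set(p_event) | set(p_commentary)
    return {
        w: weight * p_event.get(w, 0.0) + (1 - weight) * p_commentary.get(w, 0.0)
        for w in vocab
    }


# Invented toy sentences standing in for the two kinds of broadcast speech.
event_model = train_unigram(["pass complete to the forty", "handoff up the middle"])
commentary_model = train_unigram(["what a great catch", "the defense looks tired"])
mixed = interpolate(event_model, commentary_model)
print(round(mixed["the"], 4))
```

Because each input model sums to one, the interpolated model does too, so it remains a valid probability distribution for any weight between 0 and 1.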

Thesis Project by: Ryota Miura
Advisor: Nigel Ward

Border Agent-based Simulation Environment (BASE)
This project is developing a single-cell automata simulation tool for modeling the complex phenomena of political borders. The objective of the modeling is not so much prediction as understanding the nature of border phenomena. The questions being modeled and explored in our Border Agent-based Simulation Environment (BASE) begin with migration stimulated by material push/pull factors, namely, availability of food, and extend to the effects of the border, such as rates of filtering, mortality, congestion, asymmetries (resources, fertility, disease) on each side of the border, acculturation, the effects of communication, resulting patterns of interaction, including inter-breeding, among agents of different backgrounds, ownership, importation and theft of resources, and secondary borders, such as highway checkpoints. The results of the research will have value both for policy-makers facing border issues and for computer scientists striving to make simulations more realistic.

Guillermo Enriquez (Advisors: David Novick, Jon Amastae)

Some Practical Issues and Research Priorities in Dialog Management
Today's spoken dialog systems fall short of the human-to-human ideal. Is this because developers do not always use the state of the art in building their systems, or because the technologies necessary to improve spoken interaction do not yet exist? To answer this question, we developed a credit card system using state-of-the-art technology. We conducted experiments in which the subjects were to complete three tasks, both using the system and speaking to a human operator. We were able to observe and classify the differences between the two dialogs. We expect that these observations will provide us with a deeper insight into what would be necessary in order to achieve the ideal interaction.

Anais Rivera (Advisor: Nigel Ward)

Adaptive Speaking Rate in a Tutorial Spoken Dialog System
Today there are many tutoring systems that can train users in memory games, and there are also systems that can adapt their speed to the user's needs. Our aim is to build a system that does both: a tutor that adapts its speaking rate to suit the user, so that no time is lost to prompts that are too slow and no statements need repeating because prompts were too fast. Further, we plan to link such a system to external processes for prosody-based, highly responsive behavior to improve turn-taking, emulating human-human interaction.

Kumar Mamidipally (Advisor: Nigel Ward)

Non-Lexical Utterances
There are many sounds in conversation that are not words, such as u-uh, uh-uhmmm, um-um-hu-mh, ahh, aum, and haah. These sounds play important roles in human-human communication, especially in the management of information status, interpersonal affect, and channel control, but are not used in spoken dialog systems today. We are exploring the phonetics and pragmatics of such sounds in English, Japanese, and Spanish. So far we have concluded, based on both corpus and experimental evidence, that these items are in part compositional, in that each component sound brings a corresponding component of meaning. We are now extending the empirical work and exploring potential applications.

PI: Dr. Nigel Ward

An Acceptance Estimator for Computer Science Graduate Admissions
Potential applicants to graduate school find it difficult to predict, even approximately, which schools will accept them. We have created a predictive model of admissions decision-making, based on analysis of a database of past decisions. Interesting aspects of the model include the way that weights are assigned dynamically to various factors based on the informativeness of each factor and based on the applicant's relative strengths on each factor. This model has been packaged in the form of a Web page that enables a student to enter his or her information and see a list of schools where he or she is likely to be accepted. We are currently working to improve the accuracy of the model and make it more generally useful.

PI: Dr. Nigel Ward