Projects (Archived)


Visit the links below for more information about our past projects.


» Automated analysis of musical structure

Professor Barry Vercoe and Wei Chai

Research on automatic music segmentation, summarization, and classification using a framework that combines music cognition, machine learning, and signal processing. The work inquires scientifically into the nature of human perception of music and offers a practical solution to difficult machine-intelligence problems in automatic musical content analysis and pattern discovery.


» Recurrent structure analysis

Professor Barry Vercoe and Wei Chai

This project presents an algorithm that automatically analyzes the repetitive structure of musical signals. First, the algorithm uses dynamic programming to detect repetitions of each fixed-length segment of a piece. Second, it summarizes this repetition information and infers the structure from heuristic rules.
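A minimal sketch of the first step, using hypothetical chroma-like feature vectors and plain dynamic time warping as the dynamic-programming match; the project's actual features and cost function are not specified here.

```python
import numpy as np

def dtw_cost(a, b):
    """Dynamic-time-warping cost between two feature sequences (frames x dims)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)

def repetition_curve(features, seg_len, start):
    """Distance from the segment starting at `start` to every later segment."""
    seg = features[start:start + seg_len]
    return np.array([dtw_cost(seg, features[j:j + seg_len])
                     for j in range(start + seg_len, len(features) - seg_len)])

# Toy input: 200 frames of 12-dimensional (chroma-like) features.
feats = np.random.rand(200, 12)
curve = repetition_curve(feats, seg_len=20, start=0)
print("best repetition candidate at frame", int(curve.argmin()) + 20)
```

Minima of such a curve mark the candidate repetitions that the second, heuristic stage would then assemble into a structural description.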


» Sound Blocks, Sound Scratch

Professor Barry Vercoe and John Harrison

SoundBlocks and SoundScratch are two different environments in which children can manipulate digital sound. SoundBlocks is a tangible programming language for describing dataflow with adaptive, context-aware primitives and real-time sensing. SoundScratch is a set of sound primitives that extends the media-rich capabilities of the children's programming language Scratch. Both environments were created to explore how an environment might be constructed in which young people design their own sounds. Children ages 10-15 explored the environments and participated in user studies; music educators observed these studies, and their observations are summarized.


» PyPortMIDI

Professor Barry Vercoe and John Harrison

PyPortMidi is a Python wrapper for PortMidi. PortMidi is a cross-platform C library for realtime MIDI control. Using PyPortMidi, you can send and receive MIDI data in realtime from Python.
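A minimal usage sketch for sending a note. The module name (`pypm`), the device-listing calls, and the device numbering vary between releases and systems, so treat this as illustrative rather than canonical.

```python
import time
import pypm

pypm.Initialize()
try:
    # List the available MIDI devices so a usable output can be chosen.
    for i in range(pypm.CountDevices()):
        interf, name, is_input, is_output, opened = pypm.GetDeviceInfo(i)
        print(i, name, "output" if is_output else "input")

    out = pypm.Output(0, 0)          # device 0, zero latency (adjust as needed)
    out.WriteShort(0x90, 60, 100)    # note-on, middle C, velocity 100
    time.sleep(1.0)
    out.WriteShort(0x80, 60, 0)      # note-off
finally:
    pypm.Terminate()
```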


» Proximity Detecting Microphone

Professor Barry Vercoe and John Harrison

Worldwide, people fall into one of two categories: those who are experienced microphone users and those who are not. For experienced microphone users, a proximity detection system within a microphone could provide another dimension of expression. For inexperienced microphone users, a proximity detection system could make a microphone easier to use. To explore these ideas further, I added proximity detection to a microphone, along with an amplification circuit whose gain is inversely proportional to the distance between the microphone and the user.


» Chord Recognition

Professor Barry Vercoe, John Harrison, and Victor Adan

How can a computer recognize chords in music?


» Music Structure Modeling

Professor Barry Vercoe and Victor Adan

This project explores representation as it pertains to music. Specifically, we address the question of how to extract the "essence" of a piece of music and how to use it as a source for generating new music with similar qualities. We approach the problem of modeling high-level musical structures from a dynamical-systems and signal-processing perspective, focusing on motion per se, independently of particular musical systems or styles. The point of departure is the construction of a state space that geometrically represents the motion characteristics of music. We address ways in which this space can be modeled deterministically, as well as ways in which it can be transformed to generate new musical structures.
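One standard way to build such a geometric state space is a delay embedding of a musical time series; the toy melodic contour, dimension, and lag below are placeholders, not the project's actual representation.

```python
import numpy as np

def delay_embed(x, dim=3, lag=1):
    """Embed a 1-D sequence into a `dim`-dimensional state space of lagged copies."""
    n = len(x) - (dim - 1) * lag
    return np.stack([x[i * lag : i * lag + n] for i in range(dim)], axis=1)

# Toy "melodic contour": each state-space point bundles a note with its recent past,
# so trajectories through this space capture motion rather than absolute pitch.
pitches = np.array([60, 62, 64, 65, 67, 65, 64, 62, 60, 62, 64, 65], dtype=float)
states = delay_embed(pitches, dim=3, lag=1)
print(states[:4])
```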


» Learning the meaning of music

Professor Barry Vercoe and Brian Whitman

Expression as complex and personal as music is not adequately represented by the signal alone. For every artist and song there is a significant culture of meaning connecting perception to interpretation. This work aims to model the meaning of music computationally by taking advantage of community usage and description, using self-selected and natural similarity clusters, opinions, and usage patterns as labels and ground truth to inform online and unsupervised 'music acquisition' systems. We present a framework for capturing community metadata from free-text sources, audio representations robust enough to handle event and meaning relationships yet general enough to work across domains of music, and a machine-learning framework for learning the relationship between music signals and reactions iteratively and at large scale.


» Automatic Record Reviews

Professor Barry Vercoe and Brian Whitman

We analyze a large testbed of music and a corpus of reviews for each work to uncover patterns and develop mechanisms for removing reviewer bias and extraneous non-musical discussion. By building upon work in grounding free text against audio signals we invent an "automatic record review" system that labels new music audio with maximal semantic value for future retrieval tasks. In effect, we grow an unbiased music editor trained from the consensus of the online reviews we have gathered.


» Eigenradio

Professor Barry Vercoe and Brian Whitman

Eigenradio is the music of the future: a constantly live radio stream automatically synthesized by eigenanalysis of up to 100 radio stations at once.
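Eigenanalysis here is in the spirit of principal component analysis over short-time spectra. The sketch below, with made-up spectral frames standing in for the stations' audio, shows only the flavor of the idea, not Eigenradio's actual pipeline.

```python
import numpy as np

# Stack short-time spectra from many "stations" (random placeholders here),
# then keep the leading eigenvectors of their covariance: the statistically
# dominant spectral shapes from which new audio could be resynthesized.
frames = np.random.rand(1000, 257)            # 1000 spectral frames, 257 bins
frames -= frames.mean(axis=0)
cov = frames.T @ frames / len(frames)
eigvals, eigvecs = np.linalg.eigh(cov)
top = eigvecs[:, -8:]                         # eight strongest "eigenspectra"
coeffs = frames @ top                         # project every frame onto them
reconstruction = coeffs @ top.T               # low-rank spectral approximation
print("rank-8 reconstruction error:", np.linalg.norm(frames - reconstruction))
```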


» Chiclet / The DSP Music Box and Concrete Music

Professor Barry Vercoe and Brian Whitman

All music is already algorithmic in that a process generated it and generates it. Interference from a laser to pits on a plastic disc is just small-scale microcode, the simplest Turing machine scanning in radial order. But what are the possibilities when we're freed from the static medium? Can we embed a process in song from composer to audience?


» Melody Retrieval on the Internet

Professor Barry Vercoe and Wei Chai

The emergence of digital music on the Internet requires new information retrieval methods adapted to its specific characteristics and needs. While retrieval based on text information, such as title, composer, or subject classification, has been implemented in many existing systems, retrieval of a piece of music based on musical content, especially an incomplete, imperfect recall of a fragment of the music, has not yet been fully explored. A query-by-humming system can find a piece of music in a digital music repository from a few hummed notes, so even when users do not know the title or any other text information about the music, they can still search for it by humming the melody. Combining Internet, audio signal processing, and database techniques, we are attempting to provide a friendlier interface for Internet music searching.
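A common building block for query-by-humming (not necessarily the one used in this project) is to reduce both the hummed query and the stored melodies to pitch-interval sequences and compare them with an edit distance, which tolerates key changes and a few wrong or missing notes. The melody database below is a hypothetical stand-in.

```python
def intervals(notes):
    """Reduce a note sequence to successive pitch intervals, discarding absolute key."""
    return [b - a for a, b in zip(notes, notes[1:])]

def edit_distance(a, b):
    """Classic dynamic-programming edit distance, tolerant of wrong or missing notes."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)] for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[-1][-1]

# Hypothetical database of melodies as MIDI note numbers.
database = {
    "Ode to Joy": [64, 64, 65, 67, 67, 65, 64, 62],
    "Frere Jacques": [60, 62, 64, 60, 60, 62, 64, 60],
}
hummed = [65, 65, 66, 68, 68, 66, 65, 63]      # same tune, sung a semitone sharp
best = min(database, key=lambda k: edit_distance(intervals(database[k]), intervals(hummed)))
print("best match:", best)
```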


» Formalized Auditory Scene Analysis

Professor Barry Vercoe and Paris Smaragdis

Our hearing mechanism is very good at ignoring redundant sounds and parsing complex auditory scenes. This subject has been studied extensively, but most of the work remains at a heuristic level and is thus impractical for machine implementation. By redefining listening theories in a more rigorous, mathematical framework, we can come closer to constructing machines capable of auditory consciousness.


» The Audio Spotlight

Professor Barry Vercoe and F. Joseph Pompei

Standard loudspeakers transmit sound that necessarily spreads very quickly, and control of sound projection and position is only about as flexible as where you can hang a loudspeaker. The Audio Spotlight is a device that projects sound much as a spotlight projects light: shining it at a listener allows only that listener to hear it, while shining it at a surface causes the sound to appear to originate from there, creating something of a 'virtual loudspeaker'. Beamsteering by phased arrays allows the sound to move, enabling the user to place sound dynamically exactly, and only, where it is desired.
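Steering by a phased array amounts to delaying each element so that wavefronts add coherently in the chosen direction. The sketch below computes those delays for a hypothetical uniform linear array; it does not model the ultrasonic carrier and nonlinear demodulation that the Audio Spotlight itself relies on.

```python
import math

def steering_delays(num_elements, spacing_m, angle_deg, speed_of_sound=343.0):
    """Per-element delays (seconds) that aim a uniform linear array at `angle_deg`."""
    angle = math.radians(angle_deg)
    delays = [i * spacing_m * math.sin(angle) / speed_of_sound for i in range(num_elements)]
    shift = min(delays)                     # keep every delay non-negative
    return [d - shift for d in delays]

# Eight elements 2 cm apart, steered 20 degrees off axis.
for i, d in enumerate(steering_delays(8, 0.02, 20.0)):
    print(f"element {i}: {d * 1e6:8.2f} microseconds")
```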


» Recording Studios Without Walls: Geographically Unrestricted Music Collaboration

Professor Barry Vercoe and Nyssim Lefford

Music producers and recording musicians move from city to city and from one recording facility to another in order to expand their options for collaboration with other musicians or technicians. This project examines the development of an Internet-based music recording system that will enlarge the pool of potential collaborators without requiring physical movement from location to location. The Internet provides a medium through which recorded performances can be transmitted from performer to producer in (near) real time over great distances. This research investigates the design of a system that makes optimal use of available bandwidth during transmission while retaining the artistic dialogue between collaborators that is central to the music production process.


» Singing Voice Parameterization and Re-synthesis

Professor Barry Vercoe and Youngmoo Kim

The human singing voice is the oldest musical instrument, yet it is one of the most difficult to simulate convincingly. Using signal processing and pattern recognition techniques in combination with prior knowledge of the musical score, we are attempting to extract control parameters which capture the perceptually most significant features. This parameterized, structured model could then be efficiently transmitted and re-synthesized, resulting in a high quality recreation of the original performance.


» Instrument Identification and Cochlear Implants

Professor Barry Vercoe and Rebecca Reich

Significant segments of the population are unable to fully appreciate music due to some form of hearing loss. Individuals with profound sensory deafness can be surgically fitted with a cochlear implant, a prosthetic device originally designed to convey speech and environmental sounds. Current cochlear implant recipients still do not perform adequately on music-related tests. Several subjective tests have been administered to try to understand what implantees can process in terms of pitch, rhythm, and timbral attributes of music. However, there is still a need to test music-related recognition tasks systematically in order to isolate the limitations of the cochlear implant. This research focuses on one such recognition task: identification of musical instruments. The objective is to quantify recognition scores by normal-hearing listeners using a simulation of a cochlear implant. Such an investigation would help determine what kind of information might be lost in the first stage of processing, leading to recommendations for an improved processing scheme for music signals.
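A standard way to simulate a cochlear implant for normal-hearing listeners is a noise vocoder: split the signal into a few bands, extract each band's envelope, and use the envelopes to modulate band-limited noise. We do not know the exact simulation used in this study; the scipy-based sketch below (with placeholder band count and edges) shows the generic technique.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def noise_vocoder(x, sr, n_channels=8, lo=100.0, hi=7000.0):
    """Crude cochlear-implant simulation: band envelopes re-imposed on noise carriers."""
    edges = np.geomspace(lo, hi, n_channels + 1)
    out = np.zeros_like(x)
    noise = np.random.randn(len(x))
    for low, high in zip(edges[:-1], edges[1:]):
        sos = butter(4, [low, high], btype="bandpass", fs=sr, output="sos")
        band = sosfilt(sos, x)
        envelope = np.abs(hilbert(band))          # slowly varying band energy
        carrier = sosfilt(sos, noise)             # noise limited to the same band
        out += envelope * carrier
    return out / (np.max(np.abs(out)) + 1e-12)

# Toy input: one second of a 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
simulated = noise_vocoder(np.sin(2 * np.pi * 440 * t), sr)
print(simulated.shape)
```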


» Musical Preference Understanding

Professor Barry Vercoe and Brian Whitman

Different people respond to and query for music in vastly different ways. Someone's favorite song is only barely recognizable by another: this leads us to question why and how we attach preference and memory to musical information. We are working on an architecture to model the individual and singular representation of a piece of music by a human listener. This model can then be applied to common music retrieval tasks such as recommendation and search, which in turn could leverage the power of the immediate delivery of music over networks while allowing users to discover new and varied music.


» The MIT Experimental Music Studio 25th Anniversary

The MIT Experimental Music Studio gave its first computer music courses in the 1973-74 school year. The EMS went on to become one of the most innovative studios in the field of electronic music. The MIT Media Lab commemorated the 25th anniversary of the EMS with an international symposium and public concert of computer music at MIT's Kresge Auditorium. Visit the EMS homepage for more details.


» Automatic Generation of Sound Synthesis Techniques

Professor Barry Vercoe and Ricardo A. Garcia

Several sound synthesis techniques (and topologies) have been developed over the years. All of them can be described using simple functional objects. Any sound can be synthesized by combining these objects in countless ways, and it is impractical to test every possibility. A genetic program can search this space and find one (or several) topologies that meet a specified goal. The generated topologies are not restricted to known synthesis techniques. The desired sound and its control parameters are specified as input to the genetic program, which incorporates them into the generated model.
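Genetic programming over synthesis topologies can be sketched as evolving small expression trees of functional objects (oscillators, adders, multipliers) toward a target sound. The representation, operators, and fitness measure below are deliberately toy-sized stand-ins, not the real system.

```python
import random
import numpy as np

SR, N = 8000, 4000
T = np.arange(N) / SR

def osc(freq):                              # the single primitive building block
    return np.sin(2 * np.pi * freq * T)

def render(tree):
    """Evaluate a small expression tree into a signal."""
    op = tree[0]
    if op == "osc":
        return osc(tree[1])
    a, b = render(tree[1]), render(tree[2])
    return a + b if op == "add" else a * b  # op is "add" or "mul"

def random_tree(depth=2):
    if depth == 0 or random.random() < 0.3:
        return ("osc", random.uniform(50, 2000))
    return (random.choice(["add", "mul"]), random_tree(depth - 1), random_tree(depth - 1))

def mutate(tree):
    if tree[0] == "osc":
        return ("osc", tree[1] * random.uniform(0.8, 1.25))
    return (tree[0], mutate(tree[1]), mutate(tree[2]))

def fitness(tree, target):
    return -np.mean((render(tree) - target) ** 2)   # higher is better

# Target: a 440 Hz tone ring-modulated at 3 Hz (the "specified sound").
target = osc(440.0) * osc(3.0)
population = [random_tree() for _ in range(60)]
for generation in range(30):
    population.sort(key=lambda t: fitness(t, target), reverse=True)
    survivors = population[:20]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(40)]
best = max(population, key=lambda t: fitness(t, target))
print("best topology:", best)
```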


» Structured Audio: NetSound and MPEG-4

Professor Barry Vercoe, Eric Scheirer, Paris Smaragdis, and F. Joseph Pompei

New streaming sound encoding technologies are starting to fill the demand for low-bandwidth transmission of audio over the Internet, but such information-theoretic encoding schemes often don't allow for high-quality audio transmission or client-side sound manipulation. We are developing NetSound, a structured audio coding/decoding scheme (a kind of "PostScript for audio") that allows sound descriptors, algorithms, models, and schedulers to be sent efficiently over the Internet, and then altered, personalized, and reconstructed in real time at the receiving site with high fidelity. A set of tools based on this concept has been proposed by the Machine Listening Group and accepted by the MPEG Consortium as part of the MPEG-4 international standard. We continue to develop and refine Structured Audio techniques in collaboration with researchers and industrial labs around the world.


» Automatic Musical Instrument Identification

Professor Barry Vercoe and Keith Martin

When we hear sounds, our brains extract features from the audio signal that highlight the physical characteristics of the sound source, including its size and material properties. By combining low-level feature extraction modeled after the brain with high-level sound-source models, we are building computer systems that can learn to recognize sound sources in the environment. In particular, we are currently at work on a system that listens to music and learns to recognize the instruments, much as a young child might.
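A generic version of the recognition step (not the group's actual feature set, which is modeled on auditory processing) is to summarize each recording with a few spectral features and classify against labeled prototypes, as in the toy sketch below.

```python
import numpy as np

def spectral_features(signal, sr, frame=1024):
    """Crude per-recording features: spectral centroid and bandwidth, averaged over frames."""
    feats = []
    for start in range(0, len(signal) - frame, frame):
        spectrum = np.abs(np.fft.rfft(signal[start:start + frame]))
        freqs = np.fft.rfftfreq(frame, 1.0 / sr)
        centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
        bandwidth = np.sqrt(np.sum(((freqs - centroid) ** 2) * spectrum) / (np.sum(spectrum) + 1e-12))
        feats.append((centroid, bandwidth))
    return np.mean(feats, axis=0)

def classify(example, prototypes):
    """Nearest-prototype classifier over labeled mean feature vectors."""
    return min(prototypes, key=lambda label: np.linalg.norm(example - prototypes[label]))

# Toy "instruments": a pure tone (flute-like) versus a harmonically rich tone (string-like).
sr = 16000
t = np.arange(sr) / sr
flute = np.sin(2 * np.pi * 440 * t)
violin = sum(np.sin(2 * np.pi * 440 * k * t) / k for k in range(1, 10))
prototypes = {"flute": spectral_features(flute, sr), "violin": spectral_features(violin, sr)}
print(classify(spectral_features(violin + 0.01 * np.random.randn(sr), sr), prototypes))
```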


» Music Understanding Systems

Professor Barry Vercoe and Eric Scheirer

We are engaged in the construction of computer systems which can understand music like people do. By conducting psychoacoustic and psychological research into the science of human music perception, we learn about the way the brain processes music. Then we use these findings to build tools which can perform sophisticated musical tasks such as automated music annotation, multimedia database search, query-by-example, and advanced authoring and remixing systems.


» Computational Auditory Scene Analysis

Professor Barry Vercoe and Keith Martin

By modeling human auditory processes such as sound localization, signal grouping and separation, pitch-tracking, and high-level domain understanding, machines can exhibit elements of sound recognition ranging from feature detection and texture recognition to generalized auditory scene analysis. This enables machines to extract information from soundtracks or audio environments, or search through large databases of unprocessed sound data.


» Perceptual Audio Models

Professor Barry Vercoe and Michael Casey

Model-based audio is an emerging paradigm for the representation and distribution of sound for networked, computer-based applications. It is an alternative to sample- and stream-based representations of audio, which are the prevalent modes of sound dissemination at this time. This project explores the use of perceptually based feature representations of sound for creating controllable, compact audio models for networked computer applications, such as virtual worlds and 3-D games, and new audio distribution protocols for sound and music.


» 3-D Audio using Loudspeakers

Professor Barry Vercoe and Bill Gardner

3-D audio systems, which can surround a listener with sounds at arbitrary locations, are an important part of immersive interfaces. A new approach is presented for implementing 3-D audio using a pair of conventional loudspeakers. The new idea is to use the tracked position of the listener's head to optimize the acoustical presentation, and thus produce a much more realistic illusion over a larger listening area than existing loudspeaker 3-D audio systems. With a remote head tracker, for instance one based on computer vision, an immersive audio environment can be created without the listener donning headphones or other equipment.
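The core signal-processing idea behind loudspeaker 3-D audio is crosstalk cancellation: invert the 2x2 matrix of speaker-to-ear transfer functions so that each ear receives only its intended binaural signal, and let head tracking re-derive that matrix as the listener moves. The frequency-domain sketch below uses invented transfer functions purely to show the inversion, not this project's actual filters.

```python
import numpy as np

def crosstalk_canceller(H, binaural):
    """Given speaker-to-ear transfer functions H[ear][speaker] per frequency bin,
    solve for the two loudspeaker signals that deliver `binaural` to the ears."""
    n_bins = binaural.shape[1]
    speaker_spec = np.empty_like(binaural, dtype=complex)
    for k in range(n_bins):
        speaker_spec[:, k] = np.linalg.solve(H[:, :, k], binaural[:, k])
    return speaker_spec

# Hypothetical transfer functions for one head position: direct paths near 1,
# crosstalk paths weaker and phase-shifted.
n_bins = 256
H = np.empty((2, 2, n_bins), dtype=complex)
H[0, 0] = H[1, 1] = 1.0
H[0, 1] = H[1, 0] = 0.4 * np.exp(-1j * np.linspace(0, np.pi, n_bins))
binaural = np.random.randn(2, n_bins) + 1j * np.random.randn(2, n_bins)
speakers = crosstalk_canceller(H, binaural)
ears = np.einsum("ijk,jk->ik", H, speakers)       # what actually reaches the ears
print("max reconstruction error:", np.max(np.abs(ears - binaural)))
```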


» Prediction Driven CASA

Professor Barry Vercoe and Dan Ellis

The sound of a busy environment gives rise to a perception of numerous distinct events in a human listener - the 'auditory scene analysis' of the acoustic information. Recent advances in the understanding of this process from experimental psychoacoustics have led to several efforts to build a computer model capable of the same function. This work is known as 'computational auditory scene analysis'. The dominant approach to this problem has been as a sequence of modules, the output of one forming the input to the next. Sound is converted to its spectrum, cues are picked out, and representations of the cues are grouped into an abstract description of the initial input. This 'data-driven' approach has some specific weaknesses in comparison to the auditory system: it will interpret a given sound in the same way regardless of its context, and it cannot 'infer' the presence of a sound for which direct evidence is hidden by other components. The 'prediction-driven' approach is presented as an alternative, in which analysis is a process of reconciliation between the observed acoustic features and the predictions of an internal model of the sound-producing entities in the environment.
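Stripped to its control flow, the prediction-driven approach alternates predicting the next observation from hypothesized sound objects with reconciling that prediction against what was actually observed, creating new hypotheses when energy goes unexplained. The sketch below shows only that loop, with trivially simple "objects" (steady noise beds tracking a scalar level) standing in for real sound models.

```python
import numpy as np

class NoiseBed:
    """Trivial sound-object hypothesis: a steady broadband level."""
    def __init__(self, level):
        self.level = level
    def predict(self):
        return self.level
    def reconcile(self, observed_share):
        # Nudge the hypothesis toward the energy actually attributed to it.
        self.level += 0.2 * (observed_share - self.level)

def analyse(observations):
    objects = [NoiseBed(level=observations[0])]
    for obs in observations:
        predicted = sum(o.predict() for o in objects)
        residual = obs - predicted
        if residual > 0.5:                      # unexplained energy: hypothesize a new object
            objects.append(NoiseBed(level=residual))
        for o in objects:                       # otherwise, update existing hypotheses
            o.reconcile(obs * o.predict() / (predicted + 1e-12))
        yield [round(o.level, 2) for o in objects]

# A background bed joined partway through by a louder foreground event.
scene = np.concatenate([np.full(20, 1.0), np.full(20, 3.0)])
for state in analyse(scene):
    pass
print("final object levels:", state)
```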


» KEMAR HRTF Measurements

Professor Barry Vercoe, Bill Gardner, and Keith Martin

An extensive set of head-related transfer function (HRTF) measurements of a KEMAR dummy-head microphone was completed in May 1994. The measurements consist of the left- and right-ear impulse responses from a Realistic Optimus Pro 7 loudspeaker mounted 1.4 meters from the KEMAR. Maximum-length (ML) pseudo-random binary sequences were used to obtain the impulse responses at a sampling rate of 44.1 kHz. A total of 710 different positions were sampled at elevations from -40 degrees to +90 degrees. Also measured were the impulse response of the speaker in free field and the responses of several headphones placed on the KEMAR. These data are being made available to the research community on the Internet via anonymous FTP and the World Wide Web.
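The measurement principle can be illustrated in a few lines: excite the system with a maximum-length sequence and recover its impulse response by circular cross-correlation, since the MLS autocorrelation is nearly an impulse. The impulse response below is an invented stand-in, not KEMAR data, and the sequence length is arbitrary.

```python
import numpy as np
from scipy.signal import max_len_seq

nbits = 12
mls = max_len_seq(nbits)[0] * 2.0 - 1.0             # map {0, 1} -> {-1, +1}
N = len(mls)                                        # 2**nbits - 1 samples

true_ir = np.zeros(64)
true_ir[[0, 10, 25]] = [1.0, 0.5, 0.25]             # direct sound plus two echoes

# Circular convolution models steady-state excitation with the repeating MLS.
response = np.real(np.fft.ifft(np.fft.fft(mls) * np.fft.fft(true_ir, N)))

# Circular cross-correlation of the response with the MLS recovers the impulse response.
recovered = np.real(np.fft.ifft(np.fft.fft(response) * np.conj(np.fft.fft(mls)))) / N
print("recovered taps:", np.round(recovered[:30], 2))
```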