Visit the links below for more information about our past projects.
Automated analysis of musical structure
Professor Barry Vercoe and Wei Chai
Research on automatic music segmentation, summarization and classification using a framework
combining music cognition, machine learning and signal processing. It will inquire scientifically into the
nature of human perception of music, and offer a practical solution to difficult problems of machine
intelligence for automatic musical content analysis and pattern discovery.
Recurrent structure analysis
Professor Barry Vercoe and Wei Chai
This paper presents an algorithm that can automatically analyze the repetitive structure of musical
signals. First, the algorithm detects the repetitions of each segment of fixed length in a piece using dynamic
programming. Second, the algorithm summarizes this repetition information and infers the structure based on
heuristic rules.
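The first step described above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes the piece has already been reduced to a sequence of discrete symbols (for example, one label per frame), and it uses a standard approximate-matching dynamic program. The function names `match_costs` and `find_repetitions` and the `max_cost` threshold are hypothetical.

```python
def match_costs(segment, piece):
    """Return the last row of an approximate-matching DP table.

    costs[j] is the smallest edit distance between `segment` and any
    stretch of `piece` that ends just before index j.
    """
    # An empty segment matches anywhere in the piece at zero cost.
    prev = [0] * (len(piece) + 1)
    for i, s in enumerate(segment, start=1):
        cur = [i] + [0] * len(piece)
        for j, p in enumerate(piece, start=1):
            cur[j] = min(
                prev[j - 1] + (s != p),  # match or substitution
                prev[j] + 1,             # segment symbol left unmatched
                cur[j - 1] + 1,          # extra symbol in the piece
            )
        prev = cur
    return prev


def find_repetitions(piece, start, length, max_cost=0):
    """List end positions where the fixed-length segment recurs."""
    segment = piece[start:start + length]
    costs = match_costs(segment, piece)
    own_end = start + length  # skip the segment's match with itself
    return [j for j, c in enumerate(costs) if c <= max_cost and j != own_end]


if __name__ == "__main__":
    # Toy symbol sequence (an assumption, not data from the paper).
    print(find_repetitions("ABCDXXABCDYY", start=0, length=4))
```

Raising `max_cost` admits inexact repetitions, at the price of also reporting positions adjacent to each true match. The second step, summarizing these matches with heuristic rules, is not sketched here.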
SoundBlocks, SoundScratch
Professor Barry Vercoe and John Harrison
SoundBlocks and SoundScratch are two different environments in which children can manipulate digital sound.
SoundBlocks is a tangible programming language for describing dataflow with adaptive, context-aware
primitives and real-time sensing.
SoundScratch is a set of sound primitives that extend the media-rich capabilities of the children's
programming language called Scratch.
Both environments have been created and developed as a way to explore how it might be possible to construct
an environment in which youth design their own sounds. Children aged 10-15 have explored the environments
and participated in user studies. Music educators have observed these studies, and their observations are summarized.
PyPortMidi
Professor Barry Vercoe and John Harrison
PyPortMidi is a Python wrapper for PortMidi. PortMidi is a cross-platform C library for
realtime MIDI control. Using PyPortMidi, you can send and receive MIDI data in realtime from Python.
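As a rough illustration of the data involved, the sketch below builds the raw three-byte MIDI channel messages that a MIDI library ultimately sends and receives. It deliberately does not call PyPortMidi, whose API is not described on this page; the helper names `note_on` and `note_off` are hypothetical.

```python
def note_on(channel, pitch, velocity):
    """Build a MIDI Note On message (status byte 0x90 plus channel)."""
    if not (0 <= channel <= 15 and 0 <= pitch <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15; pitch and velocity 0-127")
    return bytes([0x90 | channel, pitch, velocity])


def note_off(channel, pitch):
    """Build a MIDI Note Off message (status byte 0x80 plus channel)."""
    if not (0 <= channel <= 15 and 0 <= pitch <= 127):
        raise ValueError("channel must be 0-15; pitch 0-127")
    return bytes([0x80 | channel, pitch, 0])


if __name__ == "__main__":
    # Middle C (note number 60) on the first channel.
    print(note_on(0, 60, 100).hex(), note_off(0, 60).hex())
```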
Proximity Detecting Microphone
Professor Barry Vercoe and John Harrison
Worldwide, people fall into one of two categories: those who are experienced microphone users and those who
are not. For experienced microphone users, a proximity detection system within a microphone could provide
another dimension of expression. For inexperienced microphone users, a proximity detection system could make
a microphone easier to use. To explore these ideas further, I added proximity detection to a microphone and
added an amplification circuit whose gain was inversely proportional to the distance of the microphone from the user.
Chord Recognition
Professor Barry Vercoe, John Harrison, and Victor Adan
How can a computer recognize chords in music?
Music Structure Modeling
Professor Barry Vercoe and Victor Adan
This project explores representation as it pertains to music. Specifically,
we address the question of how to extract the "essence" of a piece of
music and how to use it as a source for generating new music with similar
qualities.
We approach the problem of modeling of high level musical
structures from a dynamical systems and signal processing perspective,
focusing on motion per se independently of particular musical systems
or styles.
The point of departure is the construction of a state space that represents
geometrically the motion characteristics of music. We address ways in
which this space can be modeled deterministically, as well as
ways in which it can be transformed to generate new musical structures.
Learning the meaning of music
Professor Barry Vercoe and Brian Whitman
Expression as complex and personal as music is not adequately represented by the signal alone.
For every artist and song there is a significant culture of meaning connecting the perception to
interpretation. This thesis aims to computationally model the meaning of music by taking advantage
of community usage and description, using the self-selected and natural similarity clusters, opinions,
and usage patterns as labels and ground truth to inform on-line and unsupervised 'music acquisition'
systems. We present a framework for capturing community metadata from free-text sources, audio
representations robust enough to handle event and meaning relationships yet general enough to work
across domains of music, and a machine-learning framework for learning the relationship between
music signals and reaction iteratively, at a large scale.
Automatic Record Reviews
Professor Barry Vercoe and Brian Whitman
We analyze a large testbed of music and a corpus of reviews for each work to uncover patterns
and develop mechanisms for removing reviewer bias and extraneous non-musical discussion. By
building upon work in grounding free text against audio signals we invent an "automatic record
review" system that labels new music audio with maximal semantic value for future retrieval tasks.
In effect, we grow an unbiased music editor trained from the consensus of the online reviews we
have gathered.
Eigenradio
Professor Barry Vercoe and Brian Whitman
Eigenradio is the future music: a constantly live radio stream automatically synthesized by
eigenanalysis of up to 100 radio stations at once.
Chiclet / The DSP Music Box and Concrete Music
Professor Barry Vercoe and Brian Whitman
All music is already algorithmic in that a process generated it and generates it. Interference
from a laser to pits on a plastic disc is just small-scale microcode, the simplest Turing machine
scanning in radial order. But what are the possibilities when we're freed from the static medium?
Can we embed a process in song from composer to audience?
Melody Retrieval on the Internet
Professor Barry Vercoe and Wei Chai
The emergence of digital music on the Internet requires
new information retrieval methods adapted to specific characteristics
and needs. While music retrieval based on text information, such as
title, composers, or subject classification, has been implemented in many
existing systems, retrieval of a piece of music based on musical content,
especially an incomplete, imperfect recall of a fragment of the music,
has not yet been fully explored. A query-by-humming system can find a
piece of music in the digital music repository based on a few hummed notes.
Even when the user does not know the title or any other text information
about the music, they can still search for it by humming the melody.
Combining Internet, audio signal processing and database techniques, we
are attempting to provide a friendlier interface for Internet music searching.
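One common way to make such a search tolerant of imperfect humming is to compare melodic contours rather than exact pitches. The sketch below illustrates that idea; it is not a description of the system built in this project. The functions `contour`, `edit_distance`, and `search`, and the two-tune database, are assumptions made for the example.

```python
def contour(pitches):
    """Reduce a pitch sequence to Up / Down / Repeat steps."""
    return "".join(
        "U" if b > a else "D" if b < a else "R"
        for a, b in zip(pitches, pitches[1:])
    )


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j - 1] + (ca != cb),  # match or substitute
                           prev[j] + 1,               # delete from a
                           cur[j - 1] + 1))           # insert into a
        prev = cur
    return prev[-1]


def search(query_pitches, database):
    """Rank titles by contour distance to the hummed query (best first)."""
    q = contour(query_pitches)
    return sorted(database, key=lambda t: edit_distance(q, contour(database[t])))


if __name__ == "__main__":
    # Toy database of MIDI note numbers (illustrative only).
    tunes = {
        "Twinkle": [60, 60, 67, 67, 69, 69, 67],
        "Ode to Joy": [64, 64, 65, 67, 67, 65, 64, 62],
    }
    # The same opening hummed a whole tone higher still matches by contour.
    print(search([62, 62, 69, 69, 71, 71, 69], tunes))
```

Because only the direction of each step is kept, the query matches regardless of the key it is hummed in. This sketch compares whole contours; matching a short fragment against a long piece would need substring-style alignment instead.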
Formalized Auditory Scene Analysis
Professor Barry Vercoe and Paris Smaragdis
Our hearing mechanism is very good at ignoring redundant
sounds and parsing complex auditory scenes. This is a subject that has
been extensively studied, but most of the work is at the heuristic level
and thus impractical for machine implementation. By redefining listening theories
in a more rigorous and mathematical framework we can come closer to constructing
machines capable of auditory consciousness.
The Audio Spotlight
Professor Barry Vercoe and F. Joseph Pompei
Standard loudspeakers transmit sound which necessarily
spreads very quickly, and control of sound projection and position is
only about as flexible as where you can hang a loudspeaker. The Audio
Spotlight is a device that will project sound much like a spotlight projects
light; shining it at a listener allows only them to hear it, while shining
it at a surface causes the sound to appear to originate from there, creating
something of a 'virtual loudspeaker'. Beamsteering by phased arrays allows
the sound to move, enabling the user to dynamically place sound exactly,
and only, where it is desired.
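The beam-steering idea can be illustrated with the textbook delay rule for a uniform linear array: delaying element n by n·d·sin(θ)/c tilts the emitted wavefront by θ from straight ahead. The sketch below assumes such a linear array and a nominal speed of sound; the actual array geometry and signal processing of the Audio Spotlight are not given here, and `steering_delays` is a hypothetical helper.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, approximate value for room-temperature air


def steering_delays(num_elements, spacing_m, angle_deg):
    """Per-element delays (seconds) that steer a uniform linear array.

    angle_deg is measured from broadside (straight ahead); the delays
    are shifted so that the smallest one is zero.
    """
    theta = math.radians(angle_deg)
    raw = [n * spacing_m * math.sin(theta) / SPEED_OF_SOUND
           for n in range(num_elements)]
    offset = min(raw)
    return [t - offset for t in raw]


if __name__ == "__main__":
    # Four elements spaced 1 cm apart, beam steered 30 degrees off axis.
    for n, t in enumerate(steering_delays(4, 0.01, 30.0)):
        print(f"element {n}: {t * 1e6:.2f} microseconds")
```

Changing `angle_deg` over time moves the beam without moving the hardware.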
Recording Studios Without Walls: Geographically Unrestricted Music Collaboration
Professor Barry Vercoe and Nyssim Lefford
Music producers and recording musicians move from city
to city and one recording facility to another in order to expand their
options for collaboration with other musicians or technicians. This project
examines the development of an Internet-based, music recording system
that will enlarge the pool of potential collaborators without requiring
physical movement from location to location. The Internet provides a
medium through which recorded performances can be transmitted from performer
to producer in (near) real-time over great distances. This research investigates
the design of a system that will make optimal use of available bandwidth
during transmission while retaining the artistic dialogue between collaborators
that is central to the music production process.
Singing Voice Parameterization and Re-synthesis
Professor Barry Vercoe and Youngmoo Kim
The human singing voice is the oldest musical instrument,
yet it is one of the most difficult to simulate convincingly. Using signal
processing and pattern recognition techniques in combination with prior
knowledge of the musical score, we are attempting to extract control parameters
which capture the perceptually most significant features. This parameterized,
structured model could then be efficiently transmitted and re-synthesized,
resulting in a high quality recreation of the original performance.
Instrument Identification and Cochlear Implants
Professor Barry Vercoe and Rebecca Reich
Significant segments of the population are unable to fully appreciate music due to
some form of hearing loss. Individuals with profound sensory deafness can be surgically
implanted with a cochlear implant, a prosthetic device originally designed to convey speech
and environmental sounds. Current cochlear implant recipients still do not perform adequately
on music-related tests. Several subjective tests have been administered to attempt to understand
what implantees can process in terms of pitch, rhythm and timbral attributes of music. However,
the need to systematically test music-related recognition tasks still exists, in order to isolate
the limitations of the cochlear implant. This research focuses on one such recognition task:
identification of musical instruments. The objective is to quantify recognition scores by
normal-hearing listeners using a simulation of a cochlear implant. Such an investigation would
help determine what kind of information might be lost in the first stage of processing, thus
leading to recommendations for an improved processing scheme for music signals.
Musical Preference Understanding
Professor Barry Vercoe and Brian Whitman
Different people respond to and query for music in vastly
different ways. Someone's favorite song is only barely recognizable by
another: this leads us to question why and how we attach preference and
memory to musical information. We are working on an architecture to model
the individual and singular representation of a piece of music by a human
listener. This model can then be applied to common music retrieval tasks
such as recommendation and search, which in turn could leverage the power
of the immediate delivery of music over networks while allowing users
to discover new and varied music.
The MIT Experimental Music
Studio gave its first computer music courses in the 1973-74 school year.
The EMS went on to become one of the most innovative studios in the field
of electronic music. The MIT Media Lab commemorated the 25th anniversary
of the EMS with an international symposium and public concert of computer
music at MIT's Kresge Auditorium. Visit the EMS
homepage for more details.
Automatic Generation of Sound Synthesis Techniques
Professor Barry Vercoe and Ricardo A. Garcia
Several sound synthesis techniques (and topologies) have been developed
through the years. All of them can be described using simple, functional
objects. Any sound can be synthesized by combining these objects, but the number
of possible combinations makes it impractical to test them all. A genetic
program can be used to search this space and find one (or several) topologies
that meet our goal. The generated topologies are not restricted to known
synthesis techniques. The sound and its control parameters can be specified
as input to the genetic program, which will incorporate these in the generated
model.
Structured Audio: NetSound and MPEG-4
Professor Barry Vercoe, Eric Scheirer, Paris Smaragdis, F. Joseph Pompei
New streaming sound encoding technologies are starting to fill the demand
for low-bandwidth transmission of audio over the Internet, but such information-theoretic
encoding schemes often don't allow for high-quality audio transmission
or client-side sound manipulation. We are developing NetSound, a structured
audio coding/decoding scheme (sort of a "Postscript for audio") that
allows sound descriptors, algorithms, models, and schedulers to be sent
efficiently over the Internet, and then altered, personalized, and reconstructed
in real-time at the receiver site with high fidelity. A set of tools based
on this concept has been proposed by the Machine Listening Group and
accepted by the MPEG Consortium as part of the MPEG-4 international standard.
We continue to develop and refine Structured Audio techniques in collaboration
with researchers and industrial labs around the world.
Automatic Musical Instrument Identification
Professor Barry Vercoe and Keith Martin
When we hear sounds, our brains extract features from the audio signal
that highlight the physical characteristics of the sound source, including
its size and material properties. By combining low-level feature extraction
modeled after the brain with high-level sound-source models, we are building
computer systems that can learn to recognize sound sources in the environment.
In particular, we are currently at work on a system that listens to music
and learns to recognize the instruments, much as a young child might.
Music Understanding Systems
Professor Barry Vercoe and Eric Scheirer
We are engaged in the construction of computer systems that can understand
music as people do. By conducting psychoacoustic and psychological research
into the science of human music perception, we learn about the way the
brain processes music. Then we use these findings to build tools which
can perform sophisticated musical tasks such as automated music annotation,
multimedia database search, query-by-example, and advanced authoring and
remixing systems.
Computational Auditory Scene Analysis
Professor Barry Vercoe and Keith Martin
By modeling human auditory processes such as sound localization, signal
grouping and separation, pitch-tracking, and high-level domain understanding,
machines can exhibit elements of sound recognition ranging from feature
detection and texture recognition to generalized auditory scene analysis.
This enables machines to extract information from soundtracks or audio
environments, or search through large databases of unprocessed sound data.
Perceptual Audio Models
Professor Barry Vercoe and Michael Casey
Model-based audio is an emerging paradigm
for the representation and distribution of sound for networked, computer-based
applications. It is an alternative to sample- and stream-based representations
of audio, which are the prevalent modes of sound dissemination at this
time. This project explores the use of perceptually based feature representations
of sound for creating controllable, compact audio models for networked
computer applications, such as virtual worlds and 3-D games, and new audio
distribution protocols for sound and music.
3-D Audio using Loudspeakers
Professor Barry Vercoe and Bill Gardner
3-D audio systems, which can surround a
listener with sounds at arbitrary locations, are an important part of
immersive interfaces. A new approach is presented for implementing 3-D
audio using a pair of conventional loudspeakers. The new idea is to use
the tracked position of the listener's head to optimize the acoustical
presentation, and thus produce a much more realistic illusion over a larger
listening area than existing loudspeaker 3-D audio systems. By using a
remote head tracker, for instance based on computer vision, an immersive
audio environment can be created without donning headphones or other equipment.
Prediction Driven CASA
Professor Barry Vercoe and Dan Ellis
The sound of a busy environment gives rise
to a perception of numerous distinct events in a human listener - the
'auditory scene analysis' of the acoustic information. Recent advances
in the understanding of this process from experimental psychoacoustics
have led to several efforts to build a computer model capable of the same
function. This work is known as 'computational auditory scene analysis'.
The dominant approach to this problem has been as a sequence of modules,
the output of one forming the input to the next. Sound is converted to
its spectrum, cues are picked out, and representations of the cues are
grouped into an abstract description of the initial input. This 'data-driven'
approach has some specific weaknesses in comparison to the auditory system:
it will interpret a given sound in the same way regardless of its context,
and it cannot 'infer' the presence of a sound for which direct evidence
is hidden by other components. The 'prediction-driven' approach is presented
as an alternative, in which analysis is a process of reconciliation between
the observed acoustic features and the predictions of an internal model
of the sound-producing entities in the environment.
KEMAR HRTF Measurements
Professor Barry Vercoe, Bill Gardner, and Keith Martin
An extensive set of head-related transfer
function (HRTF) measurements of a KEMAR dummy head microphone was completed
in May, 1994. The measurements consist of the left and right ear impulse
responses from a Realistic Optimus Pro 7 loudspeaker mounted 1.4 meters
from the KEMAR. Maximum length (ML) pseudo-random binary sequences were
used to obtain the impulse responses at a sampling rate of 44.1 kHz. A
total of 710 different positions were sampled at elevations from -40 degrees
to +90 degrees. Also measured were the impulse response of the speaker
in free field and several headphones placed on the KEMAR. This data is
being made available to the research community on the Internet via anonymous
FTP and the World Wide Web.
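The maximum-length-sequence technique mentioned above can be sketched in a few lines. A ±1 maximum length sequence of length N has a circular autocorrelation of N at lag zero and -1 at every other lag, so cross-correlating a system's response with the excitation recovers its impulse response. The example below is an illustration only: the order-4 sequence, the recurrence taps, the toy impulse response, and all function names are assumptions, not details of the KEMAR measurements.

```python
def mls(order=4, taps=(0, 1)):
    """Generate one period of a binary maximum length sequence.

    The default taps implement a[n+4] = a[n] XOR a[n+1], which has
    period 2**4 - 1 = 15. Other orders need their own primitive taps.
    """
    window = [1] + [0] * (order - 1)
    out = []
    for _ in range(2 ** order - 1):
        out.append(window[0])
        new_bit = 0
        for t in taps:
            new_bit ^= window[t]
        window = window[1:] + [new_bit]
    return out


def circular_convolve(x, h):
    """Periodic response of a system with impulse response h to input x."""
    n = len(x)
    return [sum(h[m] * x[(k - m) % n] for m in range(len(h))) for k in range(n)]


def circular_xcorr(y, x):
    """Circular cross-correlation of y with x at each lag."""
    n = len(x)
    return [sum(y[(k + lag) % n] * x[k] for k in range(n)) for lag in range(n)]


def recover_impulse_response(response, excitation):
    """Invert the MLS autocorrelation (N at lag 0, -1 elsewhere)."""
    n = len(excitation)
    r = circular_xcorr(response, excitation)
    total = sum(r)  # equals the sum of the impulse response samples
    return [(v + total) / (n + 1) for v in r]


if __name__ == "__main__":
    excitation = [1 - 2 * b for b in mls()]      # map bits 0/1 to +1/-1
    true_h = [1.0, 0.5, 0.25] + [0.0] * 12       # toy impulse response
    measured = circular_convolve(excitation, true_h)
    print(recover_impulse_response(measured, excitation)[:4])
```

A real measurement would use a much longer sequence so that its period exceeds the length of the response being measured.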