How machine learning can help with voice disorders

There’s no human instinct more basic than speech, and yet, for many people, talking can be taxing. One in 14 working-age Americans suffer from voice disorders that are often associated with abnormal vocal behaviors — some of which can cause damage to vocal cord tissue and lead to the formation of nodules or polyps that interfere with normal speech production.

Unfortunately, many behaviorally based voice disorders are not well understood. In particular, patients with muscle tension dysphonia (MTD) often experience deteriorating voice quality and vocal fatigue (“tired voice”) in the absence of any clear vocal cord damage or other medical problems, which makes the condition both hard to diagnose and hard to treat.

But a team from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Massachusetts General Hospital (MGH) believes that better understanding of conditions like MTD is possible through machine learning.

Using accelerometer data collected from a wearable device developed at the MGH Voice Center, the team demonstrated that it can detect differences between subjects with MTD and matched controls. The same methods also showed that, after receiving voice therapy, MTD subjects exhibited behavior that was more similar to that of the controls.

“We believe this approach could help detect disorders that are exacerbated by vocal misuse, and help to empirically measure the impact of voice therapy,” says MIT PhD student Marzyeh Ghassemi, who is first author on a related paper that she presented at last week’s Machine Learning in Health Care (MLHC) conference in Los Angeles. “Our long-term goal is for such a system to be used to alert patients when they are using their voices in ways that could lead to problems.”

The paper’s co-authors include John Guttag, MIT professor of electrical engineering and computer science; Zeeshan Syed, CEO of the machine-learning startup Health[at]Scale; and physicians Robert Hillman, Daryush Mehta and Jarrad H. Van Stan of Massachusetts General Hospital.

How it works

Existing approaches to applying machine learning to physiological signals often involve supervised learning, in which researchers painstakingly label data and provide desired outputs. Besides being time-consuming, such methods can’t yet help classify utterances as normal or abnormal, because there is not yet a good understanding of the correlations between accelerometer data and voice misuse.

Because the CSAIL team did not know when vocal misuse was occurring, they opted to use unsupervised learning, where data is unlabeled at the instance level.

“People with vocal disorders aren’t always misusing their voices, and people without disorders also occasionally misuse their voices,” says Ghassemi. “The difficult task here was to build a learning algorithm that can determine what sort of vocal cord movements are prominent in subjects with a disorder.”

The study participants were divided into two groups: patients who had been diagnosed with voice disorders, and a control group of individuals without disorders. Each group went about their daily activities while wearing accelerometers on their necks that captured the motions of their vocal folds.

The researchers then looked at the two groups’ data, analyzing more than 110 million “glottal pulses,” each representing one opening and closing of the vocal folds. By comparing clusters of pulses, the team could detect significant differences between patients and controls.
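To give a flavor of that approach, the sketch below clusters per-pulse features without any labels and then compares how often each group’s pulses fall into each cluster. The feature names, the choice of k-means, and the number of clusters are illustrative assumptions, not the paper’s actual pipeline.

```python
# A minimal, hedged sketch of unsupervised comparison of glottal pulses.
# Features, data, and clustering choices here are illustrative only.
import numpy as np
from sklearn.cluster import KMeans

# Suppose each glottal pulse is summarized by a few acceleration-derived
# features (e.g., amplitude, duration, peak slope) -- hypothetical stand-ins.
patient_pulses = np.random.rand(10_000, 3)
control_pulses = np.random.rand(10_000, 3)

# Cluster all pulses together, with no "misuse" labels at the instance level
km = KMeans(n_clusters=20, n_init=10, random_state=0)
km.fit(np.vstack([patient_pulses, control_pulses]))

def occupancy(pulses):
    """Fraction of a group's pulses falling into each cluster."""
    counts = np.bincount(km.predict(pulses), minlength=20)
    return counts / counts.sum()

# Differences between these two distributions are what the comparison
# between patients and controls rests on.
print(occupancy(patient_pulses))
print(occupancy(control_pulses))
```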

The team also found that after voice therapy, the distribution of patients’ glottal pulses was more similar to that of the controls. According to Guttag, this is the first study to use machine learning to provide objective evidence of the positive effects of voice therapy.

System from the Computer Science and Artificial Intelligence Laboratory triples wireless data speeds

There are few things more frustrating than trying to use your phone on a crowded network. With phone usage growing faster than wireless spectrum, we’re all now fighting over smaller and smaller bits of bandwidth. Spectrum crunch is such a big problem that the White House is getting involved, recently announcing both a $400 million research initiative and a $4 million global competition devoted to the issue.

But researchers from MIT’s Computer Science and Artificial Intelligence Lab (CSAIL) say that they have a possible solution. In a new paper, a team led by professor Dina Katabi demonstrates a system called MegaMIMO 2.0 that can transfer wireless data more than three times faster than existing systems while also doubling the range of the signal.

The soon-to-be-commercialized system’s key insight is to coordinate multiple access points at the same time, on the same frequency, without creating interference. This means that MegaMIMO 2.0 could dramatically improve the speed and strength of wireless networks, particularly at high-usage events like concerts, conventions and football games.

“In today’s wireless world, you can’t solve spectrum crunch by throwing more transmitters at the problem, because they will all still be interfering with one another,” says Ezzeldin Hamed, a PhD student who is lead author on a new paper on the topic. “The answer is to have all those access points work with each other simultaneously to efficiently use the available spectrum.”

To test MegaMIMO 2.0’s performance, the researchers created a mock conference room with a set of four laptops that each roamed the space atop Roomba robots. The experiments found that the system could increase the devices’ data-transfer speed by 330 percent.

MegaMIMO 2.0’s hardware is the size of a standard router, and consists of a processor, a real-time baseband processing system, and a transceiver board.

Katabi and Hamed co-wrote the paper with Hariharan Rahul SM ’99, PhD ’13, an alum of Katabi’s group and visiting researcher with the group, as well as visiting student Mohammed A. Albdelghany. Rahul will present the paper at next week’s conference for the Association for Computing Machinery’s Special Interest Group on Data Communications (SIGCOMM 16).

How it works

The main reason that your smartphone works so speedily is multiple-input multiple-output (MIMO), which means that it uses several transmitters and receivers at the same time. Radio waves bounce off surfaces and therefore arrive at the receivers at slightly different times; devices with multiple receivers, then, are able to combine the various streams to transmit data much faster. For example, a router with three antennas works twice as fast as one with two antennas.

But in a world of limited bandwidth, these speeds are still not as fast as they could be, and so in recent years researchers have searched for the wireless industry’s Holy Grail: being able to coordinate several routers at once so that they can triangulate the data even faster and more consistently.

“The problem is that, just like how two radio stations can’t play music over the same frequency at the same time, multiple routers cannot transfer data on the same chunk of spectrum without creating major interference that muddies the signal,” says Rahul.

For the CSAIL team, the missing piece to the puzzle was a new technique for coordinating multiple transmitters by synchronizing their phases. The team developed special signal-processing algorithms that allow multiple independent transmitters to transmit data on the same piece of spectrum to multiple independent receivers without interfering with each other.
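The toy calculation below illustrates why that phase alignment matters: two transmitters sharing a frequency can partially cancel at a receiver unless each pre-rotates its signal to compensate for its channel’s phase. It is a conceptual sketch, not MegaMIMO 2.0’s actual signal-processing algorithm, and the channel values are invented.

```python
# Conceptual sketch of phase-synchronized transmission (not MegaMIMO's code).
import numpy as np

symbol = 1 + 0j                              # a transmitted data symbol
h1 = np.exp(1j * 0.3)                        # channel phase rotation, transmitter 1
h2 = np.exp(1j * 2.5)                        # channel phase rotation, transmitter 2

# Uncoordinated: the two copies arrive with unrelated phases and partly cancel
uncoordinated = h1 * symbol + h2 * symbol

# Coordinated: each transmitter pre-rotates by the conjugate of its channel
# phase, so both copies arrive aligned and add constructively
coordinated = h1 * (np.conj(h1) * symbol) + h2 * (np.conj(h2) * symbol)

print(abs(uncoordinated), abs(coordinated))  # the coordinated sum is larger
```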

Programming language delivers fourfold speedups on big-data problems

In today’s computer chips, memory management is based on what computer scientists call the principle of locality: If a program needs a chunk of data stored at some memory location, it probably needs the neighboring chunks as well.

But that assumption breaks down in the age of big data, now that computer programs more frequently act on just a few data items scattered arbitrarily across huge data sets. Since fetching data from their main memory banks is the major performance bottleneck in today’s chips, having to fetch it more frequently can dramatically slow program execution.

This week, at the International Conference on Parallel Architectures and Compilation Techniques, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) are presenting a new programming language, called Milk, that lets application developers manage memory more efficiently in programs that deal with scattered data points in large data sets.

In tests on several common algorithms, programs written in the new language were four times as fast as those written in existing languages. But the researchers believe that further work will yield even larger gains.

The reason that today’s big data sets pose problems for existing memory management techniques, explains Saman Amarasinghe, a professor of electrical engineering and computer science, is not so much that they are large as that they are what computer scientists call “sparse.” That is, with big data, the scale of the solution does not necessarily increase proportionally with the scale of the problem.

“In social settings, we used to look at smaller problems,” Amarasinghe says. “If you look at the people in this [CSAIL] building, we’re all connected. But if you look at the planet scale, I don’t scale my number of friends. The planet has billions of people, but I still have only hundreds of friends. Suddenly you have a very sparse problem.”

Similarly, Amarasinghe says, an online bookseller with, say, 1,000 customers might like to provide its visitors with a list of its 20 most popular books. It doesn’t follow, however, that an online bookseller with a million customers would want to provide its visitors with a list of its 20,000 most popular books.

Thinking locally

Today’s computer chips are not optimized for sparse data — in fact, the reverse is true. Because fetching data from the chip’s main memory bank is slow, every core, or processor, in a modern chip has its own “cache,” a relatively small, local, high-speed memory bank. Rather than fetching a single data item at a time from main memory, a core will fetch an entire block of data. And that block is selected according to the principle of locality.

It’s easy to see how the principle of locality works with, say, image processing. If the purpose of a program is to apply a visual filter to an image, and it works on one block of the image at a time, then when a core requests a block, it should receive all the adjacent blocks its cache can hold, so that it can grind away on block after block without fetching any more data.

But that approach doesn’t work if the algorithm is interested in only 20 books out of the 2 million in an online retailer’s database. If it requests the data associated with one book, it’s likely that the data associated with the 100 adjacent books will be irrelevant.

Going to main memory for a single data item at a time is woefully inefficient. “It’s as if, every time you want a spoonful of cereal, you open the fridge, open the milk carton, pour a spoonful of milk, close the carton, and put it back in the fridge,” says Vladimir Kiriansky, a PhD student in electrical engineering and computer science and first author on the new paper. He’s joined by Amarasinghe and Yunming Zhang, also a PhD student in electrical engineering and computer science.

Batch processing

Milk simply adds a few commands to OpenMP, an extension of languages such as C and Fortran that makes it easier to write code for multicore processors. With Milk, a programmer inserts a couple of additional lines of code around any instruction that iterates through a large data collection looking for a comparatively small number of items. Milk’s compiler — the program that converts high-level code into low-level instructions — then figures out how to manage memory accordingly.
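Since Milk’s own syntax isn’t shown here, the Python sketch below illustrates only the underlying idea the compiler automates: instead of chasing scattered indices one at a time, accesses are first grouped by the block of memory they touch and then processed block by block. The function names and block size are hypothetical.

```python
# Conceptual sketch (not Milk syntax) of batching scattered memory accesses.
from collections import defaultdict

def gather_scattered(data, indices):
    """Naive pattern: one random lookup per index, poor cache behavior."""
    return [data[i] for i in indices]

def gather_batched(data, indices, block_size=4096):
    """Milk-style idea: bucket indices by block, then visit each block once."""
    buckets = defaultdict(list)
    for pos, i in enumerate(indices):
        buckets[i // block_size].append((pos, i))   # group accesses by block
    out = [None] * len(indices)
    for block in sorted(buckets):                   # one pass per memory block
        for pos, i in buckets[block]:
            out[pos] = data[i]
    return out
```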

Prototype display enables viewers to watch 3-D movies without glasses

3-D movies immerse us in new worlds and allow us to see places and things in ways that we otherwise couldn’t. But behind every 3-D experience is something that is uniformly despised: those goofy glasses.

Fortunately, there may be hope. In a new paper, a team from MIT’s Computer Science and Artificial Intelligence Lab (CSAIL) and Israel’s Weizmann Institute of Science has demonstrated a display that lets audiences watch 3-D films in a movie theater without extra eyewear.

Dubbed “Cinema 3D,” the prototype uses a special array of lenses and mirrors to enable viewers to watch a 3-D movie from any seat in a theater.

“Existing approaches to glasses-free 3-D require screens whose resolution requirements are so enormous that they are completely impractical,” says MIT professor Wojciech Matusik, one of the co-authors on a related paper whose first author is Weizmann PhD Netalee Efrat. “This is the first technical approach that allows for glasses-free 3-D on a large scale.”

While the researchers caution that the system isn’t currently market-ready, they are optimistic that future versions could push the technology to a place where theaters would be able to offer glasses-free alternatives for 3-D movies.

Among the paper’s co-authors are MIT research technician Mike Foshey; former CSAIL postdoc Piotr Didyk; and Weizmann researchers Efrat and professor Anat Levin. Efrat will present the paper at this week’s SIGGRAPH computer-graphics conference in Anaheim, California.

Glasses-free 3-D already exists, but not in a way that scales to movie theaters. Traditional methods for TV sets use a series of slits in front of the screen (a “parallax barrier”) that allows each eye to see a different set of pixels, creating a simulated sense of depth.

But because parallax barriers have to be at a consistent distance from the viewer, this approach isn’t practical for larger spaces like theaters that have viewers at different angles and distances.

Other methods, including one from the MIT Media Lab, involve developing completely new physical projectors that cover the entire angular range of the audience. However, this often comes at the cost of lower image resolution.

The key insight with Cinema 3D is that people in movie theaters move their heads only over a very small range of angles, limited by the width of their seat. Thus, it is enough to display images to a narrow range of angles and replicate that to all seats in the theater.

What Cinema 3D does, then, is encode multiple parallax barriers in one display, such that each viewer sees a parallax barrier tailored to their position. That range of views is then replicated across the theater by a series of mirrors and lenses within Cinema 3D’s special optics system.

“With a 3-D TV, you have to account for people moving around to watch from different angles, which means that you have to divide up a limited number of pixels to be projected so that the viewer sees the image from wherever they are,” says Gordon Wetzstein, an assistant professor of electrical engineering at Stanford University, who was not involved in the research. “The authors [of Cinema 3D] cleverly exploited the fact that theaters have a unique set-up in which every person sits in a more or less fixed position the whole time.”

Practical applications for non-native English

After thousands of hours of work, MIT researchers have released the first major database of fully annotated English sentences written by non-native speakers.

The researchers who led the project had already shown that the grammatical quirks of non-native speakers writing in English could be a source of linguistic insight. But they hope that their dataset could also lead to applications that would improve computers’ handling of spoken or written language of non-native English speakers.

“English is the most used language on the Internet, with over 1 billion speakers,” says Yevgeni Berzak, a graduate student in electrical engineering and computer science, who led the new project. “Most of the people who speak English in the world or produce English text are non-native speakers. This characteristic is often overlooked when we study English scientifically or when we do natural-language processing for English.”

Most natural-language-processing systems, which enable smartphone and other computer applications to process requests phrased in ordinary language, are based on machine learning, in which computer systems look for patterns in huge sets of training data. “If you want to handle noncanonical learner language, in terms of the training material that’s available to you, you can only train on standard English,” Berzak explains.

Systems trained on nonstandard English, on the other hand, could be better able to handle the idiosyncrasies of non-native English speakers, such as tendencies to drop or add prepositions, to substitute particular tenses for others, or to misuse particular auxiliary verbs. Indeed, the researchers hope that their work could lead to grammar-correction software targeted to native speakers of other languages.

Diagramming sentences

The researchers’ dataset consists of 5,124 sentences culled from exam essays written by students of English as a second language (ESL). The sentences were drawn, in approximately equal distribution, from native speakers of 10 languages that are the primary tongues of roughly 40 percent of the world’s population.

Every sentence in the dataset includes at least one grammatical error. The original source of the sentences was a collection made public by Cambridge University, which included annotation of the errors, but no other grammatical or syntactic information.

To provide that additional information, Berzak recruited a group of MIT undergraduate and graduate students from the departments of Electrical Engineering and Computer Science (EECS), Linguistics, and Mechanical Engineering, led by Carolyn Spadine, a graduate student in linguistics.

After eight weeks of training in how to annotate both grammatically correct and error-ridden sentences, the students began working directly on the data. There are three levels of annotation. The first involves basic parts of speech — whether a word is a noun, a verb, a preposition, and so on. The next is a more detailed description of parts of speech — plural versus singular nouns, verb tenses, comparative and superlative adjectives, and the like.

Next, the annotators charted the syntactic relationships between the words of the sentences, using a relatively new annotation scheme called the Universal Dependency formalism. Syntactic relationships include things like which nouns are the objects of which verbs, which verbs are auxiliaries of other verbs, which adjectives modify which nouns, and so on.

The annotators created syntactic charts for both the corrected and uncorrected versions of each sentence. That required some prior conceptual work, since grammatical errors can make words’ syntactic roles difficult to interpret.
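As a rough illustration of what such an annotation looks like, the snippet below encodes one error-containing sentence and its correction as (token, part of speech, head index, relation) tuples in the Universal Dependencies style. The sentence and labels are invented for illustration and are not drawn from the researchers’ dataset.

```python
# Illustrative Universal Dependencies-style annotation (invented example,
# not from the MIT/Cambridge dataset). A head index of 0 marks the root.
ungrammatical = [
    # token     POS      head  relation
    ("He",      "PRON",  2,    "nsubj"),
    ("go",      "VERB",  0,    "root"),      # agreement error: "go" vs. "goes"
    ("to",      "ADP",   4,    "case"),
    ("school",  "NOUN",  2,    "obl"),
    ("every",   "DET",   6,    "det"),
    ("days",    "NOUN",  2,    "obl:tmod"),  # number error: "days" vs. "day"
    (".",       "PUNCT", 2,    "punct"),
]

corrected = [
    ("He",      "PRON",  2,    "nsubj"),
    ("goes",    "VERB",  0,    "root"),
    ("to",      "ADP",   4,    "case"),
    ("school",  "NOUN",  2,    "obl"),
    ("every",   "DET",   6,    "det"),
    ("day",     "NOUN",  2,    "obl:tmod"),
    (".",       "PUNCT", 2,    "punct"),
]
```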

Berzak and Spadine wrote a 20-page guide to their annotation scheme, much of which dealt with the handling of error-ridden sentences. Consistency in the treatment of such sentences is essential to any envisioned application of the dataset: A machine-learning system can’t learn to recognize an error if the error is described differently in different training examples.

Repeatable results

The researchers’ methodology provides good evidence that annotators can chart ungrammatical sentences consistently. For every sentence, one evaluator annotated it completely; another reviewed the annotations and flagged any areas of disagreement; and a third ruled on the disagreements.

New system helps ensure genomic databases preserve privacy

Genome-wide association studies, which try to find correlations between particular genetic variations and disease diagnoses, are a staple of modern medical research.

But because they depend on databases that contain people’s medical histories, they carry privacy risks. An attacker armed with genetic information about someone — from, say, a skin sample — could query a database for that person’s medical data. Even without the skin sample, an attacker who was permitted to make repeated queries, each informed by the results of the last, could, in principle, extract private data from the database.

In the latest issue of the journal Cell Systems, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory and Indiana University at Bloomington describe a new system that permits database queries for genome-wide association studies but reduces the chances of privacy compromises to almost zero.

It does that by adding a little bit of misinformation to the query results it returns. That means that researchers using the system could begin looking for drug targets with slightly inaccurate data. But in most cases, the answers returned by the system will be close enough to be useful.

And an instantly searchable online database of genetic data, even one that returned slightly inaccurate information, could make biomedical research much more efficient.

“Right now, what a lot of people do, including the NIH, for a long time, is take all their data — including, often, aggregate data, the statistics we’re interested in protecting — and put them into repositories,” says Sean Simmons, an MIT postdoc in mathematics and first author on the new paper. “And you have to go through a time-consuming process to get access to them.”

That process involves a raft of paperwork, including explanations of how the research enabled by the repositories will contribute to the public good, which requires careful review. “We’ve waited months to get access to various repositories,” says Bonnie Berger, the Simons Professor of Mathematics at MIT, who was Simmons’s thesis advisor and is the corresponding author on the paper. “Months.”

Bring the noise

Genome-wide association studies generally rely on genetic variations called single-nucleotide polymorphisms, or SNPs (pronounced “snips”). A SNP is a variation of one nucleotide, or DNA “letter,” at a specified location in the genome. Millions of SNPs have been identified in the human population, and certain combinations of SNPs can serve as proxies for larger stretches of DNA that tend to be conserved among individuals.

The new system, which Berger and Simmons developed together with Cenk Sahinalp, a professor of computer science at Indiana University, implements a technique called “differential privacy,” which has been a major area of cryptographic research in recent years. Differential-privacy techniques add a little bit of noise, or random variation, to the results of database searches, to confound algorithms that would seek to extract private information from the results of several, tailored, sequential searches.

The amount of noise required depends on the strength of the privacy guarantee — how low you want to set the likelihood of leaking private information — and the type and volume of data. The more people whose data a SNP database contains, the less noise the system needs to add; essentially, it’s easier to get lost in a crowd. But the more SNPs the system records, the more flexibility an attacker has in constructing privacy-compromising searches, which increases the noise requirements.
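A minimal sketch of the idea, assuming the standard Laplace mechanism from the differential-privacy literature (the paper’s actual mechanism and parameters may differ): the noise scale grows as the privacy parameter epsilon shrinks, and for a simple counting query the sensitivity is one person’s contribution.

```python
# Hedged sketch of the Laplace mechanism for differential privacy.
# Parameter values are illustrative, not those used in the paper.
import numpy as np

def private_count(true_count, sensitivity=1.0, epsilon=0.1):
    """Return a noisy count; smaller epsilon = stronger privacy, more noise."""
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# e.g., the number of study participants carrying a particular SNP
print(private_count(true_count=412, epsilon=0.5))
```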

Practical quantum computers

Quantum computers are largely hypothetical devices that could perform some calculations much more rapidly than conventional computers can. Instead of the bits of classical computation, which can represent 0 or 1, quantum computers consist of quantum bits, or qubits, which can, in some sense, represent 0 and 1 simultaneously.

Although quantum systems with as many as 12 qubits have been demonstrated in the lab, building quantum computers complex enough to perform useful computations will require miniaturizing qubit technology, much the way the miniaturization of transistors enabled modern computers.

Trapped ions are probably the most widely studied qubit technology, but they’ve historically required a large and complex hardware apparatus. In today’s Nature Nanotechnology, researchers from MIT and MIT Lincoln Laboratory report an important step toward practical quantum computers, with a paper describing a prototype chip that can trap ions in an electric field and, with built-in optics, direct laser light toward each of them.

“If you look at the traditional assembly, it’s a barrel that has a vacuum inside it, and inside that is this cage that’s trapping the ions. Then there’s basically an entire laboratory of external optics that are guiding the laser beams to the assembly of ions,” says Rajeev Ram, an MIT professor of electrical engineering and one of the senior authors on the paper. “Our vision is to take that external laboratory and miniaturize much of it onto a chip.”

Caged in

The Quantum Information and Integrated Nanosystems group at Lincoln Laboratory was one of several research groups already working to develop simpler, smaller ion traps known as surface traps. A standard ion trap looks like a tiny cage, whose bars are electrodes that produce an electric field. Ions line up in the center of the cage, parallel to the bars. A surface trap, by contrast, is a chip with electrodes embedded in its surface. The ions hover 50 micrometers above the electrodes.

Cage traps are intrinsically limited in size, but surface traps could, in principle, be extended indefinitely. With current technology, they would still have to be held in a vacuum chamber, but they would allow many more qubits to be crammed inside.

“We believe that surface traps are a key technology to enable these systems to scale to the very large number of ions that will be required for large-scale quantum computing,” says Jeremy Sage, who together with John Chiaverini leads Lincoln Laboratory’s trapped-ion quantum-information-processing project. “These cage traps work very well, but they really only work for maybe 10 to 20 ions, and they basically max out around there.”

Performing a quantum computation, however, requires precisely controlling the energy state of every qubit independently, and trapped-ion qubits are controlled with laser beams. In a surface trap, the ions are only about 5 micrometers apart. Hitting a single ion with an external laser, without affecting its neighbors, is incredibly difficult; only a few groups had previously attempted it, and their techniques weren’t practical for large-scale systems.

Getting onboard

That’s where Ram’s group comes in. Ram and Karan Mehta, an MIT graduate student in electrical engineering and first author on the new paper, designed and built a suite of on-chip optical components that can channel laser light toward individual ions. Sage, Chiaverini, and their Lincoln Lab colleagues Colin Bruzewicz and Robert McConnell retooled their surface trap to accommodate the integrated optics without compromising its performance. Together, both groups designed and executed the experiments to test the new system.

“Typically, for surface electrode traps, the laser beam is coming from an optical table and entering this system, so there’s always this concern about the beam vibrating or moving,” Ram says. “With photonic integration, you’re not concerned about beam-pointing stability, because it’s all on the same chip that the electrodes are on. So now everything is registered against each other, and it’s stable.”

Speeding up computer simulations

Computer simulations of physical systems are common in science, engineering, and entertainment, but they use several different types of tools.

If, say, you want to explore how a crack forms in an airplane wing, you need a very precise physical model of the crack’s immediate vicinity. But if you want to simulate the flexion of an airplane wing under different flight conditions, it’s more practical to use a simpler, higher-level description of the wing.

If, however, you want to model the effects of wing flexion on the crack’s propagation, or vice versa, you need to switch back and forth between these two levels of description, which is difficult not only for computer programmers but for computers, too.

A team of researchers from MIT’s Computer Science and Artificial Intelligence Laboratory, Adobe, the University of California at Berkeley, the University of Toronto, Texas A&M, and the University of Texas has developed a new programming language that handles that switching automatically.

In experiments, simulations written in the language were dozens or even hundreds of times as fast as those written in existing simulation languages. At the same time, they required only one-tenth as much code as meticulously hand-optimized simulations that could achieve similar execution speeds.

“The story of this paper is that the trade-off between concise code and good performance is false,” says Fredrik Kjolstad, an MIT graduate student in electrical engineering and computer science and first author on a new paper describing the language. “It’s not necessary, at least for the problems that this applies to. But it applies to a large class of problems.”

Indeed, Kjolstad says, the researchers’ language has applications outside physical simulation, in machine learning, data analytics, optimization, and robotics, among other areas. Kjolstad and his colleagues have already used the language to implement a version of Google’s original PageRank algorithm for ordering search results, and they’re currently collaborating with researchers in MIT’s Department of Physics on an application in quantum chromodynamics, a theory of the “strong force” that holds atomic nuclei together.

“I think this is a language that is not just going to be for physical simulations for graphics people,” says Saman Amarasinghe, Kjolstad’s advisor and a professor of electrical engineering and computer science (EECS). “I think it can do a lot of other things. So we are very optimistic about where it’s going.”

Kjolstad presented the paper in July at the Association for Computing Machinery’s SIGGRAPH conference, the major conference in computer graphics. His co-authors include Amarasinghe; Wojciech Matusik, an associate professor of EECS; and Gurtej Kanwar, who was an MIT undergraduate when the work was done but is now an MIT PhD student in physics.

Graphs vs. matrices

As Kjolstad explains, the distinction between the low-level and high-level descriptions of physical systems is more properly described as the distinction between descriptions that use graphs and descriptions that use linear algebra.

In this context, a graph is a mathematical structure that consists of nodes, typically represented by circles, and edges, typically represented as line segments connecting the nodes. Edges and nodes can have data associated with them. In a physical simulation, that data might describe tiny triangles or tetrahedra that are stitched together to approximate the curvature of a smooth surface. Low-level simulation might require calculating the individual forces acting on, say, every edge and face of each tetrahedron.

Linear algebra instead represents a physical system as a collection of points, which exert forces on each other. Those forces are described by a big grid of numbers, known as a matrix. Simulating the evolution of the system in time involves multiplying the matrix by other matrices, or by vectors, which are individual rows or columns of numbers.
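A toy example of the two descriptions, assuming a simple linear spring network (the values and variable names are invented for illustration): the same springs can be stored as a graph of nodes and edges or assembled into a stiffness matrix, after which one simulation step becomes a matrix-vector product.

```python
# Toy contrast between the graph view and the linear-algebra view of a
# physical system (illustrative only): three masses joined by two springs.
import numpy as np

edges = [(0, 1, 5.0), (1, 2, 5.0)]        # graph view: (node_a, node_b, stiffness)

K = np.zeros((3, 3))                       # linear-algebra view: stiffness matrix
for a, b, k in edges:
    K[a, a] += k; K[b, b] += k
    K[a, b] -= k; K[b, a] -= k

u = np.array([0.00, 0.05, -0.02])          # each node's displacement from rest
forces = -K @ u                            # one step's forces as a matrix-vector product
print(forces)
```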

Nancy Lynch appointed associate head of Electrical Engineering and Computer Science

Nancy Lynch, the NEC Professor of Software Science and Engineering, has been appointed as associate head of the Department of Electrical Engineering and Computer Science (EECS), effective September 1.

Lynch is known for her fundamental contributions to the foundations of distributed computing. Her work applies a mathematical approach to explore the inherent limits on computability and complexity in distributed systems.

Her best-known research is the “FLP” impossibility result for distributed consensus in the presence of process failures. Other research includes the I/O automata system modeling frameworks. Lynch’s recent work focuses on wireless network algorithms and biological distributed algorithms.

The longtime head of the Theory of Distributed Systems (TDS) research group in the Computer Science and Artificial Intelligence Laboratory (CSAIL), Lynch joined MIT in 1981. She received her BS from Brooklyn College in 1968 and her PhD from MIT in 1972, both in mathematics. More recently, Lynch served for several years as head of CSAIL’s Theory of Computation (TOC) group.

She is also the author of several books and textbooks, including the graduate textbook Distributed Algorithms, considered a standard reference in the field. Lynch has also co-authored several hundred articles about distributed algorithms and impossibility results, and about formal modeling and verification of distributed systems. She is the recipient of numerous awards, an ACM Fellow, a Fellow of the American Academy of Arts and Sciences, and a member of both the National Academy of Sciences and the National Academy of Engineering.

New computational imaging method can read closed books

MIT researchers and their colleagues are designing an imaging system that can read closed books.

In the latest issue of Nature Communications, the researchers describe a prototype of the system, which they tested on a stack of papers, each with one letter printed on it. The system was able to correctly identify the letters on the top nine sheets.

“The Metropolitan Museum in New York showed a lot of interest in this, because they want to, for example, look into some antique books that they don’t even want to touch,” says Barmak Heshmat, a research scientist at the MIT Media Lab and corresponding author on the new paper. He adds that the system could be used to analyze any materials organized in thin layers, such as coatings on machine parts or pharmaceuticals.

Heshmat is joined on the paper by Ramesh Raskar, an associate professor of media arts and sciences; Albert Redo Sanchez, a research specialist in the Camera Culture group at the Media Lab; two of the group’s other members; and by Justin Romberg and Alireza Aghasi of Georgia Tech.

The MIT researchers developed the algorithms that acquire images from individual sheets in stacks of paper, and the Georgia Tech researchers developed the algorithm that interprets the often distorted or incomplete images as individual letters. “It’s actually kind of scary,” Heshmat says of the letter-interpretation algorithm. “A lot of websites have these letter certifications [captchas] to make sure you’re not a robot, and this algorithm can get through a lot of them.”

Timing terahertz

The system uses terahertz radiation, the band of electromagnetic radiation between microwaves and infrared light, which has several advantages over other types of waves that can penetrate surfaces, such as X-rays or sound waves. Terahertz radiation has been widely researched for use in security screening, because different chemicals absorb different frequencies of terahertz radiation to different degrees, yielding a distinctive frequency signature for each. By the same token, terahertz frequency profiles can distinguish between ink and blank paper, in a way that X-rays can’t.

Terahertz radiation can also be emitted in such short bursts that the distance it has traveled can be gauged from the difference between its emission time and the time at which reflected radiation returns to a sensor. That gives it much better depth resolution than ultrasound.

The system exploits the fact that trapped between the pages of a book are tiny air pockets only about 20 micrometers deep. The difference in refractive index — the degree to which they bend light — between the air and the paper means that the boundary between the two will reflect terahertz radiation back to a detector.

In the researchers’ setup, a standard terahertz camera emits ultrashort bursts of radiation, and the camera’s built-in sensor detects their reflections. From the reflections’ time of arrival, the MIT researchers’ algorithm can gauge the distance to the individual pages of the book.
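The depth estimate rests on a simple time-of-flight relationship, sketched below with illustrative numbers (the system’s actual processing is more involved): distance is half the round-trip time multiplied by the speed of light, so a 20-micrometer air gap shifts a reflection by only a bit more than a tenth of a picosecond.

```python
# Hedged sketch of the basic time-of-flight relationship behind the depth
# estimate; numbers are illustrative, not the system's measured values.
C = 3.0e8                          # approximate speed of light in air, m/s

def depth_from_delay(delay_s):
    """Convert a round-trip delay to a one-way distance to the reflector."""
    return C * delay_s / 2.0

# A ~20-micrometer air gap adds roughly 0.13 picoseconds of extra delay
extra_delay = 2 * 20e-6 / C
print(f"{extra_delay * 1e12:.2f} ps, depth {depth_from_delay(extra_delay) * 1e6:.1f} um")
```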

True signals

While most of the radiation is either absorbed or reflected by the book, some of it bounces around between pages before returning to the sensor, producing a spurious signal. The sensor’s electronics also produce a background hum. One of the tasks of the MIT researchers’ algorithm is to filter out all this “noise.”