
How a new system could help keep genomic databases private

Genome-wide association studies, which try to find correlations between particular genetic variations and disease diagnoses, are a staple of modern medical research.

But because they depend on databases that contain people’s medical histories, they carry privacy risks. An attacker armed with genetic information about someone — from, say, a skin sample — could query a database for that person’s medical data. Even without the skin sample, an attacker who was permitted to make repeated queries, each informed by the results of the last, could, in principle, extract private data from the database.

In the latest issue of the journal Cell Systems, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory and Indiana University at Bloomington describe a new system that permits database queries for genome-wide association studies but reduces the chances of privacy compromises to almost zero.

It does that by adding a little bit of misinformation to the query results it returns. That means that researchers using the system could begin looking for drug targets with slightly inaccurate data. But in most cases, the answers returned by the system will be close enough to be useful.

And an instantly searchable online database of genetic data, even one that returned slightly inaccurate information, could make biomedical research much more efficient.

“Right now, what a lot of people do, including the NIH, for a long time, is take all their data — including, often, aggregate data, the statistics we’re interested in protecting — and put them into repositories,” says Sean Simmons, an MIT postdoc in mathematics and first author on the new paper. “And you have to go through a time-consuming process to get access to them.”

That process involves a raft of paperwork, including explanations of how the research enabled by the repositories will contribute to the public good, which requires careful review. “We’ve waited months to get access to various repositories,” says Bonnie Berger, the Simons Professor of Mathematics at MIT, who was Simmons’s thesis advisor and is the corresponding author on the paper. “Months.”

Bring the noise

Genome-wide association studies generally rely on genetic variations called single-nucleotide polymorphisms, or SNPs (pronounced “snips”). A SNP is a variation of one nucleotide, or DNA “letter,” at a specified location in the genome. Millions of SNPs have been identified in the human population, and certain combinations of SNPs can serve as proxies for larger stretches of DNA that tend to be conserved among individuals.
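
As a rough, hypothetical illustration of the kind of aggregate statistic such a study computes (the counts below are invented, not taken from the paper), one can tally how often a SNP’s minor allele appears among cases versus controls and test whether the difference exceeds chance:

```python
from scipy.stats import chi2_contingency

# Hypothetical counts for one SNP: carriers of the minor allele vs. the common allele.
cases = [420, 580]       # people with the diagnosis
controls = [310, 690]    # matched people without it

chi2, p_value, dof, expected = chi2_contingency([cases, controls])
print(f"chi-squared = {chi2:.1f}, p = {p_value:.2e}")

# A small p-value suggests the variant is associated with the diagnosis; a real
# study repeats such tests, with multiple-testing corrections, across millions of SNPs.
```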

The new system, which Berger and Simmons developed together with Cenk Sahinalp, a professor of computer science at Indiana University, implements a technique called “differential privacy,” which has been a major area of cryptographic research in recent years. Differential-privacy techniques add a little bit of noise, or random variation, to the results of database searches, to confound algorithms that would seek to extract private information from the results of several, tailored, sequential searches.

The amount of noise required depends on the strength of the privacy guarantee — how low you want to set the likelihood of leaking private information — and the type and volume of data. The more people whose data a SNP database contains, the less noise the system needs to add; essentially, it’s easier to get lost in a crowd. But the more SNPs the system records, the more flexibility an attacker has in constructing privacy-compromising searches, which increases the noise requirements.
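
The paper tailors its mechanism to GWAS statistics, but the basic idea can be illustrated with the classic Laplace mechanism, which draws noise with a scale set by the query’s sensitivity and the privacy parameter epsilon. The counts below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Release true_value plus Laplace noise with scale sensitivity / epsilon."""
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Hypothetical aggregate query: how many participants carry a given SNP variant?
true_count = 1234    # invented count from a 5,000-person cohort
sensitivity = 1      # adding or removing one person changes the count by at most 1
epsilon = 0.5        # smaller epsilon = stronger privacy guarantee = more noise

noisy_count = laplace_mechanism(true_count, sensitivity, epsilon)
print(f"true count: {true_count}, released count: {noisy_count:.1f}")

# The noise scale depends only on sensitivity and epsilon, so the same query over
# a 50,000-person cohort is perturbed by the same few counts, a much smaller
# relative error; that is why larger databases need relatively less noise.
```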

How machine learning can help with voice disorders

There’s no human instinct more basic than speech, and yet, for many people, talking can be taxing. One in 14 working-age Americans suffers from voice disorders that are often associated with abnormal vocal behaviors, some of which can damage vocal cord tissue and lead to the formation of nodules or polyps that interfere with normal speech production.

Unfortunately, many behaviorally-based voice disorders are not well understood. In particular, patients with muscle tension dysphonia (MTD) often experience deteriorating voice quality and vocal fatigue (“tired voice”) in the absence of any clear vocal cord damage or other medical problems, which makes the condition both hard to diagnose and hard to treat.

But a team from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Massachusetts General Hospital (MGH) believes that better understanding of conditions like MTD is possible through machine learning.

Using accelerometer data collected from a wearable device developed at the MGH Voice Center, the team demonstrated that it could detect differences between subjects with MTD and matched controls. The same methods also showed that, after receiving voice therapy, MTD subjects exhibited behavior that was more similar to that of the controls.

“We believe this approach could help detect disorders that are exacerbated by vocal misuse, and help to empirically measure the impact of voice therapy,” says MIT PhD student Marzyeh Ghassemi, who is first author on a related paper that she presented at last week’s Machine Learning in Health Care (MLHC) conference in Los Angeles. “Our long-term goal is for such a system to be used to alert patients when they are using their voices in ways that could lead to problems.”

The paper’s co-authors include John Guttag, MIT professor of electrical engineering and computer science; Zeeshan Syed, CEO of the machine-learning startup Health[at]Scale; and physicians Robert Hillman, Daryush Mehta and Jarrad H. Van Stan of Massachusetts General Hospital.

How it works

Existing approaches to applying machine learning to physiological signals often involve supervised learning, in which researchers painstakingly label data and provide desired outputs. Besides being time-consuming, such methods can’t yet classify utterances as normal or abnormal, because the correlations between accelerometer data and voice misuse are not well understood.

Because the CSAIL team did not know when vocal misuse was occurring, they opted to use unsupervised learning, where data is unlabeled at the instance level.

“People with vocal disorders aren’t always misusing their voices, and people without disorders also occasionally misuse their voices,” says Ghassemi. “The difficult task here was to build a learning algorithm that can determine what sort of vocal cord movements are prominent in subjects with a disorder.”

The study involved two groups: patients who had been diagnosed with voice disorders, and a control group of individuals without disorders. Each group went about their daily activities while wearing accelerometers on their necks that captured the motions of their vocal folds.

Researchers then looked at the two groups’ data, analyzing more than 110 million “glottal pulses” that each represent one opening and closing of the vocal folds. By comparing clusters of pulses, the team could detect significant differences between patients and controls.

The team also found that, after voice therapy, the distributions of patients’ glottal pulses were more similar to those of the controls. According to Guttag, this is the first such study to use machine learning to provide objective evidence of the positive effects of voice therapy.
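
The paper’s exact features and models aren’t shown in this excerpt, but the general recipe, clustering unlabeled glottal-pulse features and then comparing how each group’s pulses populate the clusters, can be sketched roughly as follows (the feature vectors and cluster count are placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Placeholder feature vectors for glottal pulses extracted from the neck
# accelerometer, one row per pulse (e.g., amplitude, period, spectral shape).
patient_pulses = rng.normal(loc=0.3, scale=1.0, size=(10_000, 8))
control_pulses = rng.normal(loc=0.0, scale=1.0, size=(10_000, 8))

# Unsupervised step: learn clusters from all pulses, with no per-pulse labels.
all_pulses = np.vstack([patient_pulses, control_pulses])
kmeans = KMeans(n_clusters=20, n_init=10, random_state=0).fit(all_pulses)

def cluster_histogram(pulses):
    """Fraction of a group's pulses that falls in each learned cluster."""
    labels = kmeans.predict(pulses)
    return np.bincount(labels, minlength=kmeans.n_clusters) / len(labels)

patient_hist = cluster_histogram(patient_pulses)
control_hist = cluster_histogram(control_pulses)

# The groups can then be compared by how differently their pulses populate the
# clusters; repeating the comparison after therapy shows whether the patients'
# distribution has moved toward the controls'.
print("L1 distance between the two distributions:",
      round(float(np.abs(patient_hist - control_hist).sum()), 3))
```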

Practical quantum computers

Quantum computers are largely hypothetical devices that could perform some calculations much more rapidly than conventional computers can. Instead of the bits of classical computation, which can represent 0 or 1, quantum computers consist of quantum bits, or qubits, which can, in some sense, represent 0 and 1 simultaneously.

Although quantum systems with as many as 12 qubits have been demonstrated in the lab, building quantum computers complex enough to perform useful computations will require miniaturizing qubit technology, much the way the miniaturization of transistors enabled modern computers.

Trapped ions are probably the most widely studied qubit technology, but they’ve historically required a large and complex hardware apparatus. In today’s Nature Nanotechnology, researchers from MIT and MIT Lincoln Laboratory report an important step toward practical quantum computers, with a paper describing a prototype chip that can trap ions in an electric field and, with built-in optics, direct laser light toward each of them.

“If you look at the traditional assembly, it’s a barrel that has a vacuum inside it, and inside that is this cage that’s trapping the ions. Then there’s basically an entire laboratory of external optics that are guiding the laser beams to the assembly of ions,” says Rajeev Ram, an MIT professor of electrical engineering and one of the senior authors on the paper. “Our vision is to take that external laboratory and miniaturize much of it onto a chip.”

Caged in

The Quantum Information and Integrated Nanosystems group at Lincoln Laboratory was one of several research groups already working to develop simpler, smaller ion traps known as surface traps. A standard ion trap looks like a tiny cage, whose bars are electrodes that produce an electric field. Ions line up in the center of the cage, parallel to the bars. A surface trap, by contrast, is a chip with electrodes embedded in its surface. The ions hover 50 micrometers above the electrodes.

Cage traps are intrinsically limited in size, but surface traps could, in principle, be extended indefinitely. With current technology, they would still have to be held in a vacuum chamber, but they would allow many more qubits to be crammed inside.

“We believe that surface traps are a key technology to enable these systems to scale to the very large number of ions that will be required for large-scale quantum computing,” says Jeremy Sage, who together with John Chiaverini leads Lincoln Laboratory’s trapped-ion quantum-information-processing project. “These cage traps work very well, but they really only work for maybe 10 to 20 ions, and they basically max out around there.”

Performing a quantum computation, however, requires precisely controlling the energy state of every qubit independently, and trapped-ion qubits are controlled with laser beams. In a surface trap, the ions are only about 5 micrometers apart. Hitting a single ion with an external laser, without affecting its neighbors, is incredibly difficult; only a few groups had previously attempted it, and their techniques weren’t practical for large-scale systems.

Getting onboard

That’s where Ram’s group comes in. Ram and Karan Mehta, an MIT graduate student in electrical engineering and first author on the new paper, designed and built a suite of on-chip optical components that can channel laser light toward individual ions. Sage, Chiaverini, and their Lincoln Lab colleagues Colin Bruzewicz and Robert McConnell retooled their surface trap to accommodate the integrated optics without compromising its performance. Together, both groups designed and executed the experiments to test the new system.

“Typically, for surface electrode traps, the laser beam is coming from an optical table and entering this system, so there’s always this concern about the beam vibrating or moving,” Ram says. “With photonic integration, you’re not concerned about beam-pointing stability, because it’s all on the same chip that the electrodes are on. So now everything is registered against each other, and it’s stable.”

A system from the Computer Science and Artificial Intelligence Laboratory takes on the spectrum crunch

There are few things more frustrating than trying to use your phone on a crowded network. With phone usage growing faster than wireless spectrum, we’re all now fighting over smaller and smaller bits of bandwidth. Spectrum crunch is such a big problem that the White House is getting involved, recently announcing both a $400 million research initiative and a $4 million global competition devoted to the issue.

But researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) say that they have a possible solution. In a new paper, a team led by professor Dina Katabi demonstrates a system called MegaMIMO 2.0 that can transfer wireless data more than three times faster than existing systems while also doubling the range of the signal.

The soon-to-be-commercialized system’s key insight is to coordinate multiple access points at the same time, on the same frequency, without creating interference. This means that MegaMIMO 2.0 could dramatically improve the speed and strength of wireless networks, particularly at high-usage events like concerts, conventions and football games.

“In today’s wireless world, you can’t solve spectrum crunch by throwing more transmitters at the problem, because they will all still be interfering with one another,” says Ezzeldin Hamed, a PhD student who is lead author on a new paper on the topic. “The answer is to have all those access points work with each other simultaneously to efficiently use the available spectrum.”

To test MegaMIMO 2.0’s performance, the researchers created a mock conference room with a set of four laptops that each roamed the space atop Roomba robots. The experiments found that the system could increase the devices’ data-transfer speed by 330 percent.

MegaMIMO 2.0’s hardware is the size of a standard router, and consists of a processor, a real-time baseband processing system, and a transceiver board.

Katabi and Hamed co-wrote the paper with Hariharan Rahul SM ’99, PhD ’13, an alum of Katabi’s group and a visiting researcher with the group, as well as visiting student Mohammed A. Abdelghany. Rahul will present the paper at next week’s conference of the Association for Computing Machinery’s Special Interest Group on Data Communications (SIGCOMM 16).

How it works

The main reason that your smartphone works so speedily is multiple-input multiple-output (MIMO), which means that it uses several transmitters and receivers at the same time. Radio waves bounce off surfaces and therefore arrive at the receivers at slightly different times; devices with multiple receivers, then, are able to combine the various streams to transmit data much faster. For example, a router with three antennas works twice as fast as one with two antennas.

But in a world of limited bandwidth, these speeds are still not as fast as they could be, and so in recent years researchers have searched for the wireless industry’s Holy Grail: being able to coordinate several routers at once so that they can triangulate the data even faster and more consistently.

“The problem is that, just like how two radio stations can’t play music over the same frequency at the same time, multiple routers cannot transfer data on the same chunk of spectrum without creating major interference that muddies the signal,” says Rahul.

For the CSAIL team, the missing piece to the puzzle was a new technique for coordinating multiple transmitters by synchronizing their phases. The team developed special signal-processing algorithms that allow multiple independent transmitters to transmit data on the same piece of spectrum to multiple independent receivers without interfering with each other.
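
The paper’s algorithms aren’t reproduced here, but the core idea that phase-synchronized transmitters add constructively at a receiver can be sketched in a few lines; the channel gains, oscillator offsets, and beamforming weights below are made-up placeholders, not MegaMIMO’s actual signal processing.

```python
import numpy as np

rng = np.random.default_rng(1)

# QPSK symbols that two access points jointly transmit to a single receiver.
symbols = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], size=1000) / np.sqrt(2)

# Unknown oscillator phase offsets of the two transmitters (radians, invented).
offsets = np.array([0.7, 2.1])

# Flat-fading channel gains from each transmitter to the receiver (placeholders).
h = np.array([0.9 * np.exp(0.4j), 0.8 * np.exp(-1.0j)])

def received_power(phase_synchronized):
    """Average power of the combined signal at the receiver."""
    total = np.zeros_like(symbols)
    for k in range(2):
        weight = np.conj(h[k]) / abs(h[k])              # undo the channel phase
        # A synchronized transmitter also pre-rotates away its oscillator offset,
        # so both signals arrive with the same phase and add constructively.
        correction = np.exp(-1j * offsets[k]) if phase_synchronized else 1.0
        total += h[k] * (symbols * weight * correction * np.exp(1j * offsets[k]))
    return float(np.mean(np.abs(total) ** 2))

print("unsynchronized power:", round(received_power(False), 2))
print("synchronized power:  ", round(received_power(True), 2))
```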

Speeding up computer simulations

Computer simulations of physical systems are common in science, engineering, and entertainment, but they use several different types of tools.

If, say, you want to explore how a crack forms in an airplane wing, you need a very precise physical model of the crack’s immediate vicinity. But if you want to simulate the flexion of an airplane wing under different flight conditions, it’s more practical to use a simpler, higher-level description of the wing.

If, however, you want to model the effects of wing flexion on the crack’s propagation, or vice versa, you need to switch back and forth between these two levels of description, which is difficult not only for computer programmers but for computers, too.

A team of researchers from MIT’s Computer Science and Artificial Intelligence Laboratory, Adobe, the University of California at Berkeley, the University of Toronto, Texas A&M, and the University of Texas has developed a new programming language that handles that switching automatically.

In experiments, simulations written in the language were dozens or even hundreds of times as fast as those written in existing simulation languages. But they required only one-tenth as much code as meticulously hand-optimized simulations that could achieve similar execution speeds.

“The story of this paper is that the trade-off between concise code and good performance is false,” says Fredrik Kjolstad, an MIT graduate student in electrical engineering and computer science and first author on a new paper describing the language. “It’s not necessary, at least for the problems that this applies to. But it applies to a large class of problems.”

Indeed, Kjolstad says, the researchers’ language has applications outside physical simulation, in machine learning, data analytics, optimization, and robotics, among other areas. Kjolstad and his colleagues have already used the language to implement a version of Google’s original PageRank algorithm for ordering search results, and they’re currently collaborating with researchers in MIT’s Department of Physics on an application in quantum chromodynamics, a theory of the “strong force” that holds atomic nuclei together.

“I think this is a language that is not just going to be for physical simulations for graphics people,” says Saman Amarasinghe, Kjolstad’s advisor and a professor of electrical engineering and computer science (EECS). “I think it can do a lot of other things. So we are very optimistic about where it’s going.”

Kjolstad presented the paper in July at the Association for Computing Machinery’s Siggraph conference, the major conference in computer graphics. His co-authors include Amarasinghe; Wojciech Matusik, an associate professor of EECS; and Gurtej Kanwar, who was an MIT undergraduate when the work was done but is now an MIT PhD student in physics.

Graphs vs. matrices

As Kjolstad explains, the distinction between the low-level and high-level descriptions of physical systems is more properly described as the distinction between descriptions that use graphs and descriptions that use linear algebra.

In this context, a graph is a mathematical structure that consists of nodes, typically represented by circles, and edges, typically represented as line segments connecting the nodes. Edges and nodes can have data associated with them. In a physical simulation, that data might describe tiny triangles or tetrahedra that are stitched together to approximate the curvature of a smooth surface. Low-level simulation might require calculating the individual forces acting on, say, every edge and face of each tetrahedron.

Linear algebra instead represents a physical system as a collection of points, which exert forces on each other. Those forces are described by a big grid of numbers, known as a matrix. Simulating the evolution of the system in time involves multiplying the matrix by other matrices, or by vectors, which are individual rows or columns of numbers.
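
To make the two views concrete, here is a small, hypothetical mass-spring chain written in ordinary Python rather than the researchers’ language: the springs are edges in a graph, but once a stiffness matrix has been assembled from them, each time step is just matrix-vector arithmetic. The numbers and the explicit Euler update are placeholders for illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Graph view: a tiny chain of 4 point masses connected by 3 springs.
n = 4
edges = [(0, 1), (1, 2), (2, 3)]               # spring endpoints
k, mass, dt = 10.0, 1.0, 1e-3                  # stiffness, mass, time step (placeholders)

# Linear-algebra view: assemble a stiffness matrix K from the edges.
rows, cols, vals = [], [], []
for i, j in edges:
    rows += [i, j, i, j]
    cols += [i, j, j, i]
    vals += [k, k, -k, -k]
K = coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()   # duplicate entries are summed

# State of the system: 1-D positions and velocities of the masses.
rest = np.array([0.0, 1.0, 2.0, 3.0])          # rest positions
x = np.array([0.0, 1.2, 2.0, 3.1])             # slightly stretched chain
v = np.zeros(n)

# Evolving the system in time is repeated matrix-vector multiplication:
# spring forces are -K times the displacement, followed by an explicit Euler update.
for _ in range(1000):
    f = -(K @ (x - rest))
    v += dt * f / mass
    x += dt * v

print("positions after 1000 steps:", x.round(3))
```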