Bioinformatics grows to keep pace with petabytes of small tech data

June 25, 2003 – A microfluidic device can function as a “liver-on-a-chip.” Forests of nano-pillars can unravel strands of DNA. These small tech-enabled leaps in life science involve increasingly complex interactions, and the result is new data for scientists working on curing diseases.

Too much data.

In fact, so much data that humans alone cannot possibly make sense of it all.

“It’s like a 3-D puzzle in which a piece may fit not with just one, but hundreds of others,” said Laura Mazzola, founder of the Nanobio Forum in Northern California. The task of recognizing patterns in huge databases of genetic information, said Mazzola, has rapidly outgrown what even the best human minds can comprehend.

Today, analyzing how multiple genes function together can produce terabytes of data. But as nanotech enables more sensitive sensing and broader data collection, the flow could soon be measured in petabytes, each a quadrillion bytes of information. Muscling such large and complex raw results into useful knowledge is the goal of bioinformatics.

Front Line Strategic Consulting predicted last year that the bioinformatics business will reach $1.7 billion by 2006. The market research firm said bioinformatics would grow at a 20 percent annual rate while helping shave 33 percent off the cost, and two years off the timeline, of the drug discovery process.

Stanford bioinformatics professor Russ Altman envisions the day when microfluidic systems connected to computers will fully model an individual cell’s structure and function. Armed with such a detailed virtual representation of how a cell works, scientists will be able to develop novel drugs with unprecedented speed and precision “without ever touching a mouse.” The biological kind, he means.

The National Science Foundation (NSF) recently funded a Network for Computational Nanotechnology based at Purdue University. Formed by seven universities, the network aims to give academia and industry access to advanced simulation tools for disciplines including biotech.

Startups like BioForce Nanosciences Inc. in Ames, Iowa, are working on nanobiotechnology’s commercial frontier. The company said its NanoPro system can deposit as few as a thousand biomolecules in an array of droplets spaced a few nanometers apart. Chief Science Officer Eric Henderson said that 1,000 droplets can fit in the area of one microarray well. An atomic force microscope can then “feel” topographical features in the samples that signal cancer or HIV.
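
BioForce has not published its analysis software, so the following is only a conceptual sketch: the basic step of turning an AFM height map into a yes/no readout can be treated as threshold detection over a 2-D array. The array size, threshold, and data below are all hypothetical stand-ins.

```python
# Hypothetical sketch of reading an AFM height map: count pixels whose height
# rise suggests bound target molecules. Not BioForce's actual software.
import numpy as np

rng = np.random.default_rng(1)
height_nm = rng.normal(0.2, 0.05, size=(64, 64))   # background surface roughness
height_nm[10:14, 10:14] += 2.0                     # simulated bound-antibody spot

BINDING_THRESHOLD_NM = 1.0   # assumed height jump when a target molecule binds
bound = height_nm > BINDING_THRESHOLD_NM
print(f"{bound.sum()} pixels above threshold -> positive signal")
```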

Public bioinformatics resources include online databases such as the Protein Data Bank, a collaboration among Rutgers University, the San Diego Supercomputer Center and the National Institute of Standards and Technology. Web-based tools such as the National Center for Biotechnology Information’s BLAST (Basic Local Alignment Search Tool) gene search engine enable researchers to hunt for patterns across the entire human genome, as well as all other known genomes.
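
For a feel of how such a search is driven in practice, here is a minimal sketch that submits a nucleotide query to NCBI's public BLAST servers. It uses the open source Biopython library, which is our choice for illustration rather than anything named in the article; any BLAST client would do.

```python
# Minimal BLAST query sketch using Biopython (assumes: pip install biopython).
# Submits a short nucleotide sequence to NCBI's BLAST servers and prints the
# top alignments found in the "nt" nucleotide database.
from Bio.Blast import NCBIWWW, NCBIXML

query = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"

# blastn = nucleotide-vs-nucleotide search; "nt" = NCBI's full nucleotide database.
result_handle = NCBIWWW.qblast("blastn", "nt", query)
record = NCBIXML.read(result_handle)

for alignment in record.alignments[:5]:
    hsp = alignment.hsps[0]
    print(f"{alignment.title[:60]}  E-value: {hsp.expect:.2e}")
```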

Of course, the convergence of bio, nano and computing also encompasses a slew of software, open source and commercial. Stanford’s Altman said commercial bioinformatics software is only as good as the proprietary data and expertise that may come with it.

Cengent Therapeutics Inc. — the product of a merger between Structural Bioinformatics and GeneFormatics — is a privately held company in San Diego focused on proteomic drug discovery. The company’s relational database, called StructureBank, can store and compare different 3-D protein structures.
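
StructureBank's internal methods are proprietary, but the core operation of comparing two 3-D structures can be illustrated with a standard root-mean-square deviation (RMSD) calculation. The sketch below is generic geometry, not Cengent's algorithm, and its coordinates are toy data.

```python
# Illustrative RMSD comparison of two protein structures (not StructureBank's
# actual method). Assumes the structures are already superimposed and that the
# inputs hold matched atom positions as N x 3 arrays, in angstroms.
import numpy as np

def rmsd(coords_a: np.ndarray, coords_b: np.ndarray) -> float:
    """Root-mean-square deviation between two matched coordinate sets."""
    diff = coords_a - coords_b
    return float(np.sqrt((diff * diff).sum() / len(coords_a)))

# Toy example: two 3-atom fragments offset by 0.5 angstroms along x.
a = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [3.0, 0.0, 0.0]])
b = a + np.array([0.5, 0.0, 0.0])
print(f"RMSD: {rmsd(a, b):.2f} angstroms")  # prints 0.50
```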

Rosetta Biosoftware Inc. in Kirkland, Wash., produces bioinformatics packages for drug discovery research. The company’s Resolver system helps scientists see gene pathways, make predictions about how genes interact and identify patterns. Doug Bassett, Rosetta’s general manager, said the new 4.0 version of Resolver has a feature that can filter out the most biologically significant data in a specific experiment to speed and focus results.
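
Rosetta has not disclosed how Resolver's filter works, but the general idea of pulling the most biologically significant genes out of an expression experiment can be illustrated with a common fold-change plus p-value cutoff. The data, thresholds and replicate counts below are invented for illustration.

```python
# Illustrative significance filter for gene expression data (not Resolver's
# actual method). Keeps genes whose expression changed at least two-fold
# between conditions and whose change is statistically significant.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(100, 10, size=(1000, 5))   # 1,000 genes x 5 replicates
# Simulate treatment: ~5 percent of genes respond with a three-fold increase.
treated = control * rng.choice([1.0, 3.0], size=(1000, 1), p=[0.95, 0.05])
treated += rng.normal(0, 10, size=treated.shape)

fold_change = treated.mean(axis=1) / control.mean(axis=1)
_, p_values = stats.ttest_ind(treated, control, axis=1)

significant = (np.abs(np.log2(fold_change)) >= 1.0) & (p_values < 0.01)
print(f"{significant.sum()} of 1000 genes pass the filter")
```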

Accelrys Inc., also in San Diego, offers programs for modeling, simulation and analysis of biomolecules as well as a range of bioinformatics tools. And GeneSpring 5.1 from Silicon Genetics of Redwood City, Calif., is one of many software applications for gene expression analysis. LION Bioscience AG in Heidelberg, Germany, markets its DiscoveryCenter as a total platform integrating drug discovery data, applications and documents into a single desktop interface.

Not surprisingly, all this intense computing requires significant hardware horsepower. IBM launched its $100 million Blue Gene supercomputing project in 1999 to help unravel how protein molecules are constructed and to advance the field of biomolecular simulation.

Diseases such as Alzheimer’s and Parkinson’s are thought to arise when the body builds proteins with small errors that cause them to fold incorrectly. But simulating even the smallest proteins folding themselves into their intricate final shapes could take decades for a current scientific workstation to process.
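
A back-of-envelope calculation suggests why. Molecular dynamics advances in femtosecond steps, so even a fast-folding protein demands billions of timesteps, each requiring millions of arithmetic operations. Every figure in the sketch below is an order-of-magnitude assumption, not a measured value.

```python
# Order-of-magnitude estimate of why folding simulation overwhelms a single
# workstation. Every number here is an illustrative assumption.
atoms = 30_000                     # small protein plus surrounding water
flops_per_step = 1_000 * atoms     # force evaluation with distance cutoffs
timestep_s = 1e-15                 # 1 femtosecond integration step
folding_time_s = 1e-5              # a fast, 10-microsecond folding event
workstation_flops = 1e9            # ~1 gigaflops sustained, circa 2003

steps = folding_time_s / timestep_s              # 10 billion timesteps
seconds = steps * flops_per_step / workstation_flops
print(f"~{seconds / (3600 * 24 * 365):.0f} years for one folding event")
```

Under these assumptions a single fast folding event already costs roughly a decade of workstation time, and slower-folding proteins stretch the estimate into centuries. That is the gap Blue Gene is meant to close.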

IBM’s first machine, the Linux-powered Blue Gene/L, is slated to debut in 2005. With 65,000 parallel processors and 16 trillion bytes of memory, the bioinformatics supercomputer is expected to process 200 trillion calculations per second. And through IBM’s e-business-on-demand initiative, the company may handle intense computational projects such as bioinformatics as a service or utility, paid for like electricity or water.
