Key Moments
Purdue's Cyber Center researches cyber infrastructure to enable scientific discovery, but new complex data types and query needs challenge traditional database systems.
Key Insights
The Purdue Cyber Center acts as an umbrella organization for IT research, partnering with nine other Discovery Park centers to foster projects across various application areas.
Communication bandwidth is doubling faster than data storage and retrieval speeds, creating a need for novel systems to manage massive data generation.
Scientific experiments can generate data equivalent to 100 million concurrent phone calls, highlighting the sheer volume and variety of data scientists produce.
The Cyber Center emphasizes four major areas: high-performance computing, networking, grid and middleware, and data analytics/visualization, with security and virtualization across all.
NanoHUB.org serves as a model 'cyber community,' providing resources for microelectronics researchers, with usage growing from 5,000 to 12,000 users.
Annotation and provenance tracking are critical innovations for biological databases, allowing users to annotate data and trace its origins, a feature not previously available.
Bridging research and deployment
The Purdue Cyber Center operates as a broad initiative focused on IT-related research, distinctly separate from ITAP, which handles deployment at Purdue University. The Cyber Center functions as an umbrella organization coordinating research projects that can range in size from 10 to 50 people. Its mission is twofold: to conduct basic cyber infrastructure research and develop new tools and techniques, and to deploy these advancements to real-world communities. This deployment aspect is managed in collaboration with ITAP, ensuring that research outcomes translate into practical applications. The center views itself as a partner to all nine other centers within Purdue's Discovery Park, which cover diverse fields such as manufacturing, life sciences, and energy, aiming to provide the foundational cyber infrastructure necessary to accelerate discovery across these disciplines.
The data deluge and the need for new systems
A primary motivation for the Cyber Center's work stems from a 'tipping point' in data generation. Advances in communication bandwidth are outpacing improvements in data storage and retrieval. This imbalance means that massive amounts of data are being generated at the periphery of the internet, necessitating novel systems and architectures to manage it. The sheer volume can be staggering; one physics experiment, for instance, can produce data equivalent to 100 million simultaneous phone calls. Beyond volume, the types of data are also becoming increasingly complex and varied, exceeding what traditional database systems can handle. Scientists are posing high-level, domain-specific queries like 'how does a protein fold?' or 'what happens when two black holes collide?', rather than standard SQL queries. This demands systems capable of handling fuzzy, inaccurate, and incomplete data, as well as dealing with petascale systems, provenance, reliability, and security.
Key research thrusts and enabling technologies
The Cyber Center concentrates its efforts on four major areas: high-performance computing, networking, grid and middleware, and data analytics and visualization. These areas are supported by overarching efforts in security, virtualization, and the provisioning and management of cyber infrastructure. Complementing these 'thrust areas' are various 'application areas' represented by the nine Discovery Park centers. The center fosters projects that combine expertise from these thrust and application areas. For example, its own database research group, comprising about 45 people, focuses on developing data infrastructure for various applications, including healthcare surveillance and biological database management. Other projects involve novel parallel architectures, scientific computing, and middleware for grid computing, leveraging strengths from institutions like CERIAS for security and privacy.
NanoHUB: A model for cyber communities
NanoHUB.org is presented as a successful example of a 'cyber community' developed by the Cyber Center. This platform provides a wealth of resources for researchers in microelectronics, including collaboration tools, learning modules, seminars, courses, tutorials, and crucially, simulations. Usage has grown significantly, from 5,000 to 12,000 users, who engage with the site for learning techniques, running simulations, and accessing educational content. Users can interact with simulations online, such as visualizing quantum dots or studying carbon nanotubes, modifying parameters and viewing results in real-time. This capability separates researchers from the underlying infrastructure, allowing them to focus on science while the system handles the computational demands. The majority of NanoHUB users are learners, utilizing web-based modules, tests, and exercises.
Addressing domain-specific data challenges: E. coli and healthcare
The Cyber Center applies its community-building approach to diverse scientific domains. The E. coli hub illustrates the challenge of integrating data from a global community of researchers, each contributing to specialized databases. Projects like this require addressing complex issues such as schema matching and mapping to create canonical interfaces, so that approximately 50 different databases can be shared and used collaboratively. Another significant project is the Purdue Regional Center for Visualization and Analytics (PERVaC), funded by the Department of Homeland Security. This initiative involves collecting and analyzing medical and veterinary data, including hospital admissions, emergency room visits, and drug sales, to detect patterns indicative of disease outbreaks. A key challenge here is reducing false positives and false negatives in alerts, so that the system can distinguish genuine threats from benign events such as school absenteeism due to a 'senior day out'.
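The schema mapping step mentioned for the E. coli hub can be pictured with a minimal sketch. It assumes each contributing database exposes records as simple key/value rows and that someone has already decided which source column corresponds to which canonical field. The source names (`lab_a`, `lab_b`), column names, and canonical fields are hypothetical and are not taken from the talk. Automatic schema matching, which discovers these correspondences, is the harder problem and is not shown here.

```python
# Hypothetical canonical interface shared by all contributing databases.
CANONICAL_FIELDS = ("gene_name", "organism", "sequence")

# Hypothetical per-source mappings: source column -> canonical field.
MAPPINGS = {
    "lab_a": {"GeneSymbol": "gene_name", "Species": "organism", "Seq": "sequence"},
    "lab_b": {"name": "gene_name", "org": "organism", "dna": "sequence"},
}


def to_canonical(source, record):
    """Translate one source record into the canonical form.

    Columns without a mapping are dropped; canonical fields the source
    does not provide are left as None so the gap stays visible.
    """
    mapping = MAPPINGS[source]
    out = {field: None for field in CANONICAL_FIELDS}
    for src_col, value in record.items():
        target = mapping.get(src_col)
        if target is not None:
            out[target] = value
    out["source"] = source  # remember which database the row came from
    return out


if __name__ == "__main__":
    print(to_canonical("lab_a", {"GeneSymbol": "geneA", "Species": "E. coli", "Seq": "ATGC"}))
    print(to_canonical("lab_b", {"name": "geneB", "dna": "GGCA"}))
```

Keeping the source name on every translated row is a deliberate choice in this sketch: once records from many databases are merged behind one interface, a reader can still tell where each row originated.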
Building pipelines for biological and scientific discovery
The concept of 'pipelines' is central to projects within systems biology. Researchers work across different levels, from genomics and proteomics to metabolomics and ionomics, generating vast amounts of data. The Cyber Center aims to build a common infrastructure that supports these disparate groups. For example, a project for protein analysis involves processing raw data from mass spectrometers, storing large datasets, and allowing scientists to visualize peaks and analyze specific experimental runs. Critically, for medicinal discovery, capturing all raw data is often a federal requirement for future testing. The same pipeline infrastructure is adapted for other areas, such as the ionomics project, which studies how plants become more efficient in storing minerals like iron and zinc. The innovation lies not in building specific systems but in creating generic, reusable infrastructure that allows scientists to jointly study cells and organisms without worrying about the underlying data management and search complexities.
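As a rough illustration of the pipeline idea, the sketch below archives a raw instrument run unchanged and then derives peaks from it as a separate stage. Everything here is an assumption made for illustration: the function names, the in-memory dictionary standing in for storage, and the simple local-maximum rule, which is far cruder than real mass spectrometry peak picking.

```python
def find_peaks(intensities, threshold):
    """Return indices of local maxima that rise above the threshold."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        value = intensities[i]
        # Strictly higher than the left neighbour, at least as high as the
        # right one, so a flat-topped peak is reported once (at its left edge).
        if value > threshold and value > intensities[i - 1] and value >= intensities[i + 1]:
            peaks.append(i)
    return peaks


def run_pipeline(run_id, raw_intensities, threshold, store):
    """Two-stage pipeline: keep the raw data, then add derived peaks."""
    # Stage 1: archive a copy of the raw data exactly as it was received.
    store[run_id] = {"raw": list(raw_intensities)}
    # Stage 2: compute derived results next to the raw data, never in place of it.
    store[run_id]["peaks"] = find_peaks(raw_intensities, threshold)
    return store[run_id]


if __name__ == "__main__":
    store = {}
    result = run_pipeline("run-001", [0, 1, 5, 2, 0, 7, 7, 1], threshold=3, store=store)
    print(result["peaks"])
```

Separating the archive stage from the analysis stage reflects the point made above that raw data must be retained: derived results can be recomputed later with a different method as long as the original run is still available.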
Innovations in database annotation and provenance
A significant research area within the Cyber Center's database group focuses on enhancing biological databases with advanced annotation and provenance capabilities. The objective is to treat annotations as first-class citizens within the database, enabling SQL queries directly on annotation data, not just treating them as simple string fields. This allows for more sophisticated manipulation, querying, and updating of annotations. Provenance, or tracking the origin of data, is another critical, often unavailable, feature in existing biological databases. The system tracks where data came from, how aggregate information was formed, and maintains historical data. This ensures that researchers can verify the origins of information, which is paramount in scientific discovery. This approach moves beyond traditional file-based annotations or manual updates to separate tables, integrating them directly into the database schema for robust querying and management.
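To make the idea of queryable annotations and provenance more concrete, here is a small sketch using Python's built-in `sqlite3` module. It is an emulation with ordinary tables, not the engine-level integration described above, and the table names, column names, and sample rows are all hypothetical. It only shows the kind of question that becomes answerable once annotations are structured data rather than free-text strings.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gene (
            gene_id  INTEGER PRIMARY KEY,
            name     TEXT NOT NULL,
            sequence TEXT NOT NULL
        );
        -- Each annotation points at one column of one row.
        CREATE TABLE annotation (
            ann_id  INTEGER PRIMARY KEY,
            gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
            col     TEXT NOT NULL,
            author  TEXT NOT NULL,
            note    TEXT NOT NULL
        );
        -- Provenance records where each stored value came from.
        CREATE TABLE provenance (
            gene_id   INTEGER NOT NULL REFERENCES gene(gene_id),
            col       TEXT NOT NULL,
            source    TEXT NOT NULL,
            loaded_on TEXT NOT NULL
        );
    """)
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?)",
                     [(1, "geneA", "ATGC"), (2, "geneB", "GGCA")])
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?, ?)", [
        (1, 1, "sequence", "alice", "verified by resequencing"),
        (2, 2, "sequence", "bob", "possible read error"),
        (3, 1, "name", "alice", "renamed from old_name"),
    ])
    conn.executemany("INSERT INTO provenance VALUES (?, ?, ?, ?)", [
        (1, "sequence", "public_db_import", "2007-01-15"),
        (2, "sequence", "lab_upload", "2007-02-01"),
    ])
    return conn


def annotations_with_provenance(conn, col):
    """List annotations on one column together with the value's origin."""
    return conn.execute("""
        SELECT g.name, a.author, a.note, p.source
        FROM gene g
        JOIN annotation a ON a.gene_id = g.gene_id AND a.col = ?
        LEFT JOIN provenance p ON p.gene_id = g.gene_id AND p.col = a.col
        ORDER BY g.name, a.ann_id
    """, (col,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in annotations_with_provenance(conn, "sequence"):
        print(row)
```

The `LEFT JOIN` keeps annotations whose value has no recorded origin, so missing provenance shows up as an empty field instead of silently hiding the annotation.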
Querying the physical world through smart tags
The newest frontier explored by the Cyber Center is 'spacey,' a project aiming to query and access the physical world directly, moving beyond searching stored data or metadata. The vision is to use technologies like RFID tags to query physical objects in real-time. For instance, instead of looking up information about cameras and chairs in a room from a database, the system would query the RFID tags attached to these objects directly. This involves building heterogeneous indexes on the physical world, enabling database discovery, querying, and monitoring of physical assets. This represents a paradigm shift from information retrieval to direct interaction with tangible objects via connected sensors and tags.
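The camera-and-chair example can be sketched as a query that is answered by scanning tags at request time instead of reading a stored inventory. The classes below are hypothetical stand-ins: `SimulatedReader` imitates an RFID reader covering one room, and no real RFID library or hardware API is involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagReading:
    tag_id: str
    kind: str   # what the tagged object is, e.g. "camera" or "chair"
    room: str


class SimulatedReader:
    """Stand-in for an RFID reader that covers a single room."""

    def __init__(self, room, tags):
        self.room = room
        self._tags = list(tags)  # (tag_id, kind) pairs currently in range

    def scan(self):
        """Report every tag in range at the moment of the call."""
        return [TagReading(tag_id, kind, self.room) for tag_id, kind in self._tags]


def query_objects(readers, kind=None, room=None):
    """Answer a query by scanning the readers, not a stored table."""
    results = []
    for reader in readers:
        if room is not None and reader.room != room:
            continue  # skip readers outside the requested room
        for reading in reader.scan():
            if kind is None or reading.kind == kind:
                results.append(reading)
    return results


if __name__ == "__main__":
    readers = [
        SimulatedReader("R101", [("t1", "camera"), ("t2", "chair")]),
        SimulatedReader("R102", [("t3", "chair")]),
    ]
    print(query_objects(readers, kind="chair"))
```

Filtering by room before scanning plays the role of a very simple index over the physical world: it decides which readers are worth asking at all, which matters when each scan has a real cost.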
Common Questions
The Cyber Center is an umbrella organization at Purdue University dedicated to IT-related research. It focuses on developing new cyber infrastructure tools and techniques and deploying them to real communities, often in collaboration with ITAP (Information Technology at Purdue).
Mentioned in this video
A Department of Homeland Security-funded project at Purdue focused on looking at disease outbreaks by collecting and analyzing medical, healthcare, and veterinary data.
A research and development hub at Purdue University comprising multiple centers, including the Cyber Center, fostering collaboration across various scientific disciplines.
A department that collaborates with PERVaC, experiencing issues with false positives and negatives in disease outbreak detection.
A cyber community initiative focused on studying E. coli by enabling collaboration and sharing of biological databases worldwide.
A collaborating agency with PERVaC for disease outbreak analysis.
Funder of several projects discussed, including data analytics and visualization centers and the PERVaC project.
The institution where the speaker and Professor McGarmett are based, and where the Cyber Center is located.
The deployment arm of Purdue's IT organization, responsible for implementing results from the Cyber Center.
An umbrella organization at Purdue University for all IT-related research, focusing on building cyber communities.
Funder of various projects mentioned, including the NanoHUB and E. coli Hub initiatives.
Pacific Northwest National Laboratory, involved in the funding of the PERVaC project through the Department of Homeland Security.
The client for a completed project focused on knowledge management for maintenance data of a radar avoidance system on an aircraft carrier.
A database system mentioned in a discussion about how annotations are currently handled.
A new project focused on querying and monitoring the physical world, akin to searching physical objects via RFID tags.
A database system mentioned as running in conjunction with WebSphere for the ionomics project.
A database system mentioned in the context of the ionomics project, running within WebSphere.
A distributed computing system used to route simulations when they cannot be run on the NMI cluster.
A cyber community and portal for nanotechnology research, providing collaboration tools, learning modules, seminars, courses, tutorials, and simulations.
A grid computing resource utilized by the Cyber Center, part of the National Science Foundation's infrastructure.
A middleware technology used for routing simulations from NanoHUB to other computing clusters.
A software platform mentioned as running a DB2 database for the ionomics project.
A computing cluster at Purdue used for running virtual machines and simulations within the NanoHUB infrastructure.
A database system mentioned in a discussion about how annotations are currently handled.
The website portal for the NanoHUB cyber community, offering resources for microelectronics research.
A query language discussed in the context of handling annotations, with a proposal for 'annotation SQL'.
An industrial company that Professor McGarmett consulted for.
An organization or program whose strengths are leveraged by the Cyber Center's security and privacy group.
An industrial company that Professor McGarmett consulted for.
A database system mentioned as handling schema matching and other front-end work for the E. coli Hub project.