Towards Telesophy: Federating All the World's Knowledge
Key Moments
The internet has connected us to vast knowledge, but current search engines only organize it; they do not understand or synthesize it. We need to move beyond 'tele' (far) to 'sophy' (wisdom) to truly leverage information for problem-solving.
Key Insights
The evolution of the net has progressed from data transmission to information retrieval, and is now moving towards knowledge navigation in the 'Interspace'.
Google excels at organizing and accessing knowledge, but does little in the subsequent stages of analysis and synthesis for problem-solving.
Semantics federation involves understanding the meaning of phrases within documents and matching similar concepts across different sources.
Scalable semantics aims to achieve deep meaning across a broad range of topics, moving from 'meaning type 869' to recognizing broader entities like people, places, and things.
The BSpace system demonstrates 'space algebra' for manipulating and merging collections of knowledge, enabling on-the-fly analysis and summarization.
Future systems like 'hive mind' aim for true knowledge synthesis and problem-solving, building on current federated knowledge approaches.
The evolution from 'cyberspace' to 'interspace' knowledge
Bruce Schatz's 2007 talk at Google outlines the progression of the internet's capabilities, from early data transmission (the Internet) to information retrieval (the Web), and now towards deeper knowledge navigation within an 'Interspace'. He notes that while systems like Google have largely achieved the goal of federating all the world's knowledge for access and organization, they fall short in the crucial next steps of analysis, synthesis, and problem-solving. This gap is the remaining half of the journey towards a 'hive mind' or collective wisdom: we can access information from afar ('tele'), but we have yet to derive true wisdom ('sophy') from it. Schatz argues that Google, despite its success, operates at roughly the level research had reached about 10 years earlier, and that to remain competitive it must engage with the more advanced stages of knowledge processing.
The linguistic hierarchy: syntax, structure, and semantics
Schatz introduces a linguistic framework for understanding knowledge federation. Syntax refers to the raw data, such as bits in a file or words in a document. Structure identifies the parts of a document, such as author names or the introduction and methods sections, enabling more targeted searches. Semantics concerns the meaning of phrases rather than merely their surrounding context. Meaning is often treated as static (e.g., a gene's function), but its interpretation can depend on context and task, which is the level of pragmatics. Schatz points out that current systems often substitute context for true meaning, an approach that has proven effective in systems like Google, which exploit the context provided by web links. Pragmatics, the ultimate goal, involves using knowledge for task-dependent applications; it is complex but crucial for practical problem-solving, for example in healthcare.
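To make the syntax/structure distinction concrete, here is a minimal Python sketch. The `Paper` record, its `parts` field names, and the sample papers are hypothetical stand-ins, not any real schema. A syntax-level search treats each paper as a flat run of words, while a structure-level search looks only inside one named part: here a figure caption, echoing the 'nanostructures' example, restricted to papers from 1997 on (assumed to mean the 10 years before the 2007 talk).

```python
import re
from dataclasses import dataclass, field


def tokens(text: str) -> list[str]:
    """Syntax level: lowercase word tokens, with no notion of document parts."""
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass
class Paper:
    # Hypothetical record type; field names are illustrative only.
    title: str
    year: int
    parts: dict[str, str] = field(default_factory=dict)  # e.g. "abstract", "caption"

    def full_text(self) -> str:
        return " ".join([self.title, *self.parts.values()])


def syntax_search(papers, term):
    """Match the term anywhere in a paper's raw text."""
    t = term.lower()
    return [p.title for p in papers if t in tokens(p.full_text())]


def structure_search(papers, term, part, since_year):
    """Match the term only inside one named part, restricted by publication year."""
    t = term.lower()
    return [
        p.title
        for p in papers
        if p.year >= since_year and t in tokens(p.parts.get(part, ""))
    ]


papers = [
    Paper("Survey of thin films", 2001,
          {"abstract": "We review nanostructures and films.", "caption": "Film thickness."}),
    Paper("Imaging quantum dots", 2005,
          {"abstract": "Dots imaged by AFM.", "caption": "Figure 2: nanostructures on silicon."}),
    Paper("Early lithography", 1990,
          {"abstract": "Patterning methods.", "caption": "Nanostructures etched in 1989."}),
]

print(syntax_search(papers, "nanostructures"))  # all three titles
print(structure_search(papers, "nanostructures", "caption", since_year=1997))  # ['Imaging quantum dots']
```

The structured query depends entirely on every source marking up its captions and dates the same way, which is why uniform markup is the hard part in practice.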
Federation across knowledge levels: from syntax to semantics
Schatz details different types of federation. Syntax federation, pioneered by systems like Telesphere and largely employed by Google, sends the same query to multiple sources; this requires managing network access, query syntax, and result merging, especially to eliminate duplicates. Structure federation, demonstrated by the DELIVER project, enables structured queries based on document parts (e.g., finding papers from the last 10 years with 'nanostructures' in a figure caption). This requires uniform markup, which is challenging because different media define authors or creators differently. Semantics federation, the focus of his talk, aims to extract and match meaning from phrases across distributed data, a far more complex task. So far, structure federation has not significantly penetrated mass-market systems, and relatively little correctly structured text is available online.
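Syntax federation can be sketched in a few lines of Python. This assumes each source is a plain function from a query string to a list of `(title, source)` hits; `make_source`, the sample catalogs, round-robin merging, and title-based duplicate detection are all illustrative assumptions. A real federated system would also handle network access and translate the query into each source's own syntax, which this in-memory sketch skips.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def make_source(name, catalog):
    """Hypothetical in-memory source: (title, source name) for titles containing the query."""
    def search(query):
        q = query.lower()
        return [(title, name) for title in catalog if q in title.lower()]
    return search


def normalize(title):
    """Duplicate-detection key: ignore case, punctuation, and extra whitespace."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def federated_search(query, sources):
    # 1. Send the same query to every source concurrently (map keeps source order).
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        result_lists = list(pool.map(lambda source: source(query), sources))
    # 2. Merge round-robin so no single source monopolizes the top of the list.
    merged, seen = [], set()
    longest = max((len(r) for r in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                title, origin = results[rank]
                key = normalize(title)
                # 3. Drop hits that another source already returned.
                if key not in seen:
                    seen.add(key)
                    merged.append((title, origin))
    return merged


library_a = make_source("A", ["Gene regulation in bees", "Honey bee behavior"])
library_b = make_source("B", ["Honey Bee Behavior.", "Bee foraging genes"])
print(federated_search("bee", [library_a, library_b]))
```

Round-robin is only one merging policy; a system with comparable relevance scores across sources might rank the merged list by score instead.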
Scalable semantics: bridging breadth and depth
Schatz calls 'scalable semantics' an oxymoron: semantics implies deep meaning, while scalability requires broad coverage. Historically, semantics focused on deep parsing within specific topics. However, research, particularly in DARPA programs, found that broad approaches, such as identifying entities (people, places, things) and noun phrases, scaled better and became more practical as machines grew faster. Semantics has thus shifted from understanding a phrase's exact meaning to recognizing its type and its co-occurrence with other entities. This shift turns semantics into an engineering problem: work globally, by covering all possible knowledge, but act locally, by analyzing narrow collections precisely. That in turn requires moving away from centralized, monolithic index systems towards distributed approaches that can handle the complexity and variety of information.
Entities: identifying and tagging information units
Identifying entities is crucial for scalable semantics. This can be done through hand-tagged markup (like XML for the Semantic Web) or more practically through automatic machine tagging using training sets. The process involves extracting phrases, recognizing parts of speech, and then identifying entities like people, places, or specific domain terms (e.g., genes in biology). While manually tagged data is ideal, the informal nature of much online content necessitates automatic methods. Biology and medicine provide good examples, with entities like genes or protein kinases being frequently mentioned. However, entities vary in their ease of tagging, with organism names being straightforward while behaviors or functions are more challenging, requiring larger training sets. A significant challenge is the domain-specificity of entities, requiring separate efforts for biology, medicine, physics, and everyday subjects.
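The automatic-tagging step can be illustrated with the simplest kind of trained tagger: a lexicon memorized from hand-tagged examples and applied by longest match. This is a swapped-in technique chosen for brevity, not the method used in Schatz's systems (the summary does not say which taggers they used), and statistical taggers generalize far better. The training phrases, entity types (`GENE`, `ORGANISM`, `BEHAVIOR`, `PROTEIN`), and function names are hypothetical.

```python
import re

# Hypothetical hand-tagged training examples: (phrase, entity type).
TRAINING = [
    ("honey bee", "ORGANISM"),
    ("Apis mellifera", "ORGANISM"),
    ("Drosophila", "ORGANISM"),
    ("foraging", "BEHAVIOR"),
    ("nursing", "BEHAVIOR"),
    ("Amfor", "GENE"),
    ("protein kinase", "PROTEIN"),
]


def tokenize(text):
    """Split text into alphanumeric word tokens, keeping the original case."""
    return re.findall(r"[A-Za-z0-9]+", text)


def train(examples):
    """'Training' here just memorizes each tagged phrase as a lowercase token tuple."""
    return {tuple(t.lower() for t in tokenize(phrase)): etype for phrase, etype in examples}


def tag(text, lexicon):
    """Tag entities by scanning left to right, preferring the longest known phrase."""
    toks = tokenize(text)
    low = [t.lower() for t in toks]
    max_len = max((len(k) for k in lexicon), default=0)
    entities, i = [], 0
    while i < len(toks):
        for n in range(min(max_len, len(toks) - i), 0, -1):
            etype = lexicon.get(tuple(low[i:i + n]))
            if etype:
                entities.append((" ".join(toks[i:i + n]), etype))
                i += n
                break
        else:  # no known phrase starts here; move on one token
            i += 1
    return entities


lexicon = train(TRAINING)
print(tag("Amfor expression in honey bee foragers rises with foraging; "
          "protein kinase activity differs in Drosophila.", lexicon))
```

Here "foragers" goes untagged because only "foraging" appears in the training set, a small instance of why behaviors need larger training sets than organism names.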
Context graphs and concept navigation for enhanced retrieval
Extracted entities can be used to build 'context graphs' that map the co-occurrence of terms within a collection. Such a graph can enhance search by suggesting related terms when a direct search fails, acting as a sophisticated suggestion facility. Schatz illustrates how advances in computing power, particularly the rise of clusters of workstations and then supercomputers, enabled the processing of larger collections and the in-memory computation of these vast relationship graphs. This allows 'on-the-fly' analysis, such as clustering data or extracting graphs of interrelated terms, without the extensive pre-computation that traditional centralized systems demanded. This shift from pre-computation to dynamic, real-time analysis on powerful, distributed hardware is key to handling the scale of global knowledge.
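A context graph of this kind can be approximated as a weighted co-occurrence graph held entirely in memory. In this sketch each document is assumed to be already reduced to a set of extracted terms; an edge's weight counts the documents in which both terms appear, and `suggest` returns a term's strongest neighbors. The function names and sample terms are hypothetical, and breaking ties alphabetically is an arbitrary choice made to keep the output deterministic.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_context_graph(docs):
    """docs: iterable of term sets. Edge weight = number of docs containing both terms."""
    graph = defaultdict(Counter)
    for terms in docs:
        for a, b in combinations(sorted(terms), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def suggest(graph, term, k=3):
    """Return up to k neighbors of term, strongest co-occurrence first (ties by name)."""
    neighbors = graph.get(term, Counter())  # .get avoids inserting unseen terms
    ranked = sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


docs = [
    {"foraging", "Amfor", "honey bee"},
    {"foraging", "Amfor", "juvenile hormone"},
    {"foraging", "juvenile hormone", "nursing"},
    {"nursing", "brood"},
]
graph = build_context_graph(docs)
print(suggest(graph, "foraging"))     # ['Amfor', 'juvenile hormone', 'honey bee']
print(suggest(graph, "unseen term"))  # []
```

Because the whole graph lives in memory, rebuilding it for a freshly selected sub-collection is cheap, which is the kind of on-the-fly computation the paragraph describes.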
BSpace: a system for dynamic knowledge manipulation
The BSpace system, developed by Schatz's team, exemplifies a new paradigm for knowledge interaction, moving beyond traditional search. It focuses on creating and manipulating 'spaces' – dynamic collections of knowledge. Key operations include 'extracting' distinguishing terms from a space, 'mapping' to break down and cluster documents within it, and performing 'space algebra' such as intersection and merging. This allows users to navigate and refine knowledge iteratively. For instance, a search for 'behavioral maturation' in insects can be refined by automatically identifying key terms, clustering results, and then intersecting those clusters with other relevant spaces. The system can also dynamically summarize entities within a given space on the fly, providing a deeper understanding than a simple list of search results. This approach emphasizes interactive exploration and manipulation of knowledge rather than passive retrieval.
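The space operations can be sketched by treating a space as a set of document ids over a shared corpus of extracted terms. Intersection and merging then become set operations, 'extracting' ranks terms by how much more often they appear inside the space than in the whole corpus, and summarizing counts entities within the space. Everything here (the `CORPUS`, the scoring rule, the function names) is a hypothetical illustration of the operations described, not BSpace's actual implementation, and the 'mapping' (clustering) step is omitted.

```python
from collections import Counter

# Hypothetical corpus: document id -> terms extracted from that document.
CORPUS = {
    1: {"foraging", "Amfor", "honey bee"},
    2: {"foraging", "juvenile hormone", "honey bee"},
    3: {"nursing", "brood", "honey bee"},
    4: {"foraging", "Drosophila"},
    5: {"courtship", "Drosophila"},
}


def space_where(term, corpus=CORPUS):
    """Create a space: every document mentioning the term."""
    return {doc for doc, terms in corpus.items() if term in terms}


def intersect(a, b):
    return a & b


def merge(a, b):
    return a | b


def doc_freq(space, corpus=CORPUS):
    """Number of documents in the space that contain each term."""
    return Counter(t for doc in space for t in corpus[doc])


def extract(space, k=3, corpus=CORPUS):
    """Distinguishing terms: relative frequency inside the space minus in the corpus."""
    inside = doc_freq(space, corpus)
    overall = doc_freq(corpus.keys(), corpus)
    score = {t: inside[t] / len(space) - overall[t] / len(corpus) for t in inside}
    return sorted(score, key=lambda t: (-score[t], t))[:k]


def summarize(space, corpus=CORPUS):
    """Entity counts within the space, most frequent first (ties by name)."""
    return sorted(doc_freq(space, corpus).items(), key=lambda kv: (-kv[1], kv[0]))


behavior = space_where("foraging")   # {1, 2, 4}
bees = space_where("honey bee")      # {1, 2, 3}
both = intersect(behavior, bees)     # {1, 2}
print(sorted(merge(behavior, bees)))  # [1, 2, 3, 4]
print(extract(behavior, k=1))         # ['foraging']
print(summarize(both))
```

Since every operation returns another space, results can be chained: the intersection above could itself be extracted from, summarized, or merged with a further space.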
The future: hive minds and semantically-based social networks
Schatz envisions future systems evolving towards 'hive minds,' capable of true knowledge synthesis and problem-solving. This involves moving away from centralizing all knowledge towards a distributed network of 'spaces' that can be dynamically manipulated. He proposes grand projects, such as capturing all the knowledge generated within a university (emails, documents, communications) to build semantically-based social networks that foster deep understanding and collaboration rather than mere content sharing. Such a system could have profound implications for education, research, and even social interaction, moving beyond current paradigms to a more integrated and intelligent use of collective knowledge. He closes by noting that the scientific community still lacks a definitive explanation for the disappearance of bees, an example of the broader challenge of deep knowledge synthesis.
Common Questions
Telesophy is a concept introduced by Bruce Schatz that aims to federate all the world's knowledge. It is described as having two parts: 'tele' (broadcasting/access) and 'sophy' (wisdom/analysis), with current systems excelling at the former but still needing to advance in the latter.
Mentioned in this video
A term introduced by Bruce Schatz in the 1980s for a project aimed at federating all the world's knowledge, in which the 'tele' part (broadcasting) is more advanced than the 'sophy' part (wisdom/analysis).
A web of data that can be more easily processed by machines, discussed as an attempt to create languages for structure and semantics, though not yet widely adopted.
Mentioned as having a strategy of classifying all web knowledge, similar to classification efforts that could be applied to entities across different subject areas.
The company hosting the talk, serving as a prime example of a large-scale knowledge organization system that the speaker critiques for its focus on access and organization over analysis and synthesis.
Where Bruce Schatz introduced the concept of Telesophy in the 1980s.
Mentioned in the context of supercomputer limitations in the past, contrasting with the rise of PCs and distributed computing.
Where the DARPA-funded concept space project members went after the project ended; suggested that their work might appear in future Windows versions.
A publisher whose digital library project failed, contrasted with the success of the University of Illinois's project due to better data cleaning and tagging.
A medical literature database used as an example to illustrate synonym recognition automation and the role of human curation in data quality.
Community Architectures for Network Information Systems, a project directed by Bruce Schatz at the University of Illinois.
A database of biomedical literature, mentioned in comparison to the size of the web in 1998 and used as an example for the BSpace system's data source.
National Center for Supercomputing Applications, where a large-scale computation for information retrieval and entity relation discovery was performed.
Located in the same area as the University of Illinois, mentioned in relation to Bruce Schatz's early work and computing advancements.
A renowned research lab where Bruce Schatz first presented his ideas on federating knowledge over 20 years prior to the talk.
Funded a project that developed concept space systems, but later pulled the plug in 2000, leading the project members to Microsoft.
One of the first web browsers, derived from Tim Berners-Lee's work, which emerged about 10 years after early federated search concepts were explored.
A tiny worm, mentioned as the first multicellular organism whose genome was completely sequenced. Bruce Schatz introduced the idea of creating a database for research on this organism.
A project Bruce Schatz worked on at UIUC, focused on digital libraries.
A system developed by Bruce Schatz that organizes concept spaces to assist in deeper understanding of knowledge, particularly in biology and medicine.
The fruit fly, mentioned as an organism studied within the BSpace system, specifically in relation to behavioral maturation and gene summaries.
A program the University of Illinois library has joined, providing context for the grand project idea of capturing and relating university knowledge.
An ambitious, largely failed attempt to encode common sense knowledge for automatic reasoning, discussed in comparison to automated template derivation methods.
A free email service offered as part of the hypothetical grand project to capture and utilize university knowledge.
The inventor of the World Wide Web, whose work inspired the development of browsers like Mosaic.
Mentioned by the host in jest for wearing a coat.
The speaker, a director at the University of Illinois at Urbana-Champaign, presenting on telesophy, scalable semantics, and concept navigation.
Mentioned as being involved with the first browsers (Mosaic) during Bruce Schatz's time.
Mentioned by Bruce Schatz in the context of discussing the semantic web, indicating prior conversations or familiarity.
A virtual world game mentioned as an example of online environments children engage with, contrasted with the more advanced concept spaces discussed.
A virtual world platform mentioned as an example of online environments; the speaker suggested that older audiences might disengage from it, in contrast to newer, more complex systems.