Towards Telesophy: Federating All the World's Knowledge

Google Talks | Education | 6 min read | 66 min video | Aug 22, 2012

TL;DR

The internet has connected us to vast knowledge, but current search engines only organize it; they do not truly understand or synthesize it. To leverage information for problem-solving, we need to move beyond 'tele' (far) to 'sophy' (wisdom).

Key Insights

1. The net has evolved from data transmission (the Internet) to information retrieval (the Web), and is now moving towards knowledge navigation in the 'Interspace'.

2. Google excels at organizing and accessing knowledge but does little in the later stages of analysis and synthesis that problem-solving requires.

3. Semantics federation involves understanding the meaning of phrases within documents and matching similar concepts across different sources.

4. Scalable semantics aims to achieve deep meaning across a broad range of topics, moving from 'meaning type 869' to recognizing broader entities such as people, places, and things.

5. The BeeSpace system demonstrates 'space algebra' for manipulating and merging collections of knowledge, enabling on-the-fly analysis and summarization.

6. Future systems such as a 'hive mind' aim for true knowledge synthesis and problem-solving, building on current approaches to federated knowledge.

The evolution from 'cyberspace' to 'interspace' knowledge

Bruce Schatz's 2007 talk at Google traces the progression of the internet's capabilities from early data transmission (the Internet) to information retrieval (the Web), and now towards deeper knowledge navigation within an 'Interspace'. He notes that while systems like Google have largely achieved the goal of federating all the world's knowledge for access and organization, they fall short in the crucial next steps of analysis, synthesis, and problem-solving. This gap is the remaining half of the journey towards a 'hive mind' or collective wisdom: we can access information from afar ('tele'), but we have yet to draw true wisdom ('sophy') from it. Schatz argues that Google, despite its success, operates at the level of research from roughly 10 years earlier, and that to remain competitive it must engage with the more advanced stages of knowledge processing.

The linguistic hierarchy: syntax, structure, and semantics

Schatz introduces a linguistic framework for understanding knowledge federation. Syntax refers to the raw data, such as bits in a file or words in a document. Structure identifies the parts of a document, such as author names or the introduction and methods sections, enabling more targeted searches. Semantics concerns the meaning of phrases, beyond their mere context. Although meaning is often treated as static (e.g., a gene's function), its interpretation can depend on context, which is the domain of pragmatics. Schatz points out that current systems often substitute context for true meaning, a pragmatic shortcut that has proven effective in systems like Google, which exploit the context provided by web links. The final level, pragmatics, involves using knowledge for task-dependent applications; it is complex but crucial for practical problem-solving, for example in healthcare.

Federation across knowledge levels: from syntax to semantics

Schatz details different types of federation. Syntax federation, pioneered by systems like Schatz's earlier Telesophy prototype and largely what Google does today, sends the same query to multiple sources. This requires managing network access and each source's query syntax, and merging the results, especially to eliminate duplicates. Structure federation, demonstrated by the DELIVER project, enables queries on document parts (e.g., finding papers from the last 10 years with 'nanostructures' in a figure caption). This requires uniform markup, which is hard to achieve because different media define authors or creators differently. Semantics federation, the focus of his talk, aims to extract and match meaning from phrases across distributed data, a far more complex task. Structure federation has not yet significantly penetrated mass-market systems, and only a limited amount of correctly structured text is available online.
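To make the syntax-federation steps concrete (same query to every source, per-source query translation, merging with duplicate removal), here is a minimal Python sketch under simplifying assumptions: the two sources are hypothetical in-memory lists rather than remote services, `translate_query` is a placeholder for per-source query-syntax translation, and duplicates are detected by a shared document identifier. None of these names come from the systems discussed in the talk.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "sources"; a real federated system would call remote
# search services, each with its own network protocol and query syntax.
SOURCES = {
    "library_a": [
        {"id": "doi:10.1/x", "title": "Honey bee foraging behavior"},
        {"id": "doi:10.1/y", "title": "Gene expression in the bee brain"},
    ],
    "library_b": [
        {"id": "doi:10.1/x", "title": "Honey Bee Foraging Behavior"},  # same paper as in library_a
        {"id": "doi:10.1/z", "title": "Bee colony decline"},
    ],
}


def translate_query(source: str, terms: list[str]) -> list[str]:
    # Placeholder for per-source syntax translation; here every source
    # simply takes lower-cased keywords.
    return [t.lower() for t in terms]


def search_source(source: str, terms: list[str]) -> list[dict]:
    # A document matches if every query keyword appears in its title.
    query = translate_query(source, terms)
    return [doc for doc in SOURCES[source]
            if all(t in doc["title"].lower() for t in query)]


def federated_search(terms: list[str]) -> list[dict]:
    # Send the same query to every source in parallel (map keeps source order).
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda s: search_source(s, terms), SOURCES))
    # Merge the result lists, keeping the first record seen for each identifier.
    merged, seen = [], set()
    for results in result_lists:
        for doc in results:
            if doc["id"] not in seen:
                seen.add(doc["id"])
                merged.append(doc)
    return merged
```

Deduplicating on a shared identifier is the easy case; independent sources rarely share IDs, so real merging usually needs fuzzier matching on titles, authors, and dates.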

Scalable semantics: bridging breadth and depth

Schatz acknowledges that 'scalable semantics' sounds like an oxymoron, since semantics implies deep meaning while scalability requires broad coverage. Historically, semantics focused on deep parsing of specific topics. However, research, particularly from DARPA programs, found that broad approaches, such as identifying entities (people, places, things) and noun phrases, scaled better and became more practical as machines grew faster. Semantics has thus shifted from understanding a phrase's exact meaning to recognizing its type and its co-occurrence with other entities. This shift turns semantics into an engineering problem: think globally, by covering all possible knowledge, but act locally, by analyzing narrow collections precisely. That in turn requires moving away from centralized, monolithic index systems towards distributed approaches that can handle the complexity and variety of information.

Entities: identifying and tagging information units

Identifying entities is crucial for scalable semantics. Entities can be marked up by hand (as with XML in the Semantic Web) or, more practically, tagged automatically by machines trained on example sets. The process involves extracting phrases, recognizing parts of speech, and then identifying entities such as people, places, or domain-specific terms (e.g., genes in biology). Manually tagged data would be ideal, but the informal nature of much online content makes automatic methods necessary. Biology and medicine provide good examples, since entities such as genes or protein kinases are mentioned frequently. Entities vary in how easily they can be tagged: organism names are straightforward, while behaviors or functions are harder and require larger training sets. A significant challenge is that entities are domain-specific, so biology, medicine, physics, and everyday subjects each require separate efforts.
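The tagging step can be illustrated with a deliberately simple stand-in. The talk describes taggers trained on example sets; the sketch below instead uses a hypothetical hand-built lexicon (`LEXICON`) with longest-match lookup, a different and much cruder technique, chosen only to show what tagged output looks like. The entity type labels are also illustrative assumptions.

```python
import re

# Hypothetical lexicon mapping token sequences to entity types. This is NOT a
# trained tagger; it is a lookup table standing in for one.
LEXICON = {
    ("apis", "mellifera"): "ORGANISM",
    ("honey", "bee"): "ORGANISM",
    ("protein", "kinase"): "GENE_CLASS",
    ("foraging",): "BEHAVIOR",
}
MAX_LEN = max(len(key) for key in LEXICON)


def tokenize(text: str) -> list[str]:
    # Lower-case the text and split it into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def tag_entities(text: str) -> list[tuple[str, str]]:
    """Return (phrase, entity_type) pairs, preferring the longest match at each position."""
    tokens = tokenize(text)
    entities, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in LEXICON:
                entities.append((" ".join(key), LEXICON[key]))
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return entities


tags = tag_entities("Protein kinase expression in the honey bee brain changes with foraging.")
```

A lexicon matches only the exact forms it lists, so "honey bees" or "foraged" would be missed; this is one reason behaviors and functions, which are phrased in many ways, need larger training sets than organism names.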

Context graphs and concept navigation for enhanced retrieval

Extracted entities can be used to build 'context graphs' that record which terms co-occur within a collection. Such a graph can enhance search by suggesting related terms when a direct search fails, acting as a sophisticated suggestion facility. Schatz illustrates how advances in computing power, particularly the rise of workstation clusters and then supercomputers, made it possible to process larger collections and to compute these vast relationship graphs in memory. This enables 'on-the-fly' analysis, such as clustering data or finding interrelated graphs, without the extensive pre-computation that traditional centralized systems demanded. This shift from pre-computation to dynamic, real-time analysis on powerful, distributed hardware is key to handling the scale of global knowledge.
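A context graph can be sketched as a co-occurrence table. This minimal version assumes each document has already been reduced to a set of extracted entities, counts co-occurrence at the document level, and ranks suggestions by raw counts; the function names and sample data are hypothetical, and real systems typically normalize such counts rather than using them directly.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_context_graph(docs: list[set[str]]) -> dict[str, Counter]:
    """Map each term to a Counter of the terms it shares a document with."""
    graph: dict[str, Counter] = defaultdict(Counter)
    for terms in docs:
        # Sorting makes the pair order deterministic regardless of set order.
        for a, b in combinations(sorted(terms), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def suggest(graph: dict[str, Counter], term: str, k: int = 3) -> list[str]:
    """Return up to k terms that most often co-occur with `term`."""
    return [t for t, _ in graph.get(term, Counter()).most_common(k)]


# Hypothetical sample: each document reduced to its extracted entities.
docs = [
    {"foraging", "honey bee", "juvenile hormone"},
    {"foraging", "honey bee", "protein kinase"},
    {"foraging", "juvenile hormone"},
    {"nursing", "honey bee"},
]
graph = build_context_graph(docs)
```

If a query on a term returns nothing useful, `suggest` offers the term's most frequent neighbors as alternative queries; an unknown term simply yields an empty list.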

BSpace: a system for dynamic knowledge manipulation

The BeeSpace system, developed by Schatz's team, exemplifies a new paradigm for knowledge interaction that goes beyond traditional search. It centers on creating and manipulating 'spaces', which are dynamic collections of knowledge. Key operations include 'extracting' the terms that distinguish a space, 'mapping' a space by breaking down and clustering its documents, and 'space algebra' such as intersection and merging. Together these let users navigate and refine knowledge iteratively. For instance, a search for 'behavioral maturation' in insects can be refined by automatically identifying key terms, clustering the results, and then intersecting those clusters with other relevant spaces. The system can also summarize the entities within a given space on the fly, providing deeper understanding than a simple list of search results. This approach emphasizes interactive exploration and manipulation of knowledge rather than passive retrieval.
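The space operations can be sketched with ordinary set algebra. In this minimal version, a hypothetical `Space` is just a named set of document ids; `intersect` and `merge` map to set intersection and union, and `extract_terms` ranks terms by how over-represented they are in the space relative to the whole collection. That ratio score and the sample data are assumptions chosen for illustration, not the scoring used by the system described in the talk.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Space:
    """Hypothetical 'space': a named, immutable set of document ids."""
    name: str
    docs: frozenset[str]

    def intersect(self, other: "Space") -> "Space":
        # Documents that belong to both spaces.
        return Space(f"({self.name} & {other.name})", self.docs & other.docs)

    def merge(self, other: "Space") -> "Space":
        # Documents that belong to either space.
        return Space(f"({self.name} | {other.name})", self.docs | other.docs)


def extract_terms(space: Space, corpus: dict[str, list[str]], k: int = 3) -> list[str]:
    """Rank terms by document frequency inside the space relative to the whole corpus."""
    inside = Counter(t for d in space.docs for t in set(corpus[d]))
    overall = Counter(t for terms in corpus.values() for t in set(terms))
    n_in, n_all = len(space.docs), len(corpus)
    score = {t: (inside[t] / n_in) / (overall[t] / n_all) for t in inside}
    # Highest ratio first; ties broken alphabetically for a stable order.
    return sorted(score, key=lambda t: (-score[t], t))[:k]


# Hypothetical sample collection: document id -> extracted entity terms.
corpus = {
    "d1": ["foraging", "juvenile hormone", "honey bee"],
    "d2": ["foraging", "protein kinase", "honey bee"],
    "d3": ["nursing", "honey bee"],
    "d4": ["foraging", "fruit fly"],
}
behavior = Space("foraging", frozenset({"d1", "d2", "d4"}))
bees = Space("honey bee", frozenset({"d1", "d2", "d3"}))
both = behavior.intersect(bees)
top_terms = extract_terms(both, corpus)
```

Because every operation returns a new `Space`, results can be chained (intersect, then extract, then merge with another space), which mirrors the iterative refinement described above.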

The future: hive minds and semantically-based social networks

Schatz envisions future systems evolving towards 'hive minds' capable of true knowledge synthesis and problem-solving. Rather than centralizing all knowledge, such systems would rely on a distributed network of 'spaces' that can be manipulated dynamically. He proposes ambitious projects, such as capturing all the knowledge generated within a university (emails, documents, communications) to build semantically based social networks that support deep understanding and collaboration rather than mere content sharing. Such a system could have profound implications for education, research, and even social interaction, enabling a more integrated and intelligent use of collective knowledge. He closes with the disappearance of honeybees as an example: it is a complex problem for which the scientific community still lacks a definitive answer, illustrating the broader challenge of deep knowledge synthesis.

Common Questions

Telesophy is a concept introduced by Bruce Schatz for federating all the world's knowledge. It has two parts: 'tele' (broadcasting/access) and 'sophy' (wisdom/analysis); current systems excel at the former but still need to advance in the latter.
