Computer Timescales Mapped onto Human Timescales - Computerphile
Key Moments
Computers are astronomically faster than humans, but mapping their operations onto human timescales reveals that some of them are surprisingly slow.
Key Insights
Computers perform basic arithmetic operations like addition, subtraction, multiplication, and division at speeds incomprehensible to humans.
Floating-point operations, crucial for real-world applications and AI, are also highly optimized and nearly as fast as their integer counterparts.
Branch misprediction penalties, incurred when the CPU guesses the wrong execution path, are significant but analogous to a human being distracted and needing time to refocus.
Accessing memory (RAM) is a major bottleneck, taking significantly longer than on-chip operations, akin to a human taking a trip to a distant store.
Interacting with external storage (SSDs and HDDs) is orders of magnitude slower than RAM access, resembling a multi-day or multi-year task for humans.
Network latency, though it seems small to humans, translates to weeks or even years of equivalent human time per round trip.
Despite immense speed, the effective task time for complex operations like rendering a game frame is still substantial when mapped to human timescales.
THE MIND-BOGGLING SPEED OF BASIC ARITHMETIC
The video begins by establishing a baseline for human arithmetic speed: roughly five seconds for a person to add two four-digit numbers. A computer performs the equivalent operation (adding two 32- or 64-bit numbers) in a single clock cycle, which at 2 GHz is half a nanosecond. Multiplication is similarly cheap at around four cycles, while division is the most expensive of the basic operations, taking around 30 cycles for 32-bit integers and around 100 cycles for 64-bit ones.
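To make the cycle counts concrete, here is a minimal micro-benchmark sketch (not the code from the video) comparing integer add and divide latency. It assumes an x86 CPU with GCC or Clang for the `__rdtsc()` intrinsic, and the per-operation figures are only approximate because of loop overhead and out-of-order execution:

```c
/* Rough sketch: compare 32-bit add vs. divide latency with the x86
 * time-stamp counter. Numbers vary by CPU and are approximate. */
#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>          /* __rdtsc(), GCC/Clang on x86 */

#define N 100000000u            /* iterations per loop */

int main(void) {
    volatile uint32_t acc = 1;  /* volatile: keep the ops from being optimized out */
    volatile uint32_t d = 7;    /* runtime divisor: prevents strength reduction */

    uint64_t t0 = __rdtsc();
    for (uint32_t i = 0; i < N; i++)
        acc += i;               /* one 32-bit integer add per iteration */
    uint64_t t1 = __rdtsc();
    printf("add: ~%.1f cycles/op\n", (double)(t1 - t0) / N);

    t0 = __rdtsc();
    for (uint32_t i = 0; i < N; i++)
        acc /= d;               /* one 32-bit integer divide per iteration */
    t1 = __rdtsc();
    printf("div: ~%.1f cycles/op\n", (double)(t1 - t0) / N);
    return 0;
}
```

On a machine like the video's 2 GHz laptop, the divide loop should report something in the neighborhood of the 30 cycles quoted above, against just a few cycles for the add.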
FLOATING-POINT MATH AND HIDDEN EXPENSES
When dealing with real-world quantities that can be fractional, computers use floating-point arithmetic. Although it involves approximations, modern hardware is heavily optimized for it, especially in GPUs for graphics and AI. Adding two floating-point numbers takes about four clock cycles, comparable to integer multiplication, and floating-point division is, surprisingly, faster than integer division on the machine measured in the video, at about 15 cycles. Division in general remains costly, and it often hides inside operations like modulo, which can hurt performance in data structures such as hash maps.
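The hash-map remark is easy to demonstrate. The sketch below (a generic illustration, not the video's example) shows the two common ways of mapping a hash to a bucket: a general modulo, which compiles to a hardware divide when the capacity is only known at run time, and the power-of-two masking trick many hash tables use to avoid that divide:

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* General modulo: compiles to a hardware divide (tens of cycles)
 * whenever `capacity` is a run-time value. */
static inline size_t slot_general(uint64_t hash, size_t capacity) {
    return hash % capacity;
}

/* If capacity is kept a power of two, the same mapping becomes a
 * single-cycle bit-mask. */
static inline size_t slot_pow2(uint64_t hash, size_t capacity) {
    return hash & (capacity - 1);    /* requires capacity == 2^k */
}

int main(void) {
    uint64_t h = 0x9e3779b97f4a7c15u;   /* an arbitrary hash value */
    printf("%zu %zu\n", slot_general(h, 1000), slot_pow2(h, 1024));
    return 0;
}
```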
THE COST OF MISPREDICTION AND MEMORY ACCESS
Branch prediction keeps the CPU's pipeline full by guessing which way a branch will go. A misprediction, where the CPU starts down the wrong path and has to discard that work, costs roughly 10 to 40 cycles; a mid-range penalty of about 20 cycles maps to 1 minute 40 seconds of human time, like a person being interrupted and needing a moment to regain focus. Memory access is a bigger slowdown: data in the L1 cache takes 4 cycles (20 human seconds), the L2 cache 12 cycles (1 minute), and the L3 cache 30-50 cycles (3-4 minutes), illustrating how the cost escalates as data sits further from the CPU.
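A standard way to see the misprediction penalty for yourself (a classic demonstration, not taken from the video) is to sum only the large elements of an array before and after sorting it: sorted data makes the branch predictable and the loop markedly faster. Compile at -O2; at higher optimization levels the compiler may vectorize or if-convert the loop and remove the branch entirely.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)              /* 16M elements, ~64 MB */

static volatile long sink;       /* keeps the sum from being optimized away */

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

static double time_sum(const int *v) {
    clock_t t0 = clock();
    long sum = 0;
    for (long i = 0; i < N; i++)
        if (v[i] >= 128)         /* this branch is the one being (mis)predicted */
            sum += v[i];
    sink = sum;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void) {
    int *v = malloc((size_t)N * sizeof *v);
    if (!v) return 1;
    for (long i = 0; i < N; i++)
        v[i] = rand() % 256;     /* values 0..255; threshold 128 is ~50/50 */

    printf("unsorted: %.3f s\n", time_sum(v));   /* branch ~50%% mispredicted */
    qsort(v, N, sizeof *v, cmp_int);
    printf("sorted:   %.3f s\n", time_sum(v));   /* branch almost always predicted */
    free(v);
    return 0;
}
```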
THE VAST GULF OF RAM AND STORAGE ACCESS
When data isn't in any cache, the computer must go to RAM, which takes around 100 nanoseconds, about 15 minutes in human time; this is why caching is so crucial to perceived speed. Further down the hierarchy, reading from a SATA SSD takes approximately 20 microseconds, or 2.3 days of human-equivalent time, and a traditional spinning hard drive (HDD) takes milliseconds per access, about 3.2 years to read a single arbitrary sector.
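The cache-versus-RAM cliff can be observed with a pointer-chasing loop, a generic technique rather than the video's code: because every load depends on the previous one, the hardware cannot overlap them, and the time per access jumps as the working set outgrows L1, L2, and L3. A sketch:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static volatile size_t sink;    /* defeats dead-code elimination */

/* Time one load in a chain of dependent loads. The indices form a single
 * random cycle (Sattolo's algorithm), so the walk visits every element
 * and each hop is a serialized memory access. */
static double ns_per_hop(size_t n) {
    size_t *next = malloc(n * sizeof *next);
    for (size_t i = 0; i < n; i++) next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {          /* Sattolo: one big cycle */
        size_t j = (size_t)rand() % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }
    const size_t HOPS = 10u * 1000 * 1000;
    size_t p = 0;
    clock_t t0 = clock();
    for (size_t h = 0; h < HOPS; h++)
        p = next[p];                              /* each load depends on the last */
    double sec = (double)(clock() - t0) / CLOCKS_PER_SEC;
    sink = p;
    free(next);
    return sec * 1e9 / HOPS;
}

int main(void) {
    /* 16 KiB fits in L1; 64 MiB spills past L3 into RAM on typical laptops */
    for (size_t kb = 16; kb <= 64 * 1024; kb *= 4)
        printf("%6zu KiB: %6.1f ns per access\n", kb, ns_per_hop(kb * 128));
    return 0;
}
```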
NETWORK LATENCY AND FRAME BUDGETS
Network interactions reveal equally dramatic differences. Pinging a router in the next room takes about 400 microseconds, or 6 weeks of human-equivalent time; pinging Google's servers, around a millisecond, maps to 16 weeks; and pinging a distant server, the University of Nottingham's in the video, comes out at around 31 years. Closer to home, a video game targeting 60 frames per second has roughly 16 milliseconds to prepare each frame, about 33 million clock cycles at 2 GHz, yet even that budget corresponds to five years of human computation time. This disparity underscores the engineering that makes modern computing feel instantaneous.
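For readers who want to reproduce a latency measurement, the sketch below times a TCP handshake instead of an ICMP ping (raw ICMP sockets usually require root); a `connect()` completes after roughly one round trip. It assumes a POSIX system, and the target host is just an example:

```c
#include <stdio.h>
#include <time.h>
#include <netdb.h>
#include <unistd.h>
#include <sys/socket.h>

/* Approximate round-trip time by timing a TCP handshake to port 80.
 * A portable stand-in for ping, not the video's method. */
int main(int argc, char **argv) {
    const char *host = argc > 1 ? argv[1] : "www.google.com"; /* example target */
    struct addrinfo hints = {0}, *res;
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, "80", &hints, &res) != 0) return 1;

    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0) return 1;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (connect(fd, res->ai_addr, res->ai_addrlen) != 0) return 1;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ms = (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
    /* scale by the video's factor of 10^10 to get human-equivalent time */
    printf("%s: ~%.2f ms handshake (~%.1f human-equivalent days)\n",
           host, ms, ms / 1e3 * 1e10 / 86400.0);
    close(fd);
    freeaddrinfo(res);
    return 0;
}
```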
THE ILLUSION OF MULTITASKING AND GRAND FINALE
Modern operating systems create the illusion of multitasking by rapidly switching between processes, typically giving each a fixed time slice (e.g., 16 milliseconds). The switching is far too fast for humans to perceive, yet each slice is an enormous amount of compute time. The video concludes that, mapped onto human timescales, even routine operations take astronomical amounts of equivalent time, and the technology we use daily is a testament to engineering that hides these vast temporal gaps.
Comparison of Computational Task Times: Human vs. Computer
Data extracted from this episode
| Task | Human Time (approx.) | Computer Cycles (approx.) | Computer Time (approx.) |
|---|---|---|---|
| Add two 4-digit numbers | 5 seconds | 1 cycle | 0.5 nanoseconds |
| Multiply two numbers | 20 seconds | 4 cycles | 2 nanoseconds |
| Divide two 32-bit integers | 2.5 minutes | 30 cycles | 15 nanoseconds |
| Divide two 64-bit integers | 8 minutes | 100 cycles | 50 nanoseconds |
| Add two floating-point numbers | 20 seconds | 4 cycles | 2 nanoseconds |
| Multiply two floating-point numbers | 20 seconds | 4 cycles | 2 nanoseconds |
| Divide two floating-point numbers | 1 minute | 15 cycles | 7.5 nanoseconds |
| Branch Misprediction Penalty | 50 seconds to 3 minutes 20 seconds | 10-40 cycles | 5-20 nanoseconds |
| Access L1 Cache | 20 seconds | 4 cycles | 2 nanoseconds |
| Access L2 Cache | 1 minute | 12 cycles | 6 nanoseconds |
| Access L3 Cache | 3-4 minutes | 30-50 cycles | 15-25 nanoseconds |
| Access RAM | 15 minutes | Not measured in cycles | 100 nanoseconds |
| Read from SATA SSD (512 bytes) | 2.3 days | Not measured in cycles | 20 microseconds |
| Read from Spinning Disk (512 bytes) | 3.2 years | Not measured in cycles | 8-10 milliseconds |
| Prepare one frame (60 FPS game) | 5 years | Not measured in cycles | 16 milliseconds |
| Ping local router | 6 weeks | Not measured in cycles | 400 microseconds |
| Ping Google | 16 weeks | Not measured in cycles | ~1 millisecond |
| Ping University of Nottingham | 31 years | Not measured in cycles | ~100 milliseconds |
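The human-time column follows from one linear mapping: one 0.5-nanosecond clock tick is scaled up to 5 seconds, a factor of 10^10. Below is a small sketch that replays the conversion for a few of the table's figures (the latencies are the summary's numbers, not fresh measurements):

```c
#include <stdio.h>

/* Replays the video's mapping: 1 computer second = 10^10 human seconds,
 * i.e., one 0.5 ns clock tick becomes 5 human seconds. */
int main(void) {
    const double SCALE = 1e10;      /* human seconds per computer second */
    struct { const char *task; double seconds; } ops[] = {
        { "integer add (1 cycle)", 0.5e-9 },
        { "RAM access",            100e-9 },
        { "SATA SSD read",         20e-6  },
        { "spinning-disk read",    10e-3  },
        { "one 60 FPS frame",      16e-3  },
    };
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++) {
        double human = ops[i].seconds * SCALE;
        printf("%-22s -> %12.0f human seconds (%.1f days)\n",
               ops[i].task, human, human / 86400.0);
    }
    return 0;
}
```

Running it reproduces the table: the SSD read comes out at 2.3 days, the disk read at about 3.2 years, and the single frame at roughly 5 years.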
Common Questions
How much faster is a computer than a human at basic arithmetic?
For a simple task like adding two four-digit numbers, a human might take around five seconds, while a modern computer can do it in a single clock cycle, roughly half a nanosecond. That is a factor of about ten billion.
Topics
Mentioned in this video
L2 Cache: The second layer of cache, larger and slightly slower than the L1 cache.
Hash Map: A data structure that uses division and remainders to fit values into a set of buckets, often hiding expensive operations.
Branch Prediction: A technique used in CPU pipelines to guess the direction of a branch to avoid performance penalties.
University of Nottingham: The university whose web server was pinged to measure long-distance network latency.
Floating-Point Numbers: A number representation used for fractional values, which modern hardware is heavily optimized for.
Modulo Operations: Operations that involve division and finding the remainder, which can be computationally expensive.
GPU: Graphics Processing Unit, specialized hardware optimized for parallel processing, crucial for video games.
L1 Cache: The fastest and smallest layer of cache, located right next to the CPU die, holding the most recently used data.
2 GHz: The clock speed of the laptop used for measurements, giving half a nanosecond per clock tick.
RAM: Random Access Memory, the main system memory, significantly slower to access than the caches.
SATA SSD: A type of solid-state drive, a modern, faster alternative to traditional spinning hard drives.
HDD: Traditional hard disk drives, which use spinning platters to store data and are much slower than SSDs.
Router: A networking device that forwards data packets between computer networks.
Adder: The circuit used to add two numbers in one clock cycle in the computer analogy.
L3 Cache: The third layer of cache, shared among multiple CPU cores, larger and slower than L2.