Programs use logical addresses to refer to operands. The processor automatically translates logical addresses into physical ones, which are then issued to the system bus.

The architecture of a computer distinguishes between a physical address space (PAS) and a logical address space (LAS). The physical address space is a simple one-dimensional array of bytes, accessed by the memory hardware at the address present on the address bus of the microprocessor system. The logical address space is organized by the programmer to suit the needs of a particular task. Translation of logical addresses into physical ones is carried out by the memory management unit (MMU).

In the architecture of modern microprocessors, the LAS is represented as a set of elementary structures: bytes, segments, and pages. Microprocessors use the following ways of organizing the logical address space:

  • flat (linear) LAS: an array of bytes with no defined structure; address translation is not required, since a logical address coincides with the physical one;
  • segmented LAS: consists of segments, contiguous areas of memory containing, in general, a variable number of bytes; the logical address has two parts, a segment identifier and an offset within the segment; address translation is carried out by the segmentation unit of the MMU;
  • paged LAS: consists of pages, contiguous areas of memory each containing a fixed number of bytes; the logical address consists of a page number (identifier) and an offset within the page; translation of the logical address into a physical one is carried out by the paging unit of the MMU;
  • segment-page LAS: comprises segments which, in turn, consist of pages; the logical address consists of a segment identifier and an offset within the segment. The segmentation unit of the MMU translates the logical address into a page number and a page offset, which are then translated into a physical address by the paging unit.

The microprocessor is capable of operating in two modes: real and protected.

When working in real mode, the processor's capabilities are limited: the addressable memory capacity is 1 MB, there is no paging of memory, and segments have a fixed length of 2^16 bytes (64 KB).

This mode is normally used only at the initial stage of booting the computer, before entering protected mode.

In real mode, the processor's segment registers contain the upper 16 bits of the 20-bit physical address of the start of a segment. Shifted 4 bits to the left, the selector gives the 20-bit base address of the segment. The physical address is obtained by adding to this base a 16-bit offset within the segment, formed according to the addressing mode specified for an operand or taken from the instruction pointer for an instruction fetch (Fig. 3.1). Information is then fetched from memory at the resulting address.
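
As a minimal illustration of this scheme (the segment and offset values in the comments are hypothetical), the real-mode calculation can be sketched in C as follows:

```c
#include <stdint.h>
#include <stdio.h>

/* Real-mode physical address: (segment << 4) + offset.
   The segment register supplies the upper 16 bits of a 20-bit base. */
static uint32_t real_mode_phys(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;   /* 20-bit result */
}

int main(void)
{
    /* Example values are hypothetical: segment = 0x1234, offset = 0x0010 */
    printf("%05lX\n", (unsigned long)real_mode_phys(0x1234, 0x0010)); /* 12350 */
    return 0;
}
```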



Fig. 3.1. Scheme for obtaining a physical address

The microprocessor's memory addressing capabilities are realized most fully in protected mode. The amount of addressable memory increases to 4 GB, and paging becomes available. Segments can have a variable length from 1 byte to 4 GB.

The general scheme by which a microprocessor operating in protected mode forms a physical address is shown in Fig. 3.2.

As already noted, the basis for forming a physical address is the logical address. It consists of two parts: a selector and an offset within the segment.

The selector is contained in a segment register of the microprocessor and is used to locate the segment's description (descriptor) in a special descriptor table. Segment descriptors are stored in special system objects: the global (GDT) and local (LDT) descriptor tables. The descriptor plays a very important role in the operation of the microprocessor, from the formation of the physical address under various organizations of the address space to the implementation of multiprogramming. Therefore, let us consider its structure in more detail.

Segments of a microprocessor running in protected mode are characterized by a large number of parameters. Therefore, in general-purpose 32-bit microprocessors, segment information is stored in a special 8-byte data structure called a descriptor, while the segment registers are given the main role of locating that descriptor.

Fig. 3.2. Formation of a physical address in the segment-page organization of memory

The structure of a segment descriptor is shown in Fig. 3.3.

Fig. 3.3. Segment descriptor structure

We will consider the structure rather than the exact format of the descriptor, because in the transition from the i286 to 32-bit microprocessors the layout of individual descriptor fields lost its regularity, and parts of it came to look like "patches" added merely to extend the width of those fields.

The 32-bit base address field allows the starting address of a segment to be placed anywhere in the 2^32-byte (4 GB) address space.

The limit field specifies the length of the segment (more precisely, the length of the segment minus 1: a value of 0 in this field means a segment of length 1) in addressable units, so the maximum segment size is 2^20 units.

The size of such a unit is determined by one of the descriptor attributes, the G (Granularity) bit. With G = 0 the unit is 1 byte, so a segment can be sized with 1-byte precision anywhere in the range from 1 byte to 1 MB; with G = 1 the unit is a page of 2^12 = 4 KB, which allows segment sizes up to 4 GB.
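
A small sketch in C of how the effective segment size follows from the 20-bit limit field and the G bit (the function name is ours):

```c
#include <stdint.h>

/* Effective segment size in bytes for a 20-bit limit field.
   G = 0: unit is 1 byte,    size = limit + 1            (1 byte .. 1 MB)
   G = 1: unit is a 4 KB page, size = (limit + 1) * 4096 (4 KB .. 4 GB) */
static uint64_t segment_size(uint32_t limit20, int g)
{
    uint64_t units = (uint64_t)(limit20 & 0xFFFFF) + 1;
    return g ? units << 12 : units;
}
/* segment_size(0x00000, 0) == 1 byte
   segment_size(0xFFFFF, 0) == 1 MB
   segment_size(0xFFFFF, 1) == 4 GB */
```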

Since, in the IA-32 architecture, a segment can start at an arbitrary point in the address space and have an arbitrary length, segments in memory may partially or completely overlap.

The D (Default size) bit determines the default length of addresses and operands used by instructions: with D = 0, 16-bit addresses and operands are assumed (as in the i286); with D = 1, 32-bit.

The AVL (Available) attribute bit may be used by system software at its own discretion. Of course, this bit is intended not for the ordinary user but for the system programmer, who uses it, for example, to mark segments for garbage collection or segments whose base addresses must not be modified. This bit is available only to programs running at the highest privilege level; the microprocessor itself neither changes nor uses it.

The access byte defines the basic rules for handling the segment.

The presence bit P (Present) indicates whether the segment can be accessed. The operating system (OS) marks a segment that has been moved from RAM to external storage as temporarily absent by setting P = 0 in its descriptor; when P = 1, the segment is in physical memory. When a descriptor with P = 0 is selected (the segment is not in RAM), the base address and limit fields are ignored. This is natural: how can one speak of the base address of a segment that is not in RAM? In this situation the processor rejects all further attempts to use the descriptor in instructions, and the address space defined by the descriptor effectively "disappears".

An access to such a segment raises a segment-not-present exception. The operating system then copies the requested segment from disk into memory (possibly evicting another segment), loads the segment's base address into the descriptor, sets P = 1, and restarts the instruction that accessed the segment that was not in RAM.

The two-bit DPL (Descriptor Privilege Level) field specifies one of four possible privilege levels (0 to 3) for the descriptor, which determines which programs may access the segment (level 0 is the most privileged).

The accessed bit A (Accessed) is set to 1 on any access to the segment. It is used by the operating system to keep track of segments that have not been accessed for the longest time.

Suppose, for example, that once a second the operating system clears bit A in the descriptors of all segments. If, some time later, a new segment must be loaded into RAM and there is not enough free space, the operating system chooses "candidates" for eviction among those segments whose descriptors still have A = 0, that is, the segments that have not been accessed recently.

The type field in the access byte defines the purpose and usage of the segment. If bit S (System, bit 4 of the access byte) is 1, the descriptor describes an actual memory segment. If S = 0, the descriptor describes a special system object, which need not be a memory segment, such as a call gate used in task switching or a descriptor of a local descriptor table (LDT). The meaning of bits 3...0 of the access byte depends on the segment type (Fig. 3.4).

Fig. 3.4. Access byte type field format

In a code segment: the conforming bit C (Conforming) defines additional handling rules that protect program segments. With C = 1 the segment is a conforming code segment; it is deliberately deprived of privilege-level protection. Such a facility is convenient, for example, for subroutines that must be available to all tasks running in the system. With C = 0 it is an ordinary code segment. The read bit R (Readable) specifies whether the segment may be accessed only for execution, or for execution and reading, for example to read constants as data using a segment-override prefix. With R = 0, only instruction fetches for execution are allowed; with R = 1, reading data from the segment is also allowed.

Writing to the code segment is prohibited. Any attempt to write causes a software interrupt.

In the data segment:

  • ED (Expand Down), the expansion-direction bit: with ED = 1 the segment is a stack segment, and valid offsets within it must be greater than the segment limit; with ED = 0 it is an ordinary data segment (offsets must be less than or equal to the limit);
  • W (Writable), the write-enable bit: with W = 1 the segment may be modified; with W = 0 writing to the segment is prohibited, and any attempt to write causes an interrupt. (A decoding sketch for the whole access byte is given after this list.)
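
To summarize how the access byte is laid out, here is a decoding sketch in C; the bit positions follow the description above, and the structure and field names are our own illustration:

```c
#include <stdint.h>

struct access_rights {
    int present;    /* P:   1 = segment is in physical memory        */
    int dpl;        /* DPL: descriptor privilege level, 0..3         */
    int is_memory;  /* S:   1 = code/data segment, 0 = system object */
    int executable; /* bit 3 of the type field: 1 = code segment     */
    int ed_or_c;    /* data: ED (expand-down); code: C (conforming)  */
    int w_or_r;     /* data: W (writable);     code: R (readable)    */
    int accessed;   /* A:   set by the CPU on any access             */
};

static struct access_rights decode_access(uint8_t b)
{
    struct access_rights a;
    a.present    = (b >> 7) & 1;
    a.dpl        = (b >> 5) & 3;
    a.is_memory  = (b >> 4) & 1;
    a.executable = (b >> 3) & 1;
    a.ed_or_c    = (b >> 2) & 1;
    a.w_or_r     = (b >> 1) & 1;
    a.accessed   =  b       & 1;
    return a;
}
```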

When an operand is accessed, the offset within the segment is formed by the microprocessor according to the addressing mode specified in the instruction. The offset within the code segment is taken from the instruction pointer register EIP.

The sum of the segment base address retrieved from the descriptor and the generated offset within the segment gives the linear address (LA).

If only the segment representation of the address space is used in the microprocessor, then the resulting linear address is also a physical one.

If, in addition to segmentation, a paging mechanism is used, the linear address is treated as two fields: the high-order bits contain the virtual page number and the low-order bits the offset within the page. Translation of a virtual page number into a physical page number is carried out using special system tables: the page table directory (page directory) and the page tables. The location of the page directory in memory is determined by the system register CR3. The physical address is computed as the sum of the physical page address obtained from the page table and the page offset taken from the linear address.
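
A simplified sketch of this two-level translation for a 32-bit linear address (the classic 10/10/12-bit split; the physical-memory read helper is hypothetical, and present bits and access checks are ignored):

```c
#include <stdint.h>

/* Hypothetical helper: reads a 32-bit word from physical memory. */
extern uint32_t phys_read32(uint32_t phys_addr);

/* Translate a 32-bit linear address using the page directory whose
   physical base address is held in CR3 (low 12 bits assumed zero). */
static uint32_t translate(uint32_t cr3, uint32_t linear)
{
    uint32_t dir    = (linear >> 22) & 0x3FF;  /* page-directory index */
    uint32_t table  = (linear >> 12) & 0x3FF;  /* page-table index     */
    uint32_t offset =  linear        & 0xFFF;  /* offset within page   */

    uint32_t pde = phys_read32((cr3 & 0xFFFFF000) + dir * 4);
    uint32_t pte = phys_read32((pde & 0xFFFFF000) + table * 4);

    return (pte & 0xFFFFF000) + offset;        /* physical address     */
}
```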

Let us now consider all the stages of converting a logical address into a physical one in more detail.

The 80386 microprocessor is a full 32-bit successor to the earlier 16-bit 8086/80286 microprocessors and represents a major advance from a 16-bit to a 32-bit architecture. Along with the increased word size came many improvements and additional features. The 80386 provides task switching, improved memory management, virtual memory management with or without paging, software protection, and a large memory system. All software written for the earlier 8086/8088 and 80286 microprocessors is upward compatible with the 80386. The amount of memory addressable by the processor increased from 1 MB (8086/8088) or 16 MB (80286) to the 4 GB available to the 80386 in protected mode. The 80386 can switch from protected mode back to real mode without a hardware reset of the microprocessor, a time-consuming operation that was required with the 80286.

The 80486 microprocessor is an improved version of the 80386 and executes many of its instructions in a single clock cycle. The 80486 also has an internal 8 KB L1 cache and an integrated high-performance math coprocessor that is software compatible with the 80387 coprocessor. Note that the 80486DX4 processor has a 16 KB cache. Running at the same clock speed as the 80386, the 80486 provides roughly 50% higher performance.

Microprocessor 80386

Two versions of the 80386 microprocessor were most common: the 80386DX, and the simplified 80386SX with an external 16-bit data bus (internally it still used 32-bit buses).

The 80386DX processor was placed in a 132-pin PGA (pin grid array) package. A later version of the 80386, the 80386EX, contains an AT bus control unit, a dynamic RAM refresh controller, a programmable chip-select unit, an interrupt controller, a DMA controller, a timer unit, a serial communication unit, 26 address bus pins, and 16 data bus pins.

The 80386DX microprocessor, having a 32-bit data bus and a 32-bit address bus, is capable of addressing 4 GB of physical memory. The 80386SX, like the 80286, addresses 16 MB of memory and has only a 24-bit address bus and a 16-bit data bus. It was developed after the 80386DX for applications that do not require the full 32-bit bus, and it was used in many personal computers that used the same system boards as the 80286. At a time when most applications, including Windows, required less than 16 MB of memory, the 80386SX was a popular and less expensive version of the 80386. Later, even though the 80486 became progressively cheaper for building new computing systems, the 80386 long remained in demand for many applications. For example, the 80386EX, although never used in personal computers, was very popular in embedded systems.

The 80386 microprocessor, like previous Intel microprocessors of the family, requires only a single +5.0 V power supply. Versions were produced with clock frequencies starting at 16 MHz; the 33 MHz version had a current consumption of 600 mA. The current consumption of the 80386EX microprocessor is 320 mA when operating at 33 MHz.

The processor can draw up to 1.0 A in certain modes of operation, which means that the power supply and power wiring must be able to handle such current surges. The processor has numerous VCC and VSS pins, all of which must be connected, for correct operation, to the +5.0 V DC supply (VCC) and to ground (VSS). Some processor pins are marked N/C (no connection) and must not be connected to anything. In addition to those mentioned, there were several other versions of the 80386SX and 80386EX microprocessors with a reduced supply voltage of +3.3 V; these processors were used in portable notebook and laptop computers.

Memory system

The physical memory system that the 80386DX microprocessor can access has a capacity of 4 GB. In addition, the processor supports virtual memory of up to 64 TB, mapped onto physical memory using the MMU and descriptors. Virtual addressing allows a program to use more than 4 GB, provided that swapping is available and the hard disk is large enough. Figure 6.4 shows the organization of the physical memory system of the 80386DX microprocessor.

The memory is divided into four 8-bit banks of up to 1 GB each. This 32-bit memory organization allows access to a byte, a word, or a double word. The 80386DX is able to transfer 32-bit data in a single memory cycle, whereas the 8088 required four cycles and the 80286 and 80386SX needed two. The large data width is very important, especially for single-precision floating-point numbers, which occupy 32 bits. Sufficiently advanced software uses floating-point numbers to store data, so 32-bit memory locations speed up program execution if the program is written with this memory organization in mind.

The address of each byte of memory is represented in hexadecimal notation, as in previous processors. The difference is that the 80386DX uses a 32-bit address bus, with memory addressable in the range 00000000H-FFFFFFFFH.

The two memory banks in a system built on the 8086, 80286, or 80386SX microprocessors are accessed via the BLE (A0 on the 8086 and 80286) and BHE signals. The 80386DX accesses its memory banks using the four bank-enable signals BE3-BE0. This memory organization allows a single byte to be accessed when the microprocessor activates one enable signal.

When two enable signals are activated, the processor addresses a word. In most cases a word is addressed in banks 0 and 1 or in banks 2 and 3. Memory location 00000000H is in bank 0, location 00000001H in bank 1, location 00000002H in bank 2, and location 00000003H in bank 3. The 80386DX has no address pins A0 and A1, because these are decoded internally into the byte-enable signals. Similarly, the 16-bit 80386SX has no address pin A0, since it is decoded into the BLE and BHE signals. The 80386EX addresses a data word located in two banks of a 16-bit memory system when the BS8 signal is inactive (logic high), or a byte in an 8-bit system when this signal is activated.
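
A sketch of how the four bank-enable signals could be derived from the two low address bits and the transfer size; this is our own illustration, not the processor's actual implementation:

```c
#include <stdint.h>

/* Returns a 4-bit mask: bit n set means bank-enable BEn is active.
   addr is the byte address, size is 1 (byte), 2 (word) or 4 (dword).
   Transfers that cross a 4-byte boundary would need a second bus
   cycle and are not handled in this sketch. */
static unsigned byte_enables(uint32_t addr, unsigned size)
{
    unsigned first = addr & 3;              /* bank of the first byte */
    unsigned mask  = (1u << size) - 1;      /* 1, 3 or 15             */
    return (mask << first) & 0xF;
}
/* byte_enables(0x00000001, 1) == 0x2  -> only BE1 (bank 1)
   byte_enables(0x00000002, 2) == 0xC  -> BE2 and BE3 (banks 2 and 3) */
```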

Control registers

The 80386 microprocessor has other control registers in addition to the EFLAGS flag register and the EIP instruction pointer. The control register CR0 is identical to the machine status word (MSW) of the 80286, except that it is a 32-bit rather than a 16-bit register. The additional control registers are CR1, CR2, and CR3.

Figure 6.5 shows the structure of the control registers of the 80386 microprocessor.

The CR1 control register is not used in the 80386 and is reserved for future products. Control register CR2 holds the linear address at which the last page fault occurred. Finally, control register CR3 holds the base address of the page table directory. The low 12 bits (0 through 11) of this 32-bit register contain zeros; combined with the remaining bits, they define the start of the page directory, which is aligned on a 4K boundary.

The CR0 register has a number of special control bits, which are defined in the 80386 microprocessor as follows:

The PG (paging enable) bit selects page-table translation of linear addresses into physical addresses when PG = 1. Paging allows any physical memory location to be assigned to any linear address.

ET

The ET (extension type) bit indicates which math coprocessor is supported: if ET = 0, the 80287 coprocessor is selected; if ET = 1, the 80387. This bit was added to reflect whether the system contains an 80387 coprocessor.

The TS (task switch) bit indicates that the microprocessor has performed a task switch (changing the contents of the task register TR in protected mode sets the TS bit). If the TS bit is set, a coprocessor instruction causes a type 7 interrupt (coprocessor not available).

EM

The EM (emulate coprocessor) bit is set to trigger a type 7 interrupt whenever an attempt is made to execute an ESC instruction, that is, an instruction related to the coprocessor. This is often used for software emulation of the coprocessor. Emulation reduces the cost of the system, but emulated coprocessor instructions execute much more slowly, perhaps by a factor of 100.

The MP (monitor coprocessor) bit is set to trigger a type 7 interrupt on every WAIT instruction while the TS bit is set.

PE

The PE (protection enable) bit is set to put the 80386 into protected mode. It can also be cleared, which starts a rather long sequence of instructions needed to switch back to real mode. In the 80286 this bit could only be set: the 80286 cannot return to real mode without a hardware reset, which precludes its use in most protected-mode systems.
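
The CR0 control bits described above can be collected into bit masks; the positions follow the standard CR0 layout, and the macro names are ours:

```c
/* CR0 control bits of the 80386 (bit positions per the IA-32 layout) */
#define CR0_PE (1u << 0)   /* Protection Enable: 1 = protected mode    */
#define CR0_MP (1u << 1)   /* Monitor coProcessor: WAIT faults if TS=1 */
#define CR0_EM (1u << 2)   /* EMulate coprocessor: ESC -> interrupt 7  */
#define CR0_TS (1u << 3)   /* Task Switched                            */
#define CR0_ET (1u << 4)   /* Extension Type: 1 = 80387, 0 = 80287     */
#define CR0_PG (1u << 31)  /* PaGing enable                            */

/* Example: protected mode with paging, 80387 present */
/* unsigned long cr0 = CR0_PE | CR0_ET | CR0_PG; */
```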

Descriptors and Selectors

Before discussing the paging unit, let us look at the descriptors and selectors of the 80386 microprocessor. The 80386 uses descriptors in much the same way as the 80286. In both microprocessors a descriptor is an eight-byte structure containing information about a memory segment and its location. A selector (the content of a segment register) is used to locate the descriptor in a descriptor table. The main differences are that the 80386 has two additional segment registers (FS and GS) and that the two most significant bytes of the descriptor, reserved in the 80286, are defined for the 80386. Another difference is that 80386 descriptors use a 32-bit segment base address and a 20-bit segment limit field instead of the 24-bit base address and 16-bit limit field of the 80286.

The 80286 microprocessor addresses a 16 MB memory area with its 24-bit base address and supports segments of up to 64 KB with its 16-bit limit field. The 80386 uses a 32-bit base address to address a 4 GB memory area, and the segment size is determined by a 20-bit limit field, which is interpreted in two different ways. The granularity bit G of the descriptor determines the unit in which the segment size is measured: if G = 0 the size is specified in bytes, and if G = 1 in 4K pages. Thus, with a 20-bit limit field the segment size can reach 1 MB or 4 GB, respectively.

The granularity bit G first appeared in the descriptor with the 80386 microprocessor. If G = 0, the value stored in the limit field is treated directly as the segment limit, allowing any limit in the range 00000H-FFFFFH, that is, a segment of up to 1 MB. If G = 1, the number stored in the limit field is interpreted as 00000XXXH-FFFFFXXXH, where XXX is any value between 000H and FFFH; this allows segment sizes from 4 KB to 4 GB in 4 KB increments. For example, consider a segment starting at physical address 10000000H with a limit value of 00001H. With G = 0, this segment starts at 10000000H and ends at 10000001H. With G = 1 and the same limit value (00001H), the segment starts at 10000000H and ends at 10001FFFH.

Figure 6.6 shows how the 80386 microprocessor addresses a memory segment in protected mode using a selector and a descriptor. The addressing scheme is identical to the way the 80286 addresses a segment; the difference is in the segment sizes available to the 80386. The upper 13 bits (bits 15-3) of the selector select a descriptor from the descriptor table. The table indicator bit TI (bit 2 of the selector) indicates the type of descriptor table: local if TI = 1, global if TI = 0. The lower two bits, RPL (requested privilege level, bits 1-0), specify the privilege level requested for access to the segment.
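
A selector can thus be decomposed as in the following sketch (structure and function names are ours):

```c
#include <stdint.h>

struct selector_fields {
    unsigned index; /* bits 15..3: descriptor number within the table */
    unsigned ti;    /* bit 2: 0 = GDT, 1 = LDT                        */
    unsigned rpl;   /* bits 1..0: requested privilege level           */
};

static struct selector_fields split_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 1;
    f.rpl   =  sel       & 3;
    return f;
}
/* Byte offset of the descriptor within its table: f.index * 8 */
```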

Since the selector uses a 13-bit code to access a descriptor, each table (local or global) contains at most 8192 descriptors. Since the possible segment size for the 80386 reaches 4 GB, 16,384 segments can be accessed at a time through the two descriptor tables, which allows the 80386 to support up to 64 TB of virtual memory (1 TB = 1024 GB). Of course, only a memory system of up to 4 GB can physically exist. If a program at some point requires more than 4 GB of memory, the required data can be swapped into the memory system from a disk drive or some other mass-storage device.
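
The 64 TB figure follows directly from these numbers; a quick check:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t descriptors   = 2u * 8192u;       /* GDT + LDT, 2^13 each */
    uint64_t max_segment   = 1ull << 32;       /* 4 GB per segment     */
    uint64_t virtual_space = descriptors * max_segment;

    /* 16384 * 4 GB = 2^46 bytes = 64 TB of virtual address space */
    printf("%llu TB\n", (unsigned long long)(virtual_space >> 40));
    return 0;
}
```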

The 80386 microprocessor uses descriptor tables for global (GDT) and local (LDT) descriptors. A third descriptor table is used for interrupt descriptors (IDT), or gates. The first six bytes of an 80386 descriptor are the same as in the 80286, which ensures upward software compatibility. The upper two bytes of the 80286 descriptor were reserved and contained the value 00H. The descriptors of the 80286 and 80386 microprocessors are shown in Fig. 6.7.

An 80386 descriptor contains a 32-bit base address, a 20-bit segment limit field, and a granularity bit G that determines the multiplier for the segment limit (1 or 4K), in other words, the units in which the limit is expressed: bytes (G = 0) or 4K pages (G = 1). The purpose of the 80386 descriptor fields is as follows:

Base (B31-B0)

The Base field specifies the base (starting) 32-bit address of the segment in the physical 4 GB address space of the 80386 microprocessor.

Limit (L19-L0)

The Limit field specifies the segment limit in bytes if the granularity bit G = 0, or in 4K pages if G = 1. This allows any segment size from 1 byte to 1 MB if G = 0, or from 4 KB to 4 GB if G = 1. Recall that the limit indicates the last byte in the segment.

Access rights

The Access rights field defines the privilege level and other information about the segment. This byte differs for different descriptor types and is specified separately for each of them.

The granularity bit G selects a multiplier of 1 or 4K for the segment limit field: if G = 0 the multiplier is 1, and if G = 1 it is 4K.

The D (default size) bit determines the default size of the operands and registers used: if D = 0, 16 bits, as in the 80286; if D = 1, 32 bits, as in the 80386. This bit determines whether prefixes are required for 32-bit data and index registers. If D = 0, a prefix is required to access 32-bit registers and to use 32-bit pointers; if D = 1, a prefix is required to access 16-bit registers and 16-bit pointers. The use16 and use32 attributes, used with the segment directive in assembly language, control the setting of the D bit. Real-mode operation always assumes 16-bit registers, so any instruction that uses a 32-bit register or pointer must be prefixed.
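
A sketch of how the scattered descriptor fields could be reassembled from the raw eight bytes; the layout follows the standard IA-32 descriptor format described above, and the helper names are ours:

```c
#include <stdint.h>

/* Raw 8-byte descriptor: low dword in d[0], high dword in d[1]. */
struct descriptor { uint32_t d[2]; };

/* Base address B31..B0, split across three fields of the descriptor. */
static uint32_t desc_base(const struct descriptor *s)
{
    return  (s->d[0] >> 16)                      /* B15..B0  */
          | ((s->d[1] & 0x000000FF) << 16)       /* B23..B16 */
          |  (s->d[1] & 0xFF000000);             /* B31..B24 */
}

/* 20-bit limit L19..L0, expanded to a byte limit according to G. */
static uint32_t desc_limit(const struct descriptor *s)
{
    uint32_t limit = (s->d[0] & 0xFFFF) | (s->d[1] & 0x000F0000);
    if (s->d[1] & (1u << 23))                    /* G bit set           */
        limit = (limit << 12) | 0xFFF;           /* units of 4 KB pages */
    return limit;                                /* last valid offset   */
}

static unsigned desc_access(const struct descriptor *s)
{
    return (s->d[1] >> 8) & 0xFF;                /* access-rights byte  */
}
```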

The AVL (available) bit is available to the operating system and can be used as needed. It is not used or interpreted by the microprocessor itself and is left for system or application software.

There are two kinds of descriptors: a code and data segment descriptor, and a system segment descriptor. The first descriptor defines the data, stack, and code segments. The system segment descriptor is intended to store information about system tables, tasks, and gates.

Supercomputers have always been considered a special class of computing technology. Since such machines are built to solve unusual problems, the budgets are unusual too, and this, in turn, created a feeling of limitless possibilities: it seemed that the problem was always only money, and that with another ten or twenty million, performance could be increased indefinitely. What has happened in recent months and years, as reflected in the fresh list of the 500 most powerful number crunchers on the planet (the TOP500.org list known to you), gives reason to assert, however, that the "infinity" is over. Supercomputers are the first of modern computer systems to run into the physical limits of semiconductor electronics, and it is for them, first of all, that a way out of the impasse must now be found: a new computing technology.

The formal pretext for such a far-reaching statement is a strange pattern noticed by the compilers of that list. The TOP500 is updated twice a year, and in the top positions of its latest edition, published last week, there were almost no changes (only one new member was added to the top ten, and the total performance of all five hundred machines grew only slightly, from 0.223 to 0.250 exaflops). But there has been a qualitative change overall: the "center of gravity" of the list has shifted toward its top, or, to put it more simply, the bulk of the computing power is now concentrated in a relatively small (historically record-small) number of the fastest machines. It looks like this: half of the cumulative power of the TOP500 is provided by just the first 17 computers on the list. This trend did not appear yesterday, but over the past six years it has become so pronounced that it is time to think about it.

There is no single definitive explanation. One of the most convincing is financial: in recent years supercomputers have become much more expensive (roughly four times more expensive than, say, the number crunchers of the mid-2000s), and they are therefore now affordable only to a relatively few government agencies and large companies. In addition, the designers and buyers of new, not particularly powerful machines do not seek to appear in the rating, so as not to spoil their image. And so it turns out that the further it goes, the more clearly the trend shows itself: the strong get stronger, and the weak fall behind nonlinearly fast.

An important conclusion: supercomputers have not ceased to be needed; they have only become less accessible. But what about Moore's undying law? Shouldn't it compensate for the price increase with denser packing and thus higher performance? This is where the main suspicion arises. It seems we have reached the point where Moore's law, although still working, is already too expensive for most players to exploit.

Researchers formulate the result as follows: in the absence of breakthrough technologies that would provide a previously unattainable computing speed in a single leap, the supercomputer industry is forced to follow an extensive path, simply increasing the number of processors in its machines. Worse still, since this approach cannot satisfy users' appetites (and a number cruncher is traditionally not only a data-processing tool but also a way of asserting corporate and national prestige), designers have turned to graphics accelerators, which, let us say, are not suitable for every kind of task. The number of supercomputers that actively use GPUs has grown by an order of magnitude over the past five years!

And here it is timely to recall the forthcoming replacement of the famous Linpack test, which from the very beginning of the TOP500's publication (twenty years ago) has been the main measure of supercomputer performance. It is proposed to replace it with the newly developed HPCG (High Performance Conjugate Gradient) benchmark. The reason: Linpack, written in Fortran as far back as 1979, reflects the true performance of the systems being measured unsatisfactorily, and the discrepancy keeps growing.

Admittedly, even Jack Dongarra, co-author of both benchmarks, cannot clearly explain the difference between Linpack and HPCG. But, simplifying greatly, the difference comes down to this: Linpack mainly evaluates a supercomputer's ability to perform pure computation (which GPU accelerators do well), while HPCG also takes into account the performance of internal communications, which matters in practical scientific and engineering problems (frequent irregular memory accesses, for example).

If HPCG does not replace Linpack outright, it will at least complement it after a few years of "running in" (for those interested, the source code is available under a BSD license from the Sandia Labs website). And this could lead to a significant reshuffling of the entire TOP500 list, the return of smaller participants, who will begin to receive higher, fairer ratings, and even adjustments to supercomputer architectures once they are no longer optimized for Linpack. Although, of course, one should not hope too much for the latter: after all, there is still no breakthrough computing technology!

And without breakthroughs, boredom reigns in the world of number crunchers. How do you build a more powerful machine? Add more processors, and therefore find more money. But the reality is that parallelizing practical tasks beyond a certain (already reached) level brings no gain in speed, and even the most powerful supercomputers are already so expensive that only a handful can afford to build and operate them, as discussed above. As a result, the stream of new supercomputers is drying up. This is the end of a technological era, the end of semiconductors as we have known them for the past fifty years. And until a technology appears that can take computer performance to a new level, we will stagnate, content with an annual increment of a few percent.

What could provide such a breakthrough? The Western press is staring at nanotubes, from which the people at Stanford have managed to build field-effect transistors (CNFETs), to learn how to make microcircuits with guaranteed functionality (the main problem: it is still difficult to avoid a large number of misplaced nanotubes), and even to build a MIPS-compatible computer, demonstrated just last week at the ACM/IEEE SC13 supercomputing conference (Computerra has written about this project before). In the future, this technology may offer a 13-fold advantage over semiconductor chips in performance per unit of energy consumed. I wonder whether anyone here works with nanotubes?

Many computer technology enthusiasts remember first-hand the times when processor frequencies were measured in megahertz and manufacturers (that is, Intel and AMD) tried to outdo each other on that figure. Then the power consumption and heat dissipation of processors grew so much that continuing this race became impossible. In recent years manufacturers have instead increased the number of processor cores, but eventually a limit was reached beyond which this growth became unprofitable. Now getting the most performance per watt has become the main factor.

All these changes did not happen because the developers ran into the physical limits of further development of existing processors. Rather, performance has been limited by the fact that progress in some areas, above all energy efficiency, has been slower than progress in others, such as expanding functionality and instruction sets. Still, could it be that the physical limits of processors and their computing power are now close? Igor Markov of the University of Michigan addressed this question in an article in the journal Nature.

Considering obstacles

Markov notes that, based on purely physical limitations, some scientists have calculated that Moore's Law will last for another hundred years. On the other hand, the International Technology Roadmap for Semiconductors (ITRS) group gives it a couple of decades of life. However, ITRS predictions can be called into question: this group used to predict processors with a frequency of 10 GHz in the days of Core2 chips. The reason for this discrepancy is that many severe physical limitations never came into play.

For example, the extreme limit on the size of a functional unit is one atom, which is the ultimate physical limit. But long before that limit can be reached, physics limits the ability to accurately control the flow of electrons. In other words, circuits can potentially be as thin as one atom, but their behavior will become unreliable much sooner. Much of Intel's ongoing work to move to thinner manufacturing processes (smaller transistors) is figuring out how to structure individual components so they can continue to function as expected.

The essence of Markov's argument can be put roughly as follows: although there are hard physical limits, they are often irrelevant to the problems holding back modern semiconductor progress. Instead, we face softer restrictions that can often be bypassed. "When a particular limitation begins to impede progress, understanding its nature is the key to overcoming it," he writes. "Some limitations can simply be ignored, while others remain hypothetical and are based only on empirical evidence; they are difficult to establish with a high degree of certainty."

As a result, what appear to be barriers to development are often overcome by a combination of creative thinking and improved technology. Markov's example is the diffraction limit. Initially it was supposed to keep argon fluoride lasers from etching any structures finer than 65 nanometers. But using subwavelength diffraction, we currently produce 14 nm structures with the same lasers.

Where are the modern limits?

Markov focuses on two issues that he considers the biggest limits: energy and communication. The energy-consumption issue stems from the fact that the amount of energy used by modern circuits does not decrease in proportion to their shrinking physical size. The main consequence: much effort goes into shutting down parts of the chip when they are not in use. But at the current pace of development, with this approach most of the chip is inactive at any given moment, hence the term "dark silicon".

Energy usage is proportional to the operating voltage of the chip, and transistors simply cannot operate below 200mV. Now their voltage is 5 times higher, so there is room for reduction. But progress in reducing the operating voltage has slowed down, so we may again reach the technological limits before the physical ones.

The energy problem is linked to the communication problem: most of the physical volume of a chip, and most of its power consumption, is spent on interaction between its different blocks or with the rest of the computer. Here we really do approach physical limits. Even if signals in the chip traveled at the speed of light, a chip running above 5 GHz could not transmit information from one side of the die to the other within a clock cycle. The best we can do with modern technology is to design chips in which blocks that frequently exchange data are physically close to each other. Bringing a third dimension into the equation (that is, three-dimensional circuits) could help, but only marginally.

What's next?

Markov is not particularly optimistic about the coming changes. In the near term, he expects the use of carbon nanotubes for wiring and optical interconnects for communications to continue the trend of helping us avoid physical limits. However, he notes that both of these technologies have their own limitations. Carbon nanotubes can be small, up to a nanometer in diameter, but they also have a size limit. And photons, if used for communication, would require hardware and energy.

Many pin their hopes on quantum computers, but Markov is not one of their fans. "Quantum computers, both digital and analog, are promising only in niche applications and do not offer significantly better performance in general-purpose computing, because they cannot quickly perform sorting and other specific tasks," he says. Another problem is that this hardware works best at temperatures close to absolute zero, while at room temperature its performance is extremely low.

However, all computation relies on quantum effects to some extent, and Markov believes there is something useful to be learned from quantum systems: "Individual quantum devices are approaching the energy limits for switching, while non-quantum devices lag behind by an order of magnitude." Clearly, capturing even a small part of the efficiency of quantum systems could make a big difference to the energy consumption of an entire chip.

Another physical limit Markov cites: erasing a bit of information has a thermodynamic cost that cannot be avoided, so computation always consumes energy. One idea for escaping this limit is "reversible computation", in which the components are returned to their initial state after the calculation. This method could, at least in theory, allow some of the energy used to be recovered.

This idea is not entirely theoretical. Markov cites work using superconducting circuits (which he calls "quite exotic") to provide reversible behavior with energy dissipation below the thermodynamic limit. Admittedly, this operates at only 4 microkelvin, so more energy is spent verifying that the circuits work than on their actual operation.

Beyond Physics

While physics and materials science place many limits on hardware, mathematics places limits on what we can do with it. And despite its reputation as an exact science, mathematical limits are much vaguer than physical ones. For example, the question of whether the complexity classes P and NP are equal still has no answer, despite years of effort. And although we can prove that some algorithms are the most efficient in the general case, it is also easy to find classes of problems where alternative computational approaches perform better.

The biggest problem that Markov sees here is the struggle to extract more parallelism from the code. Even cheap smartphones are now working on multi-core processors, but so far their use is not optimal.

In general, it seems that the main limitation is the human mind. While Markov does not see fantastic new technologies coming up, he is optimistic that current obstacles will be removed or bypassed through progress in other areas.

From the editor. Our regular readers know that reprints of the most famous classic articles and works in the field of computer science occasionally appear in our newspaper. We had wanted to print "Physical Limits of Computing" for a long time, about fifteen years. But this wonderful article somehow never found a place in the composition of other materials; printed "just like that", it would have looked too strange in the newspaper. And then such luck! The article was (quite deservedly) mentioned in the last lecture of our advanced training course as one of the few sources on this topic available in Russian. Of course, we could not miss the opportunity. We hope you enjoy this wonderful piece of popular science. After all, even the 24 (!) years that have passed since its publication have not made it obsolete, although, of course, technology has moved ahead by parsecs! But fundamental laws are too tough even for technology.

What physical factors limit the calculation process? Is there a limiting minimum energy required, for example, to perform one logical step? Apparently, such a minimum does not exist, but there are other questions that still remain open.

Computation, whether it is performed by electronic devices, on an ordinary abacus, or by a biological system such as the brain, is a physical process. The same concepts apply to it as to other physical processes. How much energy is needed to perform a given computation? How long will it take? How large must the computing device be? In other words, what are the physical limits on the process of computation?

Of course, asking these questions is much easier than answering them. The limits we are interested in are in a sense very remote from the real constraints faced by modern technology. Therefore, we cannot claim that our research helps the engineer or the technologist in practical work. These investigations are rather theoretical in nature. Our goal is to identify the general laws that govern all forms of information processing, regardless of the means and methods by which that processing is carried out. Any limits we find must be based solely on fundamental physical principles, not on the technologies currently in use.

This search for fundamental limits has precedents. In the 1940s, C. Shannon, then an employee of Bell Telephone Laboratories, established that there are limits on the amount of information that can be transmitted over a communication channel in the presence of noise, and that these limits apply regardless of how the message is encoded. Shannon's work marked the birth of modern information theory. Even earlier, in the middle and at the end of the nineteenth century, physicists, trying to determine the fundamental limits on the efficiency of the steam engine, created the science called thermodynamics. Around 1960, Landauer (one of the authors of this article), together with J. Swanson, working at IBM, tried to apply this kind of analysis to the process of computation. Since the mid-1970s, more and more groups of scientists at other organizations have joined these investigations.

In our analysis of the physical limits of computation, we use the term "information" in the sense in which it is defined in information theory. By this definition, information is destroyed whenever two previously distinct situations become indistinguishable. In physical systems without friction, information cannot be destroyed, because whenever information is destroyed, a certain amount of energy must be converted into heat. As an example, consider two readily distinguishable physical situations. In one, a rubber ball is held 1 m above the floor; in the other, 2 m. If the ball is released, it falls and bounces off the floor. In the absence of friction, and provided the ball is perfectly elastic, an observer can always tell what the initial state of the ball was (in this case, at what height it started), because a ball dropped from a height of 2 m bounces higher than one dropped from 1 m.

However, in the presence of friction forces, each time the ball bounces off the floor, some energy will be dissipated, and eventually the ball will stop bouncing and remain on the floor. Then it will be impossible to determine what the initial state of the ball was: the ball that fell from a height of 2 m will be completely identical to the ball that fell from a height of 1 m. Information will be lost as a result of energy dissipation.

Conventional computing devices, from the abacus to the microprocessor, dissipate energy as they operate. The dissipation of energy by the logic gates of a microprocessor is due to the disappearance of information. There are other causes as well: the electronic circuits of microprocessors consume energy even when they merely store information without processing it. The abacus is dissipative because of static friction, which cannot be eliminated: without static friction the beads would change position under the random thermal motion of molecules. Static friction is a certain minimum force that does not depend on the speed of the beads, so an abacus requires some minimum energy no matter how slowly it is operated.

Let us give another example of the disappearance of information. The expression "2 + 2" contains more information than the expression "= 4". If we only know that the number 4 was obtained by adding two numbers, then we will not be able to determine which numbers were added: 1 + 3, 2 + 2, 0 + 4, or some other pair of numbers. Since the output information is contained implicitly already in the input, we can assume that no calculation generates information.

Conventional logic gates dissipate energy because they discard unnecessary information. For example, if the output of the AND gate is 0, then we cannot determine what was at the inputs.

In fact, the computations performed on modern computers use many operations that destroy information. The so-called AND gate is a device with two input lines, each of which can carry a signal of 1 or 0, and one output line whose value is determined by the inputs. If both inputs are 1, the output is also 1; if one or both inputs are 0, the output is 0. Whenever the output of the gate is 0, we lose information, because we do not know which of three possible states the input lines were in (0 and 1, 1 and 0, or 0 and 0). In fact, any logic gate that has more input lines than output lines inevitably loses information, because we cannot reconstruct the state of the inputs from the state of the outputs. Therefore, whenever we use such a "logically irreversible" gate, we dissipate energy into the environment. Erasing a bit of data in a computer's memory is another frequently used operation that is dissipative in nature: when a bit is erased, we lose all information about its previous state.

However, it is fair to ask the question, is the use of irreversible logic gates and erasing operations unavoidable in computing? If so, then any calculation we make must dissipate a certain minimum amount of energy.

As Bennett (one of the authors of this article) showed in 1973, one can do without irreversible logic elements and without erasing information in a computation. Since then, the validity of this statement has been demonstrated in several models. The easiest to describe are models based on so-called "reversible logic gates", such as the Fredkin gate, named after Edward Fredkin of MIT. The gate has three input and three output lines. The signal on one input line, called the "control channel", passes through the gate unchanged. If the signal on the control channel is 0, the input signals on the other two lines also pass through unchanged. But if the control line carries a 1, the other two lines are switched: the input signal of one line becomes the output of the other, and vice versa. The Fredkin gate does not lose information, since the state of the inputs can always be determined from the state of the outputs.

Fredkin showed that any logic device necessary for the operation of a computer can be built as an appropriate combination of Fredkin gates. To perform a computation, certain input lines of some of the gates must be preset to particular values (see the lower figure on the left).

A reversible Fredkin logic gate need not dissipate energy: the state of its inputs can be determined from the state of its outputs. The gate has a "control" line whose state is not changed by the gate. If the control line is 0, the signal values on the other two lines also pass unchanged, but if the control line is 1, the input of line A becomes the output of line B, and vice versa. With reversible gates properly connected, any function performed by a conventional irreversible device can be realized. To implement an AND operation (right), one input is set to 0 and two output bits, called "garbage", are temporarily ignored. When the computation is complete, these bits are used to operate the gate in the reverse direction and return the computer to its original state.
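
The controlled-swap behavior described above can be written down in a few lines; the following C sketch also reproduces the caption's recipe for AND, with one input fixed at 0 (names are ours):

```c
#include <stdio.h>

/* Fredkin (controlled-swap) gate: the control bit c passes unchanged;
   if c == 1 the two data bits a and b are swapped, otherwise they pass
   unchanged. The mapping is its own inverse, so no information is lost:
   applying the gate twice restores the original inputs.               */
static void fredkin(int c, int *a, int *b)
{
    if (c) { int t = *a; *a = *b; *b = t; }
}

int main(void)
{
    /* AND of x and y: feed (control = x, a = y, b = 0).
       Output b becomes x AND y; the other outputs are the "garbage"
       bits that allow the computation to be run in reverse.           */
    for (int x = 0; x <= 1; x++)
        for (int y = 0; y <= 1; y++) {
            int a = y, b = 0;
            fredkin(x, &a, &b);
            printf("%d AND %d = %d (garbage: %d)\n", x, y, b, a);
        }
    return 0;
}
```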

Fredkin gates have more output lines than the ones they model. Therefore, in the course of calculations, it would seem that “garbage bits” are formed, i.e. bits of information not required to produce a result. Before you start another calculation, you need to somehow clear the computer of these bits. But if we erase them, then the very dissipation of energy that we wanted to avoid will occur.

In fact, these bits play a very important role. After we have obtained the result of the computation and copied it from the machine's normal output lines, the process is run in reverse. In other words, the "garbage bits" and the output bits produced by the computer during the computation are used as "input" at the "back end" of the machine. This is possible because each of the computer's logic gates is reversible. No information is lost while computing in the reverse direction, and therefore no energy needs to be dissipated. Eventually the computer returns to the state it was in before the computation began. Thus one can complete a "computation cycle", driving the computer forward and then returning it to its original state, without any energy dissipation.

So far, we have been talking about abstract logical operations without touching on the physical devices that perform these operations. However, it is not difficult to imagine a physical device that works according to the Fredkin principle. In such a device, channels for transmitting information are presented in the form of tubes. In turn, a bit of information is represented by the presence or absence of a ball in a certain section of the tube. The presence of a ball is interpreted as 1, and the absence is interpreted as 0.

The control line is represented by a narrow section of tube that is split lengthwise down the middle. When a ball enters the split section, it pushes the side walls of the tube apart, thereby actuating a switching device. The switching device redirects any input balls that may be arriving in the two other tubes: while there is a ball in the control tube, any ball arriving on an input line is automatically transferred to the other tube. To ensure that the switch is off when there is no ball in the control tube, its split halves are pressed together by springs. When a ball enters the control tube and compresses the springs, it must expend some energy to do so. This energy is not lost, however: it is returned when the control ball leaves the split tube and the springs relax.

All the balls are, as it were, connected to each other and are pushed forward by one mechanism, so that they move synchronously; otherwise, we could not ensure that the various input and control balls arrive at the logic gate at the same time. In a sense, the calculation process is similar to movement with one degree of freedom, such as the movement of two wheels sitting rigidly on the same axle. When the computation is complete, we push all the balls in the opposite direction, eliminating all the operations carried out on the way forward and returning the computer to its original state.

If the device is completely immersed in an ideal viscous fluid, the friction forces acting on the balls will be proportional to their speed, and there will be no static friction. Therefore, if we are content to move the balls slowly, the friction force will be very small. In any mechanical system the work done against friction equals the product of the friction force and the distance traveled. (Hence the faster a swimmer covers a given distance, the more energy he expends, even though the distance is the same regardless of his speed.) If the balls pass through the Fredkin gates at low speed, the work done during the motion (the product of force and distance) will be very small, since the friction force is directly proportional to the balls' speed. In fact, we can expend arbitrarily little energy, simply by slowing down the computation correspondingly. Thus we come to the conclusion that there is no minimum amount of energy that must be expended in order to perform any given computation.

An idealized physical model of a Fredkin gate: tubes play the role of conductors, and the presence or absence of a ball is interpreted as a 1 or a 0. The narrow split section of the tube is the control channel. When a ball enters it, the walls of the tube are pushed apart, actuating the switching mechanism. The latter, in turn, transfers any arriving ball from line A to line B and vice versa. Two springs keep the control channel switched off when there is no ball in it. Such a gate requires no static friction to operate. It can be immersed in a viscous liquid, and then the friction forces will depend only on the speed of the balls. In that case the dissipated energy can be made arbitrarily small: to reduce the amount of energy dissipated, it is only necessary to reduce the speed of the balls passing through the gate.

In the considered model of a computing device, the energy lost to friction will be very small if this device operates slowly enough. Is it possible to build a model of an even more idealized machine that could calculate without any friction? Or is friction a necessary attribute of the computational process? Fredkin, together with T. Toffoli and other specialists from MIT, showed that friction is not necessary.

They demonstrated this on a model of a computing device in which calculations are carried out by shooting ideal billiard balls towards each other in the absence of friction forces. In the billiard model, ideally reflecting “mirrors” - surfaces that change the direction of ball movement, are located in such a way that the movement of balls on the table simulates the passage of bits of information through logic gates (see figure). As before, the presence of a ball in a certain part of the “computer” is interpreted as 1, and the absence is interpreted as 0. If two balls reach the logic gate at the same time, then they collide and their trajectories change; the new paths represent the output of the gate. Fredkin, Toffoli and others developed mirror arrangements corresponding to different types of logic gates and proved that it is possible to build a billiard model of any logic element required for calculations.

Billiard computer model: the movement of billiard balls on the surface of the table simulates the passage of bits of information through a logic gate. In billiard logic gates (left), the trajectories of the balls change when they collide with each other or with “mirrors”. In addition to their functions inside gates, mirrors can change the angle of a ball's trajectory (a), shift it to the side (b), delay the ball without changing its final direction or speed (c), or make trajectories cross (d). Mirrors can be arranged so that the resulting “computer” performs the functions of any logical device. For example, one can build a billiard computer that recognizes prime numbers. Such a computer (right) accepts an arbitrary five-digit binary number (in this case 01101, or 13) and a fixed input sequence 01. Like the Fredkin gate, the billiard computer returns more output bits than the user needs. In this case it returns the original number itself (the “extra” output) and the “answer”: the sequence 10 if the input is prime, and 01 if it is composite.

To start the computation, we fire a billiard ball into an input of the computer whenever a 1 is to be entered there. The balls must enter the machine simultaneously. Since the balls are perfectly elastic, they lose no energy when they collide with one another, and they leave the machine with the same kinetic energy with which they entered it.
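The collision at the heart of such a gate can be summarized as a small truth table. The sketch below encodes the two-input “interaction gate” often used in billiard-ball models (the encoding and output ordering are my own): two input lines, four output paths, with the number of balls conserved and the logical AND of the inputs appearing on one of the outputs.

```python
# Sketch of the two-input billiard-ball "interaction gate".
# Inputs: presence (1) or absence (0) of a ball on lines A and B.
# Outputs: four paths; a ball appears on a "deflected" path only when
# both balls are present and collide, and on a "straight-through" path
# only when a ball arrives alone.
from itertools import product

def interaction_gate(a, b):
    return (a & b,            # A deflected by the collision
            a & b,            # B deflected by the collision
            a & (1 - b),      # A passes straight through (no collision)
            (1 - a) & b)      # B passes straight through (no collision)

for a, b in product((0, 1), repeat=2):
    out = interaction_gate(a, b)
    assert sum(out) == a + b          # balls are conserved
    print(f"A={a} B={b} -> outputs {out}")
# Among the outputs is the logical AND of A and B, so frictionless
# collisions alone suffice to compute Boolean functions.
```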

In the process of operation, the billiard computer generates “junk bits”, just like a computer built on Fredkin gates. After the computer has finished the task, we reflect the billiard balls back in the opposite direction, reversing the computation. The balls emerge from the machine exactly where we sent them in, moving at the same speed. The mechanism that launched the balls into the machine can then recover their kinetic energy. In this case too, we can perform a computation and return the computer to its original state without dissipating energy.

The billiard computer has one significant drawback: it is extremely sensitive to the slightest inaccuracies. If the ball is sent with a slight deviation from the correct direction, or the mirror is turned at an angle slightly different from the calculated one, the balls will go off the desired trajectories. One or more balls will deviate from the calculated path, and after some time the combined effect of these errors will disrupt the entire calculation process. Even if perfectly elastic, frictionless balls could be made, the random thermal motion of the molecules that make up the balls could be enough to cause errors after a few dozen collisions.

Of course, it would be possible to install corrective equipment that would return an incorrectly moving ball to the desired trajectory, but in this case, information about the previous states of the ball would have to be destroyed. For example, it would be necessary to destroy information regarding the amount of deviation of the mirror from the correct position. However, getting rid of information even in order to correct an error is possible only in a system in which friction forces exist and energy loss is possible. Therefore, the corrective equipment must dissipate a certain amount of energy.

Many of the difficulties encountered in using the billiard computer model could be avoided or at least reduced if submicroscopic particles, such as electrons, were used instead of billiard balls. As W. Zurek from the Los Alamos National Laboratory pointed out, thanks to the laws of quantum mechanics that impose restrictions on the state of elementary particles, the possibility of small deviations in the movement of particles can be eliminated.

Although so far our reasoning has been based mainly on classical dynamics, several researchers have proposed other models of reversible computers based on the principles of quantum mechanics. Such machines, first proposed by P. Benioff of Argonne National Laboratory and improved by others, in particular R. Feynman of the California Institute of Technology, have so far been described only in the most general terms. In essence, the particles in these computer models must be arranged so that the quantum-mechanical rules governing their interaction are exactly the same as the rules that predict the values of the signals at the outputs of reversible logic gates. Suppose, for example, that the spin of a particle can have only two possible values: up (corresponding to a binary 1) and down (corresponding to a binary 0). The interactions between the spins of the particles must be such that the spin of a given particle changes depending on the spins of the particles nearby; the spin of the particle will then correspond to one of the outputs of the logic gate.
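One way to see why such quantum-mechanical models are possible at all: a reversible gate is a permutation of its input states, and a permutation matrix is unitary, which is precisely the kind of evolution quantum mechanics permits. The sketch below (an illustration using the controlled-swap gate as the example, not any specific model from the article) verifies this numerically.

```python
# Sketch: a reversible gate, written as a permutation matrix on its
# 2^3 = 8 input states, is unitary (U^T U = I), i.e. an admissible
# quantum-mechanical evolution. The controlled-swap gate is the example.
import numpy as np
from itertools import product

def fredkin(c, a, b):
    return (c, b, a) if c == 1 else (c, a, b)

states = list(product((0, 1), repeat=3))
index = {s: i for i, s in enumerate(states)}

U = np.zeros((8, 8))
for s in states:
    U[index[fredkin(*s)], index[s]] = 1.0   # column s maps to row fredkin(s)

assert np.allclose(U.T @ U, np.eye(8))      # unitary (here: orthogonal)
assert np.allclose(U @ U, np.eye(8))        # and its own inverse
```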

Above, we talked mainly about information processing. But a computer must not only process data but also store it. The interplay between information storage and processing is perhaps best described by a device called a “Turing machine” (after Alan M. Turing, who first proposed such a machine in 1936). A Turing machine can perform any calculation performed by a modern computer. C. Bennett (one of the authors of this article) proved the possibility of building a reversible Turing machine, i.e. one that does not lose information and which, therefore, can carry out its work while expending any predetermined, arbitrarily small amount of energy.

A Turing machine is capable of doing any calculation that a computer can do. An infinitely long tape is divided into discrete segments, each containing a 0 or a 1. A “read and write head”, which can be in any of several internal states (here there are only two states, A and B), moves along the tape. Each cycle begins with the head reading one bit from a tape segment. Then, according to a fixed set of transition rules, it writes a bit of data to the tape segment, changes its internal state, and moves one position to the left or right. Because this Turing machine has only two internal states, its capabilities are limited to trivial calculations. More complex machines with more states are capable of simulating the behavior of any computer, including computers much more complex than themselves. They can do so because they store a complete representation of the logical state of the larger machine on their unlimited tape and break each computational cycle into a large number of simple steps. The machine shown here is logically reversible: we can always determine its previous states. Turing machines with different transition rules may not be logically reversible.

A Turing machine is made up of several components. One of them is a tape divided into separate sections, or segments, each containing a 0 or a 1; these constitute the input data. A “read and write head” moves along the tape. The head can perform several functions: it can read one bit of data from the tape, write one bit onto the tape, and move one segment to the left or right. So that each cycle can retain information about what was done on the previous one, the head mechanism has a number of so-called “states”; each state is a distinct, slightly different configuration of the head's internal parts.

On each cycle the head reads a bit from the tape segment it is currently positioned over. It then writes a new bit value to the tape, changes its internal state, and moves one segment to the left or right. The value of the bit it writes, the state it moves to, and the direction in which it moves are determined by a fixed set of transition rules. Each rule specifies particular actions. Which rule the machine follows at a given moment is determined by the state of the head and by the value of the bit just read from the tape. For example, a rule might be: “If the head is in state A and is positioned over a segment containing a 0, then change that bit to 1, go to state B and move one segment to the right.” Another rule might require the machine not to change its state or not to write a new bit to the tape, or might order it to stop. Not all Turing machines are reversible, but a reversible Turing machine can be built that performs any possible calculation.
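The cycle just described is easy to state as a short program. The sketch below uses a made-up two-state rule table (it includes the example rule quoted above, but it is not a machine described in the article) and runs the head over a finite tape:

```python
# Sketch of a Turing machine with a fixed table of transition rules.
# The rule table is illustrative; it includes the example rule
# "in state A over a 0: write 1, go to state B, move right".

RULES = {
    # (state, bit read) : (bit to write, next state, head move)
    ('A', 0): (1, 'B', +1),
    ('A', 1): (1, 'HALT', +1),
    ('B', 0): (0, 'A', +1),
    ('B', 1): (1, 'A', +1),
}

def run(tape, state='A', head=0, max_steps=100):
    tape = list(tape)
    for _ in range(max_steps):
        if state == 'HALT' or not (0 <= head < len(tape)):
            break
        bit = tape[head]                      # read
        write, state, move = RULES[(state, bit)]
        tape[head] = write                    # write
        head += move                          # move left or right
    return tape, state, head

print(run([0, 1, 0, 0, 1]))
# Each cycle: read a bit, look up (state, bit) in the rule table,
# write, change state, and move -- exactly the loop described above.
```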

Models based on a reversible Turing machine have an advantage over frictionless machines such as the billiard computer. In the billiard computer, the random thermal motion of molecules leads to inevitable errors. Reversible Turing machines actually make use of random thermal motion: they are designed so that it is thermal motion itself, assisted by a weak driving force, that carries the machine from one state to the next. The progress of the computation resembles the motion of an ion (a charged particle) in a solution in a weak electric field. Observed over a short period of time, the ion's behavior looks random: it is almost as likely to move in one direction as in the other. However, the driving force exerted by the electric field gives the motion a preferred direction; the ion is somewhat more likely to move that way.

At first glance it may seem incredible that the purposeful sequence of operations that makes up a computation can be carried out by an apparatus whose direction of motion at any moment is almost random. But this kind of action is very common in nature, and it can be observed in particular in the microscopic world of chemical reactions. Through trial and error, Brownian motion, or random thermal motion, is efficient enough for the reacting molecules to come into contact, orient themselves properly with respect to one another as the reaction requires, and form new molecules, the reaction products. In principle all chemical reactions are reversible: the same Brownian motion that drives a reaction forward sometimes causes the reaction products to undergo the reverse transition. In equilibrium the reverse direction of a reaction is just as likely as the forward one. To make a reaction go forward, we must continually supply the reacting molecules and remove the molecules that are the reaction products; in other words, we must apply a small driving force. When this force is very small, the reaction will go both forward and backward, but on average it will go forward.

To provide the driving force we must expend energy, but, as in the Fredkin-gate model of tubes and balls, the amount of energy can be made arbitrarily small. If we are content with very slow execution of the operations, there is no minimum amount of energy that has to be spent on them. The explanation is that the total energy dissipated depends on the ratio of the number of steps taken in the forward direction to the number taken in the reverse direction. (Strictly, it is proportional to the logarithm of this ratio, and the logarithm increases or decreases together with the ratio itself.) The slower the reaction runs forward, the smaller this ratio becomes. (The analogy with fast and slow swimmers applies here again: if the reaction proceeds more slowly, the total energy expended is smaller, even though the number of intermediate dissociations and recombinations stays the same.)
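The logarithmic relation noted above is, in the usual thermodynamic notation, a dissipation of roughly k_B T ln(r) per step, where r is the ratio of forward to backward step rates; the short tabulation below (the formula is the standard textbook estimate, not a quotation from the article) shows the dissipation vanishing as the drive becomes gentler.

```python
# Sketch: energy dissipated per computational step for a Brownian machine
# driven slightly forward, assuming the standard relation
#   E_step ~ k_B * T * ln(r),  r = forward-step rate / backward-step rate.
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
T = 300.0            # room temperature, K

for r in (100.0, 10.0, 2.0, 1.1, 1.01, 1.001):
    e_step = K_B * T * math.log(r)
    print(f"forward/backward ratio {r:>8}: {e_step:.2e} J per step")
# As r -> 1 (the reaction barely biased forward, hence very slow),
# the energy dissipated per step can be made as small as desired.
```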

RNA polymerase is an enzyme that acts as a reversible tape-copying machine. It catalyzes the synthesis of RNA, which is a copy of DNA. Moving along the DNA chain, the enzyme selects from the surrounding solution a nucleoside triphosphate molecule (each nucleoside triphosphate consists of an RNA base, a sugar molecule and three phosphate groups) whose base is complementary to the DNA base that is currently to be copied. It attaches the new base to the end of the growing RNA chain and releases a pyrophosphate ion. The reaction is reversible: sometimes the enzyme attaches a pyrophosphate to the last RNA link (the resulting nucleoside triphosphate returns to the solution) and moves back one position along the DNA chain. When the reaction is close to chemical equilibrium, the enzyme takes almost as many steps backward as forward, and the total energy required to copy a segment of DNA is very small. The more slowly the reaction proceeds, the less energy is dissipated; hence there is no minimum energy required to copy a segment of DNA.

Let us see how a Brownian Turing machine works, using as an example a Brownian machine for copying a tape. Such a machine already exists in nature: RNA polymerase, an enzyme involved in the synthesis of RNA, which is a copy of the DNA that makes up the genes. Single-stranded DNA is much like the tape of a Turing machine. At each of its elements, i.e. at each position along the chain, there is one of four nucleotides, or bases: adenine, guanine, cytosine or thymine (abbreviated A, G, C, T). The structure of RNA is very similar to that of DNA. It, too, is a long chain-like molecule consisting of four types of bases: adenine, guanine, cytosine and uracil (A, G, C and U respectively). RNA bases are able to bind to their complementary DNA bases.

RNA polymerase catalyzes the formation of its complementary copy, RNA, on DNA. Usually, a DNA double strand twisted into a helix is ​​surrounded by a solution containing a large number of ribonucleoside triphosphate molecules, each of which consists of a ribonucleotide (RNA base), a sugar, and a tail of three phosphate groups connected in series. RNA polymerase selects from the solution one of the RNA bases that is complementary to the base that is currently to be copied from the DNA strand, and attaches it to the end of the growing RNA strand, releasing two phosphates into the surrounding solution in the form of a pyrophosphate ion. The enzyme then moves forward one position along the DNA strand, preparing to add the next base to the RNA strand. As a result, an RNA chain is formed that is complementary to the template - the DNA chain. Without RNA polymerase, these reactions would be very slow and there would be no guarantee that the resulting RNA is exactly complementary to DNA.

The described reactions are reversible: sometimes the enzyme attaches a free pyrophosphate ion to the last base of the growing RNA chain, a ribonucleoside triphosphate molecule is released into the environment, and the enzyme itself moves back one position along the DNA chain. In equilibrium, forward and backward steps occur with equal frequency, but in a living cell other metabolic processes shift the equilibrium toward the forward reaction by removing pyrophosphate and creating an excess of ribonucleoside triphosphates. Under laboratory conditions the rate of the RNA polymerase reaction can be controlled by varying the concentrations of the initial reagents (as was shown by J. Levin and M. Chamberlain of the University of California at Berkeley). As the concentrations approach equilibrium the enzyme works more slowly, and less energy is dissipated in copying a given stretch of DNA, because the ratio of forward to backward steps becomes smaller.
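A biased random walk captures the arithmetic of this trade-off. In the toy simulation below (the bias values are illustrative, and the per-step dissipation is taken as the standard k_B T ln(p/(1-p)) estimate), the copying position steps forward with probability p and backward with probability 1 - p: the closer p is to 1/2, the more total steps the copy takes, but the less energy is dissipated per base copied.

```python
# Sketch: copying a tape by a biased random walk, as RNA polymerase does
# near equilibrium. The bias p is illustrative; dissipation per net
# forward step is taken as k_B*T*ln(p/(1-p)) (detailed-balance estimate).
import math
import random

K_B, T = 1.380649e-23, 300.0

def copy_by_random_walk(length, p, rng=random.Random(0)):
    """Return the total number of steps taken to advance `length` positions."""
    position, steps = 0, 0
    while position < length:
        position += 1 if rng.random() < p else -1
        position = max(position, 0)   # cannot back off the start of the tape
        steps += 1
    return steps

for p in (0.9, 0.6, 0.51):
    steps = copy_by_random_walk(200, p)
    e_per_step = K_B * T * math.log(p / (1 - p))
    print(f"bias p={p}: {steps} steps, ~{e_per_step:.1e} J dissipated per net step")
# Weaker bias -> slower copying but less energy dissipated per copied base.
```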

Since RNA polymerase simply copies information without processing it, it is not hard to imagine how a hypothetical chemical Turing machine might work. The tape is one long backbone molecule to which two kinds of bases, interpreted as the bits 0 and 1, are attached at regular intervals. Another small molecule is attached to one of the positions in the chain of zeros and ones; the position to which this molecule is attached is precisely the tape segment on which the head of the Turing machine stands. There are several different kinds of “head molecule”, each kind representing one of the machine's possible internal states.

The machine's transition rules are represented by enzymes. Each enzyme is a catalyst for a specific reaction. To better understand how these enzymes work, consider an example.

Let us assume that the head molecule is of type A (meaning that the machine is in state A) and is attached to a 0 base. Assume also that the following transition rule is in effect: “When the head is in state A and reads a 0, replace the 0 with a 1, go to state B and move to the right.” The enzyme molecule representing this rule has a site that fits a type-A head molecule attached to a 0 base; it also has a site that holds a 1 base and a site that holds a type-B head (see figure).

To carry out the transition, the enzyme molecule first approaches the position on the tape immediately to the right of the base to which the type-A head is currently attached. It then detaches from the tape both the head molecule and the 0 base to which the head is attached, and puts a 1 base in their place. Next it attaches a type-B head to the base to the right of the 1 base it has just added to the tape. This completes the transition: on the original tape segment a 0 has been replaced by a 1, and the head molecule is now of type B and is attached to the base one position to the right of the original one.
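Because nothing is erased in this step, it has a well-defined inverse. The sketch below (my own encoding of the tape, head position and state as plain Python data, not the article's molecules) applies the rule “in state A over a 0: write 1, become B, move right” and then undoes it, recovering the original configuration exactly.

```python
# Sketch: the enzyme-catalysed transition as a reversible update of
# (tape, head position, state), with an explicit inverse step.
# The encoding and names are illustrative.

def forward_step(tape, head, state):
    """Rule: in state 'A' over a 0 -> write 1, become 'B', move right."""
    assert state == 'A' and tape[head] == 0
    tape = tape[:head] + [1] + tape[head + 1:]
    return tape, head + 1, 'B'

def backward_step(tape, head, state):
    """Exact inverse: in state 'B' with a 1 just to the left -> undo."""
    assert state == 'B' and tape[head - 1] == 1
    tape = tape[:head - 1] + [0] + tape[head:]
    return tape, head - 1, 'A'

config = ([0, 0, 1, 0], 1, 'A')           # head over the second cell
after = forward_step(*config)             # ([0, 1, 1, 0], 2, 'B')
assert backward_step(*after) == config    # the step is logically reversible
print(after)
```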

A hypothetical enzymatic Turing machine can perform a computation with an arbitrarily small expenditure of energy. Molecules representing the bits 0 and 1 are attached to a backbone molecule. A molecule representing the head of the Turing machine is attached to one position in the chain (1); different types of head molecule represent different states of the machine. The transition rules are represented by enzymes. On each cycle an enzyme binds the head and the bit molecule associated with it (2), detaches them from the chain, and puts the required bit molecule in their place (3). In doing so it turns, attaching the appropriate head molecule to the next bit to the right or left of the one just changed. The cycle is now complete (4): the value of the bit has changed, and the head has changed state and moved. Reactions of this kind, like RNA synthesis, can dissipate an arbitrarily small amount of energy.

A clockwork Brownian Turing machine consists of rigid, smooth parts that do not fit tightly together and are held in the desired positions not by friction but by a system of grooves and teeth. Although the parts are loosely coupled, the only large-scale motions they can make correspond to a computational step forward or backward; in other words, they can follow only a single “computational path”. The mechanism is pushed gently by a very weak external force, so that a forward move is almost exactly as probable as a backward one. On average, however, the machine moves forward and the computation is eventually completed. The machine can be made to expend an arbitrarily small amount of energy by reducing the driving force accordingly.

The tape segments are represented by grooved disks, and the bits by E-shaped blocks that are attached to a disk in either the upper (1) or the lower (0) position. The head consists of rigid parts connected in a complex mechanism (most of which is not shown here); suspended from it are a reading element, a manipulator, and a rod shaped like a screwdriver. The machine is controlled by a grooved roller, resembling the cylinder of a phonograph (top left and at right). Different grooves correspond to different states of the head.

At the beginning of a cycle the head is positioned over one of the disks, and the “needle” rests in the “read” segment of the control-roller groove corresponding to the current state of the head. During the “read” phase of the cycle (1), the reading element determines whether the block representing the bit is turned up or down by performing an “obstacle read” (center right): the reading element passes along the block by either an upper or a lower path, and on one of these paths it must meet an obstacle in the form of a ledge at the end of the block, so only one path remains open. At the point of the control roller corresponding to this “decision”, the grooves branch, and the needle is guided into the groove corresponding to the value of the bit (2). The control roller then turns until the needle reaches the “write” segment (3). Here each groove contains its own set of “instructions”, which are transmitted to the machine through an intricate linkage between the needle and the rest of the mechanism.

If the instruction requires changing the value of the bit, the manipulator is actuated and hooks onto the lug of the block; the screwdriver then turns the disk until the block is free, the manipulator turns the block up or down, and the screwdriver turns the disk back so that the block locks into place. After passing the “write” segment of the control roller, the needle enters the “shift” segment (4), where each groove contains an instruction to move the head one position to the left or right. The needle then enters a “state change” segment (5), where the grooves merge so that the needle ends up in the groove representing the next state of the head. The cycle is now complete (6). Disks adjacent to the one currently being read are held in position by the head; disks farther away are held by a special “lock”. The lock of each disk is linked to a special bit, called the Q-bit, of the adjacent disk. This linkage is arranged so that the disk currently being read is released and can move, while the disks farther away from it, on both the left and the right, are held stationary.

In order for the Brownian Turing machine to work, the tape must be immersed in a solution containing many enzyme molecules, as well as an ample supply of “zeros”, “ones” and heads of types A and B. For the reaction to proceed in the forward direction, some other reaction is needed to clean the enzyme molecules of the heads and bases they have removed from the tape. The concentrations of the substances that clean the enzyme molecules provide the driving force that pushes the Turing machine forward. Once again we can expend an arbitrarily small amount of energy if the machine operates slowly enough.

An enzyme-based Turing machine will not be error-free. From time to time reactions occur that are not catalyzed by any enzyme. For example, a 0 base can spontaneously detach from the backbone molecule and a 1 base can take its place. Errors of this kind do in fact occur during RNA synthesis.

In principle, one could get rid of these errors by building a Brownian Turing machine out of a rigid, perfectly smooth clockwork. Such a Turing machine is less idealized than the billiard computer but more idealized than the enzyme machine. On the one hand, its parts do not require the absolutely precise machining that billiard balls would need; the clockwork parts can have some tolerances, and the machine can work even in the presence of significant thermal noise. And yet the machine must be perfectly rigid and free of static friction, and no macroscopic body possesses these qualities.

Since the parts of the machine do not fit tightly together, they are held in position not by friction but by a system of grooves and teeth (see figure). Although each part has a small amount of play, like the pieces of a wooden jigsaw puzzle, the mechanism as a whole can follow only one “computational path”. In other words, the parts interlock in such a way that at any moment the machine can make only two kinds of large-scale motion: the motion corresponding to a forward computational step and the motion in the opposite direction.

The computer makes transitions between these two kinds of motion only as a result of the random thermal motion of its parts, biased by a weak external force. The probability of moving backward, undoing the results of the last operation, is almost the same as the probability of moving forward; a small force applied from outside pushes the computation ahead. Once again this force can be made arbitrarily small, and so there is no minimum energy required to run a Turing machine built on a clockwork mechanism.

Thus, for reasons of classical thermodynamics, there is no necessary minimum energy for calculations. Doesn't thermodynamic analysis then come into conflict with quantum mechanics? After all, according to the quantum mechanical uncertainty principle, there should be an inverse relationship between the degree of uncertainty about how long the process lasts, and the degree of uncertainty about the amount of energy expended in this process. Some investigators therefore believe that in any switching process that takes place in a very short period of time, some minimum energy must be expended.

In fact, the uncertainty principle does not require any finite energy minimum for a fast switching event. The uncertainty principle would only apply if we tried to measure the exact moment in time when an event occurred. Even according to the laws of quantum mechanics, extremely fast events can occur without any loss of energy. Our belief that quantum mechanics makes it possible to carry out calculations with an arbitrarily small expenditure of energy is confirmed by the models of reversible quantum mechanical computers developed by Benioff et al. These models do not dissipate energy and obey the laws of quantum mechanics.

Thus the uncertainty principle does not seem to impose fundamental restrictions on the process of computation, nor does classical thermodynamics. Does this mean that computing has no physical limits at all? No, far from it. The real limitations are associated with questions that are much harder to answer than those we have posed and considered in this article. For example, is there some minimum finite time required for an elementary logical operation? What are the minimum dimensions of a device capable of performing such operations? Since scales of size and time are related through the finite speed of light, the answers to these questions appear to be interconnected. We will not be able to find these answers, however, at least until it is settled whether there is some elementary discreteness in the universal scales of length and time.

At the other extreme is the question of how large we can make a computer memory: how many particles in the universe can we gather and assemble for the purpose? The maximum possible size of a computer memory limits the precision with which calculations can be carried out; for example, the number of decimal places in a computed value of π will be limited. Another question, possibly related to the previous one, concerns the inevitable processes of degradation that take place in real computers as they age. Can the rate of deterioration and of the accumulation of errors be made arbitrarily small, or does this rate impose a limit on the maximum duration of a calculation? In other words, are there computational tasks that cannot be completed before the computer's hardware becomes unusable?

In fact, questions like these concern restrictions on the physical execution of mathematical operations. The physical laws on which the answers must ultimately be based are themselves expressed by means of such mathematical operations. We are thus asking in what form physical laws can hold, given the restrictions imposed by the properties of the universe that those same laws describe.