All characters and letters can be encoded using eight binary digits (bits). The most common tables for representing letters in binary code are ASCII and ANSI; both are used when writing texts for microprocessors. The first 128 characters of the ASCII and ANSI tables are identical: this part contains the codes for digits, punctuation marks, upper and lower case Latin letters, and control characters. National alphabet extensions and pseudographic symbols occupy the last 128 codes of these tables, and because those halves differ, Russian texts in the DOS and Windows operating systems do not match.

When first getting acquainted with computers and microprocessors, the question may arise: how do you convert text to binary code? In fact, this conversion is the simplest of operations: any plain-text editor will do, including the simple Notepad program that ships with the Windows operating system. Similar editors are built into the programming environments for languages such as C, Pascal, and Java. Note that the popular word processor Word is not suitable for simply converting text to binary code: it adds a large amount of extra information, such as letter color, italics, underlining, the language of a particular phrase, and the font.
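As a sketch of what such an editor produces, here is a minimal Python example (the file name sample.txt and the sample string are arbitrary) that saves a short text and then prints the decimal, hexadecimal, and binary code of every stored byte, just as a plain-text editor like Notepad would write them:

```python
# A minimal sketch: save plain text to a file and inspect the bytes,
# exactly as a simple editor like Notepad would store them.
text = "Hello"  # arbitrary example string

with open("sample.txt", "w", encoding="ascii") as f:
    f.write(text)

with open("sample.txt", "rb") as f:
    raw = f.read()

# Print each byte as decimal, hexadecimal, and eight binary digits.
for byte in raw:
    print(f"{byte:3d}  {byte:02X}  {byte:08b}")
```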

It should be noted that, strictly speaking, the combination of zeros and ones that encodes text is not a binary number, because its bits do not obey the positional rules of binary arithmetic. Nevertheless, "representing letters in binary code" is by far the most common search phrase on the Internet. Table 1 shows the correspondence between binary codes and the letters of the Latin alphabet. For brevity, the sequences of zeros and ones in this table are written as decimal and hexadecimal numbers.

Table 1. Representation of Latin letters in binary code (ASCII)

Decimal code Hex code Display character Meaning
0 00 NUL (null)
1 01 SOH (start of heading)
2 02 STX (start of text)
3 03 ETX (end of text)
4 04 EOT (end of transmission)
5 05 ENQ (enquiry)
6 06 ACK (acknowledgement)
7 07 BEL (bell)
8 08 BS (backspace)
9 09 HT (horizontal tab)
10 0A LF (line feed)
11 0B VT (vertical tab)
12 0C FF (form feed, next page)
13 0D CR (carriage return)
14 0E SO (shift out; double width on printers)
15 0F SI (shift in; condensed print on printers)
16 10 DLE (data link escape)
17 11 DC1 (device control 1)
18 12 DC2 (device control 2; cancel condensed print)
19 13 DC3 (device control 3)
20 14 DC4 (device control 4; cancel double width)
21 15 § NAK (negative acknowledgement)
22 16 SYN (synchronous idle)
23 17 ETB (end of transmission block)
24 18 CAN (cancel)
25 19 EM (end of medium)
26 1A SUB (substitute)
27 1B ESC (escape; starts control sequences)
28 1C FS (file separator)
29 1D GS (group separator)
30 1E RS (record separator)
31 1F US (unit separator)
32 20 Space
33 21 ! Exclamation point
34 22 " Quotation mark
35 23 # Number sign
36 24 $ Currency sign (dollar)
37 25 % Percent sign
38 26 & Ampersand
39 27 ' Apostrophe
40 28 ( Opening parenthesis
41 29 ) Closing parenthesis
42 2A * Asterisk
43 2B + Plus sign
44 2C , Comma
45 2D - Minus sign (hyphen)
46 2E . Period (dot)
47 2F / Slash (fraction bar)
48 30 0 Digit zero
49 31 1 Digit one
50 32 2 Digit two
51 33 3 Digit three
52 34 4 Digit four
53 35 5 Digit five
54 36 6 Digit six
55 37 7 Digit seven
56 38 8 Digit eight
57 39 9 Digit nine
58 3A : Colon
59 3B ; Semicolon
60 3C < Less-than sign
61 3D = Equals sign
62 3E > Greater-than sign
63 3F ? Question mark
64 40 @ Commercial at
65 41 A Latin capital letter A
66 42 B Latin capital letter B
67 43 C Latin capital letter C
68 44 D Latin capital letter D
69 45 E Latin capital letter E
70 46 F Latin capital letter F
71 47 G Latin capital letter G
72 48 H Latin capital letter H
73 49 I Latin capital letter I
74 4A J Latin capital letter J
75 4B K Latin capital letter K
76 4C L Latin capital letter L
77 4D M Latin capital letter M
78 4E N Latin capital letter N
79 4F O Latin capital letter O
80 50 P Latin capital letter P
81 51 Q Latin capital letter Q
82 52 R Latin capital letter R
83 53 S Latin capital letter S
84 54 T Latin capital letter T
85 55 U Latin capital letter U
86 56 V Latin capital letter V
87 57 W Latin capital letter W
88 58 X Latin capital letter X
89 59 Y Latin capital letter Y
90 5A Z Latin capital letter Z
91 5B [ Opening square bracket
92 5C \ Backslash
93 5D ] Closing square bracket
94 5E ^ Circumflex (caret)
95 5F _ Underscore
96 60 ` Grave accent (backtick)
97 61 a Latin lowercase letter a
98 62 b Latin lowercase letter b
99 63 c Latin lowercase letter c
100 64 d Latin lowercase letter d
101 65 e Latin lowercase letter e
102 66 f Latin lowercase letter f
103 67 g Latin lowercase letter g
104 68 h Latin lowercase letter h
105 69 i Latin lowercase letter i
106 6A j Latin lowercase letter j
107 6B k Latin lowercase letter k
108 6C l Latin lowercase letter l
109 6D m Latin lowercase letter m
110 6E n Latin lowercase letter n
111 6F o Latin lowercase letter o
112 70 p Latin lowercase letter p
113 71 q Latin lowercase letter q
114 72 r Latin lowercase letter r
115 73 s Latin lowercase letter s
116 74 t Latin lowercase letter t
117 75 u Latin lowercase letter u
118 76 v Latin lowercase letter v
119 77 w Latin lowercase letter w
120 78 x Latin lowercase letter x
121 79 y Latin lowercase letter y
122 7A z Latin lowercase letter z
123 7B { Open curly brace
124 7C | Vertical bar
125 7D } Close curly brace
126 7E ~ Tilde
127 7F DEL (delete)

The classic ASCII character table contains no Russian letters and is only 7 bits wide. Later the table was extended to 8 bits, and Russian letters in binary code, along with pseudographic symbols, appeared in the upper 128 rows. In general, this second half holds the national alphabets of different countries, and the Russian letters there are just one of the possible sets (code pages 855 and 866); there are also, for example, French (863), German (1141), and Greek (737) tables.
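The effect of these national code pages, and the DOS/Windows mismatch mentioned at the start, is easy to observe from a program. A short Python sketch (cp866, cp1251, cp855, cp850, and cp737 are the Python standard-library aliases for the corresponding code pages):

```python
# The same Cyrillic letter maps to different bytes in different code pages,
# which is why Russian DOS (CP866) and Windows (CP1251) texts do not match.
letter = "А"  # Cyrillic capital A
print(letter.encode("cp866").hex())   # 80 (as in Table 2 below)
print(letter.encode("cp1251").hex())  # c0

# Conversely, the same byte 0x80 displays differently in each national table:
for codec in ("cp866", "cp855", "cp850", "cp737"):
    print(codec, bytes([0x80]).decode(codec))
```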

Table 2 shows an example of the representation of Russian letters in binary code.

Table 2. Representation of Russian letters in binary code (extended ASCII, code page 866)

Decimal code Hex code Display character Meaning
128 80 А Capital Russian letter A
129 81 Б Capital Russian letter B
130 82 В Capital Russian letter V
131 83 Г Capital Russian letter G
132 84 Д Capital Russian letter D
133 85 Е Capital Russian letter E
134 86 Ж Capital Russian letter Zh
135 87 З Capital Russian letter Z
136 88 И Capital Russian letter I
137 89 Й Capital Russian letter Y (short I)
138 8A К Capital Russian letter K
139 8B Л Capital Russian letter L
140 8C М Capital Russian letter M
141 8D Н Capital Russian letter N
142 8E О Capital Russian letter O
143 8F П Capital Russian letter P
144 90 Р Capital Russian letter R
145 91 С Capital Russian letter S
146 92 Т Capital Russian letter T
147 93 У Capital Russian letter U
148 94 Ф Capital Russian letter F
149 95 Х Capital Russian letter Kh
150 96 Ц Capital Russian letter Ts
151 97 Ч Capital Russian letter Ch
152 98 Ш Capital Russian letter Sh
153 99 Щ Capital Russian letter Shch
154 9A Ъ Capital Russian letter Ъ (hard sign)
155 9B Ы Capital Russian letter Y
156 9C Ь Capital Russian letter Ь (soft sign)
157 9D Э Capital Russian letter E
158 9E Ю Capital Russian letter Yu
159 9F Я Capital Russian letter Ya
160 A0 а Lowercase Russian letter a
161 A1 б Lowercase Russian letter b
162 A2 в Lowercase Russian letter v
163 A3 г Lowercase Russian letter g
164 A4 д Lowercase Russian letter d
165 A5 е Lowercase Russian letter e
166 A6 ж Lowercase Russian letter zh
167 A7 з Lowercase Russian letter z
168 A8 и Lowercase Russian letter i
169 A9 й Lowercase Russian letter y (short i)
170 AA к Lowercase Russian letter k
171 AB л Lowercase Russian letter l
172 AC м Lowercase Russian letter m
173 AD н Lowercase Russian letter n
174 AE о Lowercase Russian letter o
175 AF п Lowercase Russian letter p
176 B0 ░ Pseudo symbol
177 B1 ▒ Pseudo symbol
178 B2 ▓ Pseudo symbol
179 B3 Pseudo symbol
180 B4 Pseudo symbol
181 B5 Pseudo symbol
182 B6 Pseudo symbol
183 B7 Pseudo symbol
184 B8 Pseudo symbol
185 B9 Pseudo symbol
186 BA Pseudo symbol
187 BB Pseudo symbol
188 BC Pseudo symbol
189 BD Pseudo symbol
190 BE Pseudo symbol
191 BF Pseudo symbol
192 C0 Pseudo symbol
193 C1 Pseudo symbol
194 C2 Pseudo symbol
195 C3 Pseudo symbol
196 C4 Pseudo symbol
197 C5 Pseudo symbol
198 C6 Pseudo symbol
199 C7 Pseudo symbol
200 C8 Pseudo symbol
201 C9 Pseudo symbol
202 CA Pseudo symbol
203 CB Pseudo symbol
204 CC Pseudo symbol
205 CD Pseudo symbol
206 CE Pseudo symbol
207 CF Pseudo symbol
208 D0 Pseudo symbol
209 D1 Pseudo symbol
210 D2 Pseudo symbol
211 D3 Pseudo symbol
212 D4 Pseudo symbol
213 D5 Pseudo symbol
214 D6 Pseudo symbol
215 D7 Pseudo symbol
216 D8 Pseudo symbol
217 D9 Pseudo symbol
218 DA Pseudo symbol
219 DB Pseudo symbol
220 DC Pseudo symbol
221 DD Pseudo symbol
222 DE Pseudo symbol
223 DF Pseudo symbol
224 E0 р Lowercase Russian letter r
225 E1 с Lowercase Russian letter s
226 E2 т Lowercase Russian letter t
227 E3 у Lowercase Russian letter u
228 E4 ф Lowercase Russian letter f
229 E5 х Lowercase Russian letter kh
230 E6 ц Lowercase Russian letter ts
231 E7 ч Lowercase Russian letter ch
232 E8 ш Lowercase Russian letter sh
233 E9 щ Lowercase Russian letter shch
234 EA ъ Lowercase Russian letter ъ (hard sign)
235 EB ы Lowercase Russian letter y
236 EC ь Lowercase Russian letter ь (soft sign)
237 ED э Lowercase Russian letter e
238 EE ю Lowercase Russian letter yu
239 EF я Lowercase Russian letter ya
240 F0 Ё Capital Russian letter Yo
241 F1 ё Lowercase Russian letter yo
242 F2 Є Ukrainian capital letter Ye
243 F3 є Ukrainian lowercase letter ye
244 F4 Ї Ukrainian capital letter Yi
245 F5 ї Ukrainian lowercase letter yi
246 F6 Ў Belarusian capital letter short U
247 F7 ў Belarusian lowercase letter short u
248 F8 ° Degree sign
249 F9 ∙ Multiplication sign (dot)
250 FA · Middle dot
251 FB √ Radical (square root sign)
252 FC № Numero sign
253 FD ¤ Currency sign
254 FE ■ Pseudo symbol
255 FF Non-breaking space

When writing texts, in addition to the binary codes that directly represent letters, codes are used that indicate the transition to a new line and the return of the cursor (carriage return) to position zero of the line. These two characters are usually used together; their binary codes correspond to the decimal numbers 10 (0A) and 13 (0D). As an example, below is a fragment of this page's text as a memory dump, containing the first paragraph. The following format is used to display information in a memory dump:

  • the first column contains the hexadecimal address of the first byte of the line;
  • the next sixteen columns contain the bytes of the text file; for easier counting of byte positions, a vertical bar is drawn after the eighth column; for brevity, the bytes are shown in hexadecimal;
  • the last column shows these same bytes as displayed characters.
00000000: 82 E1 A5 20 E1 A8 AC A2 ¦ AE AB EB 20 A8 20 A1 E3   Все символы и бу
00000010: AA A2 EB 20 AC AE A3 E3 ¦ E2 20 A1 EB E2 EC 20 A7   квы могут быть з
00000020: A0 AA AE A4 A8 E0 AE A2 ¦ A0 AD EB 20 AF E0 A8 20   акодированы при
00000030: AF AE AC AE E9 A8 20 A2 ¦ AE E1 EC AC A8 20 A4 A2   помощи восьми дв
00000040: AE A8 E7 AD EB E5 20 E1 ¦ A8 AC A2 AE AB AE A2 2E   оичных символов.
00000050: 0D 0A 8D A0 A8 A1 AE AB ¦ A5 A5 20 E0 A0 E1 AF E0   ♪◙Наиболее распр
...
000000E0: AB EF 20 A7 A0 AF A8 E1 ¦ A8 0D 0A E2 A5 AA E1 E2   ля записи♪◙текст
...

In the example above, you can see that the first line of the source text is 80 bytes long. The first byte, 82, corresponds to the letter "В". The second byte, E1, corresponds to the letter "с". The third byte, A5, corresponds to the letter "е". The next byte, 20, represents the blank space between words. Bytes 81 and 82 of the line contain the carriage-return and line-feed characters 0D 0A; we find them at the start of the dump line with hexadecimal address 00000050. The next line of the source text is not a multiple of 16 bytes long (its length is 76 letters), so to find its end you first locate the dump line 000000E0 and count nine columns from it: the carriage-return and line-feed bytes 0D 0A are written there again. The rest of the text is parsed in exactly the same way.
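A dump in this format can be produced by a short program. The following Python sketch assumes CP866 encoding for the text, matching the tables above; the hexdump function name and the sample string are merely illustrative:

```python
def hexdump(data: bytes, encoding: str = "cp866") -> None:
    """Print a dump in the format described above: hexadecimal address,
    sixteen bytes split into two groups of eight, then the characters."""
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        hexpart = " ".join(f"{b:02X}" for b in chunk[:8]) + " ¦ " + \
                  " ".join(f"{b:02X}" for b in chunk[8:])
        # Show control codes (CR, LF, ...) as dots so the line stays readable.
        text = "".join(
            bytes([b]).decode(encoding) if b >= 32 else "."
            for b in chunk
        )
        print(f"{offset:08X}: {hexpart:<49} {text}")

hexdump("Все символы и буквы могут быть закодированы".encode("cp866"))
```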


Computing technology uses the binary number system because it is the simplest and meets the following requirements:

  • The fewer distinct values a system uses, the easier it is to build the individual elements that operate on them. In particular, the two digits of the binary number system are easily represented by many physical phenomena: current flows or it does not, the magnetic field induction is above a threshold value or it is not, and so on.
  • The fewer states an element has, the higher its noise immunity and the faster it can operate. For example, encoding three states through the magnitude of magnetic field induction would require two threshold values, which would not improve the noise immunity or the reliability of information storage.
  • Binary arithmetic is quite simple: the addition and multiplication tables, which define the basic operations on numbers, are tiny.
  • The apparatus of Boolean algebra can be used to perform bitwise operations on numbers.


I decided to build a tool for converting text to binary code and back. Services like this exist, but they usually work only with the Latin alphabet; my translator works with the UTF-8 Unicode encoding, which encodes Cyrillic characters in two bytes. At the moment it cannot translate Chinese characters, but I am going to fix that unfortunate omission.

To convert text to its binary representation, enter the text in the left box and press TEXT->BIN; its binary representation will appear in the right box.

To convert binary code to text, enter the code in the right window and press BIN->TEXT; its character representation will appear in the left window.

If converting binary code to text (or vice versa) did not work, check that your input data is correct!
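The button names above belong to the online tool itself; a rough Python equivalent of what TEXT->BIN and BIN->TEXT do, assuming UTF-8 as the author describes, might look like this (the function names are mine, not the tool's):

```python
def text_to_bin(text: str) -> str:
    """TEXT->BIN: encode as UTF-8 and write every byte as eight binary digits."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))

def bin_to_text(bits: str) -> str:
    """BIN->TEXT: gather the digits into bytes and decode as UTF-8."""
    digits = "".join(bits.split())  # tolerate spaces between bytes
    data = bytes(int(digits[i:i + 8], 2) for i in range(0, len(digits), 8))
    return data.decode("utf-8")

print(text_to_bin("Да"))               # Cyrillic letters take two bytes in UTF-8
print(bin_to_text(text_to_bin("Да")))  # Да
```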

Update!

The reverse transformation of such "decorated" text back into its normal form is now available. To do this, check the box "Replace 0 with spaces and 1 with placeholder █", then paste the text into the right box "Text in binary representation" and press the BIN->TEXT button below it.

When copying such texts you need to be careful, because it is easy to lose spaces at the beginning or end of a line. For example, the line above looks like:

██ █ █ ███████ █ ██ ██ █ █ ███ ██ █ █ ██ █ ██ █ █ ██ █ ███ █ ██ █ █ ██ █ █ ███ ██ █ █ ███ ██ █ ██

and on a red background:

██ █ █ ███████ █ ██ ██ █ █ ███ ██ █ █ ██ █ ██ █ █ ██ █ ███ █ ██ █ █ ██ █ █ ███ ██ █ █ ███ ██ █ ██

See how many spaces at the end can be lost?
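A Python sketch of this substitution (the function names are illustrative) shows exactly how a trailing space, i.e. a trailing zero, can disappear:

```python
def to_blocks(bits: str) -> str:
    """Replace 0 with a space and 1 with the placeholder █."""
    return bits.replace("0", " ").replace("1", "█")

def from_blocks(art: str) -> str:
    """The reverse substitution; a lost trailing space silently drops a 0."""
    return art.replace("█", "1").replace(" ", "0")

line = "1101000010"
print(repr(to_blocks(line)))                  # '██ █    █ ' - note the trailing space
print(from_blocks(to_blocks(line)) == line)   # True only if no spaces were lost
```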

Computers do not understand words and numbers the way humans do. Modern software lets the end user ignore this, but at the lowest level your computer operates on a binary electrical signal that has only two states: current flows or it does not. To "understand" complex data, your computer must encode it in binary.

The binary system uses two digits, 1 and 0, corresponding to the on and off states your computer can recognize. You are probably familiar with the decimal system, which uses ten digits, 0 through 9, and then moves to the next place to form two-digit numbers, each place worth ten times the one before it. The binary system is similar, except that each place is worth twice the previous one.

Counting in Binary

In binary, the first digit is worth 1 in decimal. The second digit is worth 2, the third 4, the fourth 8, and so on, doubling each time. Adding up the values of all the digits that are set to 1 gives you the number in decimal form.

1111 (binary) = 8 + 4 + 2 + 1 = 15 (decimal)

Counting 0 as well, this gives 16 possible values for four binary bits. Move up to 8 bits and you get 256 possible values. This takes up much more space to write down, since four decimal digits already give 10,000 possible values. Binary code takes more room, but computers handle binary data far better than decimal, and for some things, like logic processing, binary is better suited than decimal.
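This doubling rule is easy to verify, for example in Python:

```python
# Each binary digit doubles the weight of the previous one:
# 1111 = 8 + 4 + 2 + 1 = 15.
weights = [8, 4, 2, 1]
bits = [1, 1, 1, 1]
print(sum(w * b for w, b in zip(weights, bits)))  # 15
print(int("1111", 2))                             # 15, the built-in conversion
print(2 ** 4, 2 ** 8)                             # 16 and 256 values for 4 and 8 bits
```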

It should be added that there is another base used in programming: hexadecimal. Although computers do not compute in hexadecimal, programmers use it to represent binary values in a human-readable form when writing code, because two hexadecimal digits can represent a whole byte, replacing eight binary digits. The hexadecimal system uses the digits 0-9 plus the letters A through F for the six extra values.
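For example, one byte written in binary, hexadecimal, and decimal:

```python
byte = 0b11010110      # one byte, eight binary digits
print(f"{byte:08b}")   # 11010110
print(f"{byte:02X}")   # D6 - the same byte as two hexadecimal digits
print(int("D6", 16))   # 214, back to decimal
```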

Why computers use binary

The short answer: hardware and the laws of physics. Every symbol inside your computer is an electrical signal, and in the early days of computing, measuring electrical signals precisely was difficult. It was more practical to distinguish only an "on" state, represented by negative charge, and an "off" state, represented by positive charge.

For those wondering why "off" is represented by a positive charge: electrons carry a negative charge, so more electrons mean more current with a negative charge.

Thus, the early room-sized computers used binary to build their systems, and although they were built from older, bulkier equipment, they operated on the same fundamental principles. Modern computers use what are called transistors to perform calculations in binary.

In essence, a typical transistor allows current to flow from the source to the drain when there is current at the gate, forming a binary switch. Manufacturers can make these transistors as small as 5 nanometers, about the width of two strands of DNA. This is how modern processors work, and even they can struggle to distinguish between on and off states (though this is largely due to their near-molecular size, which is subject to the oddities of quantum mechanics).

Why only the binary system

So you might be thinking: why only 0 and 1? Why not add another digit? Although part of the answer lies in the traditions of computer design, adding one more digit would also mean distinguishing one more state of the current, not just "off" and "on".

The problem is that if you want to use several voltage levels, you need a way to perform calculations with them easily, and hardware capable of that is not yet viable as a replacement for binary computation. For example, a so-called ternary computer was developed in the 1950s, but development stopped there. Ternary logic is more efficient than binary, yet there is still no effective replacement for the binary transistor, or at least no ternary transistor as tiny as the binary ones.

The reason we cannot use ternary logic comes down to how transistors are connected in a computer and how they are used for computation. A logic element receives information on two inputs, performs an operation, and returns the result on one output.

Thus, binary mathematics is much easier for a computer than anything else. Boolean logic maps directly onto binary systems, with True and False corresponding to the on and off states.

A truth table for a two-input binary gate has four possible input combinations for each fundamental operation, whereas a ternary gate with two inputs, each taking three states, has nine. While a binary system admits 16 possible two-input operators (2^(2^2)), a ternary system admits 19,683 (3^(3^2)). Scaling becomes a problem: although ternary is more efficient per digit, it is also exponentially more complex.
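These counts are easy to check:

```python
# A two-input binary gate has 2*2 = 4 input rows, and each row can output
# one of 2 values, so there are 2**(2**2) = 16 possible binary operators.
print(2 ** (2 ** 2))   # 16
# A two-input ternary gate has 3*3 = 9 input rows with 3 possible outputs each:
print(3 ** (3 ** 2))   # 19683
```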

Who knows? In the future we may well see ternary computers as binary logic runs into the limits of miniaturization. For now, the world will continue to run in binary mode.


1.5.1. Converting information from continuous to discrete form
To solve their problems, people often have to convert available information from one form of representation to another. For example, when reading aloud, information is converted from discrete form (text) into continuous form (sound). During dictation in a Russian language lesson, on the contrary, information is transformed from continuous form (the teacher's voice) into discrete form (the students' notes).
Information presented in discrete form is much easier to transmit, store, and process automatically. Therefore, in computer technology much attention is paid to methods of converting information from continuous form to discrete form.
Discretization of information is the process of converting information from a continuous form of representation to a discrete one.
Let us consider the essence of the discretization process using an example.
Meteorological stations have self-recording instruments for the continuous recording of atmospheric pressure. The result of their work is a barogram: a curve showing how the pressure changed over a long period of time. One such curve, drawn by an instrument during seven hours of observation, is shown in Fig. 1.9.

Based on the information obtained, a table can be constructed containing the instrument readings at the beginning of the measurements and at the end of each hour of observation (Fig. 1.10).

The resulting table does not give a complete picture of how the pressure changed during the observation period: for example, it misses the highest pressure value, which occurred during the fourth hour of observation. But if the table included the pressure values observed every half hour or every 15 minutes, it would give a much more complete picture of how the pressure changed.
Thus, we have converted information presented in continuous form (the barogram curve) into discrete form (a table), with some loss of accuracy.
In the future, you will get acquainted with the methods of discrete presentation of sound and graphic information.

1.5.2. Binary coding
Chains of three binary characters are obtained by appending the character 0 or 1 on the right to the two-character binary codes. As a result, there are 8 code combinations of three binary characters, twice as many as of two:
Accordingly, a four-character binary code yields 16 code combinations, a five-character code 32, a six-character code 64, and so on. The length of a binary chain, i.e. the number of characters in a binary code, is called the bit depth of the binary code.
Note that:
4 = 2 * 2,
8 = 2 * 2 * 2,
16 = 2 * 2 * 2 * 2,
32 = 2 * 2 * 2 * 2 * 2 etc.
Here, the number of code combinations is a product of identical factors, as many of them as the bit depth of the binary code.
If the number of code combinations is denoted by the letter N and the bit depth of the binary code by the letter i, then the pattern revealed above can be written in general form as follows:
N = 2 * 2 * ... * 2 (i factors),
which in mathematics is written as:
N = 2^i.
The notation 2^i is read: "2 to the power i."

A task. The leader of the Multi tribe instructed his minister to develop a binary code and translate all important information into it. What bit depth is required if the alphabet used by the Multi tribe contains 16 characters? Write out all the code combinations.
Solution. Since the Multi alphabet consists of 16 characters, 16 code combinations are needed. The length (bit depth) of the binary code is determined from the relation 16 = 2^i, hence i = 4.
To write out all the code combinations of four 0s and 1s, we use the diagram in Fig. 1.13: 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111. (The same combinations can also be generated programmatically, as sketched below.)
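A small Python sketch using the standard itertools module generates the same sixteen combinations:

```python
from itertools import product

# All code combinations of a given bit depth; for i = 4 there are 2**4 = 16.
i = 4
codes = ["".join(bits) for bits in product("01", repeat=i)]
print(len(codes))   # 16
print(codes)        # ['0000', '0001', ..., '1111']
```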

1.5.3. Versatility of Binary Encoding
At the beginning of this section you learned that information, when represented in continuous form, can be expressed using the symbols of some natural or formal language. In turn, the characters of an arbitrary alphabet can be converted into binary code. Thus, with the help of binary code, any information in natural and formal languages, as well as images and sounds, can be represented (Fig. 1.14). This is what is meant by the universality of binary coding.
Binary codes are widely used in computer technology because they require only two states of an electronic circuit: "on" (corresponding to the digit 1) and "off" (corresponding to the digit 0).
Simplicity of technical implementation is the main advantage of binary coding. Its disadvantage is the large length of the resulting code.

1.5.4. Uniform and non-uniform codes
Uniform and non-uniform codes are distinguished: in uniform codes, all code combinations contain the same number of characters; in non-uniform codes, the number differs.
Above we considered uniform binary codes.
An example of a non-uniform code is Morse code, in which each letter and digit is assigned its own sequence of short and long signals: the letter E corresponds to one short signal ("dot"), while the letter Ш corresponds to four long signals (four "dashes"). Non-uniform coding makes it possible to increase the speed of message transmission, because the most frequently occurring symbols of the transmitted information receive the shortest code combinations.

The information carried by one such symbol is equal to the entropy of the system and is maximal when the two states are equally probable; in this case the elementary symbol conveys 1 binary unit (bit) of information. Therefore, the basis of optimal coding is the requirement that the elementary symbols in the encoded text occur, on average, equally often.

Let us describe a method of constructing a code that satisfies this condition; the method is known as the Shannon-Fano code. Its idea is that the encoded symbols (letters or combinations of letters) are divided into two groups of approximately equal probability: for the first group, a 0 is placed in the first position of the combination (the first character of the binary number representing the symbol); for the second group, a 1. Each group is then again divided into two approximately equiprobable subgroups; for the symbols of the first subgroup a zero is placed in the second position, for the second subgroup a one, and so on.

Let us demonstrate the principle of constructing the Shannon-Fano code on the material of the Russian alphabet (Table 18.8.1). Take the first six symbols (from the space, denoted "-", to "т"); the sum of their probabilities (frequencies) is 0.498; all the remaining letters (from "н" to "ф") have a total probability of approximately 0.502. The first six symbols receive the binary digit 0 in the first position; the remaining letters receive a 1. Next, we again divide the first group into two approximately equiprobable subgroups, from "-" to "о" and from "е" to "т"; all letters of the first subgroup receive a zero in the second position, those of the second subgroup a one. We continue the process until exactly one letter remains in each subgroup, encoded by a definite binary number. The construction mechanism is shown in Table 18.8.2, and the code itself is given in Table 18.8.3.

Table 18.8.2. Construction of the code (the binary digits assigned at each division)

Table 18.8.3. The resulting Shannon-Fano code

With Table 18.8.3, any message can be encoded and decoded.
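Since Tables 18.8.1-18.8.3 are not reproduced here, the following Python sketch demonstrates the same divide-into-equal-halves construction on a small made-up frequency table; the symbols, probabilities, and resulting codewords are illustrative only, not those of the book's tables:

```python
def shannon_fano(symbols):
    """Shannon-Fano code for (symbol, probability) pairs sorted by descending
    probability: split the list into two groups of nearly equal total
    probability, prepend 0 to the first group and 1 to the second, recurse."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    run, best, best_diff = 0.0, 1, float("inf")
    for k in range(1, len(symbols)):
        run += symbols[k - 1][1]
        diff = abs(run - (total - run))  # imbalance if we split before index k
        if diff < best_diff:
            best_diff, best = diff, k
    code = {}
    for sym, tail in shannon_fano(symbols[:best]).items():
        code[sym] = "0" + tail
    for sym, tail in shannon_fano(symbols[best:]).items():
        code[sym] = "1" + tail
    return code

def decode(bits, code):
    """Prefix-free decoding: accumulate digits until they match a codeword."""
    inverse = {v: k for k, v in code.items()}
    result, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            result.append(inverse[current])
            current = ""
    return "".join(result)

# Hypothetical frequencies for a few symbols (NOT the values of Table 18.8.1).
table = [(" ", 0.20), ("о", 0.11), ("е", 0.09), ("а", 0.08),
         ("и", 0.07), ("т", 0.06), ("н", 0.06), ("с", 0.05)]
code = shannon_fano(table)
message = "сонет"
bits = "".join(code[ch] for ch in message)
print(bits)                  # no separators between the letters...
print(decode(bits, code))    # ...yet decoding is unambiguous: сонет
```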

As an example, let us write in binary code the phrase "information theory" (in the Russian original, "теория информации"):

01110100001101000110110110000

0110100011111111100110100

1100001011111110101100110

Note that there is no need to separate the letters from each other with a special sign: even without it, decoding is unambiguous. You can verify this by decoding the following phrase using Table 18.8.2:

10011100110011001001111010000

1011100111001001101010000110101

010110000110110110

("coding method").

However, it should be noted that any encoding error (an accidental swap of the characters 0 and 1) is fatal for such a code, since decoding the entire text following the error becomes impossible. Therefore, this coding principle can be recommended only when errors in encoding and transmitting the message are practically excluded.

A natural question arises: is the code we have constructed really optimal in the absence of errors? To answer this, let us find the average information per elementary symbol (0 or 1) and compare it with the maximum possible information, which equals one binary unit. We first find the average information contained in one letter of the transmitted text, i.e., the entropy per letter:

H = -Σ p_i log2 p_i,

where p_i is the probability that a letter takes a particular state ("-", о, е, а, ..., ф), and the sum runs over all the states.

Summing the values computed from Table 18.8.1 gives the entropy H (in binary units per letter of the text).

From Table 18.8.2 we determine the average number of elementary symbols per letter, n̄, weighting each letter's code length by its probability.

Dividing the entropy H by n̄, we obtain the information per elementary symbol in binary units.

Thus, the information per symbol is very close to its upper limit of 1, and the code we have chosen is very close to optimal. Remaining within the task of letter-by-letter coding, we can do no better.
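The same comparison can be reproduced for any probability table using the formula above; in this Python sketch the probabilities are illustrative values, not those of Table 18.8.1:

```python
from math import log2

# Illustrative probability table (NOT the actual values of Table 18.8.1).
p = {" ": 0.20, "о": 0.12, "е": 0.10, "а": 0.09, "и": 0.08, "т": 0.07,
     "н": 0.07, "с": 0.06, "р": 0.05, "в": 0.05, "л": 0.04, "к": 0.03,
     "м": 0.02, "д": 0.02}
assert abs(sum(p.values()) - 1.0) < 1e-9     # probabilities must sum to 1

# Entropy per letter: H = -sum(p_i * log2(p_i)), in binary units (bits).
H = -sum(pi * log2(pi) for pi in p.values())
print(f"entropy per letter: {H:.2f} bits")

# A fixed-length code for these 14 symbols needs ceil(log2(14)) = 4 digits
# (it would be 5 digits for a 32-letter alphabet, as in the text).
fixed_length = 4
print(f"information per code symbol: {H / fixed_length:.2f} bits")
```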

Note that if we were simply to encode the ordinal numbers of the letters in binary, we would represent each letter with five binary symbols, and the information per symbol would be noticeably less than with optimal letter coding.

However, it should be noted that letter-by-letter coding is not economical at all. The point is that neighboring letters of any meaningful text are always dependent. For example, in Russian a vowel cannot be followed by "ъ" or "ь"; the hushing sibilants cannot be followed by "я" or "ю"; after several consonants in a row, the probability of a vowel increases; and so on.

We know that when dependent systems are combined, the total entropy is less than the sum of the entropies of the individual systems; therefore, the information conveyed by a stretch of connected text is always less than the information per character multiplied by the number of characters. Taking this into account, a more economical code can be constructed by encoding not each letter separately, but whole "blocks" of letters. In Russian text, for example, it makes sense to encode as a whole some frequently occurring combinations of letters, such as "тся", "ает", "ние". The encoded blocks are arranged in decreasing order of frequency, like the letters in Table 18.8.1, and the binary coding is carried out on the same principle.

In some cases it turns out to be reasonable to encode not just blocks of letters but whole meaningful pieces of text. For example, to relieve the telegraph office around holidays, it is advisable to encode entire standard texts with conventional numbers, such as:

"Happy New Year, I wish you good health and success in your work."

Without dwelling specifically on block coding methods, we confine ourselves to formulating the related Shannon theorem.

Let there be a source of information and a receiver connected by a communication channel (Fig. 18.8.1).

The productivity of the information source is known, i.e., the average number of binary units of information coming from the source per unit of time (numerically equal to the average entropy of the messages produced by the source per unit of time). In addition, the channel capacity is known, i.e., the maximum amount of information (for example, binary symbols 0 or 1) that the channel is capable of transmitting in the same unit of time. The question arises: how large must the channel capacity be for the channel to "cope" with its task, i.e., for information to arrive from the source to the receiver without delay?

The answer to this question is given by Shannon's first theorem, which we formulate here without proof.

Shannon's first theorem

If the capacity C of the communication channel is greater than the entropy H of the information source per unit time,

C > H,

then it is always possible to encode a sufficiently long message so that it is transmitted over the communication channel without delay. If, on the contrary,

C < H,

then the transmission of information without delay is impossible.