All characters and letters can be encoded using eight binary bits. The most common tables for representing letters in binary code are the ASCII and ANSI tables, and both are used for storing text in microprocessor systems. In the ASCII and ANSI tables the first 128 characters are identical: this part of the table contains codes for digits, punctuation marks, upper- and lower-case Latin letters, and control characters. National extensions of the character tables and pseudographic symbols occupy the last 128 codes, and these differ between the tables, which is why Russian text prepared under DOS and under Windows does not match.
When first getting acquainted with computers and microprocessors, the question may arise: how do you convert text to binary code? This is in fact the simplest of operations: any plain-text editor will do, including the Notepad program supplied with the Windows operating system. Similar editors come with all programming environments for languages such as C, Pascal, or Java. Note that the popular word processor Word is not suitable for simply converting text to binary code: it adds a large amount of extra information, such as the color of the letters, italics, underlining, the language of a given phrase, and the font.
It should be noted that, strictly speaking, the combinations of zeros and ones used to encode text are codes rather than binary numbers, because the bits in them do not obey the laws of binary arithmetic. Nevertheless, "representing letters in binary code" is the most common way this topic is searched for on the Internet. Table 1 shows the correspondence between codes and the letters of the Latin alphabet; for brevity, the sequences of zeros and ones in this table are given in decimal and hexadecimal form.
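For illustration, here is a minimal Python sketch (the function names are my own) that produces the binary sequences listed in the table below:

```python
def char_to_binary(ch: str) -> str:
    """Return the 8-bit binary representation of a single character's code."""
    return format(ord(ch), "08b")

def text_to_binary(text: str) -> str:
    """Encode each character of an ASCII string as 8 binary digits."""
    return " ".join(char_to_binary(c) for c in text)

# The letter "A" has decimal code 65, hex 41 (see Table 1):
# char_to_binary("A") -> "01000001"
```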
Table 1. Representation of Latin letters in binary code (ASCII)
Decimal code | Hex code | Display character | Meaning |
---|---|---|---|
0 | 00 | | NUL (null) |
1 | 01 | ☺ | SOH (start of heading) |
2 | 02 | ☻ | STX (start of text) |
3 | 03 | ♥ | ETX (end of text) |
4 | 04 | ♦ | EOT (end of transmission) |
5 | 05 | ♣ | ENQ (enquiry) |
6 | 06 | ♠ | ACK (acknowledgement) |
7 | 07 | • | BEL (bell) |
8 | 08 | ◘ | BS (backspace) |
9 | 09 | ○ | HT (horizontal tab) |
10 | 0A | ◙ | LF (line feed) |
11 | 0B | ♂ | VT (vertical tab) |
12 | 0C | ♀ | FF (form feed, next page) |
13 | 0D | ♪ | CR (carriage return) |
14 | 0E | ♫ | SO (shift out; double width on printers) |
15 | 0F | ☼ | SI (shift in; condensed print on printers) |
16 | 10 | ► | DLE (data link escape) |
17 | 11 | ◄ | DC1 (device control 1) |
18 | 12 | ↕ | DC2 (device control 2; cancel condensed print) |
19 | 13 | ‼ | DC3 (device control 3) |
20 | 14 | ¶ | DC4 (device control 4; cancel double width) |
21 | 15 | § | NAK (negative acknowledgement) |
22 | 16 | ▬ | SYN (synchronous idle) |
23 | 17 | ↨ | ETB (end of transmission block) |
24 | 18 | ↑ | CAN (cancel) |
25 | 19 | ↓ | EM (end of medium) |
26 | 1A | → | SUB (substitute) |
27 | 1B | ← | ESC (escape; starts control sequences) |
28 | 1C | ∟ | FS (file separator) |
29 | 1D | ↔ | GS (group separator) |
30 | 1E | ▲ | RS (record separator) |
31 | 1F | ▼ | US (unit separator) |
32 | 20 | Space | |
33 | 21 | ! | Exclamation mark |
34 | 22 | " | Quotation mark |
35 | 23 | # | Number sign |
36 | 24 | $ | Currency sign (dollar) |
37 | 25 | % | Percent sign |
38 | 26 | & | Ampersand |
39 | 27 | ' | Apostrophe |
40 | 28 | ( | Opening parenthesis |
41 | 29 | ) | Closing parenthesis |
42 | 2A | * | Asterisk |
43 | 2B | + | Plus sign |
44 | 2C | , | Comma |
45 | 2D | - | Minus sign (hyphen) |
46 | 2E | . | Period (dot) |
47 | 2F | / | Slash |
48 | 30 | 0 | Digit zero |
49 | 31 | 1 | Digit one |
50 | 32 | 2 | Digit two |
51 | 33 | 3 | Digit three |
52 | 34 | 4 | Digit four |
53 | 35 | 5 | Digit five |
54 | 36 | 6 | Digit six |
55 | 37 | 7 | Digit seven |
56 | 38 | 8 | Digit eight |
57 | 39 | 9 | Digit nine |
58 | 3A | : | Colon |
59 | 3B | ; | Semicolon |
60 | 3C | < | Less-than sign |
61 | 3D | = | Equals sign |
62 | 3E | > | Greater-than sign |
63 | 3F | ? | Question mark |
64 | 40 | @ | Commercial at |
65 | 41 | A | Latin capital letter A |
66 | 42 | B | Latin capital letter B |
67 | 43 | C | Latin capital letter C |
68 | 44 | D | Latin capital letter D |
69 | 45 | E | Latin capital letter E |
70 | 46 | F | Latin capital letter F |
71 | 47 | G | Latin capital letter G |
72 | 48 | H | Latin capital letter H |
73 | 49 | I | Latin capital letter I |
74 | 4A | J | Latin capital letter J |
75 | 4B | K | Latin capital letter K |
76 | 4C | L | Latin capital letter L |
77 | 4D | M | Latin capital letter M |
78 | 4E | N | Latin capital letter N |
79 | 4F | O | Latin capital letter O |
80 | 50 | P | Latin capital letter P |
81 | 51 | Q | Latin capital letter Q |
82 | 52 | R | Latin capital letter R |
83 | 53 | S | Latin capital letter S |
84 | 54 | T | Latin capital letter T |
85 | 55 | U | Latin capital letter U |
86 | 56 | V | Latin capital letter V |
87 | 57 | W | Latin capital letter W |
88 | 58 | X | Latin capital letter X |
89 | 59 | Y | Latin capital letter Y |
90 | 5A | Z | Latin capital letter Z |
91 | 5B | [ | Opening square bracket |
92 | 5C | \ | Backslash |
93 | 5D | ] | Closing square bracket |
94 | 5E | ^ | Circumflex (caret) |
95 | 5F | _ | Underscore |
96 | 60 | ` | Grave accent (backtick) |
97 | 61 | a | Latin lowercase letter a |
98 | 62 | b | Latin lowercase letter b |
99 | 63 | c | Latin lowercase letter c |
100 | 64 | d | Latin lowercase letter d |
101 | 65 | e | Latin lowercase letter e |
102 | 66 | f | Latin lowercase letter f |
103 | 67 | g | Latin lowercase letter g |
104 | 68 | h | Latin lowercase letter h |
105 | 69 | i | Latin lowercase letter i |
106 | 6A | j | Latin lowercase letter j |
107 | 6B | k | Latin lowercase letter k |
108 | 6C | l | Latin lowercase letter l |
109 | 6D | m | Latin lowercase letter m |
110 | 6E | n | Latin lowercase letter n |
111 | 6F | o | Latin lowercase letter o |
112 | 70 | p | Latin lowercase letter p |
113 | 71 | q | Latin lowercase letter q |
114 | 72 | r | Latin lowercase letter r |
115 | 73 | s | Latin lowercase letter s |
116 | 74 | t | Latin lowercase letter t |
117 | 75 | u | Latin lowercase letter u |
118 | 76 | v | Latin lowercase letter v |
119 | 77 | w | Latin lowercase letter w |
120 | 78 | x | Latin lowercase letter x |
121 | 79 | y | Latin lowercase letter y |
122 | 7A | z | Latin lowercase letter z |
123 | 7B | { | Open curly brace |
124 | 7C | | | vertical bar |
125 | 7D | } | Close curly brace |
126 | 7E | ~ | Tilde |
127 | 7F | ⌂ | DEL (delete) |
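The printable part of Table 1 can be regenerated with a short script (a sketch; the row formatting is simplified):

```python
def ascii_row(code: int) -> str:
    """Format one row of Table 1: decimal code, hex code and the character.
    Codes 0-31 and 127 are control characters and are left blank here."""
    ch = chr(code) if 32 <= code < 127 else ""
    return f"{code} | {code:02X} | {ch}"

# ascii_row(65) -> "65 | 41 | A"
for code in range(32, 128):
    print(ascii_row(code))
```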
The classic ASCII table contains no Russian letters and uses only 7 bits. Later the table was extended to 8 bits, and Russian letters in binary code, along with pseudographic symbols, appeared in its upper 128 rows. In general this second half holds the national alphabets of different countries, and the Russian letters there are just one of the possible sets (855); there are also, for example, French (863), German (1141), and Greek (737) tables. Table 2 shows one representation of Russian letters in binary code.
Table 2. Representation of Russian letters in binary code (extended ASCII)
Decimal code | Hex code | Display character | Meaning |
---|---|---|---|
128 | 80 | А | Capital Russian letter A |
129 | 81 | Б | Capital Russian letter B |
130 | 82 | В | Capital Russian letter V |
131 | 83 | Г | Capital Russian letter G |
132 | 84 | Д | Capital Russian letter D |
133 | 85 | Е | Capital Russian letter E |
134 | 86 | Ж | Capital Russian letter Zh |
135 | 87 | З | Capital Russian letter Z |
136 | 88 | И | Capital Russian letter I |
137 | 89 | Й | Capital Russian letter Y (short I) |
138 | 8A | К | Capital Russian letter K |
139 | 8B | Л | Capital Russian letter L |
140 | 8C | М | Capital Russian letter M |
141 | 8D | Н | Capital Russian letter N |
142 | 8E | О | Capital Russian letter O |
143 | 8F | П | Capital Russian letter P |
144 | 90 | Р | Capital Russian letter R |
145 | 91 | С | Capital Russian letter S |
146 | 92 | Т | Capital Russian letter T |
147 | 93 | У | Capital Russian letter U |
148 | 94 | Ф | Capital Russian letter F |
149 | 95 | Х | Capital Russian letter Kh |
150 | 96 | Ц | Capital Russian letter Ts |
151 | 97 | Ч | Capital Russian letter Ch |
152 | 98 | Ш | Capital Russian letter Sh |
153 | 99 | Щ | Capital Russian letter Shch |
154 | 9A | Ъ | Capital Russian letter Ъ (hard sign) |
155 | 9B | Ы | Capital Russian letter Y |
156 | 9C | Ь | Capital Russian letter Ь (soft sign) |
157 | 9D | Э | Capital Russian letter E |
158 | 9E | Ю | Capital Russian letter Yu |
159 | 9F | Я | Capital Russian letter Ya |
160 | A0 | а | Lowercase Russian letter a |
161 | A1 | б | Lowercase Russian letter b |
162 | A2 | в | Lowercase Russian letter v |
163 | A3 | г | Lowercase Russian letter g |
164 | A4 | д | Lowercase Russian letter d |
165 | A5 | е | Lowercase Russian letter e |
166 | A6 | ж | Lowercase Russian letter zh |
167 | A7 | з | Lowercase Russian letter z |
168 | A8 | и | Lowercase Russian letter i |
169 | A9 | й | Lowercase Russian letter y (short i) |
170 | AA | к | Lowercase Russian letter k |
171 | AB | л | Lowercase Russian letter l |
172 | AC | м | Lowercase Russian letter m |
173 | AD | н | Lowercase Russian letter n |
174 | AE | о | Lowercase Russian letter o |
175 | AF | п | Lowercase Russian letter p |
176 | B0 | ░ | |
177 | B1 | ▒ | |
178 | B2 | ▓ | |
179 | B3 | │ | Pseudo symbol |
180 | B4 | ┤ | Pseudo symbol |
181 | B5 | ╡ | Pseudo symbol |
182 | B6 | ╢ | Pseudo symbol |
183 | B7 | ╖ | Pseudo symbol |
184 | B8 | ╕ | Pseudo symbol |
185 | B9 | ╣ | Pseudo symbol |
186 | BA | ║ | Pseudo symbol |
187 | BB | ╗ | Pseudo symbol |
188 | BC | ╝ | Pseudo symbol |
189 | BD | ╜ | Pseudo symbol |
190 | BE | ╛ | Pseudo symbol |
191 | BF | ┐ | Pseudo symbol |
192 | C0 | └ | Pseudo symbol |
193 | C1 | ┴ | Pseudo symbol |
194 | C2 | ┬ | Pseudo symbol |
195 | C3 | ├ | Pseudo symbol |
196 | C4 | ─ | Pseudo symbol |
197 | C5 | ┼ | Pseudo symbol |
198 | C6 | ╞ | Pseudo symbol |
199 | C7 | ╟ | Pseudo symbol |
200 | C8 | ╚ | Pseudo symbol |
201 | C9 | ╔ | Pseudo symbol |
202 | CA | ╩ | Pseudo symbol |
203 | CB | ╦ | Pseudo symbol |
204 | CC | ╠ | Pseudo symbol |
205 | CD | ═ | Pseudo symbol |
206 | CE | ╬ | Pseudo symbol |
207 | CF | ╧ | Pseudo symbol |
208 | D0 | ╨ | Pseudo symbol |
209 | D1 | ╤ | Pseudo symbol |
210 | D2 | ╥ | Pseudo symbol |
211 | D3 | ╙ | Pseudo symbol |
212 | D4 | ╘ | Pseudo symbol |
213 | D5 | ╒ | Pseudo symbol |
214 | D6 | ╓ | Pseudo symbol |
215 | D7 | ╫ | Pseudo symbol |
216 | D8 | ╪ | Pseudo symbol |
217 | D9 | ┘ | Pseudo symbol |
218 | DA | ┌ | Pseudo symbol |
219 | DB | █ | |
220 | DC | ▄ | |
221 | DD | ▌ | |
222 | DE | ▐ | |
223 | DF | ▀ | |
224 | E0 | р | Lowercase Russian letter r |
225 | E1 | с | Lowercase Russian letter s |
226 | E2 | т | Lowercase Russian letter t |
227 | E3 | у | Lowercase Russian letter u |
228 | E4 | ф | Lowercase Russian letter f |
229 | E5 | х | Lowercase Russian letter kh |
230 | E6 | ц | Lowercase Russian letter ts |
231 | E7 | ч | Lowercase Russian letter ch |
232 | E8 | ш | Lowercase Russian letter sh |
233 | E9 | щ | Lowercase Russian letter shch |
234 | EA | ъ | Lowercase Russian letter ъ (hard sign) |
235 | EB | ы | Lowercase Russian letter y |
236 | EC | ь | Lowercase Russian letter ь (soft sign) |
237 | ED | э | Lowercase Russian letter e |
238 | EE | ю | Lowercase Russian letter yu |
239 | EF | я | Lowercase Russian letter ya |
240 | F0 | Ё | Capital Russian letter Ё |
241 | F1 | ё | Lowercase Russian letter ё |
242 | F2 | Є | |
243 | F3 | є | |
244 | F4 | Ї | |
245 | F5 | ї | |
246 | F6 | Ў | |
247 | F7 | ў | |
248 | F8 | ° | degree sign |
249 | F9 | ∙ | Multiplication sign (dot) |
250 | FA | · | |
251 | FB | √ | Radical (root sign) |
252 | FC | № | Number sign |
253 | FD | ¤ | Currency sign (ruble) |
254 | FE | ■ | |
255 | FF | | |
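Python ships a codec for the "alternative" Russian DOS code page under the name "cp866", whose layout matches the rows shown here; assuming that codec, Table 2 can be checked directly (helper names are my own):

```python
def cp866_code(ch: str) -> int:
    """Single-byte code of a character in the DOS Russian code page cp866."""
    return ch.encode("cp866")[0]

def cp866_char(code: int) -> str:
    """Character displayed for a given byte value in cp866."""
    return bytes([code]).decode("cp866")

# cp866_code("А") -> 128 (hex 80), as in the first row of Table 2
```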
When writing text, in addition to the codes that directly represent letters, codes are used to move to a new line and to return the cursor (carriage return) to position zero of the line. These characters are usually used together; their codes correspond to the decimal numbers 10 (0A) and 13 (0D). As an example, a section of the text of this page (a memory dump) containing the first paragraph is shown below. The following format is used to display information in a memory dump:
- the first column contains the binary address of the first byte of the line;
- the next sixteen columns contain the bytes of the text file, shown in hexadecimal for brevity; a vertical line is drawn after the eighth column to make byte positions easier to count;
- the last column shows these same bytes as displayed characters.
In this example the first line of text occupies 80 bytes. The first byte, 82, corresponds to the letter "В"; the second byte, E1, to the letter "с"; the third byte, A5, to the letter "е". The next byte, 20, represents the space between words. Bytes 81 and 82 of the line contain the carriage-return and line-feed characters 0D 0A, found at binary address 00000050. The next line of the source text is not a multiple of 16 (it is 76 characters long), so to find its end you first locate the row 000000E0 and count nine columns from it; the bytes 0D 0A are written there again. The rest of the text is parsed in exactly the same way.
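A dump in the format described above can be produced with a sketch like this (the vertical divider after the eighth byte is omitted for simplicity):

```python
def hex_dump(data: bytes, width: int = 16) -> list[str]:
    """Render bytes as dump lines: address, hex bytes, printable characters."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hexpart = " ".join(f"{b:02X}" for b in chunk)
        # Non-printable bytes (control codes) are shown as dots.
        textpart = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}: {hexpart:<{width * 3 - 1}} {textpart}")
    return lines

# hex_dump(b"Hello")[0] starts with "00000000: 48 65 6C 6C 6F"
```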
The binary number system is used in computing because it is the simplest and meets the following requirements:
- The fewer values that exist in the system, the easier it is to make individual elements that operate on these values. In particular, two digits of the binary number system can be easily represented by many physical phenomena: there is current - there is no current, the magnetic field induction is greater than the threshold value or not, etc.
- The lower the number of states for an element, the higher the noise immunity and the faster it can work. For example, to encode three states through the value of the magnetic field induction, it will be necessary to enter two threshold values, which will not contribute to the noise immunity and reliability of information storage.
- Binary arithmetic is quite simple: the addition and multiplication tables, the basic operations on numbers, are short and easy to implement.
- It is possible to use the apparatus of the algebra of logic to perform bitwise operations on numbers.
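For example, in Python the logic operations mentioned in the last point act on all bits of a number in parallel:

```python
# Bitwise Boolean operations on 4-bit values.
a = 0b1100
b = 0b1010
conj = a & b   # bitwise AND -> 0b1000
disj = a | b   # bitwise OR  -> 0b1110
excl = a ^ b   # bitwise XOR -> 0b0110
```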
I decided to build a tool for converting text to binary code and back. Such services exist, but they usually handle only Latin characters; my translator works with the UTF-8 Unicode encoding, which encodes Cyrillic characters in two bytes. At the moment it cannot translate Chinese characters, but I intend to fix that.
To convert text to its binary representation, enter the text in the left box and press TEXT->BIN; the binary representation will appear in the right box.
To convert binary code to text, enter the code in the right box and press BIN->TEXT; the character representation will appear in the left box.
If the conversion did not work in either direction, check that your input data is valid!
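Assuming the tool works as described, the two buttons correspond to conversions like these (function names are mine):

```python
def text_to_bin(text: str) -> str:
    """TEXT->BIN: each UTF-8 byte becomes eight binary digits."""
    return " ".join(format(b, "08b") for b in text.encode("utf-8"))

def bin_to_text(bits: str) -> str:
    """BIN->TEXT: groups of eight binary digits are decoded back as UTF-8."""
    data = bytes(int(group, 2) for group in bits.split())
    return data.decode("utf-8")

# Cyrillic letters occupy two bytes each in UTF-8, as noted above.
```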
Update!
The reverse transformation of such "artistic" binary text into normal form is now available. To do this, check the box "Replace 0 with spaces and 1 with placeholder █", paste the text into the right box ("Text in binary representation"), and press the BIN->TEXT button below it.
When copying such texts you need to be careful, because it is easy to lose spaces at the beginning or end of a line. For example, the line above looks like:
██ █ █ ███████ █ ██ ██ █ █ ███ ██ █ █ ██ █ ██ █ █ ██ █ ███ █ ██ █ █ ██ █ █ ███ ██ █ █ ███ ██ █ ██
and on a red background:
██ █ █ ███████ █ ██ ██ █ █ ███ ██ █ █ ██ █ ██ █ █ ██ █ ███ █ ██ █ █ ██ █ █ ███ ██ █ █ ███ ██ █ ██
Notice how many trailing spaces could be lost?
Computers don't understand words and numbers the way humans do. Modern software allows the end user to ignore this, but at the lowest levels your computer operates with a binary electrical signal that has only two states: there is current or no current. To "understand" complex data, your computer must encode it in binary.
The binary system is based on two digits, 1 and 0, corresponding to the on and off states your computer can understand. You are probably familiar with the decimal system: it uses ten digits, 0 through 9, and then moves to the next place to form multi-digit numbers, with each place worth ten times the one before it. The binary system is similar, except each place is worth twice the previous one.
Counting in Binary
In binary, the first digit is equivalent to 1 in decimal. The second digit is 2, the third is 4, the fourth is 8, and so on—doubling each time. Adding all these values will give you a number in decimal format.
1111 (binary) = 8 + 4 + 2 + 1 = 15 (decimal)
Including 0, four binary bits give 16 possible values; eight bits give 256. Binary takes more space to write out, since four decimal digits already give 10,000 possible values, but computer hardware handles binary far more naturally than decimal. And for some things, like logic processing, binary is better suited than decimal.
It is worth mentioning another base used in programming: hexadecimal. Computers do not work in hexadecimal, but programmers use it to represent binary values in a human-readable form when writing code, because two hexadecimal digits can represent a whole byte, replacing eight binary digits. The hexadecimal system uses the digits 0-9 plus the letters A through F for the six extra values.
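These relationships between binary, decimal, and hexadecimal are easy to check in Python:

```python
# The same values written in binary, decimal and hexadecimal.
n = 0b1111                       # four ones: 8 + 4 + 2 + 1 = 15
byte = int("11111111", 2)        # eight bits: 256 values, 0..255
hex_form = format(byte, "02X")   # one byte fits in two hex digits: "FF"
```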
Why computers use binary
Short answer: hardware and the laws of physics. Every character in your computer is an electrical signal, and in the early days of computing, measuring electrical signals was much more difficult. It was more reasonable to distinguish between only the "on" state, represented by a negative charge, and the "off" state, represented by a positive charge.
For those who don't know why "off" is represented by a positive charge, this is because electrons have a negative charge, and more electrons means more current with a negative charge.
Thus early room-sized computers used binary to build their systems, and although they used older, bulkier equipment, they operated on the same fundamental principles. Modern computers use what are called transistors to perform calculations with binary code.
A typical transistor works roughly as follows: it allows current to flow from the source to the drain when a signal is applied to the gate. This forms a binary switch. Manufacturers can make these transistors as small as 5 nanometers, about the width of two strands of DNA. This is how modern processors work, and even they can have trouble distinguishing between on and off states (though that is mostly due to their near-molecular size and the oddities of quantum mechanics).
Why only the binary system?
So you might be thinking: why only 0 and 1? Why not add another digit? Although this is partly down to the traditions of computer design, adding another digit would mean distinguishing another state of the current, not just "off" or "on".
The problem is that if you want to use multiple voltage levels, you need a way to perform calculations with them easily, and hardware that can do so is not yet viable as a replacement for binary computation. Ternary computers were in fact built in the 1950s, but development stopped there. Ternary logic is more efficient than binary, yet there is as yet no effective replacement for the binary transistor, or at least none anywhere near as tiny.
The reason we cannot use ternary logic also comes down to how transistors are connected in a computer and how they are used for computation: a gate receives information on two inputs, performs an operation, and returns one output.
Thus, binary mathematics is easier for a computer than anything else. Boolean logic maps directly onto binary systems, with True and False corresponding to the on and off states.
A truth table for a two-input binary gate has four rows, one per input combination; a two-input ternary gate's table has nine. And while there are 16 possible two-input binary operators (2^(2^2)), a ternary system has 19,683 (3^(3^2)). Scaling becomes a problem: although ternary is more efficient per digit, it is also exponentially more complex.
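The gate counts can be verified directly (a sketch: base ** (base ** 2) counts one output choice per input pair of a two-input gate):

```python
# Distinct two-input gates: choose one of `base` outputs
# for each of the base*base possible input pairs.
binary_ops = 2 ** (2 ** 2)    # 2 outputs over 4 input pairs
ternary_ops = 3 ** (3 ** 2)   # 3 outputs over 9 input pairs
```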
Who knows? In the future we may well see ternary computers if binary logic runs into the limits of miniaturization. For now, the world will continue to run in binary mode.
From "Informatics, Grade 7" (Bosova): binary code capacity; converting information from continuous to discrete form; universality of binary coding; uniform and non-uniform codes.
1.5.1. Converting information from continuous to discrete form
To solve their problems, a person often has to convert the available information from one form of representation to another. For example, when reading aloud, information is converted from discrete (text) form to continuous (sound). During a dictation in a Russian language lesson, on the contrary, information is transformed from a continuous form (the teacher's voice) to a discrete one (students' notes).
Information presented in a discrete form is much easier to transfer, store or automatically process. Therefore, in computer technology, much attention is paid to methods for converting information from a continuous form to a discrete one.
Discretization of information is the process of converting information from a continuous form of representation to a discrete one.
Consider the essence of the process of discretization of information on an example.
Meteorological stations have self-recording instruments for continuous recording of atmospheric pressure. The result of their work are barograms - curves showing how pressure has changed over long periods of time. One of such curves drawn by the instrument during seven hours of observations is shown in Fig. 1.9.
Based on the information obtained, it is possible to construct a table containing the instrument readings at the beginning of measurements and at the end of each hour of observations (Fig. 1.10).
The resulting table does not provide a complete picture of how the pressure changed during the observation period: for example, the highest pressure value that occurred during the fourth hour of observation is not indicated. But if you enter in the table the pressure values observed every half hour or 15 minutes, then the new table will give a more complete picture of how the pressure has changed.
Thus we converted information presented in continuous form (a barogram, a curve) into discrete form (a table), with some loss of accuracy.
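The barogram example can be mimicked in code: a sketch (the function and step values are illustrative) that samples a continuous function into a table of readings:

```python
import math

def discretize(f, t0: float, t1: float, step: float) -> list[tuple[float, float]]:
    """Sample a continuous function f at regular intervals,
    turning a curve into a table of (time, value) rows."""
    rows = []
    t = t0
    while t <= t1 + 1e-9:          # small epsilon guards float drift
        rows.append((round(t, 6), f(t)))
        t += step
    return rows

# Halving the step gives a table twice as detailed,
# just as sampling the barogram every 15 minutes instead of hourly would.
```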
In the future, you will get acquainted with the methods of discrete presentation of sound and graphic information.
Chains of three binary characters are obtained by appending the character 0 or 1 to each two-character binary code. This gives 8 code combinations of three binary characters, twice as many as of two:
Accordingly, a four-bit binary code gives 16 code combinations, a five-bit code 32, a six-bit code 64, and so on. The length of a binary chain (the number of characters in the binary code) is called the bit depth of the binary code.
Note that:
4 = 2 * 2,
8 = 2 * 2 * 2,
16 = 2 * 2 * 2 * 2,
32 = 2 * 2 * 2 * 2 * 2 etc.
Here, the number of code combinations is the product of a certain number of identical factors equal to the bit depth of the binary code.
If the number of code combinations is denoted by N and the bit depth of the binary code by i, then the pattern just observed can be written in general form as:
N = 2 × 2 × … × 2 (i factors).
In mathematics such products are written as N = 2^i.
The expression 2^i is read "2 to the power i".
A task. The leader of the Multi tribe instructed his minister to develop a binary code and translate all important information into it. What bit depth would be required if the alphabet used by the Multi tribe contains 16 characters? Write out all the code combinations.
Solution. Since the alphabet of the Multi tribe consists of 16 characters, 16 code combinations are needed. The length (bit depth) of the binary code is then determined from the relation 16 = 2^i, hence i = 4.
To write out all code combinations of four 0s and 1s, we use the diagram in Fig. 1.13: 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111.
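The same sixteen combinations can be generated programmatically (a sketch using the standard library):

```python
from itertools import product

def all_codes(i: int) -> list[str]:
    """All binary code combinations of bit depth i; there are 2**i of them."""
    return ["".join(bits) for bits in product("01", repeat=i)]

# all_codes(4) lists 0000 through 1111 in order.
```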
1.5.3. Versatility of Binary Encoding
At the beginning of this section you learned that information, whatever its original form, can be expressed using the symbols of some natural or formal language. In turn, the characters of an arbitrary alphabet can be converted to binary code. Thus, with the help of binary code, any information in natural and formal languages can be represented, as well as images and sounds (Fig. 1.14). This is what is meant by the universality of binary coding.
Binary codes are widely used in computer technology, requiring only two states of the electronic circuit - "on" (corresponding to the digit 1) and "off" (corresponding to the digit 0).
Simplicity of technical implementation is the main advantage of binary coding. The disadvantage of binary encoding is the large length of the resulting code.
1.5.4. Uniform and non-uniform codes
Codes may be uniform or non-uniform: uniform codes use the same number of characters in every code combination, non-uniform codes use different numbers.
Above we have considered uniform binary codes.
An example of a non-uniform code is Morse code, in which a sequence of short and long signals is defined for each letter and digit. So the letter E corresponds to a short signal (a "dot"), and the letter Ш to four long signals (four "dashes"). Non-uniform codes allow messages to be transmitted faster because the most frequently occurring symbols are given the shortest code combinations.
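A few standard Morse codewords illustrate the idea (the dictionary here is partial, for illustration only; frequent letters like E and T get the shortest codes):

```python
# A small excerpt from Morse code.
MORSE = {"E": ".", "T": "-", "A": ".-", "N": "-.", "S": "...", "O": "---"}

def morse_encode(text: str, sep: str = " ") -> str:
    """Encode a string using the partial Morse dictionary above."""
    return sep.join(MORSE[ch] for ch in text)

# morse_encode("SOS") -> "... --- ..."
```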
The information carried by one such symbol equals the entropy of the system and is maximal when both states are equally probable; in that case the elementary symbol carries 1 bit (binary unit) of information. Therefore the basis of optimal coding is the requirement that the elementary characters of the encoded text occur, on average, equally often.
Let us describe here a method for constructing a code that satisfies the stated condition; this method is known as the Shannon-Fano code. Its idea is that the encoded characters (letters or combinations of letters) are divided into two approximately equally probable groups: for the first group of characters, 0 is placed in the first place of the combination (the first character of the binary number representing the character); for the second group - 1. Further, each group is again divided into two approximately equally probable subgroups; for symbols of the first subgroup, zero is put in the second place; for the second subgroup - one, etc.
Let us demonstrate the principle of constructing a Shannon-Fano code using the Russian alphabet (Table 18.8.1), with the letters arranged in descending order of frequency. Counting the first six letters (from "-" to "t") and summing their probabilities (frequencies), we get 0.498; all the remaining letters (from "n" to "f") have a combined probability of about 0.502. The first six letters receive the binary digit 0 in the first position; the remaining letters receive 1. Next, the first group is again divided into two approximately equiprobable subgroups (from "-" to "o" and from "e" to "t"); the letters of the first subgroup receive 0 in the second position, those of the second subgroup receive 1, and so on. The process continues until exactly one letter remains in each subgroup, at which point that letter is encoded by the resulting binary string. The construction mechanism is shown in Table 18.8.2, and the code itself is given in Table 18.8.3.
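As a sketch (my own helper, not from the textbook), the splitting procedure just described can be implemented recursively:

```python
def shannon_fano(symbols):
    """Build a Shannon-Fano code from (symbol, probability) pairs,
    given in descending order of probability. Returns {symbol: codeword}."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    # Choose the split that makes the two groups' probabilities closest.
    best_split, best_diff = 1, float("inf")
    for k in range(1, len(symbols)):
        left = sum(p for _, p in symbols[:k])
        diff = abs(2 * left - total)
        if diff < best_diff:
            best_split, best_diff = k, diff
    code = {}
    for sym, word in shannon_fano(symbols[:best_split]).items():
        code[sym] = "0" + word   # first (more probable) group gets 0
    for sym, word in shannon_fano(symbols[best_split:]).items():
        code[sym] = "1" + word   # second group gets 1
    return code
```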
Table 18.8.2.
Table 18.8.3
Using Table 18.8.3, any message can be encoded and decoded.
As an example, let us write the phrase "information theory" in binary code:
01110100001101000110110110000
0110100011111111100110100
1100001011111110101100110
Note that there is no need to separate the letters from each other with a special marker: decoding is unambiguous even without one, since no codeword is a prefix of another. You can verify this by decoding the following phrase using Table 18.8.2:
10011100110011001001111010000
1011100111001001101010000110101
010110000110110110
("coding method").
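Unambiguous decoding of such a prefix code can be sketched as a greedy reader (the helper name and example code table are my own):

```python
def decode_prefix(bits: str, code: dict) -> str:
    """Decode a stream of 0/1 characters with a prefix code: accumulate
    digits until they match a codeword, emit its symbol, and continue."""
    inverse = {word: sym for sym, word in code.items()}
    result, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            result.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(result)
```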
However, note that with such a code any error (an accidental swap of the characters 0 and 1) is fatal: decoding of all text following the error becomes impossible. Therefore this coding principle can be recommended only when errors in encoding and transmitting the message are practically excluded.
A natural question arises: in the absence of errors, is the code we have constructed really optimal? To answer it, let us find the average information per elementary symbol (0 or 1) and compare it with the maximum possible information, which equals one binary unit. First we find the average information contained in one letter of the transmitted text, i.e. the entropy per letter:

H = -∑ p_i log2 p_i,

where p_i is the probability that a letter takes a particular value ("-", o, e, a, ..., f). From Table 18.8.1 we obtain this entropy in binary units per letter of text.
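The entropy per letter defined above can be computed with a short helper (a sketch; any probability list will do):

```python
from math import log2

def entropy(probs):
    """Entropy in binary units (bits): H = -sum(p * log2(p))."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Two equiprobable states carry exactly 1 binary unit of information.
```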
Using Table 18.8.2, we determine the average number of elementary symbols per letter. Dividing the entropy by this number, we obtain the information per elementary symbol (in binary units). Thus, the information per symbol is very close to its upper limit of 1, and the chosen code is very close to optimal: staying within the task of coding letter by letter, we can do no better.
Note that if we simply encoded the ordinal numbers of the letters in binary, we would need five binary characters per letter, and the information per symbol (in binary units) would be noticeably less than with optimal letter coding.
However, it should be noted that letter-by-letter coding is not economical at all. The point is that there is always a dependence between neighboring letters of any meaningful text. For example, in Russian a vowel cannot be followed by "ъ" or "ь"; hissing consonants cannot be followed by "я" or "ю"; after several consonants in a row, the probability of a vowel increases, and so on.
We know that when dependent systems are combined, the total entropy is less than the sum of the entropies of the individual systems; therefore the information conveyed by a stretch of connected text is always less than the information per character multiplied by the number of characters. Given this, a more economical code can be constructed by encoding whole "blocks" of letters rather than each letter separately; in Russian text, for instance, it makes sense to encode frequently occurring letter combinations as wholes. The encoded blocks are arranged in descending order of frequency, like the letters in Table 18.8.1, and binary encoding proceeds by the same principle.
In some cases, it turns out to be reasonable to encode not even blocks of letters, but whole meaningful pieces of text. For example, to unload the telegraph office on holidays, it is advisable to encode entire standard texts with conditional numbers, such as:
"Happy New Year, I wish you good health and success in your work."
Without dwelling specifically on block coding methods, we confine ourselves to formulating the related Shannon theorem.
Let there be a source of information and a receiver connected by a communication channel (Fig. 18.8.1).
The performance of the information source is known: the average number of binary units of information produced by the source per unit of time (numerically equal to the average entropy of the messages the source produces per unit of time). Let the channel capacity also be known: the maximum amount of information (for example, binary symbols 0 or 1) that the channel can transmit in the same unit of time. The question arises: how large must the channel capacity be for the channel to "cope" with its task, i.e. for information to travel from source to receiver without delay?
The answer to this question is given by Shannon's first theorem. We formulate it here without proof.
Shannon's first theorem. If the capacity C of the communication channel is greater than the entropy H of the information source per unit of time,

C > H,

then it is always possible to encode a sufficiently long message so that it is transmitted over the channel without delay. If, on the contrary,

C < H,

then transmission of the information without delay is impossible.