Scalar data type | Perl

The scalar data type in Perl represents numeric data (numbers) and sequences of characters called strings. In a program, such data is written using literal constants, or literals, which are either numeric or string.

Numeric literals are used to represent the ordinary numbers needed to implement an algorithm in a Perl program. Usually, numbers with a base of ten or decimal numbers are used, but the language allows you to use both octal (with a base of eight) and hexadecimal (with a base of sixteen) numbers, which are useful when working with the contents of computer memory in the process of solving some system problems.

Decimal numbers can be integers or fractional real numbers, which in programming are often referred to as floating-point numbers because of the way they are represented and stored in computer memory. The corresponding literals are no different from writing similar numbers in mathematics: a sequence of digits without spaces for integers and a sequence of digits in which a point separates an integer part from a fractional part for real numbers (Example 3.1).

Example 3.1. Numeric literals.

123 # Integer decimal number.
234.89 # Real number.
0.6780 # Real number with a zero integer part.
.678 # The leading zero can be omitted.
1_000_000.67 # The underscore character can be used to separate
# groups of digits in the integer part of the number.

For floating-point real numbers, the exponential form of the notation can also be used:

[digits].[digits][E|e][+|-][digits]

This form of notation means that the value of the number is obtained by multiplying its mantissa, written as a real number with a decimal point ([digits].[digits]), by ten raised to the signed power given in the exponent part after the symbol E or e (Example 3.2).

Example 3.2. Exponential form of writing real numbers.

10.67E56 # The "+" sign in the exponent can be omitted.
10.67e+06 # Writing the sign makes the exponent easier to read.
1e-323 # A number close to machine zero.
1e+308 # A number close to machine infinity.
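
The literal forms from Examples 3.1 and 3.2 can be checked with a short script. This is only a minimal sketch; the variable names are illustrative and do not come from the text.

```perl
use strict;
use warnings;

# Variable names are illustrative only.
my $integer    = 123;            # integer decimal literal
my $real       = .678;           # leading zero omitted, same as 0.678
my $million    = 1_000_000.67;   # underscores are ignored by the parser
my $scientific = 10.67e+06;      # mantissa 10.67 times ten to the power 6

print "$integer\n";      # 123
print "$real\n";         # 0.678
print "$million\n";      # 1000000.67
print "$scientific\n";   # 10670000
```

The underscore and the exponent are purely notational: once parsed, the values are ordinary numbers and are printed in plain decimal form.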

The Perl interpreter stores integers exactly when they fit in the machine's native integer type and represents all other numbers in double-precision floating-point format. For floating-point values this means that you cannot realistically specify more than about sixteen significant digits of the mantissa, and the decimal exponent is limited to the range from roughly -323 to +308. The interpreter does not report an error if the mantissa has more than 16 digits or the exponent lies outside this range, but when such numbers are displayed the mantissa is rounded to about fifteen or sixteen significant digits. If the exponent is below the lower limit, zero is displayed; if it is above the upper limit, a special value denoting an infinitely large number is used, printed as Inf on most platforms and as 1.#INF by some Windows builds. Because of this, very large and very small numbers do not cause the overflow and underflow errors found in many other programming languages. An integer with more digits than the native integer type can hold (more than 15 digits on the builds described here, about 19 on builds with 64-bit integers) is converted to floating point and is displayed in exponential form.
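
The limits described above can be observed directly. The following is a minimal sketch with illustrative variable names; the exact text printed for infinity and the number of digits shown depend on the platform and on how the interpreter was built, so no exact output is claimed for those lines.

```perl
use strict;
use warnings;

# Variable names are illustrative only.
my $huge = 9**9**9;                  # far beyond the double range: infinity
my $tiny = 1e-400;                   # below the smallest double: becomes 0
my $long = 0.12345678901234567890;   # more digits than a double can hold

print "overflow:  $huge\n";   # platform dependent text, e.g. Inf or 1.#INF
print "underflow: $tiny\n";   # 0
print "mantissa:  $long\n";   # rounded to roughly 15 significant digits
```

Neither the overflow nor the underflow stops the program, which is the behavior the paragraph describes.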

Some system settings, and the analysis of some system parameters, are easier to handle using numbers in the octal or hexadecimal number systems. The notation for such numbers follows the C language: any integer starting with the digit zero ("0") is treated by the interpreter as an octal integer, and the characters immediately following the combination "0x" are treated as hexadecimal digits. Octal numbers cannot contain a digit greater than 7, and hexadecimal numbers use, in addition to the ten digits 0 to 9, the letters A or a, B or b, C or c, D or d, E or e, F or f for the remaining digits (Example 3.3).

Example 3.3. Octal and hexadecimal numbers.

010 # Octal 10 equals decimal 8.
0x10 # Hexadecimal 10 equals decimal 16.
0239 # Will cause an interpretation error: the digit 9 cannot be used.
0xA1FF # Corresponds to 41471 decimal.
0xGA # Will cause an interpretation error: the letter G cannot be used.

Hexadecimal digits are one of the few places in Perl where uppercase and lowercase letters are equivalent (the exponent marker E or e is another); elsewhere, for example in identifiers, case matters.

Write the hexadecimal prefix with a lowercase letter, as "0x"; the uppercase form "0X" should not be relied on.
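
The following minimal sketch shows how octal and hexadecimal literals are converted to ordinary numbers as soon as they are parsed. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Variable names are illustrative only.
my $octal = 010;      # octal 10
my $hex   = 0x10;     # hexadecimal 10
my $mixed = 0xa1Ff;   # hexadecimal digits may be upper or lower case

print "$octal $hex $mixed\n";   # 8 16 41471
```

Note that print shows the values in decimal; the base used in the literal is not remembered.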

String literals, or simply strings, are sequences of characters enclosed in single ('), double ("), or reverse (`) quotation marks and treated as a single unit. The use of single and double quotation marks for strings is similar to their use in UNIX command shells.

In a string bounded by single quotation marks, you cannot use escape (control) sequences, and you cannot substitute the value of a variable into it. The only exceptions are two control sequences: (\') and (\\). The first places a single quotation mark in the string itself; without it, the interpreter would treat the first single quotation mark it encounters as the end of the string. The second places the backslash itself in the string. Examples of string literals bounded by single quotation marks are given in Table 3.2.

Table 3.2. String literals bounded by single quotation marks.

| String literal | Displayed as | Comment |
|---|---|---|
| 'Simple line #1' | Simple line #1 | String without control sequences |
| '\'perl.exe\'' | 'perl.exe' | String with single quotation marks |
| 'D:\\perl.exe' | D:\perl.exe | String with a backslash |
| 'sequence \n' | sequence \n | The \n control sequence is not interpreted and is displayed as typed |
| 'Breakfast<Enter>Ham sandwich<Enter>A cup of coffee' | Breakfast, Ham sandwich, A cup of coffee, each on its own line | A multiline string literal is displayed on multiple lines (<Enter> marks a line break typed in the program text) |

Escape sequences consist of a backslash followed by a letter or a combination of digits. The backslash changes the meaning of the character that follows it; together they form a single unit that performs a specific action when sent to the display device, such as starting a new line (\n). A combination of digits is treated as the ASCII code of the character to be displayed. The name of these sequences comes from the English word "escape", here meaning to change the meaning. They are also called control sequences.

A string literal can span multiple lines of a program (see the last literal in Table 3.2). To do this, when you type it from the keyboard, use the Enter key to switch to a new line.

Multiline literals are displayed on as many lines as they are specified. This means that a new-line character entered from the keyboard is stored in a character literal bounded by single quotation marks. It should be noted that this is also true for string literals bounded by double quotation marks.
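
The rules for single quotation marks can be tried out as follows. This is a minimal sketch that reuses the literals from Table 3.2; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Variable names are illustrative only.
my $quoted    = 'It\'s a \'quoted\' word';   # \' yields a single quote
my $path      = 'D:\\perl.exe';              # \\ yields one backslash
my $raw       = 'sequence \n';               # \n stays as two characters
my $multiline = 'Breakfast
Ham sandwich
A cup of coffee';                            # typed line breaks are kept

print "$quoted\n$path\n$raw\n$multiline\n";
```

The last literal contains two newline characters, so it is printed on three lines.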

Double quotation marks allow you to insert and interpret control sequences, and to substitute the values of variables that contain scalars or lists. Control sequences (Table 3.3) can be interpreted as newline characters, tabs, etc., when outputting lines, and can change the case of the letters following them.

Table 3.3. Control sequences.

| Control sequence | Meaning |
|---|---|
| \b | Backspace |
| \e | ESC character |
| \f | Form feed |
| \n | Newline |
| \r | Carriage return |
| \t | Horizontal tab |
| \v | Not recognized in Perl string literals (yields the letter v); write a vertical tab as \x0B or \cK |
| \$ | Dollar sign |
| \@ | At sign (commercial at) |
| \nnn | Character with the octal code nnn (for example, \033) |
| \xnn | Character with the hexadecimal code nn |
| \cn | Control character Ctrl+n; for example, \cC corresponds to Ctrl+C |
| \l | Converts the following character to lowercase |
| \u | Converts the following character to uppercase |
| \L | Converts the sequence of characters that follows it, up to the \E control sequence, to lowercase |
| \Q | Inserts a backslash before each non-alphanumeric character in the sequence of characters that follows it, up to the \E control sequence |
| \U | Converts the sequence of characters that follows it, up to the \E control sequence, to uppercase |
| \E | Ends the action of the control sequences \L, \Q and \U |
| \\ | Backslash |
| \" | Double quotation mark |
| \' | Single quotation mark |

If a backslash in a string literal bounded by double quotation marks is followed by a character that does not form a control sequence with it, the backslash is dropped and only the character is displayed when the string is output.
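
Several of the control sequences from Table 3.3 are shown in action below. This is a minimal sketch with illustrative variable names; the comments give the value each string receives.

```perl
use strict;
use warnings;

# Variable names are illustrative only.
my $word = 'perl';

my $upper  = "\U$word\E rocks";   # PERL rocks
my $first  = "\u$word";           # Perl
my $lower  = "\LSHOUT\E!";        # shout!
my $quoted = "\Qa.b*c\E";         # a\.b\*c
my $tabbed = "col1\tcol2\n";      # tab and newline are interpreted
my $money  = "\$10 \@ home";      # literal dollar sign and at sign

print "$upper\n$first\n$lower\n$quoted\n$tabbed$money\n";
```

The sequences \U, \L and \Q stay in effect until \E or the end of the string, whereas \u and \l affect only the next character.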

String literals in double quotation marks are useful for structured output of text that is specified on a single line. Examples of strings in double quotation marks are given in Table 3.4.

Table 3.4. String literals bounded by double quotation marks.

| String literal | Displayed as | Comment |
|---|---|---|
| "\Uline\E #1" | LINE #1 | The case-conversion control sequences \l, \u, \L and \U act on letters of the Latin alphabet; on plain byte strings they do not affect letters of the Russian alphabet (these are converted only when the string holds decoded Unicode characters, for example under use utf8) |
| "End of page\f" | End of page | When output to the screen or to a file, a form feed character is written at the end of the line; when output to a printer, printing continues on a new page after this line |
| "\tBreakfast\nHam sandwich\nA cup of coffee\n" | Breakfast, Ham sandwich, A cup of coffee, each on its own line, the first indented by a tab | The string literal is specified on a single line with control characters |


The last type of string literal is the string in reverse quotation marks. Such strings are not really data strings, in the sense that the interpreter does not treat their characters as text to be displayed. When the interpreter encounters a string in reverse quotation marks, it passes the string to the operating system it is running under (Windows, UNIX, or another). The operating system executes the command and returns its output to the Perl program as a string, which can then be used in further calculations.

Thus, strings in reverse quotation marks must contain a sequence of characters that is meaningful to the operating system: an operating system command, a command line that starts an application, and so on. For example, when the string `dir` is displayed, the print statement displays not the word "dir" but the result of executing the operating system's dir command. On Windows, this command displays the contents of the current folder (Example 3.4).

Volume in device D is unlabeled
Volume serial number: 1F66-19F2
Contents of the D:\PerlOurBook directory

.              <DIR>           09.01.00  16:01 .
..             <DIR>           09.01.00  16:01 ..
EXAMPLE  PL               32   23.01.00  11:56
01             <DIR>           11.01.00  14:12 01
02             <DIR>           11.01.00  14:12 02
03             <DIR>           11.01.00  14:12 03
PERLINF  TXT           1 781   12.01.00  11:39 perlinf.txt
EXAMPLE1 PL              347   18.01.00  18:02
         3 file(s)          2 160 bytes
         5 directory(s)    78 086 144 bytes free

In Perl, as in UNIX, strings in reverse quotation marks bring into the program the output not only of system commands but also of any other program that writes to the screen, since the name of an executable program can always be passed to the command shell for execution.

Some characters (also called metacharacters) have special meanings for the command shell. These include *, <, >, ?, |, and &. In a UNIX system, to cancel the special meaning of a metacharacter, put a backslash in front of it; the backslash escapes the character, and the command shell then treats it simply as a symbol representing itself. If the entered line contains many such special characters, the user has to put a backslash before each one, which makes the whole line hard to read. To avoid this, UNIX uses strings in single quotation marks, in which all characters are interpreted as they are. Single quotation marks perform the same function in the Perl language.

In UNIX, it is common to substitute variable values into a command string that is passed to the shell for processing. For this purpose the command line uses double quotation marks, which, like single quotation marks, cancel the special meanings of metacharacters, except for the $ character, which is used to substitute the value of a variable. A backslash in front of the $ character cancels its special meaning. This mechanism of double quotation marks served as the prototype for the similar construct in the Perl language.

A string in reverse quotation marks is used in UNIX to substitute the standard output of a command: the command shell interprets the contents of the string as a system command, executes it, and puts the result in place of the string. Perl adopts this construct without any changes.
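
Command substitution can be sketched as follows. To avoid depending on a particular operating system command, the example runs a second copy of the Perl interpreter itself, whose path Perl provides in the special variable $^X; the sketch assumes that this path contains no spaces. The variable names are illustrative only.

```perl
use strict;
use warnings;

# $^X holds the path of the running Perl interpreter (assumed here to
# contain no spaces), so no particular system utility is required.
my $perl = $^X;

# The command is executed by the operating system and its standard
# output is returned to the program as a string.
my $output = `$perl -e "print 42"`;

print "The command printed: $output\n";   # The command printed: 42
```

Like a string in double quotation marks, the string in reverse quotation marks substitutes the value of $perl before the command is passed to the shell.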

All data processed by the program is stored in some area of the computer's memory, determined by its address. For the convenience of programming data access, high-level languages, and Perl is no exception here, use variables by which a programmer can refer to data in memory or change its contents. A variable is defined by its own name, which is used by the program to access the memory area and retrieve the data stored in it or, conversely, write data to the memory area. It is commonly said that variables store data, although as we can see, this is not entirely true. It is more correct to say that a variable defines a named memory area in which some data is stored.

Furthermore, a variable defines the type of data stored in the memory area it references. In most programming languages, variables are declared with a certain type before they are used in a program, which tells the translator what type of data they can store. Perl has no statements for declaring variables of a certain type; variables are declared automatically when they are first used in a language construct, such as an assignment statement. Any variable is defined by its name, which must be a valid identifier of the language. In Perl, the name of any variable consists of a special character (prefix) that identifies the type of the variable, followed by an identifier. For variables of scalar type, or simply scalar variables, this prefix is the dollar sign "$" (Example 3.5).

# Valid scalar variable names.
$Name; $name_surname; $name_1;

# Invalid scalar variable names.
$1_name; # An identifier cannot start with a digit.
$Name@Surname; # Invalid @ symbol.

A scalar variable can store only one scalar value, numeric or string, and there is no simple way to determine which of the two it contains. The reason is that when these variables are used in operations, the data stored in them is automatically converted from one type to the other: in arithmetic operations a string is converted to a number, and in string operations a number is converted to a string. A string is converted to the number formed by its leading characters if they can be interpreted as a decimal number; otherwise it is converted to zero, and the interpreter issues a warning when warnings are enabled (the -w switch). Hexadecimal numbers with the prefix "0x" and decimal numbers that use underscores to separate groups of three digits are not converted in full when they are given as strings (conversion stops at the x or at the underscore), and a sequence of digits starting with 0 is not interpreted as an octal number.
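
The conversions can be sketched as follows. The built-in functions oct and hex convert strings that hold octal or hexadecimal notation; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Variable names are illustrative only.
my $sum    = "12" + "30";    # strings are converted to numbers: 42
my $text   = 12 . 30;        # numbers are converted to strings: "1230"
my $as_dec = "010" + 0;      # a leading zero in a string is ignored: 10
my $octal  = oct("010");     # 8
my $hex    = oct("0x1F");    # 31, oct understands the 0x prefix
my $hex2   = hex("1F");      # 31

print "$sum $text $as_dec $octal $hex $hex2\n";   # 42 1230 10 8 31 31
```

The operator decides the direction of the conversion: + treats both operands as numbers, while the concatenation operator (a period) treats both as strings.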

Use the oct function to convert strings that contain representations of hexadecimal and octal numbers to numeric values.

As noted above, strings in double quotation marks not only interpret control characters, but also allow you to substitute the values of scalar variables. This means that you can write the name of a variable in a string, and it is replaced by the value the variable contains at the time the string is evaluated. (For brevity, this procedure of substituting the value of a variable into a string will be called variable substitution from here on.) For example, the following sequence of statements:

$s = "\$10";
$n = "The book costs $s dollars.";
print $n;

... will display the following line on the monitor screen:

The book costs $10 dollars. 

You can substitute the values not only of scalar variables, but also of arrays of scalars, elements of arrays and hashes, and slices of arrays and hashes. We will discuss this in the following paragraphs of this chapter, when the corresponding data types and their variables are defined.

When a variable is substituted, its name must be separated by delimiters from the characters that follow it in the string. No delimiter is needed before the first character of the variable name, because when the interpreter encounters the character "$" in a string bounded by double quotation marks, it starts extracting a valid identifier.

Delimiters can be spaces or control sequences. You can also mark the identifier of the variable explicitly by enclosing it in curly braces. This technique is demonstrated by the following fragment of a program:

$day = 'Friday';
$number = 5;
$html = "HTML";
$s = "${html}-document sent on\n$day\t$number of February.";
print $s;

The result of this fragment will be the display of two lines on the monitor screen:

HTML-document sent on
Friday  5 of February.

The variable $html is substituted with an explicit indication of its identifier in curly braces, and delimiters are used to mark off the identifiers of the remaining variables.

You can substitute scalar variables whose values were set with numeric literals or with string literals of any type; when the value was set with a string in reverse quotation marks, that string is executed as an operating system command and the variable holds its output.

The different types of quotation marks discussed in this paragraph for specifying string literals are actually just a convenient notation for the Perl operations q//, qq// and qx//. (These operations will be discussed in detail in Chapter 4.)
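
The correspondence can be sketched as follows for q and qq, here written with parentheses as delimiters, which Perl also accepts in place of the slashes. The qx operation is left out because it runs an external command. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Variable names are illustrative only.
my $name = 'Perl';

my $single = q(It's $name);         # like '...': nothing is substituted
my $double = qq("$name" is here);   # like "...": $name is substituted

print "$single\n";   # It's $name
print "$double\n";   # "Perl" is here
```

Choosing a delimiter other than the quotation mark itself makes it possible to use quotation marks inside the string without backslashes.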

When parsing the text of a program, the Perl parser picks out words (sequences of alphanumeric characters not enclosed in quotation marks) and determines whether they belong to the set of keywords. If a word is not a keyword, the interpreter treats it as a string of characters enclosed in quotation marks. This allows you to specify string literals without enclosing them in quotation marks:

$day = Friday; # Identical to the operator $day = 'Friday';

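Such words without quotation marks in the text of a program are also called barewords. When the use strict pragma is in effect, a bareword used in this way is a compile-time error, so quoting strings explicitly is the safer habit.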

Specifying string literals without quotation marks is possible only for literals containing letters of the Latin alphabet. An attempt to apply a similar technique to literals containing letters of the Russian alphabet will lead to a compilation error.

To conclude the discussion of literals, we should mention the special literals of the Perl language: __LINE__, __FILE__, __END__ and __DATA__. They are independent tokens, not variables, so they cannot be substituted into strings. The literal __LINE__ represents the number of the current line of the program text, and __FILE__ represents the name of the program file. The __END__ literal marks the logical end of the program. The information located in the program file after this literal is not processed by the interpreter, but it can be read through the DATA filehandle. The last literal, __DATA__, is similar to __END__, except that it additionally opens the DATA filehandle for reading the information that follows it in the program file. The program of Example 3.6 demonstrates the use of special literals.

#!/perl520/bin/perl -w
$file = __FILE__;
$prog = __LINE__;
print "We are at line: $prog\n",
"File: $file";
__END__
print "Text after __END__ token";

The result of this program will be as follows if the program is stored in the file D:\PerlEx\examplel.exe:

We are at line: 3
File: D:\PerlEx\examplel.exe

The output of the last print statement in the program does not appear, because that statement is located after the __END__ token.