CGI - Common Gateway Interface

CGI (Common Gateway Interface) is a standard interface for communication between an external application program and an information server, such as an HTTP (Web) server.

Typically, hypertext documents retrieved from WWW servers contain static data. With CGI, you can create programs, called gateways, that interact with application systems such as a database management system, a spreadsheet, or business graphics software, and display dynamic information on the user's screen.

The gateway program is launched by the WWW server in real time. The server passes the user's request to the gateway, which, using the tools of the application system, returns the result of processing the request to the user's screen. The gateway program can be written in C/C++, Fortran, Perl, Tcl, a Unix shell, Visual Basic, or AppleScript. As an executable module, it is placed in the WWW server's cgi-bin subdirectory.

The original description of the CGI interface, the mechanism by which the gateway program communicates with the WWW server, is located on the wist.ifmo.ru host.

  • Transferring data to gateways
    • Queries for different methods
    • Command-line arguments
    • CGI Environment Variables
  • Gateway output
    • Basic concepts
    • Output stream header

Transferring data to gateways

To pass request data to the gateway, the server uses the command line and environment variables. The environment variables are set when the server runs the gateway program.

Queries for different methods

Information is transmitted to the gateways in the following form:

 name=value&name1=value1&...

where name is the name of a variable (from an HTML FORM, for example), and value is its actual value. Depending on the method used for the request, this string appears either as part of the URL (the GET method) or as the content of the HTTP request (the POST method). In the latter case, the information is sent to the gateway on the standard input stream.

The server sends CONTENT_LENGTH bytes to the gateway's standard input file descriptor. It also passes CONTENT_TYPE (the type of the data being transmitted) to the gateway. The server is not required to send an end-of-file marker after it has sent CONTENT_LENGTH bytes of data and the gateway has read them, so the gateway should read exactly CONTENT_LENGTH bytes.

Example

Let's take the output of a form that uses the POST method (METHOD="POST") as an example. Suppose the gateway receives 7 bytes encoded like this:

 a=b&b=c

In this case, the server will set CONTENT_LENGTH to 7 and CONTENT_TYPE to application/x-www-form-urlencoded. The first character in the gateway's standard input stream will be "a", followed by the rest of the encoded string.
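
The reading rule above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name read_form_data is hypothetical, and an in-memory byte stream stands in for the gateway's real standard input.

```python
import io
from urllib.parse import parse_qsl


def read_form_data(environ, stdin):
    """Return the form fields of a request as a list of (name, value) pairs."""
    if environ.get("REQUEST_METHOD", "GET") == "POST":
        # Read exactly CONTENT_LENGTH bytes; the server need not signal end-of-file.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length).decode("ascii")
    else:
        # For GET, the encoded string arrives as part of the URL.
        raw = environ.get("QUERY_STRING", "")
    return parse_qsl(raw, keep_blank_values=True)


# The 7-byte example from the text; BytesIO stands in for standard input.
example_env = {
    "REQUEST_METHOD": "POST",
    "CONTENT_LENGTH": "7",
    "CONTENT_TYPE": "application/x-www-form-urlencoded",
}
fields = read_form_data(example_env, io.BytesIO(b"a=b&b=c"))
print(fields)  # [('a', 'b'), ('b', 'c')]
```

In a real gateway the two arguments would be os.environ and sys.stdin.buffer. Reading a fixed number of bytes, rather than reading until end-of-file, avoids blocking when the server keeps the stream open.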

Command-line arguments

On the command line, the gateway receives from the server:

  • the remainder of the URL after the gateway name as the first parameter (the first parameter is empty if only the gateway name was present), and
  • a list of keywords as the remainder of the command line, for a search script, or
  • alternating form field names with an appended equal sign (in even positions) and the corresponding field values (in odd positions).

Keywords, form field names, and values are passed already decoded from the HTTP URL encoding and quoted according to Bourne shell rules, so the gateway receives the information on the command line as it is, without the need for additional conversions.

FORM Statement Requests

Requests from a FORM are processed so that each argument carrying a field name ends with an equal sign, and the argument that follows it is the value of that field. If anything follows the name of the script (gateway) in the URL, that information is passed as the first parameter. Otherwise, the first parameter is empty.

Examples:

 /htbin/foo/x/y/z?name1=value1&name2=value2

is called as:

 /.../foo /x/y/z name1= value1 name2= value2

and

 /htbin/foo?name1=value1&name2=value2

is called as:

 /.../foo '' name1= value1 name2= value2
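
The mapping from URL to argument list in these examples can be sketched as follows. The function form_request_argv is a hypothetical helper written only for illustration; it uses the virtual path /htbin/foo in place of the elided physical path, and it returns a Python list, so no Bourne shell quoting is applied.

```python
from urllib.parse import unquote_plus


def form_request_argv(url, script="/htbin/foo"):
    """Build the argument list described above for a FORM request URL."""
    path, _, query = url.partition("?")
    if not path.startswith(script):
        raise ValueError("URL does not refer to the given gateway")
    # Whatever follows the gateway name is the first parameter ('' if nothing does).
    argv = [script, path[len(script):]]
    if not query:
        return argv
    for pair in query.split("&"):
        name, _, value = pair.partition("=")
        argv.append(unquote_plus(name) + "=")  # field name with an equal sign
        argv.append(unquote_plus(value))       # decoded field value
    return argv


print(form_request_argv("/htbin/foo/x/y/z?name1=value1&name2=value2"))
print(form_request_argv("/htbin/foo?name1=value1&name2=value2"))
```

Field names land at even indices (2, 4, ...) and values at odd indices (3, 5, ...), which matches the positions given in the list above.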

 

CGI Environment Variables

The following environment variables are not request-specific and are set for all requests.

SERVER_SOFTWARE
The name and version of the information server that responds to the request (and starts the gateway). Format: Name/Version
SERVER_NAME
The host name on which the server is running, the DNS name, or the IP address as it appears in the URL.
GATEWAY_INTERFACE
The version of the CGI specification with which the server complies. Format: CGI/version

The following environment variables are specific to different requests, and are populated before the gateway is invoked:

SERVER_PROTOCOL
The name and version of the information protocol with which the request arrived. Format: protocol/version
SERVER_PORT
Port number to which the request was sent
REQUEST_METHOD
The method with which the request was made. For HTTP, it's "GET", "HEAD", "POST", etc.
PATH_INFO
Additional path information supplied by the client. In other words, the gateway can be accessed via its virtual path followed by some additional information. That additional information is passed in PATH_INFO.
PATH_TRANSLATED
The server provides a translated version of PATH_INFO, in which the path has been converted from virtual to physical.
SCRIPT_NAME
The virtual path to the gateway being executed, used to build self-referencing URLs.
QUERY_STRING
The information that follows the ? in the URL that referenced this gateway. This is the query string. It should not be decoded in any way. This variable should always be set when query information is present, regardless of command-line decoding.
REMOTE_HOST
The name of the host making the request. If the server does not have this information, it should set REMOTE_ADDR and leave this variable unset.
REMOTE_ADDR
The IP address of the host making the request.
AUTH_TYPE
If the server supports user authentication and the gateway is protected, this is the protocol-specific authentication method used to verify the user.
REMOTE_USER
If the server supports user authentication and the gateway is protected, this is the user name under which the user was authenticated.
REMOTE_IDENT
If the HTTP server supports user identification according to RFC 931, this variable will contain the remote user name retrieved from the server.
CONTENT_TYPE
For requests that carry attached data, such as HTTP POST and PUT, this is the content type of that data.
CONTENT_LENGTH
The length of the data that the client transmits.
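
As an illustration of how these variables fit together, the sketch below rebuilds a self-referencing URL from them. The function name self_url and the sample values (including the host www.example.com) are hypothetical. Deriving the URL scheme from SERVER_PROTOCOL is a simplifying assumption that holds only for plain HTTP.

```python
from urllib.parse import quote


def self_url(environ):
    """Rebuild the URL of the current request from CGI environment variables."""
    # Assumption: "HTTP/1.0" -> "http". This does not account for other schemes.
    scheme = environ["SERVER_PROTOCOL"].split("/")[0].lower()
    url = scheme + "://" + environ["SERVER_NAME"]
    if environ["SERVER_PORT"] != "80":
        url += ":" + environ["SERVER_PORT"]
    # SCRIPT_NAME is the virtual path of the gateway; PATH_INFO is what follows it.
    url += quote(environ["SCRIPT_NAME"] + environ.get("PATH_INFO", ""))
    # QUERY_STRING is passed through untouched, since it is never decoded.
    if environ.get("QUERY_STRING"):
        url += "?" + environ["QUERY_STRING"]
    return url


sample_env = {
    "SERVER_PROTOCOL": "HTTP/1.0",
    "SERVER_NAME": "www.example.com",
    "SERVER_PORT": "8080",
    "SCRIPT_NAME": "/htbin/foo",
    "PATH_INFO": "/x/y/z",
    "QUERY_STRING": "name1=value1&name2=value2",
}
print(self_url(sample_env))
```

A real gateway would pass os.environ instead of the sample dictionary.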

In addition, if the request contains additional request header fields, they are placed in environment variables with the prefix HTTP_ followed by the header name. Any '-' characters in the header name are changed to underscores '_'. The server can exclude any headers it has already processed, such as Authorization, Content-type, and Content-length. If necessary, the server can exclude any (or even all) additional header fields if including them would exceed the size limit of the environment. An example of such a variable is HTTP_ACCEPT, which was defined in the CGI/1.0 specification. Another example is HTTP_USER_AGENT, which carries the User-Agent header.

HTTP_ACCEPT
A list of MIME types that the client can process as specified in the HTTP headers. Other protocols should get this information from elsewhere (if they need it). Each type in this list must be separated by a comma according to the HTTP specification. Format: Type/Subtype, Type/Subtype
HTTP_USER_AGENT
The browser that the client uses to send the request. General format: program/version library/version.
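
The naming rule for these variables can be sketched as follows. The helper names and the sample header values are hypothetical. Upper-casing the header name is inferred from the examples HTTP_ACCEPT and HTTP_USER_AGENT rather than stated explicitly in the text.

```python
def header_to_env_name(header_name):
    """Map an HTTP request header name to its CGI environment variable name."""
    return "HTTP_" + header_name.upper().replace("-", "_")


# Headers the server has already processed itself and may leave out.
ALREADY_PROCESSED = {"authorization", "content-type", "content-length"}


def headers_to_environ(headers):
    """Convert (name, value) header pairs into HTTP_* environment entries."""
    environ = {}
    for name, value in headers:
        if name.lower() in ALREADY_PROCESSED:
            continue
        environ[header_to_env_name(name)] = value
    return environ


# Hypothetical request headers, for illustration only.
sample_headers = [
    ("Accept", "text/html, text/plain"),
    ("User-Agent", "ExampleBrowser/1.0 examplelib/2.0"),
    ("Content-type", "application/x-www-form-urlencoded"),
]
env = headers_to_environ(sample_headers)
print(env)
```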

Gateway output

 

Basic concepts

The gateway writes its output to the standard output stream. This output can be either a document generated by the gateway or instructions telling the server where to obtain the required document.

Typically, the gateway produces output that the server interprets and sends back to the client. The advantage of this approach is that the gateway does not have to send a full HTTP/1.0 header for each request.

Output stream header

For some gateways, it may be necessary to avoid the server processing their output, and to communicate with the client directly. In order to distinguish such gateways from the rest, CGI requires that their names begin with the nph- prefix. In this case, it is the gateway's responsibility to return a syntactically correct response to the client.

 

Parsed headers

The gateway output begins with a small header. It consists of text lines in the same format as an HTTP header and ends with an empty line (containing only a line feed or CR/LF).

Any header lines that are not server directives are sent directly to the client. Currently, the CGI specification defines three server directives:

Content-type
The MIME type of the returned document.
Location
This field is used when you want to tell the server that a reference to the document is returned instead of the document itself. If the argument is a URL, the server will instruct the client to redirect the request. If the argument is a virtual path, the server will return the document specified by this path to the client as if the client had requested it directly.
Status
This directive gives the server an HTTP/1.0 status line to send to the client. Format: nnn xxxxx, where nnn is a 3-digit status code and xxxxx is a reason string such as "Forbidden".
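
To make the distinction between server directives and ordinary header lines concrete, here is a rough sketch of how a server might split a gateway's output. The function parse_gateway_output is hypothetical and does not describe any particular server's implementation.

```python
DIRECTIVES = {"content-type", "location", "status"}


def parse_gateway_output(output):
    """Split gateway output into (directives, passthrough headers, body)."""
    directives, passthrough = {}, []
    pos = 0
    while pos < len(output):
        end = output.find(b"\n", pos)
        if end == -1:
            break
        # Accept both LF and CR/LF line endings.
        line = output[pos:end].rstrip(b"\r").decode("latin-1")
        pos = end + 1
        if line == "":
            # The empty line ends the header; the rest is the document body.
            return directives, passthrough, output[pos:]
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError("malformed header line: %r" % line)
        if name.strip().lower() in DIRECTIVES:
            directives[name.strip().lower()] = value.strip()
        else:
            passthrough.append((name.strip(), value.strip()))
    raise ValueError("header not terminated by an empty line")


parsed = parse_gateway_output(b"Content-type: text/html\nX-Note: demo\n\n<p>hi</p>\n")
print(parsed)
```

Here X-Note is a made-up header that is not a directive, so it ends up in the list of lines to forward to the client unchanged.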

 

Examples

Suppose you have a text-to-HTML converter. When it finishes its work, it must output the following to the standard output stream:

 --- begin output ---
 Content-type: text/html

 --- end output ---

Now consider a gateway that, in some cases, must return the /path/doc.txt document from this server as if the client had requested it directly through http://server:port/path/doc.txt. The gateway output will be as follows:

 --- begin output ---
 Location: /path/doc.txt

 --- end output ---

Finally, suppose that the gateway returns references to a gopher server, such as gopher://gopher.ncsa.uiuc.edu/. The gateway output will be as follows:

 --- begin output ---
 Location: gopher://gopher.ncsa.uiuc.edu/

 --- end output ---

Non-parsed headers

Suppose now that we have a gateway that communicates directly with the client. As noted, its name must begin with the nph- prefix and it must return a valid HTTP header. In this case, if the gateway was accessed with a SERVER_PROTOCOL value of HTTP/1.0, its output must satisfy HTTP/1.0:

 --- begin output ---
 HTTP/1.0 200 OK
 Server: NCSA/1.0a6
 Content-type: text/plain

 --- end output ---
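
A gateway of this kind could assemble such a response as sketched below. The function nph_response is a hypothetical helper. Taking the Server line from SERVER_SOFTWARE is an assumption made for illustration, and the sample environment simply repeats the values from the example above.

```python
def nph_response(environ, body, status="200 OK", content_type="text/plain"):
    """Build a complete HTTP response for an nph- gateway as bytes."""
    lines = [
        # The status line uses the protocol the request arrived with.
        environ.get("SERVER_PROTOCOL", "HTTP/1.0") + " " + status,
        "Server: " + environ.get("SERVER_SOFTWARE", "unknown"),
        "Content-type: " + content_type,
        "",  # the empty line that ends the header
        "",
    ]
    return "\r\n".join(lines).encode("ascii") + body


sample_env = {"SERVER_PROTOCOL": "HTTP/1.0", "SERVER_SOFTWARE": "NCSA/1.0a6"}
response = nph_response(sample_env, b"hello\n")
print(response.decode("ascii"))
```

A real gateway would write the bytes to sys.stdout.buffer and flush, since the server does not add or correct anything in the output of an nph- gateway.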