
XML: Capabilities and Perspectives

The fate of the Web, and above all the prospects of the XML language that underlies the new Web technologies, concerns a wide range of people involved in the design and development of information systems.

Table of Contents

Key points

  • XML Language and Framework
  • and XML documents
  • XML metadata
  • XML and databases
  • XML Resource Semantics
  • XML Perspectives

In the second half of 2000, these issues were discussed in these pages. However, several important points that deserve attention were not touched upon.

Part 1. The XML Platform and its Constituent Standards

In recent years, we have witnessed a revolution unfolding on the World Wide Web, associated with the emergence of a new standard for a hypertext markup language, XML.


In our view, a revolution is indeed taking place, as radically new Web technologies are being intensively developed. It is, however, being carried out prudently, with a clearly expressed and practical intention to preserve the huge information resources and the useful Web-oriented applications that have been created in the Web environment during its short history.

Along with these internal changes, the Web is also strongly influenced by external processes. They are largely driven by a trend that has been gaining ground in information systems development in recent years: the integration of database technologies, text (document) systems, Java technologies, Web technologies, and CORBA technology for heterogeneous distributed object environments, all based on different approaches. This trend stems from the desire not only to enrich the functionality of the large systems being built, but also to integrate, including at the semantic level, heterogeneous information resources created and maintained by means of various technologies.

The changes taking place in the Web affect a fairly wide class of information systems. It is no coincidence, therefore, that the future of the Web, and above all the prospects of the XML language [1-3] that underlies the new Web technologies, concerns a large number of specialists involved in the design and development of information systems. It therefore seems quite natural that these issues were discussed in the pages of this supplement [4]. At the same time, some important points that deserve readers' attention were not touched upon or were not emphasized clearly enough.

A discussion of the capabilities of XML and the prospects for its use is important because this language, together with the set of W3C standards that make up its infrastructure, has already become a de facto standard. Its scope of application is constantly expanding and now includes not only Web technologies proper, but also related areas of information technology that treat the Web as an environment for remote access to resources of an entirely different kind and for exchanging information between systems based on other technologies.

XML Language and XML Platform

When evaluating the changes taking place on the Web, it would be a mistake to limit ourselves to considering the possibilities of only the XML language itself. Along with the creation of the standard for this language, the W3C, which forms the technical policy for the development of the Web and develops standardized specifications for this environment, is actually simultaneously forming a new platform, the basis of which is the XML language. The functionality of this platform is determined by a whole complex of interrelated standards, some of which have already been adopted by the W3C, while others are under development.

With few exceptions, developers regard the other standards of the XML platform as XML applications, mainly because their specifications use XML syntax. An important role is also played by the fact that the new features defined by these standards are introduced by specifying the meaning and function of certain syntactic components of the XML language. In other words, the XML language specification contains a number of provisions that offer a natural way to extend the functionality of XML in various respects. Some of these extensions are specified by the XML platform standards.

Fig. 1. XML platform standards. The arrows indicate that one standard is used in defining another. Frame colors indicate the status of each standard: yellow, an adopted version exists; green, the standard is partially adopted; red, the standard is under development; blue, no draft exists yet.

Among other things, these standards make it possible to:

  • define the set of markup tags and attributes allowed in an XML document, associating default semantics with them (the XML namespace standard, Namespaces in XML [5]);
  • describe the structure of XML documents with richer means than a DTD offers (the schema specification standard, XML Schema [6-8]);
  • define hyperlinks between documents and/or their fragments (the pointer language and linking language standards, XPointer [9] and XLink [10]);
  • describe the semantics of XML documents with varying degrees of formality (the Resource Description Framework standard, RDF [11-12]);
  • control the presentation of XML documents on the client side (the Cascading Style Sheets standards, CSS [13], and the Extensible Stylesheet Language, XSL [14]);
  • describe transformations of XML documents (the XSLT transformation language standard [15], a distinct part of the XSL standard).

In addition, a standard Document Object Model (DOM) [16] has been created for XML and HTML documents; it defines the application programming interface for processing them.
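As a rough illustration of what such an interface looks like in practice, here is a minimal sketch that uses the DOM implementation shipped with Python (xml.dom.minidom). The document and its element names (catalog, book) are invented for the example and do not come from the DOM specification.

```python
from xml.dom import minidom

# Hypothetical document; the element names are illustrative only.
doc = minidom.parseString("<catalog><book>XML</book></catalog>")

# Navigate the tree through standard DOM calls.
root = doc.documentElement
books = root.getElementsByTagName("book")
first_title = books[0].firstChild.data

# Modify the tree: create a new element with a text child and append it.
new_book = doc.createElement("book")
new_book.appendChild(doc.createTextNode("XHTML"))
root.appendChild(new_book)

titles = [b.firstChild.data for b in root.getElementsByTagName("book")]
print(titles)
```

The same calls (getElementsByTagName, createElement, appendChild) are available in DOM implementations for other languages, which is the point of standardizing the object model.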

A standard query language for XML resources (XML-QL) is also being developed: requirements for its underlying model and for the language itself have been formulated [17-18], and a number of existing candidates are being studied [19]. A digital signature standard for XML documents (XML-Signature [20]) is under development as well.

A special place in the considered set of standards is occupied by the recently adopted W3C standard XHTML 1.0 [21]. It provides one possible way to ensure the continuity of the Web environment by allowing the XML platform to use the information resources accumulated within the framework of HTML technologies. This standard supports, by means of XML, the functionality of the current version of the HTML language (HTML 4.01) with three different levels of completeness.

The XML platform standards under consideration also include a number of supporting standards. Here are a few examples. The XML Information Set (Infoset) standard [22] gives an abstract description of the data that makes up an XML document; it is based on the XML specification [1]. The XPath standard [23] defines how parts of an XML document are addressed in the XPointer and XSLT languages. The XML Inclusions (XInclude) standard [24] provides a model and syntax for describing the merging of XML documents. The XML Fragment Interchange standard [25] makes it possible to describe the context of fragments of an XML document, so that they can be viewed and edited outside the full text of the document. Also worth mentioning is the Canonical XML standard [26], which proposes a method for establishing the equivalence of two XML documents that differ in their syntactic representation. This capability is essential, in particular, for the use of digital signatures [20].
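The idea behind canonicalization can be sketched with the Python standard library. Note the assumption: the canonicalize function (Python 3.8 and later) implements the later C14N 2.0 version of Canonical XML, not necessarily the version cited in [26], and the two documents below are invented for the example.

```python
from xml.etree.ElementTree import canonicalize  # available since Python 3.8

# Two hypothetical documents with the same content but different syntax:
# attribute order, quote style, and the empty-element form all differ.
doc_a = "<note  lang='en' id=\"1\"><body/></note>"
doc_b = '<note id="1" lang="en"><body></body></note>'

# Canonicalization rewrites each document into one normalized text form.
canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)

print(doc_a == doc_b)      # the raw texts differ
print(canon_a == canon_b)  # the canonical forms coincide
```

A digital signature computed over the canonical form is therefore unaffected by such purely syntactic differences.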

Even this incomplete list of XML platform standards and their purposes shows that, in assessing the prospects for XML, it is inappropriate to consider only the functionality of the language itself; the whole set of standards that make up the emerging XML platform should be taken into account.

How XML relates to other platform standards


As noted, the XML language specification provides a number of mechanisms that relate XML to the other standards of the platform built on it and, if necessary, to standards outside the platform.

In general terms, the following approach is used:

The main feature of the language is that XML is a metalanguage, as is SGML, the language that gave rise to it. Unlike HTML, its specification does not fix the function of XML document elements, their attributes, or the semantics of attribute values. It is by specifying the function and syntax of the elements of XML documents that the functionality of the XML language can be extended.

The second feature of XML is the ability to use so-called namespaces: predefined, named sets of names that are used as element type names and attribute names in XML documents. Defining a namespace also makes it possible to associate, explicitly or implicitly, a set of valid values with each attribute name.

It is assumed that each name belonging to a given namespace, as well as each attribute value, has some semantics associated with it, defined by default or explicitly. The Namespaces in XML standard does not fix the way these semantics are defined. The definitions may rest on various other standards or on methods required by a particular application.
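To make the role of namespaces more concrete, the following sketch parses a document in which the same local name, title, belongs to two different namespaces. The namespace URIs and element names are hypothetical; only the xmlns syntax comes from the Namespaces in XML standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs and element names, for illustration only.
xml_text = """<order xmlns:bk="http://example.org/books"
       xmlns:fm="http://example.org/films">
  <bk:title>Learning XML</bk:title>
  <fm:title>The Matrix</fm:title>
</order>"""

root = ET.fromstring(xml_text)

# Map prefixes to namespace names so each "title" can be addressed separately.
ns = {"bk": "http://example.org/books", "fm": "http://example.org/films"}
book_title = root.find("bk:title", ns).text
film_title = root.find("fm:title", ns).text

# Internally, the parser qualifies every name with its namespace.
qualified_tags = [child.tag for child in root]
print(book_title, film_title, qualified_tags)
```

The two title elements are distinct names to any namespace-aware processor, so an application can attach different semantics to each of them.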

The standards of the XML platform that extend the functionality of the language are built on this principle. A namespace with a reserved name is introduced, which defines the names of special types of XML document elements and of their attributes. The semantics of these elements and attributes, along with the syntactic conventions, are defined in the specifications of these complementary standards. Names belonging to such a namespace are considered generally accepted.

An example of this XML extension mechanism is the XLink standard [10], which makes it possible to use special linking elements in XML documents that establish various kinds of hyperlinks between XML documents. XML itself does not support the concept of a hyperlink.
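A minimal sketch of how an application might recognize such links is given below. The namespace name http://www.w3.org/1999/xlink and the type and href attributes are those of XLink; the document itself, its element names (report, see, para), and the file names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace name reserved by the XLink standard.
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical document: any element becomes a link by carrying XLink attributes.
xml_text = """<report xmlns:xlink="http://www.w3.org/1999/xlink">
  <see xlink:type="simple" xlink:href="chapter2.xml">Chapter 2</see>
  <para>Plain text, no link.</para>
  <see xlink:type="simple" xlink:href="figures/fig1.png">Figure 1</see>
</report>"""

root = ET.fromstring(xml_text)

# Collect (label, target) pairs for every simple link in the document.
links = [
    (el.text, el.get(f"{{{XLINK_NS}}}href"))
    for el in root.iter()
    if el.get(f"{{{XLINK_NS}}}type") == "simple"
]
print(links)
```

The parser knows nothing about hyperlinks; the linking behavior is attached to the attributes solely through the reserved namespace, which is the extension mechanism described above.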

What data can XML represent?


The statement, often made in publications, that XML can describe data of widely varying nature needs clarification. It should be understood as follows.

XML allows you to mark up text files, turning linear text into hypertext. Files of other kinds, often referred to as binary files, cannot themselves be marked up by means of XML. However, the XML specifications allow such information resources to be integrated into hypertext by reference to the files that contain them, thereby giving rise to the hypermedia information resources that make up the contents of Web pages. In addition to referencing resources contained in binary files, an XML document can contain a textual description of them directly, or reference other XML documents that contain such a description. These documents, in turn, can have binary files integrated into them, and so on. The XML language provides no other means of describing and presenting data such as images, audio, or video.

XML and HTML: Which Language Is More Complex


The discussion [4] raised the question of comparing the complexity of the HTML and XML standards. It was argued that XML is more primitive than HTML. This point needs clarification.

If we compare the volume of descriptions of these languages (documentation of XML and HTML standards), then the XML specifications take up several times less space. You need to spend less work and time to understand and remember them. And in this sense, XML is simpler than HTML.

However, XML is by no means more primitive than HTML. When comparing the functionality of these languages, we must first take into account that, although they have common roots in SGML [27], the well-known international standard for a generalized document markup language, they belong to different levels of abstraction.

XML is a metalanguage and is known to be a subset of SGML. Like SGML, it is designed to generate a variety of specific markup languages by defining specific sets of tags (element types, in XML terms). The languages defined with XML are thus its concrete applications.

HTML, by contrast, is a specific, non-extensible language. Unlike in XML, the function of its markup tags is fixed. HTML was created as the simplest application of SGML, which is a powerful metalanguage. HTML can also be defined by means of XML (recall the XHTML standard), and is therefore also one of the applications of XML.

Because of its abstractness, XML is open to extension (as its name reflects), and for this reason it is significantly more stable than HTML, where adding functionality requires the adoption of a new version of the standard. New versions of XML browsers will appear much less frequently than for HTML, which is still evolving.

Multi-level representation of XML documents


XML provides the multi-level representation of data that is characteristic of database systems. Recall the concepts of logical and physical data representation, textbook material for database professionals, or the external, conceptual, and internal schemas of the three-level ANSI/X3/SPARC architecture.

More specifically, XML supports first of all the physical representation level of an XML document, that is, a description of its storage structure. Its building blocks are the so-called entities of the XML language: files and fragments of files of various kinds (files with XML specifications, for example a DTD file for a document type; binary files of graphics, audio, or video data; recurring strings inside an XML document; and so on). The storage structure of an XML document is a hierarchy of such entities. It is important to note that XML does not provide a separate description of the physical representation of an XML document. This representation is self-describing: it is embedded in the document itself.
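The simplest kind of entity, a recurring string declared inside the document, can be illustrated with a short sketch. The document, the entity name company, and its value are invented for the example; external and binary entities are not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical document. The internal entity "company" is declared in the
# document itself, so the storage structure travels with the document.
xml_text = """<!DOCTYPE letter [
  <!ENTITY company "Example Corp.">
]>
<letter>
  <from>&company;</from>
  <footer>Copyright &company;</footer>
</letter>"""

# The parser replaces each entity reference with the declared text.
root = ET.fromstring(xml_text)
sender = root.find("from").text
footer = root.find("footer").text
print(sender)
print(footer)
```

After parsing, the application sees only the logical content; the fact that the string was stored once and referenced twice belongs to the physical representation.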

Further, along with the physical representation, a logical representation of XML documents is supported. The logical structure of an XML document is a hierarchy of the structural elements that make up its content, delimited by markup tags. While the physical representation of XML documents, as already noted, is self-describing, a separate explicit description is provided for their logical representation. This is the purpose of the document type definition (DTD).

Thus, XML supports a two-level representation of documents. However, unlike modern implementations of database technologies, in XML the description of the physical (stored) representation is not separated from the document but embedded in it (Fig. 2). This significantly limits the ability of the XML environment to support data independence.

The top level of the information architecture for representing XML documents is the description of their semantics. The facilities provided for this purpose will be considered in the second part of the article.