XML Serialization of Delphi Objects

This article describes how to load XML documents directly into Delphi/C++Builder objects, save objects back to XML, and generate the corresponding DTDs. An optimized component that implements these features is proposed.

XML provides an extremely convenient and almost universal approach to storing and transmitting information. Many parsers are available for working with XML documents through the DOM. On the Microsoft Windows platform, the most prominent are the MSXML parsers from Microsoft.

Parsers interact with calling applications through the Simple API for XML (SAX) and/or the Document Object Model (DOM). All parsers, with the exception of Microsoft products, use SAX, and almost all of them can use the DOM.

The MSXML parser is a reasonable implementation: it supports checking the correctness of a document, and it makes loading small XML documents quite convenient. However, for each type of XML document, the developer has to write wrapper code that loads data from the Microsoft.XMLDOM object into the program's internal structures or makes the DOM convenient to navigate. When the document format changes, which happens often as its specification is extended, updating this hand-written code can be time-consuming and requires careful debugging.

This raises the question of how to simplify work with XML documents and integrate their processing into the programs being developed. For DOM-style processing, the most convenient option is to load the XML document directly into a Delphi/C++Builder object, and this is possible. Using RTTI (run-time type information), you can load data directly from the XML tags of a document into the properties of a given object. Conversely, the published properties of objects of any Delphi class can be serialized to XML.

This approach integrates XML processing into the Delphi and C++Builder development environments in the most convenient way. Access to the properties of objects is provided by the RTTI mechanisms. RTTI in Delphi is very capable, because the development environment itself relies on it to store object resources in text format.
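
As a rough illustration of the RTTI access described above, the following sketch lists the published properties of an object together with their values. The class TPerson and its properties are hypothetical and exist only for this example; the calls used (GetTypeData, GetPropInfos, GetPropValue) come from the TypInfo unit.

```pascal
program ListPublishedProps;
{$APPTYPE CONSOLE}

uses
  Classes, TypInfo;

type
  { Hypothetical sample class. TPersistent descendants carry RTTI
    for their published section. }
  TPerson = class(TPersistent)
  private
    FName: string;
    FAge: Integer;
  published
    property Name: string read FName write FName;
    property Age: Integer read FAge write FAge;
  end;

var
  Person: TPerson;
  PropList: PPropList;
  Count, I: Integer;
  PropName, PropValue: string;
begin
  Person := TPerson.Create;
  try
    Person.Name := 'Ann';
    Person.Age := 30;

    { Ask RTTI how many published properties the class has }
    Count := GetTypeData(Person.ClassInfo)^.PropCount;
    GetMem(PropList, Count * SizeOf(Pointer));
    try
      { Fill the list with pointers to the property descriptors }
      GetPropInfos(Person.ClassInfo, PropList);
      for I := 0 to Count - 1 do
      begin
        PropName := string(PropList^[I]^.Name);
        { With PreferStrings = True the value converts to readable text }
        PropValue := GetPropValue(Person, PropName, True);
        Writeln(PropName, ' = ', PropValue);
      end;
    finally
      FreeMem(PropList);
    end;
  finally
    Person.Free;
  end;
end.
```

The serializer discussed below follows the same pattern, but writes XML tags instead of printing to the console.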

Obviously, these benefits come with a number of limitations. The first concerns tag attributes. There is no simple mechanism for deciding, when saving a property of an object, whether it should become an attribute or a tag. Therefore, the proposed implementation handles only XML documents that contain no attributes. This limitation becomes critical only if we need to support an existing XML document type. If we design the format ourselves, we can simply do without attributes. In return, our parser will work not just quickly, but very quickly. ;)

The XML serialization algorithm is implemented as a recursive traversal of the published properties of an object. First, let's define a few simple procedures for generating XML. They add opening tags, closing tags, and values to the output stream.

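{ Writes a string to the output stream. Used during serialization }

procedure WriteOutStream(const Value: string);
begin
   { Write all bytes of the string; SizeOf(Char) keeps the byte count
     correct for both ANSI and Unicode strings }
   if Value <> '' then
     OutStream.WriteBuffer(Pointer(Value)^, Length(Value) * SizeOf(Char));
end;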

{ Adds a start tag with the given name }

procedure addOpenTag(const Value: string);
begin
   WriteOutStream(CR + DupStr(TAB, Level) + '<' + Value + '>');
   inc(Level);
end;

{ Adds a closing tag with the given name }

procedure addCloseTag(const Value: string; addBreak: boolean = false);
begin
   dec(Level);
   if addBreak then
     WriteOutStream(CR + DupStr(TAB, Level));
   WriteOutStream('</' + Value + '>');
end;

{ Adds a value to the resulting string }

procedure addValue(const Value: string);
begin
   WriteOutStream(Value);
end;
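
To see how these helpers cooperate, here is a minimal self-contained sketch. It makes several assumptions that are not stated in the article: CR and TAB are defined as a line break and a tab character, Level is a global nesting counter, the output goes to a TStringStream, and DupeString from the StrUtils unit stands in for DupStr.

```pascal
program TagWriterDemo;
{$APPTYPE CONSOLE}

uses
  Classes, StrUtils;

const
  CR = #13#10;  { assumed line break }
  TAB = #9;     { assumed indentation unit }

var
  OutStream: TStringStream;
  Level: Integer = 0;  { current nesting depth (assumed to be global) }

procedure WriteOutStream(const Value: string);
begin
  OutStream.WriteString(Value);
end;

procedure addOpenTag(const Value: string);
begin
  WriteOutStream(CR + DupeString(TAB, Level) + '<' + Value + '>');
  Inc(Level);
end;

procedure addCloseTag(const Value: string; addBreak: Boolean = False);
begin
  Dec(Level);
  if addBreak then
    WriteOutStream(CR + DupeString(TAB, Level));
  WriteOutStream('</' + Value + '>');
end;

procedure addValue(const Value: string);
begin
  WriteOutStream(Value);
end;

begin
  OutStream := TStringStream.Create('');
  try
    addOpenTag('TPerson');
    addOpenTag('Name');
    addValue('Ann');
    addCloseTag('Name');           { simple value: close on the same line }
    addCloseTag('TPerson', True);  { container: close on its own line }

    { The nested tag is indented by one TAB, the container is not }
    Assert(OutStream.DataString =
      CR + '<TPerson>' + CR + TAB + '<Name>Ann</Name>' + CR + '</TPerson>');
    Writeln(OutStream.DataString);
  finally
    OutStream.Free;
  end;
end.
```

Note that addValue writes the text as is, so in this sketch a value containing characters such as < or & would produce XML that is not well-formed.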

The next step is to iterate over all the properties of the object and generate tags. Property information is obtained from the object's type information (RTTI). For each property that is not of a class type, we get its name and text value and then form an XML tag. The value is read via TypInfo.GetPropValue():

procedure TglXMLSerializer.SerializeInternal(Component: TObject;
                                             Level: integer = 1);
var
  PropInfo: PPropInfo;
  TypeInf, PropTypeInf: PTypeInfo;
  TypeData: PTypeData;
  i, j: integer;
  AName, PropName, sPropValue: string;
  PropList: PPropList;
  NumProps: word;
  PropObject: TObject;
begin
  { Playing with RTTI }
  TypeInf := Component.ClassInfo;
  AName := TypeInf^.Name;
  TypeData := GetTypeData(TypeInf);
  NumProps := TypeData^.PropCount;

  GetMem(PropList, NumProps * sizeof(pointer));
  try
    { Get the list of published properties }
    GetPropInfos(TypeInf, PropList);

    for i := 0 to NumProps - 1 do
    begin
      PropName := PropList^[i]^.Name;

      PropTypeInf := PropList^[i]^.PropType^;
      PropInfo := PropList^[i];

      case PropTypeInf^.Kind of
        tkInteger, tkChar, tkEnumeration, tkFloat, tkString, tkSet,
          tkWChar, tkLString, tkWString, tkVariant:
          begin
            { Get property value }
            sPropValue := GetPropValue(Component, PropName, true);

            { Translation to XML }
            addOpenTag(PropName);
            addValue(sPropValue); { Add property value to result }
            addCloseTag(PropName);
          end;

For class types, you will have to use recursion to load all the properties of the corresponding object.

Moreover, a special approach must be taken for a number of classes. This includes, for example, string lists and collections. Let's limit ourselves to them.

For a TStrings text list, we will store its CommaText property in XML. For a collection, after processing all its properties, we will save each TCollectionItem element to XML separately, using the item's class name, TCollection(PropObject).Items[j].ClassName, as the container tag.
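
The choice of CommaText can be illustrated with a short sketch: the property packs the whole list into one string and restores it on assignment, so a single tag value is enough to store a list. The variable names are chosen for this example only.

```pascal
program CommaTextDemo;
{$APPTYPE CONSOLE}

uses
  Classes;

var
  Source, Target: TStringList;
begin
  Source := TStringList.Create;
  Target := TStringList.Create;
  try
    Source.Add('alpha');
    Source.Add('two words');

    { Items containing spaces or commas are wrapped in double quotes }
    Writeln(Source.CommaText);

    { Assigning CommaText rebuilds the list from the packed string }
    Target.CommaText := Source.CommaText;
    Assert(Target.Count = 2);
    Assert(Target[0] = 'alpha');
    Assert(Target[1] = 'two words');
  finally
    Target.Free;
    Source.Free;
  end;
end.
```

Because the packed string is written to the stream unchanged, list items that contain < or & would need additional escaping before the result is valid XML.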

tkClass: { For class types, recursive processing }
begin
  addOpenTag(PropName);

  PropObject := GetObjectProp(Component, PropInfo);
  if Assigned(PropObject) then
  begin
    { For child class properties - recursive call.
      TStrings and TCollection also descend from TPersistent,
      so their own published properties are processed here. }
    if (PropObject is TPersistent) then
      SerializeInternal(PropObject, Level);

    { Individual approach to some classes }
    if (PropObject is TStrings) then { Text lists }
    begin
      WriteOutStream(TStrings(PropObject).CommaText);
    end
    else if (PropObject is TCollection) then { Collections }
    begin
      { Save each item in a tag named after the item's class }
      for j := 0 to TCollection(PropObject).Count - 1 do
      begin
        addOpenTag(TCollection(PropObject).Items[j].ClassName);
        SerializeInternal(TCollection(PropObject).Items[j], Level);
        addCloseTag(TCollection(PropObject).Items[j].ClassName, true);
      end;
    end;
    { Here you can add processing of other classes: TTreeNodes, TListItems }
  end;
  addCloseTag(PropName, true);
end;

The procedures described above produce the XML code for an object, including all its properties. It only remains to "wrap" the resulting XML in a top-level tag, the name of the object's class. If we put the above code in the SerializeInternal() procedure, the resulting Serialize() procedure would look like this:

procedure Serialize(Component: TObject; Stream: TStream);
...
  WriteOutStream(CR + '<' + Component.ClassName + '>');
  SerializeInternal(Component);
  WriteOutStream(CR + '</' + Component.ClassName + '>');

You can extend the above with more functions for formatting the generated XML code. You can also add the ability to skip empty values and properties that hold their default values. All these extensions are implemented in the finished component.

Note that, if you wish, you can rewrite this code to generate element attributes as well. To distinguish elements from attributes in the published properties of the saved object, you can adopt the following convention: only class-type properties become elements, while all other properties are encoded as attributes of the corresponding element. The parser can be modified accordingly. This makes it possible to use XML schemas instead of DTDs. One problem remains, however: describing the content model for #PCDATA text. To solve it, you will have to dedicate a separate class to storing such data.
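
The attribute convention can be sketched as follows. This is only an illustration under stated assumptions, not the component's code: the class TBook and the function ToElementWithAttributes are hypothetical, class-type properties are simply skipped rather than written as child elements, and attribute values are not escaped.

```pascal
program AttributeSketch;
{$APPTYPE CONSOLE}

uses
  Classes, TypInfo;

type
  { Hypothetical sample class }
  TBook = class(TPersistent)
  private
    FTitle: string;
    FPages: Integer;
  published
    property Title: string read FTitle write FTitle;
    property Pages: Integer read FPages write FPages;
  end;

{ Builds an empty element whose attributes are the simple (non-class,
  non-event) published properties of Obj. Values are not escaped. }
function ToElementWithAttributes(Obj: TObject): string;
var
  PropList: PPropList;
  Count, I: Integer;
  PropName, PropValue: string;
begin
  Result := '<' + Obj.ClassName;
  Count := GetTypeData(Obj.ClassInfo)^.PropCount;
  if Count > 0 then
  begin
    GetMem(PropList, Count * SizeOf(Pointer));
    try
      GetPropInfos(Obj.ClassInfo, PropList);
      for I := 0 to Count - 1 do
        if not (PropList^[I]^.PropType^^.Kind in [tkClass, tkMethod]) then
        begin
          PropName := string(PropList^[I]^.Name);
          PropValue := GetPropValue(Obj, PropName, True);
          Result := Result + ' ' + PropName + '="' + PropValue + '"';
        end;
    finally
      FreeMem(PropList);
    end;
  end;
  Result := Result + '/>';
end;

var
  Book: TBook;
begin
  Book := TBook.Create;
  try
    Book.Title := 'Delphi';
    Book.Pages := 320;
    { Prints a single TBook element with Title and Pages attributes }
    Writeln(ToElementWithAttributes(Book));
  finally
    Book.Free;
  end;
end.
```

A complete version would recurse into class-type properties and emit them as nested elements, in the same way as SerializeInternal does for tags.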