Interview Question: During XML modeling, do you model a data member as XML Element or XML Attribute?

Short Link: http://wp.me/p5Jvc-aq

If your resume lists XML experience, an interviewer will usually probe the criteria you apply when deciding whether to model a data member as an XML element or as an attribute.

The following are some of the selection criteria that I could think of or gathered from the internet:

Readability

At the end of the day, XML is meant to be human readable, so some effort should go into keeping the format easy to read. For example, consider the following two XML documents, PersonElements.xml and PersonElements_Attributes.xml.

<?xml version="1.0" encoding="UTF-8"?>
<!-- PersonElements.xml -->
<Person>
	<FirstName>Steve</FirstName>
	<MiddleName>M</MiddleName>
	<LastName>Richardson</LastName>
	<DateOfBirth>10-March-1988</DateOfBirth>
	<Gender>Male</Gender>
	<UniqueIdentifierNumber>123-45-6789</UniqueIdentifierNumber>
</Person>

<?xml version="1.0" encoding="UTF-8"?>
<!-- PersonElements_Attributes.xml -->
<Person uniqueIdentifierNumber="123-45-6789" gender="male">
	<FirstName>Steve</FirstName>
	<MiddleName>M</MiddleName>
	<LastName>Richardson</LastName>
	<DateOfBirth>10-March-1988</DateOfBirth>
</Person>

In PersonElements.xml, all data members have been modeled as elements; in PersonElements_Attributes.xml, two data members, uniqueIdentifierNumber and gender, have been modeled as attributes. Both documents represent the same data in different formats. From a human readability standpoint, PersonElements.xml is easier to read than PersonElements_Attributes.xml, because every data member appears in the same form, one per line.

Data or meta-data

The second criterion I would apply is to check whether the data member represents data about the parent element or metadata. Consider the following XML, metadata.xml.

<?xml version="1.0" encoding="UTF-8"?>
<Person id="1AEFHG234">
	<FirstName>Steve</FirstName>
	<MiddleName>M</MiddleName>
	<LastName>Richardson</LastName>
	<DateOfBirth>10-March-1988</DateOfBirth>
	<Gender tc="1">Male</Gender>
	<UniqueIdentifierNumber tc="0">123-45-6789</UniqueIdentifierNumber>
	<AnnualIncome currency="USD" currencycode="1">120000</AnnualIncome>
</Person>

XML has become a format of choice for electronic communication, e.g. web services. Information needed for system processing therefore has to be carried within the XML message payload. This additional information may not belong strongly to the containing element, yet it matters from a system processing standpoint. Such information can be modeled as an attribute. In our example, the id attribute uniquely identifies the Person element. The id attribute by itself has no business significance for the Person element; it is present purely for B2B processing.

Secondly, we have introduced the attribute tc on two elements, Gender and UniqueIdentifierNumber. From a human readability perspective the text values of these elements are sufficient, but for system processing a code is preferable to a description; hence the tc (type code) attribute, which carries the code value 1 for Gender. In the case of UniqueIdentifierNumber, a human may decipher from the formatting that the number is a US Social Security Number, but for the convenience of system processing the tc="0" attribute identifies it as a social security number. This technique is commonly used in ACORD (an insurance-domain XML standard) XML modeling.
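
As a rough illustration of the system processing side, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The helper name read_coded_value is my own invention for this example, and the document is a trimmed copy of metadata.xml inlined as a string; a real consumer would look the codes up in whatever code list the two parties have agreed on.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of metadata.xml, inlined so the sketch is self-contained.
METADATA_XML = """<Person id="1AEFHG234">
    <FirstName>Steve</FirstName>
    <Gender tc="1">Male</Gender>
    <UniqueIdentifierNumber tc="0">123-45-6789</UniqueIdentifierNumber>
</Person>"""


def read_coded_value(person, tag):
    """Return (type code, display text) for a coded child element.

    Hypothetical helper: the code comes from the tc attribute (metadata),
    the human-readable value from the element text (data).
    """
    element = person.find(tag)
    return element.get("tc"), element.text


person = ET.fromstring(METADATA_XML)
record_id = person.get("id")  # metadata used only for message processing
gender = read_coded_value(person, "Gender")
identifier = read_coded_value(person, "UniqueIdentifierNumber")
print(record_id, gender, identifier)
```

The processing code never has to interpret the text "Male" or guess at the number format; it only reads the attributes.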

The third and final example is the AnnualIncome element. Here we need to model two aspects, the measurement unit and the value. Both are kept together in one element by holding the currency code or description in attributes and the measurement value 120000 as the element value. Alternative ways of modeling this would have been as follows:

<!-- Option 1 -->
<Person>
.
.
.
	<AnnualIncomeCurrencyCode>USD</AnnualIncomeCurrencyCode>
	<AnnualIncome>120000</AnnualIncome>
.
.
</Person>

<!-- Option 2 -->
<Person>
.
.
.
	<CurrencyCode>USD</CurrencyCode>
	<AnnualIncome>120000</AnnualIncome>
.
.
</Person>

<!-- Option 3 -->
<Person>
.
.
.
	<AnnualIncome>
		<CurrencyCode>USD</CurrencyCode>
		<Value>120000</Value>
	</AnnualIncome>
.
.
</Person>

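Consider an XML document containing many measures and measure types; it would become difficult to comprehend if any of the three approaches above were used, because the unit is either separated from its value (Options 1 and 2) or buried under an extra level of nesting (Option 3).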
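
To show how the attribute form keeps unit and value together for the consumer as well, here is a small Python sketch. The MonthlyRent and SavingsAbroad elements and the read_measures helper are made up for this illustration; only AnnualIncome with its currency attribute comes from the example above.

```python
import xml.etree.ElementTree as ET

# AnnualIncome is from metadata.xml; the other two measures are hypothetical.
SAMPLE = """<Person>
    <FirstName>Steve</FirstName>
    <AnnualIncome currency="USD">120000</AnnualIncome>
    <MonthlyRent currency="USD">1500</MonthlyRent>
    <SavingsAbroad currency="EUR">8000</SavingsAbroad>
</Person>"""


def read_measures(person):
    """Map each child carrying a currency attribute to (amount, currency)."""
    measures = {}
    for child in person:
        currency = child.get("currency")
        if currency is not None:  # skip non-measure children such as FirstName
            measures[child.tag] = (int(child.text), currency)
    return measures


measures = read_measures(ET.fromstring(SAMPLE))
print(measures)
```

One generic loop handles every measure, because each element carries its own unit. With Option 1 or Option 2 the code would need to know which sibling element holds the unit for which value.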

Extensibility

XML used for electronic communication needs to be flexible enough to accept change. Business requirements inevitably change and data structures evolve. A data member modeled as an attribute cannot be extended further, since an attribute holds a single text value and cannot contain child elements or attributes of its own. For example, consider the XML Extension-1.xml. It does not leave any room for the publisher attribute to evolve.

<!-- Extension-1.xml -->
<Books>
	<Book id="12345" publisher="McGraw Hill">
		<Title>XML Modeling - An approach</Title>
		<Author>Ray Johnson</Author>
		<BookReleaseDate>20-March-2001</BookReleaseDate>
	</Book>
	<Book id="22345" publisher="Apress">
		<Title>XML XSLT Guide</Title>
		<Author>Tim Gibson</Author>
		<BookReleaseDate>20-May-2010</BookReleaseDate>
	</Book>	
</Books>

Had the publisher been modeled as an XML element, it could easily have evolved into Extension-2.xml or Extension-3.xml.

<!-- Extension-2.xml -->
<Books>
	<Book id="12345">
		<Title>XML Modeling - An approach</Title>
		<Author>Ray Johnson</Author>
		<BookReleaseDate>20-March-2001</BookReleaseDate>
		<Publisher>
			<Name>McGraw Hill</Name>
			<Location>Seattle</Location>
		</Publisher>		
	</Book>
	<Book id="22345">
		<Title>XML XSLT Guide</Title>
		<Author>Tim Gibson</Author>
		<BookReleaseDate>20-May-2010</BookReleaseDate>
		<Publisher>
			<Name>Apress</Name>
			<Location>Florida</Location>
		</Publisher>	
	</Book>	
</Books>

<!-- Extension-3.xml -->
<Books>
	<Book id="12345" publisherid="12">
		<Title>XML Modeling - An approach</Title>
		<Author>Ray Johnson</Author>
		<BookReleaseDate>20-March-2001</BookReleaseDate>
	</Book>
	<Book id="22345" publisherid="13">
		<Title>XML XSLT Guide</Title>
		<Author>Tim Gibson</Author>
		<BookReleaseDate>20-May-2010</BookReleaseDate>
	</Book>	
	<Publisher id="12">
		<Name>McGraw Hill</Name>
		<Location>Seattle</Location>
	</Publisher>
	<Publisher id="13">
		<Name>Apress</Name>
		<Location>Florida</Location>
	</Publisher>
</Books>

Relationship with parent element

Any data member that shares a one-to-one relationship with the parent element can be modeled as an attribute, provided its containment relationship with the parent element is weak. A Person and FirstName have a strong containment relationship; a Book and Publisher do not necessarily share one. If the relationship is one-to-many, the data member necessarily needs to be modeled as an XML element, because an attribute name can appear only once on an element. While modeling a person who is an insurance agent, his/her agent number can be modeled as an attribute; however, the agent can be licensed to sell insurance in more than one state. Hence the states need to be modeled as XML elements.

<?xml version="1.0" encoding="UTF-8"?>
<Persons>
	<Person agentNbr="23456">
		<FirstName>Harry</FirstName>
		<LastName>Maxwell</LastName>
		<LicenseState>VA</LicenseState>
		<LicenseState>GA</LicenseState>
		<LicenseState>FL</LicenseState>
	</Person>
</Persons>
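
The one-to-many point can be checked with a short Python sketch using the standard library's xml.etree.ElementTree. The is_well_formed helper and the licenseState attribute in the last snippet are hypothetical, added only to show what happens when the same attribute is repeated.

```python
import xml.etree.ElementTree as ET

AGENT_XML = """<Persons>
    <Person agentNbr="23456">
        <FirstName>Harry</FirstName>
        <LastName>Maxwell</LastName>
        <LicenseState>VA</LicenseState>
        <LicenseState>GA</LicenseState>
        <LicenseState>FL</LicenseState>
    </Person>
</Persons>"""

agent = ET.fromstring(AGENT_XML).find("Person")
agent_number = agent.get("agentNbr")  # one-to-one: a single attribute value
license_states = [s.text for s in agent.findall("LicenseState")]  # one-to-many


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical attempt to express the states as a repeated attribute.
repeated_attribute = '<Person licenseState="VA" licenseState="GA"/>'
print(agent_number, license_states, is_well_formed(repeated_attribute))
```

The repeated element parses into a list, while the repeated attribute is rejected by the parser, since a well-formed start tag may not contain the same attribute name twice.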

Modeling Relationships

While explaining the extensibility point I used two sample XMLs, Extension-2.xml and Extension-3.xml. In Extension-2.xml the Publisher element was made a child element of Book. In Extension-3.xml, the Publisher element was kept as a separate data structure within the root element, and its id was referenced through the publisherid attribute on the Book element. Extension-3.xml provides a flexible structure in which the same Publisher element can be reused across multiple Books. Therefore, when a relationship is not modeled as containment (as it is in Extension-2.xml) and a reference is provided at one end of the relationship, the reference can be modeled as an attribute.
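
A consumer of the Extension-3.xml layout has to resolve the reference itself. The following Python sketch shows one possible way to do that; the publisher_name helper is hypothetical, and the document is a trimmed copy of Extension-3.xml inlined as a string.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of Extension-3.xml, inlined so the sketch is self-contained.
BOOKS_XML = """<Books>
    <Book id="12345" publisherid="12">
        <Title>XML Modeling - An approach</Title>
    </Book>
    <Book id="22345" publisherid="13">
        <Title>XML XSLT Guide</Title>
    </Book>
    <Publisher id="12">
        <Name>McGraw Hill</Name>
        <Location>Seattle</Location>
    </Publisher>
    <Publisher id="13">
        <Name>Apress</Name>
        <Location>Florida</Location>
    </Publisher>
</Books>"""

root = ET.fromstring(BOOKS_XML)

# Index the shared Publisher elements by their id attribute.
publishers = {p.get("id"): p for p in root.findall("Publisher")}


def publisher_name(book):
    """Follow a Book's publisherid reference to the Publisher's Name."""
    return publishers[book.get("publisherid")].findtext("Name")


names = {b.findtext("Title"): publisher_name(b) for b in root.findall("Book")}
print(names)
```

The price of the reuse is this extra lookup step; with the containment model of Extension-2.xml the publisher is simply a child of the book.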

Who’s the information consumer?

One key criterion to consider is who the potential consumer of the information is. Is it a human or a machine? If it is a machine, the data member can be modeled as an attribute, while a human being would find an element preferable for readability. Machines usually consume metadata, while humans consume data.
This point overlaps somewhat with the data or metadata point; I just wanted to highlight it.

Is member order/sequence important?

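In some cases maintaining data in a specific order is important. Child elements keep their document order, so elements can represent an ordered sequence. The order of attributes within a start tag is not significant, so attributes cannot.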
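
The difference can be seen with a small Python sketch using the standard library's xml.etree.ElementTree. The Route and Stop element names and their values are invented purely for this illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents: same attributes in a different order,
# same child elements in a different order.
doc_a = ET.fromstring(
    '<Route origin="VA" mode="road">'
    "<Stop>Richmond</Stop><Stop>Atlanta</Stop>"
    "</Route>"
)
doc_b = ET.fromstring(
    '<Route mode="road" origin="VA">'
    "<Stop>Atlanta</Stop><Stop>Richmond</Stop>"
    "</Route>"
)

# Attributes are exposed as a name-to-value mapping; comparing the two
# mappings ignores the order in which the attributes were written.
same_attributes = doc_a.attrib == doc_b.attrib

# Child elements come back in document order, so the sequences differ.
stops_a = [stop.text for stop in doc_a.findall("Stop")]
stops_b = [stop.text for stop in doc_b.findall("Stop")]
print(same_attributes, stops_a, stops_b)
```

Swapping the attributes leaves the data unchanged, while swapping the Stop elements produces a different sequence.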

Some Useful references:

http://xml.coverpages.org/elementsAndAttrs.html
http://xml.coverpages.org/elementAttr9804.html

2 thoughts on “Interview Question: During XML modeling, do you model a data member as XML Element or XML Attribute?”

  1. Hi Vinay,

    Pretty interesting explanation. Can you suggest some good documentation for learning more about ACORD information modeling for the insurance domain?

    Cheers !
    Prasad.