Friday, July 31, 2009

Why is XML so Difficult for SAS Programmers?

The FDA has adopted XML as a standard file format for everything from product labeling to electronic submissions. XML is widely used on the Internet and is useful for storing data along with metadata that explains the context of the data values. I am a member of the SDS team, part of CDISC, which is establishing how XML is used in the DEFINE.XML file that documents the clinical data in a submission. XML is well suited to storing the metadata that describes how each variable is derived and details its code lists (controlled terms).
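To make the idea of data stored alongside its metadata concrete, here is a minimal Python sketch that reads a variable's label, derivation, and code list from a small XML fragment. The element and attribute names (Metadata, Variable, CodeList, Item, and so on) are made up for illustration and are not taken from the actual DEFINE.XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment: these element and attribute names are
# illustrative only and do not follow the real DEFINE.XML schema.
SAMPLE = """
<Metadata>
  <Variable Name="SEX" Label="Sex" Type="text" CodeListRef="CL.SEX">
    <Derivation>Copied from the demographics form</Derivation>
  </Variable>
  <CodeList ID="CL.SEX">
    <Item Code="M" Decode="Male"/>
    <Item Code="F" Decode="Female"/>
  </CodeList>
</Metadata>
"""


def describe_variables(xml_text):
    """Return a dict of variable name -> label, derivation, and code list."""
    root = ET.fromstring(xml_text)

    # Index every code list by its ID so variables can look theirs up.
    code_lists = {
        cl.get("ID"): {item.get("Code"): item.get("Decode")
                       for item in cl.findall("Item")}
        for cl in root.findall("CodeList")
    }

    result = {}
    for var in root.findall("Variable"):
        result[var.get("Name")] = {
            "label": var.get("Label"),
            "derivation": var.findtext("Derivation", default=""),
            "codes": code_lists.get(var.get("CodeListRef"), {}),
        }
    return result


summary = describe_variables(SAMPLE)
```

The point is only that the file describes itself: the variable carries a reference to its code list, and a few lines of code can follow that reference without any outside documentation.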

One challenge I find is that most of the people who work with clinical data, the data managers and SAS analysts responsible for generating the DEFINE.XML, do not understand XML. Syntactically, XML is a markup language, and the schema it uses is not rocket science. However, it requires a very different skill set, since most users are familiar with SAS or SQL and do not normally work with XML or the XSL that accompanies it for display. SAS's attempt to close this gap with PROC CDISC has, in my opinion, come up short. Many users are therefore left to look through the XSD files to understand the structure of the XML file. This is like trying to pick up Japanese when you have been using English for many years.

XML is useful for both storing data and describing data. Accompanying tools such as XSL can put the stored data to good use. For example, an XML file coupled with the right XSL can be rendered as HTML or even as something that looks like an Excel spreadsheet. I have used some of these techniques to address the gap I see between SAS analysts and the need to generate DEFINE.XML. You can sign up for free and log in to try out Definedoc, which captures the metadata from SAS datasets and then generates a DEFINE.XML. This can be done without having to understand the XSD and XSL that accompany it. I prefer to jump into a car and use it to get from point A to point B rather than learn all the details of the internal combustion engine. I hope this solution helps to close the gap and lets users work with XML easily, so they can get the value it offers as a powerful file format.
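As a rough illustration of the "capture metadata, then generate XML" idea, the Python sketch below turns a list of variable attributes into an XML document. It is not how Definedoc works internally. It assumes the attributes have already been pulled from the SAS dataset by some other means, and the function name, input layout, and element names are all hypothetical rather than the real DEFINE.XML structure.

```python
import xml.etree.ElementTree as ET


def build_metadata_xml(dataset_name, variables):
    """Build a simplified metadata document from variable attributes.

    `variables` is a list of dicts with the keys name, label, type and
    length. This layout is an assumption made for the example.
    """
    root = ET.Element("Dataset", Name=dataset_name)
    for var in variables:
        node = ET.SubElement(
            root,
            "Variable",
            Name=var["name"],
            Type=var["type"],
            Length=str(var["length"]),  # XML attribute values must be strings
        )
        ET.SubElement(node, "Label").text = var["label"]
    return ET.tostring(root, encoding="unicode")


# Example attributes, typed in by hand for illustration.
variables = [
    {"name": "USUBJID", "label": "Unique Subject Identifier", "type": "text", "length": 20},
    {"name": "AGE", "label": "Age", "type": "integer", "length": 8},
]

xml_text = build_metadata_xml("DM", variables)
```

The analyst only supplies what they already know about the dataset; the code that writes the XML is the part hidden under the hood.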
