Cost Effective Ways to Generate DEFINE.XML

You can understand and manage your data better when it is well documented with data definition documentation in the form of DEFINE.PDF and DEFINE.XML. As the number of datasets and variables increases, producing this documentation can become very resource intensive. The time-consuming task is compounded by constant changes to the data: the documentation has to keep up with those changes to remain useful and accurate. This paper suggests methods and tools that enable you to produce your data definition documentation without purchasing a complex, expensive system.

When you plan a road trip, you need a map. The same is true of the data that will be part of an electronic submission: the reviewer needs a road map in order to understand what all the variables are and how they are derived. It is in the interest of all team members involved to have accurate and concise documentation of the data. This helps your team work internally and also speeds up the review process, which can make or break an electronic submission to the FDA. Some organizations perform this task at the end of the process, but they lose out on the benefits that the document provides for internal use. It is therefore recommended that you start this process early and gain the benefit of having a road map of your data.

The process that is involved in managing and creating the data definition documentation is as follows:

The process is iterative because the SAS datasets continue to be updated. The constant need to update the documentation is therefore one of the challenges that this paper addresses.
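To illustrate why each dataset update forces a documentation update, the following minimal sketch compares two snapshots of variable-level metadata and reports what has changed. It is not taken from the paper: the function name `diff_metadata`, the snapshot layout keyed by (dataset, variable), and the sample values are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical snapshot layout: (dataset name, variable name) -> attributes.
Key = Tuple[str, str]
Snapshot = Dict[Key, Dict[str, Any]]


def diff_metadata(old: Snapshot, new: Snapshot) -> Dict[str, List[Key]]:
    """Report variables that were added, removed, or changed between snapshots."""
    added = sorted(key for key in new if key not in old)
    removed = sorted(key for key in old if key not in new)
    changed = sorted(key for key in new if key in old and new[key] != old[key])
    return {"added": added, "removed": removed, "changed": changed}


# Illustrative snapshots taken before and after a dataset update.
old_snapshot: Snapshot = {
    ("DM", "AGE"): {"label": "Age", "length": 8},
    ("DM", "SEX"): {"label": "Sex", "length": 1},
    ("DM", "RACE"): {"label": "Race", "length": 20},
}
new_snapshot: Snapshot = {
    ("DM", "AGE"): {"label": "Age in Years", "length": 8},
    ("DM", "SEX"): {"label": "Sex", "length": 1},
    ("AE", "AESEV"): {"label": "Severity", "length": 10},
}

changes = diff_metadata(old_snapshot, new_snapshot)
```

Only the entries listed in `changes` would need their documentation revisited, which is one way to keep the effort of each iteration proportional to what actually changed.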

Levels of Metadata
Documenting the data definition involves several steps. Most of the work consists of documenting metadata, which is information about the data that is to be included. The metadata has several layers. These include:

  1. General Information – This information applies to the entire set of datasets that are to be included, such as the name of the study, the company name, or the location of the data.
  2. Data Table – This information is at the SAS dataset level, such as the dataset name and label.
  3. Variable – This information pertains to the attributes of the variables within a dataset, such as the variable name, label, and length.

The metadata should be captured in the same order as the layers described above.
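The three layers can be pictured as a nested structure. The sketch below is an illustration only: the class names, field names, and sample values are assumptions, and the XML it produces is a simplified skeleton rather than a schema-valid DEFINE.XML. The element names `Study`, `ItemGroupDef`, and `ItemDef` are borrowed loosely from CDISC ODM naming, but the attributes are invented for readability.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Variable:
    """Layer 3: attributes of one variable within a dataset."""
    name: str
    label: str
    length: int


@dataclass
class DataTable:
    """Layer 2: information at the SAS dataset level."""
    name: str
    label: str
    variables: List[Variable] = field(default_factory=list)


@dataclass
class GeneralInfo:
    """Layer 1: information that applies to the entire set of datasets."""
    study_name: str
    company: str
    data_location: str
    tables: List[DataTable] = field(default_factory=list)


def to_xml(info: GeneralInfo) -> str:
    """Write the layers out in capture order: general, then table, then variable."""
    root = ET.Element(
        "Study",
        Name=info.study_name,
        Company=info.company,
        Location=info.data_location,
    )
    for table in info.tables:
        group = ET.SubElement(root, "ItemGroupDef", Name=table.name, Label=table.label)
        for var in table.variables:
            ET.SubElement(
                group, "ItemDef", Name=var.name, Label=var.label, Length=str(var.length)
            )
    return ET.tostring(root, encoding="unicode")


# Hypothetical study with one dataset and two variables.
demo = GeneralInfo(
    study_name="STUDY-001",
    company="Example Pharma",
    data_location="/data/study001",
    tables=[
        DataTable(
            name="DM",
            label="Demographics",
            variables=[Variable("AGE", "Age", 8), Variable("SEX", "Sex", 1)],
        )
    ],
)
demo_xml = to_xml(demo)
```

Nesting the variables inside their table, and the tables inside the general information, mirrors the capture order: each layer can only be filled in once the layer above it exists.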

Automating Capturing and Editing
Tools such as PROC CONTENTS and Excel can customize and automate the documentation to a degree. However, they are not intended specifically for creating data definition documentation and therefore have limitations. Defindoc is a tool developed entirely in SAS specifically for generating this type of documentation. It provides both a graphical user interface and a macro interface to fit the user’s requirements. The tool addresses the disadvantages of the manual methods. It captures the initial metadata with a mechanism similar to PROC CONTENTS, but retains only the information that is pertinent to the data definition documentation.
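The idea of capturing everything and retaining only what matters can be sketched as a simple filter. The example below is not Defindoc's implementation, which is written in SAS and is not shown in this excerpt. It assumes that the rows of a PROC CONTENTS output dataset have already been loaded as dictionaries. The column names follow those of a PROC CONTENTS OUT= dataset, while the choice of which columns to keep, the function name `pertinent_metadata`, and the sample rows are assumptions.

```python
from typing import Any, Dict, Iterable, List

# Columns kept for the documentation (an assumed selection, not Defindoc's).
KEEP = ("MEMNAME", "MEMLABEL", "NAME", "LABEL", "TYPE", "LENGTH")


def pertinent_metadata(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only documentation-relevant columns, ordered by dataset and position."""
    ordered = sorted(rows, key=lambda r: (r["MEMNAME"], r["VARNUM"]))
    slim = []
    for row in ordered:
        kept = {col: row[col] for col in KEEP}
        # PROC CONTENTS OUT= codes TYPE as 1 (numeric) or 2 (character).
        kept["TYPE"] = "Num" if row["TYPE"] == 1 else "Char"
        slim.append(kept)
    return slim


# Illustrative rows, as if read from a PROC CONTENTS OUT= dataset.
contents_rows = [
    {"LIBNAME": "SDTM", "MEMNAME": "DM", "MEMLABEL": "Demographics",
     "NAME": "SEX", "LABEL": "Sex", "TYPE": 2, "LENGTH": 1,
     "VARNUM": 2, "NOBS": 120, "ENGINE": "V9"},
    {"LIBNAME": "SDTM", "MEMNAME": "DM", "MEMLABEL": "Demographics",
     "NAME": "AGE", "LABEL": "Age", "TYPE": 1, "LENGTH": 8,
     "VARNUM": 1, "NOBS": 120, "ENGINE": "V9"},
]

metadata = pertinent_metadata(contents_rows)
```

Columns such as `NOBS` and `ENGINE` describe the physical dataset rather than its definition, so this sketch drops them before the metadata is edited further.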

The complete paper can be found at define.xml papers and define.xml software.
