Clinical data management and data analysis are the two main areas where SAS has been most widely used. SAS software has also been implemented in other functional areas of clinical research, such as gene sequencing or post-marketing analysis, but it remains most entrenched in the data management and biostatistics departments of many biotech and pharmaceutical companies. SAS plays an even more significant role in analysis and reporting within biostatistics groups than it does in data management, because of its distinctive statistical and analytical tools. SAS is best known for its analytical strengths, which stem from its academic roots and from a large library of procedures (PROCs) that has made it one of the most comprehensive and powerful data analysis tools available. However, SAS also has strengths in data management. Its programming language is built around the DATA step, a construct that sets it apart from many other analytical tools and lets it perform the data transformations needed for both data analysis and data management. SAS also supports Structured Query Language (SQL) through PROC SQL, much like relational database management systems such as Oracle or Microsoft SQL Server. This combination of strengths makes SAS a compelling solution even in organizations that already have clinical data management systems. In these situations, SAS can still be used to perform many data management tasks, such as discrepancy management or the management of controlled terminology. This chapter elaborates on the relationship between data management and data analysis to show how these two seemingly separate functions are interconnected and how SAS technologies serve both of them well.
Data Management Background
On the critical path from the source investigator sites to an electronic submission, data capture technologies and their associated processes have a direct impact on the quality and accuracy of the data. Data management may not be the main responsibility of a SAS user in a biostatistics department, but it can still play a significant role in that user's work. An understanding of the data management process is therefore essential to analyzing and reporting clinical trials effectively. Clinical Data Management (CDM) is a functional group that is usually one of the first to be formed within an organization, since it is needed early in the conduct of a clinical trial. The process starts with capturing data on a case report form (CRF). Electronic data capture (EDC) systems are evolving and are making the capture of this information more efficient. This section first describes the traditional paper method and then gives an overview of EDC approaches. In the traditional method, the CRF is the legal document that records all the information at the investigator site on physical paper. The ultimate task of a data manager is to transfer this information into an electronic form that is later analyzed for a submission.
The processing of this information is managed by a clinical data management system (CDMS). Double key data entry is a traditional, well-established method within the industry for transcribing the information on paper CRFs into a relational database: two operators enter the same data independently, and any mismatches between the two entries are reconciled. Although this can be resource intensive, it helps ensure that the data is transcribed accurately. As the data is entered, the values are evaluated for correctness through edit checks, also called discrepancy checks. A discrepancy can arise, for example, when a value falls out of range. Such discrepancies can be caught during double key data entry and by an algorithm that checks for a specific condition, such as whether a white blood cell (WBC) count from a lab test falls within a valid range. For example, an algorithm can check for normal values ranging from 4,000 to 10,000 cells per microliter of blood. Any value outside this range appears in the discrepancy report and needs resolution. This simple example illustrates why discrepancy management is an important step in ensuring the integrity and accuracy of the data being captured.
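The double key comparison and the WBC range check described above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not how any particular CDMS or SAS program implements these checks (in SAS, comparing two keyed datasets is commonly done with PROC COMPARE). The field names (`subject`, `visit`, `wbc`) and the functions `compare_double_entry` and `wbc_range_check` are hypothetical; the 4,000 to 10,000 cells per microliter limits are taken from the example in the text.

```python
# Hypothetical range limits for WBC count, taken from the example in the text
# (cells per microliter of blood). Real edit checks would use lab-specific ranges.
WBC_LOW = 4_000
WBC_HIGH = 10_000


def compare_double_entry(first: dict[str, str], second: dict[str, str]) -> list[str]:
    """Return the names of fields whose values differ between two independent entries.

    A field present in only one entry also counts as a mismatch.
    """
    fields = sorted(set(first) | set(second))
    return [name for name in fields if first.get(name) != second.get(name)]


def wbc_range_check(records: list[dict]) -> list[str]:
    """Return discrepancy messages for missing or out-of-range WBC values (inclusive limits)."""
    messages = []
    for rec in records:
        value = rec.get("wbc")
        if value is None:
            messages.append(f"Subject {rec['subject']}: WBC missing")
        elif not (WBC_LOW <= value <= WBC_HIGH):
            messages.append(
                f"Subject {rec['subject']}: WBC {value} outside {WBC_LOW}-{WBC_HIGH} cells/uL"
            )
    return messages


if __name__ == "__main__":
    # Two operators key the same CRF page independently; one transposes digits.
    first_pass = {"subject": "101", "visit": "1", "wbc": "5200"}
    second_pass = {"subject": "101", "visit": "1", "wbc": "5020"}
    print(compare_double_entry(first_pass, second_pass))  # ['wbc']

    # After reconciliation, values are checked against the expected range.
    reconciled = [
        {"subject": "101", "wbc": 5200},
        {"subject": "102", "wbc": 12500},
        {"subject": "103", "wbc": None},
    ]
    for message in wbc_range_check(reconciled):
        print(message)
```

Keeping the two checks separate mirrors the order in the text: transcription mismatches are reconciled first, and only the reconciled values are then evaluated by the edit check that feeds the discrepancy report.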
In many instances, the value entered from the source CRF is clear but still discrepant. In such cases, a query is sent to the site for the investigator to clarify. Resolving these queries further ensures the integrity of the captured information. Once all the information is captured, the adverse events, and the drug names recorded in medical history or concomitant medication records, are coded against standard dictionaries, such as MedDRA for adverse events and the WHO Drug Dictionary for medications. This coding is usually also a data management responsibility.
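The coding step described above can be pictured as a lookup from the verbatim term reported on the CRF to a standard preferred term, with unmatched terms set aside for a coder to review and, if still unclear, to query with the site. The sketch below is a simplified Python illustration that assumes a tiny hypothetical dictionary (`AE_DICTIONARY`) and hypothetical helper names (`normalize`, `code_terms`). Real dictionaries such as MedDRA are licensed, hierarchical, and far larger, and production coding tools do more than exact matching on normalized text.

```python
# Hypothetical stand-in for a standard coding dictionary.
# Keys are normalized verbatim terms; values are preferred terms.
# These entries are illustrative only and are not taken from any real dictionary release.
AE_DICTIONARY = {
    "headache": "Headache",
    "head ache": "Headache",
    "nausea": "Nausea",
    "nauseous": "Nausea",
}


def normalize(term: str) -> str:
    """Lower-case a term and collapse runs of whitespace so trivial variants match."""
    return " ".join(term.lower().split())


def code_terms(
    verbatims: list[str], dictionary: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Map each verbatim term to a preferred term.

    Returns the coded terms and the list of verbatim terms that found no match
    and therefore need manual review.
    """
    coded: dict[str, str] = {}
    uncoded: list[str] = []
    for verbatim in verbatims:
        preferred = dictionary.get(normalize(verbatim))
        if preferred is None:
            uncoded.append(verbatim)
        else:
            coded[verbatim] = preferred
    return coded, uncoded


if __name__ == "__main__":
    reported = ["Headache", "HEAD  ACHE", "nauseous", "dizzy spells"]
    coded, uncoded = code_terms(reported, AE_DICTIONARY)
    print(coded)    # {'Headache': 'Headache', 'HEAD  ACHE': 'Headache', 'nauseous': 'Nausea'}
    print(uncoded)  # ['dizzy spells']
```

Returning the uncoded terms separately, rather than silently dropping them, reflects the point in the text that unresolved items must be tracked until they are clarified.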
Many commercial CDMS products are available to manage all aspects of data management. Because SAS is strong at managing data, it can also be used for these tasks. Since data is commonly converted to SAS for analysis after it is captured, some organizations prefer to use SAS for data management so that the captured data is already in the format of its intended target. SAS's strengths in data manipulation shine even where an organization already captures information in relational databases: although SAS is traditionally used for analysis, it can process sophisticated edit checks or code adverse event and drug terms when data managers lack the tools or skills to do this within the database. As a result, many organizations implement SAS in many, if not all, data management areas, and a SAS programmer may take on data management duties in addition to analysis, since the two roles are interconnected.
Electronic data capture (EDC) began in the late 1980s with technologies that allowed nurses, clinical data associates, or other staff at the clinical site to enter data remotely. These technologies captured clinical data at the site and delivered it to sponsors faster and more accurately than the traditional paper method. Because nurses, physicians, and other medical professionals could enter information directly, edit checks could be applied at the point of entry, so discrepancies could be resolved almost immediately. Early EDC systems faced challenges from inefficient hardware, but this has improved with thin-client versions that are easier to manage and deploy. EDC continues to evolve and is beginning to deliver analysis and reporting features that have traditionally been left to SAS programmers and biostatistics groups. EDC is also being shaped by data standards from organizations such as CDISC and HL7, which make data more portable, more compatible with the requirements of regulatory agencies, and easier to share with contract research organizations (CROs).
To be continued in the book I am working on, "Becoming a SAS Programmer in Pharmaceutical Industry", and related Serious Adverse Event Software ...