Thursday, April 30, 2009

Helpful Hints on Developing a User Friendly Database

Developing an effective database application requires an interface that is easy for users to work with. This paper will explore the features of SAS/AF and methodologies for building a successful database. It combines user interface suggestions for the front end with back end SCL, SQL and data step logic that makes the software efficient to program and to operate. The majority of the examples are technical tips, but there are also lessons learned from collaborating with end users, which proved to be very important in creating an effective application.

There are many solutions for creating a data entry system, ranging from a simple Excel spreadsheet to a sophisticated Oracle database. Each set of technologies works well for a specified task. This paper will explore a database containing clinical information used in a regulatory submission. SAS/AF is well suited for this since all analysis work for clinical data requires SAS. The scenario of this particular project involved a pharmaceutical company developing a custom database. The company had already licensed Base SAS with few additional modules. One advantage of SAS/AF is that compiled applications can be rolled out to clients that do not have SAS/AF at execution time. In this scenario, SAS was installed on a Windows server.

SAS was delivered to users on their desktops through terminal services. This was economical since only one SAS license was required. Even though SAS/AF was the main software used during development, the programming was not all in SCL; it also involved SQL and data step logic. It is useful to use SQL and data step logic wherever possible so that more SAS programmers can understand and maintain the system in the future. The main reason the client chose a customized solution was its unique requirements, which would have made an off-the-shelf system such as Oracle Clinical a large investment in infrastructure and operational procedure change. It was therefore more effective to develop tools tailored to those requirements. Even though the data entry system, named “BigRed”, was a custom effort, modularization and a data driven approach made the process much more efficient.

Data Driven
One of the biggest time savers in development was driving the behavior of the data entry system through SAS datasets. The data tables which contain the actual clinical data being entered were more than just data repositories. They also acted as a collection of metadata which drives the labels on the screens. The following data step example was used to create the initial data.

*** Define the Treatment Center ***;
data dbdata.trtcent (label="Blood Collection at Treatment Center"
                     read=&password genmax=11);
   attrib srcloc   label="Collection Source Location"
                   length=8
          unitid   label="Blood Unit ID"
          grprh    label="Blood Group and Rh"
                   length=8 format=BLOODFMT.
          pcode    label="Test or Control"
                   length=8 format=PCODEFMT.
          coldate  label="Collection Date"
                   length=8 format=mmddyy6.
          usrname  label="User Name"
          datetime label="Date Time of User Interaction"
                   length=8 format=DATETIME13.;
   stop;   *** create a zero-observation table with this structure ***;
run;

The data table was defined with the label “Blood Collection at Treatment Center”. This label was later used in a selection list for table selection during data entry. Each variable also contained a label, used for variable selections and labels on the data entry screens. The user defined formats were stored in a formats catalog, and their values were used as coded values for pull down menu selections. At first glance, the data being defined appears to be purely for storing information. For efficiency, these same attributes also drove the user’s selection choices on the data entry screens.
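As a sketch of how such attributes can be retrieved at run time to drive the interface, the SAS DICTIONARY tables expose both dataset and variable metadata through SQL (the libref and dataset name below come from the example above; the macro variable name is illustrative):

```sas
*** Query metadata that drives the data entry screens ***;
proc sql noprint;
   /* Dataset label feeds the table selection list */
   select memlabel into :tablabel
      from dictionary.tables
      where libname='DBDATA' and memname='TRTCENT';

   /* Variable labels and formats feed field labels and menus */
   create table varmeta as
   select name, label, format
      from dictionary.columns
      where libname='DBDATA' and memname='TRTCENT';
quit;
```

Because the screens read these attributes rather than hard coded text, relabeling a variable in the table automatically relabels it everywhere in the application.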

One additional layer of security that was easily implemented was password protecting the data table. In this example, the (read=&password) option prevented users from reading the data directly, forcing them to use the data export or other reporting engines of the system. This allowed better control over which variables were delivered to users, such as excluding unnecessary administrative variables.

Complete paper at "Helpful Hints on Developing a User Friendly Database with SAS/AF" and related Clinical Data Management.

Monday, April 27, 2009

How to Develop User Friendly Macros

SAS® macros help automate tasks that are done repeatedly. However, if they are not easy to use or debug, they may never be used. This paper will describe approaches to developing and maintaining SAS macros that are easy to use. Some of the topics covered include:
  • effective documentation in the macro header
  • portable code for use across different operating systems
  • error and warning message handling
  • paper and online documentation
  • use of nested macros and nested macro variables
  • keeping macros simple for debugging

A little effort can go a long way towards creating a successful SAS macro. This paper will present tips and techniques that are not always obvious. Besides getting the resulting numbers to the user, a user friendly macro can enhance the entire experience.
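As an illustration of the documentation points above, a macro header can carry purpose, parameters, usage and history in a standard comment block (the macro name and parameters here are hypothetical):

```sas
/*-------------------------------------------------------------------*
 | Macro:    %mksummary                                              |
 | Purpose:  Generate a summary table of a numeric variable by group |
 | Parms:    data=   input dataset (required)                        |
 |           var=    numeric analysis variable (required)            |
 |           class=  grouping variable (optional)                    |
 | Usage:    %mksummary(data=adsl, var=age, class=trt01p)            |
 | History:  01APR2009  initial version                              |
 *-------------------------------------------------------------------*/
```

A consistent header format also makes it possible to harvest the headers programmatically into online documentation.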

Macros Overview
SAS macros are great at automating repetitive tasks as a code generator. However, some features of macros make them difficult to understand and debug. The syntax of SAS macros is similar to that of the traditional SAS data step, but there is a level of abstraction. To accomplish its code generating function, the macro language adds percent signs (%) and ampersands (&) in front of specified commands and variables. This layer can be confusing since it requires you to resolve the macro before understanding what is being processed. The confusion is compounded when things are nested. Macro variables can be nested by resolving into other macro variables. Macros themselves can call other macros, which creates a nested looping structure. Macro code can sometimes turn into spaghetti code. It is therefore helpful, for both the user and the person maintaining the macro, to make macros user friendly.
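A small example of the nesting described above; the double ampersand delays resolution so that the value of one macro variable selects another (the variable names are illustrative):

```sas
%let drug1 = Aspirin;
%let drug2 = Placebo;
%let i     = 2;

%* &&drug&i first resolves to &drug2, which then resolves to Placebo;
%put Selected treatment: &&drug&i;
```

The %PUT statement writes "Selected treatment: Placebo" to the log, which is why tracing resolution in the log is such a common debugging step.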

One of the most straightforward ways to make SAS macros user friendly is to enhance the documentation that accompanies the macro. This is an essential part of software that some programmers overlook when delivering macros to users. There are many different forms of documentation and the more that is available, the greater the chance of connecting with the user. This paper will summarize eleven different types of documentation. The content can overlap among the different types but each method has its own strengths.

HTML – Documentation can be delivered as HTML pages within an intranet or Internet. This is readily available to all users. The website has the following advantages:

  1. Accessibility from any computer connected to the network.
  2. Hyperlinks to quickly navigate to specified content.
  3. Search engine to find specific content.
  4. Graphics for capturing screen shots and other diagrams for more effective communication.

PDF - Similar to HTML, a PDF file can be delivered within an intranet or Internet. Its strengths are similar to HTML with some distinguishing features:

  1. Can be delivered as one file, as in an email as an attachment, in case the user does not have network access.
  2. PDF format is more consistent for printing on physical paper.
  3. The content is locked from changes.

Complete paper found at "How to Develop User Friendly Macros", related SAS Programming and sample SAS Macros.


Saturday, April 25, 2009

Making Code Review Painless

Code review is an essential step in the development of concise and accurate SAS® programs. It is a required verification step in performing validation in a regulated environment. This paper will present techniques and a macro that can be freely downloaded to automate this task. The %coderev macro will perform many of the common tasks during a code review including:

1. Spell checking headers and comments
2. Reviewing all input and output datasets of the program
3. Comparing defined macro variables versus macro variable usage
4. Checking for multiple macro calls that are not in a macro library
5. Evaluating hard code logic
6. Evaluating sort order of all datasets

These tasks are normally performed by an independent reviewer rather than the original programmer. By automating them, the code review process can capture even the smallest mistake through reports, ensuring the highest quality and integrity. What is normally a dreaded task can now be done with ease.

Code Review Background
In the larger scheme of SAS program verification, code review plays a significant role in ensuring the quality and integrity of your analysis. However, it is sometimes not viewed as important because it is a mundane task whose value is not fully appreciated. The verification of SAS programs and output includes some of the following tasks.
  • Code Review - Systematic review of SAS programs according to a predetermined checklist of verification criteria.
  • Code Testing - Perform testing on SAS programs or macros supplying valid and invalid inputs and verify expected output.
  • Log Evaluation - Evaluate the SAS log for error, warning and other unexpected messages.
  • Output Review - Visual or programmatic review of report outputs as compared to expected results.
  • Data Review - Review attributes and contents of output data for accuracy and integrity.
  • Duplicate Programming - Independent programming to produce the same output for comparison.

This list includes verification tasks that may not apply to all programs, since not all programs produce output analysis datasets or output reports. However, if you are verifying a macro that is used many times among all your team members, the time invested in performing as many of these tasks as possible is worth the effort. In a regulated environment, these tasks are not just recommendations but requirements. This paper will focus only on the first task, code review, although it will reference other aspects of SAS program verification.

Even though code review is the first item in the list, the order in which you perform your verification tasks can vary. They can be performed independently by different reviewers or by one reviewer, either in parallel or serially. It is recommended that the reviewer be a different person from the original author of the SAS program to ensure objectivity. If possible, this effort can be outsourced to an external group so that there are no preconceived assumptions held from within your department. Once all the tasks are performed, a centralized summary report is compiled to capture all the findings. The process of reconciling the discrepancies goes beyond the scope of this paper, but performing code review is an important step in this process.

Automating Review
Some critical analysis is performed during the code review process, but a large part of the task is repetitive. This is one of the main reasons why people dislike performing code reviews. After performing multiple iterations of the same review tasks, the work becomes mundane and the reviewer tends to get blurry eyed. This repetitive fatigue leads to sloppiness because the reviewer loses the fresh acuity of performing the very first verification task.
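One example of a repetitive check that lends itself to automation is scanning the program log for suspect messages. This is a minimal sketch, not part of %coderev itself, and the log file name is illustrative:

```sas
*** Flag error, warning and other suspect lines in a SAS log ***;
data logcheck;
   infile "c:\review\myprog.log" truncover;
   input logline $char200.;
   linenum = _n_;
   /* SAS writes ERROR: and WARNING: at the start of the line */
   if index(logline, 'ERROR') = 1 or
      index(logline, 'WARNING') = 1 or
      index(logline, 'uninitialized') > 0 then output;
run;

proc print data=logcheck noobs;
run;
```

A report like this lets the reviewer jump straight to the suspect line numbers instead of paging through the entire log.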

More details in MXI Papers and SAS Validation Software.


Friday, April 24, 2009

Outsourcing and Offshoring SAS Programming

Globalization is the reality of doing business in today's economic environment. SAS programmers and organizations that use SAS who choose to ignore it will face extinction. Outsourcing has had a significant effect on many industries dealing with information technologies. Knowledge workers face the same challenges now as manufacturing workers faced in previous decades. This paper will give you insights into offshore outsourcing as it pertains to SAS and provide strategies on how to navigate and work within this environment. Some of the topics discussed include:

  • Pros and Cons of outsourcing SAS related programming

  • Types of SAS programming within the Pharmaceutical industry that will be outsourced

  • Strategies on successfully managing a local team while outsourcing SAS projects

Outsourcing is not a new phenomenon but rather another step in the evolution of doing business in a technology driven global economy. SAS work is increasingly being outsourced to local Contract Research Organizations (CROs). Companies give employees the flexibility to telecommute and work remotely on some days of the week. The use of CROs and the practice of telecommuting give companies flexibility and a competitive advantage. Offshore outsourcing utilizes some of the same methodologies to provide companies with an even more competitive way of performing tasks such as research and development. It is therefore prudent for organizations to evaluate how outsourcing will impact their SAS programming work.

Outsourcing Overview
Offshore outsourcing can be a very emotional topic. Only a few months ago, we had a contentious presidential race that used it as a campaign issue. It does have profound effects on jobs and on how Americans will work in the future. Rather than focusing on the emotional aspects of job loss, this paper will evaluate the issue from a rational business perspective. Whether you are for or against information technology offshore outsourcing, it is certainly a growing phenomenon. Several studies show the fast rate of increase in several IT sectors. An IT outsourcing study conducted by Diamond Cluster International showed that 86% of the participating Global 1000 IT executives and providers of IT outsourcing services expect outsourcing to further increase next year. The study indicated that although reducing costs is the number one driving force behind outsourcing, another factor is that organizations are trying to free up internal resources to focus on other business critical functions. A University of California, Berkeley study estimates that 14 million service jobs will be affected by outsourcing. Research has shown tremendous growth in offshore outsourcing within the last ten years, and the trend is not slowing down. An IT research firm, IDC, estimates that IT offshoring will increase by more than 500 percent by 2007.

Although the evidence from research shows an increase in the use of outsourcing, other studies show how ill-prepared American companies are. A survey of more than 300 CEOs and business executives at North American technology and telecommunications companies, entitled “Crunch Time: The Competitiveness Audit”, was conducted in late 2004 and early 2005. The results show that most companies have not yet put in place the processes and practices needed to compete in a rapidly evolving global marketplace. Only about one third of the organizations surveyed have instituted processes for assessing their competitive functions. The pharmaceutical industry is more conservative and moves much slower than its IT counterparts; therefore, even fewer pharmaceutical companies have instituted processes for implementing outsourcing strategies. It is prudent to act cautiously when implementing a new process such as offshore outsourcing, but to ignore it and not act within a competitive environment is a formula for failure.

Pros and Cons
The number one advantage of, and reason why most organizations embark on, an outsourcing project is to bring down costs. The ability to employ talented people with minimal operating costs makes offshore outsourcing an attractive solution in a competitive environment. It is, however, not a panacea. If the project is ill defined or the wrong kind of project is selected, an outsourced solution may end up costing more.

An advantage to outsourcing can be seen in less mission critical tasks, since it can free up resources for your team to focus on more critical projects. For example, SAS program validation and data listings or CRF tabulations can be outsourced to relieve your biostatisticians and SAS programmer analysts. The biostatisticians can then devote time to designing the analysis plan or writing the final report, while the SAS programmer analysts focus on developing CDISC compliant analysis datasets or programming complex summary tables. This is an example of how outsourcing the right project can allow your team to work more efficiently in terms of both time and cost. Selecting the right projects will also boost the motivation of the team, since it reserves the most challenging tasks for them. However, if communication is not clear, outsourcing can damage morale and be misinterpreted as a way of removing opportunities and jobs.

complete paper available at SAS Outsource Papers and SAS Outsource Service.


Thursday, April 23, 2009

Cost Effective Ways to Generate DEFINE.XML

You can have greater understanding and management of your data if it is well documented with data definition documentation in the format of DEFINE.PDF and DEFINE.XML. As the number of datasets and variables increases, this documentation can become very resource intensive. The time consuming documentation task is compounded by the fact that there are constant changes to the data, so the documentation has to keep up with the changes in order for it to be useful and accurate. This paper will suggest methods and tools that enable you to produce your data definition document without purchasing a complex, expensive system.

When you plan for a road trip, you need a map. This is analogous to understanding the data that is going to be part of an electronic submission. The reviewer requires a road map in order to understand what all the variables are and how they are derived. It is in the interest of all team members involved to have the most accurate and concise documentation pertaining to the data. This can help your team work internally while also speeding up the review process, which can make or break an electronic submission to the FDA. Some organizations perform this task at the end of the process, but they lose out on the benefits which the document provides for internal use. It is therefore recommended that you initiate this process early to gain the benefit of having a road map of your data.

The process involved in managing and creating the data definition documentation is an iterative one, since the SAS datasets are updated throughout the project. The constant need to update the documentation is therefore one of the challenges which this paper will address.

Levels of Metadata
There are several steps towards documenting the data definition. Most of what is being done is documenting metadata which is information about the data that is to be included. There are several layers to the metadata. These include:

  1. General Information – This pertains to information that affects the entire set of datasets that are to be included. It could be things such as the name of the study, the company name, or location of the data.
  2. Data Table – This information is at the SAS dataset level. This includes things such as the dataset name and label.
  3. Variable – This information pertains to attributes of the variables within a dataset. This includes such information as variable name, label and lengths.

The order in which the metadata is captured should follow the same order as the layers that are described.

Automating Capturing and Editing
Tools such as PROC CONTENTS and Excel do have capabilities to customize and automate the documentation to a degree. However, they are not intended specifically for creating data definition documentation and therefore have limitations. A tool developed entirely in SAS specifically for generating this type of documentation is Defindoc. It contains both a graphical user interface and a macro interface to fit the user’s requirements, and it addresses the disadvantages of the manual methods. It uses a similar PROC CONTENTS type of mechanism for capturing the initial metadata, but it retains only the information pertinent to the data definition documentation.
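A minimal sketch of the PROC CONTENTS style of metadata capture that such a tool builds upon (the libref here is illustrative):

```sas
*** Capture variable level metadata for every dataset in a library ***;
proc contents data=submit._all_ out=meta noprint;
run;

*** Retain only the attributes pertinent to the data definition ***;
proc sql;
   create table defmeta as
   select memname, memlabel, name, label, type, length, format
      from meta
      order by memname, varnum;
quit;
```

The OUT= dataset gives one row per variable, which maps naturally onto the variable-level layer of metadata described above; the dataset-level layer comes from MEMNAME and MEMLABEL.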

complete paper found at define.xml papers, and define.xml software.


Wednesday, April 22, 2009

Optimizing Your SAS Performance

Google has become very successful by developing an efficient search engine running on commodity hardware. It no longer uses the old model of putting all its resources into one supercomputer; rather, it spreads that processing onto a cluster of smaller machines running in parallel to form a grid. Gordon Moore observed in 1965 that the number of transistors per square inch on integrated circuits had been doubling every year and predicted the trend would continue. This observation has become known as Moore's law and continues to elevate the ubiquitous and relatively inexpensive desktop and laptop computers. This paper will discuss how you can cluster computers in a grid to optimize the execution of SAS programs. Some of the techniques discussed include:

  • Implementing supercomputer power with commodity hardware

  • Submitting SAS programs sequentially while maintaining inter program dependencies

  • Threading multiple groups of programs for optimal performance

  • Measuring SAS performance with Statmark, a standard metric for a cross platform benchmarking for SAS processing

  • Scheduling the execution of programs in a grid environment

In the world of Moore's law, it makes less sense to lay out a large capital investment for a server. Clustering inexpensive smaller machines and dynamically adding new computers to this architecture within a grid can scale your SAS computing resources the way Google scaled its search engine.

In the space of analytics, as statistical models get more sophisticated and datasets get larger, computing resources are the engine that delivers results. SAS has evolved along with hardware systems to utilize the horsepower needed to crunch the statistical models and data manipulations. When I first started working with SAS, it was on a mainframe computer system running TSO. This was centrally controlled, with very limited user customization from a dumb terminal. As computing chips got smaller, SAS processing started to move toward smaller UNIX servers. Then the introduction of SAS on personal computers dramatically changed how most users performed their data exploration. Users were testing out their data models and reports on their PCs, although they still executed production jobs on a networked server. This evolution is continuing as the desktop becomes more powerful. With maturing technologies used to connect these desktop computers, PC desktops are beginning to form computing grids that can outperform traditional servers. The forces that drive this include the shrinking size and cost of computer chips while performance is increasing, coupled with the falling cost of memory and storage. These combined elements supply analytical tools such as SAS with a greater abundance of computing resources. We are at a juncture in this evolution where how the computing resources are utilized can be more important than simply obtaining them.

IT managers need to evaluate the cost over the lifetime of a server, since the price to performance ratio of the computing resources diminishes over time. It is similar to purchasing a car in that the performance of the car does not get any faster while the value of the car is constantly going down. Computing resources have an even lower return on investment in that they become obsolete very quickly, as the next model is usually cheaper yet outperforms the current server model. It is therefore not always prudent to make large capital expenditures on a piece of hardware whose performance to price ratio will diminish in such a short span of time. Grid computing offers a different model in that commodity hardware can be expensed at lower cost. There is greater flexibility in that the grid can scale to match the performance needs of a growing group without necessarily throwing out the old server and replacing it with a new one. Nodes can be added and older nodes can be taken off, like a living organism shedding dead skin. In the grid, the newer nodes have the advantage of obtaining the fastest computing power for the cost at that time. This spreading out of the capital expenses on computing resources is analogous to the time valued benefit of investing small amounts over your lifetime to form a balanced portfolio, instead of putting one big sum into a single stock. It acts as a buffer against the ups and downs of the market; in this case, not the financial market but the market of computing hardware costs. As hardware gets cheaper per unit of performance, software seems to get more expensive as its complexity increases. Licensing SAS is not cheap, so it is wise to optimize the hardware that SAS runs on; over time, the hardware cost will be a fraction of the software cost.

One of the key components in the optimization of computing cost is the ability to measure with precision the performance of your system. This metric can help you evaluate the return on your investment. Without any form of measurement, it is like shopping for a credit card without being able to see the APR or interest rates. This paper will premiere a free utility called Statmark by MXI that gives you the tools to make the right decision in hardware implementations.

SAS Institute has had the technology to run jobs on remote machines for many years with SAS/CONNECT. It utilizes protocols such as TCP/IP to connect to a remote machine and run your program remotely. SAS grid computing leverages this, along with other software such as the Grid Manager, to optimize the performance of SAS across multiple nodes within a grid.
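A minimal SAS/CONNECT sketch of running code on a remote node (the host name, port and code are illustrative):

```sas
*** Sign on to a remote grid node and run code there ***;
%let node1 = gridnode1.example.com 7551;
signon node1;

rsubmit node1;
   *** This block executes on the remote machine ***;
   proc sort data=work.bigdata;
      by subjid;
   run;
endrsubmit;

signoff node1;
```

Grid scheduling builds on this same remote submit mechanism, dispatching independent groups of programs to different nodes while honoring inter program dependencies.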

More can be found at Sy Truong papers, SAS Performance Tuning and SAS Grid Computing.

Tuesday, April 21, 2009

SAS for Excel Jockeys

The ubiquity of Microsoft Excel and Word on desktop computers has made them a default entry point for many users to view and edit their information. Once Excel was included in the MS Office suite in 1993, it became the killer app, overtaking other spreadsheet heavyweights such as Lotus 1-2-3. Although Excel has analysis capabilities, it does not have the powerful statistical procedures and depth of SAS. When analyzing certain types of data, such as financial information, Excel is the tool of choice. Its ability to easily generate graphs, along with the visual pivot table, provides powerful methods to view and analyze data. In certain cases, however, Excel is not capable of performing particular tasks which SAS can provide. Some of the topics that this paper will elaborate on include:

  • Connecting SAS to Excel – The use of TCP port communication allows Excel to connect directly with a SAS session

  • SAS Macro Management – Managing SAS macros that can be delivered to Excel users

  • Deliver SAS Data – SAS Datasets can be delivered directly to Excel or Word in optimized smaller blocks

  • Pivot Table and Graph – SAS data can be formulated and delivered to Excel users in the form of a Pivot Table and/or Graph (Excel Chart)

The union of SAS and Excel enables power users and decision makers, who may not be SAS programmers, to fully explore their business data with the full analytical power of the SAS system.

SAS and Excel
The advent of the modern spreadsheet has revolutionized how users analyze data such as financial information. It visually displays information in cells, similar to how a professor might lay out information on a chalkboard. The disadvantage of a chalkboard appears when a change or mistake is made: rather than erasing each item on the board, the cells of a spreadsheet can be updated through a formula. Programming languages have evolved to work with spreadsheets, so they provide a very efficient method for financial analysts to perform analysis interactively. There are many advantages to this approach, but there are also some limitations. These include:

  • Change Control – Changes to the spreadsheet are very interactive, so the information stored prior to an update is lost. This can lead to regulatory compliance issues or limit the ability to roll back to an earlier version.

  • Security – Spreadsheets are designed for individuals to work on their own set of data. They cannot easily handle multiple users with different permissions. This combined with the lack of change control makes it difficult to function as a secured system for large sets of data in a large organization with many users.

  • Cell Based Formula – Formulas in Excel are defined for a particular cell. An example is summing cells identified by row and column, =SUM(A1:A3). Here, cells A1, A2 and A3 are summed into a specified new cell such as A4. The formula is limited to cell A4 and is not easily replicated across an array of multiple cells. Scripting languages exist to enhance the capabilities of formulas, but they are still limited, since formulas are designed as expressions rather than as a full featured programming language for complex algorithms.

  • Multiple Users – Spreadsheets are designed for a single user. This creates limitations if multiple users need to update the same data.

  • Statistical Analysis – Formulas are designed as expressions to arrive at a numeric answer. Some statistical analysis can be applied through a spreadsheet, but this approach is limited. It is not optimized for more complex statistical modeling such as multivariate regression analysis.

The exploratory and visual aspects of spreadsheets make them very suitable for certain types of financial calculations. However, some of the limitations mentioned above can prevent power users from performing the data mining needed to truly model business conclusions, which SAS can deliver. SAS has a powerful programming language, including a library of statistical procedures, which delivers functionality beyond the capabilities of spreadsheets. The two tools can function together symbiotically to form a complete solution. This paper will explore the integration of SAS and Excel in ways that give SAS programmers methods of delivering to Excel jockeys the power of SAS without requiring them to program in SAS.
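Short of the TCP based integration this paper describes, one simple way to hand SAS results to Excel users is PROC EXPORT (the output path is illustrative, and DBMS=EXCEL requires the SAS/ACCESS Interface to PC Files):

```sas
*** Summarize in SAS, then deliver the result to Excel ***;
proc means data=sashelp.class noprint;
   class sex;
   var height weight;
   output out=summary mean= / autoname;
run;

proc export data=summary
   outfile="c:\reports\class_summary.xls"
   dbms=excel replace;
run;
```

The Excel user then works with a precomputed summary, letting SAS do the statistical heavy lifting while the spreadsheet remains the familiar front end.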

Complete paper is found at: SAS and Excel published paper, SAS Export to Excel, and SAS Programming.

Monday, April 20, 2009

MedDRA and WHO Drug Easy as a Google Search?

In a world where information can easily be accessed with a Google search, mapping an unstructured clinician's term or verbatim term from an adverse event or a drug name to a standard term no longer needs to be an arduous task. One of the challenges of coding medical terminology is that it combines diverse skills from many different disciplines. The user needs to be clinically trained to understand and interpret the meaning of the adverse events or the drug names. A conceptual understanding of the normalized database and the multi-axial hierarchical structure is required to navigate the dictionary. The user must also be adept at augmenting the source data and joining the case report form data with the proper fields of the dictionary tables to arrive at the final mapped data. Expecting users to overcome these hurdles without a clear process or tools can lead to an error prone, laborious and painful process. This paper will address many of the issues confronted when coding terms by demonstrating tried and true methodologies and technologies to automate the process and make it efficient and easy.
  1. Auto Coding – Auto code with existing dictionaries against source data for efficient mapping.
  2. Google-like Search – Searching for terms in the dictionary or mapping decision history can be as simple as a Google search.
  3. Optimize Coding Decision – Intelligent search recommendations, review process, and managing split terms are some techniques used to optimize.
  4. Optimize Dictionary – Loading source dictionary from MSSO (MedDRA) or UMC (WHO Drug) in an optimized fashion for performance.
  5. Managing Multiple Dictionaries – Organize dictionary in a centralized and hierarchical manner to have consistent coding decisions.
  6. Build Knowledge Base – Manually code a term only once and it will be added to a knowledge base for future coding.
  7. Create new Mapped Data – Techniques for creating mapped data sets with the use of email to make the process seamless.

It is essential to have a consistent thesaurus dictionary when performing an analysis on clinical terminologies. This paper will show processes along with SAS-based software solutions such as Sy/Map™ that allow clinical users to work optimally with data managers and clinical programmer analysts. Armed with an understanding of the process and the right tools, you can narrow the gap between the different disciplines required to perform mapping decisions in a manner that is as easy as applying a Google search.
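The auto-coding step described above boils down to a join between the source verbatim terms and the dictionary terms. A minimal sketch in SAS is shown below; the dataset and variable names (ae, meddra_llt, usubjid, aeverbatim, llt_name, pt_name) are hypothetical and will differ from any particular installation or software such as Sy/Map.

```sas
/* Sketch of auto coding: join verbatim terms to dictionary terms. */
/* All dataset and variable names here are hypothetical.           */
proc sql;
  create table ae_coded as
  select a.usubjid,
         a.aeverbatim,
         d.llt_name,                /* matched lowest level term   */
         d.pt_name                  /* mapped preferred term       */
  from ae as a
       left join meddra_llt as d
    on upcase(strip(a.aeverbatim)) = upcase(strip(d.llt_name));
quit;
```

Records that come back with a missing pt_name are the terms needing manual coding; once decided, they can be added to the knowledge base so the next run codes them automatically.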

Controlled Terminology Introduction
The coding of patient data is critical in the grouping, analysis, and reporting of data. Coding decisions directly impact submissions for New Drug Applications (NDAs), safety surveillance, and product labeling. The success of a submission to the FDA can be significantly impacted by the analysis of adverse events, medical history and concomitant medications. The analysis relies on the interpretation of what has been transcribed from the subject CRF (Case Report Form). The original clinical term is referred to as the clinician's term or verbatim term. This term needs to be re-interpreted, or coded, into a preferred term in order for it to be used during analysis. This is because different verbatim terms can have the same meaning, as in the example of the terms "pain in head" and "headache". In this case, the two distinct verbatim terms are coded to one synonymous preferred term. The identical terms and the consistent classification of the term allow the analysis to draw valid statistical conclusions pertaining to the subject's experience. The coding process can therefore affect the statistical interpretation of the adverse events or medications that the subject is taking during the clinical trial.
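The "pain in head" versus "headache" example above can be pictured with a toy lookup in which both verbatim strings resolve to one preferred term. This is only an illustration; in practice the mapping is driven by the full thesaurus dictionary.

```sas
/* Toy illustration only: two distinct verbatim terms coded to one */
/* synonymous preferred term.                                      */
proc format;
  value $pref
    "PAIN IN HEAD" = "Headache"
    "HEADACHE"     = "Headache";
run;

data coded;
  length aedecod $40;
  input aeverbatim $char20.;
  aedecod = put(upcase(aeverbatim), $pref.);
  datalines;
pain in head
Headache
;
run;
```

Both rows of the resulting data set carry the same preferred term, which is what allows the analysis to group the two reports together.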

There is room for inconsistency or error since the process contains many factors that go into making a decision. The following considerations are evaluated in making the optimal interpretation of the true meaning of the clinician’s term.

  • Clinical Accuracy – The interpretation of the meaning of the original term has to be clinically correct. In addition to the original term, other supportive information in the CRFs (e.g. drug indication and strength) is also used to ensure the accuracy of the mapping decision. The person performing this task needs to be clinically trained to decipher the meaning of the verbatim term as it relates to the preferred terms.
  • Global or Project Specific – The coding decision of a term in one specific project can be used again on other projects. It is therefore important to keep a global perspective while making a decision. However, there are instances where a coding decision needs to be applied specifically to special circumstances of the project.
  • Patient History – It is useful to look at the clinical history of the patient in order to understand what led up to the current situation. This allows the clinician to have a historic understanding and therefore make a more accurate interpretation of the terms. However, the decision cannot be subject specific since this needs to be applied to all subjects.
  • Dictionary Update - Understanding the structure of the dictionary and keeping up with the changes to the dictionary is critical for the success of mapping terms.

There are many factors that affect the interpretation of a clinical term, so the process becomes very complex. Besides the decision process, there are other operational and logistical considerations. The original clinician's term can contain multiple terms, so it needs to be split into separate distinct terms which are then coded separately. There are different versions of the dictionaries, so version control becomes very important. There are many team members involved in this effort, so training and standard operating procedures need to be established in order for the team to work together consistently. This multi-faceted process is complex, but once a process is established, everything can work together in harmony so that terms are coded systematically and accurately to produce efficient results.

Mapping Methodologies
After the SAS based customized dictionary is established, the mapping can be performed. There is a series of steps that needs to be performed in order to have your verbatim adverse events or drug names coded to the synonymous preferred terms. This section will describe the methodologies used to effectively manage thesaurus dictionaries and code the verbatim terms.

Before individual terms can be coded, thesaurus dictionaries need to be organized and managed. You first need to identify and classify the types of dictionaries. The types of classifications for dictionaries include:

  • Dictionary Class – Example classifications of dictionaries include WHO Drug, MedDRA or COSTART. This describes the content of the terms pertaining to drug or adverse event names and how it relates to an "external" dictionary. The word "external" in this case refers to a dictionary level, which is described in more detail in the next section.
  • Level – A dictionary can be applied globally to all studies or used specifically for a certain study. It can therefore be distinguished as either "global" or "project specific". These two types of dictionary levels pertain to terms managed internally to your organization. This is different from dictionaries that are managed by an "external" organization, such as the MSSO, which manages MedDRA. These external dictionaries are updated regularly by an external vendor and are managed differently from internal project specific or global level dictionaries.
  • Attribute – Administrative attributes, or metadata, that describe the dictionaries are also essential in managing them. This includes values such as a unique name, the physical location of the associated dictionary files, and the names of the data sets that store the dictionary.

The classification information mentioned above needs to be managed in a way which allows users to register new dictionaries in the event that a new version of a dictionary is made available. Modifications to existing information are also necessary in the event that the underlying metadata changes. The deletion of an existing dictionary can also be applied. Note that this does not necessarily mean deleting the underlying data sets which store the content of the dictionary, but rather just removing the registered information or metadata. External dictionaries are managed by the vendor, but internal dictionaries, whether global or project specific, need the capability of having old terms retired. This means that when a specific coding decision based on a term in the dictionary is no longer valid, it can be removed from the dictionary by a designated administrator. These are some of the tasks that are necessary in managing thesaurus dictionaries in order to optimize the performance of coding terms.
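One simple way to hold the class, level and attribute information described above is a small registration data set that administrators maintain. The structure below is only a sketch, with hypothetical names, versions and paths.

```sas
/* Hypothetical dictionary registry; names, versions and paths are */
/* illustrative only.                                              */
data dict_registry;
  length dictname $20 dictclass $10 level $15 path $80 dsname $32;
  infile datalines dlm=',';
  input dictname $ dictclass $ level $ path $ dsname $;
  datalines;
MedDRA 12.0,MedDRA,external,/dict/meddra120,meddra_llt
WHODrug 2009,WHODrug,external,/dict/whodrug09,whod_ddict
Study XYZ AE,MedDRA,project,/studies/xyz/dict,xyz_aedict
Global AE,MedDRA,global,/dict/global,glob_aedict
;
run;
```

Registering, modifying or deleting a dictionary then amounts to adding, updating or removing a row in this control table, without touching the underlying dictionary data sets.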

The complete paper is available at Coding Dictionaries Papers and related AE Coding Software.


Sunday, April 19, 2009

CDISC Implementation for Dummies

CDISC (Clinical Data Interchange Standards Consortium) standards have been in development for many years. There have been major structural changes to the recommended standards from version 2 to 3. It is still an evolving process, but it has reached a point of critical mass such that organizations are recognizing the benefits of taking the proposed standard data model out of the theoretical realm and putting it into real life applications. The complexity of clinical data, coupled with the technologies involved, can make implementation of a new standard challenging. This paper will explore the pitfalls of the CDISC SDTM (Study Data Tabulation Model) transformation and present methodologies and technologies that make the transformation of nonstandard data into CDISC efficient and accurate.

There are some tasks within the process that can be performed independently, but the majority of the steps depend on each other and therefore follow a sequence.

It is important to have a clear vision of the processes for the project before you start and to be aware that the effort is resource-intensive. This provides the ability to resource and plan for all the processes and enables adherence to deadlines and budgets. The organization and planning for this undertaking is an essential first step towards an effective implementation.


STEP 5: TRANSFORMATION SPECIFICATION – The specification for transforming to CDISC standards is a detailed road map that will be referenced and used by all team members during the transformation implementation and review. Different technologies can be used to perform this task; the following example utilizes tools including MS Excel and Transdata. Dataset transformation is a process in which a set of source datasets and their variables is changed to meet new standard requirements. The following list describes some of the changes that can occur:

  1. Dataset Name - SAS dataset names must be updated to match SDTM standards, which require them to be no more than 8 characters in length.
  2. Dataset Label - The descriptive labels of SAS datasets must be modified to conform to SDTM standards.
  3. Variable Name - Each variable within a SAS dataset has a unique name. Variable names can be the same across different datasets, but if they share the same name, they are generally expected to possess the same attributes. Variable names are no more than 8 characters in length.
  4. Variable Label - Each variable has an associated label that describes the variable in more detail. Labels are no more than 40 characters in length.
  5. Variable Type - A variable’s type can be either character or numeric.
  6. Variable Length - A character variable can vary in length from 1 to 200 characters.
  7. Format - Variable format will be updated.
  8. Yesno - If the value of the variable is "yes", it will produce a new row with the newly assigned value of the label.
  9. Vertical - Multiple variables can be assigned to one variable that will produce a row if it has a value.
  10. Combine - Combine values of multiple source variables into one destination variable.
  11. Drop - The variable from the source dataset will be dropped when creating the destination data.
  12. Same - The same variable with all of the same attributes will be kept in the destination data.
  13. Value Change - This can have either a recoding of values or a formula change. This will change the actual values of the variable.

There may be other types of transformations, but these are the common transformation types that are usually documented in the specification. The transformation specification is stored in an Excel spreadsheet and is organized by tabs. The first tab named "Tables" contains a list of all the source tables. The subsequent tabs contain the transformation specifications for each source dataset as specified in the initial tables tab.
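As an illustration of how a transformation type from the list above might be coded, the following data step performs a "vertical" transformation, producing one output row per populated source variable. The dataset and variable names are hypothetical.

```sas
/* Sketch of a 'vertical' transformation: several source variables */
/* collapse into one variable, with a row output per value.        */
/* Dataset and variable names are illustrative only.               */
data ae_vert(keep=usubjid aeterm);
  set source_ae;                   /* source with sympt1-sympt3    */
  length aeterm $40;
  array sympt{3} $ sympt1-sympt3;
  do i = 1 to 3;
    if not missing(sympt{i}) then do;
      aeterm = sympt{i};           /* one output row per value     */
      output;
    end;
  end;
run;
```

The "yesno" and "combine" transformations follow the same pattern, with the assignment and the output condition changed accordingly.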

More details at: CDISC Papers, CDISC Software, Define.xml Software and SAS Outsourcing.


Friday, April 17, 2009

Can Validating SAS Programs be Fun and Easy?

Validation is normally a laborious and arduous task. This paper will present new methodologies and tools developed in SAS that will make the process painless. The goal is to require little or no extra effort from the user's perspective, while gaining the benefit of a secured audit trail of all SAS programs during the development and verification process. Some benefits are described below:
  • comparing differences between different versions of programs
  • adding notes describing edit changes to each version
  • adding a validation checklist of tasks associated during verification and validation
  • managing status of development to production by applying version numbers such as version 1.2
  • generating reports for documentation and communication during validation

After you realize the ease of use and the amount of quality control that can be gained, the task of validation becomes transparent and fun.

Validating SAS programs presents some unique challenges, especially when working within a regulated environment such as the pharmaceutical industry. This paper explores the challenges specific to this environment, though the examples can be useful in other environments as well. SAS programmers come from many different backgrounds, ranging from biology to statistics; the majority are not from a computer science background. This is normally because they have expertise in the domain of the data they are analyzing. That expertise is helpful for ensuring the outcome of the analyses, but it creates an unstructured environment for developing SAS programs. The work flow is driven by reports and is therefore usually done in an ad hoc manner. The analyst normally gets mockups of the report which describe what they need to produce. They often jump right into SAS programming with little or no data and programming design consideration. SAS has adapted to this work flow well compared to other, more structured high-level languages. Languages such as C or Java are more strongly typed, meaning that variables and tables have to be defined with the proper type and length before they can be used. SAS programs, on the other hand, can dynamically create variables as you go along, lending themselves to the ad hoc nature of the development process. This can be beneficial for exploratory analysis and conducting experiments with the data, but it fosters software development that is riddled with maintenance challenges. The tools used to develop SAS programs, such as display manager or text editors, are further examples of this ad hoc nature. Display manager gives some structure, but it is designed for exploration. Software development tools for other languages allow the programmer to manage the source code as it relates to other programs and data. SAS programs, in contrast, are plain text files that any user can edit with a text editor of their choice. In a similar way, display manager leaves the programs stored on disk as text files and does not impose any other structure upon them.
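For example, a SAS data step will happily create variables and infer their types at first assignment, with no prior declaration, which is exactly what a strongly typed language would reject:

```sas
/* SAS infers variable type and length at first assignment; no     */
/* declaration is required, unlike in C or Java.                   */
data demo;
  x = 1;            /* numeric, created on the fly                 */
  y = "ad hoc";     /* character, length taken from this value     */
  z = x * 2;        /* numeric, derived from x                     */
run;
```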

This is an example of why SAS programming is difficult to validate. Programs are inherently buggy by nature since there is great variability and complexity: they are written by humans but interpreted by machine. Even though the syntax provides constructs to help structure the logic, humans do not think in complete logic, and this leads to misinterpretations and bugs. SAS programming is often data driven, which adds another dimension of complexity since the data can be dynamic in content and structure. Changes in data drive changes in results and therefore changes in programs. Managing the changes of each component and all of their interrelationships makes SAS programs an ever-changing organism desperately in need of containment. The issue of change control will become a major strategy in taming the beast and is one of the primary themes of this paper.

One attempt to create structure around the chaos is the use of SAS macros. Macros are intended to isolate repeated tasks and parameterize them so that they can be reused. However, the way macros are sometimes used leads to spaghetti code, with one macro calling another macro in a nested loop. Rather than simplifying, this sometimes adds complexity and makes programs more challenging to maintain.

Validation Benefits
There are many challenges in creating an effective validation environment for SAS programs, but there are also many benefits that justify the effort, and strong business reasons for performing validation. The most obvious is the set of FDA requirements spelled out in 21 CFR Part 11. In a regulated environment, validation is not just a nice idea; it is a legal necessity. Here are some examples of other important benefits.

  1. Fewer Rollouts – Each time a program is rolled out, it is commonly followed by patches to fix bugs that did not get caught during validation.
  2. Prevent Data Corruption – Bugs can often be traced back to programs that have not been fully validated. Using these programs corrupts the data and reports.
  3. Facilitates Communication – The requirements and functional specifications, along with the test scripts, can be developed in close collaboration with the end user. This leads to a clearer understanding between the user and the developer.
  4. Software Maintenance – During validation testing, versioning and an audit trail are created. This helps with tracing and attributing features and bugs. This audit trail leads to better tracking of programs between different releases which helps in the management of bugs and wish list items.

The complete paper is available at SAS Validation software and SAS Outsourcing.


Thursday, April 16, 2009

Cloud Computing with SAS

Cloud Computing is not a new computer architecture construct, but several forces are going to make it an alternative, if not the main, way SAS software will be delivered in the future. This paper will detail some of the forces that are pushing many software solutions into Cloud Computing while describing how SAS is a microcosm of this transformation. Some of the topics presented in this paper include:
  • Cloud Computing Introduction
  • Driving Forces for Change
  • Software Components of Cloud Computing
  • Requirements for Success

The development of Cloud Computing has been in the making for some time. There are still some challenges and SAS needs to prepare and make a shift to this computing environment if it is to remain a dominant force in business intelligence software.

As software becomes more sophisticated in handling larger and more complex sets of data, the computer systems that run these applications also become more challenging to manage. New system architectures such as Grid computing for parallel processing and multi-tiered computer architecture are designed for scalability to optimize performance. This environment enables hardware configurations to deliver and match user demand for data mining and powerful analytics. IT managers responsible for the installation and management of these systems are faced with daunting challenges that go beyond their resources, because each vendor and software system has its own unique computing configuration requirements and thus demands a full-time staff member to validate each installation and perform the administration and maintenance of that system. This poses challenges but also opportunities, which give rise to Cloud Computing. Cloud Computing embraces the complexities and unique requirements of complex software solutions, yet delivers them to users in much less time and with much lower start-up cost. Cloud Computing is therefore going to alter the way computer systems are implemented by large and small organizations alike.

Software as we know it is going through a fundamental change in the way it is delivered to users. Historically, it has gone through several stages in its evolution, and it is now going to make another transformative leap as it adapts to Cloud Computing. Early in software development, the main processing resided on a mainframe computer. Users connected to it through a dumb terminal which only displayed text on a monochrome monitor. All the computing processes and related files were centralized on the mainframe. With the advent of the personal computer, the processing power on the user's desktop became more powerful, so software shifted and was installed on desktop computers. Many software packages were delivered in a box, shrink wrapped and stored on CDs. The SAS system reflects this evolution, with earlier versions of SAS on mainframes and versions 8 and 9 migrating to desktops. The complexity of multiple offerings from SAS pushed the boundaries of the software-in-a-box model and is transitioning it into the new environment of Cloud Computing. Elements of this are beginning to happen as SAS software is now delivered as a download.

The mass market acceptance of services such as Google and Amazon has helped push the evolution of Cloud Computing. The complexity of the software configuration is moving to a separate set of servers managed externally to the internal IT resources or individual software user. The hardware and software for these new systems are placed behind a "cloud" on data centers which are managed separately and accessed through the Internet. It is then delivered to users through a web browser so there is no longer the need for lengthy installation, validation, version upgrades and other related software maintenance. The portability of being able to access the software and related data combined with the ability to outsource the management of sophisticated software systems is proving to be more efficient and necessary for software such as SAS.

There are many forces at play creating a groundswell for the change that is enabling the growth of Cloud Computing. Some are technology advancements which have been evolving over the past decade, while others are market and economic forces that are more recent and can be more potent in specific vertical markets. The technology developments by themselves are not significant, but the culmination of all of them has profound effects across industries. The following list describes some of the reasons for the change and also provides insights on how you can adjust to and thrive in this dynamic environment.

  1. Telecommunications – With the dot com boom of the late 90s, fiber optics and high-bandwidth data infrastructure were put in place for global connections, allowing access to high-speed voice and data connectivity. The dot com bust at the start of the millennium forced many of the companies that established this infrastructure out of business, making the fiber optics and equipment available wholesale or at a market discount. This created cost-effective opportunities for remote computing in many environments and in particular for the development of Cloud Computing.
  2. Commodity Hardware – Computing has evolved from expensive centralized mainframes to personal desktop computers (PCs) that have become ubiquitous. The mass production and personal consumption of PCs has made them mainstream appliances and driven down their prices. Although these PCs were originally designed for individual use, they can also be networked and clustered into grid computing to form powerful supercomputers which provide the backbone of Cloud Computing. The power of Cloud Computing is no longer restricted to large institutions with large budgets but can be implemented at a fraction of the cost due to the commoditization of hardware accompanied by new clustering software.
  3. Open Source – The decentralized approach to software development challenged the institutionalized form of software development. It allowed individuals to contribute program code to form sophisticated operating systems such as Linux, which runs on most web servers and many consumer electronic devices. Open source software also played a significant role in the web server technologies that form the core components of Cloud Computing. In addition to the Linux operating system first installed on commodity hardware, Apache web servers provided web services. The server-side software on these web servers, along with middleware including client-side XML based components, is all open source, forming the foundation for many Cloud Computing applications.

The complete paper was awarded best paper at SAS Global Forum; see also the related Serious Adverse Event Software.


Monday, April 13, 2009

SAS Programmers Need to Know Regulations?

Validation was introduced to the FDA in the mid 1970s, but it is very much alive and relevant in today’s drug development environment. The recall of Merck’s Vioxx in September of 2004, after a study linked it to increased risk of heart attack and stroke, demonstrates why careful scrutiny in the regulatory review of the drug approval process is needed to prevent such disasters. The need for validation was affirmed again in July 2007, when the FDA concluded that GlaxoSmithKline’s type 2 diabetes drug Avandia increases heart attack risk. Ideally, the FDA would be proactive in setting regulations that prevent dangerous drugs from reaching the market, rather than recalling them after many deaths from drugs proven to be harmful. Historically, however, the FDA has been reactive in proposing precautionary procedures such as validation. The early proposals for validation were in direct response to problems with sterility in the production of parenteral products. Validation procedures then spread to other processes, evolving into the 1987 guideline definition: "Establishing documented evidence that provides a high degree of assurance that a specific process will consistently produce a product meeting its pre-determined specifications and quality attributes."

Although validation permeates all aspects of drug development, ranging from manufacturing to computer systems, in the world of analytics and the use of SAS it is computer system validation regulation that has a direct impact. In 1983 the FDA published its guide for computerized systems in pharmaceutical processing. This developed into the 1997 regulation 21 CFR Part 11, with rules on the use of electronic records and electronic signatures.

There are many provisions to the guidelines, but one of the core themes is defined as the “Confirmation by examination and provision of objective evidence that software specifications conform to user needs and intended uses, and that the particular requirements implemented through software can be consistently fulfilled”. This means that any software developed needs to behave as predetermined by its requirements in a consistent and expected manner. This structured approach is intended to have the computer system function as intended from its design. In a validated system, you would define all the requirements and functional specifications first and then test and verify that the software functions according to these requirements. In most cases this process works well in drug development, but it can pose many challenges in an environment of data exploration where the end result is not always predefined. The data exploratory nature of SAS is very well suited for data analysis but often creates challenges when working within the confines of a structured validated environment.

The SAS System is a comprehensive set of tools, but it is only one component within a larger system used to perform data management, analysis and reporting of clinical trials data. Validation can only be effective if it is treated as part of an entire process. Even if you are developing a single program that reads one table from a relational database and generates one report that goes into an electronic submission, your single program has a significant impact and is therefore interconnected with a larger system. There are different inherent risks associated with each program. The level of risk pertaining to your use of SAS has a direct correlation with the scope of your validation effort. It is therefore important to perform a risk assessment to determine the appropriate level of validation effort applied to each component. All the components of your computer system need to function according to specification, with an appropriate level of validation, for the whole system to function consistently with integrity.

SAS Validation Examples
The tools used to develop SAS programs, such as display manager or text editors, are further examples of the ad hoc nature of SAS programming. Display manager gives some structure, but it is designed for data exploration. Software development tools for other languages allow the programmer to manage the source code as it relates to other programs and data in organized units such as projects. However, SAS programs are plain text files that any user can edit with any text editor of their choice. In a similar way, display manager leaves the programs stored on disk as text files and does not enforce any other structure upon these files.

Similar to validating the SAS System itself, SAS programs need an organized methodology and approach for validation. One of the first steps in validating SAS programs is to have a test plan. The test plan is derived from a list of functional specifications. The functional specifications are, in turn, driven by the list of requirements. This is an interrelated set of documents that drive the process.

The level of detail of documentation for validation depends on the complexity of the set of programs being validated. The following levels distinguish the types of SAS programs.

  • Exploratory - These are random sets of programs developed by programmers and statisticians to test out a hypothesis. They are not included in the final analysis or part of a submission.
  • Stand Alone - These programs are developed to generate specific reports or analysis files. They may be driver programs that call other macros, but these driver programs are not used multiple times.
  • Multi-Use - These are usually macro code or standard code segments that are used multiple times in more than one analysis. They can be stored in a global library where multiple users can access them.

More to come in the book I am working on, "Becoming a SAS Programmer in Pharmaceutical Industry"...
