Validating Transformation to CDISC SDTM and ADaM
You have just finished transforming your data from your operational clinical database into SDTM and ADaM. How do you go about validating the result? In general, you want a method that reviews both the structure, according to the guidelines, and the values of the data, to ensure that nothing has changed from what was collected. The following steps use SAS tools to validate and ensure the integrity of your final CDISC data.
- Transformation Model Validation - A transformation model documents the source data and how it was transformed, confirming the source and destination variables.
- Data Value Subset Review - An automated report prints a subset of the data before and after the transformation so it can be reviewed and validated. This may catch truncation.
- Categorical Aggregate Review - An automated summary report compares the frequency counts of categorical variables before and after the transformation, verifying that the counts are the same. This catches missing or dropped values.
- Continuous Aggregate Review - An automated summary report compares the minimum, maximum, and median of continuous variables before and after the transformation, verifying that the statistics are the same. This catches missing or dropped values.
- CDISC Rules PROC CDISC - SAS tools such as PROC CDISC provide a short list of deviations, or guidelines that may have been violated. This review is applied programmatically and a report is generated.
- Variable Lengths - All variable lengths are evaluated, and a report is generated with recommendations on standardizing variable lengths across all data to adhere to standards.
- Deviation Summary - A summary report documenting all deviations and their resolutions.
- Test Plan - A formal test plan document is used to document all the related tests and deviations.
- CDISC Builder Rule Test - An 18-criteria checklist. The criteria are listed here, with an example report shown below:
- Required Fields: Required identifier variables including: DOMAIN, USUBJID, STUDYID and --SEQ.
- Subject Variable: (4.1.2.3) For variable names, labels and comments, use the word "Subject" when referring to "patients" or "healthy volunteers".
- Variable Length: (4.1.2.1) Variable names are limited to 8 characters with labels up to 40 characters.
- Yes/No: (4.1.3.7) Variables where the response is Yes or No (Y/N) should normally be populated for both Yes and No responses.
- Date Time Format: (4.1.4.1) Date or Datetime must be in ISO 8601 format.
- Study Day Variable: (4.1.4.4) The study day variable has the name --DY.
- Variable Names: (3.2.3) If any variable names used match CDISC variables, the associated label has to match.
- Variable Label: (3.2.3) If any variable labels match that of CDISC labels, the associated variable has to match.
- Variable Type: (3.2.3) If any variables match that of CDISC variables, the associated type has to match.
- Dataset Names: (3.2.3) If any of the dataset names match CDISC, the associated data label has to match.
- Dataset Labels: (3.2.3) If any of the dataset labels match CDISC, the associated dataset name has to match.
- Abbreviations: (10.3.1) (10.4) The following abbreviations are suggested for variable names and data sets.
| Acronym | Descriptive Text |
|---|---|
| AE | Adverse Events |
| AU | Autopsy |
| BM | Bone Mineral Density (BMD) Data |
| BR | Biopsy |
| CM | Concomitant Meds |
| CO | Comments |
| DA | Drug Accountability |
| DC | Disease Characteristics |
| DM | Demographics |
| DS | Disposition |
| DV | Protocol Deviations |
| EE | EEG |
| EG | ECG |
| EX | Exposure |
| HU | Healthcare Resource Utilization |
| IE | Inclusion/Exclusion |
| IM | Imaging |
| LB | Laboratory Data |
| MB | Microbiology Specimens |
| MH | Medical History |
| ML | Meal Data |
| MS | Microbiology Susceptibility |
| OM | Organ Measurements |
| PC | PK Concentration |
| PE | Physical Exam |
| PP | PK Parameters |
| PG | Pharmacogenomics |
| QS | Questionnaires |
| SC | Subject Characteristics |
| SE | Subject Elements |
| SG | Surgery |
| SK | Skin Test |
| SL | Sleep (Polysomnography) Data |
| SL | Signs and Symptoms |
| ST | Stress (Exercise) Test Data |
| SU | Substance Use |
| SV | Subject Visits |
| TA | Trial Arms |
| TE | Trial Elements |
| TI | Trial Inclusion/Exclusion Criteria |
| TS | Trial Summary |
| TV | Trial Visits |
| VS | Vital Signs |
| ACN | ACTION |
| ADJ | ADJUSTMENT |
| ADJ | ANALYSIS DATASET |
| BL | BASELINE |
| BRTH | BIRTH |
| BOD | BODY |
| CAN | CANCER |
| CAT | CATEGORY |
| C | CHARACTER |
| CND | CONDITION |
| CLAS | CLASS |
| CD | CODE |
| COM | COMMENT |
| CON | CONCOMITANT |
| CONG | CONGENITAL |
| DTC | DATE TIME - CHARACTER |
| DY | DAY |
| DTH | DEATH |
| DECOD | DECODE |
| DRV | DERIVED |
| DESC | DESCRIPTION |
| DISAB | DISABILITY |
| DOS | DOSE |
| DOS | DOSAGE |
| DOSE | DOSE |
| DOSE | DOSAGE |
| DUR | DURATION |
| EL | ELAPSED |
| ET | ELEMENT |
| EM | EMERGENT |
| END | END |
| EN | END |
| ETHNIC | ETHNICITY |
| X | EXTERNAL |
| EVAL | EVALUATOR |
| EVL | EVALUATION |
| FAST | FASTING |
| FN | FILENAME |
| FL | FLAG |
| FRM | FORMULATION, FORM |
| FREQ | FREQUENCY |
| GR | GRADE |
| GRP | GROUP |
| HI | HIGHER LIMIT |
| HOSP | HOSPITALIZATION |
| ID | IDENTIFIER |
| INDC | INDICATION |
| INDC | INDICATOR |
| INT | INTERVAL |
| INTP | INTERPRETATION |
| INV | INVESTIGATOR |
| LIFE | LIFE-THREATENING |
| LOC | LOCATION |
| LOINC | LOINC CODE |
| LO | LOWER LIMIT |
| MIE | MEDICALLY-IMPORTANT EVENT |
| NAM | NAME |
| NST | NON-STUDY THERAPY |
| NR | NORMAL RANGE |
| ND | NOT DONE |
| NUM | NUMBER |
| N | NUMERIC |
| ONGO | ONGOING |
| ORD | ORDER |
| ORIG | ORIGIN |
| OR | ORIGINAL |
| OTH | OTHER |
| O | OTHER |
| OUT | OUTCOME |
| OD | OVERDOSE |
| PARM | PARAMETER |
| PATT | PATTERN |
| POP | POPULATION |
| POS | POSITION |
| QUAL | QUALIFIER |
| REAS | REASON |
| REF | REFERENCE |
| RF | REFERENCE |
| RGM | REGIMEN |
| REL | RELATED |
| R | RELATED |
| REL | RELATIONSHIP |
| R | RELATIONSHIP |
| RES | RESULT |
| RL | RULE |
| SEQ | SEQUENCE |
| S | SERIOUS |
| SER | SERIOUS |
| SEV | SEVERITY |
| SPEC | SPECIMEN |
| SPC | SPECIMEN |
| SPEC | SPONSOR |
| SPC | SPONSOR |
| ST | STANDARD |
| STD | STANDARD |
| ST | START |
| STD | START |
| STAT | STATUS |
| SCAT | SUBCATEGORY |
| SUBJ | SUBJECT |
| SUPP | SUPPLEMENTAL |
| SYS | SYSTEM |
| TXT | TEXT |
| TM | TIME |
| TPT | TIMEPOINT |
| TOT | TOTAL |
| TOX | TOXICITY |
| TRANS | TRANSITION |
| TRT | TREATMENT |
| U | UNIT |
| U | UNIQUE |
| UP | UNPLANNED |
| VAR | VARIABLE |
| VAL | VALUE |
| V | VEHICLE |

- SEQ Values: When the --SEQ variable is used, it must have unique values for each USUBJID within each domain.
- Label Casing: For dataset labels and variable labels, all non-trivial words (more than three characters) must start with a capital letter, with the rest of the characters in lowercase.
- Required Values: (4.1.1.5) For required fields such as those listed under Required Fields, check that values are present. If any values are missing, report the observation number where the value is missing.
- Similar Parentheses: For labels with matching values inside parentheses, such as (Yes/No), within the same dataset, check whether the variables have the same type and length. If not, report the differences.
- Required Variables: (4.1.1.5) A Required variable is any variable that is basic to the identification of a data record (i.e., essential key variables and a topic variable) or is necessary to make the record meaningful. Required variables should always be included in the dataset and cannot be null for any record.
- Expected Variable: (4.1.1.5) An Expected variable is any variable necessary to make a record useful in the context of a specific domain. Columns for Expected variables are assumed to be present in each submitted dataset even if some values are null.
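The categorical and continuous aggregate reviews described in the steps above come down to computing the same summaries on the source data and on the transformed data and reporting any differences. The following is a minimal sketch of that idea in Python rather than SAS. It is an illustration only, not the tooling the article refers to; the function names, variable names (`GENDER`, `SEX`, `WEIGHT`), and sample records are all hypothetical.

```python
from collections import Counter
from statistics import median


def categorical_diff(source_rows, target_rows, source_var, target_var):
    """Return {value: (source_count, target_count)} for values whose counts differ."""
    src = Counter(row.get(source_var) for row in source_rows)
    tgt = Counter(row.get(target_var) for row in target_rows)
    values = sorted(set(src) | set(tgt), key=str)
    return {v: (src[v], tgt[v]) for v in values if src[v] != tgt[v]}


def continuous_summary(rows, var):
    """Summarize the non-missing values of one continuous variable."""
    values = [row[var] for row in rows if row.get(var) is not None]
    if not values:
        return {"n": 0, "min": None, "max": None, "median": None}
    return {"n": len(values), "min": min(values),
            "max": max(values), "median": median(values)}


def continuous_diff(source_rows, target_rows, source_var, target_var):
    """Return {statistic: (source_value, target_value)} for statistics that differ."""
    src = continuous_summary(source_rows, source_var)
    tgt = continuous_summary(target_rows, target_var)
    return {stat: (src[stat], tgt[stat]) for stat in src if src[stat] != tgt[stat]}


# Hypothetical records: the third subject lost SEX and WEIGHT in the transformation.
source = [
    {"PATID": "001", "GENDER": "M", "WEIGHT": 70.5},
    {"PATID": "002", "GENDER": "F", "WEIGHT": 62.0},
    {"PATID": "003", "GENDER": "F", "WEIGHT": 81.2},
]
target = [
    {"USUBJID": "S1-001", "SEX": "M", "WEIGHT": 70.5},
    {"USUBJID": "S1-002", "SEX": "F", "WEIGHT": 62.0},
    {"USUBJID": "S1-003", "SEX": None, "WEIGHT": None},
]

print(categorical_diff(source, target, "GENDER", "SEX"))
print(continuous_diff(source, target, "WEIGHT", "WEIGHT"))
```

An empty result from either function means the summaries agree. A non-empty result points at the values or statistics to investigate, which is the same signal the automated summary reports are meant to give.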
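Several of the checklist criteria above are mechanical enough to sketch in a few lines: variable name and label lengths, missing required values, ISO 8601 dates, and unique --SEQ values per USUBJID. The Python sketch below is an assumption-laden illustration, not the CDISC Builder or PROC CDISC implementation. The function `check_domain`, its parameters, and the sample AE records are hypothetical, and the ISO 8601 pattern is deliberately simplified (it accepts only dates and date-times, with no time zones, durations, or intervals).

```python
import re
from collections import Counter

# Simplified pattern: YYYY, YYYY-MM, YYYY-MM-DD, optionally followed by THH:MM or THH:MM:SS.
ISO_8601 = re.compile(r"\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2})?)?)?)?")


def check_domain(rows, labels, seq_var, required=("STUDYID", "DOMAIN", "USUBJID")):
    """Check one domain. rows: list of dicts, one per record; labels: {variable: label}."""
    findings = []

    # Variable Length: names up to 8 characters, labels up to 40.
    for name, label in labels.items():
        if len(name) > 8:
            findings.append(f"{name}: variable name longer than 8 characters")
        if len(label) > 40:
            findings.append(f"{name}: label longer than 40 characters")

    seen = Counter()
    for obs, row in enumerate(rows, start=1):
        # Required Values: report the observation number of any missing value.
        for var in (*required, seq_var):
            if row.get(var) in (None, ""):
                findings.append(f"obs {obs}: required variable {var} is missing")

        # Date Time Format: variables ending in DTC must look like ISO 8601.
        for var, value in row.items():
            if var.endswith("DTC") and value and not ISO_8601.fullmatch(str(value)):
                findings.append(f"obs {obs}: {var}={value!r} is not ISO 8601")

        # SEQ Values: --SEQ must be unique for each USUBJID within the domain.
        key = (row.get("USUBJID"), row.get(seq_var))
        seen[key] += 1
        if seen[key] > 1:
            findings.append(f"obs {obs}: duplicate {seq_var}={key[1]} for USUBJID={key[0]}")

    return findings


# Hypothetical AE domain with one problem of each kind.
labels = {
    "STUDYID": "Study Identifier",
    "DOMAIN": "Domain Abbreviation",
    "USUBJID": "Unique Subject Identifier",
    "AESEQ": "Sequence Number",
    "AESTDTC": "Start Date/Time of Adverse Event",
    "AETERMLONG": "Reported Term",
}
rows = [
    {"STUDYID": "S1", "DOMAIN": "AE", "USUBJID": "S1-001", "AESEQ": 1, "AESTDTC": "2006-03-14"},
    {"STUDYID": "S1", "DOMAIN": "AE", "USUBJID": "S1-001", "AESEQ": 1, "AESTDTC": "14MAR2006"},
    {"STUDYID": "S1", "DOMAIN": "AE", "USUBJID": "", "AESEQ": 2, "AESTDTC": "2006-03"},
]

findings = check_domain(rows, labels, seq_var="AESEQ")
for finding in findings:
    print(finding)
```

Collecting findings as plain strings keeps the sketch short. A real deviation report would more likely record the rule identifier, dataset, and observation as separate fields so that the results can feed the deviation summary.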