Validating Transformation to CDISC SDTM and ADaM

You have just finished transforming your data from your operational clinical database into SDTM and ADaM. How do you go about validating the result? In general, you want a method that reviews both the structure, according to the guidelines, and the values of the data, to ensure that nothing has changed from what was collected. The following steps use SAS tools to validate and ensure the integrity of your final CDISC data.

Transformation Model Validation - A transformation model documents the source data and how it was transformed, confirming the source and destination variables.
Data Value Subset Review - An automated report prints a subset of the data before and after the transformation for review and validation. This may catch truncation.
Categorical Aggregate Review - An automated summary report of the frequency counts of categorical variables, before and after the transformation, verifies that the counts are the same. This catches missing or dropped values.
Continuous Aggregate Review - An automated summary report of the minimum, maximum, median and counts of continuous variables, before and after the transformation, verifies that the statistics are the same. This catches missing or dropped values.
CDISC Rules PROC CDISC - SAS tools such as PROC CDISC provide a short list of deviations, or guidelines that may have been violated. This review is applied programmatically and a report is generated.
Variable Lengths - All variable lengths are evaluated and a report is generated with recommendations for standardizing variable lengths across all data to adhere to standards.
Deviation Summary - A summary report documenting all deviations and their resolutions.
Test Plan - A formal test plan document is used to document all the related tests and deviations.
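
The two aggregate reviews above can be sketched in a few lines. The example below is written in Python rather than SAS, purely to illustrate the logic; the dataset contents, the variable names (GENDER, WT, SEX, WEIGHT) and the recode map are all hypothetical, and a real review would read the actual source and transformed datasets.

```python
from collections import Counter
from statistics import median


def categorical_counts(rows, var, recode=None):
    """Frequency of each value of `var`; `recode` maps source codes to target codes."""
    recode = recode or {}
    return Counter(recode.get(row.get(var), row.get(var)) for row in rows)


def categorical_differences(before, after):
    """Values whose frequency changed, as {value: (count_before, count_after)}."""
    return {
        value: (before[value], after[value])
        for value in set(before) | set(after)
        if before[value] != after[value]
    }


def continuous_summary(rows, var):
    """Count, min, max and median of the non-missing values of `var`."""
    values = [row[var] for row in rows if row.get(var) is not None]
    if not values:
        return {"n": 0, "min": None, "max": None, "median": None}
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "median": median(values),
    }


# Hypothetical source data and transformed data (subject 003 was dropped).
raw_demog = [
    {"PATID": "001", "GENDER": "Male", "WT": 70.5},
    {"PATID": "002", "GENDER": "Female", "WT": 62.0},
    {"PATID": "003", "GENDER": "Female", "WT": None},
]
transformed = [
    {"USUBJID": "S1-001", "SEX": "M", "WEIGHT": 70.5},
    {"USUBJID": "S1-002", "SEX": "F", "WEIGHT": 62.0},
]

sex_diff = categorical_differences(
    categorical_counts(raw_demog, "GENDER", recode={"Male": "M", "Female": "F"}),
    categorical_counts(transformed, "SEX"),
)
weight_same = (
    continuous_summary(raw_demog, "WT") == continuous_summary(transformed, "WEIGHT")
)
print(sex_diff)
print(weight_same)
```

In this made-up data the dropped subject shows up in the categorical comparison but not in the continuous one, because that subject's weight was missing to begin with. That is one reason to run both reviews rather than rely on either alone.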

CDISC Builder Rule Test - An 18-criterion checklist. The criteria are listed here, with an example report shown below:

Required Fields: Required identifier variables include DOMAIN, USUBJID, STUDYID and --SEQ.
Subject Variable: (4.1.2.3) For variable names, labels and comments, use the word "Subject" when referring to "patients" or "healthy volunteers".
Variable Length: (4.1.2.1) Variable names are limited to 8 characters with labels up to 40 characters.
Yes/No: (4.1.3.7) Variables where the response is Yes or No (Y/N) should normally be populated for both Yes and No responses.
Date Time Format: (4.1.4.1) Dates and datetimes must be in ISO 8601 format.
Study Day Variable: (4.1.4.4) The study day variable has the name --DY.
Variable Names: (3.2.3) If a variable name matches a CDISC variable name, the associated label has to match.
Variable Label: (3.2.3) If a variable label matches a CDISC label, the associated variable name has to match.
Variable Type: (3.2.3) If a variable matches a CDISC variable, the associated type has to match.
Dataset Names: (3.2.3) If a dataset name matches a CDISC dataset name, the associated dataset label has to match.
Dataset Labels: (3.2.3) If a dataset label matches a CDISC dataset label, the associated dataset name has to match.
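
Several of the criteria above are simple metadata checks. The following is a minimal sketch in Python (not SAS, and not the CDISC Builder implementation) of three of them: required identifiers, name and label lengths, and date format. The metadata layout, the function names and the AESTARTDATE variable are assumptions made for illustration. The date pattern is deliberately simplified: it checks only the shape YYYY[-MM[-DD[Thh[:mm[:ss]]]]] and ignores durations, intervals, time zones and calendar validity.

```python
import re

# Simplified ISO 8601 shape: YYYY[-MM[-DD[Thh[:mm[:ss]]]]].
# Assumption: durations, intervals, time zones and omitted components
# are out of scope, and 2008-02-31 would still pass.
ISO8601_PATTERN = re.compile(
    r"\d{4}"
    r"(-(0[1-9]|1[0-2])"
    r"(-(0[1-9]|[12]\d|3[01])"
    r"(T([01]\d|2[0-3])"
    r"(:[0-5]\d"
    r"(:[0-5]\d)?)?)?)?)?"
)


def is_iso8601(value):
    """True if the whole string has the simplified ISO 8601 shape."""
    return ISO8601_PATTERN.fullmatch(value) is not None


def missing_required(variable_names, domain):
    """Required identifiers absent from a dataset; --SEQ becomes e.g. AESEQ."""
    required = ["STUDYID", "DOMAIN", "USUBJID", domain + "SEQ"]
    present = {name.upper() for name in variable_names}
    return [name for name in required if name not in present]


def length_violations(variables, max_name=8, max_label=40):
    """(variable name, 'name' or 'label') for each limit that is exceeded."""
    problems = []
    for var in variables:
        if len(var["name"]) > max_name:
            problems.append((var["name"], "name"))
        if len(var["label"]) > max_label:
            problems.append((var["name"], "label"))
    return problems


# Hypothetical variable metadata for an AE dataset.
ae_variables = [
    {"name": "STUDYID", "label": "Study Identifier"},
    {"name": "DOMAIN", "label": "Domain Abbreviation"},
    {"name": "USUBJID", "label": "Unique Subject Identifier"},
    {"name": "AESTARTDATE", "label": "Start Date/Time of Adverse Event"},
]

missing = missing_required([v["name"] for v in ae_variables], "AE")
too_long = length_violations(ae_variables)
print(missing)
print(too_long)
print(is_iso8601("2008-03-15T14:30"), is_iso8601("15MAR2008"))
```

Each function returns a list of findings rather than printing them, so the results could be collected into a single deviation report.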

Abbreviations: (10.3.1) (10.4) The following abbreviations are suggested for variable names and datasets.

Acronym	Descriptive Text
AE	Adverse Events
AU	Autopsy
BM	Bone Mineral Density (BMD) Data
BR	Biopsy
CM	Concomitant Meds
CO	Comments
DA	Drug Accountability
DC	Disease Characteristics
DM	Demographics
DS	Disposition
DV	Protocol Deviations
EE	EEG
EG	ECG
EX	Exposure
HU	Healthcare Resource Utilization
IE	Inclusion/Exclusion
IM	Imaging
LB	Laboratory Data
MB	Microbiology Specimens
MH	Medical History
ML	Meal Data
MS	Microbiology Susceptibility
OM	Organ Measurements
PC	PK Concentration
PE	Physical Exam
PP	PK Parameters
PG	Pharmacogenomics
QS	Questionnaires
SC	Subject Characteristics
SE	Subject Elements
SG	Surgery
SK	Skin Test
SL	Sleep (Polysomnography) Data
SS	Signs and Symptoms
ST	Stress (Exercise) Test Data
SU	Substance Use
SV	Subject Visits
TA	Trial Arms
TE	Trial Elements
TI	Trial Inclusion/Exclusion Criteria
TS	Trial Summary
TV	Trial Visits
VS	Vital Signs

ACN	ACTION
ADJ	ADJUSTMENT
AD	ANALYSIS DATASET
BL	BASELINE
BRTH	BIRTH
BOD	BODY
CAN	CANCER
CAT	CATEGORY
C	CHARACTER
CND	CONDITION
CLAS	CLASS
CD	CODE
COM	COMMENT
CON	CONCOMITANT
CONG	CONGENITAL
DTC	DATE TIME - CHARACTER
DY	DAY
DTH	DEATH
DECOD	DECODE
DRV	DERIVED
DESC	DESCRIPTION
DISAB	DISABILITY
DOS	DOSE
DOS	DOSAGE
DOSE	DOSE
DOSE	DOSAGE
DUR	DURATION
EL	ELAPSED
ET	ELEMENT
EM	EMERGENT
END	END
EN	END
ETHNIC	ETHNICITY
X	EXTERNAL
EVAL	EVALUATOR
EVL	EVALUATION
FAST	FASTING
FN	FILENAME
FL	FLAG
FRM	FORMULATION, FORM
FREQ	FREQUENCY
GR	GRADE
GRP	GROUP
HI	HIGHER LIMIT
HOSP	HOSPITALIZATION
ID	IDENTIFIER
INDC	INDICATION
INDC	INDICATOR
INT	INTERVAL
INTP	INTERPRETATION
INV	INVESTIGATOR
LIFE	LIFE-THREATENING
LOC	LOCATION
LOINC	LOINC CODE
LO	LOWER LIMIT
MIE	MEDICALLY-IMPORTANT EVENT
NAM	NAME
NST	NON-STUDY THERAPY
NR	NORMAL RANGE
ND	NOT DONE
NUM	NUMBER
N	NUMERIC
ONGO	ONGOING
ORD	ORDER
ORIG	ORIGIN
OR	ORIGINAL
OTH	OTHER
O	OTHER
OUT	OUTCOME
OD	OVERDOSE
PARM	PARAMETER
PATT	PATTERN
POP	POPULATION
POS	POSITION
QUAL	QUALIFIER
REAS	REASON
REF	REFERENCE
RF	REFERENCE
RGM	REGIMEN
REL	RELATED
R	RELATED
REL	RELATIONSHIP
R	RELATIONSHIP
RES	RESULT
RL	RULE
SEQ	SEQUENCE
S	SERIOUS
SER	SERIOUS
SEV	SEVERITY
SPEC	SPECIMEN
SPC	SPECIMEN
SPEC	SPONSOR
SPC	SPONSOR
ST	STANDARD
STD	STANDARD
ST	START
STD	START
STAT	STATUS
SCAT	SUBCATEGORY
SUBJ	SUBJECT
SUPP	SUPPLEMENTAL
SYS	SYSTEM
TXT	TEXT
TM	TIME
TPT	TIMEPOINT
TOT	TOTAL
TOX	TOXICITY
TRANS	TRANSITION
TRT	TREATMENT
U	UNIT
U	UNIQUE
UP	UNPLANNED
VAR	VARIABLE
VAL	VALUE
V	VEHICLE

SEQ Values: When the --SEQ variable is used, it must have unique values for each USUBJID within each domain.
Label Casing: For dataset labels and variable labels, all nontrivial words (more than three characters) must start with a capital letter, with the rest of the characters in lowercase.
Required Values: (4.1.1.5) For required fields such as the ones listed under Required Fields, check whether values are present. If any values are missing, report the observation number where each one is missing.
Similar Parentheses: For labels with matching values inside parentheses, such as (Yes/No), within the same dataset, check whether the variables have the same type and length. If not, report the differences.
Required Variables: (4.1.1.5) A Required variable is any variable that is basic to the identification of a data record (i.e., essential key variables and a topic variable) or is necessary to make the record meaningful. Required variables should always be included in the dataset and cannot be null for any record.
Expected Variable: (4.1.1.5) An Expected variable is any variable necessary to make a record useful in the context of a specific domain. Columns for Expected variables are assumed to be present in each submitted dataset even if some values are null.
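
The SEQ Values, Label Casing and Required Values criteria check data values and label text. As a rough sketch, again in Python rather than SAS and with hypothetical function names and records, they might look like the following. The label casing check follows the rule exactly as stated above, so it would also flag legitimate acronyms such as LOINC; a real check would likely need an exception list.

```python
import re
from collections import Counter


def duplicate_seq(rows, domain):
    """(USUBJID, --SEQ) pairs that occur more than once in a domain."""
    seq_var = domain + "SEQ"
    counts = Counter((row["USUBJID"], row[seq_var]) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def missing_required_values(rows, required):
    """(observation number, variable) for each missing required value."""
    return [
        (obs, var)
        for obs, row in enumerate(rows, start=1)
        for var in required
        if row.get(var) in (None, "")
    ]


def label_casing_violations(label):
    """Words longer than three letters that are not in 'Capitalized' form."""
    words = re.findall(r"[A-Za-z]+", label)
    return [
        word
        for word in words
        if len(word) > 3 and not (word[0].isupper() and word[1:].islower())
    ]


# Hypothetical AE records: a repeated AESEQ and a blank USUBJID.
ae = [
    {"STUDYID": "S1", "DOMAIN": "AE", "USUBJID": "S1-001", "AESEQ": 1},
    {"STUDYID": "S1", "DOMAIN": "AE", "USUBJID": "S1-001", "AESEQ": 1},
    {"STUDYID": "S1", "DOMAIN": "AE", "USUBJID": "", "AESEQ": 2},
]

dupes = duplicate_seq(ae, "AE")
gaps = missing_required_values(ae, ["STUDYID", "DOMAIN", "USUBJID", "AESEQ"])
bad_words = label_casing_violations("Start Date/Time of Adverse event")
print(dupes)
print(gaps)
print(bad_words)
```

Observation numbers are 1-based, to match the way the Required Values criterion reports the observation where a value is missing.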
