Metadata & Data Documentation
March 25, 2026
PhDDesk
Focus
Structured metadata and data documentation for reproducible obstetric HIV phenotype modeling, longitudinal inference, and dynamic prediction.
Overview
This documentation stream formalizes the data architecture that supports the PhD analytical program after the Proposal Abstract stage. The objective is to move from conceptual aims to an explicit variable registry, join-key strategy, and analytical-use map so that each downstream model is tied to a documented data object rather than an informal extraction workflow.
The current metadata build is organized around six linked assets:
- The source eCRF data dictionary for the pregnancy cohort.
- A complete variable shortlist spanning social, clinical, treatment, and outcome domains.
- A core analytical shortlist restricted to variables with direct methodological use across the PhD aims.
- An aim-linked use map that records why each retained variable is needed and where it enters the statistical pipeline.
- A synthetic ADaM contract that fixes the prototype analysis datasets and model-input files before live data are linked.
- A shortlist-to-ADaM derivation map that shows how the prototype social-risk indicators are constructed from retained baseline source fields.
The current core registry has already been mapped to the three main data-analytic streams that follow the methodological audit. A given variable does not have a single static role across the dissertation: it may act as a baseline confounder in one stream, a time-varying predictor in another, and an endpoint-support variable in the translational prediction stage. The prototype data layer also records where each derived analysis variable originates, so that it does not first appear inside downstream model code.
Documentation Design
The metadata layer is being built around four principles:
- Variable provenance must be explicit. Every retained field should be traceable to its source table, join key, and intended analytical role.
- Aim-specific use must be documented. The registry should show whether a variable is used for phenotype construction, confounding control, trajectory modeling, endpoint definition, or dynamic prediction support.
- Missingness and timing cannot be treated as afterthoughts. Because the cohort contains irregular visit schedules and partial biomarker capture, the documentation needs to record missing-data burden and temporal anchors as part of the analysis design.
- Derived variables must be traceable to retained source fields. The prototype ADaM layer should expose the derivation logic for social-risk indicators and other constructed variables instead of hiding them inside analysis scripts.
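The principles above can be read as checks that every registry entry must pass. The following is a minimal sketch of such a check, not the project's actual registry code: the `RegistryEntry` class, the `provenance_problems` function, and the closed vocabulary in `ALLOWED_USES` are hypothetical names introduced for illustration, and only the variable, table, and join-key names come from the tables in this document.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Analytical uses named in the design principles, treated here as a
# closed vocabulary (an assumption made for this sketch).
ALLOWED_USES = {
    "phenotype_construction",
    "confounding_control",
    "trajectory_modeling",
    "endpoint_definition",
    "dynamic_prediction_support",
}


@dataclass(frozen=True)
class RegistryEntry:
    """One retained variable with its provenance and analysis metadata."""
    variable: str
    source_table: str
    join_keys: Tuple[str, ...]
    uses: Tuple[str, ...]
    missing_fraction: Optional[float] = None  # share of records missing, once measured
    time_anchor: Optional[str] = None         # e.g. "baseline" or a visit-date field


def provenance_problems(entry: RegistryEntry) -> List[str]:
    """Return human-readable problems; an empty list means the entry passes."""
    problems = []
    if not entry.source_table:
        problems.append("source table not recorded")
    if not entry.join_keys:
        problems.append("join key not recorded")
    if not entry.uses:
        problems.append("analytical use not recorded")
    unknown = sorted(set(entry.uses) - ALLOWED_USES)
    if unknown:
        problems.append(f"unknown analytical use: {unknown}")
    if entry.missing_fraction is None:
        problems.append("missingness burden not recorded")
    elif not 0.0 <= entry.missing_fraction <= 1.0:
        problems.append("missing fraction outside [0, 1]")
    if entry.time_anchor is None:
        problems.append("time anchor not recorded")
    return problems


# An entry that has provenance but no missingness or timing metadata yet.
cd4 = RegistryEntry(
    variable="cd4_count",
    source_table="HIVM",
    join_keys=("usubjid", "visitnum"),
    uses=("trajectory_modeling",),
)
for problem in provenance_problems(cd4):
    print(f"{cd4.variable}: {problem}")
```

Returning a list of problems rather than raising on the first one lets a registry build report every gap for a variable in a single pass.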
Mplus Naming Layer
The metadata build now includes a provisional Mplus naming layer for core variables. These aliases are designed to be short and syntax-friendly, so that they are easier to manage in model scripts, while retaining a CDISC-style domain logic.
The current naming proposal follows four rules:
- All analysis names are uppercase.
- Each suggested alias is limited to a maximum of 4 characters.
- The first two characters identify the source-domain family.
- The last two characters summarize the variable concept and remain provisional until the analysis dataset structure is finalized.
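The naming rules above are mechanical enough to check automatically. The sketch below is illustrative only: `alias_problems` and `duplicate_aliases` are hypothetical helper names, the prefix map is a subset read off the observed variable maps in this document, and the uniqueness check is an added assumption rather than one of the four stated rules.

```python
import re
from collections import Counter

# Source-domain prefixes as they appear in the observed variable maps
# (subset shown; extend as the naming layer is finalized).
DOMAIN_PREFIX = {
    "ARV": "AR", "SRHH": "SR", "OBSH": "OB", "CNTRYST": "CN",
    "SOCDEM": "SD", "LFT": "LF", "FBC": "FB", "HIVM": "HI",
    "VIT": "VT", "WHO": "WH",
}
MAX_LENGTH = 4


def alias_problems(source, alias):
    """Check one proposed alias against the naming rules; return a list of problems."""
    problems = []
    if alias != alias.upper():
        problems.append("alias is not uppercase")
    if len(alias) > MAX_LENGTH:
        problems.append(f"alias exceeds {MAX_LENGTH} characters")
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9]*", alias):
        problems.append("alias must start with a letter and contain only letters and digits")
    expected = DOMAIN_PREFIX.get(source)
    if expected is None:
        problems.append(f"no prefix registered for source domain {source}")
    elif not alias.upper().startswith(expected):
        problems.append(f"alias should start with {expected}")
    return problems


def duplicate_aliases(pairs):
    """Return aliases assigned to more than one (source, alias) pair, sorted."""
    counts = Counter(alias for _, alias in pairs)
    return sorted(alias for alias, n in counts.items() if n > 1)


proposed = [("HIVM", "HIC4"), ("CNTRYST", "CNSI"), ("CNTRYST", "CNS1"), ("VIT", "vtbm")]
for source, alias in proposed:
    print(source, alias, alias_problems(source, alias))
print("duplicates:", duplicate_aliases(proposed))
```

The `CNSI` and `CNS1` pair shows why a uniqueness check is useful: `siteid` and `sitenm` would otherwise collapse to the same four-character name.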
Current Documentation Scope
| Documentation asset | Current role |
|---|---|
| db_ecrf_data_dictionary.csv | Source-level variable definitions and field naming reference. |
| maternal_health_variable_shortlist.csv | Broad candidate registry for maternal-health analysis planning. |
| maternal_health_variable_shortlist_core.csv | Core analytical variable set for the PhD program. |
| maternal_health_core_use_by_objective.qmd | Aim-linked documentation of variable use and methodological justification. |
Analytical Coverage
| Research stream | Core variables | Scope |
|---|---|---|
| Social Risk Phenotypes and the Maternal Burden of Obstetric HIV: A Causal Mixture Modeling Approach | 27 | Baseline social structure, reproductive history, STI context, and treatment covariates used for latent social-risk phenotyping and causal mixture modeling. |
| Multivariate Maternal Risk Phenotyping: Handling Time-Unstructuredness in Growth Mixture Models | 17 | Physiological and HIV-related markers used to characterize multivariate maternal risk trajectories under irregular follow-up. |
| Multidimensional Latent Burden and Translational Stratification: A Dynamic Prediction Framework | 62 | Outcome, timing, biomarker, follow-up, and support variables used for severe maternal outcome definition and dynamic prediction. |
Preliminary Mplus Alias Scheme
| Item | Current rule |
|---|---|
| Naming style | CDISC-style uppercase alias suggestion |
| Maximum length | 4 characters |
| Structure | 2-character source-domain prefix + 2-character analysis suffix |
| Intended use | Mplus-ready short names for modeling datasets and syntax files |
| Status | Provisional naming layer to be refined with the evolving analysis data model |
Synthetic ADaM Contract
| Variable | Label | Role | Source anchor | Notes |
|---|---|---|---|---|
| ADLB | ||||
| BMI_ANT | Antenatal BMI | biomarker | bmi | Longitudinal anthropometry. |
| CD4_CNT | CD4 count | biomarker | cd4_count | Longitudinal immune marker. |
| DAYS_SINCE_ANC1 | Days since first ANC | time_var | visit chronology | Clinical timescale. |
| GEST_DAYS_CORRECTED | Corrected gestational age (days) | time_var | edddt / last_menstrual_period | Gold-standard gestational timescale. |
| HGB_LVL | Hemoglobin | biomarker | haemoglobin | Longitudinal hematology marker. |
| LOG_VL | Log viral load | biomarker | viral_load | Longitudinal HIV biomarker. |
| ADSL | ||||
| BASE_AGE | Maternal age at enrollment | baseline_covariate | age | Baseline demographic covariate. |
| BASE_BMI | Baseline BMI | baseline_covariate | bmi | Baseline physiologic covariate. |
| CD4_BASE | Baseline CD4 count | baseline_covariate | cd4_count | Baseline HIV disease activity marker. |
| ECON | Economic strain indicator | derived_social_indicator | highest_education + number_of_rooms + work_status | Derived from shortlisted fields: highest_education, number_of_rooms, work_status. |
| ECTV | Time-varying material strain signal | derived_social_indicator | use_family_planning | Derived from shortlisted fields: use_family_planning. |
| EST_GA_AT_ANC1 | Estimated gestational age at first ANC (days) | timing_anchor | est_gestational_age_wks | Clinical timeline anchor. |
| GRAVIDITY | Prior pregnancies | baseline_covariate | total_number_pregnancies | Baseline reproductive history. |
| HOUS | Housing constraint indicator | derived_social_indicator | kind_of_toilet + number_of_rooms + people_sleep_in_room + tap_water | Derived from shortlisted fields: kind_of_toilet, number_of_rooms, people_sleep_in_room, tap_water. |
| MOM_ID | Analysis mother identifier | subject_key | usubjid | Anchored to maternal_health_variable_shortlist_core.csv |
| PART | Partner-context risk indicator | derived_social_indicator | marital_status + partner_hiv_status + partner_other_sexual_relations | Derived from shortlisted fields: marital_status, partner_hiv_status, partner_other_sexual_relations. |
| PRTV | Time-varying partner-context signal | derived_social_indicator | partner_other_sexual_relations | Derived from shortlisted fields: partner_other_sexual_relations. |
| SITE_COMPLEX | Facility complexity | design_covariate | siteid/sitenm | Ordinal site complexity proxy. |
| SITE_COUNTRY | Country | design_covariate | country | Site-level context. |
| SITE_ID | Site identifier | design_covariate | siteid | Primary clustering variable for multilevel mixture models. |
| SITE_VL_SUPP | Site-level viral suppression rate | design_covariate | siteid | Continuous contextual covariate. |
| SOC_RISK_CLASS | Synthetic latent social risk class | simulation_latent_truth | Derived | Used only for synthetic data generation and fallback prediction. |
| TRNS | Transport barrier indicator | derived_social_indicator | travel_time_to_clinic | Derived from shortlisted fields: travel_time_to_clinic. |
| ADSL_RAW | ||||
| highest_education | Your highest level of education | shortlist_source_field | highest_education | Shortlisted baseline source retained in the synthetic prototype (SOCDEM). |
| kind_of_toilet | What kind of toilet do you have | shortlist_source_field | kind_of_toilet | Shortlisted baseline source retained in the synthetic prototype (SOCDEM). |
| marital_status | Marital status | shortlist_source_field | marital_status | Shortlisted baseline source retained in the synthetic prototype (SOCDEM). |
| number_of_rooms | Number of rooms house has | shortlist_source_field | number_of_rooms | Shortlisted baseline source retained in the synthetic prototype (SOCDEM). |
| partner_hiv_status | What is your partner hiv status | shortlist_source_field | partner_hiv_status | Shortlisted baseline source retained in the synthetic prototype (SOCDEM). |
| partner_other_sexual_relations | Partner has other sexual relations | shortlist_source_field | partner_other_sexual_relations | Shortlisted baseline source retained in the synthetic prototype (SRHH). |
| people_sleep_in_room | How many people sleep in your room | shortlist_source_field | people_sleep_in_room | Shortlisted baseline source retained in the synthetic prototype (SOCDEM). |
| tap_water | Tap water in the premises | shortlist_source_field | tap_water | Shortlisted baseline source retained in the synthetic prototype (SOCDEM). |
| travel_time_to_clinic | Travel time from home to clinic | shortlist_source_field | travel_time_to_clinic | Shortlisted baseline source retained in the synthetic prototype (SOCDEM). |
| use_family_planning | Did you use family planning methods | shortlist_source_field | use_family_planning | Shortlisted baseline source retained in the synthetic prototype (SRHH). |
| work_status | Do you work | shortlist_source_field | work_status | Shortlisted baseline source retained in the synthetic prototype (SOCDEM). |
| ADTTE | ||||
| PTB_STATUS | Preterm birth status | secondary_outcome | delivery outcomes | Synthetic secondary outcome. |
| SGA_STATUS | Small-for-gestational-age status | secondary_outcome | delivery outcomes | Synthetic secondary outcome. |
| SMO_DAYS | Days to severe maternal outcome or censoring | time_to_event | delivery/adverse event timing | Event timescale for Aim 4. |
| SMO_EVENT | Severe maternal outcome flag | time_to_event | pregnancy_outcome / adverse events | Primary dynamic prediction event indicator. |
| MPLUS_AIM2 | ||||
| aim2_social_mplus.dat | Wide categorical social indicator export | model_input | ADSL | Written from the centralized synthetic ADaM layer. |
| MPLUS_AIM3 | ||||
| aim3_logvl_wide.dat | Wide TSCORES export for LOG_VL | model_input | ADLB | Written from the centralized synthetic ADaM layer. |
Current PME prototype data contract generated from the centralized synthetic ADaM builder.
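The contract lists `aim3_logvl_wide.dat` as a wide export of `LOG_VL` with individually varying time scores. As a rough illustration of what a long-to-wide reshaping of this kind involves, the sketch below pivots long records into one row per mother. It is not the project's builder: the function names, the column order, the `-999` missing code, and the sample values are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical missing-value code; it would have to match the missing-data
# declaration used in the corresponding Mplus input file.
MISSING = -999


def to_wide(records, n_occasions):
    """Pivot long (mom_id, days_since_anc1, log_vl) records to wide rows.

    Each output row is [mom_id, y_1..y_k, t_1..t_k], with visits ordered by
    time. Visits beyond n_occasions are dropped, so n_occasions should be
    chosen to cover the longest follow-up that the model will use.
    """
    by_mother = defaultdict(list)
    for mom_id, days, log_vl in records:
        by_mother[mom_id].append((days, log_vl))

    rows = []
    for mom_id in sorted(by_mother):
        visits = sorted(by_mother[mom_id], key=lambda v: v[0])[:n_occasions]
        pad = [MISSING] * (n_occasions - len(visits))
        values = [MISSING if y is None else y for _, y in visits] + pad
        times = [t for t, _ in visits] + pad
        rows.append([mom_id] + values + times)
    return rows


def format_dat(rows):
    """Render rows as whitespace-delimited text with no header line."""
    return "".join(" ".join(str(x) for x in row) + "\n" for row in rows)


# Illustrative values only, not cohort data.
long_records = [
    (2, 30, 3.1),
    (1, 0, 4.2),
    (1, 28, 3.5),
    (2, 0, None),  # visit attended, viral load not captured
]
print(format_dat(to_wide(long_records, n_occasions=3)), end="")
```

Numeric identifiers and a header-free file are used because the data file is read by position; the variable names live in the model syntax, which is where the aliases from the naming layer would be declared. How padded time slots should be handled for mothers with fewer visits is a modeling decision that this sketch leaves open.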
Shortlist to ADaM Social-Derivation Map
| Source variable | Source label | Source table | Aim scope | Derivation logic | Role |
|---|---|---|---|---|---|
| ECON - Economic strain indicator | |||||
| highest_education | Your highest level of education | SOCDEM | aim_2;aim_4 | Score lower educational attainment as structural economic vulnerability. | Latent social phenotype indicator |
| number_of_rooms | Number of rooms house has | SOCDEM | aim_2;aim_4 | Score very low room count as constrained household resources. | Latent social phenotype indicator |
| work_status | Do you work | SOCDEM | aim_2;aim_4 | Score unemployment/household work as material strain. | Latent social phenotype indicator |
| ECTV - Time-varying material strain signal | |||||
| use_family_planning | Did you use family planning methods | SRHH | aim_2;aim_4 | Treat non-use of family planning as a proxy support signal layered onto baseline material strain. | Time-varying support flag |
| HOUS - Housing constraint indicator | |||||
| kind_of_toilet | What kind of toilet do you have | SOCDEM | aim_2;aim_4 | Flag lower-quality sanitation as household infrastructure strain. | Latent social phenotype indicator |
| number_of_rooms | Number of rooms house has | SOCDEM | aim_2;aim_4 | Combine low room count with crowding to define housing constraint. | Latent social phenotype indicator |
| people_sleep_in_room | How many people sleep in your room | SOCDEM | aim_2;aim_4 | Combine sleeping density with rooms to define crowding burden. | Latent social phenotype indicator |
| tap_water | Tap water in the premises | SOCDEM | aim_2;aim_4 | Flag absent on-premises tap water as household infrastructure strain. | Latent social phenotype indicator |
| PART - Partner-context risk indicator | |||||
| marital_status | Marital status | SOCDEM | aim_2;aim_4 | Score unstable partner context as relational vulnerability. | Latent social phenotype indicator |
| partner_hiv_status | What is your partner hiv status | SOCDEM | aim_2;aim_4 | Score partner HIV-positive status as partner-context vulnerability. | Latent social phenotype indicator |
| partner_other_sexual_relations | Partner has other sexual relations | SRHH | aim_2;aim_4 | Score known partner concurrency as partner-context vulnerability. | Latent social phenotype indicator |
| PRTV - Time-varying partner-context signal | |||||
| partner_other_sexual_relations | Partner has other sexual relations | SRHH | aim_2;aim_4 | Carry partner-context instability into a time-varying support flag. | Time-varying support flag |
| TRNS - Transport barrier indicator | |||||
| travel_time_to_clinic | Travel time from home to clinic | SOCDEM | aim_2;aim_4 | Flag long clinic travel times as access-to-care barriers. | Latent social phenotype indicator |
The social-risk indicators used in the PME prototype are now derived explicitly from shortlisted baseline fields rather than simulated directly in the aim scripts.
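To make the derivation logic concrete, here is a minimal sketch of how the housing constraint indicator `HOUS` could be built from its four source fields. The map above specifies the direction of each rule but not the cut-offs or category codes, so the thresholds, the `"flush"` and `"no"` codings, the any-flag combination rule, and the function name `derive_hous` are all hypothetical.

```python
# Source fields that feed HOUS, as listed in the derivation map.
HOUS_SOURCES = ("kind_of_toilet", "number_of_rooms", "people_sleep_in_room", "tap_water")


def derive_hous(row, *, improved_toilets=("flush",), low_rooms=1, crowded_sleepers=3):
    """Return 1 if any housing-strain flag is raised, 0 if none, None if not derivable.

    All cut-offs and category codes are illustrative defaults, not the
    values used in the prototype.
    """
    # Assumption: the indicator is left missing when any source field is missing.
    if any(row.get(name) is None for name in HOUS_SOURCES):
        return None

    sanitation_strain = row["kind_of_toilet"] not in improved_toilets
    # Crowding combines a low room count with high sleeping density.
    crowding = (
        row["number_of_rooms"] <= low_rooms
        and row["people_sleep_in_room"] >= crowded_sleepers
    )
    no_tap_water = row["tap_water"] == "no"

    return int(sanitation_strain or crowding or no_tap_water)


# Illustrative records, not cohort data.
examples = [
    {"kind_of_toilet": "flush", "number_of_rooms": 3, "people_sleep_in_room": 2, "tap_water": "yes"},
    {"kind_of_toilet": "flush", "number_of_rooms": 1, "people_sleep_in_room": 4, "tap_water": "yes"},
    {"kind_of_toilet": "pit", "number_of_rooms": 2, "people_sleep_in_room": 2, "tap_water": None},
]
for record in examples:
    print(derive_hous(record))
```

Keeping the source-field list next to the rule means the same tuple can populate the "Source anchor" column of the contract, so the documented provenance and the executed derivation cannot drift apart.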
Derived Analytical Objects
| Derived object | Analytical role | Mplus name | Purpose |
|---|---|---|---|
| Social risk phenotyping | |||
| Social risk phenotype class | Latent baseline social-vulnerability phenotype | XSRP | Represents the derived latent social-risk grouping used in causal mixture analyses for obstetric HIV. |
| Posterior social-risk probability | Class-membership uncertainty summary for social-risk models | PSRP | Retains posterior uncertainty so social-risk class assignment is not treated as fixed in downstream estimation. |
| Maternal burden phenotyping | |||
| Maternal burden phenotype class | Latent longitudinal physiological-burden phenotype | XMBP | Represents the derived multivariate maternal-risk trajectory grouping under irregular longitudinal follow-up. |
| Outcome endpoint | |||
| Severe maternal outcome endpoint | Binary downstream maternal endpoint | YSMO | Defines the derived severe maternal outcome target used for modeling and translational interpretation. |
| Prediction objects | |||
| Dynamic landmark risk estimate | Updated individualized risk at clinical prediction landmarks | RDLM | Stores updated patient-level risk predictions generated from the joint longitudinal-event modeling framework. |
| Uncertainty objects | |||
| BCH / posterior uncertainty weights | Uncertainty control for latent-class and mixture-model estimation | WBCH | Carries BCH-style or posterior-probability uncertainty adjustments into distal outcome and prediction models. |
Observed Variable Maps
Social Risk Phenotypes and the Maternal Burden of Obstetric HIV: A Causal Mixture Modeling Approach
| Source | Original variable | Mplus name | Label | Role in analysis | Join keys | Justification |
|---|---|---|---|---|---|---|
| ARV Treatment | ||||||
| ARV | drg1 | ARD1 | Drug 1 | treatment_exposure | usubjid|dsta | Primary regimen component for therapy exposure grouping. |
| ARV | dsta | ARDS | Date started | treatment_exposure | usubjid|dsta | Treatment exposure start needed for ART-time alignment. |
| ARV | dsto | ARDE | Date stopped | treatment_exposure | usubjid|dsta | Treatment exposure stop needed for treatment dynamics and censoring. |
| Clinical STI | ||||||
| SRHH | abnormal_vaginal_discharge | SRAV | Last 3 months-abnormal vag discharge | clinical_covariate | usubjid|visitnum | Clinical STI symptom indicator; model separately from social exposure constructs. |
| SRHH | genital_herpes | SRGH | In the last 3 months-genital herpes | clinical_covariate | usubjid|visitnum | Clinical STI diagnosis indicator; model separately from social exposure constructs. |
| SRHH | syphilis | SRSY | In the last 3 months-syphilis | clinical_covariate | usubjid|visitnum | Clinical STI diagnosis indicator; model separately from social exposure constructs. |
| Obstetric History | ||||||
| OBSH | number_abortions | OBAB | Number of abortions/miscarriages | baseline_covariate | usubjid|visitnum | Baseline reproductive history risk profile. |
| OBSH | number_premature_births | OBPB | Premature births | baseline_covariate | usubjid|visitnum | Baseline reproductive history risk profile. |
| OBSH | number_stillbirths | OBST | Number stillbirths >20 weeks | baseline_covariate | usubjid|visitnum | Baseline reproductive history risk profile. |
| OBSH | outcome_last_pregnancy | OBOP | Outcome of the last pregnancy | baseline_covariate | usubjid|visitnum | Baseline reproductive history risk profile. |
| OBSH | total_number_pregnancies | OBTP | Total number of pregnancies | baseline_covariate | usubjid|visitnum | Baseline reproductive history risk profile. |
| Sexual Reproductive | ||||||
| SRHH | currently_pregnant | SRCP | Currently pregnant | covariate | usubjid|visitnum | Sexual/reproductive and STI vulnerability profile. |
| Site Context | ||||||
| CNTRYST | country | CNCO | Country | design_covariate | siteid | Country-level context for health-system and policy heterogeneity adjustment. |
| CNTRYST | siteid | CNSI | Site ID | design_covariate | siteid | Primary site-level clustering and stratification covariate across aims. |
| CNTRYST | sitenm | CNS1 | Site Name | design_covariate | siteid | Readable site label for outputs and quality control checks. |
| Social | ||||||
| SRHH | partner_other_sexual_relations | SRPO | Partner has other sexual relations | covariate | usubjid|visitnum | Social relationship-risk context indicator; keep in social exposure domain. |
| SRHH | use_family_planning | SRUF | Did you use family planning methods | covariate | usubjid|visitnum | Behavior/access indicator in reproductive social context; model with social determinants. |
| Social Baseline | ||||||
| SOCDEM | age | SDAG | Age | covariate | usubjid|visitnum | Baseline demographic confounder for all models. |
| SOCDEM | highest_education | SDHE | Your highest level of education | covariate | usubjid|visitnum | Socioeconomic confounder and social phenotype component. |
| SOCDEM | kind_of_toilet | SDTO | What kind of toilet do you have | covariate | usubjid|visitnum | Baseline social determinants and access profile. |
| SOCDEM | marital_status | SDMS | Marital status | covariate | usubjid|visitnum | Baseline social determinants and access profile. |
| SOCDEM | number_of_rooms | SDRO | Number of rooms house has | covariate | usubjid|visitnum | Baseline social determinants and access profile. |
| SOCDEM | partner_hiv_status | SDPH | What is your partner hiv status | covariate | usubjid|visitnum | Baseline social determinants and access profile. |
| SOCDEM | people_sleep_in_room | SDPS | How many people sleep in your room | covariate | usubjid|visitnum | Baseline social determinants and access profile. |
| SOCDEM | tap_water | SDTW | Tap water in the premises | covariate | usubjid|visitnum | Baseline social determinants and access profile. |
| SOCDEM | travel_time_to_clinic | SDTT | Travel time from home to clinic | covariate | usubjid|visitnum | Access-to-care proxy affecting both exposure and outcomes. |
| SOCDEM | work_status | SDWS | Do you work | covariate | usubjid|visitnum | Economic vulnerability indicator for social phenotype classes. |
| Baseline social structure, reproductive history, STI context, and treatment covariates used for latent social-risk phenotyping and causal mixture modeling. | ||||||
Multivariate Maternal Risk Phenotyping: Handling Time-Unstructuredness in Growth Mixture Models
| Source | Original variable | Mplus name | Label | Role in analysis | Join keys | Justification |
|---|---|---|---|---|---|---|
| ARV Treatment | ||||||
| ARV | drg1 | ARD1 | Drug 1 | treatment_exposure | usubjid|dsta | Primary regimen component for therapy exposure grouping. |
| ARV | dsta | ARDS | Date started | treatment_exposure | usubjid|dsta | Treatment exposure start needed for ART-time alignment. |
| ARV | dsto | ARDE | Date stopped | treatment_exposure | usubjid|dsta | Treatment exposure stop needed for treatment dynamics and censoring. |
| Biochemistry | ||||||
| LFT | creatinine1 | LFCR | Creatinine | biomarker | usubjid|visitnum | Core renal dysfunction marker relevant to severe maternal outcomes. |
| LFT | total_bilirubin | LFTB | Total bilirubin | biomarker | usubjid|visitnum | Core hepatic dysfunction marker relevant to severe maternal outcomes. Methodologically essential despite higher missingness; use missing-data strategy. |
| Hematology | ||||||
| FBC | haemoglobin | FBHB | Haemoglobin | biomarker | usubjid|visitnum | Core anemia marker for maternal risk phenotyping. |
| FBC | platelet_count | FBPL | Platelet count | biomarker | usubjid|visitnum | Core coagulation/hematologic risk marker. |
| HIV Markers | ||||||
| HIVM | cd4_count | HIC4 | Cd4 count | biomarker | usubjid|visitnum | Core immune status marker for risk adjustment and phenotyping. Methodologically essential despite higher missingness; use missing-data strategy. |
| HIVM | viral_load | HIVL | Viral load copies | biomarker | usubjid|visitnum | Core HIV disease activity marker for maternal risk stratification. Methodologically essential despite higher missingness; use missing-data strategy. |
| Site Context | ||||||
| CNTRYST | country | CNCO | Country | design_covariate | siteid | Country-level context for health-system and policy heterogeneity adjustment. |
| CNTRYST | siteid | CNSI | Site ID | design_covariate | siteid | Primary site-level clustering and stratification covariate across aims. |
| CNTRYST | sitenm | CNS1 | Site Name | design_covariate | siteid | Readable site label for outputs and quality control checks. |
| Vitals | ||||||
| VIT | bmi | VTBM | Bmi | biomarker | usubjid|visitnum | Core maternal physiologic trajectory measures. |
| VIT | diastolic_bp | VTDB | Diastolic bp | biomarker | usubjid|visitnum | Core maternal hemodynamic marker linked to obstetric risk. |
| VIT | systolic_bp | VTSB | Systolic bp | biomarker | usubjid|visitnum | Core maternal hemodynamic marker linked to obstetric risk. |
| VIT | weight | VTWT | Weight | biomarker | usubjid|visitnum | Core maternal physiologic trajectory measures. |
| WHO Stage | ||||||
| WHO | who_clinical_classification | WHWC | Who clinical classification | disease_severity_covariate | usubjid|visitnum | Global HIV clinical severity summary for baseline adjustment. |
| Physiological and HIV-related markers used to characterize multivariate maternal risk trajectories under irregular follow-up. | ||||||
Multidimensional Latent Burden and Translational Stratification: A Dynamic Prediction Framework
| Source | Original variable | Mplus name | Label | Role in analysis | Join keys | Justification |
|---|---|---|---|---|---|---|
| Adverse Events | ||||||
| AE | ae_severity | AESE | Adverse event severity | outcome_support | usubjid|vstdt | Supports severe-event capture beyond delivery record fields. |
| AE | serious_adverse_event | AESA | Is this a serious adverse event | outcome_support | usubjid|vstdt | Captures severe intercurrent maternal morbidity events. |
| ARV Treatment | ||||||
| ARV | drg1 | ARD1 | Drug 1 | treatment_exposure | usubjid|dsta | Primary regimen component for therapy exposure grouping. |
| ARV | dsta | ARDS | Date started | treatment_exposure | usubjid|dsta | Treatment exposure start needed for ART-time alignment. |
| ARV | dsto | ARDE | Date stopped | treatment_exposure | usubjid|dsta | Treatment exposure stop needed for treatment dynamics and censoring. |
| Biochemistry | ||||||
| LFT | creatinine1 | LFCR | Creatinine | biomarker | usubjid|visitnum | Core renal dysfunction marker relevant to severe maternal outcomes. |
| LFT | total_bilirubin | LFTB | Total bilirubin | biomarker | usubjid|visitnum | Core hepatic dysfunction marker relevant to severe maternal outcomes. Methodologically essential despite higher missingness; use missing-data strategy. |
| Clinical STI | ||||||
| SRHH | abnormal_vaginal_discharge | SRAV | Last 3 months-abnormal vag discharge | clinical_covariate | usubjid|visitnum | Clinical STI symptom indicator; model separately from social exposure constructs. |
| SRHH | genital_herpes | SRGH | In the last 3 months-genital herpes | clinical_covariate | usubjid|visitnum | Clinical STI diagnosis indicator; model separately from social exposure constructs. |
| SRHH | syphilis | SRSY | In the last 3 months-syphilis | clinical_covariate | usubjid|visitnum | Clinical STI diagnosis indicator; model separately from social exposure constructs. |
| Delivery Outcomes | ||||||
| LDR | deldt | LDDD | Delivery date | outcome | usubjid|visitnum | Defines pregnancy and delivery endpoints. |
| LDR | gestational_diabetes | LDGD | Gestational diabetes | outcome | usubjid|visitnum | Defines pregnancy and delivery endpoints. |
| LDR | high_blood_pressure | LDHB | High blood pressure | outcome | usubjid|visitnum | Defines pregnancy and delivery endpoints. |
| LDR | obstetrics_complications | LDOC | Any obstetric complications | outcome | usubjid|visitnum | Defines pregnancy and delivery endpoints. |
| LDR | pregnancy_outcome | LDPO | 2 indicate pregnancy outcome | outcome | usubjid|visitnum | Primary maternal outcome definition for SMO trajectory and endpoint modeling. |
| LDR | type_delivery | LDTD | Type of delivery | outcome | usubjid|visitnum | Defines pregnancy and delivery endpoints. |
| LDR | vaginal_bleeding | LDVB | Vaginal bleeding | outcome | usubjid|visitnum | Defines pregnancy and delivery endpoints. |
| Hematology | ||||||
| FBC | haemoglobin | FBHB | Haemoglobin | biomarker | usubjid|visitnum | Core anemia marker for maternal risk phenotyping. |
| FBC | platelet_count | FBPL | Platelet count | biomarker | usubjid|visitnum | Core coagulation/hematologic risk marker. |
| HIV Markers | ||||||
| HIVM | cd4_count | HIC4 | Cd4 count | biomarker | usubjid|visitnum | Core immune status marker for risk adjustment and phenotyping. Methodologically essential despite higher missingness; use missing-data strategy. |
| HIVM | viral_load | HIVL | Viral load copies | biomarker | usubjid|visitnum | Core HIV disease activity marker for maternal risk stratification. Methodologically essential despite higher missingness; use missing-data strategy. |
| Hospital Admissions | ||||||
| HOSP | admdt | HOAD | Admission date | outcome_support | usubjid|admdt | Hospitalization timing signal for acute decompensation episodes. |
| HOSP | diag | HODG | Diagnosis | outcome_support | usubjid|admdt | Captures acute clinical deterioration episodes. |
| Maternal Followup | ||||||
| MAFU | admitted_hospital | MFAH | Been admitted to hospital | time_varying_covariate | usubjid|visitnum | Time-varying maternal clinical status and symptoms. |
| MAFU | current_health_status | MFHS | General health status | time_varying_covariate | usubjid|visitnum | Time-varying maternal clinical status and symptoms. |
| MAFU | current_pregnancy_status | MFPS | Current pregnancy status | time_varying_covariate | usubjid|visitnum | Time-varying maternal clinical status and symptoms. |
| Obstetric History | ||||||
| OBSH | number_abortions | OBAB | Number of abortions/miscarriages | baseline_covariate | usubjid|visitnum | Baseline reproductive history risk profile. |
| OBSH | number_premature_births | OBPB | Premature births | baseline_covariate | usubjid|visitnum | Baseline reproductive history risk profile. |
| OBSH | number_stillbirths | OBST | Number stillbirths >20 weeks | baseline_covariate | usubjid|visitnum | Baseline reproductive history risk profile. |
| OBSH | outcome_last_pregnancy | OBOP | Outcome of the last pregnancy | baseline_covariate | usubjid|visitnum | Baseline reproductive history risk profile. |
| OBSH | total_number_pregnancies | OBTP | Total number of pregnancies | baseline_covariate | usubjid|visitnum | Baseline reproductive history risk profile. |
| Pregnancy Followup | ||||||
| ANCF | admitted_hospital | ANAH | Admitted to hospital since last visit | time_varying_covariate | usubjid|visitnum | Antenatal follow-up status and interim clinical changes. |
| ANCF | change_in_arv_regimen | ANCA | Change in arv regimen since last visit | time_varying_covariate | usubjid|visitnum | Antenatal follow-up status and interim clinical changes. |
| ANCF | current_pregnancy_status | ANPS | Current pregnancy status | time_varying_covariate | usubjid|visitnum | Antenatal follow-up status and interim clinical changes. |
| Pregnancy Registration | ||||||
| PRF | conceive_on_arvs | PRCA | Conceive while taking arvs | pregnancy_timing_covariate | usubjid|visitnum | Pregnancy registration and gestational timing anchors. |
| PRF | edddt | PRED | Expected delivery date | pregnancy_timing_covariate | usubjid|visitnum | Pregnancy time anchor for gestational alignment and dynamic prediction windows. |
| PRF | est_gestational_age_wks | PREG | Estimated gestational age-weeks | pregnancy_timing_covariate | usubjid|visitnum | Core gestational timing covariate for longitudinal models. |
| PRF | last_menstrual_period | PRMP | Date of last menstrual period | pregnancy_timing_covariate | usubjid|visitnum | Pregnancy registration and gestational timing anchors. |
| Sexual Reproductive | ||||||
| SRHH | currently_pregnant | SRCP | Currently pregnant | covariate | usubjid|visitnum | Sexual/reproductive and STI vulnerability profile. |
| Site Context | ||||||
| CNTRYST | country | CNCO | Country | design_covariate | siteid | Country-level context for health-system and policy heterogeneity adjustment. |
| CNTRYST | siteid | CNSI | Site ID | design_covariate | siteid | Primary site-level clustering and stratification covariate across aims. |
| CNTRYST | sitenm | CNS1 | Site Name | design_covariate | siteid | Readable site label for outputs and quality control checks. |
| Social | ||||||
| SRHH | partner_other_sexual_relations | SRPO | Partner has other sexual relations | covariate | usubjid|visitnum | Social relationship-risk context indicator; keep in social exposure domain. |
| SRHH | use_family_planning | SRUF | Did you use family planning methods | covariate | usubjid|visitnum | Behavior/access indicator in reproductive social context; model with social determinants. |
| Social Baseline | ||||||
| SOCDEM | age | SDAG | Age | covariate | usubjid|visitnum | Baseline demographic confounder for all models. |
| SOCDEM | highest_education | SDHE | Your highest level of education | covariate | usubjid|visitnum | Socioeconomic confounder and social phenotype component. |
| SOCDEM | kind_of_toilet | SDTO | What kind of toilet do you have | covariate | usubjid|visitnum | Baseline social determinants and access profile. |
| SOCDEM | marital_status | SDMS | Marital status | covariate | usubjid|visitnum | Baseline social determinants and access profile. |
| SOCDEM | number_of_rooms | SDRO | Number of rooms house has | covariate | usubjid|visitnum | Baseline social determinants and access profile. |
| SOCDEM | partner_hiv_status | SDPH | What is your partner hiv status | covariate | usubjid|visitnum | Baseline social determinants and access profile. |
| SOCDEM | people_sleep_in_room | SDPS | How many people sleep in your room | covariate | usubjid|visitnum | Baseline social determinants and access profile. |
| SOCDEM | tap_water | SDTW | Tap water in the premises | covariate | usubjid|visitnum | Baseline social determinants and access profile. |
| SOCDEM | travel_time_to_clinic | SDTT | Travel time from home to clinic | covariate | usubjid|visitnum | Access-to-care proxy affecting both exposure and outcomes. |
| SOCDEM | work_status | SDWS | Do you work | covariate | usubjid|visitnum | Economic vulnerability indicator for social phenotype classes. |
| Termination Death | ||||||
| TERM | cause_of_death | TRCD | Cause of death | outcome | usubjid|visitnum | Captures terminal outcomes and death attribution. |
| TERM | dthdt | TRDT | Date of death | outcome | usubjid|visitnum | Maternal death timing anchor for severe outcome endpoint. |
| TERM | reason_for_termination | TRRF | Reason for termination | outcome | usubjid|visitnum | Captures terminal outcomes and death attribution. |
| Vitals | ||||||
| VIT | bmi | VTBM | Bmi | biomarker | usubjid|visitnum | Core maternal physiologic trajectory measures. |
| VIT | diastolic_bp | VTDB | Diastolic bp | biomarker | usubjid|visitnum | Core maternal hemodynamic marker linked to obstetric risk. |
| VIT | systolic_bp | VTSB | Systolic bp | biomarker | usubjid|visitnum | Core maternal hemodynamic marker linked to obstetric risk. |
| VIT | weight | VTWT | Weight | biomarker | usubjid|visitnum | Core maternal physiologic trajectory measures. |
| WHO Stage | ||||||
| WHO | who_clinical_classification | WHWC | Who clinical classification | disease_severity_covariate | usubjid|visitnum | Global HIV clinical severity summary for baseline adjustment. |
| Outcome, timing, biomarker, follow-up, and support variables used for severe maternal outcome definition and dynamic prediction. | ||||||
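The pregnancy-registration anchors in the table above (`last_menstrual_period`, `edddt`) support two timescales: gestational age and time since the first antenatal visit. The sketch below shows the date arithmetic under stated assumptions: a conventional 280-day term counted from the last menstrual period, and a preference for the expected delivery date over the reported LMP when both exist. That preference, the function names, and the example dates are illustrative and are not taken from the project's derivation rules.

```python
from datetime import date

# Conventional pregnancy length (40 weeks) counted from the last menstrual period.
TERM_DAYS = 280


def gestational_days(visit, lmp=None, edd=None):
    """Gestational age in days at a visit date.

    Assumption for this sketch: when an expected delivery date is available
    it is preferred (back-calculating from term); otherwise the reported LMP
    is used. Returns None when neither anchor is recorded.
    """
    if edd is not None:
        return TERM_DAYS - (edd - visit).days
    if lmp is not None:
        return (visit - lmp).days
    return None


def days_since_first_visit(visit_dates):
    """Clinical timescale: days elapsed since the earliest visit, per visit."""
    first = min(visit_dates)
    return [(d - first).days for d in visit_dates]


# Illustrative dates only.
visits = [date(2025, 3, 12), date(2025, 4, 9), date(2025, 5, 21)]
lmp = date(2025, 1, 1)
print([gestational_days(v, lmp=lmp) for v in visits])
print(days_since_first_visit(visits))
```

Computing both timescales from the same visit dates keeps them aligned, which matters when a model is fitted on one timescale and its predictions are reported on the other.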
Immediate Data Domains
The current metadata work is centered on the domains that directly support the proposed framework:
- Baseline social and demographic structure for latent social risk phenotypes.
- Obstetric and reproductive history for baseline maternal vulnerability profiling.
- ART exposure and treatment timing for longitudinal alignment.
- Vital signs, hematology, renal, hepatic, and HIV markers for physiological burden phenotyping.
- Delivery, hospitalization, adverse event, and termination records for severe maternal outcome definition and support.
Next Documentation Steps
- Finalize a controlled registry for severe maternal outcome derivation variables and supporting event fields.
- Document gestational and clinical time anchors used for irregular longitudinal alignment.
- Add a formal join-specification layer for participant, visit, admission, and delivery records.
- Extend the registry with derived-variable documentation for phenotype classes, dynamic prediction inputs, and final endpoint-construction rules once the live data pipeline is linked.
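One way to approach the planned join-specification layer is to treat the pipe-separated "Join keys" strings in the variable maps (for example `usubjid|visitnum`) as executable specifications. The sketch below is a hypothetical illustration with plain dictionaries: `parse_join_keys` and `left_join` are invented names, and the records are made-up values rather than cohort data.

```python
def parse_join_keys(spec):
    """Split a registry join-key string such as 'usubjid|visitnum' into a tuple."""
    keys = tuple(part.strip() for part in spec.split("|") if part.strip())
    if not keys:
        raise ValueError("join specification names no key columns")
    return keys


def left_join(left, right, spec):
    """Left-join two lists of dict records on the keys named in spec.

    Returns (joined_rows, unmatched_keys). Duplicate keys in the right-hand
    table raise an error, because a silent one-to-many join would inflate
    the analysis dataset.
    """
    keys = parse_join_keys(spec)
    index = {}
    for row in right:
        key = tuple(row[k] for k in keys)
        if key in index:
            raise ValueError(f"duplicate key {key} in right-hand table")
        index[key] = row

    joined, unmatched = [], []
    for row in left:
        key = tuple(row[k] for k in keys)
        match = index.get(key)
        if match is None:
            unmatched.append(key)
            joined.append(dict(row))
        else:
            joined.append({**row, **match})
    return joined, unmatched


# Illustrative records only.
vitals = [
    {"usubjid": "A", "visitnum": 1, "bmi": 24.0},
    {"usubjid": "A", "visitnum": 2, "bmi": 24.5},
]
hiv_markers = [{"usubjid": "A", "visitnum": 1, "cd4_count": 410}]

rows, missing = left_join(vitals, hiv_markers, "usubjid|visitnum")
print(rows)
print("visits without a marker record:", missing)
```

Returning the unmatched keys alongside the joined rows gives the documentation layer a direct count of partial biomarker capture per join, which ties the join specification back to the missingness principle.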