Selection Bias: The Fatal Flaw of Correlational Evidence

People who actively enroll in health management programs tend to have stronger health awareness, more education, and more regular lifestyles. In other words, participants are often already healthier before they join, so a raw comparison with non-participants credits the program with differences that existed at baseline.

Pre-match: groups not comparable
Treatment: 42 yr, non-smoker, active
Treatment: 38 yr, normal BMI
Control: 48 yr, smoker, inactive
Control: 52 yr, overweight

Post-match: groups comparable
Treatment: 44 yr, light smoker
Control (matched): 45 yr, light smoker
After matching, the remaining differences come mainly from the intervention itself.
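A toy calculation makes the problem concrete. Every number below is made up for illustration, and the true effect is fixed at -2 risk points by construction, so the gap between the naive estimate and the truth is entirely selection bias.

```python
from statistics import mean

TRUE_EFFECT = -2.0  # hypothetical: the program lowers risk by 2 points for anyone

# Hypothetical baseline risk scores, measured before any intervention.
# Participants self-select and start out healthier.
baseline_participants = [10.0, 12.0, 11.0, 9.0]
baseline_nonparticipants = [18.0, 20.0, 17.0, 19.0]

# Observed outcomes: participants receive the effect, non-participants do not.
outcome_participants = [b + TRUE_EFFECT for b in baseline_participants]
outcome_nonparticipants = list(baseline_nonparticipants)

# The naive comparison mixes the true effect with the pre-existing gap.
naive_difference = mean(outcome_participants) - mean(outcome_nonparticipants)
baseline_gap = mean(baseline_participants) - mean(baseline_nonparticipants)

print(f"naive difference: {naive_difference:.1f}")
print(f"baseline gap:     {baseline_gap:.1f}")
print(f"true effect:      {TRUE_EFFECT:.1f}")
```

In this constructed example the naive difference equals the baseline gap plus the true effect, which is why the two groups have to be made comparable before their outcomes are compared.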

Propensity Score Matching (PSM) in Three Steps

Step 1: Calculate Each Patient's Propensity Score

For every patient, in both the treatment and control groups, fit a logistic regression on covariates such as age, sex, BMI, baseline risk score, and smoking status to estimate the probability that a patient with those characteristics receives the intervention. That probability is the propensity score.

P(T=1 | X) = logistic(β₀ + β₁·age + β₂·BMI + β₃·risk_score + ...)
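The formula above can be sketched in dependency-free Python. This is a minimal illustration, assuming two standardized covariates and plain batch gradient descent; the data, the function names, and the learning-rate settings are all hypothetical, and a production model would normally come from an established statistics library.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def linear_predictor(beta, x):
    """beta[0] is the intercept; beta[1:] pairs with the covariates in x."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def fit_logistic(X, t, lr=0.1, epochs=5000):
    """Fit P(T=1 | X) by batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, ti in zip(X, t):
            err = sigmoid(linear_predictor(beta, xi)) - ti
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta

# Hypothetical standardized covariates: [age_z, bmi_z]; t = 1 means treated.
X = [[-1.5, -0.5], [-1.0, 0.5], [-0.5, -1.0], [-0.5, 0.0], [0.0, 1.0],
     [0.0, -0.5], [0.5, 0.5], [0.5, -1.0], [1.0, 0.0], [1.5, 1.0]]
t = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]

beta = fit_logistic(X, t)
scores = [sigmoid(linear_predictor(beta, xi)) for xi in X]
for xi, ti, p in zip(X, t, scores):
    print(f"covariates={xi} treated={ti} propensity={p:.3f}")
```

Each score is an estimated probability strictly between 0 and 1. Treated and control patients with similar scores are the candidates for pairing in the next step.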
Step 2: Match by Score to Balance Baseline Differences

For each patient in the treatment group, find the control patient with the closest propensity score and form a matched pair (nearest-neighbor matching). Set a caliper, typically 0.2 times the standard deviation of the propensity score, so that poor matches are rejected rather than forced. After matching, check balance: the standardized mean difference (SMD) should be below 0.1 for every covariate.

SMD = |μ₁ - μ₀| / √((σ₁² + σ₀²)/2) < 0.1
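The matching and balance check described above can be sketched as follows. This is a minimal illustration, assuming greedy 1:1 matching without replacement; the propensity scores, ages, and function names are hypothetical, and the caliper is taken as 0.2 times the standard deviation of all scores.

```python
import math
from statistics import mean, stdev, variance

def nearest_neighbor_match(ps_treated, ps_control, caliper):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Returns (treated_index, control_index) pairs; treated patients with no
    control inside the caliper are left unmatched.
    """
    available = set(range(len(ps_control)))
    pairs = []
    for i, p in enumerate(ps_treated):
        if not available:
            break
        # sorted() keeps tie-breaking deterministic (lowest index wins).
        j = min(sorted(available), key=lambda c: abs(ps_control[c] - p))
        if abs(ps_control[j] - p) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs

def smd(treated_values, control_values):
    """Standardized mean difference, following the SMD formula in the text."""
    pooled_sd = math.sqrt((variance(treated_values) + variance(control_values)) / 2)
    return abs(mean(treated_values) - mean(control_values)) / pooled_sd

# Hypothetical propensity scores and ages.
ps_treated = [0.62, 0.55, 0.80, 0.47]
ps_control = [0.30, 0.50, 0.58, 0.20, 0.64, 0.45]
age_treated = [44, 41, 36, 47]
age_control = [55, 48, 42, 58, 45, 46]

caliper = 0.2 * stdev(ps_treated + ps_control)
pairs = nearest_neighbor_match(ps_treated, ps_control, caliper)
match_rate = len(pairs) / len(ps_treated)

smd_before = smd(age_treated, age_control)
smd_after = smd([age_treated[i] for i, _ in pairs],
                [age_control[j] for _, j in pairs])

print("pairs:", pairs)
print(f"match rate: {match_rate:.0%}")
print(f"SMD(age) before: {smd_before:.3f}, after: {smd_after:.3f}")
```

Greedy matching depends on the order in which treated patients are processed, so results can differ from optimal matching. On a sample this small the post-match SMD will not necessarily fall below 0.1; the point is only that it shrinks relative to the unmatched comparison.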
Step 3: Estimate the Causal Effect (ATT)

On the successfully matched sample, compare health outcomes between the treatment group and its matched controls to obtain the Average Treatment Effect on the Treated (ATT). Use bootstrap resampling to compute confidence intervals.

ATT = E[Y(1) - Y(0) | T=1] ≈ mean(Y_treated) - mean(Y_matched_control)
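A minimal sketch of the estimate and its interval follows, assuming 1:1 matched pairs and a percentile bootstrap that resamples whole pairs. The outcome values and function names are hypothetical. Resampling pairs is a simplification: it ignores the uncertainty introduced by fitting the propensity model and by the matching step itself.

```python
import random
from statistics import mean, stdev

def att(pair_outcomes):
    """Mean within-pair difference: treated outcome minus matched-control outcome.

    With 1:1 pairs this equals mean(Y_treated) - mean(Y_matched_control).
    """
    return mean(y_t - y_c for y_t, y_c in pair_outcomes)

def bootstrap_ci(pair_outcomes, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI and standard error, resampling matched pairs."""
    rng = random.Random(seed)
    n = len(pair_outcomes)
    estimates = sorted(
        att([pair_outcomes[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    low = estimates[int((alpha / 2) * n_boot)]
    high = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return low, high, stdev(estimates)

# Hypothetical (treated, matched control) cardiovascular risk scores.
pair_outcomes = [(0.21, 0.30), (0.18, 0.25), (0.30, 0.33), (0.25, 0.36),
                 (0.22, 0.28), (0.27, 0.31), (0.19, 0.30), (0.24, 0.29)]

att_hat = att(pair_outcomes)
ci_low, ci_high, se = bootstrap_ci(pair_outcomes)
print(f"ATT = {att_hat:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f}), SE = {se:.4f}")
```

The fixed seed only makes the sketch reproducible; it is not part of the method.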

Reading PSM Output Metrics

| Metric | Meaning | Standard |
| --- | --- | --- |
| ATT | Average Treatment Effect on the Treated: the average change in the health outcome produced by the intervention, relative to no intervention, among those treated. | Direction of effect, plus CI excluding zero = statistically significant effect |
| 95% CI | 95% confidence interval of the ATT; reflects estimation uncertainty. | Interval excludes zero = statistically significant |
| SMD (post-match) | Standardized mean difference; measures how comparable the two groups are after matching. | < 0.1 = good match quality |
| Match rate | Proportion of the treatment group successfully matched. | > 80% = good overlap |
| Bootstrap SE | Bootstrap standard error; reflects the stability of the ATT estimate under resampling. | Smaller = more stable |
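The reading rules in the table can be applied mechanically. The helper below is a hypothetical sketch, not part of any stated ReHealth Core interface; the thresholds are the ones listed in the table, and the confidence interval, SMD, and match rate in the example are invented.

```python
def assess_psm_output(att, ci_low, ci_high, max_smd, match_rate):
    """Apply the table's reading rules to one set of PSM results."""
    if att < 0:
        direction = "decrease"
    elif att > 0:
        direction = "increase"
    else:
        direction = "none"
    return {
        "direction": direction,
        "significant": ci_low > 0 or ci_high < 0,  # interval excludes zero
        "balanced": max_smd < 0.1,                 # worst covariate SMD
        "good_overlap": match_rate > 0.80,
    }

# Hypothetical results for a cardiovascular risk score outcome.
report = assess_psm_output(att=-0.087, ci_low=-0.120, ci_high=-0.050,
                           max_smd=0.06, match_rate=0.85)
print(report)
```

Passing the largest post-match SMD across covariates reflects the requirement that every covariate, not just the average, falls below 0.1.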

PSM Assumptions and Limitations

Required: Conditional Ignorability

After conditioning on the covariates, treatment assignment is independent of the potential outcomes. In other words, every factor that influences both treatment assignment and the outcome must be included in the propensity score model. If an important confounder is unmeasured (for example, a patient's health consciousness), PSM cannot adjust for it.
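In potential-outcomes notation the assumption can be written as below. The symbol e(X) for the propensity score is introduced here for illustration; T, X, Y(0), and Y(1) follow the formulas used earlier.

```latex
\begin{align*}
  \bigl(Y(0),\, Y(1)\bigr) \;\perp\!\!\!\perp\; T \mid X
  \quad\Longrightarrow\quad
  \bigl(Y(0),\, Y(1)\bigr) \;\perp\!\!\!\perp\; T \mid e(X),
  \qquad e(X) = P(T = 1 \mid X).
\end{align*}
```

The implication is what allows matching on a single score to stand in for matching on every covariate at once. For the ATT specifically, only the Y(0) part of the condition is needed.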

Required: Overlap (Common Support)

The propensity score distributions of the treatment and control groups must overlap: for every treated patient there must be control patients with similar characteristics available for matching. If the two groups differ too much, PSM cannot produce valid matches.
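One simple way to inspect overlap is to compare the score ranges of the two groups. The sketch below uses the min-max rule for the common-support region, which is only one of several conventions; the scores and function names are hypothetical.

```python
def common_support(ps_treated, ps_control):
    """Score range covered by both groups (min-max rule)."""
    low = max(min(ps_treated), min(ps_control))
    high = min(max(ps_treated), max(ps_control))
    return low, high

def share_on_support(scores, low, high):
    """Fraction of scores that fall inside [low, high]."""
    return sum(low <= p <= high for p in scores) / len(scores)

# Hypothetical propensity scores.
ps_treated = [0.62, 0.55, 0.80, 0.47]
ps_control = [0.30, 0.50, 0.58, 0.20, 0.64, 0.45]

low, high = common_support(ps_treated, ps_control)
treated_share = share_on_support(ps_treated, low, high)
control_share = share_on_support(ps_control, low, high)

print(f"common support: [{low}, {high}]")
print(f"treated on support: {treated_share:.0%}, controls on support: {control_share:.0%}")
```

Treated patients outside the region have no comparable controls, so dropping them narrows the population to which the ATT applies. A range check also says nothing about how densely the region is populated, so a plot of the two score distributions is a useful complement.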

Known Limitation: Cannot Control for Unmeasured Confounders

PSM can only adjust for observed covariates that are present in the data. Compared with a randomized controlled trial (RCT), PSM is a second-best method of causal inference. ReHealth Core states this limitation explicitly in every report and recommends treating PSM evidence as supplementary evidence, not as a replacement for RCTs.

Frequently Asked Questions

How does PSM differ from a randomized controlled trial (RCT)?
An RCT uses random assignment so that the treatment and control groups are comparable, in expectation, on all characteristics, measured or not. It is the gold standard for causal inference. PSM is the best available alternative in real-world settings where random assignment is not feasible: it uses statistical matching to balance the observed confounders. Its limitation is that it cannot control for unobserved confounders. ReHealth Core states the scope and limitations of PSM explicitly in its reports.
What is SMD, and how is matching quality assessed?
SMD (standardized mean difference) is a standardized measure of how much the treatment and control groups differ on a given variable. An SMD below 0.1 is generally taken to indicate good matching quality, meaning the two groups differ negligibly on that variable. ReHealth Core requires every matching variable to have a post-match SMD below 0.1; otherwise it issues a warning and recommends expanding the control pool.
What does ATT (Average Treatment Effect on the Treated) mean?
ATT is the average difference in health outcome that the intervention made for the people who actually received it, compared with what would have happened to the same people without it. For example, ATT = -0.087 on a cardiovascular risk score means that those who received the intervention had, on average, a risk score 0.087 units lower than they would have had without the intervention. Provided the PSM assumptions hold, this is a causally interpretable estimate, not a simple before/after comparison.

Related Concepts