As healthcare moves toward value-based payment models, understanding whether population health and Complex Care Management (CCM) services are actually improving patient outcomes and reducing costs is essential. Both healthcare providers and administrators need to measure not just whether interventions are delivered, but whether they truly make a difference in improving patient health, care quality, and value.
Defining Success: Selecting the Right Metrics
Measuring effectiveness begins with clearly defining what success looks like. The Centers for Medicare & Medicaid Services (CMS) emphasizes that programs should include a mix of process metrics, short-term outcomes, intermediate outcomes, and long-term outcomes to tell a complete story.
- Process metrics track how care is delivered, such as how quickly patients receive follow-up after hospital discharge or how often care plans are updated. These indicators help determine whether the intervention is being implemented as designed.
- Short-term outcomes might include improvements in medication adherence, timely completion of preventive screenings, or reductions in avoidable emergency department visits.
- Intermediate outcomes could measure progress such as improved control of chronic diseases like diabetes or hypertension.
- Long-term outcomes reflect the ultimate goals: lower total cost of care, improved clinical outcomes, reduced hospitalizations, and enhanced quality of life for patients.
Defining these measures at the start of a program allows organizations to build data systems and reporting processes that will support ongoing evaluation.

Designing an Evaluation: Methodologies and Their Tradeoffs
When evaluating the impact of CCM services or other population health programs, it is important to choose an analytic approach that balances rigor with practicality. Below are three commonly used methods, each with strengths and limitations to consider.
1. Pre-Post Analysis
This straightforward method compares outcomes before and after the intervention for the same patients.
Pros:
- Relatively easy to implement and interpret; ideal for quick insights.
- Requires minimal data, often just claims or electronic health record (EHR) data from before and after enrollment.
- Useful for internal quality improvement and early-stage evaluations.
Cons:
- Does not control for external influences such as seasonality, policy changes, or broader healthcare trends.
- Improvements may reflect regression to the mean, since patients often enroll when their condition is at its worst.
- Cannot separate the effect of the program from other simultaneous initiatives.
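A pre-post comparison amounts to averaging each patient's outcome before and after enrollment and taking the difference. The sketch below illustrates this with synthetic emergency department (ED) visit counts; the patient IDs and numbers are hypothetical illustration data, not results from any program:

```python
from statistics import mean

# Hypothetical per-patient ED visit counts for the 12 months before and
# after CCM enrollment (synthetic data for illustration only).
ed_visits = {
    "pt_a": {"pre": 4, "post": 2},
    "pt_b": {"pre": 6, "post": 3},
    "pt_c": {"pre": 2, "post": 2},
    "pt_d": {"pre": 5, "post": 1},
}

pre_mean = mean(p["pre"] for p in ed_visits.values())
post_mean = mean(p["post"] for p in ed_visits.values())
# A negative change means fewer visits after enrollment -- but, per the
# cons above, this alone cannot rule out regression to the mean or
# concurrent external trends.
change = post_mean - pre_mean
```

The simplicity is the appeal: only two averages and a subtraction. The weakness is that nothing in this calculation distinguishes program effect from what would have happened anyway.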

2. Propensity Score Analysis
This approach creates a comparison group by matching participants with similar patients who did not receive the intervention. Matching criteria often include age, gender, comorbidities, prior utilization, and risk scores.
Pros:
- Reduces bias by balancing observed characteristics between groups.
- Can provide a stronger estimate of program impact than simple pre-post analysis.
- Commonly used in healthcare research and policy evaluations, including CMS studies.
Cons:
- Only adjusts for known and measured variables, so unmeasured differences may still bias results.
- Requires robust, high-quality data across both intervention and comparison groups.
- Matching can limit sample size if suitable matches are not available.
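Once propensity scores have been estimated (typically via a logistic regression of enrollment status on age, comorbidities, prior utilization, and similar covariates), the matching step itself can be as simple as greedy 1:1 nearest-neighbor matching within a caliper. The sketch below assumes the scores are precomputed; all IDs and score values are hypothetical:

```python
def match_patients(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on precomputed propensity
    scores, without replacement. Returns {treated_id: control_id} pairs
    whose score difference falls within the caliper."""
    matches = {}
    available = dict(controls)  # control_id -> score; consumed as matched
    for t_id, t_score in treated.items():
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            matches[t_id] = c_id
            del available[c_id]  # each control is used at most once
    return matches

# Synthetic propensity scores for illustration only.
treated = {"t1": 0.62, "t2": 0.45, "t3": 0.90}
controls = {"c1": 0.60, "c2": 0.47, "c3": 0.30, "c4": 0.58}

pairs = match_patients(treated, controls)
```

Note that the high-risk patient `t3` finds no control within the caliper and goes unmatched, which illustrates the sample-size limitation above: the hardest-to-match (often highest-risk) patients are exactly those most likely to drop out of a matched analysis.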

3. Difference-in-Differences (DiD) Analysis
DiD compares changes in outcomes over time between an intervention group and a comparison group, helping isolate the effect of the intervention from other external factors.
Pros:
- Controls for external factors that affect both groups, such as seasonality, policy changes, or pandemic effects.
- Provides a more rigorous estimate of causal impact when data are available for both groups before and after the intervention.
- Works well when evaluating value-based payment models and population-level initiatives.
Cons:
- Requires stable patient populations and consistent data over time.
- Assumes both groups would have followed parallel trends in the absence of the intervention, which may not always be true.
- More complex to implement and interpret, often requiring statistical expertise.
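At its core, the DiD estimate is the change in the intervention group minus the change in the comparison group. The sketch below shows that arithmetic on hypothetical mean cost figures; in practice the estimate usually comes from a regression with an interaction term so that confidence intervals and covariate adjustment are available:

```python
def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences: the treated group's change minus the
    comparison group's change over the same period."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical mean per-member-per-month costs in USD (illustration only).
# Treated group fell by $70; comparison group fell by $10 on its own,
# so the estimated program effect is -$60 PMPM.
effect = did_estimate(treat_pre=850.0, treat_post=780.0,
                      ctrl_pre=820.0, ctrl_post=810.0)
```

Subtracting the comparison group's change is what removes shared external trends, but the result is only valid under the parallel-trends assumption noted above.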

Challenges in Comparison Group Selection
Selecting a fair comparison group is one of the most critical and challenging parts of evaluation. Some organizations use patients who were eligible for an intervention but declined participation as a comparison group. However, those who refuse CCM services may differ in unmeasured ways, such as motivation, health literacy, social support, or trust in the healthcare system. These differences can bias results, making the intervention appear more or less effective than it truly is.
Propensity score matching helps mitigate this issue by aligning key baseline characteristics, but even sophisticated matching cannot control for all unmeasured variables. Therefore, results should always be interpreted with an understanding of these limitations.

Accounting for Seasonality
Health outcomes often vary by season. For example, respiratory illnesses peak in winter, while elective surgeries are more common in certain months. Evaluations that span less than a full year may misinterpret these patterns as effects of an intervention. To ensure accuracy, it is important to include at least 12 months of baseline and follow-up data or to adjust analyses for seasonal patterns.
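One simple way to respect seasonality with 12-month windows is to compare each follow-up month to the same calendar month in the baseline year, so winter is compared to winter and summer to summer. The monthly counts below are synthetic illustration data:

```python
from statistics import mean

# Monthly ED-visit counts, January through December (synthetic data).
baseline = [40, 38, 30, 28, 25, 22, 24, 23, 27, 31, 36, 42]  # year before launch
followup = [35, 33, 28, 26, 24, 21, 22, 22, 25, 28, 33, 38]  # year after launch

# Guard against partial-year windows, which would confound seasonal
# peaks (e.g. winter respiratory illness) with program effects.
assert len(baseline) == len(followup) == 12, "need full 12-month windows"

# Same-calendar-month differences cancel the seasonal pattern.
monthly_change = [f - b for b, f in zip(baseline, followup)]
avg_change = mean(monthly_change)
```

A naive comparison of, say, a July-December baseline against a January-June follow-up would attribute the winter surge to the program; month-to-same-month differencing avoids that trap.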

Best Practices for Evaluating Complex Care Management
Evaluating complex care management programs requires a blend of analytical rigor and practical understanding of care delivery. Some best practices include:
- Define metrics early: Set targets for utilization, cost, and clinical outcomes before program launch.
- Use multiple data sources: Combine claims, EHR, and patient-reported data for a fuller picture.
- Account for patient variation: Patients are often enrolled for different lengths of time. Using per-member-per-month (PMPM) measures can standardize results.
- Engage care managers: Those closest to patients often provide context behind the data, such as identifying social or behavioral barriers that affect outcomes.
- Communicate results: Share findings with frontline teams to reinforce what is working and identify opportunities for improvement.

The Challenge of Varying Enrollment Durations
Patients often enroll in interventions like CCM services for different lengths of time. Some may participate for a few weeks, while others remain enrolled for a full year or longer. This can make it difficult to compare results across participants or programs. One effective approach is to normalize outcomes using PMPM metrics or to analyze only patients who have completed a minimum participation threshold. Additionally, sensitivity analyses can test whether outcomes differ by length of enrollment.
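PMPM normalization divides total spending by total member-months, so a patient enrolled for 3 months and one enrolled for 12 contribute proportionally. The sketch below also shows a minimum-enrollment threshold of the kind described above; all patient rows are synthetic:

```python
# Synthetic enrollment and cost data for illustration only.
patients = [
    {"id": "p1", "months_enrolled": 3,  "total_cost": 2400.0},
    {"id": "p2", "months_enrolled": 12, "total_cost": 7200.0},
    {"id": "p3", "months_enrolled": 6,  "total_cost": 3600.0},
]

def pmpm_cost(rows, min_months=0):
    """Total cost divided by total member-months, optionally restricted
    to patients meeting a minimum enrollment threshold."""
    kept = [r for r in rows if r["months_enrolled"] >= min_months]
    member_months = sum(r["months_enrolled"] for r in kept)
    total = sum(r["total_cost"] for r in kept)
    return total / member_months

all_pmpm = pmpm_cost(patients)                    # all enrollees
stable_pmpm = pmpm_cost(patients, min_months=6)   # sensitivity: >= 6 months
```

Comparing `all_pmpm` against `stable_pmpm` is one simple form of the sensitivity analysis mentioned above: if the two diverge substantially, outcomes likely vary with enrollment duration and short stayers deserve a closer look.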

How Illustra Health Supports Effective Monitoring and Evaluation
At Illustra Health, we help healthcare organizations take a data-driven approach to population health management and complex care management monitoring and evaluation. Built on the population health expertise of Johns Hopkins, Illustra combines advanced analytics with deep clinical insight to identify the most impactable patients based on risk prediction and actionable opportunities.
Illustra integrates claims, electronic medical records (EMR), and social risk data to provide a unified, comprehensive view of patient populations. Our standardized outcomes metrics span clinical, cost, and utilization domains across all payor types, enabling clear, consistent evaluation of program effectiveness.
Illustra delivers value to organizations across the analytic spectrum. For teams with limited analytic resources, Illustra provides streamlined monitoring of complex care management and other population health programs along with direct collaboration with population health experts to support interpretation and strategy. For organizations with established analytic capabilities, Illustra accelerates insight generation through easy access to robust back-end data files that include detailed patient-level condition and morbidity markers, risk scores, medication adherence indicators, and cost and utilization metrics.
By equipping both emerging and advanced teams with actionable data and expertise, Illustra empowers healthcare leaders to monitor progress, demonstrate measurable impact, and advance the goal of delivering better health outcomes at lower cost.

References
- Bodenheimer T, Ghorob A, Willard-Grace R, Grumbach K. The 10 building blocks of high-performing primary care. Ann Fam Med. 2014 Mar-Apr;12(2):166-71. doi: 10.1370/afm.1616. PMID: 24615313; PMCID: PMC3948764. https://pubmed.ncbi.nlm.nih.gov/24615313/
- Centers for Medicare & Medicaid Services (CMS). Chronic Care Management Services Fact Sheet. CMS.gov; 2023. https://www.cms.gov/files/document/chronic-care-management-factsheet.pdf
- Chang TH, Stuart EA. Propensity score methods for observational studies with clustered data: A review. Stat Med. 2022 Aug 15;41(18):3612-3626. doi: 10.1002/sim.9437. Epub 2022 May 23. PMID: 35603766; PMCID: PMC9540428. https://pmc.ncbi.nlm.nih.gov/articles/PMC9540428/
- Damschroder LJ, Reardon CM, Opra Widerquist MA, Lowery J. Conceptualizing outcomes for use with the Consolidated Framework for Implementation Research (CFIR): the CFIR Outcomes Addendum. Implement Sci. 2022 Jan 22;17(1):7. doi: 10.1186/s13012-021-01181-5. PMID: 35065675; PMCID: PMC8783408. https://pubmed.ncbi.nlm.nih.gov/35065675/
- McCall N, Cromwell J, Urato C. Evaluation of Medicare Care Management for High Cost Beneficiaries (CMHCB) Demonstration: Final Report. RTI International for CMS. 2010. https://www.cms.gov/research-statistics-data-and-systems/statistics-trends-and-reports/reports/research-reports-items/cms1247380
- Peikes D, Chen A, Schore J, Brown R. Effects of care coordination on hospitalization, quality of care, and health care expenditures among Medicare beneficiaries: 15 randomized trials. JAMA. 2009 Feb 11;301(6):603-18. doi: 10.1001/jama.2009.126. PMID: 19211468. https://pubmed.ncbi.nlm.nih.gov/19211468/
- Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55. doi:10.1093/biomet/70.1.41. https://www.stat.cmu.edu/~ryantibs/journalclub/rosenbaum_1983.pdf
- Ryan AM, Krinsky S, Kontopantelis E, Doran T. Long-term evidence for the effect of pay-for-performance in primary care on mortality in the UK: a population study. Lancet. 2016 Jul 16;388(10041):268-74. doi: 10.1016/S0140-6736(16)00276-2. Epub 2016 May 17. PMID: 27207746. https://pubmed.ncbi.nlm.nih.gov/27207746/
- Shadish WR, Cook TD, Campbell DT. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton Mifflin; 2002.
- Taylor EF, Peikes D, Geonnotti K, et al. Building Quality Improvement Capacity in Primary Care: Supports and Resources. Agency for Healthcare Research and Quality; 2013. https://www.ahrq.gov/sites/default/files/publications/files/pcmhqi2.pdf