Data Warehouse

 HIPAA-compliant integrated clinical-sociocontextual data warehouse

Electronic Health Records Merged to Insurance Claims, Criminal Justice, and Community Data

We leverage over a decade of experience managing and analyzing EHR data and patient insurance claims from commercial insurance, Medicaid and Medicare payers, as well as close research connections among the Department of Psychiatry, the Health Equity Research Lab (HERLab), and the Information Technology and Business Analytics departments at CHA. The HERLab currently has access to Epic EHR data for all CHA patients, linked to multiple social service and administrative system datasets (see below). Access to these data sources allows for the evaluation of the impact, and the mediators and moderators, of innovative disparities-reducing interventions on service utilization (e.g., outpatient and inpatient visits, medications, emergency department use) and medical expenditures (both total expenditures and expenditures by service setting) for psychiatric, specialty and primary care services.

Epic/Clarity Data: CHA has used the Epic electronic health record system since 2008. With direct access to the Epic data warehouse (Clarity), we collect detailed information on individuals’ demographics, diagnoses, medications and service use. We have used these data in multiple prior publications, receiving approval for over twenty protocols from our Institutional Review Board (IRB) to publish studies with results coded or de-identified so that patient information is not revealed. Using SQL server software combined with statistical software (SAS, Stata, R), we are able to manipulate the relational database to create large-scale flat file datasets for analysis (>600,000 patients, tens of millions of claims, and millions of clinician notes). The dataset has been used for disparities studies,5–7 and for natural language processing (NLP) and machine learning applications.59 We are in regular contact with CHA IT and Business Analytics to ensure the quality of the data and to resolve any inconsistencies or changes to the data structure.
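As an illustration of turning relational tables into a flat analytic file, the following minimal Python sketch uses an in-memory SQLite database. The table and column names (patients, encounters, medications) are hypothetical and far simpler than the actual Clarity schema; the sketch only shows the general pattern of producing one row per patient.

```python
import sqlite3


def build_flat_file(conn):
    """Return one row per patient with counts drawn from two child tables.

    Correlated subqueries are used instead of a double join so that
    encounter rows and medication rows do not multiply each other.
    """
    query = """
        SELECT p.patient_id,
               p.birth_year,
               (SELECT COUNT(*) FROM encounters e
                 WHERE e.patient_id = p.patient_id) AS n_encounters,
               (SELECT COUNT(*) FROM medications m
                 WHERE m.patient_id = p.patient_id) AS n_medications
        FROM patients p
        ORDER BY p.patient_id
    """
    return conn.execute(query).fetchall()


def demo_connection():
    """Create a toy database; all rows are invented for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, birth_year INTEGER);
        CREATE TABLE encounters (patient_id INTEGER, setting TEXT);
        CREATE TABLE medications (patient_id INTEGER, drug TEXT);
        INSERT INTO patients VALUES (1, 1980), (2, 1995);
        INSERT INTO encounters VALUES (1, 'outpatient'), (1, 'emergency'), (2, 'inpatient');
        INSERT INTO medications VALUES (1, 'sertraline');
    """)
    return conn


flat = build_flat_file(demo_connection())
for row in flat:
    print(row)
```

Each tuple is one patient-level record (identifier, birth year, encounter count, medication count) that could then be exported to statistical software.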

Insurance Claims Data: For the 50% of the CHA patient population attributed by public and private payers to the CHA Accountable Care Organization (ACO), we are able to merge EHR information to insurance claims data. Claims data allow us to analyze service use within CHA as well as at other healthcare institutions. For example, recent work using these claims compared in- and out-of-network expenditures for mental and physical health services for youth with autism and asthma.8

Linked Criminal Justice Information: Through a data use agreement with the Middlesex Sheriff’s Office (MSO), we are able to link CHA patient records to MSO County Jail and House of Corrections data. Between 2009 and 2019, 7,098 individuals receiving treatment at CHA were in the Middlesex County jail (pre-trial) or house of corrections (post-trial). This represents about 1.5% of all CHA patients between 2009 and 2019 and one sixth of the MSO jail/corrections census over that time period. Data include name, sex, birthdate, race/ethnicity, dates of entry and discharge, and the number and specific type of offenses (categorized as violent or non-violent). Through another data use agreement with the Cambridge Police Department (CPD), we have merged CHA EHR data to CPD call and arrest data from 2009-2019 for youth and adults, and have published multiple papers using these linked data.60,61 For both MSO and CPD data, we merge data using LinkPlus 2.0, a publicly available CDC record linkage tool, matching on a combination of birthdate, sex, and first and last name.
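To make the matching fields concrete, the following Python sketch links two toy record sets on birthdate, sex, and first and last name. It does not reproduce LinkPlus: it is a simplified stand-in that requires exact agreement on birthdate and sex and then compares names with a string-similarity ratio. The record layout, the 0.85 threshold, and all example records are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case and keep letters only, so "O'Brien" and "OBrien" agree."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


def name_similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(patients, bookings, threshold=0.85):
    """Return (patient_id, booking_id) pairs for likely matches.

    Candidates must agree exactly on birthdate and sex (a blocking step);
    first and last names must each reach the similarity threshold.
    """
    by_block = {}
    for b in bookings:
        by_block.setdefault((b["birthdate"], b["sex"]), []).append(b)

    links = []
    for p in patients:
        for b in by_block.get((p["birthdate"], p["sex"]), []):
            if (name_similarity(p["first"], b["first"]) >= threshold
                    and name_similarity(p["last"], b["last"]) >= threshold):
                links.append((p["id"], b["id"]))
    return links


# Invented records, for illustration only.
patients = [
    {"id": "P1", "first": "Maria", "last": "O'Brien", "birthdate": "1990-04-02", "sex": "F"},
    {"id": "P2", "first": "John", "last": "Smith", "birthdate": "1985-01-15", "sex": "M"},
]
bookings = [
    {"id": "B1", "first": "maria", "last": "OBrien", "birthdate": "1990-04-02", "sex": "F"},
    {"id": "B2", "first": "John", "last": "Smith", "birthdate": "1985-01-16", "sex": "M"},
]
links = link_records(patients, bookings)
print(links)
```

In this toy example the second pair is not linked because the birthdates differ by one day; a probabilistic tool would instead weigh partial agreement across fields.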

Community-level Sociocontextual Information: We link all individuals interacting with the healthcare system to a rich set of community-level (i.e., census block group, census tract, county) sociocontextual data using Proc GEOCODE in SAS. This procedure uses patient street addresses to assign Federal Information Processing Standards (FIPS) codes, which are then used to merge in community-level data. We have been successful in matching 90% of CHA patients using this procedure. Based on the literature and our previous publications on neighborhoods and behavioral health,20,62,63 we categorize community-level data into four sets of factors: 1) psychosocial; 2) economic; 3) built environment; and 4) health-related:62
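Once an address has been assigned a census block group identifier, the nested structure of FIPS codes makes it straightforward to attach data at several geographic levels. The Python sketch below assumes the standard 12-digit block group GEOID (2-digit state, 3-digit county, 6-digit tract, 1-digit block group). The field names, the tract digits, and the community values are hypothetical; the state and county prefix 25017 corresponds to Middlesex County, Massachusetts.

```python
def split_block_group_geoid(geoid):
    """Split a 12-digit block group GEOID into its nested FIPS identifiers."""
    if len(geoid) != 12 or not geoid.isdigit():
        raise ValueError("expected a 12-digit block group GEOID")
    return {
        "state": geoid[:2],       # 2-digit state FIPS
        "county": geoid[:5],      # state + 3-digit county
        "tract": geoid[:11],      # state + county + 6-digit tract
        "block_group": geoid,     # full 12-digit identifier
    }


def attach_community_data(patient, block_group_table, county_table):
    """Return a copy of the patient record with community fields merged in."""
    ids = split_block_group_geoid(patient["block_group_geoid"])
    merged = dict(patient)
    merged.update(block_group_table.get(ids["block_group"], {}))
    merged.update(county_table.get(ids["county"], {}))
    return merged


# Illustrative values only; none of these numbers come from real data.
block_group_table = {"250173521011": {"pct_families_in_poverty": 12.5}}
county_table = {"25017": {"psychiatrists_per_100k": 20.0}}
patient = {"patient_id": 1, "block_group_geoid": "250173521011"}

ids = split_block_group_geoid(patient["block_group_geoid"])
linked = attach_community_data(patient, block_group_table, county_table)
print(linked)
```

Patients whose addresses could not be geocoded would simply have no identifier to merge on, which is why the match rate matters for coverage.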

(1) Psychosocial factors are defined as the influence of social relationships and perceived social control on health outcomes.64 Evidence suggests that high levels of social disorder in an area result in lower levels of social efficacy (i.e., community supervision, involvement, and trust) and informal social control (i.e., internalization of social norms and values).65 For example, lack of supervision and low community involvement are associated with high crime rates and feelings of estrangement,66 and may contribute to a higher incidence of mental illness in a neighborhood and a decreased probability of intervention by neighbors to support those with early symptoms of illness. Block group-level psychosocial factors include: 1) Percent of White residents (measured by the U.S. Census); 2) Theil Index, a measure (range 0 to 1) of residential segregation, where 0 indicates a block group has an equal distribution of racial/ethnic households (i.e., maximum integration) and 1 indicates a block group is completely homogeneous (i.e., maximum segregation);67 3) Percent of single female-headed households (measured by the U.S. Census); and 4) Percent of owner-occupied homes (measured by the U.S. Census). County-level psychosocial factors are: 1) Number of robberies per 100,000 (measured by the USDOJ Crime Statistics Report); and 2) Number of social associations per 10,000, such as civic, religious, political, or sports organizations (measured by the Robert Wood Johnson Foundation County Health Rankings).

(2) Economic factors refer to individual- and group-level economic resources that influence health.68 For example, residents of high-poverty neighborhoods, as opposed to lower-poverty neighborhoods, may be more likely to avoid care because of the cost of treatment and prescription medications and/or a lack of medical insurance, and may have less time to dedicate to recovery. Block group-level economic factors are: 1) Percent of families living in poverty (measured by the U.S. Census), an income measure found to be more robust than median household income;69 and 2) Percent of cost-burdened renters, a housing-related measure of financial hardship.70 County-level economic factors are: rate of food insecurity, percent of residents on Supplemental Nutrition Assistance Program (SNAP) benefits, and the 2011-2015 unemployment rate (measured by the U.S. Census).

(3) The built environment refers to the human-made space that surrounds where we live,68 influencing health through housing quality, availability of green space, public transit, and access to food sources. Research suggests that the built environment can influence people’s routine behavior, health, and social patterns.62 A well-built environment shapes opportunities for outdoor activities, decreases isolation, and creates spaces that increase social support and decrease opportunities for crime and illicit drug use.71 Built environment factors at the block group level are: 1) Percent of residents taking public transit to work; 2) Percent of residents that moved into the area between 2000 and 2009 (measured by the U.S. Census), capturing the in-migration that may signal changing neighborhood dynamics; 3) Percent of vacant rental units (measured by the U.S. Census); and 4) Limited access to supermarkets, where higher values indicate greater inadequacies in access to healthy food options.72

(4) Health-related factors refer to risky behaviors (e.g., heavy drinking) as well as the use, availability, and accessibility of health services (e.g., number of hospital beds and supply of psychiatrists). Evidence suggests that certain risky behaviors can be normalized when highly prevalent.73 For example, persons at risk for mental illness who misuse substances and live in neighborhoods with a high percentage of heavy drinkers may be at higher risk of suicidal or overdose events. An individual’s ability to access emergency health services in times of crisis is directly related to their mental health outcomes.74 Health-related factors at the county level, downloaded from the Area Health Resources Files (AHRF), are: 1) Number of Federally Qualified Health Centers (FQHCs); 2) Number of hospital beds per 10,000; 3) Number of emergency room visits per 10,000; and 4) Number of psychiatrists per 100,000.
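The segregation score listed among the psychosocial factors above is described by its endpoints: 0 for an even mix of racial/ethnic groups and 1 for a single group. One quantity with exactly those endpoints is one minus the normalized Shannon entropy of the group shares, sketched below in Python. This is an assumption made for illustration and may not match the computation used for the data warehouse; the multigroup entropy index in the cited source is typically computed by comparing the entropy of subunits with that of a larger area. The function name and example counts are hypothetical.

```python
import math


def homogeneity_score(group_counts):
    """One minus normalized Shannon entropy of group shares.

    Returns 0.0 when all groups are equally represented and 1.0 when a
    single group accounts for everyone. Groups with zero count contribute
    nothing to the entropy (the limit of p * log(p) as p approaches 0).
    """
    total = sum(group_counts)
    k = len(group_counts)
    if total <= 0 or k < 2:
        raise ValueError("need at least two groups and a positive total")
    entropy = -sum((c / total) * math.log(c / total)
                   for c in group_counts if c > 0)
    return 1.0 - entropy / math.log(k)


# Invented household counts for four racial/ethnic groups.
even_mix = homogeneity_score([25, 25, 25, 25])
single_group = homogeneity_score([100, 0, 0, 0])
in_between = homogeneity_score([70, 20, 5, 5])
print(round(even_mix, 3), round(in_between, 3), round(single_group, 3))
```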

Social Determinants of Health (SDOH) and Other Linked Data: As a complement to the neighborhood-level SDOH measures linked to the patient’s address, we also have access to two sets of individual-level social determinants of health data. The first is the Connect-S measure, a nine-item survey (translated into Spanish, Haitian-Creole, and Portuguese) that is currently administered at CHA primary care intake and stored in the Epic data warehouse, where the research team accesses it and merges it into patient records. The Connect-S records information related to food and housing insecurity (e.g., “What is your housing situation today?”), financial difficulties (e.g., “I skipped medications to save money.”), transportation barriers, and employment (e.g., “I am unemployed and looking for work”), and includes an item asking if the patient would like to be linked to relevant services.

A second set of SDOH measures comes from a scale piloted at the CHA Malden Family Health Center in collaboration with Dr. David Williams, an expert in the field of social determinants of health and racial/ethnic disparities in mental health (funding source: W.K. Kellogg Foundation grant entitled “Stress (Early Childhood and Politically-Related) and Health”). The instrument, developed by Dr. Williams and other experts in racial/ethnic disparities and social determinants of health, takes an average of nine minutes to administer. In close partnership with CHA interpreter services, the survey was translated into Haitian-Creole, Spanish, and Portuguese. It includes the following domains: social and religious relationships and resources, stressful events and life circumstances, everyday discrimination, relationships and family well-being, financial and material well-being, traumatic experience, acculturative and immigration-related stress, and neighborhood safety. These items are available and linked to the EHR for patients at Malden Family Health Center and can be readily added to the CAT-MH and K-CAT surveys. This piloted survey forms the basis for an intensive adaptive individual SDOH screener (CAT-SDOH) to be rolled out system-wide over the next two years.

Other datasets are available for linkage for subsets of the CHA patient population including Cambridge Public School data (attendance, grades, truancy, suspension, graduation), and Greater Boston Food Bank data (containing food insecurity information from CHA community health centers linked to patient data). 

Standardized Data Collection using CAT-MH and K-CAT Merged to the Data Warehouse

The department-wide implementation of adaptive psychiatric testing, using the CAT-MH for adults (validated in Spanish,11 and translated into Portuguese) and the K-CAT for youth (translated into Spanish, with a Portuguese translation in progress), allows for rapid and reliable longitudinal tracking of mental health symptoms, diagnosis, and functional status. The CAT-MH, developed and validated for ages eleven and older by Methods Core Co-I Gibbons, uses multidimensional item response theory (IRT) to efficiently calibrate the information contained in large item banks consisting of hundreds of symptom items, and then adaptively administers a small number of items to each individual. CAT diagnostic screening can accurately track an hour-long face-to-face clinician diagnostic interview for major depressive disorder in less than a minute, using an average of four questions, with high sensitivity and specificity.75 The CAT-MH and K-CAT are easy for patients to complete online and identify, in only a few minutes, whether the respondent meets the threshold for depression, suicidality, generalized anxiety, PTSD, psychosis and other major psychiatric conditions according to DSM-5 criteria. All of the CAT screeners provide severity levels and thresholds (e.g., mild, moderate, severe) that are based on validated quantitative comparisons.
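The general idea of adaptive administration can be sketched in Python as follows. This is a deliberately simplified illustration, not the CAT-MH algorithm: it assumes a unidimensional two-parameter logistic (2PL) model with yes/no items, whereas the CAT-MH is described above as using multidimensional IRT. The item parameters, the stopping rule (a fixed number of items), and all function names are hypothetical.

```python
import math


def prob_endorse(theta, a, b):
    """2PL probability of endorsing an item (discrimination a, severity b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at severity level theta."""
    p = prob_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def next_item(theta, bank, administered):
    """Choose the unadministered item that is most informative at theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))


def update_theta(responses, bank, grid=None):
    """Posterior-mean severity estimate on a grid, standard normal prior."""
    grid = grid or [g / 10.0 for g in range(-40, 41)]
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized normal prior
        for i, endorsed in responses:
            p = prob_endorse(t, *bank[i])
            w *= p if endorsed else (1.0 - p)
        weights.append(w)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


def run_cat(bank, answer, max_items=4):
    """Administer up to max_items items adaptively; return (theta, item order)."""
    theta, responses = 0.0, []
    for _ in range(min(max_items, len(bank))):
        i = next_item(theta, bank, {j for j, _ in responses})
        responses.append((i, answer(i)))
        theta = update_theta(responses, bank)
    return theta, [i for i, _ in responses]


# Hypothetical item bank: (discrimination, severity) pairs.
bank = [(1.2, -1.0), (1.5, 0.0), (0.8, 1.0), (2.0, 0.5), (1.0, -0.5), (1.7, 1.5)]
theta_high, order_high = run_cat(bank, answer=lambda i: True)
theta_low, order_low = run_cat(bank, answer=lambda i: False)
print(order_high, round(theta_high, 2))
print(order_low, round(theta_low, 2))
```

Because each item is chosen for the respondent's current estimate, two respondents with different answers see different items after the first, which is how a short test can draw on a large item bank.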

CAT-MH and K-CAT data are a key component of the clinical-sociocontextual dataset. We have partnered with Adaptive Testing Technologies to administer the necessary number of longitudinal tests for each R34 and are currently in final negotiations to make the CAT-MH and K-CAT available across the CHA system. We have developed an IT pipeline to seamlessly administer the tests online using REDCap and receive the data on our servers for linkage to Epic data, claims, criminal justice and neighborhood- and individual-level SDOH data.

Validation: The CAT-MH and K-CAT have been fully validated for depression severity, depression diagnosis, anxiety diagnosis, suicidality, mania, PTSD, and psychosis against reliable standard research structured clinical interviews (e.g., the Structured Clinical Interview for DSM-5 (SCID) and the Kiddie Schedule for Affective Disorders and Schizophrenia (K-SADS)) and standard clinician-rated scales (e.g., the Columbia Suicide Severity Rating Scale (C-SSRS)).

Depression, Anxiety, and Suicidality: The CAT for depression (CAT-DI) correlates well with the Hamilton Depression Rating Scale (r = 0.79), PHQ-9 (r = 0.90), and CES-D (r = 0.90) in terms of depression severity. Ninety-seven percent of patients indicated that the CAT-MH accurately reflected their mood, 86% preferred the computer interface to all alternatives, 97% felt comfortable taking the test, and 98% reported that they answered honestly.75 The CAT-MDD and CAT-ANX likewise perform well compared to the SCID DSM-5 MDD diagnosis (sensitivity = 0.95, specificity = 0.88) and the SCID DSM-5 GAD diagnosis (sensitivity = 0.86, specificity = 0.86). The CAT-SS module is highly sensitive and specific relative to the Columbia Suicide Severity Rating Scale.75

Mania/hypomania: The CAT-MANIA76 reproduces the information in the 89-item bank using adaptive administration of seventeen items in 3.4 minutes. This has been reduced to approximately twelve items in two minutes through refined CAT termination criteria. The CAT-MANIA was validated against the SCID DSM-5 structured clinical interview for bipolar disorder (BP, BP I, BP II). Across the range of the scale, the probability of a positive diagnosis of bipolar disorder increased twelve-fold.

Psychosis: The CAT-Psychosis comprises adaptive clinician-administered and patient self-report psychosis measures. Test-retest reliability was r = 0.86 for the clinician-administered version and r = 0.82 for the patient self-report. For the clinician-administered version, inter-rater reliability was ICC = 0.73.75

Substance use disorder: The CAT-SUD assesses the severity of SUD without regard to the specific substance(s) used, on a 100-point scale with five points of precision, and the frequency of use of five substance classes during the past month. The CAT-SUD-E expands the CAT-SUD to include a more detailed view of the specific substances. Severity is scored for each of five domains: (1) SUD, (2) psychological disorders, (3) risky behavior, (4) functional impairment and (5) social support.75

The K-CAT™ extends the adult technology in the CAT-MH™ to children and adolescents ages 7-17, based on parent/caregiver and youth self-reports.9 It measures depression, anxiety, mania/hypomania, ADHD, oppositional defiant disorder, conduct disorder and suicidality.
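For readers less familiar with the validation metrics reported above, the following Python sketch shows how sensitivity and specificity are computed when a screener is compared with a reference-standard interview. The counts are toy values chosen only so that the output reproduces the proportions quoted for the CAT-MDD (0.95 and 0.88); they are not the counts from any validation study.

```python
def sensitivity_specificity(screen_positive, reference_positive):
    """Compare screener results with a reference standard.

    Both arguments are equal-length sequences of booleans, one entry per
    respondent. Returns (sensitivity, specificity).
    """
    pairs = list(zip(screen_positive, reference_positive))
    tp = sum(1 for s, r in pairs if s and r)          # true positives
    fn = sum(1 for s, r in pairs if not s and r)      # false negatives
    tn = sum(1 for s, r in pairs if not s and not r)  # true negatives
    fp = sum(1 for s, r in pairs if s and not r)      # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference-positive and one reference-negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: 20 reference-positive cases (19 screen positive) and
# 25 reference-negative cases (22 screen negative).
reference = [True] * 20 + [False] * 25
screen = [True] * 19 + [False] * 1 + [False] * 22 + [True] * 3

sens, spec = sensitivity_specificity(screen, reference)
print(round(sens, 2), round(spec, 2))
```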

 

Bibliography 

5.         Hahm HC, Cook BL, Ault-Brutus A, Alegría M. Intersection of race-ethnicity and gender in depression care: screening, access, and minimally adequate treatment. Psychiatr Serv. 2015;66(3):258–264.

6.         Cook B, Creedon T, Wang Y, et al. Examining racial/ethnic differences in patterns of benzodiazepine prescription and misuse. Drug Alcohol Depend. 2018;187:29-34. doi:10.1016/j.drugalcdep.2018.02.011

7.         Fortuna L, Noyola N, Cook B, Amaris A. Sleep disturbances and substance use disorders: An international study of primary care and mental health specialty care patients. Eur Psychiatry. 2016;33:S109-S110. doi:10.1016/j.eurpsy.2016.01.101

8.         Robinson LA, Menezes M, Mullin B, Cook BL. A Comparison of Health Care Expenditures for Medicaid-Insured Children with Autism Spectrum Disorder and Asthma in an Expanding Accountable Care Organization. J Autism Dev Disord. 2020;50(3):1031-1044. doi:10.1007/s10803-019-04327-z

9.         Gibbons RD, Kupfer DJ, Frank E, et al. Computerized adaptive tests for rapid and accurate assessment of psychopathology dimensions in youth. J Am Acad Child Adolesc Psychiatry. Published online 2019.

10.       Gibbons RD, Weiss DJ, Pilkonis PA, et al. Development of a computerized adaptive test for depression. Arch Gen Psychiatry. 2012;69(11):1104–1112.

11.       Gibbons RD, Alegría M, Cai L, et al. Successful validation of the CAT-MH Scales in a sample of Latin American migrants in the United States and Spain. Psychol Assess. 2018;30(10):1267.

12.       Gibbons RD, Smith JD, Brown CH, et al. Improving the evaluation of adult mental disorders in the criminal justice system with computerized adaptive testing. Psychiatr Serv. 2019;70(11):1040–1043.

13.       Whitney DG, Peterson MD. US national and state-level prevalence of mental health disorders and disparities of mental health care use in children [published online ahead of print February 11, 2019]. JAMA Pediatr.

14.       McGorry P, Nelson B. Why we need a transdiagnostic staging approach to emerging psychopathology, early diagnosis, and treatment. JAMA Psychiatry. 2016;73(3):191–192.

15.       McGorry P, van Os J. Redeeming diagnosis in psychiatry: timing versus specificity. The Lancet. 2013;381(9863):343–345.

16.       Cook BL, Zuvekas SH, Carson N, Wayne GF, Vesper A, McGuire TG. Assessing racial/ethnic disparities in treatment across episodes of mental health care. Health Serv Res. 2014;49(1):206–229.

17.       Cook B, Brown JD, Loder S, Wissow L. Acculturation differences in communicating information about child mental health between Latino parents and primary care providers. J Immigr Minor Health. 2014;16(6):1093–1102.

18.       Platt R, Weiss-Laxer NS, Creedon TB, Roman MJS, Cardemil EV, Cook B. Association between maternal and child mental health among US Latinos: variation by nativity, ethnic subgroup, and time in the USA. Arch Womens Ment Health. Published online 2019:1–8.

19.       Fortuna LR, Álvarez K, Ortiz ZR, et al. Mental health, migration stressors and suicidal ideation among Latino immigrants in Spain and the United States. Eur Psychiatry. 2016;36:15–22.

20.       Cook BL, Zuvekas SH, Chen J, Progovac A, Lincoln AK. Assessing the individual, neighborhood, and policy predictors of disparities in mental health care. Med Care Res Rev. 2017;74(4):404–430.

 

58.       Delman J, Progovac AM, Flomenhoft T, Delman D, Chambers V, Cook BL. Barriers And Facilitators To Community-Based Participatory Mental Health Care Research For Racial And Ethnic Minorities. Health Aff (Millwood). 2019;38(3):391–398.

59.       Carson NJ, Mullin B, Sanchez MJ, et al. Identification of suicidal behavior among psychiatrically hospitalized adolescents using natural language processing and machine learning of electronic health records. PloS One. 2019;14(2).

60.       Barrett JG, Janopaul-Naylor E, Rose J, Progovac AM, Hou SS-Y, Cook BL. Do Diverted Kids Stay Out of Trouble?: A Longitudinal Analysis of Recidivism Outcomes in Diversion. J Appl Juv Justice Serv.:13.

61.       Janopaul-Naylor E, Morin SL, Mullin B, Lee E, Barrett JG. Promising approaches to police–mental health partnerships to improve service utilization for at-risk youth. Transl Issues Psychol Sci. 2019;5(2):206.

62.       Flores MW, Le Cook B, Mullin B, et al. Associations between Neighborhood-Level Factors and Opioid-Related Mortality: A Multilevel Analysis using Death Certificate Data. Addiction. Published online 2020.

63.       Cook B, Doksum T, Chen C, Carle A, Alegría M. The role of provider supply and organization in reducing racial/ethnic disparities in mental health care in the US. Soc Sci Med. 2013;84:102–109.

64.       Hembree C, Galea S, Ahern J, et al. The urban built environment and overdose mortality in New York City neighborhoods. Health Place. 2005;11(2):147-156. doi:10.1016/j.healthplace.2004.02.005

65.       Sampson RJ, Raudenbush SW. Disorder in Urban Neighborhoods--Does It Lead to Crime? Published online 2001. doi:10.1037/e512722006-001

66.       Sampson R, Morenoff J, Earls F. Beyond Social Capital: Spatial Dynamics of Collective Efficacy for Children. Am Sociol Rev. 1999;64:633-660.

67.       Iceland, J. The multigroup entropy index (also known as Theil’s H or the information theory index).

68.       Beck AF, Sandel MT, Ryan PH, Kahn RS. Mapping Neighborhood Health Geomarkers To Clinical Care Decisions To Promote Equity In Child Health. Health Aff Proj Hope. 2017;36(6):999-1005. doi:10.1377/hlthaff.2016.1425

69.       Krieger N, Chen JT, Waterman PD, Rehkopf DH, Subramanian SV. Race/ethnicity, gender, and monitoring socioeconomic gradients in health: a comparison of area-based socioeconomic measures--the public health disparities geocoding project. Am J Public Health. 2003;93(10):1655-1671. doi:10.2105/ajph.93.10.1655

70.       Heflin CM, Iceland J. Poverty, Material Hardship and Depression. Soc Sci Q. 2009;90(5):1051-1071. doi:10.1111/j.1540-6237.2009.00645.x

71.       Cohen DA, Inagami S, Finch B. The built environment and collective efficacy. Health Place. 2008;14(2):198-208. doi:10.1016/j.healthplace.2007.06.001

72.       Fund T. 2014 Analysis of Limited Supermarket Access. Published online 2015.

73.       Perkins HW. Social norms and the prevention of alcohol misuse in collegiate contexts. J Stud Alcohol Suppl. 2002;(14):164-172. doi:10.15288/jsas.2002.s14.164

74.       Kessler RC, Bossarte RM, Luedtke A, Zaslavsky AM, Zubizarreta JR. Suicide prediction models: a critical review of recent research with recommendations for the way forward. Mol Psychiatry. 2020;25(1):168-179. doi:10.1038/s41380-019-0531-0

75.       Gibbons RD, Weiss DJ, Frank E, Kupfer D. Computerized Adaptive Diagnosis and Testing of Mental Health Disorders. Annu Rev Clin Psychol. 2016;12(1):83-104. doi:10.1146/annurev-clinpsy-021815-093634

76.       Achtyes ED, Halstead S, Smart L, et al. Validation of computerized adaptive testing in an outpatient nonacademic setting: the VOCATIONS trial. Psychiatr Serv. 2015;66(10):1091–1096.