Using GIS to Understand a Rare Disease: Primary Sclerosing Cholangitis (PSC)
ESRI Regional User Conference
Estella M. Geraghty, MD, MS, MPH/CPH, FACP, GISP
Christopher L. Bowlus, MD - Gastroenterologist
March 7, 2012
Outline § What is PSC? § GIS Techniques 1. Distribution Mapping 2. Hot Spot Analysis 3. Distance to Transplant Centers 4. Validation 5. Interpolation § Next Steps
What is PSC? § A rare, chronic & progressive disease of the biliary tract
Image from Johns Hopkins Medicine, Gastroenterology and Hepatology website
What is PSC? § The normal gallbladder and biliary tree are smooth § Inflammation causes changes in PSC § Strictures § Beads on a string Images from Johns Hopkins Medicine, Gastroenterology and Hepatology website
What is PSC? § Can be quiescent for a long time (yrs) § Symptoms § Abdominal pain § Itching & jaundice § Fever and chills § Weight loss, fatigue
Image from Johns Hopkins Medicine, Gastroenterology and Hepatology website
Complications of PSC? § Cirrhosis § Cholangiocarcinoma § Colon Cancer
Images from Johns Hopkins Medicine, Gastroenterology and Hepatology and Daily Health News websites
Causes of PSC…
Ischemic vascular damage
Toxins from intestinal bacteria
GIS Step 1. US Distribution of PSC? § Data § United Network for Organ Sharing (UNOS) 1995-2008 § U.S. Census population 2008 § Enumeration unit – ZIP code § Analysis variable – PSC prevalence § PSC prevalence per 100,000 population = (PSC cases on transplant list / Population) × 100,000
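The prevalence variable above is a simple rate, and a minimal sketch of it in Python might look like the following. The function name and the example ZIP code figures are hypothetical and are not taken from the UNOS or Census data.

```python
def prevalence_per_100k(cases, population):
    """Return PSC transplant-list cases per 100,000 residents for one ZIP code."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


# Hypothetical ZIP code: 3 listed cases among 25,000 residents.
example = prevalence_per_100k(3, 25_000)
print(round(example, 1))
```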
GIS Step 1. US Distribution of PSC? § Some stats: § 6767 PSC cases over 14 years § Lost 78 cases with no ZIP code (1.15%) § Excluded ZIP codes with < 1000 population (302 cases) § Cases occur in 4615 of 30322 US ZIP codes (15.22%)
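The percentages and the population cutoff above can be reproduced with a few lines of Python. This is only a sketch: the record layout (dictionaries with a `population` key) and the helper names are assumptions for illustration, while the counts come from the slide.

```python
MIN_POPULATION = 1000  # ZIP codes below this population were excluded


def eligible_zips(zip_records, min_population=MIN_POPULATION):
    """Keep only ZIP code records that meet the population cutoff."""
    return [r for r in zip_records if r["population"] >= min_population]


def percent(part, whole):
    """Return part as a percentage of whole."""
    return part / whole * 100


lost_pct = percent(78, 6767)      # cases lost for lack of a ZIP code
zip_pct = percent(4615, 30322)    # ZIP codes with at least one case

# Hypothetical records, only to show the filter.
sample = [{"zip": "A", "population": 850}, {"zip": "B", "population": 12000}]
kept = eligible_zips(sample)
```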
GIS Step 2. Are there ‘hot spots’ of PSC? § Used the method prescribed by ‘the Laurens’ § Control for the heterogeneity of ZIP code sizes § Determine the distance where clustering is most intense § Use spatial weights matrix to include large ZIP codes See Spatial Statistics Resources at: http://blogs.esri.com/esri/arcgis/2010/07/13/spatial-statistics-resources/
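The slides do not show the hot spot computation itself. As a rough illustration, the standard Getis-Ord Gi* z-score can be sketched in plain Python as below. The binary fixed distance band, the coordinates, and the values are all assumptions for the example; the sketch does not reproduce the ZIP code size adjustment or the search for the distance of most intense clustering mentioned above.

```python
import math


def gi_star_z(values, coords, band):
    """Getis-Ord Gi* z-scores using binary weights within a fixed distance band.

    Each feature counts itself as a neighbor, as Gi* requires.
    """
    n = len(values)
    mean = sum(values) / n
    variance = max(0.0, sum(v * v for v in values) / n - mean ** 2)
    s = math.sqrt(variance)
    scores = []
    for i in range(n):
        w = [1.0 if math.dist(coords[i], coords[j]) <= band else 0.0
             for j in range(n)]
        sum_w = sum(w)
        sum_w2 = sum(x * x for x in w)
        numerator = sum(wj * xj for wj, xj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt(max(0.0, n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator if denominator > 0 else 0.0)
    return scores


# Hypothetical 5 x 5 grid with high values clustered near the origin.
grid = [(x, y) for x in range(5) for y in range(5)]
rates = [10.0 if x < 2 and y < 2 else 1.0 for x, y in grid]
z_scores = gi_star_z(rates, grid, band=1.0)
```

Large positive z-scores mark candidate hot spots and large negative z-scores mark candidate cold spots; significance still has to be judged from the associated p-values.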
GIS Step 3. Distance to Transplant Centers § Euclidean distance calculated from each ZIP code centroid to nearest transplant center § Mean distance to a transplant center was shorter in hot spots than in cold spots (p < …)
§ Hot spot = 10: GiPValue < 0.0501 & GiZScore > 0 § Cold spot = -1: GiPValue < 0.0501 & GiZScore < 0 § Equivalent to mean = 0: GiPValue > 0.0501
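The Step 3 distance calculation, from each ZIP code centroid to the nearest transplant center, can be sketched as follows. The sketch assumes coordinates are already in a projected system with linear units; the function name and the coordinates are hypothetical.

```python
import math


def nearest_center_distance(centroid, centers):
    """Return the straight-line distance from a centroid to the closest center."""
    if not centers:
        raise ValueError("at least one transplant center is required")
    return min(math.dist(centroid, center) for center in centers)


# Hypothetical projected coordinates (same linear unit for all points).
zip_centroid = (0.0, 0.0)
transplant_centers = [(3.0, 4.0), (10.0, 0.0)]
distance = nearest_center_distance(zip_centroid, transplant_centers)
```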
GIS Step 4. Validation § Union parts A & B § Add a field (sum) § Add parts A & B
Possible Combinations    Meaning
10 + 10 = 20             Both hot spots
-1 + -1 = -2             Both cold spots
0 + 0 = 0                Both ‘mean’ spots
10 + 0 = 10              One hot & one ‘mean’ spot
-1 + 0 = -1              One cold & one ‘mean’ spot
10 + -1 = 9              One hot & one cold spot
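The coding scheme works because every pair of codes produces a distinct sum. A small sketch of the classification and lookup follows; the codes 10, -1 and 0 and the 0.0501 cutoff come from the slides, while the function names and the rule that the sign of the z-score separates hot from cold are assumptions based on how Gi* output is usually read.

```python
HOT, COLD, MEAN = 10, -1, 0
P_CUTOFF = 0.0501

MEANING = {
    20: "both hot spots",
    -2: "both cold spots",
    0: "both 'mean' spots",
    10: "one hot and one 'mean' spot",
    -1: "one cold and one 'mean' spot",
    9: "one hot and one cold spot",
}
CONCORDANT_SUMS = {20, -2, 0}


def classify(gi_p_value, gi_z_score):
    """Code a feature as hot (10), cold (-1) or equivalent to the mean (0)."""
    if gi_p_value < P_CUTOFF and gi_z_score > 0:
        return HOT
    if gi_p_value < P_CUTOFF and gi_z_score < 0:
        return COLD
    return MEAN


def combine(code_a, code_b):
    """Sum the codes from parts A and B and report what the sum means."""
    total = code_a + code_b
    return total, MEANING[total], total in CONCORDANT_SUMS


result = combine(classify(0.01, 2.7), classify(0.02, -2.4))
```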
Grey areas are non-concordant
Spearman Rank Correlation Coefficient = 0.2359
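For reference, a Spearman rank correlation can be computed as the Pearson correlation of the ranks, with tied values sharing their average rank. The sketch below is a generic implementation, not the tool used for the result above, and the input lists are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation; undefined (raises) if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical hot/cold codes from two halves of the data.
rho = spearman([10, 0, -1, 0, 10], [10, 0, 0, -1, 10])
```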
Spatial Spearman’s § Permutation analysis (2 models) § To determine significance (thanks to Lauren Scott and Mark Janikas)!
Model #1
Add ‘0’ to every row (default)
Determine the number of iterations
Clear selection
Send to next model
Clear selection
Also selecting and counting those = -2 and 0 (all concordant rows)
Model #2
Create a table name and location
Add field to contain the concordant row count
Receives data from permutation model
Last 3 rows of code block: icursor.insertRow(row) del icursor return rtable
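The two models are shown only as diagrams, so their exact logic is not recoverable from the slides. The general idea of a permutation test on the concordant row count can be sketched in plain Python as below: shuffle one set of codes many times and see how often chance alone matches the observed concordance. The function names, the iteration count, and the pseudo p-value formula are assumptions, not the original model.

```python
import random

CONCORDANT_SUMS = {20, -2, 0}  # hot+hot, cold+cold, mean+mean


def count_concordant(codes_a, codes_b):
    """Count rows whose summed codes indicate agreement between parts A and B."""
    return sum(1 for a, b in zip(codes_a, codes_b) if a + b in CONCORDANT_SUMS)


def permutation_p_value(codes_a, codes_b, iterations=999, seed=0):
    """Return the observed concordant count and a one-sided pseudo p-value."""
    rng = random.Random(seed)
    observed = count_concordant(codes_a, codes_b)
    shuffled = list(codes_b)
    at_least_as_large = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)
        if count_concordant(codes_a, shuffled) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (iterations + 1)


# Hypothetical codes in which parts A and B agree on every row.
part_a = [10] * 5 + [-1] * 5 + [0] * 10
part_b = list(part_a)
observed_count, p_value = permutation_p_value(part_a, part_b)
```

Adding 1 to both the numerator and the denominator counts the observed arrangement as one of the permutations, which keeps the pseudo p-value above zero.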
Idaho, Utah & Colorado – mostly mountainous areas
Ohio & Susquehanna River Valleys – mostly low lying
Lead & Inflammation § Conflicting evidence
GIS Step 5. Interpolation of lead levels § TOXMAP
GIS Step 5. Interpolation of lead levels § Challenges: § Data are aggregated over many years for each station § >41,300 stations in U.S. § Latitude and longitude are included…but… § Multiple datums are used § E.g., Astro, OLDHI, Unknown
GIS Step 5. Interpolation of lead levels § Separated dataset by datum § Geocoded stations in each datum § Transformed each result to WGS84 § Appended results into one file § Dealt with multiple outliers § Performed IDW & Kriging interpolations
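Of the two interpolators named above, inverse distance weighting (IDW) is simple enough to sketch in a few lines. The version below is a generic illustration that uses every station with a power of 2; the station coordinates and lead values are hypothetical, and the search radius and neighbor settings used in the actual analysis are not given in the slides.

```python
import math


def idw(point, stations, power=2.0):
    """Estimate a value at point from (x, y, value) stations by inverse distance weighting."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        d = math.dist(point, (x, y))
        if d == 0:
            return value  # the point coincides with a station
        weight = 1.0 / d ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0:
        raise ValueError("at least one station is required")
    return weighted_sum / weight_total


# Hypothetical stations: (x, y, lead level) in a common projected system.
lead_stations = [(0.0, 0.0, 2.0), (2.0, 0.0, 4.0)]
midpoint_estimate = idw((1.0, 0.0), lead_stations)
```

Stations closer to the prediction point get more weight, so the surface passes exactly through the measured values, unlike Kriging, which also models spatial autocorrelation.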
Next Steps § Re-aggregate the interpolated lead results to ZIP code § Zonal analysis vs § Gaussian geostatistical simulation § Use the exploratory regression tool to evaluate variable relationships § Consider other environmental variables § Ideas?
Thank you for your attention!
Este Geraghty