Classification of pathology reports for Cancer Registry notifications
An Automated Tool to Identify Cancer Cases
A. Nguyen¹, J. Moore², G. Zuccon¹, M. Lawley¹, S. Colquist²
¹ Australian e-Health Research Centre, CSIRO
² Cancer Control Analysis Control Team, Qld Health
THE AUSTRALIAN E-HEALTH RESEARCH CENTRE | ICT CENTRE
Manual Pathology Notifications
- Cancer is a notifiable disease in all States and Territories in Australia
- Public and private pathology laboratories are legally required under the Public Health Act 2005 to provide the Cancer Registry with copies of specimen reports that contain a result of cancer
Pathology lab identifies notifiable reports
Time- and labour-intensive process
2 | Automating Cancer Registry Notifications
Cancer Registry manually sorts and codes cancer cases
Automating Pathology Notifications
- Sending and receiving electronic HL7 feeds from pathology laboratories is now available in Queensland Health
[Diagram: Pathology Lab → HL7 Messages → Pathology Information System]
Challenge: pathology reports still need to be classified into those that are notifiable to the Cancer Registry and those that are not
- Scale: >> 100,000 pathology reports per year
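As a rough illustration of where the free text comes from in such a feed: in HL7 v2 result messages (ORU^R01), report text is typically carried in OBX segments whose value type (OBX-2) is TX or FT, with the text itself in OBX-5. The sketch below is a minimal Python example, not the parser used in this work. It assumes `|` field and `~` repetition separators, does not decode HL7 escape sequences, and its example message and field values are invented.

```python
from typing import List


def split_segments(message: str) -> List[List[str]]:
    """Split an HL7 v2 message into segments, each a list of '|'-separated fields."""
    # HL7 v2 terminates segments with a carriage return; tolerate '\n' as well.
    lines = message.replace("\r\n", "\r").replace("\n", "\r").split("\r")
    return [line.split("|") for line in lines if line.strip()]


def report_text(message: str) -> str:
    """Concatenate OBX-5 values of textual OBX segments (value type TX or FT)."""
    parts: List[str] = []
    for fields in split_segments(message):
        # fields[0] is the segment ID, so fields[n] is OBX-n.
        if fields[0] == "OBX" and len(fields) > 5 and fields[2] in ("TX", "FT"):
            # OBX-5 may contain repetitions separated by '~'; keep each as a line.
            parts.extend(fields[5].split("~"))
    return "\n".join(parts)


# Hypothetical example message; all field values are invented for illustration.
EXAMPLE = (
    "MSH|^~\\&|LAB|HOSP|PIS|QH|20240101||ORU^R01|1|P|2.3\r"
    "OBR|1||S24-001|HB^Histology Biopsy^L\r"
    "OBX|1|TX|DIAG^Diagnosis||Invasive ductal carcinoma.~Margins clear.\r"
    "OBX|2|NM|SIZE^Tumour size||18|mm\r"
)
```

Here `report_text(EXAMPLE)` keeps the two diagnosis lines and skips the numeric OBX segment, leaving only text for downstream free-text analysis.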
Hypothesis
An automated computer system could perform the time- and labour-intensive manual review of cancer cases
- Automatically scan free-text medical documents for terms relevant to cancer
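A naive version of this scan can be sketched as whole-word, case-insensitive dictionary matching. The term list `CANCER_TERMS` and the function `find_cancer_terms` are hypothetical; the list is far smaller than any real vocabulary, and this is not how MEDTEX generates candidates.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately tiny term list for illustration only.
CANCER_TERMS: List[str] = ["carcinoma", "adenocarcinoma", "melanoma", "lymphoma", "sarcoma"]


def find_cancer_terms(text: str, terms: List[str] = CANCER_TERMS) -> List[Tuple[str, int]]:
    """Return (term, offset) pairs for whole-word, case-insensitive matches, in text order."""
    hits: List[Tuple[str, int]] = []
    for term in terms:
        # \b stops "carcinoma" from matching inside "adenocarcinoma".
        pattern = rf"\b{re.escape(term)}\b"
        hits.extend((term, m.start()) for m in re.finditer(pattern, text, flags=re.IGNORECASE))
    return sorted(hits, key=lambda hit: hit[1])
```

For "Invasive ductal Carcinoma. No evidence of lymphoma." this returns both carcinoma and lymphoma, even though the lymphoma mention is negated. Plain term spotting cannot tell the difference, which is why the design adds context-based selection.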
Design
Electronic Pathology Report
1. Filter pathology reports
   - Retain cytology/histology reports
   - Exclude urine/sputum samples
   Outcome: Not Cancer Notifiable Report, or Cancer Notifiable Candidate
Medical Free-Text Analysis (MEDTEX), applied to each Cancer Notifiable Candidate:
2. Classify Histological Type
   - Histological type candidate generation
   - Histological type selection based on context
3. Classify Supporting Notifiable Reports
   - “Re-excision/residual” keyword spotting and association with histological type candidates
   - Flag “suspected” histological type candidates
   Outcome: Cancer Notification, or Not Cancer Notifiable Result (includes basal and squamous cell carcinoma of skin, and benign tumours excluding central nervous system & brain)
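The order of the three stages can be sketched as a small control flow in which each stage is passed in as a function. Everything here (`Report`, `Candidate`, `classify_report`, the output labels and the toy stage functions) is hypothetical. It only illustrates the ordering described above: filter on structured fields first, then look for positively asserted notifiable histological types, then check for supporting notifiable reports.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    histological_type: str
    assertion: str = "present"  # "present", "absent" or "possible"


@dataclass
class Report:
    test_name: str  # OBR-4 test name (assumed already parsed)
    specimen: str   # OBR-15 specimen source (assumed already parsed)
    text: str       # free-text report body


def classify_report(
    report: Report,
    passes_filter: Callable[[Report], bool],
    generate_candidates: Callable[[str], List[Candidate]],
    is_notifiable_type: Callable[[str], bool],
    is_supporting: Callable[[str, List[Candidate]], bool],
) -> str:
    """Run the three stages in order and return a (hypothetical) label."""
    # Stage 1: report type filtering on structured fields.
    if not passes_filter(report):
        return "not notifiable"
    # Stage 2: histological type candidates; only positively asserted ones count.
    candidates = generate_candidates(report.text)
    if any(c.assertion == "present" and is_notifiable_type(c.histological_type) for c in candidates):
        return "notifiable: cancer"
    # Stage 3: supporting notifiable reports (suspected, residual, re-excision).
    if is_supporting(report.text, candidates):
        return "notifiable: supporting"
    return "not notifiable"


# Toy stage functions for demonstration only.
def toy_filter(r: Report) -> bool:
    return "urine" not in r.specimen.lower()


def toy_candidates(text: str) -> List[Candidate]:
    t = text.lower()
    if "carcinoma" not in t:
        return []
    return [Candidate("carcinoma", "possible" if "suspicious" in t else "present")]


def toy_is_notifiable(histological_type: str) -> bool:
    return histological_type == "carcinoma"


def toy_supporting(text: str, candidates: List[Candidate]) -> bool:
    return any(c.assertion == "possible" for c in candidates)


TOY_STAGES = (toy_filter, toy_candidates, toy_is_notifiable, toy_supporting)
print(classify_report(Report("Histology Biopsy", "Skin", "Suspicious for carcinoma."), *TOY_STAGES))
# prints: notifiable: supporting
```

Passing the stages in as functions keeps the ordering logic separate from the rules, so each stage can be tested on its own.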
Design – Pathology report type filtering
1. Report Type Filtering
- Retrieve report types that are potentially notifiable to the Queensland Cancer Registry (QCR)
- Histology (and haematology) and cytology (excluding urine, sputum and Pap smears)
- HL7 order detail (OBR) segment: Universal Service ID (UnivServID) and Specimen Source (SpecSource) fields
Report Type | Pathology Test (UnivServId)*
Histology | Bone marrow; BM Asp & Treph; Histology Frozen; Histology Biopsy
Cytology | Cytology (Skin, D/C); Cytology (Fluids); Cytology FNA; Flow Cytometry
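Report type filtering can be sketched as a whitelist lookup on the OBR segment. This assumes that the test name arrives in the text component of OBR-4 (Universal Service ID, `identifier^text^coding system`) and that OBR-15 (Specimen Source) can be checked with a plain substring test. `REPORT_TYPE_BY_TEST` and `filter_report` are hypothetical names, and the test-name strings are modelled on the report type table rather than taken from a live feed.

```python
from typing import Optional

# Hypothetical whitelist modelled on the report type table; exact strings are assumed.
REPORT_TYPE_BY_TEST = {
    "bone marrow": "Histology",
    "bm asp & treph": "Histology",
    "histology frozen": "Histology",
    "histology biopsy": "Histology",
    "cytology (skin, d/c)": "Cytology",
    "cytology (fluids)": "Cytology",
    "cytology fna": "Cytology",
    "flow cytometry": "Cytology",
}
EXCLUDED_SPECIMENS = ("urine", "sputum")


def filter_report(obr_segment: str) -> Optional[str]:
    """Return 'Histology' or 'Cytology' for a notifiable candidate, else None."""
    fields = obr_segment.split("|")
    if fields[0] != "OBR":
        raise ValueError("expected an OBR segment")
    # OBR-4 Universal Service ID: identifier^text^coding system. Match the text
    # component, falling back to the identifier if no text is present.
    service = fields[4].split("^") if len(fields) > 4 else [""]
    test_name = (service[1] if len(service) > 1 and service[1] else service[0]).strip().lower()
    report_type = REPORT_TYPE_BY_TEST.get(test_name)
    if report_type is None:
        return None
    # OBR-15 Specimen Source: a plain substring check over the whole field.
    specimen = fields[15].lower() if len(fields) > 15 else ""
    if any(word in specimen for word in EXCLUDED_SPECIMENS):
        return None
    return report_type
```

Pap smears are excluded simply by not appearing in the whitelist. A literal `&` in a test name such as "BM Asp & Treph" would normally be escaped in HL7 (as `\T\`), so a real implementation would decode escape sequences before matching.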
Design – Notifiable report classification
2. Notifiable Report Classification
   1. Notifiable cancer classification
   2. Supporting notifiable report classification
Approach:
- Queensland Cancer Registry business rules
- Natural language processing
- Inference & reasoning using SNOMED CT
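Part of the natural language processing is deciding whether a mention is actually asserted; the notifiable cancer rules distinguish concepts asserted as “absent” or “possible”. A NegEx-style approach, which looks for trigger phrases in a short window before the concept, is one common way to do this. The sketch below uses that technique with hypothetical trigger lists and window size; it is not necessarily how MEDTEX assigns assertions.

```python
import re
from typing import List

# Hypothetical trigger lists; NegEx-style systems use much larger curated lists.
ABSENT_TRIGGERS: List[str] = ["no evidence of", "negative for", "without", "no"]
POSSIBLE_TRIGGERS: List[str] = ["suspicious for", "suggestive of", "cannot exclude", "possible", "probable"]
WINDOW = 5  # number of preceding word tokens a trigger may govern (assumed value)


def assertion(sentence: str, concept: str) -> str:
    """Label a concept mention 'absent', 'possible' or 'present' from preceding triggers."""
    lowered = sentence.lower()
    start = lowered.find(concept.lower())
    if start < 0:
        raise ValueError(f"{concept!r} not found in sentence")
    # Look only at the last WINDOW word tokens before the first mention.
    window = " ".join(re.findall(r"[a-z]+", lowered[:start])[-WINDOW:])
    for label, triggers in (("absent", ABSENT_TRIGGERS), ("possible", POSSIBLE_TRIGGERS)):
        if any(re.search(rf"\b{re.escape(t)}\b", window) for t in triggers):
            return label
    return "present"
```

Full NegEx implementations also handle triggers that follow the concept and termination terms (for example "but"), which this sketch ignores.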
Design – Notifiable report classification
1. Notifiable cancer classification
Notifiable conditions:
- All invasive cancers, excluding basal cell carcinoma (BCC) and squamous cell carcinoma (SCC) of the skin
- Any cancer with uncertain behaviour
- All in-situ conditions
- Benign central nervous system and brain tumours
Concepts asserted as “absent” or “possible” are filtered out
SNOMED CT Concept ID | Fully Specified Name
367651003 | Malignant neoplasm of primary, secondary, or uncertain origin (morphologic abnormality)
86251006 | Neoplasm, uncertain whether benign or malignant (morphologic abnormality)
127569003 | In situ neoplasm (morphologic abnormality)
253061008 | Nervous system tumor morphology (morphologic abnormality)
128928004 | Neuroendocrine neoplasm (morphologic abnormality)
115241005 | Neuroepitheliomatous neoplasm (morphologic abnormality)
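The concept IDs in the table act as roots: a histological type counts as a notifiable cancer if it is one of them or a descendant under the is-a hierarchy, subject to the business rules above. The sketch below shows that subsumption check over a toy hierarchy. The `toy:` identifiers are hypothetical placeholders, not real SNOMED CT concepts, and the free-text `site` argument stands in for whatever topography coding a real system would use.

```python
from typing import Dict, Set

# Notifiable root concepts: the six SNOMED CT concept IDs listed in the table.
NOTIFIABLE_ROOTS: Set[str] = {"367651003", "86251006", "127569003", "253061008", "128928004", "115241005"}

# Toy is-a edges (child -> parents). The "toy:" identifiers are hypothetical
# placeholders, NOT real SNOMED CT concepts.
IS_A: Dict[str, Set[str]] = {
    "toy:carcinoma": {"367651003"},
    "toy:ductal_carcinoma": {"toy:carcinoma"},
    "toy:basal_cell_carcinoma": {"toy:carcinoma"},
    "toy:squamous_cell_carcinoma": {"toy:carcinoma"},
    "toy:fibroadenoma": {"toy:benign_neoplasm"},
}
SKIN_EXCLUDED: Set[str] = {"toy:basal_cell_carcinoma", "toy:squamous_cell_carcinoma"}


def ancestors(concept: str) -> Set[str]:
    """All transitive is-a ancestors of a concept in the toy hierarchy."""
    seen: Set[str] = set()
    stack = list(IS_A.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, ()))
    return seen


def is_notifiable(concept: str, assertion: str, site: str) -> bool:
    """Apply the notifiable cancer rules to one histological type candidate."""
    if assertion in ("absent", "possible"):
        return False  # filtered out; "possible" feeds the supporting report check instead
    lineage = {concept} | ancestors(concept)
    if site == "skin" and lineage & SKIN_EXCLUDED:
        return False  # BCC/SCC of the skin are not notifiable
    return bool(lineage & NOTIFIABLE_ROOTS)
```

A squamous cell carcinoma is therefore notifiable at a site such as the lung but not on the skin. A benign tumour is notifiable only if it falls under one of the roots (for example nervous system tumour morphology).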
Design – Notifiable report classification
1. Notifiable cancer classification
2. Supporting notifiable report classification
Follow-up pathology reports that include excisions resulting in no residual cancer, re-excisions and suspected notifiable cancers. Criteria:
- At least one histological type candidate was asserted as “possible”
- At least one histological type candidate was associated with “residual”
- The keyword “re-excision” was found in the report
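These criteria can be sketched as a single predicate. Combining them with OR is an assumption because the slide lists the criteria without stating how they combine. The `Candidate` fields and the regular expression for the “re-excision” keyword are also hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    histological_type: str
    assertion: str = "present"  # "present", "absent" or "possible"
    residual: bool = False      # linked to a "residual" mention (assumed detected upstream)


# Matches "re-excision", "reexcision" and "re excision", case-insensitively.
RE_EXCISION = re.compile(r"\bre-?\s?excision\b", re.IGNORECASE)


def is_supporting_notifiable(text: str, candidates: List[Candidate]) -> bool:
    """True if any of the three supporting-report criteria holds (combined with OR)."""
    suspected = any(c.assertion == "possible" for c in candidates)
    residual = any(c.residual for c in candidates)
    re_excision = RE_EXCISION.search(text) is not None
    return suspected or residual or re_excision
```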
Evaluation
Report class | Development Set | Test Set
Notifiable (cancer/supporting) | 237 (201/36) | 248 (220/28)
Non-Notifiable | 263 | 231
Total | 500 | 479
Results
[Chart: Development (N=500) vs Evaluation (N=479); y-axis 0.7 to 1]
Metric | Development (N=500) | Evaluation (N=479)
Sensitivity | 0.987 | 0.984
Specificity | 0.951 | 0.957
PPV | 0.947 | 0.961
F-measure | 0.967 | 0.972
PPV: positive predictive value
Error Analysis
Confusion matrices: frequency counts by assigned “System” label (columns) and actual “Ground Truth” label (rows)
Development Set
Ground Truth \ System | Notifiable | Not Notifiable
Notifiable | 234 | 3
Not Notifiable | 13 | 250
Test Set
Ground Truth \ System | Notifiable | Not Notifiable
Notifiable | 244 | 4
Not Notifiable | 10 | 221
Key: diagonal cells are correct classifications; Ground Truth Notifiable classified Not Notifiable are missed notifications (higher cost); Ground Truth Not Notifiable classified Notifiable are false positive notifications (lower cost)
Missed notifications: 2x supporting notifiable reports, 2x SCC/BCC of skin, 1x negation, 1x report substructure parsing
False positive notifications: 10x supporting notifiable reports, 3x SCC/BCC of skin, among others relating to the classification algorithm
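The reported metrics follow directly from the confusion matrices (development: 234 true positives, 3 missed, 13 false positives, 250 true negatives; test: 244, 4, 10, 221). The short Python check below takes the F-measure as the harmonic mean of PPV and sensitivity. That definition is an assumption, but it matches the reported values.

```python
from typing import Dict


def metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and F-measure from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)  # share of notifiable reports the system flagged
    specificity = tn / (tn + fp)  # share of non-notifiable reports left unflagged
    ppv = tp / (tp + fp)          # share of flagged reports that were notifiable
    f_measure = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {"sensitivity": sensitivity, "specificity": specificity, "ppv": ppv, "f_measure": f_measure}


dev = metrics(tp=234, fn=3, fp=13, tn=250)
test = metrics(tp=244, fn=4, fp=10, tn=221)
for name, m in (("Development", dev), ("Test", test)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Rounded to three decimals, both sets match the Results table. For example, the development set gives sensitivity 0.987, specificity 0.951, PPV 0.947 and F-measure 0.967.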
Summary
- Automatic tool for identifying cancer cases
- Medical free-text processing can achieve reliable classification of cancer notifiable pathology reports (test set: sensitivity 0.984, specificity 0.957), combining:
  - Queensland Cancer Registry business rules
  - Natural language processing
  - Semantic reasoning using SNOMED CT
Potential use by Cancer Registries and pathology laboratories
Thank you
The Australian e-Health Research Centre
Dr Anthony Nguyen, Project Leader
t +61 7 3253 3637
e [email protected]
w aehrc.com