This Is How We Created Gold Standard Data for Developing the NLP Pipelines

To create gold standard data for developing the NLP pipelines, we selected 300 notes from 300 unique patients at MSHS and 225 notes from 221 unique patients at WCM for fine- and coarse-grained manual annotation.


Table of Links

Abstract and 1. Introduction
2 Data
2.1 Data Sources
2.2 SS and SI Categories
3 Methods
3.1 Lexicon Creation and Expansion
3.2 Annotations
3.3 System Description
4 Results
4.1 Demographics and 4.2 System Performance
5 Discussion
5.1 Limitations
6 Conclusion, Reproducibility, Funding, Acknowledgments, Author Contributions, and References
SUPPLEMENTARY
Guidelines for Annotating Social Support and Social Isolation in Clinical Notes
Other Supervised Models

3.2 Annotations

To create gold standard data for developing the NLP pipelines, we selected 300 notes from 300 unique patients at MSHS and 225 notes from 221 unique patients at WCM for fine- and coarse-grained manual annotation. Notes were chosen from unique patients to maximize the contextual diversity of SS/SI terms (different note-writers, different time periods, and avoiding redundancy caused by copy-forward practices within a single patient’s EHR).

To optimize the gold standard annotation set, the notes selected for review were enriched for mentions of SS and SI: 75 notes were selected that had at least one occurrence of an SI lexicon term, another 75 that had at least one occurrence of an SS lexicon term, and a final 75 were randomly selected from the remainder of the underlying corpus. At MSHS, 75 additional notes were selected that contained a clinical note template, to further enrich the annotation corpus for notes in which a clinician was prompted (by the template) to assess SS/SI.

The Brat Rapid Annotation Tool (BRAT) [40] was used to annotate the notes manually, with the same annotation configuration schema across sites.
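As a rough illustration of the enriched note selection described above, the following sketch draws one stratum of notes with an SI lexicon term, one with an SS lexicon term, and one at random from the remainder. It is not the authors' code: the function name `select_notes`, the substring matching, the choice to keep the strata disjoint, and the toy lexicon terms are all assumptions made for the example.

```python
import random

def select_notes(notes, si_terms, ss_terms, per_stratum=75, seed=0):
    """Pick enriched strata of notes.

    notes: dict mapping patient_id -> note text (one note per patient).
    si_terms / ss_terms: lowercase lexicon terms (hypothetical here).
    """
    rng = random.Random(seed)

    def mentions(text, terms):
        # Assumption: a simple case-insensitive substring match.
        lowered = text.lower()
        return any(term in lowered for term in terms)

    def draw(candidates):
        # Sort first so the draw is reproducible for a given seed.
        pool = sorted(candidates)
        return set(rng.sample(pool, min(per_stratum, len(pool))))

    selected_si = draw(p for p, t in notes.items() if mentions(t, si_terms))
    remaining = set(notes) - selected_si
    selected_ss = draw(p for p in remaining if mentions(notes[p], ss_terms))
    remaining -= selected_ss
    # The last stratum is sampled at random from whatever is left.
    selected_random = draw(remaining)
    return selected_si, selected_ss, selected_random

# Toy corpus with made-up notes and lexicon terms.
toy_notes = {
    "p1": "Patient reports feeling lonely since the move.",
    "p2": "Sister drives her to appointments.",
    "p3": "Follow-up for hypertension.",
}
si, ss, rnd = select_notes(toy_notes, ["lonely"], ["drives her"], per_stratum=1)
print(si, ss, rnd)
```

With `per_stratum=75` this mirrors the three strata of 75 notes each; the additional template-based stratum at MSHS would need a fourth filter that is not sketched here.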

The annotation guideline‡ and lexicons are provided in Supplementary Tables S3. Initially, the annotations were performed at the entity level (every instance of a lexicon term in the note text) using BRAT. For evaluation, the entity-level annotations were converted to “document” (note) level.

For example, if there was a single entity mentioning loneliness and two mentions of instrumental support in a given note, the loneliness and instrumental support subcategories were assigned to that note. Finally, the coarse-grained categories were assigned to each document using rules. SS was assigned to a document if there were one or more mentions of any SS subcategories and similarly, SI was labeled if there were one or more mentions of any SI subcategories.
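The entity-to-document conversion and the coarse-grained rule can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name `to_document_level`, the tuple input format, and the `COARSE_OF` mapping (which lists only a few subcategories) are assumptions for the example.

```python
from collections import defaultdict

# Illustrative mapping from fine-grained subcategory to coarse-grained
# category. Only a few subcategories are listed; the full schema is assumed.
COARSE_OF = {
    "loneliness": "SI",
    "instrumental_support": "SS",
    "emotional_support": "SS",
}

def to_document_level(entities):
    """Collapse entity-level annotations to note-level labels.

    entities: iterable of (note_id, subcategory), one per annotated mention.
    Returns (fine, coarse): dicts mapping note_id -> set of labels.
    """
    fine = defaultdict(set)
    for note_id, subcategory in entities:
        # Repeated mentions of the same subcategory collapse to one label.
        fine[note_id].add(subcategory)
    # A coarse category applies if one or more of its subcategories is present.
    coarse = {
        note_id: {COARSE_OF[s] for s in subs if s in COARSE_OF}
        for note_id, subs in fine.items()
    }
    return dict(fine), coarse

# The example note: one loneliness mention, two instrumental support mentions.
entities = [
    ("note_1", "loneliness"),
    ("note_1", "instrumental_support"),
    ("note_1", "instrumental_support"),
]
fine, coarse = to_document_level(entities)
print(sorted(fine["note_1"]))    # ['instrumental_support', 'loneliness']
print(sorted(coarse["note_1"]))  # ['SI', 'SS']
```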

The above note would be annotated with both SI (for loneliness) and SS (for instrumental support).

The notes were meticulously reviewed by two annotators, and disagreements were resolved by a third adjudicator to create the final gold-standard corpus. For coarse-grained annotation, the inter-annotator agreement (IAA) Cohen’s Kappa scores were 0.92 [MSHS] and 0.86 [WCM]; for fine-grained, 0.77 [MSHS] and 0.81 [WCM]. The counts of fine- and coarse-grained categories found in the gold-standard data are provided in Supplemental Table S5.

The rule book was used to train the annotators and was continually updated during the adjudication process.
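For readers unfamiliar with the agreement statistic reported above, Cohen's kappa compares the observed agreement between two annotators with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for the simplified case of one label per note. How the study handled notes carrying several labels is not stated here, so the single-label setup and the toy labels are assumptions.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who each assign one label per item."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used the same single label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)

# Toy labels for four notes (made up for illustration).
annotator_1 = ["SS", "SS", "SI", "none"]
annotator_2 = ["SS", "SI", "SI", "none"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3))  # 0.636
```

In the toy data the annotators agree on 3 of 4 notes (p_o = 0.75) while chance agreement is 5/16, which gives a kappa of 7/11.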

Often, disagreeing annotations could both be seen as correct given the inherent subjectivity of the classification process; however, new rules were created to arrive at one consistent label for edge cases. Sometimes, rules were created for more practical reasons; for example, mentions of ‘psychotherapy’ were excluded from emotional support because otherwise almost every note in the MSHS psychiatric corpus would be flagged. Of note, mentions were only labelled when SS/SI was explicit and not implied.

For example, a mention of ‘boyfriend’ or ‘living alone’ without further context would not count. The general subcategory became a “catch-all” for mentions that clearly involved support or isolation but for which no single fine-grained category could be discerned. For example, ‘staying on his best friend’s couch’ could be seen as instrumental support (providing shelter), social network (having friends), or emotional support (best friend implies a level of closeness).

At both institutions, the IAA was reflective of the subjective, overlapping nature of the fine-grained subcategories. Another reason for disagreements between annotators was the site-specific familiarity required to recognize acronyms and social services, e.g., ‘HASA’, which stands for the HIV/AIDS Services Administration.

:::info
This paper is available on arXiv under the CC BY 4.0 DEED license.
:::

‡ Rule book and annotation guideline are used interchangeably.

:::info
Authors:

(1) Braja Gopal Patra, Weill Cornell Medicine, New York, NY, USA and co-first authors;

(2) Lauren A. Lepow, Icahn School of Medicine at Mount Sinai, New York, NY, USA and co-first authors;

(3) Praneet Kasi Reddy Jagadeesh Kumar, Weill Cornell Medicine, New York, NY, USA;

(4) Veer Vekaria, Weill Cornell Medicine, New York, NY, USA;

(5) Mohit Manoj Sharma, Weill Cornell Medicine, New York, NY, USA;

(6) Prakash Adekkanattu, Weill Cornell Medicine, New York, NY, USA;

(7) Brian Fennessy, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(8) Gavin Hynes, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(9) Isotta Landi, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(10) Jorge A. Sanchez-Ruiz, Mayo Clinic, Rochester, MN, USA;

(11) Euijung Ryu, Mayo Clinic, Rochester, MN, USA;

(12) Joanna M. Biernacka, Mayo Clinic, Rochester, MN, USA;

(13) Girish N. Nadkarni, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(14) Ardesheer Talati, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA and New York State Psychiatric Institute, New York, NY, USA;

(15) Myrna Weissman, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA and New York State Psychiatric Institute, New York, NY, USA;

(16) Mark Olfson, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA, New York State Psychiatric Institute, New York, NY, USA, and Columbia University Irving Medical Center, New York, NY, USA;

(17) J. John Mann, Columbia University Irving Medical Center, New York, NY, USA;

(18) Alexander W. Charney, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(19) Jyotishman Pathak, Weill Cornell Medicine, New York, NY, USA.
:::