Consumer Data (From Dina)
I. Introduction
This page describes the consumer data used in the baseline dictionary. The data set contains four files:
- umls_anatomy_merged.txt
- umls_interventions_merged.txt
- umls_population_merged.txt
- umls_problem_merged.txt
II. Algorithm
The above 4 files are generated from UMLS (2013AB?) by the following steps:
- Retrieve English strings from UMLS, filtered by semantic types
  - ST list (abb): selected semantic types, as abbreviations
  - SRDEF: converts ST abbreviations to TUIs
  - MRSTY.RRF: CUI|TUI, used as the filter
  - MRCONSO.RRF: Terms|CUI, used to retrieve terms
- Convert to lower case
- Add some terms from Gopher, problem list, Susan's data, etc.
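The retrieval and lower-casing steps above can be sketched as follows. This is a minimal illustration, not the actual RunPreProc code: the function names are hypothetical, and the field positions assume the standard pipe-delimited UMLS layouts (SRDEF: record type in field 0, TUI in field 1, abbreviation in field 8; MRSTY.RRF: CUI in field 0, TUI in field 1; MRCONSO.RRF: CUI in field 0, language in field 1, string in field 14). The final step of adding terms from Gopher and the other sources is not shown.

```python
def load_tuis(srdef_lines, wanted_abbrs):
    """Convert the selected ST abbreviations to TUIs using SRDEF rows."""
    tuis = set()
    for line in srdef_lines:
        fields = line.rstrip("\n").split("|")
        # Assumed layout: field 0 = record type, field 1 = TUI, field 8 = abbreviation.
        if len(fields) > 8 and fields[0] == "STY" and fields[8] in wanted_abbrs:
            tuis.add(fields[1])
    return tuis


def load_cuis(mrsty_lines, tuis):
    """Collect the CUIs whose semantic type (TUI) is in the selected set."""
    cuis = set()
    for line in mrsty_lines:
        fields = line.rstrip("\n").split("|")
        # Assumed layout: field 0 = CUI, field 1 = TUI.
        if len(fields) > 1 and fields[1] in tuis:
            cuis.add(fields[0])
    return cuis


def english_terms(mrconso_lines, cuis):
    """Return lower-cased (term, CUI) pairs for English strings of the kept CUIs."""
    terms = set()
    for line in mrconso_lines:
        fields = line.rstrip("\n").split("|")
        # Assumed layout: field 0 = CUI, field 1 = language, field 14 = string.
        if len(fields) > 14 and fields[1] == "ENG" and fields[0] in cuis:
            terms.add((fields[14].lower(), fields[0]))
    return terms
```

Each function takes any iterable of lines, so an open file handle for SRDEF, MRSTY.RRF, or MRCONSO.RRF can be passed in directly. Filtering through sets keeps each pass over the large RRF files to a single scan.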
III. Analysis
| File Name | Semantic Types | Terms | Not UMLS (No CUI) |
|---|---|---|---|
| umls_anatomy_merged.txt | 9 | 295,932 | 0 |
| umls_interventions_merged.txt | 65 | 528,668 | expo: 5,457 |
| umls_population_merged.txt | 4 | 5,898 | 0 |
| umls_problem_merged.txt | 68 | 644,839 | prob: 1,643, (from Gopher Terms) |
| Total Terms | 147 | 1,475,204 | all.txt.1 |
| Total Unique Terms | 97 | 1,469,339 | all.txt.1.uSort |
| Total Tokens | N/A | 299,669 | medDic.data |
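Totals of this kind could be recomputed with a short script such as the sketch below. It rests on assumptions: the term is taken to be the first pipe-delimited field of each line, and tokens are taken to be the distinct whitespace-separated words of the unique terms. The page does not state how all.txt.1.uSort or medDic.data were actually produced, and the function name `summarize` is hypothetical.

```python
def summarize(term_lines):
    """Return (total terms, unique terms, unique tokens) for term|... lines."""
    # Assumption: the term is the first pipe-delimited field of each line.
    terms = [line.split("|", 1)[0].strip() for line in term_lines if line.strip()]
    unique_terms = set(terms)
    # Assumption: a token is a whitespace-separated word of a unique term.
    tokens = {token for term in unique_terms for token in term.split()}
    return len(terms), len(unique_terms), len(tokens)
```

To cover all four files, their lines can be chained into one call, for example with `itertools.chain`.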
IV. Others
- Program: ${PRE_PROCESS}/bin/RunPreProc
- Data: ${PRE_PROCESS}/data/Baseline/inData
- Data: ${PRE_PROCESS}/data/Baseline/outData
- If the data are generated from the 2013AA UMLS, three ST abbreviations are not in the 2013AA SRDEF (they existed before 2009AB):
| ST abb | Source File (term no) |
|---|---|
| alga | umls_problem_merged.txt (1) |
| invt rich | umls_problem_merged.txt (3) |
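A check of this kind, finding ST abbreviations in a merged file that are absent from a given SRDEF release, might look like the sketch below. It assumes each line has the form term|CUI|name|ST abbreviation, as in the sample records on this page. The function name `unknown_st_abbrs` is hypothetical.

```python
from collections import Counter


def unknown_st_abbrs(term_lines, known_abbrs):
    """Count terms per ST abbreviation that is not in known_abbrs."""
    counts = Counter()
    for line in term_lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 4:
            continue  # skip lines that do not match the assumed layout
        abbr = fields[3]  # assumed position of the ST abbreviation
        if abbr not in known_abbrs:
            counts[abbr] += 1
    return counts
```

Fields are compared verbatim, so an abbreviation with stray whitespace is reported as unknown too. Non-UMLS abbreviations such as expo and prob would also be reported unless they are added to `known_abbrs`.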
V. Other Resources
The following other resources are merged into the above 4 files:
- PICO Interventions list
- PICO Framework
- Paper (2007 Ned Tijdschr Tandheelkd): The PICO (Patient-Intervention-Comparison-Outcome) question
- Paper (2006 AMIA): Evaluation of PICO as a knowledge representation for clinical questions.
- UMLS problem list (Kin-Wah Fung)
- Analysis:
Two of the files above, interventions.txt (PICO) and umls_problem_list.txt (UMLS), are used as sources for the interventions and problem lists:

| File Name | Semantic Types | Terms | Not UMLS (No CUI) |
|---|---|---|---|
| interventions.txt (PICO) | 76 | 30,492 | expo: 6,344 |
| umls_problem_list.txt (UMLS) | 71 | 254,420 | prob: 1,792 |

- The semantic type abbreviations used for terms with no CUI are the same:
  - [expo] in intervention
  - [prob] in problem
- The same typo, [fndg ], is found in both umls_problem_list.txt and umls_problem_merged.txt:
  liver is small|C0577047|small liver|fndg
  spleen is enlarged|C0038002|Splenomegaly|fndg
