Monday, July 18, 2011

It Seems SNOMED CT Has A Few Issues That Need to Be Addressed. Right Now It Is Apparently Broken And Needs to Be Fixed!

The following paper was formally released a few weeks ago.

J Am Med Inform Assoc. 2011 July; 18(4): 432–440.

Published online 2011 April 21. doi: 10.1136/amiajnl-2010-000045 PMCID: PMC3128394

Getting the foot out of the pelvis: modeling problems affecting use of SNOMED CT hierarchies in practical applications

Alan L Rector,1 Sam Brandt,2 and Thomas Schneider1

1School of Computer Science, University of Manchester, Manchester, UK

2Siemens Health Services, Malvern, Pennsylvania, USA

Correspondence to Alan L Rector, School of Computer Science, University of Manchester, Manchester M13 9PL, UK

Received December 13, 2010; Accepted December 30, 2010.



Objectives

(a) To determine the extent and range of errors and issues in the Systematised Nomenclature of Medicine – Clinical Terms (SNOMED CT) hierarchies as they affect two practical projects. (b) To determine the origin of issues raised and propose methods to address them.


Methods

The hierarchies for concepts in the Core Problem List Subset published by the Unified Medical Language System were examined for their appropriateness in two applications. Anomalies were traced to their source to determine whether they were simple local errors, systematic inferences propagated by SNOMED's classification process, or the result of problems with SNOMED's schemas. Conclusions were confirmed by showing that altering the root cause and reclassifying had the intended effects, and not others.

Main results

Major problems were encountered, involving concepts central to medicine including myocardial infarction, diabetes, and hypertension. Most of the issues raised were systematic. Some exposed fundamental errors in SNOMED's schemas, particularly with regards to anatomy. In many cases, the root cause could only be identified and corrected with the aid of a classifier.


Limitations

This is a preliminary ‘experiment of opportunity.’ The results are not exhaustive; nor is consensus on all points definitive.


Conclusions

The SNOMED CT hierarchies cannot be relied upon in their present state in our applications. However, systematic quality assurance and correction are possible and practical but require sound techniques analogous to software engineering and combined lexical and semantic techniques. Until this is done, anyone using SNOMED codes should exercise caution. Errors in the hierarchies, or attempts to compensate for them, are likely to compromise interoperability and meaningful use.

Keywords: Knowledge bases, knowledge representations, methods for integration of information from disparate sources, knowledge acquisition and knowledge management, developing and refining EHR data standards (including image standards), data models, data exchange, controlled terminologies and vocabularies, communication, integration across care settings (inter- and intraenterprise), ontologies, terminology, EHRs

The full free text is available here:

This paper needs to be carefully considered, as its lead author, Prof. Alan Rector, is an internationally recognised authority in the area of clinical terminology deployment.

Here is a link to his home page:

The range of issues and problems identified included the following:

  • Errors and omissions with propagation and helter-skelter modelling
  • Incomplete modeling: myocardial infarction and ischemic heart disease
  • Issues with sites of systemic disorders
  • Errors in modeling anatomy: Structure-Entire-Part (SEP) triples and the ankle in the abdomen
  • Overgeneralized concepts with underspecified ‘fully specified names’
  • Lack of distinction between structure and function
  • Inconsistent modeling of complications: hypertensive disorders

Detailed examples of each of these are found in the text.
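To make the "propagation" point concrete, here is a minimal Python sketch of how a single wrong is-a link drags everything beneath it into the wrong place. The concept names and links are a made-up toy hierarchy chosen to echo the paper's title. They are an assumption for illustration, not the actual SNOMED CT modelling, and the helper names `ancestors` and `misplaced` are hypothetical.

```python
def ancestors(concept, parents):
    """Return every concept reachable from `concept` by following is-a links."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def misplaced(parents, wrong_ancestor):
    """List the concepts that end up underneath `wrong_ancestor`."""
    return sorted(c for c in parents if wrong_ancestor in ancestors(c, parents))


# Hypothetical toy hierarchy (child -> stated parents); NOT real SNOMED CT content.
PARENTS = {
    "artery of foot": {"artery of lower limb"},
    "artery of ankle": {"artery of lower limb"},
    "artery of lower limb": {"branch of iliac artery"},
    # Assumed root error: a branch is modelled as a *kind of* pelvic structure.
    "branch of iliac artery": {"structure of pelvis"},
    "structure of pelvis": {"body structure"},
}

# Repair only the root link, leaving every other definition untouched.
FIXED = dict(PARENTS)
FIXED["branch of iliac artery"] = {"arterial structure"}

print(misplaced(PARENTS, "structure of pelvis"))
print(misplaced(FIXED, "structure of pelvis"))
```

In the toy data the first call lists all four arterial concepts as sitting under the pelvis, while the second returns an empty list: one change at the root repairs every downstream concept. A real check would of course run against the hierarchy inferred by a description logic classifier rather than a hand-written dictionary.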

The full text of the conclusions is as follows:

“This study has five classes of outcome:

  • On the SNOMED hierarchies. There are sufficient anomalies in the hierarchies that they cannot be used without significant modification in our applications. More generally, we question whether clinicians entering codes or researchers retrieving information understand their implications. As postcoordination relies on accurate classification, it is doubtful that applications using postcoordination will behave predictably.
  • On the use of description logic in SNOMED. Using a description logic is both part of the problem and part of the solution. The response to the issues raised here is not to abandon SNOMED's description logic but to use it more effectively. Using a description logic means that correcting root errors found in modules will usually repair analogous problems throughout SNOMED.
  • On the possibility of quality assurance of SNOMED. Given modern tooling and computer power, the barriers to quality assurance of SNOMED can now be overcome, although no well-integrated toolset is yet available.
  • On practicality of quality assurance of SNOMED. This was a preliminary study and not exhaustive, but it required less than three person-months using poorly integrated tools. Given an integrated toolset, we estimate that a thorough quality assurance of the Core Problem List Subset would require a small team under 2 years, probably less. This would cover a high fraction of all uses of SNOMED. Most changes would be propagated automatically by the description logic into the full SNOMED corpus. Applying these methods to the remainder of the SNOMED findings would require further resources, but they would be minor by comparison with the effort already devoted to SNOMED's development, let alone to those that will be required for its implementations.
  • On methods required. Using a description logic requires staff who understand both medical content and description logics. It requires adapting the techniques of software engineering to tracing and managing errors. Space does not permit setting out a detailed methodology. However, key maxims should include:
    • Start from clinically important concepts—use clinical intuition.
    • Focus on the classified hierarchies—reclassify after every change.
    • Work in small modules—so that reclassification is quick.
    • Look upwards first and then downwards—there are fewer ancestors than descendants.
    • Trace all errors to their root cause—avoid local ‘kluging.’
    • Look for analogous errors and repair using consistent patterns—for example, complications and sites.
    • Reformulate problematic sections systematically rather than attempting to repair them—for example, head injury and branches in anatomy.
    • Use a combination of lexical and semantic methods—as first suggested by Campbell et al19 and now made straightforward using Ontology Patterns Preprocessing Language (OPPL).20
    • Test systematically—maintain a suite of ‘unit tests’ covering all issues identified; include tests for unintended consequences of changes; run test suite after every major set of changes and before each release.

Some might argue that many of the erroneous classifications reported here are several steps removed from the original concept in the hierarchies and would be ignored by clinicians. However, the semantics of the description logic underpinning SNOMED is unambiguous. Software and queries must follow them literally. Likewise, the reliability of postcoordination is a function of the reliability of the classifier, which is best determined by its manifestation in the hierarchies.

Until comprehensive quality assurance has been undertaken, anyone using, or mandating, SNOMED should be aware that the hierarchies contain serious anomalies. Should a ‘Reference terminology’ classify diabetes as a disease of the abdomen; fail to classify myocardial infarction as ischemic heart disease; place the arteries of the foot in the abdomen?

Without further quality assurance, clinicians may not realize the implications of what they are saying; researchers may not realize what their queries should retrieve, and postcoordination cannot be expected to be reliable. Interoperability, and therefore meaningful use, will be limited.”
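The "unit test" maxim in the conclusions quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two hierarchies are hand-written toys that mimic two of the anomalies named in the quote, the parent links are invented, and `ancestors` and `run_suite` are hypothetical helpers rather than part of any SNOMED tooling.

```python
def ancestors(concept, parents):
    """Return every concept reachable from `concept` by following is-a links."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def run_suite(parents, must_hold, must_not_hold):
    """Check required and forbidden subsumptions; return failure messages."""
    failures = []
    for child, ancestor in must_hold:
        if ancestor not in ancestors(child, parents):
            failures.append(f"missing: {child} is-a {ancestor}")
    for child, ancestor in must_not_hold:
        if ancestor in ancestors(child, parents):
            failures.append(f"unwanted: {child} is-a {ancestor}")
    return failures


# Expectations drawn from the anomalies named in the quoted conclusions.
MUST_HOLD = [("myocardial infarction", "ischemic heart disease")]
MUST_NOT_HOLD = [("diabetes mellitus", "disorder of abdomen")]

# Hypothetical toy hierarchies (child -> parents); NOT real SNOMED CT content.
BROKEN = {
    "diabetes mellitus": {"disorder of pancreas"},
    "disorder of pancreas": {"disorder of abdomen"},
    "myocardial infarction": {"disorder of heart"},
    "ischemic heart disease": {"disorder of heart"},
}
REPAIRED = {
    "diabetes mellitus": {"disorder of endocrine system"},
    "disorder of pancreas": {"disorder of abdomen"},
    "myocardial infarction": {"ischemic heart disease"},
    "ischemic heart disease": {"disorder of heart"},
}

for name, hierarchy in (("broken", BROKEN), ("repaired", REPAIRED)):
    print(name, run_suite(hierarchy, MUST_HOLD, MUST_NOT_HOLD))
```

The forbidden list is what catches "unintended consequences": a repair that fixes one concept but pushes another under the wrong ancestor shows up as a new failure the next time the suite runs after reclassification.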

I suggest anyone who is interested in the area read the whole paper carefully and then e-mail NEHTA asking them just when the work recommended here will be undertaken and finalised. A decision to deploy SNOMED CT was made by NEHTA about 4 years ago, and the very limited use so far also suggests there are some significant implementation problems.

It seems that while SNOMED is the best available choice for a clinical terminology, there is real work to be done to make it fully ‘fit for purpose’. Right now it seems it isn’t. It is especially worrying that there seem to be some clear patient safety issues.

Again we seem to be seeing that NEHTA has over-promised and under-delivered. They need to get weaving and push for the changes Prof. Rector is suggesting with IHTSDO - the international maintainers of SNOMED.



Anonymous said...

1) It is well known and documented that inconsistencies exist in SNOMED CT

2) Many of these are formally documented as work items in the issue tracker - e.g., for Ischemic heart disease and myocardial infarction

3) Some fixes are simple, others are not, and unless you are expert in the modelling of SNOMED CT, the constraints on fixes may be non-obvious (note this expertise is *not* necessary to use SNOMED CT)

4) These kinds of problems are not specific to SNOMED CT and plague many, if not all, non-trivial terminologies, whether they belong to a standard or are ad hoc or proprietary. It is the description logic semantics of SNOMED CT that makes these problems explicit, apparent, and detectable. This is a good thing - when the semantics are implicit or buried in software or ad hoc queries the problems are more likely to remain hidden and unknown.

Dr David More MB, PhD, FACHI said...

I think that was rather what Prof Rector was saying, as was I: there are some issues that need fixing. Can I also suggest that knowing about issues is rather different to fixing them? One wonders, if they are known about, why are they not fixed?


Anonymous said...

Issues don't get fixed because (a) there is more than one way of 'fixing' things and SNOMED is a consensus standard; the community of interest has to agree and endorse the method of correction and the outcome (so we're herding cats); and (b) there are probably ~10 people in the world with enough expertise in SNOMED, description logic and modelling, as well as clinical knowledge (not to mention tools), who could 'fix' it properly (regardless of agreed methods and approaches). These people are already grossly over-employed. It is also worth noting that SNOMED is an international standard, with licensing terms and conditions and constraints on changing the content (for better or worse). Therefore AU operatives (NeHTA or otherwise) cannot willy-nilly decide to fix it for ourselves.

Dr David More MB, PhD, FACHI said...

I didn't say NEHTA had to fix it. I said they should push to have it properly fixed. There are a lot of issues if you read the paper.


Anonymous said...

What makes you think they are not doing this?

Dr David More MB PhD FACHI said...

Evidence for this is?