
Acf302 - HL7


HL7 v3.0 Data Types Specification - Version 0.9
DRAFT version 1.0, 22 Mar 1999

Table of Contents

Abstract
1 Introduction
  1.1 Goals
  1.2 Methods
    1.2.1 Analysis of Semantic Fields
    1.2.2 Form of Data Type Definitions
    1.2.3 Generalized Types
    1.2.4 Generic Types
    1.2.5 Collections
    1.2.6 The Meta Model
    1.2.7 Implicit Type Conversion
    1.2.8 Literals
    1.2.9 Instance Notation
    1.2.10 Typus typorum: Boolean
    1.2.11 Incomplete Information
    1.2.12 Update Semantics
2 Text
  2.1 Introduction
    2.1.1 From Characters to Strings
    2.1.2 Display Properties
    2.1.3 Encoding of appearance
    2.1.4 From appearance of text to multimedial information
    2.1.5 Pulling the pieces together
  2.2 Character String
    2.2.1 The Unicode
    2.2.2 No Escape Sequences
    2.2.3 ITS Responsibilities
    2.2.4 HL7 Applications are "Black Boxes"
    2.2.5 No Penalty for Legacy Systems
    2.2.6 Unicode and XML
  2.3 Free Text
    2.3.1 Multimedia Enabled Free Text
    2.3.2 Binary Data
    2.3.3 Outstanding Issues
3 Things, Concepts, and Qualities
  3.1 Overview of the Problem Space
    3.1.1 Concept vs. Instance
    3.1.2 Real World vs. Artificial Technical World
    3.1.3 Segmentation of the Semantic Field
  3.2 Technical Instances
    3.2.1 Technical Instance Identifier
    3.2.2 ISO Object Identifiers
    3.2.3 Technical Instance Locator
    3.2.4 Outstanding Issues
  3.3 Real World Instances
    3.3.1 Real World Instance Identifier
    3.3.2 Postal and Residential Address
      Examples
    3.3.3 Person Name
    3.3.4 Organization Name
  3.4 Technical Concepts and the Code Value
    3.4.1 Outstanding Issues
  3.5 Real World Concepts
    3.5.1 The Concept Descriptor
    3.5.2 Code Translation
    3.5.3 Code Phrase
    3.5.4 Examples
    3.5.5 Outstanding Issues
4 Quantities
  4.1 Overview
  4.2 Integer Number
  4.3 Floating Point Number
  4.4 Ratio
  4.5 Measurements
    4.5.1 Physical Quantities
    4.5.2 Monetary Quantities: Currencies
    4.5.3 Things as Pseudo Units
  4.6 Time
    4.6.1 Point in Time
    4.6.2 Time Durations
    4.6.3 Other issues and curiosities about Time
    4.6.4 Calendar Modulus Expressions
5 Orthogonal Issues
  5.1 Interval
  5.2 General Annotations
  5.3 The Historical Dimension
    5.3.1 Generic Data Type for Information History
    5.3.2 Generic Data Type "History Item"
  5.4 Uncertainty of Information
    5.4.1 Uncertain Discrete Values
    5.4.2 Non-Parametric Probability Distribution
    5.4.3 Parametric Probability Distribution
    5.4.4 Uncertain Value using Narrative Expressions of Confidence
Appendix A: All Data Types At a Glance




HL7 v3.0 Data Types Specification
Version 0.9
Gunther Schadow
Regenstrief Institute for Health Care

Abstract

This document proposes a completely redesigned set of data types for use by HL7. In version 2.x, data types were treated as "formats" of the character strings that appear in HL7 data fields. This proposal takes a more fundamental position: data types are the constituents of all meaning that can ever be communicated in messages. In HL7 v2.x, data types were defined a posteriori, on an as-needed basis. This redesign, by contrast, defines data types a priori, by searching for fundamental semantic units in the space of all possible data types. The redesign work draws heavily on experience with HL7 v2.x.

Data types are defined for:
(1) character strings and multimedia-enabled free text;
(2) codes and identifiers for concepts and instances, both in the real world and among technical artifacts;
(3) all kinds of quantities, including integer and floating point numbers, physical measurements with units, and various kinds of time.
Data types are classified (generalized) in various ways according to certain properties of interest.

Several issues apply equally to many, if not all, data types: intervals (of ordered types), uncertain information, incomplete information, update semantics, historic information, and general annotations. These are defined as generic data types that can enhance the meaning of any other type. Although the type system is precisely defined, it offers a degree of flexibility not found in many other type systems. Precise conversions are defined between types, so that data of one type can be used in place of another wherever a conversion exists. As a special case, character string literals are defined for most types, which allows an instance of a composite type to be sent as one compact character string.

Copyright © 1999, Regenstrief Institute for Health Care. All rights reserved.
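The abstract mentions character string literals that carry a composite instance as one compact string. Below is a minimal sketch of such a round trip. It is only an illustration under stated assumptions: `PhysicalQuantity` is a hypothetical composite type with a numeric value and a unit, and the literal form "<value> <unit>" is an assumed syntax. Neither is taken from this specification; the actual literal forms are the subject of section 1.2.8 (Literals).

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PhysicalQuantity:
    """Hypothetical composite type: a numeric value plus a unit string."""
    value: Decimal  # Decimal keeps the digits exactly as the sender wrote them
    unit: str

    def to_literal(self) -> str:
        # Assumed literal form (not from the specification): "<value> <unit>".
        return f"{self.value} {self.unit}"

    @classmethod
    def from_literal(cls, literal: str) -> "PhysicalQuantity":
        # Split at the first whitespace: the text before it is the value,
        # everything after it is the unit.
        value_text, unit = literal.strip().split(maxsplit=1)
        return cls(Decimal(value_text), unit)


dose = PhysicalQuantity(Decimal("12.5"), "mg")
literal = dose.to_literal()
assert literal == "12.5 mg"
assert PhysicalQuantity.from_literal(literal) == dose
```

The sketch uses `Decimal` rather than `float` so that the literal reproduces the sender's digits, for example "12.50" remains "12.50".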


1 Introduction

This document proposes a redesigned system of data types for HL7 version 3. It is the result of a task force spawned off Control Query at the San Diego meeting in September 1998. Since then, the group has met in weekly phone conferences chaired by Gunther Schadow. The following people (in alphabetical order) contributed to this endeavor: James Case (University of California, Davis), Norman Daoust (Health Partners), Laticia Fitzpatrick (Kaiser Permanente), Mike Henderson (Kaiser Permanente), Stan Huff (Intermountain Health Care), Matt Huges, Irma Jongeneel (HL7 The Netherlands), Anthony Julian (Mayo), Joann Larson (Kaiser Permanente), Randy Marbach (Kaiser Permanente), John Molina (SMS), Richard Ohlmann (HBO & Company), Larry Reis (Wizdom Systems), Dawid Rowed (HL7 Australia), Carlos Sanroman, Mark Shafarman (Oacis Healthcare Systems), Greg Thomas (Kaiser Permanente), Mark Tucker (Regenstrief Institute), Klaus Veil (Macquarie Health Corp., HL7 Australia), David Webber, and Robin Zimmerman (Kaiser Permanente).

The task force planned to conclude its work by January 1999. The members' commitment produced tremendous progress, but the work was not complete by that date. By January (the Orlando meeting) we were about 80% finished. By April 1999 (Toronto), we have about 90% of the work done. As usual, the last parts of a project consume the most time and energy. However, all data types are now defined, and the remaining work is to polish and refine them.

This report is divided into two major parts.
(1) The remainder of this introductory section explains the concepts and ideas that govern this proposed system of data types. (2) Sections 2 through 5 define the data types in detail.

This document was compiled from the notes of the twenty-four (???) conferences. The conference notes were issued in Hypertext (HTML) and are publicly available for browsing (http://aurora.rg.iupui.edu/v3dt). In the notes I relied heavily on the main advantage of the hypertext medium: how easily one can follow cross references. General concepts and detailed definitions ended up mixed together, in the order they came up in the conferences. Hyperlinks have been an invaluable tool for recalling definitions and explanations from earlier notes and for showing how ideas evolved over time.

This report is also written as Hypertext, but it is delivered to the general HL7 working group as a paper document. That required bringing the material into a systematic order. However, dividing the report into a first part that explains the overall concepts and a second part that defines the data types in detail is problematic. The usefulness of the general concepts is illustrated only by how they are actually used in the data type definitions, while those definitions in turn depend on the general rules. The reader thus faces a kind of "hermeneutic circle": one must know the first part to fully comprehend the second, and vice versa. The Hypertext version of this report contains numerous forward and backward links. In the printed form, these appear as page-number cross references in square brackets.

This ordering of the material suits the "impatient reader", who can explore everything just by following cross references. Readers who want to see only some actual type definitions can use the index [p. 171] and proceed directly to the types they are interested in. Readers who want to work through all the data type definitions can proceed directly to the second part [p. 36] and, if necessary, follow links back to the explanations of general concepts. Readers who want to read the whole text from the beginning can start with the general concepts and will be guided forward to the points where each concept is actually used.

A final word of acknowledgment. Many of the ideas reported here were born in numerous intense discussions that Mark Tucker and I had before and after the conference calls. Without Mark Tucker, this type system work would never have evolved to a useful state. I also want to acknowledge Mark Shafarman, whose support was (and continues to be) vital for linking our ideas back to the HL7 organization we wanted to serve. Without him, our ideas might never have touched ground. Last, but most of all, I want to acknowledge Clem McDonald, who keeps Mark Tucker and myself going by providing us with the "fuel" and the time to engage in HL7 work.

1.1 Goals

The overall goal of this redesign project has been to rationalize and simplify the HL7 data type system. The project is inspired by the larger "version 3" redesign effort that is guiding HL7 into a competitive future. It starts from an observation: the number and complexity of HL7 v2.x data types grew almost exponentially over the first 10 years of HL7, from approximately 10 to 50 types. The explosion was driven by new requirements that arose only in recent years and were not anticipated by HL7's "founding fathers", who designed the data type system in 1988.

Many of the new requirements that emerged during the version 2 period stem from one discovery: data in health care (and in business generally) is not as clean as we first thought. For example, the history of the TS data type shows the struggle with two facts: real-world quantities are imprecise, and all real-world information is uncertain to some extent. Information may be wrong and need to be updated. Most information items may also change over time, so we may have to keep track of their history (see the recent XAD changes initiated by Susan Abernathy with the National Immunization Program). Many data elements turned out to have more facets than expected, which led to various X-variants of preexisting data types. New technology changed the way we think about telecommunication (TN-XTN) and formatted text (ST, TX, FT, HTML, SGML, RP, ED).


New requirements on an existing data type system must be met either by modifying existing data types or by inventing new ones. In HL7 this sometimes led to minor changes that could well be reverted later (TS). Sometimes the changes were considered so radical that the changed types were given new names (e.g., XPN, XAD, XTN). Over time the number of types grew, and it became hard to keep an overview.

In some ways, however, the old HL7 data type system was inherently flawed. The CM type, for instance, became a pain over time, and we are still struggling to get rid of this undefined composite type. We simply had too many data types for free text (TX and FT, more recently also ED, HTML, etc.). These arbitrary alternatives in turn multiplied the types that depend on free text, such as CE and CF. Types such as PN and AD were not designed from an international perspective.

The deepest flaw in HL7's concept of data types was a wrong conceptualization of what a data type is. Data types were considered mere "formats" of data elements. The notion of a "format" focuses on external representation (as character encodings) rather than on internal meaning. Data types were therefore understood as constraints on the character strings that appear in data fields. This notion was partly supported by experience with programming languages that had poor, weak type systems, such as COBOL, BASIC, or PL/1, which were widely used in business application programming.

Computer science, however, has developed a much stronger concept of data types. Data types are now understood as the basic constituents of all meaning that can be processed by computers. The ALGOL family of programming languages (Pascal and MODULA 2) has a very strict data type system. At the same time, its data types are extensible.
Programmers create new semantic entities by defining new types. Object-oriented languages such as SMALLTALK, Eiffel, C++, and more recently Java have further developed this approach, creating new domains of meaning by defining types together with their operations. Common LISP and Scheme have very well-defined type systems that emphasize the semantics of types rather than their representation.

Our collective understanding of HL7 version 2.x and its problems, together with modern lessons from computer science, lets us formulate specific goals for a redesigned system of data types. The new system should improve on the old one and should also serve better in a future whose requirements none of us can conceive of today.

Semantics first

Data types are the basic building blocks of the information exchanged in messages. Information is exchanged in the form of signals, which are ordered according to lexical and syntactical rules. These signals convey a meaning (semantics) and ultimately serve a purpose (pragmatics). Data types must therefore have precisely defined semantics that are unambiguously related to their syntax, including the rules for building lexemes.


Usefulness and reusability

The basic set of data types must be equally useful for all HL7 technical committees. The data types must therefore be meaningful enough that technical committees can use them directly as the data types for the attributes of their information model classes. The basic set must also be reusable for many purposes and should not be too highly specialized. This does not prevent a technical committee from defining a highly specialized data type for its own use.

Coherence

The set of data types should be coherent. There should never be two or more competing data types for the same use case, and the relationships between data types should be well defined. Data types should therefore be organized the way domain information models (DIM) are organized in the reference information model (RIM). The RIM and RIM harmonization ensure that the DIM classes are closely related and that there are no competing ways to express the same information.

Minimality

The coherence requirement implies that the set of data types should be minimal: there should be exactly as many data types as there are independent basic semantic concepts to support. Minimality also has a lower bound: each data type must have well-defined semantics at a level relevant to HL7's application domain. For example, we could have a single data type, "string of bits", but bits carry no generally relevant meaning at the HL7 application level.

Stability

The reusability requirement implies that every basic data type will be used by many classes and attributes in almost every technical committee. This makes it extremely difficult to coordinate changes to the data types and to estimate their effect on the many areas in which the types are used. The set of data types must therefore be
Therefore the set of data types must bedesigned for high stability.CompletenessUsefulness, reusability, coherence and stability can be achieved by aiming for maximalcompleteness a priori. This means that the data types of each basic semantic area cover that areato every logical extent conceivable by the time of design. Conversely completeness a posterioriwould only make sure that every current concrete use case is covered by the design. Stability canonly be achieved through aiming for complete coverage of every conceivable current and futureuse case.DRAFT version 1.0 22 Mar 19995


Simplicity

The data types should be as simple as possible to ease implementation and use. This does not mean oversimplification or neglect of requirements. Nor does it mean that each type can be defined in a few simple words, because interoperability requires complete definitions. What simplicity does mean is that exceptions, duplications, and dependencies are kept to a minimum. Above all, the type system should be easy to use and should prevent users from making mistakes as far as possible. When mistakes do occur, they should be clearly recognizable as mistakes, so they can be prevented or fixed. Mistakes should never be hidden by imprecise definitions.

1.2 Methods

Our design of HL7 data types can build on two kinds of prior knowledge and experience: more than ten years of experience with data types in HL7 version 2, and more than 40 years of experience with data types in general computer science. This proposal tries to make the most of both sources.

The data types are redesigned top down. We approach each semantic field by first trying to understand it. This understanding comes from experience and from identifying actual and possible requirements. But experience can only refer to the past. To reach stability and conciseness, we have to develop a precise semantic model that defines exactly what a type means and how it should be used. This definition is necessarily "theoretical" rather than practical, but it is meant to serve current and future practice, not academic curiosity. Once the semantics are defined clearly enough, we specify the structure of the types, i.e.,
their "abstract syntax".

We generally stop defining types at the abstract syntax level. Specific mappings to XML, CORBA, or other implementable technologies are not part of this redesign work. That mapping is the task of the Implementable Technology Specifications (ITS), prepared by special groups who focus on those technologies. However, many participants in this task force know very well the pain of implementing bad specifications, and some of us are involved in the initial ITS definitions for XML and CORBA. We therefore do not neglect actual implementation constraints. We will also continue to work on the ITS specifications and will help the domain technical committees work with the new types.

ITS definitions of the data types should take into account not only the abstract syntax definitions but, above all, the semantics and requirements of each data type. This is of utmost importance, because the abstract syntax given here is not strictly normative. An ITS may vary the abstract syntax to use features available in a particular implementation technology, as long as all semantic features of the data types are mapped to and preserved in the ITS.
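The rule above, that an ITS may vary the syntax as long as the semantics are preserved, can be made concrete. This is a minimal sketch under stated assumptions: the abstract type `Quantity` and both XML layouts (with the element and attribute names `qty`, `value`, and `unit`) are hypothetical illustrations, not taken from any HL7 ITS. The two encodings differ in syntax, and both decode back to an equal abstract value, which is the property the text requires.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """Hypothetical abstract type with two semantic components."""
    value: str  # kept as text to preserve the sender's precision
    unit: str


def encode_as_attributes(q: Quantity) -> str:
    # Variant A: components mapped to XML attributes.
    elem = ET.Element("qty", {"value": q.value, "unit": q.unit})
    return ET.tostring(elem, encoding="unicode")


def decode_from_attributes(text: str) -> Quantity:
    elem = ET.fromstring(text)
    return Quantity(elem.get("value"), elem.get("unit"))


def encode_as_elements(q: Quantity) -> str:
    # Variant B: the same components mapped to child elements.
    elem = ET.Element("qty")
    ET.SubElement(elem, "value").text = q.value
    ET.SubElement(elem, "unit").text = q.unit
    return ET.tostring(elem, encoding="unicode")


def decode_from_elements(text: str) -> Quantity:
    elem = ET.fromstring(text)
    return Quantity(elem.findtext("value"), elem.findtext("unit"))


original = Quantity("12.5", "mg")
# Different syntax, same meaning: both variants round-trip to an equal value.
assert decode_from_attributes(encode_as_attributes(original)) == original
assert decode_from_elements(encode_as_elements(original)) == original
```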


Although we define data types top down, we will make sure that every old HL7 v2.x data type has at least one appropriate v3 data type. The mapping of types between v2.3 and 3.0 will be shown in an appendix [not done yet]. One of our outstanding action items is to help technical committees migrate to the new data types. No data types have been assigned in the RIM so far, and no durable message specifications have been produced. The migration therefore requires no changes to actual version 3 specifications.

This theoretical approach is not intended to impose a home-grown dogma of information science on system developers. It cannot be stated clearly enough that the type system proposed in this report will not force new functionality on information systems through HL7 interfaces. The type system supports new requirements, such as conveying the uncertainty of information, but it does not force anyone to implement all of its features. We have defined a methodology called "implicit type conversion" that adds enough flexibility to bridge systems that have advanced features and systems that do not have or need them. It ensures that a sender can say all the detail he wants to say about a data item (no more and no less), and that a receiver can find in a message as much information as he can digest (no more).

1.2.1 Analysis of Semantic Fields

Guttman (1944) and Stevens (1953) identified four categories of data. Their classification shaped the methodology of all sciences, including biology, medicine, and psychology. They identified four scales on which we perform measurements or observations: (1) the nominal scale, (2) the ordinal scale, (3) the interval scale, and (4) the ratio scale.

We observe qualities on nominal scales.
A nominal scale is the collection of all possible outcomes of an observation, with no particular order. Gender, colors, and diagnoses, for example, are determined on nominal scales.

An ordinal scale exists when the possible outcomes of an observation can sensibly be arranged in order. For example, the NYHA classification of heart failure and tumor staging are ordinal scales. We can determine the stage of a disease and tell the worse condition from the better one, but we cannot measure distances. We cannot say, for instance, that the step from NYHA I to NYHA II is as big as the step from NYHA II to NYHA III.

Interval scales are ordered quantitative scales on which distances (intervals) between two points can be measured. The paradigmatic examples are the Fahrenheit and Celsius temperature scales. On these scales, however, it makes no sense to say that 100 degrees is twice as much as 50 degrees. The concept of absolute zero does allow such statements on the Kelvin scale, which is a ratio scale.

An information standard in medicine should reflect these fundamental categories of scientific observation. However, this classification has some problems:
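The four scales differ in which operations are meaningful. Below is a minimal sketch, assuming Python enums for the nominal (gender) and ordinal (NYHA) examples named above; the member values are illustrative encodings, not HL7 codes. The nominal enum supports only equality, the ordinal enum supports ordering but not subtraction, and the temperature arithmetic shows that differences survive the shift from Celsius to Kelvin while only the Kelvin ratio is physically meaningful.

```python
from enum import Enum
from functools import total_ordering


class Gender(Enum):
    """Nominal scale: outcomes can only be compared for equality."""
    MALE = "M"
    FEMALE = "F"


@total_ordering
class NYHA(Enum):
    """Ordinal scale: outcomes are ordered, but no subtraction is defined."""
    I = 1
    II = 2
    III = 3
    IV = 4

    def __lt__(self, other):
        if not isinstance(other, NYHA):
            return NotImplemented
        return self.value < other.value


def celsius_to_kelvin(celsius: float) -> float:
    """Shift from an interval scale (Celsius) to a ratio scale (Kelvin)."""
    return celsius + 273.15


# Interval scale: differences are meaningful and survive the shift to Kelvin
# (up to floating-point rounding).
diff_celsius = 100.0 - 50.0
diff_kelvin = celsius_to_kelvin(100.0) - celsius_to_kelvin(50.0)

# Ratio scale: only the ratio of absolute (Kelvin) temperatures is meaningful.
naive_ratio = 100.0 / 50.0  # 2.0, but physically meaningless
kelvin_ratio = celsius_to_kelvin(100.0) / celsius_to_kelvin(50.0)  # about 1.15
```

Note that `NYHA.III - NYHA.I` raises `TypeError` in this sketch, which mirrors the point that distances on an ordinal scale are undefined.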


- A scale can be artificially "upgraded". For instance, an arbitrary order can be imposed on qualitative observations (e.g., for gender: male = 0, female = 1).
- How an observation is classified often depends on its scope. Colors, for example, can be placed on any of these scales depending on what we take colors to be: anything from qualitative observations up to wavelengths of visible light.
- The distinction between ratio and interval scales seems artificial, because a simple translation of temperatures to the Kelvin scale is all that separates them.

Common sense may justify distinguishing qualitative from quantitative observations, although the color example shows that even the boundary between qualities and quantities can be blurred. We can further distinguish discrete from continuous observations, but again these are not precise categories. Many qualitative observations are continuous (e.g., color), and continuous qualitative observations are best understood through quantization. For instance, color can be quantized by the wavelength of visible light, which is a scalar (a one-dimensional scale). Quantization can also involve more than one dimension, as the color example shows as well: RGB color quantization is a three-dimensional vector of numbers representing the intensities of red, green, and blue.

Since qualitative and quantitative, discrete and continuous observations all matter, in science as in everyday life, we distinguish two fields: discrete qualities, and quantities (both discrete and continuous). We will later have to show how to express continuous qualitative observations.

One other important kind of information is not covered by the Guttman/Stevens classification: text. Text is not just an abstracted observation, and it does not fit the distinctions between qualities and quantities, or between discrete and continuous.
Text consists of chunks of information that are ultimately exchanged between humans. Computers and automated messaging may be used to transport text. Once a human user has entered it, however, text passes through essentially unchanged and is displayed to another human user. Text can express many observations, but that information content is not unlocked for messaging and computer processing.

Text includes not only the letters, words, and sentences of natural human language, but also graphics, pictures (still or animated), and audio. Moreover, the same natural-language content can be communicated in written form (characters) or spoken form (audio). We therefore distinguish a field of textual information. In messaging, text data is passed through unchanged and uninterpreted, regardless of destination or purpose. Because of this property, all other uninterpreted (encapsulated) data can be subsumed under the category of text.

Contemplating the broad field of all information has thus led us to three major areas, pictured in Figure 1 [p. 9].


Figure 1: Phenomenology of Information. [Diagram of the areas Text, Thing, and Quantity within Information. Labels: character strings, multimedial expressions, Symbol, HL7 protocol artifact, nominal, application domain concept, Number, Ordinal, discrete/continuous, proportion.]

Information usually contains all three moments (text, thing, and quantity). Information is always represented in some textual form, and it is about things and concepts, which may have quantitative properties.

When talking about things, we must use symbols to label the things and concepts we are talking about. Symbols are a form of text. The reverse is also true: text consists of things, i.e., letters, graphemes, or glyphs, which we recognize as distinct concepts. Symbols thus lie in the area between text and things.

Likewise, numbers are represented by digits, which are characters, that is, text. Conversely, computers store all text as binary numbers, and only character code tables or image maps let us interpret those numbers as text. Numbers thus lie in the area between quantity and text.
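The two-way relation between numbers and text described above can be seen directly in code. This minimal sketch assumes Python's built-in `str`, `ord`, and UTF-8 encoding. It writes a number as digit characters and then shows that each character is stored as a number: its Unicode code point, which for ASCII digits is also its UTF-8 byte value.

```python
number = 42

# Quantity -> text: the number is rendered as a string of digit characters.
digits = str(number)

# Text -> numbers: each character is identified by a numeric code point.
code_points = [ord(ch) for ch in digits]

# For transmission, a character encoding table turns characters into bytes.
encoded = digits.encode("utf-8")

# And back again: bytes -> characters -> number.
decoded_number = int(encoded.decode("utf-8"))

print(digits, code_points, list(encoded), decoded_number)
# prints: 42 [52, 50] [52, 50] 42
```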


A similar overlap exists between quantities and things. By enumerating concepts in coding systems, we can assign an ordinal number to each concept. Conversely, concepts can have essential quantitative moments when an order relationship exists among them, as with military ranks.

Everything seems blurred: the boundaries between these areas are not clearly demarcated, and no information lacks any of the three moments entirely. This makes it hard to come up with any honest classification. The method of phenomenology, developed by G. W. F. Hegel (1807) and, 100 years later, by E. Husserl (1906), is a much better approach to such a messy, many-faceted field. The phenomenological method observes how the meanings of concepts drift, and how concepts stand in opposition to each other while, at the same time, depending on each other.

In this data type redesign, the three major moments of information guided our attention, without our neglecting the overlaps. Our exposition of the defined types is accordingly organized into three major sections: text; things and concepts; and quantities.

1.2.2 Form of Data Type Definitions

Having said that the essence of data types is their semantics, not their abstract syntax, we now introduce how we present part of the semantics and the abstract syntax of our types. We use type definition boxes. The following is such a type definition box.
Text set in italics stands for the fields that are filled out for every defined type.

name of the type
a brief textual description of the semantics

component name   type/domain   optionality     description
name 1           type 1        optionality 1   brief description of component 1
name 2           type 2        optionality 2   brief description of component 2
...
name n           type n        optionality n   brief description of component n

Some data types are so fundamental that they have no distinguishable semantic components. An integer number, for example, is a closed, well-defined concept that cannot be split further into components. We call such data types primitive data types, as opposed to composite data types. Note, however, that "composite" and "primitive" are relative qualifiers. A given implementation technology may well implement a primitive data type with some internal structure, or implement what we define as a composite data type using a primitive of some programming language. What is essential is that the semantics are preserved without distortion.
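A type definition box is itself structured data. The following minimal sketch represents one with hypothetical Python classes `Component` and `TypeDefinition`, which are not part of the specification. The example types `ExampleInteger` and `ExampleRatio`, their components, and the optionality values "required"/"optional" are invented for illustration only. A type with no components is treated as primitive and rendered with the marker PRIMITIVE TYPE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One row of a type definition box."""
    name: str
    type_name: str
    optionality: str  # illustrative values such as "required" or "optional"
    description: str


@dataclass(frozen=True)
class TypeDefinition:
    """A type definition box: name, semantics, and (if composite) components."""
    name: str
    description: str
    components: tuple[Component, ...] = ()

    @property
    def is_primitive(self) -> bool:
        # A primitive type has no distinguishable semantic components.
        return not self.components

    def render(self) -> str:
        lines = [self.name, self.description]
        if self.is_primitive:
            lines.append("PRIMITIVE TYPE")
        else:
            lines.append("component name | type/domain | optionality | description")
            for c in self.components:
                lines.append(f"{c.name} | {c.type_name} | {c.optionality} | {c.description}")
        return "\n".join(lines)


# Hypothetical example types, for illustration only.
integer = TypeDefinition("ExampleInteger", "a whole number")
ratio = TypeDefinition(
    "ExampleRatio",
    "a quotient of two numbers",
    (
        Component("numerator", "ExampleInteger", "required", "the dividend"),
        Component("denominator", "ExampleInteger", "required", "the divisor"),
    ),
)
print(ratio.render())
```

Keeping the box as data, rather than as a diagram, makes it easy to check mechanically that every component refers to a defined type.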


Data types that are primitive in our system are defined using a simpler type definition box, as follows:

    name of the type
    a brief textual description of the semantics
    PRIMITIVE TYPE

We initially considered reusing the UML modeling tools for data types. However, after some experiments we discovered an interesting dilemma with using UML. There are two possible styles to define data types in a UML class diagram. Both styles have in common that every type would be represented by one class box, labeled with the name of the type. The first style would list all the semantic components as attributes in the box. Those attributes would again be defined as having a data type. Thus the names of other data types would appear in the list of attributes, almost like foreign keys. Obviously there are relationships between types, but those relationships are not made visible. Every data type's class box would stand on its own.

The other style to model data types in UML would be to depict the semantic components as relationship lines drawn from the containing type to the contained type. The role label at the side of the containing type would be the name of the semantic component. This results in an interesting diagram with just tiny little class boxes that maintain abundant relationships with each other, a picture that resembles a spider's web. It is quite difficult to navigate through so many relationships.

Although using UML for data type definition is an interesting exercise, it does not contribute very much to the understanding of the types. The main problem with using UML, however, is that it gives the impression that the structure of the data types is all that needs to be said about the types. But the opposite is true. The most important part of the type definition is the defining and explanatory text.

1.2.3 Generalized Types

We use a notion of generalized types. Types can maintain an inheritance relationship with each other. We explicitly allow (and use) "multiple inheritance".
However, we did not (yet) use inheritance as a way to specialize subtypes from general supertypes. Rather, we go the other way. Abstract generalized types are used to categorize the concrete types in different ways. Thus, we can get hold of all types that have a certain property of interest.

For instance, we define the generalized type Quantity to subsume all quantitative types. This is used to define one type Ratio [p. 137] as a ratio of any two quantities.


We defined a data type Interval [p. 149] that is a continuous subset of any type with an order relation. All types with an order relation are subsumed under OrderedType. Note that not all quantities are ordered (e.g., vectors are not) and there may be non-quantities that have an order relationship (ordinals, e.g., military ranks).

This categorization is currently done ad hoc rather than systematically. We will at some point revise and validate it. For instance, Quantity may be too broad a category, since it should contain ordinals, yet ordinals should not occur in a Ratio. It is unclear whether interval-scaled quantities may properly occur in a Ratio, although most people would not worry about that.

1.2.4 Generic Types

Generic data types are incomplete type definitions. This incompleteness is signified by one or more parameters to the type definition. Usually parameters stand for other types. Using parameters, a generic type can declare components (fields) of other, not fully specified data types. For example, the generic data type Interval [p. 149] is declared with a parameter T. In this example, T can stand for any OrderedType. The components low and high are declared as being of type T.

Before you can instantiate a generic type, you must complete its definition. For example, if you want to use an Interval [p. 149], you have to say of what base data type the interval should be, i.e., you have to bind the parameter T. Say you want an interval of Integer Number [p. 133]. You would bind the parameter T to the type Integer Number, through which the incomplete data type Interval becomes completed as the data type Interval of Integer Number.

You can complete the definition of a generic data type right at the point of instantiation.
This means that you do not have to define all possible types generated by the generic type in advance. For instance, given the generic type Interval [p. 149] and the ordered types

    Integer Number [p. 133],
    Floating Point Number [p. 134],
    Physical Quantity [p. 138],
    Monetary Amount [p. 140],
    Ratio of Quantities [p. 137], and
    Point in Time [p. 144],

you can use intervals of all those base types without having an actual specification of all the specific types. What an Interval is, is specified only once, generically. Whenever you have a new ordered type, you can build an interval from it and use that new special interval without having to define the new interval type explicitly. Generic types are thus a more efficient way of type specification.


Generic types became most popular in C++, where they are called class templates. In C++ notation the Interval type would be defined as:

    template <class T> class Interval {
        T low;
        T high;
        ...
    };

This generic interval type can then be used as follows:

    Interval<int> eligibleRankingNumbers;
    Interval<PhysicalQuantity> normalRange;
    Interval<PointInTime> effectivePeriod;

Generic data types may have more than one parameter. For example, a type could be defined as

    template <class N, class D> class Ratio {
        N numerator;
        D denominator;
        ...
    };

which is actually one way of making constraints: with this generic type Ratio, it would be clear that Ratio<int, int> would be a ratio of two integers (a rational number), Ratio<float, float> would be a ratio of two floating point numbers, and Ratio<float, int> would be a ratio of a float and an int.

Note: Our data type Ratio of Quantities [p. 137] is not defined as a generic type. Ratio is just used here as an example of what generic types are.

Generic data types can be used in a nested way. Suppose you want an Interval of Ratios of floats by ints:


    Interval< Ratio<float, int> > foo;

would be all you need to do to instantiate that new type.

Note: We did not decide on using the C++ notation of generic types; it is just used here because many people know C++ templates, and thus C++ templates are a good illustration of what generic types are and how they work.

We will define generic types using type definition boxes that look like this:

    name of the type
    a brief textual description of the semantics
    GENERIC TYPE

    parameter name     allowed types       description
    parameter name 1   parameter types 1   brief description of parameter 1
    parameter name 2   parameter types 2   brief description of parameter 2
    ...
    parameter name m   parameter types m   brief description of parameter m

    component name     type/domain         optionality     description
    component name 1   component type 1    optionality 1   brief description of component 1
    component name 2   component type 2    optionality 2   brief description of component 2
    ...
    component name n   component type n    optionality n   brief description of component n

As you see, the section defining the semantic components of the type is preceded by the keyword "GENERIC TYPE" and a parameter section. In this parameter section, type parameters are defined that are used in the subsequent section to define the semantic components. The parameter section may define fewer parameters than there are defined components. Usually generic types have just one parameter; sometimes there are two of them (as in the above Ratio example).

Please refer to the definition of the Interval [p. 149] to see a real-life example of a definition box for a generic data type. For the interval, there is just one parameter T defined. Both boundaries of the interval are of the same type T. Any ordered type may be bound to the parameter T.


1.2.5 Collections

HL7 v2.x used the word "repeating" to describe certain qualities of the definition of fields and segments. This reflected the observation that "repeated" stuff could occur multiple times in the message. However, there must obviously be a reason why someone would decide that a segment or a field is to be repeatable in a message. It turns out that there are different reasons to make that decision. It was never clear from the HL7 spec what the meaning of repeatability was in every instance.

The stuff that could repeat was either a segment or a field. For the purpose of this discussion we will consider the v3 equivalent of a segment to be a class, whereas the v3 equivalent of a field is an attribute.

If segments repeated, in v3 this expressed a relationship (with multiplicity "1..*") between classes. When fields were declared "repeatable", this expressed a relationship between an attribute and its data values. We will concentrate here on the relationship between attributes and data values rather than on inter-class relationships, although what we say here is equally valid for class relationships.

In general, when things end up being "repeatable", we have a collection of things.

Consider the example of the Patient "telephone number" (tel), which was declared as a "repeatable" field in version 2. The meaning of this is obviously that a patient has several telephones; we usually say a patient has a "set" of telephone numbers. The word "set" implies that (1) it would not be meaningful if a given telephone number occurred twice, and (2) the order of telephone numbers does not matter.

We can use those two criteria to sort out the field of all possible collections, as the following 2 × 2 table shows:

                    unordered   ordered
    no multiples    set         *
    multiples       bag         list

The ordered sequence without multiples is marked by an asterisk since this case is rarely considered in the computer science literature.
Actually, we can construct the field of collections as a lattice (a tree-like structure) rather than a matrix. In such a construct, the set would be the parent of both bag and list, and ordered without multiples would not occur.


set
    a collection of elements with no notion of order or duplicate element values. The number of distinguished elements in the set is called the "cardinality" of the set. An example of a set is the available fruits on a restaurant menu, e.g., { apples, oranges, bananas }.

list (or sequence)
    an ordered collection of elements where the same value can occur more than once at different positions in the ordered collection. The notion of a list can be constructed from the notion of a set if we extend each element of the set by a position counter (a positive integer number). The number of elements in the list is referred to as the "length" of the list. An example of a list may be the list of my favorite fruits, where the fruits I like more precede the ones I like less; e.g., the list (orange, apple, banana) can be represented as the set { (apple, 2), (orange, 1), (banana, 3) }.

bag
    an unordered collection of elements where each element can occur more than once (think of a shopping bag containing 3 apples, 2 oranges, and 5 bananas). A bag can be constructed from a set if we extend each element with an occurrence counter; e.g., the set { (apple, 3), (orange, 2), (banana, 5) } is a bag. The total number of things in the bag can be called the "size" of the bag; the total number of different items can be called the "cardinality".

There are, however, other types of collections we frequently find, including vector and matrix. Those collection types can be constructed using the above three fundamental collections (set, list, and bag):

vector or array
    a list with a specific length. Every position in that list represents one "dimension" (of the vector) or one "field" of the array. A vector need not represent geometric points in 3D space, and the elements of a vector need not be numbers. Vectors are just a quantitative restriction on the list kind of collection, i.e.,
    where the list must have a particular length. (The length of a list can be restricted in other ways, e.g., lengths that must be between 1 and 5; such things are not vectors.)

matrix
    a vector of vectors, or a two-dimensional array. Matrices are used for vector transformations or to describe network structures. Images could be thought of as matrices, but this is not the only way to think of images. HL7 probably does not yet have a use case for matrices, but that may change as the Image Management SIG contributes new content to HL7.


It should have become clear that there are many types of collections, and subsuming them all under the (weakly defined) notion of "repeated" and "repeatability" does not help to clear up the meaning of a collection. We thus want to do away with language that speaks of "repeated attributes" in the MDF, to promote clarity regarding what specific semantic flavor of collection is meant in each case.

In the case of waveforms, where "repeatedness" became quite tricky in v2.x, we can now define a sample of an n-channel waveform signal as a list of n-dimensional vectors, where each vector stands for a particular sample point in time.

One question was always associated with collections in HL7: how do we update those collections? We can distinguish the following cases:

1. The elements of the collection have identity (given to them through technical instance identifiers [p. 65]). Thus we can change some values of those elements. For example, if we have a list of individual practitioners and one practitioner changes her last name, we can simply change the last name of that individual instance. The only requirement is that the list elements have identity.

2. The elements of the collection have no identity. Changing the value of any given element means replacing that value in the collection, which in turn means changing the collection itself. Although we could change the value of the third element of a list of numbers, the position of an element in a list does not determine its identity. In a set or bag of numbers there is no "third element". The only update one can do with a collection of values without identity is to add or remove elements from the collection.
Thus, the question boils down to: how do we change the collections themselves?

One solution is to allow a collection to be updated only through separate trigger events with explicit message structures that would specify exactly what would be changed in which way. This strategy works fine for high-level RIM objects, such as Encounter_practitioner, Clinical_observations, etc. However, for things like a "set of stakeholder phone numbers" it is a bit too much of a burden to define specific trigger events.

But even if we had a trigger event "change patient phone numbers", it is not clear how we would specify what exactly should be changed.

For v2.x the answer always was: you send a snapshot of the collection as you want it to be, and the recipient can simply throw away whatever he knows and remember only what you just sent. This somewhat works in situations with just one master information producer and several slave information consumers, but it is totally insufficient for collaborative information management. For example, my message could wipe out all the telephone numbers that you already know.


We will give a solution below, when we talk about update semantics [p. 33].

1.2.6 The Meta Model

The following is a first draft of a meta model for the data type definitions in UML. Since all the concepts are described in the text above, this section does not have a lot of text. If you read this with an HTML browser, you can click on the class boxes in the diagram to find the description of the respective concepts embodied by that class.

If you are not concerned with the overall methodology, maintenance, and quality control of the HL7 v3 specification, you can safely skip this section.


[Figure 2: The meta model of data type definitions (UML class diagram). Classes shown: Data_type (name, isInternal = false, isGeneric = false, description, history), Primitive_data_type (specification), Composite_data_type, Collection_data_type (collection_type, cardinality), Data_type_component (name, isReference, description), Generic_type_parameter, and DTM_Generalization (description, history).]

Data Type

Every data type has a name and a description. The history attribute exists for compatibility with the current MDF meta modeling style.

A data type may be defined as being "internal". An internal type is used only to define other composite data types. Internal types are not supposed to be used directly in messages. For example, we define a type Binary [p. 55] that contains pure raw data bits and that is used only by Multimedia Enabled Free Text [p. 48].


A data type may be defined as being "generic". A generic type [p. 12] is a type whose complete specification is deferred until it is actually used in one way or another. The missing pieces (Generic_type_parameter) must be specified when the type is used. This is what C++ knows as "templates".

Primitive Data Type

A primitive data type has only a textual specification of its semantics. The specification is separate from the inherited description attribute, because it is essential for a primitive data type to have a very careful (and likely long) specification that describes the exact semantics of such a type. [Perhaps we can replace this DescriptiveText with a pointer to the data type specification document.]

Composite Data Type

A composite data type [p. 10] consists of one or more named and typed components.

Data Type Component

A component of a composite data type [p. 10] is like a variable, i.e., it has a name and a type. The type can be declared to be included by reference instead of by value. This is useful if you know that such a component mentions an instance that is already mentioned elsewhere in the message. In languages such as Java, where objects are always handled through references, this does not make any difference.

Most fields are declared as being of some specific type. However, when building generic types [p. 12] one sometimes wants to leave the type declaration of a field unspecified. Instead of leaving the type declaration completely unspecified, one can also constrain the allowable types to certain specific types when only some types are allowed for a given generic data type.

DTM Generalization

A data type may be categorized into possibly many generalizations [p. 11]. For instance, Integer Number [p. 133] might be classified as an Ordered Type, as a Discrete Type, and as a Quantity. Generalizations are themselves data types.

All the rules of inheritance known from the object-oriented method apply here, i.e.,
generalized types without attributes are called "abstract types" (all the generalizations mentioned above are abstract). You can never instantiate an abstract type. A specialization of a non-abstract type inherits all the attributes of the parent. Specialized types can add attributes or can place further constraints on inherited attributes.


Collection Data Type

A collection data type [p. 14] is a collection of one or many instances of a particular element type. The particular semantic variant of the collection data type is specified in the collection_type attribute.

The notion of a collection data type should once and for all supersede the traditional notion of "repeatability." [This means the MDF meta model needs to be modified where it mentions "repeated" etc.]

Collections are of one of the following types:

set
    an unordered collection of unique element type instances.
bag
    an unordered collection of element type instances. Instances may occur more than once in the bag.
list
    an ordered collection of element type instances.

Generic Type Parameter

This isn't actually a type, but a parameter of a generic type [p. 12] template. However, generic type parameters are used as if they were types in the definition of the enclosing generic type. For example, we define a generic type Interval on all types with a total order relation. In C++ this would look like:

    template <class T> class Interval {
        ...
        enum LimitType limitType;
        T lowLimit;
        T highLimit;
        ...
    };

Using DTM_Generalization [p. 20] we can define categories of data types, and we can constrain the template parameters to one of those generalized types [p. 11].

Having such a general type, it seems possible to declare the generic type Interval without using templates and template parameters:


    class Interval {
        ...
        enum LimitType limitType;
        OrderedType lowLimit;
        OrderedType highLimit;
        ...
    };

However, the two declarations are not equivalent. While the first one does not constrain the template parameter T to be an Ordered Type, the second declaration does not constrain lowLimit and highLimit to actually refer to the same specific type.

This meta model allows both constraints to be made by using the Generic_type_parameter, which can be constrained using the association has_allowed_types.

1.2.7 Implicit Type Conversion

Implicit type conversion was an integral part of the technology that powered the flexibility of HL7 v2.x. Without being aware of the concept, HL7 coincidentally had a form of implicit type conversion that proved invaluable, especially for inter-version compatibility or localization problems. For instance, you could promote a single data element to a "repeating" element (i.e., a list of the base element) and vice versa without causing interoperability trouble with prior versions. Likewise, you could cast a data element declared as a primitive data type in one version of HL7 to a composite data type in another version. And you could "append" components "at the end" of a type definition, all without causing HL7 agents of different versions to reject each other's messages.

However, in HL7 v2.x, implicit type conversion was not a stated rule; it was a sort of by-product of the way HL7 messages used to be encoded. Transfer to other technologies, like C++ classes in ProtoGen/HL7 and IDL interfaces in SIGOBT's work, lost this convenience of implicit type conversion. If we want to preserve that invaluable technical feature of HL7 v2.x, we must explicitly state the precise rules of implicit type conversion.

Type conversion is also called "type casting".
If a more primitive type is cast to a more complex type, we can call this "up-casting" or "promoting" the lower-level to the higher-level type. If a higher-level type is cast to a lower-level type, we call that "down-casting".

Type conversion must be clearly defined by reasonable rules. The rules should transfer the semantics of the data as well as possible. In particular, the rules should not merely be driven by the coincidence of representations. For instance, it makes no sense to cast an ICD-9 code 100.1 to a floating point number 100.1 just because their representations happen to be the same.


The easiest way to state the rules for type conversion is by using a conversion matrix, as exemplified in the following table. The rows show the type you have and the columns show the type you need to convert to.

Example type conversion matrix

Each entry below gives, for the type you have, the rule for converting to the type you need. "none" means that no conversion is defined; converting a type to itself is not applicable (N/A).

String [p. 40]
    to FreeText: promote to text/plain
    to CodeValue: if the code system is known and the string is a valid code in that system
    to CodePhrase, CodeTranslation, ConceptDescriptor: promote to CodeValue first
    to Integer: if the string is a valid integer literal
    to Float: if the string is a valid floating point literal
    to PhysicalQuantity: if the string is a valid measurement literal
    to Ratio: if the string is a valid ratio literal

FreeText [p. 48]
    to String: if the media type is text/plain
    to every other type: try conversion to string first

CodeValue [p. 116]
    to String: use the code or another rule for creating literals
    to FreeText: convert to string first
    to CodePhrase: make a phrase with just one CodeValue
    to CodeTranslation, ConceptDescriptor: promote to a CodePhrase first
    to Integer, Float, PhysicalQuantity, Ratio: none

CodePhrase [p. 124]
    to String: make a literal?
    to FreeText: convert to string first
    to CodeValue: take the first CodeValue in the phrase (cave!)
    to CodeTranslation: new translation with origin set to NIL
    to ConceptDescriptor: promote to CodeTranslation first
    to Integer, Float, PhysicalQuantity, Ratio: none

CodeTranslation [p. 123]
    to String: make a literal?
    to FreeText: convert to string first
    to CodeValue: convert to CodePhrase first
    to CodePhrase: use the term component
    to ConceptDescriptor: make a new ConceptDescriptor
    to Integer, Float, PhysicalQuantity, Ratio: none

ConceptDescriptor [p. 122]
    to String: use "original text" or make a literal?
    to FreeText: use "original text"? or convert to string first
    to CodeValue: if a specific code system is needed, see whether it is in the set of translations
    to CodePhrase: down-cast to CodeTranslation first
    to CodeTranslation: if a specific code system is needed, see whether it is in the set of translations
    to Integer, Float, PhysicalQuantity, Ratio: none

Integer [p. 133]
    to String: use integer literal
    to FreeText: convert to string first
    to CodeValue, CodePhrase, CodeTranslation, ConceptDescriptor: none
    to Float: make a float from the int; the precision is the number of all digits in the integer
    to PhysicalQuantity: make a float first
    to Ratio: use as the numerator, set the denominator to 1

Float [p. 134]
    to String: use floating point literal
    to FreeText: convert to string first
    to CodeValue, CodePhrase, CodeTranslation, ConceptDescriptor: none
    to Integer: round the float to an int (cave: this may create pseudo-precision)
    to PhysicalQuantity: use "1" (the unity) for the unit
    to Ratio: use as the numerator, set the denominator to 1

PhysicalQuantity [p. 138]
    to String: use floating point literal
    to FreeText: convert to string first
    to CodeValue, CodePhrase, CodeTranslation, ConceptDescriptor: none
    to Integer: down-cast to float first
    to Float: return the value; may throw an exception if the unit is not "1"
    to Ratio: use as the numerator, set the denominator to 1

Ratio [p. 137]
    to String: use ratio literal
    to FreeText: convert to string first
    to CodeValue, CodePhrase, CodeTranslation, ConceptDescriptor: none
    to Integer: down-cast to float first
    to Float: convert numerator and denominator to floats and then build the quotient
    to PhysicalQuantity: cast the ratio values to a float, make a new unit as the ratio of units (if any)


As can be seen, the conversion matrix is sizeable, even on a subset of our types. There are other ways to picture the allowed conversions, for instance as a directed graph, where every data type is a node and every allowable conversion is an arc pointing from the type you have to the type you need. The arc would be labeled by the conversion rule used.

Conversions can be concatenated to eventually convert between "distant" types. This process is guided by pre-formulated strategy rules of the form "convert to T first". In a graph representation, finding those strategies resembles finding the shortest route between two locations on a road map.

The matrix representation and the graph are equivalent; thus one can use either of those representations of conversion rules. Since the matrix grows so big, we will probably go with the graph, which is an action item for future work.

Type conversion matrices can be interpreted by computers quite easily. In C, for instance, the matrix would be stored as a two-dimensional array of function pointers:

    /* A conversion function takes the value it has and stores a pointer to
       the converted value; it returns 0 on success. */
    typedef int (*conv_func)(void *, void **);

    /* Row: the type you have; column: the type you need. NULL marks a pair
       without a conversion rule (including the diagonal). */
    conv_func conv_matrix[MAXTYPE][MAXTYPE] = {
        { NULL,   t1tot2, ..., t1totN },
        { t2tot1, NULL,   ..., t2totN },
        ...
        { tMtot1, tMtot2, ..., NULL },
    };

    int convert(int ti1, void *vi1, int ti2, void **vi2)
    {
        conv_func cnv = conv_matrix[ti1][ti2];
        if (cnv != NULL)
            return (*cnv)(vi1, vi2);
        return -1; /* no conversion rule */
    }

In C++ one can do the same, or one can use polymorphism to make the process more obvious. C++ even has its own rules of implicit type conversion using cast operators, which could be used to some extent. In Java the process is mostly the same as in C++, but function pointers are not available. The above example does not show how concatenation and strategic steps can be used to convert between distant types.


In order for conversion rules to be used, a receiver first has to know what data type he has in a given message; in other words, the receiver needs to know the message element type (MET) of any given message element instance (MEI). Only then can the receiver know whether or not the type needs to be converted. Implementable Technology Specifications (ITS) of this type system therefore must make sure that the receiver has all the data type information he needs. This is most simply achieved by sending explicit data type information with every MEI.

The XML encoding designed in summer '98 and used in the '99 HIMSS demo, for example, uses an XML attribute "TY" and mentions the data type as the value of the TY attribute. For instance, two MEIs, one for a simple integer number and one, foo, for a ratio of a float and an int, could appear in a message, each carrying its data type in a TY attribute.

The receiver might expect foo to be a floating point value. Using the conversion rule "convert numerator and denominator to floats and then build the quotient" [p. 23], the receiver can convert the type he has to the type he needs.

Mark Tucker's rule of minimal explicitness states that you only need to send TY attributes at places where the actual type used deviates from the specification. However, deciding that puts a lot of responsibility on the sender's side. It is therefore safer to always send TY attributes. For the HIMSS demo we simply made it the rule that the sender must supply explicit data type information in TY attributes.

When generic types are used, the TY value only specifies the generic type. The type of the parameters is found where the value of that type is expected to be. Thus, regardless of what is otherwise decided, TY attributes are always required for the parameterized components of generic types.

Conversion rules must be carefully validated to prevent surprises. For example, suppose we had a generic data type "QualifiedInformation" that would allow adding some coded qualifier to any other value.
The conversion rule would say: whenever you need a T and you get a qualified T, just take out the value part and do not consider the qualifier part. Now suppose that one of the qualifiers, "NOT", existed for negation. What would happen if a message element instance contained the value PNEUMONIA qualified by NOT?

While the sender would mean that the "main concern" is not pneumonia, the receiver would understand just the opposite! This shows that conversion rules have to be specified with great care. In this case, conversion to simply pneumonia should be prohibited, i.e., the conversion routine would either return no value or raise an exception.

1.2.8 Literals

In the example type conversion matrix [p. 23] many special conversions exist between Character String [p. 40] and the other types. This is because we want to define concise and nice-looking string literals for many of the data types, whether primitive or complex. String literals can be used in XML, for instance, to make the message more compact and human-readable.

Literals can be used to specify data type instances in character-oriented encoding rules. It is good to have a single standardized form of literals to be used by different ITSs. Literals are useful for many ITSs, not just XML. For instance, SIGOBT used character representations of most data types in their v2.x mapping of HL7.

Literals are not only useful in inter-system messaging but also when we discuss the design of HL7 messages on a blackboard or in e-mail. Literals are much handier than structured instance notations, such as XML. The guideline for the specification of literals is that literals are to be concise and easily understandable by humans.

1.2.9 Instance Notation

For the purpose of discussion, and to be able to show examples of data types, we will use an instance notation that is both readable and concise.
We do not use XML as an instance notation since XML is just too verbose: writing XML on a blackboard takes too much time, and the XML markup distracts the human eye from the real information to be conveyed in the example.

Our notation is borrowed from Common LISP and Scheme, a syntax also used in the XML world (DSSSL).

This instance notation has only seven idioms:

1. Atomic values (numbers, strings, symbols) are written in the usual character representation. Atomic values are separated by spaces, unless the spaces are contained within double quotes. For example


    1234.45          the number 1234.45
    "hello world"    a string
    foo              a symbol

2. Composite values start with an opening parenthesis and end with a closing parenthesis.

    ( ... )

3. Composite values may contain atoms or other nested composites.

    (foo :bar (nest :baz))

4. Composites always start with a symbol that denotes the data type of that composite value. In the example above, foo would be the symbol of the data type.

5. After the type symbol, composites contain keyword-value pairs. Keywords are symbols that start with a colon (e.g., :bar). For example

    (CodeValue :value "100.0"
               :codeSystem "ICD-9")

would be a Code Value [p. 116] representing the ICD-9 code 100.0 for Leptospirosis icterohemorrhagica.

6. Symbols that start with a pound sign have special meaning. For instance, #true and #false would be the two values of the Boolean [p. 28] type.

7. Collections [p. 14] are composite expressions whose first symbol denotes the kind of collection (i.e., SET, LIST, or BAG). After the collection type symbol the elements of the collection are enumerated. For example,

    (SET apple orange banana)

a set of fruits, cardinality 3.

    (LIST orange apple banana)


the list of fruits ordered by how much I like them, length 3.

    (BAG 3 apple 2 orange 5 banana)

the shopping bag containing 3 apples, 2 oranges and 5 bananas, size 10, cardinality 3. Note that the bag notation uses alternating count-item pairs.

The beauty of this instance notation is that it can be completely defined by just a few simple rules. Moreover, the examples can usually be understood without the reader having to actively master the rules.

1.2.10 Typus typorum: Boolean

Let's define the first real data type, a primitive type to start with. Which type could be a better starter than the Boolean type, the type of all types? A Boolean value can either be true or false. The Boolean is the smallest quantum of all information (1 bit), and yet all digital information is based on it. While Boolean values are the very basic values of all digital information processing machinery, the Boolean data type is useful even in the highest sphere of abstract data analysis. The Boolean type embodies the axioms of logic. This is a universality that only the Boolean type has.

The Boolean type is defined as follows:

Boolean
  The Boolean type stands for the values of two-valued logic. A Boolean value can be either true or false.
  PRIMITIVE TYPE

Use cases for the Boolean type are all RIM attributes with the "attribute type" suffix "_ind" (indicators).

HL7's position on Booleans used to be that of an ID data type with a special table that included only the values "Y" and "N". Since the follow-up data type for ID is Code Value [p. 116], we could continue to serve the use case for Booleans with Code Value [p. 116] constrained to the "Y/N" table.

The reason not to continue with this habit is that Booleans are so universally useful and are, after all, the simplest data type of the universe.
Boolean information items exist and are useful on virtually all levels of abstraction, so defining an explicit Boolean data type for HL7, to be used for all "indicators", would be a move toward simplicity. It is much easier to use Booleans in program decisions, as the following example in a fictive programming language shows:


    VAR
      X : BOOLEAN;
    ...
    IF X
    THEN
      (* X is true *)
    ELSE
      (* X is false *)
    END IF;

By contrast, dealing with an arbitrary Code Value [p. 116] requires first checking whether the code table used is the Y/N table; then you would have to treat every possible case, including the case that the given value is neither "Y" nor "N" (because there is no guarantee that the Y/N table never changes, see below).

    VAR
      X : CodeValue;
    ...
    IF X.codeSystem == CodeSystem.Y_N_TABLE
    THEN
      IF X.value == "Y"
      THEN
        (* X is true *)
      ELSE
        IF X.value == "N"
        THEN
          (* X is false *)
        ELSE
          (* EXCEPTION: X is neither true nor false *)
        END IF;
      END IF;
    END IF;
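The contrast between the two fragments above can be sketched in Python. The names YN_TABLE, CodeValue, to_boolean, and from_boolean are hypothetical illustrations, not HL7-defined APIs. The conversion from a Y/N Code Value to a Boolean raises an exception for any value other than "Y" or "N", mirroring the EXCEPTION branch of the second fragment, while the reverse conversion maps a Boolean onto the Y/N table as described below.

```python
from dataclasses import dataclass

# Hypothetical identifier for the Y/N table; not an official HL7 code system name.
YN_TABLE = "Y_N_TABLE"


@dataclass(frozen=True)
class CodeValue:
    """Minimal stand-in for a Code Value: a code plus the code system it comes from."""
    value: str
    code_system: str


def to_boolean(code: CodeValue) -> bool:
    """Convert a Y/N-table Code Value to a Boolean, refusing anything else."""
    if code.code_system != YN_TABLE:
        raise ValueError(f"not a Y/N code: {code.code_system!r}")
    if code.value == "Y":
        return True
    if code.value == "N":
        return False
    # A third code sneaked into the table breaks two-valued logic.
    raise ValueError(f"neither true nor false: {code.value!r}")


def from_boolean(flag: bool) -> CodeValue:
    """The reverse conversion: a Boolean maps onto the Y/N table."""
    return CodeValue("Y" if flag else "N", YN_TABLE)


# With a real Boolean, the decision is a plain if/else.
deceased = True
print("dead" if deceased else "alive")

# With a Code Value, every branch, including the exceptional one, must be handled.
print(to_boolean(CodeValue("N", YN_TABLE)))
try:
    to_boolean(CodeValue("B", YN_TABLE))
except ValueError as err:
    print(err)
```

Note that the sketch deliberately defines no conversion between integers and Booleans.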


Why would we not want to use Boolean data types?
For backwards compatibility with the Y/N table?
Because Technical Committees might want to refine the table later?

Backwards compatibility with v2.x has never been (and should not be) the major issue for design decisions for v3.0. However, through type conversions we can actually allow for backwards compatibility. Thus, a Boolean would convert to a Code Value [p. 116] by using the Y/N table. Any Code Value [p. 116] with the coding system set to the Y/N table can be converted to a Boolean.

Note: We should, however, not define a conversion from Integer Number [p. 133] to Boolean on the basis of 0 = false, 1 = true. While the Y/N table's semantics is clearly to represent Boolean values, the mapping of Booleans to numbers is not suggested by semantics, nor is the mapping style determined by semantics (e.g., one could just as well map false to -1 and true to 0, or false to 0 and true to non-zero).

Some people might think that using the Y/N table to capture Boolean semantics is more flexible, because one could later extend the table to cover other (exceptional) values. For instance, some might want to add the value P for "perhaps" and U for "unknown". Those two extensions to the Y/N table can be called "generally applicable", since they are conceivably valid for all cases where the Y/N table is used.

The programming example above shows why you just do not want to extend a table used as a replacement for Booleans. Relying on Booleans means relying on one of the fundamental axioms of logic (tertium non datur); sneaking a third code into the Y/N table would render this axiom of logic invalid, which means that every if ... then ... else ... statement would have to mutate into a case ... of ... otherwise ... statement.

Those "generally applicable" extensions of the Y/N table are not just a bad idea, they are also not necessary in the context of this data type proposal.
The value "perhaps" is covered by all the mechanisms to define uncertainty [p. 155], and the "unknown" exception is covered by the method to handle incomplete information [p. 31].

Other people might still think that the Y/N table should be used to allow for subsequent extensions. An example might be the patient death indicator, where Y/true means the patient is dead and N/false means that the patient is alive. Now, one could make the case that a patient, after the diagnosis of "brain death", might be kept in a vegetative state until some organ transplantation. This would be a status between life and death that falls neither into the category of uncertainty nor into that of incomplete information. So, one might need to extend the Y/N table by "B" for "brain death".


Clearly, such extensions of the Y/N table could be made only at one point of use of the Y/N table, e.g., only the death indicator would use the Y/N table extended by "B" for "brain death". This means that the death indicator would no longer be defined as a code from the Y/N table, but from a "death code" table. According to the MDF, the attribute type suffix "_ind" would have to be changed to "_cd".

If "death indicator" had been defined as a Boolean in version 3.0 and later had to become a code of table "death code", one could either simply change the data type definition between versions or, instead, add another field, such as "death detail status", that applies if "death indicator" is true. Those changes in the use of the field require RIM changes regardless of whether we used the Boolean data type or not.

If nothing else, a Boolean data type could help sharpen the analytic work of the committees, because it would be absolutely clear whether or not there can be other values aside from the two opposites represented by true and false.

1.2.11 Incomplete Information

In v2.x we had the special values not present (||) and null (|""|) that could be sent instead of any other value in almost every field in a message. The semantics of those special values were twofold: (1) not present expressed that information was missing; (2) null was able to remove existing information at the receiver's side, so that this information was missing afterwards. We will factor this "update" component out into update semantics [p. 33] below. Here we only deal with the representation of incomplete information.

After having defined the Boolean, the type that underlies all information, we now define a data type called "No Information" as follows:

No Information
  A No Information value can occur in place of any other value to express that specific information is missing and how or why it is missing.
  This is like a NULL in SQL but with the ability to specify a certain flavor of missing information.

  component name   type/domain                   optionality   description
  flavor           Concept Descriptor [p. 122]   optional      The flavor of the null value. Can be interpreted
                                                               as the reason why the information is missing.

The "flavor" of the null value can be interpreted as the reason why the information is missing. For the time being we keep the list of possible flavors of null open for discussion. Reported numbers of different flavors of null values range between 1 (SQL) and 70 (reported by


Angelo Rossi-Mori).

If No Information flavors are to be used in a standard way, we will have to define a canonical systematization of flavors of null.

For example, Stan Huff's CE proposal contains the following null values:

  U      unknown             no information at all, i.e., nothing more is known about the circumstances of
                             missing information.
  UASK   asked but unknown   the person asked could not supply the information (why?)
  NAV    not available       the person asked has the information somewhere, but it is not available right now
                             (e.g., "oh, I wrote down what the doctor said last time, but I didn't bring this
                             piece of paper with me").
  NA     not applicable      e.g., an answer to "gestational age" for a patient who is not pregnant.
  NASK   not asked           the person who should collect that information forgot to ask.

The above example list provides no assurance of being complete or sufficient, and it does not attempt to systematize the many possible flavors of null. It serves here as an example to show what such flavors of null can comprise. Now that we have defined a fairly general data type for no information, and since we factored update semantics out into its own method, the issue of a canonical taxonomy of null values is less important. In most cases, all that people need is a No Information value without the flavor component.

For example, suppose the patient's date of birth is requested and we don't know it because the patient does not remember it. In that case we could send:

    (Patient
      :date-of-birth (NoInformation
                       :flavor (CV :value "UASK"
                                   :codeSystem "SHNULLS")))

In this example instance notation we will use the symbol #null as equivalent to (NoInformation) without a flavor.
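A small reader for the instance notation of section 1.2.9 makes its rules concrete. The sketch below is an illustration, not part of this specification: the tokenizer, the dictionary representation (with the type symbol stored under a hypothetical "type" key), the restriction of SET to atomic elements, and the mapping of #true, #false, and #null are assumptions chosen for brevity. Applied to the date-of-birth example above, it yields nested dictionaries.

```python
import re
from collections import Counter

# A token is a parenthesis, a double-quoted string, or a run of other non-space characters.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse_atom(tok):
    """Rules 1 and 6: strings, numbers, #-symbols, and plain symbols."""
    if tok.startswith('"'):
        return tok[1:-1]                        # strip the double quotes
    if tok == "#true":
        return True
    if tok == "#false":
        return False
    if tok == "#null":
        return {"type": "NoInformation"}        # #null stands for (NoInformation)
    if re.fullmatch(r"[+-]?\d+", tok):
        return int(tok)
    if re.fullmatch(r"[+-]?\d*\.\d+", tok):
        return float(tok)
    return tok                                  # a symbol, including :keywords


def parse(tokens, pos):
    """Parse one expression at tokens[pos]; return (value, next position)."""
    if tokens[pos] != "(":
        return parse_atom(tokens[pos]), pos + 1
    head = tokens[pos + 1]                      # rule 4: the type symbol comes first
    pos += 2
    items = []
    while tokens[pos] != ")":
        value, pos = parse(tokens, pos)
        items.append(value)
    pos += 1                                    # consume the closing parenthesis
    if head == "SET":                           # rule 7; assumes atomic (hashable) elements
        return frozenset(items), pos
    if head == "LIST":
        return items, pos
    if head == "BAG":                           # alternating count-item pairs
        bag = Counter()
        for count, item in zip(items[0::2], items[1::2]):
            bag[item] += count
        return bag, pos
    node = {"type": head}                       # rule 5: keyword-value pairs
    for key, value in zip(items[0::2], items[1::2]):
        node[key.lstrip(":")] = value
    return node, pos


def read(text):
    value, _ = parse(TOKEN.findall(text), 0)
    return value


example = """(Patient
  :date-of-birth (NoInformation
                   :flavor (CV :value "UASK"
                               :codeSystem "SHNULLS")))"""
print(read(example))
```

Representing the bag as a Counter keeps both measures from section 1.2.9 at hand: the sum of the counts is the bag's size, and the number of keys is its cardinality.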


Note that No Information is formally a composite data type, although it has but one component. We will list No Information under the category "primitive" anyway, since it is so fundamental to our type system.

1.2.12 Update Semantics

Update semantics deals with the problem of what a receiver is supposed to do with information in the message. That information may be equal to prior information in the receiver's database, in which case no questions arise. But what if the information is different?

We can categorize the modes of updates in the following taxonomy:

1. IGNORE: Ignore the value altogether.
2. VERIFY: Verify whether the value supplied matches the prior value. If the values do not match, raise an exception.
3. REPLACE: Replace the value in the database with the new value supplied in the message. Replace operations may be of two kinds:
   1. REPLACE VALUE: Change an old value to a new value.
   2. DELETE: Change an old value to a No Information [p. 31] value (i.e., a null value).
4. EDIT COLLECTION: If the data is of some collection type, we can change the collection in specific ways depending on the kind of collection:
   1. A set can be updated in one of the following ways:
      1. include elements: build the union of the set and another set.
      2. exclude elements: build the difference of the set and another set.
   2. A list can be updated in one of the following ways:
      1. add element
         1. append
         2. prepend
         3. insert at given position
         4. insert at element with given value
            1. before
            2. after
      2. replace (either replace with new value, or set to no information)
         1. by position
         2. by value
            1. first occurrence
            2. last occurrence
            3. n-th occurrence
            4. all occurrences
      3. delete element entirely, changing the positions of all other elements after the deleted one


         1. by position
         2. by value
            1. first occurrence
            2. last occurrence
            3. n-th occurrence
            4. all occurrences
   3. A bag can be updated in one of the following ways:
      1. include elements: build the union of the bag and another bag.
      2. exclude elements: build the difference of this bag and another bag.
      3. exclude all elements of one kind: e.g., if a bag contains 5 apples and 3 oranges, you could exclude all oranges without having to know that you actually remove 3 oranges.

In principle, the update mechanism will send an update action code along with each message element instance (MEI). The update action code should be part of the MEI meta model.

It turns out that updating a list is the most difficult task, since positions are relevant in the list. The problem is concurrent updates: you never know exactly what the list looks like in the receiver's database when your update message is being processed. For example, if you think the list is (LIST A B C) and you want to insert an element D to come before C, you may send an update expression

    (INSERT-AT 3 'D)

to insert D at position 3 (and shift C to position 4). However, if someone rearranged the list to (LIST C B A) just before your update message arrives, the receiver would insert D between B and A, and you would cause the list to change to (LIST C B D A).

If what you really wanted was to insert D before C, you should have sent the update expression

    (INSERT-BEFORE 'C 'D)

which, at the receiver's side, would update (LIST A B C) to (LIST A B D C), but also (LIST C B A) to (LIST D C B A).

The sender of an update message has to be very sure whether he wants the new element to appear at a particular position within the list or in a particular sequence relationship with another element of the list. Concurrent edits to the same data at the receiver's side can render the sender's assumptions invalid.
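The difference between the two update expressions can be checked with a small simulation. The functions insert_at and insert_before below are hypothetical helpers modeled on the (INSERT-AT ...) and (INSERT-BEFORE ...) expressions above, using 1-based positions as in the text; they are a sketch, not a defined HL7 interface.

```python
def insert_at(items, position, new):
    """(INSERT-AT position new): 1-based; later elements shift back by one."""
    result = list(items)
    result.insert(position - 1, new)
    return result


def insert_before(items, anchor, new):
    """(INSERT-BEFORE anchor new): place new directly before the first anchor."""
    result = list(items)
    result.insert(result.index(anchor), new)  # ValueError if anchor is absent
    return result


expected = ["A", "B", "C"]     # what the sender believes the list looks like
concurrent = ["C", "B", "A"]   # what the receiver actually holds after a reorder

for state in (expected, concurrent):
    print(state,
          "INSERT-AT:", insert_at(state, 3, "D"),
          "INSERT-BEFORE:", insert_before(state, "C", "D"))
```

A design note on the sketch: insert_before fails loudly when the anchor element has been removed concurrently, which surfaces a violated sender assumption instead of silently placing D somewhere unintended.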


Conversely, with sets concurrent updates are not a problem at all, because the only thing to do with a set is adding or removing values to and from the set, which is independent of the prior contents of the set. For example, if you add a telephone number to a set of telephone numbers, it doesn't matter whether or not that telephone number is already known, since there are no duplicates of the same value in a set. Likewise, if you remove a bad telephone number from the set, you can do so no matter whether the number was an element of the set before. Also, there is no ordering that could get messed up and nothing to assume before the update, so no assumptions can be invalidated through concurrent updates.

Updating a bag is equally straightforward. If you want to add 2 apples to the bag, you do that without having to know how many apples were there before. If you want to remove 3 oranges, you can do that no matter how many oranges were there before. Note that removal of items from a bag does not mean here that you want to get hold of those items; you just want them to disappear from the bag. Thus, if there are no more oranges left in the bag to be removed, your removal request is satisfied without changes.

For the technical committees this means that list semantics should only be chosen for a collection if the order really matters semantically from the perspective of pure abstract application logic. If the order is probably not important enough to justify the headache around concurrent updates, the committee should choose the set or bag flavor.

Selecting set and bag semantics should always be encouraged. A set is often exactly the right kind of collection from the perspective of pure abstract application logic.
Most collections, in practice, are sets, while bags are quite rare.

If the collection element type is a class, such as Condition_node, and a ranking is important, the ranking could be represented explicitly by a ranking number rather than by implying list semantics on some association, even though it is possible in UML to assume list semantics of an association.

Also note that there are partially ordered collections that often capture the application logic much better than totally ordered lists. Partially ordered collections are collections where elements may have the same ranking, so that you cannot always decide whether one element has higher rank than another.
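The independence of set and bag updates from the receiver's prior contents can be illustrated with Python's built-in set operators and collections.Counter, whose in-place addition and subtraction keep only positive counts. This is a sketch, not an HL7 interface: it assumes that "exclude" on a bag removes at most the items actually present, as the text above describes, and the telephone numbers are made-up sample data.

```python
from collections import Counter

# Set updates are union (include) and difference (exclude), so the sender needs
# no knowledge of the receiver's prior contents. The numbers are made-up samples.
phones = {"555-0100", "555-0101"}
phones |= {"555-0101"}   # including an already-known number changes nothing
phones -= {"555-0199"}   # excluding a number that is not there is harmless too

# Bag updates work on per-item counts.
bag = Counter(apple=5, orange=3, banana=2)
bag += Counter(apple=2)    # include 2 apples, whatever was there before
bag -= Counter(orange=4)   # exclude 4 oranges; only 3 exist, so oranges disappear
del bag["banana"]          # exclude all bananas without knowing how many there were
del bag["pear"]            # Counter's del ignores missing items, so this is a no-op

print(sorted(phones))
print(bag)
```

None of these operations needs a position or a prior count, which is why concurrent edits cannot invalidate the sender's assumptions the way they can for lists.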


2 Text

2.1 Introduction

All information can be expressed by sequences of bits; this is the fundamental discovery that started the era of digital information processing. Written text consists of characters, and characters are themselves expressed as sequences of bits. Eight consecutive bits are called octets or bytes. Although we usually identify one byte with one character, this identification is not an eternal law of nature, and we have to distinguish bytes from characters.

The ease with which we express characters as bytes and bytes as characters is due to the success of the American Standard Code for Information Interchange (ASCII) [ANSI X3.4]. Most computers interpret bytes as characters according to the ASCII code. But this does not mean complete peace of mind. On the one hand, although ASCII is by far the most important character code, there is another one: EBCDIC.

On the other hand, ASCII does not define sufficient characters to meet the needs of non-English languages. ISO 8859-1 (Latin-1) defines an international extension to the ASCII code that fits most Western European languages that use Roman characters. However, there are numerous other such extensions. And there are numerous other languages, including Greek, Russian, and Japanese.

We cannot even count on one character being expressible in one byte, as we learn from Japanese and Chinese character sets that have far more characters than could be enumerated with just 8 bits.

The solution to the Babylonian coding chaos seems to be the Unicode standard [ISO/IEC 10646, Unicode (http://www.unicode.org/)].
Unicode is a character set that aims to cover all languages of the world, with even the rarest being added in upcoming versions of Unicode.

Unicode seems to be accepted in all major language communities, including America and Western Europe, Russia, and the three countries China, Korea, and Japan that were so often left alone with their character coding problems. China, Korea and Japan have submitted to Unicode a jointly compiled unified character set, called "Han", which includes more than 20,000 characters. Of course, that many characters cannot be enumerated with only 8 bits; thus, one Unicode character uses more than one byte.

2.1.1 From Characters to Strings

While most programming languages define data types for single characters, HL7 messages have not used single characters, as opposed to character strings, in the past and probably will not do so in the future. A single character is at too low a level of abstraction. There is no clinical or administrative information expressed in one character that stands for itself. There are single


character codes, such as the "sex code" consisting of the symbols "M" for male and "F" for female. Those characters "M" and "F", however, do not stand for themselves but for some other meaning. Therefore we will not need a data type for single characters.

2.1.2 Display Properties

A character code like ASCII, ISO 8859, or Unicode codifies only characters, i.e., the basic graphemes from which written language is constructed, regardless of the style variants of characters. Often we are only interested in transmitting the semantics of a few words or sentences. But sometimes we want to enhance the expressiveness of text through an altered appearance of characters. One can modify font family (e.g., Times Roman, Helvetica, Computer Modern), font style (e.g., roman, italics, bold), font size (e.g., 8 pt, 10 pt, 12 pt), alignment (e.g., subscript, superscript) or any other display properties.

The question is: for what use cases do we need only plain character strings, and when do we need control over the appearance of the characters?

When a data field contains only one or a few words, we will probably not need control over appearance. However, who is to say how many words may appear in a given data element of type string? And what is the exact limit of words that do not require formatting? Clearly the length of a character string is no good criterion for whether formatting is required or not.

Instead we need to look at fine semantic nuances to find the answer: a string that encodes a value from a code table (e.g., "M" or "F") will not need formatting. A string that encodes a person's first name or address will not need formatting either. This information (code symbols, person names, or addresses) is fully conveyed by the characters alone. To make this clearer:
I always refer to the same city Indianapolis, regardless of whether I write its name in bold letters, in italics, underlined, or with any combination of those or other display properties.

Conversely, controlling the appearance of text will be useful in those data elements whose purpose is to be shown to human users. Even with only two words, we sometimes want to emphasize one word by underlining or emboldening it. There is no reason to prevent formatting for those data elements that are placeholders for free text. Thus we have to distinguish between formalized information and free text to find out when we need control over appearance.

2.1.3 Encoding of appearance

The format of a text can be encoded in three different ways:

1. through deploying certain intrinsic features of the underlying character code,
2. through specially reserved positions in the underlying character code, or
3. through escape sequences.


Ad 1: The ASCII control character number 8 ("backspace") can be used to overstrike an already printed letter. Thus one can print the same letter two or three times to yield an emboldened appearance on a simple typewriter or dot matrix printer. One can also print the underbar character over the previous letter to yield the effect of underlining. There are simple software programs that emulate the behavior of a typewriter to render this kind of simple formatting. For example, the UNIX "more" utility used to display online manual pages emulates a typewriter, and some terminal devices have this emulation built in.

Ad 2: Many text processors use other control characters in non-standard ways to encode the formatting of the text. For example, if you look at the raw file of a WordPerfect text, you will find the words and characters interspersed with control characters that obviously encode the style of the text. The problem with this approach is that it is proprietary and not standardized.

Ad 3: Escape sequences are used by various printers and terminals. Originally, those were control sequences separated from the normal text by a leading ASCII character number 27 ("escape"), hence the name "escape sequence". But escape sequences have since been used in many different styles. In C string literals, troff, TeX and RTF we see the backslash character (\) introducing escape sequences. Troff has a second kind of escape sequence, started by a period at the beginning of a new line. HL7 version 2 also uses the backslash at the beginning and end of escape sequences.
SGML uses angle brackets to enclose escape sequences (markup tags), but in addition there are other kinds of escape sequences in SGML, opened with the ampersand or percent sign and closed with a semicolon (entity references).

From the many choices to encode formatted text, HL7 traditionally used a few special escape sequences and troff-style formatting commands. Those HL7 escape sequences have the disadvantage that they are not very powerful and are somewhat arcane, or at least outdated by more recent developments. HTML has become the most widely deployed text formatting system, available on virtually any modern computer display. HTML has been designed to be simple enough to allow rendering in real time. Thus HTML seems to be the format of choice to transmit style-enhanced free text.

A considerable group of HL7 members also pursue using SGML or XML to define text, although the purpose of using general SGML or XML is slightly different from that of using HTML. Whereas HTML is used to control the logical appearance of text, SGML is another way to structure information. Thus HL7 will use SGML as one of its message presentation formats. SGML in free text fields is so powerful and general that it comes with the risk of not being interoperable. However, we might want to allow for it in special circumstances.

It will be difficult to limit the HL7 standard to just one of the possible alternative encodings of appearance. There is an issue of backwards compatibility that requires keeping the nroff-style formatting of HL7's FT data type. There is a tremendous and reasonable demand for supporting HTML, and we should not exclude general SGML and XML up front, despite the concerns for interoperability.
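The backspace overstrike technique described under Ad 1 can be sketched in Python to show how appearance rides on top of the characters. The helpers embolden, underline, and strip_overstrike are hypothetical illustrations, not part of any HL7 data type; strip_overstrike is a simplified model of what a plain-text consumer does when it discards typewriter-style formatting: the characters survive, the appearance is lost.

```python
BS = "\b"  # ASCII control character number 8, "backspace"


def embolden(text):
    """Print each letter, back up, and print it again (typewriter bold)."""
    return "".join(ch + BS + ch for ch in text)


def underline(text):
    """Print the underbar, back up, and print the letter over it."""
    return "".join("_" + BS + ch for ch in text)


def strip_overstrike(encoded):
    """Recover the plain characters: a backspace cancels the preceding character,
    so only the last character struck at each position survives."""
    out = []
    for ch in encoded:
        if ch == BS:
            if out:
                out.pop()
        else:
            out.append(ch)
    return "".join(out)


bold = embolden("Indianapolis")
print(len(bold), len("Indianapolis"))        # three code units per visible letter
print(strip_overstrike(bold))
print(strip_overstrike(underline("Indianapolis")))
```

The same city name comes back in both cases, which is the point made in section 2.1.2: for a string like a place name, the characters carry all of the meaning.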


There are, in principle, two ways to support the multiple encodings of appearance. Either we define multiple data types, one for old FT, one for HTML and one for general SGML/XML, or we define one data type that can contain formatted text in variable encodings.

Defining multiple data types has the disadvantage that we need to decide at design time for one of those alternatives whenever a free text data element is defined. This decision cannot be changed at the time an individual message is constructed. In other words, technical committees would have to decide to use the old FT type here, the HTML data type there, and a simple TX type for yet another free text attribute. There is hardly any rationale for such a decision at design time of the standard.

Thus, the irrationality and inflexibility of defining multiple data types for free text seem to outweigh the conceivable advantage that a special data type might accommodate the intrinsics of some special encoding formats in greater detail and accuracy. Therefore, we define only one flexible data type for free text that can support all the techniques for encoding the appearance of free text.

2.1.4 From appearance of text to multimedial information

Being able to format the appearance of free text adds a great deal of expressiveness. But having control over the graphical appearance of text raises the question whether graphics, drawings and pictures should not also be considered part of free text, for "a picture says more than a thousand words". In human written communication, especially in business and science, we often use drawings to illustrate the points we make in our words. The technology to do these things on computers is available; HL7 only has to support it.

Another use for multimedial information is that it is the only way to capture the state of a text that precedes its typed form: dictation and handwriting.
An HL7 message that is sent from a radiologist's or pathologist's workplace will usually contain very little written information; rather, the important information will be in dictated form. Again, the technology to capture voice data, to communicate it, and to replay it is available on almost any PC now; HL7 only has to support it.

Two alternatives exist to support multimedial information in HL7. Since HL7 version 2.3, we can use the "encapsulated data" (ED) type. The ED data type is powerful enough to communicate all kinds of multimedial information. The problem is that it is a special data type that can only be used in data fields assigned to the ED data type. Currently none of the HL7 data fields is explicitly assigned to the ED data type, which considerably diminishes ED's usefulness despite its power.

The only way to use the ED type currently is in the variable data type field OBX-observation-value. While this serves the communication of diagnostic data in image or sound form, it is not generally usable. For any multimedial data we want to send via HL7 we have to pretend that it is diagnostic data even if it isn't. If we want to send some descriptive drawing with an order, we have to pretend it's diagnostic data and send it in an OBX. Furthermore,


it is not even clear whether there will be a variable data type in HL7 version 3.

The honest alternative to support multimedial data would be to admit that any free text data can possibly be augmented or replaced by multimedial information. This means we have to allow for multimedial data in any free text field, and thus that free text and multimedia data share the same data type. This is not hard to do, since one flexible data type was already required to accommodate the different encodings of text formats.

2.1.5 Pulling the pieces together

In the previous exploration of the field of text, we separated string data elements, where the raw information of characters is sufficient, from free text, where there is use for formatting the text and for augmenting or even replacing the text with multimedia information. This means that there will be a string data type on the one hand, and a flexible data type that covers free text and multimedial data on the other.

2.2 Character String

The character string data type for HL7 is a primitive data type. We will not define any data type for the character itself because there is hardly any use for single characters in medical informatics. Therefore a character string is a primitive data type in HL7, just as it always used to be.

Character String
  A string of characters where every character used by any language anywhere in the world is represented by one uniquely identifiable entity within the string.
  This type is used when the appearance of text does not bear meaning, which is true for formalized text and all kinds of names.
  PRIMITIVE TYPE

To meet the requirements of international HL7 and the globalization of the health care industry, the new data type Character String is developed with this design goal:

  A character string is a sequence of entities each of which uniquely identifies one character from the joint set of all characters used by any language anywhere in the world, now and forever.

For example, one should be able to send the name of Michio Kimura (chair of HL7 Japan) in Japanese Hiragana script and Latin script as


2.2.1 The UnicodeDRAFTa string of 24 uniquely identified characters without any switching of character sets.2.2.1 The UnicodeThe Unicode (http://www.unicode.org/) is a character code developed and maintained by aninternational consortium. The Unicode contains characters of virtually all contemporary scripts,and assigns a unique code to each one of them. Every character in the Unicode is called a "codepoint". All contemporary scripts fit into the first 65,000 code points. Thus every character can berepresented by a 16 bit number.For example, the string displayed above, would be represented by the following sequence of codepoints:U+307F, U+3061, U+3049, U+0020, U+304D, U+3080, U+3089, U+0020,U+0028, U+004B, U+0069, U+006d, U+0075, U+0072, U+0061, U+002c,U+0020, U+004d, U+0069, U+0063, U+0068, U+0069, U+006f, U+0029Unicode code points are usually written with a leading "U+" followed by 4 hexadecimal digits.16 bits, i.e., 65536 character code points are enough to accommodate the scripts of allcontemporary languages including Latin, Greek, Cyrillic, Armenian, Hebrew, Arabic,Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Thai,Lao, Georgian, Tibetan, Japanese Kana, the complete set of modern Korean Hangul, and aunified set of Chinese/Japanese/Korean (CJK) ideographs. More scripts and characters arecontinuously added.The unified Chinese/Japanese/Korean (CJK) set of ideographs (also called "Han") uses up morethan 20000 character positions which is still less than half of the available positions.Acknowledgments should go to those three peoples of China, Japan and Korea, who made aconsiderable effort of joint standardization work. Given the historical and political problems inthis important corner of the world, this is an almost invaluable achievement. 
Without the unified CJK set, we would have had to reserve space for 60,000 ideographs!

The Unicode will expand its scope further into historical scripts (Egyptian or Sumerian) and into such curiosities as the Klingon alphabet, and the code will then claim another 16 bits. HL7 will not have to support Sumerian or Klingon in any foreseeable future. Therefore one can safely assume that every character can be represented in 16 bits.
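A Python 3 `str` is a sequence of Unicode code points, so it can illustrate this model directly. The sketch below is only an illustration, and the helper names `from_code_points` and `to_u_notation` are made up for it. It rebuilds the example name from the code points listed above and checks three things for this one string: it has 24 characters, each character fits in 16 bits, and any character can be picked out by its position without tracking character-set state.

```python
# Code points of the example name, copied from the list above.
NAME_CODE_POINTS = [
    0x307F, 0x3061, 0x3049, 0x0020, 0x304D, 0x3080, 0x3089, 0x0020,
    0x0028, 0x004B, 0x0069, 0x006D, 0x0075, 0x0072, 0x0061, 0x002C,
    0x0020, 0x004D, 0x0069, 0x0063, 0x0068, 0x0069, 0x006F, 0x0029,
]


def from_code_points(code_points):
    """Build a string in which every element is exactly one character."""
    return "".join(chr(cp) for cp in code_points)


def to_u_notation(text):
    """Write each character in the conventional U+XXXX notation."""
    return ", ".join(f"U+{ord(ch):04X}" for ch in text)


name = from_code_points(NAME_CODE_POINTS)
print(len(name))                              # 24
print(all(ord(ch) <= 0xFFFF for ch in name))  # True: every character fits in 16 bits
print(to_u_notation(name[:4]))                # U+307F, U+3061, U+3049, U+0020
print(to_u_notation(name[4]), name[19])       # U+304D c  (5th and 20th characters)
print(name[8:])                               # (Kimura, Michio)
```

Indexing works on characters, never on bytes or escape sequences. This is the property the next section relies on.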


2.2.2 No Escape Sequences

The most important practical difference from the old v2.x ST data type is that escape sequences are no longer defined on the application layer. This is a great relief for application programmers, and it will reduce many interfacing problems.

In the example sequence of Unicode characters above, one can look at any position in the string and find a character without having to keep track of escape sequences that switch character sets. For example, we can pick the 5th character from the string, which is U+304D, a Hiragana "ki". The 20th character is a Latin "c". We can tell this without having to watch out for character set switching escape sequences.

Again, no escape sequences will be defined for the character string data type on the application layer, neither for switching character sets nor for any other purpose. Notably, the application layer has no idea about the "delimiter characters" used by some Implementable Technology Specifications (ITS). Ignorance of delimiters is a requirement if HL7 is going to support multiple ITSs (e.g., for XML, for CORBA, etc.).

This strong position will greatly improve the robustness of HL7 interfaces. Application programmers need not worry about whether some characters in strings might collide with a delimiter used by some ITS. The application can use vertical bars "|", carets "^", ampersands "&", less-than "<"


2.2.4 <strong>HL7</strong> Applications are "Black Boxes"DRAFTspecial escape sequences, are UTF-8 compliant. UTF-8 uses the highest bit to signalmulti-byte sequences, and thus requires 8 bit clean transport layers.UTF-7, [cf. RFC 2152 (ftp://ftp.isi.edu/in-notes/rfc2152.txt)], is an encoding that uses onlyseven bit on the transport layer. Like UTF-8, UTF-7 is backwards compatible to US-ASCII,with the exception of the plus sign "+" used to signal escape sequences consisting of base64encoded multi-byte Unicode characters.Underneath the application layer specification of <strong>HL7</strong> there is an Implementable TechnologySpecification (ITS). The task of encoding Unicode characters for transport through the wire is, byand large, assigned to the ITS. The software components implementing a certain ITS musttranslate characters from and to bytes using some encoding scheme, such as UTF-8.<strong>HL7</strong> interface toolkits that implement ITSs should deal with uniquely identified character entitieson the application programming interface (API) side and should always produce properencodings on the <strong>HL7</strong> wire. Applications that would use such an <strong>HL7</strong> interface toolkit shouldhave no obligation to deal with character set switching escape sequences or escaping ofcharacters that might interfere with the ITS.2.2.4 <strong>HL7</strong> Applications are "Black Boxes"<strong>HL7</strong> and this data type specification continues to make no assumptions on the internal working of<strong>HL7</strong> applications. Although we make recommendations that will help implement the standard,<strong>HL7</strong> does not specify the internal working of an <strong>HL7</strong> application. A particular implementationmay violate all the rules of distinguishing application layer and transport layer. 
Applications may treat character strings as arrays of bytes, if they so choose, as long as this practice does not change the behavior of the HL7 interface.

If application designers decide to deal with lower-layer issues like character representation on their application layer, they can do so. They select an ITS implementation that does not do the mapping to and from uniquely identifiable character entities for them. Those applications would be HL7 compliant, as long as they do not behave differently on the HL7 wire.

For example, suppose a system SICK-TOS was written 40 years ago as a monolithic PDP-11 assembler program. If this program behaves according to the HL7 specification, it is HL7 conformant. On the other hand, a hyper-modern system SANI-NET would not be compliant with HL7 if it failed every time it received an ampersand "&" character in a message element instance of type character string.

This is more important than it may seem. Suppose the system SANI-NET "supports" two HL7 ITS interfaces, one for XML and one for CORBA. If it receives "&" over CORBA, it should emit "&amp;" on the XML wire. And if it receives "&amp;" on the XML wire, it should emit "&" on the CORBA wire. The easiest way to be HL7 compliant is to separate the application layer from the ITS layer through an application programming interface (API).
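The SANI-NET scenario can be sketched in Python under some explicit assumptions. `XmlIts`, `BinaryIts` and `gateway` are hypothetical names. The XML ITS simply wraps the string in an `<st>` element using the standard `xml.etree.ElementTree` module. The CORBA-like ITS is modeled as plain UTF-8 bytes. The point is the separation of layers: the application handles only the plain string, and each ITS does its own escaping.

```python
import xml.etree.ElementTree as ET


class XmlIts:
    """Hypothetical XML ITS: carries one Character String in an <st> element."""

    def encode(self, value: str) -> bytes:
        element = ET.Element("st")
        element.text = value
        # ElementTree escapes '&', '<' and '>' in text content on output.
        return ET.tostring(element, encoding="utf-8")

    def decode(self, wire: bytes) -> str:
        # The XML parser turns '&amp;', '&lt;', ... back into plain characters.
        return ET.fromstring(wire).text or ""


class BinaryIts:
    """Hypothetical CORBA-like ITS: UTF-8 bytes, no delimiters, no escaping."""

    def encode(self, value: str) -> bytes:
        return value.encode("utf-8")

    def decode(self, wire: bytes) -> str:
        return wire.decode("utf-8")


def gateway(wire: bytes, inbound, outbound) -> bytes:
    """Receive on one ITS, hand a plain string to the application, emit on another."""
    application_value = inbound.decode(wire)  # the application never sees escapes
    return outbound.encode(application_value)


corba_wire = BinaryIts().encode("A & B")
xml_wire = gateway(corba_wire, BinaryIts(), XmlIts())
print(xml_wire)                                  # the '&' now travels as '&amp;'
print(gateway(xml_wire, XmlIts(), BinaryIts()))  # b'A & B'
```

All of the escaping lives in the ITS objects. Adding a third ITS would therefore not touch the application code.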


Again, HL7 does not specify the internal working of applications. Thus, the specification must treat any application as a black box. The only thing the specification may be concerned with is what happens on the HL7 wire. Thus, this data type specification does not even mandate the use of the Unicode. It does not look at how the strings are represented in the application program. All it cares about is described by the following scenario:

Let system S send a message M to system T. Message M contains a character string C in a data element. T promises to store this data element unmodified and to report it back later. Later, system T sends a message M' back to system S containing that data element as a character string C'. Back at system S, the character strings C and C' must be exactly equal. That is, every character c_i at position i in C must be the same character as c'_i at the same position i in C'.

A more concrete example: suppose your system promises to store a data element of type character string and to report that same data element back to me later. My system uses Unicode characters internally, and I send you a Devanagari OM character (U+0950). This character is encoded and sent to your system by the means specified in an ITS. Your system receives that message and does with it whatever it chooses. My system does not care what your system does internally, and the HL7 specification does not care what your system does internally. The HL7 specification claims only this: when your system sends that information back to my system, my system should see the same Devanagari OM character (U+0950) on its application layer.

If my system gets back something else, then either my system's ITS layer implementation is broken or your system is broken.
This is an operational definition of HL7 conformance for character strings. This type specification demands nothing else.

2.2.5 No Penalty for Legacy Systems

We do not require any application to use Unicode characters internally. And, of course, we cannot require every HL7 conformant application to be able to display Kanji, Devanagari or Thai on its user screens.

Applications that can reproduce any character of the Unicode can be called "high fidelity" applications. But this specification does not even require every application to be high fidelity. For instance, your application could choose to transform the German umlauts "Ä", "Ö", "Ü" to "Ae", "Oe", and "Ue", respectively, and it would still conform to this data type specification. This specification makes it quite easy for applications to be high fidelity, without requiring it of every application.
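The round-trip scenario can be written as a small test. This is a sketch under stated assumptions. The receiving system T is reduced to a hypothetical `store` function, and the ITS is assumed to use UTF-8. A receiver that transcribes umlauts fails the high-fidelity check, although it still returns a sensible string.

```python
from typing import Callable

# Hypothetical transcription used by a receiver that is not high fidelity.
UMLAUT_TRANSCRIPTION = str.maketrans({"Ä": "Ae", "Ö": "Oe", "Ü": "Ue"})


def faithful_store(value: str) -> str:
    """A receiver that keeps every character exactly as received."""
    return value


def transcribing_store(value: str) -> str:
    """A receiver that rewrites German umlauts on arrival."""
    return value.translate(UMLAUT_TRANSCRIPTION)


def round_trip(value: str, store: Callable[[str], str]) -> str:
    """S -> T -> S over a UTF-8 ITS; `store` stands in for T's black box."""
    wire_out = value.encode("utf-8")
    kept = store(wire_out.decode("utf-8"))
    wire_back = kept.encode("utf-8")
    return wire_back.decode("utf-8")


def is_high_fidelity(sent: str, received: str) -> bool:
    """C and C' must hold the same character c_i at every position i."""
    return len(sent) == len(received) and all(
        c == c_prime for c, c_prime in zip(sent, received)
    )


om = "\u0950"  # Devanagari OM
print(is_high_fidelity(om, round_trip(om, faithful_store)))                # True
print(is_high_fidelity("Ärzte", round_trip("Ärzte", transcribing_store)))  # False
```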


Legacy systems can comply with this specification, and can even be "high fidelity", without significant changes to their software! UTF-8 encoding makes this possible.

Suppose your application handles 8-bit characters internally and only displays US-ASCII characters. Your application would be conformant to HL7 with any ITS that allows the use of UTF-8 encoding. Any data that originates in your system would use only the US-ASCII character set, which automatically conforms to UTF-8. You may receive data from other applications that contains Unicode characters beyond US-ASCII. Your application will not be able to display those characters sensibly, but it can store them in its database byte by byte. It would later send those same UTF-8 bytes in HL7 messages, so it would be a "high fidelity" application.

Your application may choose to transcribe foreign characters to US-ASCII (e.g., German umlauts to "AE", "OE", and "UE", or "Kimura" in Hiragana to "KI-MU-RA"). It could then display the character strings on US-ASCII terminals. If it transcribes the characters only for display, and keeps the original code in its database, it would still be a high fidelity application.

Your application may instead transcribe the foreign characters as they come in over the HL7 interface. It would then no longer be a high fidelity application, but it could still be compliant with this specification, with the restriction that it could not claim "high fidelity". High fidelity on characters is not so important for end-user systems anyway. It is quite important for data repositories that are to be marketed or used internationally.

High fidelity is possible if you use an ITS with UTF-8 encoding and
1. your communication is 8-bit clean,
2. your database storage is 8-bit clean,
3. you do not use the 8th bit for string delimiters internally, and
4. your screens won't garble up when being sent 8-bit UTF-8 encoded sequences.

For example, the Regenstrief Medical Record System (implemented using VAX BASIC) would do fine with criteria 1 and 2. Its problems would lie with criteria 3 and 4, though, since it uses delimiter characters internally that are selected from the code range between 128 and 255. Furthermore, its screens would probably garble up when being sent UTF-8 bytes of value 128 or greater.

In this case, i.e., if your environment is not fully 8-bit clean, you can use UTF-7 encoding instead of UTF-8. UTF-7 has the same backwards compatibility features as UTF-8, but does not use the 8th bit. So you won't have conflicts with your internal use of the 8th bit, and your communication can strip off the 8th bit if it wants to.
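Python's built-in `utf-8`, `utf-7` and `latin-1` codecs can check the byte-level properties behind these criteria. The sketch below uses an illustrative helper, `uses_eighth_bit`. It shows four things:

1. US-ASCII text is unchanged by UTF-8.
2. Non-ASCII characters in UTF-8 occupy bytes of 128 and above.
3. UTF-7 avoids the 8th bit.
4. An 8-bit clean store that treats bytes as opaque (here modeled by decoding them as Latin-1) returns them unchanged.

```python
def uses_eighth_bit(data: bytes) -> bool:
    """True if any byte has its high bit set (value 128 or greater)."""
    return any(byte >= 0x80 for byte in data)


ascii_text = "Kimura, Michio"
foreign_text = "\u307f\u3061\u3049 Kimura, \u00c4rzte"  # Hiragana, Latin, umlaut

# US-ASCII data is already valid UTF-8: a legacy ASCII system changes nothing.
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

utf8 = foreign_text.encode("utf-8")
utf7 = foreign_text.encode("utf-7")
print(uses_eighth_bit(utf8))  # True: needs 8-bit clean storage and transport
print(uses_eighth_bit(utf7))  # False: safe where the 8th bit is used or stripped

# An 8-bit legacy store that treats every byte as an opaque Latin-1 character
# (criteria 1 and 2) still hands back exactly the UTF-8 bytes it received.
legacy_field = utf8.decode("latin-1")
assert legacy_field.encode("latin-1") == utf8
assert utf7.decode("utf-7") == foreign_text
```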


For Europeans who used ISO Latin-1, backwards compatibility is not as easy as for systems that used only US-ASCII characters. The Unicode itself is backwards compatible with ISO Latin-1, but no Unicode transfer encoding leverages this. In the course of this data type working group, we tried to persuade the Unicode maintainers to adopt a more flexible UTF character encoding. It would have allowed backwards compatibility with Latin-1 and other ISO 8859 character sets. However, we did not succeed: more UTF specifications are not welcome. Notably, it was the European Unicode participants who did not think that such a UTF would be a good idea.

It is the task of the ITS layer software to convert any incoming character encoding into the encoding that the application can handle. Applications are not required to use Unicode internally. An ITS is not forbidden to support other character encodings, such as ISO Latin-1 or the various Japanese character encodings. The ITS layer software would translate the characters into whatever encoding the application software can handle. For example:

For most Java-based applications, the ITS layer would most likely convert the incoming UTF-8 byte format to Java Strings, which use 16 bits per character internally. This is basic functionality of the Java core API.

Most UNIX-based C and C++ character functions treat one character as an int (16 or 32 bits, depending on the CPU's native word size), not as a byte. However, the quick and easy approach in C is to use a char * as a string, which is just an array of 8-bit characters. Many environments, these among them, stick to the equation 1 char = 1 byte. Such an application could choose to use UTF-8 strings internally, where the normal US-ASCII characters are represented as single bytes.
Those applications would tell their ITS software to convert everything to UTF-8.

A very old legacy system might internally use a packed array of char in which a character has only 7 bits, or it might strip off the 8th bit for some other reason. Such a system would tell its ITS implementation to convert incoming characters to UTF-7 instead.

The key issue is that the ITS layer always translates the character encoding according to the encoding of incoming messages and the needs of the application. HL7's scope is the message format only. Even so, we recommend that implementors of ITS layers be aware of this character encoding feature and implement it. What is important is that the notion of different character encodings does not exist on the HL7 application layer. No HL7 specification would be valid that makes any assumptions about character encodings or encoding-related escape sequences on the application layer.
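The following is a minimal sketch of this ITS-side translation. It assumes that Python's codec names stand in for the encodings an ITS would handle. `its_deliver` is a hypothetical helper, not part of any HL7 API. Each application picks the form it wants: native Unicode strings, UTF-8 bytes, or 7-bit UTF-7 bytes.

```python
from typing import Optional, Union


def its_deliver(wire: bytes, wire_encoding: str,
                app_encoding: Optional[str]) -> Union[str, bytes]:
    """Hypothetical ITS step: decode the incoming bytes, then re-encode them
    for the application.

    app_encoding=None models an application with native Unicode strings
    (such as Java's String); "utf-8" suits char*-style C programs and
    "utf-7" suits legacy systems that cannot use the 8th bit.
    """
    text = wire.decode(wire_encoding)  # the application layer never sees this step
    return text if app_encoding is None else text.encode(app_encoding)


# "Übelkeit" arriving in ISO Latin-1, one byte per character.
incoming = "\u00dcbelkeit".encode("iso-8859-1")

as_string = its_deliver(incoming, "iso-8859-1", None)
as_utf8 = its_deliver(incoming, "iso-8859-1", "utf-8")
as_utf7 = its_deliver(incoming, "iso-8859-1", "utf-7")

print(len(as_string))       # 8 characters
print(as_utf8)              # b'\xc3\x9cbelkeit'
print(max(as_utf7) < 0x80)  # True: no byte uses the 8th bit
```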


2.2.6 Unicode and XML

Using the Unicode with an XML-based ITS is the most natural thing to do. XML is itself aware of the Unicode, and its encodings UTF-8 and UTF-16 are required features of every XML parser. In fact, the XML concept of characters served as a model for this HL7 data type specification. The XML specification (http://www.w3.org/TR/1998/REC-xml-19980210) states:

2.2 Characters
A parsed entity contains text, a sequence of characters, which may represent markup or character data. A character is an atomic unit of text as specified by ISO/IEC 10646 [ISO/IEC 10646]. Legal characters are tab, carriage return, line feed, and the legal graphic characters of Unicode and ISO/IEC 10646. [...]
[...]
The mechanism for encoding character code points into bit patterns may vary from entity to entity. All XML processors must accept the UTF-8 and UTF-16 encodings of 10646; the mechanisms for signaling which of the two is in use, or for bringing other encodings into play, are discussed later, in "4.3.3 Character Encoding in Entities".
XML 1.0, 2.2 Characters (http://www.w3.org/TR/1998/REC-xml-19980210#charsets)

Since XML uses the Unicode internally, there is no need and no way to specify different character encodings in different sections of an XML-based HL7 message. The Unicode and XML do not interfere with each other in any way. Thus the requirements on character strings stated here are no obstacle to using XML.

2.3 Free Text

There will be only one data type for free text, to cope with the various formats for encoding appearance. This type will have essentially two semantic components: it will (1) contain the free text data and (2) specify the application that can render that free text data. A media type code specifies the application that renders the data, similar to the Internet MIME standard [cf. RFC 2046 (ftp://ftp.isi.edu/in-notes/rfc2046.txt)] or HL7 v2.3's ED data type.
The only problem is what data type to use for the free text data.

Some formatted text could be defined on top of string data. The Unicode is backwards compatible with ASCII and ISO Latin-1. Therefore three kinds of formatting are possible on top of Unicode strings: simple typewriter-style formatting, the troff escape sequences that were used by HL7's old data type FT, and HTML/SGML formatting. In addition to the string data, we have to indicate the formatting method that the receiver should use to render a given string correctly.


Most proprietary text formatting tools, however, do not fit into the character string. Those applications use their own proprietary byte encoding of characters and their display properties. Proprietary word processor files and multimedia data are best regarded as opaque sequences of bits (or bytes). Special application software that understands the given stream of bits renders them. For those, we need to go back behind the character strings to raw bits and bytes.

There seem to be two options. Either we consider it the task of the ITS layer (the encoding rules) to support the communication of raw byte data, or we encode raw bytes in strings using the base64 encoding.

The traditional HL7 encoding rules were unable to encode raw bytes, so raw data had to be sent on top of character strings. This, however, is wasteful for encoding rules and transport channels that can send and receive raw bytes easily. Given our definition of a Character String [p. 40], it is wasteful to first construct character strings from bytes, only to transform the character strings back to bytes.

It therefore seems reasonable to define a data type for raw byte strings to complement the character string data type. The raw byte type would be used only by the data type for free text, though. There is hardly any use case for HL7 application domain Technical Committees to use byte string data types directly.

Byte strings are a good choice for proprietary application data and multimedia data. A closer look at standards such as HTML, SGML or troff supports using them for free text in general. Those formats are defined on a notion of characters instead of bytes. Even so, the applications that implement HTML, SGML or troff have their own means of interpreting byte streams as character encodings (e.g.,
HTML has a META element, and XML declares the character encoding in its "<?xml ... ?>" declaration). More traditional formatting with troff cannot handle the full abstraction of characters that comes with the Unicode at all, and so it is also based on byte strings rather than character strings.

In conclusion, we can uniformly define the free text / multimedia data type as the pair of a media type selector and raw byte data. A sender may not want any of the format options for free text and may just want to send the raw characters. The sender can indicate this with a special media type (text/plain). It seems justified to make the plain text media type the default.

2.3.1 Multimedia Enabled Free Text

The multimedia-enabled free text data type consists of the following components:


Free Text
The free text data type can convey any data that is primarily meant to be shown to human beings for interpretation. Free text can be any kind of text, whether unformatted or formatted written language, or other multimedia data.

component name | type/domain | optionality | description
media descriptor | Code Value [p. 116] using IANA-defined MIME type codes | optional; defaults to text/plain | used to select an appropriate method to render the free text data
data | Binary Data [p. 55] | required | contains the free text data as raw bytes
compression | Code Value [p. 116]; IANA-defined code | optional | indicates that the raw byte data is compressed and which compression algorithm was used
charset | Code Value [p. 116]; IANA-defined code | optional for character-oriented media types; defaults to the encoding used for Character String [p. 40] | for character-based media, indicates the character set/encoding of the raw byte data
...

Other components may be defined for certain media types. This serves as a way to map MIME media type "parameters" to this Free Text data type. An example is the charset component, which is a parameter of the MIME media type text/plain.

The media type descriptor of MIME RFC 2046 (ftp://ftp.isi.edu/in-notes/rfc2046.txt) consists of two parts:
1. the "top level media type", and
2. the media subtype.
However, this data type specification treats the entire media type descriptor as one atomic Code Value [p. 116].
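One way to picture the component table in code is an immutable record. This is a sketch only. The following are assumptions of this example, not part of the specification: the class name `FreeText`, the Python field types, and the check that `charset` appears only with `text/*` media types. The media type stays a single atomic string, as the text requires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FreeText:
    """Illustrative in-memory shape of the Free Text data type.

    Field names mirror the component table; the Python types are assumptions.
    """

    data: bytes                        # required: the raw bytes, never reinterpreted
    media_type: str = "text/plain"     # atomic IANA MIME type code, e.g. "image/png"
    compression: Optional[str] = None  # optional IANA code of the compression used
    charset: Optional[str] = None      # optional IANA charset name, character media only

    def __post_init__(self) -> None:
        if not isinstance(self.data, bytes):
            raise TypeError("data must be raw bytes, not a character string")
        # Assumption for this sketch: "character-oriented" means text/* types.
        if self.charset is not None and not self.media_type.startswith("text/"):
            raise ValueError("charset only applies to character-oriented media")


note = FreeText(b"Patient ambulatory.")             # defaults to text/plain
scan = FreeText(b"\x89PNG\r\n\x1a\n", "image/png")  # opaque image bytes
print(note.media_type, note.charset)                # text/plain None
```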


MIME media types and subtypes are defined by the Internet Assigned Numbers Authority (IANA). Currently defined media types are registered in a database (http://www.isi.edu/in-notes/iana/assignments/media-types/) maintained by IANA. In principle, any IANA-defined media type may be used with the Free Text data type. But not all media types have the same status in this specification.

The following top-level media types are currently defined by the IANA:

NAME | PURPOSE
text | written textual information
image | image data
audio | audio data
video | video data
application | some other kind of data
multipart | data consisting of multiple MIME entities
message | an encapsulated message
model | "an electronically exchangeable behavioral or physical representation within a given domain" [RFC 2077 (ftp://ftp.isi.edu/in-notes/rfc2077.txt)]

This data type is called Free Text, so it may seem strange, almost frightening, that the above list contains media types like video, application, even message. Should there not rather be one data type only for written text, one for audio, one for image, one for video, etc.?

The rationale behind the free text data type is that free text is information sent from one human being to another. If the receiver has a method to render and see the information, she will be able to interpret it. To understand the full range of meaning of the word "text", we should have a look at Webster's dictionary (http://www.m-w.com/home.htm):


Main Entry: text
Pronunciation: ’tekst
Function: noun
Etymology: Middle English, from Middle French texte, from Medieval Latin textus, from Latin, texture, context, from texere to weave -- more at TECHNICAL
Date: 14th century
1 a (1) : the original words and form of a written or printed work (2) : an edited or emended copy of an original work b : a work containing such text
2 a : the main body of printed or written matter on a page b : the principal part of a book exclusive of front and back matter c : the printed score of a musical composition
3 a (1) : a verse or passage of Scripture chosen especially for the subject of a sermon or for authoritative support (as for a doctrine) (2) : a passage from an authoritative source providing an introduction or basis (as for a speech) b : a source of information or authority
4 : THEME, TOPIC
5 a : the words of something (as a poem) set to music b : matter chiefly in the form of words that is treated as data for processing by computerized equipment
6 : a type suitable for printing running text
7 : TEXTBOOK
8 a : something written or spoken considered as an object to be examined, explicated, or deconstructed b : something likened to a text

This multimedia data type remains text in the sense of Webster's definitions 5 b and 8. Clearly, word processor documents can contain images such as drawings or photographs. Modern documents can embed video sequences and animations as well. Dictation (audio) is the most important form of pre-written medical narratives. A scanned image of old medical records or of handwriting is certainly text. In this sense, almost everything can be text, which the phenomenological analysis [p. 7] given in the introduction also supports.

More than 160 different MIME media subtypes are currently defined, and the list is growing quite fast. It makes no sense to list them all here. In general, all types defined by the IANA may be used.
The downside is that so many options may lead to interoperability problems. Therefore, this specification prefers certain media types over others. This ensures a greatest common denominator on which interoperability is possible, and which is also powerful enough to support even advanced multimedia communication needs.


Any IANA-defined media type is classified in one of the following four categories:

mandatory
Every HL7 application that supports a given kind of media must support at least the mandatory media type for it. There should be one mandatory media type for each kind of media (e.g. written text, image, audio, video, etc.). Without a very minimal greatest common denominator we cannot guarantee interoperability. The set of mandatory media types, however, is very small, so that no undue requirements are forced on HL7 applications, especially legacy systems.
In general, no HL7 application is forced to support any kind of media other than written text. For example, many systems do not want to receive audio data, because they can show only written text to their users. A system states "I will not handle audio" in its application conformance statement. A system must support the mandatory media type for audio only if it claims to handle audio media.

recommended
Other media types are recommended for a particular purpose. For any given purpose there should be only very few additionally recommended media types. The rationale, conditions and assumptions of such recommendations must be made very clear.

other
By default, any media type falls into the category other. This category means that HL7 neither forbids nor endorses the use of this media type. Most practically relevant use cases will have a mandatory or recommended type. Therefore the other media types should be used very conservatively.

deprecated
Some media types are inherently flawed, either because there are better alternatives or because of certain risks. Such risks could be security risks, for example, the risk that such a media type could spread computer viruses.
If a media type is classified as deprecated, the rationale must be stated and equally viable alternatives suggested. Not every flawed media type is marked as deprecated, though. A media type that is not mentioned, and thus falls into the category other by default, may well be flawed.

The following list shows how media types are categorized according to the rules above.


Categorization of Important Media Types

MEDIA TYPE | CATEGORY | USE CASE

Text
text/plain | mandatory, default | for any plain text. This is our former TX data type.
text/x-hl7-ft | recommended for compatibility with HL7 v2.x | this represents the old FT data type. Its use is recommended only for backwards compatibility with HL7 v2.x systems.
text/html | recommended; could become mandatory in the future | for any marked-up text. It is sufficient for most textual reports, platform independent and widely deployed.
application/pdf | recommended | for written text as completely laid-out, read-only documents. PDF is a platform-independent, widely deployed, and open specification with freely available rendering tools.
text/sgml, text/xml | recommended for PRA documents | general SGML/XML may be too powerful to allow general SGML/XML documents to be shared between different applications. However, this media type is to be used to convey documents conforming to the HL7 Patient Record Architecture.
text/rtf | other | this format is widely used, but it has compatibility problems and is quite dependent on the word processor. It may be useful if word-processor-editable text should be shared.
application/msword | deprecated | this format is very prone to compatibility problems. If editable text must be shared, text/plain, text/html or text/rtf should be used instead.

Audio


audio/basic | mandatory | this is the absolute minimum that should be supported by any system claiming to be audio capable. The content of the "audio/basic" subtype is single-channel audio encoded using 8-bit ISDN mu-law [PCM] at a sample rate of 8000 Hz. This format is standardized by: CCITT, Fascicle III.4 - Recommendation G.711. Pulse Code Modulation (PCM) of Voice Frequencies. Geneva, 1972.
audio/k32adpcm | recommended for compression | this allows compressing audio data. It is an Internet standard specification [RFC 2421 (ftp://ftp.isi.edu/in-notes/rfc2421.txt)]. Its implementation base is unclear.

Image
image/png | mandatory | Portable Network Graphics, PNG (http://www.cdrom.com/pub/png/), a widely supported lossless image compression standard with open source code available.
image/gif | other | GIF is a nice format that is supported by almost everyone. But its LZW compression is patented (by Unisys), and enforcement of that patent has led to disputes in the past. There is no point in discouraging this format, but we cannot raise an encumbered format to mandatory status.
image/jpeg | mandatory for high color images | this format is required for high compression of high color photographs. It is a "lossy" compression, but the difference is almost unnoticeable to the human eye.
image/g3fax | recommended for FAX | this is recommended only for fax applications. The format is not well compressed, and G3 software is not very widespread.
image/tiff | other | although TIFF (Tag Image File Format) is an international standard, it has many interoperability problems in practice. There are too many different versions, and not all software handles them alike.
image/x-DICOM | other | it is not clear whether there is an interoperable image file format in DICOM. I know of Papyrus, but is it a DICOM standard?


Video
video/mpeg | mandatory | this is an international standard. It is widely deployed and highly efficient for high color video, open source code exists, and it is highly interoperable.
video/x-avi | deprecated | the AVI file format is just a wrapper for many different "codecs"; it is a source of many interoperability problems.

Other
model/vrml | recommended | this is an openly standardized format for 3D models. It can be useful for virtual-reality types of applications and is used in biochemical research (visualization of the steric structure of macromolecules).
multipart | deprecated | this major media type depends on the MIME message format. The Free Text data type uses only the MIME media type definitions, not the MIME message format.
message | deprecated | this major media type is used to encapsulate e-mail messages in delivery reports and e-mail gateways; it is not needed for HL7. HL7 is itself a messaging standard that defines its own means of delivery, and HL7 is not used for e-mail.

Constraints may be applied to the media types wherever a Free Text data type is used. This can happen when an HL7 message is specified, in a given application conformance statement, or even in the RIM. For instance, suppose the Image Management SIG (IMSIG) eventually defines a class "Image". This class would conceivably contain an attribute "image_data", declared as Free Text. The IMSIG certainly would not want to see written text or audio here, but only images (and maybe a video clip of a coronary angiography).

2.3.2 Binary Data

Binary Data
Binary data is a sequence of uninterpreted raw bytes (8-bit sequences, or octets).
PRIMITIVE TYPE
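The sketch below illustrates why a byte string must arrive unchanged however it travels. It models two hypothetical ITS behaviors. A character-based ITS carries octets as base64 text, as the traditional HL7 encoding rules had to. A binary-capable ITS passes them through untouched. The function names are illustrative. Every one of the 256 possible octet values survives both paths.

```python
import base64


def to_character_wire(octets: bytes) -> str:
    """Hypothetical character-based ITS (traditional rules, XML): octets as base64."""
    return base64.b64encode(octets).decode("ascii")


def from_character_wire(text: str) -> bytes:
    # validate=True rejects anything outside the base64 alphabet.
    return base64.b64decode(text, validate=True)


def to_binary_wire(octets: bytes) -> bytes:
    """Hypothetical binary-capable ITS (CORBA-like): octets pass straight through."""
    return bytes(octets)


every_octet = bytes(range(256))
wire_text = to_character_wire(every_octet)
assert from_character_wire(wire_text) == every_octet  # unchanged after the trip
assert to_binary_wire(every_octet) == every_octet
print(len(every_octet), len(wire_text))               # 256 344
```

The base64 path costs about a third more space (344 characters for 256 octets). This overhead is what the text calls wasteful for channels that can carry raw bytes.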


The data component of the Free Text data type is not a character string but a block of raw bytes. ASN.1 calls this an "octet string," which is the same as a "byte string." The important point is that the byte string is not subject to interpretation as characters, but must be passed from one application's memory into the other application's memory unchanged.

The ITS layer therefore has an additional task: to transport raw byte strings. Transporting bytes is different from transporting characters; this cannot be overemphasized.

Traditionally, HL7 v2.x supported binary data only roughly, on top of character string data, either through hexadecimal digits in escape sequences or through the base64 encoding used in the old ED data type. However, this only makes sense for character-based encoding rules such as the traditional HL7 encoding rules or XML. An efficient CORBA ITS would not need this, as CORBA can transfer raw bytes without trouble.

Just as character encoding is an ITS layer issue, the encoding of bytes is an ITS layer issue too. On the HL7 application layer we care only that a byte string is communicated unchanged.

However, when the multimedia type is used to convey plain text, the binary data will ultimately be interpreted as plain text. This must not become a way of sneaking character encoding into the application layer.

The ITS layer software should detect the special case of text/plain media and perform the character set translation according to the character encoding used for ordinary Character String data. The ITS layer software can reuse the same machinery that handles character string encoding.

If for any reason the plain text data is in an encoding different from the character encoding used by character strings, this can be indicated through the charset component.
The IANA maintains a registry of character sets (http://www.isi.edu/in-notes/iana/assignments/character-sets) that must be used for this purpose. This registry lists several synonyms for many encodings. If one of them is identified as the "preferred MIME name", it must be used instead of the other synonyms. If none of them is marked as preferred by IANA, the first name listed should be used.

With text/plain there is the issue of how lines are terminated, and line termination must be standardized. The proper interpretation of the ASCII and Unicode standards suggests that a line terminator consists of the two control characters carriage return (U+000D) followed by line feed (U+000A). This is also the Internet standard for terminating lines, and it is the native line termination on MS-DOS and its descendants. It is easy to comply with these canonical line terminators on Unix systems, which natively use a single line feed as the end of line: a carriage return must be inserted before each line feed. Classic Apple Macintosh systems natively use a single carriage return, to which a line feed must be appended.
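For the text/plain special case, the ITS-side rules above come down to two small steps: pick the charset label (the preferred MIME name if the registry marks one, otherwise the first name listed) and canonicalize line ends to CR LF before encoding. A minimal Python sketch under these assumptions: the caller supplies the alias list transcribed from the IANA registry, the function names are hypothetical, and Python's codec names only partially coincide with IANA names.

```python
from typing import List, Tuple


def preferred_charset_label(aliases: List[Tuple[str, bool]]) -> str:
    """Return the label for the charset component.

    `aliases` holds the names of one registry entry in registry order, each
    paired with True if IANA marks it as the "preferred MIME name".
    """
    if not aliases:
        raise ValueError("no names given")
    for name, is_preferred in aliases:
        if is_preferred:
            return name
    return aliases[0][0]  # nothing marked preferred: use the first name listed


def canonicalize_line_ends(text: str) -> str:
    """Turn CR LF (MS-DOS), LF (Unix) and lone CR (classic Mac OS) into CR LF."""
    # Collapse CR LF first so existing pairs are not doubled, then lone CRs.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    return unified.replace("\n", "\r\n")


def encode_plain_text(text: str, charset: str) -> bytes:
    """Sender side: canonical line ends first, then the declared encoding."""
    return canonicalize_line_ends(text).encode(charset)


def decode_plain_text(data: bytes, charset: str) -> str:
    """Receiver side: decode with the declared charset; line ends stay CR LF."""
    return data.decode(charset)


# Illustrative excerpt of one registry entry, not the complete alias list.
label = preferred_charset_label(
    [("ISO_8859-1:1987", False), ("ISO_8859-1", False), ("ISO-8859-1", True), ("latin1", False)]
)
wire = encode_plain_text("caf\u00e9\nline 2\rline 3", label)
print(label, wire)  # ISO-8859-1 b'caf\xe9\r\nline 2\r\nline 3'
```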


It is often useful to compress binary data, e.g. using the "deflate" byte-stream compression algorithm used by gzip and pkzip. Almost all data can benefit from byte-stream compression, except GIF, JPEG and MPEG data, which are already compressed so that deflate gains little or nothing. Using a media type of application/gzip for compressed data is obviously not useful, since it would override the description of the uncompressed data. The compression component is to be used instead. Either an IANA code is to be used, or a subsequent revision of this specification will provide a table of allowed codes.

2.3.3 Outstanding Issues

We will define a code for compression algorithms.

We recognize that a reference data type will be defined, to be used as an alternative for huge data blocks. Should the Free Text type be allowed to be replaced by a reference, or should it contain a reference?

Video streams do not fit into a single message; an external stream protocol (such as RealVideo) would be used. This could be accommodated through a reference data type.
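The point of a separate compression component is that the media type keeps describing the uncompressed content. A minimal Python sketch of that idea using a raw deflate stream from the standard zlib module; the FreeText record and the "deflate" code string are hypothetical, since the compression codes have not been defined yet.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FreeText:
    """Hypothetical Free Text value: media_type always describes the
    uncompressed content; compression is recorded separately."""
    media_type: str
    data: bytes
    compression: Optional[str] = None  # hypothetical code, e.g. "deflate"


def deflate(ft: FreeText) -> FreeText:
    if ft.compression is not None:
        raise ValueError("data is already compressed")
    packer = zlib.compressobj(wbits=-15)  # negative wbits: raw deflate, no zlib header
    packed = packer.compress(ft.data) + packer.flush()
    return FreeText(ft.media_type, packed, "deflate")  # media type unchanged


def uncompressed_data(ft: FreeText) -> bytes:
    if ft.compression is None:
        return ft.data
    if ft.compression == "deflate":
        return zlib.decompress(ft.data, wbits=-15)
    raise ValueError(f"unknown compression code: {ft.compression!r}")


report = FreeText("text/plain", b"no acute findings.\r\n" * 200)
packed = deflate(report)
print(packed.media_type, packed.compression, len(report.data), "->", len(packed.data))
```

A receiver that does not recognize the compression code can still tell from the media type what the content would be, which an application/gzip label would hide.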


3 Things, Concepts, and Qualities

3.1 Overview of the Problem Space

3.1.1 Concept vs. Instance

Most medical information comes as qualitative information: complaints, symptoms, signs, diagnoses, goals, interventions, surgeries or medications are all information on a nominal scale. Administrative data, too, is often on nominal scales, e.g., patient class (inpatient, outpatient, etc.), insurance, health plan, and many other data elements. Such nominal-scaled variables take on one value from a list of possible values.

The semantic field on which we are now focusing contains more than just values on nominal scales. Values on nominal scales are abstract concepts. For instance, the color green is such an abstract concept. There is no tangible green anywhere in the world, only bodies whose color is green, or green light reflected from bodies (which is the same physical phenomenon). Likewise, there is no pneumonia to which we can point and say: "here comes Pneumonia!" And although a Streptococcus pneumoniae bacterium is a real physical body, we usually are not interested in the one bacterium lying in the lower left corner of our microscope's field of view. What we are interested in is the concept of Streptococcus pneumoniae, not the individual bacterium.

On the other hand, we often need to refer to individual things, like this table, or this computer on which I type. Individual things can be classified into concepts, such as "table" or "computer". But when we want to refer to individual things, we do not want to classify them. Referring to individual things is thus the opposite of referring to concepts. Our data type model has to serve both needs: referring to concepts and referring to individual things. We call individual things "instances".

However, the distinction between "concept" and "instance" is not very crisp.
Philosophically, we can easily argue that Gunther Schadow is a mere concept (you might have seen me, but that is not essential for your concept of Gunther Schadow). Through my writing I am currently a concept in your mind that might have more or less shape, but it is still likely to exist only in your mind. Although Julius Caesar and Napoleon were real living persons, they now persist as mere concepts.

An instance is something you can (merely) point to, touch or destroy. A concept cannot be pointed at, touched or destroyed. A concept can only be explained. Both instances and concepts have names, although these names have different characteristics. I, as a living human being, am an instance and I have a name: "Gunther Schadow." By contrast, "headache" is something one can explain. When you feel your own headache, your present headache might even become an instance for you, but your particular headache is rarely an instance for others.


Thus "headache" is a concept. The Hypertext Transfer Protocol (HTTP) is a concept as well. You cannot point at HTTP, you cannot touch HTTP, you cannot destroy HTTP. But you can explain HTTP. You can explain HTTP to your wife, but you cannot explain Gunther Schadow to her. You can tell her about your experience of meeting me on the phone or by e-mail, but you cannot "define" or explain Gunther Schadow. Instances can be assigned to categories. You can say that Gunther Schadow is a human, male, and living in Indianapolis. That categorizes me in certain ways, but it does not explain me.

Generalization and specialization are relationships between concepts, not instances. Gunther Schadow has neither a generalization nor specializations. We too frequently blur this distinction between concepts and instances when we talk about a "parent class" or "children of a class." Parent/child are relationships between instances, not classes. But the genealogical metaphor for concept relationships is very old: it goes back at least to Porphyrius, a 3rd-century A.D. commentator on Aristoteles.

Gunther Schadow has parents and I have a child. Headache has no parents, and "tension headache" may be a specialization of headache, but it is never a child of headache. As such, the very term "inheritance" is misleading, since inheritance exists only among instances, not concepts. We have to be very careful about our metaphors.

3.1.2 Real World vs. Artificial Technical World

The term "instance" is also used in opposition to "class". In the object-oriented paradigm (which actually originates with Aristoteles in the 4th century B.C.) there are classes, which are concepts of real things, and instances (or individuals), which are the real things themselves. In object-oriented language we would probably say "class" vs. "object"; however, this distinction is ambiguous, since people often point to a box in the RIM labeled "Patient" and say "this is the Patient object".
It is the class, not the object. But of course, when dealing with classes in computer systems, they too become objects (sometimes called meta-objects).

Within computer systems everything tends to blur. Every object-oriented language has pointers (or references) to objects (= instances). Some treat classes as meta-objects (e.g., Java does). In any case, an instance in your computer memory or in a file can be pointed to (using an index, pointer, database key, or whatever). It can be "touched" (modified, directly examined), and it can well be destroyed. But it cannot be explained. It can be copied, though, and as such it becomes like a concept. But "real" object-oriented systems (like CORBA) do not allow you to simply copy an instance.

Computer systems shed a whole new light on the problem space. There are class instances in healthcare information systems that refer to real-world instances. Thus, a record in a patient registry refers to a real existing patient. The patient record and the human patient are related, but not the same. Thus there is a new pair of antonyms: real things and reflections of things within information systems.


Although HL7 deals primarily with reflections of things within computer systems, HL7 must care about the important link between the information about things and the real things themselves.

It is very difficult to link unambiguously to real things. This is because instances can only be pointed at. I can say "this table", but "this" does not mean anything to you if you are not here in my office. I can describe my desk to you, but you will not be able to recognize the individual desk among others of the same kind. The only thing one can do about this is to search for individual properties that only my desk has, e.g., a particular scratch. Thus, we can collect information about instances and use this information to refer to the instance, in the hope that there will be no second instance that matches the same description. But you never know.

An alternative to describing the scratches on my desk is to put a tag on it with an inventory number. My computer screen, for example, has such a tag on it with the inventory number "2464" assigned by the Regenstrief Institute. Inventory numbers are a common way to refer to individual things, since we can easily put a tag on them.

But we cannot put a tag on people. There is a custom of branding animals, but fortunately in our culture we do not brand human beings. We give names to human beings, names they remember, on average, from about their second year of life until several days before dying. However, names do change, and names are misspelled. Everyone who works in healthcare informatics knows something about the problem of identifying people.

With computers and technical devices, some things become easier. Real-world concepts such as diseases or even colors are hard to describe. Modern science tries to operationalize concepts, i.e., to provide a protocol by which you can reproduce an instance of that concept or by which you can decide whether something is an instance of a given concept or not. But operationalizations are a matter of consensus, and that consensus often does not exist, neither in everyday life nor in the sciences. By contrast, with computers and technical devices, concepts have crystal-clear definitions and instances have exact locations and extent. For example, HTTP has a specification that tells you exactly what to do to become HTTP compliant, and that allows you to decide exactly whether or not you are dealing with an HTTP interface. If I dial a telephone number, precisely one phone will ring somewhere in the world.

3.1.3 Segmentation of the Semantic Field

In our introductory survey of the semantic field we found two pairs of terms that seem to cover many of the phenomena we have discovered: concept vs. instance, and real world vs. technical artifacts. We sort out the phenomena we have to deal with in HL7 using the following 2x2 matrix.


| | CONCEPT | INSTANCE |
|---|---|---|
| REAL WORLD | Coded mostly using externally defined code systems: ICD9, ICD10, SNOMED, DSM-III, DSM-IV, ICPC, LOINC, ICPM, CPT4, etc. | Examples: person names (old PN), organization names (old XON), location descriptors (old AD and PL), legal ID numbers (SSN, DLN, etc.) |
| TECHNICAL | Examples: message type, order status code, participation type code, MIME media type. | Examples: message IDs, service catalog items, RIM instances (order numbers), phone numbers, e-mail addresses, URLs |

REAL-WORLD CONCEPTS are concepts that scientists and ordinary people deal with in their minds and formulate in words. Communication must rely on commonly agreed terminology or standard code systems. Those are mostly defined by external (i.e., non-HL7) organizations, such as those representing domain experts in a particular medical specialty.

There is currently a lot of overlap, competition and complementation among code systems. It does not seem as if this apparent disorganization could ever change, because medicine and human life in the real world are always changing. Thus, the communication of real-world concepts will always have to deal with translating codes, i.e., selecting the best-matching "synonymous" code from different code systems.

TECHNICAL CONCEPTS are labels for well-defined concepts, such as protocols. For example, if we say "HTTP" we refer to the Hypertext Transfer Protocol, an Internet standard defined quite rigorously. If we ultimately want to know what HTTP is, we can read the specification. However, most often we are not so much interested in what "HTTP" is or what it means; we just want to use it. So we select an appropriate machinery (e.g., a web browser) and use HTTP.

With Technical Concepts there is no use for different vocabularies, no use for using both "HTTP" and "HypTexTranProt" to refer to the same technical concept.
This is not to say that people could not use different names or abbreviations for HTTP, but there is no point in letting everyone choose their own terminology for the exact same technical concepts.


REAL WORLD INSTANCES are individual people, organizations or things that we can meet, point at, think of, go to, etc. The strongest "definition" we can ever make is to point at those people or things, touch them, or take them into our hands and show them. But in documents and human communication we commonly use names and officially assigned identifiers (e.g., social security number or driver license number). Places are named using residential addresses or other kinds of locators (e.g., building->tract->floor->room->bed).

Things are most often pointed to (e.g., "give me this screwdriver"), or described (e.g., "give me the long screwdriver ... no, the stronger one"). In larger contexts, where we can neither point to things nor describe them unambiguously, we just assign arbitrary inventory numbers to them.

In general, identifiers for Real World Instances are full of intricacies, and we will address those later. The common approach for data types has already been laid out by HL7 v2.x, i.e., PN, XON, DLN, AD, PL, and the like.

TECHNICAL INSTANCES are instances that are useful in some technical sense. Just as with Technical Concepts, we are less interested in knowing what exactly those instances are. Rather, we name technical instances because we want to use them. In the case of HL7, most of those technical instances will be particular data instances, such as messages, order numbers, service catalog items, or any other instance of a RIM class that we can refer to.

But Technical Instances are also things like telephone numbers, e-mail addresses, or Uniform Resource Locators (URLs) of Web pages, images, or chat rooms. The general idea is that what you do with a phone number is pick up your phone and call your party. You would not search the phone book in order to find the address where a given telephone is located and meet your party there.
Searching the phone book for an address would be finding out what a given telephone number means. In most cases, we choose to use telephone numbers directly, by simply picking up the nearest phone and dialing the number.

The same is true for database records or data instances in computer systems: we do not go and analyze memory dumps in order to find out what a given Technical Instance really is; we just use it in some machinery that, for instance, lets us query for a given record entry or lets us change that record entry.

3.2 Technical Instances

There are two different modes of referring to technical instances. You can (1) identify an instance among other instances present in a set (e.g., identify a record in a database). For instances that are not immediately present, one can (2) locate that instance by dereferencing a pointer. However, there are many similarities between identifiers and pointers. It appears that such identifiers can have three levels of quality. They can be


1. unique (globally),
2. un-ravel-able,
3. de-reference-able.

Unique Identifiers

Suppose you are given two identifiers. What you can always do is compare them literally (i.e., character by character). Now, if it turns out that these identifiers are literally equal, what do you know? You know that they both refer to the same instance if and only if you can be sure that the literal match is not accidental, i.e., not the result of a naming conflict.

By narrowing down namespaces we can easily achieve uniqueness within a namespace. This is, for example, why in computer programming local variables in procedures are safer than global variables. The really important quality, however, is that identifiers are globally unique. Global uniqueness is generally achieved by a structure defined in the following piece of BNF:

    identifier ::= namespace name
    namespace  ::= identifier

Obviously this is a recursive structure, i.e., every namespace is itself identified by a name in its parent namespace. This recursion up the namespace hierarchy must somehow be terminated. This is done by assigning one globally unique root namespace, in which names are valid without reference to another namespace.

The uniqueness of an identifier does not imply, however, that a given instance cannot have several names. Thus, if you compare unique identifiers literally and find that they do not match, you know nothing: both identifiers can still refer to the same instance.

Un-ravelable Identifiers

An identifier is "unravelable" if we can analyze its pieces and, for each piece, find someone to talk to.

Internet domain names (DNS names) are unravelable expressions. For example, we can unravel the string "falcon.iupui.edu" from the right, where "edu" is maintained by the InterNIC (the organization that registers names in top-level Internet domains such as "edu"). When Indiana University Purdue University Indianapolis (IUPUI) registered its domain name "iupui" with the InterNIC, it had to name an official person who is responsible for "iupui".
That person knows what "falcon" is.

ISO Object Identifiers (OIDs) are unravelable too. ISO OIDs are unraveled from the left. For example,


1.2.840.10008.421292.87828.333433.001

stands for ISO (1), ISO member body (2), USA (840), DICOM Standard (10008), AGFA (421292), ...

The leftmost numbers are registered with large organizations. Eventually, a company like AGFA gets a number allocated, say, 421292. It then numbers its machines, one of which has the number 87828. That machine allocates numbers to an imaging study (333433) that contains a series of images (001).

In unraveling an ISO OID we walk down the path in basically the same way as with DNS names. DICOM has registered contact persons with the US member body of ISO (ANSI). AGFA has registered contact persons with DICOM. They, or someone in the radiology department, could probably tell you that 87828 is the CT machine in the trauma center. Finally, the machine itself allocates identifiers at "computer speed" to things like studies and images.

You can get a feel for unraveling an OID using the information compiled by Harald T. Alvestrand (http://www.alvestrand.no/objectid).

HL7 filler orders are somewhat unravelable. For example, suppose you are given the filler order "1234^OUTPATIENT.LAB". If you could figure out what department the symbol "OUTPATIENT.LAB" refers to, you could call them up and ask them about item "1234".

As we can see, whether an identifier is unravelable depends on the way its namespaces are managed. Both ISO OIDs and Internet domain names are organized through hierarchical namespaces.

De-referenceable Identifiers

An identifier is "dereferenceable" if there is a machinery that resolves it for you, rather than requiring you to go the painful way of unraveling. For Internet domain names there is such a machinery dedicated to resolving names: the Domain Name System (DNS). The Internet name server next to you will resolve the address for you quite seamlessly.
There is a whole infrastructure of domain name servers, which is why it can take so long to get an answer from a DNS server if you typed in a wrong domain name: your DNS server asks another server, which asks another server, and so on.

For ISO OIDs there is no such easy way of dereferencing. In some cases there may be catalog services (e.g., X.500) that can resolve a subspace of the gigantic OID namespace.
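The difference between unraveling and dereferencing can be made concrete. The Python sketch below only lists, in unraveling order, the successive nodes whose registrants one would have to contact: DNS names are read from the right, OIDs from the left. It performs no lookup at all, which is exactly what separates unraveling from dereferencing. The function names are illustrative only.

```python
from typing import List


def dns_unravel(name: str) -> List[str]:
    """Nodes to contact for a DNS name, root side first (read from the right)."""
    labels = name.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


def oid_unravel(oid: str) -> List[str]:
    """Nodes to contact for an OID, root side first (read from the left)."""
    arcs = oid.split(".")
    return [".".join(arcs[:i]) for i in range(1, len(arcs) + 1)]


print(dns_unravel("falcon.iupui.edu"))
# ['edu', 'iupui.edu', 'falcon.iupui.edu']
print(oid_unravel("1.2.840.10008.421292"))
# ['1', '1.2', '1.2.840', '1.2.840.10008', '1.2.840.10008.421292']
```

Each list element is a node in the namespace hierarchy; for a human, each step means finding the responsible person, while a resolver such as DNS walks the same path automatically.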


A telephone number, however, is a perfectly unique and dereferenceable identifier if we start at the root of the namespace provided by the global telephone system. Telephone and fax numbers are usually written in a standardized way, where, for instance, "+49308153355" used to be my fax and phone number in Germany, while "+13176307960" is my office phone number in the U.S. All you need to do to dereference such a phone number is pick up your phone, dial the prefix for international calls (written as "+"), dial the other digits, and my telephone will ring.

Uniform Resource Locators (URLs) are another example of dereferenceable identifiers. For instance,

http://aurora.rg.iupui.edu/v3dt

is the version 3 data type project's homepage. Your browser and the Internet do everything for you after you type in this URL. A URL starts with the name of the protocol to use; the rest of the URL is a literal that the protocol is supposed to understand. For example, I can view the same homepage as a local file using the URL

file:/home/schadow/public_html/v3dt/index.html

In general, for an identifier to be dereferenceable it need not be practically un-ravelable. For instance, a telephone number is for all everyday purposes not unravelable (only law enforcement is given this privilege). You may be able to figure out a country code (1 for the U.S.) and an area code (317 for Indianapolis), but you will have a pretty hard time finding the number 6307960 in the phone book of Indianapolis.

The important point about dereferencing identifiers is that the process of dereferencing does not get you down to their "meaning" in the real 3D world. I.e., unless you come into my office, you will never see my machine, "aurora", serving the above homepage. And the machinery that seamlessly dereferences URLs does not bring you into my office. All you can do is look at what the Internet/HTTP/browser machinery brings to your screen as a result of dereferencing the URL identifier.
Likewise, with the telephone you can call me, but you cannot creep through the wire to see my telephone.

We therefore create two different data types for referring to technical instances: one for technical instance identifiers and another for technical instance locators.

3.2.1 Technical Instance Identifier


Technical Instance Identifier
This data type is used to uniquely identify some entity that exists within some computer system. Examples are object identifiers for RIM class instances, and things like medical record number, placer and filler order number, service catalog item number, etc.

| component name | type/domain | optionality | description |
|---|---|---|---|
| root | ISO Object Identifier [p. 66] | required | This is the required field that guarantees the uniqueness of the identifier and that permits the origin of the identifier to be determined (un-raveled). This can be the only field in institutions that use OIDs for their internal object identifiers. |
| extension | Character String [p. 40] | optional | The extension can be used when an institution uses non-OID identifiers locally and does not want to map every internal identifier to OIDs. It is especially useful if the local identifiers are not purely numeric. This field must never be sent without the connecting root OID. |

HL7 identifiers for technical instances must be unique. For identifiers to be unique we have to manage the global namespace. Most importantly, every identifier must be explicitly linked to the root of the namespace hierarchy. Since HL7 has acquired a branch in the tree of ISO OIDs, we are free to use OIDs as heavily and directly as DICOM does.

In order to foster interoperability, the technical instance identifier requires ISO Object Identifiers to be used. No other unique identifier scheme is permitted. ISO Object Identifiers are very common and sufficiently easy to acquire.

Many existing HL7 systems do not assign purely numerical identifiers to the technical instances in their realm. For instance, they may use alphanumeric unique keys into a data file. We do not force people to adopt a pure OID scheme for identifiers.

HL7 can, however, assign OIDs to everyone who writes applications for HL7 and to everyone who maintains HL7 communications. On that basis, people are free to attach their own naming scheme to their standard OID. If they want, they may use OIDs in their realm, but they may also use free-form identifiers in the extension component.

Organizations can use OIDs that they have already acquired from elsewhere (e.g., through DICOM). HL7-assigned OIDs are not required. HL7 assigns OIDs as a service to its members and users, but does not require OIDs to be rooted in the HL7 branch.


3.2.2 ISO Object Identifiers

ISO Object Identifier (OID)
The ISO Object Identifier is defined by ISO/IEC 8824:1990(E) clause 28.
PRIMITIVE TYPE

The ISO definition of Object Identifier reads as follows:

28.9 The semantics of an object identifier value are defined by reference to an object identifier tree. An object identifier tree is a tree whose root corresponds to [the ISO/IEC 8824 standard] and whose vertices [i.e. nodes] correspond to administrative authorities responsible for allocating arcs [i.e. branches] from that vertex. Each arc from that tree is labeled by an object identifier component, which is [an integer number]. Each information object to be identified is allocated precisely one vertex (normally a leaf) and no other information object (of the same or a different type) is allocated to that same vertex. Thus an information object is uniquely and unambiguously identified by the sequence of [integer numbers] (object identifier components) labeling the arcs in a path from the root to the vertex allocated to the information object.

28.10 An object identifier value is semantically an ordered list of object identifier component values. Starting with the root of the object identifier tree, each object identifier component value identifies an arc in the object identifier tree. The last object identifier component value identifies an arc leading to a vertex to which an information object has been assigned. It is this information object which is identified by the object identifier value. [...]

From ISO/IEC 8824:1990(E) clause 28

The following diagram shows part of the huge ISO Object Identifier tree referred to in the definition.


Figure 3: The hierarchy of ISO Object Identifiers and how it could be used by HL7.

Rather than treating ISO Object Identifiers as a composite data type, we treat them as primitives. However, because of their semantic structure, a number of operations can be performed on object identifiers, including tests for equality and subsumption (i.e., partial match from the left). Just as in DICOM, ISO Object Identifiers may be treated as character strings by the ITS layer.

How difficult will it be to acquire OIDs?

ISO Object Identifiers come with the blessing of being world-wide unique and endorsed by the International Organization for Standardization (ISO). On the downside, small vendors and users might fear how difficult it will be to make all the bureaucrats happy just in order to get such a unique Object Identifier.

The good news is that no HL7 vendor or user has to contact ISO in order to get an OID. OIDs are assigned hierarchically, so that every OID can itself be used as the basis for a large tree of other OIDs. As soon as you have one OID, you are an assigning authority yourself. You do not need


to contact anyone else in order to issue other OIDs.

HL7 itself has recently acquired an OID. This makes HL7 an assigning authority. On the one hand, we may use OIDs for HL7-internal things. On the other hand, we could have one branch for HL7-identified organizations. This branch could be subdivided into users and vendors. A vendor who has acquired an OID could assign OIDs in its subtree to all its HL7-related products: machines, software, individual installations of its software, and so on.

For example, the Latter Day Saints (LDS) Hospital in Salt Lake City would have an OID in the users' branch. They can, for example, subdivide their tree into pediatrics/medical/surgical departments, each of which may have an ICU subdepartment with its systems and subsystems, and so on. The idea is that everyone can do whatever they want with their part of the subtree. Regenstrief and Kaiser would have their own OIDs and organize their namespaces as they see fit.

The point is that you need to get only one OID from somewhere else. Once you have your first OID, you can do with it whatever you want, just as you can design the directory hierarchy on your hard disk as you want. You can stick to a convention, or you can create chaos, as you see fit.

How difficult will it be to use OIDs?

One may hesitate to use ISO Object Identifiers within a system because of the amount of memory they use up; in other words, OIDs can become quite lengthy. Many legacy systems have their pain threshold as low as 8 characters for identifiers. An OID would not fit into 8 characters. For example, some instance in the LDS pediatric ICU might have the following OID:

2.16.840.1.113883.4.1.123456.32.101.12345.54321

That is 47 characters. DICOM has set the maximal length to 64 characters.
We will not specifyany particular maximal length since length is a problematic concept for Object Identifiers andOIDs are meant to be unbounded.But there is even a way to get around with only 8 characters. Here is how:No one should have trouble sending or receiving those long OIDs. The problem with length isonly about storing OIDs in data bases. Now, you can use an OID data base at your system thatcan handle long OIDs and that maps those to 8 byte base 64 strings. Those 8 byte strings allowyou to enumerate a total of 64 8 = 281,474,976,710,656 different identifiers. This is 2.814 x 10 15 ,a thousand-trillion numbers. Suppose you would waste those identifiers at a rate of 1000 persecond, your namespace would still not overflow in 8900 years!DRAFT version 1.0 22 Mar 199969
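The two operations named above (equality and subsumption) and the 8-character key table can be sketched in Python. This is a minimal illustration, not part of the specification: the class names `OID` and `ShortKeyTable`, the sequential assignment of keys, and the choice of the URL-safe base64 alphabet are all assumptions. Note that subsumption compares arcs, not characters: `2.16.840.1.11` is a character prefix of `2.16.840.1.113883`, but not its ancestor.

```python
import base64
import re
from dataclasses import dataclass

# Dotted-decimal OID: non-negative integer arcs without leading zeros.
_OID_SYNTAX = re.compile(r"(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*")


@dataclass(frozen=True)
class OID:
    """An ISO Object Identifier treated as a primitive value (a tuple of arcs)."""
    arcs: tuple

    @classmethod
    def parse(cls, text: str) -> "OID":
        if not _OID_SYNTAX.fullmatch(text):
            raise ValueError(f"not a dotted-decimal OID: {text!r}")
        return cls(tuple(int(arc) for arc in text.split(".")))

    def __str__(self) -> str:
        return ".".join(str(arc) for arc in self.arcs)

    def subsumes(self, other: "OID") -> bool:
        """Partial match from the left, compared arc by arc."""
        n = len(self.arcs)
        return len(other.arcs) >= n and other.arcs[:n] == self.arcs


class ShortKeyTable:
    """Hypothetical local table mapping long OIDs to 8-character base64 keys.

    Only the local database uses the short keys; messages carry the full OID.
    """
    KEY_LENGTH = 8
    CAPACITY = 64 ** KEY_LENGTH  # = 2**48 = 281,474,976,710,656 keys

    def __init__(self) -> None:
        self._key_by_oid = {}
        self._oid_by_key = {}

    def key_for(self, oid: OID) -> str:
        if oid in self._key_by_oid:
            return self._key_by_oid[oid]
        serial = len(self._key_by_oid)  # next unused serial number
        if serial >= self.CAPACITY:
            raise OverflowError("short key space exhausted")
        # 48 bits = 6 bytes, which base64 encodes as exactly 8 characters (no padding).
        key = base64.urlsafe_b64encode(serial.to_bytes(6, "big")).decode("ascii")
        self._key_by_oid[oid] = key
        self._oid_by_key[key] = oid
        return key

    def oid_for(self, key: str) -> OID:
        return self._oid_by_key[key]


hl7_root = OID.parse("2.16.840.1.113883")
instance = OID.parse("2.16.840.1.113883.4.1.123456.32.101.12345.54321")
print(hl7_root.subsumes(instance))                    # True
print(OID.parse("2.16.840.1.11").subsumes(hl7_root))  # False: arc 11 is not arc 113883
table = ShortKeyTable()
print(table.key_for(instance))                        # AAAAAAAA
years = ShortKeyTable.CAPACITY / 1000 / (365.25 * 24 * 3600)
print(round(years))                                   # 8919
```

The last line reproduces the capacity estimate above: at 1000 new keys per second, the 2^48 key space lasts roughly 8900 years.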


What ISO Object Identifiers can and cannot do

One might wonder whether it is possible to interpret OIDs in a globally agreed way. The Andover Working Group tried to design the OID namespace structure so that OIDs would not only identify instances but also classify them.

So the question is: can we parse an OID and get any information from it? Can we learn anything about an instance just by looking at its OID? Things that we might want to find in an OID are: What application? What facility? What department? What country? What location? Which type? etc.

We have to review our goal: we wanted to design a unique identifier for technical instances. Uniqueness that comes from the hierarchical structure of the namespace brings with it the quality of "un-ravelability" of identifiers. But the original meaning of "un-ravelable" was that unraveling an identifier is a painful and slow process: you use the phone, calling ISO, ANSI, <strong>HL7</strong>, LDS, and so on until you have someone on the line who is responsible for that number. Unraveling is not something a computer could do for you automatically. (Automatic unraveling would be dereferencing or resolving an identifier.)

Thus, in general, there is no way to impose any meaning on the parts of an OID.

However, owners of OIDs may "design" their namespace subtree in some meaningful way. For instance, Intermountain Healthcare could assign an OID to each of its institutions, and the next level would contain departments. In each department, number 1 would be the administrative section, number 2 the ICU, number 3 the lab, numbers 100 to 999 the normal inpatient wards, and so on.

Everyone is free to design and use their own OID structure to make decisions. However, no one outside would be forced to use the same structure. Thus, Intermountain Healthcare could base its message routing heavily on the structure of its OIDs, but as soon as it receives something from the Utah State Dept. of Health or from the CDC, it would not necessarily be able to infer any meaning from the OIDs assigned by those other organizations.

Can the root part of the OID be implied by some context?

This really asks whether we can reduce the size of messages by setting a specific context, probably in the message header, that would be prepended to each incomplete OID appearing in the message.

Apart from reducing message length, this does not seem to be a particularly useful feature. ISO Object Identifiers do not support any left-side incompleteness. We probably need not bother.
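The point that only the owner of a subtree can read meaning into it can be illustrated with a small Python sketch. Everything here is hypothetical: `LOCAL_ROOT` reuses the prefix of the LDS example OID purely for illustration, and the institution/department/section layout follows the Intermountain scenario described above rather than any real assignment.

```python
from typing import Optional

# Hypothetical root OID owned by the organization (not a real assignment).
LOCAL_ROOT = (2, 16, 840, 1, 113883, 4, 1, 123456)

# Hypothetical local convention below LOCAL_ROOT:
#   <institution>.<department>.<section>, where the section arc is
#   1 = administration, 2 = ICU, 3 = lab, 100..999 = inpatient wards.
SECTION_NAMES = {1: "administration", 2: "ICU", 3: "lab"}


def describe(oid_text: str) -> Optional[str]:
    """Interpret an OID by the local convention, or return None.

    Meaning can only be inferred inside our own subtree; OIDs assigned by
    anyone else are treated as opaque identifiers.
    """
    arcs = tuple(int(arc) for arc in oid_text.split("."))
    if arcs[: len(LOCAL_ROOT)] != LOCAL_ROOT:
        return None  # foreign OID: no meaning can be inferred
    rest = arcs[len(LOCAL_ROOT):]
    if len(rest) < 3:
        return None  # too short to carry institution, department and section
    institution, department, section = rest[:3]
    if section in SECTION_NAMES:
        unit = SECTION_NAMES[section]
    elif 100 <= section <= 999:
        unit = f"inpatient ward {section}"
    else:
        unit = f"section {section}"
    return f"institution {institution}, department {department}, {unit}"


print(describe("2.16.840.1.113883.4.1.123456.7.4.2"))    # institution 7, department 4, ICU
print(describe("2.16.840.1.113883.4.1.123456.7.4.250"))  # institution 7, department 4, inpatient ward 250
print(describe("1.2.3.4.5.6.7"))                         # None
```

A router built this way can act on its own OIDs, but it deliberately returns nothing for OIDs from other assigning authorities.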


The main benefits of the Technical Instance Identifier using ISO Object Identifiers are:

Simplicity (only two components!)
Flexibility (OIDs are already quite flexible; the "extension" component gives you all the rest.)
Interoperability (no worries about name clashes, no headaches with local stuff. Actually, everything is local, but those localities are well organized in the overall OID system.)

3.2.3 Technical Instance Locator

Another kind of technical instance identifier is the dereferencable identifier, or "locator". The Technical Instance Locator (TIL) is shaped like the Uniform Resource Locator (URL). That is, a TIL has the two components protocol and address, where the format of the address is determined solely by the protocol. Telephone numbers, e-mail addresses, and the locator of the reference pointer type would be of this data type.

Technical Instance Locator

This is a dereferencable locator for some instance, for example a bunch of radiology images that can be retrieved on demand. A given instance of this data type may not be valid forever.

component name | type/domain | optionality | description
protocol | Code Value [p. 116] for technical concepts | required | Names the protocol that can interpret the address string and do something useful for the user to render the particular technical instance referred to. This may be spawning a WWW browser with a particular URL, fetching a DICOM image and showing it, or opening a telephone connection to another party.
address | Character String [p. 40] | required | An arbitrary address string that must be meaningful to the protocol.

This data type is basically the URL. However, URLs are not maintained by <strong>HL7</strong>, and <strong>HL7</strong> may need more freedom to define its own protocols without adjusting to IETF needs. For example, telephone numbers are semantically clearly Technical Instance Locators. A URL for telephone numbers does not exist, but it is conceivable how it would work: it would use an auto-dialer to dial the telephone number, put the called party on hold, and signal to the human user that the line is open. The human user would then pick up the phone and start talking. Likewise, a URL for fax data would initiate calls to send or retrieve telefax messages automatically.
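A Python sketch of the TIL as a protocol/address pair, with dereferencing dispatched on the protocol, may make the idea concrete. The names `CodeValue`, `TIL`, `HANDLERS` and `dereference` are illustrative assumptions; the handlers for the non-URL protocols only describe what an auto-dialer or fax server would do instead of actually calling one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class CodeValue:
    value: str
    code_system: str


@dataclass(frozen=True)
class TIL:
    """Technical Instance Locator: a protocol plus an address that only the protocol interprets."""
    protocol: CodeValue
    address: str

    def to_url(self) -> Optional[str]:
        """Render as a URL when the protocol is a URL scheme; otherwise return None."""
        if self.protocol.code_system == "URL":
            return f"{self.protocol.value}:{self.address}"
        return None


# Hypothetical handlers for protocols without a URL scheme, keyed by
# (code system, code). They only describe the action they would take.
HANDLERS: Dict[Tuple[str, str], Callable[[str], str]] = {
    ("HL7PROT", "PHONE"): lambda number: f"auto-dial {number}, then hand the line to the user",
    ("HL7PROT", "FAX"): lambda number: f"send or retrieve fax via {number}",
}


def dereference(til: TIL) -> str:
    url = til.to_url()
    if url is not None:
        return f"open {url}"
    handler = HANDLERS.get((til.protocol.code_system, til.protocol.value))
    if handler is None:
        raise ValueError(f"no handler for protocol {til.protocol}")
    return handler(til.address)


web = TIL(CodeValue("http", "URL"), "//aurora.rg.iupui.edu/v3dt")
phone = TIL(CodeValue("PHONE", "HL7PROT"), "+13176307960")
print(dereference(web))    # open http://aurora.rg.iupui.edu/v3dt
print(dereference(phone))  # auto-dial +13176307960, then hand the line to the user
```

Keeping the address opaque and keying the behavior on the protocol is what lets <strong>HL7</strong> add protocols such as PHONE that have no URL scheme.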


Examples for values of the TIL type are:

(TIL :protocol (CodeValue :value "http" :codeSystem "URL") :address "//aurora.rg.iupui.edu/v3dt")
(TIL :protocol (CodeValue :value "ftp" :codeSystem "URL") :address "//radiology.rg.iupui.edu/outbox/1ad832nd84nf.jpg")
(TIL :protocol (CodeValue :value "mailto" :codeSystem "URL") :address "your-boss@your-company.com")
(TIL :protocol (CodeValue :value "PHONE" :codeSystem "<strong>HL7</strong>PROT") :address "+13176307960")
(TIL :protocol (CodeValue :value "FAX" :codeSystem "<strong>HL7</strong>PROT") :address "+13176306962")

3.2.4 Outstanding Issues

We still need to define a successor of the reference pointer (RP) that includes the technical instance locator but also more information about the thing that is referred to. This would also include an expiry date after which the locator cannot be expected to be usable.
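Since the successor of the reference pointer is still undefined, the following Python sketch only illustrates the expiry idea under stated assumptions: `Locator` is a simplified stand-in for a TIL, and `ReferencePointer`, its `description` field and `is_usable` are hypothetical names, not part of this specification.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Locator:
    """Simplified stand-in for a Technical Instance Locator."""
    protocol: str
    address: str


@dataclass(frozen=True)
class ReferencePointer:
    """Hypothetical RP successor: a locator, information about the target, and an expiry."""
    locator: Locator
    description: Optional[str] = None  # assumed slot for "more information about the thing"
    expires: Optional[datetime] = None  # None: no known expiry date

    def is_usable(self, at: datetime) -> bool:
        """After the expiry date the locator cannot be expected to work."""
        return self.expires is None or at < self.expires


rp = ReferencePointer(
    Locator("ftp", "//radiology.rg.iupui.edu/outbox/1ad832nd84nf.jpg"),
    description="radiology image",
    expires=datetime(1999, 6, 30, tzinfo=timezone.utc),
)
print(rp.is_usable(datetime(1999, 3, 22, tzinfo=timezone.utc)))  # True
print(rp.is_usable(datetime(1999, 7, 1, tzinfo=timezone.utc)))   # False
```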


3.3 Real World Instances

We refer to things in the "real world" generally by giving them names. Assigning names to people, things and places is a public act: the more people know a name, the more people will later understand what is meant by it. In archaic cultures, knowing the name of something meant having some power over it. Indeed, knowledge is power, and without a name we cannot talk about things, we can barely think of things, and we cannot collect knowledge about them. The record linking problem is a modern example that shows the importance of names. Names are the communicative handles on things.

Alternatively, instead of naming things, we can describe them. The problem with descriptions is that they refer to the class of everything that meets the description; descriptions do not refer to individuals. Of course, a description can be so detailed that there happens to be no second matching object in a given universe of discourse. Thus a description may identify an object.

As opposed to a description, a name is essentially an arbitrary token assigned to the object it refers to. Since the assignment of a name is an action, it is performed by some actor. In the real world many actors are entitled to assign names to entities, so it happens that two or more things are given the same name. Moreover, the association between a thing and its name is not substantial; thus, this association can be lost. Birth certificates, passports, or tags are artifacts that aim at substantiating the name-thing association.

This specification covers the following kinds of names:

Real World Instance Identifier [p. 74] (e.g., SSN, DLN, Inventory #, etc.)
Postal and Residential Address [p. 83]
Person Name [p. 94]
Organization Name [p. 114]

Real World Instance Identifiers (RWII) are tokens designed to produce regular names, that is, names that are handy and have little ambiguity. Mostly those identifiers are designed to be easily computer-processable. The difference from a Technical Instance Identifier [p. 65] (TII) is that the TII naming scheme is tightly regulated, and TIIs are supposed never to go through the hands of humans. Conversely, the RWII does not regulate the naming scheme, and RWIIs are often tagged on things, issued on ID cards, and typed into information systems.

The Person Name specification must deal with all the richness, variability, and ambiguity that the cultural elaborations of person names entail. Organization names are very similar to person names; however, we simplify organization names drastically, since it was felt that organization names play a much less crucial role in health care than person names.


Addresses are also names for real world entities. The fact that locations tend to be extremely stable over long periods of time determines the structure of this kind of name. Addresses determine locations by stepwise refinement of a scope (country - city - street - house - floor). Most scope names have all the characteristics of names, i.e. they are arbitrarily assigned, non-descriptive, and not unique. Apart from scope refinement, all kinds of spatial descriptors can be part of an address (e.g. right-hand side, opposite side).

3.3.1 Real World Instance Identifier

Note: This section is a proposal of the Data Type working group and still needs to be negotiated with PAFM.

External identifiers for real world people and things occur frequently. Examples of person identifiers are the Social Security Number, Driver License Number, Passport Number, and Individual Taxpayer Identification Number. Identifiers for organizations are, e.g., the federal identification number or the Employer Identification Number. The current approach in the RIM is to use the Stakeholder_identifier class for those numbers.

Here are some of those identifiers used in the U.S.:

SSN, used as a legal individual person identifier
ITIN (Individual Taxpayer Identification Number), like an SSN but issued by the IRS for aliens not eligible for an SSN
EIN (Employer Identification Number), used by the IRS for organizations
FIN (Federal Identification Number?) for corporations
DLN (Driver License Number). U.S. driver licenses are issued by the states. Driver licenses in the U.S. are used as identity cards.
The "Universal" (meaning "U.S. American") Health Identifier, if it ever comes
Health Care Provider Identification Number (?)
Passport Number

Other countries may or may not have similar identifiers. The interesting point is that such identifiers are often used for other than their original purposes. For example, very few people in the U.S. care whether you have a license to drive, but they want your driver license number anyway in order to get hold of your identity (e.g., to trust your bank check). Officially, the U.S. SSN may not be used by everyone, but that does not keep everyone from using it as a pretty reliable person identifier. Banks and employers must collect the SSNs of their customers and employees (respectively) for tax purposes.

However, there are other such identification numbers that are not issued for persons. Those numbers have basically the same semantics and the same requirements, except that they might be assigned to real world instances other than people or organizations. Examples are things such as devices and durable material (inventory numbers), lot numbers, etc.


The public health / animal proposal, for example, has a concrete need for the following identification numbers:

lip tattoo - horses
leg tattoo - dogs
ear tags - food animals
microchips - all species
breed registry number - dogs
jockey club - thoroughbred horses
quarterhorse association
US trotting association
Holstein association registry - cows

Such real world instance identifiers are assigned not only by big organizations but also by smaller ones. For example, virtually every organization puts tags with numbers on its inventory.

Medical Record Numbers (MRN) as used in the world of paper medical records are another example of such real world instance identifiers. Note that in the computer world we would not need MRNs, since we could use Technical Instance Identifiers (TII) to refer to computerized medical records. However, Wes Rishel and I think that, as a rule of thumb, TIIs should not be communicated through human middlemen, in order to keep confidence in their correctness high. Thus, as long as MRNs are typed in by clerks and other people, one should keep them separate from TIIs.

The basic structure of such a real world instance identifier is:

component name | type/domain | description
value | Character String [p. 40] | the identifier value itself
validity period | Interval [p. 149] OF Point in Time [p. 144] | covers effective date and expiration, begin and end date/time, etc.
kind | Code Value [p. 116] | a rough classification telling you what kind of identifier this is (e.g. SSN, DLN, Passport, inventory, etc.)
assigning authority | ? | an organization that has authority over, and issued, an identifier
name space | ? | An organization may maintain different name spaces without necessarily creating organizational subdivisions. Thus one assigning authority may maintain multiple name spaces.
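The basic structure in the table can be sketched as a Python record with a validity check. This is an illustration under assumptions: the class names are hypothetical, the interval is treated as closed, and `None` on either side is read as unbounded (a simplification, since the Interval type distinguishes infinite from unknown bounds). The assigning authority and name space stay plain optional strings because their representation is exactly the open question discussed next.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Closed interval of dates; None on either side is treated as unbounded."""
    low: Optional[date] = None
    high: Optional[date] = None

    def contains(self, when: date) -> bool:
        return (self.low is None or self.low <= when) and (self.high is None or when <= self.high)


@dataclass(frozen=True)
class RealWorldInstanceIdentifier:
    value: str  # the identifier value itself
    kind: str   # rough classification, e.g. "SSN", "DLN", "inventory"
    validity_period: Interval = field(default_factory=Interval)
    assigning_authority: Optional[str] = None  # representation still undecided
    name_space: Optional[str] = None           # likewise undecided

    def is_valid_on(self, when: date) -> bool:
        return self.validity_period.contains(when)


dln = RealWorldInstanceIdentifier("1234-56-7890", "DLN", Interval(date(1995, 1, 1), date(1999, 12, 31)))
print(dln.is_valid_on(date(1999, 3, 22)))  # True
print(dln.is_valid_on(date(2000, 1, 1)))   # False
```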


The main methodological question is how we represent the identifier assigning authority. This would usually be an organization, and hence an issuing authority would be represented by an association to the Organization class. This is basically what the Stakeholder_identifier class does in RIM 0.88.

However, this is also a problem. We are able to carry quite a lot of information about the identifier assigning authority, which is good. But the structure is rather complex, which is bad. In particular, while we all know that SSNs, DLNs, etc. are issued by organizations, we do not care much about that organization. The only thing we want to know is that a given number is an SSN.

However, things become tricky if we try to take a shortcut. The problem is that SSNs and DLNs are valid in realms defined by the issuing authorities. For example, for a DLN we need to know the state. For an SSN in an international context, we need to know the country.

With a mandatory link to an assigning authority, an Indiana driver license would be represented as having the "Indiana Bureau of Motor Vehicles (BMV)" as its issuing authority. This is troublesome because someone in California might not know that there is a BMV in Indiana. The BMV, of course, is an affiliate of the state of Indiana, but communicating this as a super-organization may be too much. In international contexts, we would have to go once more through the stakeholder-affiliate loop so that the receiver can find out that Indiana is actually a part of the U.S. While this may be the correct solution, it seems rather impractical.

The following principal options exist:

1. Association with a stakeholder (or organization) as the assigning authority. A clean, but somewhat verbose, heavyweight way, as described.

   Real World Instance Identifier (RWII)
   value     | Character String
   authority | reference to Organization

   In this alternative we point to an Organization class instance from inside the data type. This is a weird construct that we have never seen before in the world of the RIM vs. Data Types dichotomy.

2. The Organization as an assigning authority would itself have one or more RWIIs. Thus, one represents the assigning authority recursively as an RWII.

   Real World Instance Identifier (RWII)
   value     | Character String
   authority | RWII

   This is a specific way to make the reference to an assigning authority Organization, i.e. by looking up the organization through its RWII.

3. An OID for the assigning authority, which structurally renders the RWII similar to the TII but with very different semantics.

   Real World Instance Identifier (RWII)
   value     | Character String
   authority | ISO Object Identifier

   This alternative, while structurally similar to the TII, is in fact very different. The TII is supposed to be globally and dependably unique. This dependable uniqueness cannot be required of real world identifiers, which are often reported orally or on paper. Moreover, such numbers are often reused, either accidentally (roll-over of counters) or deliberately (old number considered outdated).

4. The traditional way to represent the assigning authority would be through a single "code" from some "master table".

   Real World Instance Identifier (RWII)
   value     | Character String
   authority | Character String

Options 3 and 4 are seemingly simple, but they do lead to practicability problems: they don't scale. The OID is pseudo-unique and not meaningful (e.g. what is the OID of the state of Indiana?). In both options 3 and 4 you have to interpret the authority part from some unknown table or directory. This would not be a real problem if RWIIs were only such official things as SSN, ITIN, EIN, FIN, DLN, etc. But traditional medical record numbers are assigned locally, and so are inventory numbers for devices.

Options 2 through 4 use various schemes of foreign keys to refer to organizations, which violates the MDF rule that foreign keys must be turned into explicit associations. Alternative 1 is in principle open as to whether or not foreign keys are used, but if data types are considered different from RIM classes, the question is how such an association from a data type to a RIM class could be made.

Regardless of whether the MDF deprecates foreign keys, this identifier data type "wants to be a foreign key" (as Mark Tucker puts it). Indeed, this data type embodies the fact that we use "keys" in order to refer to things across (foreign) models.

Mark Tucker further offered the following "trick" to make alternative 4 usable and, to a certain extent, interoperable: people could use local codes for assigning authorities within their usual communication horizon, assuming that master tables are synchronized. For outside communication, a "row" of such a master table could simply be included in the message. This master table row would be used to map "strings" to "things".

This allows for very short forms of identifiers, which is good. Conversely, representing the assigning authority as an Organization instance (alternative 1) would lead to ugly, lengthy messages. However, two problems arise:

It is not guaranteed that the strings for assigning authorities would be unique within a message.
How would we represent this "master file" construct?

The Stakeholder hierarchy basically is such a master file structure. Thus the question is why we would represent associations to "master" stuff differently for this data type than for all other RIM classes.

There is no easy way out of this dilemma, which suggests putting this Real World Instance Identifier "data type" as a class directly into the RIM. This allows the "data type" to associate with other classes, such as Organization. From this "data type" we can define CMETs, and we can implement those on ITSs however we like, i.e. we do not have to rely on a stereotypical automatism to derive lengthy ITS representations when a short form would be more economical and more pleasing to the "look and feel" of the message.

There are a number of RIM changes pending that need discussion and a vote jointly with PAFM and CQ at the upcoming <strong>HL7</strong> meeting (Toronto). Figure 4 [p. 78] shows the structure around Stakeholder_identifier as of RIM 0.88.
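Mark Tucker's master-table trick (alternative 4 plus shipped table rows) can be sketched in Python as follows. The names `AuthorityRow` and `resolve_authorities` are hypothetical, and the rule that rows carried in the message take precedence over the local table is an assumed design choice. The check for conflicting rows addresses the first of the two problems listed above; the second (how to represent the master file construct in the RIM) is not something code can settle.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AuthorityRow:
    """One row of a hypothetical assigning-authority master table."""
    code: str          # short local string used inside identifiers
    organization: str  # the "thing" the string stands for


def resolve_authorities(
    identifiers: List[Tuple[str, str]],    # (value, authority code) pairs from a message
    local_table: Dict[str, AuthorityRow],  # synchronized within the usual communication horizon
    message_rows: List[AuthorityRow],      # rows shipped along for outside partners
) -> List[Tuple[str, str]]:
    # Rows carried in one message must not give the same code two meanings.
    shipped: Dict[str, AuthorityRow] = {}
    for row in message_rows:
        if row.code in shipped and shipped[row.code] != row:
            raise ValueError(f"authority code {row.code!r} is ambiguous in this message")
        shipped[row.code] = row
    resolved = []
    for value, code in identifiers:
        # Assumption: the sender's shipped rows take precedence over our local table.
        row = shipped.get(code) or local_table.get(code)
        if row is None:
            raise KeyError(f"unknown assigning authority {code!r}")
        resolved.append((value, row.organization))
    return resolved


local = {"INBMV": AuthorityRow("INBMV", "Indiana Bureau of Motor Vehicles")}
rows = [AuthorityRow("LDSMRN", "LDS Hospital medical records")]
print(resolve_authorities([("1234", "INBMV"), ("987", "LDSMRN")], local, rows))
# [('1234', 'Indiana Bureau of Motor Vehicles'), ('987', 'LDS Hospital medical records')]
```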


Figure 4: Stakeholder_identifier as of RIM 0.88.

The changes in detail are as follows:

PAFM
1. PAFM (Richard Ohlmann) suggested passing stewardship of the Stakeholder_identifier class over to Control/Query.
Rationale: this class will undergo a broadening of scope. PAFM therefore no longer has to bear the burden of maintaining this class for everyone else. That's what Control/Query is for.

CQ
2. Rename class "Stakeholder_identifier" to "Real_world_instance_identifier".
Rationale: to signify the broadening of this class's scope.
3. Rename attribute "id" to "value" in order to disambiguate this attribute from a technical instance identifier.
4. Assign data type Character String (ST) to the attribute "value".
5. Rename attribute "effective_dt" to "validity_period".
6. Assign data type "Interval of Point in Time" to attribute validity_period.
7. Delete attribute termination_dt.
Rationale: the two attributes effective_dt and termination_dt were used to signify the validity period of the identifier. A period of time can be represented more properly (and more compactly) by the new data type Interval of Point in Time. This allows for infinite as well as unknown begin and termination dates.
8. Delete attribute issued_dt.
Rationale: it is unclear why the date of issue differs from the effective date. There seems to be no use case for it (PAFM folks: please confirm or defend!)
9. Delete attribute qualifying_information_txt.
Rationale: the use of this attribute is in part taken over by "namespace". Where it is not handled through a namespace, different assigning authorities should be used. This prevents the same information from being representable in different ways.
10. Rename class "Identifier_assigning_authority" to "Identifier_namespace".
Definition: A list of identifiers owned and managed by an organization stakeholder. An organization that manages a name space is an identifier assigning authority.
11. Remove all attributes.
Rationale: This is no longer a role class. Nobody could define the use case of the old role class and its begin/end time attributes. It seems to have been created as a modeling stereotype that was not useful in practice.
12. Add attribute "name" of type Character String (ST).
Definition: The name of a namespace is a symbol that might be used as a short form for the namespace in messages. This accommodates the practice that assigning authorities are just kept in a table of symbols, without attaching any real information about the organization.
13. Change role names and multiplicities as shown in Figure 5 [p. 82].

PAFM
14. Move attribute citizenship_country_cd from Person to Stakeholder.
Rationale: in an international use context of <strong>HL7</strong> it is necessary to keep track of the "citizenship" of organizations as well as of individual persons.
15. Rename attribute "citizenship_country_cd" to "citizenship_cd".
Rationale: A shorter name is easier to read, write, speak and memorize.
16. Delete attribute "nationality_cd".
Rationale: The difference between citizenship and nationality is unclear and did not exist in <strong>HL7</strong> v2.x; thus, the attribute can be deleted.

PAFM
The following are suggestions for simplifying the stakeholder affiliation loop. These changes are not essential to the Control/Query-related requirements. Nevertheless, since the stakeholder affiliation loop would be used by all of Control/Query's "customers", we have an interest in making it as unencumbered as possible.
17. Move attribute "family_relationship_cd" from Stakeholder_affiliate to Stakeholder_affiliation.
18. Reroute association: attach the "secondary_participant" association of Stakeholder_affiliation directly to Stakeholder.
19. Delete class Stakeholder_affiliate.
Rationale: This additional relationship class on the "secondary" leg of Stakeholder_affiliation was primarily a modeling stereotype of little known practical use. The family relationship can just as well be carried by the Stakeholder_affiliation class where applicable. This leads to a model that is simpler to use and simpler to understand while maintaining the same level of expressiveness and explicitness.
20. Delete association loop "subdivision" at Organization.
Rationale: this subdividing of organizations is a kind of "affiliation" relationship, which would also be expressed by the "Stakeholder_affiliation" class. There should be only one way of expressing affiliations (including subdivision). Stakeholder_affiliation.family_relationship_cd should have a value reserved for subdivision of organizations. Note that affiliation_type_cd expresses the "purpose" of a particular affiliation (e.g. emergency contact), while family_relationship_cd is the durable relationship between stakeholders throughout all purposeful affiliations.

Others
21. New association: classes that would have a real world instance identifier, such as "Durable_medical_equipment", should be associated with the Real_world_instance_identifier class. This exemplifies that the new class can be used to identify not only stakeholders but also things and animals.
We can also reuse this data type in order to put the identifiers for stakeholders in their proper place in the model, instead of pushing them all up into the highest level of the hierarchy, i.e. the Stakeholder class.

The following diagram shows the effect of the proposed changes.


[Figure 5: The Stakeholder_identifier has become the "Real World Instance Identifier" and is thus useful for other things, such as the inventory number of medical devices. The diagram shows Real_world_instance_identifier (value : ST, type_cd : CV, validity_period) assigned to either a Stakeholder or a Durable_medical_equipment ({one-of}) and drawn from an Identifier_namespace (name : ST) that is managed by an Organization, together with the classes Stakeholder, Stakeholder_affiliation, Organization, Person and Person_name.]

This is basically a stepwise RIM change as would be required for Harmonization. We will discuss this with PAFM and other affected technical committees at the next <strong>HL7</strong> meeting (Toronto).
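The core of the proposed model can be read as the following Python sketch. It is a hypothetical rendering, not generated from the RIM: class names follow the diagram, `type_cd` is simplified to a plain string, `validity_period` is omitted for brevity, and the `{one-of}` constraint is interpreted as "assigned to exactly one of Stakeholder or Durable_medical_equipment".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Organization:
    name: str


@dataclass
class IdentifierNamespace:
    name: str                                  # short symbol usable in messages (change 12)
    managed_by: Optional[Organization] = None  # the organization acting as assigning authority


@dataclass
class Stakeholder:
    label: str


@dataclass
class DurableMedicalEquipment:
    label: str


@dataclass
class RealWorldInstanceIdentifier:
    value: str    # ST
    type_cd: str  # CV, simplified to a plain string here
    from_namespace: Optional[IdentifierNamespace] = None
    stakeholder: Optional[Stakeholder] = None
    equipment: Optional[DurableMedicalEquipment] = None

    def __post_init__(self) -> None:
        # {one-of}, read here as: assigned to exactly one kind of identified thing.
        if (self.stakeholder is None) == (self.equipment is None):
            raise ValueError("assign to exactly one of stakeholder or equipment")


inventory = IdentifierNamespace("INV", managed_by=Organization("LDS Hospital"))
tag = RealWorldInstanceIdentifier("4711", "inventory", inventory,
                                  equipment=DurableMedicalEquipment("infusion pump"))
print(tag.from_namespace.managed_by.name)  # LDS Hospital
```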


3.3.2 Postal and Residential Address

The old <strong>HL7</strong> address data types (AD, XAD) regarded an address as a data structure where each component had a special role. For instance, AD distinguished ZIP, city, state, country, street, and other parts of the address.

Over time people discovered more information elements that could be known about an address and added those elements as components to the address data type. Those additional components were county, census tract, etc. Those information items would normally not appear on mailing labels, and one would not necessarily ask for them when going to visit someone at a given address.

On the other hand, it turned out that there are a number of information elements that do appear on mailing labels but are nevertheless rare and therefore remained unclassified. For instance, U.S. military addresses may have a unit designation "UNIT 2050" instead of a street and instead of, or in addition to, a city. The name of a ship (e.g. "U.S.S. Enterprise") can appear instead of a city.

Internationally, there are other address parts that may exist in one country but be unknown in another. For example, in U.S. addresses one finds directional codes like "N", "S", "W", and "E", which are essential for finding a given address in a city. Those direction codes are unknown, for instance, in Germany.

Robin Zimmerman and Joann Larson have compiled an analysis of U.S. and some international addresses based on information from the Universal Postal Union (UPU) (http://www.upu.int/). This work reinforces the observation that there are so many different kinds of address parts that creating a fixed data structure where every part has its slot is impractical. See also the examples of worldwide addresses (http://www.upu.int/addressing/AN/AN.pdf) published by the UPU. There is also an Australian standard that defines the pieces an address is made up of.

Another problem with the old address data types was that they ordered the parts of an address by the meaning of each part. The most important use case for address information, however, is printing a mailing label. To generate a mailing label it doesn't matter what the different parts of an address mean, as long as those parts appear at the appropriate place on the label.

The placement of address parts, however, depends on the country. For example, while in U.S. and most European addresses the ZIP code appears somewhere near the end, Japanese ZIP codes are written at the very top. In fact, Japanese addresses are written in the reverse direction: from the most general locator to the most specific location, with the name of the recipient appearing at the end.


Even among addresses in the north-western part of the world there are differences in how ZIP code and city are placed. In Germany and most European countries, for instance, the ZIP code is placed in front of the city, while in England the ZIP code appears after the city name on a separate line. In the U.S. the ZIP code follows the city and usually the state code. In most European countries, special country codes (different from ISO 3166 country codes) are written before the ZIP code, separated from it by a dash. In the U.S. and England, country designators appear at the end. In Great Britain, however, the ZIP code appears even after the country designator, whereas in the U.S.A. the country designator appears at the very end.

In short, the layout and the meaning of address parts are independent (orthogonal) issues, but the address data type must take care of both. The focus, however, is not on the meaning of the parts, but on the layout. Although we could define a semantically very fine-grained address part classification, it would be impractical to use with the large majority of existing information systems, which do not make such fine-grained semantic distinctions. There are simply too many different address parts and too many country-specific variations that may or may not really correspond.

Thus, focusing primarily on the layout of address labels is a way to establish a greatest common denominator for interoperability. System A might store addresses in 5 lines. System B might distinguish ZIP code, country, state and a street line. System C might distinguish a house number on the street line (common in Germany or Holland). System B can use system C’s addresses, and A can use addresses from both B and C.

It is still a problem how system C can find a house number in the street line, or how system B can identify a street line in a list of lines received from system A. Rather than forcing everyone to make the most fine-grained distinctions, we require the systems that make the distinctions to deal with the less distinctive addresses.
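The fine-to-coarse direction described above (system B using system C’s addresses, system A using both) can be sketched in Python. This is a minimal illustration under our own assumptions, not part of the specification: an address is modeled as an ordered list of (value, role) pairs using the role codes of the Address Part type, and the hypothetical function `coarsen` relabels every role the receiving system does not distinguish as the default role LIT. Because the order of the parts never changes, the label layout survives the conversion.

```python
from typing import List, Optional, Set, Tuple

# Assumed representation: (value, role) pairs in label order.
# A DEL part with value None stands for a line break.
Part = Tuple[Optional[str], str]


def coarsen(parts: List[Part], known_roles: Set[str]) -> List[Part]:
    """Hypothetical helper: keep every part in place, but relabel roles
    the receiver does not distinguish as the default role LIT.
    DEL parts are always kept because they carry the layout."""
    result: List[Part] = []
    for value, role in parts:
        if role == "DEL" or role in known_roles:
            result.append((value, role))
        else:
            result.append((value, "LIT"))
    return result


# Hypothetical system C data: the house number is tagged separately.
system_c = [("Windsteiner Weg", "STR"), ("54A", "HNR"), (None, "DEL"),
            ("14165", "ZIP"), ("Berlin", "CTY")]

# Assumed capabilities: system B knows ZIP, country, state and street;
# system A knows no roles at all and only keeps lines.
for_b = coarsen(system_c, {"ZIP", "CNT", "STA", "STR"})
for_a = coarsen(system_c, set())
```

Here `for_b` turns the HNR and CTY parts into plain literals, while `for_a` keeps only the DEL roles. The reverse direction, recovering a house number from an untagged street line, is the open problem mentioned above and is not attempted.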


Postal and Residential Address

This Address data type is used to communicate postal addresses and residential addresses. The main use of such data is to allow printing mail labels (postal address), or to allow a person to physically visit that address (residential address). The difference between a postal and a residential address is whether or not there is just a post box. The residential address is not supposed to contain other information that might be useful for finding geographic locations or doing epidemiological studies. These addresses are thus not very well suited for describing the locations of mobile visits or the "residency" of homeless people.

Components:

purpose (Code Value [p. 116]; optional)
A purpose code indicates what a given address is to be used for. Examples are: preferred residency (used primarily for visiting), temporary (visit or mailing, but see History [p. 154]), preferred mailing address (used specifically for mailing), and some more specific ones, such as "birth address" (to track addresses of small children). An address without a specific purpose code might be a default address useful for any purpose, but an address with a specific purpose code would be preferred for that respective purpose.

bad address flag (Boolean [p. 28]; optional)
Indicates that an address is not working. Absence of the flag means "unknown" status, i.e., that it is presumably a good address. If the flag is set explicitly to false, it means that this address has been proven to work at least once.

value (LIST OF Address Part [p. 86]; mandatory)
This contains the actual address data as a list of address parts that may or may not have semantic tags.
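The purpose code and the three-valued bad address flag suggest a simple way for a receiver to choose among several addresses. The Python sketch below is an assumption, not something the specification prescribes: `Address` and `pick_address` are hypothetical names, purpose codes use the long forms from the Purpose Codes for Address table (RES, PO, TMP, BRTH), addresses whose flag is explicitly true are skipped, and an unset flag is treated as presumably good.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Address:
    # Hypothetical model of the Address components.
    purpose: Optional[str] = None       # e.g. "RES", "PO"; None = no specific purpose
    bad_address: Optional[bool] = None  # None: unknown (presumed good)
                                        # True: not working
                                        # False: proven to have worked at least once
    parts: list = field(default_factory=list)  # address parts in label order


def pick_address(addresses: List[Address], purpose: str) -> Optional[Address]:
    """Prefer a usable address with the requested purpose code; otherwise
    fall back to a usable address that has no purpose code at all."""
    usable = [a for a in addresses if a.bad_address is not True]
    for a in usable:
        if a.purpose == purpose:
            return a
    for a in usable:
        if a.purpose is None:
            return a
    return None
```

For example, given a residential address, a postal address flagged as bad, and a default address without a purpose code, a request for "PO" falls back to the default address. A refinement, not shown, could prefer addresses whose flag is false (proven to work) over those whose flag is unset.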


Address Part

This type is not used outside of the Address [p. 85] data type. Addresses are regarded as a token list. Tokens usually are character strings but may have a tag that signifies the role of the token. Typical parts that exist in almost every address are ZIP code, city, and country, but other roles may be defined regionally, nationally, or on an enterprise level (e.g., in military addresses). Addresses are usually broken up into lines, which is indicated by special line break tokens.

Components:

value (Character String [p. 40]; mandatory, except for line break tokens)
The value of an address part is what is printed on a label.

role (Code Value [p. 116]; optional)
The role of an address part (if any) indicates whether an address part is the ZIP code, city, country, post box, etc.

Purpose Codes for Address

Short  Long  Meaning
R      RES   residency; used primarily to visit an address.
P      PO    postal address; used to send mail.
T      TMP   temporary address; visit or mailing, but see History [p. 154].
B      BRTH  birth address; CDC uses those for child immunization.
...

Role Codes for Address Parts


Short  Long  Meaning
L      LIT   literal; this is the default role code.
K      DEL   delimiter; text printed without framing white space. Line break if no value component is provided.
C      CNT   country
T      CTY   city (town)
E      STA   state ("E" as in French "état", which should reconcile the French, who have to use "E" for their "départements")
Z      ZIP   ZIP code
H      HNR   house number (aka "primary street number"; however, it is not the number of the street, but the number of the house or lot alongside the street)
A      ADL   additional locator; can be a unit designator, such as apartment number or suite number, but also floor. There may be several unit designators in an address to cover things like "3rd floor, Appt. 342". This can also be a designator that points away from the location, rather than specifying a smaller location within some larger one. An example is Dutch "t.o." meaning "opposite to", used for house boats.
S      STR   street name or number
ST     STT   street type (e.g., street, avenue, road, lane, ...) (probably not useful enough)
D      DIR   direction (e.g., N, S, W, E)
P      POB   P.O. Box
...

Examples

Please note that the person name is not part of our address type, even though it is mentioned by the UPU and Joann/Robin’s list.

A U.S. address:

1028 Pinewood Court
Indianapolis, IN 46240
U.S.A.


(Address (LIST
  (AddressPart :value "1028 Pinewood Court")        ; LIT is the default role
  (AddressPart :role "DEL")                         ; DEL’s value is newline by default
  (AddressPart :value "Indianapolis" :role "CTY")
  (AddressPart :value ", " :role "DEL")             ; DEL comes w/o extra space
  (AddressPart :value "IN" :role "STA")
  (AddressPart :value "46240" :role "ZIP")
  (AddressPart :role "DEL")                         ; DEL’s value is newline by default
  (AddressPart :value "U.S.A." :role "CNT")))

A German address:

Windsteiner Weg 54A
D-14165 Berlin

(Address (LIST
  (AddressPart :value "Windsteiner Weg 54A")        ; LIT is the default role
  (AddressPart :role "DEL")                         ; DEL’s value is newline by default
  (AddressPart :value "D" :role "CNT")
  (AddressPart :value "-" :role "DEL")              ; no white space before and after
  (AddressPart :value "14165" :role "ZIP")
  (AddressPart :value "Berlin" :role "CTY")))

White Space Rules

Address labels contain white space. The white space rules used in typesetting are not trivial. In general, two words are separated by white space. A punctuation mark, like a comma or period, follows directly after the preceding non-whitespace text, but such marks are always followed by white space. Dashes are not surrounded by white space at all. Note that these white space rules do not really exist for languages such as Thai or Japanese, where white space is basically not used. However, white space can always simply be ignored there, which is why Thai and Japanese are easier to print. In any case, neither Thai nor Japanese would have white space where it was not allowed in Latin script.

For the purpose of the Address data type, the difficult white space rules can be broken down into only six precise rules:

1. White space never accumulates, i.e., two subsequent spaces are the same as one. Subsequent line breaks can be reduced to one. White space around a line break is not significant.
2. Literals may contain explicit white space, subject to the same white space reduction rules. There is no notion of a literal line break within the text of a single address part.


3. Leading and trailing explicit white space is insignificant in all address parts, except for delimiter (DEL) address parts.
4. By default, an address part is surrounded by implicit white space.
5. Delimiter (DEL) address parts are not surrounded by any implicit white space.
6. Leading and trailing explicit white space is significant in delimiter (DEL) address parts.

This means that all address parts are generally surrounded by white space, but white space never accumulates. Delimiters are never surrounded by implicit white space, and any white space contributed by preceding or succeeding address parts is discarded, whether it was implicit or explicit. For example, all of the following variants

(lit "1028") (lit "Pinewood Court")
(lit "1028 ") (lit "Pinewood Court")
(lit "1028") (lit " Pinewood Court")
(lit "1028 ") (lit " Pinewood Court")
(lit "1028 ") (lit " Pinewood Court")

are printed the same way:

"1028 Pinewood Court"

with only one space between "1028" and "Pinewood Court".

A DEL address part is a delimiter and is never framed by implicit white space. As noted above, a comma is always followed by white space, but this white space has to be part of the value of the delimiter. HL7 systems do not have to enforce all those typographical rules. For example, all of the following variants

(lit "Indianapolis") (del ", ") (lit "IN")
(lit "Indianapolis ") (del ", ") (lit "IN")
(lit "Indianapolis") (del ", ") (lit " IN")
(lit "Indianapolis ") (del ", ") (lit " IN")

are printed the same way:


"Indianapolis, IN"

with no white space before the comma and only one space after the comma, i.e., the space that has been provided literally in the delimiter value string. This literal space could have been missing, as in the following cases

(lit "Indianapolis") (del ",") (lit "IN")
(lit "Indianapolis ") (del ",") (lit "IN")
(lit "Indianapolis") (del ",") (lit " IN")
(lit "Indianapolis ") (del ",") (lit " IN")
(lit "Indianapolis") (del ",") (lit " IN")

which are all printed the same way:

"Indianapolis,IN"

without the space after the comma. This is not good typographic style, but it is not enforced by HL7 rules. No space is wanted around dashes, as in European addresses:

(cnt "D") (del "-") (zip "12200") (cty "Berlin")
(cnt "D ") (del "-") (zip "12200") (cty "Berlin")
(cnt "D ") (del "-") (zip "12200") (cty " Berlin")

which are all printed the same way:

"D-12200 Berlin"

The DEL address part does not need any value, because a DEL’s value is a line break by default. Note that the white space rules apply nicely to line breaks, since a line break makes the trailing white space of the previous line redundant, and the leading white space of the subsequent line is correctly removed too.
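The six white space rules lend themselves to a small rendering routine. The Python sketch below is one possible reading of those rules, not normative text: `render_label` is a hypothetical name, address parts are assumed to be (value, role) pairs, every role other than DEL is treated alike, and a DEL without a value produces a line break. It reproduces the printed forms given in the examples above.

```python
import re
from typing import List, Optional, Tuple

# (value, role) in label order; a DEL part with value None is a line break.
Part = Tuple[Optional[str], str]


def _squeeze(text: str) -> str:
    # Rules 1 and 2: runs of white space count as one space, and a single
    # address part cannot carry a literal line break.
    return re.sub(r"\s+", " ", text)


def render_label(parts: List[Part]) -> str:
    out: List[str] = []
    after_delimiter = True  # no implicit space before the first part
    for value, role in parts:
        if role == "DEL":
            # Rules 5 and 6: no implicit white space around a delimiter,
            # but its own leading and trailing spaces are kept.
            out.append("\n" if value is None else _squeeze(value))
            after_delimiter = True
            continue
        text = _squeeze(value or "").strip()  # rule 3
        if not text:
            continue
        if not after_delimiter:
            out.append(" ")  # rule 4: implicit space between ordinary parts
        out.append(text)
        after_delimiter = False
    label = re.sub(r" +", " ", "".join(out))    # rule 1: spaces never accumulate
    label = re.sub(r" *\n[ \n]*", "\n", label)  # rule 1: around/between line breaks
    return label.strip()


us_address = [
    ("1028 Pinewood Court", "LIT"), (None, "DEL"),
    ("Indianapolis", "CTY"), (", ", "DEL"), ("IN", "STA"), ("46240", "ZIP"),
    (None, "DEL"), ("U.S.A.", "CNT"),
]
print(render_label(us_address))
```

Run on the U.S. example, this prints the three label lines "1028 Pinewood Court", "Indianapolis, IN 46240" and "U.S.A.". Discarding the neighbors’ white space next to a delimiter is achieved simply by stripping every non-DEL value and never emitting an implicit space after a DEL.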


Further Examples

The following is another U.S. address with maximal tagging of the address parts:

1001 W 10th Street RG5
Indianapolis, IN 46202
U.S.A.

(Address (LIST
  (AddressPart :value "1001" :role "HNR")
  (AddressPart :value "W" :role "DIR")
  (AddressPart :value "10th" :role "STR")
  (AddressPart :value "Street" :role "STT")
  (AddressPart :value "RG5" :role "LIT")
  (AddressPart :role "DEL")
  (AddressPart :value "Indianapolis" :role "CTY")
  (AddressPart :value ", " :role "DEL")
  (AddressPart :value "IN" :role "STA")
  (AddressPart :value "46202" :role "ZIP")
  (AddressPart :role "DEL")
  (AddressPart :value "U.S.A." :role "CNT")))

The instance notation shows how different the new address type is from the old HL7 AD/XAD types.

This address type is an interesting construct: it is a kind of inverse of a record data structure. In a record, we have a set of slots that may or may not contain data. In this data type we have a set of data that may or may not be assigned to slots.

XML ITS

It is especially interesting to see how this data type maps into XML. An automatic mapping (such as the one used for the HIMSS demo) would create very long, unreadable XML. But the reason for the popularity of XML is that markup can be added gently to basically "human readable" text. XML-wise, a much nicer representation would be:

1001 W 10th Street RG5
Indianapolis, IN 46240
U.S.A.


The contents of this address could now be refined:

1001 W 10th Street RG5
Indianapolis, IN 46240
U.S.A.

Note that in the above representation we at least allowed address part roles to occur as XML attributes. If DTDs were not used, one could create an even nicer representation by turning the role codes into XML tags:

1001 W 10th Street RG5
Indianapolis, IN 46240
U.S.A.

Actually, the address data type is an example of the paradigmatic use case of XML: a body of data that may or may not be further marked up. It would be very odd if we did not use XML in this classic way for this classic use case.

Outstanding Issues

Should we allow for address part values other than mere Character Strings? In particular, should we allow for code values? Using code values seems to make sense for things like country and state. Using a code table for states or countries is of course safer and allows addresses to be processed into groups.

While this is possible in general, we have three problems:

1. The data type definition and all of the instances would become more complex, since we would have to define AddressPart.value as a type choice between CharacterString [p. 40] and CodeValue [p. 116] (or even ConceptDescriptor [p. 122]!).
2. While there are codes for U.S. states and countries (e.g., ISO 3166 Country Code (http://www.unece.org/trade/rec/rec03en.htm)), those codes are not used uniformly. There are two forms to abbreviate U.S. states, e.g., the Commonwealth of Massachusetts can be "MA" or "Mass.". While the ISO country code is suggested for international use, there is a


long tradition in Europe of abbreviating countries with a different code (the same one that is used for country stickers on cars). Thus, the ISO code for Germany is "DE", but "D" is used all over Europe.
   Since there are different code tables in use, one might even require the Concept Descriptor data type to account for the translations. This is a considerable overhead, and for what use?
3. The use case of codes in addresses is very limited. If a receiver really wants to rely on those codes, we set up a number of requirements that did not exist before: (1) the address part must be tagged with an explicit role, and (2) the right code must be used by the sender. The use case for coded addresses is very localized, which means that the coding of address parts may be needed in one application but not in many others. To print labels and visit people, coded address parts are not essential.

We probably do not want to make the address data type any more complex than it already is. HL7 should certainly not impose more requirements to code certain address parts. It simply does not seem to be a widely demanded use case, and an a priori argument for coded address parts, which could offset the lack of use cases, does not seem to exist.

However, there is one powerful way in which the simpler address data type defined here can meet the needs of those who would like to have coded address fields: type casting. Through type casting, a message would be valid even though the sender put a CodeValue or ConceptDescriptor in place of a CharacterString. This means that a sender who does code address parts is able to send his coded address parts to a peer who also prefers to receive coded address parts where possible. Thus, an implementation may behave as if the address data type were defined in a more complex way.

The point is that we don’t have to make the HL7 specification more difficult to understand and implement for those who do not want this extra feature of coded address parts, and we can still allow those who want to deal with the extra work to go ahead and do it. This is another example where implicit type casting in a well-defined type system proves extremely useful: the canonical specification can remain simple, and still extra requirements can be supported in a compatible way!

3.3.3 Person Name

The HL7 v2 person name data types (PN, XPN) have basically the same problems as the data type for addresses [p. ??]. That is, they try to make slots for data so that whatever name parts exist must be fitted into one of the available slots. This has the same disadvantages. First, name part types do not classify in a simple and interchangeable way throughout all cultures, but still everyone must use the same classification. Second, the meaning of a name part and the positioning of a name part are orthogonal (independent) aspects of a name. As an additional problem, person names may occur in different orderings, and some name parts are or are not used


depending on the use case (e.g., formal vs. familiar style).

The decisions made here were informed by the following references:

1. Bidgood DW Jr, Tracy WR. In search of the name. Proc Annu Symp Comput Appl Med Care, 1993; p. 54-58.
2. Bidgood DW Jr, Tracy WR. ANSI HISPP MSDS: COMMON DATA TYPES for harmonization of communication standards in medical informatics. Final Draft. 10/30/1993. Available as Postscript (http://www.mcis.duke.edu/standards/HISPP/MSDS/CommonDataType1102.ps) or Word (http://www.mcis.duke.edu/standards/HISPP/MSDS/CommonDataType1102.doc).
3. Hopkins R. Strategic short study: names and numbers as identifiers. CEN TC251. Available as PDF (http://www.centc251.org/SSS/NandN/SSSNandN18.pdf) or Word (http://www.centc251.org/SSS/NandN/SSSNandN18.rtf). Note especially Appendix B: National Name Forms, by Arthur Waugh, Australia.
4. Anonymous. A Study on names in the US and in the Netherlands. Available here (http://www.mcis.duke.edu/standards/HL7/localization/HL7NetherlandsNames97-198.htm).
5. This conference call was based on a worksheet that summarizes some earlier discussions.

We first present the proposed data structure for person name, and then we show examples, discuss ramifications, and justify why this particular design has been chosen.

Data Type Specification for Person Name

Earlier discussions included the classes person name and person name variant, but we found that person name needs to be modeled as a RIM class. What we did not realize is that, similar to the stakeholder id, our RIM class already exists; it only needs to be polished.

The RIM class Person_name will be developed from the class Person_alternate_name of RIM 0.88 jointly with PAFM. A person may have multiple instances of the person name class, reflecting the multiple names the person is or was known by.

Within this RIM class, there is a code that indicates what purpose a given name is to be used for. Most people in the world will have one name that is currently used.


Name Purpose Codes

SYMBOL      SHORT  DESCRIPTION
normal      N      The name normally used. May be restricted through validity time intervals.
license     L      A name that is not normally used, but is registered on some record, license, or other certificate of professional or academic credential (includes birth certificates, school records, degrees and titles, and licenses).
artist      A      An artist’s pseudonym; includes "stage name" and writer’s name.
indigenous  I      Indigenous or tribal names, such as those existing among Native Americans and indigenous Australians.
religious   R      Name adopted through the practice of religion. For example, "Father Irenaeus", "Brother John", or "Sister Clementine" are religious names that persons adopted through entering an order or assuming a religious office, or both.

Note that name purpose codes apply to an entire name, which usually consists of several of the name parts described below.

There is also a way to specify the validity time of a name.

This class also contains a representation of a single name variant as a list of person name parts that may or may not have semantic tags.

Those RIM changes will have to be discussed jointly with CQ and PAFM at the Toronto meeting in April 1999. We will seek definite closure on the issue in Toronto, after which Harmonization will be a mere formality, since all relevant parties will have agreed to one proposal.

Person Name (PN)

This type is used in the RIM class Person_name that will be developed from the class Person_alternate_name of RIM 0.88 jointly with PAFM. Person names consist of tagged Person Name Parts [p. 96]. Typical name parts that exist in almost every name are given names and family names; other part types may be defined culturally.

LIST OF PersonNamePart [p. 96]
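The name purpose code and the validity time of a name can be combined to pick the name to use on a given date. The Python sketch below is only an illustration under assumptions of our own: `PersonName` and `name_for` are hypothetical names, the name itself is reduced to a plain string instead of a list of person name parts, purposes use the symbols from the Name Purpose Codes table, and validity bounds are treated as inclusive, with None meaning open-ended.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class PersonName:
    purpose: str                       # e.g. "normal", "license", "artist"
    text: str                          # stands in for the list of name parts
    valid_from: Optional[date] = None  # None: no lower bound (bounds inclusive)
    valid_to: Optional[date] = None    # None: no upper bound


def name_for(names: List[PersonName], purpose: str, on: date) -> Optional[PersonName]:
    """Return the first name with the given purpose that is valid on `on`."""
    for n in names:
        if n.purpose != purpose:
            continue
        if n.valid_from is not None and on < n.valid_from:
            continue
        if n.valid_to is not None and on > n.valid_to:
            continue
        return n
    return None


# Hypothetical person whose normal name changed in mid-1995.
names = [
    PersonName("normal", "Jane Miller", valid_to=date(1995, 6, 30)),
    PersonName("normal", "Jane Smith", valid_from=date(1995, 7, 1)),
    PersonName("artist", "J. M. Stone"),
]
```

With this data, asking for the "normal" name in 1990 yields "Jane Miller", the same request in 2000 yields "Jane Smith", and a purpose with no matching entry, such as "religious", yields None.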


Person Name Part

This type is used in the Person Name data type only. Each person name part may have a tag that signifies the role of the name part. Typical name parts that exist in almost every person name are given names and family names; other part types may be defined culturally.

Components:

value (CharacterString; mandatory)
The value of a name part.

classifiers (SET OF CodeValue; optional)
Classifications of a name part. One name part can fall into multiple categories, such as given name vs. family name and name of public records vs. nickname.

Name Part Classifiers

SYMBOL     SHORT  DESCRIPTION

Axis 1: This is the main classifier. Only one value is allowed.
given      G      Given name (don’t call it "first name", since given names do not always come first).
family     F      Family name; this is the name that links to the genealogy. In some cultures (e.g., Eritrea) the family name of a son is the given name of his father.
prefix     P      A prefix has a strong association to the immediately following name part. A prefix has no implicit trailing white space (it has implicit leading white space, though). Note that prefixes can be inverted.
suffix     S      A suffix has a strong association to the immediately preceding name part. A suffix has no implicit leading white space (it has implicit trailing white space, though). Suffixes cannot be inverted.
delimiter  D      A delimiter has no meaning other than being literally printed in this name representation. A delimiter has no implicit leading or trailing white space.

Axis 2: Name change classifiers describe how a name part came about. More than one value is allowed.
birth      B      A name that a person had shortly after being born. Usually for family names, but may be used to mark given names at birth that may have changed later.


unmarried  U      A name that a person (either sex) had immediately before her/his first marriage. Usually called "maiden name"; this concept of maiden name is only for compatibility with cultures that keep up this traditional concept. In most cases the maiden name is equal to the birth name. If there are adoptions or deed polls before the first marriage, the maiden name should specify the last family name a person acquired before giving it up again through marriage.
chosen     H      A name that a person assumed by free choice. Most systems may not track this, but some might. Subsumed in the concept of "chosen" are pseudonym (alias) and deed poll. The difference in civil dignity of the name part is given through the R classifier below, i.e., a deed poll creates a chosen name of record, whereas a pseudonym creates a name not noted in civil records.
adoption   C      A name that a person took on because of being adopted. Adoptions may happen for adults too, and may happen after marriage. The effect on the "maiden" name is not fully defined and may, as always, simply depend on the discretion of the person or a data entry clerk.
spouse     M      The name assumed from the partner in a marital relationship (hence the "M"). Usually the spouse’s family name. Note that no inference about gender can be made from the existence of spouse names.

Axis 3: Additional classifiers. More than one value is allowed.
nick       N      Indicates that the name part is a nickname. Not explicitly used for prefixes and suffixes, since those inherit this flag from their associated significant name parts. Note that most nicknames are given names, although this is not required.
callme     C      A callme name is a name part (usually a given name) that is preferred when a person is directly addressed.
record     R      This flag indicates that the name part is known in some official record. Usually the antonym of nickname. Note that the name purpose code "license" applies to all name parts of a name, whereas this code applies only to a single name part.
initial    I      Indicates that a name part is just an initial. Initials do not imply a trailing period, since this would not work with non-Latin scripts. Initials may consist of more than one letter, e.g., "Ph." could stand for "Philippe" or "Th." for "Thomas".
invisible  0 (zero)  Indicates that a name part is not normally shown. For instance, traditional maiden names are not normally shown. Middle names may be invisible too.


weak       W      Used only for prefixes and suffixes (affixes). A weak affix has a weaker association to its main name part than a genuine (strong) affix. Weak prefixes are not normally inverted. When a weak affix and a strong affix occur together, the strong affix is closer to its associated main name part than the weak affix.

Axis 4: Additional classifiers for affixes. Usually only one value is allowed per affix. The classification does not try to be complete.
voorvoegsel   VV  A Dutch "voorvoegsel" is something like "van" or "de" that might have indicated nobility in the past but no longer does. Similar prefixes exist in other languages, such as Spanish, French, or Portuguese.
academic      AT  Indicates that a prefix like "Dr." or a suffix like "MD" or "PhD" is an academic title.
professional  PT  Primarily in the British Imperial culture, people tend to have an abbreviation of their professional organization as part of their credential suffixes.
noblety       NT  In Europe there are still people with nobility titles. German "von" is generally a nobility title, not a mere voorvoegsel. Others are "Earl of" or "His Majesty King of ...", etc. Rarely used nowadays, but some systems do keep track of this.

White Space Rules

Names contain white space. The white space rules used in typesetting are not trivial. In general, two name parts are separated by white space. A punctuation mark, like a comma or period, follows directly after the preceding non-whitespace text, but such marks are always followed by white space. Dashes are not surrounded by white space at all. Note that these white space rules do not really exist for languages such as Thai or Japanese, where white space is basically not used. However, white space can always simply be ignored there, which is why Thai and Japanese are easier to print. In any case, neither Thai nor Japanese would have white space where it was not allowed in Latin script.

For the purpose of the person name data type, the difficult white space rules can be broken down into the following precise rules:

1. White space never accumulates, i.e., two subsequent spaces are the same as one.
2. Literals may contain explicit white space, subject to the same white space reduction rules.


3. Except for prefix, suffix, and delimiter name parts, every name part is surrounded by implicit white space. Leading and trailing explicit white space is insignificant in all those name parts.
4. Delimiter name parts are not surrounded by any implicit white space. Leading and trailing explicit white space is significant in delimiter name parts.
5. Prefix name parts only have implicit leading white space, but no implicit trailing white space. Trailing explicit white space is significant in prefix name parts.
6. Suffix name parts only have implicit trailing white space, but no implicit leading white space. Leading explicit white space is significant in suffix name parts.

This means that all name parts are generally surrounded by white space, but white space never accumulates. Delimiters are never surrounded by implicit white space, prefixes are not followed by implicit white space, and suffixes are not preceded by implicit white space. Any white space contributed by preceding or succeeding name parts around those special name parts is discarded, whether it was implicit or explicit.

Examples

Irma Jongeneel, of HL7 the Netherlands, has many nice ramifications in her name, so we will dwell a little on it. Irma has two given names, "Irma" and "Corine". In her childhood her family name was "de Haas". Then Irma married Gerard Jongeneel. In Holland both spouses can choose to use either or both of their family names in arbitrary order. For the public records Irma chose the combination "Irma Corine Jongeneel-de Haas". But we know her by the name "Irma Jongeneel", i.e., for casual cases she assumed the family name of her spouse. But if Irma had to appear in a court of law and her name was cited, she would be called "Irma Corine de Haas e.g. Jongeneel", where "e.g." stands for "echtgenote van", meaning "spouse of".

Let’s write down the variants that we know so far in the familiar instance notation. First, the name by which we know her:

Irma Jongeneel

(PN
  (PersonNamePart :value "Irma"
                  :classifiers (SET given record))
  (PersonNamePart :value "Jongeneel"
                  :classifiers (SET family record spouse)))

Just as with the address, we have to take care of spacing. When the name is to be printed, the name parts are usually separated by white space. But there are notable exceptions, which we will encounter in the following example.


The following is the name on her marriage record (?):

Irma Corine Jongeneel-de Haas

(PN
  (PersonNamePart :value "Irma"
                  :classifiers (SET given record))
  (PersonNamePart :value "Corine"
                  :classifiers (SET given record))
  (PersonNamePart :value "Jongeneel"
                  :classifiers (SET family record spouse))
  (PersonNamePart :value "-"
                  :classifiers (SET delimiter))
  (PersonNamePart :value "de Haas"
                  :classifiers (SET family record birth)))

Note that the dash "-" is printed without leading and trailing white space. This is signified by the flag delimiter in the name classifier set. We already know this flag from the Address data type. Since names never have line breaks, the line break feature of delimiters does not exist in person names.

Voorvoegsel

There is a problem with the "de", which is classified as a voorvoegsel in Dutch. Another very common voorvoegsel is "van", as in "van Soest". This Dutch "van" is not actually a nobility prefix, although it sounds like it used to be one. Such prefixes exist in many languages, including French, German, and Portuguese.

The problem with such prefixes is that they belong to exactly one other name part, e.g., "Haas". In Dutch the part "Haas" of "de Haas" is called the significant part of that family name, since it is significant for alphabetic sorting. Since "de" cannot occur without "Haas" and "Haas" will not occur without "de", the two are linked more strongly to each other than "de Haas" is to "Jongeneel".

One way to handle this associativity is through nesting. With parentheses we could write "(Irma (de Haas) Jongeneel)" to show that "de" and "Haas" are associated more strongly than the other parts. However, nesting is costly, as it leads to significant additional complexity in the data type definition. Not that nesting is a bad idea per se. However, since the nesting depth appears to be limited to three levels, the generality of nesting does not seem to outweigh the simplicity of a linear list.

There are other ramifications, though, such as prefixes that consist of more than one part, as in French "Eduard de l’Aigle". Here "de l’" is one prefix that consists of two parts and that connects to the significant part without spacing. To make things more complex, we have to realize that "de l’Aigle" is in fact a contraction of "de-la-Aigle". But we decide not to deal with this kind


of lexical variation. It is probably safe to consider "de l’" as one prefix that binds strongly to the following significant name part.

Thus we could go without nesting by using the special name part flag "prefix". Prefix means that this name part binds strongly to the following name part, and we consider it to bind without space. Let’s try how that feels:

de Haas

(PN
  (PersonNamePart :value "de "
                  :classifiers (SET prefix))
  (PersonNamePart :value "Haas"
                  :classifiers (SET family)))

Note that "de " contains a literal space. Alternatively, we could define flags for prefix-with-space and prefix-no-space, but this would just make things more complex. As a rule, we say that name part prefixes associate without space to the following name part. If a space is required, it must be included in the name part. See the white space rules above [p. 99].

Eduard de l’Aigle has a prefix that includes no space:

Eduard de l’Aigle

(PN
  (PersonNamePart :value "Eduard"
                  :classifiers (SET given))
  (PersonNamePart :value "de l’"
                  :classifiers (SET prefix))
  (PersonNamePart :value "Aigle"
                  :classifiers (SET family record)))

Inversion

This method is challenged when we want to capture an inverted name form such as "Haas, de, Irma", as used in a phone book or in bibliographies. Here we lose the strong association between the prefix "de" and its significant name part "Haas". The prefix is postponed after the significant name "Haas", there is even an intervening comma, and, to make things even worse, the spacing of "de" is different ("de" vs. "de "). It’s a matter of finding the most elegant solution. You can always argue about elegance, of course.

Haas, de, Irma


    (PN (PersonNamePart :value "Haas" :classifiers (SET family))
        (PersonNamePart :value ", " :classifiers (SET delimiter))
        (PersonNamePart :value "de " :classifiers (SET prefix inverted))
        (PersonNamePart :value ", " :classifiers (SET delimiter))
        (PersonNamePart :value "Irma" :classifiers (SET given)))

Here we say that the prefix "de " (with trailing space!) is inverted. The computer now knows that the prefix is associated with something that precedes it. The rule is: an inverted prefix associates with the nearest preceding name part that is not a delimiter. Furthermore, the rule for printing the name is: trailing literal white space is removed from inverted prefixes.

For Eduard de l’Aigle this works likewise:

Aigle, de l’, Eduard

    (PN (PersonNamePart :value "Aigle" :classifiers (SET family))
        (PersonNamePart :value ", " :classifiers (SET delimiter))
        (PersonNamePart :value "de l’" :classifiers (SET prefix inverted))
        (PersonNamePart :value ", " :classifiers (SET delimiter))
        (PersonNamePart :value "Eduard" :classifiers (SET given)))

To cover all ramifications completely, we can further undo the contraction "de l’A..." to "de la":

Aigle, de la, Eduard

    (PN (PersonNamePart :value "Aigle" :classifiers (SET family))
        (PersonNamePart :value ", " :classifiers (SET delimiter))
        (PersonNamePart :value "de la" :classifiers (SET prefix inverted))


        (PersonNamePart :value ", " :classifiers (SET delimiter))
        (PersonNamePart :value "Eduard" :classifiers (SET given)))

However, the decomposition of "de l’" into "de la " and the contraction back again are outside the scope of HL7. This is rarely handled properly even in phone books or bibliographic databases, so hardly any HL7 application will need to care.

Echtgenote van, née, geb.

As we said earlier, when Irma shows up in a court of law, she might be called:

Irma Corine de Haas e.g. Jongeneel

    (PN (PersonNamePart :value "Irma" :classifiers (SET given record))
        (PersonNamePart :value "Corine" :classifiers (SET given record))
        (PersonNamePart :value "de " :classifiers (SET prefix))
        (PersonNamePart :value "Haas" :classifiers (SET family record birth))
        (PersonNamePart :value "e.g." :classifiers (SET prefix weak))
        (PersonNamePart :value "Jongeneel" :classifiers (SET family record spouse)))

The "e.g." (the Dutch abbreviation of "echtgenote van") behaves pretty much like a prefix. It is not "significant"; it associates with the following name part. The difference is that the association is weak. A weak association of a prefix or suffix means that the prefix might be dropped. It is still a prefix, which means that it moves wherever the following name part moves, but a weak prefix may be omitted.

Note that a weak prefix may be followed by a (strong) prefix, as in "Gerard Jongeneel e.g. de Haas". Note also that if a weak prefix is followed by a name part which in turn is followed by an inverted (strong) prefix, the inversion would be undone by inserting the (strong) prefix between the weak prefix and the significant name part. Contemplate "Jongeneel, Gerard e.g. Haas, de" as an example.

In "Claudine de l’Aigle née Dubois" and "Dorothea Schadow geb. Riemer", "née" and "geb." formally behave just like "echtgenote van", i.e., they are weak prefixes. However, note that the semantics are reversed.
"Echtgenote van" means "spouse of", while "née" and "geborene" mean "born" in French and German respectively.
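The prefix rules of this section (strong prefixes binding forward, inverted prefixes associating backward and losing trailing space, weak prefixes being omissible) can be sketched in Python. This is a minimal, non-authoritative sketch: the names `PersonNamePart`, `part`, `has`, `render`, `inverted_prefix_target`, and `drop_weak_prefixes` are hypothetical, and the spacing rule is an assumption standing in for the white space rules referenced above [p. 99]: ordinary parts are separated by one space, a delimiter such as ", " carries its own spacing, and a non-inverted prefix binds to the following part with whatever space its own value contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonNamePart:
    value: str
    classifiers: frozenset = frozenset()


def part(value, *classifiers):
    """Hypothetical helper mirroring (PersonNamePart :value ... :classifiers (SET ...))."""
    return PersonNamePart(value, frozenset(classifiers))


def has(p, *flags):
    """True if the part carries all of the given classifiers."""
    return set(flags) <= p.classifiers


def inverted_prefix_target(parts, i):
    """Index of the part that the inverted prefix at position i associates with:
    the nearest preceding part that is not a delimiter (None if there is none)."""
    for j in range(i - 1, -1, -1):
        if not has(parts[j], "delimiter"):
            return j
    return None


def drop_weak_prefixes(parts):
    """Weak prefixes (such as "e.g.", "née", "geb.") may be omitted."""
    return [p for p in parts if not has(p, "prefix", "weak")]


def render(parts):
    """Print a name under the assumed spacing rule described above."""
    out = []
    for i, p in enumerate(parts):
        inverted = has(p, "prefix", "inverted")
        # Trailing literal white space is removed from inverted prefixes.
        out.append(p.value.rstrip() if inverted else p.value)
        if i + 1 == len(parts):
            break
        nxt = parts[i + 1]
        binds_forward = has(p, "prefix") and not inverted
        touches_delimiter = has(p, "delimiter") or has(nxt, "delimiter")
        if not (binds_forward or touches_delimiter):
            out.append(" ")
    return "".join(out)


if __name__ == "__main__":
    inverted = [part("Haas", "family"), part(", ", "delimiter"),
                part("de ", "prefix", "inverted"), part(", ", "delimiter"),
                part("Irma", "given")]
    print(render(inverted))                     # Haas, de, Irma
    print(inverted_prefix_target(inverted, 2))  # 0, i.e. "Haas"
```

Because a prefix carries any space it needs in its own value, a weak prefix printed in place would likewise need its trailing space (for instance "e.g. ") under this sketch's rule.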


Claudine de l’Aigle née Dubois

    (PN (PersonNamePart :value "Claudine" :classifiers (SET given record))
        (PersonNamePart :value "de l’" :classifiers (SET prefix))
        (PersonNamePart :value "Aigle" :classifiers (SET family record spouse))
        (PersonNamePart :value "née" :classifiers (SET prefix weak))
        (PersonNamePart :value "Dubois" :classifiers (SET family record birth)))

The semantic difference between "née" and "e.g." is not important, since the classification of name parts into birth vs. spouse is unambiguous.

Nicknames

Let’s play a little with nicknames. I know Bob Dolin as "Bob", but at HL7 he is enrolled as "Robert Dolin", and on papers he calls himself "Robert H. Dolin". This is no big deal, since we have three distinct name forms that we decided to treat as separate Person Names, without trying to relate those name parts across the variants.

The following is the first example of a complete Person Name structure.

Bob Dolin, Robert Dolin, or Robert H. Dolin

    (SET (Person_name :value (PN (PersonNamePart :value "Bob" :classifiers (SET given nick))
                                 (PersonNamePart :value "Dolin" :classifiers (SET family))))
         (Person_name :value (PN (PersonNamePart :value "Robert" :classifiers (SET given))
                                 (PersonNamePart :value "Dolin" :classifiers (SET family))))
         (Person_name :value (PN (PersonNamePart :value "Robert"


                                                 :classifiers (SET given))
                                 (PersonNamePart :value "H." :classifiers (SET given initial))
                                 (PersonNamePart :value "Dolin" :classifiers (SET family)))))

We did not classify the person name variants here, since this would open up another can of worms. It almost seems as if there is a gradual scale of formality that determines which of the various person names to use.

Degrees of formality may be relevant, but are not yet handled in the HL7 data type. Other examples are: sloppy (Kiki), familiar (Kathy), nick (Kathrin), of record (Katharina), and highly official (Ekatharina). We need input from Japan on that. Note also the "Bob Dolin" example above.

Let’s take Woody Beeler. Woody is known as "George (Woody) W. Beeler" in the HL7 membership database. This parenthetical is an interesting construct that we might want to capture a bit more semantically and a bit less literally. Woody would probably put it this way: "My name is George W. Beeler, but call me Woody." The parentheses are just a style for printing the name badge. The actual HL7 name badge looks like:

    Woody
    George W. Beeler

We do not allow line breaks in person names. Instead of literal parentheses or line breaks, we suggest semantic markup using the "callme" name part classifier.

George (Woody) W. Beeler

    (PN (PersonNamePart :value "George" :classifiers (SET given))
        (PersonNamePart :value "Woody" :classifiers (SET callme))
        (PersonNamePart :value "W." :classifiers (SET given initial))
        (PersonNamePart :value "Beeler" :classifiers (SET family)))
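To make the "callme" markup concrete, here is a minimal sketch of the two consumers discussed next: a meeting badge printer and a membership directory. It is illustrative only: `PersonNamePart`, `part`, `badge`, and `directory_entry` are hypothetical names, and joining parts with single spaces is an assumption that holds for this example because it contains no prefixes or delimiters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonNamePart:
    value: str
    classifiers: frozenset = frozenset()


def part(value, *classifiers):
    return PersonNamePart(value, frozenset(classifiers))


def callme_only(p):
    # frozenset == set compares by content
    return p.classifiers == {"callme"}


def badge(parts):
    """Badge rule: callme parts form the big headline; all other parts go below,
    except those classified only as callme. Returns (headline, body)."""
    headline = " ".join(p.value for p in parts if "callme" in p.classifiers)
    body = " ".join(p.value for p in parts if not callme_only(p))
    return (headline or None), body


def directory_entry(parts):
    """Directory rule: all parts in order, callme-only parts in parentheses."""
    return " ".join(f"({p.value})" if callme_only(p) else p.value for p in parts)


woody = [part("George", "given"), part("Woody", "callme"),
         part("W.", "given", "initial"), part("Beeler", "family")]
print(badge(woody))            # ('Woody', 'George W. Beeler')
print(directory_entry(woody))  # George (Woody) W. Beeler
```

Both outputs come from the same name variant; only the rendering rule differs, which is the point of marking "callme" semantically rather than with literal parentheses.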


Two different applications could now use the same name variant to produce a name badge for an HL7 meeting and to print the HL7 membership directory. The rule for the badge application is: if there are "callme" name parts, print those big and bold, and print all the other name parts below, except those classified only as "callme". For the electronic membership directory the rule would be: print all name parts in order and put callme-only name parts in parentheses.

Incomplete Classification

Let’s take an example where we just can’t classify the names. Consider "Iketani Sahoko". Of course, if you know some Japanese you will know that "Sahoko" is a Japanese female given name and "Iketani" is her family name. But let’s assume you don’t know that :-). All you have is an unconscious girl who has the name "Iketani Sahoko" printed (in Latin letters) somewhere on her purse.

Iketani Sahoko

    (PN (PersonNamePart :value "Iketani")
        (PersonNamePart :value "Sahoko"))

You now send this name without any classifier. The point is that you cannot tell which one is the given name and which one is the family name. If you guess from the order (given name = first name), you are wrong. So, if in doubt, why be forced to guess? Of course, most databases will force you to guess. But this wild guess can be made by the receiving HL7 interface just as well as by an uninformed human. Later, when you learn more about your patient, you can enter the correct classification:

Iketani Sahoko

    (PN (PersonNamePart :value "Iketani" :classifiers (SET family))
        (PersonNamePart :value "Sahoko" :classifiers (SET given)))

HL7 v2.3 Compatibility

The XPN data type of HL7 version 2.3.x may serve as a check on what other name types or name part types may be needed.
Of course, there is also the issue of compatibility between version 2 and version 3 of HL7.

One problem with mapping those name type codes between v2.3 and v3.0 is that our new person name type is structurally different from the old one. It is therefore not possible to simply reuse those codes without further thought.


The following table shows the v2.3.x person name type codes. The rightmost column indicates whether a code stands for an inherent meaning of a name (part) or for its purpose.

HL7 v2.3 XPN name types.

code | meaning | comments
A | alias | purpose: a person uses different aliases or pseudonyms in different contexts (e.g., when referring to himself as the author of a book, an actor, your friend, a customer in a bank, or a patient in a hospital).
L | legal | purpose: this is the name of public record (if any). Such records do not exist in all countries. In Germany legal names definitely exist; I am not so sure about the U.S.
D | display | purpose: for the purpose of "displaying"; however, this is quite vague. See below.
M | maiden name | inherent meaning, but there are also quite pragmatic implications. See below.
C | adopted | inherent meaning
B | name at birth | inherent meaning
P | name of spouse (name taken from) | inherent meaning
U | unspecified | ?? (obsolete)

The first issue is that the old person name had a number of fixed slots and a name type code affecting the interpretation of the data found in all slots. Our new type has name parts which are individually classified, and it has a purpose code for name variants which affects all name parts of the variant. The semantics of the name parts, i.e., what those parts are, is described entirely by the name part classifiers. Each name variant has a certain use case, purpose, or context.

We have not retained the term "alias", for three reasons. First, one main assumption of our new approach to person names is to support different name variants, where every variant is basically an alias for a person. Thus there is no need to qualify that further. Second, the term "alias" has a negative connotation (e.g., only thieves and other bad guys need aliases). Third and finally, there are different kinds of pseudonyms that we may want to indicate positively, such as artist’s names (writer and stage names), indigenous (tribal) names, and religious names.

As opposed to aliases, some countries have legal acts of name change. In Australia, for instance, this is called "deed poll".
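The rightmost column of the XPN table decides where a v2.3 code's meaning would land in the new structure: purpose codes describe a whole name variant, inherent-meaning codes describe individual name parts. The sketch below only makes that split explicit; `XPN_NAME_TYPES` and `v3_level` are hypothetical names, and it deliberately does not map any code to a specific v3 classifier, since, as noted above, the codes cannot simply be reused.

```python
# The v2.3 XPN name type table, with the "kind" column made explicit.
XPN_NAME_TYPES = {
    "A": ("alias", "purpose"),
    "L": ("legal", "purpose"),
    "D": ("display", "purpose"),
    "M": ("maiden name", "inherent"),
    "C": ("adopted", "inherent"),
    "B": ("name at birth", "inherent"),
    "P": ("name of spouse", "inherent"),
    "U": ("unspecified", None),  # obsolete; no defined kind
}


def v3_level(code):
    """Return "name variant" for purpose codes, "name part" for
    inherent-meaning codes, and None for codes without a kind (U)."""
    kind = XPN_NAME_TYPES[code][1]
    return {"purpose": "name variant", "inherent": "name part"}.get(kind)


for code in "ALDMCBPU":
    print(code, XPN_NAME_TYPES[code][0], "->", v3_level(code))
```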


In Germany such name changes happen only under exceptional conditions and are always subject to official recording. The naming system in Germany is quite tightly regulated, and you are not supposed to use any other name, except in certain situations where one would expect pseudonyms (e.g., book authors, actors, etc.).

In the U.S., however, name changes seem to be more frequent than in Germany, and the naming system is less regulated. One issue that would need to be clarified is the meaning of "legal" name. Legal name obviously has different meanings in different countries, depending on how the naming system is regulated.

The concept of display name was vague all along. The question is: which display? The whole idea of names is that they are "displayed" on paper, on computer screens, and in spoken language. The use case of display names is thus not clear. Basically, there is no longer a need for a name type "display name" in our new person name type. This is because we no longer distort the natural (or purposeful) ordering of the name parts by requiring name parts to be put in different slots. Name parts occur in some order that is defined or selected by someone: the holder of that name, the computer system, the citation style guide, etc.

Some names are used in licenses or other accreditations, and it is quite important to record the name as such. Examples are school records, graduation certificates, licenses to practice a profession, etc. Notably, women who held a doctoral degree were the first to assume double names in Germany many decades ago. The reason was that their dissertations and certifications had been issued under their maiden names. Later on, when those women married, they would have lost their certifications by switching their family names entirely.

In many cases, keeping a name history is enough.
However, the license name type allows one to indicate the reason why a certain name is still kept in the history, i.e., in this case, because it is mentioned in a license or record.

Maiden name, name at birth, name of spouse, adopted name, and the like

This was a very difficult discussion, in which a lot of arguments were exchanged, but in which some people also said they could not even see why the issue was being debated so heatedly.

Let’s put this into historical perspective.

In versions 2.1 and 2.2 of HL7 there was no name type code at all, and the only place a "maiden" name was even mentioned was "PID-mother’s maiden name". There was obviously no place to specify the patient’s maiden name. This seemed to be somehow less of a problem in the U.S., but it was definitely a problem in Germany, which is why HL7 Germany redefined mother’s maiden name as patient’s maiden name.


Then came the name type code, and with it the maiden name type code. Its meaning was clear at that time, since there were just the maiden name and the adopted name. It probably was not quite clear what would happen with a woman who was adopted at age 5, had a family name before, switched her family name through adoption, and later married and switched her name again. We had a way to express the name she had after adoption, and we were able to specify the name before marriage, which in this case are the same! Two ways to specify the same name, but on the other hand, there was no way to specify either the name before adoption or the name after marriage. Which is pretty odd, but, again, didn’t seem to matter very much.

The famous Dutch name change initiative that started with a sermon by John Baptist at the summer 1997 meeting in San Francisco (or was it Tampa?) was the major driving force for bringing in the "birth" name and "spouse" name types. As far as I know, the rationale was not to address the oddities mentioned in the last paragraph. Rather, the issue was that "maiden" seemed to imply "female before marriage", or even stronger cultural connotations. Since the people of the Netherlands have long had a very reasonable and free culture, the Dutch did away with those sexist traditions long before the rest of the world even realized the issue.

So the driving force behind the "birth" name was to open up the narrow sense of "maiden". In that sense, "birth" was clearly meant to subsume "maiden".

The "spouse" name type, on the other hand, was meant as a kind of antonym of "birth". The above examples around Irma Jongeneel are an extensive description of the Dutch naming system, which essentially explain why the "birth" and "spouse" name types are so important in the Netherlands.
It is all because a married (or otherwise officially associated) couple of persons (not necessarily of opposite gender) will sort of combine their family names, while both names remain independently useful family names. That’s why one’s own birth name gets the "birth" classifier and the name taken from the spouse gets the "spouse" classifier.

From that perspective it seemed that "maiden" was subsumed by "birth", as a way to express the same concept with less sexist connotations.

But this was anything but agreed to by everyone.

It turned out that the Dutch reform has created more distinct notions than originally expected. For example, again, what happens if someone changes his/her name before marriage? We finally decided that "maiden" and "birth" should not be merged, in part because "maiden name" is a cultural entity that may not exist in the Netherlands but still exists in many computer systems.

We observed that the above-mentioned name types have different "directions" of meaning in time. They do not so much express what any name part is semantically, since family names are family names; rather, they try to capture how names come about. Dawid added that those name types capture not only how names came about, but also how names ceased to be used.


In the "ancient" U.S. name system of the 1950s, and in the German name system that loosened up only recently, the issues were simple. For instance, my wife’s name is "Dorothea Schadow", but her maiden name is "Riemer".

    Riemer
    -----------------------+------------------------------> lifetime
                           |
                           CURRENT---> Schadow

If we mention the maiden name of my wife, we indicate that this maiden name, "Riemer", was used for her before she assumed my family name, "Schadow", through marriage. So her current name is "Schadow" and will remain "Schadow" for the unforeseeable future. Her family name was "Riemer", but no longer is. Now it is just her maiden name. Thus, "maiden" name does not seem to explain how the name "Riemer" came about; rather, it tells how the name part "Riemer" ceased to be used.

From the perspective of this very traditional naming scheme, "maiden" and "current" are all you need to distinguish. And indeed most existing information systems are built on this traditional misconception. No matter how strongly we may insist on it through our database design, this is not how the world really works.

Since "maiden" is a term rooted in the traditional patriarchal system, we can define "maiden" name as:

    A "maiden name" is the surname of a woman before she marries.

At least, this is what Webster’s has to say about "maiden name". Clearly, this notion appears archaic today. But ADT systems’ databases, data entry forms, and even application logic are still sometimes built on this misconception.

Again, the Dutch are the avant-garde of a more reasonable approach. In the Dutch naming system the "directions" are different; as Irma’s example showed, "maiden" is not an issue here:

    |
    BIRTH---> de Haas
    -----------------------+------------------------------> lifetime
                           |
                           SPOUSE---> Jongeneel

In the Dutch system, all name parts point forward. The name types explain how name parts came about, not how they ceased to be used.

From that perspective, "maiden" and "birth" do have different meanings.
In the Dutch system the entire concept of "maiden name" simply no longer exists. In Germany and the U.S. it still exists.


One could assume that "maiden" marks a name that ceased to be used, but there seems to be no consensus on this position. At most, I would open up the concept of "maiden name" to be less sexist; I would like to see the definition read as follows:

    A maiden name is a name part that a person had immediately before this person’s first marriage and that was given up due to that marriage.

By "marriage" I understand any kind of "culturally accepted personal association between human beings." This is open enough to include the wildest things, as long as they are accepted in that culture (not necessarily in other cultures). This includes homosexual marriages, religious (non-civil) marriages, and civil (non-religious) marriages; simply anything that causes someone to give up some of his/her name parts.

This is not just semantic talk. The practical connotation of a name part classified as "maiden" would be "don’t use it", except in special circumstances or with special prefixes.

What happens if someone gets married and does not change her/his name?

From my perspective this is simple: "maiden name" simply does not apply.

However, one can argue the other way: since "maiden" means young unmarried girl, you do have a maiden name even though you may never have given up your name. Notably, every maiden would have just a maiden name. Every unmarried person would have only a maiden name. Here it all depends on whether we think of names as slotted parts or as tagged parts. If name parts are slotted in data fields, the maiden name of a maiden is duplicated:

Pippi Langstrumpf

    (SlottedName :given-name "Pippi"
                 :current-name "Langstrumpf"
                 :maiden-name "Langstrumpf")

In our new system, however, we tag names without duplication:

Pippi Langstrumpf

    (PN (PersonNamePart :value "Pippi" :classifiers (SET given))
        (PersonNamePart :value "Langstrumpf" :classifiers (SET family maiden (current))))
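The duplication in the slotted form falls out naturally when tagged parts are projected onto slots, because one tagged part can fill several slots. A minimal sketch, assuming hypothetical names (`PersonNamePart`, `part`, `to_slotted`) and treating the parenthesized "(current)" above simply as a plain "current" classifier:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonNamePart:
    value: str
    classifiers: frozenset = frozenset()


def part(value, *classifiers):
    return PersonNamePart(value, frozenset(classifiers))


def to_slotted(parts):
    """Project tagged name parts onto a v2-style slotted record.

    A single tagged part may fill several slots; that is where the
    duplication in the slotted form comes from.
    """
    slots = {"given_name": None, "current_name": None, "maiden_name": None}
    for p in parts:
        if "given" in p.classifiers and slots["given_name"] is None:
            slots["given_name"] = p.value
        if "family" in p.classifiers:
            if "current" in p.classifiers:
                slots["current_name"] = p.value
            if "maiden" in p.classifiers:
                slots["maiden_name"] = p.value
    return slots


pippi = [part("Pippi", "given"),
         part("Langstrumpf", "family", "maiden", "current")]
# One tagged part, two filled slots:
print(to_slotted(pippi))
```

Going the other way, from slots back to tags, is the harder direction, since the slotted record no longer says whether the two "Langstrumpf" entries are one name part or two.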


What it all boils down to are the following problems:

- How do we map to and from slotted name structures?
- Do we have to adjust our model 100% to those flawed name categories that do not even hold in the cultures where they are most used? If so, how?

We gradually adopted the following rationale: the birth name is the name you have at birth. The maiden name is the name you have just before your first marriage. An "adoption name" is a name you have had since you were adopted (beware of the ambiguity with "adopted name").

The immediate question becomes: what happens when you marry a second time? What if you are adopted after you first married (this can be done in some countries)? For me the question is: how many reasons for name changes do we have to capture? When is it enough to just keep a history of names?

How many different events in a life do we want to recognize as having special name codes? The answer is probably: "it depends". In some cultures becoming a widow is a reason for a name change. In others you might change names as you give birth to children. You might also change names as you enter a religious community (e.g., as you become a monk, or a pope :-). Do we want to keep track of all this? Probably; it all depends.

For HL7 we have to stick to practical use cases. However, if we design the name data type according to the majority of existing information systems, we would still get stuck with the "first-m.i.-last" name pattern. A lot of the argument about maiden name was due to existing systems that either require a certain input or produce a certain output. What should we do?

In general, we recommend considering only the Dutch system, where we have a

1. name part at birth,
2. name part assumed through adoption (name of adopting parent),
3. name part assumed through deed poll (free change of name),
4. name part assumed through marriage (name of spouse).

Except for the birth name, all other name change events may happen in arbitrary order and may repeat.
All the rest is covered in a history. When you have a new-style name and want to map it to an old-style slotted name, do the following to determine the maiden name:

1. If there has been no change of family name since birth, use that one and only family name at birth as the last name.
2. If the name part in question was taken from a spouse, do not use it as a maiden name.
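The two mapping rules can be sketched over a chronological name history whose entries use the four origins listed above. This is one possible reading, not a normative algorithm: the names `slotted_family_names` and the origin constants are hypothetical, and the maiden slot receives the most recent family name that was not assumed through marriage, which, for someone whose name never changed, repeats the last name exactly as slotted structures do in the Pippi example.

```python
# Origins of a family name, after the four Dutch-style events listed above.
BIRTH, ADOPTION, DEED_POLL, MARRIAGE = "birth", "adoption", "deed poll", "marriage"


def slotted_family_names(history):
    """Map a chronological name history to (last_name, maiden_name).

    history: list of (family_name, origin) pairs; the first entry must be
    the name at birth and the last entry is the current family name.
    """
    if not history or history[0][1] != BIRTH:
        raise ValueError("history must start with the name at birth")
    last_name = history[-1][0]
    # Rule 1: with no change since birth, the birth name is the last name
    # (and a slotted record repeats it in the maiden slot).
    # Rule 2: a name taken from a spouse is never reported as maiden name,
    # so take the most recent name that was not assumed through marriage.
    # The birth entry guarantees that next() always finds a candidate.
    maiden = next(name for name, origin in reversed(history) if origin != MARRIAGE)
    return last_name, maiden


print(slotted_family_names([("Riemer", BIRTH), ("Schadow", MARRIAGE)]))
# ('Schadow', 'Riemer')
```

A second marriage leaves the result unchanged in spirit: the maiden slot still points at the last pre-marriage name, while adoptions and deed polls after a marriage are, as the text notes, not something slotted structures handle well anyway.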


In other words, the maiden name is the family name in the history that was not assumed from a spouse. Dealing with adoptions and deed polls is difficult; however, those things are not taken care of by the usual slotted name types anyway, so why bother?

The only strong rationale for keeping maiden name is that mapping from a traditional slotted name structure to the new name style is difficult. With a "maiden name" you don’t actually know whether this name was already used at birth ("birth") or came only through "adoption" or "deed poll". There is considerable overlap between the unmarried name classifier and the other classifiers of Axis 2. Consequently we had to relax the notion that Axis 2 classifiers need to be mutually exclusive.

Initials

We recognized that the term "initials" may have slightly different meanings in an international context. In the Netherlands, "initials" are all the first letters of your given names and family names, as you choose.

In Holland there is also the concept of voorletters, which are the first letters of the given names. In Holland adults are normally recorded using only their voorletters and family names. This is similar to the Vancouver citation style, which never spells out first names.

However, we confirmed that the term "initial" means first letter (of whatever), regardless of given or family name. The beautiful initials that start a chapter of medieval books are called "initials" too (e.g., the Schwabacher initials). When "initials" is used in the plural form in the context of names and signatures, it usually refers to all the initials of given and family names. It is then used as a short form of a signature.

A typical Dutch name using only voorletters would be recorded as a person name variant.
We would not need to associate initials with spelled-out name parts.

Academic titles

Academic titles and professional credentials are like voorvoegsels and nobility titles on Axis 4. You can classify academic degrees and professional titles as suffixes or prefixes. This takes care of the fact that "PhD" and "MD" are suffixes but "Dr." and "Prof. Dr. med. Dr. phil. h.c." are prefixes.

3.3.4 Organization Name

We need much less flexibility and power with organization names. We considered what might apply to organization names:


- Different name parts, such as "Hewlett-Packard" vs. "HP" vs. "Inc.", "Co.", "Ltd.", "B.V.", "AG", "GmbH", etc.
- "Marriage" of companies and trading of divisions: e.g., UNIX was a trademark of AT&T, then USL, then Novell, and who knows. "Daimler" and "Chrysler" are now "DaimlerChrysler", and "Behring", a manufacturer of vaccines, is known or subsumed by some other name in the U.S.

Anyway, we concluded that no one really keeps track of those things, so all we need is an organization name string and, perhaps, a name type code. HL7 v2.3 had a name type code table for organization names (XON) including:

Organization Name Type Codes (adopted from HL7 v2.3)

code | meaning
L | legal
A | alias
D | display
ST | stock exchange

Display name has no defined use, since names are always displayed, and it begs the question "whose display?". I wonder whether anyone in healthcare would want to include the Wall Street ticker symbol or the Indianapolis Star newspaper’s abbreviation of a manufacturer of vaccines. But there is no reason why we should restrict this existing "feature" of version 2.3.

All in all, this is not a very controversial or important issue. So, unless there is any significant objection, we can just stick to a v2.3-like solution.

Organization Name (ON)
A collection of organization name variants.
SET OF Organization Name Variant [p. 115]


Organization Name Variant
This type is not used outside of the Organization Name [p. 114] data type. Organization Names are regarded as a collection of organization name variants, each used in a different context or for a different purpose.

component name | type/domain | optionality | description
type | Code Value | optional | A type code indicates what an organization name is to be used for. Examples are: alias, legal, stock-exchange.
value | Character String | mandatory | This contains the actual name data as a simple character string.

3.4 Technical Concepts and the Code Value

The Code Value data type is the basic building block for referring to concepts, both technical and real-world concepts. A Code Value is essentially a symbol with all contextual information necessary to interpret that symbol, i.e., the literal and the code system that defines the literal.
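As a rough illustration, a Code Value can be modeled as a small immutable record whose identity is the symbol plus its code system, with the courtesy texts excluded from equality. This is a minimal sketch anticipating the component table and the implicit Character String conversion rule given in this section; the names `CodeValue`, `coerce`, and `defined_symbols` are hypothetical, `defined_symbols` stands in for a real terminology lookup, and letting the code system version take part in equality is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class CodeValue:
    value: Optional[str]
    code_system: str
    code_system_version: Optional[str] = None
    # Courtesy texts: excluded from equality because they never change meaning.
    print_name: Optional[str] = field(default=None, compare=False)
    replacement: Optional[str] = field(default=None, compare=False)

    def __post_init__(self):
        if self.value is not None and self.replacement is not None:
            raise ValueError("replacement MUST NOT be set when value is set")


def coerce(item, mandatory_system, defined_symbols):
    """Implicit Character String -> Code Value conversion for a field whose
    code system is fixed by context."""
    if isinstance(item, CodeValue):
        return item
    if item not in defined_symbols:
        raise ValueError(f"{item!r} is not a symbol of {mandatory_system}")
    return CodeValue(value=item, code_system=mandatory_system)


headache = CodeValue("784.0", "ICD9 CM", print_name="headache")
print(headache == CodeValue("784.0", "ICD9 CM"))  # True: print name ignored
print(coerce("text/html", "MIME-TP", {"text/html", "text/plain"}))
```

Filling in the code system at conversion time also matches the recommendation that interface software supply the default code system before the value reaches the application layer.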


Code Value
A code value is exactly one symbol in a code system. The meaning of the symbol is defined exclusively and completely by the code system that the symbol is from.

component name | type/domain | optionality | description
value | Character String [p. 40] | required | this is the plain symbol, like "784.0"
code system | a code by itself | required, can be fixed by context | denotes the code system that defined the plain symbol
code system version | Character String [p. 40] | optional | a version descriptor defined specifically for the given code system
print name | Character String [p. 40] | optional | a sensible name for the code as a courtesy to an interpreter of the message. THE PRINT NAME HAS NO MEANING; it can never be sent alone and it can never modify the meaning of the code value
replacement | Character String [p. 40] | conditional, iff value is not set | a name for the concept to be used in case the concept is not codeable in the specified coding system. If the value attribute is set, the replacement attribute MUST NOT be set. In no way can a replacement string modify the meaning of the code value

For example,

    (CodeValue :value "text/html" :codeSystem "MIME-TP")

would refer to the technical concept "HTML media type", while

    (CodeValue :value "784.0" :codeSystem "ICD9 CM")


would refer to the real-world concept "headache" as defined by ICD9 CM (i.e., in ICD9 CM this concept of headache does not include the concept of "tension headache", 307.81).

Technical concepts will be referred to simply by using the Code Value. The Code Value will also be used as the building block for more complex real-world concepts.

The code system is a mandatory component of the Code Value data type. However, in a given message it need not be sent if it is fixed by the context. For example, in an HL7 message header field designating the event code, only one coding system is allowed, i.e., the HL7 event code. Sending a code system identifier for a code value in that place would be redundant.

It is recommended that HL7 interface software that knows about the default code system fill in the default code system component before handing the Code Value to the application layer software. The strong binding to the field in the message header may get lost while the message is processed, and thus the default code system may no longer be inferable later.

In fact, an implicit type conversion rule exists between Character String and Code Value. If a given field is declared as a Code Value with a mandatory code system, but the message contains a Character String in that field, the character string found is taken as the value part of a Code Value and the mandatory code system is taken as the code system identifier. An exception is raised when the supplied character string is not a defined symbol of the mandatory code system.

The above conversion rule allows one to build concise messages with code values, just as the HL7 v2.x ID data type did.

3.4.1 Outstanding Issues

The code system obviously is by itself a technical concept identifier.
If we are going to use the data type Code Value for concept identifiers, we have a recursive type definition. Recursion is not a bad idea in general, but the question is: what terminates the recursion?

If HL7 maintains a list of coding schemes and defines symbols for each of those schemes, we can circumvent this problem of recursion by defining the component named code system as a simple Character String. We can continue to use the code system register that was used with HL7 v2.x.

What happens if HL7 outsources its code of coding systems? What happens if there are multiple codes of coding systems (e.g., suppose the CEN coding system registry standard becomes an ISO norm)?

HL7 could maintain its registry of coding systems for all time. And if HL7 outsources the maintenance of the registry of coding systems in the future, it would still require only one backward-compatible registry to be used. If we believe that HL7 will maintain its own registry of coding systems for all time, we could short-circuit any recursion and instead use a Character String.

[An alternative would be to use ISO Object Identifiers as coding system identifiers.]

The code system version is used as a refinement of the code system descriptor. Logically, any version information is useful only together with the code system identifier.

The hard distinction between a code system name and a version is problematic. For instance, is "ICD" the code system name and "9" or "10" the version? If so, what about the derivatives of ICD-9 (e.g., ICD-9-CM) and ICD-10 (e.g., ICD-10-PCS)? What about the minor versions, where a few codes are taken out or brought in every now and then? If we define all coding systems in a special HL7-maintained table, we would not need a separate version identifier, because the HL7 code system registry could simply define a new code system symbol for every new major and minor version of every code system.

A possible policy for some of this is: whenever a code system changes in an incompatible way, such as between ICD-9 and ICD-10, there will be a new entry in the HL7 registry, and thus a new code system identifier will be created. Different versions would only be used for changes that are compatible.

It would not matter what the other organization calls an update of its coding system. For example, WHO speaks about "International Classification of Diseases, 9th revision", but HL7 still considers this another coding system, not just another revision or version of basically the same code system.
By contrast, when LOINC updates from revision "1.0j" to "1.0k", HL7 would consider this to be just another version of LOINC, since LOINC revisions are backwards compatible.

How can we ensure that what people put into the version component is standardized and interoperably useful?

HL7 would still have to make sure that the true version identifier of LOINC 1.0j is exactly one of "1.0J", "1.0j", "1.0-J", or "1.0 j", and not any of them interchangeably. While the organization that maintains a code system will have its own version numbering scheme, it will not define unambiguous, exact string representations for its revision IDs, and HL7 cannot expect it to do so. Thus, HL7 has to maintain a list of the version identifiers for each code system, or at least a set of clearly defined rules about how the version identifying string can be inferred from the version ID used by the other organization.

Unregistered local coding schemes have been the cause of a lot of trouble in the past. Laboratories, whose main concern is not HL7, update their code system IDs quite frequently and without caring for backwards compatibility. This places a heavy burden on the shoulders of HL7 communication system managers. This burden would become not lighter but heavier if every idiolectic coding scheme that changes ever so often had to be registered with HL7.
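The second option named above, a set of clearly defined rules for inferring the registered version string, can be sketched in a few lines. This is a minimal Python sketch under two assumptions that are not part of this specification: a hypothetical registry table `REGISTERED_VERSIONS`, and a hypothetical normalization rule (ignore blanks and hyphens, compare case-insensitively).

```python
import re
from typing import Dict, Set

# Hypothetical HL7-maintained table of canonical version identifiers per code system.
REGISTERED_VERSIONS: Dict[str, Set[str]] = {
    "LOINC": {"1.0J", "1.0K"},
}


def canonical_version(code_system: str, raw_version: str) -> str:
    """Map a version string as written by the maintaining organization to the one
    canonical string registered with HL7 (hypothetical normalization rule)."""
    # Assumed rule: drop blanks and hyphens, then uppercase what remains.
    candidate = re.sub(r"[\s-]", "", raw_version).upper()
    if candidate not in REGISTERED_VERSIONS.get(code_system, set()):
        raise ValueError(f"unregistered version {raw_version!r} of {code_system!r}")
    return candidate


# All spellings discussed above collapse to a single registered identifier.
for spelling in ["1.0J", "1.0j", "1.0-J", "1.0 j"]:
    print(spelling, "->", canonical_version("LOINC", spelling))
```

Whatever the actual rule, the point is that it must be fixed by HL7 so that every sender produces the same string.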


The answer could be to say that locally defined coding systems do not have any meaning outside the defining organization, so there is no point in registering them anyway. As long as the coding system identifiers do not collide with the HL7-defined code system identifiers, it would not matter if there are code system name conflicts between different sites for their local code systems.

Traditionally, HL7 defined the letter "L" to stand for any local system or, if more than one local code system exists at a given site, to name those "99zzz", where each z is a digit. We can loosen this constraint a little by saying that every code system name starting with "99" is local.

3.5 Real World Concepts

The old CE data type and its interim proposed successors (with various names: LCE/CWE and CE/CNE) were basically a pair of one Code Value [p. 116] and a free text string that could be used to convey the original text in an uncoded fashion.

The new data type for real world concepts is essentially a generalization of the CE. The Concept Descriptor is defined as a collection of Code Values [p. 116] with one, two, or more codes.

There is an important difference in the semantics of a collection of Code Values [p. 116]. Two semantic flavors exist:

1. A collection of quasi-synonyms, i.e., codes that have been selected from different coding systems in order to convey the same meaning.
2. A collection of codes, possibly from the same coding system, that modify the overall meaning.

Both flavors of collections of code values will have to be supported by the new data type for real world concepts. An example from HL7 v2.x is the "specimen source code" in the OBR segment, which was such a conglomerate of quasi-synonyms and modifiers.

The Concept Descriptor supports the two kinds of collections of Code Values without mixing them all together.
The Concept Descriptor data type is therefore a rich nested structure, whose complexity reflects the complexity of the task it has to perform.

There may be a requirement for the new data type for real world concepts to keep track of the systems that perform translations on those codes. Thus, every code value could be annotated with by whom, when, and how a particular quasi-synonymous code value was added to the collection of quasi-synonyms.

When codes are translated into codes of other code systems, the original meaning is necessarily distorted. Thus, it does matter which translation occurred based on which prior Code Value. The new data type Concept Descriptor keeps track of the order in which translations were performed and of the quality of those translations.


The Concept Descriptor [p. 122] is basically a partially ordered set of Code Translations. Every code value is considered one translation. The first code value is the translation from the original text to a code value. Other translations to other code systems may be added to the concept descriptor, based either on code values already in the set of translations or on the original text. Every translation refers to the translation that it is based on.

Codes and their modifiers are collected in a Code Phrase [p. 124]. The code phrase is an intermediate level between Code Value and Code Translation: every Code Translation contains an entire Code Phrase. Examples are given after the formal definitions of the involved data types.
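Because every translation names the translation it was derived from, the provenance of any translation can be recovered by following those references until one reaches a translation whose origin is #null. The following Python sketch illustrates only that ordering. It reduces a term to a plain string, and the label "xlat-3-label" is hypothetical (the third translation in the hair color example of 3.5.4 is unlabeled). It is not the normative definition of the Concept Descriptor.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Translation:
    label: str
    term: str               # stands in for a Code Phrase in this sketch
    origin: Optional[str]   # label of the source translation; None plays the role of #null


def provenance(translations: Dict[str, Translation], label: str) -> List[str]:
    """Return the labels from `label` back to the translation that has no origin."""
    chain: List[str] = []
    current: Optional[str] = label
    while current is not None:
        if current in chain:
            raise ValueError(f"cyclic origin reference at {current!r}")
        chain.append(current)
        current = translations[current].origin
    return chain


# The hair color example: local code -> ICHC -> PILS-AVACC phrase.
hair = {
    t.label: t
    for t in [
        Translation("xlat-1-label", "99hcc AB", None),
        Translation("xlat-2-label", "ICHC 10.2", "xlat-1-label"),
        Translation("xlat-3-label", "PILS-AVACC B001 G002 H001", "xlat-2-label"),
    ]
}
print(provenance(hair, "xlat-3-label"))
```

Several translations may share the same origin, which is why the set is only partially ordered.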


3.5.2 Code Translation

Code Translation

This data type holds one code phrase as one translation in a set of translations describing a concept. The additional information in this data type points to the source code used in the translation process and describes who or what performed the translation and what the quality of this translation is.

Components:

term: Code Phrase [p. 124] (required)
    All the meaning of the translation is found here; the rest is descriptive information.

origin: reference to Code Translation [p. 123] (required)
    This is the code in the list of translations on which this translation was based. This is a required component, which means that whoever adds an additional translation must reference the source code. No reference here (#null) means that the given translation is the original code.

producer: Technical Instance Identifier [p. 65] (optional)
    This identifier tells what system performed the translation. This information can be useful to audit the translation process or to estimate the quality of the term based on prior experience with the translations of a given producer. This identifier refers to some system, not a particular human coding clerk.

quality: Floating Point Number [0..1] (optional)
    An estimation of the translation quality. This is a value between 0 and 1, where 1 stands for an absolutely accurate translation and 0 stands for random fuzz. We do not require a special method to be used here to estimate the quality. This can just be a subjective estimation of the form we use in eliciting probabilities for a belief network. But we can recommend some example methods of how those values can be computed. We can also map all other quality estimations mentioned in the literature onto the interval [0..1] of real numbers.
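The constraints in the table above can be expressed as a small Python data class. This is a sketch under assumptions: the field names transliterate the components, the producer is an arbitrary string rather than a Technical Instance Identifier, origin holds a direct object reference instead of a label, and "translator-01" is a made-up identifier.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CodeValue:
    value: str
    code_system: str


@dataclass(frozen=True)
class CodeTranslation:
    term: Tuple[CodeValue, ...]            # the Code Phrase carrying all the meaning
    origin: Optional["CodeTranslation"]    # None stands for #null: this is the original code
    producer: Optional[str] = None         # identifier of a system, not of a human coder
    quality: Optional[float] = None        # 1 = absolutely accurate, 0 = random fuzz

    def __post_init__(self) -> None:
        if not self.term:
            raise ValueError("term must contain at least one Code Value")
        if self.quality is not None and not 0.0 <= self.quality <= 1.0:
            raise ValueError("quality must lie in the interval [0, 1]")


original = CodeTranslation(term=(CodeValue("AB", "99hcc"),), origin=None)
ichc = CodeTranslation(
    term=(CodeValue("10.2", "ICHC"),),
    origin=original,
    producer="translator-01",  # hypothetical system identifier
    quality=0.8,               # a subjective estimate, as the text allows
)
print(ichc.origin is original, ichc.quality)
```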


3.5.3 Code Phrase

Code Phrase

A code phrase is a list of code values which together make up a meaning. This can be used, for example, in SNOMED, where you can combine multiple codes into a new composite meaning. HL7 used to combine codes and modifiers for the OBR specimen source. And HCFA procedure codes also come with modifiers.

ORDERED LIST OF Code Value [p. 116]

3.5.4 Examples

The following example is completely made up. None of the mentioned code systems exist, and the scenario is admittedly rather strange. A code value for the hair color "ash-blond" in some local hair color code:

    (CodeValue :value "AB"
               :codeSystem "99hcc"
               :printName "ash blond")

The translation into the official WHO-approved International Code for Hair Colors (ICHC): ICHC does not have a code for "ash-blond", but it has "pale-blond", so we take that one.

    (CodeValue :value "10.2"
               :codeSystem "ICHC"
               :printName "pale blond")

Now what we have are two codes that both try to describe the same concept (i.e., the hair color the physician has seen). We have to build a concept descriptor that contains both code values, the original "ash-blond" and its translation "pale-blond" into ICHC.

    (ConceptDescriptor
      :originalText "... the patient's hair had an ashy-blondish color ..."
      :translations
        (SET
          (CodeTranslation :label "xlat-1-label"
            :term (CodeValue :value "AB"
                             :codeSystem "99hcc"
                             :printName "ash blond")
            :origin #null)
          (CodeTranslation
            :term (CodeValue :value "10.2"
                             :codeSystem "ICHC"
                             :printName "pale blond")
            :origin (ref "xlat-1-label"))))

In this example the type definition is deliberately "violated" in that the code phrase was not used as the term component of the Code Translation. This demonstrates the type conversion [p. 22] feature of our type system: we can allow one related type to be sent in place of another.

Suppose the CDC is conducting a study to correlate ear infection with hair color. The Pilological Society of America (PILS-A) has just agreed on an Advanced Hair Color Code (AVACC), which the CDC is using for its study. This code is post-coordinated. It has the axes (1) base color (black, brown, blond), (2) gray-tone (none, slight, medium, strong), and (3) homogeneity (homogene, spotty, ... [here I could be more creative in my native language]). The translator guesses that "blond, slight, homogene" would fit best (although the original text didn't say anything about homogeneity). So we add that other translation:

    (ConceptDescriptor
      :originalText "... the patient's hair had an ashy-blondish color ..."
      :translations
        (SET
          (CodeTranslation :label "xlat-1-label"
            :term (CodeValue :value "AB"
                             :codeSystem "99hcc"
                             :printName "ash blond")
            :origin #null)
          (CodeTranslation :label "xlat-2-label"
            :term (CodeValue :value "10.2"
                             :codeSystem "ICHC"
                             :printName "pale blond")
            :origin (ref "xlat-1-label"))
          (CodeTranslation
            :term (CodePhrase
                    (LIST :of "CodeValue"
                      (CodeValue :value "B001"
                                 :codeSystem "PILS-AVACC"
                                 :printName "blond")
                      (CodeValue :value "G002"
                                 :codeSystem "PILS-AVACC"
                                 :printName "slight gray")
                      (CodeValue :value "H001"
                                 :codeSystem "PILS-AVACC"
                                 :printName "homogene")))
            :origin (ref "xlat-2-label"))))

Because the translation program interXhair™ does not know about the local code "99hcc", it can only translate from the ICHC term.

The features quality and producer of a translation are not shown in the above example.

The Concept Descriptor can also deal with coding exceptions. The distinction between "code without exceptions" and "code with exceptions" was proposed before, and we should make sure that we capture the requirements that this proposal tries to address. An exception in this system of coding and translating occurs if some particular quality that was observed cannot be coded in a particular coding system.

For example, 46-year-old Jane Jammer comes into Dr. Doolittle's office complaining of an itchy sensation in her gut that is not quite painful. When asked where exactly the sensation is located, Mrs. Jammer points to her upper left abdomen but then draws a circle that covers about everything.


So Dr. Doolittle tries to code this chief complaint using a Multiaxial Code for Primary Care Medicine (PRIMAX). PRIMAX might have an axis for sensation (S) and location (L). The doctor is lucky to find 123 "ABDOMEN" as a fairly general descriptor for the location. But the doctor finds only "pain," "numbness," "tension," "heat," and "cold" as sensations. So where does the "itchy but not quite painful" sensation go? Unfortunately, this code does not come with the categories "not otherwise classified" (NOC), "not otherwise specified" (NOS), or just "other" that many classification systems (like ICD) have. So the physician cannot code the chief complaint of his patient.

The physician writes down the following:

    (ConceptDescriptor
      :originalText "... an 'itchy' feeling in her 'guts' that is not quite painful ..."
      :translations
        (SET
          (CodeTranslation
            :term (CodePhrase
                    (LIST :of "CodeValue"
                      (CodeValue :value #other
                                 :codeSystem "PRIMAX"
                                 :replacement "itchy feeling, not painful")
                      (CodeValue :value "L-123"
                                 :codeSystem "PRIMAX"
                                 :printName "abdomen")))
            :origin #null)))
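The same coding exception might be carried in an implementation language roughly as follows. This Python sketch is only one assumed mapping: the `NoInformation` class with a `flavor` field mirrors the instance notation `(NoInformation :flavor "other")`, and the helper `coding_exceptions` is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class NoInformation:
    flavor: Optional[str] = None  # None corresponds to #null, "other" to #other


@dataclass(frozen=True)
class CodeValue:
    value: Union[str, NoInformation]
    code_system: str
    print_name: Optional[str] = None
    replacement: Optional[str] = None  # free text standing in for an uncodable value


OTHER = NoInformation(flavor="other")

chief_complaint: List[CodeValue] = [
    CodeValue(OTHER, "PRIMAX", replacement="itchy feeling, not painful"),
    CodeValue("L-123", "PRIMAX", print_name="abdomen"),
]


def coding_exceptions(phrase: List[CodeValue]) -> List[str]:
    """Return the replacement texts of all parts of a phrase that could not be coded."""
    return [cv.replacement or "" for cv in phrase if cv.value == OTHER]


print(coding_exceptions(chief_complaint))
```

Note that, as the outstanding issues below point out, nothing in this structure says which PRIMAX axis the `#other` belongs to.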


3.5.5 Outstanding Issues

The special value #null means a value (NoInformation) of the No Information [p. 31] data type without a null flavor. The special value #other stands for

    (NoInformation :flavor "other")

In order to fully support this, we need a canonical taxonomy of flavors of null.

In the above example, PRIMAX is a multiaxial code: it has sensation (S), location (L), and maybe other axes, like timing (T) and the situation in which the problem occurs (W). PRIMAX (like SNOMED) does not require you to pick a value from every axis. So no one knows what this #other in PRIMAX refers to: sensation? timing? work-relatedness?

It seems to be redundant to have a code phrase such as the following:

    (CodePhrase
      (LIST
        (CodeValue :value "S-001" :codeSystem "PRIMAX" :printName "pain")
        (CodeValue :value "L-123" :codeSystem "PRIMAX" :printName "abdomen")
        (CodeValue :value "T-032" :codeSystem "PRIMAX" :printName "post prandial")
        (CodeValue :value "W-120" :codeSystem "PRIMAX" :printName "pulling a carriage")))


Because every code here is taken from the same code system PRIMAX, one would not need to specify PRIMAX as the code system for all those related Code Values.

It also seems as if a code phrase only makes sense in certain code systems. For example, in LOINC a code phrase is pretty useless, if not contradictory to the (original) style of LOINC (which has been loosened up lately). In LOINC you would say

    (CodeValue :value "2703-7"
               :codeSystem "LOINC"
               :version "1.0K"
               :printName "OXYGEN:PPRES:PT:BLDA:QN")

for the partial pressure of oxygen (pO2) in an arterial blood sample. It is certainly wrong in LOINC to say the same in a phrase that first mentions pO2 in NOS blood (BLD) and then adds to it the modifier that the specimen was really arterial blood:

    (CodePhrase
      (LIST :of "CodeValue"
        (CodeValue :value "11556-8"
                   :codeSystem "LOINC"
                   :version "1.0K"
                   :printName "OXYGEN:PPRES:PT:BLD:QN")
        (CodeValue :value "BLDA"
                   :codeSystem "LOINC-SYSTEM"
                   :version "1.0K"
                   :printName "arterial blood")))

If the ability to form code phrases depends on the code system, the code system might define a syntax for literal expressions of those phrases, such as "M12345 F03847 D94578", which SNOMED apparently suggests.


This is not quite right, because LOINC is still not multiaxial. You would have to guess that the third Code Value in the phrase is here to assign a value to the second Code Value, like "method:=FICK".

Sometimes we need to label specific parts in a code phrase. A code phrase is just a container of a flat sequence of code values. Language, however, has deep structure (consider Chomsky's famous noun phrase (NP) and verb phrase (VP)).

Our data type is already quite complex. If we allow a recursion of the EBNF form

    CodePhrase ::= { CodeTerm };
    CodeTerm   ::= CodePhrase | CodeValue;

then the data type would be very powerful, but it would also gain a significant amount of complexity. We do not fear recursion here, but we do not want to create a super-powerful data type that provides thousands of ways for people to abuse its power and hardly any idea about how to use that power properly.
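To make the trade-off concrete, the recursive grammar can be transcribed directly into Python types. This is a sketch of the alternative the text considers and declines, not part of the specification; the helpers `depth` and `flatten` are hypothetical. They show that nesting becomes unbounded, and that flattening a nested phrase into today's flat Code Phrase discards exactly the grouping the recursion would add.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class CodeValue:
    value: str
    code_system: str


@dataclass(frozen=True)
class CodePhrase:
    terms: Tuple["CodeTerm", ...]  # CodePhrase ::= { CodeTerm }


CodeTerm = Union[CodePhrase, CodeValue]  # CodeTerm ::= CodePhrase | CodeValue


def depth(term: CodeTerm) -> int:
    """Nesting depth; a flat Code Phrase always has depth 1."""
    if isinstance(term, CodeValue):
        return 0
    return 1 + max((depth(t) for t in term.terms), default=0)


def flatten(term: CodeTerm) -> List[CodeValue]:
    """Collapse a nested phrase into the flat list the current data type allows."""
    if isinstance(term, CodeValue):
        return [term]
    return [cv for t in term.terms for cv in flatten(t)]


# "pain in the abdomen" grouped as one unit, then qualified by a timing code.
nested = CodePhrase((
    CodePhrase((CodeValue("S-001", "PRIMAX"), CodeValue("L-123", "PRIMAX"))),
    CodeValue("T-032", "PRIMAX"),
))
print(depth(nested), [cv.value for cv in flatten(nested)])
```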


4 Quantities

4.1 Overview

All our quantitative concepts can be constructed by the means that mathematics has developed during the past 3000 years. The most fundamental and abstract quantitative concept is the number. There are different kinds of numbers. Primarily there are natural numbers (1, 2, ...), cardinal numbers (0, 1, 2, ...), and integer numbers (..., -2, -1, 0, 1, 2, ...). Such numbers are the results of enumerating, counting, or simple calculations (+, -, ·, ÷, mod) with integer numbers. The set of integer numbers is countably infinite and discrete.

Next there are rational numbers, which are constructed through division (1/2, 1/3, 2/3, 1/4, ...). The set of rational numbers is dense and infinite but still countable (G. Cantor). Geometry has introduced irrational numbers (e.g., the square root of 2, pi, ...). The union of rational and irrational numbers is called the real numbers. The set of real numbers is continuous, infinite, and not countable.

Arab mathematicians, building on Indian sources, spread the custom of representing numbers as decimal digits, where each position has a certain value. This Arabic numbering system was a great advance over the ancient Hebrew and Greek custom of using letters as numbers, or the arcane Roman number system. With Arabic numerals, calculation became much easier.

However, numbers with a decimal point can only approximate most rational numbers and all irrational numbers; hence, numbers with a decimal point cannot be considered exact.

Most computer programming languages distinguish between the two data types integer and floating point number. Some also support rational and complex numbers. Whereas HL7 v2.x had only one data type for numbers, HL7 v3 will distinguish between integer and floating point.
This distinction is suggested not just by technological considerations (the two are implemented quite differently).

The main reason for distinguishing integer and floating point numbers is semantic. Integer numbers are exact results of counting and enumerating. In natural science and real life, integer numbers are rather rare. Measurements, estimations, and many scientific computations have floating point numbers, i.e., imprecise real numbers, as their results. Measurements are but approximations to the quantitative phenomena of nature.

There are other distinguished quantitative phenomena that can be partially described by numbers but which have a meaning beyond numbers. Among such quantitative phenomena are physical measurements with units of measure, money, and real time as measured by calendars.
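The difference between exact numbers and numbers with a decimal point can be seen with Python's standard `fractions` and `decimal` modules. This sketch only illustrates the argument above; it does not prescribe any HL7 representation, and the six-digit precision and the helper name `decimal_approximation` are arbitrary choices.

```python
from decimal import Decimal, localcontext
from fractions import Fraction


def decimal_approximation(q: Fraction, significant_digits: int) -> Decimal:
    """Round an exact rational to a decimal number with the given significant digits."""
    with localcontext() as ctx:
        ctx.prec = significant_digits
        return Decimal(q.numerator) / Decimal(q.denominator)


third = Fraction(1, 3)                      # exact, like the result of counting
approx = decimal_approximation(third, 6)    # 0.333333
print(approx, Fraction(approx) == third)    # the decimal form is not exact
print(3 * third == 1)                       # exact arithmetic stays exact
```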


This specification defines data types for integer and floating point numbers, for physical measurements, money, and calendars. There are many more quantitative phenomena that we may or may not define data types for in the future. Examples of those we will define are vectors, waveforms, and possibly matrices. We will probably not consider complex numbers, unless a concrete use case appears.

4.2 Integer Number

Integer Number (Integer, IN)

Integer numbers are precise numbers that are results of counting and enumerating. Integer numbers are discrete; the set of integers is infinite but countable. No arbitrary limit is imposed on the range of integer numbers. Two special integer values are defined for positive and negative infinity.

PRIMITIVE TYPE

No fixed arbitrary limits on value range

No arbitrary limit is imposed on the range of integer numbers. Thus, in theory, the capacity of any binary representation can be exceeded, whether 16 bit, 32 bit, 64 bit, or 128 bit. Domain committees should not limit the ranges of integers only to make sure the numbers fit into current database technology. In finance and accounting those limits are frequently exceeded (e.g., consider the U.S. national budget expressed in Italian lira or Japanese yen). Designers of Implementable Technology Specifications (ITS) should be aware of the possible capacity limits of their target technology.

The infinity of integer numbers is represented as a special value. The representation of integer numbers is up to the ITS. In our instance notation we use the special symbol #iinf for positive infinity (ℵ₀) and #niinf for negative infinity (-ℵ₀). Note that #niinf = -#iinf.

Constraints on value ranges

In cases where limits on the value range are suggested semantically by the application domain, the committees should specify those limits. For example, the number of prior patient visits is a non-negative integer, including 0.

Although we do not yet have a formalism to express constraints, we should not hesitate to document those constraints informally. We will eventually define (or deploy) a constraint expression language.


ITS Presentation and Literals

We allow integer numbers to be represented by character string literals containing signs, decimal digits, and symbols for infinities. Implementable Technology Specifications (ITS) such as one for XML will most likely use the string literal to represent integers. Other ITSs, such as one for CORBA, might choose to represent integers by variable length bit strings or by choices of either a native integer format or a special long integer format.

We may even want to define non-decimal representations in bases 2, 8, 16, and 64.

4.3 Floating Point Number

Floating Point Number (Float, FPN)

Floating point numbers are approximations for real numbers. Floating point numbers occur whenever quantities of the real world are measured or estimated, or as the result of calculations that include other floating point numbers.

Components:

value: Real Number (required)
    The value without the notion of precision, or with an arbitrary precision. We do not specify a data type for true real numbers of infinite precision.

precision: Integer Number [p. 133] (required)
    The precision of the floating point number in terms of the number of significant decimal digits.

Semantic components vs. representational components

A floating point number has the semantic components value and precision; however, this does not necessarily mean that any representation of a floating point number will be a structure of two distinct components. Especially since we do not specify a data type for true real numbers of infinite precision, the value component is not of an existing data type.

Precision

The precision of a floating point number is defined here as the number of significant decimal digits. According to Robert S. Ledley [Use of computers in biology and medicine, New York, 1965, p. 519ff]: "A number composed of n significant figures is said to be correct to n significant figures if its value is correct to within 1/2 unit in the least significant position. For example, if 9072 is correct to four significant figures, then it is understood that the number lies between 9072.5 and 9071.5 (that is 9072 ± 0.5) [...]"
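Ledley's rule can be computed directly from a decimal literal: the half unit is 0.5 times the power of ten of the last written digit. This is a minimal Python sketch using the standard `decimal` module; the function name is hypothetical. Counting every written digit, trailing zeros included, follows rule 3 of this specification (stated under "Number of significant digits") rather than Ledley, so "2e3" yields ±500 while "2000.0" would yield ±0.05.

```python
from decimal import Decimal
from typing import Tuple


def ledley_interval(literal: str) -> Tuple[Decimal, Decimal]:
    """Bounds implied by 'correct to within 1/2 unit in the least significant position'."""
    value = Decimal(literal)
    exponent = value.as_tuple().exponent         # power of ten of the last written digit
    half_unit = Decimal(5).scaleb(exponent - 1)  # 0.5 * 10**exponent
    return value - half_unit, value + half_unit


print(ledley_interval("9072"))   # Ledley's example
print(ledley_interval("1.20"))
print(ledley_interval("2e3"))
```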


Obviously this method of stating the uncertainty of a number depends on the number's decimal representation. For binary representations we could, in principle, specify the precision more granularly. However, the statement that a value lies within a certain range is problematic anyway, because it raises the question of which level of confidence we assume. We will define a generic data type for probability distributions that allows exact statements of uncertainty.

Sometimes the term precision is put in opposition to accuracy. Whereas precision means the exactness of the numeric representation of a value, accuracy refers to the smallness of error in the measurement or estimation process. While those concepts can be distinguished, they are related inasmuch as we do not want to specify a higher precision of a number than we can justify by the accuracy of the measuring process generating the number. Conversely, we do not want to specify a number with less precision than justifiable by the accuracy.

In fact, there is considerable confusion around the meaning of such terms as precision, accuracy, error, etc. There is hardly a commonly accepted definition of those terms. A review of some of the available literature on the topic may help: NIST's Guidelines for the expression of uncertainty in measurement (http://physics.nist.gov/cuu/Uncertainty/index.html), which in turn are based on ISO's International Vocabulary of Basic and General Terms in Metrology (VIM). In addition, the European standard ENV 12435 Medical informatics - expression of the results of measurements in health sciences, in its normative Annex D, summarizes NIST's position.

To summarize: NIST's Guidelines and ISO's VIM regard the term accuracy as a "qualitative concept". Other related terms are repeatability, reproducibility, error (random and systematic), etc.
All those slightly different but related and overlapping concepts have been subsumed under the broader concept of uncertainty in a 1981 publication by the International Committee for Weights and Measures (CIPM) in accordance with ISO and IEC. The uncertainty of measurement is given as a probability distribution around the true measurement value (measurand). Given such a probability distribution, a value range can be specified within which the true value is found with some level of confidence.

These concepts of specifying accuracy based on statistical methods are well known in the medical profession. However, these statistical methods are quite complex, and exact probability distributions are often unknown. Therefore, we want to keep those separate from a basic data type of floating point numbers. Still, floating point numbers are approximations to real numbers, and we want to account for this approximate nature by keeping a basic notion of precision in terms of significant digits right in the floating point data type.

In many situations, significant digits are a sufficient estimate of the uncertainty; but, even more important, we must account for significant digits at interfaces, especially when converting between different representations. For instance, we do not want a value 4.0 to become 3.999999999999999999 in such a conversion, as sometimes happens when converting decimal representations to IEEE binary representations.
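A minimal Python sketch of keeping the precision next to the value at an interface: count the significant digits of the incoming literal with the rules given under "Number of significant digits" (leading zeros never count, trailing zeros always do), and use that count when rendering a native binary double back into a decimal literal. The function names are hypothetical, and accepting a mantissa without a decimal point (as in "2e3", which the text uses) is an assumption of this sketch.

```python
import re
from decimal import Decimal

# Float literal: optional sign, mantissa, optional exponent introduced by "e".
# Allowing a mantissa without a decimal point (as in "2e3") is an assumption.
_FLOAT_LITERAL = re.compile(r"[+-]?([0-9]+\.[0-9]+|\.[0-9]+|[0-9]+)(e[+-]?[0-9]+)?")


def significant_digits(literal: str) -> int:
    """Count significant digits: leading zeros never count, trailing zeros always do."""
    if not _FLOAT_LITERAL.fullmatch(literal):
        raise ValueError(f"not a floating point literal: {literal!r}")
    mantissa_digits = literal.lstrip("+-").split("e")[0].replace(".", "")
    # A literal whose digits are all zero is counted as one digit (assumption).
    return max(len(mantissa_digits.lstrip("0")), 1)


def to_literal(x: float, precision: int) -> str:
    """Render a native IEEE double with exactly `precision` significant digits."""
    if precision < 1:
        raise ValueError("precision must be at least 1")
    return f"{x:.{precision - 1}e}"


def round_trip(literal: str) -> str:
    """Decimal literal -> IEEE double -> decimal literal, keeping the precision."""
    return to_literal(float(literal), significant_digits(literal))


print(Decimal(0.1))               # exact value of the double closest to 0.1
print(to_literal(0.1 + 0.2, 1))   # rounded back to the one digit that was significant
print(round_trip("1.20"))         # the significant trailing zero survives
```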


No fixed arbitrary limits on value range

No arbitrary limit is imposed on the range or precision of floating point numbers. Thus, in theory, the capacity of any binary representation can be exceeded, whether 32 bit, 64 bit, or 128 bit. Domain committees should not limit the ranges and precision of floating point numbers only to make sure the numbers fit into current database technology. Designers of Implementable Technology Specifications (ITS) should be aware of the possible capacity limits of their target technology.

The infinity of floating point numbers is represented as a special value. The representation of floating point numbers is up to the ITS. In our instance notation we use the special symbol #finf for positive infinity (ℵ₁) and #nfinf for negative infinity (-ℵ₁). Note that #nfinf = -#finf.

Constraints on value ranges

In cases where limits on the value range are suggested semantically by the application domain, the committees should specify those limits. For example, probabilities should be expressed as floating point numbers between 0 and 1.

Although we do not yet have a formalism to express constraints, we should not hesitate to document those constraints informally. We will eventually define (or deploy) a constraint expression language.

ITS Presentation and Literals

We allow floating point numbers to be represented by character string literals containing signs, decimal digits, a decimal point, and exponents. An ITS for XML will most likely use the string literal to represent floating point numbers. Other ITSs, such as one for CORBA, might choose to represent floating point numbers by variable length bit strings or by choices of either a native (IEEE) floating point format or a special long floating point format.

Decimal floating point numbers can be represented in a standard way, so that only significant digits appear.
This standard representation always starts with an optional minus sign and the decimal point, followed by all significant digits of the mantissa, followed by the exponent. Thus 123000 is represented as ".123e6" to mean .123 × 10^6; 0.000123 is represented as ".123e-3" to mean .123 × 10^-3; and -12.3 is represented as "-.123e2" to mean -.123 × 10^2.

The reason why we define decimal literals for data types is to make the data human readable. Rendering the value 12.3 as ".123e2" is not considered intuitive. The European standard ENV 12435 recommends that the exponent be adjusted so as to yield a mantissa between 0.1 and 1000. Those representations tend to be easier to memorize. The external representation is of the form:


    sign     ::= + | -
    digit    ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    digits   ::= digit digits | digit
    decimal  ::= digits . digits | . digits
    mantissa ::= sign decimal | decimal
    exponent ::= sign digits | digits
    float    ::= mantissa e exponent | mantissa

Number of significant digits

The number of significant digits is determined according to Ledley (ibid.) and ENV 12435:

1. All non-zero digits are significant.
2. Leading zeroes are not significant, regardless of the decimal point's position.
3. All trailing zeroes are significant, regardless of the decimal point's position.

Note that rule number 3 diverges from Ledley and ENV 12435. Judgment about the significance of trailing zeroes is often deferred to common sense. However, in a computer communication standard common sense is not a viable criterion (common sense is not available on computers). Therefore we consider all trailing zeroes significant. For example, 2000.0 would have five significant digits and 1.20 would have three. If the zeroes are only used to fix the decimal point (such as in 2000) but are not significant, we require the use of exponents in the representation: "2e3" to mean "2 × 10^3".

4.4 Ratio

HL7 v2.3 defined the data type "structured numeric" (SN) for various purposes. Among those purposes was to cater to the need to express rational numbers that often occur as titers in laboratory medicine. A titer is the maximal dilution at which an analyte can still be detected. Typical values of titers are: "1:32", "1:64", "1:128", etc. Powers of 1/2 or 1/10 are common. Sometimes titer results are falsely represented by writing down only the denominator (e.g., 2 meaning 1:2 and 128 meaning 1:128). Great confusion exists in practice when comparing titers to reference values. Thus one almost always sees or hears statements like "1:256 > 1:128" when the opposite is true.

Regardless of how negligently titers are commonly treated in medical practice, titers are rational numbers.
In the introduction, however, we noted that rational numbers are exact. Titer values are certainly measurements, and all measurements are inexact.
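To make the comparison problem concrete, here is a minimal Python sketch (an illustration, not part of this specification) that reads titers as exact rational numbers using the standard `fractions` module. The helper `parse_titer` and its "n:d" input syntax are hypothetical. Read as quotients, 1:256 is the smaller value, and a titer sent only as a decimal can be turned back into "1:n" only if the receiver already knows that this presentation is intended.

```python
from fractions import Fraction


def parse_titer(text: str) -> Fraction:
    """Parse a titer literal such as "1:128" into an exact rational number.

    Hypothetical helper; the "numerator:denominator" syntax is an assumption.
    """
    numerator, denominator = text.split(":")
    return Fraction(int(numerator), int(denominator))


t256 = parse_titer("1:256")
t128 = parse_titer("1:128")

# As quotients, 1:256 is smaller than 1:128, so "1:256 > 1:128" is false.
print(t256 > t128)   # False

# Sent as a plain decimal, the titer 1:128 becomes 0.0078125 ...
print(float(t128))   # 0.0078125

# ... and the "1:n" form is recoverable only by knowing to invert it.
print(Fraction(1) / Fraction("0.0078125"))   # 128
```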


HL7 v3.0 Data Types Specification - Version 0.9    Gunther Schadow    DRAFT

Thus, in theory, a titer of 1:128 could be reported as 0.0078125. However, no human user would understand such a result. One could recover the original ratio by taking the inverse, 10000000/78125, which is 128, but to do that, the receiver would have to know that the given number is to be presented to the user as a ratio of the form 1/n.

Since rational numbers are exact mathematical constructs, and since this exactness is not available in medicine, this specification defines a generalization of rational numbers, the Ratio. A ratio is any quotient of two quantities. These can be two integers, in which case we have an exact rational number. But the quotient can just as well be built from floating point values, from physical measurements, or from any combination thereof.

Note that the ratio has the semantics of a quotient. The ratio data type must not be used merely because it is a handy representation of two related values. Notably, blood pressure values, commonly reported as 120/80 mm Hg, are not ratios!

Ratio

A ratio quantity is a quantity that comes about through division of a numerator quantity by a denominator quantity. Ratios occur in laboratory medicine as "titers", i.e., the maximal dilutions at which an analyte can still be detected.

component name   type/domain   optionality                                 description
numerator        Quantity      required; default is 1                      The numerator quantity.
denominator      Quantity      required; must not be zero; default is 1    The denominator quantity.

A Quantity is a generalization of the following data types:

Integer Number [p. 133]
Floating Point Number [p. 134]
Physical Quantity [p. 138]
Monetary Amount [p. 140]
Ratio [p. 137] (recursively)
... other quantitative data types

4.5 Measurements


4.5.1 Physical Quantities

All versions of HL7 v2.x defined the data type "Composite Quantity with Unit" (CQ). This data type, however, was not normally used in measurement observations (OBX). Instead, in an OBX you would send a numerical result (value type NM) and send the units in a separate OBX field. Moreover, units used to have different code tables depending on whether the CQ type or the OBX mechanism was used. We want to clean this up. It seems so natural to define a data type for measurements (or "dimensioned quantities") that many other standardization groups have adopted (reinvented) this two-component data type over and over again.

The first working document of CEN TC251 WG 1 PT 26, Health Informatics; Electronic Healthcare Record Communication; Part 1: Extended Architecture, defines in table 25 [p. 52f] a type "quantity" as "A measurement expressed as a numeric value and unit of measurement" with the two-component structure (value, unit).

The current draft 5 of CORBAmed's Clinical Observation Access Service (COAS) specifies a "MeasurementElement" that basically contains value and unit; however, its structure is slightly different.

We define the data type Physical Quantity as follows:

Physical Quantity

A physical measurement is a dimensioned quantity expressing the result of a measurement act. It consists of a value and a unit.

component name   type/domain                      optionality   description
value            Floating Point Number [p. 134]   required      The magnitude of the quantity measured in terms of the unit.
unit             Concept Descriptor [p. 122]      required      The unit, which is a real world concept.

Units

Units are mathematical structures, quite different from other vocabularies. Armed with a little bit of mathematics, dealing with units is much simpler than dealing with the usual medical concepts. Units are hard to attack with semantic networks, but easy to deal with in simple algebraic structures. [More will follow, see also Schadow G, McDonald CJ, et al.
Units of Measures in Clinical Information Systems. JAMIA. Apr/May 1999.]
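As an illustration of that algebraic view, here is a minimal Python sketch; it is not UCUM and not an HL7 API. Each unit is modeled, by assumption, as a conversion factor to base units plus a vector of exponents over base dimensions (only length L, mass M, and time T here). Multiplying units adds exponents, and two units are convertible exactly when their exponent vectors match. The unit table and all class names are hypothetical.

```python
from dataclasses import dataclass

# Exponents over the base dimensions (L, M, T); velocity L T^-1 is (1, 0, -1).
Dim = tuple[int, int, int]


@dataclass(frozen=True)
class Unit:
    factor: float  # size of one unit expressed in base units (m, kg, s)
    dim: Dim

    def __mul__(self, other: "Unit") -> "Unit":
        # Multiplying units multiplies factors and adds dimension exponents.
        return Unit(self.factor * other.factor,
                    tuple(a + b for a, b in zip(self.dim, other.dim)))

    def __truediv__(self, other: "Unit") -> "Unit":
        # Dividing units divides factors and subtracts dimension exponents.
        return Unit(self.factor / other.factor,
                    tuple(a - b for a, b in zip(self.dim, other.dim)))


# Hypothetical unit table, a tiny subset for illustration only.
UNITS = {
    "m": Unit(1.0, (1, 0, 0)),
    "cm": Unit(0.01, (1, 0, 0)),
    "in": Unit(0.0254, (1, 0, 0)),   # international inch: exactly 2.54 cm
    "kg": Unit(1.0, (0, 1, 0)),
    "s": Unit(1.0, (0, 0, 1)),
    "min": Unit(60.0, (0, 0, 1)),
    "hr": Unit(3600.0, (0, 0, 1)),
}


@dataclass(frozen=True)
class PhysicalQuantity:
    value: float
    unit: str

    def convert_to(self, target: str) -> "PhysicalQuantity":
        """Convert to another unit; only units of the same dimension qualify."""
        src, dst = UNITS[self.unit], UNITS[target]
        if src.dim != dst.dim:
            raise ValueError(f"{self.unit} is not convertible to {target}")
        return PhysicalQuantity(self.value * src.factor / dst.factor, target)


print((UNITS["m"] / UNITS["s"]).dim)                  # (1, 0, -1)
print(PhysicalQuantity(90, "min").convert_to("s"))    # PhysicalQuantity(value=5400.0, unit='s')
```

The convertibility check in `convert_to` is also one way to enforce a "paradigmatic unit" constraint: a value is acceptable if it converts to the paradigm without error.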


Existing codes for units of measure are:

1. ISO 2955 (1983)
2. ANSI X3.50 (1986)
3. HL7 ISO+/ANSI+, equal to ASTM 1238 and to HISPP MSDS CDT (based on ISO+).
4. There is a new Unified Code for Units of Measure (UCUM) (http://aurora.rg.iupui.edu/UCUM) that we will submit to ANSI X3.50 or ISO TC12, or propose as an HL7 defined code (probably maintained by Regenstrief, similar to LOINC). The UCUM is much more complete and does not suffer from the ambiguities and imprecise semantics of the other codes.

Regardless of what coding system HL7 ends up recommending (or mandating), we will be able to accommodate it in the structure defined above.

Constraints on the Dimension of a Measurement

Not all physical kinds of quantities (or dimensions) are applicable in every use of the measurement data type. Subsets of units of measure are defined through the semantics of units and could be specified in any of three ways:

1. with a special code for kinds of quantities,
2. with a special expression language (similar to the units code itself),
3. with a paradigmatic unit to which a given unit must be convertible.

Ad 1: An example of a special code for kinds of quantities is the "property" code of LOINC, e.g. "TIME" for time durations (such as seconds).

Ad 2: An example of a special expression language is the way dimensions are commonly specified: "T" for time, "L" for length, $LT^{-1}$ for velocity, $LT^{-2}$ for acceleration, and $LT^{-2}M$ for force.

Ad 3: If an attribute "encounter duration" is defined as a measurement, one could give the paradigmatic unit "s" (second) in the definition of that attribute, meaning that every value of this attribute must be convertible to seconds.
This would be true for all measurements with units such as minute, hour, day, and many more.

4.5.2 Monetary Quantities: Currencies

Expressions of monetary amounts are of the same abstract form as physical quantities, i.e. a composite of a value and a unit (the currency unit). As with physical quantities, this composite can be regarded as a product (multiplication) of the value and the unit. As with physical units, there are submultiples of currency units (e.g., dollar and cent, pound and penny, mark and pfennig, rupee and paisa). Currencies appear to be just another dimension of measured quantities.


However, there is also a big semantic difference between monetary units and physical units. While the conversion factors between physical units are stable over many decades, the value of monetary units is negotiated anew each day in different places of the world. While an international inch is exactly 2.54 centimeters (since 1959), a U.S. dollar (USD) may be worth 1.795 Deutsche Mark (DEM) today and 1.659 DEM tomorrow. The same USD may be worth 1.795 DEM in New York and 1.801 DEM in Frankfurt (Germany) at the same time.

This suggests handling currencies differently from physical quantities. The methodology of this data type redesign work defines data types as semantic entities. The fact that some data types with different semantics share a similar structure does not by itself warrant lumping them together.

Monetary Amount

A monetary amount is a quantity expressing the amount of money in some currency.

component name   type/domain                      optionality   description
value            Floating Point Number [p. 134]   required      The magnitude of the monetary amount in terms of the currency unit.
currency unit    Concept Descriptor [p. 122]      required      The currency unit (e.g., US$, Deutsche Mark, Pound sterling), which is a real world concept.

ISO 4217 is an international code for currency units. Although the standard text itself is copyrighted, the values themselves are freely usable and are listed here (http://www.triacom.com/archive/iso4217.en.html). This code only covers the "major" currency unit of each country, e.g. U.S. dollar but not cent, British pound but not penny, German mark but not pfennig, Indian rupee but not paisa. This should not be a major problem, since most currency submultiples are worth 1/100 of the major unit (the British, too, have turned to a decimal system; there is no longer a "shilling", which was 1/20 of a pound sterling).

Price Expressions

Expressions of monetary units and physical units may be mixed, as in price expressions such as 5 U.S.
dollars (USD) per milliliter (price), or 20 USD per hour (salary). There are two ways to construct price expressions:

1. using the Ratio [p. 137] data type with a monetary amount as numerator and a physical quantity [p. 138] as denominator;
2. combining a code for physical units with a code for currency units.
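A minimal Python sketch of way 1 follows. The class and field names are hypothetical, chosen to echo this specification's type names rather than any normative binding, and `Decimal` is used instead of binary floating point to avoid rounding artifacts in money. The sketch applies a price ratio to a quantity given in the same unit and deliberately offers no currency conversion, because no fixed conversion factor exists between currencies.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class MonetaryAmount:
    value: Decimal
    currency_unit: str   # ISO 4217 code, e.g. "USD"


@dataclass(frozen=True)
class PhysicalQuantity:
    value: Decimal
    unit: str            # e.g. "ml", "hr"


@dataclass(frozen=True)
class Ratio:
    numerator: MonetaryAmount
    denominator: PhysicalQuantity

    def price_for(self, amount: PhysicalQuantity) -> MonetaryAmount:
        """Apply a price expression to a quantity expressed in the same unit."""
        if amount.unit != self.denominator.unit:
            # A fuller system could first convert between physical units
            # (fixed factors); this sketch simply refuses mismatched units.
            raise ValueError(f"expected {self.denominator.unit}, got {amount.unit}")
        total = self.numerator.value * amount.value / self.denominator.value
        return MonetaryAmount(total, self.numerator.currency_unit)


# 5 USD per milliliter, and 20 USD per hour.
price = Ratio(MonetaryAmount(Decimal("5.00"), "USD"), PhysicalQuantity(Decimal("1"), "ml"))
salary = Ratio(MonetaryAmount(Decimal("20.00"), "USD"), PhysicalQuantity(Decimal("1"), "hr"))

cost = price.price_for(PhysicalQuantity(Decimal("12"), "ml"))
print(cost.value, cost.currency_unit)   # 60.00 USD
```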


(1.) The example price expressions above could be built with ratios as follows:

    (Ratio
      :numerator (MonetaryAmount :value 5.00 :currencyUnit "USD")
      :denominator (PhysicalQuantity :value 1 :unit "ml"))

    (Ratio
      :numerator (MonetaryAmount :value 20.00 :currencyUnit "USD")
      :denominator (PhysicalQuantity :value 1 :unit "hr"))

This is a clean and the simplest solution, since separate codes for physical units and currency units are available today. It allows quantities with different semantic properties to be combined flexibly.

The alternative (2.) is to merge a code for physical units with another code for currency units. This endeavor raises problematic questions about the differences in semantics.

The way this could work in UCUM is that one would define an eighth base unit in addition to the seven existing base units. This would probably be the U.S. dollar, or one troy ounce of gold, traditionally used as the standard of value by the International Monetary Fund.

Lexically, the currency units would be treated just like any other unit. Semantically, however, their value would be taken from a dynamic table, which could be an on-line connection directly to New York's Wall Street or to any bank regarded as authoritative in a given realm.

However, this raises the question of what happens if a message crosses realms. While conversions between physical units should be enabled, because physical units of the same dimension are equivalent, currency units are not equivalent. Currency units change their exchange rates on an hourly basis. While it does not matter at all whether you have 1 yard or 0.9144 meter, it matters a lot whether you have 100 US Dollars or 3000 Indian Rupees.


This matter must be considered an open issue for the time being.

4.5.3 Things as Pseudo Units

Sometimes all kinds of things are used in expressions of the same form as physical quantities, such as

number × unit

Such expressions are often used when the numbers reported are the results of counting things. For instance, if we count tablets and the number of tablets is 50, people naturally say "50 tablets", which almost makes "tablet" appear to be a physical unit. However, that is not true. Not every object is a physical unit. Moreover, the connection between things and physical units is mainly suggested by European natural languages, where we say "50 tablets", "20 cars", "1000 chickens" and the like. Other languages, like Japanese, use category suffixes (counters) after numbers; for example, "5 pencils" would be "empitsu go-hon" in Japanese, where "hon" is used for all kinds of long and thin things. Should we therefore suggest regarding "hon" as a physical unit?

Such thing-units do have certain properties in common with physical units; for example, you cannot add meters and seconds, or apples and oranges. But there are also important differences. All international standards on measurements state that when object counts are reported, the measurement name should contain the things counted. One should not make up ad-hoc units. In lab databases one frequently finds units such as "red blood cells" vs. "white blood cells", which is redundant, given that the measurement name is reported properly.

Such thing-units are most common in the pharmacy, where they appear as medication units of application (e.g. tablet, capsule, vial, spray) that are often used as if they were units of measure. These symbols, however, are not units of measure, because they are not inherently quantities. While a metre is inherently a quantity (approx. 3.28 feet), a tablet or vial has no magnitude by itself.
A given tablet, vial or spray may have properties, such as strength or volume, but those differ for each kind of tablet, vial or spray under consideration. Conversely, a metre does not have different quantitative properties; a metre is a quantity in essence. Tablets, vials, and sprays are not essentially quantitative items.

Of course, you can count tablets (as you can count all kinds of things), and of course a tablet, as a physical body, does have volume, length, width, and depth. But the essence of a tablet is its form and not any specific kind of quantity. Conversely, the essence of a meter is a certain amount of length, the essence of a second is a certain amount of time, and the essence of a dollar is a certain amount of money. Not every kind of object is a candidate unit.

One may argue that not even all units of measure are real units, so why should one bother? For example, international Units (i.U.) are units that do not have a fixed magnitude associated with them.


International Units are arbitrary units defined for every analyte by some international organization (IUPAC?). Examples are i.U. for penicillin, insulin, streptokinase, urokinase, and other medications, but i.U. are also defined for many enzymes, hormones and antibodies. The rationale for those units is twofold:

1. these are functional units that measure a certain biochemical function rather than a specific molecule, because many slightly different molecules can carry out the same biochemical function;
2. the measurement process has so many parameters, all of which need to be standardized, that it is not possible to come up with comparable units standardized across all analytes.

The units U (= 1 µmol/min) and katal (= 1 mol/s) of catalytic activity attempt to be standardized for all enzymes. However, the measurement conditions still need to be standardized, because 1 katal of phosphofructokinase measured at pH 7.4, 37 degrees Celsius, in a Ringer solution, with this much ADP and no 1,2-bisphosphoglycerate present, is quite different from 1 katal of the same analyte measured at pH 7.5, 28 degrees Celsius, in plain water with only that much ADP present.

The various international Units (i.U.) are still essentially quantitative concepts, because international Units are defined for no other purpose than to measure quantities. This is quite different with tablets, vials, and sprays.

The order/results committee will have to work out the specifics of the relationship between units of application and units of measure in its information model. For a clean information analysis it is quite important to distinguish the semantics of physical units from those of thing-units. An important purpose of this data type redesign is to facilitate information analysis, not to obscure it.

4.6 Time

4.6.1 Point in Time

Point in Time

A point in time is a scalar defining a point on the axis of natural time. This naive concept of an absolute time scale is not concerned with the relativity of time, which is important in astrophysics and cosmology.

PRIMITIVE TYPE [see text]

The natural time scale is, much like the temperature scales (Celsius or Fahrenheit), an interval scale (aka difference scale). While the Celsius temperature scale defines a zero point at the freezing point of water and a standard degree as 1/100 of the interval between the freezing and boiling points of water, the Christian calendar defines the zero point at the birth of Christ, and the basic unit of time as the second. There are obvious problems with the determination of the zero point of the Christian calendar, but the principle is the same.


Zero points on the natural time axis are chosen arbitrarily and are called the "epoch".

Many data type specifications for point in time are based on an epoch. Examples of epochs are 1/1/1970 00:00:00 UTC on Unix, 1/1/1980 00:00:00 UTC on MS DOS, 12/31/1959 00:00:00 EST in the Regenstrief MRS, and 10/15/1582 00:00:00 UTC in CORBA's COAS. Durations are counted in seconds, milliseconds, microseconds, or nanoseconds from that epoch. This way of representing time is very simple. Although it is not easily human readable, it is very easy to compute with such standardized time values.

Traditionally, the even flow of time is "convoluted" into many cycles defined by calendars. Such cycles are years, months, days, hours, minutes, and seconds. These cycles are not synchronized. Calendars have traditionally been defined based on astronomical phenomena; however, calendar years, months and days are not attached directly to astronomical phenomena. The closest fit is the calendar day to the solar day, but the calendar month is definitely not the same as a lunar (synodic) month.

Humans communicate points in time as calendar expressions. Calendars are quite complex constructs which are dependent on culture. Bali, for example, is said to use 6 different calendars.

To account for the calendar problem, the basic Java library defines two classes: java.util.Date and java.util.Calendar. Date is defined as a point in Coordinated Universal Time of the form epoch/duration (Java's epoch is January 1, 1970 00:00:00 GMT, with the duration counted in milliseconds). Calendar is the abstract generalization of GregorianCalendar and potentially other calendars.

It is quite difficult to convert a calendar expression into an epoch/duration form. There are not just leap days (Feb. 29) added to leap years, but also leap seconds (in practice inserted at the end of June 30 or December 31). The rule for leap years is intricate, and for leap seconds no algorithm exists at all; the latter are taken from tables published in Astronomical Almanacs.
Fortunately, however, this conversion is done by most operating systems or by the basic Java library.

Calendar expressions are meant for humans to understand and are therefore represented as character string literals. The semantic components of a calendar expression may differ from the components identifiable in a particular surface form.

Quite solid standards for expressions in the Gregorian calendar are HL7 v2.3's TS data type and ISO 8601 (adopted in Europe as EN 28601). ASN.1's (ISO 8824) GeneralizedTime is a restricted form of ISO 8601. HL7's TS format is used by ASTM 1238 as well and lives on in ANSI HISPP MSDS CDT's DateTime format. Although HL7's TS format and ISO 8601 are similar, they also have considerable differences.

For HL7 v3 it seems worthwhile to consider adopting ISO 8601 [more about ISO 8601 (http://www.cl.cam.ac.uk/~mgk25/iso-time.html)]. However, ISO 8601 has some "features" that may be considered a disadvantage. First of all, ISO 8601 has too many unnecessary alternatives. A somewhat canonical date/time form is


YYYY-MM-DDThh:mm:ss

According to ISO 8601, the dashes between the date components, the colons between the time components, and the "T" between the date and time components may also be omitted. Omitting those characters yields a form very similar to ASN.1's GeneralizedTime or HL7's TS. The TS of HL7 v2.3 (after v2.2) handles precision by leaving out the less significant digits as required. However, without the "T" between date and time, this would be ambiguous with certain other valid ISO 8601 forms. ISO 8601 allows omission of the "T" only by mutual agreement and only if no ambiguities are introduced, a clause that is usually hard to enforce (and therefore harmful) in standards.

The W3C is considering a subset of ISO 8601 (http://www.w3.org/TR/NOTE-datetime) for adoption. The W3C's subset requires the "T" between date and time.

Useful features of ISO 8601 that are not part of HL7's TS type are the so-called "ordinal dates" and "week dates" of the forms

YYYY-DDD
YYYY-Www
YYYY-Www-D

These allow a date to be specified as (1) the day of a year, (2) the week of a year, or (3) the week of the year plus the day of the week.

Moreover, ISO 8601 allows omission of the more significant components (the delimiting dash, colon, or "T" must occur in those cases). This changes the semantics of the expression from a point in time to a calendar modulo expression. For example, "---2" means every Tuesday, but subtle variations may have a big impact on the meaning: "-W-2" means Tuesday "of the current week" (whatever this means).

Both HL7's TS and ISO 8601 handle time zones through offsets of the form "+hh:mm" or "-hh:mm" relative to UTC. TS adds a "Z" in front of the time zone suffix, while ISO 8601 uses the "Z" to mean UTC specifically (thus in ISO 8601 an offset expression following the Z would be contradictory).

Other features worth having are missing in ISO 8601, however.
These missing features include the concept of significant digits available in TS, where you can say "198" to mean any year from 1975 to 1985.

It seems justified for HL7 to stick with its own tradition of the TS data type. However, some slight changes could be applied to render most TS expressions compatible with ISO 8601. Notably, the "Z" should be used in the ISO 8601 style (i.e. only for UTC).
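The following minimal Python sketch uses only the standard `datetime` module to render one point in time, given in epoch/duration form (seconds since the Unix epoch), in several of the calendar expression forms discussed above: the canonical ISO 8601 form with "T" and the ISO-style "Z" for UTC, a TS-like digit string truncated to a chosen precision, and the ISO 8601 ordinal and week dates. The function names and the precision table are illustrative assumptions, not HL7 definitions. Note that Python's `datetime` ignores leap seconds.

```python
from datetime import datetime, timedelta, timezone

# Leading digits of "YYYYMMDDhhmmss" kept at each precision (illustrative only).
TS_PRECISION = {"year": 4, "month": 6, "day": 8, "hour": 10, "minute": 12, "second": 14}


def from_epoch_seconds(seconds: int) -> datetime:
    """Epoch/duration form (Unix epoch 1970-01-01T00:00:00Z) to a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def iso8601_utc(t: datetime) -> str:
    """Canonical ISO 8601 form, with the 'T' separator and 'Z' meaning UTC."""
    return t.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def ts_like(t: datetime, precision: str) -> str:
    """TS-style digit string that drops the less significant components."""
    digits = t.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return digits[:TS_PRECISION[precision]]


def ordinal_date(t: datetime) -> str:
    """ISO 8601 ordinal date YYYY-DDD (day of the year)."""
    return t.astimezone(timezone.utc).strftime("%Y-%j")


def week_date(t: datetime) -> str:
    """ISO 8601 week date YYYY-Www-D (ISO week-numbering year; 1 = Monday)."""
    year, week, weekday = t.astimezone(timezone.utc).isocalendar()
    return f"{year}-W{week:02d}-{weekday}"


t = from_epoch_seconds(922113000)
print(iso8601_utc(t))          # 1999-03-22T14:30:00Z
print(ts_like(t, "day"))       # 19990322
print(ordinal_date(t))         # 1999-081
print(week_date(t))            # 1999-W12-1

# An offset of -05:00 is normalized to UTC before the "Z" is written.
local = datetime(1999, 3, 22, 9, 30, tzinfo=timezone(timedelta(hours=-5)))
print(iso8601_utc(local))      # 1999-03-22T14:30:00Z
```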


4.6.2 Time Durations

Some recently developed type systems define a special data type for durations (for instance, the one developed by M. Stonebraker for the POSTGRES object-relational database project). The Arden Syntax also has such a concept. In this v3 data type model, however, time durations are but a special case of a physical quantity. Durations of time are nothing other than measurements in the dimension of time. Thus durations have units such as 1 s, 1 min, 1 hr, 1 d, 1 wk, 1 mo, 1 a, etc.

4.6.3 Other Issues and Curiosities about Time

"I got sick on my birthday, about 20 years ago," is an expression that we might want to capture. One possible representation for this time would be "yyyy0219", if my birthday is February 19th and if yyyy is constrained such that this year minus yyyy is approximately 20 years. If from another source we gather that I got sick in "1976", but don't know the exact month and day, then we can conclude that I got sick on "19760219", because 1998 - 1976 = 22. This seems a somewhat rare use case, but it is definitely worth considering.

"I got that cough in spring" might lead us to adjust probabilities for pollen allergy. The season of the year is of interest in epidemiology. Bob Dolin, in his JAMIA article on modeling the temporal complexities of symptoms, suggests accounting for "season" in time expressions. The difficulty here is that seasons depend on geographical latitude, so we cannot infer the season from the month of the year. January is summer in Australia, South Africa, Chile, and Argentina, while northern folks assume that January is the worst part of the winter. Moreover, at the equator there are not the usual four seasons; in tropical regions there is the monsoon season, which may be considered one of two seasons, or a fifth season.
I propose to defer season as part of a point in time expression until its use and implications become clearer.

Noteworthy references on time expressions are CEN TC251's ENV 12381, Health care informatics; time standards for health care specific problems, and the Arden Syntax. These two standards define relations and operators not only on time values but also on events and episodes which are related in time.

Relative times with the semantics NOW + duration offset stand out as the most prominent feature defined by those and other time-related standards. We might thus consider the ability to specify relative time. Some conventions use expressions like "t-1" to mean "yesterday". Relative time expressions are of the data type point in time, but their exact value depends on a parameter (the actual time) specified elsewhere.
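Treating a duration as an ordinary physical quantity in the dimension of time, a relative time expression such as "t-1" becomes NOW plus a (negative) duration. The Python sketch below is a hypothetical illustration: the factors for "mo" and "a" are assumed averages (the Julian year of 365.25 days, and a month of one twelfth of that, i.e. 30.4375 days), and both the `resolve_relative` helper and its "t±n" syntax with n counted in days are assumptions, not taken from any of the standards cited above.

```python
import re
from datetime import datetime, timedelta, timezone

# Seconds per duration unit; "mo" and "a" are averages, not calendar months/years.
SECONDS_PER_UNIT = {
    "s": 1,
    "min": 60,
    "hr": 3600,
    "d": 86400,
    "wk": 7 * 86400,
    "a": 365.25 * 86400,        # Julian year
    "mo": 365.25 * 86400 / 12,  # average Julian month: 30.4375 d
}


def duration(value: float, unit: str) -> timedelta:
    """A duration is just a (value, unit) physical quantity in the time dimension."""
    return timedelta(seconds=value * SECONDS_PER_UNIT[unit])


def resolve_relative(expr: str, now: datetime) -> datetime:
    """Resolve a hypothetical relative expression "t-1" / "t+2" (days) against NOW."""
    match = re.fullmatch(r"t([+-]\d+)", expr)
    if match is None:
        raise ValueError(f"not a relative time expression: {expr!r}")
    return now + duration(int(match.group(1)), "d")


now = datetime(1999, 3, 22, 12, 0, tzinfo=timezone.utc)
print(resolve_relative("t-1", now))         # 1999-03-21 12:00:00+00:00
print(duration(1, "mo") / duration(1, "d"))  # 30.4375
```

The value of "t-1" is fixed only once `now` is supplied, which mirrors the point above that relative times depend on a parameter given elsewhere.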


4.6.4 Calendar Modulus Expressions

A modulus is the remainder of an integer division. For example, 12 modulo 7 is 5. If time is defined as epoch + duration in days, we can tell the day of the week of any date if we know the day of the week of the epoch. For instance, let our epoch be January 1, 1582, the year in which the Gregorian calendar was introduced; that day was a Monday in the Julian calendar then in force. We can easily tell the weekday of January 31, 1582: the offset from the epoch is 30 days. A week has seven days, and 30 modulo 7 is 2. Monday plus two days is Wednesday. In the same way we can tell that the date epoch + 151840 days (in late 1997) is a Thursday.

Other such modulus expressions exist in calendars, all of which have the form:

$\mathit{unit}_1$ of the $\mathit{unit}_2$

day of the week
month of the year
day of the month
week of the year
day of the year
hour of the day
minute of the hour
second of the minute

Obviously, $\mathit{unit}_1$ must be smaller than $\mathit{unit}_2$. All those units are defined by the calendar and may differ slightly from related units defined for time durations. For instance, the average Julian month is 30.4375 days, but a calendar month varies between 28 and 31 days. Thus the modulo expression "month of the year" must be made available by the calendar and cannot easily be calculated using the average month.

How do we express complex modulo expressions that are not provided by the calendar? Things like "every other Tuesday" come to mind. With date denoting the number of days since the epoch, we could tell whether or not a certain date is an "every other Tuesday" by testing the equation:

$\mathit{date} \bmod (2 \times 7) = 1$, given that 0 = Monday, 1 = Tuesday, ...

while every Tuesday would be:

$\mathit{date} \bmod 7 = 1$, given that 0 = Monday, 1 = Tuesday, ...

We decided to ponder the calendar modulo expressions for some time before coming back to them.
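The modulo arithmetic above can be checked with a few lines of Python. This is a sketch, not a proposed representation: it counts days from the epoch used in the text, January 1, 1582 (Julian), which Python's proleptic Gregorian `date` type represents as 1582-01-11. The function names and the `phase` parameter are illustrative.

```python
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Epoch from the text: January 1, 1582 in the Julian calendar, a Monday.
# Python's date uses the proleptic Gregorian calendar, where that day is 1582-01-11.
EPOCH = date(1582, 1, 11)


def days_since_epoch(d: date) -> int:
    return (d - EPOCH).days


def weekday_name(offset: int) -> str:
    """Weekday of epoch + offset days; index 0 is Monday because the epoch is one."""
    return WEEKDAYS[offset % 7]


def is_tuesday(offset: int) -> bool:
    return offset % 7 == 1


def is_every_other_tuesday(offset: int, phase: int = 1) -> bool:
    """offset mod 14 == 1 selects one series of alternating Tuesdays; phase 8 the other."""
    return offset % (2 * 7) == phase


print(weekday_name(30))                  # Wednesday (January 31, 1582, Julian)
print(weekday_name(151840))              # Thursday
print(EPOCH + timedelta(days=151840))    # 1997-10-02
```

Note that "every other Tuesday" is not fully determined by the modulus alone: the choice of phase (1 or 8) fixes which of the two alternating series is meant.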


5 Orthogonal Issues

There are variations of meaning that can apply to many different data types. Such variations include forming ranges, adding comments, specifying a validity period or a history of some data element, and, of course, specifying uncertainty about some information. Rather than define specific ways for every data type to express such semantic variations, this type system uses generic types [p. 12] combined with implicit type conversion [p. 22] to achieve an effect similar to the one used in HL7 2.x to modify existing data types.

HL7 2.x used to append new optional components at the end that served as modifiers of the meaning of the prior components. Thus the same message element instance could conform to more than one type: the base type and the extended type.

In a strong type system we can achieve the same effect through generic types [p. 12] combined with implicit type conversion [p. 22]. This method virtually "overlays" extended types on top of the base types.

5.1 Interval


Interval

Generic data type that can express ranges or intervals of values. An interval is a set of consecutive values of any totally ordered data type. An interval is thus a continuous subset of its base data type.

GENERIC TYPE

parameter name   allowed types   description
T                OrderedType     Any ordered type can be the basis of an interval. It does not matter whether the base type is discrete or continuous or whether any algebraic operators are defined for that type.

component name   type/domain   optionality               description
low              T             optional                  The lower boundary.
low closed       Boolean       required; default false   Indicates whether the interval is closed or open at the lower boundary. For a boundary to be closed, a finite boundary must be provided, i.e. unspecified or infinite boundaries are always open.
high             T             optional                  The upper boundary.
high closed      Boolean       required; default false   Indicates whether the interval is closed or open at the upper boundary. For a boundary to be closed, a finite boundary must be provided, i.e. unspecified or infinite boundaries are always open.

Ranges or intervals of values are most abundant as ranges of absolute time, used for ordering and scheduling. Note that an interval is not to be used to specify confidence intervals for uncertain values.

We use the terms "range" and "interval" interchangeably as synonyms. Webster's dictionary defines:

range
1 a (1): a series of things in a line, [...]

interval
1 a: a space of time between events or states [...]
3: a set of real numbers between two numbers either including or excluding one or both of them


Thus, in common language, interval and range are not quite synonyms. A range is the ordered "line of things", while the common notion of an interval is the gap between two things. However, in mathematics "interval" is used for things being aligned in a set.

People normally use ranges for three different purposes that can be intuitively described as:

1. a set of values, where each value may apply under some circumstances (e.g. an order scheduled to begin at 3:15 and end at 4 o'clock);
2. one single unknown value supposed to lie within the range of values given (e.g. a measurement which turns out to be below the lower absolute limit and therefore can be reported only as a range with an upper boundary);
3. one single value whose set of possible values is partitioned into equivalence classes because the exact differences are not interesting or not measurable (e.g. in microbiologic susceptibility testing, we may have a parameter "OXACILLIN SUSC" where only the following equivalence classes are of interest: > 8.0 µg/ml (not susceptible); 4.0±2 µg/ml (limited susceptibility); and < 2.0 µg/ml (susceptible)).

The interval data type shall primarily be used when the entire set of values is meant, not just one value from that set. Notably, if the motivation for considering an interval is uncertainty, then the interval is the wrong choice. For uncertainty or inaccuracy, one of the data types for uncertainty [p. 155] must be used instead. Thus in the above list, only item 1 is definitely a use case for intervals.

Intervals can be open or closed at either side:

[n, m] is a closed interval. A value x is an element of the interval if and only if x is greater than or equal to n and less than or equal to m. That is, the boundaries are included in the interval.

]n, m[ is an open interval. A value x is an element of the interval if and only if x is greater than n and less than m.
That is, the boundaries n and m are not included in the interval.

Obviously, an interval can be closed on one side while open on the other side.

Intervals can have finite or infinite boundaries on either side, if the base type contains the notion of infinity. Note that an interval with two infinite boundaries is equivalent to the entire range of an infinite base type.

One boundary of an interval may be unknown. For example, the expression "< 3" is an interval with an unknown lower boundary and an open finite upper boundary. An interval must not have both boundaries unknown.

An interval can only be closed at a finite boundary. That is, if a boundary is infinite or unknown, the interval cannot be closed at that boundary.
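The boundary rules above can be captured in a small generic type. The following Python sketch is one possible reading, not a normative binding: `None` stands for an unknown boundary, `math.inf` and `-math.inf` for infinite ones (for numeric base types), and the constructor rejects the combinations the text forbids (both boundaries unknown, or a closed boundary that is not finite). Returning `None` from `contains` when the answer hinges on an unknown boundary is a design assumption of this sketch.

```python
import math
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


def _is_finite(bound) -> bool:
    """Known and not an infinity (infinities modeled as math.inf / -math.inf)."""
    return bound is not None and bound not in (math.inf, -math.inf)


@dataclass(frozen=True)
class Interval(Generic[T]):
    low: Optional[T] = None        # None means "unknown"
    high: Optional[T] = None
    low_closed: bool = False       # default false, as in the component table
    high_closed: bool = False

    def __post_init__(self) -> None:
        if self.low is None and self.high is None:
            raise ValueError("an interval must not have both boundaries unknown")
        if ((self.low_closed and not _is_finite(self.low))
                or (self.high_closed and not _is_finite(self.high))):
            raise ValueError("an interval can only be closed at a finite boundary")

    def contains(self, x: T) -> Optional[bool]:
        """True/False when decidable; None when it hinges on an unknown boundary."""
        if self.low is None:
            above_low = None
        else:
            above_low = x >= self.low if self.low_closed else x > self.low
        if self.high is None:
            below_high = None
        else:
            below_high = x <= self.high if self.high_closed else x < self.high
        if above_low is False or below_high is False:
            return False
        if above_low is None or below_high is None:
            return None
        return True


closed = Interval(low=2, high=5, low_closed=True, high_closed=True)   # [2; 5]
open_ = Interval(low=2, high=5)                                       # ]2; 5[
less_than_3 = Interval(high=3)                                        # "< 3"
print(closed.contains(5), open_.contains(5), less_than_3.contains(4))  # True False False
print(less_than_3.contains(1))   # None: the lower boundary is unknown
```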


Although we do distinguish between surface form and semantic components with intervals, as with any other data type, we specify a character string literal form for interval expressions that is tuned toward intuitiveness and is recommended for use in character-based encoding rules. Here is a mapping between surface forms (string literals) and the uniform interval form:

literal   interval form                        instance notation
≥ n       [n; unk[  (high open)                (Interval :low n :lowClosed #true)
< n       ]unk; n[  (low open and high open)   (Interval :high n)
> n       ]n; unk[  (low open and high open)   (Interval :low n)
= n       [n; n]                               (Interval :low n :lowClosed #true :high n :highClosed #true)
[n,m]     [n; m]                               (Interval :low n :lowClosed #true :high m :highClosed #true)

As always, various constraints can be placed on data types, i.e., the components of the interval data structure can be constrained to certain allowable values or combinations of values only. As a notable special case, one could constrain intervals such that every allowable value has an unknown (or infinite) boundary on one side.

5.2 General Annotations

HL7 v2.x made abundant use of the NTE segment for notes and comments. Up until now, there has been no such construct for HL7 version 3. The NTE segment was a very useful construct for communicating information that cannot be communicated otherwise. NTE segments usually contain free text, meant to be shown to human users. The v2 NTE segments had the disadvantage that they could occur only at certain places in the message. A comment in an NTE segment was scoped to parts of the message structure; however, the scope could not be narrowed down to the level of a single data element or component.


The following generic type for annotations can be overlaid over a value of any other data type. An implicit conversion rule exists that will convert any annotated T to a T at the receiver side.

Annotated Information (GENERIC TYPE)
Generic data type to allow arbitrary free-text annotations for any message element instance.

  parameter   allowed types   description
  T           ANY             Any message element type can be annotated.

  component   type/domain         optionality   description
  value       T                   required      The information itself.
  note        Free Text [p. 48]   required      The annotation as free text to be eventually displayed to a user or administrator.

Note that this annotated information data type, as a Message Element Type (MET), could be used to annotate any Message Element Instance (MEI), regardless of whether that MEI was derived from a RIM class, a RIM attribute, or any component of a data type. Thus this annotated information generic type is enough to carry the NTE feature of version 2 over to version 3.

Annotations are primarily used to eventually display the annotation to human users. For instance, a lab value might be sent annotated, in which case the medical record user interface program might show a little marker in the respective cell of the flowsheet. When the user clicks on that marker, a text box pops up that displays the free-text annotation.

However, annotations in version 2 NTEs were sometimes used like codes. This happens for three different reasons:
1. instead of fixed canned notes and comments, only a single symbol is sent, as an abbreviation for the whole comment;
2. people want to save bandwidth by "compressing" longer comments into abbreviations; or
3. the notes and comments are meant to be interpreted by computers instead of humans.

To use free abbreviations or codes in NTE segments is a problematic habit, though.
First of all, it is hardly interoperable, because one will hardly find any standard for notes and comments codes. Indeed, if there were any such standard, then the use case of those codes would be so well established that it would warrant better means than just annotations. Such codes usually translate into No Information flavors, or attributes of specific classes.
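The Annotated Information type and its implicit conversion rule can be sketched in Python as a generic wrapper. The names `AnnotatedInfo` and `strip_annotations` are illustrative assumptions, and the note is a plain string because the specification types it as Free Text. Since T may be ANY, the sketch also allows an annotated value to be annotated again and unwraps all layers.

```python
from dataclasses import dataclass
from typing import Any, Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class AnnotatedInfo(Generic[T]):
    """Any value of type T, overlaid with a free-text note meant for human display."""
    value: T
    note: str


def strip_annotations(x: Any) -> Any:
    """Implicit conversion at the receiver: an annotated T becomes a plain T.
    Loops so that an annotation of an already annotated value is unwrapped too."""
    while isinstance(x, AnnotatedInfo):
        x = x.value
    return x
```

A receiver that does not display notes simply calls `strip_annotations` and works with the bare value, which is the behavior the implicit conversion rule describes.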


There are clearly some "use cases" we deliberately will not support. There is no need for ad hoc "compression" of data using such abbreviations. The problems and overhead that such abbreviations put on the message processing side by far outweigh the minor saving of bandwidth. Also, we do not want to support the use case of lazy message creation. Indeed, many coded annotations fall in the category of lazy message creation, where the data could have been sent in appropriate message fields.

If there are use cases for coded annotations that are not supported by the RIM or the data type model, those should be fed back into the HL7 development process. Codes that people used to send in NTEs should be systematized and used to improve the HL7 version 3 data models and messages.

It might have been reasonable in v2.x to use those coded NTE segments for this purpose; in v3 we definitely want to use the available standardized information structure. If any significant amount of real existing annotation could not be accommodated in RIM data elements, we should drive a use case analysis from there, suggesting improvements to the RIM.

Disclaimer: we will get back to this as an open issue.

5.3 The Historical Dimension

In recent years HL7 has experienced a need for data elements to be communicated with a history. For example, the National Immunization Program (CDC, State Departments of Health) needed to communicate historic address information. Other examples for history are "effective date" and "end date" of identifiers or other data. The traditional approach to this problem was to extend a preexisting data type T or to create a new data type X-T. Using generic types as described above, we no longer need to take care of history information for every existing type.
Instead we can define the following set of generic types:

5.3.1 Generic Data Type for Information History


History (GENERIC TYPE)
Generic data type to give the history of some information. This is an ordered list of data of the same type along with the time interval giving the time the information was (or is) valid. The order of history items in the list should be backwards in time. The history information is not limited to the past history; expected future values can also appear.

  parameter   allowed types   description
  T           ANY             Any data type can be used here.

  definition: ORDERED LIST OF History Item [p. 155]

5.3.2 Generic Data Type "History Item"

History Item (GENERIC TYPE)
Generic data type to give the time range in which some information was, is, or is expected to be valid.

  parameter   allowed types   description
  T           ANY             Any data type can be used here.

  component         type/domain         optionality   description
  value             T                   required      The information itself.
  validity period   Interval [p. 149]   required      The time interval the given information was, is, or is expected to be valid. The interval can be open or closed, infinite or undefined on either side.

When no validity period is known, it does not make sense to send a history item for the information; therefore, both components are required. However, an interval can be defined open and undefined or infinite on both sides. This should not be done except where infinite or undefined validity periods are semantically justified.
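A minimal Python sketch of History and History Item, assuming calendar dates as the time base. It simplifies the validity period to a half-open range [valid_from; valid_to[ in which `None` stands for an unbounded side, so unlike the full Interval type it does not distinguish unknown from infinite. The helper `value_at` is hypothetical and shows why the backwards-in-time ordering is convenient: the first matching item is the most recent one.

```python
from dataclasses import dataclass
from datetime import date
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class HistoryItem(Generic[T]):
    value: T
    # Simplified validity period [valid_from; valid_to[ ; None means unbounded on that side.
    valid_from: Optional[date]
    valid_to: Optional[date]


def value_at(history: List[HistoryItem[T]], when: date) -> Optional[T]:
    """Return the value valid at `when`, or None if no item covers that date.
    The list is ordered backwards in time, so the most recent matching item wins."""
    for item in history:
        starts_before = item.valid_from is None or item.valid_from <= when
        ends_after = item.valid_to is None or when < item.valid_to
        if starts_before and ends_after:
            return item.value
    return None
```

With a made-up address history such as `[HistoryItem("address B", date(1997, 1, 1), None), HistoryItem("address A", date(1990, 6, 1), date(1997, 1, 1))]`, a query for mid-1995 returns "address A" and one for 1998 returns "address B".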


5.4 Uncertainty of Information

Uncertainty may exist for all kinds of information. Information is the selection of a signal (value) from a set of possible signals (values). Uncertain information is the selection of several values from a set of possible values, where we assign to every value a probability (i.e., the belief that the given information applies). We may distinguish four cases:
1. There are only two possible values, where one is the negation of the other (Boolean). In that case we need to specify a probability p for only one value (preferably the value meaning "true"). The probability of the other value is then 1 - p.
2. The set of possible values may have no total order. In that case we have to send pairs of <value, probability>.
3. The set of possible values may have a total order but is discrete. In that case, we can send <value, probability> pairs too. In addition, however, there is a mapping of the set to the set of natural numbers, and we can specify a discrete probability distribution (e.g., binomial, geometric, Poisson) and the necessary parameters of those distributions.
4. The set of possible values may have a total order but is continuous. In that case, we cannot send <value, probability> pairs. But we can select a continuous probability distribution (e.g., normal, uniform, gamma, chi-square) and its necessary parameters.

The following are examples of where uncertainty appears in the language of medical practice:
- A pathologist says: "There is a 30% probability that this lesion is malignant."
- A pathologist says: "This lesion is malignant." A medical record system may find out from case-based reasoning (experience) that if pathologist A discovered malignancy, he was right in 80% of the cases, whereas if pathologist B makes the same statement, he was right in only 70% of the cases.
- A pathologist says: "This lesion is probably malignant." Again from experience, a system can say that if the word "probably" was used, the chance of malignancy is 40% (whereas if this pathologist had said "could be", the chance would have been only 10%).

One might conclude that one needs to distinguish whether a probability was issued by the "user" or by a system that keeps track of experiences with the pathologist's judgment. One might further conclude that an expression of an uncertain discrete value (e.g., malignancy) should include both a coded qualifier of confidence and a numeric probability, where each may be assessed by different entities.

The seemingly important distinction between "user assessed" probability and "system assessed" probability suggests that every uncertain information item may be associated with many uncertainty qualifiers, each in the eye of another entity. Indeed, some piece of information may be believed at a different level of confidence by different people. Bayesian probabilities are subjective, and thus any probability is valid only in the context of the one who issued the probability.

Uncertainty assessments (probabilities) are subjective. Thus they depend on who states them. For example, if I am 70% sure that what I see in the microscope are malignant cells, I express my view as such. If some experienced pathologist says that the probability of malignancy is 70%, she expresses her view as such. Any receiver of that information must draw his own conclusions based on his trust in my or the pathologist's judgment.

Practically, a receiver might apply a penalty factor of 0.5 to what I say, whereas the pathologist's views would be trusted at a level of 0.95. Thus from my statement, the receiver may infer a probability of 0.5 × 70% = 35% for malignancy, while the pathologist's statement may be transformed to 0.95 × 70% ≈ 67%. If the receiver has both of our statements, he may want to apply a noisy-or and infer his probability as 1 - (1 - 35%)(1 - 67%) ≈ 79%.

The bottom line is: the newly created value-probability pair would be part of a new observation assessed by the receiver of both my and the pathologist's statements, penalized and combined by the receiver. The receiver drafted his judgment about the case from information received from others, but he has drawn his own conclusions and is responsible for them. This shows that there is no single correct probability that would "objectively" qualify any given statement.

When this newly drafted value-probability pair is communicated further along to someone else, the sender may or may not quote both of his input statements plus his own conclusion. In any case, the receiver of that information would again penalize and combine what he has got based on his trust in the judgment of the originators of the incoming statements.

It generally doesn't matter whether a probability was issued by a human "user" or by any kind of decision support "system".
The same rules apply: the probability is subjective, and the receiver has a responsibility to weigh the uncertain information he received. Knowing the originator of the uncertain statement is essential (as it is always essential to know who said what), but knowing just the category "user" vs. "system" does not help.

A data type for uncertain information should, however, not include implied associations between RIM classes to suit the need for attributions of probabilities. Thus, one uncertain value should not be attributed to some Healthcare_provider instance of the RIM. For example, we should not build a data type composed of the triple <value, probability, originator>, where originator would be a foreign key to some Stakeholder or Healthcare_provider. Rather, the uncertain value would be included in a RIM class instance, where the attribution or responsibility of the statement is clear from the context of the RIM class.

It is true that any instance of uncertain information must be attributed to an originating entity (doctor or decision support system), just like "supposedly certain" information must be attributed. But attribution of information is outside the scope of this data type model, since attribution is modeled properly in the RIM.
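The penalize-and-combine reasoning of the worked example in this section (my 70% penalized by 0.5, the pathologist's 70% by 0.95, both combined by noisy-or) can be sketched in Python as follows. Treating trust as a simple multiplicative factor and combining with noisy-or are just the choices made in that example, not rules of this specification; the function names are illustrative.

```python
from math import prod


def penalize(p: float, trust: float) -> float:
    """Scale a received probability by the receiver's trust in its originator."""
    return p * trust


def noisy_or(probabilities) -> float:
    """Combine probabilities for the same event: 1 - (1 - p1)(1 - p2)..."""
    return 1.0 - prod(1.0 - p for p in probabilities)


mine = penalize(0.70, 0.5)           # 0.35
pathologist = penalize(0.70, 0.95)   # 0.665, which the example rounds to 67%

print(f"{noisy_or([mine, pathologist]):.1%}")  # unrounded inputs: 78.2%
print(f"{noisy_or([0.35, 0.67]):.0%}")         # rounded inputs, as in the text: 79%
```

Note that the 79% figure depends on rounding the intermediate 66.5% up to 67%; with unrounded inputs the combination is 1 - 0.65 × 0.335 = 0.78225, about 78.2%.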


5.4.1 Uncertain Discrete Values

Discrete values can be assigned a single probability number to express the level of confidence that the given information applies.

Uncertain Discrete Value using Probabilities (UDV-P) (GENERIC TYPE)
Generic data type to specify one uncertain value as a pair of <value, probability>.

  parameter   allowed types   description
  T           DiscreteType    Any data type that is discrete can be used.

  component     type/domain                                  optionality   description
  value         T                                            required      The value to which a probability is assigned.
  probability   Floating Point Number [p. 134], 0.0 to 1.0   required      The probability assigned to the value.

Many people are reluctant to use probabilities to express their subjective belief, because they think that such a probability is not "exact" enough, or that a probability would have to be derived somehow experimentally. While this is true in the "frequentist" sense, frequentist probabilities never hold for individual cases, only on average in a population.

Bayesian probabilities, on the other hand, do not have to be "exact"; in particular, one does not need to carry out a series of experiments (samples) in order to specify a probability. Probabilities are always estimated (frequentist probabilities are estimated as well). Bayesian probability theory equates the notion of "probability" with "belief". The probability is thus an assessment of the subjective belief of the originator of a statement. Some subjective numeric probability is often better than a mere indicator that a value is "estimated".

Probabilities are always subjective. Just like any other information, uncertain information needs to be seen in the context of who gave that information (attribution).
A recipient updates his knowledge about a case from the received uncertain information based on how much confidence he has in the judgment of the originator of the information.

Both elements in the value-probability pair are part of the statement made by one specific originator. Along a chain of communication, one value may be reported by different entities and assigned a different probability by each of them.
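A minimal Python sketch of the UDV-P pair that enforces the 0.0 to 1.0 domain of the probability component. In line with the argument above, it deliberately has no originator component. The class name `UDVP` and the helper `negate`, which covers the Boolean case where the other value gets 1 - p, are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class UDVP(Generic[T]):
    """<value, probability>; no originator component, since attribution lives in the RIM."""
    value: T
    probability: float

    def __post_init__(self) -> None:
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must be in the range 0.0 to 1.0")


def negate(u: UDVP[bool]) -> UDVP[bool]:
    """Boolean case: the probability of the other value is 1 - p."""
    return UDVP(not u.value, 1.0 - u.probability)
```

For example, `negate(UDVP(True, 0.3))` yields a pair with value `False` and probability 0.7.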


This data type does not allow specific attributions to originators of the information. The rules of attribution are the same whether information is given as uncertain or certain/precise. In particular, if information is given in an instance of a RIM Service_event class, the attribution is provided by the Stakeholder designated as the active participation of type "originator of the information". For "slotted" data elements (PAFM), implicit attribution defaults to the sending system.

5.4.2 Non-Parametric Probability Distribution

If the domain of a discrete value contains more than two elements, one might want to specify probabilities for more than one element. This can be done using a non-parametric probability distribution. A non-parametric probability distribution is a collection of alternative value-probability pairs.

Non-Parametric Probability Distribution (GENERIC TYPE)
Generic data type to specify an uncertain discrete value as a set of <value, probability> pairs (uncertain discrete values [p. 157]). The values are considered alternatives and are rated with probabilities for each of the values to apply. Those values that are in the set of possible alternative values but not mentioned in the non-parametric probability distribution data structure will have the rest probability distributed equally over all unmentioned values. That way the base data type can even be infinite (with the unmentioned values being just neglected).

  parameter   allowed types   description
  T           Discrete        Any data type that is discrete can be used. Usually we would use non-parametric probability distributions for unordered types only, and only if we assign probabilities to a "small" set of possible values. For other cases one may prefer parametric probability distributions.

  definition: SET OF Uncertain Discrete Value using Probabilities [p. 157]

Type cast rules allow conversion from an uncertain discrete value using probabilities to a non-parametric probability distribution and vice versa.

The values in a discrete probability distribution are generally considered alternatives. It is understood that only one of the possible alternative values may truly apply. Because we may not know which value it is, we may state probabilities for multiple values. This does not mean that the values would in some way be "mixed". However, when Rough Sets theory or Fuzzy Logic is used as the underlying theory of uncertainty, the difference between "alternative" and "mixed" values becomes blurred. Friedman and Halpern (1995) have shown that all of those theories of uncertainty (probability, rough sets, fuzzy logic, Dempster-Shafer) can be subsumed under a theory of "plausibility". This theory of plausibility would of course be open as to whether or not a distribution is considered over alternative values as opposed to a mixture of the values.

However, probability is the most widely understood and deployed theory (although fuzzy logic decision support systems are used in clinical medicine). If some value should be represented as a "mixture" of a set of categorical values, other means should be investigated before resorting to "plausibility" theory. For instance, suppose we have to decide about a color in the code system "red, orange, yellow, green, blue, purple". Probabilistically, all those values would be alternatives, and thus a given color may be stated as "orange with a probability of 60%", but the alternatives red and yellow are also considered, with probabilities of 20% and 15% respectively. More naturally, we would like to "mix" the colors, saying that the color we see is 60% orange, 20% red, 15% yellow and 5% green. We could use fuzzy logic to do that, but an easier to understand approach would be to use a more appropriate color model than the list of discrete codes. A more appropriate color model would, for instance, be the RGB system, where every color is represented as a mixture of the three base colors red, green and blue (or magenta, yellow, and cyan in subtractive color mixing).

An example of a discrete probability distribution would be a differential diagnosis as a result of a decision support system.
For instance, for a patient with chest discomfort, it might find the following probability distribution:

  (NonParametricProbabilityDistribution
    (SET :of UDV-P
      (UDV-P :value "myocardial infarction" :probability 0.4)
      (UDV-P :value "intercostal pain, unsp." :probability 0.3)
      (UDV-P :value "ulcus ventriculi sive duodeni" :probability 0.1)
      (UDV-P :value "pleuritis sicca" :probability 0.1)))
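The differential diagnosis above can be used to exercise the rest-probability rule of the non-parametric probability distribution. The sketch below represents the SET OF UDV-P as a Python dict from value to probability, which is an assumption of this sketch that also rules out listing the same value twice. `domain_size=None` models an infinite base type, where unmentioned values are neglected. The cast `from_udvp` covers only the UDV-P to distribution direction; all function names are illustrative.

```python
from typing import Dict, Hashable, Optional


def rest_probability(dist: Dict[Hashable, float]) -> float:
    """Probability mass not assigned to any value mentioned in the distribution."""
    if any(p < 0.0 for p in dist.values()) or sum(dist.values()) > 1.0 + 1e-9:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    return max(0.0, 1.0 - sum(dist.values()))


def probability_of(dist: Dict[Hashable, float], value: Hashable,
                   domain_size: Optional[int] = None) -> float:
    """Explicit probability if mentioned; otherwise an equal share of the rest probability.
    domain_size=None stands for an infinite base type: unmentioned values are neglected."""
    if value in dist:
        return dist[value]
    if domain_size is None:
        return 0.0
    unmentioned = domain_size - len(dist)
    return rest_probability(dist) / unmentioned if unmentioned > 0 else 0.0


def from_udvp(value: Hashable, probability: float) -> Dict[Hashable, float]:
    """Cast a single <value, probability> pair to a one-element distribution."""
    return {value: probability}


differential = {
    "myocardial infarction": 0.4,
    "intercostal pain, unsp.": 0.3,
    "ulcus ventriculi sive duodeni": 0.1,
    "pleuritis sicca": 0.1,
}
```

For this distribution the rest probability is 1 - 0.9 = 0.1. In a hypothetical code system with six diagnoses, each of the two unmentioned ones would receive 0.05.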


This is a very compact representation of information that could (and in general should) be communicated separately using Clinical_observation or Health_issue class instances (or OBX segments in v2.3). However, there are advantages to using the data type for non-parametric probability distribution:
- it is much more compact;
- it is immediately clear that the stated values are alternatives assessed by one originator of the observation;
- it is clearly specified from the definition of the data type that there is a rest probability of 0.1 (1 - 0.9, that is, 10%) that is not assigned to any of the listed diagnoses.
Those facts would be hard to discover from a bunch of Health_issue class instances.

The Health_issue class instances could in some way be linked together to express the same distribution. This would be the method of choice if one wishes to track down more precisely how the alternative differential diagnoses have been confirmed or otherwise clinically addressed. For the purpose of patient care, the expanded set of Health_issue instances would clearly be more useful. However, as an excerpt summary of a decision support process, the short form is useful too.

5.4.3 Parametric Probability Distribution

For continuous values it is not possible to assign a probability to every single value. One can assign a probability to an interval of consecutive values (confidence interval); such a confidence interval can, however, be calculated from a continuous probability distribution.

The data type for continuous probability distributions allows choosing from a large menu of distribution types commonly used in statistics. Every distribution type has specific parameters. However, for compatibility with systems that do not understand a particular distribution type, the mean and standard deviation must always be given.


Parametric Probability Distribution (GENERIC TYPE)
Generic data type to specify an uncertain value of an ordered data type using a parametric method. That is, a distribution function and its parameters are specified. Aside from the specific parameters of the distribution, a mean and standard deviation are always specified to help maintain interoperability if receiving applications cannot deal with a certain probability distribution.

The base data type may be discrete or continuous. Discrete ordered types are mapped to natural numbers by setting their "smallest" possible value to 1, the second to 2, and so on. The order of non-numeric types must be unambiguously defined.

  parameter   allowed types   description
  T           OrderedType     Any ordered type (anything that is unambiguously mapped to numbers) can be the basis of an uncertain quantity. Examples are Integer Number [p. 133], Floating Point Number [p. 134], and PhysicalQuantity [p. 138].

  component            type/domain           optionality   description
  mean                 T                     required      The mean (expected value or first moment) of the probability distribution. The mean is used to standardize the data for computing the distribution. The mean is also what a receiver is most interested in. Applications that cannot deal with distributions can still get the idea about the described quantity by looking at its mean.
  standard deviation   dif(T)                required      The standard deviation (square root of the variance, i.e. of the second central moment) of the probability distribution. The standard deviation is used to standardize the data for computing the distribution. Applications that cannot deal with distributions can still get the idea about the confidence level by looking at the standard deviation.
  type                 Code Value [p. 116]   required      The type of probability distribution. Possible values are as shown in the attached table.
  parameters ...
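A minimal Python sketch of the parametric probability distribution structure and of the type cast to a simple value through the mean. The two type variables stand for T and dif(T). The distribution type is kept as a plain string, standing in for the Code Value, and the parameters as a name-to-number dict; both are assumptions of this sketch. The standard deviation is typed `Optional` only so that the "guess" usage described with the distribution table, which may come without a standard deviation, can be represented.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Generic, Optional, TypeVar

T = TypeVar("T")  # the ordered base type
D = TypeVar("D")  # dif(T), the type of a difference of two T values


@dataclass(frozen=True)
class ParametricDistribution(Generic[T, D]):
    mean: T
    standard_deviation: Optional[D]   # Optional only to admit a "guess" without dispersion
    distribution_type: str            # stand-in for the Code Value, e.g. "guess", "binomial"
    parameters: Dict[str, float] = field(default_factory=dict)


def as_simple_value(x: Any) -> Any:
    """Type cast for receivers that cannot handle distributions: keep just the mean."""
    return x.mean if isinstance(x, ParametricDistribution) else x


# "Age: 75±10 years" as a guess; by convention the standard deviation is 10 / 2.
age = ParametricDistribution(mean=75, standard_deviation=10 / 2, distribution_type="guess")
```

A receiver that only understands plain numbers would see `as_simple_value(age) == 75`, while a statistics-aware receiver can still read the standard deviation of 5 years.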


The number of parameters, their names and types depend on the selected distribution and are described in the attached table. This table will define component names to be used in the above data type definition.

Distribution types, their mean and parameters
(for each type: description and parameters; columns: symbol, name or meaning, type, constraint or comment)

guess
  Used to indicate that the mean is just a guess without any closer specification of its probability. This pseudo-distribution does not have any parameter aside from the expected value and standard deviation.
  E       mean
  V       variance

DISTRIBUTIONS OF DISCRETE RANDOM VARIABLES

binomial
  Used for n identical trials with each outcome being one of two possible values (called success or failure) with constant probability p of success. The described random variable is the number of successes observed during n trials.
  n       number of trials         Integer   n > 1
  p       probability of success   Float     p between 0 and 1
  E       mean                               E = n p
  V       variance                           V = n p (1 - p)

geometric
  Used for identical trials with each outcome being one of two possible values (called success or failure) with constant probability p of success. The described random variable is the number of trials until the first success is observed.
  p       probability of success   Float     p between 0 and 1
  E       mean                               E = 1 / p
  V       variance                           V = (1 - p) / p^2


negative binomial
  Used for identical trials with each outcome being one of two possible values (called success or failure) with constant probability p of success. The described random variable is the number of trials needed until the r-th success occurs.
  p       probability of success             Float     p between 0 and 1
  r       number of successes                Integer   r > 2
  E       mean                                         E = r / p
  V       variance                                     V = r (1 - p) / p^2

hypergeometric
  Used for a set of N items, where r items share a certain property P. The described random variable is the number of items with property P in a random sample of n items.
  N       the total number of items          Integer   N > 1
  r       number of items with property P    Integer   r > 1
  n       sample size                        Integer   n > 1
  E       mean                                         E = n r / N
  V       variance                                     V = n r (N - r) (N - n) / (N^3 - N^2)

Poisson
  Describes the number of events observed in one unit that occur at an average of lambda per unit. For example, the number of incidents of a certain disease observed in a period of time given the average incidence of E. The Poisson distribution only has one parameter, which is the mean. The standard deviation is the square root of the mean.
  E       mean
  V       variance                                     V = E

DISTRIBUTIONS OF CONTINUOUS RANDOM VARIABLES


uniform
  The uniform distribution assigns a constant probability density over a range of possible outcomes. No parameters besides mean E and standard deviation s are required. The width of the interval is sqrt(12 V) = 2 sqrt(3) s. Thus, the uniform distribution assigns probability densities f(x) > 0 for values E - sqrt(3) s <= x <= E + sqrt(3) s […]

gamma […]
  alpha                                    Float     alpha > 0
  beta                                     Float     beta > 0
  E       mean                                       E = alpha beta
  V       variance                                   V = alpha beta^2

chi-square
  Used to describe the sum of squares of independent standard normal random variables, which occurs when a variance (second central moment) is estimated (rather than presumed) from the sample. The chi-square distribution is a special type of gamma distribution with parameter beta = 2 and alpha = E / beta. The only parameter of the chi-square distribution is thus the mean, which must be a natural number, the so-called number of degrees of freedom (which is the number of independent parts in the sum).
  n       number of degrees of freedom     Integer   n > 0
  E       mean                                       E = n
  V       variance                                   V = 2 n


Student-t
  Used to describe the quotient of a standard normal random variable and the square root of a chi-square random variable divided by its number of degrees of freedom. The t-distribution has one parameter n, which is the number of degrees of freedom.
  n       number of degrees of freedom                  Integer   n > 0
  E       mean                                                    E = 0 (the mean of a standard normal random variable is always 0)
  V       variance                                                V = n / (n - 2)   (for n > 2)

F
  Used to describe the quotient of two chi-square random variables, each divided by its number of degrees of freedom. The F-distribution has two parameters n and m, which are the numbers of degrees of freedom of the numerator and denominator variable respectively.
  n       numerator's number of degrees of freedom      Integer   n > 0
  m       denominator's number of degrees of freedom    Integer   m > 0
  E       mean                                                    E = m / (m - 2)   (for m > 2)
  V       variance                                                V = 2 m^2 (m + n - 2) / (n (m - 2)^2 (m - 4))   (for m > 4)

logarithmic normal
  The logarithmic normal (log-normal) distribution is often used to transform a skewed random variable X into a normal form U = ln X. The log-normal distribution has the same parameters as the normal distribution.
  µ       mean of the resulting normal distribution                 Float
  sigma   standard deviation of the resulting normal distribution   Float
  E       mean of the original skewed distribution                  E = e^(µ + 0.5 sigma^2)
  V       variance of the original skewed distribution              V = e^(2µ + sigma^2) (e^(sigma^2) - 1)


beta
  The beta distribution is used for data that is bounded on both sides and may or may not be skewed. Two parameters are available to adjust the curve.
  alpha                      Float   alpha > 0
  beta                       Float   beta > 0
  E       mean               T       E = alpha / (alpha + beta)
  V       variance           T       V = alpha beta / ((alpha + beta)^2 (alpha + beta + 1))

The distribution type "guess" can be used in two different ways:
1. A value is known to be uncertain, but no information exists about the dispersion of the probability distribution. In this case, no standard deviation is provided.
2. A value is known to be uncertain and a dispersion is approximately known, but no information exists about the distribution type. For example, the common expression "Age: 75±10 years" would be mapped to a distribution type of guess with the standard deviation set to 5 years. This may seem to pretend a normal distribution, but it does not. Using 10/2 as the standard deviation is just a convention.

The mean component is mentioned explicitly. This component will be used in type casting a probability distribution over type T to a simple value of type T in a case where a receiving application cannot deal with, or is not interested in, probability distributions.

The literature on statistics commonly lists the mean as dependent on the parameters of the probability distributions (e.g., the mean of a binomial distribution with parameters n and p is n p). Because we choose to mention the mean (to help in roughly grasping the "value"), the parameters of the distributions may be defined in terms of the mean.

In the above table, the dependencies between the explicit components mean and standard deviation and the parameters of the distribution are not always resolved. If we want to give mean and standard deviation explicitly, there will often be redundancy in the parameters. However, it seems to be useful to let people specify parameters in the natural way rather than dependent on mean and standard deviation.
[needs revision] For example, in the table above, the uniform distribution was specified based on the mean and standard deviation components without further parameters. This does not mean that the standard deviation component contains the half-width of the uniform distribution; the half-width is sqrt(3) times the standard deviation.

If there is redundancy in the parameters, it is an error if the specified mean and standard deviation contradict what can also be derived from the distribution and its parameters.
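One way to apply the consistency rule stated above is sketched below in Python. A lookup table maps each distribution type to its mean and variance as functions of the natural parameters, using the formulas of the table. A check then rejects explicit mean and standard deviation values that contradict the derived ones. The dictionary keys, parameter spellings, relative tolerance and function names are assumptions for illustration. The uniform type is handled separately because it has no parameters beyond the mean and standard deviation.

```python
import math

# Mean and variance as functions of each distribution's natural parameters (from the table).
MOMENTS = {
    "binomial": lambda n, p: (n * p, n * p * (1 - p)),
    "geometric": lambda p: (1 / p, (1 - p) / p**2),
    "negative binomial": lambda p, r: (r / p, r * (1 - p) / p**2),
    "hypergeometric": lambda N, r, n: (n * r / N, n * r * (N - r) * (N - n) / (N**3 - N**2)),
    "Poisson": lambda E: (E, E),
    "gamma": lambda alpha, beta: (alpha * beta, alpha * beta**2),
    "chi-square": lambda n: (n, 2 * n),
    "Student-t": lambda n: (0.0, n / (n - 2)),  # variance defined for n > 2
    "F": lambda n, m: (m / (m - 2),
                       2 * m**2 * (m + n - 2) / (n * (m - 2)**2 * (m - 4))),  # m > 4
    "logarithmic normal": lambda mu, sigma: (
        math.exp(mu + 0.5 * sigma**2),
        math.exp(2 * mu + sigma**2) * (math.exp(sigma**2) - 1)),
    "beta": lambda alpha, beta: (alpha / (alpha + beta),
                                 alpha * beta / ((alpha + beta)**2 * (alpha + beta + 1))),
}


def uniform_bounds(mean: float, sd: float) -> tuple:
    """Uniform: width sqrt(12 V) = 2 sqrt(3) s, so the half-width is sqrt(3) s, not s."""
    half = math.sqrt(3) * sd
    return (mean - half, mean + half)


def check_consistency(dist_type: str, mean: float, sd: float,
                      rel_tol: float = 1e-6, **params: float) -> None:
    """Raise ValueError if the explicit mean/sd contradict what the parameters imply."""
    expected_mean, expected_var = MOMENTS[dist_type](**params)
    expected_sd = math.sqrt(expected_var)
    if not math.isclose(mean, expected_mean, rel_tol=rel_tol, abs_tol=1e-12):
        raise ValueError(f"mean {mean} contradicts derived mean {expected_mean}")
    if not math.isclose(sd, expected_sd, rel_tol=rel_tol, abs_tol=1e-12):
        raise ValueError(f"standard deviation {sd} contradicts derived {expected_sd}")
```

For instance, a binomial distribution with n = 10 and p = 0.3 must carry mean 3 and standard deviation sqrt(2.1); sending mean 4 with those parameters is rejected.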


The type dif(T) is the data type of the difference of two values of type T. Often, T is the same as dif(T). For the data type T = Point in time, dif(T) is not Point in time but a Physical Quantity in the dimension of time (i.e., units seconds, minutes, hours, etc.). This concept is generalizable since it describes the relationship between corresponding measurements on ratio scales vs. interval scales (e.g., absolute (Kelvin) temperatures vs. Celsius temperatures).

Most distributions are given in a form where only natural numbers or real numbers are acceptable. If distributions of measurements (with units) are to be specified, we need a way to remove the units for the purpose of generating the distribution and then reapply the units. For instance, if Q = µ u is a measured quantity with numeric magnitude µ and unit u, then we can bind the quotient Q / u to the random variable and calculate the distribution. For each calculated number x_i, we regain a quantity with unit as Q_i = x_i u.

Most distributions are given in a "standard" form, that is, with mean (or left boundary) equal to 0, standard deviation equal to 1, etc. Therefore one has to standardize the quantity to be described first. This is similar to the problem of removing and reapplying units. The method is also similar and can be unified: a transformation transforms the numeric value to a standard form and later re-transforms the standard form to the numeric value. Two issues must be considered:
- translation, i.e., moving the mean (or left boundary) into the origin (zero point);
- scaling the value to adjust the standard deviation to one.

This means that any transformation of a value x to a normalized value y, with offset o (the mean or left boundary) and scale s (the standard deviation), can be described as:

$$y = \frac{x - o}{s}$$

We can combine the way we deal with the units and the standardization of the value into one formula:

$$y = \frac{Q_i - \mu u}{s u}$$

Here µ u is the expected value (mean) E expressed in the base type T (i.e., a Physical Quantity [p. 138]). This is further justification that we should indeed carry the mean µ u and the standard deviation s u as explicit components, so that scaling can be done accordingly. The product s u is the standard deviation (square root of the variance) of the described value. The standard deviation is a component that an application might be interested in even if it cannot deal with a "chi-square" distribution function.

It would be awesome if we could define and implement an algebra for uncertain quantities. However, the little statistical understanding that I have tells me that it is a non-trivial task to determine the distribution type and parameters of a sum or product of two distributions, or of the inverse of a distribution.


5.4.4 Uncertain Value using Narrative Expressions of Confidence

Uncertain Value using narrative expressions of confidence

Generic data type to specify one uncertain value as a pair of a value and a confidence qualifier. The qualifier is a coded representation of the confidence as used in narrative utterances, such as "probably", "likely", "may be", "would be supported", "consistent with", "approximately", etc.

GENERIC TYPE
parameter name | allowed types | description
T              | Any data type that is allowed here, discrete or continuous.

component name | type/domain                | optionality | description
value          | T                          | required    | The value to which an uncertainty qualifier is assigned.
confidence     | Concept Descriptor [p.122] | required    | The confidence assigned to the value.

Like it or not, we do have the use case that data is known to be just estimated, and we may want to signal that the data should be relied on with caution, without having any numeric probability. This occurs most frequently when textual reports are coded.

We also have to deal with the narrative expressions of uncertainty that are heard everywhere, and we may want to capture those ambiguous and largely undefined qualifiers of confidence. This is almost like an annotation to a value, meant to be understood mainly by humans.

We do not specify a closed list of codes to be used. Jim Case has an action item to submit a dozen or so qualifiers he has commonly seen; others are invited to contribute as well.

No special effort is made to assign numeric probabilities to the codes, or even to specify an order on the set of codes.
Translation to numeric probabilities is not trivial, as there may be linear or logarithmic scales useful in different circumstances.

We generally discourage the use of narrative expressions of uncertainty in place of numeric ones. People should be reminded over and over again that probabilities are subjective measures of belief and that an "inexact" numeric probability is much more useful than a statement that "X is likely to be true". Coded probabilities have no reliable meaning. Not even the order of narrative confidence qualifiers is clear in all cases (e.g., is "A is likely" more or less likely than "probably A"?). However, such coded confidence qualifiers do at least uncover the ambiguity that exists (whether we want it or not).


Narrative expressions of confidence should be used only in cases where no numeric probabilities are available (e.g., when coding narratives).
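A minimal Python sketch of this generic type follows, showing the required pair of value and coded confidence. The class names `UncertainNarrative` and `ConceptDescriptor` are hypothetical; `ConceptDescriptor` here is reduced to a bare code plus a coding-system identifier, far simpler than the Concept Descriptor of this specification. Because no closed list of codes is specified, the codes and the coding system name `LOCAL-CONFIDENCE` are made-up placeholders.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")  # any discrete or continuous value type


@dataclass(frozen=True)
class ConceptDescriptor:
    """Hypothetical, heavily simplified stand-in for a Concept Descriptor."""
    code: str         # e.g. "probably", "likely", "approximately"
    code_system: str  # identifier of a (locally agreed) code list


@dataclass(frozen=True)
class UncertainNarrative(Generic[T]):
    """A <value, confidence> pair; both components are required."""
    value: T
    confidence: ConceptDescriptor

    def __post_init__(self) -> None:
        # Mirror the "required" optionality of both components.
        if self.value is None or self.confidence is None:
            raise ValueError("value and confidence are both required")


# Usage with a discrete value (a coded finding) and a continuous one.
finding = UncertainNarrative(
    value="pneumonia",
    confidence=ConceptDescriptor("probably", "LOCAL-CONFIDENCE"),
)
weight_kg = UncertainNarrative(
    value=70.0,
    confidence=ConceptDescriptor("approximately", "LOCAL-CONFIDENCE"),
)
```

The sketch deliberately defines no ordering between confidence codes (the dataclass default `order=False`) and no mapping to numeric probabilities, in line with the discussion above.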


Appendix A: All Data Types At a Glance

The following is an overview of the data types that we have defined so far.

Boolean [p. 28]
A Boolean value belongs to the domain of two-valued logic: either true or false (tertium non datur), with all the stuff everyone should know about logic. The Boolean type is amazingly useful throughout all layers of abstraction, from the bit in a machine up to object-oriented data analysis.

No Information [p. 31]
A No Information value can occur in place of any other value to express that specific information is missing and how or why it is missing. This is like a NULL in SQL, but with the ability to specify a certain flavor of missing information.

Character String [p. 40]
A character string is a primitive data type that contains Unicode characters. A single character is not considered an HL7 data type. Note that the string type is not limited to ASCII characters and none of the "escape" sequences of v2.3 are defined. Transmitting Unicode characters is considered an ITS layer issue, and the application layer is not supposed to deal with the peculiarities of different character encodings.

Multimedia Enabled Free Text [p. 48]
Free text may be anything from a few formatted characters to complex documents or images. This data type is defined similarly to the ED data type, which in turn is based on the MIME standard.

Technical Instance Identifier [p. 65]
Technical instance identifiers are unique and unravelable through the consistent and required use of the ISO OBJECT IDENTIFIER (OID) [p. 66].

Technical Instance Locator [p. 71]
A technical instance locator is a reference to some technical thing (e.g., image, document, telephone, e-mail box, etc.). It is a generalization of the well-known URL concept.

Postal and Residential Address [p. 85]
This Address data type is used to communicate postal addresses and residential addresses. The main use of such data is to allow printing mail labels (postal address), or to allow a person to physically visit that address (residential address). An address consists of tagged Address Parts [p. 86].

Person Name [p. 96]
This type is used in the RIM class Person_name that will be developed from the class Person_alternate_name of RIM 0.88 jointly with PAFM. Person names consist of tagged Person Name Parts [p. ??]. Typical name parts that exist in almost every name are given names and family names; other part types may be defined culturally.

Organization Name [p. 115]
A collection of organization name variants [p. 115]. Every Organization Name Variant represents an organization name used in different contexts, for a different purpose, or at a different time.

Code Value [p. 116]
A code value is used to refer to technical concepts and is also the basic building block for constructing more complex concept descriptors for real-world concepts.

Concept Descriptor [p. 122]
Concept descriptors are the way to refer to real-world concepts (e.g., diagnoses, procedures, etc.). Just as with the old CE data type, one can specify a code from one coding system with its translation into another coding system. This data type is more general than CE in that multiple Code Translations [p. 123] can be given, and their dependencies can be specified exactly. With Code Phrases [p. 124], a single uniaxial code can be mapped to multiple codes of a multi-axial coding system, and vice versa.

Integer Number [p. 133]
Embodies the usual concept of integer numbers. Integers are used almost exclusively for counts or for values derived from counts by addition and subtraction.

Floating Point Number [p. 134]
Embodies the abstract concept of real numbers. Floating point numbers have a built-in notion of precision in terms of the number of significant decimal digits.

Ratio of Quantities [p. 137]
A quotient of any two quantities. Quantities currently defined are Integer Number [p. 133], Floating Point Number [p. 134], Physical Quantity [p. 138], Monetary Amount [p. 140], and Point in Time [p. 144] (although points in time are on a difference scale, not a ratio scale).

Physical Quantity [p. 138]
A physical measurement with units.

Monetary Amount [p. 140]
An amount of money in a certain currency unit.

Point in Time [p. 144]
A difference scale quantity in the physical dimension of time. Points in time are usually expressed based on calendars, which are quite complex "coordinate systems" for time. This is basically the old "TS" data type.

Calendar Modulus Expressions [p. 147]
Expressions of the form day-of-the-month, day-of-the-week, month-of-the-year, or hour-of-the-day all have a common structure (x of the y).
This data type is not yet defined. We may end up with one or many data types to cover what was called TM (time) or "weekday" in HL7.

Interval [p. 149]
Also called "range". A continuous subset of an ordered type. Intervals are expressed by boundaries of the base type. Boundaries may be undefined.


Annotated Information [p. 152]
Whenever a sender feels that "there is more to say" about a data element, the annotation structure can be sent that contains the data element together with some free-form annotation. The annotation is meant to be interpreted by humans.

History [p. 154]
Generic data type that allows the history of some data element to be sent. A History is a list of History Items [p. 155].

History Item [p. 155]
A History Item can be used wherever a validity time (effective date/time, expiry date/time) is an essential part of some data. Used primarily as the element of a History [p. 154].

Uncertain Discrete Value using Probabilities [p. 157]
A discrete value and an associated probability for that value to apply in a given context.

Non-Parametric Probability Distribution [p. 159]
A collection of Uncertain Discrete Values using Probabilities [p. 157] that specifies a probability distribution.

Parametric Probability Distribution [p. 161]
Contains mean, standard deviation, and also a distribution type plus its parameters. This is useful, for example, to specify "precisely" the accuracy of a measurement or to specify results of clinical trials.

Uncertain Value using Narrative Expressions of Confidence [p. 168]
A value and a narrative expression of confidence for that value to apply in a given context. Those "narrative expressions" are keywords, such as "approximately", "probably", "likely", "slight chance of", etc.
