This is a manual for version 2 of XCEDE (XML-based Clinical and Experimental Data Exchange). XCEDE 2 is an extensible schema designed to enable the transfer and storage of several types of scientific data and metadata including (but not limited to) clinical, demographic, behavioral, physiological and image data. The target audience for this manual is anyone who is interested in using or learning more about XCEDE 2. This manual will serve as both a tutorial and as a reference.
Though the schema was designed in the context of neuroscientific projects, the structure of the schema is quite generic and can be applied as is to a wide variety of scientific disciplines.
This manual has two parts. Part I, “XCEDE 2 Core Specification”, describes the components of XCEDE 2 and the specification in some detail. Part II, “Using and Extending XCEDE 2”, is a more practical guide to the use of XCEDE 2, providing guidance to those users who wish to extend the schema to represent project-specific metadata, or integrate XCEDE 2 into software applications and services.
XCEDE 2 development started as an effort to “harmonize” the uses of XML within the testbeds of the Biomedical Informatics Research Network [BIRN] and their collaborators within the neuroscientific community. Feedback from these harmonization discussions led to an initial prototype (based on XCEDE 1.0) of a next-generation schema. Over the course of four months of daily teleconferences, the first public release of XCEDE 2 took form.
XCEDE 2 has its origins in various XML schemas developed for collaborative neuroinformatics projects, including:
a general scientific metadata storage framework.
database and web service designed to facilitate management and exploration of neuroimaging and related data.
provides a format-agnostic interface to image data and is the basis for the binary data resource in XCEDE 2.
records the history of processing steps and is the basis for the <provenance> element in the schema.
provides a general framework (and MATLAB interface) for storage and retrieval of thresholded statistical parametric maps and associated anatomical labels.
The primary developers of XCEDE 2 are (in alphabetical order): Syam Gadde (Duke University), Jeff Grethe (University of California, San Diego), Dave Keator (University of California, Irvine), and Dan Marcus (Washington University, St. Louis), all of whom have received support for this project through various BIRN testbeds. Nicole Aucoin (Brigham & Women's Hospital, Harvard University) led the development of the Data Provenance schema.
XCEDE is an evolving format -- later versions will be informed by feedback from users and developers. The latest schema and related information can be found on the project's web page at http://www.xcede.org/.
This section of the XCEDE 2.0 manual focuses on the high-level structure and major components of the XCEDE 2 Core schema. Each chapter contains a description of the component and provides examples of usage. The examples in this section are largely drawn from neuroscientific metadata, as this data served as the driving force behind development. However, these examples are by no means comprehensive; the schema was designed to be generally applicable (and extensible) to many types of scientific data, and this manual by necessity focuses on only a few.
An XCEDE 2 dataset is a data repository (stored in files, databases, or using other storage mechanisms) that can be represented as a collection of one or more XML documents, each of which validates against the XCEDE 2 XML schema (Appendix A, XML Schema for XCEDE 2.0). The XCEDE 2 specification does not prescribe any particular mechanism by which these documents are located, stored, grouped, or linked, though XCEDE 2 allows certain XML elements to link to other target elements and to optionally specify URLs as hints as to the location of the documents containing these targets.
For example, a given XCEDE 2 dataset may be stored as a single XML document, or a collection of files in a single directory on a file system, or may be distributed within a hierarchical directory structure (which may or may not reflect the structure of the data within), or may be stored within a database accessible by query through a web interface. The semantics of the dataset should be fully reflected in the XML representation, and should not be dependent on how the dataset is stored.
A schema-compliant XCEDE 2 document must be a valid XML 1.0 document [XML10], and must have one root element, <XCEDE>, which (like all other XCEDE 2 elements) is in the http://www.xcede.org/xcede-2 XML namespace.
Several major components of the XCEDE 2 dataset are represented as children of the XCEDE root element. These components include:
Experiment hierarchy. The components in the experiment hierarchy are represented by elements <project>, <subject>, <visit>, <study>, <episode>, <acquisition>. These are described in more detail in Chapter 2, Experiment Hierarchy.
Data. The element <data> is used to store actual data within the XML document. Examples include events (Chapter 6, Events) and assessments (Chapter 8, Assessments).
Data resources. Represented by the element <resource>, these point to external files which store actual data. See Chapter 3, Binary Data Resources.
Analyses. The <analysis> element encapsulates metadata about data that is derived from one or more inputs, whose data may be represented by other XCEDE components, such as <data> or data resources. See Chapter 9, Analysis.
Protocols. Structures to describe the expected course of an experimental paradigm are provided by the <protocol> element. See Chapter 7, Protocols.
Other XCEDE components appear as subcomponents of other XCEDE structures:
Data provenance. The <provenance> element appears in the types analysis_t (on which the <analysis> element is based) and dataResource_t (one of the available data models for the top-level <resource> element). See Chapter 5, Data Provenance.
Comments and annotations. The <commentList> and <annotationList> elements provide mechanisms to store, respectively, descriptive text regarding the real-world entities or concepts represented by the associated XML element, or about the stored XML element itself. These are available in all elements based on abstract_container_t, which includes all the hierarchy elements and <analysis>.
Informational resources. These point to documentation that may illuminate aspects of the data, data collection, protocol, etc. -- these might include peer-reviewed publications, equipment manuals, and others. Informational resources are represented in the type informationResource_t which is used in the <resourceList> element in abstract_container_t (as with the comments and annotations).
Terminology. Various elements in XCEDE 2 include the attribute group terminology_ag or use the type terminologyString, which allow the user to associate them with terms from particular nomenclatures/ontologies. For example, a subject's <species> may refer to a standard term listed in a well-known public nomenclature. See Chapter 10, Terminology.
The XCEDE XML document structure is defined using W3C XML Schema language [XMLSchema10]. With a few exceptions, the XCEDE 2 XML schema subscribes to the following conventions and best practices.
Naming and capitalization.
Element and attribute names are alphanumeric (<project>). All letters are lowercase, with the exception of acronyms, which are uppercase (<XCEDE>, projectID), and, in multi-word names, the initial letters of the second and following words which are capitalized (<dataResourceRef>). For the purposes of this rule, ID (for “identifier”) is considered an acronym.
Schema type names (which do not appear in XML instance documents except in xsi:type attributes) follow the same naming conventions as element and attribute names, but have the _t suffix (analysis_t). If the type is “abstract”, then it has an abstract_ prefix (abstract_container_t).
Attribute group names (which do not appear in XML instance documents) follow the same naming conventions as element and attribute names, but have the _ag suffix (allLevelExternalIDs_ag).
Schema best practices.
Follow the schema versioning best practices recommended by Roger Costello in [CostelloSchemaVersioning]. Some of the implications are listed below:
The XCEDE namespace will change for major versions, but not minor versions. So, the namespace is as follows: http://www.xcede.org/xcede-major_version. For example: http://www.xcede.org/xcede-2
XSD files will be named xcede-VERSION-DOMAIN.xsd. For example: xcede-2.0-core.xsd
All elements should have a corresponding named complexType. For example: <project> is of type project_t.
The schema prescribes that some elements precede or follow other elements -- document creators should follow this strictly. However, applications should be written without expecting a particular element order, in case later versions of the schema change the order of elements. That is, be conservative in what you write, but liberal in what you accept.
This is not to say that element order is not important -- the relative order of multiple elements with the same name should be preserved and can be relied upon by applications. So, if the schema specifies the content model of an element to have multiple <A> children and then multiple <B> children, applications should treat a dataset the same whether all the <A> children came before or after (or even interleaved with) the <B> elements, but all <A> elements should be presented to the application in the same order they appear in the document. The corollary is that the semantics of any XCEDE 2 element must not depend on the order in which differently-named children are presented in the document.
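The “liberal in what you accept” rule can be made concrete with a short example. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree. It shows one way an application might read an element's children without depending on the order of differently-named children, while keeping same-named children in document order. The helper names children_by_name and summary are hypothetical. <A>, <B>, and the n attribute are placeholders, not XCEDE elements.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def children_by_name(parent):
    """Group an element's children by tag name.

    Differently-named children may appear in any order, but children that
    share a name keep the order in which they appear in the document.
    """
    groups = defaultdict(list)
    for child in parent:  # ElementTree iterates children in document order
        groups[child.tag].append(child)
    return dict(groups)


def summary(elem):
    """Hypothetical helper: map each child name to its children's 'n' values."""
    return {tag: [c.get("n") for c in kids]
            for tag, kids in children_by_name(elem).items()}


# Same content; in doc2 the <B> child is interleaved with the <A> children.
doc1 = ET.fromstring("<p><A n='1'/><A n='2'/><B n='x'/></p>")
doc2 = ET.fromstring("<p><A n='1'/><B n='x'/><A n='2'/></p>")
print(summary(doc1) == summary(doc2))
```

Both documents are treated the same. Swapping the two <A> children, however, would change the result, as the rule requires.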
Document validation.
The structure of an XCEDE 2 document can be validated using the XCEDE 2 Core schema. Several XML Schema validators are available to perform this task. Validating against the schema will ensure that generic XCEDE 2 parsers can read the document.
Users may wish to perform content-based validation of an XCEDE document, for example, to make sure that “MR” image metadata stored in an XCEDE document have certain required fields such as TR, TE, magnet field strength, etc., and that their values fall within reasonable ranges. This kind of validation is not possible using XML Schema. However, other content-based schema languages exist for this purpose, most notably Schematron [Schematron]. As long as the XCEDE 2 Core schema is structurally sufficient to represent data for a given project or type of data, using a content-based schema language such as Schematron allows users to restrict or “specialize” the content of XCEDE documents without needing to modify or extend from the XCEDE 2 Core schema itself.
As illustrated in Figure 2.1, “XCEDE hierarchy”, the XCEDE experiment hierarchy consists of several levels representing divisions of experiment data at various granularities. Elements at each level contain level-specific “info” elements, whose schema types may be derived to store experiment-specific or data modality-specific metadata. The linking mechanism between levels is flexible enough to support the omission of levels if the schema user finds them unnecessary.
In the typical intended usage, a project is the top-level division of experiment data, and represents a research project which collects and analyzes data from one or more subjects which are divided (within the project) into subject groups. A subject may be a member of multiple research projects, and it is the subject group that maintains and distinguishes the mappings between subjects and research projects.
A visit may represent a subject's appearance at an experiment “site” (for collaborative projects, this could be the institution or lab at which the data is being collected or analyzed). A visit may be further subdivided into one or more studies, each of which would consist of one or more data collection episodes.
Visit and study are more or less arbitrary divisions of the data that exist for convenience, and do not in themselves have any inherent meaning as far as the schema is concerned. An episode, however, does have a meaning: it is intended to represent a unit of data collection by one or more instruments over a given time interval. Each set of data collected (perhaps by different instruments) over this time interval should be represented by an acquisition. Multiple acquisitions within an episode should be understood to occur simultaneously over the time interval represented by the episode. So, for example, an episode in an fMRI study may encapsulate the acquisition of a time-series of volume images from an MR scanner, as well as behavioral or physiological data collected at the same time. Each of these simultaneously collected data sets would be stored as an individual acquisition within the same episode.
At first glance, it might seem natural to represent the experiment hierarchy as a traditional XML hierarchy, in which higher-level elements encapsulate lower-level elements as child elements. In XCEDE, however, all level elements (<project>, <visit>, etc.) are stored as children of the root <XCEDE> element. Links between levels are implicit in the level IDs assigned to each element and propagated to elements in lower levels. This approach has two main benefits. First, applications are easier to write, because all major elements are stored in the same place (under the XCEDE root element). Second, XCEDE documents can be merged simply by concatenating their lists of top-level elements under a single XCEDE root element. A further advantage is that users of the schema can omit levels simply by omitting the unnecessary elements and IDs/links.
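Because every level element sits directly under the root, merging can be sketched in a few lines. This Python sketch is illustrative only: merge_xcede is a hypothetical helper, and it assumes each input is a complete document whose root is <XCEDE> in the http://www.xcede.org/xcede-2 namespace. It does not check for conflicting level IDs across the inputs, which a real merge tool would probably need to do.

```python
import xml.etree.ElementTree as ET

XCEDE_NS = "http://www.xcede.org/xcede-2"
ROOT_TAG = f"{{{XCEDE_NS}}}XCEDE"
ET.register_namespace("", XCEDE_NS)  # serialize XCEDE elements without a prefix


def merge_xcede(*documents):
    """Merge XCEDE 2 documents (given as strings) under a single root."""
    merged = ET.Element(ROOT_TAG)
    for text in documents:
        root = ET.fromstring(text)
        if root.tag != ROOT_TAG:
            raise ValueError(f"not an XCEDE 2 root element: {root.tag}")
        merged.extend(list(root))  # keep each document's top-level order
    return merged


doc_a = ('<XCEDE xmlns="http://www.xcede.org/xcede-2">'
         '<project ID="A"/><subject ID="1"/></XCEDE>')
doc_b = ('<XCEDE xmlns="http://www.xcede.org/xcede-2">'
         '<visit ID="1" projectID="A" subjectID="1" subjectGroupID="X"/></XCEDE>')
merged = merge_xcede(doc_a, doc_b)
print(ET.tostring(merged, encoding="unicode"))
```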
Specifying one or more level IDs allows one to explicitly define the hierarchical relations between XCEDE elements in different levels in the experiment hierarchy. Every level element has a set of these level IDs, composed of the element's own ID attribute, plus its “ancestor” ID attributes, indicating which higher-level elements have this element in their scope. For example, the <visit> element contains the ancestor ID attributes subjectGroupID, subjectID, and projectID. Level elements are linked to their ancestors in the hierarchy by sharing the appropriate subset of level IDs. In addition, any level element can be linked from another XCEDE element that likewise specifies the appropriate level IDs.
For example, a link to a visit element may specify visitID, subjectID, subjectGroupID, and projectID attributes, and a level attribute with the value visit, indicating that this link is to a visit element. An application resolving this link will search for a visit element in the XCEDE 2 dataset whose attributes match those specified in the link (the visitID attribute in the link is matched against the ID attribute in the visit element).
Though the individual ID attributes are not required to be unique, the set of level IDs applied to a level element must be unique. Likewise, any link to a level element must provide enough level IDs to identify a single target element. Level ID attributes not specified in the link are understood to match any value, so a link must specify enough of these IDs to match at most one level element.
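The matching rule above can be sketched in Python with xml.etree.ElementTree. This sketch makes several assumptions:

- The link's attributes have already been collected into a dict. In a document they would be attributes of an element such as <inputRef>.
- resolve_level_link is a hypothetical helper.
- The sample document is illustrative, not taken from the manual. The same visit ID "1" is used under two projects, so a link that omits projectID is ambiguous.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.xcede.org/xcede-2}"


def resolve_level_link(root, link):
    """Return the one level element a link points to, or None if none match.

    `link` holds the link's attributes, e.g. {"level": "visit", "visitID": "1"}.
    Level IDs missing from the link match any value; a link that matches more
    than one element is an error.
    """
    level = link["level"]
    wanted = {}
    for name, value in link.items():
        if name == "level":
            continue
        # The link's <level>ID (e.g. visitID) is matched against the target's ID.
        wanted["ID" if name == f"{level}ID" else name] = value
    matches = [elem for elem in root.findall(NS + level)  # children of <XCEDE>
               if all(elem.get(key) == expected for key, expected in wanted.items())]
    if len(matches) > 1:
        raise ValueError(f"ambiguous link: {len(matches)} <{level}> elements match")
    return matches[0] if matches else None


doc = ET.fromstring("""<XCEDE xmlns="http://www.xcede.org/xcede-2">
  <visit ID="1" projectID="A" subjectID="1" subjectGroupID="X"/>
  <visit ID="1" projectID="B" subjectID="1" subjectGroupID="Z"/>
</XCEDE>""")
target = resolve_level_link(doc, {"level": "visit", "visitID": "1", "projectID": "B"})
print(target.get("subjectGroupID"))  # Z
```

A link carrying only {"level": "visit", "visitID": "1"} would raise ValueError here, because it does not specify enough level IDs to match at most one element.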
The elements that can link to level elements are <catalog>, <resource>, and <data> (children of the <XCEDE> root element), and <inputRef> and <outputRef> (children of the <analysis> element). As previously described, level elements (except for <project> and <subject>, which are at the top of the experiment hierarchy), by virtue of specifying their own ancestor IDs, automatically incorporate links to their ancestor elements.
The metadata hierarchy illustrated in Figure 2.1, “XCEDE hierarchy” can be represented in XCEDE as shown in Figure 2.2, “Metadata hierarchy instance” (only those elements/attributes relevant to linking are shown; the actual metadata contents of the elements are omitted for space). Note that subject “1” in the illustration is represented in two different projects, and so the corresponding <subject> element may be an ancestor for visits in both project “A” and “B”. However, the subject's role in each project is specified by the combination of the project ID and subject group ID (defined in the subject group lists in <project>), and these are used in every level element starting with <visit> and below.
Figure 2.2. Metadata hierarchy instance
<XCEDE xmlns="http://www.xcede.org/xcede-2">
  <project ID="A">
    <projectInfo>
      <subjectGroupList>
        <subjectGroup ID="X">
          <subjectID>1</subjectID>
          <subjectID>2</subjectID>
        </subjectGroup>
      </subjectGroupList>
    </projectInfo>
  </project>
  <project ID="B">
    <projectInfo>
      <subjectGroupList>
        <subjectGroup ID="Z">
          <subjectID>1</subjectID>
          <subjectID>3</subjectID>
        </subjectGroup>
      </subjectGroupList>
    </projectInfo>
  </project>
  <subject ID="1" />
  <subject ID="2" />
  <subject ID="3" />
  <visit ID="1"
         projectID="A" subjectID="1" subjectGroupID="X" />
  <study ID="MR scan"
         projectID="A" subjectID="1" subjectGroupID="X" visitID="1" />
  <episode ID="task run 1"
           projectID="A" subjectID="1" subjectGroupID="X" visitID="1" studyID="MR scan" />
  <acquisition ID="MR image"
               projectID="A" subjectID="1" subjectGroupID="X" visitID="1" studyID="MR scan"
               episodeID="task run 1" />
  <acquisition ID="behavioral data"
               projectID="A" subjectID="1" subjectGroupID="X" visitID="1" studyID="MR scan"
               episodeID="task run 1" />
  <acquisition ID="heart rate"
               projectID="A" subjectID="1" subjectGroupID="X" visitID="1" studyID="MR scan"
               episodeID="task run 1" />
  <study ID="Clinical interview"
         projectID="A" subjectID="1" subjectGroupID="X" visitID="2" />
  <!-- ... etc. ... -->
</XCEDE>
The XCEDE 2 Binary Data Resource component is used to provide a generic interface to a binary data stream stored in one or more external files. Any of the binary data resource types described in this chapter can be used anywhere an abstract_resource_t is called for (with the appropriate xsi:type attribute); in the current XCEDE schema, these locations are the top-level <resource> element and the <dataResource> child element of <acquisition>.
XCEDE provides multiple layers of derived types to store more specialized information about the binary data. The base type and each of the derived types are described in turn.
abstract_resource_t. The abstract base type abstract_resource_t provides a few elements and attributes that are especially important for binary data resources. In particular, the <uri> element and its offset and size attributes point to a “chunk” of data stored in an external file. A series of <uri> elements define a stream of data that may be described in greater detail by the data types described below.
binaryDataResource_t. This type derives from abstract_resource_t and allows an application to interpret the data stream as a sequence of data items with a given data type (<elementType>) and byte order (<byteOrder>).
dimensionedBinaryDataResource_t. The data stream, until now, could only be interpreted as a one-dimensional sequence. This type provides <dimension> elements that allow the data stream to be interpreted as a multi-dimensional array of data items. Each dimension has a <size> and a <label>, as well as the ability to discard subsets of the data in the data stream (using the outputSelect attribute).
mappedBinaryDataResource_t. This type places the multi-dimensional array of data items represented by dimensionedBinaryDataResource_t into an arbitrary coordinate system.
Several examples of binary data are presented here, each showing the use of one of the different binary data types described in this chapter.
The basic binary data resource type describes a sequence of data items. For example, consider a data file (random_data_file.bin) containing 2048 random 32-bit floating point numbers, stored in little-endian (least-significant-byte first) order. The <dataResource> describing this data is shown in Figure 3.1, “Simple binaryDataResource_t example”.
Figure 3.1. Simple binaryDataResource_t example
<dataResource xsi:type="binaryDataResource_t">
  <uri offset="0" size="8192">random_data_file.bin</uri>
  <elementType>float32</elementType>
  <byteOrder>lsbfirst</byteOrder>
</dataResource>
Note the xsi:type attribute specifying that this <dataResource> element is of type binaryDataResource_t. (The xsi: prefix must be declared earlier in the XML document, for example with xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" on the root element.)
The <elementType> element is restricted to one of several pre-defined strings (see the schema for details). The <byteOrder> element must be lsbfirst for little-endian data or msbfirst for big-endian data.
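Here is a minimal Python sketch of decoding the stream described in Figure 3.1 with the standard struct module. It assumes a local file path, and read_binary_resource is a hypothetical helper. The mapping covers only the three <elementType> values used in this chapter (float32, int32, uint32); the schema defines others that are not handled here. The stand-in file holds exactly representable values so that the round trip can be checked.

```python
import os
import struct
import tempfile

# Assumed mapping from <elementType>/<byteOrder> values to struct codes.
# Only the element types used in this chapter are covered here.
ELEMENT_TYPES = {"float32": "f", "int32": "i", "uint32": "I"}
BYTE_ORDERS = {"lsbfirst": "<", "msbfirst": ">"}


def read_binary_resource(path, offset, size, element_type, byte_order):
    """Decode the chunk named by one <uri> element into a list of numbers."""
    order = BYTE_ORDERS[byte_order]
    code = ELEMENT_TYPES[element_type]
    item_size = struct.calcsize(order + code)  # standard sizes: 4 bytes each
    if size % item_size:
        raise ValueError("size is not a whole number of data items")
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(size)
    if len(data) != size:
        raise ValueError("file is shorter than offset + size")
    return list(struct.unpack(f"{order}{size // item_size}{code}", data))


# Build a stand-in for random_data_file.bin: 2048 little-endian float32 values.
values = [i * 0.5 for i in range(2048)]  # exactly representable in float32
path = os.path.join(tempfile.mkdtemp(), "random_data_file.bin")
with open(path, "wb") as f:
    f.write(struct.pack("<2048f", *values))

items = read_binary_resource(path, offset=0, size=8192,
                             element_type="float32", byte_order="lsbfirst")
print(len(items), items[:3])
```

The size of 8192 bytes in Figure 3.1 is simply 2048 items times 4 bytes per float32.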
If the <compression> element is present, the file(s) pointed to by the <uri> elements are compressed, and the content of the element names the compression method (the only compression method specifically recognized by this specification is gzip). The size and offset attributes in the <uri> element always refer to the uncompressed data. An example of this is shown in Figure 3.2, “binaryDataResource_t with compression”.
Figure 3.2. binaryDataResource_t with compression
<dataResource xsi:type="binaryDataResource_t">
  <uri offset="0" size="8192">random_data_file.bin.gz</uri>
  <elementType>float32</elementType>
  <byteOrder>lsbfirst</byteOrder>
  <compression>gzip</compression>
</dataResource>
As a special case, if the application does not find the file pointed to by a URI, and the <compression> element is not present, it may search for the same file with an appended .gz suffix, and if it exists, treat it as implicitly gzip-compressed data. Figure 3.3, “binaryDataResource_t with implicit compression” shows how the same data in Figure 3.2, “binaryDataResource_t with compression” could be expressed using this alternative method. Pointing to the uncompressed version of the file (even when only the compressed version exists) allows the user to decompress or compress the data file at will, without affecting the ability of the application to read the data using the same binaryDataResource_t. Note that the <uri> and <compression> elements must be internally consistent. It would be an error to reference the uncompressed file random_data_file.bin and yet say that it was compressed using <compression>gzip</compression>. Likewise, explicit references to the compressed file (especially files that do not have the .gz suffix) must specify the compression method explicitly using the compression element.
Figure 3.3. binaryDataResource_t with implicit compression
<dataResource xsi:type="binaryDataResource_t">
  <uri offset="0" size="8192">random_data_file.bin</uri>
  <elementType>float32</elementType>
  <byteOrder>lsbfirst</byteOrder>
</dataResource>
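The two compression cases can be sketched with Python's gzip module. In this sketch open_resource and read_chunk are hypothetical helpers, and the URI is assumed to be a local file path. Offsets and sizes are applied to the decompressed stream, as the text requires.

```python
import gzip
import os
import tempfile


def open_resource(path, compression=None):
    """Open the file named by a <uri>, handling explicit and implicit gzip."""
    if compression == "gzip":
        return gzip.open(path, "rb")
    if compression is not None:
        raise ValueError(f"unsupported compression method: {compression}")
    if not os.path.exists(path) and os.path.exists(path + ".gz"):
        # Implicit case: only NAME.gz exists, so treat it as gzip-compressed.
        return gzip.open(path + ".gz", "rb")
    return open(path, "rb")


def read_chunk(path, offset, size, compression=None):
    """Return `size` bytes at `offset`, both measured in uncompressed data."""
    with open_resource(path, compression) as f:
        f.seek(offset)  # for gzip files this seeks within the decompressed stream
        data = f.read(size)
    if len(data) != size:
        raise ValueError("resource is shorter than offset + size")
    return data


raw = bytes(range(256)) * 32  # 8192 bytes of sample data
folder = tempfile.mkdtemp()
with gzip.open(os.path.join(folder, "random_data_file.bin.gz"), "wb") as f:
    f.write(raw)

# Figure 3.2 style: explicit reference to the .gz file plus <compression>gzip.
explicit = read_chunk(os.path.join(folder, "random_data_file.bin.gz"), 0, 8192, "gzip")
# Figure 3.3 style: reference the uncompressed name; only the .gz file exists.
implicit = read_chunk(os.path.join(folder, "random_data_file.bin"), 0, 8192)
print(explicit == implicit == raw)
```

Seeking in a gzip stream is emulated by decompressing up to the requested offset. For large files with nonzero offsets this can be slow, which is one reason an application might prefer to keep frequently read data uncompressed.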
Consider a camera that acquires an image using a 256x256 matrix of big-endian 32-bit signed integer pixels. This data has two spatial dimensions, which by convention we label x and y (adding z if a third spatial dimension is needed, and t if there is a time dimension). Figure 3.4, “dimensionedBinaryDataResource_t example” shows how this data might be represented.
Figure 3.4. dimensionedBinaryDataResource_t example
<dataResource xsi:type="dimensionedBinaryDataResource_t">
  <uri offset="0" size="262144">rawdata.img</uri>
  <elementType>int32</elementType>
  <byteOrder>msbfirst</byteOrder>
  <dimension label="x">
    <size>256</size>
  </dimension>
  <dimension label="y">
    <size>256</size>
  </dimension>
</dataResource>
Dimensions are ordered from fastest-moving to slowest-moving. So in the above example, the x dimension index changes on each consecutive data item, but the y dimension changes every 256 elements.
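The fastest-to-slowest ordering amounts to the following position calculation. This is a minimal Python sketch; flat_index is a hypothetical helper, and positions are counted in data items, not bytes.

```python
def flat_index(indices, sizes):
    """Position (in data items) of a multi-dimensional index in the stream.

    `indices` and `sizes` follow the order of the <dimension> elements, i.e.
    from the fastest-moving dimension to the slowest-moving one.
    """
    position, stride = 0, 1
    for index, size in zip(indices, sizes):
        if not 0 <= index < size:
            raise IndexError(f"index {index} out of range for size {size}")
        position += index * stride
        stride *= size  # the next dimension moves `size` times more slowly
    return position


sizes = [256, 256]          # x, then y, as in Figure 3.4
item_bytes = 4              # int32
print(flat_index([1, 0], sizes))    # 1: next x is the next data item
print(flat_index([0, 1], sizes))    # 256: y changes every 256 items
print(flat_index([255, 255], sizes) * item_bytes + item_bytes)  # 262144
```

The last line reproduces the size="262144" value in Figure 3.4: the final item starts at byte 262140 and occupies 4 bytes.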
A “mapped” binary data resource is a (perhaps multidimensional) array of values, the matrix indices of which can be converted into a location in a given coordinate system. The location of the bounding box of the data in this space is given by specifying a location (in target-space coordinates) for the first data item, and two things for each dimension: a unit-length direction vector (in the target-space coordinate system) and the spacing between successive data items in that dimension. The transformation matrix for a three-dimensional coordinate system has the form shown in Figure 3.5, “Transformation matrix”. This transformation matrix converts from matrix indices (x,y,z) to a coordinate location (a,b,c). Figure 3.6, “mappedBinaryDataResource_t example” shows how the components of a transformation of MR image data into scanner RAS (Right/Anterior/Superior) coordinates are represented in a mappedBinaryDataResource_t. The unit vectors for each dimension are (XA XB XC) = (1 0 0), (YA YB YC) = (0 1 0), and (ZA ZB ZC) = (0 0 1), and are placed in the <direction> element of each <dimension> element. The spacing values (SX SY SZ) = (3.75mm 3.75mm 4mm) are put in the <spacing> element of each <dimension>. The coordinates of the first voxel in the data are given by (OA OB OC) = (-120 -120 -52), stored in the <originCoords> element.
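Figure 3.5 itself is not reproduced in this text. Assuming the conventional homogeneous-coordinate form, the transformation built from the direction vectors, spacings, and origin defined above would read:

```latex
\[
\begin{pmatrix} a \\ b \\ c \\ 1 \end{pmatrix}
=
\begin{pmatrix}
X_A S_X & Y_A S_Y & Z_A S_Z & O_A \\
X_B S_X & Y_B S_Y & Z_B S_Z & O_B \\
X_C S_X & Y_C S_Y & Z_C S_Z & O_C \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
\]
```

Each column is a unit direction vector scaled by that dimension's spacing, and the last column is the location of the first data item. With the values from Figure 3.6 this reduces to a = 3.75x - 120, b = 3.75y - 120, and c = 4z - 52.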
Figure 3.6. mappedBinaryDataResource_t example
<dataResource xsi:type="mappedBinaryDataResource_t">
  <uri offset="0" size="442368">V0001.img</uri>
  <uri offset="0" size="442368">V0002.img</uri>
  <uri offset="0" size="442368">V0003.img</uri>
  <uri offset="0" size="442368">V0004.img</uri>
  <uri offset="0" size="442368">V0005.img</uri>
  <!-- ... 135 more <uri> elements omitted for space ... -->
  <elementType>int32</elementType>
  <byteOrder>msbfirst</byteOrder>
  <dimension label="x">
    <size>64</size>
    <spacing>3.75</spacing>
    <gap>0</gap>
    <direction>1 0 0</direction>
    <units>mm</units>
  </dimension>
  <dimension label="y">
    <size>64</size>
    <spacing>3.75</spacing>
    <gap>0</gap>
    <direction>0 1 0</direction>
    <units>mm</units>
  </dimension>
  <dimension label="z">
    <size>27</size>
    <spacing>4</spacing>
    <gap>1</gap>
    <direction>0 0 1</direction>
    <units>mm</units>
  </dimension>
  <dimension label="t">
    <size>140</size>
    <spacing>2</spacing>
    <gap>0</gap>
    <datapoints>0 2 4 6 8</datapoints>
    <units>sec</units>
  </dimension>
  <originCoords>-120 -120 -52</originCoords>
</dataResource>
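Applying the mapping to the spatial dimensions of Figure 3.6 can be sketched in Python as follows. index_to_coords is a hypothetical helper. The sketch uses only <direction>, <spacing>, and <originCoords>, and treats <spacing> as the distance between successive data items, as the text defines it; <gap> is not used. The t dimension is not part of the spatial mapping and is left out.

```python
def index_to_coords(index, directions, spacings, origin):
    """Map matrix indices (x, y, z) to target-space coordinates (a, b, c).

    Each dimension contributes index * spacing * direction, and the origin is
    the location of the first data item.
    """
    coords = list(origin)
    for i, direction, spacing in zip(index, directions, spacings):
        for axis, component in enumerate(direction):
            coords[axis] += component * spacing * i
    return tuple(coords)


# Spatial <dimension> values from Figure 3.6 (the t dimension is not mapped).
directions = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
spacings = [3.75, 3.75, 4]
origin = (-120, -120, -52)

print(index_to_coords((0, 0, 0), directions, spacings, origin))     # first voxel
print(index_to_coords((63, 63, 26), directions, spacings, origin))  # last voxel
```

For this acquisition the voxel (32, 32, 13) maps to the scanner origin (0, 0, 0).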
A more complicated example is given by data generated by a Siemens MR scanner. In this case, the data represents a three-dimensional 64x64x32 image, stored in DICOM format. However, because earlier versions of the DICOM format did not support three-dimensional data in one file, Siemens hit upon the clever idea to “tile” the 32 two-dimensional slices across an NxN two-dimensional grid (Figure 3.7, “A “tiled” image”).
Applications may naturally want to express this data as a three-dimensional block, with columns, rows, and slices. In a conventionally stored three-dimensional X×Y×Z image, the first X voxels compose the first row in the first slice, the next X voxels are the second row in the first slice, and so on. Likewise, the first X*Y voxels are the first slice, the next X*Y voxels are the second slice, and so on. In the “tiled” image, however, though the first X voxels are again the first row in the first slice, the next X voxels are the first row in the second slice! At first it would seem that the dimension order has merely been switched, and that specifying the labels of the dimensions as x, z, and y would fix things. However, we pass through the first rows of only six slices before going on to the second row of those same six slices. Only after going through all the rows of the first six slices in this fashion do we go on to the next six slices.
The end result is that the dimension that we are calling the z dimension has been split in two. The two components of the z dimension are interleaved with the x and y dimensions like so: x, z1, y, z2. The two components of the z dimension are distinguished with the splitRank attribute, as shown in Figure 3.8, “Split dimension example”.
Figure 3.8. Split dimension example
<dataResource xsi:type="dimensionedBinaryDataResource_t">
  <uri offset="9240" size="589824">img0001.dcm</uri>
  <elementType>uint32</elementType>
  <byteOrder>lsbfirst</byteOrder>
  <dimension label="x">
    <size>64</size>
  </dimension>
  <dimension label="z" splitRank="1">
    <size>6</size>
  </dimension>
  <dimension label="y">
    <size>64</size>
  </dimension>
  <dimension label="z" splitRank="2">
    <size>6</size>
  </dimension>
</dataResource>
Applications should read this data as if it were four-dimensional, then permute it to bring the two z dimensions together (in the order specified by splitRank) in the position of the highest-ranked split dimension; the two dimensions can then be merged into one. The size of the new z dimension is the product of the sizes of the component split dimensions, so 6 * 6 = 36.
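The permute-and-merge step can be expressed as an index calculation. This is a minimal Python sketch, assuming that the splitRank="1" component is the faster-moving part of the merged z index, so that z = z1 + 6 * z2. That matches a tiled image in which tiles run left to right and then top to bottom. mosaic_index and unsplit are hypothetical helpers. The sizes match Figure 3.8: 64 * 6 * 64 * 6 = 147456 uint32 items, or 589824 bytes.

```python
def mosaic_index(x, y, z, nx=64, ny=64, tiles=6):
    """Position of voxel (x, y, z) in a stream stored as x, z1, y, z2.

    z1 (splitRank="1") and z2 (splitRank="2") are the two halves of the split
    z dimension; merging them in splitRank order gives z = z1 + tiles * z2.
    """
    z1, z2 = z % tiles, z // tiles
    return x + nx * (z1 + tiles * (y + ny * z2))


def unsplit(stream, nx=64, ny=64, tiles=6):
    """Rebuild the merged volume as vol[z][y][x] with tiles * tiles slices."""
    return [[[stream[mosaic_index(x, y, z, nx, ny, tiles)] for x in range(nx)]
             for y in range(ny)]
            for z in range(tiles * tiles)]


print(mosaic_index(0, 0, 1))  # 64: row 0 of slice 1 follows row 0 of slice 0
print(mosaic_index(0, 1, 0))  # 384: row 1 of slice 0 comes after six first rows
print(mosaic_index(0, 0, 6))  # 24576: slice 6 starts the second row of tiles
```

A production reader would more likely do this permutation with an array library, but the index arithmetic is the same.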
You may recall that the original data was acquired as a 64x64x32 volume, but the NxN tiling representation requires that the number of tiles be the square of an integer N. Here the 6 * 6 = 36 tiles hold only 32 slices, so four tiles contain no image data. One more mechanism has been added to the <dimension> element to accommodate the presence of data that should be disregarded: the outputSelect attribute (see Figure 3.9, “outputSelect example”).
Figure 3.9. outputSelect example
<dataResource xsi:type="dimensionedBinaryDataResource_t">
  <uri offset="9240" size="589824">img0001.dcm</uri>
  <elementType>uint32</elementType>
  <byteOrder>lsbfirst</byteOrder>
  <dimension label="x">
    <size>64</size>
  </dimension>
  <dimension label="z" splitRank="1">
    <size>6</size>
  </dimension>
  <dimension label="y">
    <size>64</size>
  </dimension>
  <dimension label="z" splitRank="2" outputSelect="0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31">
    <size>6</size>
  </dimension>
</dataResource>
The outputSelect attribute specifies a list of indices along the given dimension (or combined dimension if it occurs on the highest-ranked component of a split dimension) that should be regarded as valid data. Data in the other indices should be ignored.
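Applied to the merged z dimension of Figure 3.9, outputSelect acts as a simple index filter. This Python sketch is illustrative: apply_output_select is a hypothetical helper, and the slices are represented by placeholder strings rather than image data.

```python
def apply_output_select(items, output_select):
    """Keep only the indices listed in an outputSelect attribute value."""
    indices = [int(token) for token in output_select.split()]
    for i in indices:
        if not 0 <= i < len(items):
            raise IndexError(f"outputSelect index {i} is outside 0..{len(items) - 1}")
    return [items[i] for i in indices]


slices = [f"slice {z}" for z in range(6 * 6)]    # merged z dimension of size 36
select = " ".join(str(z) for z in range(32))     # attribute value in Figure 3.9
valid = apply_output_select(slices, select)
print(len(valid), valid[-1])  # 32 slice 31
```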
Catalogs in XCEDE 2 are containers of related resources, resource references, and data references. They are recursive in that catalogs can contain a list of catalogs and catalog references.
Catalogs are useful in a number of contexts. A catalog could be created, for example, to represent all of the acquisition resources associated with an episode. These catalogs could be contained within a parent catalog that represents each of the episodes in an MR study. Catalogs could also be used to represent the various resources generated as part of an analysis or uploaded and tagged by users.
A benefit of using catalogs, as opposed to linking resources directly in analysis and acquisition elements, is that the catalogs can be sent independent of the parent content to client applications that choose not to support the full XCEDE specification. Additionally, catalog documents can be quite large when they contain thousands of entries (for example, when pointing to individual DICOM files for an acquisition), so separating the catalogs from their parents can be more efficient.
Catalogs are represented by the catalog_t complex type, derived from abstract_tagged_entity_t. The catalog child element of the root XCEDE element is of type catalog_t. The resources included in the catalog can be pointed to by resource references in a number of places in XCEDE, including analysis_t and acquisition_t.
The abstract base type abstract_tagged_entity_t provides an unbounded set of tags that users and applications can populate to attach ad-hoc documentation and labels to the catalog and its contents. catalog_t also includes an unbounded list of child subcatalogs (or references to child catalogs) and an unbounded list of entries. Each entry is either a resource, a reference to a resource, or a reference to an XCEDE data element (an analysis, for example).
Figure 4.1. Catalog instance
<XCEDE xmlns="http://www.xcede.org/xcede-2">
  <catalog ID="ID0">
    <catalogList>
      <catalog ID="ID1">
        <entryList>
          <entry ID="ID2" name="lh.pial" description="pial surface of left hemisphere"
                 format="FreeSurfer:surface-1" content="lh.pial" cachePath=""
                 uri="file://c:/data/fBIRN-AHM2006/fbph2-000648622547/surf/lh.pial"/>
          <entry ID="ID3" name="brain" description="extracted brain mri"
                 format="FreeSurfer:mgz-1" content="brain" cachePath=""
                 uri="file://c:/data/fBIRN-AHM2006/fbph2-000648622547/mri/brain.mgz"/>
          <entry ID="ID4" name="zstat8" description="8th zstatistic contrast"
                 format="nifti:nii-1" content="zstat8" cachePath=""
                 uri="file://c:/data/fBIRN-AHM2006/sirp-hp65-stc-to7-gam.feat/stats/zstat8.nii"/>
          <entry ID="ID5" name="aparc+aseg" description="parcellation and segmentation label map"
                 format="FreeSurfer:mgz-1" content="aparc+aseg" cachePath=""
                 uri="file://c:/data/fBIRN-AHM2006/fbph2-000648622547/mri/aparc+aseg.mgz"/>
        </entryList>
      </catalog>
    </catalogList>
  </catalog>
</XCEDE>
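Because catalogs nest recursively, listing everything a catalog holds is naturally a recursive walk. This Python sketch assumes the following:

- The catalog elements are in the http://www.xcede.org/xcede-2 namespace.
- Entries appear under <entryList> and subcatalogs under <catalogList>, as in Figure 4.1.
- Catalog references are not followed.

iter_entries is a hypothetical helper, and the sample document is a trimmed copy of Figure 4.1.

```python
import xml.etree.ElementTree as ET

NS = {"x": "http://www.xcede.org/xcede-2"}


def iter_entries(catalog, path=()):
    """Yield (catalog ID path, <entry>) pairs for a catalog and its subcatalogs.

    Catalog references are not followed in this sketch.
    """
    path = path + (catalog.get("ID"),)
    for entry in catalog.findall("x:entryList/x:entry", NS):
        yield path, entry
    for child in catalog.findall("x:catalogList/x:catalog", NS):
        yield from iter_entries(child, path)


doc = ET.fromstring("""<XCEDE xmlns="http://www.xcede.org/xcede-2">
  <catalog ID="ID0">
    <catalogList>
      <catalog ID="ID1">
        <entryList>
          <entry ID="ID2" name="lh.pial" format="FreeSurfer:surface-1"
                 uri="file://c:/data/fBIRN-AHM2006/fbph2-000648622547/surf/lh.pial"/>
          <entry ID="ID3" name="brain" format="FreeSurfer:mgz-1"
                 uri="file://c:/data/fBIRN-AHM2006/fbph2-000648622547/mri/brain.mgz"/>
        </entryList>
      </catalog>
    </catalogList>
  </catalog>
</XCEDE>""")

for path, entry in iter_entries(doc.find("x:catalog", NS)):
    print("/".join(path), entry.get("name"), entry.get("format"))
```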
Various elements in XCEDE 2 refer to data or data resources that were generated from other input data by processing tools. Data provenance is the practice of tracking the origins of derived data and the processing steps that produced it. Besides being informative, provenance information can allow a user to recreate the processing stream and replicate the output data using the same tools and inputs. Provenance information can be written either by the software tools themselves or by wrapper scripts that record it on the tool's behalf (the likely scenario for external or legacy applications).
Some of the following documentation comes from Nicole Aucoin, who is leading the Data Provenance effort for the MBIRN.
provenance_t. Data represented by an instance of dataResource_t or analysis_t may contain a <provenance> element of this type, which holds an ordered sequence of processing steps (processStep_t) describing how the data were generated.
processStep_t. This type contains two attributes, ID and parent, indicating a unique ID for the step and its parent step, respectively. It includes a number of child elements that describe the conditions under which an executable was run. All of the child elements are formally optional. The following are strongly recommended:
<program>. The name of the executable. Version and build numbers may be specified using the attributes version and build; see the instructions below for inserting version control content. Alternatively, the version can be given as a user-defined string.
<programArguments>. The arguments passed to the executable. The optional attributes inputs and outputs break the full list of arguments down into input and output arguments.
<timeStamp>. The time at which the program started to run.
<user>. The user who was logged in and running this executable.
<hostName>. The unique ID of the computer running this executable. This replaces the Machine element of the old schema.
<architecture>. The type of hardware on which the processing step runs (sun, x86, etc.).
<platform>. The operating system under which the processing step runs.
These are optional:
Revision control repository information. Use the CVS keyword Id, inserted into your source code file as $Id$; CVS replaces it with the file name, revision number, date of the change, and the user who made it.
This also works with SVN; see the SVN notes below for enabling it.
The compiler used to build this executable: its name and version.
The name and version of a linked library. This element can be repeated as many times as necessary. How the version is obtained depends on the library in question; examples for VTK, Tcl, and Tk are given in the code below.
The time stamp indicating when the executable was built.
This element gives the overall package version if the executable is part of a larger unit. For example, mri_convert is part of FreeSurfer, and the whole FreeSurfer package may have a different version number than the mri_convert executable.
The location of the software: a link to a code repository or a web page. With SVN, this can be filled in automatically via the HeadURL keyword.
Provenance elements are children of the dataResource_t and analysis_t types and describe the sequence of processing steps that were executed to generate the data represented by the parent element. In the example below, the provenance of the NIfTI data file (represented as a binaryDataResource) indicates that it was generated in two processing steps (filter1 and filter2) on the same machine (lablin1), about 30 minutes apart.
Figure 5.1. Simple provenance example
<dataResource xsi:type="binaryDataResource_t">
  <uri offset="352" size="8192">random_generated_data_file.nii</uri>
  <provenance ID="1">
    <processStep parent="1" ID="1">
      <program build="1.0" version="1.0">filter1</program>
      <programArguments inputs="-in analyze" outputs="-out minc">-size 10 -reps 100 -v -in analyze -out minc</programArguments>
      <timeStamp>11:20:37</timeStamp>
      <user>xnatmaster</user>
      <hostName>lablin1</hostName>
      <architecture>x86</architecture>
      <platform version="6">fedora</platform>
    </processStep>
    <processStep parent="1" ID="2">
      <program build="String" version="1.2">filter2</program>
      <programArguments inputs="-in minc" outputs="-out nifti">-size 20 -reps 200 -v -in minc -out nifti</programArguments>
      <timeStamp>11:50:21</timeStamp>
      <user>xnatmaster</user>
      <hostName>lablin1</hostName>
      <architecture>x86</architecture>
      <platform version="6">fedora</platform>
    </processStep>
  </provenance>
  <elementType>float32</elementType>
  <byteOrder>lsbfirst</byteOrder>
</dataResource>
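The ID and parent attributes link the steps into a chain, so an application holding only the final step can recover the full processing order. The C sketch below, with hypothetical names, follows parent links back from a given step. Figure 5.1 gives its first step a parent equal to its own ID, so the sketch assumes (as an interpretation, not something the schema states) that a step starts the chain when its parent is absent, refers to itself, or names no step in the list.

```c
#include <string.h>
#include <assert.h>

/* Hypothetical minimal record for one <processStep>. */
typedef struct {
    const char *id;       /* ID attribute */
    const char *parent;   /* parent attribute, or NULL if absent */
    const char *program;  /* content of <program> */
} processStep;

/* Return the index of the step with the given ID, or -1 if none. */
static int findStep(const processStep *steps, int n, const char *id)
{
    int i;
    for (i = 0; id != NULL && i < n; i++) {
        if (strcmp(steps[i].id, id) == 0) {
            return i;
        }
    }
    return -1;
}

/* Follow parent links from steps[last] back to the first step and store
   the chain in order[], earliest step first. Returns the chain length.
   The n-step limit guards against cyclic parent references. */
int stepChain(const processStep *steps, int n, int last, int *order)
{
    int len = 0;
    int cur = last;
    int i;

    assert(last >= 0 && last < n);
    while (cur >= 0 && len < n) {
        order[len++] = cur;
        if (steps[cur].parent == NULL ||
            strcmp(steps[cur].parent, steps[cur].id) == 0) {
            break;  /* assumed start of the chain */
        }
        cur = findStep(steps, n, steps[cur].parent);
    }
    /* reverse so that order[0] is the earliest step */
    for (i = 0; i < len / 2; i++) {
        int tmp = order[i];
        order[i] = order[len - 1 - i];
        order[len - 1 - i] = tmp;
    }
    return len;
}
```

With the two steps of Figure 5.1, starting from filter2 yields the order filter1, filter2.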
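The figure below shows a C code snippet (written by Nicole Aucoin and implemented in 3D Slicer) that generates provenance XML from an application. [Note: This may not be completely compliant with XCEDE 2; for example, it uses capitalized element names such as <ProcessStep> and <Program>, whereas Figure 5.1 uses <processStep> and <program>.]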
#ifdef USE_PYTHON
#include <Python.h>      // PY_VERSION; Python.h should come before other headers
#endif
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <tcl.h>         // Tcl_GetVersion()
#include "vtkVersion.h"  // VTK_VERSION
// KWWidgets_VERSION is assumed to be defined by the KWWidgets
// configuration headers included elsewhere in the application.

// Print a ProcessStep describing this run of the program to stdout.
// Note: arguments and environment values are not XML-escaped.
void printAllInfo(int argc, char **argv)
{
  int i;
  struct tm *timeInfo;
  time_t rawtime;
  // yyyy/mm/dd-hh-mm-ss-ms-TZ; 27 characters suffice for a 3-character
  // time zone, but some zone names are longer, so leave headroom
  char timeStr[64];
  const char *user = getenv("USER");
  int major, minor, patchLevel;

  fprintf(stdout, "<ProcessStep>\n");
  // CVS/SVN replaces $Revision$ with the revision of this source file
  fprintf(stdout, "<Program version=\"$Revision$\">%s</Program>\n", argv[0]);
  fprintf(stdout, "<ProgramArguments>");
  for (i = 1; i < argc; i++)
  {
    fprintf(stdout, " %s", argv[i]);
  }
  fprintf(stdout, "</ProgramArguments>\n");
  fprintf(stdout, "<CVS>$Id$</CVS> <TimeStamp>");
  time(&rawtime);
  timeInfo = localtime(&rawtime);
  if (timeInfo == NULL ||
      strftime(timeStr, sizeof(timeStr), "%Y/%m/%d-%H-%M-%S-00-%Z", timeInfo) == 0)
  {
    timeStr[0] = '\0';  // leave the time stamp empty rather than print garbage
  }
  fprintf(stdout, "%s</TimeStamp>\n", timeStr);
  fprintf(stdout, "<User>%s</User>\n", user != NULL ? user : "(unknown)");
  fprintf(stdout, "<HostName>");
  if (getenv("HOSTNAME") == NULL)
  {
    if (getenv("HOST") == NULL)
    {
      fprintf(stdout, "(unknown)");
    }
    else
    {
      fprintf(stdout, "%s", getenv("HOST"));
    }
  }
  else
  {
    fprintf(stdout, "%s", getenv("HOSTNAME"));
  }
  fprintf(stdout, "</HostName><Platform version=\"");
#if defined(sun) || defined(__sun)
#if defined(__SunOS_5_7)
  fprintf(stdout, "2.7");
#endif
#if defined(__SunOS_5_8)
  fprintf(stdout, "8");
#endif
#endif
#if defined(linux) || defined(__linux)
  fprintf(stdout, "unknown");
#endif
  fprintf(stdout, "\">");
  // now the platform name
#if defined(linux) || defined(__linux)
  fprintf(stdout, "Linux");
#endif
#if defined(macintosh) || defined(Macintosh)
  fprintf(stdout, "MAC OS 9");
#endif
#if defined(__APPLE__) && defined(__MACH__)
  fprintf(stdout, "MAC OS X");
#endif
#if defined(sun) || defined(__sun)
# if defined(__SVR4) || defined(__svr4__)
  fprintf(stdout, "Solaris");
# else
  fprintf(stdout, "SunOS");
# endif
#endif
#if defined(_WIN32) || defined(__WIN32__)
  fprintf(stdout, "Windows");
#endif
  fprintf(stdout, "</Platform>\n");
  if (getenv("MACHTYPE") != NULL)
  {
    fprintf(stdout, "<Architecture>%s</Architecture>\n", getenv("MACHTYPE"));
  }
  fprintf(stdout, "<Compiler version=\"");
#if defined(__GNUC__)
#if defined(__GNUC_PATCHLEVEL__)
  fprintf(stdout, "%d", (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__));
#else
  fprintf(stdout, "%d", (__GNUC__ * 10000 + __GNUC_MINOR__ * 100));
#endif
#endif
#if defined(_MSC_VER)
  fprintf(stdout, "%d", (_MSC_VER));
#endif
  // close the version attribute, then print the compiler name
  fprintf(stdout, "\">");
#if defined(__GNUC__)
  fprintf(stdout, "GCC");
#elif defined(_MSC_VER)
  fprintf(stdout, "MSC");
#else
  fprintf(stdout, "UNKNOWN");
#endif
  fprintf(stdout, "</Compiler>\n");
  fprintf(stdout, "<Library version=\"%s\">VTK</Library><Library version=\"unknown\">ITK</Library><Library version=\"%s\">KWWidgets</Library>\n", VTK_VERSION, KWWidgets_VERSION);
  Tcl_GetVersion(&major, &minor, &patchLevel, NULL);
  fprintf(stdout, "<Library version=\"%d.%d.%d\">TCL</Library>\n", major, minor, patchLevel);
  //fprintf(stdout, "<Library version=\"%d.%d.%d\">TK</Library>\n", major,
  //minor, patchLevel);
#ifdef USE_PYTHON
  fprintf(stdout, "<Library version=\"%s\">Python</Library>\n", PY_VERSION);
#endif
  fprintf(stdout, "<Repository>$HeadURL: http://www.na-mic.org/svn/Slicer3/trunk/Applications/GUI/Slicer3.cxx $</Repository>\n");
  fprintf(stdout, "</ProcessStep>\n");
  fprintf(stdout, "\n");
}
For the version, you can use the CVS keyword Name, inserted into your source code file as $Name$; CVS replaces it with the tag name when you check out the code with a tag:
cvs checkout -r tag [modulename]
If you are using SVN, CVS-style keywords are replaced with the appropriate values only after you enable them through the svn:keywords property, file by file (and keyword by keyword), by issuing the following command:
svn propset svn:keywords "Revision Id Date HeadURL" [filename.ext]
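A tool that copies these keywords into its provenance output may want only the expanded value, without the $Keyword: ... $ wrapper, and has to cope with keywords that were never expanded (for example, a plain $Id$ in a file without keyword substitution enabled, or $Name$, which CVS expands with an empty value when no tag was used). A minimal C sketch, with a hypothetical function name that is not part of CVS or SVN:

```c
#include <string.h>
#include <assert.h>

/* Copy the value of an expanded keyword such as
   "$HeadURL: http://host/repo/file.c $" into out.
   Returns 1 on success, and 0 if the keyword is unexpanded ("$Id$"),
   has an empty value, or the value does not fit in out. */
int keywordValue(const char *keyword, char *out, size_t outSize)
{
    const char *start;
    const char *end;
    size_t len;

    assert(keyword != NULL && out != NULL && outSize > 0);
    start = strchr(keyword, ':');
    if (keyword[0] != '$' || start == NULL) {
        return 0;                  /* not of the form "$Keyword: ... $" */
    }
    start++;
    while (*start == ' ') {
        start++;                   /* skip spaces after the colon */
    }
    end = strrchr(start, '$');     /* closing dollar sign */
    if (end == NULL) {
        return 0;
    }
    while (end > start && end[-1] == ' ') {
        end--;                     /* drop spaces before the closing "$" */
    }
    len = (size_t)(end - start);
    if (len == 0 || len >= outSize) {
        return 0;
    }
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

Applied to the HeadURL string in the code above, it yields the bare repository URL.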
Events in XCEDE are merely time intervals annotated with arbitrary metadata. This component can be used to represent several types of behavioral data, statistics calculated on time series data, or any other metadata whose proper interpretation requires that it be associated with a particular interval in time.
An XCEDE event consists of the following:
<onset>. The onset (in seconds) of the time interval.
<duration>. The duration (in seconds) of the time interval.
type (attribute). Usage of this field is user-specified.
name (attribute). Usage of this field is user-specified.
units (attribute). The units of the onset and duration fields. This field is optional; users of the schema are encouraged to prescribe an implicit unit of measurement and use it consistently, in which case this field may be considered informational only.
<value>. Adds a named piece of metadata to this event; the metadata name is given by the element's name attribute.
The following instance shows how each of these fields may be populated.
<event type="visual" name="event#1" units="sec">
  <onset>0</onset>
  <duration>2</duration>
  <value name="shape">square</value>
  <value name="shapecolor">red</value>
</event>
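An application that records events as they happen could write the element above directly. The C sketch below formats one event with its named values into a caller-supplied buffer. The struct and function names are hypothetical, the units attribute is fixed to "sec" as in the example, and for brevity names and values are not escaped, so they must not contain XML special characters such as < or &.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Hypothetical name/value pair for one <value> element. */
typedef struct {
    const char *name;
    const char *text;
} eventValue;

/* Write one <event> element into buf. Returns the number of characters
   written, or -1 if buf is too small. Input strings are not escaped. */
int formatEvent(char *buf, size_t size, const char *type, const char *name,
                double onset, double duration,
                const eventValue *values, int numValues)
{
    size_t used;
    int n;
    int i;

    assert(buf != NULL && size > 0);
    n = snprintf(buf, size,
                 "<event type=\"%s\" name=\"%s\" units=\"sec\">"
                 "<onset>%g</onset><duration>%g</duration>",
                 type, name, onset, duration);
    if (n < 0 || (size_t)n >= size) return -1;
    used = (size_t)n;
    for (i = 0; i < numValues; i++) {
        n = snprintf(buf + used, size - used, "<value name=\"%s\">%s</value>",
                     values[i].name, values[i].text);
        if (n < 0 || (size_t)n >= size - used) return -1;
        used += (size_t)n;
    }
    n = snprintf(buf + used, size - used, "</event>");
    if (n < 0 || (size_t)n >= size - used) return -1;
    return (int)(used + (size_t)n);
}
```

Passing type "visual", name "event#1", onset 0, duration 2, and the values shape=square and shapecolor=red produces the element above on a single line.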
Event elements are stored in a <data> element belonging to an <acquisition> (the examples below link the two with a <dataRef>). The <data> element should be of type events_t, set with xsi:type as in the examples below.
All onsets are relative to an arbitrary time reference; typically, time 0 (zero) is the start of data acquisition. An event list may be interpreted as concurrent with data in other <acquisition> elements (which could be other event lists). If so, the same time reference should be used in all concurrent acquisition data.
There is no ordering constraint on events in a list. Applications that need the events in chronological order should sort them by their <onset> values.
An optional <params> element may precede the first event in a list, and this element stores arbitrary metadata (using the same <value> element used above) that apply to all events in the list.
Consider the timeline shown in Figure 6.1, “An event timeline”, representing stimuli and responses in a neuroimaging study. We show in Figure 6.2, “XCEDE Events example - stimulus/response data” how the first 5 seconds' worth of the events might be represented in XCEDE.
Figure 6.2. XCEDE Events example - stimulus/response data
<XCEDE xmlns="http://www.xcede.org/xcede-2"
xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
<acquisition ID="my_stimulus_response_data">
<dataRef ID="my_events" />
</acquisition>
<data ID="my_events" xsi:type="events_t">
<event type="visual">
<onset>0</onset>
<duration>2</duration>
<value name="shape">square</value>
<value name="shapecolor">red</value>
</event>
<event type="visual">
<onset>2.5</onset>
<duration>2</duration>
<value name="shape">square</value>
<value name="shapecolor">blue</value>
</event>
<event type="audio">
<onset>0.3</onset>
<duration>1.4</duration>
<value name="frequency">low</value>
</event>
<event type="audio">
<onset>2.0</onset>
<duration>1.4</duration>
<value name="frequency">low</value>
</event>
<event type="audio">
<onset>3.5</onset>
<duration>1.4</duration>
<value name="frequency">low</value>
</event>
<event type="response">
<onset>3.4</onset>
<value name="button">1</value>
</event>
</data>
</XCEDE>
Each stimulus and each response is stored as a separate event element. Note that all the visual events appear first in the XCEDE file, then the audio events, and then the response event. This ordering is arbitrary; the events could just as well have been listed in chronological (or random!) order. The semantic interpretation of the events within an event list must not depend on their document order.
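Because document order carries no meaning, an application that wants a chronological view must sort the events itself. A minimal sketch using the C standard library's qsort on a hypothetical event record, fed the onsets of Figure 6.2:

```c
#include <stdlib.h>
#include <assert.h>

/* Hypothetical record holding the fields needed for ordering. */
typedef struct {
    const char *type;
    double onset;
    double duration;   /* negative if the event has no <duration> */
} timedEvent;

/* qsort comparator: earlier onsets first. */
static int compareOnset(const void *a, const void *b)
{
    double x = ((const timedEvent *)a)->onset;
    double y = ((const timedEvent *)b)->onset;
    return (x > y) - (x < y);
}

/* Sort events chronologically by onset. */
void sortByOnset(timedEvent *events, size_t n)
{
    assert(events != NULL || n == 0);
    if (n > 1) {
        qsort(events, n, sizeof events[0], compareOnset);
    }
}
```

In Figure 6.2 this places the response at 3.4 s between the visual event at 2.5 s and the audio event at 3.5 s. Since qsort is not stable, events with identical onsets may come out in either order; an application that cares should add a tie-breaker to the comparison.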
Stimulus and response data are not the only appropriate content to represent in XCEDE events. Figure 6.3, “XCEDE Events example - QA data” shows how quality assurance (QA) statistics for each volume/timepoint in an fMRI scan can be stored as events. Note that the time reference for the onsets is arbitrary, but if the acquisition containing the event data is contained within the same episode as other time-locked data, it should be assumed that the time reference is the same for all acquisitions within the episode, unless otherwise explicitly specified. So, for example, this QA data might be associated with MR image data in the same episode, and time 0 (zero) would by default be assumed to have the same meaning in both sets of data.
Figure 6.3. XCEDE Events example - QA data
<XCEDE xmlns="http://www.xcede.org/xcede-2"
xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
<acquisition ID="my_qa_data">
<dataRef ID="my_events" />
</acquisition>
<data ID="my_events" xsi:type="events_t">
<event>
<onset>0</onset>
<duration>2</duration>
<value name="volmean">759.218</value>
<value name="cmassx">106.781</value>
<value name="cmassy">118.279</value>
<value name="cmassz">66.9694</value>
</event>
<event>
<onset>2</onset>
<duration>2</duration>
<value name="volmean">759.218</value>
<value name="cmassx">106.801</value>
<value name="cmassy">118.242</value>
<value name="cmassz">67.1636</value>
</event>
<!-- ... etc. ... -->
</data>
</XCEDE>
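In Figure 6.3 the onsets step by 2 and every duration is 2, which is consistent with one event per volume and a repetition time (TR) of 2 seconds, though the example does not say so. Under that assumption, the onset of volume i (counting from 0) is i times TR, and a writer could produce each event as in the C sketch below; the struct, the function name, and the tr parameter are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Per-volume QA statistics, named as in Figure 6.3. */
typedef struct {
    double volmean;
    double cmassx, cmassy, cmassz;
} qaStats;

/* Write the <event> for volume vol (counting from 0) into buf, assuming
   time 0 is the start of the first volume and each volume lasts tr
   seconds. Returns snprintf's result; the output was truncated if that
   value is >= size. */
int formatQaEvent(char *buf, size_t size, int vol, double tr, const qaStats *s)
{
    assert(vol >= 0 && tr > 0 && s != NULL);
    return snprintf(buf, size,
                    "<event><onset>%g</onset><duration>%g</duration>"
                    "<value name=\"volmean\">%g</value>"
                    "<value name=\"cmassx\">%g</value>"
                    "<value name=\"cmassy\">%g</value>"
                    "<value name=\"cmassz\">%g</value></event>",
                    vol * tr, tr, s->volmean, s->cmassx, s->cmassy, s->cmassz);
}
```

Called with vol = 1, tr = 2, and the second volume's statistics, it reproduces the second event of Figure 6.3 on a single line.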
Specifications of experimental protocols are stored in XCEDE using the protocol_t tag structures.
Protocols in XCEDE consist of a hierarchical organization of steps defining the protocol, and of items within the steps defining the particular parameters of each protocol step. Formal protocols can be built either top-down or bottom-up from these building blocks.
protocol_t. The protocol_t tag extends the abstract_protocol_t tag to include steps and items making up a protocol description.
An XCEDE protocol consists of the following tag groups. Each protocol block can be specified
Steps are used to specify an ordered list of step blocks that make up the protocol. Each step block can be another protocol_t block or a reference to a step using the stepRef tag.
Items are the smallest building blocks of a protocol. They are made up of any number of item tags, which contain a specification of the parameters for the particular protocol step.
abstract_protocol_t. The abstract_protocol_t abstract tag provides basic information related to a protocol, such as the protocol time offset, the minimum and maximum number of occurrences, concept linkages, and more, as described below. The protocol_offset_t tag is used to specify the timing of this particular protocol relative to other protocols or steps within an experiment; the protocolTimeRef and preferredTimeOffset tags describe the offset. The ID_name_description tag is used to specify an ID, name, and description for the protocol. The terminology_ag reference is used to give contextual meaning to the protocol and link it with known ontologies and concepts. The level attribute specifies the level of the XCEDE hierarchy that the protocol refers to. For example, if the protocol specifies scanner-specific parameters, the level would point to the "study" level of the XCEDE hierarchy. The minOccurences and maxOccurences attributes specify the minimum and maximum number of occurrences of the protocol. The minTimeFromStart and maxTimeFromStart attributes specify a timing window relative to the start of the parent protocol; if this protocol block is at the top of the protocol hierarchy, the window is measured from the start of the study.
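A timing window like this is easiest to check once both ends are converted to a common unit. In the protocol example later in this chapter, step MRISCN1 must start between 1 and 8 hours (its minTimeOffset and maxTimeOffset) after step SES. The C sketch below tests an observed offset in seconds against such a window. The function names are hypothetical; the unit names days, hours, and s come from that example, while minutes and seconds are assumptions.

```c
#include <string.h>
#include <assert.h>

/* Convert amount in the given units to seconds. Returns 1 on success
   and 0 for a unit name this sketch does not recognize. */
int toSeconds(double amount, const char *units, double *seconds)
{
    assert(units != NULL && seconds != NULL);
    if (strcmp(units, "days") == 0) {
        *seconds = amount * 86400.0;
    } else if (strcmp(units, "hours") == 0) {
        *seconds = amount * 3600.0;
    } else if (strcmp(units, "minutes") == 0) {       /* assumed unit name */
        *seconds = amount * 60.0;
    } else if (strcmp(units, "s") == 0 ||
               strcmp(units, "seconds") == 0) {       /* "seconds" assumed */
        *seconds = amount;
    } else {
        return 0;
    }
    return 1;
}

/* Return 1 if observed (seconds after the referenced step) lies within
   [minOffset, maxOffset] given in units, else 0. Unknown units fail. */
int withinOffsetWindow(double observed, double minOffset, double maxOffset,
                       const char *units)
{
    double lo, hi;

    if (!toSeconds(minOffset, units, &lo) || !toSeconds(maxOffset, units, &hi)) {
        return 0;
    }
    return observed >= lo && observed <= hi;
}
```

For MRISCN1, a scan starting 90 minutes (5400 s) after SES is inside the window, while one starting 30 minutes after is not.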
steps. The steps group and its enclosed step and/or stepRef tags are used to embed or reference other parts of a protocol.
item. The protocolItem_t type is used to specify the smallest building blocks of the protocol. The itemText group and its enclosed textLabel groups are used to specify one or more text labels. These labels are most often used in the formal description of an assessment, where each question has some text, usually the question itself. The location attribute specifies where the text occurs; for example, the text could occur before or after a data entry field of an assessment questionnaire. The value attribute specifies the actual text. The itemRange group, derived from protocolItemRange_t, is used to define a valid range of values for a particular item via the attributes min, max, and units. The itemChoice group is used to specify valid entries for a particular protocol item. This is most often used for assessment questions that have a fixed number of choices. The two attributes itemCode and itemValue together specify the numeric code that corresponds to the actual value (presumably a string of text) of the choice, as shown in the example below:
<xcede:itemChoice itemCode="1" itemValue="professional or graduate training (received degree)"/>
<xcede:itemChoice itemCode="2" itemValue="college graduate"/>
<xcede:itemChoice itemCode="3" itemValue="some college (at least one year)"/>
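Acquired assessment data typically record the code rather than the text (the acquired data in the assessment example later in this manual store values such as 1 and 2), so an application often needs to map an itemCode back to its itemValue. A sketch with hypothetical names, treating codes as integers (an assumption; in the XML they are attribute strings):

```c
#include <stddef.h>
#include <assert.h>

/* One <itemChoice>: a numeric code and the text it stands for. */
typedef struct {
    int itemCode;
    const char *itemValue;
} itemChoice;

/* Return the itemValue for code, or NULL if no choice has that code. */
const char *choiceText(const itemChoice *choices, size_t n, int code)
{
    size_t i;

    assert(choices != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (choices[i].itemCode == code) {
            return choices[i].itemValue;
        }
    }
    return NULL;
}
```

A NULL result signals a recorded code that the protocol does not define, which is worth reporting rather than silently ignoring.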
The protocol example below shows a formal protocol definition for an MRI visit that consists of a Socio-Economic Scale assessment, followed by an MRI scanning session consisting of a T1 and a Sensory Motor fMRI acquisition.
<xcede:protocol ID="MRIV1" name="MRI Visit 1" minOccurences="1" maxOccurences="1"
required="true" level="visit" description="MRI scanning visit 1">
<xcede:steps>
<xcede:step ID="SES" name="Socio-Economic Scale" minOccurences="1" maxOccurences="1"
required="true">
<xcede:protocolOffset>
<xcede:protocolTimeRef>MRIV1</xcede:protocolTimeRef>
<xcede:preferredTimeOffset units="days">0</xcede:preferredTimeOffset>
<xcede:minTimeOffset units="days">0</xcede:minTimeOffset>
<xcede:maxTimeOffset units="days">0</xcede:maxTimeOffset>
</xcede:protocolOffset>
<xcede:items>
<xcede:item ID="ses_education_subject">
<xcede:itemText>
<xcede:textLabel location="leadText"
value="What is the highest level of education or professional training
that you have achieved?"/>
<xcede:textLabel location="trailText"
value="Any trailing text might go here"/>
</xcede:itemText>
<xcede:itemChoice itemCode="1"
itemValue="professional or graduate training (received degree)"/>
<xcede:itemChoice itemCode="2" itemValue="college graduate"/>
<xcede:itemChoice itemCode="3" itemValue="some college (at least one year)"/>
</xcede:item>
<xcede:item ID="ses_education_p_caretaker_prior_18">
<xcede:itemText>
<xcede:textLabel location="leadText"
value="What is the highest level of education or professional training that your primary caretaker until you were 18 years old has achieved?"/>
</xcede:itemText>
<xcede:itemChoice itemCode="1"
itemValue="professional or graduate training (received degree)"/>
<xcede:itemChoice itemCode="2" itemValue="college graduate"/>
<xcede:itemChoice itemCode="3" itemValue="some college (at least one year)"/>
</xcede:item>
</xcede:items>
</xcede:step>
<xcede:step ID="MRISCN1" name="MRI Scan Protocol, Visit 1" minOccurences="1"
maxOccurences="1" required="true">
<xcede:protocolOffset>
<xcede:protocolTimeRef>SES</xcede:protocolTimeRef>
<xcede:preferredTimeOffset units="hours">1</xcede:preferredTimeOffset>
<xcede:minTimeOffset units="hours">1</xcede:minTimeOffset>
<xcede:maxTimeOffset units="hours">8</xcede:maxTimeOffset>
</xcede:protocolOffset>
<xcede:steps>
<xcede:step name="T1" required="true" minOccurences="1">
<xcede:items>
<xcede:item xsi:type="xcede:protocolItem_t" name="FOV">
<xcede:itemRange min="24" max="24" units="mm"/>
</xcede:item>
<xcede:item xsi:type="xcede:protocolItem_t" name="Sequence">
<xcede:itemChoice itemValue="FSPGR"/>
<xcede:itemChoice itemValue="MP-RAGE"/>
</xcede:item>
<xcede:item xsi:type="xcede:protocolItem_t" name="Slice Thickness">
<xcede:itemRange min="1.2" max="1.5" units="mm"/>
</xcede:item>
<xcede:item xsi:type="xcede:protocolItem_t" name="Slices">
<xcede:itemRange min="160" max="170"/>
</xcede:item>
</xcede:items>
</xcede:step>
<xcede:step name="Sensory Motor" required="true" minOccurences="1"
maxOccurences="4" ID="SM">
<xcede:items>
<xcede:item name="weighting" xsi:type="xcede:protocolItem_t">
<xcede:itemChoice itemValue="t2"/>
</xcede:item>
<xcede:item name="TR" xsi:type="xcede:protocolItem_t">
<xcede:itemRange min="2" max="2" units="s"/>
</xcede:item>
<xcede:item name="NumberTRs" xsi:type="xcede:protocolItem_t">
<xcede:itemRange min="123" max="123" units="TR"/>
</xcede:item>
<xcede:item name="TE" xsi:type="xcede:protocolItem_t">
<xcede:itemRange min="30" max="30" units="ms"/>
</xcede:item>
<xcede:item name="FlipAngle" xsi:type="xcede:protocolItem_t">
<xcede:itemRange min="90" max="90" units="degrees"/>
</xcede:item>
</xcede:items>
</xcede:step>
</xcede:steps>
</xcede:step>
</xcede:steps>
</xcede:protocol>
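Each itemRange above is a closed interval, and a single required value, such as the TR of 2 s, is written with min equal to max. A site could check the parameters of an acquired scan against these ranges with a comparison like the C sketch below. The struct and function names are hypothetical, and the sketch assumes each acquired value is already in the item's units (it performs no unit conversion).

```c
#include <stddef.h>
#include <assert.h>

/* One <itemRange>: the closed interval [min, max]; units may be NULL. */
typedef struct {
    const char *name;
    double min, max;
    const char *units;
} itemRange;

/* Return 1 if value lies within the range (inclusive), else 0.
   The caller must supply value in the range's units. */
int inRange(const itemRange *r, double value)
{
    assert(r != NULL && r->min <= r->max);
    return value >= r->min && value <= r->max;
}

/* Count acquired values outside their ranges; values[i] is checked
   against ranges[i]. */
size_t countViolations(const itemRange *ranges, const double *values, size_t n)
{
    size_t i;
    size_t bad = 0;

    for (i = 0; i < n; i++) {
        if (!inRange(&ranges[i], values[i])) {
            bad++;
        }
    }
    return bad;
}
```

With the T1 ranges above, a slice thickness of 1.0 mm and 176 slices would each count as a violation.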
Complete data storage and assessment specifications and descriptions are stored in XCEDE using the assessment_t and assessmentDescItem_t tag structures.
In XCEDE, the formal definition of an assessment and the cataloging of the actual data values collected for that assessment are specified in different parts of the schema. The formal description of the assessment questions and possible answer choices is specified in the protocol section of the schema using the type protocol_t (described at a high level in Chapter 7, Protocols), whereas the actual assessment data for an acquisition are stored in the assessment_t tags.
assessment_t. The assessment_t tag extends the abstract_data_t tag to include information about the acquired assessment. The name element contains the assessment name, which is unique within the document and links the acquired assessment data back to the formal assessment definition optionally contained within the protocol description. The dataInstance element specifies the state of the acquired data: specifically, whether it is a first entry, second entry, validated, unvalidated, etc.
assessmentInfo_t. The assessmentInfo_t tag is derived from the abstract_info_t element and is used to store a description of the assessment.
assessmentItem_t. The assessmentItem_t tag is used to store the actual data captured by the assessment. It is composed of a <valueStatus> element, which records whether the subject declined to answer the question or why the data value might be missing; a <value> element, which stores the actual value of the assessment item as captured on the assessment itself; a <normValue> element, which stores the normalized value; a <reconciliationNote> element, which documents whether the values have been reconciled when the data are part of multiple data entries; and an <annotation> element, which annotates the assessment item. The terminology_ag reference is used to give contextual meaning to the question and link it with known ontologies and concepts.
The assessment example below shows both a formal assessment description stored in the <protocol> block and the actual acquired assessment data.
The formal description of each question for the assessment and possible answer choices:
Figure 8.1. Assessment description
<xcede:step ID="SES" name="Socio-Economic Scale" minOccurences="1" maxOccurences="1"
required="true">
<xcede:items>
<xcede:item ID="ses_education_subject">
<xcede:itemText>
<xcede:textLabel location="leadText"
value="What is the highest level of education or professional training
that you have achieved?"/>
<xcede:textLabel location="trailText"
value="Any trailing text might go here"/>
</xcede:itemText>
<xcede:itemChoice itemCode="1"
itemValue="professional or graduate training (received degree)"/>
<xcede:itemChoice itemCode="2" itemValue="college graduate"/>
<xcede:itemChoice itemCode="3" itemValue="some college (at least one year)"/>
</xcede:item>
<xcede:item ID="ses_education_p_caretaker_prior_18">
<xcede:itemText>
<xcede:textLabel location="leadText"
value="What is the highest level of education or professional training that your primary caretaker until you were 18 years old has achieved?"/>
</xcede:itemText>
<xcede:itemChoice itemCode="1"
itemValue="professional or graduate training (received degree)"/>
<xcede:itemChoice itemCode="2" itemValue="college graduate"/>
<xcede:itemChoice itemCode="3" itemValue="some college (at least one year)"/>
</xcede:item>
</xcede:items>
</xcede:step>
An instance of actual assessment data acquired on a subject during a protocol collection:
Figure 8.2. Acquired assessment data
<xcede:data xsi:type="xcede:assessment_t" subjectID="00301882920">
<xcede:name>Socio-Economic Status</xcede:name>
<xcede:dataInstance validated="true">
<xcede:assessmentInfo>
<xcede:description>This is the socio-economic scale.</xcede:description>
</xcede:assessmentInfo>
<xcede:assessmentItem ID="ses_education_subject">
<xcede:value>1</xcede:value>
</xcede:assessmentItem>
<xcede:assessmentItem ID="ses_education_p_caretaker_prior_18">
<xcede:value>2</xcede:value>
</xcede:assessmentItem>
<xcede:assessmentItem ID="ses_education_p_caretaker_lifetime">
<xcede:value>2</xcede:value>
</xcede:assessmentItem>
<xcede:assessmentItem ID="ses_education_s_caretaker_prior18">
<xcede:value>1</xcede:value>
</xcede:assessmentItem>
</xcede:dataInstance>
</xcede:data>
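In the figures above, each acquired assessmentItem reuses the ID of the item it answers in the description (for example, ses_education_subject). Figure 8.2 also records ses_education_p_caretaker_lifetime and ses_education_s_caretaker_prior18, which the Figure 8.1 excerpt does not define, so a consumer joining the two should be prepared for unmatched IDs. The C sketch below, with hypothetical names, collects acquired IDs that have no definition.

```c
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* Return 1 if id appears among the defined item IDs, else 0. */
static int isDefined(const char *id, const char *const *defined, size_t numDefined)
{
    size_t i;
    for (i = 0; i < numDefined; i++) {
        if (strcmp(id, defined[i]) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Store in missing[] (capacity numAcquired) every acquired item ID that
   has no definition, in document order, and return how many there are. */
size_t undefinedItems(const char *const *acquired, size_t numAcquired,
                      const char *const *defined, size_t numDefined,
                      const char **missing)
{
    size_t i;
    size_t count = 0;

    assert(missing != NULL || numAcquired == 0);
    for (i = 0; i < numAcquired; i++) {
        if (!isDefined(acquired[i], defined, numDefined)) {
            missing[count++] = acquired[i];
        }
    }
    return count;
}
```

Against the two definitions in Figure 8.1, the four acquired items of Figure 8.2 leave two IDs unmatched.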