MANUALS AND GUIDES #22
GTSPP REAL-TIME QUALITY CONTROL MANUAL

1.0 Introduction
2.0 Quality Flagging
3.0 Instrumentation Knowledge
4.0 Test Monitoring
5.0 Pre and Post Processing
6.0 Quality Control Tests
7.0 Suggested Additional Tests
8.0 Implementation Details
9.0 Acknowledgements
10.0 Bibliography
Annex A: Duplicates Management
Annex B: GTSPP Quality Control Tests
Annex C: Suggested Additional Tests
Annex D: Specific Implementations

1.0 INTRODUCTION

This Manual has been produced within the context of the Global Temperature-Salinity Pilot Project (GTSPP). Because the work of assuring the quality of data handled by the Project is shared amongst data centres, it is important to have both consistent and well documented procedures. This Manual describes the means by which data quality is assessed and the actions taken as a result of the procedures.

The GTSPP handles all temperature and salinity profile data. This includes observations collected using water samplers, continuous profiling instruments such as CTDs, thermistor chain data and observations acquired using thermosalinographs. These data will reach data processing centres of the Project through the real-time channels of the IGOSS program or in delayed mode through the IODE system.

The procedures described here are intended to cover only the above-mentioned data types, and specifically data sent through the IGOSS system. However, there are obvious generalizations that can be made to other data types. Because of this, it is expected that this Manual will serve as a base on which to build more extensive procedures for the aforementioned data types and to broaden coverage to other types as well. Indeed, in some cases, tests of data types that are not strictly part of this Project are incorporated into this Manual simply because they are of obvious use and because these data types are often associated with the data of interest to the GTSPP.

Updates to this Manual are carried out as new procedures are recommended to the GTSPP and as these are accepted by the project Steering Group. Readers are encouraged to suggest both improvements to existing tests and new tests that should be considered. In either case, it is important to explain how the suggestion improves or expands upon the existing suite of tests. Suggestions may be forwarded to any participant of the GTSPP and will be directed to the Steering Group. Tests that have been suggested but not yet incorporated will be documented in a section of the Manual. This provides a means to accumulate suggestions, to disseminate them and to solicit comments.

This Manual describes procedures that make extensive use of flags to indicate data quality. To make full use of this effort, participants of the GTSPP have agreed that data access based on quality flags will be available. That is, GTSPP participants will permit the selection of data from their archives based on quality flags as well as other criteria. These flags are always included with any data transfers that take place. Because the flags are always included, and because of the policy regarding changes to data, as described later, a user can expect the participants to disseminate data at any stage of processing. Furthermore, GTSPP participants have agreed to retain copies of the data as originally received and to make these available to the user if requested.

The implementation of the tests in this Manual requires interactive software to be written. The operator is consulted in the setting of flags or possibly in changing data values. In each case, information is provided to the operator to help them decide what action to take. The descriptions of the tests include certain specific items of information and data displays. So, for example, when a station position fails a test of platform speed, a track chart of the platform is used. The amount of information displayed and the presentation technique are dependent upon the hardware and software capabilities at the implementation site. For this reason, the information to be displayed and the method of presentation should be treated as recommendations.

2.0 QUALITY FLAGGING

The purpose of this Manual is to set standards for quality control of real-time data and to describe exactly the screening process that is employed. By reading this document, users may assess the applicability of the procedures to their requirements and thereby judge whether they need to do further work before using the data.

Attached to every profile is a number indicating the version of the Quality Control Manual which describes the tests employed. As the procedures documented by this Manual are expanded to include others or to refine the older tests, a new version flag will be assigned. It is recognized that the suite of tests performed will undergo modifications with time. For this reason it is necessary to record which version of the quality control procedures has been applied to the data. This version number is associated with updates to this Manual. The version applied is to be assigned to each profile as it is processed and to be carried thereafter with the data. This document constitutes version 1.0.

Also attached to every profile is a number that indicates which tests have been employed. This number is constructed as follows. Each test of the Quality Control Manual is assigned an index number that is a power of 2 (the first test is 1, the second 2, the third 4, and so on). The number that describes the suite of tests employed against a profile is the sum of the index numbers of the tests used. The index number is given with every test documented in this Manual. This sum is then written in base 16, where the digits 0 through 9 represent the numbers 0 through 9 and the letters A through F represent 10 through 15. As an example, if there are 10 tests and all are employed, the sum is 1023 and the Test Number is therefore 3FF.
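
As an illustration only, the encoding might be computed as in the following Python sketch (the function name is illustrative and not part of any Data Centre's software):

    def test_number(applied_tests):
        # applied_tests: the 1-based positions of the tests that were run.
        # Test i has index number 2**(i-1); the Test Number is the sum of
        # the indices of the tests applied, written in base 16.
        total = sum(2 ** (i - 1) for i in applied_tests)
        return format(total, "X")

    print(test_number(range(1, 11)))  # all of 10 tests applied -> "3FF"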

If a participating Data Centre applies tests other than those described in this Manual, it should supply documentation with the data to explain the other tests. The use of other tests is indicated by a version number for the Manual that has a digit in the hundredths place. So, for example, a version of 1.02 indicates that a Data Centre has used the tests described in version 1.0 of the QC Manual but has also applied other tests of its own (indicated by the digit 2). Each Data Centre may assign this last digit in a fashion suitable to its own operations.
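
A minimal sketch, assuming the version string convention above (the function name is illustrative):

    def decode_version(version):
        # Split e.g. "1.02" into the Manual version "1.0" and the
        # hundredths digit 2, which marks centre-specific extra tests.
        major, frac = version.split(".")
        manual_version = major + "." + frac[0]
        local_digit = int(frac[1]) if len(frac) > 1 else 0
        return manual_version, local_digit

    print(decode_version("1.02"))  # ('1.0', 2)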

The second type of flag is used to indicate the quality of the data. It is considered unproductive to attach to every observation a flag describing the result of each test performed, since this may result in numerous flags that generally would not be used. Instead, it is deemed necessary to be able to assign flags to individual data values or to groups of values to indicate the confidence in the value. Participants of the GTSPP have agreed that the following rules shall apply.

1. Both independent and dependent variables can have a flag assignment.
2. Data aggregations (in the case here these are entire profiles) can also be assigned a flag. The word 'element' used later therefore implies aggregations as well.
3. The flags indicating data quality are those currently used in IGOSS processing with one extension.

0 = No quality control has been performed on this element
1 = The element appears to be correct
2 = The element appears to be probably good
3 = The element appears to be probably bad
4 = The element appears to be erroneous
5 = The element has been changed
6 to 8 = Reserved for future use
9 = The element is missing
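
For reference, the flag codes above might be represented in software as follows (a sketch only; the names are illustrative, the codes are those listed above):

    from enum import IntEnum

    class QualityFlag(IntEnum):
        NO_QC = 0           # no quality control performed on this element
        CORRECT = 1         # appears to be correct
        PROBABLY_GOOD = 2   # appears to be probably good
        PROBABLY_BAD = 3    # appears to be probably bad
        ERRONEOUS = 4       # appears to be erroneous
        CHANGED = 5         # the element has been changed
        MISSING = 9         # the element is missing
        # codes 6 to 8 are reserved for future use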

The philosophy for flag assignment adopted by this Manual is that it is generally inadvisable to change data. Changes should be made only when it is clear what the change should be and when, without the change, the data would be unusable. It is expected that subsequent versions of the Manual will improve on this.

The test descriptions allow for inferring values for those that have failed the test procedures. The inference of a correct value is done at the discretion of the person doing the quality control. It should be based on information which is not available to the test procedure but which the operator has at hand and which assists in knowing what the correct value should be. Values should be changed only when the correct value is known with certainty. In the instance where a data value is changed, the original value is preserved and is available to users or to other tests if needed.
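
A minimal sketch of how a changed value might be stored so that the original remains available (the structure and field names are hypothetical):

    from dataclasses import dataclass, field

    @dataclass
    class Element:
        value: float
        flag: int = 0                                # flag codes of Section 2.0
        history: list = field(default_factory=list)  # (value, flag) before changes

        def change(self, new_value):
            # Replace the value, preserving the original, and flag as changed.
            self.history.append((self.value, self.flag))
            self.value = new_value
            self.flag = 5                            # 5 = element has been changed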

Finally, because quality assessment is shared over processing centres, it is possible that data flagged as doubtful by one centre will be considered acceptable by another or vice versa. Flags can be changed by any processing centre as long as a record is kept of what the changes are.

The use of the flagging scheme described here will meet the stated requirements of the GTSPP. It is recognized that as new testing procedures are developed, it will be necessary to re-examine data. With version flags preserved with the data, it will be possible to identify what has been done, and therefore how best to approach the task of passing data through newer quality control procedures.

3.0 INSTRUMENTATION KNOWLEDGE

It is recognized that knowledge of the instrumentation used to make an observation can be useful in the assessment of the quality of the data. Likewise, knowledge of the platform from which the data were collected can also be used. Where available, this instrumentation knowledge should be sent with the data to the GTSPP participants. The present version of this Manual suggests tests that make use of instrumentation knowledge if available. It is expected that subsequent versions of the Manual will improve on this.

4.0 TEST MONITORING

All processing centres should monitor the performance of their quality control tests. In this way, deficiencies can be identified and recommendations made to improve procedures. These recommendations should be sent to the Steering Group designated to maintain this Manual. They will be discussed and included as appropriate in subsequent versions of the Manual.

5.0 PRE AND POST PROCESSING

The quality control tests described in Annex B assume that a basic scrutiny has been applied to the data. Explicitly, the data have passed a format checking procedure which ensures that alphanumerics occur where expected and that no illegal characters are present. The tests do not assume that values of variables have been checked to see if they are physically possible.

None of the tests described here automatically assigns a quality flag without the approval of the person doing the quality assessment. When a value or element fails a test, a recommendation of the flag to be assigned is made. The person doing the quality assessment must then decide the appropriate flag to use from a list of recommendations. The tests do restrict the flags that may be assigned: a user is not free to assign an arbitrary flag to a value or element failing a test.

There is a need to find and remove data duplications. A check for duplicate reports is necessary to eliminate statistical biases which would arise in products incorporating the same data more than once. In searching, the distinction between exact and inexact duplicates should be kept in mind. An exact duplicate is a report in which all the physical variable groups (including space-time coordinates) are identical to those of a previous report of the same type from the same platform. An inexact duplicate will have at least one difference.

Annex A contains the algorithm proposed by the Marine Environmental Data Service for the identification of duplicates. It discusses the implementation of the technique for data received in both real-time and delayed mode. In the context of this Manual, only the discussion of the handling of real-time data is relevant. The algorithm is based on near coincidences of position and time. This means that tests 1.1 to 1.4 and test 2.1 of this Manual must be applied before duplications are sought. The basic criterion for a possible duplication is based on the experience of the TOGA Subsurface Data Centre: if stations are collected within 15 minutes or 5 km of each other, they may be duplicates. The stations identified as potential duplicates are then examined, along with their data, to resolve whether or not a duplication exists. The other quality control tests are then run on the output of the duplicates test. In this way, as little as possible is done before duplications are tested for.
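
A minimal sketch of the near-coincidence screen (15 minutes or 5 km), assuming stations carry a time and a position in degrees; the haversine formula here stands in for whatever distance computation an implementation actually uses:

    from math import radians, sin, cos, asin, sqrt

    def distance_km(lat1, lon1, lat2, lon2):
        # Great-circle distance in kilometres (haversine formula).
        dlat = radians(lat2 - lat1)
        dlon = radians(lon2 - lon1)
        a = sin(dlat / 2) ** 2 + \
            cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
        return 2.0 * 6371.0 * asin(sqrt(a))

    def possible_duplicate(s1, s2):
        # s1 and s2 have .time (a datetime) and .lat/.lon in degrees.
        minutes_apart = abs((s1.time - s2.time).total_seconds()) / 60.0
        km_apart = distance_km(s1.lat, s1.lon, s2.lat, s2.lon)
        return minutes_apart <= 15.0 or km_apart <= 5.0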

There will also be a need for scientific assessment of the data quality. This would involve subjecting the data to a different set of tests by applying knowledge of the characteristics of the processes from which observations have been collected. It may also be that more data may be gathered together so that more sophisticated statistical tests can be applied. As such tests become generally accepted and an established application procedure developed, they could be incorporated into the context of this Manual and become part of the regular screening process conducted by participants of this project.

6.0 QUALITY CONTROL TESTS

The complete set of tests is included in Annex B. Each description has a number of sections that are always present. A description of the information that each contains follows:

Test Name: This is the short name of the test. Each test is numbered for ease of reference.
Prerequisites: This describes what tests are assumed to have been applied before and what preparation of the data set is suggested before application of the test. It will also describe what information files are required.
Description: This section describes how the test is implemented and what actions are taken based on the results of the test.
History: This records any changes that have taken place in the test procedure and the date on which they were recorded. This section will record the evolution of a test procedure through the various versions of the Manual.
Rules: This section lists the rules that are applied to effect the various tests. Their numbering is for reference only, since they have been written so that they may be implemented in any order.

The tests have been grouped according to stages. The first stage is concerned with determining that the position, the time, and the identification of a profile are sensible. The second stage is concerned with resolving impossible values for variables. The third stage examines the consistency of the incoming data with respect to references such as climatologies. The fourth stage looks at the internal consistency within the data set. A fifth stage, the visual inspection described later in this section, follows the others.

The grouping of the tests suggests a logical order of implementation in that the simpler, more basic tests occur before more complicated ones. The order of presentation of tests within a stage does not imply an order in implementation. In fact, should a value be changed as a result of a test, the new value should be retested by all of the tests within the stage. Indeed, since data values can be changed, the implementation of these tests cannot take place in a strictly sequential fashion.
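
The retesting rule might be sketched as follows (an illustration, assuming each test is a function that returns True when it has changed a value in the profile):

    def run_stage(tests, profile):
        # Re-apply every test in the stage until a complete pass makes
        # no further changes to the profile.
        changed = True
        while changed:
            changed = False
            for test in tests:
                if test(profile):
                    changed = True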

The tests detailed by this Manual cannot be mutually exclusive in examining the various properties and characteristics of the data. As much as possible, each test should focus on a particular property to test if the data value or profile conforms to expectations. Modifications to old tests will be incorporated as they refine the focus of the test. New tests will be added to examine properties of data that are not adequately covered by this version.

Each of the tests has been written from the point of view that the data being examined have not been examined before. The difference this makes is that quality flag assignments do not check whether the flag has already been set to something other than 0 (meaning no quality control has been performed). If data have been examined before, the rules as written will need modification to check whether the flag has previously been set. Where a flag indicates the value was changed, the user should be informed of the original value of the data before another change is performed; if the flag is then reset to anything else, the changed value should be preserved in the history of the station. In other cases, where a flag is changed but the observation is untouched, it is not necessary to record the old flag, but simply to record that the data have passed through a second organization and the quality tests done there.
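
A sketch of this modified handling, assuming the hypothetical Element structure sketched in Section 2.0 (flag 5 meaning the value was changed):

    def reflag(element, new_flag, station_history, notify_operator):
        # If the value was changed earlier, show the operator the original
        # value before acting; preserve the changed value in the station
        # history when the flag is reset to anything else.
        if element.flag == 5:
            notify_operator(element.history[0])   # original (value, flag)
            if new_flag != 5:
                station_history.append((element.value, element.flag))
        element.flag = new_flag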

The tests described in stage 5 represent a visual inspection of the data as received and usually after all other tests have been completed. This stage is necessary to ensure that no questionable data values pass through the suite of tests employed without being detected. The testing and flagging procedure of this stage relies upon the experience and knowledge of the person conducting the test. As experience is gained with the tests contained within this Manual, the processes used in the visual inspection of stage 5 will be converted to objective tests included in other sections of the Manual. However, there will always be a need to conduct this visual inspection as the final judgement of the validity of the data.

7.0 SUGGESTED ADDITIONAL TESTS

Other tests that have been suggested are listed in Annex C. These have not yet reached the stage of being incorporated into the Manual but have been suggested as worthy of consideration. They are noted here so that participants may record their experiences with their use and so that they may be considered for future versions.

8.0 IMPLEMENTATION DETAILS

Annex D contains some details of how certain of the tests are implemented in particular cases. The purpose of their inclusion is to provide further information that may assist others in understanding the details of a test procedure.

9.0 ACKNOWLEDGEMENTS

Contributions to the contents of the Manual were made by J.R. Keeley, S. Levitus, D. McLain, N. Mikhailov, C. Noe, J.-P. Rebert, B. Searle, and W. White. Others have assisted with suggestions of how to improve tests and clarify the text. Information describing test procedures carried out by various organizations is noted in the Bibliography. This Manual reflects the knowledge described by those references.

10.0 BIBLIOGRAPHY

1. Guidelines for Evaluating and Screening Bathythermograph Data, ICES Working Group on Marine Data Management, September 1986, 4pp.
2. Note on the Checks Performed in Paris on BATHY and TESAC Data by the SMISO Centre (Note sur les contrôles effectués à Paris sur les données BATHY et TESAC par le centre SMISO), P. LeLay, Member of IGOSS OTA, 5 July 1988.
3. Guide to Data Collection and Location Services Using Service Argos, Marine Meteorology and Related Oceanographic Activities Report 10, WMO/TD-No. 262, Revised edition, 1988, 104pp.
4. Guide to Drifting Buoys, IOC/WMO Manuals and Guides 20, 1988, 69pp.
5. Ocean Temperature Fields, Northern Hemisphere Grid, 1985-1988, Office of Ocean Services, National Ocean Service, NOAA, June 1988.
6. Ocean Temperature Fields, Southern Hemisphere Grid, 1985-1988, Office of Ocean Services, National Ocean Service, NOAA, July 1989.
7. Guide to Operational Procedures for the Collection and Exchange of IGOSS Data, IOC/WMO Manuals and Guides 3, Revised June 1989, 68pp.
8. Personal Communication, N. Mikhailov, 19 September 1989.
9. Quality Improvement Profile System (QUIPS), Functional Description, R. Bauer, Compass Systems, 1987.
10. Seasonal Anomalies of Temperature and Salinity in the Northwest Atlantic in 1983, Canadian Technical Report of Hydrography and Ocean Sciences 74, March 1988.
11. Reiniger and Ross Interpolation Method, in Oceans IV: A Processing, Archiving and Retrieval System for Oceanographic Station Data, Marine Environmental Data Service Manuscript Report Series 15, 1970, pp. 40-41.
12. Marine Data Platforms - An Interactive Inventory, G. Soneira, W. Woodward and C. Noe, 7pp.
13. Data Monitoring and Quality Control of Marine Observations, W.S. Richardson and P.T. Reilly.
14. IOC/IODE Manual of Quality Control Algorithms and Procedures for Oceanographic Data Going into International Oceanographic Data Exchange, draft, 1989.
15. UNESCO Technical Papers in Marine Science 44, Algorithms for the Computation of Fundamental Properties of Seawater, UNESCO, 1983.