National Oceanic and Atmospheric Administration | NODC, National Oceanographic Data Center | Department of Commerce
NOAA Satellite and Information Service

Global Temperature-Salinity Profile Program


EXACT DUPLICATES

The U.S. National Oceanographic Data Center (NODC) adds high-resolution, delayed-mode data to the GTSPP database. Each new file of delayed-mode data is checked internally for records that are exact duplicates in

    • date and time (year, month, day, hour, minute), and
    • latitude and longitude (degrees, minutes, seconds, hemisphere), and
    • data type.

In addition, each record in the new file is compared with the data in the GTSPP database to identify exact duplicate records. A database update file is then created from the input file, excluding all duplicates (whether within the file or between the file and the database). This prevents duplicate records from being inserted into the database.
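The exact-duplicate screening described above can be sketched as a set lookup on the three matching criteria. This is a minimal illustration, not NODC's actual software: the `Station` layout, its field names, the string encodings, and the `build_update` function are all assumptions made for the example.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple

Key = Tuple[str, str, str, str]


class Station(NamedTuple):
    """Hypothetical record layout; not the GTSPP format."""
    obs_time: str    # date and time to the minute, e.g. "2010-10-27T19:09"
    latitude: str    # degrees, minutes, seconds, hemisphere, e.g. "35 30 00 N"
    longitude: str   # e.g. "075 15 30 W"
    data_type: str   # illustrative label, e.g. "XBT"
    source_id: str   # not part of the duplicate key


def duplicate_key(rec: Station) -> Key:
    """Fields that must all match for two records to be exact duplicates."""
    return (rec.obs_time, rec.latitude, rec.longitude, rec.data_type)


def build_update(new_file: Iterable[Station], database_keys: Set[Key]) -> List[Station]:
    """Return the records of new_file that duplicate neither each other
    nor anything already in the database."""
    seen = set(database_keys)      # keys already in the database
    update = []
    for rec in new_file:
        key = duplicate_key(rec)
        if key in seen:            # duplicate within the file or against the database
            continue
        seen.add(key)
        update.append(rec)
    return update
```

In this sketch the first occurrence of a record within the file is kept and later copies are dropped; the source does not say which copy is retained, so that choice is an assumption.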

INEXACT DUPLICATES

Periodically, the GTSPP database is checked for inexact (near-duplicate) records, in which two or more observations

    • are of the same data type, and
    • are within 15 minutes of each other in time, and
    • are within 5 kilometers of each other in distance.
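The three criteria above can be sketched as a pairwise test. This is only an illustration under stated assumptions: the source does not say how distance is computed or whether the limits are inclusive, so the sketch uses the haversine great-circle formula on a spherical Earth and treats both limits as inclusive. The `Obs` layout and all function names are hypothetical.

```python
import math
from datetime import datetime, timedelta
from itertools import combinations
from typing import List, NamedTuple, Sequence, Tuple

MAX_TIME_GAP = timedelta(minutes=15)   # from the criteria above
MAX_DISTANCE_KM = 5.0                  # from the criteria above
EARTH_RADIUS_KM = 6371.0               # mean radius; spherical Earth is an assumption


class Obs(NamedTuple):
    """Hypothetical observation layout; not the GTSPP format."""
    station_id: str
    data_type: str
    when: datetime
    lat: float   # decimal degrees, north positive
    lon: float   # decimal degrees, east positive


def distance_km(a: Obs, b: Obs) -> float:
    """Great-circle distance by the haversine formula."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def is_near_duplicate(a: Obs, b: Obs) -> bool:
    """True when all three criteria hold for the pair."""
    return (a.data_type == b.data_type
            and abs(a.when - b.when) <= MAX_TIME_GAP
            and distance_km(a, b) <= MAX_DISTANCE_KM)


def near_duplicate_pairs(observations: Sequence[Obs]) -> List[Tuple[str, str]]:
    """Station-identifier pairs flagged for operator review."""
    return [(a.station_id, b.station_id)
            for a, b in combinations(observations, 2)
            if is_near_duplicate(a, b)]
```

Comparing every pair is quadratic in the number of observations, which is fine for a sketch; a real check over a large database would presumably narrow the candidates first, for example by sorting on time.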

The following information from "near-duplicate" records is displayed on the screen for review:

    • NODC accession number (identifies the source data set)
    • program identifier (software that performed the most recent operation on this record)
    • database load date
    • number of profiles (1 = temperature; 2 = temperature and salinity)
    • number of depth-data pairs
    • platform code
    • call sign
    • latitude
    • longitude
    • observation date and time
    • data type
    • GTSPP database unique station identifier
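For illustration, the review listing above could be modeled as one row per record and printed as aligned columns. The `ReviewRow` class, its field names and types, and the `format_rows` helper are assumptions made for this sketch; the source does not describe the actual screen layout.

```python
from dataclasses import dataclass, fields
from typing import List, Sequence


@dataclass
class ReviewRow:
    """One near-duplicate record as shown to the operator (hypothetical layout)."""
    accession_number: str   # identifies the source data set
    program_id: str         # software that last operated on the record
    load_date: str          # database load date
    n_profiles: int         # 1 = temperature; 2 = temperature and salinity
    n_depth_pairs: int      # number of depth-data pairs
    platform_code: str
    call_sign: str
    latitude: float
    longitude: float
    obs_datetime: str       # observation date and time
    data_type: str
    station_id: str         # unique station identifier


def format_rows(rows: Sequence[ReviewRow]) -> str:
    """Render a header line plus one line per record, in aligned columns."""
    names: List[str] = [f.name for f in fields(ReviewRow)]
    table = [names] + [[str(getattr(r, n)) for n in names] for r in rows]
    widths = [max(len(row[i]) for row in table) for i in range(len(names))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in table
    )
```

Listing the candidate records side by side in this way lets the operator compare source, depth count, and load history at a glance before deciding whether any record should be deleted.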

During this interactive session, the operator decides which records, if any, are to be deleted from the database.

It is not always possible to determine from the above information whether two data records are actually duplicates (i.e., from the same observation). For example, the geographic positions may be the same, but the times differ by a few minutes and the number of depths differs between the two records. In those situations, neither record is deleted. When it is not possible to identify duplicates with confidence, we err on the side of keeping duplicate records rather than eliminating good data.

When deciding which record (if any) to delete, the operator takes into account the data source, number of observed depths, and state of data quality checking of each record.

  Last modified:    Wed, 27-Oct-2010 19:09 UTC NODC.Webmaster@noaa.gov
 
Dept. of Commerce - NOAA - NESDIS - NODC