National Centers for Environmental Information (NCEI), formerly the National Oceanographic Data Center (NODC)

NOAA Satellite and Information Service

Delayed-Mode Data Duplicates Identification

Exact Duplicates Check

The GTSPP adds high-resolution, delayed-mode data to the GTSPP database. Each new file of delayed-mode data is checked internally for records that exactly match one another in

  • date and time (year, month, day, hour, minute), and
  • latitude and longitude (degrees, minutes, seconds, hemisphere), and
  • data type

In addition, each record in the new file is compared with the data already in the GTSPP database to identify exact duplicate records. A database update file is then created from the input file, excluding all duplicates (whether within the file or between the file and the database). This prevents duplicate records from being inserted into the database.
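
The exact-duplicate check described above can be sketched in Python as follows. This is a minimal illustration, not the GTSPP software: the field names in `KEY_FIELDS`, the dictionary record layout, and the function names are all hypothetical. It also assumes that the first copy of a record duplicated within the file is kept, which the text above does not specify.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

# Hypothetical field names for the three parts of the duplicate key:
# date/time to the minute, position as degrees/minutes/seconds/hemisphere,
# and data type. The real GTSPP record layout is not given in this document.
KEY_FIELDS = (
    "year", "month", "day", "hour", "minute",
    "lat_deg", "lat_min", "lat_sec", "lat_hem",
    "lon_deg", "lon_min", "lon_sec", "lon_hem",
    "data_type",
)

Record = Dict[str, Hashable]
Key = Tuple[Hashable, ...]


def duplicate_key(record: Record) -> Key:
    """Return the tuple of values that must all match for an exact duplicate."""
    return tuple(record[name] for name in KEY_FIELDS)


def build_update_file(new_records: Iterable[Record],
                      database_keys: Set[Key]) -> List[Record]:
    """Return the records that are safe to insert into the database.

    A record is skipped if its key was already seen earlier in the same
    file or is already present in the database.
    Assumption: the first occurrence within the file is kept.
    """
    seen_in_file: Set[Key] = set()
    update: List[Record] = []
    for record in new_records:
        key = duplicate_key(record)
        if key in seen_in_file or key in database_keys:
            continue  # exact duplicate: leave it out of the update file
        seen_in_file.add(key)
        update.append(record)
    return update
```

Building the key as a tuple lets both comparisons (within the file and against the database) be done with set lookups, so each incoming record is examined once.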

Inexact Duplicates Check

Periodically, the GTSPP database is checked for inexact (near-duplicate) records, in which two or more observations

  • are of the same data type, and
  • are within 15 minutes of each other in time, and
  • are within 5 kilometers of each other in distance.
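
The three criteria above can be sketched in Python as follows. This is an illustration under stated assumptions, not the GTSPP implementation: the `Obs` record type, its field names, and the function names are hypothetical. Distance is computed here with the haversine great-circle formula on a spherical Earth of radius 6371 km, whereas the document does not say how GTSPP measures distance. The thresholds are treated as inclusive.

```python
import math
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple, Tuple


class Obs(NamedTuple):
    """Hypothetical observation record with only the fields the check needs."""
    station_id: str
    data_type: str
    time: datetime
    lat: float  # decimal degrees, north positive
    lon: float  # decimal degrees, east positive


MAX_TIME_GAP = timedelta(minutes=15)
MAX_DISTANCE_KM = 5.0
EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical Earth


def distance_km(a: Obs, b: Obs) -> float:
    """Great-circle distance between two observations (haversine formula)."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing the value just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def near_duplicates(observations: Iterable[Obs]) -> List[Tuple[str, str]]:
    """Return station-identifier pairs that meet all three criteria."""
    ordered = sorted(observations, key=lambda o: o.time)
    pairs: List[Tuple[str, str]] = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.time - first.time > MAX_TIME_GAP:
                break  # sorted by time, so later records are even further away
            if (first.data_type == second.data_type
                    and distance_km(first, second) <= MAX_DISTANCE_KM):
                pairs.append((first.station_id, second.station_id))
    return pairs
```

Sorting by time first means each observation is compared only with those that follow it within the 15-minute window, rather than with every other record in the database.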

The following information from "near-duplicate" records is displayed on the screen for review:

  • NODC accession number (identifies the source data set)
  • program identifier (software that performed the most recent operation on this record)
  • database load date
  • number of profiles (1 = temperature; 2 = temperature and salinity)
  • number of depth-data pairs
  • platform code
  • call sign
  • latitude
  • longitude
  • observation date and time
  • data type
  • GTSPP database unique station identifier
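
The review fields listed above could be held in a simple record type like the one sketched below. The class name, attribute names, attribute types, and the tab-separated layout are assumptions made for illustration. The document specifies which items are shown, not how they are stored or formatted.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Iterable


@dataclass
class ReviewRecord:
    """Hypothetical container for the items shown for each near duplicate."""
    accession_number: str   # NODC accession number (source data set)
    program_id: str         # software that last operated on the record
    load_date: str          # database load date
    num_profiles: int       # 1 = temperature; 2 = temperature and salinity
    num_depth_pairs: int    # number of depth-data pairs
    platform_code: str
    call_sign: str
    latitude: float
    longitude: float
    observed_at: datetime   # observation date and time
    data_type: str
    station_id: str         # GTSPP database unique station identifier


def format_for_review(records: Iterable[ReviewRecord]) -> str:
    """Render a header row plus one tab-separated row per record."""
    names = [f.name for f in fields(ReviewRecord)]
    lines = ["\t".join(names)]
    for record in records:
        lines.append("\t".join(str(getattr(record, n)) for n in names))
    return "\n".join(lines)
```

Placing near-duplicate records on adjacent rows with identical columns makes it easier for a reviewer to see which fields differ between them.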