Delayed-Mode Data Duplicates Identification
Exact Duplicates Check
The GTSPP adds high-resolution, delayed-mode data to the GTSPP database. Each new file of delayed-mode data is first checked internally for records that exactly duplicate one another in
- date and time (year, month, day, hour, minute), and
- latitude and longitude (degrees, minutes, seconds, hemisphere), and
- data type.
In addition, each record in the new file is compared with the data in the GTSPP database to identify exact duplicate records. A database update file is then created from the input file, excluding all duplicates (whether within the file or between the file and the database). This prevents duplicate records from being inserted into the database.
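The exact-duplicate filter described above can be sketched in a few lines of Python. This is only an illustration, not the GTSPP software: the record layout (a dict with "time", "lat", "lon" and "data_type" entries) and the names exact_key and build_update_file are hypothetical. The sketch also assumes that the first copy of a record duplicated within the file is kept and only the later copies are dropped, which the text does not state explicitly.

```python
def exact_key(record):
    """Build the comparison key from the fields used by the exact check.

    Hypothetical record layout (not the GTSPP format):
      time: (year, month, day, hour, minute)
      lat, lon: (degrees, minutes, seconds, hemisphere)
      data_type: short string code
    """
    return (record["time"], record["lat"], record["lon"], record["data_type"])


def build_update_file(new_records, database_keys):
    """Return the records that duplicate neither the database nor each other.

    Assumption: when a record is repeated inside the file, the first copy
    is kept and the later copies are excluded.
    """
    seen = set(database_keys)      # keys already present in the database
    update = []
    for record in new_records:
        key = exact_key(record)
        if key in seen:            # duplicate of the database or of an earlier record
            continue
        seen.add(key)
        update.append(record)
    return update
```

Holding the keys in a set keeps each lookup at roughly constant time, so one pass over the file covers both the internal check and the check against the database.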
Inexact Duplicates Check
Periodically, the GTSPP database is checked for inexact (near-duplicate) records, in which two or more observations
- are of the same data type, and
- are within 15 minutes of each other in time, and
- are within 5 kilometers of each other in distance.
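The three criteria above can be sketched as a pairwise test in Python. This is a minimal illustration under stated assumptions, not the GTSPP implementation: the record layout (a dict with "data_type", "when", "lat" and "lon" entries, positions in decimal degrees) and the function names are hypothetical, the great-circle distance is computed with the haversine formula on a spherical Earth of radius 6371 km, and the 15-minute and 5-kilometer limits are treated as inclusive. The text does not say how the distance is actually calculated.

```python
import math
from datetime import datetime, timedelta
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical Earth is an assumption


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_near_duplicate(a, b, max_minutes=15, max_km=5.0):
    """True if two records meet all three near-duplicate criteria."""
    return (
        a["data_type"] == b["data_type"]
        and abs(a["when"] - b["when"]) <= timedelta(minutes=max_minutes)
        and haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km
    )


def near_duplicate_pairs(records):
    """Return index pairs (i, j), i < j, of records flagged for review."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if is_near_duplicate(a, b)
    ]
```

The sketch compares every pair of records, which is quadratic in the number of records. For a large database, sorting by observation time and comparing only records inside a 15-minute window would cut the work considerably.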
The following information from "near-duplicate" records is displayed on the screen for review:
- NODC accession number (identifies the source data set)
- program identifier (software that performed the most recent operation on this record)
- database load date
- number of profiles (1 = temperature; 2 = temperature and salinity)
- number of depth-data pairs
- platform code
- call sign
- latitude
- longitude
- observation date and time
- data type
- GTSPP database unique station identifier