GTSPP Database OverviewThe primary objective of GTSPP is to make the bulk of the GTSPP data searchable and retrievable by users over the Internet in a timely manner. , The database system shall simply require the user to step through the data and graphing feature selection process to visually browse the data. After the graphics session, the user may download the data from the image by setting the browser options for ftp transfer of data and clicking on the 'download' link at the bottom of the search result page. The user may also download the entire GTSPP archive data file that the data was extracted from by linking to GTSPP's Online Data Access. Therefore, the proposed system should meet the following functional requirements:,
- Allow users on the Web to issue queries for data by location, time, and parameter types.
- Allow users to visually browse the data of their choice prior to downloading.
- Provide multiple output file formats, including the NetCDF (Network Common data File) format, the database load format in the MEDS ASCII form, and an ASCII format chosen for its flexibility and similarity to commonly used transfer formats of commercial database and data analysis software.
Database DesignThe term database is a collection of related data stored in some organized way. In a relational database, data are stored in tables that are related to one another in a series of one to many relationships by common fields. These common fields are set as primary and/or foreign keys. The creation of relationships between tables using key fields allows for the enforcement of referential integrity. The GTSPP CMD uses the Oracle Procedure Language/Structured Query Language (PL/SQL) to develop the database and the Practical Extraction and Report Language (PERL) to develop the web interfaces and the PERL Database Interface (DBI) to develop the protocol of connectivity between the Web server and the database.
Steps in building and implementing the GTSPP Database are:,
- Create the Data Base table structures. Within the current
database design, the primary tables are as follows:
callsigns, checkout, datatype, dbaudit, identcodes, medsid, nprofs, observation, parinv, platcodes, platnames, profile, shipnos, stainv, station, stationhistory, stationmatch, surfacecodes, surfaceparms, tcallsigns, tdbaudit, tmedsid, tnprofs, tobservation, tparinv, tplatcodes, tplatnames, tprofile, tshipnos, tstainv, tstation, tstationhistory, tsurfacecodes, tsurfaceparms, twiline, and useraccess.
- Generate the indexes for each table. The current indexes being
used in the data base are based on current operational programs and
retrieval routines. As new routines and retrieval capabilities are
generated the DBA will have to determine if new indexes are needed
to optimize the new queries.
- Perform index maintenance to maintain optimization. What happens
with normal database updates, with updates to the various tables and
new data being added, the indexes become fragmented and don't perform
as they should. The indexes need to be dropped and recreated when this
- The GTSPP DBA must grant access to the various users of the data
base. There are 3 types of users one being privileged, one who
can only read the tables and programs.
- Create a synonym for each table accessible by users and programs,
which makes it easier users and programmers so that they don't have
to fully qualify the table name.
- Backup Procedures
The GTSPP production database on SQL server "leeward" is dumped once a week on Saturday at 0100 GMT. The procedures to dump the production database can only be run on server "leeward" and uses the Oracle export utility program. The export of the GTSPP database currently takes approximately 4 hours to run, which exports all indexes, synonyms, grants and stored procedures along with the data structures. The export file when completed is about 30 gigabytes in size and when the import to the outside web server is complete, the file must be compressed with program gzip to conserve disk space utilization.
- Restore Procedure:
If the database needs to be restored, you would use the latest exported dump file to restore the database. If a table is corrupted you only need to restore that particular table. If the entire database needs to be restoring, it is necessary to truncate all the tables and drop all the indexes before you import the database. All data loaded or updated in the data base after the restore point will have to be reloaded or updated.
- Meeting Reports
- GTSPP Training Course
- AOML QC Manual
- CSIRO "Cookbook"
- Database Overview
- Data Holdings and Streams
- Data Processing Flows Diagram(JPG)
- Delayed-Mode Data Duplicates Identification
- GTSPP Data Quality Flags
- GTSPP Leaflet (PDF)
- GTSPP Poster (PDF)
- Real-time Quality Control Manual (PDF)
- Real-Time Data Duplicates Identification
- XBT Fall Rate Discussion
- Unique Data Tagging Scheme