GTSPP Ad hoc meeting (TAMU, 4 Apr, 2000)
Attendees:
Thierry Carval - SISMER, France
Loic Petit de la Villeon - SISMER, France
Kurt Schnebele - NODC, USA
Charles Sun - NODC, USA
Yeun-ho Chong - AOML, USA
Bob Keeley - MEDS, Canada
1. Status of participants
MEDS reported that funding and support for GTSPP is stable. It is still an important program at MEDS and will continue to be supported.
NODC reported that, with the coming retirement of Doug Hamilton, Charles Sun will take on his GTSPP responsibilities. Funding is stable, but although NODC would like to add new resources, they have not been able to do so. NODC views the GTSPP philosophy of data management as the direction in which they wish to move. Possible changes include a switch to an Oracle database system, although this is not yet final.
AOML reports stable funding as well with continued support to carry out quality control functions and provide scientific advice to profile data management.
CSIRO reported (conversation between Rick Bailey and Bob Keeley) that their funding is stable. They have been working on semi-automated QC software and upon completion of this (no time provided) they would like to run all historical temperature data from the Indian Ocean through their new software.
Scripps reported (conversation between Warren White and Bob Keeley) that they continue to support the science centre function of GTSPP.
2. UOT CD
NODC reported that the problem of duplicates on the last CD was addressed some time ago. However, a recent test showed there was still an unresolved problem. It has been identified, and software written to make the correction. Tests are going on now. They fully expect the duplicates issue to be solved before data are placed on the next CD.
NODC has written software to write GTSPP data into netCDF. The structure of netCDF forces the data to be written with one station per file in order to keep the resulting file sizes reasonable. This will result in about 1,000,000 files on the UOT CD. These will be subdivided by ocean and 3-month period. All stations in each subdivision (fewer than about 15,000) will be compressed into a single file. In addition, there will be an index file listing the location, date, profiles and other information, which a user can read to identify profiles of interest. All data in compressed form will occupy about 550 MB.
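The subdivision by ocean and 3-month period can be illustrated with a minimal Python sketch. It is not NODC's software: the record fields ("id", "ocean", "date"), the label format and the function names are assumptions made for illustration, and the sample stations are invented.

```python
from collections import defaultdict
from datetime import date


def subdivision_key(ocean, obs_date):
    """Return a label such as 'indian_1996_Q2' for one ocean and 3-month period."""
    quarter = (obs_date.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
    return f"{ocean}_{obs_date.year}_Q{quarter}"


def group_stations(stations):
    """Map each subdivision label to the list of station ids that fall in it."""
    groups = defaultdict(list)
    for station in stations:
        key = subdivision_key(station["ocean"], station["date"])
        groups[key].append(station["id"])
    return dict(groups)


# Illustrative records only; the ids, oceans and dates are made up.
stations = [
    {"id": "s1", "ocean": "indian", "date": date(1996, 4, 2)},
    {"id": "s2", "ocean": "indian", "date": date(1996, 6, 30)},
    {"id": "s3", "ocean": "atlantic", "date": date(1996, 7, 1)},
]
groups = group_stations(stations)
```

Each key in groups would correspond to one compressed file, and the same keys could be used in the index file to tell a user which compressed file holds a given station.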
NODC is working with Bert Thompson of the WOCE DIU to identify high density XBT sampling along WOCE lines. There is some work to do on this, particularly with data from AX3. The high density data will be placed in a separate directory structure with one line-year-ship in each netCDF file. These data will also be in the other directory structure, so MEDS will prepare a description for the CD that will explain this.
MEDS had passed software to NODC that runs a wide variety of tests on the contents of the GTSPP format to ensure that information in the file is consistent with the GTSPP format requirements. The software is not yet operational at NODC, but they intend to continue to work on this. It was hoped that this software would be ready in time to use against the data to be placed on the next CD, but this is not certain.
MEDS summarized the products and content (excluding data) of the last CD and what might be included on the next. MEDS is assembling this part of the CD. NODC agreed to provide statistics of the numbers and types of data by ocean and year as they did last time. They also agreed to provide station location figures as they did before. NODC agreed to create figures that show the locations of the high density lines and some figures that display the temporal distribution as well. AOML agreed to review what they have and to notify MEDS if there are any updates. CSIRO may have some products for the Indian Ocean and MEDS will pursue this. SISMER agreed to generate data density figures as before as long as differences in the holdings of NODC and SISMER could be resolved in time (see discussion under item 3). They also agreed to notify MEDS if they thought other available products should be included.
Priority List of tasks for the next CD
This list summarizes work that needs to be done, or might be done, prior to production of the next CD. Tasks are listed from most to least important.
a) NODC: Resolve duplications and remove these from UOT CD files.
b) NODC, DIU: Identify and separate data collected along high density XBT lines.
c) NODC, SISMER: Compare NODC holdings to those at Brest to ensure that NODC holdings are as complete as possible.
d) CSIRO: Complete scientific QC of 1996 data and return these to NODC.
e) NODC, MEDS: Prepare statistics of data volumes and types by ocean.
f) NODC, MEDS: Create station maps of data on CD.
g) AOML, Scripps, CSIRO, MEDS: Assemble any additional products describing the data on the CD.
3. Review of scientific QC and data status
NODC reported some statistics on XBT data arriving between 1990 and the present. They show that over 100,000 profiles arrived within 1 year of the observation date. A further 50,000 arrived within the following year. This translates to a bit less than 50% of the total arriving in the first year, about 20% more in the second year, and about 20% more over the following two years, so that about 90% of the total received arrives within 4 years. Matching high to low resolution profiles, they found that even after more than 5 years, almost 30% of the real-time data had still not been replaced by delayed mode data. (Does this take into account the real-time profiles received from TAO, which will never come in delayed mode? If not, this could have a significant impact.)
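The arithmetic behind such arrival statistics is a running sum of receipts by delay, divided by the total. The Python sketch below shows the calculation only. The function name is hypothetical and the input counts are placeholders chosen to give round numbers; they are not the NODC figures.

```python
def cumulative_percent(counts_by_year):
    """Return the cumulative percentage of profiles received by the end of each year.

    counts_by_year[0] is the number received within 1 year of observation,
    counts_by_year[1] the number received during the second year, and so on.
    """
    total = sum(counts_by_year)
    if total == 0:
        raise ValueError("no profiles received")
    running = 0
    percentages = []
    for count in counts_by_year:
        running += count
        percentages.append(round(100.0 * running / total, 1))
    return percentages


# Placeholder counts, not NODC data: receipts in years 1 to 5 after observation.
example = cumulative_percent([50, 20, 10, 10, 10])
```

With these placeholder counts the fourth entry is the share received within 4 years of observation, which is the quantity quoted in the minutes.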
A discussion was held regarding regular reports on data flow and availability. NODC agrees to produce regular reports that apply to delayed mode data receipts. Some examples of what was done for real-time data were shown and NODC agreed to generate suitable data flow products to be posted regularly on their GTSPP web site.
All of the 1996 data with the exception of the Indian Ocean have been returned from the science centres. CSIRO were encouraged (by MEDS) to complete this QC and send the data back to NODC in time for inclusion on the next CD.
SISMER reported that the last exchange with NODC was in March, 1999, whereas NODC reported December, 1997. NODC agreed to check on this. However, all participants expected that there would be data at SISMER that are not at NODC and that would be valuable to include on the next CD. SISMER and NODC agreed to carry out an intercomparison as soon as possible. Since SISMER produced data density maps for the last CD and expressed interest in providing the same for the next version, it is urgent to reconcile any differences between the two archives.
NODC reported that they have received data from the U.S. Navy and that some 113,000 profiles are represented in the GTSPP archives. NAVOCEANO and NODC have started bi-monthly data exchanges. A difficulty is that no ship identification is supplied by the navy. However, there is agreement that some indication of which stations belong to different cruises will be given.
NODC has about 5,000 stations from the U.K. Navy. The contact has been through the WDC-A. NODC was encouraged to continue efforts to acquire these data.
The last GTSPP meeting was informed that JODC was intending to make available higher resolution data. NODC pursued this over the past year, but as yet these data are not available.
In conversation with Rick Bailey from CSIRO, MEDS was informed that the Japanese Far Seas Fisheries data were being made available through an ftp site. MEDS will investigate this and provide a contact to NODC for them to follow up.
4. Archive
The OPDB (Ocean Products Data Base) at NODC is the main data management system for all types of profile data at NODC. The GTSPP archive is a separate archive designed especially to serve GTSPP data providers. NODC reported that stations in GTSPP are also in the OPDB, but there are a significant number of stations in OPDB that are not in GTSPP. NODC stated that, in their reading, users are interested in having more information recorded about the "history" of data in the archives. In this respect, the GTSPP archive structure is better suited. They expect that the requirements of GOOS will begin a convergence in how data are handled in the two archive systems, so that the differences should lessen. This process is expected to begin this year.
SISMER reported that the Coriolis data base is a descendant of the TOGA/WOCE system. They stated that Coriolis does handle history records consistent with the philosophy of GTSPP.
MEDS described plans that are being made to change how the U.S. SEAS program manages data (information gained at the recent SOOPIP meeting). The main difference is that profiles to be sent in real-time will be sent ashore in full resolution. There, the BATHY or TESAC message will be constructed and distributed on the GTS. At the same time, the full resolution files can also be sent to interested parties. They also were asked by SOOP to include line information with each station. There are a couple of issues that impact the archives. First, SOOP is making a transition to more focused line mode sampling and would like this information reliably linked to the data. Second, it has been suggested that by attaching a unique tag to stations, the problem of identifying duplications and matching real-time and full resolution profiles would be eased.
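To illustrate why a unique tag would ease matching, the Python sketch below pairs real-time and full resolution profiles by a shared tag. The tag format has not been defined (the proposal is still to be drafted), so the "tag" field, the sample values and the function name here are purely hypothetical.

```python
def match_by_tag(real_time, full_resolution):
    """Pair real-time profiles with full resolution profiles carrying the same tag.

    Returns (matched, unmatched): matched is a list of (real_time, full_resolution)
    pairs, unmatched is the list of real-time profiles with no counterpart yet.
    """
    full_by_tag = {profile["tag"]: profile for profile in full_resolution}
    matched = []
    unmatched = []
    for profile in real_time:
        counterpart = full_by_tag.get(profile["tag"])
        if counterpart is not None:
            matched.append((profile, counterpart))
        else:
            unmatched.append(profile)
    return matched, unmatched


# Hypothetical tags; a real scheme would be assigned when the profile is collected.
real_time = [{"tag": "A001", "levels": 20}, {"tag": "A002", "levels": 18}]
full_resolution = [{"tag": "A001", "levels": 750}]
matched, unmatched = match_by_tag(real_time, full_resolution)
```

With a tag carried by both versions, matching reduces to a lookup rather than a comparison of position, time and profile shape, and the unmatched list gives the real-time profiles still awaiting replacement.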
The line information provided by operators cannot be relied upon. So, data centres are going to have to develop software tools to help assign line numbers. MEDS will be undertaking some software development in this respect for another project and so can experiment with what can be done.
SOOP asked Keeley to draft a proposal of how to assign and use unique tags to be attached to data. He was asked to circulate this to SOOP members and he will also provide this to GTSPP partners.
NODC reported that there is still not a clear enough definition of what tasks will be carried out by the U.S. GOOS centre and what relationship that centre will have with NODC. MEDS reported that GOOS plans in Canada were still being solidified. We should review this at the next meeting.
5. Semi-automated QC procedure development
At the SOOP meeting (as reported by MEDS), CSIRO reported that it is still working on semi-automated software tools that can be used to identify the fraction of the total data that needs closer scrutiny. They are testing present procedures on data from line IX1, with whose data they are very familiar. No time frame was given for when they expected the software to be ready.
AOML reported that they, too, were working on similar procedures. They use procedures such as comparing data to NCEP weekly analyses and to statistics from WODB98, among other tests. It is expected that these procedures will be completed by June of this year. AOML promised to provide documentation describing their procedures. MEDS and NODC expressed interest in adopting whatever procedures prove successful and can sensibly migrate to a data centre.
In both cases, the software being developed considered temperature only. Although other variables are important to GTSPP, it is expected that this will represent a step forward, and could be very helpful.
6. Support to CLIVAR/GOOS/ARGO/GODAE
The major point discussed here was related to Argo. MEDS described its role as the data centre for Canadian float data. SISMER will be handling the float data for France and others from Europe. The current expectation is that data will circulate on the GTS and likely in TESAC within 12 hours of collection. The major impact will be the rapid availability of full resolution data. The initial versions of these data will be available at the same time as the TESAC. Fully calibrated data is planned to be available within 90 days of data collection. Apart from expected content, the Argo full resolution files will contain many positions of the floats while at the surface although only one profile will be available. The PROVOR floats also record time, pressure, T and S at their parking depth and these form part of their data stream. In addition there will be engineering values, such as battery life, and so on, which are of interest to manufacturers and perhaps pertain to calibrations but are of questionable value to archive in the long term. If the GTSPP data structure is to be used for Argo data, this additional information must be managed sensibly.
Argo intends to manage their data files in netCDF. The data from each float will be held in a file, with profiles added each time the float surfaces and reports data. These files are to be shared among data centres. The details of this sharing, structure of the netCDF file, automated QC and so on are still under discussion.
7. Other Business
MEDS reported that it was receiving some MK12 data files from India. Software is now written to convert to the GTSPP data structure and to build BATHY messages for provision to the GTS. The BATHY files will be circulated on the GTS, and the full resolution data will be sent to NODC. This is an interim arrangement until India can make suitable arrangements in its own country to carry out these tasks.
MEDS raised the issue that although it and NODC use the same philosophy of assigning parameter codes for new data or metadata, there is no coordination between them. This has resulted in the same variable being assigned different codes at MEDS and NODC, but not (thankfully) different variables being assigned the same code. NODC offered to look into some way to make their tables more accessible to others with the idea that others could consult these tables first and if a new code was needed would ask NODC to update these common tables.
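The two kinds of conflict described above can be checked mechanically once both code tables are accessible. The Python sketch below assumes each table is a simple mapping from variable description to code. The table layout, the function name and the sample entries are all hypothetical and do not reflect actual MEDS or NODC codes.

```python
def find_conflicts(table_a, table_b):
    """Compare two {variable description: code} tables.

    Returns (same_variable_different_code, same_code_different_variable),
    each as a sorted list.
    """
    shared_variables = table_a.keys() & table_b.keys()
    same_variable_different_code = sorted(
        variable for variable in shared_variables
        if table_a[variable] != table_b[variable]
    )
    # Invert the tables to look up the variable assigned to each code.
    variable_by_code_a = {code: variable for variable, code in table_a.items()}
    variable_by_code_b = {code: variable for variable, code in table_b.items()}
    shared_codes = variable_by_code_a.keys() & variable_by_code_b.keys()
    same_code_different_variable = sorted(
        code for code in shared_codes
        if variable_by_code_a[code] != variable_by_code_b[code]
    )
    return same_variable_different_code, same_code_different_variable


# Made-up entries for illustration only.
centre_a = {"variable one": "AAA1", "variable two": "BBB1"}
centre_b = {"variable one": "AAA2", "variable two": "BBB1"}
different_codes, clashing_codes = find_conflicts(centre_a, centre_b)
```

Running such a check before assigning a new code would show whether the variable already has a code at the other centre.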