World Ocean Circulation Experiment
Global Data Resource
WOCE Data Products
1-2 Nov 2001, University of Delaware, Lewes, DE, USA
The meeting was opened by the chairman of the V3 Working Group (V3-WG), Reiner Schlitzer, who welcomed the Working Group and thanked Katherine Bouton and Debbie Booth for local arrangements. Attending the meeting were the V3-WG members Katherine Bouton, Steve Diggs, Penny Holliday and Bernie Kilonsky, DPC Chairman David Legler, and DIU staff Doug White, Patrick Conlon and James Crease. Apologies were received from V3-WG member Victor Zlotnicki who was unable to attend.
2. Improvements to the WOCE netCDF files
A review of the V2 netCDF files by Bouton, Holliday and King had revealed a number of inconsistencies between the different files. The review was conducted with two requirements in mind: the need for successful searching of the data inventories, and the need for an integrated look and feel to the WOCE data. The inventory requirement covers the information a search tool needs in order to select data on the basic variables of time, location, depth (or pressure), and the presence of a particular variable. The "integrated look" requirements cover such things as consistent variable names and units. The issues relating to both these requirements are discussed below.
There is little benefit in making suggestions to improve the data files and inventories if the DACs cannot achieve them in the timeframe remaining. In particular there is concern over the changes that the Float and CM DACs (both closed in 2001) can make at this late stage. However, the V3 WG agreed that the suggested changes were necessary to achieve the level of integration the DPC requires. The onus is on the DPC chairs, Nathan Bindoff and David Legler, to present the arguments to the DACs and help them conform. The CM and float DACs may need help from the DIU, but it was agreed that a complete re-writing of their files by another DAC should be avoided because of the risk of errors.
Action 1: By Friday 9 Nov 01 the V3 WG will produce a list of requirements for the files and inventory tables, and a time schedule for changes to be made. Bindoff/Legler will communicate them to the DACs immediately.
The requirements as agreed by the V3 WG are set out in the document compiled by Katherine Bouton (attached .xls file).
Some specific points were raised in addition to those in the document:
It is not acceptable for any of the "required variables" (woce_date, woce_time_of_day, latitude, longitude, depth or pressure) to be global attributes (as some are in CM files at the moment). The reason for this is that placing them in the global attributes makes parsing and searching much slower. Instead the requirement is for a 1-dimensional variable, with the appropriate variable attributes.
The WG recommended that units should fit COARDS or EPIC conventions wherever possible (e.g. degree_N and degree_E). It was decided that having attributes data_min and data_max was preferable to a single data_range because it made for faster parsing.
A case was made for including sea surface temperature as an additional key variable since it was envisaged that users would want to distinguish between surface and subsurface temperature, and that SST data sets do not always include a depth or pressure variable.
Later in the meeting the issue of the time variable was raised and discussed at length. See Section 8 for a summary.
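The required-variable layout described above can be sketched in CDL, the netCDF text notation. This is an illustrative example only: the dimension sizes, attribute values and the inclusion of a temperature variable are invented here, and the exact names should follow the Bouton requirements document.

```
// Hypothetical CDL sketch: the required variables held as
// 1-dimensional variables (not global attributes), each with
// variable attributes. Values are invented for illustration.
netcdf example_station {
dimensions:
    time = 1 ;
    depth = 10 ;
variables:
    int woce_date(time) ;
        woce_date:units = "yyyymmdd" ;
    float woce_time_of_day(time) ;
        woce_time_of_day:units = "hhmmss" ;
    float latitude(time) ;
        latitude:units = "degree_N" ;
    float longitude(time) ;
        longitude:units = "degree_E" ;
    float depth(depth) ;
        depth:units = "meters" ;
    float temperature(time, depth) ;
        temperature:units = "degree_C" ;
        temperature:data_min = 2.1f ;
        temperature:data_max = 18.4f ;
}
```

Because the required variables are ordinary 1-dimensional variables, a search tool can read them directly without parsing free-text global attributes.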
3. Providing Guidance to the User
The WG recommended that the user be provided with a short guide to the integration software and process (a list of FAQs). Some example questions might be:
What type of information is provided by the DIU software?
What traps are there to look out for?
What standards do the files comply with?
Where can additional data be found?
What are the different data streams?
A summary of the netCDF and ascii files that are presented
A link to the netCDF primer
An explanation of EXPOCODES and where to find a definitive list
Action 2: Bernie Kilonsky and Penny Holliday to compile the FAQ page for the V3 issue. They will request input from all the DACs.
4. Inventory Issues
The following discussion was concerned with the inventory tables for the static CDs only, since they are the highest priority at this time.
The basic requirements of the inventory table are for the user to be able to search on the following and have a filename and path returned (DPC14):
Depth or pressure range
The presence of variables
Experiment identifier (e.g. Expocode)
i) Searching on ranges of key variables
The WG discussed whether the inventory tables should allow a user to search for a range in the 6 key variables (temperature, salinity, sst, u, v, sea_level). The advantage is that it gives the user more sophisticated searches; the disadvantage is that it adds 12 columns (a min and a max for each of the 6 variables) to the inventory table and thus makes the tables larger and searches slower. On balance the WG decided that the ranges (data_min and data_max) of the 6 key variables should be included. It was unclear whether the script that the DIU had used to create their inventory tables from V2 files could be modified to pick out data_min and data_max rather than just the presence of a variable.
Action 3: KAB to discuss with Shawn Smith (the originator of the script), though the WG was aware that individual DACs were developing their own scripts to create inventory tables.
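As a sketch of what range searching over a flat inventory implies, the following stand-alone Python fragment selects files whose data_min/data_max interval overlaps a requested range. The column names and file entries are invented for illustration, not taken from the DIU tables.

```python
import csv
import io

# Toy tab-delimited inventory; the real tables come from the DACs and
# would carry more columns (lat/lon, time, expocode, pathname, ...).
INVENTORY = (
    "filename\ttemperature_min\ttemperature_max\n"
    "atl/a01_ctd.nc\t2.1\t18.4\n"
    "pac/p02_ctd.nc\t1.5\t6.0\n"
    "ind/i05_ctd.nc\t0.9\t3.2\n"
)

def search_range(table_text, var, lo, hi):
    """Return filenames whose [var_min, var_max] overlaps [lo, hi]."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    hits = []
    for row in reader:
        vmin = float(row[var + "_min"])
        vmax = float(row[var + "_max"])
        if vmin <= hi and vmax >= lo:  # interval overlap test
            hits.append(row["filename"])
    return hits

print(search_range(INVENTORY, "temperature", 5.0, 10.0))
```

The overlap test is the reason both data_min and data_max are needed: presence of a variable alone cannot answer a range query.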
ii) Optimising the inventory table format
The inventory tables will be provided to the DIU as flat ascii (tab-delimited), and the software at its current stage searches one large flat ascii file. However the DIU is aiming to make searching as efficient and quick as possible, and is investigating the possibility of keeping the inventory tables in a database which can be distributed freely on the CDs.
Action 4: DIU to report to DPC15 their decision on ascii vs. database for inventory tables.
iii) Encouraging generation of inventory tables by the DACs
The WG recommended that the DIU put online a file containing the required inventory headings and examples for the DACs to view. The DIU re-stated that they will accept inventories that contain a subset of a DAC's holdings for testing in their search engine.
Action 5: DIU to put current inventories and search tool online to demonstrate to all DACs the present functionality.
Action 6: DIU to send to DACs a template of the inventory table structure that they require.
iv) The special case of the UOT tables
The UOT inventory tables were chosen as a test case by DPC14 because of the vast numbers of individual files that make up the data set - around 1 million profiles. The WG spent some time discussing the way in which the UOT inventory tables could be managed. The suggestion is that the UOT DAC provide a two-tiered inventory based on their existing zipped and tar’d data files:
Top level table (one line per tar’d file of one basin and one quarter, ~ 120 lines)
Second level, 120 sub-tables (one line per profile, ~5000 lines)
The search engine will then be able to search only the necessary second-level tables according to the user's initial requirements.
The high density (HD) XBT files should be treated separately from the low density files. The HD files should be in a separate table again, with one line per profile. The search engine must be able to distinguish between UOT LD and UOT HD.
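The two-tiered lookup can be illustrated with a small Python sketch; the table names, basin/quarter keys and profile filenames are all invented for illustration.

```python
# Tier 1: one entry per tar'd file of one basin and one quarter
# (~120 entries in the real table); values name the sub-tables.
TOP_LEVEL = {
    ("atlantic", "1994Q1"): "uot_atl_1994q1.inv",
    ("atlantic", "1994Q2"): "uot_atl_1994q2.inv",
    ("pacific", "1994Q1"): "uot_pac_1994q1.inv",
}

# Tier 2: one sub-table per tar'd file, one line per profile
# (~5000 lines each in the real tables).
SUB_TABLES = {
    "uot_atl_1994q1.inv": ["prof_0001.nc", "prof_0002.nc"],
    "uot_atl_1994q2.inv": ["prof_0003.nc"],
    "uot_pac_1994q1.inv": ["prof_0004.nc"],
}

def search_uot(basin, quarters):
    """Consult the top-level table first, then open only the
    sub-tables it points at."""
    profiles = []
    for quarter in quarters:
        sub = TOP_LEVEL.get((basin, quarter))
        if sub is not None:
            profiles.extend(SUB_TABLES[sub])
    return profiles

print(search_uot("atlantic", ["1994Q1", "1994Q2"]))
```

The benefit is that a search touching one basin and a few quarters never opens the vast majority of the ~1 million profile lines.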
v) Minimising the size of inventory table files
The WG discussed the ways in which the filespace taken by the inventory tables could be minimised. Suggestions included:
a) For files where the min and max of a variable are equal (e.g. lat/lon for profiles), the max field can be left empty. The search software can recognise this convention as meaning the min and max are equal. This suggestion was accepted by the WG.
b) Some variable min and max could have reduced precision in the inventory tables, e.g. one decimal place for latitude and longitude, temperature, u, v, sea_level, and 2 decimal places for salinity. After discussion this suggestion was rejected since little space was saved and some information was lost.
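Suggestion a) above can be captured in a few lines of Python; the function name is hypothetical.

```python
def read_min_max(min_field, max_field):
    """Decode an inventory min/max pair: an empty max field means the
    max equals the min (e.g. lat/lon of a single profile)."""
    vmin = float(min_field)
    vmax = vmin if max_field == "" else float(max_field)
    return vmin, vmax

print(read_min_max("35.5", ""))     # single-point value
print(read_min_max("2.1", "18.4"))  # genuine range
```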
vi) Standardising variable units
The units of the key variables (e.g. temperature, sea_level) need to be consistent within the inventory tables even if they differ between files from different DACs. For example the netCDF files from PODAAC have the units for SST in 1/100 °C, and sea_level in millimeters.
Action 7: the PODAAC must supply inventory tables with data_min and data_max expressed in degree_C and metres.
The WHPO has 3 possible units for temperature that reflect the temperature scale by which the measurements were made: ITS90, IPTS68 and degree_C.
Action 8: Schlitzer to contact Jim Swift to ask whether it was possible to use the standard degree_C units in the netCDF files, and add the temperature scale as a global attribute.
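The conversions required of the PODAAC by Action 7 are simple scalings, sketched here in Python (the function names and sample values are illustrative):

```python
def podaac_sst_to_degC(sst_hundredths):
    """PODAAC SST is stored in 1/100 degC; inventories need degree_C."""
    return sst_hundredths / 100.0

def podaac_sea_level_to_m(sea_level_mm):
    """PODAAC sea level is stored in millimetres; inventories need metres."""
    return sea_level_mm / 1000.0

print(podaac_sst_to_degC(1853))     # -> 18.53
print(podaac_sea_level_to_m(7250))  # -> 7.25
```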
vii) Pathnames in the inventory tables
These must have forward slashes (/) for compatibility with PCs, unix and Macintosh. The pathname should begin “./” so it can be used from either the disks themselves, or a directory mounted onto a hard disk.
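A Python sketch of how such a pathname could be resolved against a user's CD mount point or copied directory tree (the base path shown is illustrative):

```python
import posixpath

def resolve(base, inv_path):
    """Join a base directory (a CD mount point or a copy of the disks
    on a hard disk) with a "./"-relative inventory pathname.
    posixpath keeps forward slashes on any operating system."""
    if not inv_path.startswith("./"):
        raise ValueError("inventory paths must begin with ./")
    return posixpath.normpath(posixpath.join(base, inv_path))

print(resolve("/mnt/cdrom", "./whp/a01_ctd.nc"))
```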
viii) CD Name, Pathnames and Filenames
It became apparent that the inventory table will also need a column for the CD name (CD_NAME, max 8 characters) in order to allow the user to find the files. It was considered most useful to have this as a separate field from the pathname. Zip file names will also need to be separate from the pathname and unzipped filenames.
ix) Global gridded fields
It was noted that some datasets are global gridded fields and as such a search on location only will return all the files (e.g. satellite data).
Action 9: The DIU is requested to implement in the search tool a method to allow the user to choose to exclude any of the data streams.
Action 10: The Kilonsky/Holliday FAQ document to explain the data stream granularity and search pitfalls such as this.
5. WHP Data Search Engine
Steve Diggs gave a demonstration of the WHPO online search engine (whpo.ucsd.edu/cgi). The search engine returns a subset of cruise tables according to the search criteria. It works by searching flat ascii files ($ delimited) using a cgi (perl) script. The ascii files are updated weekly from the main WHPO information database. The advantage of the cgi search over Java is that it was quicker to write and can be used online by low-memory computers. Drop down lists of some search criteria were not used since they proved to be too long to be useful.
The WG were impressed by the usefulness of the search engine, noting that it was a good integration tool.
Action 11: The WG encourages the WHPO to continue to develop and maintain the search engine, and to provide a link to it from the main WHPO data pages.
6. Status of DODS servers at WOCE DACs
The DACs present who had installed DODS servers were asked to comment on the ease of installation and usage. The consensus was that the installation was relatively straightforward, but that use of the servers by the community was limited. Servers are currently installed at WHPO, Fast SL, Surface Met, Cersat, PODAAC, and are currently under development at ADCP and MEDS. The WG noted that while use by the community appeared limited at present, the experience gained by the DACs was useful and ready to put to use in the context of the V3 data online.
7. Climate Data Portal
Bernie Kilonsky gave a brief overview of the CDP. Currently the CDP includes data from PMEL (TAO and hydrography), UH (sea-level) and MEDS (GTSPP). For a centre to be part of the CDP it must install particular database management software that CORBA (the transport mechanism) recognises. When a user performs a search using CDP (http://www.epic.noaa.gov/cdp/) the software looks for recognised servers and retrieves data. Future plans are to include ARGO profiles, WOCE and CLIVAR data, and DODS proxy servers. The data must follow Epic conventions (http://www.pmel.noaa.gov/epic/document/convention.htm) at the moment, though CDP are intending to expand to include other standard formats. Development is ongoing through a proposal funded for 2002. The main issues for development are expanding the data bases and increasing the functionality. He concluded by remarking that CDP may not be appropriate for the V3 static issue, but could be useful for future online versions.
8. The Time Variable
The woce_date and woce_time_of_day variables were devised to present users with a time variable that they could visually recognise if they chose to dump the netCDF files into an ascii file. This decision was made with low-tech users in mind. However it was noted that this format is distinctly computer-unfriendly and results in the program ncbrowse (a staple program for users unfamiliar with netCDF) being unusable for data sets with these variables as the only time information. ncbrowse does work on the V2 surface met data, because they express time as months and years since a time origin, a format recognised by ncbrowse because it is COARDS compliant. COARDS is the "Co-operative Ocean/Atmosphere Research Data Service", a NOAA/university co-operative for the sharing and distribution of global atmospheric and oceanographic research data sets (see http://www.unidata.ucar.edu/packages/netcdf/conventions.html).
It was strongly argued by the WG that the V3 netCDF files should contain a time variable that was COARDS compatible in order to make the WOCE netCDF recognisable to ncbrowse, Ferret and other existing applications. The suggested new variable name is "time" and the format is supplied by the COARDS convention document:
Extracted from http://ferret.wrc.noaa.gov/noaa_coop/coop_cdf_profile.html:
"Time or date dimension:
Coordinate variables representing time must always explicitly include the units attribute; there is no default value. A time coordinate variable will be identifiable by its units, alone. The units attribute will be of character type with the string formatted as per the recommendations in the Unidata udunits package version 1.7.1 (http://www.unidata.ucar.edu/packages/udunits/index.html). The following excerpt from the udunits documentation explains the time unit encoding by example:
seconds since 1992-10-8 15:15:42.5 -6:00
indicates seconds since October 8th, 1992 at 3 hours, 15 minutes and 42.5 seconds in the afternoon in the time zone which is six hours to the west of Coordinated Universal Time (i.e. Mountain Daylight Time). The time zone specification can also be written without a colon using one or two-digits (indicating hours) or three or four digits (indicating hours and minutes).
The acceptable units for time are listed in the file udunits.dat. The most commonly used of these strings (and their abbreviations) includes day (d), hour (hr, h), minute (min), second (sec, s), year (yr). Plural forms are also acceptable. The date string may include date alone; date and time; or date, time, and time zone. It is recommended that the unit "year" not be used as a unit of time. Year is an ambiguous unit as years are of varying length. Udunits defines a year as exactly 365 days.
A time coordinate variable is identifiable from its units string, alone. The udunits routines utScan and utIsTime can be used to make this determination. (*Note that at the time of this writing the author of this draft profile had not tested these routines personally.)"
(End of extract)
Action 12: The WG requests that each DAC add a new variable "time" that is COARDS compliant, to be in addition to the variables woce_date and woce_time_of_day. The DACs may choose their own time origin and format, as long as they are COARDS compliant.
** this action item needs to be modified when a consensus is reached**
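To illustrate why a COARDS "units since origin" time variable is machine-friendly, here is a minimal Python decoder. It handles only the simplest forms (whole units, zero-padded origin, no time zone, no fractional seconds); real software would use the udunits library.

```python
from datetime import datetime, timedelta

def decode_time(value, units):
    """Decode simple COARDS/udunits time encodings such as
    "days since 1990-01-01 00:00:00"."""
    unit, _, origin = units.partition(" since ")
    origin_dt = datetime.strptime(origin.strip(), "%Y-%m-%d %H:%M:%S")
    seconds_per = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}
    return origin_dt + timedelta(seconds=value * seconds_per[unit])

print(decode_time(36.5, "days since 1990-01-01 00:00:00"))
```

A generic tool needs only the units string to interpret the numbers, which is exactly what ncbrowse and Ferret rely on.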
9. Ocean Data View as an Integration Tool
Reiner Schlitzer gave an overview of Ocean Data View (ODV; http://www.awi-bremerhaven.de/GEO/ODV) as a potential tool for integrating the WOCE data. He noted that ODV would be an add-on product rather than being part of the WOCE data product (as "eWOCE"; http://www.awi-bremerhaven.de/GEO/eWOCE/). The software currently runs on Windows and Sun Solaris (not online) and needs a fast connection to the data files to be effective. Data must be in the ODV binary format, which is designed to be useful for profile data (unlike netCDF!) and maintains quality flags. It is optimised for compact data storage and fast access. ODV provides interactive data access, analysis and visualisation. It has automatic inventory functions, and extensive data selection functions (regions including irregular polygons, date/time, availability, name, data quality etc.). It can import major oceanographic data formats (including WOCE) and exports ascii or ODV binary files. At present eWOCE contains WHP CTD and bottle files, and UOT data.
Recent developments underway for ODV include built-in support for netCDF files (i.e. it can interrogate the netCDF files without the need to convert to ODV binary); compatibility with Linux, Unix and Mac OS X; and platform independence of the ODV binary files. Planned for the time of the V3 issue are access to all other WOCE data files and multi-platform compatibility. Distribution of the WOCE data files in ODV format plus the ODV software for all supported platforms could be on 4 CD-ROMs or one DVD (see also section 12 below).
The small disadvantage of ODV is that the binary files do not support all the non-critical information contained within the WOCE netCDF files, and thus could not replace the netCDF files. Some files are subsampled to make them a manageable size (e.g. drifter data), though the user retains the option to import the full resolution files if they wish to.
The V3 WG was very excited by and impressed with the new version of ODV. With the inclusion of all the WOCE data types the ODV will be a very effective and easy tool for users requiring ascii files and for users who are less familiar with oceanographic data.
Action 13: The V3 WG congratulates Reiner Schlitzer on the progress with ODV and strongly encourages him to have the multi-platform version with all WOCE data available for the V3 issue.
10. The DIU Search Engine
Patrick Conlon demonstrated the DIU Java search tool that he has been developing. He spent much of the summer creating inventory tables because many DACs did not send them, but despite this has made good progress with the search engine. Currently the inventory tables are stored in a PointBase database that the Java tool can access (this may change). It takes 1-3 minutes to search the 50,000 lines (excluding the UOT table; 15-20 mins including UOT). The user will need only a web browser (version 4 upwards) and no other software. The user can draw a rectangle on the live map and choose a time period from drop-down menus. A "results" file is created which the user can view; this file is written to the user's hard disk, which created some security problems that may need further work. The tool currently works on a PC, and is soon to be tested on a Mac and on Unix. Currently it searches on lat/lon and time only, though the final product will include more parameters. The DIU intends to have a fully functional version of the search engine by DPC15 in March 2002.
Action 14: DIU to write a readme file to guide users through the search engine functions (also reachable by a "Help" button?). It was suggested this includes an explanation of the pathname, to help users only familiar with Windows Explorer and web browsers who have never encountered pathnames.
Action 15: The Kilonsky/Holliday FAQ document should link to the DIU guide. Both should link to the Bindoff netCDF primer.
The V3 WG was pleased with the progress of the DIU search engine. The generation of a list of filenames and pathnames is a big step forward and the DIU were congratulated.
11. The Giant Leap from a List of Files to the Data Files
The WG then discussed how to facilitate the big step from a file list to the data files themselves. Unix users who may have copied all the CDs to a disk drive will be able to write a simple script using the text results file to retrieve the data. The notion of a simple html link was suggested, though this may require the user to swap CDs and hence may not be effective. It was also acknowledged that the presence of zipped files complicates the generation of direct links. It was suggested that the DIU give the user the choice between receiving a text file and an html file with local links (they could perhaps specify the location of the data files - hard disk or disk drive?). This was agreed to be the best solution for users of the entirely static version.
However in a flash of inspiration from Legler and Holliday the V3 WG came to a workable solution for users with online access. The entire V3 holdings (all unzipped) could be copied onto an online server, and the DIU search engine (online and included in the CD set) could give the user the option to generate DODS links to that server (in addition to plain text and local html links). That way the user with internet access could instantly view the file metadata and download at will. The server would simply be a copy of the V3 data holdings and as such would require minimum maintenance (as opposed to future online versions of WOCE data that will evolve in time and therefore require continuous maintenance). It was recommended that the server have a moveable domain name (e.g. wocedata.org).
Action 16: Legler and DIU to investigate potential location for the online server.
Action 17: DIU to implement 3 choices for the user's results file; plain text, local html links, and DODS server links.
Action 18: DIU to inform Peter Cornillon of this decision, to become familiar with DODS and creating DODS URLs, and to solicit help from Cornillon if necessary.
It was recommended that the DIU use the Fast SL data which are already present as a DODS server as a test case.
Action 19: Bernie Kilonsky to provide an inventory of his online SL files to the DIU.
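The three result-file choices of Action 17 could be generated along the following lines. This Python sketch assumes inventory pathnames begin "./"; the server name wocedata.org is the example domain suggested above, and the URL layout is illustrative only.

```python
def format_results(paths, style):
    """Render the hit list as plain text, local html links, or DODS
    URLs on a hypothetical copy of the V3 holdings."""
    if style == "text":
        return "\n".join(paths)
    if style == "html":
        return "\n".join(f'<a href="{p}">{p}</a>' for p in paths)
    if style == "dods":
        # strip the leading "./" before appending to the server root
        return "\n".join("http://wocedata.org/dods/" + p[2:] for p in paths)
    raise ValueError("unknown style: " + style)

hits = ["./whp/a01_ctd.nc", "./uot/prof_0001.nc"]
print(format_results(hits, "dods"))
```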
12. Media for V3: CDs vs. DVDs
Throughout the meeting and this report the default media for the V3 has been CDs. However the V3 WG is aware that there will be a high number of CDs and that it will be desirable to have fewer disks, something that DVDs could provide. It is also aware that there are compatibility issues with DVDs, and concern over the lack of widespread use of DVDs. The decision to go with CDs or DVDs will be made at DPC15 when more information is available.
The CD production timeframe was set out by DPC14 (the following is extracted from that report):
The V3 WG added/amended the following details:
14. Next meeting
Finally it was agreed that the V3 WG would meet the day before the DPC15 meeting in Hobart to discuss issues arising from the recommendations made by this meeting (Monday 18 March 02).