NODC Responses to Comments Received during netCDF Template Public Review and Comment Period
Thanks to everyone who sent us comments during the Public Review and Comment Period, February 20 through March 12. We solicited your feedback on the three components of our NODC netCDF templates: 1) The set of 13 templates, in the ASCII representation of a netCDF file known as CDL along with their corresponding real-world examples; 2) A decision tree to help you decide which of the templates to use; and 3) A table providing guidance and recommendations on creating the netCDF variable and global-level attributes. We received 46 comments from more than 15 people, and have published our responses to them at http://www.nodc.noaa.gov/. The responses include a description of any action we will take, and by Monday, March 26, we will publish Version 1 of the templates. At that time we will also publish information on how to provide ongoing feedback over time, as we anticipate making revisions on an annual basis for the next few years.
A few key "principles" appear to be emerging based on these responses:
- Be explicit! Do your best to provide accurate values for the attributes, and if you don't know them, state that clearly. Avoid "NA" or "N/A" as they can have multiple meanings.
- Keep your attributes focused on the content of the file, not the overall collection to which it might belong. For broader information about the collection, consider providing an FGDC or an ISO 19115-2 (preferred by NODC) metadata record.
- Also remember that ACDD and CF tend to focus on what are sometimes known as "discovery" and "use" metadata. Basically, this kind of information focuses on helping you find the file and then use it in your application. The attributes don't try to capture every possible detail of the provenance or processing of the data, for example. Again, for richer information your best bet is to look to ISO 19115-2 as your metadata transfer standard.
- Our templates don't try to cover every possibility. We focused on the templates that we felt would cover most situations involving observations of the ocean. If your situation doesn't match one of the templates, that is ok! Just use as much of the templates as you can and consider using some of the more complicated representations detailed in the full CF documentation.
As we move forward we will refine our thinking about these "principles" and look at ways they may help us to better explain how to implement these templates for specific data projects.
Again, thanks to everyone who took time to read and consider the templates, and come back in one week when you will be able to download Version 1.0 of the NODC netCDF templates!
Kenneth S. Casey
Technical Director, NODC
|Comment or Issue:||Your name (optional)||NODC Response||NODC Action|
|The first decision in the decision tree should not ask if the data are "satellite or in situ"... instead it should just ask if they are arranged on a space-time grid or not... if a regular grid, then use grid.. if oriented in sensor coordinates, use swath.||Anonymous||Yes, this is a good suggestion.||NODC will publish the new decision tree to the NODC web site next week along with the revised guidance and templates.|
|Platform ID variable for glider/ROV/AUV data sets. Most of these types of platforms won't have call signs, IMO numbers, or WMO numbers. Is there a default for these platforms, or can these be left off of the variable as they are "RECOMMENDED"?||Frederick Bahr||We recommend that you be explicit whenever possible when
populating any of the global or variable-level attributes. For
platforms which do not have an associated WMO code (and you are certain
about it), you could simply omit the wmo_code attribute since it is not
relevant or better yet, be explicit and use the attribute, wmo_code="Not
applicable". However, if you are unsure of the wmo_code but believe one
exists, you might instead do something like this: |
platform_variable:long_name = "Sea Spray" ;
platform_variable:comment = "Example platform container variable" ;
platform_variable:call_sign = "Unknown" ;
platform_variable:ices_code = "320D"; // ICES codes at: http://www.ices.dk/datacentre/requests/Login.aspx
platform_variable:wmo_code = "Unknown";//Information on getting WMO codes is available at http://www.wmo.int/pages/prog/amp/mmop/wmo-number-rules.html
platform_variable:imo_code = "Unknown";
We recommend you avoid using ambiguous phrases like "NA" or "N/A", which might mean different things (e.g., Not Applicable or Not Available).||NODC will add the web site URL for finding out about WMO codes to both the guidance table and the template CDLs.|
|Many times we are acting as an intermediary to make the data
available from sources that don't have the resources or skills to do so
themselves. Many fields might have to be filled in with the best available information from the source.
|Frederick Bahr||We understand this challenge of relying on information provided by external sources, and in fact deal with it every day at NODC! So, please view these templates as "starting points" or guidelines to help you prepare more standardized data sets with less effort, and not as absolute requirements that must be fully completed before submitting your data to the NODC Archive. Again, as stated in the response to the comment above, we recommend you be explicit when representing your state of knowledge regarding the various attributes and not attempt to convey information through implicit assumptions.||None required.|
|Are Boolean flags required? or can you just have an enumerated flag.||Frederick Bahr||The CF Conventions permit the use of boolean flag variables, enumerated flag variables, and a hybrid flag variable type that is a combination of the two. Our experience has shown that this last case is incredibly difficult to explain and properly use, so we recommend that when using flag variables that you keep them strictly enumerated or strictly boolean, and avoid using the blended form. To directly answer your question, you are free to use just an enumerated flag variable if you want to.||None required.|
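To illustrate the distinction, a strictly enumerated quality flag might look like the following CDL sketch; the variable name, values, and meanings are illustrative, not part of the templates:

```cdl
byte temperature_qc(time) ;
    temperature_qc:long_name = "sea water temperature quality flag" ;    // illustrative name
    temperature_qc:standard_name = "sea_water_temperature status_flag" ;
    temperature_qc:flag_values = 0b, 1b, 2b, 3b ;                        // enumerated: exactly one value applies per sample
    temperature_qc:flag_meanings = "no_qc_performed good_data probably_good_data bad_data" ;
```

A strictly boolean flag variable would instead use flag_masks with flag_meanings, so each bit records one independent true/false condition; the blended form combines flag_masks and flag_values in one variable and is the form we suggest avoiding.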
|Thanks for the examples, very helpful.|
For the trajectory template:
- For Seagliders there are often multiple instruments that record their data on different sampling grids, hence different sampling times. Hence there will not be a unique 'time' variable. Should the 'coordinates' attribute therefore mention the proper associated time variable (and perhaps depth, lat, and lon variables as well)? (You have 'do not change' on this REQUIRED variable attribute, but your examples don't follow this.) They will all have the same standard_name attributes. Normally we compute the lat, lon, depth and time of the vehicle and let consumers compute implied lat, lon, and depth from a sensor time as needed. Are we required to compute these? For example, we might have an optode that is sampled infrequently compared with the glider. Thus we might have a variable aa4330_dissolved_oxygen(aa4330_obs) with ancillary_variables "aa4330_time", but should we add coordinates as well?
|Jim Bennett||To handle situations like this, we recommend that you separate the data from the instruments measuring at different frequencies into different netCDF files. We hope that someday soon CF will provide conventions for the netCDF-4 enhanced capability known as "groups", which makes it possible to represent variables with different time coordinate variables in a single file. However, for now, formatting these data in one file would require using the incomplete representation with many fill values for the lower frequency measurements. An approach like that could be inefficient and challenging for others to understand. Our NODC netCDF templates don't try to capture every possibility right now, so we recommend using your best judgment and following as many of the concepts documented in the templates, CF, and ACDD as seems reasonable. CF feature types work best if all the geophysical variables are measured at each sample point, or in other words, share the same coordinates along all the dimensions. If they are on different coordinates, it could be a problem for client applications.||No additional action at this time is required on the templates or guidance, but sufficient interest appears to exist to consider putting together a draft proposal to CF for including netCDF-4 groups in the convention.|
|- For the global 'sea_name', do we use the names or the ids for this field? E.g., "Davis Strait, Labrador Sea" or "15 15A" or both "Davis Strait 15, Labrador Sea 15A"? Is there a standard lat/lon bounding box definition for these seas so we can automatically deduce them?||Jim Bennett||We recommend that people use the actual sea name (example, "Labrador Sea" ) in the sea_name attribute. We are considering providing a web service for people to submit a coordinate and get all of the authoritative sea names that are affiliated with that area. However this will take some effort and we do not have a firm deadline in place yet.||No follow up action is required at this time, though we will continue working on the sea names web service.|
|- Why are __FillValue required for lat, lon, z, and time
variables? Are there accepted 'correct' fill values here? Should we use
inf or nan?||Jim Bennett||Jim, your question highlighted an error in the draft
templates. For coordinate variables like lat, lon, z, and time, there
should be no _FillValue. We'll fix this in both the templates and the
guidance table. |
For other variables, the templates also need to be modified to change _FillValue from Required to Recommended.
Regarding your question about "accepted" fill values to use, there is no single universally accepted fill value in use. However, if you are using the netCDF library to create your data, it will provide a default fill value for each data type (e.g. NC_FILL_CHAR, NC_FILL_BYTE, NC_FILL_SHORT, NC_FILL_INT, NC_FILL_FLOAT, and NC_FILL_DOUBLE in C). NODC recommends the use of these defaults over the use of "nan" or "inf."
|NODC will update the templates and guidance table to be clear that _FillValue attribute should not be used with coordinate variables, and that it is optional for other variables.|
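As a sketch of this guidance, a data variable can carry the netCDF default fill value for its type while the coordinate variable carries no _FillValue at all; the variable names below are illustrative:

```cdl
variables:
    double time(time) ;
        time:standard_name = "time" ;
        time:units = "seconds since 1970-01-01T00:00:00Z" ;   // coordinate variable: no _FillValue
    float sea_water_temperature(time) ;
        sea_water_temperature:standard_name = "sea_water_temperature" ;
        sea_water_temperature:units = "degree_Celsius" ;
        sea_water_temperature:_FillValue = 9.96921e+36f ;     // NC_FILL_FLOAT, the netCDF library default for float
```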
|- For keyword vocabularies, I note that GCMD recommends not
using their current vocabulary; should we avoid this? Further I assume the
AGU index terms are those for manuscripts at
||Jim Bennett||The following is the reference for the Global Change Master
Directory Scientific Keywords, which is a valid keyword vocabulary for use
in the NetCDF templates:|
Olsen, L.M., G. Major, K. Shein, J. Scialdone, R. Vogel, S. Leicester, H. Weir, S. Ritz, T. Stevens, M. Meaux, C. Solomon, R. Bilodeau, M. Holland, T. Northcutt, R. A. Restrepo, 2007.
NASA/Global Change Master Directory (GCMD) Earth Science Keywords. Version 6.0.0.0.0
|NODC will spell out 'NASA/GCMD Earth Science Keywords. Version 6.0.0.0.0' in the templates to avoid confusion.|
|Thanks for providing this guidance - this is going to be very helpful in the next version of the OceanSITES NetCDF specification. In reading through your documentation, I had some concerns, and I really appreciate having the chance to discuss them.||Nan Galbraith||Thanks Nan! We thank you for taking the time to provide such a careful and thoughtful review!||None required.|
1. The CF feature types do not adequately represent surface mooring data (2D data sets comprising multiple time series at different depths, with different sample rates and different variables measured at different points along the mooring line). Calling these profiles may preempt adequate description of instrument characteristics in the future.
|Nan Galbraith||We agree and understand the problem you are facing. We addressed a similar comment in item 6 above. Please see that for more information.||None required.|
|2. ACDD File dates: I applaud you for adding non-circular
definitions to the terms 'created' and 'modified.' However, in my group,
the file dates that need to be documented are a.) the date this version of
the data (the data values) were updated and b.) the date the current
NetCDF file was written. This combination tells the user whether he is
working with the most recent values, since the date the file was written
may just reflect formatting or metadata updates. |
Further, in our processing system, there is no concept of "date_created" as you define it: 'The date or date and time when the file was created. ... This time stamp will never change, even when modifying the file.' We never modify a file, we always overwrite, and there is no 'original' date available. Our work-around for ACDD file dates (created, modified and issued) is to use the date the NetCDF file was last written as date_created and date_issued, and the last date on which data values were changed as date_modified. If your convention is adopted, we will lose the ability to document the date of the version of the data values - a loss for us.
|Nan Galbraith||Nan, thanks for the comment; we see the dilemma you are facing. In your case, we would recommend following the idea that netCDF attributes should refer as specifically as possible to the file that contains them, so both your date_created and date_modified would be the same in a given netCDF file, even if it is an updated version of a previous netCDF file. References to the earlier version of the file could be achieved in other ways, perhaps using the history attribute or introducing a project-specific attribute like "version" as many other data projects do. However, if it makes the most sense in your system to do things the way you have described, then our recommendation would be to make sure that is clear in your comment attribute and perhaps other places where you document the data.||None required.|
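For example, under this recommendation a rewritten file might carry identical creation and modification stamps, with the data-version history conveyed elsewhere; the dates and version text below are hypothetical:

```cdl
// global attributes:
    :date_created = "2012-03-12T00:00:00Z" ;
    :date_modified = "2012-03-12T00:00:00Z" ;   // same as date_created: this file was written once, never modified
    :history = "2012-03-12 file rewritten with updated metadata; data values unchanged since 2012-01-05 (version 2)" ;
```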
|3. geospatial_*_resolution terms: The term 'resolution' has
too many meanings to be useful here, and the examples ('point' or '10
degree grid') add to the confusion. Is this sample density, or accuracy of
values? To be consistent, time_coverage_resolution describes data density,
the interval between records. |
The function of this term, IMHO, is to convey the accuracy of the coordinate values; in most surface mooring data, x, y, and z coordinates are usually simply best guesses, and this is important for data aggregation (although it can also be conveyed in the coordinate attributes - not sure it is needed as a global attribute). In any case, the density of the measurements (e.g. 10 degree grid) can be obtained from the coordinates themselves, and in the case of depths, can't be summed up as an attribute, since coverage will often vary with depth.
As a work-around, I'm using geospatial_*_accuracy attributes; their meaning seems more clear to me.
|Nan Galbraith||We interpret these attributes as an attempt to convey sample
density and not accuracy, and agree that there is some redundancy here. The
density of measurements can also be obtained from the CF coordinate
variables if the coordinates are on a regular grid. The redundancy occurs
because two different conventions are being used (CF and ACDD) each of
which doesn't necessarily assume the presence of the other in the same
file. We hope that in the future revisions to CF and ACDD will bring them
more fully in alignment with one another, but for now we strongly
recommend including both.|
Multiple reviewers raised issues about using the term "resolution" in a narrow sense, so if you are unsure what to do, please contact us at NODC with your specific situation and we'll work with you to figure out what makes the most sense.
|No follow-on actions are required at this time, but at some point in the future NODC may wish to advocate on behalf of the community for tighter coordination between CF and ACDD.|
|4. I tried using the decision tree, but could not get past the first step: 'Identify the CF feature type based on the coordinate relationship between the observation points'. My data is 'stacked time series stations' at multiple depths at a single x, y location, and this is just not on the list of feature types, as far as I can see.||Nan Galbraith||Nan, we agree and understand the problem you are facing. We addressed a similar comment in item 6 above. Please see that for more information.||We will attempt to explain more clearly in the decision tree what to do with cases like this that just don't fit well with the existing templates we've provided.|
|5. Guidance: I appreciate the additional definitions you've provided - this is a big help; e.g. you equate publisher and distributor, and you essentially define 'creator' as the principal investigator. I can't find these definitions/translations in the ACDD documentation. There are still some terms that could be defined more clearly, or where 'Please follow the guidance in the Description' could be expanded upon, but overall, the guidance document is very good.||Nan Galbraith||Thanks!||We'll review all of our guidance information to see where we can attempt to add some clarity. We can place some links in the introduction to the "Standard Attributes and Guidance Table" on the GEO-IDE Wiki to the ACDD and CF attribute definitions web sites. We can review each attribute with the phrase "Please follow the guidance in the Description" to see if we can explain further, but it would help if the reviewers could give us details as to which attributes were lacking in detail.|
|This form could be improved upon by using a checkbox to allow
multiple terms in the 'area of concern' below. I'll choose 'other concern'
since I'm commenting on all the areas in the pull-down menu. Thanks
again!||Nan Galbraith||Great suggestion. This is our first use of the Google Forms capability that NOAA now has as part of our enterprise adoption of Google services like email and calendar.||No follow-on actions are required at the moment but the next time we use a Google Form we will keep this suggestion in mind.|
|Just a suggestion regarding the guidance; I've been spending
quite a bit of time reading through this! One of the shortcomings of the
ACDD and its documentation is the lack of guidance on keywords. Should
these be general concepts like 'meteorology', or specific variable names,
like 'wind_speed'. Should they describe anything other than the types of
variables - perhaps something about the methodology - e.g. 'observational
I see you've included CF standard names as an option for the keywords vocabulary, which I guess indicates a leaning towards being very specific and describing the individual variables.
Since I have not found a really good option for a single set of keywords to describe my met and ocean data, I thought I'd take a look at some of the other code lists mentioned in the guidance document, NODC Data Types and NODC Observation Types. I'm unable to find these on line, though.
So, after this long comment, my actual suggestion: Could you add links to the NODC keyword vocabularies on your guidance page?
Thanks - Nan
|Nan Galbraith||The links to the various vocabularies were provided in the proper places; however, it seems that for some people these links did not work.||We will explicitly write out the URLs in the guidance table.|
|Most measurements made in the ocean (and also the atmosphere) usually rely on pressure (p in excess of atmospheric pressure) as the fundamental variable that indicates depth (or altitude). Oceanographic instruments have pressure sensors, but almost never true depth sensors. We don't drop a tape measure into the ocean! If temperature (T) and salinity (S) are measured at the same time, then water mass density (rho) can be computed and depth (z) derived by integrating the hydrostatic approximation dz=dp/(rho g) where g (a function of position) is the gravitational acceleration. If measurements of T and S are not available, then density must be approximated before depth can be computed; therefore two approximations are involved. Dynamical quantities such as the dynamic height and geostrophic velocity are computed between pressure levels, not depth levels. Because pressure is a fundamental variable from which depth is only approximated, why not have pressure (in dbar or bar or Pascals) be a basic unit of depth? If the depth (z in m) is approximated and data files originally at constant pressure increments are converted to those at constant depth increments, pressure accuracy is lost, and the results are not raw, basic measurements but instead derived results.||E. D. (Ned) Cokelet||Our recommendation is to record your best estimate for the
coordinate system in the proper units, like depth in meters, and also to
record the accuracy/uncertainty associated with the indirect estimates of
length or position. This approach could be as simple as recording
instrument attributes for the length measurements, or it may even involve
recording the raw measurements used to calculate distances, such as
pressure values that are used to compute depth.|
If conversion to meters represents a significant compromise in precision or accuracy, or if pressure is a more meaningful indicator of the vertical coordinate than depth as was suggested, then the geophysical variable for pressure could be referenced in the following variable attribute like this:
z:ancillary_variables = "pressure" ; //.........RECOMMENDED - List other variables providing information about this variable.
That way, the next person to come along and use the data can choose whether to use the processed depth values or to work directly from the pressure values provided in the ancillary geophysical data variable. In part this decision may depend on the processing algorithm that was used to convert from pressure measurements to depth, or the specific uses being considered by the user. One would typically also record the conversion method in the following variable attribute:
z:comment = "" ; //........... RECOMMENDED - Add useful, additional information here.
State, for example, the parameters used to convert from pressure to depth from the hydrostatic equation, or that the depth in meters was equated to values of pressure in decibars.
|NODC will review the text of our guidance to see if there are better ways to communicate the purpose of the coordinate variables for the interoperability of the data formatted using the netCDF templates.|
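Putting the two attributes above together, a minimal CDL sketch of this recommendation might look like the following; the variable names and comment text are illustrative:

```cdl
variables:
    double z(z) ;
        z:standard_name = "depth" ;
        z:units = "m" ;
        z:positive = "down" ;
        z:ancillary_variables = "pressure" ;
        z:comment = "Depth computed from measured pressure via the hydrostatic relation dz = dp/(rho g), with rho from measured T and S" ;
    double pressure(z) ;
        pressure:standard_name = "sea_water_pressure" ;
        pressure:units = "dbar" ;
```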
|The link to
timing out. Please check that the site is up so that I can proceed to
evaluate the templates. |
The connection has timed out
The server at geo-ide.noaa.gov is taking too long to respond.
I tried to access the page 3/7/2012 14:32 PST
|John Wilkin||John - we apologize for the inconvenience and are glad you were able to finally access the site.||None required.|
|In the time series template, there is a recommended global
attribute, "uuid". It is described as "Machine readable unique identifier for each file. A new uuid is created whenever the file is changed".
The NDBC software team has implemented a mechanism for generating netCDF files for NODC that allows us to provide modified netCDF files via a version number in file names and an associated history text file along with it. Therefore the original versions of data files are kept. When I see the definition of the uuid, my question is whether the uuid attribute is maintained within NODC only for your internal purposes to know the "version" of the data. If not, how is it used outside NODC?
|Jing Zhou||Thanks for this very interesting question. We are suggesting the inclusion of UUIDs to facilitate many possible future activities such as duplicate checking, version management, and rich inventories. The Federation of Earth Science Information Partners (ESIP, http://www.esipfed.org) conducted an extensive review of the many types of identifiers available and eventually focused on two, DOIs for citable data sets (defined as a collection of related granules) and UUIDs for granules. See the ESIP work on identifiers at http://wiki.esipfed.org/index.php/Interagency_Data_Stewardship/Identifiers and the ESIP Data Citation guidelines at http://wiki.esipfed.org/index.php/Interagency_Data_Stewardship/Citations/provider_guidelines.||None required.|
|geophysical_variable_1:coordinates = "time lat lon z" ; //... REQUIRED - Do not change|
Although there are no restrictions to the order of coordinate variables listed in the coordinates attribute as shown in the template above, we would like to have an option as
geophysical_variable_1:coordinates = "time z lat lon" ; //... REQUIRED
|Jing Zhou||Your comment highlights an error in our templates! You are right that the order doesn't matter.||We will modify "Do not change" to be something like "include the auxiliary coordinate variables and optionally coordinate variables in the list. The order itself does not matter."|
|It is great that NODC have provided these netCDF templates as
a way of example and best practice. The standard global attributes for
dataset discovery are also excellent practice on the metadata side.
Hopefully once more people start using these standards, re-usable
software will follow which can, e.g., visualise an aggregation of netCDF data
such as a contoured vertical cross section of temperature along a cruise track.
We envisage using the "profile" feature type (varying set of z off each profile) as a way of aggregating multiple XBT or CTD profiles, say from 1 cruise or 1 track. Such aggregations could be provided to data requestors or placed online in THREDDS.
|Andrew Walsh||Andrew - thanks for this comment; at NODC we also hope that more application developers will begin including support for these new CF feature types. It is our understanding that NOAA's PMEL, which develops and maintains the Live Access Server, is already working on that support, for example. Matlab R2012a also now completely and natively supports netCDF, so no third-party tools are needed, nor are special compiler options needed at installation time. This native support includes the use of OPeNDAP URLs instead of local file paths in all of the Matlab netCDF functions.||None required.|
|I wonder about using the "Incomplete array" approach versus
the "ragged array" approach. ( See netCDF CF conventions|
document version 1.6, http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.6/cf-conventions.pdf. Section 9.3.2 - "Incomplete
Multidimensional array" and sections 9.3.3+9.3.4 - "ragged array")
Looking at the template examples it seems
they all use the "incomplete array" approach. I can see that the incomplete array approach is expensive for space (pads with empty values up to longest no. of depths/times) yet may be simpler for reading/writing. Whereas the ragged array is more compact but more complex for reading/writing as it uses indexes.
It seems that one might choose from these 2 approaches e.g. if you needed transmission efficiency or were short on disc space you might like the ragged array approach. So my question is whether NODC will put up templates that show the "ragged array" approach as well?
Thanks and Regards,
Oceanographic Data Manager
|Andrew Walsh||That is right, we haven't used the contiguous ragged array representation in the templates and have chosen to focus on the incomplete array approaches. We wanted to start with the simplest representations first, which we believe will handle most of the data that we receive at NODC, and then move to more complex representations. Later versions of the templates might include the contiguous ragged array representation. So, if you feel that ragged arrays are a more appropriate way to represent your data, then please feel free to use them. Contact us and we will be happy to help with these more complicated representations.||We will attempt to be clearer in our decision trees and introductory material about this focus and our initial avoidance of the ragged arrays.|
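For readers weighing the two approaches, a contiguous ragged array for profile data (CF 1.6, section 9.3.3) can be sketched in CDL as below; the count variable's sample_dimension attribute is what links each profile to its run of observations. Names and sizes are illustrative, and a complete file would also carry lat and lon per profile:

```cdl
dimensions:
    profile = 2 ;
    obs = 7 ;                                // total observations across all profiles; no padding needed
variables:
    int profile(profile) ;
        profile:cf_role = "profile_id" ;
    int rowSize(profile) ;
        rowSize:long_name = "number of observations in this profile" ;
        rowSize:sample_dimension = "obs" ;   // marks this as the count variable for the obs dimension
    double time(profile) ;
        time:standard_name = "time" ;
        time:units = "days since 1970-01-01T00:00:00Z" ;
    float z(obs) ;
        z:standard_name = "depth" ;
        z:units = "m" ;
    float temperature(obs) ;
        temperature:standard_name = "sea_water_temperature" ;
        temperature:units = "degree_Celsius" ;
        temperature:coordinates = "time z" ;

// global attributes:
    :featureType = "profile" ;
```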
|Here is the JPL DE group review of the NODC metadata model.
The primary concern alluded to in the introductory email is that CF has not defined a swath model. However, the best practices defined by the Group for High Resolution Sea Surface Temperature (GHRSST) Project are one way forward that has proven to be quite successful. See the GDS 2.0 specification document, which can be found on www.ghrsst.org (https://www.ghrsst.org/files/download.php?m=documents&f=111107150103-GDS20r4.pdf)
|Ed Armstrong||Ed - thanks to you and the JPL Data Engineering group for your detailed and thoughtful comments. We agree 100% that the GHRSST approach documented in the GDS 2.0 has been shown to be quite successful, and we even considered putting forth a draft swath template based on the GHRSST specifications. However, in the end we decided to focus first on the feature types that CF had included in the new CF 1.6, and to worry about swaths a little later. In the meantime we were also hoping that the CF-Satellites group might come up with a draft swath feature type.||NODC will continue to monitor CF discussions, and if no swath feature type is defined in CF within about one year we will reconsider creating a template based on GHRSST's GDS 2.0 for Level 2 data.|
|One item that needs work in this proposed model is that it is short on provenance. More is needed in addition to the 'history' attribute to correctly document dataset provenance and lineage. For example, 'source' should at a minimum specify satellite data inputs for value-added datasets.||Ed Armstrong||We agree that the CF and ACDD standards don't include much in the way of provenance information. Some is captured with existing attributes, though not very much, and perhaps more could be included within the data files by defining new attributes or redefining some of the existing ones. However, our approach has been to remain true to the CF and ACDD conventions to the best of our ability, and to rely instead on our overall Archive Management System to maintain the richer provenance information. Our archival preservation model uses various internal tracking mechanisms and also tries to capture fuller provenance information using associated, structured metadata files conforming to FGDC and, more recently, ISO 19115-2. The ISO 19115-2 standard in particular can handle provenance information very well.||None required at this time, though NODC will continue to monitor what is happening in the preservation community with respect to provenance information.|
|Here are some more specifics, recommendations and errors
found in the proposed model|
• An indication of whether coordinates refer to the pixel center or a corner (e.g., UL) is needed for proper georeferencing
|Ed Armstrong||CF recommends using the attribute "bounds" to define the bounds of each pixel. If a coordinate variable has corresponding bounds variable, then the value in the variable itself doesn't matter as long as it is within the bounds. Georeferencing can then be done using the bounds.||None required.|
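As a sketch of that recommendation, cell bounds in CDL tie each coordinate value to the edges of its pixel; the 1-degree grid below is illustrative:

```cdl
dimensions:
    lat = 180 ;
    nv = 2 ;                          // number of vertices per cell (two edges in 1-D)
variables:
    float lat(lat) ;
        lat:standard_name = "latitude" ;
        lat:units = "degrees_north" ;
        lat:bounds = "lat_bnds" ;     // whether lat is a center or corner value, the bounds make the cell extent unambiguous
    float lat_bnds(lat, nv) ;
```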
|Grid mapping links should point to here: http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.5/cf-conventions.html#grid-mappings-and-projections||Ed Armstrong||We agree that information is useful, and already have a link to it in our guidance table.||None required.|
|geo_spatial_lon units --> 'degrees_east'||Ed Armstrong||Ed, good catch! It seems we fell victim to the old "copy and paste" problem.||We will change "degrees_north" in the geospatial_lon_units row to "degrees_east".|
|text discusses padding values, what should they be?? _FillValues should be explicitly designated||Ed Armstrong||Thanks Ed.||We will change the text to explicitly mention that padding should be done using the fill value (_FillValue) that is used for the variable.|
|'Project' attribute: how will this be used to describe a hierarchy, e.g., cruise/transect/station?||Ed Armstrong||Because of the various uses of the term 'project,' and the general inability of a netCDF attribute to easily capture hierarchical relationships, we recommend using the project attribute to most accurately describe the data that are represented in the file. Therefore, just reflect the project for which the data were collected, even if there is a broader program to which it pertains. Likewise, if the netCDF file has data from only one cruise of a multi-cruise project, then consider recording here the name of the cruise and not the overall project. More complicated relationships can be captured in an ISO 19115-2 metadata file, which can be referenced using the global attribute metadata_link. Then, using ISO metadata conventions, the hierarchy of the data in relation to a station/cruise/project/program could be specified more explicitly.||None required.|
|For attribute 'source': states 'The method of production of the original data' -- not correct. It's just the input data sources. Need a separate attribute for production aspects.||Ed Armstrong||Thanks Ed - this attribute often causes a lot of confusion since the word "source" does seem to clearly mean the input data sources. However, the language we used in our guidance table comes directly from the CF documentation at http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.6/cf-conventions.html#description-of-file-contents. We are trying to remain true to the standards as they are documented when possible, but clearly using the source attribute to identify input data sources would not be wrong.||None required.|
|A version ID of some sort (not uuid) is needed||Ed Armstrong||Currently, neither CF nor ACDD includes a standardized attribute to capture version information. We could have created a version attribute, but we followed the ESIP suggestion of using UUIDs for granules instead. Also, since our Archive Management System maintains strict version control of the data in the NODC Archive, we felt we did not need to burden producers with that additional attribute. However, as many already do, the data producer could just add a project-specific attribute to capture the version of the file. Or, the history attribute could possibly be used for this purpose as well.||None required.|
|'time coverage_resolution' is missing underscore: What is this attribute? Time difference from each time step? How does this relate to satellite data? Will it refer to pixel to pixel time difference?||Ed Armstrong||We don't interpret this attribute as the pixel-to-pixel time difference, but rather the difference in time stamps between two layers of the data in time. If there is only one layer, then this attribute should be set to "point". If the density of the layers along time is not uniform, then there would certainly be an issue in determining what value to put into this attribute, and the average value of the time steps could be used. ACDD says the attribute is supposed to provide an "idea" of the density of points in time. A single value might not be sufficient to provide the "idea" in many cases. We suggest setting this to whatever seems reasonable, and if it is truly not relevant to set it to "Not Applicable".||We will correct the guidance table for the missing underscore character.|
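For example (the values here are hypothetical), an ISO 8601 duration can express the spacing between time layers, while a single-layer file would use "point":

```
// Daily layers:
:time_coverage_resolution = "P1D" ;
// Only one time layer in the file:
:time_coverage_resolution = "point" ;
```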
|Attributes should be organized into separate global and variable tables.||Ed Armstrong||We discussed this possibility early in the effort and decided to keep all the guidance together in one place.||None required.|
|'Title' should have some guidelines of what to include, such as source, instrument, level, parameter. Something for google to easily index eventually!||Ed Armstrong||Ed, this is a good idea.||We will provide additional guidance and an example of a good title, which answers the questions "Who, what, where, when, why?"|
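A hypothetical example of a title that answers those questions might be:

```
:title = "Sea surface temperature collected by CTD casts from the R/V Example in the Gulf of Mexico, June 2011, for hurricane intensity research" ;
```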
|No list of required vs optional attributes||Ed Armstrong||Our templates follow ACDD, which only includes recommended attributes, and CF, which has some required and some recommended attributes. We considered putting a column on the guidance table that would specify REQUIRED or RECOMMENDED, but found it was not quite so easy as it sounds and the result seemed more confusing. Some attributes are required in some cases and not in others, for example. Instead, we chose to indicate REQUIRED/RECOMMENDED in the CDL templates themselves.||We will work in an ongoing fashion to improve the presentation of the templates and guidance, so will consider ways to add this information to the guidance table. For web presentation, things like pop-up information boxes with greater detail, for example, could be used.|
|No guideline on netCDF4 chunking (chunking can hurt performance if not done correctly)||Ed Armstrong||It is true we did not go into any specifics on the use of chunking and internal compression.||We will add some text to the section where we recommend the use of netCDF-4 and try to be more clear and specific about variable-level compression and chunking.|
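As a sketch of the kind of settings involved (the variable name and chunk sizes are hypothetical), ncdump -s represents netCDF-4 chunking and compression as special virtual attributes, which ncgen can also consume when generating a netCDF-4 file:

```
float sst(time, lat, lon) ;
        sst:_ChunkSizes = 1, 100, 100 ;  // one time slice per chunk
        sst:_DeflateLevel = 4 ;          // internal zlib compression
        sst:_Shuffle = "true" ;          // byte shuffle, often improves compression
```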
|Great Work! Just a few thoughts on documenting
An archived file will usually outlast the reliability of most links documented as an attribute inside that file. For this reason, archived datasets should recommend the use of persistent identifiers with an id authority (like in ISO 19115) for uniquely referencing items from file attributes. If wanted, a link could be generated using the id value in an attribute from a companion XSLT. Of course, reliable links such as those to a registry service or a DOI resolver are acceptable for archiving.
I suggest adding the following as global attributes.
In place of or in addition to ACDD 'id' and 'authority':
To accompany 'metadata_link':
For spatial reference system information:
:spatial_reference_system_id_code = "urn:ogc:def:crs:EPSG::4326" ;
:spatial_reference_system_id_authority = "International Association of Oil & Gas Producers (OGP)" ;
:spatial_reference_system_id_link = "http://www.epsg-registry.org/export.htm?gml=urn:ogc:def:crs:EPSG::4326" ;
|Philip Jones||These are great suggestions and we recommend data producers use them if they feel they can. However, one thing we struggled a lot with in the process of constructing these templates was the natural desire to add more and more attributes. So each time a new suggestion arose, we asked ourselves not only "Would this be useful?" but also other questions about the broad applicability weighed against burdening the templates with too many additional attributes. In the end we limited our additions to six simple ones we felt could easily be applied to nearly all data sets.||In moving forward, we will explore ways to collect additional proposed attributes in one place so they can be considered for inclusion in future versions of the templates. We will be soliciting ongoing suggestions for improvements to the overall guidance table, templates, and decision tree, so these newly proposed attributes could be part of that.|
|I did take a bit more detailed look at the table, noting that you
use the 8601 standard for date and time. I think it would be appropriate to
look at IERS and the related standards they recommend for time-keeping,
since the IT standard is dealing with Civil Time, not TAI (international atomic
time) and the relationship between atomic time and Earth rotation time.
There are two primary reasons for considering this difference:
1. IERS, as well as the Bureaus of Standards and the International Astronomical
Union are really the official recorders of time, not the community charged with
IT standards. Thus, the 8601 IT standard is not a scientific standard, but an
interchange standard. I recognize that NOAA is currently part of the Dept.
of Commerce, but I think that for scientific use, the scientific time keepers
should take precedence in setting the standard.
2. In dealing with climate, there is a need to keep a consistent tracking of
time on a scale that goes back to the Paleoclimate record, well before the
Julian-Gregorian mess in Europe (as well as the other religious calendars
that contribute). In doing conversions from civil time records to Fourier analysis,
it is important to avoid having the irregularities in the civil calendars producing
spurious frequency components in the spectrum. We did use Astronomical
Julian Date on both ERBE and CERES, but I think we made a mistake in allowing
the data management team to use months with the variable length of February.
Unfortunately, the average number of days in a month is 365.2422/12.0 which
is about 30.4 days.
I'll comment more on this issue and others I find later. Still think what you've
got is pretty helpful.
|Bruce Barkstrom||We believe these concerns are more about the standard itself rather than our templates and leave it to the submitter to convey them to the standards bodies if he feels it is warranted. Our feedback on these concerns: 1) ISO does not forbid scientific time keepers to be part of the standard making process and ISO is not just about IT standards. 2) CF does allow multiple calendars including Julian and Gregorian calendars and as far as we understand the situation, the libraries which are used to read dates do take Julian-Gregorian issues into account when converting dates and times.||None required.|
|The term "resolution" is often misused. If you check for
references on resolution
of imaging instruments, I believe you'll find that the term refers to the shortest
distance between statistically distinguishable signals, not to the smallest
distance between two points in the image grid. You may find
to be helpful.
Of course I'm aware that our modeling colleagues who work with GCM's that have
equal angle grids often refer to the discretization in angle as the model's resolution.
There is some discussion of the "resolution" of spherical harmonics based horizontal
spatial grids, although the definition may not be that straightforward.
One of the most important early works on the subject is
Backus, G. and Gilbert, F., 1970: Uniqueness in the inversion of inaccurate gross
Earth data, Phil. Trans. of the Royal Society, Series A. Math and Physical Sciences,
Vol. 266, No. 1173, pp. 123-192.
This paper shows that for measurements that are produced with a convolution of
a kernel and a continuous field of interest, the resolving power (in the sense of
distinguishing two features with statistical certainty) and the uncertainty of the
resolved field are inversely related to each other. In other words, if you try to
deconvolve the measurement to increase the resolving power (or resolution),
you decrease the certainty of the retrieved quantity.
Imaging instrument, such as MODIS, have Point Spread Functions (or their Fourier
equivalent the Modulation Transfer Functions) that are equivalent to Backus and
Gilbert's kernels for relating measurements to the quantities being measured.
Thus, such treatments as orthorectification of images can magnify the radiometric
uncertainty - even making the uncertainty a function of position within the retrieved field.
To put it a bit more bluntly, the GCMD definition of "resolution" is wrong.
|Bruce Barkstrom||Multiple comments have addressed concerns with use of the term resolution, and we thank you for the additional information and references. Please see our response to comment #13 above. Also, like the comment above, this concern has more to do with the standard itself than the NODC use of it in the templates. We will work with our data producers to use terms like this as carefully and accurately as possible.||None required.|
|Another concern with the CF Profile is a difference between
images that represent
near vertical views of the Earth, where the term "swath" may be useful, as compared
with instruments using time probing of vertical propagation or with instruments sensing
radiances that need to incorporate the direction and wavelength of the radiances
being sensed as independent degrees of freedom (fully equivalent to latitude, longitude,
altitude, and time). I am also not sure that the CF description is adequate to deal with
limb-sounding or stratospheric sounding instruments such as MLS that look sideways
and do vertical scans (or even rotating vertical scans).
One concrete example comes from vertically sounding lidar measurements, in which
three vertical shots may be combined together to produce a composite vertical profile.
Most swath imaging instruments (Landsat or MODIS) would find this process to be
strange, since it would average a sequence of pixels over several rows - reducing the
spatial resolving power. In other words, a vertical curtain of lidar (or radar) shots is not
equivalent in geometry and measurement treatment to a horizontal swath (or "carpet")
of measurements. At a more fundamental level, the physics of vegetation and geology
produces very different spatial and temporal patterns than the physics of clouds and aerosols.
|Bruce Barkstrom||Like the comment above, this concern has more to do with the standard itself than the NODC use of it in the templates.||None required.|
|Nice work. Any idea when the swath data type template will be written?||Andy Bingham||Andy - we don't have a specific timeline for creating a swath template. We are tracking the CF-Satellites discussion group and will follow their lead, but if there is no draft CF standard by the time we are ready to publish a Version 2 of these templates then we will likely propose a template based on GHRSST's L2P format (see the comment above for more on the GHRSST Data Specification).||None required.|
|I will pass this onto colleagues here in Oz. We have been
discussing the use of feature types and feature type catalogues to
describe oceanographic phenomena and defining community feature types.
Your work may be something we can use and build on. We will get back to
you with comments.||Greg Reed||Greg - thanks. Please let us know how your work proceeds and we hope these efforts prove useful to you.||None required.|
|The work looks very good to me, too. A real
contribution.||Steve Hankin||Thanks Steve! We appreciate your positive and enthusiastic comments.||None required.|
|A key issue from my standpoint is the requirement
that data include an estimate of uncertainty - and
there's an ISO standard on how to state that
uncertainty. That standard comes from the national
bureaus of standards, including NIST. I'll include
that in the list later today.
|Bruce Barkstrom||Thanks Bruce. We agree that estimates of uncertainty can be hugely useful. But specific content beyond that required by CF and recommended by ACDD is beyond the scope of these templates.||We will continue to encourage data providers and projects to develop useful estimates of uncertainty and include them with their netCDF data whenever possible.|
|My first question is about these
:time_coverage_start = "" ; //....................................... RECOMMENDED - Use ISO8601 for date and time. (ACDD)
:time_coverage_end = "" ; //......................................... RECOMMENDED - Use ISO8601 for date and time. (ACDD)
Typically the gridded dataset would consist of a number of years of data, but the individual files may be monthly or yearly.
Given that the ACDD metadata should be for the dataset (in this example spanning years), would you expect the time_coverage_start and time_coverage_end in each file to reflect the temporal range of the full dataset?
Particularly if the dataset is still growing with recent data.
Thanks for your thoughts on this,
|Victoria Bennett||Great question, Victoria. At NODC we believe the intent of attributes like this is to describe the data contained within the netCDF file, not the entire dataset spanning multiple files. We grapple with this concept of a "granule", the individual file, versus the "collection", the related set of files, all the time. The way we and most other archive centers approach this is to have the file-level attributes in the netCDF file describe that particular file, but then use an FGDC or, more recently, ISO 19115-2 compliant metadata record to describe the whole collection of related files. At NODC, our present discovery systems help you find the collection you want by searching through those collection-level metadata records, but we are now working on granule-level (file-level) discovery systems to help you find just the specific subset of files within the collection once you know which collection you want. This kind of two-tiered discovery system is just being explored but we hope to have some working prototypes soon, and having accurate granule-level (file level) metadata attributes is a key to its success.||None required.|
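To illustrate the granule-versus-collection distinction described above (the dates and link are hypothetical), a monthly file's attributes would cover only that month, with the full collection described in a referenced metadata record:

```
// This file (granule) holds one month of a multi-year dataset:
:time_coverage_start = "2011-06-01T00:00:00Z" ;
:time_coverage_end = "2011-06-30T23:59:59Z" ;
// Collection-level metadata lives in an ISO 19115-2 record:
:metadata_link = "http://example.org/collection-metadata.xml" ;
```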
|Thanks for using our data in the cdl examples- good idea! However I have a small quibble- in the table, our (USGS) data is described as "One or more moored profiling buoys with the exact same time and depth (z) values." In truth, this is an aggregation of 5 Seacats spaced along a mooring. They didn't profile (I believe profile means to move up and down or make measurements at multiple depths), each made repeated measurements at its own depth. The aggregation makes it appear to fit the CF profile definition, but I'd prefer what's in the table to be accurate. Maybe an ADCP example would fit this better?
Best wishes, Ellyn
|Ellyn Montgomery||Orthogonal feature types could be composed of data aggregated
from different sensors. One consideration here is that the measured
coordinate variables must be positioned at standard intervals. However,
even if the data from a sensor array conforms to the basic requirements of
the orthogonal feature type, the creator of the NetCDF data ultimately
makes the determination on whether an orthogonal aggregation is a suitable
way to represent the data.
In this example, it was informative to aggregate the data from different sensors, because this was the simplest way to indicate that the sensors were physically attached together on the same mooring. The sensors were ordered by their monotonically increasing depth positions, which was captured in the feature type by using a reference to the depth coordinate system. An assumption of using an orthogonal feature type is that the spacing of sensors along the mooring do not change depth throughout the deployment.
Alternatively, the data creator may choose to represent each sensor from a sensor array using a separate feature type, like a time-series for each sensor attached to a mooring. This may be justified for example if the best 'sensor level' data for the coordinate systems vary slightly from standard intervals required by the coordinate variable representation. In this case the sensor array data may still be aggregated as an incomplete multidimensional array representation, providing the opportunity to specify for each sensor the depth as a function of time and/or the time as a function of depth (i.e. different seacat sensors).
|NODC will provide improved overall guidance on what to do in cases not covered explicitly by the templates.|
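As a sketch of the orthogonal representation discussed above (names and sizes are hypothetical), five fixed-depth sensors on one mooring share a single time coordinate and a single depth coordinate:

```
dimensions:
        time = 1000 ;
        z = 5 ;  // five sensors at fixed depths along the mooring
variables:
        double time(time) ;
                time:units = "seconds since 1970-01-01T00:00:00Z" ;
        float z(z) ;
                z:units = "m" ;
                z:positive = "down" ;
        float temperature(time, z) ;
                temperature:coordinates = "time z" ;
```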