Store-file specification

The model fit data produced by NestFit are organized hierarchically, by pixel and by model type (i.e., the number of components fit), in an HDF5-based store. This page documents the internal structure of the data and metadata in the store. FITS images and cubes may be produced for a subset of the data products.

Store Description

All data products are stored in a special directory with the extension .store. For parallel cube-fitting, multiple HDF5 “chunk” files are placed in this directory so that each process may write separately without locking. The entries within each chunk are then soft-linked into the primary table.hdf file (without copying). For more information on HDF5 soft links, see the h5py documentation. The table.hdf file stores the metadata, the cube header, and the aggregated products created from post-processing.

The store directory has the following structure:

- <NAME>.store
    - chunk0.hdf
    - chunk1.hdf
    - ...
    - table.hdf

where the files chunk<N>.hdf are the HDF5 files created by each process.
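As a rough illustration of how per-process chunk files can be exposed through a single table file without copying, the sketch below builds a toy store with h5py. It assumes the cross-file links are HDF5 external links (h5py.ExternalLink), since HDF5 soft links proper point to paths within one file. The text above only says "soft-linked", so that choice, the store name toy.store, the pixel indices, and the dummy data are all assumptions for illustration and not NestFit's actual implementation.

```python
import os
import tempfile

import h5py
import numpy as np


def build_toy_store(root):
    """Create <root>/toy.store with one chunk file linked into table.hdf."""
    store_dir = os.path.join(root, "toy.store")  # hypothetical store name
    os.makedirs(store_dir)
    # A worker process writes the fit results for pixel (lon=3, lat=7).
    with h5py.File(os.path.join(store_dir, "chunk0.hdf"), "w") as chunk:
        group = chunk.create_group("/pix/3/7/1")
        group.create_dataset("posteriors", data=np.zeros((10, 6)))  # dummy data
    # The table file only records a link to the chunk entry; no data is copied.
    with h5py.File(os.path.join(store_dir, "table.hdf"), "w") as table:
        lon_group = table.require_group("/pix/3")
        # Assumption: cross-file links are external links, with the chunk
        # file name given relative to the directory holding table.hdf.
        lon_group["7"] = h5py.ExternalLink("chunk0.hdf", "/pix/3/7")
    return store_dir


with tempfile.TemporaryDirectory() as tmp:
    store_dir = build_toy_store(tmp)
    with h5py.File(os.path.join(store_dir, "table.hdf"), "r") as table:
        # The path resolves through the link into chunk0.hdf.
        linked_shape = table["/pix/3/7/1/posteriors"].shape
        # Inspect the link itself instead of following it.
        link = table.get("/pix/3/7", getlink=True)
        link_target = (link.filename, link.path)
```

Reading through table.hdf in this way leaves the chunk files as the only copy of the per-pixel data, which is why the chunk files must stay alongside table.hdf inside the store directory.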

The specification and layout of the data in table.hdf are given in the following section. The table file may be accessed directly from the attribute nestfit.HdfStore.hdf, which is an instance of h5py.File. Please see the h5py documentation for a description of HDF5 files and how to use them.

Note that data product arrays are strided in the C convention, with the fastest-varying index furthest to the right (the last axis). Because the latitude and longitude indices come last, the arrays are optimized for displaying maps of a given parameter combination.
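The effect of C-ordered striding can be checked on a small NumPy array. The sketch below uses a toy array with axes (m, b, l); the shape and the float64 data type are arbitrary choices for illustration.

```python
import numpy as np

# Toy product with axes (m, b, l): 2 components, 4 latitude, 5 longitude pixels.
cube = np.zeros((2, 4, 5), dtype=np.float64)

# In C order the last axis (l) has the smallest stride, given in bytes.
strides = cube.strides

# Fixing the leading index leaves a (b, l) map that is one contiguous block.
component_map = cube[0]
map_is_contiguous = component_map.flags["C_CONTIGUOUS"]

# Values along the leading axis at a fixed pixel are spread through memory.
pixel_values = cube[:, 1, 2]
pixel_is_contiguous = pixel_values.flags["C_CONTIGUOUS"]
```

Reading a whole map for fixed leading indices therefore touches one contiguous block, while collecting all values at a single pixel requires strided access.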

Specification

The data stored in the HDF table file have the following specification. Group names are indicated by a "*", attributes by a "-", and datasets by a "=" followed, in parentheses, by the number of dimensions (n) and the key code for each axis (the key codes are listed after the specification). Child items are indicated by indentation. Group and dataset names can be joined by a "/"; for example, a valid path to the dataset posteriors in the group '/pix/0/0/1' would be hdf['/pix/0/0/1/posteriors']. Attributes are accessed with group.attrs['<NAME>'].

* / : root group
- lnZ_threshold: evidence threshold used when selecting one model over another
- multinest_kwargs : additional keyword arguments passed to MultiNest
- n_max_components : the maximum number of components to iteratively fit
- naxis1 : number of longitude pixels
- naxis2 : number of latitude pixels
- nchunks : number of HDF chunk files in the store
- model_name : name of the model, corresponds to NestFit module name
- n_params : number of model parameters per velocity component
- par_names : ascii names of the model parameters
- par_names_short : one to two character parameter names
- tex_labels : TeX formatted parameter label names
- tex_labels_with_units : TeX formatted parameter label names with units
    * pix : hierarchical directory containing data for each pixel
        * <LON> : the longitude pixel number
            * <LAT> : the latitude pixel number
            - nbest : best fit model number
            - i_lon : longitude pixel number
            - i_lat : latitude pixel number
                * <N> : model number
                - AIC
                - AICc
                - BIC
                - global_lnZ
                - global_lnZ_err
                - marg_cols
                - marg_quantiles
                - max_loglike
                - n_chan_tot
                - n_live
                - n_params
                - n_samples
                - ncomp
                - null_lnZ
                - par_names
                = bestfit_params (n=1; p*m)
                = map_params     (n=1; p*m)
                = marginals      (n=2; M, p*m)
                = posteriors     (n=2; n, p*m+2)
    * products : post-processing aggregate products
        = nbest                (n=2; b, l)
        = evidence             (n=3; m, b, l)
        = evidence_err         (n=3; m, b, l)
        = AIC                  (n=3; m, b, l)
        = AICc                 (n=3; m, b, l)
        = BIC                  (n=3; m, b, l)
        = conv_evidence        (n=3; m, b, l)
        = conv_nbest           (n=2; b, l)
        = marg_quantiles       (n=1; M)
        = nbest_MAP            (n=4; m, p, b, l)
        = nbest_bestfit        (n=4; m, p, b, l)
        = nbest_marginals      (n=5; m, p, M, b, l)
        = pdf_bins             (n=2; p, h)
        = post_pdfs            (n=6; r, m, p, h, b, l)
        = conv_post_pdfs       (n=6; r, m, p, h, b, l)
        = conv_marginals       (n=6; r, m, p, M, b, l)
        = peak_intensity       (n=4; t, m, b, l)
        = integrated_intensity (n=4; t, m, b, l)
        = hf_deblended         (n=5; t, m, S, b, l)
        * model_spec : predicted model spectral cubes
            = trans<ID>        (n=4; m, S, b, l)
    * full_header : all header keywords stored as attributes
    * simple_header : subset of coordinate system related header keywords
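The path and attribute access rules given before the listing can be tried on a small stand-in file. The sketch below builds an in-memory HDF5 file with h5py that contains only a handful of the entries listed above; the file name toy_table.hdf, the helper model_path, and all stored values are made up for illustration. With a real store, the same indexing would apply to the h5py.File instance exposed as nestfit.HdfStore.hdf.

```python
import h5py
import numpy as np


def model_path(i_lon, i_lat, n_model):
    """Join group names into the path of one model fit for one pixel."""
    return f"/pix/{i_lon}/{i_lat}/{n_model}"


# In-memory stand-in for table.hdf holding a few of the entries listed above.
hdf = h5py.File("toy_table.hdf", "w", driver="core", backing_store=False)
hdf.attrs["n_max_components"] = 2
pixel = hdf.create_group("/pix/0/0")
pixel.attrs["nbest"] = 1
model = pixel.create_group("1")
model.attrs["global_lnZ"] = -123.4  # dummy value
model.create_dataset("posteriors", data=np.zeros((10, 6)))  # dummy samples

# Attributes live on groups; datasets are reached by their full path.
nbest = int(hdf["/pix/0/0"].attrs["nbest"])
best_model = hdf[model_path(0, 0, nbest)]
lnZ = float(best_model.attrs["global_lnZ"])
posteriors = hdf[model_path(0, 0, nbest) + "/posteriors"][...]
hdf.close()
```

Indexing a dataset with [...] reads it into a NumPy array, so posteriors remains usable after the file is closed.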

Product dimension key codes:
  * n: number of samples
  * b: latitude pixel index
  * l: longitude pixel index
  * p: model parameter
  * m: model component number
  * M: marginal distribution quantile
  * r: run number (i.e., the index for the 1-comp run, 2-comp run, etc.)
  * h: marginal PDF bin
  * t: transition
  * S: spectral channel
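The key codes make it possible to index a product by axis name instead of by position. The sketch below is one way to do this with NumPy; the dictionary PRODUCT_AXES copies a few entries from the specification, while the helper select and the toy array sizes are hypothetical and not part of NestFit.

```python
import numpy as np

# Axis key codes copied from the specification for a few products.
PRODUCT_AXES = {
    "nbest": ("b", "l"),
    "evidence": ("m", "b", "l"),
    "nbest_MAP": ("m", "p", "b", "l"),
    "nbest_marginals": ("m", "p", "M", "b", "l"),
}


def select(array, axes, **fixed):
    """Index `array` by axis key code, keeping every axis not named in `fixed`."""
    unknown = set(fixed) - set(axes)
    if unknown:
        raise KeyError(f"unknown axis codes: {sorted(unknown)}")
    index = tuple(fixed.get(code, slice(None)) for code in axes)
    return array[index]


# Toy array: 2 components, 4 parameters, 15 quantiles, 3 x 5 pixel map.
marginals = np.zeros((2, 4, 15, 3, 5))
axes = PRODUCT_AXES["nbest_marginals"]

# Map of one quantile (index 4) of parameter 1 for component 0.
quantile_map = select(marginals, axes, m=0, p=1, M=4)

# All components, parameters, and quantiles at a single pixel.
pixel_values = select(marginals, axes, b=0, l=0)
```

Because the index is a plain tuple of integers and slices, the same helper should also work on an h5py dataset, although that is not tested here.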

Quantile indices for marginal cubes:
  *  0 : 0.00   (min)
  *  1 : 0.01
  *  2 : 0.10
  *  3 : 0.25
  *  4 : 0.50   (median)
  *  5 : 0.75
  *  6 : 0.90
  *  7 : 0.99
  *  8 : 1.00   (max)
  *  9 : 0.1587 (-1 sigma) -- NOTE: listed values are rounded to four decimal places
  * 10 : 0.8413 (+1 sigma)
  * 11 : 0.0228 (-2 sigma)
  * 12 : 0.9772 (+2 sigma)
  * 13 : 0.0013 (-3 sigma)
  * 14 : 0.9987 (+3 sigma)
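The sigma entries in the list match the standard normal cumulative distribution function evaluated at -3 to +3, written to four decimal places. The sketch below keeps the listed levels as a lookup table and recomputes the sigma levels at full precision with the Python standard library. That the stored quantiles use exactly these full-precision values is an assumption; the names LISTED_QUANTILES, SIGMA_INDEX, normal_cdf, and sigma_quantiles are introduced here for illustration only.

```python
import math

# Quantile levels in index order, as listed above (four decimal places).
LISTED_QUANTILES = [
    0.00, 0.01, 0.10, 0.25, 0.50, 0.75, 0.90, 0.99, 1.00,
    0.1587, 0.8413, 0.0228, 0.9772, 0.0013, 0.9987,
]

# Indices of the lower and upper quantile for each k-sigma interval.
SIGMA_INDEX = {1: (9, 10), 2: (11, 12), 3: (13, 14)}


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sigma_quantiles(k):
    """Full-precision quantile levels for the -k sigma and +k sigma bounds."""
    return normal_cdf(-k), normal_cdf(k)


lower, upper = sigma_quantiles(1)
i_lower, i_upper = SIGMA_INDEX[1]
```

For a marginals dataset, the index pair from SIGMA_INDEX selects the two entries along the M axis that bracket the corresponding credible interval.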