stompy.io — Reading and writing of various formats and data sources

Submodules

stompy.io.match_datasets module

Venturing into generic code to match two datasets.

Not remotely generic at this point, and makes some assumptions about dimensions, depth, time, etc.

class stompy.io.match_datasets.MatchVarsCruise(varA, varB, B_type)[source]

Bases: object

stompy.io.qnc module

class stompy.io.qnc.QDataset(*args, **kws)[source]

Bases: netCDF4._netCDF4.Dataset

class VarProxy(dataset, varname)[source]

Bases: object

Represents a variable about to be defined, but still waiting for dimension names and data to be supplied via setattr.

create(dims, data, **kwargs)[source]
add_dimension(dim_name, length)[source]

Create the dimension if it doesn’t exist; otherwise check that the requested length matches the existing dimension.

alias(**kwargs)[source]

This used to copy the variables outright; updating self.variables directly is simpler and works even when not writing to the file.

copy(skip=[], fn=None, **create_args)[source]

Make a deep copy of self into a writable, diskless QDataset. If fn is given, the target is instead a netCDF file on disk.
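
A rough usage sketch of copy(); the filenames are hypothetical, and opening a QDataset by filename is assumed to behave like netCDF4.Dataset:

    from stompy.io import qnc

    ds = qnc.QDataset('obs.nc')        # hypothetical existing netCDF file

    snapshot = ds.copy()               # writable, diskless in-memory copy
    ds.copy(fn='obs_copy.nc')          # with fn given, the copy is written to disk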

copy_ncattrs_to(new)[source]
interpolate_dimension(int_dim, int_var, new_coordinate, max_gap=None, gap_fields=None, int_mode='nearest')[source]

return a new dataset as a copy of this one, but with the given dimension interpolated according to varname=values

typically this would be done to a list of datasets, after which they could be appended.

it can also be used to collapse a ‘dependent’ coordinate into an independent coordinate - e.g. if depth bins are a function of time, this can be used to interpolate onto a constant depth axis, which will also remove the time dimension from that depth variable.

max_gap: jumps in the source variable greater than max_gap are filled with nan (or -99 if integer valued). For now this is only supported when int_dim has just one dimension.

gap_fields: None, or a list of variable names to be masked based on gaps.

int_mode: ‘nearest’ grabs the integer value from the nearest sample; ‘linear’ may be added in the future, which would cast to float.
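
A hedged sketch of the dependent-coordinate case described above; the file, variable, and dimension names are assumptions rather than taken from a real dataset:

    import numpy as np
    from stompy.io import qnc

    ds = qnc.QDataset('ctd_cast.nc')   # hypothetical source file

    # Interpolate the 'bin' dimension onto a fixed set of depths, so a
    # time-varying 'depth' coordinate becomes an independent axis.
    ds_z = ds.interpolate_dimension(int_dim='bin',
                                    int_var='depth',
                                    new_coordinate=np.arange(0.0, 20.0, 0.5),
                                    max_gap=None)
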
select(**kwargs)[source]
within(**kwargs)[source]
exception stompy.io.qnc.QncException[source]

Bases: exceptions.Exception

class stompy.io.qnc.QuickVar(nc, v, transpose=None)[source]

Bases: object

as_datenum()[source]
dimensions
dims
stompy.io.qnc.anon_dim_name(size, **kws)[source]

Name given to on-demand dimensions. kws: unused, but might include the type?

stompy.io.qnc.as_tuple(x)[source]
stompy.io.qnc.concatenate(ncs, cat_dim, skip=[], new_dim=None)[source]

ncs: an ordered list of QDataset objects. If a single QDataset is given, it will be copied at the metadata level. For convenience, elements of ncs which are None are silently dropped.

new_dim: if given, then fields which do not have cat_dim but differ between datasets will be concatenated along new_dim.
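
A minimal sketch, assuming several deployment files that share a ‘time’ dimension (the filenames and the dimension name are hypothetical):

    from stompy.io import qnc

    ncs = [qnc.QDataset(fn) for fn in ['deploy01.nc', 'deploy02.nc', 'deploy03.nc']]
    combined = qnc.concatenate(ncs, cat_dim='time')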

stompy.io.qnc.downsample(ds, dim, stride, lowpass=True)[source]

Lowpass variables along the given dimension, and resample at the given stride.

lowpass=False: decimate only, no lowpass.
lowpass=<float>: the lowpass window size is lowpass*stride.
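
A small usage sketch; the filename and dimension name are illustrative only:

    from stompy.io import qnc

    ds = qnc.QDataset('adcp_raw.nc')   # hypothetical file

    # Keep every 10th sample along 'time', lowpassing first to limit aliasing:
    ds_lp = qnc.downsample(ds, dim='time', stride=10, lowpass=True)

    # Or plain decimation, with no lowpass step:
    ds_dec = qnc.downsample(ds, dim='time', stride=10, lowpass=False)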

stompy.io.qnc.empty(fn=None, overwrite=False, **kwargs)[source]
stompy.io.qnc.linear_to_orthogonal_nc(nc_src, lin_dim, ortho_dims, nc_dst=None)[source]

copy a dataset, changing a linear dimension into a pair of orthogonal dimensions

stompy.io.qnc.mat_to_nc(mat, dim_map={}, autosqueeze=True)[source]
stompy.io.qnc.ortho_to_transect_nc(src_nc, src_x_var, src_y_var, transect_xy, dst_nc=None)[source]

Extract a transect to a new dataset

stompy.io.qnc.sanitize_name(s)[source]

make s suitable for use as a variable or dimension name

stompy.io.qnc.to_str(s)[source]

stompy.io.rbr module

class stompy.io.rbr.Calibration(txt, coefs, units)[source]

Bases: object

a container for calibration information - these aren’t actually used, though

static from_dict(d)[source]
class stompy.io.rbr.Rbr(dat_file, instrument_tz=<UTC>, target_tz=<UTC>)[source]

Bases: object

autotrim()[source]

Trim the timeseries to reflect when it looks like it was actually in the water

clean_name(s)[source]
read()[source]
remove_all_spikes(columns=['Cond', 'Salinity', 'SpecCond'])[source]
remove_spikes(ci, method='d2', d2_threshold=30)[source]

Attempt to automatically remove spikes. Not the best idea, but hopefully saves some time for a quick look at the data.

d2_threshold: number of standard deviations of the 2nd derivative beyond which a sample is considered an outlier.
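
To illustrate what the second-derivative threshold means, here is a generic numpy sketch of the same flavor of test; it is not the implementation used by remove_spikes:

    import numpy as np

    def d2_spike_mask(x, d2_threshold=30):
        """Flag samples whose second difference exceeds d2_threshold standard
        deviations of the second difference. Generic illustration only."""
        d2 = np.diff(x, n=2)                  # discrete second derivative
        bad = np.abs(d2) > d2_threshold * np.nanstd(d2)
        mask = np.zeros(len(x), dtype=bool)
        mask[1:-1] = bad                      # d2[k] is centered on sample k+1
        return mask

    x = np.sin(np.linspace(0, 10, 2000))
    x[500] += 5.0                             # inject an obvious spike
    x[d2_spike_mask(x)] = np.nan              # blank the flagged sample(s)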
synthesize_fields()[source]

In case we have conductivity, temp and pressure but not salinity, calculate it here.

to_xarray()[source]
update_fields()[source]

set fields like self.t, self.cond, etc. which reference slices of self.data

class stompy.io.rbr.RbrHex(dat_file, instrument_tz=<UTC>, target_tz=<UTC>)[source]

Bases: stompy.io.rbr.Rbr

subclass for reading hex files.

static concatenate(Rs)[source]
parse_datetime(d, t)[source]
parse_raw()[source]
read()[source]
read_calibration()[source]
read_calibrations()[source]
read_data()[source]
read_extras()[source]

fields I don’t want to mess with yet

read_headers()[source]
class stompy.io.rbr.RbrRsk(dat_file, instrument_tz=<UTC>, target_tz=<UTC>)[source]

Bases: stompy.io.rbr.Rbr

read()[source]
read_calibrations()[source]
read_data()[source]
read_data_raw()[source]

shove the data table into a numpy array

read_extras()[source]

fields I don’t want to mess with yet

read_headers()[source]
class stompy.io.rbr.RbrText(dat_file, instrument_tz=<UTC>, target_tz=<UTC>)[source]

Bases: stompy.io.rbr.Rbr

parse_field_names()[source]

populate self.fields as list of tuples of (‘name’,parser)

read()[source]
read_data()[source]

self.data is [Nsamples, Nfields]; self.columns is the list of data column names.

skip_headers()[source]
stompy.io.rbr.load(fn, **kwargs)[source]

Try to detect whether fn is an Rsk or hex file, and return a corresponding instance.

kwargs:
instrument_tz=pytz.utc
target_tz=pytz.utc
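
A usage sketch with a hypothetical filename and an assumed target timezone:

    import pytz
    from stompy.io import rbr

    # load() decides between the Rsk and hex readers based on the file.
    r = rbr.load('cast_0042.rsk',
                 instrument_tz=pytz.utc,
                 target_tz=pytz.timezone('US/Pacific'))

    r.autotrim()          # keep only the in-water portion of the record
    ds = r.to_xarray()    # hand off to xarray for analysis and plotting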

stompy.io.rdb module

Tools for reading RDB files, the text-based format often used in USGS data. See stompy/test/data for examples of this type of data.

class stompy.io.rdb.Rdb(text=None, source_file=None, fp=None)[source]

Bases: object

data()[source]

Assuming that only one data type was requested, try to figure out which column it is and return that data. For single-valued columns, this will expand the data out to the right length.

float_or_nan(s)[source]
keys()[source]
parse_date(s)[source]

parse a date like ‘2008-01-13 00:31’ into a float representing absolute days since 0 AD

parse_source_file()[source]
record_count = 0
series(*keys)[source]

return a tuple of vectors for the given keys, restricted to records where all of the keys have valid data

stompy.io.rdb.rdb_to_dataset(filename=None, text=None, to_utc=True)[source]

Read an RDB file and return an xarray Dataset. If to_utc is set, look for a tz_cd attribute and, if present, adjust times to UTC.

If no data was found, return None
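
A minimal sketch, assuming a hypothetical RDB download:

    from stompy.io import rdb

    ds = rdb.rdb_to_dataset('streamflow.rdb', to_utc=True)

    if ds is None:
        print("no data found in the RDB file")
    else:
        print(ds)   # xarray Dataset, with times shifted to UTC when tz_cd is present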

stompy.io.rdb_codes module

Handle database of USGS codes used in RDB files, namely for parameters (e.g. streamflow in cfs) and statistics (e.g. mean)

stompy.io.rdb_codes.parm_code_lookup(code)[source]
stompy.io.rdb_codes.parm_codes()[source]
stompy.io.rdb_codes.sanitize_code(code)[source]

Make a canonical text version of a code - a 5 digit, 0-padded string

stompy.io.rdb_codes.stat_code_lookup(code)[source]

code: an integer or string code for a USGS statistic. Returns the name of the statistic, or, if it cannot be found, a string representation of the code (e.g. “99900”).

stompy.io.rdb_codes.stat_codes()[source]
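
A small sketch of the lookups, using the standard USGS examples 00060 (discharge) and 00003 (mean); the exact argument and return types are assumptions:

    from stompy.io import rdb_codes

    code = rdb_codes.sanitize_code('60')       # canonical 5-digit form, e.g. '00060'
    parm = rdb_codes.parm_code_lookup(code)    # what parameter 00060 refers to
    stat = rdb_codes.stat_code_lookup(3)       # name of statistic 00003, i.e. mean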

stompy.io.rdb_datadescriptors module

stompy.io.rdb_datadescriptors.dd_to_synonyms(code)[source]

stompy.io.rdradcp module

A mostly direct translation of rdradcp.m to python. 1/3/2013: Updated with DKR changes to rdradcp.m

class stompy.io.rdradcp.Adcp[source]

Bases: object

class stompy.io.rdradcp.Config[source]

Bases: object

class stompy.io.rdradcp.Header[source]

Bases: object

stompy.io.rdradcp.adcp_merge_nmea(r, gps_fn, adjust_to_utc=False)[source]

Parse an NMEA file from WinRiver (i.e. with RDENS sentences), and add lat/lon to r. adjust_to_utc: use GPS time to modify the hours place of r.mtime.
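
A hedged sketch of combining rdradcp() with adcp_merge_nmea() for a WinRiver transect; the filenames are hypothetical, and per the docstring the lat/lon fields are added to r in place:

    from stompy.io import rdradcp

    r = rdradcp.rdradcp('transect.000', num_av=1)

    # Attach lat/lon from the GPS log, nudging the hours of r.mtime to UTC.
    rdradcp.adcp_merge_nmea(r, 'transect.nmea', adjust_to_utc=True)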

stompy.io.rdradcp.add_depth(r)[source]
stompy.io.rdradcp.checkheader(fd)[source]

Given an open file object, read the ensemble size, skip ahead, make sure we can read the cfg bytes of the next ensemble, come back to the starting place, and report success.

stompy.io.rdradcp.get_bin_dtype()[source]
stompy.io.rdradcp.get_ens_dtype(sourceprog='WINRIVER')[source]
stompy.io.rdradcp.getopt(val, *args)[source]
stompy.io.rdradcp.invalidate_from_bed(r)[source]

where bottom track is good, nan out data in bottom 5%

stompy.io.rdradcp.msg_print(s)[source]
stompy.io.rdradcp.nmean(x, dim=None)[source]
stompy.io.rdradcp.nmedian(x, window=inf, dim=None)[source]
stompy.io.rdradcp.rd_buffer(fd, num_av, msg=<function msg_print>)[source]

Returns (ens, hdr, cfg, pos).

RH: returns ens=None, hdr=None if there’s a problem.

stompy.io.rdradcp.rd_fix(fd, msg=<function msg_print>)[source]
stompy.io.rdradcp.rd_fixseg(fd)[source]

Reads the configuration data from the fixed leader. Returns (Config, nbyte).

stompy.io.rdradcp.rd_hdr(fd, msg=<function msg_print>)[source]
stompy.io.rdradcp.rd_hdrseg(fd)[source]
stompy.io.rdradcp.rdradcp(name, num_av=5, nens=-1, baseyear=2000, despike='no', log_fp=None)[source]

The original documentation from Rich Pawlowicz’s code:

RDRADCP Read (raw binary) RDI ADCP files, ADCP=RDRADCP(NAME) reads the raw binary RDI BB/Workhorse ADCP file NAME and puts all the relevant configuration and measured data into a data structure ADCP (which is self-explanatory). This program is designed for handling data recorded by moored instruments (primarily Workhorse-type but can also read Broadband) and then downloaded post-deployment. For vessel-mount data I usually make p-files (which integrate nav info and do coordinate transformations) and then use RDPADCP.

This current version does have some handling of VMDAS, WINRIVER, and WINRIVER2 output files, but it is still ‘beta’. There are (inadequately documented) timestamps of various kinds from VMDAS, for example, and caveat emptor on WINRIVER2 NMEA data.

(ADCP,CFG)=RDRADCP(…) returns configuration data in a separate data structure.

Various options can be specified on input: (..)=RDRADCP(NAME,NUMAV) averages NUMAV ensembles together in the result. (..)=RDRADCP(NAME,NUMAV,NENS) reads only NENS ensembles (-1 for all). (..)=RDRADCP(NAME,NUMAV,(NFIRST NEND)) reads only the specified range of ensembles. This is useful if you want to get rid of bad data before/after the deployment period.

Notes:

  • Sometimes the ends of files are filled with garbage. In this case you may have to rerun things, explicitly specifying how many records to read (or the last record to read). I don’t handle bad data very well. Also, in Aug/2007 I discovered that WINRIVER-2 files can have a varying number of bytes per ensemble. Thus the estimated number of ensembles in a file (based on the length of the first ensemble and file size) can be too high or too low.

  • I don’t read in absolutely every parameter stored in the binaries; just the ones that are ‘most’ useful. Look through the code if you want to get other things.

  • Chaining of files does not occur (i.e. read .000, .001, etc.). Sometimes a ping is split between the end of one file and the beginning of another. The only way to get this data is to concatenate the files, using cat file1.000 file1.001 > file1 (unix) or copy file1.000/B+file2.001/B file3.000/B (DOS/Windows). (As of Dec 2005 we can probably read a .001 file.)

  • Velocity fields are always called east/north/vertical/error for all coordinate systems, even though they should be treated as 1/2/3/4 in beam coordinates etc.

String parameter/option pairs can be added after these initial parameters:

‘baseyear’: Base century for BB/v8WH firmware (defaults to 2000).

‘despike’: ‘no’ | ‘yes’ | 3-element vector

Controls ensemble averaging. With ‘no’ a simple mean is used (default). With ‘yes’ a mean is applied to all values that fall within a window around the median (giving some outlier rejection). This is useful for noisy data. Window sizes are [.3 .3 .3] m/s for [ horiz_vel vert_vel error_vel ] values. If you want to change these values, set ‘despike’ to the 3-element vector.

R. Pawlowicz (rich@eos.ubc.ca) - 17/09/99

R. Pawlowicz - 17/Oct/99

5/July/00 - handled byte offsets (and mysterious ‘extra’ bytes) slightly better, Y2K

5/Oct/00 - bug fix - size of ens stayed 2 when NUMAV==1 due to initialization, hopefully this is now fixed.

10/Mar/02 - #bytes per record changes mysteriously, tried a more robust workaround. Guess that we have an extra 2 bytes if the record length is even?

28/Mar/02 - added more firmware-dependent changes to format; hopefully this works for everything now (put previous changes on firmer footing?)

30/Mar/02 - made cfg output more intuitive by decoding things. An early version of WAVESMON and PARSE which split out this data from a wave recorder inserted an extra two bytes per record. I have removed the code to handle this but if you need it see line 509

29/Nov/02 - A change in the bottom-track block for version 4.05 (very old!).

29/Jan/03 - Status block in v4.25 150khzBB two bytes short?

14/Oct/03 - Added code to at least ‘ignore’ WinRiver GPS blocks.

11/Nov/03 - VMDAS navigation block, added hooks to output navigation data.

26/Mar/04 - better decoding of nav blocks - better handling of weird bytes at beginning and end of file - (code fixes due to Matt Drennan).

25/Aug/04 - fixes to “junk bytes” handling.

27/Jan/05 - even more fixes to junk byte handling (move 1 byte at a time rather than two for odd lengths).

29/Sep/2005 - median windowing done slightly incorrectly in a way which biases results in a negative way when data is very noisy. Now fixed.

28/Dec/2005 - redid code for recovering from ensembles that mysteriously change length, added ‘checkheader’ to make a complete check of ensembles.

Feb/2006 - handling of firmware version 9 (navigator)

23/Aug/2006 - more firmware updates (16.27)

23/Aug/2006 - output some bt QC stuff

29/Oct/2006 - winriver bottom track block had errors in it - now fixed.

30/Oct/2006 - pitch_std, roll_std now uint8 and not int8 (thanks Felipe Pimenta)

13/Aug/2007 - added Rio Grande (firmware v 10), better handling of those cursed winriver ASCII NMEA blocks whose lengths change unpredictably. skipping the inadequately documented 2022 WINRIVER-2 NMEA block

13/Mar/2010 - firmware version 50 for WH.

31/Aug/2012 - Rusty Holleman / RMA - ported to python

Python port details:

log_fp: a file-like object - the messages are the same as in the matlab code, but this allows them to be redirected elsewhere.
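
A usage sketch of the Python port, with a hypothetical deployment file:

    import sys
    from stompy.io import rdradcp

    # Average 5 ensembles at a time, read everything (nens=-1), despike with
    # the default windows, and send progress messages to stderr.
    adcp = rdradcp.rdradcp('deployment.000',
                           num_av=5,
                           nens=-1,
                           despike='yes',
                           log_fp=sys.stderr)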

stompy.io.rdradcp.ship_to_earth(r)[source]

Rotate r.east_vel, r.north_vel, and r.bt_vel by the compass heading. The ship-coordinate values are moved to r.ship_<var>.
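
As a rough illustration of the rotation involved (the sign and axis conventions here are assumptions, not necessarily those used by ship_to_earth):

    import numpy as np

    def rotate_ship_to_earth(u_ship, v_ship, heading_deg):
        """Generic sketch: u_ship = starboard, v_ship = forward, heading
        measured clockwise from north. Conventions are illustrative only."""
        h = np.radians(heading_deg)
        east = u_ship * np.cos(h) + v_ship * np.sin(h)
        north = -u_ship * np.sin(h) + v_ship * np.cos(h)
        return east, north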

stompy.io.rdradcp.to_earth(r)[source]

Module contents