stompy.utils — General utilities

class stompy.utils.BruteKDE(values, weights, bw)[source]

Bases: object

class stompy.utils.Bucket(**kwargs)[source]

Bases: object

stompy.utils.LinearNDExtrapolator(points, values, eps=None)[source]

Kludge the linear interpolator to also extrapolate. Accomplished by estimating a value and gradient along the convex hull and creating points “far” away that extrapolate that gradient.

exception stompy.utils.PrincipalThetaException[source]

Bases: exceptions.Exception

stompy.utils.add_to(instance)[source]
stompy.utils.alpha_to_zerobased(s)[source]
stompy.utils.array_append(A, b=<object object>)[source]

append b to A, where b.shape == A.shape[1:] Attempts to make this fast by dynamically resizing the base array of A, and returning the appropriate slice.

if b is None, zeros are appended to A

the sentinel bit is to allow specifying a value of None, distinct from not specifying a value

stompy.utils.array_concatenate(AB)[source]

similiar to array_append, but B.shape[1:] == A.shape[1:]

while the calling convention is similar to concatenate, it currently only supports 2 arrays

stompy.utils.as_sheet(sheet, fn)[source]
stompy.utils.bootstrap_resample(X, n=None)[source]

Bootstrap resample an array_like credits to http://nbviewer.ipython.org/gist/aflaxman/6871948

Parameters:

X : array_like
data to resample
n : int, optional
length of resampled array, equal to len(X) if n==None

Results:

returns X_resamples

stompy.utils.bootstrap_stat(X, n_pop=10000, n_elements=None, pop_stat=<function mean>, bootstrap_stat=<function <lambda>>)[source]
stompy.utils.bounds(pnts)[source]

returns array [{lower,upper},pnts.shape[-1]]

stompy.utils.break_track(xy, waypoints, radius_min=400, radius_max=800, min_samples=10)[source]

xy: coordinate sequence of trackline waypoints: collection of waypoints

return pairs of indices giving start/end of transects, split by waypoints.

stompy.utils.call_with_path(cmd, path)[source]
stompy.utils.cdiff(a, n=1, axis=-1)[source]

Like np.diff, but include difference from last element back to first.

stompy.utils.ceil_dt64(t, dt, t0=numpy.datetime64('1970-01-01T00:00:00'))[source]

Round the given t up to an integer number of dt from a reference time (defaults to unix epoch)

stompy.utils.cell_region_to_array(sheet, xl_range, dtype='O', fn=None)[source]
stompy.utils.cell_region_to_df(sheet, xl_range, idx_ncols=0, idx_nrows=0, fn=None)[source]
stompy.utils.center_to_edge(c, dx_single=None, axis=0)[source]

take ‘cell’ center locations c, and infer boundary locations. first/last cells get width of the first/last inter-cell spacing. if there is only one sample and dx_single is specified, use that for width. otherwise error.

stompy.utils.center_to_edge_2d(X, Y, dx_single=None, dy_single=None)[source]
stompy.utils.center_to_interval(c)[source]

c: coordinates of centers, d: sizes of intervals, with the first/last interval assumed same as neighbors

stompy.utils.cf_string_to_dt64(x)[source]

return a seconds-based numpy datetime from something like 1000 seconds since 1983-01-09T12:43:10

This is conditionally called from to_datetime(), too.

A timezone, either as a trailing ‘Z’ or -0:00 is allowed, but other timezones are not (since that would introduce an ambiguity as to whether to adjust to UTC, or leave in another timezone)

stompy.utils.circular_n(iterable, n)[source]
stompy.utils.circular_pairs(iterable)[source]

like pairwise, but closes the loop. s -> (s0,s1), (s1,s2), (s2, s3), …, (sN,s0)

stompy.utils.circumcenter(p1, p2, p3)[source]
stompy.utils.concatenate_safe_dtypes(ab)[source]

Concatenate two arrays, but allow for the dtypes to be different. The fields are taken from the first array - matching fields in subsequent arrays are copied, others discarded.

stompy.utils.deriv(c, x)[source]
stompy.utils.dice_interval(subinterval, overlap_fraction, start, end=None)[source]

subinterval gives a duration, say 90 [s] overlap_fraction=0 means end of one interval is start of the next, overlap_fraction=0.5 means middle of one interval is start of next. start is either a scalar, or a pair of scalars if start is a scalar, end must be specified also as a scalar

yields [substart,subend] pairs

stompy.utils.dist(a, b=None)[source]
stompy.utils.dist_along(x, y=None)[source]
stompy.utils.dnum_jday0(dnum)[source]
stompy.utils.dnum_to_dt64(dnum, units='us')[source]
stompy.utils.download_url(url, local_file, log=None, on_abort='pass', on_exists='pass', **extra_args)[source]

log: an object or module with info(), warning(), and error() methods ala the logging module. on_abort: if an exception is raised during download, ‘pass’

leaves partial files in tact, ‘remove’ deletes partial files
on_exists: ‘pass’ do nothing and return. ‘exception’: raise an exception,
‘replace’: delete and re-download. Note that this is not atomic, and if the download fails the original file may be deleted anyway.
extra_args: keyword arguments passed on to requests.get or ftplib.FTP()
useful options here:
verify=False: if an https server has a bad certificate but you want
to proceed anyway.
stompy.utils.dt64_to_dnum(dt64)[source]
stompy.utils.enumerate_groups(keys)[source]

given an array of labels, return, in increasing value of key, yield tuples (key,idxs) such that np.all(keys[idxs]==key)

stompy.utils.expand_xxyy(xxyy, factor)[source]
stompy.utils.expand_xyxy(xyxy, factor)[source]
stompy.utils.fill_invalid(A, axis=0, ends='constant')[source]

ends: ‘constant’ missing values at the ends will take nearest valid value ‘linear’ missing values will be extrapolated with a linear fit through the first/last valid values

returns new array, though currently this is just a view on the original data.

stompy.utils.fill_tidal_data(da, fill_time=True)[source]

Extract tidal harmonics from an incomplete xarray DataArray, use those to fill in the gaps and return a complete DataArray.

Uses all 37 of the standard NOAA harmonics, may not be stable with short time series.

A 5-day lowpass is removed from the harmonic decomposition, and added back in afterwards.

Assumes that the DataArray has a ‘time’ coordinate with datetime64 values.

The time dimension must be dense enough to extract an exact time step

If fill_time is True, holes in the time coordinate will be filled, too.

stompy.utils.find_lag(t, x, t_ref, x_ref)[source]

Report lag in time between x(t) and a reference x_ref(t_ref).

If times are already numeric, they are left as is and the lag is reported in the same units.

If times are not numeric, they are converted to date nums via utils.to_dnum. If they are np.datetime64, lag is reported as timedelta64. If the inputs are datetimes, lag is reported as timedelta. Otherwise, lag is kept as a floating point number

stompy.utils.find_lag_xr(data, ref)[source]

Report lag in time of data (xr.DataArray) with respect to reference (xr.DataArray). Requires that data and ref are xr.DataArray, with coordinate values.

stompy.utils.find_phase(jd, u, which='ebb', **kwargs)[source]
stompy.utils.find_slack(jd, u, leave_mean=False, which='both')[source]
stompy.utils.floor_dt64(t, dt, t0=numpy.datetime64('1970-01-01T00:00:00'))[source]

Round the given t down to an integer number of dt from a reference time (defaults to unix epoch)

class stompy.utils.forwardTo(objectName, attrName)[source]

Bases: object

credits: http://code.activestate.com/recipes/510402-attribute-proxy-forwarding-attribute-access/

A descriptor based recipe that makes it possible to write shorthands that forward attribute access from one object onto another.

>>> class C(object):
...     def __init__(self):
...         class CC(object):
...             def xx(self, extra):
...                 return 100 + extra
...             foo = 42
...         self.cc = CC()
...
...     localcc = forwardTo('cc', 'xx')
...     localfoo = forwardTo('cc', 'foo')
...
>>> print C().localcc(10)
110
>>> print C().localfoo
42
Arguments: objectName - name of the attribute containing the second object.
attrName - name of the attribute in the second object.

Returns: An object that will forward any calls as described above.

stompy.utils.group_by_sign_hysteresis(Q, Qlow=0, Qhigh=0)[source]

A seemingly over-complicated way of solving a simple problem. Breaking a time series into windows of positive and negative values (e.g. delimiting flood and ebb in a velocity or flux time series). With the complicating factor of the time series sometimes hovering close to zero. This function introduces some hysteresis, while preserving the exact timing of the original zero-crossing. Qlow (typ. <0) and Qhigh (typ>0) set the hysteresis band. Crossings only count when they start below Qlow and end above Qhigh. Within that period, there could be many small amplitude crossings - the last one is used. This is in contrast to standard hysteresis where the timing would corresponding to crossing Qhigh or Qlow.

returns two arrays, pos_windows = [Npos,2], neg_windows=[Nneg,2] giving the indices in Q for respective windows

stompy.utils.hashes_to_array(records, float_fill=nan, int_fill=-99, uint_fill=0)[source]
stompy.utils.haversine(a, b)[source]

Calculate the great circle distance between two points on the earth (specified in decimal degrees)

a,b: longitude/latitude in degrees.

Credit to ballsatballs.dotballs, https://stackoverflow.com/questions/29545704/fast-haversine-approximation-python-pandas

returns distance in km.

stompy.utils.hour_tide(jd, u=None, h=None, jd_new=None, leave_mean=False)[source]

Calculate tide-hour from a time series of u (velocity, flood-positive) or h (water level, i.e. positive-up).

jd: time in days, i.e. julian day, datenum, etc. u: velocity, flood-positive h: water level, positive up. jd_new: if you want to evaluate the tidal hour at a different

(but overlapping) set of times.
leave_mean: by default, the time series mean is removed, but this
can be disabled by passing True here.

the return values are between 0 and 12, with 0 being slack before ebb (hmm - may need to verify that.).

stompy.utils.hour_tide_fn(jd, u, leave_mean=False)[source]

Return a function for extracting tidal hour from the time/velocity given. Use the _fn version if making repeated calls with different jd_new, but the same jd,u

stompy.utils.interp_bilinear(x, y, z)[source]
stompy.utils.interp_near(x, sx, sy, max_dx=None)[source]
stompy.utils.invert_permutation(p)[source]

Returns an array s, where s[i] gives the index of i in p. The argument p is assumed to be some permutation of 0, 1, …, len(p)-1. see http://stackoverflow.com/questions/11649577/how-to-invert-a-permutation-array-in-numpy

this version, using np.arange instead of xrange, is 23% faster than the np.put() version on that stackoverflow question.

stompy.utils.is_stale(target, srcs, ignore_missing=False)[source]

Makefile-esque checker – if target does not exist or is older than any of srcs, return true (i.e. stale). if a src does not exist, raise an Exception, unless ignore_missing=True.

stompy.utils.isnant(x)[source]

try isnan, if it fails try isnat

stompy.utils.isnat(x)[source]

datetime64 analog to isnan. doesn’t yet exist in numpy - other ways give warnings and are likely to change.

stompy.utils.isolate_downcasts(ds, z_surface='auto', min_cast_samples=10, z_var='z', time_dim='time', positive='up', z_slush=0.3)[source]

take a timeseries with depth, split it into downcasts.

stompy.utils.mag(vec)[source]
stompy.utils.mean_dt64(vec)[source]
stompy.utils.model_skill(xmodel, xobs, ignore_nan=True)[source]

Wilmott 1981 model skill metric

stompy.utils.moving_average_nearest(x, y, n)[source]

x,y: 1-D arrays n: integer giving size of the averaging window

Returns a new array with each element calculated as the mean of the n nearest (in terms of x coordinate) input values.

Very inefficient implementation!

stompy.utils.murphy_skill(xmodel, xobs, xref=None, ignore_nan=True)[source]

Murphy Skill metric

stompy.utils.nan_cov(m, rowvar=1, demean=False)[source]

covariance of the matrix, follows calling conventions of numpy’s cov function w.r.t. rows/columns

stompy.utils.nanrms(v)[source]
stompy.utils.nearest(A, x, max_dx=None)[source]

like searchsorted, but return the index of the nearest value, not just the first value greater than x. if max_dx is given, then return -1 for items where the nearest match was more than max_dx away

stompy.utils.nearest_val(A, x)[source]

return something like x, but with each element (or the scalar) replaced by the nearest value in the sorted 1-D array A

stompy.utils.orient_intersection(lsA, lsB, delta=1.0)[source]

lsA,lsB: LineString geometries with exactly one intersecting point. returns 1: if lsA crosses lsB left to right, when looking from the start towards the end of lsB. -1: the other case.

delta: current implementation uses a finite difference, and delta specifies the scale of that difference.

stompy.utils.parse_xl_address(s)[source]
stompy.utils.parse_xl_range(s)[source]
stompy.utils.path(append)[source]
stompy.utils.point_in_polygon(pgeom, randomize=False)[source]

return a point that lies inside the given [multi]polygon geometry

randomize: if True, pick points randomly, but fallback to deterministic approach when the geometry is to sparse

stompy.utils.point_line_distance(point, line)[source]

point: [nd] array line [2,nd] array

stompy.utils.point_segment_distance(point, seg, return_alpha=False)[source]

Distance from point to finite segment point: [nd] array seg [2,nd] array

stompy.utils.point_segments_distance(point, segs, return_alpha=False)[source]

Like point_segment_distance, but vectorized over segments Distance from point to finite segment point: [nd] array seg [ns,2,nd] array

stompy.utils.poly_circumcenter(points)[source]

unbiased (mostly) estimate of circumcenter, by computing circumcenter of consecutive groups of 3 points Similar to Janet, though Janet uses two sets of three while this code uses all consecutive groups of three.

stompy.utils.principal_theta(vec, eta=None, positive='flood', detrend=False, ambiguous='warn', ignore_nan=True)[source]

vec: 2D velocity data, last dimension must be {x,y} component. eta: if specified, freesurface data with same time dimension as vec.

used in conjunction with positive to resolve the ambiguity in theta. this is approximate at best, esp. since there is no time information. the assumption is that the tides are between standing and progressive, such that u*h and u*dh/dt are both positive for flood-positive u.
if positive is a number, it is taken as the preferential theta, and
ambiguity will be resolved by choosing the angle close to positive

ambiguous: ‘warn’,’error’,’standing’,’progressive’,’nan’ see code.

stompy.utils.principal_vec(vec, **kw)[source]
stompy.utils.progress(a, interval_s=5.0, msg='%s', func=<bound method Logger.info of <logging.Logger object>>)[source]

Print progress messages while iterating over a sequence a.

a: iterable interval_s: progress will be printed every x seconds msg: message format, with %s format string func: alternate display mechanism. defaults log.info

stompy.utils.quadmesh_interp(X, Z, V, x, z)[source]
stompy.utils.quadmesh_interp_one(X, Z, V, x, z)[source]
stompy.utils.quantize(a, stride, axis=0, reducer=<function mean>)[source]
stompy.utils.recarray_add_fields(A, new_fields)[source]

A: a record array new_fields: [ (‘name’,data), … ] where data must be the same length as A. So far, no support for non-scalar values

stompy.utils.recarray_del_fields(A, old_fields)[source]
stompy.utils.records_to_array(records)[source]
stompy.utils.remove_repeated(A, axis=0)[source]

A: numpy 1D array return A, without repeated element values.

stompy.utils.resample_to_common(A, Z, z=None, dz=None, n_samples=None, max_z=None, min_z=None, left=None, right=None)[source]

given data in A, with the coordinates of one axis dependent on the other, resample, so that all elements of that axis have the same coordinates. in other words, A~[time,depth] Z~[time,depth] but you want z~[depth].

for now, z has to be the second axis.

Z can either be the per-element z values, or just a pair of values giving the evenly spaced range.

stompy.utils.rms(v)[source]
stompy.utils.rot(angle, pnts)[source]
stompy.utils.rot_fn(angle)[source]
stompy.utils.rotate_to_principal(vec, eta=None, positive='flood', detrend=False, ambiguous='warn')[source]
stompy.utils.runfile(fn)[source]

Run a python script. Sets __file__ appropriately. Globals

stompy.utils.select_increasing(x)[source]

Return a bitmask over x removing any samples which are less than or equal to the largest preceding sample

stompy.utils.set_keywords(obj, kw)[source]

Utility for __init__ methods to update object state with keyword arguments. Checks that the attributes already exist, to avoid spelling mistakes. Uses getattr and setattr for compatibility with properties.

stompy.utils.signed_area(points)[source]
stompy.utils.strftime(d, fmt='%Y-%m-%d %H:%M')[source]

convenience method to convert to python datetime and format to string

stompy.utils.to_datetime(x)[source]
stompy.utils.to_dnum(x)[source]
stompy.utils.to_dt64(x)[source]
stompy.utils.to_jdate(t)[source]

Convert a time-like scalar t to integer julian date, e.g. 2016291 (year and day of year concatenated) The first of the year is 000, so add 1 if you want 2016-01-01 to be 2016001.

stompy.utils.to_unit(vecs)[source]
stompy.utils.to_unix(t)[source]

Convert t to unix epoch time, defined as the number of seconds since 1970-01-01 00:00:00. The result is a float or int, and is not distinguishable a priori from datenums. For that reason, numerical values passed to the other date converters are assumed to be datenums.

Also note that if you want to double-check the results of this function via python’s datetime module, use datetime.datetime.utcfromtimestamp(x). Otherwise you are likely to get some unwelcome time zone conversions.

stompy.utils.touch(fname, times=None)[source]

Like unix touch. Thanks to http://stackoverflow.com/questions/1158076/implement-touch-using-python

stompy.utils.uniquify_paths(fns)[source]
stompy.utils.unix_to_dt64(t)[source]

Convert a floating point unix timestamp to numpy datetime64

stompy.utils.within(item, ends, as_slice=False, fmt='auto')[source]

original version defaulted to bitfield, but could be forced to return a slice with as_slice. as_slice overrides fmt, and is the same as fmt=’slice’ fmt: auto, slice, mask, index

stompy.utils.within_2d(vecs, xxyy)[source]

Return a bit mask for points falling within a bounding rectangle. Nothing clever, just a minor bit of shorthand.

vecs: […,2] array of xy coordinates. xxyy: bounding box, [xmin,xmax,ymin,ymax]