optimize

Optimization functionality

pynumdiff.optimize.optimize(func, x, dt, dxdt_truth=None, tvgamma=0.01, search_space_updates={}, metric='rmse', padding=0, opt_method='Nelder-Mead', maxiter=10, parallel=True, huberM=6)

Find the optimal hyperparameters for a given differentiation method.

Parameters:
  • func (function) – differentiation method to optimize parameters for, e.g. kalman_smooth.rtsdiff

  • x (np.array[float]) – data to differentiate

  • dt (float) – step size

  • dxdt_truth (np.array[float]) – actual time series of the derivative of x, if known

  • tvgamma (float) – Only used if dxdt_truth is not given. Regularization value used to select for parameters that yield a smooth derivative. A larger value results in a smoother derivative.

  • search_space_updates (dict) – Each method has a default search space of parameter settings, structured as {param1:[numerical, values], param2:{categorical, values}, param3:value, ...} (defined in _optimize.py). The Cartesian product of these values is used as the set of initial starting points in optimization. If left as the default empty dict, the default search space is used; if given as {param1:[different,values]}, these values are applied.

  • metric (str) – either 'rmse' or 'error_correlation', only applies if dxdt_truth is given

  • padding (int or str) – number of steps to ignore at the beginning and end of the data series, or 'auto' to ignore 2.5% at each end. A larger value causes the optimization to emphasize accuracy in the middle of the series.

  • opt_method (str) – Optimization technique used by scipy.optimize.minimize, the workhorse

  • maxiter (int) – maximum number of iterations, passed down to scipy.optimize.minimize

  • parallel (bool) – whether to use multiple processes to optimize. This is typically faster for a single optimization. For experiments that run many optimizations, it is often a better use of resources to parallelize at the experiment level, with each optimization running in its own process. This option must then be False, since spawned processes are not allowed to spawn further processes.

  • huberM (float) – For the case without ground truth, if \(M < \infty\), use an outlier-robust, Huber-based accuracy metric in the objective. \(M\) is in units akin to standard deviations (see evaluate.robust_rme), so the metric transitions from the quadratic to the linear regime for errors lying \(>\!M\sigma\) away from the mean error.
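The quadratic-to-linear transition described for huberM can be sketched with the textbook Huber loss. This is a minimal illustration on residuals that are assumed to be already expressed in units of \(\sigma\); the function name huber is hypothetical, and the library's own evaluate.robust_rme may center and scale errors differently.

```python
import numpy as np

def huber(z, M):
    """Textbook Huber loss on standardized residuals z (units of sigma).

    Quadratic for |z| <= M, linear beyond that point.
    """
    r = np.abs(z)
    return np.where(r <= M, 0.5 * r**2, M * (r - 0.5 * M))

# Three ordinary residuals and one outlier lying 10 sigma out
z = np.array([0.5, -1.0, 2.0, 10.0])

robust = huber(z, M=6)          # outlier is penalized linearly
plain = huber(z, M=np.inf)      # M = infinity recovers the pure quadratic loss
```

With M=6 the outlier contributes 42 rather than the 50 it would contribute quadratically, so it pulls less on the objective, while residuals inside \(M\sigma\) are treated exactly as in a squared-error metric.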

Returns:

  • opt_params (dict) – best parameter settings for the differentiation method

  • opt_value (float) – lowest value found for objective function
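As a rough illustration of how a search space in the documented shape expands into starting points, and how a best setting and its objective value come out at the end, consider the sketch below. The parameter names, the update, and the toy objective are all hypothetical; the real routine additionally refines each starting point with scipy.optimize.minimize, which is omitted here.

```python
from itertools import product

# Hypothetical search space in the documented shape:
# list = numerical values, set = categorical values, bare value = fixed
search_space = {
    "order": {1, 2, 3},
    "window": [5.0, 25.0],
    "log_q": [-2.0, 2.0],
    "forwardbackward": True,
}

# Hypothetical update: replace the values tried for one parameter
search_space_updates = {"window": [11.0, 51.0]}
search_space = {**search_space, **search_space_updates}

def as_choices(value):
    """Turn a list, a set, or a single fixed value into a list of choices."""
    return sorted(value) if isinstance(value, (list, set)) else [value]

names = list(search_space)
starting_points = [
    dict(zip(names, combo))
    for combo in product(*(as_choices(search_space[n]) for n in names))
]

def toy_objective(p):
    """Stand-in for the real objective (assumption, for illustration only)."""
    return (p["order"] - 2) ** 2 + (p["window"] - 11.0) ** 2 / 100 + (p["log_q"] - 1.0) ** 2

# Mimic the (opt_params, opt_value) return pair without local refinement
opt_params = min(starting_points, key=toy_objective)
opt_value = toy_objective(opt_params)
```

Here 3 x 2 x 2 x 1 = 12 starting points are generated. The count grows multiplicatively with each added list or set entry, which is worth keeping in mind when widening a search space.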

pynumdiff.optimize.suggest_method(x, dt, dxdt_truth=None, cutoff_frequency=None)

This is meant as an easy-to-use, automatic way for users with some time on their hands to determine a good method and settings for their data. It calls the optimizer over (almost) all methods in the repo using default search spaces defined at the top of the pynumdiff/optimize/_optimize.py file. This routine will take a few minutes to run.

Excluded:
  • first_order, because iterating causes drift

  • lineardiff, iterative_velocity, and jerk_sliding, because they either take too long, can be fragile, or tend not to do best

  • all cvxpy-based methods, if cvxpy is not installed

  • velocity, because it tends not to be best but dominates the optimization process by directly optimizing the second term of the metric \(L = \text{RMSE} \Big( \text{trapz}(\mathbf{ \hat{\dot{x}}}(\Phi)) + \mu, \mathbf{y} \Big) + \gamma \, \text{TV}\big(\mathbf{\hat{ \dot{x}}}(\Phi)\big)\)
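The metric \(L\) above can be sketched in NumPy to show how its two terms interact. The helper names are hypothetical, and two details are assumptions not confirmed by the source: the offset \(\mu\) is taken as the mean residual between the data and the integrated derivative, and TV is normalized as the mean absolute first difference.

```python
import numpy as np

def cumulative_trapezoid(dxdt, dt):
    """Integrate dxdt with the trapezoidal rule, starting from zero."""
    increments = 0.5 * (dxdt[1:] + dxdt[:-1]) * dt
    return np.concatenate(([0.0], np.cumsum(increments)))

def total_variation(v):
    # Assumption: TV normalized as the mean absolute first difference
    return np.mean(np.abs(np.diff(v)))

def loss(dxdt_hat, y, dt, gamma):
    """RMSE between the re-integrated derivative and the data, plus gamma * TV."""
    x_rec = cumulative_trapezoid(dxdt_hat, dt)
    mu = np.mean(y - x_rec)  # assumption: integration constant from the mean residual
    rmse = np.sqrt(np.mean((x_rec + mu - y) ** 2))
    return rmse + gamma * total_variation(dxdt_hat)

dt = 0.01
t = np.arange(0, 2, dt)
y = np.sin(t)                     # noiseless data, for simplicity
dxdt_smooth = np.cos(t)           # the true derivative
rng = np.random.default_rng(0)
dxdt_rough = dxdt_smooth + 0.5 * rng.standard_normal(t.size)

loss_smooth = loss(dxdt_smooth, y, dt, gamma=0.1)
loss_rough = loss(dxdt_rough, y, dt, gamma=0.1)
```

The rough estimate is penalized mainly through the TV term, which is why a method that minimizes total variation of the derivative directly can score well on \(L\) without being the most accurate.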

Parameters:
  • x (np.array[float]) – data to differentiate

  • dt (float) – step size, which must be constant because most methods are not designed to work with variable step sizes

  • dxdt_truth (np.array[float]) – if known, you can pass true derivative values; otherwise you must use cutoff_frequency

  • cutoff_frequency (float) – in Hz, the highest dominant frequency of interest in the signal, used to find parameter \(\gamma\) for regularization of the optimization process in the absence of ground truth. See https://ieeexplore.ieee.org/document/9241009. Estimate by (a) counting the number of real peaks per second in the data, (b) looking at the power spectrum and choosing a cutoff, or (c) making an educated guess.
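Option (b) for estimating cutoff_frequency can be sketched with a plain periodogram. The synthetic signal and the 10%-of-peak threshold are illustrative assumptions, not a rule taken from the library or the cited paper; on real data the threshold usually needs to be chosen by inspecting the spectrum.

```python
import numpy as np

dt = 0.01                        # step size in seconds
t = np.arange(1000) * dt         # 10 s of data
rng = np.random.default_rng(0)

# Synthetic signal: 0.5 Hz and 2 Hz components plus a little noise
x = (np.sin(2 * np.pi * 0.5 * t)
     + 0.5 * np.sin(2 * np.pi * 2.0 * t)
     + 0.05 * rng.standard_normal(t.size))

freqs = np.fft.rfftfreq(x.size, d=dt)             # frequency axis in Hz
power = np.abs(np.fft.rfft(x - x.mean())) ** 2    # periodogram, mean removed

# Hypothetical rule: keep every frequency carrying at least 10% of the peak power
threshold = 0.1 * power.max()
cutoff_frequency = freqs[power >= threshold].max()
```

For this signal the estimate lands on the 2 Hz component, the highest frequency that carries substantial power. The resulting value could then be passed as suggest_method(x, dt, cutoff_frequency=cutoff_frequency).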

Returns:

tuple[callable, dict] of

  • method – a reference to the function handle of the differentiation method that worked best

  • opt_params – optimal parameter settings for the differentiation method