1. User API

This section describes the main API users are expected to interact with.

class jenn.NeuralNet(layer_sizes: list[int], hidden_activation: str = 'tanh', output_activation: str = 'linear')[source]

Neural network model.

Parameters:
  • layer_sizes – number of nodes in each layer (including input/output layers)

  • hidden_activation – activation function used in hidden layers

  • output_activation – activation function used in output layer

fit(x: np.ndarray, y: np.ndarray, dydx: np.ndarray | None = None, is_normalize: bool = False, alpha: float = 0.05, beta: np.ndarray | float = 1.0, gamma: np.ndarray | float = 1.0, lambd: float = 0.0, beta1: float = 0.9, beta2: float = 0.99, tau: float = 0.5, tol: float = 1e-12, max_count: int = 1000, epsilon_absolute: float = 1e-12, epsilon_relative: float = 1e-12, epochs: int = 1, batch_size: int | None = None, max_iter: int = 1000, shuffle: bool = True, random_state: int | None = None, is_backtracking: bool = False, is_warmstart: bool = False, is_verbose: bool = False) Self[source]

Train neural network.

Note

If training is taking too long, it can be stopped gracefully by creating a local file called STOP in the running directory. Just be sure to delete it before the next run.

Parameters:
  • x – training data inputs, array of shape (n_x, m)

  • y – training data outputs, array of shape (n_y, m)

  • dydx – training data Jacobian, array of shape (n_y, n_x, m)

  • is_normalize – normalize training data by mean and variance

  • alpha – optimizer learning rate for line search

  • beta – LSE coefficients [defaulted to one] (optional)

  • gamma – jacobian-enhancement regularization coefficient [defaulted to one] (optional)

  • lambd – regularization coefficient to avoid overfitting [defaulted to zero] (optional)

  • beta1 – ADAM optimizer hyperparameter to control momentum

  • beta2 – ADAM optimizer hyperparameter to control momentum

  • tau – amount by which to reduce \(\alpha := \tau \times \alpha\) on each iteration

  • tol – stop when cost function doesn’t improve more than specified tolerance

  • max_count – stop when line search iterations exceed maximum count specified

  • epsilon_absolute – absolute error stopping criterion

  • epsilon_relative – relative error stopping criterion

  • epochs – number of passes through data

  • batch_size – size of each batch for minibatch

  • max_iter – max number of optimizer iterations

  • shuffle – shuffle minibatches or not

  • random_state – control repeatability

  • is_backtracking – use backtracking line search or not

  • is_warmstart – reuse existing parameters instead of re-initializing them (warm start)

  • is_verbose – print out progress for each (iteration, batch, epoch)

Returns:

NeuralNet instance (self)

Warning

Normalization usually helps, except when the training data is made up of very small numbers. In that case, normalizing by the variance has the undesirable effect of dividing by a very small number and should not be used.

classmethod load(file: str | Path = 'parameters.json') NeuralNet[source]

Load serialized parameters into a new NeuralNet instance.

predict(x: np.ndarray) np.ndarray[source]

Predict responses.

Parameters:

x – vectorized inputs, array of shape (n_x, m)

Returns:

predicted response(s), array of shape (n_y, m)

predict_partials(x: np.ndarray) np.ndarray[source]

Predict partial derivatives.

Parameters:

x – vectorized inputs, array of shape (n_x, m)

Returns:

predicted partial(s), array of shape (n_y, n_x, m)

save(file: str | Path = 'parameters.json') None[source]

Serialize parameters and save to JSON file.
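
A minimal end-to-end sketch based on the signatures above (the quadratic training data, layer sizes, and optimizer settings are illustrative, not prescribed defaults):

import numpy as np
import jenn

# Synthetic gradient-enhanced training data for y = x1**2 + x2**2
m = 100
x_train = np.random.uniform(-1.0, 1.0, (2, m))        # inputs, shape (n_x, m)
y_train = np.sum(x_train**2, axis=0, keepdims=True)   # outputs, shape (n_y, m)
dydx_train = (2.0 * x_train).reshape(1, 2, m)         # Jacobian, shape (n_y, n_x, m)

# Train a network with two hidden layers
model = jenn.NeuralNet(layer_sizes=[2, 12, 12, 1]).fit(
    x=x_train, y=y_train, dydx=dydx_train, is_normalize=True, max_iter=500,
)

# Predict responses and partial derivatives at new points
x_test = np.random.uniform(-1.0, 1.0, (2, 10))
y_pred = model.predict(x_test)               # shape (n_y, m_test)
dydx_pred = model.predict_partials(x_test)   # shape (n_y, n_x, m_test)

# Serialize and reload
model.save('parameters.json')
reloaded = jenn.NeuralNet.load('parameters.json')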

jenn.plot_actual_by_predicted(y_pred: NDArray, y_true: NDArray, figsize: tuple[float, float] = (3.25, 3), fontsize: int = 9, legend_fontsize: int = 7, legend_label: str = 'data', alpha: float = 0.5, ax: plt.Axes | None = None) Figure | SubFigure | None[source]

Plot predicted vs. actual value.

Note

This method uses ravel(). A NumPy array with shape \((n_y, m)\) will become \((n_y m,)\). This is useful to merge all responses in one plot. Use indexing to handle responses separately, e.g. jenn.plot_actual_by_predicted(y_pred=model.predict(x=x_test[2]), y_true=y_test[2]).

Parameters:
  • y_pred – predicted values for each dataset, list of arrays of shape (m,)

  • y_true – true values for each dataset, list of arrays of shape (m,)

  • figsize – figure size

  • fontsize – text size to use for axis labels

  • legend_fontsize – text size to use for legend labels

  • alpha – transparency of dots (between 0 and 1)

  • ax – the matplotlib axes on which to plot the data

Returns:

matplotlib Figure instance
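
A standalone sketch of plot_actual_by_predicted using synthetic predicted and true values; in practice y_pred would come from model.predict and y_true from held-out test data:

import numpy as np
import jenn

y_true = np.linspace(0.0, 1.0, 100).reshape(1, -1)        # shape (n_y, m)
y_pred = y_true + 0.05 * np.random.randn(*y_true.shape)   # imperfect "predictions"
fig = jenn.plot_actual_by_predicted(y_pred=y_pred, y_true=y_true, legend_label='test')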

jenn.plot_contours(func: Callable, x_min: np.ndarray, x_max: np.ndarray, x0: np.ndarray | None = None, x1_index: int = 0, x2_index: int = 1, y_index: int = 0, x_train: np.ndarray | None = None, x_test: np.ndarray | None = None, figsize: tuple[float, float] = (3.25, 3), fontsize: int = 9, alpha: float = 0.5, title: str = '', x1_label: str | None = None, x2_label: str | None = None, y_label: str | None = None, levels: int = 20, resolution: int = 100, show_colorbar: bool = False, ax: plt.Axes | None = None) Figure | SubFigure | None[source]

Plot contours of a scalar function of two variables, y = f(x1, x2).

Note

This method takes in a function of signature form y=f(x) and maps it onto a function of signature form y=f(x1, x2) such that the contours can be plotted.

Parameters:
  • func – the function to be evaluated, y = f(x)

  • x_min – lower bounds on x

  • x_max – upper bounds on x

  • x1_index – index of x to use for factor #1

  • x2_index – index of x to use for factor #2

  • y_index – index of y to be plotted

  • x_train – option to overlay training data if provided

  • x_test – option to overlay test data if provided

  • figsize – figure size

  • fontsize – text size

  • alpha – transparency of dots (between 0 and 1)

  • title – title of figure

  • x1_label – factor #1 label

  • x2_label – factor #2 label

  • y_label – response label

  • levels – number of contour levels

  • resolution – line resolution

  • show_colorbar – show the colorbar

  • ax – the matplotlib axes on which to plot the data

Returns:

matplotlib figure instance
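
A standalone sketch of plot_contours for a simple quadratic; the bounds are illustrative and are assumed to be given per input, consistent with the x_min/x_max arrays in the signature:

import numpy as np
import jenn

def f(x):  # x has shape (n_x, m), returns shape (n_y, m)
    return np.sum(x**2, axis=0, keepdims=True)

fig = jenn.plot_contours(
    func=f,
    x_min=np.array([-1.0, -1.0]),
    x_max=np.array([1.0, 1.0]),
    show_colorbar=True,
)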

jenn.plot_convergence(histories: History | list[History], figsize: tuple[float, float] = (3.25, 3), fontsize: int = 9, alpha: float = 1.0, title: str = '', legend: list[str] | None = None, is_xlog: bool = False, is_ylog: bool = True, ax: plt.Axes | None = None) Figure | SubFigure | None[source]

Plot training history.

Parameters:
  • histories – training history for each model

  • figsize – subfigure size of each subplot

  • fontsize – text size

  • alpha – transparency of dots (between 0 and 1)

  • title – title of figure

  • legend – label for each model

  • is_xlog – use log scale for x-axis

  • is_ylog – use log scale for y-axis

  • ax – the matplotlib axes on which to plot the data

Returns:

matplotlib figure instance

jenn.plot_goodness_of_fit(y_pred: NDArray, y_true: NDArray, title: str = '', percent: bool = False) Figure[source]

Make goodness of fit summary plots.
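
A standalone sketch of plot_goodness_of_fit with synthetic values; in practice y_pred comes from model.predict on a test set:

import numpy as np
import jenn

y_true = np.linspace(0.0, 1.0, 100).reshape(1, -1)        # shape (n_y, m)
y_pred = y_true + 0.02 * np.random.randn(*y_true.shape)
fig = jenn.plot_goodness_of_fit(y_pred=y_pred, y_true=y_true, title='illustration')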

jenn.plot_histogram(y_pred: NDArray, y_true: NDArray, figsize: tuple[float, float] = (3.25, 3), fontsize: int = 9, legend_fontsize: int = 7, legend_label: str = 'data', alpha: float = 0.75, percent: bool = False, ax: plt.Axes | None = None) Figure | SubFigure | None[source]

Plot prediction error distribution.

Note

This method uses ravel(). A NumPy array with shape (n_y, m) becomes (n_y * m,).

Parameters:
  • y_pred – predicted values for each dataset, list of arrays of shape (m,)

  • y_true – true values for each dataset, list of arrays of shape (m,)

  • figsize – figure size

  • fontsize – text size to use for axis labels

  • legend_fontsize – text size to use for legend labels

  • alpha – transparency of dots (between 0 and 1)

  • percent – show residuals as percentages

  • ax – the matplotlib axes on which to plot the data

Returns:

matplotlib Figure instance

jenn.plot_residual_by_predicted(y_pred: NDArray, y_true: NDArray, figsize: tuple[float, float] = (3.25, 3), fontsize: int = 9, legend_fontsize: int = 7, legend_label: str = 'data', alpha: float = 0.5, percent: bool = False, ax: plt.Axes | None = None) Figure | SubFigure | None[source]

Plot prediction error vs. predicted value.

Note

This method uses ravel(). A NumPy array with shape \((n_y, m)\) will become \((n_y m,)\). This is useful to merge all responses in one plot. Use indexing to handle responses separately, e.g. jenn.plot_residual_by_predicted(y_pred=model.predict(x=x_test[2]), y_true=y_test[2]).

Parameters:
  • y_pred – predicted values for each dataset, list of arrays of shape (m,)

  • y_true – true values for each dataset, list of arrays of shape (m,)

  • figsize – figure size

  • fontsize – text size to use for axis labels

  • legend_fontsize – text size to use for legend labels

  • alpha – transparency of dots (between 0 and 1)

  • percent – show residuals as percentages

  • ax – the matplotlib axes on which to plot the data

Returns:

matplotlib Figure instance

jenn.plot_sensitivity_profiles(func: Callable | list[Callable], x_min: np.ndarray, x_max: np.ndarray, x0: np.ndarray | None = None, x_true: np.ndarray | None = None, y_true: np.ndarray | None = None, figsize: tuple[float, float] = (3.25, 3), fontsize: int = 9, alpha: float = 1.0, title: str = '', xlabels: list[str] | None = None, ylabels: list[str] | None = None, legend_fontsize: int = 7, legend_label: str | list[str] | None = None, resolution: int = 100, show_cursor: bool = True) Figure[source]

Plot grid of all outputs vs. all inputs evaluated at x0.

Parameters:
  • func – callable function(s) for evaluating y = func(x)

  • x_min – lower bound, array of shape (n_x, 1)

  • x_max – upper bound, array of shape (n_x, 1)

  • x0 – point of evaluation, array of shape (n_x, 1)

  • x_true – true data inputs, array of shape (n_x, m)

  • y_true – true data outputs, array of shape (n_y, m)

  • figsize – figure size

  • fontsize – text size

  • alpha – transparency of dots (between 0 and 1)

  • title – title of figure

  • xlabels – x-axis labels

  • ylabels – y-axis labels

  • resolution – line resolution

  • legend_fontsize – legend text size

  • legend_label – legend labels for each model in func list

  • show_cursor – show x0 as a red dot (or not)
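
A standalone sketch of plot_sensitivity_profiles for a two-input quadratic, using the documented (n_x, 1) shapes for the bounds and the evaluation point:

import numpy as np
import jenn

def f(x):  # x has shape (n_x, m), returns shape (n_y, m)
    return np.sum(x**2, axis=0, keepdims=True)

fig = jenn.plot_sensitivity_profiles(
    func=f,
    x_min=np.array([[-1.0], [-1.0]]),   # shape (n_x, 1)
    x_max=np.array([[1.0], [1.0]]),     # shape (n_x, 1)
    x0=np.zeros((2, 1)),                # point of evaluation, shape (n_x, 1)
)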

jenn.utilities.finite_difference(f: Callable, x: np.ndarray, dx: float = 1e-06) np.ndarray[source]

Evaluate partial derivative using finite difference.

Parameters:
  • f – function to be differentiated, y = f(x)

  • x – inputs, array of shape (n_x, m)

  • dx – finite difference step size

Returns:

partials, array of shape (n_y, n_x, m)
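
A minimal sketch checking finite_difference against the analytic partials of y = x1**2 + x2**2 (the tolerance is illustrative):

import numpy as np
import jenn.utilities

def f(x):  # x has shape (n_x, m), returns shape (n_y, m)
    return np.sum(x**2, axis=0, keepdims=True)

x = np.random.rand(2, 10)
dydx = jenn.utilities.finite_difference(f, x)                     # shape (n_y, n_x, m)
print(np.allclose(dydx, (2.0 * x).reshape(1, 2, -1), atol=1e-4))  # analytic: dy/dxi = 2*xi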

jenn.utilities.from_jmp(equation: str | Path) NeuralNet[source]

Load trained JMP model given formula.

Note

The equation is assumed to have been obtained from JMP using the “Save Profile Formulas” method and copied/pasted.

Note

JMP yields a separate equation for each output. It does not provide a single equation that predicts all outputs at once. This function therefore yields NeuralNet objects that predict only a single output (consistent with JMP).

Warning

The order of inputs matches the order used in JMP. The burden is on the user to keep track of variable ordering.

Parameters:

equation – either the equation itself or a filename containing it

Returns:

jenn.model.NeuralNet object preloaded with the JMP parameters

jenn.utilities.rbf(r: ndarray, epsilon: float = 0.0, out: ndarray | None = None) ndarray[source]

Compute Gaussian Radial Basis Function (RBF).

Parameters:
  • r – radius from center of RBF

  • epsilon – hyperparameter

  • out – optional output array in which to write the results

jenn.utilities.sample(f: Callable, m_random: int, m_levels: int, lb: np.typing.ArrayLike, ub: np.typing.ArrayLike, dx: float = 1e-06, f_prime: Callable | None = None, random_state: int | None = None) tuple[np.ndarray, np.ndarray, np.ndarray][source]

Generate synthetic data by sampling the test function.

Parameters:
  • f (Callable) – callable function to be sampled, y = f(x)

  • m_random (int) – number of random samples

  • m_levels (int) – number of levels per factor for full factorial

  • lb (np.ndarray) – lower bound on the factors

  • ub (np.ndarray) – upper bound on the factors

  • dx (float) – finite difference step size

  • f_prime (Callable) – callable 1st derivative to be sampled, y = f’(x)

  • random_state (int) – random seed (for repeatability)

Returns:

sampled (x, y, y’)

Return type:

np.ndarray
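
A minimal sketch of sample for a one-dimensional test function; the returned arrays are expected to follow the (n_x, m), (n_y, m), and (n_y, n_x, m) conventions used elsewhere in this documentation:

import numpy as np
import jenn.utilities

def f(x):  # x has shape (n_x, m)
    return x**2

x, y, dydx = jenn.utilities.sample(
    f=f, m_random=10, m_levels=5, lb=[0.0], ub=[1.0], random_state=0
)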

2. Core API

The core API implements all theory described in the paper. This section is intended for developers.

2.1. Model.

This module contains the main class to train a neural net and make predictions. It acts as an interface between the user and the core functions doing the computations under the hood.

#################
# Example Usage #
#################

import jenn

# Fit model
model = jenn.NeuralNet(
    layer_sizes=[
        x_train.shape[0],  # input layer
        7, 7,              # hidden layer(s) -- user defined
        y_train.shape[0],  # output layer
    ],
).fit(
    x_train, y_train, dydx_train, **kwargs  # note: user must provide this
)

# Predict response only
y_pred = model.predict(x_test)

# Predict partials only
dydx_pred = model.predict_partials(x_train)

# Predict response and partials in one step (preferred)
y_pred, dydx_pred = model(x_test)

Note

The __call__() method should be preferred over separately calling predict() followed by predict_partials() whenever both the response and its partials are needed at the same point. This saves computation since, in the latter approach, forward propagation is unnecessarily performed twice. Similarly, to avoid unnecessary partial derivative calculations, the predict() method should be preferred whenever only response values are needed. The method predict_partials() is provided for those situations where Jacobian predictions must be separated out, for example because of how some target optimization software is architected.

class jenn.core.model.NeuralNet(layer_sizes: list[int], hidden_activation: str = 'tanh', output_activation: str = 'linear')[source]

Neural network model.

Parameters:
  • layer_sizes – number of nodes in each layer (including input/output layers)

  • hidden_activation – activation function used in hidden layers

  • output_activation – activation function used in output layer

fit(x: np.ndarray, y: np.ndarray, dydx: np.ndarray | None = None, is_normalize: bool = False, alpha: float = 0.05, beta: np.ndarray | float = 1.0, gamma: np.ndarray | float = 1.0, lambd: float = 0.0, beta1: float = 0.9, beta2: float = 0.99, tau: float = 0.5, tol: float = 1e-12, max_count: int = 1000, epsilon_absolute: float = 1e-12, epsilon_relative: float = 1e-12, epochs: int = 1, batch_size: int | None = None, max_iter: int = 1000, shuffle: bool = True, random_state: int | None = None, is_backtracking: bool = False, is_warmstart: bool = False, is_verbose: bool = False) Self[source]

Train neural network.

Note

If training is taking too long, it can be stopped gracefully by creating a local file called STOP in the running directory. Just be sure to delete it before the next run.

Parameters:
  • x – training data inputs, array of shape (n_x, m)

  • y – training data outputs, array of shape (n_y, m)

  • dydx – training data Jacobian, array of shape (n_y, n_x, m)

  • is_normalize – normalize training data by mean and variance

  • alpha – optimizer learning rate for line search

  • beta – LSE coefficients [defaulted to one] (optional)

  • gamma – jacobian-enhancement regularization coefficient [defaulted to one] (optional)

  • lambd – regularization coefficient to avoid overfitting [defaulted to zero] (optional)

  • beta1 – ADAM optimizer hyperparameter to control momentum

  • beta2 – ADAM optimizer hyperparameter to control momentum

  • tau – amount by which to reduce \(\alpha := \tau \times \alpha\) on each iteration

  • tol – stop when cost function doesn’t improve more than specified tolerance

  • max_count – stop when line search iterations exceed maximum count specified

  • epsilon_absolute – absolute error stopping criterion

  • epsilon_relative – relative error stopping criterion

  • epochs – number of passes through data

  • batch_size – size of each batch for minibatch

  • max_iter – max number of optimizer iterations

  • shuffle – shuffle minibatches or not

  • random_state – control repeatability

  • is_backtracking – use backtracking line search or not

  • is_warmstart – reuse existing parameters instead of re-initializing them (warm start)

  • is_verbose – print out progress for each (iteration, batch, epoch)

Returns:

NeuralNet instance (self)

Warning

Normalization usually helps, except when the training data is made up of very small numbers. In that case, normalizing by the variance has the undesirable effect of dividing by a very small number and should not be used.

classmethod load(file: str | Path = 'parameters.json') NeuralNet[source]

Load serialized parameters into a new NeuralNet instance.

predict(x: np.ndarray) np.ndarray[source]

Predict responses.

Parameters:

x – vectorized inputs, array of shape (n_x, m)

Returns:

predicted response(s), array of shape (n_y, m)

predict_partials(x: np.ndarray) np.ndarray[source]

Predict partial derivatives.

Parameters:

x – vectorized inputs, array of shape (n_x, m)

Returns:

predicted partial(s), array of shape (n_y, n_x, m)

save(file: str | Path = 'parameters.json') None[source]

Serialize parameters and save to JSON file.

2.2. Activation.

This module implements activation functions used by the neural network.

class jenn.core.activation.Activation[source]

Activation function base class.

abstract classmethod evaluate(x: ndarray, y: ndarray | None = None) ndarray[source]

Evaluate activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – output array in which to write the results (optional)

Returns:

activation function evaluated at x (as new array if y not provided as input)

abstract classmethod first_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None) ndarray[source]

Evaluate 1st derivative of activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – response already evaluated at x (optional)

  • dy – output array in which to write the 1st derivative (optional)

Returns:

1st derivative (as new array if dy not provided as input)

abstract classmethod second_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None, ddy: ndarray | None = None) ndarray[source]

Evaluate 2nd derivative of activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – response already evaluated at x (optional)

  • dy – 1st derivative already evaluated at x (optional)

  • ddy – output array in which to write the 2nd derivative (optional)

Returns:

2nd derivative (as new array if ddy not provided as input)

class jenn.core.activation.Linear[source]

Linear activation function.

\[y = x\]
classmethod evaluate(x: ndarray, y: ndarray | None = None) ndarray[source]

Evaluate activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – output array in which to write the results (optional)

Returns:

activation function evaluated at x (as new array if y not provided as input)

classmethod first_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None) ndarray[source]

Evaluate 1st derivative of activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – response already evaluated at x (optional)

  • dy – output array in which to write the 1st derivative (optional)

Returns:

1st derivative (as new array if dy not provided as input)

classmethod second_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None, ddy: ndarray | None = None) ndarray[source]

Evaluate 2nd derivative of activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – response already evaluated at x (optional)

  • dy – 1st derivative already evaluated at x (optional)

  • ddy – output array in which to write the 2nd derivative (optional)

Returns:

2nd derivative (as new array if ddy not provided as input)

class jenn.core.activation.Relu[source]

Rectified linear unit activation.

\[\begin{split}y = \begin{cases} x & \text{if}~ x \ge 0 \\ 0 & \text{otherwise} \end{cases}\end{split}\]
classmethod evaluate(x: ndarray, y: ndarray | None = None) ndarray[source]

Evaluate activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – output array in which to write the results (optional)

Returns:

activation function evaluated at x (as new array if y not provided as input)

classmethod first_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None) ndarray[source]

Evaluate 1st derivative of activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – response already evaluated at x (optional)

  • dy – output array in which to write the 1st derivative (optional)

Returns:

1st derivative (as new array if dy not provided as input)

classmethod second_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None, ddy: ndarray | None = None) ndarray[source]

Evaluate 2nd derivative of activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – response already evaluated at x (optional)

  • dy – 1st derivative already evaluated at x (optional)

  • ddy – output array in which to write the 2nd derivative (optional)

Returns:

2nd derivative (as new array if ddy not provided as input)

class jenn.core.activation.Tanh[source]

Hyperbolic tangent.

\[y = \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}\]
classmethod evaluate(x: ndarray, y: ndarray | None = None) ndarray[source]

Evaluate activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – output array in which to write the results (optional)

Returns:

activation function evaluated at x (as new array if y not provided as input)

classmethod first_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None) ndarray[source]

Evaluate 1st derivative of activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – response already evaluated at x (optional)

  • dy – output array in which to write the 1st derivative (optional)

Returns:

1st derivative (as new array if dy not provided as input)

classmethod second_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None, ddy: ndarray | None = None) ndarray[source]

Evaluate 2nd derivative of activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – response already evaluated at x (optional)

  • dy – 1st derivative already evaluated at x (optional)

  • ddy – output array in which to write the 2nd derivative (optional)

Returns:

2nd derivative (as new array if ddy not provided as input)
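
A minimal sketch of the reuse pattern supported by the optional arguments, using Tanh (passing y and dy back in avoids re-evaluating quantities that are already available):

import numpy as np
from jenn.core.activation import Tanh

x = np.linspace(-2.0, 2.0, 5)
y = Tanh.evaluate(x)                    # tanh(x)
dy = Tanh.first_derivative(x, y)        # reuses y instead of re-evaluating tanh
ddy = Tanh.second_derivative(x, y, dy)  # reuses both y and dy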

2.3. Cache.

This module defines a convenience class to store all quantities computed during forward propagation, so they don’t have to be recomputed during backward propagation. See paper for details and notation.

class jenn.core.cache.Cache(layer_sizes: list[int], m: int = 1)[source]

Neural net cache.

A cache stores neural net quantities computed during forward prop for each layer, so they don’t have to be recomputed again during backprop. This makes the algorithm faster.

Warning

The attributes of this class are not protected. It’s possible to overwrite them instead of updating them in place. To ensure that an array is updated in place, use the numpy [:] syntax:

cache = Cache(shapes)
layer_1_activations = cache.A[1]
layer_1_activations[:] = new_array_values  # note [:]

Note

The variables and their symbols refer to the theory in the companion paper for this library.

Parameters:
  • layer_sizes – number of nodes in each layer (including input/output layers)

  • m – number of examples (used to preallocate arrays)

Variables:
  • Z (List[numpy.ndarray]) – \(Z^{[l]} \in \mathbb{R}^{n^{[l]}\times m}~\forall~ l = 1 \dots L\)

  • Z_prime (List[numpy.ndarray]) – \({Z^\prime}^{[l]} \in \mathbb{R}^{n^{[l]}\times n_x \times m}~\forall~ l = 1 \dots L\)

  • A (List[numpy.ndarray]) – \(A^{[l]} = g(Z^{[l]}) \in \mathbb{R}^{n^{[l]} \times m}~\forall~ l = 1 \dots L\)

  • A_prime (List[numpy.ndarray]) – \({A^\prime}^{[l]} = g^\prime(Z^{[l]})Z^{\prime[l]} \in \mathbb{R}^{n^{[l]}\times n_x \times m}\)

  • G_prime (List[numpy.ndarray]) – \(G^{\prime} = g^{\prime}(Z^{[l]}) \in \mathbb{R}^{n^{[l]} \times m}~\forall~ l = 1 \dots L\)

  • G_prime_prime (List[numpy.ndarray]) – \(G^{\prime\prime} = g^{\prime\prime}(Z^{[l]}) \in \mathbb{R}^{n^{[l]} \times m}\)

  • dA (List[numpy.ndarray]) – \({\partial \mathcal{J}}/{dA^{[l]}} \in \mathbb{R}^{n^{[l]} \times m}~\forall~ l = 1 \dots L\)

  • dA_prime – \({\partial \mathcal{J}}/{dA^{\prime[l]}} \in \mathbb{R}^{n^{[l]} \times n_x \times m}~\forall~ l = 1 \dots L\)

property m: int

Return number of examples.

property n_x: int

Return number of inputs.

property n_y: int

Return number of outputs.

2.4. Cost Function.

This module contains class and methods to efficiently compute the neural net cost function used for training. It is a modified version of the Least Squared Estimator (LSE), augmented with a penalty function for regularization and another term which accounts for Jacobian prediction error. See paper for details and notation.

class jenn.core.cost.Cost(data: Dataset, parameters: Parameters, lambd: float = 0.0)[source]

Neural Network cost function.

Parameters:
  • data – Dataset object containing training data (and associated metadata)

  • parameters – object containing neural net parameters (and associated metadata) for each layer

  • lambd – regularization coefficient to avoid overfitting

evaluate(Y_pred: ndarray, J_pred: ndarray | None = None) float64[source]

Evaluate cost function.

Parameters:
  • Y_pred – predicted outputs \(A^{[L]} \in \mathbb{R}^{n_y \times m}\)

  • J_pred – predicted Jacobian \(A^{\prime[L]} \in \mathbb{R}^{n_y \times n_x \times m}\)

class jenn.core.cost.GradientEnhancement(J_true: ndarray, J_weights: ndarray | float = 1.0)[source]

Least Squares Estimator for partials.

Parameters:
  • J_true – training data Jacobian \(Y^{\prime} \in \mathbb{R}^{n_y \times n_x \times m}\)

  • J_weights – weights by which to prioritize partials (optional)

evaluate(J_pred: ndarray) float64[source]

Compute least squares estimator for the partials.

Parameters:

J_pred – predicted Jacobian \(A^{\prime[L]} \in \mathbb{R}^{n_y \times n_x \times m}\)

class jenn.core.cost.Regularization(weights: list[numpy.ndarray], lambd: float = 0.0)[source]

Compute regularization penalty.

evaluate() float[source]

Compute L2 norm penalty.

Parameters:
  • weights – neural parameters \(W^{[l]} \in \mathbb{R}^{n^{[l]} \times n^{[l-1]}}\) associated with each layer

  • lambd – regularization coefficient \(\lambda \in \mathbb{R}\) (hyperparameter to be tuned)

class jenn.core.cost.SquaredLoss(Y_true: ndarray, Y_weights: ndarray | float = 1.0)[source]

Least Squares Estimator.

Parameters:
  • Y_true – training data outputs \(Y \in \mathbb{R}^{n_y \times m}\)

  • Y_weights – weights by which to prioritize data points (optional)

evaluate(Y_pred: ndarray) float64[source]

Compute least squares estimator of the states in place.

Parameters:

Y_pred – predicted outputs \(A^{[L]} \in \mathbb{R}^{n_y \times m}\)
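
A minimal sketch of evaluating the cost of (deliberately poor) zero predictions on a tiny one-input, one-output dataset; the layer sizes and regularization coefficient are illustrative:

import numpy as np
from jenn.core.cost import Cost
from jenn.core.data import Dataset
from jenn.core.parameters import Parameters

X = np.linspace(0.0, 1.0, 10).reshape(1, -1)   # inputs, shape (n_x, m)
Y = X**2                                       # outputs, shape (n_y, m)
data = Dataset(X=X, Y=Y)
params = Parameters(layer_sizes=[1, 3, 1])
params.initialize(random_state=0)

cost = Cost(data=data, parameters=params, lambd=0.01)
print(cost.evaluate(Y_pred=np.zeros_like(Y)))  # scalar cost value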

2.5. Data.

This module contains convenience utilities to manage and handle training data.

class jenn.core.data.Dataset(X: ndarray, Y: ndarray, J: ndarray | None = None, Y_weights: ndarray | float = 1.0, J_weights: ndarray | float = 1.0)[source]

Store training data and associated metadata for easy access.

Parameters:
  • X – training data inputs, array of shape (n_x, m)

  • Y – training data outputs, array of shape (n_y, m)

  • J – training data Jacobians, array of shape (n_y, n_x, m)

property avg_x: ndarray

Return mean of input data as array of shape (n_x, 1).

property avg_y: ndarray

Return mean of output data as array of shape (n_y, 1).

property m: int

Return number of training examples.

mini_batches(batch_size: int | None, shuffle: bool = True, random_state: int | None = None) list[jenn.core.data.Dataset][source]

Breakup data into multiple batches and return list of Datasets.

Parameters:
  • batch_size – mini batch size (if None, single batch with all data)

  • shuffle – whether to shuffle data points or not

  • random_state – random seed (useful to make runs repeatable)

Returns:

list of Dataset representing data broken up in batches

property n_x: int

Return number of inputs.

property n_y: int

Return number of outputs.

normalize() Dataset[source]

Return normalized Dataset.

set_weights(beta: ndarray | float = 1.0, gamma: ndarray | float = 1.0) None[source]

Prioritize certain points more than others.

Rationale: this can be used to reward the optimizer more in certain regions.

Parameters:
  • beta – multiplier(s) on Y

  • gamma – multiplier(s) on J

property std_x: ndarray

Return standard dev of input data, array of shape (n_x, 1).

property std_y: ndarray

Return standard dev of output data, array of shape (n_y, 1).
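
A minimal sketch of wrapping synthetic arrays in a Dataset and splitting them into mini-batches (the array contents and batch size are illustrative):

import numpy as np
from jenn.core.data import Dataset

m = 100
X = np.random.rand(2, m)                        # inputs, shape (n_x, m)
Y = np.sum(X**2, axis=0, keepdims=True)         # outputs, shape (n_y, m)
J = (2.0 * X).reshape(1, 2, m)                  # Jacobian, shape (n_y, n_x, m)

data = Dataset(X=X, Y=Y, J=J)
print(data.n_x, data.n_y, data.m)               # 2 1 100

batches = data.mini_batches(batch_size=32, shuffle=True, random_state=0)
print(len(batches))                             # number of Dataset batches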

jenn.core.data.avg(array: ndarray) ndarray[source]

Compute mean and reshape as column array.

Parameters:

array – array of shape (-1, m)

Returns:

column array corresponding to mean of each row

jenn.core.data.denormalize(data: ndarray, mu: ndarray, sigma: ndarray) ndarray[source]

Undo normalization.

Parameters:
  • data – normalized data, array of shape (-1, m)

  • mu – mean of the data, array of shape (-1, 1)

  • sigma – std deviation of the data, array of shape (-1, 1)

Returns:

denormalized data, array of shape (-1, m)

jenn.core.data.denormalize_partials(partials: ndarray, sigma_x: ndarray, sigma_y: ndarray) ndarray[source]

Undo normalization of partials.

Parameters:
  • partials – normalized training data partials \(\bar{J}\in\mathbb{R}^{n_y\times n_x \times m}\)

  • sigma_x – std dev of training data factors \(\sigma_x\), array of shape (-1, 1)

  • sigma_y – std dev of training data responses \(\sigma_y\), array of shape (-1, 1)

Returns:

denormalized partials, array of shape (n_y, n_x, m)

jenn.core.data.mini_batches(X: ndarray, batch_size: int | None, shuffle: bool = True, random_state: int | None = None) list[tuple[int, ...]][source]

Create randomized mini-batches.

Parameters:
  • X – training data input \(X\in\mathbb{R}^{n_x\times m}\)

  • batch_size – mini batch size (if None, single batch with all data)

  • shuffle – whether to shuffle data points or not

  • random_state – random seed (useful to make runs repeatable)

Returns:

list of tuples containing training data indices allocated to each batch

jenn.core.data.normalize(data: ndarray, mu: ndarray, sigma: ndarray) ndarray[source]

Center data about mean and normalize by standard deviation.

Parameters:
  • data – data to be normalized, array of shape (-1, m)

  • mu – mean of the data, array of shape (-1, 1)

  • sigma – std deviation of the data, array of shape (-1, 1)

Returns:

normalized data, array of shape (-1, m)

jenn.core.data.normalize_partials(partials: ndarray | None, sigma_x: ndarray, sigma_y: ndarray) ndarray | None[source]

Normalize partials.

Parameters:
  • partials – training data partials to be normalized \(J\in\mathbb{R}^{n_y\times n_x \times m}\)

  • sigma_x – std dev of training data factors \(\sigma_x\), array of shape (-1, 1)

  • sigma_y – std dev of training data responses \(\sigma_y\), array of shape (-1, 1)

Returns:

normalized partials, array of shape (n_y, n_x, m)

jenn.core.data.std(array: ndarray) ndarray[source]

Compute standard deviation and reshape as column array.

Parameters:

array – array of shape (-1, m)

Returns:

column array corresponding to std dev of each row
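
A minimal sketch of the normalize/denormalize round trip using avg and std:

import numpy as np
from jenn.core.data import avg, std, normalize, denormalize

raw = np.random.rand(2, 50)                 # array of shape (-1, m)
mu, sigma = avg(raw), std(raw)              # column arrays of shape (-1, 1)
scaled = normalize(raw, mu, sigma)
restored = denormalize(scaled, mu, sigma)
print(np.allclose(raw, restored))           # round trip recovers the original data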

2.6. Optimization.

This module implements gradient-based optimization using ADAM.

class jenn.core.optimization.ADAM(beta_1: float = 0.9, beta_2: float = 0.99)[source]

Take single step along the search direction as determined by ADAM.

Parameters \(\boldsymbol{x}\) are updated according to \(\boldsymbol{x} := \boldsymbol{x} + \alpha \boldsymbol{s}\) where \(\boldsymbol{s}\) is determined by ADAM in such a way as to improve efficiency. This is accomplished by making use of previous information (see paper).

Parameters:
  • beta_1 – exponential decay rate of 1st moment vector \(\beta_1\in[0, 1)\)

  • beta_2 – exponential decay rate of 2nd moment vector \(\beta_2\in[0, 1)\)

class jenn.core.optimization.ADAMOptimizer(beta_1: float = 0.9, beta_2: float = 0.99, tau: float = 0.5, tol: float = 1e-12, max_count: int = 10)[source]

Search for optimum using ADAM algorithm.

Parameters:
  • beta_1 – exponential decay rate of 1st moment vector \(\beta_1\in[0, 1)\)

  • beta_2 – exponential decay rate of 2nd moment vector \(\beta_2\in[0, 1)\)

  • tau – amount by which to reduce \(\alpha := \tau \times \alpha\) on each iteration

  • tol – stop when cost function doesn’t improve more than specified tolerance

  • max_count – stop when line search iterations exceed maximum count specified

class jenn.core.optimization.Backtracking(update: Update, tau: float = 0.5, tol: float = 1e-06, max_count: int = 10)[source]

Search for optimum along a search direction.

Parameters:
  • update – object that updates parameters according to \(\boldsymbol{x} := \boldsymbol{x} + \alpha \boldsymbol{s}\)

  • tau – amount by which to reduce \(\alpha := \tau \times \alpha\) on each iteration

  • tol – stop when cost function doesn’t improve more than specified tolerance

  • max_count – stop when line search iterations exceed maximum count specified

__call__(x0: np.ndarray, y0: np.ndarray, search_direction: np.ndarray, cost_function: Callable, learning_rate: float = 0.05) tuple[np.ndarray, np.ndarray][source]

Take multiple “update” steps along search direction.

Parameters:
  • x0 – initial value of parameters to be updated, array of shape (n,)

  • y0 – initial value of cost function evaluated at x0, array of shape (n,)

  • search_direction – search direction \(\boldsymbol{s}\) along which to update the parameters, array of shape (n,)

  • cost_function – objective function \(f\) to be evaluated along the search direction

  • learning_rate – maximum allowed step size \(\alpha \le \alpha_{max}\)

Returns:

updated parameters and cost \(x, y\), 2 x array of shape (n,)

class jenn.core.optimization.GD[source]

Take single step along the search direction using gradient descent.

GD simply follows the steepest descent path according to \(\boldsymbol{x} := \boldsymbol{x} + \alpha \boldsymbol{s}\) where \(\boldsymbol{s} = -\nabla_x f\).

class jenn.core.optimization.GDOptimizer(tau: float = 0.5, tol: float = 1e-06, max_count: int = 10)[source]

Search for optimum using gradient descent.

Warning

This optimizer is very inefficient. It was intended as a baseline during development. It is not recommended. Use ADAM instead.

Parameters:
  • tau – amount by which to reduce \(\alpha := \tau \times \alpha\) on each iteration

  • tol – stop when cost function doesn’t improve more than specified tolerance

  • max_count – stop when line search iterations exceed maximum count specified

class jenn.core.optimization.LineSearch(update: Update)[source]

Take multiple steps of varying size by progressively varying \(\alpha\) along the search direction.

Parameters:

update – object that implements Update base class to update parameters according to \(\boldsymbol{x} := \boldsymbol{x} + \alpha \boldsymbol{s}\)

abstract __call__(x0: np.ndarray, y0: np.ndarray, search_direction: np.ndarray, cost_function: Callable, learning_rate: float) tuple[np.ndarray, np.ndarray][source]

Take multiple steps along the search direction.

Parameters:
  • x0 – initial value of parameters to be updated, array of shape (n,)

  • y0 – initial value of cost function evaluated at x0, array of shape (n,)

  • search_direction – search direction along which to update the parameters, array of shape (n,)

  • cost_function – cost function to be evaluated along the search direction

  • learning_rate – initial step size \(\alpha\)

Returns:

updated parameters and cost, 2 x array of shape (n,)

class jenn.core.optimization.Optimizer(line_search: LineSearch)[source]

Find optimum using gradient-based optimization.

Parameters:

line_search – object that implements algorithm to compute search direction \(\boldsymbol{s}\) given the gradient \(\nabla_x f\) at the current parameter values \(\boldsymbol{x}\) and take multiple steps along it to update them according to \(\boldsymbol{x} := \boldsymbol{x} + \alpha \boldsymbol{s}\)

minimize(x: np.ndarray, f: Callable, dfdx: Callable, alpha: float = 0.01, max_iter: int = 100, verbose: bool = False, epoch: int | None = None, batch: int | None = None, epsilon_absolute: float = 1e-12, epsilon_relative: float = 1e-12, N1_max: int = 100, N2_max: int = 100) np.ndarray[source]

Minimize single objective function.

Parameters:
  • x – parameters to be updated, array of shape (n,)

  • f – cost function \(y = f(\boldsymbol{x})\)

  • dfdx – cost function gradient \(\nabla_x f\) w.r.t. the parameters

  • alpha – learning rate \(\boldsymbol{x} := \boldsymbol{x} + \alpha \boldsymbol{s}\)

  • max_iter – maximum number of optimizer iterations allowed

  • verbose – whether or not to send progress output to standard out

  • epoch – the epoch in which this optimization is being run (for printing)

  • batch – the batch in which this optimization is being run (for printing)

  • epsilon_absolute – absolute error stopping criterion

  • epsilon_relative – relative error stopping criterion

  • N1_max – number of iterations for which absolute criterion must hold true before stop

  • N2_max – number of iterations for which relative criterion must hold true before stop
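
A minimal sketch minimizing a simple quadratic, assuming ADAMOptimizer exposes the minimize() method documented on the Optimizer base class (the learning rate and iteration count are illustrative):

import numpy as np
from jenn.core.optimization import ADAMOptimizer

f = lambda x: float(np.sum(x**2))   # cost function y = f(x)
dfdx = lambda x: 2.0 * x            # its gradient

x_opt = ADAMOptimizer().minimize(
    x=np.array([1.0, 1.0]), f=f, dfdx=dfdx, alpha=0.1, max_iter=500
)
print(x_opt)  # should approach the origin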

class jenn.core.optimization.Update[source]

Base class for line search.

Update parameters \(\boldsymbol{x}\) by taking a step along the search direction \(\boldsymbol{s}\) according to \(\boldsymbol{x} := \boldsymbol{x} + \alpha \boldsymbol{s}\)

__call__(params: np.ndarray, grads: np.ndarray, alpha: float, **kwargs: dict[str, Any]) np.ndarray[source]

Take a single step along search direction.

Parameters:
  • params – parameters \(x\) to be updated

  • grads – gradient \(\nabla_x f\) of objective function \(f\) w.r.t. each parameter \(x\)

  • alpha – learning rate \(\alpha\)

2.7. Parameters.

This module defines a utility class to store and manage neural net parameters and metadata.

class jenn.core.parameters.Parameters(layer_sizes: list[int], hidden_activation: str = 'tanh', output_activation: str = 'linear')[source]

Neural network parameters.

Warning

The attributes of this class are not protected. It’s possible to overwrite them instead of updating them in place. To ensure that an array is updated in place, use the numpy [:] syntax:

parameters = Parameters(**kwargs)
layer_1_weights = parameters.W[1]
layer_1_weights[:] = new_array_values  # note [:]

Note

The variables and their symbols refer to the theory in the companion paper for this library.

Parameters:
  • layer_sizes – number of nodes in each layer (including input/output layers)

  • hidden_activation – activation function used in hidden layers

  • output_activation – activation function used in output layer

Variables:
  • W (List[np.ndarray]) – weights \(\boldsymbol{W} \in \mathbb{R}^{n^{[l]} \times n^{[l-1]}}\) for each layer

  • b (List[np.ndarray]) – biases \(\boldsymbol{b} \in \mathbb{R}^{n^{[l]} \times 1}\) for each layer

  • a (List[str]) – activation names for each layer

  • dW (List[np.ndarray]) – partials w.r.t. weight \(dL/dW^{[l]} \in \mathbb{R}^{n^{[l]} \times n^{[l-1]}}\)

  • db (List[np.ndarray]) – partials w.r.t. bias \(dL/db^{[l]} \in \mathbb{R}^{n^{[l]} \times 1}\)

  • mu_x (List[np.ndarray]) – mean of training data inputs used for normalization \(\mu_x \in \mathbb{R}^{n_x \times 1}\)

  • mu_y – mean of training data outputs used for normalization \(\mu_y \in \mathbb{R}^{n_y \times 1}\)

  • sigma_x (List[np.ndarray]) – standard deviation of training data inputs used for normalization \(\sigma_x \in \mathbb{R}^{n_x \times 1}\)

  • sigma_y (List[np.ndarray]) – standard deviation of training data outputs used for normalization \(\sigma_y \in \mathbb{R}^{n_y \times 1}\)

property L: int

Return number of layers.

initialize(random_state: int | None = None) None[source]

Use He initialization to initialize parameters.

Parameters:

random_state – optional random seed (for repeatability)

property layers: Iterable[int]

Return iterator of index for each layer.

classmethod load(binary_file: str | Path = 'parameters.json') Parameters[source]

Load serialized parameters into a new Parameters instance.

Parameters:

binary_file (str | Path) – JSON file containing saved parameters

Returns:

a new instance of Parameters

Return type:

Parameters

property n_x: int

Return number of inputs.

property n_y: int

Return number of outputs.

property partials: Iterable[int]

Return iterator of index for each partial.

save(binary_file: str | Path = 'parameters.json') None[source]

Save parameters to the specified JSON file.

stack() ndarray[source]

Stack W, b into a single array.

parameters.stack()
>> np.array([[W1], [b1], [W2], [b2], [W3], [b3]])

Note

This method is used to convert the list format used by the neural net into a single array of stacked parameters for optimization.

stack_partials() ndarray[source]

Stack backprop partials dW, db.

parameters.stack_partials()
>> np.array([[dW1], [db1], [dW2], [db2], [dW3], [db3]])

Note

This method is used to convert the list format used by the neural net into a single array of stacked parameters for optimization.

stack_partials_per_layer() list[numpy.ndarray][source]

Stack backprop partials dW, db per layer.

parameters.stack_partials_per_layer()
>> [np.array([[dW1], [db1]]), np.array([[dW2], [db2]]), np.array([[dW3], [db3]])]

stack_per_layer() list[numpy.ndarray][source]

Stack W, b into a single array for each layer.

parameters.stack_per_layer()
>> [np.array([[W1], [b1]]), np.array([[W2], [b2]]), np.array([[W3], [b3]])]

unstack(parameters: ndarray | list[numpy.ndarray]) None[source]

Unstack parameters W, b back into list of arrays.

Parameters:

parameters – neural network parameters as either a single array where all layers are stacked on top of each other or a list of stacked parameters for each layer.

# Unstack from single stack
parameters.unstack(np.array([[W1], [b1], [W2], [b2], [W3], [b3]]))
parameters.W, parameters.b
>> [W1, W2, W3], [b1, b2, b3]

# Unstack from list of stacks
parameters.unstack([np.array([[W1], [b1]]), np.array([[W2], [b2]]), np.array([[W3], [b3]])])
parameters.W, parameters.b
>> [W1, W2, W3], [b1, b2, b3]

Note

This method is used to convert optimization results expressed as a single array of stacked parameters, back into the list format used by the neural net.

unstack_partials(partials: ndarray | list[numpy.ndarray]) None[source]

Unstack backprop partials dW, db back into list of arrays.

Parameters:

partials – neural network partials as either a single array where all layers are stacked on top of each other or a list of stacked parameters for each layer.

# Unstack from single stack
parameters.unstack_partials(np.array([[dW1], [db1], [dW2], [db2], [dW3], [db3]]))
parameters.dW, parameters.db
>> [dW1, dW2, dW3], [db1, db2, db3]

# Unstack from list of stacks
parameters.unstack_partials([np.array([[dW1], [db1]]), np.array([[dW2], [db2]]), np.array([[dW3], [db3]])])
parameters.dW, parameters.db
>> [dW1, dW2, dW3], [db1, db2, db3]

Note

This method is used to convert optimization results expressed as a single array of stacked parameters, back into the list format used by the neural net.

validate_parameters() None[source]

Validate parameters.
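
A minimal sketch of creating, initializing, stacking, and serializing Parameters for a small 2-3-1 network:

from jenn.core.parameters import Parameters

params = Parameters(layer_sizes=[2, 3, 1])
params.initialize(random_state=0)             # He initialization
flat = params.stack()                         # W, b flattened into a single array
params.unstack(flat)                          # ...and back into the per-layer lists W, b

params.save('parameters.json')
reloaded = Parameters.load('parameters.json')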

2.8. Propagation.

This module contains the critical functionality to propagate information forward and backward through the neural net.

jenn.core.propagation.eye(n: int, m: int) ndarray[source]

Copy identity matrix of shape (n, n) m times.

jenn.core.propagation.first_layer_forward(X: np.ndarray, cache: Cache | None = None) None[source]

Compute input layer activations (in place).

Parameters:
  • X – training data inputs, array of shape (n_x, m)

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

jenn.core.propagation.first_layer_partials(X: np.ndarray, cache: Cache | None) None[source]

Compute input layer partial (in place).

Parameters:
  • X – training data inputs, array of shape (n_x, m)

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

jenn.core.propagation.gradient_enhancement(layer: int, parameters: Parameters, cache: Cache, data: Dataset) None[source]

Add gradient enhancement to backprop (in place).

Parameters:
  • layer – index of current layer.

  • parameters – object that stores neural net parameters for each layer

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

  • data – object containing training and associated metadata

jenn.core.propagation.last_layer_backward(cache: Cache, data: Dataset) None[source]

Propagate backward through last layer (in place).

Parameters:
  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

  • data – object containing training and associated metadata

jenn.core.propagation.model_backward(data: Dataset, parameters: Parameters, cache: Cache, lambd: float = 0.0) None[source]

Propagate backward through all layers (in place).

Parameters:
  • parameters – object that stores neural net parameters for each layer

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

  • data – object containing training and associated metadata

  • lambd – regularization coefficient to avoid overfitting [defaulted to zero] (optional)

jenn.core.propagation.model_forward(X: np.ndarray, parameters: Parameters, cache: Cache) np.ndarray[source]

Propagate forward in order to predict response(s).

Parameters:
  • X – training data inputs, array of shape (n_x, m)

  • parameters – object that stores neural net parameters for each layer

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

jenn.core.propagation.model_partials_forward(X: np.ndarray, parameters: Parameters, cache: Cache) tuple[np.ndarray, np.ndarray][source]

Propagate forward in order to predict response(s) and partial(s).

Parameters:
  • X – training data inputs, array of shape (n_x, m)

  • parameters – object that stores neural net parameters for each layer

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

jenn.core.propagation.next_layer_backward(layer: int, parameters: Parameters, cache: Cache, data: Dataset, lambd: float) None[source]

Propagate backward through next layer (in place).

Parameters:
  • layer – index of current layer.

  • parameters – object that stores neural net parameters for each layer

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

  • data – object containing training and associated metadata

  • lambd – coefficient that multiplies regularization term in cost function

jenn.core.propagation.next_layer_forward(layer: int, parameters: Parameters, cache: Cache) None[source]

Propagate forward through one layer (in place).

Parameters:
  • layer – index of current layer.

  • parameters – object that stores neural net parameters for each layer

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

jenn.core.propagation.next_layer_partials(layer: int, parameters: Parameters, cache: Cache) np.ndarray[source]

Compute the j-th partial for one layer (in place).

Parameters:
  • layer – index of current layer.

  • parameters – object that stores neural net parameters for each layer

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

jenn.core.propagation.partials_forward(X: np.ndarray, parameters: Parameters, cache: Cache) np.ndarray[source]

Propagate forward in order to predict partial(s).

Parameters:
  • X – training data inputs, array of shape (n_x, m)

  • parameters – object that stores neural net parameters for each layer

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them
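
A minimal sketch of a single forward pass (responses and partials) through an untrained 2-3-1 network, using the Parameters and Cache classes described above:

import numpy as np
from jenn.core.cache import Cache
from jenn.core.parameters import Parameters
from jenn.core.propagation import model_partials_forward

m = 5
X = np.random.rand(2, m)                    # inputs, shape (n_x, m)
params = Parameters(layer_sizes=[2, 3, 1])
params.initialize(random_state=0)
cache = Cache(layer_sizes=[2, 3, 1], m=m)   # preallocated storage reused during backprop

Y, J = model_partials_forward(X, params, cache)  # shapes (n_y, m) and (n_y, n_x, m)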

2.9. Training.

This module implements the core algorithm responsible for training the neural network.

jenn.core.training.objective_function(X: np.ndarray, cost: Cost, parameters: Parameters, cache: Cache, stacked_params: np.ndarray) np.float64[source]

Evaluate cost function for training.

Parameters:
  • X – training data inputs, array of shape (n_x, m)

  • cost – cost function to be evaluated

  • parameters – object that stores neural net parameters for each layer

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

  • stacked_params – neural network parameters returned by the optimizer, represented as single array of stacked parameters for all layers.

jenn.core.training.objective_gradient(data: Dataset, parameters: Parameters, cache: Cache, lambd: float, stacked_params: np.ndarray) np.ndarray[source]

Evaluate cost function gradient for backprop.

Parameters:
  • data – object containing training and associated metadata

  • parameters – object that stores neural net parameters for each layer

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

  • lambd – coefficient that multiplies regularization term in cost function

  • gamma – coefficient that multiplies jacobian-enhancement term in cost function

  • stacked_params – neural network parameters returned by the optimizer, represented as single array of stacked parameters for all layers.

jenn.core.training.train_model(data: Dataset, parameters: Parameters, alpha: float = 0.05, beta: np.ndarray | float = 1.0, gamma: np.ndarray | float = 1.0, lambd: float = 0.0, beta1: float = 0.9, beta2: float = 0.99, tau: float = 0.5, tol: float = 1e-12, max_count: int = 1000, epsilon_absolute: float = 1e-12, epsilon_relative: float = 1e-12, epochs: int = 1, max_iter: int = 200, batch_size: int | None = None, shuffle: bool = True, random_state: int | None = None, is_backtracking: bool = False, is_verbose: bool = False) dict[source]

Train neural net.

Note

If training is taking too long, it can be stopped gracefully by creating a local file called STOP in the running directory. Just be sure to delete it before the next run.

Parameters:
  • data – object containing training and associated metadata

  • parameters – object that stores neural net parameters for each layer

  • alpha – learning rate \(\alpha\)

  • beta – LSE coefficients [defaulted to one] (optional)

  • gamma – jacobian-enhancement regularization coefficient [defaulted to one] (optional)

  • lambd – regularization coefficient to avoid overfitting [defaulted to zero] (optional)

  • beta1 – exponential decay rate of 1st moment vector \(\beta_1\in[0, 1)\)

  • beta2 – exponential decay rate of 2nd moment vector \(\beta_2\in[0, 1)\)

  • tau – amount by which to reduce \(\alpha := \tau \times \alpha\) on each iteration

  • tol – stop when cost function doesn’t improve more than specified tolerance

  • max_count – stop when line search iterations exceed maximum count specified

  • epsilon_absolute – absolute error stopping criterion

  • epsilon_relative – relative error stopping criterion

  • epochs – number of passes through data

  • batch_size – mini batch size (if None, single batch with all data)

  • max_iter – maximum number of optimizer iterations allowed

  • shuffle – whether to shuffle data points or not

  • random_state – random seed (useful to make runs repeatable)

  • is_backtracking – whether or not to use backtracking during line search

  • is_verbose – print out progress for each iteration, each batch, each epoch

Returns:

cost function training history accessed as cost = history[epoch][batch][iter]
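
A minimal sketch of training a tiny gradient-enhanced network on y = x**2 with the core API directly (the layer sizes, learning rate, and iteration count are illustrative):

import numpy as np
from jenn.core.data import Dataset
from jenn.core.parameters import Parameters
from jenn.core.training import train_model

m = 50
X = np.linspace(-1.0, 1.0, m).reshape(1, -1)   # inputs, shape (n_x, m)
Y = X**2                                       # outputs, shape (n_y, m)
J = (2.0 * X).reshape(1, 1, m)                 # Jacobian, shape (n_y, n_x, m)

data = Dataset(X=X, Y=Y, J=J)
params = Parameters(layer_sizes=[1, 8, 1])
params.initialize(random_state=0)

history = train_model(data=data, parameters=params, alpha=0.05, max_iter=200)
# cost history is accessed as history[epoch][batch][iter]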