1. User API

This section describes the main API users are expected to interact with.

class jenn.NeuralNet(layer_sizes: list[int], hidden_activation: str = 'tanh', output_activation: str = 'linear')[source]

Neural network model.

Parameters:
  • layer_sizes – number of nodes in each layer (including input/output layers)

  • hidden_activation – activation function used in hidden layers

  • output_activation – activation function used in output layer

fit(x: np.ndarray, y: np.ndarray, dydx: np.ndarray | None = None, is_normalize: bool = False, alpha: float = 0.05, beta: np.ndarray | float = 1.0, gamma: np.ndarray | float = 1.0, lambd: float = 0.0, beta1: float = 0.9, beta2: float = 0.99, tau: float = 0.5, tol: float = 1e-12, max_count: int = 1000, epsilon_absolute: float = 1e-12, epsilon_relative: float = 1e-12, epochs: int = 1, batch_size: int | None = None, max_iter: int = 1000, shuffle: bool = True, random_state: int | None = None, is_backtracking: bool = False, is_warmstart: bool = False, is_verbose: bool = False) Self[source]

Train neural network.

Note

If training is taking too long, it can be stopped gracefully by creating a local file called STOP in the running directory. Just be sure to delete it before the next run.

Parameters:
  • x – training data inputs, array of shape (n_x, m)

  • y – training data outputs, array of shape (n_y, m)

  • dydx – training data Jacobian, array of shape (n_y, n_x, m)

  • is_normalize – normalize training data by mean and variance

  • alpha – optimizer learning rate for line search

  • beta – LSE coefficients [defaulted to one] (optional)

  • gamma – jacobian-enhancement regularization coefficient [defaulted to one] (optional)

  • lambd – regularization coefficient to avoid overfitting [defaulted to zero] (optional)

  • beta1 – ADAM optimizer hyperparameter to control momentum

  • beta2 – ADAM optimizer hyperparameter to control momentum

  • tau – amount by which to reduce \(\alpha := \tau \times \alpha\) on each iteration

  • tol – stop when cost function doesn’t improve more than specified tolerance

  • max_count – stop when line search iterations exceed maximum count specified

  • epsilon_absolute – absolute error stopping criterion

  • epsilon_relative – relative error stopping criterion

  • epochs – number of passes through data

  • batch_size – size of each batch for minibatch

  • max_iter – max number of optimizer iterations

  • shuffle – shuffle minibatches or not

  • random_state – control repeatability

  • is_backtracking – use backtracking line search or not

  • is_warmstart – reuse existing parameters instead of re-initializing them (warm start)

  • is_verbose – print out progress for each (iteration, batch, epoch)

Returns:

NeuralNet instance (self)

Warning

Normalization usually helps, except when the training data is made up of very small numbers. In that case, normalizing by the variance has the undesirable effect of dividing by a very small number and should not be used.

classmethod load(file: str | Path = 'parameters.json') NeuralNet[source]

Load serialized parameters into a new NeuralNet instance.

predict(x: np.ndarray) np.ndarray[source]

Predict responses.

Parameters:

x – vectorized inputs, array of shape (n_x, m)

Returns:

predicted response(s), array of shape (n_y, m)

predict_partials(x: np.ndarray) np.ndarray[source]

Predict partial derivatives.

Parameters:

x – vectorized inputs, array of shape (n_x, m)

Returns:

predicted partial(s), array of shape (n_y, n_x, m)

save(file: str | Path = 'parameters.json') None[source]

Serialize parameters and save to JSON file.
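
A minimal end-to-end sketch based on the signatures above (the quadratic training data, layer sizes, and optimizer settings are illustrative, not prescribed defaults):

import numpy as np
import jenn

# Synthetic gradient-enhanced training data for y = x1**2 + x2**2
m = 100
x_train = np.random.uniform(-1.0, 1.0, (2, m))        # inputs, shape (n_x, m)
y_train = np.sum(x_train**2, axis=0, keepdims=True)   # outputs, shape (n_y, m)
dydx_train = (2.0 * x_train).reshape(1, 2, m)         # Jacobian, shape (n_y, n_x, m)

# Train a network with two hidden layers
model = jenn.NeuralNet(layer_sizes=[2, 12, 12, 1]).fit(
    x=x_train, y=y_train, dydx=dydx_train, is_normalize=True, max_iter=500,
)

# Predict responses and partial derivatives at new points
x_test = np.random.uniform(-1.0, 1.0, (2, 10))
y_pred = model.predict(x_test)               # shape (n_y, m_test)
dydx_pred = model.predict_partials(x_test)   # shape (n_y, n_x, m_test)

# Serialize and reload
model.save('parameters.json')
reloaded = jenn.NeuralNet.load('parameters.json')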

jenn.plot_actual_by_predicted(y_pred: NDArray, y_true: NDArray, figsize: tuple[float, float] = (3.25, 3), fontsize: int = 9, legend_fontsize: int = 7, legend_label: str = 'data', alpha: float = 0.5, ax: plt.Axes | None = None) Figure | SubFigure | None[source]

Plot predicted vs. actual value.

Note

This method uses ravel(). A NumPy array with shape \((n_y, m)\) will become \((n_y m,)\). This is useful to merge all responses in one plot. Use indexing to handle responses separately, e.g. jenn.plot_actual_by_predicted(y_pred=model.predict(x=x_test[2]), y_true=y_test[2]).

Parameters:
  • y_pred – predicted values for each dataset, list of arrays of shape (m,)

  • y_true – true values for each dataset, list of arrays of shape (m,)

  • figsize – figure size

  • fontsize – text size to use for axis labels

  • legend_fontsize – text size to use for legend labels

  • alpha – transparency of dots (between 0 and 1)

  • ax – the matplotlib axes on which to plot the data

Returns:

matplotlib Figure instance
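
A standalone sketch of plot_actual_by_predicted using synthetic predicted and true values; in practice y_pred would come from model.predict and y_true from held-out test data:

import numpy as np
import jenn

y_true = np.linspace(0.0, 1.0, 100).reshape(1, -1)        # shape (n_y, m)
y_pred = y_true + 0.05 * np.random.randn(*y_true.shape)   # imperfect "predictions"
fig = jenn.plot_actual_by_predicted(y_pred=y_pred, y_true=y_true, legend_label='test')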

jenn.plot_contours(func: Callable, x_min: np.ndarray, x_max: np.ndarray, x0: np.ndarray | None = None, x1_index: int = 0, x2_index: int = 1, y_index: int = 0, x_train: np.ndarray | None = None, x_test: np.ndarray | None = None, figsize: tuple[float, float] = (3.25, 3), fontsize: int = 9, alpha: float = 0.5, title: str = '', x1_label: str | None = None, x2_label: str | None = None, y_label: str | None = None, levels: int = 20, resolution: int = 100, show_colorbar: bool = False, ax: plt.Axes | None = None) Figure | SubFigure | None[source]

Plot contours of a scalar function of two variables, y = f(x1, x2).

Note

This method takes in a function of signature form y=f(x) and maps it onto a function of signature form y=f(x1, x2) such that the contours can be plotted.

Parameters:
  • func – the function to be evaluated, y = f(x)

  • x_min – lower bounds on x

  • x_max – upper bounds on x

  • x1_index – index of x to use for factor #1

  • x2_index – index of x to use for factor #2

  • y_index – index of y to be plotted

  • x_train – option to overlay training data if provided

  • x_test – option to overlay test data if provided

  • figsize – figure size

  • fontsize – text size

  • alpha – transparency of dots (between 0 and 1)

  • title – title of figure

  • x1_label – factor #1 label

  • x2_label – factor #2 label

  • y_label – response label

  • levels – number of contour levels

  • resolution – line resolution

  • show_colorbar – show the colorbar

  • ax – the matplotlib axes on which to plot the data

Returns:

matplotlib figure instance
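
A standalone sketch of plot_contours for a simple quadratic; the bounds are illustrative and are assumed to be given per input, consistent with the x_min/x_max arrays in the signature:

import numpy as np
import jenn

def f(x):  # x has shape (n_x, m), returns shape (n_y, m)
    return np.sum(x**2, axis=0, keepdims=True)

fig = jenn.plot_contours(
    func=f,
    x_min=np.array([-1.0, -1.0]),
    x_max=np.array([1.0, 1.0]),
    show_colorbar=True,
)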

jenn.plot_convergence(histories: History | list[History], figsize: tuple[float, float] = (3.25, 3), fontsize: int = 9, alpha: float = 1.0, title: str = '', legend: list[str] | None = None, is_xlog: bool = False, is_ylog: bool = True, ax: plt.Axes | None = None) Figure | SubFigure | None[source]

Plot training history.

Parameters:
  • histories – training history for each model

  • figsize – subfigure size of each subplot

  • fontsize – text size

  • alpha – transparency of dots (between 0 and 1)

  • title – title of figure

  • legend – label for each model

  • is_xlog – use log scale for x-axis

  • is_ylog – use log scale for y-axis

  • ax – the matplotlib axes on which to plot the data

Returns:

matplotlib figure instance

jenn.plot_goodness_of_fit(y_pred: NDArray, y_true: NDArray, title: str = '', percent: bool = False) Figure[source]

Make goodness of fit summary plots.
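
A standalone sketch of plot_goodness_of_fit with synthetic values; in practice y_pred comes from model.predict on a test set:

import numpy as np
import jenn

y_true = np.linspace(0.0, 1.0, 100).reshape(1, -1)        # shape (n_y, m)
y_pred = y_true + 0.02 * np.random.randn(*y_true.shape)
fig = jenn.plot_goodness_of_fit(y_pred=y_pred, y_true=y_true, title='illustration')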

jenn.plot_histogram(y_pred: NDArray, y_true: NDArray, figsize: tuple[float, float] = (3.25, 3), fontsize: int = 9, legend_fontsize: int = 7, legend_label: str = 'data', alpha: float = 0.75, percent: bool = False, ax: plt.Axes | None = None) Figure | SubFigure | None[source]

Plot prediction error distribution.

Note

This method uses ravel(). A NumPy array with shape (n_y, m) becomes (n_y * m,).

Parameters:
  • y_pred – predicted values for each dataset, list of arrays of shape (m,)

  • y_true – true values for each dataset, list of arrays of shape (m,)

  • figsize – figure size

  • fontsize – text size to use for axis labels

  • legend_fontsize – text size to use for legend labels

  • alpha – transparency of dots (between 0 and 1)

  • percent – show residuals as percentages

  • ax – the matplotlib axes on which to plot the data

Returns:

matplotlib Figure instance

jenn.plot_residual_by_predicted(y_pred: NDArray, y_true: NDArray, figsize: tuple[float, float] = (3.25, 3), fontsize: int = 9, legend_fontsize: int = 7, legend_label: str = 'data', alpha: float = 0.5, percent: bool = False, ax: plt.Axes | None = None) Figure | SubFigure | None[source]

Plot prediction error vs. predicted value.

Note

This method uses ravel(). A NumPy array with shape \((n_y, m)\) will become \((n_y m,)\). This is useful to merge all responses in one plot. Use indexing to handle responses separately, e.g. jenn.plot_residual_by_predicted(y_pred=model.predict(x=x_test[2]), y_true=y_test[2]).

Parameters:
  • y_pred – predicted values for each dataset, list of arrays of shape (m,)

  • y_true – true values for each dataset, list of arrays of shape (m,)

  • figsize – figure size

  • fontsize – text size to use for axis labels

  • legend_fontsize – text size to use for legend labels

  • alpha – transparency of dots (between 0 and 1)

  • percent – show residuals as percentages

  • ax – the matplotlib axes on which to plot the data

Returns:

matplotlib Figure instance

jenn.plot_sensitivity_profiles(func: Callable | list[Callable], x_min: np.ndarray, x_max: np.ndarray, x0: np.ndarray | None = None, x_true: np.ndarray | None = None, y_true: np.ndarray | None = None, figsize: tuple[float, float] = (3.25, 3), fontsize: int = 9, alpha: float = 1.0, title: str = '', xlabels: list[str] | None = None, ylabels: list[str] | None = None, legend_fontsize: int = 7, legend_label: str | list[str] | None = None, resolution: int = 100, show_cursor: bool = True) Figure[source]

Plot grid of all outputs vs. all inputs evaluated at x0.

Parameters:
  • func – callable function(s) for evaluating y = func(x)

  • x_min – lower bound, array of shape (n_x, 1)

  • x_max – upper bound, array of shape (n_x, 1)

  • x0 – point of evaluation, array of shape (n_x, 1)

  • x_true – true data inputs, array of shape (n_x, m)

  • y_true – true data outputs, array of shape (n_y, m)

  • figsize – figure size

  • fontsize – text size

  • alpha – transparency of dots (between 0 and 1)

  • title – title of figure

  • xlabels – x-axis labels

  • ylabels – y-axis labels

  • resolution – line resolution

  • legend_fontsize – legend text size

  • legend_label – legend labels for each model in func list

  • show_cursor – show x0 as a red dot (or not)
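
A standalone sketch of plot_sensitivity_profiles for a two-input quadratic, using the documented (n_x, 1) shapes for the bounds and the evaluation point:

import numpy as np
import jenn

def f(x):  # x has shape (n_x, m), returns shape (n_y, m)
    return np.sum(x**2, axis=0, keepdims=True)

fig = jenn.plot_sensitivity_profiles(
    func=f,
    x_min=np.array([[-1.0], [-1.0]]),   # shape (n_x, 1)
    x_max=np.array([[1.0], [1.0]]),     # shape (n_x, 1)
    x0=np.zeros((2, 1)),                # point of evaluation, shape (n_x, 1)
)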

jenn.utilities.finite_difference(f: Callable, x: np.ndarray, dx: float = 1e-06) np.ndarray[source]

Evaluate partial derivative using finite difference.

Parameters:
  • f – function to be differentiated, y = f(x)

  • x – inputs, array of shape (n_x, m)

  • dx – finite difference step size

Returns:

partials, array of shape (n_y, n_x, m)
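
A minimal sketch checking finite_difference against the analytic partials of y = x1**2 + x2**2 (the tolerance is illustrative):

import numpy as np
import jenn.utilities

def f(x):  # x has shape (n_x, m), returns shape (n_y, m)
    return np.sum(x**2, axis=0, keepdims=True)

x = np.random.rand(2, 10)
dydx = jenn.utilities.finite_difference(f, x)                     # shape (n_y, n_x, m)
print(np.allclose(dydx, (2.0 * x).reshape(1, 2, -1), atol=1e-4))  # analytic: dy/dxi = 2*xi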

jenn.utilities.from_jmp(equation: str | Path) NeuralNet[source]

Load trained JMP model given formula.

Note

The equation is assumed to have been obtained from JMP using the “Save Profile Formulas” method and copied/pasted.

Note

JMP yields a separate equation for each output. It does not provide a single equation that predicts all outputs at once. This function therefore yields NeuralNet objects that predict only a single output (consistent with JMP).

Warning

The order of inputs matches the order used in JMP. The burden is on the user to keep track of variable ordering.

Parameters:

equation – either the equation itself or a filename containing it

Returns:

jenn.model.NeuralNet object preloaded with the JMP parameters

jenn.utilities.rbf(r: ndarray, epsilon: float = 0.0, out: ndarray | None = None) ndarray[source]

Compute Gaussian Radial Basis Function (RBF).

Parameters:
  • r – radius from center of RBF

  • epsilon – hyperparameter

  • out – optional output array in which to write the results

jenn.utilities.sample(f: Callable, m_random: int, m_levels: int, lb: np.typing.ArrayLike, ub: np.typing.ArrayLike, dx: float = 1e-06, f_prime: Callable | None = None, random_state: int | None = None) tuple[np.ndarray, np.ndarray, np.ndarray][source]

Generate synthetic data by sampling the test function.

Parameters:
  • f (Callable) – callable function to be sampled, y = f(x)

  • m_random (int) – number of random samples

  • m_levels (int) – number of levels per factor for full factorial

  • lb (np.ndarray) – lower bound on the factors

  • ub (np.ndarray) – upper bound on the factors

  • dx (float) – finite difference step size

  • f_prime (Callable) – callable 1st derivative to be sampled, y = f’(x)

  • random_state (int) – random seed (for repeatability)

Returns:

sampled (x, y, y’)

Return type:

np.ndarray
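
A minimal sketch of sample for a one-dimensional test function; the returned arrays are expected to follow the (n_x, m), (n_y, m), and (n_y, n_x, m) conventions used elsewhere in this documentation:

import numpy as np
import jenn.utilities

def f(x):  # x has shape (n_x, m)
    return x**2

x, y, dydx = jenn.utilities.sample(
    f=f, m_random=10, m_levels=5, lb=[0.0], ub=[1.0], random_state=0
)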

2. Core API

The core API implements all theory described in the paper. This section is intended for developers.

2.1. Model.

This module contains the main class to train a neural net and make predictions. It acts as an interface between the user and the core functions doing the computations under the hood.

#################
# Example Usage #
#################

import jenn

# Fit model
model = jenn.NeuralNet(
    layer_sizes=[
        x_train.shape[0],  # input layer
        7, 7,              # hidden layer(s) -- user defined
        y_train.shape[0],  # output layer
    ],
).fit(
    x_train, y_train, dydx_train, **kwargs  # note: user must provide this
)

# Predict response only
y_pred = model.predict(x_test)

# Predict partials only
dydx_pred = model.predict_partials(x_train)

# Predict response and partials in one step (preferred)
y_pred, dydx_pred = model(x_test)

Note

The __call__() method should be preferred over separately calling predict() followed by predict_partials() whenever both the response and its partials are needed at the same point. This saves computation since, in the latter approach, forward propagation is unnecessarily performed twice. Similarly, to avoid unnecessary partial derivative calculations, the predict() method should be preferred whenever only response values are needed. The method predict_partials() is provided for those situations where Jacobian predictions must be separated out, for example because of how some target optimization software is architected.

class jenn.core.model.NeuralNet(layer_sizes: list[int], hidden_activation: str = 'tanh', output_activation: str = 'linear')[source]

Neural network model.

Parameters:
  • layer_sizes – number of nodes in each layer (including input/output layers)

  • hidden_activation – activation function used in hidden layers

  • output_activation – activation function used in output layer

fit(x: np.ndarray, y: np.ndarray, dydx: np.ndarray | None = None, is_normalize: bool = False, alpha: float = 0.05, beta: np.ndarray | float = 1.0, gamma: np.ndarray | float = 1.0, lambd: float = 0.0, beta1: float = 0.9, beta2: float = 0.99, tau: float = 0.5, tol: float = 1e-12, max_count: int = 1000, epsilon_absolute: float = 1e-12, epsilon_relative: float = 1e-12, epochs: int = 1, batch_size: int | None = None, max_iter: int = 1000, shuffle: bool = True, random_state: int | None = None, is_backtracking: bool = False, is_warmstart: bool = False, is_verbose: bool = False) Self[source]

Train neural network.

Note

If training is taking too long, it can be stopped gracefully by creating a local file called STOP in the running directory. Just be sure to delete it before the next run.

Parameters:
  • x – training data inputs, array of shape (n_x, m)

  • y – training data outputs, array of shape (n_y, m)

  • dydx – training data Jacobian, array of shape (n_y, n_x, m)

  • is_normalize – normalize training data by mean and variance

  • alpha – optimizer learning rate for line search

  • beta – LSE coefficients [defaulted to one] (optional)

  • gamma – jacobian-enhancement regularization coefficient [defaulted to one] (optional)

  • lambd – regularization coefficient to avoid overfitting [defaulted to zero] (optional)

  • beta1 – ADAM optimizer hyperparameter to control momentum

  • beta2 – ADAM optimizer hyperparameter to control momentum

  • tau – amount by which to reduce \(\alpha := \tau \times \alpha\) on each iteration

  • tol – stop when cost function doesn’t improve more than specified tolerance

  • max_count – stop when line search iterations exceed maximum count specified

  • epsilon_absolute – absolute error stopping criterion

  • epsilon_relative – relative error stopping criterion

  • epochs – number of passes through data

  • batch_size – size of each batch for minibatch

  • max_iter – max number of optimizer iterations

  • shuffle – shuffle minibatches or not

  • random_state – control repeatability

  • is_backtracking – use backtracking line search or not

  • is_warmstart – reuse existing parameters instead of re-initializing them (warm start)

  • is_verbose – print out progress for each (iteration, batch, epoch)

Returns:

NeuralNet instance (self)

Warning

Normalization usually helps, except when the training data is made up of very small numbers. In that case, normalizing by the variance has the undesirable effect of dividing by a very small number and should not be used.

classmethod load(file: str | Path = 'parameters.json') NeuralNet[source]

Load serialized parameters into a new NeuralNet instance.

predict(x: np.ndarray) np.ndarray[source]

Predict responses.

Parameters:

x – vectorized inputs, array of shape (n_x, m)

Returns:

predicted response(s), array of shape (n_y, m)

predict_partials(x: np.ndarray) np.ndarray[source]

Predict partial derivatives.

Parameters:

x – vectorized inputs, array of shape (n_x, m)

Returns:

predicted partial(s), array of shape (n_y, n_x, m)

save(file: str | Path = 'parameters.json') None[source]

Serialize parameters and save to JSON file.

2.2. Activation.

This module implements activation functions used by the neural network.

class jenn.core.activation.Activation[source]

Activation function base class.

abstract classmethod evaluate(x: ndarray, y: ndarray | None = None) ndarray[source]

Evaluate activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – output array in which to write the results (optional)

Returns:

activation function evaluated at x (as new array if y not provided as input)

abstract classmethod first_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None) ndarray[source]

Evaluate 1st derivative of activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – response already evaluated at x (optional)

  • dy – output array in which to write the 1st derivative (optional)

Returns:

1st derivative (as new array if dy not provided as input)

abstract classmethod second_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None, ddy: ndarray | None = None) ndarray[source]

Evaluate 2nd derivative of activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – response already evaluated at x (optional)

  • dy – 1st derivative already evaluated at x (optional)

  • ddy – output array in which to write the 2nd derivative (optional)

Returns:

2nd derivative (as new array if ddy not provided as input)

class jenn.core.activation.Linear[source]

Linear activation function.

\[y = x\]
classmethod evaluate(x: ndarray, y: ndarray | None = None) ndarray[source]

Evaluate activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – output array in which to write the results (optional)

Returns:

activation function evaluated at x (as new array if y not provided as input)

classmethod first_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None) ndarray[source]

Evaluate 1st derivative of activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – response already evaluated at x (optional)

  • dy – output array in which to write the 1st derivative (optional)

Returns:

1st derivative (as new array if dy not provided as input)

classmethod second_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None, ddy: ndarray | None = None) ndarray[source]

Evaluate 2nd derivative of activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – response already evaluated at x (optional)

  • dy – 1st derivative already evaluated at x (optional)

  • ddy – output array in which to write the 2nd derivative (optional)

Returns:

2nd derivative (as new array if ddy not provided as input)

class jenn.core.activation.Relu[source]

Rectified linear unit activation.

\[\begin{split}y = \begin{cases} x & \text{if}~ x \ge 0 \\ 0 & \text{otherwise} \end{cases}\end{split}\]
classmethod evaluate(x: ndarray, y: ndarray | None = None) ndarray[source]

Evaluate activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – output array in which to write the results (optional)

Returns:

activation function evaluated at x (as new array if y not provided as input)

classmethod first_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None) ndarray[source]

Evaluate 1st derivative of activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – response already evaluated at x (optional)

  • dy – output array in which to write the 1st derivative (optional)

Returns:

1st derivative (as new array if dy not provided as input)

classmethod second_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None, ddy: ndarray | None = None) ndarray[source]

Evaluate 2nd derivative of activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – response already evaluated at x (optional)

  • dy – 1st derivative already evaluated at x (optional)

  • ddy – output array in which to write the 2nd derivative (optional)

Returns:

2nd derivative (as new array if ddy not provided as input)

class jenn.core.activation.Tanh[source]

Hyperbolic tangent.

\[y = \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}\]
classmethod evaluate(x: ndarray, y: ndarray | None = None) ndarray[source]

Evaluate activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – output array in which to write the results (optional)

Returns:

activation function evaluated at x (as new array if y not provided as input)

classmethod first_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None) ndarray[source]

Evaluate 1st derivative of activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – response already evaluated at x (optional)

  • dy – output array in which to write the 1st derivative (optional)

Returns:

1st derivative (as new array if dy not provided as input)

classmethod second_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None, ddy: ndarray | None = None) ndarray[source]

Evaluate 2nd derivative of activation function.

Parameters:
  • x – input array at which to evaluate the function

  • y – response already evaluated at x (optional)

  • dy – 1st derivative already evaluated at x (optional)

  • ddy – output array in which to write the 2nd derivative (optional)

Returns:

2nd derivative (as new array if ddy not provided as input)
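
A minimal sketch of the reuse pattern supported by the optional arguments, using Tanh (passing y and dy back in avoids re-evaluating quantities that are already available):

import numpy as np
from jenn.core.activation import Tanh

x = np.linspace(-2.0, 2.0, 5)
y = Tanh.evaluate(x)                    # tanh(x)
dy = Tanh.first_derivative(x, y)        # reuses y instead of re-evaluating tanh
ddy = Tanh.second_derivative(x, y, dy)  # reuses both y and dy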

2.3. Cache.

This module defines a convenience class to store all quantities computed during forward propagation, so they don’t have to be recomputed during backward propagation. See paper for details and notation.

class jenn.core.cache.Cache(layer_sizes: list[int], m: int = 1)[source]

Neural net cache.

A cache stores neural net quantities computed during forward prop for each layer, so they don’t have to be recomputed again during backprop. This makes the algorithm faster.

Warning

The attributes of this class are not protected. It’s possible to overwrite them instead of updating them in place. To ensure that an array is updated in place, use the numpy [:] syntax:

cache = Cache(shapes)
layer_1_activations = cache.A[1]
layer_1_activations[:] = new_array_values  # note [:]

Note

The variables and their symbols refer to the theory in the companion paper for this library.

Parameters:
  • layer_sizes – number of nodes in each layer (including input/output layers)

  • m – number of examples (used to preallocate arrays)

Variables:
  • Z (List[numpy.ndarray]) – \(Z^{[l]} \in \mathbb{R}^{n^{[l]}\times m}~\forall~ l = 1 \dots L\)

  • Z_prime (List[numpy.ndarray]) – \({Z^\prime}^{[l]} \in \mathbb{R}^{n^{[l]}\times n_x \times m}~\forall~ l = 1 \dots L\)

  • A (List[numpy.ndarray]) – \(A^{[l]} = g(Z^{[l]}) \in \mathbb{R}^{n^{[l]} \times m}~\forall~ l = 1 \dots L\)

  • A_prime (List[numpy.ndarray]) – \({A^\prime}^{[l]} = g^\prime(Z^{[l]})Z^{\prime[l]} \in \mathbb{R}^{n^{[l]}\times n_x \times m}\)

  • G_prime (List[numpy.ndarray]) – \(G^{\prime} = g^{\prime}(Z^{[l]}) \in \mathbb{R}^{n^{[l]} \times m}~\forall~ l = 1 \dots L\)

  • G_prime_prime (List[numpy.ndarray]) – \(G^{\prime\prime} = g^{\prime\prime}(Z^{[l]}) \in \mathbb{R}^{n^{[l]} \times m}\)

  • dA (List[numpy.ndarray]) – \({\partial \mathcal{J}}/{dA^{[l]}} \in \mathbb{R}^{n^{[l]} \times m}~\forall~ l = 1 \dots L\)

  • dA_prime – \({\partial \mathcal{J}}/{dA^{\prime[l]}} \in \mathbb{R}^{n^{[l]} \times n_x \times m}~\forall~ l = 1 \dots L\)

property m: int

Return number of examples.

property n_x: int

Return number of inputs.

property n_y: int

Return number of outputs.

2.4. Cost Function.

This module contains class and methods to efficiently compute the neural net cost function used for training. It is a modified version of the Least Squared Estimator (LSE), augmented with a penalty function for regularization and another term which accounts for Jacobian prediction error. See paper for details and notation.

class jenn.core.cost.Cost(data: Dataset, parameters: Parameters, lambd: float = 0.0)[source]

Neural Network cost function.

Parameters:
  • data – Dataset object containing training data (and associated metadata)

  • parameters – object containing neural net parameters (and associated metadata) for each layer

  • lambd – regularization coefficient to avoid overfitting

evaluate(Y_pred: ndarray, J_pred: ndarray | None = None) float64[source]

Evaluate cost function.

Parameters:
  • Y_pred – predicted outputs \(A^{[L]} \in \mathbb{R}^{n_y \times m}\)

  • J_pred – predicted Jacobian \(A^{\prime[L]} \in \mathbb{R}^{n_y \times n_x \times m}\)

class jenn.core.cost.GradientEnhancement(J_true: ndarray, J_weights: ndarray | float = 1.0)[source]

Least Squares Estimator for partials.

Parameters:
  • J_true – training data Jacobian \(Y^{\prime} \in \mathbb{R}^{n_y \times n_x \times m}\)

  • J_weights – weights by which to prioritize partials (optional)

evaluate(J_pred: ndarray) float64[source]

Compute least squares estimator for the partials.

Parameters:

J_pred – predicted Jacobian \(A^{\prime[L]} \in \mathbb{R}^{n_y \times n_x \times m}\)

class jenn.core.cost.Regularization(weights: list[numpy.ndarray], lambd: float = 0.0)[source]

Compute regularization penalty.

evaluate() float[source]

Compute L2 norm penalty.

Parameters:
  • weights – neural parameters \(W^{[l]} \in \mathbb{R}^{n^{[l]} \times n^{[l-1]}}\) associated with each layer

  • lambd – regularization coefficient \(\lambda \in \mathbb{R}\) (hyperparameter to be tuned)

class jenn.core.cost.SquaredLoss(Y_true: ndarray, Y_weights: ndarray | float = 1.0)[source]

Least Squares Estimator.

Parameters:
  • Y_true – training data outputs \(Y \in \mathbb{R}^{n_y \times m}\)

  • Y_weights – weights by which to prioritize data points (optional)

evaluate(Y_pred: ndarray) float64[source]

Compute least squares estimator of the states in place.

Parameters:

Y_pred – predicted outputs \(A^{[L]} \in \mathbb{R}^{n_y \times m}\)
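
A minimal sketch of evaluating the cost of (deliberately poor) zero predictions on a tiny one-input, one-output dataset; the layer sizes and regularization coefficient are illustrative:

import numpy as np
from jenn.core.cost import Cost
from jenn.core.data import Dataset
from jenn.core.parameters import Parameters

X = np.linspace(0.0, 1.0, 10).reshape(1, -1)   # inputs, shape (n_x, m)
Y = X**2                                       # outputs, shape (n_y, m)
data = Dataset(X=X, Y=Y)
params = Parameters(layer_sizes=[1, 3, 1])
params.initialize(random_state=0)

cost = Cost(data=data, parameters=params, lambd=0.01)
print(cost.evaluate(Y_pred=np.zeros_like(Y)))  # scalar cost value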

2.5. Data.

This module contains convenience utilities to manage and handle training data.

class jenn.core.data.Dataset(X: ndarray, Y: ndarray, J: ndarray | None = None, Y_weights: ndarray | float = 1.0, J_weights: ndarray | float = 1.0)[source]

Store training data and associated metadata for easy access.

Parameters:
  • X – training data inputs, array of shape (n_x, m)

  • Y – training data outputs, array of shape (n_y, m)

  • J – training data Jacobians, array of shape (n_y, n_x, m)

property avg_x: ndarray

Return mean of input data as array of shape (n_x, 1).

property avg_y: ndarray

Return mean of output data as array of shape (n_y, 1).

property m: int

Return number of training examples.

mini_batches(batch_size: int | None, shuffle: bool = True, random_state: int | None = None) list[jenn.core.data.Dataset][source]

Breakup data into multiple batches and return list of Datasets.

Parameters:
  • batch_size – mini batch size (if None, single batch with all data)

  • shuffle – whether to shuffle data points or not

  • random_state – random seed (useful to make runs repeatable)

Returns:

list of Dataset representing data broken up in batches

property n_x: int

Return number of inputs.

property n_y: int

Return number of outputs.

normalize() Dataset[source]

Return normalized Dataset.

set_weights(beta: ndarray | float = 1.0, gamma: ndarray | float = 1.0) None[source]

Prioritize certain points more than others.

Rationale: this can be used to reward the optimizer more in certain regions.

Parameters:
  • beta – multiplier(s) on Y

  • gamma – multiplier(s) on J

property std_x: ndarray

Return standard dev of input data, array of shape (n_x, 1).

property std_y: ndarray

Return standard dev of output data, array of shape (n_y, 1).
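
A minimal sketch of wrapping synthetic arrays in a Dataset and splitting them into mini-batches (the array contents and batch size are illustrative):

import numpy as np
from jenn.core.data import Dataset

m = 100
X = np.random.rand(2, m)                        # inputs, shape (n_x, m)
Y = np.sum(X**2, axis=0, keepdims=True)         # outputs, shape (n_y, m)
J = (2.0 * X).reshape(1, 2, m)                  # Jacobian, shape (n_y, n_x, m)

data = Dataset(X=X, Y=Y, J=J)
print(data.n_x, data.n_y, data.m)               # 2 1 100

batches = data.mini_batches(batch_size=32, shuffle=True, random_state=0)
print(len(batches))                             # number of Dataset batches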

jenn.core.data.avg(array: ndarray) ndarray[source]

Compute mean and reshape as column array.

Parameters:

array – array of shape (-1, m)

Returns:

column array corresponding to mean of each row

jenn.core.data.denormalize(data: ndarray, mu: ndarray, sigma: ndarray) ndarray[source]

Undo normalization.

Parameters:
  • data – normalized data, array of shape (-1, m)

  • mu – mean of the data, array of shape (-1, 1)

  • sigma – std deviation of the data, array of shape (-1, 1)

Returns:

denormalized data, array of shape (-1, m)

jenn.core.data.denormalize_partials(partials: ndarray, sigma_x: ndarray, sigma_y: ndarray) ndarray[source]

Undo normalization of partials.

Parameters:
  • partials – normalized training data partials \(\bar{J}\in\mathbb{R}^{n_y\times n_x \times m}\)

  • sigma_x – std dev of training data factors \(\sigma_x\), array of shape (-1, 1)

  • sigma_y – std dev of training data responses \(\sigma_y\), array of shape (-1, 1)

Returns:

denormalized partials, array of shape (n_y, n_x, m)

jenn.core.data.mini_batches(X: ndarray, batch_size: int | None, shuffle: bool = True, random_state: int | None = None) list[tuple[int, ...]][source]

Create randomized mini-batches.

Parameters:
  • X – training data input \(X\in\mathbb{R}^{n_x\times m}\)

  • batch_size – mini batch size (if None, single batch with all data)

  • shuffle – whether to shuffle data points or not

  • random_state – random seed (useful to make runs repeatable)

Returns:

list of tuples containing training data indices allocated to each batch

jenn.core.data.normalize(data: ndarray, mu: ndarray, sigma: ndarray) ndarray[source]

Center data about mean and normalize by standard deviation.

Parameters:
  • data – data to be normalized, array of shape (-1, m)

  • mu – mean of the data, array of shape (-1, 1)

  • sigma – std deviation of the data, array of shape (-1, 1)

Returns:

normalized data, array of shape (-1, m)

jenn.core.data.normalize_partials(partials: ndarray | None, sigma_x: ndarray, sigma_y: ndarray) ndarray | None[source]

Normalize partials.

Parameters:
  • partials – training data partials to be normalized \(J\in\mathbb{R}^{n_y\times n_x \times m}\)

  • sigma_x – std dev of training data factors \(\sigma_x\), array of shape (-1, 1)

  • sigma_y – std dev of training data responses \(\sigma_y\), array of shape (-1, 1)

Returns:

normalized partials, array of shape (n_y, n_x, m)

jenn.core.data.std(array: ndarray) ndarray[source]

Compute standard deviation and reshape as column array.

Parameters:

array – array of shape (-1, m)

Returns:

column array corresponding to std dev of each row
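
A minimal sketch of the normalize/denormalize round trip using avg and std:

import numpy as np
from jenn.core.data import avg, std, normalize, denormalize

raw = np.random.rand(2, 50)                 # array of shape (-1, m)
mu, sigma = avg(raw), std(raw)              # column arrays of shape (-1, 1)
scaled = normalize(raw, mu, sigma)
restored = denormalize(scaled, mu, sigma)
print(np.allclose(raw, restored))           # round trip recovers the original data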

2.6. Optimization.

This module implements gradient-based optimization using ADAM.

class jenn.core.optimization.ADAM(beta_1: float = 0.9, beta_2: float = 0.99)[source]

Take single step along the search direction as determined by ADAM.

Parameters \(\boldsymbol{x}\) are updated according to \(\boldsymbol{x} := \boldsymbol{x} + \alpha \boldsymbol{s}\) where \(\boldsymbol{s}\) is determined by ADAM in such a way as to improve efficiency. This is accomplished by making use of previous information (see paper).

Parameters:
  • beta_1 – exponential decay rate of 1st moment vector \(\beta_1\in[0, 1)\)

  • beta_2 – exponential decay rate of 2nd moment vector \(\beta_2\in[0, 1)\)

class jenn.core.optimization.ADAMOptimizer(beta_1: float = 0.9, beta_2: float = 0.99, tau: float = 0.5, tol: float = 1e-12, max_count: int = 10)[source]

Search for optimum using ADAM algorithm.

Parameters:
  • beta_1 – exponential decay rate of 1st moment vector \(\beta_1\in[0, 1)\)

  • beta_2 – exponential decay rate of 2nd moment vector \(\beta_2\in[0, 1)\)

  • tau – amount by which to reduce \(\alpha := \tau \times \alpha\) on each iteration

  • tol – stop when cost function doesn’t improve more than specified tolerance

  • max_count – stop when line search iterations exceed maximum count specified

class jenn.core.optimization.Backtracking(update: Update, tau: float = 0.5, tol: float = 1e-06, max_count: int = 10)[source]

Search for optimum along a search direction.

Parameters:
  • update – object that updates parameters according to \(\boldsymbol{x} := \boldsymbol{x} + \alpha \boldsymbol{s}\)

  • tau – amount by which to reduce \(\alpha := \tau \times \alpha\) on each iteration

  • tol – stop when cost function doesn’t improve more than specified tolerance

  • max_count – stop when line search iterations exceed maximum count specified

__call__(x0: np.ndarray, y0: np.ndarray, search_direction: np.ndarray, cost_function: Callable, learning_rate: float = 0.05) tuple[np.ndarray, np.ndarray][source]

Take multiple “update” steps along search direction.

Parameters:
  • x0 – initial value of parameters to be updated, array of shape (n,)

  • y0 – initial value of cost function evaluated at x0, array of shape (n,)

  • search_direction – search direction \(\boldsymbol{s}\) along which to update the parameters, array of shape (n,)

  • cost_function – objective function \(f\) to be evaluated along the search direction

  • learning_rate – maximum allowed step size \(\alpha \le \alpha_{max}\)

Returns:

updated parameters and cost \(x, y\), 2 x array of shape (n,)

class jenn.core.optimization.GD[source]

Take single step along the search direction using gradient descent.

GD simply follows the steepest descent path according to \(\boldsymbol{x} := \boldsymbol{x} + \alpha \boldsymbol{s}\) where \(\boldsymbol{s} = -\nabla_x f\).

class jenn.core.optimization.GDOptimizer(tau: float = 0.5, tol: float = 1e-06, max_count: int = 10)[source]

Search for optimum using gradient descent.

Warning

This optimizer is very inefficient. It was intended as a baseline during development. It is not recommended. Use ADAM instead.

Parameters:
  • tau – amount by which to reduce \(\alpha := \tau \times \alpha\) on each iteration

  • tol – stop when cost function doesn’t improve more than specified tolerance

  • max_count – stop when line search iterations exceed maximum count specified

class jenn.core.optimization.LineSearch(update: Update)[source]

Take multiple steps of varying size by progressively varying \(\alpha\) along the search direction.

Parameters:

update – object that implements Update base class to update parameters according to \(\boldsymbol{x} := \boldsymbol{x} + \alpha \boldsymbol{s}\)

abstract __call__(x0: np.ndarray, y0: np.ndarray, search_direction: np.ndarray, cost_function: Callable, learning_rate: float) tuple[np.ndarray, np.ndarray][source]

Take multiple steps along the search direction.

Parameters:
  • x0 – initial value of parameters to be updated, array of shape (n,)

  • y0 – initial value of cost function evaluated at x0, array of shape (n,)

  • search_direction – search direction along which to update the parameters, array of shape (n,)

  • cost_function – cost function to be evaluated along the search direction

  • learning_rate – initial step size \(\alpha\)

Returns:

updated parameters and cost, 2 x array of shape (n,)

class jenn.core.optimization.Optimizer(line_search: LineSearch)[source]

Find optimum using gradient-based optimization.

Parameters:

line_search – object that implements algorithm to compute search direction \(\boldsymbol{s}\) given the gradient \(\nabla_x f\) at the current parameter values \(\boldsymbol{x}\) and take multiple steps along it to update them according to \(\boldsymbol{x} := \boldsymbol{x} + \alpha \boldsymbol{s}\)

minimize(x: np.ndarray, f: Callable, dfdx: Callable, alpha: float = 0.01, max_iter: int = 100, verbose: bool = False, epoch: int | None = None, batch: int | None = None, epsilon_absolute: float = 1e-12, epsilon_relative: float = 1e-12, N1_max: int = 100, N2_max: int = 100) np.ndarray[source]

Minimize single objective function.

Parameters:
  • x – parameters to be updated, array of shape (n,)

  • f – cost function \(y = f(\boldsymbol{x})\)

  • dfdx – cost function gradient \(\nabla_x f\) w.r.t. the parameters

  • alpha – learning rate \(\boldsymbol{x} := \boldsymbol{x} + \alpha \boldsymbol{s}\)

  • max_iter – maximum number of optimizer iterations allowed

  • verbose – whether or not to send progress output to standard out

  • epoch – the epoch in which this optimization is being run (for printing)

  • batch – the batch in which this optimization is being run (for printing)

  • epsilon_absolute – absolute error stopping criterion

  • epsilon_relative – relative error stopping criterion

  • N1_max – number of iterations for which absolute criterion must hold true before stop

  • N2_max – number of iterations for which relative criterion must hold true before stop
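
A minimal sketch minimizing a simple quadratic, assuming ADAMOptimizer exposes the minimize() method documented on the Optimizer base class (the learning rate and iteration count are illustrative):

import numpy as np
from jenn.core.optimization import ADAMOptimizer

f = lambda x: float(np.sum(x**2))   # cost function y = f(x)
dfdx = lambda x: 2.0 * x            # its gradient

x_opt = ADAMOptimizer().minimize(
    x=np.array([1.0, 1.0]), f=f, dfdx=dfdx, alpha=0.1, max_iter=500
)
print(x_opt)  # should approach the origin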

class jenn.core.optimization.Update[source]

Base class for line search.

Update parameters \(\boldsymbol{x}\) by taking a step along the search direction \(\boldsymbol{s}\) according to \(\boldsymbol{x} := \boldsymbol{x} + \alpha \boldsymbol{s}\)

__call__(params: np.ndarray, grads: np.ndarray, alpha: float, **kwargs: dict[str, Any]) np.ndarray[source]

Take a single step along search direction.

Parameters:
  • params – parameters \(x\) to be updated

  • grads – gradient \(\nabla_x f\) of objective function \(f\) w.r.t. each parameter \(x\)

  • alpha – learning rate \(\alpha\)

2.7. Parameters.

This module defines a utility class to store and manage neural net parameters and metadata.

class jenn.core.parameters.Parameters(layer_sizes: list[int], hidden_activation: str = 'tanh', output_activation: str = 'linear')[source]

Neural network parameters.

Warning

The attributes of this class are not protected. It’s possible to overwrite them instead of updating them in place. To ensure that an array is updated in place, use the numpy [:] syntax:

parameters = Parameters(**kwargs)
layer_1_weights = parameters.W[1]
layer_1_weights[:] = new_array_values  # note [:]

Note

The variables and their symbols refer to the theory in the companion paper for this library.

Parameters:
  • layer_sizes – number of nodes in each layer (including input/output layers)

  • hidden_activation – activation function used in hidden layers

  • output_activation – activation function used in output layer

Variables:
  • W (List[np.ndarray]) – weights \(\boldsymbol{W} \in \mathbb{R}^{n^{[l]} \times n^{[l-1]}}\) for each layer

  • b (List[np.ndarray]) – biases \(\boldsymbol{b} \in \mathbb{R}^{n^{[l]} \times 1}\) for each layer

  • a (List[str]) – activation names for each layer

  • dW (List[np.ndarray]) – partials w.r.t. weight \(dL/dW^{[l]} \in \mathbb{R}^{n^{[l]} \times n^{[l-1]}}\)

  • db (List[np.ndarray]) – partials w.r.t. bias \(dL/db^{[l]} \in \mathbb{R}^{n^{[l]} \times 1}\)

  • mu_x (List[np.ndarray]) – mean of training data inputs used for normalization \(\mu_x \in \mathbb{R}^{n_x \times 1}\)

  • mu_y – mean of training data outputs used for normalization \(\mu_y \in \mathbb{R}^{n_y \times 1}\)

  • sigma_x (List[np.ndarray]) – standard deviation of training data inputs used for normalization \(\sigma_x \in \mathbb{R}^{n_x \times 1}\)

  • sigma_y (List[np.ndarray]) – standard deviation of training data outputs used for normalization \(\sigma_y \in \mathbb{R}^{n_y \times 1}\)

property L: int

Return number of layers.

initialize(random_state: int | None = None) None[source]

Use He initialization to initialize parameters.

Parameters:

random_state – optional random seed (for repeatability)

property layers: Iterable[int]

Return iterator of index for each layer.

classmethod load(binary_file: str | Path = 'parameters.json') Parameters[source]

Load serialized parameters into a new Parameters instance.

Parameters:

binary_file (str | Path) – JSON file containing saved parameters

Returns:

a new instance of Parameters

Return type:

Parameters

property n_x: int

Return number of inputs.

property n_y: int

Return number of outputs.

property partials: Iterable[int]

Return iterator of index for each partial.

save(binary_file: str | Path = 'parameters.json') None[source]

Save parameters to the specified JSON file.

stack() ndarray[source]

Stack W, b into a single array.

parameters.stack()
>> np.array([[W1], [b1], [W2], [b2], [W3], [b3]])

Note

This method is used to convert the list format used by the neural net into a single array of stacked parameters for optimization.

stack_partials() ndarray[source]

Stack backprop partials dW, db.

parameters.stack_partials()
>> np.array([[dW1], [db1], [dW2], [db2], [dW3], [db3]])

Note

This method is used to convert the list format used by the neural net into a single array of stacked parameters for optimization.

stack_partials_per_layer() list[numpy.ndarray][source]

Stack backprop partials dW, db per layer.

parameters.stack_partials_per_layer()
>> [np.array([[dW1], [db1]]), np.array([[dW2], [db2]]), np.array([[dW3], [db3]])]

stack_per_layer() list[numpy.ndarray][source]

Stack W, b into a single array for each layer.

parameters.stack_per_layer()
>> [np.array([[W1], [b1]]), np.array([[W2], [b2]]), np.array([[W3], [b3]])]

unstack(parameters: ndarray | list[numpy.ndarray]) None[source]

Unstack parameters W, b back into list of arrays.

Parameters:

parameters – neural network parameters as either a single array where all layers are stacked on top of each other or a list of stacked parameters for each layer.

# Unstack from single stack
parameters.unstack(np.array([[W1], [b1], [W2], [b2], [W3], [b3]]))
parameters.W, parameters.b
>> [W1, W2, W3], [b1, b2, b3]

# Unstack from list of stacks
parameters.unstack([np.array([[W1], [b1]]), np.array([[W2], [b2]]), np.array([[W3], [b3]])])
parameters.W, parameters.b
>> [W1, W2, W3], [b1, b2, b3]

Note

This method is used to convert optimization results expressed as a single array of stacked parameters, back into the list format used by the neural net.

unstack_partials(partials: ndarray | list[numpy.ndarray]) None[source]

Unstack backprop partials dW, db back into list of arrays.

Parameters:

partials – neural network partials as either a single array where all layers are stacked on top of each other or a list of stacked parameters for each layer.

# Unstack from single stack
parameters.unstack_partials(np.array([[dW1], [db1], [dW2], [db2], [dW3], [db3]]))
parameters.dW, parameters.db
>> [dW1, dW2, dW3], [db1, db2, db3]

# Unstack from list of stacks
parameters.unstack_partials([np.array([[dW1], [db1]]), np.array([[dW2], [db2]]), np.array([[dW3], [db3]])])
parameters.dW, parameters.db
>> [dW1, dW2, dW3], [db1, db2, db3]

Note

This method is used to convert optimization results expressed as a single array of stacked parameters, back into the list format used by the neural net.

validate_parameters() None[source]

Validate parameters.
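
A minimal sketch of creating, initializing, stacking, and serializing Parameters for a small 2-3-1 network:

from jenn.core.parameters import Parameters

params = Parameters(layer_sizes=[2, 3, 1])
params.initialize(random_state=0)             # He initialization
flat = params.stack()                         # W, b flattened into a single array
params.unstack(flat)                          # ...and back into the per-layer lists W, b

params.save('parameters.json')
reloaded = Parameters.load('parameters.json')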

2.8. Propagation.

This module contains the critical functionality to propagate information forward and backward through the neural net.

jenn.core.propagation.eye(n: int, m: int) ndarray[source]

Copy identity matrix of shape (n, n) m times.

jenn.core.propagation.first_layer_forward(X: np.ndarray, cache: Cache | None = None) None[source]

Compute input layer activations (in place).

Parameters:
  • X – training data inputs, array of shape (n_x, m)

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

jenn.core.propagation.first_layer_partials(X: np.ndarray, cache: Cache | None) None[source]

Compute input layer partial (in place).

Parameters:
  • X – training data inputs, array of shape (n_x, m)

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

jenn.core.propagation.gradient_enhancement(layer: int, parameters: Parameters, cache: Cache, data: Dataset) None[source]

Add gradient enhancement to backprop (in place).

Parameters:
  • layer – index of current layer.

  • parameters – object that stores neural net parameters for each layer

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

  • data – object containing training and associated metadata

jenn.core.propagation.last_layer_backward(cache: Cache, data: Dataset) None[source]

Propagate backward through last layer (in place).

Parameters:
  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

  • data – object containing training and associated metadata

jenn.core.propagation.model_backward(data: Dataset, parameters: Parameters, cache: Cache, lambd: float = 0.0) None[source]

Propagate backward through all layers (in place).

Parameters:
  • parameters – object that stores neural net parameters for each layer

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

  • data – object containing training and associated metadata

  • lambd – regularization coefficient to avoid overfitting [defaulted to zero] (optional)

jenn.core.propagation.model_forward(X: np.ndarray, parameters: Parameters, cache: Cache) np.ndarray[source]

Propagate forward in order to predict response(s).

Parameters:
  • X – training data inputs, array of shape (n_x, m)

  • parameters – object that stores neural net parameters for each layer

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

jenn.core.propagation.model_partials_forward(X: np.ndarray, parameters: Parameters, cache: Cache) tuple[np.ndarray, np.ndarray][source]

Propagate forward in order to predict response(s) and partial(s).

Parameters:
  • X – training data inputs, array of shape (n_x, m)

  • parameters – object that stores neural net parameters for each layer

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

jenn.core.propagation.next_layer_backward(layer: int, parameters: Parameters, cache: Cache, data: Dataset, lambd: float) None[source]

Propagate backward through next layer (in place).

Parameters:
  • layer – index of current layer.

  • parameters – object that stores neural net parameters for each layer

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

  • data – object containing training and associated metadata

  • lambd – coefficient that multiplies regularization term in cost function

jenn.core.propagation.next_layer_forward(layer: int, parameters: Parameters, cache: Cache) None[source]

Propagate forward through one layer (in place).

Parameters:
  • layer – index of current layer.

  • parameters – object that stores neural net parameters for each layer

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

jenn.core.propagation.next_layer_partials(layer: int, parameters: Parameters, cache: Cache) np.ndarray[source]

Compute the j-th partial for one layer (in place).

Parameters:
  • layer – index of current layer.

  • parameters – object that stores neural net parameters for each layer

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

jenn.core.propagation.partials_forward(X: np.ndarray, parameters: Parameters, cache: Cache) np.ndarray[source]

Propagate forward in order to predict partial(s).

Parameters:
  • X – training data inputs, array of shape (n_x, m)

  • parameters – object that stores neural net parameters for each layer

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them
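
A minimal sketch of a single forward pass (responses and partials) through an untrained 2-3-1 network, using the Parameters and Cache classes described above:

import numpy as np
from jenn.core.cache import Cache
from jenn.core.parameters import Parameters
from jenn.core.propagation import model_partials_forward

m = 5
X = np.random.rand(2, m)                    # inputs, shape (n_x, m)
params = Parameters(layer_sizes=[2, 3, 1])
params.initialize(random_state=0)
cache = Cache(layer_sizes=[2, 3, 1], m=m)   # preallocated storage reused during backprop

Y, J = model_partials_forward(X, params, cache)  # shapes (n_y, m) and (n_y, n_x, m)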

2.9. Training.

This module implements the core algorithm responsible for training the neural network.

jenn.core.training.objective_function(X: np.ndarray, cost: Cost, parameters: Parameters, cache: Cache, stacked_params: np.ndarray) np.float64[source]

Evaluate cost function for training.

Parameters:
  • X – training data inputs, array of shape (n_x, m)

  • cost – cost function to be evaluated

  • parameters – object that stores neural net parameters for each layer

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

  • stacked_params – neural network parameters returned by the optimizer, represented as single array of stacked parameters for all layers.

jenn.core.training.objective_gradient(data: Dataset, parameters: Parameters, cache: Cache, lambd: float, stacked_params: np.ndarray) np.ndarray[source]

Evaluate cost function gradient for backprop.

Parameters:
  • data – object containing training and associated metadata

  • parameters – object that stores neural net parameters for each layer

  • cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them

  • lambd – coefficient that multiplies regularization term in cost function

  • gamma – coefficient that multiplies jacobian-enhancement term in cost function

  • stacked_params – neural network parameters returned by the optimizer, represented as single array of stacked parameters for all layers.

jenn.core.training.train_model(data: Dataset, parameters: Parameters, alpha: float = 0.05, beta: np.ndarray | float = 1.0, gamma: np.ndarray | float = 1.0, lambd: float = 0.0, beta1: float = 0.9, beta2: float = 0.99, tau: float = 0.5, tol: float = 1e-12, max_count: int = 1000, epsilon_absolute: float = 1e-12, epsilon_relative: float = 1e-12, epochs: int = 1, max_iter: int = 200, batch_size: int | None = None, shuffle: bool = True, random_state: int | None = None, is_backtracking: bool = False, is_verbose: bool = False) dict[source]

Train neural net.

Note

If training is taking too long, it can be stopped gracefully by creating a local file called STOP in the running directory. Just be sure to delete it before the next run.

Parameters:
  • data – object containing training and associated metadata

  • parameters – object that stores neural net parameters for each layer

  • alpha – learning rate \(\alpha\)

  • beta – LSE coefficients [defaulted to one] (optional)

  • gamma – jacobian-enhancement regularization coefficient [defaulted to one] (optional)

  • lambd – regularization coefficient to avoid overfitting [defaulted to zero] (optional)

  • beta1 – exponential decay rate of 1st moment vector \(\beta_1\in[0, 1)\)

  • beta2 – exponential decay rate of 2nd moment vector \(\beta_2\in[0, 1)\)

  • tau – amount by which to reduce \(\alpha := \tau \times \alpha\) on each iteration

  • tol – stop when cost function doesn’t improve more than specified tolerance

  • max_count – stop when line search iterations exceed maximum count specified

  • epsilon_absolute – absolute error stopping criterion

  • epsilon_relative – relative error stopping criterion

  • epochs – number of passes through data

  • batch_size – mini batch size (if None, single batch with all data)

  • max_iter – maximum number of optimizer iterations allowed

  • shuffle – whether to shuffle data points or not

  • random_state – random seed (useful to make runs repeatable)

  • is_backtracking – whether or not to use backtracking during line search

  • is_verbose – print out progress for each iteration, each batch, each epoch

Returns:

cost function training history accessed as cost = history[epoch][batch][iter]
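
A minimal sketch of training a tiny gradient-enhanced network on y = x**2 with the core API directly (the layer sizes, learning rate, and iteration count are illustrative):

import numpy as np
from jenn.core.data import Dataset
from jenn.core.parameters import Parameters
from jenn.core.training import train_model

m = 50
X = np.linspace(-1.0, 1.0, m).reshape(1, -1)   # inputs, shape (n_x, m)
Y = X**2                                       # outputs, shape (n_y, m)
J = (2.0 * X).reshape(1, 1, m)                 # Jacobian, shape (n_y, n_x, m)

data = Dataset(X=X, Y=Y, J=J)
params = Parameters(layer_sizes=[1, 8, 1])
params.initialize(random_state=0)

history = train_model(data=data, parameters=params, alpha=0.05, max_iter=200)
# cost history is accessed as history[epoch][batch][iter]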