1. User API
This section describes the main API users are expected to interact with.
- class jenn.NeuralNet(layer_sizes: list[int], hidden_activation: str = 'tanh', output_activation: str = 'linear')[source]
Neural network model.
- Parameters:
layer_sizes – number of nodes in each layer (including input/output layers)
hidden_activation – activation function used in hidden layers
output_activation – activation function used in output layer
- fit(x: np.ndarray, y: np.ndarray, dydx: np.ndarray | None = None, is_normalize: bool = False, alpha: float = 0.05, beta: np.ndarray | float = 1.0, gamma: np.ndarray | float = 1.0, lambd: float = 0.0, beta1: float = 0.9, beta2: float = 0.99, tau: float = 0.5, tol: float = 1e-12, max_count: int = 1000, epsilon_absolute: float = 1e-12, epsilon_relative: float = 1e-12, epochs: int = 1, batch_size: int | None = None, max_iter: int = 1000, shuffle: bool = True, random_state: int | None = None, is_backtracking: bool = False, is_warmstart: bool = False, is_verbose: bool = False) Self[source]
Train neural network.
Note
If training is taking too long, it can be stopped gracefully by creating a local file called STOP in the running directory. Just be sure to delete it before the next run.
- Parameters:
x – training data inputs, array of shape (n_x, m)
y – training data outputs, array of shape (n_y, m)
dydx – training data Jacobian, array of shape (n_y, n_x, m)
is_normalize – normalize training data by mean and variance
alpha – optimizer learning rate for line search
beta – LSE coefficients [defaulted to one] (optional)
gamma – jacobian-enhancement regularization coefficient [defaulted to one] (optional)
lambd – regularization coefficient to avoid overfitting [defaulted to zero] (optional)
beta1 – ADAM optimizer hyperparameter controlling momentum (exponential decay rate of 1st moment)
beta2 – ADAM optimizer hyperparameter controlling momentum (exponential decay rate of 2nd moment)
tau – amount by which to reduce \(\alpha := \tau \times \alpha\) on each iteration
tol – stop when cost function doesn’t improve more than specified tolerance
max_count – stop when line search iterations exceed maximum count specified
epsilon_absolute – absolute error stopping criterion
epsilon_relative – relative error stopping criterion
epochs – number of passes through data
batch_size – size of each batch for minibatch
max_iter – max number of optimizer iterations
shuffle – shuffle minibatches or not
random_state – control repeatability
is_backtracking – use backtracking line search or not
is_warmstart – reuse existing parameters instead of re-initializing them (warm start)
is_verbose – print out progress for each (iteration, batch, epoch)
- Returns:
NeuralNet instance (self)
Warning
Normalization usually helps, except when the training data is made up of very small numbers. In that case, normalizing by the variance has the undesirable effect of dividing by a very small number and should not be used.
- classmethod load(file: str | Path = 'parameters.json') NeuralNet[source]
Load serialized parameters into a new NeuralNet instance.
- predict(x: np.ndarray) np.ndarray[source]
Predict responses.
- Parameters:
x – vectorized inputs, array of shape (n_x, m)
- Returns:
predicted response(s), array of shape (n_y, m)
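For example, the class above can be exercised as follows. This is a minimal sketch on synthetic quadratic data; the data, layer sizes, and hyperparameter values are illustrative assumptions rather than recommended settings.
import numpy as np
import jenn

# Synthetic 1-D training data: y = x**2 with analytic partials dy/dx = 2x
m = 30
x_train = np.linspace(-1.0, 1.0, m).reshape(1, m)   # inputs, shape (n_x, m) = (1, 30)
y_train = x_train ** 2                              # outputs, shape (n_y, m) = (1, 30)
dydx_train = (2.0 * x_train).reshape(1, 1, m)       # Jacobian, shape (n_y, n_x, m)

model = jenn.NeuralNet(layer_sizes=[1, 12, 12, 1]).fit(
    x=x_train,
    y=y_train,
    dydx=dydx_train,     # omit to train on response values only
    is_normalize=True,
    max_iter=500,
)

x_test = np.linspace(-1.0, 1.0, 100).reshape(1, -1)
y_pred = model.predict(x_test)                      # predictions, shape (n_y, 100)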
- jenn.plot_actual_by_predicted(y_pred: NDArray, y_true: NDArray, figsize: tuple[float, float] = (3.25, 3), fontsize: int = 9, legend_fontsize: int = 7, legend_label: str = 'data', alpha: float = 0.5, ax: plt.Axes | None = None) Figure | SubFigure | None[source]
Plot predicted vs. actual value.
Note
This method uses ravel(). A NumPy array with shape \((n_y, m)\) will become \((n_y m,)\). This is useful to merge all responses in one plot. Use indexing to handle responses separately, e.g.
jenn.plot_actual_by_predicted(y_pred=model.predict(x=x_test[2]), y_true=y_test[2])
- Parameters:
y_pred – predicted values for each dataset, list of arrays of shape (m,)
y_true – true values for each dataset, list of arrays of shape (m,)
figsize – figure size
fontsize – text size to use for axis labels
legend_fontsize – text size to use for legend labels
alpha – transparency of dots (between 0 and 1)
ax – the matplotlib axes on which to plot the data
- Returns:
matplotlib Figure instance
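Continuing the sketch under the NeuralNet entry above, a minimal example of this plot (y_test is constructed here from the known test function and is purely illustrative):
import matplotlib.pyplot as plt

y_test = x_test ** 2                                # true responses, shape (n_y, m)
jenn.plot_actual_by_predicted(y_pred=model.predict(x_test), y_true=y_test)
plt.show()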
- jenn.plot_contours(func: Callable, x_min: np.ndarray, x_max: np.ndarray, x0: np.ndarray | None = None, x1_index: int = 0, x2_index: int = 1, y_index: int = 0, x_train: np.ndarray | None = None, x_test: np.ndarray | None = None, figsize: tuple[float, float] = (3.25, 3), fontsize: int = 9, alpha: float = 0.5, title: str = '', x1_label: str | None = None, x2_label: str | None = None, y_label: str | None = None, levels: int = 20, resolution: int = 100, show_colorbar: bool = False, ax: plt.Axes | None = None) Figure | SubFigure | None[source]
Plot contours of a scalar function of two variables, y = f(x1, x2).
Note
This method takes in a function of signature form y=f(x) and maps it onto a function of signature form y=f(x1, x2) such that the contours can be plotted.
- Parameters:
func – the function to be evaluated, y = f(x)
x_min – lower bounds on x
x_max – upper bounds on x
x1_index – index of x to use for factor #1
x2_index – index of x to use for factor #2
y_index – index of y to be plotted
x_train – option to overlay training data if provided
x_test – option to overlay test data if provided
figsize – figure size
fontsize – text size
alpha – transparency of dots (between 0 and 1)
title – title of figure
x1_label – factor #1 label
x2_label – factor #2 label
y_label – response label
levels – number of contour levels
resolution – line resolution
show_colorbar – show the colorbar
ax – the matplotlib axes on which to plot the data
- Returns:
matplotlib figure instance
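A minimal sketch of this function for a two-input scalar test function. The bound arrays are assumed to follow the (n_x,) input convention; adjust shapes if the library expects (n_x, 1).
import numpy as np
import jenn

def rosenbrock(x: np.ndarray) -> np.ndarray:
    # vectorized y = f(x): x has shape (n_x, m), result has shape (n_y, m)
    x1, x2 = x[0], x[1]
    return ((1.0 - x1) ** 2 + 100.0 * (x2 - x1 ** 2) ** 2).reshape(1, -1)

jenn.plot_contours(
    func=rosenbrock,
    x_min=np.array([-2.0, -2.0]),
    x_max=np.array([2.0, 2.0]),
    x1_label='x1',
    x2_label='x2',
    show_colorbar=True,
)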
- jenn.plot_convergence(histories: History | list[History], figsize: tuple[float, float] = (3.25, 3), fontsize: int = 9, alpha: float = 1.0, title: str = '', legend: list[str] | None = None, is_xlog: bool = False, is_ylog: bool = True, ax: plt.Axes | None = None) Figure | SubFigure | None[source]
Plot training history.
- Parameters:
histories – training history for each model
figsize – subfigure size of each subplot
fontsize – text size
alpha – transparency of dots (between 0 and 1)
title – title of figure
legend – label for each model
is_xlog – use log scale for x-axis
is_ylog – use log scale for y-axis
ax – the matplotlib axes on which to plot the data
- Returns:
matplotlib figure instance
- jenn.plot_goodness_of_fit(y_pred: NDArray, y_true: NDArray, title: str = '', percent: bool = False) Figure[source]
Make goodness of fit summary plots.
- jenn.plot_histogram(y_pred: NDArray, y_true: NDArray, figsize: tuple[float, float] = (3.25, 3), fontsize: int = 9, legend_fontsize: int = 7, legend_label: str = 'data', alpha: float = 0.75, percent: bool = False, ax: plt.Axes | None = None) Figure | SubFigure | None[source]
Plot prediction error distribution.
Note
This method uses ravel(). A NumPy array with shape (n_y, m) becomes (n_y * m,).
- Parameters:
y_pred – predicted values for each dataset, list of arrays of shape (m,)
y_true – true values for each dataset, list of arrays of shape (m,)
figsize – figure size
fontsize – text size to use for axis labels
legend_fontsize – text size to use for legend labels
alpha – transparency of dots (between 0 and 1)
percent – show residuals as percentages
ax – the matplotlib axes on which to plot the data
- Returns:
matplotlib Figure instance
- jenn.plot_residual_by_predicted(y_pred: NDArray, y_true: NDArray, figsize: tuple[float, float] = (3.25, 3), fontsize: int = 9, legend_fontsize: int = 7, legend_label: str = 'data', alpha: float = 0.5, percent: bool = False, ax: plt.Axes | None = None) Figure | SubFigure | None[source]
Plot prediction error vs. predicted value.
Note
This method uses ravel(). A NumPy array with shape \((n_y, m)\) will become \((n_y m,)\). This is useful to merge all responses in one plot. Use indexing to handle responses separately, e.g.
jenn.plot_residual_by_predicted(y_pred=model.predict(x=x_test[2]), y_true=y_test[2])
- Parameters:
y_pred – predicted values for each dataset, list of arrays of shape (m,)
y_true – true values for each dataset, list of arrays of shape (m,)
figsize – figure size
fontsize – text size to use for axis labels
legend_fontsize – text size to use for legend labels
alpha – transparency of dots (between 0 and 1)
percent – show residuals as percentages
ax – the matplotlib axes on which to plot the data
- Returns:
matplotlib Figure instance
- jenn.plot_sensitivity_profiles(func: Callable | list[Callable], x_min: np.ndarray, x_max: np.ndarray, x0: np.ndarray | None = None, x_true: np.ndarray | None = None, y_true: np.ndarray | None = None, figsize: tuple[float, float] = (3.25, 3), fontsize: int = 9, alpha: float = 1.0, title: str = '', xlabels: list[str] | None = None, ylabels: list[str] | None = None, legend_fontsize: int = 7, legend_label: str | list[str] | None = None, resolution: int = 100, show_cursor: bool = True) Figure[source]
Plot grid of all outputs vs. all inputs evaluated at x0.
- Parameters:
func – callable function(s) for evaluating y = func(x)
x_min – lower bound, array of shape (n_x, 1)
x_max – upper bound, array of shape (n_x, 1)
x0 – point of evaluation, array of shape (n_x, 1)
x_true – true data inputs, array of shape (n_x, m)
y_true – true data outputs, array of shape (n_y, m)
figsize – figure size
fontsize – text size
alpha – transparency of dots (between 0 and 1)
title – title of figure
xlabels – x-axis labels
ylabels – y-axis labels
resolution – line resolution
legend_fontsize – legend text size
legend_label – legend labels for each model in func list
show_cursor – show x0 as a red dot (or not)
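Continuing the earlier NeuralNet sketch, a minimal example of sensitivity profiles about a point x0 (bounds and evaluation point follow the (n_x, 1) convention stated above):
import numpy as np
import jenn

jenn.plot_sensitivity_profiles(
    func=model.predict,          # trained surrogate exposes the y = f(x) interface
    x_min=np.array([[-1.0]]),
    x_max=np.array([[1.0]]),
    x0=np.array([[0.0]]),
    xlabels=['x'],
    ylabels=['y'],
)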
- jenn.utilities.finite_difference(f: Callable, x: np.ndarray, dx: float = 1e-06) np.ndarray[source]
Evaluate partial derivative using finite difference.
- Parameters:
x – inputs, array of shape (n_x, m)
- Returns:
partials, array of shape (n_y, n_x, m)
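A short sketch of this helper on a vectorized test function (function and values are illustrative):
import numpy as np
from jenn.utilities import finite_difference

def f(x: np.ndarray) -> np.ndarray:
    # y = x1**2 + 3*x2, vectorized: x has shape (n_x, m), y has shape (n_y, m)
    return (x[0] ** 2 + 3.0 * x[1]).reshape(1, -1)

x = np.array([[1.0, 2.0],       # x1 values
              [0.5, -0.5]])     # x2 values -> shape (n_x, m) = (2, 2)
dydx = finite_difference(f, x)  # partials, shape (n_y, n_x, m) = (1, 2, 2)
print(dydx[0, 0])               # dy/dx1 ~ 2 * x1 = [2., 4.]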
- jenn.utilities.from_jmp(equation: str | Path) NeuralNet[source]
Load trained JMP model given formula.
Note
The equation is assumed to have been obtained from JMP using the “Save Profile Formulas” method and copied/pasted.
Note
JMP yields a separate equation for each output. It does not provide a single equation that predicts all outputs at once. This function therefore yields NeuralNet objects that predict only a single output (consistent with JMP).
Warning
The order of inputs matches the order used in JMP. The burden is on the user to keep track of variable ordering.
- Parameters:
equation – either the equation itself or a filename containing it
- Returns:
jenn.model.NeuralNet object preloaded with the JMP parameters
- jenn.utilities.rbf(r: ndarray, epsilon: float = 0.0, out: ndarray | None = None) ndarray[source]
Compute Gaussian Radial Basis Function (RBF).
- Parameters:
r – radius from center of RBF
epsilon – hyperparameter
- jenn.utilities.sample(f: Callable, m_random: int, m_levels: int, lb: np.typing.ArrayLike, ub: np.typing.ArrayLike, dx: float = 1e-06, f_prime: Callable | None = None, random_state: int | None = None) tuple[np.ndarray, np.ndarray, np.ndarray][source]
Generate synthetic data by sampling the test function.
- Parameters:
f (Callable) – callable function to be sampled, y = f(x)
m_random (int) – number of random samples
m_levels (int) – number of levels per factor for full factorial
lb (np.ndarray) – lower bound on the factors
ub (np.ndarray) – upper bound on the factors
dx (float) – finite difference step size
f_prime (Callable) – callable 1st derivative to be sampled, y = f’(x)
random_state (int) – random seed (for repeatability)
- Returns:
sampled (x, y, y’)
- Return type:
tuple[np.ndarray, np.ndarray, np.ndarray]
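A minimal sketch of this sampler for a two-input test function; the treatment of the omitted f_prime (derivatives via finite difference with step dx) is inferred from the parameter list above.
import numpy as np
from jenn.utilities import sample

def parabola(x: np.ndarray) -> np.ndarray:
    # vectorized y = x1**2 + x2**2 with x of shape (n_x, m)
    return (x[0] ** 2 + x[1] ** 2).reshape(1, -1)

x, y, dydx = sample(
    f=parabola,
    m_random=50,        # random samples
    m_levels=3,         # plus a 3-level full factorial
    lb=[-1.0, -1.0],
    ub=[1.0, 1.0],
    random_state=0,     # repeatable
)
# per the library convention: x -> (n_x, m), y -> (n_y, m), dydx -> (n_y, n_x, m)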
2. Core API
The core API implements all theory described in the paper. This section is intended for developers.
2.1. Model.
This module contains the main class to train a neural net and make predictions. It acts as an interface between the user and the core functions doing the computations under the hood.
#################
# Example Usage #
#################
import jenn
# Fit model
model = jenn.NeuralNet(
layer_sizes=[
x_train.shape[0], # input layer
7, 7, # hidden layer(s) -- user defined
y_train.shape[0] # output layer
],
).fit(
x_train, y_train, dydx_train, **kwargs # note: user must provide this
)
# Predict response only
y_pred = model.predict(x_test)
# Predict partials only
dydx_pred = model.predict_partials(x_train)
# Predict response and partials in one step (preferred)
y_pred, dydx_pred = model(x_test)
Note
The __call__() method should be preferred over separately calling predict() followed by predict_partials() whenever both the response and its partials are needed at the same point. This saves computation since, in the latter approach, forward propagation is unnecessarily performed twice. Similarly, to avoid unnecessary partial derivative calculations, the predict() method should be preferred whenever only response values are needed. The method predict_partials() is provided for those situations where it is necessary to separate out Jacobian predictions, for example due to how some target optimization software is architected.
- class jenn.core.model.NeuralNet(layer_sizes: list[int], hidden_activation: str = 'tanh', output_activation: str = 'linear')[source]
Neural network model.
- Parameters:
layer_sizes – number of nodes in each layer (including input/output layers)
hidden_activation – activation function used in hidden layers
output_activation – activation function used in output layer
- fit(x: np.ndarray, y: np.ndarray, dydx: np.ndarray | None = None, is_normalize: bool = False, alpha: float = 0.05, beta: np.ndarray | float = 1.0, gamma: np.ndarray | float = 1.0, lambd: float = 0.0, beta1: float = 0.9, beta2: float = 0.99, tau: float = 0.5, tol: float = 1e-12, max_count: int = 1000, epsilon_absolute: float = 1e-12, epsilon_relative: float = 1e-12, epochs: int = 1, batch_size: int | None = None, max_iter: int = 1000, shuffle: bool = True, random_state: int | None = None, is_backtracking: bool = False, is_warmstart: bool = False, is_verbose: bool = False) Self[source]
Train neural network.
Note
If training is taking too long, it can be stopped gracefully by creating a local file called STOP in the running directory. Just be sure to delete it before the next run.
- Parameters:
x – training data inputs, array of shape (n_x, m)
y – training data outputs, array of shape (n_y, m)
dydx – training data Jacobian, array of shape (n_y, n_x, m)
is_normalize – normalize training data by mean and variance
alpha – optimizer learning rate for line search
beta – LSE coefficients [defaulted to one] (optional)
gamma – jacobian-enhancement regularization coefficient [defaulted to one] (optional)
lambd – regularization coefficient to avoid overfitting [defaulted to zero] (optional)
beta1 – ADAM optimizer hyperparameter controlling momentum (exponential decay rate of 1st moment)
beta2 – ADAM optimizer hyperparameter controlling momentum (exponential decay rate of 2nd moment)
tau – amount by which to reduce \(\alpha := \tau \times \alpha\) on each iteration
tol – stop when cost function doesn’t improve more than specified tolerance
max_count – stop when line search iterations exceed maximum count specified
epsilon_absolute – absolute error stopping criterion
epsilon_relative – relative error stopping criterion
epochs – number of passes through data
batch_size – size of each batch for minibatch
max_iter – max number of optimizer iterations
shuffle – shuffle minibatches or not
random_state – control repeatability
is_backtracking – use backtracking line search or not
is_warmstart – reuse existing parameters instead of re-initializing them (warm start)
is_verbose – print out progress for each (iteration, batch, epoch)
- Returns:
NeuralNet instance (self)
Warning
Normalization usually helps, except when the training data is made up of very small numbers. In that case, normalizing by the variance has the undesirable effect of dividing by a very small number and should not be used.
- classmethod load(file: str | Path = 'parameters.json') NeuralNet[source]
Load serialized parameters into a new NeuralNet instance.
- predict(x: np.ndarray) np.ndarray[source]
Predict responses.
- Parameters:
x – vectorized inputs, array of shape (n_x, m)
- Returns:
predicted response(s), array of shape (n_y, m)
2.2. Activation.
This module implements activation functions used by the neural network.
- class jenn.core.activation.Activation[source]
Activation function base class.
- abstract classmethod evaluate(x: ndarray, y: ndarray | None = None) ndarray[source]
Evaluate activation function.
- Parameters:
x – input array at which to evaluate the function
y – output array in which to write the results (optional)
- Returns:
activation function evaluated at x (as new array if y not provided as input)
- abstract classmethod first_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None) ndarray[source]
Evaluate 1st derivative of activation function.
- Parameters:
x – input array at which to evaluate the function
y – response already evaluated at x (optional)
dy – output array in which to write the 1st derivative (optional)
- Returns:
1st derivative (as new array if dy not provided as input)
- abstract classmethod second_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None, ddy: ndarray | None = None) ndarray[source]
Evaluate 2nd derivative of activation function.
- Parameters:
x – input array at which to evaluate the function
y – response already evaluated at x (optional)
dy – 1st derivative already evaluated at x (optional)
ddy – output array in which to write the 2nd derivative (optional)
- Returns:
2nd derivative (as new array if ddy not provided as input)
- class jenn.core.activation.Linear[source]
Linear activation function.
\[y = x\]
- classmethod evaluate(x: ndarray, y: ndarray | None = None) ndarray[source]
Evaluate activation function.
- Parameters:
x – input array at which to evaluate the function
y – output array in which to write the results (optional)
- Returns:
activation function evaluated at x (as new array if y not provided as input)
- classmethod first_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None) ndarray[source]
Evaluate 1st derivative of activation function.
- Parameters:
x – input array at which to evaluate the function
y – response already evaluated at x (optional)
dy – output array in which to write the 1st derivative (optional)
- Returns:
1st derivative (as new array if dy not provided as input)
- classmethod second_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None, ddy: ndarray | None = None) ndarray[source]
Evaluate 2nd derivative of activation function.
- Parameters:
x – input array at which to evaluate the function
y – response already evaluated at x (optional)
dy – 1st derivative already evaluated at x (optional)
ddy – output array in which to write the 2nd derivative (optional)
- Returns:
2nd derivative (as new array if ddy not provided as input)
- class jenn.core.activation.Relu[source]
Rectified linear unit activation.
\[\begin{split}y = \begin{cases} x & \text{if}~ x \ge 0 \\ 0 & \text{otherwise} \end{cases}\end{split}\]
- classmethod evaluate(x: ndarray, y: ndarray | None = None) ndarray[source]
Evaluate activation function.
- Parameters:
x – input array at which to evaluate the function
y – output array in which to write the results (optional)
- Returns:
activation function evaluated at x (as new array if y not provided as input)
- classmethod first_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None) ndarray[source]
Evaluate 1st derivative of activation function.
- Parameters:
x – input array at which to evaluate the function
y – response already evaluated at x (optional)
dy – output array in which to write the 1st derivative (optional)
- Returns:
1st derivative (as new array if dy not provided as input)
- classmethod second_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None, ddy: ndarray | None = None) ndarray[source]
Evaluate 2nd derivative of activation function.
- Parameters:
x – input array at which to evaluate the function
y – response already evaluated at x (optional)
dy – 1st derivative already evaluated at x (optional)
ddy – output array in which to write the 2nd derivative (optional)
- Returns:
2nd derivative (as new array if ddy not provided as input)
- class jenn.core.activation.Tanh[source]
Hyperbolic tangent.
\[y = \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}\]
- classmethod evaluate(x: ndarray, y: ndarray | None = None) ndarray[source]
Evaluate activation function.
- Parameters:
x – input array at which to evaluate the function
y – output array in which to write the results (optional)
- Returns:
activation function evaluated at x (as new array if y not provided as input)
- classmethod first_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None) ndarray[source]
Evaluate 1st derivative of activation function.
- Parameters:
x – input array at which to evaluate the function
y – response already evaluated at x (optional)
dy – output array in which to write the 1st derivative (optional)
- Returns:
1st derivative (as new array if dy not provided as input)
- classmethod second_derivative(x: ndarray, y: ndarray | None = None, dy: ndarray | None = None, ddy: ndarray | None = None) ndarray[source]
Evaluate 2nd derivative of activation function.
- Parameters:
x – input array at which to evaluate the function
y – response already evaluated at x (optional)
dy – 1st derivative already evaluated at x (optional)
ddy – output array in which to write the 2nd derivative (optional)
- Returns:
2nd derivative (as new array if ddy not provided as input)
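A short sketch exercising the Tanh activation above; the optional output arrays allow in-place evaluation and reuse of already computed quantities:
import numpy as np
from jenn.core.activation import Tanh

z = np.linspace(-3.0, 3.0, 7).reshape(1, -1)

a = Tanh.evaluate(z)                          # activation values
da = Tanh.first_derivative(z, y=a)            # reuses a to avoid re-evaluating tanh
dda = Tanh.second_derivative(z, y=a, dy=da)   # reuses a and da

out = np.empty_like(z)
Tanh.evaluate(z, y=out)                       # writes the result into out (in place)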
2.3. Cache.
This module defines a convenience class to store all quantities computed during forward propagation, so they don’t have to be recomputed during backward propagation. See paper for details and notation.
- class jenn.core.cache.Cache(layer_sizes: list[int], m: int = 1)[source]
Neural net cache.
A cache stores neural net quantities computed during forward prop for each layer, so they don’t have to be recomputed during backprop. This makes the algorithm faster.
Warning
The attributes of this class are not protected. It’s possible to overwrite them instead of updating them in place. To ensure that an array is updated in place, use the numpy [:] syntax:
cache = Cache(shapes)
layer_1_activations = cache.A[1]
layer_1_activations[:] = new_array_values  # note [:]
Note
The variables and their symbols refer to the theory in the companion paper for this library.
- Parameters:
layer_sizes – number of nodes in each layer (including input/output layers)
m – number of examples (used to preallocate arrays)
- Variables:
Z (List[numpy.ndarray]) – \(Z^{[l]} \in \mathbb{R}^{n^{[l]}\times m}~\forall~ l = 1 \dots L\)
Z_prime (List[numpy.ndarray]) – \({Z^\prime}^{[l]} \in \mathbb{R}^{n^{[l]}\times n_x \times m}~\forall~ l = 1 \dots L\)
A (List[numpy.ndarray]) – \(A^{[l]} = g(Z^{[l]}) \in \mathbb{R}^{n^{[l]} \times m}~\forall~ l = 1 \dots L\)
A_prime (List[numpy.ndarray]) – \({A^\prime}^{[l]} = g^\prime(Z^{[l]})Z^{\prime[l]} \in \mathbb{R}^{n^{[l]}\times n_x \times m}\)
G_prime (List[numpy.ndarray]) – \(G^{\prime} = g^{\prime}(Z^{[l]}) \in \mathbb{R}^{n^{[l]} \times m}~\forall~ l = 1 \dots L\)
G_prime_prime (List[numpy.ndarray]) – \(G^{\prime\prime} = g^{\prime\prime}(Z^{[l]}) \in \mathbb{R}^{n^{[l]} \times m}\)
dA (List[numpy.ndarray]) – \({\partial \mathcal{J}}/{dA^{[l]}} \in \mathbb{R}^{n^{[l]} \times m}~\forall~ l = 1 \dots L\)
dA_prime – \({\partial \mathcal{J}}/{dA^{\prime[l]}} \in \mathbb{R}^{n^{[l]} \times n_x \times m}~\forall~ l = 1 \dots L\)
- property m: int
Return number of examples.
- property n_x: int
Return number of inputs.
- property n_y: int
Return number of outputs.
2.4. Cost Function.
This module contains the class and methods to efficiently compute the neural net cost function used for training. It is a modified version of the Least Squares Estimator (LSE), augmented with a penalty function for regularization and another term which accounts for Jacobian prediction error. See paper for details and notation.
- class jenn.core.cost.Cost(data: Dataset, parameters: Parameters, lambd: float = 0.0)[source]
Neural Network cost function.
- Parameters:
data – Dataset object containing training data (and associated metadata)
parameters – object containing neural net parameters (and associated metadata) for each layer
lambd – regularization coefficient to avoid overfitting
- class jenn.core.cost.GradientEnhancement(J_true: ndarray, J_weights: ndarray | float = 1.0)[source]
Least Squares Estimator for partials.
- Parameters:
J_true – training data Jacobian \(Y^{\prime} \in \mathbb{R}^{n_y \times n_x \times m}\)
J_weights – weights by which to prioritize partials (optional)
- class jenn.core.cost.Regularization(weights: list[numpy.ndarray], lambd: float = 0.0)[source]
Compute regularization penalty.
- class jenn.core.cost.SquaredLoss(Y_true: ndarray, Y_weights: ndarray | float = 1.0)[source]
Least Squares Estimator.
- Parameters:
Y_true – training data outputs \(Y \in \mathbb{R}^{n_y \times m}\)
Y_weights – weights by which to prioritize data points (optional)
2.5. Data.
This module contains convenience utilities to manage and handle training data.
- class jenn.core.data.Dataset(X: ndarray, Y: ndarray, J: ndarray | None = None, Y_weights: ndarray | float = 1.0, J_weights: ndarray | float = 1.0)[source]
Store training data and associated metadata for easy access.
- Parameters:
X – training data inputs, array of shape (n_x, m)
Y – training data outputs, array of shape (n_y, m)
J – training data Jacobians, array of shape (n_y, n_x, m)
- property avg_x: ndarray
Return mean of input data as array of shape (n_x, 1).
- property avg_y: ndarray
Return mean of output data as array of shape (n_y, 1).
- property m: int
Return number of training examples.
- mini_batches(batch_size: int | None, shuffle: bool = True, random_state: int | None = None) list[jenn.core.data.Dataset][source]
Break up data into multiple batches and return a list of Datasets.
- Parameters:
batch_size – mini batch size (if None, single batch with all data)
shuffle – whether to shuffle data points or not
random_state – random seed (useful to make runs repeatable)
- Returns:
list of Dataset representing data broken up in batches
- property n_x: int
Return number of inputs.
- property n_y: int
Return number of outputs.
- set_weights(beta: ndarray | float = 1.0, gamma: ndarray | float = 1.0) None[source]
Prioritize certain points more than others.
Rationale: this can be used to reward the optimizer more in certain regions.
- Parameters:
beta – multiplier(s) on Y
gamma – multiplier(s) on J
- property std_x: ndarray
Return standard dev of input data, array of shape (n_x, 1).
- property std_y: ndarray
Return standard dev of output data, array of shape (n_y, 1).
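A minimal sketch of Dataset construction and mini-batching on synthetic data (values and batch size are illustrative):
import numpy as np
from jenn.core.data import Dataset

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(2, 100))      # training inputs, shape (n_x, m)
Y = (X[0] ** 2 + X[1] ** 2).reshape(1, -1)     # training outputs, shape (n_y, m)

data = Dataset(X, Y)                           # J omitted: values only
print(data.n_x, data.n_y, data.m)              # 2 1 100

batches = data.mini_batches(batch_size=32, shuffle=True, random_state=0)
print(len(batches))                            # e.g. 4 batches of at most 32 points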
- jenn.core.data.avg(array: ndarray) ndarray[source]
Compute mean and reshape as column array.
- Parameters:
array – array of shape (-1, m)
- Returns:
column array corresponding to mean of each row
- jenn.core.data.denormalize(data: ndarray, mu: ndarray, sigma: ndarray) ndarray[source]
Undo normalization.
- Parameters:
data – normalized data, array of shape (-1, m)
mu – mean of the data, array of shape (-1, 1)
sigma – std deviation of the data, array of shape (-1, 1)
- Returns:
denormalized data, array of shape (-1, m)
- jenn.core.data.denormalize_partials(partials: ndarray, sigma_x: ndarray, sigma_y: ndarray) ndarray[source]
Undo normalization of partials.
- Parameters:
partials – normalized training data partials \(\bar{J}\in\mathbb{R}^{n_y\times n_x \times m}\)
sigma_x – std dev of training data factors \(\sigma_x\), array of shape (-1, 1)
sigma_y – std dev of training data responses \(\sigma_y\), array of shape (-1, 1)
- Returns:
denormalized partials, array of shape (n_y, n_x, m)
- jenn.core.data.mini_batches(X: ndarray, batch_size: int | None, shuffle: bool = True, random_state: int | None = None) list[tuple[int, ...]][source]
Create randomized mini-batches.
- Parameters:
X – training data input \(X\in\mathbb{R}^{n_x\times m}\)
batch_size – mini batch size (if None, single batch with all data)
shuffle – whether to shuffle data points or not
random_state – random seed (useful to make runs repeatable)
- Returns:
list of tuples containing training data indices allocated to each batch
- jenn.core.data.normalize(data: ndarray, mu: ndarray, sigma: ndarray) ndarray[source]
Center data about mean and normalize by standard deviation.
- Parameters:
data – data to be normalized, array of shape (-1, m)
mu – mean of the data, array of shape (-1, 1)
sigma – std deviation of the data, array of shape (-1, 1)
- Returns:
normalized data, array of shape (-1, m)
- jenn.core.data.normalize_partials(partials: ndarray | None, sigma_x: ndarray, sigma_y: ndarray) ndarray | None[source]
Normalize partials.
- Parameters:
partials – training data partials to be normalized \(J\in\mathbb{R}^{n_y\times n_x \times m}\)
sigma_x – std dev of training data factors \(\sigma_x\), array of shape (-1, 1)
sigma_y – std dev of training data responses \(\sigma_y\), array of shape (-1, 1)
- Returns:
normalized partials, array of shape (n_y, n_x, m)
- jenn.core.data.std(array: ndarray) ndarray[source]
Compute standard deviation and reshape as column array.
- Parameters:
array – array of shape (-1, m)
- Returns:
column array corresponding to std dev of each row
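A short sketch of the normalization helpers above; given the same mu and sigma, denormalize undoes normalize exactly:
import numpy as np
from jenn.core.data import avg, std, normalize, denormalize

data = np.array([[1.0, 2.0, 3.0, 4.0]])     # shape (-1, m) = (1, 4)
mu, sigma = avg(data), std(data)            # column arrays of shape (1, 1)

scaled = normalize(data, mu, sigma)         # centered and scaled data
restored = denormalize(scaled, mu, sigma)   # recovers the original values
assert np.allclose(restored, data)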
2.6. Optimization.
This module implements gradient-based optimization using ADAM.
- class jenn.core.optimization.ADAM(beta_1: float = 0.9, beta_2: float = 0.99)[source]
Take single step along the search direction as determined by ADAM.
Parameters \(\boldsymbol{x}\) are updated according to \(\boldsymbol{x} := \boldsymbol{x} + \alpha \boldsymbol{s}\) where \(\boldsymbol{s}\) is determined by ADAM in such a way as to improve efficiency. This is accomplished by making use of previous information (see paper).
- Parameters:
beta_1 – exponential decay rate of 1st moment vector \(\beta_1\in[0, 1)\)
beta_2 – exponential decay rate of 2nd moment vector \(\beta_2\in[0, 1)\)
- class jenn.core.optimization.ADAMOptimizer(beta_1: float = 0.9, beta_2: float = 0.99, tau: float = 0.5, tol: float = 1e-12, max_count: int = 10)[source]
Search for optimum using ADAM algorithm.
- Parameters:
beta_1 – exponential decay rate of 1st moment vector \(\beta_1\in[0, 1)\)
beta_2 – exponential decay rate of 2nd moment vector \(\beta_2\in[0, 1)\)
tau – amount by which to reduce \(\alpha := \tau \times \alpha\) on each iteration
tol – stop when cost function doesn’t improve more than specified tolerance
max_count – stop when line search iterations exceed maximum count specified
- class jenn.core.optimization.Backtracking(update: Update, tau: float = 0.5, tol: float = 1e-06, max_count: int = 10)[source]
Search for optimum along a search direction.
- Parameters:
update – object that updates parameters according to \(\boldsymbol{x} := \boldsymbol{x} + \alpha \boldsymbol{s}\)
tau – amount by which to reduce \(\alpha := \tau \times \alpha\) on each iteration
tol – stop when cost function doesn’t improve more than specified tolerance
max_count – stop when line search iterations exceed maximum count specified
- __call__(x0: np.ndarray, y0: np.ndarray, search_direction: np.ndarray, cost_function: Callable, learning_rate: float = 0.05) tuple[np.ndarray, np.ndarray][source]
Take multiple “update” steps along search direction.
- Parameters:
x0 – initial value of parameters to be updated, array of shape (n,)
y0 – initial value of cost function evaluated at x0, array of shape (n,)
search_direction – search direction \(\boldsymbol{s}\) along which to update the parameters, array of shape (n,)
cost_function – objective function \(f\) used to evaluate candidate steps
learning_rate – maximum allowed step size \(\alpha \le \alpha_{max}\)
- Returns:
updated parameters and cost \(x, y\), 2 x array of shape (n,)
- class jenn.core.optimization.GD[source]
Take single step along the search direction using gradient descent.
GD simply follows the steepest path according to \(\boldsymbol{x} := \boldsymbol{x} + \alpha \boldsymbol{s}\) where \(\boldsymbol{s} = \nabla_x f\)
- class jenn.core.optimization.GDOptimizer(tau: float = 0.5, tol: float = 1e-06, max_count: int = 10)[source]
Search for optimum using gradient descent.
Warning
This optimizer is very inefficient. It was intended as a baseline during development. It is not recommended. Use ADAM instead.
- Parameters:
tau – amount by which to reduce \(\alpha := \tau \times \alpha\) on each iteration
tol – stop when cost function doesn’t improve more than specified tolerance
max_count – stop when line search iterations exceed maximum count specified
- class jenn.core.optimization.LineSearch(update: Update)[source]
Take multiple steps of varying size by progressively varying \(\alpha\) along the search direction.
- Parameters:
update – object that implements Update base class to update parameters according to \(\boldsymbol{x} := \boldsymbol{x} + \alpha \boldsymbol{s}\)
- abstract __call__(x0: np.ndarray, y0: np.ndarray, search_direction: np.ndarray, cost_function: Callable, learning_rate: float) tuple[np.ndarray, np.ndarray][source]
Take multiple steps along the search direction.
- Parameters:
x0 – initial value of parameters to be updated, array of shape (n,)
y0 – initial value of cost function evaluated at x0, array of shape (n,)
search_direction – search direction \(\boldsymbol{s}\), array of shape (n,)
cost_function – cost function to be evaluated along the search direction
learning_rate – initial step size \(\alpha\)
- Returns:
updated parameters and cost, 2 x array of shape (n,)
- class jenn.core.optimization.Optimizer(line_search: LineSearch)[source]
Find optimum using gradient-based optimization.
- Parameters:
line_search – object that implements algorithm to compute search direction \(\boldsymbol{s}\) given the gradient \(\nabla_x f\) at the current parameter values \(\boldsymbol{x}\) and take multiple steps along it to update them according to \(\boldsymbol{x} := \boldsymbol{x} + \alpha \boldsymbol{s}\)
- minimize(x: np.ndarray, f: Callable, dfdx: Callable, alpha: float = 0.01, max_iter: int = 100, verbose: bool = False, epoch: int | None = None, batch: int | None = None, epsilon_absolute: float = 1e-12, epsilon_relative: float = 1e-12, N1_max: int = 100, N2_max: int = 100) np.ndarray[source]
Minimize single objective function.
- Parameters:
x – parameters to be updated, array of shape (n,)
f – cost function \(y = f(\boldsymbol{x})\)
dfdx – cost function gradient \(\nabla_x f\) w.r.t. the parameters
alpha – learning rate \(\boldsymbol{x} := \boldsymbol{x} + \alpha \boldsymbol{s}\)
max_iter – maximum number of optimizer iterations allowed
verbose – whether or not to send progress output to standard out
epoch – the epoch in which this optimization is being run (for printing)
batch – the batch in which this optimization is being run (for printing)
epsilon_absolute – absolute error stopping criterion
epsilon_relative – relative error stopping criterion
N1_max – number of iterations for which absolute criterion must hold true before stop
N2_max – number of iterations for which relative criterion must hold true before stop
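A minimal sketch minimizing a simple quadratic with the ADAM optimizer. It is assumed here that ADAMOptimizer exposes the minimize() interface documented above for Optimizer; the objective and settings are illustrative.
import numpy as np
from jenn.core.optimization import ADAMOptimizer

def f(x: np.ndarray) -> float:
    return float(np.sum((x - 3.0) ** 2))    # minimum at x = [3, 3]

def dfdx(x: np.ndarray) -> np.ndarray:
    return 2.0 * (x - 3.0)                  # analytic gradient

x0 = np.zeros(2)
x_opt = ADAMOptimizer().minimize(x=x0, f=f, dfdx=dfdx, alpha=0.1, max_iter=500)
# x_opt should approach [3., 3.]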
- class jenn.core.optimization.Update[source]
Base class for parameter updates used by line search.
Update parameters \(\boldsymbol{x}\) by taking a step along the search direction \(\boldsymbol{s}\) according to \(\boldsymbol{x} := \boldsymbol{x} + \alpha \boldsymbol{s}\)
- __call__(params: np.ndarray, grads: np.ndarray, alpha: float, **kwargs: dict[str, Any]) np.ndarray[source]
Take a single step along search direction.
- Parameters:
params – parameters \(x\) to be updated
grads – gradient \(\nabla_x f\) of objective function \(f\) w.r.t. each parameter \(x\)
alpha – learning rate \(\alpha\)
2.7. Parameters.
This module defines a utility class to store and manage neural net parameters and metadata.
- class jenn.core.parameters.Parameters(layer_sizes: list[int], hidden_activation: str = 'tanh', output_activation: str = 'linear')[source]
Neural network parameters.
Warning
The attributes of this class are not protected. It’s possible to overwrite them instead of updating them in place. To ensure that an array is updated in place, use the numpy [:] syntax:
parameters = Parameters(**kwargs)
layer_1_weights = parameters.W[1]
layer_1_weights[:] = new_array_values  # note [:]
Note
The variables and their symbols refer to the theory in the companion paper for this library.
- Parameters:
layer_sizes – number of nodes in each layer (including input/output layers)
hidden_activation – activation function used in hidden layers
output_activation – activation function used in output layer
- Variables:
W (List[np.ndarray]) – weights \(\boldsymbol{W} \in \mathbb{R}^{n^{[l]} \times n^{[l-1]}}\) for each layer
b (List[np.ndarray]) – biases \(\boldsymbol{b} \in \mathbb{R}^{n^{[l]} \times 1}\) for each layer
a (List[str]) – activation names for each layer
dW (List[np.ndarray]) – partials w.r.t. weight \(dL/dW^{[l]} \in \mathbb{R}^{n^{[l]} \times n^{[l-1]}}\)
db (List[np.ndarray]) – partials w.r.t. bias \(dL/db^{[l]} \in \mathbb{R}^{n^{[l]} \times 1}\)
mu_x (List[np.ndarray]) – mean of training data inputs used for normalization \(\mu_x \in \mathbb{R}^{n_x \times 1}\)
mu_y – mean of training data outputs used for normalization \(\mu_y \in \mathbb{R}^{n_y \times 1}\)
sigma_x (List[np.ndarray]) – standard deviation of training data inputs used for normalization \(\sigma_x \in \mathbb{R}^{n_x \times 1}\)
sigma_y (List[np.ndarray]) – standard deviation of training data outputs used for normalization \(\sigma_y \in \mathbb{R}^{n_y \times 1}\)
- property L: int
Return number of layers.
- initialize(random_state: int | None = None) None[source]
Use He initialization to initialize parameters.
- Parameters:
random_state – optional random seed (for repeatability)
- property layers: Iterable[int]
Return iterator of index for each layer.
- classmethod load(binary_file: str | Path = 'parameters.json') Parameters[source]
Load serialized parameters into a new Parameters instance.
- Parameters:
binary_file (str | Path) – JSON file containing saved parameters
- Returns:
a new instance of Parameters
- Return type:
Parameters
- property n_x: int
Return number of inputs.
- property n_y: int
Return number of outputs.
- property partials: Iterable[int]
Return iterator of index for each partial.
- save(binary_file: str | Path = 'parameters.json') None[source]
Save parameters to specified json file.
- stack() ndarray[source]
Stack W, b into a single array.
parameters.stack() >> np.array([[W1], [b1], [W2], [b2], [W3], [b3]])
Note
This method is used to convert the list format used by the neural net into a single array of stacked parameters for optimization.
- stack_partials() ndarray[source]
Stack backprop partials dW, db.
parameters.stack_partials() >> np.array([[dW1], [db1], [dW2], [db2], [dW3], [db3]])
Note
This method is used to convert the list format used by the neural net into a single array of stacked parameters for optimization.
- stack_partials_per_layer() list[numpy.ndarray][source]
Stack backprop partials dW, db per layer.
parameters.stack_partials_per_layer() >> [np.array([[dW1], [db1]]), np.array([[dW2], [db2]]), np.array([[dW3], [db3]]),]
- stack_per_layer() list[numpy.ndarray][source]
Stack W, b into a single array for each layer.
parameters.stack_per_layer() >> [np.array([[W1], [b1]]), np.array([[W2], [b2]]), np.array([[W3], [b3]])]
- unstack(parameters: ndarray | list[numpy.ndarray]) None[source]
Unstack parameters W, b back into list of arrays.
- Parameters:
parameters – neural network parameters as either a single array where all layers are stacked on top of each other or a list of stacked parameters for each layer.
# Unstack from single stack
parameters.unstack(np.array([[W1], [b1], [W2], [b2], [W3], [b3]]))
parameters.W, parameters.b >> [W1, W2, W3], [b1, b2, b3]
# Unstack from list of stacks
parameters.unstack([np.array([[W1], [b1]]), np.array([[W2], [b2]]), np.array([[W3], [b3]])])
parameters.W, parameters.b >> [W1, W2, W3], [b1, b2, b3]
Note
This method is used to convert optimization results expressed as a single array of stacked parameters, back into the list format used by the neural net.
- unstack_partials(partials: ndarray | list[numpy.ndarray]) None[source]
Unstack backprop partials dW, db back into list of arrays.
- Parameters:
partials – neural network partials as either a single array where all layers are stacked on top of each other or a list of stacked parameters for each layer.
# Unstack from single stack
parameters.unstack_partials(np.array([[dW1], [db1], [dW2], [db2], [dW3], [db3]]))
parameters.dW, parameters.db >> [dW1, dW2, dW3], [db1, db2, db3]
# Unstack from list of stacks
parameters.unstack_partials([np.array([[dW1], [db1]]), np.array([[dW2], [db2]]), np.array([[dW3], [db3]])])
parameters.dW, parameters.db >> [dW1, dW2, dW3], [db1, db2, db3]
Note
This method is used to convert optimization results expressed as a single array of stacked parameters, back into the list format used by the neural net.
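A minimal sketch of the stack/unstack round trip that connects the layer-wise parameter lists to the flat array consumed by the optimizer (the perturbation below stands in for an optimizer step):
from jenn.core.parameters import Parameters

params = Parameters(layer_sizes=[2, 5, 1])
params.initialize(random_state=0)

flat = params.stack()       # all W, b stacked into a single array
flat = flat + 0.01          # stand-in for an optimizer update of the flat vector
params.unstack(flat)        # write the updated values back into params.W, params.b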
2.8. Propagation.
This module contains the critical functionality to propagate information forward and backward through the neural net.
- jenn.core.propagation.eye(n: int, m: int) ndarray[source]
Copy identity matrix of shape (n, n) m times.
- jenn.core.propagation.first_layer_forward(X: np.ndarray, cache: Cache | None = None) None[source]
Compute input layer activations (in place).
- Parameters:
X – training data inputs, array of shape (n_x, m)
cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them
- jenn.core.propagation.first_layer_partials(X: np.ndarray, cache: Cache | None) None[source]
Compute input layer partial (in place).
- Parameters:
X – training data inputs, array of shape (n_x, m)
cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them
- jenn.core.propagation.gradient_enhancement(layer: int, parameters: Parameters, cache: Cache, data: Dataset) None[source]
Add gradient enhancement to backprop (in place).
- Parameters:
layer – index of current layer.
parameters – object that stores neural net parameters for each layer
cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them
data – object containing training and associated metadata
- jenn.core.propagation.last_layer_backward(cache: Cache, data: Dataset) None[source]
Propagate backward through last layer (in place).
- Parameters:
cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them
data – object containing training and associated metadata
- jenn.core.propagation.model_backward(data: Dataset, parameters: Parameters, cache: Cache, lambd: float = 0.0) None[source]
Propagate backward through all layers (in place).
- Parameters:
parameters – object that stores neural net parameters for each layer
cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them
data – object containing training and associated metadata
lambd – regularization coefficient to avoid overfitting [defaulted to zero] (optional)
- jenn.core.propagation.model_forward(X: np.ndarray, parameters: Parameters, cache: Cache) np.ndarray[source]
Propagate forward in order to predict response(s).
- Parameters:
X – training data inputs, array of shape (n_x, m)
parameters – object that stores neural net parameters for each layer
cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them
- jenn.core.propagation.model_partials_forward(X: np.ndarray, parameters: Parameters, cache: Cache) tuple[np.ndarray, np.ndarray][source]
Propagate forward in order to predict response(s) and partial(s).
- Parameters:
X – training data inputs, array of shape (n_x, m)
parameters – object that stores neural net parameters for each layer
cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them
- jenn.core.propagation.next_layer_backward(layer: int, parameters: Parameters, cache: Cache, data: Dataset, lambd: float) None[source]
Propagate backward through next layer (in place).
- Parameters:
layer – index of current layer.
parameters – object that stores neural net parameters for each layer
cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them
data – object containing training and associated metadata
lambd – coefficient that multiplies regularization term in cost function
- jenn.core.propagation.next_layer_forward(layer: int, parameters: Parameters, cache: Cache) None[source]
Propagate forward through one layer (in place).
- Parameters:
layer – index of current layer.
parameters – object that stores neural net parameters for each layer
cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them
- jenn.core.propagation.next_layer_partials(layer: int, parameters: Parameters, cache: Cache) np.ndarray[source]
Compute the j-th partials for one layer (in place).
- Parameters:
layer – index of current layer.
parameters – object that stores neural net parameters for each layer
cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them
- jenn.core.propagation.partials_forward(X: np.ndarray, parameters: Parameters, cache: Cache) np.ndarray[source]
Propagate forward in order to predict partial(s).
- Parameters:
X – training data inputs, array of shape (n_x, m)
parameters – object that stores neural net parameters for each layer
cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them
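A minimal sketch of the low-level forward pass: preallocate a Cache sized to the batch, then predict responses and partials in one call (architecture and data are illustrative):
import numpy as np
from jenn.core.cache import Cache
from jenn.core.parameters import Parameters
from jenn.core.propagation import model_partials_forward

layer_sizes = [2, 5, 1]
X = np.random.default_rng(0).uniform(-1.0, 1.0, size=(2, 10))   # inputs, shape (n_x, m)

params = Parameters(layer_sizes)
params.initialize(random_state=0)
cache = Cache(layer_sizes, m=X.shape[1])

Y, J = model_partials_forward(X, params, cache)   # shapes (n_y, m) and (n_y, n_x, m)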
2.9. Training.
This module implements the core algorithm responsible for training the neural network.
- jenn.core.training.objective_function(X: np.ndarray, cost: Cost, parameters: Parameters, cache: Cache, stacked_params: np.ndarray) np.float64[source]
Evaluate cost function for training.
- Parameters:
X – training data inputs, array of shape (n_x, m)
cost – cost function to be evaluated
parameters – object that stores neural net parameters for each layer
cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them
stacked_params – neural network parameters returned by the optimizer, represented as single array of stacked parameters for all layers.
- jenn.core.training.objective_gradient(data: Dataset, parameters: Parameters, cache: Cache, lambd: float, stacked_params: np.ndarray) np.ndarray[source]
Evaluate cost function gradient for backprop.
- Parameters:
data – object containing training and associated metadata
parameters – object that stores neural net parameters for each layer
cache – neural net cache that stores neural net quantities computed during forward prop for each layer, so they can be accessed during backprop to avoid re-computing them
lambd – coefficient that multiplies regularization term in cost function
gamma – coefficient that multiplies jacobian-enhancement term in cost function
stacked_params – neural network parameters returned by the optimizer, represented as single array of stacked parameters for all layers.
- jenn.core.training.train_model(data: Dataset, parameters: Parameters, alpha: float = 0.05, beta: np.ndarray | float = 1.0, gamma: np.ndarray | float = 1.0, lambd: float = 0.0, beta1: float = 0.9, beta2: float = 0.99, tau: float = 0.5, tol: float = 1e-12, max_count: int = 1000, epsilon_absolute: float = 1e-12, epsilon_relative: float = 1e-12, epochs: int = 1, max_iter: int = 200, batch_size: int | None = None, shuffle: bool = True, random_state: int | None = None, is_backtracking: bool = False, is_verbose: bool = False) dict[source]
Train neural net.
Note
If training is taking too long, it can be stopped gracefully by creating a local file called STOP in the running directory. Just be sure to delete it before the next run.
- Parameters:
data – object containing training and associated metadata
parameters – object that stores neural net parameters for each layer
alpha – learning rate \(\alpha\)
beta – LSE coefficients [defaulted to one] (optional)
gamma – jacobian-enhancement regularization coefficient [defaulted to one] (optional)
lambd – regularization coefficient to avoid overfitting [defaulted to zero] (optional)
beta1 – exponential decay rate of 1st moment vector \(\beta_1\in[0, 1)\)
beta2 – exponential decay rate of 2nd moment vector \(\beta_2\in[0, 1)\)
tau – amount by which to reduce \(\alpha := \tau \times \alpha\) on each iteration
tol – stop when cost function doesn’t improve more than specified tolerance
max_count – stop when line search iterations exceed maximum count specified
epsilon_absolute – absolute error stopping criterion
epsilon_relative – relative error stopping criterion
epochs – number of passes through data
batch_size – mini batch size (if None, single batch with all data)
max_iter – maximum number of optimizer iterations allowed
shuffle – whether to shuffle data points or not
random_state – random seed (useful to make runs repeatable)
is_backtracking – whether or not to use backtracking during line search
is_verbose – print out progress for each iteration, each batch, each epoch
- Returns:
cost function training history accessed as cost = history[epoch][batch][iter]
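A minimal sketch of this core training entry point; data, architecture, and hyperparameter values are illustrative.
import numpy as np
from jenn.core.data import Dataset
from jenn.core.parameters import Parameters
from jenn.core.training import train_model

X = np.linspace(-1.0, 1.0, 50).reshape(1, -1)   # inputs, shape (n_x, m)
Y = X ** 2                                      # outputs, shape (n_y, m)
J = (2.0 * X).reshape(1, 1, -1)                 # Jacobian, shape (n_y, n_x, m)

data = Dataset(X, Y, J)
params = Parameters(layer_sizes=[1, 10, 1])
params.initialize(random_state=0)

history = train_model(data, params, alpha=0.05, max_iter=200)
# cost history is accessed as history[epoch][batch][iteration] per the note above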