Jacobian-Vector Product (GVP) Module

The GVP module provides Jacobian-vector products (JVP), vector-Jacobian products (VJP), batched variants of both, model-level convenience wrappers, and a full-Jacobian utility.

JVP (Jacobian-Vector Product)

torch_secorder.core.gvp.jvp(func: Callable[[], Tensor], params: List[Tensor], v: Tensor | List[Tensor], create_graph: bool = False) -> Tensor | List[Tensor]

Compute the Jacobian-vector product (JVP): J v.

Parameters:
  • func – A callable that returns a tensor output (can be vector-valued).

  • params – List of parameters with respect to which to compute the Jacobian.

  • v – Vector to multiply with the Jacobian. Can be a single tensor or a list of tensors matching the structure of params.

  • create_graph – If True, the graph of the derivative is constructed, so that higher-order derivative products can be computed.

Returns:

The JVP (same shape as the output of func).

Computes the Jacobian-vector product for a given function and parameters.

Example:

import torch
from torch_secorder.core.gvp import jvp

def func():
    return torch.stack([x[0] ** 2, 3 * x[1] ** 2])

x = torch.tensor([1.0, 2.0], requires_grad=True)
v = torch.tensor([0.5, -1.0])
jvp_result = jvp(func, [x], v)
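
For the function above the Jacobian is diag(2·x[0], 6·x[1]) = diag(2, 12), so J v = (1, -12). The sketch below reproduces that value with plain torch.autograd.grad using the double-backward trick. It is only an illustration of what a JVP computes and assumes nothing about how torch_secorder implements jvp internally; the names u, g, and jv are introduced here for the example.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
v = torch.tensor([0.5, -1.0])

def func():
    return torch.stack([x[0] ** 2, 3 * x[1] ** 2])

y = func()

# Dummy cotangent u: its value is irrelevant, only the dependence on it matters.
u = torch.zeros_like(y, requires_grad=True)

# g = J^T u, kept differentiable with respect to u.
(g,) = torch.autograd.grad(y, x, grad_outputs=u, create_graph=True)

# g is linear in u with dg/du = J^T, so back-propagating v through it gives J v.
(jv,) = torch.autograd.grad(g, u, grad_outputs=v)

print(jv)  # tensor([  1., -12.])
```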

VJP (Vector-Jacobian Product)

torch_secorder.core.gvp.vjp(func: Callable[[], Tensor], params: List[Tensor], v: Tensor, create_graph: bool = False) -> Tensor | List[Tensor]

Compute the vector-Jacobian product (VJP): v^T J.

Parameters:
  • func – A callable that returns a tensor output (can be vector-valued).

  • params – List of parameters with respect to which to compute the Jacobian.

  • v – Vector to multiply with the Jacobian (should match the output shape of func).

  • create_graph – If True, the graph of the derivative is constructed, so that higher-order derivative products can be computed.

Returns:

The VJP (list of tensors matching the structure of params).

Computes the vector-Jacobian product for a given function and parameters.

Example:

import torch
from torch_secorder.core.gvp import vjp

def func():
    return torch.stack([x[0] ** 2, 3 * x[1] ** 2])

x = torch.tensor([1.0, 2.0], requires_grad=True)
v = torch.tensor([0.5, -1.0])
vjp_result = vjp(func, [x], v)
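
As a cross-check on the example above, a VJP is what torch.autograd.grad returns when v is passed as grad_outputs. The following sketch uses only that standard PyTorch call and is not taken from the torch_secorder source. Because this particular Jacobian is diagonal, v^T J equals J v, namely (1, -12).

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
v = torch.tensor([0.5, -1.0])

def func():
    return torch.stack([x[0] ** 2, 3 * x[1] ** 2])

y = func()

# Reverse-mode pass: seeds the output with v and returns v^T J for each input.
(vjp_ref,) = torch.autograd.grad(y, [x], grad_outputs=v)

print(vjp_ref)  # tensor([  1., -12.])
```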

Model JVP

torch_secorder.core.gvp.model_jvp(model: Module, x: Tensor, v: Tensor | List[Tensor], create_graph: bool = False) -> Tensor

Compute the JVP for a model’s output with respect to its parameters.

Parameters:
  • model – The PyTorch model.

  • x – Input tensor.

  • v – Vector to multiply with the Jacobian (should match the structure of model.parameters()).

  • create_graph – If True, graph of the derivative will be constructed.

Returns:

The JVP (same shape as the model output).

A convenience wrapper for computing JVP with respect to a model’s parameters.

Example:

import torch
import torch.nn as nn
from torch_secorder.core.gvp import model_jvp

model = nn.Linear(10, 1)
x = torch.randn(1, 10)
v = [torch.randn_like(p) for p in model.parameters()]
jvp_result = model_jvp(model, x, v)
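
To make the parameter-space JVP concrete, the sketch below computes the same quantity for a linear layer with torch.func (assumed available, i.e. PyTorch 2.0 or later) and compares it with the closed form x·dWᵀ + db. It is an independent reference, not a description of how model_jvp is implemented; the helper f and the dictionaries params and tangents are names chosen for this example.

```python
import torch
import torch.nn as nn
from torch.func import functional_call, jvp

torch.manual_seed(0)
model = nn.Linear(10, 1)
x = torch.randn(1, 10)

# Parameters as a dict of plain tensors, plus one tangent per parameter.
params = {name: p.detach() for name, p in model.named_parameters()}
tangents = {name: torch.randn_like(p) for name, p in params.items()}

def f(p):
    # Run the model with the parameters supplied in p instead of its own.
    return functional_call(model, p, (x,))

# Forward-mode JVP of the model output with respect to its parameters.
out, jv = jvp(f, (params,), (tangents,))

# For y = x W^T + b, the directional derivative is x dW^T + db.
expected = x @ tangents["weight"].T + tangents["bias"]
print(torch.allclose(jv, expected, atol=1e-6))
```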

Model VJP

torch_secorder.core.gvp.model_vjp(model: Module, x: Tensor, v: Tensor, create_graph: bool = False) -> Tensor | List[Tensor]

Compute the VJP for a model’s output with respect to its parameters.

Parameters:
  • model – The PyTorch model.

  • x – Input tensor.

  • v – Vector to multiply with the Jacobian (should match the output shape of model(x)).

  • create_graph – If True, graph of the derivative will be constructed.

Returns:

The VJP (list of tensors matching the structure of model.parameters()).

A convenience wrapper for computing VJP with respect to a model’s parameters.

Example:

import torch
import torch.nn as nn
from torch_secorder.core.gvp import model_vjp

model = nn.Linear(10, 1)
x = torch.randn(1, 10)
v = torch.randn(1, 1)
vjp_result = model_vjp(model, x, v)
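
The result has one tensor per parameter. A minimal reference using only torch.autograd.grad, written here for illustration and not taken from the library, shows the expected shapes and values for a linear layer: the weight entry is vᵀx and the bias entry is v summed over the batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)
x = torch.randn(1, 10)
v = torch.randn(1, 1)  # same shape as model(x)

out = model(x)
params = list(model.parameters())

# One reverse-mode pass seeded with v gives v^T J for every parameter.
grads = torch.autograd.grad(out, params, grad_outputs=v)

for p, g in zip(params, grads):
    print(tuple(p.shape), tuple(g.shape))  # each gradient matches its parameter
```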

Batch JVP

torch_secorder.core.gvp.batch_jvp(func: Callable[[], Tensor], params: List[Tensor], vs: Tensor | List[Tensor], create_graph: bool = False) -> Tensor

Compute a batch of Jacobian-vector products (JVPs).

Parameters:
  • func – A callable that returns a tensor output (can be vector-valued).

  • params – List of parameters with respect to which to compute the Jacobian.

  • vs – Batch of vectors to multiply with the Jacobian. Should be a tensor of shape (batch, …) or a list of such tensors.

  • create_graph – If True, graph of the derivative will be constructed.

Returns:

Tensor of shape (batch, …) with the JVPs for each vector in the batch.

Computes JVPs for a batch of vectors efficiently.

Example:

import torch
from torch_secorder.core.gvp import batch_jvp

def func():
    return torch.stack([x[0] ** 2, 3 * x[1] ** 2])

x = torch.tensor([1.0, 2.0], requires_grad=True)
vs = torch.stack([
    torch.tensor([1.0, 0.0]),
    torch.tensor([0.0, 1.0])
])
batch_result = batch_jvp(func, [x], vs)
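
With the standard basis vectors as the batch, each JVP is one column of the Jacobian, so the stacked result here should be [[2, 0], [0, 12]]. A simple loop over torch.autograd.functional.jvp gives a baseline to compare against. This sketch assumes a function f that takes x explicitly (unlike the zero-argument func above) and is not the library's batched implementation.

```python
import torch

def f(x):
    return torch.stack([x[0] ** 2, 3 * x[1] ** 2])

x = torch.tensor([1.0, 2.0])
vs = torch.stack([
    torch.tensor([1.0, 0.0]),
    torch.tensor([0.0, 1.0]),
])

# Reference: one JVP per vector, stacked along the batch dimension.
# torch.autograd.functional.jvp returns (f(x), J v); keep only J v.
rows = [torch.autograd.functional.jvp(f, x, v)[1] for v in vs]
batch_ref = torch.stack(rows)

print(batch_ref)
```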

Batch VJP

torch_secorder.core.gvp.batch_vjp(func: Callable[[], Tensor], params: List[Tensor], vs: Tensor, create_graph: bool = False) -> List[Tensor]

Compute a batch of vector-Jacobian products (VJPs).

Parameters:
  • func – A callable that returns a tensor output (can be vector-valued).

  • params – List of parameters with respect to which to compute the Jacobian.

  • vs – Batch of vectors to multiply with the Jacobian (should match the output shape of func, with batch dimension first).

  • create_graph – If True, graph of the derivative will be constructed.

Returns:

List of tensors, each of shape (batch, …) matching the structure of params.

Computes VJPs for a batch of vectors efficiently.

Example:

import torch
from torch_secorder.core.gvp import batch_vjp

def func():
    return torch.stack([x[0] ** 2, 3 * x[1] ** 2])

x = torch.tensor([1.0, 2.0], requires_grad=True)
vs = torch.stack([
    torch.tensor([1.0, 0.0]),
    torch.tensor([0.0, 1.0])
])
batch_result = batch_vjp(func, [x], vs)
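
With basis vectors as the batch, each VJP is one row of the Jacobian. The loop below is a plain-PyTorch baseline for checking a batched result; it reuses one forward graph by passing retain_graph=True. It is a reference sketch only and does not reflect how batch_vjp is implemented.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
vs = torch.stack([
    torch.tensor([1.0, 0.0]),
    torch.tensor([0.0, 1.0]),
])

def func():
    return torch.stack([x[0] ** 2, 3 * x[1] ** 2])

y = func()

# One backward pass per vector; retain_graph keeps the graph alive between passes.
rows = [torch.autograd.grad(y, x, grad_outputs=v, retain_graph=True)[0] for v in vs]
batch_ref = torch.stack(rows)  # shape (batch, *x.shape)

print(batch_ref)
```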

Full Jacobian

torch_secorder.core.gvp.full_jacobian(func: Callable[[], Tensor], params: List[Tensor], create_graph: bool = False) -> List[Tensor]

Compute the full Jacobian matrix of func with respect to params.

Parameters:
  • func – A callable that returns a tensor output (can be vector-valued).

  • params – List of parameters with respect to which to compute the Jacobian.

  • create_graph – If True, graph of the derivative will be constructed.

Returns:

List of Jacobian tensors, one for each parameter, each with shape (output_dim, *param_shape).

Computes the full Jacobian matrix for a given function and parameters.

Example:

import torch
from torch_secorder.core.gvp import full_jacobian

def func():
    return torch.stack([x[0] ** 2, 3 * x[1] ** 2])

x = torch.tensor([1.0, 2.0], requires_grad=True)
jac = full_jacobian(func, [x])
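
For this function the Jacobian with respect to x is the 2×2 matrix [[2, 0], [0, 12]]. As an independent check, PyTorch's built-in torch.autograd.functional.jacobian produces the same matrix when given a function that takes x as an argument. The function f below is written for this comparison and is not part of torch_secorder.

```python
import torch

def f(x):
    return torch.stack([x[0] ** 2, 3 * x[1] ** 2])

x = torch.tensor([1.0, 2.0])

# Built-in reference: shape is (*output_shape, *input_shape) = (2, 2).
jac_ref = torch.autograd.functional.jacobian(f, x)

print(jac_ref)
```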

Notes

  1. JVP (`jvp`, `model_jvp`, `batch_jvp`): These functions compute the product of the Jacobian matrix with a vector (or batch of vectors). This is generally more efficient than computing the full Jacobian when only the product is needed.

  2. VJP (`vjp`, `model_vjp`, `batch_vjp`): These functions compute the product of a vector (or batch of vectors) with the Jacobian matrix, v^T J, which is equivalent to multiplying the transposed Jacobian by v. This is the operation performed by reverse-mode differentiation and is the basis for backpropagation.

  3. `create_graph` Parameter: When create_graph=True, the computational graph of the derivative itself is constructed, so the result can be differentiated again to obtain higher-order quantities (e.g., Hessian-vector products from JVPs/VJPs).

  4. `allow_unused=True`: Passing this option to torch.autograd.grad lets it return None, instead of raising an error, for parameters that are not part of the computational graph of a given output. The functions handle this by replacing each None with a zero tensor.

  5. Batch Computations: The batch_jvp and batch_vjp functions compute JVPs and VJPs for multiple vectors in a single call, which can be faster than looping over individual vector computations.

  6. Full Jacobian (`full_jacobian`): While JVP and VJP are efficient for products, full_jacobian computes the entire Jacobian matrix. Its size is the output dimension times the number of parameters, so it can be memory-intensive for large outputs or large models, but it is useful when the entire matrix is required for analysis.
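
Notes 3 and 4 can be illustrated with torch.autograd.grad alone. The sketch below first uses create_graph=True to obtain a Hessian-vector product by differentiating a gradient a second time, then uses allow_unused=True and replaces the resulting None with a zero tensor. Variable names are chosen for the example, and the None-to-zero step mirrors the behaviour described in note 4 without being copied from the library.

```python
import torch

# --- Note 3: create_graph=True enables a second differentiation ---
x = torch.tensor([1.0, 2.0], requires_grad=True)
v = torch.tensor([1.0, 1.0])

loss = (x ** 3).sum()                                    # gradient 3x^2, Hessian diag(6x)
(g,) = torch.autograd.grad(loss, x, create_graph=True)   # keep the graph of the gradient
(hv,) = torch.autograd.grad(g, x, grad_outputs=v)        # Hessian-vector product H v
print(hv)  # tensor([ 6., 12.])

# --- Note 4: allow_unused=True and zero-filling ---
a = torch.tensor([1.0, 2.0], requires_grad=True)
b = torch.tensor([3.0], requires_grad=True)              # does not affect `out`

out = (a ** 2).sum()
raw = torch.autograd.grad(out, [a, b], allow_unused=True)  # second entry is None

# Replace None with zeros shaped like the corresponding parameter.
grads = [torch.zeros_like(p) if gr is None else gr for gr, p in zip(raw, [a, b])]
print(grads)  # [tensor([2., 4.]), tensor([0.])]
```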