MLX API

The MLX layer is optimized for Apple Silicon (M1/M2/M3) using the MLX framework.

CvxpyLayer

class cvxpylayers.mlx.CvxpyLayer[source]

Bases: object

A differentiable convex optimization layer for MLX.

This layer wraps a parametrized CVXPY problem, solving it in the forward pass and computing gradients via implicit differentiation. Optimized for Apple Silicon (M1/M2/M3) with unified memory architecture.

Example

>>> import cvxpy as cp
>>> import mlx.core as mx
>>> from cvxpylayers.mlx import CvxpyLayer
>>>
>>> # Define a simple QP
>>> x = cp.Variable(2)
>>> A = cp.Parameter((3, 2))
>>> b = cp.Parameter(3)
>>> problem = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)), [x >= 0])
>>>
>>> # Create the layer
>>> layer = CvxpyLayer(problem, parameters=[A, b], variables=[x])
>>>
>>> # Solve and compute gradients
>>> A_mx = mx.random.normal((3, 2))
>>> b_mx = mx.random.normal((3,))
>>> (solution,) = layer(A_mx, b_mx)
>>>
>>> # Gradient computation
>>> def loss_fn(A, b):
...     (x,) = layer(A, b)
...     return mx.sum(x)
>>> grad_fn = mx.grad(loss_fn, argnums=[0, 1])
>>> grads = grad_fn(A_mx, b_mx)

__init__(problem, parameters, variables, solver=None, gp=False, verbose=False, canon_backend=None, solver_args=None)[source]

Initialize the differentiable optimization layer. A construction sketch follows the parameter descriptions below.

Parameters:
  • problem (Problem) – A CVXPY Problem. Must be DPP-compliant (problem.is_dpp() must return True).

  • parameters (list[Parameter]) – List of CVXPY Parameters that will be filled with values at runtime. Order must match the order of arrays passed to __call__().

  • variables (list[Variable]) – List of CVXPY Variables whose optimal values will be returned by __call__(). Order determines the order of returned arrays.

  • solver (str | None) – CVXPY solver to use (e.g., cp.CLARABEL, cp.SCS). If None, uses the default diffcp solver.

  • gp (bool) – If True, the problem is a geometric program. Parameters will be log-transformed before solving.

  • verbose (bool) – If True, print solver output.

  • canon_backend (str | None) – Backend for canonicalization. Options are ‘diffcp’, ‘cuclarabel’, or None (auto-select).

  • solver_args (dict[str, Any] | None) – Default keyword arguments passed to the solver. Can be overridden per-call in __call__().

Raises:
  • AssertionError – If problem is not DPP-compliant.

  • ValueError – If parameters or variables are not part of the problem.

Return type:

None
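
Example

A construction sketch; the explicit solver choice, the DPP assertion, and the solver_args value are illustrative assumptions, not defaults:

>>> assert problem.is_dpp()  # the layer requires a DPP-compliant problem
>>> layer = CvxpyLayer(
...     problem,
...     parameters=[A, b],
...     variables=[x],
...     solver=cp.SCS,                # assumed: SCS is installed
...     solver_args={"eps": 1e-6},    # assumed SCS option; keys are solver-dependent
... )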

__call__(*params, solver_args=None)[source]

Solve the optimization problem and return optimal variable values.

Parameters:
  • *params (array) – Array values for each CVXPY Parameter, in the same order as the parameters argument to __init__(). Each array's shape must match the corresponding Parameter's shape, optionally with a batch dimension prepended. Batched and unbatched parameters can be mixed; unbatched parameters are broadcast across the batch (see the sketch after the example below).

  • solver_args (dict[str, Any] | None) – Keyword arguments passed to the solver, overriding any defaults set in __init__().

Returns:

Tuple of arrays containing optimal values for each CVXPY Variable specified in the variables argument to __init__(). If inputs are batched, outputs will have matching batch dimensions.

Raises:

RuntimeError – If the solver fails to find a solution.

Return type:

tuple[array, …]

Example

>>> # Single problem
>>> (x_opt,) = layer(A_array, b_array)
>>>
>>> # Batched: solve 10 problems in parallel
>>> A_batch = mx.random.normal((10, 3, 2))
>>> b_batch = mx.random.normal((10, 3))
>>> (x_batch,) = layer(A_batch, b_batch)  # x_batch.shape = (10, 2)
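
Batched and unbatched inputs can also be mixed; a sketch relying on the broadcasting behavior described above:

>>> # A is batched (10 problems); b is shared across the batch
>>> A_batch = mx.random.normal((10, 3, 2))
>>> b_single = mx.random.normal((3,))
>>> (x_batch,) = layer(A_batch, b_single)  # x_batch.shape = (10, 2)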

forward(*params, solver_args=None)[source]

Forward pass (alias for __call__).

Parameters:
  • params (array)

  • solver_args (dict[str, Any] | None)

Return type:

tuple[array, …]
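
Since forward() is an alias for __call__(), the two call forms below are interchangeable:

>>> (x_opt,) = layer(A_mx, b_mx)
>>> (x_opt,) = layer.forward(A_mx, b_mx)  # identical result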

Usage Example

import cvxpy as cp
import mlx.core as mx
from cvxpylayers.mlx import CvxpyLayer

# Define problem
n, m = 2, 3
x = cp.Variable(n)
A = cp.Parameter((m, n))
b = cp.Parameter(m)
problem = cp.Problem(
    cp.Minimize(cp.sum_squares(A @ x - b)),
    [x >= 0]
)

# Create layer
layer = CvxpyLayer(problem, parameters=[A, b], variables=[x])

# Solve
A_mx = mx.random.normal((m, n))
b_mx = mx.random.normal((m,))

(x_sol,) = layer(A_mx, b_mx)
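
Solver keyword arguments can also be passed per call, overriding any defaults set in the constructor. A sketch; the specific key is a solver-dependent assumption (SCS-style option shown):

# Tighten the solver tolerance for this call only (key assumed for SCS)
(x_sol,) = layer(A_mx, b_mx, solver_args={"eps": 1e-8})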

Computing Gradients

Use mx.grad to compute gradients:

def loss_fn(A, b):
    (x,) = layer(A, b)
    return mx.sum(x)

# Gradient with respect to A and b
grad_fn = mx.grad(loss_fn, argnums=[0, 1])
dA, db = grad_fn(A_mx, b_mx)

# Evaluate gradients
mx.eval(dA, db)
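
The gradients can be applied directly in a gradient-descent update of the parameter values; a minimal sketch with an arbitrary step size:

lr = 0.1  # illustrative step size
A_mx = A_mx - lr * dA
b_mx = b_mx - lr * db
mx.eval(A_mx, b_mx)  # force the updated arrays to be computed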

Value and Gradient

Compute both value and gradient efficiently:

def loss_fn(A, b):
    (x,) = layer(A, b)
    return mx.sum(x)

value_and_grad_fn = mx.value_and_grad(loss_fn, argnums=[0, 1])
loss_val, (dA, db) = value_and_grad_fn(A_mx, b_mx)
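
This pairs naturally with a simple training loop; a sketch in which the iteration count and step size are arbitrary illustrations:

lr = 0.05
for step in range(20):
    loss_val, (dA, db) = value_and_grad_fn(A_mx, b_mx)
    A_mx = A_mx - lr * dA
    b_mx = b_mx - lr * db
    mx.eval(loss_val, A_mx, b_mx)  # evaluate once per step (MLX is lazy)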

Apple Silicon Optimization

MLX automatically uses the unified memory architecture of Apple Silicon, allowing efficient computation without explicit device management:

# No need to move tensors to GPU - MLX handles this automatically
A_mx = mx.random.normal((1000, 500))
b_mx = mx.random.normal((1000,))

(x_sol,) = layer(A_mx, b_mx)
mx.eval(x_sol)  # Force evaluation

Notes

  • MLX uses lazy evaluation; call mx.eval() to force computation

  • The MLX layer supports batched execution, like the PyTorch and JAX layers (see the sketch below)

  • For best performance on Apple Silicon, prefer MLX over PyTorch CPU
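
To illustrate the batched-execution note above, a single batched call plays the role of a Python loop over individual problems; a sketch (agreement between the two is expected only up to solver tolerance):

A_batch = mx.random.normal((10, 3, 2))
b_batch = mx.random.normal((10, 3))

# One solve at a time, in a Python loop ...
x_list = [layer(A_batch[i], b_batch[i])[0] for i in range(10)]

# ... versus a single batched call
(x_batch,) = layer(A_batch, b_batch)
mx.eval(x_batch)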