MLX API¶
The MLX layer is optimized for Apple Silicon (M1/M2/M3) using the MLX framework.
CvxpyLayer¶
- class cvxpylayers.mlx.CvxpyLayer[source]¶
Bases: object
A differentiable convex optimization layer for MLX.
This layer wraps a parametrized CVXPY problem, solving it in the forward pass and computing gradients via implicit differentiation. Optimized for Apple Silicon (M1/M2/M3) with unified memory architecture.
Example
>>> import cvxpy as cp
>>> import mlx.core as mx
>>> from cvxpylayers.mlx import CvxpyLayer
>>>
>>> # Define a simple QP
>>> x = cp.Variable(2)
>>> A = cp.Parameter((3, 2))
>>> b = cp.Parameter(3)
>>> problem = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)), [x >= 0])
>>>
>>> # Create the layer
>>> layer = CvxpyLayer(problem, parameters=[A, b], variables=[x])
>>>
>>> # Solve and compute gradients
>>> A_mx = mx.random.normal((3, 2))
>>> b_mx = mx.random.normal((3,))
>>> (solution,) = layer(A_mx, b_mx)
>>>
>>> # Gradient computation
>>> def loss_fn(A, b):
...     (x,) = layer(A, b)
...     return mx.sum(x)
>>> grad_fn = mx.grad(loss_fn, argnums=[0, 1])
>>> grads = grad_fn(A_mx, b_mx)
- __init__(problem, parameters, variables, solver=None, gp=False, verbose=False, canon_backend=None, solver_args=None)[source]¶
Initialize the differentiable optimization layer.
- Parameters:
problem (Problem) – A CVXPY Problem. Must be DPP-compliant (problem.is_dpp() must return True).
parameters (list[Parameter]) – List of CVXPY Parameters that will be filled with values at runtime. Order must match the order of arrays passed to __call__().
variables (list[Variable]) – List of CVXPY Variables whose optimal values will be returned by __call__(). Order determines the order of returned arrays.
solver (str | None) – CVXPY solver to use (e.g., cp.CLARABEL, cp.SCS). If None, uses the default diffcp solver.
gp (bool) – If True, the problem is a geometric program. Parameters will be log-transformed before solving.
verbose (bool) – If True, print solver output.
canon_backend (str | None) – Backend for canonicalization. Options are ‘diffcp’, ‘cuclarabel’, or None (auto-select).
solver_args (dict[str, Any] | None) – Default keyword arguments passed to the solver. Can be overridden per-call in __call__(); see the example after this parameter list.
- Raises:
AssertionError – If problem is not DPP-compliant.
ValueError – If parameters or variables are not part of the problem.
- Return type:
None
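Example
A minimal construction sketch showing the solver and solver_args defaults. The "eps" tolerance key is an assumed SCS option used only for illustration, not something this API requires; problem, A, b, and x are defined as in the class-level example above.
>>> # Sketch: pick a named solver and set default solver arguments
>>> # ("eps" is an illustrative SCS tolerance, not part of this API)
>>> layer = CvxpyLayer(
...     problem,
...     parameters=[A, b],
...     variables=[x],
...     solver=cp.SCS,
...     solver_args={"eps": 1e-6},
... )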
- __call__(*params, solver_args=None)[source]¶
Solve the optimization problem and return optimal variable values.
- Parameters:
*params (array) – Array values for each CVXPY Parameter, in the same order as the parameters argument to __init__(). Each array shape must match the corresponding Parameter shape, optionally with a batch dimension prepended. Batched and unbatched parameters can be mixed; unbatched parameters are broadcast.
solver_args (dict[str, Any] | None) – Keyword arguments passed to the solver, overriding any defaults set in __init__().
- Returns:
Tuple of arrays containing optimal values for each CVXPY Variable specified in the variables argument to __init__(). If inputs are batched, outputs will have matching batch dimensions.
- Raises:
RuntimeError – If the solver fails to find a solution.
- Return type:
tuple[array, …]
Example
>>> # Single problem
>>> (x_opt,) = layer(A_array, b_array)
>>>
>>> # Batched: solve 10 problems in parallel
>>> A_batch = mx.random.normal((10, 3, 2))
>>> b_batch = mx.random.normal((10, 3))
>>> (x_batch,) = layer(A_batch, b_batch)  # x_batch.shape = (10, 2)
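The broadcast rule for unbatched parameters and the per-call solver_args override can be sketched as follows; the "eps" key is an assumed solver tolerance, used only for illustration.
>>> # Mix a batched A with an unbatched b; b is broadcast across the batch
>>> A_batch = mx.random.normal((10, 3, 2))
>>> b_single = mx.random.normal((3,))
>>> (x_batch,) = layer(A_batch, b_single)  # x_batch.shape = (10, 2)
>>>
>>> # Override solver arguments for this call only ("eps" is illustrative)
>>> (x_opt,) = layer(A_array, b_array, solver_args={"eps": 1e-8})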
Usage Example¶
import cvxpy as cp
import mlx.core as mx
from cvxpylayers.mlx import CvxpyLayer
# Define problem
n, m = 2, 3
x = cp.Variable(n)
A = cp.Parameter((m, n))
b = cp.Parameter(m)
problem = cp.Problem(
cp.Minimize(cp.sum_squares(A @ x - b)),
[x >= 0]
)
# Create layer
layer = CvxpyLayer(problem, parameters=[A, b], variables=[x])
# Solve
A_mx = mx.random.normal((m, n))
b_mx = mx.random.normal((m,))
(x_sol,) = layer(A_mx, b_mx)
Computing Gradients¶
Use mx.grad to compute gradients:
def loss_fn(A, b):
(x,) = layer(A, b)
return mx.sum(x)
# Gradient with respect to A and b
grad_fn = mx.grad(loss_fn, argnums=[0, 1])
dA, db = grad_fn(A_mx, b_mx)
# Evaluate gradients
mx.eval(dA, db)
Value and Gradient¶
Compute both value and gradient efficiently:
def loss_fn(A, b):
(x,) = layer(A, b)
return mx.sum(x)
value_and_grad_fn = mx.value_and_grad(loss_fn, argnums=[0, 1])
loss_val, (dA, db) = value_and_grad_fn(A_mx, b_mx)
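As a usage sketch (not part of the library), the value-and-gradient pair can drive a plain gradient step on the problem data; the learning rate below is an arbitrary assumption.
lr = 1e-2  # assumed step size
loss_val, (dA, db) = value_and_grad_fn(A_mx, b_mx)
# Gradient-descent update on the problem data
A_mx = A_mx - lr * dA
b_mx = b_mx - lr * db
mx.eval(A_mx, b_mx)  # force evaluation (MLX is lazy)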
Apple Silicon Optimization¶
MLX automatically uses the unified memory architecture of Apple Silicon, allowing efficient computation without explicit device management:
# No need to move tensors to GPU - MLX handles this automatically
A_mx = mx.random.normal((1000, 500))
b_mx = mx.random.normal((1000,))
(x_sol,) = layer(A_mx, b_mx)
mx.eval(x_sol) # Force evaluation
Notes¶
- MLX uses lazy evaluation; call mx.eval() to force computation
- The MLX layer supports batched execution like PyTorch and JAX
- For best performance on Apple Silicon, prefer MLX over PyTorch CPU