Source code for invode.sampling

# sampling.p

import numpy as np
from scipy.stats import qmc

[docs] def lhs_sample(param_bounds, n_samples, seed=None): """ Generate Latin Hypercube Samples for parameter space exploration. This function creates a set of well-distributed parameter samples using Latin Hypercube Sampling (LHS), a stratified sampling technique that ensures good coverage of the parameter space. LHS divides each parameter dimension into equally probable intervals and samples exactly once from each interval, providing better space-filling properties than random sampling. Latin Hypercube Sampling is particularly effective for: - High-dimensional parameter spaces where uniform coverage is important - Expensive function evaluations where sample efficiency matters - Situations requiring reproducible sampling with controlled randomness - Initial exploration phases of optimization algorithms Parameters ---------- param_bounds : dict Dictionary mapping parameter names to their bounds. Each key should be a string parameter name, and each value should be a tuple (min_val, max_val) defining the lower and upper bounds for that parameter. Example: ``{'k1': (0.1, 10.0), 'k2': (0.01, 1.0), 'alpha': (-1, 1)}`` n_samples : int Number of parameter samples to generate. Must be a positive integer. Each sample will contain values for all parameters specified in param_bounds. seed : int, optional Random seed for reproducible sampling. If None, the sampling will be non-deterministic. Using the same seed with identical inputs guarantees identical sample sets, which is useful for debugging and reproducible research. Default is None. Returns ------- list of dict A list containing n_samples dictionaries, where each dictionary represents one parameter sample. Each dictionary has the same keys as param_bounds, with values sampled from the corresponding parameter ranges using LHS. The returned samples have the following properties: - Each parameter dimension is divided into n_samples equally probable strata - Exactly one sample is drawn from each stratum in each dimension - Samples are randomly permuted to avoid correlation between dimensions - All parameter values are within their specified bounds Raises ------ ValueError If n_samples is not a positive integer, or if any parameter bounds are invalid (e.g., min_val >= max_val). TypeError If param_bounds is not a dictionary or contains non-numeric bounds. Notes ----- The function uses SciPy's quasi-Monte Carlo (qmc) module for LHS generation, which provides high-quality space-filling sequences. The sampling process involves three steps: 1. Generate unit hypercube samples using LHS in [0,1]^d 2. Scale samples to the specified parameter bounds 3. Convert arrays back to parameter dictionaries The Latin Hypercube design ensures that: - The marginal distribution of each parameter is uniform over its bounds - No two samples share the same stratum in any single dimension - The samples collectively provide good coverage of the parameter space - Correlation between different parameter dimensions is minimized Examples -------- Basic usage with two parameters: >>> bounds = {'rate': (0.1, 1.0), 'decay': (0.01, 0.1)} >>> samples = lhs_sample(bounds, n_samples=5, seed=42) >>> len(samples) 5 >>> samples[0].keys() dict_keys(['rate', 'decay']) >>> all(0.1 <= s['rate'] <= 1.0 for s in samples) True High-dimensional parameter space: >>> param_bounds = { ... 'k1': (0.1, 10.0), ... 'k2': (0.01, 1.0), ... 'k3': (-5.0, 5.0), ... 'alpha': (0.0, 1.0), ... 'beta': (1e-6, 1e-3) ... } >>> samples = lhs_sample(param_bounds, n_samples=100, seed=123) >>> print(f"Generated {len(samples)} samples in {len(param_bounds)}D space") Generated 100 samples in 5D space Reproducible sampling for debugging: >>> # Same seed produces identical samples >>> samples1 = lhs_sample({'x': (0, 1)}, n_samples=3, seed=42) >>> samples2 = lhs_sample({'x': (0, 1)}, n_samples=3, seed=42) >>> samples1 == samples2 True Integration with optimization loops: >>> def objective_function(params): ... return (params['x'] - 0.5)**2 + (params['y'] - 0.3)**2 >>> >>> bounds = {'x': (0, 1), 'y': (0, 1)} >>> candidates = lhs_sample(bounds, n_samples=50, seed=42) >>> errors = [objective_function(params) for params in candidates] >>> best_idx = np.argmin(errors) >>> best_params = candidates[best_idx] >>> print(f"Best parameters: {best_params}") Comparison with Random Sampling ------------------------------- LHS provides better space coverage than pure random sampling: >>> import matplotlib.pyplot as plt >>> import numpy as np >>> >>> # Generate LHS samples >>> lhs_samples = lhs_sample({'x': (0, 1), 'y': (0, 1)}, n_samples=20, seed=42) >>> lhs_x = [s['x'] for s in lhs_samples] >>> lhs_y = [s['y'] for s in lhs_samples] >>> >>> # Generate random samples for comparison >>> np.random.seed(42) >>> rand_x = np.random.uniform(0, 1, 20) >>> rand_y = np.random.uniform(0, 1, 20) >>> >>> # Plot comparison >>> fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4)) >>> ax1.scatter(lhs_x, lhs_y, alpha=0.7) >>> ax1.set_title('Latin Hypercube Sampling') >>> ax1.grid(True, alpha=0.3) >>> >>> ax2.scatter(rand_x, rand_y, alpha=0.7) >>> ax2.set_title('Random Sampling') >>> ax2.grid(True, alpha=0.3) >>> plt.show() Performance Characteristics --------------------------- - **Time Complexity**: O(n_samples * d) where d is the number of parameters - **Space Complexity**: O(n_samples * d) for storing the sample matrix - **Quality**: Provides better uniformity than random sampling for the same number of samples - **Scalability**: Efficient for high-dimensional spaces (tested up to 100+ dimensions) See Also -------- scipy.stats.qmc.LatinHypercube : The underlying LHS generator scipy.stats.qmc.scale : Function for scaling unit samples to custom bounds numpy.random.uniform : Alternative random sampling approach References ---------- .. [1] McKay, Michael D., Richard J. Beckman, and William J. Conover. "A comparison of three methods for selecting values of input variables in the analysis of output from a computer code." Technometrics 42.1 (2000): 55-61. .. [2] Stein, M. (1987). "Large sample properties of simulations using Latin hypercube sampling." Technometrics, 29(2), 143-151. """ keys = list(param_bounds.keys()) bounds = np.array([param_bounds[k] for k in keys]) # shape: (n_params, 2) sampler = qmc.LatinHypercube(d=len(keys), seed=seed) unit_samples = sampler.random(n=n_samples) scaled_samples = qmc.scale(unit_samples, bounds[:, 0], bounds[:, 1]) return [dict(zip(keys, sample)) for sample in scaled_samples]