Source code for invode.sampling

# sampling.p

import numpy as np
from scipy.stats import qmc


[docs]
def lhs_sample(param_bounds, n_samples, seed=None):
    """
    Generate Latin Hypercube Samples for parameter space exploration.
    
    This function creates a set of well-distributed parameter samples using Latin 
    Hypercube Sampling (LHS), a stratified sampling technique that ensures good 
    coverage of the parameter space. LHS divides each parameter dimension into 
    equally probable intervals and samples exactly once from each interval, 
    providing better space-filling properties than random sampling.
    
    Latin Hypercube Sampling is particularly effective for:
    
    - High-dimensional parameter spaces where uniform coverage is important
    - Expensive function evaluations where sample efficiency matters
    - Situations requiring reproducible sampling with controlled randomness
    - Initial exploration phases of optimization algorithms
    
    Parameters
    ----------
    param_bounds : dict
        Dictionary mapping parameter names to their bounds. Each key should be
        a string parameter name, and each value should be a tuple (min_val, max_val)
        defining the lower and upper bounds for that parameter.
        
        Example: ``{'k1': (0.1, 10.0), 'k2': (0.01, 1.0), 'alpha': (-1, 1)}``
        
    n_samples : int
        Number of parameter samples to generate. Must be a positive integer.
        Each sample will contain values for all parameters specified in param_bounds.
        
    seed : int, optional
        Random seed for reproducible sampling. If None, the sampling will be
        non-deterministic. Using the same seed with identical inputs guarantees
        identical sample sets, which is useful for debugging and reproducible
        research. Default is None.
    
    Returns
    -------
    list of dict
        A list containing n_samples dictionaries, where each dictionary represents
        one parameter sample. Each dictionary has the same keys as param_bounds,
        with values sampled from the corresponding parameter ranges using LHS.
        
        The returned samples have the following properties:
        
        - Each parameter dimension is divided into n_samples equally probable strata
        - Exactly one sample is drawn from each stratum in each dimension
        - Samples are randomly permuted to avoid correlation between dimensions
        - All parameter values are within their specified bounds
    
    Raises
    ------
    ValueError
        If n_samples is not a positive integer, or if any parameter bounds are
        invalid (e.g., min_val >= max_val).
    TypeError
        If param_bounds is not a dictionary or contains non-numeric bounds.
    
    Notes
    -----
    The function uses SciPy's quasi-Monte Carlo (qmc) module for LHS generation,
    which provides high-quality space-filling sequences. The sampling process
    involves three steps:
    
    1. Generate unit hypercube samples using LHS in [0,1]^d
    2. Scale samples to the specified parameter bounds
    3. Convert arrays back to parameter dictionaries
    
    The Latin Hypercube design ensures that:
    
    - The marginal distribution of each parameter is uniform over its bounds
    - No two samples share the same stratum in any single dimension
    - The samples collectively provide good coverage of the parameter space
    - Correlation between different parameter dimensions is minimized
    
    Examples
    --------
    Basic usage with two parameters:
    
    >>> bounds = {'rate': (0.1, 1.0), 'decay': (0.01, 0.1)}
    >>> samples = lhs_sample(bounds, n_samples=5, seed=42)
    >>> len(samples)
    5
    >>> samples[0].keys()
    dict_keys(['rate', 'decay'])
    >>> all(0.1 <= s['rate'] <= 1.0 for s in samples)
    True
    
    High-dimensional parameter space:
    
    >>> param_bounds = {
    ...     'k1': (0.1, 10.0),
    ...     'k2': (0.01, 1.0), 
    ...     'k3': (-5.0, 5.0),
    ...     'alpha': (0.0, 1.0),
    ...     'beta': (1e-6, 1e-3)
    ... }
    >>> samples = lhs_sample(param_bounds, n_samples=100, seed=123)
    >>> print(f"Generated {len(samples)} samples in {len(param_bounds)}D space")
    Generated 100 samples in 5D space
    
    Reproducible sampling for debugging:
    
    >>> # Same seed produces identical samples
    >>> samples1 = lhs_sample({'x': (0, 1)}, n_samples=3, seed=42)
    >>> samples2 = lhs_sample({'x': (0, 1)}, n_samples=3, seed=42)
    >>> samples1 == samples2
    True
    
    Integration with optimization loops:
    
    >>> def objective_function(params):
    ...     return (params['x'] - 0.5)**2 + (params['y'] - 0.3)**2
    >>> 
    >>> bounds = {'x': (0, 1), 'y': (0, 1)}
    >>> candidates = lhs_sample(bounds, n_samples=50, seed=42)
    >>> errors = [objective_function(params) for params in candidates]
    >>> best_idx = np.argmin(errors)
    >>> best_params = candidates[best_idx]
    >>> print(f"Best parameters: {best_params}")
    
    Comparison with Random Sampling
    -------------------------------
    LHS provides better space coverage than pure random sampling:
    
    >>> import matplotlib.pyplot as plt
    >>> import numpy as np
    >>> 
    >>> # Generate LHS samples
    >>> lhs_samples = lhs_sample({'x': (0, 1), 'y': (0, 1)}, n_samples=20, seed=42)
    >>> lhs_x = [s['x'] for s in lhs_samples]
    >>> lhs_y = [s['y'] for s in lhs_samples]
    >>> 
    >>> # Generate random samples for comparison
    >>> np.random.seed(42)
    >>> rand_x = np.random.uniform(0, 1, 20)
    >>> rand_y = np.random.uniform(0, 1, 20)
    >>> 
    >>> # Plot comparison
    >>> fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    >>> ax1.scatter(lhs_x, lhs_y, alpha=0.7)
    >>> ax1.set_title('Latin Hypercube Sampling')
    >>> ax1.grid(True, alpha=0.3)
    >>> 
    >>> ax2.scatter(rand_x, rand_y, alpha=0.7)
    >>> ax2.set_title('Random Sampling')  
    >>> ax2.grid(True, alpha=0.3)
    >>> plt.show()
    
    Performance Characteristics
    ---------------------------
    - **Time Complexity**: O(n_samples * d) where d is the number of parameters
    - **Space Complexity**: O(n_samples * d) for storing the sample matrix
    - **Quality**: Provides better uniformity than random sampling for the same
      number of samples
    - **Scalability**: Efficient for high-dimensional spaces (tested up to 100+ dimensions)
    
    See Also
    --------
    scipy.stats.qmc.LatinHypercube : The underlying LHS generator
    scipy.stats.qmc.scale : Function for scaling unit samples to custom bounds
    numpy.random.uniform : Alternative random sampling approach
    
    References
    ----------
    .. [1] McKay, Michael D., Richard J. Beckman, and William J. Conover. "A comparison of three
            methods for selecting values of input variables in the analysis of output from a computer code." Technometrics 42.1 (2000): 55-61.
    .. [2] Stein, M. (1987). "Large sample properties of simulations using Latin 
           hypercube sampling." Technometrics, 29(2), 143-151.
    """
        
    keys = list(param_bounds.keys())
    bounds = np.array([param_bounds[k] for k in keys])  # shape: (n_params, 2)

    sampler = qmc.LatinHypercube(d=len(keys), seed=seed)
    unit_samples = sampler.random(n=n_samples)
    scaled_samples = qmc.scale(unit_samples, bounds[:, 0], bounds[:, 1])
    return [dict(zip(keys, sample)) for sample in scaled_samples]