Patterns

BasePattern

class a2pm.patterns.BasePattern(features=None, probability=0.5, momentum=0.99, seed=None)

Bases: sklearn.base.BaseEstimator

Base Perturbation Pattern.

A pattern analyzes specific feature subsets to fully or partially adapt itself, and then creates valid and coherent perturbations in new data. This base class cannot be used directly.

An inheriting class must implement the fit, partial_fit and transform methods, according to the following signatures:

fit(self, X, y=None) -> self

partial_fit(self, X, y=None) -> self

transform(self, X) -> numpy array
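As an illustration of these three signatures, the sketch below defines a deliberately trivial, hypothetical pattern. It is a standalone class that does not inherit from BasePattern, and plain Python lists stand in for numpy arrays so that it runs without any dependency. The name MaxFillSketch and its behavior are assumptions made only for this example.

```python
class MaxFillSketch:
    """Hypothetical pattern: overwrites each feature with the largest value seen so far."""

    def __init__(self):
        self.maxs_ = None  # per-feature maxima, set by fit or partial_fit

    def fit(self, X, y=None):
        # Full adaptation: forget previous data and analyze X from scratch.
        self.maxs_ = [max(column) for column in zip(*X)]
        return self

    def partial_fit(self, X, y=None):
        # Partial adaptation: merge the new batch into what is already known.
        if self.maxs_ is None:
            return self.fit(X)
        batch_maxs = [max(column) for column in zip(*X)]
        self.maxs_ = [max(old, new) for old, new in zip(self.maxs_, batch_maxs)]
        return self

    def transform(self, X):
        # One output row per input row, with the same number of features.
        return [list(self.maxs_) for _ in X]


pattern = MaxFillSketch()
perturbed = pattern.fit([[1, 5], [3, 2]]).transform([[0, 0], [1, 1]])
```

Because fit and partial_fit return the instance itself, calls can be chained, which is what makes a combined fit-then-transform step straightforward.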

Parameters
  • features (int, array-like or None) – Index or array-like of indices of features whose values are to be analyzed and perturbed.

    Set to None to use all features.

  • probability (float, in the (0.0, 1.0] interval) – Probability of applying the pattern in transform.

    Set to 1 to always apply the pattern.

  • momentum (float, in the [0.0, 1.0] interval) – Momentum of the partial_fit updates.

    Set to 1 to remain fully adapted to the initial data, without updates.

    Set to 0 to always fully adapt to new data, as in fit.

  • seed (int, None or a generator) – Seed for reproducible random number generation.

    Set to None to disable reproducibility, or to a generator to use it unaltered.
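The two momentum endpoints described above are consistent with an exponential-moving-average style update. The sketch below shows that reading for a single recorded statistic. The exact update rule used by the library is not stated in this reference, so the formula and the name blend are assumptions.

```python
def blend(old, new, momentum):
    """Mix a previously recorded value with the value seen in a new batch.

    momentum = 1.0 keeps the old value untouched;
    momentum = 0.0 replaces it entirely, as a full fit would.
    """
    return momentum * old + (1.0 - momentum) * new


# With the default momentum of 0.99, a new batch moves the value only slightly.
updated = blend(10.0, 20.0, 0.99)
```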

Variables

generator (numpy generator object) – The random number generator to be used by an inheriting class.

to_apply() -> bool

Checks if the pattern is to be applied, according to the probability.

Returns

True if the pattern is to be applied; False otherwise.

Return type

bool
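One common way to implement such a check is to compare a uniform random draw with the probability. The sketch below assumes that approach and uses the standard library random.Random in place of the numpy generator. It is not taken from the library source.

```python
import random


def to_apply(probability, generator):
    # generator.random() lies in [0.0, 1.0), so a probability of 1.0 always passes.
    return generator.random() < probability


rng = random.Random(0)
always = all(to_apply(1.0, rng) for _ in range(1000))
hits = sum(to_apply(0.5, rng) for _ in range(1000))  # roughly half of the draws
```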

fit_transform(X, y=None) -> numpy.ndarray

Fully adapts the pattern to new data, and then applies it to create data perturbations.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input data.

  • y (ignored) – Parameter compatibility.

Returns

X_perturbed – Perturbed data.

Return type

numpy array of shape (n_samples, n_features)

partial_fit_transform(X, y=None) -> numpy.ndarray

Partially adapts the pattern to new data, according to the momentum, and then applies it to create data perturbations.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input data.

  • y (ignored) – Parameter compatibility.

Returns

X_perturbed – Perturbed data.

Return type

numpy array of shape (n_samples, n_features)

set_params(**params)

Sets the parameters.

Parameters

**params (dict of ‘parameter name - value’ pairs) – New valid parameters for this pattern.

Returns

This pattern instance.

Return type

self

set_momentum(momentum) -> None

Sets the momentum.

Parameters

momentum (float, in the [0.0, 1.0] interval) – Momentum of the partial_fit updates.

Set to 1 to remain fully adapted to the initial data, without updates.

Set to 0 to always fully adapt to new data, as in fit.

Raises

ValueError – If the parameters do not fulfill the constraints.

set_probability(probability) -> None

Sets the probability.

Parameters

probability (float, in the (0.0, 1.0] interval) – Probability of applying the pattern in transform.

Set to 1 to always apply the pattern.

Raises

ValueError – If the parameters do not fulfill the constraints.
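The interval constraints for momentum and probability differ at the lower bound: a momentum of 0 is allowed, a probability of 0 is not. A minimal sketch of the corresponding checks follows. The helper names are hypothetical and the library's own validation may differ in detail.

```python
def check_momentum(momentum):
    # Closed interval [0.0, 1.0]: both endpoints are valid.
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must be in the [0.0, 1.0] interval")
    return float(momentum)


def check_probability(probability):
    # Half-open interval (0.0, 1.0]: zero is rejected, one is valid.
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the (0.0, 1.0] interval")
    return float(probability)
```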

set_features(features) -> None

Sets the features.

Parameters

features (int, array-like or None) – Index or array-like of indices of features whose values are to be analyzed and perturbed.

Set to None to use all features.

Raises

ValueError – If the parameters do not fulfill the constraints.

set_seed(seed) -> None

Sets the seed for random number generation.

Parameters

seed (int, None or a generator) – Seed for reproducible random number generation.

Set to None to disable reproducibility, or set to a generator to use it unaltered.

Raises

ValueError – If the parameters do not fulfill the constraints.
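The three accepted kinds of seed can be sketched as below. To stay dependency-free, this uses the standard library random.Random as a stand-in for the numpy generator, and the helper name make_generator is hypothetical. In numpy itself, numpy.random.default_rng accepts the same three kinds of input and returns an existing generator unaltered.

```python
import random


def make_generator(seed=None):
    # An existing generator is used as-is, so its state is shared with the caller.
    if isinstance(seed, random.Random):
        return seed
    # None gives a non-reproducible generator; an int gives a reproducible one.
    if seed is None or isinstance(seed, int):
        return random.Random(seed)
    raise ValueError("seed must be an int, None or a generator")


shared = random.Random(1)
same_object = make_generator(shared) is shared
first = make_generator(7).random()
second = make_generator(7).random()  # same seed, same first draw
```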

CombinationPattern

class a2pm.patterns.CombinationPattern(features=None, locked_features=None, probability=0.5, momentum=0.99, seed=None)

Bases: a2pm.patterns.base_pattern.BasePattern

Combination Perturbation Pattern.

Perturbs features by replacing their values with other valid combinations. Intended use: categorical features (nominal and ordinal).

Once the partial_fit or partial_fit_transform methods are called, the valid combinations are updated partially, according to the momentum, instead of being fully replaced.

Parameters
  • features (int, array-like or None) – Index or array-like of indices of features whose values are to be used in valid combinations.

    Set to None to use all features.

  • locked_features (int, array-like or None) – Index or array-like of indices of features whose values are to be used in valid combinations, without being modified.

    These locked feature indices must also be present in the general features parameter.

    Set to None to not lock any feature.

  • probability (float, in the (0.0, 1.0] interval) – Probability of applying the pattern in transform.

    Set to 1 to always apply the pattern.

  • momentum (float, in the [0.0, 1.0] interval) – Momentum of the partial_fit updates.

    Set to 1 to remain fully adapted to the initial data, without updates.

    Set to 0 to always fully adapt to new data, as in fit.

  • seed (int, None or a generator) – Seed for reproducible random number generation.

    Set to None to disable reproducibility, or to a generator to use it unaltered.

Variables
  • valid_cmbs (numpy array of combinations) – The valid combinations recorded by the feature analysis of this pattern. Only available after a call to fit or partial_fit.

  • generator (numpy generator object) – The random number generator used by this pattern.
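The idea of valid combinations and locked features can be sketched as follows, with plain lists standing in for numpy arrays. The helper names are hypothetical. Excluding the current combination from the candidates, and leaving a row unchanged when no alternative exists, are assumptions of this sketch rather than documented behavior.

```python
import random


def fit_combinations(X, features):
    # Record every distinct combination of the selected features (sorted for determinism).
    return sorted({tuple(row[i] for i in features) for row in X})


def perturb_row(row, features, locked, valid_cmbs, rng):
    current = tuple(row[i] for i in features)
    locked_pos = [features.index(i) for i in locked]
    # Keep only other combinations that agree with the row on every locked feature.
    candidates = [
        cmb for cmb in valid_cmbs
        if cmb != current and all(cmb[p] == current[p] for p in locked_pos)
    ]
    if not candidates:
        return list(row)  # no valid alternative: leave the row as it is
    choice = rng.choice(candidates)
    out = list(row)
    for pos, i in enumerate(features):
        out[i] = choice[pos]
    return out


X = [[0, "tcp", "http"], [1, "tcp", "ssh"], [2, "udp", "dns"]]
cmbs = fit_combinations(X, features=[1, 2])
rng = random.Random(0)
changed = perturb_row(X[0], [1, 2], [1], cmbs, rng)
unchanged = perturb_row(X[2], [1, 2], [1], cmbs, rng)
```

Here the protocol column (index 1) is locked, so the first row can only move to another service seen together with "tcp", while the "udp" row has no alternative and is returned untouched.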

fit(X, y=None)

Fully adapts the pattern to new data.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input data.

  • y (ignored) – Parameter compatibility.

Returns

This pattern instance.

Return type

self

partial_fit(X, y=None)

Partially adapts the pattern to new data.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input data.

  • y (ignored) – Parameter compatibility.

Returns

This pattern instance.

Return type

self

transform(X) -> numpy.ndarray

Applies the pattern to create data perturbations.

Parameters

X (array-like of shape (n_samples, n_features)) – Input data.

Returns

X_perturbed – Perturbed data.

Return type

numpy array of shape (n_samples, n_features)

set_params(**params)

Sets the parameters.

Parameters

**params (dict of ‘parameter name - value’ pairs) – Valid parameters for this pattern.

Returns

This pattern instance.

Return type

self

set_locked_features(locked_features) -> None

Sets the locked features.

Parameters

locked_features (int, array-like or None) – Index or array-like of indices of features whose values are to be used in valid combinations, without being modified.

These locked feature indices must also be present in the general features parameter.

Set to None to not lock any feature.

Raises

ValueError – If the parameters do not fulfill the constraints.
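The constraint that locked indices must also appear in the general features can be checked after normalizing both arguments to lists. The sketch below is one way to do that. The helper names and the n_features argument are hypothetical, and the library may perform this validation differently or at a different time.

```python
def normalize_indices(indices, n_features, none_means_all):
    # None means "all features" for features, and "no features" for locked_features.
    if indices is None:
        return list(range(n_features)) if none_means_all else []
    # A single int is treated as a one-element list of indices.
    if isinstance(indices, int):
        return [indices]
    return list(indices)


def check_locked(features, locked_features, n_features):
    feats = normalize_indices(features, n_features, none_means_all=True)
    locked = normalize_indices(locked_features, n_features, none_means_all=False)
    missing = [i for i in locked if i not in feats]
    if missing:
        raise ValueError(f"locked features {missing} are not in features")
    return feats, locked
```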

IntervalPattern

class a2pm.patterns.IntervalPattern(features=None, integer_features=None, ratio=0.1, max_ratio=None, missing_value=None, probability=0.5, momentum=0.99, seed=None)

Bases: a2pm.patterns.base_pattern.BasePattern

Interval Perturbation Pattern.

Perturbs features by increasing or decreasing their values, according to a ratio of the valid interval of minimum and maximum values. Intended use: numerical features (continuous and discrete).

Once the partial_fit or partial_fit_transform methods are called, the valid interval is updated partially, according to the momentum, instead of being fully replaced.

Parameters
  • features (int, array-like or None) – Index or array-like of indices of features whose values are to be increased or decreased.

    Set to None to use all features.

  • integer_features (int, array-like or None) – Index or array-like of indices of features whose values are to be increased or decreased, without a fractional part.

    These integer feature indices must also be present in the general features parameter.

    Set to None to not impose integer values on any feature.

  • ratio (float, > 0.0) – Ratio of increase/decrease of the value of a feature, relative to its minimum and maximum values.

  • max_ratio (float or None, >= ratio) – Maximum ratio. If provided, a random value in the [ratio, max_ratio) interval will be used.

    Set to None to always use the exact value of ratio.

  • missing_value (float or None) – Value to be considered as missing when found in a feature, preventing its perturbation.

    Set to None to perturb all found values.

  • probability (float, in the (0.0, 1.0] interval) – Probability of applying the pattern in transform.

    Set to 1 to always apply the pattern.

  • momentum (float, in the [0.0, 1.0] interval) – Momentum of the partial_fit updates.

    Set to 1 to remain fully adapted to the initial data, without updates.

    Set to 0 to always fully adapt to new data, as in fit.

  • seed (int, None or a generator) – Seed for reproducible random number generation.

    Set to None to disable reproducibility, or to a generator to use it unaltered.

Variables
  • moving_mins (numpy array of numbers) – The minimum values recorded by the feature analysis of this pattern. Only available after a call to fit or partial_fit.

  • moving_maxs (numpy array of numbers) – The maximum values recorded by the feature analysis of this pattern. Only available after a call to fit or partial_fit.

  • generator (numpy generator object) – The random number generator used by this pattern.
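A minimal sketch of an interval-based perturbation of a single value is shown below. The helper names are hypothetical. Clipping the result to the recorded interval and rounding integer features to the nearest integer are assumptions of this sketch, and the direction is passed explicitly here so that the example is deterministic.

```python
def fit_interval(X, feature):
    # Record the minimum and maximum values observed for one feature.
    values = [row[feature] for row in X]
    return min(values), max(values)


def perturb_value(value, low, high, ratio, direction,
                  integer=False, missing_value=None):
    # Values equal to the missing value marker are never perturbed.
    if missing_value is not None and value == missing_value:
        return value
    # Move by a fraction of the interval width; direction is +1 or -1.
    moved = value + direction * ratio * (high - low)
    # Assumption: keep the result inside the valid interval.
    moved = min(max(moved, low), high)
    return round(moved) if integer else moved


X = [[0.0], [10.0], [5.0]]
low, high = fit_interval(X, 0)
increased = perturb_value(5.0, low, high, 0.1, +1)
clipped = perturb_value(9.5, low, high, 0.1, +1)
```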

fit(X, y=None)

Fully adapts the pattern to new data.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input data.

  • y (ignored) – Parameter compatibility.

Returns

This pattern instance.

Return type

self

partial_fit(X, y=None)

Partially adapts the pattern to new data.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input data.

  • y (ignored) – Parameter compatibility.

Returns

This pattern instance.

Return type

self

transform(X) -> numpy.ndarray

Applies the pattern to create data perturbations.

Parameters

X (array-like of shape (n_samples, n_features)) – Input data.

Returns

X_perturbed – Perturbed data.

Return type

numpy array of shape (n_samples, n_features)

set_params(**params)

Sets the parameters.

Parameters

**params (dict of ‘parameter name - value’ pairs) – Valid parameters for this pattern.

Returns

This pattern instance.

Return type

self

set_missing_value(missing_value) -> None

Sets the missing value.

Parameters

missing_value (float or None) – Value to be considered as missing when found in a feature, preventing its perturbation.

Set to None to perturb all found values.

Raises

ValueError – If the parameters do not fulfill the constraints.

set_ratio(ratio, max_ratio) -> None

Sets the ratio.

Parameters
  • ratio (float, > 0.0) – Ratio of increase/decrease of the value of a feature, relative to its minimum and maximum values.

  • max_ratio (float or None, >= ratio) – Maximum ratio. If provided, a random value in the [ratio, max_ratio) interval will be used.

    Set to None to always use the exact value of ratio.

Raises

ValueError – If the parameters do not fulfill the constraints.
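The relationship between ratio and max_ratio can be sketched as a validation step followed by a draw. The helper names are hypothetical, a uniform draw is assumed because the reference does not name the distribution, and the standard library random.Random stands in for the numpy generator.

```python
import random


def check_ratio(ratio, max_ratio=None):
    if not ratio > 0.0:
        raise ValueError("ratio must be greater than 0.0")
    if max_ratio is not None and max_ratio < ratio:
        raise ValueError("max_ratio must be greater than or equal to ratio")


def draw_ratio(ratio, max_ratio, rng):
    # Without a maximum, the exact ratio is always used.
    if max_ratio is None:
        return ratio
    # rng.random() is in [0.0, 1.0), so the result is in [ratio, max_ratio),
    # up to floating-point rounding.
    return ratio + (max_ratio - ratio) * rng.random()


rng = random.Random(0)
check_ratio(0.25, 0.5)
draws = [draw_ratio(0.25, 0.5, rng) for _ in range(1000)]
```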

set_integer_features(integer_features) -> None

Sets the integer features.

Parameters

integer_features (int, array-like or None) – Index or array-like of indices of features whose values are to be increased or decreased, without a fractional part.

These integer feature indices must also be present in the general features parameter.

Set to None to not impose integer values on any feature.

Raises

ValueError – If the parameters do not fulfill the constraints.