Patterns
BasePattern
- class a2pm.patterns.BasePattern(features=None, probability=0.5, momentum=0.99, seed=None)
Bases: sklearn.base.BaseEstimator
Base Perturbation Pattern.
A pattern analyzes specific feature subsets to fully or partially adapt itself, and then create valid and coherent perturbations in new data. This base class cannot be used directly.
An inheriting class must implement the fit, partial_fit and transform methods, according to the following signatures:
fit(self, X, y=None) -> self
partial_fit(self, X, y=None) -> self
transform(self, X) -> numpy array
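As a rough illustration of these three signatures, the sketch below defines a stand-alone class. The name ConstantShiftPattern, the scale_ attribute and the perturbation rule are hypothetical, and the class deliberately does not inherit from BasePattern, so it shows only the method contract and not how the library adapts or perturbs data.

```python
import numpy as np


class ConstantShiftPattern:
    """Hypothetical stand-alone pattern; it does not inherit from BasePattern."""

    def fit(self, X, y=None):
        # Fully adapt: record the per-feature standard deviation of X.
        X = np.asarray(X, dtype=float)
        self.scale_ = X.std(axis=0)
        return self

    def partial_fit(self, X, y=None):
        # Partially adapt: average the stored scale with the new one.
        X = np.asarray(X, dtype=float)
        new_scale = X.std(axis=0)
        if hasattr(self, "scale_"):
            self.scale_ = 0.5 * (self.scale_ + new_scale)
        else:
            self.scale_ = new_scale
        return self

    def transform(self, X):
        # Return a perturbed copy with the same shape as the input.
        X = np.asarray(X, dtype=float)
        return X + 0.1 * self.scale_


X = np.array([[0.0, 10.0], [2.0, 30.0]])
pattern = ConstantShiftPattern().fit(X)
X_perturbed = pattern.transform(X)
```

Both fit and partial_fit return the instance itself, which is what allows calls to be chained as in the last lines.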
- Parameters
features (int, array-like or None) – Index or array-like of indices of features whose values are to be analyzed and perturbed.
Set to None to use all features.
probability (float, in the (0.0, 1.0] interval) – Probability of applying the pattern in transform.
Set to 1 to always apply the pattern.
momentum (float, in the [0.0, 1.0] interval) – Momentum of the partial_fit updates.
Set to 1 to remain fully adapted to the initial data, without updates.
Set to 0 to always fully adapt to new data, as in fit.
seed (int, None or a generator) – Seed for reproducible random number generation.
Set to None to disable reproducibility, or to a generator to use it unaltered.
- Variables
generator (numpy generator object) – The random number generator to be used by an inheriting class.
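The accepted seed values (an int, None, or a generator) match what numpy.random.default_rng accepts. The sketch below assumes that a generator could be built this way; whether the library does exactly this is an assumption, and make_generator is a hypothetical helper name.

```python
import numpy as np


def make_generator(seed=None):
    # default_rng accepts None, an int, or an existing Generator;
    # an existing Generator is returned as the same object.
    return np.random.default_rng(seed)


# The same integer seed reproduces the same draws.
gen_a = make_generator(42)
gen_b = make_generator(42)
same_draws = gen_a.random(3).tolist() == gen_b.random(3).tolist()

# Passing a generator reuses it unaltered.
shared = np.random.default_rng(7)
reused = make_generator(shared)
```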
- to_apply() -> bool
Checks if the pattern is to be applied, according to the probability.
- Returns
True if the pattern is to be applied; False otherwise.
- Return type
bool
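One plausible way to implement such a check is to compare a uniform draw against the probability. The following is a sketch under that assumption, written as a free function with hypothetical argument names rather than as the library's method.

```python
import numpy as np


def to_apply(generator, probability):
    # generator.random() draws from [0.0, 1.0), so probability=1.0 always passes.
    return bool(generator.random() < probability)


generator = np.random.default_rng(0)
always = all(to_apply(generator, 1.0) for _ in range(1000))
rate = sum(to_apply(generator, 0.5) for _ in range(10000)) / 10000
```

With probability set to 0.5, roughly half of the calls return True over many draws.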
- fit_transform(X, y=None) -> numpy.ndarray
Fully adapts the pattern to new data, and then applies it to create data perturbations.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input data.
y (ignored) – Parameter compatibility.
- Returns
X_perturbed – Perturbed data.
- Return type
numpy array of shape (n_samples, n_features)
- partial_fit_transform(X, y=None) -> numpy.ndarray
Partially adapts the pattern to new data, according to the momentum, and then applies it to create data perturbations.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input data.
y (ignored) – Parameter compatibility.
- Returns
X_perturbed – Perturbed data.
- Return type
numpy array of shape (n_samples, n_features)
- set_params(**params)
Sets the parameters.
- Parameters
**params (dict of ‘parameter name - value’ pairs) – New valid parameters for this pattern.
- Returns
This pattern instance.
- Return type
self
- set_momentum(momentum) -> None
Sets the momentum.
- Parameters
momentum (float, in the [0.0, 1.0] interval) – Momentum of the partial_fit updates.
Set to 1 to remain fully adapted to the initial data, without updates.
Set to 0 to always fully adapt to new data, as in fit.
- Raises
ValueError – If the parameters do not fulfill the constraints.
- set_probability(probability) -> None
Sets the probability.
- Parameters
probability (float, in the (0.0, 1.0] interval) – Probability of applying the pattern in transform.
Set to 1 to always apply the pattern.
- Raises
ValueError – If the parameters do not fulfill the constraints.
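The interval constraints on momentum and probability differ at the lower bound: momentum may be 0.0, while probability may not. A minimal sketch of checks that would raise ValueError in the documented cases is shown below; check_momentum and check_probability are hypothetical helpers, and the error messages are illustrative.

```python
def check_momentum(momentum):
    # Closed interval: both 0.0 and 1.0 are allowed.
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must be in the [0.0, 1.0] interval")
    return float(momentum)


def check_probability(probability):
    # Half-open interval: 0.0 is rejected, 1.0 is allowed.
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the (0.0, 1.0] interval")
    return float(probability)
```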
- set_features(features) -> None
Sets the features.
- Parameters
features (int, array-like or None) – Index or array-like of indices of features whose values are to be analyzed and perturbed.
Set to None to use all features.
- Raises
ValueError – If the parameters do not fulfill the constraints.
- set_seed(seed) -> None
Sets the seed for random number generation.
- Parameters
seed (int, None or a generator) – Seed for reproducible random number generation.
Set to None to disable reproducibility, or to a generator to use it unaltered.
- Raises
ValueError – If the parameters do not fulfill the constraints.
CombinationPattern
- class a2pm.patterns.CombinationPattern(features=None, locked_features=None, probability=0.5, momentum=0.99, seed=None)
Bases: a2pm.patterns.base_pattern.BasePattern
Combination Perturbation Pattern.
Perturbs features by replacing their values with other valid combinations. Intended use: categorical features (nominal and ordinal).
Partial updates of the valid combinations begin once the partial_fit or partial_fit_transform methods are called.
- Parameters
features (int, array-like or None) – Index or array-like of indices of features whose values are to be used in valid combinations.
Set to None to use all features.
locked_features (int, array-like or None) – Index or array-like of indices of features whose values are to be used in valid combinations, without being modified.
These locked feature indices must also be present in the general features parameter.
Set to None to not lock any feature.
probability (float, in the (0.0, 1.0] interval) – Probability of applying the pattern in transform.
Set to 1 to always apply the pattern.
momentum (float, in the [0.0, 1.0] interval) – Momentum of the partial_fit updates.
Set to 1 to remain fully adapted to the initial data, without updates.
Set to 0 to always fully adapt to new data, as in fit.
seed (int, None or a generator) – Seed for reproducible random number generation.
Set to None to disable reproducibility, or to a generator to use it unaltered.
- Variables
valid_cmbs (numpy array of combinations) – The valid combinations recorded by the feature analysis of this pattern. Only available after a call to fit or partial_fit.
generator (numpy generator object) – The random number generator used by this pattern.
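The idea of recording valid combinations and substituting them into new samples can be sketched with plain NumPy. This is only an illustration under simplifying assumptions: record_combinations and replace_combinations are hypothetical helpers, locked features and momentum are not modelled, and the library's actual selection rule may differ.

```python
import numpy as np


def record_combinations(X, features):
    # Keep each distinct row of the selected columns once.
    return np.unique(np.asarray(X)[:, features], axis=0)


def replace_combinations(X, features, valid_cmbs, generator):
    # Overwrite the selected columns of each sample with a randomly
    # chosen recorded combination; other columns are left untouched.
    X_perturbed = np.array(X, copy=True)
    picks = generator.integers(0, len(valid_cmbs), size=len(X_perturbed))
    X_perturbed[:, features] = valid_cmbs[picks]
    return X_perturbed


X = np.array([[0, 1, 5], [1, 0, 6], [0, 1, 7]])
features = [0, 1]
valid_cmbs = record_combinations(X, features)
X_perturbed = replace_combinations(X, features, valid_cmbs, np.random.default_rng(0))
```

Because replacements are drawn only from combinations seen in the data, a one-hot encoded pair such as the first two columns can never become an invalid pattern like (1, 1).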
- fit(X, y=None)
Fully adapts the pattern to new data.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input data.
y (ignored) – Parameter compatibility.
- Returns
This pattern instance.
- Return type
self
- partial_fit(X, y=None)
Partially adapts the pattern to new data.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input data.
y (ignored) – Parameter compatibility.
- Returns
This pattern instance.
- Return type
self
- transform(X) -> numpy.ndarray
Applies the pattern to create data perturbations.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input data.
- Returns
X_perturbed – Perturbed data.
- Return type
numpy array of shape (n_samples, n_features)
- set_params(**params)
Sets the parameters.
- Parameters
**params (dict of ‘parameter name - value’ pairs) – Valid parameters for this pattern.
- Returns
This pattern instance.
- Return type
self
- set_locked_features(locked_features) -> None
Sets the locked features.
- Parameters
locked_features (int, array-like or None) – Index or array-like of indices of features whose values are to be used in valid combinations, without being modified.
These locked feature indices must also be present in the general features parameter.
Set to None to not lock any feature.
- Raises
ValueError – If the parameters do not fulfill the constraints.
IntervalPattern
- class a2pm.patterns.IntervalPattern(features=None, integer_features=None, ratio=0.1, max_ratio=None, missing_value=None, probability=0.5, momentum=0.99, seed=None)
Bases: a2pm.patterns.base_pattern.BasePattern
Interval Perturbation Pattern.
Perturbs features by increasing or decreasing their values, according to a ratio of the valid interval of minimum and maximum values. Intended use: numerical features (continuous and discrete).
Partial updates of the valid interval begin once the partial_fit or partial_fit_transform methods are called.
- Parameters
features (int, array-like or None) – Index or array-like of indices of features whose values are to be increased or decreased.
Set to None to use all features.
integer_features (int, array-like or None) – Index or array-like of indices of features whose values are to be increased or decreased, without a fractional part.
These integer feature indices must also be present in the general features parameter.
Set to None to not impose integer values on any feature.
ratio (float, > 0.0) – Ratio of increase/decrease of the value of a feature, relative to its minimum and maximum values.
max_ratio (float or None, >= ratio) – Maximum ratio. If provided, a random value in the [ratio, max_ratio) interval will be used.
Set to None to always use the exact value of ratio.
missing_value (float or None) – Value to be considered as missing when found in a feature, preventing its perturbation.
Set to None to perturb all found values.
probability (float, in the (0.0, 1.0] interval) – Probability of applying the pattern in transform.
Set to 1 to always apply the pattern.
momentum (float, in the [0.0, 1.0] interval) – Momentum of the partial_fit updates.
Set to 1 to remain fully adapted to the initial data, without updates.
Set to 0 to always fully adapt to new data, as in fit.
seed (int, None or a generator) – Seed for reproducible random number generation.
Set to None to disable reproducibility, or to a generator to use it unaltered.
- Variables
moving_mins (numpy array of numbers) – The minimum values recorded by the feature analysis of this pattern. Only available after a call to fit or partial_fit.
moving_maxs (numpy array of numbers) – The maximum values recorded by the feature analysis of this pattern. Only available after a call to fit or partial_fit.
generator (numpy generator object) – The random number generator used by this pattern.
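The interplay of ratio, integer_features and missing_value can be illustrated with a small NumPy sketch. It assumes a step of ratio times the recorded range, a random direction per value, and clipping to the recorded interval; perturb_interval is a hypothetical helper and the library's exact arithmetic may differ.

```python
import numpy as np


def perturb_interval(X, mins, maxs, ratio, generator,
                     integer_features=(), missing_value=None):
    X = np.asarray(X, dtype=float)
    # Step size: a fraction of each feature's recorded range.
    step = ratio * (maxs - mins)
    # Randomly increase (+1) or decrease (-1) each value.
    sign = generator.choice([-1.0, 1.0], size=X.shape)
    # Keep perturbed values inside the recorded interval.
    X_perturbed = np.clip(X + sign * step, mins, maxs)
    # Integer features keep no fractional part.
    cols = list(integer_features)
    if cols:
        X_perturbed[:, cols] = np.round(X_perturbed[:, cols])
    # Values flagged as missing are restored unchanged.
    if missing_value is not None:
        mask = X == missing_value
        X_perturbed[mask] = X[mask]
    return X_perturbed


mins = np.array([0.0, 10.0])
maxs = np.array([10.0, 30.0])
ratio = 0.1
X = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, -1.0]])
X_perturbed = perturb_interval(
    X, mins, maxs, ratio, np.random.default_rng(0),
    integer_features=[1], missing_value=-1.0,
)
```

In this example the value -1.0 in the last row is treated as missing and is passed through untouched, while every other value moves by at most 10% of its feature's range.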
- fit(X, y=None)
Fully adapts the pattern to new data.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input data.
y (ignored) – Parameter compatibility.
- Returns
This pattern instance.
- Return type
self
- partial_fit(X, y=None)
Partially adapts the pattern to new data.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input data.
y (ignored) – Parameter compatibility.
- Returns
This pattern instance.
- Return type
self
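A momentum-weighted blend is one update rule consistent with the documented endpoints (momentum 1 keeps the initial adaptation, momentum 0 fully adopts the new data). The sketch below assumes that rule for the moving minimum and maximum values; update_moving is a hypothetical helper, and the library's actual formula is not confirmed here.

```python
import numpy as np


def update_moving(previous, observed, momentum):
    # momentum=1.0 keeps the previous value; momentum=0.0 adopts the new one.
    return momentum * previous + (1.0 - momentum) * observed


moving_mins = np.array([0.0, 10.0])
moving_maxs = np.array([10.0, 30.0])
X_new = np.array([[2.0, 5.0], [20.0, 25.0]])

new_mins = update_moving(moving_mins, X_new.min(axis=0), momentum=0.5)
new_maxs = update_moving(moving_maxs, X_new.max(axis=0), momentum=0.5)
```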
- transform(X) -> numpy.ndarray
Applies the pattern to create data perturbations.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input data.
- Returns
X_perturbed – Perturbed data.
- Return type
numpy array of shape (n_samples, n_features)
- set_params(**params)
Sets the parameters.
- Parameters
**params (dict of ‘parameter name - value’ pairs) – Valid parameters for this pattern.
- Returns
This pattern instance.
- Return type
self
- set_missing_value(missing_value) -> None
Sets the missing value.
- Parameters
missing_value (float or None) – Value to be considered as missing when found in a feature, preventing its perturbation.
Set to None to perturb all found values.
- Raises
ValueError – If the parameters do not fulfill the constraints.
- set_ratio(ratio, max_ratio) -> None
Sets the ratio.
- Parameters
ratio (float, > 0.0) – Ratio of increase/decrease of the value of a feature, relative to its minimum and maximum values.
max_ratio (float or None, >= ratio) – Maximum ratio. If provided, a random value in the [ratio, max_ratio) interval will be used.
Set to None to always use the exact value of ratio.
- Raises
ValueError – If the parameters do not fulfill the constraints.
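The relationship between ratio and max_ratio can be sketched as follows, assuming a uniform draw over the half-open interval. draw_ratio is a hypothetical helper that combines the documented constraints with the sampling step, and the error messages are illustrative.

```python
import numpy as np


def draw_ratio(ratio, max_ratio, generator):
    if ratio <= 0.0:
        raise ValueError("ratio must be > 0.0")
    if max_ratio is None:
        # No maximum: always use the exact ratio.
        return ratio
    if max_ratio < ratio:
        raise ValueError("max_ratio must be >= ratio")
    # uniform samples from the half-open interval [low, high).
    return float(generator.uniform(ratio, max_ratio))


generator = np.random.default_rng(0)
fixed = draw_ratio(0.1, None, generator)
draws = [draw_ratio(0.1, 0.3, generator) for _ in range(100)]
```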
- set_integer_features(integer_features) -> None
Sets the integer features.
- Parameters
integer_features (int, array-like or None) – Index or array-like of indices of features whose values are to be increased or decreased, without a fractional part.
These integer feature indices must also be present in the general features parameter.
Set to None to not impose integer values on any feature.
- Raises
ValueError – If the parameters do not fulfill the constraints.