Model Selection
This example demonstrates how to perform online model selection using a multi-armed bandit approach on the Phishing dataset.
Models and Optimizers
We use the following models and components:
- LogisticRegression: Two logistic regression models, each with a different SGD optimizer:
  - log1 uses sgd1 (learning rate 0.0001)
  - log2 uses sgd2 (learning rate 0.00001)
- BanditClassifier: Combines multiple models and selects the best one online using a bandit policy.
- EpsilonGreedy: The bandit policy used to balance exploration and exploitation, with parameters for epsilon, decay, burn-in, and random seed.
We also use the following metric for evaluation:
- Accuracy
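To make the policy concrete, here is a minimal pure-Python sketch of how an epsilon-greedy policy with decay and burn-in picks an arm. This is an illustration only, not the beaver implementation; `pick_arm`, its signature, and the decay scheme are assumptions for the sketch.

```python
import random

def pick_arm(step, rewards, counts,
             epsilon=0.1, decay=0.001, burn_in=20, rng=random.Random(42)):
    """Pick which model (arm) to use at a given step.

    rewards[i] / counts[i] is the running mean reward of arm i.
    """
    n_arms = len(counts)
    if step < burn_in:                  # burn-in: try every arm in turn
        return step % n_arms
    # one common decay scheme: exploration rate shrinks over time
    eps = epsilon / (1 + decay * step)
    if rng.random() < eps:              # explore: pick a random arm
        return rng.randrange(n_arms)
    # exploit: arm with the best running mean reward (unseen arms count as 0)
    means = [rewards[i] / counts[i] if counts[i] else 0.0 for i in range(n_arms)]
    return max(range(n_arms), key=means.__getitem__)
```

During burn-in every model gets a turn; afterwards the exploration probability shrinks as more samples arrive, so the policy increasingly exploits the best-performing model.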
Preprocessing
- StandardScaler is used to standardize features for the dataset.
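For intuition, online standardization can be sketched in pure Python using Welford's algorithm to track a running mean and variance per feature. `RunningScaler` below is a hypothetical illustration, not the actual StandardScaler implementation:

```python
class RunningScaler:
    """Online standardization: maintains a running mean/variance per feature
    (Welford's algorithm) so each sample can be scaled as it streams in."""

    def __init__(self):
        self.n = 0
        self.mean = {}
        self.m2 = {}  # sum of squared deviations from the running mean

    def learn_one(self, x):
        self.n += 1
        for key, value in x.items():
            mean = self.mean.get(key, 0.0)
            delta = value - mean
            self.mean[key] = mean + delta / self.n
            self.m2[key] = self.m2.get(key, 0.0) + delta * (value - self.mean[key])

    def transform_one(self, x):
        out = {}
        for key, value in x.items():
            var = self.m2.get(key, 0.0) / self.n if self.n else 0.0
            std = var ** 0.5
            out[key] = (value - self.mean.get(key, 0.0)) / std if std else 0.0
        return out
```

Unlike a batch scaler, this never sees the full dataset: the mean and variance estimates simply improve as more samples arrive.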
Beaver File Structure
Let's see how we would write the beaver file:
Connector
We start by defining the connector, specifying the Kafka bootstrap servers and security protocol.
connector {
  bootstrap_servers = "localhost:39092"
  security_protocol = "plaintext"
  consumer_group = "model_selection_models"
  auto_offset_reset = "earliest"
}
Models and Optimizers
We define the optimizers, logistic regression models, bandit policy, and the bandit classifier:
optimizer <SGD> sgd1
  params:
    lr = 0.0001
optimizer <SGD> sgd2
  params:
    lr = 1e-05
algorithm <LogisticRegression> log1
  params:
    optimizer = sgd1
algorithm <LogisticRegression> log2
  params:
    optimizer = sgd2
algorithm <EpsilonGreedy> epsilon
  params:
    epsilon = 0.1,
    decay = 0.001,
    burn_in = 20,
    seed = 42
algorithm <BanditClassifier> bandit
  params:
    models = [log1, log2],
    metric = acc,
    policy = epsilon
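Conceptually, the bandit classifier wires these pieces together: at each step the policy picks one model, that model predicts and learns, and the prediction's correctness is fed back as the reward. The toy sketch below (with a hypothetical `MajorityModel` and `bandit_select`, in pure Python) illustrates the loop, not the real implementation:

```python
import random

class MajorityModel:
    """Trivial online classifier: predicts the most frequent label seen so far."""
    def __init__(self):
        self.counts = {}
    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else 0
    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1

def bandit_select(stream, models, epsilon=0.1, burn_in=20, seed=42):
    """Epsilon-greedy model selection over an online stream; returns accuracy."""
    rng = random.Random(seed)
    rewards = [0.0] * len(models)
    counts = [0] * len(models)
    correct = 0.0
    for t, (x, y) in enumerate(stream):
        if t < burn_in:                      # burn-in: cycle through models
            arm = t % len(models)
        elif rng.random() < epsilon:         # explore: random model
            arm = rng.randrange(len(models))
        else:                                # exploit: best mean reward so far
            means = [rewards[i] / counts[i] if counts[i] else 0.0
                     for i in range(len(models))]
            arm = max(range(len(models)), key=means.__getitem__)
        y_pred = models[arm].predict(x)
        reward = float(y_pred == y)          # reward = was the prediction right?
        rewards[arm] += reward
        counts[arm] += 1
        correct += reward
        models[arm].learn(x, y)              # only the chosen model learns
    return correct / len(stream)
```

Note that only the selected model learns from each sample, which is what makes bandit-based selection cheaper than training all candidates in parallel.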
Preprocessing
We define the preprocessor:
preprocessor <StandardScaler> scaler
Metric
We define the evaluation metric:
metric <Accuracy> acc
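Streaming metrics like Accuracy are updated one prediction at a time rather than computed over a fixed test set. A minimal, hypothetical version of such a metric:

```python
class RunningAccuracy:
    """Streaming accuracy: updated one (y_true, y_pred) pair at a time."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, y_true, y_pred):
        self.correct += int(y_true == y_pred)
        self.total += 1
        return self

    def get(self):
        return self.correct / self.total if self.total else 0.0
```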
Data
We define the data source and specify the target feature and preprocessor:
data Phishing {
  input_topic = "Phishing"
  features:
    target_feature = is_phishing
    preprocessors = scaler
}
Pipeline
Finally, we define the pipeline that brings everything together:
pipeline banditPipeline {
  output_topic = "banditPipeline"
  data = Phishing
  algorithm = bandit
  metrics = acc
}
And that's it! This setup automatically selects and adapts between models in an online fashion, optimizing performance on the Phishing dataset with a bandit-based selection strategy. Putting it all together, the complete beaver file reads:
connector {
  bootstrap_servers = "localhost:39092"
  security_protocol = "plaintext"
  consumer_group = "model_selection_models"
  auto_offset_reset = "earliest"
}
optimizer <SGD> sgd1
  params:
    lr = 0.0001
optimizer <SGD> sgd2
  params:
    lr = 1e-05
algorithm <LogisticRegression> log1
  params:
    optimizer = sgd1
algorithm <LogisticRegression> log2
  params:
    optimizer = sgd2
preprocessor <StandardScaler> scaler
metric <Accuracy> acc
algorithm <EpsilonGreedy> epsilon
  params:
    epsilon = 0.1,
    decay = 0.001,
    burn_in = 20,
    seed = 42
algorithm <BanditClassifier> bandit
  params:
    models = [log1, log2],
    metric = acc,
    policy = epsilon
data Phishing {
  input_topic = "Phishing"
  features:
    target_feature = is_phishing
    preprocessors = scaler
}
pipeline banditPipeline {
  output_topic = "banditPipeline"
  data = Phishing
  algorithm = bandit
  metrics = acc
}