Expert-Level Feature Engineering: Advanced Techniques for High-Stakes Models

In this article, you’ll learn three expert-level feature engineering techniques (counterfactual features, domain-constrained representations, and causal-invariant features) for building robust and explainable models in high-stakes settings.

Topics we’ll cover include:

How to generate counterfactual sensitivity features for decision-boundary awareness.
How to train a constrained autoencoder that encodes a monotonic domain rule into its representation.
How to discover causal-invariant features that remain stable across environments.

Without further delay, let’s begin.

Introduction

Building machine learning models in high-stakes contexts like finance, healthcare, and critical infrastructure often demands robustness, explainability, and other domain-specific constraints. In these situations, it can be worth going beyond classic feature engineering and adopting advanced, expert-level techniques tailored to such settings.

This article presents three such techniques, explains how they work, and highlights their practical impact.

Counterfactual Feature Generation

Counterfactual feature generation comprises techniques that quantify how sensitive predictions are to decision boundaries by constructing hypothetical data points from minimal changes to the original features. The idea is simple: ask “how much must an original feature value change for the model’s prediction to cross a critical threshold?” These derived features improve interpretability (e.g., “how close is a patient to a diagnosis?” or “what is the minimum income increase required for loan approval?”), and they encode sensitivity directly in feature space, which can improve robustness.
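
For a linear classifier, the question has a closed-form answer. The short derivation below is a sketch for the linear-logit case only; the symbols s(x), w, b, and e_0 (the unit vector along feature 0) are notation introduced here for illustration.

```latex
\begin{aligned}
s(x) &= w^\top x + b \\
s(x + \delta\, e_0) &= s(x) + \delta\, w_0 = 0 \\
\delta &= -\frac{s(x)}{w_0}, \qquad w_0 \neq 0
\end{aligned}
```

The sign of delta gives the direction in which the feature must move, and its magnitude gives the distance to the boundary along that single feature.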

The Python example below creates a counterfactual sensitivity feature, cf_delta_feat0, measuring how much input feature feat_0 must change (holding all others fixed) to cross the classifier’s decision boundary. Because the classifier is fit on standardized inputs, the delta is expressed in standard deviations of feat_0. We’ll use NumPy, pandas, and scikit-learn.

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler

# Toy data and baseline linear classifier
X, y = make_classification(n_samples=500, n_features=5, random_state=42)
df = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(X.shape[1])])
df["target"] = y

scaler = StandardScaler()
X_scaled = scaler.fit_transform(df.drop(columns="target"))
clf = LogisticRegression().fit(X_scaled, y)

# Decision boundary parameters
weights = clf.coef_[0]
bias = clf.intercept_[0]

def counterfactual_delta_feat0(x, eps=1e-9):
    """
    Minimal change to feature 0, holding other features fixed,
    required to move the linear logit score to the decision boundary (0).
    For a linear model: delta = -score / w0
    """
    score = np.dot(weights, x) + bias
    w0 = weights[0]
    # Guard against a near-zero weight while preserving its sign
    denom = w0 if abs(w0) > eps else np.copysign(eps, w0)
    return -score / denom

# One delta per row, in standardized units of feat_0
df["cf_delta_feat0"] = [counterfactual_delta_feat0(x) for x in X_scaled]
df.head()
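
The closed form above only holds for a linear score. For a non-linear model, one possible approach is a one-dimensional search along the chosen feature. The sketch below uses bisection and assumes the score changes sign somewhere within a search window; the function names, the `max_delta` window, and the toy score functions are hypothetical and introduced only for illustration.

```python
import numpy as np

def shifted_score(score_fn, x, feature_idx, delta):
    """Score of a copy of x with one feature shifted by delta."""
    x_cf = np.array(x, dtype=float)
    x_cf[feature_idx] += delta
    return score_fn(x_cf)

def counterfactual_delta(score_fn, x, feature_idx, max_delta=10.0, tol=1e-8):
    """Bisection search for a shift of one feature that brings the score to 0.

    Looks in both directions within [-max_delta, max_delta] and returns the
    smaller-magnitude crossing it brackets, or np.nan if neither end of the
    window flips the sign of the score.
    """
    s0 = shifted_score(score_fn, x, feature_idx, 0.0)
    if s0 == 0.0:
        return 0.0
    candidates = []
    for direction in (1.0, -1.0):
        lo, hi = 0.0, direction * max_delta
        if np.sign(shifted_score(score_fn, x, feature_idx, hi)) == np.sign(s0):
            continue  # no sign change bracketed in this direction
        while abs(hi - lo) > tol:
            mid = 0.5 * (lo + hi)
            if np.sign(shifted_score(score_fn, x, feature_idx, mid)) == np.sign(s0):
                lo = mid  # still on the original side of the boundary
            else:
                hi = mid  # crossed the boundary
        candidates.append(0.5 * (lo + hi))
    return min(candidates, key=abs) if candidates else np.nan

# Toy linear score: the closed form -score / w0 gives 0.25 for this point
w, b = np.array([2.0, -1.0]), 0.5
linear_score = lambda x: float(w @ x + b)
delta_linear = counterfactual_delta(linear_score, [1.0, 3.0], feature_idx=0)

# Toy non-linear score with no simple closed form in general
nonlinear_score = lambda x: float(np.tanh(x[0]) + 0.5 * x[1] ** 2 - 1.0)
delta_nonlinear = counterfactual_delta(nonlinear_score, [0.0, 1.0], feature_idx=0)

print(delta_linear, delta_nonlinear)
```

Bisection finds a crossing inside the bracket, not necessarily the nearest one if the score is non-monotone in that feature, so the result is best read as an approximation.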

Domain-Constrained Representation Learning (Constrained Autoencoders)

Autoencoders are widely used for unsupervised representation learning. We can adapt them for domain-constrained representation learning: learn a compressed representation (latent features) while enforcing explicit domain rules (e.g., safety margins or monotonicity laws). Unlike unconstrained latent factors, domain-constrained representations are trained to respect physical, ethical, or regulatory constraints.

Below, we train an autoencoder that learns three latent features and reconstructs inputs while softly enforcing a monotonic rule: higher values of feat_0 should not decrease the likelihood of the positive label. We add a simple supervised predictor head and penalize violations via a finite-difference monotonicity loss. The implementation uses PyTorch.

import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import train_test_split

# Supervised split using the earlier DataFrame `df`
# (df also holds cf_delta_feat0 from the previous section, so there are 6 input columns)
X_train, X_val, y_train, y_val = train_test_split(
    df.drop(columns="target").values, df["target"].values, test_size=0.2, random_state=42
)

X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)

torch.manual_seed(42)

class ConstrainedAutoencoder(nn.Module):
    def __init__(self, input_dim, latent_dim=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 8), nn.ReLU(),
            nn.Linear(8, latent_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 8), nn.ReLU(),
            nn.Linear(8, input_dim)
        )
        # Small predictor head on top of the latent code (logit output)
        self.predictor = nn.Linear(latent_dim, 1)

    def forward(self, x):
        z = self.encoder(x)
        recon = self.decoder(z)
        logit = self.predictor(z)
        return recon, z, logit

model = ConstrainedAutoencoder(input_dim=X_train.shape[1])
optimizer = optim.Adam(model.parameters(), lr=1e-3)
recon_loss_fn = nn.MSELoss()
pred_loss_fn = nn.BCEWithLogitsLoss()

epsilon = 1e-2  # finite-difference step for monotonicity on feat_0
for epoch in range(50):
    model.train()
    optimizer.zero_grad()

    recon, z, logit = model(X_train)
    # Reconstruction + supervised prediction loss
    loss_recon = recon_loss_fn(recon, X_train)
    loss_pred = pred_loss_fn(logit, y_train)

    # Monotonicity penalty: logit(x + epsilon * e0) - logit(x) should be >= 0
    X_plus = X_train.clone()
    X_plus[:, 0] = X_plus[:, 0] + epsilon
    _, _, logit_plus = model(X_plus)

    mono_violation = torch.relu(logit - logit_plus)  # positive where the slope is negative
    loss_mono = mono_violation.mean()

    loss = loss_recon + 0.5 * loss_pred + 0.1 * loss_mono
    loss.backward()
    optimizer.step()

# Latent features were trained under the soft monotonic constraint
with torch.no_grad():
    _, latent_feats, _ = model(X_train)
latent_feats[:5]
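
Because the monotonic rule is enforced only softly, it can be worth measuring how often it is still violated after training. The sketch below is one way to do that with the same finite-difference idea. The helper `monotonicity_violation_rate` and the two stand-in scoring functions are hypothetical, and NumPy is used so the example runs on its own.

```python
import numpy as np

def monotonicity_violation_rate(predict_fn, X, feature_idx, step=1e-2, tol=1e-9):
    """Fraction of rows where raising one feature by `step` lowers the prediction."""
    X = np.asarray(X, dtype=float)
    X_plus = X.copy()
    X_plus[:, feature_idx] += step
    drop = predict_fn(X) - predict_fn(X_plus)  # > 0 means the output decreased
    return float(np.mean(drop > tol))

# Hypothetical stand-ins for a trained model's logit output
def increasing_score(X):
    return 1.5 * X[:, 0] - 0.3 * X[:, 1]  # always increasing in feature 0

def non_monotone_score(X):
    return np.sin(3.0 * X[:, 0])  # rises and falls with feature 0

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))

rate_increasing = monotonicity_violation_rate(increasing_score, X_demo, feature_idx=0)
rate_non_monotone = monotonicity_violation_rate(non_monotone_score, X_demo, feature_idx=0)
print(rate_increasing, rate_non_monotone)
```

To apply the same check to the autoencoder, `predict_fn` would wrap the model's logit output, ideally on held-out rows such as X_val rather than on the training data.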

Causal-Invariant Features

Causal-invariant features are variables whose relationship to the outcome remains stable across different contexts or environments. By targeting causal signals rather than spurious correlations, models generalize better to out-of-distribution settings. One practical route is to penalize differences in risk gradients across environments so the model cannot lean on environment-specific shortcuts.
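
For two environments, one way to write such an objective is sketched below. The notation is introduced here for illustration: R_e is the empirical risk in environment e, n_e is its sample count, and lambda is an assumed penalty weight.

```latex
\begin{aligned}
R_e(w) &= \frac{1}{n_e} \sum_{i=1}^{n_e} \left( x_i^{(e)\top} w - y_i^{(e)} \right)^2 \\
\mathcal{L}(w) &= R_1(w) + R_2(w)
  + \lambda \left\lVert \nabla_w R_1(w) - \nabla_w R_2(w) \right\rVert_2^2
\end{aligned}
```

The first two terms fit the data in both environments, while the last term is small only when a change in w affects both risks in the same way.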

The example below simulates two environments. Only the first feature is truly causal; the second becomes spuriously correlated with the label in environment 1. We train a shared linear model across environments while penalizing gradient mismatch, encouraging reliance on invariant (causal) structure.

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(42)
np.random.seed(42)

# Two environments with a spurious signal in env1
n = 300
X_env1 = np.random.randn(n, 2)
X_env2 = np.random.randn(n, 2)

# True causal relation: y depends only on X[:, 0]
y_env1 = (X_env1[:, 0] + 0.1 * np.random.randn(n) > 0).astype(int)
y_env2 = (X_env2[:, 0] + 0.1 * np.random.randn(n) > 0).astype(int)

# Inject spurious correlation in env1 via feature 1
X_env1[:, 1] = y_env1 + 0.1 * np.random.randn(n)

X1, y1 = torch.tensor(X_env1, dtype=torch.float32), torch.tensor(y_env1, dtype=torch.float32)
X2, y2 = torch.tensor(X_env2, dtype=torch.float32), torch.tensor(y_env2, dtype=torch.float32)

class LinearModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.randn(2, 1))

    def forward(self, x):
        return x @ self.w

model = LinearModel()
optimizer = optim.Adam(model.parameters(), lr=1e-2)

def env_risk(x, y, w):
    # Mean squared error between the linear output and the 0/1 label
    logits = x @ w
    return torch.mean((logits.squeeze() - y) ** 2)

for epoch in range(2000):
    optimizer.zero_grad()
    risk1 = env_risk(X1, y1, model.w)
    risk2 = env_risk(X2, y2, model.w)

    # Invariance penalty: align risk gradients across environments
    grad1 = torch.autograd.grad(risk1, model.w, create_graph=True)[0]
    grad2 = torch.autograd.grad(risk2, model.w, create_graph=True)[0]
    penalty = torch.sum((grad1 - grad2) ** 2)

    loss = (risk1 + risk2) + 100.0 * penalty
    loss.backward()
    optimizer.step()

print("Learned weights:", model.w.detach().numpy().ravel())
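
To see what the gradient-mismatch penalty rewards, it can help to evaluate it directly for a few candidate weight vectors. The sketch below does this in NumPy using the closed-form gradient of the squared-error risk. The helper names, the larger sample size, and the two candidate weight vectors are assumptions made for illustration, not part of the training loop above.

```python
import numpy as np

def risk_gradient(X, y, w):
    """Gradient of mean((X @ w - y)**2) with respect to w."""
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y)

def invariance_penalty(envs, w):
    """Squared L2 distance between the risk gradients of two environments."""
    (X1, y1), (X2, y2) = envs
    diff = risk_gradient(X1, y1, w) - risk_gradient(X2, y2, w)
    return float(diff @ diff)

def make_env(rng, n, spurious):
    """Same data-generating setup as the example: y depends only on feature 0."""
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + 0.1 * rng.normal(size=n) > 0).astype(float)
    if spurious:
        X[:, 1] = y + 0.1 * rng.normal(size=n)  # feature 1 leaks the label
    return X, y

rng = np.random.default_rng(42)
envs = [make_env(rng, 2000, spurious=True), make_env(rng, 2000, spurious=False)]

w_causal = np.array([1.0, 0.0])    # relies only on the causal feature
w_spurious = np.array([0.0, 1.0])  # relies only on the spurious feature

penalty_causal = invariance_penalty(envs, w_causal)
penalty_spurious = invariance_penalty(envs, w_spurious)
print("penalty (causal-only):  ", penalty_causal)
print("penalty (spurious-only):", penalty_spurious)
```

In this setup the spurious-only weights produce a much larger penalty, because feature 1 relates to the label very differently in the two environments. That gap is what pushes the optimizer toward the causal feature.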

Closing Remarks

We covered three advanced feature engineering techniques for high-stakes machine learning: counterfactual sensitivity features for decision-boundary awareness, domain-constrained autoencoders that encode expert rules, and causal-invariant features that promote stable generalization. Used judiciously, these tools can make models more robust, interpretable, and reliable where it matters most.
