dataclr.models

To maximize the performance of dataclr models during feature selection:

  • Single-threaded Execution: Ensure that models are configured to use a single thread (e.g., n_jobs=1) if they support parallel execution. This avoids contention between the parallelized feature selection process and the model’s internal parallelism.

  • Non-parallelized Solvers: For models like LogisticRegression in scikit-learn, use solvers that are not parallelized, such as solver='liblinear'.

These adjustments ensure the distributed feature selection algorithms in dataclr operate efficiently without interference.
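As a minimal sketch of the two recommendations above, the following configures scikit-learn estimators before they are wrapped for use with dataclr. The choice of estimators and the variable names are illustrative assumptions, and how the estimators are then passed to dataclr is not shown here.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Estimator with internal parallelism: pin it to a single thread so it
# does not compete with the parallelized feature selection process.
forest = RandomForestClassifier(n_estimators=50, n_jobs=1, random_state=0)

# LogisticRegression: pick a solver that is not parallelized.
logreg = LogisticRegression(solver="liblinear")

print(forest.get_params()["n_jobs"])   # thread setting used by the forest
print(logreg.get_params()["solver"])   # solver used by the logistic model
```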

class dataclr.models.BaseModel

Abstract base class for machine learning models.

This class defines the interface that models must adhere to for compatibility with feature selection methods. Subclasses must implement the fit and predict methods.

Attributes for Wrapper Method Compatibility:

  • feature_importances_: Attribute for feature importance scores (e.g., tree-based models).

  • coef_: Attribute for feature coefficients (e.g., linear models).

Subclasses must ensure that at least one of these attributes is implemented to support wrapper-based feature selection methods.
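The interface described above can be sketched with a small subclass that implements fit and predict and exposes coef_. This is an illustration under stated assumptions, not dataclr's own code: the BaseModel defined here is a local stand-in that mirrors the documented interface so the example runs without dataclr installed, and LeastSquaresModel is a hypothetical name.

```python
from abc import ABC, abstractmethod

import numpy as np
import pandas as pd


# Stand-in mirroring the documented interface (assumption). In real use,
# replace this class with: from dataclr.models import BaseModel
class BaseModel(ABC):
    @abstractmethod
    def fit(self, X_train: pd.DataFrame, y_train: pd.Series) -> None: ...

    @abstractmethod
    def predict(self, X_test: pd.DataFrame) -> np.ndarray: ...


class LeastSquaresModel(BaseModel):
    """Hypothetical linear model fitted by ordinary least squares."""

    def fit(self, X_train: pd.DataFrame, y_train: pd.Series) -> None:
        # Append a column of ones so the last weight acts as the intercept.
        features = X_train.to_numpy(dtype=float)
        design = np.column_stack([features, np.ones(len(features))])
        weights, *_ = np.linalg.lstsq(
            design, y_train.to_numpy(dtype=float), rcond=None
        )
        # coef_ is the attribute wrapper methods read for linear models.
        self.coef_ = weights[:-1]
        self.intercept_ = weights[-1]

    def predict(self, X_test: pd.DataFrame) -> np.ndarray:
        return X_test.to_numpy(dtype=float) @ self.coef_ + self.intercept_


# Toy data generated from y = 2*a + 1*b.
X = pd.DataFrame({"a": [0.0, 1.0, 2.0, 3.0], "b": [1.0, 0.0, 1.0, 0.0]})
y = pd.Series([1.0, 2.0, 5.0, 6.0])

model = LeastSquaresModel()
model.fit(X, y)
predictions = model.predict(X)
```

A tree-based model would follow the same pattern but expose feature_importances_ instead of coef_.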

abstractmethod fit(X_train: DataFrame, y_train: Series) → None

Abstract method to train the model.

Parameters:
  • X_train (pd.DataFrame) – Feature matrix for training data.

  • y_train (pd.Series) – Target variable for training data.

Raises:

NotImplementedError – This method must be implemented in a subclass.

abstractmethod predict(X_test: DataFrame) → ndarray

Abstract method to generate predictions.

Parameters:
  • X_test (pd.DataFrame) – Feature matrix for testing data.

Returns:

Array of predictions.

Return type:

np.ndarray

Raises:

NotImplementedError – This method must be implemented in a subclass.