dataclr.models
To maximize the performance of dataclr models during feature selection:
- Single-threaded Execution: Ensure that models are configured to use a single thread (e.g., n_jobs=1) if they support parallel execution. This avoids contention between the parallelized feature selection process and the model’s internal parallelism.
- Non-parallelized Solvers: For models like LogisticRegression in scikit-learn, use solvers that are not parallelized, such as solver='liblinear'.
These adjustments ensure the distributed feature selection algorithms in dataclr operate efficiently without interference.
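As a minimal sketch of these two settings, assuming scikit-learn is installed, the estimators below are constructed with single-threaded options. RandomForestClassifier is only an illustrative choice of an estimator that accepts n_jobs. How an estimator is then wrapped for use with dataclr is not covered in this section, so that step is left out.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Keep the estimator on one thread so it does not compete with
# the parallelized feature selection process.
forest = RandomForestClassifier(n_estimators=100, n_jobs=1, random_state=0)

# Choose the liblinear solver, as recommended above for LogisticRegression.
logreg = LogisticRegression(solver="liblinear")
```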
- class dataclr.models.BaseModel
Abstract base class for machine learning models.
This class defines the interface that models must adhere to for compatibility with feature selection methods. Subclasses must implement the fit and predict methods.
- Attributes for Wrapper Method Compatibility:
feature_importances_: Attribute for feature importance scores (e.g., tree-based models).
coef_: Attribute for feature coefficients (e.g., linear models).
Subclasses must ensure that at least one of these attributes is implemented to support wrapper-based feature selection methods.
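To illustrate why at least one of the two attributes is needed, the sketch below shows one way a wrapper-style selector could read per-feature scores from a fitted model. The helper importance_scores is hypothetical and is not part of dataclr; the way it aggregates a 2-D coef_ (summing absolute values over rows) is likewise an assumption made for the example.

```python
from types import SimpleNamespace

import numpy as np


def importance_scores(model) -> np.ndarray:
    """Return one non-negative score per feature from a fitted model (hypothetical helper)."""
    if hasattr(model, "feature_importances_"):
        return np.asarray(model.feature_importances_, dtype=float)
    if hasattr(model, "coef_"):
        # coef_ may be 1-D (single target) or 2-D (one row per class or target).
        # Summing absolute values over rows is one simple way to rank features.
        coef = np.atleast_2d(np.asarray(model.coef_, dtype=float))
        return np.abs(coef).sum(axis=0)
    raise AttributeError("model exposes neither feature_importances_ nor coef_")


# Stand-in objects that mimic fitted models.
tree_like = SimpleNamespace(feature_importances_=[0.7, 0.2, 0.1])
linear_like = SimpleNamespace(coef_=[[-1.5, 0.5, 0.0]])

tree_scores = importance_scores(tree_like)
linear_scores = importance_scores(linear_like)
```

A model that exposes neither attribute raises AttributeError here, which mirrors the requirement stated above.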
- abstractmethod fit(X_train: DataFrame, y_train: Series) → None
Abstract method to train the model.
- Parameters:
X_train (pd.DataFrame) – Feature matrix for training data.
y_train (pd.Series) – Target variable for training data.
- Raises:
NotImplementedError – This method must be implemented in a subclass.
- abstractmethod predict(X_test: DataFrame) → ndarray
Abstract method to generate predictions.
- Parameters:
X_test (pd.DataFrame) – Feature matrix for testing data.
- Returns:
Array of predictions.
- Return type:
np.ndarray
- Raises:
NotImplementedError – This method must be implemented in a subclass.
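The interface above can be sketched with a small subclass. LeastSquaresModel is a hypothetical example, not a class shipped with dataclr: it fits ordinary least squares with NumPy and exposes coef_ so that it meets the wrapper-compatibility requirement. The sketch assumes BaseModel can be subclassed without constructor arguments, and it falls back to a plain ABC so it still runs where dataclr is not installed.

```python
import numpy as np
import pandas as pd

try:
    from dataclr.models import BaseModel
except ImportError:
    # Fallback so the sketch runs without dataclr installed.
    from abc import ABC as BaseModel


class LeastSquaresModel(BaseModel):
    """Hypothetical example: ordinary least squares with an intercept."""

    def fit(self, X_train: pd.DataFrame, y_train: pd.Series) -> None:
        features = X_train.to_numpy(dtype=float)
        # Prepend a column of ones so the first fitted weight is the intercept.
        design = np.column_stack([np.ones(len(features)), features])
        solution, *_ = np.linalg.lstsq(
            design, y_train.to_numpy(dtype=float), rcond=None
        )
        self.intercept_ = solution[0]
        self.coef_ = solution[1:]  # one coefficient per feature column

    def predict(self, X_test: pd.DataFrame) -> np.ndarray:
        return X_test.to_numpy(dtype=float) @ self.coef_ + self.intercept_


# Toy data where y = 2 * a + 1 and column b carries no signal.
X = pd.DataFrame({"a": [0.0, 1.0, 2.0, 3.0], "b": [1.0, 0.0, 1.0, 0.0]})
y = pd.Series([1.0, 3.0, 5.0, 7.0])

model = LeastSquaresModel()
model.fit(X, y)
predictions = model.predict(X)
```

On this toy data the fitted coefficient for b is close to zero, which is the kind of signal a coefficient-based wrapper method can use to rank features.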