Releases: AdrianAntico/RetroFit
Feature Engineering Class
Feature engineering is now class-based. Users can choose between datatable, polars, and pandas for feature engineering operations.
The ML examples in the README currently reflect usage of the datatable version, with expanded feature engineering steps to show how the class is used.
V0.1.4
RetroFit class:
Added XGBoost and LightGBM. Scoring also accepts new data passed in by the user. Examples are in the README.
Added RetroFit class v1
Added the first of many planned versions of the RetroFit class for machine learning.
####################################
# Goals
####################################
Class Initialization
Model Initialization
Training
Grid Tuning
Scoring
Model Evaluation
Model Interpretation
####################################
# Functions
####################################
ML1_Single_Train()
ML1_Single_Score()
####################################
# Attributes
####################################
self.ModelArgs = ModelArgs
self.ModelArgsNames = [*self.ModelArgs]
self.Runs = len(self.ModelArgs)
self.DataSets = DataSets
self.DataSetsNames = [*self.DataSets]
self.ModelList = dict()
self.ModelListNames = []
self.FitList = dict()
self.FitListNames = []
self.EvaluationList = dict()
self.EvaluationListNames = []
self.InterpretationList = dict()
self.InterpretationListNames = []
self.CompareModelsList = dict()
self.CompareModelsListNames = []
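The `*Names` attributes above are built by unpacking dictionary keys into a list (`[*d]`). Below is a minimal sketch of that bookkeeping, not the RetroFit class itself: `RegistrySketch`, `register_model`, and the placeholder dictionaries (including the `nepochs` entry) are hypothetical names used only for illustration.

```python
class RegistrySketch:
    """Hypothetical stand-in that mirrors the attribute bookkeeping listed above."""

    def __init__(self, ModelArgs, DataSets):
        self.ModelArgs = ModelArgs
        # [*d] unpacks a dict's keys into a list, in insertion order
        self.ModelArgsNames = [*self.ModelArgs]
        self.Runs = len(self.ModelArgs)
        self.DataSets = DataSets
        self.DataSetsNames = [*self.DataSets]
        self.ModelList = dict()
        self.ModelListNames = []

    def register_model(self, name, model):
        # Hypothetical helper: keep the dict and its name list in sync
        self.ModelList[name] = model
        self.ModelListNames.append(name)


# Placeholder inputs; in the real workflow these come from
# ml.ML0_Parameters() and ml.ML0_GetModelData()
registry = RegistrySketch(
    ModelArgs={"Ftrl": {"nepochs": 1}},
    DataSets={"train_data": [], "validation_data": [], "test_data": []},
)
registry.register_model("Ftrl1", object())

print(registry.DataSetsNames[2])   # third data set key
print(registry.ModelListNames[0])  # first registered model name
```

Keeping a name list next to each dictionary is what makes positional lookups such as `DataSetsNames[2]` or `ModelListNames[0]` possible when calling the scoring method.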
####################################
# Example Usage
####################################
# Setup Environment
import timeit
import datatable as dt
from datatable import sort, f, by
import retrofit
from retrofit import FeatureEngineering as fe
from retrofit import MachineLearning as ml
# Load some data
# BenchmarkData.csv is located in the tests folder
Path = "./BenchmarkData.csv"
data = dt.fread(Path)
# Create partitioned data sets
Data = fe.FE2_AutoDataParition(
data=data,
ArgsList=None,
DateColumnName=None,
PartitionType='random',
Ratios=[0.7,0.2,0.1],
ByVariables=None,
Sort=False,
Processing='datatable',
InputFrame='datatable',
OutputFrame='datatable')
# Prepare modeling data sets
DataSets = ml.ML0_GetModelData(
Processing='Ftrl',
TrainData=Data['TrainData'],
ValidationData=Data['ValidationData'],
TestData=Data['TestData'],
ArgsList=None,
TargetColumnName='Leads',
NumericColumnNames=['XREGS1', 'XREGS2', 'XREGS3'],
CategoricalColumnNames=['MarketingSegments', 'MarketingSegments2', 'MarketingSegments3', 'Label'],
TextColumnNames=None,
WeightColumnName=None,
Threads=-1,
InputFrame='datatable')
# Get args list for algorithm and target type
ModelArgs = ml.ML0_Parameters(
Algorithms='Ftrl',
TargetType="Regression",
TrainMethod="Train")
# Initialize RetroFit (assumes the class is exposed by the MachineLearning module imported as ml)
x = ml.RetroFit(ModelArgs, DataSets)
# Train Model
x.ML1_Single_Train(Algorithm='Ftrl')
# Score data
x.ML1_Single_Score(DataName=x.DataSetsNames[2], ModelName=x.ModelListNames[0], Algorithm='Ftrl')
# Scoring data names
x.DataSets.keys()
# Check ModelArgs Dict
x.ModelArgs
# Check the names of data sets collected
x.DataSetsNames
# List of model names
x.ModelListNames
# List of fitted model names
x.FitListNames
# List of comparisons
x.CompareModelsListNames
V0.1.0
Enhanced FE2_AutoDataPartition() for Processing = 'datatable' and 'polars'
Added methods for xgboost and lightgbm for ML0_GetModelData()
Modified sorting and subsetting tasks for Processing = 'polars'
V0.0.9
Added polars processing to FE2_AutoDataPartition(), added examples to README, and fixed some bugs in the other functions
New Functions
Created a framework for organizing modules and the functions within them.
New functions include:
FE2_AutoDataParition()
# Example
import datatable as dt
import retrofit
from retrofit import FeatureEngineering as fe
from retrofit import utils as u
# random
data = dt.fread("C:/Users/Bizon/Documents/GitHub/BenchmarkData.csv")
DataSets = fe.FE2_AutoDataParition(
data=data,
ArgsList=None,
DateColumnName='CalendarDateColumn',
PartitionType='random',
Ratios=[0.70,0.20,0.10],
ByVariables=None,
Processing='datatable',
InputFrame='datatable',
OutputFrame='datatable')
TrainData = DataSets['TrainData']
ValidationData = DataSets['ValidationData']
TestData = DataSets['TestData']
ArgsList = DataSets['ArgsList']
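To make the `PartitionType='random'` and `Ratios=[0.70,0.20,0.10]` arguments concrete, here is a minimal pure-Python sketch of a random split by ratios. It is an illustration under stated assumptions, not the library's implementation: `random_partition` and its `seed` parameter are hypothetical, and it works on a plain list rather than a datatable frame.

```python
import random


def random_partition(rows, ratios, seed=None):
    """Shuffle rows and split them into consecutive chunks sized by ratios."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    parts = []
    start = 0
    cumulative = 0.0
    for ratio in ratios[:-1]:
        # Use cumulative boundaries so rounding errors do not accumulate
        cumulative += ratio
        end = int(round(cumulative * n))
        parts.append(shuffled[start:end])
        start = end
    parts.append(shuffled[start:])  # last chunk takes the remainder
    return parts


rows = list(range(100))
train, validation, test = random_partition(rows, [0.70, 0.20, 0.10], seed=42)
print(len(train), len(validation), len(test))
```

Every row lands in exactly one partition, and the last partition absorbs any rounding remainder.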
FE1_DummyVariables()
import datatable as dt
import retrofit
from retrofit import FeatureEngineering as fe
data = dt.fread("C:/Users/Bizon/Documents/GitHub/BenchmarkData.csv")
Output = fe.FE1_DummyVariables(
data=data,
ArgsList=None,
CategoricalColumnNames=['MarketingSegments','MarketingSegments2'],
Processing='datatable',
InputFrame='datatable',
OutputFrame='datatable')
data = Output['data']
ArgsList = Output['ArgsList']
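For readers unfamiliar with dummy variables, the sketch below shows the general idea of one-hot encoding categorical columns in plain Python. It is an assumption-laden illustration, not what `FE1_DummyVariables()` does internally: the `dummy_variables` helper, the `column_level` naming scheme, and the returned `levels` mapping are all hypothetical.

```python
def dummy_variables(rows, categorical_columns):
    """Return (new_rows, levels) with one 0/1 column per observed level."""
    # Collect the sorted distinct levels of each categorical column
    levels = {
        col: sorted({row[col] for row in rows})
        for col in categorical_columns
    }
    new_rows = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        for col, col_levels in levels.items():
            for level in col_levels:
                # Hypothetical naming scheme: <column>_<level>
                new_row[f"{col}_{level}"] = int(row[col] == level)
        new_rows.append(new_row)
    return new_rows, levels


rows = [
    {"MarketingSegments": "A", "Leads": 10},
    {"MarketingSegments": "B", "Leads": 7},
    {"MarketingSegments": "A", "Leads": 3},
]
encoded, levels = dummy_variables(rows, ["MarketingSegments"])
print(encoded[0])
```

Returning the observed levels alongside the data lets the same encoding be reapplied to new data later.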
ML0_GetModelData()
# ML0_GetModelData Example:
import datatable as dt
from datatable import sort, f, by
import retrofit
from retrofit import FeatureEngineering as fe
from retrofit import MachineLearning as ml
# Load some data
data = dt.fread("C:/Users/Bizon/Documents/GitHub/BenchmarkData.csv")
# Create partitioned data sets
DataSets = fe.FE2_AutoDataParition(
data=data,
ArgsList=None,
DateColumnName='CalendarDateColumn',
PartitionType='random',
Ratios=[0.70,0.20,0.10],
ByVariables=None,
Processing='datatable',
InputFrame='datatable',
OutputFrame='datatable')
# Collect partitioned data
TrainData = DataSets['TrainData']
ValidationData = DataSets['ValidationData']
TestData = DataSets['TestData']
del DataSets
# Create catboost data sets
DataSets = ml.ML0_GetModelData(
TrainData=TrainData,
ValidationData=ValidationData,
TestData=TestData,
ArgsList=None,
TargetColumnName='Leads',
NumericColumnNames=['XREGS1', 'XREGS2', 'XREGS3'],
CategoricalColumnNames=['MarketingSegments','MarketingSegments2','MarketingSegments3','Label'],
TextColumnNames=None,
WeightColumnName=None,
Threads=-1,
Processing='catboost',
InputFrame='datatable')
# Collect catboost training data
catboost_train = DataSets['train_data']
catboost_validation = DataSets['validation_data']
catboost_test = DataSets['test_data']
V0.0.3
Initial release to PyPI today. Available functions include:
- AutoLags(): Processing can be done via datatable or polars
- AutoRollStats(): Processing can be done via datatable (polars coming soon)
- AutoDiff(): Processing can be done via datatable (polars coming soon)
- AutoCalendarVariables(): Processing can be done via datatable (polars coming soon)
The goal is to let the user pass in a datatable, polars, or pandas frame, choose datatable or polars to handle the processing (my two favorite data wrangling packages in Python), and then select any of the three as the output frame type.
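As a rough illustration of the kinds of features the functions listed above are named for, here is a pure-Python sketch of lags, differences, and a rolling mean over a single series. The helpers `lag`, `diff`, and `rolling_mean` are hypothetical and operate on plain lists; the actual signatures and behavior of `AutoLags()`, `AutoDiff()`, and `AutoRollStats()` are not described in these notes.

```python
def lag(values, periods):
    """Shift values forward by `periods`, padding the start with None."""
    return [None if i < periods else values[i - periods] for i in range(len(values))]


def diff(values, periods=1):
    """Difference between each value and the value `periods` steps earlier."""
    lagged = lag(values, periods)
    return [None if prev is None else cur - prev for cur, prev in zip(values, lagged)]


def rolling_mean(values, window):
    """Mean over a trailing window; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


leads = [10, 12, 15, 11]
print(lag(leads, 1))           # [None, 10, 12, 15]
print(diff(leads, 1))          # [None, 2, 3, -4]
print(rolling_mean(leads, 2))  # [None, 11.0, 13.5, 13.0]
```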