____ ____ ____ _ _ _
| _ \ |___ \ | _ \ _ __ ___ __| |(_) ___ | |_
| |_) | __) || |_) || '__| / _ \ / _` || | / __|| __|
| __/ / __/ | __/ | | | __/| (_| || || (__ | |_
|_| |_____||_| |_| \___| \__,_||_| \___| \__|
P2Predict is an open-source Python comand-line program for advanced procurement price prediction. It employs Artificial Intelligence & machine learning techniques to provide reliable and actionable insights into price trends, aiding in strategic decision-making in procurement.
The project is in active development - Contributions are welcome!
As I work on core features first, this program is targeting procurement & commodity managers that are fairly technical. At this stage, P2Predict is not polished for the non-technical business user.
This software is released under the MIT license. See LICENSE
for the license details.
- Predict prices (or any other target feature) based on a trained model
- Import training data from a CSV file
- Perform feature/impact analysis
- Auto detection of the most predictive features using RFE and a Random Forest
- Auto detection of low information features that might bias the model if selected
- Implementation of Hyper Parameter Optimization for Ridge, XGBoost and Random Forest
- Train a machine learning model on the selected features to predict the price (or any other target)
- Support for Ridge, XGBoost, and Random Forest ML algorithms
- Models can be easily saved and loaded
- Evaluation metrics (mean absolute error and R^2 scores are supported)
- Create a PDF file with model performance indicators (predicted vs actual price, distribution of prediction errors, ...)
To use P2Predict, follow these steps:
- Install the required dependencies by invoking
pip install -r requirements.txt
- Ensure your data is in a CSV format.
- Remove any blanks or gaps in the data (empty columns, empty cells, etc.).
- Address any errors in the data (e.g., #NAs).
- Verify that numeric columns do not contain text.
-
Use the
p2predict_train.py
tool to train a new model. -
The tool accepts the following arguments:
python3 p2predict_train.py [OPTIONS]
--input
,-i
: Path to your input CSV file. This dataset is used for training.--target
,-t
: Name of the feature to predict (e.g., "Price").--expert
,-e
: Toggle Expert Mode. In Expert mode, you have more control over the training process.--algorithm
,-a
: Choose the machine learning algorithm to be used (ridge, xgboost, or randomforest).--interactive
: Activate interactive mode. If not set, you must specify all required options.--verbose
,-v
: Increase output verbosity.--training_features
,-f
: List of training features to be used, separated by commas.
For a complete list of options, run
python3 p2predict_train.py --help
.python3 p2predict_train.py --interactive
This launches P2Predict in Interactive Auto-Mode, where the program guides you through the process and optimizes decisions automatically.
python3 p2predict_train.py --expert --interactive
This launches P2Predict in Interactive Expert Mode, giving you access to advanced features like feature selection, algorithm selection, and hyperparameter optimization.
python3 p2predict_train.py --expert --input examples/example.csv --algorithm ridge --target Price --training_features Weight,Size
This runs P2Predict in Non-Interactive Expert Mode, training a Ridge regression model using the specified input file, target, and training features.
python3 p2predict_train.py --input data/sales.csv --target Revenue --algorithm xgboost
This example trains an XGBoost model in Auto-Mode, using 'Revenue' as the target variable. The program will automatically select the best features from the 'sales.csv' file.
python3 p2predict_train.py --input data/customer_data.csv --target Churn --algorithm randomforest --verbose
This runs a Random Forest model training with increased verbosity, predicting customer churn based on the data in 'customer_data.csv'.
python3 p2predict_train.py --input data/housing.csv --target Price --algorithm ridge --training_features Area,Bedrooms,Location,YearBuilt
This example trains a Ridge regression model to predict housing prices, explicitly specifying multiple training features.
-
After training, the model will be saved and can be used for predictions with
p2predict.py
.
-
Use the
p2predict.py
tool to predict a new price using a trained model. -
The tool accepts the following arguments:
python3 p2predict.py -m MODEL_PATH [-p PREDICT_USING] [-i PREDICT_FILE]
-m, --model MODEL_PATH
: Path to the trained model file (.model file).-p, --predict_using TEXT
: Inline prediction feature/value pair to be fed to the trained model.-i, --predict_file FILE
: A CSV file that contains prediction features and values. This file will be fed to the trained model to generate predictions.
Examples:
- Using inline prediction:
python3 p2predict.py -m models/my_trained_model.model -p "weight_g:25,region:5"
This command uses the model saved in
models/my_trained_model.model
to predict the price for an object with a weight of 25g and located in region 5.- Using a prediction file:
python3 p2predict.py -m models/my_trained_model.model -i prediction_data.csv
This command uses the model saved in
models/my_trained_model.model
to generate predictions for all the entries in theprediction_data.csv
file.
Note: Make sure the features you provide (either inline or in the CSV file) match the features the model was trained on. The model in these examples was trained using p2predict_train.py
, and the prediction features should correspond to the training features used.
Check requirements.txt
for exact versions. Install with pip install -r requirements.txt
- click
- joblib
- matplotlib
- mpld3
- numpy
- pandas
- rich
- scikit-learn
- seaborn
- xgboost
- halo
- questionary
For data examples, check dummy/example.csv
.
I warmly welcome contributions to this project! If you have an exciting feature idea or have discovered a bug, I'd be delighted to hear from you. Please don't hesitate to open an issue or submit a pull request.
I'm particularly keen on expanding my collection of datasets for various direct and indirect commodities. This includes, but is not limited to, Integrated Circuits (ICs), Passive Components, Plastic Parts, and Mechanical Parts. If you're aware of any open datasets in these areas, or if your organization would like to contribute a dataset, I'd be incredibly grateful. Please reach out to me - your input could be invaluable in enhancing the project's capabilities and benefiting the wider procurement community.
I work on open source projects during my free time. If you think this projects adds value to the procurement community, please consider sponsoring a donation!