
Commit 39b2234

Better installation instructions and less restrictive dependency versions for easier install

ginic committed Mar 11, 2024
1 parent 425f293
Showing 2 changed files with 9 additions and 9 deletions.
README.md: 6 changes (3 additions & 3 deletions)

@@ -9,11 +9,11 @@ The `ihop` directory is a python module with submodules that can also be run as
 - `ihop.visualizations`: Visualization utilities to create T-SNE projections used in the cluster viewer applications
 - `ihop.utils`: Options to configure logging and Spark environment
 - `ihop.resources`: Data resources
-  - `ihop.resources.analogies`: Subreddit algebra analogies for tuning community2vec, taken from [social-dimensions](https://github.com/CSSLab/social-dimensions) with minor updates
+  - `ihop.resources.analogies`: Subreddit algebra analogies for tuning community2vec, taken from [social-dimensions](https://github.com/CSSLab/social-dimensions) with minor updates
   - `ihop.resources.collections`: Pre-defined collections of subreddits from the Media Cloud team.

 # External Dependencies
-- Python >= 3.7
+- Python >= 3.8, <3.11. Note that <3.11 is only necessary because of the `dev` dependency [irrCAC](https://pypi.org/project/irrcac/)
 - [Java](https://docs.oracle.com/en/java/javase/17/install/overview-jdk-installation.html) or [OpenJDK](https://openjdk.java.net/install/) (at least version 8). Make sure you have `JAVA_HOME` set appropriately
 - (Optional to support faster compression & customize Hadoop config for Spark) [Hadoop](https://hadoop.apache.org) at least version 3.3 is needed for Pyspark to properly decompress the Reddit zst files (see [this issue](https://stackoverflow.com/questions/64607248/configure-spark-on-yarn-to-use-hadoop-native-libraries) or [this one](https://stackoverflow.com/questions/67099204/reading-a-zst-archive-in-scala-spark-native-zstandard-library-not-available)). Install Hadoop and configure the environment variables using [these instructions](https://phoenixnap.com/kb/install-hadoop-ubuntu).
 - [unzstd](http://manpages.ubuntu.com/manpages/bionic/man1/unzstd.1.html) and [bzip2](https://www.sourceware.org/bzip2/) are used for recompressing the monthly Reddit dumps to bzip2 format, which Spark and Gensim are more readily able to handle than the zst files.
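For reference, the recompression mentioned in the last item is a two-step shell sketch (the dump filename is hypothetical, and the `--long=31` flag is an assumption for dumps compressed with a large zstd window):

    unzstd --long=31 RC_2021-06.zst   # decompress the monthly Reddit dump (hypothetical filename)
    bzip2 RC_2021-06                  # recompress to RC_2021-06.bz2, which Spark and Gensim handle directly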
@@ -22,7 +22,7 @@ The `ihop` directory is a python module with submodules that can also be run as
 # Setup and Installation
 Note that [Pyspark](https://spark.apache.org/docs/latest/api/python/getting_started/install.html#dependencies) is used for training embeddings and LDA models, so you must have Java installed (at least version 8) and `JAVA_HOME` set appropriately.

-Use [Anaconda](https://docs.anaconda.com/anaconda/install/index.html) to create the `ihop` environment by running `conda create -n ihop python=3.9`, then install the package and its dependencies using `pip install .`. Note that if you are using a Mac with an M1 or M2 chip, install sklearn and numpy *first* using conda, `conda install numpy==1.21.2 scikit-learn==1.0.1`. This installs everything needed just for preprocessing data and training models. If you're going to run the Dash app, install using `pip install .[app]`.
+Use [Anaconda](https://docs.anaconda.com/anaconda/install/index.html) to create the `ihop` environment by running `conda create -n ihop python=3.10`, then install the package and its dependencies using `pip install .`. This installs everything needed just for preprocessing data and training models. If you're going to run the Dash app, install using `pip install .[app]`. Note that if you are using a Mac with an M1 or M2 chip, install sklearn and numpy *first* using conda, `conda install numpy scikit-learn`.

 For testing and development tools, install the `ihop` package in editable mode using `pip install -e .[test,dev]`

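Taken together, the updated installation flow is roughly the following shell session (a sketch assembled from the commands in the README above; `conda activate` and the version checks are standard additions, not from the diff):

    java -version                    # Pyspark requires Java 8+ with JAVA_HOME set
    echo $JAVA_HOME
    conda create -n ihop python=3.10
    conda activate ihop
    # On Apple silicon (M1/M2) only, install these through conda first:
    # conda install numpy scikit-learn
    pip install .                    # preprocessing and model training only
    pip install .[app]               # additionally pulls in the Dash app
    pip install -e .[test,dev]       # editable install with test and dev tools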
setup.cfg: 12 changes (6 additions & 6 deletions)

@@ -1,27 +1,27 @@
 [metadata]
 name = ihop
-version = 2.0.0
+version = 2.1.0
 description = Tools for clustering Reddit based on NLP techniques
-long_description = = file: README.md
+long_description = file: README.md
 long_description_content_type = text/markdown
 license_files = LICENSE

 [options]
-python_requires = <3.10, >=3.8
+python_requires = >=3.8,<3.11
 packages = find:
 include_package_data = True
 install_requires =
     dvc[s3]==3.27.0
     gensim==4.1.2
-    joblib==1.1.0
-    numpy==1.21.2
+    joblib>=1.1.0
+    numpy>=1.21.0
     matplotlib==3.5.0
     pandas==1.3.5
     pyspark>=3.2.0
     pytimeparse==1.1.8
     scipy
     s3fs[boto3]>=2022.3.0
-    scikit-learn==1.0.1
+    scikit-learn>=1.4.0

 [options.extras_require]
 app =
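Because the pins above are now lower bounds rather than exact versions, the resolver may select newer releases; one way to confirm what was actually installed (a hedged sketch using standard pip and Python invocations):

    pip show joblib numpy scikit-learn
    python -c "import joblib, numpy, sklearn; print(joblib.__version__, numpy.__version__, sklearn.__version__)"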
