PreFer Challenge Wiki

Adrienne Mendrik edited this page Mar 31, 2024 · 5 revisions

Scope

Research problem

Accurate predictions of the number and timing of children are crucial for effective resource allocation in society. However, despite many studies in the social sciences, we have no clear understanding of which factors are most important for fertility prediction or how well we are able to predict fertility behaviour.

Purpose statement

To gain insight into how well methods can predict fertility within a three-year period (2021-2023), based on survey data from previous years (2007-2020) of people in the LISS panel who were aged 18-45 in 2020. The LISS panel is a representative online longitudinal panel of Dutch households.

Leaderboards

The PreFer challenge data is separated into a training dataset for tuning your method and a holdout dataset that is used to validate your method's performance. After submission, your method will be run on the holdout data. Your performance scores on the holdout data will be added to the leaderboards, so they can be compared with the scores of other methods.
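Because the holdout data is only used after submission, it can help to mimic the same idea locally by setting aside part of the training data for validation. The following is a minimal sketch of such a split, not part of the challenge code; the function name `split_train_validation`, the 20% fraction, and the seed are all illustrative assumptions.

```python
import random


def split_train_validation(row_ids, validation_fraction=0.2, seed=42):
    """Split row identifiers into a training part and a local validation part.

    Hypothetical helper for illustration only; the challenge's own holdout
    set is separate and is never available locally.
    """
    ids = list(row_ids)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(ids)
    n_val = int(len(ids) * validation_fraction)
    return ids[n_val:], ids[:n_val]


# Example with ten made-up row identifiers.
train_ids, val_ids = split_train_validation(range(10))
```

Tuning on `train_ids` and checking scores on `val_ids` gives a rough idea of how a method may behave on unseen data before it is submitted.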

ℹ️ Leaderboards are generated at fixed time points. Check the important dates for leaderboard submission deadlines.

The following leaderboards will be available:

*For the prediction of having a child in 2021-2023 (positive class).

For this challenge the F1 leaderboard is the main leaderboard.

ℹ️ The Python code that is used to calculate the metrics for the challenge leaderboards is included in this repo. Check out score.py.
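To illustrate what an F1 score for the positive class measures, here is a minimal sketch in plain Python. It is not the challenge's score.py, which remains the authoritative implementation; the function name `f1_positive_class` and the example labels are made up for illustration, with 1 assumed to mean "had a child in 2021-2023".

```python
def f1_positive_class(y_true, y_pred):
    """Return (precision, recall, f1) for the positive class (label 1).

    Illustrative only; see score.py for the metrics actually used.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up outcomes and predictions for six people.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
precision, recall, f1 = f1_positive_class(y_true, y_pred)
```

In this example there are two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 2/3.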

Frequently Asked Questions

How to fork and clone this repository?

To fork and clone this repository, follow these steps:

  1. On the GitHub page click the "Fork" button in the top right corner of the page.
  2. Select the account or organization where you want to fork the repository.
  3. Wait for the forking process to complete.
  4. Once the forking process is complete, you will have a copy of the repository in your own GitHub account or organization.
  5. After forking the repository, use GitHub Desktop or the following command to clone it to your local machine:
git clone https://github.com/<your-username>/fertility-prediction-challenge.git

How to update files in your forked repository?

Update the files in your forked repository in one of the following ways:

  • Edit the scripts directly in your GitHub repository as explained here.
  • Change the files locally and then upload them to your GitHub repository manually.
  • Use Git through the command line or use GitHub Desktop. Clone the repository (i.e. save all the repo files on your computer) as described here. Edit the files locally using the software that you normally use to work with Python or R scripts. After editing and saving the files, commit the changes (i.e. save them in the local repository) as explained here. Then push the commit (i.e. upload the changed version to your online repository) as explained here.

How to add or edit dependencies (libraries/packages)?

Python

Check out the environment.yml file to see which libraries are installed by default. You can add or remove libraries from this environment.yml file as you desire.

Library names and versions can be copied from the output of the following command:

conda env export

It is recommended to state particular versions (e.g. pandas=1.5 rather than pandas>=1.5), so that your method runs with the same library versions you tested with. Import the libraries in your submission.py file.
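As a quick way to spot dependencies that are not pinned to a particular version, the check can be sketched in a few lines of Python. This is an illustrative helper, not part of the challenge repository; the name `unpinned` and the example specs are assumptions, and the pattern only covers the `name=version` form, the `name=version=build` form produced by `conda env export`, and the pip-style `name==version` form.

```python
import re

# Matches "name=version", "name==version", or "name=version=build".
_PINNED = re.compile(r"^[A-Za-z0-9_.\-]+={1,2}\d[\w.]*(=[\w.]+)?$")


def unpinned(specs):
    """Return the dependency specs that lack an exact version pin.

    Hypothetical helper for illustration; it does not read environment.yml
    itself, it only checks a list of spec strings.
    """
    return [s for s in specs if not _PINNED.match(s.strip())]


# Made-up dependency specs as they might appear in environment.yml.
example_specs = ["pandas=1.5", "numpy>=1.24", "scikit-learn", "joblib=1.3.2=py311_0"]
flagged = unpinned(example_specs)
```

Here `flagged` contains `numpy>=1.24` and `scikit-learn`, the two entries without an exact version.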

R

No packages are pre-installed.

Add packages to the packages.R file:

install.packages(c("dplyr","data.table","tidyr"), repos="https://cran.r-project.org")

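Load the packages you use in your submission.R file with one library() call per package (e.g. library(dplyr), library(data.table), and library(tidyr)), because library() accepts only a single package at a time.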

How to test your implementation?

You can test your implementation either via Docker or directly via Anaconda.

Docker

First, install Docker. To test your implementation via Docker, build the Docker image:

docker build -t eyra-rank .

Then, run the Docker container:

docker run eyra-rank

This should run the script with the example data. You can run it against other data using:

docker run -v "$(pwd)/data:/data" eyra-rank predict /data/PreFer_fake_data.csv

Anaconda

To test your implementation directly via Anaconda, first create a new conda environment:

conda env create -f environment.yml --no-default-packages
conda activate eyra-rank

Then, run the script:

python3 run.py predict data/PreFer_fake_data.csv