incidence_table with nan values #185

LotteNotelaers · 2024-09-23T22:25:04Z

Dear,

I get an error when running the setup_data_structures.py step, specifically at line 87 of the build_incidence_table function.

This is the incidence dataframe (result line 85)

If the next line is run (line 87): incidence_table[control_row.target] = incidence
the result is:

So it looks like the numbers are not transferred to the incidence_table dataframe.

Can you help me with this issue?

Kind regards,
Lotte

bettinardi · 2024-09-23T23:36:52Z

I'm guessing there's either a configuration file inconsistency or a crosswalk file inconsistency. Would you like to share your configuration yaml and/or your geography crosswalk file?

LotteNotelaers · 2024-09-24T00:00:27Z

Hi bettinardi,

Thanks for your help. Here are the files:
geo_cross_walk.csv
settings.zip
I needed to zip settings.yaml because this file type is not supported by GitHub.

Kind regards,
Lotte

LotteNotelaers · 2024-09-24T11:45:14Z

Hi,

I think it has to do with the household_df index values being strings while the hh_id column in person_df is of mixed type.

In the input seed data, the SERIALNO (= hh_id) column is of mixed type: it contains both int and str values. I found that you can specify the dtypes in settings.yaml, which makes sure the IDs are consistently read as strings from the csv files. This resolved the problem.
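For readers hitting the same symptom, here is a minimal pandas sketch of the mechanism described above. It is not the PopulationSim code: the table contents, the column name num_workers and the sample IDs are hypothetical, and the settings.yaml keys are not shown because they do not appear in this thread. It only illustrates that column assignment aligns on index labels, so string IDs on one side and integer IDs on the other give an all-NaN column, and that forcing a string dtype at read time avoids the mismatch.

```python
import io
import pandas as pd

# Hypothetical household table whose index holds the IDs as strings
incidence_table = pd.DataFrame(index=pd.Index(["101", "102", "103"], name="hh_id"))

# Hypothetical per-household counts whose index holds the same IDs as integers
incidence = pd.Series([2, 0, 1], index=pd.Index([101, 102, 103], name="hh_id"))

# Assignment aligns on index labels; "101" != 101, so no label matches
incidence_table["num_workers"] = incidence
all_nan = bool(incidence_table["num_workers"].isna().all())

# With both indexes as strings, the values line up as intended
incidence.index = incidence.index.astype(str)
incidence_table["num_workers"] = incidence
fixed_values = incidence_table["num_workers"].tolist()

# Forcing the dtype at read time keeps the ID column as strings from the start
csv_text = "SERIALNO,AGEP\n101,30\n102,41\n"
inferred = pd.read_csv(io.StringIO(csv_text))
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"SERIALNO": str})
```

In this sketch `inferred["SERIALNO"]` ends up with an integer dtype while `as_text["SERIALNO"]` keeps the IDs as text, which is the effect the dtypes setting is meant to achieve.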

Thanks for the help!

LotteNotelaers · 2024-09-24T12:24:31Z

Hi,

I now get an error at line 242 of setup_data_structures.py, because it tries to cast the hh_id to int:

household_groups[household_id_col] = household_groups.index.astype(int)

{OverflowError}Python int too large to convert to C long

This doesn't work because the hh_id contains numbers but also sometimes letters.

What would be the best way to resolve this?

1. Change the code to household_groups[household_id_col] = household_groups.index.astype(str), or
2. Add new household IDs to the seed data and make sure the new IDs only contain numbers?

Kind regards,
Lotte

bettinardi · 2024-09-24T15:19:05Z

Add new household IDs to the seed data and make sure the new IDs only contain numbers.

HH_ID is different from a PUMS serial number.
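One way to follow this advice is sketched below, under stated assumptions: the seed households and persons are already loaded as pandas DataFrames, both carry a SERIALNO column read as strings, and SERIALNO is unique per household. The sample values and the AGEP and NP columns are illustrative only. The sketch assigns a new integer hh_id to each household and maps the same ID onto the persons, keeping SERIALNO as a separate column.

```python
import pandas as pd

# Hypothetical seed tables; SERIALNO is read as text and may contain letters
households = pd.DataFrame({
    "SERIALNO": ["2012000000123", "2012A00000456", "2012000000789"],
    "NP": [2, 1, 1],
})
persons = pd.DataFrame({
    "SERIALNO": ["2012000000123", "2012000000123", "2012A00000456", "2012000000789"],
    "AGEP": [34, 36, 52, 27],
})

# The mapping below assumes one row per household
assert households["SERIALNO"].is_unique

# factorize returns 0-based integer codes in order of first appearance
codes, _ = pd.factorize(households["SERIALNO"])
households["hh_id"] = codes + 1

# Reuse the same mapping for persons so both tables agree on hh_id
id_map = pd.Series(households["hh_id"].to_numpy(), index=households["SERIALNO"].to_numpy())
persons["hh_id"] = persons["SERIALNO"].map(id_map)

# Any person without a matching household would show up as NaN here
unmatched = int(persons["hh_id"].isna().sum())
```

Writing both tables back out with the new hh_id column, and pointing the configuration at hh_id instead of SERIALNO, keeps the original serial number available for reference while the ID used for joins stays purely numeric.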
