I use python to solve the problem, and there is a source code called donation-analysis.py in the source file.
My program will read records in incont.txt
line by line. When the program read a new record, it will do:
- check whether all fields that we are interested in are valid
- if all fields are valid, store the record in two hash tables called
transaction_record
anddonor_list
- check the number of record in
donor_list
with key valueNAME
+ZIP_CODE
- if there are more than one record in
donor_list[NAME + ZIP_CODE]
, the donor is a repeat donor, go to step 5, otherwise, go to step 6 - using
CMTE_ID
,ZIP_CODE
andyear
as a key, call the function calledprintrecord()
to output the requested calculations - repeat step 1-5 until read the end of file
I use two hash tables called transaction_record
and donor_list
to store records.
- In
transaction_record
, if contributions have same recipient, zip code and calender year, they will be listed together. The key oftransaction_record
isCMTE_ID
+ZIP_CODE
+year
, and the corresponding value are lists of ( or maybe only one list )[NAME,TRANSACTION_MT, month_date]
- In
donor_list
, all contributions from same donor will map to same key. The key isNAME + ZIP_CODE
, and the value are lists (or maybe one list) of[CMTE_ID,TRANSACTION_MT,year,month_date]
People may wonder why we store same record two times. Although creating two hash tables which store same data may waste some space, the time complexity can reduce a lot when we find data we need and output the requested calculations.
All functions are in a class called donation_analysis()
, what they have to do is:
load_percentile()
: It load the percentile from percentile.txt and store it in self.percentilefind_percentile_ind()
: It find the ordinal rank through the nearest-rank methodprocess_transaction_data():
load records fromincont.txt
and store them in two hash tables. It will calledprintrecord()
when the new record is from a repeat donor.printrecord()
: GivenCMTE_ID
,ZIP_CODE
andyear
It uses priority queue to sort the contribution from repeat donors by the transaction month and list all output data.isValid()
: check whether the new record has correct format
There is only one program called donation-analysis.py in the source file. It doesn't require additional libraries, environments, or dependencies, Users can run the program through run.sh.