Repository for code and data associated with the CAM framework paper on analogy generation and evaluation via InstructGPT (Davinci).
`/data/non_adapt`: Candidate analogies for the Cybersecurity (cyber), Machine Learning (ai), and High-school Science (sci) domains.

`/data/non_analogies`: Non-analogies used as negative samples for training the Analoginess Scorer.

`/data/sci_src`: Additional analogies (generated with the source concept as part of the prompt) for the High-school Science domain, used as positive samples for training the Analoginess Scorer.
In all file names, `pn` means analogies generated with prompt id `n` in the paper, `ht` means high temperature, and `lt` means low temperature.
Each file contains the following tab-separated fields: (1) Generated Analogy, (2) Target Concept, (3) Prompt.
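For example, a minimal sketch of reading one of these files; the file name is hypothetical and simply follows the naming convention above:

```python
import csv

# Hypothetical file name following the pn/ht/lt convention described above.
path = "data/non_adapt/sci_p1_ht.txt"

with open(path, encoding="utf-8") as f:
    for analogy, target_concept, prompt in csv.reader(f, delimiter="\t"):
        print(f"{target_concept}: {analogy[:80]}")
```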
`/data/analoginess_scorer/non_adapt.txt`: Candidate analogies classified as non-analogy or analogy.

Each file contains the following tab-separated fields: (1) Generated Analogy, (2) Target Concept, (3) Prompt, (4) Temperature (low -- `lt`, or high -- `ht`), (5) Domain (for non-adaptive analogies)/Preference/Discipline, (6) Predicted Class (0 -- non-analogy, 1 -- analogy).
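As an illustration, a sketch of filtering the rows predicted to be analogies, assuming the six-field layout above:

```python
import csv

with open("data/analoginess_scorer/non_adapt.txt", encoding="utf-8") as f:
    rows = list(csv.reader(f, delimiter="\t"))

# Field 6 (index 5) is the predicted class: 0 = non-analogy, 1 = analogy.
analogies = [r for r in rows if r[5] == "1"]
print(f"{len(analogies)} of {len(rows)} candidates classified as analogies")
```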
`/data/analoginess_scorer/prompt_refiner/non_adapt.txt`: Candidate analogies classified as non-analogy or analogy after prompt refinement.

Each file contains the following tab-separated fields: (1) Generated analogy after prompt refinement, (2) Target Concept, (3) Prompt, (4) Temperature (low -- `lt`, or high -- `ht`), (5) Domain (for non-adaptive analogies)/Preference/Discipline, (6) Predicted Class (0 -- non-analogy, 1 -- analogy), (7) Generated analogy before prompt refinement (classified as non-analogy).
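A sketch of pairing each refined analogy (field 1) with its pre-refinement version (field 7), under the same layout assumptions:

```python
import csv

path = "data/analoginess_scorer/prompt_refiner/non_adapt.txt"
with open(path, encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="\t"):
        refined, original = row[0], row[6]
        if row[5] == "1":  # refinement yielded a text classified as an analogy
            print("BEFORE:", original[:60])
            print("AFTER: ", refined[:60])
```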
`/data/extracted_src/non_adapt.txt`: Sources extracted from the analogies.

Each file contains the following tab-separated fields: (1) Generated Analogy, (2) Target Concept, (3) Prompt, (4) Temperature (low -- `lt`, or high -- `ht`), (5) Domain (for non-adaptive analogies)/Preference/Discipline, (6) Extracted Source(s).

If the PLM generated multiple mappings between source and target concepts, they are separated by `###`. We used only the first mapping in our experiments, as in the sketch below.
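A minimal sketch of recovering the first mapping, following the `###` convention just described:

```python
import csv

with open("data/extracted_src/non_adapt.txt", encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="\t"):
        # Field 6 (index 5) holds the extracted source(s); multiple
        # mappings are separated by "###", and only the first was used.
        first_source = row[5].split("###")[0].strip()
        print(first_source)
```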
`/data/ret_am/`: Each file contains analogies retrieved from the Web for one target-source concept pair, stored as a JSON dictionary with the following keys and values: (1) `"src_spec_res"`: lists of Bing search results returned for source-specific queries, (2) `"gen_res"`: lists of Bing search results returned for general queries, (3) `"all_bing_queries"`: the list of all source-specific and general queries.
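A sketch of loading one retrieval file; the file name is hypothetical, since each file covers one target-source pair:

```python
import json

with open("data/ret_am/example_pair.json", encoding="utf-8") as f:
    ret = json.load(f)

print(len(ret["all_bing_queries"]), "Bing queries issued")
print(len(ret["src_spec_res"]), "source-specific result lists")
print(len(ret["gen_res"]), "general result lists")
```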
`/data/ranked_analogies/non_adapt.txt`: Contains the Creative Ranking Score, Reliability Score, and Creativity Score of the analogies.
Trained model available at: https://drive.google.com/file/d/1egpgQyjDd9I0b-_wmIRPsgnK0OGs9FqX/view?usp=sharing. Download the model and place it under `/model` in this repo folder to run the Analoginess Scorer prediction code (`src/analoginess_scorer_pred.py`).
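One way to fetch the model programmatically is the third-party `gdown` package; this is an assumption for illustration, not a repo dependency, and the output file name here is a guess:

```python
import gdown  # pip install gdown; not part of this repo's requirements

url = "https://drive.google.com/uc?id=1egpgQyjDd9I0b-_wmIRPsgnK0OGs9FqX"
# Place the model under /model as described above; the file name is hypothetical.
gdown.download(url, "model/analoginess_scorer.bin", quiet=False)
```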
`data/crowd-eval/science.json`, `data/crowd-eval/ml.json`, `data/crowd-eval/cyber.json`: Format: `{analogy: {meaning: {worker id1: score1, ...}, novelty: {worker id1: score1, ...}, queries: {worker id1: queries1, ...}, urls found: {worker id1: urls1, ...}}}`.
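A sketch of averaging the per-worker crowd scores, assuming the keys shown above and numeric (or numeric-string) score values:

```python
import json
from statistics import mean

with open("data/crowd-eval/science.json", encoding="utf-8") as f:
    crowd = json.load(f)

for analogy, ratings in crowd.items():
    # Average the per-worker scores for each rated dimension.
    meaning = mean(float(s) for s in ratings["meaning"].values())
    novelty = mean(float(s) for s in ratings["novelty"].values())
    print(f"meaning={meaning:.2f} novelty={novelty:.2f} | {analogy[:60]}")
```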