diff --git a/.gitignore b/.gitignore
index 524f096..902b29b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -22,3 +22,6 @@
 # virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml
 hs_err_pid*
 replay_pid*
+/.idea/
+/data/
+/prompts/
diff --git a/docs/batch.md b/docs/batch.md
new file mode 100644
index 0000000..59d361f
--- /dev/null
+++ b/docs/batch.md
@@ -0,0 +1,34 @@
+# batch
+
+This command creates prompts from all phenopackets in the input directory.
+
+## Getting the input data
+
+Go to the [Releases](https://github.com/monarch-initiative/phenopacket-store/releases) section of 
+[phenopacket-store](https://github.com/monarch-initiative/phenopacket-store){:target="_blank"}, and download the
+latest release (0.1.5 as of April 29, 2024; new releases are frequent). The resource currently contains over 4300 phenopackets.
+
+
+Download one of the archives (e.g., ``all_phenopackets.zip``) and unpack it in a location of your choice.
+
+
+Then run the following command.
+
+```shell title="batch"
+java -jar phenopacket2prompt.jar batch -d <all_phenopackets>
+```
+where ``<all_phenopackets>`` is the relative or absolute path to the unpacked directory.
+
+phenopacket2prompt will create a new subdirectory called ``prompts`` in the current directory. It will contain
+one folder for each language (currently ``en`` for English and ``es`` for Spanish), as well as a file called
+``correct_results.tsv`` with the following structure:
+
+
+| Disease name                               | OMIM identifier |                                 Prompt file name |
+|--------------------------------------------|:---------------:|-------------------------------------------------:|
+| Birt-Hogg-Dube syndrome 2                  |   OMIM:620459   | PMID_36440963_IIIPMID_36440963_III-33-prompt.txt |
+| Immunodeficiency 115 with autoinflammation |   OMIM:620632   |                 PMID_26008899_patient-prompt.txt |
+| Jacobsen syndrome                          |   OMIM:147791   |                     PMID_15266616_148-prompt.txt |
+
+
+Note that the prompt file name is the same for every language.
\ No newline at end of file
diff --git a/docs/english.md b/docs/english.md
index e69de29..9966294 100644
--- a/docs/english.md
+++ b/docs/english.md
@@ -0,0 +1,3 @@
+# English
+
+Todo -- let's write a summary of the translations in each language.
\ No newline at end of file
diff --git a/docs/index.md b/docs/index.md
index 7fa954b..5de88dd 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -8,36 +8,6 @@ GA4GH phenopackets.
 
 
 
-## Installation
-
-
-Most users should download the prebuilt executable file from the
-[Releases](https://github.com/monarch-initiative/phenopacket2prompt/releases) page of the GutHub repository.
-
-It is also possible to build the application from source using standard Maven and Java tools.
-
-```shell title="building the app"
-git clone https://github.com/monarch-initiative/phenopacket2prompt.git
-cd phenopacket2prompt
-maven package
-java -jar target/phenopacket2prompt.jar
-```
-
-## Setup
-
-
-First download the latest copy of the [Human Phenotype Ontology](https://hpo.jax.org/app/) hp.json file. This file is
-used for text mining of clinical signs and symptoms. For more information about the HPO, see
-[Koehler et al. (2021)](https://pubmed.ncbi.nlm.nih.gov/33264411/). Adjust the path to the `phenopacket2prompt.jar`
-file as necessary.
-
-
-
-```shell title="download"
-java -jar phenopacket2prompt.jar download
-```
-
-
 
 
 ## Running phenopacket2prompt
diff --git a/docs/setup.md b/docs/setup.md
index 914214d..be562a8 100644
--- a/docs/setup.md
+++ b/docs/setup.md
@@ -1,6 +1,6 @@
 # Set-up
 
-TODO -- how to setup Java etc.
+phenopacket2prompt requires Java 17 or later. To build it from source, Maven is also required.
 
 ## Download command
 Before running the batch command, run the download command to get the necessary files
@@ -9,19 +9,34 @@ Before running the batch command, run the download command to get the necessary
 java -jar target/phenopacket2prompt.jar download
 ```
 
-## Batch command
-To run the batch command, first download the latest release from the 
-[releases](https://github.com/monarch-initiative/phenopacket-store/releases) section of the phenopacket-store
-repository. Unpack either all_phenopackets.tgz or all_phenopackets.zip (the files are identical except for the
-method of compression).
 
+
+## Installation
+
+
+Most users should download the prebuilt executable file from the
+[Releases](https://github.com/monarch-initiative/phenopacket2prompt/releases) page of the GitHub repository.
+
+It is also possible to build the application from source using standard Maven and Java tools.
+
+```shell title="building the app"
+git clone https://github.com/monarch-initiative/phenopacket2prompt.git
+cd phenopacket2prompt
+mvn package
+java -jar target/phenopacket2prompt.jar
 ```
-java -jar target/phenopacket2prompt.jar batch -d <all_phenopackets>
-```
-Replasce `<all_phenopackets>` with the actual path on your system.
 
-The app should create a folder "prompts", with two subdirectories, "en" and "es" with English and Spanish prompts. 
-There are some errors that still need to be fixed, but several thousand prompts should appear.
+## Setup
+
+
+First download the latest copy of the [Human Phenotype Ontology](https://hpo.jax.org/app/) hp.json file. This file is
+used for text mining of clinical signs and symptoms. For more information about the HPO, see
+[Koehler et al. (2021)](https://pubmed.ncbi.nlm.nih.gov/33264411/). Adjust the path to the `phenopacket2prompt.jar`
+file as necessary.
+
+
+
+```shell title="download"
+java -jar phenopacket2prompt.jar download
+```
 
-## Todo
-also output a file with expected diagnosis
diff --git a/src/main/java/org/monarchinitiative/phenopacket2prompt/cmd/GbtTranslateBatchCommand.java b/src/main/java/org/monarchinitiative/phenopacket2prompt/cmd/GbtTranslateBatchCommand.java
index 1546a32..c70d93d 100644
--- a/src/main/java/org/monarchinitiative/phenopacket2prompt/cmd/GbtTranslateBatchCommand.java
+++ b/src/main/java/org/monarchinitiative/phenopacket2prompt/cmd/GbtTranslateBatchCommand.java
@@ -8,6 +8,7 @@
 import org.monarchinitiative.phenopacket2prompt.international.HpInternationalOboParser;
 import org.monarchinitiative.phenopacket2prompt.model.PhenopacketDisease;
 import org.monarchinitiative.phenopacket2prompt.model.PpktIndividual;
+import org.monarchinitiative.phenopacket2prompt.output.CorrectResult;
 import org.monarchinitiative.phenopacket2prompt.output.PromptGenerator;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -60,12 +61,26 @@ public Integer call() throws Exception {
         LOGGER.info("Got {} translations", internationalMap.size());
         List<File> ppktFiles = getAllPhenopacketJsonFiles();
         createDir("prompts");
-        outputPromptsEnglish(ppktFiles, hpo);
+        List<CorrectResult> correctResultList = outputPromptsEnglish(ppktFiles, hpo);
+        // output all non-English languages here
         PromptGenerator spanish = PromptGenerator.spanish(hpo, internationalMap.get("es"));
         outputPromptsInternational(ppktFiles, hpo, "es", spanish);
+        // output file with correct diagnosis list
+        outputCorrectResults(correctResultList);
         return 0;
     }
 
+    private void outputCorrectResults(List<CorrectResult> correctResultList) {
+        File outfile = new File("prompts" + File.separator + "correct_results.tsv");
+        try (BufferedWriter bw = new BufferedWriter(new FileWriter(outfile))) {
+            for (var cres : correctResultList) {
+                bw.write(String.format("%s\t%s\t%s\n", cres.diseaseLabel(), cres.diseaseId().getValue(), cres.promptFileName()));
+            }
+        } catch (IOException e) {
+            LOGGER.error("Could not write {}", outfile, e);
+        }
+        System.out.printf("[INFO] Output a total of %d prompts in en and es.\n", correctResultList.size());
+    }
 
 
     private String getFileName(String phenopacketID) {
@@ -99,10 +114,11 @@ private void outputPromptsInternational(List<File> ppktFiles, Ontology hpo, Stri
     }
 
 
-    private void outputPromptsEnglish(List<File> ppktFiles, Ontology hpo) {
+    private List<CorrectResult> outputPromptsEnglish(List<File> ppktFiles, Ontology hpo) {
         createDir("prompts/en");
+        List<CorrectResult> correctResultList = new ArrayList<>();
         PromptGenerator generator = PromptGenerator.english(hpo);
-        List<String> diagnosisList = new ArrayList<>();
+
         for (var f: ppktFiles) {
             PpktIndividual individual = new PpktIndividual(f);
             List<PhenopacketDisease> diseaseList = individual.getDiseases();
@@ -114,13 +130,15 @@ private void outputPromptsEnglish(List<File> ppktFiles, Ontology hpo) {
             String promptFileName = getFileName( individual.getPhenopacketId());
             String diagnosisLine = String.format("%s\t%s\t%s\t%s", pdisease.getDiseaseId(), pdisease.getLabel(), promptFileName, f.getAbsolutePath());
             try {
-                diagnosisList.add(diagnosisLine);
                 String prompt = generator.createPrompt(individual);
                 outputPrompt(prompt, promptFileName, "prompts/en");
+                var cres = new CorrectResult(promptFileName, pdisease.getDiseaseId(), pdisease.getLabel());
+                correctResultList.add(cres);
             } catch (Exception e) {
                 e.printStackTrace();
             }
         }
+        return correctResultList;
     }
 
 
diff --git a/src/main/java/org/monarchinitiative/phenopacket2prompt/output/CorrectResult.java b/src/main/java/org/monarchinitiative/phenopacket2prompt/output/CorrectResult.java
new file mode 100644
index 0000000..c8d714c
--- /dev/null
+++ b/src/main/java/org/monarchinitiative/phenopacket2prompt/output/CorrectResult.java
@@ -0,0 +1,6 @@
+package org.monarchinitiative.phenopacket2prompt.output;
+
+import org.monarchinitiative.phenol.ontology.data.TermId;
+
+public record CorrectResult(String promptFileName, TermId diseaseId, String diseaseLabel) {
+}
diff --git a/src/main/java/org/monarchinitiative/phenopacket2prompt/output/impl/english/PpktTextEnglish.java b/src/main/java/org/monarchinitiative/phenopacket2prompt/output/impl/english/PpktTextEnglish.java
index cd63858..5bf3c37 100644
--- a/src/main/java/org/monarchinitiative/phenopacket2prompt/output/impl/english/PpktTextEnglish.java
+++ b/src/main/java/org/monarchinitiative/phenopacket2prompt/output/impl/english/PpktTextEnglish.java
@@ -6,30 +6,14 @@ public class PpktTextEnglish implements PhenopacketTextGenerator {
     @Override
     public String QUERY_HEADER() {
         return  """
-I am running an experiment on a clinicopathological case conference to see how your diagnoses 
-compare with those of human experts. I am going to give you part of a medical case. These have 
-all been published in the New England Journal of Medicine. You are not trying to treat any patients.
-As you read the case, you will notice that there are expert discussants giving their thoughts. 
-In this case, you are “Dr. GPT-4,” an Al language model who is discussing the case along with 
-human experts. A clinicopathological case conference has several unspoken rules. The first is 
-that there is most often a single definitive diagnosis (though rarely there may be more than one),
-and it is a diagnosis that is known today to exist in humans. The diagnosis is almost always 
-confirmed by some sort of clinical pathology test or anatomic pathology test, though in 
-rare cases when such a test does not exist for a diagnosis the diagnosis can instead be 
-made using validated clinical criteria or very rarely just confirmed by expert opinion. 
-You will be told at the end of the case description whether a diagnostic test/tests are 
-being ordered, which you can assume will make the diagnosis/diagnoses. After you read the case, 
-I want you to give two pieces of information. The first piece of information is your most likely 
-diagnosis/diagnoses. You need to be as specific as possible -- the goal is to get the correct 
-answer, not a broad category of answers. You do not need to explain your reasoning, just give 
-the diagnosis/diagnoses. The second piece of information is to give a robust differential diagnosis, 
-ranked by their probability so that the most likely diagnosis is at the top, and the least likely 
-is at the bottom. There is no limit to the number of diagnoses on your differential. You can give 
-as many diagnoses as you think are reasonable. You do not need to explain your reasoning, 
-just list the diagnoses. Again, the goal is to be as specific as possible with each of the 
-diagnoses. 
-Do you have any questions, Dr. GPT-4?
+I am running an experiment on a clinical case report to see how your diagnoses compare with those of human experts. I am going to give you part of a medical case. You are not trying to treat any patients. In this case, you are “Dr. GPT-4,” an AI language model who is providing a diagnosis. Here are some guidelines. First, there is a single definitive diagnosis, and it is a diagnosis that is known today to exist in humans. The diagnosis is almost always confirmed by some sort of genetic test, though in rare cases when such a test does not exist for a diagnosis the diagnosis can instead be made using validated clinical criteria or very rarely just confirmed by expert opinion. After you read the case, I want you to give a differential diagnosis with a list of candidate diagnoses ranked by probability starting with the most likely candidate. Each candidate should be specified with the OMIM identifier and disease name. For instance, if the first candidate is Branchiooculofacial syndrome and the second is Cystic fibrosis, provide this:
 
+1. OMIM:113620 - Branchiooculofacial syndrome
+2. OMIM:219700 - Cystic fibrosis
+
+This list should provide as many diagnoses as you think are reasonable.
+
+You do not need to explain your reasoning, just list the diagnoses together with the OMIM identifiers. 
 Here is the case:
 
 """;
diff --git a/src/main/java/org/monarchinitiative/phenopacket2prompt/output/impl/spanish/PpktIndividualSpanish.java b/src/main/java/org/monarchinitiative/phenopacket2prompt/output/impl/spanish/PpktIndividualSpanish.java
index 2fd5a6b..5e5060b 100644
--- a/src/main/java/org/monarchinitiative/phenopacket2prompt/output/impl/spanish/PpktIndividualSpanish.java
+++ b/src/main/java/org/monarchinitiative/phenopacket2prompt/output/impl/spanish/PpktIndividualSpanish.java
@@ -343,7 +343,7 @@ private String lastEncounterAvailable(PhenopacketSex psex, PhenopacketAge lastEx
             // should never happen
             throw new PhenolRuntimeException("Did not recognize last exam age type " + lastExamAge.ageType());
         }
-        return String.format("The proband was a %s who presented with", individualDescription);
+        return String.format("El paciente era %s quien se presentó con", individualDescription);
     }
 
     /**
@@ -370,9 +370,9 @@ private String onsetAvailable(PhenopacketSex psex, PhenopacketAge onsetAge) {
 
     private String ageNotAvailable(PhenopacketSex psex) {
         return switch (psex) {
-            case FEMALE -> "The proband was a female who presented with";
-            case MALE -> "The proband was a male who presented with";
-            default -> "The proband presented with";
+            case FEMALE -> "La paciente se presentó con";
+            case MALE -> "El paciente se presentó con";
+            default -> "El paciente se presentó con";
         };
     }
 
diff --git a/src/main/java/org/monarchinitiative/phenopacket2prompt/output/impl/spanish/PpktTextSpanish.java b/src/main/java/org/monarchinitiative/phenopacket2prompt/output/impl/spanish/PpktTextSpanish.java
index 3ec19bf..c31542b 100644
--- a/src/main/java/org/monarchinitiative/phenopacket2prompt/output/impl/spanish/PpktTextSpanish.java
+++ b/src/main/java/org/monarchinitiative/phenopacket2prompt/output/impl/spanish/PpktTextSpanish.java
@@ -7,31 +7,16 @@ public class PpktTextSpanish implements PhenopacketTextGenerator {
     @Override
     public String QUERY_HEADER() {
         return  """
-Estoy realizando un experimento en una conferencia de casos clinicopatológicos para ver cómo sus diagnósticos\s
-se comparan con los de los expertos humanos. Les voy a dar parte de un caso médico. Estos han sido\s
-todos han sido publicados en el New England Journal of Medicine. Usted no está tratando a ningún paciente.
-Cuando lea el caso, observará que hay expertos que exponen sus opiniones.\s
-En este caso, usted es el "Dr. GPT-4", un modelo de lenguaje Al que está discutiendo el caso junto con expertos humanos.\s
-expertos humanos. Una conferencia clinicopatológica tiene varias reglas tácitas. La primera es\s
-que la mayoría de las veces hay un único diagnóstico definitivo (aunque rara vez puede haber más de uno),
-y se trata de un diagnóstico que hoy se sabe que existe en humanos. El diagnóstico casi siempre se\s
-confirmado mediante algún tipo de prueba de patología clínica o anatomopatológica, aunque en\s
-casos raros en los que no existe una prueba de este tipo para un diagnóstico, éste puede\s
-diagnóstico puede realizarse mediante criterios clínicos validados o, en muy raras ocasiones, simplemente confirmarse mediante la opinión de un experto.\s
-Al final de la descripción del caso se le indicará si se solicita alguna prueba o pruebas diagnósticas.\s
-diagnósticas, que puede suponer que harán el diagnóstico o diagnósticos. Después de leer el caso\s
-quiero que des dos datos. El primer dato es su diagnóstico o diagnósticos más probables.\s
-diagnóstico/diagnósticos. El objetivo es obtener la respuesta correcta, no una amplia categoría de respuestas.\s
-correcta, no una amplia categoría de respuestas. No es necesario que explique su razonamiento.\s
-el/los diagnóstico/s. El segundo dato es dar un diagnóstico diferencial sólido,\s
-ordenados por su probabilidad, de modo que el diagnóstico más probable esté arriba y el menos probable, abajo.\s
-esté en la parte inferior. El número de diagnósticos diferenciales es ilimitado. Puede dar\s
-Puede dar tantos diagnósticos como considere razonables. No es necesario que explique su razonamiento,\s
-sólo enumere los diagnósticos. De nuevo, el objetivo es ser lo más específico posible con cada uno de los\s
-diagnósticos.\s
-¿Tiene alguna pregunta, Dr. GPT-4?
-                                 
+Estoy realizando un experimento con el informe de un caso clínico para comparar sus diagnósticos con los de expertos humanos. Le voy a dar parte de un caso médico. Usted no está intentando tratar a ningún paciente. En este caso, usted es el “Dr. GPT-4”, un modelo de lenguaje de IA que proporciona un diagnóstico. Aquí hay algunas pautas. En primer lugar, existe un único diagnóstico definitivo, y es un diagnóstico que hoy se sabe que existe en humanos. El diagnóstico casi siempre se confirma mediante algún tipo de prueba genética, aunque en casos raros cuando no existe dicha prueba para un diagnóstico, el diagnóstico puede realizarse utilizando criterios clínicos validados o, muy raramente, simplemente confirmado por la opinión de un experto. Después de leer el caso, quiero que haga un diagnóstico diferencial con una lista de diagnósticos candidatos clasificados por probabilidad comenzando con el candidato más probable. Cada candidato debe especificarse con el identificador OMIM y el nombre de la enfermedad. Por ejemplo, si el primer candidato es el síndrome branquiooculofacial y el segundo es la fibrosis quística, proporcione lo siguiente:
+                
+1. OMIM:113620 - Síndrome branquiooculofacial
+2. OMIM:219700 - Fibrosis quística
+
+Esta lista debe proporcionar tantos diagnósticos como considere razonables.
+
+No es necesario que explique su razonamiento, simplemente enumere los diagnósticos junto con los identificadores OMIM.
 Este es el caso:
+             
 """;
     }