Update README.md

julienijs · May 19, 2024 · 570d674
1 parent c4c7f53
commit 570d674
Showing 1 changed file with 5 additions and 5 deletions.
diff --git a/README.md b/README.md
@@ -35,21 +35,21 @@ The dataset used in this experiment is based on the the multilingual parallel ED
 
 3 new datasets were derived from the EDGe corpus:
 - EDGe_Zipped_Sizes.xlsx: contains the sizes of all the files in EDGe when they are zipped
-- morph_zipped_all.xlsx: contains the sizes of all the files in EDGe when they are morphologically distorted and then zipped over 1000 iterations
-- synt_zipped_all.xlsx: contains the sizes of all the files in EDGe when they are syntactically distorted and then zipped over 1000 iterations
+- EDGe_Morph_Zipped.xlsx: contains the sizes of all the files in EDGe when they are morphologically distorted and then zipped over 1000 iterations
+- EDGe_Synt_zipped.xlsx: contains the sizes of all the files in EDGe when they are syntactically distorted and then zipped over 1000 iterations
 
 ### Workflow & code
 #### Step 1: create a file with all the file sizes of the zipped files in EDGe - EDGe_Zipped_Sizes.xlsx
 In order to create EDGe_Zipped_Sizes.xlsx first all the files in the dataset need to be zipped. This is done by running gzip_files.py. The second step is retrieving all the file sizes of the zipped files. This is done by running file_size.py on the newly created zipped files.
 
 #### Step 2: morphological distortion - morph_zipped_all.xlsx
-In this step all files are first morphologically distorted and subsequently zipped. Morphological distortion is achieved as described above, by randomly deleting 10% of all characters in the file. For each file this is done 1000 times and each time the size of the file is stored in morph_zipped_all.xlsx. This step requires morphological_distortion_pipeline.py.
+In this step all files are first morphologically distorted and subsequently zipped. Morphological distortion is achieved as described above, by randomly deleting 10% of all characters in the file. For each file this is done 1000 times and each time the size of the file is stored in EDGe_Morph_Zipped.xlsx. This step requires morphological_distortion_pipeline.py.
 
 #### Step 3: syntactic distortion - synt_zipped_all.xlsx
-In this step all files are first syntactically distorted and subsequently zipped. Syntactic distortion is achieved as described above, by randomly deleting 10% of all words in the file. For each file this is done 1000 times and each time the size of the file is stored in synt_zipped_all.xlsx. This step requires syntactic_distortion_pipeline.py.
+In this step all files are first syntactically distorted and subsequently zipped. Syntactic distortion is achieved as described above, by randomly deleting 10% of all words in the file. For each file this is done 1000 times and each time the size of the file is stored in EDGe_Synt_zipped.xlsx. This step requires syntactic_distortion_pipeline.py.
 
 #### Step 4: statistical analysis in R
-The statistical analysis of the created datasets (input = EDGe_Zipped_Sizes.xlsx, morph_zipped_all.xlsx and synt_zipped_all.xlsx) is done by running complexity_analysis.R. The script calculates the morphological and syntactic complexity as described above. The output of this script are graphs in .png format.
+The statistical analysis of the created datasets (input = EDGe_Zipped_Sizes.xlsx, EDGe_Morph_Zipped.xlsx and EDGe_Synt_zipped.xlsx) is done by running complexity_analysis.R. The script calculates the morphological and syntactic complexity as described above. The output of this script are graphs in .png format.
 
 ### Result
 ![Syntactic vs morphological complexity ratio](https://user-images.githubusercontent.com/107923146/212687027-2c4eaac4-89a9-45b5-b8bf-000191aa7c16.png)
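
Note on Steps 1 to 3 of the README shown in this diff: the scripts themselves (gzip_files.py, morphological_distortion_pipeline.py, syntactic_distortion_pipeline.py) are not part of this commit, so the following is only a minimal sketch of the procedure the README describes, not the repository's implementation. All function names are hypothetical. It assumes UTF-8 text, in-memory gzip compression (whose byte count can differ slightly from the size of a gzipped file on disk), and whitespace tokenisation for "words", which collapses line breaks.

```python
import gzip
import random


def gzip_size(text: str) -> int:
    """Size in bytes of the gzip-compressed UTF-8 encoding of `text`."""
    return len(gzip.compress(text.encode("utf-8")))


def delete_chars(text: str, fraction: float, rng: random.Random) -> str:
    """Morphological distortion (assumed form): drop a random `fraction` of characters."""
    n_drop = round(len(text) * fraction)
    drop = set(rng.sample(range(len(text)), n_drop))
    return "".join(ch for i, ch in enumerate(text) if i not in drop)


def delete_words(text: str, fraction: float, rng: random.Random) -> str:
    """Syntactic distortion (assumed form): drop a random `fraction` of
    whitespace-separated words and rejoin the rest with single spaces."""
    words = text.split()
    n_drop = round(len(words) * fraction)
    drop = set(rng.sample(range(len(words)), n_drop))
    return " ".join(w for i, w in enumerate(words) if i not in drop)


def distorted_sizes(text, distort, iterations=1000, fraction=0.10, seed=0):
    """Distort and compress `text` repeatedly; return one compressed size per iteration."""
    rng = random.Random(seed)  # fixed seed so a run can be reproduced
    return [gzip_size(distort(text, fraction, rng)) for _ in range(iterations)]
```

A call such as `distorted_sizes(text, delete_chars)` would yield the 1000 sizes that the README says are stored per file in EDGe_Morph_Zipped.xlsx, and `distorted_sizes(text, delete_words)` the ones for EDGe_Synt_zipped.xlsx. Writing the results to .xlsx is left out here because the README does not say how the spreadsheets are produced.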