-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy path2- BFMInstructions
32 lines (22 loc) · 2.39 KB
/
2- BFMInstructions
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
In order to use the tools, specific yet easy steps must be followed during the extraction of data from BFM in order to produce the adequate dataframe. What follows is the methodology to apply whenever this tool is used.
-1-Search for data on database BFM
Conduct a search, with Corpus Query Language (CQL) if advanced, in order to obtain the desired data from the database.
-2-Export the data from database BFM
BFM allows you to export in .txt (document texte) from the database once you've gathered the data. The data needs to be ordered according to the reference criteria and the reference criteria needs to be set on "text:datecompo". This .txt document needs to be imported in a new excel tab.
-3-Import the raw data excel document in an ordered excel table
After having opened a new excel file, click on "Données" and "A partir du texte" to import our downloaded text document of raw data exported from the database BFM in step 2. During importation, the "Types de données d’origine" has to be "Délimité" in order to create clear cells and the "Origine du fichier", or the origin of the file, needs to be set on "65001 : Unicode (UTF-8)". Then, in the list of "Séparateurs", "Tabulation" has to be ticked and the "Identificateur de texte" is to be set on "aucun". The import should present a clear table of organized data.
-4-Inform the ordered excel table
A line needs to be added above the first line in order to have the space required to title the columns. Four should be precised, from left to right : "Date", the date of the occurence, "LC", the left context, "Forme", the form of the occurence and "RC", the right context.
-5-Import the finished ordered excel table to the code
This import needs to be execute at the very beginning of the code. It is required in the first cell, which is represented here:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
webbrowser.open('http://bfm.ens-lyon.fr')
excel = pd.read_excel ('Name.xlsx')
f= excel.drop(columns=['LC','RC'])
x=input('insert first forme')
df = f[(f['Forme']==x)]
All that is required to import the finished file is to replace "Name" by the actual name given to the saved file. After that, the code needs to be run and self-explainatory direction will be given during its course (such as "insert first form").
Code written by : Giovanni Merici
Data extraction instructions written by : Axelle Domingues