Telcell is a collection of scripts that can be used to determine the strength of the evidence that a pair of phones was used by the same person.
- Python 3.10
- Make sure the requirements are installed: `pip install -r requirements.txt`
To run the tests:

```bash
pip install -r test-requirements.txt
coverage run --branch --source telcell --module pytest --strict-markers tests/
```
The script `example.py` contains information and an example pipeline to run the library. It uses test data that is included in the repository:

```bash
python example.py
```

It should return output like:

```
DummyModel: [1.0, 1.0, 1.0, 1.0]
MeasurementPairClassifier: [1.0, 1.0, 1.0, 1.0]
```
At the moment only a CSV file can be used as input, but an extra input source can easily be added if necessary. The following columns are expected to be present in the CSV file:
- `owner`
- `device`
- `cellinfo.wgs84.lat`
- `cellinfo.wgs84.lon`
- `timestamp`
Any additional columns are stored under the `extra` attribute of each resulting `Measurement` object.
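A minimal sketch of what a conforming input file could look like and how the expected columns can be checked. The rows and the `operator` column below are invented for illustration:

```python
import csv
import io

# Hypothetical measurements file with the columns telcell expects;
# the values and the extra `operator` column are made up.
SAMPLE = """owner,device,cellinfo.wgs84.lat,cellinfo.wgs84.lon,timestamp,operator
alice,phone_a,52.0907,5.1214,2023-05-01T12:00:00+00:00,provider_x
alice,phone_a,52.0908,5.1220,2023-05-01T12:05:00+00:00,provider_x
"""

EXPECTED = {"owner", "device", "cellinfo.wgs84.lat", "cellinfo.wgs84.lon", "timestamp"}

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
missing = EXPECTED - rows[0].keys()   # empty set: all expected columns present
extra = rows[0].keys() - EXPECTED     # {'operator'}: ends up under `extra`
print("missing columns:", missing)
print("extra columns:", extra)
```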
After parsing, the data is stored as `Track` and `Measurement` objects. Each `Track` has an `owner`, a `device` and a sequence of `Measurement` objects. A `Measurement` consists of `coords` (a `Point` object), a `timestamp` (a `datetime`) and an `extra` mapping (a `Mapping`) for additional information.
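As a rough sketch (not the library's actual class definitions; telcell's `coords` is a `Point` object, a plain tuple is used here as a stand-in), the data model described above could look like:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Mapping, Sequence, Tuple

@dataclass
class Measurement:
    coords: Tuple[float, float]            # (lat, lon); stand-in for a Point
    timestamp: datetime
    extra: Mapping[str, str] = field(default_factory=dict)

@dataclass
class Track:
    owner: str
    device: str
    measurements: Sequence[Measurement]

# Build one track with a single measurement (invented values).
m = Measurement(
    coords=(52.0907, 5.1214),
    timestamp=datetime(2023, 5, 1, 12, 0, tzinfo=timezone.utc),
    extra={"operator": "provider_x"},
)
track = Track(owner="alice", device="phone_a", measurements=[m])
print(track.owner, track.device, len(track.measurements))
```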
The next step is data processing or crunching: the data is transformed into a format that can be used by the models. Although custom models can be used with telcell, a number of models have already been implemented:
- `DummyModel`: always returns an LR of 1.
- `MeasurementPairClassifier`: generates colocated and dislocated training pairs from the tracks, based on the time interval and the rarity of the location. Fits a logistic regression model and an ELUB-bounded KDE calibrator. Returns LRs for the training data.
- `RarePairModel`: uses coverage data (GPS locations with corresponding antennas) to fit coverage models for each time-interval bin. For the validation/case data, a registration pair is chosen from a pair of tracks, such that the registrations are within a specific time interval and the location of the second measurement is the rarest with respect to the background of `track_b`. For this pair a calibrated score is given by the model/calibrator, which is subsequently used to calculate the LR.
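The score-to-LR step shared by these models can be illustrated with a density-ratio sketch: estimate the score distribution for colocated pairs and for dislocated pairs, then take their ratio at the observed score. This shows only the general idea, with synthetic scores and a plain fixed-bandwidth Gaussian KDE, not telcell's ELUB-bounded calibrator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy calibration scores: colocated (same-person) pairs tend to score
# high, dislocated pairs low. These numbers are synthetic.
colocated = rng.normal(2.0, 1.0, 500)
dislocated = rng.normal(-2.0, 1.0, 500)

def kde_pdf(samples, x):
    """Minimal Gaussian KDE with Silverman's rule-of-thumb bandwidth."""
    h = 1.06 * samples.std() * len(samples) ** (-1 / 5)
    z = (x - samples[:, None]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=0) / (len(samples) * h * np.sqrt(2 * np.pi))

def lr(score):
    """LR = density under same-source / density under different-source."""
    x = np.atleast_1d(float(score))
    return kde_pdf(colocated, x)[0] / kde_pdf(dislocated, x)[0]

print(lr(2.0))   # well above 1: supports same-source
print(lr(-2.0))  # well below 1: supports different-source
```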
For evaluation we use the library lrbenchmark. A `Setup` object is created with the `run_pipeline` function, the models that have to be evaluated, the necessary parameters and the data itself. All different combinations are evaluated, resulting in multiple LRs that can be used to determine whether the two phones were carried by the same person or not.
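For interpretation: an LR above 1 supports the same-source hypothesis, an LR below 1 the different-source hypothesis, and log10(LR) is commonly reported. A small illustration with made-up LR values:

```python
import math

# Invented LR values, only to illustrate how the output is read.
lrs = [12.0, 0.8, 1.0, 0.05]
for value in lrs:
    verdict = ("same-source" if value > 1
               else "different-source" if value < 1
               else "uninformative")
    print(f"LR={value:>6}  log10(LR)={math.log10(value):+.2f}  supports: {verdict}")
```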
Instead of using the pipeline for evaluation, there is also functionality to write the generated LRs to a file: replace `make_output_plots` with `write_lrs` (with the required parameters) and the LRs will be written to file.
The command to run the dashboard from the repository root is:

```bash
PYTHONPATH=. streamlit run data_analysis/dashboard.py -- --file-name <PATH_TO_FILE>
```

Setting `PYTHONPATH` is necessary so that streamlit can find the imported files correctly.
The dashboard contains two applications:

- Tracks and pairs: visualizes the registrations and pairs from a `measurements.csv` file (so this can be either casework or validation measurements). `<PATH_TO_FILE>` refers to the path of this `measurements.csv`; you can overwrite the default path with a different one. The extra `--` in the command above is necessary for streamlit to pass these arguments to the script instead of parsing them itself.
- Travel speed: visualizes the travel speeds found in a `measurements.csv` file that is provided via `<PATH_TO_FILE>`.
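The travel-speed idea behind the second application can be sketched as follows: the speed between two consecutive registrations is the great-circle distance between their coordinates divided by the time difference. The coordinates and timestamps below are invented:

```python
from datetime import datetime, timezone
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two WGS84 points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Two consecutive registrations, 30 minutes apart (invented data,
# roughly Utrecht -> Amsterdam).
t1 = datetime(2023, 5, 1, 12, 0, tzinfo=timezone.utc)
t2 = datetime(2023, 5, 1, 12, 30, tzinfo=timezone.utc)
dist = haversine_km(52.0907, 5.1214, 52.3676, 4.9041)
speed = dist / ((t2 - t1).total_seconds() / 3600)
print(f"{dist:.1f} km in 30 min -> {speed:.1f} km/h")
```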