Cumulative distribution of train passengers

Models the distribution of passengers inside a train along its journey. Higher density means higher probability of people in that position of the train carriage. This is the probability density function of a mixture distribution.

It assumes that the biggest/only factor in the spatial distribution of passengers is the location of "stairs" (stairs and escalators) on station platforms.

For every "stair" location, two beta distributions are generated, one with a smaller and one with a larger variance, along with a uniform distribution that carries a smaller weight. The three are weighted and summed to form a mixture distribution for that stair. The stair pdfs are then averaged over all stairs at a station, giving a pdf for boarders at that station. Some passengers alight whilst others board, so the distribution after each station is another mixture distribution: the distribution of passengers already on the train, plus the distribution of boarders, minus the distribution of alighting passengers. For simplicity, the distribution of alighting passengers is assumed to be uniform.

The bright green lines represent the x-position of the stairs at every station. The colored lines are the probability density functions of the spatial distribution of passengers along the 1D train.

As the train moves from Tokyo to Kanda, some passengers alight the train and some board it. Thus the cumulative distribution of the train after Kanda is a mixture of the Tokyo and Kanda distributions. This is why the KDE for Kanda still resembles Tokyo.
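The blending step described above can be sketched on a discretised train as follows. This is a minimal illustration, not the repository's code: the function names, the 4-cell grid and the 25% boarding share are all hypothetical, and it follows the recursion given under Equations, $m_i = m_{i-1}(1 - p^b_i) + b_i p^b_i$.

```rust
/// Blend the density of passengers already on board with the density of
/// boarders at one station. `p_board` is the share of passengers on the
/// departing train who boarded at this station (p^b_i in the equations).
fn update_density(onboard: &[f64], boarders: &[f64], p_board: f64) -> Vec<f64> {
    assert_eq!(onboard.len(), boarders.len());
    assert!((0.0..=1.0).contains(&p_board));
    onboard
        .iter()
        .zip(boarders)
        .map(|(m, b)| m * (1.0 - p_board) + b * p_board)
        .collect()
}

/// Riemann-sum integral of a density sampled on equal-width cells over [0, 1].
fn integral(density: &[f64]) -> f64 {
    density.iter().sum::<f64>() / density.len() as f64
}

fn main() {
    // Hypothetical densities on a 4-cell grid; each integrates to 1.
    let tokyo = vec![2.0, 1.0, 0.5, 0.5];
    let kanda_boarders = vec![0.5, 0.5, 1.0, 2.0];

    // Assumed value: 25% of passengers leaving Kanda boarded there.
    let after_kanda = update_density(&tokyo, &kanda_boarders, 0.25);

    println!("after Kanda: {:?}", after_kanda);
    println!("integral: {}", integral(&after_kanda));
}
```

Because the two weights sum to 1, the result is still a density, and with a small boarding share it stays close in shape to the Tokyo density.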

Ochanomizu and Yotsuya have stairs at the far end of the platform, with the latter's actually lying beyond the train carriage. The result is an increase in the density of passengers on the left side of the train.

This chart shows the same data but in the same plot for easier inter-station comparison. The density on the left (front of train) after Ochanomizu and Yotsuya is immediately observable.

Equations

The probability density function m of passenger spatial distribution for every station i is:

$$m_0=b_0$$

$$m_i=(m_{i-1}\times (1 - p^b_i))+(b_i\times p^b_i)$$

$$b_i=\sum_{j=0}^{n_j}\frac{S_j}{n_j}$$

$$S_j=(B_c\times p_c) + (B_f\times p_f) + (U\times p_u)$$

- $b_i$ is the distribution of passengers boarding the train at station $i$
- $p^b_i$ is the proportion of total passengers that are boarders from station $i$
  - The current implementation models alighting passengers with a uniform distribution
  - $1 - p^b_i$ is the proportion of passengers alighting at station $i$
  - Calculated from link load (origin-destination) data
  - $p^b_i + (1 - p^b_i) = 1$ and both are >= 0
- $j$ is the j-th stair at station $i$
- $n_j$ is the number of stairs at station $i$
  - Data from station platform layout map from the JR website
- $S_j$ is the distribution of boarders coming from stair $j$
- $B$ is the pdf of the beta distribution; $B_c$ means with a small variance and $B_f$ means with a large variance
- $U$ is the pdf of the uniform distribution (supported on the platform boundaries)
- $p_c$ is the proportion of boarders from a particular stair with the small variance spatial process
  - close_concentration = 20.
  - prop_normal_close = 0.3
- $p_f$ is the proportion of boarders from a particular stair with the large variance spatial process
  - far_concentration = 7.
  - prop_normal_far = 0.6
- $p_u$ is the proportion of boarders from a particular stair with the uniform random spatial process
  - prop_uniform = 0.1
- $p_c + p_f + p_u = 1$ and all three are >= 0
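As a rough sketch of how $S_j$ and $b_i$ could be evaluated, the snippet below builds both on a grid over a train normalised to [0, 1]. Several things here are assumptions rather than facts about the repository: the function names, the grid size, the stair positions, the use of the mode and concentration parametrisation of the beta distribution (with the stair position as the mode), and the numerical normalisation that stands in for the exact beta normalising constant. Only the five constants come from the list above.

```rust
// Values listed in the definitions above.
const CLOSE_CONCENTRATION: f64 = 20.0;
const FAR_CONCENTRATION: f64 = 7.0;
const PROP_NORMAL_CLOSE: f64 = 0.3;
const PROP_NORMAL_FAR: f64 = 0.6;
const PROP_UNIFORM: f64 = 0.1;

/// Beta pdf sampled at the midpoints of `n` equal cells on [0, 1].
/// Assumption: shape parameters are derived from a mode and a concentration
/// (a = mode * (k - 2) + 1, b = (1 - mode) * (k - 2) + 1, valid for k > 2).
/// The pdf is normalised numerically so no gamma function is needed.
fn beta_pdf_grid(mode: f64, concentration: f64, n: usize) -> Vec<f64> {
    assert!(concentration > 2.0);
    let a = mode * (concentration - 2.0) + 1.0;
    let b = (1.0 - mode) * (concentration - 2.0) + 1.0;
    let raw: Vec<f64> = (0..n)
        .map(|k| {
            let x = (k as f64 + 0.5) / n as f64;
            x.powf(a - 1.0) * (1.0 - x).powf(b - 1.0)
        })
        .collect();
    let area = raw.iter().sum::<f64>() / n as f64;
    raw.iter().map(|v| v / area).collect()
}

/// S_j: weighted sum of a narrow beta, a wide beta and a uniform pdf.
/// The uniform pdf on [0, 1] is the constant 1.
fn stair_pdf(stair: f64, n: usize) -> Vec<f64> {
    let close = beta_pdf_grid(stair, CLOSE_CONCENTRATION, n);
    let far = beta_pdf_grid(stair, FAR_CONCENTRATION, n);
    close
        .iter()
        .zip(&far)
        .map(|(c, f)| c * PROP_NORMAL_CLOSE + f * PROP_NORMAL_FAR + 1.0 * PROP_UNIFORM)
        .collect()
}

/// b_i: equal-weight average of the stair pdfs at one station.
fn boarders_pdf(stairs: &[f64], n: usize) -> Vec<f64> {
    let mut total = vec![0.0; n];
    for &s in stairs {
        for (t, v) in total.iter_mut().zip(stair_pdf(s, n)) {
            *t += v / stairs.len() as f64;
        }
    }
    total
}

fn main() {
    // Hypothetical station with stairs at 20% and 70% of the train length.
    let b = boarders_pdf(&[0.2, 0.7], 200);
    let area = b.iter().sum::<f64>() / b.len() as f64;
    println!("integral of b_i: {:.6}", area);
}
```

Since every component integrates to 1 and the weights sum to 1, the boarders' density integrates to 1 as well, and the uniform term keeps it at or above 0.1 everywhere along the train.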

This assumes each stair in the station is equally important, which might not be true, as passengers might predominantly come from particular stairs. The equation can easily be adapted to support data for stair traffic:

$$b_i=\sum_{j=0}^{n_j}S_j\times p_j$$

Where $p_j$ is the probability of a passenger coming from stair $j$, and $\sum_{j=0}^{n_j}p_j$ must equal 1. Setting every $p_j$ to $\frac{1}{n_j}$ recovers the equal-weight case.
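A stair-weighted mixture could look like the sketch below, where each stair density is multiplied by that stair's share of boarders before summing. The function name, the 4-cell densities and the 75/25 split are hypothetical; no stair traffic data is assumed to exist in the repository.

```rust
/// Boarders' density as a weighted sum of per-stair densities.
/// `weights[j]` is the assumed share of boarders arriving via stair j.
fn weighted_boarders_pdf(stair_pdfs: &[Vec<f64>], weights: &[f64]) -> Vec<f64> {
    assert_eq!(stair_pdfs.len(), weights.len());
    let total: f64 = weights.iter().sum();
    assert!((total - 1.0).abs() < 1e-9, "stair weights must sum to 1");

    let n = stair_pdfs.first().map_or(0, |s| s.len());
    let mut out = vec![0.0; n];
    for (pdf, &w) in stair_pdfs.iter().zip(weights) {
        assert_eq!(pdf.len(), n);
        for (o, v) in out.iter_mut().zip(pdf) {
            *o += w * v;
        }
    }
    out
}

fn main() {
    // Hypothetical per-stair densities on a 4-cell grid.
    let stairs = vec![vec![2.0, 1.0, 0.5, 0.5], vec![0.5, 0.5, 1.0, 2.0]];

    // Assumed traffic split: 75% of boarders use the first stair.
    let b = weighted_boarders_pdf(&stairs, &[0.75, 0.25]);
    println!("{:?}", b);
}
```

Passing equal weights gives the same result as averaging the stair densities.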

The beta distribution is used because it is better suited to modelling proportions, which are bounded between 0 and 1 (exclusive). Values exactly at 0 and 1 are mapped to 0.01 and 0.99 before being passed to the beta distribution. A normal distribution would cause edge effects at the boundaries if values outside the boundary were clamped. The alternative would be to ignore those values, but then the integral of the "pdf" would be less than 1.
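To make the boundary problem concrete, the sketch below measures how much of a normal density falls inside the train when the stair sits at the very front, and shows one way the 0.01 and 0.99 mapping might be written. The standard deviation of 0.1, the function names, and the choice to clamp every out-of-range position (not only exact 0 and 1) are assumptions for illustration.

```rust
use std::f64::consts::PI;

/// Pdf of a normal distribution.
fn normal_pdf(x: f64, mean: f64, sd: f64) -> f64 {
    let z = (x - mean) / sd;
    (-0.5 * z * z).exp() / (sd * (2.0 * PI).sqrt())
}

/// Probability mass of a normal that lies inside [0, 1], using the
/// midpoint rule on `n` equal cells.
fn mass_inside_unit_interval(mean: f64, sd: f64, n: usize) -> f64 {
    (0..n)
        .map(|k| normal_pdf((k as f64 + 0.5) / n as f64, mean, sd))
        .sum::<f64>()
        / n as f64
}

/// Keep a normalised stair position strictly inside (0, 1).
/// Assumption: anything outside [0.01, 0.99] is treated like 0 or 1.
fn clamp_position(x: f64) -> f64 {
    x.clamp(0.01, 0.99)
}

fn main() {
    // Hypothetical stair at the very front of the train.
    let inside = mass_inside_unit_interval(0.0, 0.1, 10_000);
    println!("normal mass inside the train: {:.3}", inside);
    println!("clamped stair position: {}", clamp_position(0.0));
}
```

With the stair at position 0, about half of the normal's mass lies outside the train, which is the missing integral described above. A beta distribution has no mass outside (0, 1), so the problem does not arise.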

Potential extensions

- Easily adjustable variables (the parameters used for the beta distributions and their weights are currently hardcoded in `stair_pdfs_sep`, as listed above)
- Consider that some alighting passengers will exit at a location close to the stairs they plan to go through (currently a uniform distribution is subtracted)
- Consider that some passengers will board at a location convenient for their destination station
- Consider variables such as shelter (for rainy weather)
- Proper origin-destination data (the current OD data is probably commuter tickets only)
- Evaluate the model against real-world data of passenger distributions
- Crowd simulation to model passengers dispersing throughout the train, as passengers do not mindlessly cluster together when there is space along the train

Applications

- Suggest where passengers should wait to mitigate overcrowding
- Price advertisements on the platform based on crowd sizes
- Inform future station layout design
- Understand spatial processes of people in transit through detailed slices of time
- Act as a substitute when real-world measurements of passenger distributions are unavailable

Usage

- Install Rust and Cargo
- `mkdir out`
- `cargo run`

Data sources

- Station platform layout
  - https://www.jreast.co.jp/map/
  - e.g. https://www.jreast.co.jp/estation/stations/1039.html
- Link load (origin-destination) data
  - https://www.mlit.go.jp/sogoseisaku/transport/sosei_transport_tk_000035.html

References

Kruschke, John K. (2015). Doing Bayesian Data Analysis (Second Edition). Chapter 6.2

Q&A

Why not use Python?

Well I initially did use it. But pdm sync didn't work at all because apparently I have a python package without a name or package metadata, so pdm crashes because it assumes there always is metadata.

Well whatever I'll just not use pdm and just tell users to "download this list of dependencies and hope it works".

Unfortunately matplotlib didn't work because some dynamically linked library was the wrong version. No problem, there's a workaround. But it didn't fix the broken linking. I went to my marey repo, and it crashes because the workaround is script-specific.

I solved the issue by disabling conda. But now I have no pdm, no conda, just relying on system Python. Which is guaranteed to break in an update three months later.

I just can't use something that is guaranteed to break every three months.
