Skip to content

Thoughts & Lessons on SQL Modeling

Jonathan Bloedow edited this page Feb 8, 2024 · 1 revision

SQL Modeling

Thoughts and Conclusions from LASER Modeling Prototypes with SQL

I have been doing a month or two of experiments with Light Agent DataFrame modeling entirely in SQL. Most of this has been in SQLite, but also some comparisons in MySQL.

The motivation is that we were already thinking about Light Agent modeling as large numbers of agents where each agent consisted of list of properties or attributes. In my mind, this is a table or dataframe.

For example,

ID Node Age Infected Infection_Timer Incubation_Timer Immunity_Timer
0 0 25.2 1 5 2 0
1 7 13.3 1 7 0 0
2 33 5.6 0 0 0 -1
3 5 85.9 0 0 0 -1
4 10 44.1 1 2 0 0

And one of the most established and mature technologies for manipulating large tables of data is SQL.

Some quick notes that I'll develop into more detailed thoughts later:

  • The concept works really well. "Update all column A and B for all agents where x and y and z is true by doing ..." is what SQL is good at.
  • 1e5 agents can be modeled with very satisfactory performance.
  • 1e6 population with 80-90% EULA can also be modeled with satisfactory performance.
  • But no more than that, so far.
  • Indexing matters, as one would expect.
  • GPT/AI is really good at creating queries from English.
  • It's very cool being able to work completely declaratively.
  • Developing and debugging a model in a technology where the entire population is immediately and easily queryable is very nice.
  • Going to and from csv at any stage of the simulation is very powerful.
  • It's pretty easy to go to/from any other dataframe manipulating technology, including towards performant implementations like Numpy or c-accelerated numpy or numba.
Clone this wiki locally