Developer Notes

These notes are intended for developers working on the internals of Joker itself. They are not comprehensive.

Library Code (Namespaces)

As with Clojure, Joker supports "libraries" of code organized into namespaces. It offers a number of namespaces that are built into the Joker executable itself, as well as the ability to dynamically (at run time and on-demand) extend these namespaces via external Joker source files typically organized into directory trees and deployed alongside the Joker executable. (Currently, Joker does not support dynamic extension via non-Joker code, such as Go plugins.)

Whether built-in (as described below) or separately deployed via source files written in Joker (as described in Organizing Libraries (Namespaces)), developers should be aware of the progression of any given namespace.

Namespace States

The states through which a given namespace transitions are:

  1. Available
  2. Mapped
  3. Loaded

Available Namespaces

A namespace is available if its source code is either:

  • compiled (in some form) directly into the Joker executable
  • deployed as Joker code such that a running Joker executable can locate and load it

The compiled namespaces are also the built-in namespaces, and are described below. These start out mapped, though not necessarily loaded; the core namespaces are loaded on-demand when first referenced. (joker.core is referenced immediately upon startup of the Joker executable; joker.repl is as well, when running Joker as a REPL.)

The set of core namespaces is hardcoded (in any given Joker executable) in joker.core/*core-namespaces*; this list is built automatically from information generated by core/gen_code/gen_code.go.

Other built-in namespaces, the so-called std namespaces, start out (like the core namespaces) as both available and mapped.

A namespace that is available, but not mapped, is not found via e.g. (the-ns 'megacorp.biz.logic). Only when first referenced (via :require or similar) is it searched for and then (if found) loaded.

There's (currently) no formal predicate for whether a namespace is available (due to being deployed). One could wrap a reference within a try, as in (try (require 'megacorp.biz.logic) (catch Error e ...)); however, this would have the side-effect (in the success case) of mapping and loading the namespace, so is not a pure predicate.

Mapped Namespaces

A namespace is mapped if it is present in (all-ns), which enumerates all the namespaces mapped into the current (global) environment.

In this state, the namespace is "registered" (to coin a synonym) with the canonical Clojure namespace mechanism as implemented by Joker.

But the namespace itself hasn't yet necessarily been initialized. Only when that happens (potentially "lazily") is the namespace said to be loaded.

Loaded Namespaces

When actually needed, via a :require clause in an (ns ...) specification, due to (require ...), or (for an already-mapped namespace) directly as a symbol qualifier via e.g. joker.some.namespace/somevar, a namespace is loaded, meaning its internal code and data structures are fully initialized.
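The three states can be modeled as a small lookup over two tables. The following Go sketch is illustrative only: NsState, Registry, and the extra "unavailable" value are names invented here, not types taken from Joker's source, and the deployed set merely stands in for Joker's search for a .joke file.

```go
package main

import "fmt"

// NsState models the states described above. These names are illustrative;
// Joker's own Go code does not necessarily define such a type.
type NsState int

const (
	Unavailable NsState = iota // not built in and no deployed source found
	Available                  // source can be located, but nothing is registered
	Mapped                     // registered (would appear in (all-ns)), not initialized
	Loaded                     // fully initialized
)

func (s NsState) String() string {
	return [...]string{"unavailable", "available", "mapped", "loaded"}[s]
}

// Registry is a toy stand-in for the global environment.
type Registry struct {
	deployed map[string]bool // hypothetical: namespaces with a locatable source file
	mapped   map[string]bool // mapped namespace name -> initialized?
}

func NewRegistry(builtin, deployed []string) *Registry {
	r := &Registry{deployed: map[string]bool{}, mapped: map[string]bool{}}
	for _, n := range builtin {
		r.mapped[n] = false // built-ins start out mapped but not initialized
	}
	for _, n := range deployed {
		r.deployed[n] = true
	}
	return r
}

// State classifies a namespace name into one of the states above.
func (r *Registry) State(name string) NsState {
	if initialized, ok := r.mapped[name]; ok {
		if initialized {
			return Loaded
		}
		return Mapped
	}
	if r.deployed[name] {
		return Available
	}
	return Unavailable
}

// Require mimics (require 'name): it maps and loads anything that is at
// least available, and fails otherwise.
func (r *Registry) Require(name string) error {
	if r.State(name) == Unavailable {
		return fmt.Errorf("no namespace: %s found", name)
	}
	r.mapped[name] = true
	return nil
}

func main() {
	r := NewRegistry([]string{"joker.os"}, []string{"a.b.c"})
	fmt.Println(r.State("joker.os"), r.State("a.b.c"), r.State("joker.foo"))
	_ = r.Require("a.b.c")
	fmt.Println(r.State("a.b.c"))
}
```

Note that Require has the same side effect discussed earlier: asking whether a namespace can be loaded is not separable, in this model, from actually mapping and loading it.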

For example, running Joker with the --verbose option to observe some of the pertinent transitions (and with a two-line Joker script in a/b/c.joke that does (ns a.b.c) and (println "here i am!")):

$ joker --verbose
Lazily running fast version of string.InternsOrThunks().
NamespaceFor: Lazily initialized joker.string for joker.repl
NamespaceFor: Lazily initialized joker.repl for FindNameSpace
Welcome to joker v0.14.2. Use EOF (Ctrl-D) or SIGINT (Ctrl-C) to exit.
user=> (all-ns)
(joker.walk joker.template joker.io joker.json joker.base64 joker.csv joker.filepath joker.url joker.html user joker.core joker.hiccup joker.strconv joker.better-cond joker.bolt joker.crypto joker.math joker.os joker.uuid joker.yaml joker.string joker.test joker.pprint joker.hex joker.http joker.repl joker.set joker.tools.cli joker.time)
user=> joker.core/*core-namespaces*
#{joker.tools.cli joker.test user joker.template joker.core joker.walk joker.set joker.repl joker.hiccup joker.pprint joker.better-cond}
user=> (the-ns 'joker.hiccup)
Lazily running fast version of html.InternsOrThunks().
NamespaceFor: Lazily initialized joker.html for joker.hiccup
NamespaceFor: Lazily initialized joker.hiccup for FindNameSpace
#object[Namespace "joker.hiccup"]
user=> (use 'joker.hiccup)
nil
user=> (use 'joker.template)
NamespaceFor: Lazily initialized joker.walk for joker.template
NamespaceFor: Lazily initialized joker.template for FindNameSpace
nil
user=> (the-ns 'joker.hiccup)
#object[Namespace "joker.hiccup"]
user=> (the-ns 'a.b.c)
<repl>:7:10: Exception: No namespace: a.b.c found
Stacktrace:
  global <repl>:7:1
  core/the-ns <joker.core>:2316:18
user=> (use 'a.b.c)
here i am!
nil
user=> (all-ns)
(joker.string joker.test joker.pprint joker.hex joker.http joker.uuid joker.yaml joker.repl joker.set joker.tools.cli joker.time joker.walk joker.template joker.io joker.json joker.base64 joker.csv joker.filepath joker.url joker.html user joker.core joker.hiccup joker.better-cond joker.bolt joker.crypto joker.math joker.os joker.strconv a.b.c)
user=> (defn all-ns-as-set-of-strings [] (set (map str (all-ns))))
#'user/all-ns-as-set-of-strings
user=> (all-ns-as-set-of-strings)
#{"joker.crypto" "joker.strconv" "joker.pprint" "joker.csv" "joker.io" "joker.string" "joker.url" "joker.template" "joker.core" "joker.tools.cli" "joker.uuid" "joker.html" "joker.set" "joker.hex" "joker.time" "joker.json" "joker.bolt" "joker.hiccup" "user" "joker.yaml" "joker.filepath" "joker.repl" "joker.os" "joker.base64" "joker.better-cond" "joker.math" "joker.test" "joker.http" "joker.walk" "a.b.c"}
user=> ((all-ns-as-set-of-strings) "a.b.c")
"a.b.c"
user=> ((all-ns-as-set-of-strings) "joker.foo")
nil
user=> (joker.core/ns-initialized? 'joker.os)
false
user=> (joker.core/ns-initialized? 'joker.hiccup)
true
user=> (joker.core/ns-initialized? 'joker.html)
true
user=> (joker.core/ns-initialized? 'a.b.c)
true
user=>

First, note that joker.string is lazily initialized. This is due to running Joker as a REPL, because that automatically loads joker.repl, which in turn requires joker.string.

Then, (the-ns 'joker.hiccup) explicitly loads that namespace, meaning that it is initialized. (It needn't be initialized again during this same invocation of Joker.)

Further, because joker.hiccup requires joker.html (an already-mapped namespace), the latter is loaded (lazily initialized).

Similarly, referencing joker.template causes joker.walk to also be loaded (initialized). As a core namespace, joker.walk doesn't define an InternsOrThunks() function that needs to be called.

(the-ns 'a.b.c) then fails because a.b.c is not mapped. But because it is available (a/b/c.joke exists), it can be referenced, and thus mapped and loaded, via (use 'a.b.c), (require 'a.b.c), or similar. (all-ns) then includes a.b.c in its result.

A helpful (if longwindedly-named) function is then defined to return a set of all mapped namespaces, as strings, which can in turn be used to easily determine whether a namespace (as a string) is mapped. a.b.c and the nonexistent joker.foo are each tested this way.

The private function joker.core/ns-initialized? is then used to test whether various mapped namespaces have been initialized. Of these, only joker.os returns false, because it has not yet been referenced.

Note that, at present, there are no explicit tests for whether a namespace is available (in the general sense). One could attempt to load a namespace with a (try ...), but that would have the (potential) side effect of actually loading the namespace.

These distinctions should be of little, if any, importance to developers of Joker code, since these transitions are (largely) managed automatically on behalf of canonical Joker code. But such distinctions are potentially of interest to developers working on Joker internals, including core or std namespaces.

Built-in Namespaces

As explained in the README.md file, Joker provides several built-in namespaces, such as joker.core, joker.math, joker.string, and so on.

All necessary support for these namespaces is included in the Joker executable, so their source code needn't be distributed nor deployed along with Joker. This allows Joker to be deployed as a stand-alone executable.

The built-in namespaces are organized into two sets:

  • Core namespaces, which provide functions, macros, and so on necessary for rudimentary functioning of Joker or expected to be of widespread interest

  • Standard-library-wrapping ("std") namespaces, which provide Clojure-like interfaces to various Go standard libraries' public APIs

The mechanisms used to incorporate these namespaces into the Joker executable differ substantially, so it is important to understand them when considering adding a namespace to, or changing a namespace in, the Joker executable.

Core Namespaces

Core namespaces, starting with joker.core, define the features (mostly macros and functions) that are necessary for even rudimentary Joker scripts to run.

Their source code resides in the core/data/ directory as *.joke files.

Not every such file corresponds to a single namespace; the linter_*.joke files modify the joker.core namespace, while the remaining files do correspond to namespaces, and are named by dropping the joker. prefix and changing all . characters to _. So, for example, the joker.tools.cli namespace is defined by core/data/tools_cli.joke.
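The naming rule just described is mechanical enough to write down. This is a minimal Go sketch of it; coreSourcePath is a hypothetical helper name, not a function in Joker's source.

```go
package main

import (
	"fmt"
	"strings"
)

// coreSourcePath maps a core namespace name to the file that would define
// it: drop the "joker." prefix, then turn every "." into "_".
// The function name is hypothetical and used only for illustration.
func coreSourcePath(ns string) string {
	base := strings.TrimPrefix(ns, "joker.")
	return "core/data/" + strings.ReplaceAll(base, ".", "_") + ".joke"
}

func main() {
	for _, ns := range []string{"joker.core", "joker.walk", "joker.tools.cli"} {
		fmt.Printf("%s -> %s\n", ns, coreSourcePath(ns))
	}
}
```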

When Joker is built (via the run.sh script), go generate ./... is first run. Among other things, this causes the following source line (as a Go comment) in core/object.go to be executed:

//go:generate go run -tags gen_code gen_code/gen_code.go

This line builds and runs core/gen_code/gen_code.go, which finds, in the CoreSourceFiles array defined near the top of core/gen_code/gen_code.go, a list of files (in core/data/) to be processed.

As explained in the block comment just above the var CoreSourceFiles []... definition, the files must be ordered so any given file depends solely on files (namespaces) defined above it (earlier in the array).
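The ordering constraint amounts to a single pass over the list. The Go sketch below checks it for an illustrative file list; checkOrder and the dependency table are assumptions made for this example (only the fact that joker.template requires joker.walk comes from these notes), and gen_code.go itself may not perform such a check.

```go
package main

import "fmt"

// checkOrder reports the first file that depends on something not listed
// before it. Both arguments are illustrative; the real CoreSourceFiles
// array lives in core/gen_code/gen_code.go.
func checkOrder(files []string, deps map[string][]string) error {
	seen := make(map[string]bool, len(files))
	for _, f := range files {
		for _, d := range deps[f] {
			if !seen[d] {
				return fmt.Errorf("%s depends on %s, which is not listed before it", f, d)
			}
		}
		seen[f] = true
	}
	return nil
}

func main() {
	// Assumed dependencies, for illustration only.
	deps := map[string][]string{
		"walk.joke":     {"core.joke"},
		"template.joke": {"core.joke", "walk.joke"},
	}
	fmt.Println(checkOrder([]string{"core.joke", "walk.joke", "template.joke"}, deps))
	fmt.Println(checkOrder([]string{"core.joke", "template.joke", "walk.joke"}, deps))
}
```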

Compiling Core Namespace Structures to Native Go Code

Processing a .joke file consists of reading and evaluating forms in the file via Joker's (Clojure-like) Reader. This is done for the core-library-defining (that is, not linter-specific) files, yielding fully populated data structures as if all core namespaces (and std namespaces upon which they depend) have been fully loaded in a Joker invocation. (Keep in mind that this is done before a proper Joker executable is actually built.)

Then, the data structures defining (among other things) the resulting namespaces are compiled into Go code that, when (in turn) compiled into a Joker executable, creates them in toto, mostly via static initialization of numerous package-scope variables.

Packing Linter-specific Joker Files as Native Go Data Structures

Linter-specific files (named core/data/linter_*.joke) are treated differently. After all the core-library-defining files are compiled to Go code (as described above), these linter-specific files are read and evaluated, "packing" the resulting forms into a portable binary format, and encoding the resulting binary data as a []byte array in Go source files named core/a_*_data.go, where * is the same as in core/data/*.joke.

This approach does not involve the normal Read phase at Joker startup time (though the Evaluation phase remains largely the same). So, the overhead involved in parsing certain Clojure forms is avoided, in favor of using (what one assumes would be) faster code paths that convert binary blobs directly to AST forms. But most of Joker's object types (corresponding generally to Clojure forms) are stringized into the binary-data stream, and parsed back out at load time; so not all parsing overhead is avoided.

A disadvantage of this approach is that it requires changes to core/pack.go when changes are made to certain aspects of the AST.

Building Native Go Files Into the Joker Executable Itself

Because native-Go-code compilation (for core namespaces and linter files) occurs before the go build step performed by run.sh, that step picks up the resulting core/a_*.go source files. The binary data contained in the core/a_*_data.go (linter-data) files is, when needed, unpacked and the results used to modify the environment as appropriate for the linter mode involved.

The resulting Joker executable thus starts up with all the core-namespace-related data structures already nearly-fully populated, with remaining work done via a combination of initialization functions (func init()), dynamic-variable initialization (of *out*, *command-line-args*, etc.), and lazy initialization (such as compiled regular expressions in joker.hiccup) when the respective namespaces are actually referenced for the first time during that invocation.

When in linter mode, the forms encoded (as a []byte array) in the pertinent core/a_linter_*_data.go files are unpacked and evaluated upon startup, after joker.core has been fully loaded.

Avoid Copying Dynamic Variables

IMPORTANT: Because the compiled structures are (mostly) statically initialized in the default, fast-startup, version of Joker, core libraries defining variables whose values depend on dynamic variables might not work properly even when the values are copied by namespaces other than joker.core (and thus the values are referenced, in the slow-startup version, only when the namespaces are actually loaded).

The one case that currently exists, joker.test/*test-out*, is defined as a copy of joker.core/*out*; but the latter is set at runtime (hence the adjective "dynamic"), so gen_code.go detects that and specially handles this case by copying the value of it into the value of *test-out* only during the lazy-initialization phase of joker.test, instead of leaving the assignment performed when gen_code.go evaluates the forms in core/data/*.joke (at which point in time *out* is nil).

But the general case of such a reference is neither handled nor detected (though either or both could be implemented if deemed necessary).

So while (def clargs *command-line-args*) might work, even though *command-line-args* is initialized at runtime, (def nargs (count *command-line-args*)) might silently always set nargs to 0 in the fast-startup version, since there are no Joker command-line arguments present when gen_code runs.

Now, this wouldn't work in joker.core anyway, because that namespace is always processed (in both variants of Joker) before dynamic variables (such as *command-line-args*, *classpath*, and so on) are set.

If nargs (in the example shown above) is defined in (say) joker.test, however, the slow-startup version of Joker will perform that assignment after dynamic variables have been initialized, because that's when it reads in and evaluates the blobs comprising that namespace (once it's referenced).

But the fast-startup version of Joker will have already "baked in" the value of nargs when gen_code.go ran; no runtime code is currently generated to set such a dependent variable after the dynamic variable it depends on has been set.

This doesn't affect functions that merely reference dynamic variables. E.g. (defn nargs [] (count *command-line-args*)) would work fine (and nargs would be called as a function), since Joker does not compile such forms into Go code in optimized form.
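The difference between copying a dynamic variable and referencing it from a function has a close analogue in plain Go, sketched below. This is an analogy under stated assumptions, not the code gen_code.go emits: commandLineArgs stands in for *command-line-args*, and Go's package-level initialization stands in for the moment gen_code.go evaluates the forms.

```go
package main

import "fmt"

// commandLineArgs stands in for a dynamic variable such as
// *command-line-args*: empty during initialization, set later at run time.
var commandLineArgs []string

// nargsCopied is evaluated once, during package initialization, before
// commandLineArgs has been set. This mirrors (def nargs (count ...)).
var nargsCopied = len(commandLineArgs)

// nargs reads the variable each time it is called, mirroring
// (defn nargs [] (count *command-line-args*)).
func nargs() int {
	return len(commandLineArgs)
}

func main() {
	commandLineArgs = []string{"--verbose", "script.joke"}
	fmt.Println(nargsCopied) // captured before the assignment above: 0
	fmt.Println(nargs())     // computed from the current contents: 2
}
```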

Arguably, copying of dynamic variables is an unwise practice in any case: as highlighted above, the user of a namespace doesn't necessarily control when the namespace code is loaded and any such assignments performed. Providing initialization/reset functions for such namespaces, or simply promoting the desired assignees to functions that simply reference the dynamic variables, is probably better, as these approaches either give the namespace user control over when to perform the copying of values, or obviate the issue.

For a dynamic variable such as *command-line-args*, this might not seem important; but for something like *out* or *classpath*, which user code might change while running, it's important for said user code (or any namespaces it uses) to be able to predict when their values will actually be captured by core namespaces, just as they would be by user-defined (3rd-party) namespaces/libraries.

The list of such dynamic variables is kept in core/gen_code/gen_code.go, and is currently:

	knownLateInits = map[string]struct{}{
		"joker.core/*in*":                struct{}{},
		"joker.core/*out*":               struct{}{},
		"joker.core/*err*":               struct{}{},
		"joker.core/*command-line-args*": struct{}{},
		"joker.core/*classpath*":         struct{}{},
		"joker.core/*core-namespaces*":   struct{}{},
		"joker.core/*verbose*":           struct{}{},
		"joker.core/*file*":              struct{}{},
		"joker.core/*main-file*":         struct{}{},
	}
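A map with empty-struct values is Go's usual way to express a set, and membership is tested with the two-value lookup form. The sketch below copies the keys quoted above (written with the shorter literal form that gofmt -s produces); isLateInit is a hypothetical helper, and how gen_code.go actually consults the set is not shown in these notes.

```go
package main

import "fmt"

// knownLateInits repeats the set quoted above. The {} element form is
// equivalent to struct{}{} inside this composite literal.
var knownLateInits = map[string]struct{}{
	"joker.core/*in*":                {},
	"joker.core/*out*":               {},
	"joker.core/*err*":               {},
	"joker.core/*command-line-args*": {},
	"joker.core/*classpath*":         {},
	"joker.core/*core-namespaces*":   {},
	"joker.core/*verbose*":           {},
	"joker.core/*file*":              {},
	"joker.core/*main-file*":         {},
}

// isLateInit is a hypothetical helper: it reports whether a fully
// qualified var name is one of the known runtime-initialized variables.
func isLateInit(qualifiedName string) bool {
	_, ok := knownLateInits[qualifiedName]
	return ok
}

func main() {
	fmt.Println(isLateInit("joker.core/*out*"))
	fmt.Println(isLateInit("joker.test/*test-out*"))
}
```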

Adding a Core Namespace

Assuming one has determined it appropriate to add a new core namespace to the Joker executable (versus deploying it as a separate *.joke file), one must code it up (presumably as Joker code, though some Go code can be added to support it as well).

Then, besides putting that source code in core/data/*.joke, one must:

  • Add it to the core/gen_code/gen_code.go CoreSourceFiles array (after any core namespaces upon which it depends)

Further, if the new namespace depends on any standard-library-wrapping namespaces:

  • Edit the core/gen_code/gen_code.go import statement to include each such library's Go code
  • Ensure that code has already been generated (that library's std/*/a_*.go files have already been created), perhaps using an older version of Joker to run generate-std.joke from within the std subdirectory

Create suitable tests, e.g. in tests/eval/.

Finally, it's time to build as usual (e.g. via ./run.sh), then run ./eval-tests.sh or even ./all-tests.sh.

When Joker is run, the namespace is automatically added to *core-namespaces* as an "available" library; upon being loaded, it will be added to *loaded-libs*. (The fast-startup version of Joker will have already loaded all core libraries upon startup.)

Note that, in the slow_init version of Joker, core libraries (other than joker.core and, when running the REPL, joker.repl) do not show up in joker.core/*loaded-libs* (which is returned by the public function loaded-libs) until after they've been loaded via :require or similar.

Standard-library-wrapping (std) Namespaces

These namespaces are also defined by Joker code, which resides in std/*.joke files.

These *.joke files, however, have code of a particular form that is processed by the std/generate-std.joke script (after an initial version of Joker is built). They cannot, as explained below, define arbitrary macros and functions for use by normal Joker code.

The Joker Script That Writes Go Code

The std/generate-std.joke script, which is run after the Joker executable is first built (by run.sh), reads in the pertinent namespaces, currently defined via (def namespaces ...) at the top of the script. This definition dynamically discovers all the *.joke files in std/.

(apply require :reload namespaces) loads the target namespaces; the script then processes each namespace in namespaces by examining its public members and "compiling" them into Go code. The output is stored in std/*/a_*.go and std/*/a_*_slow_init.go, and possibly std/*/a_*_fast_init.go, where each * is the base name of the corresponding std/*.joke file.

For example, std/math.joke is processed such that the resulting Go code is written to std/math/a_math*.go.
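The output naming can be summarized in a few lines of Go. This is a sketch only: generatedFiles is a hypothetical helper, and the requiredByCore flag reflects the rule (explained later in these notes) that the fast-init file is produced only for namespaces required by a core namespace.

```go
package main

import (
	"fmt"
	"path"
)

// generatedFiles lists the Go files that would be written for one std
// namespace, given the base name of its std/*.joke file.
// The function and its parameters are illustrative, not part of Joker.
func generatedFiles(name string, requiredByCore bool) []string {
	dir := path.Join("std", name)
	files := []string{
		path.Join(dir, "a_"+name+".go"),
		path.Join(dir, "a_"+name+"_slow_init.go"),
	}
	if requiredByCore {
		files = append(files, path.Join(dir, "a_"+name+"_fast_init.go"))
	}
	return files
}

func main() {
	fmt.Println(generatedFiles("math", false))
	fmt.Println(generatedFiles("string", true))
}
```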

Note: This processing does not handle arbitrary Joker code! In particular, "logic" (such as (if ...)) in function bodies is neither recognized nor handled; it's actually discarded, in that it does not appear (in any form) in the final Joker executable. Similarly, no macros (public or otherwise) appear at all; so, as with logic in functions, they're useful only insofar as they might affect how other public members are defined during the running of std/generate-std.joke.

Instead, the processing consists primarily of examining the metadata for each (public) member and emitting Go code that, when built into (the soon-to-be-rebuilt) Joker executable, creates the namespace (joker.math in the above example), "interns" the public symbols, and includes (attached to those symbols) both suitable metadata and Go-code "stubs" that connect Joker code referencing a given symbol to the underlying Go implementation (typically a standard-library API, such as math.Sin for joker.math/sin).

Those stubs handle arity, types, and results.

Whether they call Go code directly, or call support code written in Go (typically included in a file named std/*/*_native.go, e.g. std/math/math_native.go) -- and the specific Go-code invocation used -- is determined via the :go metadata and return-type tags for the public member, as defined in the original std/*.joke file.

The a_*.go files generated for std namespaces cause the namespaces to be mapped by the time the Joker executable has finished starting up. That's why they appear in (all-ns), even when they haven't actually been loaded (lazily initialized).

Advantages and Disadvantages vis-a-vis Core Namespaces

As standard-library-wrapping namespaces are lazily loaded (i.e. on-demand), and needn't build up the ASTs that the core namespaces build up, they can be expected (in the standard, not fast-startup, build of Joker) to offer lower overhead at startup and/or first-use time. That is, only namespace generation, interning of symbols, and metadata is built up; other logic is "baked in" via compilation of the Go code accompanying these namespaces.

However, any logic (such as conditionals, loops, and so on) to be performed by them must be expressed in Go, rather than Joker, code; this mechanism is designed for easier creation of "thin" wrappers between Joker and Go code, not as a general mechanism for embedding Joker code in the Joker executable.

Another advantage (besides performance) of this approach is that the resulting code that builds up the target namespace has no dependencies on any other Joker namespaces -- not even on joker.core.

That means a core namespace may actually depend on one of these (standard-library-wrapping) namespaces, as long as std/generate-std.joke has been run and the resulting std/*/a_*.go file has been made available in the working directory (e.g. by being added to the Git repository).

NOTE: generate-std.joke generates two or three a_*.go files per namespace, depending on whether the namespace is required by any of the core namespaces. a_*_slow_init.go handles the runtime (including "lazy") initialization; if the namespace is required by a core namespace, it's generated for only the gen_code program to use, and a_*_fast_init.go is generated to handle the runtime/lazy initialization needed by Joker itself.

Optimizing Build Time

The run.sh script includes an optimization that avoids building Joker a second (final) time after it runs std/generate-std.joke to generate the std/*/a_*.go files.

That optimization starts by computing a hash of the contents of the std/ directory before running the script, and another one afterwards.

If the hashes are identical, run.sh assumes that the std/*/a_*.go files present before running the script already match the std/*.joke files. No generated file has changed, so there is no need to rebuild the Joker executable.

(Of course, even if a std/*.joke file hasn't changed, any changes to std/generate-std.joke or any of the std/*/*.go files, handwritten or autogenerated, will result in a different hash being computed and thus a rebuild.)

Adding a New Standard-library-wrapping Namespace

Besides creating std/foo.joke with appropriate metadata (such as :go) for each public member (in joker.foo), one must:

  • mkdir -p std/foo
  • (cd std; ../joker generate-std.joke) to create std/foo/a_foo*.go
      • NOTE: If ../joker does not exist (due to a failed build while iterating through this process), any recent version (such as the installed, official, version you might have in $PATH) may be used
  • If necessary, write supporting Go code, which goes in std/foo/foo_native.go and other Go files in std/foo/*.go
  • Add the resulting set of Go files (in std/foo), as well as std/foo.joke, to the repository
  • Add the appropriate line to the import block at the top of main.go
  • Add tests to tests/eval/
  • Rebuild the Joker executable (via run.sh or equivalent)
  • Run the tests (via ./all-tests.sh or just ./eval-tests.sh)

While some might object to the inclusion of generated files (std/*/a_*.go) in the repository, Joker currently depends on their presence in order to build, due to circular dependencies (related to the bootstrapping of Joker) as described below.

Understanding the generate-std.joke Script

This script generates foo/a_foo*.go files based on foo.joke files.

Given:

(defn <RTN-TYPE> FN
  DOCSTRING
  {:added VERSION
   :go GOCODE}
  [ARGSPEC...])

This results in the following code in a_foo.go:

var __GOFN__P ProcFn = __GOFN
var GOFN Proc = Proc{Fn: __GOFN__P, Name: "GOFN", Package: "std/foo"}

func __GOFN(_args []Object) Object { BODY }

That is, GOFN is a Proc that wraps a ProcFn var (__GOFN__P) to which the implementation itself, named __GOFN, is assigned.

GOFN is a slightly mangled form of FN (an underscore is appended, etc.; see the go-name function in the script) and BODY chooses an implementation based on the number of elements in _args. (So [ARGSPEC...] could actually be ([ARGSPEC1...]) ([ARGSPEC2...]...), each with a unique number of arguments, in which case GOCODE is not just a string, but a map of the number of arguments to the corresponding string.) PanicArity() is called if the number of arguments does not match.

Each such implementation extracts the arguments based on their ARGSPEC-declared types (ARGSPEC typically being ^ARGTYPE ARGNAME), via ExtractARGTYPE(_args, N) (where N is the argument index), then calls the corresponding GOCODE, saving the result in _res, which is then returned.

If RTN-TYPE is omitted, GOCODE's result is returned as-is (which typically requires GOCODE to refer to a custom implementation in foo/foo_native.go, as in the case of a function that returns nil, aka NIL in Joker's Go code); otherwise, Make<RTN-TYPE> is called to wrap the result in the desired type.
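To make the shape of such a stub concrete, here is a self-contained Go sketch for a hypothetical one-argument function wrapping math.Sqrt. Object, Double, Proc, ProcFn, ExtractDouble, MakeDouble, and PanicArity are simplified stand-ins defined locally so the example compiles on its own; the real definitions live in Joker's core package and differ in detail.

```go
package main

import (
	"fmt"
	"math"
)

// Simplified stand-ins for Joker's core types (assumptions for this sketch).
type Object interface{}
type Double struct{ D float64 }
type ProcFn func(args []Object) Object
type Proc struct {
	Fn      ProcFn
	Name    string
	Package string
}

// PanicArity imitates the helper called when no arity matches.
func PanicArity(n int) {
	panic(fmt.Sprintf("wrong number of args (%d)", n))
}

// ExtractDouble imitates ExtractARGTYPE for ARGTYPE = Double.
func ExtractDouble(args []Object, index int) float64 {
	d, ok := args[index].(Double)
	if !ok {
		panic(fmt.Sprintf("arg[%d] must be a Double", index))
	}
	return d.D
}

// MakeDouble imitates Make<RTN-TYPE> for RTN-TYPE = Double.
func MakeDouble(d float64) Double { return Double{D: d} }

// The generated-style declarations, with GOFN = sqrt_.
var __sqrt___P ProcFn = __sqrt_
var sqrt_ Proc = Proc{Fn: __sqrt___P, Name: "sqrt_", Package: "std/math"}

func __sqrt_(_args []Object) Object {
	_c := len(_args)
	switch {
	case _c == 1:
		x := ExtractDouble(_args, 0) // extract by declared argument type
		_res := math.Sqrt(x)         // the GOCODE for this arity
		return MakeDouble(_res)      // wrap in the declared return type
	default:
		PanicArity(_c)
	}
	return nil
}

func main() {
	fmt.Println(sqrt_.Fn([]Object{MakeDouble(9)}))
}
```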

Non-functions (such as constants and variables) and functions (see above) follow.

Next, this follows all those vars (functions and non-functions):

func Init() {
{non-fn-inits}
        InternsOrThunks()
}

Any non-function runtime initializations are performed in {non-fn-inits}.

<NSNAME>Namespace is then defined as a global variable initialized to a global Clojure namespace with NSFULLNAME (e.g. "joker.foo") as a symbol, said namespace being added to the set joker.core/*loaded-libs*:

var fooNamespace = GLOBAL_ENV.EnsureSymbolIsLib(MakeSymbol("joker.foo"))

a_foo.go finishes with:

func init() {
        fooNamespace.Lazy = Init
}

That is, upon Joker startup, the namespace is first registered (mapped), then its lazy-initialization function (Init()) is registered for it.
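The register-then-defer pattern can be shown with a reduced model. In this Go sketch, Namespace, ensureLib, and find are simplified, hypothetical stand-ins (only the Lazy field and the init() assignment mirror the generated code above), and clearing Lazy after it runs is an assumption about how at-most-once initialization might be achieved.

```go
package main

import "fmt"

// Namespace is a simplified stand-in for Joker's namespace type.
type Namespace struct {
	Name string
	Lazy func()            // pending initializer, or nil once it has run
	vars map[string]string // toy replacement for interned vars
}

var mappedNamespaces = map[string]*Namespace{}

// ensureLib registers (maps) a namespace if it is not already present.
// It is a hypothetical analogue of EnsureSymbolIsLib.
func ensureLib(name string) *Namespace {
	if ns, ok := mappedNamespaces[name]; ok {
		return ns
	}
	ns := &Namespace{Name: name, vars: map[string]string{}}
	mappedNamespaces[name] = ns
	return ns
}

// find returns a mapped namespace, first running its lazy initializer if
// one is still pending. It returns nil for an unmapped name.
func find(name string) *Namespace {
	ns := mappedNamespaces[name]
	if ns != nil && ns.Lazy != nil {
		lazy := ns.Lazy
		ns.Lazy = nil // clear first so initialization runs at most once
		lazy()
	}
	return ns
}

// Mapped during package-variable initialization, as in the generated file.
var fooNamespace = ensureLib("joker.foo")

// initFoo plays the role of the generated Init().
func initFoo() {
	fooNamespace.vars["bar"] = "bar_"
}

func init() {
	fooNamespace.Lazy = initFoo
}

func main() {
	fmt.Println(fooNamespace.Lazy != nil) // mapped, initializer still pending
	ns := find("joker.foo")
	fmt.Println(ns.vars["bar"], ns.Lazy == nil)
}
```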

a_foo_slow_init.go defines (the "slow" version of) InternsOrThunks():

func InternsOrThunks() {
        <NSNAME>Namespace.ResetMeta(MakeMeta(nil, "{NSDOCSTRING}", "VERSION"))
        {interns}
}

NSDOCSTRING comes from the :doc metadata in the ns invocation at the top of foo.joke; VERSION is currently hardcoded to "1.0". That's also where imports are specified; they're generated near the top of foo/a_foo_slow_init.go, just after the package specification.

Then the non-function and function names are interned in that same namespace (where {interns} appears, above), with each such intern looking like:

<NSNAME>Namespace.InternVar("FN", GOFN,
  MakeMeta(NewListFrom(NewVectorFrom(MakeSymbol("ARG1"), ...)),
           DOCSTRING))

ARGn is basically each ARGSPEC, including & where applicable, but without the tags (i.e. the type info is lost here).

This is where Joker looks up bar in (bar ...), using the applicable namespace in effect, and knows to call bar_ (the GOFN for bar) with the array of Objects comprising the arguments in ....

Generating Documentation

Once a joker executable has been built with the desired new and changed namespaces, online documentation is generated via:

$ (cd docs; ../joker generate-docs.joke)

Joker distributions currently include core and std libraries' documentation in their repositories, so new and changed .html files should be added to the changeset(s) along with the corresponding library code.

Beware Circular Dependencies

Joker currently has circular dependencies between the core and std namespaces, as well as among the std namespaces themselves.

Circular Dependencies Between Core and STD Namespaces

There's actually a circular dependency between the two sets of namespaces:

  • core/gen_code/gen_code.go imports std/string, so the initialization code that adds the namespace is run
  • std/string/a_string.go is generated by std/generate-std.joke
  • std/generate-std.joke is run by the first Joker executable built by run.sh
  • That Joker executable cannot be built until after gen_code.go has been run
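The four steps above form a loop in a "needs" graph, which a depth-first search can find mechanically. The Go sketch below encodes exactly those four edges; findCycle and the node labels are illustrative and not part of Joker's build tooling.

```go
package main

import "fmt"

// findCycle returns one dependency cycle reachable from start (with the
// repeated node at both ends), or nil if there is none.
func findCycle(deps map[string][]string, start string) []string {
	var path []string
	onPath := map[string]bool{}
	done := map[string]bool{}
	var visit func(n string) []string
	visit = func(n string) []string {
		if onPath[n] {
			// n is already on the current path: return the looping part.
			for i, p := range path {
				if p == n {
					return append(append([]string{}, path[i:]...), n)
				}
			}
		}
		if done[n] {
			return nil
		}
		onPath[n] = true
		path = append(path, n)
		for _, d := range deps[n] {
			if c := visit(d); c != nil {
				return c
			}
		}
		path = path[:len(path)-1]
		onPath[n] = false
		done[n] = true
		return nil
	}
	return visit(start)
}

func main() {
	// Each key needs each of its values, per the list above.
	needs := map[string][]string{
		"core/gen_code/gen_code.go": {"std/string/a_string.go"},
		"std/string/a_string.go":    {"std/generate-std.joke"},
		"std/generate-std.joke":     {"joker executable"},
		"joker executable":          {"core/gen_code/gen_code.go"},
	}
	for _, step := range findCycle(needs, "core/gen_code/gen_code.go") {
		fmt.Println(step)
	}
}
```

Removing any one edge breaks the loop, which is what keeping the generated files in the repository achieves in this model: std/string/a_string.go no longer needs std/generate-std.joke to be run first.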

This circular dependency is avoided, in practice, by ensuring that any std/*/a_*.go files are already generated and present before any new dependencies upon them are added to gen_code.go.

However, a std/*.joke file therefore cannot depend on any core/data/*.joke-defined namespace that, in turn, requires gen_code.go to import its std/*/a_*.go file.

So, while joker.repl and joker.tools.cli currently depend on joker.string, std/string.joke does not depend on them, and preexisted their being added to the core namespaces.

One approach to avoid this problem without (any longer) including generated artifacts (a_*.go files) in the repository, nor requiring an old version of Joker for bootstrapping, would be for the build process (in run.sh) to start by building a Joker executable that includes only joker.core.

Then, that interim Joker executable could be used to run std/generate-std.joke to generate the a_*.go files in std/, after which a "complete" version of Joker would then be built.

However, as explained below, that wouldn't solve the problem entirely, since std/generate-std.joke currently requires more than just joker.core to work.

Circular Dependencies Within STD

The std/*/a_*.go files are needed to build Joker, but are generated by std/generate-std.joke, which needs Joker to run.

Further, std/generate-std.joke requires both joker.os and joker.string.

Those dependencies mean that even if the Joker build process were changed to start by building a Joker executable supporting only joker.core, the resulting executable would be unable to run std/generate-std.joke.

Again, the presence of std/*/a_*.go (at least for joker.os and joker.string) in the repository avoids this being a problem. (Another solution would be to use an older version of Joker to run std/generate-std.joke and thus build a "fresh" one.)

Converting joker.os and joker.string into core libraries (so, the underlying support code would be in package core), and adding them to the list of libraries built into an "interim" Joker executable (as described above), is one approach to solving this issue.

Faster Startup

run.sh builds (via the go generate ./... step) an extra set of Go source files that, unless disabled via a build tag, statically initialize most of the core namespace info. (Some runtime initialization must still be performed, due mainly to limitations in the Go compiler.)

Developer Notes

TBD, but something like this was done to search for Joker code that runs before main() and determine how best to handle it in a slow-vs-fast split build:

grep --color -nH --null -E -e '^(func init\(|var )' *.go ../*.go ../std/*/*.go | grep -v ' ProcFn = '

Overview of Changes Made to Joker

The fast-startup version necessitated (as of this writing) these changes:

  • Regex is now *Regex (a reference type), mainly so runtime initialization (from a regexp.MustCompile() call) can be assigned into the .Value or equivalent member of a static structure.
  • internalNamespaceInfo is a new struct type that wraps []byte for the core namespace, adding init func() (the slow version uses this for lazy-loading of core namespaces; it might be replaceable via the .Lazy mechanism if we always map all core namespaces) and available bool (which helps detect missing a_*.go files more elegantly).
  • Many (larger/complicated) static vars' definitions and initializations have been separated out into _slow_init.go files (e.g. procs_slow_init.go, environment_slow_init.go, etc.), which are // +build slow_init, in that they aren't built into Joker itself.
  • An additional source file, core/environment_fast_init.go, contains an empty receiver to parallel the one in core/environment_slow_init.go.
  • Proc now wraps the former Proc (renamed ProcFn) and adds self-identifying info (the name of the procedure and its package), to help code generation when it encounters them.
  • A new core/gen_code/gen_code.go program replaces the old gen_data.go program. It generates a_*code.go files that mostly define static variables representing the structures resulting when loading core/data/core.joke and the like; they are compiled only when building Joker itself. It also does the work that gen_data.go used to do, except only for core/data/linter_*.joke files and the generation of core/a_data.go.
  • The new run.sh runs gen_code.go via the go generate ./... step; this takes only a few seconds on my Ryzen 3 and generates these static-initializing files.
  • run.sh then builds either the "original" (slower) Joker executable, hardlinked to joker.slow, or the fast-startup version, hardlinked to joker.fast. Whichever is built becomes the default for subsequent use (such as running std/generate-std.joke and then running the executable itself with whatever arguments were provided to run.sh).
  • A new core/code.go module is a helper for gen_code.go, since the latter isn’t part of package core.
  • A new core/gen_go package is used solely by gen_code and implements the details of compiling Go variables into (mostly) static Go code.
  • The new private function joker.core/ns-initialized? tells whether a namespace has been initialized (that is, fully loaded, including any lazy loading). Useful as a debugging tool, it's also used by std/generate-std.joke to determine which std libraries are already loaded as a side effect of loading all core libraries, because the core libraries require them.
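
To make the Proc change in the list above more concrete, here is a minimal sketch of a function value wrapped with self-identifying info, and of how a generator might use that info to emit Go source. The field names, the Object stand-in, and the goExpr helper are assumptions for illustration; the real definitions in package core may differ.

```go
package main

import "fmt"

// Object stands in for Joker's object interface.
type Object interface{}

// ProcFn is the bare function type (the former Proc, per the list above).
type ProcFn func(args []Object) Object

// Proc wraps a ProcFn with self-identifying info.
// The field names here are hypothetical.
type Proc struct {
	Fn      ProcFn
	Name    string // name of the Go function implementing the procedure
	Package string // package holding that function; "" means the current one
}

// procIdentity is a sample procedure that returns its first argument.
func procIdentity(args []Object) Object { return args[0] }

// goExpr returns Go source text that would reconstruct p. A bare func value
// cannot be turned back into source, but its recorded name can.
func goExpr(p Proc) string {
	qualified := p.Name
	if p.Package != "" {
		qualified = p.Package + "." + p.Name
	}
	return fmt.Sprintf("Proc{Fn: %s, Name: %q, Package: %q}",
		qualified, p.Name, p.Package)
}

func main() {
	p := Proc{Fn: procIdentity, Name: "procIdentity"}
	fmt.Println(goExpr(p))
	fmt.Println(p.Fn([]Object{"still callable"}))
}
```

The wrapper stays callable through Fn, while Name and Package give a code generator something it can write out.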

Debugging Tools

go-spew

When built via e.g. go build -tags go_spew, the private joker.core/go-spew function is enabled. (Otherwise it does nothing and returns false.)

This function dumps, to stderr, the internal structure of the argument passed to it (i.e. a Joker object), and returns true.

Optionally, a second argument may be specified that is a map with configuration options as described in the go-spew documentation, though not all such options are yet supported by Joker's go-spew function.

For example, the internals of the keyword :hey can be output in this fashion:

user=> (joker.core/go-spew :hey {:MaxDepth 5 :Indent "    " :UseOrdinals true})
(core.Keyword) {
    InfoHolder: (core.InfoHolder) {
        info: (*core.ObjectInfo)(#1)({
            Position: (core.Position) {
                endLine: (int) 1,
                endColumn: (int) 24,
                startLine: (int) 1,
                startColumn: (int) 21,
                filename: (*string)(#2)((len=6) "<repl>")
            }
        })
    },
    ns: (*string)(<nil>),
    name: (*string)(#3)((len=3) "hey"),
    hash: (uint32) 819820356
}
true
user=>

Note: The SpewState configuration option is not currently supported; each distinct call to go-spew thus starts with a "fresh" state.
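
One way to picture partial support for configuration options is a function that copies recognized options from a map into a configuration struct and reports the rest. This is only a sketch under assumptions: the spewConfig type and applyOptions function are hypothetical, the three options are simply the ones used in the example above, and how Joker itself treats unsupported options is not specified here.

```go
package main

import (
	"fmt"
	"sort"
)

// spewConfig mirrors the three options used in the example above.
// (Hypothetical type; it is not the go-spew library's configuration struct.)
type spewConfig struct {
	Indent      string
	MaxDepth    int
	UseOrdinals bool
}

// applyOptions copies recognized, correctly typed options into cfg and
// returns the sorted names of the options it skipped.
func applyOptions(cfg *spewConfig, opts map[string]interface{}) []string {
	var skipped []string
	for key, val := range opts {
		switch key {
		case "Indent":
			if s, ok := val.(string); ok {
				cfg.Indent = s
				continue
			}
		case "MaxDepth":
			if n, ok := val.(int); ok {
				cfg.MaxDepth = n
				continue
			}
		case "UseOrdinals":
			if b, ok := val.(bool); ok {
				cfg.UseOrdinals = b
				continue
			}
		}
		// Reached for unknown keys and for known keys with the wrong type.
		skipped = append(skipped, key)
	}
	sort.Strings(skipped)
	return skipped
}

func main() {
	cfg := spewConfig{Indent: " "}
	skipped := applyOptions(&cfg, map[string]interface{}{
		"MaxDepth":    5,
		"Indent":      "    ",
		"UseOrdinals": true,
		"SpewState":   nil, // noted above as not currently supported
	})
	fmt.Printf("%+v skipped=%v\n", cfg, skipped)
}
```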

Build Tags

The source code comprising Joker currently supports these custom build tags:

gen_code

This enables building code needed by gen_code or disables code that gen_code itself generates.

go_spew

This enables joker.core/go-spew, along with some internal code (typically depending on core.VerbosityLevel > 0) that calls go-spew. Without this tag, these are no-ops.
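
Tag-gated features of this kind are commonly written as a pair of files with opposite build constraints. The sketch below shows only the fallback half, the one compiled when the tag is absent; the function name goSpew and the file layout are assumptions rather than Joker's actual source. It uses the //go:build form available in newer Go releases, which is equivalent to the older // +build comment style.

```go
//go:build !go_spew

// Fallback half of a hypothetical tag-gated pair. A sibling file constrained
// by "//go:build go_spew" would define the same function with a real
// implementation, so exactly one of the two is compiled into any build.
package main

import "fmt"

// goSpew is the no-op variant: it prints nothing and reports false, matching
// the behavior described above for builds without the go_spew tag.
func goSpew(obj interface{}) bool {
	return false
}

func main() {
	fmt.Println(goSpew(":hey"))
}
```

Callers can then use the function unconditionally, and the choice between the real and the no-op version is made at build time, for example with go build -tags go_spew.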