Vectors #9

claytongentry · 2024-11-04T16:52:55Z

Vectors represent the "nouns" of your manifold universe. They are the entities on whose behalf you can roll up metrics, like a Page or a Campaign.

To add a vector to your umbrella, run manifolds vectors add page.

This will generate a config file in your vectors/ directory that you can fill out like so:

attributes:
  published_at: TIMESTAMP
  title: STRING

You can express the vectors you'd like to intersect in your manifold.yml like so:

vectors:
  # List the vectors you want to include in this manifold
  # Example:
  - page

Then when you run manifolds generate, your dimensions schema is generated from the vectors you specified.

johncarney · 2024-11-04T19:53:38Z

.rubocop.yml

+RSpec/ExampleLength:
+  Max: 18
+
+RSpec/MultipleExpectations:
+  Enabled: false
+
+RSpec/MultipleMemoizedHelpers:
+  Enabled: false
+
+RSpec/NestedGroups:
+  Max: 4
+


These are fairly loose settings. MultipleExpectations and MultipleMemoizedHelpers in particular are very helpful in keeping specs clean. Multiple assertions in a test is regarded as a smell in TDD generally, and too many memoized helpers is a pretty clear sign that you need to refactor your code. Consider removing these settings from here then run rubocop --auto-gen-config to generate a .rubocop_todo.yml. This gives you a set of goals you can work toward over time.

spec/manifolds/cli_spec.rb

johncarney · 2024-11-04T20:04:46Z

spec/manifolds/cli_spec.rb

+      File.write("#{File.dirname(__FILE__)}/../../lib/manifolds/templates/config_template.yml", "vectors:\nmetrics:")
+      File.write("#{File.dirname(__FILE__)}/../../lib/manifolds/templates/vector_template.yml", "attributes:")
+    end
+  end

  describe "#init" do
    subject(:cli) { described_class.new(logger: null_logger) }


Putting the action being tested in a before blocks makes tests harder to read because you need to jump back and forth in the file to figure out what's going on.

@tylr may disagree with this one. In other non-Ruby projects I've always put the execution logic in the test itself rather than any setup clause. I just thought this was rspec convention.

Yes. Execution should be with test. Setup is for, well, setting up.

I've seen different practices over the years, but my understanding that RSpec maintainers recommend keeping the execution logic inside the example. In fact I believe they also recommend keeping as much setup in the example as is practical. Personally I feel that is not often very practical.

I like to opt for the one liners, but agree with John here. The allow calls themselves are part of a behavior.

I could maybe see it being ok in a before within a describe "#method" block where all calls to the method need to be mocked. Or even extracting a method if a ton of specs need that setup.

Cool. In that case, I think we all actually agree.

I hadn't even looked at the stubs. I think I will bite my tongue for now :D

johncarney · 2024-11-04T20:08:57Z

spec/manifolds/cli_spec.rb

+        expect(Dir.exist?("./#{project_name}/projects")).to be true
+        expect(Dir.exist?("./#{project_name}/vectors")).to be true


Either split this into two tests (my preference), or wrap them in an aggregate_failures block.

Also, expect(Pathname("some/path")).to be_directory reads a lot better.

it "creates a 'projects' directory for the project" do cli.init(project_name) expect(Pathname(".#{project_name}/projects")).to be_directory end it "creates a 'vectors' directory for the project" do cli.init(project_name) expect(Pathname(".#{project_name}/vectors")).to be_directory end

it "creates 'projects' and 'vectors' directories for the project" do cli.init(project_name) aggregate_failures do expect(Pathname(".#{project_name}/projects")).to be_directory expect(Pathname(".#{project_name}/vectors")).to be_directory end end

Thanks, @johncarney. I had a similar thought and recently started exploring better test matchers to implement a similar change.

This is an interesting topic to dive into...

On a related note, we should consider adopting single-expectation tests that exclude any initialization logic (see: https://www.betterspecs.org/#single). In my experience, this approach leads to better specifications for several nuanced reasons.

To transform these tests into true specs that read like specifications, I would structure them like this:

context "when a project is initialized" do let(:project_name) { Faker.snake_case_name_or_something } let(:base_path) { File.join(Dir.pwd, project_name) } before { cli.init(project_name) } it { expect(Pathname(File.join(base_path, 'projects'))).to be_directory } it { expect(Pathname(File.join(base_path, 'vectors'))).to be_directory } end

This refactoring reduces the lines of code and provides a clear entry point for specifying behaviors without the need for extensive context/state setup. But there's more to consider...

expect(Pathname(File.join(base_path, 'projects'))).to be_directory

Isn’t Pathname(File.join(base_path, 'projects')) an example of application domain details creeping into our tests? I would argue that it is. Ideally, development should be driven by specs that don’t embed specific implementation details.

Here's an updated approach to consider:

describe CLI do subject(:cli) { CLI.new(...) } describe '#init' do context "with a name and directory" do let(:name) { Faker.snake_case_name_or_something } let(:directory) { File.join(Dir.pwd, name) } before { subject.init(name, directory:) } it { expect(Pathname(subject.vectors_directory)).to be_directory } it { expect(Pathname(subject.projects_directory)).to be_directory } end end end

I'll admit that defining what makes a good spec is near impossible. However, a useful litmus test is whether the test code reads like natural language and whether the test output is concise and hierarchically clear.

Consider the current test output:

> rspec -f d Manifolds::CLI #new is expected to be directory #create when within an umbrella project creates a tables directory within the project creates a routines directory within the project creates a manifold.yml file writes the manifold.yml file with dimensions writes the manifold.yml file with metrics when outside an umbrella project does not allow adding projects and logs an error #generate calls generate_dimensions_schema on bq service with correct project name Manifolds::Services::BigQueryService #generate_dimensions_schema checks if the configuration file exists writes the dimensions schema to a file logs success message when the configuration file does not exist logs an error message Finished in 0.01196 seconds (files took 0.08927 seconds to load) 12 examples, 0 failures

We can identify redundancies, unnatural language, missing context, etc in the output. By following my suggested organization, the output might look more like this:

Manifolds::CLI #init creates a project directory #create within a project creates a tables directory creates a routines directory

Ok, I digress.

spec/manifolds/cli_spec.rb

tylr

Ok...I went way too deep on test thread but a couple other quick bits...

Requires

We're doing requires wrong. We should have a manifold.rb file in lib/ root that does all the necessary loading so that the files living under it's namespace don't need to maintain requires. They will eventually be duplicative and impossible to manage. Let's hit this in a subsequent PR but catch it before it goes off the...rails :)

Relative paths

Similarly, paths. Relative paths are bad and I don't believe you can guarantee ./ means the same thing from all execution contexts.

I think there are clever ways in ruby to add the lib/ root to the execution environments load path. So you can do stuff like require 'lib/manifolds/cli' anywhere within the project. If a google or AI prompt yields an obvious answer great. Otherwise, let's not add any more relative paths in this PR and start using the convention I normally see.

current_directory = Dir.pwd # same as pwd from any terminal
path = File.join(current_directory, 'my', 'path', 'to_a.file') # I believe this is a best practice because it takes care of escape sequences and delimiters.

Ok thanks!

tylr · 2024-11-07T06:58:50Z

spec/manifolds/cli_spec.rb

+        expect(Dir.exist?("./#{project_name}/projects")).to be true
+        expect(Dir.exist?("./#{project_name}/vectors")).to be true


Thanks, @johncarney. I had a similar thought and recently started exploring better test matchers to implement a similar change.

This is an interesting topic to dive into...

On a related note, we should consider adopting single-expectation tests that exclude any initialization logic (see: https://www.betterspecs.org/#single). In my experience, this approach leads to better specifications for several nuanced reasons.

To transform these tests into true specs that read like specifications, I would structure them like this:

context "when a project is initialized" do let(:project_name) { Faker.snake_case_name_or_something } let(:base_path) { File.join(Dir.pwd, project_name) } before { cli.init(project_name) } it { expect(Pathname(File.join(base_path, 'projects'))).to be_directory } it { expect(Pathname(File.join(base_path, 'vectors'))).to be_directory } end

This refactoring reduces the lines of code and provides a clear entry point for specifying behaviors without the need for extensive context/state setup. But there's more to consider...

expect(Pathname(File.join(base_path, 'projects'))).to be_directory

Isn’t Pathname(File.join(base_path, 'projects')) an example of application domain details creeping into our tests? I would argue that it is. Ideally, development should be driven by specs that don’t embed specific implementation details.

Here's an updated approach to consider:

describe CLI do subject(:cli) { CLI.new(...) } describe '#init' do context "with a name and directory" do let(:name) { Faker.snake_case_name_or_something } let(:directory) { File.join(Dir.pwd, name) } before { subject.init(name, directory:) } it { expect(Pathname(subject.vectors_directory)).to be_directory } it { expect(Pathname(subject.projects_directory)).to be_directory } end end end

I'll admit that defining what makes a good spec is near impossible. However, a useful litmus test is whether the test code reads like natural language and whether the test output is concise and hierarchically clear.

Consider the current test output:

> rspec -f d Manifolds::CLI #new is expected to be directory #create when within an umbrella project creates a tables directory within the project creates a routines directory within the project creates a manifold.yml file writes the manifold.yml file with dimensions writes the manifold.yml file with metrics when outside an umbrella project does not allow adding projects and logs an error #generate calls generate_dimensions_schema on bq service with correct project name Manifolds::Services::BigQueryService #generate_dimensions_schema checks if the configuration file exists writes the dimensions schema to a file logs success message when the configuration file does not exist logs an error message Finished in 0.01196 seconds (files took 0.08927 seconds to load) 12 examples, 0 failures

We can identify redundancies, unnatural language, missing context, etc in the output. By following my suggested organization, the output might look more like this:

Manifolds::CLI #init creates a project directory #create within a project creates a tables directory creates a routines directory

Ok, I digress.

claytongentry

Thanks for the extensive reviews @johncarney and @tylr .

Have made changes:

Enforced a couple of those rubocop rules I'd allowed to lax.
Single assertion for most (all?) tests.
Removed all relative path usage in favor of Dir.pwd and File.join.

@tylr I don't quite the grasp the difference in your suggestion vs the test output:

➜  manifolds git:(claytongentry/entities) rspec -f d

Randomized with seed 7975

Manifolds::CLI
  vectors
    add
      when outside an umbrella project
        indicates that the command must be run within a project
      when adding a vector within an umbrella project
        creates a vector configuration file with 'attributes'
  #add
    when adding a project within an umbrella project
      creates a 'routines' directory
      adds metrics to the project's manifold configuration
      adds vectors to the project's manifold configuration
      creates a 'tables' directory
    when outside an umbrella project
      indicates that the command must be run within a project
  #init
    when initializing a new project
      creates a 'projects' directory
      creates a 'vectors' directory

Manifolds::Services::VectorService
  #load_vector_schema
    loads and transforms vector schema
    when vector file doesn't exist
      logs error
      returns nil

Manifolds::Services::BigQueryService
  #generate_dimensions_schema
    when the project configuration exists
      includes the expected schema structure
      generates a dimensions schema file
    when the project configuration is missing
      indicates the configuration is missing

Finished in 0.05442 seconds (files took 0.28224 seconds to load)
15 examples, 0 failures

It seems to read like English to me, as you're suggesting. Let me know if I'm missing something.

claytongentry · 2024-11-07T13:58:12Z

spec/manifolds/cli_spec.rb

+      File.write("#{File.dirname(__FILE__)}/../../lib/manifolds/templates/config_template.yml", "vectors:\nmetrics:")
+      File.write("#{File.dirname(__FILE__)}/../../lib/manifolds/templates/vector_template.yml", "attributes:")
+    end
+  end

  describe "#init" do
    subject(:cli) { described_class.new(logger: null_logger) }


@tylr may disagree with this one. In other non-Ruby projects I've always put the execution logic in the test itself rather than any setup clause. I just thought this was rspec convention.

tylr

One small rubyism suggestion. Up to you to implement John / my suggestion on the mocks in before blocks. I could see that changing a few ways as test suite grows.

I think this passes my QA overall and fine merging.

tylr · 2024-11-11T16:13:50Z

lib/manifolds/services/big_query_service.rb

+        fields = []
+
+        # Load vector schemas
+        config["vectors"]&.each do |vector|
+          @logger.info("Loading vector schema for '#{vector}'.")
+          vector_schema = @vector_service.load_vector_schema(vector)
+          fields << vector_schema if vector_schema
+        end


Suggested change

fields = []

# Load vector schemas

config["vectors"]&.each do |vector|

@logger.info("Loading vector schema for '#{vector}'.")

vector_schema = @vector_service.load_vector_schema(vector)

fields << vector_schema if vector_schema

end

fields = config["vectors"].reduce([]) do |list, vector|

@logger.info("Loading vector schema for '#{vector}'.")

[*@vector_service.load_vector_schema(vector), *list]

end

💯 agree with Tyler. Map/reduce is your friend. Only thing is in the original, there's a safe access operator on config["vectors"]. The simplest way to tweak Tyler's suggestion so it handles nil is to wrap it in Array(). Array(nil) gives you an empty array:

fields = Array(config["vector"]).reduce do |list, vector| # ... end

Another thing I noticed is that you could use filter_map here.

fields = Array(config["vectors"]).filter_map do |vector| @logger.info("Loading vector schema for '#{vector}'.")) @vector_service.load_vector_schema(vector) end

filter_map was introduced in Ruby 2.7 and to this day I still keep forgetting about it :D

As a side note, I dislike accessing instance variables directly. It tends to impede refactoring. Except in rare circumstances I always use attr_reader (never attr_writer or attr_accessor though—those should be removed from the language).

Sorry I missed this — I appreciate the note. I like filter_map, will snag it in a followup.

tylr · 2024-11-11T16:17:26Z

spec/manifolds/cli_spec.rb

+      File.write("#{File.dirname(__FILE__)}/../../lib/manifolds/templates/config_template.yml", "vectors:\nmetrics:")
+      File.write("#{File.dirname(__FILE__)}/../../lib/manifolds/templates/vector_template.yml", "attributes:")
+    end
+  end

  describe "#init" do
    subject(:cli) { described_class.new(logger: null_logger) }


I like to opt for the one liners, but agree with John here. The allow calls themselves are part of a behavior.

I could maybe see it being ok in a before within a describe "#method" block where all calls to the method need to be mocked. Or even extracting a method if a ton of specs need that setup.

claytongentry added 3 commits November 4, 2024 11:52

Add Entities

7628a19

turn off rules

d1e114f

Rename to vectors and improve cli interface

8f4cedd

claytongentry changed the title ~~Entities~~ Vectors Nov 4, 2024

claytongentry added 2 commits November 4, 2024 12:30

fix vector writes

4b80d03

nest dimensions

83c7fea

claytongentry marked this pull request as ready for review November 4, 2024 17:44

claytongentry added 2 commits November 4, 2024 12:48

update readme

15934f5

add usage

3990a34

claytongentry requested a review from tylr November 4, 2024 17:52

claytongentry added 3 commits November 4, 2024 13:03

snip extra docs

a789dc6

more bdd

4aa9ae6

logger spy

a82a184

johncarney reviewed Nov 4, 2024

View reviewed changes

tylr requested changes Nov 7, 2024

View reviewed changes

claytongentry added 13 commits November 7, 2024 07:54

Generate rubocop todo

fd4b719

split tests

0ef9053

load yaml

18e88e6

more yaml parsing

83fc0f5

clean up cops

bef2d4c

rm empty todo

88b43cf

snip

b68ae96

squash test log output

af177e4

absolute

d44c6b7

more absolute

6dd1ffc

more absolute

3cfc221

more absolute

d1af506

improve test legibility

95ff35b

claytongentry commented Nov 7, 2024

View reviewed changes

claytongentry added 26 commits November 9, 2024 07:11

safe load

6bc6467

one more

c0877eb

no need to manually activate / deactivate fakefs

7631128

Add more coverage to with-directory project specs

9ddac7b

force pathname

d238b2f

more pathname

5eb13de

rename to fields

a62468d

a million pathnames

f39fa4f

use pathname methods

fd83bfc

path not a file

cbc7018

improve test log

7b80d25

Implement Project.create as a class method

32ec4c0

shorter line

56a93d9

pathname .freeze

f591949

turn back on

aa62630

improve memoized helpers

5b5deeb

Pass rubocops

dd3067a

delete unused

992e20a

vectors do not create routines or tables directories

1b3d3bc

no todos

f7c247d

tighten

8920446

tighten

1d892ae

snip

ee71dc9

abstract ugly template setup

a7a1408

nicen

53290bd

read and write

35d67b3

tylr approved these changes Nov 11, 2024

View reviewed changes

claytongentry added 2 commits November 11, 2024 11:42

more comments

7a8910e

bump version

361d80d

claytongentry merged commit b257a5d into master Nov 11, 2024
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Vectors #9

Vectors #9

claytongentry commented Nov 4, 2024 •

edited

Loading

johncarney Nov 4, 2024

johncarney Nov 4, 2024

claytongentry Nov 7, 2024

johncarney Nov 8, 2024

tylr Nov 11, 2024

claytongentry Nov 11, 2024

johncarney Nov 11, 2024

johncarney Nov 4, 2024

tylr Nov 7, 2024 •

edited

Loading

tylr left a comment

tylr Nov 7, 2024 •

edited

Loading

claytongentry left a comment

claytongentry Nov 7, 2024

tylr left a comment

tylr Nov 11, 2024

johncarney Nov 11, 2024 •

edited

Loading

claytongentry Nov 11, 2024

tylr Nov 11, 2024

		expect(Dir.exist?("./#{project_name}/projects")).to be true
		expect(Dir.exist?("./#{project_name}/vectors")).to be true

Vectors #9

Vectors #9

Conversation

claytongentry commented Nov 4, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tylr Nov 7, 2024 • edited Loading

Choose a reason for hiding this comment

tylr left a comment

Choose a reason for hiding this comment

Requires

Relative paths

tylr Nov 7, 2024 • edited Loading

Choose a reason for hiding this comment

claytongentry left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tylr left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

johncarney Nov 11, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

claytongentry commented Nov 4, 2024 •

edited

Loading

tylr Nov 7, 2024 •

edited

Loading

tylr Nov 7, 2024 •

edited

Loading

johncarney Nov 11, 2024 •

edited

Loading