-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vectors #9
Vectors #9
Conversation
.rubocop.yml
Outdated
RSpec/ExampleLength: | ||
Max: 18 | ||
|
||
RSpec/MultipleExpectations: | ||
Enabled: false | ||
|
||
RSpec/MultipleMemoizedHelpers: | ||
Enabled: false | ||
|
||
RSpec/NestedGroups: | ||
Max: 4 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are fairly loose settings. MultipleExpectations and MultipleMemoizedHelpers in particular are very helpful in keeping specs clean. Multiple assertions in a test is regarded as a smell in TDD generally, and too many memoized helpers is a pretty clear sign that you need to refactor your code. Consider removing these settings from here then run rubocop --auto-gen-config
to generate a .rubocop_todo.yml. This gives you a set of goals you can work toward over time.
File.write("#{File.dirname(__FILE__)}/../../lib/manifolds/templates/config_template.yml", "vectors:\nmetrics:") | ||
File.write("#{File.dirname(__FILE__)}/../../lib/manifolds/templates/vector_template.yml", "attributes:") | ||
end | ||
end | ||
|
||
describe "#init" do | ||
subject(:cli) { described_class.new(logger: null_logger) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting the action being tested in a before
blocks makes tests harder to read because you need to jump back and forth in the file to figure out what's going on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tylr may disagree with this one. In other non-Ruby projects I've always put the execution logic in the test itself rather than any setup clause. I just thought this was rspec convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Execution should be with test. Setup is for, well, setting up.
I've seen different practices over the years, but my understanding that RSpec maintainers recommend keeping the execution logic inside the example. In fact I believe they also recommend keeping as much setup in the example as is practical. Personally I feel that is not often very practical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like to opt for the one liners, but agree with John here. The allow
calls themselves are part of a behavior.
I could maybe see it being ok in a before
within a describe "#method"
block where all calls to the method need to be mocked. Or even extracting a method if a ton of specs need that setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. In that case, I think we all actually agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't even looked at the stubs. I think I will bite my tongue for now :D
spec/manifolds/cli_spec.rb
Outdated
expect(Dir.exist?("./#{project_name}/projects")).to be true | ||
expect(Dir.exist?("./#{project_name}/vectors")).to be true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either split this into two tests (my preference), or wrap them in an aggregate_failures
block.
Also, expect(Pathname("some/path")).to be_directory
reads a lot better.
it "creates a 'projects' directory for the project" do
cli.init(project_name)
expect(Pathname(".#{project_name}/projects")).to be_directory
end
it "creates a 'vectors' directory for the project" do
cli.init(project_name)
expect(Pathname(".#{project_name}/vectors")).to be_directory
end
it "creates 'projects' and 'vectors' directories for the project" do
cli.init(project_name)
aggregate_failures do
expect(Pathname(".#{project_name}/projects")).to be_directory
expect(Pathname(".#{project_name}/vectors")).to be_directory
end
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @johncarney. I had a similar thought and recently started exploring better test matchers to implement a similar change.
This is an interesting topic to dive into...
On a related note, we should consider adopting single-expectation tests that exclude any initialization logic (see: https://www.betterspecs.org/#single). In my experience, this approach leads to better specifications for several nuanced reasons.
To transform these tests into true specs that read like specifications, I would structure them like this:
context "when a project is initialized" do
let(:project_name) { Faker.snake_case_name_or_something }
let(:base_path) { File.join(Dir.pwd, project_name) }
before { cli.init(project_name) }
it { expect(Pathname(File.join(base_path, 'projects'))).to be_directory }
it { expect(Pathname(File.join(base_path, 'vectors'))).to be_directory }
end
This refactoring reduces the lines of code and provides a clear entry point for specifying behaviors without the need for extensive context/state setup. But there's more to consider...
expect(Pathname(File.join(base_path, 'projects'))).to be_directory
Isn’t Pathname(File.join(base_path, 'projects'))
an example of application domain details creeping into our tests? I would argue that it is. Ideally, development should be driven by specs that don’t embed specific implementation details.
Here's an updated approach to consider:
describe CLI do
subject(:cli) { CLI.new(...) }
describe '#init' do
context "with a name and directory" do
let(:name) { Faker.snake_case_name_or_something }
let(:directory) { File.join(Dir.pwd, name) }
before { subject.init(name, directory:) }
it { expect(Pathname(subject.vectors_directory)).to be_directory }
it { expect(Pathname(subject.projects_directory)).to be_directory }
end
end
end
I'll admit that defining what makes a good spec is near impossible. However, a useful litmus test is whether the test code reads like natural language and whether the test output is concise and hierarchically clear.
Consider the current test output:
> rspec -f d
Manifolds::CLI
#new
is expected to be directory
#create
when within an umbrella project
creates a tables directory within the project
creates a routines directory within the project
creates a manifold.yml file
writes the manifold.yml file with dimensions
writes the manifold.yml file with metrics
when outside an umbrella project
does not allow adding projects and logs an error
#generate
calls generate_dimensions_schema on bq service with correct project name
Manifolds::Services::BigQueryService
#generate_dimensions_schema
checks if the configuration file exists
writes the dimensions schema to a file
logs success message
when the configuration file does not exist
logs an error message
Finished in 0.01196 seconds (files took 0.08927 seconds to load)
12 examples, 0 failures
We can identify redundancies, unnatural language, missing context, etc in the output. By following my suggested organization, the output might look more like this:
Manifolds::CLI
#init
creates a project directory
#create
within a project
creates a tables directory
creates a routines directory
Ok, I digress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok...I went way too deep on test thread but a couple other quick bits...
Requires
We're doing requires wrong. We should have a manifold.rb
file in lib/
root that does all the necessary loading so that the files living under it's namespace don't need to maintain requires. They will eventually be duplicative and impossible to manage. Let's hit this in a subsequent PR but catch it before it goes off the...rails :)
Relative paths
Similarly, paths. Relative paths are bad and I don't believe you can guarantee ./
means the same thing from all execution contexts.
I think there are clever ways in ruby to add the lib/
root to the execution environments load path. So you can do stuff like require 'lib/manifolds/cli'
anywhere within the project. If a google or AI prompt yields an obvious answer great. Otherwise, let's not add any more relative paths in this PR and start using the convention I normally see.
current_directory = Dir.pwd # same as pwd from any terminal
path = File.join(current_directory, 'my', 'path', 'to_a.file') # I believe this is a best practice because it takes care of escape sequences and delimiters.
Ok thanks!
spec/manifolds/cli_spec.rb
Outdated
expect(Dir.exist?("./#{project_name}/projects")).to be true | ||
expect(Dir.exist?("./#{project_name}/vectors")).to be true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @johncarney. I had a similar thought and recently started exploring better test matchers to implement a similar change.
This is an interesting topic to dive into...
On a related note, we should consider adopting single-expectation tests that exclude any initialization logic (see: https://www.betterspecs.org/#single). In my experience, this approach leads to better specifications for several nuanced reasons.
To transform these tests into true specs that read like specifications, I would structure them like this:
context "when a project is initialized" do
let(:project_name) { Faker.snake_case_name_or_something }
let(:base_path) { File.join(Dir.pwd, project_name) }
before { cli.init(project_name) }
it { expect(Pathname(File.join(base_path, 'projects'))).to be_directory }
it { expect(Pathname(File.join(base_path, 'vectors'))).to be_directory }
end
This refactoring reduces the lines of code and provides a clear entry point for specifying behaviors without the need for extensive context/state setup. But there's more to consider...
expect(Pathname(File.join(base_path, 'projects'))).to be_directory
Isn’t Pathname(File.join(base_path, 'projects'))
an example of application domain details creeping into our tests? I would argue that it is. Ideally, development should be driven by specs that don’t embed specific implementation details.
Here's an updated approach to consider:
describe CLI do
subject(:cli) { CLI.new(...) }
describe '#init' do
context "with a name and directory" do
let(:name) { Faker.snake_case_name_or_something }
let(:directory) { File.join(Dir.pwd, name) }
before { subject.init(name, directory:) }
it { expect(Pathname(subject.vectors_directory)).to be_directory }
it { expect(Pathname(subject.projects_directory)).to be_directory }
end
end
end
I'll admit that defining what makes a good spec is near impossible. However, a useful litmus test is whether the test code reads like natural language and whether the test output is concise and hierarchically clear.
Consider the current test output:
> rspec -f d
Manifolds::CLI
#new
is expected to be directory
#create
when within an umbrella project
creates a tables directory within the project
creates a routines directory within the project
creates a manifold.yml file
writes the manifold.yml file with dimensions
writes the manifold.yml file with metrics
when outside an umbrella project
does not allow adding projects and logs an error
#generate
calls generate_dimensions_schema on bq service with correct project name
Manifolds::Services::BigQueryService
#generate_dimensions_schema
checks if the configuration file exists
writes the dimensions schema to a file
logs success message
when the configuration file does not exist
logs an error message
Finished in 0.01196 seconds (files took 0.08927 seconds to load)
12 examples, 0 failures
We can identify redundancies, unnatural language, missing context, etc in the output. By following my suggested organization, the output might look more like this:
Manifolds::CLI
#init
creates a project directory
#create
within a project
creates a tables directory
creates a routines directory
Ok, I digress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the extensive reviews @johncarney and @tylr .
Have made changes:
- Enforced a couple of those rubocop rules I'd allowed to lax.
- Single assertion for most (all?) tests.
- Removed all relative path usage in favor of
Dir.pwd
andFile.join
.
@tylr I don't quite the grasp the difference in your suggestion vs the test output:
➜ manifolds git:(claytongentry/entities) rspec -f d
Randomized with seed 7975
Manifolds::CLI
vectors
add
when outside an umbrella project
indicates that the command must be run within a project
when adding a vector within an umbrella project
creates a vector configuration file with 'attributes'
#add
when adding a project within an umbrella project
creates a 'routines' directory
adds metrics to the project's manifold configuration
adds vectors to the project's manifold configuration
creates a 'tables' directory
when outside an umbrella project
indicates that the command must be run within a project
#init
when initializing a new project
creates a 'projects' directory
creates a 'vectors' directory
Manifolds::Services::VectorService
#load_vector_schema
loads and transforms vector schema
when vector file doesn't exist
logs error
returns nil
Manifolds::Services::BigQueryService
#generate_dimensions_schema
when the project configuration exists
includes the expected schema structure
generates a dimensions schema file
when the project configuration is missing
indicates the configuration is missing
Finished in 0.05442 seconds (files took 0.28224 seconds to load)
15 examples, 0 failures
It seems to read like English to me, as you're suggesting. Let me know if I'm missing something.
File.write("#{File.dirname(__FILE__)}/../../lib/manifolds/templates/config_template.yml", "vectors:\nmetrics:") | ||
File.write("#{File.dirname(__FILE__)}/../../lib/manifolds/templates/vector_template.yml", "attributes:") | ||
end | ||
end | ||
|
||
describe "#init" do | ||
subject(:cli) { described_class.new(logger: null_logger) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tylr may disagree with this one. In other non-Ruby projects I've always put the execution logic in the test itself rather than any setup clause. I just thought this was rspec convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small rubyism suggestion. Up to you to implement John / my suggestion on the mocks in before
blocks. I could see that changing a few ways as test suite grows.
I think this passes my QA overall and fine merging.
fields = [] | ||
|
||
# Load vector schemas | ||
config["vectors"]&.each do |vector| | ||
@logger.info("Loading vector schema for '#{vector}'.") | ||
vector_schema = @vector_service.load_vector_schema(vector) | ||
fields << vector_schema if vector_schema | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fields = [] | |
# Load vector schemas | |
config["vectors"]&.each do |vector| | |
@logger.info("Loading vector schema for '#{vector}'.") | |
vector_schema = @vector_service.load_vector_schema(vector) | |
fields << vector_schema if vector_schema | |
end | |
fields = config["vectors"].reduce([]) do |list, vector| | |
@logger.info("Loading vector schema for '#{vector}'.") | |
[*@vector_service.load_vector_schema(vector), *list] | |
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 agree with Tyler. Map/reduce is your friend. Only thing is in the original, there's a safe access operator on config["vectors"]
. The simplest way to tweak Tyler's suggestion so it handles nil
is to wrap it in Array()
. Array(nil)
gives you an empty array:
fields = Array(config["vector"]).reduce do |list, vector|
# ...
end
Another thing I noticed is that you could use filter_map
here.
fields = Array(config["vectors"]).filter_map do |vector|
@logger.info("Loading vector schema for '#{vector}'."))
@vector_service.load_vector_schema(vector)
end
filter_map
was introduced in Ruby 2.7 and to this day I still keep forgetting about it :D
As a side note, I dislike accessing instance variables directly. It tends to impede refactoring. Except in rare circumstances I always use attr_reader
(never attr_writer
or attr_accessor
though—those should be removed from the language).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I missed this — I appreciate the note. I like filter_map
, will snag it in a followup.
File.write("#{File.dirname(__FILE__)}/../../lib/manifolds/templates/config_template.yml", "vectors:\nmetrics:") | ||
File.write("#{File.dirname(__FILE__)}/../../lib/manifolds/templates/vector_template.yml", "attributes:") | ||
end | ||
end | ||
|
||
describe "#init" do | ||
subject(:cli) { described_class.new(logger: null_logger) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like to opt for the one liners, but agree with John here. The allow
calls themselves are part of a behavior.
I could maybe see it being ok in a before
within a describe "#method"
block where all calls to the method need to be mocked. Or even extracting a method if a ton of specs need that setup.
Vectors represent the "nouns" of your manifold universe. They are the entities on whose behalf you can roll up metrics, like a Page or a Campaign.
To add a vector to your umbrella, run
manifolds vectors add page
.This will generate a config file in your
vectors/
directory that you can fill out like so:You can express the vectors you'd like to intersect in your manifold.yml like so:
Then when you run
manifolds generate
, your dimensions schema is generated from the vectors you specified.