Koans AboutRegularExpressions
class AboutRegularExpressions < Neo::Koan
  def test_a_pattern_is_a_regular_expression
    assert_equal Regexp, /pattern/.class
  end

  def test_a_regexp_can_search_a_string_for_matching_content
    assert_equal "match", "some matching content"[/match/]
  end

  def test_a_failed_match_returns_nil
    assert_equal nil, "some matching content"[/missing/]
  end
Regular expressions (Regexp in Ruby) are usually written between slashes, as in /.../. They test whether a string contains a given pattern, or extract the portion that matches. If no match is found, nil is returned.
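As a small sketch of the ways a string can be tested against a pattern, the snippet below uses String#[], the =~ operator, and match?. The variable names are only illustrative and are not part of the koans.

```ruby
text = "some matching content"

# String#[] with a Regexp returns the matched substring, or nil.
found   = text[/match/]
missing = text[/missing/]

# =~ returns the index where the first match starts, or nil.
index = text =~ /match/

# match? only answers true or false (available since Ruby 2.4).
present = text.match?(/content/)

puts found.inspect    # "match"
puts missing.inspect  # nil
puts index.inspect    # 5
puts present.inspect  # true
```

String#[] is convenient when the matched text itself is needed, while match? is the lighter choice when only a yes/no answer matters.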
  def test_question_mark_means_optional
    assert_equal "ab", "abbcccddddeeeee"[/ab?/]
    assert_equal "a", "abbcccddddeeeee"[/az?/]
  end

  def test_plus_means_one_or_more
    assert_equal "bccc", "abbcccddddeeeee"[/bc+/]
  end

  def test_asterisk_means_zero_or_more
    assert_equal "abb", "abbcccddddeeeee"[/ab*/]
    assert_equal "a", "abbcccddddeeeee"[/az*/]
    assert_equal "", "abbcccddddeeeee"[/z*/]
  end
?, +, and * are repetition metacharacters called "quantifiers". They specify how many times the preceding construct must occur: ? makes it optional (zero or one), + means one or more, and * means zero or more. Because ? and * accept zero occurrences, a pattern such as /z*/ can succeed with an empty match.
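To compare the three quantifiers side by side, here is a minimal sketch that applies each one to the same letter. The sample string is made up for illustration.

```ruby
sample = "color colour colouur"

# ? : the "u" may appear zero or one time.
optional = sample.scan(/colou?r/)

# + : the "u" must appear at least once.
one_or_more = sample.scan(/colou+r/)

# * : the "u" may appear any number of times, including none.
zero_or_more = sample.scan(/colou*r/)

puts optional.inspect      # ["color", "colour"]
puts one_or_more.inspect   # ["colour", "colouur"]
puts zero_or_more.inspect  # ["color", "colour", "colouur"]
```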
  def test_the_left_most_match_wins
    assert_equal "a", "abbccc az"[/az*/]
  end

  def test_character_classes_give_options_for_a_character
    animals = ["cat", "bat", "rat", "zat"]
    assert_equal ["cat", "bat", "rat"], animals.select { |a| a[/[cbr]at/] }
  end
Instead of spelling out every string you'd like to match, you can use a character class: a set of characters in square brackets that matches any single character from the set. For example, [cbr] matches one c, b, or r.
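The sketch below illustrates that a character class consumes exactly one character, not a run of them. The word list and variable names are assumptions chosen for the example.

```ruby
words = ["cat", "bat", "rat", "zat"]

# grep keeps the elements that match the pattern.
rhymes = words.grep(/[cbr]at/)

# [cbr] stands for a single character, so only "rat" is matched here:
# "c" and "b" are each followed by the wrong character.
single = "cbrat"[/[cbr]at/]

puts rhymes.inspect  # ["cat", "bat", "rat"]
puts single.inspect  # "rat"
```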
  def test_slash_d_is_a_shortcut_for_a_digit_character_class
    assert_equal "42", "the number is 42"[/[0123456789]+/]
    assert_equal "42", "the number is 42"[/\d+/]
  end

  def test_character_classes_can_include_ranges
    assert_equal "42", "the number is 42"[/[0-9]+/]
  end

  def test_slash_s_is_a_shortcut_for_a_whitespace_character_class
    assert_equal " \t\n", "space: \t\n"[/\s+/]
  end

  def test_slash_w_is_a_shortcut_for_a_word_character_class
    # NOTE: This is more like how a programmer might define a word.
    assert_equal "variable_1", "variable_1 = 42"[/[a-zA-Z0-9_]+/]
    assert_equal "variable_1", "variable_1 = 42"[/\w+/]
  end

  def test_period_is_a_shortcut_for_any_non_newline_character
    assert_equal "abc", "abc\n123"[/a.+/]
  end

  def test_a_character_class_can_be_negated
    assert_equal "the number is ", "the number is 42"[/[^0-9]+/]
  end

  def test_shortcut_character_classes_are_negated_with_capitals
    assert_equal "the number is ", "the number is 42"[/\D+/]
    assert_equal "space:", "space: \t\n"[/\S+/]
    # ... a programmer would most likely do
    assert_equal " = ", "variable_1 = 42"[/[^a-zA-Z0-9_]+/]
    assert_equal " = ", "variable_1 = 42"[/\W+/]
  end
To match digits, you can list every digit in a character class or use the shorthand \d, so \d+ matches a run of one or more digits. A character class can also contain a range, e.g. [0-9]. Similarly, \s is shorthand for a whitespace character and \w for a word character ([a-zA-Z0-9_]), while a period matches any character except a newline. A leading ^ inside the brackets negates the class, so [^0-9] matches any single character that is not a digit. The shorthand classes are negated by capitalizing them: \D, \S, and \W match a non-digit, a non-whitespace character, and a non-word character respectively, e.g. /\W/.
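As a rough illustration of the shorthand classes and their negations, the following sketch runs several of them over one made-up string. Note that a negated class still returns only the first matching run, not "everything else" in the string.

```ruby
line = "order_7 costs 42 USD"

digits = line.scan(/\d+/)  # every run of digits
words  = line.scan(/\w+/)  # every run of word characters

# A negated class and its capitalized shorthand find the same first run.
non_digits_class     = line[/[^0-9]+/]
non_digits_shorthand = line[/\D+/]

separators = line.scan(/\W+/)  # runs of non-word characters

puts digits.inspect                # ["7", "42"]
puts words.inspect                 # ["order_7", "costs", "42", "USD"]
puts non_digits_class.inspect      # "order_"
puts non_digits_shorthand.inspect  # "order_"
puts separators.inspect            # [" ", " ", " "]
```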
  def test_slash_a_anchors_to_the_start_of_the_string
    assert_equal "start", "start end"[/\Astart/]
    assert_equal nil, "start end"[/\Aend/]
  end

  def test_slash_z_anchors_to_the_end_of_the_string
    assert_equal "end", "start end"[/end\z/]
    assert_equal nil, "start end"[/start\z/]
  end

  def test_caret_anchors_to_the_start_of_lines
    assert_equal "2", "num 42\n2 lines"[/^\d+/]
  end

  def test_dollar_sign_anchors_to_the_end_of_lines
    assert_equal "42", "2 lines\nnum 42"[/\d+$/]
  end

  def test_slash_b_anchors_to_a_word_boundary
    assert_equal "vines", "bovine vines"[/\bvine./]
  end
Anchors are metacharacters that match positions between characters rather than characters themselves, pinning the match to a specific position. For example, \A matches at the beginning of the string, \z matches at the end of the string, ^ matches at the start of a line, $ matches at the end of a line, and \b matches at a word boundary.
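The difference between string anchors and line anchors is easiest to see on multi-line input. The sketch below uses a made-up two-line string; it also shows \Z (capital), which is not used in the koans but differs from \z in that it tolerates one trailing newline.

```ruby
text = "first line\nsecond line\n"

string_start = text[/\A\w+/]      # only the very start of the string
line_starts  = text.scan(/^\w+/)  # the start of every line
line_ends    = text.scan(/\w+$/)  # the end of every line

# \z is strict: the string ends with "\n", so "line" is not at the end.
strict_end = text[/line\z/]

# \Z also matches just before a final newline.
lenient_end = text[/line\Z/]

# \b requires a word boundary, so the "vine" inside "bovine" is skipped.
bounded = "bovine vines"[/\bvine\w*/]

puts string_start.inspect  # "first"
puts line_starts.inspect   # ["first", "second"]
puts line_ends.inspect     # ["line", "line"]
puts strict_end.inspect    # nil
puts lenient_end.inspect   # "line"
puts bounded.inspect       # "vines"
```

When validating a whole string, \A and \z are usually the safer pair, since ^ and $ can succeed on any single line of multi-line input.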
  def test_parentheses_group_contents
    assert_equal "hahaha", "ahahaha"[/(ha)+/]
  end

  def test_parentheses_also_capture_matched_content_by_number
    assert_equal "Gray", "Gray, James"[/(\w+), (\w+)/, 1]
    assert_equal "James", "Gray, James"[/(\w+), (\w+)/, 2]
  end

  def test_variables_can_also_be_used_to_access_captures
    assert_equal "Gray, James", "Name: Gray, James"[/(\w+), (\w+)/]
    assert_equal "Gray", $1
    assert_equal "James", $2
  end

  def test_a_vertical_pipe_means_or
    grays = /(James|Dana|Summer) Gray/
    assert_equal "James Gray", "James Gray"[grays]
    assert_equal "Summer", "Summer Gray"[grays, 1]
    assert_equal nil, "Jim Gray"[grays, 1]
  end

  def test_scan_is_like_find_all
    assert_equal ["one", "two", "three"], "one two-three".scan(/\w+/)
  end

  def test_sub_is_like_find_and_replace
    assert_equal "one t-three", "one two-three".sub(/(t\w*)/) { $1[0, 1] }
  end

  def test_gsub_is_like_find_and_replace_all
    assert_equal "one t-t", "one two-three".gsub(/(t\w*)/) { $1[0, 1] }
  end
end
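Parentheses group part of a pattern so that a quantifier can apply to the whole group, and they also capture the matched text, which can be retrieved by position number or through the variables $1, $2, and so on. The vertical bar metacharacter combines multiple expressions into a single one that matches either of them. scan returns every match in the string, sub replaces the first match, and gsub replaces all matches; when given a block, both use the block's return value as the replacement.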
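As a sketch of the same ideas outside the test harness, the snippet below reads captures from a MatchData object instead of $1 and $2, and passes the matched text to the sub and gsub blocks as a block argument. The variable names are illustrative only.

```ruby
record = "Name: Gray, James"

# String#match returns a MatchData object (or nil when nothing matches).
m = record.match(/(\w+), (\w+)/)
whole       = m[0]        # the entire match
last, first = m.captures  # the numbered groups, in order

# Alternation inside a group: capture whichever first name matched.
grays  = /(James|Dana|Summer) Gray/
member = "Summer Gray"[grays, 1]

text = "one two-three"
all_words     = text.scan(/\w+/)                         # find all
first_changed = text.sub(/t\w*/)  { |word| word[0] }     # replace first match
all_changed   = text.gsub(/t\w*/) { |word| word.upcase } # replace every match

puts whole.inspect          # "Gray, James"
puts [last, first].inspect  # ["Gray", "James"]
puts member.inspect         # "Summer"
puts all_words.inspect      # ["one", "two", "three"]
puts first_changed.inspect  # "one t-three"
puts all_changed.inspect    # "one TWO-THREE"
```

Reading captures from MatchData avoids relying on the global $1 and $2 variables, which are overwritten by the next successful match.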