Skip to content

This repository contains gradle project with java implementation of word break problem.

License

Notifications You must be signed in to change notification settings

olesiakissa/word-break-problem

Repository files navigation

Word Break Problem

Build Status codecov Codacy Badge Codacy Badge

The words.txt file contains a sorted list of approximately 173,000 words. The words are listed one word per line, do not contain spaces, and are all lowercase.

What we have to find

  • the longest concatenated word (that is, the longest word that is comprised entirely of shorter words in the file)
  • the 2nd longest concatenated word
  • the total count of all the concatenated words in the file

HashSet as perfect data structure or Why Am I Not Using Tries

In this project I used HashSet as a data structure to contain a list of words because according to Big O complexity table

  • It has O(1) lookup time while trees have O(m) where m depends on the length of the string we are looking up;
  • In some cases tries require much more space than hashset because memory can be allocated for each string character while in hashset it is a single chunk of memory for the whole string entry;
  • Imagine that the alphabet is not 26 english characters but 136,690 Unicode symbols. That means that each node of tree will represent 136,690 symbols. Here we are moving to the conclusion that complexity grows from O(m) to O(alphabet_size x key_length x N) where N is number of keys in Trie. Sounds not that nice, uh? Tries just can't work fast on big input alphabets.

Gathered metrics & results

snap1

About

This repository contains gradle project with java implementation of word break problem.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published