In this assignment, your task is to create a class called Wordlist
that counts
how many times each word appears in a file. The class must use a
singly-linked list as its underlying representation --- vectors, arrays, or
other container data structures are not allowed.
When it's done, you'll be able to write code like this:
Wordlist lst("tiny_shakespeare.txt");
lst.print_stats();
Which prints:
Number of different words: 25670
Total number of words: 202651
Most frequent word: the 5437
Number of singletons: 14919 (58%)
All the code you'll submit for this assignment goes in Wordlist.h.
Don't put main
in Wordlist.h. Instead, put main
in
a1_main.cpp, along with all the code you need to test your
Wordlist
class.
Use a1_main.cpp (it includes Wordlist.h) for testing your code.
Note You can download all the files for this assignment in a single .zip archive from the Github repository for the course. Click on the green "Code" button, and then click on "Download ZIP".
Wordlist.h has the class Wordlist
where you should implement all
the virtual methods listed in the Wordlist_base
class (in
Wordlist_base.h).
Most of the methods in Wordlist_base
are virtual and abstract methods, and
so you must write your own version of them in Wordlist
. A few methods, such
as print_stats
, are not virtual
and have implementations that you can't
change, i.e. your Wordlist
class must work correctly with them as given.
Do not change Wordlist_base.h in any way: keep it as-is.
Write your implementation of Wordlist
in Wordlist.h. It must
publicly inherit from Wordlist_base
, and use the Node
struct
(given in
Wordlist
) to implement a singly-linked list.
Don't use vectors, arrays, or any other data structures in Wordlist.h.
In addition to the methods listed in Wordlist_base
, in Wordlist
write a
default constructor that takes no parameters and creates an empty Wordlist
object:
Wordlist lst;
// ... lst is an empty Wordlist object ...
Also, write a constructor that takes the name of a file as input, and adds all
the words in that file to the Wordlist
object. Read the words from the file
using C++'s standard <<
operator (the first example at the top of this
assignment shows how this constructor should work, and what its output ought to
be).
Write a destructor for Wordlist
that de-allocates all the nodes in the list.
In Wordlist_base
, the destructor is called ~Wordlist_base()
, and the one you
write for Wordlist
should be called ~Wordlist()
.
Use the test_read()
function in a1_main.cpp to test your code.
For example, small.txt contains the following text:
This is
a test
or is this
a test?
When you run this code:
// ...
void test_read()
{
Wordlist lst;
string w;
while (cin >> w)
{
lst.add_word(w);
}
lst.print_words();
cout << endl;
lst.print_stats();
}
int main()
{
test_read();
}
The output for small.txt should be:
❯ ./a1_main < small.txt
1. {"This", 1}
2. {"a", 2}
3. {"is", 2}
4. {"or", 1}
5. {"test", 1}
6. {"test?", 1}
7. {"this", 1}
Number of different words: 7
Total number of words: 9
Most frequent word: a 2
Number of singletons: 5 (71%)
Notice that case matters, e.g. "This"
and "this"
are counted as
different words. Also, punctuation matters, e.g. "test"
and "test?"
are
counted as different.
Note Real life programs would likely strip out punctuation and ignore letter case, but in this assignment we want to count every word exactly as it appears in the file. This makes the code a littler simpler, and more consistent across students.
Here's another example using the larger file tiny_shakespeare.txt:
❯ ./a1_main < tiny_shakespeare.txt >tiny_shakespeare_out
This could take a minute or two to run --- it's a big file and a singly-linked
is not the fastest way to implement Wordlist
.
There's more than 25,000 lines of output, and so the example uses
>tiny_shakespeare_out
to print the output to the file
tiny_shakespeare_out:
1. {"&C:", 2}
...
25670. {"zodiacs", 1}
Number of different words: 25670
Total number of words: 202651
Most frequent word: the 5437
Number of singletons: 14919 (58%)
A good way to test your program is to use the Linux diff
command to compare
your output to this expected output:
> ./a1_main < tiny_shakespeare.txt >out
❯ diff out tiny_shakespeare_out
If diff
prints nothing, then the two files are identical. Otherwise, it prints
each pair of different lines.
When you're done, submit just your Wordlist.h on Canvas. Don't submit anything else. A copy of Wordlist_base.h will be in the same folder as your Wordlist.h when it's compiled.
The marker will use their own a1_main.cpp
that will include your
Wordlist.h and will test the methods in it. They will compile your
code on Ubuntu Linux using makefile, which runs this command:
> make a1_main
g++ -O3 -std=c++17 -Wall -Wextra -Werror -Wfatal-errors -Wno-sign-compare -Wnon-virtual-dtor -g a1_main.cpp -o a1_main
This is a very strict compilation command! No warnings or errors are allowed! Make sure that your code compiles with this command before submitting it.
The marker will test the correctness of your code on at least one text file you have not seen before, and they will also test individual method calls using test functions you have not seen.
Your program will also be run with valgrind
to check for memory leaks, and
other memory errors, e.g.:
> valgrind ./a1_main
==13731== Memcheck, a memory error detector
==13731== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==13731== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==13731== Command: ./a1_main
==13731==
1. {"This", 1}
2. {"a", 2}
3. {"is", 2}
4. {"or", 1}
5. {"test", 1}
6. {"test?", 1}
7. {"this", 1}
Number of different words: 7
Total number of words: 9
Most frequent word: a 2
Number of singletons: 5 (71%)
==13731==
==13731== HEAP SUMMARY:
==13731== in use at exit: 0 bytes in 0 blocks
==13731== total heap usage: 10 allocs, 10 frees, 78,160 bytes allocated
==13731==
==13731== All heap blocks were freed -- no leaks are possible
==13731==
==13731== For lists of detected and suppressed errors, rerun with: -s
==13731== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
valgrind
should report "no leaks are possible", and should not print any
other errors.
Be sure to test your program, and run it with valgrind
, before submitting it.
- 2 marks for each of the 9 virtual methods in
Wordlist_base
that is implemented correctly. - 2 marks for a default constructor that creates an empty
Wordlist
object. - 2 marks for a constructor that takes the name of a file as input, and adds
all the words in that file to the
Wordlist
. Read the words from the file using C++'s standard<<
operator (the first example at the top of this assignment shows how this constructor should work, and what its output ought to be).
- All code is sensibly and consistently indented, and all lines are 100 characters in length, or less.
- Whitespace is used to group related pieces of a code to make it easier for humans to read. All whitespace has a purpose.
- Variable and function names are self-descriptive.
- Appropriate features of C++ are used, as discussed in class and in the notes. Note If you use a feature that we haven't discussed in class, you must explain it in a comment, even if you think it's obvious.
- Comments are used when needed to explain code whose purpose is not obvious from the code itself. There should be no commented-out code from previous versions.
- a score of 0 if you change
Node
is ways that are not allowed, if you modify anything inWordlist_base
, or if you use a vector, array, or any other data structure other than a singly-linked list. - at least -1 mark if your file has an incorrect name, or you submit it in the incorrect format, etc.
- up to -3 marks if you do not include your full name, email, and SFU ID in the header of your file.
- a score of 0 if you don't include the "statement of originality, or its modified in any way.
- a score of 0 if you submit a "wrong" non-working file, and then after the due date submit the "right" file. If you can provide evidence that you finished the assignment on time, then it may be marked.
A good way to catch bugs is to put assert(is_sorted());
at the end of
add_word()
. Then every time you add a word it will check that the list is in
sorted order (by word).
Test as you go! When you write a method, add a few test cases for it, e.g. using
assert
or if-statements.