In this article, we will see how to split a String input (a sentence or a paragraph) into words and count the occurrences of each word.
Image by Author
The picture above defines our problem. Let's see how to solve it in Java.
String text = "We resolve to be brave. We resolve,, to be good. We resolve to uphold the law according to our oath.";
This is the input paragraph.
- String textLower = text.toLowerCase();
- textLower = textLower.replaceAll("\\W", " ");
- textLower = textLower.replaceAll("\\s+", " ");
- String[] words = textLower.split("\\s+");
Here, we convert all letters to lowercase using toLowerCase(). Then we replace every character outside [a-zA-Z0-9_] with a space using replaceAll("\\W", " "). We can also use replaceAll("[^a-zA-Z0-9]", " "), which additionally replaces underscores.
Then we collapse each run of whitespace into a single space using replaceAll("\\s+", " "). This takes care of extra spaces and of consecutive non-word characters or punctuation marks, which the previous step turned into several spaces in a row.
Now, we split the string on whitespace using split("\\s+") and store the pieces in an array called words.
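The normalization and splitting steps above can be tried in isolation with a small sketch like the one below. The class name TextNormalizer and the method toWords are hypothetical names chosen for illustration, and the trim() call and the empty-input check are additions that are not part of the steps described here.

```java
import java.util.Arrays;

class TextNormalizer {

    // Lowercases the text, replaces non-word characters with spaces,
    // collapses runs of whitespace, and splits the result into words.
    static String[] toWords(String text) {
        String textLower = text.toLowerCase();
        textLower = textLower.replaceAll("\\W", " ");
        textLower = textLower.replaceAll("\\s+", " ");
        // Assumption (not in the article): trim so that text starting with
        // punctuation does not produce an empty first element after split.
        textLower = textLower.trim();
        if (textLower.isEmpty()) {
            return new String[0];
        }
        return textLower.split("\\s+");
    }

    public static void main(String[] args) {
        String[] words = toWords("We resolve,, to be good.");
        System.out.println(Arrays.toString(words));
        // prints [we, resolve, to, be, good]
    }
}
```

Without the trim() call, an input that begins with a punctuation mark would start with a space after the replacements, and split("\\s+") would then return an empty string as the first element.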
- Set<String> noDup = new LinkedHashSet<String>(Arrays.asList(words));
- String [] noDupWords = new String[noDup.size()];
- noDupWords = noDup.toArray(noDupWords);
Now, we have copied the word array into a Set so that duplicate words are removed. A LinkedHashSet also keeps the words in the order in which they first appear.
Next, we create a new String array whose size is the number of distinct words and copy all values of the set into that array.
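As a quick check of the de-duplication step, here is a minimal sketch. The class name DistinctWords and the method distinct are hypothetical, and the sample array is made up for the demonstration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class DistinctWords {

    // Returns the distinct words in the order of their first appearance.
    static String[] distinct(String[] words) {
        Set<String> noDup = new LinkedHashSet<>(Arrays.asList(words));
        // toArray fills a String[] with the set's elements in iteration order.
        return noDup.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] words = {"we", "resolve", "to", "be", "brave", "we", "resolve"};
        System.out.println(Arrays.toString(distinct(words)));
        // prints [we, resolve, to, be, brave]
    }
}
```

A plain HashSet would also remove the duplicates, but it gives no guarantee about iteration order, so the output would not necessarily follow the order of the text.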
String retText = "";
Then we initialize an empty String variable, retText, to which the output will be appended.
Next, we loop over the array of distinct words and, for each one, count how many times it occurs in the original words array.
Then we append each word and its count to retText, the String variable initialized earlier.
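The counting loop itself is not reproduced in this text, so the following is only a sketch of what the description suggests, not necessarily the original code. The class name OccurrenceCounter, the method countOccurrences, and the "word,count" line format (inferred from the output shown in this article) are assumptions.

```java
class OccurrenceCounter {

    // For every distinct word, counts its occurrences in the full word
    // array and appends "word,count" followed by a newline to retText.
    static String countOccurrences(String[] words, String[] noDupWords) {
        String retText = "";
        for (String distinctWord : noDupWords) {
            int count = 0;
            for (String word : words) {
                if (distinctWord.equals(word)) {
                    count++;
                }
            }
            retText = retText + distinctWord + "," + count + "\n";
        }
        return retText;
    }

    public static void main(String[] args) {
        String[] words = {"we", "resolve", "to", "be", "we", "to"};
        String[] noDupWords = {"we", "resolve", "to", "be"};
        System.out.print(countOccurrences(words, noDupWords));
        // prints:
        // we,2
        // resolve,1
        // to,2
        // be,1
    }
}
```

The nested loop compares every distinct word with every word of the text, which is fine for a paragraph. For long texts, a StringBuilder would avoid the repeated String concatenation.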
Add a print statement for retText after this loop to see the output.
we,3
resolve,3
to,4
be,2
brave,1
good,1
uphold,1
the,1
law,1
according,1
our,1
oath,1
This is what the program will give.
The full implementation.
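The original listing is not included in this text, so here is a sketch that assembles the steps described in this article into one runnable class. The class name WordCount and the method countWords are hypothetical, the counting loop is reconstructed from the description, and the trim() call with the empty-input check is an added safeguard rather than part of the article's steps.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class WordCount {

    // Returns one "word,count" line per distinct word, in order of first appearance.
    static String countWords(String text) {
        // Normalize: lowercase, turn non-word characters into spaces,
        // collapse whitespace.
        String textLower = text.toLowerCase();
        textLower = textLower.replaceAll("\\W", " ");
        textLower = textLower.replaceAll("\\s+", " ");
        // Assumption (not in the article): avoid an empty first word
        // when the text starts with punctuation.
        textLower = textLower.trim();
        if (textLower.isEmpty()) {
            return "";
        }
        String[] words = textLower.split("\\s+");

        // Remove duplicates while keeping the order of first appearance.
        Set<String> noDup = new LinkedHashSet<String>(Arrays.asList(words));
        String[] noDupWords = new String[noDup.size()];
        noDupWords = noDup.toArray(noDupWords);

        // Count each distinct word against the full word array.
        String retText = "";
        for (String distinctWord : noDupWords) {
            int count = 0;
            for (String word : words) {
                if (distinctWord.equals(word)) {
                    count++;
                }
            }
            retText = retText + distinctWord + "," + count + "\n";
        }
        return retText;
    }

    public static void main(String[] args) {
        String text = "We resolve to be brave. We resolve,, to be good. "
                + "We resolve to uphold the law according to our oath.";
        System.out.print(countWords(text));
    }
}
```

Run with the sample paragraph, this should print the same word and count pairs as the output listed earlier.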
I hope this article helps. Share your thoughts too.