Skip to main content

Grouping Words With Their Counts in Java

In this article, we are going to know how to split the String input (a sentence or a paragraph) into words and count each words’ occurrences.

Image by Author

The picture above is our problem definition and let’s see how to do this in Java.

String text = "We resolve to be brave. We resolve,,  to be good. We resolve to uphold the law according to our oath.";

This is the paragraph or input.
  • String textLower = text.toLowerCase();
  • textLower = textLower.replaceAll("\\W", " ");
  • textLower = textLower.replaceAll("\\s+", " ");
  • String[] words = textLower.split("\\s+");
Here, we have changed all letters to small letters using toLowerCase(). And then we replace characters apart from [a-zA-Z0–9_] with a space using replaceAll(“\\W”, “ ”). We can also replaceAll(“[^a-zA-Z0–9]”, “ ”). 

Then we removed the spaces using replaceAll(“\\s+”, “ ”) and added a single space; this is to bypass the additional spaces and consecutive non-word characters or marks. 

Now, we split the string with their spaces using split(“\\s+”) and have set them in an array called words.
  • Set<String> noDup = new LinkedHashSet<String>(Arrays.asList(words));
  • String [] noDupWords = new String[noDup.size()];
  • noDupWords = noDup.toArray(noDupWords);
Now, we have brought the word array to Sets so that we can remove duplicate words easily. 

Now, we have created a new String array with the size of a distinct word count and put all values of the set to that array.

String retText = "";

Then we’ve initiated an empty String variable to append the output.


Here, we check the occurrence of each word in the distinct elements’ array with the original array and get the count of each word. 

Then we append them to the String Variable which was initialized earlier called retText.

Put a print statement after this loop and see the output.

we,3
resolve,3
to,4
be,2
brave,1
good,1
uphold,1
the,1
law,1
according,1
our,1
oath,1

This is what the program will give. 

The full implementation.



Hope the article can help. Share your thoughts too.

Comments

Popular posts from this blog

A 3000 Years Old Love Story

Pharaoh Ramesses the Great and Queen Nefertari Pharaoh Ramesses II the Great ruled ancient Egypt during the 19th dynasty (1292-1190 BCE). His reign was the second-longest in Egyptian history, lasting from 1279 to 1213 BCE. He assumed the throne in 1279 BC as a royal member of the Nineteenth Dynasty and ruled for 67 years. In Greek sources, Ramesses II was also known as Ozymandias, with the first half of the appellation deriving from Ramesses' regnal name, Usermaatre Setepenre, which means 'The Maat of Ra is mighty, Chosen of Ra'.  He is also recognized as the Egyptian Empire's greatest, most renowned, and most dominating pharaoh. His successors and subsequent Egyptians are reported to have referred to him as the Great Ancestor. Ramesses II was a famous explorer, monarch, and warrior who conducted multiple military excursions to the Levant to reestablish Egyptian dominance over Canaan. He is also supposed to have conducted journeys south to Nubia, which are documented in...

Parallel A* Search on GPU

A* search is a fundamental topic in Artificial Intelligence. In this article, let’s see how we can implement this marvelous algorithm in parallel on Graphics Processing Unit (GPU). Traditional A* Search Classical A* search implementations typically use two lists, the open list, and the closed list, to store the states during expansion. The closed list stores all of the visited states and is used to prevent the same state from being expanded multiple times. To detect duplicated nodes, this list is frequently implemented by a linked hash table. The open list normally contains states whose successors have not yet been thoroughly investigated. The open list’s data structure is a priority queue, which is typically implemented by a binary heap. The open list of states is sorted using the heuristic function  f(x) : f(x) = g(x) + h(x). The distance or cost from the starting node to the current state  x  is defined by the function  g(x) , and the estimated distance or co...

Sorting Algorithms  : A Comprehensive Guide

Sorting Algorithms  : Explained With Illustrations Sorting is the process of structuring data in a specific format. The sorting algorithm explicitly states how to arrange data in a specific order. The most popular orders are numerical or lexicographical. When humans understood the importance of searching speedily, they coined the term sorting. The significance of sorting stems from the fact that if data is stored in a sorted manner, data searching can be optimized to a very high level. Sorting is also used to make data more readable. The following are a few examples of sorting in real-world scenarios. 1. Telephone Directory: The telephone directory stores people’s phone numbers alphabetically so that the names can be easily searched. 2. Dictionary: Words are stored in alphabetical order in the dictionary, making it easy to find any word. Sorting algorithms can be classified into two types. Integer sorts Comparison sorts Integer sorts Counting sorts are another name for integer sort...