
Grouping Words With Their Counts in Java

In this article, we will see how to split a String input (a sentence or a paragraph) into words and count the occurrences of each word.

Image by Author

The picture above illustrates the problem; let's see how to solve it in Java.

String text = "We resolve to be brave. We resolve,,  to be good. We resolve to uphold the law according to our oath.";

This is the input paragraph.
  • String textLower = text.toLowerCase();
  • textLower = textLower.replaceAll("\\W", " ");
  • textLower = textLower.replaceAll("\\s+", " ");
  • String[] words = textLower.split("\\s+");
Here, we convert all letters to lowercase using toLowerCase(). Then we replace every character outside [a-zA-Z0-9_] with a space using replaceAll("\\W", " "). An alternative is replaceAll("[^a-zA-Z0-9]", " "), which also treats the underscore as a separator.

Then we collapse each run of whitespace into a single space using replaceAll("\\s+", " "); this takes care of the extra spaces left behind by repeated spaces and consecutive punctuation marks.

Now we split the string on whitespace using split("\\s+") and store the pieces in an array called words.
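The normalization steps above can be tried in isolation. The following is a minimal sketch, not the article's original code: the class name TokenizeSketch and the helper tokenize are hypothetical, and the trim() call is an added assumption, included because split("\\s+") returns an empty first element when the cleaned string starts with a space (for example, when the input begins with punctuation).

```java
import java.util.Arrays;

class TokenizeSketch {
    // Hypothetical helper: lowercase, replace non-word characters with spaces,
    // then split on runs of whitespace.
    static String[] tokenize(String text) {
        String textLower = text.toLowerCase();
        textLower = textLower.replaceAll("\\W", " ");
        // trim() is an assumption added here; without it, leading punctuation
        // in the input would produce an empty first token.
        return textLower.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // prints [we, resolve, to, be, good]
        System.out.println(Arrays.toString(tokenize("We resolve,,  to be good.")));

        // \W leaves the underscore alone, the explicit character class does not.
        System.out.println("snake_case".replaceAll("\\W", " "));            // prints snake_case
        System.out.println("snake_case".replaceAll("[^a-zA-Z0-9]", " "));   // prints snake case
    }
}
```

Because split("\\s+") already treats a run of spaces as one separator, the intermediate replaceAll("\\s+", " ") is left out of this sketch; keeping it, as the article does, is harmless.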
  • Set<String> noDup = new LinkedHashSet<String>(Arrays.asList(words));
  • String [] noDupWords = new String[noDup.size()];
  • noDupWords = noDup.toArray(noDupWords);
Next, we copy the word array into a Set, which removes duplicate words. A LinkedHashSet also keeps the words in the order in which they first appear.

Then we create a new String array sized to the number of distinct words and copy the set's values into it.
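To see the effect of the set on a small input, here is a short sketch. The class name DistinctWordsSketch and the helper distinct are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class DistinctWordsSketch {
    // Hypothetical helper: returns the distinct words in first-seen order.
    static String[] distinct(String[] words) {
        Set<String> noDup = new LinkedHashSet<>(Arrays.asList(words));
        return noDup.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] words = {"we", "resolve", "to", "be", "brave",
                          "we", "resolve", "to", "be", "good"};
        // prints [we, resolve, to, be, brave, good]
        System.out.println(Arrays.toString(distinct(words)));
    }
}
```

A plain HashSet would also remove the duplicates, but it makes no promise about iteration order, so the output lines could come out shuffled.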

String retText = "";

Then we initialize an empty String variable to which the output will be appended.


Next, a loop compares each word in the distinct-word array against every word in the original array and counts its occurrences.

Each word and its count are then appended to retText, the String variable initialized earlier.
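The loop itself is not reproduced in the text, so what follows is a minimal sketch of what it might look like, based on the description above. The class name CountLoopSketch, the helper countWords, and the "word,count" line format (inferred from the output shown in this article) are assumptions.

```java
class CountLoopSketch {
    // Hypothetical helper: for each distinct word, count how many times it
    // appears in the full word array and append "word,count" to the result.
    static String countWords(String[] words, String[] noDupWords) {
        String retText = "";
        for (String distinct : noDupWords) {
            int count = 0;
            for (String word : words) {
                if (distinct.equals(word)) {
                    count++;
                }
            }
            retText = retText + distinct + "," + count + "\n";
        }
        return retText;
    }

    public static void main(String[] args) {
        String[] words = {"to", "be", "or", "not", "to", "be"};
        String[] noDupWords = {"to", "be", "or", "not"};
        // prints to,2 / be,2 / or,1 / not,1 on separate lines
        System.out.print(countWords(words, noDupWords));
    }
}
```

The nested loops compare every distinct word with every word in the input, which is fine for a paragraph; for long texts, a StringBuilder in place of repeated String concatenation would likely be the first thing to change.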

Add a print statement after this loop to see the output.

we,3
resolve,3
to,4
be,2
brave,1
good,1
uphold,1
the,1
law,1
according,1
our,1
oath,1

This is the output the program produces.

The full implementation.
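The original listing is not included in the text, so the program below is a sketch assembled from the steps described in this article rather than the author's exact code. The class name WordCountSketch and the method groupWithCounts are hypothetical, and the trim() call is an added assumption that guards against an empty first word when the input starts with punctuation.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class WordCountSketch {
    // Hypothetical method: returns one "word,count" line per distinct word,
    // in the order the words first appear in the text.
    static String groupWithCounts(String text) {
        // Normalize: lowercase, replace non-word characters, collapse whitespace.
        String textLower = text.toLowerCase();
        textLower = textLower.replaceAll("\\W", " ");
        textLower = textLower.replaceAll("\\s+", " ");
        // trim() is an assumption added in this sketch (see the note above).
        String[] words = textLower.trim().split("\\s+");

        // Remove duplicates while keeping first-seen order.
        Set<String> noDup = new LinkedHashSet<String>(Arrays.asList(words));
        String[] noDupWords = new String[noDup.size()];
        noDupWords = noDup.toArray(noDupWords);

        // Count each distinct word against the full word array.
        String retText = "";
        for (String distinct : noDupWords) {
            int count = 0;
            for (String word : words) {
                if (distinct.equals(word)) {
                    count++;
                }
            }
            retText = retText + distinct + "," + count + "\n";
        }
        return retText;
    }

    public static void main(String[] args) {
        String text = "We resolve to be brave. We resolve,,  to be good. "
                + "We resolve to uphold the law according to our oath.";
        System.out.print(groupWithCounts(text));
    }
}
```

Run on the sample paragraph, this sketch should print the same twelve lines listed earlier, starting with we,3 and ending with oath,1.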



Hope the article can help. Share your thoughts too.
