String Tokenize in Groovy

In programming, tokenizing a String is a simple form of lexical analysis: it produces parts of a String that we can process one by one. In Java we achieve this with the StringTokenizer class. Groovy makes it more convenient by providing the tokenize() method on the String class. Below are some examples of how to tokenize a String in Groovy.

Groovy Tokenize Whitespace

Whitespace means space, tab, newline, carriage return, or form feed. This is the simplest of all delimiters, and tokenizing by whitespace breaks a String down into its words, so we can process them one by one. Below is an example:
def string = "I feel good"
println string.tokenize()

This code will break down the String and return a List with three elements, which are the three words in the String:

[I, feel, good]
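One handy property of tokenize() is that a run of consecutive whitespace counts as a single delimiter, so it never returns empty-string tokens. A minimal sketch of this behavior:

```groovy
// Multiple spaces and tabs between words are treated as one delimiter run,
// so tokenize() never produces empty strings.
def padded = "I   feel\tgood"
def tokens = padded.tokenize()
println tokens        // [I, feel, good]
println tokens.size() // 3
```

This makes tokenize() convenient for free-form text where the spacing is inconsistent.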

We can explore more on the result:

def string = "I feel good"
def tokens = string.tokenize()
println tokens.size()
tokens.each{ token ->
    println token
}

This will be the output:

3
I
feel
good

This shows that tokenize has broken the String down and returned a List of substrings, in order of appearance in the String.
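Since the result is an ordinary java.util.List, all the usual collection methods are available on it. A small sketch:

```groovy
// tokenize() returns a plain List, so List and Groovy collection
// methods work on the result directly.
def tokens = "I feel good".tokenize()
println tokens.first()               // I
println tokens.last()                // good
println tokens.collect { it.size() } // [1, 4, 4]
```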

Groovy Tokenize With Custom Delimiter

Sometimes the String we are processing does not represent sentences; instead it is prepared for consumption by a program. For example, we may want to process a line of a CSV file, where the delimiter is a comma. This is not meant for humans but strictly for program consumption.
def fakeCsvLineContent = "1,Doe,James,5000"
def tokens = fakeCsvLineContent.tokenize(",")
println "ID = " + tokens[0]
println "Last Name = " + tokens[1]
println "First Name = " + tokens[2]
println "Salary = " + tokens[3]

The code will print:

ID = 1
Last Name = Doe
First Name = James
Salary = 5000
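One caveat when parsing CSV-like data this way: because tokenize() drops empty tokens, an empty field shifts all the following columns. If empty fields matter, split() preserves them. A sketch using a hypothetical row with an empty last-name field:

```groovy
// tokenize() drops the empty field, misaligning the columns;
// split() keeps it (row and field names are made up for illustration).
def row = "2,,Anna,4000"
println row.tokenize(",") // [2, Anna, 4000] -- only 3 fields
println row.split(",").length // 4
```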

Just take note that the parameter to the tokenize method follows the behavior of StringTokenizer: it is treated as a set of delimiter characters, any one of which breaks the String. For example:
def fakeCsvLineContent = "1,Doe,-James,5000"
def tokens = fakeCsvLineContent.tokenize(",-")
println tokens.size()
tokens.each{ token ->
    println token
}

It does not break the String on the exact comma-and-dash sequence; it interprets the argument as "break into components on either comma or dash". Hence the result below:

4
1
Doe
James
5000
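If we actually want to split on the exact ",-" sequence rather than on each character, one option is split(), whose argument is a regular expression and therefore matches the whole sequence literally here. A minimal sketch:

```groovy
// split() treats ",-" as a regex matching the literal two-character
// sequence, so the String breaks in only one place.
def content = "1,Doe,-James,5000"
println(content.split(",-") as List) // [1,Doe, James,5000]
```

Note that with split(), characters that are regex metacharacters (such as "." or "|") would need escaping.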