Using the split method of the String class

2021-06-05

Hello everyone. This page is an English translation of a Japanese page. (The original Japanese has been slightly rewritten to make it easier to translate into English.)

This time we used the String#split method. This method is used to split a string.

String to be handled

The string handled this time is as follows.

  • java-beginner.com x.xx.xxx.xxxx - - [27/Jul/2016:23:48:20 +0900]

This is an excerpt from the beginning of one line of the server's access log. It contains the following fields.

  1. Domain name of the access destination
  2. Host name or IP address of the access source
  3. Client-side user name (usually blank)
  4. User name for the authentication (usually blank)
  5. Access Date

If you want to split this string on half-width spaces, I think it is common to use the following methods of the String class.

  • String[] split(String regex)
  • String[] split(String regex, int limit)

Let’s actually write a program that uses the above method.

Split by spaces

I wanted to split the string on whitespace, so I came up with the following program. The parameter "String regex" is a regular expression, but I simply set it to " " (a single space).

Source
public class HelloSplit {

    public static void main(String[] args) {
        String log = "java-beginner.com x.xx.xxx.xxxx - - [27/Jul/2016:23:48:20 +0900]";

        // Split on every single space and print each piece on its own line.
        for (String s : log.split(" ")) {
            System.out.println(s);
        }
    }

}
Execution results
java-beginner.com
x.xx.xxx.xxxx
-
-
[27/Jul/2016:23:48:20
+0900]

I called log.split(" ") in the enhanced for statement to split the string. However, the result has a problem at the end. If you look closely, the timestamp was split in two at the space before "+0900".

A stopgap fix is to use log.split(" ", 5), which limits the result to five elements. However, this only works for this excerpt, where the timestamp is the last field. Let's use a regular expression to handle this more generally.
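As a minimal sketch of the log.split(" ", 5) idea mentioned above, the following program applies the limit to the same sample string. The class name SplitWithLimit and the helper method splitLog are hypothetical names chosen only for this illustration.

```java
class SplitWithLimit {

    // Hypothetical helper: splits on a single space and returns at most
    // 5 elements. Everything after the fourth space stays in the last element.
    static String[] splitLog(String log) {
        return log.split(" ", 5);
    }

    public static void main(String[] args) {
        String log = "java-beginner.com x.xx.xxx.xxxx - - [27/Jul/2016:23:48:20 +0900]";

        for (String s : splitLog(log)) {
            System.out.println(s);
        }
    }
}
```

With this sample, the fifth element should be the whole timestamp, "[27/Jul/2016:23:48:20 +0900]". If the line continued with more fields after the timestamp, they would all end up in that same fifth element, which is why this approach does not generalize.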

Use a more generic regular expression

For a more general approach, I used log.split("(?!(\\s\\+))\\s"). Because the regular expression is written as a string literal in a Java source file, each backslash is escaped by writing a second backslash. Written this way, the pattern that the regular expression engine actually receives is "(?!(\s\+))\s".
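To make the escaping concrete, here is a small sketch that prints the string as it exists at run time, together with its length. The class name RegexEscapeDemo and the method regex() are hypothetical and exist only for this illustration.

```java
class RegexEscapeDemo {

    // Hypothetical helper returning the same literal used in this article.
    // In source code every backslash is doubled; at run time the string
    // contains a single backslash in each of those places.
    static String regex() {
        return "(?!(\\s\\+))\\s";
    }

    public static void main(String[] args) {
        String regex = regex();

        // Prints the pattern with single backslashes.
        System.out.println(regex);

        // Prints the number of characters after escape processing.
        System.out.println(regex.length());
    }
}
```

The literal has 15 characters between the quotes in the source file, but the run-time string has 12, because each "\\" pair becomes one backslash.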

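I'm not at all familiar with regular expressions, but according to my research, "(?!...)" is a negative lookahead: it succeeds only when the pattern inside it does not match at the current position. So a whitespace character is treated as a delimiter unless it is the start of " +" (whitespace followed by a plus sign). The result is as follows.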

Source
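public class HelloSplitRegex {

    public static void main(String[] args) {
        String log = "java-beginner.com x.xx.xxx.xxxx - - [27/Jul/2016:23:48:20 +0900]";

        // Match a whitespace character, except where whitespace followed by '+' begins.
        String regex = "(?!(\\s\\+))\\s";

        // Show the pattern as the regular expression engine receives it.
        System.out.println("regex: " + regex);

        for (String s : log.split(regex)) {
            System.out.println(s);
        }
    }

}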
Execution results
regex: (?!(\s\+))\s
java-beginner.com
x.xx.xxx.xxxx
-
-
[27/Jul/2016:23:48:20 +0900]

As expected, the timestamp was not split. There may be better ways to do this, but for now I am satisfied with this program.
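As one possible variation, and not necessarily a better one, the lookahead can be placed after the whitespace instead of before it. The sketch below assumes the pattern "\s(?!\+)", meaning a whitespace character that is not followed by a plus sign. The class name SplitLookaheadAfter, the constant REGEX, and the method splitLog are hypothetical names used only here.

```java
class SplitLookaheadAfter {

    // Assumed alternative pattern: one whitespace character that is
    // NOT immediately followed by '+'.
    static final String REGEX = "\\s(?!\\+)";

    // Hypothetical helper that splits a log line with the pattern above.
    static String[] splitLog(String log) {
        return log.split(REGEX);
    }

    public static void main(String[] args) {
        String log = "java-beginner.com x.xx.xxx.xxxx - - [27/Jul/2016:23:48:20 +0900]";

        for (String s : splitLog(log)) {
            System.out.println(s);
        }
    }
}
```

For the sample string in this article, this should produce the same five elements as the program above. Like the original pattern, it relies on the time zone offset beginning with "+", so an offset that begins with "-" would still be split off.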

That’s all. I hope this is helpful to you.
