String stringContent = Files.readString(Path.of("c:/temp/myfile.txt"), StandardCharsets.US_ASCII);
Below is an example of reading a text file encoded in ISO_8859_1, that is, ISO Latin Alphabet No. 1 or ISO-LATIN-1. Again, the contents of the file are read into a String variable.
String stringContent = Files.readString(Path.of("c:/temp/myfile.txt"), StandardCharsets.ISO_8859_1);
If we wish to support languages other than English, UTF-8 is a very good choice. The example below uses the Eight-bit UCS Transformation Format, commonly called UTF-8.
String stringContent = Files.readString(Path.of("c:/temp/myfile.txt"), StandardCharsets.UTF_8);
Another common encoding is the Sixteen-bit UCS Transformation Format with big-endian byte order. Below is an example of how to read such a file into a String using Java 11.
String stringContent = Files.readString(Path.of("c:/temp/myfile.txt"), StandardCharsets.UTF_16BE);
Just as some files are 16-bit big-endian, others are little-endian. Below is an example of reading a text file in Sixteen-bit UCS Transformation Format with little-endian byte order using Java 11 and assigning its contents to a String.
String stringContent = Files.readString(Path.of("c:/temp/myfile.txt"), StandardCharsets.UTF_16LE);
The last example below shows how to read a file in Sixteen-bit UCS Transformation Format where the byte order is identified by an optional byte-order mark.
String stringContent = Files.readString(Path.of("c:/temp/myfile.txt"), StandardCharsets.UTF_16);
Why do we need so many formats? There are many standards in use, and the bytes in a file depend on the encoding chosen when it was saved, so each file has to be read with the charset it was written in. These six are just the standard charsets defined in StandardCharsets, but they are good enough because they are the most common. Most text files are encoded in one of these six charsets.
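To make the point about encodings concrete, here is a minimal sketch that decodes the same bytes with three of the charsets above. The class name CharsetMismatchDemo, the decode helper and the sample word are hypothetical and exist only for illustration. Only the matching charset recovers the original text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name and the sample text are illustrative only.
class CharsetMismatchDemo {

    // Decodes the given bytes with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is four chars, but UTF-8 stores the last one as two bytes (0xC3 0xA9).
        byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Matching charset: the original four characters come back.
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8));
        // ISO-8859-1 maps every byte to one char, so the two-byte sequence becomes two chars.
        System.out.println(decode(utf8Bytes, StandardCharsets.ISO_8859_1));
        // US-ASCII cannot represent bytes above 0x7F; each is replaced with U+FFFD.
        System.out.println(decode(utf8Bytes, StandardCharsets.US_ASCII));
    }
}
```

The same mismatch happens when a file is read with the wrong charset argument: no exception is thrown by new String(), the text is simply wrong.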
public static String readFileToString(String path) throws IOException {
    byte[] encodedBytes = Files.readAllBytes(Paths.get(path));
    String result = new String(encodedBytes, Charset.forName("US-ASCII"));
    return result;
}
Below is a version of the same method using ISO Latin Alphabet No. 1, known simply as ISO-LATIN-1. It also reads a file into a String using Files.readAllBytes(), but with the ISO-LATIN-1 charset.
public static String readFileToString(String path) throws IOException {
    byte[] encodedBytes = Files.readAllBytes(Paths.get(path));
    String result = new String(encodedBytes, Charset.forName("ISO-8859-1"));
    return result;
}
If we are dealing with languages other than English, UTF-8 is always a good choice. Below is the example rewritten with the Eight-bit UCS Transformation Format; it serves the same purpose.
public static String readFileToString(String path) throws IOException {
    byte[] encodedBytes = Files.readAllBytes(Paths.get(path));
    String result = new String(encodedBytes, Charset.forName("UTF-8"));
    return result;
}
And we also rewrite it below using the Sixteen-bit UCS Transformation Format with big-endian byte order, or simply UTF-16BE.
public static String readFileToString(String path) throws IOException {
    byte[] encodedBytes = Files.readAllBytes(Paths.get(path));
    String result = new String(encodedBytes, Charset.forName("UTF-16BE"));
    return result;
}
And if the file we wish to read is in UTF-16LE, the 16-bit UCS little-endian format, we can use the example below:
public static String readFileToString(String path) throws IOException {
    byte[] encodedBytes = Files.readAllBytes(Paths.get(path));
    String result = new String(encodedBytes, Charset.forName("UTF-16LE"));
    return result;
}
And lastly, here is an example using UTF-16, where the byte order is identified by an optional byte-order mark.
public static String readFileToString(String path) throws IOException {
    byte[] encodedBytes = Files.readAllBytes(Paths.get(path));
    String result = new String(encodedBytes, Charset.forName("UTF-16"));
    return result;
}
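Since the six methods above differ only in the charset name, one option is to pass the charset in as a parameter. The sketch below assumes a hypothetical class FileStrings and uses the StandardCharsets constants in place of Charset.forName(). It is one possible arrangement, not the only one.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class; the name FileStrings is not from the article.
class FileStrings {

    // Reads the whole file into memory and decodes the bytes with the given charset.
    static String readFileToString(String path, Charset charset) throws IOException {
        byte[] encodedBytes = Files.readAllBytes(Paths.get(path));
        return new String(encodedBytes, charset);
    }

    public static void main(String[] args) throws IOException {
        // Assumes the file path is passed as the first command-line argument.
        System.out.println(readFileToString(args[0], StandardCharsets.UTF_8));
    }
}
```

A caller then picks the encoding at the call site, for example FileStrings.readFileToString(path, StandardCharsets.UTF_16LE).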
List<String> lines = Files.readAllLines(Paths.get(path), Charset.forName("US-ASCII"));
So we can use this and append all the lines to a final String, returning a single String to the caller. Note that readAllLines() strips the line terminators, and the examples here put a tab after each line instead. Below is an example using US-ASCII:
public static String readFileToString(String path) throws IOException {
    List<String> lines = Files.readAllLines(Paths.get(path), Charset.forName("US-ASCII"));
    String result = "";
    for (String s : lines) {
        result += s + "\t";
    }
    return result;
}
Below is the example rewritten for the LATIN1 charset.
public static String readFileToString(String path) throws IOException {
    List<String> lines = Files.readAllLines(Paths.get(path), Charset.forName("ISO-8859-1"));
    String result = "";
    for (String s : lines) {
        result += s + "\t";
    }
    return result;
}
Here is the same method for the UTF-8 charset, which is very popular in applications.
public static String readFileToString(String path) throws IOException {
    List<String> lines = Files.readAllLines(Paths.get(path), Charset.forName("UTF-8"));
    String result = "";
    for (String s : lines) {
        result += s + "\t";
    }
    return result;
}
And the implementation using the UTF-16 big-endian charset, in case the file was written that way.
public static String readFileToString(String path) throws IOException {
    List<String> lines = Files.readAllLines(Paths.get(path), Charset.forName("UTF-16BE"));
    String result = "";
    for (String s : lines) {
        result += s + "\t";
    }
    return result;
}
And also the UTF-16 little-endian charset, which is another possibility.
public static String readFileToString(String path) throws IOException {
    List<String> lines = Files.readAllLines(Paths.get(path), Charset.forName("UTF-16LE"));
    String result = "";
    for (String s : lines) {
        result += s + "\t";
    }
    return result;
}
And lastly, the plain UTF-16 charset, where the byte order is identified by a byte-order mark.
public static String readFileToString(String path) throws IOException {
    List<String> lines = Files.readAllLines(Paths.get(path), Charset.forName("UTF-16"));
    String result = "";
    for (String s : lines) {
        result += s + "\t";
    }
    return result;
}
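Building the result with += inside a loop creates a new String on every iteration. As a hedged alternative, the sketch below lets String.join() do the work and takes the charset and the separator as parameters. The class name JoinedLines and the separator parameter are assumptions added for illustration.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper class; the name and the extra parameters are not from the article.
class JoinedLines {

    // Reads all lines with the given charset and joins them with the given separator.
    // readAllLines() drops the line terminators, so the separator is placed between
    // lines only, not after the last one.
    static String readFileToString(String path, Charset charset, String separator) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(path), charset);
        return String.join(separator, lines);
    }
}
```

Passing "\n" as the separator keeps the line structure of the file, while "\t" puts the lines on one tab-separated row.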
Just for fun, we explore different ways of doing the same thing; which ones are available depends on your version of Java. In this case, we explore the use of the Scanner class. See the example below:
public static String readFile(String path) throws IOException {
    Scanner scanner = new Scanner(new File(path));
    String result = scanner.useDelimiter("\\A").next();
    scanner.close();
    return result;
}
We can use a Scanner to read the contents of a file and return them as a String, as shown above. Similarly, we can pass a charset name, in case we are parsing files in different encodings. The example below uses ASCII.
public static String readFile(String path) throws IOException {
    Scanner scanner = new Scanner(new File(path), "US-ASCII");
    String result = scanner.useDelimiter("\\A").next();
    scanner.close();
    return result;
}
Below we read a file's contents into a String using Scanner with the ISO-LATIN-1 charset.
public static String readFile(String path) throws IOException {
    Scanner scanner = new Scanner(new File(path), "ISO-8859-1");
    String result = scanner.useDelimiter("\\A").next();
    scanner.close();
    return result;
}
Below we read a file's contents into a String using Scanner with the Eight-bit UCS Transformation Format.
public static String readFile(String path) throws IOException {
    Scanner scanner = new Scanner(new File(path), "UTF-8");
    String result = scanner.useDelimiter("\\A").next();
    scanner.close();
    return result;
}
Below we read a file's contents into a String using Scanner with the Sixteen-bit UCS Transformation Format, big-endian byte order.
public static String readFile(String path) throws IOException {
    Scanner scanner = new Scanner(new File(path), "UTF-16BE");
    String result = scanner.useDelimiter("\\A").next();
    scanner.close();
    return result;
}
Below we read a file's contents into a String using Scanner with the Sixteen-bit UCS Transformation Format, little-endian byte order.
public static String readFile(String path) throws IOException {
    Scanner scanner = new Scanner(new File(path), "UTF-16LE");
    String result = scanner.useDelimiter("\\A").next();
    scanner.close();
    return result;
}
Below we read a file's contents into a String using Scanner with the Sixteen-bit UCS Transformation Format, where the byte order is identified by an optional byte-order mark.
public static String readFile(String path) throws IOException {
    Scanner scanner = new Scanner(new File(path), "UTF-16");
    String result = scanner.useDelimiter("\\A").next();
    scanner.close();
    return result;
}
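The Scanner examples above throw NoSuchElementException when the file is empty, and the Scanner is left open if next() throws. A minimal sketch that guards against both, assuming Java 7 or later for try-with-resources, could look like this. The class name ScannerRead and the charsetName parameter are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

// Hypothetical helper class; the name and the charsetName parameter are not from the article.
class ScannerRead {

    // Reads the whole file as a single token; returns "" for an empty file.
    static String readFile(String path, String charsetName) throws IOException {
        // try-with-resources closes the Scanner even if next() throws.
        try (Scanner scanner = new Scanner(new File(path), charsetName)) {
            // "\\A" matches only the start of input, so the first token is the entire content.
            scanner.useDelimiter("\\A");
            return scanner.hasNext() ? scanner.next() : "";
        }
    }
}
```

Whether an empty file should yield an empty String or an error is a design choice; the sketch picks the empty String.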
Another favorite of many programmers is Apache Commons IO, which has many convenient methods ready to use, such as FileUtils.readFileToString(). See the syntax below:
public static String readFileToString(File file, Charset encoding) throws IOException
The parameters are the file to read and the expected encoding of the file. It returns the contents of the file as a String. Below it is in action with ASCII.
public static String readFile(String path) throws IOException {
    File file = new File(path);
    String result = FileUtils.readFileToString(file, StandardCharsets.US_ASCII);
    return result;
}
We show further examples using different encodings. Below is LATIN1.
public static String readFile(String path) throws IOException {
    File file = new File(path);
    String result = FileUtils.readFileToString(file, StandardCharsets.ISO_8859_1);
    return result;
}
Below is the example with UTF-8 for internationalization support.
public static String readFile(String path) throws IOException {
    File file = new File(path);
    String result = FileUtils.readFileToString(file, StandardCharsets.UTF_8);
    return result;
}
And the 16-bit UCS big-endian charset.
public static String readFile(String path) throws IOException {
    File file = new File(path);
    String result = FileUtils.readFileToString(file, StandardCharsets.UTF_16BE);
    return result;
}
And the 16-bit UCS little-endian charset.
public static String readFile(String path) throws IOException {
    File file = new File(path);
    String result = FileUtils.readFileToString(file, StandardCharsets.UTF_16LE);
    return result;
}
And 16-bit UCS with the byte order identified by an optional byte-order mark.
public static String readFile(String path) throws IOException {
    File file = new File(path);
    String result = FileUtils.readFileToString(file, StandardCharsets.UTF_16);
    return result;
}
If we are using an older version of Java, we may want to use a simple implementation of our own, as shown below:
public static String readFile(String path) throws IOException {
    StringBuilder result = new StringBuilder();
    BufferedReader reader = new BufferedReader(new FileReader(path));
    try {
        // Read in chunks of up to 1024 chars until end of file.
        char[] buf = new char[1024];
        int r = 0;
        while ((r = reader.read(buf)) != -1) {
            result.append(buf, 0, r);
        }
    } finally {
        reader.close();
    }
    return result.toString();
}
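FileReader, as used above, decodes with the platform default charset, so the result can differ between machines. A sketch of the same loop with an explicit charset, still avoiding Java 7 features, might look like the following. The class name LegacyRead and the charsetName parameter are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical helper class; the name and the charsetName parameter are not from the article.
class LegacyRead {

    // Reads the whole file, decoding with the named charset (for example "UTF-8").
    static String readFile(String path, String charsetName) throws IOException {
        StringBuilder result = new StringBuilder();
        FileInputStream in = new FileInputStream(path);
        try {
            // InputStreamReader does the byte-to-char decoding with the requested charset.
            BufferedReader reader = new BufferedReader(new InputStreamReader(in, charsetName));
            char[] buf = new char[1024];
            int r;
            while ((r = reader.read(buf)) != -1) {
                result.append(buf, 0, r);
            }
        } finally {
            // Closing the underlying stream releases the file handle even if decoding fails.
            in.close();
        }
        return result.toString();
    }
}
```

An unsupported charset name makes the InputStreamReader constructor throw UnsupportedEncodingException, which is a subclass of IOException, so callers handle it the same way as a read failure.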