Java Read File Line by Line Using Scanner
Reading files in Java is the cause of a lot of confusion. There are multiple ways of accomplishing the same task and it's often not clear which file reading method is best to use. Something that's quick and dirty for a small example file might not be the best method to use when you need to read a very large file. Something that worked in an earlier Java version might not be the preferred method anymore.
This article aims to be the definitive guide for reading files in Java 7, 8 and 9. I'm going to cover all the ways you can read files in Java. Too often, you'll read an article that tells you one way to read a file, only to find out afterwards there are other ways to do it. I'm actually going to cover 15 different ways to read a file in Java, using the core Java libraries as well as two third party libraries.
But that's not all – what good is knowing how to do something in multiple ways if you don't know which way is best for your situation?
I also put each of these methods through a real performance test and document the results. That way, you will have some hard data on the performance of each method.
Methodology
JDK Versions
Java code samples don't live in isolation, especially when it comes to Java I/O, as the API keeps evolving. All code for this article has been tested on:
- Java SE 7 (jdk1.7.0_80)
- Java SE 8 (jdk1.8.0_162)
- Java SE 9 (jdk-9.0.4)
When there is an incompatibility, it will be stated in that section. Otherwise, the code works unaltered across the different Java versions. The main incompatibility is the use of lambda expressions, which were introduced in Java 8.
Java File Reading Libraries
There are multiple ways of reading from files in Java. This article aims to be a comprehensive collection of all the different methods. I will cover:
- java.io.FileReader.read()
- java.io.BufferedReader.readLine()
- java.io.FileInputStream.read()
- java.io.BufferedInputStream.read()
- java.nio.file.Files.readAllBytes()
- java.nio.file.Files.readAllLines()
- java.nio.file.Files.lines()
- java.util.Scanner.nextLine()
- org.apache.commons.io.FileUtils.readLines() – Apache Commons
- com.google.common.io.Files.readLines() – Google Guava
Closing File Resources
Prior to JDK7, when opening a file in Java, all file resources needed to be closed manually using a try-catch-finally block. JDK7 introduced the try-with-resources statement, which simplifies the process of closing streams. You no longer need to write explicit code to close streams because the JVM will automatically close the stream for you, whether an exception occurred or not. All examples in this article use the try-with-resources statement for opening, reading and closing files.
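As a quick illustration (a minimal sketch, not one of the benchmarked examples), here is the pre-JDK7 try-finally pattern next to try-with-resources; the class name and file path are placeholders:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class TryWithResourcesSketch {

  public static void main(String[] args) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";

    // Pre-JDK7: the stream must be closed manually in a finally block
    BufferedReader oldStyle = new BufferedReader(new FileReader(fileName));
    try {
      System.out.println(oldStyle.readLine());
    } finally {
      oldStyle.close();
    }

    // JDK7+: try-with-resources closes the stream automatically,
    // whether or not an exception is thrown
    try (BufferedReader newStyle = new BufferedReader(new FileReader(fileName))) {
      System.out.println(newStyle.readLine());
    }
  }
}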
File Location
All examples will read test files from C:\temp.
Encoding
Character encoding is not explicitly saved with text files, so Java makes assumptions about the encoding when reading files. Usually the assumption is correct, but sometimes you want to be explicit when instructing your programs to read from files. When the encoding isn't right, you'll see funny characters appear when reading files.
All examples for reading text files use two encoding variations:
Default system encoding, where no encoding is specified, and explicitly setting the encoding to UTF-8.
Download Code
All code files are available from GitHub.
Code Quality and Code Encapsulation
There is a difference between writing code for your personal or work project and writing code to explain and teach concepts.
If I was writing this code for my own project, I would use proper object-oriented principles like encapsulation, abstraction, polymorphism, etc. But I wanted to make each example stand alone and be easily understood, which meant that some of the code has been copied from one example to the next. I did this on purpose because I didn't want the reader to have to figure out all the encapsulation and object structures I so cleverly created. That would take away from the examples.
For the same reason, I chose NOT to write these examples with a unit testing framework like JUnit or TestNG, because that's not the purpose of this article. That would add another library for the reader to understand that has nothing to do with reading files in Java. That's why all the examples are written inline within the main method, without extra methods or classes.
My main purpose is to make the examples as easy to understand as possible, and I believe that having extra unit testing and encapsulation code would not help with this. That doesn't mean that's how I would encourage you to write your own personal code. It's just the way I chose to write the examples in this article to make them easier to understand.
Exception Handling
All examples declare any checked exceptions in the throwing method declaration.
The purpose of this article is to show all the different ways to read from files in Java – it's not meant to show how to handle exceptions, which will be very specific to your situation.
So instead of creating unhelpful try-catch blocks that simply print exception stack traces and clutter up the code, all examples declare any checked exceptions in the calling method. This makes the code cleaner and easier to understand without sacrificing any functionality.
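As a minimal illustration of that choice (the class name is a placeholder and this isn't one of the benchmarked examples), here is a method that declares the checked exception next to the kind of try-catch block these examples avoid:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ExceptionStyleSketch {

  // style used in this article: let the checked exception propagate
  public static void main(String[] args) throws IOException {
    byte[] bytes = Files.readAllBytes(Paths.get("c:\\temp\\sample-10KB.txt"));
    System.out.println(bytes.length);
  }

  // style avoided in this article: a catch block that only prints the stack trace
  static void readWithCatch() {
    try {
      byte[] bytes = Files.readAllBytes(Paths.get("c:\\temp\\sample-10KB.txt"));
      System.out.println(bytes.length);
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}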
Future Updates
As Java file reading evolves, I will be updating this article with any required changes.
File Reading Methods
I organized the file reading methods into three groups:
- Classic I/O classes that have been part of Java since before JDK 1.7. This includes the java.io and java.util packages.
- New Java I/O classes that have been part of Java since JDK 1.7. This covers the java.nio.file.Files class.
- Third party I/O classes from the Apache Commons and Google Guava projects.
Classic I/O – Reading Text
1a) FileReader – Default Encoding
FileReader reads one character at a time, without any buffering. It's meant for reading text files. It uses the default character encoding on your system, so I have provided examples for both the default case, as well as specifying the encoding explicitly.
import java.io.FileReader;
import java.io.IOException;

public class ReadFile_FileReader_Read {

  public static void main(String[] pArgs) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";

    try (FileReader fileReader = new FileReader(fileName)) {

      int singleCharInt;
      char singleChar;
      while ((singleCharInt = fileReader.read()) != -1) {
        singleChar = (char) singleCharInt;
        // display one character at a time
        System.out.print(singleChar);
      }
    }
  }
}
1b) FileReader – Explicit Encoding (InputStreamReader)
It's actually not possible to set the encoding explicitly on a FileReader, so you have to use the parent class, InputStreamReader, and wrap it around a FileInputStream:
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class ReadFile_FileReader_Read_Encoding {

  public static void main(String[] pArgs) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";

    FileInputStream fileInputStream = new FileInputStream(fileName);

    // specify UTF-8 encoding explicitly
    try (InputStreamReader inputStreamReader =
           new InputStreamReader(fileInputStream, "UTF-8")) {

      int singleCharInt;
      char singleChar;
      while ((singleCharInt = inputStreamReader.read()) != -1) {
        singleChar = (char) singleCharInt;
        System.out.print(singleChar); // display one character at a time
      }
    }
  }
}
2a) BufferedReader – Default Encoding
BufferedReader reads an entire line at a time, instead of one character at a time like FileReader. It's meant for reading text files.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadFile_BufferedReader_ReadLine {

  public static void main(String[] args) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";

    FileReader fileReader = new FileReader(fileName);

    try (BufferedReader bufferedReader = new BufferedReader(fileReader)) {

      String line;
      while ((line = bufferedReader.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}
2b) BufferedReader – Explicit Encoding
In a similar fashion to how we set the encoding explicitly for FileReader, we need to create a FileInputStream, wrap it inside an InputStreamReader with an explicit encoding, and pass that to BufferedReader:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class ReadFile_BufferedReader_ReadLine_Encoding {

  public static void main(String[] args) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";

    FileInputStream fileInputStream = new FileInputStream(fileName);

    // specify UTF-8 encoding explicitly
    InputStreamReader inputStreamReader = new InputStreamReader(fileInputStream, "UTF-8");

    try (BufferedReader bufferedReader = new BufferedReader(inputStreamReader)) {

      String line;
      while ((line = bufferedReader.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}
Classic I/O – Reading Bytes
1) FileInputStream
FileInputStream reads one byte at a time, without any buffering. While it's meant for reading binary files such as images or audio files, it can still be used to read text files. It's similar to reading with FileReader in that you're reading one character at a time as an integer and you need to cast that int to a char to see the ASCII value.
By default, it uses the default character encoding on your system, so I have provided examples for both the default case, as well as specifying the encoding explicitly.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

public class ReadFile_FileInputStream_Read {

  public static void main(String[] pArgs) throws FileNotFoundException, IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";

    File file = new File(fileName);

    try (FileInputStream fileInputStream = new FileInputStream(file)) {

      int singleCharInt;
      char singleChar;
      while ((singleCharInt = fileInputStream.read()) != -1) {
        singleChar = (char) singleCharInt;
        System.out.print(singleChar);
      }
    }
  }
}
2) BufferedInputStream
BufferedInputStream reads a set of bytes all at once into an internal byte array buffer. The buffer size can be set explicitly or left at the default, which is what we'll demonstrate in our example. The default buffer size appears to be 8KB but I have not explicitly verified this. All performance tests used the default buffer size, so it will automatically resize the buffer when it needs to.
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

public class ReadFile_BufferedInputStream_Read {

  public static void main(String[] pArgs) throws FileNotFoundException, IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";

    File file = new File(fileName);
    FileInputStream fileInputStream = new FileInputStream(file);

    try (BufferedInputStream bufferedInputStream = new BufferedInputStream(fileInputStream)) {

      int singleCharInt;
      char singleChar;
      while ((singleCharInt = bufferedInputStream.read()) != -1) {
        singleChar = (char) singleCharInt;
        System.out.print(singleChar);
      }
    }
  }
}
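If you do want to set the buffer size explicitly, BufferedInputStream has a constructor that takes the buffer size in bytes. Here is a minimal sketch; the 64KB value is an arbitrary choice for illustration, not what the performance tests used:
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class ReadFile_BufferedInputStream_CustomBuffer {

  public static void main(String[] pArgs) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";

    // the second constructor argument sets the internal buffer size in bytes
    try (BufferedInputStream bufferedInputStream =
           new BufferedInputStream(new FileInputStream(fileName), 64 * 1024)) {

      int singleCharInt;
      while ((singleCharInt = bufferedInputStream.read()) != -1) {
        System.out.print((char) singleCharInt);
      }
    }
  }
}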
New I/O – Reading Text
1a) Files.readAllLines() – Default Encoding
The Files class is part of the new Java I/O classes introduced in JDK 1.7. It only has static utility methods for working with files and directories.
The readAllLines() method that uses the default character encoding was introduced in JDK 1.8, so this example will not work in Java 7.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;

public class ReadFile_Files_ReadAllLines {

  public static void main(String[] pArgs) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";

    File file = new File(fileName);

    List<String> fileLinesList = Files.readAllLines(file.toPath());

    for (String line : fileLinesList) {
      System.out.println(line);
    }
  }
}
1b) Files.readAllLines() – Explicit Encoding
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;

public class ReadFile_Files_ReadAllLines_Encoding {

  public static void main(String[] pArgs) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";

    File file = new File(fileName);

    // use UTF-8 encoding
    List<String> fileLinesList = Files.readAllLines(file.toPath(), StandardCharsets.UTF_8);

    for (String line : fileLinesList) {
      System.out.println(line);
    }
  }
}
2a) Files.lines() – Default Encoding
This code was tested to work in Java 8 and 9. It doesn't run in Java 7 because of the lack of support for lambda expressions.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.stream.Stream;

public class ReadFile_Files_Lines {

  public static void main(String[] pArgs) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";

    File file = new File(fileName);

    try (Stream<String> linesStream = Files.lines(file.toPath())) {
      linesStream.forEach(line -> {
        System.out.println(line);
      });
    }
  }
}
2b) Files.lines() – Explicit Encoding
Just like in the previous example, this code was tested and works in Java 8 and 9, but not in Java 7.
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.stream.Stream;

public class ReadFile_Files_Lines_Encoding {

  public static void main(String[] pArgs) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";

    File file = new File(fileName);

    try (Stream<String> linesStream = Files.lines(file.toPath(), StandardCharsets.UTF_8)) {
      linesStream.forEach(line -> {
        System.out.println(line);
      });
    }
  }
}
3a) Scanner – Default Encoding
The Scanner class was introduced in JDK 1.5 and can be used to read from files or from the console (user input).
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ReadFile_Scanner_NextLine {

  public static void main(String[] pArgs) throws FileNotFoundException {

    String fileName = "c:\\temp\\sample-10KB.txt";

    File file = new File(fileName);

    try (Scanner scanner = new Scanner(file)) {

      String line;
      boolean hasNextLine = false;
      while (hasNextLine = scanner.hasNextLine()) {
        line = scanner.nextLine();
        System.out.println(line);
      }
    }
  }
}
3b) Scanner – Explicit Encoding
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ReadFile_Scanner_NextLine_Encoding {

  public static void main(String[] pArgs) throws FileNotFoundException {

    String fileName = "c:\\temp\\sample-10KB.txt";

    File file = new File(fileName);

    // use UTF-8 encoding
    try (Scanner scanner = new Scanner(file, "UTF-8")) {

      String line;
      boolean hasNextLine = false;
      while (hasNextLine = scanner.hasNextLine()) {
        line = scanner.nextLine();
        System.out.println(line);
      }
    }
  }
}
New I/O – Reading Bytes
Files.readAllBytes()
Even though the documentation for this method states that "it is not intended for reading in large files", I found this to be the absolute best performing file reading method, even on files as large as 1GB.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class ReadFile_Files_ReadAllBytes {

  public static void main(String[] pArgs) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";

    File file = new File(fileName);

    byte[] fileBytes = Files.readAllBytes(file.toPath());

    char singleChar;
    for (byte b : fileBytes) {
      singleChar = (char) b;
      System.out.print(singleChar);
    }
  }
}
Third Party I/O – Reading Text
Commons – FileUtils.readLines()
Apache Commons IO is an open source Java library that comes with utility classes for reading and writing text and binary files. I listed it in this article because it can be used instead of the built-in Java libraries. The class we're using is FileUtils.
For this article, version 2.6 was used, which is compatible with JDK 1.7+.
Note that you need to explicitly specify the encoding, and that the method for using the default encoding has been deprecated.
import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.commons.io.FileUtils;

public class ReadFile_Commons_FileUtils_ReadLines {

  public static void main(String[] pArgs) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";

    File file = new File(fileName);

    List<String> fileLinesList = FileUtils.readLines(file, "UTF-8");

    for (String line : fileLinesList) {
      System.out.println(line);
    }
  }
}
Guava – Files.readLines()
Google Guava is an open source library that comes with utility classes for common tasks like collections handling, cache management, IO operations and string processing.
I listed it in this article because it can be used instead of the built-in Java libraries and I wanted to compare its performance with the built-in Java libraries.
For this article, version 23.0 was used.
I'm not going to examine all the different ways to read files with Guava, since this article is not meant for that. For a more detailed look at all the different ways to read and write files with Guava, take a look at Baeldung's in-depth article.
When reading a file, Guava requires that the character encoding be set explicitly, just like Apache Commons.
Compatibility note: This code was tested successfully on Java 8 and 9. I couldn't get it to work on Java 7 and kept getting an "Unsupported major.minor version 52.0" error. Guava has a separate API doc for Java 7 which uses a slightly different version of the Files.readLines() method. I thought I could get it to work but I kept getting that error.
import java.io.File;
import java.io.IOException;
import java.util.List;

import com.google.common.base.Charsets;
import com.google.common.io.Files;

public class ReadFile_Guava_Files_ReadLines {

  public static void main(String[] args) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";

    File file = new File(fileName);

    List<String> fileLinesList = Files.readLines(file, Charsets.UTF_8);

    for (String line : fileLinesList) {
      System.out.println(line);
    }
  }
}
Performance Testing
Since there are so many ways to read from a file in Java, a natural question is, "Which file reading method is the best for my situation?" So I decided to test each of these methods against each other using sample data files of different sizes and timing the results.
Each code sample in this article reads the contents of the file and displays them to the console (System.out). However, during the performance tests the System.out line was commented out, since it would seriously slow down the performance of each method.
Each performance test measures the time it takes to read in the file – line by line, character by character, or byte by byte – without displaying anything to the console. I ran each test 5-10 times and took the average so as not to let any outliers influence the result. I also ran the default encoding version of each file reading method – i.e. I didn't specify the encoding explicitly.
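To illustrate the measurement approach, here is a minimal timing sketch along those lines (an assumption of how such a harness could look, not the exact code used to produce the published numbers):
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadTimingSketch {

  public static void main(String[] args) throws IOException {

    Path path = Paths.get("c:\\temp\\sample-10KB.txt");
    int runs = 10;
    long totalNanos = 0;

    for (int i = 0; i < runs; i++) {
      long start = System.nanoTime();
      byte[] bytes = Files.readAllBytes(path);   // method under test
      totalNanos += System.nanoTime() - start;
      // System.out.println(new String(bytes));  // commented out, as in the tests
    }

    System.out.println("Average ms: " + (totalNanos / runs) / 1_000_000.0);
  }
}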
Dev Setup
The dev environment used for these tests:
- Intel Core i7-3615QM @ 2.3 GHz, 8GB RAM
- Windows 8 x64
- Eclipse IDE for Java Developers, Oxygen.2 Release (4.7.2)
- Java SE 9 (jdk-9.0.4)
Data Files
GitHub doesn't allow pushing files larger than 100 MB, so I couldn't find a practical way to store my large test files to allow others to replicate my tests. So instead of storing them, I'm providing the tools I used to generate them so you can create test files that are similar in size to mine. Obviously they won't be the same, but you'll generate files that are similar in size to those I used in my performance tests.
Random String Generator was used to generate sample text, which I then simply copy-pasted to create larger versions of the file. When the file started getting too large to manage inside a text editor, I had to use the command line to merge multiple text files into a larger text file:
copy *.txt sample-1GB.txt
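If you prefer to generate a test file programmatically instead of copy-pasting, here is a minimal sketch that appends a small seed file to itself until a target size is reached; the file names and the 10MB target are placeholders, and it produces a file similar in size (not identical) to the ones used here:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class TestFileGenerator {

  public static void main(String[] args) throws IOException {

    Path seed = Paths.get("c:\\temp\\sample-1KB.txt");    // small file with random text
    Path target = Paths.get("c:\\temp\\sample-10MB.txt");
    long targetSize = 10L * 1024 * 1024;                  // 10MB

    byte[] chunk = Files.readAllBytes(seed);
    Files.deleteIfExists(target);
    Files.createFile(target);

    // append the seed contents until the target size is reached
    while (Files.size(target) < targetSize) {
      Files.write(target, chunk, StandardOpenOption.APPEND);
    }
  }
}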
I created the following 7 data file sizes to test each file reading method across a range of file sizes:
- 1KB
- 10KB
- 100KB
- 1MB
- 10MB
- 100MB
- 1GB
Performance Summary
There were some surprises and some expected results from the performance tests.
As expected, the worst performers were the methods that read in a file character by character or byte by byte. But what surprised me was that the native Java IO libraries outperformed both third party libraries – Apache Commons IO and Google Guava.
What's more – both Google Guava and Apache Commons IO threw a java.lang.OutOfMemoryError when trying to read in the 1GB test file. This also happened with the Files.readAllLines(Path) method, but the remaining seven methods were able to read in all test files, including the 1GB test file.
The following table summarizes the average time (in milliseconds) each file reading method took to complete. I highlighted the top three methods in green, the average performing methods in yellow and the worst performing methods in red:
The following chart summarizes the above table but with the following changes:
- I removed java.io.FileInputStream.read() from the chart because its performance was so bad it would skew the entire chart and you wouldn't see the other lines properly.
- I summarized the data from 1KB to 1MB because after that the chart would get too skewed with so many under-performers, and also some methods threw a java.lang.OutOfMemoryError at 1GB.
The Winners
The new Java I/O libraries (java.nio) produced the overall winner (java.nio.Files.readAllBytes()), but it was followed closely by BufferedReader.readLine(), which was also a proven top performer across the board. The other excellent performer was java.nio.Files.lines(Path), which had slightly worse numbers for smaller test files but really excelled with the larger test files.
The absolute fastest file reader across all data tests was java.nio.Files.readAllBytes(Path). It was consistently the fastest, and even reading a 1GB file only took about one second.
The following chart compares performance for a 100KB test file:
You can see that the lowest times were for Files.readAllBytes(), BufferedInputStream.read() and BufferedReader.readLine().
The following chart compares performance for reading a 10MB file. I didn't bother including the bar for FileInputStream.read() because its performance was so bad it would skew the entire chart and you couldn't tell how the other methods performed relative to each other:
Files.readAllBytes() really outperforms all other methods and BufferedReader.readLine() is a distant second.
The Losers
As expected, the absolute worst performer was java.io.FileInputStream.read(), which was orders of magnitude slower than its rivals in most tests. FileReader.read() was also a poor performer for the same reason – reading files byte by byte (or character by character) instead of with buffers drastically degrades performance.
Both the Apache Commons IO FileUtils.readLines() and Guava Files.readLines() crashed with an OutOfMemoryError when trying to read the 1GB test file, and they were about average in performance for the remaining test files.
java.nio.Files.readAllLines() also crashed when trying to read the 1GB test file, but it performed quite well for smaller file sizes.
Performance Rankings
Here's a ranked list of how well each file reading method did, in terms of speed and handling of large files, as well as compatibility with different Java versions.
Rank | File Reading Method |
---|---|
1 | java.nio.file.Files.readAllBytes() |
2 | java.io.BufferedReader.readLine() |
3 | java.nio.file.Files.lines() |
4 | java.io.BufferedInputStream.read() |
5 | java.util.Scanner.nextLine() |
6 | java.nio.file.Files.readAllLines() |
7 | org.apache.commons.io.FileUtils.readLines() |
8 | com.google.common.io.Files.readLines() |
9 | java.io.FileReader.read() |
10 | java.io.FileInputStream.read() |
Conclusion
I tried to present a comprehensive set of methods for reading files in Java, both text and binary. We looked at 15 different ways of reading files in Java and we ran performance tests to see which methods are the fastest.
The new Java IO library (java.nio) proved to be a great performer, but so was the classic BufferedReader.
Source: https://funnelgarden.com/java_read_file/