Java Concurrency in Practice - Book Review

I used my holidays to concentrate on reading my latest book: Java Concurrency in Practice by Brian Goetz (with Tim Peierls, Joshua Bloch, Joseph Bowbeer, David Holmes and Doug Lea).

This book is, from my point of view, the reference for developing concurrent programs in Java.

Reading this book, you will learn that concurrency is everywhere when programming in Java (all the examples are in Java, but the theory is valid for almost all programming languages). You will also learn why GUI frameworks are single-threaded, and you will understand that a lot of Java programs aren't correct because they lack thread safety.

The first chapter, the introduction, explains what threads are and why we use parallel processing. It also contains a first, really simple, interleaving example and shows how to solve it. In the second chapter, the author explains what thread safety is and how to achieve it using locks (intrinsic locks with synchronized). In the next chapter, you learn how to share objects between several threads; this includes the notions of visibility, immutability, thread confinement and safe publication. With the fourth chapter, you learn how to design a thread-safe class, how to delegate thread safety to another class, and why it's really important to document synchronization policies. In the last chapter of this first part, we look at the building blocks of concurrent applications: concurrent collections, blocking queues, synchronizers, and blocking and interruptible methods.
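
To give an idea, the interleaving problem the introduction opens with is a simple sequence generator; from memory, it looks essentially like this (a sketch, not the book's exact listing):

// Broken: value++ is a compound read-modify-write action, so two
// threads calling getNext() can interleave and get the same number.
class UnsafeSequence {
    private int value;

    public int getNext() {
        return value++;
    }
}

// The fix shown in the book: guard the compound action with the
// object's intrinsic lock.
class Sequence {
    private int value;

    public synchronized int getNext() {
        return value++;
    }
}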

The second part is about structuring concurrent applications. It covers the Executor framework, finding exploitable parallelism, cancellation and shutdown of tasks, thread pools, and GUI applications.

The third part is about liveness hazards, performance and scalability, and testing concurrent programs.

The last part describes advanced topics. It covers explicit locks using ReentrantLock and explains how to build custom synchronizers. A chapter is devoted to building concurrent programs using non-blocking algorithms; these algorithms perform better but are a lot more difficult to build. And the last chapter is about the Java Memory Model. This chapter is very technical, but really interesting if you want to understand the Java language deeply.

To conclude, this book is a reference for everyone who wants to write concurrent applications.

Post Scriptum: This is the hundredth post of this blog. I'm proud to see that there are a lot of regular readers, and I hope that this blog will live long.

Holidays

On Monday, I'm going on holiday for one week. I won't have internet access except on my smartphone, so I won't post on this blog during that week.

I will just approve comments, and perhaps answer them if necessary, but that's all.

So, see you next week.

A website for JTheque

I have the pleasure of informing you that I've created a new English website for JTheque: http://www.jtheque.com

The old JTheque websites (a French website and a French forge) were completely out of date and too complicated to manage. This time I created a really simple English website containing the most useful information about my JTheque projects. I always wanted to have a real website for JTheque. Before that, I had the Maven auto-generated sites, but those aren't real websites and they're not really good-looking. I used Google Sites to create this one.

For now, I've included three projects in the website: JTheque Core, JTheque Utils and JTheque XML Utils. There is not a lot of information for the moment, but from now on I'll publish all future information on this new website, and I will of course keep you informed about my projects via this blog.

I hope this website will interest you and that it will help promote my JTheque project a little.

If you find any error on the website, don't hesitate to contact me, via comment, email or whatever you want. If you need more information on one or more projects, don't hesitate to ask and I will add it to the website as soon as possible.

Do not use relative paths with LogBack

A little tip that can save a lot of time: do not use relative paths with LogBack.

I wondered why this little LogBack configuration didn't work:

<?xml version="1.0" encoding="UTF-8" ?>

<configuration>
    <contextName>JTheque</contextName>

    <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>logs/jtheque.log</file>

        <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
            <FileNamePattern>logs/jtheque.%i.log.zip</FileNamePattern>
            <MinIndex>1</MinIndex>
            <MaxIndex>5</MaxIndex>
        </rollingPolicy>

        <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
            <MaxFileSize>5MB</MaxFileSize>
        </triggeringPolicy>

        <layout class="ch.qos.logback.classic.PatternLayout">
            <Pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</Pattern>
        </layout>
    </appender>

    <root level="DEBUG">
        <appender-ref ref="FILE"/>
    </root>
</configuration>

No file was written. I searched for a long time, then tested with an absolute path and it worked really well. But hard-coding an absolute path is not very good. Fortunately, you can use system properties in the configuration, so I used user.dir to make the thing work:

        ...
        <file>${user.dir}/logs/jtheque.log</file>

        <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
            <FileNamePattern>${user.dir}/logs/jtheque.%i.log.zip</FileNamePattern>
            <MinIndex>1</MinIndex>
            <MaxIndex>5</MaxIndex>
        </rollingPolicy>
        ...

And this time, it works well!
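
By the way, if you want to check which directory a relative path would resolve against, user.dir is simply the working directory the JVM was launched from; a quick sketch:

// Prints the JVM working directory, which is what ${user.dir}
// expands to in the LogBack configuration.
public class WorkingDir {
    public static void main(String[] args) {
        System.out.println(System.getProperty("user.dir"));
    }
}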

Hope this will be useful to somebody.

Generate benchmark graphs easily

After launching a lot of runs for the file copy benchmark and always generating the graphs from the results in Excel, I realized that I was losing a lot of time doing that. So, like any Java developer, I decided to create a little tool that does the work automatically for me.

To create benchmarks, I'm using a little micro-benchmarking framework, described here. Once the results are generated, I automatically generate a bar chart of the results using JFreeChart.
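
The tool itself is not published here, but the JFreeChart part boils down to a few calls; here is a minimal sketch (the class name and the values are made up for the example):

import java.io.File;
import java.io.IOException;

import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartUtilities;
import org.jfree.chart.JFreeChart;
import org.jfree.chart.plot.PlotOrientation;
import org.jfree.data.category.DefaultCategoryDataset;

public class BenchmarkChart {
    public static void main(String[] args) throws IOException {
        // One bar per benchmarked method (the numbers are only examples)
        DefaultCategoryDataset dataset = new DefaultCategoryDataset();
        dataset.addValue(42, "Time", "NIO Transfer");
        dataset.addValue(55, "Time", "Custom Buffer");
        dataset.addValue(310, "Time", "Buffered Streams");

        JFreeChart chart = ChartFactory.createBarChart(
                "File Copy Benchmark", // chart title
                "Method",              // category axis label
                "Time (ms)",           // value axis label
                dataset,
                PlotOrientation.VERTICAL,
                false, false, false);  // no legend, no tooltips, no URLs

        // Write the chart directly to a PNG file
        ChartUtilities.saveChartAsPNG(new File("benchmark.png"), chart, 800, 600);
    }
}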

Here is an example of a graph generated by the tool:

Example graph

Read more…

Presentation and use of H2 Database Engine

It's been a long time now since I started using the H2 Database Engine as an embedded database in JTheque and other projects. This post is a presentation of this database engine and some information about how to use it.

H2 is a pure Java database. It can work as an embedded database or in server mode. The developers have done everything to keep a very small footprint: the jar file is around 1 MB.
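
To give an idea of how simple it is to use, here is a minimal sketch of the embedded mode through plain JDBC (the database path and the table are just examples; "sa" with an empty password is the H2 default):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class H2Demo {
    public static void main(String[] args) throws Exception {
        // Embedded mode: the database is stored in local files (./data/demo.*)
        // and is created automatically on first connection
        Connection connection = DriverManager.getConnection("jdbc:h2:./data/demo", "sa", "");

        Statement statement = connection.createStatement();
        statement.execute("DROP TABLE IF EXISTS MOVIES");
        statement.execute("CREATE TABLE MOVIES(ID INT PRIMARY KEY, TITLE VARCHAR(255))");
        statement.execute("INSERT INTO MOVIES VALUES(1, 'The Kid')");

        ResultSet resultSet = statement.executeQuery("SELECT TITLE FROM MOVIES");
        while (resultSet.next()) {
            System.out.println(resultSet.getString("TITLE"));
        }

        connection.close();
    }
}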

Read more…

Java File Copy Benchmarks Update

I've updated my benchmark of file copy methods in Java. I've been asked for more information about this benchmark and for new tests, so I've included more results and details.

This new version includes two new complete benchmarks:

  1. Benchmark on the same disk (Ext4)
  2. Benchmark between two disks (Ext4 -> Ext4)

And of course the old benchmark is still here: Benchmark between two disks (Ext4 -> NTFS).

I've also included more information about the disks and the benchmark. Statistical information about the results is also included in the post, so you can find the standard deviation and the confidence intervals of the results.

And last but not least, I've included a new method that copies files using the Linux cp executable.

The results are still available at the same place: File Copy in Java - Benchmark

File copy in Java - Benchmark

Yesterday I wondered whether the copyFile method in JTheque Utils was the best method or whether I needed to change it. So I decided to do a benchmark.

So I searched for all the methods to copy a file in Java, even the bad ones, and found 12 methods:

  1. Native Copy: make the copy using the Linux cp executable.
  2. Naive Streams Copy: open two streams, one to read and one to write, and transfer the content byte by byte.
  3. Naive Readers Copy: open two readers, one to read and one to write, and transfer the content character by character.
  4. Buffered Streams Copy: same as the naive streams copy but using buffered streams instead of simple streams.
  5. Buffered Readers Copy: same as the naive readers copy but using buffered readers instead of simple readers.
  6. Custom Buffer Stream Copy: same as the naive streams copy but reading into a byte array used as a buffer instead of reading byte by byte.
  7. Custom Buffer Reader Copy: same as the custom buffer stream copy but using readers instead of streams.
  8. Custom Buffer Buffered Stream Copy: same as the custom buffer stream copy but using buffered streams.
  9. Custom Buffer Buffered Reader Copy: same as the custom buffer reader copy but using buffered readers.
  10. NIO Buffer Copy: using NIO channels and a ByteBuffer to make the transfer.
  11. NIO Transfer Copy: using NIO channels and a direct transfer from one channel to the other.
  12. Path (Java 7) Copy: using the Path class of Java 7 and its copyTo() method.

I think these are the main methods to copy one file to another. The source of the different methods is available at the end of the post. Note that the methods with readers only work with text files, because readers read character by character, so they don't work on binary files like images. Here I used a buffer size of 4096 bytes. Of course, using a higher value improves the performance of the custom buffer strategies.
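
To make the "custom buffer" idea concrete, here is roughly what the Custom Buffer Stream Copy method looks like (a sketch; the exact benchmark sources are linked at the end of the post):

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public final class CustomBufferStreamCopy {
    private static final int BUFFER_SIZE = 4096; // the buffer size used in the benchmark

    public static void copy(File source, File target) throws IOException {
        InputStream in = new FileInputStream(source);
        OutputStream out = new FileOutputStream(target);

        try {
            byte[] buffer = new byte[BUFFER_SIZE];
            int read;

            // Each read() call fills up to BUFFER_SIZE bytes, so we make
            // far fewer invocations than the byte-by-byte version
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } finally {
            in.close();
            out.close();
        }
    }
}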

For the benchmark, I ran the tests using different files:

  1. Little file (5 KB)
  2. Medium file (50 KB)
  3. Big file (5 MB)
  4. Fat file (50 MB)
  5. An enormous file (1.3 GB), binary only

I ran the tests first using text files and then using binary files, in three modes:

  1. On the same hard disk. It's an IDE hard disk of 250 GB with 8 MB of cache, formatted in Ext4.
  2. Between two disks. I used the first disk and another SATA hard disk of 250 GB with 16 MB of cache, formatted in Ext4.
  3. Between two disks. I used the first disk and another SATA hard disk of 1 TB with 32 MB of cache, formatted in NTFS.

I used a benchmark framework, described here, to run the tests of all the methods. The tests were made on my personal computer (Ubuntu 10.04 64 bits, Intel Core 2 Duo 3.16 GHz, 6 GB DDR2, SATA hard disks). The Java version used is a 64-bit Java 7 virtual machine.

I've cut the post into several pages due to its length:

  1. Introduction about the benchmark
  2. Benchmark on the same disk
  3. Benchmark between Ext4 and Ext4
  4. Benchmark between Ext4 and NTFS
  5. Conclusions about the benchmark results

Benchmark on the same disk (Ext4)

So let's start with the results of the benchmark on the same disk.

Little Text Benchmark Results

We can see that here the native and naive streams methods are a lot slower than the other methods. So let's remove the naive streams method from the graph to have a better view of the other methods:

Little Text Benchmark Sub Results

The first conclusion we can draw is that the naive readers method is a lot faster than the naive streams one. That's because readers use a buffer internally, which is not the case for streams. The other methods are closer, so we'll see what happens with the next sizes.

Medium Text Benchmark Results

Here, we have removed the two naive methods because they are too slow compared to the others.

The readers methods are slower than the equivalent streams methods because readers work on chars, so they must perform a character conversion for every char of the file, which adds a cost.

Another observation is that the custom buffer strategy is faster than the buffering of the streams, and that using the custom buffer with a buffered stream or a simple stream doesn't change anything. The same observation can be made for the custom buffer with readers: it's the same with or without buffered readers. This is logical: with the custom buffer we make 4096 (the size of the buffer) times fewer invocations of the read method, and because we ask for a complete buffer at a time, we don't have a lot of I/O operations. So the buffer of the streams (or readers) is not useful here.

The NIO Buffer, NIO Transfer and Path strategies are almost equivalent to the custom buffer.
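
For reference, the NIO Buffer method is built on FileChannel and a ByteBuffer; a minimal sketch (again, the real sources are linked at the end of the post):

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public final class NioBufferCopy {
    public static void copy(File source, File target) throws IOException {
        FileChannel in = new FileInputStream(source).getChannel();
        FileChannel out = new FileOutputStream(target).getChannel();

        try {
            ByteBuffer buffer = ByteBuffer.allocate(4096);

            while (in.read(buffer) != -1) {
                buffer.flip(); // switch the buffer from writing to reading mode

                // a single write() may not drain the whole buffer
                while (buffer.hasRemaining()) {
                    out.write(buffer);
                }

                buffer.clear(); // ready for the next read
            }
        } finally {
            in.close();
            out.close();
        }
    }
}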

Big Text Benchmark Results

Here we see the limits of the simple buffered streams (and readers) methods. Another really interesting thing is that the native method is now faster than buffered streams and readers. The native method must start an external program, which has a non-negligible cost, but the copy made by the cp executable itself is really fast, so as the file size grows, the native method becomes interesting. All the other methods except the readers ones are almost equivalent.

Fat Text Benchmark Results

This time we can see that the native copy method is as fast as the custom buffer streams. The fastest method is the NIO Transfer method.

It's interesting to see that it doesn't take 100 ms to copy a 50 MB file.

Now let's look at binary files. We'll start directly with a 5 MB file.

Big Binary Benchmark Results

We see exactly the same results as with a text file. The native method starts to be interesting. We can see clearly that the NIO and Path methods are really interesting here.

Fat Binary Benchmark Results

We can see that all the methods are really, really close, but the native, NIO Buffer, NIO Transfer and Path methods are the best. Just to be sure of these results, let's test with a bigger file:

Enormous Binary Benchmark Results

Here we can see that the native method becomes the fastest one. The other methods are really close. I thought the NIO Transfer method would normally be faster; due to the size of the file, the benchmark was run only a small number of times, so the numbers may be inaccurate. We see that the Path method is really close to the others.

The detailed information (standard deviation, confidence intervals and other statistics) is available on the conclusion page.

Benchmark between two disks (Ext4 -> Ext4)

Here are the results of the same tests, but using two hard disks with the same formatting (Ext4).

Little Text Benchmark Results

We see exactly the same results as in the first benchmark. The naive streams method is completely useless for little files. So let's remove it and see what happens with the interesting methods:

Little Text Benchmark Results

Here again, the conclusions are the same, and the times are not big enough to draw global conclusions.

Medium Text Benchmark Results

Here, we reach the limits of the buffered strategy and see a real advantage for the custom buffer strategy. We also see that the NIO Transfer and Path methods take a little advantage. But again, the times are really short.

Big Text Benchmark Results

We can see that the native method is back among the interesting methods.

Fat Text Benchmark Results

So we've covered the text files. If we compare the times between the first benchmark (the same disk) and this one (between two disks), we can see that they are almost the same, just a little slower for some methods. So let's look at the big binary files:

Fat Binary Benchmark Results

Again, the results are close to those on the same disk. So let's see with the last file:

Enormous Binary Benchmark Results

This time, the differences are impressive. The native and NIO Buffer methods are the fastest. The NIO Transfer method is a little slower, but the Path method is a lot slower here.

This transfer is a lot faster than on the same disk. I'm not sure of the cause of these results. The only reason I can find is that the operating system can do the two things at the same time: reading from the first disk and writing to the second. If someone has a better explanation, don't hesitate to comment on the post.

The detailed information (standard deviation, confidence intervals and other statistics) is available on the conclusion page.

Benchmark between two disks (Ext4 -> NTFS)

Here are the results from the first version of this post. The first disk is still the same, but the second disk is formatted in NTFS. For concision, I removed some graphs. I've also removed the conclusions that are the same as in the first two benchmarks. The native method is not covered in these results.

Little Text File - Best results

The two best versions are the Buffered Streams and Buffered Readers. This is because the buffered streams and readers can write such a small file in only one operation. The times here are in microseconds, so there is really little difference between the methods, and the results are not really relevant.

Now, let's test with a bigger file.

Medium Text File

We can see that the versions with readers are a little slower than the versions with streams. This is because readers work on characters: for every read() operation, a char conversion must be made, and the same conversion must be made again on the writing side.

The observations about the custom buffer strategy are the same as in the first benchmark: it beats the buffering of the streams (or readers), and adding a buffered stream or reader on top of the custom buffer changes nothing, since the custom buffer already makes 4096 times fewer invocations of the read method. The NIO Buffer strategy is almost equivalent to the custom buffer. The direct transfer using NIO, however, is slower here than the custom buffer methods. I think this is because, at this size, the cost of invoking native methods at the operating system level is higher than the cost of simply making the copy.

Big Text File - Best results

Here, it's now clear that the custom buffer strategy is better than the simple buffered streams or readers, and that combining the custom buffer with buffered streams is really useful for bigger files. The Custom Buffer Readers method is better than Custom Buffer Streams because FileReader uses a buffer internally.

And now, let's continue with a bigger file:

Fat Text File Results

You can see that it doesn't even take 500 ms to copy a 50 MB file using the custom buffer strategy, and not even 400 ms with the NIO Transfer method. Really quick, isn't it? We can see that for a big file, the NIO Transfer method starts to show an advantage; we'll see that better in the binary file benchmarks. We will start directly with a big file (5 MB) for this benchmark:

Big Binary File Results

We can draw the same conclusions as for the text files; of course, the buffered streams method is not fast. The other methods are really close.

Fat Binary File Results

We see here again that the NIO Transfer method gains more of an advantage as the file gets bigger.

And just for the pleasure, a huge file (1.3 GB):

Enormous Binary File Results

We see that all the methods are really close, but the NIO Transfer method has an advantage of 500 ms, which is not negligible.

One conclusion we can draw is that transferring a file from Ext4 to Ext4 is a lot faster than from Ext4 to NTFS. I think it's logical, because the operating system must make conversions. I don't think it's because of the disk itself, because the NTFS disk is the fastest one I have.

Conclusion

In conclusion, the NIO Transfer method is the best one for big files, but it's not the fastest for little files (< 5 MB). The custom buffer strategy (and the NIO Buffer one too) are also really fast methods to copy files. We've also seen that the method using the native utility to make the copy is faster than NIO for really big files (> 1 GB), but it's really slow for little files because of the cost of invoking an external program.

So perhaps the best method is one that uses the custom buffer strategy for little files, NIO Transfer for big ones, and perhaps the native executable for the really big ones. But it would be interesting to also run the tests on another computer and operating system.

We can take several rules from this benchmark:

  1. Never make a copy of a file byte by byte (or char by char).
  2. Prefer a buffer on your side rather than relying only on the stream's buffer, to make fewer invocations of the read method, but don't forget the buffer on the streams' side either.
  3. Pay attention to the size of the buffers.
  4. Don't use char conversion if you only need to transfer the content of a file, so don't use a Reader if streams are enough.
  5. Don't hesitate to use channels for file transfers; it's the fastest way to transfer a file (see the sketch after this list).
  6. Consider invoking the native executable only for really big files.
  7. The new Path method of Java 7 is really fast, except for the transfer of an enormous file between two disks.
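
As a reference for rule 5, here is a minimal sketch of a channel-to-channel copy using transferTo() (the class and method names are just examples; the benchmark sources below contain the real implementations):

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

public final class NioTransferCopy {
    public static void copy(File source, File target) throws IOException {
        FileChannel in = new FileInputStream(source).getChannel();
        FileChannel out = new FileOutputStream(target).getChannel();

        try {
            // transferTo() lets the operating system move the bytes between
            // the two channels directly; loop because a single call may
            // transfer less than the requested count
            long size = in.size();
            long position = 0;
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
        } finally {
            in.close();
            out.close();
        }
    }
}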

I hope this benchmark (and its results) interested you.

Here are the sources of the benchmark: File Copy Benchmark Version 3

Here is the complete information for the benchmarks between two disks: Complete results of the first two benchmarks

Effective Java, second edition – Book Review

Before reading this book, I had read the French translation of the first edition, but I thought it would be interesting to read the second edition, this time in English.

Effective Java Cover

This book is the one to read if you want to write good Java code. All the advice is really useful. It's really comfortable to read a book from a person who masters Java; in fact, Joshua Bloch led the design and implementation of numerous Java features.

Read more…