Tip : Batch resize images on Ubuntu Linux

Baptiste Wicht

2010-09-08 07:26

After needing to optimize a lot of images at once, this weekend I needed to resize a lot of images to the same size because they were too big.

Like every other thing in Linux, there is a really simple tool to automate that. I used imagemagick to do that. Of course, there is certainly a lot of other things to make that work, but this is the first I've found and it works well.

So first, you need to install it if you don't have the tool :

sudo apt-get install imagemagick

And then, you can resize all the JPG images to a width of 640px of the current folder using the single command :

mogrify -resize 640 *.jpg

If you want the height, just add a x :

mogrify -resize x640 *.jpg

You can also specify maximum width and height, that can be useful if you have big images and you don't want a width larger than x and a height larger than y but you don't want to resize little images in the same folder. Here is an example resizing images if the width is larger than 1280 or height larger than 1024 :

mogrify -resize '1280x1024>' *.jpg

With all that commands, the ratio is preserved. If you want more informations on the possible resize options, you can consult the documentation of ImageMagick.

Hope that will help someone.

Java Concurrency - Part 6 : Atomic Variables

Baptiste Wicht

2010-09-08 07:14

Comments

When a data (typically a variable) can be accessed by several threads, you must synchronize the access to the data to ensure visibility and correctness.

By example, if you have a simple counter (yes, once again) :

public class Counter {
    private int value;

    public int getValue(){
        return value;
    }

    public int getNextValue(){
        return value++;
    }

    public int getPreviousValue(){
        return value--;
    }
}

This class works really well in single-threaded environment, but don't work at all when several threads access the same Counter instance. If you don't know why, read this post about synchronization. You can solve the problem using synchronized at method level :

public class SynchronizedCounter {
    private int value;

    public synchronized int getValue(){
        return value;
    }

    public synchronized int getNextValue(){
        return value++;
    }

    public synchronized int getPreviousValue(){
        return value--;
    }
}

This class now works well. But locking is not a lightweight mechanism and have several disadvantages. When several threads try to acquire the same lock, one or more threads will be suspended and they will be resumed later. When the critical section is little, the overhead is really heavy especially when the lock is often acquired and there is a lot of contention. Another disadvantage is that the other threads waiting of the lock cannot do something else during waiting and if the thread who has the lock is delayed (due to a page fault or the end of the time quanta by example), the others threads cannot take their turn.

So how to do to avoid this disadvantages ? We must use non-blocking algorithms. This algorithms don't use blocking mechanisms and by that fact are more scalable and performing. These algorithms use low-level machine instructions which are atomic to ensure the atomicity of higher-level operations. While locking is a pessimistic approach, we can also use optimistic technique to develop algorithms. This time, we'll detect collisions between threads in which case, the operation fails and we do something else (often retrying the same operation).

The actual processors provide several instructions that simplify greatly the implementation of these non-blocking algorithms, the most-used operation today is the compare-and-swap operation (CAS). This operation takes three parameters, the memory address, the expected current value and the new value. It atomically update the value at the given memory address if the current value is the expected, otherwise it do nothing. In both cases, the operation return the value at the address after the operation execution. So when several threads try to execute the CAS operation, one thread wins and the others do nothing. So the caller can choose to retry or to do something else. We often use this operation to implement another operation, the compare-and-set. This method makes exactly the same things as CAS but return a boolean indicating if the operation succeeded or not.

Before Java 5.0, this operation was not available directly to developer, but in Java 5.0 several atomic variables (for int, long, boolean and reference values) were added. The int and long versions also supports numeric operations. The JVM compiles these classes with the better operations provided by the hardware machine, CAS or a Java implementation of the operation using a lock. Here are the classes :

AtomicInteger
AtomicLong
AtomicBoolean
AtomicReference

All these classes supports compare-and-set (via the compareAndSet() method) and other operations (get(), set() and getAndSet()). The setters operations are implemented using compareAndSet. These classes supports multi-threaded access and have a better scalability than synchronizing all the operations.

Here is how we can rewrite our counter using an AtomicInteger :

public class AtomicCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    public int getValue(){
        return value.get();
    }

    public int getNextValue(){
        return value.incrementAndGet();
    }

    public int getPreviousValue(){
        return value.decrementAndGet();
    }
}

The incrementAndGet() and decrementAndGet() methods are two of the numeric operations provided by the AtomicLong and AtomicInteger classes. You also have getAndDecrement(), getAndIncrement(), getAndAdd(int i) and addAndGet().

This version is faster than the synchronized one and is also thread safe.

If you only have the compareAndSet(), here is how we can implement increment() method using it :

public void increment(AtomicInteger integer){
    while(true){
        int current = integer.get();
        int next = current + 1;

        if(integer.compareAndSet(current, next)){
            return;
        }
    }
}

This seems to be complicated, but this is the cost of non-blocking algorithms. When we detect collision, we retry until the operation succeeded. This is the common schema for non-blocking algorithms.

Here is a thread-safe Stack implemented using AtomicReference :

public class Stack {
    private final AtomicReference<Element> head = new AtomicReference<Element>(null);

    public void push(String value){
        Element newElement = new Element(value);

        while(true){
            Element oldHead = head.get();
            newElement.next = oldHead;

            //Trying to set the new element as the head
            if(head.compareAndSet(oldHead, newElement)){
                return;
            }
        }
    }

    public String pop(){
        while(true){
            Element oldHead = head.get();

            //The stack is empty
            if(oldHead == null){
                return null;
            }

            Element newHead = oldHead.next;

            //Trying to set the new element as the head
            if(head.compareAndSet(oldHead, newHead)){
                return oldHead.value;
            }
        }
    }

    private static final class Element {
        private final String value;
        private Element next;

        private Element(String value) {
            this.value = value;
        }
    }
}

It's really more complicated than using synchronized on the two methods but also more performing if there is contention (and often even if there is no contention).

So this ends this post. To conclude, atomic variables classes are a really good way to implement non-blocking algorithms and moreover are also a very good alternative to volatile variables, because they can provide atomicity and visibility.

Java Concurrency - Part 5 : Monitors (Locks and Conditions)

Baptiste Wicht

2010-09-06 07:13

Comments

After seeing how to synchronize code using semaphores, we'll see how to do that using monitors.

Monitors are an other mechanism of concurrent programming. It's a higher level mechanism than semaphores and also more powerful. A monitor is an instance of a class that can be used safely by several threads. All the methods of a monitor are executed with mutual exclusion. So at most one thread can execute a method of the monitor at the same time. This mutual exclusion policy makes easier to work with monitor and to develop the method content of the monitor.

Monitors have an other feature, the possibility to make a thread waiting for a condition. During the wait time, the thread temporarily gives up its exclusive access and must reacquire it after the condition has been met. You can also signal one or more threads that a condition has been met.

There is several advantages on using monitors instead of a lower-level mechanisms :

All the synchronization code is centralized in one location and the users of this code don’t need to know how it’s implemented.
The code doesn't depend on the number of processes, it works for as many processes as you want
You don’t need to release something like a mutex, so you cannot forget to do it

When we must describe a monitor, we simple use the monitor keyword and describe the methods as common methods :

monitor SimpleMonitor {
    public method void testA(){
        //Some code
    }

    public method int testB(){
        return 1;
    }
}

To describe a condition variable, we use the cond keyword. A condition variable is a kind of queue of process who are waiting on the same condition. You have several operations available on a condition, the most important is to signal a process waiting to be awaken and to wait on a condition. There are some similarities between signal/wait operations and P and V of semaphores, but this is a little different. The signal operation does nothing if the queue is empty and the wait operation put always the thread in the waiting queue. The process queue is served in a first come, first served mode. When a thread wakes up after waiting on a condition, it must reacquire the lock before continuing in the code.

Before going further, we must have more information about the signal operations. When writing monitors, you normally have the choice between several philosophies for the signaling operation :

Signal & Continue (SC) : The process who signal keep the mutual exclusion and the signaled will be awaken but need to acquire the mutual exclusion before going.
Signal & Wait (SW) : The signaler is blocked and must wait for mutual exclusion to continue and the signaled thread is directly awaken and can start continue its operations.
Signal & Urgent Wait (SU) : Like SW but the signaler thread has the guarantee than it would go just after the signaled thread
Signal & Exit (SX) : The signaler exits from the method directly after the signal and the signaled thread can start directly. This philosophy is not often used.

The available policies depends on the programming language, in Java, there is only one policy available, the SC one.

In Java there is no keyword to directly create a monitor. To implement a monitor, you must create a new class and use Lock and Condition classes. Lock is the interface is ReentrantLock is the main used implementation, this is the one that we'll learn to use in the current post. To create a ReentrantLock, you have two constructors, a default constructor and a constructor with a boolean argument indicating if the lock is fair or not. A fair lock indicates that the threads will acquire the locks in the order they ask for. Fairness is a little heavier than default locking strategies, so use it only if you need it. To acquire the lock, you just have to use the method lock and unlock to release it.

The explicit locks have the same memory semantics than the synchronized blocks. So the visibility of the changes is guarantee when you use lock()/unlock() blocks.

So to implement, the monitor example we've seen before, we just need to create a class and use the lock to make the mutual exclusion :

public class SimpleMonitor {
    private final Lock lock = new ReentrantLock();

    public void testA() {
        lock.lock();

        try {
            //Some code
        } finally {
            lock.unlock();
        }
    }

    public int testB() {
        lock.lock();

        try {
            return 1;
        } finally {
            lock.unlock();
        }
    }
}

The person who've already read the other parts of this post set will say that it will be easier to use the synchronized keyword on the two methods. But with synchronized, we will not have the condition variables. If you don't need condition variables but only locking, it will be easier to use the synchronized blocks instead of Locks.

You can create conditions using the newCondition method on the lock. A condition is a variable of type Condition. You can make the current thread wait on the condition using the await method (and its variant with timeout) and you can signal threads using signal and signalAll methods. The signalAll methods wakes up all the threads waiting on the condition variable.

Let's try with a simple common example : A bounded buffer. It's a cyclic buffer with a certain capacity with a start and an end.

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class BoundedBuffer {
    private final String[] buffer;
    private final int capacity;

    private int front;
    private int rear;
    private int count;

    private final Lock lock = new ReentrantLock();

    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();

    public BoundedBuffer(int capacity) {
        super();

        this.capacity = capacity;

        buffer = new String[capacity];
    }

    public void deposit(String data) throws InterruptedException {
        lock.lock();

        try {
            while (count == capacity) {
                notFull.await();
            }

            buffer[rear] = data;
            rear = (rear + 1) % capacity;
            count++;

            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    public String fetch() throws InterruptedException {
        lock.lock();

        try {
            while (count == 0) {
                notEmpty.await();
            }

            String result = buffer[front];
            front = (front + 1) % capacity;
            count--;

            notFull.signal();

            return result;
        } finally {
            lock.unlock();
        }
    }
}

So some explications :

The two methods are protected with the lock to ensure mutual exclusion
Then we use two conditions variables. One to wait for the buffer to be not empty and an other one to wait for the buffer to be not full.
You can see that I have wrapped the await operation on a while loop. This is to avoid signal stealers problem that can occurs when using Signal & Continue

And that BoundedBuffer can be easily used with several threads with no problems.

As you can see, you can use monitors to solve a lot of concurrent programming problems and this mechanism is really powerful and performing.

I hope you found that article interesting and that this set of posts about Java concurrency brings you some stuff about Java.

Save time with the Gmail Priority Inbox

Baptiste Wicht

2010-09-04 05:44

Comments

Some days ago, Gmail released a new feature, the Priority Inbox. Actually, the feature is not released on all accounts, but if it is released on yours, you can see a message "Try Gmail Priority Inbox", in top right high corner, just after your user name.

Once you activated this new inbox, you will see two inbox, the new Priority Inbox on top and the old Inbox. The Inbox has not changed, but now you can use the Priority Inbox. In this view, you will see your messages sorted in two categories, the first one contains the important unread messages and the second one contains all the other messages.

If you found an error, namely a non-important message tagged as important or the contrary, you have two new buttons :

With these buttons, you can move a message from the priority messages to the others and vice versa. When you do that, the algorithm between the sorting will save your action and try to improve the efficiency of the Priority Inbox to not make the same error again.

At this time, I'm completely happy with this new feature, I've had only one message marked as important that was not, but it's all. I think it's a great tool that Google has offerred to us.

If you want more informations, let's watch this video from Google :

My Java Benchmarks on GitHub

Baptiste Wicht

2010-09-03 07:16

Comments

I've created a new github repository for my Java Benchmarks : java-benchmarks

From now all my benchmarks will be pushed to this repository. This is more simple for me to manage and more secure also.

At this time, there is seven benchmarks on the repository :

Closest Pair Search Benchmark : A benchmark to test two closest pair point search algorithms : the naive one and the sweeping plane one. Results.
File Copy Benchmark : A benchmark on the different ways to make file copy in Java. Results.
Iteration Remove Benchmark : A simple benchmark to test if it's interesting to remove the read elements from a list when we make several iterations over the list.
Reflection Benchmark : A little benchmark to test the performances of reflection versus switch cases and direct invocations.
Short Indexes Loop Benchmark : A benchmark to test which primitive type is the most performing using as iteration index. Results.
Synchronization Benchmark : A benchmark to test the performances of the different synchronization mechanisms available in Java to provide mutual exclusion. Results.
Unmodifiable Benchmark : A benchmark to test the performances of unmodifiable collection versus creating a copy of the list.

I hope you'll find these sources interesting. If you found errors or improvements, don't hesitate to comment to tell me what.

Java Synchronization (Mutual Exclusion) Benchmark

Baptiste Wicht

2010-09-01 07:13

Comments

I've created another benchmark. This time, I've benchmarked the different ways of synchronizing a little code using mutual exclusion on this code.

The code to protect will be very simple. It's a simple counter :

//Init
int counter = 0; 
//Critical section
counter++;

The critical section, if not protected with synchronization system, will not function properly due to possible interleavings (read the article on synchronization if you don't know what is interleaving).

Java Concurrency - Part 4 : Semaphores

Baptiste Wicht

2010-08-30 07:11

Comments

We continue in Java Concurrency with the semaphores. Semaphores is also a way to synchronize threads.

Semaphores are a really simple concept, invented by the famous Dutch computer scientist Edsger Dijkstra. Basically a semaphore is a counter (integer) that allows a thread to get into a critical region if the value of the counter is greater than 0. If it's the case, the counter is decremented by one otherwise, the thread is waiting until it can go. And when the thread go away from the critical region, the counter is incremented by one to allow one more thread to pass the critical region. A semaphore is created with a certain value for its counter. So, you can execute two actions on a semaphore P and V.

By example, if you have a critical that cannot be executed concurrently, you can use a semaphore :

sem mutex = new sem(1)
P(mutex)
//Critical region
V(mutex)

So you must always call by yourself the P operation before the critical region and V after it. We call a mutex (mutual exclusion) a semaphore with a value of one. So only one thread can enter the region guarded by the semaphore. This is the most used semaphore. The other use of semaphore is to guard a set of resources like database connections or a data pool.

In Java, a semaphore is created using the java.util.concurrent.Semaphore class. You can create easily :

Semaphore mutex = new Semaphore(1);
Semaphore available = new Semaphore(100);

The P and V operations are represented using the acquire and release methods. The method acquire can be interrupted if the thread is interrupted. There is an uninterruptible version with the method acquireUninterruptibly(). There is also a third version with the tryAcquire method. This method acquire a permit only if there is one permit available, otherwise, this method return false directly. All the waiting methods have also an overloaded version with a timeout. You can also acquire several permits at once using the permits argument to the different versions of acquire methods.

A little example with a mutex using the same example as the previous post on Java concurrency :

public class Example {
    private int value = 0;

    private final Semaphore mutex = new Semaphore(1)

    public int getNextValue() throws InterruptedException {
        try {
            mutex.acquire();
            return value++;
        } finally {
            mutex.release();
        }
    }
}

For more informations about Semaphore in Java, the best is to consult the Javadoc of the Semaphore class.

To conclude, semaphores are a powerful ways to solve concurrency problems, but this is not adapted to all problems. If you need only mutual exclusion, synchronized blocks are a better solutions. The problems with semaphores is that you can forget to call the release method and that can cause deadlock sometimes difficult to find.

Java Concurrency – Part 3 : Synchronization with intrinsic locks

Baptiste Wicht

2010-08-27 07:15

Comments

<After learning how to create threads and manipulate them, it's time to go to most important things : synchronization.

Synchronization is a way to make some code thread safe. A code that can be accessed by multiple threads must be made thread safe. Thread Safe describe some code that can be called from multiple threads without corrupting the state of the object or simply doing the thing the code must do in right order.

For example, we can take this little class :

public class Example {
    private int value = 0;    

    public int getNextValue(){
        return value++;
    }
}

It's really simple and works well with one thread, but absolutely not with multiple threads. An increment like this is not a simple action, but three actions :

Read the current value of "value"
Add one to the current value
Write that new value to "value"

Normally, if you have two threads invoking the getNextValue(), you can think that the first will get 1 and the next will get 2, but it is possible that the two threads get the value 1. Imagine this situation :

Thread 1 : read the value, get 0, add 1, so value = 1
Thread 2 : read the value, get 0, add 1, so value = 1
Thread 1 : write 1 to the field value and return 1
Thread 2 : write 1 to the field value and return 1

These situations come from what we call interleaving. Interleaving describe the possible situations of several threads executing some statements. Only for three operations and two threads, there is a lot of possible interleavings.

So we must made the operations atomic to works with multiple threads. In Java, the first way to make that is to use a lock. All Java objects contains an intrinsic locks, we'll use that lock to make methods or statement atomic. When a thread has a lock, no other thread can acquire it and must wait for the first thread to release the lock. To acquire the lock, you have to use the synchronized keyword to automatically acquire and release a lock for a code. You can add the synchronized keyword to a method to acquire the lock before invoking the method and release it after the method execution. You can refactor the getNextValue() method using the synchronized keyword :

public class Example {
    private int value = 0;    

    public synchronized int getNextValue(){
        return value++;
    }
}

With that, you have the guarantee that only thread can execute the method at the same time. The used lock is the intrinsic lock of the instance. If the method is static, the used lock is the Class object of Example. If you have two methods with the synchronized keyword, only one method of the two will be executed at the same time because the same lock is used for the two methods. You can also write it using a synchronized block :

public class Example {
    private int value = 0;

    public int getNextValue() {
        synchronized (this) {
            return value++;
        }
    }
}

This is exactly the same as using the synchronized keyword on the method signature. Using synchronized blocks, you can choose the lock to block on. By example, if you don't want to use the intrinsic lock of the current object but an other object, you can use an other object just as a lock :

public class Example {
    private int value = 0;

    private final Object lock = new Object();

    public int getNextValue() {
        synchronized (lock) {
            return value++;
        }
    }
}

The result is the same but has one difference, the lock is internal to the object so no other code can use the lock. With complex classes, it not rare to use several locks to provide thread safety on the class.

There is an other issue with multiple threads : the visibility of the variables. This seems when a change made by a thread is visible by an other thread. For performance improvements, the Java compiler and virtual machines can made some improvements using registers and cache. By default, you have no guarantee that a change made by a thread is visible to an other thread. To make a change visible to an other thread, you must use synchronized blocks to ensure visibility of the change. You must use synchronized blocks for the read and for the write of the shared values. You must make that for every read/write of a value shared between multiple threads.

You can also use the volatile keyword on the field to ensure the visibility of read/write between multiple threads. The volatile keyword ensure only visibility, not atomicity. The synchronized blocks ensure visibility and atomicity. So you can use the volatile keyword on fields that doesn't need atomicity (if you make only read and write to the field without depending on the current value of the field by example).

You can also note that this simple example can be solved using AtomicInteger, but that will be covered later in an other part of the posts.

Pay attention that trying to solve thread safety on a problem can add new issues of deadlock. By example, if thread A owns the lock 1 and are waiting for the lock 2 and if lock 2 is acquired by thread B who waits on lock 1, there is a deadlock. Your program is dead. So you have to pay great attention to the locks.

There is several rules that we must keep in mind when using locks :

Every mutable fields shared between multiple threads must be guarded with a lock or made volatile, if you only need visibility
Synchronize only the operations that must synchronized, this improve the performances. But don't synchronize too few operations. Try to keep the lock only for short operations.
Always know which locks are acquired and when there are acquired and by which thread
An immutable object is always thread safe

Here we are, I hope that this post helps you to understand thread safety and how to achieve it using intrinsic locks. In the next posts, we'll see another synchronization methods.

Java File Copy Benchmark Updates (once again)

Baptiste Wicht

2010-08-25 07:26

Comments

I've made another updates to my file copy benchmark.

First of all, I used my little utility class to automatically create the graphs. The graph are a little less clean, but I spare a lot of time not creating them myself.

Then, I've also made some corrections on the code :

I''ve used a buffer size of 8192 instead of 4096
I've made some corrections using the channels because the old code can forgot to write some portions of the file
I used allocateDirect() instead of allocate() for the ByteBuffer.

And I've added a new method using Java 7 : Path.copyTo(Path path).

So the new results are all based on a Java 7 Virtual Machine.

You'll find all the new informations and result, on the original post : File Copy in Java - Benchmark

I hope this new informations will interest you.

Java 7 : The new try-with-resources statement

Baptiste Wicht

2010-08-24 07:30

Comments

From the build 105, the compiler and runtime of Java 7 Releases have support for the new form of try : try-with-resources, also called ARM (Automatic Resource Management) blocks.

This new statement make working with streams and all kind of closeable resources easier. By example, in Java, you can have this kind of code :

private static void customBufferStreamCopy(File source, File target) {
    InputStream fis = null;
    OutputStream fos = null;
    try {
        fis = new FileInputStream(source);
        fos = new FileOutputStream(target);

        byte[] buf = new byte[8192];

        int i;
        while ((i = fis.read(buf)) != -1) {
            fos.write(buf, 0, i);
        }
    }
    catch (Exception e) {
        e.printStackTrace();
    } finally {
        close(fis);
        close(fos);
    }
}

private static void close(Closeable closable) {
    if (closable != null) {
        try {
            closable.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

A little bit heavy, isn't it ? This is only an example, here the management of exceptions is not good.

So let's use try-with-resources statement to simplify this code, who becomes :

private static void customBufferStreamCopy(File source, File target) {
    try (InputStream fis = new FileInputStream(source);
        OutputStream fos = new FileOutputStream(target)){

        byte[] buf = new byte[8192];

        int i;
        while ((i = fis.read(buf)) != -1) {
            fos.write(buf, 0, i);
        }
    }
    catch (Exception e) {
        e.printStackTrace();
    }
}

A lot cleaner, no ? With that code, the resources are automatically closed after the try. In the try resources list, you can declare several resources, but all these resources must implement the java.lang.AutoCloseable interface.

If you want more informations, about this new statement read try-with-resources specifications.