Thursday, June 21, 2007

Garbage Collection - Moving Closer

The previous blog ended with an open-ended question: "In your JVM, you have a few objects that are eligible for garbage collection. The JVM also runs the garbage collector. Will all the objects that are eligible for GC be garbage collected?" Answering it requires a deeper understanding of the various garbage collection algorithms, the size of the objects and, more importantly, the lifetime of the objects. Believe me, the JVM's GC is not one algorithm but a pool of algorithms that sweep the heap.

The previous write-up gave an overview of garbage collection from a programmer's perspective. It was just an introduction, and we did not discuss much from the JVM's perspective. Some of you may have started to ask, "Why should I know all this? I am just a developer." Thanks for the thoughtful question; I appreciate it. You might have a high-end desktop for coding a "Hello, World" program, and your application might work as expected. But an enterprise-level Java application faces a lot of challenges in terms of functionality and performance. If you know the internal workings of a system, you get a broader perspective, which can lead to more judicious use of the system. Your knowledge of the JVM will come in handy and will certainly pay off in the near future.

As we discussed in the previous write-ups, successful garbage collection involves two tasks. The first is to identify the garbage objects (objects that are no longer used) and the second is to actually clean them up. Apart from cleaning up, the garbage collection algorithm also organizes memory by relocating objects in the heap.

Garbage Collection - A Closer Look

In the case of the Sun HotSpot JVM, garbage collection is performed by a generational collector. The generational collector rests on two postulates:
  • Most objects (more than 90% of them) are short-lived. They die soon after they are created.
  • A smaller percentage of objects are long-lived. They live almost as long as the JVM itself.
Based on these two postulates, Sun designed the HotSpot garbage collection algorithm. The entire heap is subdivided into two regions: the young generation and the old generation. The young generation is smaller than the old generation. Objects are initially created in the young generation and are promoted to the old generation if they live long enough. The JVM employs two different garbage collection algorithms in the young and old generations, as the nature of the objects varies. The following diagram gives a pictorial representation of the JVM heap.

Figure - 1
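
If you want to see the generations on your own machine, the JVM can be asked for its memory pools. The following is only a small sketch using the standard java.lang.management API; the class name HeapPools is made up for this illustration, and the pool names it prints depend on the JVM and on the collector in use, so do not expect the same names everywhere.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
final class HeapPools {
    /** Returns the names of the heap memory pools reported by the running JVM. */
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {   // skip non-heap pools
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Pool names vary by JVM and collector; they are printed as reported.
        for (String name : heapPoolNames()) {
            System.out.println(name);
        }
    }
}
```

On a generational collector, the list typically contains separate pools for the young and old regions of the heap.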

Wednesday, June 20, 2007

Garbage Collection – Part 1

Java is platform independent, and the Java Virtual Machine (JVM), the platform-dependent component of Java, provides this platform independence. The JVM is an exceptional user-space process that runs on everything from mobile phones to high-end servers. The JVM has to provide platform independence in class loading, threading, input/output and garbage collection. From the operating system's perspective, the Java Virtual Machine is just another process. But things are not as easy as they seem. From the loading of classes to the recycling of objects, the JVM behaves like an operating system in itself. In this series of blogs, we will discuss how objects are recycled, the various garbage collection algorithms and how to tune your JVM for better performance. Our discussions will be based on Sun JDK 1.5 (Tiger) and will use open source tools, since they are freely available and everyone can use them.

Overview of GC

The JVM is just a mighty big process (or it could be a tiny process running on a mobile phone); that is the operating system’s perspective. The JVM does a lot of magic internally, but none of it is visible to the operating system, which simply services the JVM. Garbage collection and threads are integral parts of the JVM. At least a few threads run while the JVM is alive, and the GC thread is one of them. But GC is a low-priority thread, and the JVM runs it only when needed: the JVM tries to run GC only when memory is scarce. When it does run, it stalls the other threads in the JVM. The user program creates objects and uses them for as long as it wants. It drops its references to them when it no longer needs them, and the JVM takes care of deleting the unused objects, running their finalizers first.

A question comes to mind immediately. Who decides the eligibility for garbage collection? In other words, how are objects picked for deletion? Although the JVM deletes unwanted objects automatically, it is the developer’s responsibility to say that he/she does not want the objects anymore. The developer need not say this explicitly; the JVM understands it implicitly.

Among the objects in the JVM, the JVM treats a few as special and names them “Root Objects”. Usually the local references of all the stack frames (local variables of all the methods that haven’t exited), string objects in the constant pool of the class and the class variables (static variables) are termed “Root Objects”. All the objects that are reachable from any of the root objects, either directly or through a chain of other reachable objects, are the objects that are currently being used. All other objects, which are not reachable from the root objects either directly or through objects linked with root objects, are termed “Unused objects”. During garbage collection, the JVM marks the unused objects and deletes them from the heap, freeing up memory. But the algorithm for finding the unused objects and deleting them varies greatly from implementation to implementation. Even a single JVM implementation might have many GC algorithms, which can be chosen based on the application and situation.

Before deleting an object, the JVM checks whether it has a finalizer. If it has one, the JVM postpones freeing the object until the finalizer has run. So for objects that have a finalizer, GC has one additional step: invoking the finalizer. Until then, the JVM does not delete the object.

So far we have discussed the theoretical aspects of the garbage collection process; the rest of this section explains them with an example.

Looking at the figure, the yellow objects are the root objects. All the objects that are chained with root objects are currently being used (they are shown in pink). The objects that are not reachable from any root object, directly or indirectly, are unused objects, which are shown as dotted circles. Unused objects may well be linked with each other, but they are still unused and eligible for garbage collection. The point is that an object should be reachable from one of the existing stack frames (each time a method is invoked, the JVM pushes a stack frame that contains the local variables and the operand stack).
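
To make the idea of reachability concrete, here is a toy sketch of the marking step. It is not how any real JVM is implemented; the ToyMark and Node classes are invented for this illustration, and the "heap" is just a list. It only shows that whatever cannot be reached from a root is unused, even when the unused objects point at each other.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: ToyMark and Node are hypothetical names for this sketch.
final class ToyMark {
    /** A pretend heap object: a name plus its outgoing references. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<Node>();
        Node(String name) { this.name = name; }
    }

    /** Returns every node reachable from the roots, directly or through a chain. */
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new HashSet<Node>();
        Deque<Node> pending = new ArrayDeque<Node>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (marked.add(current)) {        // true only on the first visit
                pending.addAll(current.refs);
            }
        }
        return marked;
    }

    /** Returns the nodes of the toy heap that the mark step did not reach. */
    static List<Node> unused(List<Node> heap, List<Node> roots) {
        Set<Node> marked = mark(roots);
        List<Node> garbage = new ArrayList<Node>();
        for (Node n : heap) {
            if (!marked.contains(n)) {
                garbage.add(n);
            }
        }
        return garbage;
    }

    public static void main(String[] args) {
        Node root = new Node("root"), a = new Node("a");
        Node c = new Node("c"), d = new Node("d");
        root.refs.add(a);   // a is in use: reachable from the root
        c.refs.add(d);      // c and d refer to each other ...
        d.refs.add(c);      // ... but nothing reachable refers to them
        for (Node n : unused(Arrays.asList(root, a, c, d), Arrays.asList(root))) {
            System.out.println(n.name + " is unused");
        }
    }
}
```

Running the example reports c and d as unused: they form a cycle, but no root leads to them.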


Eligibility for GC

In the last section, we discussed GC from 10,000 feet. In this section we are going to look at a Java program and identify the objects that are eligible for garbage collection at various stages of the program. “Eligible for GC” means only that: the objects are eligible, and we are telling the JVM that we no longer need them. It is up to the JVM to delete those objects and recycle the memory. The JVM will make every possible attempt to recycle memory.


Consider the above code. At lines 18, 19 and 20 we create objects. Assume that control is at line 21, after executing line 20. At this point, we have references to four objects, referred to by “args”, “string”, “i” and “j”. So there are four root objects. The root objects and the objects that are reachable from them are not eligible for GC. Hence at line 20, there are no objects eligible for GC.


When the JVM executes the method “display”, control goes to that method, which holds references to the objects passed as arguments. Apart from the arguments, the method “display” has one more local variable, “string”, which also becomes a root object. After executing line 32, the total number of root objects becomes five. At lines 34 and 37, even though the references “i” and “j” are cleared, the objects they pointed to cannot be garbage collected, as the method “main” still has references to them. At line 40, the object referred to by “string” is eligible for garbage collection. Once again at line 43, the object referred to by “str” is not eligible, as the method “main” has a reference to it.

When control gets back to the method “main”, after executing line 23, the object referred to by the variable “i” becomes eligible for GC. Subsequently, the objects referred to by the variables “j” and “string” become eligible for GC when lines 25 and 27 are executed respectively.
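
The same pattern can be tried out with a much smaller program. Note that this is not the listing discussed above; it is a separate sketch, and the names EligibilityDemo, display and run are made up for it. The comments mark where each object stops being reachable.

```java
// Separate illustration, not the listing walked through in the text.
final class EligibilityDemo {
    /** Callee: it receives a copy of the caller's reference. */
    static void display(StringBuilder str) {
        StringBuilder local = new StringBuilder("created in display");
        str = null;    // clears only display()'s copy; the caller still refers to the object
        local = null;  // nothing refers to this object any more: eligible for GC
    }

    /** Caller: returns its reference to show that it survived the call. */
    static StringBuilder run() {
        StringBuilder i = new StringBuilder("i");
        StringBuilder j = new StringBuilder("j");
        display(i);    // i's object stays reachable from this frame the whole time
        j = null;      // j's object becomes eligible here; no other reference exists
        return i;      // still referenced by the caller, so not eligible
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints "i"
    }
}
```

Clearing a parameter inside the callee never makes the caller's object eligible, because the caller's stack frame still holds its own reference.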

To summarize
  • The root objects are not decided based only on the local variables of the method being executed. The JVM goes through the entire stack to find the root objects.
  • Apart from the stack, the JVM also looks at “static” variables (class variables) and treats them as roots. So the objects referred to by static variables become eligible for garbage collection when the class is unloaded from the JVM.
  • A few JVMs implement the method area in the Java heap; that is, the JVM allocates memory in the heap to hold code. Those objects also become eligible for GC when the classes are unloaded.
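
The point about static variables can be checked with a small probe. This is only a sketch: StaticRootDemo and its members are invented names, and System.gc() is merely a request that the JVM is free to ignore. The check works in one direction only: as long as the static field holds the object, a weak reference to it is never cleared.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo class for this post.
final class StaticRootDemo {
    /** Class variable: acts as a root for as long as the class stays loaded. */
    static Object pinned = new Object();

    /** Returns true if the object held by the static field is still there after a GC request. */
    static boolean pinnedSurvivesGc() {
        WeakReference<Object> probe = new WeakReference<Object>(pinned);
        System.gc();                 // a request only; the JVM may or may not collect now
        return probe.get() != null;  // the static field keeps the object strongly reachable
    }

    public static void main(String[] args) {
        System.out.println("still reachable: " + pinnedSurvivesGc());
    }
}
```

The reverse experiment (setting the field to null and expecting the weak reference to be cleared) is not guaranteed to show anything, since the JVM decides when to collect.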

Questions for Understanding

1. What is Garbage Collection?
2. How are objects recycled?
3. What is the standard garbage collection algorithm recommended by the Java Virtual Machine Specification?
4. Elaborate the object lifecycle.
5. What are the candidates for root objects? How do they affect garbage collection?

Question for Thinking

In your JVM, you have a few objects that are eligible for garbage collection. The JVM also runs the garbage collector. Now the question is: will all the objects that are eligible for GC be garbage collected?

Answer in a single word: "yes" or "no".

If you are not sure what to answer, the next blog will open up some of the concepts, and eventually you will be able to answer the question.

Keep Watching and Have a Great Day

Garbage Collection - Understanding the Death

If you are a C/C++ guy who has migrated to Java, garbage collection may well have impressed you. Yes, it is true: Java cleans up your mess automatically, but it is your responsibility to show which objects are the mess. Even with automatic GC, memory leaks are still possible in Java. Refer to "Effective Java" by Joshua Bloch (a great book on using Java effectively). Like Linux, Java is used everywhere from mobile phones to high-end servers. Would you believe me if I said that Java does not have just one garbage collection algorithm, but a suite of them? On top of that, the interesting fact is that the garbage collection algorithms are tunable and configurable.
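
As a quick taste of how a leak can happen despite GC, here is a sketch in the spirit of the stack example from "Effective Java" (written from memory, not copied from the book; the class name LeakFreeStack and the slotCleared helper are made up). The array inside the stack is reachable, so any element left behind in it stays reachable too, unless the slot is cleared on pop.

```java
import java.util.Arrays;
import java.util.EmptyStackException;

// Sketch only; names are hypothetical.
final class LeakFreeStack {
    private Object[] elements = new Object[16];
    private int size = 0;

    void push(Object e) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, 2 * size);   // grow when full
        }
        elements[size++] = e;
    }

    Object pop() {
        if (size == 0) {
            throw new EmptyStackException();
        }
        Object result = elements[--size];
        elements[size] = null;   // without this, the array would keep the popped object reachable
        return result;
    }

    /** Illustration helper: true if the slot just above the top holds no stale reference. */
    boolean slotCleared() {
        return size == elements.length || elements[size] == null;
    }
}
```

If the line that nulls out the slot is removed, the program still works correctly, but popped objects can never be collected while the stack itself is alive.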

By default, GC works in a specific way, but this can be changed to suit the application's requirements when starting the Java Virtual Machine. We will discuss garbage collection techniques and GC tuning techniques with simple examples. Throughout the exercise, we will rely on generating GC statistics from the JVM to internalize how garbage collection works.
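
One simple way to get GC statistics from inside a running program is the standard java.lang.management API. The following is only a sketch (the class name GcStats is made up), and the collector names and counts it prints depend on the JVM and the collector selected at start-up.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class for this post.
final class GcStats {
    /** Builds one line per collector: its name, collection count and accumulated time. */
    static String report() {
        StringBuilder out = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.append(gc.getName())
               .append(": collections=").append(gc.getCollectionCount())
               .append(", time(ms)=").append(gc.getCollectionTime())
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Calling report() before and after a piece of work gives a rough idea of how many collections that work triggered.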

To understand the next few blogs (I don't know how many I will write on GC), it is assumed that you know Java. Even if you don't know Java, the algorithms will be of interest if you are planning to write a memory management module. I bet you will like the way it is written.

Monday, June 18, 2007

java -Xms

Kick Starting With a Little Knowledge (java -Xms)

"Java is an interpreted language and lacks performance as it runs on the Java Virtual Machine." This is an age-old statement, and it was probably true in Java's early stages. It is quite natural for any software or hardware to take time to scale up and perform well. Java is an exception, as it became fast exceptionally quickly. Today, with a lot of improvements and flexibility, Java technologies give an edge. The Java Virtual Machine plays a crucial role in these performance improvements. A newbie can get started with Java and write a decent application with great ease. In this blog, I would like to share information about the JVM, JDK, APIs, tools and ideas related to Java technology. As of now, I am planning to write regularly, but of course I do not know the interval. Anyway, I am keen to blog and share some of the information I find interesting. Keep watching and send in your comments.