How Memory Leaks Happen in a Java Application
One of the core benefits of Java is the JVM, which provides out-of-the-box memory management.
By: Stackify Blog
Aug. 24, 2017 01:00 PM
Introduction to Memory Leaks In Java Apps
Nevertheless, memory leaks can still occur in Java applications.
In this article, we're going to describe the most common memory leaks, understand their causes, and look at a few techniques to detect and avoid them. We're also going to use the YourKit Java profiler throughout the article to analyze the state of our memory at runtime.
1. What is a Memory Leak in Java?
For a better understanding of the concept, here's a simple visual representation:
As we can see, we have two types of objects - referenced and unreferenced; the Garbage Collector can remove objects that are unreferenced. Referenced objects won't be collected, even if they're no longer used by the application.
Detecting memory leaks can be difficult. A number of tools perform static analysis to determine potential leaks, but these techniques aren't perfect because the most important aspect is the actual runtime behavior of the running system.
So, let's take a focused look at some of the standard practices for preventing memory leaks by analyzing some common scenarios.
2. Java Heap Leaks
A useful technique for studying these situations is to make a memory leak easier to reproduce by lowering the Heap size. That's why, when starting our application, we can adjust the JVM to suit our memory needs with the -Xms and -Xmx parameters.
These parameters specify the initial Java Heap size and the maximum Heap size, respectively.
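To confirm which limits the JVM actually picked up, the heap figures can be read at runtime. The following is only a minimal sketch: the class name HeapSettings and the 64m/128m sizes in the launch comment are illustrative assumptions, not values from the article.

```java
// Launch (after compiling) with, for example:
//   java -Xms64m -Xmx128m HeapSettings
// The 64m and 128m sizes are arbitrary illustrative values.
class HeapSettings {

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the most the JVM will try to use; it tracks -Xmx,
        // although some collectors report slightly less than the flag value.
        System.out.println("Max heap (MiB):          " + toMiB(rt.maxMemory()));
        // totalMemory() is the heap currently committed; it starts near -Xms.
        System.out.println("Committed heap (MiB):    " + toMiB(rt.totalMemory()));
        // freeMemory() is the unused part of the committed heap.
        System.out.println("Free in committed (MiB): " + toMiB(rt.freeMemory()));
    }
}
```

Running the same leaking test with a smaller maximum heap should make the OutOfMemoryError appear sooner, which shortens each experiment.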
2.1. Static Field Holding On to the Object Reference
Let's have a look at a quick example:
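A minimal sketch of such a test follows. The class name StaticFieldLeak, the element count and the explicit System.gc() request are assumptions made for illustration; only the static ArrayList and the Thread.sleep(10000) call are taken from the description in the text.

```java
import java.util.ArrayList;
import java.util.List;

class StaticFieldLeak {
    // Static field: reachable for as long as the class stays loaded,
    // so neither the list nor its elements can be garbage collected.
    static final List<Double> LIST = new ArrayList<>();

    /** Appends 'count' random values to the static list. */
    static void populateList(int count) {
        for (int i = 0; i < count; i++) {
            LIST.add(Math.random());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        populateList(10_000_000);   // assumed size; lower it for small heaps
        System.gc();                // only a request; the JVM may ignore it
        Thread.sleep(10000);        // keep the JVM alive so the GC can run
        System.out.println("Still holding " + LIST.size() + " elements");
    }
}
```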
We created our ArrayList as a static field - which will never be collected by the JVM Garbage Collector during the lifetime of the JVM process, even after the calculations it was used for are done. We also invoked Thread.sleep(10000) to allow the GC to perform a full collection and try to reclaim everything that can be reclaimed.
Let's run the test and analyze the JVM with our profiler:
Notice how, at the very beginning, all memory is, of course, free.
Then, in just 2 seconds, the iteration process runs and finishes - loading everything into the list (naturally this will depend on the machine you're running the test on).
After that, a full garbage collection cycle is triggered, and the test continues to execute, to allow this cycle time to run and finish. As you can see, the list is not reclaimed and the memory consumption doesn't go down.
Let's now see the exact same example, only this time, the ArrayList isn't referenced by a static variable. Instead, it's a local variable that gets created, used and then discarded:
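A minimal sketch of this variant, with the same assumptions as before (the class name LocalListNoLeak, the element count and the System.gc() request are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

class LocalListNoLeak {

    /** Fills a method-local list and returns how many elements it held. */
    static int populateList(int count) {
        List<Double> list = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            list.add(Math.random());
        }
        // Once this method returns, nothing references 'list' any more,
        // so the list and all its elements become eligible for collection.
        return list.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Populated " + populateList(10_000_000) + " elements");
        System.gc();            // only a request; the JVM may ignore it
        Thread.sleep(10000);    // keep the JVM alive so the GC can run
    }
}
```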
Once the method finishes its job, we'll observe the major GC collection, around the 50th second in the image below:
Notice how the GC is now able to reclaim some of the memory utilized by the JVM.
How to prevent it?
First, we need to pay close attention to our usage of static; declaring any collection or heavy object as static ties its lifecycle to the lifecycle of the JVM itself, and makes the entire object graph impossible to collect.
We also need to be aware of collections in general - that's a common way to unintentionally hold on to references for longer than we need to.
2.2. Calling String.intern() on Long String
Let's have a look at a quick example:
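A minimal sketch of such a test could look like this. The class name InternLeak, the method readAndIntern and the use of a command-line argument for the file path are assumptions; the variable name str follows the text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class InternLeak {

    /** Reads everything from the reader and returns the interned String. */
    static String readAndIntern(Reader source) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(source)) {
            char[] buf = new char[8192];
            int n;
            while ((n = br.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // intern() puts the whole text into the JVM string pool and
        // returns the canonical instance.
        return sb.toString().intern();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a large text file (hypothetical input)
        String str = readAndIntern(Files.newBufferedReader(Paths.get(args[0])));
        System.out.println("Interned " + str.length() + " characters");
    }
}
```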
Here, we simply try to load a large text file into running memory and then return a canonical form, using .intern().
The intern API will place the str String in the JVM string pool - where it can't be collected - and again, this will cause the GC to be unable to free up enough memory:
We can clearly see that for the first 15 seconds the JVM is stable; then we load the file and the JVM performs a garbage collection (at the 20th second).
Finally, str.intern() is invoked, which leads to the memory leak: the flat line indicates high heap memory usage that will never be released.
How to prevent it?
The second solution is to use Java 8 or later, where the PermGen space is replaced by the Metaspace. Interned Strings have lived on the main heap since Java 7, so calling intern on Strings no longer exhausts a fixed-size permanent generation:
Finally, we can often avoid calling .intern() on Strings altogether.
2.3. Unclosed Streams
Why partially? Because the try-with-resources syntax is optional:
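A minimal sketch of code that skips it and never closes its reader is shown here. The class and method names are assumptions, and a generic InputStream stands in for the stream that would come from URL.openStream() in the scenario described next.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class UnclosedStream {

    /** Counts lines but never closes the reader or the underlying stream. */
    static int countLinesLeaky(InputStream in) {
        BufferedReader br =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        int lines = 0;
        try {
            while (br.readLine() != null) {
                lines++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Missing br.close(): the stream, its buffers and any OS handle
        // behind it stay allocated until the objects happen to be collected.
        return lines;
    }
}
```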
Let's see how the memory of the application looks when loading a large file from a URL:
As we can see, the heap usage is gradually increasing over time - which is the direct impact of the memory leak caused by not closing the stream.
How to prevent it?
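One way to do so is to open the reader in a try-with-resources statement. This is a minimal sketch; the class and method names are assumptions, and a generic InputStream again stands in for a URL stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ClosedStream {

    /** Counts lines and closes the reader (and the wrapped stream) on exit. */
    static int countLines(InputStream in) {
        int lines = 0;
        // Resources declared in the parentheses are closed automatically,
        // whether the block completes normally or throws.
        try (BufferedReader br =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            while (br.readLine() != null) {
                lines++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```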
In this case, the BufferedReader will be automatically closed at the end of the try statement, without the need to close it in an explicit finally block.
2.4. Unclosed Connections
Let's see a quick example:
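A minimal sketch of such code, in which the class name, the method name and the idea of reading a single byte are assumptions made for illustration:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;

class UnclosedConnection {

    /** Reads the first byte of the resource and leaves the connection open. */
    static int firstByteLeaky(URL url) {
        try {
            URLConnection conn = url.openConnection();
            InputStream is = conn.getInputStream();
            // Neither 'is' nor the connection is closed after this read,
            // so the underlying socket or file handle stays allocated.
            return is.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```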
The URLConnection remains open, and the result is, predictably, a memory leak:
Notice how the Garbage Collector cannot do anything to release memory that is unused but still referenced. The situation is immediately clear after the 1st minute - the number of GC operations rapidly decreases, causing increased Heap memory use, which leads to the OutOfMemoryError.
How to prevent it?
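One common approach, sketched here under assumptions rather than taken from the article's own listing, is to close the connection's stream in a try-with-resources block and, for HTTP connections, to call disconnect() once no further requests are planned. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

class ClosedConnection {

    /** Reads the first byte of the resource and releases the connection. */
    static int firstByte(URL url) {
        try {
            URLConnection conn = url.openConnection();
            // The stream is closed automatically when the try block exits.
            try (InputStream is = conn.getInputStream()) {
                return is.read();
            } finally {
                // For HTTP(S), disconnect() signals that the connection will
                // not be reused, so its socket can be released.
                if (conn instanceof HttpURLConnection) {
                    ((HttpURLConnection) conn).disconnect();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```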
2.5. Adding Objects with no hashCode() and equals() into a HashSet
Specifically, when we start adding duplicate objects into a Set - this will only ever grow, instead of ignoring duplicates as it should. We also won't be able to remove these objects, once added.
Let's create a simple class without either equals or hashCode:
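A minimal sketch of such a class; the name Key and its single field are assumptions made for illustration:

```java
class Key {
    final String key;

    Key(String key) {
        this.key = key;
    }

    // No equals() or hashCode() overrides: the versions inherited from
    // Object compare identity, so two Key("key") instances are never equal.
}
```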
Now, let's see the scenario:
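A minimal sketch of the scenario, assuming a key class with no equals() or hashCode() (nested here so the example compiles on its own). The class name, method name and the bounded loop are illustrative; in a real leak the set would be long-lived and the additions unbounded.

```java
import java.util.HashSet;
import java.util.Set;

class HashSetLeakScenario {

    /** Key type that relies on Object's identity-based equals/hashCode. */
    static class Key {
        final String key;

        Key(String key) {
            this.key = key;
        }
    }

    /** Adds 'count' logically identical keys and returns the set size. */
    static int addDuplicates(int count) {
        Set<Key> set = new HashSet<>();
        for (int i = 0; i < count; i++) {
            set.add(new Key("key")); // never recognized as a duplicate
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(addDuplicates(1_000)); // prints 1000, not 1
    }
}
```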
This simple implementation will lead to the following scenario at runtime:
Notice how the garbage collector stopped being able to reclaim memory around 1:40, which is where the memory leak becomes visible; the number of GC collections dropped almost fourfold immediately after.
How to prevent it?
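The usual remedy is to override both equals() and hashCode() consistently, so that the set can recognize duplicates. A minimal sketch, in which the class name ValueKey and the helper addDuplicates are assumptions:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class ValueKey {
    private final String key;

    ValueKey(String key) {
        this.key = key;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ValueKey)) return false;
        return Objects.equals(key, ((ValueKey) o).key);
    }

    @Override
    public int hashCode() {
        // Must agree with equals(): equal keys produce equal hash codes.
        return Objects.hashCode(key);
    }

    /** Adds 'count' equal keys and returns the resulting set size. */
    static int addDuplicates(int count) {
        Set<ValueKey> set = new HashSet<>();
        for (int i = 0; i < count; i++) {
            set.add(new ValueKey("key")); // duplicates are ignored
        }
        return set.size();
    }
}
```

With both methods in place, the set stays at a single element however many equal keys are added, and an equal key can be used to remove the entry again.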
3. How to Find Leaking Sources in Your Application
Let's see which techniques can help you in addition to standard profiling.
3.1. Verbose Garbage Collection
By adding the -verbose:gc parameter to the JVM configuration of our application, we're enabling a trace of GC activity. Summary reports are printed to the JVM's console output by default, which should help you understand how your memory is being managed.
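As a complement to the -verbose:gc log, GC counters can also be read from inside the application through the standard java.lang.management API. This is only a sketch; the class name GcStats and the idea of summing the counts are assumptions, not part of the original article.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {

    /** Total number of collections reported by all collectors so far. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // One line per collector: name, collection count, accumulated time.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("Total collections: " + totalCollections());
    }
}
```

Logging these figures periodically gives a rough view of the same trend a profiler shows: a collection count that keeps rising while heap usage never drops is a hint that referenced but unused objects are accumulating.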
3.2. Do Profiling
In this article, we used another profiler - YourKit - which has some additional, more advanced features compared to Visual VM.
3.3. Review Your Code
Simply put - review your code thoroughly, practice regular code reviews and make good use of static analysis tools to help you understand your code and your system.
Then, having the techniques and tools to really see what's happening at runtime, as the leak occurs, is critical as well. Static analysis and careful code-focused reviews can only do so much, and - at the end of the day - it's the runtime that will show you the more complex leaks that aren't immediately identifiable in the code.
Finally, leaks can be notoriously hard to find and reproduce because many of them only happen under intense load, which generally happens in production. This is where you need to go beyond code-level analysis and work on two main aspects - reproduction and early detection.
The best and most reliable way to reproduce memory leaks is to simulate the usage patterns of a production environment as closely as possible, with the help of a good suite of performance tests.
And early detection is where a solid performance management solution and even an early detection solution can make a significant difference, as it's the only way to have the necessary insight into the runtime of your application in production.
The full implementation of this tutorial can be found over on GitHub. This is a Maven based project, so it can simply be imported and run as it is.