Overview

Hortonworks DataFlow (HDF) 3.1 was released on February 1, 2018, and includes Apache NiFi 1.5.0. NiFi 1.4.0 introduced improved Apache Ranger integration through NIFI-4032. That integration contains a serious memory leak in RangerNiFiAuthorizer, tracked as NIFI-4925.

Debugging Apache NiFi RangerNiFiAuthorizer Memory Leak

We upgraded our development HDF 3.0 cluster to HDF 3.1 on February 22. After the upgrade, we noticed a significant increase in heap usage. Within a few hours, heap usage exceeded 95% and the cluster became unresponsive. We started eliminating variables and confirmed that we hadn’t changed any of the NiFi flows that had been working with HDF 3.0.
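To make the symptom concrete, here is a minimal Java sketch of how heap pressure like this can be measured from inside a JVM, with a heap dump written once usage crosses a threshold. It is an illustration only: the class name `HeapWatchSketch`, the method names, and the threshold are hypothetical, and it inspects the JVM it runs in rather than a NiFi process. For an external process, a JDK tool such as `jmap` is the more usual route.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of NiFi or HDF.
class HeapWatchSketch {

    // Fraction of the heap currently in use, between 0 and 1.
    static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() returns -1 when no maximum is defined; fall back to committed.
        long limit = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        return (double) heap.getUsed() / limit;
    }

    // Writes an .hprof file if usage is at or above the threshold.
    // Returns true when a dump was written.
    static boolean dumpIfAbove(double threshold, String hprofPath) {
        if (heapUsedFraction() < threshold) {
            return false;
        }
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            // 'true' restricts the dump to live (reachable) objects.
            // The target file must not already exist.
            diag.dumpHeap(hprofPath, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }
}
```

Calling `HeapWatchSketch.dumpIfAbove(0.95, "/tmp/heap.hprof")` on a schedule would write a dump the first time usage reaches 95%. The path here is only an example.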

By Tuesday 2/27, we had eliminated all potential confounding variables and started to investigate this as a potential NiFi bug. We gathered a heap dump on one of the nodes and ran an analysis on it. We found the following:

One instance of “org.apache.nifi.authorization.AuthorizerFactory$2” loaded by “org.apache.nifi.nar.NarClassLoader @ 0x3cd41a5c0” occupies 12,250,365,064 (84.18%) bytes. The memory is accumulated in one instance of “java.util.concurrent.ConcurrentHashMap$Node[]” loaded by “<system class loader>”.

The heap analysis report showed that the cause was a ConcurrentHashMap in the RangerNiFiAuthorizer class that had grown without bound. The RangerNiFiAuthorizer class was introduced as part of NIFI-4032. Entries were being added to its resultLookup map but were not reliably removed.
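The general shape of this kind of leak can be sketched in a few lines of Java. This is not NiFi's code. Only the field name `resultLookup` comes from the analysis above, while the class `ResultLookupSketch`, the `Result` type, the string keys, and both methods are hypothetical stand-ins. It shows a map that gains one entry per request and never shrinks, next to one possible way to keep it bounded.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of an unbounded lookup map; not the NiFi source.
class ResultLookupSketch {

    // Stand-in for whatever per-request result object is being cached.
    static final class Result {
        final boolean allowed;
        Result(boolean allowed) { this.allowed = allowed; }
    }

    private final Map<String, Result> resultLookup = new ConcurrentHashMap<>();

    // Leaky pattern: every request adds an entry and nothing removes it,
    // so the map grows for as long as the process runs.
    Result authorizeLeaky(String requestId) {
        Result result = new Result(true);
        resultLookup.put(requestId, result);
        return result;
    }

    // One possible remedy (an assumption, not the actual fix):
    // remove the entry as soon as the request has been handled.
    Result authorizeAndRelease(String requestId) {
        Result result = new Result(true);
        resultLookup.put(requestId, result);
        try {
            return result;
        } finally {
            resultLookup.remove(requestId);
        }
    }

    int size() {
        return resultLookup.size();
    }
}
```

With unique request identifiers, a thousand calls to `authorizeLeaky` leave a thousand entries behind, while the same calls to `authorizeAndRelease` leave none. In a long-running service that handles requests continuously, the first pattern fills the heap in the way the report describes.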

On Wednesday 2/28, we opened a support case with Hortonworks to get this Apache NiFi bug fixed. By Friday 3/2, Hortonworks had identified the issue and was working on a hotfix for us. The issue is being tracked publicly as NIFI-4925.

What is next?

We are looking forward to NIFI-4925 being resolved quickly and to applying the hotfix to our environment. Patch releases in the Apache NiFi 1.4.x and 1.5.x lines could also carry the fix for the open source versions of Apache NiFi, and it could be included in an HDF 3.1.x release as well.