RDataFrame

ROOT is an open-source framework designed for data analysis, particularly in high-energy physics, but it's also used in other fields. While it's a very powerful tool, ROOT can be challenging to work with due to its complexity.

A relatively recent addition to ROOT, RDataFrame, combines the efficiency and power of ROOT with the ease and simplicity of Pandas dataframes.

RDataFrame provides a simple way to filter, reduce, and process datasets directly from .root files, making large datasets more manageable. It also allows easy conversion to Pandas dataframes, so Python’s data analysis tools can be applied to the reduced dataset rather than to the full, massive files. In short, it is a great tool for handling big data.

Since navigating the RDataFrame documentation can be a bit challenging, I thought it would be helpful to provide a few lines of code below to help you get started.


Once ROOT is installed (see the official ROOT website for instructions), it can be used from Python (and from C++ as well). The first step is to import ROOT in the Python script:

     
import ROOT
    

An RDataFrame can be initialized from a list of .root files in a single line, given the name of the tree to read and the list of files:

     
rdf = ROOT.RDataFrame(TreeName, ListOfROOTFiles)
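
As a minimal sketch of how the list of files might be built, assuming all files sit in one directory and contain a tree with the same name. The helper names `collect_root_files` and `open_rdf` are hypothetical and not part of ROOT:

```python
from pathlib import Path


def collect_root_files(directory):
    """Return the sorted paths of all .root files directly inside `directory`."""
    return sorted(str(path) for path in Path(directory).glob("*.root"))


def open_rdf(tree_name, directory):
    """Build an RDataFrame over every .root file found in `directory`."""
    import ROOT  # imported here so the helper above also works without ROOT

    files = collect_root_files(directory)
    if not files:
        raise FileNotFoundError(f"no .root files found in {directory}")
    # RDataFrame takes the tree name first, then the list of file paths.
    return ROOT.RDataFrame(tree_name, files)
```

It could then be called as `rdf = open_rdf("Events", "/path/to/ntuples")`, where both the tree name and the path are placeholders.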
    

One can print the RDataFrame, get its columns, or count its number of rows:

     
rdf.Display().Print()
rdf.GetColumnNames()
rdf.Count().GetValue()
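
These calls can be wrapped in a small helper to get a quick overview of a dataframe. This is only a sketch, and `summarize` is a hypothetical name, not a ROOT function:

```python
def summarize(rdf):
    """Return the column names and the row count of an RDataFrame."""
    # Count() only books the computation; GetValue() triggers the event loop.
    n_rows = rdf.Count().GetValue()
    # Column names come back as C++ strings, so convert them to Python str.
    columns = [str(name) for name in rdf.GetColumnNames()]
    return {"columns": columns, "rows": n_rows}
```

Note that counting rows reads through the whole dataset, so on large inputs it is worth calling this only when needed.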
    

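Just as in Pandas, one can define new columns and apply a selection to the rows of the dataframe. Both operations return a new dataframe rather than modifying the original, so the result has to be stored:

     
rdf = rdf.Define(VariableName, StringOfOperation)
rdf = rdf.Filter(StringOfExpression)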
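
To make this concrete, here is a small sketch that chains the two operations. It assumes the dataframe has numeric columns named `px` and `py`. These column names, the new `pt` column, and the function name `select_high_pt` are illustrative only:

```python
def select_high_pt(rdf, pt_min=20.0):
    """Add a transverse-momentum column and keep the rows above `pt_min`.

    Assumes the input has numeric columns named `px` and `py`.
    """
    # The expressions are strings of C++ code evaluated by ROOT for each row.
    with_pt = rdf.Define("pt", "sqrt(px*px + py*py)")
    # Define and Filter each return a new node; the input `rdf` is unchanged.
    return with_pt.Filter(f"pt > {pt_min}")
```

Nothing is computed when the function returns. The data is only read once a result is requested, for example with `select_high_pt(rdf).Count().GetValue()`.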
    

A final, important feature is the conversion between Pandas dataframes and RDataFrame.

From Pandas to RDataFrame, it can be done like this:

     
# Recent ROOT releases rename this function to ROOT.RDF.FromNumpy
data = {key: df[key].values for key in list(df.columns)}
rdf = ROOT.RDF.MakeNumpyDataFrame(data)
    

And in the other direction, from RDataFrame to Pandas:

     
import pandas as pd
ListColumns = [str(col) for col in rdf.GetColumnNames()]
df = pd.DataFrame.from_dict(rdf.AsNumpy(ListColumns))
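
The two conversions can be gathered into helper functions. The sketch below rests on a few assumptions: the function names are hypothetical, the Pandas columns are taken to be numeric, and the lookup of `FromNumpy` with a fallback to `MakeNumpyDataFrame` reflects my understanding that the function was renamed in recent ROOT releases:

```python
def to_column_dict(df):
    """Map each column name of a Pandas dataframe to a NumPy array."""
    return {str(name): df[name].to_numpy() for name in df.columns}


def pandas_to_rdf(df):
    """Build an RDataFrame from the columns of a Pandas dataframe."""
    import ROOT

    # Assumption: newer ROOT exposes FromNumpy, older ROOT MakeNumpyDataFrame.
    from_numpy = getattr(ROOT.RDF, "FromNumpy", None) or ROOT.RDF.MakeNumpyDataFrame
    return from_numpy(to_column_dict(df))


def rdf_to_pandas(rdf, columns=None):
    """Read the given columns (default: all) into a Pandas dataframe."""
    import pandas as pd

    if columns is None:
        columns = [str(name) for name in rdf.GetColumnNames()]
    # AsNumpy runs the event loop and returns a dict of NumPy arrays.
    return pd.DataFrame(rdf.AsNumpy(columns))
```

Passing an explicit `columns` list to `rdf_to_pandas`, ideally after a `Filter`, keeps the resulting Pandas dataframe small.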
    

This is by no means a comprehensive overview of the RDataFrame methods and functions that are available. More information can be found online, for example on the ROOT forum, but I hope this gives a taste of what RDataFrame can do.