C++ Raw Pointer vs. Smart Pointer Performance with 1 Million Linked Lists

July 10, 2020

Early look at Massif memory allocation results

In my research work, I need to maintain about 1,000,000 linked lists, with each growing to about 100 nodes and frequently deleting the head. These lists initially populate by reading in data from a file in a matter of seconds, and I’m aiming for <1 millisecond performance to add one new node to all the lists. Each node occupies 29 bytes of storage, including the pointer to the next node. That’s 2.7 GB of data that needs to be processed and managed with minimal overhead. At this scale and with the particular target application, every nanosecond counts.

The conventional wisdom for this problem would be to use smart pointers to manage the node references, especially since the nodes themselves must contain the pointers and the nodes are deleted by a function from a different class. unique_ptrs are the fastest, with the CPP docs saying they offer “little to no overhead over built-in pointers (depending on the deleter used).” So I took their word for it and used unique_ptr references in my nodes with the default deleter. But the performance was awful. Truly bad.

I set up the following test to verify that the use of smart pointers was the culprit, making sure to allocate and populate the memory realistically, as well as test deleting and reinitializing data. Here’s the code to test raw pointer performance:

#include <iostream>
using std::cerr;
using std::cout;
using std::endl;
using std::ostream;

#include <chrono> 
using namespace std::chrono; 
  
#include <array>
#include <cstdint>

class Node
{
private:
    Node *next;
    double d1;
    double d2;
    double d3;

public:
    Node();
    Node(double d1, double d2, double d3);
    Node *getNext();

    void addNext(int count);
};

static int arr_size = 1000000;
static int list_size = 100;

Node::Node()
{
    next = nullptr;
}
Node *Node::getNext()
{
    return next;
}

Node::Node(double d1, double d2, double d3) : d1(d1), d2(d2), d3(d3)
{

    next = nullptr;
}

void Node::addNext(int count)
{
    if (count < list_size)
    {
        next = new Node(d1, d2, d3);
        next->addNext(++count);
    }
}

Node *deleteTest(Node *head)
{
    Node *old_head = head;
    head = head->getNext();
    delete old_head;
    return head;
}

int main(int argc, char *argv[])
{
    auto start = high_resolution_clock::now(); 
    Node **nodes = new Node *[arr_size];
    for (int i = 0; i < arr_size; i++)
    {
        nodes[i] = new Node(1, 2, 3);
        nodes[i]->addNext(0);
    }

     for (int j = 0; j < arr_size; j++)
    {
        for (int i = 0; i < list_size; i++)
        {
            nodes[j] = deleteTest(nodes[j]);
        }
        delete nodes[j];
    }

    for (int i = 0; i < arr_size; i++)
    {
        nodes[i] = new Node(1, 2, 3);
        nodes[i]->addNext(0);
    }

     for (int j = 0; j < arr_size; j++)
    {
        for (int i = 0; i < list_size; i++)
        {
            nodes[j] = deleteTest(nodes[j]);
        }
        delete nodes[j];
    }

    delete [] nodes;
    
    auto stop = high_resolution_clock::now(); 
    cout << "Finished in " << duration_cast<milliseconds>(stop-start).count() << " milliseconds" << endl;
}

And here’s the code to test unique_ptr performance:

#include <iostream>
using std::cerr;
using std::cout;
using std::endl;
#include <chrono> 
using namespace std::chrono; 

#include <iomanip>
#include <array>
#include <cstdint>

class Node
{
private:
    std::unique_ptr<Node> next;
    double d1;
    double d2;
    double d3;

public:
    Node();
    Node(double d1, double d2, double d3);
    std::unique_ptr<Node> getNext();

    void addNext(int count);
};

static int arr_size = 1000000;
static int list_size = 100;

Node::Node()
{
    next = nullptr;
}
std::unique_ptr<Node> Node::getNext()
{
    return std::move(next);
}

Node::Node(double d1, double d2, double d3) : d1(d1), d2(d2), d3(d3)
{

    next = nullptr;
}

void Node::addNext(int count)
{
    if (count < list_size)
    {
        next = std::make_unique<Node>(d1, d2, d3);
        next->addNext(++count);
    }
}

std::unique_ptr<Node> deleteTest(std::unique_ptr<Node> head)
{
    std::unique_ptr<Node> old_head;
    head.swap(old_head);
    head = old_head.get()->getNext();
    old_head.reset();
    return head;
}

int main(int argc, char *argv[])
{
    auto start = high_resolution_clock::now(); 
    std::unique_ptr<Node> *nodes = new std::unique_ptr<Node>[arr_size];
    for (int i = 0; i < arr_size; i++)
    {
        nodes[i] = std::make_unique<Node>(1, 2, 3);
        nodes[i]->addNext(0);
    }
    for (int j = 0; j < arr_size; j++)
    {
        for (int i = 0; i < list_size; i++)
        {
            nodes[j] = deleteTest(std::move(nodes[j]));
        }
        nodes[j].reset();
    }

    for (int i = 0; i < arr_size; i++)
    {
        nodes[i] = std::make_unique<Node>(1, 2, 3);
        nodes[i]->addNext(0);
    }
    for (int j = 0; j < arr_size; j++)
    {
        for (int i = 0; i < list_size; i++)
        {
            nodes[j] = deleteTest(std::move(nodes[j]));
        }
        nodes[j].reset();
    }    
    delete [] nodes;
    
    auto stop = high_resolution_clock::now(); 
    cout << "Finished in " 
         << duration_cast<milliseconds>(stop-start).count() 
         << " milliseconds" << endl;
}

Here were my results from running the programs in a terminal:

Raw vs smart results

The raw pointers performed about 6% better than the unique_ptrs on my machine. This may not seem like a huge difference, but as I said above it really adds up noticeably over the program lifecyle. Things get particularly bad, however, when I’m tyring to debug the code.

Debugging in Visual Studio Code I achieve only a minor slowdown with the raw pointers over their straight execution: Raw debug

Meanwhile the use of unique_ptrs caused a 24X slowdown when debugging: Smart debug

With my approach to development, such a slowdown is intolerable. At this stage in my project, I often need to debug and step through large chunks of my code in the middle of execution (after several GB of data have been processed). So I’ll gladly take the higher risk of using raw pointers over the severe debugging bottleneck of smart pointers, in this case.