How is C++ STL list implemented internally

The previous article in this series called STLHow, showed how to implement STL Vector. This article will go into details about implementation of STL list.

Implementing STL List

The node

The STL list requires traversal in both the directions by means of forward and reverse iterator. Hence the list should be implemented as a doubly linked list. The node of the list is defined as below.

	template <typename T>
	struct ListNode
	{
		ListNode(const T &e) : data(e),
			                   next(0), prev(0)
		{ }

		T data;
		ListNode<T> *next;
		ListNode<T> *prev;
	};

 List class

	template <typename T>
	class list
	{
	public:
		list() : head_(0), tail_(0),
			     size_(0)
		{ }

		int size() { return size_; }
	private:
		ListNode<T> *head_;
		ListNode<T> *tail_;

		int size_;
	};

The list maintains pointers to head and tail of the list. Also the size of the list is stored explicitly. Because of the explicitly stored size, the size function can return the size of the list in constant time which would otherwise be O(n) had the list was required to be traversed to find out the size. The default constructor initializes all these private variables to zero.

Adding elements

	template <typename T>
	ListNode<T> *list<T>::insertInternal(const T &e, ListNode<T> *pos)
	{
		ListNode<T> *n = new ListNode<T>(e);

		size_++;
		// no operation below this should throw
		// as state of the list has changed and memory allocated

		n->next = pos;

		if(pos)
		{
			n->prev = pos->prev;
			pos->prev = n;
		}
		else
		{
			// pos is null that is at the very end of the list
			n->prev = tail_;
			tail_ = n;
		}

		if(n->prev)
		{
			n->prev->next = n;
		}
		else
		{
			// at the very begining of the list
			head_ = n;
		}

		return n;
	}

 

	template <typename T>
	void list<T>::push_back(const T &e)
	{
		// inserts before the position, 
        // 0 is the end() iterator
		// hence insert at the end
		insertInternal(e, 0);
	}

	template <typename T>
	void list<T>::push_front(const T &e)
	{
		// insert before the head
		insertInternal(e, head_);
	}

 

Removing elements

	template <typename T>
	void list<T>::removeInternal(ListNode<T> *pos)
	{
		if(pos)
		{
			if(pos->prev)
				pos->prev->next = pos->next;

			if(pos->next)
				pos->next->prev = pos->prev;

			if(pos == head_)
				head_ = pos->next;

			if(pos == tail_)
				tail_ = pos->prev;

			delete pos;

			size_--;
		}
	}

 

	template <typename T>
	void list<T>::pop_back()
	{
		removeInternal(tail_);
	}

	template <typename T>
	void list<T>::pop_front()
	{		
		removeInternal(head_);
	}

 

Accessing elements

		T &front()
		{ return head_->data; }

		T &back()
		{ return tail_->data; }

 

Destructor

	template <typename T>
	list<T>::~list()
	{
		ListNode<T> *current( head_ );

		while(current)
		{
			ListNode<T> *next( current->next );
			delete current;
			current = next;
		}
	}

 

Forward iterator

		class iterator
		{
		public:
			iterator(ListNode<T> *p=0) : pos_(p) { }
			
			T &operator*()
			{ return pos_->data; }

			T *operator->()
			{ return &(pos_->data); }

			bool operator!=(const iterator &rhs)
			{ return this->pos_ != rhs.pos_; }

			iterator operator++(int)
			{ pos_ = pos_->next; return *this; }

			iterator operator--(int)
			{ pos_ = pos_->prev; return *this; }

		private:
			ListNode<T> *pos_;
		};

		iterator begin()
		{ return iterator(head_); }

		iterator end()
		{ return iterator(0); }

 

Reverse iterator

		class reverse_iterator
		{
		public:
			reverse_iterator(ListNode<T> *p=0) : pos_(p) { }
			
			T &operator*()
			{ return pos_->data; }

			T *operator->()
			{ return &(pos_->data); }

			bool operator!=(const reverse_iterator &rhs)
			{ return this->pos_ != rhs.pos_; }

			reverse_iterator operator++(int)
			{ pos_ = pos_->prev; return *this; }

			reverse_iterator operator--(int)
			{ pos_ = pos_->next; return *this; }

		private:
			ListNode<T> *pos_;
		};

		reverse_iterator rbegin()
		{ return reverse_iterator(tail_); }

		reverse_iterator rend()
		{ return reverse_iterator(0); }

 

C++ auto_ptr for pointer to array

An auto_ptr can wrap a raw pointer but this raw pointer should not be a pointer to an array because auto_ptr will not release memory of the array when it goes out of scope. auto_ptr calls delete on the pointer when it goes out of scope. However for memory pointed to by an array pointer is to be freed then delete[] needs to be called and not delete.

How can you have an auto_ptr for a pointer to array?
Even before you try to answer that question, think whether you really need an auto_ptr for array pointer? Unless you are writing a low level library like STLHow, you most likely don’t need an auto_ptr for array pointer. That exactly is the reason why one is not included in the standard C++ library.

Instead use a Vector.

STLHow internally uses auto_ptr for pointer to array. Its implementation is listed below.
Check out STLHow – the STL that is meant to be readable.

			// an auto pointer to use with arrays
			// lets use arrays in create temp and swap idiom
			template 
			class auto_array
			{
			    public :
					auto_array(T *t) : ptr_( t )
					{ }

					~auto_array()
					{ delete[] ptr_; }

					T *operator->()
					{ return ptr_; }

					T *release()
					{ 
						T *tmp( ptr_ );
						ptr_ = 0;
						return tmp;
					}

					T &operator[](int i)
					{ return ptr_[i]; }

			    private :
					T *ptr_;
			};

How is C++ STL vector implemented internally

There are many open source implementations of STL. However it is difficult to read the STL code because it is meant to be practically usable than readable.

STLHow is an implementation of C++ Standard Template Library (STL) written in easy to understand and human readable code.

A series of articles here on codefreakr will discuss the STLHow to demonstrate “How the STL is implemented”.

Implementing STL Vector

Constructors and Destructor

Vector is implemented as a dynamically allocated array. The memory for this array is allocated in the constructor. As more elements are inserted the array is dynamically increased in size. A constructor without parameter creates an array with a default size. Another constructor with integer parameter creates an array of the specified size. The destructor deletes the memory allocated for the array.

	const int _DEFAULT_VECTOR_SIZE = 4;

	template <typename T>
	class vector
	{
	    public :

			// constructors
			vector()
				: array_( new T[_DEFAULT_VECTOR_SIZE] ),
				  reserved_size_( _DEFAULT_VECTOR_SIZE ),
				  size_( 0 )
			{ }

			vector(int n) // create vector with n default elements
				: array_( new T[n] ),
				  reserved_size_( n ),
				  size_( 0 )
    		{ }

			// destructor
			~vector()
			{ delete[] array_; }

	    private :

		    T *array_;
			int size_;
			int reserved_size_;
	};

Adding elements

The most frequently used function to add elements to vector is push_back. The function adds the element at the end of the vector ie. after the current last element. This is accomplished by putting the element at the size_th position. However that is not sufficient because vector is a dynamically increasing container hence if the currently allocated memory is not sufficient to hold the element then more memory should be allocated. So, see that there is sufficient memory to hold the element, if not allocate more memory and then insert the element.

    template <typename T>
    void vector<T>::push_back(const T &t)
    {
        // if we've run out of space, allocate more memory
        if(size_ == reserved_size_)
            resize(reserved_size_ + _DEFAULT_VECTOR_SIZE);

        // size_ when used as an index, points to the next position after
        // the last element in the vector
        array_[size_] = t;

        // now there is one more element in the vector, so increase the size
        size_++;
    }

The resize function is used to set the size of the reserved memory. Although this function is public and can be called by client code to change the actual size of the memory held by the vector it is used internally for the same purpose. Here is the implementation of the function.

    template <typename T>
    void vector<T>::resize(int n) // make the size of the internal array exactly n
    {
        if(n > reserved_size_) // if requested size is more than the current size
        {
            // allocate a new array of larger capacity
            // copy the old array to new array
            // destroy old array
            auto_array<T> new_array( new T[n] );
            for(int i=0; i<size_; i++)
            {
                new_array[i] = array_[i];
            }

            delete[] array_;
            array_ = new_array.release();
            reserved_size_ = n;
        }
    }

In the above snippet of code, auto_array is an auto_ptr like class. The standard auto_ptr has a destructor which performs delete on the pointer it holds. Hence it is unable to delete an array of objects which requires delete[] instead of delete to free them. The custom auto_ptr class is discussed here.

Forward Iterator

The forward iterator iterates through the vector elements starting from index zero and in increasing index order. Because the elements of the vector are stored in an contiguous array, a pointer of element type can function as a forward iterator. This shows that a simple pointer can work as an iterator hence it is often said that anything that behaves like an iterator is an iterator.

// iterator
typedef T* iterator;

The begin function returns the pointer to the first element in the array and the end function returns pointer to an element past the last element of the array. The end really refers to a memory location that should not be accessed as it is outside the limit of the array. This is the reason why it is advised to not to de-reference the end iterator.

// iterator functions
iterator begin()
{ return array_; }

iterator end()
{ return array_ + size_; }

There is no need to write any special iterator de-reference or access to member operator, as the pointer already has those operators defined.

Reverse Iterator

The reverse iterator iterates through the vector elements starting from the very last element and in decreasing index order. A wrapper class to the plain pointer can do the job of the reverse iterator. The operator++ should perform — and operator– should perform ++ on the plain pointer.

class reverse_iterator
{
    public:
    reverse_iterator(T *p)
    : pos(p) { }

    reverse_iterator()
    : pos(0) { }

    T &amp;operator*()
    { return *pos; }

    T *operator-&gt;()
    { return pos; }

    reverse_iterator operator++(int)
    { pos--; return *this; }

    reverse_iterator operator--(int)
    { pos++; return *this; }

    bool operator!=(const reverse_iterator &amp;rhs)
    { return this-&gt;pos != rhs.pos; }

    private:
    T *pos;
};

The rbegin function returns an object of reverse_iterator that holds pointer to the last element of the array and the rend function returns an object of reverse_iterator that holds pointer to an element before the first element of the array. The location pointed to by the rend is invalid and hence should not be de-referenced.

This article shows the internal implementation of the C++ STL vector container. Not all but only the commonly used features of the vector are described here. Rest of the features can easily be implemented based on this description.

In the next article in the series STLHow describes the implementation of C++ STL List container.