Moving, Rather than Copying, Objects

Motivation: Match handcrafted code

Let’s say we are writing some seemingly harmless code such as the following:

class Matrix { ... };

// Some scope:
{
    Matrix L = ...;
    Matrix M = ...;
    Matrix N = L + M;
    ...
}

C++ abstractions are a wonderful thing: the addition operator is called to evaluate L + M, and the result is used to initialize Matrix N. Despite the = sign, this is an initialization, so a constructor rather than the assignment operator is invoked.

But we are worried. Depending on how often this code runs, it may be critical to ensure that the intermediate object computed by the addition operator is not merely copied into Matrix N and then destroyed.

In some string manipulation code, we have a similar worry.

{
    std::string a, b, c;
    ...
    a = b + c;
}

We are concerned that b + c will be concatenated into a temporary, which will then be copied into a, while a's previous contents and the temporary both head to the destructor.

The same problem can be compounded. Suppose we use a function readData that returns a container of strings.

typedef std::vector<std::string> VS;
VS readData(const char* filename);

int myread()
{
    ...
    VS stringVector = readData("myfile");
    ...
}

Here also we wonder if there is a way to be certain that an intermediate variable will not be constructed, copied, and then destroyed.

Even compilers preceding C++11 may have eliminated the copy construction through Return Value Optimization, but that optimization was permitted, not required. Can we guarantee the outcome?

A few observations

Observation 1: Nontrivial copy is needed

The matrix may be implemented using the following declaration.

template<typename NumberType,
         size_t ncols,
         size_t nrows>
class Matrix
{
    NumberType *data;
    ...
};

We are particularly concerned about the price of copying if large objects such as matrices are involved, but we are also concerned if small temporary objects are destroyed frequently.
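To make the cost concrete, the following is a minimal sketch of what a deep copy involves for a class shaped like the declaration above. The constructors, the accessors at() and raw(), and the helper copyIsIndependent() are illustrative assumptions, not part of the original declaration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical, simplified Matrix that owns ncols * nrows elements on the heap.
template<typename NumberType, std::size_t ncols, std::size_t nrows>
class Matrix
{
public:
    Matrix() : data(new NumberType[ncols * nrows]()) {}

    // Deep copy: allocate a fresh buffer, then copy every element.
    Matrix(const Matrix& that) : data(new NumberType[ncols * nrows])
    {
        std::copy(that.data, that.data + ncols * nrows, data);
    }

    // Copy assignment: dimensions are fixed, so the elements are copied in place.
    Matrix& operator=(const Matrix& that)
    {
        if (this != &that)
            std::copy(that.data, that.data + ncols * nrows, data);
        return *this;
    }

    ~Matrix() { delete[] data; }

    NumberType& at(std::size_t r, std::size_t c) { return data[r * ncols + c]; }
    const NumberType* raw() const { return data; }

private:
    NumberType *data;
};

// Returns true if a copy owns its own buffer, independent of the original.
inline bool copyIsIndependent()
{
    Matrix<float, 3, 3> a;
    a.at(1, 1) = 5.0f;
    Matrix<float, 3, 3> b = a;      // deep copy: one allocation plus nine element copies
    b.at(1, 1) = 7.0f;              // must not affect a
    assert(a.raw() != b.raw());     // two distinct heap buffers
    return a.at(1, 1) == 5.0f && b.at(1, 1) == 7.0f;
}
```

Every copy pays for an allocation and a pass over all the elements, which is the price the rest of this article tries to avoid for objects that are about to be destroyed anyway.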

Observation 2: Spare the contents of objects heading towards destruction

In the code

{
    Matrix L = ...;
    Matrix M = ...;
    Matrix N = L + M;
    ...
}

we’d like the L + M temporary to be salvaged to construct N.

Observation 3: Named vs. nameless objects

In the following code, L is the name of a matrix object. M is the name of the pointer to a matrix object. The matrix object itself is nameless. Temporaries such as L + (*M) are always nameless.

Matrix getMatrix()
{
    Matrix L = ...;        // L: named
    Matrix *M  = new Matrix(...);
                           // *M: nameless
    Matrix P = L + (*M);   // L + (*M): nameless
    Matrix Q = L - (*M);   // L - (*M): nameless
    return P;
}

We’d like to be able to salvage named and nameless objects alike.

Some potential solutions

Backwards, nonidiomatic solution 1

We could choose to write an add function for the operation and ensure that it is implemented optimally.

class Matrix
{
    ...
    friend void add(const Matrix& left,
                    const Matrix& right,
                    Matrix& result);
};

{
    Matrix L = ...;
    Matrix M = ...;
    Matrix N = ...;
    add(L, M, N);
}

But this approach has serious drawbacks:

We abandon constness, immutability, and operator overloading.
The constructor of the result object would already have been called in any case.
The code is nonidiomatic.

Terrible, leak-prone solution 2

To avoid an unnecessary construction, we could let the add() function create the object.

class Matrix
{
    ...
    friend void add(const Matrix& left,
                    const Matrix& right,
                    Matrix *&result);  // add() allocates the result with new
};

{
    Matrix L = ...;
    Matrix M = ...;
    Matrix *N = 0;
    add(L, M, N);
    /*with luck*/ delete N;
}

Doing so is a recipe for memory leaks.

Solution 3: Devise a new type to return

The solution is instead to create a new type that operator+() can return. Let’s call that type (Sacrificial)Matrix.

class Matrix { ... };

(Sacrificial)Matrix
operator+(const Matrix& left,
          const Matrix& right);

Here we define a new token Sacrificial (in the same vein as const). Sacrificial specifies that a returned object is heading towards destruction, and that its contents can be salvaged.

Returned objects are in any case sacrificial

Observe that we do not need to explicitly specify that a given returned object is sacrificial.

int f();

{
    f();
}
//--------------------------------
class Matrix;

Matrix getMatrix();

{
    getMatrix() = Matrix(...);
}

But also notice that:

It’s all right to ignore returned objects.
It’s also all right to assign to the returned object.
Rvalues of class type may appear on the left-hand side of an assignment.

Objects returned by value are temporaries; they are heading to the destructor. Indeed, for a class type nothing stops us from writing getMatrix() = ….

Solution 4: Devise a new reference type for parameters

Even though there is no need to specify the (Sacrificial) property for returned objects, we do need to be explicit about specifying it for the parameters.

class Matrix
{
    Matrix();
    Matrix((Sacrificial)Matrix M);
    Matrix& operator=((Sacrificial)Matrix M);
};

The new (Sacrificial) parameter type is to be used to define a sacrificial constructor and a sacrificial assignment operator. In both cases it signals that the contents of the object being passed may be taken over for the construction or assignment.

Sacrificial Constructor

class Matrix
{
public:
    ...
    Matrix((Sacrificial)Matrix that)
        : data(that.data)
    {
        that.data = 0;
        // Also set nrows & ncols.
    }
private:
    float *data;
    // ...
};

Because the “that” argument is known to be sacrificial, a shallow copy is now exactly what we need!
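The sacrificial constructor can be tried out today. The sketch below assumes a hypothetical alias template Sacrificial so that the notation used so far compiles as C++11; the two-argument constructor, the accessors, and the helper bufferChangesOwner() are likewise illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical alias so this article's notation compiles as C++11.
// T&& is the language's own spelling, which the article introduces later.
template<typename T> using Sacrificial = T&&;

class Matrix
{
public:
    Matrix(std::size_t rows, std::size_t cols)
        : data(new float[rows * cols]()), nrows(rows), ncols(cols) {}

    // Sacrificial constructor: take over the buffer and leave `that` empty.
    Matrix(Sacrificial<Matrix> that)
        : data(that.data), nrows(that.nrows), ncols(that.ncols)
    {
        that.data = nullptr;   // that's destructor now frees nothing
        that.nrows = 0;
        that.ncols = 0;
    }

    ~Matrix() { delete[] data; }

    const float* raw() const { return data; }
    std::size_t rows() const { return nrows; }

private:
    float *data;
    std::size_t nrows, ncols;
};

// Returns true if the heap buffer moved from source to target without a copy.
inline bool bufferChangesOwner()
{
    Matrix source(2, 3);
    const float *buffer = source.raw();
    // The cast marks source as sacrificial; no element is copied.
    Matrix target(static_cast<Sacrificial<Matrix>>(source));
    assert(source.raw() == nullptr);
    return target.raw() == buffer && target.rows() == 2 && source.rows() == 0;
}
```

Resetting that.data matters: both objects are still destroyed, and without the reset the same buffer would be freed twice.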

Overload the Constructor

class Matrix
{
    ...
    Matrix(const Matrix& that);
    Matrix((Sacrificial)Matrix that);
};

Matrix operator+(const Matrix& A,
                 const Matrix& B);

{
    ...
    Matrix N = L + M;
}

After overloading the constructor, the temporary L + M will match (Sacrificial)Matrix in preference to const Matrix&.
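The choice between the two overloads can be observed directly. The sketch below uses a pair of ordinary functions named chosen() in place of the two constructors, because a compiler may elide the constructor call altogether when N is initialized from L + M. The Sacrificial alias and all function names here are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical alias so this article's notation compiles as C++11.
template<typename T> using Sacrificial = T&&;

struct Matrix { };   // contents are irrelevant to overload resolution

inline Matrix operator+(const Matrix&, const Matrix&) { return Matrix(); }

// Two overloads standing in for the copy and the sacrificial constructor.
inline std::string chosen(const Matrix&)       { return "copy"; }
inline std::string chosen(Sacrificial<Matrix>) { return "sacrificial"; }

// A named object can only bind to const Matrix&.
inline std::string forNamed()
{
    Matrix L;
    return chosen(L);
}

// A temporary can bind to both; the sacrificial overload is the better match.
inline std::string forTemporary()
{
    Matrix L, M;
    return chosen(L + M);
}

inline void checkOverloads()
{
    assert(forNamed() == "copy");
    assert(forTemporary() == "sacrificial");
}
```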

Presume a large heap and a small stack

class MyType
{
    float *heapData; // large
    float stackData[(size_t) 1e6];
    MyType(const MyType& that);
    MyType((Sacrificial)MyType that);
};

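A sacrificial constructor can take over heapData by copying a single pointer, but stackData lives inside the object and must still be copied element by element.
Stack consumption should remain nimble.
Classes should be designed to facilitate their sacrifice.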
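The difference between the two members shows up as soon as the sacrificial constructor is written out. In this sketch the in-object array is deliberately tiny, and the sizes, the Sacrificial alias, and the helper onlyHeapPartIsStolen() are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical alias so this article's notation compiles as C++11.
template<typename T> using Sacrificial = T&&;

class MyType
{
public:
    static constexpr std::size_t stackCount = 4;   // kept small on purpose

    MyType() : heapData(new float[1000]())
    {
        std::fill(stackData, stackData + stackCount, 1.0f);
    }

    MyType(Sacrificial<MyType> that) : heapData(that.heapData)
    {
        that.heapData = nullptr;   // constant time: the pointer changes owner
        // Linear time: in-object storage cannot be stolen, only copied.
        std::copy(that.stackData, that.stackData + stackCount, stackData);
    }

    ~MyType() { delete[] heapData; }

    float *heapData;
    float stackData[stackCount];
};

// Returns true if only the heap buffer was taken over from the source.
inline bool onlyHeapPartIsStolen()
{
    MyType a;
    float *buffer = a.heapData;
    MyType b(static_cast<Sacrificial<MyType>>(a));
    assert(a.heapData == nullptr && b.heapData == buffer);
    // Both objects still hold their own copy of the in-object array.
    return a.stackData[0] == 1.0f && b.stackData[0] == 1.0f;
}
```

With a million-element in-object array, as in the declaration above, the sacrificial constructor would save nothing on that member.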

Amnesty For Named Sacrificials

void f1(const Matrix& M);
void f1((Sacrificial)Matrix M);

void f2((Sacrificial)Matrix M)
{
    f1(M); // M is named and must be spared...
    f1(M); // ...to remain intact here.
}

Amnesty must be provided for named sacrificials. If the first call f1(M) resolved to f1((Sacrificial)Matrix M), it could strip M of its contents, and the second call would silently operate on an emptied object.

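Hence the overload

void f1((Sacrificial)Matrix M);

is selected only for nameless arguments. Which f1 is called depends on the argument: a named object, even one declared (Sacrificial), selects f1(const Matrix&).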

Permit overriding the default amnesty

What if we want to insist that an object is no longer needed and that its contents can be safely salvaged? We define a function std::makeSacrificial that casts an object to make it sacrificial.

void f1(const Matrix& M);
void f1((Sacrificial)Matrix M);

void f2((Sacrificial)Matrix M)
{
    f1(M);
    f1(std::makeSacrificial(M));
}
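Both the default amnesty and its override can be observed by recording which f1 overload each call selects. In the sketch below makeSacrificial is a hypothetical, simplified stand-in that accepts only named objects and does nothing but cast; the Sacrificial alias and the returned strings are also assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical alias so this article's notation compiles as C++11.
template<typename T> using Sacrificial = T&&;

// Hypothetical, simplified makeSacrificial: a cast and nothing more.
// It generates no code that touches the object itself.
template<typename T>
Sacrificial<T> makeSacrificial(T& object) noexcept
{
    return static_cast<Sacrificial<T>>(object);
}

struct Matrix { };

inline std::string f1(const Matrix&)       { return "spared"; }
inline std::string f1(Sacrificial<Matrix>) { return "sacrificed"; }

// Returns the overloads selected by the two calls, in order.
inline std::string f2(Sacrificial<Matrix> M)
{
    std::string first  = f1(M);                   // M is named: amnesty applies
    std::string second = f1(makeSacrificial(M));  // amnesty explicitly overridden
    return first + "," + second;
}

inline std::string demoAmnesty()
{
    Matrix m;
    assert(f1(m) == "spared");       // an ordinary named object is always spared
    return f2(makeSacrificial(m));   // f2 accepts only sacrificial arguments
}
```

The cast is a promise by the caller: after f1(makeSacrificial(M)) returns, M should be assumed to hold no useful contents.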

Use of `makeSacrificial` in `swap`

Sacrificial objects are useful in a swap function, where a deep copy is wasteful and a shallow copy is what is needed.

template<typename T>
void swap(T & a, T & b)
{
    T tmp = a;
    a = b;
    b = tmp;
}

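As written, this swap makes three deep copies. Wrapping each source in makeSacrificial specifies that its current contents will no longer be needed, so each step can become a shallow transfer.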

Use a brief sign as the `(Sacrificial)` qualifier

Replace (Sacrificial)Matrix with Matrix&& and std::makeSacrificial(..) with std::move(..).

class Matrix {
    Matrix(const Matrix& that);
    Matrix(Matrix&& that);
};
template<typename T>
void swap(T & a, T & b)
{
    T tmp = std::move(a);
    a = std::move(b);
    b = std::move(tmp);
}

The actual token for (Sacrificial) is && (pronounced “ref ref”), and Matrix&& is an rvalue reference.
Matrix::Matrix(Matrix&& that) is a “move constructor”.
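The saving in swap can be counted. The sketch below assumes a hypothetical Probe type whose copy and move operations merely increment counters, and it names the two variants swapByCopy and swapByMove to keep them apart from std::swap.

```cpp
#include <cassert>
#include <utility>

struct Counts { int copies = 0; int moves = 0; };

// Hypothetical instrumented type: each special member records that it ran.
struct Probe
{
    explicit Probe(Counts& c) : counts(&c) {}
    Probe(const Probe& that) : counts(that.counts) { ++counts->copies; }
    Probe(Probe&& that) noexcept : counts(that.counts) { ++counts->moves; }
    Probe& operator=(const Probe& that)
    {
        counts = that.counts; ++counts->copies; return *this;
    }
    Probe& operator=(Probe&& that) noexcept
    {
        counts = that.counts; ++counts->moves; return *this;
    }
    Counts *counts;
};

template<typename T>
void swapByCopy(T& a, T& b)
{
    T tmp = a;   // copy construction
    a = b;       // copy assignment
    b = tmp;     // copy assignment
}

template<typename T>
void swapByMove(T& a, T& b)
{
    T tmp = std::move(a);   // move construction
    a = std::move(b);       // move assignment
    b = std::move(tmp);     // move assignment
}

// Runs one swap and reports how many copies and moves it performed.
inline Counts countFor(bool useMove)
{
    Counts counts;
    Probe a(counts), b(counts);
    if (useMove) swapByMove(a, b); else swapByCopy(a, b);
    assert(counts.copies + counts.moves == 3);   // three transfers either way
    return counts;
}
```

Both variants perform three transfers; the difference is that for a type such as Matrix each move costs a pointer assignment rather than a full allocation and element-wise copy.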

Expression Classification

Figure 1. Expression types

The expression types are:

lvalues (“left” values): Matrix& getMatrix();
xvalues (eXpiring values): Matrix&& getMatrix();, std::move(…);
prvalues (pure rvalues): Matrix getMatrix();, temporaries, literals other than string literals (prvalues in C++11 correspond to rvalues in C++98/03)
glvalues (“generalized” lvalues): lvalues ∪ xvalues
rvalues (“right” values): prvalues ∪ xvalues
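The three primary categories can be checked at compile time, because decltype applied to a parenthesized expression yields T& for an lvalue, T&& for an xvalue, and plain T for a prvalue. The Category template and the CATEGORY_OF macro below are hypothetical helpers introduced only for this sketch.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

struct Matrix { };

// Declarations only: they appear in unevaluated contexts and are never called.
Matrix&  byLvalueRef();
Matrix&& byRvalueRef();
Matrix   byValue();

static_assert(std::is_lvalue_reference<decltype((byLvalueRef()))>::value, "lvalue");
static_assert(std::is_rvalue_reference<decltype((byRvalueRef()))>::value, "xvalue");
static_assert(!std::is_reference<decltype((byValue()))>::value, "prvalue");

// Maps the type produced by decltype((expr)) to the name of the category.
template<typename T> struct Category      { static std::string name() { return "prvalue"; } };
template<typename T> struct Category<T&>  { static std::string name() { return "lvalue"; } };
template<typename T> struct Category<T&&> { static std::string name() { return "xvalue"; } };

#define CATEGORY_OF(expr) (Category<decltype((expr))>::name())

inline std::string categoryOfNamed()     { Matrix m; return CATEGORY_OF(m); }
inline std::string categoryOfMoved()     { Matrix m; return CATEGORY_OF(std::move(m)); }
inline std::string categoryOfTemporary() { return CATEGORY_OF(Matrix()); }

inline void checkCategories()
{
    assert(categoryOfNamed() == "lvalue");
    assert(categoryOfMoved() == "xvalue");
    assert(categoryOfTemporary() == "prvalue");
}
```

The two mixed categories follow from these: an expression is a glvalue if it is an lvalue or an xvalue, and an rvalue if it is a prvalue or an xvalue.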

