Rust functions are surprisingly diverse, sitting at the intersection of multiple language features which may take time to understand. In this post, we’ll walk through those features and explain how they appear in function signatures, so you can be well-equipped to understand functions you see in the wild, or identify the best way to write the functions you need in your own code.
Describing, not Recommending
This is a survey of what function signatures can look like in Rust, not a commentary on what they should look like. Any one of the patterns shown here may be seen in the wild, and learning to read other people’s code in any language is a valuable skill.
Part of a Series
This is the first of a pair of posts describing how to read Rust function signatures. Part 2, tackling generic functions, is currently in the works.
First things first, we need a function signature. Let’s start with something basic.
This is a function that takes in two 32-bit integers (the `i32` type), and returns a 32-bit integer as well. The arrow (`->`) indicates the return type.
Note that the types are all explicit. Rust does support type inference, but not for function signatures. All parameter and return types must be specified.
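A signature matching that description might look like the following minimal sketch. The name `add` and the parameter names are placeholders chosen for illustration.

```rust
// Two `i32` parameters, each written as `<pattern>: <type>`,
// followed by `->` and the return type.
fn add(x: i32, y: i32) -> i32 {
    x + y
}

fn main() {
    println!("{}", add(2, 3));
}
```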
We can also see that the pattern comes before the type, separated by a colon. This `<pattern>: <type>` syntax matches Rust’s general syntax for type ascription, which is the Rust term for specifying types explicitly.
Notice too that I keep saying “pattern” instead of “variable name.” That’s because Rust parameters are patterns, meaning you can destructure and bind against the internal structure of the types.
In this example, the `IpV4Address` type is destructured in the function signature, binding four variables (`o1`, `o2`, `o3`, and `o4`) to the respective items in the tuple struct. These variables are then used in the function body to print the address.
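A sketch of what such a function could look like follows. The field types (`u8`) and the function name `format_address` are assumptions, and the sketch builds a `String` which `main` then prints, so the result is easy to check.

```rust
// A tuple struct with four octets. The `u8` field type is an assumption.
struct IpV4Address(u8, u8, u8, u8);

// The parameter is a pattern: it destructures the tuple struct
// and binds each field to its own variable.
fn format_address(IpV4Address(o1, o2, o3, o4): IpV4Address) -> String {
    format!("{}.{}.{}.{}", o1, o2, o3, o4)
}

fn main() {
    let localhost = IpV4Address(127, 0, 0, 1);
    println!("{}", format_address(localhost));
}
```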
One restriction of pattern matching in function signatures is that you can only use irrefutable patterns, meaning patterns that always match. By contrast, refutable patterns may sometimes fail to match, perhaps because they specify only a single variant of an `enum` with multiple variants. Refutable patterns can be used in a `match` expression or equivalent construct, where the collection of patterns is checked for exhaustiveness (meaning all values are guaranteed to match at least one pattern), but patterns in function signatures are all alone, so they must be irrefutable.
Here’s a table to help explain:
A survey of pattern refutability.
The first three patterns could be found in a function signature. The last two could not.
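To make the distinction concrete, here is a small sketch with hypothetical types and function names. The refutable case is left commented out because the compiler rejects it.

```rust
struct Point {
    x: i32,
    y: i32,
}

// Irrefutable: a struct pattern always matches a value of that struct type.
fn sum_point(Point { x, y }: Point) -> i32 {
    x + y
}

// Irrefutable: a tuple pattern, with `_` ignoring the second element.
fn first((a, _): (i32, i32)) -> i32 {
    a
}

// Refutable: `Some(n)` does not match `None`, so this signature is rejected.
// fn unwrap_it(Some(n): Option<i32>) -> i32 { n }

// The refutable pattern belongs in a `match`, which is checked for exhaustiveness.
fn unwrap_or_zero(value: Option<i32>) -> i32 {
    match value {
        Some(n) => n,
        None => 0,
    }
}

fn main() {
    println!("{}", sum_point(Point { x: 1, y: 2 }));
    println!("{}", first((4, 5)));
    println!("{}", unwrap_or_zero(None));
}
```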
Another feature of declaring and assigning to new variables in a function signature pattern is that you can create a variable with a name different from the name of the field in the relevant type.
This can be useful when you want to locally use a different name than the name of the field, and do so in a single line rather than binding to a local variable with a new name in the body of the function.
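As an illustration, the sketch below renames a field while destructuring. The `Config` type and its field are hypothetical.

```rust
struct Config {
    verbosity_level: u8,
}

// `verbosity_level: level` binds the field's value to a new local name, `level`.
fn is_noisy(Config { verbosity_level: level }: Config) -> bool {
    level > 2
}

fn main() {
    println!("{}", is_noisy(Config { verbosity_level: 3 }));
}
```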
As we use patterns to bind, we can also set the kind of binding, which may be the default (with no specifier), `mut`, `ref`, or `ref mut`. Collectively, the binding options are as follows:
Binding syntax and the resulting types
The binding pattern specifies how a binding should occur: by-value (meaning in Rust that it either takes ownership of the bound value, or makes a copy of it, depending on whether the type of that value implements the `Copy` trait), by-reference, or by-mutable-reference.
An owning binding may display one of two behaviors, depending on the type involved in the binding. If the type implements the `Copy` trait, then it will be copied to the new owner at the binding site. If the type does not implement `Copy`, then it will be moved from the prior owner to the new owner at the binding site.
To understand this, let’s talk about `Copy`. `Copy` is a trait indicating a type is “trivially copyable,” meaning it can be copied with only a call to `memcpy`, so all the data contained in the structure is contiguous; there are no pointers to chase. `Copy` tells us that copying a piece of data is fast.
At the same time, it improves ergonomics for certain types which may otherwise be tedious to use under Rust’s ownership semantics. Imagine if number types (which all implement `Copy`) were moved any time they were assigned. Something as simple as `x = y` would invalidate `y`, and thus make mathematical code much more frustrating to write.
`Copy` pulls double-duty. It tells us something is cheap to copy, and it permits that copying to be implicit.
In contexts where a type doesn’t implement `Copy` but does implement the `Clone` trait, you can instead call `.clone()` on it explicitly to create a duplicate which will be moved into the new owner, without invalidating the prior owner.
Now, what does this look like in the context of a binding?
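The sketch below shows all three behaviors at an owning parameter binding. `Meters` and `Label` are hypothetical types: the first derives `Copy`, the second only `Clone`.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(f64);

#[derive(Debug, Clone, PartialEq)]
struct Label(String);

// Owning bindings: each function takes its argument by value.
fn take_meters(m: Meters) -> Meters {
    m
}

fn take_label(l: Label) -> Label {
    l
}

fn main() {
    let distance = Meters(1.5);
    let copied = take_meters(distance); // `Meters` is `Copy`, so this copies
    assert_eq!(distance, copied); // `distance` is still usable

    let name = Label(String::from("north"));
    let cloned = take_label(name.clone()); // explicit clone, `name` stays valid
    let moved = take_label(name); // `Label` is not `Copy`, so this moves
    // println!("{:?}", name); // rejected: `name` was moved on the line above
    assert_eq!(cloned, moved);
}
```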
Sometimes this kind of binding is exactly what you want. For example, you may have a need for a “consuming builder,” one of two forms the Builder Pattern can take in Rust. In a consuming builder, the builder type passes ownership of some data to the type that it’s building, because that type will need ownership of the data to operate.
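Here is a minimal sketch of a consuming builder, with hypothetical `Request` and `RequestBuilder` types. Every step takes the builder by value, so the final `build` call can move the `String` into the finished value without cloning it.

```rust
#[derive(Debug, PartialEq)]
struct Request {
    url: String,
    retries: u32,
}

struct RequestBuilder {
    url: String,
    retries: u32,
}

impl RequestBuilder {
    fn new(url: String) -> Self {
        RequestBuilder { url, retries: 0 }
    }

    // Takes the builder by value and returns it, so calls can be chained.
    fn retries(self, retries: u32) -> Self {
        RequestBuilder { retries, ..self }
    }

    // Consumes the builder; `url` is moved into the `Request`, not copied.
    fn build(self) -> Request {
        Request {
            url: self.url,
            retries: self.retries,
        }
    }
}

fn main() {
    let request = RequestBuilder::new(String::from("https://example.com"))
        .retries(3)
        .build();
    println!("{:?}", request);
}
```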
If you take ownership of a piece of data with an owning binding and want to return ownership to the calling context, you can return it from the function. For example, the popular `once_cell` crate features a type which can only be written to once. If you try to write to it again, it returns ownership of the value you attempted to set.
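The idea can be sketched with a hand-rolled stand-in. `WriteOnce` below is a hypothetical type written for this illustration, not the `once_cell` crate’s implementation; it only demonstrates handing ownership back through the return value.

```rust
// Hypothetical write-once container, for illustration only.
struct WriteOnce<T> {
    slot: Option<T>,
}

impl<T> WriteOnce<T> {
    fn new() -> Self {
        WriteOnce { slot: None }
    }

    // `value` is taken by value. If the slot is already full,
    // ownership of `value` goes back to the caller inside `Err`.
    fn set(&mut self, value: T) -> Result<(), T> {
        if self.slot.is_some() {
            return Err(value);
        }
        self.slot = Some(value);
        Ok(())
    }
}

fn main() {
    let mut cell = WriteOnce::new();
    assert_eq!(cell.set(String::from("first")), Ok(()));
    // The second write fails, and the rejected value comes back to us.
    let rejected = cell.set(String::from("second")).unwrap_err();
    println!("got back: {}", rejected);
}
```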
Mutable Owning Binding
Sometimes, in addition to taking ownership of a piece of data, you’d like for that data to be mutable from the start as well. In that case, you can use a mutable owning binding.
The use of a mutable owning binding can always be replaced with an immutable owning binding followed by a mutable rebinding to a variable of the same name in the body of the function (shadowing the parameter from that line onward). The choice is one of taste.
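The two equivalent forms look like this in a small sketch (the function names are placeholders):

```rust
// Mutable owning binding: `text` is owned and mutable from the start.
fn push_exclaim(mut text: String) -> String {
    text.push('!');
    text
}

// Immutable owning binding, then a mutable rebinding that shadows the parameter.
fn push_exclaim_shadowed(text: String) -> String {
    let mut text = text;
    text.push('!');
    text
}

fn main() {
    println!("{}", push_exclaim(String::from("hi")));
    println!("{}", push_exclaim_shadowed(String::from("hi")));
}
```

From the caller’s side the two signatures are indistinguishable: both take a `String` by value.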
Reference bindings inside function signatures in Rust can seem a little unusual, but they are permitted. The idea is that the binding performed is a reference to the value. If the value was passed in by value, then it’s either moved or copied as discussed in the owning binding section, and in the body of the function the bound name is a reference to the post-move data (if the type is `Copy`, the difference doesn’t amount to much). This is different from an owning binding of a reference type, both for the caller of the function and inside the function itself (`x: &Number` is not the same as `ref x: Number`).
Reference bindings are more useful in the presence of a reference type, along with destructuring. In that case, they offer a convenient way to bind by reference the internal fields of a type which has been passed by reference.
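A sketch of that destructuring case, using a hypothetical `Pair` type: the `&Pair { .. }` pattern looks through the reference, and `ref` borrows each field rather than trying to move it out of borrowed data.

```rust
struct Pair {
    left: String,
    right: String,
}

// The parameter type is `&Pair`. Inside the body, `left` and `right`
// are both `&String`, borrowed from the caller's `Pair`.
fn total_len(&Pair { ref left, ref right }: &Pair) -> usize {
    left.len() + right.len()
}

fn main() {
    let pair = Pair {
        left: String::from("ab"),
        right: String::from("cde"),
    };
    println!("{}", total_len(&pair));
    // The caller still owns `pair` after the call.
    println!("{} {}", pair.left, pair.right);
}
```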
Mutable Reference Binding
Mutable reference bindings are similar to the above examples for immutable reference bindings, except they’re mutable.
Same as the other reference bindings, they may be considered surprising when used in the presence of a type passed by-value. When working instead with a type passed by reference, there is one additional thing to consider: you can’t get a mutable reference out of a value passed by immutable reference.
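The sketch below shows a mutable reference binding used with a type passed by mutable reference. `Counter` is a hypothetical type, and the rejected variant is left as a comment.

```rust
struct Counter {
    hits: u32,
    label: String,
}

// The parameter type is `&mut Counter`; `ref mut hits` binds `hits` as `&mut u32`.
fn bump(&mut Counter { ref mut hits, .. }: &mut Counter) {
    *hits += 1;
}

// Rejected: a mutable borrow cannot be taken through an immutable reference.
// fn bump_shared(&Counter { ref mut hits, .. }: &Counter) { *hits += 1; }

fn main() {
    let mut counter = Counter {
        hits: 0,
        label: String::from("page views"),
    };
    bump(&mut counter);
    bump(&mut counter);
    println!("{}: {}", counter.label, counter.hits);
}
```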
Binding vs. Type
Note as well that these bindings are relative to the type on the right hand side of the ascriptive clause. To explain, let’s see some examples, annotated with the resulting types.
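The following sketch pairs each binding with a type and notes the type of `x` inside the function body in a comment. `Number` and the function names are hypothetical.

```rust
struct Number(i32);

// x: Number (the argument is moved in)
fn by_value(x: Number) -> i32 {
    x.0
}

// x: Number, and mutable inside the body
fn by_value_mut(mut x: Number) -> i32 {
    x.0 += 1;
    x.0
}

// x: &Number (the argument is still moved in by the caller)
fn ref_to_value(ref x: Number) -> i32 {
    x.0
}

// x: &mut Number (the argument is still moved in by the caller)
fn ref_mut_to_value(ref mut x: Number) -> i32 {
    x.0 += 1;
    x.0
}

// x: &Number (the caller keeps ownership)
fn shared_ref(x: &Number) -> i32 {
    x.0
}

// x: &mut Number (the caller keeps ownership and sees the change)
fn mut_ref(x: &mut Number) {
    x.0 += 1;
}

// x: &&Number (a reference binding on top of a reference type)
fn ref_to_shared_ref(ref x: &Number) -> i32 {
    x.0
}

fn main() {
    let mut n = Number(1);
    mut_ref(&mut n);
    println!("{} {}", shared_ref(&n), ref_to_shared_ref(&n));
    println!("{} {}", by_value(Number(1)), by_value_mut(Number(1)));
    println!("{} {}", ref_to_value(Number(1)), ref_mut_to_value(Number(1)));
}
```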
Associated functions are functions which are “associated” with a type, meaning they live under the namespace of that type. Otherwise, they behave like normal functions.
Constructors, which usually return the associated type (called `Self`, with an uppercase “S”) or some wrapper of it (like `Result<Self, SomeErrorType>` or `Option<Self>`), are usually written as associated functions. It would be perfectly valid, for a type `Foo`, to write a constructor as a free function (meaning not associated with the type):
However, doing this isn’t ideal Rust style. Instead, you’d use an associated function, like so:
`Foo::new` has access to the `Self` type (which is most convenient for complex types), and is called as a path starting at the name of the type.
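Side by side, the two styles might look like this sketch. The field `value` and the free function name `new_foo` are placeholders.

```rust
struct Foo {
    value: i32,
}

// A free-function constructor: valid, but not the conventional style.
fn new_foo(value: i32) -> Foo {
    Foo { value }
}

impl Foo {
    // An associated function: there is no `self` parameter,
    // and `Self` stands in for `Foo`.
    fn new(value: i32) -> Self {
        Self { value }
    }
}

fn main() {
    let a = new_foo(1);
    let b = Foo::new(2); // called as a path starting at the type name
    println!("{} {}", a.value, b.value);
}
```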
Deref Collision & Smart Pointers
Another context where associated functions are commonly written is for smart pointers, which are types which wrap another type while still being usable as if they were the original type. The most common smart pointer types in Rust are `Box`, `Rc`, and `Arc`, and they all rely on a special trait called `Deref`.
`Deref` enables a feature in Rust called deref coercion, which is used whenever a method call is made. Rust, at compile time, checks if the method is defined on the type it’s being called with, and on whatever type may be returned by that type’s `Deref` or `DerefMut` implementations (depending on the mutability of `self` in the method being checked), doing so for however many layers of deref-ing are available. This is what makes smart pointers easy to use in place of the original type!
However, because of deref coercion, defining methods on the smart pointer may make it difficult to call any methods on the contained type which have the same name. To avoid this collision, methods on smart pointer types are often defined as associated functions instead. The `Rc` type has multiple examples of this, with functions like `Rc::strong_count` (which returns the number of strong pointers to the underlying data currently live), being defined as associated functions.
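To see why this matters, consider a contained type that happens to define its own `strong_count` method. `Tally` and `demo` below are hypothetical; `Rc::strong_count` and `Rc::clone` are the standard library functions.

```rust
use std::rc::Rc;

struct Tally;

impl Tally {
    // A method on the inner type that shares a name with `Rc::strong_count`.
    fn strong_count(&self) -> usize {
        0
    }
}

// Returns (the pointer's strong count, the inner type's method result).
fn demo() -> (usize, usize) {
    let first = Rc::new(Tally);
    let _second = Rc::clone(&first);
    // Path syntax reaches the associated function on `Rc`.
    let pointer_count = Rc::strong_count(&first);
    // Method syntax derefs through the `Rc` and reaches `Tally`'s method.
    let inner_result = first.strong_count();
    (pointer_count, inner_result)
}

fn main() {
    println!("{:?}", demo());
}
```

Because `Rc::strong_count` takes no `self` receiver, method-call syntax never selects it, so the inner type’s method stays reachable.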
Next, let’s look at methods in Rust. Methods are functions which are attached to a type, meaning they take a parameter called `self`. These are distinct from associated functions syntactically by the presence of the “receiver.”
The receiver is `self`, and represents the specific datum of the type on which the method is being called. The receiver can have a number of possible types, three of which come with special shorthand syntax because they are the most common options.
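List of receiver types and syntactic sugar for them.

| Receiver type | Meaning | Shorthand |
|---|---|---|
| `self: Self` | Owned value | `self` |
| `self: &Self` | Reference | `&self` |
| `self: &mut Self` | Mutable reference | `&mut self` |
| `self: Box<Self>` | Owning pointer | None |
| `self: Rc<Self>` | Reference counted pointer | None |
| `self: Arc<Self>` | Thread-safe reference counted pointer | None |
| `self: Pin<&mut Self>` | Pinned mutable reference | None |
| … | Nested combinations of any of the above | None |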
Each of these has its own distinct meaning, and it’s worthwhile to discuss when and why you’d use each of them.
Taking ownership of `self` means that, unless you pass ownership out to a new owner, the object will be dropped at the end of the function, as its owner has gone out of scope. If the type implements the `Drop` trait, its `Drop::drop` implementation will be run to perform any deallocation or cleanup work necessary. Taking `self` by value is commonly used in situations like the Builder pattern, where you want to consume the builder and return whatever object it’s designed to build.
Taking `self` by reference means that `self` will be borrowed for the duration of the function call. Rust’s rules disallow simultaneous mutable and immutable borrows, so if a function takes `self` by reference, the caller will be unable to mutate the object until the function call ends.
Mutable Reference Receiver
Taking `self` by mutable reference means that `self` will be mutably borrowed for the duration of the function call. As always, Rust’s “aliasing XOR mutability” rule is in play.
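The three shorthand receivers can be seen together in one small sketch. `Stack` is a hypothetical type; each comment gives the longhand form of the receiver.

```rust
struct Stack {
    items: Vec<i32>,
}

impl Stack {
    // `&self` is shorthand for `self: &Self`: a shared borrow for the call.
    fn len(&self) -> usize {
        self.items.len()
    }

    // `&mut self` is shorthand for `self: &mut Self`: an exclusive borrow.
    fn push(&mut self, item: i32) {
        self.items.push(item);
    }

    // `self` is shorthand for `self: Self`: the stack is consumed.
    fn into_items(self) -> Vec<i32> {
        self.items
    }
}

fn main() {
    let mut stack = Stack { items: Vec::new() };
    stack.push(1);
    stack.push(2);
    println!("{}", stack.len());
    let items = stack.into_items();
    // `stack` can no longer be used here; it was moved into `into_items`.
    println!("{:?}", items);
}
```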
Whenever two or more pieces of data from a single struct are borrowed at the same time, Rust performs an analysis to see if the two borrows are disjoint borrows. For example, it is perfectly fine to immutably borrow one field of a struct, and mutably borrow another field of the same struct at the same time, as those two fields are different.
This analysis can be stymied by hiding a borrow behind a method. If one or more of the borrows happens within a method of the outer type, then the borrows are no longer seen as disjoint, because the method on the outer type would take `self` by some sort of reference. From the perspective of the borrow checker, with the introduction of a method call, all of `self` is now borrowed at the same time as one of its fields is borrowed, and unlike the original case, these borrows are not disjoint, and do not pass the borrow checker.
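A sketch of both situations follows, using a hypothetical `Inventory` type. The version that goes through a method is commented out because the borrow checker rejects it.

```rust
struct Inventory {
    names: Vec<String>,
    total: usize,
}

impl Inventory {
    // Borrows all of `self`, even though only `names` is returned.
    fn names(&self) -> &Vec<String> {
        &self.names
    }
}

// Accepted: the two borrows touch different fields, so they are disjoint.
fn count_direct(mut inv: Inventory) -> Inventory {
    let names = &inv.names; // immutable borrow of one field
    let total = &mut inv.total; // mutable borrow of another field
    *total = names.len();
    inv
}

// Rejected: `inv.names()` borrows the whole struct, which overlaps
// with the mutable borrow of `inv.total`.
// fn count_via_method(mut inv: Inventory) -> Inventory {
//     let names = inv.names();
//     let total = &mut inv.total;
//     *total = names.len();
//     inv
// }

fn main() {
    let inv = count_direct(Inventory {
        names: vec![String::from("bolt"), String::from("nut")],
        total: 0,
    });
    println!("{} of {}", inv.total, inv.names().len());
}
```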
The same problem arises with partial moves, where a field of a type is moved, but not the whole type. If a partial move is relocated to a method that takes ownership of `self`, then the move is no longer partial in the calling context, which may cause a compilation error. Dr. David Pearce has a more in-depth guide to partial moves which explains them nicely.
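The partial-move case can be sketched the same way, with a hypothetical `Message` type. Again, the rejected version is left as a comment.

```rust
struct Message {
    header: String,
    body: String,
}

impl Message {
    // Takes ownership of all of `self`, even though only `header` is returned.
    fn into_header(self) -> String {
        self.header
    }
}

// Accepted: each field is moved out separately (two partial moves).
fn split_direct(msg: Message) -> (String, String) {
    let header = msg.header;
    let body = msg.body;
    (header, body)
}

// Rejected: `into_header` moves the whole `msg`, so `msg.body` is gone.
// fn split_via_method(msg: Message) -> (String, String) {
//     let header = msg.into_header();
//     let body = msg.body;
//     (header, body)
// }

fn main() {
    let msg = Message {
        header: String::from("To: you"),
        body: String::from("hello"),
    };
    println!("{:?}", split_direct(msg));
    let other = Message {
        header: String::from("To: me"),
        body: String::from("bye"),
    };
    println!("{}", other.into_header());
}
```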
The remaining receiver types are less common, but no less important.
Owning Pointer Receiver
`Box<Self>` indicates that you’re taking ownership of a pointer to `self`. Most of the time this isn’t necessary, but one particular use case arises when working with unsized types. Rust requires (and many CPU architectures require) function parameters to have sizes known at compile-time; because of this, special care must be taken with the treatment of types without a known size. Slices are one example, because they are an arbitrarily-sized view into a memory location, and trait objects are another, because the size of the actual data is hidden when the concrete type is erased as part of trait object construction. When implementing a method for an unsized type, taking a parameter as `self: Self` is invalid, because `Self: !Sized` (this is the notation indicating that `Self` does not implement the `Sized` trait). However, taking it as `Box<Self>` is valid, because the size of the pointer is known at compile-time, and it’s now a pointer being passed instead of the underlying data.
Yandros, on the Rust User Forum, has a more thorough explanation of this subject, which I recommend reading for a deeper understanding.
As a summary: Rust generics are monomorphized, meaning the compiler generates individual copies of generic code for each unique set of types it’s called with. On most computer architectures, function calls require knowing the exact size of the parameters passed to them, and in the case of unsized types, that size is unknown. So monomorphizing generic code where `Self` doesn’t implement `Sized` doesn’t work in all cases, and is therefore rejected by the Rust compiler. Wrapping `Self` in a `Box` or other pointer makes the size known (it’s the size of a pointer type).
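As a sketch of the trait object case, the hypothetical `Task` trait below declares a method with a `Box<Self>` receiver, which lets a `Box<dyn Task>` be consumed even though the concrete type behind it has been erased.

```rust
trait Task {
    // The receiver is an owning pointer, so its size is always known.
    fn run(self: Box<Self>) -> String;
}

struct Greet {
    name: String,
}

impl Task for Greet {
    fn run(self: Box<Self>) -> String {
        // Moving out of the box yields the owned `Greet`.
        let greet = *self;
        format!("hello, {}", greet.name)
    }
}

// Each `Box<dyn Task>` is consumed by calling `run` on it.
fn run_all(tasks: Vec<Box<dyn Task>>) -> Vec<String> {
    tasks.into_iter().map(|task| task.run()).collect()
}

fn main() {
    let tasks: Vec<Box<dyn Task>> = vec![Box::new(Greet {
        name: String::from("Ferris"),
    })];
    println!("{:?}", run_all(tasks));
}
```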
Reference-Counted Pointer Receivers
The two other pointer types are provided for similar reasons. `Rc` and `Arc` are respectively the not-thread-safe and thread-safe versions of a reference-counted pointer, and they provide the same value as a receiver type that `Box` does, with the addition of permitting multiple pointers to exist to the same data.
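A minimal sketch of an `Rc<Self>` receiver follows. The `Node` type and its `share` method are hypothetical; the method is only callable on a value already held in an `Rc`, and it hands back two pointers to the same allocation.

```rust
use std::rc::Rc;

struct Node {
    name: String,
}

impl Node {
    // Callable only on an `Rc<Node>`. The method takes ownership of one
    // pointer and can create more pointers to the same data.
    fn share(self: Rc<Self>) -> (Rc<Self>, Rc<Self>) {
        let other = Rc::clone(&self);
        (self, other)
    }
}

fn main() {
    let node = Rc::new(Node {
        name: String::from("root"),
    });
    let (a, b) = node.share();
    println!("{} {} {}", a.name, b.name, Rc::strong_count(&a));
}
```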
`Pin` is a type which indicates the data pointed to by the pointer inside of it never moves in memory (unless that data implements the `Unpin` trait, in which case it may be safely moved even when inside of a `Pin`). A receiver of `Pin<&mut Self>` means that `self` is pinned, and may not move. The context you’re most likely to see this in Rust today is around the `Future` trait in the standard library, which defines a single method where the receiver type is `Pin<&mut Self>`. Explaining why futures need pinning is a more involved topic though, so I recommend reading the Rust Async Book if you’re interested, as it’s covered in great detail and care there.
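To show only the shape of the signature, here is a sketch of a trivial future that is ready immediately. `Ready`, `NoopWake`, and `poll_once` are hypothetical names; the traits and types come from the standard library. The sketch does not explain why pinning is needed, only where the `Pin<&mut Self>` receiver appears. As a side effect, the `Wake` trait’s `wake` method also shows an `Arc<Self>` receiver.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A future that completes immediately with its stored value.
struct Ready(u32);

impl Future for Ready {
    type Output = u32;

    // The one required method of `Future`, with a pinned mutable receiver.
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        Poll::Ready(self.0)
    }
}

// A waker that does nothing, needed only to build a `Context`.
struct NoopWake;

impl Wake for NoopWake {
    // Note the `Arc<Self>` receiver declared by the `Wake` trait.
    fn wake(self: Arc<Self>) {}
}

fn poll_once() -> Poll<u32> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Ready(7);
    // `Ready` is `Unpin`, so `Pin::new` can pin a plain mutable reference.
    Pin::new(&mut fut).poll(&mut cx)
}

fn main() {
    println!("{:?}", poll_once());
}
```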
Finally, you can nest any of these receiver types as well, so `self: Box<Box<Self>>` or `self: Rc<Box<Pin<&mut Self>>>` work as receiver types, although these are even less likely to be necessary than the un-nested versions we’ve just covered.
This covers the basics of reading Rust functions. After reading this post, you should hopefully have a better understanding of some of the following concepts:
- That the left-hand side of each parameter defined in a function signature is an irrefutable pattern which can feature destructuring, renaming, and one of four possible bindings.
- That the right-hand side of each parameter in a function signature is a type, which may be a reference or owning type, and that the selection of type interacts with the selection of binding to determine whether the actual parameter passed in the calling context is moved into the function, and whether the formal parameter inside the function is a reference or non-reference type.
- That associated functions may be used to put functions inside of the namespace of a particular type, signaling their association to that type, and may include constructors or (in the case of smart pointers) functions which would normally be written as methods, but are written as associated functions to avoid possible naming conflicts due to deref coercion.
- That methods may feature a number of different receiver types, which are selected based on the needs of the function and the future callers of it.
Part 2 will introduce generic functions, and cover topics including:
- Trait bounds, including complex bounds featuring multiple traits.
- Associated types, how they differ from generic types, and how they may be used in trait bounds.
- The “impl Trait” syntax, what it enables in return position, and its syntactic value in non-return position.
- Trait objects, what they are, when they can be created, and what restrictions exist around their use.
- Lifetimes, including the use of lifetime parameters and lifetime bounds.
- Subtype polymorphism, and how it appears and is used in Rust.
- Function parameter types, including function pointers, closures, and the use of Higher-Rank Trait Bounds.
I won’t set a date for when Part 2 will be done, but it is actively in the works. I also haven’t yet decided whether it will include coverage of const generics and specialization, both improvements to Rust’s type system which entail syntactic additions and could be covered in this post, but which are unfinished and unstable, and so may be premature to cover at this time.