JOHN STEEN

What's the fastest way to map a list of values in Python?

I was curious what the fastest way is to map a list of values in Python, so I decided to run a little experiment.

It turns out that when Elon finally gets to Mars, he decides to create a new temperature scale that better reflects the colder median temperature there.

It's measured in Musk degrees (°M).

We have a list of Mars temperature measurements in Fahrenheit, but need to convert all of the temps to Musk to perform a useful analysis for Elon.

The formula to convert Fahrenheit to Musk is:

°M = (°F + 32) * (9 / 7)

We are going to be dealing with a lot of data, so I decided to try several different ways to map our list from °F to °M and then compare their relative performance.

Test data

I randomly generated 1,000 integers in a range of values between -1,000 and 1,000, representing Fahrenheit temperatures.


import random

random_temps = [random.randint(-1000, 1000) for _ in range(1000)]
    

Conversion function

I abstracted the conversion logic into a helper function fahr_to_musk so that we can compare performance with and without function calls.

Using this function will increase readability, but we will have to see how it affects execution time.


def fahr_to_musk(f_temp):
    return (f_temp + 32) * (9 / 7)
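
As a quick sanity check of the formula, here is a minimal sketch that evaluates the helper at two points that are easy to verify by hand. The sample inputs (-32 and -25) are my own choices for illustration and are not part of the experiment.

```python
import math


def fahr_to_musk(f_temp):
    return (f_temp + 32) * (9 / 7)


# -32 °F is the zero point of the scale: (-32 + 32) * (9 / 7) = 0
zero_point = fahr_to_musk(-32)

# -25 °F works out to a round number: (-25 + 32) * (9 / 7) = 7 * 9 / 7 = 9
nine = fahr_to_musk(-25)

print(zero_point)
# Compare floats with a tolerance rather than ==, since 9 / 7 is not exact
print(math.isclose(nine, 9.0))
```

Note that the helper always returns a float, even for integer inputs, because 9 / 7 is a float division.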
    

Approach 1 - For loop

Our main goal is to take an input list of values, modify each element in the list with the same logic and then return a new list with the modified values.

Arguably, the simplest way to do this is by iterating through the list of values with a for loop, applying our conversion calculation to each one and then returning a new list of the modified values (in Musk):


def map_for_loop():
    mapped_temps = []

    for temp in random_temps:
        musk_temp = (temp + 32) * (9 / 7)
        mapped_temps.append(musk_temp)

    return mapped_temps
    

Approach 2 - For loop with function call

It seems like we could improve the readability of our for loop by abstracting the conversion calculation into the helper function fahr_to_musk.

This might raise a yellow flag for us since Python tends to have a relatively high overhead for function calls, but let's create the function and then include it in our performance testing to make sure.


def map_for_loop_fn_call():
    mapped_temps = []

    for temp in random_temps:
        musk_temp = fahr_to_musk(temp)
        mapped_temps.append(musk_temp)

    return mapped_temps
    

Approach 3 - Built-in map() function

Python's built-in map() function takes in an iterable and a function, applies the function to every element and then returns an iterator with all of the modified elements.

map(function, iterable, ...)

This seems like a natural choice for what we're wanting to do, so let's include it in our tests.

We're passing the function fahr_to_musk to map as the first argument and the list of Fahrenheit temperatures as the iterable.

Since map returns an iterator and we want a list, we'll need to create a list from the return iterator.


def map_built_in():
    mapped_temps = list(map(fahr_to_musk, random_temps))
    return mapped_temps
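
To illustrate why the list() call is needed, here is a small sketch of how the iterator returned by map() behaves. The three sample temperatures are hypothetical values chosen only for the demonstration.

```python
def fahr_to_musk(f_temp):
    return (f_temp + 32) * (9 / 7)


sample_temps = [-32, -25, 38]  # hypothetical sample values

# map() does no work yet; it returns a lazy iterator, not a list
lazy_result = map(fahr_to_musk, sample_temps)
is_list = isinstance(lazy_result, list)

# list() consumes the iterator and materializes the converted values
first_pass = list(lazy_result)

# The iterator is now exhausted, so a second pass yields nothing
second_pass = list(lazy_result)

print(is_list)          # False
print(len(first_pass))  # 3
print(second_pass)      # []
```

This is also why each timed run in the benchmark includes the cost of building the list: without list(), map() would return almost immediately and the comparison would not be fair.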
    

Approach 4 - Map built-in with lambda

In lieu of a named function, map() allows us to pass a lambda expression, which is a compact way to define an anonymous function inline.

It'll be interesting to see how map with lambdas compares to map with a named function call, so we include it as one of our approaches.


def map_built_in_lambda():
    mapped_temps = list(map(lambda t: (t + 32) * (9 / 7), random_temps))
    return mapped_temps
    

Approach 5 - List comprehension

List comprehensions are often an intuitive and readable way to create lists from other lists.

Similar to the map built-in, they apply specified operations to each element in a list or iterable and return the result as a new list.

In our first list comprehension, we specify the conversion operations inline as opposed to a function call:


def map_list_comprehension():
    mapped_temps = [((temp + 32) * (9 / 7)) for temp in random_temps]
    return mapped_temps
    

Approach 6 - List comprehension with function call

Finally, let's see what it looks like to simplify our conversion calculation in the list comprehension by replacing it with a function call:


def map_list_comprehension_fn_call():
    mapped_temps = [fahr_to_musk(temp) for temp in random_temps]
    return mapped_temps
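
Before timing anything, it seems worth confirming that all six approaches actually produce the same output. The following is a minimal sketch of such a check, with the six functions condensed slightly; the fixed random seed and the names approaches, expected, and disagreeing are my own additions for illustration.

```python
import random

random.seed(0)  # assumption: a fixed seed so the check is repeatable
random_temps = [random.randint(-1000, 1000) for _ in range(1000)]


def fahr_to_musk(f_temp):
    return (f_temp + 32) * (9 / 7)


def map_for_loop():
    mapped_temps = []
    for temp in random_temps:
        mapped_temps.append((temp + 32) * (9 / 7))
    return mapped_temps


def map_for_loop_fn_call():
    mapped_temps = []
    for temp in random_temps:
        mapped_temps.append(fahr_to_musk(temp))
    return mapped_temps


def map_built_in():
    return list(map(fahr_to_musk, random_temps))


def map_built_in_lambda():
    return list(map(lambda t: (t + 32) * (9 / 7), random_temps))


def map_list_comprehension():
    return [(temp + 32) * (9 / 7) for temp in random_temps]


def map_list_comprehension_fn_call():
    return [fahr_to_musk(temp) for temp in random_temps]


approaches = [
    map_for_loop,
    map_for_loop_fn_call,
    map_built_in,
    map_built_in_lambda,
    map_list_comprehension,
    map_list_comprehension_fn_call,
]

# Use the plain for loop as the reference and collect any approach that differs
expected = map_for_loop()
disagreeing = [fn.__name__ for fn in approaches if fn() != expected]

print(disagreeing or 'all six approaches agree')
```

Exact equality is safe here because every approach performs the same two floating-point operations in the same order on each element.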
    

Testing

Finally, it's time to test out our various map approaches.

We'll use Python's timeit library to run each map function one million times (the default number of runs for timeit.timeit) against the same list of 1,000 random integers.

Here is the full code that we will use:


import timeit
import random

random_temps = [random.randint(-1000, 1000) for _ in range(1000)]

def fahr_to_musk(f_temp):
    return (f_temp + 32) * (9 / 7)

def map_for_loop():
    mapped_temps = []

    for temp in random_temps:
        musk_temp = (temp + 32) * (9 / 7)
        mapped_temps.append(musk_temp)
    
    return mapped_temps

def map_for_loop_fn_call():
    mapped_temps = []

    for temp in random_temps:
        musk_temp = fahr_to_musk(temp)
        mapped_temps.append(musk_temp)
    
    return mapped_temps

def map_built_in():
    mapped_temps = list(map(fahr_to_musk, random_temps))
    return mapped_temps

def map_built_in_lambda():
    mapped_temps = list(map(lambda t: (t + 32) * (9 / 7), random_temps))
    return mapped_temps

def map_list_comprehension():
    mapped_temps = [((temp + 32) * (9 / 7)) for temp in random_temps]
    return mapped_temps

def map_list_comprehension_fn_call():
    mapped_temps = [fahr_to_musk(temp) for temp in random_temps]
    return mapped_temps

t_map_for_loop = timeit.timeit(map_for_loop)
t_map_for_loop_fn_call = timeit.timeit(map_for_loop_fn_call)
t_map_built_in = timeit.timeit(map_built_in)
t_map_built_in_lambda = timeit.timeit(map_built_in_lambda)
t_map_list_comprehension = timeit.timeit(map_list_comprehension)
t_map_list_comprehension_fn_call = timeit.timeit(map_list_comprehension_fn_call)

print('\n')
print('Execution time (s) with 1,000,000 iterations')
print('------------------------------------')
print(f'map_for_loop: {t_map_for_loop}s')
print(f'map_for_loop_fn_call: {t_map_for_loop_fn_call}s')
print(f'map_built_in: {t_map_built_in}s')
print(f'map_built_in_lambda: {t_map_built_in_lambda}s')
print(f'map_list_comprehension: {t_map_list_comprehension}s')
print(f'map_list_comprehension_fn_call: {t_map_list_comprehension_fn_call}s')
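
The script above relies on timeit.timeit's default of number=1000000, which takes minutes per approach. For a quicker check, one possible variation is timeit.repeat with an explicit, smaller number, keeping the best of several repeats. This is only a sketch: the values number=1_000 and repeat=5 are assumptions picked for a fast run, not the settings used for the results in this post.

```python
import random
import timeit

random_temps = [random.randint(-1000, 1000) for _ in range(1000)]


def map_list_comprehension():
    return [(temp + 32) * (9 / 7) for temp in random_temps]


RUNS = 1_000  # assumed value; the post's results used the default 1,000,000
REPEATS = 5   # assumed value

# timeit.repeat returns one total time (in seconds) per repeat
timings = timeit.repeat(map_list_comprehension, number=RUNS, repeat=REPEATS)

# The minimum is usually the least noisy figure, since slower repeats
# tend to reflect interference from other processes
best = min(timings)
per_call_microseconds = best / RUNS * 1e6

print(f'map_list_comprehension: {per_call_microseconds:.1f} microseconds per call '
      f'(best of {REPEATS})')
```

The same pattern applies to the other five functions; only the callable passed to timeit.repeat changes.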
    

Results

After running each of our map functions one million times, here are the results:


Execution time (s) with 1,000,000 iterations
------------------------------------
map_for_loop: 120.05264791500001s
map_for_loop_fn_call: 185.766175365s
map_built_in: 111.13615633700005s
map_built_in_lambda: 113.82378142899995s
map_list_comprehension: 80.34981999499996s
map_list_comprehension_fn_call: 143.94168193300004s

    

Interesting!

First of all, notice how much faster the map_list_comprehension approach is compared to all others.

It beats the next fastest approach (map_built_in) by ~31 seconds! And it's more than twice as fast as map_for_loop_fn_call.

Why is it so much faster? Mainly because we're not making any function calls per element. Function call overhead adds up quickly over a million runs of 1,000 elements, as we see with the other approaches (even the plain for loop pays for a call to mapped_temps.append on every element), and we are able to significantly reduce execution time by performing all operations inline in this case.
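
To look at the call overhead in isolation, here is a rough sketch that times the conversion of a single value, once inline and once through fahr_to_musk. The sample value 38 and the run count of 200,000 are arbitrary assumptions, and the absolute numbers will vary by machine and Python version.

```python
import timeit


def fahr_to_musk(f_temp):
    return (f_temp + 32) * (9 / 7)


RUNS = 200_000  # assumed value, chosen so the sketch finishes quickly
namespace = {'temp': 38, 'fahr_to_musk': fahr_to_musk}  # 38 is a sample value

# Passing statements as strings avoids wrapping the inline version in a
# function of its own, which would hide the difference being measured
t_inline = timeit.timeit('(temp + 32) * (9 / 7)', globals=namespace, number=RUNS)
t_call = timeit.timeit('fahr_to_musk(temp)', globals=namespace, number=RUNS)

print(f'inline expression: {t_inline:.4f}s')
print(f'function call:     {t_call:.4f}s')
```

Both statements do the same arithmetic, so any gap between the two timings is attributable to the call itself.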

Notice that the next most performant approaches are map_built_in and map_built_in_lambda.

The execution times for these approaches are pretty much equivalent. This makes sense, because they are both applying a function to every element in the list. An anonymous lambda function and the named function seem to generate comparable overheads.
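
One way to see why the two behave alike: a lambda and a def both produce the same kind of function object, so map() calls them in the same way. A small sketch (the name fahr_to_musk_lambda is introduced here only for illustration):

```python
import types


def fahr_to_musk(f_temp):
    return (f_temp + 32) * (9 / 7)


# Hypothetical name for the lambda used in Approach 4
fahr_to_musk_lambda = lambda t: (t + 32) * (9 / 7)

# Both objects are plain Python functions
same_type = type(fahr_to_musk) is type(fahr_to_musk_lambda) is types.FunctionType

print(same_type)                     # True
print(fahr_to_musk.__name__)         # fahr_to_musk
print(fahr_to_musk_lambda.__name__)  # <lambda>
```

The visible difference is the name, which matters for tracebacks and debugging rather than for speed.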

This suggests that it's up to the developer on whether a lambda or named function makes more sense from a readability or maintainability perspective.

Another interesting observation is how much worse the list comprehension performs when we include a function call vs. having the operations inline (almost twice as slow).

If I was optimizing for performance and debating between a named function in a list comprehension vs. a named function in the built-in map function, I'd choose the map function any day of the week given these results.

The for loop approach without a function call had "okay" performance, but I don't see a reason to use this approach given how readable and performant list comprehensions and built-in map approaches are.

I definitely would not use a for loop that calls a function for every element when mapping. This by far had the worst performance, lagging about 42 seconds behind the next worst approach (map_list_comprehension_fn_call).

Final thoughts

The major tradeoff when mapping elements to a new list is readability vs. performance.

If I was performing simple operations across a reasonable number of elements, I would choose list comprehensions with inline operations because they are both readable and performant.

If I wanted to further improve readability and abstract the function logic into an external function, I would probably use the built-in map function since it strikes a decent balance between both goals.

For further experimentation, I'd be interested to see how using bit operations would affect the performance, as well as how the relative execution times are affected by a larger number of input elements.
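
As a starting point for the input-size experiment, here is a rough sketch that times two of the approaches across a few list lengths. The sizes, the run count of 100, and the results dictionary are all assumptions made for the sketch; I have not run this at scale, so it makes no claim about what the numbers will show.

```python
import random
import timeit


def fahr_to_musk(f_temp):
    return (f_temp + 32) * (9 / 7)


SIZES = (100, 1_000, 10_000)  # hypothetical input sizes
RUNS = 100                    # assumed value, kept small for a quick run

results = {}
for size in SIZES:
    temps = [random.randint(-1000, 1000) for _ in range(size)]

    # Each lambda is timed immediately, while temps still refers to this list
    t_comprehension = timeit.timeit(
        lambda: [(t + 32) * (9 / 7) for t in temps], number=RUNS)
    t_map = timeit.timeit(
        lambda: list(map(fahr_to_musk, temps)), number=RUNS)

    results[size] = (t_comprehension, t_map)

for size, (t_comprehension, t_map) in results.items():
    print(f'{size:>6} elements: comprehension {t_comprehension:.4f}s, '
          f'map {t_map:.4f}s')
```

Dividing each timing by size * RUNS gives a per-element cost, which makes it easier to see whether the relative ranking shifts as the list grows.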

Either way, the main takeaway is to experiment and test. This way, we can help Elon with his mission of making humans a multi-planetary species, one Musk degree at a time.
