Sum Timestamps

Starting file: sum_timestamps.py.

Tests file: test_sum_timestamps.py.

Here is a program that sums up a total amount of time from timestamps of elapsed times. Each timestamp has the form hours:minutes:seconds, where the hours part is optional.

def sum_timestamps(timestamps):
    total_time = 0
    for time in timestamps:
        total_time += parse_time(time)
    return format_time(total_time)

def parse_time(time_string):
    sections = time_string.split(':')
    if len(sections) == 2:
        seconds = int(sections[1])
        minutes = int(sections[0])
        hours = 0
    else:
        seconds = int(sections[2])
        minutes = int(sections[1])
        hours = int(sections[0])
    return hours*3600 + minutes*60 + seconds

def format_time(total_seconds):
    hours = str(int(total_seconds / 3600))
    minutes = str(int(total_seconds / 60) % 60)
    seconds = str(total_seconds % 60)
    if len(minutes) < 2 and hours != "0":
        minutes = "0" + minutes
    if len(seconds) < 2:
        seconds = "0" + seconds
    time = minutes + ":" + seconds
    if hours != "0":
        time = hours + ":" + time
    return time

First, let’s look at the sum_timestamps function.

In our sum_timestamps function, we’re converting each timestamp to seconds and then summing up those seconds. We could use Python’s built-in sum function if we built up a list of those second counts.

def sum_timestamps(timestamps):
    times = []
    for time in timestamps:
        times.append(parse_time(time))
    total_time = sum(times)
    return format_time(total_time)

That might seem like an odd step, but what does this allow us to do? For those who’ve been in one of my classes before: what does this code look like?

We could copy-paste this into a list comprehension. What could we do after that?
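As a sketch of that intermediate step, the loop-and-append version maps onto a list comprehension roughly like this. The parse_time and format_time helpers are repeated here (in forms that appear elsewhere in this walkthrough) only so the snippet runs on its own.

```python
def parse_time(time_string):
    """Convert 'M:SS' or 'H:MM:SS' into a total number of seconds."""
    sections = time_string.split(':')
    if len(sections) == 2:
        seconds = int(sections[1])
        minutes = int(sections[0])
        hours = 0
    else:
        seconds = int(sections[2])
        minutes = int(sections[1])
        hours = int(sections[0])
    return hours*3600 + minutes*60 + seconds


def format_time(total_seconds):
    """Format a number of seconds as 'M:SS' or 'H:MM:SS'."""
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"


def sum_timestamps(timestamps):
    # The for loop with append, collapsed into a list comprehension
    times = [parse_time(time) for time in timestamps]
    total_time = sum(times)
    return format_time(total_time)


print(sum_timestamps(['02:01', '04:05']))  # 6:06
```

The comprehension builds the whole list in memory before sum sees it, which is harmless here but unnecessary since we only loop over it once.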

We could turn that comprehension into a generator expression because we’re only looping over it once.

def sum_timestamps(timestamps):
    total_time = sum(
        parse_time(time)
        for time in timestamps
    )
    return format_time(total_time)

We could combine this all into one line of code:

def sum_timestamps(timestamps):
    return format_time(sum(parse_time(t) for t in timestamps))

I’m not a huge fan of this one-liner, but it’s not so bad. I’d probably prefer the multi-line approach personally.

Can we improve the format_time function, which takes the total number of seconds and formats it back into hours:minutes:seconds for us?

We could use f-strings with it!

def format_time(total_seconds):
    hours = str(int(total_seconds / 3600))
    minutes = str(int(total_seconds / 60) % 60)
    seconds = str(total_seconds % 60)
    if len(minutes) < 2 and hours != "0":
        minutes = "0" + minutes
    if len(seconds) < 2:
        seconds = "0" + seconds
    time = f"{minutes}:{seconds}"
    if hours != "0":
        time = f"{hours}:{time}"
    return time

What else? What are all those if statements doing?

We’re trying to zero-pad our numbers. We could do that with our string formatting:

def format_time(total_seconds):
    hours = int(total_seconds / 3600)
    minutes = int(total_seconds / 60) % 60
    seconds = total_seconds % 60
    if hours:
        time = f"{hours}:{minutes:02d}:{seconds:02d}"
    else:
        time = f"{minutes}:{seconds:02d}"
    return time

Notice we removed our string conversions because we need those integers as-is in order to zero-pad them in our f-strings.
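To make the 02d format specification concrete, here is a small standalone sketch. The values are made up for illustration.

```python
minutes, seconds = 9, 5

# 02d means: format as a decimal integer, at least 2 wide, padded with zeros
padded = f"{minutes:02d}:{seconds:02d}"
print(padded)  # 09:05

# Numbers wider than the minimum width are never truncated
print(f"{123:02d}")  # 123

# The d format code only works on integers, which is why the str() calls had to go
try:
    f"{'5':02d}"
    string_worked = True
except ValueError:
    string_worked = False
print(string_worked)  # False
```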

We could combine those divisions and modulos by using the built-in function divmod:

def format_time(total_seconds):
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        time = f"{hours}:{minutes:02d}:{seconds:02d}"
    else:
        time = f"{minutes}:{seconds:02d}"
    return time
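If the two divmod calls look opaque, it may help to trace them with a concrete value. This is only a worked example; 4172 seconds is an arbitrary total chosen for illustration.

```python
total_seconds = 4172  # 1 hour, 9 minutes, 32 seconds

# divmod(a, b) returns the tuple (a // b, a % b)
minutes, seconds = divmod(total_seconds, 60)
print(minutes, seconds)  # 69 32

# Split the leftover minutes into whole hours and remaining minutes
hours, minutes = divmod(minutes, 60)
print(hours, minutes, seconds)  # 1 9 32
```

Because divmod uses floor division, hours, minutes, and seconds all stay integers, so the 02d format codes keep working.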

If we wanted to, we could use an inline if here (a conditional expression, Python’s equivalent of a ternary operator):

def format_time(total_seconds):
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return (
        f"{hours}:{minutes:02d}:{seconds:02d}"
        if hours else
        f"{minutes}:{seconds:02d}"
    )

I doubt many of you prefer this approach. I could go either way. I like that it’s compact but that’s also the thing I don’t like about it.

What about the parse_time function? We could remove those redundant int calls by moving them into our return statement:

def parse_time(time_string):
    sections = time_string.split(':')
    if len(sections) == 2:
        seconds = sections[1]
        minutes = sections[0]
        hours = 0
    else:
        seconds = sections[2]
        minutes = sections[1]
        hours = sections[0]
    return int(hours)*3600 + int(minutes)*60 + int(seconds)

What else could we do here? When you see hard-coded indexes what does that make you think of?

We can usually replace hard-coded indexes with tuple unpacking, also known as multiple assignment:

def parse_time(time_string):
    sections = time_string.split(':')
    if len(sections) == 2:
        minutes, seconds = sections
        hours = 0
    else:
        hours, minutes, seconds = sections
    return int(hours)*3600 + int(minutes)*60 + int(seconds)

If we wanted to, we could do our hours assignment all on the same line as our minutes and seconds assignment:

def parse_time(time_string):
    sections = time_string.split(':')
    if len(sections) == 2:
        hours, (minutes, seconds) = 0, sections
    else:
        hours, minutes, seconds = sections
    return int(hours)*3600 + int(minutes)*60 + int(seconds)

We could even use an asterisk to write that like this:

def parse_time(time_string):
    sections = time_string.split(':')
    if len(sections) == 2:
        hours, minutes, seconds = 0, *sections
    else:
        hours, minutes, seconds = sections
    return int(hours)*3600 + int(minutes)*60 + int(seconds)

This doesn’t change much, but I do like that we have some symmetry between our if and our else blocks here.
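Here is a tiny sketch of what the two unpacking forms do on their own, using a made-up timestamp. The asterisk form relies on unpacking inside a tuple display, which needs Python 3.5 or later.

```python
sections = "04:05".split(':')
print(sections)  # ['04', '05']

# The * spreads the list items into the tuple on the right-hand side,
# so this is the same as: hours, minutes, seconds = 0, '04', '05'
hours, minutes, seconds = 0, *sections
print(hours, minutes, seconds)  # 0 04 05

# The nested form unpacks the same values using a parenthesized target
hours2, (minutes2, seconds2) = 0, sections
print(hours2, minutes2, seconds2)  # 0 04 05
```

Note that minutes and seconds are still strings at this point, which is why the int calls stay in the return statement.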

Alright, anything else we could do here? What other way is there to parse strings?

Regular expressions!

import re


TIME_RE = re.compile(r'^(?:(\d+):)?(\d+):(\d+)$')


def parse_time(time_string):
    hours, minutes, seconds = TIME_RE.search(time_string).groups()
    if not hours:
        hours = 0
    return int(hours)*3600 + int(minutes)*60 + int(seconds)

Our function is a bit shorter but that regular expression looks a bit intimidating.
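To see why hours needs a fallback, it can help to look at what groups() returns for each shape of input. This sketch reuses the same pattern; the sample strings are made up.

```python
import re

TIME_RE = re.compile(r'^(?:(\d+):)?(\d+):(\d+)$')

# With hours present, all three groups are strings
with_hours = TIME_RE.search('1:02:01').groups()
print(with_hours)  # ('1', '02', '01')

# Without hours, the optional group did not participate, so it is None
without_hours = TIME_RE.search('04:05').groups()
print(without_hours)  # (None, '04', '05')

# A string that does not match at all makes search return None,
# so calling .groups() on that result would raise AttributeError
no_match = TIME_RE.search('not a time')
print(no_match)  # None
```

The split-based version would instead raise a ValueError on input like this, so the two approaches fail differently on bad timestamps.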

If you’re writing a regular expression, use VERBOSE mode so you can put whitespace and comments in your regular expression:

import re


TIME_RE = re.compile(r'''
    ^
    (?:             # Optional Hours :
        ( \d + )
        :
    )?
    ( \d + )        # Minutes
    :               # :
    ( \d + )        # Seconds
    $
''', re.VERBOSE)


def parse_time(time_string):
    hours, minutes, seconds = TIME_RE.search(time_string).groups()
    return int(hours or 0)*3600 + int(minutes)*60 + int(seconds)

Regular expressions are code, except each symbol is an instruction and we usually write them without whitespace or comments.

Instead of writing regular expressions without whitespace or comments, use VERBOSE mode so that you have a chance of understanding your regular expression when you go back and look at it weeks, months, or years later.
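As a quick sanity check, a compact pattern and its VERBOSE counterpart should capture the same groups. The names COMPACT_RE and VERBOSE_RE below are just for this sketch, and the last few lines show the one catch: literal whitespace has to be escaped or put in a character class.

```python
import re

COMPACT_RE = re.compile(r'^(?:(\d+):)?(\d+):(\d+)$')

VERBOSE_RE = re.compile(r'''
    ^
    (?: (\d+) : )?   # Optional hours followed by a colon
    (\d+)            # Minutes
    :
    (\d+)            # Seconds
    $
''', re.VERBOSE)

# Both patterns capture identical groups for these sample timestamps
for text in ['1:02:01', '04:05', '0:42']:
    assert COMPACT_RE.search(text).groups() == VERBOSE_RE.search(text).groups()

# Whitespace in a verbose pattern is ignored unless escaped or in a class
plain = bool(re.search(r'a b', 'a b'))
ignored = bool(re.search(r'a b', 'a b', re.VERBOSE))    # pattern acts like 'ab'
in_class = bool(re.search(r'a[ ]b', 'a b', re.VERBOSE))
print(plain, ignored, in_class)  # True False True
```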

Regular expressions are great, but we could also lean on the Python standard library to represent these durations. Python’s datetime module has a timedelta class for this:

from datetime import timedelta
import re


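def sum_timestamps(timestamps):
    deltas = (parse_time(t) for t in timestamps)
    total_time = sum(deltas, timedelta(0))
    time = str(total_time)
    if time.startswith('0:'):
        # No hours: drop the "0:" prefix and the leading minutes zero
        time = time[2:]
        return time[1:] if time.startswith('0') else time
    return time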


TIME_RE = re.compile(r'''
    ^
    (?:             # Optional Hours :
        ( \d + )
        :
    )?
    ( \d + )        # Minutes
    :               # :
    ( \d + )        # Seconds
    $
''', re.VERBOSE)


def parse_time(time_string):
    hours, minutes, seconds = TIME_RE.search(time_string).groups()
    return timedelta(
        hours=int(hours or 0),
        minutes=int(minutes),
        seconds=int(seconds),
    )

Python’s timedelta objects are great, but I don’t find this change considerably more readable. There’s a lot of weird stuff going on here with our [1:] slicing and startswith('0') check.
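The string slicing exists because of how str() renders a timedelta. The sketch below shows that rendering, along with one possible alternative that is not part of the original solution: convert the timedelta back to whole seconds and format those with divmod. The format_delta name is hypothetical.

```python
from datetime import timedelta

# str() on a timedelta always includes hours and zero-pads minutes and seconds
print(str(timedelta(minutes=2, seconds=1)))             # 0:02:01
print(str(timedelta(hours=10, minutes=12, seconds=4)))  # 10:12:04
print(str(timedelta(days=1, seconds=5)))                # 1 day, 0:00:05


def format_delta(delta):
    """Format a timedelta as M:SS or H:MM:SS (hypothetical helper)."""
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"


print(format_delta(timedelta(minutes=2, seconds=1)))  # 2:01
print(format_delta(timedelta(days=1, seconds=5)))     # 24:00:05
```

This avoids depending on the exact text str() produces, including the "1 day," prefix for totals of 24 hours or more, at the cost of reintroducing the divmod arithmetic.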

Tests

You can run the tests for this exercise with:

$ python test_sum_timestamps.py

The test file includes several test cases:

import unittest

from sum_timestamps import sum_timestamps


class SumTimeStampsTests(unittest.TestCase):

    """Tests for sum_timestamps."""

    def test_single_timestamp(self):
        self.assertEqual(sum_timestamps(['02:01']), '2:01')
        self.assertEqual(sum_timestamps(['2:01']), '2:01')

    def test_multiple_timestamps(self):
        self.assertEqual(sum_timestamps(['02:01', '04:05']), '6:06')
        self.assertEqual(sum_timestamps(['9:38', '4:45', '3:52']), '18:15')

    def test_many_timestamps(self):
        times = [
            '3:52', '3:29', '3:23', '4:05', '3:24', '2:29', '2:16', '2:44',
            '1:58', '3:21', '2:51', '2:53', '2:51', '3:32', '3:20', '2:40',
            '2:50', '3:24', '3:22', '0:42']
        self.assertEqual(sum_timestamps(times), '59:26')

    def test_no_minutes(self):
        self.assertEqual(sum_timestamps(['00:01', '00:05']), '0:06')
        self.assertEqual(sum_timestamps(['0:38', '0:15']), '0:53')

    # To test the Bonus part of this exercise, comment out the following line
    @unittest.expectedFailure
    def test_timestamps_over_an_hour(self):
        times = [
            '3:52', '3:29', '3:23', '4:05', '3:24', '2:29', '2:16', '2:44',
            '1:58', '3:21', '2:51', '2:53', '2:51', '3:32', '3:20', '2:40',
            '2:50', '3:24', '1:20', '3:22', '3:26', '0:42', '5:20']
        self.assertEqual(sum_timestamps(times), '1:09:32')
        times2 = [
            '50:52', '34:29', '36:23', '47:05', '32:24', '20:29', '22:16',
            '23:44', '19:58', '30:21', '24:51', '22:53', '23:51', '34:32',
            '36:20', '25:40', '27:50', '39:24', '18:20', '36:22', '4:00',
        ]
        self.assertEqual(sum_timestamps(times2), '10:12:04')

    # To test the Bonus part of this exercise, comment out the following line
    @unittest.expectedFailure
    def test_allow_optional_hour(self):
        self.assertEqual(sum_timestamps(['1:02:01', '04:05']), '1:06:06')
        self.assertEqual(
            sum_timestamps(['9:05:00', '4:45:10', '3:52']),
            '13:54:02',
        )


if __name__ == "__main__":
    unittest.main(verbosity=2)

Note that some tests are marked with @unittest.expectedFailure because they test the bonus functionality of this exercise. As you work on the bonus parts, comment out those decorator lines to enable the additional tests.