Statistics
We have a module with some basic statistics helper functions.
def mean(x):
    t = 0
    for n in x:
        t += n
    meanavg = t / len(x)
    return meanavg

def median(x):
    if len(x)%2 != 0:
        return sorted(x)[int(len(x)/2)]
    else:
        midavg = (sorted(x)[int(len(x)/2)] + sorted(x)[int(len(x)/2-1)])/2.0
        return midavg

def mode(x):
    o = {}
    for n in x:
        if n not in o:
            o[n] = 0
        o[n] += 1
    m = max(o.values())
    modeavg = [k for (k, v) in o.items() if v == m][0]
    return modeavg

if __name__ == "__main__":
    values = [5, 8, 2, 7, 2, 60]
    assert mean(values) == 14
    assert median(values) == 6
    assert mode(values) == 2
    print("Tests passed.")
Let’s start by refactoring mean:
- Use descriptive and meaningful names for numbers and total: PEP 8 and “Readability counts.”
- Remove unnecessary variable which is immediately returned: “Simple is better than complex.”
def mean(numbers):
    total = 0
    for n in numbers:
        total += n
    return total / len(numbers)
- Use sum instead of a for loop to calculate total: “Simple is better than complex.”
- Remove unnecessary total variable
def mean(numbers):
    return sum(numbers) / len(numbers)
Now let’s work on median:
- Use more descriptive variable name (numbers): PEP 8 and “Readability counts.”
- Reduce inefficiency and repetition by only sorting numbers once: “Simple is better than complex.”
- Reduce repetition by making length variable
- Remove unnecessary variable before return
def median(numbers):
    numbers = sorted(numbers)
    length = len(numbers)
    if length%2 != 0:
        return numbers[int(length/2)]
    else:
        return (numbers[int(length/2)] + numbers[int(length/2-1)])/2.0
- Add more space around % operator for readability: PEP 8
- Add mid_point variable to reduce repetition
- Space out / operator and simplify 2.0 to 2: PEP 8
def median(numbers):
    numbers = sorted(numbers)
    length = len(numbers)
    mid_point = int(length/2)
    if length % 2 != 0:
        return numbers[mid_point]
    else:
        return (numbers[mid_point] + numbers[mid_point - 1]) / 2
Let’s refactor mode:
- Rename x to numbers, o to occurrences, and m to most
- Remove unnecessary modeavg variable
- Replace if statement with setdefault
- Consider replacing setdefault with get
def mode(numbers):
    occurrences = {}
    for n in numbers:
        occurrences[n] = occurrences.get(n, 0) + 1
    most = max(occurrences.values())
    return [k for (k, v) in occurrences.items() if v == most][0]
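The list above mentions a setdefault step before the switch to get, but only the get version is shown. A minimal sketch of what that intermediate step might look like (a guess at the shape, not necessarily the exact code the author had in mind):

```python
def mode(numbers):
    occurrences = {}
    for n in numbers:
        # setdefault stores 0 for a key seen for the first time
        # and leaves an existing count untouched.
        occurrences.setdefault(n, 0)
        occurrences[n] += 1
    most = max(occurrences.values())
    return [k for (k, v) in occurrences.items() if v == most][0]
```

The get version folds the default and the increment into a single assignment, which is presumably why it is worth considering over setdefault here.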
- Use defaultdict instead
- Switch from defaultdict to single-line Counter statement
from collections import Counter

def mode(numbers):
    occurrences = Counter(numbers)
    most = max(occurrences.values())
    return [k for (k, v) in occurrences.items() if v == most][0]
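The defaultdict step named in the list above is skipped in favor of the Counter version. As a rough sketch, assuming the loop is otherwise kept as it was, the intermediate version might look like this:

```python
from collections import defaultdict

def mode(numbers):
    # defaultdict(int) creates missing keys with int() == 0,
    # so no explicit default is needed before incrementing.
    occurrences = defaultdict(int)
    for n in numbers:
        occurrences[n] += 1
    most = max(occurrences.values())
    return [k for (k, v) in occurrences.items() if v == most][0]
```

Counter goes one step further by doing the counting loop itself, which is what collapses the first three lines of the body into one.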
- Use most_common method to get the most commonly occurring number
- Consider whether to squash two lines into one return
from collections import Counter

def mode(numbers):
    value, _ = Counter(numbers).most_common(1)[0]
    return value
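For comparison, the squashed single-return form that the list above asks you to consider could be sketched as follows:

```python
from collections import Counter

def mode(numbers):
    # most_common(1) returns a one-item list of (value, count) pairs;
    # [0][0] picks the value out of the first pair.
    return Counter(numbers).most_common(1)[0][0]
```

This is shorter, but the double index hides what the two pieces are. The two-line version with tuple unpacking names the value explicitly, so which one reads better is a judgment call.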
This actually isn’t the most Pythonic implementation of mean, median, and mode. It would be most Pythonic to just use the versions built into the standard library’s statistics module (available since Python 3.4):
>>> from statistics import mean, median, mode
>>> numbers = [5, 8, 2, 7, 2, 60]
>>> mean(numbers)
14.0
>>> median(numbers)
6.0
>>> mode(numbers)
2
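As a quick sanity check, the refactored functions from this walkthrough can be compared against the standard library versions. This is only a sketch: it repeats the final hand-written versions from above and checks them on the sample data, where there is a single most common value.

```python
import statistics
from collections import Counter

def mean(numbers):
    return sum(numbers) / len(numbers)

def median(numbers):
    numbers = sorted(numbers)
    length = len(numbers)
    mid_point = int(length/2)
    if length % 2 != 0:
        return numbers[mid_point]
    else:
        return (numbers[mid_point] + numbers[mid_point - 1]) / 2

def mode(numbers):
    value, _ = Counter(numbers).most_common(1)[0]
    return value

values = [5, 8, 2, 7, 2, 60]
# Each hand-written helper should agree with its statistics counterpart.
assert mean(values) == statistics.mean(values)
assert median(values) == statistics.median(values)
assert mode(values) == statistics.mode(values)
```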