Question
In previous blog, I have investigated how to do array initialization efficiently. This time, I would like to investigate how to do intersections in Python efficiently.
For those need to read code immediately, here is the gist for this blog
Methods
Before digging in the implementations, let’s define what intersection is. By intersection, I mean finding same elements between 2 given lists.
A = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
B = [2, 4, 6, 8, 10]
# c = a ^ b
# which is c = [2, 4, 6, 8]
How do we get
c = [2, 4, 6, 8]
efficiently?M1: double loop
A double loop for finding same elements
def doubleLoop():
C = [a for a in A for b in B if a == b]
def doubleLoop2():
C = []
for a in A:
for b in B:
if a == b:
C.append(a)
Noted
list comprehension
vsappend
M2: Set intersection
Cast
A
and B
to set and do intersection.setA = set(A)
setB = set(B)
def setAnd():
C = sorted(setA.intersection(setB))
def setAnd2():
# No sort
C = setA.intersection(setB)
def setAnd3():
# Inline construct
C = sorted(set(A).intersection(set(B)))
def setAnd4():
# Inline construct and no sort
C = set(A).intersection(set(B))
Noted:
setAnd
ensures the ordering sames with output fromdoubleLoop
ordoubleLoop2
asset
has no concept ofordering
setAnd2
demonstrates the meaning of noordering
setAnd3
demonstrates how slow for casting them in a run time mannersetAnd4
is similar to `setAnd2
Result
Results of running each implementations for
1000000
times# Double loop
# 1.4873290062
# Double loop2
# 1.70801305771
# Set AND
# 0.525309085846
# Set AND2
# 0.199652194977
# Set AND3
# 0.982024908066
# Set AND4
# 0.657478094101
Conclusion
- Always use
set
to do intersections. No matter the results need to be sorted or not, it is always faster than the double loop way. List comprehension
is always faster thanappend
. Check out more from my previous blog
Explanation
setAnd
vs setAnd2
setAnd
do an extra sort comparing tosetAnd2
- As
set
has no ordering, it need to be cast to list before which is an extra cost
- As
setAnd
vs setAnd3
A performance checks between using a pre-cast sets
setA
and setB
and runtime-cast sets set(A)
and set(B)
. Even though the runtime version is slower, it is still faster than double loop
obviously.
setAnd3
vs setAnd4
Same as
setAnd
vs setAnd2
Gotcha
- Set’s intersection works charm on unique elements inside involved lists. Benchmark here may not be applied to the multi-set one
- Set has no concept of ordering. If ordering is a must for you, you need to do it yourself.
沒有留言:
張貼留言