Question
In previous blog, I have investigated how to do array initialization efficiently. This time, I would like to investigate how to do intersections in Python efficiently.
For those need to read code immediately, here is the gist for this blog
Methods
Before digging in the implementations, let’s define what intersection is. By intersection, I mean finding same elements between 2 given lists.
A = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
B = [2, 4, 6, 8, 10]
# c = a ^ b
# which is c = [2, 4, 6, 8]
How do we get 
c = [2, 4, 6, 8] efficiently?M1: double loop
A double loop for finding same elements
def doubleLoop():
    C = [a for a in A for b in B if a == b]
def doubleLoop2():
    C = []
    for a in A:
        for b in B:
            if a == b:
                C.append(a)
Noted
- list comprehensionvs- append
M2: Set intersection
Cast 
A and B to set and do intersection.setA = set(A)
setB = set(B)
def setAnd():
    C = sorted(setA.intersection(setB))
def setAnd2():
    # No sort
    C = setA.intersection(setB)
def setAnd3():
    # Inline construct
    C = sorted(set(A).intersection(set(B)))
def setAnd4():
    # Inline construct and no sort
    C = set(A).intersection(set(B))
Noted:
- setAndensures the ordering sames with output from- doubleLoopor- doubleLoop2as- sethas no concept of- ordering
- setAnd2demonstrates the meaning of no- ordering
- setAnd3demonstrates how slow for casting them in a run time manner
- setAnd4is similar to `setAnd2
Result
Results of running each implementations for 
1000000 times# Double loop
# 1.4873290062
# Double loop2
# 1.70801305771
# Set AND
# 0.525309085846
# Set AND2
# 0.199652194977
# Set AND3
# 0.982024908066
# Set AND4
# 0.657478094101
Conclusion
- Always use setto do intersections. No matter the results need to be sorted or not, it is always faster than the double loop way.
- List comprehensionis always faster than- append. Check out more from my previous blog
Explanation
setAnd vs setAnd2
- setAnddo an extra sort comparing to- setAnd2- As sethas no ordering, it need to be cast to list before which is an extra cost
 
- As 
setAnd vs setAnd3
A performance checks between using a pre-cast sets 
setA and setB and runtime-cast sets set(A) and set(B). Even though the runtime version is slower, it is still faster than double loop obviously.
setAnd3 vs setAnd4
Same as 
setAnd vs setAnd2Gotcha
- Set’s intersection works charm on unique elements inside involved lists. Benchmark here may not be applied to the multi-set one
- Set has no concept of ordering. If ordering is a must for you, you need to do it yourself.
沒有留言:
張貼留言