2017/07/23

Copying with or without symbolic links, locally or remotely

About

Have you ever wondered how to handle symbolic links when copying files? Here are some useful commands. You can also find them in this gist.

Background

Given a directory structure like this:
.
├── a
│   └── a.txt
├── b
│   └── b.txt
├── c
├── s1 -> a
└── s2 -> b
There are 3 directories (a, b and c) and 2 symbolic links (s1 and s2).

Local

Copy files / directories locally
To copy from src to dst and keep the symbolic links as links, run the following:
$ cp -r src/* dst/
$ tree dst
dst
├── a
│   └── a.txt
├── b
│   └── b.txt
├── c
├── s1 -> a
└── s2 -> b
To copy the real directories in place of the symbolic links, add -L so that the links are followed:
$ cp -rL src/* dst/
$ tree dst
dst
├── a
│   └── a.txt
├── b
│   └── b.txt
├── c
├── s1
│   └── a.txt
└── s2
    └── b.txt
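
The same two behaviours can be reproduced from Python. The following is a minimal sketch, assuming a POSIX system where `os.symlink` is allowed; the helper `make_src` and the directory names `dst_links` and `dst_real` are made up for illustration. `shutil.copytree` with `symlinks=True` keeps links as links (like `cp -r` above), while `symlinks=False` copies what they point to (like `cp -rL`).

```python
import os
import shutil
import tempfile

def make_src(root):
    """Build the example tree: dirs a, b, c and links s1 -> a, s2 -> b."""
    src = os.path.join(root, "src")
    for name in ("a", "b", "c"):
        os.makedirs(os.path.join(src, name))
    for name in ("a", "b"):
        with open(os.path.join(src, name, name + ".txt"), "w") as f:
            f.write(name)
    os.symlink("a", os.path.join(src, "s1"))
    os.symlink("b", os.path.join(src, "s2"))
    return src

root = tempfile.mkdtemp()
src = make_src(root)

# Like `cp -r`: symbolic links stay symbolic links.
kept = os.path.join(root, "dst_links")
shutil.copytree(src, kept, symlinks=True)

# Like `cp -rL`: each link is replaced by a copy of its target.
followed = os.path.join(root, "dst_real")
shutil.copytree(src, followed, symlinks=False)

print(os.path.islink(os.path.join(kept, "s1")))      # True
print(os.path.islink(os.path.join(followed, "s1")))  # False
print(os.listdir(os.path.join(followed, "s1")))      # ['a.txt']
```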

Remote

Copy files / directories from host to host, using scp.
To copy from src to dst and keep the symbolic links, you can combine tar and scp:
# Tar and gzip the entire folder (`z` makes the content match the .tar.gz name)
$ tar czf src.tar.gz src/

# Scp to the remote side
# Remember to replace `localhost` and `pwd` with your own variables
$ scp src.tar.gz localhost:`pwd`/subdir/

# Extract the archive `subdir/src.tar.gz` into folder `subdir`
$ tar -xz -C subdir/ -f subdir/src.tar.gz

# Rename the `subdir/src` to `subdir/dst`
$ mv subdir/src subdir/dst

$ tree subdir/dst
subdir/dst/
├── a
│   └── a.txt
├── b
│   └── b.txt
├── c
├── s1 -> a
└── s2 -> b
Or, you can do it on the fly:
# In a terminal on the destination host, run the following command
# Use nc to listen on a port (e.g. 12345) and untar the stream on the fly
# (some netcat variants need `nc -l -p 12345` instead)
$ nc -l 12345 | tar xf -

# In another terminal on the source host, run the following command
# Tar the entire folder and send it through netcat
# Remember to replace `localhost` and `12345` with your own host and port
$ tar cf - src/ | nc localhost 12345

# Verify on the destination host
$ mv src dst
$ tree dst
dst
├── a
│   └── a.txt
├── b
│   └── b.txt
├── c
├── s1 -> a
└── s2 -> b
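
Both tar-based approaches work because a tar archive stores a symbolic link as a link entry rather than as a copy of its target. The following is a minimal sketch that checks this with Python's `tarfile` module, assuming a POSIX system; the in-memory buffer `buf` and the reduced tree (only `a` and `s1`) are simplifications made for illustration.

```python
import io
import os
import tarfile
import tempfile

root = tempfile.mkdtemp()
src = os.path.join(root, "src")
os.makedirs(os.path.join(src, "a"))
with open(os.path.join(src, "a", "a.txt"), "w") as f:
    f.write("a")
os.symlink("a", os.path.join(src, "s1"))

# Write the archive to memory. By default tarfile does not dereference
# links, which matches a plain `tar cf`.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    tar.add(src, arcname="src")

# Read the archive back and inspect the entry for the link.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    link = tar.getmember("src/s1")
    names = sorted(tar.getnames())

print(link.issym(), link.linkname)  # True a
print(names)  # ['src', 'src/a', 'src/a/a.txt', 'src/s1']
```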
To copy the real directories in place of the symbolic links, use scp -r, which follows the symbolic links it encounters:
# Scp to the remote side
# Remember to replace `localhost` and `pwd` with your own variables
$ scp -r src/* localhost:`pwd`/dst
$ tree dst
dst/
├── a
│   └── a.txt
├── b
│   └── b.txt
├── c
├── s1
│   └── a.txt
└── s2
    └── b.txt

2017/07/11

A study on SSL certificates

Background

Recently, I had the chance to deploy a properly signed SSL certificate on my company’s GitLab server. After following this link from GitLab, the browser readily showed a lovely lock icon.
However, trouble appeared when configuring the container registry with that certificate: the GitLab runners threw an error about being unable to get the issuer certificate, just like this and that.
Eventually, I found out that the problem came from the intermediate certificates and, after some struggle, fixed it. Below is the glossary I compiled along the way.

Glossary

SSL certificates

Also known as X.509 certificates.

Encoding

  • DER
    • a binary encoded certificate
  • PEM
    • Base64 ASCII encoded certificate
    • Contains the line -----BEGIN CERTIFICATE-----
Usually, you would say "I have a DER encoded certificate" rather than "I have a DER certificate".
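
To make the relationship between the two encodings concrete, here is a minimal sketch using the `ssl` module from Python's standard library. The bytes in `fake_der` are placeholder data, not a real certificate; that is an assumption made so the example is self-contained, and it works only because these two helpers re-encode the bytes without validating them.

```python
import ssl

# Placeholder bytes standing in for a real DER-encoded certificate.
# These helpers only re-encode; they do not validate the certificate.
fake_der = bytes(range(48))

# PEM is the Base64 text of the DER bytes, wrapped in BEGIN/END lines.
pem = ssl.DER_cert_to_PEM_cert(fake_der)
print(pem.splitlines()[0])   # -----BEGIN CERTIFICATE-----
print(pem.splitlines()[-1])  # -----END CERTIFICATE-----

# Converting back recovers the original bytes exactly.
round_trip = ssl.PEM_cert_to_DER_cert(pem)
print(round_trip == fake_der)  # True
```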

Extension

  • CRT
    • Can be encoded with DER, PEM
    • Common on Linux
  • CER
    • Similar to CRT
    • Common on Windows
  • KEY
    • Public key or private key

Actions

  • View
    • Get a human-readable dump of the certificate
      • PEM
        • openssl x509 -in cert.xxx -text -noout
      • DER
        • openssl x509 -in certificate.der -inform der -text -noout
  • Transform
    • DER to PEM
      • openssl x509 -in cert.crt -inform der -outform pem -out cert.pem
    • PEM to DER
      • openssl x509 -in cert.crt -outform der -out cert.der
  • Combinations
    • Concatenate multiple certificates in one file
    • E.g. combine the intermediate certificates with your certificate

CSR

  • Certificate Signing Request
  • The step before a CER / CRT is issued (i.e. before being signed by a CA)
  • Contains information like Common Name, Organization Name etc.
  • Decode
    • openssl req -in server.csr -noout -text

Intermediate certificate

  • Certificate(s) between your site's certificate and the root certificate
    • They construct the chain of trust
  • Acts as a proxy that protects the root certificate
  • Certificate ordering follows RFC 4346
    • 1st: the server certificate
    • Then the intermediates, each one certifying the certificate before it
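
The ordering above is what a combined certificate file has to respect. The following is a minimal sketch of building such a file in Python; `build_chain` and `pem` are hypothetical helpers, and the labels stand in for real certificate bodies.

```python
def build_chain(server_pem, intermediate_pems):
    """Join PEM blocks: server certificate first, then each intermediate."""
    blocks = [server_pem] + list(intermediate_pems)
    # End every block with exactly one newline so END/BEGIN markers never fuse.
    return "".join(block.strip() + "\n" for block in blocks)

def pem(label):
    # Hypothetical stand-in for a real PEM certificate body.
    return "-----BEGIN CERTIFICATE-----\n" + label + "\n-----END CERTIFICATE-----"

chain = build_chain(pem("SERVER"), [pem("INTERMEDIATE1"), pem("INTERMEDIATE2")])
print(chain.count("-----BEGIN CERTIFICATE-----"))  # 3
```

In practice the inputs would be the contents of the server certificate file and the intermediate certificate files supplied by the CA, read as text.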

TLS

The successor protocol to SSL, and an enhanced version of it. Certificates used with it come in validation levels such as DV, OV, EV, BV etc.

2017/07/02

How to do list intersection efficiently in Python

Question

In a previous blog, I investigated how to initialize arrays efficiently. This time, I would like to investigate how to do intersections efficiently in Python.
For those who want to read the code right away, here is the gist for this blog.

Methods

Before digging into the implementations, let’s define what intersection is. By intersection, I mean finding the elements common to 2 given lists.
A = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
B = [2, 4, 6, 8, 10]
# C = A ∩ B
# which is C = [2, 4, 6, 8]
How do we get C = [2, 4, 6, 8] efficiently?

M1: double loop

A double loop that finds the common elements
def doubleLoop():
    C = [a for a in A for b in B if a == b]

def doubleLoop2():
    C = []
    for a in A:
        for b in B:
            if a == b:
                C.append(a)
Note:
  • doubleLoop uses a list comprehension while doubleLoop2 uses append

M2: Set intersection

Convert A and B to sets and intersect them.
setA = set(A)
setB = set(B)

def setAnd():
    C = sorted(setA.intersection(setB))

def setAnd2():
    # No sort
    C = setA.intersection(setB)

def setAnd3():
    # Inline construct
    C = sorted(set(A).intersection(set(B)))

def setAnd4():
    # Inline construct and no sort
    C = set(A).intersection(set(B))
Noted:
  • setAnd sorts the result so that its ordering matches the output of doubleLoop and doubleLoop2, since a set has no concept of ordering
  • setAnd2 skips the sort, so the order of its result is not guaranteed
  • setAnd3 shows the cost of building the sets on every call
  • setAnd4 is similar to setAnd2, but builds the sets on every call

Result

Results of running each implementation 1000000 times
# Double loop
# 1.4873290062
# Double loop2
# 1.70801305771
# Set AND
# 0.525309085846
# Set AND2
# 0.199652194977
# Set AND3
# 0.982024908066
# Set AND4
# 0.657478094101
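
Timings of this kind can be collected with the standard `timeit` module. The following is a minimal sketch, not necessarily the harness used for the numbers above; the function names and the reduced `RUNS` value are assumptions chosen to keep it quick, and absolute times will differ between machines and Python versions.

```python
import timeit

A = list(range(10))
B = [2, 4, 6, 8, 10]
setA, setB = set(A), set(B)

def double_loop():
    # Compare every element of A with every element of B.
    return [a for a in A for b in B if a == b]

def set_and():
    # Intersect sets built once, outside the timed call.
    return sorted(setA.intersection(setB))

def set_and_inline():
    # Build both sets inside the timed call.
    return sorted(set(A).intersection(set(B)))

RUNS = 10000  # far fewer than the 1000000 runs reported above
timings = {
    fn.__name__: timeit.timeit(fn, number=RUNS)
    for fn in (double_loop, set_and, set_and_inline)
}
for name, seconds in timings.items():
    print(name, round(seconds, 4))
```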

Conclusion

  • Prefer a set for intersections. In this benchmark every set variant, sorted or not, was faster than the double loop.
  • The list comprehension was faster than append here. Check out more in my previous blog.

Explanation

setAnd vs setAnd2

  • setAnd does an extra sort compared to setAnd2
    • As a set has no ordering, sorted has to build a new sorted list from it, which is an extra cost

setAnd vs setAnd3

This compares the pre-built sets setA and setB against sets built at call time with set(A) and set(B). Even though the call-time version is slower, it is still clearly faster than the double loop.

setAnd3 vs setAnd4

Same as setAnd vs setAnd2

Gotcha

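  • Set intersection works like a charm when the elements of the lists involved are unique. With duplicates, a set drops the repeats while the double loop emits one element per matching pair, so the benchmark here may not apply to multiset intersection.
  • A set has no concept of ordering. If ordering matters, you need to sort or otherwise restore it yourself.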
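
Both gotchas can be worked around with the standard library. The following is a minimal sketch with made-up inputs: `collections.Counter` gives a multiset intersection, and a list comprehension over A with a set lookup keeps A's original order (including A's duplicates).

```python
from collections import Counter

A = [1, 1, 2, 3, 3, 3]
B = [3, 1, 1, 3, 4]

# Multiset intersection: each value is kept min(count in A, count in B) times.
multi = sorted((Counter(A) & Counter(B)).elements())
print(multi)    # [1, 1, 3, 3]

# Order-preserving intersection: walk A in order, one set lookup per item.
in_b = set(B)
ordered = [a for a in A if a in in_b]
print(ordered)  # [1, 1, 3, 3, 3]
```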