Friday, December 30, 2011

Raw sockets with BPF in Python

Update 2021: Note that this was written in 2011. Nowadays I would not recommend doing it this way; instead, use BCC to write your filters in C from within Python.

The following example shows the use of raw sockets where filtering is applied using BPF. It has only been tested on Linux. The filter is built up as a list of machine-language-like instructions that decide whether a packet should be passed to the raw socket or not. The filter is attached to the socket using SO_ATTACH_FILTER. This specific example only allows packets destined for a DHCP server (UDP port 67).

from binascii import hexlify
from ctypes import create_string_buffer, addressof
from socket import socket, AF_PACKET, SOCK_RAW, SOL_SOCKET
from struct import pack


# A subset of Berkeley Packet Filter constants and macros, as defined in
# linux/filter.h.

# Instruction classes
BPF_LD = 0x00
BPF_JMP = 0x05
BPF_RET = 0x06

# ld/ldx fields
BPF_H = 0x08
BPF_B = 0x10
BPF_ABS = 0x20

# alu/jmp fields
BPF_JEQ = 0x10
BPF_K = 0x00

def bpf_jump(code, k, jt, jf):
    return pack('HBBI', code, jt, jf, k)

def bpf_stmt(code, k):
    return bpf_jump(code, k, 0, 0)


# For performance reasons, the checks are ordered backwards from what would
# be intuitive: the check that is most likely to fail comes first.
filters_list = [
    # Must have dst port 67. Load (BPF_LD) a half word value (BPF_H) from the
    # Ethernet frame at absolute byte offset 36 (BPF_ABS); this offset assumes
    # a 20-byte IPv4 header without options. If the value is equal to 67 then
    # do not jump, else jump 5 statements.
    bpf_stmt(BPF_LD | BPF_H | BPF_ABS, 36),
    bpf_jump(BPF_JMP | BPF_JEQ | BPF_K, 67, 0, 5),

    # Must be UDP (check protocol field at byte offset 23)
    bpf_stmt(BPF_LD | BPF_B | BPF_ABS, 23), 
    bpf_jump(BPF_JMP | BPF_JEQ | BPF_K, 0x11, 0, 3),

    # Must be IPv4 (check ethertype field at byte offset 12)
    bpf_stmt(BPF_LD | BPF_H | BPF_ABS, 12), 
    bpf_jump(BPF_JMP | BPF_JEQ | BPF_K, 0x0800, 0, 1),

    bpf_stmt(BPF_RET | BPF_K, 0x0fffffff), # pass
    bpf_stmt(BPF_RET | BPF_K, 0), # reject
]

# Create filters struct and fprog struct to be used by SO_ATTACH_FILTER, as
# defined in linux/filter.h.
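# pack() returns bytes on Python 3, so join with a bytes separator (this
# also works on Python 2, where b'' is a plain str).
filters = b''.join(filters_list)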
b = create_string_buffer(filters)
mem_addr_of_filters = addressof(b)
fprog = pack('HL', len(filters_list), mem_addr_of_filters)

# As defined in asm/socket.h
SO_ATTACH_FILTER = 26

# Create listening socket with filters
s = socket(AF_PACKET, SOCK_RAW, 0x0800)
s.setsockopt(SOL_SOCKET, SO_ATTACH_FILTER, fprog)
s.bind(('eth0', 0x0800))

while True:
    data, addr = s.recvfrom(65565)
    # Formatted so that the line runs on both Python 2 and Python 3.
    print('got data from %s: %s' % (addr, hexlify(data)))
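
To see how the jump offsets in the filter play out, here is a minimal sketch of an interpreter for the handful of classic BPF opcodes used above. It is not how the kernel implements filtering; `run_filter` and `make_frame` are hypothetical helpers written for illustration, and the filter is expressed as `(code, jt, jf, k)` tuples rather than packed bytes. The synthetic frames assume a 20-byte IPv4 header without options, as the offsets in the filter do.

```python
from struct import pack, unpack_from

# Same constants as in the example above (from linux/filter.h).
BPF_LD, BPF_JMP, BPF_RET = 0x00, 0x05, 0x06
BPF_H, BPF_B, BPF_ABS = 0x08, 0x10, 0x20
BPF_JEQ, BPF_K = 0x10, 0x00

# The DHCP filter from the post, as (code, jt, jf, k) tuples.
dhcp_filter = [
    (BPF_LD | BPF_H | BPF_ABS, 0, 0, 36),       # 0: load UDP dst port
    (BPF_JMP | BPF_JEQ | BPF_K, 0, 5, 67),      # 1: not 67 -> reject (7)
    (BPF_LD | BPF_B | BPF_ABS, 0, 0, 23),       # 2: load IP protocol
    (BPF_JMP | BPF_JEQ | BPF_K, 0, 3, 0x11),    # 3: not UDP -> reject (7)
    (BPF_LD | BPF_H | BPF_ABS, 0, 0, 12),       # 4: load ethertype
    (BPF_JMP | BPF_JEQ | BPF_K, 0, 1, 0x0800),  # 5: not IPv4 -> reject (7)
    (BPF_RET | BPF_K, 0, 0, 0x0fffffff),        # 6: pass
    (BPF_RET | BPF_K, 0, 0, 0),                 # 7: reject
]


def run_filter(program, frame):
    """Interpret the program against a frame (hypothetical helper).

    Supports only the opcodes used in this post. Returns the value of the
    RET instruction that was reached; 0 means the packet is rejected.
    """
    acc = 0  # the accumulator register
    pc = 0   # index of the instruction being executed
    while pc < len(program):
        code, jt, jf, k = program[pc]
        pc += 1  # jump offsets are relative to the next instruction
        if code == BPF_LD | BPF_H | BPF_ABS:
            if k + 2 > len(frame):
                return 0  # a load past the end of the packet rejects it
            acc = unpack_from('!H', frame, k)[0]
        elif code == BPF_LD | BPF_B | BPF_ABS:
            if k + 1 > len(frame):
                return 0
            acc = unpack_from('!B', frame, k)[0]
        elif code == BPF_JMP | BPF_JEQ | BPF_K:
            pc += jt if acc == k else jf
        elif code == BPF_RET | BPF_K:
            return k
        else:
            raise ValueError('unsupported opcode 0x%02x' % code)
    return 0


def make_frame(ethertype, ip_proto, dst_port):
    """Build a zero-filled frame with only the checked fields set."""
    frame = bytearray(42)  # 14 (Ethernet) + 20 (IPv4) + 8 (UDP)
    frame[12:14] = pack('!H', ethertype)
    frame[23] = ip_proto
    frame[36:38] = pack('!H', dst_port)
    return bytes(frame)


print(run_filter(dhcp_filter, make_frame(0x0800, 0x11, 67)))  # non-zero: pass
print(run_filter(dhcp_filter, make_frame(0x0800, 0x11, 68)))  # 0: reject
```

Because every failing check jumps straight to the final reject statement, a packet with the wrong destination port is dropped after only two instructions.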

16 comments:

  1. Nice job. But I wonder where you got the info about programming BPF.
    Maybe you could help. I just want to capture all frames with custom ethertype 0x7788 OR 0x7799.
    What would the filter look like? How do you handle the OR condition?
    Many thanks in advance
    Alain

  2. Thanks! Regarding where I got the info: if I remember right, I found some inspiration in the source code of Scapy, and also by searching for SO_ATTACH_FILTER in general and reading examples.

    To achieve an OR, I think you could do something like
    - Load the ethertype into the register
    - Do a jump to a "return pass" if it's 0x7788
    - Do a jump to the "return pass" if 0x7799
    - "return reject" in case none of the jumps succeeded
    - "return pass" (the jumps would jump to here if they succeeded)

    Does it make sense?

  3. Wow, that's a fast answer.
    Here is my attempt:
    # load proto(ethertype field at byte offset 12)
    bpf_stmt(BPF_LD | BPF_H | BPF_ABS, 12),
    # CHECK IF ethertype== 0x7788
    bpf_jump(BPF_JMP | BPF_JEQ | BPF_K, 0x7788, 0, 2),
    # CHECK IF ethertype== 0x7799
    bpf_jump(BPF_JMP | BPF_JEQ | BPF_K, 0x7799, 0, 1),
    bpf_stmt(BPF_RET | BPF_K, 0x0fffffff), # pass
    bpf_stmt(BPF_RET | BPF_K, 0), # reject

    Is this correct?
    Alain

  4. I guess this is a better version.

    # load proto(ethertype field at byte offset 12)
    bpf_stmt(BPF_LD | BPF_H | BPF_ABS, 12),
    # CHECK IF ethertype== 0x7788, if equal skip 2 statements
    bpf_jump(BPF_JMP | BPF_JEQ | BPF_K, 0x7788, 2, 0),
    # CHECK IF ethertype== 0x7799, if not equal skip 1 statement
    bpf_jump(BPF_JMP | BPF_JEQ | BPF_K, 0x7799, 1, 0),
    bpf_stmt(BPF_RET | BPF_K, 0x0fffffff), # pass
    bpf_stmt(BPF_RET | BPF_K, 0), # reject
    Many thanks
    Alain

  5. I got it working. For posterity, here is my code:
    # load proto(ethertype field at byte offset 12)
    bpf_stmt(BPF_LD | BPF_H | BPF_ABS, 12),
    # CHECK IF ethertype== 0x7788, if equal skip 2 statements
    bpf_jump(BPF_JMP | BPF_JEQ | BPF_K, 0x7788, 1, 0),
    # CHECK IF ethertype== 0x7799, if not equal skip 1 statement
    bpf_jump(BPF_JMP | BPF_JEQ | BPF_K, 0xa799, 0, 1),
    bpf_stmt(BPF_RET | BPF_K, 0x0fffffff), # pass
    bpf_stmt(BPF_RET | BPF_K, 0), # reject

    Many thanks again. This saved me a lot of time!
    Alain

  6. Sorry, pasted wrong code.
    It should be OK now.
    # load proto(ethertype field at byte offset 12)
    bpf_stmt(BPF_LD | BPF_H | BPF_ABS, 12),
    # CHECK IF ethertype== 0x7788, if equal skip 1 statement
    bpf_jump(BPF_JMP | BPF_JEQ | BPF_K, 0x7788, 1, 0),
    # CHECK IF ethertype== 0x7799, if not equal skip 1 statement
    bpf_jump(BPF_JMP | BPF_JEQ | BPF_K, 0x7799, 0, 1),
    bpf_stmt(BPF_RET | BPF_K, 0x0fffffff), # pass
    bpf_stmt(BPF_RET | BPF_K, 0), # reject
    Alain

  7. Glad you got it working :-)
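
The jt and jf values in the thread above are offsets counted from the instruction after the jump, which is easy to get wrong by one. A minimal sketch that resolves them into absolute instruction indexes, assuming the filter is written as `(code, jt, jf, k)` tuples; `jump_targets` is a hypothetical helper, not part of any library:

```python
BPF_LD, BPF_JMP, BPF_RET = 0x00, 0x05, 0x06
BPF_H, BPF_ABS = 0x08, 0x20
BPF_JEQ, BPF_K = 0x10, 0x00

# The final working filter from the thread, as (code, jt, jf, k) tuples.
or_filter = [
    (BPF_LD | BPF_H | BPF_ABS, 0, 0, 12),       # 0: load ethertype
    (BPF_JMP | BPF_JEQ | BPF_K, 1, 0, 0x7788),  # 1: equal -> 3, else -> 2
    (BPF_JMP | BPF_JEQ | BPF_K, 0, 1, 0x7799),  # 2: equal -> 3, else -> 4
    (BPF_RET | BPF_K, 0, 0, 0x0fffffff),        # 3: pass
    (BPF_RET | BPF_K, 0, 0, 0),                 # 4: reject
]


def jump_targets(program):
    """Return {index: (target_if_true, target_if_false)} for each jump.

    The absolute target of a jump at `index` is index + 1 + offset.
    """
    targets = {}
    for index, (code, jt, jf, k) in enumerate(program):
        if code & 0x07 == BPF_JMP:  # the low three bits are the class
            targets[index] = (index + 1 + jt, index + 1 + jf)
    return targets


print(jump_targets(or_filter))  # {1: (3, 2), 2: (3, 4)}
```

Both jumps land on statement 3 (pass) when the ethertype matches, and only the second falls through to statement 4 (reject), which is the OR behaviour asked for. Running the same helper on the earlier attempt with offsets (2, 0) and (1, 0) shows a match jumping to statement 4, the reject.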

  8. fwiw, you can use tcpdump -d and tcpdump -dd to create expressions to use; I find it much easier than using the language directly.

    For example, let's say you want to filter all icmp pings. I would use something like this:

    export expression="icmp[icmptype] == icmp-echo";
    paste -d"\n" \
    <(tcpdump -d $expression | sed -e 's/^/# /') \
    <(tcpdump -dd $expression |tr -s '{}' '()')

    to generate the filter, which would look like:

    # (000) ldh [12]
    ( 0x28, 0, 0, 0x0000000c ),
    # (001) jeq #0x800 jt 2 jf 10
    ( 0x15, 0, 8, 0x00000800 ),
    # (002) ldb [23]
    ( 0x30, 0, 0, 0x00000017 ),
    # (003) jeq #0x1 jt 4 jf 10
    ( 0x15, 0, 6, 0x00000001 ),
    # (004) ldh [20]
    ( 0x28, 0, 0, 0x00000014 ),
    # (005) jset #0x1fff jt 10 jf 6
    ( 0x45, 4, 0, 0x00001fff ),
    # (006) ldxb 4*([14]&0xf)
    ( 0xb1, 0, 0, 0x0000000e ),
    # (007) ldb [x + 14]
    ( 0x50, 0, 0, 0x0000000e ),
    # (008) jeq #0x8 jt 9 jf 10
    ( 0x15, 0, 1, 0x00000008 ),
    # (009) ret #65535
    ( 0x6, 0, 0, 0x0000ffff ),
    # (010) ret #0
    ( 0x6, 0, 0, 0x00000000 ),

    put this in a list, like expression = [ ... ], and you can use something like:

    blob = ctypes.create_string_buffer(
        b''.join(struct.pack("HBBI", *e) for e in expression))
    address = ctypes.addressof(blob)
    to_pass = struct.pack('HL', len(expression), address)

    I was about to write about this on my blog, http://blog.yosti.net/, but seemed more useful here! Also, thanks for the interesting article.

  9. Hi Mark. Cool stuff. I had read that people were generating the filter code blocks using tcpdump, but never got around to figuring out how in practice. Thanks for sharing!
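
Instead of reshaping the tcpdump output with sed and tr, the `-dd` text can also be parsed directly in Python. A minimal sketch, assuming the output has the `{ code, jt, jf, k },` line format shown above; `parse_tcpdump_dd` and `pack_filter` are hypothetical helper names, and `SAMPLE_DD` is a hand-written sample in that format (it accepts IPv4 frames), not captured tcpdump output:

```python
import re
import struct

# Hand-written sample in the `tcpdump -dd` line format. In practice the
# text would come from running tcpdump, which this sketch does not do.
SAMPLE_DD = """\
{ 0x28, 0, 0, 0x0000000c },
{ 0x15, 0, 1, 0x00000800 },
{ 0x6, 0, 0, 0x0000ffff },
{ 0x6, 0, 0, 0x00000000 },
"""


def parse_tcpdump_dd(text):
    """Turn `tcpdump -dd` style text into (code, jt, jf, k) tuples."""
    pattern = re.compile(
        r'\{\s*(0x[0-9a-fA-F]+),\s*(\d+),\s*(\d+),\s*(0x[0-9a-fA-F]+)\s*\}')
    return [(int(code, 16), int(jt), int(jf), int(k, 16))
            for code, jt, jf, k in pattern.findall(text)]


def pack_filter(instructions):
    """Pack the tuples as consecutive struct sock_filter entries."""
    return b''.join(struct.pack('HBBI', *ins) for ins in instructions)


program = parse_tcpdump_dd(SAMPLE_DD)
blob = pack_filter(program)
print(len(program), len(blob))  # 4 instructions, 8 bytes each
```

The resulting `blob` can be handed to `ctypes.create_string_buffer` and wrapped in the fprog struct exactly as in the snippet from the comment above.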

  10. This is fantastic, thank you!

    However, I'm having a slight issue. Python can't find AF_PACKET. Is there an alternative I should be using?

    Thanks!

    -brian

    1. Maybe the operating system you are using does not support this type of raw socket. I was using Linux for this example. I see that on my Windows machines I get the error "ImportError: cannot import name AF_PACKET", so I'm afraid this just doesn't work on Windows. I think on Windows you will have to go for something like libpcap (search the net for examples of this; I haven't tried using it). I hope you find a solution!

    2. This was on OpenBSD, which has bpf, so I'm surprised about the AF_PACKET thing. I'll do some research and see what I need to do. libpcap is probably more portable though, so I might just go with that.

      Thanks!!

      -brian

  11. This comment has been removed by the author.

  12. Hello Allan, thank you for the post. How can I bind the socket to multiple interfaces? I tried "s.bind(('0.0.0.0', 0x0800))", but it is not working. Any help?

  13. Hello, I just want to share some info: if anyone is interested in sniffing traffic on multiple ports, refer to my question http://stackoverflow.com/questions/41340734/bpf-in-python-to-sniff-packets-for-multiple-tcp-ports

  14. Hello, here is a friendly tip. If you want to capture all the packet flow in your network, don't bind the socket to an interface, or you will only get half of the packets. This is a raw socket, and a raw socket doesn't need to be bound to anything; it receives data as a raw socket rather than per interface.
    If I am wrong, please let me know. The tip above is based on my own development work: when I bound the interface I only got half of the packets, which didn't fit my needs.
    That said, this article is very inspiring. Thank you for your great job, Allan. :)