Cheat Sheet: Python Web Security


This is intended as a cheat sheet covering the most common vulnerabilities that Python web developers encounter. It’s primarily targeted at Django, but the concepts are relevant to other frameworks.

A number of vulnerabilities are addressed simply by using web frameworks as recommended in the documentation, but you still need to understand where landmines lie.

General Principles

  1. Never trust user input
  2. Even when you can trust user input, do not trust user input

The OWASP Top 10 is a periodically updated resource that categorises the most common vulnerabilities.

Vulnerabilities

Common Vulnerabilities

HTML Injection / Cross Site Scripting (XSS)

  • If you allow untrusted users to display unescaped HTML they could embed JavaScript that steals the viewer’s cookie, allowing the attacker to impersonate that user
  • Because the JavaScript is hidden, the victim (eg an admin) doesn’t even know that their cookie has been stolen
  • The HTML could have been entered by the user previously (ie saved in a record that is later displayed to an admin), or it could be in a GET/POST parameter that is echoed back to the user

Mitigation:

  • Django: Template values are escaped by default; also see django.utils.html, but be extra careful about generating HTML manually within Python code
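
As a framework-agnostic illustration of what automatic escaping does, here is a minimal sketch using the standard library’s html.escape. The render_comment helper and the payload are hypothetical; in a Django project you would normally rely on the template engine instead.

```python
import html


def render_comment(comment: str) -> str:
    """Hypothetical helper: wrap untrusted text in a paragraph tag."""
    # Escape &, <, > and quotes so the browser treats the text as data, not markup.
    return "<p>" + html.escape(comment, quote=True) + "</p>"


payload = "<script>steal(document.cookie)</script>"
rendered = render_comment(payload)
print(rendered)
```

The script tag arrives in the browser as inert text rather than as an executable element.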

Script Tags

  • A common mistake is embedding JSON data inside <script> tags
    • There are a number of pitfalls with escaping because default template escaping is for HTML, not for JavaScript inside HTML.
      • <script>let x = "{{ x }}"; console.log(x);</script>
        • This will not work because HTML entities are not decoded inside script tags, so the JavaScript sees the escaped text rather than the original value
      • <script>let x = "{{ x | escapejs }}"; console.log(x);</script>
        • This does work for plain strings as long as the value sits inside a single or double quoted string; a JS backquoted (template) string is unsafe
    • The output of json.dumps() is not HTML-safe
      • A string that contains </script><script>alert('foo'); will cause the browser to terminate the script early and the subsequent embedded script will be executed

Mitigation:

  • If you have a plain string then:
    • Use the escapejs filter inside a single or double quoted string.
  • If you have a JSON structure then you can:
    • Embed it directly in a script tag using the third-party library escapejson (there’s no built-in way to do this safely)
    • Slightly less efficient alternatives:
      • json.dumps() the value in Python and then inject it into the template with let x = JSON.parse("{{ x | escapejs }}");
      • Pass the value to the template and embed it in the document using the json_script filter, which serialises and escapes it for you
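
To show why plain json.dumps() output is unsafe and what HTML-safe JSON looks like, here is a minimal standard-library sketch. The json_for_script helper is hypothetical; it applies the same general idea as json_script (replacing <, > and & with JSON unicode escapes) but is not Django’s implementation.

```python
import json

# Characters that could close the <script> element or start an HTML entity.
_SCRIPT_ESCAPES = {ord("<"): "\\u003C", ord(">"): "\\u003E", ord("&"): "\\u0026"}


def json_for_script(value) -> str:
    """Hypothetical helper: serialise to JSON that is safe inside a script tag."""
    return json.dumps(value).translate(_SCRIPT_ESCAPES)


data = {"name": "</script><script>alert('foo');"}
encoded = json_for_script(data)
print(encoded)
```

The escaped output is still valid JSON, so JSON.parse (or json.loads) recovers the original value while the browser never sees a literal </script>.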

SQL Injection

  • Given a hypothetical query like query = f"SELECT * FROM tblUser WHERE (username='{username}') AND (password='{password}')", an attacker can submit a username of admin and a password of ' OR ''=' (note the quotes in the password).
    • A simple substitution of these values produces (password='' OR ''=''), which is always true, so the attacker can log in as admin without a password.
    • There are automated tools that try variations of this.
  • In many situations, attackers can use this to run arbitrary queries against your database.

Mitigation:

  • Use query parameters that enforce type integrity
  • Don’t do the escaping yourself; if you don’t get character encoding and escape sequences perfectly correct you may still be vulnerable
  • Django:
    • As long as you’re not injecting manually generated SQL the ORM will escape values automatically
    • Raw SQL - read the documentation
      • Bad: MyTable.objects.raw('SELECT * FROM mytable WHERE value=%s' % value)
      • Good: MyTable.objects.raw('SELECT * FROM mytable WHERE value=%s', [value])
      • Note the subtle difference near the end of the line: in the first, Python is doing the substitution with the % operator; in the second, the parameters are passed separately as a list and the ORM does the substitution.
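
The difference between string substitution and query parameters can be reproduced with the standard library’s sqlite3 module. This is a sketch only: the table mirrors the hypothetical tblUser query above, and the plaintext password column exists purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblUser (username TEXT, password TEXT)")
conn.execute("INSERT INTO tblUser VALUES ('admin', 'correct-horse')")

username, password = "admin", "' OR ''='"

# Vulnerable: Python builds the SQL, so the quotes in the password rewrite the query.
unsafe_sql = (
    f"SELECT * FROM tblUser WHERE (username='{username}') AND (password='{password}')"
)
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the driver binds the values, so the password is only ever compared as data.
safe_rows = conn.execute(
    "SELECT * FROM tblUser WHERE (username=?) AND (password=?)",
    (username, password),
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The injected version returns the admin row; the parameterised version returns nothing.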

Email Header Injection

  • If you don’t escape email headers, attackers can insert a newline character that allows them to construct whatever arbitrary email they want. Spammers like using this so that they can use your web server as a relay to send whatever emails they like.

Mitigation:

  • Don’t generate emails yourself (or invoke sendmail); use a library that takes care of this for you.
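
As one example of a library doing this for you, the standard library’s email.message.EmailMessage (with its default policy) refuses header values containing line breaks. The sketch below uses a hypothetical build_message helper and placeholder example.com addresses.

```python
from email.message import EmailMessage


def build_message(subject: str, body: str) -> EmailMessage:
    """Hypothetical helper: build a message from user-supplied subject and body."""
    msg = EmailMessage()
    msg["From"] = "noreply@example.com"
    msg["To"] = "support@example.com"
    msg["Subject"] = subject  # raises ValueError if it contains CR or LF
    msg.set_content(body)
    return msg


try:
    build_message("Hello\nBcc: victim@example.com", "spam")
    rejected = False
except ValueError:
    rejected = True

print(rejected)
```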

Shell Parameter Execution

  • If you pass parameters to command line programs, certain characters are dangerous (eg backquotes can be used to run arbitrary programs).

Mitigation:

  • Don’t use shell=True unless needed
  • Use shlex.quote() on parameters passed to shell commands
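
A minimal sketch of both mitigations follows. The child command simply echoes its argument back via the current Python interpreter so that the example is portable; the user_input value is a made-up hostile string.

```python
import shlex
import subprocess
import sys

user_input = "`touch /tmp/pwned`; echo oops"

# Preferred: pass an argument list and leave shell=False (the default).
# The child process receives user_input as one literal argument.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
    capture_output=True,
    text=True,
    check=True,
)

# If a shell really is required, quote each untrusted parameter first.
quoted = shlex.quote(user_input)

print(result.stdout.strip())
print(quoted)
```

Note that shlex.quote() targets POSIX shells; it is not a safe quoting mechanism for cmd.exe.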

Client Side Manipulation

  • You cannot trust any code that runs on the client; it is trivial for an attacker to get Javascript to do anything they want.

Mitigation:

  • Any client-side validation (eg Javascript) must be repeated on the server
  • There is no difference between GET and POST requests in this respect; both can be manipulated on the client

Cross Site Request Forgery (CSRF / XSRF)

  • If a user visits a malicious site it could trigger a request to the site that the user is authenticated on
  • eg: <img src="http://mysite.com/admin/createuser?username=foo&password=bar&admin=1" />
    • If an admin loads that image while logged in then the createuser action will create an account for the attacker to use

Mitigation:

  • Don’t use JSONP (see below)
  • GET requests should not modify data
  • non-GET requests should include a randomly generated CSRF token that is not contained in the URL. Requests should be rejected if this token does not match the session token.
  • If you use CsrfViewMiddleware django will do this for you
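
The token check itself is simple. The sketch below shows the general idea with the standard library; the function names are hypothetical and this is not how CsrfViewMiddleware is implemented internally.

```python
import hmac
import secrets


def new_csrf_token() -> str:
    """Generate an unguessable token to store in the session and embed in forms."""
    return secrets.token_urlsafe(32)


def csrf_token_matches(session_token: str, submitted_token: str) -> bool:
    """Compare in constant time so that response timing does not leak the token."""
    if not session_token or not submitted_token:
        return False
    return hmac.compare_digest(session_token.encode(), submitted_token.encode())


token = new_csrf_token()
print(csrf_token_matches(token, token))
```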

Server Side Request Forgery (SSRF)

  • A server may be induced to make http(s) requests that expose sensitive data or have unexpected side effects
  • eg: If the AWS metadata endpoint is exposed to a user then security credentials may be exposed, allowing full access to the AWS account
  • eg: If an S3 bucket is private (eg database backups) the user may be able to induce the server to expose contents

Mitigation:

  • Don’t make requests to endpoints that depend on data the user provided
  • If requests do need to be made to user-specified endpoints then
    • Be aware that you are increasing exposure risk
    • Validate the endpoint against a known good list of endpoints first
    • Validate the filetype and do not expose downloaded content
    • Use a separate container that has limited permissions and is isolated from other files (this can be hard to get right)
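
Validating against a known good list might look like the sketch below. The ALLOWED_HOSTS set and the is_allowed_endpoint helper are hypothetical, and the check is deliberately strict (HTTPS only, no literal IP addresses).

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allow-list; a real deployment would load this from configuration.
ALLOWED_HOSTS = {"api.example.com", "images.example.com"}


def is_allowed_endpoint(url: str) -> bool:
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False
    if parts.scheme != "https" or host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        pass  # not an IP literal; fall through to the allow-list check
    else:
        return False  # literal IPs (eg 169.254.169.254) are never allowed
    return host in ALLOWED_HOSTS


print(is_allowed_endpoint("https://api.example.com/v1"))
```

A check like this only covers the URL you were given; if the HTTP client follows redirects, each redirect target needs the same validation.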

Connection Interception / Session Stealing

  • If a user accesses the site from an insecure location (eg public wifi), others can steal their cookie/session and become that user

Mitigation:

  • Serve the site over HTTPS and set the Secure (HTTPS-only) flag on session cookies
  • Do not assume that an obscure URL is enough to keep random people out
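
In Django the relevant cookie flags are controlled from settings. The fragment below is a sketch of commonly used values, assuming the whole site is served over HTTPS and that SecurityMiddleware is enabled; check each setting against your deployment before copying it.

```python
# settings.py (fragment)
SESSION_COOKIE_SECURE = True    # only send the session cookie over HTTPS
CSRF_COOKIE_SECURE = True       # same for the CSRF cookie
SESSION_COOKIE_HTTPONLY = True  # hide the session cookie from JavaScript
SECURE_SSL_REDIRECT = True      # redirect plain HTTP requests to HTTPS
```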

Less Common Vulnerabilities

Passwords

  • Never store passwords in plaintext; if the attacker gets in then they will have access to everyone’s account. This is particularly a problem when people use the same password on multiple sites.

Mitigation:

  • Use the framework’s password handling (it will use strong password hashing with salt)
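
For readers curious about what “strong password hashing with salt” means in practice, here is a minimal standard-library sketch using PBKDF2. The function names and the iteration count are assumptions for illustration; in a real project, use the framework’s password handling rather than this code.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune upwards for your hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); store both, never the password itself."""
    salt = os.urandom(16)  # a fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(digest, expected)


salt, digest = hash_password("hunter2")
print(verify_password("hunter2", salt, digest))
```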

Brute Force Attacks

  • Attackers may attempt to guess passwords

Mitigation:

  • Use a slow password hashing function
  • Rate limit client requests
    • Rate limiting accounts (rather than clients) can be a problem as it allows attackers to create trivial DoS attacks
  • Ensure minimum password complexity
    • This needs to be balanced against users’ tendency to store passwords in an insecure manner

File Permissions

  • Particularly on shared servers, leaving files world readable can be a problem

Mitigation:

  • Restrict world/group access to files using standard unix permissions

Exposing Passwords/Tokens

  • Never commit passwords or access tokens to a repository
    • You do not know who will have access to source code
    • Black hats scan publicly accessible repositories for tokens and attempt to exploit them
      • AWS credentials are particularly popular (this is why you should have usage alarms set up)
      • Even something as innocuous as slack can be exploited

Mitigation:

  • Defense in depth: don’t just rely on passwords
  • Use environment variables or config files to specify local data
  • Use .gitignore on files that contain sensitive data
    • Use completely separate files for sensitive/nonsensitive data; it is easy to accidentally commit sensitive data if they are all in the same file
  • If you accidentally commit sensitive data to a repository
    • If you haven’t yet pushed upstream, rewrite git history to remove it from all commits before pushing
    • If you have pushed upstream then you need to regenerate the password/key/token: due to the way git packs work, clients may still be able to see sensitive data even if you have removed it from history

Hash Length Extension Attack

  • Some hash types (including MD5, SHA-1 and SHA-2 variants such as SHA-256) are subject to a length extension attack when a message is signed as hash(secret + message)
  • An attacker who has a legitimately signed message can append data to it and compute a valid hash for the extended message without knowing the secret

Mitigation:

  • Use HMAC for signing
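
A minimal sketch of HMAC signing with the standard library; SECRET_KEY is a placeholder and the sign/verify helpers are hypothetical names.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-random-secret"  # placeholder for illustration


def sign(message: bytes) -> str:
    # HMAC mixes the key into the hash in a way that defeats length extension,
    # unlike the naive hashlib.sha256(SECRET_KEY + message).
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def verify(message: bytes, signature: str) -> bool:
    return hmac.compare_digest(sign(message), signature)


signature = sign(b"user=alice")
print(verify(b"user=alice", signature))
```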

XML (Various)

  • An XML parser that handles all valid inputs is also vulnerable to various attacks, including:
    • Exposing file contents: <!ENTITY externalEntity SYSTEM "file:///home/user/some-secret-file">
    • Memory exhaustion

Mitigation:

  • Do not use regular XML parsers for user-provided input (or refuse to parse user-provided XML)
  • Use defusedxml or similar
  • See the XML vulnerabilities summary in the Python documentation

Hashing and Weak Typing

  • When using loose comparison (==), PHP and JavaScript will interpret strings such as "0e45" as 0 x 10^45 = 0
    • All strings that match the regex /^0+[eE][0-9]+$/ will be interpreted as equal to zero
  • An attacker can generate a hash that matches this pattern, and in some cases the fact that 0 == ‘0e45’ can be used to bypass security

Mitigation:

  • Use === when comparing hash values

Redirect spoofing

  • If a page accepts a redirect URL, attackers can replace it with a spoofed page; viewers will often not notice the URL change
  • eg if an attacker can get an admin to click on http://www.mysite.com/user/edit?successUrl=http%3A%2F%2Fwww.badsite.com (eg sending it in a support email) they can make the URL long, and the admin will often not notice the site change. If the bad site looks like the real one, the admin can be tricked into entering their details
  • If the page simply emits the successUrl in the site content, this may also be a vector for an XSS attack (see above)

Mitigation:

  • Validate redirect URLs
  • django.utils.http.url_has_allowed_host_and_scheme (doesn’t appear to be documented but is present and is used in various places)
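
For non-Django code, the same kind of check can be sketched with the standard library. This is a simplified illustration, not Django’s implementation; ALLOWED_REDIRECT_HOSTS and is_safe_redirect are hypothetical names, and in a Django project the built-in function is the better choice.

```python
from urllib.parse import urlsplit

ALLOWED_REDIRECT_HOSTS = {"www.mysite.com"}  # assumed: the site's own host(s)


def is_safe_redirect(url: str) -> bool:
    # Browsers may treat backslashes like slashes and ignore control characters,
    # so reject both rather than trying to normalise them.
    if not url or "\\" in url or any(ord(ch) < 0x20 for ch in url):
        return False
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False
    if not parts.scheme and not parts.netloc:
        # Relative URL: accept only absolute paths on this site, and never "//...".
        return url.startswith("/") and not url.startswith("//")
    return parts.scheme in {"http", "https"} and host in ALLOWED_REDIRECT_HOSTS


print(is_safe_redirect("http://www.badsite.com"))
```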

Cross Domain Manipulation

Regular Expression Denial of Service

  • If you use regexes on any user input then you may open yourself to DoS if your regex library uses backtracking (which most do)
  • Evil regexes contain:
    • A capture group with [variable count] repetition; and
    • Inside the capture group, either:
      • [variable count] repetition; or
      • Overlapping alternates
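
To make the pattern concrete, the sketch below shows an evil regex next to an equivalent safe one. The patterns, the MAX_INPUT_LENGTH limit and the matches helper are illustrative assumptions; the evil pattern is only ever run here on a short matching string, since long non-matching inputs are exactly what makes it slow.

```python
import re

# Evil: a repeated group whose body is itself repeated. On input such as
# "aaaa...ab" a backtracking engine tries exponentially many ways to split the "a"s.
EVIL = re.compile(r"^(a+)+$")

# Accepts the same strings, but with a single repetition backtracking stays linear.
SAFE = re.compile(r"^a+$")

MAX_INPUT_LENGTH = 256  # assumed limit; capping input length bounds the worst case


def matches(text: str) -> bool:
    if len(text) > MAX_INPUT_LENGTH:
        return False
    return SAFE.match(text) is not None


print(matches("aaaa"), matches("aaab"))
```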

Information Leakage

  • An attacker can infer things from HTTP response codes by guessing URLs

Mitigation:

  • If in doubt, return a generic 404 (not found) rather than 401/403 (unauthorized/forbidden)

JSONP

Mitigation:

  • Return regular JSON
  • Use CORS instead

eval

  • It is extremely easy to introduce bugs that allow attackers to run arbitrary code

Mitigation:

  • Don’t use eval()
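
If the goal is only to parse a literal value (a number, string, list or dict), the standard library’s ast.literal_eval is a commonly used alternative that does not execute code. The parse_literal wrapper below is a hypothetical helper; note that very large or deeply nested input can still exhaust memory or the stack, so this is not a substitute for input limits.

```python
import ast


def parse_literal(text: str):
    """Parse a Python literal without executing it; raise ValueError otherwise."""
    try:
        return ast.literal_eval(text)
    except (ValueError, TypeError, SyntaxError, MemoryError, RecursionError) as exc:
        raise ValueError("not a plain Python literal") from exc


print(parse_literal("[1, 2, 3]"))

try:
    parse_literal("__import__('os').system('id')")
    call_rejected = False
except ValueError:
    call_rejected = True

print(call_rejected)
```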

File Injection

  • If clients can upload files to a server then the server may be tricked into executing them
  • Commonly occurs on a server that’s configured to run any PHP files it sees (even if the app is not written in PHP)
  • Even if the server is configured to not execute files in an upload directory, the site may be moved to a new server and new configuration overlooked
  • Often introduced through 3rd party extensions (eg TinyMCE file upload)

Mitigation:

  • Make static file serving configuration resilient to server configuration
  • Restrict file upload extensions to whitelist only
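
One way to apply an extension whitelist is sketched below. The ALLOWED_EXTENSIONS set and safe_stored_name helper are hypothetical, and storing the file under a server-generated name is an extra assumption of this sketch rather than something the list above requires.

```python
import secrets
from pathlib import PurePosixPath
from typing import Optional

ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".pdf"}  # assumed whitelist


def safe_stored_name(original_name: str) -> Optional[str]:
    """Return a server-generated filename, or None if the extension is not allowed."""
    ext = PurePosixPath(original_name).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return None
    # Discard the client's name entirely so that tricks like "shell.php.png"
    # cannot leave an executable-looking extension on disk.
    return secrets.token_hex(16) + ext


print(safe_stored_name("image.png.php"))
```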

Domain Ownership Impersonation

  • SSL Certificates can be obtained by
    • Showing that you have ownership of a special email address on that domain; or
    • Showing that you can create arbitrary files on that domain

Mitigation:

  • Do not let users create arbitrary email addresses on a given domain
  • Do not let users create arbitrary files in the root of a domain

Session Fixation

  • If an attacker can set a known session cookie value then when the user logs in the attacker can use their session
  • Can be exploited through content on subdomains shadowing parent session cookies

Mitigation:

  • Regenerate the session key on login (most frameworks do this by default)
  • Don’t treat user-controlled subdomain content as isolated from the parent domain

Character Encoding Attacks

  • Can exploit differences in character set handling, eg
    • HTML handling of unicode whitespace
    • JS vs JSON unicode whitespace handling
    • Differing assumptions about case insensitivity conversions (eg in Turkish)
    • Differing char encoding between database server/client (eg MySQL)
  • Sometimes even the same database can differ (Postgres can use libicu or glibc for unicode handling, and in some cases they differ)
  • This can sometimes be exploited to trick a site into sending a password reset email to the wrong account (differing only by an accented character)

Mitigation:

  • Use consistent encoding and parsing; fail if you’re not sure rather than making assumptions
  • Use libraries: other people have already encountered & fixed these problems

File Parsing

  • Some file formats are not safe to parse from untrusted sources
    • Python pickle can lead to remote code execution
    • XML: safe in theory, but parsing edge cases can lead to DoS (see above)

Mitigation:

  • Use a file format specifically designed for transport with limited functionality
    • JSON
    • YAML (although it has similar DoS issues to XML)
    • Protocol Buffers
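
The pickle risk is easy to demonstrate harmlessly. In the sketch below the hypothetical Exploit class makes unpickling call eval on a fixed arithmetic string; a real attacker would substitute something like os.system.

```python
import json
import pickle


class Exploit:
    # pickle calls __reduce__ to learn how to rebuild an object; the callable it
    # returns is invoked with the given arguments while loading.
    def __reduce__(self):
        return (eval, ("1 + 1",))


malicious_bytes = pickle.dumps(Exploit())
result = pickle.loads(malicious_bytes)  # runs eval("1 + 1") instead of rebuilding Exploit
print(result)

# JSON can only ever produce plain data: dicts, lists, strings, numbers, booleans, None.
parsed = json.loads('{"a": 1}')
print(parsed)
```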

Time of Check / Time of Use (TOCTOU)

  • If you check some safety condition and then introduce a delay before depending on that condition, that condition could change in the meantime
    • eg: “Does this file either not exist or belong to me? If so, create/open the file for writing.”
      • There is a delay between checking the file and opening it

Mitigation:

  • Perform operations atomically where possible
  • Acquire resources/locks then check safety conditions and abort if they aren’t met
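
For the file example, the check and the creation can be merged into a single atomic operation. A minimal sketch, with create_exclusive as a hypothetical helper and a temporary directory standing in for the real location:

```python
import os
import tempfile


def create_exclusive(path: str, data: bytes) -> bool:
    """Create path only if it does not already exist; return False otherwise."""
    try:
        # O_CREAT | O_EXCL makes "check that it is absent" and "create it" one step.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return False
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "report.txt")
    first = create_exclusive(target, b"hello")
    second = create_exclusive(target, b"overwrite?")

print(first, second)
```

The second call fails cleanly instead of racing with whoever created the file in between.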

Email Bounce Attack

  • If you let an attacker control an email bounce address (either Return-Path or From if no explicit Return-Path) then attackers can read the contents of emails
    • They can flood a recipient’s mailbox until it’s full, and then read the bounces (which contain the original email)
    • If that includes password reset emails then accounts can be compromised

Mitigation:

  • Do not let users control the bounce address (or From address); or
  • Do not include any sensitive information in emails
    • This is risky as you may accidentally reveal things you didn’t think about, eg an email alias’ real mailbox

Reflected File Download

  • If you reflect an upload filename back to the user without properly escaping then an attacker can inject HTTP headers, or terminate the HTTP headers and control the content
  • Content-Disposition attacks
    • The file payload can be any HTML (including javascript to steal auth cookies or take other actions), including spoofing HTML to get the user to take unexpected actions
  • <a download="..."> attacks
    • Even if you set the file type using Content-Disposition, if the href is not escaped correctly then the download attribute can override the way the client interprets the file.
    • This would still require the user to open the downloaded .bat or .cmd file, but they may have had an expectation that this was a safe JSON file from your site.

Mitigation:

  • Don’t create Content-Disposition headers manually
  • Escape html tag attributes
  • Django handles this by default

Other Best Practices

  • Minor changes that will stop automated attacks
    • Move the admin url to a less obvious location
    • Do not use default account names (eg admin, dev, administrator)
      • If an account cannot be renamed, create a second account with desired permissions and reduce the permissions of the primary account
  • Use computer generated passwords; humans’ “random” passwords are often not.
Published: 2023-09-03