Cheat Sheet: Python Web Security
This is intended as a cheat sheet covering the most common vulnerabilities that Python web developers encounter. It’s primarily targeted at Django but the concepts are relevant to other frameworks.
A number of vulnerabilities are addressed simply by using web frameworks as recommended in the documentation, but you still need to understand where landmines lie.
General Principles
- Never trust user input
- Even when you can trust user input, do not trust user input
The OWASP Top 10 is a periodically updated resource that categorises the most common vulnerabilities.
Vulnerabilities
Common Vulnerabilities
HTML Injection / Cross Site Scripting (XSS)
- If you allow untrusted users to display unescaped HTML they could embed javascript that steals the viewer’s cookie and the attacker then pretends to be that user
- Because the javascript is hidden, the admin doesn’t even know that their cookie has been stolen
- The HTML could have been entered by the user previously (ie saved in a record) that is displayed back to an admin, or it could be in a GET/POST parameter that is echoed back to the user
Mitigation:
- Django: Template values are escaped by default; also see django.utils.html but be extra careful about generating html manually within python code
Script Tags
- A common mistake is embedding JSON data inside <script> tags
- There are a number of pitfalls with escaping because default template escaping is for HTML, not for JavaScript inside HTML.
<script>let x = "{{ x }}"; console.log(x);</script>
- This will not work due to the fact that escaped HTML entities don’t work inside script tags
<script>let x = "{{ x | escapejs }}"; console.log(x);</script>
- This does work for plain strings as long as the string is a single or double quoted string: a JS backquoted string is unsafe
- Using json.dumps() is not html-safe
- A string that contains </script><script>alert('foo'); will cause the browser to terminate the script early and the subsequent embedded script will be executed
Mitigation:
- If you have a plain string then:
- Use the escapejs filter inside a single or double quoted string.
- If you have a JSON structure then you can:
- There’s no built-in way to safely embed a JSON string directly in a script tag
- The third-party library escapejson provides this functionality
- Slightly less efficient: json.dumps() the value in Python and then inject it into the template with let x = JSON.parse("{{ x | escapejs }}");
- Embed the data in the document using the json_script filter, which serializes the value itself (so pass it the Python object rather than a json.dumps() string), and read it back in JavaScript with JSON.parse()
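The escaping that makes JSON safe inside a script tag can be sketched in plain Python. This is a minimal illustration that assumes the only characters needing attention are <, > and &; the name json_for_script is hypothetical, and inside a Django template the json_script filter remains the safer choice.

```python
import json

# Replace the characters that let a value break out of a <script> element
# with their \uXXXX forms, which JSON parsers decode back to the original.
_SCRIPT_ESCAPES = {
    ord("<"): "\\u003C",
    ord(">"): "\\u003E",
    ord("&"): "\\u0026",
}


def json_for_script(value):
    """Serialize value to JSON that cannot close a <script> element early."""
    return json.dumps(value).translate(_SCRIPT_ESCAPES)


payload = {"x": "</script><script>alert('foo');"}
encoded = json_for_script(payload)
```

These characters only ever occur inside JSON strings, so replacing them does not change the decoded value.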
SQL Injection
- Given a hypothetical query like
query = f"SELECT * FROM tblUser WHERE (username='{username}') AND (password='{password}')"
an attacker can submit a username of admin and a password of ' OR ''=' (note the quotes in the password).
- A simple substitution of these values will mean the attacker can log in as admin without a password.
- There are automated tools that try variations of this.
- In many situations, attackers can use this to run arbitrary queries against your database.
Mitigation:
- Use query parameters that enforce type integrity
- Don’t do the escaping yourself; if you don’t get character encoding and escape sequences perfectly correct you may still be vulnerable
- Django:
- As long as you’re not injecting manually generated SQL the ORM will escape values automatically
- Raw SQL - read the documentation
- Bad:
MyTable.objects.raw('SELECT * FROM mytable WHERE value=%s' % value)
- Good:
MyTable.objects.raw('SELECT * FROM mytable WHERE value=%s', [value])
- Note the subtle difference near the end of the line: in the first, Python is doing the substitution with the % operator; in the second, the ORM passes the value to the database separately as a query parameter.
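The difference between string substitution and query parameters can be seen outside Django too. The sketch below uses the standard library sqlite3 module with an in-memory database; the table contents and the plaintext password are made up purely for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblUser (username TEXT, password TEXT)")
# Plaintext password only to keep the demonstration short.
conn.execute("INSERT INTO tblUser VALUES (?, ?)", ("admin", "s3cret"))

username = "admin"
password = "' OR ''='"  # attacker-supplied value

# Vulnerable: the attacker's quotes become part of the SQL text.
unsafe_query = (
    f"SELECT * FROM tblUser WHERE (username='{username}') "
    f"AND (password='{password}')"
)
unsafe_rows = conn.execute(unsafe_query).fetchall()

# Safe: the values travel separately from the SQL text as parameters.
safe_rows = conn.execute(
    "SELECT * FROM tblUser WHERE (username=?) AND (password=?)",
    (username, password),
).fetchall()
```

The first query returns the admin row even though the password is wrong; the parameterized query returns nothing.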
Email Header Injection
- If you don’t escape email headers, attackers can insert a newline character that allows them to construct whatever arbitrary email they want. Spammers like using this so that they can use your web server as a relay to send whatever emails they like.
Mitigation:
- Don’t generate emails yourself (or invoke sendmail); use a library that takes care of this for you.
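As one illustration of a library doing this for you, Python's standard email.message.EmailMessage (with its default policy) refuses header values that contain line breaks. The addresses and the build_message helper below are hypothetical.

```python
from email.message import EmailMessage


def build_message(subject, body):
    """Build a message; header values with embedded newlines are rejected."""
    msg = EmailMessage()
    msg["From"] = "noreply@example.com"
    msg["To"] = "user@example.com"
    msg["Subject"] = subject  # raises ValueError on an embedded line break
    msg.set_content(body)
    return msg


try:
    build_message("Hello\r\nBcc: victim@example.com", "body text")
    injection_rejected = False
except ValueError:
    injection_rejected = True
```

Django's own mail functions perform a similar check and raise an error for header values containing newlines.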
Shell Parameter Execution
- If you pass parameters to command line programs, certain characters are dangerous (eg backquotes can be used to run arbitrary programs).
Mitigation:
- Don’t use shell=True unless needed
- Use shlex.quote() on parameters passed to shell commands
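Both mitigations can be sketched with the standard library. The hostile value is made up, and the child process simply echoes its argument back so that the example does not depend on any particular command-line tool.

```python
import shlex
import subprocess
import sys

hostile = "report.txt; echo INJECTED `id`"

# Preferred: pass an argument list so that no shell ever parses the value.
completed = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", hostile],
    capture_output=True,
    text=True,
    check=True,
)
echoed = completed.stdout.rstrip("\n")

# If a shell command string is unavoidable, quote each parameter first.
command = "ls -l -- " + shlex.quote(hostile)
```

The child process receives the hostile string as a single literal argument, and the quoted command string splits back into exactly four tokens. Note that shlex.quote() targets POSIX shells.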
Client Side Manipulation
- You cannot trust any code that runs on the client; it is trivial for an attacker to get Javascript to do anything they want.
Mitigation:
- Any client-side validation (eg Javascript) must be repeated on the server
- There is no difference between GET and POST URLs; both can be manipulated on the client
Cross Site Request Forgery (CSRF / XSRF)
- If a user visits a malicious site it could trigger a request to the site that the user is authenticated on
- eg:
<img src="http://mysite.com/admin/createuser?username=foo&password=bar&admin=1" />
- If an admin loads that image while logged in then the createuser action will create an account for the attacker to use
Mitigation:
- Don’t use JSONP (see below)
- GET requests should not modify data
- non-GET requests should include a randomly generated CSRF token that is not contained in the URL. Requests should be rejected if this token does not match the session token.
- If you use CsrfViewMiddleware django will do this for you
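The token check itself is small. The following is a sketch of the idea only, not of how CsrfViewMiddleware is implemented; both function names are hypothetical and session storage is left out.

```python
import hmac
import secrets


def new_csrf_token():
    """Generate an unguessable token to store in the user's session."""
    return secrets.token_urlsafe(32)


def csrf_token_is_valid(session_token, submitted_token):
    """Compare the submitted token with the session token in constant time."""
    if not session_token or not submitted_token:
        return False
    return hmac.compare_digest(
        session_token.encode("utf-8"), submitted_token.encode("utf-8")
    )
```

A constant-time comparison avoids leaking how many leading characters of a guess were correct.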
Server Side Request Forgery (SSRF)
- A server may be induced to make http(s) requests that expose sensitive data or have unexpected side effects
- eg: If the AWS IMDS metadata endpoint is exposed to a user then security credentials may be exposed, allowing full access to the AWS account
- eg: If an S3 bucket is private (eg database backups) the attacker may be able to induce the server to expose contents
Mitigation:
- Don’t make requests to endpoints that depend on data the user provided
- If requests do need to be made to user-specified endpoints then
- You are increasing exposure risk
- Only allow user-provided endpoints against a known good list
- Only validate against fully constructed URLs (if you allow the user to specify only a domain part then it might include characters like @, ?, # or / which will change how the final URL is interpreted)
- Validate the response type, and do not expose downloaded content
- Do not re-use the same S3 bucket for different purposes
- Do not allow custom headers (on AWS in particular this helps prevent attacks against IMDSv2)
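Validating a fully constructed URL against a known good list might look like the sketch below. The allowed host and the is_allowed_url name are assumptions for illustration; a real deployment would also need to consider redirects and DNS resolution.

```python
from urllib.parse import urlsplit

# Hypothetical known good list of hosts the server may contact.
ALLOWED_HOSTS = {"api.example.com"}


def is_allowed_url(url):
    """Check the final URL, not just the fragment the user supplied."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    return (
        parts.scheme == "https"
        and parts.hostname in ALLOWED_HOSTS
        # Reject "user@host" forms, which can disguise the real host.
        and parts.username is None
        and parts.password is None
    )
```

For example, https://api.example.com@169.254.169.254/latest/meta-data/ is rejected because the parsed host is 169.254.169.254, not api.example.com.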
Connection Interception / Session Stealing
- If a user accesses the site from an insecure location (eg public wifi), others can steal their cookie/session and become that user
Mitigation:
- Set the HTTPS-only (Secure) flag on session cookies
- Do not assume that an obscure URL is enough to keep random people out
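In Django the HTTPS-only cookie flag is controlled from the settings module. A minimal settings.py fragment, assuming the whole site is served over HTTPS:

```python
# settings.py fragment (assumes the site is only ever served over HTTPS)

# Only send the session cookie over HTTPS connections.
SESSION_COOKIE_SECURE = True

# Apply the same restriction to the CSRF cookie.
CSRF_COOKIE_SECURE = True

# Redirect plain HTTP requests to HTTPS.
SECURE_SSL_REDIRECT = True
```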
Less Common Vulnerabilities
Passwords
- Never store passwords in plaintext; if the attacker gets in then they will have access to everyone’s account. This is particularly a problem when people use the same password on multiple sites.
Mitigation:
- Use the framework’s password handling (it will use strong password hashing with salt)
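To show what "strong password hashing with salt" means in practice, here is a standard library sketch using PBKDF2. It is an illustration only and not a replacement for the framework's implementation; the function names and the default iteration count are assumptions.

```python
import hashlib
import hmac
import os


def hash_password(password, *, iterations=600_000):
    """Return (salt, iterations, digest) for storage; never the password."""
    salt = os.urandom(16)  # a fresh random salt for every password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, iterations, digest


def verify_password(password, salt, iterations, expected_digest):
    """Recompute the digest with the stored salt and compare safely."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return hmac.compare_digest(candidate, expected_digest)
```

The salt means identical passwords produce different digests, and the iteration count makes each guess expensive for an attacker.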
Brute Force Attacks
- Attackers may attempt to guess passwords
Mitigation:
- Use a slow password hashing function
- Rate limit client requests
- Rate limiting accounts (rather than clients) can be a problem as it allows attackers to create trivial DoS attacks
- Ensure minimum password complexity
- This needs to be balanced against users’ tendency to store passwords in an insecure manner
File Permissions
- Particularly on shared servers, leaving files world readable can be a problem
Mitigation:
- Restrict world/group access to files using standard unix permissions
Exposing Passwords/Tokens
- Never commit passwords or access tokens to a repository
- You do not know who will have access to source code
- Black hats scan publicly accessible repositories for tokens and attempt to exploit them
- AWS credentials are particularly popular (this is why you should have usage alarms set up)
- Even something as innocuous as slack can be exploited
Mitigation:
- Defense in depth: don’t just rely on passwords
- Use environment variables or config files to specify local data
- Use .gitignore on files that contain sensitive data
- Use completely separate files for sensitive/nonsensitive data; it is easy to accidentally commit sensitive data if they are all in the same file
- If you accidentally commit sensitive data to a repository
- If you haven’t yet pushed upstream, rewrite git history to remove it from all commits before pushing
- If you have pushed upstream then you need to regenerate the password/key/token: due to the way git packs work, clients may still be able to see sensitive data even if you have removed it from history
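Reading a secret from an environment variable, rather than from a committed file, can be as simple as the sketch below. The helper name and the variable name used in the usage note are hypothetical.

```python
import os


def get_required_secret(name):
    """Fetch a secret from the environment, failing loudly if it is absent."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

A settings module could then call get_required_secret("EXAMPLE_API_TOKEN") at startup, so that a misconfigured deployment fails immediately instead of running with an empty credential.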
Hash Length Extension Attack
- Some hash types (including MD5, SHA-1, SHA-2) are subject to a length extension attack
- An attacker who has a legitimately signed message and its hash can append data to the message and compute a valid hash for the extended message, without knowing the secret that was hashed along with it
Mitigation:
- Use HMAC for signing
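Signing with HMAC needs only the standard library. A minimal sketch, with hypothetical function names and SHA-256 chosen as the underlying hash:

```python
import hashlib
import hmac


def sign(key, message):
    """Return a hex HMAC-SHA256 signature for message (both bytes)."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key, message, signature):
    """Check a signature using a constant-time comparison."""
    expected = sign(key, message)
    return hmac.compare_digest(
        expected.encode("utf-8"), signature.encode("utf-8")
    )
```

Unlike a plain hash of the secret concatenated with the message, the HMAC construction is not open to length extension.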
XML (Various)
- An XML parser that handles all valid inputs is also vulnerable to various attacks, including:
- Exposing file contents:
<!ENTITY externalEntity SYSTEM "file:///home/user/some-secret-file">
- Memory exhaustion
Mitigation:
- Do not use regular XML parsers for user-provided input (or refuse to parse user-provided XML)
- Use defusedxml or similar
- See the python summary
Hashing and Weak Typing
- PHP and javascript will interpret strings such as "0e45" as 0 x 10^45 = 0
- All strings that match the regex /0+[eE][0-9]+/ will be interpreted as equal to zero
- An attacker can generate a hash that matches this pattern, and in some cases the fact that 0 == ‘0e45’ can be used to bypass security
Mitigation:
- Use === when comparing hash values
Redirect spoofing
- If a page accepts a redirect URL, attackers can replace it with a spoofed page; viewers will often not notice the URL change
- eg if an attacker can get an admin to click on
http://www.mysite.com/user/edit?successUrl=http%3A%2F%2Fwww.badsite.com
(eg sending it in a support email) they can make the URL long, and the admin will often not notice the site change. If the bad site looks like the real one, the admin can be tricked into entering their details
- If the page simply emits the successUrl in the site content, this may also be a vector for an XSS attack (see above)
Mitigation:
- Validate redirect URLs
- Django: django.utils.http.url_has_allowed_host_and_scheme (doesn’t appear to be documented but is present and is used in various places)
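For frameworks without such a helper, the idea can be sketched with urllib.parse. This simplified version is an assumption-laden illustration (the function name and the allowed hosts are hypothetical) and is deliberately stricter and less complete than Django's helper, which handles more edge cases.

```python
from urllib.parse import urlsplit


def is_safe_redirect(target, allowed_hosts):
    """Allow local paths, or http(s) URLs whose host is explicitly allowed."""
    # Backslashes and control characters may be normalised by browsers in
    # surprising ways, so refuse them outright.
    if not target or "\\" in target or any(ord(c) < 32 for c in target):
        return False
    try:
        parts = urlsplit(target)
    except ValueError:
        return False
    if not parts.scheme and not parts.netloc:
        # Relative URL: require exactly one leading slash ("//host" is
        # scheme-relative and would leave the site).
        return target.startswith("/") and not target.startswith("//")
    return parts.scheme in ("http", "https") and parts.hostname in allowed_hosts
```

With allowed_hosts set to {"www.mysite.com"}, the successUrl value http://www.badsite.com is rejected while /user/list is accepted.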
Cross Domain Manipulation
- <a>/<form> with target="_blank" to external/untrusted sites is not secure
- https://mathiasbynens.github.io/rel-noopener/
- https://www.jitbit.com/alexblog/256-targetblank---the-most-underestimated-vulnerability-ever/
Mitigation:
- Use JS to open a new window (will trigger a popup blocker though) & wipe the opener attribute
- Use rel="noopener noreferrer"
Regular Expression Denial of Service
- If you use regexes on any user input then you may open yourself to DoS if your regex parsing library uses backtracking (which most do)
- Evil regexes have:
- A capture group with [variable count] repetition; and
- Inside the capture group, either [variable count] repetition or overlapping alternates
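A small Python illustration of the pattern described above: the first regex repeats a group that itself contains a repetition, while the second accepts the same strings without nesting. The length cap and the helper name are assumptions added for the sketch.

```python
import re

# Repetition inside a repeated group: on a long run of "a" followed by a
# non-matching character, a backtracking engine tries exponentially many
# ways to split the run.
EVIL = re.compile(r"^(a+)+$")

# Accepts exactly the same strings with a single, un-nested repetition.
SAFE = re.compile(r"^a+$")

MAX_INPUT_LENGTH = 256  # hypothetical cap on user-supplied input


def matches_safely(text):
    """Reject over-long input, then match with the non-nested pattern."""
    if len(text) > MAX_INPUT_LENGTH:
        return False
    return SAFE.match(text) is not None
```

Rewriting the pattern removes the exponential behaviour; capping input length limits the damage from any pattern that has not yet been reviewed.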
Information Leakage
- An attacker can infer things from HTTP response codes by guessing URLs
Mitigation:
- If in doubt, return a generic 404 (not found) rather than 401/403 (unauthorized/forbidden)
JSONP
- See this good summary of JSONP issues
- Example of a previous vulnerability
Mitigation:
- Return regular JSON
- Use CORS instead
eval
- It is extremely easy to introduce bugs that allow attackers to run arbitrary code
Mitigation:
- Don’t use
eval()
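When eval() was only being used to read literal values (numbers, strings, lists, dicts), one commonly used alternative in Python is ast.literal_eval, which refuses function calls and other expressions. The wrapper name below is hypothetical; very large or deeply nested input can still exhaust resources, so untrusted input should also be size-limited.

```python
import ast


def parse_literal(text):
    """Parse a Python literal without executing any code."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError) as exc:
        raise ValueError("input is not a plain Python literal") from exc


try:
    parse_literal("__import__('os').system('id')")
    call_rejected = False
except ValueError:
    call_rejected = True
```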
File Injection
- If clients can upload files to a server then the server may be tricked into executing them
- Commonly occurs on a server that’s configured to run any PHP files it sees (even if the app is not written in PHP)
- Even if the server is configured to not execute files in an upload directory, the site may be moved to a new server and new configuration overlooked
- Often introduced through 3rd party extensions (eg TinyMCE file upload)
Mitigation:
- Make static file serving configuration resilient to server configuration
- Restrict file upload extensions to whitelist only
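An extension whitelist can be combined with discarding the client's filename altogether. A sketch under assumed names: the allowed set and safe_upload_name are hypothetical, and checking the file's actual content is out of scope here.

```python
import uuid
from pathlib import PurePosixPath

# Hypothetical whitelist of extensions the application accepts.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".pdf"}


def safe_upload_name(original_name):
    """Return a server-generated filename, or None if the type is refused."""
    suffix = PurePosixPath(original_name).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        return None
    # Keep only the validated extension; the rest of the name is random.
    return uuid.uuid4().hex + suffix
```

Generating the stored name on the server means a name such as shell.php.png never reaches the filesystem with its .php part intact.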
Domain Ownership Impersonation
- SSL Certificates can be obtained by
- Showing that you have ownership of a special email address on that domain; or
- Showing that you can create arbitrary files on that domain
Mitigation:
- Do not let users create arbitrary email addresses on a given domain
- Do not let users create arbitrary files in the root of a domain
Session Fixation
- If an attacker can set a known session cookie value then when the user logs in the attacker can use their session
- Can be exploited through content on subdomains shadowing parent session cookies
Mitigation:
- Regenerate the session key on login (most frameworks do this by default)
- Don’t treat user-controlled subdomain content as isolated from the parent domain
Character Encoding Attacks
- Can exploit differences in character set handling, eg
- Sometimes even the same database can differ (postgres can use libicu or glibc for unicode handling, and in some cases they differ)
- This can sometimes be exploited to trick a site into sending a password reset email to the wrong account (differing only by an accented character)
Mitigation:
- Use consistent encoding and parsing; fail if you’re not sure rather than making assumptions
- Use libraries: other people have already encountered & fixed these problems
File Parsing
- Some file formats are not safe to parse from untrusted sources
- Python pickle can lead to remote code execution
- XML: in theory is safe, but parsing edge cases can lead to DoS (see above)
Mitigation:
- Use a file format specifically designed for transport with limited functionality
- JSON
- YAML (although it has similar DoS issues to XML)
- Protocol Buffers
Time of Check / Time of Use (TOCTOU)
- If you check some safety condition and then introduce a delay before depending on that condition, that condition could change in the meantime
- eg: “Does this file either not exist or is owned by me? If so, create/open the file for writing.”
- There is a delay between checking the file and opening it
Mitigation:
- Perform operations atomically where possible
- Acquire resources/locks then check safety conditions and abort if they aren’t met
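For the file example above, the check and the creation can be combined into one atomic step by asking the operating system to fail if the file already exists. A minimal sketch; the helper name is hypothetical.

```python
import os


def create_private_file(path, data):
    """Create path only if it does not already exist, as a single step."""
    # O_EXCL makes the open fail with FileExistsError instead of reusing a
    # file that someone else created between a check and the open.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
```

There is no separate "does it exist?" check for an attacker to race against, and the 0o600 mode keeps the new file private to its owner on unix systems.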
Email Bounce Attack
- If you let an attacker control an email bounce address (either Return-Path or From if no explicit Return-Path) then attackers can read the contents of emails
- They can flood a recipient’s mailbox until it’s full, and then read the bounces (which contain the original email)
- If that includes password reset emails then accounts can be compromised
- See this PDF or these examples
Mitigation:
- Do not let users control the bounce address (or From address); or
- Do not include any sensitive information in emails
- This is risky as you may accidentally reveal things you didn’t think about, eg an email alias’ real mailbox
Reflected File Download
- If you reflect an upload filename back to the user without properly escaping then an attacker can inject HTTP headers, or terminate the HTTP headers and control the content
- Content-Disposition attacks
- The file payload can be any HTML (including javascript to steal auth cookies or take other actions), including spoofing HTML to get the user to take unexpected actions
- <a download="..."> attacks
- Even if you set the file type using Content-Disposition, if the href is not escaped correctly then the download attribute can override the way the client interprets the file.
- This would still require the user to open the downloaded .bat or .cmd file, but they may have had an expectation that this was a safe JSON file from your site.
Mitigation:
- Don’t create Content-Disposition headers manually
- Escape html tag attributes
- Django handles this by default
Other Best Practices
- Minor changes that will stop automated attacks
- Move the admin url to a less obvious location
- Do not use default account names (eg admin, dev, administrator)
- If an account cannot be renamed, create a second account with desired permissions and reduce the permissions of the primary account
- Use computer generated passwords; humans’ “random” passwords are often not.
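Generating a password with Python's secrets module takes only a few lines. The alphabet and default length below are assumptions; adjust them to whatever the target system accepts.

```python
import secrets
import string

# Letters and digits only, to avoid characters some systems reject.
ALPHABET = string.ascii_letters + string.digits


def generate_password(length=20):
    """Return a random password drawn from a cryptographic RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

The secrets module draws from the operating system's cryptographic random source, unlike the random module, which is not suitable for security purposes.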