Latte Is Synonymous with Safety
*******************************

<div class=perex>

Latte is the only templating system for PHP that provides effective protection against the critical Cross-site Scripting (XSS) vulnerability. This is thanks to so-called context-sensitive escaping. Let's talk about:

- what XSS is and why it is so dangerous
- why Twig, Blade and other templates are blind and can be easily compromised
- how Latte is so effective in defending against XSS

</div>


Cross-Site Scripting (XSS)
==========================

Cross-site Scripting (XSS for short) is one of the most common vulnerabilities in websites and a very dangerous one at that. It allows an attacker to insert a malicious script (called malware) into a foreign site that executes in the browser of an unsuspecting user.

What can such a script do? For example, it can send arbitrary content from the compromised site to the attacker, including sensitive data displayed after login. It can modify the page or make other requests on behalf of the user.
For example, if it were webmail, it could read sensitive messages, modify the displayed content, or change settings, e.g., turn on forwarding copies of all messages to the attacker's address to gain access to future emails.

This is also why XSS tops the list of the most dangerous vulnerabilities. If a vulnerability is discovered on a website, it should be removed as soon as possible to prevent exploitation.


How Does the Vulnerability Arise?
---------------------------------

The error occurs in the place where the web page is generated and the variables are printed. Imagine that you are creating a search page, and at the beginning there will be a paragraph with the search term in the form:

```php
echo '<p>Search results for <em>' . $search . '</em></p>';
```

An attacker can write any string, including HTML code like `<script>alert("Hacked!")</script>`, into the search field and thus into the `$search` variable. Since the output is not sanitized in any way, it becomes part of the displayed page:

```html
<p>Search results for <em><script>alert("Hacked!")</script></em></p>
```

Instead of outputting the search string, the browser executes JavaScript. And thus the attacker takes over the page.

You might argue that putting code into a variable will indeed execute JavaScript, but only in the attacker's browser. How does it get to the victim? From this perspective, we can distinguish several types of XSS. In our search page example, we are talking about *reflected XSS*.
In this case, the victim needs to be tricked into clicking on a link that contains malicious code in the parameter:

```
https://example.com/?search=<script>alert("Hacked!")</script>
```

Although it takes some social engineering to get the user to open the link, it's not difficult. Users click on links, whether in emails or on social media, without much thought. And the suspicious content of the address can be hidden behind a URL shortener, so the user only sees `bit.ly/xxx`.

However, there is a second and much more dangerous form of attack known as *stored XSS* or *persistent XSS*, where an attacker manages to store malicious code on the server so that it is automatically inserted into certain pages.

An example of this is websites where users post comments. An attacker sends a post containing code and it is saved on the server. If the site is not secure enough, it will then run in every visitor's browser.

It would seem that the point of the attack is to get the `<script>` string into the page. In fact, "there are many ways to embed JavaScript":https://cheatsheetseries.owasp.org/cheatsheets/XSS_Filter_Evasion_Cheat_Sheet.html.
Let's take an example of embedding using an HTML attribute. Consider a photo gallery where you can add captions to images, which are printed in the `alt` attribute:

```php
echo '<img src="' . $imageFile . '" alt="' . $imageAlt . '">';
```

An attacker just needs to insert a cleverly constructed string `" onload="alert('Hacked!')` as the caption, and if the output is not sanitized, the resulting code will look like this:

```html
<img src="photo0145.webp" alt="" onload="alert('Hacked!')">
```

The fake `onload` attribute now becomes part of the page. The browser executes the code written in the attribute immediately after the image is downloaded.


How to Defend Against XSS?
--------------------------

Any attempts to detect an attack using a blacklist, such as blocking the `<script>` string, etc. are insufficient. The basis of a workable defense is **consistent sanitization of all data printed inside the page**.

First of all, this involves replacing every character that has a special meaning with a corresponding sequence, which is known in programmer jargon as **escaping** (the first character of the sequence is called the escape character, hence the name).
For example, in HTML text the character `<` has a special meaning. If it is not to be interpreted as the beginning of a tag, it must be replaced by a visually equivalent sequence, the so-called HTML entity `&lt;`.
The browser then displays it as the `<` character.
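
As a minimal sketch of escaping for the HTML text context, the search example from above can be protected with PHP's built-in `htmlspecialchars()`. The `$search` value is a made-up attacker input, and the sketch covers only HTML text and quoted attribute values, not the other contexts discussed later.

```php
// Hypothetical attacker input taken from the search field.
$search = '<script>alert("Hacked!")</script>';

// Replace characters with special meaning in HTML by entities.
$escaped = htmlspecialchars($search, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

echo '<p>Search results for <em>' . $escaped . '</em></p>';
// prints: <p>Search results for <em>&lt;script&gt;alert(&quot;Hacked!&quot;)&lt;/script&gt;</em></p>
```

The browser now shows the search string as text instead of executing it.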

**It is very important to distinguish the context in which the data is output**, because strings are sanitized differently in different contexts, and different characters may have special meaning in each of them.
For example, escaping differs in HTML text, in HTML attributes, inside some special elements, in comments, etc.
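
The following sketch illustrates why a single escaping function cannot serve all contexts. It prints the same hypothetical value into HTML text, into a JavaScript literal, and into a URL query parameter, using only PHP built-ins. The variable names and the `/search` path are illustrative assumptions.

```php
// Hypothetical untrusted value.
$name = 'Tom & "Jerry" </script>';

// HTML text or quoted attribute: special characters become entities.
$html = htmlspecialchars($name, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// JavaScript literal inside <script>: JSON encoding with < > & ' " written
// as \uXXXX sequences, so the value cannot close the <script> element.
$js = json_encode(
    $name,
    JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
);

// URL query parameter: percent-encoding.
$url = rawurlencode($name);

echo "<p>$html</p>\n";
echo "<script>var name = $js;</script>\n";
echo '<a href="/search?q=' . $url . '">search</a>' . "\n";
```

Each of the three results is safe only in its own context. Using the HTML-escaped string inside `<script>`, for instance, would produce a wrong value at best.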

It is best to perform the sanitization just before printing it in the page, to ensure that it is actually done, and done just once. It is best if the treatment is handled **automatically** directly by the templating system.
Because if the treatment is not done automatically, the programmer may forget about it. And one omission means the site is vulnerable.

It is also necessary that the JavaScript in your application handles the data correctly. For example, it should not use `innerHTML` in connection with untrusted data, but only `innerText` or `textContent`.
Special care should be taken with functions that evaluate strings as JavaScript, such as `eval()` and `setTimeout()`, and with `setAttribute()` used on event attributes like `onload`. But that's outside the realm of templating systems.

**Defense in 3 points:**

1) distinguish the context in which the data is output
2) sanitize the data according to the rules of that context (i.e., context-sensitive or context-aware escaping)
3) do this automatically


Context-Aware Escaping
======================

Latte sees the template the same way you do. It understands HTML, recognizes tags, attributes, etc. And thanks to that, it distinguishes between contexts and chooses sanitization functions accordingly. That's called context-sensitive escaping.

How many such contexts are there in HTML itself? You'd be surprised, but there are dozens. Here we list just a few of them, which Latte distinguishes when printing the `{$text}` variable:

```php .{file:example.latte}
- in text: <span>{$text}</span>
- in tag: <span {$text} ></span>
- in attribute: <span title='{$text}'></span>
- in unquoted attribute: <span title={$text}></span>
- in attribute containing URL: <a href="{$text}"></a>
- in attribute containing JavaScript: <img onload="{$text}">
- in attribute containing CSS: <span style="{$text}"></span>
- in JavaScript: <script>var = {$text}</script>
- in CSS: <style>body { content: {$text}; }</style>
- in comment: <!-- {$text} -->
```

In each of these contexts, the variable is treated slightly differently. For example, while the `<` and `&` characters have special meaning in HTML text, HTML comments are completely different and "specific rules must be followed":https://html.spec.whatwg.org/multipage/syntax.html#comments.
Escaping also differs between HTML and XML, and so on. The sections below show a number of examples of how Latte uses its knowledge of context.


The Way of Blind Birds
----------------------

Although context resolution is a prerequisite for defending against XSS, **Latte is the only templating system for PHP that can do this.** So how does automatic escaping work in other systems?

Templating systems like Twig, Laravel Blade, and others don't see any HTML structure in the template. Therefore, they don't see contexts either. Compared to Latte, they are blind. They only handle their own markup, everything else is an irrelevant character stream to them:

<div class="juxtapose juxtapose--dark-handle" data-startingposition="80" data-animation="juxtapose-wiper">

```php .{file:Twig template as seen by Twig itself}
░ ░░ ░░░░░ ░░░░░░{{ text }}░░░░░░░
░ ░░ ░░░░ ░░░░░ {{ text }} ░░░░░░░░
░ ░░ ░░░░░░░░░░ ░░░░░ ░░░░░░░{{ text }}░░░░░░░░░
░ ░░ ░░░░░░░░ ░░░░░░░░░░ ░░░░░ ░░░░░░{{ text }}░░░░░░░░
░ ░░ ░░░░░░░░░ ░░░░░░░░░░ ░░░░ ░░ ░░░░░░{{ text }}░░░░░░
░ ░░ ░░░░░░░░░ ░░░░░░░░░░ ░░░░░░░░░░░ ░░░░ ░░░░░░░░{{ text }}░░
░ ░░ ░░░░░░░░░ ░░░░░░░░░░ ░░░░ ░░░░░ ░░░░░░░{{ text }}░░░░░░░░░
░ ░░ ░░░░░░░░░░░░ ░░░░░░░░░░░ ░ {{ text }}░░░░░░░░░
░ ░░ ░░░░ ░░░░░░░░░░░ ░ ░░░░░░░░ {{ text }}░ ░░░░░░░░░
░ ░░ ░░░░░░░░ ░░░░ {{ text }} ░░░
```

```php .{file:Twig template as the designer sees it}
- in text: <span>{{ text }}</span>
- in tag: <span {{ text }} ></span>
- in attribute: <span title='{{ text }}'></span>
- in unquoted attribute: <span title={{ text }}></span>
- in attribute containing URL: <a href="{{ text }}"></a>
- in attribute containing JavaScript: <img onload="{{ text }}">
- in attribute containing CSS: <span style="{{ text }}"></span>
- in JavaScript: <script>var = {{ text }}</script>
- in CSS: <style>body { content: {{ text }}; }</style>
- in comment: <!-- {{ text }} -->
```

</div>

Automatic escaping without knowing the context is bullshit that **creates a false sense of security**.

Blind systems just mechanically convert `< > & ' "` characters to HTML entities, which is a valid way of escaping in most use cases, but far from always. They cannot detect or prevent various security holes, as we will show below.


How to Hack Blind Systems
=========================

We will use a few practical examples to show how important context differentiation is and why blind templating systems do not provide sufficient protection against XSS, unlike Latte.
We will use Twig as a representative of a blind system in the examples, but the same applies to other systems.


Attribute Vulnerability
-----------------------

Let's try to inject malicious code into the page using the HTML attribute as we [showed above|#How does the vulnerability arise]. Let's have a template in Twig displaying an image:

```php .{file:example.html.twig}
<img src={{ imageFile }} alt={{ imageAlt }}>
```

Note that there are no quotes around the attribute values. The coder may have simply forgotten them, which happens. In React, for example, code is written like this, without quotes, and a coder switching between languages can easily forget them.

The attacker inserts a cleverly constructed string `foo onload=alert('Hacked!')` as the image caption. We already know that Twig can't tell if a variable is being printed in a stream of HTML text, inside an attribute, inside an HTML comment, etc.; in short, it doesn't distinguish between contexts. And it just mechanically converts `< > & ' "` characters to HTML entities.
So the resulting code will look like this:

```php .{url:view-source:https://...}
<img src=photo0145.webp alt=foo onload=alert(&#039;Hacked!&#039;)>
```

**A security hole has been created!**

A fake `onload` attribute has become part of the page and the browser executes it immediately after downloading the image.
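
The same effect can be reproduced in plain PHP. The sketch below uses `htmlspecialchars()` as a stand-in for context-blind entity escaping (an illustration, not Twig's code) and shows that it is the quotes around the attribute that actually contain the value. The variable names are hypothetical.

```php
// Caption supplied by the attacker.
$imageAlt = "foo onload=alert('Hacked!')";

// Context-blind escaping: only < > & ' " are converted to entities.
$escaped = htmlspecialchars($imageAlt, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// Unquoted attribute: the space ends the alt value, onload becomes a new attribute.
$unquoted = '<img src=photo0145.webp alt=' . $escaped . '>';

// Quoted attribute: the whole string stays inside the alt value.
$quoted = '<img src="photo0145.webp" alt="' . $escaped . '">';

echo $unquoted, "\n", $quoted, "\n";
```

The escaping step is identical in both cases. Only knowledge of the surrounding HTML reveals that the first variant is dangerous.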

Now let's see how Latte handles the same template:

```php .{file:example.latte}
<img src={$imageFile} alt={$imageAlt}>
```

Latte sees the template the same way you do. Unlike Twig, it understands HTML and knows that a variable is printed as an attribute value that is not in quotes. That's why it adds them. When an attacker inserts the same caption, the resulting code will look like this:

```php .{url:view-source:https://...}
<img src="photo0145.webp" alt="foo onload=alert(&apos;Hacked!&apos;)">
```

**Latte successfully prevented XSS.**


Printing a Variable in JavaScript
---------------------------------

Thanks to context-sensitive escaping, it is possible to use PHP variables natively inside JavaScript.

```latte
<p onclick="alert({$movie})">{$movie}</p>

<script>var movie = {$movie};</script>
```

If the `$movie` variable contains the string `'Amarcord & 8 1/2'`, the following output is generated. Notice the different escaping used in HTML text, in JavaScript, and in the `onclick` attribute:

```latte
<p onclick="alert(&quot;Amarcord &amp; 8 1\/2&quot;)">Amarcord &amp; 8 1/2</p>

<script>var movie = "Amarcord & 8 1\/2";</script>
```
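
To see what the template engine saves you from, here is a rough sketch of producing the same output by hand with PHP built-ins. It is not Latte's implementation and it does not cover every edge case; it only reproduces the layering of escapes for this one value.

```php
$movie = 'Amarcord & 8 1/2';

// JavaScript literal: json_encode() quotes the string and escapes "/" as "\/".
$js = json_encode($movie, JSON_THROW_ON_ERROR);

// HTML text: entity escaping only.
$htmlText = htmlspecialchars($movie, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// JavaScript inside an HTML attribute: JS escaping first, then HTML escaping.
$jsInAttr = htmlspecialchars($js, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

echo '<p onclick="alert(' . $jsInAttr . ')">' . $htmlText . "</p>\n";
echo '<script>var movie = ' . $js . ";</script>\n";
```

Three different escaping recipes are needed for a single variable, and choosing the wrong one in any place reopens the hole.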


Link Checking
-------------

Latte automatically checks whether the variable used in the `src` or `href` attributes contains a web URL (i.e., one using the HTTP or HTTPS protocol) and prevents the writing of links that may pose a security risk.

```latte
{var $link = 'javascript:attack()'}

<a href="{$link}">click here</a>
```

Writes:

```latte
<a href="">click here</a>
```

The check can be turned off using the [nocheck|filters#nocheck] filter.
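
For comparison, a hand-written version of such a check might look like the sketch below. The `safeWebUrl()` helper is hypothetical and deliberately simple; it is not the check Latte performs, only an illustration of allow-listing URL schemes instead of trying to block dangerous ones.

```php
/**
 * Hypothetical helper: returns the URL if it looks like a web link,
 * otherwise an empty string.
 */
function safeWebUrl(string $url): string
{
    // Absolute http(s) URLs and relative paths, queries and fragments.
    $isWeb = preg_match('~^(?:https?://|[/?#])~i', $url) === 1;

    // Relative URLs with no scheme at all (no colon anywhere).
    $hasNoScheme = strpos($url, ':') === false;

    return ($isWeb || $hasNoScheme) ? $url : '';
}

$link = safeWebUrl('javascript:attack()');
echo '<a href="' . htmlspecialchars($link, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . '">click here</a>';
```

Note that the value still has to be escaped for the attribute context after the scheme check; the two steps solve different problems.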
