Parsing

Avinash Payak
4 min read

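The parsing stage takes the raw HTML and CSS received from the server and converts them into structured representations the browser can work with. HTML becomes the Document Object Model (DOM), and CSS becomes a model of the style rules (the CSS Object Model, or CSSOM). The browser later uses these to render the page onto the screen.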

HTML Parsing

Tokenization

  • The browser reads the characters of the HTML file sequentially and breaks them down into tokens.

  • Tokens are the smallest meaningful units of the markup: start tags (which carry their attributes), end tags, comments, and text (character) tokens.
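To make the idea concrete, here is a toy tokenizer that turns a string of HTML into start-tag, end-tag, and text tokens. It is a minimal sketch under simplifying assumptions: the regular expressions and the tokenize name are illustrative, and the real HTML tokenizer is a state machine with many more states and error-recovery rules (comments, doctypes, unquoted attributes, and so on are not handled here).

```python
import re

# Toy pattern: either a tag ("<name ...>" or "</name>") or a run of text up to the next "<".
TOKEN_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>|([^<]+)")
# Only double-quoted attributes are recognized in this sketch.
ATTR_RE = re.compile(r'([a-zA-Z-]+)="([^"]*)"')


def tokenize(html):
    """Split HTML into (kind, value, attrs) tuples: kind is "start", "end" or "text"."""
    tokens = []
    for m in TOKEN_RE.finditer(html):
        closing, name, rest, text = m.groups()
        if text is not None:
            tokens.append(("text", text, {}))
        elif closing:
            tokens.append(("end", name.lower(), {}))
        else:
            # Attributes travel inside the start-tag token.
            tokens.append(("start", name.lower(), dict(ATTR_RE.findall(rest))))
    return tokens


if __name__ == "__main__":
    for token in tokenize('<p class="intro">Hi <b>there</b></p>'):
        print(token)
```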


Syntax Analysis

  • As the tokenizer emits tokens, the parser's tree-construction stage consumes them and interprets the structure and syntax of the HTML. The two stages run interleaved rather than strictly one after the other.

  • The parser organizes the tokens into a hierarchical tree structure known as the Document Object Model (DOM).

  • This DOM represents the logical structure of the document and contains elements, attributes, and their relationships.
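Tree construction is commonly explained with a stack of open elements: a start tag creates a node under the current element and pushes it, and an end tag pops back to the matching element. The sketch below assumes tokens shaped like ("start", name, attrs), ("end", name, {}) and ("text", value, {}); the Node class, VOID_ELEMENTS set, and build_tree name are hypothetical, and the real algorithm's insertion modes, implied tags, and misnesting repair are omitted.

```python
from dataclasses import dataclass, field

# A few elements that never have children, so they are not pushed on the stack.
VOID_ELEMENTS = {"br", "img", "input", "meta", "link", "hr"}


@dataclass
class Node:
    name: str                       # tag name, "#text" or "#document"
    attrs: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)


def build_tree(tokens):
    """Build a DOM-like tree from (kind, value, attrs) tokens."""
    document = Node("#document")
    stack = [document]              # stack[-1] is the element new nodes go into
    for kind, value, attrs in tokens:
        if kind == "text":
            stack[-1].children.append(Node("#text", text=value))
        elif kind == "start":
            node = Node(value, dict(attrs))
            stack[-1].children.append(node)
            if value not in VOID_ELEMENTS:
                stack.append(node)  # following tokens nest inside this element
        elif kind == "end":
            # Pop back to the nearest open element with this name; stray end tags are ignored.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i].name == value:
                    del stack[i:]
                    break
    return document


def dump(node, depth=0):
    """Print the tree with indentation showing parent/child relationships."""
    label = repr(node.text) if node.name == "#text" else node.name
    print("  " * depth + label)
    for child in node.children:
        dump(child, depth + 1)


if __name__ == "__main__":
    tokens = [("start", "p", {"class": "intro"}), ("text", "Hi ", {}),
              ("start", "b", {}), ("text", "there", {}), ("end", "b", {}), ("end", "p", {})]
    dump(build_tree(tokens))
```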


Script Parsing and Execution

  • In addition to HTML parsing, the browser may also encounter and parse inline or external JavaScript code within <script> tags.

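  • The HTML parser is reentrant: it can pause at any point, hand control to another task (such as running a script), and then resume from the exact point where it left off without losing its context or state. This is necessary because a script can call document.write(), which inserts new markup into the parser's input at the current position.

  • When the parser encounters a classic <script> tag without the async or defer attribute, it pauses: an external script is first fetched from the network, and the script is then executed before parsing resumes, because it may alter the document. Stylesheets in <link> or <style> tags do not pause HTML parsing on their own, but they block rendering, and a script that follows them waits until they have loaded, since it may read computed styles.

  • Script execution can instead be deferred or made asynchronous so that fetching does not block the parser. A defer script is fetched in parallel and runs after parsing finishes, in document order; an async script is also fetched in parallel but runs as soon as it arrives, which briefly pauses parsing while it executes.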
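The points above can be simulated with a toy model: a classic script pauses the parser, markup it writes with document.write() is parsed next (the reentrant part), and defer scripts are queued until parsing ends. This is an illustrative sketch; the token format and the run_parser name are assumptions, and async scripts are left out because when they run depends on download timing.

```python
from collections import deque


def run_parser(tokens):
    """Toy model of script handling in a reentrant parser.

    tokens: ("text", markup) or ("script", name, mode, written), where mode is
    "classic" or "defer" and `written` lists the markup the script would emit
    via document.write(). Returns a log of what happens, in order.
    """
    stream = deque(tokens)          # input the parser has not consumed yet
    log, deferred = [], []
    while stream:
        token = stream.popleft()
        if token[0] == "text":
            log.append(f"parsed {token[1]}")
            continue
        _, name, mode, written = token
        if mode == "defer":
            deferred.append(name)   # fetched in parallel, runs after parsing
            log.append(f"queued {name}")
            continue
        # Classic script: parsing pauses until the script has run.
        log.append(f"paused for {name}")
        log.append(f"executed {name}")
        # Reentrancy: written markup is inserted at the current position,
        # so it is parsed before the rest of the original input.
        stream.extendleft(("text", markup) for markup in reversed(written))
    # Deferred scripts run in document order once parsing has finished.
    log.extend(f"executed {name}" for name in deferred)
    return log


if __name__ == "__main__":
    page = [("text", "<p>"), ("script", "a.js", "classic", ["<b>hi</b>"]),
            ("script", "b.js", "defer", []), ("text", "</p>")]
    print("\n".join(run_parser(page)))
```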

Speculative parsing

Speculative parsing, also known as speculative loading, is a technique web browsers use to speed up page loads: external resources (scripts, stylesheets, and images) are discovered and fetched before the main parser actually reaches them, for example while it is blocked waiting for a script.

Prediction

The browser predicts which external resources will be needed. Mainly, it scans the markup ahead of the point where the main parser is blocked and collects the resource URLs it finds there; some browsers also use signals such as previous navigation patterns to anticipate resources for future pages.
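A toy version of this look-ahead (often called a preload scanner) is sketched below: given the offset where the main parser is blocked, it collects resource URLs that appear further down the page. The regular expression and the scan_ahead name are assumptions for illustration; real scanners reuse an HTML tokenizer and check attributes such as rel and media, whereas this sketch accepts any link with an href.

```python
import re

# Simplified: <script src>, <img src>, and any <link href> (double-quoted values only).
RESOURCE_RE = re.compile(
    r'<(script|img)\b[^>]*\bsrc="([^"]+)"|<link\b[^>]*\bhref="([^"]+)"',
    re.IGNORECASE,
)


def scan_ahead(html, blocked_at):
    """Return URLs of resources appearing after offset `blocked_at`,
    the position where the main parser is waiting (e.g. on a script)."""
    urls = []
    for m in RESOURCE_RE.finditer(html, blocked_at):
        urls.append(m.group(2) or m.group(3))
    return urls


if __name__ == "__main__":
    page = ('<script src="app.js"></script>'
            '<link rel="stylesheet" href="main.css"><img src="hero.png">')
    # Suppose the parser is blocked on app.js: look at everything after it.
    print(scan_ahead(page, page.index("</script>")))
```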

Preloading

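The browser starts loading these resources in the background. Related techniques include preloading (fetching resources the current page will need), prefetching (fetching resources a future navigation is likely to need), and prerendering (loading and rendering an entire page in advance).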

Parsing

As resources arrive, the browser may start processing them before they are actually needed, for example by parsing stylesheet rules or by parsing and compiling JavaScript code.

Execution

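For the current page, speculatively fetched scripts are not executed early, since running code has side effects: each script still runs when the parser reaches it (or when its async or defer attribute allows), although the browser may parse and compile it ahead of time. Code runs speculatively only when an entire page is prerendered, in which case that page's scripts execute before the user navigates to it.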

Caching

Speculatively loaded resources are kept in the cache even though they are not needed yet. If the page later requests them, the browser may already have them, which reduces latency and improves load times.
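The interplay between speculative fetching and later requests can be sketched as a small cache of in-flight fetches: preload() starts a background fetch, and get() reuses it when the parser actually asks for the resource. The SpeculativeCache class and fake_fetch function are hypothetical stand-ins for a browser's network stack and HTTP cache, with the network simulated by a short sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url):
    """Hypothetical stand-in for a network request."""
    time.sleep(0.05)
    return f"contents of {url}"


class SpeculativeCache:
    def __init__(self, fetch=fake_fetch):
        self._fetch = fetch
        self._pool = ThreadPoolExecutor(max_workers=4)
        self._pending = {}          # url -> Future of an in-flight or finished fetch

    def preload(self, url):
        """Start fetching in the background unless a fetch already exists."""
        if url not in self._pending:
            self._pending[url] = self._pool.submit(self._fetch, url)

    def get(self, url):
        """Called when the resource is really needed; returns (body, was_preloaded)."""
        was_preloaded = url in self._pending
        if not was_preloaded:
            self.preload(url)       # cache miss: fetch now
        return self._pending[url].result(), was_preloaded

    def close(self):
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    cache = SpeculativeCache()
    cache.preload("main.css")       # discovered by the look-ahead scanner
    print(cache.get("main.css"))    # already in flight or done
    print(cache.get("late.js"))     # never predicted, fetched on demand
    cache.close()
```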

CSS parsing

CSS parsing interprets stylesheet code so that the browser can later determine which styles apply to each element in the DOM.

Tokenization

The CSS parser starts by tokenizing the input CSS code. It reads the characters sequentially and breaks them down into tokens.
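A toy CSS tokenizer illustrating this step is sketched below. The token kinds are a simplified assumption: the CSS Syntax specification defines more (strings, url tokens, separate tokens for braces and colons, and so on), while here anything unrecognized becomes a single-character "delim" token, and whitespace and comments are discarded.

```python
import re

CSS_TOKEN_RE = re.compile(r"""
      (?P<comment>/\*.*?\*/)
    | (?P<ws>\s+)
    | (?P<hash>\#[\w-]+)
    | (?P<ident>-?[a-zA-Z_][\w-]*)
    | (?P<dimension>\d+(?:\.\d+)?(?:[a-zA-Z]+|%)?)
    | (?P<delim>.)
""", re.VERBOSE | re.DOTALL)


def tokenize_css(css):
    """Return (kind, text) tokens; "dimension" also covers plain numbers."""
    tokens = []
    pos = 0
    while pos < len(css):
        m = CSS_TOKEN_RE.match(css, pos)   # never None: "delim" matches any character
        if m.lastgroup not in ("ws", "comment"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    print(tokenize_css("p { color: #333; margin: 10px } /* done */"))
```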

Syntax Analysis

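The parser then organizes the tokens into a tree-like structure, comparable to an abstract syntax tree: a stylesheet contains rules, and each rule has a selector list and a block of declarations (at-rules such as @media can contain further rules). Browsers expose this structure as the CSS Object Model (CSSOM).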
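The rule structure described above can be sketched directly from CSS text. This is a minimal illustration, assuming a flat stylesheet: the Rule class and parse_stylesheet name are hypothetical, and at-rules, nesting, and braces or semicolons inside strings are not handled.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    selectors: list                                     # e.g. ["h1", ".title"]
    declarations: list = field(default_factory=list)    # (property, value) pairs


def parse_stylesheet(css):
    """Group CSS text into rules made of a selector list and declarations."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)  # comments are discarded
    rules = []
    for prelude, block in re.findall(r"([^{}]+)\{([^{}]*)\}", css):
        selectors = [s.strip() for s in prelude.split(",") if s.strip()]
        declarations = []
        for part in block.split(";"):
            prop, sep, value = part.partition(":")
            if sep:
                declarations.append((prop.strip().lower(), value.strip()))
        rules.append(Rule(selectors, declarations))
    return rules


if __name__ == "__main__":
    sheet = "h1, .title { color: navy; font-size: 2em } /* note */ p { margin: 0 }"
    for rule in parse_stylesheet(sheet):
        print(rule)
```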

Rule validation

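During parsing, the parser checks each rule for syntactic correctness, looking for problems such as missing semicolons, invalid property names or values, unrecognized selectors, and other syntax violations. Rather than aborting, CSS recovers from errors: an invalid declaration is dropped on its own, while a rule whose selector cannot be parsed is dropped entirely.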
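The forgiving error handling can be illustrated with a small declaration parser that drops invalid declarations instead of failing. The VALUE_GRAMMAR table is a hypothetical, heavily reduced stand-in for real property grammars (for instance, any single word counts as a color here). Note how a missing semicolon merges two declarations into one invalid value, so both are lost.

```python
import re

# Hypothetical mini-grammar per property; real engines know hundreds of properties.
VALUE_GRAMMAR = {
    "color": re.compile(r"#[0-9a-fA-F]{3}|#[0-9a-fA-F]{6}|[a-zA-Z]+"),
    "margin": re.compile(r"0|-?\d+(\.\d+)?(px|em|rem|%)"),
    "display": re.compile(r"block|inline|none|flex|grid"),
}


def parse_declarations(block):
    """Parse "prop: value; ..." and return (kept, dropped) instead of raising."""
    kept, dropped = {}, []
    for part in block.split(";"):
        if not part.strip():
            continue
        prop, sep, value = part.partition(":")
        prop, value = prop.strip().lower(), value.strip()
        grammar = VALUE_GRAMMAR.get(prop)
        if sep and grammar and grammar.fullmatch(value):
            kept[prop] = value           # a later valid declaration overrides an earlier one
        else:
            dropped.append(part.strip())  # unknown property or invalid value
    return kept, dropped


if __name__ == "__main__":
    # "color: red background: blue" is missing a semicolon after "red".
    print(parse_declarations("color: red background: blue; margin: 10px; colr: blue; display: flex"))
```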

Selector matching

After parsing individual rules, the browser matches selectors against elements in the DOM to determine which rules apply to each element.
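Browser engines typically match selectors right to left: they first test the rightmost compound selector against the element and only then walk up its ancestors. The sketch below does this for tag, class, and ID selectors joined by the descendant combinator; the Element class and the function names are hypothetical, and other combinators, attribute selectors, and pseudo-classes are not supported.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    tag: str
    id: str = ""
    classes: set = field(default_factory=set)
    parent: Optional["Element"] = None


def matches_compound(el, compound):
    """Match one compound selector such as 'p', '.note', 'p.note' or '#main'."""
    for part in re.findall(r"[#.]?[\w-]+", compound):
        if part.startswith("#"):
            if el.id != part[1:]:
                return False
        elif part.startswith("."):
            if part[1:] not in el.classes:
                return False
        elif el.tag != part:
            return False
    return True


def matches(el, selector):
    """Match a descendant selector like 'div .note p', right to left."""
    compounds = selector.split()
    if not matches_compound(el, compounds[-1]):
        return False                 # most elements are rejected here, cheaply
    ancestor = el.parent
    for compound in reversed(compounds[:-1]):
        # Walk up until some ancestor satisfies this compound selector.
        while ancestor is not None and not matches_compound(ancestor, compound):
            ancestor = ancestor.parent
        if ancestor is None:
            return False
        ancestor = ancestor.parent
    return True


if __name__ == "__main__":
    body = Element("body")
    main = Element("div", id="main", parent=body)
    note = Element("p", classes={"note"}, parent=main)
    print(matches(note, "#main .note"), matches(note, "section p"))
```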

Cascading

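Once selectors are matched to elements, the browser's style engine (rather than the parser itself) resolves conflicting declarations using the cascade. Conflicts are decided by origin and importance (!important) first, then by selector specificity, and finally by source order, where the later declaration wins.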
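A simplified cascade can be expressed as picking, per property, the declaration with the largest sort key (importance, specificity, source order). This sketch assumes all declarations come from one origin, ignores cascade layers, and computes specificity only for ID, class, and type selectors; the specificity and cascade names are illustrative.

```python
import re


def specificity(selector):
    """(ids, classes, types) for simple selectors such as 'div#main .note p'."""
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    # Type selectors appear at the start or right after whitespace or a combinator.
    types = len(re.findall(r"(?:^|[\s>+~])([a-zA-Z][\w-]*)", selector))
    return (ids, classes, types)


def cascade(declarations):
    """declarations: (property, value, selector, important) tuples in source order.

    Returns the winning value per property: !important beats normal, then
    higher specificity, then later source order.
    """
    winners = {}
    for order, (prop, value, selector, important) in enumerate(declarations):
        key = (important, specificity(selector), order)
        if prop not in winners or key > winners[prop][0]:
            winners[prop] = (key, value)
    return {prop: value for prop, (key, value) in winners.items()}


if __name__ == "__main__":
    decls = [
        ("color", "black", "p", False),
        ("color", "blue", "#main .note", False),  # (1, 1, 0) beats (0, 1, 0) and (0, 0, 1)
        ("color", "green", ".note", False),
        ("margin", "0", "p", False),
        ("margin", "4px", "p", False),            # same specificity: later one wins
    ]
    print(cascade(decls))
```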

Property Parsing

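Each declaration is split into a property name and a value, and the value is parsed against that property's grammar (for example, color expects a color value and width expects a length or percentage). The resulting declarations are then applied to the elements their rules matched.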

Computed Styles

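The browser computes the final styles for each element by combining the values chosen by the cascade (the page's own rules together with the browser's default user-agent stylesheet), values inherited from the parent element, and each property's initial value where nothing else applies. Relative lengths such as em are converted to absolute lengths at this stage.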
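The order in which these sources are consulted can be sketched for a handful of properties: an author value from the cascade first, then a user-agent default, then the parent's value for inherited properties, and finally the initial value. The INHERITED, INITIAL, and USER_AGENT tables and the compute_style name are hypothetical simplifications, and unit resolution (such as em to px) is left out.

```python
# Hypothetical subset of properties; real engines track hundreds.
INHERITED = {"color", "font-size"}
INITIAL = {"color": "black", "font-size": "16px", "margin": "0", "display": "inline"}
USER_AGENT = {"p": {"display": "block"}, "div": {"display": "block"}}


def compute_style(tag, cascaded, parent_computed=None):
    """Compute the style for one element.

    cascaded: winning author declarations for this element (property -> value).
    parent_computed: the parent's computed style, or None for the root.
    """
    computed = {}
    for prop, initial in INITIAL.items():
        if prop in cascaded:                          # author rules
            computed[prop] = cascaded[prop]
        elif prop in USER_AGENT.get(tag, {}):         # browser default stylesheet
            computed[prop] = USER_AGENT[tag][prop]
        elif prop in INHERITED and parent_computed:   # inherited from the parent
            computed[prop] = parent_computed[prop]
        else:                                         # property's initial value
            computed[prop] = initial
    return computed


if __name__ == "__main__":
    div = compute_style("div", {"color": "navy"})
    p = compute_style("p", {"margin": "8px"}, div)
    print(p)  # color comes from the parent; margin from the author rule
```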

HTML parsing produces a structured representation of the document (the DOM). CSS parsing, together with selector matching and the cascade, determines how each element is presented on the web page. Both processes are essential for rendering web content accurately and efficiently.
