WP_REST_URL_Details_Controller::get_meta_with_content_elements() – Gets all the meta tag elements that have a ‘content’ attribute.

Description

Gets all the meta tag elements that have a 'content' attribute.

Usage

$array = $this->get_meta_with_content_elements( $html );

The method is private and non-static, so it can only be called from inside WP_REST_URL_Details_Controller.

Parameters

$html
( string ) required – The string of HTML to be parsed.

Returns

array – A multi-dimensional indexed array on success, else empty array. Its indexes are:
0
( string[] ) – Meta elements with a content attribute.
1
( string[] ) – Content attribute's opening quotation mark.
2
( string[] ) – Content attribute's value for each meta element.
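
Because the method is private, it cannot be called directly from outside the class. The following is a minimal sketch of the return shape, assuming a hypothetical standalone function `example_get_meta_with_content_elements()` that reuses the same pattern shown in the Source section; it is an illustration, not part of WordPress.

```php
<?php
/**
 * Hypothetical stand-in for the private method (illustration only).
 * It applies the same pattern and returns the raw preg_match_all() matches.
 */
function example_get_meta_with_content_elements( string $html ): array {
	$pattern = '#<meta\s[^>]*content=(["\']??)(.*)\1[^>]*\/?>#isU';
	preg_match_all( $pattern, $html, $elements );
	return $elements;
}

// Sample markup: one meta element without a content attribute, two with.
$html = '<head>'
	. '<meta charset="utf-8">'
	. '<meta name="description" content="It\'s a > b">'
	. "<meta property='og:title' content='Say \"hi\"' />"
	. '</head>';

$elements = example_get_meta_with_content_elements( $html );

// Index 0: the full meta elements that have a content attribute.
print_r( $elements[0] );
// Index 1: the opening quotation mark of each content attribute.
print_r( $elements[1] );
// Index 2: the content attribute's value for each element.
print_r( $elements[2] );
```

In this sketch the `charset` element is skipped because it has no content attribute, and the three sub-arrays stay aligned by position: `$elements[2][0]` is the value belonging to `$elements[0][0]`.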

Source

File name: wordpress/wp-includes/rest-api/endpoints/class-wp-rest-url-details-controller.php
  private function get_meta_with_content_elements( $html ) {
    /*
		 * Parse all meta elements with a content attribute.
		 *
		 * Why first search for the content attribute rather than directly searching for name=description element?
		 * tl;dr The content attribute's value will be truncated when it contains a > symbol.
		 *
		 * The content attribute's value (i.e. the description to get) can have HTML in it and be well-formed as
		 * it's a string to the browser. Imagine what happens when attempting to match for the name=description
		 * first. Hmm, if a > or /> symbol is in the content attribute's value, then it terminates the match
		 * as the element's closing symbol. But wait, it's in the content attribute and is not the end of the
		 * element. This is a limitation of using regex. It can't determine "wait a minute this is inside of quotation".
		 * If this happens, what gets matched is not the entire element or all of the content.
		 *
		 * Why not search for the name=description and then content="(.*)"?
		 * The attribute order could be opposite. Plus, additional attributes may exist including being between
		 * the name and content attributes.
		 *
		 * Why not lookahead?
		 * Lookahead is not constrained to stay within the element. The first <meta it finds may not include
		 * the name or content, but rather could be from a different element downstream.
		 */
    $pattern = '#<meta\s' .

        /*
				 * Allows for additional attributes before the content attribute.
				 * Searches for anything other than > symbol.
				 */
        '[^>]*' .

        /*
				* Find the content attribute. When found, capture its value (.*).
				*
				* Allows for (a) single or double quotes and (b) whitespace in the value.
				*
				* Why capture the opening quotation mark, i.e. (["\']), and then backreference,
				* i.e. \1, for the closing quotation mark?
				* To ensure the closing quotation mark matches the opening one. Why? Attribute values
				* can contain quotation marks, such as an apostrophe in the content.
				*/
        'content=(["\']??)(.*)\1' .

        /*
				* Allows for additional attributes after the content attribute.
				* Searches for anything other than > symbol.
				*/
        '[^>]*' .

        /*
				* \/?> searches for the closing > symbol, which can be in either /> or > format.
				* # ends the pattern.
				*/
        '\/?>#' .

        /*
				* These are the options:
				* - i : case insensitive
				* - s : allows newline characters for the . match (needed for multiline elements)
				* - U : inverts quantifier greediness, so matching is non-greedy by default
				*/
        'isU';

    preg_match_all( $pattern, $html, $elements );

    return $elements;
  }
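
The reasoning in the comment block can be checked with a small sketch. It contrasts a naive name-first pattern with the content-first pattern above, then picks out the description in a separate step. The naive pattern and the filtering loop are assumptions made for illustration; they are not taken from the WordPress source.

```php
<?php
// A description whose value contains a > symbol.
$html = '<meta name="description" content="Use a > b to compare">';

// Hypothetical name-first pattern, shown only for contrast.
// It stops at the first > and so truncates the element.
preg_match( '#<meta[^>]*name=["\']description["\'][^>]*>#i', $html, $naive );

// Content-first pattern, as used by the method above.
preg_match_all(
	'#<meta\s[^>]*content=(["\']??)(.*)\1[^>]*\/?>#isU',
	$html,
	$elements
);

// Hypothetical follow-up step: find the element named "description"
// among the matches and read its already-captured content value.
$description = '';
foreach ( $elements[0] as $index => $element ) {
	if ( preg_match( '#name=(["\']?)description\1#i', $element ) ) {
		$description = $elements[2][ $index ];
		break;
	}
}

echo $naive[0], "\n";   // Truncated element.
echo $description, "\n"; // Full content value.
```

Capturing the content value first, while the quotation mark backreference still bounds it, means the later name check only has to test the matched element and never has to parse the value again.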
 
