parser.pyc
1 o 2 5�Hc�E � @ s� d Z ddlZddlZddlmZ dgZe�d�Ze�d�Ze�d�Z e�d�Z 3 e�d �Ze�d 4 �Ze�d�Z e�d�Ze�d �Ze�dej�Ze�d 5 �Ze�d�ZG dd� dej�ZdS )zA parser for HTML and XHTML.� N)�unescape� 6 HTMLParserz[&<]z 7 &[a-zA-Z#]z%&([a-zA-Z][-.a-zA-Z0-9]*)[^a-zA-Z0-9]z)&#(?:[0-9]+|[xX][0-9a-fA-F]+)[^0-9a-fA-F]z <[a-zA-Z]�>z--\s*>z+([a-zA-Z][^\t\n\r\f />\x00]*)(?:\s|/(?!>))*z]((?<=[\'"\s/])[^\s/>][^\s/=>]*)(\s*=+\s*(\'[^\']*\'|"[^"]*"|(?![\'"])[^>\s]*))?(?:\s|/(?!>))*aF 8 <[a-zA-Z][^\t\n\r\f />\x00]* # tag name 9 (?:[\s/]* # optional whitespace before attribute name 10 (?:(?<=['"\s/])[^\s/>][^\s/=>]* # attribute name 11 (?:\s*=+\s* # value indicator 12 (?:'[^']*' # LITA-enclosed value 13 |"[^"]*" # LIT-enclosed value 14 |(?!['"])[^>\s]* # bare value 15 ) 16 \s* # possibly followed by a space 17 )?(?:\s|/(?!>))* 18 )* 19 )? 20 \s* # trailing whitespace 21 z#</\s*([a-zA-Z][-.a-zA-Z0-9:_]*)\s*>c @ s� e Zd ZdZdZdd�dd�Zdd� Zd d 22 � Zdd� Zd Z dd� Z 23 dd� Zdd� Zdd� Z dd� Zd7dd�Zdd� Zdd� Zdd � Zd!d"� Zd#d$� Zd%d&� Zd'd(� Zd)d*� Zd+d,� Zd-d.� Zd/d0� Zd1d2� Zd3d4� Zd5d6� Zd S )8r aE Find tags and other markup and call handler functions. 24 25 Usage: 26 p = HTMLParser() 27 p.feed(data) 28 ... 29 p.close() 30 31 Start tags are handled by calling self.handle_starttag() or 32 self.handle_startendtag(); end tags by self.handle_endtag(). The 33 data between tags is passed from the parser to the derived class 34 by calling self.handle_data() with the data as argument (the data 35 may be split up in arbitrary chunks). If convert_charrefs is 36 True the character references are converted automatically to the 37 corresponding Unicode character (and self.handle_data() is no 38 longer split in chunks), otherwise they are passed by calling 39 self.handle_entityref() or self.handle_charref() with the string 40 containing respectively the named or numeric reference as the 41 argument. 42 )�script�styleT)�convert_charrefsc C s || _ | �� dS )z�Initialize and reset this instance. 
43 44 If convert_charrefs is True (the default), all character references 45 are automatically converted to the corresponding Unicode characters. 46 N)r �reset)�selfr � r 47 �sC:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64__qbz5n2kfra8p0\lib\html\parser.py�__init__V s zHTMLParser.__init__c C s( d| _ d| _t| _d| _tj�| � dS )z1Reset this instance. Loses all unprocessed data.� z???N)�rawdata�lasttag�interesting_normal�interesting� 48 cdata_elem�_markupbase� 49 ParserBaser �r r 50 r 51 r r _ s 52 zHTMLParser.resetc C s | j | | _ | �d� dS )z�Feed data to the parser. 53 54 Call this as often as you want, with as little or as much text 55 as you want (may include '\n'). 56 r N)r �goahead�r �datar 57 r 58 r �feedg s zHTMLParser.feedc C s | � d� dS )zHandle any buffered data.� N)r r r 59 r 60 r �closep s zHTMLParser.closeNc C s | j S )z)Return full source of start tag: '<...>'.)�_HTMLParser__starttag_textr r 61 r 62 r �get_starttag_textv s zHTMLParser.get_starttag_textc C s$ |� � | _t�d| j tj�| _d S )Nz</\s*%s\s*>)�lowerr �re�compile�Ir )r �elemr 63 r 64 r �set_cdata_modez s 65 zHTMLParser.set_cdata_modec C s t | _d | _d S �N)r r r r r 66 r 67 r �clear_cdata_mode~ s 68 zHTMLParser.clear_cdata_modec C s: | j }d}t|�}||k �r�| jr;| js;|�d|�}|dk r:|�dt||d ��}|dkr8t�d�� ||�s8�n�|}n| j 69 � ||�}|rI|�� }n| jrN�n�|}||k ro| jrf| jsf| �t |||� �� n | �|||� � | �||�}||kr{�nj|j}|d|��rt�||�r�| �|�} n>|d|�r�| �|�} n3|d|�r�| �|�} n(|d|�r�| �|�} n|d |�r�| �|�} n|d 70 |k r�| �d� |d 71 } n�n| dk �r|sאn|�d|d 72 �} | dk r�|�d|d 73 �} | dk r�|d 74 } n| d 75 7 } | j�r| j�s| �t ||| � �� n | �||| � � | �|| �}n�|d|��rlt�||�}|�rO|�� d d� } 76 | �| 77 � |�� } |d| d 78 ��sH| d 79 } | �|| �}q d||d � v �rk| �|||d � � | �||d �}ny|d|��r�t�||�}|�r�|�d 80 �} 81 | �| 82 � |�� } |d| d 83 ��s�| d 84 } | �|| �}q t�||�}|�r�|�r�|�� ||d � k�r�|�� } | |k�r�|} | �||d 85 �}n|d 86 |k �r�| �d� | �||d 87 �}nnJ d��||k s|�r||k �r| j�s| j�r| j�s| �t |||� �� n | �|||� � | �||�}||d � | _ d S 
)Nr �<�&�"
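The class docstring embedded in the bytecode above describes the intended usage: create a parser, call feed() with data, then close(), with start tags, end tags, data and character references delivered through handler methods. The following is a minimal sketch of that flow, not part of the original module. The names EventCollector and collect_events are hypothetical and were introduced only for illustration; the handler names and the convert_charrefs keyword are taken from the docstrings above.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical subclass that records each parser callback as a tuple."""

    def __init__(self, convert_charrefs=True):
        # convert_charrefs is keyword-only in HTMLParser.__init__.
        super().__init__(convert_charrefs=convert_charrefs)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        # Only called when convert_charrefs is False.
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        # Only called when convert_charrefs is False.
        self.events.append(("charref", name))


def collect_events(markup, convert_charrefs=True):
    """Feed markup to a fresh parser, flush it, and return the events."""
    parser = EventCollector(convert_charrefs=convert_charrefs)
    parser.feed(markup)   # may be called repeatedly with partial input
    parser.close()        # handle any buffered data
    return parser.events


if __name__ == "__main__":
    for event in collect_events('<p class="x">a &amp; b</p>'):
        print(event)
```

With convert_charrefs left at its default of True, the named reference is folded into the text passed to handle_data(). With convert_charrefs=False it should instead arrive through handle_entityref(), and the surrounding text may be delivered in several chunks, as the docstring warns.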