api.pyc
1 o 2 ��Yc�J � @ s d dl Z d dlZd dlmZ d dlmZmZ d dlmZm Z m 3 Z 4 mZmZ ddl mZmZmZmZ ddlmZmZmZmZ ddlmZ dd lmZmZ dd 5 lmZmZmZm Z m!Z!m"Z"m#Z# e �$d�Z%e �&� Z'e'�(e �)d�� d&de*de+de+de,dee 6 e- dee 7 e- de.de.defdd�Z/ d&de de+de+de,dee 8 e- dee 9 e- de.de.defdd�Z0 d&d d!de+de+de,dee 10 e- dee 11 e- de.de.defd"d#�Z1 d'd d!de+de+de,dee 12 e- dee 13 e- de.defd$d%�Z2dS )(� N)�PathLike)�basename�splitext)�Any�BinaryIO�List�Optional�Set� )�coherence_ratio�encoding_languages�mb_encoding_languages�merge_coherence_ratios)�IANA_SUPPORTED�TOO_BIG_SEQUENCE�TOO_SMALL_SEQUENCE�TRACE)� 14 mess_ratio)�CharsetMatch�CharsetMatches)�any_specified_encoding�cut_sequence_chunks� iana_name�identify_sig_or_bom� is_cp_similar�is_multi_byte_encoding�should_strip_sig_or_bom�charset_normalizerz)%(asctime)s | %(levelname)s | %(message)s� � 皙�����?TF� sequences�steps� 15 chunk_size� threshold�cp_isolation�cp_exclusion�preemptive_behaviour�explain�returnc - C s� t | ttf�std�t| ����|rtj}t�t � t� 16 t� t| �} | dkrGt� d� |r;t�t � t� 17 |p9tj� tt| dddg d�g�S |dur]t�td d 18 �|�� dd� |D �}ng }|durut�td d 19 �|�� dd� |D �}ng }| || kr�t�td||| � d}| }|dkr�| | |k r�t| | �}t| �tk } 20 t| �tk}| 21 r�t�td�| �� n|r�t�td�| �� g }|r�t| �nd} | dur�|�| � t�td| � t� }g }g }d}d}d}t� }t| �\}}|du�r|�|� t�tdt|�|� |�d� d|v�r|�d� |t D �]�}|�r!||v�r!�q|�r+||v �r+�q||v �r2�q|�|� d}||k}|�oCt|�}|dv �rU|�sUt�td|� �qzt|�}W n t t!f�yo t�td|� Y �qw z9|�r�|du �r�t"|du �r�| dtd�� n | t|�td�� |d� nt"|du �r�| n| t|�d� |d�}W n+ t#t$f�y� } zt |t$��s�t�td|t"|�� |�|� W Y d}~�qd}~ww d}|D ] }t%||��r�d} n�q�|�r�t�td||� �qt&|�s�dnt|�| t| | ��}|�o|du�ot|�| k } | �rt�td|� tt|�d �}!t'|!d �}!d}"d}#g }$g }%z9t(| ||||||||� D ]*}&|$�|&� |%�t)|&|�� |%d! |k�rY|"d7 }"|"|!k�sf|�rh|du �rh n�q?W n! t#�y� } zt�td"|t"|�� |!}"d}#W Y d}~nd}~ww |#�s�|�r�|�s�z| td#�d� j*|d$d%� W n# t#�y� } zt�td&|t"|�� |�|� W Y d}~�qd}~ww |%�r�t+|%�t|%� nd}'|'|k�s�|"|!k�r|�|� t�td'||"t,|'d( d)d*�� |dd| fv �r|#�st| ||dg |�}(|| k�r|(}n 22 |dk�r|(}n|(}�qt�td+|t,|'d( d)d*�� |�s2t-|�})nt.|�})|)�rEt�td,�|t"|)��� g }*|dk�re|$D ]}&t/|&d-|)�r[d.�|)�nd�}+|*�|+� �qNt0|*�},|,�rvt�td/�|,|�� |�t| ||'||,|�� || ddfv �r�|'d-k �r�t� d0|� |�r�t�t � t� 23 |� t|| g� S ||k�r�t� d1|� |�r�t�t � t� 24 |� t|| g� S �qt|�dk�r&|�s�|�s�|�r�t�td2� |�r�t� d3|j1� |�|� n2|�r�|du �s|�r |�r |j2|j2k�s|du�rt� d4� |�|� n |�r&t� d5� |�|� |�r8t� d6|�3� j1t|�d � nt� d7� |�rJt�t � t� 25 |� |S )8ae 26 Given a raw bytes sequence, return the best possibles charset usable to render str objects. 27 If there is no results, it is a strong indicator that the source is binary/not text. 28 By default, the process will extract 5 blocs of 512o each to assess the mess and coherence of a given sequence. 29 And will give up a particular code page after 20% of measured mess. Those criteria are customizable at will. 30 31 The preemptive behavior DOES NOT replace the traditional detection workflow, it prioritize a particular code page 32 but never take it for granted. Can improve the performance. 33 34 You may want to focus your attention to some code page or/and not others, use cp_isolation and cp_exclusion for that 35 purpose. 36 37 This function will strip the SIG in the payload/sequence every time except on UTF-16, UTF-32. 38 By default the library does not setup any handler other than the NullHandler, if you choose to set the 'explain' 39 toggle to True it will alter the logger configuration to add a StreamHandler that is suitable for debugging. 40 Custom logging format and handler can be set manually. 41 z4Expected object of type bytes or bytearray, got: {0}r z<Encoding detection on empty bytes, assuming utf_8 intention.�utf_8g F� Nz`cp_isolation is set. use this flag for debugging purpose. limited list of encoding allowed : %s.z, c S � g | ]}t |d ��qS �F�r ��.0�cp� r2 ��C:\Users\Jacks.GUTTSPC\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\charset_normalizer\api.py� 42 <listcomp>[ � zfrom_bytes.<locals>.<listcomp>zacp_exclusion is set. use this flag for debugging purpose. limited list of encoding excluded : %s.c S r, r- r. r/ r2 r2 r3 r4 f r5 z^override steps (%i) and chunk_size (%i) as content does not fit (%i byte(s) given) parameters.r 43 z>Trying to detect encoding from a tiny portion of ({}) byte(s).zIUsing lazy str decoding because the payload is quite large, ({}) byte(s).z@Detected declarative mark in sequence. Priority +1 given for %s.zIDetected a SIG or BOM mark on first %i byte(s). Priority +1 given for %s.�ascii> �utf_16�utf_32z[Encoding %s wont be tested as-is because it require a BOM. Will try some sub-encoder LE/BE.z2Encoding %s does not provide an IncrementalDecoderg ��A)�encodingz9Code page %s does not fit given bytes sequence at ALL. %sTzW%s is deemed too similar to code page %s and was consider unsuited already. Continuing!zpCode page %s is a multi byte encoding table and it appear that at least one character was encoded using n-bytes.� � �����zaLazyStr Loading: After MD chunk decode, code page %s does not fit given bytes sequence at ALL. %sg j�@�strict)�errorsz^LazyStr Loading: After final lookup, code page %s does not fit given bytes sequence at ALL. %szc%s was excluded because of initial chaos probing. Gave up %i time(s). Computed mean chaos is %f %%.�d � )�ndigitsz=%s passed initial chaos probing. Mean measured chaos is %f %%z&{} should target any language(s) of {}g�������?�,z We detected language {} using {}z.Encoding detection: %s is most likely the one.zoEncoding detection: %s is most likely the one as we detected a BOM or SIG within the beginning of the sequence.zONothing got out of the detection process. Using ASCII/UTF-8/Specified fallback.z7Encoding detection: %s will be used as a fallback matchz:Encoding detection: utf_8 will be used as a fallback matchz:Encoding detection: ascii will be used as a fallback matchz]Encoding detection: Found %s as plausible (best-candidate) for content. With %i alternatives.z=Encoding detection: Unable to determine any suitable charset.)4� 44 isinstance� bytearray�bytes� TypeError�format�type�logger�level� 45 addHandler�explain_handler�setLevelr �len�debug� removeHandler�logging�WARNINGr r �log�join�intr r r �append�setr r �addr r �ModuleNotFoundError�ImportError�str�UnicodeDecodeError�LookupErrorr �range�maxr r �decode�sum�roundr r r r r9 �fingerprint�best)-r! r"