a.b.c.d.e.f.g/1.html
a.b.c.d.e.f.g/
(Note: skip b.c.d.e.f.g, since we'll take only the last five hostname components, and the full hostname)
c.d.e.f.g/1.html
c.d.e.f.g/
d.e.f.g/1.html
d.e.f.g/
e.f.g/1.html
e.f.g/
f.g/1.html
f.g/
对于网址 http://1.2.3.4/1/,客户端将尝试以下可能的字符串:
1.2.3.4/1/
1.2.3.4/
哈希计算
创建一组后缀/前缀表达式后,下一步是计算每个表达式的全长 SHA256 哈希值。以下是一个伪 C 语言单元测试,您可以使用它来验证您的哈希计算。
[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["很难理解","hardToUnderstand","thumb-down"],["信息或示例代码不正确","incorrectInformationOrSampleCode","thumb-down"],["没有我需要的信息/示例","missingTheInformationSamplesINeed","thumb-down"],["翻译问题","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["最后更新时间 (UTC):2025-08-25。"],[],[],null,["# Hashing URLs\n============\n\nOverview\n--------\n\nThe Web Risk lists consist of variable length SHA256 hashes. For more\ndetails, see [List Contents](/web-risk/docs/lists#list_contents).\nTo check a URL against a Web Risk list, either locally or on the server,\nclients must first compute the hash prefix of that URL.\n\nTo compute the hash prefix of a URL, follow these steps:\n\n1. Canonicalize the URL as described in [Canonicalization](#canonicalization).\n2. Create the suffix/prefix expressions for the URL as described under [Suffix/Prefix expressions](#suffixprefix_expressions).\n3. Compute the full-length hash for each suffix/prefix expression as described under [Hash computations](#hash_computations).\n4. Compute the hash prefix for each full-length hash, as described in [Hash prefix computations](#hash_prefix_computations).\n\nNote that these steps mirror the process the Web Risk server uses to\nmaintain the Web Risk lists.\n\nCanonicalization\n----------------\n\nTo begin, we assume that the client has parsed the URL and made it valid\naccording to RFC 2396. If the URL uses an Internationalized Domain Name (IDN),\nthe client should convert the URL to the ASCII Punycode representation. The URL\nmust include a path component; that is, it must have a leading slash\n(`http://google.com/`).\n\nFirst, remove tab (`0x09`), CR (`0x0d`), and LF (`0x0a`) characters from\nthe URL. Do not remove escape sequences for these characters, like `%0a`.\n\nSecond, if the URL ends in a fragment, remove the fragment. For example, shorten\n`http://google.com/#frag` to `http://google.com/`.\n\nThird, repeatedly remove percent-escapes from the URL until it has no more\npercent-escapes.\n\n### To canonicalize the hostname\n\nExtract the hostname from the URL and then:\n\n1. Remove all leading and trailing dots.\n2. Replace consecutive dots with a single dot.\n3. If the hostname can be parsed as an IP address, normalize it to 4 dot-separated decimal values. The client should handle any legal IP-address encoding, including octal, hex, and fewer than four components.\n4. Lowercase the whole string.\n\n### To canonicalize the path\n\n1. Resolve the sequences `/../` and `/./` in the path by replacing `/./` with `/`, and removing `/../` along with the preceding path component.\n2. Replace runs of consecutive slashes with a single slash character.\n\nDo not apply these path canonicalizations to the query parameters.\n\nIn the URL, percent-escape all characters that are \\\u003c= ASCII 32, \\\u003e= 127, `#`, or\n`%`. The escapes should use uppercase hex characters.\n\nBelow are tests to help validate a canonicalization implementation. \n\n```gdscript\nCanonicalize(\"http://host/%25%32%35\") = \"http://host/%25\";\nCanonicalize(\"http://host/%25%32%35%25%32%35\") = \"http://host/%25%25\";\nCanonicalize(\"http://host/%2525252525252525\") = \"http://host/%25\";\nCanonicalize(\"http://host/asdf%25%32%35asd\") = \"http://host/asdf%25asd\";\nCanonicalize(\"http://host/%%%25%32%35asd%%\") = \"http://host/%25%25%25asd%25%25\";\nCanonicalize(\"http://www.google.com/\") = \"http://www.google.com/\";\nCanonicalize(\"http://%31%36%38%2e%31%38%38%2e%39%39%2e%32%36/%2E%73%65%63%75%72%65/%77%77%77%2E%65%62%61%79%2E%63%6F%6D/\") = \"http://168.188.99.26/.secure/www.ebay.com/\";\nCanonicalize(\"http://195.127.0.11/uploads/%20%20%20%20/.verify/.eBaysecure=updateuserdataxplimnbqmn-xplmvalidateinfoswqpcmlx=hgplmcx/\") = \"http://195.127.0.11/uploads/%20%20%20%20/.verify/.eBaysecure=updateuserdataxplimnbqmn-xplmvalidateinfoswqpcmlx=hgplmcx/\";\nCanonicalize(\"http://host%23.com/%257Ea%2521b%2540c%2523d%2524e%25f%255E00%252611%252A22%252833%252944_55%252B\") = \"http://host%23.com/~a!b@c%23d$e%25f^00&11*22(33)44_55+\";\nCanonicalize(\"http://3279880203/blah\") = \"http://195.127.0.11/blah\";\nCanonicalize(\"http://www.google.com/blah/..\") = \"http://www.google.com/\";\nCanonicalize(\"www.google.com/\") = \"http://www.google.com/\";\nCanonicalize(\"www.google.com\") = \"http://www.google.com/\";\nCanonicalize(\"http://www.evil.com/blah#frag\") = \"http://www.evil.com/blah\";\nCanonicalize(\"http://www.GOOgle.com/\") = \"http://www.google.com/\";\nCanonicalize(\"http://www.google.com.../\") = \"http://www.google.com/\";\nCanonicalize(\"http://www.google.com/foo\\tbar\\rbaz\\n2\") =\"http://www.google.com/foobarbaz2\";\nCanonicalize(\"http://www.google.com/q?\") = \"http://www.google.com/q?\";\nCanonicalize(\"http://www.google.com/q?r?\") = \"http://www.google.com/q?r?\";\nCanonicalize(\"http://www.google.com/q?r?s\") = \"http://www.google.com/q?r?s\";\nCanonicalize(\"http://evil.com/foo#bar#baz\") = \"http://evil.com/foo\";\nCanonicalize(\"http://evil.com/foo;\") = \"http://evil.com/foo;\";\nCanonicalize(\"http://evil.com/foo?bar;\") = \"http://evil.com/foo?bar;\";\nCanonicalize(\"http://\\x01\\x80.com/\") = \"http://%01%80.com/\";\nCanonicalize(\"http://notrailingslash.com\") = \"http://notrailingslash.com/\";\nCanonicalize(\"http://www.gotaport.com:1234/\") = \"http://www.gotaport.com/\";\nCanonicalize(\" http://www.google.com/ \") = \"http://www.google.com/\";\nCanonicalize(\"http:// leadingspace.com/\") = \"http://%20leadingspace.com/\";\nCanonicalize(\"http://%20leadingspace.com/\") = \"http://%20leadingspace.com/\";\nCanonicalize(\"%20leadingspace.com/\") = \"http://%20leadingspace.com/\";\nCanonicalize(\"https://www.securesite.com/\") = \"https://www.securesite.com/\";\nCanonicalize(\"http://host.com/ab%23cd\") = \"http://host.com/ab%23cd\";\nCanonicalize(\"http://host.com//twoslashes?more//slashes\") = \"http://host.com/twoslashes?more//slashes\";\n```\n\nSuffix/prefix expressions\n-------------------------\n\nAfter the URL is canonicalized, the next step is to create the suffix/prefix\nexpressions. Each suffix/prefix expression consists of a host suffix (or full\nhost) and a path prefix (or full path) as shown in these examples.\n\nThe client will form up to 30 different possible host suffix and path prefix\ncombinations. These combinations use only the host and path components of the\nURL. The scheme, username, password, and port are discarded. If the URL includes\nquery parameters, then at least one combination will include the full path and\nquery parameters.\n\n**For the host**, the client will try at most five different strings. They are:\n\n- The exact hostname in the URL.\n- Up to four hostnames formed by starting with the last five components and successively removing the leading component. The top-level domain can be skipped. These additional hostnames should not be checked if the host is an IP address.\n\n**For the path**, the client will try at most six different strings. They are:\n\n- The exact path of the URL, including query parameters.\n- The exact path of the URL, without query parameters.\n- The four paths formed by starting at the root (`/`) and successively appending path components, including a trailing slash.\n\nThe following examples illustrate the check behavior:\n\nFor the URL `http://a.b.c/1/2.html?param=1`, the client will try these\npossible strings: \n\n```\na.b.c/1/2.html?param=1\na.b.c/1/2.html\na.b.c/\na.b.c/1/\nb.c/1/2.html?param=1\nb.c/1/2.html\nb.c/\nb.c/1/\n```\n\nFor the URL `http://a.b.c.d.e.f.g/1.html`, the client will try these possible\nstrings: \n\n```\na.b.c.d.e.f.g/1.html\na.b.c.d.e.f.g/\n(Note: skip b.c.d.e.f.g, since we'll take only the last five hostname components, and the full hostname)\nc.d.e.f.g/1.html\nc.d.e.f.g/\nd.e.f.g/1.html\nd.e.f.g/\ne.f.g/1.html\ne.f.g/\nf.g/1.html\nf.g/\n```\n\nFor the URL `http://1.2.3.4/1/`, the client will try these possible strings: \n\n```\n1.2.3.4/1/\n1.2.3.4/\n```\n\nHash computations\n-----------------\n\nAfter the set of suffix/prefix expressions has been created, the next step is to\ncompute the full-length SHA256 hash for each expression. Below is a unit test\nin pseudo-C that you can use to validate your hash computations.\n\nExamples from\n[FIPS-180-2](http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdf): \n\n```transact-sql\n// Example B1 from FIPS-180-2\nstring input1 = \"abc\";\nstring output1 = TruncatedSha256Prefix(input1, 32);\nint expected1[] = { 0xba, 0x78, 0x16, 0xbf };\nassert(output1.size() == 4); // 4 bytes == 32 bits\nfor (int i = 0; i \u003c output1.size(); i++) assert(output1[i] == expected1[i]);\n\n// Example B2 from FIPS-180-2\nstring input2 = \"abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq\";\nstring output2 = TruncatedSha256Prefix(input2, 48);\nint expected2[] = { 0x24, 0x8d, 0x6a, 0x61, 0xd2, 0x06 };\nassert(output2.size() == 6);\nfor (int i = 0; i \u003c output2.size(); i++) assert(output2[i] == expected2[i]);\n\n// Example B3 from FIPS-180-2\nstring input3(1000000, 'a'); // 'a' repeated a million times\nstring output3 = TruncatedSha256Prefix(input3, 96);\nint expected3[] = { 0xcd, 0xc7, 0x6e, 0x5c, 0x99, 0x14, 0xfb, 0x92,\n 0x81, 0xa1, 0xc7, 0xe2 };\nassert(output3.size() == 12);\nfor (int i = 0; i \u003c output3.size(); i++) assert(output3[i] == expected3[i]);\n```\n\nHash prefix computations\n------------------------\n\nFinally, the client needs to compute the hash prefix for each full-length SHA256\nhash. For Web Risk, a hash prefix consists of the most significant 4-32\nbytes of a SHA256 hash.\n\nExamples from\n[FIPS-180-2](http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdf):\n\n- Example B1 from FIPS-180-2\n - Input is \"abc\".\n - SHA256 digest is `ba7816bf 8f01cfea 414140de 5dae2223 b00361a3 96177a9c\n b410ff61 f20015ad`.\n - The 32-bit hash prefix is `ba7816bf`.\n- Example B2 from FIPS-180-2\n - Input is `abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq`.\n - SHA256 digest is `248d6a61 d20638b8 e5c02693 0c3e6039 a33ce459\n 64ff2167 f6ecedd4 19db06c1`.\n - The 48-bit hash prefix is `248d6a61 d206`."]]