While the Domain Name System (DNS) technically supports arbitrary sequences of octets in domain name labels, the DNS standards recommend the use of the LDH subset of ASCII conventionally used for host names, and require that string comparisons between DNS domain names be case-insensitive. The Punycode syntax is a method of encoding strings containing Unicode characters, such as internationalized domain names (IDNA), into the LDH subset of ASCII favored by DNS. It is specified in IETF Request for Comments 3492.[1]
As stated in RFC 3492, "Punycode is an instance of a more general algorithm called Bootstring, which allows strings composed from a small set of 'basic' code points to uniquely represent any string of code points drawn from a larger set." Punycode defines parameters for the general Bootstring algorithm to match the characteristics of Unicode text. This section demonstrates the procedure for Punycode encoding, using the example of the string "bücher" (Bücher is German for books), which is translated into the label "bcher-kva".
To keep the encoding and decoding algorithms simple, no attempt has been made to prevent some encoded values from representing inadmissible Unicode values; these should instead be checked for and detected during decoding.
Punycode is designed to work across all scripts, and to be self-optimizing by attempting to adapt to the character set ranges within the string as it operates. It is optimized for the case where the string is composed of zero or more ASCII characters and in addition characters from only one other script system, but will cope with any arbitrary Unicode string. Note that for DNS use, the domain name string is assumed to have been normalized using nameprep and (for top-level domains) filtered against an officially registered language table before being punycoded, and that the DNS protocol sets limits on the acceptable lengths of the output Punycode string.
Note that hyphens are themselves ASCII characters. Thus, they can be present in the input and, if so, they will be copied to the output. This causes no ambiguity: if the output contains hyphens, the one that got added is always the last one. It marks the end of the ASCII characters.
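The rule that the added hyphen is always the last one can be sketched from the decoder's side. The helper name splitAtDelimiter and the input "foo-bar-kva" are made up for illustration; only "bcher-kva" comes from the text.

```go
package main

import (
	"fmt"
	"strings"
)

// splitAtDelimiter is a hypothetical helper: it cuts a Punycode string at
// its last hyphen. Everything before that hyphen is literal ASCII (earlier
// hyphens included); everything after it is the encoded part. With no
// hyphen at all, the whole string is treated as encoded.
func splitAtDelimiter(s string) (ascii, encoded string) {
	i := strings.LastIndexByte(s, '-')
	if i < 0 {
		return "", s
	}
	return s[:i], s[i+1:]
}

func main() {
	for _, s := range []string{"bcher-kva", "foo-bar-kva"} {
		ascii, encoded := splitAtDelimiter(s)
		fmt.Printf("%q -> ascii=%q encoded=%q\n", s, ascii, encoded)
	}
}
```

Searching from the right is what keeps hyphens in the original input unambiguous: only the final one is ever interpreted as the delimiter.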
The non-ASCII characters are sorted by Unicode value, lowest first (if a character occurs more than once they are sorted by position). Each is then encoded as a single number. This single number defines both the location to insert the character at and which character to insert.
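How one number can carry both the character and its position can be sketched for the first insertion. The sketch assumes the initial state defined in RFC 3492 (first candidate code point 128, first candidate position 0); later numbers are relative to the previous insertion, which this sketch does not cover. The function names are illustrative.

```go
package main

import "fmt"

// initialN is the first non-ASCII code point (0x80), the starting value
// given in RFC 3492.
const initialN = 128

// firstDelta packs "insert cp at position pos" into one number, for a string
// that so far holds asciiLen ASCII characters. There are asciiLen+1 possible
// insertion positions per candidate code point.
func firstDelta(cp rune, pos, asciiLen int) int {
	return (int(cp)-initialN)*(asciiLen+1) + pos
}

// splitFirstDelta reverses firstDelta: the quotient selects the code point,
// the remainder selects the position.
func splitFirstDelta(delta, asciiLen int) (cp rune, pos int) {
	slots := asciiLen + 1
	return rune(initialN + delta/slots), delta % slots
}

func main() {
	d := firstDelta('\u00fc', 1, len("bcher")) // U+00FC at index 1
	cp, pos := splitFirstDelta(d, len("bcher"))
	fmt.Println(d, string(cp), pos) // 745 ü 1
}
```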
The number is encoded using the letters "a" through "z" and the digits "0" through "9". It is not base-36 but a more complex scheme described below, which allows the numbers to be concatenated, with nothing separating them.
A number system with little-endian ordering is used which allows variable-length codes without separate delimiters: a digit lower than a threshold value marks that it is the most-significant digit, hence the end of the number. The threshold value depends on the position in the number and also on previous insertions, to increase efficiency. Correspondingly the weights of the digits vary.
In this case a number system with 36 symbols is used, with the case-insensitive 'a' through 'z' equal to the decimal numbers 0 through 25, and '0' through '9' equal to the decimal numbers 26 through 35. Thus "kva" corresponds to the decimal number string "10 21 0".
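The symbol-to-value mapping can be checked with a few lines of Go. This is a minimal sketch; the function names are illustrative and not taken from any library.

```go
package main

import "fmt"

// digitValue maps one Punycode symbol to its numeric value:
// 'a'..'z' (either case) -> 0..25, '0'..'9' -> 26..35.
func digitValue(c byte) (int, bool) {
	switch {
	case c >= 'a' && c <= 'z':
		return int(c - 'a'), true
	case c >= 'A' && c <= 'Z':
		return int(c - 'A'), true
	case c >= '0' && c <= '9':
		return int(c-'0') + 26, true
	}
	return 0, false
}

// digitValues converts a whole digit string, failing on any other symbol.
func digitValues(s string) ([]int, error) {
	vals := make([]int, 0, len(s))
	for i := 0; i < len(s); i++ {
		v, ok := digitValue(s[i])
		if !ok {
			return nil, fmt.Errorf("invalid digit %q", s[i])
		}
		vals = append(vals, v)
	}
	return vals, nil
}

func main() {
	vals, err := digitValues("kva")
	fmt.Println(vals, err) // [10 21 0] <nil>
}
```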
The thresholds themselves are determined for each successive encoded character by an algorithm keeping them between 1 and 26 inclusive.[3] Because the digits are case-insensitive, their letter case can then be used to carry information about the original case of the string.[4]
Because the encoding algorithm sorts special characters by code point, the insertion of a second special character in "bücher" proceeds as follows: the first possibility is "büücher" with code "bcher-kvaa", the second "bücüher" with code "bcher-kvab", etc. After "bücherü" with code "bcher-kvae" come the codes representing insertion of ý, the Unicode character following ü, starting with "ýbücher" with code "bcher-kvaf" (different from "übücher", coded "bcher-jvab"), etc.
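The codes listed above can be reproduced with a compact decoder. The following is a sketch, not a production implementation: the constants (base 36, thresholds 1 to 26, skew 38, damp 700, initial bias 72, initial code point 128) are the Punycode parameters from RFC 3492, while the function names are illustrative. It omits the overflow checks and input validation that RFC 3492 requires of a real decoder.

```go
package main

import (
	"fmt"
	"strings"
)

// Punycode parameters as given in RFC 3492.
const (
	base, tMin, tMax      = 36, 1, 26
	skew, damp            = 38, 700
	initialBias, initialN = 72, 128
)

// adapt recomputes the bias after each insertion so that the thresholds
// follow the size of recent numbers.
func adapt(delta, numPoints int, first bool) int {
	if first {
		delta /= damp
	} else {
		delta /= 2
	}
	delta += delta / numPoints
	k := 0
	for delta > ((base-tMin)*tMax)/2 {
		delta /= base - tMin
		k += base
	}
	return k + (base-tMin+1)*delta/(delta+skew)
}

func digitValue(c byte) (int, bool) {
	switch {
	case c >= 'a' && c <= 'z':
		return int(c - 'a'), true
	case c >= 'A' && c <= 'Z':
		return int(c - 'A'), true
	case c >= '0' && c <= '9':
		return int(c-'0') + 26, true
	}
	return 0, false
}

// decode turns a Punycode string (without the "xn--" prefix) into Unicode.
func decode(s string) (string, error) {
	var out []rune
	ext := s
	if cut := strings.LastIndexByte(s, '-'); cut >= 0 {
		out, ext = []rune(s[:cut]), s[cut+1:]
	}
	n, i, bias := initialN, 0, initialBias
	for pos := 0; pos < len(ext); {
		oldI, w := i, 1
		// Read one variable-length number, least-significant digit first.
		for k := base; ; k += base {
			if pos >= len(ext) {
				return "", fmt.Errorf("truncated number in %q", s)
			}
			d, ok := digitValue(ext[pos])
			if !ok {
				return "", fmt.Errorf("invalid digit %q", ext[pos])
			}
			pos++
			i += d * w
			t := k - bias
			if t < tMin {
				t = tMin
			} else if t > tMax {
				t = tMax
			}
			if d < t { // a digit below the threshold ends the number
				break
			}
			w *= base - t
		}
		slots := len(out) + 1
		bias = adapt(i-oldI, slots, oldI == 0)
		n += i / slots // which code point to insert
		i %= slots     // where to insert it
		out = append(out[:i], append([]rune{rune(n)}, out[i:]...)...)
		i++
	}
	return string(out), nil
}

func main() {
	for _, s := range []string{"bcher-kva", "bcher-kvaa", "bcher-kvab", "bcher-kvae", "bcher-kvaf", "bcher-jvab"} {
		u, err := decode(s)
		fmt.Println(s, "->", u, err)
	}
}
```

Running it decodes the six codes to "bücher", "büücher", "bücüher", "bücherü", "ýbücher" and "übücher", matching the sequence described in the paragraph.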
To prevent hyphens in non-international domain names from triggering a Punycode decoding, the string xn-- is prepended to Punycode sequences in internationalized domain names. This is called ACE (ASCII Compatible Encoding).[5]
As PouchContainer keeps iterating and gaining features, the project grows larger and attracts many external developers. Because coding habits vary among contributors, code reviewers must pay attention to coding style in addition to logical correctness and performance: a consistent code specification is a prerequisite for keeping the project maintainable. Beyond a consistent coding style, the coverage and stability of test cases are also a project focus. Without regression tests, how can we ensure that each code update has zero impact on existing functions?
PouchContainer is a project written in Golang. It uses shell scripts for automated operations such as compiling and packaging. In addition to Golang and shell scripts, PouchContainer includes many Markdown documents that help users understand PouchContainer, so the typography and spelling of these documents are also a project focus. The following describes the tools and use cases of PouchContainer in terms of coding style specification.
Golang has a simple syntax, and the community has maintained a complete CodeReview guide from the start, which helps many Golang projects achieve a consistent coding style and minimizes disputes. Based on the conventions in the developer community, PouchContainer defines specific rules for developers to follow, so as to ensure code readability. For more information, read the code style rules.
However, it is difficult to keep a consistent coding style across a project based on written specifications alone. Like other programming languages, Golang provides basic tool chains such as golint, gofmt, goimports, and go vet to check and unify the coding style, making it possible to automate code review and subsequent processes. Currently, PouchContainer runs these code check tools in CircleCI on every pull request submitted by developers. If a check returns an error, the code reviewer can decline to review or merge the code.
In addition to the tools provided by Golang, we can select third-party code check tools such as errcheck in open source communities to check whether developers have handled the errors returned by functions. However, these tools lack a consistent output format, making it difficult to normalize the outputs of different tools. Open source communities provide gometalinter to normalize various code check tools. The following combination is recommended:
Powerful as they are, shell scripts need syntax checking to avoid potential and unpredictable errors. For example, unused variables may be defined. Though such variables do not affect the use of scripts, they may be a burden on project maintainers.
PouchContainer uses shellcheck to check the shell scripts of the current project. Taking the preceding code as an example, shellcheck raises a warning about unused variables. The shellcheck tool can identify potential problems in shell scripts during code review, reducing the probability of errors during execution.
The current continuous integration task of PouchContainer scans the .sh scripts of the project and uses shellcheck to check the scripts one by one. For more information, read the shellcheck documentation.
As an open source project, PouchContainer attaches equal importance to documents and code, because documents are the best way for users to understand PouchContainer. Documents are written in Markdown, and their typography and spelling are a project focus.
As with code checking, written specifications alone are not enough to avoid false negatives in document checking. Therefore, PouchContainer uses markdownlint and misspell to check the typography and spelling of documents. Such checking is as important as golint and is performed on each pull request in CircleCI. If a check returns an error, the code reviewer can decline to review or merge the code.
A unit test ensures the correctness of a single module. In a test pyramid, the wider the coverage of unit tests, the more they reduce the debugging costs of integration testing and end-to-end testing. In a complex system, a longer task processing chain makes problems more expensive to locate, especially problems caused by minor modules. The following summarizes how Golang unit test cases are written in PouchContainer.
Simply put, a unit test determines whether the output of a function meets expectations for a given input. When a tested function has various input scenarios, we can organize test cases in Table-Driven mode. See the following code. Table-Driven mode uses arrays to organize test cases and verifies the correctness of functions by running the cases in a loop.
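A table-driven test in Go typically takes the shape sketched below. The function under test, hasACEPrefix, is a hypothetical stand-in invented for this illustration and is not PouchContainer code; each case carries a name so that a failure identifies the scenario.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// hasACEPrefix is a hypothetical function under test: it reports whether a
// label starts with the "xn--" prefix, compared case-insensitively.
func hasACEPrefix(label string) bool {
	return len(label) >= 4 && strings.EqualFold(label[:4], "xn--")
}

// TestHasACEPrefix keeps every scenario as one row of a table and runs the
// same assertion over all rows.
func TestHasACEPrefix(t *testing.T) {
	for _, tc := range []struct {
		name     string
		input    string
		expected bool
	}{
		{name: "Punycode label", input: "xn--bcher-kva", expected: true},
		{name: "upper-case prefix", input: "XN--bcher-kva", expected: true},
		{name: "plain label with hyphen", input: "my-host", expected: false},
		{name: "empty label", input: "", expected: false},
	} {
		t.Run(tc.name, func(t *testing.T) {
			if got := hasACEPrefix(tc.input); got != tc.expected {
				t.Errorf("hasACEPrefix(%q) = %v, want %v", tc.input, got, tc.expected)
			}
		})
	}
}

// main only demonstrates the function directly; the test itself is meant to
// live in a _test.go file and run under "go test".
func main() {
	fmt.Println(hasACEPrefix("xn--bcher-kva"))
}
```

Adding a new scenario then means adding one row to the table rather than copying the assertion logic.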
To make test cases easier to debug and maintain, we can add auxiliary information that describes each test. For example, when the reference test includes a Punycode input but the case is not labeled as Punycode, the code reviewer or project maintainer may not know the difference between xn--bcher-kva.tld/redis:3 and docker.io/library/redis:3.
For a function with complex behaviors, one input is not enough for executing a complete test case. In TestTeeReader, for example, data reading is complete after TeeReader reads hello, world from the buffer, and further reading is expected to encounter an "end-of-file" error. Such a test case must be executed independently rather than using Table-Driven.
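Such a multi-step test might look like the sketch below. It uses io.TeeReader from the Go standard library as a stand-in, since PouchContainer's own TeeReader is not shown here; newTee and readStep are helper names made up for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"testing"
)

// newTee builds a reader over input that mirrors everything it reads
// into the returned buffer.
func newTee(input string) (io.Reader, *bytes.Buffer) {
	mirror := &bytes.Buffer{}
	return io.TeeReader(bytes.NewBufferString(input), mirror), mirror
}

// readStep performs a single Read and returns what was read as a string.
func readStep(r io.Reader) (string, error) {
	buf := make([]byte, 64)
	n, err := r.Read(buf)
	return string(buf[:n]), err
}

// TestTeeReader checks two consecutive states of the same reader, which is
// why it is written as sequential steps instead of table rows.
func TestTeeReader(t *testing.T) {
	r, mirror := newTee("hello, world")

	// Step 1: the first read returns the whole input and mirrors it.
	got, err := readStep(r)
	if err != nil || got != "hello, world" {
		t.Fatalf("first read = %q, %v; want %q, nil", got, err, "hello, world")
	}
	if mirror.String() != "hello, world" {
		t.Fatalf("mirror = %q, want %q", mirror.String(), "hello, world")
	}

	// Step 2: the source is drained, so a further read must report io.EOF.
	if got, err := readStep(r); err != io.EOF || got != "" {
		t.Fatalf("second read = %q, %v; want empty string, io.EOF", got, err)
	}
}

// main only demonstrates the two steps outside of "go test".
func main() {
	r, _ := newTee("hello, world")
	first, _ := readStep(r)
	_, err := readStep(r)
	fmt.Println(first, err)
}
```

The second assertion depends on the state left behind by the first read, so the steps cannot be expressed as independent table rows.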
Simply put, if you find yourself copying a large portion of code when testing a function, the repeated test code can in principle be extracted and used to organize the test cases in Table-Driven mode. Be sure to follow the "Don't Repeat Yourself" rule.
Dependencies are frequently encountered during testing. For example, a PouchContainer client requires an HTTP server. However, such dependencies are beyond the scope of a unit test and belong to integration testing. How can we complete these unit tests?