On Wed, 2018-02-28, porp...@gmail.com wrote:
> Hi,
> I would like to parse the following string to extract the following parts:
> std::string username;
> std::string host;
> int port;
> std::string db;
>
> protocol://[username:password@]host[:port]/db
>
> [...] designates the optional parts
That looks suspiciously like a URL, or URI or whatever it's called.
Can't you rely on the formal definition in some RFC instead of
describing it vaguely yourself? You can add extra limitations if you
want to (e.g. you seem to mandate a cleartext password with the user
name).
> My algorithm uses std::string::find_first_of heavily.
I personally don't like the std::string methods, the string::npos and
all that business. Consider using <algorithm> on begin(s) .. end(s),
which IMO feels more idiomatic.
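
To illustrate what I mean by <algorithm> on begin(s) .. end(s), here is
a minimal sketch. split_once is a name I made up for this example, not
anything standard; it splits at the first separator and never touches
string::npos.

```cpp
#include <algorithm>
#include <string>
#include <utility>

// Hypothetical helper: split s at the first occurrence of sep.
// If sep is absent, the whole string goes into .first and .second
// is empty.
std::pair<std::string, std::string>
split_once(const std::string& s, char sep)
{
    const auto mid = std::find(begin(s), end(s), sep);
    if (mid == end(s)) return {s, std::string()};
    // Skip the separator itself when building the second half.
    return {std::string(begin(s), mid), std::string(mid + 1, end(s))};
}
```

Calling split_once("example.org:5432", ':') gives the host in .first
and the port text in .second.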
> In fact I don't like it. It doesn't look clean.
A parser doesn't have to look clean. Dump it in a well-documented
function, and write unit tests.
> I wonder whether there is an efficient way of doing this using only
> a standard C++ (11+ allowed) or boost (C++ standard preferred)
> Ideally there would be only a single pass through the string.
>
> Could you please give me some hints or provide some kind of code snippet.
I second the recommendation of std::regex, or splitting it up a bit by
other means and then using std::regex on some of the parts.
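
A rough sketch of the std::regex route, under assumptions I am making
up for the example: user name and password contain no ':', '@' or '/',
the host is a plain name without colons (so no IPv6), and a missing
port is reported as 0. Target and parse_target are hypothetical names.

```cpp
#include <regex>
#include <string>

// Hypothetical result type; port == 0 means "no port given".
struct Target {
    std::string username;
    std::string host;
    int port = 0;
    std::string db;
};

// Parse protocol://[username:password@]host[:port]/db.
// Returns false if the string does not match the assumed syntax.
bool parse_target(const std::string& s, Target& out)
{
    // Groups: 1 = username, 2 = host, 3 = port, 4 = db.
    static const std::regex re(
        R"re([A-Za-z][A-Za-z0-9+.-]*://(?:([^:@/]*):[^:@/]*@)?([^:@/]+)(?::([0-9]{1,5}))?/(.*))re");
    std::smatch m;
    if (!std::regex_match(s, m, re)) return false;

    const int port = m[3].matched ? std::stoi(m[3].str()) : 0;
    if (port > 65535) return false;   // not a valid TCP/UDP port

    out.username = m[1].str();        // empty if the part was omitted
    out.host = m[2].str();
    out.port = port;
    out.db = m[4].str();
    return true;
}
```

The password is matched but deliberately not captured, since it was not
in your list of parts to extract.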
But there are some issues you need to clarify for yourself (e.g. by
using an existing formal definition of the syntax; see above):
- Can the username:password part contain :, @ or /? That would mean
you cannot start by splitting on the third / in the string.
- Can the host contain a :, and how would that work with host:port?
Host names don't contain colons, but IPv6 addresses do. In a URL,
you'd write it like this: [::1]:80.
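
For the second question, here is a sketch of splitting host[:port]
when the host may be a bracketed IPv6 address. split_host_port is a
hypothetical helper; it assumes the user:password@ part and the /db
part have already been cut off, and it leaves the port as text.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: split "host", "host:port", "[v6]" or
// "[v6]:port". The brackets are stripped from the host.
// Returns false on malformed input.
bool split_host_port(const std::string& s,
                     std::string& host, std::string& port)
{
    auto rest = end(s);
    if (!s.empty() && s.front() == '[') {
        const auto close = std::find(begin(s), end(s), ']');
        if (close == end(s)) return false;    // unterminated '['
        host.assign(begin(s) + 1, close);
        rest = close + 1;
    } else {
        rest = std::find(begin(s), end(s), ':');
        host.assign(begin(s), rest);
    }
    if (rest == end(s)) {                     // no port part at all
        port.clear();
        return !host.empty();
    }
    if (*rest != ':') return false;           // junk after ']'
    port.assign(rest + 1, end(s));
    return !host.empty() && !port.empty();
}
```

With "[::1]:80" this yields host "::1" and port "80"; with plain
"example.org" the port comes back empty.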
/Jorgen
--
  // Jorgen Grahn <grahn@  Oo  o.   .     .
\X/     snipabacken.se>   O  o   .