Hi Niek,
fmpz_set_str just calls GMP for input smaller than about 10000 digits. Unless you're parsing million-digit integers and you compiled FLINT with --enable-avx2 so that the new FFT is used, you're just looking at the performance of GMP's mpn_set_str.
The basecase algorithm used in GMP for numbers up to a few limbs is AFAIK just a nested Horner's rule: you have an inner base-10 Horner loop that accumulates 19 digits in a single word and an outer base-10^19 Horner loop that updates your multi-limb result using mpn_mul_1 and mpn_add_1. For larger numbers GMP uses the standard divide and conquer algorithm, which benefits from reusing a single table of powers 10^(2^n) throughout the tree.
If you implement the same method using GMP's low-level interface you should be able to get close to GMP's speed. If you're comparing GMP against something not based on GMP, probably the difference is that GMP's basic arithmetic is much faster. Other than that, memory management could be an issue; GMP does careful temporary allocations on the stack and tries hard to reuse scratch space within an algorithm; you definitely don't want any mallocs anywhere near the basecase code.
Best,
Fredrik