Verilog requires the width of a part-select to be constant, so variable slice widths are generally rejected. It may be OK inside a generate block because the loop is evaluated at elaboration time, before synthesis. You could rewrite it using nested loops, with the inner loop accessing only a single bit and the outer loop having a variable length. It's cumbersome, but it works.
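As a rough sketch of the single-bit-access idea (not the code from the question): the module name, `N`, `grant_nested` and `lower_req_seen` are made up for illustration, and it assumes the goal is a "lowest set request wins" grant. In this arrangement the loop whose trip count depends on the other loop's index is the inner one. Every access is a single bit, so no variable-width slice such as `req_in[i-1:0]` is needed.

```systemverilog
module nested_loop_sketch;
  localparam int N = 16;              // hypothetical width
  logic [N-1:0] req_in = 16'hF0F0;    // example request vector
  logic [N-1:0] grant_nested;         // hypothetical output name
  logic         lower_req_seen;       // scratch: any request below bit i?

  always_comb begin
    // Outer loop: one iteration per grant bit.
    for (int i = 0; i < N; i++) begin
      lower_req_seen = 1'b0;
      // Inner loop: trip count depends on i, but both loops have
      // constant bounds once unrolled, and each access is one bit.
      for (int j = 0; j < i; j++)
        lower_req_seen |= req_in[j];
      grant_nested[i] = req_in[i] && !lower_req_seen;
    end
  end
endmodule
```

This does O(N^2) bit operations before optimization, which is part of why it feels cumbersome compared with a running flag.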
Here's another style that works:
reg [15:0] grant_simple;
reg [15:0] req_in = 16'hF0F0;
always @(*) begin
    // 1 => the grant has already been superseded by a lower-index request.
    // Assigned on every evaluation; a declaration initializer would run only once.
    reg grant_superceded_var;
    grant_superceded_var = req_in[0];
    grant_simple[0] = req_in[0];
    for (int i = 1; i < 16; i++) begin
        grant_simple[i] = req_in[i] && !grant_superceded_var;
        grant_superceded_var = grant_superceded_var || req_in[i];
    end
    $display("grant_simple: 0x%h", grant_simple);
end
Result: grant_simple: 0x0010
Here is another style, from the Altera Synthesis Cookbook, that implements a "trailing one detector". It's not intuitive, but it works very well for large N because the subtraction maps onto the fast carry chain in an FPGA:
reg [15:0] req_in = 16'hF0F0;
wire [15:0] grant2 = req_in & ~(req_in-1); // Trailing 1 detect
initial #10ps $display("grant2: 0x%h",grant2);
Result: grant2: 0x0010
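To see why the trick isolates the lowest set bit, it may help to walk through the bits. Subtracting 1 clears the lowest 1 and sets every zero below it, so after inversion that position is the only one where both operands are 1. The sketch below repeats the expression with the intermediate values as comments; the module name, `N` and `grant_tod` are made up for illustration.

```systemverilog
module trailing_one_sketch;
  localparam int N = 16;              // hypothetical width
  logic [N-1:0] req_in = 16'hF0F0;    // same test value as above
  logic [N-1:0] grant_tod;            // hypothetical output name

  // req_in          = 1111_0000_1111_0000
  // req_in - 1      = 1111_0000_1110_1111  (lowest 1 cleared, zeros below it set)
  // ~(req_in - 1)   = 0000_1111_0001_0000
  // req_in & ~(...) = 0000_0000_0001_0000  = 16'h0010
  assign grant_tod = req_in & ~(req_in - 1'b1);
endmodule
```

In two's complement, `~(x - 1)` equals `-x`, so `req_in & -req_in` is an equivalent way to write the same thing. When `req_in` is all zeros, the result is all zeros as well.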